MOABS: model based analysis of bisulfite sequencing data
 Deqiang Sun^{1, 2},
 Yuanxin Xi^{1, 2},
 Benjamin Rodriguez^{1, 2},
 Hyun Jung Park^{1, 2},
 Pan Tong^{1, 2},
 Mira Meong^{3},
 Margaret A Goodell^{3} and
 Wei Li^{1, 2}Email author
DOI: 10.1186/gb2014152r38
© Sun et al.; licensee BioMed Central Ltd. 2014
Received: 12 October 2013
Accepted: 24 February 2014
Published: 24 February 2014
Abstract
Bisulfite sequencing (BSseq) is the gold standard for studying genomewide DNA methylation. We developed MOABS to increase the speed, accuracy, statistical power and biological relevance of BSseq data analysis. MOABS detects differential methylation with 10fold coverage at singleCpG resolution based on a BetaBinomial hierarchical model and is capable of processing two billion reads in 24 CPU hours. Here, using simulated and real BSseq data, we demonstrate that MOABS outperforms other leading algorithms, such as Fisher’s exact test and BSmooth. Furthermore, MOABS analysis can be easily extended to differential 5hmC analysis using RRBS and oxBSseq. MOABS is available at http://code.google.com/p/moabs/.
Background
DNA methylation, an epigenetic modification affecting organization and function of the genome, plays a critical role in both normal development and disease. Until recently, the only known DNA methylation was 5methylcytosine (5mC) at CpG dinucleotides, which is generally associated with transcriptional repression [1]. In 2009, another form of DNA methylation termed 5hydroxymethylcytosine (5hmC) [2] was found to be involved in active demethylation [3] and gene regulation [4]. Understanding the functional role of DNA methylation requires knowledge of its distribution in the genome [5, 6]. Bisulfite conversion of unmethylated Cs to Ts followed by deep sequencing (BSSeq) has emerged as the gold standard to study genomewide DNA methylation at singlenucleotide resolution. The most popular protocols include RRBS (Reduced Representation Bisulfite Sequencing) [7] and WGBS (Whole Genome Bisulfite Sequencing) [8] for the combination of 5mc and 5hmc, oxBSSeq (Oxidative Bisulfite Sequencing) [9] for 5mc and TABSeq (Tetassisted Bisulfite Sequencing) [10] for 5hmc, respectively. After mapping BSseq reads to the genome, the proportion of unchanged Cs is regarded as the absolute DNA methylation level. Due to random sampling nature of BSseq, deep sequencing (e.g. >30 fold) is usually required to reduce the measurement error. Technological advances and reduced costs have seen a significant increase in interest in BSseq among biologists. Currently, BSseq is widely used by small laboratories to profile cell lines and animal models [11], as well as by large consortiums such as the NIH ENCODE, Roadmap Epigenomics, The Cancer Genome Atlas (TCGA), and European BLUEPRINT to profile thousands of cell populations. Hence, it is expected that BSseq data will continue to grow exponentially. However, despite recent progress [7, 12–14], computational methods designed for issues specific to BSseq are much less developed than those for other sequencing applications such as ChIPSeq and RNAseq.
The most fundamental aspects of BSseq data analysis include read mapping and differential methylation detection. We previously developed one of the most widely used BS mapping programmed BSMAP [15]. After read mapping, the most common task is the identification of differentially methylated regions (DMRs) between samples, such as disease versus normal. Based on the biological question, DMRs can range in size from a single CpG (DMC: differentially methylated CpG) to tens of millions of bases. Although several statistical methods have been applied to DMR detection [12], among which Fisher’s exact test pvalue (FETP) method [16] is the most popular, several challenges remain to be addressed. 1) Statistical Power: most previous methods are very conservative in power and require deep sequencing (e.g. 30 fold). For example, Hansen [13] recently calculated that for single CpG methylation level “even 30× coverage yields standard error as large as 0.09”. As a compromise, many studies assumed that neighboring CpGs have similar methylation levels, thus can be combined together within a genomic region (e.g. 1 kb) to increase the statistical power [17]. For example, BSmooth [13] performs local smoothing followed by ttest for DMR detection. While this strategy may be applicable in many cases, regional average analysis will unfortunately miss lowCpGdensity DMRs that are abundant in the genome and critical for gene expression, such as TFBSs. Most TFBSs are small (i.e. < 50 bp) as implied by highresolution ChIPseq and ChIPexo experiments [18] and contain few or even a single CpG(s) that are in general differentially methylated compared to surrounding ones, thus are very likely to be “overlooked” by the regional average analysis. 2) Biological Significance: previous methods use pvalue for statistical significance of DMR. This pvalue metric only tells whether a region is differentially methylated, but does not directly measure the magnitude of the methylation difference. A similar problem also exists in gene expression profiling, where the pvalue does not directly measure the expression foldchange [19]. Since sequencing depth in BSseq experiments can fluctuate by an order of magnitude in different loci, a very small methylation difference, although not biologically meaningful, can easily return a significant pvalue if the underlying sequencing depth is deep enough. On the other hand, the nominal methylation difference, i.e. direct subtraction of two methylation ratios, suffers significantly from the random sampling error such that a large difference with low sequencing depth is not likely to be statistically meaningful. 3) Biological Variation is an essential feature of DNA methylation [20], and should be handled carefully to detect reproducible DMRs that represent the common characteristics of the sample group. However, most previous methods fail to account for biological variation between replicates, and simply pool the raw data from replicates for DMR detection. Some of the resulting DMRs may have significant differences at the mean level but might not be reproducible between replicates, and hence are “falsepositives”. To our knowledge, BSmooth [13] is the first replicateaware program that accounted for biological variation using a modified ttest.
In response to these challenges, we developed a powerful differential methylation analysis algorithm termed MOABS: Modelbased Analysis of Bisulfite Sequencing data. Its source code is available as Additional file 1. MOABS uses a BetaBinomial hierarchical model to capture both sampling and biological variations, and accordingly adjusts observed nominal methylation difference by sequencing depth and sample reproducibility. The resulting credible methylation difference (CDIF) is a single metric that combines both biological and statistical significance of differential methylation. Using both simulated and real wholegenome BSseq data from mouse brain tissues and stem cells, we demonstrate the superior performance of MOABS over other leading methods, especially at low sequencing depth. Furthermore, one practical challenge is that BSseq data analysis is usually computational intensive, and requires multiple steps. We therefore seamlessly integrate several major BSseq processing procedures into MOABS, including read mapping, methylation ratio calling, identification of hypo or hyper methylated regions from one sample, and differential methylation from multiple samples. MOABS is implemented in C++ with highly efficient numerical algorithms, and thus is at least 10 times faster than other popular packages. For example, it takes only 24 CPU hours to detect differential methylation from 2 billion aligned reads. Together, MOABS provides a comprehensive, accurate, efficient and userfriendly solution for analyzing largescale BSseq data.
Results and discussion
BetaBinomial hierarchical model for both sampling and biological variations
For a single CpG locus in the jth biological replicate of condition i, we denote the number of total reads, the number of methylated reads and methylation ratio as n_{ ij }, k_{ ij } and p_{ ij }, respectively. For a typical two group comparison, i = 1,2 and j = 1, 2, …, N, where N is the number of replicates in each condition. The n_{ ij } and k_{ ij } are observations from experiments, while the p_{ ij } is unknown with k_{ ij }/n_{ ij } as its nominal estimation. Given p_{ ij }and n_{ ij }, the number of methylated reads k_{ ij } is characterized by the sampling variation from sequencing and can be modeled by a Binomial distribution: k_{ ij } ~ Binomial(n_{ ij }, p_{ ij }). The posterior distribution of the methylation ratio p_{ ij } will then follow a Beta distribution Beta(α_{ ij }, β_{ ij }) and can be estimated using an Empirical Bayes approach. The prior distribution will be estimated from the whole genome, in which most CpGs are either fully methylated or fully unmethylated, resulting in a bimodal distribution. The Empirical Bayes approach will automatically incorporate such bimodal information in the methylation ratio estimation and hence increases the power of our analysis.
Credible methylation difference (CDIF) is a single metric for both statistical and biological significance of differential methylation
In practice, CDIF represents the conservative estimation of the true methylation difference, i.e. for 97.5% of chance the absolute value of true methylation difference is greater than or equal to that of CDIF. The CDIF value will be assigned to 0 is there is no significant difference. Constructed in this way, the CDIF value, if greater than the resolution defined as min(1/n_{1}, 1/n_{2}), guarantees a significant pvalue from Fisher’s exact test, and at the same time represents the magnitude of methylation difference. The sequencing depth will largely influence CDIF, since bigger n_{ i } will make a smaller 95% CI of the methylation difference, normally resulting in greater CDIF value.
We believe CDIF is a better metric to capture the methylation difference than statistical pvalue or nominal methylation difference. Three CpG examples are shown in Figure 1c. According to pvalue 1.4e10, CpG #3 is the most significant one. However, this significant pvalue, which is largely driven by the high sequencing depth, does not correctly represent the actual biological difference of 0.3, which is the smallest among three CpGs. On the other hand, if we use nominal difference, CpG #2 would be the most significant. However, its low sequencing depth makes this high nominal difference unreliable. CDIF is able to penalize the nominal difference according to its statistical significance and ranks CpG #1 as the most significant followed by CpGs #2 and #3, although CpG #1 does not have the most significant pvalue or nominal difference. Taken together, CDIF reaches a well balance between statistical and biological significance and gives a more stable and biological meaningful interpretation and ranking of differential methylation.
Functions and performance of the MOABS pipeline
All the modules are wrapped in a single master script such that users can specify the input BSseq reads and run all the modules one by one automatically. The MOABS pipeline is developed using C++ with highly efficient numerical algorithms, native multiplethreading and cluster support so that multiple jobs can run in parallel on different computing nodes. Several mathematical and computational optimizations have made MOABS pipeline extremely efficient. For example, it takes only one hour on 24 CPUs (IBM power7 4 Ghz) to detect differential methylation for approximately 30 million CpGs in the human genome based on 2 billion aligned reads. MOABS is significantly faster than other software. For example, a benchmark (Additional file 2: Table S1) based on public BSseq data in mouse hematopoietic stem cell (HSC) [26] reveals that MOABS is roughly 3.3, 1.7, and 1.4 times faster than BSmooth in bisulfite mapping, methylation call and differential methylation analysis, respectively. In summary, MOABS is a comprehensive, accurate, efficient and userfriendly solution for analyzing largescale BSseq data.
Simulated BSseq data reveals the superior performance of MOABS
MOABS improves the detection of allele specific DNA methylation
The 32 known DMRs can be easily detected by both MOABS and FETP mainly because they have large methylation differences and high read depth (54fold in DMR regions), which is consistent with our simulation study. However, deep bisulfite sequencing of the mammalian genome is still quite expensive. This reality motivated us to test to what extent these known DMRs can still be recovered at a lower sequencing depth. The same previous procedure was applied to compare all three methods. The number of recovered known DMRs at 5% FDR is plotted at each sequencing depth from random sampling (Figure 4b). We observe that the lower sequencing depth, the greater performance difference between MOABS and FETP. For example, when the depth is at 11fold, MOABS recovers roughly 90% of known DMRs, while FETP only detects 78% of DMRs. When the depth is further lowed to 3.1fold, MOABS can still recover roughly 70% of known DMRs, while FETP detects 44% DMRs. Interestingly, BSmooth’s performance is largely independent of sequencing depth, probably because it was designed mainly for low sequencing depth. Indeed, at a low depth of 3.1fold, BSmooth outperforms FETP. However, at sequencing depth higher than 3.1fold, BSmooth has a lower sensitivity than the other two methods. Collectively, we conclude that MOABS is superior in DMR detection, especially at low sequencing depth.
MOABS reliably reveals differential methylation underlying TFBSs
We then used TFBSs to evaluate DMC detection assuming tissuespecific TF binding is associated with tissuespecific hypomethylation. For a fair comparison, we calculated FDR for each method based on a methodspecific null distribution obtained through permutation of read sample labels. At FDR of 5%, MOABS, FETP and BSmooth predicted 32,867, 32,047 and 18,021 differentially methylated TFBSs respectively (Figure 5c). We also used a method similar to Gene Set Enrichment Analysis (GSEA [29]) to test enrichment of TFBS moving down the lists of DMCs ranked by different methods. MOABS shows the highest enrichment score (Figure 5d) of TFBS. For example, with the same 4,000 most significant DMCs, MOABS recovers 1,000 TFBSs while FETP only predicts ~600 TFBSs (i.e., 40% less).
In this instance, the sequencing depth is sufficient to enable MOABS and FETP to recover very similar number of TFBSs. However, when we randomly sampled reads to a depth of 4fold, MOABS recovered many more TFBS (15,349) than FETP (7,520) and BSmooth (4,028) (Figure 5e). Again, at this low sequencing depth, MOABS not only recovers 2–3 fold more TFBSs, but also exhibit more significant score of TFBS enrichment in the most significant DMCs. In both high and low sequencing depths, BSmooth recovers fewer TFBSs mainly because its smoothing function easily ignores small region with a few CpGs. Together, using tissue specific in vivo TFBSs, we demonstrate that MOABS can better recover differential methylation in small regulatory regions with a few CpGs, especially at low sequencing depth (e.g. 4fold).
MOABS detects differential 5hmc in ES cells using RRBS and oxBSSeq
When applied to RRBS and oxBSseq data derived from ES cell lines with different passages [9], MOABS reported 299 genes with decreased 5hmc and 125 genes with increased 5hmc (Additional file 6: Table S3) in promoters in the later passage P20, which is consistent with the mass spectrometry data [9] that shows overall reduced 5hmc in later passage. This result implies that the epigenetic stability of ES cells is impacted by prolonged in vitro culture. This is an important issue for both the safety and efficacy of stem cellderived tissues in cellreplacement therapies as well as the appropriate interpretation of experimental models. Monoallelic gene expression, including genomic imprinting, is primarily regulated through epigenetic mechanisms and thus can serve as a useful model of epigenetic stability. As expected, our analysis identified five imprinted genes with decreased 5hmc: Plagl1, Sfmbt2, Gpr1, Kcnq1 and Kcnq1ot1, as well as one imprinted gene with increased 5hmc, Pcdha4g.
The role of 5hmc in disease remains unclear. A recent study suggests that genomewide loss of 5hmc is an epigenetic feature of neurodegenerative Huntington’s disease [30]. The authors identified 559 genes with decreased 5hmc in the diseased mice compared to healthy controls. A considerable fraction of these diseasespecific genes were uncovered in our differential 5hmc analysis in ES cells. This included 26 of 299 and 11 of 125 genes (overlapping pvalue < 8e5) with decreased and increased 5hmc, respectively. These results suggest that one potential consequence of decreased epigenetic stability over time in ES cells is the acquisition of pathological epimutations.
The observed bias toward loss of 5hmc in ES cells upon longterm culture may also suggest stem cell properties, such as pluripotency, are affected. Ficz and colleagues [31] showed that knockdown of Tet1/Tet2 in mouse ES cells down regulates epigenetic reprogramming and pluripotencyrelated genes such as Esrrb, Klf2, Tcl1, Zfp42, Dppa3, Ecat1 and Prdm14. Decreased expression was concomitant with both decreased 5hmC and increased 5mC at the gene promoters. In our differential 5hmc analysis in ES cells, we observed decreased 5hmc at three of these genes: Ecat1, Esrrb, and Zfp42. Together, we conclude that MOABS can be used effectively to infer differential 5hmc using RRBS and oxBSSeq.
Conclusions
While progress in nextgeneration sequencing allows increasingly affordable BSseq experiments, the resulting data generated poses significant and unique bioinformatics challenges. The lack of efficient computational methods is the major bottleneck that prevents a broad adoption of such powerful technologies. In response to this challenge, we developed MAOBS, an accurate, comprehensive, efficient, and userfriendly pipeline for BSseq data analysis. The MOABS analysis is novel and significant in two major aspects: 1) MOABS CDIF value provides an innovative strategy to combine statistical pvalue and biological difference into a single metric, which will bring biological relevance to the interpretation of the DNA methylation data. 2) MOABS does not sacrifice resolution with low sequencing depth. By relying on the BetaBinomial Hierarchical Model and Empirical Bayes approach, MOABS has enough power to detect singleCpGresolution differential methylation in lowCpGdensity regulatory regions, such as TFBSs, with as low as 10fold. The lowdepth BSseq experimental design enables remarkable cost reduction per sample. In Figure 3 simulated data, we showed that MOABS achieved roughly 80% sensitivity with 5% FDR at 10fold sequencing depth. In Figure 4b real data, we showed that as sequencing depth decreased to 11fold by sampling, MOABS recovered roughly 90% of known DMRs. The MOABS sensitivity starts to drop dramatically when sequencing depth is further reduced. Based on the above two observations, we would recommend lowdepth (e.g. 10fold) BSseq on more biological samples with the same limited budget, which in most scenarios will provide greater biological insights than highdepth BSseq on fewer samples.
Copy Number Variation (CNV) is a common issue in many disease related bisulfite sequencing. The sequencing depth is normally higher or lower in high (or low) copynumber regions and this depth bias has an impact on our CDIF calculation. To correct this bias, we have included a separate script ‘redepth.pl’ in the MOABS package. Users can select their favorite CNV detection tools [32], such as CNVSeq, ControlFREEC and VarScan, to predict the CNV region from genome sequencing or bisulfite sequencing. Nearly all these tools output a bed file of CNV regions with predicted copy number based on a pvalue cutoff. The script ‘redepth.pl’ manipulates the read alignment BAM files according to the CNV prediction. If a read is located in a CNV region with a predicted copy number of X in a diploid genome, the read will have a probability of 2/X to be kept in the new BAM files. Reads in the nonCNV regions will keep unchanged. This process will result in CNV bias free BAM files for downstream analysis.
Largescale case–control epigenomewide association study (EWAS) is a powerful strategy to identify diseaseassociated epigenetic biomarkers. Currently, most studies use Illumina bisulfite arrays (e.g. 450 K) mainly due to the cost constraint. MOABS in theory can also be applied to such studies when EWAS bisulfite sequencing data are publicly available.
In summary, as DNA methylation is increasingly recognized as a key regulator of genomic function, deciphering its genomewide distribution using BSseq in numerous samples and conditions will continue to be a major research interest. MOABS significantly increase the speed, accuracy, statistical power and biological relevance of the BSseq data analysis. We believe that MOABS’s superior performance will greatly facilitate the study of epigenetic regulation in numerous biological systems and disease models.
Materials and methods
The major portions of the methods for the model are described here. In the Additional file 7, we provide more details and additional methods to make the model complete.
Distribution for difference of two Binomial proportions
where α = k + α_{0}, β = nk + β_{0}, if Be(α_{0}, β_{0}) is priori distribution for p. We also give formulas to numerically calculate the confidence interval for the single Binomial proportional p under observed (n, k).
The methylation ratio difference at a defined genomic locus from two biological samples is the difference of two Binomial proportions p_{1}p_{2}. Many methods have been proposed to estimate the confidence interval p_{1}p_{2} of and their merits have been subject to decades of considerable debate [22, 33–38]. No comprehensive comparison of currently available methods is available. This motivated us to turn to the direct and exact numerical calculation of confidence interval from Bayesian perspective.
where f_{ i }(p_{ i }) ≡ f(p_{ i }; n_{ i }, k_{ i }). Boundary conditions like the proportional area condition, minimal length condition can be applied to get unique solutions for (a, b).
Distribution for difference of difference
where f_{ i }(p_{ i }) ≡ f(p_{ i }; n_{ i }, k_{ i }).
where f(t) and f′(t′) are the distributions of 5hmc ratio for sample S and S′ respectively. After distribution of difference of 5hmc ratio between two samples is obtained, similarly confidence interval, credible difference and similarity test pvalue can be calculated.
Distribution for measurements with replicates
where the expression inside log is the probability P(k_{ i }; n_{ i }, α, β) defined in equation (5) and B (α, β)is the Beta function.
Abbreviations
 CDIF:

Credible difference
 DMC:

Differentially methylated cytosine
 DMR:

Differentially methylated region
 FETP:

Fisher’s exact test Pvalue
 TFBS:

Transcription factor binding site.
Declarations
Acknowledgements
We are grateful to Wei Xie for sharing the mouse methylome data, and Grant A. Challen for critical reading of this manuscript. This work was supported by CPRIT RP110471C3 and NIH R01HG007538 (to WL).
Authors’ Affiliations
References
 Jones PA: Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 2012, 13: 484492. 10.1038/nrg3230.PubMedView ArticleGoogle Scholar
 Tahiliani M, Koh KP, Shen Y, Pastor WA, Bandukwala H, Brudno Y, Agarwal S, Iyer LM, Liu DR, Aravind L, Rao A: Conversion of 5methylcytosine to 5hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science (New York, NY). 2009, 324: 930935. 10.1126/science.1170116.View ArticleGoogle Scholar
 He YF, Li BZ, Li Z, Liu P, Wang Y, Tang Q, Ding J, Jia Y, Chen Z, Li L, Sun Y, Li X, Dai Q, Song CX, Zhang K, He C, Xu GL: Tetmediated formation of 5carboxylcytosine and its excision by TDG in mammalian DNA. Science. 2011, 333: 13031307. 10.1126/science.1210944.PubMedPubMed CentralView ArticleGoogle Scholar
 Song CX, Yi C, He C: Mapping recently identified nucleotide variants in the genome and transcriptome. Nat Biotechnol. 2012, 30: 11071116. 10.1038/nbt.2398.PubMedPubMed CentralView ArticleGoogle Scholar
 Laird PW: Principles and challenges of genomewide DNA methylation analysis. Nat Rev Genet. 2010, 11: 191203. 10.1038/nrg2732.PubMedView ArticleGoogle Scholar
 Law JA, Jacobsen SE: Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet. 2010, 11: 204220. 10.1038/nrg2719.PubMedPubMed CentralView ArticleGoogle Scholar
 Meissner A, Mikkelsen TS, Gu H, Wernig M, Hanna J, Sivachenko A, Zhang X, Bernstein BE, Nusbaum C, Jaffe DB, Gnirke A, Jaenisch R, Lander ES: Genomescale DNA methylation maps of pluripotent and differentiated cells. Nature. 2008, 454: 766770.PubMedPubMed CentralGoogle Scholar
 Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson SF, Pellegrini M, Jacobsen SE: Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature. 2008, 452: 215219. 10.1038/nature06745.PubMedPubMed CentralView ArticleGoogle Scholar
 Booth MJ, Branco MR, Ficz G, Oxley D, Krueger F, Reik W, Balasubramanian S: Quantitative Sequencing of 5Methylcytosine and 5Hydroxymethylcytosine at SingleBase Resolution. Science. 2012, 336: 934937. 10.1126/science.1220671.PubMedView ArticleGoogle Scholar
 Yu M, Hon GC, Szulwach KE, Song CX, Zhang L, Kim A, Li X, Dai Q, Shen Y, Park B, Min JH, Jin P, Ren B, He C: Baseresolution analysis of 5hydroxymethylcytosine in the mammalian genome. Cell. 2012, 149: 13681380. 10.1016/j.cell.2012.04.027.PubMedPubMed CentralView ArticleGoogle Scholar
 Challen GA, Sun D, Jeong M, Luo M, Jelinek J, Berg JS, Bock C, Vasanthakumar A, Gu H, Xi Y, Liang S, Lu Y, Darlington GJ, Meissner A, Issa JP, Godley LA, Li W, Goodell MA: Dnmt3a is essential for hematopoietic stem cell differentiation. Nat Genet. 2012, 44: 2331.View ArticleGoogle Scholar
 Bock C: Analysing and interpreting DNA methylation data. Nat Rev Genet. 2012, 13: 705719. 10.1038/nrg3273.PubMedView ArticleGoogle Scholar
 Hansen KD, Langmead B, Irizarry RA: BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. 2012, 13: R8310.1186/gb20121310r83.PubMedPubMed CentralView ArticleGoogle Scholar
 Akalin A, Kormaksson M, Li S, GarrettBakelman FE, Figueroa ME, Melnick A, Mason CE: methylKit: a comprehensive R package for the analysis of genomewide DNA methylation profiles. Genome Biol. 2012, 13: R8710.1186/gb20121310r87.PubMedPubMed CentralView ArticleGoogle Scholar
 Xi Y, Li W: BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinforma. 2009, 10: 23210.1186/1471210510232.View ArticleGoogle Scholar
 Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, TontiFilippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L, AntosiewiczBourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR: Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009, 462: 315322. 10.1038/nature08514.PubMedPubMed CentralView ArticleGoogle Scholar
 Gu H, Bock C, Mikkelsen TS, Jäger N, Smith ZD, Tomazou E, Gnirke A, Lander ES, Meissner A: Genomescale DNA methylation mapping of clinical samples at singlenucleotide resolution. Nat Meth. 2010, 7: 133136. 10.1038/nmeth.1414.View ArticleGoogle Scholar
 Rhee Ho S, Pugh BF: Comprehensive Genomewide ProteinDNA Interactions Detected at SingleNucleotide Resolution. Cell. 2011, 147: 14081419. 10.1016/j.cell.2011.11.013.PubMedPubMed CentralView ArticleGoogle Scholar
 Feng J, Meyer CA, Wang Q, Liu JS, Liu XS, Zhang Y: GFOLD: a generalized fold change for ranking differentially expressed genes from RNAseq data. Bioinformatics. 2012, 28: 27822788. 10.1093/bioinformatics/bts515.PubMedView ArticleGoogle Scholar
 Feinberg AP, Irizarry RA: Evolution in health and medicine Sackler colloquium: Stochastic epigenetic variation as a driving force of development, evolutionary adaptation, and disease. Proc Natl Acad Sci USA. 2010, 107: 17571764. 10.1073/pnas.0906183107.PubMedPubMed CentralView ArticleGoogle Scholar
 Bonnans JF, Gilbert JC, Lemaréchal C, Sagastizábal CA: Numerical optimization: theoretical and practical aspects. 2006, New Yor: SpringerVerlagGoogle Scholar
 Kawasaki Y: Comparison of exact confidence intervals for the difference between two independent binomial proportions. Adv Appl Stat. 2010, 15: 157170.Google Scholar
 Brenner DJ, Quan H: Exact confidence limits for binomial proportions—Pearson and Hartley revisited. J R Stat Soc. Series D (The Statistician). 1990, 39: 391397.Google Scholar
 Lin X, Sun D, Rodriguez B, Zhao Q, Sun H, Zhang Y, Li W: BSeQC: quality control of bisulfite sequencing experiments. Bioinformatics. 2013, 29: 32273229. 10.1093/bioinformatics/btt548.PubMedPubMed CentralView ArticleGoogle Scholar
 Xie W, Barr CL, Kim A, Yue F, Lee AY, Eubanks J, Dempster EL, Ren B: Baseresolution analyses of sequence and parentoforigin dependent DNA methylation in the mouse genome. Cell. 2012, 148: 816831. 10.1016/j.cell.2011.12.035.PubMedPubMed CentralView ArticleGoogle Scholar
 Jeong M, Sun D, Luo M, Huang Y, Challen GA, Rodriguez B, Zhang X, Chavez L, Wang H, Hannah R, Kim SB, Yang L, Ko M, Chen R, Göttgens B, Lee JS, Gunaratne P, Godley LA, Darlington GJ, Rao A, Li W, Goodell MA: Large conserved domains of low DNA methylation maintained by Dnmt3a. Nat Genet. 2014, 46: 1723.PubMedPubMed CentralView ArticleGoogle Scholar
 Stadler MB, Murr R, Burger L, Ivanek R, Lienert F, Scholer A, van Nimwegen E, Wirbelauer C, Oakeley EJ, Gaidatzis D, Tiwari VK, Schübeler D: DNAbinding factors shape the mouse methylome at distal regulatory regions. Nature. 2011, 480: 490495.PubMedGoogle Scholar
 Hannah R, Joshi A, Wilson NK, Kinston S, Gottgens B: A compendium of genomewide hematopoietic transcription factor maps supports the identification of gene regulatory control mechanisms. Exp Hematol. 2011, 39: 531541. 10.1016/j.exphem.2011.02.009.PubMedView ArticleGoogle Scholar
 Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstrale M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC1alpharesponsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003, 34: 267273. 10.1038/ng1180.PubMedView ArticleGoogle Scholar
 Wang F, Yang Y, Lin X, Wang JQ, Wu YS, Xie W, Wang D, Zhu S, Liao YQ, Sun Q, Yang Y, Guo C, Han C, Tang T: Genomewide loss of 5hmC is a novel epigenetic feature of Huntington’s disease. Hum Mol Genet. 2013, 22: 36413653. 10.1093/hmg/ddt214.PubMedView ArticleGoogle Scholar
 Ficz G, Branco MR, Seisenberger S, Santos F, Krueger F, Hore TA, Marques CJ, Andrews S, Reik W: Dynamic regulation of 5hydroxymethylcytosine in mouse ES cells and during differentiation. Nature. 2011, 473: 398402. 10.1038/nature10008.PubMedView ArticleGoogle Scholar
 Min Zhao QW, Quan W, Peilin J, Zhongming Z: Computational tools for copy number variation (CNV) detection using nextgeneration sequencing data: features and perspectives. BMC Bioinforma. 2013, 14: S1View ArticleGoogle Scholar
 Newcombe RG: Interval estimation for the difference between independent proportions: comparison of eleven methods. Stat Med. 1998, 17: 873890. 10.1002/(SICI)10970258(19980430)17:8<873::AIDSIM779>3.0.CO;2I.PubMedView ArticleGoogle Scholar
 Wilson EB: Probable inference, the law of succession, and statistical inference. J Am Stat Assoc. 1927, 22: 209212. 10.1080/01621459.1927.10502953.View ArticleGoogle Scholar
 Santner TJ, Pradhan V, Senchaudhuri P, Mehta CR, Tamhane A: Smallsample comparisons of confidence intervals for the difference of two independent binomial proportions. Comput Stat Data Anal. 2007, 51: 57915799. 10.1016/j.csda.2006.10.018.View ArticleGoogle Scholar
 Nurminen MM, Newcombe RG: Score intervals for the difference of two binomial proportions. http://markstat.net/en/images/stories/notes_on_score_intervals.pdf,
 Pradhan VB, Tathagata : Confidence interval of the difference of two independent binomial proportions using weighted profile likelihood. Commun Stat Simul Comput. 2008, 37: 645659. 10.1080/03610910701767721.View ArticleGoogle Scholar
 Coe PR, Tamhane AC: Small sample confidence intervals for the difference,ratio and odds ratio of two success probabilities. Commun Stat Simul Comput. 1993, 22: 925938. 10.1080/03610919308813135.View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.