Fig. 2 | Genome Biology

Fig. 2

From: Comparison of whole-genome bisulfite sequencing library preparation strategies identifies sources of biases affecting DNA methylation data

Effect of PCR and polymerase bias on sequence coverage and methylation estimates. a Coverage of dinucleotides in WGBS datasets generated with four pre-bisulfite (left panel) and three post-bisulfite (right panel) library preparation protocols. Averaged coverage values per method are expressed as the log2 difference from the genomic expected value and compared to a non-BS-treated control. For clarity, the dinucleotides are underlined as derived from C, G or A/T only. Statistical analysis of overall dinucleotide (base) coverage was performed on the average absolute deviations from control with one-sample two-tailed t-tests followed by Bonferroni correction. Individual libraries/sequencing runs and studies per method are presented in Additional file 2: Figure S3a; details on study, laboratory and species can be found in Table 2 and Additional file 1. b CG dinucleotide coverage over expected for each individual library per method (left panel) and unbiased CA dinucleotide coverage for comparison (right panel). The main method groups are additionally split into Heat and Alkaline subgroups for KAPA and ampPBAT, as also shown in Additional file 2: Figure S3a. The number of libraries per method is presented in the right panel. Pfu Cx in brackets next to the main Alkaline and Heat methods denotes the Pfu Turbo Cx polymerase with which those libraries were generated (all method details are provided in Table 2). Statistical analysis of CG coverage was performed with one-way ANOVA with Bonferroni correction. c Dependence of read coverage on G/C composition in 100-bp tiles. Cytosine content of reads was calculated from the corresponding genomic sequence and not the actual read sequence, where unmethylated cytosines appear as thymines. The tile distribution per G/C content is plotted in the background along the x-axis for reference. Am-BS was omitted from this analysis because same-species datasets were unavailable.
d Global methylation levels of mESCs as measured by LC-MS and a panel of WGBS datasets. The LC-MS value is an average for J1 and E14 lines from different passages and studies to account for lineage and tissue-culture variability. The difference between the LC-MS value and each WGBS measurement is marked as a methylation ‘artefact’ within each WGBS method. Significance was calculated by one-way ANOVA with Dunnett’s multiple comparisons test on the absolute WGBS values (‘genomic’ + ‘artefact’) against the LC-MS value. **p < 0.01, ***p < 0.001, ****p < 0.0001; n.s. not significant; error bars represent standard deviation.
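
The quantities named in panels a and d can be illustrated with a minimal sketch. It assumes that "log2 difference from the genomic expected" means the log2 ratio of a dinucleotide's share of sequenced coverage to its share of the reference genome, and that the "artefact" in panel d is the WGBS global methylation value minus the LC-MS value. These definitions, the function names and the input dictionaries are illustrative assumptions and are not taken from the authors' pipeline.

```python
import math


def log2_over_expected(observed_counts, genomic_counts):
    """Log2 ratio of observed to expected dinucleotide share.

    Both arguments map a dinucleotide (e.g. "CG") to a count.
    Assumes every count is non-zero (hypothetical simplification).
    """
    obs_total = sum(observed_counts.values())
    gen_total = sum(genomic_counts.values())
    return {
        dinuc: math.log2(
            (observed_counts[dinuc] / obs_total)
            / (genomic_counts[dinuc] / gen_total)
        )
        for dinuc in genomic_counts
    }


def mean_abs_deviation_from_control(sample_log2, control_log2):
    """Average absolute deviation of a sample from the non-BS control."""
    deviations = [abs(sample_log2[d] - control_log2[d]) for d in sample_log2]
    return sum(deviations) / len(deviations)


def methylation_artefact(wgbs_percent, lcms_percent):
    """Assumed 'artefact': WGBS global methylation minus the LC-MS value."""
    return wgbs_percent - lcms_percent
```

A per-method set of mean absolute deviations computed this way could then be passed to a one-sample t-test, with the resulting p-values multiplied by the number of comparisons for a Bonferroni correction.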
