Fig. 3 | Genome Biology

Fig. 3

From: Identification of cell type-specific methylation signals in bulk whole genome bisulfite sequencing data

Precise Read-level Imputation of Methylation (PReLIM) imputes missing methylation values at the read level. a Conceptual illustration of PReLIM. During training, PReLIM learns associations of CpG methylation patterns within and among millions of reads from a given dataset. PReLIM then uses this knowledge to impute missing CpG values for all reads overlapping each 100-bp bin, enabling the generation of complete matrices that can be used by CluBCpG. b PReLIM expands each individual CpG site to a 1D vector that contains all the information for that CpG site in the context of all other reads in that bin. Read encodings are the relative proportions of each possible type of methylation pattern found in the matrix. c Receiver operating characteristic plot showing PReLIM’s performance on the 20% of mouse neuron data held out during training. d Corresponding precision-recall plot. e Trade-off plot illustrating associations between prediction confidence, prediction accuracy, and proportion of imputations achieved. Dotted lines show that, for this dataset, considering only predictions with confidence > 0.6 enables 90% of missing values to be imputed at 95% accuracy. f Line plots (scale on left axis) show that imputation by PReLIM enables substantial gains in the proportion of genomic bins meeting CluBCpG coverage requirements on the ENCODE B cell data. Bar plot (scale on right axis) shows estimated coverage level of WGBS libraries currently deposited in the NCBI SRA; libraries with less than 5X coverage are not shown. For the majority of these datasets, PReLIM can increase coverage by 50–100%.
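
The trade-off in panel e can be illustrated with a minimal sketch. It is not PReLIM's own code: the function name `confidence_tradeoff`, the toy values, and the definition of confidence as 2 × |p − 0.5| for a predicted methylation probability p are all assumptions made for illustration. For each confidence threshold, the sketch reports the fraction of held-out values that would be imputed and the accuracy of those imputations.

```python
def confidence_tradeoff(probs, labels, thresholds):
    """Return (threshold, fraction_imputed, accuracy) for each threshold.

    probs  -- predicted probabilities that a CpG is methylated (0..1)
    labels -- true held-out states (1 = methylated, 0 = unmethylated)
    Confidence is ASSUMED here to be 2 * |p - 0.5|; the source does not
    define it, so treat this as an illustrative choice.
    """
    rows = []
    for t in thresholds:
        # Keep only predictions whose confidence exceeds the threshold.
        kept = [(p, y) for p, y in zip(probs, labels) if 2 * abs(p - 0.5) > t]
        fraction = len(kept) / len(probs) if probs else 0.0
        if kept:
            # A prediction is correct when its rounded call matches the label.
            correct = sum((p >= 0.5) == bool(y) for p, y in kept)
            accuracy = correct / len(kept)
        else:
            accuracy = None  # nothing imputed at this threshold
        rows.append((t, fraction, accuracy))
    return rows


# Hypothetical toy data, not taken from the paper.
toy_probs = [0.95, 0.9, 0.1, 0.05, 0.6, 0.45, 0.85, 0.25]
toy_labels = [1, 1, 0, 0, 0, 1, 1, 0]

for threshold, fraction, accuracy in confidence_tradeoff(
    toy_probs, toy_labels, [0.0, 0.6]
):
    print(threshold, fraction, accuracy)
```

Raising the threshold discards low-confidence predictions, so the imputed fraction falls while accuracy among the retained predictions tends to rise, which is the relationship the dotted lines in panel e mark at confidence > 0.6.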
