Fig. 6 | Genome Biology

Fig. 6

From: Identification of cell type-specific methylation signals in bulk whole genome bisulfite sequencing data

CluBCpG read clusters improve prediction of gene expression. a Receiver operating characteristic (ROC) curves of a random forest model trained on promoter average methylation alone (green line), promoter average methylation plus cluster information (purple line), promoter average methylation plus cluster information on the subset of gene promoters containing a major cluster (orange line), and promoter average methylation plus cluster information with permuted class labels (gray line). Shading represents the 95% confidence interval across 100 random train-test splits. b–d Box-and-whisker plots overlaid with individual points showing the area under the ROC curve (AUC) for each train-test split. Whiskers extend to 1.5× the interquartile range. c AUC results from a 10-fold nested cross-validation strategy used to check that the models were not overfitting. d Downsampled data: the full B cell vs. monocyte dataset randomly reduced to 9× genome-wide coverage. Statistical test: two-tailed t test
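
For readers unfamiliar with the quantities plotted here, the following is a minimal illustrative sketch, not the authors' pipeline. It uses synthetic scores in place of random forest predictions (an assumption made to keep the example dependency-free) and shows three things the caption refers to: a rank-based AUC, a label-permutation baseline that should sit near 0.5, and whiskers drawn at 1.5× the interquartile range. The function names `roc_auc` and `tukey_whiskers` and all data are hypothetical.

```python
import random
import statistics


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tukey_whiskers(values):
    """Return the most extreme data points lying within 1.5x the
    interquartile range of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


rng = random.Random(0)

# Synthetic stand-in for classifier output: class label plus Gaussian noise.
labels = [rng.randint(0, 1) for _ in range(200)]
scores = [y + rng.gauss(0.0, 1.0) for y in labels]

real_aucs, permuted_aucs = [], []
for _ in range(100):
    # Random held-out subset, loosely analogous to one train-test split
    # (no model is trained in this sketch).
    idx = rng.sample(range(len(labels)), 60)
    y = [labels[i] for i in idx]
    s = [scores[i] for i in idx]
    real_aucs.append(roc_auc(y, s))

    # Permuting the labels breaks the label-score link, giving a chance baseline.
    y_perm = y[:]
    rng.shuffle(y_perm)
    permuted_aucs.append(roc_auc(y_perm, s))

print("mean AUC, true labels:    ", round(statistics.mean(real_aucs), 3))
print("mean AUC, permuted labels:", round(statistics.mean(permuted_aucs), 3))
print("whiskers, true labels:    ", tukey_whiskers(real_aucs))
```

In the figure itself the permuted-label control is applied to the class labels used by the model, whereas this sketch only permutes labels at evaluation time; the expected outcome (AUC near 0.5) is the same in spirit.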
