Fig. 1 | Genome Biology

Fig. 1

From: EPISCORE: cell type deconvolution of bulk tissue DNA methylomes from single-cell RNA-Seq data

EPISCORE concept and workflow. a The first step in EPISCORE is the construction of a tissue-specific mRNA expression reference matrix \( E^{(R)}_{gk} \), which is derived from a corresponding scRNA-Seq tissue atlas. The expression reference matrix is defined over a set of marker genes that are differentially expressed between the cell types in that atlas. b Using completely independent (matched) bulk RNA-Seq and DNAm data of purified samples from Epigenomics Roadmap and SCM2, we identify genes for which differential DNAm at their regulatory elements (e.g., promoter) across the samples is predicted by the corresponding gene expression. For these predictive genes, we learn a probabilistic Bayesian model, denoted \( M(E_g) \), using logistic regression fits if necessary, that allows prediction of likely DNAm values from gene expression. c Using the model learned in b and the expression values from the reference matrix constructed in a, we impute a corresponding tissue-specific DNAm reference matrix \( M^{(R)}_{gk} \), weighting the marker genes (\( w_g \)) according to how well the imputed DNAm values reflect gene expression. d Using the imputed DNAm reference matrix, we can now estimate proportions for the corresponding cell types (encoded as a vector \( \overrightarrow{\mathrm{f}} \) with K elements, one for each cell type) in a bulk DNAm profile \( \overrightarrow{\mathrm{x}} \) (encoded as a vector over the CpGs/genes in the DNAm reference matrix) representing the given tissue type, be it healthy or diseased. The estimation proceeds via weighted multivariate robust linear least squares, which minimizes the objective function shown. e With these cell type fraction estimates, it is then possible to generate genome-wide maps of cell type-specific differential DNAm changes at single-CpG resolution, informing us which CpGs are hyper- or hypomethylated in any given cell type in relation to some phenotype of interest.
In the equation, \( {\overrightarrow{\mathrm{x}}}_{\mathrm{c}} \) denotes the DNA methylation profile of a CpG c across the samples, \( {\overrightarrow{\hat{\mathrm{f}}}}_{\mathrm{k}} \) is the estimated cell type fraction for cell type k across the samples, and \( \overrightarrow{\mathrm{y}} \) denotes the phenotype label (e.g., normal/cancer) of the samples.
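The estimation step in panel d can be illustrated with a minimal sketch. It is not the EPISCORE implementation: the function name `estimate_fractions`, the use of Huber weights with a MAD-based residual scale, and the final clipping and renormalisation of the fractions are all assumptions made for illustration. Only the general idea of a weighted, robust linear regression of the bulk profile on the DNAm reference matrix comes from the caption.

```python
import numpy as np


def estimate_fractions(x, M, w, n_iter=50, huber_k=1.345, tol=1e-8):
    """Hypothetical sketch: weighted robust regression of a bulk profile x
    (length G) on a reference matrix M (G x K) with per-gene weights w (length G).

    Robustness comes from iteratively reweighted least squares with Huber
    weights (an assumption; the caption only says "robust").
    """
    x = np.asarray(x, dtype=float)
    M = np.asarray(M, dtype=float)
    w = np.asarray(w, dtype=float)

    r = np.ones_like(x)          # robustness weights; first pass is plain weighted LS
    f = np.zeros(M.shape[1])
    for _ in range(n_iter):
        sw = np.sqrt(w * r)      # combine gene weights and robustness weights
        f_new, *_ = np.linalg.lstsq(M * sw[:, None], x * sw, rcond=None)

        resid = x - M @ f_new
        scale = np.median(np.abs(resid)) / 0.6745 + 1e-12   # MAD-based scale
        u = np.abs(resid) / scale
        r = np.where(u <= huber_k, 1.0, huber_k / np.maximum(u, 1e-12))

        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break

    # Assumed post-processing: force non-negative fractions that sum to one.
    f = np.clip(f, 0.0, None)
    total = f.sum()
    return f / total if total > 0 else f


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M_ref = rng.uniform(0, 1, size=(100, 3))        # toy reference, 3 cell types
    bulk = M_ref @ np.array([0.6, 0.3, 0.1]) + rng.normal(0, 0.01, 100)
    print(estimate_fractions(bulk, M_ref, np.ones(100)))
```

Setting all entries of `w` to one reduces the sketch to an unweighted robust fit. Non-uniform weights let genes whose imputed DNAm values track expression more closely contribute more to the estimate, which is the role the caption assigns to \( w_g \).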
