Fig. 6 | Genome Biology

Fig. 6

From: GTM-decon: guided-topic modeling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes

Identification of cell-type-specific differentially expressed genes from bulk RNA-seq data for basal vs. ER+ subtypes. a Top cell-type-specific gene signatures for basal and ER+. GTM-decon was pretrained on a scRNA-seq reference dataset from normal breast tissue to infer the expression distribution of 5 cell types, namely basal, basal myoepithelial, luminal 1–1, luminal 1–2, and luminal 2. The resulting genes-by-cell-type estimates were then used as the initial topic distributions for a second GTM-decon model, which is guided by the basal and ER+ cancer subtypes in modeling the sparsified TCGA-BRCA bulk data. This led to a 10-topic distribution, in which each topic is specifically tailored to one combination of cell type and cancer subtype. The heatmap displays the probabilities of the top 20 genes for each topic. The left half displays the cell-type-specific topic distributions for basal and the right half for ER+. b Predicted differentially expressed (DE) genes for each cell type between basal and ER+. The top DE genes for basal in contrast to ER+ were identified by subtracting the gene topic scores for ER+ from the gene topic scores for basal under the same cell type. The resulting DE scores are shown in the top half of the heatmap. The bottom half displays the DE scores of the top genes for ER+ in contrast to basal. Pairwise Wilcoxon signed-rank tests were performed to compare the gene topic scores across all genes between the two subtypes for the same cell type. All tests yielded p-values lower than 2.2e−16. c Classification of basal and ER+ based on the phenotype probabilities. As a validation of our nested phenotype-cell-type guided approach, we evaluated the classification accuracy on the 160 held-out sparsified breast tumor samples. For each subtype, we summed the cell-type-specific topic probabilities from the bottom heatmap for each sample to obtain the phenotype scores, which are shown in the top heatmap.
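The score arithmetic described for panels b and c can be illustrated with a minimal sketch. This is not the GTM-decon implementation: the gene names, the numbers, the dictionary layout, and the helper names `de_scores`, `phenotype_scores`, and `predict_subtype` are all hypothetical, and assigning each sample to the subtype with the larger summed score is an assumption about how the phenotype scores might be turned into a class label.

```python
# Toy illustration only: gene names and all numbers below are made up.
# topic_scores[subtype][cell_type][gene] holds a gene topic score.
topic_scores = {
    "basal": {
        "basal": {"GENE_A": 0.30, "GENE_B": 0.10},
        "luminal 2": {"GENE_A": 0.05, "GENE_B": 0.20},
    },
    "ER+": {
        "basal": {"GENE_A": 0.10, "GENE_B": 0.15},
        "luminal 2": {"GENE_A": 0.02, "GENE_B": 0.40},
    },
}


def de_scores(scores, cell_type, case="basal", control="ER+"):
    """DE score per gene: case topic score minus control topic score,
    both taken under the same cell type (as described for panel b)."""
    case_scores = scores[case][cell_type]
    control_scores = scores[control][cell_type]
    return {gene: case_scores[gene] - control_scores[gene] for gene in case_scores}


def phenotype_scores(sample_topic_probs):
    """Sum the cell-type-specific topic probabilities of one sample per
    subtype (as described for panel c). Keys are (subtype, cell_type)."""
    totals = {}
    for (subtype, _cell_type), prob in sample_topic_probs.items():
        totals[subtype] = totals.get(subtype, 0.0) + prob
    return totals


def predict_subtype(sample_topic_probs):
    """Assumed decision rule: pick the subtype with the largest summed score."""
    totals = phenotype_scores(sample_topic_probs)
    return max(totals, key=totals.get)


# One hypothetical held-out sample with four (subtype, cell type) topics.
sample = {
    ("basal", "basal"): 0.4,
    ("basal", "luminal 2"): 0.2,
    ("ER+", "basal"): 0.1,
    ("ER+", "luminal 2"): 0.3,
}

if __name__ == "__main__":
    print(de_scores(topic_scores, "basal"))
    print(phenotype_scores(sample))
    print(predict_subtype(sample))
```

Swapping `case` and `control` gives the ER+ versus basal contrast shown in the bottom half of the panel b heatmap.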
d Comparison of DE genes detected by our approach and by DESeq2. DESeq2 was applied to the bulk RNA-seq gene expression data to compare gene expression between ER+ and basal samples. In total, 6815 DE genes were deemed significant at adjusted p-value < 0.05 (Wald test), with 2952 upregulated and 3863 downregulated genes in ER+ relative to basal. The grey bar and the heatmap on the left display the −log adjusted p-value for all of the upregulated genes (top half) and the downregulated genes (bottom half). Within each half, genes are ordered by decreasing absolute test statistic. The corresponding log2 fold-change of ER+ over basal is also shown as a heatmap. The heatmap on the right displays the change in gene topic score from the basal subtype to the ER+ subtype. e Cell-type-specific DE genes identified by the nested-guided topic approach. The top and bottom parts of the heatmap display the topic scores for the upregulated and downregulated genes in basal relative to ER+, respectively (p-value < 0.05; permutation test). Genes that were also detected by DESeq2 are labeled in the color bar. f Over-representation analysis (ORA) was applied to the differential topic scores of upregulated and downregulated genes in ER+ relative to basal. MSigDB HALLMARK pathway gene sets were used in ORA. The −log p-values for the significant pathways are shown in the bar plot.
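For readers unfamiliar with ORA, the sketch below shows one common formulation, a one-sided hypergeometric test on the overlap between a selected gene list and a pathway gene set. The caption does not state which test or logarithm base was used, so both the hypergeometric tail and the base-10 logarithm are assumptions here, and the function name `ora_pvalue` and the counts in the usage example are hypothetical.

```python
from math import comb, log10


def ora_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_universe: number of genes considered in total
    n_pathway:  number of those genes that belong to the pathway
    n_selected: number of genes in the selected (e.g. upregulated) list
    n_overlap:  number of selected genes that belong to the pathway
    """
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_selected)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the figure.
    p = ora_pvalue(n_universe=20000, n_pathway=200, n_selected=500, n_overlap=15)
    print(p, -log10(p))
```

In practice the resulting p-values would usually be corrected for testing many pathways before being plotted; that step is omitted from this sketch.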
