Fig. 2 | Genome Biology

Fig. 2

From: GTM-decon: guided-topic modeling of single-cell transcriptomes enables sub-cell-type and disease-subtype deconvolution of bulk transcriptomes

Evaluation of deconvolution performance on real bulk data.

a Evaluation of sample deconvolution. We evaluated the deconvolution performance of GTM-decon using all genes (GTM-ALL), preprocessed genes (GTM-PP), and highly variable genes (GTM-HVG) against five state-of-the-art (SOTA) methods: CIBERSORTx, MUSIC, BSEQ-sc, BISQUE, and BayesPrism. The three immune bulk datasets and the brain dataset were deconvolved using independent references from a similar tissue, whereas the pancreas bulk data were deconvolved using a single-cell reference from the same individuals in a leave-one-out cross-validation (LOOCV) manner. The bulk dataset labeled PBMC-1 corresponds to the SDY67 dataset, PBMC-2 to the S13 cohort, whole blood to the whole blood dataset, and prefrontal cortex to the ROSMAP dataset (Additional file 1: Table S2). For each test bulk sample, Spearman correlation and root mean square error (RMSE) were computed between its ground truth and the cell-type proportions predicted by each method. The box and whiskers in each boxplot indicate the interquartile range (25th to 75th percentiles) and the minimum to maximum, respectively, of the evaluation scores over all samples in a dataset. The boxplot on the left displays the evaluation across cell types per sample, and the boxplot on the right displays the evaluation across samples per cell type.

b Heatmaps comparing the cell-type-specific deconvolution performance of GTM-decon against existing methods on five different real bulk datasets with known ground truth mixing proportions. The cell types are ordered from most to least prevalent in the bulk data (green barplots in the first row indicate the average proportion of each cell type in the bulk data). The middle row shows the Pearson correlation coefficient between the predicted and known cell-type proportions. The lower row shows the inverse RMSE (higher is better, scaled between 0 and 1) per cell type per dataset. The barplots on the right show the average performance over all cell types for each method. For each cell type, Pearson correlation and RMSE were computed between its ground truth and predicted proportions for each dataset by each method.

c Cell-type prediction accuracy on the purified immune bulk RNA-seq samples. The two panels indicate the use of different independent immune references for the deconvolution of two purified bulk immune datasets (Accession Numbers: GSE107011, GSE64655). For each purified bulk sample, the cell type with the highest proportion inferred by each method was used as the predicted cell type. The barplots show the prediction accuracy as the percentage of correctly predicted samples.
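
The evaluation scores named in this caption (RMSE, Pearson and Spearman correlation, and the top-proportion prediction accuracy of panel c) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the tie handling in the rank computation (average ranks), and the toy proportions are assumptions made purely for illustration.

```python
import math

def rmse(truth, pred):
    """Root mean square error between two equal-length proportion vectors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their average rank (an assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def top_cell_type_accuracy(predicted_props, true_labels):
    """Percentage of purified samples whose highest inferred proportion
    belongs to the true cell type (the rule described for panel c)."""
    hits = 0
    for props, label in zip(predicted_props, true_labels):
        predicted = max(props, key=props.get)  # cell type with the largest proportion
        hits += predicted == label
    return 100.0 * hits / len(true_labels)

# Toy example (hypothetical numbers, not taken from the paper).
truth = [0.5, 0.3, 0.2]   # ground-truth proportions of three cell types in one sample
pred = [0.4, 0.4, 0.2]    # proportions inferred by some method
sample_rmse = rmse(truth, pred)
sample_spearman = spearman(truth, pred)

purified = [{"B": 0.7, "T": 0.2, "NK": 0.1}, {"B": 0.5, "T": 0.3, "NK": 0.2}]
accuracy = top_cell_type_accuracy(purified, ["B", "T"])
```

Applying `spearman` and `rmse` to one sample's vector over cell types corresponds to the per-sample evaluation in panel a, while applying `pearson` and `rmse` to one cell type's vector over samples corresponds to the per-cell-type evaluation in panel b. The 0-to-1 scaling of the inverse RMSE is not specified in the caption and is therefore left out of the sketch.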
