Fig. 2 | Genome Biology

Fig. 2

From: A systematic evaluation of single-cell RNA-sequencing imputation methods

Similarity between bulk and imputed single-cell expression data in cell lines. a For the H1975 cell line, a scatter plot of the scran-normalized [37], log2-transformed scRNA-seq profiles (N = 440 cells) averaged across all cells (“pseudobulk”) against the bulk RNA-seq profile, shown with the Spearman’s correlation coefficient (SCC). b For each cell, we also calculated the SCC between the cell’s imputed profile (e.g., using scVI) and the bulk RNA-seq profile. c Distribution of correlations between bulk profiles and single-cell profiles (imputed or not imputed, i.e., no_imp) across all cells in the H1975 cell line dataset. The red dotted line represents the estimated SCC (\(\hat{\rho}\) = 0.61) shown in a. Here, we use the pseudobulk as a reference for an upper bound in performance. While the correlation between bulk and pseudobulk is higher than between bulk and an imputed cell’s profile, the imputed profiles are still useful because the pseudobulk ignores cell-to-cell variability. The methods are listed in the same order as in d for comparison. d A heatmap of the median correlation for each imputation method and each dataset across two experimental platforms (five datasets from the 10x Genomics platform and five from the Fluidigm platform, with the number of cells in each dataset in parentheses). The rows are sorted by first averaging the median correlations across datasets within each platform and then averaging across platforms. Asterisks denote methods with a significant platform difference, defined as a Benjamini-Hochberg adjusted p value (i.e., FDR) < 0.05 from a two-sample t-test of whether the SCCs have equal means between the two platforms (10x and Fluidigm), together with a relative performance change \(|\overline{SCC}_{10x}-\overline{SCC}_{fluidigm}| / \max(\overline{SCC}_{10x},\overline{SCC}_{fluidigm})\) greater than 25%.
Filled circles (brown: Fluidigm, green: 10x) indicate methods for which the imputation performance (values in a row) and the number of cells in a dataset show high correlation (Spearman correlation ≥ 0.6). e–h Similar to a–d except that, for any two cell types, the SCC is calculated by comparing the difference (log fold change or LFC) between two bulk cell type profiles with that between two scRNA-seq cell type profiles. The average number of cells across two cell lines is shown in parentheses. The minimum cell number in the cell type pair is used for computing the cell number-performance correlation.
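
The quantities named in the caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`spearman`, `pseudobulk`, `relative_change`, `bh_adjust`, `platform_difference`) and the toy expression values are hypothetical, the code uses only the Python standard library, and the two-sample t-test p values are assumed to have been computed elsewhere.

```python
from statistics import mean, median

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's correlation coefficient: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def pseudobulk(cells):
    """Gene-wise mean across cells; `cells` is a list of per-cell profiles."""
    return [mean(gene_values) for gene_values in zip(*cells)]

def relative_change(scc_10x, scc_fluidigm):
    """|mean_10x - mean_fluidigm| / max(mean_10x, mean_fluidigm)."""
    m_10x, m_fl = mean(scc_10x), mean(scc_fluidigm)
    return abs(m_10x - m_fl) / max(m_10x, m_fl)

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = n - pos  # 1-based rank of this p value in ascending order
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def platform_difference(adj_p, scc_10x, scc_fluidigm, alpha=0.05, min_change=0.25):
    """Asterisk criterion: adjusted p < alpha and relative change > min_change."""
    return adj_p < alpha and relative_change(scc_10x, scc_fluidigm) > min_change

# Toy data (hypothetical): one bulk profile and three cells over five genes.
bulk = [5.0, 3.0, 0.5, 2.0, 4.0]
cells = [
    [4.0, 2.5, 0.0, 1.0, 3.0],
    [3.0, 0.0, 0.0, 2.0, 4.5],
    [5.5, 3.5, 1.0, 0.0, 2.0],
]

scc_pseudobulk = spearman(pseudobulk(cells), bulk)   # panel a: one value
per_cell_scc = [spearman(c, bulk) for c in cells]    # panels b, c: one per cell
median_scc = median(per_cell_scc)                    # panel d: one heatmap entry
```

In this sketch the pseudobulk correlation is a single number per dataset, whereas the per-cell correlations form a distribution whose median would fill one heatmap cell. The thresholds 0.05 and 0.25 are taken from the caption.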
