Fig. 1 | Genome Biology

From: Phiclust: a clusterability measure for single-cell transcriptomics reveals phenotypic subpopulations

Fig. 1

Phiclust is a proxy for the theoretically achievable adjusted Rand index (tARI). a Scheme illustrating the rationale behind phiclust. b Singular value distributions of simulated data sets with 5 clusters and different levels of noise (red: low signal-to-noise ratio; green: high signal-to-noise ratio). The Marchenko-Pastur (MP) distribution is indicated by a solid blue line, the Tracy-Widom (TW) threshold by a solid red line, and significant singular values are marked with asterisks. Insets show UMAPs of the data. The data set with the higher signal-to-noise ratio has more significant singular values, and these singular values are larger. c Largest singular value versus phiclust for simulated data. Arrows indicate the positions of the examples from b. The relationship between the largest singular value and phiclust depends only on the dimensions of the expression matrix; simulations with different cell-to-gene ratios are shown in different colors. d Phiclust versus tARI. Red data points: simulated data sets with two clusters, in which the number of differentially expressed (DE) genes was varied while the log fold change between clusters was fixed. Green data points: simulated data sets with two clusters, in which the mean log fold change between clusters was varied while the number of DE genes was fixed. Blue data points: two synthetic clusters created from weighted averages of cells from two clusters in the PBMC data set, with varying cluster weights. The grey dashed line indicates identity. Inset: UMAP of the PBMC data set, with the two clusters used marked by solid red circles. e scRNA-seq of mixtures of RNA extracted from three different cell lines. Each data point is a mixture; for each mixture, the entries of the first two singular vectors are plotted. Colors indicate different ratios of contributions from the three cell lines. f First two singular vectors of the cluster marked by the solid black ellipse in e. The amount of mRNA per mixture [pg] is indicated in color. g Normalized total counts per mixture versus the first singular vector of the cluster shown in f. Linear regression (dashed line) is used to regress out the correlation with total counts. The grey area indicates the standard deviation.
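Panel b calls a singular value significant when it exceeds a Tracy-Widom (TW) threshold placed above the Marchenko-Pastur (MP) noise bulk. The following is a minimal numpy sketch of that idea, assuming the expression matrix has been scaled so that noise entries are i.i.d. with unit variance. The threshold adds the 99% quantile of the TW1 law (about 2.02, from standard tables) times the TW scale to the approximate centre of the largest noise eigenvalue; phiclust's exact threshold and significance level may differ. The helper `simulate` is a hypothetical Gaussian toy model, not the simulation behind the figure.

```python
import numpy as np

# 99% quantile of the Tracy-Widom (TW1, real case) distribution, from standard tables.
TW1_Q99 = 2.0234

def tw_threshold(n_cells: int, n_genes: int, q: float = TW1_Q99) -> float:
    """Approximate singular-value threshold for an n_cells x n_genes matrix of
    i.i.d. unit-variance noise: the largest eigenvalue of X^T X is centred near
    (sqrt(n_cells) + sqrt(n_genes))^2 and fluctuates on the TW scale sigma."""
    a, b = np.sqrt(n_cells), np.sqrt(n_genes)
    mu = (a + b) ** 2                              # centre (MP bulk edge, squared)
    sigma = (a + b) * (1 / a + 1 / b) ** (1 / 3)   # TW scale for that eigenvalue
    return float(np.sqrt(mu + q * sigma))          # back to singular-value units

def simulate(n_clusters, cells_per_cluster, n_genes, effect, rng):
    """Hypothetical toy model: each cluster has a mean profile drawn from
    N(0, effect^2) per gene; every cell adds i.i.d. N(0, 1) noise."""
    means = rng.normal(0.0, effect, size=(n_clusters, n_genes))
    labels = np.repeat(np.arange(n_clusters), cells_per_cluster)
    noise = rng.normal(0.0, 1.0, size=(labels.size, n_genes))
    return means[labels] + noise

def significant_singular_values(x):
    """Singular values of x that exceed the TW-corrected noise edge."""
    s = np.linalg.svd(x, compute_uv=False)
    return s[s > tw_threshold(*x.shape)]

rng = np.random.default_rng(0)
low = simulate(5, 100, 200, effect=0.05, rng=rng)   # low signal-to-noise
high = simulate(5, 100, 200, effect=1.0, rng=rng)   # high signal-to-noise
low_sig = significant_singular_values(low)
high_sig = significant_singular_values(high)
print(len(low_sig), len(high_sig))
```

With 5 well-separated clusters the signal has rank up to 5, so the high signal-to-noise matrix should show about five singular values far above the threshold, while the weak-signal matrix stays inside the noise bulk.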
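Panel c states that the mapping from the largest singular value to phiclust depends only on the dimensions of the expression matrix. One relation with exactly this property comes from the standard spiked model of random matrix theory (a low-rank signal plus i.i.d. unit-variance noise). Above the MP bulk edge, an observed singular value determines both the underlying signal strength and the squared cosine between observed and noise-free singular vectors, with the aspect ratio beta = m / n (m <= n) as the only other input. This pure-Python sketch implements those textbook formulas under that noise assumption. It illustrates the "dimensions only" property and is not the published definition of phiclust; the function names are hypothetical.

```python
import math

def _aspect(n_cells: int, n_genes: int):
    """Return (m, n, beta) with m <= n and beta = m / n."""
    m, n = sorted((n_cells, n_genes))
    return m, n, m / n

def signal_strength(s_obs: float, n_cells: int, n_genes: int):
    """Invert y^2 = (x + 1/x) * (x + beta/x), where y = s_obs / sqrt(n).

    Returns the signal singular value x in the same normalized units, or None
    if s_obs does not exceed the MP bulk edge sqrt(m) + sqrt(n).
    """
    _, n, beta = _aspect(n_cells, n_genes)
    y = s_obs / math.sqrt(n)            # after rescaling, the bulk edge is 1 + sqrt(beta)
    if y <= 1 + math.sqrt(beta):
        return None
    t = y * y - 1 - beta                # equals x^2 + beta / x^2
    # Larger root of x^4 - t * x^2 + beta = 0 (the one with x^2 > sqrt(beta)).
    x2 = (t + math.sqrt(max(t * t - 4 * beta, 0.0))) / 2
    return math.sqrt(x2)

def squared_cosines(s_obs: float, n_cells: int, n_genes: int):
    """Squared cosines between observed and noise-free singular vectors on the
    smaller-dimension side and the larger-dimension side (0 at or below the edge)."""
    _, _, beta = _aspect(n_cells, n_genes)
    x = signal_strength(s_obs, n_cells, n_genes)
    if x is None:
        return 0.0, 0.0
    x2 = x * x
    x4 = x2 * x2
    return (x4 - beta) / (x4 + beta * x2), (x4 - beta) / (x4 + x2)

# The same observed singular value maps to different values for different shapes,
# which is why panel c colors simulations by cell-to-gene ratio.
for shape in [(500, 200), (2000, 200)]:
    print(shape, squared_cosines(60.0, *shape))
```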
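Panel d compares phiclust with the theoretically achievable adjusted Rand index (tARI). As a reference for the scale on which tARI is reported, here is a minimal pure-Python sketch of the standard adjusted Rand index computed from the contingency table of two labelings. How the paper obtains the "theoretically achievable" value is not described in this caption and is not reproduced here; `adjusted_rand_index` is a hypothetical helper name.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Standard adjusted Rand index of two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    # Pairs of cells placed together in both labelings (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each labeling separately (row and column sums).
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(n, 2)
    expected = sum_rows * sum_cols / total if total else 0.0
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every cell in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 4/7, i.e. about 0.571
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: label names do not matter
```

The index is 1 for identical partitions and close to 0 for random agreement, which is why the identity line in panel d is the natural reference.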
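Panel g removes the dependence of the first singular vector on total counts by linear regression. Below is a minimal numpy sketch of that step, assuming an ordinary least-squares fit with an intercept. The helper `regress_out` and the example data are hypothetical, and the exact normalization of total counts used in the paper is not specified in the caption.

```python
import numpy as np

def regress_out(values, covariate):
    """Fit values ~ intercept + slope * covariate by ordinary least squares
    and return (residuals, intercept, slope)."""
    y = np.asarray(values, dtype=float)
    x = np.asarray(covariate, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    (intercept, slope), *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - (intercept + slope * x)
    return residuals, intercept, slope

# Hypothetical example: first-singular-vector entries that partly track total counts.
rng = np.random.default_rng(1)
total_counts = rng.normal(0.0, 1.0, size=50)            # normalized total counts per mixture
v1 = 0.8 * total_counts + rng.normal(0.0, 0.1, size=50)
residuals, intercept, slope = regress_out(v1, total_counts)
print(f"slope={slope:.2f}, residual SD={residuals.std(ddof=2):.3f}")
print(np.corrcoef(residuals, total_counts)[0, 1])       # essentially 0 after the fit
```

Because OLS residuals are orthogonal to both the intercept and the covariate, whatever structure remains in the residuals cannot be explained linearly by total counts.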

Back to article page