Fig. 2 | Genome Biology

Fig. 2

From: SCA: recovering single-cell heterogeneity through information-based dimensionality reduction

Performance of SCA on data simulated with Splatter [25].

(a) Ability of PCA, ICA, and SCA to recover rare cell populations of different sizes and with varying numbers of marker genes (out of 1000 cells and 1000 genes total). A population is considered “recovered” if the downstream Leiden clusters capture it with an F1 score greater than 0.9 (Methods). SCA detects smaller populations with few marker genes.

(b, c) Performance of FiRE [18] and RaceID [15] on a simulated dataset where 3% of the cells are defined by 10 marker genes (out of 1000 cells and 1000 genes total). For easy comparison with (e), FiRE scores and cluster memberships are plotted in the UMAP embedding downstream of the SCA representation.

(d) Performance of SCA on the same dataset with a variety of neighborhood sizes. With a neighborhood size of 20 or fewer, SCA captures the rare population with a very high F1 score after 2 or more iterations. The F1 score decreases when the neighborhood size approaches the size of the rare population, though it remains higher than PCA’s score of 0.153.

(e) Top: UMAP plots downstream of various dimensionality reduction strategies, as well as the PHATE embedding [9]. SCA alone separates the rare population. Bottom: Scatter plots of the first two components of each reduction. The leading surprisal component separates the rare population from the rest.
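
The "recovered" criterion used in this figure can be illustrated with a short sketch. The caption defers the exact definition to the Methods, so the version below is an assumption: it scores each cluster against the rare population and takes the best F1 over clusters. The function names `best_cluster_f1` and `is_recovered`, and the toy labels, are hypothetical and not taken from the SCA code.

```python
import numpy as np

def best_cluster_f1(is_rare, cluster_labels):
    """Best F1 score of any single cluster against the rare population.

    Assumption: the rare population is matched to whichever cluster
    maximizes F1; the paper's Methods may define this differently.
    """
    is_rare = np.asarray(is_rare, dtype=bool)
    cluster_labels = np.asarray(cluster_labels)
    best = 0.0
    for c in np.unique(cluster_labels):
        in_cluster = cluster_labels == c
        tp = np.sum(in_cluster & is_rare)   # rare cells inside this cluster
        if tp == 0:
            continue                        # F1 is 0 for this cluster
        precision = tp / in_cluster.sum()
        recall = tp / is_rare.sum()
        best = max(best, 2 * precision * recall / (precision + recall))
    return float(best)

def is_recovered(is_rare, cluster_labels, threshold=0.9):
    """Apply the caption's threshold: recovered if F1 exceeds 0.9."""
    return best_cluster_f1(is_rare, cluster_labels) > threshold

# Toy data (made up): 1000 cells, of which 30 (3%) are rare.
is_rare = np.zeros(1000, dtype=bool)
is_rare[:30] = True

# Hypothetical clustering: cluster 1 holds 28 rare cells plus 2 others.
labels = np.zeros(1000, dtype=int)
labels[:28] = 1
labels[30:32] = 1

# A clustering that lumps every cell together, for contrast.
one_cluster = np.zeros(1000, dtype=int)

print(best_cluster_f1(is_rare, labels), is_recovered(is_rare, labels))
print(best_cluster_f1(is_rare, one_cluster), is_recovered(is_rare, one_cluster))
```

In the toy clustering, cluster 1 has precision and recall of 28/30 each, so it clears the 0.9 threshold. The single-cluster case has perfect recall but 3% precision and is not counted as recovered.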