Fig. 2 | Genome Biology

Fig. 2

From: PAUSE: principled feature attribution for unsupervised gene expression analysis

PAUSE pathway attributions accurately identify the major sources of variation in single-cell datasets. To verify that our novel pathway attribution method (“PAUSE” in the plots) is capable of identifying the major sources of transcriptomic variation across a variety of datasets, we apply two benchmarks of pathway identification. The first (a), termed our impute benchmark, measures how much the reconstruction error of a biologically constrained autoencoder model increases as important pathways are replaced with an uninformative, imputed baseline. Better methods will increase the error faster, leading to a larger area under the curve (AUC). The second benchmark (b), termed our retrain benchmark, measures how well a model can reconstruct the observed expression when retrained using only the most important pathways. Better methods will decrease the error faster, leading to a smaller AUC. The AUCs for both the impute (c–e) and retrain (f–h) benchmarks are shown for ten separate train/test splits across three separate single-cell gene expression datasets (intestinal cells, peripheral blood monocytes, and Jurkat cells). In each experiment, the PAUSE loss attribution method significantly outperforms other methods. The other methods shown here are logistic regression score (LR), Kullback-Leibler divergence (KLD), random ranking, and latent space variance (LSV). The boxes in c–h mark the quartiles (25th, 50th, and 75th percentiles) of the distribution, while the whiskers extend to show the minimum and maximum of the distribution (excluding outliers).
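
The impute benchmark described in the caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`impute_curve`, `curve_auc`), the toy data, and the identity "decoder" used in place of a biologically constrained autoencoder are all assumptions made for illustration. The sketch imputes pathway-level features one at a time in ranked order, records the reconstruction error after each step, and summarizes the curve by its trapezoidal AUC.

```python
import numpy as np

def impute_curve(error_fn, latent, ranking, baseline):
    """Reconstruction error after imputing the top-k ranked pathways, k = 0..P."""
    z = latent.copy()
    errors = [error_fn(z)]
    for p in ranking:
        z[:, p] = baseline[p]          # replace pathway p with an uninformative value
        errors.append(error_fn(z))
    return np.asarray(errors)

def curve_auc(errors):
    """Trapezoidal area under the error curve, with the x-axis scaled to [0, 1]."""
    x = np.linspace(0.0, 1.0, len(errors))
    return float(np.sum((errors[1:] + errors[:-1]) / 2.0 * np.diff(x)))

# Toy data (assumption): four "pathway" features with decreasing variance.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
baseline = latent.mean(axis=0)         # mean imputation as the uninformative baseline

# Stand-in for a trained decoder (assumption): identity reconstruction.
def error_fn(z):
    return float(np.mean((z - latent) ** 2))

good_ranking = [0, 1, 2, 3]            # most variable pathway first
poor_ranking = [3, 2, 1, 0]            # least variable pathway first

auc_good = curve_auc(impute_curve(error_fn, latent, good_ranking, baseline))
auc_poor = curve_auc(impute_curve(error_fn, latent, poor_ranking, baseline))
print(auc_good > auc_poor)
```

In this toy setting the ranking that removes the most variable feature first raises the error sooner and therefore yields the larger AUC, which matches the caption's reading that a larger impute AUC indicates a better attribution method. The retrain benchmark would need a model refit at each step and is not sketched here.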
