Fig. 2
From: Widespread redundancy in -omics profiles of cancer mutation states

A Overall distribution of performance across three gene sets, using gene expression (RNA-seq) data to predict mutations. Each data point represents the mean cross-validated AUPR difference, compared with a baseline model trained on permuted mutation presence/absence labels, for one gene in the given gene set; notches show bootstrapped 95% confidence intervals. “random” = 268 random genes, “most mutated” = 268 most mutated genes, and “cancer gene set” = 268 cancer-related genes from curated gene sets. Significance stars indicate results of Bonferroni-corrected pairwise Wilcoxon tests: **p < 0.01, ***p < 0.001, ns: not statistically significant for a cutoff of p = 0.05.

B–D Volcano-like plots showing mutation presence/absence predictive performance for each gene in each of the three gene sets. The x-axis shows the difference in mean AUPR compared with a baseline model trained on permuted labels, and the y-axis shows p-values for a paired t-test comparing cross-validated AUPR values within folds. Points (genes) marked with an “X” are overlapping between the cancer gene set and either the random or most mutated gene set.
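
The two quantities plotted per gene (the mean AUPR difference against a permuted-label baseline, and a paired comparison of AUPR values within folds) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the step-wise average-precision definition of AUPR, and the per-fold numbers are assumptions for illustration only. The sketch stops at the paired t statistic; converting it to a p-value needs a t-distribution CDF, for example via `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Assumes binary labels (1 = mutated, 0 = not mutated), at least one
    positive label, and no tied scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


def aupr_difference(signal_auprs, shuffled_auprs):
    """Mean AUPR with true labels minus mean AUPR with permuted labels."""
    return mean(signal_auprs) - mean(shuffled_auprs)


def paired_t_statistic(signal_auprs, shuffled_auprs):
    """Paired t statistic over folds (undefined if all differences are equal)."""
    diffs = [s - b for s, b in zip(signal_auprs, shuffled_auprs)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical per-fold AUPR values for one gene (illustrative numbers only).
signal = [0.62, 0.58, 0.65, 0.60]    # classifier trained on true labels
shuffled = [0.31, 0.35, 0.30, 0.33]  # baseline trained on permuted labels

delta = aupr_difference(signal, shuffled)     # x-axis of a volcano-like plot
t_stat = paired_t_statistic(signal, shuffled)  # feeds the y-axis p-value
print(f"AUPR difference: {delta:.3f}, paired t statistic: {t_stat:.2f}")
```

Pairing by fold means each true-label AUPR is compared only with the permuted-label AUPR from the same train/test split, which removes fold-to-fold variation from the comparison.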