Fig. 3 | Genome Biology

Fig. 3

From: Widespread redundancy in -omics profiles of cancer mutation states

A Count of overlapping samples between the gene expression, 27K methylation, 450K methylation, and somatic mutation data from TCGA. Only non-zero overlap counts are shown. Somatic mutation sample information is included because it is needed to generate the mutation presence/absence labels. B Predictive performance for genes in the cancer-related gene set, using each of the three data types as predictors. The gene expression predictor uses the top 8000 gene features by mean absolute deviation, and the methylation predictors use the top 5000 principal components as predictive features. Significance stars indicate results of Bonferroni-corrected pairwise Wilcoxon tests: **p < 0.01, ***p < 0.001, ns: not statistically significant for a cutoff of p = 0.05. C Predictive performance for genes where at least one of the considered data types predicts mutation labels significantly better than the permuted baseline. D–F Predictive performance for each gene in the cancer-related gene set, for each data type, compared with a baseline model trained on permuted labels. G, H Direct comparison of performance using gene expression and each methylation dataset, for genes that perform significantly better than the baseline for both data types. Points (genes) to the left of x = 0 perform better using gene expression-derived features, and points to the right perform better using methylation-derived features. I Pan-cancer survival prediction performance, quantified using the c-index on the y-axis, for gene expression, 27K methylation, and 450K methylation. The x-axis shows results with varying numbers of principal components included for each data type. Models also included covariates for patient age, sample mutation burden, and sample cancer type; the gray dotted line indicates mean performance for a covariate-only baseline model.
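As an illustration of the feature selection mentioned for panel B (top gene features by mean absolute deviation), the following is a minimal sketch, not the authors' code. The function name `top_k_by_mad`, the samples-by-features layout of `X`, and the toy matrix are assumptions made for this example; the caption does not specify how ties or preprocessing were handled.

```python
import numpy as np


def top_k_by_mad(X, k):
    """Return sorted column indices of the k features with the largest
    mean absolute deviation (hypothetical helper; X is samples x features)."""
    X = np.asarray(X, dtype=float)
    # Mean absolute deviation of each feature around its own mean.
    mad = np.mean(np.abs(X - X.mean(axis=0)), axis=0)
    # Sort features by MAD (ascending), then keep the k largest.
    order = np.argsort(mad, kind="stable")
    selected = order[-k:] if k > 0 else order[:0]
    return np.sort(selected)


# Toy example: 3 samples x 3 features; feature 0 is constant.
X_toy = np.array([[0.0, 0.0, 0.0],
                  [0.0, 1.0, 10.0],
                  [0.0, 2.0, 20.0]])
kept = top_k_by_mad(X_toy, k=2)
X_reduced = X_toy[:, kept]  # expression matrix restricted to the kept features
```

In the setting the caption describes, `k` would be 8000 and `X` the gene expression matrix; the constant feature in the toy example is dropped because its deviation is zero.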
