Fig. 7 | Genome Biology

Fig. 7

From: Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations

Using BioBombe compressed features as input in supervised machine learning tasks. Predicting (a) cancer type status and (b) gene mutation status for select cancer types and important cancer genes using five compression algorithms and two ensemble models in TCGA data. The area under the precision-recall curve (AUPR) for cross-validation (CV) data partitions is shown. The blue lines represent predictions made with permuted data input into each compression algorithm. The dotted lines represent AUPR on untransformed RNA-seq data. The dotted gray line represents a hypothetical random guess. (c) Tracking the average change in AUPR between real and permuted data across latent dimensionalities and compression models in predicting (top) cancer types and (bottom) mutation status. The average includes the five cancer types and mutations tracked in panels a and b. (d) Tracking the sparsity and performance of supervised models using BioBombe compressed features in real and permuted data. (e) Performance metrics for the all-compression-feature ensemble model predicting TP53 alterations. (left) Receiver operating characteristic (ROC) and (right) precision-recall curves are shown. (f) The average absolute value weight per algorithm for the all-compression-feature ensemble model predicting TP53 alterations. The adjusted scores are obtained by dividing by the number of latent dimensionalities in the given model.
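The comparison in panel c (the change in AUPR between real and permuted data) can be illustrated with a minimal sketch. It estimates AUPR with the average-precision formula, which is one common estimator and not necessarily the one used for the figure. The labels and scores below are invented toy values, and the names `average_precision`, `real_scores`, and `permuted_scores` are hypothetical, not taken from the BioBombe code. The random-guess baseline is taken to be the fraction of positive samples, which is the expected precision of an uninformative classifier.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate AUPR as average precision over the ranked samples.

    Ties in `scores` are broken by input order, which may differ from
    how a library implementation handles tied scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at each newly recalled positive
    return total / n_pos


# Toy data (assumed for illustration only): 1 = sample carries the alteration.
y = [1, 0, 1, 1, 0, 0]
real_scores = [0.9, 0.2, 0.8, 0.4, 0.5, 0.1]      # model fit on real features
permuted_scores = [0.3, 0.6, 0.2, 0.7, 0.9, 0.1]  # model fit on permuted features

aupr_real = average_precision(y, real_scores)
aupr_permuted = average_precision(y, permuted_scores)
delta_aupr = aupr_real - aupr_permuted  # quantity of the kind tracked in panel c
random_baseline = sum(y) / len(y)       # expected AUPR of a random guess

print(f"real={aupr_real:.3f} permuted={aupr_permuted:.3f} "
      f"delta={delta_aupr:.3f} baseline={random_baseline:.3f}")
```

A positive `delta_aupr` indicates that the compressed features carry signal beyond what the permuted input provides. In the figure this difference is averaged over the five cancer types or mutations for each latent dimensionality and algorithm.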
