Figure 8 | Genome Biology

Figure 8

From: An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era

A performance comparison of k-nearest neighbors (k-NN) models in predicting microarray and RNA-Seq validation data, based on the TCGA AML data. In the comparison, both microarray log2 intensities and RNA-Seq log2 counts were z-scored per sample. For each of the two binary clinical endpoints and each of the two mapping groups (A and B), a set of 500 k-NN models was developed independently from the microarray training data and from the RNA-Seq training data. Each set of k-NN models was then used to predict both microarray and RNA-Seq validation samples. The average prediction accuracies of the 500 microarray-based models in predicting microarray data were plotted against those in predicting RNA-Seq data (a), with per-sample agreement beyond chance assessed with the Kappa statistic (b). Likewise, the average accuracies of the 500 RNA-Seq-based models in predicting RNA-Seq data were plotted against those in predicting microarray data (c), with per-sample agreement beyond chance assessed with the Kappa statistic (d). The two symbols in each panel represent the two binary clinical endpoints, with green and blue denoting mapping groups A and B, respectively. In panels (b) and (d), each symbol denotes the average Kappa statistic over 500 pairs of prediction results, and each error bar shows the 95% confidence interval (CI) for the mean Kappa statistic. Each CI was estimated by bootstrapping.
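
The three calculations named in the caption (per-sample z-scoring, the Kappa statistic for agreement between two sets of predictions, and a bootstrap CI for a mean Kappa) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the use of the population standard deviation, the percentile bootstrap, and the 2000 resamples are all assumptions, since the caption does not specify them.

```python
import random
import statistics


def zscore_per_sample(log2_values):
    """Scale one sample's log2 values to mean 0 and SD 1.

    Assumption: population SD is used; the caption does not say which.
    """
    mu = statistics.fmean(log2_values)
    sd = statistics.pstdev(log2_values)
    return [(x - mu) / sd for x in log2_values]


def cohen_kappa(pred_a, pred_b):
    """Cohen's kappa: agreement between two label lists beyond chance."""
    n = len(pred_a)
    labels = set(pred_a) | set(pred_b)
    p_obs = sum(a == b for a, b in zip(pred_a, pred_b)) / n
    p_exp = sum((pred_a.count(l) / n) * (pred_b.count(l) / n) for l in labels)
    if p_exp == 1.0:
        return 1.0  # both lists constant and identical (convention assumed here)
    return (p_obs - p_exp) / (1.0 - p_exp)


def bootstrap_ci_mean(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean (assumed variant of the bootstrap)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical predictions of one model for the same four validation samples,
# once from microarray input and once from RNA-Seq input.
on_microarray = [1, 0, 1, 0]
on_rnaseq = [1, 0, 1, 0]
kappa = cohen_kappa(on_microarray, on_rnaseq)
```

In the figure's setting, one Kappa value would be computed for each of the 500 models (its microarray predictions against its RNA-Seq predictions on the same samples), and `bootstrap_ci_mean` would then be applied to that list of 500 values to obtain the error bar.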