Figure 6 | Genome Biology

Figure 6

From: An investigation of biomarkers derived from legacy microarray data for their utility in the RNA-seq era

Performance comparison of k-nearest neighbors (k-NN) models in predicting microarray and RNA-Seq validation samples. The comparison is based on the SEQC NB data set. Both the microarray log2 intensity data and the RNA-Seq log2 counts were z-scored per sample. For each of the six binary clinical endpoints and each of the two mapping groups (A and B), a set of 500 k-NN models was developed independently from the microarray training data and from the RNA-Seq training data. Each set of k-NN models was then used to predict both microarray and RNA-Seq validation samples. (a) Average prediction accuracies of the 500 microarray k-NN models in predicting microarray data, plotted against those in predicting RNA-Seq data. (b) The corresponding per-sample agreement beyond chance, evaluated with the Kappa statistic. (c) Average prediction accuracies of the 500 RNA-Seq k-NN models in predicting RNA-Seq data, compared with those in predicting microarray data. (d) The corresponding per-sample agreement beyond chance, assessed with the Kappa statistic. The six symbols in each panel represent the six binary clinical endpoints; green and blue denote mapping groups A and B, respectively. In panels (b) and (d), each symbol denotes the average Kappa statistic over the 500 pairs of prediction results, and each error bar shows the 95% confidence interval (CI) for the mean Kappa statistic, calculated by bootstrap estimation.
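
The three statistical steps named in the caption (per-sample z-scoring, chance-corrected agreement via the Kappa statistic, and a bootstrap CI for the mean Kappa) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of the population standard deviation, the percentile form of the bootstrap, the resample count, and the toy prediction vectors are all assumptions, since the caption does not specify them.

```python
import random
import statistics


def zscore_per_sample(sample):
    """Standardize one sample's values to mean 0 and SD 1.

    Assumption: population SD (pstdev); the caption does not say which SD was used.
    """
    mean = statistics.fmean(sample)
    sd = statistics.pstdev(sample)
    return [(x - mean) / sd for x in sample]


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two label vectors beyond chance."""
    n = len(labels_a)
    # Fraction of samples on which the two prediction vectors agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each vector's class frequencies.
    classes = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in classes
    )
    if expected == 1.0:
        # Both vectors are constant and identical; kappa is undefined here,
        # and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def bootstrap_ci_mean(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean (assumed variant of the bootstrap)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recording the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example (made-up labels): one model's predictions for the same eight
# validation samples, measured on each of the two platforms.
array_preds = [1, 1, 0, 0, 1, 0, 1, 0]
rnaseq_preds = [1, 1, 0, 0, 1, 0, 0, 1]
# kappa is 0.5 here: observed agreement 0.75, chance agreement 0.5.
kappa = cohen_kappa(array_preds, rnaseq_preds)
```

In the setting the caption describes, `cohen_kappa` would be evaluated once per model (500 values per endpoint and mapping group), and `bootstrap_ci_mean` would then be applied to those 500 values to obtain an error bar of the kind shown in panels (b) and (d).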
