
Figure 1

From: An inventory of yeast proteins associated with nucleolar and ribosomal components

Estimation of prediction accuracy. The accuracy of predictions was estimated from 1,000 runs of 10-fold cross-validation using 1,000 alternative training sets (see Materials and methods). The threshold/working point used for the final predictions of new nucleolar proteins is marked in each plot. (a) The sensitivity (SE = TP/(TP + FN)) of our classifier is plotted against different thresholds of classifier scores (log posterior odds ratios) applied to each cross-validation run. The log posterior odds ratios indicate how likely it is, under the naïve Bayesian model, that a protein is an NRCA protein (positive scores) rather than not an NRCA protein (negative scores). Each point on the line and its error bar stem from the average sensitivity and its standard deviation obtained over the 1,000 cross-validation runs at a distinct classification score threshold. Confidence intervals are ± 2 standard deviations around the mean. Note that at the threshold finally used for prediction (0.4) we expect to reach a sensitivity of 50.4%; this means that we have probably missed about as many NRCA proteins as we have predicted (62). (b) The specificity (SP = TN/(TN + FP)) of our classifier is plotted against the same thresholds of classifier scores (log posterior odds ratios) applied to the results of each of the 1,000 cross-validation runs. Confidence intervals are ± 2 standard deviations around the mean. Note that at the finally used threshold of 0.4 the specificity reaches 0.986, meaning that we expect only 1.4% false positives among our predictions. (c) The ROC curve of our classifier is plotted as sensitivity versus (1 - specificity). Each individual data point reflects the predictions of a single cross-validation run at a single prediction threshold. The central line is based on the averaged SE/SP values for each threshold applied. The ROC curve gives an impression of the quality of a classifier and is a general indicator of classification performance: the larger the area under the curve (AUC), the better the classifier. We obtained an AUC value of 0.98, which indicates a classification of high quality. The ROC curve was also the basis for selecting our final classifier threshold, as it illustrates the trade-off between sensitivity and specificity. We chose to be very conservative (high specificity) at the cost of missing some true NRCA proteins (lower sensitivity).
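The quantities in this legend can be recomputed from cross-validation output alone: for each candidate score threshold, count true/false positives and negatives, average sensitivity and specificity over the runs, and integrate the averaged ROC curve to obtain the AUC. The sketch below illustrates that bookkeeping; it is not the authors' pipeline, and the input arrays `scores` (log posterior odds ratios per protein and run) and `labels` (1 = NRCA protein, 0 = non-NRCA) are hypothetical placeholders.

import numpy as np

def se_sp_over_thresholds(scores, labels, thresholds):
    """Mean sensitivity/specificity (with SD) across cross-validation runs.

    scores     : (n_runs, n_proteins) array of log posterior odds ratios
    labels     : (n_proteins,) array, 1 = NRCA protein, 0 = non-NRCA protein
    thresholds : 1D array of candidate classification score cut-offs
    """
    labels = labels.astype(bool)
    se_runs, sp_runs = [], []
    for run_scores in scores:                  # one cross-validation run at a time
        se, sp = [], []
        for t in thresholds:
            pred = run_scores >= t             # call a protein NRCA if its score exceeds the cut-off
            tp = np.sum(pred & labels)
            fn = np.sum(~pred & labels)
            tn = np.sum(~pred & ~labels)
            fp = np.sum(pred & ~labels)
            se.append(tp / (tp + fn))          # SE = TP / (TP + FN)
            sp.append(tn / (tn + fp))          # SP = TN / (TN + FP)
        se_runs.append(se)
        sp_runs.append(sp)
    se_runs, sp_runs = np.array(se_runs), np.array(sp_runs)
    return (se_runs.mean(axis=0), se_runs.std(axis=0),
            sp_runs.mean(axis=0), sp_runs.std(axis=0))

def auc_from_curve(se_mean, sp_mean):
    """Area under the averaged ROC curve (sensitivity vs. 1 - specificity),
    integrated with the trapezoidal rule."""
    fpr = 1.0 - sp_mean
    order = np.argsort(fpr)
    return np.trapz(se_mean[order], fpr[order])

At the working point of 0.4 such a computation would return the values quoted in the legend (SE ≈ 0.504, SP ≈ 0.986), and the ± 2 SD error bars in panels (a) and (b) correspond to twice the returned standard deviations.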
