Accuracy as a function of training set size. Percentage of correct exons (F score) is shown on the y-axis and training set size in thousands is shown on the x-axis. Data points (N = 121) are shown in blue; the best-fit function of the form y = a/(1 + b·e^(-cx + d)) is shown in red, with a = 69.01, b = 0.0152, c = 0.0012, d = 2.09. The curve is effectively flat for values of x above 6,000 (not shown). The curves for nucleotide-level and gene-level accuracies, and for the second test set, have very similar shapes. F = 2 × Sn × Sp/(Sn + Sp).
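As a rough illustration of the two formulas in this caption, the sketch below evaluates the fitted curve and the F score in Python. The parameter values are those quoted above; the function names `fitted_accuracy` and `f_score` are hypothetical, x is taken in the same units as the plotted axis, and Sn and Sp are assumed to be sensitivity and specificity expressed in the same units (both fractions or both percentages).

```python
import math

# Fit parameters as quoted in the caption.
A, B, C, D = 69.01, 0.0152, 0.0012, 2.09


def fitted_accuracy(x: float) -> float:
    """Evaluate y = a / (1 + b * exp(-c*x + d)) with the caption's parameters."""
    return A / (1.0 + B * math.exp(-C * x + D))


def f_score(sn: float, sp: float) -> float:
    """F = 2 * Sn * Sp / (Sn + Sp); returns 0.0 when both inputs are zero."""
    if sn + sp == 0:
        return 0.0
    return 2.0 * sn * sp / (sn + sp)


if __name__ == "__main__":
    # The curve rises toward its asymptote a as x grows.
    for x in (0, 1000, 3000, 6000):
        print(f"x = {x:>5}: y = {fitted_accuracy(x):.2f}")
    print(f"F(0.8, 0.6) = {f_score(0.8, 0.6):.4f}")
```

With these parameters the exponential term becomes negligible as x grows, so y approaches a = 69.01, which is consistent with the statement that the curve is effectively flat above x = 6,000.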