
Table 1 Prediction accuracies for vertical and horizontal components

From: Vertebrate gene finding from multiple-species alignments using a two-level strategy

 

| | Acceptors | Donors | Starts | Stops |
|---|---|---|---|---|
| Train set size | 204,021 | 221,421 | 7,571 | 25,071 |
| Eval set size | 52,605 | 57,179 | 1,805 | 6,162 |
| % True sites | 14.05 | 13.01 | 16.68 | 8.08 |
| F scores (%) | | | | |
| Presence | 52.72 | 48.77 | 39.70 | 34.64 |
| Vertical | 82.01 | 81.00 | 55.70 | 49.25 |
| Horizontal | 84.36 | 84.43 | 57.01 | 48.22 |
| Both | 84.86 | 84.60 | 58.22 | 49.60 |
| ENCODE Cl | 63.18 | 65.86 | 27.44 | 14.67 |
| ENCODE GF | 80.23 | 81.38 | 42.47 | 50.49 |
| 100-ROC (%) | | | | |
| Presence | 12.41 | 12.66 | 20.62 | 23.98 |
| Vertical | 2.46 | 2.52 | 14.49 | 12.76 |
| Horizontal | 1.81 | 1.58 | 12.48 | 11.77 |
| Both | 1.74 | 1.54 | 10.41 | 10.90 |
| ENCODE Cl | 0.99 | 0.61 | 9.14 | 10.49 |

  1. The table shows the F score (the geometric mean of sensitivity and specificity, which are close to each other) for the various classifier components. The test set for the presence, vertical, horizontal and 'both' conditions is 'challenging' data; the results shown are for a mixture of the classifiers trained on challenging data and on randomly selected data. The 'ENCODE Cl' and 'ENCODE GF' rows are for the 31 ENCODE test regions, using classifier scores and gene-finder scores, respectively. The table also shows the 100-ROC (receiver operating characteristic) error value, in percent, for each condition. This error value is the probability that, if a true instance and a decoy are selected at random, the classifier gives the decoy a higher score than the true instance.
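The two measures defined in the footnote can be sketched in a few lines of Python. This is only an illustration of the definitions as stated, not the authors' evaluation code. The function names `f_score` and `roc_error_percent` and all input values are hypothetical. The sketch also assumes that sensitivity and specificity have already been computed by whatever definition the paper uses, and that ties between a true instance and a decoy are not counted as errors.

```python
import math
from itertools import product


def f_score(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity (same units as the inputs)."""
    return math.sqrt(sensitivity * specificity)


def roc_error_percent(true_scores, decoy_scores) -> float:
    """Percentage of (true, decoy) pairs in which the decoy scores strictly higher.

    Assumption: ties are not counted as errors, following the footnote's
    wording "a higher score than the true instance".
    """
    pairs = list(product(true_scores, decoy_scores))
    misordered = sum(1 for t, d in pairs if d > t)
    return 100.0 * misordered / len(pairs)


# Made-up illustrative values, not taken from the table.
true_scores = [0.9, 0.8, 0.4]
decoy_scores = [0.7, 0.3, 0.2, 0.1]

print(f_score(80.0, 90.0))                           # sensitivity 80%, specificity 90%
print(roc_error_percent(true_scores, decoy_scores))  # 1 misordered pair out of 12
```

Enumerating every pair is quadratic in the number of sites, which is fine for a toy example but would be slow at the evaluation-set sizes in the table; a rank-based computation gives the same value more efficiently.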