Table 1 Coverage of validation sets (excluding PMIDs in the training set) within the top10k, top50k, and top100k ranked abstracts for the vector space model relevancy ranking

From: Text-mining assisted regulatory annotation

 

|  | TRANSFAC | FlyReg | ORegAnno Queue | ORegAnno prior to RegCreative | RegCreative success | RegCreative failure |
| --- | --- | --- | --- | --- | --- | --- |
| Number of PMIDs | 5,719 | 200 | 4,145 | 376 | 260 | 218 |
| Number of PMIDs (no training data) | 5,183 | 186 | 3,687 | 340 | 228 | 212 |
| Number in top10k | 1,390 | 38 | 1,035 | 89 | 59 | 18 |
| Percent in top10k | 26.8% | 20.4% | 28.1% | 26.2% | 25.9% | 8.5% |
| Number in top50k | 3,908 | 146 | 2,753 | 260 | 165 | 79 |
| Percent in top50k | 75.4% | 78.5% | 74.7% | 76.5% | 72.4% | 37.3% |
| Number in top100k | 4,572 | 166 | 3,208 | 301 | 199 | 110 |
| Percent in top100k | 88.2% | 89.2% | 87.0% | 88.5% | 87.3% | 51.9% |
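
The percentages in Table 1 appear to be each top-N count divided by the number of PMIDs left after excluding training data, not by the full PMID count. The following minimal Python sketch recomputes two of the top10k entries under that assumption. The function name `coverage_percent` and the dictionary layout are illustrative and do not come from the original article.

```python
def coverage_percent(n_in_top, n_total):
    """Percentage of validation PMIDs found within a ranked cutoff, to one decimal."""
    return round(100.0 * n_in_top / n_total, 1)


# Counts copied from Table 1; totals are the "no training data" row (assumed denominator).
totals = {"TRANSFAC": 5183, "RegCreative failure": 212}
in_top10k = {"TRANSFAC": 1390, "RegCreative failure": 18}

coverage = {name: coverage_percent(in_top10k[name], totals[name]) for name in totals}
print(coverage)  # {'TRANSFAC': 26.8, 'RegCreative failure': 8.5}
```

Both values match the "Percent in top10k" row, which supports reading the "no training data" counts as the denominator.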