Table 1 Coverage of validation sets (excluding PMIDs in the training set) within the top10k, top50k, and top100k ranked abstracts for the vector space model relevancy ranking

From: Text-mining assisted regulatory annotation

| | TRANSFAC | FlyReg | ORegAnno Queue | ORegAnno prior to RegCreative | RegCreative success | RegCreative failure |
|---|---|---|---|---|---|---|
| Number of PMIDs | 5,719 | 200 | 4,145 | 376 | 260 | 218 |
| Number of PMIDs (no training data) | 5,183 | 186 | 3,687 | 340 | 228 | 212 |
| Number in top10k | 1,390 | 38 | 1,035 | 89 | 59 | 18 |
| Percent in top10k | 26.8% | 20.4% | 28.1% | 26.2% | 25.9% | 8.5% |
| Number in top50k | 3,908 | 146 | 2,753 | 260 | 165 | 79 |
| Percent in top50k | 75.4% | 78.5% | 74.7% | 76.5% | 72.4% | 37.3% |
| Number in top100k | 4,572 | 166 | 3,208 | 301 | 199 | 110 |
| Percent in top100k | 88.2% | 89.2% | 87.0% | 88.5% | 87.3% | 51.9% |
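
The percentage rows appear to be each "Number in topN" count divided by the corresponding "Number of PMIDs (no training data)" count, not by the full PMID count. The following is a minimal sketch of that arithmetic using the counts from Table 1. The names `VALIDATION_SETS`, `coverage_percent`, and `coverage_table` are illustrative and do not come from the article, and the choice of denominator is an assumption inferred from the published figures.

```python
# Counts copied from Table 1. "total" is "Number of PMIDs (no training data)".
# All identifiers here are hypothetical names chosen for illustration.
VALIDATION_SETS = {
    "TRANSFAC": {"total": 5183, "top10k": 1390, "top50k": 3908, "top100k": 4572},
    "FlyReg": {"total": 186, "top10k": 38, "top50k": 146, "top100k": 166},
    "ORegAnno Queue": {"total": 3687, "top10k": 1035, "top50k": 2753, "top100k": 3208},
    "ORegAnno prior to RegCreative": {"total": 340, "top10k": 89, "top50k": 260, "top100k": 301},
    "RegCreative success": {"total": 228, "top10k": 59, "top50k": 165, "top100k": 199},
    "RegCreative failure": {"total": 212, "top10k": 18, "top50k": 79, "top100k": 110},
}

CUTOFFS = ("top10k", "top50k", "top100k")


def coverage_percent(found: int, total: int) -> float:
    """Share of validation PMIDs found within a rank cutoff, as a percentage
    rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * found / total, 1)


def coverage_table(sets: dict) -> dict:
    """Compute the coverage percentage for every validation set and cutoff."""
    return {
        name: {cutoff: coverage_percent(counts[cutoff], counts["total"]) for cutoff in CUTOFFS}
        for name, counts in sets.items()
    }


if __name__ == "__main__":
    for name, row in coverage_table(VALIDATION_SETS).items():
        cells = ", ".join(f"{cutoff}: {pct:.1f}%" for cutoff, pct in row.items())
        print(f"{name}: {cells}")
```

Under this assumption the computed values match the percentage rows of the table for all six validation sets, which supports reading the "no training data" row as the denominator.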