
Table 2 Efficiency of document recovery, sequence extraction and genome mapping for the source lists of PMIDs with high cis-regulatory content

From: Text-mining assisted regulatory annotation

| | TRANSFAC | FlyReg | ORegAnno | Queue | top4,501 | All |
|---|---|---|---|---|---|---|
| Number of PMIDs | 5,719 | 202 | 914 | 4,145 | 4,491 | 11,437 |
| Number of PMIDs with PDF | 5,302 | 187 | 835 | 3,710 | 3,677 | 9,940 |
| Percent PMIDs with PDF | 92.7% | 92.6% | 91.4% | 89.5% | 81.9% | 86.9% |
| Number of PMIDs with text >2 Kbytes | 5,051 | 175 | 793 | 3,517 | 3,498 | 9,440 |
| Percent PMIDs with text >2 Kbytes | 88.3% | 86.6% | 86.8% | 84.8% | 77.9% | 82.5% |
| Efficiency of text conversion | 95.3% | 93.6% | 95.0% | 94.8% | 95.1% | 95.0% |
| Number of PMIDs with fasta sequence | 4,357 | 155 | 660 | 3,044 | 3,080 | 8,066 |
| Percent PMIDs with fasta sequence | 76.2% | 76.7% | 72.2% | 73.4% | 68.6% | 70.5% |
| Efficiency of sequence extraction | 86.3% | 88.6% | 83.2% | 86.6% | 88.1% | 85.4% |
| Number of PMIDs with fasta sequence mapped to genome | 1,518 | 75 | 303 | 1,279 | 1,260 | 2,975 |
| Percent PMIDs with fasta sequence mapped to genome | 26.5% | 37.1% | 33.2% | 30.9% | 28.1% | 26.0% |
| Efficiency of genome mapping | 34.8% | 48.4% | 45.9% | 42.0% | 40.9% | 36.9% |
  1. Note that totals are less than the sum of the sets since many PMIDs are found in more than one source list.
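
The table does not define its derived rows, but the figures are consistent with a simple reading: each "Percent" row divides a stage count by the number of PMIDs in that list, and each "Efficiency" row divides a stage count by the count of the preceding stage. The sketch below recomputes the derived rows under that assumption. The short stage names (`pmids`, `pdf`, `text`, `fasta`, `mapped`) and the helper functions are hypothetical labels introduced here for illustration; they do not come from the article.

```python
# Counts copied from the table. The stage keys are shorthand introduced
# for this sketch, not names used by the article.
COUNTS = {
    "TRANSFAC": {"pmids": 5719, "pdf": 5302, "text": 5051, "fasta": 4357, "mapped": 1518},
    "FlyReg":   {"pmids": 202,  "pdf": 187,  "text": 175,  "fasta": 155,  "mapped": 75},
    "ORegAnno": {"pmids": 914,  "pdf": 835,  "text": 793,  "fasta": 660,  "mapped": 303},
    "Queue":    {"pmids": 4145, "pdf": 3710, "text": 3517, "fasta": 3044, "mapped": 1279},
    "top4,501": {"pmids": 4491, "pdf": 3677, "text": 3498, "fasta": 3080, "mapped": 1260},
    "All":      {"pmids": 11437, "pdf": 9940, "text": 9440, "fasta": 8066, "mapped": 2975},
}

STAGES = ["pmids", "pdf", "text", "fasta", "mapped"]


def pct(numerator, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


def percent_of_pmids(counts):
    """Assumed 'Percent' rows: each stage count relative to the PMID count."""
    return {stage: pct(counts[stage], counts["pmids"]) for stage in STAGES[1:]}


def stage_efficiency(counts):
    """Assumed 'Efficiency' rows: each stage count relative to the previous stage."""
    return {
        later: pct(counts[later], counts[earlier])
        for earlier, later in zip(STAGES[1:], STAGES[2:])
    }


if __name__ == "__main__":
    for source, counts in COUNTS.items():
        print(source, percent_of_pmids(counts), stage_efficiency(counts))
```

Under this reading, the "Efficiency" rows isolate the loss at a single step, while the "Percent" rows show cumulative loss from the original PMID list. The footnote can be checked the same way: the five source lists sum to more PMIDs than the "All" column reports, which is what overlapping lists would produce.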