
Table 2 Efficiency of document recovery, sequence extraction and genome mapping for the source lists of PMIDs with high cis-regulatory content

From: Text-mining assisted regulatory annotation

 

| | TRANSFAC | FlyReg | ORegAnno | Queue | top4,501 | All |
|---|---|---|---|---|---|---|
| Number of PMIDs | 5,719 | 202 | 914 | 4,145 | 4,491 | 11,437 |
| Number of PMIDs with PDF | 5,302 | 187 | 835 | 3,710 | 3,677 | 9,940 |
| Percent PMIDs with PDF | 92.7% | 92.6% | 91.4% | 89.5% | 81.9% | 86.9% |
| Number of PMIDs with text >2 Kbytes | 5,051 | 175 | 793 | 3,517 | 3,498 | 9,440 |
| Percent PMIDs with text >2 Kbytes | 88.3% | 86.6% | 86.8% | 84.8% | 77.9% | 82.5% |
| Efficiency of text conversion | 95.3% | 93.6% | 95.0% | 94.8% | 95.1% | 95.0% |
| Number of PMIDs with fasta sequence | 4,357 | 155 | 660 | 3,044 | 3,080 | 8,066 |
| Percent PMIDs with fasta sequence | 76.2% | 76.7% | 72.2% | 73.4% | 68.6% | 70.5% |
| Efficiency of sequence extraction | 86.3% | 88.6% | 83.2% | 86.6% | 88.1% | 85.4% |
| Number of PMIDs with fasta sequence mapped to genome | 1,518 | 75 | 303 | 1,279 | 1,260 | 2,975 |
| Percent PMIDs with fasta sequence mapped to genome | 26.5% | 37.1% | 33.2% | 30.9% | 28.1% | 26.0% |
| Efficiency of genome mapping | 34.8% | 48.4% | 45.9% | 42.0% | 40.9% | 36.9% |

  1. Note that the totals (All) are less than the sum of the individual sets, since many PMIDs are found in more than one source list.
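
The table does not state how its percentage and efficiency rows are derived. The figures are consistent with one reading: each "Percent" row divides a stage count by the starting number of PMIDs, and each "Efficiency" row divides a stage count by the count of the stage immediately before it. The sketch below checks this reading against the TRANSFAC column. The dictionary keys and the `percent` helper are illustrative names, not taken from the source.

```python
# Counts copied from the TRANSFAC column of Table 2.
# The key names are hypothetical labels chosen for this sketch.
counts = {
    "pmids": 5719,   # Number of PMIDs
    "pdf": 5302,     # Number of PMIDs with PDF
    "text": 5051,    # Number of PMIDs with text >2 Kbytes
    "fasta": 4357,   # Number of PMIDs with fasta sequence
    "mapped": 1518,  # Number of PMIDs with fasta sequence mapped to genome
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Assumed reading of the "Percent" rows: stage count / starting PMIDs.
percent_of_pmids = {
    stage: percent(n, counts["pmids"])
    for stage, n in counts.items()
    if stage != "pmids"
}

# Assumed reading of the "Efficiency" rows: stage count / previous stage count.
stages = ["pdf", "text", "fasta", "mapped"]
efficiency = {
    cur: percent(counts[cur], counts[prev])
    for prev, cur in zip(stages, stages[1:])
}

print(percent_of_pmids)
print(efficiency)
```

Under this reading, the steep drop at the last stage comes from genome mapping itself (about a third of the TRANSFAC documents with extracted sequence map to a genome), not from losses accumulated in the earlier PDF and text steps.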