
Table 2 Assessment, by keyword recovery, of the functional linkages established by the Operon method at various distance thresholds

From: Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach

| Threshold (bp) | Functional links between SwissProt Annotated Proteins | Functional links with no keywords in common | Correct keywords recovered | Total keywords | Maximum false positive fraction* | Keyword recovery† |
|---|---|---|---|---|---|---|
| 0 | 308 | 78 | 446 | 883 | 0.25 | 0.51 |
| 25 | 642 | 180 | 856 | 1766 | 0.28 | 0.48 |
| 50 | 818 | 254 | 1044 | 2226 | 0.31 | 0.47 |
| 75 | 912 | 326 | 1080 | 2453 | 0.36 | 0.44 |
| 100 | 1044 | 362 | 1224 | 2726 | 0.35 | 0.45 |

*The maximum false positive fraction was calculated as the fraction of pairwise links that have no SWISS-PROT keywords in common (ignoring the keywords 'hypothetical protein', 'three-dimensional structure', 'transmembrane' and 'complete proteome').

†Keyword recovery was calculated by comparing the SWISS-PROT keyword annotations of each pair of linked M. tuberculosis genes. The keyword recovery over all linkages was calculated as:

$$\text{Keyword recovery} = \frac{1}{X} \sum_{i=1}^{Y} \sum_{j=1}^{x} n_j$$

where X is the total number of query protein keywords, Y is the total number of linked gene pairs, x is the number of SWISS-PROT keywords of the query protein in a given pair, and n_j is the number of times the query protein keyword j occurs in the linked protein. At 0 bp the keyword recovery is about 50% (0.51), while the maximum false positive fraction is about 25%. As the distance threshold increases from 0 bp to 100 bp, the keyword recovery decreases from 0.51 to 0.45, while the maximum false positive fraction increases from 0.25 to 0.35; both trends reverse slightly between 75 bp and 100 bp.
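The two footnote metrics can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `assess_links`, the input layout (one pair of keyword collections per link), and the lowercase normalization are all assumptions. The footnote states that the four uninformative keywords are ignored for the false positive fraction; applying the same filter to keyword recovery is a further assumption made here.

```python
# Keywords the footnote says to ignore when comparing annotations.
IGNORED = frozenset({
    "hypothetical protein",
    "three-dimensional structure",
    "transmembrane",
    "complete proteome",
})


def _normalize(keywords):
    """Lowercase keywords and drop the ignored ones (normalization is an assumption)."""
    return {kw.strip().lower() for kw in keywords} - IGNORED


def assess_links(links):
    """Score functional links by shared keywords.

    `links` is an iterable of (query_keywords, linked_keywords) pairs,
    one pair per functional link (hypothetical input layout).
    """
    total_links = 0            # Y: number of linked gene pairs
    links_without_shared = 0   # links with no keywords in common
    recovered = 0              # sum of n_j over all links
    total_keywords = 0         # X: total query protein keywords

    for query_kw, linked_kw in links:
        query = _normalize(query_kw)
        linked = _normalize(linked_kw)
        shared = query & linked  # with sets, each n_j is 0 or 1
        total_links += 1
        total_keywords += len(query)
        recovered += len(shared)
        if not shared:
            links_without_shared += 1

    return {
        "links": total_links,
        "links_without_shared_keywords": links_without_shared,
        "correct_keywords_recovered": recovered,
        "total_keywords": total_keywords,
        "max_false_positive_fraction": (
            links_without_shared / total_links if total_links else 0.0
        ),
        "keyword_recovery": (
            recovered / total_keywords if total_keywords else 0.0
        ),
    }


# Toy input with made-up keyword sets, for illustration only.
example_links = [
    ({"Transferase", "Kinase"}, {"Kinase", "ATP-binding"}),
    ({"Hydrolase", "Transmembrane"}, {"Transmembrane"}),
    ({"Ligase"}, {"Ligase"}),
]
result = assess_links(example_links)
```

The same ratios reproduce the table's derived columns from its counts: at 0 bp, 446/883 ≈ 0.51 for keyword recovery and 78/308 ≈ 0.25 for the maximum false positive fraction.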