
Table 3 Pattern discovery at the level of upstream regions

From: Evaluation of thresholds for the detection of binding sites for regulatory proteins in Escherichia coli K12 DNA

 

          Consensus/Patser                       Dyad-analysis/sweeping
Regulon   Sites/sites  250/sites    450/sites    Sites/sites  250/sites    450/sites
AraC      100.00       20.00        20.00        100.00       -            80.00
ArcA      80.00        60.00        60.00        90.00        -            80.00
ArgR      100.00       100.00       100.00       100.00       33.33        100.00
CRP       95.24        93.65        95.24        90.48        66.67        65.08
CysB      100.00       60.00        40.00
CytR      100.00       16.67        16.67
FIS       60.00        60.00        -
FNR       90.00        75.00        70.00        80.00        60.00        60.00
FadR      100.00       75.00        -
FruR      100.00       14.29        71.43        71.43        -            -
Fur       100.00       100.00       25.00        75.00        -            -
GlpR      100.00       100.00       100.00       100.00       75.00        100.00
IHF       100.00       75.00        33.33        58.33        -            -
LexA      100.00       87.50        87.50        100.00       100.00       100.00
Lrp       80.00        60.00        50.00        50.00        -            -
MalT      100.00       50.00        50.00        100.00       100.00       100.00
NR_I      100.00       100.00       100.00       100.00       100.00       100.00
NagC      75.00        25.00        25.00        -            -            50.00
NarL      100.00       55.56        22.22        77.78        -            -
OmpR      100.00       -            25.00        100.00       -            -
OxyR      75.00        25.00        -
PhoB      100.00       100.00       100.00       100.00       100.00       75.00
PurR      92.31        84.62        84.62        100.00       84.62        84.62
TrpR      100.00       100.00       100.00       100.00       100.00       100.00
TyrR      100.00       100.00       87.50        87.50        87.50        100.00
Average   94.14        68.22        61.98        88.45        82.47        85.34

  1. For each family, we show the results with Dyad-analysis/sweeping and with Consensus/Patser. The data shown are obtained using different training sets - the 200+50 and 400+50 regions (250 and 450) and a comparison with training sets of known binding sites (sites) as a reference standard. Results are given as the number of regions where at least one binding site was found divided by the total number of regions, and expressed as percentages. Note that only the dyads extracted from the max ROMs within each region are used here. In each column heading, the first word refers to the training set and the second refers to the regions where the patterns were searched. For instance, columns headed 450/sites show the results of pattern discovery when Consensus or Dyad-analysis has as input the 450+50 bp regions, and the sensor is evaluated with the files of known sites. We counted only those regions containing known binding sites within the range covered (that is, if a known binding site is present more than 200 bp upstream of the gene start site, the corresponding 200+50 region is not counted). Averages count only the lines where the programs provided a result. Dashes mean that either there was no binding site within the region, or the programs failed to provide a matrix (Consensus) or significant dyads (Dyad-analysis). A region is considered found if at least one of its binding sites is matched.
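The scoring rule described in the footnote can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' code: the function names `region_percentage` and `column_average` are hypothetical, the region counts passed to `region_percentage` are invented for the example, and dashes and empty cells are represented here as `None`. The column values are those listed for Dyad-analysis/sweeping, 250/sites, in the table.

```python
from typing import Optional, Sequence


def region_percentage(regions_with_match: int, total_regions: int) -> float:
    """Percentage of regions in which at least one known site was matched."""
    if total_regions <= 0:
        raise ValueError("total_regions must be positive")
    return 100.0 * regions_with_match / total_regions


def column_average(values: Sequence[Optional[float]]) -> Optional[float]:
    """Mean over the rows that have a result; None entries (dashes) are skipped."""
    scored = [v for v in values if v is not None]
    if not scored:
        return None  # no program output in this column
    return sum(scored) / len(scored)


# Dyad-analysis/sweeping, 250/sites column, in table order (AraC ... TyrR);
# rows with no entry for Dyad-analysis are left out, dashes become None.
dyad_250 = [
    None, None, 33.33, 66.67, 60.00, None, None, 75.00, None, 100.00,
    None, 100.00, 100.00, None, None, None, 100.00, 84.62, 100.00, 87.50,
]

if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the percentage calculation.
    print(round(region_percentage(3, 4), 2))
    # Average over the 11 rows with a value, as in the "Average" line.
    print(round(column_average(dyad_250), 2))  # 82.47
```

Skipping the `None` entries rather than counting them as zero is what makes the result agree with the 82.47 reported in the Average row for that column.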