Table 3 Pattern discovery at the level of upstream regions

From: Evaluation of thresholds for the detection of binding sites for regulatory proteins in Escherichia coli K12 DNA

         Consensus/Patser                     Dyad-analysis/sweeping
Regulon  Sites/sites  250/sites  450/sites    Sites/sites  250/sites  450/sites
AraC          100.00      20.00      20.00         100.00          -      80.00
ArcA           80.00      60.00      60.00          90.00          -      80.00
ArgR          100.00     100.00     100.00         100.00      33.33     100.00
CRP            95.24      93.65      95.24          90.48      66.67      65.08
CysB          100.00      60.00      40.00
CytR          100.00      16.67      16.67
FIS            60.00      60.00          -
FNR            90.00      75.00      70.00          80.00      60.00      60.00
FadR          100.00      75.00          -
FruR          100.00      14.29      71.43          71.43          -          -
Fur           100.00     100.00      25.00          75.00          -          -
GlpR          100.00     100.00     100.00         100.00      75.00     100.00
IHF           100.00      75.00      33.33          58.33          -          -
LexA          100.00      87.50      87.50         100.00     100.00     100.00
Lrp            80.00      60.00      50.00          50.00          -          -
MalT          100.00      50.00      50.00         100.00     100.00     100.00
NR_I          100.00     100.00     100.00         100.00     100.00     100.00
NagC           75.00      25.00      25.00              -          -      50.00
NarL          100.00      55.56      22.22          77.78          -          -
OmpR          100.00          -      25.00         100.00          -          -
OxyR           75.00      25.00          -
PhoB          100.00     100.00     100.00         100.00     100.00      75.00
PurR           92.31      84.62      84.62         100.00      84.62      84.62
TrpR          100.00     100.00     100.00         100.00     100.00     100.00
TyrR          100.00     100.00      87.50          87.50      87.50     100.00
Average        94.14      68.22      61.98          88.45      82.47      85.34
  1. For each family, we show the results with Dyad-analysis/sweeping and with Consensus/Patser. The data shown were obtained using different training sets: the 200+50 and 400+50 regions (250 and 450) and, as a reference standard, training sets of known binding sites (sites). Results are given as the number of regions where at least one binding site was found divided by the total number of regions, expressed as a percentage. Note that only the dyads extracted from the max ROMs within each region are used here. In each column heading, the first word refers to the training set and the second to the regions where the patterns were searched. For instance, columns headed 450/sites show the results of pattern discovery when Consensus or Dyad-analysis has as input the 400+50 bp regions, and the sensor is evaluated with the files of known sites. We counted only those regions containing known binding sites within the range covered (that is, if a known binding site is present more than 200 bp upstream of the gene start site, the corresponding 200+50 region is not counted). Averages count only the lines where the programs provided a result. Dashes mean that either there was no binding site within the region, or the programs failed to provide a matrix (Consensus) or significant dyads (Dyad-analysis). A region is considered found if at least one of its binding sites is matched.
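
The scoring and averaging rules in this note can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, dashes are represented as `None`, and the per-region hit counts in the first example are invented for illustration. The second example uses the Dyad-analysis/sweeping 250/sites column exactly as printed in the table.

```python
from typing import Optional, Sequence


def percent_regions_found(hits_per_region: Sequence[int]) -> float:
    """Percentage of regions in which at least one known site was matched.

    hits_per_region holds, for each counted region, the number of known
    binding sites matched by the discovered pattern (hypothetical input).
    """
    if not hits_per_region:
        raise ValueError("no regions to score")
    found = sum(1 for hits in hits_per_region if hits >= 1)
    return 100.0 * found / len(hits_per_region)


def column_average(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average over the lines with a result; None (a dash) is skipped."""
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    return sum(reported) / len(reported)


# Invented hit counts for five regions: four have at least one match.
print(f"{percent_regions_found([2, 0, 1, 1, 3]):.2f}")  # 80.00

# Dyad-analysis/sweeping, 250/sites column as printed (dash -> None;
# regulons with no entry in this column are left out).
dyad_250_sites = [
    None, None, 33.33, 66.67, 60.00, None, None, 75.00, None, 100.00,
    None, 100.00, 100.00, None, None, None, 100.00, 84.62, 100.00, 87.50,
]
print(f"{column_average(dyad_250_sites):.2f}")  # 82.47
```

Skipping the dashes rather than counting them as zero is what makes the second result agree with the Average row for that column.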