Table 10 Summary of data sets used

From: Promotech: a general tool for bacterial promoter recognition

| Bacterium | Genome accession | PubMed ID | NGS technology | #TSS | Genome length (bp) | T or V |
|---|---|---|---|---|---|---|
| Escherichia coli str. K-12 substr. MG1655 | NC_000913.3 | 27748404 [43] | dRNA-seq | 278 | 4,641,652 | T |
| Escherichia coli str. K-12 substr. MG1655 | NC_000913.2 | 25266388 [44] | dRNA-seq | 2672 | 4,639,675 | T |
| Helicobacter pylori 26695 | NC_000915.1 | 20164839 [45] | dRNA-seq | 1907 | 1,667,867 | T |
| Helicobacter pylori 26695 | NC_000915.1 | 30169674 [46] | dRNA-seq | 449 | 1,667,867 | T |
| Campylobacter jejuni subsp. jejuni 81116 | NC_009839.1 | 30169674 [46] | dRNA-seq | 269 | 1,628,115 | T |
| Campylobacter jejuni subsp. jejuni NCTC 11168 | NC_002163.1 | 23696746 [47] | dRNA-seq | 1905 | 1,641,481 | T |
| Campylobacter jejuni RM1221 | NC_003912.7 | 23696746 [47] | dRNA-seq | 2167 | 1,777,831 | T |
| Campylobacter jejuni subsp. jejuni 81116 | NC_009839.1 | 23696746 [47] | dRNA-seq | 1944 | 1,628,115 | T |
| Campylobacter jejuni subsp. jejuni 81-176 | NC_008787.1 | 23696746 [47] | dRNA-seq | 2003 | 1,616,554 | T |
| Streptococcus pyogenes strain S119 | LR031521.1 | 30902048 [48] | dRNA-seq | 892 | 1,877,450 | T |
| Salmonella enterica subsp. enterica serovar Typhimurium SL1344 | NC_016810.1 | 22538806 [49] | dRNA-seq | 1873 | 4,878,012 | T |
| Chlamydia pneumoniae CWL029 | NC_000922.1 | 21989159 [50] | dRNA-seq | 530 | 1,230,230 | T |
| Shewanella oneidensis MR-1 | NC_004347.2 | 24987095 [51] | dRNA-seq | 4729 | 4,969,811 | T |
| Leptospira interrogans serovar Manilae isolate L495 | NZ_LT962963.1 | 28154810 [52] | dRNA-seq | 2865 | 4,614,703 | T |
| Streptomyces coelicolor A3(2) | NC_003888.3 | 27251447 [53] | dRNA-seq | 3570 | 8,667,507 | T |
| Mycolicibacterium smegmatis str. MC2 155 | NC_008596.1 | 30984135 [54] | dRNA-seq | 4054 | 6,988,209 | V |
| Lachnoclostridium phytofermentans ISDg | NC_010001.1 | 27982035 [55] | Cappable-seq | 1187 | 4,847,594 | V |
| Rhodobacter capsulatus SB 1003 | NC_014034.1 | – [56] | dRNA-seq | 5374 | 3,738,958 | V |
| Bacillus amyloliquefaciens XH7 | CP002927.1 | 26133043 [57] | dRNA-seq | 1064 | 3,939,203 | V |
  1. In the last column, T or V indicates whether the bacterium is reserved for training or validation, respectively. The table also lists the number of TSSs per bacterium, the genome length, the next-generation sequencing technology used to obtain the TSSs, and the PubMed ID of the literature source (if the PubMed ID is missing, the source manuscript was still in preparation at the time of this publication).