
Identification of attenuation and antitermination regulation in prokaryotes

Abstract

Many operons of biochemical pathways in bacterial genomes are regulated by processes called attenuation and antitermination. Though the specific mechanisms can be quite different, attenuation and antitermination in these operons have in common the termination of transcription by an RNA 'terminator' fold upstream of the first gene in the operon. In the past, detecting regulation by attenuation or antitermination has often been a long process of experimental trial and error, on a case-by-case basis. We report here the prediction of over 290 upstream regions of genes with attenuation or antitermination regulation structures in the completed genomes of Bacillus subtilis and Escherichia coli, for which extensive experimental studies have been done on attenuation and antitermination regulation. These predictions are based on a computational method devised from characteristics of known terminator fold candidates and benchmark regions of entire genomes. We extend this methodology to 24 additional complete genomes and are thus able to give a more complete picture of attenuation and antitermination regulation in bacteria.

Background

The control of gene expression can occur at many points in the transcription and translation of the genes of bacterial operons. Two mechanisms of operon regulation of great interest are "attenuation" and "antitermination" [1,2,3,4,5]. These mechanisms regulate the early termination of transcription of a wide variety of operons in diverse species. Classically, attenuation occurs when the transcribed RNA upstream of an operon has the ability to fold into two mutually exclusive RNA-fold structures, one of which is termed an antiterminator and the other a terminator. If the terminator hairpin loop is allowed to fold, transcription is ultimately halted. Alternatively, if the antiterminator structure folds, the terminator is precluded from folding and transcription of the operon proceeds. The mechanisms that alternate between these two RNA folds (terminators and antiterminators) are quite diverse. Regulation by antitermination (not to be confused with the alternative antiterminator fold of attenuation) can be differentiated from attenuation by the fact that alteration of the transcription complex (rather than alternate RNA structures) decreases the efficiency of downstream terminators. In reality, however, the boundary between these two types of regulation is not distinct [3].

Attenuation and antitermination mechanisms have both been described in a wide variety of regulatory and biochemical pathways. These include operons involved in aminoacyl-tRNA biosynthesis, catabolic metabolism, amino-acid biosynthesis, ABC transport systems, ribosomal structural peptides and several others. They have been characterized in genomes as disparate as the low-GC gram-positive Bacillus subtilis and the proteobacterium Escherichia coli. The precise mechanisms that cause the attenuation or antitermination of these operons can be quite distinct. For example, the trp operons of E. coli [4,5] and B. subtilis [6,7], though both regulated by attenuation, are controlled by quite different mechanisms. Other operons, such as the structural ribosomal S10 operon of E. coli [8,9], are regulated by yet another mechanism. Even between closely related species, the attenuation and antitermination mechanisms and the upstream regulatory sequences can be entirely different.

Yet one common and necessary feature of most experimentally described attenuation and antitermination mechanisms is an intrinsic terminator RNA fold structure [2,3]. The stem-loop structure of an intrinsic terminator has been well described [10] and the understanding of the mechanisms of termination has made great progress in recent years [11,12]. This structure is not only found at the location of 'standard' termination of transcription at the end of transcriptional units, but is also, by definition, a part of attenuation and antitermination regulation. The major characteristics of this standard terminator structure are that it is relatively short, is energetically stable, has a G/C-rich stem, contains a small terminal loop structure and, importantly, also contains a run of U residues on the 3' side of the stem-loop structure [10]. These characteristics of intrinsic terminators have been used in the past to predict terminator structures at the end of transcriptional units (operons) and thus assist in the prediction of transcriptional units in complete genomes [10,13] and, on a limited basis, to predict regulatory attenuators [14]. Here we focus on using the characteristics and position of intrinsic terminators to predict and characterize attenuation and antitermination regulation in operons of B. subtilis and E. coli. These mechanisms of regulation are well described in these two genomes. We extend this characterization to an additional 24 genomes representative of the diversity of eubacteria and archaebacteria to give a broader picture of attenuation and antitermination regulation in prokaryotes, in a more automated and extensive manner than previously achieved.

Results

Characterization of attenuators in B. subtilis and E. coli

An extensive literature search for operons in B. subtilis regulated by attenuation or antitermination was conducted and 46 such operons were found. These range from the experimentally well described trp operon to those operons where terminator structures have been found and attenuation is expected though not well characterized experimentally [15,16] (for a full list see http://www.bork.embl-heidelberg.de/Docu/attenuation). These 46 known terminator structures were employed to determine common characteristics of B. subtilis attenuation terminators. Using these characteristics, we screened upstream regions of 3650 B. subtilis genes (using procedures described in Materials and Methods) for terminator folds. Forty-three of the original 46 known terminators found in the literature search were retained in this screening. An additional 1117 upstream folds that fit our criteria were also obtained. In addition, as a control, we used the same filtering and folding methodology on intergenic regions after the sequences were shuffled randomly (952 folds of randomly shuffled sequences were obtained after filtering).
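The shuffled-sequence control described above can be sketched as follows. This is a minimal illustration that assumes a simple mononucleotide shuffle, which preserves base composition but destroys base order; the shuffling procedure actually used is not specified in this section, and the function name `shuffle_sequence` and the example fragment are hypothetical.

```python
import random
from collections import Counter
from typing import Optional


def shuffle_sequence(seq: str, seed: Optional[int] = None) -> str:
    """Return a random permutation of seq (mononucleotide shuffle).

    Assumption: only base composition is preserved, not dinucleotide
    frequencies or any positional signal.
    """
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


# Hypothetical intergenic fragment, for illustration only.
example = "AAAGGCUCCUUUUGGAGCCUUUUUUUU"
control = shuffle_sequence(example, seed=0)

# The control keeps the length and base composition of the original.
assert len(control) == len(example)
assert Counter(control) == Counter(example)
```

Folding and filtering such controls with exactly the same settings as the real intergenic regions is what makes the two sets of folds comparable.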

The resulting folds of all intergenic regions and shuffled sequences obtained after filtering were plotted in terms of their stability and length (Figure 1). The known terminator folds lie in a cluster clearly separate and distinct from the folds of randomly shuffled sequences. Terminator folds have a lower free energy (ΔG) in relation to their length than predicted folds of random sequences. A similar pattern of two easily separated clusters emerges when comparing known terminator structures with folded intragenic regions, in which terminators are not expected to be found (data not shown).

Figure 1

Stability and length distributions of stem-loop structures in upstream sequence segments of B. subtilis. The red line shows the largest variance (see Materials and Methods) derived from stem-loop structures in shuffled sequences. Light blue lines give the significance measurements based on standard deviation. The definition of each point, together with the orientation of neighboring genes, is shown in the upper right panel.

Using principal component analysis, we determined the axis of greatest variance of the folds of randomly shuffled sequences. This gives us a measure (in units of standard deviation) of which folds are significantly different from folds of random sequences (see Materials and Methods). Of the 1160 folds of intergenic regions obtained in our screen, a total of 203 fall below the second standard deviation line (Z ≤ -2) derived from the principal component. These are thus considered significantly different from random folds and are possible termination sites of attenuation or antitermination regulation. Forty-two of these are known attenuation terminator folds (of the original 43 known folds maintained after filtering). Thus we are able to recover 91.3% (42/46) of the known and experimentally characterized attenuation and antitermination sites using our filter and significance measure. Additionally, the filter and significance measure screen out about 97.7% (930 of 952) of the folds of random sequences. One hundred and sixty-one folds under the line (203 total minus the 42 known) have not yet been analyzed experimentally and could be predicted to be attenuation terminator structures.
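One plausible reading of this significance measure is sketched below. It is an illustration under stated assumptions, not the exact procedure of Materials and Methods: each fold is assumed to be a (length, ΔG) point, the red line of Figure 1 is taken to be the first principal component of the shuffled-sequence folds, and Z is taken to be the signed distance from that line in units of the shuffled folds' standard deviation along the second component. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_random_axis(random_folds: np.ndarray):
    """Fit the principal axis of shuffled-sequence folds.

    random_folds has shape (n, 2) with columns (length, delta_g).
    Returns the mean point, a unit normal to the first principal
    component, and the standard deviation along that normal.
    """
    mean = random_folds.mean(axis=0)
    cov = np.cov(random_folds - mean, rowvar=False)
    # eigh returns eigenvalues in ascending order, so column 0 of the
    # eigenvector matrix is the minor axis (normal to the main axis).
    eigvals, eigvecs = np.linalg.eigh(cov)
    normal = eigvecs[:, 0]
    # Orient the normal so that a more negative delta_g at a given
    # length yields a more negative score.
    if normal[1] < 0:
        normal = -normal
    return mean, normal, float(np.sqrt(eigvals[0]))


def z_scores(folds: np.ndarray, mean, normal, sd) -> np.ndarray:
    """Signed distance of each fold from the axis, in units of sd."""
    return ((folds - mean) @ normal) / sd


# Synthetic stand-in for the shuffled-sequence folds (not real data).
rng = np.random.default_rng(0)
lengths = rng.uniform(20, 60, size=500)
delta_g = -0.3 * lengths + rng.normal(0.0, 1.0, size=500)
shuffled = np.column_stack([lengths, delta_g])

mean, normal, sd = fit_random_axis(shuffled)
candidates = np.array([[40.0, -12.0], [40.0, -22.0]])
significant = z_scores(candidates, mean, normal, sd) <= -2.0
```

Under these assumptions the first candidate sits on the random trend and is not flagged, while the second is far more stable than a random fold of the same length and falls below the Z ≤ -2 line.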

A detailed investigation found that many of these predictions are strongly supported as putative attenuation or antitermination sites by genomic context, such as the presence of putative promoter sequences and a location upstream of putative and known operons. Two terminator structures upstream of the genes ydbJ and yqhI serve as detailed examples of how genomic context can inform and strongly support the predictions made in Table 1 (Figure 2). Gene ydbJ of B. subtilis is listed as hypothetical, with homology to an ABC transporter gene (ATP-binding protein involved in copper transport). The gene immediately downstream, ydbK, has homology to membrane-spanning permeases. Using STRING (a search tool for finding recurring instances of neighboring genes [17]), orthologs of these two genes are also found in the same order in transcriptional units of 15 other distantly related genomes, suggesting the possibility that these genes form an operon. These genes appear to be in a typical ABC transporter operon configuration, and several ABC transporter operons are known to be regulated by attenuation in B. subtilis [15,16]. The ydbJ upstream region also has a putative promoter sequence, and folds of the entire upstream sequence predicted using RNAfold (see Materials and Methods) suggest that it can adopt complex, possibly antiterminating, structures (data not shown). Based on this context, we predict that this is an ABC transporter operon regulated by attenuation. The second example, yqhI, is the first gene of a run of three genes in a putative transcriptional unit, all having homology to glycine biosynthesis genes. This run of three genes also has orthologs found as neighbors in other genomes [17]. Many amino acid biosynthesis operons in B. subtilis are known to be regulated by attenuation [16], thus supporting this prediction.

Figure 2

Schematic drawing of the neighborhood and predicted structures for the B. subtilis genes ydbJ and yqhI. Genes are shown as colored arrows, with the orientation of transcription given relative to that of the reference gene (ydbJ or yqhI). Large blue stem-loop cartoons signify the predicted terminator fold in attenuation; 't' is an annotated standard terminator fold. Intergenic regions are drawn to scale and their lengths in bp are given underneath the figure.

Table 1 Predicted attenuators in the genome of B. subtilis

In order to see if the observed patterns hold for the only other genome in which attenuation or antitermination is well studied and experimentally described, we also applied the same methodology to upstream regions of genes in the E. coli genome, for which 16 operons have been described as being regulated by attenuation or antitermination. As can be seen in Figure 3, the known E. coli attenuation and antitermination terminator structures have properties similar to those of B. subtilis. Fifteen of the 16 known attenuators were maintained after filtering. The significance measure separates 14 of these E. coli terminators from random folds, as seen in Figure 3. As in B. subtilis, using the Z ≤ -2 line as a measure of significance, we are able to predict attenuation for 146 regions (Figure 3 and Table 2).

Figure 3

Stability and length distributions of stem-loop structures in upstream sequence segments in E. coli. The red line shows the largest variance (see Materials and Methods) derived from stem-loop structures in shuffled sequences. Light blue lines give the significance measurements based on standard deviation. The definition of each point, together with the orientation of neighboring genes, is shown in the upper right panel.

Table 2 Predicted attenuators in the genome of E. coli

Extension of analysis to 26 genomes

Analysis of B. subtilis and E. coli suggests that a broader survey of bacterial genomes might prove useful both in the prediction of attenuation and antitermination regulation in these genomes and in the characterization of the evolution and distribution of these mechanisms of regulation. Twenty-four completed genomes were selected for this survey based on their broad distribution across the evolutionary spectrum (Table 3). The intergenic regions of each of these genomes were analyzed using the same methods and filters as for B. subtilis and E. coli, and predicted attenuation and antitermination terminator folds were similarly obtained.

Table 3 List of all 26 genomes surveyed in this study

As shown in Table 3, there is a wide distribution of the number of putative attenuation and antitermination regulatory sites in the surveyed genomes, ranging from 5 in Mycobacterium tuberculosis to 275 in Clostridium acetobutylicum. Interestingly, earlier attempts to predict standard transcription termination sites at the end of transcription units gave results that correlate with ours. Ermolaeva et al. [13] studied terminators at the end of ORFs and did not target upstream regions, thus filtering out possible attenuators. As in their survey of standard terminators, some of the highest numbers of attenuation and antitermination sites in our survey are found in the genomes of E. coli, H. influenzae, D. radiodurans and B. subtilis, and the lowest numbers in genomes such as H. pylori and M. tuberculosis (genomes reported in their survey).

At first glance, this would seem to suggest that many genomes do not use the same mechanisms of termination for standard transcription termination and do not use attenuation or antitermination in regulation. This is likely the case in some genomes. Yet if the number of upstream intergenic regions is plotted against the number of predicted sites, a strong positive correlation is seen (Figure 4). The smaller the number of genes and intergenic regions a genome has, the lower the occurrence of predicted terminators (both standard transcription terminators and attenuation/antitermination regulatory terminators). This indicates that the low numbers of both standard and regulatory terminators in many genomes are due to a much reduced genome size and a reduced number of regulatory operons, and not necessarily to reliance on different mechanisms of termination and regulation.
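The relationship plotted in Figure 4, which is drawn with an exponential trendline, can be examined with a short script like the one below. This is only a sketch: the arrays are synthetic placeholders generated from a known curve, not the counts from Table 3, and the function name `fit_exponential_trend` is hypothetical. It assumes the trend has the form y = a·exp(b·x) and fits it by ordinary least squares on log(y).

```python
import numpy as np


def fit_exponential_trend(x, y):
    """Least-squares fit of y = a * exp(b * x) via a line fit to log(y).

    Every y value must be strictly positive.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("all y values must be positive")
    # polyfit returns the slope first, then the intercept.
    b, log_a = np.polyfit(x, np.log(y), 1)
    return float(np.exp(log_a)), float(b)


# Synthetic placeholder values, NOT the counts from Table 3.
regions = np.array([500.0, 1000.0, 2000.0, 3000.0, 4000.0])
sites = 10.0 * np.exp(0.0008 * regions)

a, b = fit_exponential_trend(regions, sites)
# Pearson correlation between the two raw counts.
r = float(np.corrcoef(regions, sites)[0, 1])
```

A genome that falls well below the fitted curve, as M. tuberculosis does in Figure 4, would show up as a large negative residual in log space.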

Figure 4

Graph of the number of intergenic regions vs. the number of putative attenuation and antitermination sites in all 26 genomes surveyed. Several genomes with known attenuation or antitermination are labeled for comparison, as are M. tuberculosis and the Archaea. The dashed line is an exponential trendline.

There is a clear outlier in Figure 4 with a much lower than expected number of putative terminators: Mycobacterium tuberculosis. This genome has a much lower occurrence of putative attenuation and antitermination sites than would be suggested by its size and number of intergenic regions. A recent paper by Unniraman et al. [18] concludes that M. tuberculosis uses a different mechanism of termination, one that utilizes terminator structures without the poly-U tail necessary in other genomes. Thus the reduced number of poly-U-containing terminator structures in relation to the number of intergenic regions can be explained by M. tuberculosis' reliance on a different mechanism of termination. This does not necessarily prove that there is no attenuation- or antitermination-type regulation in M. tuberculosis. However, it does indicate either that the loss of the standard mechanism of termination in this genome has reduced, if not eliminated, attenuation or antitermination in M. tuberculosis, or alternatively that an attenuation-like mechanism exists in this genome that utilizes the non-standard M. tuberculosis terminator.

All of the other 25 genomes surveyed have putative attenuation or antitermination regulation sites. Even the lowest number of predicted attenuation or antitermination sites, found in M. genitalium, is a significant proportion of the possible regulatory intergenic regions; the low number is easily accounted for by this genome's relatively small size and few intergenic regions and transcriptional units. These results suggest that attenuation and antitermination regulation is possibly a ubiquitous mechanism of regulation in prokaryotes, with few exceptions.

Genome Size and Attenuation

If the GC content of a genome is compared with the number of attenuators predicted from randomly shuffled sequence, GC content does correlate somewhat with the number of predicted attenuators, as would be expected since a poly-U run is required by the filters. In Figure 5a, folds from randomly shuffled intergenic sequences of our 26 genomes are plotted as the number of filtered folds per intergenic region against the number of intergenic regions. If the number of filtered folds were completely random, there should be a relatively constant number of sites per region regardless of the number of regions. As seen in Figure 5a, this is not completely the case. The number of filtered folds per region obtained from randomly shuffled sequences depends on the GC content of the genome. Low-GC genomes have a slightly higher number of folds per region than genomes of around 50% GC content, and high-GC genomes have a much lower number than both. This is expected for random sequences filtered for stem-loop structures containing poly-U runs.
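The expected direction of this GC dependence can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a random sequence with independent positions in which A and U are equally frequent, so that the probability of U at any position is (1 - GC)/2; the run length of 4 is a hypothetical value chosen for illustration, since the poly-U length required by the filters is not stated in this section.

```python
def poly_u_start_probability(gc_content: float, run_length: int) -> float:
    """Probability that a given position starts a run of `run_length` U's.

    Assumes independent positions with A and U equally frequent, so
    P(U) = (1 - gc_content) / 2.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    if run_length < 1:
        raise ValueError("run_length must be at least 1")
    p_u = (1.0 - gc_content) / 2.0
    return p_u ** run_length


# Hypothetical run length and illustrative GC fractions.
for gc in (0.30, 0.50, 0.65):
    print(f"GC={gc:.2f}  P(run of 4 U)={poly_u_start_probability(gc, 4):.5f}")
```

Because the probability falls as a power of (1 - GC)/2, poly-U runs become rare quickly as GC content rises. This simple model covers only the poly-U requirement and ignores the stem-loop stability filters, which also depend on base composition.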

Figure 5

Genome Size and Regulation. (a) Intergenic sequences of 26 genomes were randomly shuffled, folded and filtered using the reported method to obtain putative 'attenuators'. The number of these shuffled and filtered folds per intergenic region was plotted for each genome against the number of intergenic regions. If random, this ratio should remain constant and independent of genome size. Blue spheres represent proteobacteria and Bacillus species in our survey, beige spheres are archaebacteria and green spheres the rest. Sphere size is proportional to the genome's GC content, and the GC content is labeled within each sphere. The number of random folds per intergenic region is a function of GC content, as would be expected from filtering for folds with poly-U runs. Genomes with known attenuation or antitermination are labeled, as is the genome known not to use terminators with poly-U runs in termination. (b) Intergenic sequences of 22 genomes were folded and filtered for possible attenuators and indication of attenuation or antitermination regulation. The number of these predicted attenuators per intergenic region is compared to the number of intergenic regions in the genome. In contrast to folds of randomly shuffled sequences, the strongest determinant of the frequency of attenuation is genome size (number of intergenic regions and genome size are strongly correlated). Colors and labeling are the same as in 5a.

Even when taking into account the GC content of M. tuberculosis, it has a reduced number of predicted attenuators in relation to the other high-GC genomes (Figure 5b). In fact, Figure 5b (predicted attenuators of actual intergenic sequences) shows that the strongest determinant of the number of predicted attenuators per intergenic region is not GC content but rather genome size (more specifically, the number of intergenic regions). In general, larger genomes not only have a greater absolute number of predicted attenuators, but also have a greater occurrence of predicted attenuators per region. If GC content is equal in two genomes, the larger genome is more likely to have a higher number of predicted attenuators per intergenic region. Previous reports have suggested a similar phenomenon in regulatory proteins: in large genomes, a larger proportion of the total number of genes appears to code for proteins that contain regulatory motifs [19]. Interestingly, discounting the archaebacteria and high-GC genomes, a genome of about 1500 intergenic regions appears to be the threshold at which the frequency of regulatory attenuators increases in a genome.

Distribution and Conservation of Attenuators in Gram positive Bacteria

Seven genomes of gram-positive bacteria (B. subtilis, B. halodurans, L. innocua, S. aureus, C. acetobutylicum, L. lactis, and S. pneumoniae) were analyzed to see whether the attenuation terminators are conserved upstream of orthologous genes. The numbers of predicted attenuation terminators for the genes known to be regulated in B. subtilis and their orthologs in the other six genomes are listed in Table 4. The genomes are sorted by phylogenetic distance from B. subtilis, calculated from amino acid sequences of the orthologs shared among these genomes. The closest to B. subtilis is B. halodurans, with an average of 0.238 amino acid substitutions per site, and the most distant is S. pneumoniae, with an average of 0.422 amino acid substitutions per site. For the 42 genes listed in Table 4, the numbers of orthologs found in the other genomes vary little from genome to genome: the highest and lowest numbers of orthologs are 31 in L. lactis and 26 in S. aureus and C. acetobutylicum, respectively. This is mainly because these 42 genes carry out basic functions such as aminoacyl-tRNA synthesis. On the other hand, the numbers of predicted attenuation termination structures vary significantly: in B. halodurans, 22 orthologous genes have predicted attenuation termination structures, while only 4 orthologous genes have predicted structures in S. pneumoniae. This indicates that the presence or absence of regulation by attenuation is much more weakly conserved than the presence of the genes or operons themselves.

Table 4 List of known attenuators in B. subtilis compared with predictions in six other genomes of gram-positive bacteria

The same trend holds true for the predicted attenuation termination structures other than the known ones (Table 5). There are 105 orthologous gene groups in which at least one other genome contains a predicted attenuator structure upstream of an orthologous gene. Restricting to the orthologs that have predicted attenuators in B. subtilis (35 groups), the highest and the lowest numbers of shared orthologs of genes known to be regulated by attenuation or antitermination in B. subtilis are 28 (L. innocua) and 18 (S. pneumoniae), respectively. The numbers of predicted attenuation termination structures, however, vary more. While there are 13 genes with predicted structures in B. halodurans, which is the closest species to B. subtilis among the six gram-positive bacteria, only 2 genes have predicted structures in S. pneumoniae.

Table 5 List of all orthologous genes in the six gram-positive bacteria genomes in which two or more genomes share predicted attenuators

Although there is weak conservation of attenuators as a whole, predicted attenuation termination structures and the order of their downstream genes are conserved for some groups of genes. One such example is the infC-rpmI-rplT operon (Figure 6a). No attenuation termination structure is predicted in the upstream region of infC in S. pneumoniae (Table 5). A closer look at this region using BLAST [20] revealed that the N-terminus of infC is overpredicted by 27 bases. By adding these 27 bases to the upstream intergenic region, we found a stable stem-loop structure followed by poly-U residues in S. pneumoniae as well (Figure 6b). Even in this example, however, there are considerable differences among species in the relative position of the stem-loop structures and in sequence conservation. Even between the phylogenetically closest pair, B. subtilis and B. halodurans, the distances from the end of the stem to the start codon of infC are 69 and 37 bases, respectively, and the only common segments found in the stem are GUGUGGGN{x}CCCACAC (x = 12 in B. subtilis and x = 9 in B. halodurans). Among all seven genomes, there is only a weak similarity, GYGGG (GACGG in C. acetobutylicum), in the stem region.
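The shared stem pattern GUGUGGGN{x}CCCACAC can be searched for with a regular expression, as in the sketch below. The spacer range of 9 to 12 bases is an assumption chosen to cover the two values of x quoted above, the test sequence is synthetic rather than a real upstream region, and the function names are hypothetical. The sketch also checks that the two arms of the motif are reverse complements, which is what allows them to pair as a stem.

```python
import re

# Stem motif quoted in the text; the spacer N{x} is allowed to vary
# between 9 and 12 bases (an assumption covering both reported values).
STEM_MOTIF = re.compile(r"GUGUGGG([ACGU]{9,12})CCCACAC")

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A-U, G-C only)."""
    return rna.translate(_COMPLEMENT)[::-1]


def find_stem_motif(rna: str):
    """Return (start index, spacer length) of the first match, or None."""
    match = STEM_MOTIF.search(rna)
    if match is None:
        return None
    return match.start(), len(match.group(1))


# The 3' arm is the reverse complement of the 5' arm.
assert reverse_complement("GUGUGGG") == "CCCACAC"

# Synthetic sequence for illustration: 12-base spacer, then poly-U.
synthetic = "AA" + "GUGUGGG" + "A" * 12 + "CCCACAC" + "UUUUUU"
hit = find_stem_motif(synthetic)
```

A match on sequence alone does not establish a terminator; the structure would still need to pass the folding, stability and poly-U filters.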

Figure 6

Predicted attenuation termination structure in the upstream region of the putative infC-rpmI-rplT operon. (a) Order of genes. Only intergenic regions are drawn to scale, and the lengths of the intergenic regions are given below the line. Orthologous genes are indicated in the same colors. Hypothetical genes and the other non-orthologous genes are indicated by "hyp" and their gene IDs, respectively. Abbreviations for genomes: Bs, B. subtilis; Bh, B. halodurans; Li, Listeria innocua; Sa, Staphylococcus aureus; Ca, Clostridium acetobutylicum; Ll, Lactococcus lactis; Sp, Streptococcus pneumoniae. (b) Predicted attenuation termination structures. Base pairs are indicated by red dots between the base codes. Base numbering shows the distance from the start codon of the downstream gene. Poly-U runs just downstream of the stem-loop structure are colored in green. Weakly conserved segments are colored in red. Abbreviations for genomes are the same as in (a).

Conservation of predicted attenuation termination structures is also observed in the upstream regions of the possible operon containing the nusA gene (Figure 7a). Four out of seven genomes contain predicted attenuator structures upstream of the hypothetical protein gene (ylxS in B. subtilis). Stem-loop structures are also found in the remaining three genomes, although these structures do not pass the filters. In this example too, the location of the structures relative to the transcription start site of the downstream gene and the sequences themselves vary significantly. In these stem sequences, the segment GUGGG (GAGCG in L. lactis and GAGGC in S. pneumoniae) is conserved in the predicted operon containing the nusA gene (Figure 7b). Interestingly, these 5-base segments are identical or very similar to the segments in the stem-loop structures located upstream of infC (Figure 6b). The proteins encoded by the genes in these two operons are involved in transcription. The conservation of the sequence segments in the predicted attenuation terminator structures for the infC-rpmI-rplT operon and the operon containing nusA implies that there exists a common regulatory mechanism that recognizes the stem-loop structure and that would regulate both operons in the same manner.

Figure 7

Predicted attenuation termination structure in the upstream region of the ylxS gene. (a) Order of genes. Predicted stem-loop structures with statistical significance are indicated in blue, and the other structures, which either do not pass the filters or have lower significance, are indicated in red. For other explanations, see the legend to Figure 6a. (b) Predicted attenuation termination structures. See the legend to Figure 6b for the explanation.

Distribution and Conservation of Attenuators in Proteobacteria

Several aspects of the conservation of attenuators are immediately apparent from our analysis of gram-positive bacteria. First, the distribution of attenuation or antitermination regulation is not well conserved across gram-positive bacteria; second, even in conserved regulatory systems, sequence and structure conservation is weak. The same holds true for proteobacteria. Of the 14 genes in E. coli (see Table 5a) known to be regulated by attenuation or antitermination, none have attenuators predicted upstream of orthologs in all of the four other proteobacteria genomes. Six have attenuators predicted upstream of orthologs in at least one of the other four genomes. Three are genes that have orthologs in all four other genomes, but these have no predicted attenuators. The remaining five genes in E. coli either have no known orthologs in the other genomes, or their orthologs have a spotty distribution and no predicted attenuators. Closer inspection by hand confirms this conclusion. Table 5b is a list of all predicted attenuators in each of the five genomes of the gamma division of proteobacteria in which a similar attenuator is predicted for an ortholog in another genome. As shown in this table, attenuation and antitermination appear to be poorly conserved as mechanisms of regulation in analogous operons in proteobacterial genomes. Of the total of 475 genes and their orthologs in these five genomes that have predicted attenuators, only 36 are shared upstream of orthologs in two or more genomes (Tables 3, 5a and 5b).

Table 5a List of known attenuators in E. coli compared with predictions in four other genomes of proteobacteria (gamma subdivision)
Table 5b List of all orthologous genes in the five proteobacteria (gamma subdivision) genomes in which two or more genomes share predicted attenuators

Previous research on specific systems has reported that attenuation and antitermination regulation in some operons of E. coli is only mildly conserved across gamma division proteobacteria. The regulation of the rpsJ operon [21] and of the trpE and pheA operons [22] of E. coli has been shown to have a spotty distribution and to be weakly conserved across proteobacteria. As shown in Tables 2, 5a and 5b, we have been able to extend this analysis of attenuation and antitermination to most such systems in proteobacteria, and have shown that this holds true for all known attenuation and antitermination regulatory mechanisms in E. coli and for other predicted mechanisms in additional gamma division genomes. An example of the low sequence conservation of attenuators and regulation is given in Figure 8. Figure 8a shows one of the more conserved attenuators, that of the hisG operon. This operon and its regulatory mechanism are well characterized in E. coli [23], and our analysis predicts similar mechanisms of attenuation regulation in V. cholerae and H. influenzae. The predicted attenuators have a conserved position (approximately 40-50 bp upstream of the start codon of the hisG gene) and a conserved stem sequence. Though the surrounding intergenic regions cannot be aligned, V. cholerae and H. influenzae do have possible amino acid leader sequences with a run of histidines that is characteristic of the attenuation regulation mechanism in E. coli. Predicted attenuators were not found in the other three gamma subdivision proteobacteria genomes, those of P. aeruginosa, N. meningitidis and X. fastidiosa. In P. aeruginosa the intergenic region upstream of the hisG ortholog is only 17 bp in length; in X. fastidiosa the orthologous gene overlaps with the upstream ORF; and though the analogous N. meningitidis intergenic region is of sufficient length, no attenuator is predicted.

Figure 8

Predicted attenuation termination structure in the upstream region of the hisG gene in E. coli. (a) Order of genes. Predicted stem-loop structures with statistical significance are indicated in blue. For other explanations, see the legend to Figure 6a. Abbreviations for genomes: Ec, Escherichia coli; Hi, Haemophilus influenzae; Vc, Vibrio cholerae; Pa, Pseudomonas aeruginosa; Xf, Xylella fastidiosa; Nm, Neisseria meningitidis. (b) Predicted attenuation termination structures. See the legend to Figure 6b for the explanation.

Discussion

In summary, attenuation terminators reveal a striking pattern distinct from both folds of randomly shuffled sequences and folds of intragenic regions. In relation to their length, terminator folds have a much lower free energy (ΔG) than random folds or those within cistronic regions. This enables us to differentiate and predict many novel attenuation regulation sites in a variety of putative operons, and would represent a 5-fold increase in the number of known attenuation structures in B. subtilis and E. coli. This measure works in two highly divergent species with distinct mechanisms of attenuation and antitermination and different GC content. Hence, it is feasible to extend such analysis to all bacterial genomes. Extending the study to a diverse collection of an additional 24 complete genomes suggests that attenuation and antitermination are likely used as a form of regulation in most genomes, with the possible exception of M. tuberculosis.

The standard transcription termination mechanism likely arose early in the evolution of bacteria, and regulation by attenuation and antitermination most probably arose by co-opting existing terminators and the transcription termination mechanism. How and when attenuation and antitermination evolved in individual genomes and taxa, and how these mechanisms arose in specific operons and biochemical systems, are questions that can now be analyzed further and are a subject of future work.

This study also allows us to make strong predictions for specific instances of attenuation and antitermination regulation. Previously, Merino and Yanofsky [14] published a chapter in Bacillus subtilis and its Closest Relatives: From Genes to Cells that examined orthologs of genes known to be regulated by attenuation and antitermination and found a significant number of putative attenuators. These were also reported on a web site (http://cmgm.stanford.edu/~merino). The results reported here confirm most of those predictions. In addition, as shown in this paper, because the conservation of attenuation and antitermination regulation across taxa is weak, looking only at orthologous genes in other genomes will miss many potential attenuators. This report greatly extends the predictions of attenuation and antitermination, and these predictions can be very useful in directing future research. As a case in point, one prediction made by our study was attenuation regulation of the gene ctrA (pyrG) in B. subtilis. In the course of our study, a recent report confirmed this prediction [23].

This research also enables a better understanding of the evolution and distribution of attenuation and antitermination regulation. Such predictions can be very beneficial in directing research into operon regulation, assisting in predicting gene function, understanding the evolution of regulation in general and heightening our understanding of regulons.

Materials and Methods

Genome sequence data

Genome sequences and their annotations were obtained from GenBank [24] (species and accession numbers: A. fulgidus AE000782; B. burgdorferi AE000783; B. halodurans BA000004; B. subtilis AL009126; Buchnera sp. AP000398; C. acetobutylicum AE001437; C. jejuni AL111168; C. pneumoniae AE001363; D. radiodurans chromosome 1 AE000513; E. coli K-12 U00096; H. influenzae L42023; H. pylori J99 AE001439; L. innocua AL592022; L. lactis AE005176; M. genitalium L43967; M. jannaschii L77117; M. tuberculosis AL123456; N. meningitidis MC58 AE002098; P. abyssi AL096836; P. aeruginosa AE004091; S. aureus Mu50 BA000017; S. pneumoniae AE005672; Synechocystis sp. AB001339; T. maritima AE000512; V. cholerae chromosome 1 AE003852; X. fastidiosa AE003849).

RNA folding and filters

For each gene in a genome, we collected the upstream sequence segment extending up to 300 residues or to the neighboring ORF, whichever came first. ORFs of fewer than 50 amino acids were treated as part of the intergenic region, since some attenuation mechanisms are coupled with the synthesis of leader peptides. The total number of these segments was 3560 in B. subtilis and 3613 in E. coli. Stem-loop structures in these segments were predicted using the RNAfold program [25]. As a reference, each upstream sequence was shuffled to produce random sequences with the same base composition, which were folded in the same manner. Using the characteristics of the known structures, we derived three filters based on location and poly-U runs, which were applied to the collected folds to improve the chance of finding new attenuation structures (Figure 9). The filtering process retained 44 of the 46 known attenuation terminators. The three filters, based on known attenuation terminators in B. subtilis and E. coli, are: (i) a poly-U stretch (≥ 4 Us) must be located within the region from the 10 residues at the top of the stem to 15 residues downstream of the stem; (ii) the length of the 3' sequence from the end of the stem to the start position of the downstream gene must be ≤ 170; and (iii) the length of the 5' sequence from the beginning of the intergenic region to the 5' start of the stem must be ≥ 30 if the upstream gene is in the same orientation as the downstream gene (to partially eliminate 'standard' transcription terminator structures). Available public programs for transcription terminator prediction, such as TransTerm [13], were not useful in this analysis: the program takes the direction of neighboring genes and distance filters into account, and could not be applied to the prediction of termination structures located upstream of a gene.
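
The three filters can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that each upstream segment is a string starting at the beginning of the intergenic region and ending immediately before the start of the downstream gene, and that a hairpin is described by the 0-based positions of its first and last stem bases. The names StemLoop, has_poly_u and passes_filters are hypothetical, and the poly-U window (the last 10 stem residues plus the 15 residues downstream) is one possible reading of filter (i).

```python
from dataclasses import dataclass


@dataclass
class StemLoop:
    """Hypothetical hairpin record; positions are 0-based within the segment."""
    stem_start: int  # first base of the 5' arm of the stem
    stem_end: int    # last base of the 3' arm of the stem


def has_poly_u(segment: str, stem_end: int, min_run: int = 4,
               inside: int = 10, downstream: int = 15) -> bool:
    """Filter (i): a run of >= min_run Us lies in the window covering the last
    `inside` stem residues and the `downstream` residues after the stem
    (assumed reading of the filter)."""
    window_start = max(0, stem_end - inside + 1)
    window_end = min(len(segment), stem_end + 1 + downstream)
    # Accept DNA input by treating T as U.
    window = segment[window_start:window_end].upper().replace("T", "U")
    return "U" * min_run in window


def passes_filters(segment: str, hairpin: StemLoop,
                   upstream_same_orientation: bool) -> bool:
    """Apply filters (i)-(iii) to one predicted stem-loop."""
    # Filter (ii): 3' distance from the end of the stem to the gene start.
    three_prime_len = len(segment) - (hairpin.stem_end + 1)
    # Filter (iii): 5' distance from the start of the region to the stem.
    five_prime_len = hairpin.stem_start
    if not has_poly_u(segment, hairpin.stem_end):
        return False
    if three_prime_len > 170:
        return False
    if upstream_same_orientation and five_prime_len < 30:
        return False
    return True


# Toy segment: 40-residue leader, an 8-bp stem with a 4-residue loop,
# a run of six Us, then 50 residues before the gene start.
toy = "A" * 40 + "GCCGCCGC" + "AAAA" + "GCGGCGGC" + "UUUUUU" + "A" * 50
print(passes_filters(toy, StemLoop(40, 59), upstream_same_orientation=True))
```

In this sketch filter (iii) is skipped when the upstream gene is in the opposite orientation, following the condition stated in the text.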

Figure 9

Schematic drawing of the analysis of upstream sequence segments and definition of filters as described in Materials and Methods.

To further support the prediction of attenuation terminators, we also applied promoter prediction: if a promoter is predicted upstream of a predicted attenuation terminator, the structure is more likely to be a real attenuation terminator. NNPP version 2.2 [26, 27] was used for the prediction of prokaryotic promoters.

Significance measurement

To evaluate the significance of stem-loop structures in upstream sequences, we used the distribution of such structures in the randomly shuffled sequences, which have the same base composition as the originals and differ only in the order of the bases. First, all the stem-loop structures found in the shuffled sequences were plotted according to their stability and stem length (figures 1 and 3). Then the line running along the direction of largest variance was calculated by principal component analysis [28]. Using the standard deviation of the distribution of stem-loop structures from the shuffled sequences around this line, a Z-score was calculated for each stem-loop structure. We took stem-loop structures in the upstream sequences with Z ≤ -2 as significant.
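
The Z-score calculation can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' implementation. It assumes that each stem-loop is a (stem length, ΔG) point, that the deviation from the principal axis is the signed perpendicular distance, and that the two variables are used unscaled; none of these details is stated in the text. The names AxisModel, fit_axis and z_score are hypothetical.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class AxisModel(NamedTuple):
    mean_x: float  # mean stem length of the shuffled-sequence folds
    mean_y: float  # mean free energy of the shuffled-sequence folds
    theta: float   # angle of the first principal axis, in radians
    sd: float      # standard deviation of distances from that axis


def signed_distance(x: float, y: float, mean_x: float, mean_y: float,
                    theta: float) -> float:
    """Signed perpendicular distance from the principal axis.
    Negative values lie below the axis (lower free energy)."""
    return -(x - mean_x) * math.sin(theta) + (y - mean_y) * math.cos(theta)


def fit_axis(points: Sequence[Tuple[float, float]]) -> AxisModel:
    """Fit the direction of largest variance to (stem length, dG) points
    from shuffled sequences, using the closed form for 2-D PCA."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    residuals = [signed_distance(x, y, mx, my, theta) for x, y in points]
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 1))
    return AxisModel(mx, my, theta, sd)


def z_score(x: float, y: float, model: AxisModel) -> float:
    """Z-score of one stem-loop relative to the shuffled distribution."""
    return signed_distance(x, y, model.mean_x, model.mean_y, model.theta) / model.sd


# Made-up (stem length, dG) values standing in for shuffled-sequence folds.
shuffled = [(4, -2.0), (5, -3.1), (6, -3.4), (7, -4.6),
            (8, -5.0), (9, -6.2), (10, -6.4), (11, -7.6)]
model = fit_axis(shuffled)
# A stem of length 8 that is far more stable than shuffled folds of that length.
print(z_score(8, -15.0, model) <= -2)
```

With stem length on the x axis and ΔG on the y axis, a fold that is more stable than expected for its length falls below the fitted line and receives a negative Z-score, which matches the Z ≤ -2 threshold.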

Identification of orthologous genes

To identify orthologous gene pairs between a pair of genomes, we first carried out an all-against-all comparison between the two sets of proteins, each set comprising the proteins of a whole genome. We used the BLASTP program [20] for this comparison. Only hits with BLAST E ≤ 0.001 were collected as significant. Then, among those significant hits, a pair of genes was defined as orthologous if it satisfied the "bi-directional best hit" criterion [17]. For a group of more than two genomes, a group of genes, one from each genome, was defined as orthologous if all possible pairs of genes satisfied the bi-directional significant best hit criterion.
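
The bi-directional best hit rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes that BLASTP results have already been parsed into (query, subject, E-value) tuples, that the "best" hit is the one with the lowest E-value (the text does not say whether E-value or bit score was used), and that gene identifiers are unique across genomes. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence, Set, Tuple

Hit = Tuple[str, str, float]  # (query gene, subject gene, BLAST E-value)


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-3) -> Dict[str, str]:
    """Map each query to its best significant subject (lowest E-value;
    ties keep the first hit seen)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # not a significant hit
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(hits_ab: Iterable[Hit], hits_ba: Iterable[Hit],
                            max_evalue: float = 1e-3) -> Set[FrozenSet[str]]:
    """Return gene pairs that are each other's best significant hit."""
    best_ab = best_hits(hits_ab, max_evalue)
    best_ba = best_hits(hits_ba, max_evalue)
    return {frozenset((a, b)) for a, b in best_ab.items()
            if best_ba.get(b) == a}


def is_ortholog_group(genes: Sequence[str],
                      bbh_pairs: Set[FrozenSet[str]]) -> bool:
    """For more than two genomes: every pair of genes in the group
    (one gene per genome) must be a bi-directional best hit."""
    return all(frozenset(pair) in bbh_pairs for pair in combinations(genes, 2))


# Made-up hits between genome A and genome B.
hits_ab = [("a1", "b1", 1e-50), ("a1", "b2", 1e-10),
           ("a2", "b2", 1e-20), ("a3", "b3", 0.5)]
hits_ba = [("b1", "a1", 1e-48), ("b2", "a1", 1e-9), ("b3", "a3", 1e-30)]
pairs = bidirectional_best_hits(hits_ab, hits_ba)
print(pairs)
```

In the toy data, a2 and b2 are not paired because the best hit of b2 is a1, and a3 and b3 are not paired because the hit from a3 to b3 fails the E ≤ 0.001 cutoff.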

References

  1. Henkin TM: Control of transcription termination in prokaryotes. Annu Rev Genet. 1996, 30: 35-57. 10.1146/annurev.genet.30.1.35.
  2. Yanofsky C: Transcription attenuation: once viewed as a novel regulatory strategy. J Bacteriol. 2000, 182: 1-8.
  3. Wagner R: Transcription Regulation in Prokaryotes. Oxford: Oxford University Press. 2000.
  4. Yanofsky C: Attenuation in the control of expression of bacterial operons. Nature. 1981, 289: 751-758.
  5. Yanofsky C, Konan KV, Sarsero JP: Some novel transcription attenuation mechanisms used by bacteria. Biochimie. 1996, 78: 1017-1024. 10.1016/S0300-9084(97)86725-9.
  6. Babitzke P: Regulation of tryptophan biosynthesis: Trp-ing the TRAP or how Bacillus subtilis reinvented the wheel. Mol Microbiol. 1997, 26: 1-9. 10.1046/j.1365-2958.1997.5541915.x.
  7. Du H, Yakhnin A, Dharmaraj S, Babitzke P: trp RNA-binding attenuation protein-5' stem-loop RNA interaction is required for proper transcription attenuation control of the Bacillus subtilis trpEDCFBA operon. J Bacteriol. 2000, 182: 1819-1827. 10.1128/JB.182.7.1819-1827.2000.
  8. Allen T, Shen P, Samsel L, Liu R, Lindahl L, Zengel JM: Phylogenetic analysis of L4-mediated autogenous control of the S10 ribosomal protein operon. J Bacteriol. 1999, 181: 6124-6132.
  9. Zengel JM, Lindahl L: A hairpin structure upstream of the terminator hairpin required for ribosomal protein L4-mediated attenuation control of the S10 operon of Escherichia coli. J Bacteriol. 1996, 178: 2383-238.
  10. Carafa YdA, Brody E, Thermes C: Prediction of rho-independent Escherichia coli transcription terminators: A statistical analysis of their RNA stem-loop structures. J Mol Biol. 1990, 216: 835-858.
  11. Wilson KS, von Hippel PH: Transcription termination at intrinsic terminators: The role of the RNA hairpin. Proc Natl Acad Sci USA. 1995, 92: 8793-8797.
  12. Yarnell WS, Roberts JW: Mechanism of intrinsic transcription termination and antitermination. Science. 1999, 284: 611-615. 10.1126/science.284.5414.611.
  13. Ermolaeva MD, Khalak HG, White O, Smith HO, Salzberg SL: Prediction of transcription terminators in bacterial genomes. J Mol Biol. 2000, 301: 27-33. 10.1006/jmbi.2000.3836.
  14. Merino E, Yanofsky C: Regulation by Termination-Antitermination: a Genomic Approach. In Bacillus subtilis and its Closest Relatives: From Genes to Cells. Washington D.C.: American Society of Microbiology. 2001.
  15. Chopin A, Biaudet V, Ehrlich SD: Analysis of the Bacillus subtilis genome sequence reveals nine new T-box leaders. Mol Microbiol. 1998, 29: 661-669. 10.1046/j.1365-2958.1998.00911.x.
  16. Grundy FJ, Henkin TM: The S box regulon: a new global transcription termination control system for methionine and cysteine biosynthesis genes in gram-positive bacteria. Mol Microbiol. 1998, 30: 737-749. 10.1046/j.1365-2958.1998.01105.x.
  17. Snel B, Lehmann G, Bork P, Huynen M: STRING: a web-server to retrieve and display the repeatedly occurring neighborhood of a gene. Nucleic Acids Res. 2000, 28: 3442-3444. 10.1093/nar/28.18.3442.
  18. Unniraman S, Prakash R, Nagaraja V: Alternate paradigm for intrinsic transcription termination in eubacteria. J Biol Chem. 2001, 276: 41850-41855. 10.1074/jbc.M106252200.
  19. Stover CK, et al: Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature. 2000, 406: 959-964. 10.1038/35023079.
  20. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
  21. Allen T, Shen P, Samsel L, Liu R, Lindahl L, Zengel JM: Phylogenetic analysis of L4-mediated autogenous control of the S10 ribosomal protein operon. J Bacteriol. 1999, 181: 6124-6132.
  22. Panina EM, Vitreschak AG, Mironov AA, Gelfand MS: Regulation of aromatic amino acid biosynthesis in gamma-proteobacteria. J Mol Microbiol Biotechnol. 2001, 3: 529-543.
  23. Meng Q, Switzer R: Regulation of Transcription of the Bacillus subtilis pyrG Gene, encoding cytidine triphosphate synthetase. J Bacteriol. 2001, 183: 5513-5522. 10.1128/JB.183.19.5513-5522.2001.
  24. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA, Wheeler DL: GenBank. Nucleic Acids Res. 2002, 30: 17-20. 10.1093/nar/30.1.17.
  25. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer S, Tacker M, Schuster P: Fast folding and comparison of RNA secondary structures. Monatsh Chem. 1994, 125: 167-188.
  26. Reese MG: Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Comput Chem. 2001, 26: 51-56. 10.1016/S0097-8485(01)00099-7.
  27. Neural Network Promoter Prediction. [http://www.fruitfly.org/seq_tools/promoter.html]
  28. Afifi AA, Clark V: Computer-aided multivariate analysis. Boca Raton, Florida: Chapman & Hall. 1996, 3

Author information

Corresponding author

Correspondence to Warren C Lathe III.

About this article

Cite this article

Lathe, W.C., Suyama, M. & Bork, P. Identification of attenuation and antitermination regulation in prokaryotes. Genome Biol 3, preprint0003.1 (2002). https://doi.org/10.1186/gb-2002-3-6-preprint0003

Keywords

  • Intergenic Region
  • Orthologous Gene
  • Transcriptional Unit
  • Attenuation Regulation
  • Putative Promoter Sequence