Conservation of the binding site for the arginine repressor in all bacterial lineages
© Makarova et al., licensee BioMed Central Ltd 2001
Received: 30 October 2000
Accepted: 6 February 2001
Published: 22 March 2001
The arginine repressor ArgR/AhrC is a transcription factor universally conserved in bacterial genomes. Its recognition signal (the ARG box), a weak palindrome, is also conserved between genomes, despite a very low degree of similarity between individual sites within a genome. Thus, the arginine repressor differs from two other universal transcription factors: HrcA, whose recognition signal is very strongly conserved both within and between genomes, and LexA/DinR, whose signal is strongly conserved within, but not between, genomes. The arginine regulon is well studied in Escherichia coli and, to some extent, in Bacillus subtilis and some other genomes. Here, we apply the comparative genomic approach to the prediction of ArgR-binding sites in all completely sequenced bacterial genomes.
Orthologs of ArgR/AhrC were identified in the complete genomes of E. coli, Haemophilus influenzae, Vibrio cholerae, B. subtilis, Mycobacterium tuberculosis, Thermotoga maritima, Chlamydia pneumoniae and Deinococcus radiodurans. Candidate arginine repressor binding sites were identified upstream of arginine transport and metabolism genes.
We found that the ArgR/AhrC recognition signal is conserved in all genomes that contain genes encoding orthologous transcription factors of this family. All genomes studied except M. tuberculosis contain ABC transport cassettes (related to the Art system of E. coli) belonging to the candidate arginine regulons.
Table 1. Comparison of three transcriptional regulator families with predominantly single representatives from each bacterial genome
Column headings: Pattern of species*; Type of DNA-binding domain and fused domain; DNA-binding domain conservation†; Sites per genome
LexA/DinR: 'winged helix'¶ HTH and serine protease (S24 family); conservation 2.20 ± 0.28; SOS box (Gram-negative): CTGTatatatatMCAG; Cheo box (Gram-positive): cGAACrnryGTTYg
HrcA: predicted 'winged helix' HTH and uncharacterized domain possibly responsible for activation by chaperonin GroE; conservation 1.72 ± 0.22; CIRCE box: TTAGCACTCn9GAGTGCTAA
ArgR/AhrC: 'winged helix' HTH and arginine-binding domain; conservation 1.81 ± 0.22; ARG box: TGMATwwwwATKCA
The signals recognized by LexA in Gram-negative bacteria and by its ortholog DinR in Gram-positive bacteria (the SOS box and the Cheo box, respectively) are completely different. Accordingly, the DNA-binding domains of these proteins are divergent (Table 1). The heat-shock regulator HrcA binds CIRCE elements located upstream of genes encoding heat-shock proteins (molecular chaperones) in many different genomes [5,6]; in the mycoplasmas, HrcA also regulates heat-shock protease genes. The CIRCE signal is very specific (two complementary nonamers separated by a 9 base pair (bp) spacer) and is very highly conserved in all genomes that encode HrcA: all known and predicted sites have no more than five, and usually fewer than three, mismatches to the consensus. The amino acid sequence of HrcA is conserved as well (Table 1).
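The mismatch criterion for CIRCE elements can be made concrete by counting, position by position, whether a candidate site is compatible with the degenerate (IUPAC) consensus from Table 1. The following Python sketch is illustrative only: the helper names, the handling of the "n9" spacer notation and the example sequences are assumptions, not the procedure used in this study.

```python
import re

# IUPAC nucleotide codes (subset sufficient for the consensus signals in Table 1)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "N": "ACGT",
}

# CIRCE consensus from Table 1; 'n9' denotes a 9-bp spacer of arbitrary sequence
CIRCE = "TTAGCACTCn9GAGTGCTAA"


def expand_spacer(consensus: str) -> str:
    """Expand a spacer written as n<k> (e.g. 'n9') into k copies of 'N'."""
    return re.sub(r"[nN](\d+)", lambda m: "N" * int(m.group(1)), consensus)


def count_mismatches(site: str, consensus: str) -> int:
    """Count positions at which `site` is incompatible with the IUPAC consensus.

    Lower-case letters in the consensus are treated like upper-case ones.
    """
    pattern = expand_spacer(consensus).upper()
    if len(site) != len(pattern):
        raise ValueError("site and consensus must have the same length")
    return sum(base not in IUPAC[code] for base, code in zip(site.upper(), pattern))


def is_circe_like(site: str, max_mismatches: int = 5) -> bool:
    """Hypothetical helper applying the 'no more than five mismatches' cut-off."""
    return count_mismatches(site, CIRCE) <= max_mismatches


perfect = "TTAGCACTC" + "GATTACAGA" + "GAGTGCTAA"
mutated = "TCAGCACTC" + "GATTACAGA" + "GAGTGCTTT"
print(count_mismatches(perfect, CIRCE), count_mismatches(mutated, CIRCE))  # 0 3
```

The same function works for the other consensus strings in Table 1; for example, TGAATAAAAATTCA has zero mismatches to the ARG box consensus TGMATwwwwATKCA.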
The arginine regulon, which is controlled by the arginine repressor ArgR/AhrC, represents an evolutionary strategy distinct from those of both the SOS and the heat-shock regulons. The DNA-binding domains of the ArgR/AhrC family are less conserved than those of the HrcA family, but more conserved than those of the LexA/DinR family (Table 1, DNA-binding domain conservation). The DNA signals recognized by ArgR/AhrC are also similar in at least several bacterial lineages [8,9,10,11]. These sites often occur in pairs [12,13,14,15], although single-box sites have also been shown to bind ArgR/AhrC, for example the sites in the catabolic operons of B. subtilis, the arginine deiminase pathway operon of Bacillus licheniformis, and the cer recombination region of the E. coli plasmid ColE1 ([16,17]; see also the study of mutated ArgR). Unlike the CIRCE element, the ARG box appears to be weakly conserved, even within a genome, and specific recognition is often achieved through cooperative interactions between tandem sites, as shown in both experimental [9,12,13] and statistical studies. The set of ARG boxes from different genomes, however, is fairly homogeneous, and indeed arginine repressors from different bacteria appear to be at least partially interchangeable within major taxonomic groups: there is some cross-binding between ArgR and AhrC; ArgR, but not AhrC, binds to the Thermus thermophilus sites; and AhrC binds to the Streptomyces coelicolor sites. The ARG box consensus was described as TNTGAATWWWWATTCANW in E. coli [8,12], CATGAATAAAAATKCAAK in B. subtilis [9,10] and AWTGCATRWWYATGCAWT in Streptomycetes (where W = A or T, K = G or T, M = A or C, R = A or G, Y = C or T, and N = any base; Table 1). In addition, binding of ArgR homologs to sites similar to ARG boxes has been reported for other Bacillus species (B. licheniformis and B. stearothermophilus [23,24]) and for Salmonella typhimurium. Several ArgR-binding sites were predicted, on the basis of similarity to the E. coli consensus, in the upstream regions of various genes involved in arginine metabolism in Moritella.
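How palindromic each of these consensus sequences is can be checked by comparing it with its own reverse complement under IUPAC rules. In the Python sketch below, an "asymmetric position" is a hypothetical measure defined only for illustration: a position at which the consensus and its reverse complement admit no common base. It is not a procedure from this study.

```python
# IUPAC codes and complements (subset sufficient for the ARG box consensuses)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYMKWSN", "TGCAYRKMWSN")


def reverse_complement(consensus: str) -> str:
    """Reverse complement of an IUPAC consensus (returned in upper case)."""
    return consensus.upper().translate(COMPLEMENT)[::-1]


def asymmetric_positions(consensus: str) -> list:
    """0-based positions where the consensus and its reverse complement are incompatible."""
    seq = consensus.upper()
    rc = reverse_complement(seq)
    return [i for i, (a, b) in enumerate(zip(seq, rc)) if not set(IUPAC[a]) & set(IUPAC[b])]


CONSENSUS = {
    "E. coli": "TNTGAATWWWWATTCANW",
    "B. subtilis": "CATGAATAAAAATKCAAK",
    "Streptomycetes": "AWTGCATRWWYATGCAWT",
    "Table 1": "TGMATwwwwATKCA",
}
for name, cons in CONSENSUS.items():
    exact = reverse_complement(cons) == cons.upper()
    print(f"{name:15s} exact palindrome: {exact}, asymmetric: {asymmetric_positions(cons)}")
```

Under this measure, the Streptomycetes and Table 1 consensuses are exact palindromes, the E. coli consensus is compatible with its reverse complement at every position, and the B. subtilis consensus is asymmetric at six positions, mostly in its A-rich centre.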
In a previous study, we used comparative genomic analysis of regulatory signals to predict the gene composition of the arginine regulon of Haemophilus influenzae, using the well-characterized E. coli regulon as the starting point. Here we extend this analysis to explore the conservation of the ARG box in all bacteria that encode an ortholog of the ArgR repressor.
Results and discussion
The comparative approach to the analysis of regulation is based on the assumption that regulons (sets of co-regulated genes) are conserved in genomes containing orthologs of the relevant regulatory proteins. Thus, true binding sites for the regulator should occur upstream of orthologous genes in many genomes, whereas false positives are scattered at random across each genome and rarely coincide upstream of orthologs. This provides a consistency check that sharply increases the accuracy of prediction.
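A back-of-the-envelope calculation shows why this consistency check is effective. Assume, purely for illustration, that in each genome a spurious candidate site appears upstream of a given gene independently with probability p, and apply a threshold of candidate sites in five of eight genomes. The chance that false positives alone pass the threshold is then a binomial tail probability. In this Python sketch, p = 0.1 is a hypothetical value, not an estimate from this study.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-genome probability of a spurious site upstream of a given gene
P_SPURIOUS = 0.1
print(f"single genome:     {P_SPURIOUS:.3f}")
print(f">= 5 of 8 genomes: {prob_at_least(5, 8, P_SPURIOUS):.2e}")
```

Under these assumptions, requiring agreement across genomes lowers the spurious rate from 10% to about 0.04%. Real upstream regions are neither independent nor random, so the numbers only illustrate the direction and rough size of the effect.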
Candidate ARG boxes upstream of arginine metabolism-related genes and operons
In addition to the previously characterized ARG boxes in B. subtilis, we identified a candidate ARG box upstream of the yqjN gene (Figure 1, Table 2). yqjN is probably the product of a recent duplication of rocB, which encodes an arginine utilization protein of unknown biochemical function. Thus, it is likely that YqjN has the same function as RocB and is also involved in arginine degradation.
In the Pseudomonas aeruginosa genome there are three systems closely related to the above transporters. One is orthologous to hisJQMP and the second to artPIQM; neither has been characterized experimentally. The third system, aotQJMP, is closer to hisJQMP than to artPIQM. It encodes a transporter for arginine and ornithine, but not lysine, and is located within the arginine and ornithine catabolism locus aot-aru. The aot system is positively regulated by an activator, ArgR, which is encoded by the distal gene of the aotJQMOPargR operon. This activator belongs to the AraC family and is not related to the ArgR repressor of E. coli.
The situation with the C. pneumoniae genome is unclear. It contains the argR gene but no genes for arginine metabolism. There is a stand-alone artJ gene (encoding the periplasmic component of an ABC cassette) and two genes annotated as glnPQ immediately downstream of argR (encoding the transmembrane and ATPase components, respectively). In fact, glnP of C. pneumoniae is the bidirectional best hit of the E. coli gene yecC, located in the flagellar locus. ABC transporters are not easily amenable to orthology analysis, as their substrate specificity may change rapidly. As mentioned above, positional and regulatory analysis is often the only computational technique for determining the cellular role of ABC cassettes before experimental verification. We note a pair of ARG boxes upstream of glnPQ and two ARG boxes with lower z-scores upstream of the artJ operon of C. pneumoniae. It is therefore tempting to predict that these genes in fact encode an arginine transport system regulated by ArgR. We feel, however, that this prediction cannot be accepted without experimental verification, especially in view of two complicating observations. First, both the artJ and glnPQ operons are conserved in the genome of C. trachomatis, although the latter has no gene for ArgR. Second, ArgR of C. pneumoniae is closer to the ArgR of gamma-proteobacteria than to the AhrC/ArgR of Gram-positive bacteria, yet the ARG boxes of C. pneumoniae are detected with the Bacillus profile but not with the gamma-proteobacterial profile.
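The bidirectional best hit criterion mentioned above can be sketched as follows: gene a in one genome and gene b in another form a bidirectional best hit when each is the other's highest-scoring match. This minimal Python sketch assumes that pairwise similarity scores (for example, BLAST bit scores) have already been computed; the gene labels and scores are hypothetical, and ties are resolved arbitrarily.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to similarity scores; on ties the
    first pair encountered is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores for two genes in genome A and two in genome B
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 180, ("a2", "b1"): 120, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 245, ("b1", "a2"): 118, ("b2", "a1"): 175, ("b2", "a2"): 88}
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here the best hit of a2 is b1, but the best hit of b1 is a1, so the one-directional pair (a2, b1) is rejected.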
Taken together, these data suggest that ARG regulons represent an interesting (and possibly unique) case that could be considered an evolutionary state intermediate between those of the HrcA and LexA/DinR regulons. ArgR orthologs retain high amino acid similarity within the major taxonomic groups and are identifiable between these groups, whereas ARG box conservation is low, although sufficient to be detected in diverged bacterial lineages. Nevertheless, this state seems to be stable, and it is not clear what evolutionary forces are responsible for its stability. In this respect, it is noteworthy that the structural type of a protein's DNA-binding domain apparently does not determine how the protein and its recognition site evolve together. All three regulator families discussed here, as well as many others, contain the so-called 'winged helix' DNA-binding domain, yet the conservation of this domain is not correlated with the conservation of its binding site (Table 1).
The composition of the ARG regulons is known to vary between bacteria, mainly because of the diversity of arginine degradation pathways and the presence of species-specific paralogs. This raises the question of the origin of the 'additional' ARG boxes. Because the ArgR-binding signal is weakly conserved, some of these sites could have arisen convergently. Moreover, each genome contains a large number of potential ARG box-like sequences that could become functional sites when they come to lie upstream of an arginine metabolism gene following chromosomal rearrangements.
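The abundance of ARG box-like sequences can be quantified under a simple null model. The Python sketch below assumes random sequence with equal base frequencies and independent positions (both simplifying assumptions, not properties of real genomes); the function names are hypothetical. It computes the distribution of the number of mismatches to the Table 1 consensus and the expected number of matching words in a sequence of a given length. Because TGMATWWWWATKCA is its own reverse complement, scanning one strand is sufficient.

```python
# Number of bases allowed by each IUPAC code (subset used here)
IUPAC_SIZE = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2,
              "M": 2, "K": 2, "W": 2, "S": 2, "N": 4}

ARG_BOX = "TGMATwwwwATKCA"  # consensus from Table 1


def mismatch_distribution(consensus):
    """dist[m] = probability that a random word has exactly m mismatches.

    Assumes equal base frequencies and independent positions.
    """
    dist = [1.0]
    for code in consensus.upper():
        p_match = IUPAC_SIZE[code] / 4
        new = [0.0] * (len(dist) + 1)
        for m, prob in enumerate(dist):
            new[m] += prob * p_match            # this position matches
            new[m + 1] += prob * (1 - p_match)  # this position mismatches
        dist = new
    return dist


def expected_hits(consensus, sequence_length, max_mismatches):
    """Expected number of words (one strand) with at most max_mismatches mismatches."""
    dist = mismatch_distribution(consensus)
    n_windows = sequence_length - len(consensus) + 1
    return n_windows * sum(dist[: max_mismatches + 1])


for k in range(4):
    print(k, round(expected_hits(ARG_BOX, 4_600_000, k), 1))
```

For a 4.6 Mb sequence (roughly the size of the E. coli chromosome), this null model gives about one exact match but roughly 485 words with at most two mismatches, which illustrates how many weak ARG box-like sequences are available by chance.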
In contrast, CIRCE elements appear to be direct descendants of the ancient regulon present in the common ancestor of the Bacteria: the composition of the CIRCE regulon varies minimally, and the few additional sites found in some genomes are apparently products of duplication. Most other transcriptional regulators (including LexA) seem to undergo considerable changes in their DNA-binding domains together with their DNA signals and regulons. Thus, the evolution of the arginine regulon and ARG boxes seems to reflect a tradeoff between maintaining regulon flexibility on the one hand and retaining the universal regulatory mechanism on the other.
Another interesting aspect of the arginine regulon strategy is the use of single and cooperative sites. In E. coli, the use of cooperative binding sites by ArgR seems to reflect a requirement for a sharper response to a stimulus (arginine starvation) than in the SOS response, where LexA usually uses single sites. Unfortunately, the available data seem insufficient to draw systematic conclusions. In particular, because the second sites in cooperative cassettes are often weak (have low scores), some of them could be missed by the recognition rule. Direct experimental studies are needed to clarify this issue. Another problem not directly addressed in this study is the role of the E. coli arginine repressor in recombination and its binding to the cer site, which contains a single ARG box [16,17]. We have noted, however, that this box is conserved in the monomerization site ckr of the plasmid ColK.
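Why cooperative sites could give a sharper response can be illustrated with a textbook Hill model, in which the fraction of occupied operators is θ = x^n / (K^n + x^n) for regulator (or effector) concentration x, dissociation constant K and Hill coefficient n. This is a generic simplification assumed here for illustration, not a model of ArgR fitted in this study; n = 1 stands for a single independent site and n = 2 for an idealized, strongly cooperative pair of sites.

```python
def occupancy(x, K, n):
    """Fractional occupancy in the Hill model: x^n / (K^n + x^n)."""
    return x**n / (K**n + x**n)


def concentration_for(theta, K, n):
    """Invert the Hill equation: the x at which occupancy equals theta."""
    return K * (theta / (1 - theta)) ** (1 / n)


def response_range(n, K=1.0):
    """Fold change in x needed to move occupancy from 10% to 90%."""
    return concentration_for(0.9, K, n) / concentration_for(0.1, K, n)


for n in (1, 2):
    print(f"n = {n}: {response_range(n):.0f}-fold")
```

Under this idealization, a single site needs an 81-fold change in concentration to go from 10% to 90% occupancy, whereas n = 2 needs only a 9-fold change (in general, 81^(1/n)).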
There are a few more transcription factor families with a single representative per genome (biotin operon repressor, COG1654; putative stress-responsive transcriptional regulator PcpC, COG1983; Bvg accessory factor homologs, COG1521), and it would be interesting to compare them as well. However, they have too few experimentally determined binding sites and are not as ubiquitous in bacterial genomes as the three regulators discussed above. As more genomes become available, we hope that our approach, combined with positional analysis aimed at finding co-localized, and thus possibly functionally related, enzyme and regulator genes [35,36], will enable us to make this comparison. At the same time, we feel that the predictions made in this study, especially the identification of Art family ABC transporters in several diverse genomes, are sufficiently interesting to warrant experimental verification.
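The positional analysis mentioned above looks for genes that remain neighbours across genomes. A minimal Python sketch, assuming each genome is given as a list of orthologous-group labels in chromosomal order; the labels, the gap size and the genome threshold are hypothetical, and chromosome circularity and strand are ignored:

```python
from collections import Counter


def conserved_neighbors(genomes, max_gap=1, min_genomes=2):
    """Return gene-group pairs lying within `max_gap` positions in >= min_genomes genomes.

    `genomes` is a list of gene orders, each a list of orthologous-group labels.
    """
    counts = Counter()
    for order in genomes:
        pairs = set()  # count each pair at most once per genome
        for i, a in enumerate(order):
            for j in range(i + 1, min(i + 1 + max_gap, len(order))):
                b = order[j]
                if a != b:
                    pairs.add(tuple(sorted((a, b))))
        counts.update(pairs)
    return sorted(pair for pair, c in counts.items() if c >= min_genomes)


# Hypothetical gene orders in three genomes
orders = [["R", "A", "B", "X"], ["Y", "A", "B", "R"], ["B", "A", "Z"]]
print(conserved_neighbors(orders))  # [('A', 'B')]
```

Only the A-B pair is adjacent in more than one genome; the regulator label R sits next to different genes each time and is therefore not reported at this gap size.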
Materials and methods
The profile for ARG box identification was constructed as follows. Upstream regions of B. subtilis operons involved in arginine metabolism were selected, and an iterative signal search procedure was applied as described previously. The resulting ARG box profile was constructed from the four sites upstream of argC, argG, rocA and rocD. These formally identified sites are a subset of the experimentally known sites. Gamma-proteobacteria were analyzed using a longer, previously published E. coli ARG box profile. Only genes with candidate sites in five or more of the eight genomes analyzed were considered candidate regulon members and retained for further analysis. This procedure could lead to the loss of some true sites, but it ensured that false sites were not accepted.
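A profile of this kind is a position weight matrix. Because the exact scoring function and z-score definition are not given here, the following Python sketch makes explicit assumptions: log-odds weights with a pseudocount of 0.5 against a uniform background, and a z-score computed from the mean and variance of scores of random words with equal base frequencies. The four aligned sites are hypothetical toy sequences compatible with the ARG box consensus TGMATwwwwATKCA; they are not the argC, argG, rocA and rocD sites.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical aligned ARG box-like sites, for illustration only
TOY_SITES = ["TGAATAAAAATTCA", "TGCATATTAATGCA", "TGAATTTAAATTCA", "TGAATAATTATGCA"]


def build_profile(sites, pseudocount=0.5):
    """Log-odds position weight matrix against a uniform (0.25) background."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    denom = len(sites) + 4 * pseudocount
    profile = []
    for i in range(width):
        column = [s[i] for s in sites]
        profile.append({b: math.log(((column.count(b) + pseudocount) / denom) / 0.25)
                        for b in BASES})
    return profile


def z_score(profile, word):
    """Standardize a word's score against random words with equal base frequencies."""
    score = sum(col[b] for col, b in zip(profile, word))
    mean = sum(sum(col.values()) / 4 for col in profile)
    var = sum(sum(v * v for v in col.values()) / 4 - (sum(col.values()) / 4) ** 2
              for col in profile)
    return (score - mean) / math.sqrt(var)


def best_site(profile, upstream):
    """Return (z, strand, start, word) for the best window on either strand.

    For the '-' strand, `start` is an offset into the reverse complement.
    """
    width = len(profile)
    best = None
    for strand, seq in (("+", upstream), ("-", upstream.translate(COMPLEMENT)[::-1])):
        for start in range(len(seq) - width + 1):
            word = seq[start:start + width]
            z = z_score(profile, word)
            if best is None or z > best[0]:
                best = (z, strand, start, word)
    return best


profile = build_profile(TOY_SITES)
region = "CCCCCCCCCC" + "TGAATAAAAATTCA" + "CCCCCCCCCC"
print(best_site(profile, region))
```

Scanning both strands matters for asymmetric signals; for a perfectly palindromic profile the two strands would give identical scores.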
The complete genomes of E. coli, H. influenzae, Vibrio cholerae, B. subtilis, Mycobacterium tuberculosis, Thermotoga maritima, Chlamydia pneumoniae and Deinococcus radiodurans were downloaded from GenBank. The complete genome of Clostridium acetobutylicum was obtained from Genome Therapeutics Corporation.
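The five-of-eight acceptance rule described above can be expressed as a simple filter over the eight genomes that encode ArgR orthologs. In this Python sketch, the input format (a mapping from an orthologous gene group to the genomes in which a candidate site was found upstream of its member) and the group names are hypothetical.

```python
GENOMES = (
    "Escherichia coli", "Haemophilus influenzae", "Vibrio cholerae",
    "Bacillus subtilis", "Mycobacterium tuberculosis", "Thermotoga maritima",
    "Chlamydia pneumoniae", "Deinococcus radiodurans",
)


def regulon_candidates(site_hits, min_genomes=5):
    """Keep gene groups with candidate sites in at least `min_genomes` of GENOMES."""
    return sorted(
        group for group, genomes in site_hits.items()
        if len(set(genomes) & set(GENOMES)) >= min_genomes
    )


# Hypothetical input: group_1 has sites in six genomes, group_2 in only three
hits = {"group_1": set(GENOMES[:6]), "group_2": set(GENOMES[:3])}
print(regulon_candidates(hits))  # ['group_1']
```

Genomes outside the list (for example, C. acetobutylicum) do not count toward the threshold in this sketch.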
We thank Eugene Koonin, Yury Kozlov and Igor Rogosin for useful discussions. This study was partially supported by grants from the Merck Genome Research Institute (244), the Russian Fund of Basic Research (99-04-48247 and 00-15-99362), the Russian State Scientific Program 'Human Genome', INTAS (99-1476), the Howard Hughes Medical Institute (55000309), and the Microbial Genome Program, Office of Biological and Environmental Research, DOE (DE-FG02-98ER62583).
- Phylogenetic classification of proteins encoded in complete genomes. [http://www.ncbi.nlm.nih.gov/COG/]
- Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000, 28: 33-36. 10.1093/nar/28.1.33.
- Walker GC: The SOS response of Escherichia coli. In Escherichia coli and Salmonella in Cellular and Molecular Biology, Vol 1. Edited by Neidhardt, FC. Washington DC: ASM Press; 1996, 1400-1412.
- Winterling KW, Chafin D, Hayes JJ, Sun J, Levine AS, Yasbin RE, Woodgate R: The Bacillus subtilis DinR binding site: redefinition of the consensus sequence. J Bacteriol. 1998, 180: 2201-2211.
- Hecker M, Schumann W, Volker U: Heat-shock and general stress response in Bacillus subtilis. Mol Microbiol. 1996, 19: 417-428. 10.1046/j.1365-2958.1996.396932.x.
- Segal R, Ron EZ: Regulation and organization of the groE and dnaK operons in Eubacteria. FEMS Microbiol Lett. 1996, 138: 1-10. 10.1016/0378-1097(96)00020-1.
- Gelfand MS: Recognition of regulatory sites by genomic comparison. Res Microbiol. 1999, 150: 755-771. 10.1016/S0923-2508(99)00117-5.
- Maas WK: The arginine repressor of Escherichia coli. Microbiol Rev. 1994, 58: 631-640.
- Miller CM, Baumberg S, Stockley PG: Operator interactions by the Bacillus subtilis arginine repressor/activator, AhrC: novel positioning and DNA-mediated assembly of a transcriptional activator at catabolic sites. Mol Microbiol. 1997, 26: 37-48. 10.1046/j.1365-2958.1997.5441907.x.
- Klingel U, Miller CM, North AK, Stockley PG, Baumberg S: A binding site for activation by the Bacillus subtilis AhrC protein, a repressor/activator of arginine metabolism. Mol Gen Genet. 1995, 248: 329-340.
- Rodriguez-Garcia A, Ludovice M, Martin JF, Liras P: Arginine boxes and the argR gene in Streptomyces clavuligerus: evidence for a clear regulation of the arginine pathway. Mol Microbiol. 1997, 25: 219-228. 10.1046/j.1365-2958.1997.4511815.x.
- Charlier D, Roovers M, Van Vliet F, Boyen A, Cunin R, Nakamura Y, Glansdorff N, Pierard A: Arginine regulon of Escherichia coli K-12. A study of repressor-operator interactions and of in vitro binding affinities versus in vivo repression. J Mol Biol. 1992, 226: 367-386.
- Tian G, Lim D, Carey J, Maas WK: Binding of the arginine repressor of Escherichia coli K12 to its operator sites. J Mol Biol. 1992, 226: 387-397.
- Maghnouj A, de Sousa Cabral TF, Stalon V, Vander Wauven C: The arcABDC gene cluster, encoding the arginine deiminase pathway of Bacillus licheniformis, and its activation by the arginine repressor argR. J Bacteriol. 1998, 180: 6468-6475.
- Wang H, Glansdorff N, Charlier D: The arginine repressor of Escherichia coli K-12 makes direct contacts to minor and major groove determinants of the operators. J Mol Biol. 1998, 277: 805-824. 10.1006/jmbi.1998.1632.
- Stirling CJ, Szatmari G, Stewart G, Smith MC, Sherratt DJ: The arginine repressor is essential for plasmid-stabilizing site-specific recombination at the ColE1 cer locus. EMBO J. 1988, 7: 4389-4395.
- Guhathakurta A, Summers D: Involvement of ArgR and PepA in the pairing of ColE1 dimer resolution sites. Microbiology. 1995, 141: 1163-1171.
- Chen SH, Merican AF, Sherratt DJ: DNA binding of Escherichia coli arginine repressor mutants altered in oligomeric state. Mol Microbiol. 1997, 24: 1143-1156. 10.1046/j.1365-2958.1997.4301791.x.
- Berg OG: Selection of DNA binding sites by regulatory proteins: the LexA protein and the arginine repressor use different strategies for functional specificity. Nucleic Acids Res. 1988, 16: 5089-5105.
- Smith MC, Czaplewski L, North AK, Baumberg S, Stockley PG: Sequences required for regulation of arginine biosynthesis promoters are conserved between Bacillus subtilis and Escherichia coli. Mol Microbiol. 1989, 3: 23-38.
- Sanchez R, Roovers M, Glansdorff N: Organization and expression of a Thermus thermophilus arginine cluster: presence of unidentified open reading frames and absence of a Shine-Dalgarno sequence. J Bacteriol. 2000, 182: 5911-5915. 10.1128/JB.182.20.5911-5915.2000.
- Soutar A, Baumberg S: Implication of a repression system, homologous to those of other bacteria, in the control of arginine biosynthesis genes in Streptomyces coelicolor. Mol Gen Genet. 1996, 251: 245-251. 10.1007/s004380050163.
- Savchenko A, Charlier D, Dion M, Weigel P, Hallet JN, Holtham C, Baumberg S, Glansdorff N, Sakanyan V: The arginine operon of Bacillus stearothermophilus: characterization of the control region and its interaction with the heterologous B. subtilis arginine repressor. Mol Gen Genet. 1996, 252: 69-78. 10.1007/s004389670008.
- Dion M, Charlier D, Wang H, Gigot D, Savchenko A, Hallet JN, Glansdorff N, Sakanyan V: The highly thermostable arginine repressor of Bacillus stearothermophilus: gene cloning and repressor-operator interactions. Mol Microbiol. 1997, 25: 385-398. 10.1046/j.1365-2958.1997.4781845.x.
- Lu CD, Abdelal AT: Role of ArgR in activation of the ast operon, encoding enzymes of the arginine succinyltransferase pathway in Salmonella typhimurium. J Bacteriol. 1999, 181: 1934-1938.
- Xu Y, Liang Z, Legrain C, Ruger HJ, Glansdorff N: Evolution of arginine biosynthesis in the bacterial domain: novel gene-enzyme relationships from psychrophilic Moritella strains (Vibrionaceae) and evolutionary significance of N-alpha-acetyl ornithinase. J Bacteriol. 2000, 182: 1609-1615. 10.1128/JB.182.6.1609-1615.2000.
- Mironov AA, Koonin EV, Roytberg MA, Gelfand MS: Computer analysis of transcription regulatory patterns in completely sequenced bacterial genomes. Nucleic Acids Res. 1999, 27: 2981-2989. 10.1093/nar/27.14.2981.
- Gelfand MS, Koonin EV, Mironov AA: Prediction of transcription regulatory sites in Archaea by a comparative genomic approach. Nucleic Acids Res. 2000, 28: 695-705. 10.1093/nar/28.3.695.
- Dimova D, Weigel P, Takahashi M, Marc F, Van Duyne GD, Sakanyan V: Thermostability, oligomerization and DNA-binding properties of the regulatory protein ArgR from the hyperthermophilic bacterium Thermotoga neapolitana. Mol Gen Genet. 2000, 263: 119-130. 10.1007/s004380050038.
- Higgins CF, Ames GF: Two periplasmic transport proteins which interact with a common membrane receptor show extensive homology: complete nucleotide sequences. Proc Natl Acad Sci USA. 1981, 78: 6038-6042.
- Nishijyo T, Park SM, Lu CD, Itoh Y, Abdelal AT: Molecular characterization and regulation of an operon encoding a system for transport of arginine and ornithine and the ArgR regulatory protein in Pseudomonas aeruginosa. J Bacteriol. 1998, 180: 5559-5566.
- Park SM, Lu CD, Abdelal AT: Cloning and characterization of argR, a gene that participates in regulation of arginine biosynthesis and catabolism in Pseudomonas aeruginosa PAO1. J Bacteriol. 1997, 179: 5300-5308.
- Berg OG: Selection of DNA binding sites by regulatory proteins. Functional specificity and pseudosite competition. J Biomol Struct Dyn. 1988, 6: 275-297.
- Summers D, Yaish S, Archer J, Sherratt D: Multimer resolution systems of ColE1 and ColK: localisation of the crossover site. Mol Gen Genet. 1985, 201: 334-338.
- Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA. 1999, 96: 2896-2901. 10.1073/pnas.96.6.2896.
- Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci. 1998, 23: 324-328. 10.1016/S0968-0004(98)01274-2.
- GenBank. [http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Genome]
- Genome Therapeutics Corporation. [http://www.genomecorp.com/]
- Felsenstein J: Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol. 1996, 266: 418-427.
- Structural classification of proteins. [http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.1.001.004.004.html]
- Fogh RH, Ottleben G, Ruterjans H, Schnarr M, Boelens R, Kaptein R: Solution structure of the LexA repressor DNA binding domain determined by 1H NMR spectroscopy. EMBO J. 1994, 13: 3936-3944.
- Van Duyne GD, Ghosh G, Maas WK, Sigler PB: Structure of the oligomerization and L-arginine binding domain of the arginine repressor of Escherichia coli. J Mol Biol. 1996, 256: 377-391. 10.1006/jmbi.1996.0093.
- Sunnerhagen M, Nilges M, Otting G, Carey J: Solution structure of the DNA-binding domain and model for the complex of multifunctional hexameric arginine repressor with DNA. Nat Struct Biol. 1997, 4: 819-826.
- Ni J, Sakanyan V, Charlier D, Glansdorff N, Van Duyne GD: Structure of the arginine repressor from Bacillus stearothermophilus. Nat Struct Biol. 1999, 6: 427-432. 10.1038/8229.