New connections in the prokaryotic toxin-antitoxin network: relationship with the eukaryotic nonsense-mediated RNA decay system
© Anantharaman and Aravind; licensee BioMed Central Ltd. 2003
Received: 21 August 2003
Accepted: 13 October 2003
Published: 26 November 2003
Several prokaryotic plasmids maintain themselves in their hosts by means of diverse post-segregational cell killing systems. Recent findings suggest that chromosomally encoded copies of toxins and antitoxins of post-segregational cell killing systems - such as the RelE system - might function as regulatory switches under stress conditions. The RelE toxin cleaves ribosome-associated transcripts, whereas another post-segregational cell killing toxin, ParE, functions as a gyrase inhibitor.
Using sequence profile analysis we were able to unify the RelE- and ParE-type toxins with several families of small, uncharacterized proteins from diverse bacteria and archaea into a single superfamily. Gene neighborhood analysis showed that the majority of these proteins were encoded by genes in characteristic neighborhoods, in which genes encoding toxins always co-occurred with genes encoding transcription factors that are also antitoxins. The transcription factors accompanying the RelE/ParE superfamily may belong to unrelated or distantly related superfamilies, however. We used this conserved neighborhood template to transitively search genomes and identify novel post-segregational cell killing-related systems. One of these novel systems, observed in several prokaryotes, contained a predicted toxin with a PilT-N terminal (PIN) domain, which is also found in proteins of the eukaryotic nonsense-mediated RNA decay system. These searches also identified novel transcription factors (antitoxins) in post-segregational cell killing systems. Furthermore, the toxin Doc defines a potential metalloenzyme superfamily, with novel representatives in bacteria, archaea and eukaryotes, that probably acts on nucleic acids.
The tightly maintained gene neighborhoods of post-segregational cell killing-related systems appear to have evolved by in situ displacement of genes for toxins or antitoxins by functionally equivalent but evolutionarily unrelated genes. We predict that the novel post-segregational cell killing-related systems containing a PilT-N terminal domain toxin and the eukaryotic nonsense-mediated RNA decay system are likely to function via a common mechanism, in which the PilT-N terminal domain cleaves ribosome-associated transcripts. The core of the eukaryotic nonsense-mediated RNA decay system has probably evolved from a post-segregational cell killing-related system.
Post-segregational cell killing (PSK) is a widespread mechanism that helps several plasmids to maintain themselves in their bacterial hosts [1–4]. Operons borne on these plasmids that contain genes for interacting toxin-antitoxin (T-A) pairs are the basis for PSK. Typically, the first gene in these operons encodes a labile antitoxin, which also acts as a transcriptional regulator of the operon, while the second gene encodes a stable toxin. Usually, the antitoxin forms a physical complex with the toxin and neutralizes its action. A variation on this theme is seen in the form of the unstable anti-sense RNAs, which act as inhibitors of translation of the toxin mRNAs. If the plasmid is lost, the antitoxin is rapidly degraded while the stable toxin lingers on, killing cells that lack the plasmid. Thus, plasmids with systems for PSK cause their host cells to become addicted to them [1–4]. Additionally, several of these T-A systems are also found on prokaryotic chromosomes, where they may have alternative regulatory functions.
A systematic survey of such T-A operons and their mechanisms was presented in the seminal work of Gerdes in 2000. Subsequently, there have also been some important studies that have elucidated the biochemical details of the action of several toxins. One of these toxins, ParE, was shown to act as an inhibitor of DNA gyrase and to induce the formation of DNA-gyrase covalent complexes, which could inhibit replication and damage the integrity of the chromosome. In contrast, the RelE and Doc toxins were shown to be inhibitors of translation [5, 8]. More recently, it was demonstrated that the RelE protein cleaves transcripts associated with the ribosome, by specifically targeting codons positioned in the ribosomal A-site. RelE displays codon specificity, showing the highest preference for UAG among the stop codons and for UCG and CAG among the sense codons. Interestingly, this inhibition of translation by RelE is reversed by the transfer-messenger RNA (tmRNA), which acts as a regulator of protein stability in bacteria. These studies have also suggested that the chromosomal versions of these antitoxin-toxin pairs could function as regulatory switches that control gene expression under poor growth conditions.
Although Gerdes proposed that all T-A operons could have a common origin, an objective evaluation of the evolutionary relationships of these proteins and the origin of these systems has not been conducted. The availability of a large number of prokaryotic genome sequences allows us to use a variety of computational approaches to address the problem of the origin and evolution of these systems. One approach, involving sensitive sequence searches using profile methods, allows the detection of distant relationships that were hitherto not detected [11–13]. Additionally, it also enables objective evaluation of relationships, based on the statistical significance of the detected similarities and multiple alignment-derived secondary structure predictions. A second approach involves the use of comparative genomics to detect conserved gene neighborhoods and gene or domain fusions, and to extract functional and evolutionary information from these contextual connections [14–18]. This approach is particularly useful in the case of the prokaryotic PSK systems because of the strong coupling of the toxin and antitoxin genes in a single operon. Our objective in applying these analyses was to discover new functional connections that may not have been previously uncovered in experimental studies on these systems. Given the recent experimental results suggesting a specific role for these systems in the regulation of cellular responses to stress [9, 10, 19], we were also interested in identifying novel genomic versions of PSK-related systems with a wide phyletic distribution.
As a result of our analyses we were able to uncover several new T-A systems and establish an evolutionary relationship between them and the eukaryotic nonsense-mediated RNA degradation system. We also present evidence that the RelE and ParE families of toxins, despite their very distinct modes of action, have been ultimately derived from a common ancestor. Furthermore, we show that the Doc toxin defines a large family of enzymes that could potentially act on RNA and function as regulators of translation in both prokaryotes and eukaryotes.
Results and discussion
Unification of the RelE and ParE families and identification of new related families of proteins
As Escherichia coli RelE and its close relatives are amongst the functionally best-characterized toxins of the PSK systems, with a wide phyletic pattern in bacteria and archaea, we chose them as the starting point of our investigation of the general cellular functions and natural history of these systems. In order to determine the deep evolutionary affinities of the RelE proteins, we initiated a sequence profile search of the non-redundant (NR) protein database (National Center for Biotechnology Information, Bethesda, USA) using the PSI-BLAST program (threshold for inclusion in profile = 0.01, iterated till convergence). At convergence, this search recovered a large number of homologs of RelE, including all the previously described versions, from a variety of bacteria and archaea. We selected distinct representatives from the newly-detected members and transitively searched the NR database with these proteins as queries. As these proteins are typically small (85-110 residues in length) and divergent, several searches initiated with different seed sequences were required to exhaustively identify distant homologs of RelE. For example, RelE (gi: 16129522, E. coli) recovers a Staphylococcus aureus protein (gi: 15925446, ortholog of E. coli YoeB) in the third iteration (e = 6e-04), a Campylobacter fetus protein (gi: 28974229, ortholog of E. coli YafQ) in the fourth iteration (e = 2e-04), a Microbulbifer degradans protein (gi: 23028223, ParE family) in the fourth iteration (e = 0.004) and a Magnetococcus protein (gi: 23001539, with the RelE-related segment fused to a SF-I helicase module) in the fifth iteration (e = 0.001). To further ensure the detection of highly divergent members, all unique members detected in these searches were included in a single PSI-BLAST PSSM that was used to iteratively search the NR database till convergence. As a result of this procedure, we were able to recover over 150 distinct homologs (less than 92% identical) of RelE.
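The transitive search strategy described above can be pictured as a breadth-first expansion of the homolog set, in which every newly accepted hit is itself used as a query. The following is a minimal Python sketch of that logic only. The `search` callable is a hypothetical stand-in for one converged PSI-BLAST run, and the identifiers and E-values in the toy table are invented for illustration; none of this is the authors' actual code or data.

```python
from collections import deque


def transitive_search(seeds, search, e_threshold=0.01):
    """Expand a homolog set by re-querying with every accepted hit.

    `search(query)` is a hypothetical callable standing in for one
    converged profile search; it returns {hit_id: e_value}.
    """
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        query = queue.popleft()
        for hit, e_value in search(query).items():
            # Accept only hits at or below the inclusion threshold.
            if e_value <= e_threshold and hit not in found:
                found.add(hit)
                queue.append(hit)  # the new member becomes a query later
    return found


# Toy hit table: invented identifiers and E-values, for illustration only.
TOY_HITS = {
    "seed": {"A": 6e-4, "B": 2e-4, "weak": 0.5},
    "A": {"seed": 1e-5, "C": 4e-3},
    "B": {"seed": 1e-6},
    "C": {"A": 1e-3, "D": 1e-3},
    "D": {},
}


def toy_search(query):
    return TOY_HITS.get(query, {})


homologs = transitive_search(["seed"], toy_search)
print(sorted(homologs))
```

In this toy example, members "C" and "D" are reached only through the intermediate "A", which mirrors why several searches from different starting sequences were needed; "weak" is never included because its E-value exceeds the threshold.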
Reciprocal searches started with diverse proteins detected in the above procedure recovered a common set of obvious RelE-related 'intermediate' sequences supporting these relationships. For example, a reciprocal search with a protein from Bacteroides thetaiotaomicron (gi: 29350140), which is consistently recovered from various starting sequences that were detected in the above searches, recovers other divergent RelE-related proteins (for example, Nostoc punctiforme protein gi: 23129164) in the third iteration (e = 0.001) and the E. coli RelE itself in the fifth iteration (e = 3e-06). These sequences were then clustered using the BLASTCLUST program and individual clusters were aligned using the T_Coffee program. These alignments were used to predict individually the secondary structure for each of these clusters with the PHD program. A very similar arrangement of the predicted secondary structure elements between diverse groups of these proteins further reinforced their relationships.
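The clustering step can be illustrated with a small single-linkage grouping, in which two sequences end up in the same cluster whenever a chain of sufficiently similar pairs connects them. The Python sketch below is an illustration of that general idea and is not BLASTCLUST itself; the protein identifiers and the list of similar pairs are hypothetical, and in practice the pairs would come from pairwise comparisons filtered by a chosen similarity cut-off.

```python
def single_linkage(ids, similar_pairs):
    """Group ids into clusters connected by chains of similar pairs."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identifiers and similarity pairs, for illustration only.
ids = ["p1", "p2", "p3", "p4", "p5", "p6"]
pairs = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
clusters = single_linkage(ids, pairs)
print(clusters)
```

Note that "p1" and "p3" share a cluster although they are not directly listed as similar; the link through "p2" is enough. Each resulting cluster would then be aligned and analyzed separately, as described above.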
A striking aspect of these searches was the establishment of the relationship between the ParE (typified by the plasmid RK2-encoded toxin, ParE) and RelE families of toxins that were previously believed to be unrelated. These toxins have very different targets of action: ParE acts at the level of DNA replication and recombination by interfering with the action of gyrase, whereas RelE acts on RNA at the level of translation. This observation suggested that despite a common origin and significant sequence similarity, these PSK toxins could have diverged into different functional roles. Hereinafter, we refer to this superfamily of proteins, which includes the toxin families defined by RelE, ParE and other evolutionarily-related proteins that were detected in the above searches, as the RelE/ParE superfamily. The majority of proteins in this superfamily are of similar length and appear to fold into a single globular domain.
The multiple alignment of the RelE/ParE family shows that much of the conservation is associated with the residues forming the core of the conserved, predicted secondary structure elements (Figure 1). Two charged or polar residues, one associated with the first conserved helix and the second associated with the end of the carboxy-terminal-most strand, are also strongly conserved throughout the superfamily. A third, slightly less conserved polar residue is also seen to be associated with the second universally predicted strand of these proteins. This conservation of a charged residue is consistent with the nucleic acid-associated role of the functionally characterized proteins of this family, and could mediate interactions with RNA or DNA. However, beyond this general similarity, the ParE and RelE proteins have very different modes of action. Experimental studies have suggested that ParE inhibits the gyrase by trapping it with DNA in a stable complex, but so far there has been no report of any catalytic activity in ParE. In contrast, RelE and its homologs have been shown to cleave mRNA only when it is associated with the ribosome, but not free mRNAs [5, 9]. This suggests that certain members of this superfamily may possess catalytic activity under certain circumstances, and the conserved polar residues could contribute to this activity. In particular, the charged residue, which occurs at the carboxyl terminus of the last strand in these proteins, is an attractive candidate for a potential catalytic residue in the RelE proteins. In light of the relationship between the ParE and RelE families of proteins it would be of some interest to investigate the possibility of an unexplored DNA-cleaving activity in members of the ParE family, analogous to the ribosome-associated RNAse activity of RelE.
Wider phyletic spread of the RelE family and its relatives, as compared to the ParE family, may suggest that the former group represents the more ancient member of the superfamily, with the ParE lineage being secondarily derived in bacteria. This would imply that the RNA-cleaving activity is likely to be the primitive function of this superfamily, with a secondary innovation of gyrase inhibitor activity in the ParE family. The sporadic, but widespread phyletic patterns of several families, and differences in representation between strains of the same species (for example, E. coli), suggest a potential role for lateral transfer in the spread of these genes. At the same time, the extensive occurrence of genes for this superfamily in the chromosomal partitions of the genomes, and not merely on plasmids, supports the proposal that they may be widely used as cellular regulators. Thus, the acquisition of members of the RelE/ParE superfamily through lateral transfer could be a means by which certain strains could rapidly evolve a new regulatory pathway that helps in adapting their gene expression to unique environmental stresses.
Gene-neighborhood analysis of the RelE/ParE superfamily and identification of PSK-like systems encoding PilT-N terminal (PIN) domain proteins
Given the tight coupling of the toxin-antitoxin gene pairs, we investigated contextual information derived from their gene neighborhoods [14–18]. We concentrated on the newly identified members of the RelE/ParE superfamily to glean previously unknown contextual connections to other genes. Upstream genes encoding transcription factors of the MetJ/Arc superfamily accompany both RelE and ParE families [6, 27, 28]. This transcription factor serves as the antitoxin, which not only regulates the transcription of genes in the T-A operon, but also physically binds to the toxins and counters their actions. A systematic survey of all the newly identified members of the RelE, YafQ and ParE families showed that the majority of the genes encoding these proteins were associated with upstream genes for MetJ/Arc transcription factors (Figures 1, 3). In contrast, a range of novel gene neighborhood associations was observed in several of the newly identified families of the RelE/ParE superfamily.
Genes encoding members of the Rv3182, mlr1576 and VCA0468 families of the RelE/ParE superfamily were consistently associated with conserved downstream genes that encoded small proteins (90-110 residues) unrelated to either the Phd/YefM or MetJ/Arc superfamilies (Figure 3). PSI-BLAST searches initiated with these proteins showed that they all contained a conserved helix-turn-helix domain related to the lambda cro protein (cHTH domain). This suggested that they are likely to be DNA-binding proteins that act as transcription regulators of the upstream genes, which encoded members of the RelE/ParE superfamily. By analogy to the other PSK systems, these cHTH proteins are also expected to function as antitoxins countering the action of the products of their upstream genes. However, given the 'reverse' organization with respect to the classical PSK systems, it is conceivable that the functional interaction between the cHTH transcriptional regulator and the toxin component is different in these systems.
One possibility, which is supported by the specific relationship between these cHTH proteins and cro/cI repressors, is that these proteins act as repressors of the toxin gene. The degradation of the repressor under certain conditions could then allow the expression of the toxin component. The Z5902 family of the RelE/ParE superfamily, where the RelE/ParE domain is fused to a carboxy-terminal SF-I helicase module, differs from all other families in its predicted operon organization. These proteins typically co-occur with genes for another large helicase of superfamily II (SF-II), a restriction endonuclease and a DNA methylase. This implies that these proteins could constitute a novel restriction-modification complex, in which the RelE/ParE domain could function as a DNA-binding domain.
The above observations suggested that there is considerable unity in the organization of these toxin-antitoxin gene systems: typically these comprise two small genes, in which one member of the pair encodes a toxin and the other encodes a DNA-binding protein that functions as an antitoxin and a transcription factor. However, the transcription factor and toxin in a functionally comparable pair might belong to entirely unrelated superfamilies of proteins. Thus, genes of the RelE/ParE superfamily may be associated with genes for transcription factors belonging to the MetJ/Arc, Phd/YefM or cHTH superfamilies. Likewise, a survey of the operonic associations for transcription factors showed that the Phd/YefM superfamily might be associated with at least two unrelated toxin superfamilies, namely RelE/ParE and Doc (see below). Nevertheless, this strongly coupled operon architecture, in the form of a gene dyad encoding a transcription factor and a toxin, appears to be a unique signature of PSK and related regulatory systems. Hence, to detect other potentially novel transcription factors and toxins, we systematically surveyed the gene neighborhoods of transcription factors that were close homologs of those associated with the RelE/ParE-superfamily toxins in order to find organizations similar to the PSK systems. We then transitively extended this scanning of gene neighborhoods to the homologs of any potential toxin candidates that were detected in the first screen, and sought to detect any other transcription factors that may be associated with these newly predicted toxin-like genes. In particular, we concentrated on only those potential toxins or transcription factors that are conserved across a wide range of cellular genomes. Figure 3 illustrates the network of contextual connections that were recovered in these screens in the form of a directed graph.
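The alternating screen described above (from toxins to their neighboring transcription factors, then to other toxins next to those transcription factors, and so on) can be sketched as a traversal of a graph of operon partners. The Python sketch below is a hedged illustration, not the authors' script. It uses only family pairings mentioned in the text; where the text does not state gene order, the edge direction (upstream gene to downstream gene) is an assumption based on the typical antitoxin-first arrangement, and the function names are hypothetical.

```python
from collections import defaultdict, deque

# (upstream gene family, downstream gene family). Pairings come from the
# text; the orientation of the MetJ/Arc-MazF and Phd/YefM-RelE/ParE pairs
# is assumed to follow the typical antitoxin-first order.
OPERON_PAIRS = [
    ("MetJ/Arc", "RelE/ParE"),
    ("Phd/YefM", "RelE/ParE"),
    ("RelE/ParE", "cHTH"),  # the 'reverse' organization
    ("MetJ/Arc", "MazF"),
    ("Phd/YefM", "Doc"),
]


def build_graph(pairs):
    """Directed adjacency: upstream family -> set of downstream families."""
    graph = defaultdict(set)
    for upstream, downstream in pairs:
        graph[upstream].add(downstream)
    return graph


def screening_rounds(pairs, start):
    """Return {family: screening round in which it is first reached}.

    Gene order is ignored here, because a neighborhood screen finds a
    partner regardless of which side of the query gene it lies on.
    """
    partners = defaultdict(set)
    for upstream, downstream in pairs:
        partners[upstream].add(downstream)
        partners[downstream].add(upstream)
    rounds = {start: 0}
    queue = deque([start])
    while queue:
        family = queue.popleft()
        for other in sorted(partners[family]):
            if other not in rounds:
                rounds[other] = rounds[family] + 1
                queue.append(other)
    return rounds


graph = build_graph(OPERON_PAIRS)
rounds = screening_rounds(OPERON_PAIRS, "RelE/ParE")
for family, n in sorted(rounds.items(), key=lambda item: (item[1], item[0])):
    print(n, family)
```

Starting from RelE/ParE, the first round reaches the three transcription factor superfamilies, and the second round reaches the MazF and Doc toxins through their shared transcription factors, which is the kind of transitive extension the screen relies on.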
Previously observed associations, such as that of MetJ/Arc transcription factors with toxins of the MazF superfamily and that of Phd/YefM transcription factors with toxins of the Doc family, were recovered in these screens, supporting the effectiveness of this procedure.
Based on this web of contextual connections offered by gene neighborhoods (Figure 3) we predict that the above-detected group of solo PIN domain proteins defines a toxin-like component of novel PSK-related regulatory systems. These predicted PSK-related systems with the PIN domain are as widespread as the systems with proteins of the RelE/ParE superfamily in both archaea and bacteria.
Functional and evolutionary connections of the PIN and Doc domains and eukaryotic nonsense-mediated mRNA decay
In contrast to the RelE proteins that are restricted to prokaryotes, the PIN domain is found in all three superkingdoms of life. This suggested that the PSK-related regulatory systems with PIN domain proteins might throw light on the more general roles of such systems. Given the RNA-binding role for the PIN domain [34–36], it is likely that these systems elicit their action by acting upon some RNA substrate. Importantly, a highly-conserved solo PIN domain protein is encoded by the archaeal super-operons that contain genes for ribosomal proteins and translation GTPases, like eIF3γ (Figure 3). This contextual connection implies that this version of the solo PIN domain is likely to function in the translation process in association with the ribosome and eIF3γ. This observation, along with the analogy to the Doc, RelE and possibly the MazF systems, implies that the PSK-related systems with PIN domains might function as translation inhibitors. The PIN domain proteins from eukaryotes suggest a deeper functional analogy between the PIN and RelE domains. These eukaryotic PIN domain proteins, such as SMG-7 from Caenorhabditis elegans and Nmd4p from yeast, are known to participate in the process of nonsense codon-mediated decay (NMD) of mRNA [36, 39–41]. In eukaryotes, this system specifically targets mRNAs with stop codons for degradation [42, 43]. This suggests that the prokaryotic PSK-related systems with PIN domain proteins are likely to target transcripts in a process analogous to NMD of mRNA. There has been an earlier proposal that the PIN domain may be related to 3'→5' exonucleases. However, even though these two domains may have a common fold, they show differences in the conserved residues that constitute their active sites (Additional data file 1). Hence, it is possible that certain PIN domains, analogous to the RelE domains, cleave RNA only when it is associated with the ribosome.
Thus, we predict that a ribosome-associated RNAse activity is likely to be the common mechanism of action for the solo PIN proteins in NMD as well as in prokaryotic PSK-related systems.
The above observations suggest that the crucial PIN domain protein of the NMD system is perhaps a remnant of an ancient PSK-type regulatory system. The emergence of the nucleus in eukaryotes, and the uncoupling of translation and transcription, could have caused the PIN domain protein to be released from the tight regulatory circuit involving a coupled antitoxin transcription factor. Our earlier studies have suggested that other key components of the NMD system and the eukaryotic translation initiation systems have evolved from a common group of ancestral proteins. The evolution of interactions with this eukaryote-specific complex might have contributed to the decoupling of the solo PIN domain proteins from the ancestral PSK-related system, and led to their incorporation into the NMD system.
We examined other superfamilies of toxins to determine if they included widely distributed members with a general functional significance similar to the solo PIN domain proteins. Several PSK systems have a very limited phyletic distribution and are not further detailed here because they are unlikely to throw light on broadly deployed regulatory mechanisms. The well-known MazF/CcdB/Kid superfamily is widely represented in the bacterial superkingdom and in a single archaeal genus, Pyrococcus, but not in eukaryotes (Figure 2). As the structures of several proteins from this superfamily are currently available, we searched the PDB database with them to detect other related structures. These searches indicated that although the MazF/CcdB/Kid domain possesses an SH3-barrel fold, it is not closely related to any other members of this fold. Hence, it is likely that these domains represent a specialized version of the SH3-barrel fold that was derived in the bacteria.
The Doc toxin of the Phd-Doc PSK system has been hitherto detected only in P1-like phages and related mobile DNA elements from γ-proteobacteria. Our sequence profile searches with the PSI-BLAST program recovered homologs of Doc from several proteobacterial lineages, low GC Gram positive bacteria, actinobacteria, cyanobacteria, spirochetes, Aquifex, Fusobacterium, some archaeal lineages and animals, with statistically significant expect values (e < 0.001). Amongst these newly-detected homologs of Doc were proteins such as the Fic protein from E. coli [46, 47], and the huntingtin associated protein E (HYPE). The conserved region shared by all these proteins was approximately 125 to 150 residues long, and appeared to define a novel globular domain that we refer to, hereinafter, as the Doc domain.
A phylogenetic analysis of the Doc superfamily reveals that it contains three distinct families (Figure 6). The first family contains the Doc protein from phage P1 and its homologs from several bacterial genomes. Typically, upstream genes for an antitoxin transcription factor accompany genes encoding members of this family (Figure 3). All these proteins contain a minimal stand-alone version of the Doc domain. The second family, typified by the animal HYPE protein, is also found in several bacteria and some archaea. These proteins contain a longer insert after the conserved amino-terminal motifs (Figure 6) and are typically multidomain proteins. The animal HYPE contains an amino-terminal tetratricopeptide repeat (TPR) module, whereas most prokaryotic versions are fused to a carboxy-terminal DNA-binding winged HTH (wHTH) domain. Interestingly, a single bacterial protein, XCC2565 from Xanthomonas, has leucine-rich repeats (LRR, Figure 3) amino-terminal to the Doc domain. The presence of TPR repeats is reminiscent of similar TPR modules that are present amino-terminal to the PIN domain in NMD proteins such as SMG-7. The human HYPE protein interacts with the huntingtin protein, which also contains similar α-helical ARM repeats that adopt a superstructure similar to the TPR repeats. While the physiological relevance of these interactions is unclear, it is plausible that HYPE is part of an uncharacterized multiprotein complex in animal cells that may have a regulatory role similar to the chromosomally encoded versions of the bacterial Doc systems. Although no transcription factor genes are seen accompanying the genes for the prokaryotic HYPE orthologs, the carboxy-terminal wHTH could possibly function as an inbuilt transcriptional regulator for these proteins.
A single bacterial member of the HYPE family, namely PfhB2 from Pasteurella, contains two Doc domains fused to several fibrinogen-type repeats and a conserved domain found in several bacterial agglutinins (Figure 3). This protein is likely to be an extracellular protein, and may represent an unusual case of recruitment of the Doc domain for a novel function, perhaps as a secreted nuclease or an enzyme for the processing of extracellular polysaccharides. The third family of Doc-related proteins is comprised of the E. coli Fic protein and its orthologs from diverse bacteria (Figure 6). Like the HYPE family, they also contain a longer insert in the Doc domain after the amino-terminal conserved motif (Figure 6). These clearly do not appear to be parts of a PSK-related system for they do not show any conserved operon architectures. Mutations in the Fic protein result in filamentous growth, indicating a role in cell division [46, 47]. Based on the predicted catalytic activity for the Doc superfamily, it is possible that the Fic proteins may target specific transcripts when induced under certain growth conditions.
The above analysis suggests that there is considerable diversity amongst the T-A systems. Most widespread prokaryotic PSK or related systems appear to have been derived by mixing and matching a few major classes of toxins and antitoxins (Figure 2) that appear to have independent evolutionary origins. The major classes of toxins are the RelE/ParE superfamily, the MazF/CcdB superfamily, the Doc superfamily and the solo PIN domain superfamily (Figure 2). The major classes of antitoxin transcription factors are the MetJ/Arc superfamily and related ribbon-helix-helix fold proteins, the HTH superfamily, the AbrB superfamily and the Phd/YefM superfamily. This suggests that not all PSK-related systems have descended from a common ancestor, but that they have been assembled on different occasions from a relatively small pool of proteins. One simple hypothesis that could account for the observed pattern of gene neighborhoods is the in situ displacement of genes for functionally related proteins in a tightly maintained operon. In this process, the operon architecture is maintained due to the strong functional interactions of the encoded polypeptides, but the actual origin of the polypeptides encoded by it is not constrained. This is likely to happen if unrelated polypeptides can perform the same function equally effectively. This is consistent with the functional identity of different superfamilies of antitoxins that act as transcription factors. The potential functional equivalence of several unrelated toxins, such as RelE, the PIN domain and Doc domain toxins, or ParE and CcdB, suggests that even the toxin genes are viable candidates for in situ displacement by analogs. Thus toxin or antitoxin genes could be displaced in situ by functionally equivalent, but unrelated genes, while the operon architecture itself is preserved.
This process is highly reminiscent of the displacement of functionally equivalent, but evolutionarily unrelated genes in certain DNA recombination related operons in bacteria and phages. However, the case of the RelE/ParE superfamily suggests that toxin-antitoxin gene pairs could undergo vertical evolutionary divergence to acquire very distinct functions.
Finally, the abundant presence of PSK-related systems in prokaryotic chromosomes supports the original proposal of Gerdes and recent experimental studies that these systems could function as more generic regulatory systems [5, 6, 8, 19]. In particular, they appear to have proliferated on the chromosomes of some prokaryotes, such as the RelE system in several proteobacteria and the PIN system in archaea, Nostoc and Mycobacterium tuberculosis (Figure 2).
Furthermore, in some cases, domains such as Doc, PIN, RelE/ParE and YefM proteins appear to have been incorporated in systems that function outside the context of classic PSK-related systems.
Using sequence profile analysis and contextual data derived from comparative genomics, we investigated the evolutionary relationships of prokaryotic T-A systems. As a result we were able to unify the functionally unrelated toxin families defined by the ParE and RelE proteins and detect several new families of this protein superfamily. The contextual information obtained from comparative genomics allowed us to identify several new operons of PSK-related systems. One of these encodes a protein with a solo RNA-binding PIN domain as the toxin component. We suggest that these PIN domain proteins function similarly to the RelE proteins in cleaving ribosome-associated transcripts. We predict that this is likely to be a common mode of action of the PIN domain containing PSK-related systems of prokaryotes and the NMD system that cleaves transcripts with stop codons in eukaryotes. We also show that the Doc toxin defines a large family of proteins that include the animal huntingtin-interacting HYPE proteins and the bacterial Fic proteins. These proteins are predicted to function as metalloenzymes that could potentially cleave RNA. Finally, we also describe several new families of associated transcription factors that are predicted to function as antitoxins in the newly identified PSK systems. These predictions are likely to aid in experimental investigation of poorly understood aspects of both eukaryotic and prokaryotic regulatory systems, including the process of nonsense mediated decay in eukaryotes.
Materials and methods
The non-redundant (NR) database of protein sequences (National Center for Biotechnology Information, NIH, Bethesda) was searched using the BLASTP program. Profile searches were conducted using the PSI-BLAST program with either a single sequence or an alignment used as the query, with a default profile inclusion expectation (E) value threshold of 0.01 (unless specified otherwise), and were iterated until convergence [11, 13]. For all searches with compositionally biased proteins we used a statistical correction for this bias to reduce false positives. Multiple alignments were constructed using the T_Coffee or PCMA programs, followed by manual correction based on the PSI-BLAST results. All large-scale sequence analysis procedures were carried out using the SEALS package.
Structural manipulations were carried out using the Swiss-PDB viewer program and the ribbon diagrams were constructed with MOLSCRIPT. Searches of the PDB database with query structures were conducted using the DALI program. Protein secondary structure was predicted using a multiple alignment as the input for the PHD program. Similarity-based clustering of proteins was carried out using the BLASTCLUST program. Phylogenetic analysis was carried out using the maximum-likelihood, neighbor-joining and least squares methods [56, 57]. Briefly, this process involved the construction of a least squares tree using the FITCH program or a neighbor-joining tree using the NEIGHBOR or the MEGA program, followed by local rearrangement using the ProtML program of the Molphy package to arrive at the maximum likelihood (ML) tree. The statistical significance of various nodes of this ML tree was assessed using the resampling of estimated log-likelihoods (RELL) bootstrap (ProtML RELL-BP), with 10,000 replicates. Gene neighborhoods were determined by searching the NCBI PTT tables with a custom script written by the authors; these tables can be accessed from the genomes division of the GenBank database. Briefly, the procedure involved collecting fixed neighborhoods centered on a set of query genes, followed by clustering of their products using the BLASTCLUST program to identify related products. The presence of clusters of related genes among the neighbors of the query set implied the presence of conserved gene neighborhoods. This was used in combination with a previously reported screen for conserved gene neighborhoods [15, 35].
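The gene neighborhood procedure described above can be sketched in Python as follows. This is not the authors' script, which is not reproduced in the paper; it is a minimal illustration under stated assumptions. Each genome is assumed to be an ordered list of gene identifiers, the window size `flank` is an arbitrary choice, and the pairwise similarities in `similar_pairs` are assumed to be supplied by a prior sequence comparison. Single-linkage clustering is used as a stand-in for BLASTCLUST, and all function names and identifiers are hypothetical.

```python
from collections import defaultdict


def collect_neighborhoods(genome, query_ids, flank=2):
    """Map each query gene to the genes within `flank` positions of it."""
    position = {gene: i for i, gene in enumerate(genome)}
    hoods = {}
    for query in query_ids:
        if query not in position:
            continue
        i = position[query]
        window = genome[max(0, i - flank): i + flank + 1]
        hoods[query] = [gene for gene in window if gene != query]
    return hoods


def single_linkage_clusters(items, similar_pairs):
    """Group items connected by any chain of similar pairs (union-find)."""
    parent = {item: item for item in items}

    def find(item):
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path compression
            item = parent[item]
        return item

    for a, b in similar_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    clusters = defaultdict(set)
    for item in items:
        clusters[find(item)].add(item)
    return list(clusters.values())


def conserved_neighbor_clusters(hoods, similar_pairs, min_neighborhoods=2):
    """Return neighbor clusters that occur next to several query genes."""
    owners = defaultdict(set)  # neighbor gene -> query genes it flanks
    for query, neighbors in hoods.items():
        for neighbor in neighbors:
            owners[neighbor].add(query)
    conserved = []
    for cluster in single_linkage_clusters(sorted(owners), similar_pairs):
        queries = set().union(*(owners[gene] for gene in cluster))
        if len(queries) >= min_neighborhoods:
            conserved.append((sorted(cluster), sorted(queries)))
    return conserved


# Toy data: two genomes, each with a toxin gene next to an antitoxin gene.
genome_a = ["a1", "toxA", "antiA", "a4"]
genome_b = ["b1", "antiB", "toxB", "b4"]
hoods = {}
hoods.update(collect_neighborhoods(genome_a, ["toxA"], flank=1))
hoods.update(collect_neighborhoods(genome_b, ["toxB"], flank=1))
result = conserved_neighbor_clusters(hoods, [("antiA", "antiB")])
print(result)
```

In the toy data only the two antitoxin genes cluster together and flank both query toxins, so they are reported as a conserved neighbor, whereas the unrelated flanking genes are not.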
Additional data files
A complete list of all the novel proteins belonging to the various superfamilies discussed in this paper will be made available for download via the supplementary information site given in the reference list [ftp://ftp.ncbi.nih.gov/pub/aravind/rele/]. A multiple alignment of selected PIN domains (Additional data file 1), including the predicted toxins of PSK-like systems, is provided with the online version of this article.
- Couturier M, Bahassi el M, Van Melderen L: Bacterial death by DNA gyrase poisoning. Trends Microbiol. 1998, 6: 269-275. 10.1016/S0966-842X(98)01311-0.
- Engelberg-Kulka H, Glaser G: Addiction modules and programmed cell death and antideath in bacterial cultures. Annu Rev Microbiol. 1999, 53: 43-70. 10.1146/annurev.micro.53.1.43.
- Jensen RB, Gerdes K: Programmed cell death in bacteria: proteic plasmid stabilization systems. Mol Microbiol. 1995, 17: 205-210.
- Yarmolinsky MB: Programmed cell death in bacterial populations. Science. 1995, 267: 836-837.
- Christensen SK, Mikkelsen M, Pedersen K, Gerdes K: RelE, a global inhibitor of translation, is activated during nutritional stress. Proc Natl Acad Sci USA. 2001, 98: 14328-14333. 10.1073/pnas.251327898.
- Gerdes K: Toxin-antitoxin modules may regulate synthesis of macromolecules during nutritional stress. J Bacteriol. 2000, 182: 561-572. 10.1128/JB.182.3.561-572.2000.
- Jiang Y, Pogliano J, Helinski DR, Konieczny I: ParE toxin encoded by the broad-host-range plasmid RK2 is an inhibitor of Escherichia coli gyrase. Mol Microbiol. 2002, 44: 971-979. 10.1046/j.1365-2958.2002.02921.x.
- Hazan R, Sat B, Reches M, Engelberg-Kulka H: Postsegregational killing mediated by the P1 phage 'addiction module' phd-doc requires the Escherichia coli programmed cell death system mazEF. J Bacteriol. 2001, 183: 2046-2050. 10.1128/JB.183.6.2046-2050.2001.
- Pedersen K, Zavialov AV, Pavlov MY, Elf J, Gerdes K, Ehrenberg M: The bacterial toxin RelE displays codon-specific cleavage of mRNAs in the ribosomal A site. Cell. 2003, 112: 131-140.
- Christensen SK, Gerdes K: RelE toxins from bacteria and Archaea cleave mRNAs on translating ribosomes, which are rescued by tmRNA. Mol Microbiol. 2003, 48: 1389-1400. 10.1046/j.1365-2958.2003.03512.x.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Neuwald AF, Liu JS, Lipman DJ, Lawrence CE: Extracting protein alignment models from the sequence database. Nucleic Acids Res. 1997, 25: 1665-1677. 10.1093/nar/25.9.1665.
- Aravind L, Koonin EV: Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches. J Mol Biol. 1999, 287: 1023-1040. 10.1006/jmbi.1999.2653.
- Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D: A combined algorithm for genome-wide prediction of protein function. Nature. 1999, 402: 83-86. 10.1038/47048.
- Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome alignment, evolution of prokaryotic genome organization and prediction of gene function using genomic context. Genome Res. 2001, 11: 356-372. 10.1101/gr.GR-1619R.
- Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA. 1999, 96: 2896-2901. 10.1073/pnas.96.6.2896.
- Aravind L: Guilt by association: contextual information in genome analysis. Genome Res. 2000, 10: 1074-1077. 10.1101/gr.10.8.1074.
- Huynen M, Snel B, Lathe W, Bork P: Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 2000, 10: 1204-1210. 10.1101/gr.10.8.1204.
- Hayes CS, Sauer RT: Toxin-antitoxin pairs in bacteria: killers or stress regulators? Cell. 2003, 112: 2-4.
- Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302: 205-217. 10.1006/jmbi.2000.4042.
- Rost B, Sander C: Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol. 1993, 232: 584-599. 10.1006/jmbi.1993.1413.
- Roberts RC, Helinski DR: Definition of a minimal plasmid stabilization system from the broad-host-range plasmid RK2. J Bacteriol. 1992, 174: 8119-8132.
- Kamada K, Hanaoka F, Burley SK: Crystal structure of the MazE/MazF complex: molecular bases of antidote-toxin recognition. Mol Cell. 2003, 11: 875-884.
- de la Cueva-Mendez G: Distressing bacteria: structure of a prokaryotic detox program. Mol Cell. 2003, 11: 848-850.
- Mittenhuber G: Occurrence of mazEF-like antitoxin/toxin systems in bacteria. J Mol Microbiol Biotechnol. 1999, 1: 295-302.
- Hargreaves D, Santos-Sierra S, Giraldo R, Sabariegos-Jareno R, de la Cueva-Mendez G, Boelens R, Diaz-Orejas R, Rafferty JB: Structural and functional analysis of the kid toxin protein from E. coli plasmid R1. Structure (Camb). 2002, 10: 1425-1433. 10.1016/S0969-2126(02)00856-0.
- Oberer M, Zangger K, Prytulla S, Keller W: The anti-toxin ParD of plasmid RK2 consists of two structurally distinct moieties and belongs to the ribbon-helix-helix family of DNA-binding proteins. Biochem J. 2002, 361: 41-47. 10.1042/0264-6021:3610041.
- Aravind L, Koonin EV: DNA-binding proteins and evolution of transcription regulation in the archaea. Nucleic Acids Res. 1999, 27: 4658-4670. 10.1093/nar/27.23.4658.
- Gazit E, Sauer RT: Stability and DNA binding of the phd protein of the phage P1 plasmid addiction system. J Biol Chem. 1999, 274: 2652-2657. 10.1074/jbc.274.5.2652.
- Magnuson R, Yarmolinsky MB: Corepression of the P1 addiction operon by Phd and Doc. J Bacteriol. 1998, 180: 6342-6351.
- Allen GC, Kornberg A: Assembly of the primosome of DNA replication in Escherichia coli. J Biol Chem. 1993, 268: 19204-19209.
- Hayes F: A family of stability determinants in pathogenic bacteria. J Bacteriol. 1998, 180: 6415-6418.
- Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV: Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell. Genome Res. 1999, 9: 608-628.
- Anantharaman V, Koonin EV, Aravind L: Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res. 2002, 30: 1427-1464. 10.1093/nar/30.7.1427.
- Koonin EV, Wolf YI, Aravind L: Prediction of the archaeal exosome and its connections with the proteasome and the translation and transcription machineries by a comparative-genomic approach. Genome Res. 2001, 11: 240-252. 10.1101/gr.162001.
- Clissold PM, Ponting CP: PIN domains in nonsense-mediated mRNA decay and RNAi. Curr Biol. 2000, 10: R888-R890. 10.1016/S0960-9822(00)00858-7.
- Huffman JL, Brennan RG: Prokaryotic transcription regulators: more than just the helix-turn-helix motif. Curr Opin Struct Biol. 2002, 12: 98-106. 10.1016/S0959-440X(02)00295-6.
- Vaughn JL, Feher V, Naylor S, Strauch MA, Cavanagh J: Novel DNA binding domain and genetic regulation model of Bacillus subtilis transition state regulator abrB. Nat Struct Biol. 2000, 7: 1139-1146. 10.1038/81999.
- Cali BM, Kuchma SL, Latham J, Anderson P: smg-7 is required for mRNA surveillance in Caenorhabditis elegans. Genetics. 1999, 151: 605-616.
- Anders KR, Grimson A, Anderson P: SMG-5, required for C. elegans nonsense-mediated mRNA decay, associates with SMG-2 and protein phosphatase 2A. EMBO J. 2003, 22: 641-650. 10.1093/emboj/cdg056.
- Domeier ME, Morse DP, Knight SW, Portereiko M, Bass BL, Mango SE: A link between RNA interference and nonsense-mediated decay in Caenorhabditis elegans. Science. 2000, 289: 1928-1931. 10.1126/science.289.5486.1928.
- Wagner E, Lykke-Andersen J: mRNA surveillance: the perfect persist. J Cell Sci. 2002, 115: 3033-3038.
- Schell T, Kulozik AE, Hentze MW: Integration of splicing, transport and translation to achieve mRNA quality control by the nonsense-mediated decay pathway. Genome Biol. 2002, 3: reviews1006.1-1006.6. 10.1186/gb-2002-3-3-reviews1006.
- Aravind L, Koonin EV: Eukaryote-specific domains in translation initiation factors: implications for translation regulation and evolution of the translation system. Genome Res. 2000, 10: 1172-1184. 10.1101/gr.10.8.1172.
- PDB - Protein Data Bank. [http://www.rcsb.org/pdb/]
- Komano T, Utsumi R, Kawamukai M: Functional analysis of the fic gene involved in regulation of cell division. Res Microbiol. 1991, 142: 269-277. 10.1016/0923-2508(91)90040-H.
- Kawamukai M, Matsuda H, Fujii W, Utsumi R, Komano T: Nucleotide sequences of fic and fic-1 genes involved in cell filamentation induced by cyclic AMP in Escherichia coli. J Bacteriol. 1989, 171: 4525-4529.
- Faber PW, Barnes GT, Srinidhi J, Chen J, Gusella JF, MacDonald ME: Huntingtin interacts with a family of WW domain proteins. Hum Mol Genet. 1998, 7: 1463-1474. 10.1093/hmg/7.9.1463.
- Iyer LM, Koonin EV, Aravind L: Classification and evolutionary history of the single-strand annealing proteins, RecT, Redbeta, ERF and RAD52. BMC Genomics. 2002, 3: 8. 10.1186/1471-2164-3-8.
- Pei J, Sadreyev R, Grishin NV: PCMA: fast and accurate multiple sequence alignment based on profile consistency. Bioinformatics. 2003, 19: 427-428. 10.1093/bioinformatics/btg008.
- SEALS Home Page. [http://www.ncbi.nlm.nih.gov/CBBresearch/Walker/SEALS/index.html]
- Guex N, Peitsch MC: SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis. 1997, 18: 2714-2723.
- Kraulis PJ: Molscript. J Appl Cryst. 1991, 24: 946-950. 10.1107/S0021889891004399.
- Holm L, Sander C: Protein structure comparison by alignment of distance matrices. J Mol Biol. 1993, 233: 123-138. 10.1006/jmbi.1993.1489.
- BLASTCLUST - BLAST score-based single-linkage clustering. [ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.txt]
- Felsenstein J: Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol. 1996, 266: 418-427.
- Hasegawa M, Kishino H, Saitou N: On the maximum likelihood method in molecular phylogenetics. J Mol Evol. 1991, 32: 443-445.
- Felsenstein J: PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 1989, 5: 164-166.
- Kumar S, Tamura K, Jakobsen IB, Nei M: MEGA2: molecular evolutionary genetics analysis software. Bioinformatics. 2001, 17: 1244-1245. 10.1093/bioinformatics/17.12.1244.
- NCBI Entrez Genome. [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome]
- Supplementary information and additional files. [ftp://ftp.ncbi.nih.gov/pub/aravind/rele/]
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.