Open Access
Bacterial α2-macroglobulins: colonization factors acquired by horizontal gene transfer from the metazoan genome?
Genome Biology volume 5, Article number: R38 (2004)
Invasive bacteria are known to have captured and adapted eukaryotic host genes. They also readily acquire colonizing genes from other bacteria by horizontal gene transfer. Closely related species such as Helicobacter pylori and Helicobacter hepaticus, which exploit different host tissues, share almost none of their colonization genes. The protease inhibitor α2-macroglobulin provides a major metazoan defense against invasive bacteria, trapping attacking proteases required by parasites for successful invasion.
Database searches with metazoan α2-macroglobulin sequences revealed homologous sequences in bacterial proteomes. The bacterial α2-macroglobulin phylogenetic distribution is patchy and violates the vertical descent model. Bacterial α2-macroglobulin genes are found in diverse clades, including purple bacteria (proteobacteria), fusobacteria, spirochetes, bacteroidetes, deinococcids, cyanobacteria, planctomycetes and thermotogae. Most bacterial species with bacterial α2-macroglobulin genes exploit higher eukaryotes (multicellular plants and animals) as hosts. Both pathogenically invasive and saprophytically colonizing species possess bacterial α2-macroglobulins, indicating that bacterial α2-macroglobulin is a colonization rather than a virulence factor.
Metazoan α2-macroglobulins inhibit proteases of pathogens. The bacterial homologs may function in reverse to block host antimicrobial defenses. α2-macroglobulin was probably acquired one or more times from metazoan hosts and has then spread widely through other colonizing bacterial species by more than 10 independent horizontal gene transfers. yfhM-like bacterial α2-macroglobulin genes are often found tightly linked with pbpC, encoding an atypical peptidoglycan transglycosylase, PBP1C, that does not function in vegetative peptidoglycan synthesis. We suggest that YfhM and PBP1C are coupled together as a periplasmic defense and repair system. Bacterial α2-macroglobulins might provide useful targets for enhancing vaccine efficacy in combating infections.
The broad-spectrum protease inhibitor α2-macroglobulin (α2M) and the complement factors C3, C4 and C5 belong to a gene family present in all metazoans examined, ranging from corals to humans. These large (approximately 1,500 residue) proteins all undergo proteolytic processing and structural rearrangement as part of their role in host defense. The family is characterized by a unique thioester motif (CxEQ; single-letter amino-acid code), and a propensity for multiple conformationally sensitive binding interactions, which define their functional properties. The highly reactive thioester bond is buried inside the molecule in the native protein, protected from precocious inactivation. Upon proteolytic cleavage, the thioester bond becomes exposed and can then mediate covalent attachment to activating self and non-self surfaces, in the case of complement factors, or covalent or noncovalent crosslinking to the attacking proteases in the case of α2Ms. The proteolytic activation of these proteins also mediates interactions with receptors.
In contrast to complement factors, which are activated by specific 'convertase' protease complexes, α2Ms have an accessible 'bait' region with target sites for many proteases. The rearrangement of α2M that follows cleavage of the bait region entraps the attacking protease in a cage-like structure, hindering protein substrates from reaching the protease active site. In this way, exported proteases that are essential for parasitic infections can be rendered ineffective by α2M entrapment [5–7]. Protease-reacted α2M is then cleared from circulation by binding to the receptor CD91, triggering endocytosis. In addition, α2Ms bind cytokines and growth factors and regulate their clearance and activity [8, 9].
Vertebrate complement factors C3, C4 and C5 are part of an activation cascade that leads to the assembly of the membrane-attack complex and lysis of the pathogen. Binding of C3 also targets pathogens for phagocytosis. Proteolytic activation of all three complement proteins yields anaphylatoxins (cleaved amino-terminal fragments) which are recognized by specific receptors and activate the inflammatory response at the site of infection. In contrast to α2Ms, complement factors also possess a carboxy-terminal domain extension, the netrin or NTR module (PFAM:PF01759). Some members of the complement/α2M family (for example, C5 and ovostatin) have lost the thioester motif.
No α2M-related proteins have been found in any eukaryotes outside metazoans. Within the Metazoa, representatives have been found in all species examined, with a so-called 'C3-like' protein sequenced from the cnidarian Swiftia exserta (SWISS-PROT acc:Q8IYP1). There is no information from sponges as yet. We may speculate that the gene family evolved in an early metazoan in response to challenge from invasive microorganisms exploiting the new niche provided by the interstitial spaces and body cavities. The more derived role of the complement factors, together with their extra netrin domain, suggests that they arose by gene duplication from an ancestral α2M-like gene. Apart from vertebrates, α2M-group proteins have been most actively studied in arthropods. The horseshoe crab Limulus has a plasma α2M that is a component of an ancient invertebrate defense system; it is able to inhibit a wide range of proteases as well as to modulate plasma cytolytic activity. Limulus α2M forms tetramers, binding covalently across the multimers rather than to the attacking proteases, but still traps these in a cage-like structure after proteolytic activation. In dipteran insects, there are multiple α2M homologs, the thioester-containing proteins (TEPs). The TEP genes have been amplified by a process of tandem duplication into linked multigene families. Drosophila melanogaster has six TEP genes, whereas the mosquito Anopheles gambiae has 15. It is thought that the impressive expansion of TEP genes in the mosquito might be linked to the parasitic challenge provided by its blood-sucking lifestyle. The first characterized TEP in mosquitoes, TEP1, binds to and promotes phagocytosis of bacteria. TEP1 also binds to Plasmodium berghei and mediates its killing.
Thus the complement/α2M protein family is part of an innate immune system in metazoans that long pre-dates the immunoglobulin-based immune system of vertebrates, yet remains vital for combating parasites in all animal lineages examined.
While reviewing the distribution of α2M/TEP proteins from invertebrates, we conducted BLAST searches of the protein databases and were surprised to discover a number of bacterial sequences with BLAST E-values indicating homology with α2M. Given the absence of α2Ms in all non-metazoan eukaryotic lineages, it immediately seemed clear that horizontal gene transfer (HGT) of α2Ms must have occurred between metazoans and bacteria. But which way? Here we summarize the evidence for numerous horizontal transfers between bacterial lineages and discuss some biochemical and medical implications of the finding.
Our BLAST2SRS server provides the species in the BLAST output page: this is useful for quick visual surveys of the taxonomic distribution of a protein family. A BLAST2SRS search with human α2M unexpectedly listed an entry (SWISS-PROT accession number Q9X079) with E-value 2.3e-8 from Thermotoga maritima, a thermophilic eubacterium. With a length of 1,538 residues, a signal sequence and a matching CxEQ motif, there was no doubt that this was a genuine α2M homolog. Numerous other bacterial sequences with weaker (higher) E-values but obvious topological equivalence were also listed: for example, Escherichia coli YfhM (P76578) at 5.8e-5; Pseudomonas putida AAN66197 at 1.3e-4; Rhizobium meliloti Q92VA6 at 5.0e-3. Profile searches with a metazoan α2M alignment and subsequently with an alignment of the stronger bacterial hits revealed a number of additional, highly diverged homologs, some lacking the CxEQ. For example, E. coli has a second divergent homolog, YfaS (P76464). It is noteworthy that not a single instance of an archaeal α2M sequence could be found. Thus α2M-like sequences are restricted to eubacteria and metazoans. No function has been experimentally ascribed to any of the bacterial α2Ms (bact-α2Ms).
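As an illustration of how the hits quoted above compare, the following minimal Python sketch ranks them by E-value (smaller means more significant). The accessions and E-values are those given in the text; the function name rank_hits and the cutoff of 1e-2 are assumptions made for this example and are not part of the BLAST2SRS server or of the authors' procedure.

```python
# BLAST hits quoted in the text: (description, accession, E-value).
hits = [
    ("Escherichia coli YfhM", "P76578", 5.8e-5),
    ("Thermotoga maritima", "Q9X079", 2.3e-8),
    ("Rhizobium meliloti", "Q92VA6", 5.0e-3),
    ("Pseudomonas putida", "AAN66197", 1.3e-4),
]

def rank_hits(hits, cutoff=1e-2):
    """Keep hits at or below the cutoff and sort them strongest first.

    The cutoff value is a hypothetical choice for illustration only.
    """
    kept = [h for h in hits if h[2] <= cutoff]
    return sorted(kept, key=lambda h: h[2])

ranked = rank_hits(hits)
for description, accession, evalue in ranked:
    print(f"{evalue:8.1e}  {accession:10s} {description}")
```

The Thermotoga maritima entry sorts first because it has the smallest E-value of the four hits.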
Bacterial α2-macroglobulin sequences
Figure 1a shows an alignment of the segment spanning the CxEQ motif for a representative set of bacterial α2M homologs. Not all bact-α2Ms possess the CxEQ motif. Using E. coli as the reference, YfhM is the archetype of a large group, mostly with the thioester motif, and YfaS is the archetype of a smaller, diverged group always lacking the motif. The sequences of the YfhM group are sufficiently divergent that accurate alignment proved time-consuming, but was achieved over almost the whole sequence length, other than the highly variable amino termini. We did not attempt to align together the YfhM and YfaS groups and the metazoan α2Ms. Such an alignment would be useful only if the resulting trees were informative, but the high divergence between the groups precludes accurate alignment and would lead to unreliable tree calculation. (In future, given more YfaS sequences and α2Ms from more metazoan lineages and a solved three-dimensional structure to guide alignment, this might be worth revisiting.) One feature apparent in many of the aligned YfhM sequences is a conserved cysteine directly following the signal peptide (Figure 1b), indicating palmitoylation. The presence of an aspartic acid residue following the palmitoylated cysteine has been shown in E. coli to dictate sorting to the inner membrane [17, 18], in which case YfhM will be found in the periplasmic space, attached to the inner membrane. Given the CxEQ motif, covalent trapping of proteases in the periplasmic space seems to be the most likely function (whether the covalent links are to the trapped protease or between the α2M multimers, as in the horseshoe crab Limulus). The YfaS group of bact-α2Ms lacks a palmitoylable cysteine, so its members may be secreted. Absence of the CxEQ motif indicates that their molecular function must differ, at least in part. This does not, of itself, rule out protease entrapment: chicken ovostatin, which also lacks the reactive thioester motif, still entraps proteases.
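The two sequence features discussed above (the CxEQ thioester motif and a cysteine followed by aspartic acid at the start of the mature protein) are simple enough to check with a short script. The Python sketch below is a minimal illustration only: the function names and the toy sequence are hypothetical, the signal peptide is assumed to have been removed already, and no claim is made that the authors screened sequences this way.

```python
import re

# CxEQ thioester motif: Cys, any one residue, Glu, Gln.
THIOESTER = re.compile(r"C[A-Z]EQ")

def find_thioester_sites(seq):
    """Return 0-based start positions of every CxEQ match in a protein sequence."""
    return [m.start() for m in THIOESTER.finditer(seq.upper())]

def lipoprotein_sorting(mature_seq):
    """Classify the amino terminus of a mature protein (signal peptide removed).

    Encodes only the rule stated in the text: a leading cysteine can be
    acylated, and an aspartic acid directly after it indicates sorting
    to the inner membrane.
    """
    seq = mature_seq.upper()
    if not seq.startswith("C"):
        return "no acylatable cysteine"
    if len(seq) > 1 and seq[1] == "D":
        return "inner membrane"
    return "acylated, sorting not predicted here"

# Toy sequence invented for illustration; it is not a real bact-α2M.
toy = "CDGSAAGCGEQNMVL"
print(find_thioester_sites(toy))   # positions of CxEQ matches
print(lipoprotein_sorting(toy))    # predicted sorting of the mature protein
```

A YfhM-like sequence would be expected to pass both checks, whereas a YfaS-like sequence would be expected to fail both.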
Genomic context of bacterial α2-macroglobulins
A survey of completely sequenced bacterial genomes was undertaken to establish which lineages possessed bact-α2Ms and which did not. Representative results are summarized in Figure 2. It is clear that bact-α2M possession correlates poorly with phylogenetic relationship, except among very closely related species.
Bact-α2Ms are absent from the full proteomes of the following anciently diverged free-living species: the hyperthermophilic chemolithoautotroph Aquifex aeolicus, the thermophilic photolithoautotroph Chlorobium tepidum, the cyanobacteria Synechocystis, Synechococcus and Prochlorococcus, all firmicutes including Bacillus subtilis, all actinobacteria including Streptomyces coelicolor, the β-proteobacterium Nitrosomonas europaea and the δ-proteobacterium Geobacter metallireducens. Furthermore, possession of bact-α2M is inconsistently represented within clades such as the proteobacteria, spirochetes and cyanobacteria. This is well illustrated by the two species of Helicobacter, one exploiting the acidic stomach and the other the very different environment of the liver: only the latter has a bact-α2M. The H. hepaticus genome lacks essentially all the proposed H. pylori virulence factors and is believed to possess a quite different set, adapted to its hepatobiliary habitat. The irregular phylogenetic correlation suggests that bact-α2Ms are 'lifestyle' genes, affecting which niches a bacterium is able to exploit. Although an association with colonization seems clear (Figure 2), there is a strong bias in bacterial genome sequencing in favor of pathogenic species: this currently precludes a statistical assessment and might create a misleading phylogenetic perspective.
The STRING server was used to check for neighboring genes that persistently co-occur with bact-α2Ms. Using either yfhM or yfaS as seed, STRING reported two conserved gene sets that are widely found with bact-α2Ms. The results are summarized in Figure 2. The yfhM group always co-occurs with pbpC, which encodes penicillin-binding protein 1C (PBP1C). The gene topology is almost always consistent with pbpC and yfhM being in the same operon (or co-transcribed from a bidirectional promoter, as in Anabaena). The more strongly an operon structure is conserved across species, the more likely are the encoded proteins to have associated functions. Moreover, products of conserved gene pairs very often associate physically. Therefore, if YfhM is involved in colonizing or pathogenic lifestyles, so should be its partner. PBP1C is a paralog of the periplasmic cell-wall biosynthesis proteins PBP1A and PBP1B, though with the addition of a carboxy-terminal non-enzymatic domain of approximately 100 residues (PFAM:PF06832). The PBP1A and PBP1B peptidoglycan synthases each have two enzymatic domains, an amino-terminal transglycosylase and a carboxy-terminal transpeptidase. Although it possesses the two enzymatic domains, studies have shown that PBP1C does not substitute for these proteins in cell-wall biosynthesis during vegetative growth: indeed deletion of pbpC has a weak phenotype not affecting cell viability in the laboratory, although the number of peptide crosslinks is increased. The transpeptidase domain in PBP1C is thought not to bind to most of the β-lactams that inhibit the paralogous enzymes, nor to be a functional transpeptidase. One curious finding is that, in vitro, PBP1C accounts for 75% of transglycosylase activity, yet is responsible for only 3% of de novo peptidoglycan biosynthesis in the cell.
As PBP1C does not substitute for the biosynthetic enzymes, a possible role would be in emergency repairs to the peptidoglycan, where its efficient transglycosylase activity would be appropriate.
The yfaS group of bact-α2Ms is likewise usually found in a candidate operon, at least within the proteobacteria (Figure 2), in this case with four other gene families, defined by the E. coli yfaA, yfaQ, yfaP and yfaT genes. All these genes have signal sequences and their encoded proteins are expected to be secreted or periplasmic, but, otherwise, sequence analysis has yielded no clues to their function. It is possible that all the encoded proteins function to disrupt or resist host defenses. The YfaS-like bact-α2Ms of the free-living and highly divergent Thermotoga, Deinococcus and Rhodopirellula (none of which is known to be invasive) are not found associated with most of these other genes.
Microarray expression data
The STRING server was also used to check for any significant coexpression of yfhM, yfaS and other members of the two candidate operons, using E. coli data from the Stanford microarray database. All the genes associated with those for bact-α2Ms are present in the experiments included in the STRING database, and are expressed at levels significantly above background. However, none of the genes exhibits coordinated variation in expression levels either with each other or with any other genes in the E. coli genome under the conditions investigated.
Calculation of sequence trees
An initial rough tree calculated from an alignment of yfhM family sequences gave strong indications that several horizontal transfers had occurred among the available set. As yfhM is always found together with pbpC, indicating that the paired genes should have a shared phylogenetic history, a quick check of the PBP1C tree was also done. The two trees, which provide controls for each other's topologies, were very similar, indicating that the apparent HGTs were unlikely to be artifacts. Therefore, we undertook a more careful phylogenetic analysis with a view to improving the phylogenetic signal-to-noise ratio and using a method that is less prone to rate variation artifacts than neighbor-joining.
Alignments were reviewed and edited by hand, then processed to remove especially noisy segments, as outlined in Materials and methods. Trees were calculated with MrBayes, a Bayesian resampling protocol that is now widely adopted: MrBayes approaches the quality of maximum-likelihood methods while being quicker to calculate (though still computationally demanding). Results of the tree calculations are presented in Figure 3. The two trees differ by only three branch placements, indicating that the topologies are mostly sound, except for a few branches with low support (low posterior probabilities). As the calculated trees are unrooted, the ordering of the deepest branches cannot be mapped onto time.
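One common way to quantify how much two unrooted trees over the same taxa disagree is to compare their sets of bipartitions (a Robinson-Foulds style symmetric difference). The Python sketch below illustrates that idea on toy trees written as nested tuples. It is an assumption-laden illustration, not the procedure used here: the text does not state how the three differing branch placements were counted, the taxon labels A to E are placeholders, and a count of moved branches is not the same quantity as a bipartition distance.

```python
def leaves(tree):
    """Return the set of leaf names in a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Return the non-trivial bipartitions of the tree, treated as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always has the same representation.
    """
    all_taxa = leaves(tree)
    ref = min(all_taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        other = all_taxa - clade
        if len(clade) > 1 and len(other) > 1:
            found.add(other if ref in clade else clade)
        return clade

    walk(tree)
    return found

def rf_distance(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))

# Toy trees with placeholder taxa (hypothetical, not the Figure 3 trees).
tree_1 = (("A", "B"), ("C", "D"), "E")
tree_2 = (("A", "C"), ("B", "D"), "E")
print(rf_distance(tree_1, tree_1))
print(rf_distance(tree_1, tree_2))
```

Because the trees are compared as unrooted, two nested-tuple encodings that differ only in where the root is drawn give a distance of zero.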
Fitting the observed tree topologies to the vertical descent model
The number of ancestral genes required to explain an observed tree topology can be determined by embedding the sequence tree within a species tree. We prepared a species tree for the bacterial species in Figure 3 such that currently uncertain affinities were assigned in favor of the observed trees: this will provide a minimum estimate of ancestral gene number. The sequence tree topology was embedded into the bacterial species tree using GeneTree. The reconciled tree required six gene-duplication events and 29 lineage-specific deletions. The last common ancestor (LCA) of the full set had a minimum of three genes, the LCA of the proteobacteria had four genes, while the LCA of the α/β-proteobacteria had six genes. The tree reveals a tendency for increasing gene number over time when vertical descent has strictly occurred.
The problems of the vertical descent model are manifold. First, all sequenced extant genomes have single copies of the yfhM/pbpC genes, yet vertical descent shows a progression toward increasing gene number over time. This requires late but fully independent massive gene loss to have occurred in all lineages. Second, the observed robust sequence tree topologies would require a clear affinity between cyanobacteria and spirochetes, an affinity that has hitherto gone entirely unnoticed in the field of bacterial phylogeny. Third, the number of events (gene duplications and deletions) found to be required under a model of vertical descent is based on a species tree chosen to minimize this number (see Materials and methods). As the species tree used is unlikely to be accurate in places where bacterial phylogeny is unresolved, the number of such events required under a vertical descent model is probably greater than described (and the model correspondingly less likely).
Although bizarre evolutionary scenarios can always be invoked, the given tree topologies are difficult to explain solely by vertical descent from a common ancestral eubacterium.
Horizontal transfers of the yfhM and pbpC gene couplet
Difficulties in accounting for the observed YfhM and PBP1C trees disappear if it is assumed that a number of horizontal gene transfers have occurred. Vertical transmission then only occurred among some sets of quite closely related bacteria. There are four deeply diverged sets within the tree, which will be discussed in turn.
The major proteobacterial grouping
Of the 22 proteobacterial species sampled, 18 are exclusively grouped together in the two trees. The species are all plant or animal pathogens and symbionts - even the anaerobic sulfate-reducing Desulfovibrio desulfuricans is a symbiont of deep-sea hydrothermal vent polychaete worms. Sub-branches compatible with vertical descent are present for five α-proteobacteria including Agrobacterium tumefaciens and for seven γ-proteobacteria including E. coli. For bact-α2M and PBP1C to have existed in proteobacteria before the α/γ split, these gene sequences would have to be evolving more slowly than in other parts of the tree. It is more likely that the genes spread via HGT through these groups some time ago and then have been vertically inherited (at least in part). The remainder of the grouping consists of unambiguous HGT, although the direction of transfer is not always clear-cut. The β-proteobacterium Bordetella pertussis has acquired the genes from a γ-proteobacterium. The δ-proteobacterium D. desulfuricans has acquired the genes from an α-proteobacterium. An outlier set of α- and γ-proteobacteria, including Rickettsia conorii and Yersinia pestis, indicates two further transfers, but in this case the order of the transfers is not determined. Therefore, to create the topology of this grouping, a minimum of four unique horizontal transfers must have occurred.
The bacteroidete/fusobacteria/ε-proteobacteria grouping
This group consists of three unrelated taxa which exploit niches related to the animal digestive system. The ε-proteobacterium Helicobacter hepaticus colonizes mouse liver ducts, Fusobacterium species colonize the teeth, Bacteroides thetaiotaomicron (not shown on the tree owing to an incomplete bact-α2M sequence) is a major gut bacterium, while a second bacteroidete, Cytophaga hutchinsonii, exploits cellulose-rich animal waste. Horizontal transfer into the ε-proteobacterium H. hepaticus is clear-cut, as it is isolated on the trees from all other proteobacteria, whereas other Helicobacter lack these genes. Another transfer has occurred between fusobacterial and bacteroidete lineages, but the direction is not clear. A third HGT is likely to have originally introduced the genes into these lineages but cannot be formally assigned without a root.
The isolated Magnetospirillum α-proteobacteria branch
Magnetospirillum magnetotacticum bact-α2M and PBP1C are deeply diverged from all other species, including other α-proteobacteria. This positioning away from its relatives indicates that HGT occurred into the Magnetospirillum lineage. The strong divergence from other sequences may indicate that the sequence has undergone rapid evolution. This latter point may be addressed in future if the branch becomes populated by some closer relatives.
The cyanobacteria/spirochete/β-proteobacteria grouping
This branch consists of three very unrelated taxa: cyanobacteria facultatively symbiotic with plants, spirochetes pathogenic to metazoans and a pair of closely related genera of β-proteobacteria that each include free-living, symbiotic and pathogenic forms. The deepest diverged in the group are the Anabaena-like symbiotic cyanobacteria. The economically significant Anabaena-Azolla symbiosis provides the nitrogen fixation that fertilizes paddy fields. As other free-living cyanobacteria, such as Synechococcus, lack these genes, HGT into this lineage is very likely. The isolation of the Ralstonia and Chromobacterium clade from other proteobacteria also indicates HGT into their lineage. HGT for Leptospira (the causal agent of leptospirosis) is also indicated, as other spirochetes such as Borrelia burgdorferi (the causal agent of Lyme disease) and Treponema pallidum (the causal agent of syphilis) lack these genes. Thus, this set of genes, which is clearly grouped together by molecular phylogeny yet is found within very diverse taxa, appears to have been horizontally transmitted three times.
Sifting the evidence for bacterial HGT
There is increasing evidence that HGT has had - and continues to have - a major role in the adaptation of organisms, especially prokaryotes, to exploiting new environments. Nevertheless, it is often hard to demonstrate HGT, and there is considerable confusion about how to do so. The default hypothesis should remain vertical transmission unless there is good evidence for HGT. The over-hasty assignment of recent bacterial-to-vertebrate gene transfers, solely on the basis of BLAST E-values, has been firmly refuted [32, 33]. Such premature HGT assignments have been surveyed and used to provide guidelines for evaluating HGT [34, 35]. Sometimes the evidence is clear-cut, as when adaptive genes are carried on phage, plasmid or transposon. Inconsistent phylogenetic distribution may be evidence for HGT but must be carefully balanced against gene-loss models, recognizing that the two processes are not mutually exclusive. Phylogenetic trees only provide good evidence for HGT when branching is robust and clearly delimited by appropriate outgroups: the HGT must carry a diagnostic molecular evolutionary signal.
One of the best paradigms for investigating recent and ongoing HGT in parasitic prokaryotes is the γ-proteobacterium Vibrio cholerae, which acquired pathogenicity late in recorded history. Free-living Vibrio species are common, harmless aquatic microorganisms. The first recorded cholera pandemic occurred in 1817, the sixth and seventh occurred recently enough to be investigated with modern molecular techniques, and the eighth is probably underway now. The basic pathogenicity genes ctxAB, which encode cholera toxin, lie within the genome of the filamentous phage CTXφ. Other pathogenicity gene 'islands' include the toxin-co-regulated pilus, needed for colonization, and the VSP-1 and VSP-2 islands, which appeared in strains of the seventh pandemic and are suggested to have been integral to that event. The recent O139 serotype arose by wholesale replacement of the pre-existing gene cluster encoding lipopolysaccharide O side-chain synthesis, yielding an outer surface with a different architecture, less susceptible to pre-existing immunity. Thus, pathogenic V. cholerae continues to adapt to the invasive lifestyle, to a large extent through HGT-mediated acquisition of new capabilities, including, but not limited to, better avoidance of host defenses. Although many of the functions encoded by the genes within pathogenic islands are not understood, their absence from the free-living Vibrio species is good evidence that they have been incorporated, and then conserved, because of a direct or indirect role in enhancing virulence. Even though it is a γ-proteobacterium, the genomic sequence data show that V. cholerae has not (re-)acquired a bact-α2M gene. At least, not yet.
HGT of α2-macroglobulin among colonizing bacteria
Our unexpected finding that α2-macroglobulins, hitherto only known from metazoans, are widely present in eubacterial genomes has provided one of the most clear-cut examples of widespread HGT between extremely divergent bacterial taxa that can be monitored by molecular phylogenetic approaches. We have been able to infer a minimum of 11 independent HGTs for the major yfhM group among 27 sequences tested. Because this group always coexists with a second gene, pbpC, shared evolutionary history means the trees are controlled for topological consistency, so that the assignment of HGT is not in doubt. This work does not address an earlier evolutionary history preceding the link-up of this gene pair.
It is striking that all four deeply diverged groups in the trees include proteobacterial species. This alone clearly indicates that HGT has occurred. Because this is the most heavily researched bacterial taxon and provides most of the sequenced genomes, it is not yet clear whether other taxa will also show multiple independent acquisitions of bact-α2M and pbpC. Currently, the trees show a minimum of 11 independent HGT events, even if the originating (but unknown) taxon were represented here. A twelfth HGT is indicated if bact-α2M was originally captured from a metazoan (or vice versa). Extensive gene loss is also likely to have contributed to the phylogenetic distributions in Figure 2, particularly amongst the α-, β- and γ-proteobacteria, where possession seems the default yet both vertical and horizontal transmission occur. Quite possibly, a cycle of gain-loss-gain has repeatedly occurred as strains adapt between colonization and free-living environments. The role of gene loss cannot be quantified with current data, but this may become possible in the future with more comprehensive genome coverage.
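The minimum of 11 transfers can be checked against the per-grouping counts given in the four grouping subsections. The short Python tally below is one reading of those counts that reproduces the stated total; the dictionary keys are shorthand labels introduced here, and the optional twelfth event is the metazoan-bacterium transfer mentioned above.

```python
# Minimum HGT events per deeply diverged grouping, as stated in the text.
min_hgt_per_group = {
    "major proteobacterial grouping": 4,
    "bacteroidete/fusobacteria/epsilon-proteobacteria": 3,
    "Magnetospirillum branch": 1,
    "cyanobacteria/spirochete/beta-proteobacteria": 3,
}

# Total among bacterial lineages.
bacterial_total = sum(min_hgt_per_group.values())

# One further event if the gene originally crossed between a metazoan
# and a bacterium (direction unresolved).
with_metazoan_transfer = bacterial_total + 1

print(bacterial_total, with_metazoan_transfer)
```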
Where pathogenic bacteria and their eukaryotic hosts share related genes that appear to be transferred from one to the other, it is believed that the direction is overwhelmingly from the eukaryote to the bacterium. The failure to find phylogenetic evidence for bacterium-to-vertebrate gene transfers is consistent with this direction [32, 33]. We expect that bact-α2M was transferred from a metazoan host to a pathogenic bacterium, but this is not yet demonstrable and remains supposition. Given a simple early metazoan, where the germ cells would not be physically isolated from any bacterial infection, one can see how selection could act to fix a bact-α2M gene transferred in the opposite direction, if bact-α2M was originally bacterial. This issue may become resolvable in future given much more extensive phylogenetic coverage.
Bacterial α2-macroglobulin in apparently free-living bacteria
Many bacterial taxa contain a plethora of strains adapted for free-living, symbiotic and pathogenic lifestyles. Examples include the Ralstonia and Anabaena genera adapted to plants, Escherichia and Treponema adapted to animals and pseudomonads adapted to both. Many free-living bacterial strains are also facultative colonizers. This creates some difficulty in cataloguing genes that are adapted to colonizing niches versus free-living: it is rarely certain whether an apparently free-living species never colonizes a higher organism, or is not part of a continuum of strains frequently exchanging lifestyle genes. Given this caveat, we reviewed all the currently completed genomes of bacteria that are not in any way known to have close associations with higher eukaryotes. The available set of Gram-positive bacterial genomes stand out as never possessing a bact-α2M gene (see below). Only three apparently free-living Gram-negatives (Magnetospirillum, Caulobacter and Thermotoga) have bact-α2Ms while seven (Aquifex, Chlorobium, Synechocystis, Synechococcus, Prochlorococcus, Nitrosomonas and Geobacter) do not. Thus this crude estimate would suggest that possession of a bact-α2M gene is associated with colonization, not as a core colonization factor, but as an accessory that enhances fitness for the colonization environment. Further, it may imply that the three 'free-living' species possessing a bact-α2M gene have undocumented facultative symbiotic capabilities with higher eukaryotes.
Usage of host α2-macroglobulin by invasive Gram-positive bacteria
The Gram-positive firmicutes and actinobacteria stand out as always lacking bact-α2M genes (Figure 2). However, certain Gram-positives have found a more direct way to take advantage of α2M proteins. Pathogenic Streptococcus pyogenes directly co-opts host α2M for defense against host proteases through the cell-surface proteins GRAB and protein G [40, 41]. As Gram-positive bacteria do not possess an outer membrane, defensive strategies are likely to differ from those of Gram-negatives. Invasive Gram-positives are found to coat themselves in a selected set of host proteins to obstruct host defenses. Streptococcal GRAB mutants that are unable to bind α2M have attenuated virulence. It seems remarkable that prokaryotes have evolved two totally independent strategies to take advantage of α2M. On the one hand, Gram-positives are able to use the host's own protein; on the other, Gram-negatives have acquired their own gene. The clear implication is that α2M functionality has a wide and general significance spanning many bacterial taxa.
Bacterial α2-macroglobulin YfhM/PBP1C: a second line of defense?
The lipopolysaccharide (LPS) layer of the outer membrane of Gram-negative bacteria provides a first line of defense. The outer membrane barrier is sufficient to prevent the enzyme lysozyme from lysing Gram-negative bacteria in culture. Under attack from host immunity and antimicrobial peptides, LPS can be disrupted or stripped away - for example, when released into the circulation, it can lead to septic shock - leaving the peptidoglycan cell wall and inner membrane exposed. There is current interest in antibacterial strategies that endeavor to enhance lysozyme activity by co-administration with agents that disrupt the outer membrane, such as EDTA.
The following assumptions lead us to a hypothesis for YfhM bact-α2M/PBP1C as a periplasmic defense system. First, bact-α2M and PBP1C form a complex, probably through the carboxy-terminal non-enzymatic domain of PBP1C. Second, the complex resides in the periplasmic space, attached by acylation to the inner membrane. Third, bact-α2M functions to entrap attacking proteases. Fourth, PBP1C is a transglycosylase that polymerizes glycan chains. Fifth, a periplasmic defense is only needed when the outer membrane has been breached and peptidoglycan is under attack.
The role of the bact-α2M/PBP1C system is then perceived to be defense at, and repair of, peptidoglycan breaches induced by the host (Figure 4). PBP1C provides 75% of the transglycosylase activity in vitro, but only 3% of peptidoglycan biosynthesis in vivo: it is a fast linear transglycosylase, ideal for traversing and repairing a breach. During repair it will, however, be exposed to attacking proteases and may be rapidly rendered dysfunctional. The role of bact-α2M will be to entrap attacking proteases, protecting PBP1C and other periplasmic proteins such as the high-affinity lysozyme inhibitor Ivy in E. coli. In this way, the fate of the invading bacterial cell will depend on the relative balance of the host's attacking forces versus the bacterial defense systems. Under an optimized host attack, such defenses would be rapidly overwhelmed, but when (or where) the host is not well prepared, these defenses may serve to prolong colonization.
Potential experimental and medical applications
The yfhM/pbpC gene pair in bacteria not only suggests experimental research strategies, but may have medical potential to help combat pathogenic organisms. The predicted periplasmic location of bact-α2M and PBP1C, and their complexing with each other (and with any other periplasmic proteins), should be straightforward to investigate biochemically. Elucidation of the host proteases entrapped by bact-α2Ms should reveal which host defense proteases are targeted at which parasites, leading to enhanced understanding of host defense mechanisms. Bact-α2M-inhibited proteases should either be directly active against pathogen proteins or act indirectly, as do, for example, the proteases of the complement cascade. pbpC deletion mutants should show increased sensitivity to lysozyme treatment, and pbpC/ivy double mutants more so still.
The bact-α2M/PBP1C proteins also provide targets for medical intervention, for example by training host immunity, the administration of anti-bact-α2M monoclonal antibody or in combination therapies. Antibodies to bact-α2Ms should act not just by promoting immune clearance but also to block the bact-α2M activity, so that the host antibacterial proteases are unhindered. This dual effect may provide an enhanced prophylactic efficacy for vaccines that are augmented with extra bact-α2M protein (probably as an inactive variant) or be directly invoked by targeted anti-bact-α2M antibody administration for combating acute infection. PBP1C should also be rendered dysfunctional by specific antibodies, perhaps in combination with transglycosylase inhibitors such as the antibiotic moenomycin.
Conclusions
Bact-α2Ms are spread widely amongst symbiotic and pathogenic bacteria. The implication is that protease inhibition is often an aid to colonizing higher eukaryotes. The major form of bact-α2Ms is typified by E. coli YfhM and is a periplasmic protein that co-occurs with periplasmic PBP1C, a candidate peptidoglycan repair enzyme. The distribution of the yfhM/pbpC gene pair is inconsistent with the established bacterial phylogeny. Molecular trees calculated for each of the proteins are in good agreement with each other. Each tree provides a control for the other tree's topology, allowing confidence in the general topology. This allows us to state with high confidence that at least 11 separate gene transfers have occurred between highly diverged bacterial taxa. An additional gene transfer has occurred between bacteria and metazoans. We are not yet able to determine in which direction this transfer occurred, and therefore the title question is not yet answerable.
The known properties of α2Ms and PBP1C point to a periplasmic line of defense at cell-wall breaches, mounted by the YfhM bact-α2M and PBP1C. This defensive line should be sensitive to antibody-based therapeutic approaches, whether enhanced vaccine efficacy or direct administration of antibody.
Materials and methods
Sequence database searches
Bacterial α2Ms were clearly revealed in a search of SWISSALL using BLAST2SRS, in which the species names are included in the BLAST output. Profile searches as described, using the EMBL Bioccelerators, supported and extended the findings and were used to retrieve a set of bacterial sequences. Reciprocal searches with bact-α2M profiles reconfirmed the findings with good E-values (<1e-25). The sets of proteomes provided by the BLAST server [51, 52] at the National Center for Biotechnology Information (NCBI) were surveyed to determine the presence or absence of α2Ms in bacteria and in non-metazoan eukaryotes.
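The E-value criterion used in the reciprocal searches can be illustrated with a minimal sketch. It assumes that hits have been exported as 12-column, tab-separated BLAST tabular output, which is not something the methods above specify; the function name and the example rows are hypothetical, and only the 1e-25 threshold is taken from the text.

```python
E_VALUE_CUTOFF = 1e-25  # threshold quoted in the text above


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, evalue) pairs whose E-value is below the cutoff.

    Assumption: 12-column tab-separated BLAST tabular output, in which
    column 2 is the subject identifier and column 11 is the E-value.
    """
    hits = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete hit record
        evalue = float(fields[10])
        if evalue < cutoff:
            hits.append((fields[1], evalue))
    return hits


# Invented rows for illustration only; these are not results from the study.
EXAMPLE = "\n".join([
    "query1\tsubjA\t35.2\t900\t500\t20\t1\t880\t5\t890\t1e-80\t300",
    "query1\tsubjB\t24.0\t200\t140\t8\t10\t200\t30\t220\t2e-05\t48.1",
    "query1\tsubjC\t30.1\t700\t450\t15\t1\t690\t12\t700\t3e-26\t120",
])

strong_hits = filter_hits(EXAMPLE)  # keeps subjA and subjC, drops subjB
```

A strict "less than" comparison is used so that the filter matches the "<1e-25" wording; a hit with an E-value of exactly 1e-25 would be excluded.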
Survey of genomic context
The STRING server is a resource for exploring genome context (for example, identifying groups of genes found in close proximity in many different genomes). Queries with bact-α2Ms from E. coli or other bacteria yielded a recurring result: in most species the bact-α2Ms cluster consistently with certain other gene families. This behavior is typical of gene sets belonging to the same operon. These families were retrieved and used for further database explorations, alignments and trees. To identify the location of these gene families in other genomes where linkage to bact-α2Ms is less direct than those presented by STRING, we downloaded genomic database entries from the NCBI, converted the format of these files to EMBL using BioPerl, and assessed the location of the genes using Artemis. In addition, linkage of these gene families was investigated in organisms not included in STRING using the same method.
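The kind of linkage check described here can be sketched as follows, assuming gene coordinates have already been read from an annotated genome entry. The 300-bp gap limit, the same-strand requirement, the helper names and all coordinates are illustrative assumptions rather than values from the study, and the sketch ignores genes that span the origin of a circular chromosome.

```python
from collections import namedtuple

# start/end are 1-based inclusive coordinates; strand is +1 or -1.
Gene = namedtuple("Gene", "name start end strand")


def intergenic_gap(a, b):
    """Bases separating two genes on one replicon (0 if they abut or overlap)."""
    first, second = sorted((a, b), key=lambda g: g.start)
    return max(0, second.start - first.end - 1)


def tightly_linked(a, b, max_gap=300):
    """True if both genes lie on the same strand within max_gap bases.

    max_gap=300 is an arbitrary illustrative limit, not a published value.
    """
    return a.strand == b.strand and intergenic_gap(a, b) <= max_gap


# Hypothetical coordinates, for illustration only.
yfhM = Gene("yfhM", 1000, 5962, -1)
pbpC = Gene("pbpC", 5970, 8282, -1)
distant = Gene("distant", 50000, 51000, -1)

pair_is_linked = tightly_linked(yfhM, pbpC)        # gap of 7 bases
far_gene_linked = tightly_linked(yfhM, distant)    # gap far above the limit
```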
Sequence alignment and editing
Sequences were aligned using Clustal X 1.83. Because many sequences are very dissimilar to each other, misaligned regions were to be expected. These were identified using the 'low scoring segments' check and either realigned using the 'realign selected range' option or hand-edited in SeaView. Corrections were assessed both by improvements to conserved hydrophobic columns (indicating structurally important residues) and with the 'low scoring segments' check. Sequences excluded because they were either too divergent to be aligned or may contain sequencing errors included those from Deinococcus radiodurans and Bacteroides thetaiotaomicron.
Calculation of sequence trees
Preliminary trees were made by neighbor-joining as implemented in Clustal X, excluding gaps and correcting for multiple substitutions with the Kimura PAM model. These initial trees indicated that HGT had occurred, warranting more careful assessment. Alignments were processed with the Gblocks server (for the divergent bact-α2Ms, the low stringency settings were used). Gblocks heuristically removes poorly conserved, excessively divergent segments of alignments with low signal-to-noise ratio in order to enhance the phylogenetic signal. Processed alignments were used to derive tree topologies using Bayesian inference of phylogeny as implemented by MrBayes v2.01, with maximum-likelihood branch-length estimates provided by PUZZLE. MrBayes was used with four heated chains over 250,000 generations, sampling every 20 trees. The likelihoods of these trees were examined to estimate the length of the burn-in phase, and all trees sampled 20,000 generations later than this point were used to create a consensus tree using the 50% majority rule. Both MrBayes and PUZZLE were used with the JTT model of amino-acid substitution, assuming the presence of invariant sites and using a gamma distribution approximated by four different rate categories to model rate variation between sites, estimating amino-acid frequencies from the alignment. Trees were displayed and rooted in Njplot.
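The burn-in and consensus steps can be sketched in a few lines. This is not the MrBayes implementation: trees are represented here simply as sets of clades, the burn-in generation of 30,000 is a made-up value (the text does not report the estimate), and the sketch reads the sampling interval as one tree per 20 generations.

```python
from collections import Counter

GENERATIONS = 250_000   # chain length stated in the text
SAMPLE_EVERY = 20       # sampling interval, read here as generations


def samples_after_burn_in(burn_in_generation, extra=20_000):
    """Count sampled trees later than burn_in_generation + extra generations."""
    first_excluded = burn_in_generation + extra
    return max(0, (GENERATIONS - first_excluded) // SAMPLE_EVERY)


def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: frequency} for clades in more than `threshold` of trees.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


# Hypothetical burn-in estimate of 30,000 generations.
kept = samples_after_burn_in(30_000)  # (250000 - 50000) / 20 = 10000 trees

# Four toy trees over taxa A-D, for illustration only.
toy_trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
consensus = majority_rule_clades(toy_trees)  # AB and ABC, each at 0.75
```

The frequency attached to each retained clade corresponds to the support value usually reported on a Bayesian consensus tree.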
Estimation of minimum yfhM gene number in the bacterial last common ancestor
The program GeneTree was used to evaluate the cost of embedding the YfhM sequence tree in a bacterial species tree. To compute the minimum gene number required in the last common ancestor of the given bacterial set, we set the unresolved bacterial affinities to match the YfhM/PBP1C trees (that is, cyanobacteria and spirochetes form a clade, as do bacteroidetes and fusobacteria; within the proteobacteria, the subgroup affinities were allocated to minimize the number of duplications required in the observed trees). Magnetospirillum was excluded from the analysis as its position is not stable in the YfhM and PBP1C trees. Embedding the observed tree topology in this bacterial species tree yielded a reconciled tree requiring six duplication and 29 deletion events.
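The idea behind reconciling a gene tree with a species tree can be illustrated with a small sketch. It uses the standard last-common-ancestor mapping to count inferred duplications and is not GeneTree's own code; it does not count deletions, it assumes binary trees written as nested tuples with gene-tree leaves labeled by species, and the three-taxon trees are toy examples rather than the YfhM data.

```python
def index_tree(node, parent, depth, parents, depths):
    """Record the parent and depth of every node in a nested-tuple tree."""
    parents[node] = parent
    depths[node] = depth
    if isinstance(node, tuple):
        for child in node:
            index_tree(child, node, depth + 1, parents, depths)


def lca(a, b, parents, depths):
    """Last common ancestor of two species-tree nodes."""
    while a != b:
        if depths[a] >= depths[b]:
            a = parents[a]
        else:
            b = parents[b]
    return a


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that LCA mapping infers to be duplications."""
    parents, depths = {}, {}
    index_tree(species_tree, None, 0, parents, depths)
    duplications = 0

    def map_node(g):
        nonlocal duplications
        if isinstance(g, str):
            return g  # a gene maps to the species it was sampled from
        child_maps = [map_node(child) for child in g]
        mapped = child_maps[0]
        for other in child_maps[1:]:
            mapped = lca(mapped, other, parents, depths)
        # A node mapping to the same species node as one of its children
        # cannot be a speciation, so it is scored as a duplication.
        if mapped in child_maps:
            duplications += 1
        return mapped

    map_node(gene_tree)
    return duplications


# Toy trees, for illustration only.
species = (("A", "B"), "C")
congruent_genes = (("A", "B"), "C")
incongruent_genes = (("A", "C"), "B")

dups_congruent = count_duplications(congruent_genes, species)      # 0
dups_incongruent = count_duplications(incongruent_genes, species)  # 1
```

In the toy case, a gene tree that disagrees with the species tree can only be embedded by invoking an ancestral duplication followed by losses, which is why a widely incongruent tree implies many ancestral gene copies unless horizontal transfer is allowed.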
Microarray expression data
STRING was used to investigate the expression patterns of genes as detected by DNA microarray. The Stanford Microarray Database (SMD) [26, 65] was used to verify that these genes were indeed spotted on the arrays used by STRING, and that the spots displayed intensities significantly higher than background levels.
References
Chu CT, Pizzo SV: alpha 2-macroglobulin, complement, and biologic defense: antigens, growth factors, microbial proteases, and receptor ligation. Lab Invest. 1994, 71: 792-812.
Salvesen GS, Sayers CA, Barrett AJ: Further characterization of the covalent linking reaction of alpha 2-macroglobulin. Biochem J. 1981, 195: 453-461.
Law SK, Dodds AW: The internal thioester and the covalent binding properties of the complement proteins C3 and C4. Protein Sci. 1997, 6: 263-274.
Sottrup-Jensen L: Alpha-macroglobulins: structure, shape, and mechanism of proteinase complex formation. J Biol Chem. 1989, 264: 11539-11542.
Armstrong PB: The contribution of proteinase inhibitors to immune defense. Trends Immunol. 2001, 22: 47-52. 10.1016/S1471-4906(00)01803-2.
Miyoshi S, Kawata K, Tomochika K, Shinoda S, Yamamoto S: The C-terminal domain promotes the hemorrhagic damage caused by Vibrio vulnificus metalloprotease. Toxicon. 2001, 39: 1883-1886. 10.1016/S0041-0101(01)00171-4.
Woo PT: Cryptobia (Trypanoplasma) salmositica and salmonid cryptobiosis. J Fish Dis. 2003, 26: 627-646. 10.1046/j.1365-2761.2003.00500.x.
Chu CT, Pizzo SV: Interactions between cytokines and alpha 2-macroglobulin. Immunol Today. 1991, 12: 249-
Lysiak JJ, Hussaini IM, Webb DJ, Glass WF, Allietta M, Gonias SL: Alpha 2-macroglobulin functions as a cytokine carrier to induce nitric oxide synthesis and cause nitric oxide-dependent cytotoxicity in the RAW 264.7 macrophage cell line. J Biol Chem. 1995, 270: 21919-21927. 10.1074/jbc.270.37.21919.
Banyai L, Patthy L: The NTR module: domains of netrins, secreted frizzled related proteins, and type I procollagen C-proteinase enhancer protein are homologous with tissue inhibitors of metalloproteases. Protein Sci. 1999, 8: 1636-1642.
Armstrong PB, Quigley JP: Alpha2-macroglobulin: an evolutionarily conserved arm of the innate immune system. Dev Comp Immunol. 1999, 23: 375-390. 10.1016/S0145-305X(99)00018-X.
Dolmer K, Husted LB, Armstrong PB, Sottrup-Jensen L: Localisation of the major reactive lysine residue involved in the self-crosslinking of proteinase-activated Limulus alpha 2-macroglobulin. FEBS Lett. 1996, 393: 37-40. 10.1016/0014-5793(96)00852-6.
Christophides GK, Zdobnov E, Barillas-Mury C, Birney E, Blandin S, Blass C, Brey PT, Collins FH, Danielli A, Dimopoulos G, et al: Immunity-related genes and gene families in Anopheles gambiae. Science. 2002, 298: 159-165. 10.1126/science.1077136.
Levashina EA, Moita LF, Blandin S, Vriend G, Lagueux M, Kafatos FC: Conserved role of a complement-like protein in phagocytosis revealed by dsRNA knockout in cultured cells of the mosquito, Anopheles gambiae. Cell. 2001, 104: 709-718.
Blandin S, Shiao S-H, Moita LF, Janse CJ, Waters AP, Kafatos FC, Levashina EA: Complement-like protein TEP1 is a determinant of vectorial capacity in the malaria vector Anopheles gambiae. Cell. 2004, 116: 661-670. 10.1016/S0092-8674(04)00173-4.
Blandin S, Levashina EA: Thioester-containing proteins and insect immunity. Mol Immunol. 2004, 40: 903-908. 10.1016/j.molimm.2003.10.010.
Fukuda A, Matsuyama S, Hara T, Nakayama J, Nagasawa H, Tokuda H: Aminoacylation of the N-terminal cysteine is essential for Lol-dependent release of lipoproteins from membranes but does not depend on lipoprotein sorting signals. J Biol Chem. 2002, 277: 43512-43518. 10.1074/jbc.M206816200.
Yamaguchi K, Yu F, Inouye M: A single amino acid determinant of the membrane localization of lipoproteins in E. coli. Cell. 1988, 53: 423-432. 10.1016/0092-8674(88)90162-6.
Nagase H, Harris ED: Ovostatin: a novel proteinase inhibitor from chicken egg white. II. Mechanism of inhibition studied with collagenase and thermolysin. J Biol Chem. 1983, 258: 7490-7498.
Suerbaum S, Josenhans C, Sterzenbach T, Drescher B, Brandt P, Bell M, Droge M, Fartmann B, Fischer HP, Ge Z, et al: The complete genome sequence of the carcinogenic bacterium Helicobacter hepaticus. Proc Natl Acad Sci USA. 2003, 100: 7901-7906. 10.1073/pnas.1332093100.
von Mering C, Huynen M, Jaeggi D, Schmidt S, Bork P, Snel B: STRING: a database of predicted functional associations between proteins. Nucleic Acids Res. 2003, 31: 258-261. 10.1093/nar/gkg034.
Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA. 1999, 96: 2896-2901. 10.1073/pnas.96.6.2896.
Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci. 1998, 23: 324-328. 10.1016/S0968-0004(98)01274-2.
Goffin C, Ghuysen JM: Multimodular penicillin-binding proteins: an enigmatic family of orthologs and paralogs. Microbiol Mol Biol Rev. 1998, 62: 1079-1093.
Schiffer G, Holtje JV: Cloning and characterization of PBP 1C, a third member of the multimodular class A penicillin-binding proteins of Escherichia coli. J Biol Chem. 1999, 274: 32031-32039. 10.1074/jbc.274.45.32031.
Gollub J, Ball CA, Binkley G, Demeter J, Finkelstein DB, Hebert JM, Hernandez-Boussard T, Jin H, Kaloper M, Matese JC, et al: The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res. 2003, 31: 94-96. 10.1093/nar/gkg078.
Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17: 754-755. 10.1093/bioinformatics/17.8.754.
Page RD: GeneTree: comparing gene and species phylogenies using reconciled trees. Bioinformatics. 1998, 14: 819-820. 10.1093/bioinformatics/14.9.819.
Cottrell MT, Cary SC: Diversity of dissimilatory bisulfite reductase genes of bacteria associated with the deep-sea hydrothermal vent polychaete annelid Alvinella pompejana. Appl Environ Microbiol. 1999, 65: 1127-1132.
Hill DJ: Pattern of development of Anabaena in Azolla-Anabaena symbiosis. Planta. 1975, 122: 179-184.
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.
Salzberg SL, White O, Peterson J, Eisen JA: Microbial genes in the human genome: lateral transfer or gene loss?. Science. 2001, 292: 1903-1906. 10.1126/science.1061036.
Stanhope MJ, Lupas A, Italia MJ, Koretke KK, Volker C, Brown JR: Phylogenetic analyses do not support horizontal gene transfers from bacteria to vertebrates. Nature. 2001, 411: 940-944. 10.1038/35082058.
Genereux DP, Logsdon JM: Much ado about bacteria-to-vertebrate lateral gene transfer. Trends Genet. 2003, 19: 191-195. 10.1016/S0168-9525(03)00055-6.
Ragan MA: Detection of lateral gene transfer among microbial genomes. Curr Opin Genet Dev. 2001, 11: 620-626. 10.1016/S0959-437X(00)00244-6.
Faruque SM, Mekalanos JJ: Pathogenicity islands and phages in Vibrio cholerae evolution. Trends Microbiol. 2003, 11: 505-510. 10.1016/j.tim.2003.09.003.
Waldor MK, Mekalanos JJ: Lysogenic conversion by a filamentous phage encoding cholera toxin. Science. 1996, 272: 1910-1914.
Dziejman M, Balon E, Boyd D, Fraser CM, Heidelberg JF, Mekalanos JJ: Comparative genomic analysis of Vibrio cholerae: genes that correlate with cholera endemic and pandemic disease. Proc Natl Acad Sci USA. 2002, 99: 1556-1561. 10.1073/pnas.042667999.
Bik EM, Bunschoten AE, Gouw RD, Mooi FR: Genesis of the novel epidemic Vibrio cholerae O139 strain: evidence for horizontal transfer of genes involved in polysaccharide synthesis. EMBO J. 1995, 14: 209-216.
Rasmussen M, Müller HP, Björck L: Protein GRAB of Streptococcus pyogenes regulates proteolysis at the bacterial surface by binding alpha2-macroglobulin. J Biol Chem. 1999, 274: 15336-15344. 10.1074/jbc.274.22.15336.
Sjobring U, Trojnar J, Grubb A, Akerstrom B, Björck L: Ig-binding bacterial proteins also bind proteinase inhibitors. J Immunol. 1989, 143: 2948-2954.
Masschalck B, Michiels CW: Antimicrobial properties of lysozyme in relation to foodborne vegetative bacteria. Crit Rev Microbiol. 2003, 29: 191-214.
Ganz T: Antimicrobial polypeptides. J Leukoc Biol. 2003, 75: 34-38. 10.1189/jlb.0403150.
Wiese A, Gutsmann T, Seydel U: Towards antibacterial strategies: studies on the mechanisms of interaction between antibacterial peptides and model membranes. J Endotoxin Res. 2003, 9: 67-84. 10.1179/096805103125001441.
Monchois V, Abergel C, Sturgis J, Jeudy S, Claverie JM: Escherichia coli ykfE ORFan gene encodes a potent inhibitor of C-type lysozyme. J Biol Chem. 2001, 276: 18437-18441. 10.1074/jbc.M010297200.
Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, et al: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003, 31: 365-370. 10.1093/nar/gkg095.
EMBL Blast sequence retrieval tool. [http://blast2srs.embl.de]
Bimpikis K, Budd A, Linding R, Gibson TJ: BLAST2SRS, a web server for flexible retrieval of related protein sequences in the SWISS-PROT and SPTrEMBL databases. Nucleic Acids Res. 2003, 31: 3792-3794. 10.1093/nar/gkg535.
Thompson JD, Higgins DG, Gibson TJ: Improved sensitivity of profile searches through the use of sequence weights and gap excision. Comput Appl Biosci. 1994, 10: 19-29.
BIC web home page. [http://eta.embl-heidelberg.de:8000]
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Wheeler DL, Church DM, Federhen S, Lash AE, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Tatusova TA, Wagner L: Database resources of the National Center for Biotechnology. Nucleic Acids Res. 2003, 31: 28-33. 10.1093/nar/gkg033.
NCBI BLAST. [http://www.ncbi.nlm.nih.gov/BLAST]
STRING: functional association protein networks. [http://www.bork.embl-heidelberg.de/STRING]
Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, et al: The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002, 12: 1611-1618. 10.1101/gr.361602.
Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 16: 944-945. 10.1093/bioinformatics/16.10.944.
Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003, 31: 3497-3500. 10.1093/nar/gkg500.
Galtier N, Gouy M, Gautier C: SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput Appl Biosci. 1996, 12: 543-548.
Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4: 406-425.
Gblocks server. [http://woody.embl-heidelberg.de/phylo/index.html]
Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000, 17: 540-552.
Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18: 502-504. 10.1093/bioinformatics/18.3.502.
Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992, 8: 275-282.
Perriere G, Gouy M: WWW-query: an on-line retrieval system for biological sequence banks. Biochimie. 1996, 78: 364-369. 10.1016/0300-9084(96)84768-7.
SMD: home page. [http://genome-www.stanford.edu/microarray]
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25: 4876-4882. 10.1093/nar/25.24.4876.
Acknowledgements
We thank Christian von Mering and Lars Jensen for helpful discussions. We are grateful to Fotis Kafatos for his consistent support of the Anopheles TEP studies.