Genomics and the bacterial species problem
© BioMed Central Ltd 2006
- Published: 29 September 2006
Whether or not bacteria have species is a perennially vexatious question. Given what we now know about variation among bacterial genomes, we argue that there is no intrinsic reason why the processes driving diversification and adaptation must produce groups of individuals sufficiently coherent in their genetic and phenotypic properties to merit the designation 'species' - although sometimes they might.
- Homologous Recombination
- Gene Content
- Lateral Gene Transfer
- Species Concept
- Bacillus Anthracis
"The species problem is caused by two conflicting motivations; the drive to devise and deploy categories, and the more modern wish to recognize and understand evolutionary groups. As understandable as it might be that we try to equate these two, and as reasonable and correct as it might be to use taxa as starting hypotheses of evolutionary groups, the problem will endure as long as we continue to fail to recognize our taxa as inherently subjective, and as long as we keep searching for a magic bullet, a concept that somehow makes a taxon and an evolutionary group both one and the same."
Jody Hey 
Thus Jody Hey dismisses the vast and highly philosophical literature on the meaning of the word 'species'. Of course, this literature overwhelmingly addresses species in the context of eukaryote (especially vertebrate) evolution, and seldom tackles the special problems that microbes pose. We microbiologists, to our credit, have often acknowledged that the exercise of formulating a useful 'species definition' and the quest for an underlying 'species concept' are not exactly the same [2–6]. But we too have a 'species problem'.
What we want from a species definition is a set of easily applied and stable rules by which to decide when two organisms are similar enough in their genomic and/or phenotypic properties to be given the same name [5–8]. The needs for such a guide to taxonomic practice in medicine, biotechnology and defense are obvious, and even arbitrary rules to satisfy them would be better than no rules at all . We look to a species concept, on the other hand, for a genetic and/or ecological model of bacterial diversification and adaptation. Ideally, this model would make sense of our definition, justifying the choice of one particular set of rules for defining species as less arbitrary, or more natural, than another [2–4, 9–14]. Thus, while acknowledging the dual nature of our quest, we still hope for "a concept that somehow makes a taxon and an evolutionary group both one and the same" .
The prevailing bacterial species definition has species as a "category that circumscribes a (preferably) genomically coherent group of individual isolates/strains sharing a high degree of similarity in (many) independent features, comparatively tested under highly standardized conditions" . In practice, degree of similarity is assessed in molecular terms: "a prokaryotic species is considered to be a group of strains (including the type strain) that are characterized by a certain degree of phenotypic consistency, showing 70% of DNA-DNA binding and over 97% of 16S ribosomal RNA (rRNA) gene-sequence identity" . A more precise and appropriate modern measure, but limited in its application to sequenced genomes, is the average nucleotide identity (ANI) calculated from pair-wise comparison of all genes shared between any two strains.
An ANI of 94% generally corresponds to other molecular species definitions and to traditional taxonomic practice , so a solid consensus definition, genomic in spirit, may be in the offing. The more we learn about genomes, however, the more unlikely it seems that any unifying species concept will be possible. In particular, lateral gene transfer (LGT), within-species genomic variability and homologous recombination all make it harder to imagine how any single model for the maintenance of genomic coherence could be broadly valid or why, when valid, groups that match any single species definition should be the inevitable outcome.
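To make the ANI-based definition concrete, the following is a minimal sketch of the calculation as described above: identity is computed gene by gene for the genes two strains share, then averaged, and the pair is called conspecific when the result exceeds 94%. The function names, the input format (gene name mapped to a pre-aligned sequence) and the unweighted mean over genes are all assumptions made for illustration; published ANI pipelines find shared genes by sequence search and apply their own filters, which this sketch does not attempt.

```python
from typing import Dict

ANI_CUTOFF = 94.0  # percent; the cut-off discussed in the text


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore alignment gaps
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def average_nucleotide_identity(strain_a: Dict[str, str],
                                strain_b: Dict[str, str]) -> float:
    """Mean per-gene identity over genes present in both strains.

    Assumption: genes are matched by name and already aligned pairwise.
    """
    shared = sorted(set(strain_a) & set(strain_b))
    if not shared:
        raise ValueError("the two strains share no genes")
    total = sum(percent_identity(strain_a[g], strain_b[g]) for g in shared)
    return total / len(shared)


def same_species_by_ani(strain_a: Dict[str, str],
                        strain_b: Dict[str, str],
                        cutoff: float = ANI_CUTOFF) -> bool:
    """True when the pair exceeds the ANI cut-off (hypothetical helper)."""
    return average_nucleotide_identity(strain_a, strain_b) > cutoff
```

Note that genes found in only one of the two strains never enter the average, which is one reason a high ANI says nothing about how much the strains differ in gene content.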
In animal species, evolutionary novelties arise as mutant alleles within populations. Because of the presence of sex and recombination, selection can effect their fixation independently of alleles at other loci. Bacteria have been traditionally thought of as asexuals lacking recombination, with their populations being clones [2, 15, 16]. Favored alleles can still sweep to fixation, but they bring the rest of the genome in which they first occurred along for the ride. Still, even radical (species-founding) evolutionary novelties would originate as mutations occurring within the ancestral bacterial population. And, for both animal and bacterial species, genomic coherence - which we might define as a greater degree of similarity in gene content (the actual number and identity of the genes present) and gene sequence (the sequences of corresponding genes) within species than between species - would be maintained by the selective purging of variability, one gene at a time in sexual species and one genome at a time in asexuals. (In the early days of bacterial genetics, this genomic sweeping process was called 'periodic selection').
But genomics tells us that bacteria often acquire evolutionary novelties from outside the ancestral population by LGT [16–18]. Best studied, not surprisingly, are bacteria that have become pathogens by the acquisition of novel plasmids, chromosomal genes or mobile pathogenicity islands, but non-pathogens also evolve in this saltatory fashion. From a recent comparative genomics/metagenomics study of the cyanobacterium Prochlorococcus, the ocean's principal prokaryotic photosynthesizer, Coleman et al. conclude that "genetic variability between phenotypically distinct strains that differ by less than 1% in 16S ribosomal RNA sequences occurs mostly in genomic islands. Island genes appear to have been acquired in part by phage-mediated lateral gene transfer, and some are differentially expressed under light and nutrient stress."
In this and many similar cases, many genes conferring a highly complex adaptation can be acquired in one event, instantly dividing a single population into two subpopulations that differ substantially in lifestyle but continue to share in a common gene pool. LGT radically uncouples the evolution of phenotype from the evolution of the bulk of the genome, as this is reflected in overall genome similarity (coherence). For instance, Bacillus anthracis (strain Ames ancestor), Bacillus cereus (ATCC1098) and Bacillus thuringiensis (serovar konkukian str. 97-27) all show more than 94% ANI (and so are a single species by this criterion and others), and are highly syntenic in chromosome structure. And yet they are famously different in phenotype - a virulent pathogen and potentially lethal bioterror agent, a cause of food poisoning, and a popular eco-friendly organic biopesticide, respectively.
For every acquired gene for which a role in a radical species-creating LGT event might be inferred, there will be dozens or hundreds more whose contributions - if any - to evolutionary novelty remain unknown. And even within species as traditionally defined there can be enormous strain-to-strain variation in gene content. In a survey of 33 clusters of strains (with 2-11 genomes per cluster) that would be considered species by the greater than 94% ANI criterion, we find anywhere from 1 to 4,404 genes per cluster that are present in some strains but absent from others (O. Zhaxybayeva, C.L. Nesbø and W.F.D., unpublished work). From a similar study, Konstantinidis and Tiedje observe that strains of the same species by this criterion "can vary up to 30% in gene content", and raise the possibility of resetting the 'species' threshold to something like a 99% ANI cut-off.
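The kind of tally described above, genes present in some strains of a cluster but absent from others, can be sketched as a set operation over per-strain gene lists. The names below are hypothetical, and the pairwise measure (the share of genes not common to both strains) is only one possible way to express gene-content difference; it is not claimed to be the measure behind the "30%" figure quoted from Konstantinidis and Tiedje.

```python
from typing import Dict, Set


def variable_genes(cluster: Dict[str, Set[str]]) -> Set[str]:
    """Genes present in at least one strain but missing from at least one other."""
    gene_sets = list(cluster.values())
    if not gene_sets:
        return set()
    present_somewhere = set().union(*gene_sets)
    present_everywhere = set.intersection(*gene_sets)
    return present_somewhere - present_everywhere


def gene_content_difference(a: Set[str], b: Set[str]) -> float:
    """Percent of the genes found in either strain that are not shared by both."""
    either = a | b
    if not either:
        return 0.0
    return 100.0 * len(a ^ b) / len(either)
```

For a cluster given as a dictionary from strain name to its set of gene identifiers, `len(variable_genes(cluster))` gives the per-cluster count of variably present genes.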
Five years ago, when only the tip of the iceberg of variability in gene content was visible, Lan and Reeves  suggested that we look at 'species genomes' as comprising a core set (all genes present in at least 95% of strains) and an auxiliary set (present in 1-95% of strains). Something like this notion is embraced in the more recently articulated 'pangenome' concept, this term denoting the total number of genes found in at least one of the strains of a species . In some species (such as Bacillus anthracis) the depth of the pangenome may have been plumbed after only a few genomes have been sequenced. For others, such as the ecologically versatile Streptococcus agalactiae, Tettelin et al.  suggest that "unique genes will continue to be identified even after sequencing hundreds of genomes."
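The core/auxiliary split and the pangenome can be illustrated with a short sketch that counts, for each gene, the fraction of strains carrying it. The 95% and 1% bounds follow the description of Lan and Reeves given above; the function name, the input format and the treatment of the exact boundaries are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def partition_species_genome(
    strains: Dict[str, Set[str]],
    core_min: float = 0.95,  # core: present in at least 95% of strains
    aux_min: float = 0.01,   # auxiliary: present in 1% up to 95% of strains
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pangenome, core, auxiliary) for a set of sequenced strains."""
    n = len(strains)
    if n == 0:
        raise ValueError("at least one strain is required")
    # Each strain's genes form a set, so a gene is counted once per strain.
    counts = Counter(gene for genes in strains.values() for gene in genes)
    pangenome = set(counts)  # every gene seen in at least one strain
    core = {g for g, c in counts.items() if c / n >= core_min}
    auxiliary = {g for g, c in counts.items() if aux_min <= c / n < core_min}
    return pangenome, core, auxiliary
```

With only a handful of sequenced strains every gene falls above the 1% bound, so the auxiliary set is simply the pangenome minus the core; the lower bound matters only once many strains have been sampled. The size of the pangenome in the sense used above is `len(pangenome)`.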
The problem here (as we might have predicted from the comparisons of sequenced 'conspecific' genomes discussed above) is that these same strains may be enormously more diverse in gene content than they are in gene sequence (see Figure 1). In a survey of genome sizes of Vibrio splendidus isolates by pulsed-field gel electrophoresis, in which all the isolates were greater than 99% identical at the 16S level and all taken from a single site (albeit at multiple times) on the coast of Massachusetts, Thompson et al.  concluded that "this group consists of at least a thousand distinct genotypes, each occurring at extremely low environmental concentrations (on average less than one cell per milliliter)." Genome sizes varied by as much as 1 Mb among them. The authors' suggestion that much of the observed genome size (and hence gene content) variation may be selectively neutral is attractive. What clearly cannot be supported, however, is the notion that species qua ecotypes are genomically coherent.
Another surprise of the past decade is that bacteria are not all asexuals lacking recombination, but that in some homologous recombination is so frequent that it easily outperforms mutation as a source of strain-to-strain sequence differences . The evidence for this comes from multi-locus sequence analysis (MLSA) based on sequences from five to seven unlinked core housekeeping genes amplified from scores or hundreds of strains of a species and, more recently, from the use of recombination detection algorithms  with aligned long segments or entire genomes (from fewer strains). As Dykhuizen and Green presciently observed some 15 years ago , we might apply to such recombining groups something like Ernst Mayr's 'biological species concept' (BSC). In this context the BSC would require that a bacterial species maintains genomic coherence because its members share an exclusive common gene pool (see Figure 2). Different species would have separate gene pools, and diverge and adapt through the separate fixation within them of favorable mutations or laterally acquired genetic novelties.
If we are to base a robust bacterial species concept on such a traditional model we must know first, whether biological barriers to exchange between gene pools of related species can be expected to define species boundaries with anything like the sharpness that various prezygotic (for example, mating behavior) and postzygotic (for example, hybrid sterility) factors define animal species , and second, whether such sharpness is indeed observed. Both are in question.
One barrier to exchange could be a precipitous decline in the frequency of homologous recombination as sequences diverge. The strength of this barrier will vary between species because of idiosyncrasies of the recombinational machinery. More interestingly, it should also vary between genes because of their different rates of sequence divergence. And it does vary within species, thanks to mutations in the mismatch repair system, which can increase homologous recombination between moderately diverged (1-2%) genomes 1,000-fold, and permit homologous recombination between highly divergent (20%) sequences. Townsend et al.  calculate that such mutations elevate rates of adaptive evolution several thousand-fold, and the facts that mismatch repair mutants are common in nature (as if hitchhiking on the favorable recombination events they encourage) and that mismatch repair genes are often themselves mosaics (as if frequently themselves restored by homologous recombination) are good evidence that much adaptive evolution occurs through this transiently open window.
Although the periodic selection process at the heart of Cohan's ecotype model  will produce both genomic coherence and ecologically driven divergence if operating alone, homologous recombination between ecotypes can disrupt both these properties at all but the loci under selection. Although homologous recombination operating within, but not between, populations will promote both coherence and divergence, the barriers to between-population homologous recombination are contingent on many factors and unlikely to produce species of similar genomic coherence across the board. And crucially, LGT has the potential to radically disrupt any genomic coherence achieved by either model. Contingent ecological and biological factors (like the host specificities of phages, the prevalence of mismatch-repair mutants or the selective advantages of acquiring specific long DNA segments) will all affect coherence one way or another. We know too little about the frequencies of any of the underlying processes to predict their net effect - but enough to guess that it will not always be the same. We do know that coherence at the level of gene sequence (as measured by any single marker gene or by ANI) is very poorly coupled to coherence at the level of gene content (see Figure 1), however that might be maintained. And yet gene content is quite possibly the better predictor of coherence at the level of phenotype.
Indeed, genomics has given us too many processes with too many possible synergistic and antagonistic effects on genomic coherence - and in most cases we know too little about their relative magnitudes - to predict outcomes. If coherence were the usual observation, that is, if bacteria almost always fell into discrete clusters defined genomically (even if not phenotypically), then we would have an ample repertoire of known processes to explain this behavior - although still no reason to presume that the explanation would always be the same. But if such coherence were not the usual observation, then we could use what we know about process to explain that too.
So what is the usual observation? Opinions on this seem unstable. In 2002, Cohan  wrote that "bacterial species exist - on this much bacteriologists can agree", while Stackebrandt et al.  asserted that "experimental and theoretic evidence is compelling that the 'lumpy diversity' present in prokaryotes is recognizable as discrete centers of variation when appropriate methods are applied." In 2005, however, both Cohan and Stackebrandt were authors on a publication that suggested that "it might not be possible to delineate groups within a continuous spectrum of genotypic variation: that is, clustering might not occur ..." .
A path more squarely down the middle was taken by Hanage et al.  in summarizing an MLSA study of Neisseria.
"The bacterial domain of life is not uniform. Instead we see clumps of similar strains that share many characteristics, and with an innate human urge to classify, we have defined these as species. This work shows that by applying a simple approach using sequence data from multiple core housekeeping loci, we can resolve those clusters, provided such clusters exist. However, these species clusters are not ideal entities with sharp and unambiguous boundaries; instead they come in multiple forms and their fringes, especially in recombinogenic bacteria, may be fuzzy and indistinct." .
To return to our original quotation, Hey  is right in the case of bacteria too: the species problem is very much in our heads. Sometimes the many contingent genetic and ecological forces driving bacterial genome evolution will have produced clusters of genomes so much like each other and so much unlike any others in the world that even the tightest species definition will be satisfied. Sometimes this will merely appear to be so, because we have selected as medically interesting, or have been able to culture, certain organisms only by virtue of their possession of a single gene, while a spectrum of otherwise genomically similar relatives lacking it have gone unnoticed. Sometimes it will not be so, the contingent genetic and ecological forces working against each other and producing 'clusters' so fuzzy and with gene content versus genome sequence incongruities so striking that even the loosest criteria for genomic coherence cannot be met. We might, in an effort to match definition and concept, choose to think of genuine 'species' as those evolutionary groups that both satisfy an accepted species definition based on genomic coherence and whose coherence can be understood as the product of a biological process, as in the ecotype or BSC model. But many bacteria will not belong to such groups - and it is not a given that any such 'genuine' species exist.
There will, of course, always be a need to have some agreed-upon way of naming organisms, some species definition. Konstantinidis and Tiedje  suggest, primarily because of variability in gene content among closely related strains, that "standards could be as stringent as including only strains that show a greater than 99% ANI, or are less identical at the nucleotide level but share an overlapping ecological niche." But they do not endorse such a tightening up, because this "would instantaneously increase the number of existing species probably by a factor of 10, and cause considerable confusion in the diagnostic and regulatory (legal) fields". Without a magic bullet that makes our species definition and our species concept (or concepts) "one and the same", such expediency considerations will always - and legitimately - play a role in defining species.
It will often also be expedient to think in terms of lineages of strains within species and of phylogenetic relationships between species. There seems to be no other sensible way of doing this than to use concatenated shared (core) genes, and to represent the results as trees [18, 30, 31]. Useful as such trees may be, we must realize that they will not represent the true intergenomic relationships in recombinogenic groups, which will be reticulate, not tree-like - nor will they describe the evolutionary behavior of the non-core part of the pangenome of any species, which may be much larger than the core .
In understanding genome evolution, the 'species concept' does limited work. The ecotype and BSC models (see Figure 2) are useful heuristics, but calling them models for speciation does not make them more useful. In biogeography and biodiversity studies, the word 'species' may actually work some mischief. Questions such as 'How many species of bacteria are there?' or 'Are bacterial species cosmopolitan?' are invaluable in stimulating research into the diversity and distribution of microbial genotypes and phenotypes. But without a species definition coupled to a magic bullet concept that guarantees that defined species are natural biological entities, these questions would be better reformulated in terms of genotypes and phenotypes. There will never be such a magic bullet. In using species concepts, we microbiologists would do well to follow the advice of a philosopher, William James, who wrote: "Since it is only the conceptual form which forces the dialectic contradictions upon the innocent sensible reality, the remedy would seem to be simple. Use concepts when they help, and drop them when they hinder, understanding."
We thank Joe Bielawski, Eric Bapteste, Paco Rodriguez-Valera and Olga Zhaxybayeva for invaluable comments, and CIHR and Genome Atlantic for support.
- Hey J: The mind of the species problem. Trends Ecol Evol. 2001, 16: 326-330. 10.1016/S0169-5347(01)02145-0.
- Franklin L: Bacteria, sex and systematics. Philosophy of Science. 2006.
- Ward DM: A natural species concept for prokaryotes. Curr Opin Microbiol. 1998, 1: 271-277. 10.1016/S1369-5274(98)80029-5.
- Rosselló-Mora R, Amann R: The species concept for prokaryotes. FEMS Microbiol Rev. 2001, 25: 39-67. 10.1016/S0168-6445(00)00040-1.
- Gevers D, Cohan FM, Lawrence JG, Spratt BG, Coenye T, Feil EJ, Stackebrandt E, Van de Peer Y, Vandamme P, Thompson FL, Swings J: Opinion: Re-evaluating prokaryotic species. Nat Rev Microbiol. 2005, 3: 733-739. 10.1038/nrmicro1236.
- Stackebrandt E, Frederiksen W, Garrity GM, Grimont PAD, Kämpfer P, Maiden MCJ, Nesme X, Rosselló-Mora R, Swings J, Trüper HG, et al: Report of the ad hoc committee for the reevaluation of the species definition in bacteriology. Int J Syst Evol Microbiol. 2002, 52: 1043-1047. 10.1099/ijs.0.02360-0.
- Konstantinidis KT, Tiedje JM: Genomic insights that advance the species definition for prokaryotes. Proc Natl Acad Sci USA. 2005, 102: 2567-2572. 10.1073/pnas.0409727102.
- Lan R, Reeves PR: Intraspecies variation in bacterial genomes: the need for a species genome concept. Trends Microbiol. 2000, 8: 396-401. 10.1016/S0966-842X(00)01791-1.
- Godreuil S, Cohan F, Shah H, Tibayrenc M: Which species concept for pathogenic bacteria? An E-Debate. Infect Genet Evol. 2005, 5: 375-87. 10.1016/j.meegid.2004.03.004.
- Hanage WP, Fraser C, Spratt BG: Fuzzy species among recombinogenic bacteria. BMC Biol. 2005, 3: 6. 10.1186/1741-7007-3-6.
- Cohan FM: What are bacterial species?. Annu Rev Microbiol. 2002, 56: 457-487. 10.1146/annurev.micro.56.012302.160634.
- Dykhuizen DE, Green L: Recombination in Escherichia coli and the definition of biological species. J Bacteriol. 1991, 173: 7257-7268.
- Lawrence JG: Gene transfer in bacteria: speciation without species?. Theor Popul Biol. 2002, 61: 449-460. 10.1006/tpbi.2002.1587.
- Nesbø CL, Dlutek M, Doolittle WF: Recombination in Thermotoga: implications for species concepts and biogeography. Genetics. 2006, 172: 759-769. 10.1534/genetics.105.049312.
- Levin BR, Bergstrom CT: Bacteria are different: observations, interpretations, speculations, and opinions about the mechanisms of adaptive evolution in prokaryotes. Proc Natl Acad Sci USA. 2000, 97: 6981-6985. 10.1073/pnas.97.13.6981.
- Gogarten JP, Doolittle WF, Lawrence JG: Prokaryotic evolution in light of gene transfer. Mol Biol Evol. 2002, 19: 2226-2238.
- Doolittle WF: Phylogenetic classification and the universal tree. Science. 1999, 284: 2124-2129. 10.1126/science.284.5423.2124.
- Lerat E, Daubin V, Ochman H, Moran NA: Evolutionary origins of genomic repertoires in bacteria. PLoS Biol. 2005, 3: e130. 10.1371/journal.pbio.0030130.
- Dobrindt U, Hochhut B, Hentschel U, Hacker J: Genomic islands in pathogenic and environmental microorganisms. Nat Rev Microbiol. 2004, 2: 414-424. 10.1038/nrmicro884.
- Coleman ML, Sullivan MB, Martiny AC, Steglich C, Barry K, Delong EF, Chisholm SW: Genomic islands and the ecology and evolution of Prochlorococcus. Science. 2006, 311: 1768-1770. 10.1126/science.1122050.
- Fraser-Liggett CM: Insights on biology and evolution from microbial genome sequencing. Genome Res. 2005, 15: 1603-1610. 10.1101/gr.3724205.
- Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin A, et al: Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci USA. 2005, 102: 13950-13955. 10.1073/pnas.0506758102.
- Acinas SG, Klepac-Ceraj V, Hunt DE, Pharino C, Ceraj I, Distel DL, Polz MF: Fine-scale phylogenetic architecture of a complex bacterial community. Nature. 2004, 430: 551-554. 10.1038/nature02649.
- Giovannoni S: Evolutionary biology: oceans of bacteria. Nature. 2004, 430: 515-516. 10.1038/430515a.
- Thompson JR, Pacocha S, Pharino C, Klepac-Ceraj V, Hunt DE, Benoit J, Sarma-Rupavtarm R, Distel DL, Polz MF: Genotypic diversity within a natural coastal bacterioplankton population. Science. 2005, 307: 1311-1313. 10.1126/science.1106028.
- Feil EJ, Spratt BG: Recombination and the population structures of bacterial pathogens. Annu Rev Microbiol. 2001, 55: 561-90. 10.1146/annurev.micro.55.1.561.
- Mau B, Glasner JD, Darling AE, Perna NT: Genome-wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli. Genome Biol. 2006, 7: R44. 10.1186/gb-2006-7-5-r44.
- Townsend JP, Nielsen KM, Fisher DS, Hartl DL: Horizontal acquisition of divergent chromosomal DNA in bacteria: effects of mutator phenotypes. Genetics. 2003, 164: 13-21.
- Heinemann JA, Sprague GF: Bacterial conjugative plasmids mobilize DNA transfer between bacteria and yeast. Nature. 1989, 340: 205-209. 10.1038/340205a0.
- Lan R, Reeves PR: When does a clone deserve a name? A perspective in bacterial species based on population genetics. Trends Microbiol. 2001, 9: 419-424. 10.1016/S0966-842X(01)02133-3.
- Wertz JE, Goldstone C, Gordon DM, Riley MA: A molecular phylogeny of enteric bacteria and implications for a bacterial species concept. J Evol Biol. 2003, 16: 1236-1248. 10.1046/j.1420-9101.2003.00612.x.
- Legault BA, Lopez-Lopez A, Alba-Casado JC, Doolittle WF, Bolhuis H, Rodriguez-Valera F, Papke TR: Environmental genomics of "Haloquadratum walsbyi" in a saltern crystallizer indicates a large pool of accessory genes in an otherwise coherent species. BMC Genomics. 2006, 7: 171. 10.1186/1471-2164-7-171.