Analysis of genetic systems using experimental evolution and whole-genome sequencing

Whole-genome sequencing of bacteria evolving in the laboratory promises to reveal the complex network of mutations that underlie adaptation.

The comparative study of extant genomes has revolutionized biology, shedding light not only on evolution but also on physiology, genetics and medicine. But the utility of comparisons among naturally evolved isolates is lessened by incomplete knowledge of the environment to which the organisms adapted. Precise knowledge of conditions is attainable only in comparative genomic studies of organisms that have diverged under the controlled conditions of the laboratory, where it is possible to run replicate experiments that distinguish which outcomes are inevitable and which the result of mere chance.
Advanced sequencing and mutation-detection technologies now make it possible to reveal the complete genetic basis for an adaptive trait that separates an evolved clone from a reference strain [1][2][3][4]. The first whole-genome sequencing of cellular organisms adapted to controlled laboratory conditions has already revealed mutations that contribute to symbiosis [1] and cooperative behavior [5][6][7]. A new study by Herring et al. [8] takes whole-genome sequencing a significant step further by exploring parallel evolution and its dynamics in replicate populations of Escherichia coli. They also provide direct characterizations of the effects of the detected mutations using site-directed mutagenesis. Their results offer clues to how complex biological systems function and evolve, suggesting that adaptive regulation can occur not only at the loci of genes that are directly involved in the adaptive trait but also in distant areas of the network. Whole-genome sequencing of parallel evolved strains promises to reveal novel functional links among genes and genetic modules. Future studies may be able to use genome-sequencing technologies to answer a range of pressing questions in biology and evolution: how biological networks are constructed, constrained, and modified; how clonal interference shapes the outcomes of evolution; and what is the complete spectrum of genetic mutations available to selection.

The advantages of bacteria for experimental evolution
In 1893, HL Russell, a bacteriologist at the University of Wisconsin, enumerated some of the "evident advantages that bacteria possess for experimental research in evolutionary biology" [9]. These included how the "physical and chemical environment [in which bacteria grow] can be so rigidly controlled that the variability of conditions …is practically excluded", as well as how, by virtue of short generation times, a "rapid successive transference of cultures to fresh media can secure the effect of an experiment covering an immense number of generations within a limited space of time" [9]. Russell's ideas appear to have remained unrealized for nearly a century, but the field of experimental evolution finally emerged as a vibrant and independent discipline towards the end of the twentieth century [10]. With advances in the culture and genetic manipulation of microbes, investigators in the 1980s began to let microbes compete and evolve in the laboratory. Early studies used the ability to obtain precise fitness measurements in chemostats to reveal subtle fitness differences associated with natural, induced, and engineered mutations [11], demonstrating the direct link between metabolic flux and fitness [12]. Later experiments, using long-term laboratory evolution of parallel lines, were aimed at the key evolutionary question of how much variability we would expect were we to replay the 'tape of life' [13]; that is, how reproducible are evolutionary outcomes. The most celebrated long-term parallel experiment was begun by RE Lenski in 1988 with 12 replicate populations of Escherichia coli and has been running continuously for almost 20 years and more than 40,000 generations in glucose-limited medium [14]. These long-term lines have shed much light on the inherent variability of the evolutionary process at a range of phenotypic levels [14,15]. 
Now, with recent advances in genomic technologies, these questions have begun to be addressed at the genotypic level [14,16-18]. With whole-genome sequencing, all genetic changes underlying an adaptive trait can be revealed and their dynamics tracked over time. The new study by Herring et al. [8] suggests some of the ways in which whole-genome sequencing will provide deeper insight into the connections between parallel evolution at the genotypic and phenotypic levels.

Parallel adaptation in functional modules
One salient finding that has emerged from laboratory studies of evolving microbes is that parallel evolutionary changes are often seen in replicate populations adapting to a novel environment. Parallel evolution is a hallmark of natural selection: identical or very similar changes reach high frequency or fixation in independent lineages evolving under identical conditions. The use of parallel evolution to infer that adaptation had occurred was first applied to morphological traits [19], but it has been even more convincing in the world of molecules [4,14,20-25]. With their spartan genomes, RNA and DNA viruses were the first organisms for which individual genomes from replicate laboratory populations were fully sequenced. Although whole-genome sequencing reveals all the mutations between an evolved strain and its ancestor, further experimentation is needed to show whether any of these mutations are neutral and how the various mutations combine to form an adaptive trait. In addition to the revelation that the vast majority of mutations that reach appreciable frequency in viral populations are beneficial, such sequencing studies produced striking examples of parallel evolution, often exactly the same change in the same amino acid [26].
It is perhaps not so surprising that we should find a limited set of changes and pervasive parallel evolution in viruses, whose genomes are very small and which lack the complicated regulatory networks of the higher forms of life. Evolution acts on biological function, and in viruses functions are often mapped to single genes. In complex cellular life forms such as bacteria and yeast, however, complex functions are typically attributed to modules of genes [27]. We might expect, therefore, that parallel evolution for cellular life does not necessarily mean similar changes in the same genes but rather similar changes in related modules. For example, recent studies have found that a phenotype under significant positive selection in Lenski's long-term lines is the degree of DNA supercoiling [21]. Although a candidate-gene approach revealed the genes responsible for the changes in supercoiling in some of the evolving strains, the genetic causes underlying the same phenotype in many of the other strains remain obscure [21]. Whole-genome sequencing of these lines could undoubtedly reveal the many different genetic changes that can produce the same parallel phenotypic change in DNA topology, and it could thus unmask the supercoiling gene module under selection. Through the revelation of parallel cellular phenotypes produced by seemingly dissimilar genetic changes, functional connections within and between genetic modules [28,29] can now be revealed by experimental evolution coupled with whole-genome sequencing.
The current study by Herring et al. [8] focuses on metabolism and its regulation. Metabolism provides perhaps the best example of a large cellular network (comprising hundreds of genes) that is relatively amenable to quantitative phenotypic predictions at the whole-cell level [30][31][32][33]. Although the overall optimal fitness of adapting populations limited by a given single metabolic resource can be predicted [34,35], only some of the mutations underlying the actual fitness changes appear in the list of candidate genes [34]. Presumably, a regulatory change in a remote location of the network can have far-reaching and unexpected effects.
Using a new microarray-based whole-genome resequencing technology for identifying the changes between an evolved strain and a known reference strain, Herring et al. [8] explore the parallel changes in metabolic and regulatory networks that appeared in five replicate E. coli populations that evolved separately in glycerol minimal media for 44 days. This study provides new examples of parallel evolution in candidate genes, but also, as a consequence of the comprehensive sequence information obtained, begins to provide examples of how remote changes might propagate through complex networks and how seemingly disparate changes can have similar phenotypic effects.
Herring et al. [8] observed parallel changes in both global regulation patterns and local protein sequences. Resequencing five clones (one clone from each of the replicate populations), the authors identified 13 mutations. A single gene, glpK (encoding glycerol kinase), was mutated in all five lineages. The protein encoded by this gene catalyzes the first step in glycerol catabolism, and the mutations in this gene led to increases of more than 50% in the reaction rate of glycerol kinase. This is an exceptional example of parallel evolution that resonates with the results from experimental viral evolution.
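The logic of detecting parallel evolution from resequencing data can be sketched as a simple tally of which genes are mutated in which replicate populations. The following Python fragment is a minimal illustration, not the authors' pipeline: the population labels and most gene assignments are hypothetical placeholders, and only the pattern of glpK being hit in all five lineages follows the text.

```python
from collections import defaultdict

# Hypothetical mutation calls: replicate population -> set of mutated genes.
# Only the glpK pattern (mutated in all five lineages) follows the text;
# the population names and the rpoB/rpoC assignments are placeholders.
calls = {
    "pop1": {"glpK", "rpoB"},
    "pop2": {"glpK", "rpoC"},
    "pop3": {"glpK", "rpoC"},
    "pop4": {"glpK"},
    "pop5": {"glpK"},
}


def parallel_hits(calls):
    """Return {gene: list of populations in which that gene is mutated}."""
    hits = defaultdict(list)
    for pop, genes in sorted(calls.items()):
        for gene in genes:
            hits[gene].append(pop)
    return dict(hits)


def genes_hit_in_at_least(calls, k):
    """Genes mutated independently in at least k replicate populations."""
    return sorted(g for g, pops in parallel_hits(calls).items() if len(pops) >= k)


if __name__ == "__main__":
    print(genes_hit_in_at_least(calls, 5))
    print(genes_hit_in_at_least(calls, 2))
```

A gene that recurs across most or all independent populations is a candidate target of selection; genes hit only once need follow-up experiments, such as the site-directed mutagenesis used by Herring et al. [8], before any adaptive role can be claimed.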
Apart from the glycerol kinase mutations, the most significant mutations identified affected global transcription patterns. The largest fitness changes (representing roughly half of the total increase in growth rate) in any of the five populations resulted from mutations in genes encoding the two major subunits of RNA polymerase (rpoB and rpoC). In three of the five populations, natural selection fixed a mutant variant of rpoB or rpoC within 25 days from the start of the experiment. The reason that these changes were beneficial is unknown. Two of these populations subsequently experienced a sweep of secondary mutations that were only conditionally beneficial; these may represent compensatory changes needed to alleviate the presumed deleterious effects of global changes in gene expression. One of the populations that did not have mutations in RNA polymerase had an 82-base-pair deletion adjacent to crr, which encodes a critical component in catabolite repression. Herring et al. [8] suggest that attenuation of crr expression, as well as the mutations in the genes encoding RNA polymerase, may reduce the expression of genes that lead to catabolite repression, which inhibits growth on glycerol. The basis of these effects is still to be identified.
Whole-genome sequencing coupled with the careful control of conditions that is possible in laboratory evolution thus allowed Herring et al. [8] to demonstrate how molecular evolution proceeds both in cis and in trans: that is, adaptation involves local changes to specific proteins (for example, glpK) as well as remote regulatory changes.

Studying the basis of clonal interference by whole-genome sequencing
Herring et al. [8] sequenced clonal samples from their populations after 44 days. Sequencing of many clones from each population is still technologically unfeasible. How different would the results have been if it were possible to sequence many different individuals from each evolving population? Bacterial populations invariably show some degree of genetic variability as a result of spontaneous mutation and genetic drift of neutral and deleterious alleles. But beneficial mutations are particularly important to population heterogeneity, especially on laboratory timescales. Microbial evolution invariably includes clonal interference among competing lineages: that is, multiple distinct beneficial mutations spread through a population at the same time [25,36-45]. On laboratory timescales during which horizontal transfer of mutations is negligible, beneficial mutations remain linked to the genome in which they appeared, and so the spread of one beneficial mutation can impede the spread of others.
Herring et al. [8] recognized that clonal interference shaped the evolution of their populations, and they attempted to discover competing clonal lineages by sequencing the handful of candidate genes suggested by their whole-genome sequencing in search of alternative alleles. They found four alternative glpK alleles in two populations. Furthermore, their time course of allele frequency measurements shows several telltale signs of clonal interference, such as transient or permanent decreases in frequency of particular beneficial alleles after an initial rise, indicating competition with a fitter lineage. The independent appearance of mutations in glpK and rpoC in replicate populations is a less obvious consequence of clonal interference: many beneficial mutations of small effect are probably spreading through the population but do not reach high frequency by the time the strong mutations in glpK and rpoC spread through the population.
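The rise-then-fall signature described above can be reproduced with a very simple model. The sketch below is a deterministic competition among three asexual lineages, with no recombination, new mutation, or drift; the starting frequencies and fitness values are hypothetical and are not taken from Herring et al. [8].

```python
def compete(freqs, fitness, generations):
    """Deterministic selection among asexual (fully linked) lineages.

    freqs      -- starting frequencies of the lineages (should sum to 1)
    fitness    -- relative fitness of each lineage per generation
    Returns a list of frequency vectors, one per generation, including t = 0.
    """
    history = [list(freqs)]
    for _ in range(generations):
        current = history[-1]
        mean_w = sum(x * w for x, w in zip(current, fitness))
        history.append([x * w / mean_w for x, w in zip(current, fitness)])
    return history


# Hypothetical lineages: ancestor, an early mutant of moderate benefit,
# and a rarer mutant of larger benefit that arose on the ancestral background.
start = [0.989, 0.01, 0.001]
fitness = [1.0, 1.10, 1.20]

trajectory = compete(start, fitness, 200)
moderate = [f[1] for f in trajectory]
peak_generation = max(range(len(moderate)), key=moderate.__getitem__)

if __name__ == "__main__":
    print("moderate mutant peaks at generation", peak_generation)
    print("peak frequency:", round(moderate[peak_generation], 3))
    print("final frequency:", moderate[-1])
```

In this toy model the moderately beneficial lineage first gains on the ancestor, then loses ground once the fitter lineage becomes common, even though it is still better than the ancestor. Because the two beneficial mutations sit in different genomes and cannot be combined, the better one displaces the other rather than joining it.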
As whole-genome sequencing becomes cheaper and more reliable, it will be easier to study clonal interference as a mechanism affecting the overall rate of adaptation. One question is whether clonal interference happens most frequently between clones with roughly the same phenotype (that is, competition between different genotypic changes in the same specific genes, pathways, or regulatory networks) or whether different phenotypic changes are competing instead.

The raw material of evolution
In the relatively brief evolutionary timescales and moderate population sizes of studies such as that of Herring et al. [8], neutral mutations would not have had time to spread appreciably in the population by genetic drift. Furthermore, although new neutral and deleterious mutations would arise every generation, deleterious and neutral alleles are pushed toward extinction as lineages carrying strongly beneficial alleles spread to fixation. Thus, it is not surprising that the few mutations discovered by Herring et al. [8] were all beneficial, and typically of large effect. In addition to studying adaptive mutations, whole-genome sequencing could be used to explore better the actual underlying genotypic spectrum of mutations before selection's winnowing; that is, it could be used to look at what the raw material presented to natural selection is and how it varies across organisms and environments.
Whole-genome sequencing can elucidate the nature of spontaneous mutations when coupled with experimental evolution in mutation accumulation (MA) lines. MA lines are evolved for many generations with as little selection as experimentally feasible [46][47][48][49]. This is typically achieved by putting a population through consecutive one-organism bottlenecks every few generations. This allows one to observe how deleterious, neutral and beneficial mutations accumulate when selection is largely ineffective. Whole-genome sequencing of MA lines offers considerable promise for seeing the types of mutations that arise in such selection-less experiments. This will enable geneticists to go from the most basal level, the mutations that compose the raw material for evolution, all the way through gene function and regulation to the ultimate evolutionary phenotype: fitness.
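Why repeated single-cell bottlenecks weaken selection can be illustrated with a standard population-genetic approximation: over a cycle of fluctuating population sizes, the effective population size is roughly the harmonic mean of the sizes, which is dominated by the smallest values. The sketch below assumes idealized doubling from one cell and uses a hypothetical figure of 25 generations between bottlenecks; the threshold |s| < 1/Ne is a rule of thumb for haploids, not a result from the studies cited here.

```python
def harmonic_mean_size(generations_between_bottlenecks):
    """Harmonic-mean population size over one transfer cycle.

    Assumes the cycle starts from a single cell and the population doubles
    every generation, giving sizes 1, 2, 4, ..., 2**g.
    """
    g = generations_between_bottlenecks
    sizes = [2 ** i for i in range(g + 1)]
    return len(sizes) / sum(1.0 / n for n in sizes)


def effectively_neutral(s, n_e):
    """Rule of thumb for haploids: drift dominates when |s| < 1 / Ne."""
    return abs(s) < 1.0 / n_e


if __name__ == "__main__":
    n_e = harmonic_mean_size(25)  # 25 generations per cycle is an assumption
    print("approximate effective size:", round(n_e, 2))
    for s in (-0.01, -0.05, -0.2):
        print(s, "effectively neutral:", effectively_neutral(s, n_e))
```

Under these assumptions the effective size stays small however large the colony grows between transfers, so mutations with modest fitness effects behave almost as if they were neutral and can accumulate, whereas strongly deleterious ones are still removed.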
When we see how much whole-genome sequencing has revealed about evolution in nature [50,51], we can imagine how much more can be learned about evolution on a laboratory timescale. By sequencing clones from populations that have evolved in identical laboratory conditions, Herring et al. [8] provide further evidence for the ubiquity of parallel evolution on the genotypic level, and their study suggests that remote changes are propagated through genetic systems. Experimental evolution, coupled with genomic technologies, is poised to answer many important questions at the interface between cellular processes and observed evolutionary consequences. Evolution is becoming a powerful tool for studying biological processes, principles and systems.