Life at the extreme: lessons from the genome.

Extremophile plants thrive in places where most plant species cannot survive. Recent developments in high-throughput technologies and comparative genomics are shedding light on the evolutionary mechanisms leading to their adaptation.

Vascular plants have adapted to virtually all terrestrial environments, no matter how benign or stressful. Extremophiles are the plants operating in the most challenging environments [1], such as those dominated by the extreme cold in Antarctica [2], wide temperature swings and extreme drought in deserts [3], or salinity in combination with a broad range of other stresses. This last group, the halophytes, is the best documented [4]; the Kew Gardens database [5] recognizes over 1,500 species. Table 1 summarizes some examples of extremophile transcriptomes and genomes that have been published in recent years, at increasing levels of complexity as new sequencing technologies have become available. Six of these plants and their ecological contexts, not all familiar to most plant biologists, are illustrated in Figure 1.
Because of their diverse life forms and life history strategies and in some cases their experimental tractability, halophytes have attracted more attention than the other groups at the molecular level. Halophytes include shrubs and forbs (such as Salicornia spp. (Table 1, Figure 1d), Chenopodium spp., Atriplex spp.), grasses (such as Festuca rubra (Table 1), Spartina spp., Aeluropus spp., and two adapted to saline sodic deserts, Leptochloa fusca and Leymus chinensis), trees (several mangroves, especially Avicennia and members of the Rhizophoraceae), and desert succulents (especially Mesembryanthemum crystallinum, Table 1, Figure 1c). Perhaps most importantly, from the standpoint of comparative genomics, the halophytes also include highly salt-tolerant close relatives of Arabidopsis thaliana.
Extremophiles are not simply outliers, plants with little to offer to the mainstream defined by poorly stress-adapted model plants. They occupy one end of a continuum of plant abilities to withstand stress. In all extreme environments, multiple stresses arise concurrently. For example, saline environments are often poor in essential nutrients (especially N and P), but replete to the point of toxicity in others (for example Mg, sulfate or micronutrients). They may experience seasonal swings between flooding and drought-related salt pans (for example, as shown in Figure 1b). Daily and seasonal temperature ranges may be very broad. Increasingly over the past century, extreme environments also include natural or agricultural ecosystems degraded by overgrazing or inappropriate irrigation management. Understanding plants endemic to these environments provides us with the opportunity to understand the successful and unsuccessful adjustments that less tolerant plants make when faced with lesser stresses [1,6].
Plant environmental responses are coordinated through crosstalk among multiple signaling and stress-response networks, and one of the major goals of modern plant biology is to understand these. For example, dehydration response elements, redox controls and the downstream processes they regulate are central to drought and cold responses [7]. In addition, abscisic acid mediates a broad range of environmental responses [8]. But networks are often, if not always, more complicated than can be revealed by analysis of genes 'known to be involved' in particular responses; using Gaussian graphical methods, for example, Ma et al. [9] visualized response networks to salt involved in signaling and adaptation including a large number of unknown and uncharacterized genes. Clearly, 450 million years of land plant evolution has generated biological complexity that cannot be represented by the sequence of a single species, such as A. thaliana, or even a single representative of each major clade. By scrutinizing the few plant genomes that are available, however, the plant biology community is beginning to identify characters of developmental, physiological, and environmental integrative quality that can be deduced and refined into hypotheses for further scrutiny.
Next-generation sequencing (NGS) technologies (especially Roche 454 and Illumina-Solexa) brought with them the promise of high-quality, high-volume, low-cost genomes and transcriptomes. In fact, they are meeting this expectation. Using the resulting datasets, it is now possible to address the evolutionary mechanisms leading to adaptation to extreme environments. The recently sequenced genome of Thellungiella parvula [10] exemplifies such efforts, providing resources for high-resolution genome-wide comparison with its non-extremophile relative, A. thaliana.
Here, we look at three notable evolutionary features reflected in the genomes that may contribute to adaptations to abiotic stress. These are gene duplication; lineage-specific, largely functionally uncharacterized genes; and epigenomic modifications effected by abiotic stress.

Genomic resources: the harvest of cheap deep sequencing
Clearly, the search for genetic mechanisms for environmental adaptation was never on hold pending the invention of NGS. Differences in individual genes unquestionably have a big role in adaptation to stress. In some cases, they have been inferred from the primary sequences of well-characterized genes, such as the 37-amino-acid stretch in L-myo-inositol-1-phosphate synthase, which distinguishes the salt-tolerant wild rice (Porteresia coarctata) from domesticated rice (Oryza sativa) [11], or the single-amino-acid variation in AtHKT1;1 (which encodes the high-affinity K+ transporter 1) that distinguishes coastal from inland clines of Arabidopsis [12]. In other cases, they have been implicated by the constitutively higher expression in the absence of stress of genes that are induced by stress in Arabidopsis, as in the resurrection plant Craterostigma plantagineum [13], the salt-tolerant poplar Populus euphratica [14], or the Arabidopsis relatives T. parvula and Thellungiella salsuginea (formerly T. halophila) [15-17].
But genomes are far more than collections of protein-coding sequences. To extend the search for 'genetic mechanisms' beyond this level of primary DNA or cDNA sequences, high-quality genomic resources are a paramount necessity. Especially critical are the genomes of closely related species, or even genotypes, that have adapted to different climates and habitats (that is, that have different lifestyles). Such genomes are beginning to appear, although few are from proper extremophiles. The strawberry, apple, and peach genomes in the Rosaceae, for example, have begun to reveal how artificial selection for fruit quality has shaped these genomes [18]. Differences reflecting natural selection should also be discernible, for a start, from resources such as those summarized in Table 1.
However, given the long history of Arabidopsis as a model system, the new genomes most immediately useful for comparative studies at this point are likely to be those closely related to it. One of these is the genome of Arabidopsis lyrata [19], a potential comparative model for drought tolerance [20], and T. parvula (Figure 1a,b) will be perhaps even more useful for elucidating a broad range of environmental adaptations [10]. This species and the congeneric T. salsuginea are endemic to regions that experience temperature extremes, poor, degraded, and toxic soils, and especially very high salinities [6,21]. The T. parvula genome is of particular interest because chromosomal assemblies that approach the coverage of A. thaliana are available. Moreover, because the Thellungiella species share many of the characteristics that led to the acceptance of Arabidopsis as a model (size, growth habit, seed amount, mutants, and transformation ability), they have been recognized as excellent candidates for comparative genomics studies [15,22].

Table 1 (fragment recovered from the page layout): Craterostigma plantagineum, Scrophulariales, NGS transcriptome, desiccation [13]; Solanum commersonii, Solanales, microarray, cold [78].

Data prospecting and data mining: finding the gems in the genome
Given the evolutionary continuum of genome-level adaptations to abiotic stress, the signatures of the critical adaptive mechanisms must be archived in the genomes of extremophiles. These are the gems in the genome; the challenge is to find and understand them. Comparisons of known genes and transfers between species, the mainstay approach before cheap deep sequencing, can now be supplemented with more extensive genome prospecting, and thereafter with large-scale data mining.
In this section, we consider three issues as they apply to the problem: what has been explored so far, what has been found, and what is needed to move forward. First, comparing gene expression at the broad level reflected in Gene Ontology (GO) profiles, stress-tolerant and stress-sensitive species show different patterns [10]. Salt-tolerant extremophiles, on the one hand, seem to have a bias towards ion transporters in the gene function GO category that is not found in glycophytic species such as Arabidopsis. This bias is evident, for example, both in T. parvula and T. salsuginea [23,24] and in the unrelated salt marsh halophyte Limonium sinense [25]. Arabidopsis, on the other hand, has invested in an arsenal of pathogen-responsive and developmentally related genes. It is reasonable to suppose, although future research could prove otherwise, that transporters would be critical to salt stress tolerance, and that developmental flexibility and pathogen protection would be important for a winter annual in a high-resource environment.
Whole-transcriptome analyses of two mangrove species, Heritiera littoralis (Malvaceae; Figure 1d) and Rhizophora mangle (Rhizophoraceae; Figure 1e), showed a similar high representation of transport-related genes. Interestingly, despite these species having different life histories and physiological strategies in their adaptation to tropical intertidal habitats, their transcriptomes showed strikingly similar allocations in GO and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional categories, suggesting convergent evolution as 'mangroves' [26].
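A category-level bias of the kind described above can be checked with a simple contingency test. The following is a minimal sketch, not the procedure used in the cited studies: it applies a one-sided Fisher's exact test to the number of genes annotated to a single GO category in two species. The function names and all gene counts are hypothetical and serve only as illustration.

```python
from math import comb

def fisher_right_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw of n genes from N,
    of which K carry the annotation of interest."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def enrichment_p(in_a, out_a, in_b, out_b):
    """One-sided test that species A has more genes in the category
    than expected from the pooled A + B annotation."""
    n = in_a + out_a                 # annotated genes in species A
    K = in_a + in_b                  # genes in the category, both species
    N = in_a + out_a + in_b + out_b  # all annotated genes, both species
    return fisher_right_tail(in_a, n, K, N)

# Hypothetical counts: genes annotated as ion transporters versus all others.
halophyte = (30, 970)
glycophyte = (15, 985)
p_value = enrichment_p(*halophyte, *glycophyte)
print(f"one-sided p = {p_value:.4f}")
```

Exact integer arithmetic through `math.comb` keeps the sketch free of external dependencies; a real analysis would also need to correct for testing many GO categories at once.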
Going beyond transcriptomes, at the genome level, where are the gems, that is, what are targets currently considered most promising as being part of integrative mechanisms that lead to stress adaptation? At this point, there are few genomes complete enough to allow detailed comparisons, essentially only T. parvula and A. thaliana. In these two, although the gene spaces show extensive overall colinearity, there are also major translocations of gene-rich regions and extensive changes in intergenic sequences [10,15]. Beyond this, there are three promising, potentially adaptive linkages to explore. These involve gene duplication, lineage-specific sequences, and epigenetic regulation. We look at these briefly below, with particular reference to their contributions as reflected in the newly released genome of T. parvula and the testable predictions that follow.

Stress adaptation by gene duplication
A striking feature of all plant genomes is gene enrichment due to duplication events. Suggested by Haldane in 1932 [27] and later popularized by Ohno [28], gene duplication as an evolutionary mechanism that adds new biological function is a well-established idea. Both the duplication rate and the proportion of retained duplicates seem to be greater in plants than in the other domains of life [29]. With respect to individual genes, the result is termed copy number variation (CNV). From resequencing the genomes of 80 individual Arabidopsis ecotypes, it seems that natural selection has led to CNVs covering 2.2 Mb of the reference genome [30]. CNVs can also arise in a short time. For example, they appeared in Arabidopsis in several generations under the selection pressure of a continuous stress in the laboratory [31]. These were distributed with a 42%:58% ratio between those initiated by transposable elements (TEs) and those involving tandem duplications.
Practically all angiosperms have polyploidy somewhere in their history, either current or long past. The initially increased gene dosage following duplication is often assumed to be beneficial for survival in new habitats, at least in the short term [32]. But although there are certainly polyploid species known for their extreme adaptations to abiotic stresses, an equal fraction are adapted to less harsh conditions, and there are also diploid extremophiles (including Thellungiella spp.). Thus, there is little overall evidence that polyploidy itself is a major evolutionary driving force leading to extremophiles.
In most plants, including T. parvula, genomes enriched by polyploidy have subsequently experienced extensive gene losses [33]. Their modern genomes reflect this. On the other hand, the copy numbers of other genes have increased as a result of segmental or tandem duplication events and duplication-translocation events. Individual copies of duplicated genes have, in many cases, also assumed new functionality resulting from mutation (neofunctionalization), or become specialized by acquisition of new promoters or regulatory elements (subfunctionalization). One such example is found in allopolyploid cotton (Gossypium hirsutum), in which reciprocal silencing of alcohol dehydrogenase homologs led to their expression in different tissues under distinct abiotic stresses [34].
An example of changes in transcript expression and neofunctionalization is provided by homologs encoding HKT1, a plasma membrane Na+/K+ transporter considered to be a genetic determinant of salt tolerance [12,35]. HKT1 exists as tandem duplicated copies in both Thellungiella species [10,17]. One copy encodes new protein functionality and also has an expression pattern different from that of the Arabidopsis counterpart [17]. This copy, called TsHKT1;2 in T. salsuginea, is induced under salt stress and leads to continued uptake of potassium ions. By contrast, TsHKT1;1 in Thellungiella behaves like the single-copy AtHKT1; because this protein transports sodium ions under salt stress [36], it exacerbates stress unless its expression is downregulated [37].
In T. parvula and in A. thaliana, a major source of CNV has been tandem duplication [10]. The extant populations of unique tandem duplicates reflect the fact that both copies originated since the species diverged about 11 million years ago [38] and that selective gene loss has occurred in each taxon in response to environmental selective pressures. Either through gene duplication or expression strength differences, a large number of other seemingly stress-relevant genes that have not been recognized in Arabidopsis show the hallmarks of CNV in Thellungiella, including a variety of ion transporters and membrane-located proton ATPases [10]. Such a difference might be expected, as Thellungiella shares only 40% of salt-induced regulation of transcript expression with A. thaliana [39].
Tandem duplications seem to have a more important role in shaping genomes for stress adaptations than polyploidy, segmental transposition-duplications, or ectopic duplication and translocation [40]; recombination and tandem duplication events may both become accelerated by environmental challenges [29]. As the result of unequal crossing-over during recombination, tandem duplications vary in their 'genetic neighborhoods', with copies receiving different regulatory motifs that can lead to drastic changes in expression [40]. A comparative study on plant genomes ranging from Arabidopsis to Physcomitrella showed genes associated with defense, transport functions, or abiotic stress responses enriched in tandem duplicates, whereas duplicates due to other mechanisms included genes enriched in other intracellular regulatory roles [41].
The A. thaliana and T. parvula genomes have approximately 10% of their total genes in tandem duplicates [10], and these duplicates are clearly implicated in the species' dramatically different stress tolerance strategies. This is exemplified by the amplification of NHX8 homologs (Figure 2a), known to encode a putative Li+ transporter in A. thaliana [42]. The duplication leads to a constitutively higher expression in T. parvula than in A. thaliana, which might be responsible for the apparently enhanced tolerance of T. parvula to high Li+ in its natural habitat in central Anatolia [43].
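To make concrete how a figure such as 'approximately 10% of genes in tandem duplicates' can be derived from an annotated genome, the sketch below groups same-family genes that lie next to each other on a chromosome. It is an illustration under stated assumptions rather than the procedure of [10]: gene family identifiers are taken as given (for example from sequence-similarity clustering), `max_gap` is a hypothetical parameter allowing a set number of unrelated genes between copies, and the toy gene list is invented.

```python
from collections import defaultdict

def tandem_arrays(genes, max_gap=1):
    """genes: iterable of (gene_id, chromosome, order_index, family_id).
    Returns lists of gene ids that belong to the same family, lie on the
    same chromosome and are separated by at most max_gap other genes."""
    by_chrom_family = defaultdict(list)
    for gene_id, chrom, order, family in genes:
        by_chrom_family[(chrom, family)].append((order, gene_id))

    arrays = []
    for members in by_chrom_family.values():
        members.sort()
        current = [members[0]]
        for item in members[1:]:
            if item[0] - current[-1][0] <= max_gap + 1:
                current.append(item)      # still within the same array
            else:
                if len(current) > 1:
                    arrays.append([gene for _, gene in current])
                current = [item]          # start a new candidate array
        if len(current) > 1:
            arrays.append([gene for _, gene in current])
    return arrays

def tandem_fraction(genes, max_gap=1):
    """Fraction of all genes that sit in a tandem array."""
    in_arrays = {g for array in tandem_arrays(genes, max_gap) for g in array}
    return len(in_arrays) / len(genes)

# Invented toy annotation: ten genes in order along one chromosome.
toy_genes = [
    ("g0", "chr1", 0, "famA"), ("g1", "chr1", 1, "famB"),
    ("g2", "chr1", 2, "famB"), ("g3", "chr1", 3, "famC"),
    ("g4", "chr1", 4, "famD"), ("g5", "chr1", 5, "famB"),
    ("g6", "chr1", 6, "famE"), ("g7", "chr1", 7, "famF"),
    ("g8", "chr1", 8, "famE"), ("g9", "chr1", 9, "famG"),
]
print(tandem_arrays(toy_genes))
print(tandem_fraction(toy_genes))
```

In the toy data, g5 shares a family with g1 and g2 but is too far away to count as tandem, which is the distinction between tandem and dispersed duplicates that the comparison between mechanisms relies on.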
Gene duplication may also result from single gene/segmental transposition-duplication or ectopic duplication/translocation [44] in such a way that any syntenic evidence for its ancestral origin is lost. Comparisons of T. parvula and A. thaliana genomes indicate multiple translocation-duplication events involving stress-related genes, exemplified by the duplications of orthologs of CBL10, encoding a calcium sensor [10], and AVP1, encoding a vacuolar proton transporter (Figure 2b) in T. parvula. The details of the relationship between this mechanism and stress-adaptive evolution deserve further exploration.
From these initial observations, there are a number of important questions for future studies. For example, how do duplications arise and become stabilized in targeted regions of the genome? Can stress increase the rate of their generation? How rapidly can new regulatory sequences evolve to become operational and do they evolve along with duplicated genes or independently? How rapidly can neofunctionalization occur and how is it balanced by gene loss? And how is tandem duplication called into play to adjust expression levels?

Figure 2 (caption fragment): ... [42], and AVP1, encoding a vacuolar proton transporter [79], were compared between T. parvula (Tp) and Arabidopsis thaliana (At). Shown are five colinear genes adjacent to NHX8 and AVP1 in the two species. Red arrows indicate duplications.

Stress adaptation through lineage-specific sequences
In any single genome, the suite of genes shaped by stress during adaptation should reflect, above all, the nature of the stresses. In turn, physiological and developmental changes will mirror genomic changes. Thus, both the suite of altered genes and their regulatory sequences can be expected to demonstrate lineage specificity. Lineage-specific or taxonomically restricted genes (TRGs) are protein-coding genes that do not share sequence similarity outside the lineage. For that reason, they are also sometimes referred to as 'orphan genes' [45], or 'unknown'. Indeed, with each new EST collection or genome, the number of new unknowns (or 'unknown unknowns') proliferates. Regardless of the taxon, and in all the examples included in Table 1, 10 to 20% of the genes in eukaryote genomes or transcriptomes are TRGs [46]. In the Brassicaceae, family-specific TRGs are enriched for genes responsive to abiotic stresses [47]. It should be noted here that 'stress-responsive' or 'stress-related' are not labels indicating that the functions of the genes are known. They simply mean that expression is induced by stress. In Arabidopsis, but not in T. parvula, the expansion is pronounced in pathogen-responsive genes; in T. parvula, but not in Arabidopsis, the expansion is pronounced in abiotic stress-related genes [10]. Across the spectrum of plant stress tolerance, pools of rapidly evolving TRGs may function as a reservoir of adaptive potential to challenging environments.
In Arabidopsis, 3.4% of all genes share sequence similarity only within the Brassicaceae, and another 5% lack similarity with any sequences deposited in public databases [48]. Because the Arabidopsis genome is the most fully annotated, it can be expected that the more evolutionarily distant from Arabidopsis a species is, the larger will be the number of TRGs, especially if the species is highly adapted to an environment in which Arabidopsis cannot survive. In the T. parvula genome, 11% of the annotated non-transposon putative protein-coding genes show no sequence similarity with A. thaliana genes. About two-thirds of those also lack similarity with any known plant sequence [10]. In Lobularia maritima (sweet alyssum), a salt-tolerant coastal relative of Arabidopsis [49], 35% of the salt-induced transcriptome is 'unknown', as are half of the salt-stress-induced transcripts from a facultative halophyte, Festuca rubra ssp. litoralis [50] and nearly 55% of the contigs in two mangrove transcriptomes (R. mangle and H. littoralis) [26].
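The percentages above all come from the same kind of bookkeeping: each gene is assigned to a tier according to where its significant sequence matches are found. The sketch below shows that tally in its simplest form. It assumes the similarity search has already been run and that, for every gene, the set of other taxa with a significant match is known; the tier labels, taxon names and gene identifiers are hypothetical.

```python
from collections import Counter

TIERS = ("broadly_conserved", "family_restricted", "no_similarity")

def classify_gene(hit_taxa, family_taxa):
    """hit_taxa: taxa (other than the query species) with a significant match.
    family_taxa: taxa belonging to the same family as the query species."""
    if not hit_taxa:
        return "no_similarity"
    if hit_taxa <= family_taxa:       # every match lies inside the family
        return "family_restricted"
    return "broadly_conserved"

def trg_summary(hits_by_gene, family_taxa):
    """Fraction of genes falling in each tier."""
    counts = Counter(classify_gene(taxa, family_taxa)
                     for taxa in hits_by_gene.values())
    total = len(hits_by_gene)
    return {tier: counts[tier] / total for tier in TIERS}

# Invented example: five genes from a crucifer, with the taxa they match.
brassicaceae = {"Arabidopsis thaliana", "Arabidopsis lyrata"}
hits = {
    "gene1": {"Arabidopsis thaliana", "Oryza sativa"},
    "gene2": {"Arabidopsis thaliana"},
    "gene3": set(),
    "gene4": {"Populus euphratica"},
    "gene5": {"Arabidopsis lyrata", "Arabidopsis thaliana"},
}
summary = trg_summary(hits, brassicaceae)
print(summary)
```

The outcome depends on the similarity threshold and on which databases are searched, so fractions obtained this way are comparable between studies only when those choices match.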
Regulatory elements in the untranslated regions and promoters also show lineage specificity. For example, a detailed comparison of the upstream regulatory region of SOS1, a gene critical for salt tolerance in both Arabidopsis and Thellungiella [51], showed conserved repeat sequences and secondary structures in Thellungiella spp. and other halophytes that are absent in Arabidopsis. These differences in regions that are not transcribed are correlated with differences in expression observed for SOS1 in Thellungiella [15,16].
TEs seem to have a key role in generating TRGs [31], because novel chimeric genes originate when active retrotransposons recruit new exons from flanking sequences [52]. About 10% of the Arabidopsis TRGs showed degenerate sequence conservation with transposable elements, a proportion double that among non-TRGs [47]. In the T. parvula genome, TRGs are enriched in pericentromeric TE-rich regions, suggesting roles of transposons in their evolution [10].
Without sequence similarities on which to base annotation, 'orphan genes' usually lack assignable functions [10,26]. Clearly, this is a major obstacle to elucidating the genetic basis for any characteristic, not just for understanding stress tolerance, and overcoming this is an important target. Again, there are associated questions to be addressed. For example, why do duplications, especially those associated with TEs, seem to be clustered in centromeric regions? And how do lineage-specific, taxonomically restricted, or 'orphan' genes fit in the overall picture of functioning organisms? With regard to this last question, network analysis has already proved to be a good starting place. As has already been demonstrated in Arabidopsis transcriptional network models, the correlated expression of TRGs and genes with assigned functions in response to stresses provides, even without definitive annotations, useful linkages for visualizing coexpression patterns and identifying 'hub' genes that have core roles in regulating pathways [53,54]. Although still limited for extremophiles, RNA sequencing experiments performed under both transient and chronic stress conditions should, before long, contribute the expression data needed for extending similar networks to non-model or new model species.
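The idea of linking unannotated genes to annotated ones through correlated expression can be illustrated with a very small coexpression network. The sketch below is a simplification and not the modelling approach of [53,54]: it connects two genes when the absolute Pearson correlation of their expression profiles reaches a threshold, then ranks genes by the number of connections. The gene names, expression values and the 0.85 threshold are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0                    # a flat profile carries no signal
    return cov / sqrt(var_x * var_y)

def coexpression_edges(profiles, threshold=0.85):
    """Pairs of genes whose profiles correlate at or above the threshold."""
    genes = sorted(profiles)
    edges = []
    for i, first in enumerate(genes):
        for second in genes[i + 1:]:
            if abs(pearson(profiles[first], profiles[second])) >= threshold:
                edges.append((first, second))
    return edges

def degrees(edges):
    """Number of coexpression partners per gene."""
    degree = {}
    for first, second in edges:
        degree[first] = degree.get(first, 0) + 1
        degree[second] = degree.get(second, 0) + 1
    return degree

# Invented expression values across five stress conditions.
profiles = {
    "annotated_regulator": [1, 2, 3, 4, 5],
    "trg_a": [1, 2, 3, 5, 4],
    "trg_b": [2, 1, 3, 4, 5],
    "unrelated": [3, 3, 3, 3, 3],
}
edges = coexpression_edges(profiles)
degree = degrees(edges)
hub = max(degree, key=degree.get)
print(edges, hub)
```

Here the two unannotated genes are each tied to the annotated regulator but not to each other, so the regulator emerges as the best-connected node; in practice many more conditions and a correction for indirect correlations would be needed before reading function into such links.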

Epigenetic modifications and non-coding RNAs
Beyond adaptations embedded in the basic nucleotide sequence of a genome, epigenetic controls have key roles in ensuring plant survival and reproduction under suboptimal growth conditions [55,56]. Selective hypermethylation during salt stress adaptation in the extremophile Crassulacean acid metabolism (CAM) plant Mesembryanthemum crystallinum, for example, indicates both specific and global epigenetic restructuring in plant abiotic stress response regulation [57].
Methylation, alone or in combination with small interfering RNA degradation pathways, can also regulate transposon activity [58]. Although most TEs are inactive at any time, the proportion that is active is highly dynamic and stress responsive [59,60]. TE copies can vary significantly within single species (for example, maize haplotypes [58]), or between closely related species; in T. parvula and T. salsuginea, TEs make up about 7.4% [10] and up to 50% (Q Xie, personal communication) of the genome, respectively.
The potential influence of retrotransposon-rich gene neighborhoods undoubtedly varies in ways yet to be fully appreciated. It may, for example, be represented in the HKT1 locus in T. parvula [10], as it is for Arabidopsis TIP1;2, the aquaporin whose high basal expression has been caused by TEs in the promoter region [61].
Plant microRNAs (miRNAs) also act epigenetically, through target mRNA cleavage or translational inhibition, and their effects are further compounded by feedback regulation. The majority are lineage specific or species specific. Even conserved miRNAs, however, have species-specific functions, as demonstrated by comparisons of Arabidopsis and poplar [62]. Only 80% of known miRNAs identified in the T. parvula genome share sequence similarity with A. thaliana miRNAs. Another 10% are found in Brassicaceae species, but not in A. thaliana [10].
An in silico comparison of the target sequences of miRNAs in the mRNAs of mangroves and Arabidopsis showed that both the conservation of miRNA targets in stress-responsive genes and their placements within those genes are lineage specific. The targets may also be similarly represented in unrelated species showing similar ecological affinities [23].
Both methylation and miRNA-based epigenetic regulation are fields of intense activity at present and, from the standpoint of stress adaptation, how miRNA targeting comes about and varies between species is an important question. Another is how the functions of miRNAs and protein-coding genes are regulated and coordinated. Can epigenetic signatures due to stress adaptation be transgenerational, and if so, for how many generations? The concept of transgenerational epigenetic stress signatures has support from some studies. For example, when Arabidopsis parent populations were exposed to abiotic stresses that increased global methylation, their progeny were more stress tolerant [63]. Similarly, in rice, parents with hypermethylation of particular loci in response to low-nutrient stress produced progeny with increased tolerance [64]. In dandelion (Taraxacum officinale), exposure to stress resulted in heritable markers, again implying epigenetic heritability for stress adaptation [65]. In Arabidopsis mutants impaired for small interfering RNA biogenesis, increased copy numbers of the ONSEN retrotransposon element were induced by heat stress. ONSEN insertion, in turn, rendered adjacent genes heat inducible. Unlike in wild-type plants, these numbers failed to decay over a period of 20 to 30 days. Because transposition was particularly active during flower development and before gametogenesis, the effect was transgenerational [60].

Concluding remarks
To know that the phenomena we have presented here operate is not sufficient. By themselves, sequences provide only the raw materials for addressing more important questions. On the one hand, they set the stage for exploring how genomes have evolved in plants with different adaptations to environmental conditions. On the other, and more fundamentally, expanding genomic resources bring the opportunity to explore mechanisms of genome evolution themselves.
The recently completed genome sequence of T. parvula [10] and the soon to be available genome of T. salsuginea [66] are critical resources, enabling high-resolution genome-wide comparisons between extremophiles and their non-extremophile crucifer relatives. Along with a dozen other transcriptomes of extremophile plants and numerous genomes from non-extremophiles, they have supported the ideas, first, that there is a basal set of genes shared between all plants, and second, that a subset of these has experienced selective modification and amplification of a sort required for adaptation to and success in changing or stressful environments. With sequencing technologies evolving rapidly, a 'third generation' of instruments will undoubtedly have an even greater transforming effect.
As output increases in amount and quality and cost comes down, it seems clear that the genome sequence of any plant species deemed important, and eventually multiple ecotypes of each, can, as needed, become available. The value and importance of this cannot be overstated in a world where the population is rising much faster than total agricultural production and land degradation is rapidly reducing the area usable for crops. Extremophiles provide a model not only for what is possible, but also for the traits that may be necessary for crops in the future.