Paleogenomics: reconstruction of plant evolutionary trajectories from modern and ancient DNA

Pont, Caroline; Wagner, Stefanie; Kremer, Antoine; Orlando, Ludovic; Plomion, Christophe; Salse, Jerome

doi:10.1186/s13059-019-1627-1

Opinion
Open access
Published: 11 February 2019

Genome Biology volume 20, Article number: 29 (2019)

Abstract

How contemporary plant genomes originated and evolved is a fascinating question. One approach uses reference genomes from extant species to reconstruct the sequence and structure of their common ancestors over deep timescales. A second approach focuses on the direct identification of genomic changes at a shorter timescale by sequencing ancient DNA preserved in subfossil remains. Merged within the nascent field of paleogenomics, these complementary approaches provide insights into the evolutionary forces that shaped the organization and regulation of modern genomes and open novel perspectives in fostering genetic gain in breeding programs and establishing tools to predict future population changes in response to anthropogenic pressure and global warming.

Introduction

Flowering plants, or angiosperms, have come to dominate terrestrial vegetation. They are an essential component of the carbon, oxygen and water cycles, and paramount to the stability of the climate and substrate of our planet. Through photosynthesis, angiosperms convert solar energy into the basal source of chemical energy that underlies the development of almost all terrestrial ecosystems. Flowering plants are also essential to human society as our principal source of food, animal fodder, medicines, and materials for building, clothing, and manufacturing, among many other uses. Molecular clock estimates [1] and paleontological data [2] suggest that angiosperms emerged some 120–170 million years ago (mya), during a period extending from the Middle Jurassic to the Early Cretaceous, whereas integrated timescale approaches suggest that they might have emerged even earlier, some 200–250 mya [3]. Flowering plants diversified rapidly, so that over 350,000 species are alive today [4,5,6,7]. These species are divided into two main groups, the monocots and eudicots, which account for 20% and 75%, respectively, of the diversity characterized to date [6]. Recent advances in high-throughput DNA sequencing and computational biology have helped researchers to develop the field of paleogenomics, making it possible to retrieve invaluable information about the evolutionary history that underlies the emergence and subsequent diversification of flowering plants.

This research field relies on two main complementary approaches that aim to track evolutionary genomic changes at both the macro-evolutionary and micro-evolutionary temporal scales. The first, an indirect (or ‘synchronic’) approach, compares modern genomes to reconstruct ancestral genomes over deep timescales of several millions of years (macro-evolution). The second, a direct (or ‘allochronic’) approach, relies on the sequencing of genomes from plant subfossil remains that have been preserved over the past 10,000 years (micro-evolution). Here, we address the underlying methodologies for both paleogenomics approaches, as well as their major achievements and prospects in providing an understanding of the evolutionary trajectories that underpin the genetic makeup of modern plant species.

Reconstruction of an ancestral genome from modern genome sequences (synchronic reconstruction)

Background

The recent accumulation of plant genomic resources has provided an unprecedented opportunity to compare modern genomes with each other and to infer their evolutionary history from the reconstructed genomes of their most recent common ancestors (MRCA). Such ancestral genome reconstruction was initially used to investigate 105 million years of eutherian (placental) mammal evolution. The inferred ancestral karyotypes for the eutherians (2n = 44), boreoeutherians (2n = 46), and great apes (2n = 48) were used to increase our understanding of the mechanisms driving speciation and adaptation [8,9,10,11]. In particular, eutherian genomes have been found to be surprisingly stable, and affected by only a limited number of large-scale rearrangements during evolution. Higher rates of such chromosomal shuffling have been reported for the branch extending from the great ape ancestor to the ancestor of humans and chimpanzees, which diverged after the Cretaceous–Paleogene (K–Pg) boundary, at a time when the dinosaurs became extinct. Computational reconstructions of mammalian ancestral genomes were instrumental in suggesting that environmental changes may have driven genome plasticity through chromosome rearrangements. These changes may also have led to new variation in gene content and gene expression that gave rise to key adaptive biological functions, such as olfactory receptors [11,12,13]. Ancestral genome reconstruction has also shed light on plant evolution.

State-of-the-art methodology

The ancestral genome is a ‘median’ or ‘intermediate’ genome consisting of a clean reference gene order that is common to all of the investigated extant species (Fig. 1). The ancestral genomes that are inferred in silico are actually minimal shared ancestral genomes: they lack those components of the ‘real’ (unknown) ancestral genomes that were either lost from all of the investigated descendants or retained by only one modern species. Such inferred (minimal) ancestral genomes are reconstructed following a four-step strategy [14]. First, sequence comparison across genomes is used to characterize conserved or duplicated gene pairs on the basis of alignment parameters and/or phylogenetic inferences, defining genes that are conserved in pairs of species (i.e., putative protogenes (pPGs)). Second, the pPGs that are conserved in all of the investigated species (i.e., core protogenes (core-pPGs)) are used to define synteny blocks (SBs), and groups of fewer than five pPGs are filtered out. Third, SBs are merged on the basis of chromosome-to-chromosome orthologous relationships between the compared genomes, delivering the ancestral protochromosomes (also referred to as contiguous ancestral regions (CARs)). These CARs correspond to independent sets of genomic blocks that display paralogous and/or orthologous relationships in modern species. Finally, the ordering of protogenes (including non-core-pPGs, i.e., genes that are conserved in only a subset of the investigated species) onto the previously defined protochromosomes yields an exhaustive set of ordered protogenes (oPGs).
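The synteny-block step of this strategy can be sketched in a few lines of Python. The sketch below is a deliberately simplified illustration, not the procedure of reference [14] or of any published tool: it compares only two genomes, takes the ortholog pairs as given, and chains pairs that are close in gene rank on both chromosomes. The function name, the `max_gap` tolerance, and all gene and chromosome names are assumptions made for the example; only the minimum block size of five comes from the text.

```python
from collections import defaultdict

MIN_BLOCK_SIZE = 5  # the text filters out groups of fewer than five pPGs


def find_synteny_blocks(genome_a, genome_b, orthologs, max_gap=2):
    """Chain collinear ortholog pairs into synteny blocks (simplified).

    genome_a, genome_b: dict mapping chromosome name -> ordered list of gene ids.
    orthologs: iterable of (gene_in_a, gene_in_b) pairs, i.e. the pPGs.
    max_gap: hypothetical tolerance, in gene ranks, between chained pairs.
    """
    pos_a = {g: (c, i) for c, genes in genome_a.items() for i, g in enumerate(genes)}
    pos_b = {g: (c, i) for c, genes in genome_b.items() for i, g in enumerate(genes)}

    # Group ortholog pairs by the pair of chromosomes they connect.
    anchors = defaultdict(list)
    for ga, gb in orthologs:
        if ga in pos_a and gb in pos_b:
            (ca, ia), (cb, ib) = pos_a[ga], pos_b[gb]
            anchors[(ca, cb)].append((ia, ib, ga, gb))

    chains = []
    for (ca, cb), points in anchors.items():
        points.sort()  # walk along genome A
        current = [points[0]]
        for point in points[1:]:
            prev = current[-1]
            close_in_a = point[0] - prev[0] <= max_gap
            close_in_b = 0 < abs(point[1] - prev[1]) <= max_gap
            if close_in_a and close_in_b:
                current.append(point)
            else:
                chains.append((ca, cb, current))
                current = [point]
        chains.append((ca, cb, current))

    # Keep only chains that reach the minimum block size.
    return [
        (ca, cb, [(ga, gb) for _, _, ga, gb in chain])
        for ca, cb, chain in chains
        if len(chain) >= MIN_BLOCK_SIZE
    ]


# Toy data: six collinear pairs on A1/B1 and only three on A2/B2.
genome_a = {"A1": [f"a{i}" for i in range(1, 9)], "A2": ["a9", "a10", "a11"]}
genome_b = {"B1": [f"b{i}" for i in range(1, 9)], "B2": ["b9", "b10", "b11"]}
orthologs = [(f"a{i}", f"b{i}") for i in range(1, 7)]
orthologs += [("a9", "b9"), ("a10", "b10"), ("a11", "b11")]

blocks = find_synteny_blocks(genome_a, genome_b, orthologs)
print([(ca, cb, len(pairs)) for ca, cb, pairs in blocks])  # [('A1', 'B1', 6)]
```

In this toy case the three-gene group on A2/B2 is discarded by the size filter. A real analysis would also enforce a consistent orientation within each chain and would require conservation across all investigated species before calling a core-pPG.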

Putative orthologous (or ancestral) genes are not identified in SBs, and are therefore missing from the inferred ancestral genomes, if they have been transposed outside of CARs in the course of evolution (so that they are no longer conserved in synteny), are retained in only one of the investigated species, or have been lost from all of the investigated species. Several tools, such as DRIMM-synteny [15], ADHoRe [16], DiagHunter [17], DAGchainer [18], SyMAP [19], and MCScanX [20], are publicly available for clustering or chaining collinear gene pairs, whereas ANGES [21], MRGA [22], and inferCARs [10] are used for reconstructing ancestral genomes. Finally, the reconstructed ancestral karyotypes can be used to infer a parsimonious evolutionary model that assumes minimal numbers of genomic rearrangements (including inversions, deletions, fusions, fissions, and translocations). Such a model fosters new investigations of the evolutionary fate of ancestral genes and genomes, through precise identification of the changes involved (chromosome fusions, fissions, and translocations, as well as gene gains and losses) and their assignment to specific species or botanical families.
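A rough feel for how rearrangements are counted can be obtained from breakpoints, that is, ancestral gene adjacencies that no longer exist in a descendant. The sketch below is a simple proxy under stated assumptions, not the parsimony model of the tools cited above: genes are unsigned (orientation is ignored), every gene occurs once, and the function names and toy gene orders are invented for the example.

```python
def adjacencies(chromosomes):
    """Return the set of unordered neighbouring gene pairs across all chromosomes."""
    pairs = set()
    for genes in chromosomes:
        for left, right in zip(genes, genes[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def breakpoint_count(ancestor, descendant):
    """Count ancestral adjacencies that are absent from the descendant."""
    return len(adjacencies(ancestor) - adjacencies(descendant))


# Toy karyotypes: each chromosome is an ordered list of gene ids.
ancestor = [["g1", "g2", "g3", "g4"], ["g5", "g6", "g7"]]
fusion = [["g1", "g2", "g3", "g4", "g5", "g6", "g7"]]
inversion = [["g1", "g3", "g2", "g4"], ["g5", "g6", "g7"]]
fission = [["g1", "g2"], ["g3", "g4"], ["g5", "g6", "g7"]]

print(breakpoint_count(ancestor, fusion))     # 0
print(breakpoint_count(ancestor, inversion))  # 2
print(breakpoint_count(ancestor, fission))    # 1
```

A fusion breaks no ancestral adjacency, an internal inversion breaks two, and a fission breaks one, so breakpoint counts give only a partial view of the events involved. Dedicated reconstruction methods are needed to decide which combination of events is the most parsimonious.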

Major achievements

The ancestral angiosperm karyotype (AAK) has recently been reconstructed with a repertoire of 22,899 ancestral genes that are conserved in present-day crops and that date back 190–238 mya. Evolutionary timescale approaches have also placed the emergence of the angiosperms at some 250 mya [3]. This time period largely overlaps with the late Triassic and predates the earliest recorded flowering plant fossil [23]. The AAK then diverged, giving rise to the ancestral monocot karyotype (AMK), with five protochromosomes and 6707 ordered protogenes (or seven protochromosomes according to Ming et al. [24]), and the ancestral eudicot karyotype (AEK), with seven protochromosomes and 6284 ordered protogenes [23]. It is possible to reconstruct any investigated modern monocot or eudicot genome using these inferred ancestors (AAK and AMK or AEK), such that modern karyotypes can be seen as a mosaic of reconstructed ancestral protochromosomal segments (Fig. 2). The availability of the AAK, AMK, and AEK helps us to track the evolutionary plasticity acting at the gene, chromosome, genome, and species levels over more than 200 million years of plant evolution [23].
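The idea of a modern karyotype as a mosaic of ancestral segments can be made concrete with a small sketch. It assumes that each modern gene has already been assigned to a protochromosome. The gene names, the protochromosome labels, and the function name are invented for illustration and do not come from the reconstructions cited above.

```python
from itertools import groupby


def paint_chromosome(gene_order, protochromosome_of):
    """Collapse a modern chromosome into runs of shared ancestral origin.

    gene_order: genes in their modern order along one chromosome.
    protochromosome_of: dict mapping a gene to its ancestral protochromosome;
        genes without an ancestral assignment are skipped.
    Returns a list of (protochromosome, number_of_genes) segments.
    """
    origins = [protochromosome_of[g] for g in gene_order if g in protochromosome_of]
    return [(origin, len(list(run))) for origin, run in groupby(origins)]


# Toy data: "x1" is a lineage-specific gene with no ancestral assignment.
protochromosome_of = {
    "g1": "A1", "g2": "A1", "g3": "A1",
    "g4": "A5", "g5": "A5",
    "g6": "A1", "g7": "A1",
}
modern_chr = ["g1", "g2", "g3", "x1", "g4", "g5", "g6", "g7"]

segments = paint_chromosome(modern_chr, protochromosome_of)
print(segments)  # [('A1', 3), ('A5', 2), ('A1', 2)]
```

Here the toy chromosome reads as an A1 segment interrupted by a block of A5 origin, which is the kind of pattern that would prompt a closer look for an insertion or translocation event.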

At the gene level, the comparison of the AAK gene repertoire to those of outgroup species, such as gymnosperms, mosses, and single-cell green algae, uncovered genes that are specific to flowering plants. These genes were preferentially assigned to Gene Ontology (GO) terms such as ‘pollen–pistil interaction’, ‘response to endogenous stimuli’, ‘flower development’, and ‘pollination’, corresponding to the key biological processes that drove the transition between gymnosperms and angiosperms [23].

At the genome level, the genomic plasticity inherited through polyploidization events can be assessed, with ~60% of AAK protogenes being present as singletons in modern species despite recurrent polyploidization events (Fig. 2). This general phenomenon of gene repertoire contraction following polyploidy is also observed at the chromosome level, with a general decrease in chromosome number after whole-genome duplication (WGD) resulting from massive ancestral chromosome fusions through two mechanisms: centromeric chromosome fusion (CCF) and telomeric chromosome fusion (TCF). CCF, which is mainly observed in grasses, involves the insertion of an entire chromosome into a break in the centromeric region of another chromosome. TCF involves the ‘end-to-end’ joining of two chromosomes via their telomeres [25]. The observed general pattern of chromosome number reduction involves unequal reciprocal translocations and the loss of several centromeres, such that only a subset of the ancestral pool of telomeres or centromeres is re-used as functional telomeres or centromeres in modern species [25, 26]. Despite multiple rounds of WGD in the course of plant evolution, the number of genes and chromosomes has been kept constant by massive diploidization and fusion events, at the gene and genome levels, respectively. Diploidization did not occur at random in the genome: retained ancestral genes were partitioned between paralogous blocks so as to form ‘most fractionated’ (MF, also known as S for sensitive) and ‘least fractionated’ (LF, also known as D for dominant) chromosomal compartments [14]. ‘RNA binding’, ‘nucleic acid binding’, ‘receptor activity’, ‘signal transducer activity’, ‘receptor binding’, and ‘transcription factor activity’ are frequent GO terms associated with molecular functions that are enriched in extant genomes relative to the AAK. They correspond to adaptive or specialized biological functions for which multiple copies of genes were conserved after WGD and have survived the general diploidization phenomenon [23].
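The partition into least and most fractionated compartments can be illustrated by comparing gene retention in the two blocks that derive from one duplicated ancestral block. The sketch below is a minimal example with invented gene names and a hypothetical function name. It simply labels the copy that retains more ancestral genes as LF, whereas published analyses rely on statistical tests across many blocks.

```python
def fractionation_summary(ancestral_genes, copy_1, copy_2):
    """Summarise gene retention in two paralogous blocks after a duplication.

    ancestral_genes: ordered protogenes of the ancestral block.
    copy_1, copy_2: sets of ancestral genes still present in each duplicated block.
    """
    total = len(ancestral_genes)
    if total == 0:
        raise ValueError("the ancestral block must contain at least one gene")
    retained_1 = sum(g in copy_1 for g in ancestral_genes)
    retained_2 = sum(g in copy_2 for g in ancestral_genes)
    both = sum(g in copy_1 and g in copy_2 for g in ancestral_genes)
    either = sum(g in copy_1 or g in copy_2 for g in ancestral_genes)
    # The copy with the higher retention is called least fractionated (LF).
    lf, mf = ("copy_1", "copy_2") if retained_1 >= retained_2 else ("copy_2", "copy_1")
    return {
        "retention_copy_1": retained_1 / total,
        "retention_copy_2": retained_2 / total,
        "duplicate_fraction": both / total,
        "singleton_fraction": (either - both) / total,
        "least_fractionated": lf,
        "most_fractionated": mf,
    }


# Toy block of ten protogenes: copy_1 keeps nine, copy_2 keeps four.
ancestral_block = [f"p{i}" for i in range(1, 11)]
copy_1 = {f"p{i}" for i in range(1, 10)}
copy_2 = {"p1", "p2", "p3", "p10"}

summary = fractionation_summary(ancestral_block, copy_1, copy_2)
print(summary["least_fractionated"], summary["singleton_fraction"])  # copy_1 0.7
```

In this toy block, three protogenes survive as duplicates and seven as singletons, and copy_1 would be assigned to the LF (dominant) compartment.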

As has been proposed for mammalian evolution, paleopolyploidy events in angiosperms, although usually considered rare and likely to lead to an evolutionary dead end, may have served as the basis for species diversification and survival during episodes of mass species extinction [27,28,29]. Although still debated, ancient paleopolyploidization and ancestral speciation events in angiosperms may have been associated with known periods of species extinction, such as the Cretaceous–Paleogene (K–Pg, ~65 mya) transition [27] or the Triassic–Jurassic (Tr–J, ~200 mya) transition [30]. More recent paleopolyploidization events that are specific to plant lineages (or even species) may be associated with more recent plant diversification periods during the Paleogene and Neogene (~20–30 mya), as observed from historical changes in dry forest communities and biomasses [31, 32]. Thus, polyploidy appears to have played a major role in (re-)shaping structural and functional genomic diversification during angiosperm evolution, with contrasting rates of changes between species, subgenomes, genes, and functions. It may also have delivered biological novelties that have enhanced tolerance of environmental changes, including those occurring during mass extinction events.

Grasses as a case study

Besides the recovery of the extinct AMK, AEK, and AAK founder karyotypes, the synchronic approach has also enabled the computational reconstruction of the ancestral genomes of major angiosperm lineages. In eudicots, ancestral genomes have been proposed for the Rosaceae [33], Brassicaceae [34], and Cucurbitaceae [35] families, consisting of nine, eight (or seven), and 12 (using the melon genome as pivot) protochromosomes, respectively, as well as for the legumes [36]. In grasses, the ancestral grass karyotype (AGK), which takes into account gene conservation between rice, wheat, barley, Brachypodium, sorghum, setaria, and maize, was structured into seven protochromosomes containing 8581 protogenes (9430 in Wang et al. [37]) and with a minimal gene space physical size of 30 Mb [23, 38, 39]. This ancestral genome went through a paleotetraploidization event (involving seven duplicated blocks shared by modern monocots) more than 95 mya [37, 38, 40]. Two subsequent symmetric reciprocal translocations, one of which was centromeric (CCF) and the other telomeric (TCF), and two asymmetric reciprocal translocations resulted in a total of 12 chromosomes [23, 39] bearing 16,464 protogenes (18,860 according to Wang et al. [37]). All investigated modern grass genomes can then be reconstructed from this post-polyploidy ancestral karyotype of 12 protochromosomes, taking into account CCF, TCF, translocation, and inversion events (Fig. 2). Rice has retained the n = 12 structure of the AGK and has been proposed to be the slowest evolving species among the grasses [23, 37], whereas the other species underwent numerous chromosome rearrangements to reach their present-day karyotypes [23, 38, 39]. Rice can, therefore, be considered as a reference genome (also known as a ‘pivot’) for comparative genomics studies in grasses.
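The chromosome bookkeeping behind this scenario (seven protochromosomes, doubled to 14 by the paleotetraploidization, then reduced to 12) can be followed with a toy model. The sketch below treats each chromosome as a list of named blocks and implements the two fusion types as list operations. Which chromosomes actually fused during grass evolution is not taken from the source, so the pairings, block names, and function names are arbitrary assumptions, and the asymmetric reciprocal translocations mentioned above are not modelled.

```python
def whole_genome_duplication(karyotype):
    """Duplicate every chromosome, tagging the two copies with a suffix."""
    doubled = {}
    for name, blocks in karyotype.items():
        doubled[name + "a"] = list(blocks)
        doubled[name + "b"] = list(blocks)
    return doubled


def telomeric_fusion(karyotype, first, second, new_name):
    """TCF: join two chromosomes end to end via their telomeres."""
    result = dict(karyotype)
    result[new_name] = result.pop(first) + result.pop(second)
    return result


def centromeric_fusion(karyotype, recipient, inserted, centromere_index, new_name):
    """CCF: insert a whole chromosome into a centromeric break of the recipient.

    centromere_index is the number of blocks on the recipient's left arm
    (a hypothetical way of marking the break point).
    """
    result = dict(karyotype)
    host = result.pop(recipient)
    guest = result.pop(inserted)
    if not 0 < centromere_index < len(host):
        raise ValueError("the break must fall inside the recipient chromosome")
    result[new_name] = host[:centromere_index] + guest + host[centromere_index:]
    return result


# Toy ancestor: seven protochromosomes, each with one block per arm.
ancestor = {f"P{i}": [f"P{i}.L", f"P{i}.R"] for i in range(1, 8)}
tetraploid = whole_genome_duplication(ancestor)
step_1 = centromeric_fusion(tetraploid, "P1a", "P2a", 1, "F1")
step_2 = telomeric_fusion(step_1, "P3a", "P4a", "F2")

print(len(ancestor), len(tetraploid), len(step_2))  # 7 14 12
print(step_2["F1"])  # ['P1.L', 'P2.L', 'P2.R', 'P1.R']
print(step_2["F2"])  # ['P3.L', 'P3.R', 'P4.L', 'P4.R']
```

Each fusion lowers the chromosome count by one, which is how two events take the duplicated karyotype from 14 to 12. The nested pattern of `F1`, with one chromosome sitting between the two arms of another, is the signature that distinguishes a CCF from the simple concatenation produced by a TCF.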

The grasses appear to constitute a key botanical family in which to investigate the role of polyploidization in promoting speciation and adaptation. Grasses experienced an ancestral paleotetraploidization event as well as species-specific polyploidization events, with a tetraploidization event in maize and tetraploidization or hexaploidization events in wheat. After a polyploidization event, homoeologous chromosome differentiation is necessary to stabilize meiosis by preventing incorrect pairing between homoeologs. This is achieved through massive partitioning of the organization and regulation of the subgenomes, involving the fusion, fission, inversion, and translocation of chromosomes, loss of genes or DNA, and neo- or sub-functionalization of gene pairs. Such post-polyploidy genomic plasticity led to novel phenotypes that underlie the evolutionary success of polyploid plants and that were ultimately selected for by humans during domestication (reviewed in [14, 29, 41, 42]).

Promising scientific avenues from inferred ancestral genomes

Inferred ancestral genomes are not only crucial for understanding how plant genomes have evolved at the chromosome and gene scales, but also offer the possibility to address, in novel ways, issues regarding translational research and post-polyploidy plasticity that are relevant to plant breeding.

Translational research

Ancestral genomes and related comparative genomics data are delivered through public web servers such as PlantSyntenyViewer (https://urgi.versailles.inra.fr/synteny [30, 34, 39]), Genomicus (http://www.genomicus.biologie.ens.fr/genomicus-plants [43]), COGE (https://genomevolution.org/coge/ [44]) and PLAZA (http://bioinformatics.psb.ugent.be/plaza/ [45]). The ancestral genomes (AAK, AEK, AMK, and AGK, as well as ancestral genomes for the Rosaceae, Brassicaceae, and Cucurbitaceae; Table 1) provide a list of accurate orthologs between species that can be used to improve the structural and functional annotation of genomes. The plant genomes shown in Fig. 2 have been sequenced, assembled, and finally annotated by different methods and groups, potentially resulting in some inconsistencies. With the use of reconstructed ancestral genomes, structural (intron and exon structure) and functional (GO) annotations of genes can be improved by comparing orthologous and paralogous gene sets that may share similar (ancestral) genomic features. Reconstructed ancestors can also serve as a resource for translational research on key agronomical traits, particularly from model species (such as Arabidopsis thaliana) to crops [46]. Modern monocot and eudicot crops can now be connected via the 22,899 protogenes that define the AAK [23]. Knowledge gained in model species on the genes underlying traits of interest can therefore be transferred to crops through the orthologs or paralogs identified in the proposed evolutionary scenario and associated paleogenomic data (Fig. 2). Such translational-based dissection of traits has been performed successfully in several botanical families, including legumes (for example, between Medicago truncatula and pea, as described by Bordat et al. [47]) and grasses (for example, between Brachypodium distachyon and wheat, as described by Dobrovolskaya et al. [48]).
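In practice, this kind of knowledge transfer amounts to a lookup through a shared protogene. The sketch below assumes a flat list of (protogene, species, gene) assignments. The identifiers and function names are invented, and the format does not correspond to the export of any of the web servers listed above.

```python
from collections import defaultdict


def build_protogene_index(assignments):
    """Index modern genes by protogene and species.

    assignments: iterable of (protogene, species, gene) triples.
    """
    index = defaultdict(lambda: defaultdict(list))
    for protogene, species, gene in assignments:
        index[protogene][species].append(gene)
    return index


def candidates_in_crop(index, model_species, model_gene, crop_species):
    """Return crop genes that descend from the same protogene as model_gene."""
    hits = []
    for by_species in index.values():
        if model_gene in by_species.get(model_species, []):
            hits.extend(by_species.get(crop_species, []))
    return hits


# Toy assignments: the crop retains two copies of PG1 and none of PG2.
assignments = [
    ("PG1", "model", "m_gene_1"),
    ("PG1", "crop", "c_gene_1a"),
    ("PG1", "crop", "c_gene_1b"),
    ("PG2", "model", "m_gene_2"),
]
index = build_protogene_index(assignments)

print(candidates_in_crop(index, "model", "m_gene_1", "crop"))  # ['c_gene_1a', 'c_gene_1b']
print(candidates_in_crop(index, "model", "m_gene_2", "crop"))  # []
```

A model gene can map to several crop candidates when the crop has retained duplicated copies, or to none when the ancestral gene was lost, so the output is a list of candidates for further validation and not a one-to-one assignment.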

Table 1 Ancestral plant genomes

Polyploidization

Polyploidization events have been proposed as a major source of genetic novelty during evolution. Such post-polyploidy genomic plasticity takes place in paleopolyploids that are subject to diploidization (evolution toward a reduction of duplicate redundancy) through, among other processes: (i) differences in ancestral gene retention, yielding contrasting plasticity between MF (or S) and LF (or D) compartments; (ii) bias in GO for the retention of multiple copies of genes, with an enrichment in functional categories such as transcriptional regulation, ribosomes, response to abiotic or biotic stimuli, response to hormonal stimuli, cell organization, and transporter functions; (iii) partitioned gene expression, with differences in transcript abundance or neo- and sub-functionalization patterns between retained pairs; (iv) contrasting single nucleotide polymorphisms (SNPs) at the population level between paralogous genomic fragments; and (v) contrasting small RNA regulation as well as differences in epigenetic (CG methylation) marks between duplicated blocks or genes [49]. Such subgenome dominance phenomena, which partition the organization and regulation of diploidized paleopolyploids, have been particularly exemplified in Brassicaceae and maize [50,51,52,53] but have so far been reported as undetectable in soybean, banana, and poplar [54].

The evolutionary plasticity gained from recurrent polyploidization and diploidization (also known as post-polyploidization diploidization (PPD) [55]) processes has provided the basis for functional and phenotypic novelty in angiosperms. This plasticity may underlie a plant's ability to survive in or invade a novel environment, ultimately driving the observed evolutionary success of important plant families [27]. Nevertheless, the continuum and interplay between the reported structural and functional reprogramming after PPD processes remain poorly understood. Access to ancient DNA (aDNA) sequences from extinct diploid and polyploid ancestors contemporary with past polyploidization events will further expand our understanding of this major phenomenon driving plant evolutionary dynamics, making it possible to characterize the driving molecular mechanisms that have potential for use in breeding. In that regard, nascent polyploids (particularly in wheat and Brassicaceae) provide opportunities for testing the hypothesis that polyploidization accelerates evolutionary adaptation to environmental changes [56].