- Open Access
Chromosome evolution at the origin of the ancestral vertebrate genome
© The Author(s). 2018
- Received: 9 February 2018
- Accepted: 4 October 2018
- Published: 17 October 2018
It has been proposed that more than 450 million years ago, two successive whole genome duplications took place in a marine chordate lineage before leading to the common ancestor of vertebrates. A precise reconstruction of these founding events would provide a framework to better understand the impact of these early whole genome duplications on extant vertebrates.
We reconstruct the evolution of chromosomes at the beginning of vertebrate evolution. We first compare 61 extant animal genomes to reconstruct the highly contiguous order of genes in a 326-million-year-old ancestral Amniota genome. In this genome, we establish a well-supported list of duplicated genes originating from the two whole genome duplications to identify tetrads of duplicated chromosomes. From this, we reconstruct a chronology in which a pre-vertebrate genome composed of 17 chromosomes duplicated to 34 chromosomes and was subject to seven chromosome fusions before duplicating again into 54 chromosomes. After the separation of the lineage of Gnathostomata (jawed vertebrates) from Cyclostomata (extant jawless fish), four more fusions took place to form the ancestral Euteleostomi (bony vertebrates) genome of 50 chromosomes.
These results firmly establish the occurrence of two whole genome duplications in the lineage that precedes the ancestor of vertebrates, resolving in particular the ambiguity raised by the analysis of the lamprey genome. This work provides a foundation for studying the evolution of vertebrate chromosomes from the standpoint of a common ancestor and particularly the pattern of duplicate gene retention and loss that resulted in the gene composition of extant vertebrate genomes.
Approximately 35% of extant human genes still exist in duplicate copies (ohnologs) owing to the 1R-2R WGDs [8, 9], including the four clusters of HOX genes found in all tetrapods. Ohnologs represent the vast majority of duplicated genes in the human genome. They have been shown to be enriched in disease genes [8, 12–14] and to influence the frequency of structural variations in human populations. The ancient 1R-2R genome duplications therefore still exert a strong influence on present-day genomes, warranting a better understanding of their early history. The first reconstruction of the evolution of the karyotypes before, during, and after the 1R-2R left a number of questions unanswered. It could not reliably identify all ancestral vertebrate chromosomes, nor could it determine whether chromosome fusions or fissions took place between the two WGDs. This low resolution was due to the many chromosome rearrangements that have occurred since the vertebrate ancestor (Vertebrata), scrambling the ancestral gene organization in extant genomes, and the absence of a suitable outgroup species. In addition, a recent comparison between the lamprey genome and the chicken genome questioned the 1R-2R hypothesis and suggested that a single WGD and multiple segmental duplications could also explain the synteny patterns observed [16, 17] between these two species. Both the uncertainties of the first reconstruction of the pre-1R genome and this alternative to the 1R-2R hypothesis motivate a detailed analysis of the early evolution of vertebrate chromosomes.
The number of sequenced vertebrate genomes has recently greatly increased, allowing the reconstruction of ancestral genomes with much higher accuracy. After reconstructing the ancestral Amniota genome as a stepping stone to better understand the details of early vertebrate karyotype evolution, we find that the pre-1R vertebrate genome contained 17 chromosomes and that all vertebrates descend from a post-2R genome comprising 54 chromosomes. We show that the human genome still bears a strong imprint of the pre-1R genome, and provide resources to study the impact of the two WGDs in vertebrates from the perspective of a reference point shared by all descendent species.
Identification of pairs of ohnologous genes in the ancestral Amniota genome
We inferred from Ensembl gene trees (version 69) that 19,786 genes existed in the ancestral Amniota genome, the ancestor of birds, reptiles, and mammals (326 million years ago). We used AGORA (Algorithm for Gene Order Reconstruction in Ancestors) to order and orient these genes as they were in the Amniota genome (Additional file 1: Figure S1). This in silico reconstruction is composed of 470 segments, with 50% of the genes in segments larger than 253 genes (N50 length). We then selected the 56 chromosome-size segments larger than 50 genes as an initial set of Contiguous Ancestral Regions (CARs; mean CAR length 217 genes, 12,134 genes total) to identify duplicated regions.
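The N50 statistic and the size filter used above can be illustrated with a minimal Python sketch. The segment sizes below are made-up toy values, not the sizes of the 470 reconstructed segments, and the function names are ours.

```python
def n50_genes(segment_sizes):
    """Return the size S such that segments of size >= S hold at least half of all genes."""
    total = sum(segment_sizes)
    running = 0
    for size in sorted(segment_sizes, reverse=True):
        running += size
        if 2 * running >= total:
            return size
    raise ValueError("segment_sizes must not be empty")


def select_cars(segment_sizes, min_genes=50):
    """Keep only segments strictly larger than min_genes, as done for the initial CAR set."""
    return [size for size in segment_sizes if size > min_genes]


# Hypothetical toy reconstruction: six segments, sizes in number of genes.
toy_segments = [400, 300, 200, 60, 30, 10]
print("N50:", n50_genes(toy_segments))
print("CARs kept:", select_cars(toy_segments))
```

With real data, `segment_sizes` would be the gene counts of the reconstructed segments; the threshold of 50 genes is the one stated in the text.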
Ohnologs resulting from the 1R-2R WGDs are key to identifying pairs of duplicated chromosome segments. We identified pairs of putative ohnologous genes directly in the reconstructed Amniota genome using gene trees to date their duplication while ensuring that each member of a pair belongs to a different CAR (see the “Methods” section), resulting in a “List A” containing 5616 ancestral Amniota ohnolog pairs. Two previous studies have also identified ohnologs from the two WGDs, in the human and other vertebrate genomes, using conserved synteny and sequence similarity. We used Ensembl gene trees to convert extant gene identifiers from these studies to their ancestral Amniota gene identifiers. The first list established by Makino and McLysaght  and hereafter called “List B” contains 4870 ancestral Amniota gene pairs. The second study by Singh et al.  established three levels of confidence (strict, intermediate, and relaxed) to define ohnologs. Following these criteria, we defined three additional lists containing respectively 2873 (List “C-strict”), 5253 (List “C-inter”), and 7806 (List “C-relax”) ancestral Amniota ohnolog pairs.
This list of ohnolog pairs is of high quality for a number of reasons. First, it is based on the gene content and synteny in the reconstructed ancestral Amniota genome which is 326 million years closer to the 1R-2R events than extant genomes, and thus, signatures of 1R-2R events are read with greater accuracy. Second, this list abides by a 1R-2R-compatibility rule, i.e., no ohnologous gene family connects more than four CARs. Third, the ohnologous pairs are all phylogenetically consistent in that both genes in a pair always belong to the same Ensembl gene tree. Fourth, the two genes of a pair were allowed to be on the same CAR only if ≥ 90 genes separated them to avoid spurious inclusions of genes duplicated in tandem.
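The 1R-2R-compatibility rule can be checked mechanically. The sketch below is an illustration under simplifying assumptions, not the code used in this study: it groups genes into families as connected components of the ohnolog-pair graph and reports families whose genes fall on more than four CARs. Gene and CAR names are hypothetical, and the restriction to CARs of at least ten genes mentioned in the Methods is ignored.

```python
from collections import defaultdict


def ohnolog_families(pairs):
    """Connected components of the graph whose edges are ohnolog pairs."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for a, b in pairs:
        parent[find(a)] = find(b)
    families = defaultdict(set)
    for gene in list(parent):
        families[find(gene)].add(gene)
    return list(families.values())


def incompatible_families(pairs, gene_to_car, max_cars=4):
    """Families spread over more CARs than two rounds of WGD can explain."""
    return [
        family
        for family in ohnolog_families(pairs)
        if len({gene_to_car[g] for g in family}) > max_cars
    ]


# Hypothetical example: a family on four CARs is compatible.
cars = {"g1": "A", "g2": "B", "g3": "C", "g4": "D", "g5": "E"}
ok_pairs = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
print(incompatible_families(ok_pairs, cars))
```

Adding a pair that links the family to a fifth CAR, for example `("g4", "g5")`, would make the family incompatible.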
Identification of post-2R duplicated CARs
We identified two post-2R chromosomal fusions that required a split of two Amniota CARs in two sub-CARs each (CARs 5 and 118), and we identified three probable assembly errors that required a split of three CARs (CARs 40, 46, and 97). We also performed 23 CAR assemblies (Table 1), ending with a final set of 51 CARs edited to more closely represent their post-2R configuration. The ohnology relationships between CARs based on significant p values of the proportion test connected the 51 CARs into 17 tetrads (Fig. 3d). The precise step-by-step procedure that we followed to split or assemble CARs and group them in tetrads is detailed in Additional file 1, including Additional file 1: Figure S7B to S14B.
Chromosome evolution between the 1R and 2R whole genome duplications
Remarkably, each CLG was associated with one predominant CAR tetrad (Additional file 1: Figure S16). Consequently, all seven adjacent tetrads result from chromosome fusions between the 1R and 2R WGDs. Indeed, a chromosome fission would have split the gene content of a CLG over two different tetrads (Fig. 4b). Incidentally, we also found evidence for a chromosome fusion between the chordate ancestor and the 1R, because more than 75% of genes from CLGs 6 and 7 have their descendants in a single tetrad (tetrad 14; Additional file 1: Figure S16 and Additional file 7). Conversely, we could not confidently assign tetrad 13 to a CLG, likely because it is a small tetrad. We therefore conclude that the pre-1R karyotype comprised 17 chromosomes, duplicated into 34 chromosomes after the first WGD and followed by seven fusions. The resulting 27 chromosomes were duplicated in the second WGD leading to 54 Vertebrata chromosomes, at the origin of the approximately 60,000 extant species of vertebrates.
Chromosome evolution after the 2R
This karyotype was followed by additional chromosome fusions at different stages after the 2R WGD. Four fusions followed the scenario described in Fig. 4c, where a single CAR joins two tetrads (CAR 5, CAR5a_152, CAR3_22, CAR 10_240_2; Fig. 3d). They can be dated to the period between the 2R WGD and the Euteleostomi ancestor because of their homologies to both descendent and outgroup genomes. For example, CAR_10_240_2 is homologous to a single chicken chromosome (GG4), a single human chromosome (chromosome X) and a single spotted gar chromosome (LG7), which is most parsimoniously consistent with a situation where this CAR was already a single chromosome in Euteleostomi, the common ancestor of these three species. A fifth fusion can be dated to the period between Euteleostomi and Amniota: CAR 118 is common to two tetrads but while it is homologous to a single chicken chromosome (GG1), a disruption of synteny in the spotted gar genome and a disruption in the DCS pattern in the medaka genome are consistent with a fusion in the lineage leading to Amniota.
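The chromosome counts in this chronology follow from simple bookkeeping: a whole genome duplication doubles the count and each fusion removes one chromosome. The short Python sketch below only replays the numbers reported in the text; the function and argument names are ours.

```python
def karyotype_chronology(pre_1r=17, fusions_1r_to_2r=7, fusions_2r_to_euteleostomi=4):
    """Replay the chromosome counts from the pre-1R genome to Euteleostomi."""
    post_1r = 2 * pre_1r                          # first WGD doubles the karyotype
    pre_2r = post_1r - fusions_1r_to_2r           # each fusion removes one chromosome
    post_2r = 2 * pre_2r                          # second WGD
    euteleostomi = post_2r - fusions_2r_to_euteleostomi
    return {
        "pre-1R": pre_1r,
        "post-1R": post_1r,
        "pre-2R": pre_2r,
        "post-2R (Vertebrata)": post_2r,
        "Euteleostomi": euteleostomi,
    }


for stage, count in karyotype_chronology().items():
    print(f"{stage}: {count} chromosomes")
```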
Comparative genomics between the pre-1R genome and the human genome
The four Hox clusters originate from pre-1R chromosome 1, and we examined in the same light other paralogous clusters that have been proposed to originate from a single pre-1R locus. The MHC region on human chromosome 6 contains a number of genes unrelated to immune functions but which possess ohnologs in 3 other loci on chromosomes 1, 9, and 19 . All 4 regions descend from pre-1R chromosome 9. Similarly, the loci containing FOX gene clusters have been compared and found to share paralogs suggestive of en-bloc duplications early in vertebrate evolution . We confirm here their unique origin on pre-1R chromosome 10. In contrast, no common origin was found for the different clusters of imprinted genes in the human genome (e.g., H19 locus on chromosome 11, IGF2R locus on chromosome 6, PON [1–3] locus on chromosome 7, UBE3A locus on chromosome 15) [27, 28], in line with their known progressive appearance later in therian mammals . The evolutionary scenario of ancestral vertebrate chromosomes presented here is therefore consistent with our current view of the evolution of these important gene families.
Finally, we analyzed the frequency of Gene Ontology terms of the human descendants of the 1416 pairs (2 gene losses), 502 triplets (1 gene loss), and 172 quartets (no loss) of ancestral Amniota ohnologs and find a striking pattern: quartets are enriched in both neuronal development and neuronal function (synaptic transmission) and triplets are enriched in muscle development (especially heart) and in muscle function (contraction), while pairs (two losses) are enriched in protein maturation and transport between organelles (Additional file 9).
We analyzed 61 animal genomes to reconstruct the evolutionary history of genes and chromosomes in the lineage leading to the ancestor of vertebrates, which then diversified in more than 60,000 species in the course of the following 450 million years. In contrast to previous studies which analyzed extant genomes, we first carefully reconstructed the ancestral Amniota genome to identify the signature of chromosome duplications more clearly. Our rationale is that the reconstructed ancestral Amniota genome should be devoid of the noise caused by the numerous rearrangements that took place during the following 326 million years of evolution. Indeed, the benefit of this approach can be seen when comparing the distribution of ohnologs in the ancestral Amniota genome versus their descendants in the human genome, for example in the chromosomes carrying the Hox clusters (Fig. 3c). The expected four-way association between duplicated chromosomes is striking in the Amniota reconstruction but blurred in the human genome. Statistical confidence and power are thus higher (Fig. 3d), enabling a higher resolution of chromosome events than was previously possible. In turn, this higher signal-to-noise ratio enables a straightforward reconstruction strategy of pre-duplication genomes, without requirements for complex statistical steps or algorithmic developments (reviewed in [30, 31]).
Progress was made on understanding early vertebrate genome evolution when Nakatani et al. reconstructed a vertebrate pre-duplication ancestral genome by segmenting the human genome into conserved vertebrate linkage groups. These were compared with other vertebrate genomes, especially with medaka, using tunicate and sea urchin genes to define ohnologs. Using this strategy, the authors inferred a karyotype of the pre-1R vertebrate ancestor, but could not resolve inter-WGD chromosome fusions or fissions, leading to the conclusion that between 10 and 13 pre-1R chromosomes existed. Here, using the ancestral Amniota genome instead of the human genome, we reconstruct 17 chromosomes for the pre-1R genome and find evidence for 7 fusions between the 2 WGDs but none for fissions. We therefore identify new chromosomes and find major differences between the two karyotypes (Additional file 1: Table S1). Our study provides an improved picture of ancestral vertebrate genome evolution in part because it is based on the reconstructed Amniota genome and also because it relied on information from more species and more recent genome annotations. For example, the spotted gar genome was more informative than the synteny with the medaka genome because it lacks the numerous chromosome fusions that occurred before the teleost WGD. Nakatani et al. also reconstructed an Osteichthyes post-2R genome corresponding here to the Euteleostomi ancestor, using a 2-of-3 rule relying on conservation between two of three genomes: the teleost pre-WGD, the chicken, and the vertebrate pre-1R genome. Our Euteleostomi ancestor, however, is substantially different from that of this earlier study, since we describe a Euteleostomi genome of 50 chromosomes, not 31. This difference mainly comes from fusions that we believe were incorrectly inferred from chicken-teleost genome comparisons, which were confounded by high rates of chromosome fusions in the lineage leading to the ancestral teleost.
Indeed, given that the ancestral teleost fish possessed only 13 chromosomes, the 50 chromosomes inferred here in the ancestral Euteleostomi suggest that chromosome fusions in the lineage leading to teleosts must have been more frequent than previously thought, on the order of 37 fusions in 100–150 million years. Similarly, the ancestral eutherian karyotype probably consisted of 23 pairs, suggesting a consistent pattern of karyotype reduction by chromosome fusion after the 1R-2R whole genome duplications. These fusions in the teleost and mammalian lineages involved both macro- and micro-chromosomes, probably explaining why these lineages no longer possess micro-chromosomes.
The reconstructed Amniota genome is not complete, as the 49 chromosomes only contain 80% of the 15,854 ancestral Amniota genes assigned to CARs. We note that although all chromosome tetrads corresponding to pre-1R chromosomes are complete (i.e., are composed of 4 CARs), the 49 reconstructed Amniota chromosomes display large differences in gene numbers: the largest contains 862 genes (chromosome 37) and the smallest only 16 genes (chromosome 49). This could reflect either a more intense process of gene inactivation and loss on chromosomes with fewer genes, or a more intense rate of rearrangement on those chromosomes, leading to greater difficulties in reconstructing them. Small chromosomes with few genes do not follow any noticeable pattern in their distribution among tetrads, which might have indicated systematic biases in gene deletion during rediploidization. We further examined if chromosomes could be paired within a tetrad, as expected if gene loss following the first WGD left a distinct pattern on the two ohnologous chromosomes, that would have propagated to the two duplicates resulting from the second WGD. Here again, such a pattern is not noticeable, which may indicate that the two WGDs took place in rapid succession, as suggested before , leaving little time for gene deletions (diploidization) to leave their imprint. More high-quality genome sequences from extant amniotes are required to improve the reconstruction of their ancestral genome and identify more subtle patterns left by the 1R-2R. However, this current Amniota reconstruction and its ohnolog gene annotation already provide a solid foundation for new studies that should help resolve important questions, including complex phylogenetic histories .
An interesting question was raised during the analysis of the gar genome, when the authors noticed a frequent 1:1 relationship between gar and chicken chromosomes, including micro-chromosomes. Micro-chromosomes are unusual because of their small sizes (usually below 20 Mb in chicken), their high GC content, high recombination rates, and high gene density. The gar-chicken comparison suggests that micro-chromosomes are ancestral features in Euteleostomi, which in turn raises the question of their origin through the 1R-2R duplications. Twelve gar and chicken micro-chromosomes are homologous and can parsimoniously be considered ancestral to Euteleostomi (Additional file 1: Table S2). They are distributed in 7 to 11 tetrads (some Euteleostomi micro-chromosomes originate from two tetrads), each of these tetrads containing both micro- and macro-chromosomes (Fig. 5). This leaves open the question of the timing of their formation: before the 1R, between the 1R and 2R, or immediately after the 2R.
The pre-2R karyotype and its evolution described here provide a framework to study the impact of the 1R-2R in extant vertebrate genomes, especially via the set of phylogenetically consistent ohnolog families that underlie the analysis. For example, previous studies have provided alternative hypotheses to explain a possible biased retention of ohnologs, including a requirement for stoichiometric balance of proteins in complexes [8, 37] or the preferential retention of genes involved in diseases owing to their dominant negative consequence on fitness if deleted from the genome [12, 38]. Ohnologs have also been shown to be enriched in developmental genes and regulatory functions. A precise mechanism to explain biased ohnolog retention, if any, remains to be established. This may be because these studies are based on extant gene copies, and 500 million years of evolution have elapsed since the 1R-2R WGDs. Using the human descendants of ohnologous Amniota genes, we show that strikingly different biological functions are preferentially retained in ohnologous families that were subject to two gene losses, to one gene loss, or that remained as a quartet of genes. Such a distinctive and specific distribution of functions between the three types of ohnolog families raises the question of strong functional constraints governing early rates of ohnolog retention and loss. It is possible that these patterns reflect the functions and anatomical parts (central nervous system and muscles) most targeted by selection during early vertebrate evolution, when gene redundancy was maximal and provided a template for evolutionary innovations, leading to increased organismal complexity. This further illustrates the benefit of comparative genomics based on reconstructed ancestral genomes.
Biology is a historical science, but this historical dimension is often difficult to acknowledge because the genomic records required to document ancestral states are missing. In practical terms this lack of information hinders our ability to integrate conclusions made across different living organisms, and to draw all the benefits from comparative genomics. Here, the detailed reconstruction of the early history of the vertebrate genome is a step towards a better understanding of the founding events at the origin of living mammals, birds, reptiles, amphibians and fish. It provides a new perspective, that of a common reference, to study the evolution of these extant animals.
Phylogenetic gene trees
The complete set of 20,285 phylogenetic gene trees built by the Ensembl Compara pipeline (Ensembl version 69) was downloaded using the Ensembl API. Duplication nodes were edited when their consistency score was below 0.3, as described in . These phylogenetic gene trees were used in this study to (i) identify ancestral genes and orthologous gene relationships when reconstructing the Amniota ancestral genome, (ii) identify ohnologous gene relationships when constructing the “A” list of ohnologous gene pairs, and (iii) identify the ancestral and extant gene copies of genes from the original B and C lists of ohnologs and update the human-CLG relationships from reference  (all built using other Ensembl versions).
Reconstruction of the Amniota ancestral genome
The ancestral Amniota genome was reconstructed using AGORA (Algorithm for Gene Order Reconstruction in Ancestors) (https://github.com/DyogenIBENS), which has been routinely used to reconstruct the ancestral gene orders presented in Genomicus since Ensembl release 53. The Amniota reconstruction used here is available for download on the ftp site of the Genomicus webserver (ftp://ftp.biologie.ens.fr/pub/dyogen/genomicus/69.10/ and ). AGORA takes as input the gene orders, gene orientations, and gene trees from 61 extant metazoan genomes available in release 69 of Ensembl (2012/11/15). The 61 species are listed in Additional file 1: Table S4 and include 40 mammals, 3 birds, 1 reptile, 1 turtle, 1 amphibian, 1 coelacanth, 1 lamprey, and 8 teleost fish among vertebrates, and 2 Cionas, 1 fruit fly and 1 nematode. The corresponding species tree can be found at http://www.genomicus.biologie.ens.fr/genomicus-69.01/data/SpeciesTree.pdf
Briefly, AGORA is a graph-based parsimony method: it builds an adjacency graph where ancestral genes are vertices (nodes) and ancestral adjacencies are edges (links) weighted by the frequency of their conservation. This frequency is measured by comparing pairs of genomes descending independently from the Amniota ancestor (one Mammalia and one Sauropsida) or one descendant and one outgroup, and counting how many times a given ancestral gene adjacency is conserved between the two extant genomes being compared. The graph is then linearized along the edges of maximal weight to produce Contiguous Ancestral Regions (CARs). An AGORA reconstruction of the Amniota genome based on Ensembl version 84 (2016/03/15) was also used for comparison (Table 1).
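To make the principle concrete, the following Python sketch builds a weighted adjacency graph from pairwise genome comparisons and linearizes it greedily by decreasing edge weight. It is a deliberately simplified illustration and not the AGORA implementation: gene orientation, gene duplications and losses, and AGORA's actual linearization procedure are all ignored, and the genomes and gene names are toy values.

```python
from collections import Counter, defaultdict


def adjacencies(genome):
    """Unordered pairs of neighbouring genes in a genome (list of chromosomes)."""
    found = set()
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:]):
            if a != b:
                found.add(frozenset((a, b)))
    return found


def weighted_adjacency_graph(genomes, informative_pairs):
    """Weight each adjacency by the number of comparisons in which it is conserved."""
    adj = {name: adjacencies(genome) for name, genome in genomes.items()}
    weights = Counter()
    for x, y in informative_pairs:
        for edge in adj[x] & adj[y]:
            weights[edge] += 1
    return weights


def linearize(weights):
    """Greedy linearization: take edges by decreasing weight, keeping simple paths."""
    parent, degree, kept = {}, Counter(), defaultdict(list)

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for edge, _ in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        a, b = sorted(edge)
        # Accept an edge only if it creates neither a branch nor a cycle.
        if degree[a] < 2 and degree[b] < 2 and find(a) != find(b):
            parent[find(a)] = find(b)
            degree[a] += 1
            degree[b] += 1
            kept[a].append(b)
            kept[b].append(a)

    cars, seen = [], set()
    for start in sorted(kept):
        if len(kept[start]) == 1 and start not in seen:
            path, prev, cur = [start], None, start
            while True:
                seen.add(cur)
                nxt = [n for n in kept[cur] if n != prev]
                if not nxt:
                    break
                prev, cur = cur, nxt[0]
                path.append(cur)
            cars.append(path)
    return cars


# Toy genomes expressed in terms of ancestral gene names (hypothetical).
genomes = {
    "mammal": [["g1", "g2", "g3", "g4"]],
    "bird": [["g1", "g2", "g3"], ["g4", "g5"]],
    "outgroup": [["g2", "g3", "g4", "g5"]],
}
comparisons = [("mammal", "bird"), ("mammal", "outgroup"), ("bird", "outgroup")]
print(linearize(weighted_adjacency_graph(genomes, comparisons)))
```

In this toy case every conserved adjacency is compatible with a single path, so the sketch returns one CAR containing the five genes in order.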
Identification of ohnolog gene pairs in the Amniota ancestor
Human ohnolog gene pairs from  based on Ensembl 52 data were assigned to their Ensembl version 69 gene trees using Ensembl gene IDs. Their Amniota ancestral genes were used to build list B of ohnolog gene pairs. Similarly, we downloaded three lists of ohnolog gene pairs from http://ohnologs.curie.fr/ described in , each list corresponding to a different confidence level. A non-redundant list of Amniota ancestral genes was identified in Ensembl version 69 by sequentially identifying the Amniota ancestral genes of these ohnologs (data downloaded on October 27, 2014). We built list A by identifying ohnolog pairs directly in the Amniota reconstructed genome, starting from ancestral genes that were duplicated between the Chordata and the Euteleostomi ancestors. These candidate pairs were considered ohnologs if another candidate pair from a different gene tree could be found on the same pair of CARs at a distance ≤ N genes (to account for massive duplicate loss after a WGD). The parameter N was varied to optimize both sensitivity and specificity and was fixed at 45 genes. A pair of ohnologs was allowed to occur between genes of the same CAR if they were located at a distance of at least 2N genes (90 genes, to account for possible rearrangements on the Amniota lineage), which added only 44 pairs. An integrated, high-quality list of ohnolog gene pairs was built from these five lists. Starting from their intersection consisting of 1273 pairs, we built disjoint graphs of ohnologs connected by ohnologous relationships. Replacing genes by the CARs they belong to, these graphs never involved more than four CARs of at least ten genes. To this core list, we sequentially added lists of pairs present in several or only one of the five initial lists, removing at each step the newly added pairs that would build ohnolog networks involving more than four CARs (Additional file 2: Table S5).
The order in which these added lists were considered was established using the levels of confidence of the lists C, the number of lists where the pairs were found and the 1R-2R-compatibility criterion: a list was of better quality if it created fewer ohnolog networks of more than four CARs (per added pair) when added to the current validated list. Two properties of list A were maintained along the process: (1) the two ohnologs of a pair were allowed to be located on the same CAR only if they were ≥ 90 genes away from each other; (2) the two ohnologs of the pair had to belong to the same Ensembl gene tree.
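The neighbor-support criterion used to build list A can be sketched as follows. This is an illustrative reading of the rule, not the code used in this study: we assume that "distance ≤ N genes" must hold on both CARs, and the `Candidate` record and its field names are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    tree: str    # gene tree the duplicated pair comes from
    car_a: str   # CAR carrying the first copy
    pos_a: int   # rank of the first copy along its CAR
    car_b: str   # CAR carrying the second copy
    pos_b: int   # rank of the second copy along its CAR


def is_supported(cand, candidates, n=45):
    """True if a pair from another gene tree lies within n genes on the same two CARs."""
    for other in candidates:
        if other.tree == cand.tree:
            continue
        same = (
            (other.car_a, other.car_b) == (cand.car_a, cand.car_b)
            and abs(other.pos_a - cand.pos_a) <= n
            and abs(other.pos_b - cand.pos_b) <= n
        )
        swapped = (
            (other.car_b, other.car_a) == (cand.car_a, cand.car_b)
            and abs(other.pos_b - cand.pos_a) <= n
            and abs(other.pos_a - cand.pos_b) <= n
        )
        if same or swapped:
            return True
    return False


def same_car_allowed(cand, n=45):
    """Pairs on a single CAR are kept only if the copies are at least 2n genes apart."""
    return cand.car_a != cand.car_b or abs(cand.pos_a - cand.pos_b) >= 2 * n


# Hypothetical candidates: c1 and c2 support each other, c3 is isolated.
c1 = Candidate("t1", "CAR1", 10, "CAR2", 100)
c2 = Candidate("t2", "CAR1", 40, "CAR2", 130)
c3 = Candidate("t3", "CAR1", 500, "CAR2", 100)
print([is_supported(c, [c1, c2, c3]) for c in (c1, c2, c3)])
```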
Identification of ohnolog CARs in the Amniota ancestor
Given the distribution of ohnologs on the Amniota CARs, a proportion test was performed (prop.test function in R) between each pair of CARs to estimate whether the corresponding CARs shared more ohnolog pairs than expected by chance. Bonferroni-adjusted p values of 0.05 or less were considered significant. However, p values obtained without the Bonferroni correction were also examined if CARs were included in a tetrad with at least one significant Bonferroni-adjusted p value.
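As a rough illustration of this step, the Python sketch below computes an upper-tail p value for a one-sample proportion using the normal approximation with a continuity correction, then applies a Bonferroni adjustment. It does not reproduce the exact settings of the R prop.test call used in this study, which are not detailed here; the way the expected proportion is derived and all numbers in the example are assumptions.

```python
import math


def upper_tail_prop_pvalue(successes, trials, expected_p):
    """One-sided p value that the observed proportion exceeds expected_p.

    Normal approximation with a continuity correction of 0.5.
    """
    if trials <= 0 or not 0.0 < expected_p < 1.0:
        raise ValueError("need trials > 0 and 0 < expected_p < 1")
    sd = math.sqrt(trials * expected_p * (1.0 - expected_p))
    z = (successes - trials * expected_p - 0.5) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical example: 30 of the 100 ohnolog pairs of one CAR point to a
# second CAR that would be expected to receive 10% of them by chance.
p = upper_tail_prop_pvalue(successes=30, trials=100, expected_p=0.10)
n_car_pairs = 51 * 50 // 2  # all pairs among 51 CARs
print(bonferroni(p, n_car_pairs) <= 0.05)
```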
Attribution of ancestral Olfactores genes to the 17 pre-1R chromosomes
Phylogenetic gene trees from version 69 of Ensembl were analyzed to identify 10,093 nodes (genes) in the ancestral Olfactores genome, which were assigned to the 17 pre-1R tetrads according to the distribution of their descendent ancestral Amniota genes in the Amniota CARs used to build ohnologous tetrads. In the simple case of a single disjoint tetrad, all ancestral Olfactores genes with descendants exclusively in the 4 CARs of a tetrad were included in the corresponding pre-1R chromosome. Ancestral Olfactores genes with descendants in more than one tetrad were considered ambiguous and excluded. In the more complex cases of adjacent tetrads, the same principle was applied but when CARs belonged to two adjacent tetrads, only ohnolog genes were retained. In order to maximize the number of ancestral genes in the pre-1R chromosomes, we used all possible small CARs (≤ 50 genes) that could be assembled to the CARs of the tetrads, respecting all criteria of the 23 previous assemblies (Additional file 10).
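For the simple case of disjoint tetrads, the assignment rule can be written in a few lines. The sketch below is illustrative only: it does not handle adjacent tetrads, where only ohnolog genes were retained, and the gene, CAR, and tetrad identifiers are hypothetical.

```python
def assign_to_pre1r(descendant_cars, car_to_tetrad):
    """Assign each ancestral Olfactores gene to a pre-1R chromosome (tetrad).

    descendant_cars maps a gene to the set of Amniota CARs carrying its
    descendants. A gene is assigned only if all of those CARs belong to one
    and the same tetrad; other genes are treated as ambiguous and left out.
    """
    assigned = {}
    for gene, cars in descendant_cars.items():
        if not cars or any(car not in car_to_tetrad for car in cars):
            continue
        tetrads = {car_to_tetrad[car] for car in cars}
        if len(tetrads) == 1:
            assigned[gene] = tetrads.pop()
    return assigned


# Hypothetical data: one tetrad of four CARs plus one CAR from another tetrad.
car_to_tetrad = {"CAR1": "T1", "CAR2": "T1", "CAR3": "T1", "CAR4": "T1", "CAR5": "T2"}
descendants = {
    "olf1": {"CAR1", "CAR3"},  # exclusively in tetrad T1
    "olf2": {"CAR2", "CAR5"},  # two tetrads: ambiguous
    "olf3": {"CAR4"},
}
print(assign_to_pre1r(descendants, car_to_tetrad))
```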
Human genes derived from ancestral chordate linkage groups (CLGs)
The coordinates of the 120 human chromosome segments from Tables S1 and S14 in reference  were updated from the hg18 to hg19 version of the human genome using the UCSC liftOver utility. Twenty-two segments were not converted by liftOver and were further fragmented in 100 sub-segments of equal size, which were mapped again to hg19 in order to recover as many genes as possible (Additional file 1: Figure S15).
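The fragmentation of unconverted segments can be sketched as below. The function only produces the sub-segment coordinates that would then be passed to liftOver again; the half-open coordinate convention and the example segment are assumptions made for illustration.

```python
def fragment(chrom, start, end, pieces=100):
    """Split the half-open interval [start, end) into near-equal sub-segments."""
    if end <= start:
        raise ValueError("end must be greater than start")
    length = end - start
    bounds = [start + (length * i) // pieces for i in range(pieces + 1)]
    # Drop empty pieces, which only occur when the segment is shorter than `pieces`.
    return [(chrom, a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical hg18 segment that failed to convert as a whole.
sub_segments = fragment("chr1", 1_000_000, 6_000_000)
print(len(sub_segments), sub_segments[0], sub_segments[-1])
```

Sub-segments that still fail to convert would simply be dropped, which is why the text speaks of recovering "as many genes as possible".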
Gene Ontology (GO) analysis
The 2973 Amniota ohnolog families each contain from 2 to 11 ohnologs, depending on gene loss and duplications that occurred between the 1R and the ancestor of amniotes. To perform the GO analysis, we identified families containing a maximum of 4 ohnologs and where each ohnolog was located on a different CAR (1516 pairs, 502 triplets, 172 quartets). Ensembl gene trees were then used to identify the human descendants of each ohnolog, if any exist. Lists of human genes were submitted to the GOrilla server , comparing each ohnolog list against the rest of the genome. After selecting GO terms enriched with an FDR < 10⁻⁵, terms were ranked by decreasing fold enrichment and the first 30 GO terms were analyzed and reported in Additional file 9.
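The final filtering and ranking step can be expressed compactly. In the sketch below, each enrichment result is a dictionary whose field names (`term`, `fdr`, `enrichment`) are hypothetical and do not correspond to the actual GOrilla output format; the example values are invented.

```python
def top_go_terms(results, fdr_cutoff=1e-5, n_top=30):
    """Keep terms with FDR below the cutoff, ranked by decreasing fold enrichment."""
    kept = [r for r in results if r["fdr"] < fdr_cutoff]
    return sorted(kept, key=lambda r: r["enrichment"], reverse=True)[:n_top]


# Invented example records.
example = [
    {"term": "synaptic transmission", "fdr": 2e-9, "enrichment": 3.1},
    {"term": "muscle contraction", "fdr": 5e-7, "enrichment": 4.2},
    {"term": "protein transport", "fdr": 3e-3, "enrichment": 6.0},
]
for record in top_go_terms(example):
    print(record["term"], record["enrichment"])
```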
We wish to thank Yann Guiguen for commenting on the manuscript, Pierre Vincens for the coordination of computing resources, and Lucas Tittmann for the assistance with statistics.
This work was supported by grants from the French Government and implemented by ANR (ANR-10-BINF-01-03 Ancestrome, ANR–10–LABX–54 MEMOLIFE and ANR–10–IDEX–0001–02 PSL* Research University).
Availability of data and materials
All data generated during this study are included in this published article. In addition, a dedicated Genomicus server (http://www.genomicus.biologie.ens.fr/genomicus-69.10/) provides graphical access to the data, which are also available as plain text at https://www.ebi.ac.uk/biostudies/studies/S-BSST199.
CS designed the experiments, performed the analyses, and helped write the manuscript. AL performed the ancestral Amniota genome reconstruction, the GO analysis, contributed to the comparison with the lamprey genome and built the Genomicus website. C. Bon built the ohnolog list A. C Berthelot helped write the manuscript. HRC designed the study, designed the experiments, performed the analyses, and wrote the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Innan H, Kondrashov F. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 2010;11:97–108.View ArticleGoogle Scholar
- Conrad B, Antonorakis, SE. Gene duplication: a drive for phenotypic diversity and cause of human disease. Annu Rev Genomics Hum Genet. 2007;8:17–35.View ArticleGoogle Scholar
- Van de Peer Y, Maere S, Meyer A. The evolutionary significance of ancient genome duplications. Nat Rev Genet. 2009;10:725–32.
- Ohno S. Evolution by gene duplication. London: Allen and Unwin; 1970.
- Dehal P, Boore JL. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 2005;3:e314.
- Nakatani Y, Takeda H, Kohara Y, Morishita S. Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Res. 2007;17:1254–65.
- Putnam NH, Butts T, Ferrier DEK, Furlong RF, Hellsten U, Kawashima T, Robinson-Rechavi M, Shoguchi E, Terry A, Yu J-K, et al. The amphioxus genome and the evolution of the chordate karyotype. Nature. 2008;453:1064–71.
- Makino T, McLysaght A. Ohnologs in the human genome are dosage balanced and frequently associated with disease. Proc Natl Acad Sci. 2010;107:9270–4.
- Singh PP, Arora J, Isambert H. Identification of ohnolog genes originating from whole genome duplication in early vertebrates, based on synteny comparison across multiple genomes. PLoS Comput Biol. 2015;11:e1004394.
- Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang YL, et al. Zebrafish hox clusters and vertebrate genome evolution. Science. 1998;282:1711–4.
- Blomme T, Vandepoele K, De Bodt S, Simillion C, Maere S, Van de Peer Y. The gain and loss of genes during 600 million years of vertebrate evolution. Genome Biol. 2006;7:R43.
- Singh PP, Affeldt S, Cascone I, Selimoglu R, Camonis J, Isambert H. On the expansion of “dangerous” gene repertoires by whole-genome duplications in early vertebrates. Cell Rep. 2012;2:1387–98.
- McLysaght A, Makino T, Grayton HM, Tropeano M, Mitchell KJ, Vassos E, Collier DA. Ohnologs are overrepresented in pathogenic copy number mutations. Proc Natl Acad Sci U S A. 2014;111:361–6.
- Rice AM, McLysaght A. Dosage sensitivity is a major determinant of human copy number variant pathogenicity. Nat Commun. 2017;8:14366.
- Makino T, McLysaght A, Kawata M. Genome-wide deserts for copy number variation in vertebrates. Nat Commun. 2013;4:2283.
- Smith JJ, Keinath MC. The sea lamprey meiotic map improves resolution of ancient vertebrate genome duplications. Genome Res. 2015;25:1081–90.
- Smith JJ, Timoshevskaya N, Ye C, Holt C, Keinath MC, Parker HJ, Cook ME, Hess JE, Narum SR, Lamanna F, et al. The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution. Nat Genet. 2018;50:270–7.
- Berthelot C, Muffato M, Abecassis J, Roest Crollius H. The 3D organization of chromatin explains evolutionary fragile genomic regions. Cell Rep. 2015;10:1913–24.
- Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, Ahsan B, Yamada T, Nagayasu Y, Doi K, Kasai Y, et al. The medaka draft genome and insights into vertebrate genome evolution. Nature. 2007;447:714–9.
- Jaillon O, Aury J-M, Brunet F, Petit J-L, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004;431:946–57.
- Kellis M, Birren BW, Lander ES. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 2004;428:617–24.
- Louis A, Roest Crollius H, Robinson-Rechavi M. How much does the amphioxus genome represent the ancestor of chordates? Brief Funct Genomics. 2012;11:89–95.
- Nguyen NTT, Vincens P, Roest Crollius H, Louis A. Genomicus 2018: karyotype evolutionary trees and on-the-fly synteny computing. Nucleic Acids Res. 2017;46:D816–D822.
- Krumlauf R. Hox genes in vertebrate development. Cell. 1994;78:191–201.
- Danchin EG, Pontarotti P. Towards the reconstruction of the bilaterian ancestral pre-MHC region. Trends Genet. 2004;20:587–91.
- Wotton KR, Shimeld SM. Comparative genomics of vertebrate Fox cluster loci. BMC Genomics. 2006;7:271.
- Morison IM, Reeve AE. A catalogue of imprinted genes and parent-of-origin effects in humans and animals. Hum Mol Genet. 1998;7:1599–609.
- Dunzinger U, Haaf T, Zechner U. Conserved synteny of mammalian imprinted genes in chicken, frog, and fish genomes. Cytogenet Genome Res. 2007;117:78–85.
- Renfree MB, Hore TA, Shaw G, Graves JA, Pask AJ. Evolution of genomic imprinting: insights from marsupials and monotremes. Annu Rev Genomics Hum Genet. 2009;10:241–62.
- Muffato M, Roest Crollius H. Paleogenomics in vertebrates, or the recovery of lost genomes from the mist of time. Bioessays. 2008;30:122–34.
- Nakatani Y, McLysaght A. Genomes as documents of evolutionary history: a probabilistic macrosynteny model for the reconstruction of ancestral genomes. Bioinformatics. 2017;33:i369–78.
- Braasch I, Gehrke AR, Smith JJ, Kawasaki K, Manousaki T, Pasquier J, Amores A, Desvignes T, Batzel P, Catchen J, et al. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nat Genet. 2016;48:427–37.
- Ferguson-Smith MA, Trifonov V. Mammalian karyotype evolution. Nat Rev Genet. 2007;8:950–62.
- Heimberg AM, Cowper-Sal-lari R, Semon M, Donoghue PC, Peterson KJ. microRNAs reveal the interrelationships of hagfish, lampreys, and gnathostomes and the nature of the ancestral vertebrate. Proc Natl Acad Sci U S A. 2010;107:19379–83.
- Furlong RF, Holland PW. Were vertebrates octoploid? Philos Trans R Soc Lond Ser B Biol Sci. 2002;357:531–44.
- Lynch VJ, Wagner GP. Multiple chromosomal rearrangements structured the ancestral vertebrate Hox-bearing protochromosomes. PLoS Genet. 2009;5:e1000349.
- Makino T, Hokamp K, McLysaght A. The complex relationship of gene duplication and essentiality. Trends Genet. 2009;25:152–5.
- Singh PP, Affeldt S, Malaguti G, Isambert H. Human dominant disease genes are enriched in paralogs originating from whole genome duplication. PLoS Comput Biol. 2014;10:e1003754.
- Brunet FG, Roest Crollius H, Paris M, Aury J-M, Gibert P, Jaillon O, Laudet V, Robinson-Rechavi M. Gene loss and evolutionary rates following whole-genome duplication in teleost fishes. Mol Biol Evol. 2006;23:1808–16.
- Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009;19:327–35.
- Peres A, Roest Crollius H. Improving duplicated nodes position in vertebrate gene trees. BMC Bioinformatics. 2015;16(Suppl. 3):A9.
- Sacerdot C, Louis A, Bon C, Berthelot C, Roest Crollius H. Chromosome evolution at the origin of the ancestral vertebrate genome - dataset. https://www.ebi.ac.uk/biostudies/studies/S-BSST199. 2018.
- Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 2009;10:48.
- Louis A, Sacerdot C, Roest Crollius H. Chromosome evolution at the origin of the ancestral vertebrate genome - Genomicus server. http://www.genomicus.biologie.ens.fr/genomicus-69.10/. 2018.
- Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–45.