Reconstruction of avian ancestral karyotypes reveals differences in the evolutionary history of macro- and microchromosomes

Background: Reconstruction of ancestral karyotypes is critical for our understanding of genome evolution, allowing for the identification of the gross changes that shaped extant genomes. The identification of such changes and their time of occurrence can shed light on the biology of each species, clade and their evolutionary history. However, this is impeded by both the fragmented nature of the majority of genome assemblies and the limitations of the available software to work with them. These limitations are particularly apparent in birds, with only 10 chromosome-level assemblies reported thus far. Algorithmic approaches applied to fragmented genome assemblies can nonetheless help define patterns of chromosomal change in defined taxonomic groups.

Results: Here, we make use of the DESCHRAMBLER algorithm to perform the first large-scale study of ancestral chromosome structure and evolution in birds. This algorithm allows us to reconstruct the overall genome structure of 14 key nodes of avian evolution from the Avian ancestor to the ancestor of the Estrildidae, Thraupidae and Fringillidae families.

Conclusions: Analysis of these reconstructions provides important insights into the variability of rearrangement rates during avian evolution and allows the detection of patterns related to the chromosome distribution of evolutionary breakpoint regions. Moreover, the inclusion of microchromosomes in our reconstructions allows us to provide novel insights into the evolution of these avian chromosomes, specifically.

Electronic supplementary material: The online version of this article (10.1186/s13059-018-1544-8) contains supplementary material, which is available to authorized users.


Selection of reference and descendant genomes
The set of genomes used in the ancestral chromosome reconstructions included 27 avian genomes (seven chromosome- and 20 scaffold-level assemblies) and four outgroup genomes (two chromosome- and two scaffold-level assemblies: three non-avian reptiles and one mammal; Table S1). These genomes were selected according to their assembly continuity (N50 > 2 Mbp) and alignment coverage of the reference genome (zebra finch genome coverage >96%). On the one hand, assembly continuity is a critical selection parameter because: a) the use of highly fragmented descendant genomes reduces the support for predicted ancestral adjacencies and results in fragmented ancestral karyotype reconstructions, and b) increased genome continuity increases the chances of detecting the evolutionary breakpoint regions (EBRs) that flank genome rearrangements, as EBRs located between scaffolds in the extant species assemblies will be missed or will lead to misassignment of detected EBRs on the phylogenetic tree. On the other hand, high alignment coverage of the reference genome ensures a more complete reconstruction of ancestral karyotypes. The final set of avian species included in our reconstructions represents 15 out of 37 avian orders, a comprehensive sampling of the avian phylogenetic tree, and allowed the reconstruction of 14 ancestral chromosome structures starting with the Avian ancestor and leading to the zebra finch.
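The two selection criteria above can be illustrated with a minimal sketch. It assumes that N50 is computed in the usual way (the scaffold length at which the cumulative sum of length-sorted scaffolds reaches half of the assembly size) and that alignment coverage is available as a fraction; the function names and the example scaffold lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths (same unit as the input)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


def passes_selection(scaffold_lengths_bp, ref_coverage,
                     min_n50_bp=2_000_000, min_coverage=0.96):
    """Apply the thresholds stated in the text: N50 > 2 Mbp, coverage > 96%.

    ref_coverage is the fraction of the reference (zebra finch) genome
    covered by the alignment; how it is measured is outside this sketch.
    """
    return n50(scaffold_lengths_bp) > min_n50_bp and ref_coverage > min_coverage


# Hypothetical assembly: three scaffolds, 97% reference coverage.
example_ok = passes_selection([3_000_000, 2_500_000, 500_000], 0.97)
```

In this sketch an assembly is rejected as soon as either threshold fails, which mirrors the text's treatment of continuity and coverage as joint requirements.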

Selection of resolution for syntenic fragment detection and RACF reconstructions
We first performed the reconstruction of the Neognathae ancestor chromosome structure at three syntenic fragment (SF) resolutions (100, 300 and 500 Kbp) to set the minimum length of the SFs to be included in the reconstructions. We selected the Neognathae ancestor for this experiment as both ingroup and avian outgroup genome alignments covered >94% of the zebra finch genome.
To be included in the RACFs, an SF needs to be present in at least one of the outgroup genomes; therefore, genomes with high-coverage alignments to the reference genome result in more complete RACFs. This test also aimed to establish the optimal resolution for avian ancestral genome reconstructions. We found that the number of RACFs obtained at 100 Kbp SF resolution was the lowest (N=62) and the reference genome coverage was the highest (79%; Table S2). At 300 Kbp resolution, RACFs covered ~46% of the reference genome and the number of reconstructed RACFs was higher (N=80; Table S2). Lastly, at 500 Kbp resolution, 64 RACFs were reconstructed but they covered just ~31% of the reference genome (Table S2). To minimise the fragmentation of the reconstructed ancestral genomes and, at the same time, maximise the coverage of the reference genome, the 100 Kbp SF resolution was used for the reconstructions.
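The choice of resolution can be restated as a small comparison. The sketch below uses the RACF counts and coverage values reported above (the 300 and 500 Kbp coverages are approximate in the text); the selection rule itself, picking a resolution only if it has both the fewest RACFs and the highest coverage, is an illustrative assumption rather than a documented procedure, and `best_resolution` is a hypothetical helper.

```python
# RACF counts and reference genome coverage per SF resolution (Kbp),
# as reported in the text (Table S2); ~46% and ~31% are approximate.
resolutions = {
    100: {"racfs": 62, "coverage": 0.79},
    300: {"racfs": 80, "coverage": 0.46},
    500: {"racfs": 64, "coverage": 0.31},
}


def best_resolution(table):
    """Return the resolution that is best on both criteria, or None.

    'Best' here means fewest RACFs (least fragmentation) and highest
    reference genome coverage at the same time.
    """
    fewest = min(table, key=lambda r: table[r]["racfs"])
    widest = max(table, key=lambda r: table[r]["coverage"])
    return fewest if fewest == widest else None


chosen = best_resolution(resolutions)
```

With these values the 100 Kbp resolution wins on both criteria, so no trade-off between fragmentation and coverage has to be resolved.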

EBR distribution in Avian ancestor chromosomes
To test if EBRs were distributed uniformly across the ancestral avian chromosomes, we calculated the difference between the number of observed and expected EBRs for each Avian ancestor chromosome. The expected number of EBRs per chromosome was calculated by multiplying the length of the chromosome (in Mbp) by the genome-wide rate of EBRs. The latter was calculated by dividing the total number of detected EBRs by the total length of the reconstructed Avian ancestor genome.
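The expected-count calculation described above can be written out as a short sketch. The chromosome names, lengths and EBR counts below are hypothetical placeholders chosen for easy arithmetic; they are not values from the study.

```python
def expected_ebrs(lengths_mbp, observed):
    """Compare observed EBR counts with a uniform expectation.

    lengths_mbp: chromosome name -> length in Mbp
    observed:    chromosome name -> number of detected EBRs
    Returns chromosome name -> dict with 'expected' and 'difference'
    (observed minus expected).
    """
    total_length = sum(lengths_mbp.values())
    total_ebrs = sum(observed.values())
    rate = total_ebrs / total_length  # genome-wide EBRs per Mbp
    result = {}
    for chrom, length in lengths_mbp.items():
        expected = length * rate
        result[chrom] = {
            "expected": expected,
            "difference": observed[chrom] - expected,
        }
    return result


# Hypothetical ancestor with three chromosomes.
lengths = {"chrA": 100.0, "chrB": 50.0, "chrC": 50.0}
counts = {"chrA": 10, "chrB": 20, "chrC": 10}
table = expected_ebrs(lengths, counts)
```

Because the genome-wide rate is derived from the same totals, the differences sum to zero across chromosomes: a chromosome with a negative difference has fewer EBRs than its length alone would predict.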
We observed that all Avian ancestor chromosomes with an EBR density significantly lower than average also possessed fewer EBRs than would be expected if the EBRs were distributed uniformly across the genome (FDR-corrected p-value <0.02; Table S12). We noted the same trend for the chromosomes with an EBR density higher than average; that is, these chromosomes contained a significantly higher number of EBRs than would be expected from a uniform EBR distribution along the genome (FDR-corrected p-value <0.03; Table S12).
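The text reports FDR-corrected p-values but does not name the correction procedure or the underlying test. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, a common choice for FDR control; the per-chromosome p-values are hypothetical and `benjamini_hochberg` is not a function from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the study's FDR correction is not specified, so this
    procedure is used purely as an example.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four chromosomes.
raw = [0.01, 0.04, 0.03, 0.005]
corrected = benjamini_hochberg(raw)
```

Chromosomes would then be reported as significantly enriched or depleted in EBRs when their adjusted p-value falls below the chosen threshold.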