- Open Access
Raptor genomes reveal evolutionary signatures of predatory and nocturnal lifestyles
Genome Biology volume 20, Article number: 181 (2019)
Birds of prey (raptors) are dominant apex predators in terrestrial communities, with hawks (Accipitriformes) and falcons (Falconiformes) hunting by day and owls (Strigiformes) hunting by night.
Here, we report new genomes and transcriptomes for 20 species of birds, including 16 species of birds of prey, and high-quality reference genomes for the Eurasian eagle-owl (Bubo bubo), oriental scops owl (Otus sunia), eastern buzzard (Buteo japonicus), and common kestrel (Falco tinnunculus). Our extensive genomic analysis and comparisons with non-raptor genomes identify common molecular signatures that underpin anatomical structure and sensory, muscle, circulatory, and respiratory systems related to a predatory lifestyle. Compared with diurnal birds, owls exhibit striking adaptations to the nocturnal environment, including functional trade-offs in the sensory systems, such as loss of color vision genes and selection for enhancement of nocturnal vision and other sensory systems that are convergent with other nocturnal avian orders. Additionally, we find that a suite of genes associated with vision and circadian rhythm are differentially expressed in blood tissue between nocturnal and diurnal raptors, possibly indicating adaptive expression change during the transition to nocturnality.
Overall, raptor genomes show genomic signatures associated with the origin and maintenance of several specialized physiological and morphological features essential to be apex predators.
Birds of prey, also known as raptors, are key apex predators in nearly every terrestrial biotic community. Species in this guild comprise a non-monophyletic set of three orders within the core landbird clade, and recent large-scale phylogenomic studies have led to the suggestion that the common ancestor of this clade may have been an apex predator . There are three main orders of birds of prey: Strigiformes (true and barn owls), Falconiformes (falcons and caracaras), and Accipitriformes (eagles, buzzards, hawks, kites, and vultures). Species in each of these three raptor clades are obligate predators with adaptations for hunting, killing, and/or eating meat [2, 3]. Additionally, the common ancestor of owls evolved nocturnality, and most extant owl species are nocturnal, a habit they share with two other avian orders for which we have genome sequences (Caprimulgiformes and Apterygiformes). These independent transitions in lifestyle provide an opportunity to test for patterns of genome evolution that are linked with being raptorial and nocturnal, respectively [3,4,5].
Genomes have been published for more than 50 avian species, including nine birds of prey (peregrine and saker falcons, bald, white-tailed, and golden eagles, turkey vulture, barn owl, northern spotted owl, and burrowing owl) [3, 6,7,8,9]. However, the barn owl, white-tailed eagle, and turkey vulture genomes were assembled at low quality , and a detailed comparative evolutionary analysis was performed only for the falcons . Here, we report new high-quality whole-genome reference sequences of four raptor species (Eurasian eagle-owl [Bubo bubo] and oriental scops owl [Otus sunia] in Strigiformes, eastern buzzard [Buteo japonicus] in Accipitriformes, and common kestrel [Falco tinnunculus] in Falconiformes) with a set of raptor whole-genome and transcriptome data, extending the genomic coverage of raptors (Fig. 1, Additional file 1: Figure S1 and Tables S1, S2, and S3). Our investigation revealed numerous genomic signatures of evolution that are shared among the three raptor orders or that appear to be associated with nocturnal adaptations of owls.
Results and discussion
Raptor genome sequencing and assembly
We applied whole-genome shotgun sequencing and de novo assembly strategies [6, 10,11,12] to build reference genomes of the four raptor species (Eurasian eagle-owl, oriental scops owl, eastern buzzard, and common kestrel). The extracted DNA samples from wild individuals were sequenced using Illumina HiSeq platforms at high coverage (> 185×) using various insert sizes of short-insert (170 bp, 500 bp, and 700 bp for the two owls and eastern buzzard, and 350 bp and 550 bp for the common kestrel) and long-mate pair libraries (2 Kb, 5 Kb, 10 Kb, and 15 Kb; Additional file 1: Tables S4 and S5). The four raptor genomes showed relatively higher levels of genomic diversity compared to the previously assembled genomes of eagles and falcons (Additional file 1: Figures S2 and S3). Therefore, we tried to assemble reference genomes of the four raptor species using both SOAPdenove2  and Platanus  software in various conditions (Additional file 1: Tables S6, S7, and S8). Protein-coding genes (~ 16,000 to 18,000 genes) for these assemblies were predicted by combining de novo and homologous gene prediction methods with whole blood transcriptome data (Additional file 1: Table S9). By assessing assembly statistics, transcript mapping results, and single-copy ortholog mapping results (Additional file 1: Tables S7, S8, and S10), we obtained the final reference genomes for the four raptor species at a high quality, resulting in scaffold N50 sizes from 7.49 to 29.92 Mb; we defined as high-quality genome if the scaffold N50 length is > 1 Mb and as low-quality genome if scaffold N50 length is < 1 Mb, similar to the previous studies [1, 6] (Additional file 1: Table S11). Roughly 9.2% of the raptor genomes were predicted as transposable elements (Additional file 1: Table S12), consistent with the composition of other avian genomes . Additionally, we sequenced the whole genome and blood transcriptome from another 12 raptors (five owls, six accipitrids, and a falconid) and four non-raptor birds (Additional file 1: Tables S11, S13, S14, and S15), most of which were sequenced for the first time. The whole-genome sequences (WGS) of the 12 additional raptors and four non-raptor birds were not assembled, but aligned to the reference genomes of the closely related species for comparison purposes to remove possible bias derived from a small number of raptor/nocturnal species genomes; the whole genome sequenced but not assembled genomes were hereinafter referred to as WGS.
Evolutionary analysis of raptors compared to non-raptor birds
To identify the genetic basis of predation and nocturnality in raptors, we performed in-depth comparative evolutionary analyses for 25 birds of prey (including 10 nocturnal owls and 15 diurnal raptors) and 23 non-raptor bird species (including nocturnal brown kiwi  and chuck-will’s-widow , and other avian representatives genome assembled at a high quality [13,14,15,16] (Additional file 1: Figure S4 and Tables S1, S2 and S11). First, gene family clusters were constructed using a total of 25 assembled avian genomes (both 23 high- and 2 low-quality genomes; Additional file 1: Tables S11 and S16). Of the 29,115 orthologous gene families found in the 25 avian genomes, 12,662 were found in the all raptor genomes (Fig. 2a and Additional file 1: Figure S4). Based on the comparison of orthologous gene families among the only 23 high-quality avian genomes, 136 expanded and 559 contracted, 69 expanded and 1282 contracted, and 26 expanded and 554 contracted gene families were found in the common ancestors of Strigiformes, Accipitriformes, and Falconiformes, respectively, compared with the common ancestors of each raptor order and its sister group (Fig. 2b). Birds have evolved to employ many different strategies to obtain food, and raptors are specialized for hunting [2, 3, 7]. Several molecular signatures were shared by the three raptor orders, and the ancestral branches of these orders each showed an expansion of gene families associated with sensory perception of sound, regulation of anatomical structure morphogenesis, postsynaptic density and specialization, and learning functions (P < 0.05, Fisher’s exact test; Additional file 1: Table S17).
To further examine the shared evolutionary adaptations related to avian predatory lifestyles, we identified selection signatures shared by the three orders of birds of prey compared to the non-raptor birds (both high- and low-quality genomes) at the gene sequence level, which possibly reflects their shared requirement for highly developed sensory systems, efficient circulatory and respiratory systems, and exceptional flight capabilities necessary to capture prey [2,3,4,5, 7, 8]. Based upon dN/dS ratio calculation [17, 18], only RHCE and CENPQ genes were commonly found as positively selected genes (PSGs) in the three raptor ancestral branches of the Strigiformes, Accipitriformes, and Falconiformes (Additional file 2: Datasheets S1, S2, and S3). In addition, we identified three genes as positively selected in the ancestral branches of two raptor orders (SFTPA1 in the Strigiformes and Falconiformes; TFF2 and PARL in the Strigiformes and Accipitriformes). A lung surfactant protein encoded by SFTPA1 plays an essential role in the defense against respiratory pathogens and normal respiration . TFF2 gene encodes a protein that mediates gastric wound repair and inhibits gastric acid secretion . Finally, we found that 148 genes showed accelerated dN/dS in the raptor ancestral branches (Additional file 1: Table S18). Of these, SLC24A1, NDUFS3, and PPARA encode proteins that play roles in visual transduction cascade, mitochondrial membrane respiratory chain, and lipid metabolism, respectively [19, 21, 22].
It has been suggested that genes with elevated frequencies of guanine-cytosine at the third codon position (GC3) are more adaptable to external stresses, through providing more targets for de novo methylation that affect the variability of gene expression . Therefore, we analyzed the GC3 content in the three raptor orders, and we found that regulation of nervous system development, central nervous system neuron differentiation, and locomotion-associated genes showed high GC3 bias (Fig. 2c, Additional file 1: Figure S5, Table S19, and Additional file 2: Datasheet S6). In the highly conserved genomic regions (HCRs) among species belonging to the same order, 79 functional categories were commonly enriched in the three raptor orders (Additional file 1: Tables S20, S21, S22, S23, S24, S25, S26, S27, S28, and S29). Among these categories, eye, sensory organ, muscle organ, epithelium, and limb development functions were commonly conserved in the three raptor orders, but not in Passeriformes (a control avian order in this analysis), suggesting that those functions are important in raptors for their predatory lifestyle.
Evolutionary analysis of nocturnal birds compared to diurnal birds
Since several avian clades have adapted to a nocturnal lifestyle independently, the comparative method can be used to identify genes underlying convergent phenotypes that are associated with nocturnal adaptation . When comparing the gene families among the 23 high-quality avian genomes, two nocturnal bird groups (the ancestral branch of owls and brown kiwi) shared an expansion of gene families associated with synapse organization, sensory perception of chemical stimulus and sensory perception of smell functions (P < 0.05; Additional file 1: Tables S30 and S31). As expected, gene families associated with vision were commonly contracted in the nocturnal birds, when comparing gene family sizes between the extant species (Additional file 1: Tables S32 and S33). Specifically, gene loss of the violet/ultraviolet-sensitive opsin SWS1 (OPN1SW) was found in all of the nocturnal bird genomes, as previously reported [4, 24].
Compared to the diurnal birds, the nocturnal birds (including two low-quality nocturnal species genomes: barn owl and chuck-will’s-widow) also showed common selection signatures likely linked to their adaptation to a nocturnal environment. A total of 14 PSGs were shared among the three nocturnal groups, and 98 PSGs were shared by at least two nocturnal bird groups (Additional file 2: Datasheets S1, S4, and S5). The shared PSGs were overrepresented in the detection of mechanical stimulus involved in sensory perception of sound, wound healing, and skin development functions (Additional file 1: Table S34), although the enrichment did not pass the false discovery rate criterion. Interestingly, at least one of two wound healing-associated genes (TFF2 and COL3A1) [25, 26] was found to be positively selected in the nocturnal birds. Additionally, six genes (RHO, BEST1, PDE6B, RPE65, OPN4-1, and RRH) involved in light detection, and RDH8 that is involved in retinol (vitamin A1) metabolism [19, 27], showed accelerated dN/dS in the nocturnal birds (Additional file 1: Table S34). It is well-known that rhodopsin encoded by RHO is a light-sensitive receptor and thus enables vision in low-light conditions . Notably, RHO also showed a high level of GC3 biases in the nocturnal birds (Additional file 2: Datasheet S7). Furthermore, RPE65 encodes a protein that is a component of the vitamin A visual cycle of the retina, while PDE6B plays a key role in the phototransduction cascade and mutations in this gene result in congenital stationary night blindness. In addition, melanopsin encoded by OPN4-1 is a photoreceptor required for regulation of circadian rhythm [19, 27]. We also found that only SLC51A gene possesses specific amino acid sequences to the nocturnal birds (Additional file 1: Figure S6). SLC51A, also known as OST-α, is essential for intestinal bile acid transport , and it has been suggested that the bile acids affect the circadian rhythms by regulating the expression level of circadian clock-associated gene families [30, 31]. Interestingly, burrowing owl (Athene cunicularia), which is known as one of diurnal/crepuscular owls, showed a different sequence alteration pattern from the other nocturnal or diurnal birds in SLC51A locus (Additional file 1: Figure S6).
Sensory adaptations to nocturnal environment
Modifications of the major sensory systems (not only vision, but also olfaction, hearing, and circadian rhythm) are among the most common changes that occur when shifting from a diurnal to a nocturnal lifestyle . Analysis of the major sensory systems in the nocturnal bird genomes (owls, chuck-will’s-widow, and brown kiwi) revealed evidence of highly developed senses for adaptation to nocturnality. First, vision system-associated genes showed significantly accelerated dN/dS in the three nocturnal birds compared to diurnal birds (P < 0.05; Mann-Whitney U test; Fig. 3). Owls and chuck-will’s-widow (Caprimulgiformes) had the highest acceleration in vision-related genes. The total number of functional olfactory receptors (ORs) was not larger in the nocturnal birds than in the diurnal birds. However, the numbers of γ-clade ORs in the nocturnal birds and γ-c-clade ORs in the owls were significantly larger than others (after excluding two outlier species  showing extensive γ-c-clade OR expansion, chicken and zebra finch; P < 0.05, Mann-Whitney U test; Fig. 3 and Additional file 1: Table S36). The diversity of ORs is thought to be related to a detection range of odors , and we found that the diversity of α-clade ORs was significantly higher in the nocturnal birds (Additional file 1: Table S37). Additionally, the diversity in the γ-c-clade ORs was much higher in the owls and brown kiwi (Apterygiformes) compared to their sister groups (downy woodpecker in Piciformes and common ostrich in Struthioniformes, respectively), suggesting that increased olfactory abilities evolved repeatedly under nocturnal conditions [5, 12]. Hearing system-associated genes showed a relatively high level of dN/dS ratio in the owls and brown kiwi; interestingly, two vocal learning species (budgerigar in Psittaciformes and Anna’s hummingbird in Apodiformes) had the first and third most accelerated dN/dS for hearing-associated genes, which may be linked with their highly developed cognitive abilities [32, 34]. Circadian rhythm-associated genes showed the first and second largest acceleration in the owls and brown kiwi, but the lowest in chuck-will’s-widow, suggesting that these independent instances of adaptation to nocturnality occurred by different mechanisms . Additionally, we found that 33 hearing system- and 18 circadian rhythm-associated genes showed accelerated dN/dS in the three nocturnal bird groups (Additional file 1: Table S38). Considered together, these results suggest that selection to augment nocturnal vision and other sensory systems predictably compensates for loss of color vision, supporting a functional trade-off of sensory systems in nocturnal birds [4, 5, 12].
Changes in gene expression are thought to underlie many of the phenotypic differences between species . Therefore, we carried out cross-species comparison of gene expression among the blood transcriptomes from 13 raptors (five owls, four accipitrids, and four falconids) and five non-raptor birds (Additional file 1: Tables S11 and S15). We found that several vision-associated genes [19, 27] were differentially expressed in the owls (P < 0.05, moderated t test; Additional file 1: Figures S7 and S8, and Additional file 2: Datasheets S8, S9, S10, and S11). For example, PDCL (lowly expressed) and WFS1 (highly expressed) genes were differentially expressed specific to the owls. Interestingly, we could also find several circadian rhythm-related genes that were differentially expressed between the nocturnal and diurnal raptors. Three circadian rhythm-associated genes (ATF4, PER3, and NRIP1) were lowly expressed and two genes (BTBD9 and SETX) were highly expressed in the owls, whereas ATF4 and SIRT1 in the falconids and NRIP1 in the accipitrids were highly expressed. These results likely indicate that selectively driven expression switches contributed to nocturnal adaptation of owls . However, the comparison of gene expressions based on blood transcriptome may not represent gene expression profiles of vision system, and therefore, further studies are needed to confirm our results (e.g., analyzing expression profiles of retinal tissue and visual brain regions).
Our study provides whole-genome assemblies of Eurasian eagle-owl, oriental scops owl, eastern buzzard, and common kestrel, as well as a suite of whole-genome sequencing and transcriptome data from birds of prey. This is the first in-depth genomics study comparing the three raptor orders, and we identified a number of shared molecular adaptations associated with a predatory lifestyle. Furthermore, compared with diurnal birds, owls and other nocturnal birds showed distinct genomic features, especially in sensory systems. At the same time, it is important to note that genome assembly based on short-read sequencing methods could possess incomplete genomic regions, thus causing an erroneous result in comparative evolutionary analyses [36, 37]. Therefore, the candidate genes identified in this study need to be further confirmed with additional genomic data, and functional studies of candidate genes will be needed to understand the molecular mechanisms of adaptation. In overall, these results provide a genome-wide description and gene candidates of adaptations that have allowed each of these three raptor groups to evolve into diverse, ecologically dominant apex predators.
Sample and genome sequencing
All blood samples used for genome and transcriptome sequencing were collected from individuals being euthanized due to poor survival during wound treatment of rescued animals, except blood samples of A. flammeus, O. semitorques, and P. ptilorhynchus that were obtained from the live individuals during a medical check-up at the wildlife rescue center. Muscle tissue samples collected in 2017 were obtained from the fresh carcasses (Additional file 1: Table S3).
To build reference genome assemblies of the four raptor species (Eurasian eagle-owl, oriental scops owl, eastern buzzard, and common kestrel), we constructed 11 genomic libraries with various insert sizes (Illumina short-insert and long-mate pair libraries) for each species, according to the manufacturer’s protocol. The libraries were sequenced using Illumina HiSeq platforms (Additional file 1: Table S4). The remaining 12 raptor and four non-raptor bird samples were sequenced using Illumina HiSeq platforms with short-insert libraries (Additional file 1: Table S11c). Blood transcriptomes of ten raptors and four non-raptor birds were sequenced using Illumina HiSeq platforms according to the manufacturer’s instructions (Additional file 1: Table S11d).
Genome assembly and annotation
To assemble the raptor genomes, PCR duplicated, sequencing and junction adaptor contaminated, and low-quality (Q20) reads were filtered out. The short-insert and long-mate library reads were trimmed into 90 bp and 50 bp, respectively, to remove low-quality bases at the ends of the reads (Additional file 1: Table S5). As the four raptor genomes showed relatively higher levels of genomic diversity (Additional file 1: Figures S2 and S3), we assembled reference genomes of the four raptor species using both SOAPdenove2  and Platanus  software; the Platanus assembler is more efficient for highly heterozygous genomes . When performing the SOAPdenovo2 assembler, we applied various K-mer values (33, 43, 53, and 63) to obtain fragments with long contiguity. To reduce the number of gaps in the scaffolds, we closed the gaps using the short-insert library reads in two iterations. To correct base-pair-level errors, we performed two iterations of aligning the short-insert library reads to the gap-closed scaffolds using BWA-MEM  and calling variants using SAMtools . In this process, homozygous variants were assumed as erroneous sequences from the assembly process, and thus substituted for the correction purpose (Additional file 1: Table S7).
To select final high-quality reference assemblies for the four raptors, we annotated all assemblies and evaluated the quality of each assembly. We first searched the genomes for tandem repeats and transposable elements (Additional file 1: Table S9) using Tandem Repeats Finder (version 4.07b) , Repbase (version 19.03) , RepeatMasker (version 4.0.5) , RMBlast (version 2.2.28) , and RepeatModeler (version 1.0.7) . The protein-coding genes were predicted by combining de novo and homology-based gene prediction methods with the blood transcriptome data for each assembly. For the homology-based gene prediction, we searched for avian protein sequences from the NCBI database using TblastN (version 2.2.26)  with an E value cutoff of 1E−5. The matched sequences were clustered using GenBlastA (version 1.0.4)  and filtered by coverage and identity of > 40% criterion. Gene models were predicted using Exonerate (version 2.2.0) . For the de novo gene prediction, AUGUSTUS (version 3.0.3)  was used with the blood transcriptome for each species. We filtered out possible pseudogenes having premature stop codons and single exon genes that were likely to be derived from retro-transposition (Additional file 1: Table S9). The assembly and gene annotation qualities were assessed by aligning independently de novo assembled transcripts using the Trinity software  and by searching for evolutionary conserved orthologs using BUSCO software  (Additional file 1: Tables S8 and S10). By considering the assembly statistics (e.g., N50 values and assembled sequence length) and the completeness of the genome assembly, final high-quality reference assemblies for the four raptors were obtained. Genome, transcriptome, and protein sequences for other comparison species were downloaded from the NCBI database. Genes with possible premature stop codons were excluded in the comparative analyses. The northern spotted owl’s genome and protein sequences were acquired from the Zenodo linked in the published paper .
Comparative evolutionary analyses
Orthologous gene families were constructed for avian genomes using the OrthoMCL 2.0.9 software (Additional file 1: Figure S4) . To estimate divergence times of the 25 avian representatives, protein sequences of the avian single-copy gene families were aligned using the MUSCLE program . The poorly aligned regions from the alignments were trimmed using the trimAl software . The divergence times were estimated using the MEGA7 program  with the phylogenetic tree topology of published previous studies [1, 6] and the TimeTree database . When we calculated the divergence times among the 23 species with high-quality reference genomes (Fig. 2b), the date of the node between chicken and rock dove was constrained to 98 million years ago (MYA), chicken and brown kiwi was constrained to 111 MYA, and common ostrich and brown kiwi was constrained to 50–105 according to the divergence times from TimeTree. To estimate divergence times among the birds of prey (Fig. 1), the date of the node between downy woodpecker and Eurasian eagle-owl constrained to 61–78 MYA and common kestrel and budgerigar was constrained to 60–80 MYA according to the divergence times from the previous studies [1, 6] and TimeTree; as the divergence times and phylogenetic topologies of the previous studies [1, 6] and TimeTree were quite different, we used the divergence times from the previous studies as the minimum and the divergence times from the TimeTree database as the maximum constraints. A gene family expansion and contraction analysis for the ancestral branches of the three bird of prey orders was conducted using the CAFÉ program  with a P < 0.05 criterion. As the gene family expansion and contraction analysis can be affected by erroneous genomic regions derived from the assembly process [36, 37], we calculated the mapping depth coverage of genes in the raptor and nocturnal bird genomes, and then filtered out genes having abnormal depth coverage (if the mapping depth coverage of genes is less than half of the average depth coverage [less than quarter of the average depth coverage for genes in sex chromosomal scaffolds] or more than twice of the average depth coverage; Additional file 1: Figure S9). The significantly different gene family sizes of the present nocturnal bird species were identified by performing the Mann-Whitney U test (P < 0.05).
To identify selection at the gene sequence level, two orthologous gene sets were compiled, as previously reported : the single-copy orthologs among avian species and representative genes from multiple-copy orthologs. The representative genes from multiple-copy orthologs were selected, if all species’ protein sequences are reciprocally best matched to a chicken protein sequence using BLASTp with an E value cutoff of 1E−5. PRANK  was used to construct multiple sequence alignments among the orthologs. The CODEML program in PAML 4.5 was used to estimate the dN/dS ratio (non-synonymous substitutions per non-synonymous site to synonymous substitutions per synonymous site) . The one-ratio model was used to estimate the general selective pressure acting among comparison species. The two-ratio model (model = 2) was used to ensure that the dN/dS ratio is the difference between foreground species (raptors and nocturnal birds, respectively) and other species. Additionally, the dN/dS ratios for each order-level branch of raptors and nocturnal birds were used to confirm if the foreground dN/dS ratio is not biased to a specific raptor and nocturnal bird order. The branch-site test was also conducted . Statistical significance was assessed using likelihood ratio tests with a conservative 10% false discovery rate criterion (Additional file 2: Datasheets S1, S2, S3, S4, and S5).
We identified target species-specific amino acid sequences . To filter out biases derived from individual-specific variants, we used all of the raptor WGS data by mapping to the Eurasian eagle-owl genome for Strigiformes, the eastern buzzard genome for Accipitriformes, and the common kestrel genome for Falconiformes. The mapping was conducted using BWA-MEM, and consensus sequences were generated using SAMtools with the default options, except the “-d 5” option (Additional file 1: Table S13). When we identified the specific amino acid sequences, protein sequences of other birds from the NCBI database were also compared. We also checked multiple sequence alignments manually to remove artifacts. To identify genetic diversity based on heterozygous SNV rates, variants were also called using Sentieon pipeline  with the default options, except the “--algo Genotyper” option (Additional file 1: Table S14). The heterozygous SNV rates were calculated by dividing the total number of heterozygous SNVs by the length of sufficiently mapped (> 5 depth) genomic regions (Additional file 1: Figure S3).
To identify HCRs in the three raptor orders and Passeriformes, we scanned genomic regions that show significantly reduced genetic variation by comparing variations of each window and whole genome as previously suggested . In the case of Passeriformes, whole-genome data of four Passeriformes species (medium ground-finch, white-throated sparrow, common canary, and collared flycatcher) were mapped to the zebra finch genome assembly, and then variants were identified using the same methods used for the three raptor orders. Genetic variation was estimated by calculating the number of different bases in the same order genomes within each 100-Kb window. P value was calculated by performing Fisher’s exact test to test whether the genetic variation of each window is significantly different from that of the whole genome. Only adjusted P values (q values)  of < 0.0001 were considered significant. As both ends of scaffolds have usually incorrect sequences and many gaps, the middle 10 Kb of each significantly different window was only considered as HCRs (Additional file 1: Table S20).
For functional enrichment tests of candidate genes, GO annotations of chicken, zebra finch, turkey, flycatcher, duck, anole lizard, and human genomes were downloaded from the Ensembl database  and used to assign the avian protein-coding genes with GO categories. A KEGG pathway was assigned using KAAS . Functional information of candidate genes was retrieved from the GO, KEGG, UniProt , and GeneCards  databases.
De novo transcriptome assembly and differentially expressed genes
The blood transcriptome data were assembled using Trinity software . Contaminated transcripts were searched for bacteria and fungi sequence from the Ensembl database using BLASTN and filtered by identity of > 95% and E value cutoff of 1E−6 criteria. Coding sequence (CDS) were predicted using TransDecoder [49, 64]. To identify differentially expressed genes, RNA reads were aligned to the reference genome (species whole genome assembled) or the assembled transcripts (species without reference genome) using TopHat2 software . The number of reads that were mapped to orthologous genes was counted using HTSeq-0.6.1 software  and then converted into RPKM (reads per kilobase per million mapped reads) value (Additional file 1: Table S15). The RPKM values were normalized with the Trimmed Mean of M values (TMM)  correction using the R package edgeR . The significance of differential expression was calculated by the moderated t test  (ebayes function) using the R package limma (P < 0.05; Additional file 2: Datasheets S8, S9, S10, and S11) .
Sensory system-associated gene analysis
To compare the olfactory sense across avian clades, we collected a total of 215 chicken olfactory receptor (OR) gene sequences (functional only) from a previously published paper . These ORs were then searched against the 25 avian species genomes using TblastN with default parameters. For OR candidates lacking start/stop codons, we searched 90 bp upstream to find start codons and 90 bp downstream to find stop codons. After collecting sequences for each species, the CD-HIT program  was used to remove redundant sequences with an identity cutoff of 100%. A Pfam  search against sequences using hmmer-3.1 program  with an E value cutoff of 1.0 was used to identify sequences that contained 7tm_4 domain. To assign OR clades and filter out non-OR genes, the multiple sequence alignments and phylogenetic analysis were conducted with previously clade-assigned OR and non-OR genes of human, anole lizard, and chicken  using ClustalW2 program . The remaining OR candidates were classified into three categories: (1) intact genes with normal start and stop codons and longer than 215 amino acid sequences, thus can code for seven transmembrane domains; (2) partial genes without start and/or stop codons; and (3) pseudogenes with frameshift mutations and/or premature stop codons (Additional file 1: Table S36). OR genes have evolved by multiple duplications and display a large number of pseudogenes, which makes the assembly of OR regions challenging and complicates the annotation process of OR genes [5, 12, 77, 78]. To overcome these issues, we also calculated the diversity of OR genes from the clade-assigned intact genes by Shannon entropy  using BioEdit  as previously suggested [5, 12] (Additional file 1: Table S37). Amino acid positions with above 20% of gaps were excluded, and entropy was averaged across all amino acid positions.
The vision system-associated genes were retrieved from previous studies [5, 13]. Hearing-associated genes were retrieved from the AmiGO database  using GO categories related to hearing . Circadian rhythm-related genes were retrieved from the AmiGO database using “biorhythm/circadian” as search keywords. The protein sequences with the same gene name were aligned using ClustalW2 and manually inspected one by one for quality. A total of 402 sensory system-associated genes (64 genes for vision, 219 genes for hearing, and 133 genes for circadian rhythm) shared by the brown kiwi, chuck-will’s-widow, and at least two Strigiformes were included for selection constraint (the dN/dS ratio) analyses (Additional file 1: Table S38).
Availability of data and materials
All the sequences reported in this study are deposited into the NCBI Sequence Read Archive under the accession number PRJNA431699 .
Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science. 2014;346:1320–31.
Fowler DW, Freedman EA, Scannella JB. Predatory functional morphology in raptors: interdigital variation in talon size is related to prey restraint and immobilisation technique. PLoS One. 2009;4:e7999.
Zhan X, Pan S, Wang J, Dixon A, He J, Muller MG, et al. Peregrine and saker falcon genome sequences provide insights into evolution of a predatory lifestyle. Nat Genet. 2013;45:563–6.
Wu Y, Hadly EA, Teng W, Hao Y, Liang W, Liu Y, et al. Retinal transcriptome sequencing sheds light on the adaptation to nocturnal and diurnal lifestyles in raptors. Sci Rep. 2016;6:33578.
Le Duc D, Schöneberg T. Adaptation to nocturnality - learning from avian genomes. Bioessays. 2016;38:694–703.
Zhang G, Li C, Li Q, Li B, Larkin DM, Lee C, et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014;346:1311–20.
Van Den Bussche RA, Judkins ME, Montague MJ, Warren WC. A resource of genome-wide single nucleotide polymorphisms (Snps) for the conservation and management of golden eagles. J Raptor Res. 2017;51:368–77.
Hanna ZR, Henderson JB, Wall JD, Emerling CA, Fuchs J, Runckel C, et al. Northern spotted owl (Strix occidentalis caurina) genome: divergence with the barred owl (Strix varia) and characterization of light-associated genes. Genome Biol Evol. 2017;9:2522–45.
Mueller JC, Kuhl H, Boerno S, Tella JL, Carrete M, Kempenaers B. Evolution of genomic variation in the burrowing owl in response to recent colonization of urban areas. Proc Biol Sci. 2018;285:20180206.
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1:18.
Kajitani R, Toshimoto K, Noguchi H, Toyoda A, Ogura Y, Okuno M, et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 2014;24:1384–95.
Le Duc D, Renaud G, Krishnan A, Almén MS, Huynen L, Prohaska SJ, et al. Kiwi genome provides insights into evolution of a nocturnal lifestyle. Genome Biol. 2015;16:147.
Ganapathy G, Howard JT, Ward JM, Li J, Li B, Li Y, et al. High-coverage sequencing and annotated assemblies of the budgerigar genome. Gigascience. 2014;3:11.
Warren WC, Clayton DF, Ellegren H, Arnold AP, Hillier LW, Künstner A, et al. The genome of a songbird. Nature. 2010;464:757–62.
Shapiro MD, Kronenberg Z, Li C, Domyan ET, Pan H, Campbell M, et al. Genomic diversity and evolution of the head crest in the rock pigeon. Science. 2013;339:1063–7.
International Chicken Genome Sequencing Consortium. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004;432:695–716.
Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2017;24:1586–91.
Zhang J, Nielsen R, Yang Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005;22:2472–9.
Safran M, Dalah I, Alexander J, Rosen N, Iny Stein T, Shmoish M, et al. GeneCards version 3: the human gene integrator. Database (Oxford). 2010;2010:baq020.
Engevik K, Aihara E, Matthis A, Montrose M. TFF2, CXCR4 and EGF-R mediated gastric wound repair in vitro in gastric organoids. FASEB J. 2017;31(Suppl 1):1043–8.
Saada A, Vogel RO, Hoefs SJ, van den Brand MA, Wessels HJ, Willems PH, et al. Mutations in NDUFAF3 (C3ORF60), encoding an NDUFAF4 (C6ORF66)-interacting complex I assembly protein, cause fatal neonatal mitochondrial disease. Am J Hum Genet. 2009;84:718–27.
Kersten S, Seydoux J, Peters JM, Gonzalez FJ, Desvergne B, Wahli W. Peroxisome proliferator–activated receptor α mediates the adaptive response to fasting. J Clin Invest. 1999;103:1489–98.
Tatarinova TV, Alexandrov NN, Bouck JB, Feldmann KA. GC3 biology in corn, rice, sorghum and other grasses. BMC Genomics. 2010;11:308.
Borges R, Khan I, Johnson WE, Gilbert MT, Zhang G, Jarvis ED, et al. Gene loss, adaptive evolution and the co-evolution of plumage coloration genes with opsins in birds. BMC Genomics. 2015;16:751.
Yu G, Zhang Y, Xiang Y, Jiang P, Chen Z, Lee W, et al. Cell migration-promoting and apoptosis-inhibiting activities of Bm-TFF2 require distinct structure basis. Biochem Biophys Res Commun. 2010;400:724–8.
Crane NJ, Brown TS, Evans KN, Hawksworth JS, Hussey S, Tadaki DK, et al. Monitoring the healing of combat wounds using Raman spectroscopic mapping. Wound Repair Regen. 2010;18:409–16.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–9.
Zhao H, Ru B, Teeling EC, Faulkes CG, Zhang S, Rossiter SJ. Rhodopsin molecular evolution in mammals inhabiting low light environments. PLoS One. 2009;4:e8326.
Dawson PA, Hubbert M, Haywood J, Craddock AL, Zerangue N, Christian WV, et al. The heteromeric organic solute transporter α-β, Ostα-Ostβ, is an ileal basolateral bile acid transporter. J Biol Chem. 2005;280:6960–8.
Govindarajan K, MacSharry J, Casey PG, Shanahan F, Joyce SA, Gahan CG. Unconjugated bile acids influence expression of circadian genes: a potential mechanism for microbe-host crosstalk. PLoS One. 2016;11:e0167319.
Zhang F, Duan Y, Xi L, Wei M, Shi A, Zhou Y, et al. The influences of cholecystectomy on the circadian rhythms of bile acids as well as the enterohepatic transporters and enzymes systems in mice. Chronobiol Int. 2018;35:673–90.
Khan I, Yang Z, Maldonado E, Li C, Zhang G, Gilbert MT, et al. Olfactory receptor subgenomes linked with broad ecological adaptations in Sauropsida. Mol Biol Evol. 2015;32:2832–43.
Hasin-Brumshtein Y, Lancet D, Olender T. Human olfaction: from genomic variation to phenotypic diversity. Trends Genet. 2009;25:178–84.
Emery NJ. Cognitive ornithology: the evolution of avian intelligence. Philos Trans R Soc Lond Ser B Biol Sci. 2006;361:23–43.
Brawand D, Soumillon M, Necsulea A, Julien P, Csárdi G, Harrigan P, et al. The evolution of gene expression levels in mammalian organs. Nature. 2011;478:343–8.
Alkan C, Sajjadian S, Eichler EE. Limitations of next-generation genome sequence assembly. Nat Methods. 2011;8:61–5.
Korlach J, Gedman G, Kingan SB, Chin CS, Howard JT, Audet JN, et al. De novo PacBio long-read and phased avian genome assemblies correct and add to reference genes generated with intermediate and short reads. Gigascience. 2017;6:1–6.
Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. ArXiv. 2013;1303:3997.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–80.
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–7.
Bedell JA, Korf I, Gish W. MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics. 2000;16:1040–1.
RMBlast. http://www.repeatmasker.org/RMBlast.html. Accessed 16 Aug 2016.
Abrusán G, Grundmann N, DeMester L, Makalowski W. TEclass--a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics. 2009;25:1329–30.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.
She R, Chu JS, Wang K, Pei J, Chen N. GenBlastA: enabling BLAST to identify homologous gene sequences. Genome Res. 2009;19:143–9.
Slater GS, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31.
Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:W435–9.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644–52.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–2.
Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–89.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–7.
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–3.
Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33:1870–4.
Hedges SB, Dudley J, Kumar S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics. 2006;22:2971–2.
Han MV, Thomas GW, Lugo-Martinez J, Hahn MW. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol Biol Evol. 2013;30:1987–97.
Löytynoja A, Goldman N. An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci U S A. 2005;102:10557–62.
Weber JA, Aldana R, Gallagher BD, Edwards JS. Sentieon DNA pipeline for variant detection—software-only solution, over 20× faster than GATK 3.3 with identical results. PeerJ PrePrints. 2016;4:e1672v2.
Kim S, Cho YS, Kim HM, Chung O, Kim H, Jho S, et al. Comparison of carnivore, omnivore, and herbivore mammalian genomes with a new leopard assembly. Genome Biol. 2016;17:211.
Storey JD. A direct approach to false discovery rates. J R Stat Soc Series B Stat Methodol. 2002;64:479–98.
Aken BL, Achuthan P, Akanni W, Amode MR, Bernsdorff F, Bhai J, et al. Ensembl 2017. Nucleic Acids Res. 2017;45:D635–42.
Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007;35:W182–5.
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2017;45:D158–69.
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the trinity platform for reference generation and analysis. Nat Protoc. 2013;8:1494–512.
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36.
Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–9.
Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25.
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40.
McCarthy DJ, Smyth GK. Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics. 2009;25:765–71.
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.
Steiger SS, Kuryshev VY, Stensmyr MC, Kempenaers B, Mueller JC. A comparison of reptilian and avian olfactory receptor gene repertoires: species-specific expansion of group gamma genes in birds. BMC Genomics. 2009;10:446.
Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–9.
Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44:D279–85.
Mistry J, Finn RD, Eddy SR, Bateman A, Punta M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 2013;41:e121.
Niimura Y. On the origin and evolution of vertebrate olfactory receptor genes: comparative genome analysis among 23 chordate species. Genome Biol Evol. 2009;1:34–44.
Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, et al. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2009;31:3497–500.
Niimura Y, Nei M. Evolution of olfactory receptor genes in the human genome. Proc Natl Acad Sci U S A. 2003;100:12235–40.
Newman T, Trask BJ. Complex evolution of 7E olfactory receptor genes in segmental duplications. Genome Res. 2003;13:781–93.
Shannon CE. The mathematical theory of communication. Bell Syst Tech J. 1948;27:379–423.
Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999;41:95–8.
Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009;25:288–9.
Cho YS, Jun J, Kim JA, Kim HM, Chung O, Kang SG, et al. Birds of prey genome. Sequence Read Archive. 2019. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA431699.
Korea Institute of Science and Technology Information (KISTI) provided us with Korea Research Environment Open NETwork (KREONET), which is the Internet connection service for efficient information and data transfer. We thank ARA Jo for bird illustrations.
The review history is available as Additional file 3.
This work was supported by the grants from the National Institute of Biological Resources (NIBR), funded by the Ministry of Environment (MOE) of the Republic of Korea (NIBR201403101, NIBR201503101, NIBR201703102). This work was also supported by the Genome Korea Project in Ulsan (800 genome sequencing) Research Fund (1.180017.01) of Ulsan National Institute of Science & Technology (UNIST).
Ethics approval and consent to participate
Permissions for the endangered species of Korea or animals listed as natural monument were obtained from the Ministry of Environment (MOE) or from the Cultural Heritage Administration (CHA), respectively (see Additional file 1: Table S3 for detailed sampling and permission information). No animals were killed or captured as a result of these studies.
Consent for publication
YSC, JeJ, OC, and HYL are employees, and JB is a chief executive officer of Clinomics Inc. JB, YSC, and HMK have an equity interest in the company. All other coauthors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figures S1-S9, Tables S1-S38, and Supplementary Methods. Supplementary figures, tables, and methods supporting the manuscript. (PDF 4139 kb)
Datasheets S1-S11. Supplementary datasheets supporting the manuscript. (XLS 1689 kb)
Review history. (DOCX 48 kb)