Raptor genomes reveal evolutionary signatures of predatory and nocturnal lifestyles

Background: Birds of prey (raptors) are dominant apex predators in terrestrial communities, with hawks (Accipitriformes) and falcons (Falconiformes) hunting by day and owls (Strigiformes) hunting by night.

Results: Here, we report new genomes and transcriptomes for 20 species of birds, including 16 species of birds of prey, and high-quality reference genomes for the Eurasian eagle-owl (Bubo bubo), oriental scops owl (Otus sunia), eastern buzzard (Buteo japonicus), and common kestrel (Falco tinnunculus). Our extensive genomic analysis and comparisons with non-raptor genomes identify common molecular signatures that underpin anatomical structure and the sensory, muscle, circulatory, and respiratory systems related to a predatory lifestyle. Compared with diurnal birds, owls exhibit striking adaptations to the nocturnal environment, including functional trade-offs in the sensory systems, such as loss of color vision genes and selection for enhancement of nocturnal vision and other sensory systems, that are convergent with other nocturnal avian orders. Additionally, we find that a suite of genes associated with vision and circadian rhythm are differentially expressed in blood tissue between nocturnal and diurnal raptors, possibly indicating adaptive expression change during the transition to nocturnality.

Conclusions: Overall, raptor genomes show genomic signatures associated with the origin and maintenance of several specialized physiological and morphological features essential for apex predators.

Electronic supplementary material: The online version of this article (10.1186/s13059-019-1793-1) contains supplementary material, which is available to authorized users.


List of Supplementary
Bird of prey genome and transcriptome data used in this study.

The x-axis represents K-mer depth, and the y-axis represents the proportion of K-mer count at that depth. Two peaks in an individual's K-mer plot indicate greater heterozygosity.

Figure S3. Genetic diversity in 25 bird of prey species. The heterozygous SNV rates (y-axis) were calculated by dividing the total number of heterozygous SNVs by the length of sufficiently mapped (>5 depth) genomic regions. The estimated heterozygous SNV rates were based on single individuals. These rates can vary depending on which reference assembly is used and on the assembly quality.

... gray, respectively. We could not find the amino acid residue in the chuck-will's-widow, as the SLC51A gene is only partially annotated in the chuck-will's-widow genome.

Figure S7. Differentially expressed genes (DEGs) in the birds of prey species. P-value (P < 0.05) heatmap of differentially expressed genes in the blood transcriptome of three raptor orders (Strigiformes, Accipitriformes, and Falconiformes). Expression changes greater and less than 2-fold are shown in each column. DEGs were ordered by P-value within each target group (raptor-specific, Strigiformes-specific, Accipitriformes-specific, and Falconiformes-specific).

Figure S9. The mapping depth coverage of genes in the birds of prey and nocturnal birds. A small peak was observed at half the average mapping depth in most species, suggesting that there are erroneous genomic regions derived from the assembly process. The chuck-will's-widow genome has many zero-coverage genes. Therefore, we filtered out genes with abnormal depth coverage.

Table S10. Evaluation of the completeness of bird of prey assemblies and gene sets using a single-copy ortholog mapping approach. Final assemblies were selected by assessing assembly statistics, transcript mapping results, and single-copy ortholog mapping results.

Supplementary Tables
For the common kestrel, only the Platanus assembly was tested, since its contig and scaffold N50 lengths were much longer than those of the SOAPdenovo2 assembly.

Table S25. Accipitriformes-specific GO enrichment of genes in the highly conserved genomic regions (HCRs). P-values were calculated by Fisher's exact test.

Table S36. Olfactory receptors identified in 25 avian genomes. The P-value was calculated by the Mann-Whitney U test, after removing two outlier species, chicken and zebra finch.

Species identification
The species of the sequenced samples were confirmed by mapping their DNA sequences to previously reported mitochondrial sequences (COI and CYTB genes) of closely related species using BWA-MEM [S18] with default options. Variants were identified using the mpileup command in SAMtools [S19], and consensus sequences were generated using the vcf2fq command. The COI gene of the common kestrel was sequenced by the Sanger method. Phylogenetic reconstruction was performed using MrBayes 3.2 [S20] with the "lset=mixed rates=invgamma" substitution model specifications. Species sampling in the phylogeny was designed to include all species from the families Accipitridae, Ardeidae, Falconidae, Picidae, Strigidae, and Threskiornithidae that occur in South Korea. Species that could not be included were Aviceda leuphotes (Accipitridae), Dendrocopos hyperythrus (Picidae), and Threskiornis melanocephalus (Threskiornithidae). For the latter two species, congeneric species were included instead. The KM364882 sequence was first attributed to B. buteo burmanicus, a junior synonym of B. refectus; however, the sampling locality is outside the known range of B. refectus, which suggests that the sample is a misidentified B. japonicus. This hypothesis is also supported by comparative analyses with the tRNAGlu-Pseudo-control Region sequences [S21].

Sequence filtering criteria
To reduce the effects of sequencing errors when assembling the bird of prey genomes (Eurasian eagle-owl, oriental scops-owl, eastern buzzard, and common kestrel), we filtered out PCR-duplicated, low-quality, and adapter-contaminated reads. The filtering criteria for exclusion were as follows:
1) Read pairs were considered PCR duplicates if both read1 (left) and read2 (right) were identical between two pairs. PCR-duplicated read pairs were filtered out, leaving one unique read pair.
2) Reads with sequencing adapter contamination were filtered out.
Sequencing adapter left = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
Sequencing adapter right = "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
3) Reads in which ambiguous bases (N) made up more than 5% of the read were filtered out.
4) Reads with an average base quality below 20 (<Q20) were filtered out.
5) Reads with junction adapter contamination for mate-pair libraries were filtered out.
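
Criteria 1 to 4 above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, and it rests on several assumptions that the text does not specify: quality strings use Phred+33 encoding, adapter contamination is detected by an exact full-length match of either adapter in either read, a pair is discarded if either of its reads fails a criterion, and the first occurrence of a duplicated pair is the one kept. Criterion 5 is omitted because the junction adapter sequence is not given. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Adapter sequences as quoted in the filtering criteria.
ADAPTER_LEFT = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ADAPTER_RIGHT = "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

Read = Tuple[str, str]   # (sequence, quality string)
Pair = Tuple[Read, Read]  # (read1, read2)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (Phred+33 is an assumption)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def n_fraction(seq: str) -> float:
    """Fraction of ambiguous bases (N) in a read; empty reads count as all-N."""
    if not seq:
        return 1.0
    return seq.upper().count("N") / len(seq)


def read_passes(read: Read) -> bool:
    """Apply criteria 2-4 to a single read."""
    seq, qual = read
    upper = seq.upper()
    # Criterion 2: exact adapter match (assumption: no partial matching).
    if ADAPTER_LEFT in upper or ADAPTER_RIGHT in upper:
        return False
    # Criterion 3: more than 5% ambiguous bases.
    if n_fraction(seq) > 0.05:
        return False
    # Criterion 4: average base quality below Q20.
    if mean_quality(qual) < 20:
        return False
    return True


def filter_pairs(pairs: Iterable[Pair]) -> List[Pair]:
    """Drop duplicate pairs (criterion 1), then pairs failing criteria 2-4."""
    seen = set()
    kept: List[Pair] = []
    for read1, read2 in pairs:
        key = (read1[0], read2[0])  # duplicates share both sequences
        if key in seen:
            continue
        seen.add(key)
        if read_passes(read1) and read_passes(read2):
            kept.append((read1, read2))
    return kept
```

Because duplicates are detected before the quality checks, a duplicated pair whose first copy fails a later criterion is removed entirely; a production tool might instead keep the best-quality copy.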