The Australian black swan (Cygnus atratus) is an iconic species with contrasting plumage to that of the closely related northern hemisphere white swans. The relative geographic isolation of the black swan may have resulted in a limited immune repertoire and increased susceptibility to infectious diseases, notably infectious diseases from which Australia has been largely shielded. Unlike mallard ducks and the mute swan (Cygnus olor), the black swan is extremely sensitive to highly pathogenic avian influenza. Understanding this susceptibility has been impaired by the absence of any available swan genome and transcriptome information.
Here, we generate the first chromosome-length black and mute swan genomes annotated with transcriptome data, all using long-read based pipelines generated for vertebrate species. We use these genomes and transcriptomes to show that unlike other wild waterfowl, black swans lack an expanded immune gene repertoire, lack a key viral pattern-recognition receptor in endothelial cells and mount a poorly controlled inflammatory response to highly pathogenic avian influenza. We also implicate genetic differences in SLC45A2 gene in the iconic plumage of the black swan.
Together, these data suggest that the immune system of the black swan is such that should any avian viral infection become established in its native habitat, the black swan would be in a significant peril.
The distinctive black plumage of the native Australian black swan (Cygnus atratus) is in stark contrast to the white swans that are native to Europe and North America. This unique feature has resulted in the black swan playing an important role in Australian heraldry and culture. The limited native geographic range (Australia) and relative isolation of the black swan has direct consequences for its immune repertoire and susceptibility to infectious disease common to other parts of the world. Specifically, geographic isolation can result in founder effects and reduced immune diversity as a result of limited pathogen challenge .
The native Australian black swan has a remarkably distinct response to infection by highly pathogenic avian influenza (HPAI) virus compared to the closely related white swans (e.g., the mute swan; Cygnus olor) and other waterfowl [2, 3]. Unlike mallard ducks and mute swans, the black swan is extremely sensitive to HPAI, succumbing to the disease within 2 to 3 days post-infection. This disease pathogenesis mirrors that of infected chickens, viewed as the most susceptible species to HPAI . One of the striking features common to both black swans and chickens is that HPAI viruses preferentially infect endothelial cells, which may contribute to the disease severity in these two species . These experimental studies are consistent with reports of natural infections, which suggest that captive black swans quickly succumb to HPAI while co-housed mute swans survive the infection .
Comparative genomics has played an important role in understanding species-dependent differences in HPAI pathogenesis  while also revealing the unique immune systems of many native Australian fauna . However, comparative genomics is contingent upon the availability of high-quality species-specific genomes and transcriptomes.
Here, we generate the first black and mute swan reference genomes and transcriptomes, including the transcriptional response of primary black swan endothelial cells to HPAI. These data show that the black swan has numerous unique characteristics including (i) lack of an expanded immune gene repertoire, (ii) undetectable Toll-like Receptor (TLR) 7 gene expression in infected endothelial cells, and (iii) a dysregulated pro-inflammatory response to viral infection that is likely to leave the species highly susceptible to viral infections such as HPAI. It is also likely that genetic differences in melanin production contribute to the distinctive black plumage of the black swan.
The chromosome-length reference genomes for black and mute swan were constructed according to a Pacbio continuous long-read-based (CLR) DNA Zoo pipeline and Vertebrate Genomes Pipeline 1.5 , respectively (Additional file 1: Figures S1 and S2). This included scaffolding contigs with Hi-C (Hi-C contact maps; Additional file 1: Figure S3) and Bionano maps. Curations of the assemblies identified scaffolds representing 34 autosomes plus the Z sex chromosome and were named according to the descending order of the physical size. Others were nominated as unplaced scaffolds due to their lack of resolution to be placed as chromosomes. We assigned final scaffolds to 35 chromosomes and 36 chromosomes for black swan and mute swans respectively (including the sex chromosome), based on the physical size. The W chromosome was not assigned a scaffold in black swan as the DNA was derived from a male. The expected diploid number of chromosomes in the mute swan is 80 . The black swan genome was sequenced using 90x PacBio CLR coverage, generating a 1.12 Gb reference assembly. The mute swan genome was sequenced using 60x PacBio CLR coverage, generating a 1.13 Gb reference assembly. The details of both genomes, including a comparison to the latest VGP (Vertebrate Genome Project) chicken (Gallus gallus) and mallard duck (Anas platyrhynchos) genomes, are shown in Table 1. For each genome, we estimated the heterozygosity, QV, and completeness values using Merqury based on the highly accurate Illumina® short reads (Additional file 2: Table S1 and Additional file 3: Online Methods).
The total classified repeat content of the genome was 10.56% for the black swan, with 1.71% unclassified repeats, and 10.76% for the mute swan genome, with 1.55% unclassified. This is lower than the repeat content recorded in the chicken (15%) and the mallard duck (17%) [9, 10].
The black and mute swan genomes were annotated with RNASeq and IsoSeq transcriptome data, homology-based alignments with other species and with bioinformatically inferred gene predictions, according to the methods listed in Additional file 1: Figure S4.
The completeness of the black and mute swan genomes was assessed using the Core Eukaryotic Genes Mapping (CEGMA) and Benchmark Universal Single Copy Orthologues (BUSCO) analyses and compared to the chicken and mallard duck genome (Additional file 4: Table S2, Additional file 5: Table S3). Notably, the black swan genome had the highest complete BUSCOs (8093), followed by the chicken (8054) and the mute swan (8010) genomes. While the chicken genome had the highest complete core-eukaryotic gene content (224), this was only marginally higher than that of the mute (221) and black swan (219) genomes.
One-to-one alignment between the black and mute swan genomes showed 98.35% average nucleotide identity between the 34 autosomes of the black and mute swans (Additional file 1: Figure S5). The Z chromosome of the black and mute swan had several large (>1kb) structural variants (Additional file 1: Figure S6), but were otherwise largely consistent. These structural differences in the Z chromosome may be associated with the speciation of the black and mute swan from their last common ancestor (approximately 6.1 million years ago) .
Inversions have been associated with the differential susceptibility of chickens and ducks to avian influenza . We found no substantial inversions between the black and mute swan. However, given that both ducks and swans are Anseriformes, we compared structural genomic differences between the susceptible black swan and mute swan and relatively resistant duck, using the ostrich as an outgroup. We investigated genes in the inverted genomic regions present in the duck but absent in the swans on chromosome 1 (Additional file 1: Figure S7). Strikingly, we found that 53 inverted genes (out of 1758 total genes) in chromosome 1 of the duck genome were mapped to immune system processes (Additional file 6: Table S4). However, given the absence of substantial structural variants between the black and mute swan, it is likely that any immunological consequences of these structural variants would be present in multiple swan species.
The black and mute swan genomes were then annotated according to the methods listed in Additional file 1: Figure S4. Sixteen thousand two hundred four (16,204) and 15,789 gene models were obtained through Evidence Modeler as the final gene models in the black swan and the mute swan, respectively. Protein alignment against the UniProt/Swiss-Prot database was used to infer 15,478 gene model names for the black swan and 14,791 gene model names for the mute swan.
Importantly, it is impossible to infer if these differences represent true biological differences in the number of annotated genes or simply reflect the higher quality of the black swan genome.
Mutations in SLC45A2 may account for differences in swan plumage
One of the most remarkable features of the black swan is its distinct plumage colour. To determine if the highly annotated genomes presented herein could offer insight into the iconic plumage, we examined genes (Additional file 7: Table S5) known to be associated in avian plumage color [13, 14]. We observed that SLC45A2 in the mute swan had a nucleotide deletion in the open reading frame instigating a frame-shift mutation and an in-frame early stop codon (Fig. 1). Multiple non-homologous nucleotides were detected in the chicken and the duck SLC45A2 relative to that of the black swan (Fig. 1). SLC45A2 encodes a membrane-associated transport protein which regulates the tyrosinase activity and the melanin content of melanosomes . The knockdown of this gene causes reduced tyrosinase activity and low melanin content in human melanoma cell lines . These results suggest that this deletion in SLC45A2 is a candidate genetic change that could be responsible for the white plumage in white swans in the genus Cygnus.
Immune gene families are expanded in the mallard duck and mute swan genomes
To understand whether the black swan (which has been isolated in Australia) has an altered immune gene repertoire compared to its close relatives, evolutionary gene gain and loss were determined (Fig. 2) (p-value < 0.05). The black swan genome was estimated to be contractive, indicating that the total gene gain was less than the gene loss from the last common ancestor. The biological function of expanded genes in the black swan, chicken, mute swan, and duck was then investigated.
Strikingly, gene families related to immune system processes (e.g., GO0002376) were only expanded in the mallard duck and the mute swan genomes (Fig. 3). In contrast, in chickens, expanded gene families were associated with regulation of GTPase activity, extracellular matrix and structure organization, while the over-represented functional terms for expanded black swan gene families included cell-matrix adhesion, cell-substrate adhesion, extracellular matrix organization and extracellular structure organization. To specifically compare the immune gene repertoire of black and mute swans, we used human and mouse immune genes to identify immune gene families in Cygnus species. Thirty-nine immune-related gene families of the black swan were contractive compared to the mute swan (Additional file 8: Table S6). The PANTHER pathways related to these genes included apoptosis signaling, cadherin signaling, general transcription by RNA polymerase, gonadotropin-releasing hormone receptor, inflammation mediated by chemokine and cytokine signaling, interleukin signaling, TGF-beta signaling, and Wnt signaling pathways.
Black and mute swan classical Major Histocompatability Complex loci are homologous
Major Histocompatability Complex (MHC) diversity is altered in some avian species , which may affect susceptibility to disease . We therefore compared MHC loci between black and mute swans. Two MHC class I and MHC class II loci were identified in the black and mute swan (Additional file 1: Figure S8). These were located on chromosome 33 in the swan genomes. A similar number of MHC complex associated genes were identified in each species. None of these genes appeared to be pseudogenes. Compared to mammals, both mute and the black swans have a compact, relatively simple MHC B locus (Additional file 1: Figure S9), with two class IIb (BLB) genes followed by a pair of class I (BF) genes that flank the TAP genes. The TAPBP gene in both birds, unlike chickens, does not flank the two-class-IIb genes . Overall, the MHC region of both the black and mute swan share a similar genome landscape and represent a "minimal essential MHC" similar to chicken and mallard duck . It is therefore unlikely that differences in the MHC complex contribute to species-specific differences in the response to HPAI virus infection.
Toll-like receptor 7 expression cannot be detected in black swan endothelial cells
Toll-like receptor 7 (TLR7) signaling has been implicated in influenza A virus recognition in mammals and birds where it functions as a pathogen recognition receptor that recognizes single-stranded viral-RNA . TLR7 has been duplicated independently in several avian species  and differences in TLR7 tropism and function have been associated with the increased resistance of ducks to HPAI . There was no notable structural difference in the TLR7 gene between the black and mute swan genomes. However, strikingly, TLR7 expression signals were detected in ISO-Seq analysis of mute swans but not in the ISO-seq analysis of black swan (Additional file 1: Figure S10). To independently confirm these data, we investigated the expression of TLR7 using qRT-PCR in black swan tissues collected post-mortem. TLR7 mRNA could not be detected in any of the collected black swan tissues (Additional file 9: Table S7). As TLR7 expression can be induced by interferon, we reasoned that gene expression in the black swan may only be detected in the presence of virus infection. Accordingly, we sought to establish an in vitro model of HPAI infection in black swans. In black swans experimentally infected with HPAI virus, endothelial cells are the primary target of infection . We observed similar infection of endothelial cells in swans naturally infected with HPAI (Additional file 1: Figure S11). We therefore cultured primary black swan endothelial cells according to our previously described protocol for avian species  and endothelial cell identity was confirmed by tube formation, uptake of acetylated low density lipoprotein, von Willebrand factor expression and the absence of CD45 expression (Additional file 1: Figure S12). Chicken, duck, and black swan endothelial cells were infected with A/Chicken/Vietnam/008/2004/H5N1 (VN04) and 6 h later gene expression was examined by RNASeq. PCA plots showed that mock and virus-infected samples clustered separately for all three species (Fig. 4). Viral-RNA was detected in the infected endothelial cells of all three species (data not shown). Importantly, while TLR7 transcription was upregulated (although not statistically significant) in infected duck and chicken endothelial cells, TLR7 transcription could not be detected in infected or naïve black swan endothelial cells (Table 2). Moreover, while myeloid differentiation primary response 88 (MyD88), the downstream adaptor of TLR7 was upregulated in infected duck and chicken endothelial cells, it was downregulated in infected black swan endothelial cells (Table 2). These data are consistent with the absence of TLR7 expression in black swan endothelial cells, despite an apparently intact TLR7 gene in the genome. No other notable expression differences in ISOSeq/RNASeq transcripts were recorded in immune genes noted in the influenza A KEGG pathway (https://www.kegg.jp/kegg-bin/show_pathway?ko05164).
Black swan endothelial cells display a dysregulated pro-inflammatory response to HPAI virus infection
The transcriptomic data generated herein offer a unique insight into the transcriptomic response of black swans to HPAI virus infection. To explore this matter further, we performed Gene Ontology (GO) enrichment analysis using significantly differentially expressed genes in infected chicken, duck, and black swan endothelial cells (Additional file 10: Table S8, Additional file 11: Table S9, Additional file 12: Table S10). The most significantly upregulated gene in black swans was IFIT5, IL8 in chickens, and BCOR in ducks. IL6 was significantly upregulated in the black swan (log2fold change = 1.89) and chicken (log2fold change = 1.21), indicating a strong pro-inflammatory response while not differentially expressed in ducks. Black swan, chicken, and duck endothelial cells differentially expressed other cytokines/chemokines and their receptors in response to HPAI VN04 infection (Additional file 13: Table S11). Typically, black swan and chicken endothelial cells upregulated more cytokines and cytokine receptors than duck endothelial cells in response to HPAI VN04 infection. Indeed, the highest number of cytokines and cytokine receptors were upregulated by infected chicken endothelial cells (Additional file 13: Table S11). In infected black swan endothelial cells, 113 GO terms were significantly enriched (Additional file 14: Table S12; Fig. 5A). Many of these GO terms were associated with innate immunity, the cytokine signaling response and chemokine signaling. Several innate immunity pathways were increased in response to viral infection (z-score > 0) while GO terms such as negative regulation of Mitogen-Activated Protein Kinase (MAPK) activity and negative regulation of c-JUN N-terminal Kinase (JNK) cascade were decreased (z-score < 0). Similarly, 123 enriched GO terms in infected chicken endothelial cells included positive regulation of viral response and regulation of leukocyte chemotaxis (Additional file 15: Table S13; Fig. 5B). Terms such as leukocyte mediated cytotoxicity were increased after infection (z-score > 0) while negative regulation of apoptotic signaling and the positive regulation of innate immune responses were decreased. GO biological process terms enriched in infected duck endothelial cells were not primarily associated with the innate immune response (Additional file 16: Table S14; Fig. 5C). Rather, most genes were linked to cellular biological signaling and activity. This finding is consistent with our previous study of HPAI viruses in duck endothelial cells . Interestingly, in direct contrast to black swans, the inactivation of MAPK activity was significantly increased in ducks (z-score < 0). Due to the wide-ranging role of the MAPK cascade, including pro-inflammatory responses, we further investigated the expression profiles of the genes and identified ten genes involved in the “inactivation of MAPK pathway,” five of which were significantly downregulated genes (i.e., DUSP1, DUSP4, DUSP7, DUSP10, and RGS3) in black swans (Additional file 1: Figure S13). Dual-specificity phosphatases (DUSPs) are negative regulators of MAPKs and their associated pro-inflammatory effects . Accordingly, we specifically examined the differential expression of DUSPs across the three avian species. In the black swan, all DUSPs were either not differentially expressed or downregulated in response to infection. In contrast, in the duck, all DUSPS (except for DUSP15) were upregulated. Similar, in the chicken, DUSP1, DUSP5, DUSP7, DUSP10, DUSP15, and DUSP16 were significantly downregulated in response to infection (Additional file 17: Table S15). In contrast, in the duck, all DUSPs (except for DUSP15) were upregulated. This transcriptional profile is consistent with poor regulation of a pro-inflammatory response to HPAI virus in black swans.
The black and mute swan reference genomes provided herein represent the first publicly available swan genomes. The analyses of these genomes, together with the first black swan transcriptome in response to HPAI virus infection, has provided a unique insight into the plumage and immune system of the black swan.
The genomic insights provided by the present study were only possible due to the growing availability and accessibility of third generation sequencing. Specifically, older technologies that generate short-read sequences can result in incorrect assembly, annotation errors, and a large amount of manual effort to correct individual genes . In contrast, and consistent with the broader goals of the Vertebrate Genomes Project , the use of longer read sequences herein allowed us to generate black swan and mute swan genomes that were scaffolded to near chromosomal length and that were of comparable quality to the well-annotated chicken reference genome.
Genomic analysis of genes known to be associated with plumage color in other birds  identified a potential frame-shift mutation in the first exon of SLC45A2 in the mute swan, which may have led to pseudogenization of this gene. SLC45A2 encodes a transporter protein involved in melanin synthesis and is considered one of the most important proteins affecting human pigmentation . Mutations in the SLC45A2 gene have been reported in albinism in humans . Furthermore, mutations in the gene have been associated with plumage color variation in Japanese quails , indicating the importance of the SLC45A2 in avian plumage. Interestingly, should a mutation of SLC45A2 have resulted in the differential plumage of the black and mute swan, it would suggest that the last common ancestor of these birds was, in fact, black. This is direct contrast to the metaphor of “black swan events” that are so defined because of their unprecedented and unexpected nature. Instead, it would appear that at one point in history black plumage for the swan was the norm rather than the exception.
Compared to the last common ancestor, mute swan and mallard duck gene families involved in immune system processes were expansive. In contrast, no expansion in immune gene families was noted in the chicken or the black swan. This differential immune gene expansion, and its implications for susceptibility to HPAIV, is likely compounded by the observed impaired expression of TLR7 in the endothelial cells of black swans. Interestingly, other genes that have been observed to be differentially expressed between chickens and ducks, and implicated in susceptibility to HPAIV, were not differentially expressed between infected black swan and duck endothelial cells (e.g., RIG-I and IFITM3) [30,31,32,33]. It is interesting to speculate as to whether mute swan endothelial cells would express TLR7. However, the presence or absence of TLR7 in the endothelial cells of mute swans is perhaps irrelevant to the pathogenesis of HPAIV, as the virus is not heavily endothelial tropic in this species . In the black swan, the observed differences in TLR7 expression in endothelial cells speak to the value of combining genomics with both primary cell culture and transcriptomics, as has recently been suggested as the new standard for comparative genomics by Stephan and colleagues . Either as a result of, or in addition to, these observed immune differences, black swan endothelial cells also mounted a markedly pro-inflammatory response to HPAIV infection. We have previously reported a similar pro-inflammatory response in infected chicken endothelial cells (compared to those of ducks) and speculated that this inflammatory response leads to immunopathology in chickens in vivo. Whether disease severity in black swans is driven by immunopathology remains to be determined, although it is consistent with the observed pathology in infected birds [2, 3]. In sum, it is likely that this combination of species-specific differences in the immune response contribute to the marked susceptibility of both the black swan and chicken to HPAIVs.
The observed species-dependent differences in the immune responses of swans raise the intriguing question as to why the black swan continues to thrive in its native Australia as well as in New Zealand (where it was introduced in nineteenth century). This may be due to the fact that HPAIV is not endemic in Australia and New Zealand. Indeed, captive populations of black swans located in parts of the world frequently exposed to HPAIV are highly susceptible to severe disease . The data presented in this study would therefore suggest that should HPAI become more prevalent in the Oceania region, the ongoing survival of the black swan would be at significant risk. Moreover, many of the immune limitations described herein are not specific to avian influenza viruses. For example, TLR7 is essential in the immune recognition of a wide number of viral pathogens including avian coronaviruses . These data suggest that should any avian endothelial specific viral infection become established in the native habitat of the black swan, the survival of this iconic species would be in significant peril.
A full description of the methods can be found in Additional file 3: Online Methods. No statistical methods were used to predetermine sample size.
Raw SMRT sequencing reads were used for de novo black swan genome assembly using open-source FLACON (v2018. 31-08-03.0)/FALCON-UNZIP (126.96.36.199841) diploid aware genome assembly algorithms to produce primary and alternative haplotigs as the primary assembly. For FALCON assembly, the estimated genome size (haplotype) used for the black swan was 1.4181 × 109 base pairs. Additional parameters used for FALCON assembly are listed in Additional file 18: Table S16. Raw FASTA sequences were extracted from the subread BAM files produced by SMRT sequencing using PacBio® BAM2fasta tool. Raw FASTA sequences from 10 SMRT cells were given as the input for the FALCON assembler, and the FALCON assembly parameters were provided using a configuration file. Once the FALCON run was completed, FALCON-UNZIP was executed in the same directory as the initial FALCON run. For the FALCON-UNZIP run, the subread BAM data was also used. Once the FALCON UNZIP run was complete, two genome polishing steps were undertaken through PacBio® SMRT Link version 7.0.0 resequencing pipeline using PacBio CLR reads with default settings (-x 30) in addition to FALCON_UNZIP integrated arrow polishing to produce the primary PacBio assembly. This PacBio assembly was screened for the presence of mitochondrial/non-chromosomal DNA suing NCBI Blast+ (v2.9.0+) against the black swan mitochondrial genome . The mitochondrial DNA sequences were removed from the primary PacBio assembly. PURGE_DUPS (v1.2.5) was next employed to remove false haplotypic duplications from the concatenated primary PacBio assembly and alternate haplotigs. The PacBio assembly completeness was assessed via BUSCO V.5.1.2 against avian orthoDB10. The presence of highly conserved core vertebrate genes was evaluated using the CEGMA pipeline, available online in gVolante (https://gvolante.riken.jp/).
The FALCON draft was scaffolded to chromosome-length by the DNA Zoo Consortium following the methodology described here: www.dnazoo.org/methods. Briefly, the Hi-C data was processed using Juicer  and used as input into the 3D-DNA pipeline  to produce a candidate chromosome-length genome assembly. We performed additional finishing on the scaffolds using Juicebox Assembly Tools [37, 38]. The contact matrices generated by aligning the Hi-C data to the genome assembly before and after the Hi-C scaffolding are available for browsing at multiple resolutions on https://www.dnazoo.org/assemblies/Cygnus_atratus visualized using Juicebox.js, a cloud-based visualization system for Hi-C data . Finally, the Hi-C assembly was subjected to gap closing using the TGS-GapCloser (v1.0.1) algorithm  with default settings. PacBio continuous long reads (CLRs) were used as input for the software tool for gap closing.
The Anatidae family’s established terminology for the chromosome-length scaffolds was based on the homology between black swan scaffolds/chromosomes and the closely related mute swan. MUMMER (v3) was used to establish a 1:1 homology between the black swan and the mute swan before assigning the macro and microchromosomes terminology to black swan chromosomes.
Mute swan genome
The mute swan genome was sequenced by the vertebrate genome project (VGP) according to the “vertebrate genome project assembly phase I” pipeline. The genomic DNA was extracted from the stomach tissue of a male adult mute swan. The sequencing technology and data used to build the mute swan haploid chromosome length hybrid assembly included PacBio Sequel I CLR, Illumina Novaseq®, Arima® Hi-C linkage data and BioNano optical maps.
FALCON (v2018. 31-08-03.0)/FALCON-UNZIP (188.8.131.52841) was used to generate the phased contigs. To remove the false duplications in the phased haplotigs, Purge_Haplotigs (v1.0.3+) was used. After that, two rounds of scaffolding were performed using 10x Genomics link data with scaff10x (v4.1.0) (https://github.com/wtsi-hpag/Scaff10X) for the first step of scaffolding with 10xGenomics® link data to build the S1 scaffold set. Next, the second step of scaffolding was performed on S1 using BioNano data.
BioNano solve (v3.2.1_04_122018) software was used to produce the S2 scaffold. The last scaffolding was performed to scaffold the S2 set using Arima® Hi-C linkage data with SALSA (v2.2) [41, 42]. The Hi-C contact matrices were generated and visualized as described in the vgp assembly pipeline (https://github.com/VGP/vgp-assembly). Primary and alternate haplotigs were concatenated, and bases were polished with Arrow (SMRTanalysis v184.108.40.206841) using PacBio CLR reads. Two more rounds of polishing were performed with linked reads by aligning with Longranger (v2.2.2) and variant calling with FreeBayes (v1.3.1) followed by consensus calling with bcftools consensus. Finally, manual curation of the assembly was performed using gEVAL (v2019-09-26).
The chromosome assignment was performed using evidence from Hi-C. A scaffold was considered a chromosome with Hi-C evidence when there is a clear unbroken diagonal in the Hi-C maps in the Juicebox. Every Hi-C box from the largest to the smallest for evidence-validated scaffolds was considered as a chromosome. Subsequently, established terminology for the chromosomes was given for the Anatidae family.
The final assembly’s structural and functional annotation of genes was conducted using de novo homology-based and evidence-based methods (for details, see Additional file 3: Online Methods). Firstly, the primary PacBio assembly was annotated using NCBI eukaryotic and Ensembl gene annotation pipelines. A set of transcriptomic data (of both short-reads and long reads) derived from the liver, spleen, peripheral blood mononuclear cells and primary bone marrow-derived endothelial cells was used for black swan genome annotation, and the ISOseq data for mute swan annotation was derived from the peripheral blood.
Structural variant assessment of the genes associated with plumage color in swans
Structural variants in genes associated with plumage color were examined using the University of California, Santa Cruz (UCSC) genome browser with multiple whole-genome alignments. We first aligned four Gallianseriformes’ whole genomes (black swan, mute swan, chicken and mallard duck) with CACTUS to produce a HAL alignment file . The HAL file alignment was then visualized in the Genome assembly In a Box (GBiB) using the HAL tools package (V2.1). We then used predicted protein and mRNA sequences plumage color genes (from all four species) as queries to perform protein coding region alignment with the BLAT built in the GBiB and compare with each species in the HAL alignment. We validated putative open reading frames (ORF) for each gene from each species by visual inspection. Validity was assessed based on the presence or absence of putative donor-recipient splice sites, start codon, and in-frame stop codons. We also calculated the percentage identify and coverage for each gene between mute and black swans using Blast pairwise alignment.
Gene family evolution
The longest peptide sequence for a given protein-coding gene for 17 species was retrieved from Ensembl (release 104) and NCBI Annotation release 103, including zebrafish (Danio rerio) (GCA_000002035.4), Japanese rice fish (Oryzias latipes) (GCF_002234675.1), three-pinned stickleback fish (Gasterosteus aculeatus) (BROAD S1), tropical clawed frog (Xenopus tropicalis) (GCA_000004195.3), platypus (Ornithorhynchus anatinus) (GCA_004115215.2), gray short-tailed opossum (Monodelphis domestic) (GCA_000002295.1), mouse (Mus musculus) (GCA_000001635.9), human (Homo sapiens) (GCA_000001405.28), cattle (Bos taurus) (GCA_002263795.2), anole lizard (Anolis carolinensis) (GCA_000090745.2), mallard duck (Anas platyrhynchos) (GCA_002743455.1), turkey (Meleagris gallopavo) (GCA_000146605.4), chicken (Gallus gallus) (GCA_000002315.5), zebra finch (Taeniopygia guttata) (GCA_003957565.2), and collared flycatcher (Ficedula albicollis) (GCA_000247815.2). The longest amino acid sequences from coding genes in the final reference annotations were used for the black and mute swan. The phylogenetic tree for the above 17 species was constructed using the Orthofinder pipeline with – M msa -S blast -I 1.3 settings. The final rooted species tree was inferred using STAG from the Orthofinder, which was then made ultra-metric with the root-age of 435 million years (obtained from TimeTree.org database). Peptide sequences in 17 species were scored using PANTHER (v15.0) database and clustered into PANTHER families and subfamilies and then identified by a family-specific PANTHER identifier. The maximum likelihood of gene family evolution for a given taxon was estimated by running CAFÉ(v5) with the PANTHER assigned protein families of the 17 species and the Orthofinder-based ultrametric species tree (significant at p-value < 0.05).
Functional profiling of expanded and contracted gene families in swans
Each PANTHER sub-family identification with a significant gene family evolution was assigned to the gene ontology category (GOslim identification number) to examine the biological role of significantly evolving (either by expansion or contraction) (Vertibri p-value < 0.05) gene families through the r-package PANTHER.db. The significant gene families that appeared to have expanded or contracted were separated. Subsequently, GO enrichment analyses were conducted using topGO  to test for over-represented functional terms associated with expanded and contracted gene families (a given gene family is considered expanded if the number of genes are increased compared to last common ancestor). The significant gene sub-families annotated with GOslim identifications were used as the foreground, and the GOslim annotated gene subfamilies of the last common ancestor with at least one gene present in the family were used as the background. Fisher’s exact test was used for statistical testing, and the terms with corrected p-value < 0.01 were chosen as overrepresented GO terms.
Comparison of the immune gene repertoire
Selected immune databases were used to obtain the list of known immune genes from Ensembl release and were compared to that of the black swan and mute swan genome (see Additional file 3: Online Methods for details)
Major histocompatibility complex (MHC) class annotation in black and mute swan genomes
The major histocompatibility complex (MHC) loci (MHC class I and MHC class IIb) for both mute and the black swan genomes were identified as previously described . Briefly, the avian order-level consensus sequences for MHC I and MHC II loci were built using several different bird species covering at least three exons (exon 2, exon 3, and exon 4). The blast algorithm aligned the order-level consensus sequences to the black swan and mute swan genomes. MHC loci were estimated as the number of blast hits that contained all three exons within 2kb of each other. Each exon was examined for in-frame stop codons, and it was eliminated as a locus if present. Genes present in regions 100kb upstream and downstream of each locus were manually annotated and the gene location plotted through genes to identify the additional genes of each predicted locus. Each predicted MHC locus was manually inspected for premature stop codons, and if present, was eliminated. Identified MHC molecules and MHC B loci associated genes were visualized for black and the mute swan for comparison using Circlize (R-package).
TLR7 expression in black swan tissues
Total RNA was extracted from tissues of an adult male black swan (stomach, kidney, and liver) using an RNA plus mini kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. Genomic DNA was extracted from the stomach muscles using the MagAttract HMW DNA kit (Qiagen®, Hilden, Germany) for the positive control, following the manufacturer’s recommendation. The black swan-specific TLR7 (FW: TTGCACTTCCACACTCCAAG; RVTCAGTCCAATTGCACCTCTG; Probe:CTCCGAAACAATCGCATTCAACGG) and 18S (FW: CCTGCGGCTTAATTTGACTC; RV: AGACAAATCGCTCCACCAAC; Probe:TTGAGAGCTCTTTCTCGATTCCGTGG) primers and probes were designed using the primer3plus web tool
Immunohistochemistry for influenza A virus nucleoprotein (NP) was performed according to Additional file 3: Online Methods.
Isolation of primary black swan, chicken, and duck endothelial cells
We used previously isolated and characterized chicken and duck primary aortic endothelial cells as the controls for black swan primary endothelial cell characterization [24, 25]. Briefly, 17-day-old chicken and 21-day-old Pekin duck embryonated eggs were purchased from Darling Downs Hatchery (Queensland, Australia). Primary aortic endothelial cells were cultured from the aortic arches of chicken and duck embryos as described previously using EGM-2MV medium (Lonza, Basel, Switzerland) with 10% FBS (Gibco, Waltham, MA, USA) [305, 307]. All avian cells were cultured at 40 °C 5% CO2 unless otherwise stated.
The black swans used for cell isolation were sourced from Currumbin Wildlife Animal Hospital (−28.14, 153.48), Queensland, Australia, and Tauranga Harbor, Bay of Plenty, New Zealand (−37.57, 175.97). The animals were either euthanized due to a terminal illness or as part of a government-approved cull. Tibiotarsi and femurs were obtained from all birds. The bones were then partially opened in the middle to expose the bone marrow and then submerged in pre-chilled (4°C) Dulbecco’s modified Eagle medium (DMEM) in T175 flasks. Flasks were then transferred to a personal containment level 2 (PC2) laboratory for further processing within 4 to 3 h post-mortem.
The experiment was designed to include two groups viz. challenge and control, from each of the species, i.e., black swan, chicken and duck. Each group consisted of three biological replicates of endothelial cells.
A/Chicken/Vietnam/0008/2004(H5N1) (VN04) virus was amplified in MDCK cells. All experiments using HPAIVs were performed under physical containment level 3 (PC3) settings at the Australian Centre for Disease Preparedness (Geelong, Australia). Primary endothelial cells were grown to at least 60% confluency and were inoculated with 5 × 105 PFU/mL of VN04 for 60 min at 40°C under physical containment 3 (PC3) settings. The inoculum was removed, and the cells were washed once with PBS, and EGM-2MV media without FBS was added. The cells were incubated for an additional 5 h.
According to the manufacturer’s instructions, cellular RNA was extracted from cultured cells using the RNeasy Plus Kit (Qiagen, Hilden, Germany). Subsequently, 3M of sodium acetate (NaOAc, pH 5.5) and 100% ethanol were added for RNA precipitation.
Paired-end, 150bp long, RNAseq was performed by Macrogen (Macrogen, Seoul, South Korea) approximately 40 million reads per sample. Library preparation was performed using the SMARTer® ultra-low kit (Takara Bio® USA, Inc.). All the libraries were barcoded and subsequently sequenced using the Novaseq® 6000 (Illumina, San Diego, CA, USA) with Novaseq® 6000 S4 Reagent Kit (Illumina, San Diego, CA, USA) as per the manufacturer’s instructions using within a single lane in a single cell to avoid potential technical variations. RNAseq reads were separated into original samples based on the corresponding bar code.
Differential gene expression analysis
RNAseq-based differential gene expression analysis was performed after assessing the quality of the raw reads using the FastQC tool before and after adaptor and quality trimming with TrimGalore version 0.6.2 (RRID: SCR_011847). RNAseq reads were then quantified at the transcript level using Salmon v1.5.2. First, the indexes were created for the black swan reference transcriptome and the Ensembl version 104 chicken (GRCg6a) and duck (CAU_Duck1.0) transcriptomes with following parameters (salmon quant -i (params.ind) -l (params.libtype) -g (params.genemap) -r input_r1 -f input_r2 --validateMappings -o (output.dir)). The transcript abundance data of each species were imported through the R package “tximport” by transforming data from transcripts to gene-level quantification. The expression data from each species (black swan, chicken, and duck) were independently analyzed for differential gene expression. Once RNAseq read libraries were normalized for sequencing depth and RNA composition, differential gene expression analysis was performed using DESeq2 version 1.30.1 . A Bonferroni adjusted p-value (adj.p.val) cut-off of 0.05 and log2fold enrichment of 0.58 (fold-change 1.5) were used as the significance threshold (differential gene expression: mock vs. infected).
Gene Ontology enrichment analysis
Gene Ontology (GO) terms were assigned to all quantified genes in chicken and duck using the BioConductor package BioMart , and the GO terms were assigned to all expressed and quantified genes using InterProScan5 . For black swan genes, we used InterProScan 5  for GO annotation. Using the GO annotation, we then performed GO enrichment analysis with topGO . For the statistical enrichment, Fisher’s exact test was used with the “weight01” algorithm. GO enrichment analyses were performed for all three GO categories, including “biological process,” “cellular component,” and “molecular function.” Results of the GO enrichment were visualized with “ggplot2” and “GOplot”  R packages. We calculated a score (named as “z-score” in the GOplot package) to evaluate the trend (increasing or decreasing biological process) using GOplot in-built mathematical formula (Formula 1) for enriched terms of the GO biological process category in each species.
Availability of data and materials
The codes and data used for the bioinformatic analysis are detailed in Tables 3 and 4, respectively.
O'Brien SJ, Evermann JF. Interactive influence of infectious disease and genetic diversity in natural populations. Trends Ecol Evol. 1988;3:254–9.
Westerdahl H, Mellinger S, Sigeman H, Kutschera VE, Proux-Wéra E, Lundberg M, et al. The genomic architecture of the passerine MHC region: High repeat content and contrasting evolutionary histories of single copy and tandemly duplicated MHC genes. Mol Ecol Resour. 2022;22(6):2379–95.
Tong ZWM, Karawita AC, Kern C, Zhou H, Sinclair JE, Yan L, et al. Primary chicken and duck endothelial cells display a differential response to infection with highly pathogenic avian influenza virus. Genes. 2021;12:901.
Tóth L, Fábos B, Farkas K, Sulák A, Tripolszki K, Széll M, et al. Identification of two novel mutations in the SLC45A2 gene in a Hungarian pedigree affected by unusual OCA type 4. BMC Med Genet. 2017;18:27.
Smith J, Smith N, Yu L, Paton IR, Gutowska MW, Forrest HL, et al. A comparative analysis of host responses to avian influenza infection in ducks and chickens highlights a role for the interferon-induced transmembrane proteins in viral resistance. BMC Genomics. 2015;16:574.
Jiang F, Miao Y, Liang W, Ye H, Liu H, Liu B. The complete mitochondrial genomes of the whistling duck (Dendrocygna javanica) and black swan (Cygnus atratus): dating evolutionary divergence in Galloanserae. Mol Biol Rep. 2010;37:3001–15.
Kraus, R., Fedrigo, O., Formenti, G., Mountcastle, J., Chow, W., Collins, J., Howe, K., Rhie, A., Karawita, A., Short, K. and Jarvis, E.D. Cygnus olor (Mute swan) genome, bCygOlo1, primary haplotype, v2. Datasets. NCBI. https://www.ncbi.nlm.nih.gov/assembly/GCF_009769625.2.
Kraus, R., Fedrigo, O., Formenti, G., Mountcastle, J., Chow, W., Collins, J., Howe, K., Rhie, A., Karawita, A., Short, K. and Jarvis, E.D. Mute swan raw read data (PacBio CLR/Arima Hi-C Illumina, 10x Linked data/Bionano data), Datasets. https://vgp.github.io/genomeark/Cygnus_olor/.
We would like to acknowledge colleagues from the Friedrich-Loeffler-Institut (Jens Peter Teifke, Robert Klopfleisch, and Angele Breithaupt) for providing FFPE material. Special thanks to Christoph Schulze from the state diagnostic laboratory who was involved in the autopsy of most of the samples use for FFPE material collection and Ashling Charles from DNA Zoo Australia team for routine data processing support. We also would like to thank Leo Joseph from Australian National Wildlife Collection at CSIRO and Tiggy Grillo from Wildlife Health Australia for assisting/obtaining samples for genome sequencing. We are also thankful for research computing center at The University of Queensland for providing the high performance computing facility for black swan genome assembly and annotation.
Peer review information
Wenjing She was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
The review history is available as Additional file 19.
This study was funded by the Australian Department of Agriculture and Water Resources (CSIRO Science and Innovation Award; KRS), the Australian Department of Agriculture and Water Resources (Australian Eggs Innovation Award; KRS), and an Australian Research Council DECRA (DE180100512; KRS). This research was funded in part by the Wellcome Trust [Grant numbers 108749/Z/15/Z, 222155/Z/20/Z]. For the purpose of open access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. ACK was supported by the Australian Government’s research training program (RTP) scholarship. PK is supported by the University of Western Australia with additional computational resources and support from the Pawsey Supercomputing Centre with funding from the Australian Government and Government of Western Australia. Hi-C data for the Black Swan was created by the DNA Zoo Consortium (www.dnazoo.org). DNA Zoo sequencing effort is supported by Illumina, Inc. E.L.A. was supported by the Welch Foundation (Q-1866- 20210327), an NIH Encyclopedia of DNA Elements Mapping Center Award (UM1HG009375), a US-Israel Binational Science Foundation Award (2019276), the Behavioral Plasticity Research Institute (NSF DBI-2021795), NSF Physics Frontiers Center Award (NSF PHY-2019745), and an NIH CEGS (RM1HG011016-01A1).
Authors and Affiliations
School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD, 4072, Australia
Anjana C. Karawita, Keng Yih Chew, Marcus Z. W. Tong, Katina D. Hulme, Helle Bielefeldt-Ohmann, Lauren E. Steele, Melanie Wu, Julian Sng, Ellesandra Noye, David W. Burt & Kirsty R. Short
Commonwealth Scientific and Industrial Research Organisation, Australian Centre for Disease Preparedness, 5 Portarlington Road, Geelong, VIC, 3220, Australia
Anjana C. Karawita, Arjun Challagulla, Gough G. Au, Suzanne Lowther, Alexander J. McCauley & Michelle L. Baker
School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, 2006, Australia
Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, 78315, Germany
Robert Kraus & Ralf C. Mueller
Department of Biology, University of Konstanz, Konstanz, 78457, Germany
Robert Kraus & Ralf C. Mueller
Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
Timothy J. Bruxner
Department of Organismal Biology – Systematic Biology, Evolutionary Biology Centre, Uppsala University, Science for Life Laboratory, Uppsala, 752 36, Sweden
Julie Blommaert & Alexander Suh
The New Zealand Institute for Plant & Food Research Ltd, Nelson, 7010, New Zealand
School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TU, UK
School of Agriculture and Environment, The University of Western Australia, Perth, WA, 6009, Australia
Parwinder Kaur & Erez Aiden
The Centre for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
Olga Dudchenko & Erez Aiden
Centre for Theoretical Biological Physics and Department of Computer Science, Rice University, Houston, TX, 77030, USA
Olga Dudchenko & Erez Aiden
Broad Institute of MIT and Harvard, Cambridge, MA, 02139, USA
Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech, Pudong, 201210, China
The Vertebrate Genome Laboratory, The Rockefeller University, NY, 10065, USA
Olivier Fedrigo, Giulio Formenti, Jacquelyn Mountcastle & Erich D. Jarvis
Tree of Life, Welcome Sanger Institute, Cambridge, CB10 1SA, UK
William Chow, Kerstin Howe & Alan Tracey
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Fergal J. Martin & Denye N. Ogeh
National Centre for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
Jacqueline Smith & Richard I. Kuo
School of Biosciences, The University of Melbourne, Melbourne, VIC, 3052, Australia
Marilyn B. Renfree
Faculty of Veterinary Medicine, Hokkaido University, Sapporo, Hokkaido, 060-0818, Japan
Takashi Kimura & Yoshihiro Sakoda
New Zealand Fish & Game – Eastern Region, Rotorua, 3046, New Zealand
Department of Zoology, University of Otago, Dunedin, 9054, New Zealand
Hamish G. Spencer
Currumbin Wildlife Sanctuary, Currumbin, QLD, 4223, Australia
Centre for Ecology and Evolution in Microbial Model Systems (EEMiS), Linnaeus University, Kalmar, SE-391 82, Sweden
Animal ethics approval was obtained from the animal ethics unit, the University of Queensland’s research ethics office (ANFRA/311/18; SCMB/002/18). A wildlife research permit was obtained from the Queensland Department of Environment, Land, and Water Planning under the Wildlife Act 1975 (permit number: 10008904). For bone marrow tissue collection in New Zealand, we obtained permission from the Department of Conservation under Wildlife Act Authority on non-public conservation land (70705-FAU & 70709-DOA). Ethical approval to sample mute swans was obtained by the Linköping Animal Ethics board (permit 01125-2020).
KRS is a consultant for Sanofi, Roche, and NovoNordisk. The opinions and data presented in this manuscript are of the authors and are independent of these relationships. The other authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1: Supplementary Figure S1. Schematic overview of the process used to construct the black swan genome. Supplementary Figure S2. Schematic overview of the assembly pipeline version 1.5 of the vertebrate genome project used to construct the mute swan genome. Supplementary Figure S3. Hi-C contact maps for the black swan (a) and mute swan (b). Supplementary Figure S4. Schematic overview of the gene annotation pipeline. Supplementary Figure S5. The largest 34 chromosomes aligned between the black swan (left) and the mute swan (right). Supplementary Figure S6. Collinearity analysis of the Z chromosome between the mute swan (left) and black swan (right). Supplementary Figure S7. Black swan and duck synteny plot between the first chromosome. Supplementary Figure S8. MHC Class I (A) and MHC Class II (B) aligned regions of order-level consensus exons of the mute and black swan. Supplementary Figure S9. Relative locations of MHC complex and associated genes in black and mute swan chromosome 33. Supplementary Figure S10. TLR7 expression can be detected in ISO Seq analysis of the mute swan (A) but not the black swan (B). Supplementary Figure S11. IAV NP antigen distribution in tissues from a black swan naturally infected with A/Black swan/Akita/2/2016 (H5N6) in 2016 in Akita prefecture, Japan. Supplementary Figure S12. The successful culture of primary black swan endothelial cells confirmed by qRT-PCR, immunofluorescence and tube formation. Supplementary Figure S13. Differential regulation of DEGs linked to the “inactivation of MAPK” pathway in black swan endothelial cells in response to VN04 infection along with randomly selected GO terms.
Additional file 8: Supplementary Table S6. Immune gene families are contractive in black swans compared to mute swans. The number of genes in each immune gene sub-family identified is given in each column for the corresponding species.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.