Low frequency of paleoviral infiltration across the avian phylogeny
Genome Biology volume 15, Article number: 539 (2014)
Mammalian genomes commonly harbor endogenous viral elements. Due to a lack of comparable genome-scale sequence data, far less is known about endogenous viral elements in avian species, even though their small genomes may enable important insights into the patterns and processes of endogenous viral element evolution.
Through a systematic screening of the genomes of 48 species sampled across the avian phylogeny we reveal that birds harbor a limited number of endogenous viral elements compared to mammals, with only five viral families observed: Retroviridae, Hepadnaviridae, Bornaviridae, Circoviridae, and Parvoviridae. All nonretroviral endogenous viral elements are present at low copy numbers and in few species, with only endogenous hepadnaviruses widely distributed, although these have been purged in some cases. We also provide the first evidence for endogenous bornaviruses and circoviruses in avian genomes, although at very low copy numbers. A comparative analysis of vertebrate genomes revealed a simple linear relationship between endogenous viral element abundance and host genome size, such that the occurrence of endogenous viral elements in bird genomes is 6- to 13-fold less frequent than in mammals.
These results reveal that avian genomes harbor relatively small numbers of endogenous viruses, particularly those derived from RNA viruses, and hence are either less susceptible to viral invasions or purge them more effectively.
Vertebrate genomes commonly harbor retrovirus-like and non-retrovirus-like viral sequences, resulting from past chromosomal integration of viral DNA (or DNA copies of viral RNA) into host germ cells. Tracing the evolutionary histories of these endogenous viral elements (EVEs) can provide important information on the origin of their extant counterparts, and provide an insight into host genome dynamics. Recent studies have shown that these genomic ‘fossils’ can also influence the biology of their hosts, both beneficially and detrimentally; for example, by introducing novel genomic rearrangements, influencing host gene expression, and evolving into new protein-coding genes with cellular functions (that is, ‘gene domestication’).
Because integration into host genomes is intrinsic to the replication cycle of retroviruses, which employ reverse transcriptase (RT), it is no surprise that retroviruses are commonly found to have endogenous forms in a wide range of animal genomes. Indeed, most of the EVEs present in animal genomes are of retroviral origin - endogenous retroviruses (ERVs) - and all retroviral genera, with the exception of Deltaretrovirus, have been found to possess endogenous forms. Remarkably, recent studies have revealed the unexpected occurrence of non-retroviral elements in various animal genomes, including RNA viruses that lack a DNA form in their replication cycle. Since their initial discovery, EVEs in animal genomes have been documented for families of double-stranded (ds)DNA viruses (virus classification Group I) - Herpesviridae; single-stranded (ss)DNA viruses (Group II) - Circoviridae and Parvoviridae; negative-sense ssRNA viruses (Group V) - Bornaviridae and Filoviridae; ssRNA-RT viruses (Group VI) - Retroviridae; and dsDNA-RT viruses (Group VII) - Hepadnaviridae.
To date, most studies of animal EVEs have focused on mammals due to their relatively high density of sampling. In contrast, few studies on the EVEs present in avian species have been undertaken. The best-documented avian EVEs are endogenous hepadnaviruses. These virally derived elements were first described in the genome of a passerine bird - the zebra finch - and then in the genome of the budgerigar as well as some other passerines, and may have a Mesozoic origin in some cases. Also of note was the discovery of a great diversity of ERVs in the genomes of zebra finch, chicken and turkey, most of which remain transcriptionally active. In contrast, most mammalian ERVs are inert.
In this study, we systematically mined 48 avian genomes for EVEs of all viral families, as one of a body of companion studies on avian genomics. Importantly, our data set represents all 32 neognath and two of the five palaeognath orders, and thus represents nearly all major orders of extant birds. Such a large-scale data analysis enabled us to address a number of key questions in EVE evolution, namely (i) what types of viruses have left such genomic fossils across the avian phylogeny and in what frequencies, (ii) what are the respective frequencies of EVE inheritance between species and independent species-specific insertion, and (iii) what is the frequency and pattern of avian EVE infiltration compared with other vertebrates?
Genome scanning for avian endogenous viral elements
Our in silico genomic mining of the 48 avian genomes (Table S1 in Additional file 1) revealed the presence of five families of endogenous viruses - Retroviridae, Hepadnaviridae, Circoviridae, Parvoviridae, and Bornaviridae (Figure 1), almost all of which (>99.99%) were of retroviral origin. Only a single family of RNA viruses (Group V; the Bornaviridae) was present. Notably, three closely related oscine passerine birds - the American crow, medium ground-finch and zebra finch - possessed greater ERV copy numbers in their genomes than the avian average (Table 1; discussed in detail below), while their suboscine passerine relatives - rifleman and golden-collared manakin - possessed lower ERV numbers close to the avian average (Table 1) and occupied basal positions in the passerine phylogeny (Figure 1). Hence, there appears to have been an expansion of ERVs coincident with the species radiation of the suborder Passeri.
We next consider each of the EVE families in turn.
Endogenous viral elements related to the Retroviridae
As expected, ERVs were by far the most abundant EVE class in the avian genomes, covering the genera Alpha-, Beta-, Gamma-, and Epsilonretrovirus, with total ERV copy numbers ranging from 132 to 1,032. The greatest numbers of ERVs were recorded in the three oscine passerines (American crow, medium ground-finch and zebra finch, respectively) that exhibited EVE expansion (Table 1). ERVs related to beta- and gammaretroviruses were the most abundant in all avian genomes, as noted in an important earlier study of three avian genomes. In contrast, ERVs derived from epsilonretroviruses were extremely rare, with very few copies distributed (Additional file 2). We also found that ERVs related to alpharetroviruses were widely distributed in avian phylogeny, although with very low copy numbers. In accord with the overall genetic pattern among the EVEs, the three oscine passerines exhibited greater numbers of ERVs than other taxa (two- to three-fold higher than the average; Table 1). This suggests that an ERV expansion occurred in the oscine passerines subsequent to their split from the suboscines. Phylogenetic analysis revealed that this pattern was due to frequent invasions of similar beta- and gammaretroviruses in these species (Table 1; Additional file 2).
Strikingly, the avian and non-avian (American alligator, green turtle and anole lizard) genomes seldom shared orthologous sequences (that is, only a few avian sequences can be aligned with those of non-avians and without matching flanking regions) and all their ERVs were distantly related (Additional file 2), indicative of a lack of vertical or horizontal transmission among these vertebrates. In addition, no non-retroviral elements were found in the non-avian genomes using our strict mining pipeline.
Endogenous viral elements related to the Hepadnaviridae
Hepadnaviruses have very small genomes (approximately 3 kb) of partially double-stranded and partially single-stranded circular DNA. Their replication involves an RNA intermediate that is reverse transcribed in the cytoplasm and transported as cDNA back into the nucleus. Strikingly, we found endogenous hepadnaviral elements in all the avian genomes studied (Table S2 in Additional file 1), such that they were the most widely distributed non-retroviral EVEs recorded to date. In this context it is important to note that no mammalian endogenous hepadnaviruses have been described even though primates are major reservoirs for exogenous hepatitis B viruses.
Our phylogenetic analysis revealed a number of notable evolutionary patterns in the avian endogenous hepadnaviruses: (i) endogenous hepadnaviruses exhibited a far greater phylogenetic diversity, depicted as diverse clades, than their exogenous relatives (Additional file 3), suggesting they were older, although an acceleration in evolutionary rates among some hepadnaviral EVEs cannot be excluded; (ii) exogenous hepadnaviruses formed a tight monophyletic group compared with the endogenous elements (Additional file 3), indicative of a turnover of exogenous viruses during avian evolution; (iii) there was a marked difference in copy number (from 1 to 68) among avian species (Table S2 in Additional file 1), suggestive of the frequent gain and loss of viruses during avian evolution; and (iv) there was a phylogeny-wide incongruence between the virus tree (Additional file 3) and the host tree (P = 0.233 using the ParaFit method), indicative of multiple independent genomic integration events as well as potential cross-species transmission events.
Despite the evidence for independent integration events, it was also clear that some hepadnavirus EVEs were inherited from a common ancestor of related avian groups, and perhaps over deep evolutionary time-scales. We documented these cases by looking for pairs of endogenous hepadnaviruses from different avian hosts that received strong (>70%) bootstrap support (Data S1 in Additional file 4) and which occupied orthologous locations. Specifically: (i) in the genomes of the white-tailed and bald eagles, the 5′ end of a hepadnavirus EVE was flanked by the same unknown gene while the 3′ end was flanked by the dendritic cell immunoreceptor (DCIR) gene (Additional file 3); (ii) an EVE shared by the emperor penguin and Adelie penguin (Additional file 3) was flanked by the same unknown gene at the 5′ end and the Krueppel-like factor 8-like gene at the 3′ end; and (iii) the ostrich and the great tinamou had the same flanking genes, albeit of unknown function, at both ends of an EVE.
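The two criteria used above (strong bootstrap support and a shared genomic location) can be written as a simple predicate. The sketch below is only an illustration of that logic: the `Insertion` record, the placeholder gene label `unknown_gene_1` and the bootstrap value of 95 are hypothetical and are not taken from the supplementary data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """One endogenous viral element and the host genes flanking it."""
    host: str
    flank_5p: str  # gene adjacent to the 5' end of the element
    flank_3p: str  # gene adjacent to the 3' end of the element


def is_candidate_ortholog(a, b, bootstrap, min_bootstrap=70.0):
    """True if two elements from different hosts group with strong
    bootstrap support and share both flanking genes."""
    return (
        a.host != b.host
        and bootstrap > min_bootstrap
        and a.flank_5p == b.flank_5p
        and a.flank_3p == b.flank_3p
    )


# Hypothetical records modelled on the eagle example in the text.
white_tailed = Insertion("Haliaeetus albicilla", "unknown_gene_1", "DCIR")
bald = Insertion("Haliaeetus leucocephalus", "unknown_gene_1", "DCIR")
shared = is_candidate_ortholog(white_tailed, bald, bootstrap=95.0)
```

Requiring both flanks to match is a deliberately conservative choice in this sketch; a real pipeline would compare aligned flanking sequence rather than gene labels.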
We also recorded a rare case of vertical transmission of a hepadnavirus with a complete genome that has seemingly been inherited by 31 species (Table S2 in Additional file 1) prior to the diversification of the Neoaves 73 million years ago. This virus has been previously denoted as eZHBV_C, and was flanked by the furry homolog (FRY) gene at both the 5′ and 3′ ends. Our hepadnavirus phylogeny (Figure 2) showed that this EVE group clustered tightly with extremely short internal branches, although with some topological patterns that were inconsistent with the host topology (Figure 1). A lack of phylogenetic resolution notwithstanding, this mismatch between the virus and host trees could also be due in part to incomplete lineage sorting, in which there has been insufficient time for allele fixation during the short time period between bird speciation events. Indeed, Neoaves are characterized by a rapid species radiation.
Strikingly, we observed that two Galliformes species, chicken and turkey, have seemingly purged their hepadnaviral EVEs. Specifically, genomic mining revealed no hepadnaviral elements in these Galliformes, even though their closest relatives (Anseriformes) contained such elements. In support of this genome purging, we noted that one hepadnaviral element present in the mallard genome has been severely degraded through frequent mutation in the chicken genome (Additional file 5). In addition, remnants of orthologous 5′ and 3′ regions could also be found in the turkey genome, although the rest of the element was deleted (Additional file 5).
Endogenous viral elements related to the Bornaviridae
Bornaviruses (family Bornaviridae) are linear, unsegmented negative-sense ssRNA viruses with genomes of approximately 9 kb. They are unusual among animal RNA viruses in their ability to replicate within the host cell nucleus, which in turn assists endogenization. Indeed, orthomyxoviruses and some insect rhabdoviruses also replicate in the nucleus and both have been found to occur as endogenous forms in insect genomes. Endogenous elements of bornaviruses, denoted endogenous bornavirus-like N (EBLN) and endogenous bornavirus-like L (EBLL), have been discovered in mammalian genomes, including humans, and those present in primates have been dated to have arisen more than 40 million years ago. Although exogenous bornaviruses circulate in both mammals and birds and cause fatal diseases, endogenous bornaviruses have not yet been documented in avian species.
We report, for the first time, that both EBLN and EBLL are present in several avian genomes (Additional file 6), although in only three species and with very low copy numbers (1 to 4; Table S3 in Additional file 1): the Anna’s hummingbird, the closely related chimney swift, and the more distantly related downy woodpecker. Both EBLN and EBLL in the genome of Anna’s hummingbird were divergent compared with other avian or mammalian viruses. The chimney swift possessed a copy of EBLN, which was robustly grouped in the phylogenetic tree with the EVE present in Anna’s hummingbird (Figure S4A in Additional file 6). However, because these viral copies did not share the same flanking regions in the host genomes, and because the EBLN (Figure S4A in Additional file 6) and EBLL (Figure S4C in Additional file 6) of Anna’s hummingbird occupied inconsistent phylogenetic positions, they likely represent independent integration events. In addition, due to the close relationships among some of the viruses in different species, it is possible that cross-species transmission has occurred because of shared geographical distributions (for example, woodpeckers are widely distributed across the United States, with geographic distributions that overlap with those of Anna’s hummingbirds). The EBLN in the downy woodpecker was likely to have entered the host genome recently, as in the phylogenetic tree it was embedded within the genetic diversity of exogenous viruses; the same pattern was observed in the case of the two viral copies in the genome of Anna’s hummingbird (Figure S4B in Additional file 6). Similar to previous studies in mammals, we found that more species have incorporated EBLN than EBLL. However, compared with their wide distribution in mammalian genomes, it was striking that only three avian species carried endogenous bornavirus-like elements.
Endogenous viral elements related to the Circoviridae
Circoviruses (family Circoviridae) possess approximately 2 kb ssDNA, nonenveloped and unsegmented circular genomes, and replicate in the nucleus via a rolling circle mechanism. They are known to infect birds and pigs and can cause a wide range of severe symptoms such as Psittacine circovirus disease. There are two main open reading frames, usually arranged in an ambisense orientation, that encode the replication (Rep) and capsid (Cap) proteins. Endogenous circoviruses (eCiVs) are rare, and to date have only been reported in four mammalian genomes, with circoviral endogenization in carnivores dating back at least 42 million years.
We found circoviruses to be incorporated into only four avian genomes - medium ground-finch, kea, little egret, and great tinamou - and at copy numbers of only 1 to 2 (Additional file 7; Table S5 in Additional file 1). There were at least two divergent groups of eCiVs in the viral phylogenetic tree, one in the medium ground-finch and great tinamou (Figure S5A-C in Additional file 7), which was closely related to exogenous avian circoviruses, and another in the little egret and kea (Figure S5C,D in Additional file 7), which was only distantly related to avian exogenous counterparts. The large phylogenetic distances among these endogenous viruses are suggestive of independent episodes of viral incorporation. In addition, two pieces of evidence strongly suggested that eCiVs in the medium ground-finch and great tinamou (Figure S5A-C in Additional file 7) have only recently entered host genomes: (i) they had close relationships with their exogenous counterparts, and (ii) they maintained complete (or nearly complete) open reading frames (Table S5 in Additional file 1).
Endogenous viral elements related to the Parvoviridae
The family Parvoviridae comprises two subfamilies - Parvovirinae and Densovirinae - that infect diverse vertebrates and invertebrates, respectively. Parvoviruses typically possess linear, non-segmented ssDNA genomes with an average size of approximately 5 kb, and replicate in the nucleus. Parvoviruses have been documented in a wide range of hosts, including humans, and can cause a range of diseases. Recent studies revealed that endogenous parvoviruses (ePaVs) have been broadly distributed in mammalian genomes, with integration events dating back at least 40 million years.
We found multiple entries of ePaVs with very low copy numbers (1 to 3; Table S5 in Additional file 1) in 10 avian genomes (Additional file 8), and they were not as widely distributed as those parvoviruses present in mammalian genomes. All avian ePaVs were phylogenetically close to exogenous avian parvoviruses with the exception of a single one from the brown mesite, which was distantly related to all known animal parvoviruses (Additional file 8). We also found several cases of apparently vertical transmission. For example, one common ePaV in the American crow and rifleman was flanked by the same unknown host gene; the viral copy in the golden-collared manakin and zebra finch was flanked by the tyrosine-protein phosphatase non-receptor type 13 (PTPN13) gene at the 5′ end and the same unknown gene at the 3′ end; and one viral element in the little egret and Dalmatian pelican was flanked by the same chicken repeat 1 (CR1) at the 5′ end and collagen alpha 1 gene (COL14A1) at the 3′ end (Data S2 in Additional file 4). These findings suggest both independent integration and vertical transmission (that is, common avian ancestry) for ePaVs that have seemingly existed in birds for at least 30 million years (that is, the separation time of Corvus and Acanthisitta).
Low frequency of retroviral endogenous viral elements in bird genomes
To determine the overall pattern and frequency of infiltration of EVEs in the genomes of birds, American alligator, green turtle, anole lizard, and mammals, we documented the phylogeny-wide abundance of long terminal repeat (LTR)-retrotransposons of retrovirus-like origin. As retroviral elements comprise >99.99% of avian EVEs they obviously represent the most meaningful data set to explore patterns of EVE evolution. This analysis revealed that retroviral EVEs are far less common in birds than in mammals: the average retroviral proportion of the genome was 1.12% (range 0.16% to 3.57%) in birds, 2.39% to 11.41% in mammals, and 0.80% to 4.26% in the genomes of American alligator, green turtle and anole lizard (Tables S6 and S7 in Additional file 1). Strikingly, there was also a simple linear relationship between host genome size and EVE proportion (R² = 0.787, P = 0.007; Figure 3). Of equal note was the observation that EVE copy numbers in bird genomes were an order of magnitude less frequent than in mammals (Figure 4; Tables S6 and S7 in Additional file 1), and that the relationship between viral copy number and host genome size exhibited a linear trend (R² = 0.780, P < 0.001). Importantly, in all cases (that is, genome size versus proportion and genome size versus copy number) we employed phylogenetic regression analyses to account for the inherent phylogenetic non-independence of the data points.
Discussion and conclusions
Although a diverse array of viruses can possess endogenous forms, our analysis revealed that they are uncommon in avian genomes, especially those derived from RNA viruses. Indeed, among RNA viruses, we found that only bornavirus endogenized forms occurred in avian genomes, and these had a sporadic distribution and very low frequencies. Although bird genomes are approximately one-third to one-half the size of those of mammals, the proportion of their genomes that comprises EVEs and their EVE copy numbers are 6 and 13 times lower, respectively. It is generally acknowledged that the genome size reduction associated with flying avian species evolved in the saurischian dinosaur lineage. Our broad-scale genomic screening also suggested that a low frequency of EVEs was an ancestral trait in the avian lineage, especially in the case of ERVs, such that there has been an expansion of EVE numbers in mammals concomitant with an increase in their genome sizes. Also of note was that although some genomic integration events in birds were vertical, allowing us to estimate an approximate time-scale for their invasion over many millions of years, by far the most common evolutionary pattern in the avian data was the independent integration of EVEs into different species/genera.
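A rough calculation makes the point that the shortfall in copy number exceeds what the smaller genome alone would predict. The sketch below uses only the approximate figures quoted in the text (mammalian genomes two to three times larger, copy numbers 13 times higher); the function name is illustrative, and the result is a back-of-the-envelope bound rather than a value reported in the study.

```python
def density_fold(copy_number_fold, genome_size_fold):
    """Fold difference in copies per unit of genome, given the fold
    difference in total copy number and in genome size."""
    return copy_number_fold / genome_size_fold


# Mammals versus birds: 13-fold more copies, genomes 2- to 3-fold larger.
low = density_fold(13, 3)   # birds one-third the size of mammals
high = density_fold(13, 2)  # birds one-half the size of mammals
print(f"EVE copies per unit genome are about {low:.1f}- to {high:.1f}-fold lower in birds")
```

Under these assumptions the per-genome-length density of copies remains several-fold lower in birds, which is of the same order as the roughly six-fold lower genomic proportion reported above.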
There are a variety of reasons why EVE numbers could be so relatively low in avian genomes. First, it is theoretically possible that birds have been exposed to fewer viral infections than mammals. However, this seems unlikely as, although they are likely to have been examined less intensively than mammals, exogenous viruses of various kinds are found in avian species (for example, Coronaviridae, Flaviviridae, Hepadnaviridae, Orthomyxoviridae, Paramyxoviridae, Poxviridae, Retroviridae). In addition, the most common phylogenetic pattern we noted was that of independent integration, suggesting the presence of diverse exogenous infections. However, it is notable that mammals apparently harbor a more diverse set of exogenous retroviruses than birds, as well as a greater abundance of ERVs, which is indicative of a deep-seated evolutionary interaction between host and virus. For example, the only gammaretrovirus known in birds is reticuloendotheliosis virus (REV), and a recent study suggested that avian REVs have a mammalian origin. This is consistent with our observation that there are no endogenized forms of REVs among this diverse set of avian genomes.
It is also possible that birds are in some way refractory to EVE integration following viral infection. ERVs can replicate both as retrotransposons and as viruses via infection as well as re-infection. Although bird cells are known to be susceptible to certain retroviruses, the replication of avian ERVs within the host genome could be suppressed, at least in part, by host-encoded factors. However, a general conclusion of our study is that non-retroviral EVEs are seemingly rare in all vertebrates, such that their integration appears to be generically difficult, and the relative abundance of endogenous retroviruses in birds (albeit low compared with mammals) indicates that they are able to enter bird genomes, with some being actively transcribed and translated. Our observation of a lineage-specific ERV expansion in three passerines also argues against a general refractory mechanism.
A third explanation is that birds are particularly efficient at purging EVEs, especially those of retroviral origin, from their genomes, a process that we effectively ‘caught in the act’ in the case of the galliform hepadnaviruses. Indeed, our observation of a very low frequency of LTR-retrotransposons in avian genomes may reflect the action of a highly efficient removal mechanism, such as a form of homologous recombination. Hence, it is likely that active genome purging is responsible for some of the relative absence of EVEs in birds, in turn retaining a selectively advantageous genomic compactness. Clearly, additional work is needed to determine which of these, or other, mechanisms explain the low EVE numbers in avian genomes.
Materials and methods
Genome sequencing and assembly
To systematically study endogenous viral elements in birds, we mined the genomes of 48 avian species (Table S1 in Additional file 1). Of these, three genomes - chicken, zebra finch and turkey - were downloaded from Ensembl. The remaining genomes were acquired as part of our avian comparative genomics and phylogenomics consortium. All genomes can be obtained from our two databases: CoGe and Phylogenomics Analysis of Birds. American alligator, green turtle, anole lizard, and 20 mammal genomes (Table S7 in Additional file 1) were downloaded from Ensembl and used for genomic mining and the subsequent comparative analysis.
Chromosome and whole genome shotgun assemblies of all species (Table S1 in Additional file 1) were downloaded and screened in silico using tBLASTn and a library of representative viral protein sequences derived from Groups I to VII (dsDNA, ssDNA, dsRNA, +ssRNA, -ssRNA, ssRNA-RT, and dsDNA-RT) of the 2009 ICTV (International Committee on Taxonomy of Viruses) species list (Additional file 9). All viral protein sequences were used for genomic mining. Host genome sequences that generated significant (E-values <1e-5) matches to viral peptides were extracted. Matches similar to host proteins were filtered and discarded. The sequences were considered virus-related if they unambiguously matched viral proteins in the NCBI nr (non-redundant) database and the PFAM database. The putative viral gene structures were inferred using GeneWise. The in silico mining of LTR-retrotransposons was performed using RepeatMasker.
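The first filtering step of such a screen can be sketched as follows, assuming the tBLASTn results were written in the standard 12-column tabular format (BLAST+ `-outfmt 6`). Only the E-value threshold of 1e-5 comes from the text; the merging of overlapping hits into candidate loci, the function names and the example rows are illustrative assumptions, and the host-protein filter and database confirmation steps are not shown.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str     # viral protein identifier
    subject: str   # host scaffold or chromosome identifier
    start: int     # leftmost subject coordinate (1-based)
    end: int       # rightmost subject coordinate
    evalue: float


def parse_hits(lines, max_evalue=1e-5):
    """Keep tabular BLAST rows whose E-value is below the threshold."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])
        if evalue >= max_evalue:
            continue
        # Subject coordinates are reversed for minus-strand matches.
        sstart, send = int(fields[8]), int(fields[9])
        hits.append(Hit(fields[0], fields[1],
                        min(sstart, send), max(sstart, send), evalue))
    return hits


def merge_loci(hits):
    """Collapse overlapping hits on the same scaffold into candidate loci."""
    loci = []
    for hit in sorted(hits, key=lambda h: (h.subject, h.start)):
        if loci and loci[-1][0] == hit.subject and hit.start <= loci[-1][2]:
            subject, start, end = loci[-1]
            loci[-1] = (subject, start, max(end, hit.end))
        else:
            loci.append((hit.subject, hit.start, hit.end))
    return loci


# Hypothetical rows: two significant overlapping hits and one weak hit.
example = [
    "polA\tscaf1\t45.0\t200\t100\t2\t1\t200\t1000\t1600\t1e-30\t120",
    "polB\tscaf1\t40.0\t150\t90\t1\t10\t160\t2000\t1500\t2e-8\t60",
    "capX\tscaf2\t30.0\t50\t35\t0\t1\t50\t10\t160\t0.5\t20",
]
hits = parse_hits(example)
loci = merge_loci(hits)
```

Merging before extraction avoids counting one integrated element several times when it matches more than one viral protein in the query library.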
To establish the phylogenetic positions of the avian EVEs, particularly in comparison with their exogenous counterparts, we collected all relevant reference viral sequences (Table S9 in Additional file 1) from GenBank. Protein sequences (both EVEs and exogenous viruses) were aligned using MUSCLE and checked manually. Phylogenetic trees were inferred using the maximum likelihood method available in PhyML 3.0, incorporating the best-fit amino acid substitution models determined by ProtTest 3. The robustness of each node in the tree was determined using 1,000 bootstrap replicates. We subdivided our viral data into 16 categories for phylogenetic analysis (see Results): 1) endogenous hepadnaviruses, using both complete and partial P (polymerase) protein sequences from positions 429 to 641 (reference sequence DHBV, NC_001344); 2) EBLN, using partial N (nucleoprotein) protein sequences, from positions 43 to 224 (BDV, NC_001607); 3) EBLL, using partial L (RNA-dependent RNA polymerase) protein sequences, from positions 121 to 656; 4) eCiV Cap, using complete Cap (capsid) protein sequences (GooCiV, NC_003054); 5) eCiV Rep data set 1, using complete Rep (replicase) protein sequences; 6) eCiV Rep data set 2, using partial Rep protein sequences, from positions 160 to 228; 7) eCiV Rep data set 3, using partial Rep protein sequences, from positions 8 to 141; 8) ePaV Cap data set 1, using partial Cap protein sequences, from positions 554 to 650 (DucPaV, NC_006147); 9) ePaV Cap data set 2, using partial Cap protein sequences, from positions 406 to 639; 10) ePaV Cap data set 3, using partial Cap protein sequences, from positions 554 to 695; 11) ePaV Cap data set 4, using partial Cap protein sequences, from positions 662 to 725; 12) ePaV Rep data set 1, using partial Rep protein sequences, from positions 104 to 492; 13) ePaV Rep data set 2, using partial Rep protein sequences, from positions 245 to 383; 14) ePaV Rep data set 3, using partial Rep protein sequences, from positions 300 to 426; 15) ePaV Rep data set 4, using partial Rep protein sequences, from positions 1 to 40; and 16) ERVs, using the retroviral motif ‘DTGA-YMDD’ of Pro-Pol sequences. The best-fit models of amino acid substitution in each case were: 1) JTT + Γ; 2) JTT + Γ; 3) LG + Γ; 4) RtREV + Γ; 5) LG + I + Γ; 6) LG + Γ; 7) LG + I + Γ; 8) LG + Γ; 9) WAG + I + Γ; 10) LG + Γ; 11) LG + Γ; 12) LG + Γ; 13) LG + I + Γ; 14) LG + I + Γ; 15) LG + Γ; and 16) JTT + Γ.
To account for the phylogenetic relationships of avian taxa when investigating patterns of EVE evolution we employed phylogenetic linear regression as implemented in R. Specifically, using Mesquite we manually created a tree that matched the host vertebrate phylogeny. For the subsequent phylogenetic regression analysis we utilized the ‘phylolm’ package in R, which provides a function for fitting phylogenetic linear regression and phylogenetic logistic regression.
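The idea behind phylogenetic linear regression can be illustrated with generalized least squares, in which the residual covariance between two taxa is proportional to the branch length they share. The sketch below is not the ‘phylolm’ implementation: it assumes a Brownian-motion covariance matrix supplied by the user, fits a single predictor with an intercept, and uses a hypothetical four-taxon tree and made-up values.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def pgls(x, y, cov):
    """Return (intercept, slope) minimising the GLS criterion for
    y = intercept + slope * x with residual covariance proportional to cov."""
    ones = [1.0] * len(x)
    ci_one = solve(cov, ones)  # C^-1 applied to the intercept column
    ci_x = solve(cov, x)       # C^-1 applied to the predictor column
    a11 = sum(ci_one)
    a12 = sum(ci_x)
    a22 = sum(xi * v for xi, v in zip(x, ci_x))
    b1 = sum(v * yi for v, yi in zip(ci_one, y))
    b2 = sum(v * yi for v, yi in zip(ci_x, y))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


# Hypothetical tree ((A,B),(C,D)): sister taxa share half of the root-to-tip path.
cov = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
genome_size = [1.0, 1.2, 2.5, 3.0]             # made-up predictor values
eve_measure = [3.0, 3.4, 6.0, 7.0]             # made-up response, exactly 1 + 2x
intercept, slope = pgls(genome_size, eve_measure, cov)
```

With an identity covariance matrix the same function reduces to ordinary least squares, which is a convenient check that the phylogenetic weighting is the only difference between the two fits.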
The extent of co-divergence between viruses and hosts was tested by using ParaFit, as implemented in the COPYCAT package. The significance of the test was derived from 99,999 randomizations of the association matrix.
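How a P value follows from such randomizations can be shown with a generic permutation calculation. The sketch below does not compute the ParaFit statistic itself; it assumes the common convention of counting the observed value as one member of the reference distribution, and the function name and toy numbers are illustrative.

```python
def permutation_p_value(observed, null_stats):
    """One-sided permutation P value: the share of statistics, including
    the observed one, that are at least as large as the observed value."""
    at_least = sum(1 for value in null_stats if value >= observed)
    return (at_least + 1) / (len(null_stats) + 1)


# Toy example: two of four permuted statistics reach the observed value.
p_toy = permutation_p_value(5.0, [1.0, 2.0, 6.0, 7.0])

# With 99,999 randomizations the smallest attainable P value is 1/100,000.
p_min = permutation_p_value(1.0, [0.0] * 99_999)
```

Under this convention the resolution of the test is set by the number of randomizations, so 99,999 permutations allow P values to be reported to five decimal places.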
Data can be accessed via GigaDB. Alternatively, the IDs of NCBI BioProject/Sequence Read Archive (SRA)/study are as follows: Chaetura pelagica, PRJNA210808/SRA092327/SRP026688; Calypte anna, PRJNA212866/SRA096094/SRP028275; Charadrius vociferus, PRJNA212867/SRA096158/SRP028286; Corvus brachyrhynchos, PRJNA212869/SRA096200/SRP028317; Cuculus canorus, PRJNA212870/SRA096365/SRP028349; Manacus vitellinus, PRJNA212872/SRA096507/SRP028393; Opisthocomus hoazin, PRJNA212873/SRA096539/SRP028409; Picoides pubescens, PRJNA212874/SRA097131/SRP028625; Struthio camelus, PRJNA212875/SRA097407/SRP028745; Tinamus guttatus, PRJNA212876/SRA097796/SRP028753; Acanthisitta chloris, PRJNA212877/SRA097960/SRP028832; Apaloderma vittatum, PRJNA212878/SRA097967/SRP028834; Balearica regulorum, PRJNA212879/SRA097970/SRP028839; Buceros rhinoceros, PRJNA212887/SRA097991/SRP028845; Antrostomus carolinensis, PRJNA212888/SRA098079/SRP028883; Cariama cristata, PRJNA212889/SRA098089/SRP028884; Cathartes aura, PRJNA212890/SRA098145/SRP028913; Chlamydotis macqueenii, PRJNA212891/SRA098203/SRP028950; Colius striatus, PRJNA212892/SRA098342/SRP028965; Eurypyga helias, PRJNA212893/SRA098749/SRP029147; Fulmarus glacialis, PRJNA212894/SRA098806/SRP029180; Gavia stellata, PRJNA212895/SRA098829/SRP029187; Haliaeetus albicilla, PRJNA212896/SRA098868/SRP029203; Haliaeetus leucocephalus, PRJNA237821/SRX475899, SRX475900, SRX475901, SRX475902/SRP038924; Leptosomus discolor, PRJNA212897/SRA098894/SRP029206; Merops nubicus, PRJNA212898/SRA099305/SRP029278; Mesitornis unicolor, PRJNA212899/SRA099409/SRP029309; Nestor notabilis, PRJNA212900/SRA099410/SRP029311; Pelecanus crispus, PRJNA212901/SRA099411/SRP029331; Phaethon lepturus, PRJNA212902/SRA099412/SRP029342; Phalacrocorax carbo, PRJNA212903/SRA099413/SRP029344; Phoenicopterus ruber, PRJNA212904/SRA099414/SRP029345; Podiceps cristatus, PRJNA212905/SRA099415/SRP029346; Pterocles gutturalis, PRJNA212906/SRA099416/SRP029347; Tauraco erythrolophus, PRJNA212908/SRA099418/SRP029348; Tyto alba, PRJNA212909/SRA099419/SRP029349; Nipponia nippon, PRJNA232572/SRA122361/SRP035852; Egretta garzetta, PRJNA232959/SRA123137/SRP035853. The following IDs were released before this study: Aptenodytes forsteri, PRJNA235982/SRA129317/SRP035855; Pygoscelis adeliae, PRJNA235983/SRA129318/SRP035856; Gallus gallus, PRJNA13342/SRA030184/SRP005856; Taeniopygia guttata, PRJNA17289/SRA010067/SRP001389; Meleagris gallopavo, PRJNA42129/Unknown/Unknown; Melopsittacus undulatus, PRJEB1588/ERA200248/ERP002324; Anas platyrhynchos, PRJNA46621/SRA010308/SRP001571; Columba livia, PRJNA167554/SRA054954/SRP013894; Falco peregrinus, PRJNA159791/SRA055082/SRP013939; Geospiza fortis, PRJNA156703/SRA051234/SRP011940.
EBLL: endogenous bornavirus-like L
EBLN: endogenous bornavirus-like N
EVE: endogenous viral element
SRA: Sequence Read Archive
Weiss RA: The discovery of endogenous retroviruses. Retrovirology. 2006, 3: 67. 10.1186/1742-4690-3-67.
Katzourakis A, Gifford RJ: Endogenous viral elements in animal genomes. PLoS Genet. 2010, 6: e1001191. 10.1371/journal.pgen.1001191.
Kazazian HH: Mobile elements: drivers of genome evolution. Science. 2004, 303: 1626-1632. 10.1126/science.1089670.
Jern P, Coffin JM: Effects of retroviruses on host genome function. Annu Rev Genet. 2008, 42: 709-732. 10.1146/annurev.genet.42.110807.091501.
Emerman M, Malik HS: Paleovirology–modern consequences of ancient viruses. PLoS Biol. 2010, 8: e1000301-10.1371/journal.pbio.1000301.
Feschotte C, Gilbert C: Endogenous viruses: insights into viral evolution and impact on host biology. Nat Rev Genet. 2012, 13: 283-296. 10.1038/nrg3199.
Stoye JP: Studies of endogenous retroviruses reveal a continuing evolutionary saga. Nat Rev Microbiol. 2012, 10: 395-406.
Herniou E, Martin J, Miller K, Cook J, Wilkinson M, Tristem M: Retroviral diversity and distribution in vertebrates. J Virol. 1997, 71: 437-443.
Gilbert C, Feschotte C: Genomic fossils calibrate the long-term evolution of hepadnaviruses. PLoS Biol. 2010, 8: e1000495-10.1371/journal.pbio.1000495.
Cui J, Holmes EC: Endogenous Hepadnaviruses in the genome of the budgerigar (Melopsittacus undulatus) and the evolution of avian hepadnaviruses. J Virol. 2012, 86: 7688-7691. 10.1128/JVI.00769-12.
Suh A, Brosius J, Schmitz J, Kriegs JO: The genome of a Mesozoic paleovirus reveals the evolution of hepatitis B viruses. Nat Commun. 2013, 4: 1791-10.1038/ncomms2798.
Bolisetty M, Blomberg J, Benachenhou F, Sperber G, Beemon K: Unexpected diversity and expression of avian endogenous retroviruses. mBio. 2012, 3: e00344-12-10.1128/mBio.00344-12.
Zhang G, Li C, Li Q, Li B, Larkin DM, Lee C, Storz JF, Antunes A, Greenwold MJ, Meredith RW, Ödeen A, Cui J, Zhou Q, Xu L, Pan H, Wang Z, Jin L, Zhang P, Hu H, Yang W, Hu J, Xiao J, Yang Z, Liu Y, Xie Q, Yu H, Lian J, Wen P, Zhang F, Li H, et al: Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014, 346: 1311-1320. 10.1126/science.1251385.
Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SY, Faircloth BC, Nabholz B, Howard JT, Suh A, Weber CC, da Fonseca RR, Li J, Zhang F, Li H, Zhou L, Narula N, Liu L, Ganapathy G, Boussau B, Bayzid MS, Zavidovych V, Subramanian S, Gabaldón T, Capella-Gutiérrez S, Huerta-Cepas J, Rekepalli B, Munch K, Schierup M, et al: Whole-genome analyses resolve early branches in the tree of life of modern birds. Science. 2014, 346: 1320-1331. 10.1126/science.1253451.
Robertson BH, Margolis HS: Primate hepatitis B viruses - genetic diversity, geography and evolution. Rev Med Virol. 2002, 12: 133-141. 10.1002/rmv.348.
Hackett SJ, Kimball RT, Reddy S, Bowie RC, Braun EL, Braun MJ, Chojnowski JL, Cox WA, Han KL, Harshman J, Huddleston CJ, Marks BD, Miglia KJ, Moore WS, Sheldon FH, Steadman DW, Witt CC, Yuri T: A phylogenomic study of birds reveals their evolutionary history. Science. 2008, 320: 1763-1768. 10.1126/science.1157704.
Horie M, Honda T, Suzuki Y, Kobayashi Y, Daito T, Oshida T, Ikuta K, Jern P, Gojobori T, Coffin JM, Tomonaga K: Endogenous non-retroviral RNA virus elements in mammalian genomes. Nature. 2010, 463: 84-87. 10.1038/nature08695.
Belyi VA, Levine AJ, Skalka AM: Unexpected inheritance: multiple integrations of ancient bornavirus and ebolavirus/marburgvirus sequences in vertebrate genomes. PLoS Pathog. 2010, 6: e1001030-10.1371/journal.ppat.1001030.
de la Torre JC: Molecular biology of borna disease virus: prototype of a new group of animal viruses. J Virol. 1994, 68: 7669-7675.
VandeWoude S, Richt JA, Zink MC, Rott R, Narayan O, Clements JE: A borna virus cDNA encoding a protein recognized by antibodies in humans with behavioral diseases. Science. 1990, 250: 1278-1281. 10.1126/science.2244211.
Holmes EC: The evolution of endogenous viral elements. Cell Host Microbe. 2011, 10: 368-377. 10.1016/j.chom.2011.09.002.
Belyi VA, Levine AJ, Skalka AM: Sequences from ancestral single-stranded DNA viruses in vertebrate genomes: the parvoviridae and circoviridae are more than 40 to 50 million years old. J Virol. 2010, 84: 12458-12462. 10.1128/JVI.01789-10.
Lehmann HW, von Landenberg P, Modrow S: Parvovirus B19 infection and autoimmune disease. Autoimmun Rev. 2003, 2: 218-223. 10.1016/S1568-9972(03)00014-4.
Finnegan DJ: Retrotransposons. Curr Biol. 2012, 22: R432-R437. 10.1016/j.cub.2012.04.025.
Organ CL, Shedlock AM, Meade A, Pagel M, Edwards SV: Origin of avian genome size and structure in non-avian dinosaurs. Nature. 2007, 446: 180-184. 10.1038/nature05621.
Animal Genome Size Database. [http://www.genomesize.com/]
Lipkin WI: The changing face of pathogen discovery and surveillance. Nat Rev Microbiol. 2013, 11: 133-141. 10.1038/nrmicro2949.
Cui J, Tachedjian M, Wang L, Tachedjian G, Wang LF, Zhang S: Discovery of retroviral homologs in bats: implications for the origin of mammalian gammaretroviruses. J Virol. 2012, 86: 4288-4293. 10.1128/JVI.06624-11.
Niewiadomska AM, Gifford RJ: The extraordinary evolutionary history of the reticuloendotheliosis viruses. PLoS Biol. 2013, 11: e1001642-10.1371/journal.pbio.1001642.
Griffin DK, Robertson LB, Tempest HG, Skinner BM: The evolution of the avian genome as revealed by comparative molecular cytogenetics. Cytogenet Genome Res. 2007, 117: 64-77. 10.1159/000103166.
International Chicken Genome Sequencing Consortium: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004, 432: 695-716. 10.1038/nature03154.
Warren WC, Clayton DF, Ellegren H, Arnold AP, Hillier LW, Künstner A, Searle S, White S, Vilella AJ, Fairley S, Heger A, Kong L, Ponting CP, Jarvis ED, Mello CV, Minx P, Lovell P, Velho TA, Ferris M, Balakrishnan CN, Sinha S, Blatti C, London SE, Li Y, Lin YC, George J, Sweedler J, Southey B, Gunaratne P, Watson M, et al: The genome of a songbird. Nature. 2010, 464: 757-762. 10.1038/nature08819.
Dalloul RA, Long JA, Zimin AV, Aslam L, Beal K, Le Blomberg A, Bouffard P, Burt DW, Crasta O, Crooijmans RP, Cooper K, Coulombe RA, De S, Delany ME, Dodgson JB, Dong JJ, Evans C, Frederickson KM, Flicek P, Florea L, Folkerts O, Groenen MA, Harkins TT, Herrero J, Hoffmann S, Megens HJ, Jiang A, de Jong P, Kaiser P, Kim H, et al: Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis. PLoS Biol. 2010, 8: e1000475-10.1371/journal.pbio.1000475.
Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, García-Girón C, Gordon L, Hourlier T, Hunt S, Juettemann T, Kähäri AK, Keenan S, Komorowska M, Kulesha E, Longden I, Maurel T, McLaren WM, Muffato M, Nag R, Overduin B, Pignatelli M, Pritchard B, Pritchard E, et al: Ensembl 2013. Nucleic Acids Res. 2013, 41: D48-D55. 10.1093/nar/gks1236.
CoGe database. [http://genomevolution.org/CoGe/]
Phylogenomics analysis of birds. [http://phybirds.genomics.org.cn/]
King AMQ, Adams MJ, Carstens EB, Lefkowitz EJ: Virus Taxonomy: Classification and Nomenclature of Viruses: Ninth Report of the International Committee on Taxonomy of Viruses. 2012, Elsevier Academic Press, San Diego
RefSeq: NCBI reference sequence database. [http://www.ncbi.nlm.nih.gov/refseq/]
Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer EL, Eddy SR, Bateman A: The Pfam protein families database. Nucleic Acids Res. 2010, 38: D211-D222. 10.1093/nar/gkp985.
Birney E, Clamp M, Durbin R: GeneWise and genomewise. Genome Res. 2004, 14: 988-995. 10.1101/gr.1865504.
Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0. [http://www.repeatmasker.org]
Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucleic Acids Res. 2013, 41: D36-D42. 10.1093/nar/gks1195.
Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010, 59: 307-321. 10.1093/sysbio/syq010.
Darriba D, Taboada GL, Doallo R, Posada D: ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 2011, 27: 1164-1165. 10.1093/bioinformatics/btr088.
The R Project for Statistical Computing. [http://www.r-project.org]
Maddison WP, Maddison DR: Mesquite: a modular system for evolutionary analysis. Version 2.75. [http://mesquiteproject.org]
Lindblad-Toh K, Garber M, Zuk O, Lin MF, Parker BJ, Washietl S, Kheradpour P, Ernst J, Jordan G, Mauceli E, Ward LD, Lowe CB, Holloway AK, Clamp M, Gnerre S, Alföldi J, Beal K, Chang J, Clawson H, Cuff J, Di Palma F, Fitzgerald S, Flicek P, Guttman M, Hubisz MJ, Jaffe DB, Jungreis I, Kent WJ, Kostka D, Lara M, et al: A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011, 478: 476-482. 10.1038/nature10530.
Ho LST, Ané C: A linear-time algorithm for Gaussian and non-Gaussian trait evolution models. Syst Biol. 2014, 63: 397-408. 10.1093/sysbio/syu005.
Legendre P, Desdevises Y, Bazin E: A statistical test for host-parasite coevolution. Syst Biol. 2002, 51: 217-234. 10.1080/10635150252899734.
Meier-Kolthoff JP, Auch AF, Huson DH, Göker M: COPYCAT: cophylogenetic analysis tool. Bioinformatics. 2007, 23: 898-900. 10.1093/bioinformatics/btm027.
The avian phylogenomic project data. [http://gigadb.org/dataset/101000]
We thank the avian comparative genomics and phylogenomics consortium for providing the avian genomes sequenced. We thank Mang Shi, The University of Sydney, and Cai Li, BGI-Shenzhen, for statistical advice. ECH is supported by an NHMRC Australia Fellowship and by NIH grant R01 GM080533. We thank two reviewers for informative comments.
The authors declare that they have no competing interests.
JC, ECH and GZ designed the research, which was coordinated by EDJ and MTPG; EDJ, MTPG and GZ provided genome data; JC, WZ and ZH analyzed the data; JC and ECH drafted the complete manuscript, with sections of text contributed by EDJ, MTPG, PJW, and GZ. All authors read and approved the final manuscript.
Electronic supplementary material
Additional file 1: Table S1: Avian genomes used for genomic mining. Table S2. Endogenous hepadnaviruses in avian genomes. Table S3. Endogenous bornaviruses in avian genomes. Table S4. Endogenous circoviruses in avian genomes. Table S5. Endogenous parvoviruses in avian genomes. Table S6. LTR-retrotransposon composition of avian genomes. Table S7. LTR-retrotransposon composition of American alligator, green turtle, anole lizard and mammalian genomes. Table S9. Reference sequences used for phylogenetic analyses. (DOCX 77 KB)
Additional file 2: Figure S1: Phylogenetic tree of endogenous retroviruses (ERVs). The tree was inferred using the conserved motif ‘DTGA-YMDD’ within the Pro-Pol region of retroviruses (approximately 320 amino acids in length, although this differs among retrovirus genera). Bootstrap values lower than 70% are not shown; single asterisks indicate values higher than 70%, while double asterisks indicate values higher than 90%. Branch lengths are drawn to a scale of amino acid substitutions per site (subs/site). The tree is midpoint rooted for purposes of clarity only. The host name indicates the species from which the ERV was obtained. Exogenous retroviruses are highlighted using family names. ERVs of alligator, turtle and lizard origin are also highlighted. (PDF 637 KB)
Additional file 3: Figure S2: Phylogenetic tree of exogenous and endogenous avian hepadnaviruses. Bootstrap values lower than 70% are not shown; single asterisks indicate values higher than 70%, while double asterisks indicate values higher than 90%. Branch lengths are drawn to a scale of amino acid substitutions per site (subs/site). The tree is midpoint rooted for purposes of clarity only. The exogenous hepadnaviruses are highlighted. Avian host species names are used to denote avian endogenous hepadnaviruses, and different EVEs from the same host are numbered. All abbreviations are provided in Table S9 in Additional file 1. (PDF 416 KB)
Additional file 4: Data S1. Alignments of the orthologous hepadnaviral scaffolds. Data S2. Alignments of the orthologous parvoviral scaffolds. (DOCX 180 KB)
Additional file 5: Figure S3: Alignment of a hepadnaviral element in the genome of mallard duck with orthologous (and partial) sequences found in the genomes of chicken and turkey. Note that we found a 94% match to the 5′ conserved region (marked as C) in turkey, and a 39% match to the orthologous chicken sequence; 45% of the central 12,042-bp virus-like sequence matched the 5′ variable region (marked as V). The relatively conserved nucleotides in chicken showing virus-like characteristics are boxed. Asterisks indicate the conserved nucleotides in the alignment, dashes denote deletions. (PDF 258 KB)
Additional file 6: Figure S4: Phylogenetic trees of endogenous and exogenous bornaviruses. The phylogenies contain (A) endogenous bornavirus-like N (nucleoprotein) (EBLN) and (B) avian endogenous bornavirus-like L (RNA-dependent RNA polymerase) (EBLL) sequences. Bootstrap values lower than 70% are not shown; single asterisks indicate values higher than 70%, while double asterisks indicate values higher than 90%. Branch lengths are drawn to a scale of amino acid substitutions per site (subs/site). The trees are midpoint rooted for purposes of clarity only. Avian host species names for those that harbor EVEs are given in parentheses and different EVEs from the same host are numbered. All abbreviations are provided in Table S9 in Additional file 1. (PDF 203 KB)
Additional file 7: Figure S5: Phylogenetic trees of endogenous circoviruses. (A-D) The phylogenies contain avian endogenous circoviruses (eCiVs) Cap (A) and Rep (B-D). Bootstrap values lower than 70% are not shown; single asterisks indicate values higher than 70%, while double asterisks indicate values higher than 90%. Branch lengths are drawn to a scale of amino acid substitutions per site (subs/site). The trees are midpoint rooted for purposes of clarity only. Avian host species names for those that harbor EVEs are given in parentheses. All abbreviations are provided in Table S9 in Additional file 1. (PDF 259 KB)
Additional file 8: Figure S6: Phylogenetic trees of endogenous and exogenous parvoviruses. (A-H) The phylogenies contain avian endogenous parvoviruses (ePaVs) Cap (A-D) and Rep (E-H). Bootstrap values lower than 70% are not shown; single asterisks indicate values higher than 70%, while double asterisks indicate values higher than 90%. Branch lengths are drawn to a scale of amino acid substitutions per site (subs/site). The trees are midpoint rooted for purposes of clarity only. Avian host species names for those that harbor EVEs are given in parentheses and different EVEs from the same host are numbered. All abbreviations are provided in Table S9 in Additional file 1. (PDF 406 KB)
Cui, J., Zhao, W., Huang, Z. et al. Low frequency of paleoviral infiltration across the avian phylogeny. Genome Biol 15, 539 (2014). https://doi.org/10.1186/s13059-014-0539-3