Viruses take center stage in cellular evolution
© BioMed Central Ltd 2006
Published: 16 June 2006
The origins of viruses are shrouded in mystery, but advances in genomics and the discovery of highly complex giant DNA viruses have stimulated new hypotheses that DNA viruses were involved in the emergence of the eukaryotic cell nucleus, and that they are worthy of being considered as living organisms.
The reputedly intractable problem of the origin of viruses has long been neglected. In the modern literature, 'virus evolution' has come to refer to studies more akin to population genetics, such as the worldwide scrutiny of new polymorphisms appearing daily in the H5N1 avian flu virus, than to the fundamental question of where viruses come from. This is now rapidly changing, as a result of the coincidence of bold new ideas (and the revival of old ones), the unexpected spectacular features of some recently isolated giant viruses [2, 3], as well as the steady increase in the numbers of genomic sequences for 'regular' viruses and cellular organisms, which enhances the power of comparative genomics. After being considered non-living and relegated to the wings by most biologists, viruses are now center stage: they might have been there at the origin of DNA, might have played a central role in the emergence of the eukaryotic cell, and might even have been the cause of partitioning of biological organisms into the three domains of life: Bacteria, Archaea and Eukarya. In this article, I shall briefly survey some of the recent discoveries and the new evolutionary thoughts they have prompted, before adding to the discussion with a question of my own: what if we have totally missed the true nature of (at least some) viruses?
As of April 2006, more than 1,600 viral genomes have been sequenced, approximately equally divided between RNA and DNA viruses. In view of this fundamental difference in their genetic material (and thus in their replication mechanisms, size, genetic complexity, host range and other features) it is tempting to immediately rule out the idea that viruses are monophyletic, that is, that they derive from a common ancestor. That might not be so easy to do, however. Although there are many arguments in favor of the idea that RNA and DNA viruses were generated independently - RNA viruses first, in the context of the 'RNA world' theory - their genesis might have overlapped quite significantly either before or shortly after the Last Universal Common Ancestor (LUCA, the last unique ancestor of all cellular life, reviewed in ), allowing a non-negligible level of genome mixing. Indeed, several proteins have homologs in both RNA and DNA viruses, the most important of all being the jelly-roll capsid protein, the sole protein that is found in most viruses and not found in cellular organisms. Other components are shared between the two types of viruses, but these are considered to be the results of more recent lateral gene transfers; they include the chaperonin Hsp70, which is found in the giant double-stranded DNA (dsDNA) mimivirus and the positive-strand RNA closteroviruses.
Such back-and-forth eukaryogenesis-viriogenesis could readily explain the multiplicity of present-day virus lineages, together with their diversity in size, complexity and gene complement, as well as the apparent mixture of monophyly and polyphyly (descent from more than one ancestor) exhibited by the viral world. In this context, extant complex eukaryotic DNA viruses could have originated from iterative waves of nuclear viriogenesis. But we still need some initial 'seeding' virus, the one that, for instance, invented the prototype of the now nearly ubiquitous jelly-roll capsid protein. Reviving d'Herelle's initial 'virus first' hypothesis, Koonin and Martin paradoxically proposed that RNA viruses might have emerged even before the invention of individual cells, as selfish RNA replicons roaming prebiotic inorganic compartments. There is little chance, however, that this hypothesis could be scientifically proven anytime soon.
Also quite provocative is the idea that RNA viruses might be at the origin of DNA biochemistry [2, 18]. According to this scenario, RNA-based viruses infecting RNA-based cells would have acquired an RNA-to-DNA modification system to resist cellular RNA-degrading enzymes (the RNA equivalent of present-day bacterial restriction and modification systems). For this to happen, RNA viruses would have had to evolve the ribonucleotide reductase enzyme, to convert diphosphate-ribonucleotides to diphosphate-deoxyribonucleotides, and thymidylate synthase, to make dTMP from dUMP, the two key pathways in DNA synthesis. Cellular RNA was then replaced by DNA in the course of evolution because of its greater stability and the capacity for repair conferred by its double-stranded structure, allowing larger, more complex genomes to out-compete the RNA-based genomes of more primitive cells. Note that this scenario is nicely complementary to the viral eukaryogenesis hypothesis, the cellular RNA genes being progressively recruited within the newly acquired DNA-based 'nucleus' (see Figure 1). Interestingly, deoxyuridine is known to replace thymidine in the DNA of several bacteriophages.
Finally, in a paper that has already received much attention, Forterre promoted (ancient) viruses to another fundamental role: to have been at the origin of the three basic cellular domains. His 'three RNA cells, three DNA viruses' hypothesis explains firstly, why there are three discrete lineages of modern cells instead of a continuum; secondly, the existence of three canonical ribosomal patterns; and thirdly, the critical differences exhibited by the, nevertheless similar, eukaryotic and archaeal replication machineries. This is readily done by postulating that DNA technology was independently transferred by three different founder DNA viruses to RNA-based ancestors of the Archaea, Bacteria, and Eukarya respectively. The reduction in rates of evolution following the transition from an RNA to a DNA genome would have stabilized the three canonical versions of translation proteins that are still recognizable today.
If, for a moment, we put aside the paradoxical virus-first hypothesis, we are left with two more traditional (cell-first) hypotheses about the origin of viruses in general. One is the 'escape hypothesis', which views viruses as originating from cells by the escape of a minimal set of cellular components necessary to constitute an infectious selfish replicating system. The other is the 'reduction hypothesis', in which viruses would have derived from a cellular organism through a progressive loss of functions until it finally became a bona fide virus. In real life, unfortunately, this simple dichotomy will be blurred by the accretion of genes laterally transferred between viruses (or parasitic cellular organisms) sharing identical hosts, or directly captured from the virus hosts. In that respect, bacteriophages differ markedly from most eukaryotic dsDNA viruses by exhibiting massive recombinational reassortment and accretion of genes, most probably resulting from the existence of a prophage state integrated into the host genome. Yet 80% of the genes of dsDNA bacteriophages have no obvious homologs in microbial genomes, suggesting a large degree of evolutionary independence of the phage gene set. A much stricter genetic isolation is exhibited by the eukaryotic nucleocytoplasmic large dsDNA viruses (NCLDV), such as the giant Acanthamoeba polyphaga mimivirus, whose 1.2 Mb genome (911 genes) exhibits little evidence of horizontal transfer. This also holds true for the next-largest NCLDVs, alga-infecting phycodnaviruses (with known genome sequences in the 300-400 kb range) [24, 25]. Mimivirus also exhibits a high level of genomic coherence, as shown by the homogeneity of its nucleotide composition and the strict conservation of half of its promoter sequences.
As more genomes of large eukaryotic viruses are sequenced, new genes keep turning up, most of them with no obvious phylogenetic affinity with known hosts or extant cellular organisms. This simple observation is definitely more favorable to the idea that these large viruses arose from the reduction of a more complex ancestral (viral) genome, than to the hypothetical accretion of numerous exogenous genes (without recognizable origin) around a primitive minimal viral genome. Recent results on coccolithovirus EhV-86 illustrate this point very nicely. Until the 407 kb genome of EhV-86 was characterized, the trademark of all previously characterized phycodnaviruses (with smaller 320 kb genomes) compared with other NCLDVs was the absence of a virus-encoded transcription machinery (a lack of DNA-directed RNA polymerase). Obviously, the presence or absence of an RNA polymerase implies major differences in virus physiology. Unexpectedly, EhV-86 was found to encode its own six-subunit transcriptional machinery. Nevertheless, a phylogenetic analysis of 25 core genes common to NCLDVs firmly placed EhV-86 within the Phycodnaviridae clade. In this case, the loss of the transcription apparatus by the smaller phycodnaviruses, rather than the simultaneous gain of the six subunits of an RNA polymerase by EhV-86, appears much more likely.
The reduction hypothesis received a strong boost from the discovery and genomic characterization of A. polyphaga mimivirus, the first virus to largely overlap with the world of cellular organisms, in terms of both particle size and genome complexity. The finding of numerous virally encoded components of an incomplete translation apparatus strongly suggested a process of reductive evolution from an even more complex ancestor that was endowed with protein synthetic capability. Such an ancestor could either have evolved from an obligate intracellular parasitic cell (functionally similar to Rickettsia or Chlamydia), or be derived from the nucleus of a primitive eukaryote through the mechanism illustrated in Figure 1. If reduction is the scenario at the origin of mimivirus, it is most likely to apply to other NCLDVs, in particular to those exhibiting the closest phylogenetic affinity with mimivirus such as the Phycodnaviridae and Iridoviridae. Sequencing additional large genomes from representatives of these families should provide valuable insights about this postulated giant ancestor.
Conceptually, the analogy between a virus life cycle and the reproductive cycle of a nondividing organism can be extended further. Sensu August Weismann, the virus particle possesses all the properties of the Germen (the germline, the continuous immortal lineage responsible for carrying one generation to the next), whereas the transient virus factory exhibits all the properties of the Soma, the body or somatic cells. Also, according to Weismann, such a partition implies the phenomenon of aging: once the opportunity to pass germplasm on has passed (that is, once viral particles have been produced), there is no need to maintain the integrity of the somaplasm. In this interpretation, the virus factory now becomes the ultimate illustration of a disposable soma, vanishing immediately after viral particles have been produced. Nevertheless, I believe that the virus factory should be considered the actual virus organism when referring to a virus. Incidentally, in this interpretation the living nature of viruses is indisputable, on the same footing as intracellular bacterial parasites. Focusing on the structure of the virus factory rather than on the morphology of the virus particle might help us reach a better understanding of the evolutionary history of viruses.
A serious difficulty in the reductive hypothesis for the origin of viruses (when considered as particles) is to propose reasonable mechanisms by which a cell, even a highly parasitic cell, might switch from a cellular dividing mode to a host-supported particle-replication mode all at once. Focusing on viruses as cell-like factories rather than particles makes it much easier to conceive a gradual transition. I would like to propose the following scenario. The event committing a parasitic cell towards the reductive viral evolution pathway would be the loss of an essential component of its translation apparatus (for example, a ribosomal protein): the presence or absence of an encoded protein synthesis system clearly remains the last unambiguous genomic divide between the viral and the cellular worlds. In order to survive, the now translation-defective cell would have had to adopt new strategies to gain access to the ribosomes of its hosts. At the same time, this translation-defective cell could now dispense with the rest of its ribosome-encoding genes. Such an intermediate protoviral cell could survive in its original host while improving the design of a bona fide virus factory. Finally, a gamete-like genome-packaging process could emerge, following the acquisition of a capsid protein gene from an ancestral RNA virus. Such an event would allow the reduced cellular genome to be reproduced in many more copies, at the same time relieving the burden of maintaining the viability of the infected host cells. The soma-like virus factory could then become the transient organism we observe today.
In summary, the past few years have seen a spectacular renaissance of the field of viral evolution, prompted equally by the publication of increasingly bold theories on the origin of life, the realization that viruses are the dominant life form on Earth, an exponential increase in genomic data, and the serendipitous discovery of a few giant viruses. Viruses have come a long way from being unwanted inhabitants of the Tree of Life, to being given a central role in all major evolutionary transitions. The challenge is now to unify the many evolutionary scenarios that have been proposed, using hard facts and experimental data, without getting sidetracked by the many spectacular but anecdotal features that individual virus families have incorporated during their long and probably chaotic history.