A conifer genome spruces up plant phylogenomics
© BioMed Central Ltd 2013
Published: 27 June 2013
The Norway spruce genome provides key insights into the evolution of plant genomes, leading to testable new hypotheses about conifer, gymnosperm, and vascular plant evolution.
[Table: Genome sizes in land plants. Range (1C; pg)]
Like the genomes of other conifers, the Norway spruce genome is immense, despite a typical conifer chromosome number of 2N = 24: genome sizes for species of Picea range from 15.75 to 19.125 pg, and the draft genome for P. abies is 20 Gb (or about 20.5 pg). Based on analyses that use synonymous substitution rates to infer ancient whole-genome duplications (WGDs), P. abies lacks evidence of WGDs other than the one that predated all extant seed plants. The large genomes of Picea and other conifers have therefore arisen through mechanisms other than WGD: proliferation of long terminal repeat retrotransposons (LTR-RTs), including both the well-known Ty3-gypsy and Ty1-copia superfamilies of transposable elements, accounts for the 'obesity' of the P. abies genome and, by extension, that of other conifers.
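The two units used above (picograms of DNA and gigabases of sequence) can be related with a short calculation. The following is a minimal sketch, assuming the commonly used conversion of roughly 978 Mb per picogram; that factor and the function names are assumptions introduced for illustration, not values taken from this article.

```python
# Assumed conversion factor (not stated in the article):
# 1 pg of double-stranded DNA is roughly 978 Mb, i.e. 0.978 Gb.
GB_PER_PG = 0.978


def pg_to_gb(picograms: float) -> float:
    """Convert a genome size in picograms to gigabases."""
    return picograms * GB_PER_PG


def gb_to_pg(gigabases: float) -> float:
    """Convert a genome size in gigabases to picograms."""
    return gigabases / GB_PER_PG


if __name__ == "__main__":
    # Range reported for Picea species, in pg.
    for pg in (15.75, 19.125):
        print(f"{pg} pg is about {pg_to_gb(pg):.1f} Gb")
    # Draft P. abies assembly size, in Gb.
    print(f"20 Gb is about {gb_to_pg(20.0):.1f} pg")
```

Under this assumed factor, a 20 Gb genome corresponds to about 20.5 pg, which matches the figure quoted in the text.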
Furthermore, the two fundamental differences between angiosperms and other seed plants relate to reproduction and water-conducting ability, and the genes found in the Picea genome provide information on these systems. P. abies, as expected, lacks FLOWERING LOCUS T, a set of key activation genes for flowering in angiosperms, and contains an expanded set of FT/TFL1-like genes, which probably act as repressors of flowering. In contrast, the genetic control of water conduction is not as clear. Water transport in conifers is accomplished by cells called tracheids, whereas most angiosperms have more efficient conducting cells (vessels). Angiosperm-specific innovations in water conduction are controlled by a gene family (VASCULAR NAC DOMAIN, VND) that may have originated in gymnosperms, or possibly earlier - two VND genes were detected in P. abies, compared with seven in Arabidopsis.
The sequencing of the Norway spruce genome is a landmark development in our understanding of plant genomes. The gene space of conifers is not substantially different from that of angiosperms - despite the much larger conifer genomes. In fact, the number of predicted genes for essentially all sequenced plant genomes is approximately 25,000, regardless of genome size and the number of WGDs. Even the bladderwort Utricularia gibba (an angiosperm), with a genome size of only 77 Mb, has an estimated 28,500 genes, nearly the same as the number predicted for P. abies (28,345). Similarly, the sacred lotus Nelumbo nucifera (also an angiosperm), whose 929 Mb genome is more than 10-fold larger than that of U. gibba and about 20-fold smaller than that of P. abies, contains approximately 26,685 genes. The consistency of these three estimates is striking, especially as the three papers [5, 8, 9] appeared within a month of one another and followed community standards for gene annotation. Furthermore, although plant genomes are much larger and more complex than those of most animals, the numbers of genes in plant and animal genomes are similar, and gene number does not seem to be proportional to genome size. Finally, the Norway spruce genome has expanded by the slow and steady accumulation of LTR-RTs, a phenomenon also observed in pine genomes; this may reflect the lack of an efficient mechanism for eliminating these transposable elements.
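The decoupling of gene number from genome size can be made concrete with the three figures quoted above. The sketch below is illustrative only: it uses the genome sizes and gene counts as given in the text (treating 20 Gb as 20,000 Mb), and the names `GENOMES`, `genes_per_mb`, and `fold_range` are hypothetical helpers introduced here.

```python
# Genome size (Mb) and predicted gene count, as quoted in the text.
# The 20 Gb P. abies draft is expressed here as 20,000 Mb.
GENOMES = {
    "Utricularia gibba": (77.0, 28_500),
    "Nelumbo nucifera": (929.0, 26_685),
    "Picea abies": (20_000.0, 28_345),
}


def genes_per_mb(size_mb: float, gene_count: int) -> float:
    """Average number of predicted genes per megabase of genome."""
    return gene_count / size_mb


def fold_range(values: list[float]) -> float:
    """Ratio of the largest to the smallest value in a list."""
    return max(values) / min(values)


if __name__ == "__main__":
    for species, (size_mb, genes) in GENOMES.items():
        density = genes_per_mb(size_mb, genes)
        print(f"{species}: {genes} genes, {density:.1f} genes per Mb")

    sizes = [size for size, _ in GENOMES.values()]
    counts = [float(genes) for _, genes in GENOMES.values()]
    print(f"Genome size varies {fold_range(sizes):.0f}-fold")
    print(f"Gene count varies {fold_range(counts):.2f}-fold")
```

On these numbers, genome size spans more than two orders of magnitude while gene count varies by less than 10%, so gene density, rather than gene number, is what differs among the three species.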
Because the P. abies genome is the sole sequenced representative of the four extant lineages of gymnosperms (conifers, cycads, Ginkgo, and gnetophytes; Figure 1), it is difficult to infer which features are common to other conifers and which are unique to P. abies. However, by combining it with data for other land plants, including genomic resources available for other gymnosperms, we can begin to assemble an understanding of the features that are common to seed plants as a whole and those that are restricted to angiosperms. Further assembly and annotation are needed to understand genome structure and gene content in Norway spruce; genome sequences for additional conifers or other lineages of gymnosperms, despite their generally large size, should help clarify the uniqueness of the P. abies genome and provide the information needed for comparative studies that will enable the application of genomic data to forestry, breeding, and analysis of seed plant traits.
Despite the relatively small size of most plant genomes sequenced so far, extensive genetic and physical map resources have typically been required for the organization of the sequencing effort and the genome assembly. Therefore, for both scientific and practical reasons, sequencing efforts have focused mostly on genetic models with small genomes. However, as efforts to understand the evolution of plant genes and genomes expand, species will be sequenced solely because of their pivotal phylogenetic position - and these species will probably lack genetic and genomic resources.
Fortunately, the emergence of next-generation sequencing technologies, together with new strategies for genome assembly, now makes it possible to generate and assemble high-quality, cost-effective genome sequences for evolutionary models lacking genetic resources. For example, fluorescent in situ hybridization of bacterial artificial chromosomes or other probes, coupled with whole-genome mapping (optical mapping), can be used to guide and validate de novo genome assembly based on next-generation sequencing data. This strategy should be widely applicable to non-model plants with poor genomic resources, thus facilitating whole-genome sequencing and assembly for many other plant species.
Technical and analytical breakthroughs provide unprecedented opportunities for gaining new and fundamental insight into genome and organismal evolution across land plants. The Picea genome, at 20 Gb, takes us in a bold new direction. But which genomes should be sequenced next? A phylogenetic perspective can help identify future targets.
Viewing genome size across land-plant phylogeny reveals a dynamic pattern of genome size evolution, with an increase in genome size coincident with the origin of vascular plants, subsequent independent genome reductions in Selaginella and angiosperms, and further increases within some groups of angiosperms (such as monocots; Figure 1). Genome sizes are labile even within gymnosperms; from a large ancestral gymnosperm genome, independent increases occurred in Ephedra (a gnetophyte), Pinaceae, Pinus, and two non-Pinaceae conifers (Sciadopitys and Sequoia, the latter the only known polyploid conifer), and reductions occurred in Gnetum (a gnetophyte), Ginkgo, and most non-Pinaceae conifers. Monilophytes typically have very large genomes and high chromosome numbers, features typically attributed to ancient WGD. Although recent episodes of polyploidy have been documented in many fern genera, there is no compelling evidence for ancient WGD in any monilophyte lineage.
These patterns of genome size change raise intriguing questions about the evolution of plant genomes. Through analysis of the P. abies genome, we now know that conifer genomes have expanded through proliferation of LTR-RTs, but does that mechanism apply to other large genomes, such as those of other gymnosperms and monilophytes, especially Psilotaceae and Ophioglossaceae, which have even larger genomes than conifers? Are the large monilophyte genomes comparable in structure and function to the P. abies genome? Do other large plant genomes also have long introns, similar to those of P. abies? What are the ancestral features of vascular plant genomes? Is genomic obesity the result of LTR-RTs retained from the ancestral vascular plants? What is the structural and functional role of the large chromosomes associated with large genomes? What role, if any, has ancient WGD played in the evolution of large monilophyte genomes? Reductions in genome size have occurred independently in gymnosperms (for example, Gnetum), monilophytes (for example, Azolla, a water fern), and angiosperms: did genome downsizing occur by the same mechanism in each case, for example, by repression of transposable element expansion coupled with loss of genetic material?
With the P. abies genome as a reference, analysis of genomes from a cycad, Ginkgo, a gnetophyte, and other conifers could reveal which features of the P. abies genome are shared, ancestral features of all gymnosperms and which are unique to conifers. Inclusion of a leptosporangiate fern (Ceratopteris) and a member of Marattiales (Angiopteris), both of which have some genetic resources, and of a lycophyte with a large genome (Lycopodium) would facilitate testing hypotheses of ancestral patterns of genomic change in vascular plants and their underlying mechanisms.
Genome sequences carry the keys to understanding genotype-to-phenotype relationships - for features as diverse as morphological characters, biochemical pathways, transcriptional networks, stress response, and more. Increased strategic sampling of plant genomes from across land plants and their green algal relatives will yield unparalleled information on the genes and gene families responsible for the major transitions in plant evolutionary history - the move onto land and the origins of vascular tissue, seeds, and flowers - as well as the genes controlling traits that could be harnessed for human benefit.