
Open access to tree genomes: the path to a better forest


An open-access culture and a well-developed comparative-genomics infrastructure must be established for forest trees if the full potential of genome sequencing is to be realized in this diverse group of plants, which are the dominant species in many of the earth's terrestrial ecosystems.

Opportunities and challenges in forest tree genomics are seemingly as diverse and as large as the trees themselves; however, here, we have chosen to focus on the potential significant impact on all of tree biology research if only an open-access culture and comparative-genomics infrastructure were developed. In earlier articles [1, 2], we argued that the great diversity of forest trees found in both the undomesticated and domesticated state provides an excellent opportunity to understand the molecular basis of adaptation in plants and furthermore that comparative-genomic approaches will greatly facilitate discovery and understanding. We identified several priority research areas towards realizing these goals (Box 1), such as establishing reference genome sequences for important tree species, determining how to apply sequencing technologies to understand adaptation, and developing resources for storing and accessing forestry data. Significant progress has been made in many of these priorities, with the exception of investments in database resources and understanding ecological functions. Here, we briefly summarize the rapid progress in developing genomic resources in a small number of species and then offer our view on what we believe it will take to realize the final two priorities.

The great diversity found in forest trees

There are an estimated 60,000 tree species on earth, and approximately 30 of the 49 plant orders contain tree species. Clearly, the tree phenotype has evolved many times in plants. The diversity of plant structures, development, life history, environments occupied and so on in trees is nearly as broad as in higher plants in general, but trees share the common characteristic that all are perennial and many are very long lived. Because plants are sessile, each tree must survive and reproduce in a specific environment over the seasonal cycles of its lifetime. This tight association between individual genotypes and their environment provides a powerful research setting, just as it has driven the evolution of a plethora of uniquely arboreal adaptations. Understanding these evolutionary strategies is a long-standing area of study for tree biologists, with many broader biological implications.

Completed and current genome-sequencing projects in forest trees are limited to about 25 species from just 4 of more than 100 families: Pinaceae (pines, spruces and firs), Salicaceae (poplars and willows), Myrtaceae (eucalyptus) and Fagaceae (oaks, chestnuts and beeches). Large-scale sequencing projects such as the 1000 Human Genomes [3], 1000 Plant Genomes (1KP) [4] or the 5000 Insect Genome (i5k) [5] projects have not yet been proposed for forest trees.

Rapidly developing genomic resources in forest trees

Genome resources are developing rapidly in forest trees in spite of the challenges associated with working with large, long-lived organisms and sometimes very large genomes [2]. Complete genome sequencing, however, has been slow to advance in forest trees owing to funding limitations and the large size of conifer genomes. Black cottonwood (Populus trichocarpa Torr. & Gray) was the first forest tree to have its genome sequenced, in a project led by the US Department of Energy Joint Genome Institute (DOE/JGI) [6] (Table 1). Black cottonwood has a relatively small genome (450 Mb) and is a target feedstock species for cellulosic ethanol production, and thus fits into the DOE/JGI priority of sequencing bioenergy feedstock species. The genus Populus has 30+ species (aspens and cottonwoods) with genome sizes of approximately 500 Mb. Several species are being sequenced by DOE/JGI and by other groups around the world, and it seems likely that all members of the genus will soon have a genome sequence (Table 1). The next forest tree to be sequenced was the flooded gum (Eucalyptus grandis BRASUZ1, a member of the Myrtaceae family), again by DOE/JGI. Eucalyptus species and their hybrids are important commercial species grown in their native Australia and in many regions throughout the southern hemisphere. Several more eucalyptus species are being sequenced (Table 1), each with a relatively small genome (500 Mb), but it will probably take many years before all 700+ members of this genus are completed. Several members of the Fagaceae family are now being sequenced (Table 1). This group includes the oaks, beeches and chestnuts, which have genome sizes of less than 1 Gb.

Table 1 Genome resources in forest trees

The gymnosperm forest trees (such as the conifers) were the last to enter the world of genome sequencing. This delay was entirely due to their very large genomes (10 Gb and greater), not to any lack of importance: conifers are extremely important economically and ecologically, and phylogenetically they represent the ancient sister lineage to the angiosperms. The genome resources needed to support a sequencing project were reasonably well developed, but it was not until the introduction of next-generation sequencing (NGS) technologies that sequencing conifer genomes became tractable. Currently, there are at least ten conifer (Pinaceae) genome-sequencing projects under way (Table 1).

Aside from reference genome sequencing in forest trees, there is significant activity in transcriptome sequencing and resequencing for polymorphism discovery (Tables 2 and 3). We have only listed the transcriptome and resequencing projects in Table 1 that are associated with a species that has an active genome-sequencing project.

Table 2 Transcriptome resources in forest trees
Table 3 Polymorphism resources in forest trees

The opportunity for comparative-genomic approaches in forest trees

The power of comparative-genomic approaches for understanding function in an evolutionary framework is well established [7-13]. Comparative genomics can be applied to sequence data (nucleotide and protein) at the level of individual genes or genome-wide. Genome-wide approaches provide insight into both chromosome evolution and the diversification of biological functions and interactions.
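At the level of an individual gene, one of the simplest comparative measures is the percent identity between two sequences that have already been aligned. The following is a minimal sketch of that calculation, not a method used in any project discussed here; the function name and the example sequences are hypothetical, and the input is assumed to be a pre-computed pairwise alignment that uses "-" for gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        # Columns with a gap in either sequence are not scored.
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments from two species: 7 of 8 ungapped columns match.
print(percent_identity("ATGGCC-TA", "ATGACCGTA"))  # 87.5
```

Excluding gapped columns is only one of several conventions for reporting identity, so values computed this way are comparable only with values computed under the same convention.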

Understanding of gene function in forest tree species is challenged by the lack of standard reverse-genetic tools routinely used in other systems - for example, standard marker stocks, facile transformation and regeneration - and by the long generation times. Thus, comparative genomics becomes the more powerful approach to understanding gene function in trees.

Comparative genomics requires not only data availability but also cyber-infrastructure to support exchange and analysis. The TreeGenes database is the most comprehensive resource for comparative-genomic analyses in forest trees [14]. Several smaller databases have been created to facilitate collaborations, including: Fagaceae genomics web, Quercus portal, PineDB, ConiferGDB, EuroPineDB, PopulusDB, PoplarDB, EucalyptusDB and Eucanext (Tables 1, 2, and 3). These resources vary greatly in their scope, relevance and integration. Some are static and archival, whereas others focus on current sequence content for a specific species or a small number of related species. This results in overlapping and conflicting data among repositories. In addition, each database uses its own custom interfaces and back-end database technology to serve sequence data to the user. The US National Science Foundation funding for large-scale infrastructure projects, such as iPlant, is leading efforts aimed towards centralizing resources for research communities [15]. Without centralized resources, researchers are forced to employ inefficient data-mining methods through queries of independently maintained databases or inconsistently formatted supplemental files on journal websites. Specific areas of interest for the forest tree genomic community include the ability to connect sequence, genotype and phenotype to individual, geo-referenced trees. This type of integration can only be achieved through web services that allow disparate resources to communicate in ways that are transparent to the user [16]. With the recent increase of genome sequences available for many of these species, there is a need to facilitate community-level annotation and research support.
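To make the integration problem concrete, the following is a minimal sketch of joining location, genotype and phenotype records on a shared tree identifier. It is an illustration only, not the design of TreeGenes, CartograTree or any other resource named above: the identifiers, field names and values are hypothetical, and the three input tables stand in for what independently maintained databases might export.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TreeRecord:
    """All data linked to one individual, geo-referenced tree."""
    tree_id: str
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    genotypes: Dict[str, str] = field(default_factory=dict)     # marker -> call
    phenotypes: Dict[str, float] = field(default_factory=dict)  # trait -> value


def integrate(locations, genotypes, phenotypes) -> Dict[str, TreeRecord]:
    """Merge three independently maintained tables on a shared tree_id."""
    merged: Dict[str, TreeRecord] = {}

    def get(tree_id: str) -> TreeRecord:
        # Create the record the first time any source mentions this tree.
        return merged.setdefault(tree_id, TreeRecord(tree_id))

    for tree_id, lat, lon in locations:
        get(tree_id).location = (lat, lon)
    for tree_id, marker, call in genotypes:
        get(tree_id).genotypes[marker] = call
    for tree_id, trait, value in phenotypes:
        get(tree_id).phenotypes[trait] = value
    return merged


def fully_linked(merged: Dict[str, TreeRecord]) -> List[str]:
    """IDs of trees that have a location, a genotype and a phenotype."""
    return sorted(
        rec.tree_id
        for rec in merged.values()
        if rec.location is not None and rec.genotypes and rec.phenotypes
    )


# Hypothetical exports from three separate resources.
locations = [("T001", 38.54, -121.74), ("T002", 44.05, -123.09)]
genotypes = [("T001", "snp_0001", "A/G"), ("T003", "snp_0001", "G/G")]
phenotypes = [("T001", "height_m", 21.5), ("T002", "height_m", 18.0)]

records = integrate(locations, genotypes, phenotypes)
print(fully_linked(records))  # ['T001']
```

Only one of the three trees ends up with all three kinds of data, which is the situation described above: without shared identifiers and a common exchange layer, most records remain only partly connected.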

The need for a better-developed open-access culture in forest tree genomics research

The Human Genome Project established a culture of open access and data sharing in genomics research for both humans and animal models that has been extended to many other species, including Arabidopsis, rat, cow, dog, rice, maize and more than 500 other eukaryotes. Beginning in the late 1990s, these large-scale projects released data very rapidly to the scientific community, often years before publication. This rapid release of data with few restrictions has allowed thousands of scientists to begin work on specific genes and gene families, and on functional studies, long before the genome papers have appeared. One of the driving motivations for this culture, and the reason that many scientists support it, is that large-scale sequencing can be done most efficiently when centers that have expertise in sequencing technology take the lead. With all the sequencing concentrated, the body of data needs to be shared freely in order to get it in the hands of the widely distributed experts. This open-access culture has dramatically accelerated scientific progress in biological research.

The path to success avoids delays

Careful inspection of Table 1 reveals that forest tree genome projects are very slow to release sequence data into the public domain. Once a project is finished and submitted for publication, a draft genome becomes available - for example, the poplar genome was released and published in 2006. However, pre-publication releases are infrequent, exceptions being the PineRefSeq project, which has made three releases, and the SMarTForest project, which has made one (Table 1). This is unfortunate because good-quality sequence contigs and scaffolds could be made available years before publication, delivering an extremely important resource to the community. Such delay would be understandable for privately financed projects seeking commercial advantage, but nearly all the projects listed in Table 1 are financed by public funds whose stated mission is to advance science and develop community resources. Publication rights are easily protected by data-use policy statements such as the Ft Lauderdale [17] and Toronto [18] agreements, but unfortunately these conventions are not often used and data access is restricted by password-protected websites (Tables 1, 2, and 3). We hope the opinion offered here will lead to a discussion in the forest tree community, to a more open-access culture and thus to a more vibrant and rapidly advancing research area.

Box 1

Research priorities in forest tree genomics identified in earlier Opinion papers.

From Neale and Ingvarsson [1]:

  • Deep expressed-sequence tag (EST) sequencing in many species

  • Comparative resequencing in many species

  • Reference genome sequence for pine

From Neale and Kremer [2]:

  • Reference genome sequences for several important species

  • Greater investment in diverse species towards understanding ecological function

  • Application of next-generation sequencing technologies to understand adaptation using landscape genomic approaches

  • Greater investment in database resources and cyber-infrastructure development

  • Development of new and high-throughput phenotyping technologies



Abbreviations

EST: expressed-sequence tag

NGS: next-generation sequencing.


  1.

    Neale DB, Ingvarsson PK: Population, quantitative and comparative genomics of adaptation in forest trees. Curr Opin Plant Biol. 2008, 11: 149-155. 10.1016/j.pbi.2007.12.004.

  2.

    Neale DB, Kremer A: Forest tree genomics: growing resources and applications. Nat Rev Genet. 2011, 12: 111-122.

  3.

    The 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA: An integrated map of genetic variation from 1,092 human genomes. Nature. 2012, 491: 56-65. 10.1038/nature11632.

  4.

    The 1KP Project. []

  5.

    i5k Insect and other Arthropod Genome Sequencing Initiative. []

  6.

    Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, et al: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006, 313: 1596-1604. 10.1126/science.1128691.

  7.

    The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.

  8.

    Batzoglou S, Pachter L, Mesirov JP, Berger B, Lander ES: Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res. 2000, 10: 950-958. 10.1101/gr.10.7.950.

  9.

    Zdobnov EM, von Mering C, Letunic I, Torrents D, Suyama M, Copley RR, Christophides GK, Thomasova D, Holt RA, Subramanian GM, Mueller HM, Dimopoulos G, Law JH, Wells MA, Birney E, Charlab R, Halpern AL, Kokoza E, Kraft CL, Lai Z, Lewis S, Louis C, Barillas-Mury C, Nusskern D, Rubin GM, Salzberg SL, Sutton GG, Topalis P, Wides R, Wincker P, et al: Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science. 2002, 298: 149-159. 10.1126/science.1077061.

  10.

    El-Sayed NM, Myler PJ, Blandin G, Berriman M, Crabtree J, Aggarwal G, Caler E, Renauld H, Worthey EA, Hertz-Fowler C, Ghedin E, Peacock C, Bartholomeu DC, Haas BJ, Tran AN, Wortman JR, Alsmark UC, Angiuoli S, Anupama A, Badger J, Bringaud F, Cadag E, Carlton JM, Cerqueira GC, Creasy T, Delcher AL, Djikeng A, Embley TM, Hauser C, Ivens AC, et al: Comparative genomics of trypanosomatid parasitic protozoa. Science. 2005, 309: 404-409. 10.1126/science.1112181.

  11.

    Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, Sackton TB, Larracuente AM, Singh ND, Abad JP, Abt DN, Adryan B, Aguade M, Akashi H, Anderson WW, Aquadro CF, Ardell DH, Arguello R, Artieri CG, Barbash DA, Barker D, Barsanti P, Batterham P, Batzoglou S, Begun D, et al: Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007, 450: 203-218. 10.1038/nature06341.

  12.

    Koonin EV, Wolf YI: Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res. 2008, 36: 6688-6719. 10.1093/nar/gkn668.

  13.

    Carlton JM, Adams JH, Silva JC, Bidwell SL, Lorenzi H, Caler E, Crabtree J, Angiuoli SV, Merino EF, Amedeo P, Cheng Q, Coulson RM, Crabb BS, Del Portillo HA, Essien K, Feldblyum TV, Fernandez-Becerra C, Gilson PR, Gueye AH, Guo X, Kang'a S, Kooij TW, Korsinczky M, Meyer EV, Nene V, Paulsen I, White O, Ralph SA, Ren Q, Sargeant TJ, et al: Comparative genomics of the neglected human malaria parasite Plasmodium vivax. Nature. 2008, 455: 757-763. 10.1038/nature07327.

  14.

    Wegrzyn JL, Lee JM, Tearse BR, Neale DB: TreeGenes: A forest tree genome database. Int J Plant Genomics. 2008, 2008: 412875.

  15.

    Goff SA, Vaughn M, McKay S, Lyons E, Stapleton AE, Gessler D, Matasci N, Wang L, Hanlon M, Lenards A, Muir A, Merchant N, Lowry S, Mock S, Helmke M, Kubach A, Narro M, Hopkins N, Micklos D, Hilgert U, Gonzales M, Jordan C, Skidmore E, Dooley R, Cazes J, McLay R, Lu Z, Pasternak S, Koesterke L, Piel WH, et al: The iPlant Collaborative: Cyberinfrastructure for Plant Biology. Front Plant Sci. 2011, 2: 34.

  16.

    Vasquez-Gross HA, Yu JJ, Figueroa B, Gessler DD, Neale DB, Wegrzyn JL: CartograTree: connecting tree genomes, phenotypes and environment. Mol Ecol Resour. 2013, 13: 528-537. 10.1111/1755-0998.12067.

  17.

    The Wellcome Trust: Sharing data from large-scale biological research projects: a system of tripartite responsibility. []

  18.

    Toronto International Data Release Workshop Authors, Birney E, Hudson TJ, Green ED, Gunter C, Eddy S, Rogers J, Harris JR, Ehrlich SD, Apweiler R, Austin CP, Berglund L, Bobrow M, Bountra C, Brookes AJ, Cambon-Thomsen A, Carter NP, Chisholm RL, Contreras JL, Cooke RM, Crosby WL, Dewar K, Durbin R, Dyke SO, Ecker JR, El Emam K, Feuk L, Gabriel SB, Gallacher J, Gelbart WM, et al: Prepublication data sharing. Nature. 2009, 461: 168-170.

  19.

    Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucleic Acids Res. 2013, 41: D36-D42. 10.1093/nar/gks1195.

  20.

    Accelerating Pine Genomics. []

  21.

    Kovach A, Wegrzyn JL, Parra G, Holt C, Bruening GE, Loopstra CA, Hartigan J, Yandell M, Langley CH, Korf I, Neale DB: The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences. BMC Genomics. 2010, 11: 420. 10.1186/1471-2164-11-420.

  22.

    Magbanua ZV, Ozkan S, Bartlett BD, Chouvarine P, Saski CA, Liston A, Cronn RC, Nelson CD, Peterson DG: Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine. PLoS One. 2011, 6: e16214. 10.1371/journal.pone.0016214.

  23.

    PineRefSeq. []

  24.

    Jermstad KD, Eckert A, Wegrzyn J, Delfino-Mix A, Davis DA, Burton DC, Neale DB: Comparative mapping in Pinus: sugar pine (Pinus lambertiana Dougl.) and loblolly pine (Pinus taeda L.). Tree Genet Genomes. 2011, 7: 457-468. 10.1007/s11295-010-0347-1.

  25.

    Eckert AJ, Bower AD, Wegrzyn JL, Pande B, Jermstad KD, Krutovsky KV, St Clair JB, Neale DB: Association genetics of coastal Douglas fir (Pseudotsuga menziesii var. menziesii, Pinaceae). I. Cold-hardiness related traits. Genetics. 2009, 182: 1289-1302. 10.1534/genetics.109.102350.

  26.

    Promoting Conifer Genomic Resources. []

  27.

    Lepoittevin C, Frigerio JM, Garnier-Gere P, Salin F, Cervera MT, Vornam B, Harvengt L, Plomion C: In vitro vs in silico detected SNPs for the development of a genotyping array: what can we learn from a non-model species?. PLoS One. 2010, 5: e11034. 10.1371/journal.pone.0011034.

  28.

    Genome Research and Education center. Siberian Federal University. []

  29.

    Dillon SK, Nolan M, Li W, Bell C, Wu HX, Southerton SG: Allelic variation in cell wall candidate genes affecting solid wood properties in natural populations and land races of Pinus radiata. Genetics. 2010, 185: 1477-1487. 10.1534/genetics.110.116582.

  30.

    Chen J, Kallman T, Ma X, Gyllenstrand N, Zaina G, Morgante M, Bousquet J, Eckert A, Wegrzyn J, Neale D, Lagercrantz U, Lascoux M: Disentangling the roles of history and local selection in shaping clinal variation of allele frequencies and gene expression in Norway spruce (Picea abies). Genetics. 2012, 191: 865-881. 10.1534/genetics.112.140749.

  31.

    ConGenIE. []

  32.

    Index of public Picea Glauca Release. []

  33.

    Hamberger B, Hall D, Yuen M, Oddy C, Keeling CI, Ritland C, Ritland K, Bohlmann J: Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defense reveal insights into a conifer genome. BMC Plant Biol. 2009, 9: 106. 10.1186/1471-2229-9-106.

  34.

    Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS: Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012, 40: D1178-D1186. 10.1093/nar/gkr944.

  35.

    Kelleher CT, Chiu R, Shin H, Bosdet IA, Krzywinski MI, Fjell CD, Wilkin J, Yin T, DiFazio SP, Ali J, Asano JK, Chan S, Cloutier A, Girn N, Leach S, Lee D, Mathewson CA, Olson T, O'Connor K, Prabhu A-L, Smailus DE, Stott JM, Tsai M: A physical map of the highly heterozygous Populus genome: integration with the genome sequence and genetic map and analysis of haplotype variation. Plant J. 2007, 50: 1066-1078.

  36.

    Slavov GT, DiFazio SP, Martin J, Schackwitz W, Muchero W, Rodgers-Melnick E, Lipphardt MF, Pennacchio CP, Hellsten U, Pennacchio LA, Gunter LE, Ranjan P, Vining K, Pomraning KR, Wilhelm LJ, Pellegrini M, Mockler TC, Freitag M, Geraldes A, El-Kassaby YA, Mansfield SD, Cronk QC, Douglas CJ, Strauss SH, Rokhsar D, Tuskan GA: Genome resequencing reveals multiscale geographic structure and extensive linkage disequilibrium in the forest tree Populus trichocarpa. New Phytol. 2012, 196: 713-725. 10.1111/j.1469-8137.2012.04258.x.

  37.

    Popgenie draft assemblies. []

  38.

    Fladung M, Kaufmann H, Markussen T, Hoenicka H: Construction of a Populus tremuloides Michx. BAC library. Silvae Genetica. 2008, 57: 65-69.

  39.

    IGA External Resources. []

  40.

    Willowpedia. []

  41.

    Paiva JA, Prat E, Vautrin S, Santos MD, San-Clemente H, Brommonschenkel S, Fonseca PG, Grattapaglia D, Song X, Ammiraju JS, Kudrna D, Wing RA, Freitas AT, Berges H, Grima-Pettenati J: Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries. BMC Genomics. 2011, 12: 137. 10.1186/1471-2164-12-137.

  42.

    EUCAGEN. []

  43.

    Eucaliptus camaldulensis Genome Database. []

  44.

    Hirakawa H, Nakamura Y, Kaneko T, Isobe S, Sakai H, Kato T, Hibino T, Sasamoto S, Watanabe A, M Y, S N, Fujishiro T, Kishida Y, Kohara M, Tabata S, Sato S: Survey of the genetic information carried in the genome of Eucalyptus camaldulensis. Plant Biotechnol J. 2011, 28: 471-480. 10.5511/plantbiotechnology.11.1027b.

  45.

    Corymbia Genome Project. []

  46.

    Faivre Rampant P, Lesur I, Boussardon C, Bitton F, Martin-Magniette ML, Bodenes C, Le Provost G, Berges H, Fluch S, Kremer A, Plomion C: Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome. BMC Genomics. 2011, 12: 292. 10.1186/1471-2164-12-292.

  47.

    Lesur I, Durand J, Sebastiani F, Gyllenstrand N, Bodénès C, Lascoux M, Kremer A, Vendramin GG, Plomion C: A sample view of the pedunculate oak (Quercus robur) genome from the sequencing of hypomethylated and random genomic libraries. Tree Genet Genomes. 2011, 7: 1277-1285. 10.1007/s11295-011-0412-4.

  48.

    Quercus portal, a European genetic and genomic web resource for Quercus. []

  49.

    The Hardwood Genomics Project. []

  50.

    Fagaceae Genomics Web. []

  51.

    Fang G, Blackmon B, Staton M, Nelson D, Kubisiak TL, Olukolu BA, Henry D, Zhebentyayeva T, Saski CA, Cheng CH, Monsanto M, Ficklin S, Atkins M, Georgi LL, Barakat A, Wheeler N, Carlson J, Sederoff R, Abbott A: A physical map of the Chinese chestnut (Castanea mollissima) genome and its integration with the genetic map. Tree Genet Genomes. 2013, 9: 525-537. 10.1007/s11295-012-0576-6.

  52.

    The Dwarf Birch Genome Project. []

  53.

    Wang N, Thomson M, Bodles WJ, Crawford RM, Hunt HV, Featherstone AW, Pellicer J, Buggs RJ: Genome sequence of dwarf birch (Betula nana) and cross-species RAD markers. Mol Ecol. 2013, 22: 3098-3111. 10.1111/mec.12131.

  54.

    The British Ash Tree Genome Project. []

  55.

    PineDB. []

  56.

    Allona I, Quinn M, Shoop E, Swope K, St Cyr S, Carlis J, Riedl J, Retzel E, Campbell MM, Sederoff R, Whetten RW: Analysis of xylem formation in pine by cDNA sequencing. Proc Natl Acad Sci USA. 1998, 95: 9693-9698. 10.1073/pnas.95.16.9693.

  57.

    Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C, Retzel E, Whetten R, Sederoff R: Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proc Natl Acad Sci USA. 2003, 100: 7383-7388. 10.1073/pnas.1132171100.

  58.

    Cairney J, Zheng L, Cowels A, Hsiao J, Zismann V, Liu J, Ouyang S, Thibaud-Nissen F, Hamilton J, Childs K, Pullman GS, Zhang Y, Oh T, Buell CR: Expressed sequence tags from loblolly pine embryos reveal similarities with angiosperm embryogenesis. Plant Mol Biol. 2006, 62: 485-501. 10.1007/s11103-006-9035-9.

  59.

    Lorenz WW, Sun F, Liang C, Kolychev D, Wang H, Zhao X, Cordonnier-Pratt MM, Pratt LH, Dean JF: Water stress-responsive genes in loblolly pine (Pinus taeda) roots identified by analyses of expressed sequence tag libraries. Tree Physiol. 2006, 26: 1-16. 10.1093/treephys/26.1.1.

  60.

    Lorenz WW, Ayyampalayam S, Bordeaux JM, Howe GT, Jermstad KD, Neale DB, Rogers D, Dean JF: Conifer DBMagic: a database housing multiple de novo transcriptome assemblies for 12 diverse conifer species. Tree Genet Genomes. 2012, 8: 1477-1485. 10.1007/s11295-012-0547-y.

  61.

    Neves LG, Davis JM, Brad Barbazuk W, Kirst M: Whole-exome targeted sequencing of the uncharacterized pine genome. Plant J. 2013,

  62.

    Treeversity. []

  63.

    Muller T, Ensminger I, Schmid KJ: A catalogue of putative unique transcripts from Douglas fir (Pseudotsuga menziesii) based on 454 transcriptome sequencing of genetically diverse, drought stressed seedlings. BMC Genomics. 2012, 13: 673. 10.1186/1471-2164-13-673.

  64.

    Howe GT, Yu J, Knaus B, Cronn R, Kolpak S, Dolan P, Lorenz WW, Dean JF: A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation. BMC Genomics. 2013, 14: 137. 10.1186/1471-2164-14-137.

  65.

    Fernandez-Pozo N, Canales J, Guerrero-Fernandez D, Villalobos DP, Diaz-Moreno SM, Bautista R, Flores-Monterroso A, Guevara MA, Perdiguero P, Collada C, Cervera MT, Soto A, Ordas R, Canton FR, Avila C, Canovas FM, Claros MG: EuroPineDB: a high-coverage web database for maritime pine transcriptome. BMC Genomics. 2011, 12: 366. 10.1186/1471-2164-12-366.

  66.

    Sancho dos Santos CS, Wilton de Vasconcelos M: Identification of genes differentially expressed in Pinus pinaster and Pinus pinea after infection with the pine wood nematode. Eur J Plant Pathol. 2012, 132: 407-418. 10.1007/s10658-011-9886-z.

  67.

    Walden AR, Walter C, Gardner RC: Genes expressed in Pinus radiata male cones include homologs to anther-specific and pathogenesis response genes. Plant Physiol. 1999, 121: 1103-1116. 10.1104/pp.121.4.1103.

  68.

    Li X, Wu HX, Dillon SK, Southerton SG: Generation and analysis of expressed sequence tags from six developing xylem libraries in Pinus radiata D. Don. BMC Genomics. 2009, 10: 41. 10.1186/1471-2164-10-41.

  69.

    Li X, Wu HX, Southerton SG: Seasonal reorganization of the xylem transcriptome at different tree ages reveals novel insights into wood formation in Pinus radiata. New Phytol. 2010, 187: 764-776. 10.1111/j.1469-8137.2010.03333.x.

  70.

    Li X, Wu HX, Southerton SG: Transcriptome profiling of Pinus radiata juvenile wood with contrasting stiffness identifies putative candidate genes involved in microfibril orientation and cell wall mechanics. BMC Genomics. 2011, 12: 480. 10.1186/1471-2164-12-480.

  71.

    Li X, Wu HX, Southerton SG: Transcriptome profiling of wood maturation in Pinus radiata identifies differentially expressed genes with implications in juvenile and mature wood variation. Gene. 2011, 487: 62-71. 10.1016/j.gene.2011.07.028.

  72.

    Li X, Wu HX, Southerton SG: Identification of putative candidate genes for juvenile wood density in Pinus radiata. Tree Physiol. 2012, 32: 1046-1057. 10.1093/treephys/tps060.

  73.

    Chen J, Uebbing S, Gyllenstrand N, Lagercrantz U, Lascoux M, Kallman T: Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms. BMC Genomics. 2012, 13: 589. 10.1186/1471-2164-13-589.

  74.

    Rigault P, Boyle B, Lepage P, Cooke JE, Bousquet J, MacKay JJ: A white spruce gene catalog for conifer genome analyses. Plant Physiol. 2011, 157: 14-28. 10.1104/pp.111.179663.

  75.

    Sterky F, Bhalerao RR, Unneberg P, Segerman B, Nilsson P, Brunner AM, Charbonnel-Campaa L, Lindvall JJ, Tandre K, Strauss SH, Sundberg B, Gustafsson P, Uhlen M, Bhalerao RP, Nilsson O, Sandberg G, Karlsson J, Lundeberg J, Jansson S: A Populus EST resource for plant functional genomics. Proc Natl Acad Sci USA. 2004, 101: 13951-13956. 10.1073/pnas.0401641101.

  76.

    Sterky F, Regan S, Karlsson J, Hertzberg M, Rohde A, Holmberg A, Amini B, Bhalerao R, Larsson M, Villarroel R, Van Montagu M, Sandberg G, Olsson O, Teeri TT, Boerjan W, Gustafsson P, Uhlen M, Sundberg B, Lundeberg J: Gene discovery in the wood-forming tissues of poplar: analysis of 5,692 expressed sequence tags. Proc Natl Acad Sci USA. 1998, 95: 13330-13335. 10.1073/pnas.95.22.13330.

  77.

    Kohler A, Delaruelle C, Martin D, Encelot N, Martin F: The poplar root transcriptome: analysis of 7000 expressed sequence tags. FEBS Lett. 2003, 542: 37-41. 10.1016/S0014-5793(03)00334-X.

  78.

    Dejardin A, Leple JC, Lesage-Descauses MC, Costa G, Pilate G: Expressed sequence tags from poplar wood tissues--a comparative analysis from multiple libraries. Plant Biol (Stuttg). 2004, 6: 55-64.

  79.

    Zhou L, Holliday JA: Targeted enrichment of the black cottonwood (Populus trichocarpa) gene space using sequence capture. BMC Genomics. 2012, 13: 703. 10.1186/1471-2164-13-703.

  80.

    Ralph SG, Chun HJ, Cooper D, Kirkpatrick R, Kolosova N, Gunter L, Tuskan GA, Douglas CJ, Holt RA, Jones SJ, Marra MA, Bohlmann J: Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding. BMC Genomics. 2008, 9: 57. 10.1186/1471-2164-9-57.

  81.

    Bhalerao R, Keskitalo J, Sterky F, Erlandsson R, Bjorkbacka H, Birve SJ, Karlsson J, Gardestrom P, Gustafsson P, Lundeberg J, Jansson S: Gene expression in autumn leaves. Plant Physiol. 2003, 131: 430-442. 10.1104/pp.012732.

  82.

    Rai H, Mock K, Richardson B, Cronn R, Hayden K, Wright J, Knaus B, Wolf P: Transcriptome characterization and detection of gene expression differences in aspen (Populus tremuloides). Tree Genet Genomes. 2013.

  83.

    Nanjo T, Sakurai T, Totoki Y, Toyoda A, Nishiguchi M, Kado T, Igasaki T, Futamura N, Seki M, Sakaki Y, Shinozaki K, Shinohara K: Functional annotation of 19,841 Populus nigra full-length enriched cDNA clones. BMC Genomics. 2007, 8: 448. 10.1186/1471-2164-8-448.

  84.

    Paux E, Tamasloukht M, Ladouce N, Sivadon P, Grima-Pettenati J: Identification of genes preferentially expressed during wood formation in Eucalyptus. Plant Mol Biol. 2004, 55: 263-280.

  85.

    Ranik M, Creux NM, Myburg AA: Within-tree transcriptome profiling in wood-forming tissues of a fast-growing Eucalyptus tree. Tree Physiol. 2006, 26: 365-375. 10.1093/treephys/26.3.365.

  86.

    Foucart C, Paux E, Ladouce N, San-Clemente H, Grima-Pettenati J, Sivadon P: Transcript profiling of a xylem vs phloem cDNA subtractive library identifies new genes expressed during xylogenesis in Eucalyptus. New Phytol. 2006, 170: 739-752. 10.1111/j.1469-8137.2006.01705.x.

  87.

    Novaes E, Drost DR, Farmerie WG, Pappas GJ, Grattapaglia D, Sederoff RR, Kirst M: High throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome. BMC Genomics. 2008, 9: 312. 10.1186/1471-2164-9-312.

  88.

    Salazar MM, Nascimento LC, Camargo EL, Goncalves DC, Neto JL, Marques WL, Teixeira PJ, Mieczkowski P, Mondego JM, Carazzolle MF, Deckmann AC, Pereira GA: Xylem transcription profiles indicate potential metabolic responses for economically relevant characteristics of Eucalyptus species. BMC Genomics. 2013, 14: 201. 10.1186/1471-2164-14-201.

  89.

    Thumma BR, Sharma N, Southerton SG: Transcriptome sequencing of Eucalyptus camaldulensis seedlings subjected to water stress reveals functional single nucleotide polymorphisms and genes under selection. BMC Genomics. 2012, 13: 364. 10.1186/1471-2164-13-364.

  90.

    Chancerel E, Lepoittevin C, Le Provost G, Lin YC, Jaramillo-Correa JP, Eckert AJ, Wegrzyn JL, Zelenika D, Boland A, Frigerio JM, Chaumeil P, Garnier-Gere P, Boury C, Grivet D, Gonzalez-Martinez SC, Rouze P, Van de Peer Y, Neale DB, Cervera MT, Kremer A, Plomion C: Development and implementation of a highly-multiplexed SNP array for genetic mapping in maritime pine and comparative mapping with loblolly pine. BMC Genomics. 2011, 12: 368. 10.1186/1471-2164-12-368.

  91.

    Ueno S, Le Provost G, Leger V, Klopp C, Noirot C, Frigerio JM, Salin F, Salse J, Abrouk M, Murat F, Brendel O, Derory J, Abadie P, Leger P, Cabane C, Barre A, de Daruvar A, Couloux A, Wincker P, Reviron MP, Kremer A, Plomion C: Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak. BMC Genomics. 2010, 11: 650. 10.1186/1471-2164-11-650.

  92.

    Barakat A, DiLoreto DS, Zhang Y, Smith C, Baier K, Powell WA, Wheeler N, Sederoff R, Carlson JE: Comparison of the transcriptomes of American chestnut (Castanea dentata) and Chinese chestnut (Castanea mollissima) in response to the chestnut blight infection. BMC Plant Biol. 2009, 9: 51. 10.1186/1471-2229-9-51.

  93.

    Barakat A, Staton M, Cheng CH, Park J, Yassin NB, Ficklin S, Yeh CC, Hebard F, Baier K, Powell W, Schuster SC, Wheeler N, Abbott A, Carlson JE, Sederoff R: Chestnut resistance to the blight disease: insights from transcriptome analysis. BMC Plant Biol. 2012, 12: 38. 10.1186/1471-2229-12-38.

  94.

    Bai X, Rivera-Vega L, Mamidala P, Bonello P, Herms DA, Mittapalli O: Transcriptomic signatures of ash (Fraxinus spp.) phloem. PLoS One. 2011, 6: e16368. 10.1371/journal.pone.0016368.

  95.

    Eckert AJ, Bower AD, Gonzalez-Martinez SC, Wegrzyn JL, Coop G, Neale DB: Back to nature: ecological genomics of loblolly pine (Pinus taeda, Pinaceae). Mol Ecol. 2010, 19: 3789-3805. 10.1111/j.1365-294X.2010.04698.x.

  96.

    Eckert AJ, van Heerwaarden J, Wegrzyn JL, Nelson CD, Ross-Ibarra J, Gonzalez-Martinez SC, Neale DB: Patterns of population structure and environmental associations to aridity across the range of loblolly pine (Pinus taeda L., Pinaceae). Genetics. 2010, 185: 969-982. 10.1534/genetics.110.115543.

  97.

    Pavy N, Gagnon F, Rigault P, Blais S, Deschenes A, Boyle B, Pelgas B, Deslauriers M, Clement S, Lavigne P, Lamothe M, Cooke JE, Jaramillo-Correa JP, Beaulieu J, Isabel N, Mackay J, Bousquet J: Development of high-density SNP genotyping arrays for white spruce (Picea glauca) and transferability to subtropical and nordic congeners. Mol Ecol Resour. 2013, 13: 324-336. 10.1111/1755-0998.12062.

  98.

    Pavy N, Pelgas B, Beauseigle S, Blais S, Gagnon F, Gosselin I, Lamothe M, Isabel N, Bousquet J: Enhancing genetic mapping of complex genomes through the design of highly multiplexed SNP arrays: application to the large and unsequenced genomes of white spruce and black spruce. BMC Genomics. 2008, 9: 21.

  99.

    Dryad.

  100.

    Geraldes A, Pang J, Thiessen N, Cezard T, Moore R, Zhao Y, Tam A, Wang S, Friedmann M, Birol I, Jones SJ, Cronk QC, Douglas CJ: SNP discovery in black cottonwood (Populus trichocarpa) by population transcriptome resequencing. Mol Ecol Resour. 2011, 11 (Suppl 1): 81-92.

  101.

    Wegrzyn JL, Eckert AJ, Choi M, Lee JM, Stanton BJ, Sykes R, Davis MF, Tsai CJ, Neale DB: Association genetics of traits controlling lignin and cellulose biosynthesis in black cottonwood (Populus trichocarpa, Salicaceae) secondary xylem. New Phytol. 2010, 188: 515-532. 10.1111/j.1469-8137.2010.03415.x.

  102.

    Isabel N, Lamothe M, Thompson SL: A second-generation diagnostic single nucleotide polymorphism (SNP)-based assay, optimized to distinguish among eight poplar (Populus L.) species and their early hybrids. Tree Genet Genomes. 2013, 9: 621-626. 10.1007/s11295-012-0569-5.

  103.

    Guerra FP, Wegrzyn JL, Sykes R, Davis MF, Stanton BJ, Neale DB: Association genetics of chemical wood properties in black poplar (Populus nigra). New Phytol. 2013, 197: 162-176. 10.1111/nph.12003.

  104.

    Diversity Arrays Technology Pty Ltd (DArT P/L).

  105.

    Sansaloni CP, Petroli CD, Carling J, Hudson CJ, Steane DA, Myburg AA, Grattapaglia D, Vaillancourt RE, Kilian A: A high-density Diversity Arrays Technology (DArT) microarray for genome-wide genotyping in Eucalyptus. Plant Methods. 2010, 6: 16. 10.1186/1746-4811-6-16.

  106.

    Grattapaglia D, Silva-Junior OB, Kirst M, de Lima BM, Faria DA, Pappas GJ: High throughput SNP genotyping in the highly heterozygous genome of Eucalyptus: assay success, polymorphism and transferability across species. BMC Plant Biol. 2011, 11: 65. 10.1186/1471-2229-11-65.

  107.

    Hendre PS, Kamalakannan R, Varghese M: High-throughput and parallel SNP discovery in selected candidate genes in Eucalyptus camaldulensis using Illumina NGS platform. Plant Biotechnol J. 2012, 10: 646-656. 10.1111/j.1467-7652.2012.00699.x.


Writing of this paper was funded by US Department of Agriculture, National Institute of Food and Agriculture grant #2011-67009-30030.

Author information



Corresponding author

Correspondence to David B Neale.

Additional information

Competing interests

The authors declare that they have no competing interests.

About this article

Cite this article

Neale, D.B., Langley, C.H., Salzberg, S.L. et al. Open access to tree genomes: the path to a better forest. Genome Biol 14, 120 (2013).


  • Forest tree genome
  • Open access
  • Sequencing
  • Genomics
  • Database