The tomato genome: implications for plant breeding, genomics and evolution
© BioMed Central Ltd 2012
Published: 30 August 2012
The genome sequence of tomato (Solanum lycopersicum), one of the most important vegetable crops, has recently been decoded. We address implications of the tomato genome for plant breeding, genomics and evolutionary studies, and its potential to fuel future crop biology research.
Plants are indispensable for human life: they not only provide food, fiber, and fuel but are also critical for the provision of oxygen and the absorption of CO2. Plant breeding and genetics are powerful tools for increasing plant productivity through the development of improved varieties. The rapid progress of plant genomics in recent years has opened new possibilities for targeted breeding of specific traits and provides a powerful approach to sustainable crop production. Plant genomics, in combination with genetics and breeding, has a particularly crucial role to play in ensuring food security for the rapidly growing world population.
Arabidopsis has served as a model plant for basic plant research because of its small size, self-pollination, short life cycle, and ease of propagation and genetic transformation. Additionally, its small genome and the availability of its genome sequence made it a favorite for genetic and molecular studies. The sheer volume and extent of Arabidopsis-related research, and the integration of genetic, molecular, biochemical, genomic and morphological data from Arabidopsis, have provided insights into many universal aspects of plant biology. Because of the high level of synteny (Box 1) between the genomes of various plant species, the Arabidopsis genome also provided information on the structure of other Eurosid genomes.
The family Solanaceae, which includes many species of economic importance, such as tomato, potato, tobacco, pepper, and eggplant, is the most extensively studied family among the Euasterids. Solanaceous crop genomics is in an exciting phase of development following the recent sequencing of the potato and tomato genomes [6, 7]. Tomato (Solanum lycopersicum L.) originated in South America and has spread around the world to become one of the most extensively used vegetable crops. Besides its economic value, it has interesting developmental features, such as compound leaves, fleshy fruits, and sympodial shoot branching (Box 1, Figure 2) [39–41]. Moreover, it has simple diploid genetics, a short generation time, and routine transformation technology, and it is easy to maintain. Together these features make tomato an excellent species for both basic and applied plant research.
Several genetic and genomic resources were available for tomato before the inception of the tomato genome sequencing project. Large germplasm collections consisting of numerous accessions of landraces of tomato (S. lycopersicum) and its wild relatives (Box 2) [42, 43] had been established. Many of these wild relatives are sexually compatible with tomato and are a source of valuable disease resistance and other genes that breeders have exploited to develop modern cultivated tomato varieties. Tomato geneticists had used a number of morphological and isozyme markers to construct a genetic map of tomato and had identified the 12 linkage groups corresponding to the cytologically visible chromosomes. This aided the construction of an RFLP (restriction fragment length polymorphism) linkage map [44, 45]. The resulting comprehensive molecular linkage map enabled breeders to identify quantitative trait loci (QTLs), leading to an understanding of the genetic basis of numerous quantitative traits. The Solanaceae Genomics Network website provides extensive information on the available tomato genetic and genomic resources.
Domesticated tomato and related wild species (Box 2) exhibit tremendous genetic and trait biodiversity, making the group highly suitable for evolutionary and domestication studies [48, 49]. Sequence diversity analysis of extensive expressed sequence tags from domesticated and wild tomato species identified numerous inter- and intra-specific polymorphisms, many of which could be important for domestication. To exploit the rich trait reservoir of domesticated and wild tomato species, tomato breeders developed advanced backcross mapping populations for identifying and transferring favorable QTLs from wild to cultivated germplasm. This subsequently led to the development of permanent mapping populations in the form of introgression lines (ILs) in which, by repeated backcrossing, a segment of a wild species genome is introduced into a cultivated tomato background [51, 52]. A set of 76 ILs, ensuring complete genome coverage of S. pennellii introgressed into the cultivated tomato M82 variety, has been extensively phenotyped for numerous traits, such as morphology, yield, fruit quality, and fruit primary and secondary metabolites, for the identification of QTLs. The high-resolution mapping approach applied to S. pennellii ILs has led to the map-based cloning of the sugar yield QTL Brix9-2-5 and the fruit weight QTL fw2.2 [54, 55]. Brix9-2-5 was delimited to an invertase gene, which is expressed early in fruit development, whereas fw2.2 was delimited to the gene ORFX, which is expressed early in floral development. Classical breeding and marker analysis have also made remarkable contributions to improving various yield traits of tomato. For example, the fruit size QTL fasciated, initially identified using a cross between S. lycopersicum and S. pimpinellifolium, has recently been characterized. The first example of yield improvement by a single overdominant gene (SINGLE FLOWER TRUSS) through heterosis has been demonstrated in tomato.
Furthermore, tomato and its wild relatives have also been used as a model for self/hybrid incompatibility studies [58, 59].
Sequencing of the tomato genome was initiated in 2005 as a multinational effort among 14 countries. The genome of the domesticated tomato Heinz 1706 was sequenced using a combination of longer Sanger and 454/Roche GS FLX reads and high-coverage, shorter SOLiD and Illumina GAIIx reads. The sequences were assembled into 91 scaffolds covering 760 Mb of the approximately 900 Mb genome; the scaffolds were aligned to the 12 tomato chromosomes and contain 34,727 predicted protein-coding genes. Most of the gaps were restricted to repeat-rich pericentromeric regions. Additionally, a draft sequence of the closest wild relative, S. pimpinellifolium, was compared to the Heinz sequence. The two genomes are highly similar, showing only 0.6% nucleotide divergence. Sixty percent of the genes are identical or differ only by synonymous changes between domesticated and wild tomato, while the remaining 40% carry non-synonymous changes, including alterations of stop codons with potential consequences for gene function. Compared to the potato genome, the tomato and S. pimpinellifolium genomes show more than 8% nucleotide divergence. Moreover, the tomato genome is highly syntenic with the genomes of other economically important members of the family Solanaceae, such as eggplant and pepper. Comparative genome analysis identified two consecutive triplication events in the Solanum lineage. Interestingly, these genome triplications added new gene family members, such as transcription factors and enzymes necessary for ethylene biosynthesis and perception, which mediate important fruit-specific functions.
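To make the two comparison measures above concrete, the following is a minimal Python sketch of how nucleotide divergence and the synonymous/non-synonymous distinction can be computed for a pair of already-aligned sequences. It is an illustration only, not the pipeline used by the genome consortium: the function names and the toy sequences are hypothetical, the alignment is assumed to be given, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nucleotide_divergence(seq_a, seq_b):
    """Fraction of aligned positions that differ (gaps and Ns are skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in BASES and y in BASES:
            compared += 1
            if x != y:
                mismatches += 1
    return mismatches / compared if compared else 0.0


def classify_codon_changes(cds_a, cds_b):
    """Count identical, synonymous and non-synonymous codons in two aligned,
    in-frame coding sequences (hypothetical helper for illustration)."""
    if len(cds_a) != len(cds_b):
        raise ValueError("coding sequences must be aligned to the same length")
    counts = {"identical": 0, "synonymous": 0, "non_synonymous": 0}
    for i in range(0, len(cds_a) - 2, 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip codons containing gaps or ambiguous bases
        if codon_a == codon_b:
            counts["identical"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            # Includes gains and losses of stop codons ("*").
            counts["non_synonymous"] += 1
    return counts


# Toy example (made-up sequences, not tomato data).
cultivated = "ATGGCTGAA"
wild = "ATGGCCGAT"
print(nucleotide_divergence(cultivated, wild))
print(classify_codon_changes(cultivated, wild))
```

In this sketch a gene would fall into the "identical or synonymous only" class when its non-synonymous count is zero; a genome-wide figure such as 0.6% corresponds to the same mismatch fraction taken over all aligned positions.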
Modern tomato genetics had already used molecular markers and functional analysis to identify a handful of genes underlying developmental or yield traits, but the availability of the tomato genome sequence will further revolutionize tomato genetics and breeding. Because domesticated tomato varieties show limited genetic diversity, the wild tomato relatives provide a rich source of useful allelic variation. The 150 tomato genome resequencing project was recently initiated with the objective of revealing and exploring extant genetic variation in tomato, and it will provide a major boost to the identification of valuable alleles. The project aims to sequence 83 genotypes, including 30 wild accessions, 43 land races and 10 old varieties. This will help identify not only useful SNPs from the wild accessions but also rare SNPs within domesticated varieties. Tomato breeders can then target gene variants (SNPs) in the wild species that are associated with desirable traits, such as disease or pest resistance or growth in extreme environmental conditions, and introduce them into cultivars in order to exploit the rich tomato germplasm for breeding purposes. More genome sequences will facilitate QTL identification, mapping and cloning of underlying genes, and will provide new SNP markers for marker-assisted breeding. For example, genome-wide association studies (GWAS) will allow detection and fine mapping of QTLs in the post-genome era, given the high phenotypic diversity among various tomato wild relatives [61, 62]. QTL analyses will also help to investigate the process of domestication and the associated yield increase [56, 63]. Additionally, millions of informative markers (SNPs/InDels) and structural variations, such as duplications, inversions and transpositions, identified through comparison of genome sequences of domesticated and wild tomatoes will promote investigations into the genetic and molecular basis of domestication and crop improvement.
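The core idea behind a marker-trait association scan can be sketched in a few lines of Python. The example below ranks SNPs by the squared correlation between allele dosage and a quantitative trait across accessions. It is a deliberately simplified illustration: the function names, SNP identifiers and values are hypothetical, and a real GWAS would additionally model population structure and relatedness and correct for multiple testing.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker or constant trait carries no signal
    return sxy / sqrt(sxx * syy)


def rank_snps(genotypes, phenotype):
    """Rank SNPs by r^2 between allele dosage (0, 1 or 2 copies of the
    alternative allele per accession) and the trait value.

    genotypes: dict mapping a SNP identifier to a list of dosages
    phenotype: list of trait values, in the same accession order
    """
    scores = {
        snp: pearson_r(dosages, phenotype) ** 2
        for snp, dosages in genotypes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data for four accessions (made-up values, not from the resequencing project).
genotypes = {
    "snp_a": [0, 0, 2, 2],
    "snp_b": [0, 2, 0, 2],
}
fruit_weight = [1.0, 1.2, 3.0, 3.2]
for snp, r_squared in rank_snps(genotypes, fruit_weight):
    print(snp, round(r_squared, 3))
```

With these toy values, `snp_a` tracks the trait closely and is ranked first, whereas `snp_b` is nearly uncorrelated with it.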
Identification of introgressions of segments of the S. pimpinellifolium genome into the Heinz genome already suggests that introgression through conventional (rather than marker-assisted) breeding has been a significant factor in crop improvement and domestication in tomato. Some of these wild-species introgressions have provided disease resistance, and others have been associated with small fruit size (cherry tomatoes). ILs in the background of cultivated tomato exist for many wild tomato species [51, 64–68]. The tomato ILs are an excellent tool for functional genomics studies investigating genetic and environmental interactions. Expression QTLs (eQTLs), as identified by large-scale transcriptome profiling of the ILs, will be useful in connecting phenotypic variation to genotypic diversity, thus leading to a hypothetical regulatory network based on the location of eQTLs and phenotypic QTLs. Integrating additional functional genomics approaches, such as metabolomics and proteomics, can significantly reduce the number of candidate genes for a given QTL [70, 71]. One of the major thrusts of functional genomics in the future will be RNA-seq-enabled transcriptome profiling. For example, comparison of transcriptome profiles from domesticated and wild tomato species will give us insights into the gene expression differences associated with domestication and trait diversity. The tomato functional genomics database (TFGD), which includes microarray, metabolite and small RNA data, was established as a comprehensive resource even before the complete tomato genome sequence was released.
The advent of next generation sequencing and the available genome sequence should make characterization of large collections of tomato mutants even more rapid and robust through sequencing of phenotyped sub-pools from F2 populations and subsequent mapping using methods such as SHOREmap and next generation mapping (NGM) [73, 74]. Mapping relevant tomato mutants against the genome sequence will speed up the understanding of gene function in developmental and metabolic pathways and help identify key steps in co-regulation mechanisms. Additionally, multiple TILLING (Targeting Induced Local Lesions IN Genomes) resources in different backgrounds have already been developed for tomato functional genomics [75, 76]. These TILLING resources, in combination with the tomato genome sequence, should be useful for both forward and reverse genetics in tomato, in basic science as well as in crop improvement.
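The principle behind mapping by sequencing of a phenotyped F2 pool can be sketched as follows. In a pool of F2 plants selected for a recessive mutant phenotype, the allele from the mutant parent is expected to approach a frequency of 1.0 near the causal lesion and to stay near 0.5 at unlinked loci. The Python sketch below bins SNP read counts into fixed windows and reports the window with the highest pooled mutant-allele frequency. It illustrates the general idea only and is not the SHOREmap or NGM algorithm; the function names, window size and read counts are hypothetical.

```python
def pooled_allele_frequency_by_window(snps, window_bp):
    """Pool read counts per fixed-size window along one chromosome.

    snps: iterable of (position, reads_other_parent, reads_mutant_parent)
    Returns a dict mapping window start -> mutant-parent allele frequency.
    """
    totals = {}
    for position, other_reads, mutant_reads in snps:
        start = (position // window_bp) * window_bp
        mutant_sum, depth_sum = totals.get(start, (0, 0))
        totals[start] = (
            mutant_sum + mutant_reads,
            depth_sum + other_reads + mutant_reads,
        )
    return {
        start: mutant_sum / depth_sum
        for start, (mutant_sum, depth_sum) in sorted(totals.items())
        if depth_sum > 0
    }


def best_candidate_window(snps, window_bp):
    """Return (start, end, frequency) of the window most enriched for the
    mutant-parent allele, or None if there are no informative SNPs."""
    frequencies = pooled_allele_frequency_by_window(snps, window_bp)
    if not frequencies:
        return None
    start = max(frequencies, key=frequencies.get)
    return start, start + window_bp, frequencies[start]


# Toy read counts for one chromosome (made-up values).
pool_snps = [
    (100, 10, 10),   # unlinked: roughly 50% mutant-parent allele
    (1500, 2, 18),
    (1800, 0, 20),   # near-fixed: candidate region
    (2500, 8, 12),
]
print(best_candidate_window(pool_snps, window_bp=1000))
```

Pooling reads within a window, rather than averaging per-SNP frequencies, weights each SNP by its sequencing depth, which dampens the noise from low-coverage sites.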
Besides the genus Solanum, the family Solanaceae has more than 3,000 species that exhibit diversity in development, organ morphology, metabolism and geographic distribution. Many of these species have high economic, nutritional and agricultural importance. The SOL-100 sequencing project, which aims to centralize sequences and phenotypes of 100 different species across the phylogenetic diversity of the group, will facilitate the genetic mapping of simple and complex phenotypes affecting the numerous diverse traits in SOL-100 species. It has long been known that gene duplication events followed by sub-functionalization and neo-functionalization have fueled the process of evolution. Genome sequences of SOL-100 species will promote comparative genetics efforts in the family Solanaceae and will provide important insights into the evolution of gene families. This knowledge will help identify genes for traits that may be useful in tomato breeding, either through introgression from sexually compatible species or by moving them into tomato via a transgenic route.
In the post-genomic era, an overwhelming amount of data from different 'omics' approaches is being generated and utilized for genomics research. It is becoming increasingly clear that, with the availability of new plant genome sequences from the family Solanaceae and from more crops of agricultural interest, coupled with cheap next generation sequencing technologies, conversion of raw data into biologically meaningful information will require better and more easily accessible bioinformatics tools. Progress in plant research in general will depend on our ability to tie together independent components, such as genotypic information, phenotypic data and expression profiles, into higher order complexity with multiple dimensions. The availability of genome sequences and the ability to handle large data sets will promote systems biology approaches to crop plant research, building higher order gene regulatory networks for understanding plant developmental and metabolic pathways. For example, laser capture microdissection (LCM) or fluorescence-activated cell sorting (FACS) in combination with RNA-seq has enabled us to generate transcriptome profiles of specific tissues and cell types, such as those of the leaf, inflorescence and fruit, at high spatial and temporal resolution. The resulting large-scale data can be integrated using bioinformatic tools to understand how cell- and tissue-specific gene expression leads to the production of whole organ phenotypes, and will address the developmental changes associated with environmental adaptation and diversification of crop plants [80, 81]. Integration of other 'omics' data from proteomics, metabolomics, epigenomics and other studies will further allow crop biology to be approached from a systems perspective.
Genome sequencing of Euasterids in progress
Draft sequence available
Joint effort of Northern Illinois University, The Petunia Platform and BGI
Monocots: A group of flowering plants whose seeds typically have one cotyledon (embryonic leaf).
Dicots: A group of flowering plants whose seeds typically have two cotyledons (embryonic leaves). However, the term dicots is paraphyletic. 'Eudicots' defines a monophyletic group that excludes monocots, their allies and more basal 'dicot' species.
Synteny: Physical co-localization of genetic loci in the same relative positions in two or more genomes.
Sympodial branching: In this type of branching the shoot apex terminates in a flower, and an axillary bud or buds continue growth of the inflorescence. The process is reiterated many times.
Tomato wild relatives fall into two groups: red- or orange-fruited species such as S. pimpinellifolium, S. galapagense, and S. cheesmaniae, and green-fruited species such as S. neorickii, S. chmielewskii, S. chilense, S. arcanum, S. corneliomulleri, S. huaylasense, S. peruvianum, S. habrochaites, S. pennellii, S. lycopersicoides, S. sitiens, S. ochranthum and S. juglandifolium. Red-fruited species accumulate glucose and fructose, while the green-fruited species accumulate sucrose. Traditionally, wild and cultivated tomatoes were grouped within the genus Lycopersicon in the Solanaceae. However, molecular phylogenetic studies support placement of tomatoes in the genus Solanum.
GWAS: genome-wide association study
QTL: quantitative trait locus
We thank Dr Lauren Headland, Dr Dan Chitwood, Dr Ravi Kumar and Brad Townsley for their input and comments on this opinion. This work is funded by NSF DBI-082085 (to NS).