Comparative genomics of Arabidopsis and maize: prospects and limitations
- Volker Brendel,
- Stefan Kurtz and
- Virginia Walbot
https://doi.org/10.1186/gb-2002-3-3-reviews1005
© BioMed Central Ltd 2002
Published: 14 February 2002
Abstract
The completed Arabidopsis genome seems to be of limited value as a model for maize genomics. In addition to the expansion of repetitive sequences in maize and the lack of genomic micro-colinearity, maize-specific or highly diverged proteins contribute to a predicted maize proteome of about 50,000 proteins, twice the size of that of Arabidopsis.
Maize (Zea mays L., corn) was domesticated in the highlands of Central Mexico approximately 10,000 years ago [1]. Corn agriculture spread rapidly into diverse climate zones, ranging from 45° N to 45° S, and supported vast Native American civilizations. Today, maize is one of the world's most important crops: for direct human consumption, as a key component of animal feed, and as the source of chemical feed stocks. Grass species (including maize) cover 20% of the terrestrial surface of the earth, and the grains from maize, rice, wheat, and minor grass crops provide the majority of calories in the human diet [2].
Maize inflorescences. The separation of (a) female inflorescence (ear) and (b) male inflorescence (tassel) is one of the key features of the maize plant responsible for its pivotal role in plant genetics, greatly simplifying controlled pollination (photos courtesy of Tom Peterson, Iowa State University).
The beautiful detail evident in meiotic maize chromosomes stimulated a generation of gifted cytogeneticists to identify the physical basis for recombination, to construct linkage maps tied to chromosomes, and to analyze the consequences of chromosome breakage. Of particular importance to current functional genomics was Barbara McClintock's discovery of transposable elements by analyzing the regulation of somatic variegation and germinal mutation in maize. Once maize transposons were molecularly cloned, they provided the means to clone any tagged gene: maize provided the first discovery of many plant-specific gene products and facilitated the cloning of related genes from other flowering plants. The availability of detailed genetic knowledge, a large community of researchers, and ease of gene cloning and genetic analysis make maize the monocotyledonous species of choice for many studies.
The maize genome is organized into 10 chromosomes (2N = 20), and is about 2.4 × 10⁹ base-pairs in total. Sorghum, which is estimated to have diverged from a common ancestor with maize about 15-20 million years ago (MYA), has the same chromosome number, but its genome is about one third of the size. Rice diverged from a common ancestor with maize and sorghum about 50-60 MYA and has 12 chromosomes (2N = 24), comprising a much smaller genome of about 430 million base-pairs. Comparative genomics of these grasses suggests considerable colinearity between their genomes [3]. The size differences of the genomes are presumed to result from the ancestral allotetraploidization (approximate duplication from diploid to tetraploid when two species hybridize) of the maize genome [4] and differences in the expansion and dispersion of repetitive DNA (long terminal repeat retrotransposons, miniature inverted repeat transposons, and other repetitive sequences) [5].
In December 2000, Arabidopsis thaliana became the first plant species for which the genome was almost entirely sequenced (currently, 117 of an estimated 125 million base-pairs are available, with only centromeric and ribosomal DNA repeat regions as yet unsequenced [6]; reviewed in [7]). Because of its small genome size, ease of transformation, and tolerance of life in a growth chamber, this seemingly lowly weed has emerged as the model flowering plant, ahead of commercially important crops. The choice will be well justified if the evolutionarily recent advent of flowering plants means that most genes found in Arabidopsis prove to be common to all flowering plants. Among the crops, members of the Brassica genus (including B. oleracea and B. rapa, the so-called 'cole-crops', oilseeds, and mustard) are most closely related to Arabidopsis (divergence less than 20 MYA). Gene order seems to be largely conserved, and thus the Arabidopsis genome should prove a powerful tool for studying Brassica genomics [8,9]. Significant colinearity has also been observed between Arabidopsis and soybean [10] (divergence time 100 MYA), and Arabidopsis and tomato [11,12] (divergence time more than 100 MYA). This article assesses the prospects for comparative maize-Arabidopsis genome analysis in view of the greater divergence time (more than 150 MYA) between the grasses (which are monocots) and Arabidopsis (a dicot).
Lack of synteny between maize and Arabidopsis
The extent of conservation of gene order between the grasses and Arabidopsis can be estimated from three well-studied groups of maize loci: the a1-sh2 region [13,14,15], the adh1 region [16,17], and the bz locus and its associated genes [18]. The a1-sh2 region in maize, sorghum, and rice contains the sh2 gene upstream of a1, transcribed in the same direction. The a1 gene encodes an NADPH dihydroflavonol reductase required for anthocyanin biosynthesis and sh2 encodes an endosperm-expressed ADP glucose pyrophosphorylase important in starch biosynthesis. The two genes are separated by about 140 kilobases (kb) in maize but only about 19 kb in sorghum and rice. Moreover, a1 is duplicated in sorghum. Sequences that are highly similar to sh2 can be found on Arabidopsis chromosomes 1, 2, 4, and 5. Potential homologs of a1 map to Arabidopsis chromosomes 2 and 5, but they are far apart from the potential sh2 genes. Recently, two additional genes have been identified in the a1-sh2 interval: x1 and yz1, which are of unknown function and conserved among maize, rice, and sorghum [14,19].
Genic regions are generally conserved between the adh1 regions of maize and sorghum, although adh1 is the only gene with assigned function (alcohol dehydrogenase), and maize is missing three out of ten other potential genes within this region [16]. Whereas the maize region is replete with retrotransposons, gathered into sequence blocks of 14-70 kb and inserted between the potential genes, the sorghum sequence does not contain any retrotransposons. Colinearity with Arabidopsis appears limited to a block of two genes conserved between sorghum and Arabidopsis [16]. Interestingly, the colinearity of this locus pair is interrupted even between maize and rice [17].
The recently sequenced bz locus of maize and its chromosomal region displays a gene-dense genomic organization very different from adh1, with ten putative genes within a 32 kb stretch that is free of retrotransposons [18]. Although this gene density is similar to that in Arabidopsis, and most of the genes have potential homologs in Arabidopsis according to the genome sequence, no colinearity is evident. Thus, on the basis of our current picture of plant genome organization, micro-colinearity between different genomes may be even more limited than has previously been stated [20].
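The gene-density comparison can be checked with a short calculation. The sketch below uses only figures quoted in this article (ten putative genes in 32 kb for the bz region; 117 million sequenced base-pairs and 25,617 putative proteins for Arabidopsis). Treating the putative protein count as a proxy for the Arabidopsis gene count is an assumption made here for illustration, and the function name is hypothetical.

```python
# Rough gene-density comparison from figures quoted in this article.
# Assumption: the count of putative Arabidopsis proteins approximates
# the number of genes in the sequenced portion of the genome.

def kb_per_gene(region_kb: float, gene_count: int) -> float:
    """Average amount of sequence per gene, in kilobases."""
    return region_kb / gene_count

# Maize bz region: ten putative genes within a 32 kb stretch.
maize_bz_kb_per_gene = kb_per_gene(32.0, 10)

# Arabidopsis: 117 million base-pairs (117,000 kb) and 25,617 putative proteins.
arabidopsis_kb_per_gene = kb_per_gene(117_000.0, 25_617)

print(f"maize bz region:         {maize_bz_kb_per_gene:.1f} kb per gene")
print(f"Arabidopsis genome-wide: {arabidopsis_kb_per_gene:.1f} kb per gene")
```

Under these assumptions the bz region averages about 3.2 kb per gene against roughly 4.6 kb per gene genome-wide in Arabidopsis, consistent with the statement that the two densities are similar.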
Proteome comparisons
Table 1. Maize proteins with no obvious homologs in Arabidopsis
Protein | GenBank accession number | Function |
---|---|---|
BETL(2-4) | CAB4466(2-4) | Anti-microbial, endosperm |
Ribosome-inactivating proteins | S11859, CAC16167, P10593, T03942 | Anti-microbial, anti-fungal |
Female gametophyte-specific protein ES3 | AAK08134 | Defensin |
Basal layer anti-fungal peptides | CAC21604, CAC21605, CAC21607 | |
Trypsin inhibitor | TIZM, TIZM1, S36236 | Anti-insect |
RAB-17 | S08633 | Vesicle traffic |
FDR3 | AAK53546 | Iron stress |
ZmGR2(b,c) | BAA7480(6,7) | Gibberellin-responsive |
Aluminum-induced proteins | AAB86493, T01322 | |
ABA- and ripening-inducible-like protein | T02081 | |
Bundle-sheath cell specific protein 1 | BAB20906 | C4 photosynthesis |
Peroxidase K | AAC79955 | |
Phytase | T04130 | Degradation of phytic acid, the main phosphorus store in maize seeds |
ESR1c1 | CAA67122 | Endosperm-specific |
Teosinte-branched protein 1 | AAK30124 | Associated with maize domestication (specific alleles) |
Globulin 1O | C53234 | Storage |
Ae(1,3) | CAB5655(2,3) | Amylase extender; modification of kernel starch composition |
Arabinogalactan protein | AAF43497 | Cell-wall component |
Probable membrane protein DAD1 | T01578 | |
Maize EST analysis
Comparison of maize proteins predicted from EST sequences with Arabidopsis proteins. A non-redundant set of protein sequences consisting of at least 120 amino acids each, derived from 27,294 distinct maize ESTs, was compared with 25,617 putative Arabidopsis proteins at different BLASTP stringency levels. The percentages in each pie chart give the fractions of the two sequence sets involved in these matches, at each stringency level.
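The bookkeeping behind such a comparison can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "stringency level" corresponds to a maximum BLASTP E-value, and the `Hit` record, the `matched_fractions` function, and the toy hits are all hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    """One pairwise protein match (hypothetical record layout)."""
    maize_id: str
    arabidopsis_id: str
    evalue: float


def matched_fractions(
    hits: Iterable[Hit],
    n_maize: int,
    n_arabidopsis: int,
    max_evalue: float,
) -> Tuple[float, float]:
    """Fractions of each sequence set involved in at least one match
    whose E-value is at or below the chosen cutoff."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    maize_matched = {h.maize_id for h in kept}
    arabidopsis_matched = {h.arabidopsis_id for h in kept}
    return len(maize_matched) / n_maize, len(arabidopsis_matched) / n_arabidopsis


# Toy data, invented purely to exercise the function.
toy_hits = [
    Hit("zm1", "at1", 1e-50),
    Hit("zm1", "at2", 1e-12),
    Hit("zm2", "at2", 1e-8),
    Hit("zm3", "at3", 1e-3),
]

for cutoff in (1e-20, 1e-10, 1e-5):
    maize_frac, arabidopsis_frac = matched_fractions(toy_hits, 4, 5, cutoff)
    print(f"E <= {cutoff:g}: maize {maize_frac:.0%}, Arabidopsis {arabidopsis_frac:.0%}")
```

Counting distinct identifiers on each side, rather than counting hits, is what lets the two fractions differ: one maize protein may match several Arabidopsis proteins and vice versa.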
A glimpse of the maize genome
Table 2. Analysis of maize genome survey sequences: a comparison with maize proteins and ESTs
Approach | Number of entries | Unique sequences | wORF | %NS (maize proteins) | %TE (maize proteins) | %HP (maize proteins) | %KP (maize proteins) | NS %EST | Reference |
---|---|---|---|---|---|---|---|---|---|
Mutator insertions | 4412 | 970 | 375 | 93 | 3 | 2 | 2 | 26 | [26] |
Random inserts | 3480 | 2529 | 1015 | 61 | 38 | 1 | 1 | 44 | [27] |
Methylation filter | 1692 | 1083 | 258 | 84 | 10 | 2 | 3 | 37 | [28] |
BAC ends | 945 | 881 | 454 | 48 | 51 | 0 | 0 | 28 | [29] |
MPP | 669 | 338 | 150 | 80 | 1 | 7 | 11 | 47 | [30] |
ORFs | 399 | 86 | 79 | 76 | 0 | 14 | 10 | 22 | [31] |
Other | 28 | 11 | 3 | 33 | 67 | 0 | 0 | 0 | |
Table 3. Analysis of maize genome survey sequences: a comparison with Arabidopsis proteins
Approach | %NS | %TE | %HP | %KP |
---|---|---|---|---|
Mutator insertions | 69 | 3 | 24 | 4 |
Random inserts | 63 | 33 | 3 | 0 |
Methylation filter | 81 | 7 | 11 | 0 |
BAC ends | 47 | 48 | 5 | 0 |
MPP | 59 | 3 | 34 | 4 |
ORFs | 46 | 1 | 49 | 4 |
Other | 33 | 67 | 0 | 0 |
To assess these possibilities, we compared the sequences of novel ORFs with the maize EST set (application of GeneSeqer [22]). The result, that 26-44% of the four large GSS collections match (a still limited collection of) maize ESTs (see Table 2), suggests that many of the ORFs do indeed correspond to expressed genes. The remaining fraction may include less abundantly expressed genes. We can estimate the gene fraction accessible by EST sequencing from the EST coverage of GSS-derived ORFs: if the roughly 10,000 novel ORFs in the maize EST set constitute only 40% of the genes, we can anticipate some 25,000 novel maize proteins that are not found in Arabidopsis. It is likely that many of these proteins are derived from gene duplications. The lack of sequence conservation across the monocot-dicot divide suggests that there has been extensive functional divergence after duplication.
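The extrapolation in this paragraph is a simple scaling, sketched below. The 10,000 novel ORFs and the 40% coverage figure come from the text; the sensitivity range, which reuses the 26-44% EST match rates of the four large GSS collections, is an illustration added here rather than a calculation made by the authors, and the function name is hypothetical.

```python
def extrapolate_total(observed: int, coverage: float) -> float:
    """Scale an observed count up to an estimated total, given the
    fraction of the total that the observation is assumed to cover."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in the interval (0, 1]")
    return observed / coverage


novel_orfs_in_ests = 10_000  # roughly 10,000 novel ORFs in the maize EST set

# Estimate from the text: ESTs sample about 40% of the genes.
estimate = extrapolate_total(novel_orfs_in_ests, 0.40)

# Illustrative sensitivity check (an assumption, not from the source):
# take the lowest and highest EST match rates reported for the four
# large GSS collections as alternative coverage values.
low_estimate = extrapolate_total(novel_orfs_in_ests, 0.44)
high_estimate = extrapolate_total(novel_orfs_in_ests, 0.26)

print(f"estimate at 40% coverage: {estimate:,.0f}")
print(f"illustrative range:       {low_estimate:,.0f} to {high_estimate:,.0f}")
```

The point estimate reproduces the 25,000 novel maize proteins quoted above; the range shows how strongly that figure depends on the assumed EST coverage.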
The need for a maize genome sequencing project
On the basis of available data, we think that the resource provided by the Arabidopsis genome cannot adequately substitute for more extensive maize genome sequencing. Genome organization is very different between the two plants, and the proteomes may also have significant differences, particularly with respect to agronomically important maize genes involved in plant-pathogen interactions, reproduction, and the development and function of specific tissues. The many exceptions to micro-colinearity even among the grasses suggest that the completion of the rice genome [32] will still not answer many of the questions particular to maize genomics. Beyond questions concerning agronomically important traits, plant biologists also look to maize as a model for the evolution of plant genomes that are not as small and streamlined as those of Arabidopsis and rice [33]. Correspondingly, a maize genome sequencing project will focus on sequencing gene-rich genome fractions first [34], and other crop genome projects are likely to follow. Plant biologists should look forward to very exciting times when whole-genome comparisons become possible, leading to a clearer understanding of the development of plants from their genetic blueprints.
Declarations
Acknowledgements
V.B. and V.W. were supported in part by NSF Plant Genome Research Program grant DBI-9872657. S.K. was partially supported by grant KU 1257/1 from the Deutsche Forschungsgemeinschaft. The authors are grateful to Phil Becraft, Alan Myers, Tom Peterson, Pat Schnable, and Robert Thornburg for critical comments on the manuscript.
References
1. Wang R-L, Stec A, Hey J, Lukens L, Doebley J: The limits of selection during maize domestication. Nature. 1999, 398: 236-239. 10.1038/18435.
2. Kellogg EA: Evolutionary history of the grasses. Plant Physiol. 2001, 125: 1198-1205. 10.1104/pp.125.3.1198.
3. Freeling M: Grasses as a single genetic system. Reassessment 2001. Plant Physiol. 2001, 125: 1191-1197. 10.1104/pp.125.3.1191.
4. Gaut BS, Doebley JF: DNA sequence evidence for the segmental allotetraploid origin of maize. Proc Natl Acad Sci USA. 1997, 94: 6809-6814. 10.1073/pnas.94.13.6809.
5. White S, Doebley J: Of genes and genomes and the origin of maize. Trends Genet. 1998, 14: 327-332. 10.1016/S0168-9525(98)01524-8.
6. The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
7. Walbot V: A green chapter in the book of life. Nature. 2000, 408: 794-795. 10.1038/35048685.
8. O'Neill CM, Bancroft I: Comparative physical mapping of segments of the genome of Brassica oleracea var. alboglabra that are homoeologous to sequenced regions of chromosomes 4 and 5 of Arabidopsis thaliana. Plant J. 2000, 23: 233-243. 10.1046/j.1365-313X.2000.00781.x.
9. Paterson AH, Lan T, Amasino R, Osborn TC, Quiros C: Brassica genomics: a complement to, and early beneficiary of, the Arabidopsis sequence. Genome Biol. 2001, 2: reviews1011.1-1011.4. 10.1186/gb-2001-2-3-reviews1011.
10. Grant D, Cregan P, Shoemaker RC: Genome organization in dicots: genome duplication in Arabidopsis and synteny between soybean and Arabidopsis. Proc Natl Acad Sci USA. 2000, 97: 4168-4173. 10.1073/pnas.070430597.
11. Ku HM, Liu JP, Doganlar S, Tanksley SD: Exploitation of Arabidopsis-tomato synteny to construct a high-resolution map of the ovate-containing region in tomato chromosome 2. Genome. 2001, 44: 470-475. 10.1139/gen-44-3-470.
12. Mysore KS, Tuori RP, Martin GB: Arabidopsis genome sequence as a tool for functional genomics in tomato. Genome Biol. 2001, 2: reviews1003.1-1003.4. 10.1186/gb-2001-2-1-reviews1003.
13. Civardi L, Xia YJ, Edwards K, Schnable PS, Nikolau BJ: The relationship between the genetic and physical distances of the cloned a1-sh2 interval of the Zea mays L. genome. Proc Natl Acad Sci USA. 1994, 91: 8268-8272.
14. Chen M, SanMiguel P, De Oliveira AC, Woo S-S, Zhang H, Wing RA, Bennetzen JL: Microcolinearity in sh2-homologous regions of the maize, rice, and sorghum genomes. Proc Natl Acad Sci USA. 1997, 94: 3431-3435. 10.1073/pnas.94.7.3431.
15. Bennetzen JL, SanMiguel P, Chen MS, Tikhonov A, Francki M, Avramova Z: Grass genomes. Proc Natl Acad Sci USA. 1998, 95: 1975-1978. 10.1073/pnas.95.5.1975.
16. Tikhonov AP, SanMiguel PJ, Nakajima Y, Gorenstein NM, Bennetzen JL, Avramova Z: Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proc Natl Acad Sci USA. 1999, 96: 7409-7414. 10.1073/pnas.96.13.7409.
17. Tarchini R, Biddle P, Wineland R, Tingey S, Rafalski A: The complete sequence of 340 kb of DNA around the rice Adh1-Adh2 region reveals interrupted colinearity with maize chromosome 4. Plant Cell. 2000, 12: 381-391. 10.1105/tpc.12.3.381.
18. Fu HH, Park WK, Yan XH, Zheng ZW, Shen BZ, Dooner HK: The highly recombinogenic bz locus lies in an unusually gene-rich region of the maize genome. Proc Natl Acad Sci USA. 2001, 98: 8903-8908. 10.1073/pnas.141221898.
19. Yao H, Zhou Q, Li J, Smith H, Yandeau M, Nikolau B, Schnable PS: Meiotic recombination across the 140 kb multigenic maize a1-sh2 interval. Proc Natl Acad Sci USA.
20. Bennetzen JL: Comparative sequence analysis of plant nuclear genomes: microcolinearity and its many exceptions. Plant Cell. 2000, 12: 1021-1029. 10.1105/tpc.12.7.1021.
21. Kurtz S, Choudhuri JV, Ohlebusch E, Schleiermacher C, Stoye J, Giegerich R: REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001, 29: 4633-4642. 10.1093/nar/29.22.4633.
22. TIGR FTP site: Arabidopsis putative proteins. [ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/SEQUENCES/ATH1.pep]
23. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
24. Usuka J, Brendel V: Gene structure prediction by spliced alignment of genomic DNA with protein sequences: increased accuracy by differential splice site scoring. J Mol Biol. 2000, 297: 1075-1085. 10.1006/jmbi.2000.3641.
25. dbEST. [http://www.ncbi.nlm.nih.gov/dbEST]
26. ZmDB: Maize genome database. [http://www.zmdb.iastate.edu/]
27. Meyers BC, Tingey SV, Morgante M: Abundance, distribution and transcriptional activity of repetitive elements in the maize genome. Genome Res. 2001, 11: 1660-1676. 10.1101/gr.188201.
28. Rabinowicz PD, Schutz K, Dedhia N, Yordan C, Parnell LD, Stein L, McCombie WR, Martienssen RA: Differential methylation of genes and retrotransposons facilitates shotgun sequencing of the maize genome. Nat Genet. 1999, 23: 305-308. 10.1038/15479.
29. Maize ZMMBBb STC project. [http://www.genome.clemson.edu/projects/stc/maize/ZMMBBb/index.html]
30. Maize mapping project. [http://www.cafnr.missouri.edu/mmp/]
31. ISU maize genome project. [http://maize.math.iastate.edu/isumaize/homepage.html]
32. Yu J, Hu S, Wang J, Li S, Wong KG, Liu B, Deng Y, Dal L, Zhou Y, Zhang X, et al: A draft sequence of the rice (Oryza sativa ssp. Indica) genome. Chinese Sci Bull. 2001, 46: 1937-1942.
33. Gaut BS, Le Thierry d'Ennequin M, Peek AS, Sawkins MC: Maize as a model for the evolution of plant nuclear genomes. Proc Natl Acad Sci USA. 2000, 97: 7008-7015. 10.1073/pnas.97.13.7008.
34. Bennetzen JL, Chandler VL, Schnable P: National Science Foundation-sponsored workshop report. Maize genome sequencing project. Plant Physiol. 2001, 127: 1572-1578. 10.1104/pp.127.4.1572.