Construction, alignment and analysis of twelve framework physical maps that represent the ten genome types of the genus Oryza
- HyeRan Kim1,
- Bonnie Hurwitz2,
- Yeisoo Yu1,
- Kristi Collura1,
- Navdeep Gill3,
- Phillip SanMiguel4,
- James C Mullikin5,
- Christopher Maher6,
- William Nelson7,
- Marina Wissotski1,
- Michele Braidotti1,
- David Kudrna1,
- José Luis Goicoechea1,
- Lincoln Stein2,
- Doreen Ware2, 8,
- Scott A Jackson3,
- Carol Soderlund7 and
- Rod A Wing1
© Kim et al.; licensee BioMed Central Ltd. 2008
Received: 14 December 2007
Accepted: 28 February 2008
Published: 28 February 2008
We describe the establishment and analysis of a genus-wide comparative framework composed of 12 bacterial artificial chromosome fingerprint and end-sequenced physical maps representing the 10 genome types of Oryza aligned to the O. sativa ssp. japonica reference genome sequence. Over 932 Mb of end sequence was analyzed for repeats, simple sequence repeats, miRNA and single nucleotide variations, providing the most extensive analysis of Oryza sequence to date.
Comparative genomics is a powerful tool for unraveling the evolutionary history and gene functionality of related species. The availability of high resolution genetic and physical maps and genome assemblies has established comparative genomics platforms in organisms ranging from bacteria and fungi to animals and plants [1–7]. The majority of sequence-based eukaryotic platforms have focused on comparisons between genera [8–11]. Although such comparisons are highly informative, the divergence times between genera are often too great to be useful for the identification of conserved noncoding sequences such as transcription factor binding sites, enhancers and matrix attachment regions (for example, see [12, 13]). Sequence comparisons between species within a single genus have focused primarily on orthologous loci or genomic regions [14, 15]. Whole genome comparative platforms within a genus are still in their infancy, with the most notable exception being the generation of whole genome shotgun assemblies of 12 Drosophila species that span an evolutionary time frame of approximately 60 million years [16].
In plants, rice (Oryza sativa) and thale cress (Arabidopsis thaliana) are important model systems for functional and evolutionary biology. Both genomes have been completely sequenced [17, 18] and serve as the reference sequences for the major monocot and dicot plant lineages. Rice has added significance by virtue of being the world's most important crop, directly feeding half the human population. The population that depends on rice is expected to double in 25-50 years and there is thus an intense effort by breeders to double current rice yields using less land and water and on poorer soils. To achieve this goal, the plant biology community is engaged in a concerted effort to functionally characterize all plant genes in both model plants using approaches including genetics, transgenetics, and comparative genomics. These efforts have significance not only to world food and bio-energy security issues but also to fundamental eukaryotic biology.
Comparative genomics across the cereals and within the genus Oryza will play a major role in these efforts. Oryza is composed of 24 species, including 2 domesticated (O. sativa and O. glaberrima) and 22 wild species [20, 21]. These species are classified into ten genome types (six diploids and four allotetraploids) based on crossing barriers [22, 23], chromosome pairing [24, 25], morphology [26, 27], and molecular phylogenetics. Compared to the domesticates, the wild relatives of rice are phenotypically inferior grass-like plants, but are a virtually untapped reservoir of agriculturally important genes and allelic variants that can be used to improve cultivated rice [21, 29]. Therefore, comparative genomics of wild and domesticated Oryza species may not only provide useful materials for breeding, but will also shed light on the evolution and domestication of cultivated rice.
Our long-term objective is to develop a genome-level comparative experimental system for the genus Oryza (the Oryza Map Alignment Project (OMAP)). Our strategy is to develop bacterial artificial chromosome (BAC)-end sequence (BES) physical frameworks of the ten genome types of Oryza and align them to the reference sequence. These frameworks can then be populated with an array of phenotypic, genetic, biochemical, and physiological data to address fundamental questions in biology and agriculture. Previously, we described the construction and characterization of 12 BAC libraries from representative species that encompass the 10 genome types of Oryza. Here we report the construction of 12 BAC/BES framework physical maps derived from these libraries and an analysis of the BESs in terms of transposable element, simple sequence repeat (SSR), microRNA (miRNA), and single nucleotide variation (SNV) content.
Fingerprinting, BAC end sequencing and contig assembly to generate phase I physical maps for the ten genome types of Oryza
Summary of BAC end sequences of 12 Oryza species
Columns: genome size (Mb); no. of GenBank submissions; average length after trim (in GenBank); total sequenced length (in GenBank); no. of forward reads; no. of reverse reads; no. of clones with paired reads (% of total BES).
Total sequenced length for the 12 species, in table order: ~71, ~50, ~39, ~49, ~72, ~95, ~75, ~85, ~45, ~93, ~129 and ~129 Mb; total ~932 Mb.
Summary of phase I FPC physical maps of 12 Oryza species
Columns: average insert size; no. of total attempted; no. of clones FPCed; genome coverage by all FPCed clones; total organellar contamination + no. of no-insert clones; no. of FPC clones with paired BES reads; no. of singletons (%); no. of contigs; total CB units; average no. of bands/clone; deduced size of 1 CB unit (kb); deduced genome size (Mb) (coverage).
To determine the approximate genome coverage of each of the 12 phase I physical maps, we estimated the size of the ordered consensus bands for each map in kilobases based on the average insert size of each BAC library and the average number of fingerprint bands observed for each species. Using this estimation, the genome coverage of the phase I physical maps ranged from 136% in O. granulata [GG] to 77% in O. brachyantha [FF]. In total, 7 of the physical maps had coverages greater than 100% (O. nivara, O. rufipogon, O. glaberrima, O. minuta, O. alta, O. granulata, and O. ridleyi), 4 had coverages between 87% and 98% (O. punctata, O. officinalis, O. australiensis, and O. coarctata) and 1 had a coverage of 77% (O. brachyantha). Coverage discrepancies above or below 100% could be due to a number of factors, including: possible overlaps between contigs that were not merged by the criteria used in this study; inaccurate estimation of genome size; and over-estimation of the average insert sizes of the BAC libraries, which would lead to an over-estimation of kilobases per consensus band (CB).
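The coverage estimate described above can be illustrated with a short calculation. This is a minimal sketch, assuming that the size of one CB unit is approximated as the average insert size divided by the average number of bands per clone, and that the deduced map size is the total number of CB units multiplied by that unit size. The function names and the example numbers are hypothetical and are not taken from the paper's tables.

```python
def kb_per_cb_unit(avg_insert_kb: float, avg_bands_per_clone: float) -> float:
    """Approximate size in kb of one consensus band (CB) unit (assumed formula)."""
    if avg_bands_per_clone <= 0:
        raise ValueError("average number of bands per clone must be positive")
    return avg_insert_kb / avg_bands_per_clone


def map_coverage_percent(total_cb_units: int,
                         avg_insert_kb: float,
                         avg_bands_per_clone: float,
                         genome_size_mb: float) -> float:
    """Deduced map size as a percentage of the independently estimated genome size."""
    deduced_mb = total_cb_units * kb_per_cb_unit(avg_insert_kb, avg_bands_per_clone) / 1000.0
    return 100.0 * deduced_mb / genome_size_mb


# Hypothetical library: 140 kb inserts, 100 bands per clone, 300,000 CB units,
# and a 400 Mb genome size estimate.
unit_kb = kb_per_cb_unit(140, 100)
coverage = map_coverage_percent(300_000, 140, 100, 400)
```

Under this formula, an over-estimated insert size inflates the kb-per-CB value and therefore the deduced coverage, which is one of the sources of discrepancy listed above.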
Alignment of phase I physical maps to the O. sativa reference sequence
Summary of alignments of 12 OMAP Phase I FPC maps to the O. sativa RefSeq
Columns: not aligned contig; no. of contigs aligned; total CB aligned; total size aligned (Mb); no. of clones in aligned contig; no. of BESs aligned; average identity of BES alignments; E-value of BES alignments; no. of clones/contig (average).
E-value ranges of BES alignments for the 12 maps, in table order: e-237 to e-33; e-224 to e-57; e-213 to e-42; e-183 to e-30; e-204 to e-16; e-173 to e-07; e-193 to e-02; e-182 to e-03; e-175 to e-18; e-179 to e-05; e-170 to e-06; e-184 to e-07.
Sequence analysis of the OMAP BES data set: repeats, SSRs, miRNAs, and SNV
The large BES data set produced by this project offered a unique opportunity to explore sequence content and variation across the Oryza. We next investigated repeat and SSR content across all twelve species and identified miRNA content and SNV amongst the three AA and one BB genomes.
Repeat content analysis of 12 Oryza species using RepeatMasker and RECON
Columns: repeat content by RECON (overlap with RepeatMasker, % of total; unique, % of total); repeat content by RepeatMasker; total repeat content. The table includes O. sativa (Nipponbare) for comparison.
The amount of repeats in each genome was directly proportional to the genome size except for O. granulata [GG] (Additional data file 3), suggesting that repetitive sequences play a major role in genome size expansion in the genus Oryza. O. australiensis [EE] had a high amount of repetitive DNA, consistent with the observation of a rapid burst of retrotransposons in this species.
Analysis of O. alta [CCDD] revealed that approximately 9% of the BESs were categorized as unclassified repeats, which is high relative to the other Oryza species (2% in O. minuta [BBCC] to 0.2% in O. coarctata [HHKK]), implying the existence of one or more unique classes of repeats in O. alta or the presence of highly diverged repeats that were recognized by our current stringency parameters. On closer inspection, we identified a copia type LTR retrotransposon in a fully sequenced BAC (OA_BBa0237I11) representing a large family of retrotransposons that accounts for the majority of these unclassified repeats. This element, considered in detail elsewhere, accounted for two-thirds (2,939,525 bases of 4,392,322 bases) of these unclassified repeat sequences when assayed using cross_match alone.
Simple sequence repeats
Distribution of OMAP non-redundant SSRs by motif type
Columns: total no. of SSR-containing BESs (%); SSR density (no. of SSRs/Mb). The table includes O. sativa ssp. japonica and O. sativa ssp. indica for comparison.
Distribution of SSR length by the repeat types
Categories include G/C rich TNR and A/T rich TNR.
Identification of miRNAs in four Oryza species based on OMAP alignment data
miRNAs play an important role in plant developmental and physiological processes by negatively controlling gene expression [40–43]. To detect evolutionarily conserved miRNAs between wild rice and O. sativa genomes, we analyzed BES datasets from one domesticated and three wild Oryza genomes using an alignment-based approach.
miRNA conservation between O. sativa and four wild rice species detected from their BES datasets
Columns: O. sativa miRNA ID hits by wild rice species; O. sativa miRNA family; target mRNA class; total no. of loci.
Target mRNA classes listed: MYB and TCP TFs; auxin response factor TFs; CCAAT binding factor and HAP2-like TFs; GRF TFs, rhodenase-like, and kinesin-like protein B; laccases and beta-6 tubulin.
Surviving miRNA family member entries: f, h, k; h, k, n; a, d, f; c, e, f; i, j, k; a, b, c.
Variation discovery in four wild rice species and O. sativa
Candidate SNVs between wild rice BESs and the O. sativa genome were detected with a multi-tiered alignment strategy using BLASTZ and cross_match followed by a base-by-base analysis of the resulting high quality alignment (see Materials and methods). Putative nucleotide variations were further filtered using sequence quality scores and quality scores of neighboring regions to exclude variations that were due to possible sequencing error.
Our initial data set contained 330,993 high quality BESs (73,716 from O. rufipogon, 110,589 from O. nivara, 73,344 from O. glaberrima and 73,344 from O. punctata). Since two assemblies of the O. sativa ssp. japonica RefSeq (International Rice Genome Sequencing Project (IRGSP) V.4 and The Institute for Genomic Research (TIGR) V.4) were available, we detected variations on both assemblies using a quality score cutoff of 23. We were able to accurately map 217,202 BESs (53,426 from O. rufipogon, 76,886 from O. nivara, 50,736 from O. glaberrima, and 36,154 from O. punctata) to the IRGSP V.4 assembly and 217,258 BESs (53,273 from O. rufipogon, 76,833 from O. nivara, 50,833 from O. glaberrima, and 36,319 from O. punctata) to the TIGR V.4 assembly. A total of 1,619,446 non-redundant SNVs (Additional data file 8a) and 319,974 insertions and deletions (INDELs; Additional data file 8b) were detected between the BESs from the four Oryza species (O. rufipogon, O. nivara, O. glaberrima, and O. punctata) and the O. sativa genome assembly from IRGSP V.4. Likewise, 1,619,557 SNVs (Additional data file 8c) and 320,592 INDELs (Additional data file 8d) were detected on the TIGR V.4 assembly. The overall divergence rate was found to be 1.22%, 1.32%, 1.59% and 7.15% for O. rufipogon, O. nivara, O. glaberrima and O. punctata, respectively. The variation rate was found to be 12.2, 13.2, 15.9 and 71.5 SNVs/kb and 1.5, 1.6, 2.0 and 11.4 INDELs/kb for O. rufipogon, O. nivara, O. glaberrima and O. punctata, respectively. In comparison, the polymorphism rate reported by the Beijing Genomics Institute between the indica and japonica cultivars of O. sativa was 1.70 SNPs/kb and 0.11 INDELs/kb. Therefore, the rate of variation between Oryza species is roughly an order of magnitude greater than the polymorphism rate within species, as we would expect. The variation data are available through dbSNP and Gramene. An example view of the variation data in Gramene is shown in Additional data file 9.
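The relationships among the figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only counts and rates stated in the text (IRGSP V.4 alignment); the dictionary and function names are hypothetical, and the per-species mapped fraction is a derived quantity that the text does not report directly.

```python
# Counts of high quality and mapped BESs quoted in the text (IRGSP V.4).
HIGH_QUALITY = {"O. rufipogon": 73_716, "O. nivara": 110_589,
                "O. glaberrima": 73_344, "O. punctata": 73_344}
MAPPED_IRGSP = {"O. rufipogon": 53_426, "O. nivara": 76_886,
                "O. glaberrima": 50_736, "O. punctata": 36_154}
# SNV rates per kb quoted in the text.
SNV_PER_KB = {"O. rufipogon": 12.2, "O. nivara": 13.2,
              "O. glaberrima": 15.9, "O. punctata": 71.5}
INTRASPECIFIC_SNP_PER_KB = 1.70  # indica versus japonica, as quoted


def percent_to_per_kb(divergence_percent: float) -> float:
    """Convert a percent divergence into variants per 1,000 aligned bases."""
    return divergence_percent * 10.0


def mapped_fraction(species: str) -> float:
    """Fraction of high quality BESs that were mapped for one species."""
    return MAPPED_IRGSP[species] / HIGH_QUALITY[species]


def fold_over_intraspecific(species: str) -> float:
    """Ratio of the between-species SNV rate to the quoted within-species rate."""
    return SNV_PER_KB[species] / INTRASPECIFIC_SNP_PER_KB
```

For the three AA genome species the ratio returned by `fold_over_intraspecific` is between 7 and 10, and for O. punctata [BB] it is larger, in line with the statement that between-species variation is roughly an order of magnitude above the within-species rate.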
We describe here the generation and analysis of a comparative genomics resource for the genus Oryza that can be used as a research platform to address grand challenge questions in basic and applied biology. The primary resources developed were 1,452,912 BESs, 710,536 fingerprints, 12 phase I physical maps aligned to the rice RefSeq, and repeat, SSR, miRNA and SNV analyses. All sequences and variations have been deposited in GenBank and dbSNP, and all physical maps can be visualized using SyMAP and CMap at the OMAP and Gramene websites, respectively. The BAC libraries, BESs, SNaPshot fingerprints, and phase I physical maps have been publicly available since January 2006 and have been extensively accessed by the international community [31, 38, 51–56].
Estimates of genome coverage of the physical maps relative to flow cytometry data were quite close except for O. granulata. To make the Oryza physical maps more useful for future research, they must now be manually edited to merge contigs and resolve any conflicts and mis-assemblies, as was done for O. punctata [BB]. Manual editing will be particularly challenging for the four allotetraploid species, where contigs need to be assigned to one of two subgenomes for each species. Fortunately, for two tetraploids, O. minuta [BBCC] and O. alta [CCDD], we can use BES information from their diploid progenitors to assist in this process.
The BES data set represents an unprecedented sampling of genomic sequence from the Oryza. This resource is enhanced by the fact that the majority of BESs are binned in FPC contigs where 6-60% of BESs (depending on species) can be mapped to the rice RefSeq, and approximately 95% are paired. Here, we mined the BES data set for repeat, SSR, miRNA and SNV content. Repetitive sequences, especially transposable elements, are major components of plant genomes and contribute significantly to genome size variation. Detailed analyses of the BES resources for O. australiensis [EE] and O. granulata [GG] have already demonstrated that retrotranspositional bursts of RIRE2, Wallaby and Kangaroo in the EE genome and Gran3 in the GG genome account for 50% and 25% of the genome sizes of these species, respectively. The RECON data described here for O. alta [CCDD] may represent yet a third retrotransposon burst accounting for genome size variation, subsequent to speciation.
Further analysis of the Oryza BES data identified 16,980 and 1,619,446 non-redundant SSRs and SNVs, respectively. The SSRs and SNVs can be immediately translated into polymorphic markers for breeding crosses between the two sequenced O. sativa genomes with the AA genome OMAP accessions to accelerate crop improvement. Further, the SSRs will serve as candidates to identify polymorphic genetic markers for intraspecific Oryza mapping populations currently under development (for example, BB × BB; CC × CC, and so on). The availability of molecular genetic maps tied to our physical maps for all 12 accessions will make these resources even stronger for functional and map-based cloning applications.
Paired-end data are also extremely important for studying structural variation, such as expansions, contractions, inversions and translocations [57, 58]. Using our paired BES database, we plan to catalogue all such variation across the Oryza in order to address fundamental questions on the roles of such variation on the speciation process.
To our knowledge, the combined resources described herein comprise the most comprehensive within-genus comparative genomics framework available for any higher eukaryote, exceeded only by Drosophila, for which 12 draft genome sequences of wild flies are available. We envisage that the OMAP system will be a model for other within-genus comparative frameworks, including large genome plant genera like Gossypium (cotton), Solanum (tomato), Zea (corn) and Phaseolus (soybean). Having physical map frameworks for all genome types of a particular genus aligned to a reference genome provides immediate access to any orthologous or unique region of a genome(s) for functional and evolutionary analysis. Such frameworks are also highly compatible with next generation sequencing technologies whereby whole genomes can be sequenced in 4-8 Mb chunks, thereby reducing assemblies to the size of a bacterial genome.
The development and analysis of genus-wide research platforms is the next frontier in comparative systems biology. The genus Oryza, which includes the world's most important cereal, rice, is composed of 24 species (2 domesticated and 22 wild) with wide geographical distribution and contains 10 distinct genome types (6 diploid (2N = 24) and 4 polyploid (2N = 48)) and a 3.6-fold genome size variation (357-1,283 Mb).
We generated BAC-based physical maps of 12 Oryza species representing all 10 genome types and aligned them to the O. sativa RefSeq. The 12 physical maps covered between 77% and 136% of each Oryza genome and revealed extensive colinearity to the O. sativa RefSeq. We analyzed the BESs for repeats, SSRs, miRNAs and SNVs. The repeat analysis not only cataloged the repeat content of each species, it also suggested the possible role that repeats have played in genome size variation during the course of evolution. The SSR analysis identified 16,980 non-redundant SSRs from OMAP BESs, giving an SSR density of 18 SSRs/Mb in the Oryza. We also found that the amplification of certain repeat elements contributed to an increase of the TAA and CAA motif repeats in the O. granulata [GG] and O. ridleyi [HHJJ] genomes. A total of 1,619,446 non-redundant SNVs and 319,974 insertions and deletions (INDELs) were detected between the BESs from four Oryza species (O. rufipogon, O. nivara, O. glaberrima, and O. punctata) and the O. sativa RefSeq. The SSR, SNV and INDEL markers can be immediately used for marker-assisted selection as well as evolutionary studies. The miRNA analysis identified 64 paralogous miRNA precursors of O. sativa with more conservation in evolutionarily closer species. In addition, we observed species-wide miRNA families (miR166 and miR171) and AA genome specific miRNAs (miR420, miR441, miR442, and miR446).
All biological reagents are available at the AGI BAC/EST Resource Center and all fingerprints and BESs are available at [49, 50] and GenBank. We envision that the resources and analysis presented here will serve as a model for the establishment of similar genus-wide frameworks for plants and animals. The genus-wide physical framework also fits in well with next generation sequencing technologies where an entire genome or targeted regions across a genus can be reduced to bacterial genome size chunks that can be easily sequenced and assembled as opposed to a whole genome sequencing approach.
Materials and methods
BAC libraries, end sequencing, fingerprinting and FPC assembly
FPC map alignment to the O. sativa RefSeq
SyMAP was used for alignment and display of FPC contigs from each of the 12 phase I physical maps to the O. sativa RefSeq (IRGSP V.4). Briefly, the 12 BES data sets were repeat-masked and then searched for sequence similarity to the rice genome using BLAT. To capture weaker sequence similarities from the more divergent species, BLAT parameters were adjusted as follows (-minIdentity=70 -tileSize=10 -minScore=30 -qMask=lower -maxIntron=10000). To reduce false positives, BLAT results were filtered by retaining the top two hits for each query, as measured by match length, subject to the additional criterion that no retained hit had a match length within 25% of a discarded hit. Approximate linear chains were computed from the retained hits using dynamic programming and then merged to form synteny blocks that were displayed by SyMAP. After the Oryza BESs were positioned on the O. sativa RefSeq, contigs containing the aligned BESs were anchored to the chromosomes and then renumbered in FPC. Contigs that aligned to less than 200 bp of O. sativa RefSeq were ignored in the alignment analysis (Table 3). All 12 SyMAP alignment results are available at .
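The hit filter described above can be sketched as follows. This is a minimal illustration under one reading of the rule, not the SyMAP implementation: for each query the two longest hits are provisionally kept, and a kept hit is then dropped if the longest discarded hit is at least 75% of its match length. The `Hit` record and the `filter_hits` function are hypothetical names introduced for this example.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str      # BES identifier
    target: str     # chromosome or pseudomolecule name
    start: int      # start coordinate on the target
    match_len: int  # number of matching bases


def filter_hits(hits: Iterable[Hit], keep: int = 2, margin: float = 0.25) -> List[Hit]:
    """Keep up to `keep` longest hits per query that clearly beat all discarded hits."""
    by_query = defaultdict(list)
    for hit in hits:
        by_query[hit.query].append(hit)

    retained = []
    for query_hits in by_query.values():
        ranked = sorted(query_hits, key=lambda h: h.match_len, reverse=True)
        top, rest = ranked[:keep], ranked[keep:]
        # Longest hit that would be thrown away (0 if nothing is discarded).
        best_discarded = rest[0].match_len if rest else 0
        for hit in top:
            # Assumed reading of "within 25%": the discarded hit reaches
            # at least (1 - margin) of the retained hit's match length.
            if best_discarded < (1.0 - margin) * hit.match_len:
                retained.append(hit)
    return retained


example = [
    Hit("q1", "chr1", 100, 500), Hit("q1", "chr1", 900, 480), Hit("q1", "chr5", 10, 100),
    Hit("q2", "chr2", 100, 500), Hit("q2", "chr3", 100, 300), Hit("q2", "chr4", 100, 290),
    Hit("q3", "chr7", 100, 200),
]
kept = filter_hits(example)
```

In this example the second hit of `q2` is dropped because a discarded hit of 290 bases is within 25% of its 300 matching bases, so the placement is treated as ambiguous.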
Synteny analysis pipeline and incorporation of data in the CMap, Genome Browser and SyntenyView displays in Gramene
We constructed a pipeline to align clones and contigs from the OMAP species to O. sativa by combining data from the BES alignments to O. sativa and the phase I physical maps. The pipeline consisted of 6 steps: 1) upload phase I physical maps and BES data; 2) align BESs to the RefSeq; 3) determine the best alignments for each clone; 4) assemble the clone positions to determine the region where the contig is found to align; 5) create blocks of synteny between selected OMAP species and O. sativa; and 6) utilize the data to create visualizations in CMap, the Genome Browser, and SyntenyView in Gramene. Details on each step can be found at . Screen shots of the data in Gramene are shown in Additional data file 9.
RepeatMasker (V3.1.5), loaded with a custom Oryza repeat database, was used to identify repeats from the Oryza BES data set and the O. sativa RefSeq. The custom database was composed of annotated repeats from: the TIGR rice repeat database; Dr Susan Wessler (University of Georgia, USA); and Dr Tom Bureau (McGill University, Canada).
RECON was used to identify de novo repeats from the Oryza BES data set. To increase the speed and efficiency of the program, the BLAST output was parsed to discard self hits as well as hits with an e-value greater than 1e-5. The RECON output, which identified repetitive elements and classified them into distinct families, was parsed for sequences greater than 40 bp in length that were found at least 5 times/family. Overlap between the de novo and the custom library was determined using RepeatMasker. Sequences that were left unmasked by this process, and thus were not part of our custom repeat database, were extracted, assembled using PHRAP and annotated using BLASTN at an e-value = 1e-4 against the NCBI non-redundant nucleotide database and a dataset of 2,050 full length LTR-retrotransposons identified from the whole genome sequence of Nipponbare through LTR_STRUC. Finally, these sequences were compiled into repeat databases specific to each species. The procedure is outlined in Additional data file 10.
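The two parsing steps described above can be sketched as follows. This is an illustrative sketch only: it assumes the BLAST output is in the standard 12-column tab-delimited format (query ID, subject ID, ..., e-value in column 11, bit score in column 12), which the text does not state, and it assumes RECON families are available as a mapping from family ID to element lengths. The function names are hypothetical.

```python
from typing import Dict, List


def keep_blast_hit(line: str, max_evalue: float = 1e-5) -> bool:
    """Return True for a non-self hit whose e-value does not exceed the cutoff."""
    fields = line.rstrip("\n").split("\t")
    query_id, subject_id = fields[0], fields[1]
    evalue = float(fields[10])  # column 11 in the assumed tabular layout
    return query_id != subject_id and evalue <= max_evalue


def filter_families(families: Dict[str, List[int]],
                    min_len_bp: int = 40,
                    min_copies: int = 5) -> Dict[str, List[int]]:
    """Keep families with at least `min_copies` elements longer than `min_len_bp`."""
    kept = {}
    for family_id, lengths in families.items():
        long_enough = [n for n in lengths if n > min_len_bp]
        if len(long_enough) >= min_copies:
            kept[family_id] = long_enough
    return kept


blast_lines = [
    "a\tb\t95.0\t120\t3\t0\t1\t120\t5\t124\t1e-30\t200",   # kept
    "a\ta\t100.0\t120\t0\t0\t1\t120\t1\t120\t1e-60\t240",  # self hit
    "a\tc\t80.0\t30\t6\t0\t1\t30\t1\t30\t0.01\t20",        # weak hit
]
filtered = [line for line in blast_lines if keep_blast_hit(line)]
```

Whether the 40 bp threshold applies to each element or to a family consensus is not specified in the text; the sketch applies it per element.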
Simple sequence repeat analysis
A total of 20,064 SSRs were identified from 1,452,912 BESs using RepeatMasker (V3.1.5) with a cutoff of longer than 20 bp and less than 5% sequence divergence. BESs containing SSRs (SSR-BES) were assembled for each species using CAP3 to collect a total of 16,980 non-redundant SSRs. An SSR was considered redundant when the BAC clones came from the same FPC contig, and an assembled CAP contig contained the same SSR motif, repeat length and sequence divergence. Abundance and relative frequency of SSRs from the Oryza BESs were compared to SSRs identified from two rice genomic sequences: 20,653 SSRs from the japonica RefSeq and 19,370 from the indica draft [68, 69] sequences using identical criteria. Other interspersed repeats in SSR-BESs were detected using RepeatMasker with a custom repeat database (courtesy of Dr Ning Jiang, Michigan State University).
Identification of Oryza miRNA genes
O. sativa ssp. japonica miRNA precursors from miRBase (Version 8.2), including 250 nucleotides upstream and downstream, were extracted from the O. sativa RefSeq and aligned, using BLASTN, against BESs of four closely related wild rice species (AA and BB genomes). Alignments were filtered and multiple hits to the same region of a BES were discarded, keeping the hit with the lowest e-value. The best orthologous match typically had more conservation in the region surrounding the precursor and, therefore, had a much lower e-value. If two miRNAs aligned to different regions of the same BES, they were not discarded, since they were likely to be tandemly duplicated miRNA genes. To obtain the exact coordinates and secondary structure of the miRNA precursor within the remaining BES, a pattern matching approach was used to align the mature miRNA precursor to the BES. Once the mature miRNA was mapped to the BES, the extended sequence was then submitted to MFold to determine if it could form a stable hairpin structure. MFold alignments were checked for the following criteria: had conserved orientation; contained fewer than eight mismatches; was not overlapping the loop region; and lacked some types of asymmetric bulges. A precursor was discarded if the bulge had four or more nucleotides that failed to match with any nucleotide from the opposite strand. A precursor was retained if a bulge of four or fewer nucleotides on one stem corresponded, although not necessarily matching, with one or more nucleotides on the opposite strand. If the sequence that aligned to the mature miRNA met these criteria, it was categorized as the opposing stem (miRNA*). Both stems and the intervening loop region were extracted and resubmitted as input to MFold. If this refined sequence formed a stable hairpin structure under the four conditions, it was categorized as a paralogous miRNA precursor.
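The SSR cutoff and redundancy rule described above can be sketched as a simple de-duplication step. This is a minimal sketch under stated assumptions: each SSR is represented by a hypothetical record carrying its FPC contig, CAP3 contig, motif, repeat length and divergence, and two SSRs are treated as redundant when all five values agree. The record layout and function names are not from the paper.

```python
from typing import Iterable, List, NamedTuple


class SSR(NamedTuple):
    bes_id: str
    fpc_contig: str        # FPC contig of the source BAC clone
    cap_contig: str        # CAP3 contig containing the SSR-BES
    motif: str             # for example "TAA"
    length_bp: int
    divergence_pct: float


def passes_cutoff(ssr: SSR, min_len_bp: int = 20, max_div_pct: float = 5.0) -> bool:
    """Cutoff quoted in the text: longer than 20 bp, less than 5% divergence."""
    return ssr.length_bp > min_len_bp and ssr.divergence_pct < max_div_pct


def non_redundant(ssrs: Iterable[SSR]) -> List[SSR]:
    """Keep the first SSR seen for each (contigs, motif, length, divergence) key."""
    seen = set()
    unique = []
    for ssr in ssrs:
        if not passes_cutoff(ssr):
            continue
        key = (ssr.fpc_contig, ssr.cap_contig, ssr.motif,
               ssr.length_bp, ssr.divergence_pct)
        if key not in seen:
            seen.add(key)
            unique.append(ssr)
    return unique


records = [
    SSR("bes1", "ctg10", "cap3", "TAA", 36, 2.0),
    SSR("bes2", "ctg10", "cap3", "TAA", 36, 2.0),  # redundant with bes1
    SSR("bes3", "ctg11", "cap9", "TAA", 36, 2.0),  # different contig, kept
    SSR("bes4", "ctg12", "cap7", "CAA", 18, 0.0),  # too short
]
unique_ssrs = non_redundant(records)
```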
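The first filtering step described above, keeping only the lowest e-value hit among hits to the same region of a BES while retaining hits to different regions, can be sketched as an interval sweep. This is an illustration only: "same region" is assumed here to mean overlapping coordinates on the BES, and the `MirnaHit` record and function name are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class MirnaHit(NamedTuple):
    mirna: str     # O. sativa miRNA precursor ID
    bes_id: str
    start: int     # alignment start on the BES
    end: int       # alignment end on the BES (inclusive)
    evalue: float


def best_hit_per_region(hits: Iterable[MirnaHit]) -> List[MirnaHit]:
    """For each cluster of overlapping hits on a BES, keep the lowest e-value hit."""
    kept: List[MirnaHit] = []
    current_bes = None
    region_end = -1
    for hit in sorted(hits, key=lambda h: (h.bes_id, h.start, h.end)):
        starts_new_region = hit.bes_id != current_bes or hit.start > region_end
        if starts_new_region:
            kept.append(hit)
            current_bes, region_end = hit.bes_id, hit.end
        else:
            # Same region: extend it and keep whichever hit has the lower e-value.
            region_end = max(region_end, hit.end)
            if hit.evalue < kept[-1].evalue:
                kept[-1] = hit
    return kept


hits = [
    MirnaHit("miR166a", "BES1", 10, 200, 1e-50),
    MirnaHit("miR166b", "BES1", 50, 220, 1e-30),   # overlaps miR166a, weaker
    MirnaHit("miR171a", "BES1", 400, 600, 1e-40),  # separate region, kept
    MirnaHit("miR166a", "BES2", 5, 100, 1e-10),
]
best = best_hit_per_region(hits)
```

Hits in separate regions of one BES are both retained, matching the reasoning in the text that they may represent tandemly duplicated miRNA genes.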
BES trace files from O. rufipogon, O. nivara, O. glaberrima and O. punctata were compared to the O. sativa genome with BLASTZ using the S1-S2 scoring method to select only the best alignment for each read. Each chromosome was then divided into 5 Mb segments and BESs aligned to each region from the previous step were re-aligned to the segment using cross_match (-bandwidth 100 -alignments -discrep_lists) with variation filtering using the neighborhood quality standard (NQS). Settings for NQS were such that a candidate variation in the read had a Phred quality value (Q) of at least 23, its neighboring 7 bases on either side of the candidate variation all had Phred quality values of ≥ 15, and at least 11 of the 14 neighbors matched. If a read aligned to more than one place in the genome, then only the longest alignment with the fewest SNVs was reported. To decrease the number of false positive insertions or deletions, variations were filtered out if they had a single base variation and the quality score was less than 40 or if they were multiple base variations and the quality score was less than 23. Since two assemblies of the O. sativa RefSeq were available (IRGSP V.4 and TIGR V.4), variations were detected using both assemblies (Additional data file 8a-d). All data have been deposited into dbSNP and can be viewed at Gramene.
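The NQS settings and the INDEL quality filter described above can be expressed as two small predicates. This is a simplified sketch, not the pipeline used in the study: it assumes a gapless alignment in which the read and reference strings have equal length and share coordinates, and it rejects candidates that lack a full 7-base flank on either side. The function names are hypothetical.

```python
from typing import Sequence


def passes_nqs(read: str, ref: str, quals: Sequence[int], pos: int,
               min_q: int = 23, min_neighbor_q: int = 15,
               flank: int = 7, min_neighbor_matches: int = 11) -> bool:
    """Neighborhood quality standard check for a candidate SNV at `pos`."""
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, ref and quals must have the same length")
    if pos - flank < 0 or pos + flank >= len(read):
        return False  # incomplete neighborhood (assumed to fail)
    if read[pos] == ref[pos]:
        return False  # not a variant position
    if quals[pos] < min_q:
        return False
    neighbors = [i for i in range(pos - flank, pos + flank + 1) if i != pos]
    if any(quals[i] < min_neighbor_q for i in neighbors):
        return False
    matches = sum(read[i] == ref[i] for i in neighbors)
    return matches >= min_neighbor_matches


def keep_indel(length_bp: int, quality: int) -> bool:
    """INDEL filter from the text: single-base needs Q >= 40, multi-base Q >= 23."""
    return quality >= (40 if length_bp == 1 else 23)


ref = "ACGTACGTACGTACGTACGT"
read = ref[:10] + "T" + ref[11:]   # single mismatch at position 10
quals = [30] * len(ref)
is_candidate = passes_nqs(read, ref, quals, 10)
```

With the default settings, up to three mismatching neighbors are tolerated, which corresponds to the requirement that at least 11 of the 14 neighboring bases match.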
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a figure describing the SyMAP display details. Additional data file 2 is a table listing distribution of CB units from the O. sativa reference genome aligned contigs to each chromosome of 12 OMAP phase I physical maps. Additional data file 3 is a figure showing correlation of genome size and repeat content in the genus Oryza. Additional data file 4 is a figure presenting a comparison of MITE compositions in 12 OMAP genomes and the O. sativa genome. Additional data file 5 is a figure showing distribution of OMAP non-redundant SSR by motif types. Additional data file 6 is a table listing repeat association analysis for TAA and CAA motifs of O. ridleyi and O. granulata and BLAST analysis of CAA-BESs. Additional data file 7 is a figure comparing the SSR length and sequence compositions of OMAP SSR motifs. Additional data file 8 is a table listing SNVs and INDELs found between the BESs for O. rufipogon, O. nivara, O. glaberrima, O. punctata and the IRGSP V.4 pseudomolecules (or the TIGR V.4 pseudomolecules). Additional data file 9 is a figure describing wild rice BESs and variation data at Gramene. Additional data file 10 is a figure showing the strategy of repeat analysis and classification using the OMAP BES resources.
BAC: bacterial artificial chromosome
BES: BAC end sequence
INDELs: insertions and deletions
IRGSP: International Rice Genome Sequencing Project
LTR: long terminal repeat
MITE: miniature inverted repeat transposable element
OMAP: Oryza Map Alignment Project
SNV: single nucleotide variation
SSR: simple sequence repeat
TIGR: The Institute for Genomic Research.
We thank Dr K Arumuganathan (Virginia Mason Research Center, Seattle, WA, USA) for determining the genome size of O. coarctata [HHKK]. This work was supported by National Science Foundation grant DBI-0321678 (to RAW, SJ, LS, CS), the Bud Antle Endowed Chair (to RAW), and the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health (to LS).
- Riesenfeld CS, Schloss PD, Handelsman J: Metagenomics: genomic analysis of microbial communities. Annu Rev Genet. 2004, 38: 525-552. 10.1146/annurev.genet.38.072902.091216.PubMedView ArticleGoogle Scholar
- Ciccarelli FD, Doerks T, Mering CV, Creevey CJ, Sne B, Bork P: Toward automatic reconstruction of a highly resolved tree of life. Science. 2006, 311: 1283-1287. 10.1126/science.1123061.PubMedView ArticleGoogle Scholar
- Galagan JE, Calvo SE, Cuomo C, Ma LJ, Wortman JR, Batzoglou S, Lee SI, Basturkmen M, Spevak CC, Clutterbuck J, Kapitonov V, Jurka J, Scazzocchio C, Farman M, Butler J, Purcell S, Harris S, Braus GH, Draht O, Busch S, D'Enfert C, Bouchier C, Goldman GH, Bell-Pedersen D, Griffiths-Jones S, Doonan JH, Yu J, Vienken K, Pain A, Freitag M, et al: Sequencing of Aspergillus nidulans and comparative analysis with A. fumigatus and A. oryzae. Nature. 2005, 438: 1105-1115. 10.1038/nature04341.PubMedView ArticleGoogle Scholar
- Thomas JW, Touchman JW: Vertebrate genome sequencing: building a backbone for comparative genomics. Trends Genet. 2002, 18: 104-108. 10.1016/S0168-9525(02)02599-4.PubMedView ArticleGoogle Scholar
- Yogeeswaran K, Frary A, York TL, Amenta A, Lesser AH, Nasrallah JB, Tanksley SD, Nasrallah ME: Comparative genome analyses of Arabidopsis spp.: inferring chromosomal rearrangement events in the evolutionary history of A. thaliana. Genome Res. 2005, 15: 505-515. 10.1101/gr.3436305.PubMedPubMed CentralView ArticleGoogle Scholar
- Devos KM: Updating the 'crop circle'. Curr Opin Plant Biol. 2005, 8: 155-162. 10.1016/j.pbi.2005.01.005.PubMedView ArticleGoogle Scholar
- Town CD, Cheung F, Maiti R, Crabtree J, Haas BJ, Wortman JR, Hine EE, Althoff R, Arbogast TS, Tallon LJ, Vigouroux M, Trick M, Bancroft I: Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveal gene loss, fragmentation, and dispersal after polyploidy. Plant Cell. 2006, 18: 1348-1359. 10.1105/tpc.106.041665.PubMedPubMed CentralView ArticleGoogle Scholar
- Suwabe K, Tsukazaki H, Iketani H, Hatakeyama K, Kondo M, Fujimura M, Nunome T, Fukuoka H, Hirai M, Matsumoto S: Simple sequence repeat-based comparative genomics between Brassica rapa and Arabidopsis thaliana: the genetic origin of clubroot resistance. Genetics. 2006, 173: 309-319. 10.1534/genetics.104.038968.PubMedPubMed CentralView ArticleGoogle Scholar
- Patterson N, Richter DJ, Gnerre S, Lander ES, Reich D: Genetic evidence for complex speciation of humans and chimpanzees. Nature. 2006, 441: 1103-1108. 10.1038/nature04789.PubMedView ArticleGoogle Scholar
- Peng JH, Zadeh H, Lazo GR, Gustafson JP, Chao S, Anderson OD, Qi LL, Echalier B, Gill BS, Dilbirligi M, Sandhu D, Gill KS, Greene RA, Sorrells ME, Akhunov ED, Dvorák J, Linkiewicz AM, Dubcovsky J, Hossain KG, Kalavacharla V, Kianian SF, Mahmoud AA, Miftahudin , Conley EJ, Anderson JA, Pathan MS, Nguyen HT, McGuire PE, Qualset CO, Lapitan NL: Chromosome bin map of expressed sequence tags in homoeologous group 1 of hexaploid wheat and homoeology with rice and Arabidopsis. Genetics. 2004, 168: 609-623. 10.1534/genetics.104.034793.PubMedPubMed CentralView ArticleGoogle Scholar
- Bossolini E, Wicker T, Knobel PA, Keller B: Comparison of orthologous loci from small grass genomes Brachypodium and rice: implications for wheat genomics and grass genome annotation. Plant J. 2007, 49: 704-717. 10.1111/j.1365-313X.2006.02991.x.
- Mouse Genome Sequencing Consortium, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.
- Prabhakar S, Poulin F, Shoukry M, Afzal V, Rubin EM, Couronne O, Pennacchio LA: Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res. 2006, 16: 855-863. 10.1101/gr.4717506.
- Grover CE, Kim H, Wing RA, Paterson AH, Wendel JF: Incongruent patterns of local and global genome size evolution in cotton. Genome Res. 2004, 14: 1474-1482. 10.1101/gr.2673204.
- Hurle B, Swanson W, NISC Comparative Sequencing Program, Green ED: Comparative sequence analyses reveal rapid and divergent evolutionary changes of the WFDC locus in the primate lineage. Genome Res. 2007, 17: 276-286. 10.1101/gr.6004607.
- Drosophila 12 Genomes Consortium: Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007, 450: 203-218. 10.1038/nature06341.
- Arabidopsis Genome Initiative (AGI): Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
- International Rice Genome Sequencing Project (IRGSP): The map-based sequence of the rice genome. Nature. 2005, 436: 793-800. 10.1038/nature03895.
- Peng S, Cassman KG, Virmani SS, Sheehy J, Khush GS: Yield potential trends of tropical rice since the release of IR8 and the challenge of increasing rice yield potential. Crop Sci. 1999, 39: 1552-1559.
- International Rice Research Institute (IRRI): Rice Germplasm: Collecting, Preservation, Use. 1991, Manila: IRRI
- Vaughan DA, Morishima H, Kadowaki K: Diversity in the Oryza genus. Curr Opin Plant Biol. 2003, 6: 139-146. 10.1016/S1369-5266(03)00009-8.
- Tateoka T: Taxonomic studies of Oryza III. Key to the species and their enumeration. Bot Mag Tokyo. 1963, 76: 165-173.
- Tateoka T: Notes of some grasses. XVI. Embryo structure of the genus Oryza in relation to their systematics. Am J Bot. 1964, 51: 539-543. 10.2307/2440283.
- Nayar M: Origin and cytogenetics of rice. Adv Genet. 1973, 17: 153-292.
- Li CB, Zhang DM, Ge S, Lu BR, Hong DY: Differentiation and inter-genomic relationships among C, E and D genomes in the Oryza officinalis complex (Poaceae) as revealed by multicolor genomic in situ hybridization. Theor Appl Genet. 2001, 103: 197-203. 10.1007/s001220100562.
- de Wet JMJ: Concepts and systematics of domesticated cereals. Kulturpflanze. 1981, 29: 177-198. 10.1007/BF02014749.
- Juliano AB, Naredo MEB, Jackson MT: Taxonomic status of Oryza glumaepatula Steud. I. Comparative morphological studies of New World diploids and Asian AA genome species. Genet Res Crop Evol. 1998, 45: 197-203. 10.1023/A:1008682316449.
- Ge S, Sang T, Lu BR, Hong DY: Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proc Natl Acad Sci. 1999, 96: 14400-14405. 10.1073/pnas.96.25.14400.
- Brar DS, Khush GS: Alien introgression in rice. Plant Mol Biol. 1997, 35: 35-47. 10.1023/A:1005825519998.
- Ammiraju JS, Luo M, Goicoechea JL, Wang W, Kudrna D, Mueller C, Talag J, Kim H, Sisneros NB, Blackmon B, Fang E, Tomkins JB, Brar D, MacKill D, McCouch S, Kurata N, Lambert G, Galbraith DW, Arumuganathan K, Rao K, Walling JG, Gill N, Yu Y, SanMiguel P, Soderlund C, Jackson S, Wing RA: The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Res. 2006, 16: 140-147. 10.1101/gr.3766306.
- Kim H, SanMiguel P, Nelson W, Collura K, Wissotski M, Walling JG, Kim JP, Jackson SA, Soderlund C, Wing RA: Comparative physical mapping between O. sativa (AA genome type) and O. punctata (BB genome type). Genetics. 2007, 176: 379-390. 10.1534/genetics.106.068783.
- Soderlund C, Humphray S, Dunham A, French L: Contigs built with fingerprints, markers, and FPC V4.7. Genome Res. 2000, 10: 1772-1787. 10.1101/gr.GR-1375R.
- Pampanwar V, Engler F, Hatfield J, Blundy S, Gupta G, Soderlund C: FPC Web tools for rice, maize, and distribution. Plant Physiol. 2005, 138: 116-126. 10.1104/pp.104.056291.
- Soderlund C, Nelson W, Shoemaker A, Paterson A: SyMAP: a system for discovering and viewing syntenic regions of FPC maps. Genome Res. 2006, 16: 1159-1168. 10.1101/gr.5396706.
- OMAP SyMAP. [http://www.omap.org/cgi-bin/symap/omap_top.cgi]
- RepeatMasker. [http://www.repeatmasker.org/]
- Bao Z, Eddy SR: Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002, 12: 1269-1276. 10.1101/gr.88502.
- Piegu B, Guyot R, Picault N, Roulin A, Saniyal A, Kim H, Collura K, Brar DS, Jackson S, Wing RA, Panaud O: Doubling genome size without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Res. 2006, 16: 1262-1269. 10.1101/gr.5290206.
- Zuccolo A, Sebastian A, Talag J, Yu Y, Kim H, Collura K, Kudrna D, Wing RA: Transposable element distribution, abundance and role in genome size variation in the genus Oryza. BMC Evol Biol. 2007, 7: 152. 10.1186/1471-2148-7-152.
- Palatnik JF, Allen E, Wu X, Schommer C, Schwab R, Carrington JC, Weigel D: Control of leaf morphogenesis by microRNAs. Nature. 2003, 425: 257-263. 10.1038/nature01958.
- Vaucheret H, Vazquez F, Crete P, Bartel DP: The action of ARGONAUTE1 in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev. 2004, 18: 1187-1197. 10.1101/gad.1201404.
- Allen E, Xie Z, Gustafson AM, Carrington JC: MicroRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell. 2005, 121: 207-221. 10.1016/j.cell.2005.04.004.
- Vaucheret H: Post-transcriptional small RNA pathways in plants: mechanisms and regulations. Genes Dev. 2006, 20: 759-771. 10.1101/gad.1410506.
- Maher C, Stein L, Ware D: Evolution of Arabidopsis microRNA families through duplication events. Genome Res. 2006, 16: 510-519. 10.1101/gr.4680506.
- Margulies EH, Vinson JP, Miller W, Jaffe DB, Lindblad-Toh K, Chang JL, Green ED, Lander ES, Mullikin JC, Clamp M: An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing. Proc Natl Acad Sci. 2005, 102: 4795-4800. 10.1073/pnas.0409882102.
- Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W: Human-mouse alignments with BLASTZ. Genome Res. 2003, 13: 103-107. 10.1101/gr.809403.
- Beijing Genomics Institute. [http://www.genomics.org.cn/bgi_new/index.html]
- Feltus A, Wan J, Schulze S, Estill J, Jiang N, Paterson A: An SNP resource for rice genetics and breeding based on subspecies Indica and Japonica genome assemblies. Genome Res. 2004, 14: 1812-1819. 10.1101/gr.2479404.
- Gramene. [http://www.gramene.org/]
- The Oryza Map Alignment Project (OMAP). [http://www.omap.org/]
- Ammiraju JS, Zuccolo A, Yu Y, Song X, Piegu B, Chevalier F, Walling JG, Ma J, Talag J, Brar DS, SanMiguel PJ, Jiang N, Jackson SA, Panaud O, Wing RA: Evolutionary dynamics of an ancient retrotransposon family provides insights into evolution of genome size in the genus Oryza. Plant J. 2007, 52: 342-351. 10.1111/j.1365-313X.2007.03242.x.
- Zhang S, Gu YQ, Singh J, Coleman-Derr D, Brar DS, Jiang N, Lemaux PG: New insights into Oryza genome evolution: high gene colinearity and differential retrotransposon amplification. Plant Mol Biol. 2007, 64: 589-600. 10.1007/s11103-007-9178-3.
- Hass-Jacobus BL, Futrell-Griggs M, Abernathy B, Westerman R, Goicoechea JL, Stein J, Klein P, Hurwitz B, Zhou B, Rakhshan F, Sanyal A, Gill N, Lin JY, Walling JG, Luo MZ, Ammiraju JS, Kudrna D, Kim HR, Ware D, Wing RA, San Miguel P, Jackson SA: Integration of hybridization-based markers (overgos) into physical maps for comparative and evolutionary explorations in the genus Oryza and in Sorghum. BMC Genomics. 2006, 7: 199. 10.1186/1471-2164-7-199.
- Lu J, Tang T, Tang H, Huang J, Shi S, Wu CI: The accumulation of deleterious mutations in rice genomes: a hypothesis on the cost of domestication. Trends Genet. 2006, 22: 126-131. 10.1016/j.tig.2006.01.004.
- Tang T, Lu J, Huang J, He J, McCouch SR, Shen Y, Kai Z, Purugganan MD, Shi S, Wu CI: Genomic variation in rice: genesis of highly polymorphic linkage blocks during domestication. PLoS Genet. 2006, 2: e199. 10.1371/journal.pgen.0020199.
- The Rice Chromosome 3 Sequencing Consortium: Sequence, annotation, and analysis of synteny between rice chromosome 3 and diverged grass species. Genome Res. 2005, 15: 1284-1291. 10.1101/gr.3869505.
- Roberto R, Capozzi O, Wilson RK, Mardis ER, Lomiento M, Tuzun E, Cheng Z, Mootnick AR, Archidiacono N, Rocchi M, Eichler EE: Molecular refinement of gibbon genome rearrangements. Genome Res. 2007, 17: 249-257. 10.1101/gr.6052507.
- Newman TL, Tuzun E, Morrison VA, Hayden KE, Ventura M, McGrath SD, Rocchi M, Eichler EE: A genome-wide survey of structural variation between human and chimpanzee. Genome Res. 2005, 15: 1344-1356. 10.1101/gr.4338005.
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437: 376-380.
- AGI BAC/EST Resource Center. [http://www.genome.arizona.edu/orders/]
- Kent WJ: BLAT - the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664. 10.1101/gr.229202. Article published online before March 2002.
- GRAMENE: Synteny Analysis Pipeline. [http://dev.gramene.org/cmap/omap_methods.html]
- TIGR Rice Repeat Database. [ftp://ftp.tigr.org/pub/data/TIGR_Plant_Repeats/TIGR_Oryza_Repeats.v3.3]
- PHRAP. [http://www.phrap.org/phredphrapconsed.html]
- NCBI non-redundant nucleotide database. [ftp://ftp.ncbi.nih.gov/blast/db]
- McCarthy EM, McDonald JF: LTR_STRUC: a novel search and identification program for LTR retrotransposons. Bioinformatics. 2003, 19: 362-367. 10.1093/bioinformatics/btf878.
- Huang X, Madan A: CAP3: a DNA sequence assembly program. Genome Res. 1999, 9: 868-877. 10.1101/gr.9.9.868.
- Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, et al: A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 2002, 296: 79-92. 10.1126/science.1068037.
- Yu J, Wang J, Lin W, Li S, Li H, Zhou J, Ni P, Dong W, Hu S, Zeng C, Zhang J, Zhang Y, Li R, Xu Z, Li S, Li X, Zheng H, Cong L, Lin L, Yin J, Geng J, Li G, Shi J, Liu J, Lv H, Li J, Wang J, Deng Y, Ran L, Shi X, et al: The genomes of Oryza sativa: a history of duplications. PLoS Biol. 2005, 3: e38. 10.1371/journal.pbio.0030038.
- Griffiths-Jones S: miRBase: The microRNA sequence database. Methods Mol Biol. 2006, 342: 129-138.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
- Mathews DH, Sabina J, Zuker M, Turner DH: Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 1999, 288: 911-940. 10.1006/jmbi.1999.2700.
- The International SNP Map Working Group: A map of human genome sequence variation containing 1.4 million single nucleotide polymorphisms. Nature. 2001, 409: 928-933. 10.1038/35057149.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.