A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure
- Andrea Zuccolo1,
- John E Bowers2,
- James C Estill2,
- Zhiyong Xiong3,
- Meizhong Luo1, 4,
- Aswathy Sebastian1,
- José Luis Goicoechea1,
- Kristi Collura1,
- Yeisoo Yu1,
- Yuannian Jiao5,
- Jill Duarte5,
- Haibao Tang2, 6, 7,
- Saravanaraj Ayyampalayam2,
- Steve Rounsley8, 9,
- Dave Kudrna1,
- Andrew H Paterson2, 7,
- J Chris Pires3,
- Andre Chanderbali10,
- Douglas E Soltis10,
- Srikar Chamala10,
- Brad Barbazuk10,
- Pamela S Soltis11,
- Victor A Albert12,
- Hong Ma5, 13,
- Dina Mandoli14,
- Jody Banks15,
- John E Carlson16,
- Jeffrey Tomkins17,
- Claude W dePamphilis5,
- Rod A Wing1 and
- Jim Leebens-Mack2
© Zuccolo et al.; licensee BioMed Central Ltd. 2011
Received: 21 November 2010
Accepted: 27 May 2011
Published: 27 May 2011
Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome.
Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella.
When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.
The origin and rapid diversification of the angiosperms (flowering plants) were pivotal events in the evolutionary history of Earth's biota. Over the past 130 to 150 million years angiosperms have diversified to include approximately 350,000 species occupying nearly all habitable terrestrial and many aquatic environments. Angiosperms generate the vast majority of human food either directly or indirectly as animal feed, and they account for a huge proportion of land-based photosynthesis and carbon sequestration. Comparative analyses of genome sequences and gene function for a growing number of species are shedding light on how gene and genome duplications have contributed to the diversification within major flowering plant lineages (for example, Rosidae, Asteridae, Monocotyledoneae), but elucidation of the genetic and genomic processes underlying the key innovations associated with the origin of flowering plants (for example, typically bisexual flowers, endosperm formation, double fertilization, ovules with two integuments, seed development within the carpel) requires comparisons between lineages that diverged from the last common ancestor of all extant angiosperms [2, 3].
Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree or shrub species endemic to the forests of New Caledonia, as the sister species to all other extant angiosperms [4–8]. Amborella is no more 'ancient' or 'primitive' than any other extant flowering plant species, but comparisons between Amborella and other angiosperms are allowing researchers to triangulate on characteristics of their last common ancestor. Using a similar approach, researchers have used the complete genome sequence of platypus, Ornithorhynchus anatinus, representing the sister group of all other extant mammals, to elucidate mammalian genome evolution.
Previous comparisons of transcriptome content, gene expression patterns [11–13], and gene function [14, 15] between Amborella and other flowering plant species have suggested that much of the floral development program that has been characterized in Arabidopsis, snapdragon and maize existed in the last common ancestor of extant angiosperms. While gene duplications in the MADS-box transcription factor family likely contributed to the earliest floral development regulatory networks [11, 12, 16–19], it is not clear whether these were single gene duplications or the product of polyploidization. Genome duplications have occurred repeatedly throughout angiosperm history [20–23] but there is uncertainty in the timing of polyploidy events relative to the origin of the angiosperms and important innovations in flowering plant history.
Here we describe a BAC-based draft physical map for A. trichopoda and use BAC end sequences (BESs) to compare the structure of the Amborella genome to representative eudicot (Vitis, Populus and Arabidopsis) and grass (Oryza) genomes. Comparative analyses of sequences for two large contiguous regions (487.3 and 629.7 kb in the Amborella genome) were also performed. In addition, we use a large transcriptome assembly to identify BAC ends matching protein-coding sequences. Our aim here is to begin to investigate whether regions of these genomes have remained syntenic throughout angiosperm history, and determine whether ancient genome duplications discovered in eudicot and grass genomes [26–29] occurred before or after the divergence of these lineages from the Amborella lineage. In addition, the physical map and sequence analyses establish a framework for future studies of all flowering plant genomes, including the Amborella genome itself.
Results and discussion
BAC library and physical map
The structure and composition of the 870 Mbp/C A. trichopoda genome were investigated through physical mapping of clones from a 5.2× coverage BAC library. The library was constructed after partial digestion of high-molecular-weight DNA with HindIII. The library, which comprises 36,684 BAC clones with an estimated average insert size of 123 kb, is available through the Arizona Genomics Institute. The BAC library was double spotted in high density onto Hybond N+ filters. All 36,684 clones were end-sequenced, and a physical map was constructed after high information content fingerprinting (HICF) [32, 33]. A total of 32,719 fingerprinted BACs were assembled into 3,106 contigs and 1,356 singletons using the program FPC version 7.2.
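The stated 5.2× coverage follows directly from the clone count, the mean insert size and the genome size estimate. A minimal sketch of that arithmetic, using only figures quoted above (the function name is illustrative, not from the original study):

```python
# Figures quoted in the text; 870 Mbp is the haploid genome size estimate.
N_CLONES = 36_684
MEAN_INSERT_KB = 123
GENOME_MBP = 870


def library_coverage(n_clones, mean_insert_kb, genome_mbp):
    """Return the number of genome equivalents held in a clone library."""
    total_insert_mbp = n_clones * mean_insert_kb / 1000  # kb -> Mbp
    return total_insert_mbp / genome_mbp


coverage = library_coverage(N_CLONES, MEAN_INSERT_KB, GENOME_MBP)
print(f"{coverage:.1f}x")
```

With these inputs the library holds about 4,512 Mbp of insert DNA, which rounds to the 5.2 genome equivalents reported.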
The quality of the physical map was assessed by screening the arrayed library with probes developed for Amborella homologs for eight genes that have been found to be single-copy in sequenced plant genomes [35, 36]. Probes derived from Amborella cDNA clones or PCR amplicons were putative homologs of the following single-copy Arabidopsis genes: ASD (At1g14810), DWARF1 (At3g19820), GIGANTEA (At1g22770), LEAFY (At5g61850), a dienelactone hydrolase gene (At2g32520), a cytochrome-C-oxidase-related gene (At4g37830), EIF3K (At4g33250) and a hypothetical protein-coding gene with strong similarity to rice gene Os02g0593400 (At5g63135). All verified positive clones mapped to the same FPC contig for six of the eight probes (Figure S1 in Additional file 1). Positive clones for the EIF3K and the hypothetical protein-coding gene probes were each distributed between two FPC contigs and inspection of the HICF bands for these contigs suggests that the genes have been duplicated in the Amborella lineage. In accordance with the expected library coverage, the single-copy nuclear gene probes hybridized to 3 to 13 clones (mean 6.9).
The correlation between HICF bands and the number of BACs included in each FPC contig was 0.655 for all contigs and 0.917 after removing two contigs derived from the chloroplast and mitochondrial genomes and one contig composed largely of repetitive elements (Figure S2 in Additional file 1). We used a calibration of average insert size (123 kb) over the average number of HICF bands per BAC clone (128) to obtain a rough estimate of FPC contig lengths. Of 77 FPC contigs with 39 or more BACs (not including the contigs with the plastome and repetitive elements), estimated lengths ranged from 308 to 1,429 kb.
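The calibration described above amounts to a per-band length of 123 kb / 128 bands, or roughly 0.96 kb per HICF band. The sketch below applies that factor to a band count; the function name and the example band count are assumptions for illustration, and the study's exact procedure for counting bands per contig is not specified here.

```python
MEAN_INSERT_KB = 123        # average BAC insert size from the text
MEAN_BANDS_PER_BAC = 128    # average HICF bands per BAC clone from the text
KB_PER_BAND = MEAN_INSERT_KB / MEAN_BANDS_PER_BAC  # about 0.96 kb per band


def estimate_contig_length_kb(n_bands):
    """Rough contig length estimate from a count of HICF bands."""
    return n_bands * KB_PER_BAND


# Hypothetical contig spanning 1,000 bands (not a value from the study).
print(f"{estimate_contig_length_kb(1000):.0f} kb")
```

By construction, a contig spanning 128 bands maps back to a single average insert of 123 kb.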
BAC end sequencing was performed on all fingerprinted BACs, producing 69,466 Sanger reads with an average length of 695 bp after quality and vector trimming. This corresponds to 48.25 Mbp, or roughly 5.4% of the Amborella genome. BESs were related to the physical map and used to identify regions of synteny between regions of the Amborella genome and the sequenced Arabidopsis, Populus, Vitis (grape), and Oryza (rice) genomes (see below). In addition, end sequences were used to verify the identity of the three excluded FPC contigs described above. All BESs mapping at least 100 bp apart on the plastid genome were found in the same FPC contig. This contig included just 532 BACs, indicating very low (1.6%) plastid DNA contamination.
Characterization of repeats in BAC end and shotgun sequences
Repeat composition and frequency in the Amborella genome were characterized through analysis of the BAC end and whole genome survey sequences. Reads were first compared with sequences in Repbase (v.15.08) using BLASTN. In order to minimize the effect of divergence between Amborella genes and homologous repeats from other species, we used relaxed BLASTN settings (-q -4 -r 5) to accommodate an estimated 160 million years of sequence divergence since the last common ancestor of extant flowering plants [8, 40–42] while maintaining rigorous support for significant hits (E-value threshold was set at 1e-10). All BAC end sequences without significant hits were then compared with the non-redundant protein database in GenBank using BLASTX and an E-value threshold of 1e-5. Finally, the remaining sequences without matches in Repbase or the GenBank nr database were compared with sequences that did have matches in either database using BLASTN with an E-value threshold of 1.0e-10. We report results both excluding these 'internal' BLAST searches and including them (I). Together these results provide estimates of transposable element (TE) content based on conservative and more comprehensive (and possibly more permissive; I) search strategies.
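The three-tier search described above can be summarized as a simple decision cascade. The following sketch assumes the BLAST results have already been reduced to best E-values per read; the data structures, labels and function name are hypothetical, and treating the thresholds as inclusive is an assumption.

```python
REPBASE_E = 1e-10   # BLASTN against Repbase
NR_E = 1e-5         # BLASTX against GenBank nr
INTERNAL_E = 1e-10  # 'internal' BLASTN against already-matched reads


def classify_reads(read_ids, repbase_hits, nr_hits, internal_hits):
    """Assign each read to the first search tier that yields a significant hit.

    repbase_hits, nr_hits: dict read_id -> best E-value.
    internal_hits: dict read_id -> (subject_read_id, E-value).
    """
    labels = {}
    # Tiers 1 and 2: external databases, searched in order.
    for rid in read_ids:
        if repbase_hits.get(rid, 1.0) <= REPBASE_E:
            labels[rid] = "repbase"
        elif nr_hits.get(rid, 1.0) <= NR_E:
            labels[rid] = "nr"
    # Tier 3: unmatched reads count only if they hit a read matched above.
    for rid in read_ids:
        if rid in labels:
            continue
        subject, evalue = internal_hits.get(rid, (None, 1.0))
        matched = labels.get(subject) in ("repbase", "nr")
        labels[rid] = "internal" if matched and evalue <= INTERNAL_E else "unmatched"
    return labels


# Hypothetical reads and E-values, for illustration only.
labels = classify_reads(
    ["r1", "r2", "r3", "r4"],
    repbase_hits={"r1": 1e-30, "r2": 1e-3},
    nr_hits={"r2": 1e-8},
    internal_hits={"r3": ("r1", 1e-20), "r4": ("r3", 1e-40)},
)
print(labels)
```

Reporting counts with and without the "internal" label corresponds to the more permissive (I) and the conservative estimates, respectively.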
[Table: Frequencies of BAC end sequences (BESs) and Sanger shotgun sequences (SGSs) matching sequences in Repbase. Columns: absolute number in BESs; percentage repeats in BESs; absolute number in SGSs; percentage repeats in SGSs. Row labels include: LTR not classified; Retro not classified.]
[Table: Putatively high-copy MITEs identified in the BESs and Sanger shotgun sequences using the MUST pipeline. Columns: inverted repeat length; copy number estimate (two columns).]
[Table: Simple sequence repeats identified in BESs and Sanger shotgun sequences.]
Distribution of BESs with matches to protein-coding regions of reference genomes
All BESs and shotgun sequences were compared to the GenBank nr database using BLASTX with an e-value threshold of 1e-5. After the removal of sequences similar to TEs, the overall frequencies of sequences finding matches in the protein database were 11.9% and 8.05% for the BES and Sanger shotgun sequences, respectively. For BESs from FPC contigs with ten or more BACs, we found a negative correlation between the frequencies of BESs matching protein-coding genes and LTR retrotransposons (r = -0.423, P < 0.0001). As has been described for other genomes [54–56], gene density seems to be negatively correlated with retrotransposon density in the Amborella genome.
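The reported r is a correlation across contigs between two per-contig frequencies. A minimal sketch of a Pearson correlation on such paired frequencies is shown below; the per-contig values are invented for illustration, and the source does not state which correlation coefficient or software was used.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-contig fractions of BESs matching genes and LTR elements.
gene_freq = [0.20, 0.15, 0.10, 0.05]
ltr_freq = [0.05, 0.12, 0.18, 0.30]
r = pearson_r(gene_freq, ltr_freq)
print(f"r = {r:.3f}")
```

A negative r, as in the study, means contigs richer in gene-matching BESs tend to be poorer in LTR-matching BESs.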
Identification of syntenic blocks between Amborella, Arabidopsis, rice, poplar and grape
Taking advantage of the availability of a phase I physical map assembly, we mapped the Amborella contigs onto the genomes of A. thaliana, Populus trichocarpa, Vitis vinifera, and O. sativa. We focused on the 77 largest contigs with at least 39 clones. BLAST analyses of BESs were done within the context of their linkages within FPC contigs. All of the contig BESs classified as repeats (see above) were discarded. Those remaining were compared against the four reference genomes. Because of the large evolutionary time that separates Amborella from the other four sequenced genomes [41, 42, 57], the comparisons were carried out at the protein level using tBLASTX; only the best hits were taken into account. Amborella FPC contigs were considered for further analyses if at least two BESs had matches with bit scores greater than 80 (typically a maximum e-value of 1.0E-20 over 100 amino acid residues) to loci separated by less than 500 kb within one of the four genomes being compared. Positive matches were used as anchors to circumscribe 4-Mbp tracts within the reference genomes and a second, more focused tBLASTX search was performed comparing the BESs with these regions. An e-value threshold of 1.0E-4 was used for the second set of tBLASTX searches and all significant hits were used to identify syntenic regions. We considered a contig as anchored if the contig had at least four positive hits (e-value lower than 1.0e-4) to at least three distinct genes.
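The two-stage anchoring rule above can be expressed as a pair of filters on tBLASTX hits. The sketch below assumes hits have been parsed into simple records; the record layout, function names and example values are hypothetical, and counting "positive hits" per hit rather than per BES is an assumption.

```python
from collections import namedtuple
from itertools import combinations

# One best hit of a BES to a reference genome (hypothetical record layout).
Hit = namedtuple("Hit", "bes chrom pos bitscore evalue gene")


def is_candidate(best_hits, min_bits=80, max_sep=500_000):
    """Stage 1: two different BESs hit loci < 500 kb apart with bit score > 80."""
    strong = [h for h in best_hits if h.bitscore > min_bits]
    return any(
        a.bes != b.bes and a.chrom == b.chrom and abs(a.pos - b.pos) < max_sep
        for a, b in combinations(strong, 2)
    )


def is_anchored(region_hits, max_evalue=1e-4, min_hits=4, min_genes=3):
    """Stage 2: at least four hits (e-value < 1e-4) to at least three genes."""
    significant = [h for h in region_hits if h.evalue < max_evalue]
    return len(significant) >= min_hits and len({h.gene for h in significant}) >= min_genes


# Hypothetical hits for one FPC contig against one reference chromosome.
hits = [
    Hit("bes1", "chr5", 1_200_000, 95.0, 1e-22, "geneA"),
    Hit("bes2", "chr5", 1_450_000, 88.0, 1e-20, "geneB"),
    Hit("bes3", "chr5", 1_600_000, 60.0, 1e-8, "geneC"),
    Hit("bes4", "chr5", 1_610_000, 55.0, 1e-6, "geneC"),
]
print(is_candidate(hits), is_anchored(hits))
```

In this example the first two hits satisfy the stage 1 rule, and all four hits to three genes satisfy the stage 2 rule.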
[Table 4: Statistics for cDNA sequences included in the multi-library transcriptome assembly of 246,196 unigenes with lengths greater than 100 bp. Columns: tissue and library name; sequencing platform; number of reads; total passing bases (Mb).]
- Apical meristem (Atr12): 454 FLX Titanium
- Male flowers (Atr15): 454 FLX Titanium
- Old leaves (Atr14): 454 FLX Titanium
- Old stem (Atr13): 454 FLX Titanium
- Pre-meiotic female flower buds (Atr10): 454 FLX GS
- Pre-meiotic female flower bud (Atr02)
- Pre-meiotic male flower bud (Atr01)
- Root (Atr11): 454 FLX GS
- Stem (Atr16): 454 FLX Titanium
Paleopolyploidy in angiosperm genomes
Paleopolyploidy events have been well characterized in all four sequenced genomes analyzed here [29, 45, 58–60], and the syntenic Amborella FPC contigs described above often match multiple regions in these genomes. The most ancient of these paleopolyploidy events is the so-called γ triplication that has been inferred to have occurred before the divergence of the Asteridae (represented by tomato, Solanum lycopersicum) and the Rosidae, including Vitis, Populus and Arabidopsis. Given the very incomplete view of the Amborella genome that is available in the BES data, we are not able to assess synteny between Amborella FPC contigs. Nevertheless, comparisons between the Amborella contigs and sets of syntenic blocks in the Vitis genome indicate that the γ triplication most likely occurred sometime after the divergence of all other angiosperms from the lineage leading to Amborella.
All BESs were compared to all annotated protein-coding genes in the Vitis genome placed within the context of the pre-triplication ancestral gene blocks and post-triplication syntenic segments identified by Tang et al. A total of 328 Amborella FPC contigs had between two and eight genes with significant best BLASTX matches (e-values ≤1.0E-6) to Vitis genes corresponding to pre-triplication gene blocks in the ancestral genome. In most of these cases (199 of 328; Additional file 2), best hits were distributed between two or three homeologous (that is, post-triplication) syntenic Vitis genome segments. Of the remaining 129 Amborella FPC contigs with BESs showing significant BLASTX hits to a single Vitis subgenome (that is, single copy of a triplicated ancestral block), most (113) included just 2 genes mapping to the ancestral Vitis gene blocks (14 including 3 genes, and 2 including 4 genes) (Additional file 2). All 21 FPC contigs with best BLASTX matches to five or more genes within the ancestral Vitis blocks were distributed among two or three post-triplication subgenomes. Complete sequences for the Amborella BAC contigs may reveal more even distribution of segments among Vitis subgenomes, but the results described here suggest that triplication, fractionation and divergence of homeologous segments in the Vitis genome postdate the divergence between lineages leading to Vitis and Amborella (that is, the last common ancestor of all extant angiosperms).
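The tally above classifies each Amborella contig by how many post-triplication Vitis subgenomes its best hits fall on. A minimal sketch of that bookkeeping follows; the contig names, gene names and subgenome labels are invented, and the closing assertion only checks that the counts quoted in the text add up.

```python
from collections import Counter


def subgenome_spread(best_hits):
    """Number of distinct Vitis subgenomes hit by one contig's BESs.

    best_hits: iterable of (vitis_gene, subgenome_label) pairs.
    """
    return len({subgenome for _gene, subgenome in best_hits})


def tally_by_spread(contig_hits):
    """Count contigs by the number of subgenomes their best hits span."""
    return Counter(subgenome_spread(hits) for hits in contig_hits.values())


# Hypothetical contigs; all labels are invented for illustration.
example = {
    "ctgA": [("g1", "sub1"), ("g2", "sub2"), ("g3", "sub1")],
    "ctgB": [("g4", "sub3"), ("g5", "sub3")],
    "ctgC": [("g6", "sub1"), ("g7", "sub2"), ("g8", "sub3")],
}
tally = tally_by_spread(example)
print(dict(tally))

# Consistency check on the counts quoted in the text.
single_subgenome = 113 + 14 + 2
assert single_subgenome == 129 and single_subgenome + 199 == 328
```

Under the argument in the text, a triplication predating the Amborella divergence would be expected to place each contig's hits largely on one subgenome, whereas hits spread over two or three subgenomes point to a later triplication.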
Analysis of complete sequences for two Amborella BAC contigs
The DAWGPAWS suite of scripts was used to organize ab initio gene predictions, BLAST results and the output of repeat identification tools [61, 62]. Ab initio gene predictions were generated using FGENESH, AUGUSTUS, SNAP, GeneID and GenScan. In addition, Amborella EST sequences produced by the 454 Titanium platform (2,943,273 reads; total read size of approximately 776 Mbp; average read length of 263.60 bp) and Sanger sequencing (38,147 reads; total read size of approximately 21.3 Mbp; average read length of 559.57 bp) were splice-aligned to the contigs using GMAP (Genomic Mapping and Alignment Program) with the PASA (Program to Assemble Spliced Alignments) genome annotation tool. All predictions were manually compared with BLASTX results against gene annotations from Arabidopsis, Vitis, Z. mays, Medicago, Oryza [72, 73], and Sorghum as well as tBLASTx results against the Amborella transcript assemblies. GBrowse views of gene annotations and BLAST results for each contig are available at the Ancestral Angiosperm Genome Project website.
At least two genome duplications (ρ and σ) have been inferred to have occurred within the monocot lineage leading to rice since divergence of monocots and eudicots. These duplications were evident in comparisons with both Amborella contigs. Regions of contig 1003 were found to be syntenic with portions of rice chromosomes 2 and 4 derived from the ρ duplication and a portion of chromosome 10 (Figure 5) that is related to these two regions through the earlier σ duplication. The LASTZ analysis of contig 431 revealed synteny with seven regions in the rice genome (Figure 6) and one of the 'putative ancestral regions' (PAR 17) characterized by Tang et al. These PARs were defined as regions of synteny between the rice and Vitis genomes. Phylogenetic analyses of genes in Amborella contig 431 and syntenic regions of the rice and Vitis genomes may elucidate the timing of the γ triplication and genome duplications evident in synteny analyses of the rice genome relative to the divergence of monocots and eudicots.
Phylogenetic analyses of gene families represented in sequenced Amborella contigs
While the fractionation process has resulted in the loss of most duplicated genes following the ancient polyploidy events evident in the syntenic Vitis and rice segments shown in Figures 5 and 6, duplicate Vitis genes have been retained for homologs of three Amborella genes located on contig 431 (Figure 6a). These genes were used to search the PlantTribes gene family database. The three gene sets identified in the synteny analysis correspond to three gene families (auxin-independent growth promoter, ceramidase and plant uncoupling mitochondrial protein) circumscribed through OrthoMCL clustering of gene annotations from the available Arabidopsis, Carica (papaya), Populus, Medicago (alfalfa), Glycine, Cucumis (cucumber), Vitis, Mimulus, Oryza, Sorghum, Selaginella (spike moss) and Physcomitrella genomes. Homologous genes sampled from exemplar asterid, ranunculid, non-grass monocot and gymnosperm species were obtained from EST assembly databases [25, 77, 78] and were added to each gene family set. Sequences in each gene family set were aligned using MUSCLE, and RAxML run with the GTRGAMMA substitution model was used to obtain maximum likelihood estimates of gene trees.
A. trichopoda is the sister species to the large clade encompassing all other extant flowering plants. As such, comparative analyses of Amborella and other flowering plants offer a uniquely informative perspective on the most recent common ancestor of all extant angiosperms. The physical map and BAC end sequences described in this study provide a low-resolution view of the Amborella genome. Nonetheless, these data shed light on genomic features of the last common ancestor of flowering plants. Moreover, the Amborella genome provides a unique reference for understanding genome evolution throughout angiosperm history. When placed in the context of the physical map, BESs representing just 5.4% of the Amborella genome allowed reconstruction of ancestral gene blocks in regions represented by 29 BAC contigs and inference of the timing of structural mutations that disrupted these blocks (Figure 3).
Analyses of BESs and BAC contigs also indicate that the ancient γ polyploidy event inferred from the Arabidopsis, Carica, Populus, and Vitis genomes occurred after the Amborella lineage diverged from the rest of the angiosperms. Therefore, if the origin of angiosperms was associated with a genome duplication as has been hypothesized elsewhere [16, 20, 23], that polyploidy event predated the γ event.
Materials and methods
BAC library construction
Protocols for DNA megabase preparation, library construction, picking and arraying proposed in Luo and Wing were followed.
The SNaPshot fingerprinting technique was adopted with the modifications described by Kim et al. SNaPshot reactions were loaded into ABI 3730xl DNA sequencers. Analysis of data for each contig was carried out using the ABI Data Collection Program.
Physical map construction
Fingerprints were assembled into contigs using the program FPC version 7.2. The initial assembly was carried out using a Sulston score threshold of 1e-50 followed by three rounds of dequeuing at the same stringency and auto-merging of contigs at 1e-21.
BAC end extraction and sequencing
BAC DNA was extracted and end sequenced from 36,684 clones using the methods described by Ammiraju et al. [83, 84]. Sequence quality assessment and trimming were carried out using the programs Phred and Lucy.
Random sheared library
A random sheared library was constructed as previously described.
cDNA sequencing and assembly
Additional Sanger ESTs were generated from available male and female flower bud cDNA libraries (Table 4). Libraries for 454 sequencing were constructed from the tissues listed in Table 4 using the Mint cDNA synthesis kit (Evrogen, Moscow, Russia). Total RNAs for cDNA synthesis were isolated using a combination of CTAB extraction and the RNeasy Plant Mini kit (Qiagen, Valencia, CA, USA) as previously described for basal angiosperms. Two rounds of messenger RNA isolation were performed with the Poly(A)Purist™ mRNA Purification Kit (Ambion Inc., Austin, TX, USA) according to the manufacturer's recommendation. Contaminant DNA was removed with DNA-free™ (Ambion Inc.) and mRNA quality was verified using a Bioanalyzer (Agilent Inc., Santa Clara, CA, USA). Vector and adaptor sequences were trimmed from 454 Titanium (2,943,273 reads; total read size of approximately 776 Mbp; average read length of 263.60 bp) and Sanger sequences (38,147 reads; total read size of approximately 21.3 Mbp; average read length of 559.57 bp) using seqclean, and the trimmed reads were assembled using MIRA.
Similarity searches, repeat classification and contig anchoring
Similarity searches were carried out using the programs BLASTN and BLASTX. BLASTN was run under relaxed settings (-q -4 -r 5) in order to accommodate the evolutionary distance between Amborella and the species included in the repeat databases used; the significance threshold was set at 1e-10. In the case of BLASTX searches the threshold was set at 1e-5 or 1e-4 for the BES synteny analysis. tBLASTX was used to anchor the contigs to the reference genomes (see Results for details).
The databases used in similarity searches were RepBase version 15.08, the GenBank non-redundant (nr) database, and the Oryza, Arabidopsis, Vitis and Populus genome sequences.
Validation of repeat searches and MITE identification
The program MUST was used for de novo characterization of highly repeated sequences; results were then inspected for the presence of MITE features. Inverted repeats were identified by manually parsing the results of dot-plot comparisons made using the program 'Dotter'.
Simple sequence repeat searches
Fluorescence in situ hybridization
FPC contigs were validated by hybridizing BAC DNAs to Amborella chromosome squashes. DNA was prepared for BAC mapping to the middle and both ends of BAC contigs 431 and 1003 and used to prepare fluorescently labeled BAC-FISH probes. Chromosome squashes were prepared from root tips and labeled BAC-FISH probes were prepared as described by Xiong et al.
Contig sequencing and annotation
Minimum tiling paths of seven and six BACs were identified for contigs 1003 and 431, respectively, by visual inspection of the FPC assemblies. Adjacent clones were chosen based on their reciprocal position and the probability value associated with their overlapping fingerprinted bands as shown by FPC. Sequencing of selected minimum tiling path BACs was done to phase II quality as previously described. Phase II BAC sequences were then assembled into 1003 and 431 contig sequences based on dot plot comparisons and overlap similarity between adjacent clones.
Perl scripts available from the DAWGPAWS package [61, 62] were used to convert computational annotation results from multiple sources into a single GFF3 file for combined evidence annotation in Apollo and publication in GBrowse. Ab initio gene annotation programs used in this process included FGENESH, AUGUSTUS, SNAP, GeneID and GenScan. Because Amborella-specific gene model parameterizations were not available for these programs, multiple plant models were used for each ab initio program. The sequence of the entire contig was BLASTx (e < 1 × 10-5) searched against gene annotations from Arabidopsis, Vitis, Z. mays, Medicago, Oryza, and Sorghum as well as tBLASTx (e < 1 × 10-5) searched against a database of comprehensive Amborella transcript assemblies. In addition, Amborella EST sequences (reads and assemblies; Table 4) were splice-aligned to the contigs using GMAP (Genomic Mapping and Alignment Program) with the PASA (Program to Assemble Spliced Alignments) genome annotation tool. The ab initio predictions and BLAST search results were manually combined into gene models using the Apollo genome annotation curation tool.
Synteny analysis of sequenced BAC contigs with Vitis and Oryza genomes
Sequenced Amborella BAC contigs 431 (487,318 bp) and 1003 (629,678 bp) were compared to the International Rice Genome Sequencing Project (IRGSP) rice genome assembly (version 5) and the Genoscope 12× Vitis genome assembly using LASTZ and default parameters. Prior to LASTZ comparisons, all genomic sequences were masked using NCBI's WindowMasker to remove simple repeats. Significant matches after repeat masking were visualized as dot plots. Gene annotations for the rice and Vitis genomes were obtained from the Rice Annotation Project and Genoscope, respectively, and plotted on the vertical axes of the dot plots (Figures 5 and 6). FGENESH annotations for the Amborella contigs were included on the horizontal axes of the dot plots. LASTZ scores were summed for all aligned Amborella-rice or Amborella-Vitis blocks within 100 kb of each other in sequenced genomes. All regions with summed scores >100,000 were considered as syntenic and included in Figures 5 and 6.
All alignments were carried out using the program 'MUSCLE' run under default settings. Maximum likelihood analyses were run on aligned DNA and amino acid sequences using RAxML and the GTRGAMMA nucleotide substitution model.
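The score-summing step can be sketched as single-linkage clustering of alignment blocks along the reference genome, followed by a threshold on the summed score. The interpretation of "within 100 kb of each other" as a gap between consecutive blocks on the same reference chromosome is an assumption, as are the tuple layout, function name and example values.

```python
def syntenic_regions(blocks, max_gap=100_000, min_score=100_000):
    """Merge alignment blocks lying within max_gap on the same chromosome,
    sum their scores, and keep regions whose summed score exceeds min_score.

    blocks: iterable of (chrom, start, end, score) in reference coordinates.
    """
    regions = []
    current = None
    for chrom, start, end, score in sorted(blocks):
        if (current is not None and current["chrom"] == chrom
                and start - current["end"] <= max_gap):
            # Close enough to the running region: extend it and add the score.
            current["end"] = max(current["end"], end)
            current["score"] += score
        else:
            if current is not None:
                regions.append(current)
            current = {"chrom": chrom, "start": start, "end": end, "score": score}
    if current is not None:
        regions.append(current)
    return [r for r in regions if r["score"] > min_score]


# Hypothetical LASTZ blocks (chromosome names, coordinates and scores invented).
blocks = [
    ("chr2", 1_000_000, 1_010_000, 60_000),
    ("chr2", 1_050_000, 1_060_000, 55_000),
    ("chr2", 5_000_000, 5_020_000, 90_000),
    ("chr4", 100, 5_000, 120_000),
]
regions = syntenic_regions(blocks)
for region in regions:
    print(region)
```

Here the first two chr2 blocks merge into one region with a summed score of 115,000 and pass the threshold, while the isolated chr2 block at 5 Mbp (90,000) does not.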
Submission of data to GenBank databases
BESs (HR616970 to HR686434), full-length BAC sequences (AC243594.1 to AC243606.1), Sanger shotgun sequences (HR614237 to HR616931), 454 shotgun sequences (SRP006044), Sanger ESTs (FD425831.1 to FD443502.1) and 454 cDNA sequences (SRX018174, SRX018165, SRX018164, SRX018163, SRX018157, SRX018156) have been deposited in the appropriate NCBI GenBank sequence databases. All sequences are also available at the Ancestral Angiosperm Genome Project website.
BAC: bacterial artificial chromosome
BES: BAC end sequence
EST: expressed sequence tag
FISH: fluorescence in situ hybridization
HICF: high information content fingerprinting
LINE: long interspersed element
LTR: long terminal repeat
MITE: miniature inverted repeat transposable element
SSR: simple sequence repeat
This work was supported with funding from National Science Foundation grants 0208502, 0638595 and 0922742. We also acknowledge helpful comments and suggestions provided by anonymous reviewers.
- Cantino P, Doyle JJ, Graham S, Judd W, Olmstead R, Soltis D, Soltis P, Donoghue MJ: Towards a phylogenetic nomenclature of Tracheophyta. Taxon. 2007, 56: 822-846. 10.2307/25065865.
- Leebens-Mack JH, Wall PK, Duarte J, Zheng Z, Oppenheimer D, dePamphilis CW: A genomics approach to the study of floral developmental genetics: strengths and limitations. Adv Bot Res. 2006, 44: 527-549.
- Soltis DE, Albert VA, Leebens-Mack J, Palmer JD, Wing RA, dePamphilis CW, Ma H, Carlson JE, Altman N, Kim S, Wall PK, Zuccolo A, Soltis PS: The Amborella genome: an evolutionary reference for plant biology. Genome Biol. 2008, 9: 402. 10.1186/gb-2008-9-3-402.
- Mathews S, Donoghue MJ: The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science. 1999, 286: 947-950. 10.1126/science.286.5441.947.
- Qiu YL, Lee J, Bernasconi-Quadroni F, Soltis DE, Soltis PS, Zanis M, Zimmer EA, Chen Z, Savolainen V, Chase MW: The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature. 1999, 402: 404-407. 10.1038/46536.
- Soltis PS, Soltis DE, Chase MW: Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature. 1999, 402: 402-404. 10.1038/46528.
- Jansen RK, Cai Z, Raubeson LA, Daniell H, dePamphilis CW, Leebens-Mack J, Muller KF, Guisinger-Bellian M, Haberle RC, Hansen AK, Chumley TW, Lee SB, Peery R, McNeal JR, Kuehl JV, Boore JL: Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci USA. 2007, 104: 19369-19374. 10.1073/pnas.0709121104.
- Moore MJ, Bell CD, Soltis PS, Soltis DE: Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proc Natl Acad Sci USA. 2007, 104: 19363-19368. 10.1073/pnas.0708072104.
- Warren WC, Hillier LW, Marshall Graves JA, Birney E, Ponting CP, Grutzner F, Belov K, Miller W, Clarke L, Chinwalla AT, Yang SP, Heger A, Locke DP, Miethke P, Waters PD, Veyrunes F, Fulton L, Fulton B, Graves T, Wallis J, Puente XS, Lopez-Otin C, Ordonez GR, Eichler EE, Chen L, Cheng Z, Deakin JE, Alsop A, Thompson K, Kirby P, et al: Genome analysis of the platypus reveals unique signatures of evolution. Nature. 2008, 453: 175-183. 10.1038/nature06936.
- Albert VA, Soltis DE, Carlson JE, Farmerie WG, Wall PK, Ilut DC, Solow TM, Mueller LA, Landherr LL, Hu Y, Buzgo M, Kim S, Yoo MJ, Frohlich MW, Perl-Treves R, Schlarbaum SE, Bliss BJ, Zhang X, Tanksley SD, Oppenheimer DG, Soltis PS, Ma H, dePamphilis CW, Leebens-Mack JH: Floral gene resources from basal angiosperms for comparative genomics research. BMC Plant Biol. 2005, 5: 5. 10.1186/1471-2229-5-5.
- Kim S, Koh J, Yoo MJ, Kong H, Hu Y, Ma H, Soltis PS, Soltis DE: Expression of floral MADS-box genes in basal angiosperms: implications for the evolution of floral regulators. Plant J. 2005, 43: 724-744. 10.1111/j.1365-313X.2005.02487.x.
- Soltis DE, Chanderbali AS, Kim S, Buzgo M, Soltis PS: The ABC model and its applicability to basal angiosperms. Ann Bot. 2007, 100: 155-163. 10.1093/aob/mcm117.
- Vialette-Guiraud AC, Adam H, Finet C, Jasinski S, Jouannic S, Scutt CP: Insights from ANA-grade angiosperms into the early evolution of CUP-SHAPED COTYLEDON genes. Ann Bot. 2011, 107: 1511-1519. 10.1093/aob/mcr024.
- Fourquin C, Vinauger-Douard M, Chambrier P, Berne-Dedieu A, Scutt CP: Functional conservation between CRABS CLAW orthologues from widely diverged angiosperms. Ann Bot. 2007, 100: 651-657. 10.1093/aob/mcm136.
- Fourquin C, Vinauger-Douard M, Fogliani B, Dumas C, Scutt CP: Evidence that CRABS CLAW and TOUSLED have conserved their roles in carpel development since the ancestor of the extant angiosperms. Proc Natl Acad Sci USA. 2005, 102: 4649-4654. 10.1073/pnas.0409577102.
- Zahn LM, Kong H, Leebens-Mack JH, Kim S, Soltis PS, Landherr LL, Soltis DE, dePamphilis CW, Ma H: The evolution of the SEPALLATA subfamily of MADS-box genes: a preangiosperm origin with multiple duplications throughout angiosperm history. Genetics. 2005, 169: 2209-2223. 10.1534/genetics.104.037770.
- Zahn LM, Leebens-Mack J, dePamphilis CW, Ma H, Theissen G: To B or Not to B a flower: the role of DEFICIENS and GLOBOSA orthologs in the evolution of the angiosperms. J Hered. 2005, 96: 225-240. 10.1093/jhered/esi033.
- Zahn LM, Leebens-Mack JH, Arrington JM, Hu Y, Landherr LL, dePamphilis CW, Becker A, Theissen G, Ma H: Conservation and divergence in the AGAMOUS subfamily of MADS-box genes: evidence of independent sub- and neofunctionalization events. Evol Dev. 2006, 8: 30-45. 10.1111/j.1525-142X.2006.05073.x.
- Shan H, Zahn L, Guindon S, Wall PK, Kong H, Ma H, dePamphilis CW, Leebens-Mack J: Evolution of plant MADS box transcription factors: evidence for shifts in selection associated with early angiosperm diversification and concerted gene duplications. Mol Biol Evol. 2009, 26: 2229-2244. 10.1093/molbev/msp129.
- Cui L, Wall PK, Leebens-Mack JH, Lindsay BG, Soltis DE, Doyle JJ, Soltis PS, Carlson JE, Arumuganathan K, Barakat A, Albert VA, Ma H, dePamphilis CW: Widespread genome duplications throughout the history of flowering plants. Genome Res. 2006, 16: 738-749. 10.1101/gr.4825606.PubMedPubMed CentralView ArticleGoogle Scholar
- Van de Peer Y, Fawcett JA, Proost S, Sterck L, Vandepoele K: The flowering world: a tale of duplications. Trends Plant Sci. 2009, 14: 680-688. 10.1016/j.tplants.2009.09.001.PubMedView ArticleGoogle Scholar
- Wood TE, Takebayashi N, Barker MS, Mayrose I, Greenspoon PB, Rieseberg LH: The frequency of polyploid speciation in vascular plants. Proc Natl Acad Sci USA. 2009, 106: 13875-13879. 10.1073/pnas.0811575106.PubMedPubMed CentralView ArticleGoogle Scholar
- De Bodt S, Maere S, Van de Peer Y: Genome duplication and the origin of angiosperms. Trends Ecol Evol. 2005, 20: 591-597. 10.1016/j.tree.2005.07.008.PubMedView ArticleGoogle Scholar
- Soltis DE, Albert VA, Leebens-Mack J, Bell CD, Paterson AH, Zheng C, Sankoff D, dePamphilis CW, Wall PK, Soltis PS: Polyploidy and angiosperm diversification. Am J Bot. 2009, 96: 336-348. 10.3732/ajb.0800079.PubMedView ArticleGoogle Scholar
- Ancestral Angiosperm Genome Project. [http://ancangio.uga.edu/]
- Lyons E, Pedersen B, Kane J, Alam M, Ming R, Tang H, Wang X, Bowers J, Paterson A, Lisch D, Freeling M: Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. Plant Physiol. 2008, 148: 1772-1781. 10.1104/pp.108.124867.PubMedPubMed CentralView ArticleGoogle Scholar
- Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH: Synteny and collinearity in plant genomes. Science. 2008, 320: 486-488. 10.1126/science.1153917.PubMedView ArticleGoogle Scholar
- Tang H, Bowers JE, Wang X, Paterson AH: Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc Natl Acad Sci USA. 2010, 107: 472-477. 10.1073/pnas.0908007107.PubMedPubMed CentralView ArticleGoogle Scholar
- Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH: Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res. 2008, 18: 1944-1954. 10.1101/gr.080978.108.PubMedPubMed CentralView ArticleGoogle Scholar
- Leitch I, Hanson L: DNA C-values in seven families fill phylogenetic gapsin the basal angiosperms. Bot J Linn Soc. 2002, 140: 175-179. 10.1046/j.1095-8339.2002.00096.x.View ArticleGoogle Scholar
- Arizona Genome Institute. [http://www.genome.arizona.edu/orders/direct.html?library=AT_SBa]
- Luo MC, Thomas C, You FM, Hsiao J, Ouyang S, Buell CR, Malandro M, McGuire PE, Anderson OD, Dvorak J: High-throughput fingerprinting of bacterial artificial chromosomes using the snapshot labeling kit and sizing of restriction fragments by capillary electrophoresis. Genomics. 2003, 82: 378-389. 10.1016/S0888-7543(03)00128-9.PubMedView ArticleGoogle Scholar
- Nelson WM, Bharti AK, Butler E, Wei F, Fuks G, Kim H, Wing RA, Messing J, Soderlund C: Whole-genome validation of high-information-content fingerprinting. Plant Physiol. 2005, 139: 27-38. 10.1104/pp.105.061978.PubMedPubMed CentralView ArticleGoogle Scholar
- Soderlund C, Humphray S, Dunham A, French L: Contigs built with fingerprints, markers, and FPC V4.7. Genome Res. 2000, 10: 1772-1787. 10.1101/gr.GR-1375R.PubMedPubMed CentralView ArticleGoogle Scholar
- Wall PK, Leebens-Mack J, Muller KF, Field D, Altman NS, dePamphilis CW: PlantTribes: a gene and gene family resource for comparative genomics in plants. Nucleic Acids Res. 2008, 36: D970-976.PubMedPubMed CentralView ArticleGoogle Scholar
- Duarte JM, Wall PK, Edger PP, Landherr LL, Ma H, Pires JC, Leebens-Mack J, dePamphilis CW: Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels. BMC Evol Biol. 2010, 10: 61-10.1186/1471-2148-10-61.PubMedPubMed CentralView ArticleGoogle Scholar
- Goremykin VV, Hirsch-Ernst KI, Wolfl S, Hellwig FH: Analysis of the Amborella trichopoda chloroplast genome sequence suggests that amborella is not a basal angiosperm. Mol Biol Evol. 2003, 20: 1499-1505. 10.1093/molbev/msg159.PubMedView ArticleGoogle Scholar
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005, 110: 462-467. 10.1159/000084979.PubMedView ArticleGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.PubMedView ArticleGoogle Scholar
- Leebens-Mack J, Raubeson LA, Cui L, Kuehl JV, Fourcade MH, Chumley TW, Boore JL, Jansen RK, dePamphilis CW: Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one's way out of the Felsenstein zone. Mol Biol Evol. 2005, 22: 1948-1963. 10.1093/molbev/msi191.PubMedView ArticleGoogle Scholar
- Magallon S: Using fossils to break long branches in molecular dating: a comparison of relaxed clocks applied to the origin of angiosperms. Syst Biol. 2010, 59: 384-399. 10.1093/sysbio/syq027.PubMedView ArticleGoogle Scholar
- Smith SA, Beaulieu JM, Donoghue MJ: An uncorrelated relaxed-clock analysis suggests an earlier origin for flowering plants. Proc Natl Acad Sci USA. 2010, 107: 5897-5902. 10.1073/pnas.1001225107.PubMedPubMed CentralView ArticleGoogle Scholar
- Baucom RS, Estill JC, Chaparro C, Upshaw N, Jogi A, Deragon JM, Westerman RP, Sanmiguel PJ, Bennetzen JL: Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome. PLoS Genet. 2009, 5: e1000732-10.1371/journal.pgen.1000732.PubMedPubMed CentralView ArticleGoogle Scholar
- International Rice Genome Sequencing Project: The map-based sequence of the rice genome. Nature. 2005, 436: 793-800. 10.1038/nature03895.View ArticleGoogle Scholar
- Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Micac E, Jublot D, Poulain J, Bruyere C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, et al: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007, 449: 463-467. 10.1038/nature06148.PubMedView ArticleGoogle Scholar
- Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, D Xu, Hellsten U, May GD, Yu Y, Sakurai T, Umezawa T, Bhattacharyya MK, Sandhu D, Valliyodan B, Lindquist E, Peto M, Grant D, Shu S, Goodstein D, Barry K, Futrell-Griggs M, Abernathy B, Du J, Tian Z, Zhu L, et al: Genome sequence of the palaeopolyploid soybean. Nature. 2010, 463: 178-183. 10.1038/nature08670.PubMedView ArticleGoogle Scholar
- Vershinin AV, Druka A, Alkhimova AG, Kleinhofs A, Heslop-Harrison JS: LINEs and gypsy-like retrotransposons in Hordeum species. Plant Mol Biol. 2002, 49: 1-14. 10.1023/A:1014469830680.PubMedView ArticleGoogle Scholar
- Leeton PR, Smyth DR: An abundant LINE-like element amplified in the genome of Lilium speciosum. Mol Gen Genet. 1993, 237: 97-104.PubMedGoogle Scholar
- Chen Y, Zhou F, Li G, Xu Y: MUST: a system for identification of miniature inverted-repeat transposable elements and applications to Anabaena variabilis and Haloquadratum walsbyi. Gene. 2009, 436: 1-7. 10.1016/j.gene.2009.01.019.PubMedView ArticleGoogle Scholar
- Hawkins JS, Kim H, Nason JD, Wing RA, Wendel JF: Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res. 2006, 16: 1252-1261. 10.1101/gr.5282906.PubMedPubMed CentralView ArticleGoogle Scholar
- Schmidt T: LINEs, SINEs and repetitive DNA: non-LTR retrotransposons in plant genomes. Plant Mol Biol. 1999, 40: 903-910. 10.1023/A:1006212929794.PubMedView ArticleGoogle Scholar
- Kurtz S, Narechania A, Stein JC, Ware D: A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes. BMC Genomics. 2008, 9: 517-10.1186/1471-2164-9-517.PubMedPubMed CentralView ArticleGoogle Scholar
- KEW C-Value Database. [http://data.kew.org/cvalues/]
- Bowers JE, Arias MA, Asher R, Avise JA, Ball RT, Brewer GA, Buss RW, Chen AH, Edwards TM, Estill JC, Exum HE, Goff VH, Herrick KL, Steele CL, Karunakaran S, Lafayette GK, Lemke C, Marler BS, Masters SL, McMillan JM, Nelson LK, Newsome GA, Nwakanma CC, Odeh RN, Phelps CA, Rarick EA, Rogers CJ, Ryan SP, Slaughter KA, Soderlund CA, et al: Comparative physical mapping links conservation of microsynteny to chromosome structure and recombination in grasses. Proc Natl Acad Sci USA. 2005, 102: 13206-13211. 10.1073/pnas.0502365102.PubMedPubMed CentralView ArticleGoogle Scholar
- Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, et al: The Sorghum bicolor genome and the diversification of grasses. Nature. 2009, 457: 551-556. 10.1038/nature07723.PubMedView ArticleGoogle Scholar
- Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA, Minx P, Reily AD, Courtney L, Kruchowski SS, Tomlinson C, Strong C, Delehaunty K, Fronick C, Courtney B, Rock SM, Belter E, Du F, Kim K, Abbott RM, Cotton M, Levy A, Marchetto P, Ochoa K, Jackson SM, Gillam B, et al: The B73 maize genome: complexity, diversity, and dynamics. Science. 2009, 326: 1112-1115. 10.1126/science.1178534.PubMedView ArticleGoogle Scholar
- Bell CD, Soltis DE, Soltis P: The age and diversification of angiosperms re-revisited. Am J Bot. 2010, 97: 1296-1303. 10.3732/ajb.0900346.PubMedView ArticleGoogle Scholar
- Bowers JE, Chapman BA, Rong J, Paterson AH: Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003, 422: 433-438. 10.1038/nature01521.PubMedView ArticleGoogle Scholar
- Paterson AH, Bowers JE, Chapman BA: Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci USA. 2004, 101: 9903-9908. 10.1073/pnas.0307901101.PubMedPubMed CentralView ArticleGoogle Scholar
- Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov Schein AJ, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, Cunningham R, et al: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006, 313: 1596-1604. 10.1126/science.1128691.PubMedView ArticleGoogle Scholar
- Estill JC, Bennetzen JL: The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes. Plant Methods. 2009, 5: 8-10.1186/1746-4811-5-8.PubMedPubMed CentralView ArticleGoogle Scholar
- DAWGPAWS. [http://dawgpaws.sourceforge.net]
- FGENESH. [http://softberry.com]
- Stanke M, Schoffmann O, Morgenstern B, Waack S: Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006, 7: 62-10.1186/1471-2105-7-62.PubMedPubMed CentralView ArticleGoogle Scholar
- Korf I: Gene finding in novel genomes. BMC Bioinformatics. 2004, 5: 59-10.1186/1471-2105-5-59.PubMedPubMed CentralView ArticleGoogle Scholar
- Blanco E, Abril JF: Computational gene annotation in new genome assemblies using GeneID. Methods Mol Biol. 2009, 537: 243-261. 10.1007/978-1-59745-251-9_12.PubMedView ArticleGoogle Scholar
- Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997, 268: 78-94. 10.1006/jmbi.1997.0951.PubMedView ArticleGoogle Scholar
- Wu TD, Watanabe CK: GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005, 21: 1859-1875. 10.1093/bioinformatics/bti310.PubMedView ArticleGoogle Scholar
- Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL, White O: Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003, 31: 5654-5666. 10.1093/nar/gkg770.PubMedPubMed CentralView ArticleGoogle Scholar
- Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, Radenbaugh A, Singh S, Swing V, Tissier C, Zhang P, Huala E: The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. 2008, 36: D1009-1014.PubMedPubMed CentralView ArticleGoogle Scholar
- Cannon SB, Sterck L, Rombauts S, Sato S, Cheung F, Gouzy J, Wang X, Mudge J, Vasdewani J, Schiex T, Spannagl M, Monaghan E, Nicholson C, Humphray SJ, Schoof H, Mayer KF, Rogers J, Quetier F, Oldroyd GE, Debelle F, Cook DR, Retzel EF, Roe BA, Town CD, Tabata S, Van de Peer Y, Young ND: Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc Natl Acad Sci USA. 2006, 103: 14959-14964. 10.1073/pnas.0603228103.PubMedPubMed CentralView ArticleGoogle Scholar
- Itoh T, Tanaka T, Barrero RA, Yamasaki C, Fujii Y, Hilton PB, Antonio BA, Aono H, Apweiler R, Bruskiewich R, Bureau T, Burr F, Costa de Oliveira A, Fuks G, Habara T, Haberer G, Han B, Harada E, Hiraki AT, Hirochika H, Hoen D, Hokari H, Hosokawa S, Hsing YI, Ikawa H, Ikeo K, Imanishi T, Ito Y, Jaiswal P, Kanno M, et al: Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res. 2007, 17: 175-183. 10.1101/gr.5509507.PubMedPubMed CentralView ArticleGoogle Scholar
- Project IRGS: The map-based sequence of the rice genome. Nature. 2005, 436: 793-800. 10.1038/nature03895.View ArticleGoogle Scholar
- Harris RS: Improved pairwise alignment of genomic DNA. PhD Thesis. 2007, Pennsylvania State University, Biology DepartmentGoogle Scholar
- Miller Lab Software. [http://www.bx.psu.edu/miller_lab/]
- Li L, Stoeckert CJ, Roos DS: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003, 13: 2178-2189. 10.1101/gr.1224503.PubMedPubMed CentralView ArticleGoogle Scholar
- Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, Lushbough C, Brendel V: PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res. 2008, 36: D959-965.PubMedPubMed CentralView ArticleGoogle Scholar
- PlantGDB. [http://www.plantgdb.org/]
- Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004, 5: 113-10.1186/1471-2105-5-113.PubMedPubMed CentralView ArticleGoogle Scholar
- Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006, 22: 2688-2690. 10.1093/bioinformatics/btl446.PubMedView ArticleGoogle Scholar
- Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KL, Salzberg SL, Feng L, Jones MR, Skelton RL, Murray JE, Chen C, Qian W, Shen J, Du P, Eustice M, Tong E, Tang H, Lyons E, Paull RE, Michael TP, Wall K, Rice DW, Albert H, Wang ML, Zhu YJ, et al: The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature. 2008, 452: 991-996. 10.1038/nature06856.PubMedPubMed CentralView ArticleGoogle Scholar
- Luo M, Wing RA: An improved method for plant BAC library construction. Methods Mol Biol. 2003, 236: 3-20.PubMedGoogle Scholar
- Kim H, San Miguel P, Nelson W, Collura K, Wissotski M, Walling JG, Kim JP, Jackson SA, Soderlund C, Wing RA: Comparative physical mapping between Oryza sativa (AA genome type) and O. punctata (BB genome type). Genetics. 2007, 176: 379-390. 10.1534/genetics.106.068783.PubMedPubMed CentralView ArticleGoogle Scholar
- Ammiraju JS, Luo M, Goicoechea JL, Wang W, Kudrna D, Mueller C, Talag J, Kim H, Sisneros NB, Blackmon B, Fang E, Tomkins JB, Brar D, MacKill D, McCouch , Kurata N, Lambert G, Galbraith DW, Arumuganathan K, K Rao, Walling SJ, Gill N, Yu Y, SanMiguel P, Soderlund C, Jackson S, Wing RA: The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Res. 2006, 16: 140-147.PubMedPubMed CentralView ArticleGoogle Scholar
- Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8: 175-185.PubMedView ArticleGoogle Scholar
- Chou HH, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics. 2001, 17: 1093-1104. 10.1093/bioinformatics/17.12.1093.PubMedView ArticleGoogle Scholar
- Zuccolo A, Sebastian A, Talag J, Yu Y, Kim H, Collura K, Kudrna D, Wing RA: Transposable element distribution, abundance and role in genome size variation in the genus Oryza. BMC Evol Biol. 2007, 7: 152-10.1186/1471-2148-7-152.PubMedPubMed CentralView ArticleGoogle Scholar
- SeqClean. [http://sourceforge.net/projects/seqclean/]
- Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Muller WE, Wetter T, Suhai S: Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res. 2004, 14: 1147-1159. 10.1101/gr.1917404.PubMedPubMed CentralView ArticleGoogle Scholar
- Sonnhammer EL, Durbin R: A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene. 1995, 167: GC1-10. 10.1016/0378-1119(95)00714-8.PubMedView ArticleGoogle Scholar
- Sputnik. [http://espressosoftware.com/sputnik/index.html]
- Morgante M, Hanafey M, Powell W: Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet. 2002, 30: 194-200. 10.1038/ng822.PubMedView ArticleGoogle Scholar
- Xiong Z, Kim JS, Pires JC: Integration of genetic, physical, and cytogenetic maps for Brassica rapa chromosome A7. Cytogenet Genome Res. 2010, 129: 190-198. 10.1159/000314640.PubMedView ArticleGoogle Scholar
- Lee E, Harris N, Gibson M, Chetty R, Lewis S: Apollo: a community resource for genome annotation editing. Bioinformatics. 2009, 25: 1836-1837. 10.1093/bioinformatics/btp314.PubMedView ArticleGoogle Scholar
- Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S: The generic genome browser: a building block for a model organism system database. Genome Res. 2002, 12: 1599-1610. 10.1101/gr.403602.PubMedPubMed CentralView ArticleGoogle Scholar
- Rice Annotation Project. [http://rapdb.dna.affrc.go.jp/]
- Grape Genome Browser. [http://www.genoscope.cns.fr/externe/GenomeBrowser/Vitis/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.