Genome-wide comparative analysis of the Brassica rapa gene space reveals genome shrinkage and differential loss of duplicated genes after whole genome triplication
© Mun et al.; licensee BioMed Central Ltd. 2009
Received: 18 May 2009
Accepted: 12 October 2009
Published: 12 October 2009
Brassica rapa is one of the most economically important vegetable crops worldwide. Owing to its agronomic importance and phylogenetic position, B. rapa provides a crucial reference to understand polyploidy-related crop genome evolution. The high degree of sequence identity and remarkably conserved genome structure between Arabidopsis and Brassica genomes enables comparative tiling sequencing using Arabidopsis sequences as references to select the counterpart regions in B. rapa, which is a strong challenge of structural and comparative crop genomics.
We assembled 65.8 megabase-pairs of non-redundant euchromatic sequence of B. rapa and compared this sequence to the Arabidopsis genome to investigate chromosomal relationships, macrosynteny blocks, and microsynteny within blocks. The triplicated B. rapa genome contains only approximately twice the number of genes as in Arabidopsis because of genome shrinkage. Genome comparisons suggest that B. rapa has a distinct organization of ancestral genome blocks as a result of recent whole genome triplication followed by a unique diploidization process. A lack of the most recent whole genome duplication (3R) event in the B. rapa genome, atypical of other Brassica genomes, may account for the emergence of B. rapa from the Brassica progenitor around 8 million years ago.
This work demonstrates the potential of using comparative tiling sequencing for genome analysis of crop species. Based on a comparative analysis of the B. rapa sequences and the Arabidopsis genome, it appears that polyploidy and chromosomal diploidization are ongoing processes that collectively stabilize the B. rapa genome and facilitate its evolution.
Flowering plants (angiosperms) have evolved in genome size since their sudden appearance in the fossil records of the late Jurassic/early Cretaceous period [1–4]. The genome expansion seen in angiosperms is mainly attributable to occasional polyploidy. Estimation of polyploidy levels in angiosperms indicates that the genomes of most (>90%) extant angiosperms, including many crops and all the plant model species sequenced thus far, have experienced one or more episodes of genome doubling at some point in their evolutionary history [5, 6]. The accumulation of transposable elements (TEs) has been another prevalent factor in plant genome expansion. Recent studies on maize, rice, legumes, and cotton have demonstrated that the genome sizes of these crop species have increased significantly due to the accumulation and/or retention of TEs (mainly long terminal repeat retrotransposons (LTRs)) over the past few million years; the percentage of the genome made up of transposons is estimated to be between 35% and 52% based on sequenced genomes [7–12]. However, genome expansion is not a one-way process in plant genome evolution. Functional diversification or stochastic deletion of redundant genes by accumulation of mutations in polyploid genomes and removal of LTRs via illegitimate or intra-strand recombination can result in downsizing of the genome [13–15]. Nevertheless, neither of the aforementioned mechanisms has been demonstrated to occur frequently enough to balance genome size growth, and plant genomes tend, therefore, to expand over time.
The progress in whole genome sequencing of model genomes presents an important challenge in plant genomics: to apply the knowledge gained from the study of model genomes to biological and agronomical questions of importance in crop species. Comparative structural genomics is a well-established strategy in applied agriculture in several plant families. However, comparative analyses of modern angiosperm genomes, which have experienced multiple rounds of polyploidy followed by differential loss of redundant sequences, genome recombination, or invasion of LTRs, are characterized by interrupted synteny with only partial gene orthology even between closely related species, such as cereals, legumes [17, 18], and Brassica species [19, 20]. Furthermore, functional divergence of duplicated genes limits interpretation of function based on orthology, which complicates knowledge transfer from model to crop plants. Thus, better delimitation of comparative genome arrangements reflecting evolutionary history will allow information obtained from fully sequenced model genomes to be used to target syntenic regions of interest and to infer parallel or convergent evolution of homologs important to biological and agronomical questions in closely related crop genomes.
The mustard family (Brassicaceae or Cruciferae), the fifth largest monophyletic angiosperm family, consists of 338 genera and approximately 3,700 species in 25 tribes, and is fundamentally important to agriculture and the environment, accounting for approximately 10% of the world's vegetable crop produce and serving as a major source of edible oil and biofuel. Brassicaceae includes two important model systems: Arabidopsis thaliana (At), the most scientifically important plant model system for which complete genome sequence information is available, and the closely related, agriculturally important Brassica complex - B. rapa (Br, A genome), B. nigra (Bn, B genome), B. oleracea (Bo, C genome), and their three allopolyploids, B. napus (Bna, AC genome), B. juncea (Bj, AB genome), and B. carinata (Bc, BC genome). Syntenic relationships and polyploidy history in these two model systems have been investigated, although details about macro- and microsyntenic relationships between At and Brassica are limited and fragmented. Previous studies demonstrated broad-range chromosome correspondence between the At and Brassica genomes [23, 24], and a few studies have demonstrated specific cases of conservation of gene content and order with frequent disruption by interspersed gene loss and genome recombination [19, 20]. Although this issue is contentious, there is evidence that Brassicaceae genomes have undergone three rounds of whole genome duplication (WGD; hereafter referred to as 1R, 2R, and 3R, which are equivalent to the γ, β, and α duplication events) [5, 25, 26]. One profound finding from comparative analyses is the triplicate nature of the Brassica genome, indicating the occurrence of a whole genome triplication event (WGT, 4R) soon after divergence from the At lineage approximately 17 to 20 million years ago (MYA) [19, 20, 26].
This result strongly suggests that comparative genomic analyses using single gene-specific amplicons or those based on small scale synteny comparisons will fail to identify all related genome segments, and thus not be able to provide accurate indications of orthology between the At and Brassica genomes. However, obtaining sufficient sequence information from Brassica genomes to identify genome-wide orthologous relationships between the At and Brassica genomes is a major challenge.
Br was recently chosen as a model species representing the Brassica 'A' genome for genome sequencing [27, 28]. This species was selected because it has already proved a useful model for studying polyploidy and because it has a relatively small (approximately 529 megabase-pair (Mbp)) but compact genome with genes concentrated in euchromatic spaces. However, widespread repetitive sequences in the Br genome hinder direct application of whole genome shotgun sequencing. Instead, targeted sequencing of specific regions of the Br genome could be informed by the reference At genome by selecting genomic clones based on sequence similarity; this approach is referred to as comparative tiling. Here, we report sequencing of large-scale regions of the Br euchromatic genome, covering almost all of the At euchromatic regions, obtained using the comparative tiling method. We performed a genome-wide sequence comparison of Br and At and analyzed the number of substitutions per synonymous site (Ks) between the two genomes and among related Brassica sequences to identify syntenic relationships and to further refine our understanding of the evolution of polyploidy. We also investigated genome microstructure conservation between the two genomes. In this study, we provide a foundation to reconstruct both the ancestral genome of the Brassica progenitor and the evolutionary history of the Brassica lineage, which we anticipate will provide a robust model for Brassica genomic studies and facilitate the investigation of the genome evolution of domesticated crop species.
Generation of Br euchromatic sequence contigs and genome coverage
[Table: Summary of B. rapa chromosome sequences comparatively tiled on the A. thaliana genome. Fields: number of BACs; number of sequence contigs; total sequence length (Mbp); coverage of the At genome (Mbp); gaps in the At genome (Mbp).]
The genome coverage of the gene-rich Br sequences was estimated by representation in two different datasets: expressed sequence tag (EST) sequences and conserved single-copy genes. Based on a BLAT analysis of 32,395 Br unigenes (a set of ESTs that appear to arise from the same transcription locus) against the sequence contigs, the proportion of hits recovered under stringent conditions (see Materials and methods) was 29.2%. This result was largely consistent with the proportion of rosid-conserved single-copy genes showing matches to Br sequences. A TBLASTN comparison of 1,070 At-Medicago truncatula (Mt) conserved single-copy genes against Br sequences revealed a 24.3% match. Both methods indicate approximately 30% coverage of euchromatin in the dataset analyzed; thus, the euchromatic region of Br is estimated to be approximately 220 Mbp, 42% of the whole genome given that the genome size of Br is 529 Mbp.
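The size estimate above is a proportional scaling of the assembled sequence by its estimated coverage. The following is a minimal sketch of that arithmetic, assuming the rounded 30% coverage adopted in the text and the 65.8 Mbp assembled length reported in the abstract; the function and variable names are illustrative only.

```python
def estimate_euchromatin(assembled_mbp, coverage_fraction, genome_mbp):
    """Scale the assembled length by the fraction of euchromatin it is assumed to cover."""
    euchromatin_mbp = assembled_mbp / coverage_fraction
    share_of_genome = euchromatin_mbp / genome_mbp
    return euchromatin_mbp, share_of_genome


# Values taken from the text: 65.8 Mbp assembled, ~30% coverage, 529 Mbp genome.
euchromatin_mbp, share = estimate_euchromatin(65.8, 0.30, 529.0)
print(f"euchromatin: about {euchromatin_mbp:.0f} Mbp, {share:.1%} of the genome")
```

The result lands close to the rounded figures quoted in the text (approximately 220 Mbp and 42%); the small difference comes from rounding.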
Characteristics of the B. rapa gene space
Gene annotation was carried out using our specialized Br annotation pipeline. Gene prediction of the Br sequence data using a variety of ab initio, similarity-based, and EST/full-length cDNA-based methods resulted in the construction of 15,762 gene models. Taken together with the genome coverage of Br sequences, the overall number of protein-coding genes in the Br genome is at least 52,000 to 53,000, which is higher than those of other plant genomes sequenced thus far, including At, rice (Oryza sativa (Os)), poplar (Populus trichocarpa (Pt)), grape, papaya, and sorghum. However, the estimated total number of genes in the Br genome is only twice that of At. Details of the annotation are available online at the URL cited in the 'Data used in this study' section in the Materials and methods.
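The genome-wide gene number is an extrapolation from the annotated gene models and the estimated coverage. A short sketch of that extrapolation follows, assuming the two coverage values shown (the rounded 30% figure and the 29.2% EST-based estimate); the helper name is hypothetical.

```python
def extrapolate_gene_count(annotated_genes, coverage_fraction):
    """Estimate the total gene number, assuming gene density is uniform across euchromatin."""
    return annotated_genes / coverage_fraction


ANNOTATED_GENES = 15_762  # gene models reported in the text

# Compare the rounded coverage with the EST-based estimate to show the sensitivity.
estimates = {cov: extrapolate_gene_count(ANNOTATED_GENES, cov) for cov in (0.30, 0.292)}
for cov, total in estimates.items():
    print(f"coverage {cov:.1%}: about {total:,.0f} genes")
```

The estimate is sensitive to the coverage value, which is why the text reports a range rather than a single number.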
[Table: Comparison of the overall composition of annotated protein coding genes in the B. rapa sequence contigs and euchromatic counterparts in the A. thaliana genome. Fields: number of sequence contigs; total sequence length (Mbp); number of protein coding genes; number of exons per gene; intron size (bp); exon size (bp); average gene size (kbp); average gene density (kbp/gene); overall G/C content (%).]
[Table: Comparison of repetitive sequences identified in the B. rapa sequence contigs and euchromatic counterparts in the A. thaliana genome. Fields: genome coverage (%); low complexity repetitive sequences.]
Synteny between the B. rapa and A. thaliana genomes
The Br and At genomes share a minimum of 20 large-scale synteny blocks with substantial microsynteny; these synteny blocks extend the length of whole chromosome arms. At shows synteny of chromosome arms with multiple chromosome blocks of Br, apparently corresponding to triplicated remnants (Figure 2b). At1S (short arm), At2L (long arm), At4L, and At5 have three long-range synteny counterparts in three independent Br chromosomes. However, At1L and At3 have only one or two synteny blocks in the Br genome. Moreover, some genome regions of At, including a smaller section of At2S and At4S, show no significant synteny with Br counterparts, indicating chromosome-level deletion of triplicated segments. Incidentally, Br shows synteny with a major single chromosome along almost the entire length (A1, A2, A4, and A10) or fragments of multiple At chromosomes in a complicated mosaic pattern, indicating frequent recombination of Br chromosomes. Notable regions of synteny are shown in Figure 2b, and are At1S-A6/A8/A9, At1L-A7, At2L-A3/A4/A5, At3S-A3/A5, At3L-A7/A9, At4L-A1/A3/A8, and At5-A2/A3/A10 (synteny view available at the URL cited in the 'Data used in this study' section in the Materials and methods). Additional synteny blocks scattered throughout genome regions, probably due to recombination, were also identified.
Within individual synteny blocks, microsynteny (conservation of gene content and order) was considerable. The average degree of proteome conservation across all predicted synteny blocks was 52 ± 13% (Table S3 in Additional data file 1). This value is almost the same as that of the Mt-Lotus japonicus comparison in which an ancient WGD event at a similar time period (Ks 0.7 to 0.9) as the Br-At WGD but earlier speciation (Ks 0.6) than Br-At was detected. The underestimated value reported here presumably reflects significant gene loss and rearrangement after WGT in the Br lineage resulting in genome shrinkage, based on the fact that deletion events in syntenic blocks of the Br genome were twofold more frequent than in the At genome. Genes without corresponding homologs in syntenic regions contributed to 15 ± 7% of all genes from Br but 33 ± 13% from At (Table S3 in Additional data file 1; Additional data file 3). Genes encoding proteins involved in transcription or signal transduction were not found to be significantly more retained in syntenic blocks than those encoding proteins classified as having other functions. Further genome sequencing will help resolve the synteny in the uncovered and/or the scattered genome regions.
Rearrangement of the B. rapa genome
An examination of the Br genome from the perspective of ancestral blocks reveals that three copies of the genome are present, as predicted from the WGT (Figure 3). Although there are several discontinuous matches due to gaps between syntenic blocks, almost 50% of the ancestral blocks were triplicated in the Br genome, while others occurred only once or twice, indicating loss of blocks during genome rearrangement. Blocks D, G, and M could not be found on the Br genome. The Br genome is highly rearranged relative to At compared with AK. Block R was localized together with block W in triplicate regions (A2, A3, and A10). However, in At5, blocks R and W were separated on the short arm and long arm, respectively [38, 39]. Similarly, blocks E and N were adjacent and triplicated in Br but separated in At. Meanwhile, blocks K and L, which are fused in AK but split in different chromosomes of At, were adjacent (A6) or separated (A9) on the same chromosomes of Br. However, we did not determine precisely which copy of the replicated AK block family corresponds to the Br BACs because of the possibility that Br sequences in the polyploid genome were not accurately positioned. Because several genetic markers originate from duplicate or triplicate regions of the Br genome, the true location of the BACs could correspond to any of the amplified bands, which could result in inaccurate mapping of the BAC sequence. In this case, the resulting assignment of the BAC to an incorrect linkage group on a specific AK block family member would also be flawed; however, we found that almost all BAC sequences showed excellent correspondence to the correct family of AK blocks. Further analysis, including chromosome painting and additional genome sequencing, will allow determination of the precise location of AK blocks in the Br genome.
Loss of genes from the recent duplication event in the B. rapa genome
The Ks distribution for At and Br orthologs displayed two peaks at Ks = 0.3 to 0.4 and 2.0 to 2.1, corresponding to shared duplication events (3R and 2R) and speciation between the genomes at around 13 to 17 MYA (Figure 4d). As reported before, the oldest duplication (1R) could not be seen in the Ks distribution of either genome. Surprisingly, a comparison of the Ks mode for the paralogs in At and Br identified remarkable differences in the duplicated genes retained in the two genomes. The At genome has two clear peaks for 3R (mode Ks = 0.6 to 0.7) and 2R (mode Ks = 1.7 to 1.8). However, in the Br genome, two peaks representing 4R (mode Ks = 0.2 to 0.3) and 2R (mode Ks = 1.8 to 1.9) are evident, but the 3R peak has collapsed (Figure 4e, f). The difference between the distributions for Br-Br versus Br-At (P = 1.65E-8) was significantly higher than that for At-At versus Br-At (P = 0.001). Taken together, these findings suggest that duplicated genes produced by the 3R event were widely lost in the triplicated Br genome.
[Table: Comparison of the degree of conservation between duplicated groups originating from different polyploidy events in B. rapa and B. oleracea. Fields: number of groups produced; number of genes; degree of conservation.]
A comparative genomics approach to target the euchromatic gene space of a crop genome
Investigation of crop genomes not only offers information that can be used for agricultural improvement, but also provides opportunities to understand angiosperm biology and evolution. As of 2009, the genome sequences of only five economically important crop plants (rice, poplar, grape, papaya, and sorghum) have been published [8–12], and whole genome sequencing projects are currently underway for only a few selected crop species. One hurdle faced when sequencing a crop genome is genome obesity due to polyploidy and repetitive DNA. Therefore, a stepwise approach is required to obtain genome-wide information from crop genomes, and strategies for targeting gene-rich fractions are required. In combination with EST sequencing, two approaches - methylation filtration and Cot-based cloning and sequencing - were developed to capture euchromatic regions. Although both methods enrich for gene-rich fractions, they can exclude transcriptionally suppressed regions or euchromatic regions with abundant interspersed repetitive sequences (tandem repeats). We applied a novel gene space targeting method by allocating BAC clones to a closely related model genome based on BAC end sequence (BES) matches; this approach has not previously been reported in a genome sequencing project. This method has several advantages. First, gene-rich fractions of the crop genome can be obtained successfully in silico without additional experiments. We collected approximately 30% of the euchromatic region of B. rapa in this study. If a greater overlap between the clones and target region is allowed, and additional information in the form of genetic maps and physical contigs is used, the gene-rich fraction recovered is likely to increase significantly. Second, clone-by-clone strategies used in genome sequencing can benefit directly from this method because of selection of gene-rich seed BACs as well as the alignment of sequence scaffolds.
Quick selection of a sufficient number of gene-rich seed BACs and directed ordering of the sequence scaffold will likely accelerate clone-based whole genome sequencing at reduced cost. The BAC clones selected in this study can be used as seed BACs for the ongoing clone-by-clone genome sequencing of Br [27, 28]. Third, this analysis allows investigation of syntenic relationships between wild and crop genomes, thereby informing our understanding of crop evolution. Integration of genomes based on sequence level comparisons can offer a platform for the correlation between specific genes and phenotypes, which is important for further improvement of crops. We anticipate that application of our method will accelerate knowledge spreading from nodal model species to closely related taxa. For example, genome sequencing of other Brassica crops, particularly the construction of sequence assemblies and scaffolds of Bna, can benefit from the information obtained from the Br genome; this holds true even for next-generation sequencing. Thus, we anticipate that this study will make a significant contribution to structural and comparative genomic studies of crop species.
Counterbalancing genome obesity after whole genome triplication in B. rapa
A large-scale comparison of Br genomic sequences and the whole euchromatic region of At demonstrated extensive synteny between the genomes, and provided clear evidence of a recent WGT event in the Brassica lineage. Our results significantly expand on previous observations of synteny between At and Br based on comparative genetic mapping and small-scale comparisons of homologous regions by deciphering the start-end points of macrosynteny blocks and elucidating the fine-scale details of microsynteny within the syntenic regions more accurately. Even though the Br sequencing project is still underway and the sequences used in this study are incomplete, the scale of synteny between the two genomes at both the macro- and micro-levels is significant. As the Br sequencing project moves forward, the availability of nearly complete coverage of the euchromatin will enable more precise definition of syntenic blocks between At and Br, which can be used to reconstruct ancestral chromosome sequences of Brassica.
Despite the WGT event, the total number of genes in the Br genome was estimated to be approximately 53,000, which is only a twofold increase compared with that of At. The usual fate of a duplicate-gene pair in a polyploid genome is nonfunctionalization or the deletion of one copy [44–46]. The reduction in the overall number of genes in the triplicated Br genome can be regarded as a result of a process that restores the diploid state, thereby counterbalancing genome obesity. This process seemed to be driven by the deletion of redundant genome components at the level of both the chromosome and the gene. A genome-wide synteny comparison between Br and At revealed that some of the triplicated copies of Br segments were lost or reconstructed. In addition, microsynteny analysis also indicated a relatively shrunken genome throughout the entire euchromatic region of the Br genome, with the Br gene space occupying a fraction 30% smaller than that of At due to a higher frequency of deletion events in the Br genome. A previous study reported that in the At genome, genes with regulatory functions, such as those encoding transcription factors or genes involved in signal transduction, were retained significantly more often than genes with other molecular functions. However, we did not find differential retention of genes according to molecular function, which suggests random deletion of redundant genes in triplicated regions of the Br genome before functional diversification.
Several mechanisms responsible for post-polyploid changes have been proposed. These include chromosome rearrangements caused by unequal crossing-over, homologous recombination, translocation, or other cytogenetic events [47–50]. A tandem array with high sequence similarity would be a good candidate for deletion, because it is more likely to recombine and less likely to have a severe phenotype when one redundant gene is deleted. Fewer tandem duplicate genes in the Br genome may, therefore, be attributable to an increase in the rate of deletion. Incidentally, because polyploidy itself is a form of genomic 'disturbance,' it might induce a cellular response such as epigenetic silencing by hypermethylation, which may be especially relevant to genome evolution. As a result, the epigenetic response itself may accelerate the rate of mutation, thereby causing rapid genomic change as seen in Br. In addition, polyploidy could increase transposable element activity, causing the deletion of genes or even chromosome segments. Illegitimate recombination of TEs has been demonstrated to have the ability to remove large blocks of DNA in Arabidopsis and rice [15, 49] and in wheat. We speculate that the twofold increase in transposon accumulation in the triplicated euchromatic regions of Br compared to the euchromatic counterpart regions of At might be correlated with the deletion of duplicated genes.
Evolution of the Brassica 'A' genome
More importantly, differential gene loss following 4R in the Brassica genome might be responsible for the diversification of the genome, based on the finding that significantly more genes duplicated as a result of 3R have been lost in Br than in Bo. However, it is not clear if duplicated genes from the 3R event that were retained in Bo have diverged functionally. It appears that the split between Br and Bo happened rapidly (0.1 Ks interval) compared to the At-Brassica split (0.3 Ks interval), perhaps due to differential retention of duplicated genes and genome recombination in the ancestral Brassica genome. These observations, along with the independent accumulation of repetitive sequences, may have facilitated speciation within the tribe Brassiceae, which contains approximately 240 highly diverse species. Further analysis and cross-comparisons of diploid and allopolyploid genomes of Brassica will enhance our understanding of the fate of duplicated genes in the Brassica genome. It appears that, as a counterbalance to genome obesity, there was higher selection pressure on redundant genes in the triplicated Brassica ancestor, accelerating gene loss in this triplicated ancestor compared to the Arabidopsis-Brassica common ancestor. Alternatively, differences in the life cycles of Brassica progenitors might have resulted in the differential deletion of duplicated genes in Brassica genomes. Moreover, artificial selection after domestication could also have had an impact on differentiation of diploid Brassica genomes. Taken together, the available evidence suggests that genome duplication and chromosomal diploidization are ongoing processes collectively driving the evolution of Brassica genomes.
Comparisons of large-scale genomic sequences of Br and the whole euchromatic region of At revealed extensive synteny between the genomes due to at least two shared genome duplication events and a recent WGT event specific to the Brassica lineage. The reduction of the number of genes in the triplicated Br genome by approximately one-third can be regarded as the result of a process counterbalancing genome obesity to regain the diploid state. Segmental loss of triplicated genome blocks and differential deletion of duplicated genes in Br along with less accumulation of transposons appear to have resulted in the small size of the Br genome (approximately 529 Mbp) compared to its sibling species, Bo (approximately 696 Mbp) and Bn (approximately 632 Mbp). The events proposed here indicate that genome diploidization following polyploidy played an important role in the radiation of Brassica. Our results clarify the orthology between Br and At and establish a strong basis for the genome evolution of Brassica. All the sequenced BAC clones investigated in this study were provided to the B. rapa Genome Sequencing Project as seed BACs for use as starting points for chromosome sequencing.
Materials and methods
BAC selection, sequencing, and sequence contig assembly
We previously published an efficient and novel clone selection method based on in silico BES matches to a model genome, which we named the comparative tiling method. To select gene-rich Br BAC clones covering the entire At euchromatic regions, a total of 92,000 BESs were allocated to At chromosomes by using BLASTZ at a cutoff of <E-6 with both end matches at 30 to 500 kbp intervals. A total of 4,647 BAC clones were allocated to 92 Mbp of At euchromatic regions and 589 minimally overlapping BAC clones (292 overlapping clones with an average of 41 kbp overlaps and 297 singleton clones) were finally selected and sequenced using an ABI 3730xl sequencer. The minimal sequence goal was five phase 2 (fully oriented and ordered sequence with some small gaps and low quality sequences) contigs, but 18 clones (3%) were sequenced as phase 1 due to large repetitive sequences (Table S1 in Additional data file 1). To anchor clones, a combination of sequence-based genetic mapping, fingerprint contig data, and fluorescent in situ hybridization (FISH) was used (Table S2 in Additional data file 1). The sequence contig assembly was created based on overlapping sequences. BAC sequences were assembled into larger sequence contigs by first comparing paired BES matches and BAC sequences sharing overlapping positions on the target At chromosomes using Pipmaker. Then, sequence contigs were assembled based on overlapping sequences using Phred/Phrap/Consed programs [56–58]. The location of sequence contigs or BAC singletons was determined primarily by genetic marker anchors with fingerprint contig information, paired BES, and FISH results providing additional information about local contig and BAC ordering. Pseudochromosome sequences were created by connecting sequence assemblies with 10-kbp additions of anonymous sequences.
All the Br sequences used in this study are available at NCBI and the URL cited below in the 'Data used in this study' section and relevant reference sequence sources are listed in Table S6 in Additional data file 1.
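The paired-end allocation rule described above (both BAC end sequences matching the same At chromosome 30 to 500 kbp apart) can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' pipeline: the `BesHit` record, its field names, and the example coordinates are hypothetical, and the hits are assumed to have been parsed from BLASTZ output and filtered at the E-value cutoff beforehand, with one best hit per clone end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BesHit:
    clone: str   # BAC clone identifier (hypothetical naming)
    end: str     # "F" or "R" end read of the clone
    chrom: str   # At chromosome matched by the end read
    pos: int     # match position on that chromosome (bp)


MIN_SPAN = 30_000    # lower bound on the distance between paired end matches
MAX_SPAN = 500_000   # upper bound on the distance between paired end matches


def allocate_clones(hits):
    """Return {clone: (chrom, start, stop)} for clones whose two ends are consistently placed."""
    by_clone = {}
    for hit in hits:
        by_clone.setdefault(hit.clone, {})[hit.end] = hit

    placed = {}
    for clone, ends in by_clone.items():
        if "F" not in ends or "R" not in ends:
            continue  # only one end matched
        fwd, rev = ends["F"], ends["R"]
        if fwd.chrom != rev.chrom:
            continue  # ends match different chromosomes
        span = abs(fwd.pos - rev.pos)
        if MIN_SPAN <= span <= MAX_SPAN:
            placed[clone] = (fwd.chrom, min(fwd.pos, rev.pos), max(fwd.pos, rev.pos))
    return placed


hits = [
    BesHit("bacA", "F", "At1", 100_000), BesHit("bacA", "R", "At1", 220_000),
    BesHit("bacB", "F", "At1", 500_000), BesHit("bacB", "R", "At3", 610_000),
    BesHit("bacC", "F", "At2", 10_000), BesHit("bacC", "R", "At2", 20_000),
]
placed = allocate_clones(hits)
print(placed)
```

In this toy input only `bacA` is allocated: `bacB` has ends on different chromosomes and `bacC` spans less than 30 kbp. Choosing a minimally overlapping tiling path from the allocated clones is a separate step not shown here.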
Estimation of genome coverage and genome annotation
The sequence coverage of the Br genome by BACs was estimated by calculating the proportions of Br EST unigenes and conserved single-copy rosid genes with strong matches. For EST comparisons, we considered a unigene to have a genome match if more than 90% of the unigene sequence matched with at least 95% identity in a BLAT analysis. For the single-copy rosid gene comparison, we created a list of 1,070 single-copy At and Mt genes not included in the Br EST collections. They were considered to have a genome match in Br if at least 50% of the gene matched in a TBLASTN search at a cutoff of <E-100. The assembled sequences were masked using RepeatMasker with a dataset combining the plant repeat element database of the Munich Information Center for Protein Sequence (MIPS) and our specialized database of Br repetitive sequences. Gene model prediction was performed using EVidenceModeler. Putative exons and open reading frames were predicted ab initio using FGENESH and AUGUSTUS programs with the parameters trained using the Br matrix. To predict consensus gene structures, Br ESTs plus full-length cDNAs, plant transcripts, and plant protein sequences were aligned to the predicted genes using PASA and AAT packages. The predicted genes and evidence sequences were then assembled according to the weight of each evidence type using EVidenceModeler. The highest scoring set of connected exons, introns, and noncoding regions was selected as a consensus gene model. Proteins encoded by gene models were searched against the Pfam database and automatically assigned a putative name based on conserved domain hits or similarity with previously identified proteins. Annotated gene models were also searched against a database of plant transposon-encoded proteins. Predicted proteins with a top match to transposon-encoded proteins were excluded from the annotation and gene counts.
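The EST-based coverage criterion can be expressed as a small filter over the best alignment of each unigene. The sketch below assumes the alignment results have already been reduced to one `(aligned_bases, unigene_length, identity)` tuple per unigene; this tuple layout, the function names, and the example values are hypothetical and are not BLAT's native output format.

```python
def has_genome_match(aligned_bases, unigene_length, identity,
                     min_fraction=0.90, min_identity=0.95):
    """True if more than 90% of the unigene aligns with at least 95% identity."""
    if unigene_length <= 0:
        return False
    return (aligned_bases / unigene_length > min_fraction
            and identity >= min_identity)


def matched_fraction(best_alignments):
    """Fraction of unigenes with a genome match; unigenes without a hit use aligned_bases = 0."""
    flags = [has_genome_match(*aln) for aln in best_alignments]
    return sum(flags) / len(flags)


example = [
    (950, 1000, 0.98),  # passes both thresholds
    (950, 1000, 0.93),  # identity below 95%
    (600, 1000, 0.99),  # less than 90% of the unigene aligned
    (0, 800, 0.0),      # no hit in the sequence contigs
]
fraction = matched_fraction(example)
print(f"matched fraction: {fraction:.2f}")
```

The matched fraction computed this way is what the coverage estimate is based on; the single-copy gene comparison follows the same pattern with a 50% length threshold applied to TBLASTN hits.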
Identification of syntenic blocks based on genome comparisons
Syntenic regions of the genomes of Br and At were identified by a proteome comparison based on BLASTP analysis. The entire proteomes of the two genomes were compared, and only the top reciprocal BLASTP matches per chromosome pair were selected (minimum of 50% alignment coverage at a cutoff of <E-20). We chose to perform a BLASTP similarity search because it is inherently more sensitive than BLASTN. Moreover, the BLASTP hit matrix contains fewer BLAST hits that are due to repetitive nucleotide sequences. Chromosome scale synteny blocks were inferred by visual inspection of dot-plots using DiagHunter with parameters as described in Cannon et al. Gene orientation, insertions/deletions, and inversions were considered, and at least four genes with the same respective orientations in both genomes were required to establish a primary candidate synteny block. To distinguish highly homologous real synteny blocks from false positives due to multiple rounds of polyploidy followed by genome rearrangement, we manually checked all the primary candidate blocks. Previous studies reported that the degree of gene conservation between At and the Brassica genome in several selected syntenic regions was >50%. Based on this result, 227 blocks showing a gene conservation index of >50% (twice the number of conserved matches divided by the total number of non-redundant genes in the blocks; tandem duplicated genes were collapsed to a single homolog) were selected as real syntenic regions. For microsynteny analysis, we manually broke the blocks if At homologs of independent Br sequences in the syntenic blocks were separated by more than 10 kbp. The synteny display is available online at the URL cited in the 'Data used in this study' section.
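Two steps of this procedure lend themselves to a short illustration: picking reciprocal best matches and computing the gene conservation index. The sketch below is a simplification under stated assumptions: it takes a pre-filtered table of bit scores keyed by hypothetical gene identifiers, ignores the per-chromosome-pair selection and the manual curation described above, and assumes tandem duplicates have already been collapsed.

```python
def reciprocal_best_hits(scores):
    """scores: {(at_gene, br_gene): bit_score}. Return the set of mutually best-scoring pairs."""
    best_for_at, best_for_br = {}, {}
    for (at_gene, br_gene), score in scores.items():
        if at_gene not in best_for_at or score > scores[(at_gene, best_for_at[at_gene])]:
            best_for_at[at_gene] = br_gene
        if br_gene not in best_for_br or score > scores[(best_for_br[br_gene], br_gene)]:
            best_for_br[br_gene] = at_gene
    return {(a, b) for a, b in best_for_at.items() if best_for_br.get(b) == a}


def conservation_index(n_matches, n_at_genes, n_br_genes):
    """Twice the number of conserved matches over the total non-redundant genes in the block."""
    return 2 * n_matches / (n_at_genes + n_br_genes)


# Hypothetical scores for a small candidate block.
scores = {
    ("At1", "Br1"): 200.0,
    ("At1", "Br2"): 150.0,
    ("At2", "Br2"): 180.0,
    ("At3", "Br2"): 90.0,
}
pairs = reciprocal_best_hits(scores)
index = conservation_index(len(pairs), n_at_genes=3, n_br_genes=2)
is_syntenic = index > 0.5  # threshold stated in the text
print(sorted(pairs), index, is_syntenic)
```

Multiplying the match count by two counts each conserved pair once per genome, so a block in which every gene has a partner scores 1.0.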
Ks analysis of homologous sequences
The timing of duplication events and the divergence of homologous segments was estimated by calculating the number of synonymous substitutions per synonymous site (Ks) between homologous genes. For the model genome comparisons, annotated gene models were used, whereas for the comparison between the Brassica genomes, ESTs were analyzed, even though they are error-prone. One drawback associated with the analysis of paralogs derived from ESTs is that multiple entries for the same gene can be included in the dataset, leading to overestimation of redundant Ks measures. However, it is reasonable to assume that redundant Ks measures are randomly distributed among all the Ks values; thus, the effect of redundancy is likely to have been neutral. Before comparing the Ks distribution for EST paralogs and genome sequences of Br, we carefully checked the patterns of Ks in the EST data; we did not find any significantly overestimated bulges or peaks. To identify orthologs and paralogs, the protein sequences of the gene models or ESTs were aligned all-against-all, and the resulting alignment was used as a reference to align the nucleotide sequences. After removing gaps, the Ks values from pairwise alignments of homologous sequences were determined using the maximum likelihood method implemented in the CODEML program of the PAML package under the F3×4 model, similar to the analysis described by Blanc et al. We compared the mode rather than the mean of Ks distributions, because the mode is not affected by bias due to incorrectly defined homolog pairs, which is partly responsible for unexpected overestimation of Ks. Only gene pairs with a Ks estimate of <5 were considered for further evaluation, and their Ks age distribution was calculated using intervals of 0.02 to 0.1. Divergence time calculations were based on the neutral substitution rate of 1.5 × 10⁻⁸ substitutions per site per year for chalcone synthase (Chs) and alcohol dehydrogenase (Adh).
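The step from a Ks distribution to a divergence time can be illustrated with a short Python sketch. Two points are assumptions rather than details given in the text: the mode is taken as the midpoint of the most populated histogram bin, and the time is computed with the conventional relationship T = Ks / (2r), in which the factor of 2 accounts for substitutions accumulating along both lineages. The rate r and the Ks < 5 cutoff come from the text; the function names and the toy Ks values are hypothetical.

```python
from collections import Counter

RATE = 1.5e-8  # synonymous substitutions per site per year (from the text)


def ks_mode(ks_values, bin_width=0.1, max_ks=5.0):
    """Midpoint of the most populated Ks bin, using only pairs with Ks < max_ks."""
    bins = Counter(int(ks / bin_width) for ks in ks_values if 0 <= ks < max_ks)
    if not bins:
        raise ValueError("no Ks values below the cutoff")
    top_bin = max(bins, key=bins.get)
    return (top_bin + 0.5) * bin_width


def divergence_time_mya(ks, rate=RATE):
    """Assumed relationship T = Ks / (2 * rate), in millions of years."""
    return ks / (2 * rate) / 1e6


# Toy Ks values, invented for illustration; 6.2 is dropped by the Ks < 5 filter.
ks_values = [0.22, 0.25, 0.27, 0.28, 0.31, 0.74, 0.95, 6.2]

mode = ks_mode(ks_values)
print(round(mode, 2), round(divergence_time_mya(mode), 1))
```

Using the mode keeps a few mispaired homologs with inflated Ks (such as 0.95 in the toy data) from shifting the estimate, which is the rationale given above for preferring it to the mean.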
Gene conservation between sister blocks in the B. rapa genome
Because Br BAC clones were selected to minimally overlap the target At region, self comparison of Br sequences using the DiagHunter program found few duplicated regions (Figure S4 in Additional data file 2). Instead, we manually identified sister blocks of duplication events by using synteny group information between Br and At. Br sequence blocks were defined as putative sister blocks of 3R if two different sequence blocks showed high synteny with respect to At regions known to be duplicated remnants of 3R, whereas independent Br sequence blocks sharing the same syntenic relationship with a single At region were selected as sister blocks of 4R. For additional validation, we compared the Ks distribution modes between the paralog gene pairs in the sister blocks.
Data used in this study
All the data used in this study can be accessed online at http://www.brassica-rapa.org/brvsat.
Additional data files
The following additional data are available with the online version of this paper: Tables S1, S2, S3, S4, S5 and S6 (Additional file 1); Figures S1, S2, S3 and S4 (Additional file 2); a spreadsheet listing synteny blocks between Br and At genomes (Additional file 3); spreadsheets describing genome blocks and block boundaries of the ancestral karyotype (AK) mapped on the B. rapa chromosomes based on Br-At synteny and At-AK correspondences (Additional file 4).
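Abbreviations
- AK: ancestral karyotype
- At: Arabidopsis thaliana
- BAC: bacterial artificial chromosome
- Bc: Brassica carinata
- BES: BAC end sequence
- Bj: Brassica juncea
- Bn: Brassica nigra
- Bna: Brassica napus
- Bo: Brassica oleracea
- Br: Brassica rapa
- EST: expressed sequence tag
- FISH: fluorescent in situ hybridization
- Ks: substitutions per synonymous site
- LTR: long terminal repeat retrotransposon
- Mt: Medicago truncatula
- MYA: million years ago
- NCBI: National Center for Biotechnology Information
- Os: Oryza sativa
- Pt: Populus trichocarpa
- WGD: whole genome duplication
- WGT: whole genome triplication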
We thank the many participants in the Korea Brassica rapa Genome Project. Collaborators meriting special note include Hyung Tae Kim of Macrogen for BAC sequencing and Eu-Ki Kim of NIAB, RDA, and the Korean Bioinformation Center, KRIBB, for bioinformatics support. This work was supported by the National Academy of Agricultural Science (05-1-12-2-1, 200901FHT020710397, and 200901FHT020508369) and by the BioGreen 21 Program (20050301034438), Rural Development Administration, Korea.
- De Bodt S, Maere S, Van de Peer Y: Genome duplication and the origin of angiosperms. Trends Ecol Evol. 2005, 20: 591-597. 10.1016/j.tree.2005.07.008.
- Sun G, Dilcher DL, Zheng Z, Zhou Z: In search of the first flower: a Jurassic angiosperm, Archaefructus, from northeast China. Science. 1998, 282: 1692-1695. 10.1126/science.282.5394.1692.
- Sun G, Ji Q, Dilcher DL, Zheng S, Nixon KC, Wang X: Archaefructaceae, a new basal angiosperm family. Science. 2002, 296: 899-904. 10.1126/science.1069439.
- Leitch IJ, Soltis DE, Soltis PS, Bennett MD: Evolution of DNA amounts across land plants (embryophyta). Ann Bot. 2005, 95: 207-217. 10.1093/aob/mci014.
- Blanc G, Wolfe KH: Widespread paleopolyploidy in model plant species inferred from age distributions of duplicate genes. Plant Cell. 2004, 16: 1667-1678. 10.1105/tpc.021345.
- Cui L, Wall PK, Leebens-Mack JH, Lindsay BG, Soltis DE, Doyle JJ, Soltis PS, Carlson JE, Arumuganathan K, Barakat A, Albert VA, Ma H, dePamphilis CW: Widespread genome duplications throughout the history of flowering plants. Genome Res. 2006, 16: 738-749. 10.1101/gr.4825606.
- The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
- International Rice Genome Sequencing Project: The map-based sequence of the rice genome. Nature. 2005, 436: 793-800. 10.1038/nature03895.
- Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, Schein J, Sterck L, Aerts A, Bhalerao RR, Bhalerao RP, Blaudez D, Boerjan W, Brun A, Brunner A, Busov V, Campbell M, Carlson J, Chalot M, Chapman J, Chen GL, Cooper D, Coutinho PM, Couturier J, Covert S, Cronk Q, et al: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006, 313: 1596-1604. 10.1126/science.1128691.
- The French-Italian Public Consortium for Grapevine Genome Characterization: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007, 449: 463-468. 10.1038/nature06148.
- Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KL, Salzberg SL, Feng L, Jones MR, Skelton RL, Murray JE, Chen C, Qian W, Shen J, Du P, Eustice M, Tong E, Tang H, Lyons E, Paull RE, Michael TP, Wall K, Rice DW, Albert H, Wang ML, Zhu YJ, et al: The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature. 2008, 452: 991-996. 10.1038/nature06856.
- Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, et al: The Sorghum bicolor genome and the diversification of grasses. Nature. 2009, 457: 551-556. 10.1038/nature07723.
- Vitte C, Bennetzen JL: Analysis of retrotransposon structural diversity uncovers properties and propensities in angiosperm genome evolution. Proc Natl Acad Sci USA. 2006, 103: 17638-17643. 10.1073/pnas.0605618103.
- Blanc G, Wolfe KH: Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004, 16: 1679-1691. 10.1105/tpc.021410.
- Devos KM, Brown JK, Bennetzen JL: Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 2002, 12: 1075-1079. 10.1101/gr.132102.
- Paterson AH, Bowers JE, Peterson DG, Estill JC, Chapman BA: Structure and evolution of cereal genomes. Curr Opin Genet Dev. 2003, 13: 644-650. 10.1016/j.gde.2003.10.002.
- Choi HK, Mun J-H, Kim DJ, Zhu H, Baek JM, Mudge J, Roe B, Ellis N, Doyle J, Kiss GB, Young ND, Cook DR: Estimating genome conservation between crop and model legume species. Proc Natl Acad Sci USA. 2004, 101: 15289-15294. 10.1073/pnas.0402251101.
- Cannon SB, Sterck L, Rombauts S, Sato S, Cheung F, Gouzy J, Wang X, Mudge J, Vasdewani J, Schiex T, Spannagl M, Monaghan E, Nicholson C, Humphray SJ, Schoof H, Mayer KF, Rogers J, Quétier F, Oldroyd GE, Debellé F, Cook DR, Retzel EF, Roe BA, Town CD, Tabata S, Van de Peer Y, Young ND: Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc Natl Acad Sci USA. 2006, 103: 14959-14964. 10.1073/pnas.0603228103.
- Town CD, Cheung F, Maiti R, Crabtree J, Haas BJ, Wortman JR, Hine EE, Althoff R, Arbogast TS, Tallon LJ, Vigouroux M, Trick M, Bancroft I: Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveal gene loss, fragmentation, and dispersal after polyploidy. Plant Cell. 2006, 18: 1348-1359. 10.1105/tpc.106.041665.
- Yang TJ, Kim JS, Kwon SJ, Lim KB, Choi BS, Kim JA, Jin M, Park JY, Lim MH, Kim HI, Lim YP, Kang JJ, Hong JH, Kim CB, Bhak J, Bancroft I, Park BS: Sequence-level analysis of the diploidization process in the triplicated FLOWERING LOCUS C region of Brassica rapa. Plant Cell. 2006, 18: 1339-1347. 10.1105/tpc.105.040535.
- Beilstein MA, Al-Shehbaz IA, Kellogg EA: Brassicaceae phylogeny and trichome evolution. Am J Bot. 2006, 93: 607-619. 10.3732/ajb.93.4.607.
- Economic Research Service, USDA: Vegetables and Melons Outlook. [http://www.ers.usda.gov/Publications/VGS/Tables/World.pdf]
- Parkin IA, Gulden SM, Sharpe AG, Lukens L, Trick M, Osborn TC, Lydiate DJ: Segmental structure of the Brassica napus genome based on comparative analysis with Arabidopsis thaliana. Genetics. 2005, 171: 765-781. 10.1534/genetics.105.042093.
- Lukens L, Zou F, Lydiate D, Parkin I, Osborn T: Comparison of a Brassica oleracea genetic map with the genome of Arabidopsis thaliana. Genetics. 2003, 164: 359-372.
- Blanc G, Hokamp K, Wolfe KH: A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 2003, 13: 137-144. 10.1101/gr.751803.
- Lysak MA, Koch MA, Pecinka A, Schubert I: Chromosome triplication found across the tribe Brassiceae. Genome Res. 2005, 15: 516-525. 10.1101/gr.3531105.
- Brassica Genome Gateway. [http://brassica.bbsrc.ac.uk]
- The Korea Brassica rapa Genome Project. [http://www.brassica-rapa.org/BRGP/index.jsp]
- Yang TJ, Kim JS, Lim KB, Kwon SJ, Kim JA, Jin M, Park JY, Lim MH, Kim HI, Kim SH, Lim YP, Park BS: The Korea Brassica Genome Projects: a glimpse of the Brassica genome based on comparative genome analysis with Arabidopsis. Comp Funct Genomics. 2005, 6: 138-146. 10.1002/cfg.465.
- Johnston JS, Pepper AE, Hall AE, Chen ZJ, Hodnett G, Drabek J, Lopez R, Price HJ: Evolution of genome size in Brassicaceae. Ann Bot. 2005, 95: 229-235. 10.1093/aob/mci016.
- Meyers BC, Kozik A, Griego A, Kuang H, Michelmore RW: Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis. Plant Cell. 2003, 15: 809-834. 10.1105/tpc.009308.
- Mun J-H, Yu H-J, Park S, Park B-S: Genome-wide identification of NBS-encoding resistance genes in Brassica rapa. Mol Genet Genomics. 2009, doi: 10.1007/s00438-009-0492-0.
- Zhang X, Wessler SR: Genome-wide comparative analysis of the transposable elements in the related species Arabidopsis thaliana and Brassica oleracea. Proc Natl Acad Sci USA. 2004, 101: 5589-5594. 10.1073/pnas.0401243101.
- Lim KB, Yang TJ, Hwang YJ, Kim JS, Park JY, Kwon SJ, Kim J, Choi BS, Lim MH, Jin M, Kim HI, de Jong H, Bancroft I, Lim YP, Park BS: Characterization of the centromere and peri-centromere retrotransposons in Brassica rapa and their distribution in related Brassica species. Plant J. 2007, 49: 173-183. 10.1111/j.1365-313X.2006.02952.x.
- Kwon SJ, Kim DH, Lim MH, Long Y, Meng JL, Lim KB, Kim JA, Kim JS, Jin M, Kim HI, Ahn SN, Wessler SR, Yang TJ, Park BS: Terminal repeat retrotransposon in miniature (TRIM) as DNA markers in Brassica relatives. Mol Genet Genomics. 2007, 278: 361-370. 10.1007/s00438-007-0249-6.
- Cannon SB, Kozik A, Chan B, Michelmore R, Young ND: DiagHunter and GenoPix2D: programs for genomic comparisons, large-scale homology discovery and visualization. Genome Biol. 2003, 4: R68. 10.1186/gb-2003-4-10-r68.
- Schranz ME, Lysak MA, Mitchell-Olds T: The ABC's of comparative genomics in the Brassicaceae: building blocks of crucifer genomes. Trends Plant Sci. 2006, 11: 535-542. 10.1016/j.tplants.2006.09.002.
- Lysak MA, Berr A, Pecinka A, Schmidt R, McBreen K, Schubert I: Mechanisms of chromosome number reduction in Arabidopsis thaliana and related Brassicaceae species. Proc Natl Acad Sci USA. 2006, 103: 5224-5229. 10.1073/pnas.0510791103.
- Henry Y, Bedhomme M, Blanc G: History, protohistory and prehistory of the Arabidopsis thaliana chromosome complement. Trends Plant Sci. 2006, 11: 267-273. 10.1016/j.tplants.2006.04.002.
- Koch MA, Haubold B, Mitchell-Olds T: Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol Biol Evol. 2000, 17: 1483-1498.
- Paterson AH: Leafing through the genomes of our major crop plants: strategies for capturing unique information. Nat Rev Genet. 2006, 7: 174-184. 10.1038/nrg1806.
- Rabinowicz PD, Schutz K, Dedhia N, Yordan C, Parnell LD, Stein L, McCombie WR, Martienssen RA: Differential methylation of genes and retrotransposons facilitates shotgun sequencing of the maize genome. Nat Genet. 1999, 23: 305-308. 10.1038/15479.
- Peterson DG, Schulze SR, Sciara EB, Lee SA, Bowers JE, Nagel A, Jiang N, Tibbitts DC, Wessler SR, Paterson AH: Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Genome Res. 2002, 12: 795-807. 10.1101/gr.226102.
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290: 1151-1155. 10.1126/science.290.5494.1151.
- Adams KL, Wendel JF: Novel patterns of gene expression in polyploid plants. Trends Genet. 2005, 21: 539-543. 10.1016/j.tig.2005.07.009.
- Buggs RJA, Doust AN, Tate JA, Koh J, Soltis K, Feltus FA, Paterson AH, Soltis PS, Soltis DE: Gene loss and silencing in Tragopogon miscellus (Asteraceae): comparison of natural and synthetic allotetraploids. Heredity. 2009, 103: 73-81. 10.1038/hdy.2009.24.
- Song K, Lu P, Tang K, Osborn TC: Rapid genome change in synthetic polyploids of Brassica and its implications for polyploidy evolution. Proc Natl Acad Sci USA. 1995, 92: 7719-7723. 10.1073/pnas.92.17.7719.
- Wendel JF: Genome evolution in polyploids. Plant Mol Biol. 2000, 42: 225-249. 10.1023/A:1006392424384.
- Bennetzen JL, Ma J, Devos KM: Mechanisms of recent genome size variation in flowering plants. Ann Bot. 2005, 95: 127-132. 10.1093/aob/mci008.
- Gaeta RT, Pires JC, Iniguez-Luy F, Leon E, Osborn TC: Genomic changes in resynthesized Brassica napus and their effect on gene expression and phenotype. Plant Cell. 2007, 19: 3403-3417. 10.1105/tpc.107.054346.
- Chantret N, Cenci A, Sabot F, Anderson O, Dubcovsky J: Sequencing of the Triticum monococcum hardness locus reveals good microcolinearity with rice. Mol Genet Genomics. 2004, 271: 377-386. 10.1007/s00438-004-0991-y.
- Adams KL, Wendel JF: Novel patterns of gene expression in polyploid plants. Trends Genet. 2005, 21: 539-543. 10.1016/j.tig.2005.07.009.
- Kim JS, Chung TY, King GJ, Jin M, Yang TJ, Jin YM, Kim HI, Park BS: A sequence-tagged linkage map of Brassica rapa. Genetics. 2006, 174: 29-39. 10.1534/genetics.106.060152.
- Mun J-H, Kwon SJ, Yang TJ, Kim HS, Choi BS, Baek S, Kim JS, Jin M, Kim JA, Lim MH, Lee SI, Kim HI, Kim H, Lim YP, Park BS: The first generation of a BAC-based physical map of Brassica rapa. BMC Genomics. 2008, 9: 280. 10.1186/1471-2164-9-280.
- Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W: PipMaker-a web server for aligning two genomic DNA sequences. Genome Res. 2000, 10: 577-586. 10.1101/gr.10.4.577.
- Gordon D, Abajian C, Green P: Consed: a graphical tool for sequence finishing. Genome Res. 1998, 8: 195-202.
- Ewing B, Hillier L, Wendl M, Green P: Base calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998, 8: 175-185.
- Ewing B, Green P: Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998, 8: 186-194.
- Kent WJ: BLAT - the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664.
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- RepeatMasker. [http://www.repeatmasker.org/]
- Munich Information Center for Protein Sequence. [http://mips.gsf.de/proj/plant/webapp/recat/]
- Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR: Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008, 9: R7. 10.1186/gb-2008-9-1-r7.
- FGENESH. [http://www.softberry.com]
- Stanke M, Morgenstern B: AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 2005, 33: W465-W467. 10.1093/nar/gki458.
- Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL, White O: Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003, 31: 5654-5666. 10.1093/nar/gkg770.
- Huang X, Adams MD, Zhou H, Kerlavage AR: A tool for analyzing and annotating genomic sequences. Genomics. 1997, 46: 37-45. 10.1006/geno.1997.4984.
- Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL: The Pfam protein families database. Nucleic Acids Res. 2002, 30: 276-280. 10.1093/nar/30.1.276.
- Plant Transposon-encoded Protein Database. [ftp://ftp.tigr.org/pub/data/TransposableElements/transposon_db.pep]
- Goldman N, Yang Z: A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994, 11: 725-736.
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.
- Bowers JE, Chapman BA, Rong J, Paterson AH: Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003, 422: 433-438. 10.1038/nature01521.
- Data used in this study. [http://www.brassica-rapa.org/brvsat]
- The Arabidopsis Information Resource. [http://www.arabidopsis.org/portals/genAnnotation]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.