A fruitful outcome to the papaya genome project
© BioMed Central Ltd 2008
Published: 06 June 2008
The draft genome sequence of a transgenic virus-resistant papaya marks the first genome sequence of a commercially important transgenic crop plant.
From a genomics perspective, the papaya is an attractive species to work with. It has a very small diploid genome of 372 Mb, slightly smaller than that of rice and roughly one-sixth the size of that of maize. The papaya belongs to the order Brassicales, which includes the model plant Arabidopsis as well as the cabbage family; it shared a common ancestor with Arabidopsis approximately 72 million years ago. The papaya can also be easily transformed and has a generation time of 9-15 months. Its primitive sex-chromosome system has also interested evolutionary biologists for years.
The papaya genome was sequenced using a whole-genome shotgun approach with the traditional Sanger method to approximately 3x coverage. The assembled contigs total about 271 Mb (73% of the genome), with scaffolds spanning 370 Mb. About 167 Mb of sequence (or 235 Mb of scaffolds) could be anchored to the integrated genetic and physical map of the papaya genome. More than half (52%) of the papaya genome comprises repetitive sequences, mainly long terminal repeat retrotransposons. Cytogenetic studies suggest that the genome is about 65-70% euchromatic and 30-35% heterochromatic. Several measures were used to assess the coverage of the draft genome, such as the percentage of unique genes (unigenes) and genetic markers matching the assembly. The authors estimate that approximately 90% of the euchromatin has been covered, containing 92.1% of the unigenes and 92.4% of the genetic markers.
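The assembly fractions quoted above can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only the sizes given in the text; the constant and function names are illustrative and do not come from the original paper.

```python
# Sizes in megabases (Mb), as quoted in the text above.
# All names here are illustrative, not taken from the genome paper.
GENOME_MB = 372            # estimated papaya genome size
CONTIG_MB = 271            # total assembled contig length
SCAFFOLD_MB = 370          # total scaffold span
ANCHORED_SEQ_MB = 167      # sequence anchored to the integrated map
ANCHORED_SCAFFOLD_MB = 235 # scaffold span anchored to the integrated map


def percent_of_genome(size_mb: float, genome_mb: float = GENOME_MB) -> float:
    """Return size_mb as a percentage of the genome, rounded to one decimal."""
    return round(100 * size_mb / genome_mb, 1)


if __name__ == "__main__":
    figures = [
        ("contigs", CONTIG_MB),
        ("scaffolds", SCAFFOLD_MB),
        ("anchored sequence", ANCHORED_SEQ_MB),
        ("anchored scaffolds", ANCHORED_SCAFFOLD_MB),
    ]
    for label, size in figures:
        print(f"{label}: {size} Mb = {percent_of_genome(size)}% of the genome")
```

The contig figure works out to 72.8%, which matches the 73% quoted in the text once rounded to a whole number.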
Automated annotation of the genome, combined with the estimated genome coverage, led the team to project a gene content of 24,746 genes. Compared with the other four sequenced plant genomes, this gene count is 11-20% lower than that of Arabidopsis, 34% lower than that of rice, 46% lower than that of poplar and 19% lower than that of grape.
The indication that papaya contains the smallest number of genes of any plant yet sequenced was investigated further. First, all inferred non-redundant protein sequences from the five sequenced plant genomes were collapsed into 39,709 similarity groups, or 'tribes'. Then the number of genes found in each tribe was compared between papaya and each of the other genomes. In the papaya-Arabidopsis comparison, for example, 3,595 tribes out of 6,726 contained the same number of genes. In the remaining tribes, however, Arabidopsis genes outnumbered papaya genes by two to one, and the same trend held in the comparisons with each of the other plant genomes.
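The pairwise tribe comparison described above can be sketched as a small counting exercise. The Python below is a minimal illustration under two assumptions: that the comparison is restricted to tribes present in both genomes, and that each genome is summarized as a mapping from tribe identifier to gene count. The tribe identifiers and counts are invented toy data, and the function name is hypothetical.

```python
# Toy data only: tribe IDs and gene counts are invented for illustration.
TOY_PAPAYA = {"T1": 1, "T2": 2, "T3": 1, "T4": 3}
TOY_ARABIDOPSIS = {"T1": 1, "T2": 4, "T3": 2, "T4": 6, "T5": 2}


def compare_tribes(counts_a: dict[str, int], counts_b: dict[str, int]) -> dict:
    """Compare per-tribe gene counts between two genomes.

    Assumption: only tribes present in both genomes are compared.
    """
    shared = sorted(counts_a.keys() & counts_b.keys())
    equal = [t for t in shared if counts_a[t] == counts_b[t]]
    unequal = [t for t in shared if counts_a[t] != counts_b[t]]
    genes_a = sum(counts_a[t] for t in unequal)
    genes_b = sum(counts_b[t] for t in unequal)
    return {
        "shared_tribes": len(shared),
        "equal_tribes": len(equal),
        "genes_a_in_unequal": genes_a,
        "genes_b_in_unequal": genes_b,
        # Ratio of genome B to genome A genes across the unequal tribes.
        "ratio_b_to_a": genes_b / genes_a if genes_a else None,
    }


if __name__ == "__main__":
    print(compare_tribes(TOY_PAPAYA, TOY_ARABIDOPSIS))
```

In the toy data, one of the four shared tribes has equal counts, and across the other three the second genome has twice as many genes as the first. This mirrors the two-to-one pattern reported for Arabidopsis and papaya but reproduces none of the real figures.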
The team next asked what the minimum set of genes required for an angiosperm might be. By determining the genes shared among the five sequenced genomes across all 39,709 tribes, they estimated this minimum to be 13,311 genes. As papaya had the fewest genes in the largest number of tribes, these data further supported the idea that it has the lowest gene count of any plant genome so far sequenced.
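The text does not spell out how the shared gene set was computed. One plausible reading, offered here purely as an assumption, is to keep only the tribes represented in every genome and to count, for each such tribe, the smallest number of members found in any genome. The Python sketch below implements that reading on invented toy data; the function name and data are hypothetical and this is not the authors' procedure.

```python
# Toy data only: three invented genomes, each mapping tribe ID -> gene count.
# A tribe is listed for a genome only if that genome has at least one member.
TOY_GENOMES = {
    "papaya": {"T1": 1, "T2": 2, "T4": 3},
    "arabidopsis": {"T1": 1, "T2": 4, "T4": 6, "T5": 2},
    "rice": {"T1": 2, "T2": 3, "T3": 1, "T4": 5},
}


def minimal_gene_set_size(genomes: dict[str, dict[str, int]]) -> int:
    """Estimate a shared minimal gene count across genomes.

    Assumed method (not confirmed by the source): intersect the tribes
    present in every genome, then sum the per-tribe minimum gene count.
    """
    if not genomes:
        return 0
    tribe_sets = [set(counts) for counts in genomes.values()]
    core_tribes = set.intersection(*tribe_sets)
    return sum(
        min(counts[tribe] for counts in genomes.values())
        for tribe in core_tribes
    )


if __name__ == "__main__":
    print(minimal_gene_set_size(TOY_GENOMES))
```

With the toy data, tribes T1, T2 and T4 occur in all three genomes, and their per-tribe minima (1, 2 and 3) sum to 6.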
One possible explanation for the lower than expected number of genes is that the papaya genome did not undergo the two rounds of recent whole-genome duplication observed in Arabidopsis. Analysis of syntenic blocks between papaya and Arabidopsis revealed that each single papaya gene has two to four corresponding genes in Arabidopsis, whereas each Arabidopsis gene has only one counterpart in papaya.
Interestingly, when syntenic blocks from the grape genome were included along with Arabidopsis in the analysis, Ming et al. detected a possible ancient whole-genome triplication that occurred before the divergence of the three species, but after the separation of the monocotyledons and dicotyledons. This triplication event was first proposed on the evidence of the grape genome sequence and is now supported by the papaya sequence.
Ming et al. categorized several important gene families essential for papaya fitness. One surprise is the extremely small number of disease-resistance genes of the nucleotide-binding site leucine-rich repeat (NBS-LRR) class. Arabidopsis has more than 200 NBS-LRR genes and rice more than 600. In contrast, papaya has only 55 NBS-LRR genes, although they are clustered in a similar fashion to those in Arabidopsis and rice. This dearth of NBS-LRR genes might suggest that papaya has developed alternative strategies of host defense, such as the evolution of other classes of resistance genes (for example, tomato Cf-like genes, rice Xa-21-like receptor kinase or maize Hm1-like detoxin protein) or even of nonhost resistance, in which all members of a plant species exhibit resistance to all members of a given pathogen species.
The papaya genome has a similar number of genes to Arabidopsis and poplar for cellulose biosynthesis, cell-wall and lignin synthesis, and ethylene biosynthesis, but fewer genes involved in cell-wall degradation and in light-induced and circadian rhythms. On the other hand, papaya has more genes associated with starch metabolism and the development of volatiles.
The papaya genome sequence also sheds new light on the primitive XY sex-chromosome system, where the Y chromosome contains a male-specific region (MSY) approximately 8 Mb in length [8, 19]. Two scaffolds (totaling approximately 4.5 Mb) from the female papaya genome sequence determined by Ming et al. aligned to a bacterial artificial chromosome (BAC) physical map of the X chromosome. The female region contained 254 genes, of which 75% were supported by expressed sequence tags. In contrast, only four expressed genes have so far been found among seven completely sequenced BACs (totaling 1.2 Mb in length) in the MSY region. Using repeat data derived from the whole-genome sequence, Ming et al. were able to show that 85.6% of the 1.2 Mb MSY sequence is composed of repeats. Although complete sequence data are not yet available for the MSY from a male genome, the sequence generated from the female will provide essential comparative information to help unravel the mysteries of the evolution and function of sex chromosomes in plants.
An important point of the paper by Ming et al. is that the genome analyzed was from a transgenic inbred line. The papaya ringspot virus (PRSV) coat protein transgene confers resistance to the virus and was introduced into papaya by particle bombardment. Particle bombardment can cause the construct to be fragmented, resulting in multiple integration events within the genome. The genome sequence enabled the identification of multiple integration sites, three of which occurred in nuclear genomic regions that contained AT-rich DNA fragments from the chloroplast genome. From a regulatory viewpoint, many countries require precise identification of a transgene's insertion sites before granting permission to grow or import transgenic food crops. Now that these integration sites have been determined, a major hurdle to the introduction of transgenic papaya in other countries has been removed.
In summary, a new and interesting plant genome sequence is now publicly available for interrogation. The papaya genome provides basic plant research with an exciting new tool to better understand angiosperm evolution and sex-chromosome biology. It also provides clues to the minimum set of genes that a flowering plant needs. On the practical side, the papaya genome sequence has yielded a vast set of molecular genetic markers that can be used to create higher-yielding, more nutritious and hardier papaya varieties.
It is safe to assume that the genomes of all the major food and fiber crops will be sequenced within the next five years. This is an important goal for both the plant genomics community and the wider world if we are to meet the food and energy security needs of the future. Sequencing the papaya genome illustrates just how much we do not know about plant genomes and how important it will be to generate reference genome sequences at key nodes on the tree of life. A case has recently been made for sequencing the genome of Amborella trichopoda, which lies at the base of angiosperm evolution. Whoever coined the phrase 'post-genomics' has jumped the gun. Plant genomics has only scratched the surface of new biological discovery and practical solutions in support of the next green revolution.