De novo sequencing, assembly, and annotation of Capsicum genomes
We sequenced and assembled the genomes of Capsicum baccatum PBC81 (hereafter, Baccatum) and C. chinense PI159236 (hereafter, Chinense) using Illumina HiSeq 2500 with library insert sizes ranging from 200 bp to 10 kb (Additional file 1: Tables S1–S3). The estimated genome sizes of Baccatum and Chinense, based on 19-mer analysis, were 3.9 and 3.2 Gb, respectively (Additional file 1: Figure S1). The assembled genomes of Baccatum and Chinense comprised 3.2 and 3.0 Gb (83% and 94% of the estimated genome sizes, respectively) and had scaffold (contig) N50 sizes of 2.0 Mb (39 kb) and 3.3 Mb (50 kb), respectively (Additional file 1: Table S3). We annotated the protein-coding genes in the Baccatum and Chinense assemblies as well as those in the pre-existing C. annuum CM334 genome [25] (hereafter, Annuum) for detailed comparative analysis (Additional file 1: Figure S2). On average, ~35,000 genes were annotated in each species (Additional file 1: Table S4). Our analysis revealed higher gene coverage in the updated gene model of Annuum than in the previous one (Additional file 1: Table S5). Furthermore, a comparison of the updated and previous gene models of Annuum revealed ~10,000 genes that did not overlap between the two gene models, suggesting that most of the non-overlapping genes in the previous version were associated with TEs (Additional file 1: Figure S3).
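The N50 statistic used to describe the assemblies can be illustrated with a minimal sketch. The function name and the toy lengths below are hypothetical and are not taken from the study's pipeline. N50 is assumed here to mean the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to at least
        # half of the total length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy scaffold lengths (hypothetical, not data from the study).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

The same function applies to contigs or scaffolds; only the input list of lengths changes.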
A high-density genetic map of each species was generated by genotyping-by-sequencing on F2 populations [26, 27]. After breaking up chimeric scaffolds on the basis of genetic map information, we organized the assembled genome sequences into 12 chromosome-scale pseudomolecules. Overall, 87% of the Baccatum assembly (2.8 Gb in 2,083 scaffolds) and 89% of the Chinense assembly (2.8 Gb in 1,557 scaffolds) were ordered by the genetic map and inspected for syntenic inferences with the updated pseudomolecules of Annuum (Additional file 1: Table S6). We validated the assembled genome sequences by reference-guided mapping using the refined single-end and paired-end data, and by alignment to the de novo transcriptome assembly of each species (Additional file 1: Tables S7 and S8). In total, we detected more than 98.1% of the filtered raw sequences (>98% identity) and more than 93.4% of the assembled transcriptomes (>98% identity and 80% query coverage) in the genome assemblies (Additional file 1: Table S8). Taken together, our analyses provide de novo reference genome sequences of two new pepper species as well as an improved Annuum genome sequence.
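The transcript-based validation amounts to counting the fraction of assembled transcripts that align to the genome above the stated identity and query-coverage thresholds. The sketch below is illustrative only: the record layout, the function names, and the toy alignments are hypothetical, and treating the thresholds as strictly greater than 98% identity and at least 80% coverage is an assumption about how the cutoffs were applied.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    query_id: str          # transcript identifier
    query_length: int      # transcript length in bases
    aligned_length: int    # transcript bases covered by the alignment
    identity: float        # percent identity of the alignment


def passes(aln, min_identity=98.0, min_coverage=80.0):
    """True if the alignment meets the identity and coverage cutoffs."""
    coverage = 100.0 * aln.aligned_length / aln.query_length
    return aln.identity > min_identity and coverage >= min_coverage


def detected_fraction(alignments, total_queries):
    """Fraction of transcripts with at least one passing alignment."""
    hits = {aln.query_id for aln in alignments if passes(aln)}
    return len(hits) / total_queries


# Toy alignments for four transcripts (hypothetical values).
toy_alignments = [
    Alignment("t1", 1000, 900, 99.5),  # passes both cutoffs
    Alignment("t2", 1000, 700, 99.9),  # coverage too low (70%)
    Alignment("t3", 500, 450, 97.0),   # identity too low
    Alignment("t3", 500, 420, 98.6),   # second hit for t3 passes
]                                      # t4 has no alignment at all
print(detected_fraction(toy_alignments, total_queries=4))
```

Counting distinct transcript identifiers rather than alignments avoids inflating the detected fraction when one transcript has several passing hits.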
Repeat annotation was performed with the assembled genomes and the initial contigs covering the estimated genome sizes of the three species (Additional file 1: Figure S4, Tables S9 and S10). Overall, ~85% of the initial contigs were annotated as repeat sequences. LTR-Rs of the Ty3-gypsy superfamily accounted for about half of the entire genome in each of the three species (Additional file 1: Tables S9 and S10). Among the subgroups of the gypsy superfamily, del elements comprised the largest fraction, representing 41.5%, 34.9%, and 41.7% (1,482, 1,337, and 1,343 Mb) in Annuum, Baccatum, and Chinense, respectively. Furthermore, athila elements were more abundant (>2-fold) in Baccatum, indicating that the athila subgroup contributed to species-specific genome expansion in the Baccatum lineage (Additional file 1: Table S10).
Speciation and evolution of the Capsicum species
A phylogenetic analysis of the peppers with other plant species revealed that the divergence among the three peppers occurred first between Baccatum and a progenitor of the other two peppers ~1.7 million years ago (MYA), followed by divergence between the Annuum and Chinense lineages ~1.1 MYA (Fig. 1a; Additional file 1: Table S11). To identify genomic changes in the three pepper species, we compared the genome structures, LTR-R insertion patterns, and gene duplication histories across these pepper genomes (Fig. 1b, c and Fig. 2).
Chromosomal rearrangement is an important force in speciation, often producing unbalanced gametes that reduce hybrid fertility [28]. We performed an inter-genomic structural comparison and detected translocations among the pepper genomes (Fig. 1b). The results show that chromosomes 3, 5, and 9 exhibit translocations that differentiate Baccatum from the other two species (Fig. 1b, c). Collinearity comparisons among Capsicum species and two Solanum species revealed that the distal region on the long arm of chromosome 9 was conserved in Baccatum but translocated to the short arm of chromosome 3 in a shared ancestor of Annuum and Chinense (Fig. 1c; Additional file 1: Figure S5). Furthermore, regions orthologous to Solanum chromosomes 6 and 4 were detected in the terminal regions of the long and short arms of chromosomes 3 and 5 in Annuum and Chinense, respectively. In contrast, the orthologous regions of Solanum were mixed in the corresponding blocks of Baccatum (Fig. 1c). This indicates that the distal regions of the long and short arms of chromosomes 3 and 5 were translocated in the Baccatum lineage. We also detected translocations between the terminal regions of the short arm of chromosome 3 in Baccatum and the long arm of chromosome 9 in Annuum and Chinense. Consequently, our analyses revealed that translocations have generated different karyotypes in both the Baccatum and the Annuum/Chinense progenitor lineages.
To compare LTR-R insertion patterns across the pepper genomes, we identified full-length LTR-Rs in each assembled genome and estimated their insertion times [29] (Additional file 1: Figure S5 and Table S12). A peak of LTR-R activity in Baccatum appeared around its speciation time, 1–2 MYA (Fig. 2a). In particular, the athila family was highly amplified in Baccatum around the estimated speciation time, indicating that this subgroup may have increased explosively in Baccatum after speciation. In Chinense, we observed a recent proliferation of LTR-Rs (Fig. 2a).
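Insertion times of full-length LTR-Rs are commonly estimated from the sequence divergence between the two LTRs of an element, which are identical at the time of insertion. A minimal sketch of that calculation follows, assuming a Jukes-Cantor distance and the relation T = K / (2r). The function names, the distance model, and the substitution rate are illustrative assumptions and are not taken from the study or from [29].

```python
import math


def jukes_cantor_distance(ltr5, ltr3):
    """Jukes-Cantor distance between two pre-aligned LTR sequences.

    Columns containing gaps or ambiguous bases are ignored.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(ltr5.upper(), ltr3.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("divergence too high for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time_mya(distance, rate_per_site_per_year):
    """Insertion time in million years: T = K / (2r)."""
    return distance / (2 * rate_per_site_per_year) / 1e6


# Placeholder rate (an assumption; substitute a lineage-appropriate value).
ASSUMED_RATE = 1.3e-8

# Toy aligned LTR pair differing at one of 20 sites (hypothetical).
k = jukes_cantor_distance("ACGTACGTACGTACGTACGT", "ACGTACGTACCTACGTACGT")
print(insertion_time_mya(k, ASSUMED_RATE))
```

The factor of two reflects that both LTRs accumulate substitutions independently after insertion. The estimate scales inversely with the assumed rate, so absolute ages are only as reliable as that rate.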
Gene duplication is a major mechanism generating functional diversity between species through the creation of new genes [30, 31]. We detected recent gene duplication events and characterized the repertoires of duplicated genes in the three pepper genomes during and after speciation (Fig. 2b). Overall, duplication events were particularly frequent in the Baccatum lineage, both during and after speciation. In particular, NLRs were extensively amplified in Baccatum in the last 0–2 MYA, and more recently in the other two peppers (Fig. 2b). Taken together, these results suggest that chromosomal rearrangements, accumulation of specific LTR-Rs, and differential gene duplications have contributed to genome diversification in Capsicum.
Massive creation of new NLRs via LTR-R-driven retroduplication
A previous study suggested that NLRs were amplified in pepper compared to tomato and potato genomes [21]. In particular, the coiled-coil NLR subgroup 2 (CNL-G2) was highly expanded in the pepper genome (Additional file 1: Table S13). To explore the possible mechanism of NLR proliferation in Capsicum spp., we analyzed the NLRs and their flanking sequences. We identified 105, 123, and 86 NLRs located inside LTR-Rs in Annuum, Baccatum, and Chinense, respectively (Additional file 1: Figure S6, S7, and Table S13; Additional file 2: Table S14). Hence, a substantial proportion (~13%) of the NLRs appear to have been amplified by LTR-Rs and retain intact structures indicative of their retroduplicated origin. The retroduplicated NLRs were mainly located on specific euchromatic chromosome arms (Additional file 1: Figure S8). Most of these NLRs (~70%) were in the CNL-G2 category, indicating that the copy number of specific NLR subfamilies was particularly expanded on specific chromosomes (Fig. 3a, b; Additional file 1: Figure S8). Furthermore, most of the retroduplicated NLRs (~72% of the total and ~67% of the CNL-G2 type) were inside non-autonomous LTR-Rs that contained no gag or pol protein-coding potential (Additional file 1: Table S15). This suggests that all steps of the retroduplication, presumably including the initial sequence capture process, had to be provided in trans. To compare retroduplicated NLRs and NLRs which are not affected by LTR-Rs, we classified normal NLRs as false-negative annotations (Additional file 1: Table S13). We performed Ka/Ks analysis to compare selection pressure between retroduplicated and normal NLRs in CNL-G2. Our analysis revealed that both retroduplicated and normal NLRs were under purifying selection and that the Ka/Ks ratios of the two groups were not significantly different (Additional file 1: Figure S9).
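Identifying genes located inside LTR-Rs can be thought of as an interval-containment query between gene coordinates and element coordinates. The sketch below shows only that core idea under simplifying assumptions: the record layout, the function name, and the toy coordinates are hypothetical, and the criteria actually used in the study (for example, checks on element structure) may be stricter.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    seqid: str   # scaffold or chromosome name
    start: int   # 1-based inclusive start
    end: int     # 1-based inclusive end
    name: str    # gene or element identifier


def genes_inside_elements(genes, elements):
    """Map each gene fully contained in an element to that element's name."""
    by_seq = {}
    for element in elements:
        by_seq.setdefault(element.seqid, []).append(element)

    nested = {}
    for gene in genes:
        for element in by_seq.get(gene.seqid, []):
            # Full containment: the gene lies entirely within the element.
            if element.start <= gene.start and gene.end <= element.end:
                nested[gene.name] = element.name
                break
    return nested


# Toy coordinates (hypothetical, not from the study).
toy_elements = [
    Interval("chr1", 1000, 9000, "LTR_A"),
    Interval("chr2", 500, 4000, "LTR_B"),
]
toy_genes = [
    Interval("chr1", 2000, 6000, "NLR_1"),   # inside LTR_A
    Interval("chr1", 8500, 9500, "NLR_2"),   # overlaps the boundary only
    Interval("chr2", 2000, 3000, "NLR_3"),   # inside LTR_B
    Interval("chr3", 2000, 3000, "NLR_4"),   # no element on this sequence
]
print(genes_inside_elements(toy_genes, toy_elements))
```

The nested loop is adequate for a sketch; for genome-scale annotations an interval tree or a sorted sweep would avoid comparing every gene with every element on a sequence.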
When we compared the retroduplicated and normal NLRs in the CNL-G2 category, the retroduplicated NLRs had significantly fewer and longer exons, although not all of them had a single exon (Fig. 3c, d). In total, ~32% of the retroduplicated NLRs in each species had multiple exons, but all of those had a reduced number of introns compared to their predicted parental sequences (Additional file 1: Figure S7b, Tables S15 and S16). For example, CB.v1.2.scaffold1410.1, which has two exons, is annotated as a retroduplicated NLR in CNL-G10 of Baccatum. We found that its potential parental sequence consisted of six exons, and sequence comparison of the two genes revealed that five exons of the parental sequence were fully merged into the first exon (2.9 kbp) of CB.v1.2.scaffold1410.1 (Additional file 1: Figure S7b). These results suggest that retroduplicated NLRs containing introns in pepper genomes might have emerged through alternative splicing mechanisms such as intron retention or exon skipping, as described in Zhang et al. [10].
We performed genome-wide analyses using the tomato, potato, and rice genomes to test whether retroduplication is a general feature of genome evolution in the plant kingdom (Additional file 1: Table S13). We found that 21, 81, and 27 (8%, 18%, and 5%) of the NLRs were inside LTR-Rs in tomato, potato, and rice, respectively (Additional file 1: Table S13). Of these, we identified parental sequences with multiple exons for 14, 71, and 16 of the NLRs inside LTR-Rs in tomato, potato, and rice, respectively, thus confirming their emergence by retroduplication (Additional file 1: Figure S10 and Table S17). Similar to the peppers, NLRs in a particular subgroup (CNL-G9) were primarily retroduplicated in potato (Additional file 1: Table S13). These results indicate that LTR-Rs played a key role in the expansion of NLRs by retroduplication throughout the plant kingdom and that the detected events are both recent and lineage-specific.
In addition to the NLRs, we looked for other genes inside LTR-Rs in the six plant species (Additional file 1: Table S18). The number of genes found inside LTR-Rs ranged from 1398 in rice to 3898 in potato, suggesting that a large proportion of genes in these plant species may have emerged by LTR-R-driven duplication. On average, ~45% of them had functional domains, including highly amplified families such as MADS-box TFs, cytochrome P450s, and protein kinases, and ~42% of those genes were expressed in one or more of the tissues investigated by RNA-sequencing (RNA-seq) analysis (Additional file 1: Table S18).
Evolutionary mechanisms for the emergence of disease resistance genes in Solanaceae
The L genes, which encode NLRs, are known to provide resistance in peppers against Tobamoviruses. They belong to the CNL-G4 category, along with I2 in tomato, which provides resistance to race 2 of Fusarium oxysporum f. sp. lycopersici, and R3a in potato, which provides resistance to the late blight pathogen, Phytophthora infestans [32,33,34]. Each gene has a single exon encoding a peptide of ~1300 amino acids. Synteny analysis and sequence comparison among the pepper, potato, and tomato genomes suggested that L, I2, and R3a are orthologous genes and that the genomic regions containing them are tightly linked on chromosome 11 (Additional file 1: Figure S11a and Table S19). These results suggest the possibility that the genes originated by an early retroduplication and then underwent divergent evolution in each lineage.
We examined the evolutionary history of L genes and their putative parental genes in the pepper genomes (Fig. 4; Additional file 1: Figure S11b, c, and Table S20). The candidates for a parental gene (P1 to P6) were identified considering similarity, Ks values, and alignment coverage to L genes. All candidate parental sequences contained multiple exons. When candidate parental sequence P1 was compared with L in Annuum, the results suggested that L was derived from retroduplication in the ancestral lineage of Capsicum spp. ~8.9 MYA (Fig. 4). Because L has a 6.7-kb single exon, with its only intron in the 3′ UTR, and is flanked by direct repeat sequences and a poly(A) "tail", our analysis suggests that L emerged through capture and reverse transcription in a long interspersed nuclear element (LINE)-driven retroduplication (Fig. 4; Additional file 1: Figure S12). Sequence comparison of L genes in the three genomes and L4 in C. chacoense revealed that the L genes were diversified by accumulation of lineage-specific sequence mutations after speciation within Capsicum (Fig. 4; Additional file 1: Table S21). Consequently, our results suggest that the ancestor of the L genes was derived from retroduplication and that subsequent divergent evolution has led to specific resistance against diverse strains of Tobamovirus in each species of Capsicum after speciation (Fig. 4).
To analyze the evolutionary processes acting on R3a of potato, we first performed a genome-wide search for R3a as well as for candidate parental sequences. Because R3a is absent from the current potato reference genome [35], we could not carry out accurate comparisons of R3a and its homologs. However, R3a and its clustered genes originated from the wild species Solanum demissum [36] and were available in a public database. We therefore compared these sequences with their closest homologs in the reference potato genome. Our analyses revealed that intronless sequences of the ancestral potato R3a might have emerged by RNA-based gene duplication in a shared ancestor of potato and tomato (Fig. 4). Subsequently, R3a and its paralogues were amplified by two rounds of tandem gene duplication after the divergence of potato and tomato (Fig. 4; Additional file 1: Table S22). Taken together, our results suggest that retroduplication events are a main evolutionary process in the emergence of new plant disease resistance genes, which can gain function via subsequent sequence variation and tandem duplication.
Evolution of potential anthracnose resistance genes in Baccatum
Pepper anthracnose, caused by Colletotrichum spp., is one of the most devastating diseases in pepper production worldwide [37]. Owing to the complexity of the interactions between the host and Colletotrichum spp. and the lack of resistance in the Annuum gene pool, a few Baccatum varieties were identified as the only breeding resources for anthracnose resistance [38]. Using pre-existing genetic information [39], we identified the pertinent genomic regions and obtained 64 NLRs from a 3.8 Mb region of Baccatum chromosome 3 as candidate resistance genes for C. capsici (Fig. 5; Additional file 1: Table S23). Previous studies reported that the main quantitative trait locus (QTL) for pepper resistance against C. capsici was located on chromosome 9 [39]; however, we found that this QTL is located on chromosome 3 owing to a translocation that distinguishes Baccatum from Annuum (Fig. 1c). We obtained 35 Baccatum-specific NLRs (27 in CNL-G2, five in CNL-G10, and three in CNL-G10) from the 64 NLRs by sequence comparison among the three pepper genomes (Fig. 5). Considering the gene duplication history, 15 of the 35 genes appear to have emerged after the emergence of the Baccatum lineage, and all of them belong to the CNL-G2 category. Transcriptome evidence indicated that ten of those 15 genes are expressed in one or more tissues (Fig. 5; Additional file 1: Table S23). Furthermore, five of the 15 genes appear to have emerged by retroduplication (Fig. 5). Consequently, our results suggest that retroduplication, along with tandem and segmental duplications, may have played a major role in the emergence of the candidate genes for anthracnose resistance in the Baccatum lineage.