- Open Access
Genomic analyses of an extensive collection of wild and cultivated accessions provide new insights into peach breeding history
Genome Biology volume 20, Article number: 36 (2019)
Human selection has a long history of transforming crop genomes. Peach (Prunus persica) has undergone more than 5000 years of domestication that led to remarkable changes in a series of agronomically important traits, but genetic bases underlying these changes and the effects of artificial selection on genomic diversity are not well understood.
Here, we report a comprehensive analysis of peach evolution based on genome sequences of 480 wild and cultivated accessions. By focusing on a set of quantitative trait loci (QTLs), we provide evidence supporting that distinct phases of domestication and improvement have led to an increase in fruit size and taste and extended its geographic distribution. Fruit size was predominantly selected during domestication, and selection for large fruits has led to the loss of genetic diversity in several fruit weight QTLs. In contrast, fruit taste-related QTLs were successively selected for by domestication and improvement, with more QTLs selected for during improvement. Genome-wide association studies of 11 agronomic traits suggest a set of candidate genes controlling these traits and potential markers for molecular breeding. Candidate loci for genes that contributed to the adaption to low-chill regions were identified. Furthermore, the genomic bases of divergent selection for fruit texture and local breeding for different flavors between Asian and European/North American cultivars were also determined.
Our results elucidate the genetic basis of peach evolution and provide new resources for future genomics-guided peach breeding.
Global food production and crop quality have been transformed over the last 10,000 years through domestication and extensive selection . However, crop breeding still relies heavily on experience and manual observation, which are both limiting in terms of efficiency and time-consuming, especially for perennial fruit crops with long juvenile phases. Recently developed next-generation DNA sequencing technologies allow the tracking of genome-wide selection signatures and can be used to develop strategies for further crop improvement, as recently reported in rice (Oryza sativa) , maize (Zea mays) , cucumber (Cucumis sativus) , tomato (Solanum lycopersicum) , soybean (Glycine max) , peach (Prunus persica) , and apple (Malus domestica) . Genome-wide association studies (GWAS) have been performed to identify loci or genes associated with important agronomic traits in a range of species [2, 5,6,7,8] and to facilitate targeted and precise genetic selection, involving marker-assisted selection (MAS) and molecular design breeding. MAS has been applied to woody perennial crops as well as annual crops, where it has accelerated breeding programs through early identification of fruit-related traits at the seedling or juvenile phases [9, 10].
Fruits represent an important component of the diet of humans and other animals. Peach (Prunus persica L.) is one of the most economically important fruit crops in temperate regions, with a global yield of 25.0 million tons in 2016 and a net value of over $12.4 billion in 2014 (FAOSTAT; http://faostat.fao.org). Peach is also considered to be a model system for molecular biology researches in the Rosaceae family and represents one of the fruit species most consumed worldwide. Following domestication in China, approximately 5000 years ago, peach has undergone a long period of selection by native farmers, and in recent decades, specific target-guided breeding has led to a remarkable increase in fruit quality. However, the selection signatures at the molecular level associated with domestication and improvement remain largely unclear. Understanding the genetic basis of selection is important for precise molecular breeding and for eliminating undesired costs, such as linkage drag during wild introgressions . Although previous studies have attempted to address this issue, small population sizes and underrepresentation of the accessions in the germplasm collection limited the findings [7, 11].
To explore this question more comprehensively, we sequenced the genomes of a large peach collection, consisting of 480 diverse accessions from around the world, including 52 wild relatives, 213 landraces, and 215 improved cultivars (Fig. 1a, Additional file 1: Table S1). Among the improved cultivars, 126 were collected from eastern countries (Asia) and 89 were from western countries (North America and Europe). Using these sequencing data, here we elucidate the breeding history of peach by identifying loci and genes associated with important agronomic traits that have been under selection during domestication and improvement. Our data indicate that fruit size was mainly selected during domestication, while fruit taste was successively selected during both domestication and improvement. Our results provide a new resource for further molecular breeding and studies of peach biology.
Results and discussion
A high-density peach genomic variation map
Through sequencing of 480 peach accessions (Fig. 1a), we generated a total of 719 Gb base pairs of sequences, with an average depth of 6.4× and a coverage of 93.5% for each accession (Additional file 1: Table S1). Pair-end reads were mapped against the peach reference genome  (release version 2.0), and a final set of 4,980,259 high-quality single nucleotide polymorphisms (SNPs) were identified, resulting in an average of 21.9 SNPs per kilobase (Additional file 2: Tables S2 and S3, Additional file 3: Figure S1). A total of 886,033 SNPs were located in coding regions, including 457,988 nonsynonymous and 430,034 synonymous SNPs (Additional file 2: Table S3). We also found 8812 SNPs in 5724 genes that were predicted to have significant effects on gene functions, including 5784 nonsense SNPs in 4480 genes that resulted in start codon changes, the introduction of premature stop codons, or the production of elongated transcripts. The accuracies of original SNPs and SNPs following imputation were estimated to be 96.1% and 96.3%, respectively, based on genotyping using the Sequenom MassARRAY platform for 30 SNPs in 258 accessions (Additional file 1: Tables S1, Additional file 2: Table S4). In addition, we identified 1,026,375 small insertions and deletions (INDELs) (< 5 bp) and 159,330 large structural variations (SVs) (Additional file 3: Figure S1, Additional file 2: Table S2).
The high-density SNP data enabled GWAS analysis of seven agronomic traits: flesh color (white/yellow), fruit hairiness (peach/nectarine), fruit shape (round/flat), fruit texture (melting/non-melting), flesh adhesion (adhesion/freestone), pollen fertility (male sterility/fertility), and fruit skin color (red/non-red) (Additional file 3: Figure S2), in addition to the four (fruit weight, total phenolic content, fruit soluble solid content, and chilling requirement) that were described separately in the following sections. For quantitative traits, we identified a total of 20 novel quantitative trait loci (QTLs) using GWAS. For qualitative traits, the association signals were close to, or included, known loci or genes that have been identified previously using studies of recombinant populations [13,14,15,16,17] (Additional file 3: Figure S2, Additional file 2: Table S5). Specifically, these associations included a SNP (Pp01, 27,005,584 bp) located 392 kb from the PpCCD4 gene for flesh color , a SNP (Pp06, 26,288,291 bp) within the S region for fruit shape , a SNP (Pp05, 16,633,286 bp) located 671 kb from PpMYB25 for fruit hairiness , a SNP (Pp04, 19,909,362) located 860 kb from PpendoPG for fruit texture , a SNP (Pp06, 2,014,933) within the Ps region for male sterility , a SNP (Pp06, 19,070,801) located 11 kb from PpendoPG for flesh adhesion , and a SNP (Pp03, 18,103,021) within the skc region for fruit skin color  (Additional file 1: Table S5). We also carried out GWAS of these seven traits using genome-wide SVs and found that these results were highly consistent with those using the SNPs. Additionally, some candidate causative SVs were identified (Additional file 2: Table S5, Additional file 3: Figure S2), such as a 70.5-kb deletion associated with a non-melting phenotype in a gene cluster encoding endopolygalacturonase on chromosome 4 (Additional file 3: Figure S3), which was consistent with a previous study . These associations will likely be useful for marker-based early selection and accelerated breeding for these seven commercially important fruit-related traits.
Peach population structure and domestication bottleneck
To understand the genetic relationships among accessions, we constructed a neighbor-joining phylogenetic tree and performed a principal component analysis (PCA) using the 4,980,259 SNPs. The results of neighbor-joining tree and PCA largely supported the classification of the peach accessions into three major groups (Fig. 1b–e; Additional file 3: Figure S4). Group I included all wild relatives (wild group), while groups II and III were biased towards landraces (landrace group) and improved cultivars (improved group), respectively (Fig. 1a, b; Additional file 3: Figure S4). The domesticated accessions, including landraces and improved cultivars, formed a monophyletic lineage, indicating that all currently grown cultivated peach accessions originated from a single domestication event (Fig. 1b). Group II was further classified into three subclades, which showed strong geographic distribution patterns (Fig. 1a, b; Additional file 1: Table S1). Group II-1 contained ornamental accessions and landraces from northeastern and southern China. Group II-2 mainly consisted of landraces from the middle and lower reaches of the Yangtze river (Yangtze regions) and Yungui Plateau. Group II-3 mainly included landraces from northwestern and northern China. Similarly, accessions from group III were further clustered into two subclades, designated as the eastern improved (group III-1) and the western improved subgroups (group III-2), with group III-1 biased towards improved cultivars from Asia and group III-2 biased towards improved cultivars from North America and Europe (Fig. 1a, b; Additional file 1: Table S1). We interpret this geographic clustering pattern to indicate the introduction and development history of peach domestication, spread, and subsequent improvement. We also found that some cultivated peach accessions harbored admixed ancestry, suggesting that they might have experienced introgression or gene flow during breeding (Additional file 3: Figure S5).
The overall nucleotide diversity measured by the π value of the 480 accessions was 1.1 × 10− 3. The wild progenitors of cultivated peach showed a much higher nucleotide diversity (3.5 × 10− 3) than landraces (1.2 × 10− 3), suggesting a narrow domestication bottleneck (πwild/πlandrace = 2.92) (Fig. 1f), which was different from that reported in a previous study . Nearly two thirds of the genetic diversity have been lost during peach domestication, indicating a major effect of artificial selection on the peach genomes. This pattern of domestication is different from other perennial fruit crops, such as grape  (Vitis vinifera) and apple , which have been reported to lack narrow domestication bottlenecks (Additional file 2: Table S6). We further verified the domestication bottleneck (P < 0.001) using the BOTTLENECK program which detects reductions in recent effective population size  (Additional file 2: Table S7). Moreover, the high genetic differentiation (FST) between wild and domesticated peaches (~ 0.76) (Fig. 1f), positive mean Tajima’s D values, and slow linkage disequilibrium (LD) decay (half LD decay distance of ~ 35 kb) in domesticated peach are consistent with a narrow domestication bottleneck (Fig. 1g). In contrast, most of the genetic diversity was retained during peach improvement from landraces (1.2 × 10− 3) to improved cultivars (1.0 × 10− 3), suggesting a weaker improvement bottleneck (πlandrace/πimprovement = 1.20) in peach (Fig. 1f).
To better understand the basis of the narrow bottleneck during domestication, we examined factors that might affect bottleneck intensity, such as effective population size of founders, length of juvenile phase, mating system, or single versus multiple domestication origins. The effective population size of the founders estimated by the δaδi program  was ~ 200 for peach at domestication, which is markedly lower than the corresponding estimates in maize  (~ 150,000) and rice  (~ 1300). Peach typically is self-compatible with a relative short juvenile phase of ~ 3 years, which is shorter than apple (> 5 years) and pear (> 5 years), and equal to that of grape. Multiple origins of cultivated crops may lead to a mild bottleneck, such as has been reported for grape , which contrasts with our previous finding that peach had a single origin and a linear evolutionary route . In addition, the use of grafting promoted the formation of a marked bottleneck as it resulted in the propagation of a limited number of highly interesting genotypes. We conclude that the combined effects of these factors resulted in the narrow domestication bottleneck in peach.
We found that eastern (π = 1.1 × 10− 3) and western improved cultivars (π = 0.9 × 10− 3) harbored nearly the same low level of genetic diversity, with little genetic differentiation (FST = 0.08) (Fig. 1f), suggesting a narrow genetic background of current cultivated peach varieties. Therefore, trait improvement in peach may be constrained by the limited genetic diversity in commonly used breeding materials, suggesting the potential benefit of introgressing material from wild relatives in future breeding programs.
Genome-wide selection signatures during domestication and improvement
To identify the potential selection signatures of peach domestication and improvement, we screened for genomic regions with a sharp reduction of nucleotide diversity (ROD) between wild and landrace groups (πwild/πlandrace; domestication) as well as landrace and improved groups (πlandrace/πimprovement; improvement). Consistent with the narrow domestication bottleneck and weak improvement bottleneck, we observed a significantly higher ROD value during domestication (average πwild/πlandrace = 3.33) than improvement (average πwild/πlandrace = 1.76) (P < 0.01) (Fig. 2a, f). The top 5% of the genomic windows or regions with unusually high ROD values in each comparison were defined as selective sweeps. Finally, a total of 142 domestication sweeps and 104 improvement sweeps, covering 11.1% (25.2 Mb) and 9.4% (21.3 Mb) of the assembled genome, harboring 3683 and 3039 genes, respectively, were identified (Fig. 2a, f; Additional file 1: Tables S8-S11). Taken together, we concluded that approximately 18.6% (42.4 Mb) of the peach genome has been shaped by these two selection steps involving domestication from wild relatives. Notably, we found that 15.9% (4.1 Mb; 1.8% of the genome) of domestication sweeps, harboring 768 genes, overlapped with improvement sweeps, indicating that a subset of domestication loci may have undergone a second round of artificial selection for continued improvement of important agronomic traits (Additional file 2: Table S12).
Fruit size was a major target of selection during domestication
Increase in fruit or seed size is a key target in the breeding history of most crop species. The peach fruit size has increased more than 10 times and undergone two rounds of selection, from 13.6 ± 4.2 g in the wild group to 132.1 ± 28.75 g in landraces and 140 ± 31.52 g in improved cultivars (Fig. 3i, Additional file 3: Figure S6). We hypothesized that fruit size may have been the predominant target of selection during domestication, with a mild selection during the subsequent phase of peach improvement. Previous studies have identified multiple quantitative trait loci (QTLs) associated with fruit size (fw) in peach [25,26,27,28], but the genetic basis of the increase of fruit size during domestication and improvement remains largely unknown.
We compared the selective sweeps with the locations of known QTLs. A total of nine known fw QTLs on chromosomes 1, 4, 6, and 8 fell within domestication sweeps, including a stable QTL on chromosome 4, identified in a multi-year mapping experiment  (Fig. 2a). Selections for these nine QTLs likely contributed to the enlargement of peach fruit size during the evolutionary transition from wild to landrace accessions (Fig. 3i). For instance, the stable fw QTL on chromosome 4 (10.3–15.2 Mb) showed sharp ROD and high composite likelihood ratio (CLR) values corresponding to a strong domestication sweep (Fig. 3a, c, d). Moreover, a SNP associated with fruit weight (P = 7.5 × 10− 7) within this QTL was also identified in the GWAS analysis (see below; Fig. 3a). Within this QTL, we found one gene (Prupe.4G197000) encoding a putative auxin-responsive GH3-like protein (Fig. 3b), which showed high expression at the early stage of fruit development (Fig. 3d). Notably, a homolog of this gene has been proposed as a candidate for the increase in fruit mass during tomato domestication . Compared with domestication sweeps, only four fw QTLs, located on chromosomes 4, 5, and 6, overlapped with improvement sweeps (Fig. 2f), which may explain the relatively small increase in fruit size during improvement (Fig. 3i, Additional file 3: Figure S6). Collectively, the difference in the number of fw QTLs associated with the domestication and improvement sweeps (Fig. 2a, f) suggests a stronger selection for fruit size during domestication than improvement.
To further track the selection for fruit size during the two selection steps, we performed GWAS analysis of fruit weight (Additional file 3: Figure S7). A total of eight association peaks, on chromosomes 1, 2, 3, 4, 6, 7, and 8 (P < 3.2 × 10− 8), were identified, of which three overlapped with known QTLs and five were new (Additional file 2: Table S13, Additional file 3: Figure S7). Among the association peaks, three (chromosomes 2, 6, and 8) overlapped with domestication sweeps, while no peak was found in the improvement sweeps (Fig. 2a, b, d, e). Specifically, we detected 52 SNPs significantly associated with fruit weight, of which 26 underwent selection during domestication, while only two showed selection signals during improvement (Additional file 2: Table S13). This finding is congruent with stronger selection for fruit size during domestication than improvement. Notably, we found a strong association on chromosome 2 overlapping with a domestication sweep, supported by high ROD and CLR values (Fig. 3e, g, h). Two candidate genes within this region, Prupe.2G275500 and Prupe.2G276700, are related to cell division, with high expression at the cell division stage of fruit development (20 days after full bloom) (Fig. 3f, h). The latter encodes a PLAC 8 family protein and is homologous to fw2.2 from tomato, which controls fruit size through regulating the cell number  (Fig. 3f). Selection on this QTL may contribute to the increase in fruit size during domestication and have resulted in extremely low genetic diversity of a nearly 10-Mb region on the bottom of chromosome 2 (Fig. 3g, h) (π = 0.573 × 10− 3 in the region versus 1.2 × 10− 3 in the whole genome), reflecting hitchhiking effects (LD decay was ~ 90 kb in this region versus ~ 35 kb in the whole genome) (Additional file 3: Figure S8).
Loss of bitterness and improvement in fruit taste during domestication and improvement
Fruit taste is another important target for artificial selection in fruit crops. Wild peaches taste bitter, landraces less so, while improved cultivars have no bitterness and a high sugar content, suggesting that fruit taste was successively selected for during both domestication and improvement. Fruit taste is mainly related to sugar content, acid content, and the balance between them. Significant changes in the content of these carbohydrates during peach evolution have been observed  (Additional file 3: Figures S6 and S9) and, accordingly, a series of fruit sugar- and acid-related QTLs [25,26,27,28, 31, 32] were found to be co-located with domestication and/or improvement sweeps (Fig. 2a, f). Moreover, some of these QTLs were situated in genomic regions that have undergone two rounds of selection (Additional file 2: Table S12), further suggesting successive selection for fruit taste during domestication and subsequent improvement.
A total of 27 fruit taste-related QTLs were found in the domestication sweeps (Fig. 2a). A previous study reported that peach fruit bitterness is conferred by levels of phenolic compounds , and we found that fruit produced by wild peach accessions had a significantly (P < 0.01) higher total phenolic content (681.8 ± 102.4 mg/kg) than fruit of the landraces (141.8 ± 118.1 mg/kg) and improved cultivars (96.0 ± 93.7 mg/kg). Previous studies identified several QTLs for total phenolic levels (phe) , of which two on chromosomes 2 and 4 showed strong domestication signals (Fig. 2a and Fig. 4a–d). We propose that selection for these two QTLs underlies the loss of flesh bitterness during peach domestication (Fig. 2a and Fig. 4a–d). We also identified a GWAS signal for total phenolic content within the phe QTL on chromosome 2 (Fig. 4a; Additional file 3: Figure S10). Moreover, the most significant signal (P = 4.31 × 10− 10) in this association peak showed considerable differentiation between wild and landrace accessions (FST = 0.98) and was almost fixed in landraces (98.5%), suggesting that the genotype conferring bitter fruit was swept during domestication, consistent with the phenotype.
We also identified 31 QTLs within the improvement sweeps that may have contributed to the second-round improvement of fruit taste (Fig. 2f). One of these, Prupe.4G037800, encoding a sugar transporter within a suc QTL , overlapped with the GWAS signal of fruit soluble solid content (SSC) on chromosome 4 (Fig. 2c, f). This gene was differentially expressed during fruit development with high expression at the mature stage (Additional file 3: Figure S11), suggesting its contributions to the increase of fruit taste during improvement. Notably, we found that the improvement sweeps were associated with a greater number of sugar-related QTLs (22 QTLs) than the domestication sweeps (18 QTLs) (Fig. 2a, f), suggesting more specific breeding for better taste and high sweetness in improvement than in domestication. In contrast, more fruit acid-related QTLs (8) were found in the domestication sweeps than in the improvement sweeps (4) (Fig. 2a, f), indicating a selection for reduced fruit acidity during domestication, which is typical of domesticated peach (Additional file 3: Figure S9).
To further characterize the selection for fruit taste, we performed GWAS analysis of SSC (Additional file 3: Figure S7). A total of 35 association SNPs and 10 association peaks were identified (P < 2.6 × 10− 8), of which three overlapped with known QTLs and seven were novel (Additional file 2: Table S13, Additional file 3: Figure S7). Among the association peaks, one on chromosome 4 fell within the domestication sweeps and four (chromosomes 1, 4, 5, and 6) were located within the improvement sweeps (Fig. 2a, c, f–i). Moreover, many more SSC associations were identified in the improvement sweeps than in the domestication sweeps, consistent with stronger selection for fruit taste during improvement than domestication.
The GWAS signal on chromosome 4 co-located with a previously reported SSC QTL  (Fig. 4e) and the genomic region harboring this QTL showed a strong domestication signal, supported by a sharp ROD value, a high CLR value, and a high FST between wild and landrace accessions (Fig. 4f–h). Through combining the transcriptome and biochemical analyses, we identified a candidate gene, Prupe.4G150100, for SSC within this QTL (Fig. 4g, Additional file 3: Figure S12). This gene, encoding a putative nine-cis-epoxycarotenoid dioxygenase 3 protein (NCED3) and involved in the abscisic acid (ABA) signaling pathway that play essential roles in fruit ripening , showed a large increase in expression at the ripening stage (~ 200 times), and its expression profiles during fruit development were strongly correlated with both SSC and ABA contents (Fig. 4g, Additional file 3: Figure S12). We found that the genomic region of the NCED3 gene showed very low nucleotide diversity in the landrace group (π = 0.688 × 10− 3 in this region versus 1.2 × 10− 3 in the whole genome), and that the selection for high SSC was likely resulted from the hitchhiking of an approximately 5-Mb genomic region (LD decay was ~ 150 kb in this region versus ~ 35 kb in the whole genome) (Fig. 4f) (Additional file 3: Figure S12). Another strong SSC association was identified on the top of chromosome 5, located within the overlap between SSC and suc QTLs  (Fig. 4i). The high ROD and CLR values from the comparison of landraces and improved cultivars suggested a strong improvement signal in this QTL (Fig. 4j, k). Moreover, an extremely high FST value of the 9-Mb region at the top of chromosome 5 further supported a strong selection for this QTL during improvement (Fig. 4l).
We found that QTL hotspots  related to fruit taste differed in the domestication and improvement phases and were located on chromosome 4 (7.5–13.0 Mb) (Fig. 2j) and 5 (1.5–4.5 Mb) (Fig. 2k), respectively. We also observed that 11 scattered QTLs were shared by domestication and improvement sweeps (Fig. 2a, f). In addition, two QTLs [17, 32] and one GWAS signal related to fruit skin color (skc) fell within both the domestication and improvement sweeps, suggesting a successive selection for fruit appearance during the two selection steps (Fig. 2a, f and Fig. 3i).
Selected regions relevant to low chilling requirement breeding
In addition to fruit size and taste, selection for low chilling requirement (CR) is a unique breeding objective for peach, and contributes to extension of the harvest season and adaptation to low-chill zones. Moreover, global warming has already led to a significant decrease in winter chill, encouraging breeders to select low CR cultivars that are likely to perform well in future climates . The continuing efforts by breeders to select for lower CR have changed the pattern of CR distribution, as evidenced by the fact that 90% of peach cultivars required more than 800 chilling hours to break dormancy 60 years ago, whereas now only 20% of cultivars require this much chilling . Although a few previous studies have identified multiple QTLs associated with peach CR [38, 39], the genome-wide genetic basis underlying low CR breeding has not yet been determined. To address this, we performed GWAS for CR (Additional file 3: Figure S13) and identified seven association peaks, located on chromosomes 1, 3, 7, and 8 (−log(P) > 7.59) (Fig. 5a, Additional file 2: Table S14). Of these, three were within or close to previously reported CR QTLs [38, 39], while the other four were new (Fig. 5a, b, f). Notably, the strong association peak (P = 3.14 × 10− 14) on chromosome 1 overlapped with a known major CR QTL (qCR1a) that explained up to 44.8% of the phenotypic variance  (Fig. 5a, b). Moreover, this strong association co-localized with the EVG locus underlying the evergrowing peach dormancy mutation (evg) [40, 41] (Fig. 5b). Six tandemly arranged Dormancy Associated MADS-box transcription factors (DAM1-6; Prupe.1G531100, Prupe.1G531300, Prupe.1G531400, Prupe.1G531500, Prupe.1G531600, Prupe.1G531700) have been identified in qCR1a, with a 41.7-kb deletion spanning all or part of DAM1-4, conferring a non-dormancy (or CR = 0) phenotype in evg peach .
Next, we identified a total of 112 targets of low CR breeding selection by screening for genomic regions with high ROD and CLR values, using a 5% cutoff (Additional file 1: Table S15). Two association peaks from the CR GWAS that overlapped with previously reported QTLs [40, 41] on chromosomes 1 and 8 fell within the selection targets (Fig. 5c, d, g, h), suggesting that these two QTLs may underlie the signatures of breeding for low CR cultivars. High FST values further supported the selection for these QTLs (Fig. 5e, i). Furthermore, we also obtained strong association signals within these two QTLs on chromosomes 1 and 8 by performing another GWAS through classification of accessions into low CR and non-low CR (Additional file 3: Figure S14, Additional file 1: Table S16). Selection for these two QTLs could have contributed to the extension of a suitable planting area for peach to low-altitude zones. A total of seven SHORT VEGETATIVE PHASE (SVP) subfamily genes related to flowering time was identified within these two QTLs, including the six DAM genes in the EVG locus and one (Prupe.8G069300) in qCR8 on chromosome 8. Interestingly, the CR QTL on chromosome 1 also harbored the loci associated with bloom date and heat requirement . Moreover, overexpression of peach SVP gene (Prupe.8G069300) within the interval of the GWAS signal on chromosome 8 in a model plant, Arabidopsis thaliana, led to the strong vegetative growth and delayed flowering time (Additional file 3: Figure S15), providing strong evidence for candidate genes for CR in peach. These results will be valuable for the development of markers for low CR breeding and the identification of genes responsible for the CR. A PCR-based marker with an accuracy of 92.2% has been developed for low CR peach identification (Additional file 3: Figure S16).
Divergent selection and local breeding
Fruit texture is an important agronomic trait for peach that is directly related to mouth-feel, transportation tolerance, and processing quality. We found that fruit texture showed geographic distribution patterns, with all wild accessions showing a melting fruit texture, while landraces and improved cultivars have both melting and non-melting phenotypes. Most of the landraces from northeastern and southern China, as well as the middle and lower reaches of the Yangtse river, showed a melting fruit texture, while most of the landraces harboring a non-melting phenotype were from northwestern China, the Yungui Plateau, and the northern Plain of China. Western and eastern improved cultivars showed both melting and non-melting fruit textures. We therefore concluded that genes underlying fruit texture underwent divergent selection during geographic differentiation or local breeding. Using GWAS, we identified a 70.5-kb SV (Pp04: 19,026,186) associated with fruit texture within the previously mapped FT locus [14, 16] (Fig. 6a, Additional file 3: Figure S2 and S3). Although a non-melting fruit texture may have arisen during domestication or post-domestication, we only identified weak domestication and improvement signals at the association region (Fig. 6b). However, using FST analysis of pairwise population differentiation, we detected a strong signal at the FT locus in accessions from northwestern China versus the northeastern, southern, and Yangtze regions (Fig. 6c). Further investigation of allele frequencies suggested that the mutant allele was mainly distributed in northwestern China, the Yungui Plateau, the North Plain, and in western improved cultivars, which is consistent with the fruit texture distribution (Fig. 6d). These data, combined with the phylogenetic analysis, suggest that the non-melting alleles in western improved cultivars may be derived from northwestern China through the ancient Silk road and subsequently spread to other western countries (Fig. 1b and Fig. 6d).
In addition to breeding for fresh consumption, breeding for processing into peach can or juice is an important objective during improvement, and the non-melting fruit texture is an essential selection target for breeding of processing peach . A total of 15 processing peach cultivars were sequenced in this study (Additional file 1: Table S1). Through scanning for genomic regions with high ROD in comparison of landraces and these 15 processing peach cultivars, we found a nearly 4-Mb selective sweep around the fruit texture GWAS signal on chromosome 4 (Fig. 6e). Intriguingly, this selection signal also harbored QTLs related to SSC, fruit firmness, stone adhesion, total sugar, maturity date , and fruit weight (Fig. 6e). Some of these traits (non-melting, high fruit firmness, and clingstone) are fundamental selection goals for processing peaches , and this 4-Mb segment may therefore represent a major genomic signature for selection of processing peaches.
Peach fruits with both high sugar and titratable acidity (TA) are widely consumed in western countries, while those with only high sugar are popular in eastern countries, suggesting that breeding targets between eastern and western countries may differ. To test this hypothesis, we determined improvement sweeps for eastern and western improved groups, respectively, and identified a total of 102 and 87 selective sweeps (Additional file 1: Tables S17 and S18, Additional file 3: Figure S17). Only approximately half of the improvement sweeps in the eastern group fell within the top 5% of the ROD regions in the western group (Fig. 6f), suggesting a limited number of shared breeding targets between eastern and western accessions, which may underlie the difference in fruit taste between different areas (Additional file 1: Tables S17 and S18, Additional file 2: Table S19-S21). One example is the SSC:TA ratio, which is essential for fruit taste, for which we found a major 820-kb QTL  overlapping with a western-specific sweep on chromosome 5 (Fig. 6g), which could have contributed to the selection for high SSC and TA in western cultivars. The local selection for this QTL was also supported by a high FST value, when western and eastern accessions were compared (Fig. 6h). We conclude that differential selection for these regions conferred the richer flavor in western cultivars than eastern cultivars. Flavor selection involves multiple factors and it is notable that the increase of sugar content is not always accompanied by a parallel increase in consumer preference and flavor [10, 43]. An alternative explanation for different selection targets is that the western and eastern peach accessions have undergone different selection pressures resulting from the distinct environment.
In this study, we generated a large genome-variation dataset for peach by sequencing 480 widely collected wild and cultivated accessions from around the world. Using this resource, we tracked the genetic changes and selection targets during domestication and breeding, providing insights into the history of peach evolution. We found that fruit size was predominantly selected during domestication and that selection for large fruits has led to the loss of genetic diversity in fw QTLs. Fruit taste-related QTLs were successively selected for by two successive selection steps, and many more fruit taste-related QTLs were found to be associated with improvement than with domestication, suggesting a sharp decline of genetic diversity at loci related to fruit taste. Although peach fruit taste has improved in modern cultivars compared with their wild progenitors, successive selection for sugar and acid may lead to a homogenization and deterioration in flavor quality, similar to the case of commercial tomato varieties . In this regard, the use of landrace accessions as parents, especially heirloom varieties, has considerable potential in introgressing valuable genetic diversity.
Our variation dataset provides a valuable resource for future peach improvement using novel breeding technologies and strategies, such as genomic selection, molecular design breeding, and introgression of novel alleles from landraces and wild relatives . Associations for 11 agronomic traits also provide valuable information to accelerate genetic gains for key traits to improve yield and edible quality in peach.
Plant material and sequencing
We sampled a total of 480 peach accessions from the National Peach Germplasm Repository of China (NPGRC) (Zhengzhou) (N 34.71°, E 113.70°, A.S.L. 74 m), including 52 wild relatives (45 of Prunus mira Koehne, four of P. davidiana (Carr.) Franch., two of P. kansuensis Rehd., a single sample of P. potaninii Batal), 213 landrace accessions (P. persica L.), and 215 improvement accessions (P. persica L.) (Additional file 1: Table S1). Genomic DNA was extracted from young leaves using the cetyltrimethylammonium bromide (CTAB) method  and at least 4 μg of genomic DNA from each accession was used to construct a sequencing library, following the manufacturer’s instructions (Illumina Inc.). Paired-end sequencing libraries with an insert size of approximately 300 bp or 500 bp were sequenced using an Illumina GAII or Hiseq 2500 sequencer, with read lengths of 49 bp, 90 bp, or 125 bp. In total, more than 1 Gb of sequence data was generated for each accession (Additional file 1: Table S1).
Alignment and variation calling
Mapping of short reads
Paired-end sequencing reads from each accession were mapped to the peach “Lovell” genome  (release version 2.0_a2.1) using BWA  (version 0.7.12), with the following parameters: bwa mem -t 4 -M -R. Mapped reads in SAM format were converted into BAM format, sorted according to mapping coordinates, and processed for PCR duplicate removal using the Picard package (http://broadinstitute.github.io/picard/, version 1.136) with default parameters. The coverage and depth of sequence alignments were computed using the DepthOfCoverage program in the Genome Analysis Toolkit  (GATK, version 3.4-46) and the genomencov program in BEDtools  (version 2.24.0), respectively. The coverage and depth of each accession are provided in Additional file 1: Table S1.
SNP calling followed GATK Best-Practices (https://software.broadinstitute.org/gatk/best-practices/bp_3step.php?case=GermShortWGS). To obtain accurate SNPs, we filtered low-quality alignments with a mapping quality < 20 using SAMtools  (V1.3.1). SNP detection was performed using the GATK HaplotypeCaller , which calls SNPs via local de novo assembly of haplotypes in an active region. The detailed processes were as follows: (1) after filtering out the low-quality alignments, the reads around the INDELs were realigned in two steps, including identifying regions where realignment was needed using the GATK RealignerTargetCreator package and realigning the regions found in the first step using the GATK IndelRealigner package. Next, a realigned BAM file for each accession, which was used for SNP detection, was generated using the GATK PrintReads packages. (2) SNPs were detected at a population level using the realigned BAM file with GATK HaplotypeCaller. To reduce false positives, we set a high SNP confidence score with the following parameters: -stand_call_conf 30 -stand_emit_conf 40. (3) To ensure the quality of variant calling, we supplied a hard filter for the raw SNPs with SNP quality > 40 and more than two supporting reads using GTAK VariantFiltration with the following parameters: QUAL < 40, QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < − 12.5, ReadPosRankSum < − 8.0. To further reduce false positives, the SNP number per 10 bp was limited to three using the following settings: --clusterWindowSize 10, --clusterSize 3.
INDEL calling was performed using the same pipeline as SNP calling since GATK is capable of calling SNPs and INDELs simultaneously. To reduce false positives, we also supplied a harder filter for raw INDELs using GTAK VariantFiltration with the following parameters: QD < 2.0, FS > 200.0, ReadPosRankSum < − 20.0. Insertions and deletions ≤ 5 bp were defined as small INDELs.
To reduce the false positives, SV calling was performed using three programs, SpeedSeq (version 0.1.2) , DELLY (version 0.7.8) , and manta (version 1.4.0) . For SpeedSeq, paired-end reads were mapped to the reference genome using the aln module in SpeedSeq with the following parameters: speedseq align -R -t 4. Three BAM files were generated, including a full, duplicate-marked, sorted BAM, a BAM file containing split reads, and a BAM file containing discordant read-pairs. SVs were identified using the sv module in SpeedSeq using the following settings: sv -o -x -t 25 -R -B -D -S -g -P. Finally, the SV file in VCF format was generated. For DELLY, mapped pair-end reads in BAM format after sorting and marking PCR duplicates were used as input. SVs were identified by DELLY with default parameters. SV files in VCF format for all of 480 samples were merged into a population-level VCF file using bcftools. For manta, the same BAM files with DELLY were used to detect SVs using configManta.py with default parameters. SV files for 480 accessions were then merged using SURIVER and genotyping for the accessions were called using SVtyper  with default parameters. The shared and private SVs for three callers were detected using the merge module in SURIVAR  (Additional file 3: Figure S18). Finally, SVs identified by at least two callers were kept and used for downstream analyses.
SNP annotation was performed based on genomic locations and predicted coding effects according to the peach genome annotation  (release annotation version a2.1) using snpEff (version 4.1g) . The final SNPs were categorized in exonic regions, intronic regions, splicing sites, 5′ and 3′ untranslated regions (UTRs), upstream and downstream regions, and intergenic regions. SNPs in the coding sequence were further grouped into synonymous SNPs (which did not cause amino acid changes) and nonsynonymous SNPs (which caused amino acid changes) (Additional file 2: Tables S2 and S3).
SNP genotype imputation
We noted that 13 accessions exhibited a genotyping miss rate > 30% (Additional file 3: Figure S19). In these cases, we inferred missing genotype data using the hidden Markov model (HMM) in the Beagle software  (version 4.0) with the following default parameter settings: unphased and non-reference; iterations = 10 window = 50,000 nthreads = 10.
Population genetics analysis
A neighbor-joining tree was constructed with all 4,980,259 SNPs using the PHYLIP software (version 3.696) with 100 bootstrap replicates , and visualized using the FigTree software (http://tree.bio.ed.ac.uk/software/figtree/, version 1.4.2) and evolview (v2) .
We performed the PCA using the GCTA software (version 1.26.0) with default settings . The first three eigenvectors were retained to create a plot in two or three dimensions.
The population structure was also investigated with the STRUCTURE software  (version 1.3.0), which is based on a likelihood model. We ran 10,000 iterations, and the number of clusters (K) was set from 2 to 7, representing the assumed groups of the simulated ancestry population. The best K was inferred based on the delta K method using the Structure Harvester program .
LD was calculated on the basis of SNPs with MAF greater than 0.05 using PLINK software (Version 1.07)  with following: --file --r2 --ld-window 99999 --ld-window-kb 200 --out. Then, values for the r2 statistics were obtained. LD decay was calculated based on r2 between two SNPs and the distance between the two SNPs.
Population size estimation
The effective population size at domestication was estimated using the δaδi program (Version 1.7.0) . We fitted a two-population model with landrace and improved groups mixed together and compared to the wild group (Additional file 3: Figure S20 and Additional file 2: Table S22). The simulation was carried out 10 times, and each time we randomly selected 500,000 SNPs and estimated 95% confidence intervals on the basis of the best fitting parameters. The parameters inferred by δaδi were scaled by 2Ne, with Ne being the ancestral population size. The ancestral population size was estimated using the formula 4Ne × μ × L = θ, where μ is the mutation rate, L is the generation time, and θ is the genetic diversity. We used θwild to estimate θ (∼ 3.5 × 10− 3). A mutation rate of 7.77 × 10− 9 (the mutation rate per generation)  and a generation time of 3 were used for μ and L. Therefore, 2Ne was estimated to be 7.50 × 104. All the parameters were then scaled by 2Ne to estimate time in years and the population size in number of individuals.
Domestication bottleneck was further verified using the BOTTLENECK program (version 1.2.02) which tests historical bottleneck by detecting the reductions of recent effective population size based on the comparison of heterozygosity . A total of 1000 randomly selected SNPs were used in BOTTLENECK analysis. The tests were performed under the stepwise-mutation model (SMM), infinite allele model (IAM), and the two-phase model (TPM) allowing for 30% multistep changes. The iteration was set to 1000. We used the Wilcoxon signed rank tests to determine whether a population had a significant number of loci with excess genetic diversity.
Genetic diversity and population differentiation
The nucleotide diversity (π) was calculated for each group with VCFtools  (version 0.1.12) using a 100-kb window and a step size of 10 kb. Genetic differentiation (FST) was determined among different groups with VCFtools using a 100-kb window with a step size of 10 kb. Tajima’s D value was calculated with VariScan  (version 2.0.3) using a 100-kb window and a step size of 10 kb.
Detection of selective sweeps
Genomic regions under selection often showed a decrease in genetic diversity. We identified genomic regions selected by domestication and improvement by comparing the ROD using a 100-kb window with a step size of 10 kb. For domestication, πwild/πlandrace was calculated, and the top 5% of windows or regions with highest ROD values were defined as domestication sweeps (πwild/πlandrace > 8.21). For improvement, πlandrace/πimproved cultivars was calculated, and the top 5% of windows or regions with highest ROD values were defined as improvement sweeps (πlandrace/πimproved cultivars > 4.28).
To confirm the selection sweeps identified by ROD, we also detected the selection signals using a likelihood-based method in the SweeD program (Version 3.1) . The composite likelihood ratio (CLR) was calculated in domestication and improvement candidate regions using a grid of 2000. We found that more than 83% of the selective sweeps identified by ROD also showed selection signals by the CLR test (Additional file 3: Figure S21), indicating that most of the selection regions can be identified by both methods, which are thus reliable.
Seven qualitative traits, including flesh color (white/yellow), fruit hairiness (peach/nectarine), fruit shape (round/flat), fruit texture (melting/non-melting), flesh adhesion (adhesion/freestone), pollen fertility (male sterility/fertility), and fruit skin color (red/non-red skin) were measured based on previously published plant genetic resources evaluation criteria  in two successive years, from 2011 and 2012. The SSC was determined with a hand refractometer in 2007, 2010, and 2014 following the same protocol described in Frett et al. . Ten peach fruits were sampled and homogenized to form a juice for each accession to determine the SSC. The fruit weight was determined as the average value of 10 fruits for each accession in 2007 and in 2010. The phenotypic value used for GWAS of these two quantitative traits was the mean of different measurements from different years. The frequency distribution of these two quantitative traits (SSC and fruit weight) is shown in Additional file 3: Figure S7. Total phenolics were extracted from 5 g fresh flesh combining five peach fruits using an extraction solution of 0.5 mol/L HCL in 800 mL/L methanol. The total phenolics were further evaluated with colorimetric methods and measured using a spectrophotometer (TU-1901, PERSEE, Beijing, China) as previously described . Total phenolics was evaluated in 2015 and 2016, and the mean values used in GWAS.
For CR, two different kinds of phenotyping analyses were performed in 2011 and 2012. First, CR was considered a quantitative trait and determined following a protocol as previously described . Average temperatures in 30-min intervals were continuously recorded using a temperature automatic recorder (LOGGER 95-4, http://www.hzjly.com) placed in the canopies of the experimental trees, starting in the middle of October. A 0–7.2 °C model was chosen to evaluate CR that was demonstrated to be suitable at Zhengzhou [38, 71]. The number of hours in 0–7.2 °C (chilling hours, CHs) was counted, starting when the daily average air temperature dropped to below 7.2 °C. Starting at 50 CHs, the branches of each accession were cut every 50 CHs until 1300 CHs. For each accession, two clones were sampled, and three branches longer than 40 cm with floral buds were taken from each clone. Branch cuttings were placed in water in a greenhouse at 25 °C and a 16 h/8 h photoperiod to force floral bud break. The frequency of floral bud break was evaluated after 2 weeks. The CR of an accession was defined as being sufficient at a specific sampling time if 50% of floral buds on the branch cuttings opened (Additional file 3: Figure S13). Secondly, CR was defined as a qualitative trait and evaluated using a similar method. Branches of each accession were cut only at 400 CHs. The frequency of floral bud break was evaluated after 2 weeks at forcing temperature. An accession was considered to have “low CR” if 50% of floral buds on the branch cuttings opened; otherwise, the accession was defined as a “non-low CR” accession. The results of the second measurement are shown in Additional file 1: Table S16.
Genome-wide association studies
We used 374, 207, 415, 343, 343, 343, 165, 271, 361, 355, and 362 peach accessions to perform GWAS for flesh color, fruit hairiness, fruit shape, fruit texture, flesh adhesion, pollen fertility, total levels of phenolic compounds, skin color, fruit weight, fruit SSC, and CR, respectively. To improve the statistical power of the analysis, we filtered SNPs with minor allele frequency (MAF) < 0.05 or MAF < 0.01 after genotype imputation for each GWAS. A mixed linear model (MLM) program, Efficient Mixed-Model Association eXpedited (EMMAX)  (version beta), was used to carry out the GWAS analyses. To minimize false positives, population structure was taken into account using a kinship matrix that was estimated with the EMMAX emmax-kin program . We defined the whole-genome significance cutoff as the Bonferroni test threshold, which was set as 0.05/total SNPs (−log10(P) = 7.19, −log10(P) = 7.62, −log10(P) = 7.19, −log10(P) = 7.62, −log10(P) = 7.62, −log10(P) = 7.62, −log10(P) = 7.18, −log10(P) = 7.12, −log10(P) = 7.49, −log10(P) = 7.59, −log10(P) = 7.38 for flesh color, fruit hairiness, fruit shape, fruit texture, flesh adhesion, pollen fertility, total levels of phenolic compounds, skin color, fruit weight, fruit SSC, and CR, respectively). Genome-wide SVs were also used to perform GWAS with the same accessions. The GWAS using SVs followed the same protocols of GWAS using SNPs.
Fruit samples were taken at six stages of fruit development from peach cultivar “Taijin Shui Mi”, including 20, 40, 60, 80, 100, and 120 days after bloom (DAB). Three biological replicates were collected for each stage. Fruit flesh was immediately frozen in liquid nitrogen and then ground to fine powder. Total RNA was extracted using a quick extraction kit (Aidlab, Beijing, China). First- and second-strand complementary DNA (cDNA) was synthesized using a cDNA Synthesis System kit (TOYOBO, Osaka, Japan), following the manufacturer’s protocol. Then double-strand cDNAs were purified and adapters were ligated to the short fragments. The constructed RNA-Seq libraries were sequenced using the Illumina HiSeq 2000 platform in paired-end 150-bp mode. Low-quality reads were filtered from the raw reads using Trimmomatic . Data analysis followed the protocol proposed by Pertea et al. . Cleaned reads were mapped to the peach reference genome using HISAT2 (Version 2.0.5)  using default parameters. Transcript abundances were estimated, and transcript assembly was performed using the Stringtie program . DEG analysis was carried out using the R package ballgown .
Overexpression of SVP gene in Arabidopsis
The full-length open reading frames of peach SVP gene (Prupe.8G069300) were amplified through PCR using cDNAs synthesized from RNA that were isolated from young leaves of a low CR landrace, “Nanshan Tian Tao” (CR = 200 h). The PCR products were further cloned into the pBI121 vector driven by the cauliflower mosaic virus (CaMV) 35S promoter at Sangon Biotech (Sangon, Shanghai, China). The resulting constructs were further transformed into Arabidopsis thaliana Columbia type by Agrobacterium tumefaciens GV3101 and selected with kanamycin. The rosetteleaf numbers of 10 transgenic lines were used to evaluate the flowering time. The primers used for gene cloning are listed in Additional file 2: Table S23.
Expression analysis using qRT-PCR
Total RNA were extracted from the fruit flesh of 10 accessions at maturity, including “Kashi 2,” “Kashi 3,” “Kashi 4,” “Xinjiang Huang Rou,” “Zao Shu Huang Gan,” “Lin Huang 1,” “Reddomun,” “Gua Tao,” “Shaji 2,” “Hang Zhou Zao Shui Mi,” using an extraction kit (Aidlab, Beijing, China). First-strand cDNA synthesis was carried out using 1 mg RNA and the Transcriptor First Strand cDNA synthesis kit (Takara, Dalian, China), according to the manufacturer’s protocol. Sequence of candidate gene for SSC, Prupe.4G150100, was downloaded from the Genome Database for Rosaceae (GDR; www.rosaceae.org). Gene-specific primers were designed using the Primer-BLAST software (National Center for Biotechnology Information, Maryland, USA). We performed qRT-PCR using the LightCycler System (Roche LightCycler 480; Roche Diagnostics), following the manufacturer’s protocol. Relative expression levels were estimated by the 2−ΔΔCT method. The primers were listed in Additional file 2: Table S23.
Borlaug NE. Contributions of conventional plant breeding to food production. Science. 1983;219:689–93.
Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet. 2010;42:961–7.
Jiao Y, Zhao H, Ren L, Song W, Zeng B, Guo J, et al. Genome-wide genetic changes during modern breeding of maize. Nat Genet. 2012;44:812–5.
Qi J, Liu X, Shen D, Miao H, Xie B, Li X, et al. A genomic variation map provides insights into the genetic basis of cucumber domestication and diversity. Nat Genet. 2013;45:1510–5.
Lin T, Zhu G, Zhang J, Xu X, Yu Q, Zheng Z, et al. Genomic analyses provide insights into the history of tomato breeding. Nat Genet. 2014;46:1220–6.
Zhou Z, Jiang Y, Wang Z, Gou Z, Lyu J, Li W, et al. Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat Biotechnol. 2015;33:408–14.
Cao K, Zhou Z, Wang Q, Guo J, Zhao P, Zhu G, et al. Genome-wide association study of 12 agronomic traits in peach. Nat Commun. 2016;7:13246.
Duan N, Bai Y, Sun H, Wang N, Ma Y, Li M, et al. Genome re-sequencing reveals the history of apple and supports a two-stage model for fruit enlargement. Nat Commun. 2017;8:249.
Dirlewanger E, Graziano E, Joobeur T, Garriga-Calderé F, Cosson P, Howad W, et al. Comparative mapping and marker-assisted selection in Rosaceae fruit crops. Proc Natl Acad Sci U S A. 2004;101:9891.
Cirilli M, Bassi D, Ciacciulli A. Sugars in peach fruit: a breeding perspective. Hortic Res. 2016;3:15067.
Cao K, Zheng Z, Wang L, Liu X, Zhu G, Fang W, et al. Comparative population genomics reveals the domestication history of the peach, Prunus persica, and human influences on perennial fruit crops. Genome Biol. 2014;15:415.
Verde I, Abbott AG, Scalabrin S, Jung S, Shu S, Marroni F, et al. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet. 2013;45:487–94.
Falchi R, Vendramin E, Zanon L, Scalabrin S, Cipriani G, Verde I, et al. Three distinct mutational mechanisms acting on a single gene underpin the origin of yellow flesh in peach. Plant J. 2013;76:175–87.
Dirlewanger E, Cosson P, Boudehri K, Renaud C, Capdeville G, Tauzin Y, et al. Development of a second-generation genetic linkage map for peach [Prunus persica (L.) Batsch] and characterization of morphological traits affecting flower and fruit. Tree Genet Genomes. 2006;3:1–13.
Vendramin E, Pea G, Dondini L, Pacheco I, Dettori MT, Gazza L, et al. A unique mutation in a MYB gene cosegregates with the nectarine phenotype in peach. PLoS One. 2014;9:e90574.
Gu C, Wang L, Wang W, Zhou H, Ma B, Zheng H, et al. Copy number variation of a gene cluster encoding endopolygalacturonase mediates flesh texture and stone adhesion in peach. J Exp Bot. 2016;67:1.
Yamamoto T, Shimada T, Imai T, Yaegaki H, Haji T, Matsuta N, et al. Characterization of morphological traits based on a genetic linkage map in peach. Breeding Sci. 2001;51(4):271–8.
Velasco D, Hough J, Aradhya M, Ross-Ibarra J. Evolutionary genomics of peach and almond domestication. G3-Genes Genom Genet. 2016;6(12):3985–93.
Zhou Y, Massonnet M, Sanjak JS, Cantu D, Guat B. Evolutionary genomics of grape (Vitis viniferassp. Vinifera) domestication. Proc Natl Acad Sci U S A. 2017;114(44):11715–20.
Cornuet JM, Luikart G. Description and power analysis of two tests for detecting recent population bottlenecks from allele frequency data. Genetics. 1996;144:2001–14.
Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 2009;5:e1000695.
Ross-Ibarra J, Tenaillon M, Gaut BS. Historical divergence and gene flow in the genus Zea. Genetics. 2009;181:1399–413.
Caicedo AL, Williamson SH, Hernandez RD, Boyko A, Fledel-Alon A, York TL, et al. Genome-wide patterns of nucleotide polymorphism in domesticated rice. PLoS Genet. 2007;3:1745–56.
Arroyo-García R, Ruizgarcía L, Bolling L, Ocete R, López MA, Arnold C, et al. Multiple origins of cultivated grapevine (Vitis vinifera L. ssp. sativa) based on chloroplast DNA polymorphisms. Mol Ecol. 2006;15(12):3707–14.
Dirlewanger E, Moing A, Rothan C, Svanella L, Pronier V, Guye A, et al. Mapping QTLs controlling fruit quality in peach (Prunus persica (L.) Batsch). Theor Appl Genet. 1999;98:18–31.
Quilot B, Wu B, Kervella J, Génard M, Foulongne M, Moreau K. QTL analysis of quality traits in an advanced backcross between Prunus persica, cultivars and the wild relative species P. davidiana. Theor Appl Genet. 2004;109:884–97.
Martínez-García PJ, Parfitt DE, Ogundiwin EA, Fass J, Chan HM, Ahmad R, et al. High density SNP mapping and QTL analysis for fruit quality characteristics in peach (Prunus persica L.). Tree Genet & Genomes. 2013;9:19–36.
Zeballos JL, Abidi W, Giménez R, Monforte AJ, Moreno MA, Gogorcena Y. Mapping QTLs associated with fruit quality traits in peach [Prunus persica (L.) Batsch] using SNP maps. Tree Genet & Genomes. 2016;12:37.
Frary A, Nesbitt TC, Grandillo S, Knaap E, Cong B, Liu J, et al. fw2.2: a quantitative trait locus key to the evolution of tomato fruit size. Science. 2000;289:85–8.
Wang LR, Zhu GR, Fang WC. Peach genetics resources in China. Beijing: China Agriculture Press; 2012.
Etienne C, Rothan C, Moing A, Plomion C, Bodénès C, Svanella-Dumas L, et al. Candidate genes and QTLs for sugar and organic acid content in peach [Prunus persica (L.) Batsch]. Theor Appl Genet. 2002;105:145–59.
Eduardo I, Pacheco I, Chietera G, Bassi D, Pozzi C, Vecchietti A, et al. QTL analysis of fruit quality traits in two peach intraspecific populations and importance of maturity date pleiotropic effect. Tree Genet Genomes. 2011;7(2):323–35.
Predieri S, Ragazzini P, Rondelli R. Sensory evaluation and peach fruit quality. Acta Hort. 2006;713:429–34.
Jia HF, Chai YM, Li CL, Lu D, Luo JJ, Qin L, et al. Abscisic acid plays an important role in the regulation of strawberry fruit ripening. Plant Physiol. 2011;157(1):188.
Wang M, Tu L, Lin M, Lin Z, Wang P, Yang Q, Ye Z, et al. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat Genet. 2017;49(4):579.
Li Y, Wang L, Zhu G, Fang W, Cao K, Chen C, et al. Phenological response of peach to climate change exhibits a relatively dramatic trend in China, 1983-2012. Sci Hortic-Amsterdam. 2016;209:192–200.
Sansavini S, Gamberini A, Bassi D. Peach breeding, genetics and new cultivar trends. Acta Hort. 2006;713:23–48.
Fan S, Bielenberg DG, Zhebentyayeva TN, Reighard GL, Okie WR, Holland D, et al. Mapping quantitative trait loci associated with chilling requirement, heat requirement and bloom date in peach (Prunus persica). New Phytol. 2010;185:917–30.
Bielenberg DG, Rauh B, Fan S, Gasic K, Abbott AG, Reighard GL, et al. Genotyping by sequencing for SNP-based linkage map construction and QTL analysis of chilling requirement and bloom date in peach [Prunus persica (L.) Batsch]. PLoS One. 2015;10:e0139406.
Bielenberg DG, Wang Y, Li Z, Zhebentyayeva T, Fan S, Reighard GL, et al. Sequencing and annotation of the evergrowing locus in peach [Prunus persica (L.) Batsch] reveals a cluster of six MADS-box transcription factors as candidate genes for regulation of terminal bud formation. Tree Genet Genomes. 2008;4:495–507.
Bielenberg DG, Wang Y, Fan S, Reighard GL, Scorza R, Abbott AG. A deletion affecting several gene candidates is present in the evergrowing peach mutant. J Hered. 2004;95:436–44.
Gradziel TM, Mccaa JP. Processing peach cultivar development. In: Layne D, Bassi D, editors. The peach: botany, production and uses. Oxfordshire: CABI; 2008. p. 175–92.
Tieman D, Zhu G, Resende MF, Lin T, Nguyen C, Bies D, et al. A chemical genetic roadmap to improved tomato flavor. Science. 2017;355:391.
Tester M, Langridge P. Breeding technologies to increase crop production in a changing world. Science. 2010;327:818–22.
Murray MG, Thompson WF. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 1980;8:4321–5.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303.
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–8.
Chiang C, Layer RM, Faust GG, Lindberg MR, Rose DB, Garrison EP, et al. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Methods. 2015;12(10):966–8.
Tobias R, Thomas Z, Andreas S, Adrian MS, Vladimir B, Jan OK. Delly: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28:i333–9.
Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Källberg M, et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2016;32(8):1220–2.
Jeffares DC, Jolly C, Hoti M, Speed D, Shaw L, Rallis C, et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat Commun. 2017;8:14061.
Cingolani P, Platts A, Wang Le L, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly. 2012;6:80.
Browning BL, Browning SR. Genotype imputation with millions of reference samples. Am J Hum Genet. 2016;98:116–26.
Felsenstein J. PHYLIP-phylogeny inference package (version 3.2). Cladistics. 1989;5:164–6.
He Z, Zhang H, Gao S, Lercher MJ, Chen WH, Hu S. Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees. Nucleic Acids Res. 2016;44(W1):W236–41.
Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88:76–82.
Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2013;164:1567–87.
Earl DA, Vonholdt BM. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour. 2012;4(2):359–61.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
Xie Z, Wang L, Wang L, Wang Z, Lu Z, Tian D, et al. Mutation rate analysis via parent-progeny sequencing of the perennial peach I A low rate in woody perennials and a higher mutagenicity in hybrids. P Roy Soc B-Biol Sci. 2016;283:20161016.
Piry S, Luikart G, Cornuet JM. BOTTLENECK: a computer program for detecting recent reductions in the effective population size using allele frequency data. J Hered. 1999;90:502–3.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–8.
Vilella AJ, Blanco-Garcia A, Hutter S, Rozas J. VariScan: analysis of evolutionary patterns from large-scale DNA sequence polymorphism data. Bioinformatics. 2005;21:2791–3.
Pavlidis P, Živković D, Stamatakis A, Alachiotis N. SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol Biol & Evol. 2013;30(9):2224.
Wang L, Zhu G. Descripters and data standard for peach. Beijing: China Agriculture Press; 2005.
Frett TJ, Gasic K, Clark JR, Byrne D, Gradziel T, Crisosto C. Standardized phenotyping for fruit quality in peach [Prunus persica (L.) Batsch]. J Am Pomol Soc. 2012;66:214–9.
Abidi W, Cantín CM, Jiménez S, Giménez R, Moreno MA, Gogorcena Y. Effect of antioxidant compounds and total sugars and genetic background on the chilling injury susceptibility of a non-melting peach [Prunus persica (L.) Batsch] progeny. J Sci Food Agric. 2015;95:351–8.
Wang LR, Zhu GR, Fang WC, Zuo QY. Estimating models of the chilling requirement for peach. Acta Hortic Sin. 2003;30:379–83.
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42:348–54.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.
Pertea M, Kim D, Pertea G, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016;11(9):1650–67.
Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12:357–60.
Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–5.
Frazee AC, Pertea G, Jaffe AE, Langmead B, Salzberg SL, Leek JT. Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat Biotechnol. 2015;33:243–6.
Li Y, Cao K, Zhu G, Fang W, Chen C, Wang X, et al. Genomic analyses of an extensive collection of wild and cultivated accessions provide new insights into peach breeding history. Sequence Read Archive: PRJNA388029. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA388029. Accessed 2017.
Li Y, Cao K, Zhu G, Fang W, Chen C, Wang X, et al. Genomic analyses of an extensive collection of wild and cultivated accessions provide new insights into peach breeding history. Figshare. https://doi.org/10.6084/m9.figshare.5008988.v4. Accessed 2017.
Li Y, Cao K, Zhu G, Fang W, Chen C, Wang X, et al. Genomic analyses of an extensive collection of wild and cultivated accessions provide new insights into peach breeding history. Figshare. https://figshare.com/articles/Peach_SVs/7238186. Accessed 2018.
We thank Xiuli Zeng and Shanshan Zhang from the Tibet Academy of Agricultural and Animal Husbandry Sciences for helping to collect and investigate the P. mira accessions from the Tibet Plateau, Prof. Dacheng Tian and Prof. Sihai Yang from Nanjing University for suggestions on data analyses and paper writing, and Dr. Amandine Cornille from Université Paris-Sud for helpful suggestions in paper writing.
This work was supported by the grants from the National Natural Science Foundation of China (31572094), the Agricultural Science and Technology Innovation Program (CAAS-ASTIP-2018-ZFRI-01), Crop Germplasm Resources Conservation Project (2016NWB041), and the US National Science Foundation (IOS-1339287 and IOS-1539831).
Availability of data and materials
The sequence data have been deposited in the NCBI Sequence Read Archive (SRA) under accession PRJNA388029 . The peach reference genome (v2.0) used in this study is available from Genome Database for Rosaceae . SNPs and SVs in Variant Call Format (VCF) have been deposited into the Figshare database [79, 80]. All other relevant data are contained within the paper and its Additional files.
The review history is available as Additional file 4.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table S1. Sequencing summary of 480 peach accessions. Table S8. Putative domestication sweeps. Table S9. Putative improvement sweeps. Table S10. Genes in putative domestication sweeps. Table S11. Genes in putative improvement sweeps. Table S15. Putative selective sweeps for low CR breeding. Table S16. Chilling requirement test by classifying accessions into low (1) and non-low CR (0). Table S17. Putative improvement sweeps in western improved group. Table S18. Putative improvement sweeps in eastern improved group. (XLSX 576 kb)
Table S2. Distribution and summary of genome-wide variants. Table S3. Summary of SNP annotations. Table S4. SNP loci (30) selected for validation by Sequenom MassARRAY. Table S5. GWAS results for six traits using SNPs and SVs. Table S6. Estimation of domestication bottlenecks in peach and other crop and fruit species. Table S7. Domestication bottleneck verified using BOTTLENECK. Table S12. Genomic regions continuously selected by both domestication and improvement. Table S13. Summary of SNPs associated with SSC and fruit weight. Table S14. SNPs associated with chilling requirement. Table S19. Shared selective sweeps between western and eastern improved groups. Table S20. Eastern specific improvement sweeps. Table S21. Western group specific improvement sweeps. Table S22. Best fitting parameters for the two-population model for cultivated and wild peach groups in the demographic analysis. Table S23. Primers used in this study. (DOC 486 kb)
Figure S1. Genetic diversity of the 480 peach accessions. Figure S2. Manhattan plot and QQ plot of genome-wide association studies on seven important agronomic traits. Figure S3. A candidate 70.5-kb deletion underlying fruit texture. Figure S4. Phenetic neighbor-joining tree of 480 peach accessions constructed using PHYLIP with 100 bootstrap replicates. Figure S5. Population structure of the 480 peach accessions. Figure S6. Phenotypes and nucleotide diversity changes during peach improvement and domestication. Figure S7. Genome-wide association studies (GWAS) of SSC and fruit weight. Figure S8. Regional LD decay measured by R2 at the 20.0–30.0-Mb region of chromosome 2. Figure S9. Total organic acids in domesticated peaches and wild relatives. Figure S10. Genome-wide association study of total phenolic content for peach. Figure S11. Expression profile of a sugar transport gene, Prupe.4G037800, during fruit development in peach. Figure S12. A candidate gene for SSC associated with increase of fruit taste during domestication. Figure S13. Distribution of chilling requirement in 371 peach accessions. Figure S14. Manhattan plot (left) and QQ plot (right) of GWAS for chilling requirement (CR) by classifying accessions into low CR and non-low CR. Figure S15. Comparison of flowering times between Arabidopsis overexpressing of peach SVP gene (OE) and the wild type (WT) Arabidopsis. Figure S16. A PCR-based marker for low CR peach identification based on the results of CR GWAS. Figure S17. Comparison of improvement sweeps between eastern and western improved groups. Figure S18. Shared and private SVs among three callers. Figure S19. Distribution of genotype missing rate for 480 peach accessions. Figure S20. Demographic model for peach domestication and spread. Figure S21. Shared domestication and improvement sweeps identified by ROD and XP-CLR methods. (PDF 2514 kb)
Review history. (DOCX 197 kb)