Sampling and genomic data of weedy, cultivated, and wild rice
For a comprehensive investigation of the origin and evolution of global weedy rice, whole-genome sequences were analyzed from 524 weedy rice accessions representing major rice production areas in 16 countries across Asia, Europe, North America, and Latin America (Fig. 1; Additional file 1: Table S1). Diverse types of rice are cultivated in these regions, including temperate japonica (predominant in northern China, Korea, Japan, and Italy), indica (in southern China, India, Southeast Asia and Latin America), aus (in upland regions of the Indian subcontinent), and tropical japonica (in Southeast Asia and the USA). Phenotypically, most of the weed accessions had seeds characterized by reddish-brown pericarp color and smooth spikelet bases, which are typical traits of weedy rice (Additional file 1: Table S1).
Weedy rice samples were sequenced to an average 19.9× genome coverage. For genomic comparison, public genomic sequence data were retrieved for a worldwide sample of 426 locally cultivated rice varieties and 53 wild rice accessions [3, 9, 10]; this yielded a combined genotype dataset of 16.2 million SNPs across 1003 samples for use in the population genomic analysis described below.
Weedy rice has evolved repeatedly from cultivated rice
Population structure and principal component analysis (PCA) confirmed previously described subgroups within cultivated rice (tropical japonica, temperate japonica, and aromatic varieties within the traditional japonica subspecies; and indica and aus varieties within the traditional indica subspecies) (Fig. 2a; Additional file 2: Fig. S1). Although ancestry varied by region, all weedy rice strains shared ancestry predominantly with cultivated rice, specifically varieties of the indica, temperate japonica, and aus subgroups. None of the weeds showed closest ancestry with wild rice, although some degree of wild rice introgression was evident in Southeast Asian strains based on FastStructure and ABBA-BABA analyses (Fig. 2a; Additional file 2: Fig. S2); this is consistent with previous inferences of wild rice introgression into weed populations in this geographical region [11,12,13]. Assessments of genome-wide nucleotide diversity indicated that most weedy groups harbor lower genetic diversity than their respective inferred crop ancestors (Additional file 2: Fig. S3), consistent with post-domestication bottlenecks during feralization. We also compared, in wild, cultivated, and weedy rice populations, the ratio of the derived allele frequency spectrum between genomic regions that were targets of selection during rice domestication [9] and regions that were not. Both japonica and indica weeds showed the domestication-associated U-shaped distribution found in cultivated rice (Fig. 2b; Additional file 2: Fig. S4) and thus bear a signature of ancestry from domesticated ancestors. Likewise, relative to wild rice, genetic diversity at domestication and improvement genes [14] is reduced to a similar degree in weedy and cultivated rice (Additional file 2: Fig. S5); this is again consistent with weedy rice descent from domesticated ancestors.
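As an illustration of the introgression test referred to above, the ABBA-BABA approach is commonly summarized by Patterson's D statistic. The following is a minimal Python sketch, not the pipeline used in this study. It assumes per-site derived-allele frequencies, an outgroup fixed for the ancestral allele, and a hypothetical assignment of populations (P1 = cultivated rice, P2 = weedy rice, P3 = wild rice); the function name is likewise an assumption.

```python
from typing import Sequence


def patterson_d(p1: Sequence[float],
                p2: Sequence[float],
                p3: Sequence[float]) -> float:
    """Patterson's D from per-site derived-allele frequencies.

    Assumes the outgroup is fixed for the ancestral allele at every site,
    so only the three ingroup frequencies are needed.
    Hypothetical roles: p1 = cultivated, p2 = weedy, p3 = wild rice.
    """
    if not (len(p1) == len(p2) == len(p3)):
        raise ValueError("frequency vectors must have the same length")

    abba = 0.0  # expected count of sites where P2 and P3 share the derived allele
    baba = 0.0  # expected count of sites where P1 and P3 share the derived allele
    for a, b, c in zip(p1, p2, p3):
        abba += (1.0 - a) * b * c
        baba += a * (1.0 - b) * c

    total = abba + baba
    if total == 0.0:
        raise ValueError("no informative sites")
    return (abba - baba) / total
```

Under these assumptions, D near 0 is expected without gene flow, whereas a positive D indicates excess allele sharing between P2 (here, weedy rice) and P3 (here, wild rice). Assessing significance would additionally require a block jackknife, which this sketch omits.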
Estimates of divergence times between weeds and their respective crop ancestors revealed substantial variation among strains (Fig. 2c; Additional file 2: Fig. S6). For example, japonica-derived weeds and cultivated japonica rice shared a very recent genetic bottleneck around 1000 years ago (assuming one generation per year) (Fig. 2c) and showed similar trajectories of effective population size (Ne) thereafter; this suggests that japonica weeds likely diverged from their cultivated counterparts less than 1000 years ago. In comparison, some indica- and aus-derived weeds were inferred to have diverged from their respective crop ancestors approximately a millennium earlier (Additional file 2: Fig. S6). Taken together, these results suggest that weedy rice has evolved repeatedly and independently from cultivated ancestors at different time points during the history of rice cultivation.
Hybrid origin of Latin American weedy rice
Weedy rice in Latin America (e.g., Brazil, Panama, Paraguay, and Peru) is unique among worldwide samples, with over half of samples (51/95) showing admixed genetic ancestry between indica and aus (Fig. 2a; Additional file 1: Tables S1, S2). Consistent with this pattern, a TreeMix analysis indicates that these putatively admixed accessions originated from hybridization between Latin American indica- and aus-type weedy rice (Additional file 2: Fig. S7). In addition, the admixed strains showed elevated nucleotide diversity compared to local indica or aus weedy rice strains (Additional file 2: Fig. S3), as well as higher observed heterozygosity than weedy rice in other world regions (Additional file 2: Fig. S8). Given that weedy rice, like cultivated rice, is predominantly self-fertilizing, these patterns suggest that many of the Latin American weeds in our sample have originated through recent hybridization of local aus and indica weeds.
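The heterozygosity comparison above can be made concrete with a short Python sketch of per-accession observed heterozygosity. This is only an illustration: the genotype coding (0 and 2 for the two homozygotes, 1 for heterozygotes, -1 for missing calls) and the function name are assumptions, not the procedure used in this study.

```python
from typing import Sequence


def observed_heterozygosity(genotypes: Sequence[int]) -> float:
    """Fraction of called sites at which one accession is heterozygous.

    Assumed coding: 0 and 2 are homozygous calls, 1 is heterozygous,
    and any other value (e.g., -1) is treated as a missing call.
    """
    called = [g for g in genotypes if g in (0, 1, 2)]
    if not called:
        raise ValueError("no called genotypes")
    n_het = sum(1 for g in called if g == 1)
    return n_het / len(called)
```

In a predominantly self-fertilizing species, this value is expected to be close to zero for long-established lineages, so elevated values are one indication of recent outcrossing.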
Commercial varieties with ALS (acetolactate synthase) herbicide resistance (HR) have been released in many countries since 2001, and several weedy rice populations with tolerance to herbicide have been reported recently [15,16,17]. In our weedy rice collection, a total of 11 non-synonymous SNPs were found within ALS in 52 Latin American weedy rice accessions, mostly from Brazil (51, including 9 indica type, 4 aus type, and 38 indica-aus type) (Additional file 1: Table S3; Additional file 2: Fig. S9a). Most HR cultivars in Brazil are indica. Of the 11 ALS mutations observed in Latin American weeds, 3 functional mutations (Ala122Thr, Ser653Asn, and Gly654Glu) have been employed in HR cultivars [18]. These results suggest that HR weedy rice has likely acquired resistance by crop-weed hybridization and adaptive introgression—i.e., escape of resistance alleles from HR cultivars—although the possibility of parallel HR evolution in weedy rice by mutational convergence cannot be ruled out with the present data.
Convergent in situ origins of weedy rice from local cultivars
Kinship analysis was carried out to assess geographical origins of weedy rice from each sampled region or country. With the exception of the USA, where weed strains were likely introduced from Asia [2, 19], most weedy rice worldwide appears to have originated from local cultivars or varieties grown in neighboring regions (Additional file 2: Fig. S10; Additional file 1: Table S4). For example, weed accessions from southern China (Jiangsu, Guangdong, and Zhejiang) were inferred to be closest to Chinese cultivated varieties; over half of weeds from northern China (Liaoning and Jilin) have highest kinship with the cultivars from the nearby Korean peninsula; and Japanese weeds show high kinship with cultivars from South Korea and Japan. Most weeds in Southeast and South Asia also show close relationships with cultivars from local or neighboring countries. In a parallel pattern, Italian weedy rice was inferred to be most closely related to European cultivars.
Notably, the kinship analysis further revealed multiple cases where individual formerly widely grown cultivars have apparently given rise to the major contemporary weed strains in the region where the cultivar was once grown (Additional file 1: Table S5; examples shown in Fig. 3). For example, a total of 38 weeds from Liaoning in northern China showed highest kinship with a single widely grown twentieth century cultivar, “Huk Zo,” while 16 Japanese weeds showed highest kinship with “Ssal Byeo.” Both Huk Zo and Ssal Byeo are Korean landraces. In Malaysia, a total of 12 weedy rice accessions showed closest kinship with a single variety, “MR 84,” which was released in 1986 and widely planted during the 1980s and 1990s. In China, one cultivar, “Nanjing11,” was found to be closest to 27 weed strains from southern China (Jiangsu, Zhejiang, and Guangdong). This cultivar was bred around 40 years ago in Nanjing, Jiangsu Province; it remained one of the most popular indica cultivars and was broadly cultivated throughout southern China until about 15 years ago, when it was replaced by newer cultivars. These patterns suggest that a large proportion of Asian weed strains are descended from commercial cultivars that were widely grown in the twentieth century, as rice agriculture shifted from smallholder farms to industrialized production.
To further document this pattern, we collected parental pedigree accessions of Nanjing11 and re-sequenced their genomes (Fig. 3b). The phylogenetic tree confirmed that the group of 27 weed strains has closer kinship with Nanjing11 than with its pedigree accessions (e.g., EJA4), and the topology indicated that the weedy rice group is likely derived from Nanjing11 rather than from its pre-Green Revolution parental lines (e.g., GC13, SLX, and NTH) (Fig. 3c). Extrapolating from these results, we estimate that more than 35% (27/75) of the current weedy rice strains in southern China (Jiangsu, Zhejiang, Guangdong) are likely descended from Green Revolution cultivars. Taken together, these results indicate that widely grown twentieth century cultivars that were developed during the Green Revolution have left a legacy of weedy rice infestations throughout Asia.
Non-domestication genomic regions for adaptation of weedy rice
To identify genomic regions with signatures of adaptive differentiation between weed strains and their inferred cultivated ancestors, genomic scans of differentiation were performed (Z(FST) > 3) (Additional file 2: Fig. S11). We further examined whether these significantly differentiated regions between weedy and cultivated rice overlap with domestication regions. Notably, very little overlap was observed for most weeds worldwide (the exceptions being regions of South and Southeast Asia where wild rice hybridization has led to adaptive introgression of wild alleles at domestication loci [11,12,13, 20]). An especially low overlap rate, 2.1% (1.2–2.9%), was observed for japonica-type weeds, while for indica-type weeds, the mean overlap rate (excluding South and Southeast Asia) was 7.6% (3.7–18.7%) (Fig. 4a; Additional file 1: Table S6). In addition, we found that genes in regions not known to be associated with domestication show higher differentiation between weedy and cultivated rice, particularly for japonica-type weeds (Fig. 4b, c); this indicates that, after divergence from cultivated rice, natural selection may have acted more strongly on genomic regions unrelated to domestication loci.
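The scan described above can be sketched in Python as follows. Only the Z(FST) > 3 threshold is taken from the text. The window tuples, the use of the population standard deviation for the Z-transformation, and the definition of the overlap rate as the fraction of outlier windows intersecting any domestication region are assumptions made for illustration, as are the function names.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open interval


def z_transform(values: Sequence[float]) -> List[float]:
    """Standardize windowed FST values to Z-scores (population SD assumed)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("FST values have zero variance")
    return [(v - mu) / sd for v in values]


def outlier_windows(windows: Sequence[Region],
                    fst: Sequence[float],
                    threshold: float = 3.0) -> List[Region]:
    """Return windows whose Z(FST) exceeds the threshold."""
    if len(windows) != len(fst):
        raise ValueError("one FST value is required per window")
    z = z_transform(fst)
    return [w for w, score in zip(windows, z) if score > threshold]


def overlap_rate(outliers: Sequence[Region],
                 domestication: Sequence[Region]) -> float:
    """Fraction of outlier windows that intersect any domestication region."""
    if not outliers:
        return 0.0
    hits = 0
    for chrom, start, end in outliers:
        if any(c == chrom and start < e and s < end
               for c, s, e in domestication):
            hits += 1
    return hits / len(outliers)
```

A base-pair-weighted overlap would be an equally reasonable definition; the window-count version is used here only because it is the simplest to state.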
Consistent with this pattern of adaptive divergence, we found many mutations in weedy rice that were not observed in cultivated varieties (Additional file 2: Fig. S12). Novel variation in the ALS gene for herbicide resistance provides one such example. Among the 11 non-synonymous ALS SNPs identified in weed strains (described above), one mutation (Ala205Val) has not been previously reported in cultivated, wild, or weedy rice (Additional file 1: Table S3). A herbicide resistance assessment revealed that the weed accessions carrying this ALS mutation showed strong resistance to the herbicide imazamox (Additional file 2: Fig. S9b). The presence of this mutation suggests that weed populations can evolve resistance through new spontaneous mutations.
De-domestication blocks under parallel evolution
Despite the independent and repeated origins of different weedy rice populations, we identified shared genomic regions that are highly diverged from cultivated rice, indicating that these regions may be shared targets of selection in weed evolution (we refer to these as de-domestication “hot blocks,” to distinguish them from more localized hotspots). One of the most significantly differentiated regions is a 0.5-Mb de-domestication hot block from 6.0 to 6.5 Mb on chromosome 7 in both japonica- and indica-type weeds (Fig. 5a). This region harbors multiple genes (e.g., Rc, RAL, and LtpL) with potential functions in environmental adaptation (Fig. 5b). For example, Rc pleiotropically controls both red pericarp and seed dormancy [21]. The red pigment in rice grains is caused by proanthocyanidins or condensed tannins, which could have deterrent effects on pathogens and predators [22]. Seed dormancy is a highly adaptive trait for weedy rice, as it enhances survival of seeds in the soil seed bank and allows seeds to persist in rice fields over multiple seasons [23]. Interestingly, a cluster of six RAL (seed allergenic protein) genes and three LtpL genes (encoding plant lipid transfer proteins that function in alpha amylase inhibition) are also located within this region. All nine genes harbor the protein domain PF00234 (Protease inhibitor/seed storage/LTP family). These genes are proposed to play multiple roles, such as inhibiting the growth of fungal and bacterial pathogens and facilitating adaptation of plants to various environmental conditions [24], which may protect weedy rice seeds from pathogens and predators in paddy fields during years of dormancy. For indica-type weeds, each of the RALs showed clear differentiation from those in their cultivated ancestors (FST > 0.4) (Additional file 2: Fig. S13). Our results thus suggest that the RAL genomic region has been a repeated target of selection in the evolution of weed strains from cultivated rice.
Another de-domestication hot block occurs in the 22.5–23.1-Mb genomic region of chromosome 7, which stands out as the highest peak when all japonica weedy rice accessions combined are compared with cultivars, and which also has a Z(FST) value of ~4 for each japonica-type weedy rice population compared to japonica cultivated rice (Fig. 5a; Additional file 2: Fig. S11). Within this region resides a gene encoding the B3 domain-containing transcription factor GD1, which participates in regulating gibberellin (GA) and carbohydrate homeostasis and thereby regulates rice seed germination and seedling development [25]. The phylogeny of GD1 clearly shows that most japonica weedy rice strains are separated from the group of japonica cultivated rice, which is consistent with the haplotype of this gene (Fig. 5c). In addition, we found one non-synonymous SNP in the last exon, at which the allele frequency is markedly different between japonica weedy (0.85) and cultivated (0.12) rice. These results suggest that this seed germination-related gene is potentially under parallel evolution among different japonica weedy rice populations and may play a crucial role in the distinct germination behavior of weedy rice compared to cultivated rice in rice fields.
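To give a sense of the magnitude of the allele frequency difference reported above for the GD1 SNP (0.85 in japonica weedy rice versus 0.12 in cultivated rice), the following Python sketch computes a simple Wright-type FST for a single biallelic site from two population allele frequencies. It assumes equal population weights and ignores sample-size corrections, so it is not necessarily the estimator used for the genome scans in this study; the function name is hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright-type FST = (H_T - H_S) / H_T for one biallelic site.

    p1, p2: frequencies of the same allele in the two populations.
    Assumes equal weighting of populations and no sample-size correction.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")

    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                             # pooled heterozygosity
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population
    if h_total == 0.0:
        return 0.0  # site is monomorphic across both populations
    return (h_total - h_within) / h_total


# Frequencies reported in the text for the GD1 non-synonymous SNP
gd1_fst = fst_two_populations(0.85, 0.12)
```

Under these simplifying assumptions, the reported frequencies correspond to an FST of roughly 0.53 at this site.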