MA experiments and whole-genome sequencing
We conducted long-term MA experiments on A. thaliana in both single-seed descent lineages and populations grown under Control (day 23 °C / night 18 °C), Heat (day 32 °C / night 27 °C), and Warming (day 28 °C / night 23 °C) conditions (Fig. 1a, b) (see “Methods”). The elevated temperature treatments, especially Heat (32 °C), resulted in various stress symptoms such as significantly decreased leaf size, shorter siliques (Fig. 1c, d), and shorter generation times. We sequenced 35 A. thaliana genomes, including 15 plants from MA lines at generation 10 (G10; five plants from each treatment) and 15 plants from MA populations [five plants each from G16 (Control, A16), G19 (Warming, C19), and G22 (Heat, B22)], spanning 10–22 successive generations, as well as their ancestor genomes (five individual plants from G0). In total, approximately 165 Gb of clean reads (30 libraries) from 30 genomes of progeny, and 25 Gb of clean reads from five libraries (see “Methods”) representing the genetic background of the ancestor (Additional file 1: Table S1) were obtained. For all MA lines and populations, an average of 99.68% of sequenced reads was mapped to the A. thaliana reference genome, with average depths of 52.5×, 49.7×, 47.4×, 42.7×, 37.4×, and 36.3× per individual in D10 (Control), E10 (Heat), F10 (Warming), A16 (Control), B22 (Heat), and C19 (Warming), respectively. Accordingly, an average of 116 Mb (96.9%) of the reference genome was accessible for variant calling (Additional file 1: Table S1). To obtain sufficient coverage of the genetic background of the ancestor, the five G0 libraries (average coverage 37.1×) were combined. This sequencing depth/coverage and number of accessible reference sites allowed for precise detection of mutations at the whole-genome level.
Accumulated mutations and mutation rates in MA lines and populations under elevated temperatures
We obtained a total of 211 homozygous de novo mutations from MA lines under three temperature treatments (Fig. 2a and Additional file 1: Tables S2-S3), including 39 mutations (31 single-nucleotide variants [SNVs] and 8 indels) in D10 (Control), 98 mutations (69 SNVs and 29 indels) in E10 (Heat), and 74 mutations (54 SNVs and 20 indels) in F10 (Warming). Most (85.9%) of the 57 indels in the MA lines were short (1–3 bp) deletions (dels) and insertions (ins) (Fig. 2c). Furthermore, the indels of E10 (25 dels vs. 4 ins) and F10 (17 dels vs. 3 ins) showed strong biases toward dels. In addition, we also detected 376 homozygous de novo mutations in MA populations, including 70 mutations (60 SNVs and 10 indels) in A16 (Control), 183 mutations (130 SNVs and 53 indels) in B22 (Heat), and 123 mutations (88 SNVs and 35 indels) in C19 (Warming) (Fig. 2b and Additional file 1: Tables S2-S3). Similar to the indels identified from MA lines, most indels in the MA populations were short (1–3 bp) and biased toward dels in B22 (42 dels vs. 11 ins) and C19 (22 dels vs. 13 ins) (Fig. 2d). Moreover, we found no novel transposable element (TE) insertion event in any MA line or population.
We further estimated the accuracy of the mutation calling pipelines using two simulation tests [14, 35]. For the first test, we simulated 600 random SNVs using six copies of reference genomes (see “Methods”). After read mapping and SNV filtering against the mutated reference genomes, our pipeline recovered 588 (98%) of 600 expected SNVs (Additional file 1: Table S4). For the second simulation test, we introduced homozygous SNVs and performed heterozygous SNV filtering, resulting in the recovery of 71–91% homozygous SNVs (Additional file 1: Table S5). To confirm our mutation calls, we experimentally examined all SNVs and indels from MA lines by Sanger sequencing. In total, 205 of 211 mutations were confirmed (six mutations were identified as PCR failures) (Additional file 1: Table S6).
We estimated the SNV mutation rate (μSNV) and indel mutation rate (μindel) per site per generation in the MA lines. Mutigenerational growth of A. thaliana under heat conditions caused significant increases relative to Control D in the average rates of SNVs [μE-SNV = 1.18 (± 0.09) × 10− 8 vs. μD-SNV = 5.28 (± 0.95) × 10− 9; two-sample t test, p = 1.0 × 10− 3] and indels [μE-indel = 4.94 (± 0.50) × 10− 9 vs. μD-indel = 1.36 (± 0.43) × 10− 9, p = 6.3 × 10− 4; Fig. 3a]. Similarly, Warming also increased the SNV [μF-SNV = 9.21 (± 0.68) × 10− 9, p = 9.9 × 10− 3] and indel [μF-indel = 3.41 (± 0.71) × 10− 9, p = 0.04] mutation rates. Furthermore, the mutation rates of dels were more than 5-fold higher than those of ins in both Heat E [μE-del = 4.26 (± 0.47) × 10− 9 vs. μE-ins = 6.82 (± 1.70) × 10− 10] and Warming F [μF-del = 2.90 (± 0.88) × 10− 9 vs. μF-ins = 5.12 (± 3.41) × 10− 10], in contrast to the lack of difference observed in Control D. The overall MA mutation rates (SNVs and indels) of the Heat and Warming lines were 1.67 (± 0.06) × 10− 8 (μE-total) and 1.26 (± 0.13) × 10− 8 (μF-total) per site per generation, approximately 2.5-fold (p = 8.6 × 10− 6) and 1.9-fold (p = 4 × 10− 3) higher than the Control [μD-total = 6.65 (± 0.83) × 10− 9], respectively (Fig. 3a). In addition, we observed significantly higher rates of total mutations and SNVs in Heat E compared to Warming F (p < 0.05), whereas the difference in indel rates was not significant (p = 0.1).
In parallel, we also estimated the average mutation rates in MA populations (Fig. 3b). The plants grown under Heat conditions had significantly higher SNV rates than Control plants, such as μB-SNV (Heat) = 1.03 (± 0.06) × 10− 8 vs. μA-SNV (Control) = 6.53 (± 0.76) × 10− 9 (p = 1.6 × 10− 4), whereas no significant difference was observed between Warming (μC-SNV = 8.08 (± 0.63) × 10− 9) and Control conditions (p = 0.2). However, the total mutation rates of Heat B and Warming C were 1.45 (± 0.09) × 10− 8 (μB-total) and 1.13 (± 0.09) × 10− 9 (μC-total), nearly 2.0- and 1.5-fold (p < 0.05) higher than in Control A [μA-total = 7.61 (± 0.08) × 10− 9], respectively (Fig. 3b). Additionally, compared to Warming C, Heat B had significantly higher total mutation and SNV rates (p < 0.05). However, the SNV rates and total mutation rates were lower in both the Heat and Warming populations than in the MA lines under elevated temperatures.
Molecular spectra of mutations in A. thaliana under elevated temperatures
Base substitution mutation spectra varied after multigenerational growth of A. thaliana under Heat, Warming, and Control conditions. We found a strong C:G → T:A bias (driven by C → T and G → A) in six mutational spectra that commonly occurred in MA lines under all three temperature treatments (Fig. 3c); however, C:G → T:A mutations under elevated temperatures (Heat and Warming) had much higher rates compared to Control. Furthermore, compared to Heat E (μE-C:G → T:A = 1.47 × 10− 8 per site per generation), Warming F exhibited a lower C:G → T:A mutation rate (μF-C:G → T:A = 1.23 × 10− 8). In addition, in Heat E and Warming F, the second most frequent substitution was A:T → T:A (mutation rate, μE-A:T → T:A = 3.20 × 10− 9); however, this differed from Control D, in which the second most frequent substitution was A:T → G:C (μE-A:T → G:C = 2.39 × 10− 9). In general, the mean rate of mutations occurring at C:G sites was nearly 3-fold higher than at A:T sites in Heat E and Warming F (Fig. 3c), in contrast to ~ 2-fold in Control D. In MA populations, we observed similar results under Heat and Warming (Fig. 3d); for example, the most frequent substitutions in Heat B and Warming C were also biased toward C:G → T:A (μB-C:G → T:A = 1.50 × 10− 8; μC-C:G → T:A = 1.54 × 10− 8) and were higher than in Control A (μA-C:G → T:A = 1.01 × 10− 8). The second most frequent substitutions (mutation rate) in Heat B occurred at A:T → T:A (μB-A:T → T:A = 2.37 × 10− 9) sites, similar to Heat MA lines. By contrast, the second most frequent substitutions in Warming C were A:T → G:C (μC-A:T → G:C = 2.16 × 10− 9), somewhat different from Warming MA lines.
We calculated the transition and transversion frequencies (per genome per generation) for the three treatments. In MA lines, the transition (Ts) and transversion (Tv) frequencies in Heat E (0.84 and 0.54, respectively) and Warming F (0.68 and 0.40, respectively) showed obvious increases compared to Control D (0.46 and 0.16, respectively; Fig. 3e,f), resulting in significantly decreased Ts/Tv ratios at both elevated temperatures (Fig. 3g and Additional file 1: Table S7). Moreover, compared to Heat E, Warming F had a higher Ts/Tv ratio, which can be attributed to its lower frequencies of Ts and Tv. The Ts/Tv ratios were higher in Heat and Warming MA populations than in the MA lines (Fig. 3g). Within MA populations, the Ts/Tv ratios were significantly decreased in Heat B (1.83) compared to Control A (2.53; Fig. 3g). Nevertheless, the Warming population showed a high Ts/Tv ratio due to its higher transition and lower transversion rates relative to the Heat and Control populations.
Mutation frequency distribution across different genomic regions in A. thaliana under elevated temperatures
We annotated the mutations and estimated their frequencies across different genomic regions in MA lines and populations. All MA lines showed higher mutation frequencies in intergenic regions than in genic regions. Heat E and Warming F showed > 50% increases in mutation frequencies in both genic (1.4–2.2-fold increases) and intergenic (0.5–1-fold increases) regions compared to Control D (p < 0.05; Fig. 4a; Additional file 1: Table S8A). Notably, within genic regions of Heat E and Warming F, higher mutation frequencies occurred in coding regions than in noncoding regions, different from Control D. The predominance of variants in coding regions of Heat E and Warming F was attributed to the disproportionate occurrence of nonsynonymous mutations (Additional file 1: Table S8A). For example, the nonsynonymous mutations in Heat E (0.26) were significantly more frequent than in Control D (0.02; p < 0.05). In addition, Heat E showed higher mutation frequencies in intergenic and genic regions than Warming F, but this difference was not significant (p > 0.05). In noncoding regions, the frequencies of intronic and untranslated region (UTR) mutations were highest in Heat E. Interestingly, more mutations occurred in transposable elements (TEs) under the Heat treatment, with a significantly increased frequency in Heat E compared to Control D (p = 0.02). Finally, we calculated SNV rates within intergenic, genic, and TE regions (Fig. 4a), all of which increased with temperature.
In MA populations, we found that the mutation frequencies of intergenic regions and TEs in Heat B were significantly higher than those in Control A (p < 0.01; Fig. 4b). By contrast, the frequency of nonsynonymous mutations in Heat B (0.07) was lower than that in Warming C (0.18) (Additional file 1: Table S8B). Consistently, the mutation frequency of coding regions in Heat B (0.13) was lower than those in Warming C (0.26) (Fig. 4b). Notably, the mutation frequencies of coding regions in Heat and Warming populations were also lower than in the Heat and Warming lines, with a significantly lower frequency of nonsynonymous mutations observed in Heat B population (0.07) than in the Heat E lines (0.26) (Additional file 1: Table S8B); this indicates the stronger selection effects for nonsynonymous mutations in MA populations at high temperatures. To further investigate the selection effects on MA populations, we used the KaKs calculator to determine the ratio of nonsynonymous to synonymous substitutions (Ka/Ks ratio). The Heat E lines had a Ka/Ks ratio of 0.92, whereas the Heat B population had a Ka/Ks ratio > 1 (1.51), suggesting that the Heat MA population had been subjected to positive selection.
Nonsynonymous SNVs, gains and losses of stop codons, and indels within coding regions are likely to affect fitness [36]. Therefore, we estimated the rates of diploid genomic mutations affecting fitness under different treatments. These rates were 0.48 (± 0.1) and 0.36 (± 0.2) per generation in Heat E and Warming F, respectively, and these values were higher than those in Control D (0.13 ± 0.1; Heat vs. Control, p = 0.005; Warming vs. Control, p = 0.16). Similarly, genomic mutation rates affecting fitness were significantly higher in Heat B (0.16) and Warming C (0.24) than in Control A (0.05; p < 0.003). In addition, genomic mutation rates affecting fitness were lower in MA populations than MA lines.
Mutations in functional genes of A. thaliana under elevated temperatures
To investigate the accumulated mutations in functional genes that may be involved in various biological processes underlying high-temperature responses, we performed Gene Ontology (GO) functional analysis of 29, 46, and 55 genes with mutations from the Control, Warming, and Heat MA lines and populations, respectively (Additional file 1: Table S3). We found that these genes were enriched in multiple related terms, including the cellular process, metabolite process, cell part, membrane, binding, and catalytic activity (Fig. 5a). In contrast, elevated temperatures resulted in the enrichment of more genes associated with the “response to stimulus,” “reproductive process,” “development process,” and “biological regulation” terms. Kyoto Encyclopedia of Genes and Genomes (KEGG) functional analysis showed enrichment of common pathways, including “signaling transduction,” “development,” and “replication and repair” at elevated temperatures (Additional file 2: Fig. S1).
To further determine whether these mutations occurred at genes involved in the transcriptional response to heat and warming, we used a previously obtained RNA-seq dataset to identify potential temperature-responsive (significantly differentially expressed) transcripts among the Heat, Warming, and Control treatments (see “Methods”). Interestingly, 9 (16%) of 55 genes from Heat MA samples showed significantly differential expression between the Control and Heat treatments, and 10 (22%) of 46 genes from Warming MA samples were differentially expressed between Control and Warming (Fig. 5b). In particular, mutations occurred in two genes encoding heat-shock protein 70-17 (HSP70-17) and heat stress transcription factor A-1a (HSFA1A), which were upregulated under Heat treatment and were identified in Heat E and B, respectively.
We further focused on genes with nonsynonymous, frameshift, stop-gain SNVs, or indels (Fig. 5c). In Heat lines, in addition to HSP70-17, described above, a nonsynonymous mutation was found in a gene encoding fumarate hydratase 2 (FUM2), which is associated with respiratory metabolism. Interestingly, a mutation occurred in the gene encoding a proliferating cell nuclear antigen (PCNA) domain-containing protein (AT4G17760), which is associated with DNA repair. Moreover, a frameshift del and a nonsynonymous SNV in the defense-related (i.e., disease resistance) protein Toll interleukin 1 receptor-like nucleotide-binding leucine-rich repeat (TIR-NB-LRR; AT5G48770 and AT4G10780) were identified in the Heat E and warming F lines, respectively. In contrast to the accumulated mutations in MA lines, we found many exonic mutations distributed in genes associated with development and signal transduction, such as those encoding the calcium-dependent lipid-binding family protein (AT1G48090) and WD40 repeat-like superfamily protein (AT3G54190), in MA populations (Fig. 5c). Notably, the mutation distribution patterns among all individuals differed significantly between MA populations and MA lines. For example, some mutations within a MA population were shared by different individuals, whereas MA line mutations were scattered widely among individuals (Fig. 5c, Additional file 1: Table S3); these results demonstrated the distinct mutation landscapes of MA populations and MA lines. Given the experimental design applied to MA populations, we speculate that these common exonic mutations probably originated from a parental individual in the same generation (not the ancestor plant), suggesting that some genetic variants are more likely to spread in populations under selective pressure over multiple generations.
Interaction between methylation and TE annotation
We conducted whole-genome bisulfite sequencing of MA lines and identified more methylated cytosines (mCs) in Heat E (10.54%) and Warming F (10.44%) than in Control D (9.78%; Additional file 1: Table S9). mCs in CG, CHG, and CHH (where H refers to A, T, or G) contexts are summarized in Supplemental Table 8. Spontaneous deamination of methylated cytosine (mC) to thymine is known to be a major source of mutations, resulting in elevated mutation rates at methylated sites [37]. We thus focused on mutations in the Control, Heat, and Warming MA lines in three contexts methylated and nonmethylated contexts. In the Heat treatment, the proportions of methylation at mutated bases were much greater than the genome-wide occurrence of methylation in the CG (Fisher’s exact test, p = 4.58 × 10–8), CHG (Fisher’s exact test, p = 1.92 × 10–21), and CHH (Fisher’s exact test, p = 1.63 × 10–3) contexts (Fig. 6b). High frequencies of methylation at mutation sites were also found in the Warming (Fisher’s exact test: CG, p = 3.36 × 10–4; CHG, p = 1.92 × 10–15; CHH, p = 0.02) and Control (Fisher’s exact test: CG, p = 3.56 × 10–12; CHG, p = 2.27 × 10–21; CHH, p = 0.08) treatments (Fig. 6a, c). Because methylation and TEs correlate significantly [35, 38], we further tested the main effects of methylation and TE position (two-way analysis) on mutation rates under elevated temperatures using a logistic regression model. The methylated sites and TE regions were associated positively with mutations in the Control, Heat, and Warming MA lines (Additional file 1: Table S10). In general, methylated sites within and outside TEs had higher mutation rates than did nonmethylated sites in MA lines (Additional file 2: Fig. S2). Compared with those in Control lines, methylated and nonmethylated sites in the Heat and Warming lines showed higher mutation rates regardless of location (within or outside TEs); Heat E had the highest mutation rate on methylated sites outside TEs (Fig. 6d). In addition, we observed that nonmethylated sites within TEs had a higher rate in Warming F than in Heat E, but this difference was not significant.
Mutational bias and context effects of A. thaliana under elevated temperatures
We estimated the relative contributions of various genomic properties to mutation frequency, including gene density and GC content. In both MA lines and populations, we found that fewer mutations in high versus low gene density regions in all three treatments (Fig. 7a). For MA lines, the mutation rate was significantly biased toward low versus high gene density region in Heat E (t test, p = 0.02; Fig. 7b) and Warming F (p = 0.03). By contrast, no significant difference in mutation rate of Control D was observed between the high and low gene density regions (p = 0.45). This result suggests that multigenerational exposure of A. thaliana to high temperatures accelerates the accumulation of DNA mutations toward low gene density region compared to plants under ambient (Control) temperature. We also estimated whether GC content (per 1-kb window) affected local mutation rates in the MA lines and populations of each treatment. For all MA lines and populations, the GC contents and observed mutation rates were not well correlated (Additional file 2: Fig. S3), suggesting that GC content did not affect the mutation rate in our MA experiments.
We evaluated the effect of local sequence context on the mutation rates of A/T and G/C positions flanked by different nucleotides at either site, regardless of DNA strand orientation (for example, AAG and its complement CTT both contribute to the mutation rate in the central AT position under the category AAG). As expected, GC bases had significantly higher mutation rates than did AT bases in all treatments (t test, p < 0.03) except Control D (p > 0.14; Fig. 7c–e). In general, mutation rates of AT bases in all contexts were uniform for each MA experiment (G test, p > 0.15), but mutation rates of GC bases were not (p < 0.01), except in the Warming treatments (Warming F, p = 0.98; Warming C, p = 0. 44). Moreover, the nucleotides located one position upstream or downstream had significant effects on the mutation rate in the Heat B population (t test, p < 3.32 × 10–4; Fig. 7d), whereas those in other MA lines and populations did not (p > 0.05). Of all 16 possible combinations of flanking nucleotides, GCG in Control A (two-tailed Z test, p = 5.56 × 10–13), GCG in Heat B (p = 7.40 × 10–14), and CCG in Warming C (p = 3.97 × 10–6) had significantly higher mutation rates than did other GC contexts. In contrast to MA populations, the Control D, Heat E, and Warming F lines showed significantly higher mutation rates in the CCC (p = 3.03 × 10–4), CCG (p = 0.02), and GCT (p = 6.24 × 10–5) contexts than in other GC contexts (Fig. 7g, h). However, the trinucleotides CCG (or GGC) and GCG (or CGC) appeared to have high mutation rates in all MA groups, regardless of temperature treatment. In addition, we observed that almost all indels within the Heat (E10 and B22), Warming (F10 and C19), and Control groups (D10 and A16) either occurred near simple repeats or involved tandem-repeat dels and ins (Additional file 1: Table S11), suggesting that the occurrence of indels is strongly biased toward repeat sequences in A. thaliana.
Comparison of de novo mutations with natural genetic variations
The 1001 Genomes Consortium (2016) reported 10,707,430 single-nucleotide polymorphisms (SNPs) and 1,424,879 indels (≤ 40 bp) in 1135 natural accessions of A. thaliana. To compare de novo mutations with natural variations, we merged the mutations from all MA lines and populations into 263 unique SNVs and 93 indels. We found that 64 (24%) of 263 total SNV sites coincide with biallelic SNPs in the 1001 Genomes dataset, and 50 (19% of the total) of these 64 shared SNVs are identical (Fig. 8). Among the 93 indels identified in MA experiments, 40 (43%) overlap with indels from the 1001 Genomes population, 12 (13% of the total) of which are identical. These identical sites (86% of shared SNVs, 75% of shared indels) are derived mainly from the Heat and Warming lines. Compared with the expected overlap (based on a random distribution of mutations and polymorphisms), the overlap between polymorphisms in all of our MA lines and populations with those of natural variants is highly significant (Fisher’s exact test: SNV, p = 2 × 10–24; indel, p = 1 × 10–12; Fig. 8). To determine whether the SNVs identified in our MA results were biased toward conserved or substitution sites in A. thaliana, we compared them with the 219,909 ancestral variants (SNPs occurring at substitution sites in A. thaliana) and 1,799,125 derived variants (SNPs occurring at conserved sites) from the 1001 Genomes biallelic SNP dataset (see “Methods”). Among all SNVs identified in our MA results, only four SNVs (1% of SNVs from Warming C and F and Heat E) overlap with ancestral variants and one SNV (from Heat B) is shared with derived variants, indicating a low frequency of de novo SNVs (identified under elevated temperatures) at the conserved sites.