Dynamic CHH methylation during embryogenesis and germination
To better understand the dynamics of DNA methylation variation through the plant life cycle, we compared single-base resolution methylomes of seeds at embryogenesis and germination stages in Arabidopsis (Additional file 1: Table S1). Germination methylomes were generated by MethylC-seq [26, 27] from Col-0 dry seeds and from seeds/seedlings at 0–4 days after a 4-day imbibition (DAI). These data were compared with publicly available methylomes of Ws-0 developing seeds at the globular stage (4 days after pollination [DAP]), linear cotyledon stage (8 DAP), mature green stage (13 DAP), and post-mature green stage (18 DAP), and of dry seed (Ws-0), leaf [28], flower bud [26], microspore [17], sperm [19], vegetative nucleus [19], hand-dissected embryo and endosperm (mid-torpedo to early-maturation stage; 7–9 DAP) [22], and columella root cap [29].
Global methylation analysis revealed that mCG and mCHG were most stable throughout seed development (Fig. 1a). Dry seed global mCHH levels (~3%) were twofold higher than globular and linear cotyledon stage mCHH levels (~1%). These results are consistent with active MET1, CMT3, and RdDM pathways during embryogenesis [23]. Hypermethylation was observed in all sequence contexts from the post-mature green stage to the dry seed stage. Because cell division and DNA replication do not take place at these stages, this indicates that RdDM, rather than MET1 or CMT3, remains active during desiccation until dormancy.
A striking feature of the Col-0 dry seed methylome was extensive mCHH hypermethylation (Fig. 1a, Additional file 2: Figure S1). In fact, mCHH levels in dry seeds were higher than those in all other tissues and cells examined, except for columella root cap. mCG and mCHG levels in dry seeds were similar to those in leaf, but lower than those in flower bud, sperm, and columella root cap. Interestingly, mC levels in all contexts were higher in 0 DAI seeds, which had been imbibed and stratified for four days, than in dry seed, suggesting that RdDM is active during stratification even at 4 °C. mC levels in all contexts dropped at 1 DAI. The mCHH level continued to decrease until 4 DAI, when it was even lower than that in rosette leaves. After 1 DAI, the mCG level increased while the mCHG level marginally decreased.
The distribution of mC along chromosomes was analyzed in 100 kb bins (Fig. 1b). mC was enriched in all sequence contexts at centromeres and peri-centromeres, although mCG was also broadly distributed in the chromosome arms. The gain and subsequent loss of mC during seed development and germination, respectively, occurred within these regions.
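The binning step can be illustrated with a minimal sketch. It assumes per-cytosine read counts are available as tuples and that the level of a bin is the weighted methylation level (methylated reads divided by total reads); the text does not specify the metric, so this choice, the function name, and the toy input are illustrative assumptions only.

```python
from collections import defaultdict

BIN_SIZE = 100_000  # 100 kb bins, as stated in the text


def binned_methylation(sites, bin_size=BIN_SIZE):
    """Return {(chrom, bin_index): weighted methylation level}.

    sites: iterable of (chrom, pos, methylated_reads, total_reads);
    pos is assumed to be 0-based (an assumption of this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # [methylated reads, total reads]
    for chrom, pos, mc, cov in sites:
        bin_counts = counts[(chrom, pos // bin_size)]
        bin_counts[0] += mc
        bin_counts[1] += cov
    # Skip bins without coverage to avoid division by zero.
    return {key: mc / cov for key, (mc, cov) in counts.items() if cov > 0}


# Toy input (hypothetical values, not data from the study).
toy_sites = [
    ("Chr1", 10, 2, 10),
    ("Chr1", 99_999, 3, 10),
    ("Chr1", 100_000, 9, 10),
]
levels = binned_methylation(toy_sites)
```

In practice the same function would be applied separately to cytosines in the CG, CHG, and CHH contexts to obtain one profile per context.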
Dynamic DNA methylation changes occur within TEs
To examine local DNA methylation changes, we identified seed-development-related (sdev) differentially methylated regions (DMRs) and germination-related (germin) DMRs by combining differentially methylated cytosine sites within 100 bp using the methylpy pipeline [30]. Sdev DMRs were called by comparing Ws-0 methylomes of developing seed at the globular, linear cotyledon, mature green, and post-mature green stages and of dry seed. Germin DMRs were called by comparing Col-0 methylomes of dry seed and germinating seed at 0–4 DAI. We found 25,343 sdev DMRs and 166,441 germin DMRs in total (Additional file 3: Table S2). Over 95% of DMRs were CHH DMRs, whereas no germin-CG DMRs met our criteria. Sdev-CHH DMRs and germin-CHH DMRs covered 8.3 Mb (7%) and 18 Mb (15%) of the reference genome, respectively (Fig. 2c and e), whereas sdev-CG, sdev-CHG, and germin-CHG DMRs covered less than 0.1% of the reference genome (Fig. 2a, b, and d). Overall, mCG levels within sdev-CG DMRs decreased during seed development, but mCHG and mCHH levels within sdev-CHG and sdev-CHH DMRs increased as maturation proceeded (Fig. 2a–c). mCHH levels within germin-CHH DMRs were higher in 0 DAI seed than in dry seed (Additional file 4: Table S3; Wilcoxon rank sum test: p = 0), suggesting that these DMRs were further methylated during stratification (Fig. 2e). Then, mCHG and mCHH levels within germin-CHG and germin-CHH DMRs decreased during 0–3 DAI and during 0–4 DAI, respectively (Fig. 2d and e, Additional file 4: Table S3; Wilcoxon rank sum test: p < 0.05). We next examined genomic features overlapping with DMRs (Fig. 2f). We found that 60% of sdev-CG DMRs overlapped with protein-coding genes and 10% overlapped with TEs, whereas 19% of sdev-CHG DMRs overlapped with protein-coding genes and 44% with TEs. Finally, 73% of sdev-CHH DMRs overlapped with TEs, and similar proportions of germin-CHG DMRs (60%) and germin-CHH DMRs (74%) also overlapped with TEs.
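The idea of combining differentially methylated sites within 100 bp into regions can be sketched as follows. This is a simplified illustration of distance-based merging on a single chromosome, not the methylpy implementation; the function name and the toy positions are hypothetical, and the statistical testing and filtering steps of a real DMR caller are omitted.

```python
def merge_sites(positions, max_gap=100):
    """Merge site positions on one chromosome into regions.

    Two consecutive sites are placed in the same region when they lie
    no more than max_gap bp apart. Returns a list of
    (start, end, n_sites) tuples.
    """
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] <= max_gap:
            start, _, n_sites = regions[-1]
            regions[-1] = (start, pos, n_sites + 1)  # extend current region
        else:
            regions.append((pos, pos, 1))  # open a new region
    return regions


# Toy positions of differentially methylated cytosines (hypothetical).
toy_positions = [100, 150, 240, 400, 1000, 1050]
toy_regions = merge_sites(toy_positions)
# toy_regions == [(100, 240, 3), (400, 400, 1), (1000, 1050, 2)]
```

A production pipeline would typically also require a minimum number of sites and a minimum methylation difference per region before reporting it as a DMR.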
Twenty-eight sdev-CHG DMRs overlapped with germin-CHG DMRs (permutation test: p < 0.001), whereas 82% (19,159) of sdev-CHH DMRs overlapped with germin-CHH DMRs (permutation test: p < 0.001) (Fig. 2g–i). The discrepancy in the number of sdev and germin DMRs is likely a consequence of the different accessions used to analyze seed development (Ws-0; from a public database) and germination (Col-0; our study), as supported by the following observations. First, Ws-0 seed development methylomes had no data (sequence reads) for 23,500 germin-CHH DMRs, even though the Ws-0 methylomes (24×–31× per strand) had higher coverage than the Col-0 methylomes (5×–9× per strand), suggesting that these regions are absent from the Ws-0 genome. Second, mCHH levels within sdev-specific and germin-specific CHH DMRs in Ws-0 dry seed and Col-0 dry seed differed more than those within sdev-common and germin-common CHH DMRs, suggesting that these sdev-specific and germin-specific CHH DMRs are accession specific (Additional file 2: Figure S2). Nevertheless, we observed that mCHH levels within germin-specific CHH DMRs increased during seed development in Ws-0 and mCHH levels within sdev-specific CHH DMRs decreased during germination in Col-0 (Additional file 2: Figure S2). Again, virtually all sdev-CHH DMRs showed increasing mCHH levels toward maturation, whereas germin-CHH DMRs showed decreasing mCHH levels during germination (Fig. 2j and k). Collectively, the mCHH gained within TEs during seed development was lost during germination.
To examine whether DMRs affect the expression of nearby genes, we performed messenger RNA sequencing (mRNA-seq) analysis of dry seed and of seeds/seedlings at 0, 1, and 2 DAI (Additional file 5: Table S4). As germination proceeded, more genes were expressed (FPKM > 1; Additional file 5: Table S4). Germination-expressed genes were classified into ten clusters based on their level of expression (Additional file 2: Figure S3A). Genes in clusters 5 and 9 were induced during the germination period. Twenty-seven percent (837/3144) and 25% (369/1485) of genes in clusters 5 and 9, respectively, were associated with germin-CHH DMRs, whereas 23% (4791/20,836) of all expressed genes were associated with germin-CHH DMRs (Additional file 2: Figure S3B and Additional file 6: Table S5). Therefore, compared with all expressed genes, germin-CHH DMRs were slightly enriched near germination-regulated genes in clusters 5 and 9 (Additional file 2: Figure S3B; fold enrichment: 1.2 and 1.1; one-sided Fisher’s exact test: p = 1.3e-07 and 0.043, respectively). This suggests that hypermethylation during seed development and hypomethylation during germination are at least partially associated with germination-related gene expression.
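The fold enrichment values follow directly from the counts given above, and the enrichment test can be sketched with the hypergeometric upper tail, which is equivalent to a one-sided Fisher's exact test on the corresponding 2×2 table. The sketch assumes that each cluster is a subset of the 20,836 expressed genes and is tested against that background; the exact table construction is not stated in the text, so the p-values computed here need not reproduce the reported ones exactly. The function names are illustrative.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N genes, K of which carry a DMR."""
    log_denom = log_comb(N, n)
    return sum(
        exp(log_comb(K, i) + log_comb(N - K, n - i) - log_denom)
        for i in range(k, min(K, n) + 1)
    )


# Counts taken from the text: DMR-associated genes / genes per group.
background = (4791, 20836)
clusters = {5: (837, 3144), 9: (369, 1485)}

bg_fraction = background[0] / background[1]
results = {}
for name, (with_dmr, total) in clusters.items():
    fold = (with_dmr / total) / bg_fraction
    p_value = hypergeom_upper_tail(with_dmr, total, background[0], background[1])
    results[name] = (round(fold, 1), p_value)
```

Working in log space with `lgamma` avoids the very large intermediate integers that direct evaluation of the binomial coefficients would produce.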
RdDM and CMT2 pathways are active during seed development
To elucidate the pathway responsible for TE hypermethylation during seed development, we compared dry seed methylomes from wild-type (WT) (Col-0), drm1 drm2 cmt3 (ddc) triple mutants [31], and drm1 drm2 cmt2 cmt3 (ddcc) quadruple mutants [8] (Fig. 3a–c). MET1, CMT3, and DRM2 transcripts and their products are abundant in developing embryos, whereas only a marginal level of CMT2 expression was observed [23]. Therefore, only RdDM has been thought to be responsible for mCHH hypermethylation during embryogenesis. Whereas mCG levels within TEs mildly decreased in ddc and ddcc mutants (Wilcoxon rank sum test: p = 2.6e-38 and 2.5e-180, respectively), mCHG and mCHH levels drastically decreased compared with Col-0 (Wilcoxon rank sum test: p = 0 for all comparisons). Interestingly, ddcc had lower mC levels within TEs in all contexts than ddc (Additional file 2: Figure S4; Wilcoxon rank sum test: p = 1.7e-38, 8.0e-205 and 0 for mCG, mCHG and mCHH, respectively). Indeed, we observed that TEs retained substantial mCHH levels in ddc triple mutants and that this mCHH was lost in ddcc quadruple mutants (Fig. 3d), suggesting that CMT2 is active during seed development, in contrast to the previous report [23].
Next, we compared the variation in mCHH levels across the bodies of TEs in dry seeds of WT and mutant plants. To clarify the contribution of each pathway to TE methylation during seed development, we considered RdDM-targeted TEs and CMT2-targeted TEs (Fig. 3e and f). RdDM-targeted TEs and CMT2-targeted TEs were defined as TEs affected in leaf in drm1 drm2 and in cmt2, respectively [32]. Although the overall methylation patterns along TE bodies were similar in the embryo at mid-torpedo to early-maturation stage and in dry seed, hypermethylation of TEs was clearly evident in dry seed methylomes. The edges of CMT2-targeted TEs have sharp peaks of mCHH due to RdDM [7]. These peaks were more pronounced in both embryo and dry seed than in leaf, indicating enhanced RdDM activity in these tissues (Fig. 3f). mCHH within RdDM-targeted TE bodies was completely lost in ddc and ddcc, dropping to the same level as that outside TE bodies (Fig. 3e). mCHH levels within CMT2-targeted TE bodies decreased in ddc, but substantial mCHH remained (Fig. 3f). mCHH peaks at the edges of CMT2-targeted TEs disappeared in ddc dry seeds. In contrast, ddcc dry seeds lost mCHH within CMT2-targeted TEs. Therefore, our data clearly show that CMT2 as well as RdDM is required for DNA methylation during seed development.
Dry seeds store substantial levels of RNA transcripts for components of DNA methylation in the RdDM pathway, including DRM2 (Fig. 4). In contrast, almost no transcripts for components of DNA methylation maintenance, small interfering RNA (siRNA) biogenesis, or heterochromatin formation were detected in dry seed, although these genes are expressed during seed development, at least until the mature green embryo stage (Fig. 4 and Additional file 2: Figure S5). This suggests that the MET1, CMT3, and CMT2 pathways and the siRNA biogenesis pathway are active only before desiccation, but that DRM2 is active throughout seed development, including the desiccation stage.
Global demethylation during germination does not depend on DNA demethylases
DME, a DNA demethylase, is responsible for local DNA demethylation in the pollen vegetative nucleus and endosperm central cells [19]. These demethylation events occur in companion cells and are involved in genomic imprinting and transposon silencing in the neighboring gamete cells [10, 19, 33]. To examine the possible involvement of DNA demethylases in global demethylation during germination, we compared the methylation levels within TEs of germinating seeds/seedlings of WT (Col-0) and ros1 dml2 dml3 (rdd) triple demethylase mutant plants [12] (Additional file 2: Figure S5). At all time points, mCG and mCHG levels within RdDM-targeted TEs were slightly higher in rdd than in WT, whereas mCHH levels within RdDM-targeted TEs and mCG, mCHG, and mCHH levels within CMT2-targeted TEs were slightly higher in WT than in rdd (Fig. 5, Additional file 4: Table S3; Wilcoxon rank sum test: p = 2.9e-03 to 6.7e-278). Overall, Col-0 and rdd showed similar methylation level changes (Fig. 5). In all sequence contexts, methylation levels in germinating seed were slightly higher at 0 DAI and slightly lower at 1 DAI than those in dry seed. mCG levels within RdDM-targeted TEs rose again between 2 and 4 DAI to levels similar to those in dry seed. In contrast, mCG levels within CMT2-targeted TEs decreased marginally further between 2 and 4 DAI. mCHG and mCHH levels within both RdDM-targeted TEs and CMT2-targeted TEs decreased during germination. Remarkably, more than half of all mCHH sites within both RdDM-targeted TEs and CMT2-targeted TEs were lost in the period from germination until 4 DAI. These results indicate that ROS1, DML2, and DML3 are not involved in global demethylation during germination. Indeed, ROS1 and DML2 are very weakly expressed, while DML3 is not expressed during germination (Fig. 4).
Rather, this global demethylation likely occurs in a passive manner by methylation dilution promoted by cell division, as suggested by the enrichment of cell division-related genes among germination-related genes (clusters 5 and 9 in Additional file 2: Figure S3 and Additional file 7: Table S6). Relatively stable mCG and mCHG levels and dynamic reduction of mCHH levels suggest that CG maintenance by MET1 and CHG maintenance by CMT3 are active, whereas RdDM and CMT2 pathways for mCHH establishment and maintenance are not fully active during germination.
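The notion of passive dilution can be made concrete with a toy model. The sketch below assumes that each round of DNA replication produces an unmethylated daughter strand and that a fraction of the methylation is then restored by maintenance or re-establishment; both the model and the `maintenance` parameter are illustrative assumptions, not quantities estimated in this study.

```python
def diluted_level(initial_level, divisions, maintenance=0.0):
    """Methylation level after a number of replication rounds.

    maintenance: fraction (0 to 1) of methylation restored on the newly
    synthesized strand in each round. 0 means pure passive dilution
    (level halves per round); 1 means perfect maintenance (level stable).
    """
    retained_per_round = 0.5 * (1.0 + maintenance)
    return initial_level * retained_per_round ** divisions


# Hypothetical starting level of 3% with no restoration of methylation.
toy_levels = [diluted_level(0.03, n) for n in range(4)]
```

Under this toy model, a single round of replication without any restoration is already sufficient to halve the level, whereas contexts with efficient maintenance stay close to their starting level, in line with the contrast drawn above between mCHH and mCG/mCHG.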
Next, we examined the mCHH pattern changes across TEs during germination (Fig. 6). Col-0 and rdd dry seeds showed slightly different mCHH patterns across RdDM-targeted TEs (Fig. 3e). Compared with WT, mCHH levels dropped near the center of RdDM-targeted TE bodies in rdd mutants. However, similar mCHH patterns were observed within RdDM-targeted TEs in 4 DAI WT (Col-0) and rdd plants, suggesting that reconfiguration could reset aberrant mCHH patterns caused by loss of DNA demethylases (Fig. 6a and b). Although the distribution of mCHH within CMT2-targeted TEs was similar in WT and rdd dry seeds, Col-0 TEs showed higher mCHH levels (Fig. 3f). Both Col-0 and rdd had mCHH peaks at the edges of CMT2-targeted TEs. However, these peaks, which are a consequence of RdDM (Fig. 3f), became less pronounced at 3 DAI in both Col-0 and rdd (Fig. 6c and d), indicating that the rate of mC loss was slower inside TE bodies than at the edges of TE bodies. Since global demethylation is likely passive, this suggests that CMT2 activity started to recover at this stage, whereas RdDM must still be inactive. Indeed, CMT2 expression initiated at 1 DAI, but expression of siRNA biogenesis components stayed low even at 2 DAI, whereas DRM2 was expressed at a steady level (Fig. 4).
Collectively, our data suggest that global passive demethylation reprograms the CHH hypermethylation of the dry seed during the four-day post-germination period.
ROS1 is active in developing seed during late embryogenesis
Overall, active methylation occurs during embryogenesis and passive demethylation occurs during germination. However, mCG levels within sdev-CG DMRs decreased during seed development, especially between mature and post-mature stages (Fig. 2a; Wilcoxon rank sum test: p = 1.7e-19). Nearly 60% of CG DMRs overlapped with genes. mCG in gene bodies, so-called gene body methylation (gbM), is stable because mCG is maintained by MET1 DNA methylase during DNA replication. Since cell division does not occur in the mature stage embryo, we hypothesized that mCG hypomethylation within sdev-CG DMRs was caused by active demethylation. RNA sequencing (RNA-seq) revealed the presence of ROS1 transcripts, but low or absent expression of DME, DML2, and DML3 transcripts in dry seeds, suggesting that ROS1 is active during late embryogenesis (Fig. 4). We compared mCG levels in dry seed of Col-0 and rdd within sdev-CG DMRs. CG hypomethylation within sdev-CG DMRs was retained in dry seed of Col-0, but not in rdd. mCG levels in dry seed of rdd were higher than in dry seed of Col-0 (rdd - Col-0 > 0.2) in 75% (97/130) of sdev-CG DMRs (in both replicates) (Fig. 7a and b). It is unclear whether ROS1 is active throughout seed development, but our data show that ROS1 is expressed and active in developing seed, at least at the late stage of embryogenesis, and generates sdev-CG DMRs.
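The criterion used above (rdd - Col-0 > 0.2 in both replicates) can be expressed as a short sketch. It assumes one Col-0 mCG level per DMR and one list of rdd levels per replicate; this data layout, the function name, and the toy values are hypothetical, and only the threshold of 0.2 and the counts 97/130 come from the text.

```python
def hyper_in_all_replicates(wt_levels, mutant_replicates, min_diff=0.2):
    """Flag DMRs whose mutant level exceeds the WT level by more than
    min_diff in every replicate.

    wt_levels: list of WT methylation levels, one per DMR.
    mutant_replicates: list of lists, one list of levels per replicate,
    in the same DMR order as wt_levels.
    """
    flags = []
    for i, wt in enumerate(wt_levels):
        flags.append(all(rep[i] - wt > min_diff for rep in mutant_replicates))
    return flags


# Toy mCG levels for four DMRs (hypothetical values).
toy_col0 = [0.10, 0.50, 0.30, 0.05]
toy_rdd_rep1 = [0.80, 0.55, 0.90, 0.60]
toy_rdd_rep2 = [0.75, 0.90, 0.85, 0.10]
toy_flags = hyper_in_all_replicates(toy_col0, [toy_rdd_rep1, toy_rdd_rep2])

# Proportion reported in the text: 97 of 130 sdev-CG DMRs.
reported_percent = round(100 * 97 / 130)
```

Requiring the difference in every replicate is a conservative choice: a DMR that passes the threshold in only one replicate, such as the second and fourth toy DMRs, is not counted.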
rdd seeds show increased methylation at endosperm-specific hyper-DMRs
DME and ROS1 are closely related DNA demethylases, but they are active at distinct sites, even in developing seeds. DME locally demethylates TEs in the endosperm, and the demethylated TEs are transcribed, leading to siRNA production [19]. These siRNAs are hypothesized to be transported to the embryo and to reinforce TE methylation there. We compared methylomes of dry seed of Col-0, rdd, ddc, and ddcc and of embryo and endosperm at mid-torpedo to early-maturation stage of Col-0. We identified 44,554 DMRs in all contexts (C-DMRs) among these methylomes (Additional file 4: Table S3). Among these, we found 194 endosperm-specific hyper-DMRs (endo-DMRs) that were methylated in the endosperm but not in the embryo or in dry seeds of Col-0 (Fig. 8). Hierarchical clustering based on differences in DNA methylation levels classified endo-DMRs into 11 clusters (Fig. 8b). Methylation levels within endo-DMRs in clusters 1, 2, 3, 6, 8, 10, and 11 were higher in dry seed of rdd than in dry seed of Col-0, suggesting that ROS1 is required to demethylate these regions during seed development.