RDR6-RdDM targets many transcriptionally active TEs
The switch from an epigenetically silenced state to transcriptional activation is known to trigger Pol II expression-dependent mechanisms of TE silencing such as RDR6-RdDM on the single-locus level [12]. To examine genome-wide methylation states of both active and inactive TEs, we generated a dataset containing whole-genome MethylC-seq of nine key RdDM mutant genotypes in the wild-type Columbia (wt Col) background as well as the same nine mutant genotypes in the ddm1 mutant background. TE transcription is globally reactivated in the ddm1 mutant (Additional file 1: Figure S1) [23], whereas the RdDM mutants that we investigated generally do not show TE transcriptional reactivation, or at least show reactivation that is far less severe than in ddm1. For example, even in pol V mutants, which are defective for all RdDM [30], global TE activation is minimal compared to ddm1 (Additional file 1: Figure S1) [19, 22]. Therefore, in this study any genotype without ddm1 is referred to as the TE-silent context, and our dataset distinguishes RdDM types in both the TE-silent context and the globally reactivated ddm1 TE-active context.
We determined that using only uniquely mapping sequencing reads resulted in reduced coverage of repetitive TE regions; however, sequencing coverage was high enough to assay RdDM dynamics of individual TE copies (see “Methods,” Additional file 2: Results, and Additional file 3: Figure S2). To identify the regions of the genome targeted by RDR6-RdDM (and contrast them to the regions regulated by Pol IV-RdDM), we identified differentially methylated regions (DMRs) between all of the genotypes (see “Methods”). Aligning the DMRs, we find that the average wt Col and rdr6 CHH methylation patterns are indistinguishable, demonstrating that RDR6-RdDM plays a minor genome-wide role in the TE-silent context (Fig. 1a, replicate data in Additional file 4: Figure S3A). In contrast, pol IV mutants lose methylation from the DMRs, confirming that Pol IV-RdDM functions to target CHH methylation on a genome-wide level in the TE-silent context (Fig. 1a) [22, 31]. In addition, we assayed the loss of methylation when both RDR6- and Pol IV-RdDM are lost (in pol IV rdr6 double mutants) and found that this methylation level is slightly reduced compared to the pol IV single mutant (Fig. 1a), demonstrating that RDR6-RdDM plays a minor role when Pol IV-RdDM is mutated (see section below on RdDM compensation). In the ddm1 TE-active context, the overall CHH methylation level is reduced compared to the TE-silent context (Fig. 1a, replicate data in Additional file 4: Figure S3A) [19]. In addition, the ddm1 rdr6 double mutant shows lower CHH methylation compared to the ddm1 single mutant (Fig. 1a, replicate data in Additional file 4: Figure S3A), demonstrating a genome-wide role for RDR6-RdDM when TEs are reactivated.
In both the TE-silent and ddm1 TE-active contexts, loss of CHH methylation in pol IV mutants is most pronounced near the edge of the DMR and less so in the center of the DMR (Fig. 1a). To determine if this loss is due to Pol IV-RdDM functioning specifically at edges of long DMRs or if this effect is due to Pol IV-RdDM’s preference for short TE targets [19], we investigated only DMRs over 2 kb. We found that in the TE-silent context Pol IV-RdDM functions preferentially on long DMR edges, as the CHH methylation in pol IV mutants is lost more at the edge compared to the center of a >2 kb DMR (Fig. 1b). At the same time, we found the peak of high CHH methylation at the DMR edge (compared to the body of the DMR) in wt Col and ddm1 is a function of small DMRs in our analysis, as when only DMRs >2 kb are assayed, the CHH methylation values in wt Col or ddm1 are consistent over the length of the entire DMR (compare Fig. 1a to 1b, replicate data in Additional file 4: Figure S3A, B). Therefore, at least in the TE-silent context, Pol IV-RdDM targets short DMRs as well as the edges of long DMRs.
A DMR is a computationally identified feature that may span multiple TEs and genes or which may be as short as 4 bp. We found that analysis of the alignment of CHH methylation states of annotated genomic features (such as genes or TEs) was more informative than an analysis of the arbitrary edges of DMRs. For genes, we find that there is low average CHH methylation that is unaltered by Pol IV- or RDR6-RdDM, and we confirm that Pol IV-RdDM is responsible for gene-flanking methylation [22, 32], while RDR6-RdDM does not act near genes (Fig. 1c). For TEs, similar to our findings with DMRs, we find that rdr6 shows a CHH methylation loss only in the ddm1 TE-active context but not the TE-silent context (Fig. 1d, replicate data in Additional file 4: Figure S3D). We also observed that loss of CHH methylation in ddm1 rdr6 mutants occurs not specifically at the edge (as with Pol IV-RdDM at TE edges, see Fig. 1d), but rather acts over the length of the entire long TE and mostly from the TE internal region (Fig. 1e, replicate data in Additional file 4: Figure S3E). Interestingly, in the TE-active context Pol IV-RdDM acts like RDR6-RdDM throughout the length of the entire >2 kb TE (Fig. 1e). We observed this differential role of Pol IV-RdDM with DMRs as well (Fig. 1b) and these data demonstrate that the function of Pol IV-RdDM to reinforce silencing at short TEs and TE edges expands to silencing TE internal body coding regions when TEs are activated. In addition, for TEs >2 kb we find that the pol IV rdr6 double mutant has lower CHH methylation levels compared to either the rdr6 or pol IV mutants in either the TE-silent or TE-active context (Fig. 1d, e). This demonstrates that the finding on the single-locus level that some TEs are subject to both Pol IV- and RDR6-RdDM to direct full TE CHH methylation [12, 16] is also true on the genome-wide level.
To assess the role of Pol II expression on RdDM dynamics, we focused our analysis on transcriptionally competent TEs by identifying elements with direct evidence of mRNA production in ddm1 mutant plants (see “Methods”). For this set of 2374 TEs (7.6 % of all TEs) in the TE-silent context, we find that RDR6-RdDM does not function and Pol IV-RdDM’s role is reduced and primarily contributes to the edges of long TEs (Fig. 1f, g, replicate data in Additional file 4: Figure S3F, G). When this set of TEs is specifically transcribed, we find that RDR6-RdDM plays a larger role in TE methylation compared to Pol IV-RdDM, and this is pronounced in the internal regions of long TEs. Therefore, we conclude that RDR6-RdDM targets transcriptionally active TEs on the genome-wide level.
Dataset capture of both Dicer-dependent and Dicer-independent RdDM
Recent data have demonstrated that RdDM can occur through a Dicer-independent mechanism by which either transcribed or processed un-Diced RNAs of ~30–40 nt are trimmed into various small RNA sizes including 21–24 nt siRNAs [33–36]. This Dicer-independent production of small RNAs was shown to occur on both Pol IV- and Pol II-derived transcripts. While Dicer-dependent production generates specific siRNA size classes, Dicer-independent siRNA production creates small RNAs of all sizes, known as small RNA laddering [35]. We aimed to remove all Dicer-dependent and Dicer-independent TE RdDM at the same time by using a pol IV rdr6 double mutant. The pol IV mutation abolishes Pol IV transcript accumulation upstream of Dicer-independent or Dicer-dependent siRNA production [33, 36]. Because we cannot mutate Pol II’s function without affecting essential non-RdDM networks, we mutated rdr6 to block the production of dsRNA from Pol II transcripts. By using siRNA laddering as a readout of Dicer-independent siRNA production, we find that loci that undergo Pol II-dependent RdDM require RDR6 production of dsRNA before either Dicer-dependent or Dicer-independent RdDM (Fig. 2). For example, the TAS3 locus loses CHH methylation in rdr6 but not pol IV mutants (Fig. 2a), confirming that the TAS3 locus is a target of RDR6-RdDM in the TE-silent context [9]. When RDR6 is functional and DCL2, DCL3, and DCL4 are mutated, Dicer-independent processing occurs and generates a ladder of TAS3 siRNA sizes (Fig. 2b) [34, 35]. However, when RDR6 is non-functional (in the rdr6 mutant), TAS3 siRNAs and laddering are not produced, demonstrating that RDR6 is upstream of Dicer-independent processing (Fig. 2b). The same is true of TE siRNAs: in the TE-silent context they are all dependent on Pol IV (Fig. 2c) and in the TE-active context siRNA laddering does not occur in the ddm1 pol IV rdr6 triple mutant as it does in ddm1 dcl3 (Fig. 2d).
This result demonstrates that like Pol IV activity, RDR6 activity on TE mRNAs occurs before the Dicer-independent siRNA production that generates siRNA laddering. Therefore, the pol IV rdr6 double mutant represents the removal of the majority of the upstream dsRNA that drives Dicer-dependent or Dicer-independent RdDM in either the TE-silent or TE-active context. Correspondingly, we find that Pol IV and RDR6 are responsible for nearly all TE RdDM in the TE-silent or TE-active contexts (Fig. 1d, e, Additional file 2: Results, and Additional file 5: Figure S4A), but this level is not 100 % as we have identified a distinct pathway of TE RdDM that is not dependent on either Pol IV or RDR6 (see below).
Upon TE transcriptional activation, three RdDM mechanisms target genome-wide TE methylation
To characterize the methylation pathways that act on each TE genome-wide, we calculated the CHH methylation level for each of the annotated TE elements and fragments in the Arabidopsis genome (31,189). We were able to successfully cover and individually assay 29,252 (93.8 %) of all TEs, with the majority of TEs lost representing small high-copy TE fragments. We grouped TEs by their mechanism of CHH methylation: no CHH methylation, Pol IV-RdDM (dependent on Pol IV), RDR6-RdDM (dependent on RDR6), and maintenance methylation (not dependent on any RdDM) (see “Methods”) (Fig. 2e). The corresponding CG and CHG methylation analysis is shown in Additional file 6: Figure S5 and replicate data of CHH methylation patterns for key genotypes is shown in Additional file 4: Figure S3H. Similar to the TEs that have been individually investigated and determined to be targets of RDR6-RdDM [12, 16], we found that both RDR6-RdDM and Pol IV-RdDM can target the same TE, providing a distinct co-regulated category (Fig. 2e). In addition, we identified a category of TEs that are methylated by a new pathway of DCL3-dependent 24 nt siRNAs which are not produced from Pol IV, a pathway we refer to as DCL3-RdDM (see below).
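The grouping of TEs by CHH methylation mechanism can be pictured as a decision rule over per-genotype methylation values. The following is a minimal sketch under stated assumptions and is not the procedure described in “Methods”: the function name, the genotype keys, the order of the tests, and both thresholds (min_meth and loss_frac) are hypothetical and chosen only for illustration.

```python
def classify_te(chh, min_meth=0.01, loss_frac=0.5):
    """Assign one TE to a CHH methylation category (illustrative sketch).

    chh maps a genotype label to the CHH methylation fraction of this TE.
    Assumed keys: 'ref' (wt Col or ddm1 baseline), 'polIV', 'rdr2',
    'rdr6', 'dcl3', and 'polIV_rdr6' (double mutant).
    min_meth and loss_frac are assumed cutoffs, not the authors' values.
    """
    ref = chh["ref"]
    if ref < min_meth:
        return "no CHH methylation"

    def lost(genotype):
        # Methylation counts as lost if it drops below loss_frac of baseline.
        return chh[genotype] < loss_frac * ref

    if lost("polIV"):
        residual = chh["polIV"]
        # Residual methylation that disappears in the double mutant is
        # treated here as evidence of co-regulation by RDR6-RdDM.
        if residual >= min_meth and chh["polIV_rdr6"] < loss_frac * residual:
            return "Pol IV-/RDR6-RdDM co-regulated"
        return "Pol IV-RdDM"
    if lost("rdr6"):
        return "RDR6-RdDM"
    if lost("dcl3") and not lost("rdr2"):
        return "DCL3-RdDM"
    return "maintenance methylation"


# Hypothetical TE that loses methylation only in dcl3.
example = {"ref": 0.10, "polIV": 0.09, "rdr2": 0.10,
           "rdr6": 0.095, "dcl3": 0.02, "polIV_rdr6": 0.09}
print(classify_te(example))
```

In this sketch the Pol IV test is evaluated first, so a TE that loses methylation in both the pol IV and rdr6 single mutants falls into a Pol IV branch; a real classification would need rules and cutoffs tuned to sequencing coverage and replicate variance.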
Genome-wide distribution of TE CHH methylation in the TE-silent context demonstrates that roughly one-third of TEs do not have CHH methylation, roughly one-third of TEs are not going through RdDM and are subject to only maintenance CHH methylation via CMT2, and roughly one-third are regulated by Pol IV-RdDM (Fig. 2e). This confirms that when TEs are silenced, maintenance methylation and Pol IV-RdDM are the major pathways that mutually exclusively target TE CHH methylation [7]. We find that pol IV and rdr2 mutants have less TE CHH methylation than dcl3 mutants (Fig. 2e), supporting the Dicer-independent function of Pol IV/RDR2-derived siRNAs in RdDM [35, 36]. On a genome-wide level very few TEs are targeted by DCL3-RdDM or RDR6-RdDM in the TE-silent context, although this number is not zero and we have previously identified a TE that is subject to RDR6-RdDM in wt Col [16]. Consequently, very few TEs in the TE-silent context are regulated by AGO1 and the TEs regulated by AGO6 are targeted through 24 nt siRNAs and the Pol IV-RdDM pathway (Fig. 2e). In addition, we find evidence of 1547 TEs that are primarily targeted by Pol IV-RdDM, but upon loss of Pol IV, these TEs have low levels of RDR6-dependent CHH methylation, demonstrating that they are acted upon by both Pol IV-RdDM and RDR6-RdDM (Fig. 2e). By analyzing mutants that at the same time lose both Pol IV- and RDR6-RdDM, we are able to detect that these two distinct pathways do not function completely independently, but rather one can compensate for the loss of the other (Additional file 7: Figure S6).
We find on the genome-wide level that RdDM regulates more TEs when they lose transcriptional silencing and this is due to an increased number of TEs targeted by the Pol II expression-dependent RdDM pathways (Fig. 2e, Additional file 5: Figure S4B). Compared to the TE-silent context, in the TE-active context we observe in our dataset an increase in the number of TEs that go through RDR6-RdDM (4.3-fold higher in Fig. 2e), DCL3-RdDM (6.2-fold higher), and the Pol IV-/RDR6- co-regulated RdDM category (2.7-fold higher). We find that roughly one-third (31.6 %) of TEs with CHH methylation in the TE-active context are regulated by RDR6-RdDM (either RDR6-RdDM alone or co-regulated with Pol IV-RdDM). We used the TEs identified in our analysis as regulated by RDR6-RdDM to investigate a replicate dataset and determined that ~50 % of TEs are not covered in the replicate dataset, ~25 % display RDR6-dependent methylation in both datasets, and ~25 % failed to replicate (Additional file 4: Figure S3I). The fraction of RDR6-RdDM TEs that could not be replicated may either be false positives in our analysis or bona fide RDR6-RdDM targets identified but not replicated as a result of the four-fold increase in TE methylation resolution between datasets (Additional file 4: Figure S3J) due to our improved TE mappability (see “Methods” and Additional file 2: Results). Our data demonstrate that RDR6-RdDM functions not just on three TEs (as previously shown [12, 16]), but on hundreds of individual TEs, and this pathway was likely previously not identified due to the lack of transcriptionally active TEs in wt Col and the activity of Pol IV-RdDM on many of the same TE target loci. A corresponding reduction (we find 2.3-fold) takes place in the number of TEs regulated by maintenance methylation in the ddm1 TE-active context, demonstrating that DDM1 functions in preserving maintenance methylation-based transcriptional silencing [19].
Of the TEs that undergo any type of RdDM in the TE-silent context, the majority (we find 71.9 %) still undergo RdDM in the ddm1 TE-active context, while the other TEs lose CHH methylation completely (Additional file 5: Figure S4C). Pol IV-RdDM is still the major RdDM mechanism targeting TEs in ddm1, as the number of TEs that undergo Pol IV-RdDM (without RDR6-RdDM) in the ddm1 background is roughly equal (we find a 1.3-fold change) compared to the TE-silent context (and we find a 1.1-fold change when the RDR6 co-regulated pathway is considered). When focused on only transcriptionally competent TEs in the TE-active context, the Pol II expression-dependent RdDM pathways play a pronounced role: we find RDR6-RdDM is 17.4-fold higher, DCL3-RdDM is 18.4-fold higher, and the co-regulated RDR6- and Pol IV-RdDM pathway is 3.6-fold higher compared to the TE-silent context. At the same time, Pol IV-RdDM has decreased function on transcriptionally active TEs (we find a 0.6-fold change) (Additional file 8: Figure S7A). Therefore, we conclude that RDR6- and DCL3-RdDM are the major activated pathways upon TE transcriptional activation and these pathways preferentially act on TEs transcribed into mRNAs. As a consequence of this shift in RdDM pathways, AGO1 indirectly contributes to the CHH methylation of more TEs (we find four-fold) in the TE-active context due to its role in the production of 21–22 nt siRNAs (Fig. 2e) [15].
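The fold changes quoted above are ratios of TE counts per category between the TE-active and TE-silent contexts, so a value above 1 indicates more TEs in that category upon activation and a value below 1 indicates fewer. As a minimal sketch of that arithmetic, with a hypothetical function name and made-up counts that are not values from our dataset:

```python
def fold_change(count_active, count_silent):
    """Ratio of TE counts (TE-active over TE-silent); None if undefined."""
    if count_silent == 0:
        return None  # a ratio against zero TEs is not meaningful
    return count_active / count_silent


# Hypothetical category counts, for illustration only.
silent_counts = {"category A": 200, "category B": 400}
active_counts = {"category A": 500, "category B": 200}

changes = {name: fold_change(active_counts[name], silent_counts[name])
           for name in silent_counts}
print(changes)
```

With these invented counts, category A gains targets (ratio above 1) while category B loses them (ratio below 1).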
In addition to the number of TE targets, we quantified the amount of CHH methylation that each RdDM pathway contributes to their respective targets. We find that in the TE-active context, when all three RdDM mechanisms are active, Pol IV-RdDM is the most efficient and causes the highest level of CHH methylation, while RDR6-RdDM and DCL3-RdDM cause less overall CHH methylation of their targets (Additional file 5: Figure S4D). The higher efficiency of Pol IV-RdDM may be due to the specialization of this pathway and its components away from post-transcriptional silencing to specifically target RdDM.
Pol II-dependent DCL3-RdDM defines a new mechanism targeting TEs
In our analysis of TE CHH methylation patterns, we identified a category of TEs that loses methylation in dcl3, but not in pol IV or rdr2 mutants (Fig. 2e). In the canonical Pol IV-RdDM pathway, Pol IV/RDR2-derived dsRNAs are processed into 23–24 nt siRNAs by DCL3 (reviewed in [2]). To characterize the Pol IV/RDR2-independent mechanism of DCL3-RdDM, we investigated the AtCopia68 long terminal repeat (LTR) retrotransposon fragment At5TE76210, which is located within an intron of the Agenet domain gene At5g52070. We found that CHH methylation of this TE is present in ddm1, but lost in the ddm1 dcl3, ddm1 ago6, ddm1 drm2, and ddm1 pol V double mutants (Fig. 3a, blue box). Importantly, the CHH methylation is present in ddm1 pol IV and ddm1 rdr2 mutants at a level comparable to that of the ddm1 single mutant, demonstrating that the CHH methylation at this TE is not dependent on Pol IV/RDR2. The DCL3-RdDM mechanism requires Pol V, which acts downstream of siRNA production [37, 38]. Therefore, the downstream chromatin-bound portion of the DCL3-RdDM pathway acts similarly to RDR6- and Pol IV-RdDM to target Pol V scaffolding transcripts with AGO-bound siRNAs, while it is only the upstream siRNA-producing portion of the pathway that differs. This DCL3-RdDM mechanism is responsible for the methylation of few TEs in the TE-silent context, but plays a larger role in CHH methylation of TEs in the ddm1 TE-active context (Fig. 2e) and an even greater role on transcriptionally competent TEs (Additional file 8: Figure S7A), again demonstrating that this mechanism is likely dependent on Pol II transcription of its target loci.
We next aimed to characterize the siRNAs that drive the DCL3-RdDM pathway. This is complicated by the fact that DCL3-RdDM targeted TEs generally have low siRNA mappability (0.78, where 1.0 means that all siRNAs map uniquely; see Additional file 2: Methods for explanation of the mappability calculation), which makes it difficult to determine exactly which siRNAs are produced from these loci. We chose in Fig. 3a to investigate At5TE76210 because the methylation of the TE extends beyond the TE boundary into the single-copy sequence of the At5g52070 gene (Fig. 3a, red box). Therefore, we could unambiguously map siRNAs to this region of the genome and determine which siRNAs are driving its RdDM. We find that 24 nt siRNAs are abundantly produced from this genic region in both wt Col and ddm1, and the majority of these 24 nt siRNAs are not dependent on Pol IV or RDR2 in the ddm1 TE-active context (Fig. 3b). This continued production of 24 nt siRNAs in ddm1 pol IV or ddm1 rdr2 mutants correlates with the continued targeting of this region by RdDM in these mutants (Fig. 3a, red box). The 24 nt siRNAs and RdDM of this region are only dependent on DCL3 (Fig. 3a–c) and thus this represents a mechanism of Pol IV/RDR2-independent production of 24 nt siRNAs via DCL3, which can target RdDM. The production of these 24 nt siRNAs is independent of RDR2 and RDR6 and therefore this represents a distinct mechanism of 24 nt siRNA production and RdDM from the previously described RDR2-dependent or RDR6-dependent mechanisms [39, 40]. In addition, we used biological replicates and single-locus bisulfite sequencing to verify the activity of the DCL3-RdDM pathway at the gene At5g52070 and a distinct TE (At3TE40740 – AtSINE4) in the ddm1 TE-active context (Fig. 3c, d), validating our MethylC-seq data analysis and confirming that the DCL3-RdDM mechanism is not an informatic outlier, but rather a distinct pathway that regulates multiple TEs.
TE length is a key determinant for regulation by each type of RdDM
To determine how individual TEs are selected to go through different RdDM types, we analyzed individual elements from the Athila6A subfamily of gypsy LTR retrotransposons, which are strong targets of RDR6-RdDM in the TE-active ddm1 context (Additional file 3: Figure S2A, C) [12, 16]. The majority (we find 84.6 %) of Athila6A elements are not targeted by RdDM in the TE-silent context and the rest of Athila6A TEs are smaller than 2.0 kb (the full-length Athila6A consensus element is 11.6 kb) (Fig. 4a). This demonstrates that the TE fragments which are too small to encode all of their own proteins (and thus by definition are non-autonomous) are either targets of RdDM or do not have detectable levels of CHH methylation, while the large potentially full-length elements are maintained in a silenced state by CMT2-based maintenance of methylation [16] and not RdDM (Fig. 4a). When transcriptionally activated, more Athila6A elements are targeted by RdDM (62.7 %). Although each RdDM mechanism targets some short TE fragments, we find the median size of the Athila6A TE that Pol IV-RdDM targets is a short 219 bp fragment, the median size that DCL3-RdDM targets is an intermediate sized 1.1 kb, while the median size that RDR6-RdDM targets is 4.5 kb (Fig. 4a). These data suggest that different RdDM mechanisms exist for long, intermediate, and short TEs.
To investigate whether the trend of long TEs specifically targeted by RDR6-RdDM is maintained genome-wide, we categorized all TEs (without Athila6A) by length. We find that almost all TEs with no detectable CHH methylation are small (under 2.0 kb), while most large TEs (>5 kb) undergo maintenance CHH methylation independent of RdDM in the TE-silent context (Fig. 4b). Importantly, for large TEs there is a genome-wide increase in their targeting by each type of RdDM in the TE-active context: the medium and long TEs (>2.0 kb) are statistically over-represented in the RDR6-RdDM and DCL3-RdDM categories (compared to the total genome TE size distribution) (Fig. 4b). Therefore, we conclude that when expressed, long TEs are preferentially targeted by the RDR6-RdDM and DCL3-RdDM pathways. In addition, we investigated whether TE type, proximity to a gene, position on the chromosome, or copy number correlates with RdDM type (Additional file 9: Figure S8). In contrast to TE size, we found that these other factors do not account for the switch from maintenance methylation in the TE-silent context to RdDM in the TE-active context. We did observe trends such as that the TEs without CHH methylation are typically small, high copy, and on the chromosome arms very close to genes, and that the TEs with CHH methylation that are not targeted by RdDM in the TE-silent context (and therefore undergo CMT2-based maintenance methylation) are primarily centromeric/pericentromeric and are far from genes. Pol IV-RdDM preferentially targets chromosome arm TEs near genes, which correlates with previous data [7, 19, 22]. DCL3- and RDR6-RdDM preferentially target TEs far from genes in the centromere/pericentromere and favor the long LTR retrotransposons that are found at these regions and dominate large plant genomes.
We next aimed to correlate the genetic structure of individual TEs with their specific RdDM regulatory pathway. Most Arabidopsis TEs lack a detailed annotation based on structure and RNA expression data. We characterized the well-studied Athila6A consensus TE to define the transcriptional start sites, open reading frames (ORFs), and intron (data summarized in Fig. 4c). We aligned individual Athila6A TEs to the full-length annotated consensus element and categorized them by the CHH methylation pathway in either the TE-silent or TE-active context (Fig. 4c). As in Fig. 4a, we find that in the TE-silent context very few Athila6A elements are targeted by RdDM and these are only small TE fragments. In the TE-active context, the Athila6A elements are spread among the various RdDM categories. Importantly, we find that all full-length elements are specifically targeted by the RDR6-RdDM pathway (Fig. 4c). In addition, Pol IV-RdDM only targets small LTR fragments, while other fragmented or larger internally deleted elements are spread among all of the other CHH methylation pathways, including RDR6-RdDM. Of the known Athila6A features required for autonomous transposition (production of all the necessary proteins required for self-mobilization or mobilization of non-autonomous elements), including LTRs, transcriptional start sites, and ORFs, we find the probability of an element to encode this feature correlates with its CHH methylation pathway in the TE-active context (Fig. 4d). For example, elements targeted by Pol IV-RdDM never (in our dataset) contain any of the Athila6A internal coding region, while elements targeted by DCL3-RdDM always have an internal deletion of the ENV ORF promoter. Of interest, nearly all elements that retain the ENV ORF promoter are targeted for RDR6-RdDM in our dataset, suggesting that this structure is directing RDR6-RdDM activity on these transcripts. 
From these data we determine that RDR6-RdDM does not specifically act only on full-length elements, but all full-length and structurally autonomous Athila6A elements are targeted specifically by RDR6-RdDM. Therefore, which particular small RNA silencing pathway regulates each TE is influenced by the TE's genetic structure.
Structurally autonomous TEs are preferentially targeted by RDR6-RdDM
To determine if the trend of full-length Athila6A elements preferentially targeted by RDR6-RdDM is consistent with all LTR retrotransposons, we profiled each LTR retrotransposon for the presence or absence of seven domains essential for retrotransposition: 5′ LTR, GAG capsid protein, AP protease, RT, RNaseH, INT protein, and 3′ LTR. We determined the probability of each RdDM pathway to target a TE with these domains in the TE-active context (Fig. 5a). We found that elements regulated by RDR6-RdDM generally possess all of the internal protein-coding regions, and in particular TEs with the GAG protein domain are more often targeted by RDR6-RdDM. Similar to our finding with Athila6A, we find that Pol IV-RdDM targets TEs that have a low probability of containing any of the internal retrotransposition-essential domains and elements targeted by DCL3-RdDM have a reduced probability of containing the protein-coding regions GAG, RNaseH, or INT (Fig. 5a).
We next aimed to determine if LTR retrotransposons with all of the domains required for retrotransposition are targeted by one RdDM type. Few LTR retrotransposons have all seven of the domains defined in Fig. 5a, so we clustered the TEs into categories of 1, 2–3, 4–5, and 6–7 domains (inset Fig. 5b). In the TE-silent context, most of the TEs with 6–7 domains are not targeted by RdDM and rather are subject to maintenance methylation. In the TE-active context, Pol IV-RdDM alone acts on few 6–7 domain elements, while the RDR6-RdDM, DCL3-RdDM and co-regulated Pol IV- and RDR6-RdDM categories function on the majority of the elements with all the necessary domains required for retrotransposition. Of note, a trend exists where the higher number of retrotransposition-essential domains an LTR-retrotransposon has, the less likely that TE is to be targeted by Pol IV-RdDM in the TE-active context. These trends remain consistent, but are not further enriched, when the subset of transcriptionally competent TEs is interrogated (Additional file 8: Figure S7B).
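The clustering by domain count can be sketched as a simple binning step. This is an illustrative sketch only: the domain labels, the function name, and the handling of elements with none of the seven domains (a bin not shown in Fig. 5b) are assumptions.

```python
# The seven retrotransposition-essential domains, using assumed short labels.
DOMAINS = ("5' LTR", "GAG", "AP", "RT", "RNaseH", "INT", "3' LTR")


def domain_bin(present):
    """Return the domain-count bin for one LTR retrotransposon.

    present is an iterable of domain labels detected in the element;
    labels outside DOMAINS are ignored and duplicates count once.
    """
    n = len(set(present) & set(DOMAINS))
    if n == 0:
        return "0"  # assumed extra bin for elements with no detected domain
    if n == 1:
        return "1"
    if n <= 3:
        return "2-3"
    if n <= 5:
        return "4-5"
    return "6-7"


print(domain_bin(["5' LTR", "GAG", "RT", "3' LTR"]))
```

Counting each label once means that an element with a duplicated domain annotation is not pushed into a higher bin.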
One outlier TE family to the trends observed in Fig. 5b is ONSEN, a heat-activated Copia LTR retrotransposon (Copia78) [14, 41]. For ONSEN, most elements with 6–7 essential retrotransposition domains are targeted by Pol IV-RdDM in the wt Col background and they remain targeted by Pol IV-RdDM in the ddm1 background (Fig. 5c). Two ONSEN elements behave like most other LTR retrotransposons and in the ddm1 background switch to being regulated by RDR6-RdDM, but ONSEN is unusual in the fact that many near-complete elements remain Pol IV-RdDM targets in the ddm1 background. Why the ONSEN family behaves differently than other LTR retrotransposons is unclear, but it is likely due to the fact that ONSEN is not transcriptionally activated in ddm1 mutants that have not been heat-activated [14].
Full-length TEs are preferentially targeted for mRNA cleavage and secondary siRNA production, driving RDR6-RdDM
Key remaining questions are how and why long autonomous TEs are preferentially targeted by RDR6-RdDM. To investigate this preference, we measured the length of each TE compared to its autonomous consensus element and categorized individual TEs as full-length or TE fragments (see “Methods”). Creasey et al. found that TE mRNAs are targeted for initial cleavage by microRNA-like primary (not dependent on an RDR protein) small RNAs produced elsewhere in the genome [29]. They also demonstrated that this cleavage is responsible for initiating RDR6-dependent RNAi and production of 21–22 nt secondary siRNAs [29] and these secondary siRNAs drive RDR6-RdDM [12, 16]. We used these mRNA cleavage data from the same tissue of wt Col TE-silent and ddm1 TE-active context genotypes to determine if the preference of full-length TEs to enter RDR6-RdDM is dictated on the mRNA-cleavage level. Therefore, we compared the percentage of full-length and fragmented TEs that are cleaved by primary small RNAs. As expected, few TE mRNAs are cleaved in the TE-silent context (Fig. 6a), since not many full-length TEs are expressed in wt Col (Additional file 1: Figure S1). In the TE-active context more TE mRNAs are expressed and are cleaved and we detected that the cleaved TE mRNAs are mostly from full-length TEs (Fig. 6b). This trend holds true for all TE types and is not specific to LTR retrotransposons (Additional file 10: Figure S9A, B). Additionally, by comparing the cleavage data from rdr6 and ddm1 rdr6 mutants, we were able to categorize TEs specifically targeted by primary small RNAs (not dependent on RDR6) or secondary siRNAs (dependent on RDR6) in both the TE-silent and TE-active backgrounds (Fig. 6c). We find that the small amount of detectable TE mRNA cleavage in the TE-silent context is occurring via primary small RNAs and in the TE-active context both primary and secondary small RNAs cause TE mRNA cleavage (Fig. 6a, b).
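The comparison against the rdr6 mutants can be pictured as a set operation on cleaved TEs. The sketch below is a simplification under stated assumptions: it works at the level of whole TEs rather than individual cleavage sites, and the function name and TE identifiers are hypothetical.

```python
def split_cleavage(cleaved_in_background, cleaved_in_rdr6):
    """Partition cleaved TEs by dependence on RDR6.

    cleaved_in_background: set of TE ids with mRNA cleavage in wt Col or ddm1.
    cleaved_in_rdr6: set of TE ids still cleaved in the matching rdr6 mutant
    (rdr6 or ddm1 rdr6).
    Returns (primary, secondary): cleavage that persists without RDR6 is
    attributed to primary small RNAs, cleavage that is lost to secondary siRNAs.
    """
    primary = cleaved_in_background & cleaved_in_rdr6
    secondary = cleaved_in_background - cleaved_in_rdr6
    return primary, secondary


# Hypothetical identifiers, for illustration only.
ddm1_cleaved = {"TE_a", "TE_b", "TE_c"}
ddm1_rdr6_cleaved = {"TE_a", "TE_d"}
primary, secondary = split_cleavage(ddm1_cleaved, ddm1_rdr6_cleaved)
print(sorted(primary), sorted(secondary))
```

A TE placed in the primary set here may also be cleaved by secondary siRNAs at other sites; resolving that would require site-level cleavage data.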
We next aimed to determine if the preference for full-length TE mRNA cleavage in the TE-active context results in secondary siRNA production from specifically the full-length cleaved TEs. In the TE-silent context, TEs that produce either cleaved or uncleaved mRNAs generate similar siRNA distributions, which are predominantly 24 nt (Fig. 6d), demonstrating that in the TE-silent context the small amount of TE cleavage does not lead to additional siRNA production. In the TE-active context, cleaved TE mRNAs generate RDR6-dependent 21–22 nt siRNAs, while as expected the uncleaved TE mRNAs do not (Fig. 6d). In addition, it is only the TEs with cleaved mRNAs in the TE-active context that are subject to RDR6-RdDM without Pol IV-RdDM compensation (Fig. 6e). Therefore, the reason most full-length structurally autonomous TEs are targeted by RDR6-RdDM in the TE-active context is: (1) full-length TEs are preferentially cleaved by primary small RNAs (Fig. 6a, b); (2) only cleaved TE mRNAs in the TE-active context make RDR6-dependent secondary 21–22 nt siRNAs (Fig. 6d); (3) only secondary 21–22 nt siRNA production drives RDR6-RdDM [16].
New primary sRNAs that accumulate in the TE-active context direct TE mRNA cleavage and drive RDR6-RdDM specificity
Since cleavage of full-length TE mRNAs can be detected in both the TE-silent and TE-active contexts (Fig. 6a, b), we wondered why RDR6-RdDM is only activated in the TE-active context. We therefore aimed to determine if secondary siRNAs generated in the TE-active context are from: (1) the same TE mRNAs cleaved in both the TE-silent and TE-active context; or (2) from cleavage of new TE mRNAs that were not expressed or uncleaved in the TE-silent context. We found that there are new TEs subject to mRNA cleavage in the TE-active context and these mRNAs produce 21–22 nt secondary siRNAs (new ddm1 cleaved, Fig. 6f). Additionally, we found that for the TE mRNAs that are cleaved in the TE-silent context (which do not produce secondary siRNAs), in the TE-active context these exact same TE mRNAs produce 21–22 nt secondary siRNAs (wt Col cleaved, Fig. 6f). Therefore, why does the same cleaved TE mRNA not produce secondary siRNAs in the TE-silent context while it efficiently produces secondary siRNAs in the TE-active context? We generated four hypotheses: (1) increased mRNA expression and hence increased mRNA cleavage at the same site in the ddm1 TE-active context drives secondary siRNA production; (2) equal numbers of new TE mRNA primary cleavage sites accumulate in the TE-active context, resulting in secondary siRNA production; (3) cleavage by multiple primary sRNAs drives secondary siRNA production only in the TE-active context [42]; and (4) the primary sRNAs directing TE mRNA cleavage are 21 nt in the TE-silent context, but are 22 nt in the TE-active context, a size shift that is known to induce secondary siRNA production [29, 43]. We individually tested these hypotheses (Fig. 6g, h, Additional file 10: Figure S9C, D) and found that TE mRNA cleavage occurs at new distinct sites within the TE mRNAs, driving RDR6 function and secondary siRNA production (Fig. 6g), while the size of the siRNA, level of TE mRNA, and the number of cleavage sites did not contribute (Fig. 6h, Additional file 10: Figure S9C, D). We observed that many TE mRNAs are cleaved once in the TE-silent context and once in the TE-active context, but the change in the primary sRNA and/or cleavage site results in secondary siRNA production only in the TE-active context (Fig. 6g, h). Thus, new primary small RNAs with the same size distribution appear in the TE-active context and cleave the same TE mRNAs at new positions and are responsible for the TE 21–22 nt secondary siRNA production that drives RDR6-RdDM of full-length elements specifically in the TE-active context.