EpiMethylTag simultaneously detects ATAC-seq or ChIP-seq signals with DNA methylation

Activation of regulatory elements is thought to be inversely correlated with DNA methylation levels. However, it is difficult to determine whether DNA methylation is compatible with chromatin accessibility or transcription factor (TF) binding if assays are performed separately. We developed a fast, low input, low sequencing depth method, EpiMethylTag that combines ATAC-seq or ChIP-seq (M-ATAC or M-ChIP) with bisulfite conversion, to simultaneously examine accessibility/TF binding and methylation on the same DNA. Here we demonstrate that EpiMethylTag can be used to study the functional interplay between chromatin accessibility and TF binding (CTCF and KLF4) at methylated sites.


Introduction 1
The role of DNA methylation (DNAme) in gene regulation has been widely described [1][2][3][4]. In 2 general, methylation is thought to reduce accessibility and prohibit TF binding at enhancers and 3 promoters [5,6]. Nevertheless, TFs are also known to bind methylated DNA [2], but due to 4 limitations in the techniques available for this kind of analysis, few genome wide studies have 5 been performed. As a result, we still know very little about the DNA sequence and chromatin 6 context of TF binding at methylated sites and its significance to gene regulation. 7 Several techniques have been developed to measure DNAme, some more comprehensive than 8 others. Whole genome bisulfite sequencing (WGBS) covers all genomic regions, however, to 9 achieve sufficient sequencing coverage is costly. The alternative, reduced representation bisulfite 10 sequencing (RRBS), which requires less sequencing depth, preferentially captures CpG-dense 11 sequences known as CpG islands that can potentially act as regulatory elements [7]. 12 Nevertheless, both techniques require additional assays on different batches of cells to elucidate 13 the interplay between DNAme, DNA accessibility and TF binding and this does not satisfactorily 14 address the issue of compatibility. Current techniques that simultaneously analyze methylation 15 together with TF binding or accessibility (NOME-seq [8], HT-SELEX [9], ChIP-bisulfite [10], 16 group 4 ( Figure S5f, 288 versus 25 CpGs), group 1 shows a lower level of CTCF enrichment than 1 group 4. This may be due to the confidence of attributing CpGs to a specific group. As shown in 2 Figure S6g, for all clusters, more than half of the CpGs have a high probability of being in the 3 assigned group (> 72%). These data provide insight into CTCF binding and suggest an 4 anticorrelation between high accessibility and high methylation. 5 The MA0139.1 CTCF motif incorporates 2 CpGs: C2 and/or C12 (Fig. 4b, top panel). According 6 to the CTCF logo, we identified more CpGs at position C12 than C2 in the CTCF M-ChIP peaks 7 (4884 versus 921 CpGs, respectively, considering only the CpGs covered by at least 5 reads in 8 both M-ChIP and WGBS). Consistent with the findings from a recent study that analyzed CTCF 9 binding using oligonucleotides rather than genomic DNA [19], CTCF M-ChIP detected higher 10 levels of methylation at C12 compared to C2 (Fig. 4b, bottom panel, compare CTCF M-ChIP C2 11 versus C12, p-value = 1.02e-12). Importantly, CTCF M-ChIP is more suitable than WGBS for 12 detecting the differences (Fig. 4b, bottom panel, compare CTCF M-ChIP versus WGBS, p-value 13 = 0.023). In addition, we found that bi-methylation at both CpGs within the same read is slightly 14 enriched compared to what is expected by random chance (0.97% versus 0.05%) (Figure S7a, 15 χ 2 = 1531, p-value < 0.001). CTCF signal intensity is relatively comparable at the 4 combinations 16 of methylation, with a slight increase for C2 being methylated and C12 unmethylated ( Figure  17 S7b), however the biological significance of this remains to be determined. Nonetheless, 18 sequence variation at the C2 and C12 positions appears to have no effect on methylation levels 19 ( Figure S7c). 20

KLF4 M-ChIP enables characterization of WT versus mutant KLF4 R462A binding 21
Pioneer transcription factors need to access target genes that are inaccessible and whose 22 enhancer and promoter sequences may be methylated. A recent study has shown that a minority 23 of transcription factors (47 out of 1300 examined) including KLF4 can bind to methylated CpG sites [2]. A scatter plot of KLF4 M-ChIP in WT mESCs shows that the majority of CpGs in KLF4 1 peaks display low peak intensity and low methylation (Fig. 4c). However, in contrast to CTCF, 2 the small fraction of peaks with the highest peak intensity also display the highest methylation 3 levels. The study mentioned above [2], revealed that distinct zinc fingers on KLF4 mediate KLF4's 4 binding activity with methylated and unmethylated DNA. Residue arginine 458 on human KLF4 5 was shown to be important for binding to the methylated motif CCmCpGCC [2] (similar to the 6 Jaspar motif MA0039.2 for mouse KLF4). In the mouse protein the equivalent arginine residue 7 lies at position 462. 8 In order to investigate the binding of KLF4 to methylated DNA, we used Klf4 -/-mESCs [20] that 9 express either a WT or mutant version of KLF4 in which arginine 462 has been replaced by 10 alanine (R462A) (Figure S8a-b). We performed KLF4 M-ChIP in both WT and mutant expressing 11 mESC in duplicates. Intersections between replicates were used to identify peaks specific to (i) 12 WT or (ii) mutant versions of KLF4 and (iii) those that were common to both (Fig. 4d). Heatmaps 13 confirm the binding specificity of the two versions of KLF4 and reveal the high reproducibility 14 between duplicates ( Figure S8c). 15 We searched for mouse KLF4 motifs from the Jaspar database, using the FIMO tool from the 16 MEME suite. The two motifs that were identified, MA0039.2 and MA0039.1 can be distinguished 17 by the presence and absence of a CpG dinucleotide, respectively (Fig. 4e, top). The wild-type 18 version of KLF4 has a strong preference for motif MA0039.2 while the mutant loses this 19 preference. Overall the mutant protein has reduced binding to both motifs (Fig. 4e, (Fig. 4f). This result together with the motif findings ( Fig. 4e) Fig. 4g and Figure S8d, most of the 9 CpGs display low levels of methylation in any condition (bottom left corner). Thus, methylation 10 levels do not explain the absence of Mutant KLF4 binding at these sites. 11

Discussion 12
We developed a new method, "EpiMethylTag", that allows the simultaneous analysis of DNA 13 methylation with ChIP-seq or ATAC-seq. EpiMethylTag can be used to analyze the methylation 14 status and coincident accessibility or binding of other chromatin bound transcription factors. 15 Importantly our approach is a fast, low input, low sequencing depth method that can be used for 16 smaller cell populations than existing methods and can adapted for rare cell populations. 17 Specifically, our M-ChIP protocol significantly reduces the input for DNA binding factors such as 18 CTCF. The only published genome-wide ChIP-Bis-Seq for CTCF [12] used 100 ng of 19 immunoprecipitated DNA. Using a Tn5 transposase successfully allowed us to use less than 1ng 20 of immunoprecipitated DNA followed by bisulfite conversion. The number of cells required to 21 obtain 1ng of ChIPped DNA will vary depending on the protocol and the antibody used. ChIP-22 bisulfite [10] and BisChIP-seq [11] use lower cell numbers for H3K27me3. However, such histone 23 modifications in general require less cells for ChIP on TFs such as CTCF or KLF4 because they cover a higher portion of the genome. Although it has not been tested, our protocol may also lower 1 the number of cells required for M-ChIP of histone modifications. 2 EpiMethylTag confirmed that as a general rule, DNA methylation rarely coexists with DNA 3 accessibility or TF binding. Nonetheless, we found M-ATAC peaks of low signal intensity that 4 overlapped with DNA methylation. These peaks were located predominantly in intragenic and 5 intergenic regions and associated with low transcriptional output at gene promoters. This data 6 identifies a class of promoters with high accessibility, high levels of methylation, high H3K4me1, 7 low K3K4me3 and low H3K27ac (Fig. 3d). The biological relevance of such 'poised promoters', 8 remains to be determined. 9 Of note, a recent publication used the same design for the Methyl-ATAC aspect of EpiMethylTag 10 method [21]. As with our approach they show that mATAC-seq detects methylation patterns that 11 agree with both WGBS and Omni-ATAC (improved normal ATAC-seq [22]). By comparing 12 parental and DNMT1 and DNMT3B double knockout HCT116 cells they identified ATAC peaks 13 with increased accessibility that were bound by TFs only in the demethylated cells. However, they 14 did not adapt their approach to analysis of methylated ChIP-seq peaks as we have done. Here 15 we used M-ChIP to characterize the binding of both CTCF and KLF4 to motifs in the context of 16 DNA methylation. 17 Methylation within CTCF motifs is known to be anticorrelated with CTCF binding [3]. Our analysis 18 revealed that M-ATAC peaks containing a CTCF motif have an enriched CTCF intensity at CpGs 19 with intermediate levels of methylation as opposed to low and high levels of methylation. In 20 addition, CTCF M-ChIP revealed that methylation at CpG C2 is lower than at CpG at position 21 C12, a finding that suggests methylation at C2 could have a stronger negative impact on CTCF 22 binding than methylation at C12. Differences of this sort could not be detected by integrating 23 We further demonstrate that M-ChIP could be used to characterize the profiles and methylation 1 status of common WT and mutant KLF4 R462A binding sites. Thus, methylation levels do not 2 explain the absence of Mutant KLF4 binding at these sites and it appears that the mutant does 3 not bind the consensus motif so we cannot investigate the relationship between methylation in 4 the KLF4 motif and binding of WT versus Mutant KLF4 ( Fig. 4f-g). While the biological 5 significance of such differences remains to be investigated, our data demonstrate that 6 EpiMethylTag can be used to provide information about the methylation status of the binding sites 7 for the WT and mutant proteins. This information could not be obtained by performing separate 8 methylation and ChIP-seq experiments. 9

Conclusion 10
In sum, M-ATAC and CTCF M-ChIP reveal a complex interplay between accessible chromatin, 11 DNA methylation and TF binding that could not be detected by WGBS. EpiMethylTag can be used 12 to provide information about the DNA sequence and chromatin context of TF binding at 13 methylated sites and its significance to gene regulation and biological processes. This technique 14 can also be adapted for single cell analysis. with L-glutamine, penicillin/streptomycin, nonessential amino acids, β-mercaptoethanol, 1,000 20 U/mL LIF, and 15% FBS (ESC medium). To remove feeder cells from ESCs, cells were trypsin 21 digested and pre-plated in ESC medium for 30 min. Supernatant containing ESCs was used for 22 further experiments. 23

No. CRL 3216). Lentiviral infection of KLF4 knock-out mESC [20] was performed by spin-infection 8
and the cells were transferred to feeders and expanded with puromycin. After selection, KLF4 9 expression was induced with doxycycline (1ug/ml) for 2 days. Finally, the cells were pre-seeded 10 (30 mins) to remove the feeders and the ES cells were processed as described in the "Cell culture" 11 section. KLF4 protein expression has been checked by western blot using an antibody from 12 Santa-Cruz (#sc-20691, now discontinued) and using H3 as a loading control (anti-H3, Abcam, 13 ab1791). 14

Assembly of the transposase 15
Tn5 transposase was assembled with methylated adaptors as per the T-WGBS protocol[16]. Ten 16 microliters of each adapter with incorporated methylated cytosines (Tn5mC-Apt1 and Tn5mC1.1-17 A1block; 100 μM each; Table S1) were added to 80 μl of water and annealed in a thermomixer and tagmentation was performed for 1 min at 37°C with either 1 μl of the Nextera transposase 23 (ChIP-seq) or the transposase containing the methylated adaptors (M-ChIP, see section "assembly of the transposase" for details). Then, beads were washed twice with TF-WBI (20 mM 1 Tris-HCl/pH 7.4, 150 mM NaCl, 0.1% SDS, 1% Triton X -100, and 2mM EDTA) and twice with 2 TET (0.2% Tween -20, 10 mM Tris-HCl/pH 8.0, 1 mM EDTA). During the last wash, the whole 3 reaction was transfered to a new tube to decrease carry-over of tagmented unspecific fragments 4 stuck to the tube wall. Chromatin was eluted and decrosslinked by 70μl of elution buffer (0.5% 5 SDS, 300 mM NaCl, 5 mM EDTA, 10 mM Tris HCl pH 8.0) containing 20μg of proteinase K for 2 6 hours at 55 °C and overnight incubation at 65 °C. Eluted and purified DNA was either bisulfite 7 converted (CTCF M-ChIP, see section "Bisulfite conversion" for details) or directly amplified 8 (CTCF ChIP-seq, see "Amplification of ATAC-seq and ChIP-seq libraries" for details). 9

Bisulfite conversion 10
Purified DNA was bisulfite converted following the T-WGBS protocol[16] with the EZ DNA 11 methylation kit (Zymo). Oligonucleotide replacement was performed by incubating 9 μl of 12 tagmented M-ATAC or M-ChIP purified DNA with 2 ng of phage lambda DNA as carrier, 2 μl of 13 dNTP mix (2.5 mM each, 10 mM), 2 μl of 10× Ampligase buffer and 2 μl of replacement oligo 14 (Tn5mC-ReplO1, 10 μM; Table S1) in a thermomixer with the following program: 50 °C for 1 min, 15 45°C for 10 min, ramp at −0.1 °C per second to reach 37 °C. 1 μl of T4 DNA polymerase and 2.5 16 μl of Ampligase were added and the gap repair reaction was performed at 37 °C for 30 min. DNA 17 was purified using SPRI AMPure XP beads with a beads-to-sample ratio of 1.8:1 and eluted in 50 18 μl of H2O. 5 μl were kept as an unconverted control sample, and 45 μl was bisulfite converted 19 using the EZ DNA methylation kit (Zymo). Briefly, the gap repair reaction was performed by adding 20

Amplification of ATAC-seq and ChIP-seq libraries 1
Purified DNA (20 μl) was combined with 2.5 μl of each primer and 25 μl of NEB Next PCR master 2 mix as per the original ATAC-seq protocol [13]. For ATAC-seq, DNA was amplified for 5 cycles 3 and a monitored quantitative PCR was performed to determine the number of extra cycles needed 4 not exceeding 12 cycles in total to limit the percentage of duplicated reads. DNA was purified on DNA was purified using SPRI AMPure XP beads with a beads-to-sample ratio of 1:1 and eluted 9 in 20 μl of H2O. 10

Amplification of M-ATAC and M-ChIP libraries 11
Purified converted DNA was amplified as per the original T-WGBS protocol [16]. Briefly, 10 μl of 12 DNA was combined with 1.25 μl of each primer (25 μM each) and 12.5 μl of high-fidelity system 13 KAPA HiFi uracil+ PCR master mix. DNA was amplified for 5 cycles and a monitored quantitative 14 PCR was performed to determine the number of extra cycles needed, not exceeding 12 cycles in 15 total to limit the percentage of duplicated reads. 16

Sequencing of the libraries and data processing 17
For ATAC-seq, ChIP-seq, M-ATAC and M-ChIP, libraries were quantified using Kapa qPCR kit 18 and sequenced using the HiSeq 2500 for paired-end 50 bp reads). ChIP-seq for histone 19 modifications in mESC were downloaded from GEO (H3K4me1: GSM1000121, H3K27ac: 20 GSM1000126, H3K4me3: GSM1000124). Data processing was performed as per the pipeline 21 available on Github (link: "https://github.com/skoklab/EpiMethylTag"). Briefly, reads were trimmed 22 using trim-galore/0.4.4, and aligned to the mm10 assembly of mouse genome using bowtie2 [25] for ChIP-seq and ATAC-seq, and using Bismark/0  For Fig. 3d and Figure S5b, the plots were centered on CpGs in M-ATAC 13 peaks from the different groups highlighted in Fig. 3a. For Fig. 4a shared or specific to either WT or mutant KLF4 were identified using BEDTools [30]. For the ChIP enrichment versus CpG methylation plots, we plotted the peak score versus the beta values of 1 the CpG probes within the peaks, using peaks called via MACS2 for CTCF ( Figure S6b) and via 2 PeaKDEck for KLF4 (Fig. 4c). 3 4 To quantify the probability of clustering CpG probes into low-, medium-, and highly methylated 5 groups we assumed that beta values (ie the sampling mean) is normally distributed with mean 6 the beta value (b) and variance (b(1-b))⁄((n-1)) where n is the total number of reads. This allows 7 us to quantify the probability that each probe belongs to its designated cluster as ( < ' ) − 8 ( < * ) where ' and * are the high and low thresholds of the cluster respectively. In the 9 Figure S6g, the points and corresponding contours are colored based on their designated cluster. 10 The x-axis is the beta value and the y-axis is the probability that beta lies within the cluster limits. 11 For all clusters, more than half of the CpGs have a high probability of being in the assigned group 12 (> 72%). 13      Figure S4d). ***P=1.25e-28 between groups #1 and 2, 8 NS P=0.19 between groups #2 and 3, NS P=0.58 between groups #3 and 4 (Wilcoxon test), and for 9 the cytosines at intragenic regions (right panel, introns, exons, 5'UTR, 3'UTR, see Figure S4d). 10 ***P= 0.0001 between groups #1 and 2, *P= 0.02 between groups #2 and 3, NS P= 0.1 between 11 groups #3 and 4 (Wilcoxon test).

23.
Beard C, Hochedlinger K, Plath K, Wutz A, Jaenisch R: Efficient method to generate 20 single-copy transgenic mice by site-specific integration in embryonic stem cells.