Auxin-inducible degron 2 system deciphers functions of CTCF domains in transcriptional regulation
Genome Biology volume 24, Article number: 14 (2023)
CTCF is a well-established chromatin architectural protein that also plays various roles in transcriptional regulation. While CTCF biology has been extensively studied, how the domains of CTCF function to regulate transcription remains unknown. Additionally, the original auxin-inducible degron 1 (AID1) system has limitations in investigating the function of CTCF.
We employ an improved auxin-inducible degron technology, AID2, to facilitate the study of acute depletion of CTCF while overcoming the limitations of the previous AID system. As previously observed through the AID1 system and steady-state RNA analysis, the new AID2 system combined with SLAM-seq confirms that CTCF depletion leads to modest nascent and steady-state transcript changes. A CTCF domain sgRNA library screening identifies the zinc finger (ZF) domain as the region within CTCF with the most functional relevance, including ZFs 1 and 10. Removal of ZFs 1 and 10 reveals genomic regions that independently require these ZFs for DNA binding and transcriptional regulation. Notably, loci regulated by either ZF1 or ZF10 exhibit unique CTCF binding motifs specific to each ZF.
By extensively comparing the AID1 and AID2 systems for CTCF degradation in SEM cells, we confirm that AID2 degradation is superior for achieving miniAID-tagged protein degradation without the limitations of the AID1 system. The model we create that combines AID2 depletion of CTCF with exogenous overexpression of CTCF mutants allows us to demonstrate how peripheral ZFs intricately orchestrate transcriptional regulation in a cellular context for the first time.
CTCF is a multi-functional protein that organizes chromatin architecture and plays varied roles in transcriptional regulation by acting as both a transcriptional activator and repressor. CTCF represses transcription through its well-described role as an insulator at topologically associated domain (TAD) boundaries or as an enhancer blocker [2,3,4,5,6,7,8]. Transcriptional activation occurs when CTCF acts as a transcription factor at promoters or as a promoter-enhancer looping factor [9,10,11,12]. CTCF is well established as an integral component of chromatin architectural organization [13,14,15,16], and its loss results in TAD integrity collapse, abolished chromatin looping, and genome-wide changes to chromatin accessibility [12, 17, 18]. Despite these large-scale changes, transcriptional dysregulation following CTCF loss is modest [12, 18, 19].
CTCF principally functions through its interaction with DNA, which occurs via its conserved 11-repeat Cys2His2 (C2H2) zinc-finger (ZF) domain [1, 13, 14, 20]. Biochemical and structural studies previously showed that ZFs 2–9 exhibited DNA sequence-specific interactions, while ZFs 1 and 10 lacked functional binding [21, 22]. ZFs 3–7 act as the anchor region of the domain that directly interacts with CTCF’s core consensus sequence motif [21,22,23,24]. Loss of ZFs in this region caused significant disruptions to DNA binding and chromatin organization. Motifs upstream and downstream of the core sequence have been predicted to stabilize CTCF binding via interactions with ZFs at the periphery of the ZF domain [22, 25,26,27]. In addition, Saldana-Meyer et al. partially disrupted ZFs 1 and 10, demonstrating that these regions function through interactions with RNA rather than DNA, which stabilized chromatin binding and had some effects on gene expression and chromatin organization. In addition to the ZF domain, an RNA binding region (RBR) located at the C-terminus supports CTCF dimerization and orchestrates RNA-dependent chromatin organization [29, 30]. Although much has been discovered about how the different domains within CTCF work, previous domain studies were conducted in vitro or in vivo by overexpressing domain mutants while endogenous CTCF remained intact. Therefore, it is challenging to evaluate the crosstalk and impact between the ectopic and endogenous forms. To overcome these limitations, we developed a model to easily switch from endogenous CTCF expression to induced exogenous expression of domain mutants.
Compared with loss-of-function studies designed to disrupt DNA sequences or RNA transcripts, protein degradation studies, including the auxin-inducible degron (AID) system, offer the benefit of directly and reversibly removing the protein of interest without any off-target effects on the genome [31, 32]. However, the AID system does have some limitations, including leaky degradation in the absence of auxin, poor constitutive expression of OsTIR1, and, in particular, the need to use high concentrations of indole-3-acetic acid (auxin) to achieve degradation, which could cause cytotoxicity [33, 34]. To overcome these limitations, Yesbolatova et al. developed a new AID system, AID2, which used a mutated OsTIR1, OsTIR1(F74G), and an auxin analog, 5-phenyl-indole-3-acetic acid (5-Ph-IAA), to achieve rapid and efficient protein degradation using several hundred-fold less drug.
In this study, we demonstrated that the AID2 system improved transcriptional studies following acute degradation of CTCF in the B-ALL SEM cell line by facilitating more rapid degradation of CTCF without cytotoxic effects. As was previously shown, despite causing global chromatin architecture changes, acute loss of CTCF had surprisingly little effect on transcription. To identify regions within CTCF connected with its transcriptional response, we employed a CTCF domain sgRNA library screen that identified the ZF domain as CTCF’s most functional domain. Furthermore, by combining the new AID2 system with induced expression of CTCF ZF mutants, we created a model that can switch from endogenous CTCF expression to induced mutant expression to study specific effects of mutant loss. Here, we revealed that ZF1 and ZF10 were required for binding CTCF to mutually exclusive regions within the genome that exhibited characteristic CTCF binding motifs and were correlated with transcriptional regulation of a subset of genes regulated by CTCF.
AID2 facilitated rapid degradation of CTCF with reduced cellular toxicity
The new CTCFAID2 cell line constitutively expresses OsTIR1(F74G)-P2A-EGFPAID2 in a previously derived SEM B-ALL cell line that contains the endogenous CTCFAIDmClover3 fusion protein and doxycycline-inducible wild-type (WT) OsTIR1 (CTCFAID1), allowing for the comparison between AID1 and AID2 in the same cellular context. Upon the addition of 5-Ph-IAA, the 5-Ph-IAA ligand binds specifically to OsTIR1(F74G) to direct the Skp, Cullin, F-box (SCF) complex to AID-tagged proteins for ubiquitination and degradation (Fig. 1a). CTCFAID2 cells were treated with 10 μM 5-Ph-IAA for 24 h and assessed for mClover3 fluorescence (Fig. 1b). Control RFP expression remained constant, demonstrating that OsTIR1(F74G) was constitutively expressed (Fig. 1b). Following treatment, mClover3 fluorescence, which represented CTCFAID2 and EGFPAID2 expression, was markedly reduced within 2 to 4 h, corresponding to immunoblot analysis that showed CTCFAID2 was undetectable after 4 h of treatment and EGFPAID2 after 2 h (Additional file 1: Fig. S1a). To examine the sensitivity of CTCFAID2 degradation, CTCFAID2 cells were treated with a titration of 5-Ph-IAA from 0.001 to 10 μM over a 6-h time course. Immunoblotting analysis showed CTCFAID2 was undetectable at concentrations as low as 0.01 μM (Additional file 1: Fig. S1b). Concurrently treating cells with 5-Ph-IAA and the proteasome inhibitor MG132 rescued auxin-induced CTCF degradation (Additional file 1: Fig. S1c).
To examine whether requirements of the auxin degradation system, including miniAID tagging of CTCF and OsTIR1 expression, might affect CTCF expression prior to auxin-induced degradation, we examined CTCF expression levels in various cell lines and AID treatment settings. First, SEMWT and CTCFAID1 cells were treated with 500 μM IAA and 1 μg/mL doxycycline, individually and combined, for 24 h. As expected, endogenous CTCF expression in SEMWT cells was unaffected by all treatment conditions (Additional file 1: Fig. S1d). Tagged CTCF expression in CTCFAID1 cells was comparable to SEMWT levels without treatment and with IAA alone. However, induction of OsTIR1 by doxycycline caused leaky degradation of CTCF, a known limitation of the AID system using WT OsTIR1 [33,34,35]. Next, we examined CTCF expression levels in the AID2 system by comparing CTCF expression between SEMWT, SEMOsTIR1(F74G) [SEMWT cells expressing OsTIR1(F74G)], CTCFAID1, and CTCFAID2 cells with and without 1 μM 5-Ph-IAA treatment for 24 h (Fig. 1c). CTCF expression was consistent between SEMWT, SEMOsTIR1(F74G), and CTCFAID1 regardless of OsTIR1(F74G) expression, tagging of CTCF, or 5-Ph-IAA. In addition, no leaky degradation of CTCF was observed in CTCFAID2 cells, which have tagged CTCF and OsTIR1(F74G). Degradation was only observed after adding 5-Ph-IAA, supporting that the AID2 degradation system overcomes the limitation of leaky degradation observed in the previous AID1 system.
To compare the degradation efficiency between AID1 and AID2, CTCFAID2 cells were treated with either 1 μM 5-Ph-IAA (AID2) or 500 μM IAA + 1 μg/mL doxycycline (AID1) for 96 h (Fig. 1d, e). While CTCFAID2 showed undetectable levels of CTCF by 4 h post-treatment, CTCFAID1 required 24 h and 500-fold more drug to achieve similar degradation. Both systems showed degradation was stable over time with no CTCFAID recovery observed by 96 h.
Next, a growth assay was performed to compare cell proliferation over 4 days using the AID1 and AID2 systems. CTCFAID1 and CTCFAID2 cells carrying doxycycline-inducible WT CTCF were included as rescue controls. Cells treated with the AID2 system only showed a very slight decrease in cell number by the end of the assay (Fig. 1f). However, a striking growth retardation was observed following the AID1 treatment regimen that could not be rescued by the induced expression of WT CTCF (Fig. 1g). In addition, cell growth arrest increased in the AID1-treated cells, which paused at the G2-M checkpoint and exhibited a shortened S phase (Fig. 1h, i). To further examine whether treatment conditions, CTCF degradation, or a combination of both were responsible for the severe growth restriction observed using the AID1 system, SEMWT cells and CTCFAID1 cells were treated with 500 μM IAA and 1 μg/mL doxycycline, individually and combined, over a 4-day growth assay. High auxin treatment alone significantly reduced proliferation in SEMWT cells, which was compounded in CTCFAID1 cells following CTCF degradation (Additional file 1: Fig. S1e-f). In contrast, SEMWT, SEMOsTIR1(F74G), CTCFAID1, and CTCFAID2 cells showed no growth defects following 5-Ph-IAA treatment (Additional file 1: Fig. S1g). Taken together, these data support that the AID2 system was superior for rapidly degrading CTCF with reduced cellular toxicity.
CTCFAID2 degradation led to a genome-wide loss of CTCF DNA binding and chromatin looping
As was previously observed following AID1 depletion of CTCF [12, 18], CTCF ChIP-seq showed 5-Ph-IAA treatment globally removed CTCF DNA binding. As expected, the Homer motif analysis identified the CTCF consensus binding motif as the top enriched transcription factor motif (Fig. 1j). CTCF binding peaks were assigned to loci that exhibited CTCF motif enrichment and high confidence reproducibility between replicates. Principal component analysis (PCA) confirmed peaks assigned in the CTCFAID2 treatment groups were highly correlated with significant variation observed between 5-Ph-IAA-treated and untreated cells (Fig. 1k). When reproducible peaks were combined from both treated and untreated conditions, a total of 46,620 peaks were called (Fig. 1l). Upon 24-h treatment with 10 μM 5-Ph-IAA, 27,262 CTCF-bound peaks were significantly reduced, accompanied by a complete loss of 19,291 peaks after CTCF degradation. An additional 67 peaks were called in 5-Ph-IAA-treated cells but with low confidence (Additional file 1: Fig. S2a).
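The peak counts reported above can be cross-checked with simple arithmetic. The short Python sketch below only restates numbers given in the text; the variable names are illustrative and do not come from the authors' pipeline.

```python
# Peak counts as reported in the text (Fig. 1l; Additional file 1: Fig. S2a).
total_peaks = 46_620       # reproducible peaks, treated and untreated combined
reduced_peaks = 27_262     # significantly reduced after 5-Ph-IAA
lost_peaks = 19_291        # completely lost after 5-Ph-IAA
gained_low_conf = 67       # called only in treated cells, low confidence

# The three categories should account for every called peak.
accounted = reduced_peaks + lost_peaks + gained_low_conf

# Share of the combined peak set in each main category, as percentages.
pct_reduced = 100 * reduced_peaks / total_peaks
pct_lost = 100 * lost_peaks / total_peaks

print(accounted == total_peaks)
print(f"reduced: {pct_reduced:.1f}%, lost: {pct_lost:.1f}%")
```

The three categories sum to the 46,620 peaks called, so roughly three in five peaks were reduced rather than lost outright.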
Although CTCF appeared completely degraded following 5-Ph-IAA treatment by immunoblot analysis, a small fraction of CTCF remained, which can be seen at the 27,262 peaks that were retained following 5-Ph-IAA treatment. Persistent CTCF binding following CTCF depletion by either auxin or RNAi has been observed before by us and others [12, 18, 19, 36]. When we further examined the retained peaks, we observed that, prior to treatment, they exhibited a much greater peak intensity than lost peaks (Additional file 1: Fig. S2b). No significant genomic distribution differences were observed when comparing the total population of lost and significantly reduced peaks (Additional file 1: Fig. S2c). However, the genomic distribution between the most significantly reduced or lost CTCF peaks (FDR ≤ 0.05, FC ≥ 2) showed that the significantly reduced CTCF peaks were predominantly located at promoters. In contrast, the lost peaks were found in non-promoter regions (Additional file 1: Fig. S2d). Additionally, while the significantly reduced peaks (FDR ≤ 0.05, FC ≥ 2) were located slightly closer to a TAD boundary, there was no statistical significance when compared to total lost peaks (Kolmogorov–Smirnov test) (Additional file 1: Fig. S2e). As CTCF is a survival essential gene in SEM cells, and we did not notice a significant change in cell proliferation following 5-Ph-IAA treatment (Fig. 1f; Additional file 1: Fig. S1g), the persistent binding of CTCF at a subset of CTCF binding sites may help support survival.
In addition, CTCF HiChIP improved the capture of CTCF-dependent loops compared to HiC and showed a global reduction in CTCF anchored chromatin loops following 6 h of 5-Ph-IAA treatment, with a strong correlation observed between replicates (Fig. 1m; Additional file 1: Fig. S3a-b). Untreated cells exhibited 7220 chromatin loops, of which 6852 were lost following CTCF degradation (Additional file 1: Fig. S3b). A small set of loops were either retained or gained; however, the gained, or “new,” loops had low aggregate peak analysis (APA) signals, and the percentage of both anchors overlapping CTCF peaks was much lower (~ 30%) compared to others (> 60%).
To determine how loop anchors colocalized, CTCF HiChIP loop anchors were compared to H3K27ac data for SEM from GEO (GSM1934089). Based on the common nomenclature, the peaks called for H3K27ac that were not associated with a TSS were considered enhancers (E). Anchors overlapping both a TSS (± 2 kb) and an H3K27ac peak were assigned to promoter (P). All anchors not associated with a TSS or H3K27ac were assigned to CTCF. Notably, there was a similar distribution of P-P (p = 0.066) and E-E (p = 0.045) loops between the retained and lost loops. However, lost loops had significantly fewer P-E loops (p = 0.001245, odds ratio = 1.636, Fisher’s exact test) (Additional file 1: Fig. S3c). It is worth noting that although this difference is statistically significant, the total loop number is low. Therefore, the biological significance of these comparisons requires extensive investigation in the future.
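The anchor classification and the Fisher's exact test described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the counts in the example are made up, and the handling of anchors that lie near a TSS but lack H3K27ac (assigned here to CTCF) is an assumption, since the text does not spell out that case.

```python
from math import comb


def classify_anchor(near_tss: bool, has_h3k27ac: bool) -> str:
    """Label a loop anchor as promoter (P), enhancer (E), or CTCF."""
    if near_tss and has_h3k27ac:
        return "P"       # within 2 kb of a TSS and overlapping H3K27ac
    if has_h3k27ac:
        return "E"       # H3K27ac peak not associated with a TSS
    return "CTCF"        # assumption: all remaining anchors


def fisher_exact_2x2(a: int, b: int, c: int, d: int):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the hypergeometric
    probabilities of every table that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_value = sum(p for p in map(prob, range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts for illustration only (not the loop counts of this study).
odds_ratio, p_value = fisher_exact_2x2(3, 1, 1, 3)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
```

In practice the rows of the table would be loop status (lost or retained) and the columns loop class (for example P-E versus all others). With small totals, as noted above, the resulting p-value should be interpreted cautiously.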
When loop anchor locations were compared to CTCF ChIP-seq, 72.1% of lost loop anchors overlapped CTCF peaks, while 72.4% of retained loop anchors overlapped CTCF peaks, demonstrating the loop anchors colocalized with CTCF binding sites. When we correlated CTCF binding status to loop anchor regions lost after treatment, we observed that 65.4% of lost loop anchors overlapped CTCF retained peaks, while only 11.3% overlapped CTCF lost peaks. Although the small percentage of lost loops that overlapped lost CTCF peaks may initially seem unexpected, as we previously reported, retained CTCF binding after 5-Ph-IAA treatment was observed but with significantly reduced peak intensity compared to untreated samples (Fig. 1l; Additional file 1: Fig. S2a). To assess the correlation of binding intensity to loop anchor loss, we compiled the log2 (fold change) for CTCF peaks and separated them into two groups depending on whether they overlapped lost loop anchors or overlapped retained loop anchors. The results indicated that all CTCF binding was decreased, while the decreasing magnitude of binding at CTCF peaks overlapping the anchors for lost loops was significantly lower when compared to the peaks overlapping retained loops (Additional file 1: Fig. S3d). Therefore, complete abrogation of CTCF binding was not necessary to disrupt CTCF-mediated chromatin looping.
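The overlap percentages above come down to asking, for each loop anchor, whether it intersects any CTCF peak. A minimal Python sketch of that calculation follows. It assumes half-open (chromosome, start, end) intervals and uses made-up coordinates; the function names are hypothetical, and the brute-force comparison stands in for the interval-indexing tools a real analysis would use.

```python
def intervals_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def fraction_overlapping(anchors, peaks) -> float:
    """Fraction of anchors that overlap at least one peak on the same chromosome."""
    if not anchors:
        return 0.0
    hits = 0
    for chrom, start, end in anchors:
        if any(chrom == p_chrom and intervals_overlap(start, end, p_start, p_end)
               for p_chrom, p_start, p_end in peaks):
            hits += 1
    return hits / len(anchors)


# Made-up coordinates for illustration only.
anchors = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks = [("chr1", 150, 160), ("chr2", 300, 400)]
frac = fraction_overlapping(anchors, peaks)
print(f"{100 * frac:.1f}% of anchors overlap a peak")
```

Running the same function separately against the retained-peak and lost-peak sets would give the two percentages compared in the paragraph above.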
AID2 improved gene expression analysis by SLAM-seq following CTCF acute depletion
Using the AID1 system, we previously showed that CTCF depletion for 48 h minimally impacted genome-wide transcription despite the global loss of chromatin architectural integrity and accessibility [12, 17]. To address whether modest transcriptional changes previously observed after CTCF loss were due to the examination of steady-state RNA levels versus nascent transcriptional changes, we used thiol (SH)-linked alkylation for the metabolic sequencing of RNA (SLAM-seq) to compare nascent RNA transcription changes using the AID1 and AID2 systems. SLAM-seq allows for the quantification of newly transcribed mRNAs by incorporating 4-thiouridine (4sU) into the RNA, which is then identified as a thymine-to-cytosine (T > C) conversion in 3′-end mRNA-sequencing (Fig. 2a). CTCFAID1 and CTCFAID2 cells were treated with either 500 μM IAA and 1 μg/mL doxycycline (AID1) or 10 μM 5-Ph-IAA (AID2) for a total of 24 h with collection time points throughout. T > C conversion was successful and showed no strand bias for both AID1 and AID2 treatment groups (Additional file 1: Fig. S4a). PCA of SLAM-seq data revealed that PC1 generally correlated with the time of treatment (Fig. 2b). However, for AID1, this correlation was predominantly treatment dependent as indicated by the clustering of untreated samples (on the left) versus treated samples (clustered together on the right), indicating the nascent gene expression changes contributing to PC1 were in response to + IAA/Dox, not decreasing levels of CTCF (Fig. 2b). In contrast, the treatment samples did not cluster together in AID2, indicating those genes contributing to PC1 were more correlated with treatment time and subsequent CTCF degradation instead of a binary response to + 5-Ph-IAA.
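The principle behind the T > C readout can be illustrated with a toy Python example. This sketch assumes a read already aligned without gaps to a same-length, same-strand reference segment. The function names and the `min_conversions` threshold are hypothetical, and the code omits the strand handling, SNP masking, and base-quality filtering that a real SLAM-seq pipeline needs.

```python
def count_tc_conversions(ref: str, read: str):
    """Count T>C mismatches between a reference segment and an aligned read.

    Returns (conversions, t_sites): the number of reference T positions read
    as C, and the total number of reference T positions covered by the read.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must be the same length (ungapped alignment)")
    conversions = t_sites = 0
    for ref_base, read_base in zip(ref.upper(), read.upper()):
        if ref_base == "T":
            t_sites += 1
            if read_base == "C":
                conversions += 1
    return conversions, t_sites


def is_nascent(ref: str, read: str, min_conversions: int = 1) -> bool:
    """Call a read 'nascent' if it carries at least min_conversions T>C events.

    The threshold is an illustrative assumption, not a value from the study.
    """
    conversions, _ = count_tc_conversions(ref, read)
    return conversions >= min_conversions


# Toy sequences for illustration only.
reference = "ATTGTCA"
read = "ACTGCCA"
print(count_tc_conversions(reference, read))
print(is_nascent(reference, read))
```

Reads without conversions still count toward steady-state expression, which is why the steady-state signal is the superset of converted and unconverted reads.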
As expected, the AID2 system resulted in a time-dependent accumulation of differential nascent RNA transcript populations in response to early CTCF loss starting at 2 h post-5-Ph-IAA treatment (T2) and with a peak of nascent transcript differential expression (DE) observed at 12 h post-treatment (T12) (p ≤ 0.05, |log2FC| ≥ 1) (Fig. 2c). Steady-state (the superset of reads including T > C converted and unconverted reads) DE appeared at T2 and increased at T12 and T24. In contrast, AID1 cells accumulated 1101 nascent transcript and 847 steady-state RNA transcript changes following 2 h of IAA treatment (T2) (Fig. 2c), which did not correspond to a significant reduction in CTCF protein (Fig. 1e), suggesting AID1 treatment conditions could affect transcriptional response.
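The DE cutoff quoted above (p ≤ 0.05, |log2FC| ≥ 1) amounts to a simple two-condition filter. The Python sketch below applies it to a small made-up results table; the gene labels, values, and function names are hypothetical and serve only to show how up- and down-regulated transcripts would be tallied at one time point.

```python
def is_differentially_expressed(log2fc: float, p_value: float,
                                lfc_cutoff: float = 1.0,
                                p_cutoff: float = 0.05) -> bool:
    """Apply the cutoffs from the text: p <= 0.05 and |log2FC| >= 1."""
    return abs(log2fc) >= lfc_cutoff and p_value <= p_cutoff


def tally_de(results: dict) -> dict:
    """Count up- and down-regulated entries in {gene: (log2fc, p_value)}."""
    up = sum(1 for lfc, p in results.values()
             if is_differentially_expressed(lfc, p) and lfc > 0)
    down = sum(1 for lfc, p in results.values()
               if is_differentially_expressed(lfc, p) and lfc < 0)
    return {"up": up, "down": down, "total": up + down}


# Made-up values for illustration only.
example_results = {
    "geneA": (1.8, 0.001),    # passes both cutoffs, up
    "geneB": (-1.2, 0.03),    # passes both cutoffs, down
    "geneC": (0.6, 0.0001),   # significant but small effect
    "geneD": (-2.5, 0.20),    # large effect but not significant
}
summary = tally_de(example_results)
print(summary)
```

Applying the same tally at each time point (T2, T12, T24) would produce the kind of time course summarized in Fig. 2c.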
To determine to what extent AID1 treatment conditions contributed to transcriptional dysregulation, differential gene expression was analyzed between SEMWT and CTCFAID1 cells following AID1 treatment and CTCFAID2 cells following AID2 treatment. Total RNA was collected after 24 h of treatment and total-stranded RNA-seq was performed. When comparing SEMWT and CTCFAID1 cells following AID1 treatment, 196 genes (82 up, 114 down) were differentially expressed, including genes associated with amino acid metabolism, apoptosis, and ER stress (Additional file 1: Fig. S5a-b). Pathways enriched for DE following treatment included the KRIGE amino acid deprivation, hallmark TNFA signaling, and hallmark unfolded protein response gene sets (Additional file 1: Fig. S5c). Spearman’s correlation between AID1-treated SEMWT and CTCFAID1 cells showed some correlation in DE genes, suggesting dysregulation of these genes resulted from auxin treatment alone (Additional file 1: Fig. S5d). When SEMWT cells treated with AID1 conditions were compared to CTCFAID2 cells treated with AID2 conditions, no significant correlation was observed, highlighting that CTCFAID2 cells treated with low auxin did not develop transcriptional dysregulation attributed to high auxin toxicity (Additional file 1: Fig. S5e). CTCFAID1 and CTCFAID2 cells treated with either high auxin and doxycycline (AID1) or 5-Ph-IAA (AID2) demonstrated a correlation of DE genes that corresponded to those genes dysregulated by CTCF loss (Additional file 1: Fig. S5f). Taken together, these data suggest the genes DE following AID1 treatment could be attributed to high auxin toxicity, CTCF degradation, or a combined effect of both conditions. Therefore, the AID2 system, which does not have auxin toxicity, was preferred for studying acute CTCF degradation effects on transcription.
Previously, we showed acute depletion of CTCF in SEM cells resulted in dysregulation of MYC transcription pathways. Gene set enrichment analysis (GSEA) of nascent transcript populations showed early enrichment of decreased DE in hallmark MYC targets V1 and V2 following AID1 treatment, confirming observations from the previous study (Fig. 2d). AID2 also showed a decrease in DE enrichment in the MYC targets V1 and V2 pathways, but the enrichment peaked after 4 h of treatment for MYC targets V2 and 12 h for MYC targets V1, corresponding with undetectable levels of CTCF (Fig. 1d). These data confirmed that MYC was a direct target of CTCF. However, the early response observed for MYC targets using the AID1 system could not be supported by CTCF degradation alone since CTCF was not fully depleted by immunoblot analysis until after 24 h of treatment (Fig. 1e), nor could it be attributed to auxin toxicity since MYC targets were not affected by AID1 treatment in SEMWT cells, suggesting undetermined effects of combined high concentration auxin treatment and CTCF depletion could destabilize MYC targets.
AID2 SLAM-seq revealed additional pathways enriched for increased DE of nascent transcripts, including the tumor necrosis factor alpha (TNFA) signaling via NF-κB pathway and the WNT beta-catenin signaling pathway, suggesting targets within these pathways were directly regulated by CTCF, possibly by its insulation properties (Fig. 2d). Overall, global nascent transcription changes following CTCF loss with the AID2 system were limited, which supported previous observations that CTCF loss minimally impacted global steady-state RNA expression [12, 18, 19].
CTCF domain sgRNA library screen identified functional domains of CTCF
Despite showing a limited impact on global transcription following CTCF loss, we hypothesized certain domains of CTCF, in addition to the well-characterized ZF core binding domain, could sensitize CTCF’s transcriptional response. To identify regions within CTCF that could be targeted to study its transcriptional impact and chromatin binding without fully disrupting the most survival essential domains, we developed a CRISPR (clustered regularly interspaced short palindromic repeats) functional domain single guide RNA (sgRNA) library screen for CTCF. Domain libraries are designed by tiling sgRNAs across a gene’s coding sequence. Nucleic acid disruptions caused by sgRNAs that shift the reading frame in essential genes would drop out of survival screens when targeted before or within essential domains. In contrast, sgRNAs that cause in-frame mutations only drop out when targeting essential domains (Fig. 3a). In the CTCF domain CRISPR sgRNA library, 512 sgRNAs were designed spanning the coding exons of CTCF, 120 sgRNAs were included as positive controls, and an additional 100 non-targeting sgRNAs were included as negative controls. The CTCF sgRNA domain library was cloned into a lentiviral vector with puromycin resistance and CFP fluorescence followed by infection at a low M.O.I. (less than 0.3) into a SEM B-ALL cell line stably expressing Cas9. Following puromycin selection, cells were collected at days 0, 7, and 14 and sequenced for sgRNA distribution (Fig. 3b). The differentially represented sgRNAs were calculated by MAGeCK analysis. As expected, positive control sgRNAs dropped out of the screen while negative control sgRNAs were stably represented on day 14 (Fig. 3c). The Spearman’s correlation coefficient was ≥ 0.9 between the different groups, confirming time-dependent variations in sgRNA distribution (Fig. 3d).
Survival dependency was only observed in the ZF domain, with the most significant reduction of sgRNAs seen in ZFs 2–9 (FDR ≤ 0.05), which corresponded to the region of the domain previously shown to control DNA sequence-specific interactions (Fig. 3e). In addition, ZFs 1 and 10 were also enriched for sgRNA loss, suggesting these peripheral ZFs were functionally relevant.
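Tiling sgRNAs across a coding sequence starts with enumerating every candidate target site. The Python sketch below illustrates that step under stated assumptions: it assumes an SpCas9-style NGG PAM and 20-nt protospacers (the text says only that the cells express Cas9), the function names are hypothetical, and no filtering for off-targets or efficiency is attempted, so it is not the library design procedure used in this study.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq: str, guide_len: int = 20):
    """List candidate protospacers that are followed by an NGG PAM.

    Returns (strand, offset, protospacer) tuples. For the '-' strand the
    offset is measured along the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - guide_len - 3 + 1):
            pam = s[i + guide_len:i + guide_len + 3]
            if pam[1:] == "GG":          # N-G-G, any base in the first position
                hits.append((strand, i, s[i:i + guide_len]))
    return hits


# Toy sequence for illustration only: 20 A's followed by a TGG PAM.
toy_sequence = "A" * 20 + "TGG"
candidates = find_protospacers(toy_sequence)
print(candidates)
```

Because every position is scanned on both strands, overlapping guides are returned, which is what allows a domain library to tile a coding sequence densely.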
ZF1 and ZF10 mutants exhibited disrupted chromatin binding
Since disruption of the core ZF domain globally perturbs CTCF/DNA interactions, we decided to examine how peripheral ZFs 1 and 10, which have not previously been shown to exhibit essential properties, contribute to CTCF function. We designed a model that allowed for cells to switch from endogenous CTCF expression to induced exogenous CTCF expression by combining the CTCFAID2 cell line with induced overexpression of HA-tagged CTCF-WT and CTCF mutants for ZF1 (dZF1), ZF10 (dZF10), and RBR (dRBR), which was included as a negative control since this region was minimally enriched in the domain screen and does not bind DNA (Fig. 4a). Immunoblot analysis confirmed complete degradation of endogenous CTCFAID2 and induced CTCF-HA expression comparable to endogenous for all constructs after induction by doxycycline (Fig. 4b).
The DNA binding affinity of CTCFAID2/dZF1, CTCFAID2/dZF10, and CTCFAID2/dRBR was assessed by performing ChIP-seq using HA-conjugated beads to pull down the induced HA-tagged CTCFs in CTCFAID2/WT, CTCFAID2/dZF1, CTCFAID2/dZF10, and CTCFAID2/dRBR cell lines depleted for endogenous CTCF. Overall, the global binding intensity among the mutants was similar to CTCFAID2/WT (Additional file 1: Fig. S6a). Pearson’s correlation showed that CTCFAID2/dZF10 had the most significant amount of variance in DNA binding when compared to CTCFAID2/WT (Fig. 4c). Heatmaps generated by deepTools also showed that CTCFAID2/dZF10 exhibited the most differential binding (DB) among the groups when compared to CTCFAID2/WT (4084 lost/525 gained) (Fig. 4d). CTCFAID2/dZF1 also showed DB compared to CTCFAID2/WT (1875 lost/582 gained) (Fig. 4e), but less than was observed for CTCFAID2/dZF10. As expected, CTCFAID2/dRBR showed minimal variance in binding when compared to CTCFAID2/WT (Pearson r = 0.94) (Fig. 4c), and DB of peaks was not significant (29 lost/102 gained) (Fig. 4f). Genomic distribution of the CTCF binding peaks lost and gained in CTCFAID2/dZF10 and CTCFAID2/dZF1 was similar to CTCFAID2/WT (Additional file 1: Fig. S6b–h), with a slight increase in promoter localization among gained peaks that were previously observed following acute depletion of CTCF. Notably, the DB profiles between CTCFAID2/dZF1 and CTCFAID2/dZF10 were distinct with little overlap, suggesting these ZFs were independently required for CTCF binding at select genomic locations (Fig. 4g).
Correlation observed between differential gene expression and disrupted CTCF binding of ZF mutants
Since each CTCF ZF mutant exhibited unique dependencies for DNA binding, we hypothesized that loci with disrupted CTCF interactions might correlate with differential gene expression. CTCF binding can regulate transcription either proximally to the transcription start site (TSS) or distally through enhancer regions within the TAD but distinct from TAD boundaries [14, 41,42,43,44,45]. Therefore, the 50-kb region upstream of the TSS through the 50-kb region downstream of the transcription end site (TES) was considered when assigning CTCF binding to DE genes. Steady-state RNA-seq was performed on CTCFAID2 cells following treatment with 10 μM 5-Ph-IAA. A total of 753 DE genes were identified with a stringent cutoff (|log2 fold change| ≥ 1, adjusted P ≤ 0.05, CPM ≥ 1) between treated and untreated groups and correlated with genome-wide HA ChIP-seq from above to create a combined Gene/Peak profile. Pareto optimization identified the top associated Gene/Peak pairs (Fig. 5a). The DB profile of CTCFAID2/dZF10 exhibited the most correlation with DE (Fig. 5b; Additional file 1: Fig. S7a, d), with a strong correlation between decreased DE and DB. In total, 167 genes differentially expressed after CTCF loss exhibited DB in CTCFAID2/dZF10 cells. CTCFAID2/dZF1 also showed DE correlated with DB, with 84 genes DE in regions with DB following ZF1 loss (Fig. 5b; Additional file 1: Fig. S7b, d). As expected, the correlation of DE and DB was not seen in the CTCFAID2/dRBR mutant (Fig. 5b; Additional file 1: Fig. S7c, d).
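The gene-to-peak assignment window described above, and the general idea of a Pareto front, can be sketched in Python as follows. The window logic follows the text (50 kb upstream of the TSS through 50 kb downstream of the TES). The Pareto part is only a generic illustration: the text does not state which objectives were optimized, so the two scores used here are an assumption, and all names and values are hypothetical.

```python
FLANK = 50_000  # 50 kb on each side of the gene, as described in the text


def gene_window(tss: int, tes: int, flank: int = FLANK):
    """Window from 50 kb upstream of the TSS to 50 kb downstream of the TES.

    Using min/max handles both strands, because a minus-strand gene has
    TSS > TES in genomic coordinates.
    """
    return max(0, min(tss, tes) - flank), max(tss, tes) + flank


def peaks_for_gene(gene, peaks):
    """Return the peaks (chrom, start, end) that overlap the gene's window."""
    chrom, tss, tes = gene
    start, end = gene_window(tss, tes)
    return [p for p in peaks if p[0] == chrom and p[1] < end and p[2] > start]


def pareto_front(pairs):
    """Keep (label, x, y) entries not dominated on both scores (higher is better)."""
    front = []
    for i, (label, x, y) in enumerate(pairs):
        dominated = any(qx >= x and qy >= y and (qx > x or qy > y)
                        for j, (_, qx, qy) in enumerate(pairs) if j != i)
        if not dominated:
            front.append((label, x, y))
    return front


# Made-up coordinates and scores for illustration only.
gene = ("chr1", 100_000, 120_000)
peaks = [("chr1", 60_000, 60_200), ("chr1", 200_000, 200_200), ("chr2", 60_000, 60_200)]
assigned = peaks_for_gene(gene, peaks)
pairs = [("g1/p1", 1, 5), ("g2/p2", 2, 4), ("g3/p3", 3, 3), ("g4/p4", 2, 2), ("g5/p5", 1, 1)]
front = pareto_front(pairs)
print(assigned)
print(front)
```

One plausible choice for the two scores would be the magnitude of the expression change and the magnitude of the binding change, but that pairing is a guess rather than something stated in the text.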
A comparison of the loci correlated with DE and DB from CTCFAID2/dZF1 and CTCFAID2/dZF10 showed these ZFs regulated mutually exclusive gene sets (Additional file 1: Fig. S7e). As an example, the ChIP-seq track for DDN showed two unaltered CTCF binding peaks for CTCF in CTCFAID2/WT, CTCFAID2/dZF1, and CTCFAID2/dRBR. However, the CTCFAID2/dZF10 mutant showed a loss of binding at one of the CTCF peaks, which was associated with decreased gene expression (Fig. 5c, d). Similarly, the ChIP-seq track for FGFBP2 showed reduced CTCF binding was limited to CTCFAID2/dZF1, and gene expression increased upon reduced binding (Fig. 5e, f). These examples highlight the unique specificity ZF1 and ZF10 play in regulating the transcription of a subset of genes controlled by CTCF.
Loci regulated by ZF1 or ZF10 were enriched for a CTCF binding motif associated with its respective binding
We hypothesized that genes whose regulation by CTCF was dependent on ZF1 or ZF10 for chromatin binding might require additional DNA signatures unique to each ZF, in addition to the core CTCF consensus sequence motif, for CTCF binding. Motif enrichment analysis for ZF1 and ZF10 mutants was performed on the loci identified with DB from the previous analysis. Loci with decreased CTCF binding in CTCFAID2/dZF1 cells were enriched for the 20 base-pair CTCF consensus sequence motif and an additional DNA sequence overlapping and 3′ of the consensus sequence. The motif clustering automatically identified 5 clusters with slight differences at the CTCF consensus core motifs. Strikingly, all 5 clusters were enriched for a dominant G nucleotide positioned at position + 4 proximal to the core motif. In total, the G signature represented more than 82% of decreased CTCF-bound sites (Fig. 6a). The G signature was not enriched in control peaks, but only in CTCFAID2/dZF1 peaks with decreased binding, highlighting that it was specifically enriched in loci dependent on ZF1 for binding (Fig. 6b). While the interaction between ZF1 and a motif downstream of the consensus sequence motif has been speculated, this is the first direct in vivo evidence that ZF1 interacts with a specific DNA signature.
As observed for ZF1, loci that exhibited decreased CTCF binding upon ZF10 loss were enriched for a sequence 5′ of the core consensus sequence motif that shared similarity to the previously identified and predicted upstream (U) motif [22, 25, 26, 47, 48]. When CTCFAID2/dZF10 DB peaks were compared to CTCFAID2/WT, the upstream sequence motif was observed in 79.1% of decreased CTCF-bound sites (Fig. 6c). The consensus sequences fell into 5 clusters with the majority in clusters 2 and 3. As was seen for ZF1, there was no enrichment of the upstream motif in control peaks that do not change when comparing CTCFAID2/dZF10 against CTCFAID2/WT (Fig. 6d).
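The enrichment of a single nucleotide at a fixed offset from the core motif can be quantified by tabulating base frequencies at that position across sites. The Python sketch below shows the idea on made-up sequences. It assumes each input string is the flank beginning immediately after the core motif and that the offset is 1-based; how position + 4 is anchored in the original analysis is not specified here, and the function name is hypothetical.

```python
from collections import Counter


def base_frequencies_at_offset(flanks, offset: int) -> dict:
    """Frequency of A/C/G/T at a 1-based offset into each flanking sequence.

    Sequences shorter than the offset are skipped. Returns an empty dict if
    no sequence is long enough.
    """
    counts = Counter(seq[offset - 1].upper() for seq in flanks if len(seq) >= offset)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts[base] / total for base in "ACGT"}


# Made-up flanking sequences for illustration only.
decreased_site_flanks = ["ACTGA", "TTAGC", "CCCGT", "AAATT"]
freqs = base_frequencies_at_offset(decreased_site_flanks, offset=4)
print(freqs)
```

Comparing the G frequency in sites with decreased binding against the same statistic in unchanged control peaks mirrors the contrast drawn between Fig. 6a and Fig. 6b.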
Moreover, we also identified a few interesting clusters that demonstrated a conserved sequence signature in CTCFAID2/dZF1 decreased site cluster 5 (Fig. 6a) and CTCFAID2/dZF10 decreased site clusters 4 and 5 (Fig. 6c). The consensus region was too long to be associated with transcription factor motifs. Instead, almost all regions from these clusters could be assigned to a transposable element. Out of 58 sites for CTCFAID2/dZF1 decreased site cluster 5, 40 were LTR41 and 18 were LTR41B. All 82 sites for CTCFAID2/dZF10 decreased site cluster 4 were LTR13. In addition, 63 out of 64 sites for CTCFAID2/dZF10 decreased site cluster 5 could be assigned to LINE L1. It has been reported that LTR13/LTR41/LINE elements were associated with CTCF. Our study confirmed this observation and further suggested that specific ZFs can be related to individual transposable element types. The functional correlation between these motifs and CTCF protein remains an interesting topic for further investigation.
In this study, we demonstrated that the new AID2 degradation system is a powerful tool that degrades endogenous CTCF quickly and efficiently at auxin concentrations hundreds of fold lower than those used for AID1, overcoming the combined off-target effects of high auxin concentrations and CTCF depletion. As was previously observed by us and others using the AID1 system [12, 18, 19], CTCF degradation using the AID2 system led to a genome-wide loss of CTCF chromatin binding and looping with a modest effect on genome-wide transcriptional regulation. A main benefit of the new AID2 system was reduced cellular toxicity following CTCF depletion. In addition, SLAM-seq analysis revealed that degradation of CTCF using the AID1 system caused a significant early shift in nascent and steady-state gene expression that would not be anticipated given the retained CTCF expression at early time points. This shift was not observed using AID2, further supporting that the AID2 degradation system is superior for transcriptional studies.
Although the mechanism by which CTCF interacts with a core consensus DNA sequence via its ZF domain, specifically ZFs 3–7, has been well characterized, the functional significance of the peripheral regions of the ZF domain is less clear. Rhee and Pugh used ChIP-exo to identify a four-part module for CTCF binding, including module 1, which resides upstream of the CTCF binding core motif; modules 2 and 3, which cover the core motif; and module 4, which resides downstream of the core motif. More than half of CTCF binding events required only modules 2 and 3, with the remaining binding events utilizing a combination of three or four modules. The upstream motif has also been confirmed by others [25, 26]. Nakahashi et al. developed a model to study CTCF ZF interactions with DNA by mutating histidine residues that coordinate zinc binding. Their study supported that ZFs 4–7 were critical for binding to the core motif and suggested that the peripheral ZFs stabilized CTCF occupancy, with a stronger contribution to binding attributed to ZFs closer to the core motif. Further studies of the crystal structure of CTCF confirmed that ZFs 3–7 interact with the core motif and supported that ZFs 2–9 make DNA-specific contacts, and ZFs 9–11 were shown to interact with module 1, which is observed in 15% of CTCF binding sites. However, these studies are limited by their reliance on ectopically expressed CTCF mutants in the presence of an endogenous CTCF background or on in vitro biochemical characterization, neither of which is ideal for revealing cellular regulation under physiological conditions.
Soochit et al. generated a similar swapping system using neomycin-resistant lentiviral constructs containing green fluorescent protein (GFP)-tagged wild-type Ctcf or mutants with deletions of individual ZFs together with an IRES-driven Cre in mouse embryonic stem cells (ESCs). ESC lines carrying a loxP-flanked Ctcf allele were infected with these lentiviruses to delete endogenous Ctcf protein and were rescued by constitutively expressed Ctcf mutants at various expression levels. They found that mouse ESCs expressing Ctcf mutants lacking ZF1 (del1) or ZF10 (del10) are viable with reduced proliferation rates. They also identified the same flanking sequences associated with del1 and del10 as described in our study. Of note, while our manuscript was under peer review, a similar study using a ZF1 mutant knock-in MCF7 cell line identified the same conserved G associated with ZF1 binding in the CTCF binding motif that we identified here. Their data, which were highly consistent with ours, support and strengthen our findings, since independent approaches were used in each study. Moreover, our AID2 system also allows expression and functional study to shift from endogenous CTCF to any mutant form, including those affecting cellular viability. We believe this new model system will significantly advance the understanding of CTCF biology in the future.
By combining AID2 degradation of CTCF with induced exogenous expression of CTCF ZF1 and ZF10 mutants, we developed a new cellular tool to study the impact of CTCF domain mutants on CTCF function. Our findings provide the first direct support that ZFs 1 and 10 bind loci through the recognition of individual consensus sequence motifs. They highlight that loci regulated by CTCF contain different CTCF binding motifs depending on how the ZF domain interacts with the DNA, supporting the “CTCF code” model of multivalent binding. Moreover, this binding controls the transcription of a subset of genes regulated by CTCF, uncovering functional roles of the less conserved upstream and downstream motifs and defining CTCF binding motifs for loci regulated by peripheral ZF binding. Future studies mutating additional ZFs and ZF combinations would further define CTCF’s binding specificities. While not performed in this study, the cellular model developed here could be combined with chromatin studies to examine how ZF mutants affect chromatin architecture and organization.
The pCDH-MND-OsTIR1(F74G)-P2A-EGFPAID2-EF1a-RFP construct was made by Gibson assembly. The OsTIR1(F74G)-P2A-EGFPAID2 fragment was amplified from the pAAV-hSyn-OsTIR1(F74G) vector (Addgene, 140730) and cloned into the EcoRI site of pCDH-EF1a-RFP. After cloning, the CMV promoter was replaced by Gibson assembly with an MND promoter to overcome promoter silencing. The inducible CTCF series was created by cloning the WT CTCF and CTCF dZF1, dZF10, and dRBR (− 120 bps following ZF11) into a Tet-on-3G-inducible vector. Primers were designed to amplify CTCF from a pT2K-CTCF-HA construct we had previously cloned. The mutants were generated by designing primers to amplify two fragments that flanked either ZF1, ZF10, or the RBR and would exclude these features after Gibson assembly. SnapGene software was used to design all primers used for cloning. The PCR reactions to amplify all products for cloning were performed using CloneAmp polymerase (Clontech), and the cycling parameters were 98 °C for 5 min, followed by 40 cycles of 98 °C for 15 s, 55 °C for 20 s, and 72 °C for 20 s. Gibson assembly reaction mix was made as previously described, and all reactions were carried out at 50 °C for 20 min. All primer information is provided in Additional file 2: Table S1.
Generation of the CTCFAID2 cell line and cell culture
The CTCFAID2 cell line was created by infecting a previously derived SEM B-ALL cell line, which expresses the endogenous CTCFAIDmClover3 fusion protein and doxycycline-inducible wild-type (WT) OsTIR1 in the AAVS1 safe harbor locus (CTCFAID1), with the pCDH-MND-OsTIR1(F74G)-P2A-EGFPAID2 EF1a RFP construct containing an EGFP AID-tagged control. The SEMOsTIR1(F74G) cell line was created by infecting SEMWT cells with the same construct.
The SEMWT, SEMOsTIR1(F74G), CTCFAID1, and CTCFAID2 cell lines were maintained in RPMI-1640 medium (Lonza) containing 10% fetal bovine serum (FBS) (Hyclone), 2 mM glutamine (Sigma), and 1% penicillin/streptomycin (Thermo Fisher Scientific). All cells were maintained at 37 °C in a 5% CO2 atmosphere and 95% humidity. Cells tested negative for mycoplasma infection. The identity of the SEM cells was confirmed by short tandem repeat (STR) analysis.
Growth assays were performed by plating 1 million cells in 3-mL RPMI medium supplemented with either DMSO or AID1 (500 μM IAA and 1 μg/mL doxycycline) or AID2 (10 μM 5-Ph-IAA) treatment conditions in a 6-well tissue culture plate. Samples were set up in triplicate. Cells were counted using a Countess II automated cell counter (Invitrogen) daily for 4 days. At each count, cells were submitted for cell cycle analysis.
CTCFAID1 cells were cultured in a medium supplemented with doxycycline (1 μg/mL) to induce the expression of OsTIR1 and 500 μM IAA (natural auxin) (Abcam) to induce degradation of CTCF. CTCFAID2 cells were cultured in a medium supplemented with 0.001–10 μM 5-Ph-IAA (MedChemExpress) to induce the degradation of CTCF. CTCFAID2/WT, CTCFAID2/dZF1, CTCFAID2/dZF10, and CTCFAID2/dRBR cells were cultured in a medium supplemented with 10 μM 5-Ph-IAA to induce endogenous CTCF degradation for 6 h and 1 μg/mL doxycycline was added for 18 h to induce exogenous HA-tagged CTCF expression.
To determine the percentage of RFP- or mClover3-positive cells, suspension-cultured CTCFAID2 cells were collected and filtered through a 70-micron cell strainer before flow cytometry sorting. DAPI was added to the cell suspension to exclude dead cells. Fluorescence from mClover3 was detected using the same FL1/FITC channel as GFP.
Cell lysates were prepared by using RIPA buffer. Lysates were run on an SDS-PAGE (Thermo Fisher Scientific) gel and transferred to a PVDF membrane according to the manufacturer’s protocols (Bio-Rad) at 100 V for 1 h. After blocking incubation with 5% non-fat milk in TBS-T (10 mM Tris, pH 8.0, 150 mM NaCl, 0.5% Tween-20) for 1 h at room temperature, membranes were incubated with antibodies against GAPDH (Thermo Fisher Scientific, AM4300, 1:10,000), AID (MBL, M3-214–3, 1:2000), OsTIR1 (MBL, PD048, 1:1000), or CTCF (Santa Cruz, sc-271514, 1:200) at 4 °C overnight with gentle shaking. Membranes were washed three times for 10 min each with TBS-T and incubated with a 1:2000 (CTCF), 1:5000 (AID), or 1:20,000 (GAPDH) dilution of sheep anti-mouse IgG HRP (GE Healthcare, NA931) or 1:5000 (OsTIR1) dilution of donkey anti-rabbit IgG HRP (GE Healthcare, NA340) for 1 h at room temperature. Blots were washed with TBS-T three times for 10 min each and developed with the ECL system (Perkin Elmer) according to the manufacturer’s protocol. Uncropped raw data were provided in Additional file 4.
For CTCF ChIP-seq, CTCFAID2 cells were treated with 1 μM 5-Ph-IAA for 24 h. For HA ChIP-seq, CTCFAID2 cells with induced exogenous expression of HA-CTCF WT, HA-CTCF-dZF1, HA-CTCF-dZF10, and HA-CTCF-dRBR were treated for 24 h with 10 μM 5-Ph-IAA and 18 h with 1 μg/mL doxycycline. Treated and untreated samples were collected in duplicate for all conditions. For each sample, 20 million cells were fixed with 1% formaldehyde for 5 min at room temperature using the Covaris TruChIP Chromatin Shearing Kit (Covaris, 520154). Nuclei were prepared according to the TruChIP protocol and chromatin was sheared in a Covaris milli tube using the Covaris M220 ultrasonicator set at a duty factor of 10 and 200 cycles/burst for 10 min at set point 6 °C. Sheared chromatin was centrifuged for 10 min at 8000 × g and clarified chromatin was moved to a new 1.5-mL Eppendorf tube. Chromatin was amended to a final concentration of 50 mM Tris-HCl pH 7.4, 100 mM NaCl, 1 mM EDTA, 1% NP-40, 0.1% SDS, and 0.5% Na deoxycholate plus protease inhibitors (PI). For CTCF ChIP, 8 μg CTCF antibody (Diagenode, c15410210-50) was added to the chromatin and incubated overnight at 4 °C with gentle rotation. Spike-in chromatin and antibody were added to the chromatin according to the manufacturer’s protocol (Active Motif). The next day, protein-G magnetic beads (Pierce) were washed and added for 4 h incubation at 4 °C with gentle rotation. For HA ChIP, anti-HA magnetic beads (Pierce) were washed, added to the chromatin, and incubated at 4 °C for 4 h with gentle rotation (no spike-in included). Samples were placed on a magnetic stand, unbound chromatin was removed, and beads were washed 2 times with wash buffer 1 (50 mM Tris-HCl pH 7.4, 1 M NaCl, 1 mM EDTA, 1% NP-40, 0.1% SDS, 0.5% Na deoxycholate plus PI) and 1 time with wash buffer 2 (20 mM Tris-HCl pH 7.4, 10 mM MgCl2, 0.2% Tween-20 plus PI). The beads were resuspended in wash buffer 2 and transferred to a new 1.5-mL Eppendorf tube.
Samples were placed on a magnetic stand to remove the wash buffer. DNA was eluted and de-crosslinked in 1X TE plus 1% SDS, proteinase K, and 400 mM NaCl at 65 °C for 4 h. DNA was precipitated by phenol, chloroform, and isopropyl alcohol. Libraries were constructed by NEBNext Ultra II NEB Library Prep Kit and NEBNext Multiplex oligos for Illumina.
The HiChIP protocol was adapted from Mumbach et al. Ten million CTCFAID2 cells were treated with DMSO or 10 μM 5-Ph-IAA for 6 h. Cells were fixed with 2% formaldehyde for 10 min at room temperature. Fixed pellets were lysed in 500 μL of ice-cold HiC lysis buffer with PI (10 mM Tris–HCl pH 8.0, 10 mM NaCl, 0.2% Igepal CA630, 1X cOmplete PI) at 4 °C with rotation for 20 min followed by centrifugation at 2500 × g for 5 min at 4 °C. The pelleted nuclei were washed once with 500 μL of ice-cold HiC lysis buffer + PI, resuspended in 100 μL of 0.5% SDS, and incubated at 62 °C for 10 min with no shaking. After incubation, 285 μL of water and 50 μL of 10% Triton X-100 (Sigma, 93443) were added and incubated at 37 °C for 15 min. Fifty microliters of 10X NEBuffer2 and 15 μL of 25 U/μL MboI restriction enzyme (375 U) (NEB, R0147M) were added, and chromatin was digested on the Eppendorf thermomixer for 2 h at 37 °C with 900 rpm and a heated cover. To inactivate MboI, the reaction was incubated at 62 °C for 20 min, then cooled to room temperature. Restriction fragment overhangs were filled in and marked with biotin by adding 54 μL of fill-in master mix [37.5 μL of 0.4 mM biotin-14-dATP (Life Technologies, 19524-016), 1.5 μL of 10 mM dCTP (Thermo Fisher, R0181), 1.5 μL of 10 mM dGTP (Thermo Fisher, R0181), 1.5 μL of 10 mM dTTP (Thermo Fisher, R0181), 12 μL of 5 U/μL DNA polymerase I, Large (Klenow) Fragment (NEB, M0210)] and incubated on an Eppendorf thermomixer at 37 °C for 1 h and 15 min with shaking speed at 900 rpm. Ligation of proximity ends was carried out by adding 946 μL of ligation master mix [651.5 μL of water, 150 μL of 10X NEB T4 DNA ligase buffer (NEB, B0202), 125 μL of 10% Triton X-100, 7.5 μL of 20 mg/mL bovine serum albumin (NEB, B9000S), 12 μL of 400 U/μL T4 DNA ligase (NEB, M0202L)] and incubating on an Eppendorf thermomixer at 16 °C overnight with shaking speed at 900 rpm.
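As a quick consistency check on the master mixes above, the component volumes can be summed and compared with the stated totals. The snippet below is only an illustrative sketch; the dictionary keys and variable names are our own labels, and the volumes are copied from the text.

```python
# Component volumes in microliters, copied from the protocol text above.
fill_in_mix = {
    "biotin-14-dATP (0.4 mM)": 37.5,
    "dCTP (10 mM)": 1.5,
    "dGTP (10 mM)": 1.5,
    "dTTP (10 mM)": 1.5,
    "Klenow fragment (5 U/uL)": 12.0,
}
ligation_mix = {
    "water": 651.5,
    "10X T4 DNA ligase buffer": 150.0,
    "10% Triton X-100": 125.0,
    "BSA (20 mg/mL)": 7.5,
    "T4 DNA ligase (400 U/uL)": 12.0,
}


def total_volume(mix):
    """Sum the component volumes (uL) of a master mix."""
    return sum(mix.values())


fill_in_total = total_volume(fill_in_mix)    # stated in the text as 54 uL
ligation_total = total_volume(ligation_mix)  # stated in the text as 946 uL
mboi_units = 15 * 25                         # 15 uL at 25 U/uL, stated as 375 U

print(fill_in_total, ligation_total, mboi_units)
```

The computed totals agree with the 54 μL, 946 μL, and 375 U quoted in the protocol.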
Nuclei were collected by centrifugation, washed, and resuspended in Covaris shearing buffer plus PI and sheared in a Covaris milli tube at 30% duty factor, 200 cycles/burst for 10 min in a Covaris LE220-plus. Samples were clarified by centrifugation at 2500 rcf for 5 min at 4 °C, then the supernatant was transferred to a 15-mL DNA LoBind Conical Tube (Eppendorf, 0030122208). About 3 mL of freshly made 1X ChIP dilution buffer + PI (21 mM Tris pH 8.0, 167 mM NaCl, 1 mM EDTA, 1.1% Triton X, 0.1% NP-40) was added to the pellet. Then, 60-μL washed DiaMag protein A/G magnetic beads were added and incubated at 4 °C for 1 h to preclear the lysates. Lysates and beads were separated by placing them on a magnetic stand and lysates were transferred to a new tube. CTCF antibody [8 μg Diagenode CTCF rabbit polyclonal antibody (C15410210-50)] was added and incubated at 4 °C overnight with rotation. Sixty microliters of Diagenode DiaMag protein A/G-coated magnetic beads was washed twice in 200 μL freshly made DiaMag beads wash buffer (1X ChIP dilution buffer + 0.1% BSA) and then resuspended in 1 mL of DiaMag beads wash buffer and rotated at 4 °C overnight. Blocked protein beads were added to the sample lysates and incubated with rotation at 4 °C for 3 h. Beads were washed one time with Cold Low Salt Wash Buffer (20 mM Tris pH 8.0, 150 mM NaCl, 2 mM EDTA, 1% Triton X, 0.1% SDS), 2 times with Cold High Salt Wash Buffer (20 mM Tris pH 8.0, 500 mM NaCl, 2 mM EDTA, 1% Triton X, 0.1% SDS), 1 time with Cold LiCl Wash Buffer (10 mM Tris pH 8.0, 250 mM LiCl, 1 mM EDTA, 1% NP-40, 1% Na deoxycholate), and 2 times with Cold TE Buffer, pH 8.0. DNA was eluted by resuspending the beads in 100 μL of DNA elution buffer (50 mM NaHCO3, 1% SDS) and incubating at room temperature for 10 min with rotation, followed by 3 min at 37 °C with shaking. Samples were placed on a magnetic stand and the supernatant was transferred to a new tube. The elution was repeated, and the eluates were combined.
Samples were de-crosslinked by adding 10 μL of 20 mg/mL proteinase K and incubating at 55 °C for 45 min with shaking. The temperature was increased to 67 °C and samples were incubated for 1.5 h with shaking. DNA was extracted and eluted in 23 μL elution buffer using the Qiagen MinElute kit (Qiagen MinElute PCR Purification Kit, 28006). For biotin pull-down, 15 μL of 10 mg/mL Dynabeads MyOne Streptavidin T1 beads (Life Technologies, 65602) was washed with 150 μL of 1X Tween Washing Buffer [1X TWB: 5 mM Tris–HCl (pH 7.5); 0.5 mM EDTA; 1 M NaCl; 0.05% Tween 20]. After washing, the beads were resuspended in 150 μL of 2X binding buffer [2X BB: 10 mM Tris–HCl (pH 7.5); 1 mM EDTA; 2 M NaCl] and added to the DNA followed by incubation at room temperature for 15 min with rotation to bind biotinylated DNA to the streptavidin beads. Beads were separated on a magnetic stand and washed by adding 400 μL of 1X TWB and transferring the mixture to a new tube. The tubes were heated on a Thermomixer at 55 °C for 2 min with 950 rpm mixing. The wash was repeated one more time followed by one wash using 200 μL of 1X Tris buffer. After washing, the beads were resuspended in 50 μL of 1X Tris buffer. The Roche Kapa HyperPrep Kit (KK8502/KK8504) and Illumina TruSeq DNA UD Indexes were used to prepare library DNA.
HiChIP/HiC data analysis
For HiChIP, paired-end reads of 151 bp were checked for enrichment of MboI ligation sites (GATCGATC) and trimmed for adapters by fastp (version 0.20.0, paired-end mode, parameters “--detect_adapter_for_pe --trim_poly_x --cut_by_quality5 --cut_by_quality3 --cut_mean_quality 15 --length_required 20 --low_complexity_filter --complexity_threshold 30”). Trimmed reads were then processed by HiC-Pro (version 2.11.4) using the human genome hg19 (GRCh37 from GENCODE) and an MboI fragment file (cut site GATC). Bowtie2-2.2.4, samtools-1.2, R-3.4.0, and Python-2.7.12 were configured for HiC-Pro. The allValidPairs files from the HiC-Pro pipeline were then used to generate .hic files for visualization. Both samples had 2 biological replicates with good depth (~150 M pairs) and metrics comparable with published data (GEO id: GSE80820), such as the valid interaction rate (95.6 to 97.8% compared to 78.65 to 80.53% for the GEO data). After reproducibility was confirmed by HiC-Spector and HiCRep, contacts from replicates were merged and loops were called using FitHiChIP (stringent mode using coverage for normalization) based on CTCF peaks from GSM3312803. We also confirmed that the loops mostly overlapped CTCF peaks and convergent CTCF motif patterns.
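The ligation-site check mentioned above can be illustrated with a few lines of Python. MboI cuts at GATC, so filling in and ligating two cut ends yields the junction GATCGATC. The function name and the toy reads below are hypothetical, and this sketch is not the procedure used to produce the reported metrics.

```python
def junction_fraction(reads, junction="GATCGATC"):
    """Return the fraction of reads that contain the ligation junction."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if junction in read.upper())
    return hits / len(reads)


# Toy reads (hypothetical): two of the four contain the junction.
example_reads = [
    "ACGTGATCGATCTTGA",  # junction present
    "TTTTGATCAAAA",      # single GATC site only
    "ccgatcgatcgg",      # junction present (lowercase input is accepted)
    "ACGTACGT",          # no site
]
fraction = junction_fraction(example_reads)
print(f"{fraction:.2f} of reads contain the junction")
```

In practice such a function would be applied to a sample of sequences parsed from the FASTQ files, and the fraction compared between libraries.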
For HiC, the paired-end 76-bp reads were processed by Juicer (v1.5, default parameters) based on hg19 and MboI fragments (ligation sites GATCGATC). Eight replicates were processed separately, and reproducibility was confirmed by HiC-Spector and HiCRep (see code repository https://doi.org/10.6084/m9.figshare.21002533 for details). Then, we merged all replicates to reach the highest resolution of HiC data for SEM cells (2.7 billion reads, 2 billion contacts). We called 6720 loops by HiCCUPS from the Juicer package (parameters: “-r 5000,10000,25000 -f 0.2,0.2,0.2 -p 4,2,1 -i 7,5,3 -t 0.05,1.25,1.25,1.5 -d 20000,20000,50000”) and confirmed that they mostly overlapped CTCF peaks and convergent CTCF motif patterns.
APA scores refer to the ratio of the mean central pixels to the mean of pixels in the lower left corner (a.k.a P2LL) in Aggregate Peak Analysis . We used the APA function from Juicer pipeline  for the analysis at 5-kb resolution and then plotted using normalized signal (“normedAPA” version from Juicer APA output, https://github.com/aidenlab/juicer/wiki/APA). For APA analysis in Fig. 1m, we included the CTCF HiChIP signal for − 5-Ph-IAA and + 5-Ph-IAA loops at all 7220 loops called in CTCF HiChIP of − 5-Ph-IAA.
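The P2LL ratio defined above can be sketched on an aggregate matrix as follows. This is a minimal illustration, not the Juicer implementation: the function name, the single central pixel, and the corner width of 3 pixels are assumptions chosen for the example.

```python
import numpy as np


def apa_p2ll(matrix, corner=3):
    """Peak-to-lower-left ratio of an odd-sized, square aggregate matrix.

    The central pixel is divided by the mean of the `corner` x `corner`
    block in the lower-left corner (last rows, first columns).
    """
    m = np.asarray(matrix, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1] or m.shape[0] % 2 == 0:
        raise ValueError("expected an odd-sized square matrix")
    if not 0 < corner <= m.shape[0] // 2:
        raise ValueError("corner block must not reach the central pixel")
    center = m[m.shape[0] // 2, m.shape[1] // 2]
    lower_left = m[-corner:, :corner]
    return center / lower_left.mean()


# Toy aggregate matrix: flat background of 1 with a central peak of 4.
toy = np.ones((7, 7))
toy[3, 3] = 4.0
p2ll = apa_p2ll(toy)
print(p2ll)
```

A ratio well above 1 indicates that the aggregated signal is focused at the loop anchors rather than spread across the local background.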
The SLAM-seq protocol was adapted from Herzog et al. 2018 Protocol Exchange and Muhar et al. [38, 63]. CTCFAID1 or CTCFAID2 cells were treated with either 500 μM IAA and 1 μg/mL doxycycline (AID1) or 10 μM 5-Ph-IAA (AID2) for 2, 4, 6, 12, or 24 h. For the last hour of treatment, 250 μM 4SU was added to the media. Cells not treated with auxin received a 1-h treatment of 250 μM 4SU for labeling. After 4SU labeling, cells were collected under restricted light. Total RNA was extracted by Trizol in a dark room. The reaction conditions for thiol modification were 3 μg RNA, 10 mM iodoacetamide (made fresh by dilution in 100% EtOH), 50 mM NaPO4, and 50% DMSO. The reaction was incubated at 50 °C for 15 min and 1 μL of 1 M DTT was added to stop the reaction. RNA was purified using the RNA Clean and Concentrator kit from Zymo Research and libraries were prepared using the Quant-seq 3′ end mRNA library prep kit for Illumina (Lexogen).
SLAM-seq data analysis
Single-end 51-cycle sequencing was performed using the NovaSeq 6000 platform following the manufacturer’s instructions (Illumina). Quantifying 3′ UTR T > C reads was performed using SlamDunk (v0.4.2)  software according to the protocol (http://t-neumann.github.io/slamdunk/docs.html, using the command “slamdunk all” with default options) with primary assembly GRCh38.p12 and Gencode annotation v31 3′ UTR definitions. Only non-overlapping 3′ UTRs > 10 bp were retained. For each sample, T > C counts were collapsed by the gene of origin, using the SlamDunk module “alleyoop collapse”; genes with CPM ≤ 10 were removed from downstream analyses. Conversion rates were quantified by slamdunk module “alleyoop rates.” Package edgeR (v3.24.3)  in the R environment (v3.5.1) was used to normalize gene expression with function calcNormFactors (method = “TMM”). Differentially expressed genes were called using the trimmed mean of M values (TMM) normalization factors and raw counts in a Limma-voom analysis using the “voom,” “lmFit,” and “eBayes” functions from the limma (v3.42.2) R package, with statistical significance threshold p value ≤ 0.05 and |log2(FC)|≥ 1 . Gene set enrichment analyses were performed using gseapy (version 10.4; method = “signal_to_noise”), a pythonic wrapper/implementation of GSEA , with the normalized gene expression values from samples of two treatment arms and the hallmark gene sets from MSigDB (v7.2) . The principal component analysis (PCA) was performed using the TMM normalized log2(CPM) (counts per million) and ranked (descending) using the median absolute deviation as implemented by the “mad” function in R. The top 3000 most variable genes were used to perform the analysis. Both the PCA and the visualization were performed using the R package PCAtools v2.3.7, first using the function “pca” with default values. The first two principal components (PC1 and PC2) were visualized using the function “biplot.”
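To make the T > C quantification concrete, the sketch below counts conversions for a single gap-free alignment between a reference segment and a read. It is a simplified illustration under our own assumptions (equal-length strings, no indels, no base-quality or SNP filtering) and does not reproduce SlamDunk's logic; the function name and sequences are hypothetical.

```python
def tc_conversions(reference, read):
    """Count T>C conversions in a gap-free alignment.

    Returns (conversions, t_positions): the number of reference T positions
    read as C, and the total number of reference T positions covered.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    conversions = 0
    t_positions = 0
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base == "T":
            t_positions += 1
            if read_base == "C":
                conversions += 1
    return conversions, t_positions


# Toy alignment: the reference has four T positions, two are read as C.
conv, covered = tc_conversions("ATTGCTTA", "ATCGCTCA")
rate = conv / covered
print(conv, covered, rate)
```

Summing such counts over all reads assigned to a 3′ UTR gives a per-gene conversion count of the kind that is later collapsed by gene.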
CTCF domain library design and drop-out screen
A total of 512 20-bp sgRNAs were designed by FlashFry (version 1.12) to span all CTCF exons. Low-quality sgRNAs (extreme GC content, polyT, non-unique, Hsu2013 score ≤ 55, or DoenchCFD_specificityscore ≤ 0.02) were excluded [69, 70]. One hundred and twenty positive control sgRNAs and 100 negative control sgRNAs were also included. The oligonucleotides for the designed sgRNAs were synthesized by CustomArray. Forward library PCR primer (5′-GGCTTTATATATCTTGTGGAAAGGACGAAACACC-3′) (10 μM) and reverse library PCR primer (5′-CTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC-3′) (10 μM) were used to amplify the sgRNA oligonucleotides using 2X HiFi CloneAmp PCR mixture (Clontech) under the following PCR conditions: 98 °C for 3 min; 12 cycles of 98 °C for 10 s, 55 °C for 10 s, and 72 °C for 10 s; 72 °C for 5 min; and a 4 °C hold. The amplified product was run on a SYBR Green-stained 2% agarose gel and bands were excised for gel purification by a Qiagen Gel Purification kit. The amplified sgRNAs (10 ng) were cloned into the LentiGuide-Puro (#52963) backbone cut by BsmBI (100 ng) using the NEBuilder HiFi DNA assembly master mix (NEB) at 50 °C for 1 h. Eight 50-μL vials of NEB stable competent E. coli high-efficiency cells (NEB, C3030H) were thawed on ice and 2 μL of the assembled reaction was added to each. Cells were incubated on ice for 30 min, heat shocked at 42 °C for 30 s, and then placed on ice. NEB 10-beta/stable outgrowth medium was added to the heat-shocked cells (950 μL per vial) and incubated at 30 °C for 60 min at 250 rpm. Recovered cells were plated at 2.5 mL per square LB + ampicillin dish (245 mm × 245 mm) and incubated at 30 °C overnight. Bacterial colonies were counted to ensure good library coverage and collected for DNA extraction of the pooled sgRNA library by Qiagen Maxi prep (Qiagen).
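The sgRNA exclusion criteria listed at the start of this section can be expressed as a simple filter. In the sketch below, the Hsu2013 and DoenchCFD cutoffs come from the text, whereas the GC bounds (25 to 75%) and the polyT definition (four consecutive Ts) are hypothetical placeholders because the text does not specify them. The `Guide` class, the function names, and the example sequences are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class Guide:
    sequence: str
    hsu2013: float           # Hsu2013 score reported by the design tool
    cfd_specificity: float   # DoenchCFD specificity score
    unique: bool = True      # False if the target occurs more than once


def gc_fraction(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_filters(guide, gc_min=0.25, gc_max=0.75, poly_t="TTTT"):
    """Return True if the guide survives every exclusion criterion."""
    seq = guide.sequence.upper()
    if len(seq) != 20:
        return False
    if not gc_min <= gc_fraction(seq) <= gc_max:  # assumed GC bounds
        return False
    if poly_t in seq:                             # assumed polyT definition
        return False
    if not guide.unique:
        return False
    if guide.hsu2013 <= 55:                       # cutoff from the text
        return False
    if guide.cfd_specificity <= 0.02:             # cutoff from the text
        return False
    return True


# Hypothetical guides for illustration only.
guides = [
    Guide("GACCTGAAGCTGACCGATCA", hsu2013=80, cfd_specificity=0.30),
    Guide("GATTTTCAGCTGACCGATCA", hsu2013=80, cfd_specificity=0.30),  # polyT
    Guide("GACCTGAAGCTGACCGATCA", hsu2013=55, cfd_specificity=0.30),  # low Hsu
]
kept = [g for g in guides if passes_filters(g)]
print(len(kept))
```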
CTCFAID2 cells stably expressing lentiviral Cas9-blasticidin were infected with the pooled sgRNA library at a low multiplicity of infection (MOI, ~0.3). Infected cells were selected with blasticidin and puromycin for 3 days. On days 7 and 14 post-antibiotic selection, cells were collected and the sgRNA sequences were recovered by genomic PCR analysis (Additional file 2: Table S1, Nextera primers), indexed (Nextera, FC-131-1096), and sequenced on a NovaSeq 6000 with a single-end 151-bp read length (Illumina). The sgRNA sequences are described in Additional file 3: Table S2. High-titer lentivirus stocks were generated in 293T cells as previously described.
CRISPR-Cas9 tiling-sgRNA knockout screen data analysis
Raw 151-bp reads were obtained from the Illumina NovaSeq and trimmed for adapters. We then counted 20-mers with bbmap (version 37.28, “kmercountexact.sh fastadump=f mincount=1 k=20 rcomp=f”) and assigned them to sgRNAs. MAGeCK (version 0.5.9.4, default parameters) was used for statistical analysis, and the results were then extracted to make protein domain-based plots using Protiler (version 1.0.0).
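The counting step described above amounts to tallying every 20-mer on the forward strand of each read and looking up the library sequences in that tally. The sketch below shows the idea in plain Python with hypothetical guide names, sequences, and flanking bases; it does not reproduce the bbmap implementation.

```python
from collections import Counter


def count_kmers(reads, k=20):
    """Count every k-mer on the forward strand (no reverse complement)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def assign_to_guides(kmer_counts, library):
    """Look up each library sgRNA sequence in the k-mer table."""
    return {name: kmer_counts.get(seq.upper(), 0) for name, seq in library.items()}


# Hypothetical two-guide library and three toy reads with flanking sequence.
library = {
    "sg1": "GACCTGAAGCTGACCGATCA",
    "sg2": "TTGACCGGTAACGTTAGCCA",
}
reads = [
    "CACCG" + library["sg1"] + "GTTTA",
    "CACCG" + library["sg1"] + "GTTTA",
    "CACCG" + library["sg2"] + "GTTTA",
]
guide_counts = assign_to_guides(count_kmers(reads), library)
print(guide_counts)
```

The resulting per-guide count table is the kind of input that a tool such as MAGeCK takes for its statistical analysis.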
Total RNA was extracted by Trizol (Thermo Fisher Scientific, 15596026) from replicate samples of SEMWT and CTCFAID1 cells treated with either DMSO or 500 μM IAA and 1 μg/mL Dox for 1 day, and CTCFAID2 cells treated with either DMSO or 10 μM 5-Ph-IAA for 1, 4, and 7 days. About 200 ng total RNA was treated using Kapa rRNA depletion reagents to remove ribosomal RNA, then converted into cDNA libraries using the Kapa RNA HyperPrep Kit with RiboErase (HMR). After end repair, dA-tailing, and adapter ligation, each cDNA library was purified and enriched by 11 cycles of PCR amplification.
RNA-seq data analysis
Paired-end 101-cycle sequencing was performed on the NovaSeq 6000 sequencer following the manufacturer’s instructions (Illumina). Raw reads were first trimmed using TrimGalore (v0.6.3) with parameters “--paired --retain_unpaired.” Filtered reads were then mapped to the Homo sapiens reference genome GRCh37.p13 using STAR (v2.7.9a). Gene-level read quantification was done using RSEM (v1.3.1) on the Gencode annotation v19. To identify differentially expressed genes, normalization factors were first estimated using the TMM method, and genes with CPM ≤ 1 in all samples were removed. The TMM normalization factors and raw counts were then used for the Limma-voom analysis using the “voom,” “lmFit,” and “eBayes” functions from the limma R package. Gene set enrichment analysis (GSEA) was performed using the MsigDB database (v7.1), with differentially expressed genes ranked by their log2(FC). The principal component analysis (PCA) plots were generated from the TMM-normalized data. Using the log2(CPM) data, we ranked the genes by their median absolute deviation (using the “mad” function in R), as it is a statistic more robust to outliers. The log2(CPM) values of the top 3000 variable genes were passed to the “prcomp” function to perform PCA. The first two principal components were used to generate the PCA plots.
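The gene selection and PCA steps described above can be sketched in NumPy as follows. This is an illustrative Python analogue, not the R code used in the study: ranking by the unscaled median absolute deviation gives the same order as R's "mad" (which multiplies by a constant), and centering without scaling followed by SVD mirrors the default behavior of "prcomp". The function names and the toy matrix are our own.

```python
import numpy as np


def top_variable_by_mad(log_cpm, n_top):
    """Indices of the n_top genes (rows) with the largest MAD across samples."""
    x = np.asarray(log_cpm, dtype=float)            # genes x samples
    med = np.median(x, axis=1, keepdims=True)
    mad = np.median(np.abs(x - med), axis=1)
    return np.argsort(-mad, kind="stable")[:n_top]


def pca_scores(log_cpm, n_components=2):
    """Sample scores and explained-variance fractions from a centered SVD."""
    x = np.asarray(log_cpm, dtype=float).T          # samples x genes
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]


# Toy log2(CPM) matrix: 4 genes x 6 samples (3 control, 3 treated).
log_cpm = np.array([
    [1, 1, 1, 5, 5, 5],
    [2, 2, 2, 2, 2, 2],
    [0, 1, 0, 1, 0, 1],
    [3, 3, 3, 9, 9, 9],
])
top = top_variable_by_mad(log_cpm, n_top=2)
scores, explained = pca_scores(log_cpm[top])
print(top, explained)
```

In this toy case the two selected genes separate the control and treated samples along the first principal component, which carries essentially all of the variance.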
ChIP-seq data analysis
Single-end 51-cycle sequencing was performed on the NovaSeq 6000 sequencer following the manufacturer’s instructions (Illumina). Raw reads were first trimmed using TrimGalore (v0.6.3). Filtered reads were then aligned using BWA (v0.7.17-r1198) to the Homo sapiens reference genome GRCh37.p13 or, for samples containing spike-in material, to a hybrid genome combining the human GRCh37.p13 genome and the Drosophila melanogaster (dm6) genome. Duplicated reads were marked using the “bamsormadup” function from the biobambam2 tool (v2.0.87) (https://gitlab.com/german.tischler/biobambam2). PCR duplicates and low mapping quality reads (MAPQ ≤ 1) were removed using samtools (version 1.9, parameter “-q 1 -F 1024”). Human and Drosophila reads were then extracted into two separate bam files. The uniquely mapped reads in the human genome were then used to estimate the average fragment length in each sample based on the cross-correlation profile calculated from SPP (v1.11). The smallest fragment size estimated by SPP was used to center and extend reads to generate bigwig files. If the samples did not contain spike-in material, the bigwig signals were scaled to 15 million reads (i.e., scaling-factor = 15e6/libSize). If the samples contained spike-in, the bigwigs were generated by scaling the number of uniquely mapped spike-in reads to 1 million reads (i.e., scaling-factor = 1e6/spike-in_counts). MACS2 (v2.1.1) was used to call peaks using parameters “-g hs --nomodel --extsize <SPP_fragmentSize>”. Two sets of peaks were generated: (i) “high confidence peaks” called with FDR ≤ 0.05 (parameter “-q 0.05”) and (ii) “low confidence peaks” called with FDR ≤ 0.5 (parameter “-q 0.5”). We considered a peak to be reproducible if it was called as a high confidence peak in at least one replicate and also overlapped a low confidence peak in the other replicates.
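The two scaling rules and the reproducibility criterion described above can be written out as short functions. The sketch below is illustrative only: peaks are represented as (chrom, start, end) tuples with half-open coordinates, the function names are hypothetical, and no merging of overlapping reproducible peaks is attempted.

```python
def bigwig_scaling_factor(lib_size=None, spike_in_counts=None):
    """Scaling factor: 1e6/spike-in reads if available, else 15e6/library size."""
    if spike_in_counts is not None:
        if spike_in_counts <= 0:
            raise ValueError("spike_in_counts must be positive")
        return 1e6 / spike_in_counts
    if lib_size is None or lib_size <= 0:
        raise ValueError("lib_size must be positive")
    return 15e6 / lib_size


def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def reproducible_peaks(high_conf, low_conf):
    """Keep high-confidence peaks of a replicate that overlap a
    low-confidence peak in every other replicate.

    high_conf and low_conf hold one peak list per replicate, in the same order.
    """
    kept = []
    for i, peaks in enumerate(high_conf):
        others = [rep for j, rep in enumerate(low_conf) if j != i]
        for peak in peaks:
            if all(any(overlaps(peak, q) for q in rep) for rep in others):
                kept.append(peak)
    return kept


# Toy example with two replicates.
high = [[("chr1", 100, 200)], [("chr1", 5000, 5100)]]
low = [
    [("chr1", 90, 210), ("chr1", 900, 1000)],
    [("chr1", 150, 260), ("chr1", 4990, 5110)],
]
reproducible = reproducible_peaks(high, low)
print(reproducible, bigwig_scaling_factor(lib_size=30_000_000))
```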
Differential ChIP-seq peak analysis
Reproducible peaks from treated and untreated cells were merged. For each peak, we counted the number of overlapping ChIP-seq fragments, generated by extending the reads to the fragment size estimated by SPP (slopBed -s -l 0 -r <SPP_fragmentSize>), in each sample (bedtools v2.24.0). For the HA-CTCF-ZF mutants, counts were normalized using the trimmed mean of M values (TMM) method. For the CTCF AID2 samples, counts were normalized based on the median of the ratios of observed spike-in counts. We identified the differential peaks using the empirical Bayes method (eBayes function from the limma R package). For downstream analyses, heatmaps were generated by deepTools. Peaks were annotated based on Gencode v19 gene coordinates following this priority: “Promoter.Up” if they fall within TSS − 2 kb; “Promoter.Down” if they fall within TSS + 2 kb; “Exonic” or “intronic” if they fall within an exon or intron of any isoform; “TES peaks” if they fall within TES ± 2 kb; “distal5” or “distal3” if they are within 50 kb upstream of the TSS or 50 kb downstream of the TES, respectively; and “intergenic” if they do not fit in any of the previous categories.
For CTCF flanking motif analysis, we first scanned the CTCF motif (TRANSFAC M01259) by FIMO (MEME suite 5.3.3, “–thresh 1e-4”) and retrieved sequence ± 20 bp flanking the motif matches. We then generated a heatmap by seaborn (v0.11.1) using hierarchical clustering (hamming distance) from scipy (v1.6.2), and the motif consensus logo was generated by logomaker (v0.8).
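The clustering step of the flanking-sequence analysis can be sketched with SciPy as shown below. Equal-length sequences are integer-encoded, pairwise Hamming distances are computed, and a hierarchical tree is cut into a fixed number of clusters. The average-linkage method, the number of clusters, and the toy sequences are assumptions made for illustration; the text specifies only the Hamming distance.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def encode(seqs):
    """Integer-encode equal-length DNA sequences as a 2D array."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    return np.array([[BASE_CODE[b] for b in s.upper()] for s in seqs])


def cluster_sequences(seqs, n_clusters, method="average"):
    """Cluster sequences hierarchically on pairwise Hamming distance."""
    distances = pdist(encode(seqs), metric="hamming")  # fraction of mismatches
    tree = linkage(distances, method=method)           # assumed linkage method
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy flanking sequences (hypothetical): two obvious groups.
flanks = ["ACGTGGCA", "ACGTGGCA", "ACGTGGCT", "TTTTAAAA", "TTTTAAAC"]
labels = cluster_sequences(flanks, n_clusters=2)
print(labels)
```

The cluster labels can then be used to order rows of a sequence heatmap or to build one consensus logo per cluster.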
Integrative analysis of RNA-seq and ChIP-seq changes
To identify differential CTCF peaks correlated with gene expression changes after CTCF depletion and exogenous CTCF HA mutant expression, we adapted some ideas from the intePareto method. For each gene \(g\), we converted its RNA-seq log2(FC) to a z-score by scaling the log2(FC) to the standard deviation of all fold changes in the sample using the following formula:
\[ Z_g = \frac{\log_2\left(\mathrm{FC}^{\mathrm{RNA}}_g\right)}{\mathrm{sd}\left(\log_2\left(\mathrm{FC}^{\mathrm{RNA}}\right)\right)} \]
Instead of associating a single peak with each gene, as was done in the original intePareto method, we associated each gene with all peaks in its genomic neighborhood, defined as [TSSg − 50 kb, TESg + 50 kb], so that the most correlated peak could be identified without bias. Similarly, we converted the fold change value of each peak \(p\) in [TSSg − 50 kb, TESg + 50 kb] to a z-score using the same formula but with ChIP-seq fold change values:
\[ Z_p = \frac{\log_2\left(\mathrm{FC}^{\mathrm{ChIP}}_p\right)}{\mathrm{sd}\left(\log_2\left(\mathrm{FC}^{\mathrm{ChIP}}\right)\right)} \]
For each gene-peak pair, we calculated a combined z-score by multiplying their z-scores as follows (Fig. 5a):
\[ Z_{g,p} = Z_g \times Z_p \]
The multi-objective Pareto optimization [https://ieeexplore.ieee.org/abstract/document/1599245] was then calculated using the “psel” function from the “rPref” R/package (v1.3) [https://journal.r-project.org/archive/2016/RJ-2016-054/index.html]. The peaks from the top 10 best Pareto levels were selected as the most correlated/anti-correlated.
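The scoring and ranking steps of the integrative analysis can be illustrated with the sketch below. The z-scaling (log2 fold change divided by the sample standard deviation) and the product of gene and peak z-scores follow the description in the text. The Pareto-level function is a generic non-dominated sort written for illustration, and the two-objective input passed to it is purely hypothetical, because the objectives supplied to "psel" are not stated here.

```python
import numpy as np


def z_scale(log2_fc):
    """Scale log2 fold changes by their sample standard deviation."""
    x = np.asarray(log2_fc, dtype=float)
    return x / x.std(ddof=1)


def pareto_levels(points):
    """Non-dominated sorting where larger is better on every objective.

    Level 1 is the Pareto front; level 2 is the front after removing
    level 1, and so on.
    """
    pts = np.asarray(points, dtype=float)
    levels = np.zeros(len(pts), dtype=int)
    remaining = np.arange(len(pts))
    level = 1
    while remaining.size:
        sub = pts[remaining]
        front = [
            i for i, p in enumerate(sub)
            if not np.any(np.all(sub >= p, axis=1) & np.any(sub > p, axis=1))
        ]
        levels[remaining[front]] = level
        remaining = np.delete(remaining, front)
        level += 1
    return levels


# Toy gene-peak pairs: RNA-seq and ChIP-seq log2 fold changes.
z_gene = z_scale([2.0, -2.0, 0.0])
z_peak = z_scale([1.0, -1.0, 0.0])
combined = z_gene * z_peak  # positive when expression and binding change together

# Hypothetical two-objective table, one row per gene-peak pair.
levels = pareto_levels([[3, 3], [2, 1], [1, 2], [0, 0]])
print(combined, levels)
```

Selecting rows with `levels <= 10` would correspond to keeping the top 10 Pareto levels, under the assumptions stated above.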
Statistical analysis was done using R (v4.0.1), Python (v3.6), or GraphPad Prism (v9). Heatmaps were generated using the pheatmap R package. The ChIP-seq heatmaps were generated using deepTools (v3.5.0).
Availability of data and materials
Data generated in this study, including total RNA-seq, nascent RNA-seq, ChIP-seq, and HiChIP, were deposited at NCBI GEO (Super-series GSE205218). Publicly available datasets (GSE80820 and GSM3312803) were referenced [86, 87]. Code repositories are collected at https://doi.org/10.6084/m9.figshare.c.6186670 and include ChIP-seq QC (https://doi.org/10.6084/m9.figshare.7411835), integrative analysis of ChIP-seq and RNA-seq (https://doi.org/10.6084/m9.figshare.21045889), Hi-C and HiChIP analysis (https://doi.org/10.6084/m9.figshare.21002533), and SLAM-seq analysis (https://doi.org/10.6084/m9.figshare.21259278) [88,89,90,91,92].
Phillips JE, Corces VG. CTCF: master weaver of the genome. Cell. 2009;137(7):1194–211.
Bell AC, West AG, Felsenfeld G. The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell. 1999;98(3):387–96.
Hark AT, Schoenherr CJ, Katz DJ, Ingram RS, Levorse JM, Tilghman SM. CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature. 2000;405(6785):486–9.
Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012;485(7398):376–80.
Phillips-Cremins JE, Corces VG. Chromatin insulators: linking genome organization to cellular function. Mol Cell. 2013;50(4):461–74.
Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–80.
Ghirlando R, Felsenfeld G. CTCF: making the right connections. Genes Dev. 2016;30(8):881–91.
Nanni L, Ceri S, Logie C. Spatial patterns of CTCF sites define the anatomy of TADs and their boundaries. Genome Biol. 2020;21(1):197.
Vostrov AA, Quitschke WW. The zinc finger protein CTCF binds to the APBbeta domain of the amyloid beta-protein precursor promoter. Evidence for a role in transcriptional activation. J Biol Chem. 1997;272(52):33353–9.
Lobanenkov VV, Nicolas RH, Adler VV, Paterson H, Klenova EM, Polotskaja AV, et al. A novel sequence-specific DNA binding protein which interacts with three regularly spaced direct repeats of the CCCTC-motif in the 5′-flanking sequence of the chicken c-myc gene. Oncogene. 1990;5(12):1743–53.
Tang Z, Luo OJ, Li X, Zheng M, Zhu JJ, Szalaj P, et al. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell. 2015;163(7):1611–27.
Hyle J, Zhang Y, Wright S, Xu B, Shao Y, Easton J, et al. Acute depletion of CTCF directly affects MYC regulation through loss of enhancer-promoter looping. Nucleic Acids Res. 2019;47(13):6699–713.
Pombo A, Dillon N. Three-dimensional genome architecture: players and mechanisms. Nat Rev Mol Cell Biol. 2015;16(4):245–57.
Ong CT, Corces VG. CTCF: an architectural protein bridging genome topology and function. Nat Rev Genet. 2014;15(4):234–46.
Hnisz D, Day DS, Young RA. Insulated neighborhoods: structural and functional units of mammalian gene control. Cell. 2016;167(5):1188–200.
Dekker J, Mirny L. The 3D genome as moderator of chromosomal communication. Cell. 2016;164(6):1110–21.
Xu B, Wang H, Wright S, Hyle J, Zhang Y, Shao Y, et al. Acute depletion of CTCF rewires genome-wide chromatin accessibility. Genome Biol. 2021;22(1):244.
Nora EP, Goloborodko A, Valton AL, Gibcus JH, Uebersohn A, Abdennur N, et al. Targeted degradation of CTCF decouples local insulation of chromosome domains from genomic compartmentalization. Cell. 2017;169(5):930-44.e22.
Luan J, Xiang G, Gómez-García PA, Tome JM, Zhang Z, Vermunt MW, et al. Distinct properties and functions of CTCF revealed by a rapidly inducible degron system. Cell Rep. 2021;34(8):108783.
Rowley MJ, Corces VG. Organizational principles of 3D genome architecture. Nat Rev Genet. 2018;19(12):789–800.
Hashimoto H, Wang D, Horton JR, Zhang X, Corces VG, Cheng X. Structural basis for the versatile and methylation-dependent binding of CTCF to DNA. Mol Cell. 2017;66(5):711-20.e3.
Nakahashi H, Kieffer Kwon KR, Resch W, Vian L, Dose M, Stavreva D, et al. A genome-wide map of CTCF multivalency redefines the CTCF code. Cell Rep. 2013;3(5):1678–89.
Kim TH, Abdullaev ZK, Smith AD, Ching KA, Loukinov DI, Green RD, et al. Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell. 2007;128(6):1231–45.
Xie X, Mikkelsen TS, Gnirke A, Lindblad-Toh K, Kellis M, Lander ES. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc Natl Acad Sci U S A. 2007;104(17):7145–50.
Boyle AP, Song L, Lee BK, London D, Keefe D, Birney E, et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 2011;21(3):456–64.
Schmidt D, Schwalie PC, Wilson MD, Ballester B, Goncalves A, Kutter C, et al. Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell. 2012;148(1–2):335–48.
Kaplow IM, Banerjee A, Foo CS. Neural network modeling of differential binding between wild-type and mutant CTCF reveals putative binding preferences for zinc fingers 1–2. BMC Genomics. 2022;23(1):295.
Saldaña-Meyer R, Rodriguez-Hernaez J, Escobar T, Nishana M, Jácome-López K, Nora EP, et al. RNA interactions are essential for CTCF-mediated genome organization. Mol Cell. 2019;76(3):412-22.e5.
Saldana-Meyer R, Gonzalez-Buendia E, Guerrero G, Narendra V, Bonasio R, Recillas-Targa F, et al. CTCF regulates the human p53 gene through direct interaction with its natural antisense transcript, Wrap53. Genes Dev. 2014;28(7):723–34.
Hansen AS, Hsieh TS, Cattoglio C, Pustova I, Saldaña-Meyer R, Reinberg D, et al. Distinct classes of chromatin loops revealed by deletion of an RNA-binding region in CTCF. Mol Cell. 2019;76(3):395-411.e13.
Morawska M, Ulrich HD. An expanded tool kit for the auxin-inducible degron system in budding yeast. Yeast. 2013;30(9):341–51.
Nishimura K, Fukagawa T. An efficient method to generate conditional knockout cell lines for essential genes by combination of auxin-inducible degron tag and CRISPR/Cas9. Chromosome Res. 2017;25(3–4):253–60.
Yesbolatova A, Saito Y, Kitamoto N, Makino-Itou H, Ajima R, Nakano R, et al. The auxin-inducible degron 2 technology provides sharp degradation control in yeast, mammalian cells, and mice. Nat Commun. 2020;11(1):5701.
Natsume T, Kiyomitsu T, Saga Y, Kanemaki MT. Rapid protein depletion in human cells by auxin-inducible degron tagging with short homology donors. Cell Rep. 2016;15(1):210–8.
Sathyan KM, McKenna BD, Anderson WD, Duarte FM, Core L, Guertin MJ. An improved auxin-inducible degron system preserves native protein levels and enables rapid and specific protein depletion. Genes Dev. 2019;33(19–20):1441–55.
Khoury A, Achinger-Kawecka J, Bert SA, Smith GC, French HJ, Luu PL, et al. Constitutively bound CTCF sites maintain 3D chromatin architecture and long-range epigenetically regulated domains. Nat Commun. 2020;11(1):54.
Zhang H, Zhang Y, Zhou X, Wright S, Hyle J, Zhao L, et al. Functional interrogation of HOXA9 regulome in MLLr leukemia via reporter-based CRISPR/Cas9 screen. Elife. 2020;9:e57858.
Herzog VA, Reichholf B, Neumann T, Rescheneder P, Bhat P, Burkard TR, et al. Thiol-linked alkylation of RNA to assess expression dynamics. Nat Methods. 2017;14(12):1198–204.
He W, Zhang L, Villarreal OD, Fu R, Bedford E, Dou J, et al. De novo identification of essential protein domains from CRISPR-Cas9 tiling-sgRNA knockout screens. Nat Commun. 2019;10(1):4541.
Ramirez F, Dundar F, Diehl S, Gruning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42(Web Server issue):W187–91.
Splinter E, Heath H, Kooren J, Palstra RJ, Klous P, Grosveld F, et al. CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus. Genes Dev. 2006;20(17):2349–54.
Sanyal A, Lajoie BR, Jain G, Dekker J. The long-range interaction landscape of gene promoters. Nature. 2012;489(7414):109–13.
Mifsud B, Tavares-Cadete F, Young AN, Sugar R, Schoenfelder S, Ferreira L, et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat Genet. 2015;47(6):598–606.
Fang C, Wang Z, Han C, Safgren SL, Helmin KA, Adelman ER, et al. Cancer-specific CTCF binding facilitates oncogenic transcriptional dysregulation. Genome Biol. 2020;21(1):247.
Schuijers J, Manteiga JC, Weintraub AS, Day DS, Zamudio AV, Hnisz D, et al. Transcriptional dysregulation of MYC reveals common enhancer-docking mechanism. Cell Rep. 2018;23(2):349–60.
Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129(4):823–37.
Rhee HS, Pugh BF. ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr Protoc Mol Biol. 2012;Chapter 21:Unit 21 4.
Yin M, Wang J, Wang M, Li X, Zhang M, Wu Q, et al. Molecular mechanism of directional CTCF recognition of a diverse range of genomic sites. Cell Res. 2017;27(11):1365–77.
Choudhary MNK, Friedman RZ, Wang JT, Jang HS, Zhuo X, Wang T. Publisher correction: co-opted transposons help perpetuate conserved higher-order chromosomal structures. Genome Biol. 2020;21(1):28.
Soochit W, Sleutels F, Stik G, Bartkuhn M, Basu S, Hernandez SC, et al. CTCF chromatin residence time controls three-dimensional genome organization, gene expression and DNA methylation in pluripotent cells. Nat Cell Biol. 2021;23(8):881–93.
Lebeau B, Zhao K, Jangal M, Zhao T, Guerra M, Greenwood CMT, et al. Single base-pair resolution analysis of DNA binding motif with MoMotif reveals an oncogenic function of CTCF zinc-finger 1 mutation. Nucleic Acids Res. 2022;50(15):8441–58.
Ohlsson R, Lobanenkov V, Klenova E. Does CTCF mediate between nuclear organization and gene expression? BioEssays. 2010;32(1):37–50.
Gibson DG, Young L, Chuang RY, Venter JC, Hutchison CA 3rd, Smith HO. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods. 2009;6(5):343–5.
Astrakhan A, Sather BD, Ryu BY, Khim S, Singh S, Humblet-Baron S, et al. Ubiquitous high-level gene expression in hematopoietic lineages provides effective lentiviral gene therapy of murine Wiskott-Aldrich syndrome. Blood. 2012;119(19):4395–407.
Mumbach MR, Rubin AJ, Flynn RA, Dai C, Khavari PA, Greenleaf WJ, et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat Methods. 2016;13(11):919–22.
Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–90.
Servant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015;16:259.
Yan KK, Yardimci GG, Yan C, Noble WS, Gerstein M. HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps. Bioinformatics. 2017;33(14):2199–201.
Yang T, Zhang F, Yardimci GG, Song F, Hardison RC, Noble WS, et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 2017;27(11):1939–49.
Bhattacharyya S, Chandra V, Vijayanand P, Ay F. Identification of significant chromatin contacts from HiChIP data by FitHiChIP. Nat Commun. 2019;10(1):4221.
Godfrey L, Crump NT, Thorne R, Lau IJ, Repapi E, Dimou D, et al. DOT1L inhibition reveals a distinct subset of enhancers dependent on H3K79 methylation. Nat Commun. 2019;10(1):2803.
Durand NC, Shamim MS, Machol I, Rao SS, Huntley MH, Lander ES, et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3(1):95–8.
Muhar M, Ebert A, Neumann T, Umkehrer C, Jude J, Wieshofer C, et al. SLAM-seq defines direct gene-regulatory functions of the BRD4-MYC axis. Science. 2018;360(6390):800–5.
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.
Law CW, Chen Y, Shi W, Smyth GK. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014;15(2):R29.
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.
Liberzon A, Birger C, Thorvaldsdottir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1(6):417–25.
McKenna A, Shendure J. FlashFry: a fast and flexible tool for large-scale CRISPR target design. BMC Biol. 2018;16(1):74.
Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann S, Agarwala V, et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol. 2013;31(9):827–32.
Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol. 2016;34(2):184–91.
Vo BT, Li C, Morgan MA, Theurillat I, Finkelstein D, Wright S, et al. Inactivation of Ezh2 upregulates Gfi1 and drives aggressive Myc-driven group 3 medulloblastoma. Cell Rep. 2017;18(12):2907–17.
Li W, Xu H, Xiao T, Cong L, Love MI, Zhang F, et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 2014;15(12):554.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21.
Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012;22(9):1760–74.
Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–40.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
Kharchenko PV, Tolstorukov MY, Park PJ. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol. 2008;26(12):1351–9.
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106.
Ramirez F, Ryan DP, Gruning B, Bhardwaj V, Kilpert F, Richter AS, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44(W1):W160–5.
Cao Y, Kitanovski S, Hoffmann D. intePareto: an R package for integrative analyses of RNA-Seq and ChIP-Seq data. BMC Genomics. 2020;21(Suppl 11):802.
Hyle J, Djekidel MN, Williams J, Wright S, Shao Y, Xu B, Li C. Auxin-inducible degron 2 system deciphers functions of CTCF domains in transcriptional regulation. GSE205218. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE205218 (2022).
Mumbach MR, Rubin AJ, Flynn RA, Dai C, Khavari PA, Greenleaf WJ, Chang HY. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. GSE80820. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE80820 (2016).
Godfrey L, Crump NT, Thorne R, Lau IJ, Repapi E, Dimou D, Smith AL, Harman JR, Telenius JM, Oudelaar AM, Downes DJ, Vyas P, Hughes JR, Milne TA. DOT1L inhibition reveals a distinct subset of enhancers dependent on H3K79 methylation. GSM3312803. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM3312803 (2019).
Xu B. Hi-C and HiChIP. Figshare. https://doi.org/10.6084/m9.figshare.21002533.v1 (2022).
Xu B, Djekidel MN, Li C, Williams J. CTCF-AID2. Figshare. https://doi.org/10.6084/m9.figshare.c.6186670.v2 (2022).
Xu B. CHIPSEQ QC. Figshare. https://doi.org/10.6084/m9.figshare.7411835.v8 (2018).
Xu B, Djekidel MN. ChIPseq and RNAseq. Figshare. https://doi.org/10.6084/m9.figshare.21045889.v1 (2022).
Williams J. SLAM-Seq analysis. Figshare. https://doi.org/10.6084/m9.figshare.21259278.v1 (2022).
We thank members of the Li lab and the CAB for insightful discussion and comments. We thank Drs. Ruopeng Feng, Gang Wu, and Peng Xu for valuable discussion. We gratefully acknowledge the staff of the Hartwell Sequencing, Cytogenetics, Flow Cytometry, and Cell Sorting Shared Resource facility within the Comprehensive Cancer Center of St. Jude Children’s Research Hospital.
The review history is available as Additional file 5.
Peer review information
Wenjing She was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
This work was supported by the institutional startup fund (ALSAC).
Ethics approval and consent to participate
The authors declare that they have no competing interests.
Cite this article
Hyle, J., Djekidel, M.N., Williams, J. et al. Auxin-inducible degron 2 system deciphers functions of CTCF domains in transcriptional regulation. Genome Biol 24, 14 (2023). https://doi.org/10.1186/s13059-022-02843-3