Insulator-based loops mediate the spreading of H3K27me3 over distant micro-domains repressing euchromatin genes



Chromosomes are subdivided spatially to delimit long-range interactions into topologically associating domains (TADs). TADs are often flanked by chromatin insulators and transcription units that may participate in such demarcation. Remarkably, single-cell Drosophila TAD units correspond to dynamic heterochromatin nano-compartments that can self-assemble. The influence of insulators on such dynamic compartmentalization remains unclear. Moreover, to what extent heterochromatin domains are fully compartmentalized away from active genes remains unclear from Drosophila to human.


Here, we identify H3K27me3 micro-domains genome-wide in Drosophila, which are attributed to the three-dimensional spreading of heterochromatin marks into euchromatin. Whereas depletion of insulator proteins increases H3K27me3 spreading locally, across heterochromatin borders, it concomitantly decreases H3K27me3 levels at distant micro-domains discrete sites. Quantifying long-range interactions suggests that random interactions between heterochromatin TADs and neighbor euchromatin cannot predict the presence of micro-domains, arguing against the hypothesis that they reflect defects in self-folding or in insulating repressive TADs. Rather, micro-domains are predicted by specific long-range interactions with the TAD borders bound by insulator proteins and co-factors required for looping. Accordingly, H3K27me3 spreading to distant sites is impaired by insulator mutants that compromise recruitment of looping co-factors. Both depletions and insulator mutants significantly reduce H3K27me3 micro-domains, deregulating the flanking genes.


Our data highlight a new regulatory mode of H3K27me3 by insulator-based long-range interactions controlling distant euchromatic genes.


with >3 different concentrations of cDNA for standard curves with Biorad iQ SYBR Green Supermix in Applied biosystems viia7 (ThermoFisher). For synthetic mutant-and WT-Beaf32, we used previously characterized stably transfected cell lines expressing either form of synthetic Beaf32 from previously stably transfected S2 cell lines (2), as obtained upon stable integration of pUWG-neo vectors selected by neomycin resistance. Of note, these cells harbor highly reduced levels of endogenous beaf-32 expression as compared with synthetic mutant-or WT-Beaf-32 as shown (2).

Characterization of H3K27me3 micro-domains
For analyses of H3K27me3 levels, Reads were first trimmed with bbduk.sh script from BBtools (38.00) and with standard parameters. The alignments were performed with Bowtie2 (2.3.3.1) on dm3 genome version with -q10 parameter. Samtools rmdup (1.4.1) was then used on sorted BAM files to filter out potential PCR duplicates. Bigwig files were generated with deeptools BamCoverage (3.0.1-2-2) RPKM normalized and using a bin size of 50bp with a smoothing window of 150 bp. H3K27me3 micro-domain detection by NormR was performed in two steps. First, detection was performed for normalized ChIP-seq read counts compared to input. Second, a systematic comparison was performed as normalized ChIP-seq read counts comparing Beaf32 depleted (or mutant Beaf32) to control (Figure 2; see also Additional file 2 and 6: Fig. S2 and S6). The preferential distribution of micro-domains as nucleosome 'mers' was verified by taking various bin sizes for detection (10, 20 and 40 bp) and not detected from control domains as obtained after running NormR on control input samples (e.g. input 1/2/3 compared to input 2/3/1) as systematically quantified in comparison with micro-domains. Further selection was performed using FDR of 5x10-2 (see NormR). Microdomains were further selected based on sizes (< 2 kb) (Additional file 2: Fig. S2D) and after excluding those micro-domains nearby (< 2kb) heterochromatin borders (to avoid mixed signals due to local spreading; see Figure 1). Micro-domains were further tested by quantification of H3K27me3 in mutants or upon Beaf32 depletion compared with WT (Additional file 6: Fig. S6). For quantification of H3K27me3 levels inside micro-domains, we quantified normalized ChIP-seq read counts by comparing systematically the first two quartiles between Beaf32 depleted and control cells, as well as between synthetic mutant/WT Beaf32 expressing cells. H3K27me3 micro-domains were analyzed by counting the ChIP-seq reads for H3K27me3 normalized over input levels of bins corresponding to micro-domains  (1). Further intersection analyses of micro-domains showing significant reduction were then performed directly using genomic coordinates of domains with enhancers (STARR-seq) (3,4) or the binding sites of GAF (5), dCTCF (6) and Beaf32 (2), or in the case of differential expression analyses with RNAseq, after filtering micro-domains associated with (+/-1 kb) with TSSs. For quantification of insulator protein binding near micro-domains, we used normalized ChIP-seq reads in windows (+/-1 kb) surrounding micro-domains compared to levels found at random euchromatic sites (same number of sites taken randomly from the same euchromatin domains). For the analysis with respect to heterochromatin borders, error bars represent the variations between replicates for all bins in the indicated intervals. Normalized H3K27me3 levels were measured for all bins and plotted to compare the H3K27me3 distribution in heterochromatin, euchromatin or micro-domains bins (Additional file 2: Fig. S2). To validate the genome-wide impact of Beaf-KD compared to control cells, on variations of H3K27me3 ( Figure 1E), we chose to score H3K27me3 in +/-1 kb windows surrounding transcription start sites as we previously showed that 91.1% of Beaf32 sites mapped in this context (2).

Experimental and computational integration of 3C and Hi-C data
TADs were inferred using HiCseg algorithm after normalizing 1kb contact matrices as previously performed (7) using Kc167 Hi-C data from Li et al. (8) allowing us to monitor long-range interactions within such TADs (696 contiguous TADs of a median size ~110Kbp).
All interactions were measured using Hi-C data from S2 and Kc167 cells and assessed within the limits of the calculated TAD domains, with one exception for GSEA test (see below).
Extraction of long-range contacts was performed using dump command from juicer tools in order to obtain normalized contacts (observed/expected and Knight-Ruiz correction). 2D APA plots were generated with heatmap function from R stats library after importing dumped matrices. as previously described (9). Aggregation first was performed by aggregating longrange contacts onto the indicated binding sites in 1D (2) and then in 2D/3D essentially as previously described by 'Aggregate Peak Analysis' (APA)(9) run onto the normalized Hi-C data from S2 or KC cells (10, 11). LRIs were measured from the aggregated matrices to probe specific loops at the binding sites (LRIs-3; 3x3 squares at middle of plot), or in the lower left part of the 2D plot (LRIs-2; 3x3 squares reflecting TAD strength normalized to 3x3 squares on lower right) or compartments (LRIs-1; upper right (3x3 squares) normalized to lower right). For statistical tests probing LRIs variations between Beaf32-KD and control cells (12), we used Fisher exact test after genome-wide ranking of TADs depending on LRI 1/2/3s. contiguous non overlapping squares as represented in Figure 4C. To optimize statistical ranking test by Gene set enrichment analysis (GSEA), we utilized high-resolution Hi-C detecting approximately 2,000 TADs throughout the genome (12), which correspond to repressive TAD domains ( Figure 4B). A/B compartments were detected with the eigenvector function from juicer tools (v 0.7.5) with parameters KR and at 10 kb resolution. To retrieve a homogeneous A/B nomenclature for chromosomes 2L 2R 3L 3R and X, we further verified that arbitrary A compartment labeling was associated with active genes as detected by RNAseq (from GSE22069). Then, compartment eigenvector were estimated in parallel and for bins of similar sizes corresponding to either A-or B-compartments, or to the detected micro-domains. A/B compartments were reconstructed as previously described (13 was performed after neutralization of 0.2% SDS with Triton X100 (1.2% final), at 37°C overnight and then with another 500u for 2 hours followed by a stop reaction (1% SDS at 65°C for 20 min) before adding T4 ligation buffer (1/20 ; v/v) for a 4h incubation at 16C with T4 DNA ligase (200u; Fermentas) and 1 mM ATP. EDTA (1M) was added and DNA recovered after Phe/chloroform extraction and ethanol precipitation. DNA products were measured by qPCR (Viia7 ; Life Technology) with bona fide reverse primers (PCR efficiencies of 95-100 %). 3C chimeras were normalized to random ligation using the same conditions from DNA purified from BACs spanning the same regions (BACR34H23 and BACR05D08) with Taqman-MGB (16). Standard deviation was calculated for each condition from independent replicates measured in triplicates as previously done (2).

Epigenetic domains during development
H3K27me3 ChIP-seq data along all Drosophila developmental stages were obtained from modencode (17, 18). Correlation coefficients were calculated through a K-S test and a Pearson correlation value for each micro-domain compared with control bins of the same size distribution. Control bins were selected out of euchromatin regions (filtering out bins from heterochromatin and domain borders) as for micro-domains.

Expression analyses
RNA-seq analysis was performed as previously (1) with the RNAseq data from Beaf32depleted compared to siRNA mock depleted (luciferase) control cells (GSE57168). Boxplot for gene expression analysis was generated using log ratio of BeafKD/WT as normalized RNA-seq reads by taking the normalized DEseq2 output counts. Table S1. List of genes associated or not with euchromatic H3K27me3 micro-domains in presence or absence of the indicated insulator proteins (see Material and Methods for details).   Note that Beaf32 depletion did not significantly deregulate the Drosophila esc, Su(z)12 and E(z) encoding the subunits of PRC2 complex, as indicated .  control loci (see also panel C). Note that the 16 micro-domains also harbor significant loss of H3K27me3 upon Beaf32 mutants (see Figure 6).  or dsRNAs specific of cp190 or of the rad21 subunit of cohesin were used to impair the expression of the corresponding genes as previously described (2). Following cDNA preparation, expression of these (cp190, cohesin/rad21) or of control genes (actin, gapdh) were then measured by qPCR in triplicates with normalization to standard curves (see Materials and Methods). Note that Beaf32 expression is globally unchanged in cp190 as compared to cohesin depleted cells, thereby arguing against possible indirect influence of cp190 on LRIs due to beaf32 expression.

D. Chromosome conformation capture (3C) analysis of long-range interactions between the
mio gene -associated micro-domain and the distant Beaf32 peak near the TSS of tsp39D ('tsp39D')(see also Figure 3A). The graph represents the relative frequency chimera products as measured by qPCR from CP190-depleted (green) compared to control siRNA-depleted (black) cells (see also Figure 3). Proximal ligation products were estimated after HindIII restriction with reciprocal anchor primer flanking the micro-domain to probe long-range contacts with the Hind III fragments spanning the tsp39 locus, using TaqMan-MGB probe (see Additional Methods). Variations were tested by Student's t-test in triplicates.

Fig S4. Regulation of long-range interactions by insulator proteins
Genome-wide aggregation of long-range interactions (2, 9) using high-resolution Hi-C data from KC cells (11) similar to what was detected in S2 cells (see Figure 4) (19, 20).

Aggregation of Hi-C data in KC cells confirm genome-wide long-range interactions between
all genomic Beaf32 sites with GAF or dCTCF sites depending on their co-localizing with CP190 (left) or not (right). 1, 2 and 3 represent A/B compartments (LRI-1: long-range interactions detected between two A or two B domains), TAD (LRI-2: long-range interactions defining triangles in the Hi-C matrices (as detected in panel B) and specific loops (LRI-3: specific long-range interactions between two defines sites (e.g. Beaf32, GAF or dCTCF), respectively. Note that similar results were obtained when estimating LRIs from other sources of Hi-C data (12, 19).