The chromatin interaction network
To assemble the chromatin interaction network, we used the recent PCHi-C dataset in mESCs from Schoenfelder et al. [8], including interactions amongst promoters and between promoters and other genomic elements. The PCHi-C data were processed using the CHiCAGO algorithm. CHiCAGO is a HiC data processing method that filters out contacts that are expected by chance given the linear proximity of the interacting fragments on the genome and takes into account the biases introduced by the capture step used in the PCHi-C approach [29]. The network based on the significant interactions detected by CHiCAGO has 55,845 nodes and 69,987 connections (see “Methods” and Additional file 1: Figure S1). Of these interactions, 20,523 interactions connect a promoter fragment with another promoter fragment (P–P edges) and 49,464 interactions connect promoters with non-promoter “other end” fragments (P–O edges).
As in many networks, we can observe a main large connected component (LCC) that consists of 35,293 nodes (63 % of total nodes) joined by 52,984 edges (76 % of total edges) (Additional file 1: Figure S1). There are 264 disconnected components with more than ten nodes and about 4000 additional small components. Each chromatin fragment has an average of 2.5 neighbors with each promoter interacting with three non-promoter elements on average.
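The summary statistics quoted above can be recomputed from an edge list. The following is a minimal sketch in plain Python, assuming the significant interactions are available as pairs of fragment identifiers; the function names and the toy edges are hypothetical and are not taken from the original analysis.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return a list of node sets, one per connected component."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from a node that has not been visited yet.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def network_summary(edges):
    """Node count, edge count, mean degree and LCC size of an undirected network."""
    # Drop self-loops and duplicate edges before counting.
    unique_edges = {frozenset(e) for e in edges if e[0] != e[1]}
    nodes = {n for e in unique_edges for n in e}
    components = connected_components([tuple(e) for e in unique_edges])
    lcc = max(components, key=len) if components else set()
    return {
        "nodes": len(nodes),
        "edges": len(unique_edges),
        "mean_degree": 2 * len(unique_edges) / len(nodes) if nodes else 0.0,
        "lcc_nodes": len(lcc),
    }


# Toy network (made-up identifiers): one component of three nodes, one of two.
toy_edges = [("P1", "P2"), ("P2", "O1"), ("P3", "O2")]
summary = network_summary(toy_edges)
```

With the reported totals, the mean degree is 2 × 69,987 / 55,845 ≈ 2.5, in agreement with the average number of neighbors given in the text.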
Epigenomic features associated with chromatin fragments participating in 3D contacts
For each fragment in the PCHi-C network, we mapped a large set of 78 epigenomic features [38]. These features included cytosine modifications, histone marks, and ChIP-seq peaks of chromatin-related proteins, such as transcription factors and members of chromatin complexes, including cohesin, CTCF, PcG, and different RNAPII variants (Additional file 2). For each chromatin fragment we calculate the fraction covered by peaks of a specific feature and we define the abundance of each feature as the average of this value over all fragments in the network (see “Methods”). Figure 1a shows the fraction of fragments covered by EZH2 binding sites. We noticed the strong accumulation of the nodes that have binding sites for this PcG factor in specific regions of the network. Strikingly, this co-localization of the signal is observed despite the low overall prevalence of EZH2 binding in the fragments (only 10 % of fragments have some overlap with EZH2 peaks). Figure 1b shows the HoxA cluster region on chromosome 6. In this region, we observe that fragments connected by long-range interactions tend to have similar values of EZH2, with EZH2 peaks having similar heights on pairs of connected fragments. We therefore set out to investigate and quantify the extent to which connected nodes in the whole network have similar values for EZH2 and the other 77 epigenomic features. A high similarity of values in interacting nodes could suggest a role for some features in mediating these contacts.
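The per-fragment coverage and the network-wide abundance described above can be illustrated with a short sketch. It assumes half-open genomic coordinates (start, end) on a single chromosome and takes the union of overlapping peaks; the function names and the example coordinates are hypothetical.

```python
def covered_fraction(fragment, peaks):
    """Fraction of a fragment (start, end) covered by the union of peak intervals."""
    start, end = fragment
    length = end - start
    if length <= 0:
        return 0.0
    # Keep only peaks that overlap the fragment, clipped to its boundaries.
    clipped = sorted(
        (max(start, p_start), min(end, p_end))
        for p_start, p_end in peaks
        if p_end > start and p_start < end
    )
    covered = 0
    cursor = start  # right-most position already counted
    for p_start, p_end in clipped:
        if p_end > cursor:
            covered += p_end - max(p_start, cursor)
            cursor = p_end
    return covered / length


def feature_abundance(fragments, peaks_by_fragment):
    """Mean covered fraction over all fragments (fragments without peaks count as 0)."""
    values = [
        covered_fraction(coords, peaks_by_fragment.get(name, []))
        for name, coords in fragments.items()
    ]
    return sum(values) / len(values)


# Made-up example: two 100-bp fragments, only the first one carries peaks.
fragments = {"frag_a": (0, 100), "frag_b": (100, 200)}
peaks = {"frag_a": [(10, 30), (20, 50)]}
abundance = feature_abundance(fragments, peaks)
```

Because fragments without any peak contribute zero to the mean, a feature that overlaps only a small share of fragments ends up with a low abundance even when the covered fragments are substantially covered.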
Definition of ChAs
We propose an approach to identify epigenomic features that can be associated with 3D chromatin contacts. This involves measuring the extent to which neighboring network nodes have similar epigenomic features using ChAs. Assortativity, also called homophily, is the propensity for interacting nodes to have similar values [40] (see “Methods”). ChAs is defined as the assortativity of abundance levels of one specific epigenomic feature on the chromatin interaction network. In practical terms, it is the correlation of abundance of a single feature measured across all pairs of neighbors in the network. As a correlation coefficient, ChAs values range between −1 and 1. ChAs can therefore be used to identify features that are found in fragments that are globally connected in the network or to distinguish different types of fragments that tend to interact with each other. To aid the interpretation of these values, we can consider the three scenarios depicted in a schematic scatter plot of ChAs versus abundance (Fig. 1c):
1. Fragments that have a certain value for the epigenomic feature (that is, a certain proportion of the fragment is covered by peaks of that feature) predominantly interact with other fragments which have similar values for the same feature, but not with other fragments. In this case the ChAs for that feature will be positive (ChAs > 0). This situation would indicate that this feature is potentially associated with chromatin contacts.
2. Alternatively, there might be no relationship between the values of the feature on fragments and the values on their neighbors. In this case we will have ChAs = 0. This can happen either when the feature values do not have anything to do with the contacts or when the feature values are very homogeneous in the network: either the feature is low on all fragments (as would be the case for a very rare chromatin mark) or high on all fragments (as would be the case for ubiquitous chromatin marks). This produces low variability of abundance across nodes and, hence, the correlation of these values in neighboring nodes measured by ChAs tends to be 0.
3. Finally, it could be that fragments that have high values for a given feature frequently interact with fragments with low values for that same feature. In this case we will have a negative ChAs (ChAs < 0). This suggests that a set of genomic regions with the feature tend to interact in the network mostly with fragments of a different kind.
For this reason, it is important to consider the abundance of a feature (defined above as the fraction of fragment covered by the feature averaged over fragments) together with the ChAs value. In our EZH2 example, the abundance of this feature is 0.027 and its ChAs value is 0.34, which demonstrates how a fairly rare feature can be assortative.
To summarize, firstly we are interested in features that have high positive ChAs, as this signifies that the mark appears to be localized in specific connected areas of the network. These features are thus likely to be involved in the chromatin contacts. Secondly, we are looking for features with negative ChAs, which should be typical of one subclass of fragments that frequently interact with a different subclass of fragments. In this case, ChAs can be used to detect features that distinguish multiple chromatin fragment types.
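As an illustration of the definition, ChAs can be sketched as the Pearson correlation of a feature's values at the two ends of every edge. The version below counts each undirected edge in both directions, which is one common convention for assortativity and is an assumption here; the function name and toy values are hypothetical. When all values are identical the correlation is mathematically undefined, so the sketch returns NaN rather than 0.

```python
import math


def chas(edges, value):
    """Correlation of a feature's values across the two ends of every edge.

    `edges` is a list of (node, node) pairs and `value` maps each node to the
    fraction of the fragment covered by the feature.
    """
    xs, ys = [], []
    for a, b in edges:
        # Count each undirected edge in both directions (symmetric measure).
        xs += [value[a], value[b]]
        ys += [value[b], value[a]]
    n = len(xs)
    if n == 0:
        return float("nan")
    mean = sum(xs) / n  # xs and ys contain the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var > 0 else float("nan")


# Made-up values: fragments a and b carry the feature, c and d do not.
toy_value = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}
assortative = chas([("a", "b"), ("c", "d")], toy_value)      # like contacts like
disassortative = chas([("a", "c"), ("b", "d")], toy_value)   # like contacts unlike
```

The two toy networks correspond to the first and third scenarios: connecting fragments with equal values gives ChAs = 1, whereas connecting only fragments with opposite values gives ChAs = −1.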
A recent cohesin ChIA-PET dataset [42] allows us to illustrate the characteristics and biological interpretation of ChAs. Dowen et al. [42] reported interactions with pulldowns of the SMC1 cohesin unit in mESCs. We therefore proceeded to measure abundance and ChAs of SMC1 in this dataset, obtaining a fairly high value of abundance (0.27, mean of all features 0.09) and a low value of ChAs (0.09, mean of all features 0.28). This is expected due to the strong enrichment of fragments for presence of this protein (98 % of fragments have an SMC1 peak). This enrichment makes all fragments have similar proportions covered by the SMC1 feature, hence driving down the ChAs value. CTCF, in contrast, shows a roughly threefold increase in ChAs (0.29 versus 0.09 of SMC1) and only a 1.2-fold increase in abundance (0.33 versus 0.27 of SMC1) compared with SMC1. These results suggest that the subset of cohesin-bound fragments that also have CTCF bound tend to interact preferentially with each other. In summary, using this well understood dataset, we showed that ChAs is a measure that combines the presence of peaks in different interacting fragments and the topology of the chromatin interaction network. ChAs can thus detect differences and biases in the different types of chromatin interaction networks and identify the chromatin features playing important roles in 3D structure in the cases where these are not known a priori.
ChAs of chromatin features in the mESC chromatin interaction network detected by PCHi-C
We calculated ChAs for the 78 chromatin features in the entire PCHi-C network and compared these values with the corresponding abundance (Fig. 2a). The PcG proteins (EZH2, PHF19, RING1B, SUZ12, CBX7) and histone marks associated with them (H3K27me3, H2Aub1) have the highest ChAs values (ranging from 0.2 to 0.35, mean of all features 0.08; Fig. 2a), suggesting that this complex might be involved in establishing the 3D structure of chromatin in mESCs. This confirms and extends results observed for the Hox gene clusters [8, 20, 41]. RNAPII also has high ChAs, especially the variant implicated in transcriptional elongation (ChAs of RNAPII-S2P = 0.23; Fig. 2a). Two features with high abundance that also have high ChAs are H3K4me1, found on distal regulatory elements, and H3K36me3, marking transcribed gene bodies. On the other hand, H3K4me3, a modification associated with active promoters, is a very abundant mark (fourth most abundant, abundance = 0.12, mean of all features 0.02) but it has low ChAs (0.04).
We verified that ChAs is robust to random removal of edges in the network, such that our results do not depend on the completeness and accuracy of the chromatin interaction network (see Additional file 1: Text S1 and Figure S2). Moreover, we have ensured the significance of ChAs for at least 72 % of the features using a randomization that preserves network topology and overall feature abundance, as well as using an alternative approach preserving the features’ spatial distribution (see Additional file 1: Text S1 and Figure S3). We have also verified that ChAs values are generally not affected by removing short-range contacts that might produce similarity of abundance values in neighboring fragments (Additional file 1: Figures S4 and S5). Finally, comparison of ChAs with other network measures demonstrates that it is a complementary method to identify important features (see Additional file 1: Text S2 and S3, Figures S6 and S7).
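The two checks described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the procedure of Additional file 1: the null model simply permutes feature values across nodes (which keeps the topology and the overall abundance but not the spatial distribution of the feature), and the robustness check removes a random fraction of edges. All names and the toy ring network are hypothetical.

```python
import random


def edge_correlation(edges, value):
    """ChAs-style correlation of node values across edges (both directions counted)."""
    xs, ys = [], []
    for a, b in edges:
        xs += [value[a], value[b]]
        ys += [value[b], value[a]]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var > 0 else float("nan")


def shuffled_null(edges, value, n_shuffles=200, seed=0):
    """Correlations obtained after permuting the feature values across nodes."""
    rng = random.Random(seed)
    nodes = sorted(value)
    vals = [value[n] for n in nodes]
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(vals)
        null.append(edge_correlation(edges, dict(zip(nodes, vals))))
    return null


def empirical_p(observed, null):
    """One-sided empirical p value with the usual +1 correction."""
    hits = sum(1 for v in null if v >= observed)
    return (hits + 1) / (len(null) + 1)


def correlation_after_edge_removal(edges, value, fraction, seed=0):
    """Recompute the correlation after dropping a random fraction of the edges."""
    rng = random.Random(seed)
    kept = rng.sample(edges, k=round(len(edges) * (1 - fraction)))
    return edge_correlation(kept, value)


# Toy ring of 20 nodes: the first 10 carry the feature, the last 10 do not.
ring_edges = [(i, (i + 1) % 20) for i in range(20)]
ring_value = {i: 1.0 if i < 10 else 0.0 for i in range(20)}
observed = edge_correlation(ring_edges, ring_value)
null = shuffled_null(ring_edges, ring_value)
p_value = empirical_p(observed, null)
subsampled = correlation_after_edge_removal(ring_edges, ring_value, fraction=0.2)
```

In the toy ring only 2 of the 20 edges join nodes with different values, so the observed correlation is 0.8 and very few permutations are expected to reach it.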
In summary, the ChAs of an epigenomic feature is a useful global measure that relates feature abundance at interacting fragments with the underlying contact network topology. In the next section, we compare the ChAs values calculated on different chromatin interaction networks.
Chromatin assortativity in additional PCHi-C and ChIA-PET datasets
To test to what extent chromatin interaction network properties depend on the experimental protocol and signal detection algorithm, we took advantage of an alternative promoter interaction dataset in mESCs. Sahlén et al. [7] applied HiCap (a promoter capture method similar to PCHi-C) to mESCs, identifying interactions involving promoters. Using contacts amongst promoters and between promoter and non-promoter fragments from the Sahlén et al. dataset yields a network of 87,823 nodes with 173,801 interactions (including 19,309 promoter nodes and 82,659 P–P interactions). The HiCap technique is complementary to PCHi-C since a different enzyme is used for the restriction step, generating shorter interaction fragments compared with PCHi-C (median size 599 bp versus 3953 bp for PCHi-C). The shorter fragments produce a higher resolution picture of contacts between nearby fragments, at the expense of reduced coverage of long-range interactions. Visualizing the network shows that the largest connected component is comparatively smaller than in PCHi-C, encompassing 9.6 % of the total nodes and 12.8 % of the total connections (Additional file 1: Figure S8).
We analyzed the HiCap network in combination with the 78 chromatin features previously introduced. We repeated the calculation of ChAs of the chromatin features using the HiCap network as described above for the PCHi-C network. We directly compared the ChAs values for all features between PCHi-C and HiCap networks and found that, overall, they are highly correlated (Pearson’s R = 0.67, p value = 2.99 × 10^−11; Fig. 2b; Additional file 1: Figure S8d, e). For example, the PcG components are confirmed amongst the features with the highest ChAs, as was observed in the PCHi-C analysis, together with RNAPII, especially the S2P variant (Additional file 1: Figure S8e).
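The comparison between networks amounts to correlating two vectors of ChAs values indexed by feature. A minimal sketch follows, assuming the ChAs values of each network are stored in a dictionary keyed by feature name; the function names and the numbers in the example are invented for illustration and are not results from this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_networks(chas_a, chas_b):
    """Correlate ChAs values over the features measured in both networks."""
    shared = sorted(set(chas_a) & set(chas_b))
    r = pearson([chas_a[f] for f in shared], [chas_b[f] for f in shared])
    return r, shared


# Invented ChAs values for two hypothetical networks.
network_a = {"F1": 0.1, "F2": 0.2, "F3": 0.3}
network_b = {"F1": 0.2, "F2": 0.4, "F3": 0.6, "F4": 0.9}
r, shared_features = compare_networks(network_a, network_b)
```

Restricting the comparison to shared features matters when the feature sets differ between networks; in the analysis above the same 78 features were used for both.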
In summary, we have shown that ChAs is a useful metric to detect those epigenomic features that might be more influential in promoter-centered chromatin interaction networks and that the ChAs measurements are rather independent of the underlying experimental protocol. A comparison with a contact map that is not enriched for contacts involving promoters was performed using the previously mentioned SMC1 ChIA-PET dataset [42] (Additional file 1: Figure S9a, c). There was no significant correlation between ChAs values obtained for the ChIA-PET dataset and the promoter capture datasets (Fig. 2b), showing that the ChAs measurements are specific to the types of contacts assayed (Additional file 1: Figures S10 and S11). The cohesin ChIA-PET network is not enriched for promoters—only 20 % of the SMC1 ChIA-PET fragments overlap the PCHi-C promoter fragments (Additional file 1: Table S1)—but it still shows the assortativity of PcG features and of the actively elongating RNAPII-S2P.
To exclude the possibility that the correlation observed in the two promoter capture datasets was purely due to the experimental technique used to map the contacts, we also calculated ChAs for an RNAPII ChIA-PET dataset. Interactions involving RNAPII (8WG16 antibody, recognizing all variants) have been detected in mESCs [43], allowing us to analyze an RNAPII-focused chromatin interaction network (Additional file 1: Figure S9b, d). In addition, this network allowed us to further test the differences in ChAs of RNAPII variants, which we observed to be reproduced in the PCHi-C, HiCap, and RNAPII ChIA-PET networks but not in the SMC1 ChIA-PET network (Additional file 1: Figures S9–S11). The RNAPII ChIA-PET network is obviously enriched in promoter interactions (58 % of the RNAPII ChIA-PET fragments overlap PCHi-C promoter fragments; Additional file 1: Table S1) but, contrary to the PCHi-C and HiCap promoter-capture networks, it contains only fragments in which RNAPII is bound. Similarly to what we found in the PCHi-C and HiCap networks, PcG proteins and associated histone marks show fairly high ChAs in the RNAPII ChIA-PET network, but lower than H3K4me1 (an enhancer-specific mark) and the repressive mark H4K20me3 (Additional file 1: Figure S9b).
The ChAs of the non-specific RNAPII-8WG16 is quite low (0.07) in the RNAPII ChIA-PET network compared with all other features (mean 0.1) (Additional file 1: Figure S9b). A low ChAs is expected given that fragments in this network are highly enriched for the presence of this feature (84 % of fragments have an RNAPII-8WG16 peak, abundance = 0.5). This leads to uniform levels of RNAPII abundance on the nodes and, hence, we do not observe any localization of the mark in specific areas of the contact network. Interestingly, we do observe higher ChAs for the elongating variant RNAPII-S2P (0.19 versus 0.07 for the RNAPII-8WG16) accompanied by a comparatively lower abundance (0.25 versus 0.5 for RNAPII-8WG16), suggesting that regions of the genome in which elongation takes place interact preferentially (Additional file 1: Figure S9b).
Overall, we observe a significant correlation of the RNAPII ChIA-PET ChAs values with PCHi-C (Pearson’s R = 0.37, p value = 1.01 × 10^−3; Fig. 2b; Additional file 1: Figure S10c) and an even better correlation with HiCap (Pearson’s R = 0.59, p value = 9.77 × 10^−9; Fig. 2b; Additional file 1: Figure S11b), despite the drastically different topology (Additional file 1: Figure S11d).
Comparing the results of our approach using these four different networks, we conclude that the methodology is able to identify the putative roles of specific epigenomic features in mediating different types of chromatin contacts. The high ChAs values of PcG and RNAPII are confirmed in different datasets but different features acquire different levels of ChAs and, potentially, different relevance in the different contact maps. Although PCHi-C, HiCap, and RNAPII ChIA-PET are all enriching for interactions involving promoters, there are clear differences in the resulting networks. Notwithstanding the strong differences in topology and network statistics between promoter-capture and ChIA-PET networks (Additional file 1: Figure S9c–e), we find higher similarity between the three promoter-enriched datasets (PCHi-C, HiCap, and RNAPII ChIA-PET; Additional file 1: Figures S10 and S11). The correlation between ChAs of promoter-capture networks is improved when looking at PCHi-C and HiCap subnetworks that only include P–P contacts or P–O contacts (Fig. 2b; Additional file 1: Figure S12). We therefore proceed with our goal to use ChAs to analyze the difference between interactions involving two promoters and interactions between promoters and other genomic elements.
Distinct ChAs properties of contacts amongst promoters and between promoters and other elements
As mentioned above, the experimental design of promoter-capture HiC (PCHi-C or HiCap) produces chromatin fragments of two kinds: promoter (P) fragments are the ones that are captured in the experiment because they match a library of promoters and are therefore identified as baits; other-end (O) fragments are chromatin fragments found to interact with the promoter baits.
We first investigated the differences in chromatin features associated with PCHi-C contacts involving two promoters (P–P) and contacts involving a promoter and an other-end fragment (P–O). We calculated feature abundance and ChAs values for two subnetworks: the P–P network and the P–O network (Fig. 3a; Additional file 1: Figure S12). We combined these data in a comparative ChAs plot to directly assess the relationship between the ChAs of chromatin features measured in the two different subnetworks in PCHi-C (Fig. 3b).
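Splitting the network into the two subnetworks only requires knowing which fragments are promoter baits. The sketch below assumes the baits are given as a set of fragment identifiers; the function name and identifiers are hypothetical. ChAs can then be computed separately on each returned edge list.

```python
def split_subnetworks(edges, promoters):
    """Separate an edge list into P–P edges and P–O edges.

    P–O edges are returned with the promoter fragment first. Edges without
    any promoter are not expected in promoter-capture data and are ignored.
    """
    pp_edges, po_edges = [], []
    for a, b in edges:
        a_is_promoter = a in promoters
        b_is_promoter = b in promoters
        if a_is_promoter and b_is_promoter:
            pp_edges.append((a, b))
        elif a_is_promoter:
            po_edges.append((a, b))
        elif b_is_promoter:
            po_edges.append((b, a))
    return pp_edges, po_edges


# Made-up example with two baits (P1, P2) and two other-end fragments.
promoter_baits = {"P1", "P2"}
all_edges = [("P1", "P2"), ("O1", "P1"), ("P2", "O2"), ("O1", "O2")]
pp, po = split_subnetworks(all_edges, promoter_baits)
```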
Strikingly, we find a number of features with very different values of ChAs in these two subnetworks. For example, in Fig. 3b we see a group of features with positive ChAs in the P–P interactions, implying that these epigenomic features are found in promoters that contact each other, and negative ChAs in the P–O interaction network, implying that these features are usually not present on the other-end fragments that contact promoters. The features that have discordant signs of ChAs in the two subnetworks include many promoter-specific histone modifications and chromatin factors, specifically H3K4me3 (typically denoting active promoters), HCFC1 (transcription activator complex), SIN3A (transcriptional repressor complex), KDM2A (H3K36 demethylase), NMYC, OGT (histone acetyl transferase complex), H3K4me2, and H3K9ac (denoting active promoters) [38]. Features that have slightly higher or equal ChAs in the P–O interactions include CBX3 (the HP1γ implicated in elongation [44, 45]) and RNAPII-S2P. PCHi-C can only detect interactions involving at least one promoter. At the same time, most of the epigenetic features considered here are characteristic of promoters, due to the large bias in datasets available in the literature. Therefore, we are unlikely to find features with higher ChAs in P–O versus P–P contacts, which would lie at the upper left corner above the diagonal in Fig. 3b. However, the features closer to the diagonal are features that are present in both P–P and P–O contacts. The PcG proteins and their associated histone marks are found very close to the diagonal on the comparative ChAs plot of Fig. 3b, suggesting that they are found at both P–P and P–O contacts, together with H3K36me3 and the cytosine modifications 5hmC and 5fC.
The comparative ChAs plots for the HiCap datasets are very consistent with the PCHi-C ones (Fig. 3c; Additional file 1: Figure S12), as shown clearly in a scatter plot of the difference of ChAs between P–O and P–P subnetworks in the two datasets (Fig. 3c; further comparisons of P–P and P–O ChAs are shown in Additional file 1: Figure S12). Interestingly, we observe substantially different ChAs scores for different RNA polymerase variants exclusively in P–O contacts, with elongating RNAPII having a ChAs 23-fold higher than the non-elongating forms (ChAs of RNAPII-S2P = 0.23 versus 0.01 for RNAPII-8WG16; Fig. 3b).
In order to assess the robustness of these differences, we generated 100 networks by random partial rewiring of the original network and re-calculated the ChAs in P–P and P–O subnetworks (see “Methods” and Additional file 1: Figure S12H). The simulations show non-overlapping simulated ChAs distributions in the P–O subnetworks for the different RNAPII variants, whereas the corresponding distributions in the P–P subnetworks are highly overlapping. These results suggest a significant difference in the role of elongating polymerase between P–P and P–O contacts.
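One way to generate partially rewired networks is a degree-preserving edge swap applied to a fraction of the edges. The sketch below is an assumption about how such rewiring could be done and is not necessarily the scheme described in “Methods”; the function name, the `fraction` parameter, and the toy ring are hypothetical. It assumes the input contains no duplicate edges or self-loops.

```python
import random
from collections import Counter


def partially_rewire(edges, fraction=0.1, seed=0, max_tries_per_swap=100):
    """Rewire roughly `fraction` of the edges while keeping every node's degree.

    A swap turns (a, b), (c, d) into (a, d), (c, b) and is rejected if it
    would create a self-loop or an edge that already exists.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    n_swaps = int(len(edges) * fraction / 2)  # each swap changes two edges
    for _ in range(n_swaps):
        for _ in range(max_tries_per_swap):
            i, j = rng.sample(range(len(edges)), 2)
            (a, b), (c, d) = edges[i], edges[j]
            new_1, new_2 = frozenset((a, d)), frozenset((c, b))
            if len(new_1) < 2 or len(new_2) < 2:
                continue  # self-loop
            if new_1 in present or new_2 in present or new_1 == new_2:
                continue  # duplicate edge
            present -= {frozenset((a, b)), frozenset((c, d))}
            present |= {new_1, new_2}
            edges[i], edges[j] = (a, d), (c, b)
            break
    return edges


def degrees(edges):
    """Number of edges incident to each node."""
    return Counter(node for edge in edges for node in edge)


# Toy ring of 30 nodes, rewired 100 times with different seeds.
original = [(i, (i + 1) % 30) for i in range(30)]
rewired_networks = [partially_rewire(original, fraction=0.5, seed=s) for s in range(100)]
```

Recomputing ChAs on each rewired network yields a distribution of values per feature, and the overlap between the distributions of two features can then be inspected as described above.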
Characterization of overlapping chromatin communities reveals PcG and RNAPII-S2P modules
A large portion of the PCHi-C interactions form a large connected component (LCC), also called a “giant component” [35]. There is a significant correlation of the ChAs values measured for the LCC and for the interactions in the rest of the network (Pearson’s r = 0.8, p = 0; Additional file 1: Figure S13). However, we observe a higher ChAs for PcG features in the LCC (mean 2.8-fold increase; especially for EZH2, having ChAs = 0.37 in the LCC compared with ChAs = 0.14 in the rest of the network). Considering the LCC, we then identify features that are most abundant in nodes with high betweenness centrality, defined as the number of shortest paths from all nodes to all others that pass through that node [46]. PcG features are enriched in nodes with high betweenness centrality, again suggesting PcG’s role in holding the core of the interaction network together (Additional file 1: Figure S14a).
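Betweenness centrality can be computed with Brandes' algorithm. The following plain-Python sketch handles unweighted, undirected networks and returns unnormalized values in which, when several shortest paths connect a pair of nodes, each intermediate node receives the corresponding fraction. The function name and toy path are hypothetical, and the original analysis may have relied on an existing network library instead.

```python
from collections import defaultdict, deque


def betweenness_centrality(edges):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        # Breadth-first search recording shortest-path counts and predecessors.
        order = []
        predecessors = defaultdict(list)
        n_paths = dict.fromkeys(adjacency, 0.0)
        n_paths[source] = 1.0
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, starting from the nodes farthest from the source.
        dependency = dict.fromkeys(adjacency, 0.0)
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every unordered pair of nodes was visited from both of its ends.
    return {node: s / 2 for node, s in score.items()}


# Toy path A-B-C-D: only the two inner nodes lie on shortest paths between others.
path_scores = betweenness_centrality([("A", "B"), ("B", "C"), ("C", "D")])
```

In the toy path, nodes B and C each lie on two shortest paths between other nodes (A to C and A to D for B; A to D and B to D for C), whereas the end nodes lie on none.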
To investigate whether PcG features were also involved in mediating connections between different chromatin communities, or neighborhoods [35], we analyzed the LCC with the ModuLand algorithm, which identifies overlapping modules [47] (Fig. 4a; Additional file 1: Text S3). Once overlapping communities were defined, we calculated the “bridgeness” of each node, defined as the number of different chromatin communities (modules) that it belongs to [48]. Figure 4b shows that the features most abundant in the nodes with highest bridgeness are the ones typical of promoters (SIN3A, HCFC1, and H3K4me3) as well as transcription factors such as E2F1, N-MYC, C-MYC, and KLF4. In contrast, PcG features are not abundant in high bridgeness nodes, suggesting that nodes in which PcG is present do not tend to belong to multiple chromatin communities.
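Given a set of overlapping modules, bridgeness as defined above is a simple count. The sketch below assumes that module membership has already been reduced to plain sets of nodes (ModuLand itself is not reimplemented here), and it adds a hypothetical helper that averages a feature over the nodes with the highest scores; all names and values are made up for illustration.

```python
from collections import Counter


def bridgeness(modules):
    """Number of modules each node belongs to, given {module_id: nodes}."""
    counts = Counter()
    for members in modules.values():
        counts.update(set(members))  # count each node once per module
    return dict(counts)


def mean_in_top_nodes(score, value, fraction=0.1):
    """Mean feature value over the top `fraction` of nodes ranked by `score`."""
    ranked = sorted(score, key=score.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(value.get(node, 0.0) for node in ranked[:k]) / k


# Made-up overlapping modules: node c belongs to all three.
toy_modules = {"m1": {"a", "b", "c"}, "m2": {"c", "d"}, "m3": {"a", "c", "e"}}
node_bridgeness = bridgeness(toy_modules)
toy_feature = {"a": 0.2, "c": 0.8}
top_mean = mean_in_top_nodes(node_bridgeness, toy_feature, fraction=0.2)
```

Comparing the mean of a feature in the top-ranked nodes with its mean over all nodes gives a simple indication of whether the feature is enriched in high-bridgeness nodes.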
The relative values of bridgeness and betweenness centrality can be used to distinguish the so-called date and party hubs, defined as nodes that entertain multiple interactions respectively one at a time or simultaneously [49, 50] (Additional file 1: Text S4). Extending this concept and using the enrichment of features in the top bridgeness and betweenness nodes, we can identify “party features”, found in nodes that mainly belong to a single community, and “date features”, found in nodes that belong to multiple communities (Additional file 1: Figure S14b). Only the PcG features (and to a lesser extent KDM2B, TAF1, and H4K20me3) appear to have a definite “party” character, suggesting that they might mediate more stable interactions due to their high abundance in nodes that are central in the network (high betweenness) but mostly belong to a single community (low bridgeness) (Additional file 1: Figure S14b). Similarly to what was observed for values of ChAs in the P–O subnetwork (Fig. 3b), we see a striking difference between the elongating RNAPII variant S2P and non-elongating RNAPII variants (Fig. 4b; Additional file 1: Figure S14b). The non-elongating RNAPII variants show similarly high abundance in top bridgeness and top betweenness nodes, suggesting their presence in nodes that are central and shared between multiple modules. In contrast, the elongating S2P variant is found in more peripheral nodes that specifically belong to a single module, as shown by equally low enrichment in top bridgeness and top betweenness nodes (Additional file 1: Figure S14b). To summarize, PcG features are found in highly connected and highly central nodes, but these nodes do not tend to belong to multiple network communities. The elongating variant of RNAPII, contrary to other RNAPII variants, is found mostly in nodes that belong to a single community and that are more peripheral to the network (low betweenness centrality).
We investigate the difference between RNAPII variants further by looking at enrichment of features in chromatin communities identified by ModuLand, concentrating on the features that showed a high value of ChAs (ChAs > 0.1; Fig. 4b). The heat map in Fig. 4c clearly shows the presence of four clusters. The largest and most prominent is cluster IV including all PcG features, which are enriched in a specific subset of chromatin communities. Clusters II and III contain, respectively, non-elongating forms of RNAPII and DNA cytosine modifications. On the other hand, RNAPII-S2P appears in cluster I in chromatin communities that are also enriched in H3K36me3 and CBX3. Although all enrichments in RNAPII are anti-correlated with enrichments in PcG features (Fig. 4c), this anti-correlation pattern is stronger for the actively elongating variant RNAPII-S2P (Additional file 1: Figure S15). Overall, these results suggest that PcG features are found in very central and connected nodes that interact stably, forming specific chromatin communities. Similarly, active elongation is taking place in specific chromatin communities but fragments of chromatin bound by elongating RNAPII are not particularly connected or central in the network (Additional file 1: Figure S6). In the next section we explore the differences between the different RNAPII variants in more detail.
RNAPII-S2P has higher ChAs in P–O contacts compared with other RNAPII variants
Our collection of genome-wide features includes five different ChIP-seq datasets for RNAPII obtained using different antibodies. Of these, three recognize different phosphorylated forms of RNAPII involved in the different stages of transcription [51, 52] (Fig. 5a). We can therefore distinguish between ChIP-seq peaks of RNAPII in its initiating or repressed form (S5P, S7P), in its actively elongating variant (S2P), or in any of its variants (RNAPII-8WG16, POLII).
We compared the ChAs of the different RNAPII variants in the whole PCHi-C and HiCap networks. As was already noted, RNAPII-S2P, which denotes elongation of actively transcribed genes, shows higher ChAs than the other RNAPII variants in both datasets (Fig. 5b). These differences are robust to partial rewiring of the networks (see “Methods” and Additional file 1: Figure S16a). Figure 5c shows the corresponding abundance values, which are very comparable between different RNAPII variants within each dataset.
Next, we compared the ChAs of the different RNAPII variants in the RNAPII ChIA-PET network (Fig. 5b). In principle, the RNAPII ChIA-PET dataset provides us with the network of chromatin contacts in mESCs mediated by any RNAPII, as the antibody used in this experiment (8WG16) recognizes all RNAPII variants. Interestingly, there is an increase of ChAs from repressed to actively elongating RNAPII in all three networks (Fig. 5b; Additional file 1: Figure S16a). These results suggest that, whereas all interacting fragments in these promoter-rich networks do contain some form of polymerase, the presence of active forms of RNAPII distinguishes different network neighborhoods in which active elongation is taking place, as also suggested in Fig. 4c.
Finally, we used the ChIA-PET network of contacts mediated by cohesin in mESCs as a negative control [42]. In this dataset we see many contacts that do not involve any promoters or genes, in which we do not expect to find any RNAPII bound (61 % of fragments in the SMC1 ChIA-PET dataset have no signal for RNAPII-8WG16). Indeed, the different variants of RNAPII in this cohesin-mediated network have very high ChAs (Fig. 5b; Additional file 1: Figure S16a). The presence of any form of RNAPII clearly separates regions of the cohesin-centered network where transcription is active from regions where it is not. These trends cannot be explained by changes in abundance (Fig. 5c).
We further compared the ChAs of different RNAPII variants between P–P and P–O contacts (Fig. 5d). In the PCHi-C network we observe the ChAs for different phosphorylation states of RNAPII to vary widely in the P–O contacts (from close to 0.01 to 0.23, the third highest value overall), while all states have similar ChAs in the P–P contacts (ChAs range 0.21–0.22) (Fig. 5d; Additional file 1: Figure S16b). To understand this trend better, we also look at abundance of the different RNAPII variants in the different subnetworks (Additional file 1: Figure S16c). Whereas in the P–P subnetwork the abundance decreases from inactive forms of RNAPII to the elongating form, in the P–O subnetwork the elongating form is equally abundant compared with the other forms. We can therefore conclude that the different ChAs observed for different forms of RNAPII are related to the topological distribution of RNAPII binding on the network, rather than simply to changes in average abundance in the network. This finding suggests that when O fragments contact P fragments, predominantly the elongating form of RNAPII is present on both fragments. The difference between different RNAPII forms specific to P–O contacts is even more evident in the HiCap dataset where the ChAs value of non-elongating variants of RNAPII is negative (Fig. 5e; Additional file 1: Figure S16d). This is likely due to the higher resolution of the HiCap experiment, which allows us to better discriminate P and O fragments that are probably merged in some of the larger PCHi-C fragments.
We investigated further to determine whether the patterns of ChAs of different RNAPII variants change depending on the type of fragments contacted by the promoter. We selected two types of O fragments: enhancers (fragments with H3K4me1 > 0) divided into active enhancers (H3K4me1 > 0 and H3K27ac > 0) and poised enhancers (H3K4me1 > 0 and H3K27me3 > 0). We can thus separately compare ChAs values between P–P contacts and contacts of P fragments with each type of O fragment. As shown in Fig. 5f, RNAPII-S2P has higher ChAs than the other RNAPII variants in contacts between promoters and active enhancers but not in contacts with poised enhancers (Additional file 1: Figure S16). This suggests that the presence of elongating RNAPII at the P–O contact and the activity of the enhancer might be related.
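The classification of other-end fragments follows directly from the thresholds given above. A minimal sketch, assuming each fragment is described by the fraction covered by H3K4me1, H3K27ac, and H3K27me3 peaks; the function name is hypothetical. Note that with these thresholds a fragment carrying both H3K27ac and H3K27me3 signal would fall into both enhancer classes, so the sketch returns a list of labels.

```python
def classify_other_end(h3k4me1, h3k27ac, h3k27me3):
    """Labels for an O fragment based on the fraction covered by each mark."""
    if h3k4me1 <= 0:
        # No H3K4me1 signal: referred to as a non-enhancer fragment.
        return ["non-enhancer"]
    labels = ["enhancer"]
    if h3k27ac > 0:
        labels.append("active enhancer")
    if h3k27me3 > 0:
        labels.append("poised enhancer")
    return labels


# Made-up coverage values for three other-end fragments.
examples = {
    "O1": classify_other_end(h3k4me1=0.3, h3k27ac=0.1, h3k27me3=0.0),
    "O2": classify_other_end(h3k4me1=0.2, h3k27ac=0.0, h3k27me3=0.4),
    "O3": classify_other_end(h3k4me1=0.0, h3k27ac=0.5, h3k27me3=0.0),
}
```

The P–O edges can then be grouped by the label of their other-end fragment, and ChAs computed separately within each group.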
Strikingly, we also observe a considerable number of contacts between promoters and fragments that do not have the H3K4me1 enhancer mark (H3K4me1 = 0, referred to as non-enhancers in the figure), which we found to be strongly enriched for H3K36me3 (Additional file 1: Figure S17) and that, in 19 % of cases, overlap protein coding gene bodies. In these contacts ChAs varies from very negative in the non-specific forms to highly positive for the elongating form. This is not due to a change in the abundance of different forms of RNAPII (Additional file 1: Figure S18) and these results are largely confirmed in HiCap (Fig. 5g). These findings suggest that promoters can contact transcribed gene bodies.