Research | Open | Published:
VULCAN integrates ChIP-seq with patient-derived co-expression networks to identify GRHL2 as a key co-regulator of ERa at enhancers in breast cancer
Genome Biologyvolume 20, Article number: 91 (2019)
The Correction to this article has been published in Genome Biology 2019 20:122
VirtUaL ChIP-seq Analysis through Networks (VULCAN) infers regulatory interactions of transcription factors by overlaying networks generated from publicly available tumor expression data onto ChIP-seq data. We apply our method to dissect the regulation of estrogen receptor-alpha activation in breast cancer to identify potential co-regulators of the estrogen receptor’s transcriptional response.
VULCAN analysis of estrogen receptor activation in breast cancer highlights the key components of the estrogen receptor complex alongside a novel interaction with GRHL2. We demonstrate that GRHL2 is recruited to a subset of estrogen receptor binding sites and regulates transcriptional output, as evidenced by changes in estrogen receptor-associated eRNA expression and stronger estrogen receptor binding at active enhancers after GRHL2 knockdown.
Our findings provide new insight into the role of GRHL2 in regulating eRNA transcription as part of estrogen receptor signaling. These results demonstrate VULCAN, available from Bioconductor, as a powerful predictive tool.
Breast cancer is the most common form of cancer in women in North America and Europe accounting for 31% of all new cancer cases. In the USA, it is estimated that 41,400 deaths will have occurred from the disease in 2018 . The majority of breast cancers, approximately 70%, are associated with deregulated signaling by the estrogen receptor-alpha (ER), which drives tumor growth. Therefore, in ER-positive (ER+) tumors, ER is the primary therapeutic target. During activation, ER recruits several cofactors to form an active complex on the chromatin. FOXA1 is of particular interest as the protein shares nearly 50% of its genomic binding sites with ER and has been shown to operate as a pioneer factor before ER activation [2, 3]. It is through FOXA1 and other cofactors (e.g., SRC-1) [4, 5] that ER is able to recruit RNA polymerase II at the gene promoter sites by way of adaptor proteins in order to initiate transcription . Combinatorial treatments targeting ER cofactors present a significant opportunity in breast cancer therapy for increasing patient survival. In particular, the pioneer factor FOXA1  has been identified as a novel therapeutic target for the treatment of breast cancer, while the EZH2-ERα-GREB1 transcriptional axis has been shown to play a key role in therapeutic resistance .
ChIP-seq enables the identification of potential site-specific interactions at common binding sites between transcription factors and their cofactors; however, to fully characterize all potential cofactors of a single project on this scale is laborious and expensive. To follow up all potential cofactors identified by a chromatin-wide proteomics method, e.g., RIME  or ChIP-MS , would take hundreds of individual ChIP-seq experiments. Studies like ENCODE  have gone a long way to provide resources to meet these challenges; however, the inherent scale of the problem means public studies can only offer data for a subset of TF in a limited number of models. A single lab to undertake this level of experimentation is unfeasible and, in cases where suitable antibodies for the ChIP do not exist, impossible.
To enable discoveries beyond collections like ENCODE, we are proposing a computational framework to integrate patient data in the prediction of functional protein-protein interactions. By applying machine learning methods, we are able to surpass the limitation of current predictive tools that exist to support the interpretation of data. Previous methods provide information in the context of predefined biological pathways and established gene sets [12, 13] or through motif analysis , while our method is built on data specific to the disease being studied. Further, standard gene set enrichment analysis has inherent limitations because it was not designed for reconstructing gene networks, whereas one of the advantages of VULCAN is that it down weights genes shared by multiple TFs.
Our method, “VirtUaL ChIP-seq Analysis through Networks” (VULCAN), is able to specifically analyze the potential disease-specific interactions of TFs in ChIP-seq experiments by combining machine learning approaches and patient data. Previously, the strategies employed by VULCAN were limited to the analysis of transcription data. By developing VULCAN to overlay co-expression networks established from patient tumor data onto ChIP-seq data, we are able to provide candidate co-regulators of the response to a given stimulus (Fig. 1). Further, as VULCAN builds on transcriptional master regulator analysis, the output from the pipeline provides the end user with functional information in terms of the activity of potentially interacting TFs. The combination of disease-specific context and TF activity information presents a significant step forward in providing valuable information for the elucidation of on-chromatin interactions from ChIP-seq experiments over previous strategies.
Through the application of VULCAN to the activation of the ER in breast cancer, we were able to identify multiple previously characterized cofactors of the ER along with GRHL2 as a potential co-repressor of the ER. We then demonstrated experimentally that GRHL2 is able to modulate the expression of eRNA at ER bound enhancers, and the removal of the P300 inhibitory alpha helix results in suppression of the inhibitory effect on eRNA production.
VULCAN integrates ChIP-seq data (Fig. 1, step 1) with co-expression networks (Fig. 1, step 2) to predict cofactor activity (Fig. 1, step 3). The initial ChIP-seq data is converted into genomic regions, and if multiple conditions are supplied, the changes in the transcription factor affinity are calculated. In parallel, master regulator analysis of tumor transcriptional data is used to provide tissue-specific information on the regulation of genes by TFs within the tumor type. The integration of these two data types provides context-specific results and differentiates VULCAN from the existing methods which make use of predefined unweighted gene sets or motif analysis. VULCAN additionally makes use of the key functionality of the VIPER algorithm  that assigns edge-specific scores like mode of action and likelihood to the reconstructed network.
In the following, we first benchmark VULCAN’s performance in a comprehensive comparison to alternative approaches. We then apply it to our data on temporal ER binding, which identifies GRHL2 as a novel ER cofactor, and we explore its function.
Comparison of VULCAN to existing methods
Mutual information networks outperform partial correlation networks
We generated a mutual information network with ARCANe alongside several partial correlation networks at different thresholds all from the TCGA breast cancer data. To ensure the robustness of our method, we tested the overlap of every partial correlation network with the mutual information network using the Jaccard index (JI) criterion (Additional file 1: Figure S1). Finally, we showed how the Jaccard index between partial correlation networks and the ARACNe network is always significantly higher than expected by selecting random network edges (Additional file 1: Figure S2). For further analysis, we selected the mutual information network generated by ARACNe as this method outperformed the partial correlation networks at all thresholds.
GSEA is the optimum method for VULCAN’s target enrichment analysis
VULCAN applies gene set enrichment analysis  to identify enrichment of our mutual information network derived regulons in differential ChIP-seq data. To validate our method, we compared the results of VULCAN when applied to our ER binding data against three independent methods previously applied to benchmark VIPER . First, we implemented a Fisher p value integration step. This test lacks stringency and results in nearly all regulons as significantly enriched (Additional file 1: Figure S3). Second, we implemented a fraction of targets method, defining for every TF the fraction of their targets that are also differentially bound. This alternative to VULCAN ignores the MI strength of interaction and the individual strengths of differential bindings, reducing the resolving power of the algorithm (Additional file 1: Figure S4). Finally, we compared to Fisher’s exact method, which assesses the overlap between networks and significant differential binding. This method is too stringent (as observed in the original VIPER paper) ; and even without p value correction, there are no significant results, even at low stringency, demonstrating the low sensitivity of using Fisher’s exact method (Additional file 1: Figure S5). In summary, VULCAN GSEA implementation outperformed all three alternative methods we tested (t test based; fraction of targets method; and Fisher’s exact method) in our dataset and was therefore applied to all downstream analysis of ChIP-seq data.
VULCAN outperforms enrichment analysis tools (GREAT, ISMARA, and ChIP-Enrich)
To further validate our method, we compared the output of our GSEA analysis with different versions of promoter-enrichment approaches implemented by GREAT , ISMARA , and ChIP-Enrich . The VULCAN analysis shows a significant overlap in terms of detected pathways with the GREAT method (Additional file 1: Figure S6). ChIP-Enrich identifies enrichment of a number of TFs also predicted by VULCAN, but it fails to identify ESR1 as the top transcription factor affected by our experiment (Additional file 1: Figure S7). ISMARA succeeds at identifying ESR1 using a motif-based analysis but does not identify other candidate binding TFs (Additional file 1: Figure S8). In summary, VULCAN outperforms both ISMARA and ChIP-Enrich, and significantly overlaps with GREAT, but provides additional value through inference of TF factor activity.
Temporal analysis of ER DNA binding profiles after activation by E2
We performed four replicated ChIP-seq experiments for ER at three time points (0, 45, and 90 min) after estradiol treatment (Fig. 2) in the ER+ breast cancer cell line, MCF7. The cistromic profile of ER at 45 and 90 min was then compared to 0 min to identify binding events enriched by E2. Our analysis (Fig. 2b, c) identified 18,900 statistically significant binding events at 45 min (FDR < 0.05) and 17,896 numbers at 90 min. We validated the ER binding behavior with ChIP-qPCR (Fig. 2a), and the response was sustained, in agreement with our previous study .
We performed motif enrichment analysis (HOMER software) on ER binding sites detected by differential binding analysis. Our analysis confirmed a strong enrichment for a single element, ERE, bound at both 45 and 90 min, with a corrected p value of 0.0029 (Fig. 3f). When clustered according to peak intensity, the samples cluster tightly in two groups: treated and untreated (Additional file 1: Figures S9, S10, and S11), but treatment at 45 and 90 min is detectably different on a genome-wide scale, as highlighted by principal component analysis (Additional file 1: Figure S12 and S13).
We performed a gene set enrichment analysis (GSEA)  and an associated rank enrichment analysis (aREA)  using the differential binding at gene regulatory regions with time 0 as reference. Individual differential binding signatures for GSEA were calculated using a negative binomial test implemented by DiffBind . The collective contribution of differentially bound sites highlights several ER-related pathways in both the GSEA and aREA analyses [19,20,21] (Additional file 1: Figure S14). The strongest upregulated GSEA pathway in both time points (Additional file 1: Tables S1 and S2) was derived from RNA-seq in an MCF7 study using estradiol treatment , confirming the reproducibility of our dataset.
VULCAN analysis of ER activation
VULCAN identifies coactivators and co-repressors of ER
We leveraged the information contained in mutual information networks to establish TF networks enriched in the differential binding patterns induced by estradiol. From our analysis of ER binding, we established four classes of modulation: early coactivators, early co-repressors, delayed coactivators, and transient coactivators (Fig. 3).
Using VULCAN, we defined TF network activity of occupied regulatory regions (Fig. 3a) according to the binding of ER within their promoter and enhancer regions (limited to 10 kb upstream of the transcription starting site to ensure gene specificity). We define early coactivators as those TFs whose network is upregulated at both 45 and 90 min (Fig. 3b); these genes include AR, SP1, and CITED1. TFs with opposite behavior (namely TFs whose negative/repressed targets in the ARACNe model are occupied by ER), or “early co-repressors,” include GLI4, MYCN, and GRHL2 (Fig. 3c). Some TFs appear to have their targets transiently bound at 45 min but then unoccupied at 90 min, and therefore, we dubbed them “transient coactivators” (Fig. 3d). We further defined TFs active at 90 min but not at 45 min as “delayed coactivators,” noting these cofactors could be the transient if the response is not completed by 90 min. While this category exists, and notably contains both ESR1 and the known ESR1 interactor GATA3, it is just below the significance threshold at 45 min (Fig. 3e).
We repeated our TF network activity analysis of ER activation (Fig. 3a–e) on an independent dataset from TCGA and found similar results to those established from the METABRIC-derived network (Additional file 1: Figures S14, S15, S16, S17, S18, and S19).
To ensure the robustness of the results, we performed a joint analysis of data obtained from both networks. At 45 (Fig. 4a) and 90 min (Fig. 4b), we identified candidates, specifically the ESR1, GATA3, and RARA networks, which were consistently and robustly activated by ER in both time points. The joint analysis also identified candidate co-repressors, including HSF1 and GRHL2.
VULCAN results are specific to the tissue used for network modeling
Regulatory networks can be tissue specific due to a variety of biological reasons, such as chromatin status, cofactor availability, and lineage-dependent transcriptional rewiring . We tested whether our VULCAN results can be affected by the choice of the ARACNe-inferred regulatory network. In order to do so, we required a gene expression dataset large enough for robust mutual information inference (> 100 samples), based on the same library preparation and sequencing protocols as the breast cancer TCGA dataset used in our analysis (to remove the possibility of technical differences), but ultimately derived from a tissue as distant as possible from breast cancer (BRCA) on which network models on this study are derived. For this purpose, we computed ARACNe regulatory models on the TCGA dataset for acute myeloid leukemia (AML), a liquid tumor histologically very different from BRCA. This AML-derived network shows globally weaker VULCAN enrichment scores than the BRCA-derived network and a weak positive correlation with the results obtained through breast cancer regulatory models (Additional file 1: Figure S20). The positive correlation suggests that regulatory networks inferred in breast cancer are tissue specific and can only in part be recapitulated by a leukemia-inferred network.
VULCAN is able to predict protein-protein interactions in both patient-derived xenografts (PDX) and prostate cancer
To demonstrate the general applicability of VULCAN, we applied the algorithm to a breast cancer patient-derived xenograft dataset (Gene Expression Omnibus series GSE110824) [22, 23], which showed the expected enrichment of the ESR1, FOXA1, and GATA3 regulons (Additional file 1: Figure S21 and Fig. 5a) predicting the co-localization of the respective proteins on the chromatin. To further test the generality of VULCAN, we applied the method to another cancer-associated transcription factor type. More specifically, we evaluated an androgen receptor ChIP-seq dataset derived from prostate cancer cell line model LNCaP-1F5 and VCaP (Gene Expression Omnibus Series GSE39880, AR + DHT, RU486, or CPA) . By applying a context-specific network built from the TCGA prostate cancer dataset, we could predict functional co-localization of FOXA1 and AR in target genes’ promoters after dihydrotestosterone (DHT) treatment in prostate cell lines (Additional file 1: Figure S22 and Fig. 5b), validating the known role of FOXA1 in AR-regulated gene transcription in prostate cancer [25, 26].
VULCAN outperforms classical motif analysis
Finally, we compared VULCAN to a classical motif analysis by exploiting the MsigDB C3 collection v6.1  of gene sets, which contain canonical TF-specific binding motifs in their promoters. Our analysis shows the correlation of VULCAN results for two transcription factors (e.g., between GATA3 and ESR1, Additional file 1: Figure S23) can be relatively high but not significantly overlapping in terms of target genes containing the same canonical motifs (Additional file 1: Figure S24). We could prove that this non-relationship is general as it extends to the majority of TF-TF pairs that were present in the MsigDB database (Additional file 1: Figure S25).
Optimization of VULCAN parameters
The network generation algorithm uses established methods to optimize the parameters for the RNA-seq input (e.g., ARACNE-AP calculates the edge significance based on data-specific permutation test). By default, VULCAN can calculate key settings from the provided ChIP-seq data (e.g., DNA fragment length). Additionally, parameter choice is tunable at the wish of the user. The distance from promoter transcription starting site (TSS) can be tuned to the specific organism investigated by the ChIP-seq experiment. In this manuscript, we used 1000 nt for Homo sapiens, but it can be lowered to 100 nt for bacterial chromosomes, to assign peaks that act as representative for the gene.
GRHL2 is a novel ER cofactor
In our analysis of ER dynamics, the GRHL2 transcription factor was consistently identified as a key player, using both the METABRIC and TGCA networks. GRHL2 is a transcription factor that is important for maintaining epithelial lineage specificity in multiple tissues [28, 29]. It has previously been predicted to exist in ER-associated enhancer protein complexes , but its function in the ER signaling axis is unknown. Therefore, we set out to experimentally validate GRHL2 as an ER cofactor.
There is only a weak, positive correlation between ESR1 and GRHL2 expression in the TCGA and METABRIC breast cancer datasets (Additional file 1: Figure S26). Furthermore, GRHL2 does not change significantly in different PAM50 subtypes, although it is overexpressed in malignant tissue. The low correlation between GRHL2 expression and subtype implies that the protein is controlled by mechanisms such as phosphorylation , subcellular localization, or on-chromatin protein-protein interactions.
qPLEX-RIME detects a significant increase in the ER-GRHL2 interaction on activation
We undertook a complementary, unbiased, experimental approach combining RIME  with TMT , called qPLEX-RIME , to identify interactors of ER within the ER-chromatin complex. We generated ER qPLEX-RIME data from MCF7 cells treated with estradiol at both 45 and 90 min and compared this to the VULCAN dataset (Additional file 1: Figure S27). We found known ESR1 interactors with both methods, namely HDAC1, NCOA3, GATA3, and RARA. These interactors have positive enrichment according to VULCAN , implying the TF’s regulon is over-represented within the differentially bound genes. Importantly, qPLEX-RIME identified a significant increase in the protein-protein interaction between ER and GRHL2 in estrogenic conditions. As GRHL2 has a negative enrichment score in VULCAN, this implies either the protein is recruited by ER to sites that are significantly depleted for GRHL2’s regulon or that GRHL2 is established as having a negative correlation to the genes regulated at these sites, i.e., the protein is a co-repressor of the ER.
To assess the chromatin-association of ER and GRHL2, we undertook GRHL2 ChIP-seq in the absence (0 min) or presence (45 min) of E2 (Fig. 6a). VULCAN analysis of the GRHL2 differential binding showed that ER was the key interacting transcription factor, using both the TCGA- and METABRIC-derived networks (Fig. 6b).
We undertook a comparison of GRHL2 binding with public datasets (Fig. 6c). Our analysis showed that GRHL2 sites that responded to estradiol were enriched for ER binding sites (in agreement with our qPLEX-RIME data and VULCAN results) and FOXA1 (compatible with either an ER interaction or the previously reported interaction with MLL3 ). Importantly, the changes in GRHL2 binding profiles after E2 treatment were not a result of altered GRHL2 protein levels (Fig. 7). Individual analysis of peaks shows that classical ER promoter binding sites, e.g., RARa, were not the target of this redistribution of GRHL2, as these sites were occupied by GRHL2 before E2 stimulation. Motif analysis of the sites within increased GRHL2 occupancy showed enrichment for the full ERE (p value = 1 × 10−179) and the GRHL2 binding motif (p value = 1 × 10−51) (Fig. 6g).
To establish if the recruitment of GRHL2 was primarily related to a transcriptional function or the previously described interaction with MLL3, we overlapped our GRHL2 data with that of published H3K4me1/3  and P300  cistromes. While H3K4me occupancy was consistent between conditions, we found P300 binding to be enriched at the E2-responsive GRHL2 sites.
A more detailed analysis of the GRHL2 overlap with P300 sites showed the greatest co-occupancy of GRHL2/P300 sites was when both TFs were stimulated by E2 (Fig. 6d). Moreover, the overlap of GRHL2 peaks with ER ChIA-PET data [ENCSR000BZZ] showed that the GRHL2-responsive sites were enriched at enhancers over promoters (Fig. 6e). These findings suggested that the GRHL2-ER complex is involved in transcription at ER enhancer sites.
Validation of the ER-GRHL2 interaction by qPLEX-RIME and co-IP
qPLEX-RIME  analysis of GRHL2 in both the estrogen-free and estrogenic conditions showed high levels of transcription-related protein interactors including HDAC1 (p value = 6.4 × 10−9), TIF1A (p value = 6.4 × 10−9), PRMT (p value = 6.4 × 10−9), and GTF3C2 (p value = 4.6 × 10−9). p values given for estrogen-free and estrogenic conditions were comparable. The only protein differentially bound to GRHL2 in estrogen-free vs estrogenic conditions was the ER.
We further validated this interaction by co-IP. Our analysis robustly found that GRHL2 and ER interact in both MCF7 and T47D cells (Fig. 7). We further validated the antibody by siRNA knockdown and saw the disappearance of the GRHL2 band at 24 h.
GRHL2 constrains ER binding and activity
We investigated the transcription of enhancer RNAs at these sites using publicly available GRO-seq data  [GSE43836] (Fig. 6f). At E2-responsive sites, eRNA transcription was strongly increased by E2 stimulation; by contrast, eRNA transcription was largely independent of E2 stimulation when the entire GRHL2 cistrome was considered. Analysis of a second GRO-seq dataset, GSE45822, corroborated these results (Additional file 1: Figure S28).
To further explore how GRHL2 regulates ER enhancers, we measured eRNA expression at the GREB1 [36, 37], TFF1 [38,39,40], and XBP1 [41, 42] enhancers after overexpression of GRHL2. At GREB1 and XBP1, increased GRHL2 resulted in reduced eRNA transcription (Fig. 8) (p < 0.05, paired sample t test). Conversely, eRNA production at the TFF1, XBP1, and GREB1 enhancers was moderately increased 24 h after GRHL2 knockdown (Additional file 1: Figure S29). Combining the data from all three sites established the effect as significant by paired sample rank test (p = 0.04, one-tailed paired sample, Wilcoxon test). Collectively, these data demonstrate that GRHL2 constrains specific ER enhancer transcription.
A conserved alpha helix between residues 425 and 437 of GRHL2 has previously been shown to inhibit P300 . We therefore overexpressed GRHL2 Δ425–437, a previously demonstrated non-p300-inhibitory mutant [43, 44], in three ER-positive breast cancer cell lines (MCF7, T47D, and ZR75) and compared levels of eRNA to those recorded for both an empty vector control and for the overexpression of the wild-type protein (Fig. 8b). The results of the wild-type study were concordant to those of our previous analysis (Fig. 8a), suggesting in general that overexpression GRHL2 leads to the inhibition of eRNA production at certain ER sites. Importantly, in all cases, the removal of aa 425–437 from GRHL2 led to a reduction in the inhibitory effect caused by overexpression of the wild-type protein and was found as significant in five out of nine cases test (p < 0.05, t test, single-tail, paired).
We undertook H3K27ac ChIP-seq after knockdown of GRHL2 by siRNA for 48 h. In both MCF7 and T47D cells, we saw a significant change in the acetylation marks surrounding GREB1, and in MCF7, we saw an increase at both XBP1 and GREB1 promoters and a decrease at TFF1 (Fig. 9a). Genome-wide, we saw a redistribution of H3K27 acetylation in both cell lines (Fig. 9b). Comparison of the sites altered by GRHL2 knockdown showed a stronger signal for ER binding (Fig. 9c, right).
VirtUaL ChIP-seq Analysis through Networks
VULCAN is valuable for the discovery of transcription factors acting as co-regulators within chromatin-bound complexes that would otherwise remain hidden. The challenge of highlighting the cofactors from a ChIP-seq experiment lays in the infeasibility of reliable proteomic characterization of DNA-bound complexes at specific regions. On the other hand, while RNA-seq is arguably the most efficient technique to obtain genome-wide quantitative measurements, any transcriptomic approach cannot provide a full picture of cellular responses for stimuli that are provided on a shorter timescale than mRNA synthesis speed, such as the estradiol administration described in our study. VULCAN, by combining RNA-seq-derived networks and ChIP-seq cistrome data, aims at overcoming the limitations of both. Most notably, our method can work in scenarios where candidate cofactors do not have a well-characterized binding site or do not even bind DNA directly.
Through comparative analysis, we have robustly shown that VULCAN is able to outperform other readily available methods for the prediction of on-chromatin interactions of transcription factors. VULCAN achieves this through the integration of ChIP-seq and tumor transcriptional data. The inherent limitation of our method therefore is that tumor transcriptional data must be available in sufficient quantity to build the underlying network for the analysis, whereas tools based on predefined networks have no such limitation. In the majority of cases, this is not a challenge as projects like the TCGA provide transcriptome-wide data for a range of cancers. It is therefore only in the cases of rarer disease types (such as neuroendocrine tumor) and orphan tissues that this limitation will be problematic as these are poorly represented in public data. Even so, in these cases where specific networks cannot be generated, pan-tissue regulatory networks are currently being developed to overcome this limitation and these could be adapted for VULCAN .
By developing VULCAN, we have been able to rediscover the known cofactors of the estradiol-responsive ER complex and predict and experimentally validate a novel protein-protein interaction.
GRHL2 has a key role in regulating EMT
In the 4T1 tumor model, GRHL2 was found to be significantly downregulated in cells that had undergone EMT . The same study showed that knockdown of GRHL2 in MCF10A—an ER-negative cell line—led to the loss of epithelial morphology. Overall, this suggested that the GRHL2 transcription factor plays an essential role in maintaining the epithelial phenotype of breast cells. Similar results were observed with the MDA-MB-231 model, where expression of GRHL2 resulted in the reversal of EMT . This result has been recapitulated in hepatocytes, where GRHL2 was found to suppress EMT by inhibiting P300 . The ability to suppress EMT has also been noted in prostate cancer, another cancer driven by a steroid hormone receptor (AR), and the genes regulated by GRHL2 are linked to disease progression .
GRHL2, a novel co-repressor of ER eRNA production
These earlier data combined with the link between GRHL2 expression and patient survival indicate a significant role for GRHL2 in the progression of breast cancer. However, its role in the ER signaling axis has, until now, been unknown. Here, we show that GRHL2 performs its activity at a subset of ER enhancers. Overexpression of GRHL2 resulted in a significant decrease in eRNA production at the TFF1 and XBP1 enhancers, and in agreement with previous studies that correlate eRNA transcription with gene expression [47,48,49], we found the measured eRNA decrease was concurrent with a significant downregulation in the expression of the corresponding gene.
These results are consistent with previous findings that GRHL2 inhibits P300  and, while the ER complex results in the activation of eRNA transcription at these sites, that GRHL2 plays a role in fine-tuning or modulating this process.
GRHL2 role in the ER signaling axis is independent to its role in tethering MLL3
In breast cancer, GRHL2 has previously been shown to directly interact with FOXA1, which may contribute to the tethering of the histone methyltransferase MLL3 and, consequently, epigenetic marks at GRHL2/FOXA1 binding sites . Our analysis, however, showed no particular enrichment for H3K4me1/3 marks at E2-responsive GRHL2 sites compared to other GRHL2 binding sites, and our proteomic analysis of interactors showed a strong association with proteins related to transcription. We proposed that these ER-responsive sites are related to the role of GRHL2 in a transcriptional process independent of its interaction with MLL3. This was supported by evidence of a significant overlap with binding of the coactivator P300, transcriptional proteins detected by qPLEX-RIME analysis of GR, and a pronounced increase in eRNA transcription at E2-responsive GRHL2 sites.
ER is bound more strongly at active enhancers (H3K27ac) that are altered by siGRHL2
Knockdown of GRHL2 led to a genome-wide remodeling of H3K27ac marks, found at active enhancers, confirming a role of GRHL2 in partially regulating these sites. Detailed inspection of the data showed a significant increase of these marks around the XBP1 and GREB1 genes, supporting our hypothesis that GRHL2 has a partial inhibitory role within the ER regulon. The result was further supported by finding enrichment of ER binding events at H3K27ac marks altered by GRHL2 knockdown (Fig. 9c, right panel). The more complex effect on H3K12ac, when compared to the effects on eRNA production, is likely a result of the diversity of roles that GRHL2 holds within the cell, leading to a host of downstream effects in the regulation of chromatin recruitment of key factors such as MLL3, ER, and FOXA1.
Deletion of the P300 inhibitory α-helix from GRHL2 reduces the protein’s ability to repress the production of eRNA at ER bound enhancer sites
Finally, to clarify if inhibition of P300 was occurring, we generated and overexpressed GRHL2 lacking the inhibitory alpha helix between amino acids 425 and 437. In all cases, GRHL2 Δ425–437 had reduced an inhibitory effect compared to overexpression of the wild type, confirming that GRHL2 primarily plays a repressive role at these sites.
VULCAN is built on state-of-the-art network analysis tools previously applied to RNA-seq data. By adapting network-based strategies to ChIP-seq data, we have been able to reveal novel information regarding the regulation of breast cancer in a model system.
We have demonstrated that the VULCAN algorithm can be applied generally to ChIP-seq for the identification of new key regulator interactions. Our method provides a novel approach to investigate chromatin occupancy of cofactors that are too transient or for which no reliable antibody is available for direct ChIP-seq analysis.
Further, because of our use of clinical data, VULCAN results are both more likely to be relevant and are specific to the disease type studied, as demonstrated in the loss of signal when using a control co-expression network generated from an alternative disease type.
VULCAN enabled us to identify the GRHL2-ER interaction and that GRHL2 plays a repressive role. Further analysis showed the process to be independent of the previously reported interaction with FOXA1 and MLL3 . Our conclusion, therefore, is that GRHL2 has a second, previously undescribed role that regulates transcription at specific estrogen-responsive enhancers (Fig. 10).
Given the central role of the ER in breast cancer development and GRHL2’s own ability to regulate EMT, the discovery that ER recruits GRHL2 leading to the altered eRNA production is an important step in enhancing our understanding of breast cancer and tumorigenesis.
An implementation of VULCAN in R is available on Bioconductor.org [https://bioconductor.org/packages/release/bioc/html/vulcan.html], and the scripts to replicate our analysis are available as Rmarkdown files. Unless otherwise specified, all p values were Bonferroni corrected.
MCF7 cells were obtained from the CRUK Cambridge Institute collection, authenticated by STR genotyping and confirmed free of mycoplasma. All cells were maintained at 37 °C, 5% CO2. For each individual ChIP pulldown, we cultured 8 × 107 MCF7 cells (ATCC) across four 15-cm-diameter plates in DMEM with 10% FBS, glutamine and penicillin/streptomycin (Glibco). Five days before the experiment, the cells were washed with phosphate-buffered saline (PBS) and the media were replaced with clear DMEM supplemented with charcoal-treated serum. The media was refreshed every 24 h, which halted the growth of the cells and ensured that majority ER within the cell was not active. On day 5, the cells were treated with estradiol (100 nM). At the appropriate time point, the cells were washed with ice-cold PBS twice and then fixed by incubating with 10 mL per plate of 1% formaldehyde in unsupplemented clear media for 10 min. The reaction was stopped by the addition of 1.5 mL of 2.5 M glycine, and the plates were washed twice with ice-cold PBS. Each plate was then scraped in 1 mL of PBS with protease inhibitors (PI) into a 1.5-mL microcentrifuge tube. The cells were centrifuged at 8000 rpm for 3 min at 4 °C and the supernatant removed. The process was repeated for a second wash in 1 mL PBS+PI and the PBS removed before storing at − 80 °C.
Frozen samples were processed using established ChIP protocols  to obtain DNA fragments of ~ 300 bp in length. The libraries were prepared from the purified DNA using a ThruPLEX DNA-seq kit (Rubicon Genomics) and sequenced on the Illumina HiSeq Platform. Sequencing data is available from Gene Expression Omnibus, accession GSE109820 and GSE123475.
Differential binding analysis
Sequencing data was aligned using BWA  to the human genome (hg19). Reads from within the DAC Blacklisted Regions was removed before peak calling with MACS 2.1  on default parameters. The aligned reads and associated peak files were then analyzed using DiffBind  to identify significant changes in ER binding.
Gene set enrichment analysis
Gene set enrichment analysis (GSEA) was performed as described by Subramanian et al.  using the curated pathway collection (C2) from MSIGDB v 5.0 with 1000 set permutations for each pathway investigated, followed by Benjamini-Hochberg p value correction.
We reconstructed a regulatory gene network using ARACNe-AP as described by Alvarez . RNA-seq breast cancer data was downloaded from TCGA in January 2015 and VST-Normalized as described by Anders and Huber . The ARACNe transcriptional regulation network was imported into R using the VIPER BioConductor package, and it was interrogated using the differential binding profiles from our ChIP-seq experiment as signatures, 45 min vs control and 90 min vs control. The peak-to-promoter assignment was performed using a 10-kb window with respect to the transcription starting site (TSS) of every gene on the hg19 human genome. The algorithm msVIPER (multi-sample Virtual Inference of Protein activity by Enriched Regulon analysis) was then applied, leveraging the full set of 8 replicates per group, with 1000 signature permutations and default parameters.
TF binding overlap
Publicly available data was downloaded as described in the source publication [3, 30, 34, 35], and overlap was calculated with bedtools (v2.25.0). Presented data was normalized as a percentage of GRHL2 sites.
MCF7 cells were transfected with Smart Pool siRNA (Dharmacon, L-014515-02), siControl, GRHL2 overexpression vector (Origene, RC214498), GRHL2 Δ425–437 (Origene), or empty control vector using Lipofectamine 3000 (Thermo Fisher Scientific) according to the manufacturer’s protocol in 6-well format. Expression was monitored by rtPCR using TaqMan assay with GAPDH as a control transcript. Knockdown efficiency was ~ 75%, and the GRHL2 overexpression vector led a 730-fold increase in expression over control plasmid. One microgram of purified RNA was reverse transcribed with Superscript III reverse transcriptase (Thermo Fisher Scientific, 18080085) using random primers (Promega, C1181) according to the manufacturer’s instructions. eRNAs were quantified with qPCR using Power SYBR™ Green PCR Master Mix (Thermo Fisher Scientific, 4367660) and denoted as relative eRNA levels after normalizing with UBC mRNA levels.
|eGREB1 F||ACTGCGGCATTTCTGTGAGA||This study|
|eGREB1 R||ACTGCAGTTTGCCTGTCACT||This study|
|eXBP1 F||TGTGAGCACTTGGCATCCAT||Nagarajan et al. |
|eXBP1 R||ACAGGGCCTCATTCTCCTCT||Nagarajan et al. |
|eTFF1 F||AGGGGATGTGTGTGAGAAGG||Li et al. |
|eTFF1 R||GCTTCGAGACAGTGGGAGTC||Li et al. |
|UBC F||ATTTGGGTCGCGGTTCTTG||Peña et al. |
|UBC R||TGCCTTGACATTCTCGATGGT||Peña et al. |
ERα (F10) antibody, Santa Cruz (sc-8002), was cleaned using Amicon 10K Buffer Exchange Column (EMD, Cat # UFC501096) to remove the sodium azide. 2.5 μg ERα (F10) antibody rotated overnight at 4 °C with 100 μL Dynabeads Protein A, Invitrogen (10001D).
Nuclear lysate was harvested via cell lysis (20 mM Tris-HCl, 20 mM NaCl, 0.2 mM EDTA) followed by nuclear lysis (20 mM Tris-HCl, 20 mM NaCl, 0.2 mM EDTA 1% IGEPAL). The uclear lysate was then incubated overnight at 4 °C with the Dynabeads Protein A. Elution via 10 min incubation at 70 °C with 1× NuPAGE LDS Sample Buffer, Invitrogen (NP0007), and 1× NuPAGE Sample Reducing Agent, Invitrogen (NP0004), and subjected to western blotting.
Knockdown of GRHL2
Knockdown of GRHL2 was undertaken using ON-TARGETplus SMARTpool Human GRHL2, Dharmacon (#L-014515-02-0050) and Lipofectamine® RNAiMAX Reagent Protocol (Thermo) according to the manufacturer’s protocol. Control samples were prepared following the same method using ON-TARGETplus Control pool Non-targeting pool, Dharmacon (#D-001810-10-50) in place of siGRHL2.
Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin. 2018;68(1):7–30.
Zaret KS, Carroll JS. Pioneer transcription factors: establishing competence for gene expression. Genes Dev. 2011;25(21):2227–41.
Hurtado A, Holmes KA, Ross-Innes CS, Schmidt D, Carroll JS. FOXA1 is a key determinant of estrogen receptor function and endocrine response. Nat Genet. 2010;43:27–33.
Horwitz KB, Jackson TA, Bain DL, Richer JK, Takimoto GS, Tung L. Nuclear receptor coactivators and corepressors. Mol Endocrinol. 1996;10(10):1167–77.
Glass CK, Rose DW, Rosenfeld MG. Nuclear receptor coactivators. Curr Opin cell Biol. 1997;9(2):222–32.
Green KA, Carroll JS. Oestrogen-receptor-mediated transcription and the influence of co-factors and chromatin state. Nat Rev Cancer. 2007;7(9):713–22.
Nakshatri H, Badve S. FOXA1 as a therapeutic target for breast cancer. Expert Opin. Ther Targets. 2007;11(4):507–14.
Wu Y, et al. Tamoxifen resistance in breast cancer is regulated by the EZH2-ERα-GREB1 transcriptional axis. Cancer Res. 2018;78(3):671–84.
Mohammed H, et al. Endogenous purification reveals GREB1 as a key estrogen receptor regulatory factor. Cell Rep. 2013;3(2):342–9.
Wang CI, et al. Chromatin proteins captured by ChIP-mass spectrometry are linked to dosage compensation in Drosophila. Nat Struct Mol Biol. 2013;20(2):202–9.
Hong EL, et al. Principles of metadata organization at the ENCODE data coordination center. Database. 2016;2016:baw001.
McLean CY, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010;28(5):495–501.
Welch RP, et al. ChIP-Enrich: gene set enrichment testing for ChIP-seq data. Nucleic Acids Res. 2014;42(13):e105.
Balwierz PJ, Pachkov M, Arnold P, Gruber AJ, Zavolan M, van Nimwegen E. ISMARA: automated modeling of genomic signals as a democracy of regulatory motifs. Genome Res. 2014;24(5):869–84.
Alvarez MJ, et al. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat Genet. 2016;48(8):838–47.
Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.
Holding AN, Cullen AE, Markowetz F. Genome-wide estrogen receptor-α activation is sustained, not cyclical. Elife. 2018;7:e40854.
Ross-Innes CS, et al. Differential oestrogen receptor binding is associated with clinical outcome in breast cancer. Nature. 2012;481(7381):389–93.
Stein RA, et al. Estrogen-related receptor alpha is critical for the growth of estrogen receptor-negative breast cancer. Cancer Res. 2008;68(21):8805–12.
Bhat-Nakshatri P, et al. AKT alters genome-wide estrogen receptor alpha binding and impacts estrogen signaling in breast cancer. Mol Cell Biol. 2008;28(24):7487–503.
Gozgit JM, Pentecost BT, Marconi SA, Ricketts-Loriaux RSJ, Otis CN, Arcaro KF. PLD1 is overexpressed in an ER-negative MCF-7 cell line variant and a subset of phospho-Akt-negative breast carcinomas. Br J Cancer. 2007;97(6):809–17.
Guertin MJ, Cullen AE, Markowetz F, Holding AN. Parallel factor ChIP provides essential internal control for quantitative differential ChIP-seq. Nucleic acids Res. 2018;46(12):e75.
Bruna A, et al. A biobank of breast cancer explants with preserved intra-tumor heterogeneity to screen anticancer compounds. Cell. 2016;167(1):260–274.e22.
Sahu B, et al. FoxA1 specifies unique androgen and glucocorticoid receptor binding events in prostate cancer cells. Cancer Res. 2013;73(5):1570–80.
Sahu B, et al. Dual role of FoxA1 in androgen receptor binding to chromatin, androgen signalling and prostate cancer. EMBO J. 2011;30(19):3962–76.
Wang D, et al. Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature. 2011;474(7351):390–4.
Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27(12):1739–40.
Xiang X, et al. Grhl2 determines the epithelial phenotype of breast cancers and promotes tumor progression. PloS One. 2012;7(12):e50781.
Cieply B, et al. Suppression of the epithelial-mesenchymal transition by Grainyhead-like-2. Cancer Res. 2012;72(9):2440–53.
Jozwik KM, Chernukhin I, Serandour AA, Nagarajan S, Carroll JS. FOXA1 directs H3K4 monomethylation at enhancers via recruitment of the methyltransferase MLL3. Cell Rep. 2016;17(10):2715–23.
Helzer KT, et al. The phosphorylated estrogen receptor cistrome identifies a subset of active enhancers enriched for direct ER-DNA binding and the transcription factor GRHL2. Mol Cell Biol. 2018;39(3):e00417-18.
Thompson A, et al. Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal Chem. 2003;75(18):1895–1904.
Papachristou EK, et al. A quantitative mass spectrometry-based approach to monitor the dynamics of endogenous chromatin-associated protein complexes. Nat Commun. 2018;9(1):2311.
Zwart W, Theodorou V, Kok M, Canisius S, Linn S, Carroll JS. Oestrogen receptor-co-factor-chromatin specificity in the transcriptional regulation of breast cancer. EMBO J. 2011;30(23):4764–76.
Hah N, Murakami S, Nagari A, Danko CG, Kraus WL. Enhancer transcripts mark active estrogen receptor binding sites. Genome Res. 2013;23(8):1210–23.
Ghosh MG, Thompson DA, Weigel RJ. PDZK1 and GREB1 are estrogen-regulated genes expressed in hormone-responsive breast cancer. Cancer Res. 2000;60(22):6367–75.
Rae JM, Johnson MD, Scheys JO, Cordero KE, Larios JM, Lippman ME. GREB 1 is a critical regulator of hormone dependent breast cancer growth. Breast Cancer Res Treat. 2005;92(2):141–9.
Prest SJ, May FEB, Westley BR. The estrogen-regulated protein, TFF1, stimulates migration of human breast cancer cells. FASEB J. 2002;16(6):592–4.
Ribieras S, Tomasetto C, Rio MC. The pS2/TFF1 trefoil factor, from basic research to clinical applications. Biochim et Biophys Acta. 1998;1378(1):F61–77.
Crosier M, Scott D, Wilson RG, Griffiths CD, May FE, Westley BR. High expression of the trefoil protein TFF1 in interval breast cancers. Am J Pathol. 2001;159(1):215–21.
Lacroix M, Leclercq G. About GATA3, HNF3A, and XBP1, three genes co-expressed with the oestrogen receptor-alpha gene (ESR1) in breast cancer. Mol Cell Endocrinol. 2004;219(1–2):1–7.
Oh DS. Estrogen-regulated genes predict survival in hormone receptor-positive breast cancers. J Clin Oncol. 2006;24(11):1895–1904.
Pifer PM, et al. Grainyhead-like 2 inhibits the coactivator p300, suppressing tubulogenesis and the epithelial-mesenchymal transition. Mol Biol Cell. 2016;27(15):2479–92.
MacFawn I, et al. Grainyhead-like-2 confers NK-sensitivity through interactions with epigenetic modifiers. Mol Immunol. 2019;105:137–49.
Ding H, et al. Quantitative assessment of protein activity in orphan tissues and single cells using the metaVIPER algorithm. Nat Commun. 2018;9(1):1471.
Paltoglou S, et al. Novel androgen receptor coregulator GRHL2 exerts both oncogenic and antimetastatic functions in prostate cancer. Cancer Res. 2017;77(13):3417–30.
Cheng J-H, Pan DZ-C, Tsai ZT-Y, Tsai H-K. Genome-wide analysis of enhancer RNA in gene regulation across 12 mouse tissues. Sci Rep. 2015;5:12648.
Rahman S, et al. Single-cell profiling reveals that eRNA accumulation at enhancer-promoter loops is not required to sustain transcription. Nucleic acids Res. 2017;45(6):3017–30.
Kim T-K, et al. Widespread transcription at neuronal activity-regulated enhancers. Nature. 2010;465(7295):182–7.
Holmes KA, Brown GD, Carroll JS. Chromatin immunoprecipitation-sequencing (ChIP-seq) for mapping of estrogen receptor-chromatin interactions in breast cancer. Methods Mol Biol. 2016;1366:79–98.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
Zhang Y, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):137.
Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50.
Heinz S, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol cell. 2010;38(4):576–89.
Crooks GE, Hon G, Chandonia J-M, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14(6):1188–90.
Lachmann A, Giorgi FM, Lopez G, Califano A. ARACNe-AP: gene network reverse engineering through adaptive partitioning inference of mutual information. Bioinformatics. 2016;32(14):2233–5.
Anders S, Huber W, S A, W H. Differential expression analysis for sequence count data. Genome Biol. 2010;11:106.
Nagarajan S, et al. Bromodomain protein BRD4 is required for estrogen receptor-dependent enhancer activation and gene transcription. Cell reports. 2014;8(2):460–9.
Li W, et al. Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature. 2013;498(7455):516–20.
Peña C, et al. The expression levels of the transcriptional regulators p300 and CtBP modulate the correlations between SNAIL, ZEB1, E-cadherin and vitamin D receptor in human colon carcinomas. Int J Cancer. 2006;119(9):2098–104.
Holding AN, Markowetz F. Early dynamics of ERa and GRHL2 binding on stimulation with estradiol. Gene Expression Omnibus. 2018; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
Holding AN, Markowetz F. Knockdown of GRHL2 affects active enhancers (H3K27ac sites) enriched for ER binding. Gene Expression Omnibus. 2018; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
Bruna A, Markowetz F, Caldas C, Holding AN. Knockdown of GRHL2 affects active enhancers (H3K27ac sites) enriched for ER binding. Gene Expression Omnibus. 2018; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
Sahu B, Laakso M, Pihlajamaa P, Ovaska K, Sinielnikov I, Hautaniemi S, Jänne OA. FoxA1 specifies unique androgen and glucocorticoid receptor binding events in prostate cancer cells. Gene Expression Omnibus. 2013; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
Li W, Ma Q, Rosenfeld MG. Functional importance of eRNAs for estrogen-dependent gene transcriptional activation. Gene Expression Omnibus. 2013; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
Hah N, Murakami S, Nagari A, Danko CG, Kraus WL. Enhancer transcripts mark active estrogen receptor binding sites using GRO-seq. Gene Expression Omnibus. 2013; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi.
Holding AN. VULCAN supplementary, (2013), GitHub repository. 2018. https://github.com/andrewholding/VULCANSupplementary. https://doi.org/10.5281/zenodo.2633145
Giorgi FM, Holding AN, Markowetz FM. VirtUaL ChIP-Seq data Analysis using Networks. Bioconductor. 2018. https://doi.org/10.18129/B9.bioc.vulcan.
Giorgi FM, Holding AN, Markowetz FM. VirtUaL ChIP-Seq data Analysis using Networks dataset. Bioconductor. 2018. https://doi.org/10.18129/B9.bioc.vulcandata.
We would like to acknowledge the support of the University of Cambridge, Cancer Research UK, and Hutchison Whampoa Limited. We are grateful for the support of Eva Papachristou in undertaking the qPLEX-RIME analysis of our samples in advance of her initial publication describing the method. We would like to acknowledge the contribution from the CRUK Genomics, Proteomics, and Bioinformatics core facilities in supporting this work.
This work was funded by CRUK core grant [grant numbers C14303/A17197, A19274] and Breast Cancer Now Award [grant number 2012NovPR042] to FM and supported by The Alan Turing Institute under the EPSRC grant EP/N510129/129/1 as a Turing Fellowship to ANH.
Availability of data and materials
All sequence data are publicly available at the GEO, accession numbers GSE109820  and GSE123475 . Data from previously published datasets were also acquired from GEO with accession numbers GSE110824 , GSE39880 , GSE45822 , and GSE43836 .
Code for data analysis is provided as a Rmarkdown document, and supporting data is available from https://github.com/andrewholding/VULCANSupplementary . For convenience, we have provided VULCAN as a BioConductor package https://bioconductor.org/packages/release/bioc/html/vulcan.html  along with a supporting data package https://bioconductor.org/packages/release/data/experiment/html/vulcandata.html .
Ethics approval and consent to participate
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original version of this article was revised: "Figure 4 and 5 were mistakenly transposed.
Supplemental figures and tables. (PDF 3249 kb)