Network analysis of ChIP-seq data by VULCAN identifies GRHL2 as a key co-regulator of ERa in luminal breast cancer

VULCAN infers regulatory interactions of transcription factors by overlaying networks generated from tumor expression data onto ChIP-seq data. VULCAN analysis of estrogen receptor (ER) activation in breast cancer highlighted key components of the ER complex alongside a novel interaction with GRHL2. We demonstrate that GRHL2 is recruited to a subset of ER binding sites and regulates the transcriptional output of ER, as evidenced by: changes in ER-associated eRNA expression; and stronger ER binding at active enhancers (H3K27ac sites) after GRHL2 knockdown. Our findings provide new insight into ER signaling and demonstrate VULCAN, available from Bioconductor, as a powerful predictive tool.


Introduction
Breast cancer is the most common form of cancer in women in North America and Europe accounting for 31% of all new cancer cases. In the US, it is estimated that 41,400 deaths will have occurred from the disease in 2018 [1] . The majority of breast cancers, approximately 70%, are associated with deregulated signaling by the Estrogen Receptoralpha (ER), which drives tumor growth. Therefore, in ERpositive (ER+) tumors, ER is the primary therapeutic target.
During activation, ER recruits several cofactors to form an active complex on the chromatin. FOXA1 is of particular interest as the protein shares nearly 50% of its genomic binding sites with ER and has been shown to operate as a pioneer factor before ER activation [2], [3] . It is through FOXA1 and other cofactors (e.g SRC1) [4], [5] , that ER is able to recruit RNA Polymerase II at the gene promoter sites by way of adaptor proteins in order to initiate transcription [6] .
Combinatorial treatments targeting ER cofactors present a significant opportunity in breast cancer therapy for increasing patient survival. In particular, the pioneer factor FOXA1 [7] has been identified as novel therapeutic target for the treatment of breast cancer, while the EZH2ERαGREB1 transcriptional axis has been shown to play a key role in therapeutic resistance [8] .
ChIPseq enables the identification of potential sitespecific interactions at common binding sites between transcription factors and their cofactors; however, to fully characterize all potential cofactors of a single project on this scale is laborious and expensive. To follow up all potential cofactors identified by a chromatinwide proteomics method, e.g. RIME [9] or ChIPMS [10] , would take hundreds of individual ChIPseq experiments. Studies like ENCODE [11] have gone a long way to provide a resources to meet these challenges; however, the inherent scale of the https://docs.google.com/document/d/1e_NXx-TtldXb44vEAS7vcbr29pKaV1gQWRGeACxPqbg/edit# 4/44 4 problem means public studies can only offer data for a subset of TF in a limited number of models. For a single lab to undertake this level of experimentation is unfeasible, and in cases where suitable antibodies for the ChIP do not exist, impossible.
To enable discoveries beyond collections like ENCODE, we are proposing a computational framework to integrate patient data in the the prediction of functional proteinprotein interactions.
By applying machine learning methods, we are able to surpass the limitation of current predictive tools that exist to support the interpretation of data. Previous methods provide information in the context of predefined biological pathways and established gene sets [12], [13] or through motif analysis [14] , while our method is built on data specific to the disease being studied.
Our method, " V irt U a L C hIPseq A nalysis through N etworks" (VULCAN), is able to specifically analyze potential diseasespecific interactions of TFs in ChIPseq experiments by combining machine learning approaches and patient data. Previously, the strategies employed by VULCAN were limited to the analysis of transcription data. By developing VULCAN to overlay coexpression networks established from patient tumor data onto ChIPseq data, we are able to provide candidate coregulators of the response to a given stimulus ( Figure 1 ). Further, as VULCAN builds on transcriptional master regulator analysis, the output from the pipeline provides the end user with functional information in terms of the activity of potentially interacting TFs. The combination of diseasespecific context and TF activity information presents a significant step forward in providing valuable information for the elucidation of onchromatin interactions from ChIPseq experiments over previous strategies.

Results
In the following, we first benchmark VULCAN's performance in a comprehensive comparison to alternative approaches. We then apply it to our data on temporal ER binding, which identifies GRHL2 as a novel ER cofactor and we explore its function.

Mutual Information networks outperform Partial Correlation networks
We generated a mutual information network with ARCANe alongside several partial correlation networks at different thresholds all from the TCGA Breast Cancer data. To ensure robustness of our method, we tested the overlap of every partial correlation network with the mutual information network using the Jaccard Index (JI) criterion ( Suppl. Figure S228 ). Finally, we showed how the Jaccard Index between partial correlation networks and the ARACNe network is always significantly higher than expected by selecting random network edges ( Suppl. Figure S229 ). For further analysis, we selected the mutual information network generated by ARACNe as this method outperformed partial correlation networks at all thresholds.

GSEA is the optimum method for VULCAN's target enrichment analysis
https://docs.google.com/document/d/1e_NXx-TtldXb44vEAS7vcbr29pKaV1gQWRGeACxPqbg/edit# 6/44 6 VULCAN applies Gene Set Enrichment Analysis [15] to identify enrichment of our mutual information network derived regulons in differential ChIPseq data. To validate our method, we compared the results of VULCAN when applied to our ER binding data against three independent methods previously applied to benchmark VIPER [16] . The first implemented a Fisher pvalue integration step. This test lacks stringency and results in nearly all regulons as significantly enriched ( Suppl. Figure S230 ). Second, we implemented a fraction of targets method, defining for every TF the fraction of their targets that are also differentially bound. This alternative to VULCAN ignores the MI strength of interaction and the individual strengths of differential bindings, reducing the resolving power of the algorithm ( Suppl. Figure S231 ). Finally, we compared to a Fisher's Exact Method, which assesses the overlap between networks and significant differential binding. This method is too stringent (as observed in the original VIPER paper) [16] ; and even without pvalue correction, there are no significant results, even at low stringency, demonstrating the low sensitivity of using a Fisher's Exact Method method ( Suppl. Figure S232 ). In summary, VULCAN GSEA implementation outperformed all three alternative methods we tested (ttest based; fraction of targets method; and Fisher's Exact Method) in our dataset and was therefore applied to all downstream analysis of ChIPseq data.

VULCAN outperforms enrichment analysis tools (GREAT, ISMARA & ChIPEnrich)
To further validate our method, we compared the output of our GSEA analysis with different versions of promoterenrichment approaches implemented by GREAT [12] , ISMARA [14] and ChIPEnrich [13] . The VULCAN analysis shows a significant overlap in terms of detected pathways with the GREAT method ( Suppl. Figure S233 ). ChIPEnrich identifies enrichme nt of a number of TFs also predicted by VULCAN, but it fails to identify ESR1 as the top transcription factor affected by our experiment ( Suppl. Figure S234 ). ISMARA succeeds at identifying ESR1 https://docs.google.com/document/d/1e_NXx-TtldXb44vEAS7vcbr29pKaV1gQWRGeACxPqbg/edit# 7/44 7 using a motifbased analysis, but does not identify other candidate binding TFs ( Suppl. Figure   S235 ). In summary, VULCAN outperforms both ISMARA and ChIPEnrich, and significantly overlaps with GREAT, but provides additional value through inference of TF factor activity.
Temporal analysis of ER DNA binding profiles after activation by E2 We  Figure 2A ) and the response was sustained, in agreement with our previous study [17] We performed motif enrichment analysis (HOMER software) on ER binding sites detected by differential binding analysis. Our analysis confirmed a strong enrichment for a single element, ERE, bound at both 45 and 90 minutes, with a corrected pvalue of 0.0029 ( Figure 3F ). When clustered according to peak intensity, samples cluster tightly in two groups: treated and untreated We performed a Gene Set Enrichment Analysis (GSEA) [15] and an associated Rank Enrichment Analysis (aREA) [16] using the differential binding at gene regulatory regions with time 0 as reference. Individual differential binding signatures for GSEA were calculated using a negative  [18] . The collective contribution of differentially bound sites highlights several ERrelated pathways in both the GSEA and aREA analyses [19]- [21] ( Suppl. Figure S222 ). The strongest upregulated GSEA pathway in both time points ( Table S2 and S5 ) was derived from RNAseq in an MCF7 study using estradiol treatment [20] , confirming the reproducibility of our dataset.

VULCAN identifies coactivators and corepressors of ER
We leveraged the information contained in mutual information networks to establish TF networks enriched in the differential binding patterns induced by estradiol. From our analysis of ER binding, we established four classes of modulation: early coactivators, early corepressors, delayed coactivators and transient coactivators ( Figure 3 ) .
Using VULCAN, we defined TF network activity of occupied regulatory regions ( Figure 3A ) according to the binding of ER within their promoter and enhancer regions (limited to 10kb upstream of the Transcription Starting Site to ensure gene specificity). We define early coactivators as those TFs whose network is upregulated at both 45 and 90 minutes ( Figure 3B ); these genes include AR, SP1 and CITED1. TFs with opposite behavior (namely TFs whose negative/repressed targets in the ARACNe model are occupied by ER), or "early corepressors", include GLI4, MYCN and GRHL2 ( Figure 3C ). Some TFs appear to have their targets transiently bound at 45 minutes, but then unoccupied at 90 minutes and therefore we dubbed them "transient coactivators' ( Figure 3D ). We further defined TFs active at 90 minutes but not at 45 minutes as "delayed coactivators", noting these cofactors could be the transient if the response is not completed by 90 minutes . While this category exists, and notably contains both ESR1 and the 9 known ESR1 interactor GATA3, it is just below the significance threshold at 45 minutes ( Figure   3E ).
We repeated our TF network activity analysis of ER activation ( Figure 3AE ) on an independent dataset from TCGA and found similar results to those established from the METABRICderived network ( Suppl. Figures S214 to S218 ).
To ensure the robustness of the results, we performed a joint analysis of data obtained from both networks. At 45 ( Figure 4A ) and 90 minutes ( Figure 4B ) we identified candidates, specifically the ESR1, GATA3 and RARA networks, which were consistently and robustly activated by ER in both time points. The joint analysis also identified candidate corepressors, including HSF1 and GRHL2.

VULCAN results are specific to the cancer used for network modeling
As a negative control, we used a different context ARACNe network, derived from the TCGA acute myeloid leukemia (AML) dataset. This network shows globally weaker enrichment scores and a weak positive correlation with the results obtained through breast cancer regulatory models ( Suppl. Figure S221 ).

VULCAN is able to predict proteinprotein interactions in both patientderived xenografts (PDX) and prostate cancer
To demonstrate the general applicability of VULCAN, we applied the algorithm to a Breast Cancer Patientderived xenograft dataset (Gene Expression Omnibus series GSE110824) [22], [23] , which showed the expected enrichment of the ESR1, FOXA1 and GATA3 regulons ( Suppl. Figure S236 and Figure 5A ) predicting the colocalization of the respective proteins on the chromatin. To further test the generality of VULCAN, we applied the method to another cancerassociated transcription factor type. More specifically, we evaluated an androgen receptor ChIPseq dataset derived from prostate cancer cell line model LNCaP1F5 and VCaP (Gene Expression Omnibus Series GSE39880, AR + DHT, RU486 or CPA) [24] . By applying a context specific network built from the TCGA prostate cancer dataset, we could predict functional colocalization of FOXA1 and AR in target genes' promoters after dihydrotestosterone (DHT) treatment in prostate cell lines ( Suppl. Figure S237 and Figure 5B ), validating the known role of

VULCAN outperforms classical motif analysis
Finally, we compared VULCAN to a classical motif analysis by exploiting the MsigDB C3 collection v6.1 [28] of gene sets, which contain canonical TFspecific binding motifs in their promoters. Our analysis shows the correlation of VULCAN results for two transcription factors (e.g. between GATA3 and ESR1, Suppl. Figure S238 ) can be relatively high but not significantly overlapping in terms of target genes containing the same canonical motifs ( Suppl. Figure S239 ).
We could prove that this nonrelationship is general as it extends to the majority of TFTF pairs that were present in the MsigDB database ( Suppl. Figure S2S40 ).

GRHL2 is a novel ER cofactor
In our analysis of ER dynamics, the GRHL2 transcription factor was consistently identified as a key player, using both the METABRIC and TGCA networks. GRHL2 is a transcription factor that is important for maintaining epithelial lineage specificity in multiple tissues [29], [30] . It has previously been predicted to exist in ERassociated enhancer protein complexes [31] , but its https://docs.google.com/document/d/1e_NXx-TtldXb44vEAS7vcbr29pKaV1gQWRGeACxPqbg/edit# 11/44 11 function in the ER signaling axis is unknown. Therefore, we set out to experimentally validate GRHL2 as an ER cofactor.
T here is only a weak, positive correlation between ESR1 and GRHL2 expression in the TCGA and METABRIC breast cancer datasets ( Suppl. Figure S225 ). Furthermore, GRHL2 does not change significantly in different PAM50 subtypes, although it is overexpressed in malignant tissue. The low correlation between GRHL2 expression and subtype implies that the protein is controlled by mechanisms such as phosphorylation [32] , subcellular localization or onchromatin proteinprotein interactions.

qPLEXRIME detects a significant increase in the ERGRHL2 interaction on activation
We undertook a complementary, unbiased, experimental approach combining RIME [9] with TMT [33] , called qPLEXRIME [34] , to identify interactors of ER within the ERchromatin complex. We generated ER qPLEXRIME data from MCF7 cells treated with estradiol at both 45 and 90 minutes and compared this to the VULCAN dataset ( Suppl. Figure S227 ). We found known ESR1 interactors with both methods, namely HDAC1, NCOA3, GATA3 and RARA. These interactors have positive enrichment according to VULCAN [16] , implying the TF's regulon is overrepresented within the differentially bound genes. Importantly, qPLEXRIME identified a significant increase in the proteinprotein interaction between ER and GRHL2 in estrogenic conditions. As GRHL2 has a negative enrichment score in VULCAN, this implies either the protein is recruited by ER to sites that are significantly depleted for GRHL2's regulon or that GRHL2 is established as having a negative correlation to the genes regulated at these sites, i.e.
the protein is a corepressor of the ER.
To assess the chromatinassociation of ER and GRHL2, we undertook GRHL2 ChIPseq in the absence (0 min) or presence (45 min) of E2 ( Figure 6A) . VULCAN analysis of the GRHL2 differential binding showed that ER was the key interacting transcription factor, using both the TCGA and METABRICderived networks ( Figure 6B) .
We undertook a comparison of GRHL2 binding with public datasets (Fig ure 6C). Our analysis showed that GRHL2 sites that responded to estradiol were enriched for ER binding sites (in agreement with our qPLEXRIME data and VULCAN results) and FOXA1 (compatible with either an ER interaction or the previously reported interaction with MLL3 [31] ). Importantly, the changes in GRHL2 binding profiles after E2 treatment were not a result of altered GRHL2 protein levels ( Figure 7) . Individual analysis of peaks show that classical ER promoter binding sites, e.g. RARa, were not the target of this redistribution of GRHL2, as these sites were occupied by GRHL2 before E2 stimulation. Motif analysis of the sites within increased GRHL2 occupancy showed enrichment for the full ERE (pvalue = 1 x 10 179 ) and the GRHL2 binding motif (pvalue = 1 x 10 51 ) ( Figure 6G) .
To establish if the recruitment of GRHL2 was primarily related to a transcriptional function or the previously described interaction with MLL3, we overlapped our GRHL2 data with that of published H3K4me1/3 [31] and P300 [35] cistromes. While H3K4me occupancy was consistent between conditions, we found P300 binding to be enriched at the E2responsive GRHL2 sites.
A more detailed analysis of the GRHL2 overlap with P300 sites showed the greatest cooccupancy of GRHL2/P300 sites was when both TFs were stimulated by E2 ( Figure 6D). 13 GRHL2 responsive sites were enriched at enhancers over promoters ( Figure 6E). These findings suggested that the GRHL2ER complex is involved in transcription at ER enhancer sites.

Moreover
Validation of the ERGRHL2 interaction by qPLEXRIME and coIP qPLEXRIME [34] analysis of GRHL2 in both the estrogenfree and estrogenic conditions We further validated this interaction by coIP. Our analysis robustly found that GRHL2 and ER interact in both MCF7 and T47D cells (Figure 7). We further validated the antibody by siRNA knockdown and saw the disappearance of the GRHL2 band at 24 hours.

GRHL2 constrains ER binding and activity
We investigated the transcription of enhancer RNAs at these sites using publicly available GROseq data [36] [GSE43836] (Figure 6F). At E2 responsive sites , eRNA transcription was strongly increased by E2 stimulation; by contrast, eRNA transcription was largely independent of E2 stimulation when the entire GRHL2 cistrome was considered. Analysis of a second GROseq dataset, GSE45822, corroborated these results (Suppl. Figure S510) .
Combining data from all three sites established the effect as significant by pairedsample rank test (p = 0.04, onetailed pairedsample, wilcoxon test). Collectively, these data demonstrate that GRHL2 constrains specific ER enhancer transcription.
A conserved alphahelix between residues 425437 of GRHL2 has previously been shown to inhibit P300 [44] . We therefore overexpressed GRHL2 Δ425437, a previously demonstrated nonp300inhibitory mutant [44], [45] , in three ER positive breast cancer cell lines (MCF7, T47D and ZR75) and compared levels of eRNA to those recorded for both an empty vector control and for the overexpression of the wild type protein ( Figure 8B). The results of the wildtype study were concordant to those of our previous analysis ( Figure 8A), suggesting in general that overexpression GRHL2 leads to the inhibition of eRNA production at certain ER sites. Importantly, in all cases, the removal of aa 425437 from GRHL2 led to a reduction in the inhibitory effect than caused by overexpression of the wildtype protein and was found as significant in 5 out of 9 cases test (P < 0.05, ttest, singletail, paired).
We undertook H3K27ac ChIPseq after knockdown of GRHL2 by siRNA for 48 hours. In both MCF7 and T47D cells, we saw a significant change in the acetylation marks surrounding GREB1, and in MCF7 we saw an increase at both XBP1 and GREB1 promoters and a decrease at TFF1 ( Figure 9A) . Genomewide, we saw a redistribution of H3K27 acetylation in both cell lines ( Figure 9B). Comparison of the sites altered by GRHL2 knockdown showed a stronger signal for ER binding (Figure 9C Right).

GRHL2 has a key role in regulating EMT
In the 4T1 tumor model, GRHL2 was found to be significantly downregulated in cells that had undergone EMT [29] . The same study showed that knockdown of GRHL2 in MCF10A -an ERnegative cell line -led to loss of epithelial morphology. Overall, this suggested that the GRHL2 transcription factor plays an essential role in maintaining the epithelial phenotype of breast cells. Similar results were observed with the MDAMB231 model, where expression of GRHL2 resulted in reversal of EMT [30] . This result has been recapitulated in hepatocytes, where GRHL2 was found to suppress EMT by inhibiting P300 [44] . The ability to suppress EMT has also been noted in prostate cancer, another cancer driven by a steroid hormone receptor (AR), and the genes regulated by GRHL2 are linked to disease progression [46] .

GRHL2 a novel corepressor of ER eRNA production
These earlier data combined with the link between GRHL2 expression and patient survival indicate a significant role for GRHL2 in the progression of breast cancer. However, its role in the ER signaling axis has, until now, been unknown. Here, we show that GRHL2 performs its activity at a subset of ER enhancers. Overexpression of GRHL2 resulted in a significant decrease in eRNA production at the TFF1 and XBP1 enhancers and, in agreement with previous studies that correlate eRNA transcription with gene expression [47]- [49] , we found the measured eRNA decrease was concurrent with a significant downregulation in the expression of the corresponding gene.
These results are consistent with previous findings that GRHL2 inhibits P300 [44] and, while the ER complex results in the activation of eRNA transcription at these sites, that GRHL2 plays a role in finetuning or modulating this process.

GRHL2 role in the ER signaling axis is independent to its role in tethering MLL3
In breast cancer, GRHL2 has previously been shown to directly interact with FOXA1, which may contribute to tethering of the histone methyltransferase MLL3 and, consequently, epigenetic marks at GRHL2/FOXA1 binding sites [31] . Our analysis, however, showed no particular enrichment for H3K4me1/3 marks at E2 responsive G RHL2 sites compared to other GRHL2 binding sites and our proteomic analysis of interactors showed a strong association with proteins related to transcription. We proposed that these ER responsive sites are related to a role of GRHL2 in a transcriptional process independent of its interaction with MLL3. This was supported by evidence of a significant overlap with binding of the coactivator P300, transcriptional proteins detected by qPLEXRIME analysis of GR, and a pronounced increase in eRNA transcription at E2 responsive GRHL2 sites.

ER is bound more strongly at active enhancers (H3K27ac) that are altered by siGRHL2
Knockdown of GRHL2 led to a genomewide remodeling of H3K27ac marks, found at active enhancers, confirming a role of GRHL2 in partially regulating these sites. Detailed inspection of the data showed a significant increase of these marks around the XBP1 and GREB1 genes, supporting our hypothesis that GRHL2 has a partial inhibitory role within the ER regulon. The result was further supported by finding enrichment of ER binding events at H3K27ac marks altered by GRHL2 knockdown (Figure 9C, Right Panel). The more complex effect on H3K12ac, when compared to the effects on eRNA production, is likely a result of the diversity of roles that GRHL2 holds within the cell, leading to a host of downstream effects in the regulation of chromatin recruitment of key factors such as MLL3, ER and FOXA1 [31] Deletion of the P300 inhibitory α helix from GRHL2 reduces the protein's ability to repress the production of eRNA at ER bound enhancer sites Finally, to clarify if inhibition of P300 was occurring, we generated and overexpressed GRHL2 lacking the inhibitory alphahelix at between amino acids 425437. In all cases, GRHL2 Δ425437 had reduced an inhibitory effect compared to overexpression of the wildtype, confirming that GRHL2 primarily plays a repressive role at these sites.

Conclusion
VULCAN is built on stateoftheart network analysis tools previously applied to RNAseq data. By adapting networkbased strategies to ChIPseq data, we have been able to reveal novel information regarding the regulation of breast cancer in a model system.
We have demonstrated that the VULCAN algorithm can be applied generally to ChIPseq for the identification of new key regulator interactions. Our method provides a novel approach to investigate chromatin occupancy of cofactors that are too transient or for which no reliable antibody is available for direct ChIPseq analysis.
Further, because of our use of clinical data, VULCAN results are both more likely to be relevant and are specific to the disease type studied, as demonstrated in the loss of signal when using a control coexpression network generated from an alternative disease type.
VULCAN enabled us to identify the GRHL2ER interaction and that GRHL2 plays a repressive role. Further analysis showed the process to be independent to the previously reported interaction with FOXA1 and MLL3 [31] . Our conclusion, therefore, is that GRHL2 has a second, previously undescribed role that regulates transcription at specific estrogen responsive enhancers ( Figure 10) .
Given the central role of the ER in breast cancer development and GRHL2's own ability to regulate EMT, the discovery that ER recruits GRHL2 leading to the altered eRNA production is an important step in enhancing our understanding of breast cancer and tumorigenesis.

ChIPseq
Frozen samples were processed using established ChIP protocols [50] to obtain DNA fragments of~300 bp in length. The libraries were prepared from the purified DNA using a Thruplex DNAseq kit (Rubicon Genomics) and sequenced on the Illumina HiSeq Platform. Sequencing data is available from Gene Expression Omnibus, accession GSE109820 & GSE123475.

Differential binding analysis
Sequencing data was aligned using BWA [51] to the human genome (hg19). Reads from within the DAC Blacklisted Regions was removed before peak calling with MACS 2.1 [52] on default parameters. The aligned reads and associated peak files were then analyzed using DiffBind [18] to identify significant changes in ER binding. Gene Set Enrichment Analysis (GSEA) was performed as described by Subramanian et al . [53] using the curated pathway collection (C2) from MSIGDB v 5.0 with 1000 set permutations for each pathway investigated, followed by Benjamini Hochberg Pvalue correction.

Motif analysis
Motif analysis of the binding regions was undertaken with Homer v4.4 [54] using default parameters. Motif logo rendering was performed using Weblogo v2.8.2 [55] VULCAN analysis We reconstructed a regulatory gene network using ARACNeAP as described by Alvarez [56] .
RNAseq breast cancer data was downloaded from TCGA on January 2015 and VSTNormalized as described by Anders and Huber [57] . The ARACNe transcriptional regulation network was imported into R using the VIPER BioConductor package and it was interrogated using the differential binding profiles from our ChIPseq experiment as signatures, 45  Samples were prepared as previously described for RIME [9] , protocol was modified to include TMT isobaric labels for quantification [34] . qPLEXRIME data and analysis pipeline is available as part of the supplementary Rmarkdown.

Code availability
Code for data analysis is provided as a Rmarkdown document and supporting data is available     Overexpression of GRHL2 in MCF7 resulted in a reduction of eRNA transcribed from the GREB1, TFF1 and XBP1 enhancers. The effect was significant at TFF1 and XBP1 enhancers (p < 0.05, paired ttest).

B.
Overexpression of GRHL2 Δ425437 (delta) compared to empty vector (EV) and GRHL2 wildtype (OE) at 24 hours. In all 3 cell lines at all 3 loci, overexpression of the wild type (WT) led to a reduction in the mean eRNA production at GREB1, TFF1 and XPB1. This effect was significant in 6 out of 9 experiments (P < 0.05, ttest, onetailed, paired). Overexpression of GRHL2 Δ425437 had a reduced effect, that led to a significant reduction in only 2 out of 9 experiments (P < 0.05, ttest, one tailed, paired). Importantly, in 4 out of 9 experiments, WT overexpression had significantly less eRNA production than GRHL2 Δ425437, suggesting the P300 inhibition domain plays a role in the regulation of eRNA production. The effect of silencing GRHL2 on H3K27ac at 48 hours in MCF7 and T47D cell lines was monitored by ChIPseq. Analysis of sites proximal to TFF1, XBP1 and GREB1 showed significant changes in acetylation at all three sites in MCF7. Significant changes were only found at GREB1 in T47D (top right). While XBP1 and GREB1 show an increase in histone acetylation on silencing GRHL2, TFF1 showed the reverse effect.
B. Genomewide, the effects of silencing GRHL2 led to significant redistribution of H3K27ac in both the MCF7 and T47D cell lines, with both showing an increase and decrease in the histone mark dependent on site.
C. From left to right. Coverage as calculated by Homer. H3K27ac was found at GRHL2 sites in both MCF7 and T47D cells, in particular at the E2 responsive sites. The same mark was also found at P300 sites as expected. Analysis of ER binding at H3K27ac sites showed an enrichment for ER binding at the H3K27ac sites that were most responsive to knockdown of GRHL2 in MCF7 cells.

Figure 10: Overview of the role of GRHL2 in ER activation
On activation of the ER by the ligand E2, the protein is released from a complex containing HSPs and translocates to the nucleus. The holoER dimer forms a core complex at Estrogen Response Elements (ERE) with FOXA1 (pioneer factor) and GATA3. ER further recruits P300 and GRHL2. GRHL2 has an inhibitory effect on P300 (a transcriptional activator interacting with TFIID, TFIIB, and RNAPII), thereby reducing the level of eRNA transcription at enhancer sites. Overexpression of GRHL2 further suppresses transcription, while knockdown of GRHL2 reverses the process.