Glucocorticoid receptor binds half sites as a monomer and regulates specific target genes

Background Glucocorticoid receptor (GR) is a hormone-activated, DNA-binding transcriptional regulatory factor that controls inflammation, metabolism, stress responses, and other physiological processes. In vitro, GR binds as an inverted dimer to a motif consisting of two imperfectly palindromic 6 bp half sites separated by 3 bp spacers. In vivo, GR employs different patterns of functional surfaces of GR to regulate different target genes. The relationships between GR genomic binding and functional surface utilization have not been defined. Results We find that A477T, a GR mutant that disrupts the dimerization interface, differs from wild-type GRα in binding and regulation of target genes. Genomic regions strongly occupied by A477T are enriched for a novel half site motif. In vitro, GRα binds half sites as a monomer. Through the overlap between GRα- and A477T-bound regions, we identify GRα-bound regions containing only half sites. We further identify GR target genes linked with half sites and not with the full motif. Conclusions Genomic regions bound by GR differ in underlying DNA sequence motifs and in the GR functional surfaces employed for regulation. Identification of GR binding regions that selectively utilize particular GR surfaces may discriminate sub-motifs, including the half site motif, that favor those surfaces. This approach may contribute to predictive models for GR activity and therapy. Electronic supplementary material The online version of this article (doi:10.1186/s13059-014-0418-y) contains supplementary material, which is available to authorized users.

Results: We find that A477T, a GR mutant that disrupts the dimerization interface, differs from wild-type GRα in binding and regulation of target genes. Genomic regions strongly occupied by A477T are enriched for a novel half site motif. In vitro, GRα binds half sites as a monomer. Through the overlap between GRα-and A477T-bound regions, we identify GRα-bound regions containing only half sites. We further identify GR target genes linked with half sites and not with the full motif. Conclusions: Genomic regions bound by GR differ in underlying DNA sequence motifs and in the GR functional surfaces employed for regulation. Identification of GR binding regions that selectively utilize particular GR surfaces may discriminate sub-motifs, including the half site motif, that favor those surfaces. This approach may contribute to predictive models for GR activity and therapy.

Background
Glucocorticoid receptor (GR, HUGO symbol NR3C1) is a DNA-binding transcriptional regulatory factor that is activated specifically by binding glucocorticoid hormones, and which regulates diverse aspects of physiology. Genome-wide measurements of gene expression in U2OS human osteosarcoma cells stably expressing GR [1] have revealed that thousands of genes are differentially expressed in response to glucocorticoid treatment (that is, regulated by GR) [2]. Moreover, U2OS lines stably expressing three mutant GR alleles, each disrupting one functional surface, have revealed that different GR target genes depend on different patterns of functional surfaces of GR for proper regulation; A477T disrupts the dimerization interface; and 30iiB (E219K/ F220L/W234) and E773R disrupt the activation function 1 and 2 (AF1, AF2) domains, respectively ( Figure 1A) [2]. Similarly, GRγ, a minor isoform differing from the major isoform GRα by insertion of a single arginine in the 'lever arm' region [3], differentially regulates some GR regulated genes, implicating this region as another functional surface. X-ray crystallography [3] and nuclear magnetic resonance spectroscopy (NMR) [4] of GR bound to different oligonucleotides containing high-affinity binding sites indicate that GR assumes distinct conformations at different sites. These results imply that GR also assumes different conformations in vivo at different glucocorticoid response elements (GREs), genomic regions that confer specific context-dependent GR responsiveness upon a nearby gene.
In vitro, purified GR recognizes specifically a GR binding sequence (GBS) motif composed of imperfect palindromic 6 base pair (bp) 'half sites' separated by 3 bp 'spacers' , binding as an inverted dimer [5]. In vivo, GR occupies specific genomic GR binding regions (GBRs), identified using chromatin immunoprecipitation coupled with quantitative PCR (ChIP) or deep sequencing (ChIP-seq). The genomic locations of GBRs differ strikingly in different cell or physiologic contexts, and not all GBRs contain an identifiable GBS motif. Hence, the relationship between sequences recognized in vitro and genomic regions occupied in vivo is complex. It is reassuring that GBS motifs drive GR-regulated transcription in a simple reporter context [5], and therefore reasonable to assume that GR occupancy at GBRs that contain GBSs reflects in vivo sequence recognition. Interestingly, different GBSs, tested in isolation in vivo, trigger utilization of different patterns of GR functional surfaces [3], supporting the notion that GR assumes different conformations at these sequences in vivo. However, rules dictating the relationship between DNA sequence, GR conformation, and utilization of distinct GR functional surfaces remain unknown.
We therefore examined whether sequence motifs, including the canonical GBS, found within GBRs might in fact be an aggregate of specific motifs, each biased toward utilization of particular functional surfaces of GR. In this study, we measured changes in gene expression and genomic occupancy upon glucocorticoid treatment of isogenic cultured human cell lines expressing alleles of GR bearing differences in particular functional surfaces. We further used statistical, computational, and biophysical methods to characterize changes in GR conformation and function, as well as the sequence motifs that are present at GREs, in relation to these functional surfaces.

Different GR alleles induce diverse, gene-specific transcriptional responses
To improve our understanding of the patterns of GR functional surfaces used during transcriptional regulation, we carried out a genome-wide analysis in a series of human U2OS osteosarcoma cell lines with stably integrated GR alleles [2,6]. We isolated RNA from U2OS cells expressing integrated rat GRα, GRγ, GRα 30iiB (30iiB; E219K/F220L/W234R), GRα A477T (A477T), or GRα E773R (E773R) ( Figure 1A, B) after treatment for 2, 4, or 24 h with 100 nM dexamethasone, a synthetic glucocorticoid that binds GR with high affinity, or for 4 h with ethanol only as a '0 hour' control. Each GRα mutation alters in a distinct way the utilization of various GR regulatory surfaces; among cells expressing the 30iiB, A477T, or E773R mutations, a panel of 10 regulated genes displayed six out of the eight possible patterns of surface utilization of three surfaces compared to GRα; that is, (2 states, utilizes or does not) (3 mutations) = 8 patterns [2]. These results imply that other functional surfaces and a multitude of patterns are likely used at regulated genes. We assayed all gene expression from isolated RNA samples (Additional file 1). After 4 h of dexamethasone treatment, thousands of genes were regulated by GR [2]. Because of the statistical complexities with assessing utilization of each of the four GR surfaces at every regulated gene, we instead analyzed pairwise the regulation by GRα compared individually with our four GR alleles (Additional file 1). As expected, we found that regulation was affected in a highly allele-specific and gene-specific manner (Additional file 1, Figure 2).
We calculated the statistical significance of the difference in regulation between GRα and each allele for each gene after 4 h of dexamethasone treatment ( Figure 3A to D), using limma, a software package for the analysis of gene expression microarray data using linear models. We instructed limma to estimate the regression coefficients (that is, the per-gene differences in regulation between a given allele and GRα) for each such simple contrast (the difference of GRγ and GRα, and so on), as well as the standard errors of those coefficients [7]. This method allows the application of a t-test to determine significance, which may reduce the number of identified targets but ensures a high confidence gene list by properly controlling the false discovery rate [7]. Figure 1 GR alleles used in this study. Rat GRα, the major isoform of GR, is a 795 amino acid protein with a modular structure typical of nuclear receptors: an N-terminal activation function 1 (AF1) domain, a DNA-binding domain (DBD), and a C-terminal ligand binding domain (LDB), which contains the activation function 2 (AF2) domain. We used mutations of GRα in AF1 (30iiB); AF2 (E773R); the dimerization interface of the DBD (A477T); and a minor isoform GRγ, which bears a single amino acid (arginine) insertion at position 471 in GRα as a result of alternative splicing. Domain boundaries are approximate, based on functional assays. Western blots show that GR was expressed at similar levels in all cell lines, which were similar to the level of endogenous GR found in the lung adenocarcinomic human alveolar basal epithelial cell line A549.
We then assigned a simple phenotypic classification to each GR regulated gene based on its valence of regulation by GRα (activation or repression) and the difference in regulation by that allele and by GRα (gain, loss, or change of valence). Some phenotypic classifications were less common than others: in particular, genes that change the valence of regulation were relatively rare. Moreover, certain phenotypes appeared to be relatively allele-restricted; for example, many genes gained activation with A477T and GRγ (103 and 36 genes, respectively), whereas few gained activation with 30iiB or E773R (6 and 1 genes, respectively) ( Figure 3). Thus, A477T and GRγ may drive activation through one or more mechanisms unavailable to GRα, 30iiB, or E773R.
A477T and GRγ selectively occupy GBRs near genes that gain activation Next, we performed ChIP-seq to define GBRs genomewide in U2OS cells expressing GRα or one of the three mutant alleles (30iiB, A477T, and E773R), after treatment with dexamethasone. Defining GBRs enables identification of putative primary regulated genes (those regulated by a proximal GR-occupied presumptive GRE) and allows us to distinguish genes differentially regulated due to changes in occupancy from those with altered GR activity. GRγ has been similarly examined by others [6]. Importantly, we did paired-end sequencing of the ChIP-seq library, which improves the dynamic range of occupancy by allowing use of both ends of the double stranded molecule to determine whether the originating fragment was unique [8]. We identified GBRs (Additional file 2) using MACS2 a , a software package for the analysis of ChIP-seq data, and assigned the GBRs to genes (Additional file 3) based on proximity (see Materials and Methods, 'GBR to gene assignment') [9]. We also compared occupancy between GRα and each mutant (Additional file 4 Additional file 5 and Additional file 6) using the 'diffstats' module of MACS2 (see Materials and Methods, 'Computational analysis of ChIP-seq data'). To avoid considering the same GBR twice, we considered all GBRs from GRα and those GBRs from a given mutant that did not overlap with GBRs from GRα (see Additional file 7, Additional file 8, Additional file 9, and Additional file 10); additionally, we analyzed only GBRs with signals of at least two reads per million total reads. Among cells expressing GRα or 30iiB, we found that the locations of about 7% of GBRs differed b ; and we observed the same fractional difference among cells expressing GRα or E773R. Similarly, the magnitude of occupancy (that is, the number of observed sequence reads normalized by total reads) differed at only onequarter of GBRs among cells expressing GRα, 30iiB, or E773R ( Figure 4C and D). In contrast, the locations of over half of GBRs differed b among cells expressing GRα or A477T (for example, Figure 4A). Moreover, the magnitude of occupancy varied at most GBRs: more than half had greater occupancy by GRα than by A477T (GRα-selectively occupied) and one-quarter had greater occupancy by A477T than by GRα (A477T-selectively occupied) ( Figure 4B).
Comparing these results with our gene expression findings, we observed substantial enrichment of GRαselectively occupied GBRs near genes that lost activation in cells expressing A477T ( Figure 4E, Additional file 11, and Additional file 12), and of both A477T-and GRαselectively occupied GBRs near genes that gained activation ( Figure 4F, Additional file 12, and Additional file 13).
Thus, it appeared that the differentially regulated genes represent a mixture, some with altered strength of occupancy (that is, a direct relationship between binding and regulation) and others with altered GR activity (that is, an indirect relationship). The occupancy of A477T near genes that are activated by A477T is surprising in light of previous reports indicating that mutations in the dimerization interface abrogate 'transactivation' [10,11].
We further investigated whether the differentially occupied GBRs near differentially regulated genes were responsible for the observed regulation. As a simple measure of regulatory function, we cloned GBRs -selected on the basis of proximity to differentially regulated genes, strength of occupancy, and isolation from many other GBRs -into luciferase-based reporters and transfected them into U2OS cells expressing either GRα or A477T. We found GRαselective activation with four GRα-selectively occupied  Figure 2. Genes with significant differences in regulation (differences in differences in expression) with a given allele as compared to GRα are shown as filled circles with a black outline; genes lacking significant differences are shown as open circles. Only genes with significant (adjusted P <0.05) regulation in at least one condition (GRα or a given allele) are shown. We categorized genes showing significant differences in regulation (compared to GRα) by the functional consequence of the alternative allele/mutation (right).
GBRs, A477T-selective activation with seven A477Tselectively occupied GBRs, and non-selective activation with five non-selectively occupied GBRs (Additional file 14). Six of 24 GBRs did not activate transcription in either cell line, but only two drove aberrant activation (opposing selectivity of occupancy and activation). These results imply that certain of the differentially occupied GBRs may indeed function to differentially regulate nearby genes. Additional experiments, using recent methods such as endogenous genome editing [12][13][14][15], will be required to assess unequivocally whether a given GBR is actually a GRE, and, if so, whether it confers differential regulation.

Most A477T-selectively occupied GBRs contain a half site motif
In order to understand what sequence constraints, if any, were responsible for differential occupancy by GRα and A477T, we performed motif analysis (Additional file 15) using MEME-ChIP [16]. Three groups of GBRs were analyzed: (1) the thousand most-occupied GBRs that were occupied by GRα and did not have detectable occupancy by A477T ('GRα only' , Figure 5); (2) the thousand mostoccupied GBRs that were occupied by both GRα and A477T with less than a 1.  There was enrichment of GRα-selectively occupied GBRs near genes that lost activation (E) in cells expressing A477T. Similarly, there was enrichment of A477T-selectively occupied GBRs near genes that gained activation (F), although such genes also displayed enrichment for GRα-selectively occupied GBRs. The relative heights of the enrichment curves (E and F) are proportional to the number of binding sites of each allele. In contrast, 30iiB and E773R GBRs were generally similar to GRα GBRs in both location and strength of occupancy. detectable occupancy by GRα (' A477T only' , Figure 5). We excluded from motif analysis GBRs that overlap with known repeats (see Materials and Methods, 'Computational analysis of ChIP-seq data').
De novo motif searches with MEME [17] revealed that GBRs occupied by only GRα were enriched for the canonical GR 'full site' motif, which specifies cooperative binding as a GR dimer ( Figure 5). Allowing MEME to determine the optimal length for motifs in GBRs occupied by only GRα revealed a 17 bp GR motif (Additional file 15), which is longer than the previously reported 14 bp [18] and 15 bp motifs [19], having additional, asymmetric specificity flanking the more strongly specified half site. In contrast, GBRs occupied by only A477T were enriched for a single half site motif ( Figure 5), which may provoke binding of a GR monomer to DNA. These results are consistent with the observations that A477T has an impaired dimerization interface and decreased cooperativity [4].
GBRs that were occupied by both GRα and A477T were enriched for an intermediate motif that consisted of a closer-to-consensus half site and a very weak second half site, separated by the normal 3 bp spacer ( Figure 5). Due to the semi-quantitative nature of ChIP-seq, some GBRs that we characterized as similarly occupied by both GRα and A477T may actually be differentially occupied (and conversely, some GBRs that we characterized as differentially occupied may not be). Therefore, the weak second half site seen at GBRs occupied by both GRα and A477T may be an experimental artifact; moreover, the lack of a second half site motif at A477T-selectively occupied GBRs implies that such GBRs indeed contain a motif distinct from that found at GRα-occupied GBRs. Supporting this view, length-unbiased searches with MEME for motifs at GBRs occupied by only A477T or both A477T and GRα revealed truncated motifs that included only the first two positions corresponding to the second half site (Additional file 15). In Figure 5, we display GR motifs found when forcing MEME to search for motifs that are exactly 17 bp in length, for ease of comparison. CentriMo, which searches for central/local enrichment of motifs in Figure 5 Both A477T and GRα occupy 'half site' GBRs in vivo. Motif analysis was performed using MEME-ChIP on three groups of GBRs (results shown by row): occupied by GRα only (top row), occupied by both GRα and A477T with ≤1.4-fold difference (second row), and occupied by A477T only (third row). GR motifs (left column) from GBRs occupied by GRα only are similar to the canonical full site. In contrast, GR motifs from GBRs occupied by A477T only consist of a 'half site' motif, which lack the second adjacent half site. GR motifs from GBRs occupied by both GRα and A477T contain a very weak second half site, separated by the normal three bp spacer. GR motifs were forced to 17 bp in all samples, but an unbiased search revealed shorter 11 bp motifs at GBRs occupied by GRα and A477T, or A477T alone (Additional file 10). GR motifs are centrally enriched in all samples (P < <0.01). Motifs that were similar to those for STAT (center column) and AP1 (right column) family transcription factors were also found, although the AP1 motif was not enriched at GBRs occupied by only A477T. The bottom row shows the JASPAR motifs for STAT3 (MA0144.1) and AP1 (MA0099.2) as a reference, and the vertical dashed lines indicate their alignment with discovered motifs.
ChIP-seq data [20], showed that GR motifs were strongly enriched at GBR signal maxima (Additional file 15), implying direct binding by GR at these GBRs.
Discriminatory motif analysis of the three different classes (GRα-selective, A477T-selective, and non-selective) of occupied GBRs with DREME, which exhaustively searches for all 3 to 8 bp regular expression motifs that are enriched relative to di-nucleotide shuffled sequences [21], revealed motifs similar to that of activation protein 1 (AP1) and Signal Transducers and Activators of Transcription (STAT) families of transcription factors ( Figure 5 and Additional file 15). We did not detect any centrally enriched motifs that might indicate a cooperative binding partner (that is, a heterodimer) for DNA-bound GRα or A477T. Nonetheless, previous studies [22] indicate that there is widespread co-association among transcription factor binding sites, indicating that nearby sites may be functional.
Using DREME, we identified a STAT-like motif (CWGGAA) in 233 and 251 of 1,000 GBRs occupied by GRα alone, or GRα and A477T, respectively; a different STAT-like motif (GGAAYG) was found in 117 of 1,000 GBRs occupied by A477T alone (Table 1). AP1like motifs, STGAGTCA and ATGABTCA, were positively identified in 35 of 1,000 GBRs occupied by GRα alone, and 105 of 1,000 GBRs occupied by GRα and A477T, respectively; AP1-like motifs were not significantly enriched at GBRs occupied by A477T alone (Table 1). Of all 23,525 GBRs found in cells expressing GRα, we found that 1,719 and 1,214 had STAT3 (a canonical member of the STAT family) or AP1 motifs, respectively, but lacked GR full sites, and 3,277 and 3,694 had STAT3 or AP1 motifs, respectively, and contained GR full (see Additional file 10, Additional file 12, and Table 2).
GR interacts physically and functionally with both the STAT and AP1 families of transcription factors [23]. Although GR:AP1 interaction is known to be stimulated by lipopolysaccharide (LPS) treatment, GR also interacts with AP1 under normal conditions [24]. The presence of AP1 and STAT motifs at GBRs is consistent with both 'tethering' of GR -protein:protein association of GR with DNA-bound AP1 or STAT -and with the existence of composite elements, elements that contain binding sites for GR, and for one or more additional transcriptional regulatory factors. The greater abundances of GBRs that contained STAT3 or AP1 motifs and also GR full sites as compared to GBRs that contained these motifs and lacked GR full sites (1.4-and 3.0-fold, respectively; Table 2) implies that composite elements may be more common than tethering elements.

A subset of genes regulated by GRα contains only half sites
Motivated by the discovery of a half site motif near genes regulated by A477T, we investigated whether any genes were regulated by GRα recognition of half site motifs. We identified a set of genes whose regulation at 4 h is similar (less than two-fold different) between GRα and A477T at the 95% confidence level c . GBR occupancy near these 'non-selective' genes (Additional file 15) did not reveal significant enrichment of GRα-or A477T-selectively occupied GBRs relative to genes that are selectively regulated, suggesting that GBRs near these genes are bound by both GRα and A477T. Consistent with this explanation, GRα and A477T bind specifically and with equal affinity to half sites in vitro [4].
We investigated whether GBRs within 20 kilo-base pairs (kb) of the transcriptional start site of non-selective, activated genes contained half sites, full sites, or both. Since the sets of sequences described by the full site motif and by the half site motif are partially overlapping ( Figure 6A), we assessed the individual information, a quantitative and absolute measurement of the similarity of a sequence to a given motif [25], of the best match to each GR motif (GRα-derived full site motif and A477T-derived half site motif) at each GBR (±75 bp from the signal summit; Additional file 12 and Additional file 16). Consistent with the observed differential occupancy, genes that lost activation with A477T generally had GBRs that contained GR motifs that are good full sites ( Figure 6B), which logically also constitute good half sites. Inversely, genes that gained activation with A477T had GBRs that contained poor full sites and good half sites. Non-selective genes also had GBRs that contained poor full sites and good half sites, suggesting that GBRs near non-selective genes, like those near genes that gained activation, are bound by GRα at half sites ( Figure 6B).
Next, we investigated whether there were GRα-regulated genes bearing half site GBRs and apparently lacking full Found by DREME within training set of 1,000 peaks (see Figure 5 and Additional file 10).
site GBRs (Additional file 7 and Additional file 12). This restrictive criterion should define genes most likely to be regulated by GRα via half sites. We did not restrict our selection with respect to regulation by A477T, as some genes regulated by half sites might not be regulated by A477T due to impaired conformational dynamics and/or co-regulator recruitment. We defined as a full site motif any sequence with an individual information of at least 8.94 bits (the 95th percentile of randomly permuted GBR sequences), and as a half site motif any sequence with an individual information of at least 8.00 bits, with respect to the given motif. By comparison, the full site and half site motifs had 11.9 bits and 9.3 bits of information, respectively, relative to GBR background sequence (Additional file 15). We additionally required that the full site motif's individual information at candidate GBRs be less than the half site motif's individual information. In total, there were 1,441 GRα-regulated genes, 607 (42%) of which had associated GBRs within 20 kb of the transcriptional start site. We found that 35 of these 607 genes met our definition, and that this number declined slowly with an increasing window of association (27 within 30 kb, 21 within 40 kb, 17 within 50 kb).

GRα monomers bind half sites in vitro
We next investigated the in vitro binding of glucocorticoid receptor DNA binding domain (DBD) to an oligonucleotide containing a half site adjacent to an 'anti-motif 'that is, the least favorable binding sequence according to the half site motif -(gtacAGAACAtttGTGAGGtcgac) by electrophoretic mobility shift assay [4]. GRα binding was fitted with the Hill equation and a Hill coefficient of 1.080 ± 0.065 (mean ± standard deviation) was determined ( Figure 7A), consistent with no cooperativity. This indicates Figure 6 GR half sites predominate at GBRs near non-selective, activated genes. (A) More 17mers containing half sites (green circle) than full sites (red circle) are expected because the information-containing region of the motif is shorter; similarly, most full sites contain a good half site because the constraints (that is, the sequence motifs) are similar and there are two positions available in a full site for a good half site due to the palindromic nature of dimer. Nonetheless, not all full sites contain half sites, as the half site motif is more strongly constrained than the full site motif; the additional specificity may be due to the lack of cooperative binding at these sites.
(B) Individual information (measured in bits) of the best full site (x-axis) and half site (y-axis) found at GBRs (circles) located within 20 kb of the transcriptional start sites of a set of genes; contour lines encompass same number of GBRs, and thus localize regions of high GBR density. Genes that lost activation with A477T (B, red) generally had peaks with good full sites (>10 bits) and good half sites (>9 bits). Genes that gained activation with A477T (B, green) generally had GBRs with poor full sites (<5 bits) and good half-sites (>10 bits). Genes that were non-selectively c activated (B, blue) also generally had GBRs with poor full sites and good half sites.
that a strong half site adjacent to an unfavorable DNA sequence is bound by GRα as a monomer.

Discussion
Half site motifs suggest GR monomers bind DNA and regulate specific genes We set out to find whether sequence motifs bound by GR might be decomposed into multiple specific motifs that each relate to the utilization of specific functional surfaces of GR. We found that A477T, a mutation in the dimerization interface of GR leading to reduced DNAbinding cooperativity and increased dissociation rates in vitro [4], resulted in gains and losses in glucocorticoidinduced gene regulation and in GBR occupancy in vivo. Analysis of A477T-occupied GBRs revealed a hitherto unrecognized half site motif. The half site motif, coupled with a failure to detect non-GR partner motifs, suggested that A477T might bind GBRs as a monomer in vivo. Furthermore, A477T regulated many genes, suggesting that GR monomers are sufficient for gene regulation. The question remains, then, whether GRα monomers bind to DNA in a sequence-specific manner and regulate a set of target genes. Curiously, we found that some genes were regulated by A477T and not by GRα, and some GBRs were occupied by A477T and not by GRα. We speculate that GRα fails to bind such GBRs because it is titrated away by cooperatively bound, high affinity full sites, composed of 'degenerate' or weak half sites; such cooperatively bound sites would be unavailable to A477T due to its reduced cooperativity. Indeed, many GRα-selectively occupied GBRs near GRα-selectively regulated genes contained full sites lacking 'consensus' or perfect half sites; rather, these full sites were composed of two degenerate half sites ( Figure 6). These results suggest two orthogonal binding modes for GRα: (1) consensus half sites, most likely bound by monomers; and (2) degenerate full sites, most likely bound by dimers.
Alternatively, GRα occupancy may not be observed at A477T-specific GBRs because GRα dissociates from these regions more rapidly than A477T, reducing the signal below detection limits. However, A477T dissociates from DNA more rapidly than GRα in vitro [4] and in vivo [26], arguing against this model. Importantly, at any given site, some fraction of binding events may involve monomers and some fraction may involve dimers. We found that a minimum of 35 genes are regulated entirely by direct binding of GRα monomers. While many genes may be regulated entirely by direct, cooperative binding of GRα as a dimer, our results suggest that others may be regulated in part by monomers and in part by dimers.
Additionally, we found that GBRs were enriched within 20 kb from the promoters of regulated genes (relative to the whole genome), suggesting that promoter-proximal GBRs were responsible for many observed changes in gene expression. Similarly, we found that GRα-and A477T-selectively occupied GBRs were enriched near the promoters of genes that lost and gained activation, respectively, with A477T, consistent with these GBRs being responsible for gene regulation. Nonetheless, more than half of GBRs were located distal to genes (more than 50 kb from the transcriptional start site), and only 42% of GRα-regulated genes were associated with promoter-proximal GBRs (±20 kb). Further exploration is needed to understand the function (if any) of distal GBRs.

Monomeric GRα is functional in regulation
The ability of a transcription factor to function in gene regulation alternatively as a monomer or a dimer has been previously observed for SRY (sex determining region Y)-box 10 (SOX10) [27]. There are several lines of evidence suggesting that GRα, too, binds DNA alternatively as a monomer or a dimer. GRα and A477T bind specifically and with equal affinity to half sites in vitro [4], with a Hill coefficient of approximately 1, consistent with binding as a monomer to half sites. Additionally, single-molecule imaging of GR binding in vivo indicates that A477T dissociates from DNA 10-fold faster than GRα [26], consistent with the observation that A477T has reduced cooperativity when binding full sites in vitro Figure 7 GRα binds a half site in distinct conformation. Hill plot derived from an electrophoretic mobility shift assay describing the fraction of oligonucleotides containing a half site with an adjacent 'anti-motif' -that is, the least favorable binding sequence according to the half site motif -(gtacAGAACAtttGTGAGGtcgac) bound at increasing GRα concentrations [4]. The Hill coefficient was computed from the Hill Equation (right) to be 1.080 ± 0.065. [4]. These results collectively suggest that A477T binds as a monomer in vivo. Furthermore, deep sequencing of DNase I-treated chromatin, in combination with ChIPseq, reveals that hormone-induced binding of androgen receptor (AR, HUGO symbol NR3C4), a related nuclear receptor, was associated with DNase footprints the size of AR monomers at two-thirds of binding regions [28]. Given the significant homology between AR and GR, these results suggest that GRα is also capable of binding as a monomer in vivo.
DNase footprints measured in aggregate, and even direct imaging of individual binding events, are insufficient to demonstrate that certain GBR binding events promote regulation of target genes. Joint measurements of gene expression and GBR occupancy are the minimum necessary data to infer function of GBRs. Our finding that at least 35 regulated genes contain only GBRs with half sites within 20 kb of their transcriptional start site provides the best evidence to date that GRα binds as a functional monomer in vivo, resulting in regulation of target genes.

Implications of GR monomers for physiology
If GR and other nuclear receptors bind DNA alternatively as dimers or as monomers, then these two receptor species may represent distinct drug targets [29]. Glucocorticoids, like other steroid hormones, regulate diverse aspects of physiology, making them potent but blunt tools for medicine. Significant interest in 'selective' or 'dissociated' glucocorticoid receptor agonists [30,31] that separate the beneficial and adverse clinical effects of glucocorticoid treatment has been driven by the success of similar drugs targeting the estrogen receptor, especially in treating breast cancer and osteoporosis [32]. Drugs targeting nuclear receptors commonly bind to a pocket in the ligand binding domain, normally bound by an endogenous hormone (cortisol in the case of human GR). Alternatively targeting the dimerization interface or DNAbinding interface to separate dimer and monomer activity may help produce improved selective modulators for GR and other receptors.
Encouragingly, drugs modulating AR conformation and activity, subsequent to hormone binding, have been previously found [33]. Our data make it difficult to infer the potential effects of such selective modulators on GR activity. However, observations of mice harboring the equivalent mutation to A477T indicate that inhibiting GR dimerization may prevent glucocorticoid-induced decreases in skin and bone collagen synthesis, as well as decreases in insulin sensitivity, while maintaining or even exaggerating glucocorticoid-induced increases in triglyceride synthesis [29]. Moreover, such an inhibitor may cause an increase in lipogenesis, possibly due to the regulation (by monomers) of target genes not normally regulated by GR [29]. A selective modulator that inhibits the regulatory function of GR dimers may have different effects from one that inhibits their formation; both would inhibit regulation of dimerization-dependent genes, but the latter may also drive regulation of new target genes, similar to those observed in the dimerization interface mutant.
Evolution and structure of hormone-regulated gene networks Evidence for existence of functional AR [28] and GR monomers bound to consensus half sites suggests that similar monomeric forms may exist for all nuclear receptor subfamily 3 group C transcription factors (GR, mineralocorticoid receptor, progesterone receptor, and AR), and possibly more distantly related nuclear receptors. These results may have important implications for how hormone-regulated gene networks evolved. Binding of receptor monomers to consensus half sites may present an evolutionary path for establishing robust response elements wherein one perfect half site appears, at random or by insertion or rearrangement, and allows for regulation by monomers, while also promoting selection of a second site with the proper spacing for cooperative binding by dimers; the full site would then be resistant to mutations that might have perturbed binding by monomers. Alternatively, the half site may have preferred function, and may therefore remain stable. Analysis of mutations equivalent to A477T in other nuclear receptors and cell lines may help clarify the functional and evolutionary role of half sites.
Additionally, other GR functional surfaces, including but not limited to AF1, AF2, and the lever arm, may be associated with more narrowly defined motifs that may either be subsets of the full site GBS motif, or may differ from it. Such motifs may be detected by analyzing differentially occupied GBRs near regulated genes selectively affected by perturbations to a functional surface. In cases where binding is not perturbed, discriminatory motif analysis between GBRs proximal to differentially regulated genes and GBRs proximal to non-selectively regulated genes may yield similar answers.
Moreover, structural methods such as X-ray crystallography, electron microscopy, and NMR may define directly the conformational changes induced in GR upon binding to these motifs. These approaches may confirm that GR binding to a suspected 'sub-motif ' induces changes in the functional surface being studied and drives allosteric changes in remote functional surfaces.

Conclusions
GREs differ in their precise sequence motifs, in their modes of GR binding (monomer or dimer; direct or indirect), and in the functional GR surfaces required for binding or regulation. Structural changes induced in GR upon binding different 'sub-motifs' (refinements of the canonical motif) may be important determinants of GR activity at particular response elements. We have shown that GR occupies in vivo many genomic regions containing half sites, and that these regions are likely responsible for regulation of a subset of target genes. Moreover, GR specifically binds to half sites in vitro, without apparent cooperativity, and thus as a monomer.
Half sites are easily overlooked among those sequences that conform to the known GR full site motif, but they have distinct functional properties compared to degenerate full sites, which may be predicted to bind with similar affinity. Identification of GREs with other sub-motifs, the conformations those GREs induce, and the functional surfaces they utilize will be important steps in defining predictive models of GR activity and in developing therapeutic strategies to modulate GR activity.

Materials and methods
Cell culture U2OS cells stably transfected with rat GRα, GRα 30iiB (30iiB; E219K/F220L/W234R), GRα A477T (A477T), GRγ, or GRα E773R (E773R) were grown as previously described [2]. Parent U2OS cells do not manifest any endogenous response to glucocorticoid treatment. DNA sequencing analysis was used to confirm the presence of point mutations in all cell lines. A549 cells were obtained from American Type Culture Collection and maintained in DMEM supplemented with 10% volume/volume (v/v) fetal bovine serum.

Gene expression measurements using microarrays
Cells were plated in 6-well plates using DMEM supplemented with 5% v/v fetal bovine serum. Dexamethasone (Sigma) at a final concentration of 100 nM was added to the medium for 2, 4, or 24 h; for the '0 h' time point, cells were treated with vehicle (ethanol) only for 4 h. Cells were lysed and total RNA was isolated using QIAshredder and RNeasy mini columns (Qiagen). The quality of RNA samples was evaluated by A260/A280 ratio which was at least 1.9 and the integrity was analyzed using the Bioanalyzer 2100 (Agilent) with the Experion RNA Stdsens analysis kit (Biorad). For each experimental condition 2 μg of high quality total RNA was submitted to the University of Southern California Epigenome Center. Four biological replicates of each experimental sample from independent experiments that were collected on different days were randomly placed on Illumina Human Ref8 beadchips and processed following standard Illumina procedures.

Genome build
All coordinates reported are for the UCSC human genome build hg19 (GRCh37).

Computational analysis of gene expression data
Bead-level data outputted from Illumina's BeadScan software were read using the beadarray package [35] from Bioconductor [36] with the R programming language [37]. BASH [38] was used to correct for compact and diffuse spatial artifacts on the arrays. Bead-level data were summarized using beadarray, filtering out beads that were more than three mean absolute deviations from the median. Probe annotations were retrieved using the illumina-Humanv3.db package from Bioconductor; poor quality probes and probes not corresponding to known HUGO gene symbols were removed from the data before further processing. Bead summary data were log2 transformed and quantile normalized. We then used the non-parametric empirical Bayes method described by [39] to correct for chip-to-chip batch effects. The raw data and batchcorrected, normalized data have been deposited in NCBI's Gene Expression Omnibus [40] and are accessible through GEO Series accession number GSE45407 [41].
The limma package [7] from Bioconductor was used to determine differential gene expression/regulation: the lmFit function was used to determine relative expression at each condition (time point, expressed allele); the contrasts.fit function was used twice, first to determine fold-change at 2, 4, and 24 h relative to the '0 h' time point (4 h ethanol treatment), and second to determine the difference in fold-change between each expressed allele and GRα; the eBayes function was used to compute statistics in each case. An adjusted [42] P value cutoff of P <0.05 was used to determine which genes are differentially expressed. We found there was insufficient statistical power to identify many differentially regulated genes at 2 h. This is likely due in part to the limitations of measuring steady state total RNA levels rather than newly generated transcripts, which was not feasible at the time of this study. The ggplot2 package [43] was used to visualize the results. For a more detailed procedure see (Additional file 17).

Chromatin immunoprecipitation (ChIP)
U2OS cells stably expressing one of GRα, A477T, 30iiB, or E773R were grown in 3× 75 cm 2 dishes and treated with 100 nM dexamethasone for 90 min. The following procedure is adapted from [19]. We added formaldehyde to a final concentration of 1% for 3 min at room temperature and then quenched by adding 2.5 M glycine to a final concentration of 125 mM glycine. Cells were incubated at 4°C for 10 min and then washed with PBS for 5 min. We added 5 mL ice-cold IP lysis buffer (50 mM HEPES-KOH pH 7.4, 1 mM EDTA, 150 mM NaCl, 10% v/v glycerol, 0.5% v/v Triton X-100) with 1× protease inhibitor cocktail added (PIC; Roche Complete) to each flask to lyse cells and scraped cells into 50 mL conical tubes. Lysed cells were nutated for 30 min and pelleted by centrifugation at 4°C for 5 min at 600 times gravity. After removing the supernatant, nuclei were resuspended in 900 μL RIPA buffer (10 mM Tris pH 8.0, 1 mM EDTA, 150 mM NaCl, 5% v/v glycerol, 0.1% sodium deoxycholate, 0.1% SDS, 1% v/v Triton X-100) with 1× PIC added and frozen at −80°C.
The remaining ChIP procedure was performed at 4°C. Protein G Dynabeads (Invitrogen) were prepared for IP as follows: beads were pelleted (using a magnet); liquid was removed; beads were washed twice with RIPA, then resuspended in RIPA with 1× PIC and 0.5 mg/mL BSA at the original bead concentration; 12 μg N499 α-GR antibody were added per 100 μL beads, and the beads were nutated for 2 h.
Nuclei were thawed in ice water. Chromatin from nuclei were fragmented by sonication at 4°C in 300 μL aliquots in 1.5 mL microcentrifuge tubes using the Diagenode Biorupter for 8 cycles of 7 min (0.5 min on / 0.5 min off). Samples were spun at 14,000 revolutions per minute (rpm) and for 15 min to pellet insoluble molecules, and the supernatant was transferred into new microcentrifuge tubes. We saved 10% of each sample to isolate later as input DNA (GRα and A477T only). Protein G Dynabeads with antibody bound were pelleted, washed twice with RIPA, and resuspended at the original bead concentration in RIPA with 4× PIC and 2 mg/mL BSA. Sonicated chromatin (900 μL) from each sample was incubated with 100 μL of antibody-bound beads for 16 h.
After pulldown, beads were washed four times with RIPA with 500 mM NaCl, then four times with LiCl buffer (20 mM Tris, 1 mM EDTA, 250 mM LiCl, 0.5% NP-40, 0.5% sodium deoxycholate), to decrease non-specific binding. To reverse crosslinks, beads were then resuspended in 11 μL RIPA with 500 mM NaCl plus 89 μL Rev-Xlink buffer (0.7% SDS in TE pH 8.0 with 0.2 mg/mL Proteinase K). A total of 10 μL of each input DNA sample was similarly treated by adding 1 μL 5 M NaCl and 89 μL Rev-Xlink buffer. Samples were incubated in a PCR block at 55°C for 3 h, then 16°C for 16 h, then column purified using the Clean & Concentrator-5 kit (Zymogen) at a DNA binding buffer:sample ratio of 5:1.

Library preparation for ChIP sequencing (ChIP-seq)
Fragments were end-repaired for 30 min at 20°C in 25 μL reactions with 1× T4 DNA Ligase Buffer w/ 10 mM ATP (NEB), 100 uM dNTPs (Invitrogen), 3.75 U T4 DNA Polymerase (NEB), 1.25 U DNA Polymerase I, Large (Klenow) Fragment (NEB), and 12.5 U T4 Polynucleotidse Kinase. Samples were column purified using the Clean & Concentrator-5 kit (Zymogen) at a DNA binding buffer: sample ratio of 2:1 following this and subsequent enzymatic reactions. dATP was added to the 3' end of molecules by incubating for 30 min at 37°C in 25 μL reactions with Buffer 2 (NEB), 200 uM dATP (Invitrogen), and 7.5 U Klenow Fragment (3'-> 5' exo-). Sample concentrations were measured using PicoGreen (Invitrogen), and then ligated to sequencing adapters at a 2:1 (adapter:sample) molar ratio in 20 μL reactions with 1X T4 DNA ligase buffer w/ 10 mM ATP (NEB) with 6% PEG-8000 and 200 U T4 DNA Ligase (NEB). Libraries were amplified by PCR for 17 cycles using the standard Illumina paired-end primers, and purified by PAGE as described in [44]. Sample concentrations and size distribution were measured using the High Sensitivity DNA kit for the Bioanalyzer 2100 (Agilent) prior to sequencing.

Adapters and barcodes for ChIP-seq
In-house barcodes were used for ChIP samples, which were added to the 3' ends of the sequencing adapters and determined using the first four bases of the read. Barcodes are TCAT (BC1), GACG (BC2), AGTC (BC3), and CTGA (BC4). The ChIPs for GRα and A477T were each ligated to all barcodes, pooled, and sequenced in their own lanes. Samples for 30iiB and E773R were ligated to BC1-2 and BC3-4, respectively, and pooled. Input samples were ligated to indexed TruSeq adapters using indices 5 and 7 for GRα and A477T inputs, respectively.

ChIP-seq procedure
ChIP samples were sequenced at the University of California San Francisco Center for Advanced Technology using the Genome Analyzer II (Illumina) with 2× 75 bp paired end reads as per the manufacturer's instructions. Input DNA samples were sequenced using the HiSeq 2000 (Illumina) with 2× 100 bp paired end reads. Raw sequence reads were deposited in the Sequence Read Archive (SRA) at the National Center for Biotechnology Information (NCBI) [45] and are accessible through SRA study accession number SRP020242.

Computational analysis of ChIP-seq data
We automated the analysis procedure using the Python programming language in order to make our results easily reproducible and to facilitate similar analysis by others. The scripts used -preprocess_reads.py, align2.py, call_peaks.py, and analyze.py -are described here and are available in the software package seriesoftubes on the Python Package Index (PyPI). seriesoftubes depends in part on Biopython [46]. We used preprocess_reads.py (seriesoftubes) to separate the sequencing results by barcode/index, saving only reads with valid barcodes/indices (on both ends in a pair for internal barcodes), and merged barcodes/indices as applicable to consolidate sequence data for each sample. We then used align2.py (seriesoftubes) to align sequences to the genome using bowtie2 [47], allowing a maximum insert size of 600 bp (based on our observation of an average library size of <150 bp) and saved in BAM format using samtools [48]. Note: for internal barcodes, the first five bases contain the barcode plus an additional 'T'; these positions were ignored when running bowtie2.
We used call_peaks.py (seriesoftubes), which relies on the MACS2 a callpeak module, to find significantly occupied peak regions ('peaks') and summit positions using an adjusted P value cutoff of P <0.01 and to generate pileup tracks (signal per million reads). Additionally, the MACS2 diffstats module was used to compute differential occupancy statistics and find (mutant versus GRα) differentially occupied peaks at using an adjusted P value cutoff of P <0.05. For convenience, we have created a track hub formatted for the UCSC genome browser [49] containing paired-end read alignments, signal pileups, peak region locations, and peak summit locations [50].

GBR to gene assignment
Peaks were assigned to genes using GREAT [9], with a basal domain of ±20 kb of a gene's primary transcriptional start site (where assignment is guaranteed) and an extended domain of ±100 kb (where assignment is made only if it does not overlap with another gene's basal domain); if a peak was assigned to multiple genes, it was ignored in subsequent analyses.
Motif analysis MEME-ChIP [16] was used for motif discovery as described in the text. For all motif analyses, we used only peak regions (±75 bp from the peak summit) that do not overlap with repeats, found by using bedtools [51] to intersect peaks regions with the 'rmsk' table from the RepeatMasker [52] track for hg19, extracted using the Table Browser [53]. Peak sequences (±75 bp from the peak summit) were then extracted from the UCSC hg19 build of the human genome using the twobitreader package (available on PyPI). Occurrences of motifs were found using analyze.py (seriesoftubes), which relies on MOODS [54] for efficiency. JASPAR [55] motifs MA0144.1 and MA0099.2 were used to find STAT3 and AP1 sites, respectively. Peak sequences were randomly permuted and the 95th percentile of individual information for the best motif in each permuted peak sequence was used as a cutoff for determining presence of GR full site, STAT3, or AP1 motifs (8.94, 8.06, and 9.49 bits, respectively). Data were integrated for further analysis using the R programming language (Additional file 10) and visualized with ggplot2 [43].

Luciferase transcriptional reporters
Selected GR binding regions were cloned into the firefly luciferase reporter vector pGL4.10-E4TATA d using conventional methods (that is, PCR and restriction enzymes). A table of reporter names, primers, and genomic coordinates is provided in Additional file 18. We used the Renilla luciferase reporter vector pGL4.70-MLE2 e as a normalization control, and the mammalian expression vector p6R as 'filler' DNA to reach the recommended concentration of DNA for transfections.
U2OS cells stably expressing either GRα or A477T were grown in three 10 cm dishes each, split, and seeded into two 96-well plates at a final density of about 5 × 10 4 cells/well. The next day, cells were transfected with 3.33 ng firefly reporter, 150 pg Renilla reporter, and 20 ng p6R per well using PLUS Reagent and Lipofectamine (Invitrogen) according to manufacturer instructions. After 16 h, cells were treated with 100 nM dexamethasone or vehicle only (ethanol) for 3 h, and then luciferase expression was assayed using the Dual-Luciferase Reporter Assay System (Promega) according to manufacturer's instructions.
Endnotes a MACS2 will be described in more detail in a forthcoming publication. Briefly, peak calling uses a Poisson test as in the first version of MACS [56] and differential occupancies are determined using a log-likelihood ratio test, and transformed to P values using Wilks' theorem [57]. Pre-release versions of MACS2 are available online at [58]. b GBRs with different locations among two samples were determined as those GBRs that were detectable in one sample and not the other (by overlap) and that did not significantly differ in occupancy among the two samples. This quantification of GBRs with differing locations is only an estimate, as some GBRs that differ in occupancy may appear to have similar occupancy. This effect results from variation in the observed numbers at a given GBR, primarily due to Poisson counting error. c To determine whether fold-changes in gene expression between two conditions were less than two-fold different at a significance level of 95%, we constructed 97.5% confidence intervals for the fold-change in gene expression under each condition and required that they do not overlap at all. The 97.5% confidence intervals were computed as the fold-change plus or minus 2.447 -the t-statistic for six degrees of freedom (eight total sample measurements minus two degrees for determining their respective means)divided by the square root of two (because subtracting two groups of four yields an equivalent n of 2), and