H3Q85C cleavage mapping precisely locates dyads of single nucleosomes
H4S47C positions are derived by averaging many cleavages. There are also non-specific cleavages in linkers and NDRs due to free phenanthroline, contributing to noise in H4S47C-anchored cleavage mapping (Fig. 1a, tracks 1–5). To obtain direct measurements of dyad positions and reduce the noise due to non-specific cleavages, we tested mutants at positions farther away from the dyad so that two cleavages on either side of the dyad would give rise to a fragment that is long enough to be sequenced. We found that H3Q85C cleaves nucleosomal DNA and releases 51-bp fragments, as expected from the position of this residue close to the DNA minor groove based on the crystal structure of the nucleosome (Additional file 1: Figure S1). Cleavages on either side of the dyad of a nucleosome result in a 51-bp fragment and drastically reduce the noise due to non-specific cleavages. As each 51-bp fragment arises from a single nucleosome, its midpoint corresponds to the precise position of a nucleosome and directly informs about the distribution of nucleosome centers on the genome without the need to average cleavages from multiple nucleosomes. By using 51-bp fragments generated by H3Q85C cleavages for mapping rather than the linker-spanning fragments generated by H4S47C cleavages, we avoid the background due to cleavages within linkers, which is especially important for distinguishing regions that are merely nucleosome-depleted from those that are virtually nucleosome-free (Fig. 1a; compare tracks 6–8 with tracks 1–5). This reduction in noise is reflected in the average nucleosome distribution relative to the transcription start site (TSS) (Fig. 1b) by the reduced nucleosome density at promoters and increased peak heights at the typical nucleosome positions. Wild-type control cells treated similarly to the mutants show a high density of cleavages at TSSs and linker regions, evidence that the elevated level of cleavages over TSSs using the H4S47C mutant is caused by free phenanthroline.
Although, in addition to intranucleosomal fragments (~51 bp long), H3Q85C cleavages also produce longer internucleosomal fragments, we can easily purify the intranucleosomal fragments using polyacrylamide gel electrophoresis (PAGE) (Fig. 1c). With this increased sensitivity in detecting H3-containing nucleosomes, we asked if this data set could clarify nucleosome occupancy at other genomic landmarks. One model for the budding yeast centromere posited a Cse4-H3 heterotypic nucleosome [33]. With the new H3 localization data, we are able to directly interrogate the presence of histone H3 at yeast centromeres and directly distinguish between these alternative models. When we compared the heat map of fragment ends mapping around the ~ 80-bp centromere DNA element (CDEII) sequence of all yeast chromosomes from our previous H4S47C data [26] and our present H3Q85C data, we observed striking differences (Fig. 1d). The “railroad track” in the H4 data denotes the precise positioning of H4S47C at the centromere. In contrast, we observe no cleavages at the centromere in the H3 data, but uniform coverage of reads adjacent to the centromere, as seen also by averaging the fragments relative to CDEII (Fig. 1e). Thus, H3Q85C mapping rules out the presence of H3 at yeast centromeres. Similarly, at the transfer RNA (tRNA) genes, we see a striking nucleosome depletion in H3Q85C maps (Fig. 1f), whereas H4S47C maps show a high level of cleavages that were likely generated by free phenanthroline (Fig. 1f). Finally, the higher signal-to-noise ratio of H3Q85C enables us to discern alternate rotational positions for nucleosomes defined by H4S47C mapping. We plotted the distribution of nucleosome dyads deduced either from the centers of ~ 51-bp fragments produced by H3Q85C cleavages, or from the individual cleavages produced by H4S47C, relative to the +1 nucleosome positions defined using published H4S47C nucleosome calls (Fig. 1g). Not only did we observe a higher signal at the called dyad itself, but we also observed at least three alternate rotational positions upstream and downstream of the +1 dyad that are invisible to H4S47C mapping. In summary, relying on two intranucleosomal cleavages in H3Q85C mapping enables the generation of a superior nucleosome map in budding yeast.
H3Q85C cleavage mapping identifies rotational positioning preferences of all nucleosomes in the genome
Within the DNA sequences that are wrapped around the histone octamer, there is a compositional preference for WW dinucleotides (AA, TT, AT, or TA) to occur where the minor grooves of DNA directly interact with histones, while SS dinucleotides (CC, GG, CG, or GC) occur where the major grooves of DNA are facing the histones [15]. These alternating DNA sequences have a higher affinity for nucleosome formation [34], and they have been observed among the nucleosomal sequences in different organisms [35,36,37]. Evolution of this weak compositional preference has a structural rationale, in which more bendable sequences tend to be in contact with the histones every 10 bp and less bendable sequences tend to be solvent-exposed, to facilitate the tight wrapping around the nucleosome core [15]. Originally described using only 177 MNase-generated chicken nucleosomes [36], this enrichment has been confirmed for the yeast genome using MNase-seq [38] and is also seen using H4S47C-anchored cleavage [23]. These analyses are based on called nucleosome positions, which represent a small subset of nucleosomes that can be identified based on their high signal-to-noise ratio using these methods. However, with H3Q85C cleavage mapping, each ~ 51-bp fragment identifies one unique nucleosome without requiring any additional filtering or statistical inference. Thus, we calculated the dinucleotide preference based on every ~ 51- bp fragment. We observed that nucleosome centers determined by H3Q85C-anchored cleavage mapping also show a cumulative rotational phasing of WW and SS dinucleotides (Fig. 2a). The power of this analysis is apparent when we compare the WW/SS distributions generated from every read of H3Q85C mapping to WW/SS distributions generated from every read of H4S47C mapping (Fig. 2b). The WW/SS distributions appear stronger and phased for H4S47C mapping only when limited to the strong “called” nucleosome positions (Fig. 2c), representing a subset of the nucleosomes compared to H3Q85C mapping, which in principle represents all nucleosomes genome-wide.
Precise identification of NDRs and flanking nucleosomes
Most yeast genes have a stereotypical nucleosome organization in their promoters, characterized by an NDR just upstream of the TSS, which is flanked by regular arrays of nucleosomes [39,40,41,42,43]. Various factors are partly responsible for this organization: (1) non-histone proteins bind to promoters and compete with nucleosome formation [44,45,46]; (2) some DNA sequence motifs present in promoters are unfavorable for nucleosome formation [47, 48]; (3) chromatin organization is disrupted by transcription [49,50,51] and replication [52,53,54,55]; and (4) nucleosome positioning is maintained by chromatin remodelers [56,57,58,59,60].
The lack of background cleavages at promoters using H3Q85C mapping (Fig. 1a, tracks 6–8) facilitates identification of the exact locations of every NDR and the corresponding flanking nucleosomes (+1 and –1 nucleosomes). Using a previously described procedure [60], we identified all NDRs and the precise locations of +1 and –1 nucleosomes in S. cerevisiae (Additional file 2: Table S1). We then aligned all NDRs and sorted them according to their widths (Fig. 3a, b). Figure 3b demonstrates the high resolution of our nucleosome positioning data, in which we can distinguish even alternative rotational positions that are occupied by nucleosomes in different cells. These alternative positions from different cells differ by the magnitude of the DNA helical twist (~10 bp), so they can only be distinguished if the accuracy of the nucleosome positioning data has a resolution < 5 bp (the separation between a maximum and a minimum in the dyad distribution), which is not attainable using MNase-seq.
Analyzing other published data sets by aligning the promoters in the same way, we observed that most of the NDRs are flanked by H2A.Z-containing nucleosomes (Fig. 3c) (data from [61]) and are bound by a multitude of proteins, e.g., TATA-binding protein (TBP, data from [62]), Reb1 (data from [63]), and Sth1, Snf2, Ioc3 (data from [64]) (Fig. 3d-h). As expected, the NDRs are also DNase I hypersensitive (data from [65]) and accessible to transposase Tn5 (data from [66]) (Fig. 3i, j), indicating that our identification of NDRs captures known properties of yeast promoters.
The precise positions of +1 and –1 nucleosomes can be used to get insights about important processes that involve nucleosomes, e.g., to study the nucleosome shifts, nucleosome eviction, and spacing changes produced by chromatin remodelers with high resolution at the genome-wide scale, or to study the precise locations of pausing when polymerases transcribe through nucleosomes. As a proof of concept, we re-analyzed the nucleosome shifts produced through depletion of RSC from yeast cells. Previously, it was shown that in rsc8-depleted cells, nucleosomes near the gene promoters invade the NDR, reducing its width [58], although the precise mechanism of nucleosome displacement remains unknown. We wondered whether the +1 and –1 nucleosomes shift toward the NDRs by a random number of base pairs, or whether they change their locations among the alternative rotational positions observed in Fig. 3b by shifting by an integral number of helical twists along the DNA, thus maintaining the same rotational setting. Using the reference positions that we identified for +1 and –1 nucleosomes in wild-type cells and the MNase-seq data for rsc8-depleted and wild-type cells [58], we made an unexpected discovery. The +1 and –1 nucleosomes preferentially occupy the same rotational positions, both in wild-type and in rsc8-depleted cells (Fig. 3k), and it is only the relative occupancy of each of these alternative positions that changes in the mutant cells. This suggests that RSC maintains the proper NDR width by shifting nucleosomes in steps equal to the helical twist, maintaining at the same time the preferred rotational orientation of the DNA relative to the histone core. Aligning all DNA sequences of +1 and –1 nucleosomes, we observed that the average G/C content of these two loci is also oscillatory (Fig. 3k, bottom heat maps). This suggests that at the single-nucleosome level, the histone core is able to sample an ensemble of favorable rotational positions, determined by the underlying DNA sequence, and chromatin remodelers affect the stochastic process of selecting the individual positions of nucleosomes in each cell.
To better understand the nucleosome phasing patterns near gene promoters, we realigned all promoters at the +1 nucleosomes, and sorted the genes according to the distance between the +1 nucleosome and the NDR of the nearest upstream gene (Fig. 4a). We also separated the divergent genes (top part of Fig. 4a) from the tandem genes (bottom part of Fig. 4a). To maximize the signal, we combined dyads from our previously published H4S47C cleavage data (~ 300 million mapped nucleosomes, [26]) with the current H3Q85C cleavage data (~ 10 million mapped nucleosomes). The nucleosomes on the gene bodies are well phased relative to the NDR, as indicated by the regular pattern of vertical red stripes in Fig. 4a. The nucleosomes form regular arrays on both sides of the NDRs. Between neighboring NDRs, the nucleosome distribution produces an interference pattern, in which overlapping signals originate from each NDR [58]. When the space between the neighboring NDRs is a multiple of the typical nucleosome repeat length (NRL) (~ 165 bp in yeast [60]), we observe constructive interference of the two phasing signals, and well-positioned nucleosomes form a long regular array from one NDR to another. From Fig. 4a, we observe that nucleosome organization near gene promoters depends on the location and orientation of the upstream genes, and so averaging all configurations into an aggregate plot as is commonly done can be misleading, especially for the region upstream of the TSS.
We performed a similar analysis of the 3’ ends of the genes. It is known that the transcription termination site (TTS) is also a region of reduced nucleosome density, so the TTS could be another nucleosome phasing element, similar to the TSS. To test this hypothesis, we realigned all yeast genes at their TTSs and sorted according to the distance between the TTS and the NDR of the downstream gene (Fig. 4b). To distinguish the role of TTS per se, we separated the convergent genes (top part of Fig. 4b) from the tandem genes (bottom part of Fig. 4b), which typically have their TTS very close to the TSS of the neighboring downstream gene. We observed that the TTS of convergent genes is not characterized by strong nucleosome depletion. For tandem genes, strong nucleosome depletion is observed only when the TSS of the downstream gene is very close to the TTS of the reference gene. Moreover, the TTS is not a phasing element, and the only nucleosome phasing observed near the TTS is due to the presence of other TSSs in their vicinity. In the case of convergent genes, which have their TTS far from interfering TSSs of downstream genes, we do not observe vertical red stripes, which would indicate regular arrays of nucleosomes, phased relative to the TTS.
The presence of regular nucleosome arrays flanking the NDRs raises some interesting questions. Is this nucleosome organization encoded in the DNA sequence? Is the DNA sequence the most important factor that dictates the nucleosome positions in vivo? In order to answer these questions, we re-analyzed the nucleosome organization obtained by in vitro reconstitution [13]. Aligning the genes as in Fig. 4a, b, we see that reconstituted nucleosomes do not form the phased regular arrays flanking the NDRs (Fig. 4c, d), as observed in vivo (Fig. 4a, b). Instead, the nucleosome distribution is much more disorganized in vitro. Weak NDRs were observed in the reconstituted chromatin at both gene ends, with a stronger depletion present at the TTS. Surprisingly, strong NDR positions are observed for aligned TTSs in vitro, even though no strong NDRs are present in vivo. The locations of these strong in vitro NDRs coincide with the A/T-rich regions (Fig. 4e, f — genes aligned as in panels a, c and b, d, respectively). Although it is possible that there is reduced nucleosome formation at A/T-rich regions, the well-known preference of MNase for A/T-rich regions is sufficient to account for the NDRs at TTSs (Additional file 1: Figure S2) [20, 67]. In vitro, nucleosomes would form with similar affinities on every DNA sequence, but during the mapping process involving chromatin digestion by MNase, the nucleosomes from A/T-rich regions are digested faster, and are underrepresented in the sequenced library after extensive digestion (Additional file 1: Figure S2). In vivo, MNase sensitivity of nucleosomes wrapped by A/T-rich DNA sequences was previously reported both in S. cerevisiae [68] and in Drosophila [69].
H3Q85C cleavage mapping demonstrates linker “quantization”
The precise measurements of dyad positions and nucleosome occupancy obtained using H3Q85C cleavage mapping also provide a rigorous method for investigating linker lengths and nucleosome crowding in vivo [70]. We addressed these issues using our full cleavage data sets, which were derived from DNA samples that had not undergone gel purification and so included dyad-to-dyad fragments spanning all linker lengths. We aligned the positions of all +1 nucleosomes and computed the relative occupancy generated by the DNA fragments of each different size (Fig. 5a). The short fragments (~ 50 bp) were concentrated around the locations of nucleosome dyads (Fig. 5a), and they originate from two cleavages inside the same nucleosome (red segments in Fig. 5b). The medium fragments (~ 70–130 bp) were located between the consecutive nucleosome dyad positions (Fig. 5a), which indicates that they were the fragments resulting from two cuts inside neighboring nucleosomes produced at consecutive cleavage sites (black segments in Fig. 5b). The long fragments (~ 135–175 bp) overlap both with short and medium fragments (Fig. 5a), because they derive from cuts made by different nucleosomes at alternate cleavage sites (yellow segments in Fig. 5b).
During the cleavage reaction and library preparation steps, the ends of the DNA fragments resulting from a cut are slightly shortened, and a small gap arises at the location of the cleavage site (Fig. 5b). In order to estimate the sizes of these gaps, we computed the cross-correlations between the ends of different groups of DNA fragments. To obtain the left gap between the short fragments and the medium or long fragments, we computed the cross-correlations between the right ends of the short fragments (44–58 bp) and the left ends of the medium fragments (130–170 bp) or the left ends of the long fragments (135–175 bp), respectively. The overall maxima of the cross-correlations correspond to a lag of 5 bp, which is equivalent to a gap of 4 bp between the fragments (Additional file 1: Figure S3). The same value was obtained for the second gap from Fig. 5, by calculating the cross-correlations between the left ends of the short fragments and the left ends of the medium or long fragments. This gap is expected between two cleavages by a single cysteine-phenanthroline on opposite strands 5 bp apart according to our published structural model [26].
Knowing the lengths of the sequenced fragments and the gaps between them, we next estimated the typical NRLs and linker lengths in S. cerevisiae. From the histogram of the DNA fragment lengths (Fig. 5b) we inferred that the most abundant lengths of the long fragments are 147 bp, 157 bp, and 167 bp. From Fig. 5b we see that the NRL can be computed as the sum of the gap size and the length of a long fragment, so we obtain for the most frequent NRLs the following values: 151 bp, 161 bp, and 171 bp. As the NRL is the sum of the length of the nucleosomal DNA fragment and the linker length, and nucleosomal DNA is ~ 146–147 bp long [1, 2], we conclude that the most frequent linker lengths in budding yeast are 5, 15, 25 or 4, 14, 24, depending on the exact value for the length of the nucleosomal DNA. This proves that the linker lengths are “quantized,” in the sense that the most favored values for the linker lengths are discretized and separated by 10 bp.
Linker quantization at the gene level
The discretization of the linker lengths according to the rule L = 10 n + 5, at the global level, poses the following questions: Do some genes have linkers of 5 bp, others 15 bp, and others 25 bp? Or do individual genes also have a wide range of linker lengths, similar to the genome-wide linker length distribution? Are the linker lengths of an individual gene also discretized? Fortunately, our H3Q85C cleavage data permit the investigation of all these questions, as we can precisely map each internucleosomal DNA fragment to the correct location on the genome.
Yeast genes are often very small and are typically wrapped by only a few nucleosomes. In such cases, the number of reads mapped to the gene body is too low for reliably estimating linker sizes. Nevertheless, there are many yeast genes for which we obtained thousands of sequencing reads that allowed us to accurately estimate the linker length distribution for these genes.
We also analyzed the lengths of the long fragments (135–175 bp) that map to the bodies of single genes with > 5000 sequenced fragments (Fig. 6). We observed that individual genes are also characterized by a wide range of fragment lengths, similar to the genome-wide distribution (red curve in Fig. 6). This shows that in different cells and at different times during the cell cycle, individual genes can have somewhat different nucleosome organizations and NRLs, suggesting that chromatin organization is reacting to the stochastic processes that take place inside the nuclei.
Correlations among nucleosome spacing, transcription rate, and NDR width
One of the factors shown to have an effect on the nucleosome spacing is the level of transcription of a gene [71, 72]. In general, highly transcribed genes are characterized by a decreased nucleosome spacing [72], and the inactivation of RNA polymerase II via a temperature-sensitive mutation was shown to increase the nucleosome spacing [71]. As previously noted, the nucleosome phasing pattern can be explained to a first approximation by statistical positioning against promoter barriers [46]. The number of barrier complexes that are bound to a promoter is presumably correlated with the width of the accessible region, the NDR width. To test for possible relationships between NDR widths, transcription rates, and nucleosome spacing, we computed for each gene the average size of the long fragments (135–175 bp) from H3Q85C cleavage data, and we re-sorted the yeast genes according to this score (Fig. 7a; the genes with the longest average length of the H3Q85C cleavage fragments are at the bottom). This sorting score is a direct measure of the average nucleosome spacing, and it is closely related to the average linker length for each gene. As expected, the heat map confirmed that the arrays of nucleosomes at the top are more crowded, while the nucleosome arrays at the bottom are less crowded (toward the bottom of the heat map, the red stripes indicating nucleosome dyads bend to the right). This increase in nucleosome spacing was also observed when we grouped the genes from the heat map into five groups: quintile 1 containing the top fifth of the rows, quintile 5 containing the bottom fifth of the rows, etc. (Fig. 7b). The first quintile, with the most crowded nucleosomes, also had the widest NDR (Fig. 7b), which is characteristic of highly transcribed genes [51, 71]. To further test for a correlation between the chromatin organization and the transcription levels, we analyzed the levels of nascent transcripts obtained from native elongating transcript sequencing (NET-seq) data [73]. Indeed, transcription levels correlate with chromatin organization; the more active genes are the ones at the top of the heat map, corresponding to the genes with more crowded nucleosome arrays and wider NDRs (Fig. 7a).
Chromatin organization and nucleosome spacing also correlate with the amounts of linker histone H1 (Hho1 in yeast) present on the gene bodies and the level of incorporation of the H2A.Z variant into the +1 nucleosomes. Linker histone H1 is preferentially deposited on the genes with increased nucleosome spacing (Fig. 7c), in agreement with observations from multicellular organisms, where histone H1 and its variants have an important role in nucleosome spacing [74, 75]. Moreover, genes with increased nucleosome spacing also contain the +1 nucleosomes with the highest incorporation of H2A.Z. This agrees with the previous observation that, in yeast, H2A.Z is localized to the promoters of less active genes [76, 77], suggesting that during transcription, histone variant H2A.Z is preferentially evicted compared to the canonical histone H2A. As the genes with the more crowded nucleosome arrays from the top of the heat map are more transcribed, their gene bodies are marked by characteristic histone modifications, such as H3K4me3 (Fig. 7c), and their promoters contain higher amounts of TBP, which is probably an underlying cause of the extended NDRs of these genes. For each quintile shown in Fig. 7b, we computed the following quantities (Fig. 7d): average nucleosome spacing and average NDR width, estimated from Fig. 7b; average NET-seq signal on the gene bodies (data from [73]); average Rpb1 density on the gene bodies (data from [78]); average Rpb3 density on the gene bodies (data from [79]); average H1 cross-link density in the window (D-73, D + 750) bp, where D is the dyad position for +1 nucleosome (data from [80]); average H2A.Z occupancy for the +1 nucleosomes (data from [61]); average H3K4me3 occupancy computed in the same range as for the H1 density (data from [81]); average TBP occupancy in the window (D-500, D-73) bp, where D is the dyad position for +1 nucleosome (data from [62]). The strong positive and negative correlations suggested by the heat maps in Fig. 7 are confirmed for all possible comparisons by Pearson correlation coefficients and scatter plots (Additional file 1: Figure S4). In particular, the five quintiles show near-perfect correlations between the average NDR width, transcription level (estimated by the average NET-seq, Rpb1, and Rpb3 signals on the gene bodies, r = 0.980–0.988), average TBP occupancy at promoters (r = 0.988), and average density of H3K4me3 marks (r = 0.989). All these quantities are almost perfectly anti-correlated with the average nucleosome spacing of the corresponding genes, average enrichment of histone variant H2A.Z in the +1 nucleosomes, and with the average density of linker histone H1 (r = –0.945 to 0.998).
A biophysical model for nucleosome positioning
In general, promoters are depleted of nucleosomes and are occupied by other non-histone protein complexes (Fig. 3). This competition among diverse proteins for binding to the same DNA can be modeled using statistical mechanics methods [46, 69, 70]. Briefly, nucleosomes are modeled as a system of non-overlapping one-dimensional (1D) particles of length a = 147 bp. The particles are confined to 1D lattices of different lengths, representing the 16 yeast chromosomes. The energy of the system is given by the sum of one-particle energies (the binding energies corresponding to each nucleosome, u(i), where i represents the coordinate of the corresponding nucleosome dyad) and the two-body interactions, Φ(i, j), where i and j represent the positions of the neighboring nucleosome dyads. We assume a hard-core interaction between the nucleosomes (infinite if nucleosomes overlap, and zero if they do not overlap) which guarantees that only the valid non-overlapping nucleosomal configurations have finite energies and non-vanishing probabilities. This is also known as a Tonks lattice gas model [82]. The effect of non-histone barrier complexes that bind to promoters and create the NDRs is modeled by an external energy barrier that increases the binding energy of nucleosomes located in the promoters, preventing nucleosome formation at these loci. Following the formalism described in [69] and illustrated in Additional file 1: Figure S5, we enumerate all valid nucleosome configurations and their corresponding energies and probabilities in a recursive way. Once we compute the normalization factor of the Boltzmann weights corresponding to each state (the grand canonical partition function, Z), we predict the distribution of nucleosome dyads and the nucleosome occupancy by summing the probabilities of all states that contain a nucleosome at a given position, or a nucleosome covering a specified base pair, respectively. Using the nucleosome distribution on chromosome I for training the model, we calculated the model parameters — the chemical potential, μ, related to the overall nucleosome density, the average nucleosome binding energy, u, and the energy barrier parameters, H, σ, x0 (Fig. 8a) — such that the predicted nucleosome organization on chromosome I fits the data. The fitted parameters were then used to predict the genome-wide nucleosome organization.
Figure 8b shows the predicted nucleosome distribution at the gene ends, which are aligned as in Fig. 4. In this model the effect of the entire environment upon nucleosome formation is reduced to the presence of simple energy barriers at promoters, and the sequence-dependent contribution to nucleosome affinity is completely neglected. It is striking that the model produces a close approximation of the nucleosome phasing pattern. Although the energy barriers are placed only at the 5’ ends of the genes, they are enough to also generate the correct organization at the 3’ ends of the genes (Fig. 8c), which supports the hypothesis that TTSs are not nucleosome phasing elements. Therefore, the nucleosome organization at the 3’ ends of genes is also dictated by phasing elements at the 5’ ends of genes.
In vivo, there are also other processes that cannot be easily modeled within the framework of statistical positioning models. Transcription is a major nucleosome disrupting process [49,50,51, 83], and chromatin remodelers were shown to have a major impact on nucleosome organization [56, 58,59,60]. In order to model these processes, non-equilibrium statistical mechanics models are necessary, but single-molecule experiments that could be used to train such models are not yet available.