Overview of the SPIN method
The overview of the SPIN method is illustrated in Fig. 1a. Our goal is to identify genome-wide spatial compartmentalization patterns of the chromosomes by integrating TSA-seq [14] and DamID [13, 18] data together with Hi-C. TSA-seq and DamID provide complementary information to measure distance and contact frequency between chromosome regions and subnuclear structures. The rationale of including Hi-C is that the pairwise genomic regions spatially interacting with each other more often than expected (from Hi-C) are more likely to share similar spatial compartmentalization patterns. We formulate this objective by using the hidden Markov random field (HMRF) [19, 20], in which nodes represent non-overlapping genomic bins (with a size of 25kb) and edges represent either significant Hi-C interactions or adjacent genomic bins (see the “Methods” section). We assume that each node is associated with an unobserved spatial localization state that SPIN aims to reveal (“SPIN state”). The observations on each node include signals from TSA-seq and DamID for defined nuclear compartments. Given the observed data on each genomic bin across the entire genome, the goal of SPIN is to solve the estimation problem by maximizing the likelihood of assigning spatial compartmentalization states. Thus, the output of SPIN contains spatial localization state assignment, which is originally hidden, for each genomic bin throughout the genome.
SPIN is different from previous methods for chromatin domain segmentation based on a hidden Markov model (HMM) [18, 21, 22] where chromatin interaction is not utilized. Although SPIN shares similarity in its goal with Segway-GBR [23], the regularization in Segway-GBR uses significant Hi-C interactions as prior such that pairs of interacting genomic loci are encouraged to have the same label in genome annotation, which is not necessarily appropriate for refined spatial localization states of the chromosomes (see Additional file 1: Supplementary Results for more detailed comparisons). The transition probabilities between different states learned in SPIN generalize such constraints so that chromatin regions in spatial proximity would be assigned with states that correspond to similar but not necessarily the same localization. In addition to showing advantage of SPIN over other approaches using both simulation and real data (see Additional file 1: Supplementary Results), we validate our method by demonstrating that the SPIN states stratify functional genomic data and provide spatial interpretation for other 3D genome features, advancing our understanding of nuclear compartmentalization patterns (see later sections).
Note that in principle the input of SPIN on each genomic bin can also include functional genomic signals such as histone modifications, replication timing, and transcription levels. However, in this work, we explicitly limit the input signals to those that directly measure the spatial position of chromatin (TSA-seq and DamID) and use functional genomic data to evaluate the functional correlations of different SPIN states genome-wide.
SPIN identifies genome-wide patterns of nuclear compartmentalization
In this implementation of SPIN to infer genome-wide nuclear compartmentalization patterns, we used TSA-seq and DamID mapping data in K562 for nuclear speckles (SON TSA-seq) and the nuclear lamina (Lamin-B1 DamID and TSA-seq) [14, 24]. In addition, we generated new DamID maps using a Dam methylase fusion with a nucleolar targeting peptide repeat (4xAP3 [25]). We interpret these AP3-DamID data as putative nucleolus interactions; this will be analyzed in more detail elsewhere (van Schaik et al. manuscript in prep.). Hi-C data for K562 are from [12]. Details of data processing for TSA-seq, DamID, and Hi-C are in Additional file 1: Supplementary Methods. We partition the genome (chromosome 1–22 and X) into consecutive non-overlapping 25kb bins, which constitute the graph structure for the HMRF model in SPIN. Edges are derived from Hi-C and the adjacent genomic bins (including those caused by large-scale structural variants in K562; Methods and Additional file 1: Supplementary Methods). Figure 1b shows an example of the input signals from different measurements and the SPIN state annotations.
We identified 10 SPIN states that represent major nuclear compartmentalization patterns in K562. These 10 SPIN states are as follows: Speckle, Interior Active 1, 2, 3 (Interior_Act1, Interior_Act2, Interior_Act3), Interior Repressive 1, 2 (Interior_Repr1, Interior_Repr2), Near Lamina 1, 2 (Near_Lm1, Near_Lm2), Lamina_Like, and Lamina. The genome-wide percentage of each state is shown in Fig. 1c. The names of these states are partially informed by comparison to various functional genomic data, especially for the Interior states (see later), even though the input for SPIN does not use any functional genomic data. Note that we assessed the reliability of the identified SPIN states by using different TSA-seq, DamID, and Hi-C replicates as input and found that the states are highly consistent (Additional file 2: Figure S1a). In addition, we assessed the robustness of SPIN states based on its sensitivity to random initialization and found that SPIN can achieve consistent genome partitioning with randomly initialized states (Additional file 2: Figure S1b).
In Fig. 1c and Additional file 2: Figure S2, we show that each SPIN state has distinct distributions of TSA-seq and DamID signals for the input data, reflecting the spatial position for compartmentalization. For example, the Speckle state has the highest SON TSA-seq signals and the lowest lamina/nucleolus signals as compared to other states. Notably, although we group multiple states into larger classes such as Interior Active, Interior Repressive, and Near Lamina, the refined states do show their distinct patterns. For example, the Interior_Repr2 state has similar Lamin-B1 DamID and TSA-seq signals as compared to Interior_Repr1, but its nucleolus DamID score is significantly higher while its SON TSA-seq score is significantly lower (p value < 2.2E −16). A recent report identified nucleolus associated domains (NADs) in mouse embryonic fibroblasts and found that there are two types of NADs [26]: type I NADs localize more frequently with both nucleoli and nuclear lamina and type II NADs localize with nucleoli but do not overlap with lamina. We also observed such distinctions related to nucleoli from our SPIN states for spatial localization. The Interior_Repr2 state has similar enrichment of nucleolous DamID scores as compared to the Near_Lm1-2, Lamina_Like, and Lamina states, but the Interior_Repr2 state has significantly lower enrichment with Lamin-B1 DamID and TSA-seq (Fig. 1c and Additional file 2: Figure S2) (p value < 2.2E-16).
The identified SPIN states provide a comprehensive view of the spatial localization of the chromosomes in the nucleus relative to multiple subnuclear compartments, (see the cartoon in Fig. 1d). We compared the SPIN states to DNA FISH (fluorescence in situ hybridization) imaging data. In Fig. 1e, we show three genomic regions (with detailed genomic coordinates) that correspond to different SPIN states with comparisons to DNA FISH data from [14]. The probe in the FISH image of the Speckle state has an average distance of 0.16 μm from nuclear speckle (SON protein). The probe in the FISH image of the Lamina state has an average distance of 0.98 μm from nuclear speckles and is located <0.5μm from the nuclear periphery. The probe in the FISH image of the Interior_Act1 state has an average distance of 0.47 μm from nuclear speckles. The comparison further suggests the reliability and advantage of having genome-wide SPIN states relative to multiple nuclear compartments.
SPIN states provide spatial interpretation for Hi-C subcompartments
The five primary Hi-C subcompartments (A1, A2, B1, B2, and B3) defined from [12], which exhibit strong associations with various genomic and epigenomic features, provide more detailed compartmentalization patterns from Hi-C data than the binary A/B compartment separation. However, the spatial localization context of Hi-C subcompartments has not been clearly revealed except that [14] used the Hi-C subcompartments to identify the two transcriptional hot-zones based on TSA-seq scores by comparing to A1/A2 subcompartments, suggesting that the A1 subcompartment was significantly closer than the A2 subcompartment to nuclear speckles (Hi-C subcompartments defined in GM12878 which has extremely high coverage Hi-C data). The recently developed algorithm SNIPER [15] facilitates the identification of subcompartments in Hi-C data with low to moderate coverage and provides Hi-C subcompartment annotations specifically in K562. Here, we directly compare the 10 SPIN states with Hi-C subcompartments in K562.
Figure 2a shows the overall comparison of Hi-C subcompartments in different SPIN states. Specifically, we found that the Speckle and Interior_Act1 states are strongly associated with A1 subcompartment (fold change enrichment 8.5 and 4.7, p value < 2.2E −16; Additional file 1: Supplementary Methods). Interior_Act2 is strongly associated with A1, A2, and B1 subcompartments (fold change enrichment 3.8, 3.2, and 3.6, respectively, p value < 2.2E −16). Interior_Act3 is enriched with A2 subcompartment (fold change enrichment 3.1, p value < 2.2E −16). The Interior_Repr1 and Interior_Repr2 states overlap more with B1 subcompartment (fold change enrichment 2.8, and 5.3, p value < 2.2E −16). We found that the Lamina_Like state is strongly enriched with B2 subcompartment (fold change enrichment 4.95, p value < 2.2E −16), while Lamina state is associated with both B2 and B3 subcompartments (fold change enrichment 3.16 and 5.58, p value < 2.2E −16). Together, different SPIN states have a strong correlation with different Hi-C subcompartments, supporting that Hi-C subcompartments reflect spatial positions relative to nuclear structures. However, the SPIN states offer a much more direct and refined interpretation of Hi-C subcompartments in terms of spatial compartmentalization. For example, although the Speckle, Interior_Act1, and Interior_Act2 states are all enriched with A1 subcompartment, they show distinguishable distributions regarding SON TSA-seq signals (Fig. 1c). This suggests that SPIN is able to further subdivide Hi-C subcompartment annotations into additional distinguishable spatial states of nuclear compartmentalization.
SPIN states stratify patterns of transcription activity and histone modification
Earlier studies have shown the correlation between the genome compartmentalization patterns and transcriptional activities [10, 12, 15]. We sought to assess whether the SPIN states, which offer more detailed compartmentalization patterns, further stratify the transcriptional activity based on spatial locations of the chromatin. We first compared the SPIN states with 11 histone modification ChIP-seq datasets in K562 from the ENCODE project, including H2A.Z, H3K27ac, H3K4me1, H3K4me2, H3K4me3, H3K27me3, H3K36me3, H3K79me2, H3K9ac, H3K9me1, and H3K9me3. Overall, we found that the SPIN states have a strong correlation with histone modifications (Fig. 2a). In addition, we also show that by adding Hi-C data into the SPIN model we can achieve state calling to better stratify histone modifications as compared to the baseline HMM-based model (Additional file 2: Figure S3). From the SPIN states, as chromatin localization changes from nuclear periphery to the interior (i.e., the lamina to speckle axis), we observed a dramatic increase of ChIP-seq signal p value of active histone marks (e.g., H3K27ac, H3K4me1, H3K4me3, H3K9ac) and, in general, a decrease in repressive marks (e.g., H3K9me3) (Additional file 2: Figure S4). Specifically, histone marks that are associated with transcriptional activation, including H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K27ac, and H3K79me2, show a dramatic increase of signal in the Speckle state (> 5-fold increase on average, p value < 2.2E −16) (Fig. 2a). This result is consistent with previous studies that transcriptionally active chromatin regions are spatially localized preferentially near nuclear speckles and toward the interior [8, 14]. In contrast, heterochromatin mark H3K9me3 shows stronger presence in the Lamina state (p value < 2.2E −16), agreeing with earlier reports that LADs are often heterochromatic with inactive genes [8, 18, 21]. In addition, the H3K27me3 mark, known to be associated with repressed transcription [27], is more abundant in Interior_Repr2 (p value < 2.2E −16) compared with the Near_Lm1 and Interior_Repr1 states (Fig. 2a). Importantly, we found that there is an increase of nucleolus DamID signals in Interior_Repr2 compared with the Interior_Repr1 and also the Interior_Act states (Fig. 1c), suggesting a possible localization preference of Interior_Repr2 toward the nucleolus even though Interior_Repr1 and Interior_Repr2 have overall similar distance to nuclear lamina (Fig. 1c). This is in concordance with the recent report that H3K27me3 marks are enriched on type II NADs [26], which are found associated with nucleoli but not with LADs. Our results suggest that SPIN states reflect different associations with histone marks, and chromatin enriched for H3K27me3 and H3K9me3 have distinct spatial localization preferences.
To further demonstrate that the SPIN states clearly stratify functional genomic signals, we analyzed the patterns of histone modification signals across the transition boundaries between neighboring SPIN states. We selected the top six boundary types that are most frequently observed (Fig. 2b). Since SPIN states do not distinguish DNA strands, we therefore merged transition patterns in both directions on the genome. For example, Lamina to Near_Lm2 transition and Near_Lm2 to Lamina transition are considered as the same type of transition boundary for this analysis. For each transition type, we calculated the average histone modification signals at +/- 200 kb surrounding the transition boundaries. We found that many histone modifications show a clear, dramatic change across the SPIN state transition boundaries (Fig. 2c, Additional file 2: Figure S5). In particular, the active histone marks, such as H3K4me1 and H3K27ac, show a pronounced, > 2-fold signal increase, when the chromatin trajectory is going from Interior_Act1 to Speckle. H3K9me1 signals exhibit a gradual rather than a sharp increase across the transition boundaries (Additional file 2: Figure S5). Additionally, we observed the opposite trend of signal enrichment across the boundaries for the repressive marks such as H3K27me3 and H3K9me3 (Fig. 2c). We further compared histone mark changes at the transition boundaries of SPIN states when the boundaries were defined by Hi-C subcompartments in K562 [15]. This reveals a sharper transition of histone marks at SPIN state boundaries as compared to Hi-C subcompartments (Additional file 2: Figure S5; especially H3K9me1, H3K9me3, H3K4me1, H3K4me2, H3K4me3), further suggesting that SPIN states offer a more accurate and refined definition of nuclear compartmentalization as compared to Hi-C subcompartments.
Next, we explored how transcription activity varies in different SPIN states. We compared SPIN states with GRO-seq data [28] that measures through run-on transcription the density of engaged RNA Pol2 polymerases across protein coding genes in K562 (Fig. 2d). We found that genes in the Speckle and Interior_Act states have high transcription levels, as expected, with average FPKM > 40 for these nascent transcripts. The majority (over 90%) of the top 10% actively transcribed genes are from the Speckle or Interior_Act states. In contrast, genes in the Lamina and Interior Repr states are highly repressed. In addition, as shown in Fig. 2e for the consecutive SPIN states, nascent transcription in Interior_Act vs. Interior_Repr exhibit significant difference (p value < 2.2E −16), despite the fact that both states are likely to localize at relatively similar radial positions in the nucleus (based on TSA-seq). Also, genes in the Near_Lm and Lamina_Like states have higher transcription compared with the Lamina state (p value = 5.689E −11). These analyses suggest that the SPIN states demarcate spatial patterns of chromosome regions in fine-scale separated into transcriptionally active and repressed regions.
SPIN states are predictive of DNA replication timing
DNA replication timing (RT) is a vital genome function that is highly aligned with large-scale compartmentalization [29]. To further evaluate the functional connections of the SPIN states, we generated 7-fraction Repli-seq data for K562, where each fraction corresponds to the DNA replicated during 6 stages of S-phase (i.e., S1 to S6) as well as the stage entering G2-phase representing the latest DNA to replicate (Additional file 1: Supplementary Methods). For each genomic bin (5kb), we calculated the percentage of DNA replicated in each fraction (among all 7 fractions) which was used to compute the fold change score of SPIN states on different replication fractions genome-wide. We found that RT can be clearly stratified by the SPIN states (Fig. 3a). The Speckle and Interior_Act states are found in early replicated regions (S1, S2, fold change score > 1.5). The Interior_Repr1, Interior_Repr2, Near_Lm1, and Near_Lm2 states are replicated in the middle of S phase (S3, S4, fold change score > 1.3). The Lamina_Like and Lamina states are replicated late (S5, S6, and G2, fold change score > 1.3). Overall, the SPIN states show a striking separation of the multi-fraction Repli-seq. In addition, using the definition of constitutive and developmentally regulated replication domains (RDs) [29], we found the SPIN states have distinct correlation with different patterns of constitutive and developmental RDs (Fig. 3b). Constitutive RDs can be further separated as constitutive early (CE) and constitutive late (CL) domains. We found that 85% of the genomic regions in the Speckle state are CE and 55% of the genomic regions in the Lamina state are CL. In contrast, other SPIN states contain a higher proportion of developmentally regulated RDs. In Fig. 3c, we show that the SPIN states also correlate with evolutionary patterns of RT between human and non-human primates based on the annotations from [30]. Here, the RT patterns are separated in five groups based on their conservation across primates: early (all primate species have early RT), late (all primate species have late RT), weakly early (4 out of 5 species have early RT), weakly late (4 out of 5 species have late RT), and unconserved (the rest). We found that 96% of the genomic regions in Speckle states have conserved early RT pattern. In addition, the genomic regions in three interior_Act states mostly have conserved early RT. In contrast, 56% of Lamina state regions have conserved late RT. Similar to our observation of constitutive and developmentally regulated RDs, other SPIN states overlap more with unconserved RT. Collectively, these analyses reveal a strong correlation between the detailed nuclear spatial compartmentalization identified by SPIN and the DNA RT program as well as its constitutive patterns in different cell types and across different species.
Next, we sought to investigate the functional significance of SPIN states in terms of how important the SPIN states are, among other epigenomic features, in predicting RT. We built a predictive framework based on a random forest regression model to predict the multi-fraction Repli-seq signals along the genome by using the SPIN states together with various histone mark data (Fig. 3d) (see the “Methods” section). We specifically calculated the importance of each input feature based on how much each feature decreases the weighted impurity in a decision tree in the random forest. We found that the SPIN state is the most important feature, followed by H3K9me1, H3K9ac, H3K4me1, and H3K36me3, which are the top 5 most informative features (Additional file 2: Figure S6).
Together, the comparison with multi-fraction Repli-seq demonstrates that, by integrating different nuclear genome mapping data (TSA-seq, DamID, and Hi-C), the SPIN states delineate the detailed connections between nuclear compartmentalization and replication timing.
SPIN states offer new perspectives for other nuclear organization units
We evaluated the significance of the SPIN states with respect to providing new insights for other nuclear genome features. We assessed the interplay between the SPIN states and the known 3D genome structural features, including LADs, TADs, and chromatin loops. We also asked whether the SPIN states are indicative of the constitutive patterns of nuclear compartmentalization across different cell types.
By combining TSA-seq and DamID, we identified several types of nuclear periphery states with different localization relative to the nuclear lamina (Lamina, Lamina_Like, and Near_Lm1-2, Fig. 1c). To further assess how each SPIN state corresponds to LADs across multiple cell types, we used Lamin-B1 DamID data in 6 human cell lines from the 4DN portal, including HCT116, K562, RPE-hTERT, HAP-1, HFFc6, and H1-hESC (see Additional file 3: Table S1). Based on the assignment of LADs in 6 cell lines, we separated LADs into two categories: constitutive LADs (cLADs) and facultative LADs (fLADs). cLADs are defined as genomic regions characterized as LADs in at least 5 out of 6 cell lines. fLADs are defined as genomic regions characterized as LADs in at least 2 but fewer than 5 cell lines. In Fig. 4a, we show that there is a large difference between cLADs and fLADs in terms of the overlap with different SPIN states. Seventy-one percent of cLADs are in the Lamina state in K562 as well as 22% in Near_Lm2, 3% in Lamina_Like, and 3% in Near_Lm1 in K562. In contrast, for fLADs, only 23% are in K562 Lamina state, but 39% are in Near_Lm2 and 23% are in Near_Lm1 in K562. This suggests that the SPIN states in one cell type (i.e., K562 in this study) can already separate fLADs and cLADs, as well as extending the concept of LADs into two separate categories. These results are consistent with the recently reported HiLands chromatin domains relative to the nuclear lamina based on both Lamin-B1 DamID and histone marks (in mES cells) [21, 31], where HiLands-B and HiLands-P are two distinct chromatin states that correspond to the facultative and constitutive LADs. For the two types of LADs, both the Lamina SPIN state and HiLands-P have higher Lamin-B1 DamID signals and higher H3K9me3 modification, while the Near_Lm SPIN state and HiLands-B have lower Lamin-B1 DamID signals and higher H3K27me3 enrichment.
Next, we compared the SPIN states with ChIA-PET/Hi-C chromatin loops and TADs derived from Hi-C. For CTCF-mediated ChIA-PET chromatin loops, we discarded loops within 25kb range to focus on longer-range interactions. We found that most loops are formed within the same SPIN states with more loops toward the interior states with higher transcriptional activity (Fig. 4b). We also observed similar results in Pol2-mediated ChIA-PET loops (Additional file 2: Figure S7a). In addition to ChIA-PET loops, we compared the SPIN states to the Hi-C loops in K562 identified by HiCCUPS [12] (Additional file 2: Figure S7c). Similar to the comparisons with ChIA-PET loops, the majority (81.4%) of the HiCCUPS-identified chromatin loops are formed by genomic loci from the same SPIN states (Additional file 2: Figure S7b-c).
For TADs defined by the directionality index (DI) method [16], we found that the TADs tend to stay within the same SPIN state. Specifically, 82.3% of TADs have only one SPIN state labeled. It is rare (0.4%) that one TAD spans more than two different SPIN states (p value < 2.2E −16). Importantly, we observed the increase of TAD size when the SPIN state trajectory is changing from the nuclear interior to the periphery, with the average TAD size as 1.12Mb in the Lamina state and 0.19 Mb in the Speckle state, respectively (Fig. 4c, Fig. 4e, and Additional file 2: Figure S8). The boundaries of the SPIN states are close to TAD boundaries and CTCF peaks than expected at random (Fig. 4d) (p value < 2.2E −16). We calculated the Hi-C insulation score [32] to represent TAD boundary strength. We found that the insulation score on the Speckle state is on average two times higher than the score in the Lamina state, indicating that there are stronger TAD/subTAD boundaries in the Speckle states (Additional file 1: Supplementary Methods). In addition, we performed analysis for TAD-TAD level interactions and showed that TADs from the same SPIN state tend to form spatially separated cliques (i.e., a group of TADs with long-range interactions; see Additional file 2: Figure S9 and Additional file 1: Supplementary Results). Taken together, these results show that SPIN states stratify TADs and chromatin loops by providing spatial context.
We sought to analyze how conserved the spatial compartmentalization patterns are across different cell types. Here, we use Hi-C subcompartments in multiple cell lines as an estimation of chromosome spatial localization in different cell types. As we have already shown, Hi-C subcompartments are highly correlated with SPIN states although the SPIN states provide more detailed and explicit compartmentalization views relative to subnuclear bodies. We compared the SPIN states in K562 with the SNIPER Hi-C subcompartments across 9 human cell types [15], including K562, GM12878, HAP1, HeLa, HMEC, HSPC, HUVEC, IMR90, and TCell. We calculated the SNIPER entropy score as the metric of conservation for Hi-C subcompartments across cell lines (Additional file 2: Figure S10). The Speckle state has the lowest SNIPER entropy score (0.1 on average), strongly suggesting that Hi-C subcompartments on Speckle (mostly A1) are largely conserved across cell lines (Fig. 4e). The Lamina_Like and Lamina states (mostly B2 and B3 subcompartments) also have relatively low SNIPER entropy scores (0.75 on average). The most dynamic SPIN states across cell types are Interior_Repr2 and Near_Lm1. This comparison with cross-cell type SNIPER Hi-C subcompartments suggests that different SPIN states have distinct patterns across cell types with Speckle being the most conserved state.