A novel TAD detector (deTOKI) for ultra-low-resolution Hi-C data
Using ultra-sparse Hi-C contact matrices, we developed a novel algorithm, named deTOKI, to detect TAD-like domain structures, a term we use hereinafter to avoid confusion. The deTOKI takes advantage of a key property of TADs, namely that its topology distribution is relatively consistent with respect to number and length between cell types [14]. Briefly, for any given genome segment, deTOKI applies non-negative matrix factorization (NMF) to decompose the Hi-C contact matrix into genome domains that may be spatially segregated in 3D space (Fig. 1a). Non-negative matrix factorization is an algorithm for decomposition of a non-negative matrix into a product of two non-negative matrices, in which the n_components represent the common dimensions between the two decomposed matrices. As the n_components normally are substantially smaller than the dimensions of the origin matrix, NMF is a commonly used algorithm to perform dimension reduction [46]. To speed up the algorithm, deTOKI divides the chromosomes into 8-Mb sliding windows, overlapping each 4 Mb, and the clusters from the second Mb to the sixth MB of each window are reported as the predictions (“Methods,” Fig. 1, Additional file 1: Supplemental Note, Additional file 2: Fig. S1-2). The alternative local optimal solutions in the structure ensemble are achieved by summarizing deTOKI’s predictions with multiple random initiations (see “Methods”).
To assess deTOKI’s performance, we focused on two major characteristics of chromatin architecture at the single-cell level, i.e., data sparsity and cell-to-cell variations, and assessed them with downsampled experimental data and with simulated data, respectively. All analyses in this work were performed with binsize = 40 kb, unless otherwise mentioned.
deTOKI worked well in downsampled bulk Hi-C at the single-cell level
To assess the performance of deTOKI under the condition of sparse input, we mainly compared it with two publically available algorithms, i.e., insulation score (IS) [30] and deDoc [42]. These two methods were chosen because they were judged to be the most robust methods with sparse data in our previous comprehensive assessments of TAD predictors [43]. In addition, we also compared it with recently published algorithms designed for sparse data, including SpectralTAD [47], GRiNCH [48], and scHiCluster [49]. These algorithms employ the data imputation method on single-cell Hi-C data and predict domains by TopDom. Sparsity was defined as the proportion of entries in the Hi-C matrix that have value zero after excluding the unmappable genome regions, e.g., centromeres, for a given chromosome. The assessment was done for all chromosomes in 40-kb bins and was downsampled at the rate of 1/800 from the high-resolution Hi-C data [14]. The downsampled dataset consisted of about 0.44 M contacts, mimicking the sequencing depths of public scHi-C datasets, e.g., the median of the data generated by Flyamer and colleagues (hereafter termed Flyamer’s data [50]) was 0.339 M (Fig. 2a, b).
The deTOKI outperformed the other tools in the following two respects. First, compared to the other tools, the number of TAD-like domains predicted by deTOKI and GRiNCH was little affected by data sparsity (Fig. 2a and Additional file 2: Fig. S3b). Taking chr10 as an example, the largest absolute log2 fold changes (|log2FC|) in the number of predicted TAD-like domains among the downsampled datasets was 0.26 for GRiNCH and deTOKI, while it was 0.51, 0.80, 1.38, and 2.40 for scHiCluster, IS, SpectralTAD, and deDoc, respectively (Fig. 2a). Second, on single-cell data, deTOKI predicted TAD-like domains more accurately than all other predictors. We took the TADs identified with the full data as the gold standard and quantified the accuracy of predictions by the similarity to the gold standard. Two similarity indexes, i.e., adjusted mutual information (AMI) [51] and weighted similarity (WS) [42], were employed. The deTOKI values had higher similarity than all of the other algorithms for both indexes in most chromosomes, i.e., in 19 and 16 out of 22 chromosomes for AMI and WS, respectively (Fig. 2b). We also employed two different indexes, BP score (BP) [52] and variation of information (VI) [38]. Although IS performed best with the two indexes (Additional file 2: Fig. S3a), deTOKI performed comparably well in all chromosomes (median BP = 0.49 and 0.51; median VI = 1.58 and 1.61, for IS and deTOKI, respectively).
Moreover, when we performed an additional assessment with binsizes 20 kb and 80 kb, deTOKI performed equally well with binsize 40 kb, as we described above, and better than the other tools (Additional file 2: Fig. S3b-c). Finally, marks of the characteristic structural protein CTCF, or histones, were found to be enriched in deTOKI-predicted TAD-like domain boundaries. Compared to the genomic background, ChIP-seq peaks of H3K4me3, H3K36me3, and CTCF were enriched at the TAD-like domain boundary regions predicted by deTOKI, IS, scHiCluster, and deDoc, while such enrichment was barely seen in those boundary regions predicted by GRiNCH and SpectralTAD (Additional file 2: Fig. S3d). These observations further supported the accuracy of the predictions. Taken together, our assessments suggest that deTOKI can stably and accurately predict TAD-like domains with ultra-low-resolution (i.e., single-cell level) Hi-C data.
deTOKI worked well with simulated single-cell Hi-C data
To mimic cell-to-cell variation, we simulated a single-cell Hi-C experiment. The simulated data were generated according to the following protocol. First, we simulated chromosome structures for single cells. By applying a widely used 3D structure modeling tool known as IMP on the bulk Hi-C data, we modeled a 3D chromosome structure ensemble containing about 100 physical chromosome structure models such that each model represented a single cell (Fig. 2c) [53]. To simplify the simulation, we assumed that each modeled structure in the ensemble would be evenly distributed within the cell population. We randomly chose a 5-Mb-long genome region, i.e., chr18:50–55 Mb, as an example. To generate single-cell Hi-C data from the physical 3D model, we defined the Hi-C contacts as pairs of genome loci with a Euclidean distance less than a threshold (D) in the model. Hi-C reads were then sampled from the contacted genomic loci by a binomial distribution (see “Methods”). In this work, we tested three threshold Ds, i.e., 500, 750, and 1000, representing 20%, 40%, and 60% quantiles, respectively, in the distribution of distances among all genomic loci pairs in the physical model (Additional file 2: Fig. S3g). To define the true domains (reference) in the given single cell, a sufficient number of reads were sampled from the physical model with the sampling probability function of a pair of interacting genome loci being inversely proportional to their Euclidean distance (see “Methods”). The expected number of reads sequenced from the loci was calculated by normalizing all weights, i.e., contact probabilities, in a genome-wide manner, and the Hi-C reads were then sampled from those genome loci by Poisson distribution.
The deTOKI can accurately predict domain structures in simulated single-cell Hi-C data. With the method described above, we simulated the structures of a 5-MB region in 100 single cells and generated about 1000 and 0.35 M Hi-C contacts for each single cell and the reference Hi-C, respectively. We compared the accuracies (i.e., AMI and WS) of the predictions of the predictors, and when estimated by AMI, we found that deTOKI had significantly higher accuracy than the other tools (Mann-Whitney U test, P < 0.001) for D = 500 and 750. With D = 1000, IS had the best performance (Fig. 2d); however, the median values of AMI and WS were similar for IS and deTOKI (AMI median = 0.715 and 0.708; WS median = 0.789 and 0.753, respectively). We also employed BP and VI to measure the differences between the predicted domains and the reference. deTOKI and IS also performed best with these two indexes (Additional file 2: Fig. S3f). This pattern was also seen in an additional randomly selected genome region (chr18:10–15 Mb, Additional file 2: Fig. S3f and h). For example, in model #13 and with D = 500, deTOKI-identified domains in simulated single cells matched very well with the associated reference Hi-C, while several major domains were mislabeled using the IS predictor (Fig. 2e).
Single cells could also be accurately classified by deTOKI-predicted domains. As an example, we took the two 5-Mb regions of chr11 to represent two types of cells since their separation by 40 Mb on the chromosome would result in few connections. For the 100 simulated models of the two regions, representing two cell types, and using WS as distance, deTOKI-predicted domains had better classification power for distinguishing the two cell types than all other tools, except SpectralTAD and scHiCluster (Fig. 2f). If we run deTOKI at imputed data from scHiCluster, we can get the best classification (Additional file 2: Fig. S10d). Furthermore, the total number of misclassified cells of deTOKI was lower than that of IS (Fig. 2g). The success of deTOKI as predictor on the simulated data encouraged us to further assess if the tool would work equally well on experimental single-cell Hi-C data.
deTOKI predicts TAD-like domains with experimental scHi-C data
Next, we compared predictions with three experimental scHi-C datasets, hereinafter denoted as Flyamer’s, Tan’s, and Li’s datasets (Additional file 3-5: Table S1-3) [50, 54, 55]. We only compared deTOKI with IS, scHiCluster, and deDoc as the latter three were shown to perform relatively well with the simulated sparse data above. We found deTOKI’s predictions to be both more accurate and more stable than those of the three other tools.
First, deTOKI predicted TAD-like domains with higher modularity and lower structure entropy. The modularity and structure entropy of a network have previously been used to infer the topological properties of TADs from the Hi-C contact matrix [41, 42]. A better defined TAD set is expected to have smaller structure entropy [42] and larger modularity [41]. With Tan’s and Flyamer’s datasets, deTOKI predicted TAD-like domains with lower structure entropies and higher modularities than those of TAD-like domains predicted by IS, scHiCluster, or deDoc (Fig. 3a, Additional file 2: Fig. S4a). For example, when we compared the predictions of IS, scHiCluster, and deDoc in Tan’s data against deTOKI-predicted TAD-like domains, we found that the TAD-like domains in chr1 had higher modularity and lower structure entropy in all cells, respectively (Fig. 3a). In Li’s data, deTOKI also performed best of the four predictors, having the highest modularity and lowest structure entropy in 77 and 73 cells, respectively (Additional file 2: Fig. S4a).
Second, the structural proteins and histone marks were more enriched at the deTOKI-predicted TAD-like domain boundaries in real single-cell data. By aggregating the ChIP-seq signals at the predicted TAD-like domain boundaries in all single GM12878 cells, we found that the deTOKI-predicted TAD-like domain boundaries had higher enrichment of CTCF and Rad21(cohesin) compared to IS, scHiCluster, and deDoc (Fig. 3b). This was also true for H3K36me3 and H3K4me3, the two histone marks previously reported to be enriched in the ensemble TAD boundaries (Fig. 3c) [14].
Third, deTOKI-predicted single-cell TAD-like domains were more consistent with the modeled physical structures. Xie and colleagues modeled the physical structure of the haploid chromosomes of single GM12878 cells at 10 kb resolution and proposed an algorithm to infer the chromosome domains from the hierarchical physical structure [54]. Using this haploid physical model and algorithm, we inferred the chromosome domains in a randomly chosen genome region (chr10:4–16 M, see “Methods”). Compared with the deTOKI-predicted haploid single-cell TAD-like domains, we found that deTOKI-predicted single-cell TAD-like domain boundaries matched the 3D modeling very well (Fig. 3d). Using AMI and WS as the indexes, we compared the 3D-modeled hierarchical domains with the TAD-like domains predicted by the three predictors [54] (Fig. 3e). In both maternal and paternal chromosomes, the AMIs of deTOKI’s prediction were significantly higher than those predicted by IS (P = 0.04 and 0.02, respectively, two-sided Wilcoxon rank-sum test). The WS of deTOKI’s prediction was also higher than that predicted by IS (P = 0.1 and 0.35, respectively). As the total number of cells and TAD-like domains in this comparison was small, i.e., about 15–25 TAD-like domains, we think the significance of the WS was acceptable.
Last, deTOKI exhibited a more stable performance compared to IS. Using chr1 in PBMC cell #14 as an example, we performed 20 rounds of 50% downsampling on the single-cell Hi-C reads and predicted TAD-like domains from the downsampled data. Overall, the predictions of both predictors remained largely intact. For example, the distribution of TAD-like domain lengths remained similar between the full and the 50% downsampled data (Additional file 2: Fig. S4b-c). In terms of AMI and WS, deTOKI, IS, and scHiCluster performed equally well, i.e., AMI = 0.90, WS = 0.85 and AMI = 0.90, WS = 0.87 and AMI = 0.89, WS = 0.87, respectively (Additional file 2: Fig. S4d-e). The AMI and WS of deDoc were 0.80 and 0.71, which are lower values than those of deTOKI. However, deTOKI outperformed IS in two respects. First, the number of deTOKI-predicted TAD-like domains relied much less on reads coverage compared to IS. With Li’s data, the number of IS-predicted TAD-like domains was strongly correlated with reads coverage on all chromosomes, while for deTOKI, only a moderate correlation in this respect was found on five out of nineteen chromosomes (Fig. 3f). Second, IS predicted more questionable mini-TAD-like domains, i.e., length < 100 kb. Mini-TAD-like domains were typically found in the ultra-sparse region, depending on reads coverage. Within all the TAD-like domains, 0.29% and 9.81% were considered as mini-TAD-like domains, as predicted by deTOKI and IS (Fig. 3g and Additional file 2: Fig. S4f-g), in the single cell, respectively.
Taken together, our assessment suggests that deTOKI works well with experimental single-cell Hi-C data.
The improvement of deTOKI with data imputation
Data imputation is a commonly used strategy when handling single-cell data, e.g., scHiCluster [49] and Higashi [56]. We assessed the performance of deTOKI running on imputed data by comparing deTOKI with data imputation and Higashi (Additional file 1: Supplemental Note, Additional file 2: Fig. S10). We found that deTOKI and Higashi performed in a similar manner and that deTOKI could be further improved by data imputation, e.g., scHiCluster. Considering the substantial CPU time required by Higashi (about 100-fold more CPU time than that required by deTOKI or scHiCluster; see Additional file 6: Table S4), deTOKI is more efficient on TAD-like domain identification with single-cell Hi-C data than Higashi.
TAD-like domain structure is highly dynamic at the single-cell level
Using AMI as the index, we investigated the cell-to-ensemble and cell-to-cell similarity of TAD-like domains. Therefore, when we compared cell-to-ensemble and cell-to-cell AMIs in Tan’s data (GM12878 single cells, chr1), we found TAD-like domains in single cells to be more similar to ensemble than individual cells based on the comparison of AMIs (Fig. 4a, Additional file 2: Fig. S5a-b). In other words, cell-to-ensemble AMIs were significantly higher than cell-to-cell AMIs in all three scHi-C datasets tested. Intriguingly, the average cell-to-cell AMI is even smaller than the cell-to-ensemble AMI of another cell type, e.g., single cells of GM12878 vs. ensemble of K562 (Fig. 4a). For example, the AMI of cell (GM12878)-to-ensemble (K562) and cell-to-cell (GM12878) AMI are 0.858 and 0.848, respectively (two-sided Wilcoxon rank-sum test, P < 0.001, Fig. 4a). Thus, our data suggested that the TAD-like domain structure in single cells is quite dynamic, even bigger than inter-cell-type variation. The pattern we showed above is not specific to GM12878, as it can also be seen in the other two tested single-cell Hi-C datasets (Additional file 2: Fig. S5a-b). We note that the average AMI between single cells of GM12878 and ensemble of GM12878 is significantly higher than that of ensemble of K562 (Fig. 4a). We tested the assumption that TAD-like domain structure carries information for cell identity in the section subtitled “TAD-like domain structure carries information for cell identity” below.
Considering that TADs are conserved between cell types [14], two possible scenarios may explain the above high-level dynamics of the TAD-like domain structure in single cells. First, each individual cell employs a subset of the ensemble TADs. Second, each cell has a certain number of additional cell-specific TAD-like domains. To test which scenario is the more prevalent in the cell populations tested, we roughly defined three types of variations between TAD-like domains, namely, merge, split, and shift (see Additional file 2: Fig. S5c and “Methods”), where merge does not generate novel TAD-like domain boundaries, while split and shift do. Using chr1 as an example, we found, on average, 31.8%, 22.3%, and 26.6% of merge, split, and shift TAD-like domains, respectively (Additional file 2: Fig. S5c-e), implying that a notable number of TAD-like domain boundaries do not appear in the ensemble TAD structures. We term such boundaries as single-cell-specific boundaries (scSB). In the next two sections, we will sequentially discuss the dynamics of ensemble boundaries and scSBs.
Unnested ensemble TADs were frequently seen in single GM12878 cells.
We asked whether the ensemble TAD boundaries were purely randomly distributed in single cells. A simple assumption for this randomness would be that the distribution of the ensemble TAD boundaries is binomial in the cell population. To examine this assumption, we chose Li’s data as an example and modeled the distribution with a binomial B (150,0.06) [55], where 150 is the number of cells and 0.06 is the average frequency with which an ensemble TAD boundary appears in a single cell. We found that 453 and 452 boundaries (out of 2602) appeared in more than 12 and in less than 6 cells, respectively (Fig. 4b). Those numbers significantly deviate from the expectation of binomial null hypothesis (P < 0.001). This finding suggests that a group of ensemble TAD boundaries, termed as popular boundaries, occurs more frequently in the cells, while another group of boundaries, termed as unpopular boundaries, tends to be specific to a subpopulation of the cells. GO analysis showed that genes on the popular boundaries are enriched for terms related to cellular responses to DNA damage stimuli (P = 2.21E−3), while genes on the unpopular boundaries are enriched for terms related to negative regulation of cell-matrix adhesion (P = 1.52E−4, Additional file 2: Fig. S6d-f). This result further supported the assumption of a nonrandom distribution of ensemble TAD boundaries in single cells.
Both nested and unnested TAD boundaries were found in the ensemble [26]. We asked how these two types of boundaries are distributed in single cells. We chose chr1 in the GM12878 cells Hi-C data (termed hereinafter as Rao’s data [18]) as an example. We defined the nested and unnested boundaries and compartment domains, as previously described (see “Methods” [26]). Interestingly, by comparing the number of cells that carry such boundaries, we found that unnested boundaries were significantly enriched in single cells. In the 15 single cells, the 20 nested ensemble TAD boundaries appeared 14 times, while the 20 unnested ensemble boundaries appeared 44 times, being significantly more common than nested ones (P value = 0.003, two-sided Wilcoxon rank-sum test, Additional file 2: Fig. S6a-c). Taken together, our analysis suggested that ensemble TADs are dynamic in nature and that unnested ensemble TAD boundaries are more frequently chosen in single GM12878 cells.
Single-cell-specific TAD-like domain boundaries may adhere to the ensemble boundaries
The scSBs may not result entirely from stochastic fluctuation. First, we identified a large number of single-cell-specific boundaries (scSB) using deTOKI. About 89.3% of TAD-like domain boundaries in single cells were not found in the ensemble if we defined two boundaries as identical when they were in the same bin. Those scSBs were less likely to result from coverage bias, as strong correlation between the scSBs and read coverage was rarely seen (Cor = 0.296, P = 0.284). Because of data sparsity, not all chromosomes found reads in every cell. For this analysis, we therefore only looked at the largest chromosome (chr1) for which reads were found in most cells. The following analysis was performed on the whole genome. Second, the distribution of scSBs in the cell population is not random. We grouped all scSBs into 3 classes by the number of cells that carry these scSBs (number = 1, =2, > 2, denoted as scSB-1, scSB-2, and scSB-m, respectively). Compared to permutated controls, far more scSBs in the scSB-m class were found and far fewer scSBs in the scSB-1 and scSB-2 classes (P < 1E−4, Fig. 4c). Moreover, bins not taken as boundaries for any cells in our data were also more prevalent than permutated controls (“Absent” in Fig. 4c). These results imply that the bins are either deficient or relatively prevalent, i.e., either with too few or too many cells to function as domain boundaries, respectively.
Third, scSBs have characteristic histone marks. We mapped all histone marks (except for H3K4me3 and H3k36me3) that have publically available ChIP-seq data for GM12878 cells in ENCODE. The distribution of histone marks showed either enrichment or depletion at the scSBs, similar to the ensemble TAD boundaries. For example, H3K27me3 and H3K4me1 were enriched and depleted around the boundaries in single cells, respectively (Fig. 4d, e; Additional file 2: Fig. S7a-b). However, this pattern fluctuated with larger variation around ensemble boundaries (Additional file 2: Fig. S7d). We also observed a similar enrichment in IS- and deDoc-identified TAD-like domain boundaries (Additional file 2: Fig. S7a-b). This enrichment of H3K27me3 was higher in scSB-m than that in scSB-1 and scSB-2 (Fig. 4f, g, Additional file 2: Fig. S7e). Indeed, more ChIP-seq peaks represented histone marks in scSB-m (Additional file 2: Fig. S7c and f). This line of evidence suggests additional constraint above the stochastic random walk.
To investigate plausible constraints on the scSBs, we compared them with the ensemble boundaries in chr1. We found a strong association between the two classes. First, 7.89% of the scSBs in GM12878 can be found in K562 ensembles, which means that at least some of the GM12878 single-cell-specific boundaries are likely to be insulative in the K562 ensemble. Second, the bins that carry scSBs tend to be close to ensemble TAD boundaries. 18.9% of scSBs are located within an 80 kb (± 40 kb) region flanking the ensemble boundaries (Fig. 4h), and the average distance to the nearest ensemble boundaries from scSB-m is significantly smaller than that from both scSB-1 and scSB-2 (Additional file 2: Fig. S7g). Last, we built a simple logistic regression model to distinguish scSB-1 and scSB-2 from scSB-m using the number of ChIP-seq peaks as features, and we found 5 features, including CTCF, H3K4me1, H3K4me2, H3K9ac, and H3K36me3, that were most relevant in this respect (Additional file 2: Fig. S7h). However, the AUC (0.585) was much lower than the AUC (0.666) of a model that directly used the shortest distance to an ensemble TAD boundary as the feature (Fig. 4i), suggesting that distance is the most important factor restricting the biogenesis of scSB. The importance of distance suggests that genesis of scSBs may not be completely random, but rather tends to fall within certain restricted regions common to all, or most, human cells, and which is, at least to some extent, represented by the ensemble boundaries.
Altogether, our analysis indicates that a large amount of cell-to-cell variations in the TAD-like domain structure, the prevalence of cell-specific domain boundaries in cells, and a large portion of the single-cell-specific boundaries may not purely result from stochastic fluctuation in single cells.
The TAD-like domain structure carries information for cell identity
Previously, Tan et al. showed that cell types can be classified using single-cell Hi-C data combined with sequence features of the reads [54]. Now we ask whether the TAD-like domain structure alone can be used to classify single cells. Using WS as the similarity index for all three single-cell Hi-C datasets, we found that single cells could be correctly classified by the TAD-like domain structure alone. Tan’s dataset [54] consists of two cell types, GM12878 and PBMC. Both deTOKI and IS can completely distinguish the two cell types using the predicted TAD-like domain as a feature (AUC = 1.0, 1.0, and 0.863, for deTOKI, IS and deDoc, respectively, Fig. 5a). Flyamer’s dataset [50] consists of non-surrounded nucleolus (NSN) and surrounded nucleolus (SN) oocyte cell types, representing transcriptionally active immature and inactive mature oocytes, respectively [50]. The deTOKI could distinguish these two cell types much better than either IS or deDoc (AUC = 0.73, 0.66 and 0.52 for deTOKI, IS, and deDoc, respectively, Fig. 5b). Flyamer’s dataset also consists of zygote-mats and oocytes. The deTOKI distinguished these better as well (AUC = 1.0, 1.0, and 0.89 for deTOKI, IS, and deDoc, respectively, Additional file 2: Fig. S8e). In Li’s data [55], the Methyl-HiC data consists of 150 single cells cultured in two different media: 2i and serum. We found that the TAD-like domains predicted by deTOKI could also better distinguish cells with different growing conditions than could IS and deDoc (AUC = 0.691, 0.564, and 0.622 for deTOKI, IS, and deDoc, respectively, Fig. 5c). The classification of cells in different growth media may not be trivial, as the GO analysis showed that the genes on serum-specific TAD-like domain boundaries were enriched for the term “DNA methylation on cytosine” (Additional file 2: Fig. S8a-b, P = 7.28E−4), which is consistent with the fact that serum-cultured mESC has a higher DNA methylation rate [55]. The serum-specific TAD-like domain boundaries were also enriched for the gene regulation-related GO terms, e.g., positive regulation of gene expression and epigenetic features (P = 8.20E−4), which agrees with the fact that serum-cultured mESCs have more heterogeneous transcriptional activity than cells cultured in 2i [57,58,59]. Further, epigenetic features were distinguished between the serum- and 2i-specific TAD-like domain boundaries (Additional file 2: Fig. S8c). Altogether, deTOKI predicted TAD-like domains in single cells carrying reliable information about cell identity.
The DNA methylation pattern is highly correlated between TAD-like domain boundaries at the single-cell level
It has been suggested that spatially approximated genome loci are prone to share similar epigenetic patterns [60]. We thus asked if this feature could also be seen in single cells. In this case, we would expect to see lower correlations in DNA methylation between the inter-TAD-like domain bins than that between intra-TAD-like domain bins in the single cells. To test this speculation, we looked at Li’s data [55]. First, at the ensemble TAD level, the genome loci flanking the strongly insulated TAD boundaries have lower correlations on DNA methylation than those flanking the weakly insulated TAD boundaries (Additional file 2: Fig. S8d). The ensemble boundaries were classified into strong and weak groups with an identical number according to the insulation scores. The ensemble TAD boundaries of mESC were downloaded from the work of Dixon and colleagues [14], and the insulation scores at those boundaries were calculated by the pooled contact matrix of Li’s data. The average PCCs were 0.546 and 0.490 for weak and strong boundaries, respectively, and this correlation could also be seen if the boundaries were classified into more groups (Additional file 2: Fig. S8d). Next, we examined the inter-TAD-like domain and intra-TAD-like domain PCCs of DNA methylation level in the single cells. Indeed, when the TAD-like domains were defined by deTOKI or IS (Fig. 5d), the intra-domain PCCs were significantly larger than the inter-TAD-like domain PCCs, while when the TAD-like domains were defined by deDoc or the shuffled control, little difference was noted. The PCCs of inter-domain bins from deTOKI-predicted TAD-like domains were significantly lower than those from IS-predicted TAD-like domains (PCC = 0.520 vs. 0.523 for deTOKI and IS, respectively, p = 0.003), implying that the boundaries predicted by deTOKI are more spatially insulated in single cells. Although the average PCC between inter-domain bins was relatively low, it remains notable. We speculate that this might be caused by the existence of weak TAD boundaries, as discussed above. Together, our analysis suggested that spatially approximate chromatin loci are prone to carry similar epigenetic features and that the dynamic nature of TAD-like domain structures at the single-cell level has notable consequences for the ensemble of the epigenetic landscape.