Single-cell whole-genome bisulfite sequencing accurately reports genome-wide 5mC patterns
In order to experimentally measure intra-tissue liver heterogeneity, diploid hepatocytes were isolated from six mice, three 4-months old and three 26-months old, by liver perfusion and sorted using fluorescence-activated cell sorting (FACS) into PCR tubes after Hoechst 33342 staining. A total of 21 hepatocytes (at least two cells per animal) were subjected to bisulfite treatment followed by whole-genome library preparation and sequencing on an Illumina HiSeq 2500 with 100-base, single-end reads. For each animal we also performed whole-genome bisulfite sequencing (WGBS) of DNA from bulk hepatocytes. For comparison, we also sequenced libraries generated from five manually picked individual mouse embryonic fibroblasts, as well as DNA from two bulk fibroblast cell populations.
On average, 17 million reads were mapped for each single hepatocyte (109 million for the bulk), corresponding to a mapping efficiency of 32.3 % for the single cells and 55.9 % for the bulk (Additional file 1: Table S1). The somewhat lower mapping efficiency of the single-cell DNA may be due to reduced complexity of the DNA amplified from single cells compared with the bulk and was also found by Smallwood et al. [5]. Mapped reads appeared as randomly distributed across the genome, providing information on all genomic features, covering 2.2 million CpG sites on average for the single cells and 21.6 million on average for the bulk DNAs (Additional file 1: Figure S1 and Table S1). Coverage was distinctively lower for the single cells than for the bulk DNA, in agreement with data from others [5] (Fig. 1a). Bisulfite conversion efficiency was 98 % or higher, as assessed by analysis of non-CpG methylation (Additional file 1: Table S1). Additional file 1: Table S2 compares these methodological specifics with two previously published protocols for single-cell methylomics [5, 6], showing that performances of all three methods are fairly similar (Additional file 1: Table S2).
While the vast majority of CpG sites were either methylated or unmethylated, a small fraction was found to show partial methylation (1.07 ± 0.98 %; Fig. 1b; Additional file 1: Figure S2). Interestingly, the fraction of partially methylated sites was consistently smaller in single cells compared with the bulk (Fig. 1b). While in bulk cell populations partially methylated sites most likely reflect amplification from multiple alleles (from many cells), in single cells amplification is only from two alleles and, in practice, due to the low coverage in most cases, from only one. This is the most likely explanation for the very small fraction of partially methylated sites in single cells. This is in keeping with the slightly higher fraction of partial methylation in the manually selected control fibroblasts, coverage of which was significantly higher than the hepatocytes (due to better single-cell DNA quality in these cells, which had not gone through enzymatic isolation and cell sorting; Fig. 1b).
To analyze 5mC patterns in the isolated hepatocytes, we subdivided the genome in 3-kb sliding windows with a step size of 600 bp and determined the weighted average of 5mC in each window as described in Additional file 1: Supplemental Experimental Procedures. Only windows containing at least 5 CpG sites were used for the analysis (Fig. 1c). Before proceeding with an in-depth analysis of 5mC heterogeneity, we verified the accuracy of our single-cell procedure in faithfully reporting correct DNA methylation status using two approaches. First, we tested if the DNA methylation patterns obtained for the studied single cells were indeed specific for hepatocytes. Because 5mC promoter status is generally inversely correlated with gene expression status [7–9], we used a set of 58 liver-specific genes identified through multi-tissue RNA-seq analyses by Lin. et al [10]. As those genes are specifically expressed in the liver, we expected their promoters to be generally hypomethylated. Indeed, our results show that the average 5mC level of promoters of these liver-specific genes in all single hepatocytes was very similar to that in the bulk hepatocytes and significantly lower (p < 0.001, one-tailed t-test) than that in the promoters of these same genes in fibroblasts (Fig. 2a).
Second, we merged methylation patterns of single hepatocytes as well as fibroblasts and compared this with their corresponding bulk patterns. Similar methylation levels of the merged and the bulk were observed for every window (Fig. 2b). Principle component analysis (PCA) revealed that single cell and bulk cluster according to cell type and age (Fig. 2c). In both cell types the clustering was affected by sequencing coverage, as expected because of the much higher coverage of the bulk samples (Fig. 2c; Additional file 1: Figure S1). We noticed that one young hepatocyte, “y14”, clustered with fibroblasts in PCA, although it has similar promoter methylation patterns to liver-specific genes. This may reflect the diversity and heterogeneity of the hepatocyte population. Based on the promoter methylation patterns and PCA clustering, we conclude that our single-cell DNA methylomics protocol correctly identified cell type-specific DNA methylation patterns.
5mC heterogeneity in liver is high and dependent on sequence feature
To quantitatively analyze 5mC heterogeneity among the single hepatocytes, we compared the average 5mC content of the 3-kb sliding windows overlapping between each single hepatocyte and its bulk population and calculated the variance between each cell and the bulk from which it was derived (Fig. 3a; Additional file 1: Supplemental Experimental Procedures). To control for possible artifacts caused by sequencing depth variation, we also calculated the variance between the bulk and artificial cells, simulated by downsampling of the bulk itself to a single-cell sequencing depth. This “noise” (y-axis, Fig. 3a), which was between 53.3 ± 13.0 % of the actual variations between the real cells and the bulk, was then subtracted from the raw variance values (x-axis, Fig. 3a). The results confirm a significant level of cell-to-cell variation in 5mC across the genome. As expected, heterogeneity was significantly higher among hepatocytes compared with the five fibroblasts included as control cells (P = 0.016, one-tailed permutation test on the mean variance of each group). As these cells had been taken from the same plate, both the number of cell divisions and chronological time between them was much shorter than for the hepatocytes, each of which went through the process of development and aging, with ample opportunity to undergo epivariation.
Due to the high level of 5mC heterogeneity, both in young and old hepatocytes, our sample size (ten single cells per age group) does not provide enough power to significantly distinguish potential differences between age groups. Interestingly, while not statistically significant, variance (from cell to bulk) of hepatocytes from the aged mice was higher than in the young animals (P = 0.148, one-tailed permutation test on the mean variance of each group) (Fig. 3a, b). Higher heterogeneity in hepatocytes compared with fibroblasts was confirmed when comparing the fraction of differentially methylated windows (DMWs) rather than comparing the variance (Fig. 3b; Additional file 1: Supplemental Experimental Procedures). Also in this case the DMW frequency was slightly higher among hepatocytes from the aged mice, with a profound increase in cell-to-cell variation.
To translate the observed variance (from cell to bulk) among hepatocytes into epivariation frequency, we then calculated the ratio between the number of altered CpG sites and the total number of CpGs overlapping between individual cells and the bulk (Additional file 1: Supplemental Experimental Procedures and Figure S3). Epivariation frequency in the hepatocytes appeared to be remarkably high, i.e., in the order of 3.3 % of all CpG sites analyzed. This is more than three orders of magnitude higher than the frequency of somatic DNA sequence mutations [11] and very similar to our previous promoter-based estimates [3].
Next, we explored whether heterogeneity of 5mC in mouse liver was dependent on specific genomic features. First we grouped all the hepatocytes together as we did not find a significant difference between young and old (Fig. 3a). To reduce bias due to coverage and sample size, each bin in each cell was down-sampled to 5 CpG counts per bin, irrespective of their presence in multiple reads at the same site or at multiple sites. After downsampling, we retained ten cells with the highest coverage for each bin, making the comparison of variation in methylation content homogeneous in all bins across the genome (Additional file 1: Supplemental Experimental Procedures). Our results indicate that 5mC heterogeneity is highly dependent on the genomic context and mostly a feature of non-functional sites. More specifically, 5mC variance between cells was higher than the genome average in repeats and transposons, whereas it was lower in functional sequences, such as CpG islands (CGIs) and promoter regions, 5′ untranslated regions (UTRs), exons, H3k36me3, and H3k4me3 (Fig. 3c), which is in keeping with previous work suggesting that hypermethylated regions are more error prone than hypomethylated ones [1, 2]. The exception appeared to be in the regions associated with histone H3 methylation at Lys4 (H3K4me1), the transcription-associated histone modification. Of note, H3K4me1 has been previously found to be associated with 5mC loss during aging [12].
Next, we focused on promoter 5mC heterogeneity, which appeared to depend on whether they were CGI or non-CGI promoters. Non-CGI promoters were found to be more variable than the CGI ones (Fig. 3c). In this respect, we can speculate that because non-CGI promoter genes are destined to change during development and adult life [13], they might be subjected to higher levels of fluctuations than the CGI ones.