Variation in 5-hydroxymethylcytosine across human cortex and cerebellum

An Erratum to this article was published on 17 June 2016

Abstract

Background

The most widely utilized approaches for quantifying DNA methylation involve the treatment of genomic DNA with sodium bisulfite; however, this method cannot distinguish between 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). Previous studies have shown that 5hmC is enriched in the brain, although little is known about its genomic distribution and how it differs between anatomical regions and individuals. In this study, we combine oxidative bisulfite (oxBS) treatment with the Illumina Infinium 450K BeadArray to quantify genome-wide patterns of 5hmC in two distinct anatomical regions of the brain from multiple individuals.

Results

We identify 37,145 and 65,563 sites passing our threshold for detectable 5hmC in the prefrontal cortex and cerebellum respectively, with 23,445 loci common across both brain regions. Distinct patterns of 5hmC are identified in each brain region, with notable differences in the genomic location of the most hydroxymethylated loci between these brain regions. Tissue-specific patterns of 5hmC are subsequently confirmed in an independent set of prefrontal cortex and cerebellum samples.

Conclusions

This study represents the first systematic analysis of 5hmC in the human brain, identifying tissue-specific hydroxymethylated positions and genomic regions characterized by inter-individual variation in DNA hydroxymethylation. This study demonstrates the utility of combining oxBS-treatment with the Illumina 450k methylation array to systematically quantify 5hmC across the genome and the potential utility of this approach for epigenomic studies of brain disorders.

Background

Epigenetic modifications to DNA play a critical role in establishing and maintaining cellular phenotype [1]. Recent studies highlight widespread changes in DNA methylation occurring during neurodevelopment, with tissue-specific methylomic variation present between discrete regions of the human brain [2, 3]. Epigenetic processes control key neurobiological and cognitive processes in the brain, and their importance is highlighted by evidence implicating methylomic variation in a number of neuropsychiatric and neurodegenerative diseases, including multiple sclerosis, autism, Alzheimer’s disease, and schizophrenia [4–7].

Although 5-methylcytosine (5mC) is the best understood and most studied epigenetic modification modulating transcriptional plasticity in the mammalian genome, three additional DNA modifications (5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC)) have been recently described. These modifications are thought to represent intermediates in the demethylation of 5mC to un-modified cytosine [8] although recent data suggest there are specific functional roles for 5hmC. For example, 5hmC is specifically recognized by key binding-proteins [9], and can be maintained through cell division [10]. The exact genomic distribution of 5hmC is still debated; some studies have reported 5hmC in gene promoters and gene bodies [11, 12], while others have shown a depletion of 5hmC in CpG islands and an enrichment outside of CG-rich regions [13, 14]. It has been shown that 5hmC occurs at relatively high levels in the cerebellum and other regions of the brain [15, 16], where it is particularly enriched in the vicinity of genes with synapse-related functions [17]. Of note, recent studies have reported global alterations in 5hmC in Alzheimer’s disease [18, 19], supporting a role in health and disease.

Until recently it has not been possible to sensitively quantify 5hmC at base-pair resolution in the genome across large numbers of samples. Furthermore, many of the existing methods routinely used to interrogate DNA methylation (that is, those based on sodium bisulfite (BS) conversion and methylation-sensitive restriction enzyme cleavage) are unable to discriminate between 5mC and 5hmC [20]. The recent development of oxidative bisulfite (oxBS) treatment [13, 21], however, which involves the oxidation of 5hmC to 5fC before BS conversion, allows both a direct measurement of absolute 5mC and a proxy measure of 5hmC. Two recent papers demonstrated that oxBS conversion can be integrated with the Illumina 450K HumanMethylation (450K) array to facilitate the systematic quantification of both 5mC and 5hmC across the genome [22, 23]. In this study we used a commercially available oxBS treatment kit (TrueMethyl-CEGX, Cambridge, UK) in conjunction with the Illumina 450K array to compare the distribution of 5hmC across two regions of the human brain (prefrontal cortex and cerebellum) dissected from eight donors. We subsequently confirmed our findings in an independent set of matched prefrontal cortex and cerebellum samples dissected from an additional 18 individuals.

Results and discussion

Identifying differences in hydroxymethylated sites between cortex and cerebellum

The aim of the study was to compare 5hmC in matched postmortem prefrontal cortex and cerebellum samples from multiple donors using a commercially available oxBS conversion kit in combination with the Illumina 450K Human Methylation array. Briefly, the level of 5hmC at specific sites is quantified by subtracting oxBS-generated 450K array profiles from those generated following a BS-conversion performed in parallel. Each sample in this study was also profiled following a standard BS-conversion protocol using the Zymo EZ DNA methylation kit. Following normalization, the distribution of beta values was highly consistent across both BS methods (CEGX vs. Zymo) (Additional file 1: Figure S1A), with a highly significant correlation observed in both the prefrontal cortex (Additional file 1: Figure S1B; R2 = 0.99, P <2.2E-16) and cerebellum samples (Additional file 1: Figure S1C; R2 = 0.99, P <2.2E-16). These data indicate that the CEGX BS conversion protocol yields data that are directly comparable to data generated using standard BS conversion kits widely employed prior to 450K array processing.

We were interested in establishing the location of sites characterized by ‘detectable’ 5hmC and, building on our previous data demonstrating region-specific patterns of 5mC in the human brain [2], the extent to which levels of 5hmC differed between the prefrontal cortex and cerebellum. 5hmC levels were calculated by subtracting the oxBS beta-value from the BS beta value at each probe on the 450K array (ΔβBS-oxBS) (see Methods). As expected, the distribution of ΔβBS-oxBS values was positively-skewed (Fig. 1a), although a small proportion of probes in each sample were characterized by a negative ΔβBS-oxBS value, likely resulting from technical variance inherent in the Illumina array protocol. We therefore set a stringent threshold for calling 5hmC based on the 95th percentile of negative ΔβBS-oxBS values across all profiled samples, to ensure we only analyzed probes characterized by ‘detectable’ levels of 5hmC. In this dataset, therefore, only sites with an average ΔβBS-oxBS level in either tissue >0.09158275 were classified as having ‘detectable’ levels of 5hmC. Using this threshold, we identified a total of 79,263 loci characterized by ‘detectable’ 5hmC in one or both brain regions.
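The thresholding step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the cutoff is the value bounding the most extreme 5 % of negative ΔβBS-oxBS values pooled across samples, that a simple nearest-rank percentile is adequate, and that the absolute value of this cutoff is then applied to positive values. The function names and toy beta values are hypothetical.

```python
import math


def delta_beta(bs, oxbs):
    """Per-probe 5hmC estimate: BS beta value minus oxBS beta value."""
    return [b - o for b, o in zip(bs, oxbs)]


def detection_threshold(deltas, tail=0.05):
    """Mirror the lower tail of the negative deltas onto the positive side.

    Assumption: nearest-rank percentile over the pooled negative values.
    """
    negatives = sorted(d for d in deltas if d < 0)  # most negative first
    if not negatives:
        raise ValueError("no negative delta-beta values to estimate noise from")
    rank = max(1, math.ceil(tail * len(negatives)))
    return abs(negatives[rank - 1])


def detectable(mean_delta_by_tissue, threshold):
    """A probe counts as 'detectable' if its mean delta exceeds the threshold in either tissue."""
    return any(v > threshold for v in mean_delta_by_tissue.values())


# Toy example with three hypothetical probes.
toy_deltas = delta_beta([0.80, 0.50, 0.30], [0.60, 0.55, 0.31])
toy_threshold = detection_threshold(toy_deltas)
```

Negative differences have no biological interpretation here, so their spread serves as an empirical estimate of technical noise; mirroring it onto the positive side gives a conservative cutoff.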

Fig. 1

Quantifying 5hmC in two regions of the human brain. a ΔβBS-oxBS was calculated for each sample and a detection threshold based on the lowest fifth percentile in the negative values (0.09158275) used to call ‘detectable’ 5hmC (black vertical line). b We identified 37,145 and 65,563 probes with a mean 5hmC level above threshold in the prefrontal cortex and cerebellum, respectively. c The degree of hydroxymethylation at sites with ‘detectable’ 5hmC in prefrontal cortex (N = 37,145) is correlated with levels at the same sites in the cerebellum (adjusted R2 = 0.097, P <2.2E-16). The red horizontal line indicates our threshold for ‘detectable’ 5hmC. d The degree of hydroxymethylation at sites with ‘detectable’ 5hmC in cerebellum (N = 65,563) is correlated with levels at the same sites in the prefrontal cortex (adjusted R2 = 0.132, P <2.2E-16)

Of note, there was a striking difference in the prevalence of 5hmC-positive sites between the prefrontal cortex and cerebellum (Fig. 1a); we identified 37,145 (13,700 unique) and 65,563 (42,118 unique) probes with an average 5hmC level above threshold in the prefrontal cortex and cerebellum, respectively, with 23,445 probes characterized by ‘detectable’ 5hmC in both regions of the brain (Additional file 2: Table S1, Additional file 2: Table S2, and Fig. 1b). Of the 37,145 sites with ‘detectable’ 5hmC in the prefrontal cortex we observed a small but significant correlation with 5hmC level at the same sites in the cerebellum (Fig. 1c; adjusted R2 = 0.097, P <2.2E-16). Similarly for the 65,563 sites with ‘detectable’ 5hmC in the cerebellum we observed a significant correlation with 5hmC in the prefrontal cortex (Fig. 1d; adjusted R2 = 0.132, P <2.2E-16). As a resource to other researchers interested in the distribution of 5hmC in the brain, average ΔβBS-oxBS levels for each of the 79,263 probes on the 450K array characterized by ‘detectable’ 5hmC in one or both brain regions can be explored in the Hydroxymethylation Annotation in Brain Integrative Tool (HABIT) at our laboratory website (http://epigenetics.iop.kcl.ac.uk/HMC/). The tool also integrates annotated UCSC tracks to enable visualization of average 5hmC levels in both brain regions.
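The probe counts reported above are mutually consistent, as the following inclusion-exclusion arithmetic shows (PFC: prefrontal cortex; CER: cerebellum):

```latex
\begin{align*}
|\mathrm{PFC} \cup \mathrm{CER}| &= 37{,}145 + 65{,}563 - 23{,}445 = 79{,}263 \\
|\mathrm{PFC}\ \text{only}| &= 37{,}145 - 23{,}445 = 13{,}700 \\
|\mathrm{CER}\ \text{only}| &= 65{,}563 - 23{,}445 = 42{,}118
\end{align*}
```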

The distribution of 5hmC differs depending on genic location and CG density

Given that the abundance of 5mC is known to vary across the genome, we were interested in whether there is an enrichment of 5hmC in certain annotated regions of the genome. Although the Illumina 450K array does not enable an assessment of all potentially hydroxymethylated probes in the human genome, it is the most widely-used tool in epigenetic epidemiology and covers 99 % of RefSeq genes, with an average of 17 CpG sites per gene region distributed across the promoter, 5'UTR, first exon, gene body, and 3'UTR. We found that ‘detectable’ 5hmC is highly depleted in CpG islands in both brain regions (prefrontal cortex: OR = 0.18, P <2.53E-294; cerebellum: OR = 0.23, P <2.53E-294). In contrast, 5hmC is enriched in CpG island shores (prefrontal cortex: OR = 1.55, P = 2.53E-294; cerebellum: OR = 1.30, P = 1.80E-159), CpG island shelves (prefrontal cortex: OR = 1.78, P = 3.93E-262; cerebellum: OR = 1.86, P <2.53E-294), and locations outside of CG-rich regions (prefrontal cortex: OR = 1.62, P <2.53E-294; cerebellum: OR = 1.68, P <2.53E-294) (Table 1, Fig. 2a). This is consistent with previous studies demonstrating a depletion of 5hmC in CpG islands and an enrichment outside of CG-rich regions [13, 14]. Furthermore, ‘detectable’ 5hmC was significantly enriched in both brain regions in the gene body (prefrontal cortex: OR = 1.90, P <2.53E-294; cerebellum: OR = 2.48, P <2.53E-294), (Table 1, Fig. 2b, c), and also downstream of annotated transcripts (prefrontal cortex: OR = 1.30, P = 1.95E-12; cerebellum: OR = 1.35, P = 2.04E-25). In contrast, in both brain regions, 5hmC was depleted at intergenic sites (prefrontal cortex: OR = 0.82, P = 8.46E-34; cerebellum: OR = 0.79, P = 5.13E-75) and the proximal promoter (prefrontal cortex: OR = 0.54, P <2.53E-294; cerebellum: OR = 0.40, P <2.53E-294). This is consistent with previous studies showing a decrease in brain 5hmC in intergenic regions [24] and an enrichment of 5hmC in gene bodies [22]. 
Interestingly, 5hmC was modestly enriched in distal promoter sites in the prefrontal cortex (OR = 1.19, P = 5.16E-12), but not in the cerebellum (OR = 0.97, P = 0.166). These data concur with previous studies using oxBS in conjunction with the 450K array in smaller numbers of samples. Stewart et al. demonstrated a significant enrichment of probes with detectable 5hmC in gene bodies when investigating one unmatched cerebellum and frontal cortex sample [22], while Field et al. showed that loci with detectable 5hmC are enriched in the gene body (exonic and intronic) and regions downstream of the gene in a single cerebellum sample [23].
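The enrichment statistics above come from Fisher's exact test on a 2 × 2 table (probe has 'detectable' 5hmC or not, versus probe lies in the feature or not). A minimal standard-library sketch is shown below. It assumes a two-sided test and reports the sample cross-product odds ratio, which can differ slightly from the conditional estimate returned by some statistical packages; the function names are illustrative.

```python
import math


def log_table_prob(a, b, c, d):
    """Log hypergeometric probability of a 2x2 table with fixed margins."""
    def log_fact(n):
        return math.lgamma(n + 1)  # log(n!)

    return (log_fact(a + b) + log_fact(c + d) + log_fact(a + c) + log_fact(b + d)
            - log_fact(a) - log_fact(b) - log_fact(c) - log_fact(d)
            - log_fact(a + b + c + d))


def fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test.

    a: detectable and in feature      b: detectable and not in feature
    c: not detectable and in feature  d: not detectable and not in feature
    Returns (sample odds ratio, two-sided P value).
    """
    odds_ratio = (a * d) / (b * c) if b * c else math.inf
    observed = log_table_prob(a, b, c, d)
    row1, col1, total = a + b, a + c, a + b + c + d
    p_value = 0.0
    # Sum the probabilities of all tables at least as extreme as the observed one.
    for x in range(max(0, row1 + col1 - total), min(row1, col1) + 1):
        log_p = log_table_prob(x, row1 - x, col1 - x, total - row1 - col1 + x)
        if log_p <= observed + 1e-7:
            p_value += math.exp(log_p)
    return odds_ratio, min(1.0, p_value)


# Toy table: an odds ratio above 1 indicates enrichment, below 1 depletion.
toy_or, toy_p = fisher_exact(3, 1, 1, 3)
```

With array-scale counts, P values as extreme as those reported here may underflow to zero in double precision, so this sketch is intended for illustration rather than for reproducing the published values.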

Table 1 Specific genomic features are characterized by hydroxymethylation in brain
Fig. 2

Hydroxymethylation is enriched in specific genomic features in brain. a Hydroxymethylated loci are significantly under-enriched in CpG islands, but enriched in CpG island shores and shelves, and (b, c) significantly enriched in the gene body. d Across both brain regions sites with ‘detectable’ 5hmC are enriched in alternative last exons, cassette exons, and mutually exclusive exons, and under-enriched in alternative first exons and constitutive exons. The level of enrichment was determined by Fisher’s exact test. QC: quality control; CGI: CpG island; SHO: CpG island shore; SHE: CpG island shelf; nonCGI, NC: outside CpG islands; UA: unannotated; A3SS: alternative 3’ splice site; A5SS: alternative 5’ splice site; AFE: alternative first exon; ALE: alternative last exon; CE: cassette exon; CNE: constitutive exon; EI: exon isoforms; II: intron isoforms; IR: intron retention; MXE: mutually exclusive exon

5hmC is enriched in certain functional elements and depleted in others

We used functional genomic annotation data from ENCODE [25, 26] to examine the distribution of 5hmC across regulatory regions of the genome in the brain. ‘Detectable’ 5hmC was found to be significantly depleted at transcription factor binding sites (TFBS) (prefrontal cortex: OR = 0.67, P = 8.40E-291; cerebellum: OR = 0.44, P <2.53E-294) and at DNase I hypersensitivity sites (DHS) (prefrontal cortex: OR = 0.88, P <7.49E-14; cerebellum: OR = 0.68, P = 5.80E-174) (Table 1).

We also examined alternative transcription events (Table 1; Fig. 2d) and found that 5hmC is significantly depleted in both brain regions at alternative first exons (AFE) (prefrontal cortex: OR = 0.72, P = 5.07E-76; cerebellum: OR = 0.56, P <2.53E-294) and constitutive exons (CNE) (prefrontal cortex: OR = 0.71, P = 1.99E-38; cerebellum: OR = 0.80, P = 3.28E-28). In contrast, 5hmC is significantly enriched in alternative last exons (ALE) (prefrontal cortex: OR = 1.49, P = 1.55E-35; cerebellum: OR = 1.59, P = 5.61E-81), cassette exons (CE) (prefrontal cortex: OR = 1.53, P = 2.14E-128; cerebellum: OR = 1.57, P = 1.17E-239), and mutually exclusive exons (MXE) (prefrontal cortex: OR = 1.44, P = 8.63E-41; cerebellum: OR = 1.39, P = 6.38E-52). This concurs with previous studies demonstrating elevated levels of 5hmC in CE in the human brain [17]. Furthermore, there is evidence of cerebellum-specific enrichment of 5hmC at alternative 3’ splice site (A3SS) (OR = 1.14, P = 2.63E-03), alternative 5’ splice site (A5SS) (OR = 1.15, P = 1.11E-03), and exon isoforms (EI) (OR = 1.78, P = 1.39E-03), with no equivalent enrichment in matched prefrontal cortex samples. Finally, there is depletion of 5hmC at intron retention (IR) events in the prefrontal cortex (OR = 0.91, P = 8.9E-04), but not the cerebellum (OR = 1.02, P = 0.472).

Using a logistic regression method to identify biological pathways enriched for loci annotated to sites with ‘detectable’ 5hmC, stringently controlling for the number of probes annotated to each gene, we found considerable overlap in 5hmC-enriched pathways between the prefrontal cortex (Additional file 2: Table S3) and cerebellum (Additional file 2: Table S4), with the most significantly enriched pathway in both brain regions being nervous system development (prefrontal cortex: P = 1.5E-11; cerebellum: P = 4.1E-11).

Levels of 5hmC at specific sites differ between prefrontal cortex and cerebellum

After describing the genomic distribution of ‘detectable’ 5hmC, we were interested in estimating absolute levels of 5hmC at specific sites, and the extent to which these differ between brain regions and individuals. The canonical patterns of 5hmC and 5mC levels across the gene are shown in Fig. 3 for the 79,263 loci with detectable levels of 5hmC. 5mC levels across the gene are similar to those reported previously [2, 23], with a decrease in levels at the TSS, before a gradual increase through the gene body, and an eventual decrease downstream of the gene body. Interestingly, although 5mC levels are similar at the TSS in both the prefrontal cortex and cerebellum, levels of 5mC were slightly elevated in the prefrontal cortex at other regions along the gene, and more notably so at the 3’ end of the transcript. In contrast, 5hmC is characterized by a different genic pattern, with levels being consistently higher in the cerebellum than the prefrontal cortex across the entire gene, in addition to immediate upstream/downstream regions.

Fig. 3

Gene-level analysis of canonical 5hmC. Levels of 5hmC are consistently higher in the cerebellum than in the prefrontal cortex along the length of the gene. Key: average 5hmC level in prefrontal cortex (red), average 5hmC level in cerebellum (green), average 5mC level in prefrontal cortex (blue), average 5mC level in cerebellum (pink)

Additional file 2: Table S5 and Additional file 2: Table S6 list the 1,000 450K array sites with the highest estimated levels of 5hmC in the prefrontal cortex and cerebellum, respectively. Of the sites showing highest 5hmC in the prefrontal cortex (Additional file 2: Table S5), 349 did not exceed our detection threshold in the cerebellum. Similarly, of the sites showing highest 5hmC in the cerebellum (Additional file 2: Table S6), 651 did not exceed our detection threshold in the prefrontal cortex. These data suggest that although there is some similarity between brain regions, levels of 5hmC at individual sites are often tissue-specific.

In order to confirm our findings, we subsequently examined 5hmC levels at the top 1,000 sites in additional matched prefrontal cortex and cerebellum samples dissected from 18 independent donors (Additional file 2: Table S5; Additional file 2: Table S6). Estimates of 5hmC at these sites were highly concordant across datasets (median difference between discovery and validation datasets = 4.73 (prefrontal cortex) and 4.62 (cerebellum)). There is a highly significant correlation in estimated 5hmC levels between the discovery and validation datasets at these sites in both the prefrontal cortex (R = 0.52, P = 8.96E-36) and cerebellum (R = 0.71, P = 7.42E-54) (Fig. 4).

Fig. 4

5hmC estimates generated using this approach were validated in an independent set of matched prefrontal cortex and cerebellum samples. Estimates of 5hmC at the 1,000 loci showing the highest levels of 5hmC in the discovery cohort were highly consistent and significantly correlated across datasets in both (a) the prefrontal cortex (R = 0.52, P = 8.96E-36) and (b) the cerebellum (R = 0.71, P = 7.42E-54). The red line denotes our threshold for robust 5hmC detection. (c) 5hmC differences between prefrontal cortex and cerebellum at the 1,000 top-ranked TS-HMPs identified in the discovery cohort were significantly correlated with differences identified at the same sites in the validation cohort (R = 0.49, P = 6.32E-33)

Additional file 2: Table S7 lists the top 1,000 sites characterized by the largest average differences in 5hmC between the prefrontal cortex and cerebellum, which we define as tissue-specific hydroxymethylated positions (TS-HMPs). Each of these 1,000 probes is characterized by at least 28 % hydroxymethylation difference between the brain regions. For the majority (96.5 %) of these probes, the difference was driven by increased 5hmC in the cerebellum compared to the prefrontal cortex (Additional file 1: Figure S2), although 35 of the 1,000 sites (3.5 %) did show elevated 5hmC in the prefrontal cortex compared to the cerebellum. A total of 997 out of the 1,000 loci showed a significant difference in 5hmC between the prefrontal cortex and cerebellum in our validation dataset (Additional file 2: Table S7), and there was a highly significant correlation in prefrontal cortex versus cerebellum differences at these loci between the two independent datasets (Fig. 4c; R = 0.49, P = 6.32E-33). Pathway analysis of these top 1,000 TS-HMPs showed an enrichment for various neurobiological processes that distinguish the cortex and cerebellum (Additional file 2: Table S8), for example acetylcholine binding (P = 1.51E-04), dopaminergic neuron differentiation (P = 1.03E-03), and cerebellar Purkinje cell layer morphogenesis (P = 1.83E-03).
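Selecting TS-HMPs amounts to ranking probes by the size of the between-region difference in mean 5hmC. The sketch below assumes the ranking uses the absolute difference in mean ΔβBS-oxBS (cerebellum minus prefrontal cortex); the text does not specify the exact ranking statistic, and the probe identifiers and values are hypothetical.

```python
def top_tissue_specific(pfc_mean, cer_mean, n=1000):
    """Return the n probes with the largest absolute difference in mean 5hmC.

    pfc_mean, cer_mean: dicts mapping probe ID -> mean delta-beta in each region.
    Each result is (probe ID, cerebellum minus prefrontal cortex difference).
    """
    shared = pfc_mean.keys() & cer_mean.keys()
    diffs = {probe: cer_mean[probe] - pfc_mean[probe] for probe in shared}
    # Sort by decreasing absolute difference; break ties by probe ID for determinism.
    ranked = sorted(diffs, key=lambda probe: (-abs(diffs[probe]), probe))
    return [(probe, diffs[probe]) for probe in ranked[:n]]


# Toy example with hypothetical probe IDs.
toy_pfc = {"cg01": 0.10, "cg02": 0.40, "cg03": 0.12}
toy_cer = {"cg01": 0.45, "cg02": 0.10, "cg03": 0.15}
toy_top = top_tissue_specific(toy_pfc, toy_cer, n=2)
```

The sign of each returned difference indicates the direction of the effect: positive values correspond to higher 5hmC in the cerebellum, negative values to higher 5hmC in the prefrontal cortex.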

Although there is an overall depletion of ‘detectable’ 5hmC in CpG islands (Table 1) when compared to the genic distribution of all probes on the 450K array, it is notable that sites with the highest levels of 5hmC in both the prefrontal cortex and cerebellum are enriched in CpG islands (OR = 1.25, P = 0.022 and OR = 3.46, P = 7.01E-55, respectively) when compared to the distribution of the 79,263 sites with ‘detectable’ 5hmC (Table 2; Additional file 1: Figure S3A). Furthermore, we found a significant enrichment of TS-HMPs in CpG islands (OR = 4.23, P = 6.37E-79) (Table 3; Additional file 1: Figure S3A) when compared to the 79,263 sites with ‘detectable’ 5hmC. Conversely, TS-HMPs are depleted in both CpG island shores (OR = 0.66, P = 6.2E-08) and CpG island shelves (OR = 0.59, P = 2.64E-07). Interestingly, we have previously shown that tissue-specific differentially methylated regions in the genome (TS-DMRs) are enriched in CpG island shores and shelves [2], indicating a clear distinction between the location of tissue-specific DNA methylation and hydroxymethylation. The most hydroxymethylated sites in the prefrontal cortex and cerebellum showed similar patterns with respect to an enrichment at TFBS (prefrontal cortex: OR = 2.53, P = 2.57E-47; cerebellum: OR = 1.31, P = 4.97E-05) and DHS (prefrontal cortex: OR = 1.54, P = 3.95E-06; cerebellum: OR = 1.45, P = 8.42E-05), but some tissue differences in their presence at alternative events (Additional file 1: Figure S3C), and within gene features (Additional file 1: Figure S3B). For example, the most hydroxymethylated sites are over-represented in the proximal promoter in the prefrontal cortex (OR = 2.11, P = 3.10E-27), but not the cerebellum, and there is an under-representation of the top hydroxymethylated sites at gene bodies in the prefrontal cortex (OR = 0.64, P = 5.30E-12), but an over-representation in the cerebellum (OR = 1.45, P = 2.60E-08).
The previous study investigating 5hmC in a single cerebellum sample also demonstrated that the highest levels of 5hmC are found in the gene body, within introns [23].

Table 2 The most hydroxymethylated loci in prefrontal cortex and cerebellum are enriched in distinct genomic regions
Table 3 The 1,000 top tissue-specific hydroxymethylated positions (TS-HMPs) between the prefrontal cortex and cerebellum are enriched in distinct genomic regions

Some sites are characterized by considerable inter-individual variation in 5hmC

Given the hypothesized role of 5hmC in health and disease, we were interested in identifying regions of the genome characterized by inter-individual variation in 5hmC within both the prefrontal cortex (Additional file 2: Table S9) and cerebellum (Additional file 2: Table S10). The 1,000 top-ranked variable 5hmC sites were enriched in CpG islands both in the prefrontal cortex (OR = 1.72, P = 4.03E-09) and the cerebellum (OR = 3.92, P = 2.41E-69) (Table 4; Additional file 1: Figure S3A). Furthermore, in both tissues, highly-variable 5hmC sites were under-represented in intergenic regions (Additional file 1: Figure S3B) (prefrontal cortex: OR = 0.84, P = 0.032; cerebellum: OR = 0.63, P = 3.86E-05) and in the proximal promoter (prefrontal cortex: OR = 0.76, P = 1.10E-03; cerebellum: OR = 0.47, P = 1.03E-16). However, there were some differences between the genic location of the most variable sites between the prefrontal cortex and the cerebellum; the most variable sites in the prefrontal cortex were under-represented in CpG island shelves (OR = 0.69, P = 1.82E-07), while the most variable sites in the cerebellum were under-represented in CpG island shores (OR = 0.52, P = 9.25E-17). Furthermore, in the gene body there was a significant over-representation of the most variable sites in the cerebellum (OR = 1.28, P = 1.78E-04), but not in the prefrontal cortex (OR = 1.00, P = 1.000). Interestingly, over-/under-representation of sites at alternative events (Additional file 1: Figure S3C) was only seen for the most variable probes in the cerebellum and not in the prefrontal cortex, indicating that 5hmC may play a role in regulating gene expression through splicing in a tissue-specific manner.
Despite observing genic differences in the most variable sites between individuals in the prefrontal cortex and cerebellum, there was a significant correlation in the inter-individual differences between regions, with the 1,000 most variable sites in the prefrontal cortex showing a similar degree of variability in the cerebellum (Additional file 1: Figure S4A; R = 0.30, P = 6.14E-17), and similarly the most variable sites in the cerebellum showing a similar degree of variability in the prefrontal cortex (Additional file 1: Figure S4B; R = 0.24, P = 5.4E-12).

Table 4 The most variable hydroxymethylated loci in prefrontal cortex and cerebellum are enriched in distinct genomic regions

A small proportion of DMPs identified by epigenome-wide association studies (EWAS) using standard BS approaches may actually reflect differences in 5hmC

Standard BS-treatment has been used in conjunction with the Illumina 450K methylation array in a growing number of EWAS analyses to identify differences in DNA methylation associated with exposure and disease. Given that this approach actually provides a cumulative measure of both 5mC and 5hmC, it is plausible that variation in 5hmC could confound findings that have largely been attributed to variation in DNA methylation. When we identified regions with the greatest difference in BS-generated data between the prefrontal cortex and cerebellum (Additional file 2: Table S11), these changes significantly correlated with differences in oxBS-generated data (R = 0.537, P = 6.85E-37) (Additional file 1: Figure S5; Additional file 2: Table S11). However, differences at some loci appeared to be driven by 5hmC, rather than 5mC variation, with eight of the top 1,000 BS tissue differences being driven more by a 5hmC difference than a 5mC difference. Looking across a canonical gene, we actually see no difference in DNA modification levels between the prefrontal cortex and the cerebellum when using standard BS-treated DNA (Fig. 5); however, we do see a considerable difference in true DNA methylation levels as determined using oxBS-treated DNA, with higher levels at the 3’ end of the gene in the prefrontal cortex than the cerebellum. We therefore recommend running BS and oxBS 450K arrays in parallel for investigating the role of DNA methylation in cross-tissue studies of complex disease.
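The relationship between the BS and oxBS signals can be written out explicitly. Treating the BS beta value as an estimate of 5mC + 5hmC and the oxBS beta value as an estimate of 5mC alone, the between-region difference seen in BS data decomposes into a 5mC term and a 5hmC term. The final inequality is one plausible reading of "driven more by a 5hmC difference" and is stated here as an assumption; the symbol D is introduced only for this illustration.

```latex
\begin{align*}
\beta_{\mathrm{BS}} &\approx 5\mathrm{mC} + 5\mathrm{hmC}, \qquad
\beta_{\mathrm{oxBS}} \approx 5\mathrm{mC}, \qquad
\Delta\beta_{\mathrm{BS\text{-}oxBS}} = \beta_{\mathrm{BS}} - \beta_{\mathrm{oxBS}} \approx 5\mathrm{hmC} \\[4pt]
D_{\mathrm{BS}} &= \beta_{\mathrm{BS}}^{\mathrm{CER}} - \beta_{\mathrm{BS}}^{\mathrm{PFC}}
 = \underbrace{\left(\beta_{\mathrm{oxBS}}^{\mathrm{CER}} - \beta_{\mathrm{oxBS}}^{\mathrm{PFC}}\right)}_{D_{5\mathrm{mC}}}
 + \underbrace{\left(\Delta\beta_{\mathrm{BS\text{-}oxBS}}^{\mathrm{CER}} - \Delta\beta_{\mathrm{BS\text{-}oxBS}}^{\mathrm{PFC}}\right)}_{D_{5\mathrm{hmC}}} \\[4pt]
\text{5hmC-driven (assumed criterion):}\quad & \left|D_{5\mathrm{hmC}}\right| > \left|D_{5\mathrm{mC}}\right|
\end{align*}
```

When the two terms have opposite signs and similar magnitude, the BS difference approaches zero even though both modifications differ between regions, which is the masking effect illustrated in Fig. 5.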

Fig. 5

Canonical differences in DNA methylation in standard BS-treated DNA can be masked by 5hmC differences. There was no difference in levels of DNA modifications (5mC + 5hmC) in BS-treated DNA between the prefrontal cortex and the cerebellum. However, when examining true 5mC levels (oxBS) there are higher levels of DNA methylation at the 3’ end of the gene body in the prefrontal cortex than the cerebellum. Key: BS-treated (5mC + 5hmC levels) prefrontal cortex DNA (red), BS-treated (5mC + 5hmC levels) cerebellum DNA (green), oxBS-treated (5mC levels only) prefrontal cortex DNA (blue), oxBS-treated (5mC levels only) cerebellum DNA (pink)

Conclusions

This study demonstrates the utility of combining oxBS-treatment with the Illumina 450k methylation array to systematically quantify 5hmC across the genome. Our study highlights region-specific patterns of 5hmC in the human brain, with overall higher levels observed in the cerebellum than the prefrontal cortex, and notable differences in the genomic location of the most hydroxymethylated loci between these brain regions. Loci demonstrating the greatest differences between brain regions (TS-HMPs) are highly enriched at CpG islands and in the gene body. We also identify considerable inter-individual variation in 5hmC at a subset of loci within each brain region, with these variable regions being particularly enriched in CpG islands and depleted in intergenic regions and the proximal promoter. Finally, we were able to confirm our findings in a second independent set of matched prefrontal cortex and cerebellum samples. Given the enrichment of 5hmC in the vicinity of genes involved in nervous system development and function, and the inability to distinguish this modification from 5mC using standard BS-based methods, we propose that approaches described here can be used to interrogate the role of 5hmC in neurological/neuropsychiatric phenotypes and disease.

Methods

Sample preparation

Our discovery cohort comprised prefrontal cortex and cerebellum samples dissected from eight individuals. First, brain tissue from six control donors, with no evidence of neurological impairment, was obtained from the London Neurodegenerative Disease Brain Bank (LNDBB) (http://www.kcl.ac.uk/ioppn/depts/bcn/Our-research/Neurodegeneration/brain-bank.aspx). For each sample, genomic DNA was extracted from 100 mg of tissue using a standard phenol-chloroform extraction method. Additionally, two control cerebellum samples, provided by CEGX, were run alongside. Sample characteristics for all discovery samples are detailed in Additional file 2: Table S12. Validation data were generated from matched prefrontal cortex and cerebellum samples from 18 control donors, with no evidence of neurological impairment, currently being profiled as part of an independent study in our lab.

Bisulfite (BS) and oxidative-bisulfite (oxBS) treatment

For each sample, DNA was treated using: (1) the Zymo BS conversion kit; (2) the Cambridge Epigenetix BS conversion module; and (3) the Cambridge Epigenetix oxBS conversion module. For Zymo BS, 500 ng DNA from each sample was sodium BS-treated using the Zymo EZ 96 DNA methylation kit (Zymo Research) according to the manufacturer’s standard protocol. For CEGX BS and CEGX oxBS samples, 4 μg of high molecular weight genomic DNA was sheared into <10 kb fragments using a Covaris g-TUBE. The fragmented DNA was subsequently concentrated into a total volume of 40 μL by passing the sample through a GeneJET purification column. The 40 μL was split into two 20 μL samples and processed using the TrueMethyl kit following the manufacturer’s instructions. We performed enzyme digestion of the CEGX conversion controls as recommended by the manufacturer, and all samples showed satisfactory conversion (see Additional file 1: Figure S6 for example output).

Illumina Infinium BeadArray

DNA modifications were quantified using the Illumina Infinium Human 450K Methylation Array according to the manufacturer’s instructions, with minor amendments. In brief, the DNA input to the MSA4 plate for whole genome amplification was increased from 4 μL to 7 μL of CEGX BS/oxBS treated DNA. To compensate for the increased DNA volume, the concentration of NaOH was increased to 0.4 M and only 1 μL added to the MSA4 plate.

Quality control (QC) and data normalization

All computations and statistical analyses were performed within the R statistical environment (version 3.1.2) [27] and Bioconductor [28]. Signal intensities were imported into R using the methylumi package [29]. Initial QC checks were performed to assess concordance between reported and predicted gender. Non-CpG SNP probes on the array were used to confirm that samples sourced from the same individual were genetically identical (Additional file 1: Figure S7). Data were pre-processed using wateRmelon (version 1.4.0) [30], with a custom P filter threshold that excluded samples in which more than 5 % of sites had a detection P value >0.05. No precedents have yet been set for pre-processing and normalizing oxBS data. We therefore tested all of the different normalization strategies available within the wateRmelon package. Although other normalization strategies scored highly on individual metrics, data normalized using dasen scored consistently well across all metrics (Additional file 2: Table S13), and dasen was therefore used for data normalization. Non-CpG SNP probes, probes that have been reported to contain common (MAF >5 %) SNPs in the CG or single base extension position, and probes that were non-specific or mismapped [31, 32] were flagged and disregarded in the evaluation of our results, leaving 374,094 probes for analysis.

Data analysis

The level of 5hmC in each sample was estimated by subtracting the oxBS (CEGX) beta value from the BS (CEGX) beta value at each probe on the 450K array (ΔβBS-oxBS). A threshold for detection of 5hmC was established by determining the lowest fifth percentile in the data (that is, -0.09158275 in this study). We then applied the absolute value of this figure as a threshold for the positive data: sites with an average 5hmC level in either the prefrontal cortex or cerebellum above +0.09158275 were classified as having ‘detectable’ levels of 5hmC. Illumina 450K array probes were annotated using ENCODE annotation [25], and Fisher’s exact test was used to determine if 5hmC was enriched in specific genomic regions. Normalized 5hmC levels are available on our online analytical database HABIT for the 79,263 sites we identified as having ‘detectable’ 5hmC in one or both brain regions (http://epigenetics.iop.kcl.ac.uk/HMC/).
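The subtraction and thresholding steps described above can be illustrated with a minimal Python sketch. It is not the code used in the study (the analysis was performed in R). The function names, the toy beta values, the choice to pool all sample-level ΔβBS-oxBS values when taking the fifth percentile, and the linear-interpolation percentile are all assumptions made for illustration.

```python
from statistics import mean

def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0-100) of a list of numbers."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def delta_beta(bs, oxbs):
    """Per-probe, per-sample 5hmC estimate: BS beta minus oxBS beta."""
    return {probe: [b - o for b, o in zip(bs[probe], oxbs[probe])]
            for probe in bs}

def detectable_sites(delta_by_region, pct=5.0):
    """Return (threshold, probes whose mean delta exceeds it in any region).

    delta_by_region maps region name -> {probe: [delta per sample]}.
    Assumption: the percentile is taken over all pooled sample-level values,
    and its absolute value is mirrored onto the positive side.
    """
    pooled = [v for region in delta_by_region.values()
              for vals in region.values() for v in vals]
    threshold = abs(percentile(pooled, pct))
    hits = {probe
            for region in delta_by_region.values()
            for probe, vals in region.items()
            if mean(vals) > threshold}
    return threshold, sorted(hits)

# Toy data (hypothetical probes and beta values, two samples per probe).
bs_toy = {"cg1": [0.80, 0.82], "cg2": [0.50, 0.48], "cg3": [0.30, 0.31]}
oxbs_toy = {"cg1": [0.60, 0.60], "cg2": [0.52, 0.50], "cg3": [0.40, 0.39]}
threshold, hits = detectable_sites({"PFC": delta_beta(bs_toy, oxbs_toy)})
print(round(threshold, 3), hits)
```

In this toy example only the hypothetical probe cg1 has a mean ΔβBS-oxBS above the mirrored threshold, so it is the only site classified as detectable.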

Pathway analyses

Illumina UCSC gene annotation was used to create a test gene list from the hydroxymethylated probes in the prefrontal cortex (N = 37,145) and the cerebellum (N = 65,563), or the TS-HMPs (N = 1,000). A logistic regression approach was used to test whether gene lists predicted pathway membership while controlling for the number of probes annotated to each gene. Pathways were downloaded from the Gene Ontology website (http://geneontology.org/), and all genes annotated to parent terms were also included. All genes annotated to at least one Illumina probe and to at least one GO pathway were considered. Pathways were restricted to those containing between 10 and 2,000 genes. After applying this method to all pathways, significant pathways (P <0.05) were taken and grouped where overlapping genes explained the signal. This was achieved by taking the most significant pathway and retesting all remaining significant pathways while additionally controlling for the best term. If the test genes no longer predicted the pathway, the term was said to be explained by the most significant pathway, and these pathways were grouped together. This algorithm was repeated, taking the next most significant term, until every pathway had either been considered as the most significant term or been found to be explained by a more significant term.
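The iterative grouping procedure described above can be sketched as a greedy loop. This is only an illustration: in the study the retest was a logistic regression that additionally controlled for the lead term, whereas here that retest is passed in as a hypothetical callable, still_significant, and the toy example replaces it with a simple gene-overlap rule. All names and values below are assumptions.

```python
def group_pathways(pvalues, still_significant, alpha=0.05):
    """Greedily group significant pathways under their most significant term.

    pvalues: {pathway: P value from the initial test}.
    still_significant(pathway, lead): returns True if the test genes still
    predict `pathway` after additionally controlling for `lead`
    (hypothetical stand-in for the conditional logistic regression).
    Returns {lead pathway: [pathways explained by it]}.
    """
    remaining = sorted((p for p in pvalues if pvalues[p] < alpha),
                       key=lambda p: pvalues[p])
    groups = {}
    while remaining:
        lead = remaining.pop(0)  # most significant term not yet handled
        explained = [p for p in remaining if not still_significant(p, lead)]
        groups[lead] = explained
        remaining = [p for p in remaining if p not in explained]
    return groups

# Toy example: a Jaccard overlap rule replaces the regression retest.
toy_genes = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {7, 8, 9}, "D": {8, 9}}
toy_pvalues = {"A": 0.001, "B": 0.01, "C": 0.02, "D": 0.04, "E": 0.2}

def toy_retest(pathway, lead):
    shared = len(toy_genes[pathway] & toy_genes[lead])
    union = len(toy_genes[pathway] | toy_genes[lead])
    return shared / union < 0.5  # low overlap: signal survives conditioning

print(group_pathways(toy_pvalues, toy_retest))
```

Passing the retest in as a function keeps the grouping logic separate from the statistical model, so the same loop could be reused with a real conditional regression.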

Determining 5mC and 5hmC in a canonical gene

A sliding window approach was used to calculate average 5mC and 5hmC levels across a canonical gene. To investigate canonical differences in 5hmC, we calculated the moving average for 5mC (oxBS) and 5hmC (ΔβBS-oxBS) across the 79,263 loci deemed to be hydroxymethylated, using overlapping 1 % sliding windows from 5 kb upstream of the gene to 5 kb downstream of the gene. To investigate canonical differences in data derived from BS-treated DNA, which may be driven by 5hmC differences, we calculated the moving average for BS-treated DNA (5mC + 5hmC) and for oxBS-treated DNA (5mC) across the 374,094 loci that passed QC, using the same windows.
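A moving average over a canonical gene can be sketched as follows. The text does not specify how probe coordinates were scaled or how far consecutive windows overlapped, so this sketch assumes that the whole span (5 kb upstream through 5 kb downstream) is mapped linearly onto the interval 0 to 1, that each window covers 1 % of that span, and that windows advance in 0.5 % steps. The function names and toy values are hypothetical.

```python
def relative_position(coord, gene_start, gene_end, strand="+", flank=5000):
    """Map a genomic coordinate onto [0, 1] across flank + gene + flank.

    Assumption: one linear scale over the whole span; for genes on the
    minus strand the scale is reversed so that 0 is always upstream.
    """
    lo, hi = gene_start - flank, gene_end + flank
    rel = (coord - lo) / (hi - lo)
    return rel if strand == "+" else 1.0 - rel

def sliding_window_means(positions, values, width=0.01, step=0.005):
    """Mean value in each overlapping window along the [0, 1] scale.

    Returns a list of (window_start, mean) tuples; mean is None when no
    probe falls in the window. The last window also includes position 1.0.
    """
    n_windows = int(round((1.0 - width) / step)) + 1
    out = []
    for i in range(n_windows):
        start = i * step
        end = start + width
        last = i == n_windows - 1
        inside = [v for p, v in zip(positions, values)
                  if p >= start and (p < end or last)]
        out.append((start, sum(inside) / len(inside) if inside else None))
    return out

# Toy example with deliberately wide windows so the output is easy to read.
toy_positions = [0.1, 0.3, 0.6, 0.9]
toy_values = [1.0, 2.0, 3.0, 4.0]
print(sliding_window_means(toy_positions, toy_values, width=0.5, step=0.25))
```

With the default arguments the function yields 199 overlapping 1 % windows; probes pooled across many genes would first be converted with relative_position so that they share a common scale.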

Ethics

Brain samples were obtained from the London Neurodegenerative Disease Brain Bank (LNDBB) (http://www.kcl.ac.uk/ioppn/depts/bcn/Our-research/Neurodegeneration/brain-bank.aspx). Samples were provided with informed consent according to the Declaration of Helsinki (1991) and ethical approval for the study was provided by the NHS South East London REC 3.

Data availability

All microarray data have been uploaded to GEO and are available under accession number GSE74368.

Abbreviations

5caC:

5-carboxylcytosine

5fC:

5-formylcytosine

5hmC:

5-hydroxymethylcytosine

5mC:

5-methylcytosine

A3SS:

Alternative 3’ splice site

A5SS:

Alternative 5’ splice site

AFE:

Alternative first exon

ALE:

Alternative last exon

BS:

Bisulfite treatment

CE:

Cassette exon

CGI:

CpG island

CNE:

Constitutive exon

EI:

Exon isoforms

EWAS:

Epigenome-wide association study

II:

Intron isoforms

IR:

Intron retention

MXE:

Mutually exclusive exon

oxBS:

Oxidative-bisulfite treatment

QC:

Quality control

References

  1. Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–30.

  2. Davies MN, Volta M, Pidsley R, Lunnon K, Dixit A, Lovestone S, et al. Functional annotation of the human brain methylome identifies tissue-specific epigenetic variation across brain and blood. Genome Biol. 2012;13:R43.

  3. Spiers H, Hannon E, Schalkwyk L, Smith R, Wong CCY, O’Donovan M, et al. Methylomic trajectories across human fetal brain development. Genome Res. 2015;25:338–52.

  4. Lunnon K, Smith R, Hannon EJ, De Jager PL, Srivastava G, Volta M, et al. Cross-tissue methylomic profiling in Alzheimer’s disease implicates a role for cortex-specific deregulation of ANK1 in neuropathology. Nat Neurosci. 2014;17:1164–70.

  5. Pidsley R, Viana J, Hannon E, Spiers HH, Troakes C, Al-Saraj S, et al. Methylomic profiling of human brain tissue supports a neurodevelopmental origin for schizophrenia. Genome Biol. 2014;15:483.

  6. Ladd-Acosta C, Hansen KD, Briem E, Fallin MD, Kaufmann WE, Feinberg AP. Common DNA methylation alterations in multiple brain regions in autism. Mol Psychiatry. 2014;19:862–71.

  7. Huynh JL, Garg P, Thin TH, Yoo S, Dutta R, Trapp BD, et al. Epigenome-wide differences in pathology-free regions of multiple sclerosis-affected brains. Nat Neurosci. 2014;17:121–30.

  8. Song CX, Yi C, He C. Mapping recently identified nucleotide variants in the genome and transcriptome. Nat Biotechnol. 2012;30:1107–16.

  9. Jin SG, Kadam S, Pfeifer GP. Examination of the specificity of DNA methylation profiling techniques towards 5-methylcytosine and 5-hydroxymethylcytosine. Nucleic Acids Res. 2010;38:e125.

  10. Wossidlo M, Nakamura T, Lepikhov K, Marques CJ, Zakhartchenko V, Boiani M, et al. 5-Hydroxymethylcytosine in the mammalian zygote is linked with epigenetic reprogramming. Nat Commun. 2011;2:241.

  11. Jin SG, Wu X, Li AX, Pfeifer GP. Genomic mapping of 5-hydroxymethylcytosine in the human brain. Nucleic Acids Res. 2011;39:5015–24.

  12. Stroud H, Feng S, Morey Kinney S, Pradhan S, Jacobsen SE. 5-Hydroxymethylcytosine is associated with enhancers and gene bodies in human embryonic stem cells. Genome Biol. 2011;12:R54.

  13. Booth MJ, Branco MR, Ficz G, Oxley D, Krueger F, Reik W, et al. Quantitative sequencing of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution. Science. 2012;336:934–7.

  14. Yu M, Hon GC, Szulwach KE, Song CX, Zhang L, Kim A, et al. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell. 2012;149:1368–80.

  15. Song CX, Szulwach KE, Fu Y, Dai Q, Yi C, Li X, et al. Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine. Nat Biotechnol. 2011;29:68–72.

  16. Nestor CE, Ottaviano R, Reddington J, Sproul D, Reinhardt D, Dunican D, et al. Tissue type is a major modifier of the 5-hydroxymethylcytosine content of human genes. Genome Res. 2012;22:467–77.

  17. Khare T, Pai S, Koncevicius K, Pal M, Kriukiene E, Liutkeviciute Z, et al. 5-hmC in the brain is abundant in synaptic genes and shows differences at the exon-intron boundary. Nat Struct Mol Biol. 2012;19:1037–43.

  18. Condliffe D, Wong A, Troakes C, Proitsi P, Patel Y, Chouliaras L, et al. Cross-region reduction in 5-hydroxymethylcytosine in Alzheimer’s disease brain. Neurobiol Aging. 2014;35:1850–4.

  19. Chouliaras L, Mastroeni D, Delvaux E, Grover A, Kenis G, Hof PR, et al. Consistent decrease in global DNA methylation and hydroxymethylation in the hippocampus of Alzheimer’s disease patients. Neurobiol Aging. 2013;34:2091–9.

  20. Ito S, Shen L, Dai Q, Wu SC, Collins LB, Swenberg JA, et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science. 2011;333:1300–3.

  21. Booth MJ, Ost TW, Beraldi D, Bell NM, Branco MR, Reik W, et al. Oxidative bisulfite sequencing of 5-methylcytosine and 5-hydroxymethylcytosine. Nat Protoc. 2013;8:1841–51.

  22. Stewart SK, Morris TJ, Guilhamon P, Bulstrode H, Bachman M, Balasubramanian S, et al. oxBS-450K: a method for analysing hydroxymethylation using 450K BeadChips. Methods. 2015;72:9–15.

  23. Field SF, Beraldi D, Bachman M, Stewart SK, Beck S, Balasubramanian S. Accurate measurement of 5-methylcytosine and 5-hydroxymethylcytosine in human cerebellum DNA by oxidative bisulfite on an array (OxBS-array). PLoS One. 2015;10:e0118202.

  24. Szulwach KE, Li X, Li Y, Song CX, Wu H, Dai Q, et al. 5-hmC-mediated epigenetic dynamics during postnatal neurodevelopment and aging. Nat Neurosci. 2011;14:1607–16.

  25. Slieker RC, Bos SD, Goeman JJ, Bovee JV, Talens RP, van der Breggen R, et al. Identification and systematic annotation of tissue-specific differentially methylated regions using the Illumina 450k array. Epigenetics Chromatin. 2013;6:26.

  26. Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74.

  27. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2012.

  28. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80.

  29. Davis S, Du P, Bilke S, Triche T Jr, Bootwalla M. methylumi: Handle Illumina Methylation Data. R package version 2.2.0. 2012.

  30. Pidsley R, Wong CCY, Volta M, Lunnon K, Mill J, Schalkwyk LC. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013;14:293.

  31. Blair JD, Price EM. Illuminating potential technical artifacts of DNA-methylation array probes. Am J Hum Genet. 2012;91:760–2.

  32. Chen YA, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics. 2013;8:203–9.

Acknowledgements

The authors would like to thank the LNDBB and Brains for Dementia Research (BDR) and their donors for provision of tissue for the study. This work was funded by NIH grant AG036039 to JM, UK Medical Research Council grant MR/K013807/1 to JM, and Alzheimer’s Association New Investigator Research Grant NIRG-14-320878 to KL.

Author information

Corresponding author

Correspondence to Katie Lunnon.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

KL and JB conducted laboratory experiments. KL, EH, RS, and ED undertook data analysis and bioinformatics. CT and SAS provided samples for analysis. CW and AK provided validation data. LS developed the online analytical tool. JM conceived and supervised the project. KL and JM drafted the manuscript. All authors read and approved the final submission.

Additional files

Additional file 1:

Supplementary Figures 1–7 in a single pdf, with legends embedded in each figure. (PDF 1 mb)

Additional file 2:

Supplementary Tables 1–13 in a single Excel file as separate tabs, with legends embedded in each table. (XLSX 14105 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Lunnon, K., Hannon, E., Smith, R.G. et al. Variation in 5-hydroxymethylcytosine across human cortex and cerebellum. Genome Biol 17, 27 (2016). https://doi.org/10.1186/s13059-016-0871-x

Keywords