Variation in 5-hydroxymethylcytosine across human cortex and cerebellum
Genome Biology volume 17, Article number: 27 (2016)
The most widely utilized approaches for quantifying DNA methylation involve the treatment of genomic DNA with sodium bisulfite; however, this method cannot distinguish between 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). Previous studies have shown that 5hmC is enriched in the brain, although little is known about its genomic distribution and how it differs between anatomical regions and individuals. In this study, we combine oxidative bisulfite (oxBS) treatment with the Illumina Infinium 450K BeadArray to quantify genome-wide patterns of 5hmC in two distinct anatomical regions of the brain from multiple individuals.
We identify 37,145 and 65,563 sites passing our threshold for detectable 5hmC in the prefrontal cortex and cerebellum respectively, with 23,445 loci common across both brain regions. Distinct patterns of 5hmC are identified in each brain region, with notable differences in the genomic location of the most hydroxymethylated loci between these brain regions. Tissue-specific patterns of 5hmC are subsequently confirmed in an independent set of prefrontal cortex and cerebellum samples.
This study represents the first systematic analysis of 5hmC in the human brain, identifying tissue-specific hydroxymethylated positions and genomic regions characterized by inter-individual variation in DNA hydroxymethylation. This study demonstrates the utility of combining oxBS-treatment with the Illumina 450k methylation array to systematically quantify 5hmC across the genome and the potential utility of this approach for epigenomic studies of brain disorders.
Epigenetic modifications to DNA play a critical role in establishing and maintaining cellular phenotype. Recent studies highlight widespread changes in DNA methylation occurring during neurodevelopment, with tissue-specific methylomic variation present between discrete regions of the human brain [2, 3]. Epigenetic processes control key neurobiological and cognitive processes in the brain, and their importance is highlighted by evidence implicating methylomic variation in a number of neuropsychiatric and neurodegenerative diseases, including multiple sclerosis, autism, Alzheimer’s disease, and schizophrenia [4–7].
Although 5-methylcytosine (5mC) is the best understood and most studied epigenetic modification modulating transcriptional plasticity in the mammalian genome, three additional DNA modifications (5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC)) have been recently described. These modifications are thought to represent intermediates in the demethylation of 5mC to unmodified cytosine, although recent data suggest there are specific functional roles for 5hmC. For example, 5hmC is specifically recognized by key binding proteins, and can be maintained through cell division. The exact genomic distribution of 5hmC is still debated; some studies have reported 5hmC in gene promoters and gene bodies [11, 12], while others have shown a depletion of 5hmC in CpG islands and an enrichment outside of CG-rich regions [13, 14]. It has been shown that 5hmC occurs at relatively high levels in the cerebellum and other regions of the brain [15, 16], where it is particularly enriched in the vicinity of genes with synapse-related functions. Of note, recent studies have reported global alterations in 5hmC in Alzheimer’s disease [18, 19], supporting a role in health and disease.
Until recently it has not been possible to sensitively quantify 5hmC at base-pair resolution in the genome across large numbers of samples. Furthermore, many of the existing methods routinely used to interrogate DNA methylation (that is, those based on sodium bisulfite (BS) conversion and methylation-sensitive restriction enzyme cleavage) are unable to discriminate between 5mC and 5hmC. The recent development of oxidative bisulfite (oxBS) treatment [13, 21], however, which involves the oxidation of 5hmC to 5fC before BS conversion, allows both a direct measurement of absolute 5mC and a proxy measure of 5hmC. Two recent papers demonstrated that oxBS conversion can be integrated with the Illumina 450K HumanMethylation (450K) array to facilitate the systematic quantification of both 5mC and 5hmC across the genome [22, 23]. In this study we used a commercially available oxBS treatment kit (TrueMethyl-CEGX, Cambridge, UK) in conjunction with the Illumina 450K array to compare the distribution of 5hmC across two regions of the human brain (prefrontal cortex and cerebellum) dissected from eight donors. We subsequently confirmed our findings in an independent set of matched prefrontal cortex and cerebellum samples dissected from an additional 18 individuals.
Results and discussion
Identifying differences in hydroxymethylated sites between cortex and cerebellum
The aim of the study was to compare 5hmC in matched postmortem prefrontal cortex and cerebellum samples from multiple donors using a commercially available oxBS conversion kit in combination with the Illumina 450K Human Methylation array. Briefly, the level of 5hmC at specific sites is quantified by subtracting oxBS-generated 450K array profiles from those generated following a BS-conversion performed in parallel. Each sample in this study was also profiled following a standard BS-conversion protocol using the Zymo EZ DNA methylation kit. Following normalization, the distribution of beta values was highly consistent across both BS methods (CEGX vs. Zymo) (Additional file 1: Figure S1A), with a highly significant correlation observed in both the prefrontal cortex (Additional file 1: Figure S1B; R2 = 0.99, P <2.2E-16) and cerebellum samples (Additional file 1: Figure S1C; R2 = 0.99, P <2.2E-16). These data indicate that the CEGX BS conversion protocol yields data that are directly comparable to data generated using standard BS conversion kits widely employed prior to 450K array processing.
We were interested in establishing the location of sites characterized by ‘detectable’ 5hmC and, building on our previous data demonstrating region-specific patterns of 5mC in the human brain, the extent to which levels of 5hmC differed between the prefrontal cortex and cerebellum. 5hmC levels were calculated by subtracting the oxBS beta-value from the BS beta value at each probe on the 450K array (ΔβBS-oxBS) (see Methods). As expected, the distribution of ΔβBS-oxBS values was positively-skewed (Fig. 1a), although a small proportion of probes in each sample were characterized by a negative ΔβBS-oxBS value, likely resulting from technical variance inherent in the Illumina array protocol. We therefore set a stringent threshold for calling 5hmC based on the 95th percentile of negative ΔβBS-oxBS values across all profiled samples, to ensure we only analyzed probes characterized by ‘detectable’ levels of 5hmC. In this dataset, therefore, only sites with an average ΔβBS-oxBS level in either tissue >0.09158275 were classified as having ‘detectable’ levels of 5hmC. Using this threshold, we identified a total of 79,263 loci characterized by ‘detectable’ 5hmC in one or both brain regions.
Of note, there was a striking difference in the prevalence of 5hmC-positive sites between the prefrontal cortex and cerebellum (Fig. 1a); we identified 37,145 (13,700 unique) and 65,563 (42,118 unique) probes with an average 5hmC level above threshold in the prefrontal cortex and cerebellum, respectively, with 23,445 probes characterized by ‘detectable’ 5hmC in both regions of the brain (Additional file 2: Table S1, Additional file 2: Table S2, and Fig. 1b). Of the 37,145 sites with ‘detectable’ 5hmC in the prefrontal cortex we observed a small but significant correlation with 5hmC level at the same sites in the cerebellum (Fig. 1c; adjusted R2 = 0.097, P <2.2E-16). Similarly for the 65,563 sites with ‘detectable’ 5hmC in the cerebellum we observed a significant correlation with 5hmC in the prefrontal cortex (Fig. 1d; adjusted R2 = 0.132, P <2.2E-16). As a resource to other researchers interested in the distribution of 5hmC in the brain, average ΔβBS-oxBS levels for each of the 79,263 probes on the 450K array characterized by ‘detectable’ 5hmC in one or both brain regions can be explored in the Hydroxymethylation Annotation in Brain Integrative Tool (HABIT) at our laboratory website (http://epigenetics.iop.kcl.ac.uk/HMC/). The tool also integrates annotated UCSC tracks to enable visualization of average 5hmC levels in both brain regions.
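The probe counts reported above are internally consistent, which a few lines of Python can confirm. This is only an arithmetic check on the published totals; the variable names are illustrative and not taken from the authors' analysis code.

```python
# Reported numbers of probes with 'detectable' 5hmC (taken from the text above).
pfc_total = 37_145   # prefrontal cortex
cer_total = 65_563   # cerebellum
shared = 23_445      # detectable in both brain regions

# Probes unique to a region are the regional total minus the shared set.
pfc_unique = pfc_total - shared
cer_unique = cer_total - shared

# The union counts each probe detectable in at least one region exactly once.
union = pfc_unique + cer_unique + shared

print(pfc_unique, cer_unique, union)  # 13700 42118 79263
```

The union of 79,263 matches the number of probes made available through HABIT.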
The distribution of 5hmC differs depending on genic location and CG density
Given that the abundance of 5mC is known to vary across the genome, we were interested in whether there is an enrichment of 5hmC in certain annotated regions of the genome. Although the Illumina 450K array does not enable an assessment of all potentially hydroxymethylated sites in the human genome, it is the most widely-used tool in epigenetic epidemiology and covers 99 % of RefSeq genes, with an average of 17 CpG sites per gene region distributed across the promoter, 5'UTR, first exon, gene body, and 3'UTR. We found that ‘detectable’ 5hmC is highly depleted in CpG islands in both brain regions (prefrontal cortex: OR = 0.18, P <2.53E-294; cerebellum: OR = 0.23, P <2.53E-294). In contrast, 5hmC is enriched in CpG island shores (prefrontal cortex: OR = 1.55, P = 2.53E-294; cerebellum: OR = 1.30, P = 1.80E-159), CpG island shelves (prefrontal cortex: OR = 1.78, P = 3.93E-262; cerebellum: OR = 1.86, P <2.53E-294), and locations outside of CG-rich regions (prefrontal cortex: OR = 1.62, P <2.53E-294; cerebellum: OR = 1.68, P <2.53E-294) (Table 1, Fig. 2a). This is consistent with previous studies demonstrating a depletion of 5hmC in CpG islands and an enrichment outside of CG-rich regions [13, 14]. Furthermore, ‘detectable’ 5hmC was significantly enriched in both brain regions in the gene body (prefrontal cortex: OR = 1.90, P <2.53E-294; cerebellum: OR = 2.48, P <2.53E-294) (Table 1, Fig. 2b, c), and also downstream of annotated transcripts (prefrontal cortex: OR = 1.30, P = 1.95E-12; cerebellum: OR = 1.35, P = 2.04E-25). In contrast, in both brain regions, 5hmC was depleted at intergenic sites (prefrontal cortex: OR = 0.82, P = 8.46E-34; cerebellum: OR = 0.79, P = 5.13E-75) and the proximal promoter (prefrontal cortex: OR = 0.54, P <2.53E-294; cerebellum: OR = 0.40, P <2.53E-294). This is consistent with previous studies showing a decrease in brain 5hmC in intergenic regions and an enrichment of 5hmC in gene bodies.
Interestingly, 5hmC was modestly enriched in distal promoter sites in the prefrontal cortex (OR = 1.19, P = 5.16E-12), but not in the cerebellum (OR = 0.97, P = 0.166). These data concur with previous studies using oxBS in conjunction with the 450K array in smaller numbers of samples. Stewart et al. demonstrated a significant enrichment of probes with detectable 5hmC in gene bodies when investigating one unmatched cerebellum and frontal cortex sample, while Field et al. showed that loci with detectable 5hmC are enriched in the gene body (exonic and intronic) and regions downstream of the gene in a single cerebellum sample.
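The odds ratios and P values quoted in this section come from Fisher's exact test on 2x2 contingency tables (see Methods). The sketch below shows how such a test can be computed from first principles in Python. The table layout (probes with and without detectable 5hmC, inside and outside a given annotation) is our assumption about how the comparison was framed, and the counts are toy values rather than data from the study.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table sharing the
    observed margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, total)


# Toy table (hypothetical counts):
#                   in annotation   outside annotation
# detectable 5hmC         3                 1
# not detectable          1                 3
print(odds_ratio(3, 1, 1, 3))
print(fisher_exact_two_sided(3, 1, 1, 3))
```

An odds ratio above 1 indicates enrichment of detectable 5hmC in the annotation, and a value below 1 indicates depletion. The exhaustive sum is fine for small tables; for array-scale counts a statistics library would be the more practical choice.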
5hmC is enriched in certain functional elements and depleted in others
We used functional genomic annotation data from ENCODE [25, 26] to examine the distribution of 5hmC across regulatory regions of the genome in the brain. ‘Detectable’ 5hmC was found to be significantly depleted at transcription factor binding sites (TFBS) (prefrontal cortex: OR = 0.67, P = 8.40E-291; cerebellum: OR = 0.44, P <2.53E-294) and at DNAse I hypersensitivity sites (prefrontal cortex: OR = 0.88, P <7.49E-14; cerebellum: OR = 0.68, P = 5.80E-174) (Table 1).
We also examined alternative transcription events (Table 1; Fig. 2d) and found that 5hmC is significantly depleted in both brain regions at alternative first exons (AFE) (prefrontal cortex: OR = 0.72, P = 5.07E-76; cerebellum: OR = 0.56, P <2.53E-294) and constitutive exons (CNE) (prefrontal cortex: OR = 0.71, P = 1.99E-38; cerebellum: OR = 0.80, P = 3.28E-28). In contrast, 5hmC is significantly enriched in alternative last exons (ALE) (prefrontal cortex: OR = 1.49, P = 1.55E-35; cerebellum: OR = 1.59, P = 5.61E-81), cassette exons (CE) (prefrontal cortex: OR = 1.53, P = 2.14E-128; cerebellum: OR = 1.57, P = 1.17E-239), and mutually exclusive exons (MXE) (prefrontal cortex: OR = 1.44, P = 8.63E-41; cerebellum: OR = 1.39, P = 6.38E-52). This concurs with previous studies demonstrating elevated levels of 5hmC in CE in the human brain. Furthermore, there is evidence of cerebellum-specific enrichment of 5hmC at alternative 3’ splice sites (A3SS) (OR = 1.14, P = 2.63E-03), alternative 5’ splice sites (A5SS) (OR = 1.15, P = 1.11E-03), and exon isoforms (EI) (OR = 1.78, P = 1.39E-03), with no equivalent enrichment in matched prefrontal cortex samples. Finally, there is depletion of 5hmC at intron retention (IR) events in the prefrontal cortex (OR = 0.91, P = 8.9E-04), but not the cerebellum (OR = 1.02, P = 0.472).
Using a logistic regression method to identify biological pathways enriched for loci annotated to sites with ‘detectable’ 5hmC, stringently controlling for the number of probes annotated to each gene, we found considerable overlap in 5hmC-enriched pathways between the prefrontal cortex (Additional file 2: Table S3) and cerebellum (Additional file 2: Table S4), with the most significantly enriched pathway in both brain regions being nervous system development (prefrontal cortex: P = 1.5E-11; cerebellum: P = 4.1E-11).
Levels of 5hmC at specific sites differ between prefrontal cortex and cerebellum
After describing the genomic distribution of ‘detectable’ 5hmC, we were interested in estimating absolute levels of 5hmC at specific sites, and the extent to which these differ between brain regions and individuals. The canonical patterns of 5hmC and 5mC levels across the gene are shown in Fig. 3 for the 79,263 loci with detectable levels of 5hmC. 5mC levels across the gene are similar to those reported previously [2, 23], with a decrease in levels at the TSS, before a gradual increase through the gene body, and an eventual decrement downstream of the gene body. Interestingly although 5mC levels are similar at the TSS in both the prefrontal cortex and cerebellum, levels of 5mC were slightly elevated in the cerebellum at other regions along the gene, and more notably so at the 3’ end of the transcript. In contrast, 5hmC is characterized by a different genic pattern, with levels being consistently higher in the cerebellum than the prefrontal cortex across the entire gene, in addition to immediate upstream/downstream regions.
Additional file 2: Table S5 and Additional file 2: Table S6 list the 1,000 450K array sites with the highest estimated levels of 5hmC in the prefrontal cortex and cerebellum, respectively. Of the sites showing highest 5hmC in the prefrontal cortex (Additional file 2: Table S5), 349 did not exceed our detection threshold in the cerebellum. Similarly, of the sites showing highest 5hmC in the cerebellum (Additional file 2: Table S6), 651 did not exceed our detection threshold in the prefrontal cortex. These data suggest that although there is some similarity between brain regions, levels of 5hmC at individual sites are often tissue-specific.
In order to confirm our findings, we subsequently examined 5hmC levels at the top 1,000 sites in additional matched prefrontal cortex and cerebellum samples dissected from 18 independent donors (Additional file 2: Table S5; Additional file 2: Table S6). Estimates of 5hmC at these sites were highly concordant across datasets (median difference between discovery and replication datasets = 4.73 (prefrontal cortex) and 4.62 (cerebellum)). There is a highly significant correlation in estimated 5hmC levels between the discovery and validation datasets at these sites in both the prefrontal cortex (R = 0.52, P = 8.96E-36) and cerebellum (R = 0.71, P = 7.42E-54) (Fig. 4).
Additional file 2: Table S7 lists the top 1,000 sites characterized by the largest average differences in 5hmC between the prefrontal cortex and cerebellum, which we define as tissue-specific hydroxymethylated positions (TS-HMPs). Each of these 1,000 probes is characterized by at least 28 % hydroxymethylation difference between the brain regions. For the majority (96.5 %) of these probes, the difference was driven by increased 5hmC in the cerebellum compared to the prefrontal cortex (Additional file 1: Figure S2), although 35 of the 1,000 sites (3.5 %) did show elevated 5hmC in the prefrontal cortex compared to the cerebellum. A total of 997 out of the 1,000 loci showed a significant difference in 5hmC between the prefrontal cortex and cerebellum in our validation dataset (Additional file 2: Table S7), and there was a highly significant correlation in prefrontal cortex versus cerebellum differences at these loci between the two independent datasets (Fig. 4c; R = 0.49, P = 6.32E-33). Pathway analysis of these top 1,000 TS-HMPs showed an enrichment for various neurobiological processes that distinguish the cortex and cerebellum (Additional file 2: Table S8), for example acetylcholine binding (P = 1.51E-04), dopaminergic neuron differentiation (P = 1.03E-03), and cerebellar Purkinje cell layer morphogenesis (P = 1.83E-03).
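The ranking that defines TS-HMPs can be sketched as follows: average the 5hmC estimate per probe within each tissue, take the between-tissue difference, and keep the probes with the largest absolute difference. This is a minimal illustration with toy values. The function name, the probe identifiers, and the use of a simple mean difference are our assumptions, as the text does not specify the exact statistic.

```python
from statistics import fmean


def rank_tissue_differences(pfc, cer, top_n=1000):
    """Rank probes by |mean cerebellum 5hmC - mean cortex 5hmC|.

    pfc, cer: dicts mapping a probe ID to per-individual 5hmC estimates.
    Returns (probe_id, signed_difference) pairs, largest |difference| first.
    """
    diffs = {
        probe: fmean(cer[probe]) - fmean(pfc[probe])
        for probe in pfc.keys() & cer.keys()  # probes measured in both tissues
    }
    ranked = sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


# Toy data with hypothetical probe IDs (two individuals per tissue).
pfc = {"probe_A": [0.10, 0.12], "probe_B": [0.40, 0.44], "probe_C": [0.20, 0.20]}
cer = {"probe_A": [0.45, 0.47], "probe_B": [0.10, 0.12], "probe_C": [0.22, 0.24]}

top = rank_tissue_differences(pfc, cer, top_n=2)
# Fraction of top-ranked probes with higher 5hmC in the cerebellum.
cer_higher = sum(1 for _, diff in top if diff > 0) / len(top)
print(top, cer_higher)
```

A positive signed difference corresponds to higher 5hmC in the cerebellum, which is the direction reported for 96.5 % of the published TS-HMPs.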
Although there is an overall depletion of ‘detectable’ 5hmC in CpG islands (Table 1) when compared to the genic distribution of all probes on the 450K array, it is notable that sites with the highest levels of 5hmC in both the prefrontal cortex and cerebellum are enriched in CpG islands (OR = 1.25, P = 0.022 and OR = 3.46, P = 7.01E-55, respectively) when compared to the distribution of the 79,263 sites with ‘detectable’ 5hmC (Table 2; Additional file 1: Figure S3A). Furthermore, we found a significant enrichment of TS-HMPs in CpG islands (OR = 4.23, P = 6.37E-79) (Table 3; Additional file 1: Figure S3A) when compared to the 79,263 sites with ‘detectable’ 5hmC. Conversely, TS-HMPs are depleted in both CpG island shores (OR = 0.66, P = 6.2E-08) and CpG island shelves (OR = 0.59, P = 2.64E-07). Interestingly, we have previously shown that tissue-specific differentially methylated regions in the genome (TS-DMRs) are enriched in CpG island shores and shelves, indicating a clear distinction between the location of tissue-specific DNA methylation and hydroxymethylation. The most hydroxymethylated sites in the prefrontal cortex and cerebellum showed similar patterns with respect to an enrichment at TFBS (prefrontal cortex: OR = 2.53, P = 2.57E-47; cerebellum: OR = 1.31, P = 4.97E-05) and DHS (prefrontal cortex: OR = 1.54, P = 3.95E-06; cerebellum: OR = 1.45, P = 8.42E-05), but some tissue differences in their presence at alternative events (Additional file 1: Figure S3C), and within gene features (Additional file 1: Figure S3B). For example, the most hydroxymethylated sites are over-represented in the proximal promoter in the prefrontal cortex (OR = 2.11, P = 3.10E-27), but not the cerebellum, and there is an under-representation of the top hydroxymethylated sites at gene bodies in the prefrontal cortex (OR = 0.64, P = 5.30E-12), but an over-representation in the cerebellum (OR = 1.45, P = 2.60E-08).
The previous study investigating 5hmC in a single cerebellum sample also demonstrated that the highest levels of 5hmC are found in the gene body, within introns.
Some sites are characterized by considerable inter-individual variation in 5hmC
Given the hypothesized role of 5hmC in health and disease, we were interested in identifying regions of the genome characterized by inter-individual variation in 5hmC within both the prefrontal cortex (Additional file 2: Table S9) and cerebellum (Additional file 2: Table S10). The 1,000 top-ranked variable 5hmC sites were enriched in CpG islands both in the prefrontal cortex (OR = 1.72, P = 4.03E-09) and the cerebellum (OR = 3.92, P = 2.41E-69) (Table 4; Additional file 1: Figure S3A). Furthermore, in both tissues, highly-variable 5hmC sites were under-represented in intergenic regions (Additional file 1: Figure S3B) (prefrontal cortex: OR = 0.84, P = 0.032; cerebellum: OR = 0.63, P = 3.86E-05) and in the proximal promoter (prefrontal cortex: OR = 0.76, P = 1.10E-03; cerebellum: OR = 0.47, P = 1.03E-16). However, there were some differences in the genic location of the most variable sites between the prefrontal cortex and the cerebellum; the most variable sites in the prefrontal cortex were under-represented in CpG island shelves (OR = 0.69, P = 1.82E-07), while the most variable sites in the cerebellum were under-represented in CpG island shores (OR = 0.52, P = 9.25E-17). Furthermore, in the gene body there was a significant over-representation of the most variable sites in the cerebellum (OR = 1.28, P = 1.78E-04), but not in the prefrontal cortex (OR = 1.00, P = 1.000). Interestingly, over-/under-representation of sites at alternative events (Additional file 1: Figure S3C) was only seen for the most variable probes in the cerebellum and not in the prefrontal cortex, indicating that 5hmC may play a role in regulating gene expression through splicing in a tissue-specific manner.
Despite observing genic differences in the most variable sites between individuals in the prefrontal cortex and cerebellum, there was a significant correlation in the inter-individual differences between regions, with the 1,000 most variable sites in the prefrontal cortex showing a similar degree of variability in the cerebellum (Additional file 1: Figure S4A; R = 0.30, P = 6.14E-17), and similarly the most variable sites in the cerebellum showing a similar degree of variability in the prefrontal cortex (Additional file 1: Figure S4B; R = 0.24, P = 5.4E-12).
A small proportion of DMPs identified by epigenome-wide association studies (EWAS) using standard BS approaches may actually reflect differences in 5hmC
Standard BS-treatment has been used in conjunction with the Illumina 450K methylation array in a growing number of EWAS analyses to identify differences in DNA methylation associated with exposure and disease. Given that this approach actually provides a cumulative measure of both 5mC and 5hmC, it is plausible that variation in 5hmC could confound findings that have largely been attributed to variation in DNA methylation. When we identified regions with the greatest difference in BS-generated data between the prefrontal cortex and cerebellum (Additional file 2: Table S11), these changes significantly correlated with differences in oxBS-generated data (R = 0.537, P = 6.85E-37) (Additional file 1: Figure S5; Additional file 2: Table S11). However, differences at some loci appeared to be driven by 5hmC, rather than 5mC variation, with eight of the top 1,000 BS tissue differences being driven more by a 5hmC difference than a 5mC difference. Looking across a canonical gene, we actually see no difference in DNA modification levels between the prefrontal cortex and the cerebellum when using standard BS-treated DNA (Fig. 5); however, we do see a considerable difference in true DNA methylation levels as determined using oxBS-treated DNA, with higher levels at the 3’ end of the gene in the prefrontal cortex than the cerebellum. We therefore recommend running BS and oxBS 450K arrays in parallel for investigating the role of DNA methylation in cross-tissue studies of complex disease.
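Because the BS signal approximates 5mC + 5hmC and the oxBS signal approximates 5mC alone, a between-tissue difference in BS data splits into a 5mC component and a 5hmC component. The sketch below makes that decomposition explicit for a single probe. Labelling a difference as driven by whichever component has the larger absolute size is our assumed reading of "driven more by", and all input values are invented for illustration.

```python
def dominant_component(bs_pfc, oxbs_pfc, bs_cer, oxbs_cer):
    """Split a cerebellum-minus-cortex BS difference into 5mC and 5hmC parts.

    Inputs are mean beta values for one probe in each tissue.
    Returns (delta_5mC, delta_5hmC, label of the larger component).
    """
    delta_5mc = oxbs_cer - oxbs_pfc                           # oxBS reports 5mC only
    delta_5hmc = (bs_cer - oxbs_cer) - (bs_pfc - oxbs_pfc)    # BS minus oxBS reports 5hmC
    label = "5hmC" if abs(delta_5hmc) > abs(delta_5mc) else "5mC"
    return delta_5mc, delta_5hmc, label


# Toy probe where most of the BS difference (0.30) comes from 5hmC.
print(dominant_component(bs_pfc=0.50, oxbs_pfc=0.45, bs_cer=0.80, oxbs_cer=0.50))

# Toy probe where most of the BS difference (0.40) comes from 5mC.
print(dominant_component(bs_pfc=0.30, oxbs_pfc=0.28, bs_cer=0.70, oxbs_cer=0.66))
```

By construction the two components sum to the difference seen in BS data, so a probe flagged in a standard BS-based EWAS can be reassigned to either modification once matched oxBS data are available.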
This study demonstrates the utility of combining oxBS-treatment with the Illumina 450k methylation array to systematically quantify 5hmC across the genome. Our study highlights region-specific patterns of 5hmC in the human brain, with overall higher levels observed in the cerebellum than the prefrontal cortex, and notable differences in the genomic location of the most hydroxymethylated loci between these brain regions. Loci demonstrating the greatest differences between brain regions (TS-HMPs) are highly enriched at CpG islands and in the gene body. We also identify considerable inter-individual variation in 5hmC at a subset of loci within each brain region, with these variable regions being particularly enriched in CpG islands and depleted in intergenic regions and the proximal promoter. Finally, we were able to confirm our findings in a second independent set of matched prefrontal cortex and cerebellum samples. Given the enrichment of 5hmC in the vicinity of genes involved in nervous system development and function, and the inability to distinguish this modification from 5mC using standard BS-based methods, we propose that approaches described here can be used to interrogate the role of 5hmC in neurological/neuropsychiatric phenotypes and disease.
Methods
Our discovery cohort comprised prefrontal cortex and cerebellum samples dissected from eight individuals. First, brain tissue from six control donors, with no evidence of neurological impairment, was obtained from the London Neurodegenerative Disease Brain Bank (LNDBB) (http://www.kcl.ac.uk/ioppn/depts/bcn/Our-research/Neurodegeneration/brain-bank.aspx). For each sample, genomic DNA was extracted from 100 mg of tissue using a standard phenol-chloroform extraction method. Additionally, two control cerebellum samples, provided by CEGX, were run alongside. Sample characteristics for all discovery samples are detailed in Additional file 2: Table S12. Validation data were generated from matched prefrontal cortex and cerebellum samples from 18 control donors, with no evidence of neurological impairment, currently being profiled as part of an independent study in our lab.
Bisulfite (BS) and oxidative-bisulfite (oxBS) treatment
For each sample DNA was treated using: (1) the Zymo BS conversion kit; (2) the Cambridge Epigenetix BS conversion module; and (3) the Cambridge Epigenetix oxBS conversion module. For Zymo BS, 500 ng DNA from each sample was sodium BS-treated using the Zymo EZ 96 DNA methylation kit (Zymo Research) according to the manufacturer’s standard protocol. For CEGX BS and CEGX oxBS samples, 4 μg of high molecular weight genomic DNA was sheared into <10 kb fragments using a Covaris g-Tube. The fragmented DNA was subsequently concentrated into a total volume of 40 μL by passing the sample through a geneJET purification column. The 40 μL was split into two 20 μL samples and processed using the TrueMethyl kit following the manufacturer’s instructions. We performed enzyme digestion of the CEGX conversion controls as recommended by the manufacturer, and all samples showed satisfactory conversion (see Additional file 1: Figure S6 for example output).
Illumina Infinium BeadArray
DNA modifications were quantified using the Illumina Infinium Human 450K Methylation Array according to the manufacturer’s instructions, with minor amendments. In brief, the DNA input to the MSA4 plate for whole genome amplification was increased from 4 μL to 7 μL of CEGX BS/oxBS treated DNA. To compensate for the increased DNA volume, the concentration of NaOH was increased to 0.4 M and only 1 μL added to the MSA4 plate.
Quality control (QC) and data normalization
All computations and statistical analyses were performed within the R statistical environment (version 3.1.2) and Bioconductor. Signal intensities were imported into R using the methylumi package. Initial QC checks were performed to assess concordance between reported and predicted gender. Non-CpG SNP probes on the array were used to confirm that samples sourced from the same individual were genetically identical (Additional file 1: Figure S7). Data were pre-processed using wateRmelon (version 1.4.0), with a custom P filter threshold of 5 % of sites with a detection P value <0.05. No precedents have yet been set for pre-processing and normalizing oxBS data. We therefore tested all of the different normalization strategies available within the wateRmelon package. We found that although other normalization strategies scored highly within each metric, data analyzed using dasen consistently scored well for each metric (Additional file 2: Table S13), and were therefore used for data normalization. Non-CpG SNP probes, probes that have been reported to contain common (MAF >5 %) SNPs in the CG or single base extension position, or probes that were non-specific or mismapped [31, 32], were flagged and disregarded in the evaluation of our results, leaving 374,094 probes for analysis.
The level of 5hmC within each sample was identified by subtracting the oxBS (CEGX) beta-value from the BS (CEGX) beta value at each probe on the 450K array (ΔβBS-oxBS) in each sample. A threshold for detection of 5hmC was established by determining the lowest fifth percentile in the data (that is, -0.09158275 in this study). We then applied this value as a threshold for the positive data. Sites with an average 5hmC level in either the prefrontal cortex or cerebellum above this level (that is, +0.09158275) were classified as having ‘detectable’ levels of 5hmC. Illumina 450K array probes were annotated using ENCODE annotation, and Fisher’s exact test was used to determine if 5hmC was enriched in specific genomic regions. Normalized 5hmC levels are available on our online analytical database HABIT for the 79,263 sites we identified as having ‘detectable’ 5hmC in one or both brain regions (http://epigenetics.iop.kcl.ac.uk/HMC/).
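The thresholding logic described above can be sketched in Python as follows. Negative ΔβBS-oxBS values are treated as technical noise, the magnitude that bounds most of that noise is found, and the same magnitude is mirrored onto the positive side as the detection cut-off. The nearest-rank percentile and the choice to compute it over the negative values only are assumptions on our part; the function names and the toy numbers are illustrative.

```python
import math


def delta_beta(bs, oxbs):
    """Per-probe 5hmC estimate: BS beta minus oxBS beta."""
    return [b - o for b, o in zip(bs, oxbs)]


def detection_threshold(deltas, pct=95):
    """Magnitude bounding `pct` percent of the negative delta values.

    Uses the nearest-rank percentile of |negative deltas| (an assumed
    reading of the percentile rule); returns a positive cut-off.
    """
    magnitudes = sorted(-d for d in deltas if d < 0)
    if not magnitudes:
        raise ValueError("no negative values available to estimate noise")
    rank = math.ceil(pct * len(magnitudes) / 100)
    return magnitudes[rank - 1]


def is_detectable(mean_pfc, mean_cer, threshold):
    """A site is 'detectable' if its mean 5hmC exceeds the cut-off in either tissue."""
    return mean_pfc > threshold or mean_cer > threshold


# Toy distribution: 20 negative noise values plus a few positive values.
toy = [-(i / 100) for i in range(1, 21)] + [0.05, 0.25, 0.40]
cutoff = detection_threshold(toy)
print(cutoff)
print(is_detectable(0.05, 0.25, cutoff), is_detectable(0.05, 0.10, cutoff))
```

In the study itself the resulting cut-off was 0.09158275, applied to the average ΔβBS-oxBS level in each tissue.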
Illumina UCSC gene annotation was used to create a test gene list from the hydroxymethylated probes in the prefrontal cortex (N = 37,145) and the cerebellum (N = 65,563), or the TS-HMPs (N = 1,000). A logistic regression approach was used to test if gene lists predicted pathway membership while controlling for the number of probes annotated to each gene. Pathways were downloaded from the Gene Ontology website (http://geneontology.org/) and all genes annotated to parent terms were also included. All genes with at least one Illumina probe annotated and annotated to at least one GO pathway were considered. Pathways were filtered to those with between 10 and 2,000 genes in them. After applying this method to all pathways, significant pathways (P <0.05) were taken and grouped where overlapping genes explained the signal. This was achieved by taking the most significant pathway, and retesting all remaining significant pathways while controlling additionally for the best term. If the test genes no longer predicted the pathway, the term was said to be explained by the most significant pathway, and hence these pathways were grouped together. This algorithm was repeated, taking the next most significant term, until all pathways were considered as the most significant or found to be explained by a more significant term.
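The grouping step described above is a greedy loop, which can be sketched independently of the regression itself. In the Python sketch below, the conditional retest is passed in as a callable, `explained_by`, which is a hypothetical stand-in for refitting the logistic regression with the lead pathway as an additional covariate. The toy version used here simply checks gene overlap and is not the authors' test.

```python
def group_pathways(p_values, explained_by, alpha=0.05):
    """Greedily group significant pathways under their most significant term.

    p_values: dict mapping a pathway term to its enrichment P value.
    explained_by(term, lead): returns True if `term` is no longer
        significant once `lead` is controlled for (supplied by the caller).
    Returns a dict mapping each lead term to the terms it explains.
    """
    remaining = sorted(
        (term for term, p in p_values.items() if p < alpha),
        key=lambda term: p_values[term],
    )
    groups = {}
    while remaining:
        lead, rest = remaining[0], remaining[1:]
        groups[lead] = [term for term in rest if explained_by(term, lead)]
        remaining = [term for term in rest if term not in groups[lead]]
    return groups


# Toy inputs (hypothetical terms and genes).
pathway_genes = {
    "term_A": {"g1", "g2", "g3", "g4"},
    "term_B": {"g1", "g2"},
    "term_C": {"g7", "g8"},
    "term_D": {"g9"},
}
p_values = {"term_A": 1e-6, "term_B": 1e-4, "term_C": 1e-3, "term_D": 0.2}
test_genes = {"g1", "g2", "g3", "g7"}


def toy_explained_by(term, lead):
    # Stand-in rule: every test gene hitting `term` also belongs to `lead`.
    return (pathway_genes[term] & test_genes) <= pathway_genes[lead]


groups = group_pathways(p_values, toy_explained_by)
print(groups)
```

Each pass removes the current lead term and everything it explains, so the loop ends once every significant pathway is either a lead term or assigned to one.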
Determining 5mC and 5hmC in a canonical gene
A sliding window approach was used to calculate average 5mC and 5hmC levels in a gene. To investigate canonical differences in 5hmC we calculated the moving average for 5mC (oxBS) and 5hmC (ΔβBS-oxBS) in the 79,263 loci deemed to be hydroxymethylated for overlapping 1 % sliding windows from 5 kb upstream of the gene to 5 kb downstream of the gene. To investigate canonical differences in data derived from BS-treated DNA, which may be driven by 5hmC differences, we calculated the moving average for BS-treated DNA (5mC + 5hmC) and for oxBS DNA (5mC) in the 374,094 loci that passed QC for overlapping 1 % sliding windows from 5 kb upstream of the gene to 5 kb downstream of the gene.
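A moving average over overlapping windows can be sketched as below. The sketch assumes each probe has already been mapped to a relative position between 0 and 1 on a common axis running from 5 kb upstream to 5 kb downstream of its gene. That rescaling, the step size of half a window, and the function name are our assumptions, since the text specifies only the 1 % window width.

```python
from statistics import fmean


def sliding_window_mean(points, width=0.01, step=0.005):
    """Moving average over overlapping windows on a 0-to-1 axis.

    points: list of (relative_position, value) pairs.
    Returns (window_centre, mean_value) for every non-empty window.
    """
    n_windows = int(round((1.0 - width) / step)) + 1
    profile = []
    for i in range(n_windows):
        start = i * step
        end = start + width
        last = i == n_windows - 1
        # Windows are half-open; the final window also takes the right edge.
        values = [
            value for pos, value in points
            if start <= pos < end or (last and pos >= start)
        ]
        if values:
            profile.append((start + width / 2, fmean(values)))
    return profile


# Toy example with wide windows so the result is easy to follow by hand.
toy_points = [(0.1, 0.2), (0.3, 0.4), (0.6, 0.6), (0.9, 0.8)]
profile = sliding_window_mean(toy_points, width=0.5, step=0.25)
print(profile)
```

Running the same function on oxBS values gives the 5mC profile, and running it on ΔβBS-oxBS values gives the 5hmC profile across the canonical gene.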
Brain samples were obtained from the London Neurodegenerative Disease Brain Bank (LNDBB) (http://www.kcl.ac.uk/ioppn/depts/bcn/Our-research/Neurodegeneration/brain-bank.aspx). Samples were provided with informed consent according to the Declaration of Helsinki (1991) and ethical approval for the study was provided by the NHS South East London REC 3.
All microarray data have been uploaded to GEO and are available under accession number GSE74368.
A3SS: Alternative 3’ splice site
A5SS: Alternative 5’ splice site
AFE: Alternative first exon
ALE: Alternative last exon
EWAS: Epigenome-wide association study
MXE: Mutually exclusive exon
The authors would like to thank the LNDBB and Brains for Dementia Research (BDR) and their donors for provision of tissue for the study. This work was funded by NIH grant AG036039 to JM, UK Medical Research Council grant MR/K013807/1 to JM, and Alzheimer’s Association New Investigator Research Grant NIRG-14-320878 to KL.
The authors declare that they have no competing interests.
KL and JB conducted laboratory experiments. KL, EH, RS, and ED undertook data analysis and bioinformatics. CT and SAS provided samples for analysis. CW and AK provided validation data. LS developed the online analytical tool. JM conceived and supervised the project. KL and JM drafted the manuscript. All authors read and approved the final submission.
Lunnon, K., Hannon, E., Smith, R.G. et al. Variation in 5-hydroxymethylcytosine across human cortex and cerebellum. Genome Biol 17, 27 (2016). https://doi.org/10.1186/s13059-016-0871-x