5-Hydroxymethylcytosine is associated with enhancers and gene bodies in human embryonic stem cells
© Stroud et al.; licensee BioMed Central Ltd. 2011
Received: 10 March 2011
Accepted: 20 June 2011
Published: 20 June 2011
5-Hydroxymethylcytosine (5hmC) was recently found to be abundantly present in certain cell types, including embryonic stem cells. There is growing evidence that TET proteins, which convert 5-methylcytosine (5mC) to 5hmC, play important biological roles. To further understand the function of 5hmC, an analysis of the genome-wide localization of this mark is required.
Here, we have generated a genome-wide map of 5hmC in human embryonic stem cells by hmeDIP-seq, in which hydroxymethyl-DNA immunoprecipitation is followed by massively parallel sequencing. We found that 5hmC is enriched in enhancers as well as in gene bodies, suggesting a potential role for 5hmC in gene regulation. Consistent with its localization at enhancers, 5hmC was significantly enriched at regions carrying enhancer-associated histone modifications, such as H3K4me1 and H3K27ac. 5hmC was also enriched at other protein-DNA interaction sites, such as OCT4 and NANOG binding sites. Furthermore, we found that 5hmC regions tend to have an excess of G over C on one strand of DNA.
Our findings suggest that 5hmC may be targeted to certain genomic regions based both on gene expression and sequence composition.
Cytosine DNA methylation (5-methylcytosine (5mC)) is an epigenetic mark that is widespread in both animals and plants and appears to play important roles in various biological processes, such as gene silencing and imprinting. Recently, studies have shown that embryonic stem cells (ESCs) and Purkinje neurons contain high levels of 5-hydroxymethylcytosine (5hmC) [1, 2]. Human TET1, a 2-oxoglutarate- and Fe(II)-dependent enzyme, has been shown to catalyze the conversion of 5mC to 5hmC both in vitro and in vivo. Subsequently, all three mouse Tet proteins, Tet1, Tet2 and Tet3, were shown to be able to convert 5mC to 5hmC. Disruption of human TET1 and TET2 is associated with diseases such as MLL-associated leukemia and myeloproliferative disorders. Studies have suggested that 5hmC prevents the methyl-CpG-binding protein MeCP2 from binding DNA. In addition to excluding methyl-CpG-binding proteins, 5hmC may recruit as-yet-unidentified 5hmC-binding protein(s). Moreover, because the DNA methyltransferase DNMT1 binds poorly to 5hmC [1, 7], 5hmC may prevent DNMT1 from methylating cytosines and thus promote DNA demethylation. Importantly, 5hmC diminishes as ESCs differentiate, suggesting that 5hmC may play specific roles in ESCs. Indeed, mouse Tet1 has been shown to be required for ESC maintenance. Nevertheless, the function of 5hmC in mammals remains poorly understood, and understanding it requires knowing where 5hmC localizes in the genome. Very recently, a genome-wide map of 5hmC was reported in mouse cerebellum: 5hmC was chemically tagged and affinity enriched, and the purified DNA was sequenced. The authors found that 5hmC is enriched over genes and that its levels correlate positively with expression.
Recently, commercial antibodies specific to 5hmC have become available. While these antibodies specifically recognize 5hmC, it is important to note that they tend to prefer densely 5-hydroxymethylated sites to single 5hmC sites (Figure S1 in Additional file 1). Here we generated genome-wide maps of 5hmC in human ESCs (hESCs) by performing hydroxymethyl-DNA immunoprecipitation followed by massively parallel sequencing with an Illumina Genome Analyzer (hmeDIP-seq). Like Song et al., we found that a large fraction of 5hmC peaks were enriched over genes. However, we also found that 5hmC is enriched over predicted hESC enhancers, further suggesting a potential role of 5hmC in gene regulation. Moreover, we observed enrichment of 5hmC peaks at transcription factor binding sites, such as those of the pluripotency factors OCT4 and NANOG. In addition, we found that 5hmC regions correspond to genomic regions that are GC-skewed, that is, regions in which one strand carries an excess of G over C.
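GC skew is conventionally measured on one strand as (G - C)/(G + C), so positive values mean an excess of G over C and the reverse-complement strand has the same magnitude with the opposite sign. The sketch below computes it for a whole sequence and in sliding windows; the function names, window size and step are illustrative assumptions, since this section does not state how skew was computed for the 5hmC regions.

```python
# Sketch: strand GC skew, (G - C) / (G + C), for a sequence and in sliding windows.
# Function names, window size and step are illustrative choices, not the authors' pipeline.


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for the given strand; 0.0 if it has no G or C."""
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Slide a fixed-size window along seq; return (0-based start, skew) per window."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # A positive value means this strand carries more G than C.
    print(round(gc_skew("GGGAGGTCGGAGGC"), 3))  # 9 G and 2 C, so 7/11
    print(windowed_gc_skew("GGCCGGGG", window=4, step=4))
```

Computing skew on the reverse complement of the same region would simply flip the sign, which is why the skew is always reported relative to a chosen strand.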
Results and discussion
5hmC is enriched over genic regions
The chromosomal distribution of 5hmC regions suggested that 5hmC is concentrated in gene-rich chromosomal domains (Figure 1b). Indeed, 46.2% of defined 5hmC regions overlapped RefSeq-annotated genes, consistent with a potential role for 5hmC in gene regulation. When we plotted 5hmC peaks over RefSeq genes, we found that 5hmC tends to localize to transcribed regions (bodies) of genes as well as to immediate upstream regions (Figure S3a in Additional file 1). The distribution of expression levels of genes with 5hmC peaks was similar to that of all genes, suggesting that 5hmC may not correlate linearly with expression level (Figure S3b in Additional file 1). When we plotted the distribution of 5hmC peaks over RefSeq genes grouped by expression level, we observed that 5hmC is enriched near the transcription start sites of lowly expressed genes but depleted at the transcription start sites of highly expressed genes (Figure 1c). This contrasts with the findings of Song et al., who reported that 5hmC levels correlate positively with expression in mouse cerebellum, and suggests that the role of 5hmC may differ between tissues.
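The 46.2% figure is the fraction of 5hmC regions that overlap at least one RefSeq gene. A minimal sketch of that calculation, assuming regions and genes are given as BED-style (chromosome, start, end) intervals and that any overlap of at least 1 bp counts; the minimum overlap actually used is not stated in this section, and the example coordinates are made up.

```python
# Sketch: fraction of regions (e.g. 5hmC peaks) overlapping at least one gene.
# Intervals are (chrom, start, end), 0-based half-open as in BED files.
# Naive scan per chromosome; genome-scale data would normally use an
# interval tree or a tool such as bedtools instead.
from collections import defaultdict

Interval = tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(regions: list[Interval], genes: list[Interval]) -> float:
    """Fraction of regions that overlap at least one gene interval."""
    if not regions:
        return 0.0
    genes_by_chrom: dict[str, list[Interval]] = defaultdict(list)
    for g in genes:
        genes_by_chrom[g[0]].append(g)
    hits = sum(
        1 for r in regions
        if any(overlaps(r, g) for g in genes_by_chrom.get(r[0], []))
    )
    return hits / len(regions)


if __name__ == "__main__":
    peaks = [("chr1", 0, 10), ("chr1", 20, 30), ("chr2", 5, 15)]
    genes = [("chr1", 8, 25)]
    print(fraction_overlapping(peaks, genes))  # two of three peaks overlap the gene
```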
5hmC is enriched over enhancers
To examine whether 5hmC peaks are associated with genes with specific functions, we performed gene ontology analyses using GREAT, which enables functional analysis of cis-regulatory regions such as enhancers. Interestingly, 5hmC-associated genes tended to function in processes such as embryonic pattern specification, cerebellum morphogenesis, and other developmental processes (Figure S5 in Additional file 1).
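GREAT's region-based test asks whether more input regions fall within the regulatory domains assigned to genes with a given annotation than the fraction of the genome covered by those domains would predict, and scores this with a binomial test. A minimal sketch of that tail probability, assuming the hit count and genome fraction have already been computed; this is not GREAT's implementation, GREAT also reports a gene-based hypergeometric test not reproduced here, and the numbers in the usage example are hypothetical.

```python
# Sketch of a region-based binomial enrichment test of the kind GREAT reports.
# Inputs are assumed to be precomputed; this is not GREAT's own code.
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log_norm = lgamma(n + 1)
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (log_norm - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def region_enrichment_pvalue(n_regions: int, n_hits: int, genome_fraction: float) -> float:
    """
    n_regions: all input regions (e.g. 5hmC peaks)
    n_hits: input regions inside the regulatory domains of genes carrying one term
    genome_fraction: fraction of the genome covered by those regulatory domains
    """
    return binomial_upper_tail(n_hits, n_regions, genome_fraction)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 peaks, 40 fall in domains covering 2% of the genome
    # (about 20 expected by chance).
    print(region_enrichment_pvalue(1000, 40, 0.02))
```

Working in log space matters here because peak sets of tens of thousands of regions make the binomial coefficients far too large for floating-point arithmetic.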
5hmC is enriched over transcription factor binding sites
5hmC regions are GC-skewed
We have generated the first genome-wide map of 5hmC in hESCs and have found that 5hmC localizes to enhancers and gene bodies. 5hmC also tended to localize to other protein-DNA interaction sites, such as TFBSs, further suggesting a role for 5hmC in gene regulation. Finally, we found a novel characteristic of the DNA sequences associated with 5hmC peaks, GC skew, which raises the possibility that sequence composition may act as a signal for the deposition of this epigenetic mark.
Materials and methods
Hydroxymethyl-DNA immunoprecipitation and Illumina library generation/sequencing
hmeDIP experiments were performed on HSF1 hESCs as previously described using commercial antibodies specific to 5hmC, except that Illumina adapter-ligated DNA fragments were used as the input for the immunoprecipitation. Two experiments, one using a rabbit polyclonal antibody (Active Motif, Carlsbad, CA, USA) and the other using a mouse monoclonal antibody (Diagenode, Sparta, NJ, USA), were performed using 5 μg per immunoprecipitation. Input genomic DNA and 'no antibody' controls were also retained for sequencing. Illumina libraries were generated and sequenced on an Illumina Genome Analyzer according to the manufacturer's instructions.
Data processing and analysis
Sequenced reads were base-called using the standard Illumina software. Reads were trimmed to 50 bases because of low-quality base calls at their 3' ends and aligned to hg18 with Bowtie (v.0.12.4), allowing up to three mismatches. Only uniquely mapping reads were kept, and identical reads were collapsed into one read. Because each read represents one end of a library fragment, reads were extended to the average fragment size of the libraries for downstream analyses. All sequencing data have been deposited in Gene Expression Omnibus [GEO:GSE27627]. Regions were defined using SICER (v.1.03). Only regions called against both the input and the 'no antibody' background controls with a Benjamini-corrected false discovery rate < 0.05 were kept. Finally, only regions called in both antibody hmeDIP-seq experiments were kept and analyzed. Gene ontology analysis was performed using the Genomic Regions Enrichment of Annotations Tool (GREAT). Published hESC RNA-seq data were used for expression analyses.
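Two of the read-processing steps above, collapsing identical reads and extending each read to the library fragment size, can be sketched in a few lines. This assumes "identical" means the same chromosome, start, end and strand, and uses a hypothetical 200 bp fragment size as a placeholder, since the actual average fragment size is not given in this section; the `Read` type and function names are illustrative, not the authors' code.

```python
# Sketch of collapsing identical reads and extending each read to the
# library's average fragment size. "Identical" is assumed to mean same
# chromosome, start, end and strand; the fragment size is a placeholder.
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int   # 0-based leftmost aligned base
    end: int     # exclusive
    strand: str  # "+" or "-"


def collapse_identical(reads: list[Read]) -> list[Read]:
    """Keep the first copy of each identical read, preserving input order."""
    seen: set[Read] = set()
    unique: list[Read] = []
    for r in reads:
        if r not in seen:
            seen.add(r)
            unique.append(r)
    return unique


def extend_read(r: Read, fragment_size: int, chrom_length: int) -> tuple[str, int, int]:
    """Extend from the read's 5' end across the fragment, clipped to the chromosome."""
    if r.strand == "+":
        # 5' end is the leftmost base; the fragment continues to the right.
        return (r.chrom, r.start, min(r.start + fragment_size, chrom_length))
    # Minus strand: 5' end is the rightmost base; the fragment continues to the left.
    return (r.chrom, max(r.end - fragment_size, 0), r.end)


if __name__ == "__main__":
    reads = [Read("chr1", 100, 150, "+"), Read("chr1", 100, 150, "+"),
             Read("chr1", 300, 350, "-")]
    for r in collapse_identical(reads):
        print(extend_read(r, fragment_size=200, chrom_length=10_000))
```

Extending reads in the direction of their strand places the coverage over the whole immunoprecipitated fragment rather than only its sequenced end, which is what region callers such as SICER then work with.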
Fully hydroxymethylated DNA was produced by endpoint PCR using Phusion polymerase (NEB, Ipswich, MA, USA) and hm-dCTP (Bioline, Taunton, MA, USA), followed by PCR purification (Qiagen, Valencia, CA, USA). Unmethylated and fully methylated control DNAs were produced in the same manner with dCTP and m-dCTP (NEB, Ipswich, MA, USA), respectively. Various amounts of DNA were denatured, snap cooled and dotted onto positively charged nylon membranes (Roche, Indianapolis, IN, USA). Membranes were crosslinked, blocked with 5% milk, and incubated with the Active Motif 5hmC antibody (1:10,000) for 1 hour. Membranes were washed, incubated with horseradish peroxidase-linked anti-rabbit secondary antibody (CST, Danvers, MA, USA) for 1 hour, washed again, and developed with ECL reagent (CST) and Biomax MS film (Kodak). The four template sequences, each named by its number of CpG dinucleotides, were as follows (the PCR primers anneal to the flanking sequences shared by all four templates):
12 CG: TACTCTATACTCTACTCATCATTACACGCGCGATATCGTTAACGATAATTCGCGCGATTACGATCGATAACGCGTTAATATGAGATATGAGATGTGTATG
6 CG: TACTCTATACTCTACTCATCATTACAATATATATATCGTTAACGATAATTCGCGCGATTACGATTTATAATTAATTAATATGAGATATGAGATGTGTATG
3 CG: TACTCTATACTCTACTCATCATTACAATATATATATAATTAATTATAATTCGCGAAATTACGATTTATAATTAATTAATATGAGATATGAGATGTGTATG
1 CG: TACTCTATACTCTACTCATCATTACAATATATATATAATTAATTATAATTAACGAAATTATAATTTATAATTAATTAATATGAGATATGAGATGTGTATG
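The template names (12, 6, 3 and 1 CG) appear to give the number of CpG dinucleotides in each sequence. A quick Python check of that reading, counting CG occurrences in the sequences listed above, copied verbatim; note that because hm-dCTP replaces dCTP during PCR, every cytosine in the product carries the mark, so CpG count is only one way to describe how densely a template is hydroxymethylated.

```python
# Check that each dot-blot template's label matches its CpG (CG dinucleotide) count.
# Sequences are copied from the template list; dictionary keys are the labels.
TEMPLATES = {
    12: "TACTCTATACTCTACTCATCATTACACGCGCGATATCGTTAACGATAATTCGCGCGATTACGATCGATAACGCGTTAATATGAGATATGAGATGTGTATG",
    6: "TACTCTATACTCTACTCATCATTACAATATATATATCGTTAACGATAATTCGCGCGATTACGATTTATAATTAATTAATATGAGATATGAGATGTGTATG",
    3: "TACTCTATACTCTACTCATCATTACAATATATATATAATTAATTATAATTCGCGAAATTACGATTTATAATTAATTAATATGAGATATGAGATGTGTATG",
    1: "TACTCTATACTCTACTCATCATTACAATATATATATAATTAATTATAATTAACGAAATTATAATTTATAATTAATTAATATGAGATATGAGATGTGTATG",
}


def count_cpg(seq: str) -> int:
    """Count CG dinucleotides; str.count suffices because CG matches cannot overlap."""
    return seq.upper().count("CG")


if __name__ == "__main__":
    for label, seq in TEMPLATES.items():
        print(f"{label} CG template: {count_cpg(seq)} CpGs, "
              f"{seq.count('C')} C on this strand, {len(seq)} nt")
```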
Validation of hydroxymethylated loci using MspI restriction enzyme and beta-glucosyltransferase
Human stem cell genomic DNA (5 to 10 μg) was treated with the EpiMark 5-hmC and 5-mC Analysis Kit according to the included protocol (NEB). Briefly, DNA was either glucosylated with beta-glucosyltransferase and UDP-Glc or mock treated with beta-glucosyltransferase without UDP-Glc for 12 to 18 hours. Each reaction was then split into three aliquots, which were mock digested, digested with MspI, or digested with HpaII for at least 4 hours. Samples were treated with proteinase K, which was then heat inactivated. All DNA samples were diluted to a final concentration of 16 ng/μl for PCR analysis. Quantitative PCR was performed with iQ SYBR Green Supermix (Biorad, Hercules, CA, USA) on a CFX384 Real-Time PCR Detection System (Biorad). Primers used for quantitative PCR are listed in Table S1 in Additional file 1.
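The glucosylation and digestion scheme above lets the 5hmC fraction at a single CCGG site be read from qPCR cycle thresholds: MspI cuts CCGG whether the internal cytosine is unmodified, methylated or hydroxymethylated, but glucosylated 5hmC blocks it, so template that survives MspI only after glucosylation reflects 5hmC. A minimal sketch of that arithmetic, assuming perfect (2-fold per cycle) amplification and that each digest is normalized to its own mock-digested aliquot; the function names and Ct values are hypothetical, and this is not presented as the kit's prescribed calculation.

```python
# Sketch: estimate the 5hmC fraction at one CCGG site from qPCR Ct values.
# Assumes an amplification efficiency of 2.0 (doubling per cycle) by default.


def fraction_intact(ct_digested: float, ct_mock: float, efficiency: float = 2.0) -> float:
    """Fraction of template surviving digestion, relative to the mock-digested aliquot."""
    return efficiency ** -(ct_digested - ct_mock)


def estimate_5hmc_fraction(ct_glc_mspi: float, ct_glc_mock: float,
                           ct_noglc_mspi: float, ct_noglc_mock: float) -> float:
    """MspI-resistant fraction with glucosylation minus the resistant fraction without it."""
    protected = fraction_intact(ct_glc_mspi, ct_glc_mock)
    background = fraction_intact(ct_noglc_mspi, ct_noglc_mock)
    return max(protected - background, 0.0)


if __name__ == "__main__":
    # Hypothetical Ct values for one locus: glucosylated MspI digest lags its mock by
    # 2 cycles (25% intact); non-glucosylated MspI digest lags by 10 cycles (~0.1%).
    print(estimate_5hmc_fraction(ct_glc_mspi=22.0, ct_glc_mock=20.0,
                                 ct_noglc_mspi=30.0, ct_noglc_mock=20.0))
```

Subtracting the non-glucosylated MspI signal removes background from incomplete digestion; with real primers, measured amplification efficiencies would replace the default of 2.0.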
ESC: embryonic stem cell
H3K4me1: histone H3 mono-methylated at lysine 4
H3K27ac: histone H3 acetylated at lysine 27
hESC: human embryonic stem cell
RPKM: reads per kilobase per million mapped reads
TFBS: transcription factor binding site
HS was supported by a Fred Eiserling and Judith Lengyel Graduate Doctoral Fellowship. SF is a Special Fellow of the Leukemia and Lymphoma Society. Research in the laboratory of SEJ was supported by National Institutes of Health grant GM60398 and by an Innovation Award from the Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at UCLA. SEJ is an investigator of the Howard Hughes Medical Institute.
- Tahiliani M, Koh KP, Shen Y, Pastor WA, Bandukwala H, Brudno Y, Agarwal S, Iyer LM, Liu DR, Aravind L, Rao A: Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science. 2009, 324: 930-935. 10.1126/science.1170116.
- Kriaucionis S, Heintz N: The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science. 2009, 324: 929-930. 10.1126/science.1169786.
- Ito S, D'Alessio AC, Taranova OV, Hong K, Sowers LC, Zhang Y: Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature. 2010, 466: 1129-1133. 10.1038/nature09303.
- Meyer C, Kowarz E, Hofmann J, Renneville A, Zuna J, Trka J, Ben Abdelali R, Macintyre E, De Braekeleer E, De Braekeleer M, Delabesse E, de Oliveira MP, Cavé H, Clappier E, van Dongen JJ, Balgobind BV, van den Heuvel-Eibrink MM, Beverloo HB, Panzer-Grümayer R, Teigler-Schlegel A, Harbott J, Kjeldsen E, Schnittger S, Koehl U, Gruhn B, Heidenreich O, Chan LC, Yip SF, Krzywinski M, Eckert C, Möricke A, et al: New insights to the MLL recombinome of acute leukemias. Leukemia. 2009, 23: 1490-1499. 10.1038/leu.2009.33.
- Viguie F, Aboura A, Bouscary D, Ramond S, Delmer A, Tachdjian G, Marie JP, Casadevall N: Common 4q24 deletion in four cases of hematopoietic malignancy: early stem cell involvement? Leukemia. 2005, 19: 1411-1415. 10.1038/sj.leu.2403818.
- Valinluck V, Tsai HH, Rogstad DK, Burdzy A, Bird A, Sowers LC: Oxidative damage to methyl-CpG sequences inhibits the binding of the methyl-CpG binding domain (MBD) of methyl-CpG binding protein 2 (MeCP2). Nucleic Acids Res. 2004, 32: 4100-4108. 10.1093/nar/gkh739.
- Valinluck V, Sowers LC: Endogenous cytosine damage products alter the site selectivity of human DNA maintenance methyltransferase DNMT1. Cancer Res. 2007, 67: 946-950. 10.1158/0008-5472.CAN-06-3123.
- Song CX, Szulwach KE, Fu Y, Dai Q, Yi C, Li X, Li Y, Chen CH, Zhang W, Jian X, Wang J, Zhang L, Looney TJ, Zhang B, Godley LA, Hicks LM, Lahn BT, Jin P, He C: Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine. Nat Biotechnol. 2011, 29: 68-72. 10.1038/nbt.1732.
- Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, Lee L, Ye Z, Ngo QM, Edsall L, Antosiewicz-Bourget J, Stewart R, Ruotti V, Millar AH, Thomson JA, Ren B, Ecker JR: Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009, 462: 315-322. 10.1038/nature08514.
- Morey Kinney S, Chin H, Vaisvila R, Bitinaite J, Zheng Y, Esteve P, Feng S, Stroud H, Jacobsen S, Pradhan S: Tissue specific distribution and dynamic changes of 5-hydroxymethylcytosine in mammalian genome. J Biol Chem. 2011.
- Heintzman ND, Hon GC, Hawkins RD, Kheradpour P, Stark A, Harp LF, Ye Z, Lee LK, Stuart RK, Ching CW, Ching KA, Antosiewicz-Bourget JE, Liu H, Zhang X, Green RD, Lobanenkov VV, Stewart R, Thomson JA, Crawford GE, Kellis M, Ren B: Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009, 459: 108-112. 10.1038/nature07829.
- Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J: A unique chromatin signature uncovers early developmental enhancers in humans. Nature. 2011, 470: 279-283. 10.1038/nature09692.
- McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G: GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010, 28: 495-501. 10.1038/nbt.1630.
- Kunarso G, Chia NY, Jeyakani J, Hwang C, Lu X, Chan YS, Ng HH, Bourque G: Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat Genet. 2010, 42: 631-634. 10.1038/ng.600.
- International Stem Cell Initiative, Adewumi O, Aflatoonian B, Ahrlund-Richter L, Amit M, Andrews PW, Beighton G, Bello PA, Benvenisty N, Berry LS, Bevan S, Blum B, Brooking J, Chen KG, Choo AB, Churchill GA, Corbel M, Damjanov I, Draper JS, Dvorak P, Emanuelsson K, Fleck RA, Ford A, Gertow K, Gertsenstein M, Gokhale PJ, Hamilton RS, Hampl A, Healy LE, Hovatta O, et al: Characterization of human embryonic stem cell lines by the International Stem Cell Initiative. Nat Biotechnol. 2007, 25: 803-816. 10.1038/nbt1318.
- Brodie Of Brodie EB, Nicolay S, Touchon M, Audit B, d'Aubenton-Carafa Y, Thermes C, Arneodo A: From DNA sequence analysis to modeling replication in the human genome. Phys Rev Lett. 2005, 94: 248103.
- Touchon M, Nicolay S, Audit B, Brodie of Brodie EB, d'Aubenton-Carafa Y, Arneodo A, Thermes C: Replication-associated strand asymmetries in mammalian genomes: toward detection of replication origins. Proc Natl Acad Sci USA. 2005, 102: 9836-9841. 10.1073/pnas.0500577102.
- Huvet M, Nicolay S, Touchon M, Audit B, d'Aubenton-Carafa Y, Arneodo A, Thermes C: Human gene organization driven by the coordination of replication and transcription. Genome Res. 2007, 17: 1278-1285. 10.1101/gr.6533407.
- Smagulova F, Gregoretti IV, Brick K, Khil P, Camerini-Otero RD, Petukhova GV: Genome-wide analysis reveals novel molecular features of mouse recombination hotspots. Nature. 2011, 472: 375-378. 10.1038/nature09869.
- Zhang X, Yazaki J, Sundaresan A, Cokus S, Chan SW, Chen H, Henderson IR, Shinn P, Pellegrini M, Jacobsen SE, Ecker JR: Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell. 2006, 126: 1189-1201. 10.1016/j.cell.2006.08.003.
- Hawkins RD, Hon GC, Lee LK, Ngo Q, Lister R, Pelizzola M, Edsall LE, Kuan S, Luu Y, Klugman S, Antosiewicz-Bourget J, Ye Z, Espinoza C, Agarwahl S, Shen L, Ruotti V, Wang W, Stewart R, Thomson JA, Ecker JR, Ren B: Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell. 2010, 6: 479-491. 10.1016/j.stem.2010.03.018.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.