Independent genomewide screens identify the tumor suppressor VTRNA2-1 as a human epiallele responsive to periconceptional environment
- Matt J Silver†1, 2,
- Noah J Kessler†3,
- Branwen J Hennig1, 2,
- Paula Dominguez-Salas4, 5,
- Eleonora Laritsky3,
- Maria S Baker3,
- Cristian Coarfa6,
- Hector Hernandez-Vargas7,
- Jovita M Castelino8,
- Michael N Routledge8,
- Yun Yun Gong9,
- Zdenko Herceg7,
- Yong Sun Lee10,
- Kwanbok Lee10,
- Sophie E Moore1, 2, 11,
- Anthony J Fulford1, 2,
- Andrew M Prentice1, 2Email author and
- Robert A Waterland3, 12Email author
© Silver et al. 2015
Received: 7 March 2015
Accepted: 22 April 2015
Published: 11 June 2015
Interindividual epigenetic variation that occurs systemically must be established prior to gastrulation in the very early embryo and, because it is systemic, can be assessed in easily biopsiable tissues. We employ two independent genome-wide approaches to search for such variants.
First, we screen for metastable epialleles by performing genomewide bisulfite sequencing in peripheral blood lymphocyte (PBL) and hair follicle DNA from two Caucasian adults. Second, we conduct a genomewide screen for genomic regions at which PBL DNA methylation is affected by season of conception in rural Gambia. Remarkably, both approaches identify the genomically imprinted VTRNA2-1 as a top environmentally responsive epiallele. We demonstrate systemic and stochastic interindividual variation in DNA methylation at the VTRNA2-1 differentially methylated region in healthy Caucasian and Asian adults and show, in rural Gambians, that periconceptional environment affects offspring VTRNA2-1 epigenotype, which is stable over at least 10 years. This unbiased screen also identifies over 100 additional candidate metastable epialleles, and shows that these are associated with cis genomic features including transposable elements.
The non-coding VTRNA2-1 transcript (also called nc886) is a putative tumor suppressor and modulator of innate immunity. Thus, these data indicating environmentally induced loss of imprinting at VTRNA2-1 constitute a plausible causal pathway linking early embryonic environment, epigenetic alteration, and human disease. More broadly, the list of candidate metastable epialleles provides a resource for future studies of epigenetic variation and human disease.
Epigenetic mechanisms are established during development and stably regulate gene expression potential in differentiated cells . A fundamental outstanding question is whether and how interindividual epigenetic variation affects risk of disease [2,3]. A major focus is DNA methylation, which in mammals occurs predominantly at cytosines within CpG dinucleotides. Developmental establishment of CpG methylation can be influenced by environment [4,5], and once established, CpG methylation is mitotically heritable and normally highly stable . Elucidating the role of epigenetic variation in human disease is complicated, however, by the fact that epigenetic processes are inherently tissue-specific, and can themselves be altered by disease [7,8]. A potential way to circumvent these complications is to identify epigenetic marks that are established in the very early embryo and maintained during subsequent differentiation, thus affecting all germ layer lineages.
Accordingly, in this study we employed two different approaches to identify DNA methylation changes that are induced by periconceptional environment. First, we performed a genomewide search for metastable epialleles (MEs) in healthy Caucasian adults. MEs are genomic regions at which DNA methylation is established stochastically in the early embryo, leading to systemic (cross-tissue) interindividual variation in epigenetic regulation that is not mediated by genetic variation . Establishment of epigenotype at MEs has previously been shown to be affected by maternal nutrition around the time of conception [10-12]. Second, we used genomewide DNA methylation profiling to study a population in rural Gambia, wherein seasonal variations in food supply and metabolic demand provide a natural experiment by which to study the effect of periconceptional environment (including maternal nutritional status) on epigenetic development in the offspring . These two independent and complementary genomewide screens convergently identified the gene encoding the small non-coding RNA VTRNA2-1 as the lead candidate environmentally responsive epiallele. VTRNA2-1 (also called nc886) appears to act as a tumor suppressor gene subject to epigenetic silencing by promoter methylation. Elevated methylation at VTRNA2-1 predicts poor prognosis in leukemia , and lung  and esophageal cancer . VTRNA2-1 is genomically imprinted, with preferential methylation on the maternally inherited allele [17,18]. By assaying DNA methylation in peripheral blood mononuclear cells, Treppendahl et al. reported that about 25% of healthy individuals exhibit hypomethylation on both alleles of VTRNA2-1 , suggesting polymorphic imprinting. Here we report data indicating that polymorphic imprinting at VTRNA2-1 is not regulated by cis genetic variation, but is affected by maternal environment around the time of conception, occurs systemically, and is highly stable over many years. Our findings provide a plausible causal pathway to explain previous observations that season of birth predicts adult mortality from infection-related causes in rural Gambians .
Genomewide screen for human metastable epialleles
Genomic features of regions flanking candidate metastable epialleles
Validation of epigenetic metastability at VTRNA2-1
Genomewide screen for DNA methylation changes associated with season of conception in rural Gambians identifies VTRNA2-1 as the top hit
The 10 most significant SoC-associated DMRs (SoC-DMRs) identified by the bump hunting analysis
Probes in SoC-DMR
Probes in cluster
SoC P -value
SoC P -value (FWER)
~3 kb upstream of TSS
All of the top 10 SoC-DMRs (Table 1) showed lower methylation in the dry relative to the rainy season (that is, a positive coefficient), consistent with our previous studies focused on candidate MEs [10,20]. Plots of methylation by SoC for the second and third highest ranking SoC-DMRs (PAX8 and PRDM9) are provided as examples (Figure S5a,b in Additional file 1). Importantly, VTRNA2-1 was not the only candidate ME that showed SoC effects in the 450k data. PAX8 and AKAP12, identified as candidate MEs in our previous reduced-representation screen , were among the top 10 SoC-DMRs (Table 1), reinforcing the notion that developmental establishment of DNA methylation at MEs is particularly sensitive to periconceptional environment.
Maternal periconceptional nutrition predicts offspring hypomethylation at VTRNA2-1, which is stable for many years
Seasonal differences in maternal nutritional status affecting one-carbon metabolism (Figure 4e) suggest a potential mechanism to explain the SoC effect on VTRNA2-1 methylation. To test this, each of 13 maternal nutritional status biomarkers sampled in early pregnancy  was evaluated as a potential predictor of VTRNA2-1 hypomethylation in her infant (<40% by pyrosequencing)  (Table S4 in Additional file 2). Low maternal vitamin B2 (riboflavin) and methionine (MET) specifically around the time of conception predicted VTRNA2-1 hypomethylation in her infant, whereas low maternal plasma dimethylglycine (DMG) protected against hypomethylation (Figure 4f; P = 0.05, 0.01, and 0.05 respectively). Each of these associations appears to follow a dose-response relationship, but larger datasets will be needed to confirm them and model the relevant methyl donor pathways. The riboflavin finding, consistent with our previous observations across a panel of MEs , is particularly noteworthy; riboflavin is required for synthesis of flavin-adenine dinucleotide, an essential cofactor for methylenetetrahydrofolate reductase (MTHFR), a rate-limiting enzyme in one-carbon metabolism. Riboflavin deficiency is very common in low-income populations with low intakes of dairy and animal foods, including rural Gambia.
In order for a nutritionally mediated epigenetic change in early life to affect risk of disease in adulthood, such marks must persist for many years. To test the temporal stability of VTRNA2-1 methylation, we obtained serial PBL DNA samples from 55 rural Gambians, spanning a 10-year period. Average VTRNA2-1 methylation was generally stable from approximately 7 to 17 years of age (Figure 4g), indicating that individual patterns of methylation at the locus, once established in the early embryo, persist to adulthood.
Previous studies have cataloged interindividual variation in DNA methylation in specific cell types [29,30]. Our approach, screening for concordant variation in multiple tissues representing different embryonic lineages, is unique in enabling the identification of systemic interindividual variation . Here, using Bisulfite-seq to analyze the PBL and HF DNA methylomes of two Caucasian adults, we performed the first truly genomewide screen for human MEs. We followed this with a genomewide search for SoC effects on DNA methylation in Gambian infants. VTRNA2-1 was a top hit by both approaches. The convergence of these two independent genomewide screens at this one locus positions VTRNA2-1 as a potential major indicator of early environmental effects on epigenetic regulation in humans. Interindividual variation in VTRNA2-1 methylation has been reported in peripheral blood mononuclear cells of healthy individuals  and in adjacent normal tissues collected during tumor biopsies . Neither of those previous studies, however, demonstrated concordant interindividual variation across multiple tissues from the same individuals. (Although Romanelli et al.  showed four examples of allelic methylation in both placenta and cord blood of the same individuals, all were approximately 50% methylated - that is, no interindividual variation.) Here, by studying liver, kidney, and brain of Vietnamese cadavers, and PBL and HF of healthy Caucasians and Gambians we showed that hypomethylation, suggestive of loss of imprinting at VTRNA2-1, occurs systemically in specific individuals in diverse populations. Further, we demonstrated that this is partially determined by periconceptional environment, and is stable over at least 10 years.
The unbiased nature of our ME screen enabled us to characterize genomic features associated with epigenetic metastability. Rather than localized differences, our analysis identified substantial depletion of CGIs and SINEs, and enrichment of LINEs and ERVs across genomic regions at least 10 kb upstream and downstream of candidate MEs (Figure 2). Since transposable elements are important determinants of regional DNA methylation patterns , these widespread sequence features likely contribute to the stochastic epigenetic variation at these loci. The enrichment of ERVs is particularly noteworthy in that all documented mouse MEs are associated with retrotransposition of an intracisternal A particle (a murine ERV) .
Although genomically unbiased, our ME screen does have limitations. Due to the expense of performing Bisulfite-seq, we profiled two tissues from only two individuals. More MEs are likely to be discovered by future screens including more individuals; including at least three tissues (representing all three germ layer lineages) may also be advantageous. As expected, several of the genes identified as candidate MEs in our previous reduced-representation screen  were also identified here. Our results also corroborate those of another ME screen based on our multiple-tissue approach. Using the Illumina 450k array to profile DNA methylation in peripheral blood leukocytes and colonic mucosa of 10 children, Harris et al.  identified 1,776 CpG sites associated with 1,013 genes as candidate MEs. Of the 1,013 genes they identified, 198 (19.5%) overlap with those associated with the unfiltered candidate MEs we identified (Table S1 in Additional file 2), many more than expected by chance (P = 2.1 × 10-8; chi-squared test). Moreover, of the 1,776 CpGs Harris et al. identified as candidate MEs, four are within or near our top 10 SoC DMRs (Table S7 in Additional file 2). Among these, Harris et al. identified one probe (cg04515200) within the VTRNA2-1 SoC DMR (Figure 4c). Given the low intraindividual variance at VTRNA2-1 (Figure 4d), we were surprised that a lone probe would be identified as an ME (rather than the entire region). Indeed, examination of the Harris et al. data confirmed systemic interindividual variation in methylation across the entire VTRNA2-1 imprinted DMR (Figure S6 in Additional file 1), suggesting that perhaps the filtering criteria they used were overly conservative. Evidence of a SoC effect at the other three loci (Table S7 in Additional file 2) strengthens their candidacy as MEs.
Of these, the SoC DMR at ZFP57 is particularly interesting because ZFP57 plays a key role in maintaining allelic methylation during pre-implantation development . This genomic region (approximately 3 kb upstream of the gene) was identified as a candidate ME by both Harris et al.  and our current screen (prior to SNP filtering) (Table S1 in Additional file 2). Although it is intriguing that ZFP57 is among the top SoC DMRs (Table 1), the effect is very subtle (Figure S5c in Additional file 1) and should be interpreted with caution. Nonetheless, in a recent genomewide (450k) study of immune cells from cord blood of newborn infants , methylation in this same region was found to be strongly predicted by maternal folate status in late pregnancy. Future studies will be required to determine if this master regulator of allelic methylation is indeed an ME.
This and preceding studies [20,32] have thus far screened for MEs only in Caucasians. We did previously show that candidate MEs identified by our multiple-tissue screen in Caucasians exhibit similar patterns of interindividual DNA methylation variation across Asians and West Africans , suggesting that epigenetic metastability is an ancestral feature of the human genome. This conclusion is reinforced here by our data on VTRNA2-1. Indeed, our discovery in Gambians of a SoC effect at the VTRNA2-1 ME (which was identified in Caucasians) is a great strength of this study. Nonetheless, it will be important for future studies to perform the multiple-tissue ME screen in non-Caucasians.
One potential criticism of our approach for identifying SoC DMRs is that our 450k analyses were performed using unfractionated leukocytes. Using the method of Jaffe and Irizarry , however, we found evidence of only minor SoC effects on leukocyte composition (Figure S4 in Additional file 1), and our analyses of the 450k data included adjustment for this. An analysis of previous Illumina 450k data on flow-sorted blood cells  found no cell type-specific differences in DNA methylation at any of the 15 CpGs comprising the VTRNA2-1 SoC DMR (Table S5 in Additional file 2). Hence, the SoC effect at VTRNA2-1 should be unaffected by changes in blood cell composition. Indeed, we found a similar SoC effect in HF and PBL (Figure 4a,b), tissues derived from different germ layer lineages.
In an attempt to focus on interindividual epigenetic variation that is not genetically mediated, we filtered out candidate MEs associated with genetic variation within 200 bp bins. However, genetic variants that influence DNA methylation (mQTL) can operate over vast genomic distances. To evaluate potential longer-range genetic effects on DNA methylation at VTRNA2-1, we analyzed genome-wide mQTL data recently reported by Zhang et al.  on lymphoblastoid cell lines from 132 individuals of European and African origin. The distribution of polymorphic imprinting in these cell lines was impressively similar to that in primary tissues; the 10 CpG sites comprising the VTRNA2-1 imprinted DMR were hypomethylated in approximately 20% of individuals (Figure S2 in Additional file 1). However, at none of these sites did the study by Zhang et al. (currently among the largest genome-wide mQTL screens) detect mQTL. Given the strong bimodal distribution of individual VTRNA2-1 methylation, if hypomethylation at the locus was regulated by cis genetic variation, it should have been detected. Hence, these data argue strongly against interindividual variation at VTRNA2-1 being genetically mediated. The environmentally mediated effect of SoC on hypomethylation in Gambian children further supports this interpretation.
Together, our data suggest that the effect of maternal nutrition on DNA methylation at VTRNA2-1 exhibits all the hallmarks of ‘metabolic imprinting’ : a critical window of sensitivity (in the pre-implantation embryo), a dose-response relation between exposure and outcome, and a persistent effect. Because VTRNA2-1 is transcriptionally regulated by methylation at its promoter [14,17] and appears to act as a tumor suppressor in various types of cancer [14-16], metabolic imprinting of VTRNA2-1 DNA methylation is a likely determinant of cancer risk. Moreover, since the VTRNA2-1 transcript affects PKR-mediated regulation of immune function , this early environmental effect on DNA methylation could have far-reaching effects on immune function and might offer an explanation for how season of birth (which maps onto SoC) affects adult mortality from infectious disease in rural Gambians [19,37]. More generally, we anticipate that the list of candidate metastable epialleles we identified will provide a resource for future studies of epigenetic variation and human disease.
Materials and methods
Informed written consent was obtained from all subjects prior to participation, and experimental methods complied with the Helsinki declaration. Scientific approval for the Caucasian studies was obtained under IRB protocol H-18849 at the Baylor College of Medicine. For the Gambian studies, the Scientific Coordinating Committee of MRC Unit, The Gambia, granted scientific approval and the joint Gambian Government/MRC Ethics Committee (SCC/EC 1151) and the London School of Hygiene and Tropical Medicine Ethics Committee (EC 5525) granted ethical permission for this study. Sample collection, study populations, and DNA isolation have been previously described [10,20,21].
Bisulfite-seq, SNP calling from Bisulfite-seq data, and filtering bins based on SNP score
Bisulfite-seq library preparation and sequencing were performed in the Baylor College of Medicine Human Genome Sequencing Center, and read mapping and data processing were performed as previously described . The accuracy of our methylation calls by Bisulfite-seq has been quantitatively validated . Combined bisulfite-seq data from HF and PBL samples from two individuals (C01 and C02)  were used to determine SNP scores for 200 bp bins genomewide. Reads were mapped to hg19 with BISMARK  and BisSNP (v.0.82.2)  was run to call SNPs on each combined sample (PBL + HF) using the default settings and dbSNP 135. The VCFpostprocess tool of BisSNP was used with the default settings to filter the raw SNP calls. SNP scores were assigned at each locus for which there was a SNP called in either C01 or C02. A score of 1 was assigned to those loci at which both C01 and C02 had the same SNP call. A score of 0 was assigned otherwise. Average SNP scores were calculated for all 200-bp bins genomewide. Genomewide bins were defined as those containing at least 2 CpG sites and which had mapping coverage  in at least one of the four C01/C02 HF/PBL samples (N = 5,257,320). A bin’s SNP score was set to 1 if no SNP was called in either individual.
Formulation of the systemic interindividual variation index
Targeted analysis of BluePrint epigenome data
Bisulfite-seq methylation calls from each of six monocyte samples (C000S5, C0010K, C001UY, C004SQ, C005PS, S000RD) from the Blueprint Epigenome project  were placed into 200 bp bins and an average methylation score was determined for each bin. Each bin which had a methylation score in at least two samples was then assigned a range score, defined as the difference between the highest and lowest methylation score of all informative samples.
Comparison of genomic features among metastable epiallele and non-metastable epiallele genomewide bins
The distances from each non-ME and ME bin genomewide were determined for all SINE, LINE, and ERV elements (UCSC RepeatMasker track, hg19) and CGIs (UCSC CpG islands track, hg19). Non-ME bins (N = 298,979) were defined as those having a SIVI score between -5 and 5, a SNP score of 1, no overlap with segmental duplication (UCSC, hg19), and at least six CpG sites. ME bins (N = 109) had a minimum SIVI of 20 but otherwise the same characteristics. A 20 kb window centered on the midpoint of each bin was divided into 40 intervals of 500 bp each. The overlap score is the number of each type of individual elements that overlap the interval. The 'normalized overlap score' was calculated by dividing the raw overlap score by the total number of bins in each set. Chi-squared tests were performed on the raw overlap scores for CGIs and SINE, LINE, and ERV elements. A chi-squared test was also performed to determine the significance of the localization of ME bins to subtelomeric regions (defined as the 1 Mb flanking each 10 kb telomere).
Quantitative analysis of CpG site-specific DNA methylation at VTRNA2-1 was performed by bisulfite pyrosequencing . The pyrosequencing assay was validated using standards composed of known mixtures of methylated and unmethylated human genomic DNA  (Figure S7 in Additional file 1). The primers used were as follows: forward TGAAGGTGTGATAGAAAGTATG, reverse (Biotin) ACATTTTTTTATCCCCATA, sequencing AGTATGGAGGTTGGTTATT.
450k array hybridization, data processing, and analysis of the Gambian sample cohort
Sample preparation and hybridization to the Illumina HumanMethylation450 BeadChip arrays were performed at the Genetics Services Platform of the International Agency for Research on Cancer, according to manufacturer’s instructions. Data processing and analysis were performed as follows: 1) pre-filtering and quality control (QC); 2) color adjustment, probe-type bias correction and inter-sample quantile normalization; 3) batch effect correction and evaluation of alternative QC pipelines; 4) data-driven estimation of white blood cell counts; 5) replicate removal and outlier detection; 6) removal of probes close to SNPs; 7) bump hunting analysis.
Pre-filtering and quality control
Raw data for 485,577 CpG probes on the 450k array were loaded from IDAT files (n = 124 samples including three technical replicates). Array-wide two-dimensional multi-dimensional scaling plots clustered into two groups representing infant sex, confirming that recorded sex was correct for all samples. We then removed 11,656 probes on X and Y chromosomes, together with 57 SNP probes provided to detect potential sample mix-ups; 473,864 probes remained.
A further probe filtering step was implemented based on probe detection P-values. Using a detection P-value threshold of P = 0.01, the maximum sample failure rate was 0.00383 (sample ID = 9007225117_R06C01; Figure S8 in Additional file 1). This sample also appears as an outlier on principal component analysis plots of array-wide methylation (Figure S9 in Additional file 1) and was removed from all subsequent analysis; n = 123 samples remained. We also removed 6,600 probes that failed in one or more samples (using the same detection P-value threshold), leaving 467,264 probes for subsequent analysis. Additional quality control checks (overall signal intensity (M + U) across samples, M-value distributions and multi-dimensional scaling plot excluding X and Y probes) revealed no further issues.
Color adjustment, probe-type bias correction and inter-sample quantile normalization
Standard pre-processing adjustments as described in  were applied as follows: a) correction for probe color bias (lumi col adj), b) inter-sample quantile normalization (lumi QN), and c) probe type correction using beta mixture quantile dilation (BMIQ) 
Batch effect correction and evaluation of alternative quality control pipelines
p1A: col adj + QN + BMIQ + batch correction (using ComBat )*
p1B: col adj + QN + BMIQ + batch correction, adjusting for SoC (using ComBat )*
p2: col adj + BMIQ + adjust for batch covariates on locus-by-locus basis**
p3: col adj + QN + BMIQ adjust for batch covariates on locus-by-locus basis**
p4: col adj + QN + BMIQ (no adjustment for batch covariates)
p5: col adj + BMIQ (no adjustment for batch covariates)
*Batch correction using ComBat can be applied with or without adjusting for the variable of interest (in our case SoC).
**For p2 and p3, batch effects are first estimated in a linear multiple regression including batch covariates and SoC. Adjusted data are then the residual variation after adjusting for estimated batch effects only.
The performance of each pipeline was evaluated by comparing 450k beta values with percentage methylation estimates obtained from pyrosequencing data at five ME loci overlapping 450k CpG probes (Figure S3 in Additional file 1). Each pipeline was further evaluated by comparing 450k beta values between three pairs of technical replicates on the 450k array. Bland-Altman plots  illustrating replication performance across the full range of percentage methylation for each of three technical replicates using ‘optimal’ QC pipeline (p4) are presented in Figure S10 in Additional file 1.
Summary of QC pipeline evaluation
The p4 (col adj + QN + BMIQ) and p5 (col adj + BMIQ) approaches gave the best performance when comparing 450k and pyrosequencing methylation estimates at overlapping loci.
Neither of these methods (p4 or p5) feature direct batch adjustment, although p4 does remove a substantial amount of batch variation through inter-sample QN.
Regression-based locus-by-locus adjustment for batch covariates gives relatively poor 450k versus pyrosequencing correlations.
We selected p4 (col adj + QN + BMIQ) as our optimal pipeline. This achieves a Pearson correlation of 0.92 (Spearman R = 0.94) between 450k and pyrosequencing assays, with 88% of beta values differing by less than 15% between the two platforms. These figures compare favorably with a similar analysis  performed across 340 CpGs from 4 human breast cancer cell lines (Spearman R = 0.88; 81% with beta difference <15%).
Data-driven estimation of white blood cell counts
Interpretation of analyses investigating DMRs or differentially methylated positions (DMPs) in whole blood should be treated with caution, due to the possibility of confounding by white blood cell (WBC) type. We obtained methylation data-driven estimates of WBC-type composition using the method described by Jaffe and Irizarry . These estimates are stable across technical replicates (Figure S11 in Additional file 1).
There is some evidence for potential confounding by WBC in our data. First, WBC type may be weakly associated with SoC (Figure S4 in Additional file 1). Secondly WBC composition is strongly associated with principal components explaining a large portion of genomewide variation in methylation (Table S6 in Additional file 2). We therefore performed our DMR analysis using bump hunting with adjustment for estimated WBC composition. An independent study  identified probes on the 450k array that are differentially methylated according to WBC type. None of the probes mapping to VTRNA2-1 in our study fall within this set (Table S5 in Additional file 2).
Replicate removal and outlier detection
For each of the three technical replicates, we remove the replicate with the highest probe fail rate. One outlier is also removed (see ‘Pre-filtering and quality control’ section above), leaving n = 120 samples for final analysis.
Removal of probes close to SNPs
450k probes within 10 bp of a common African SNP, defined as ‘AFR’ designated polymorphisms with a minor allele frequency >1% using data from the 1000 Genomes Project  were removed. We identified 42,435 such probes, so that 424,829 CpGs remain for the bump hunting analysis. SNP filtering was performed using the R ‘Illumina450ProbeVariants.db’ package.
Bump hunting analysis
The bump hunting method  measures differential methylation across pre-defined clusters of neighboring CpGs. We used the recommended proximity-based criteria for defining clusters so that each cluster contains a minimum of seven probes on the 450k array, with each probe located within 300 bp of its nearest neighbour. After filtering of SNP-proximal probes (see previous section), this resulted in a total of 10,394 clusters, with a maximum cluster size of 104 CpGs. We performed the bump hunting analysis with methylation pre-adjusted for sex and WBC composition, since the inclusion of adjustment covariates in the bumphunter linear model is not recommended (see ). M-values (logit-transformed beta-values) are used in place of untransformed beta-values throughout .
The bump hunting method first fits a linear model at each CpG within a cluster, with M-value as the outcome variable and SoC as the predictor variable. A smooth curve (loess) is then fitted to the estimated SoC coefficients across each cluster. Regions where the smoothed estimated coefficients deviate far from zero were considered candidate DMRs. Specifically, these are defined as groups of neighboring probes whose smoothed absolute coefficients exceed the 99th percentile of all estimated coefficients. To form a null distribution that accounts for (a) correlations between probes, (b) differences in cluster size and (c) potential non-normal distribution of model errors, this process is repeated 1,000 times with the outcome variables (M-values) permuted. The permutation P-value for a specific DMR is then the proportion of candidate DMRs across all permutations that had both a larger mean absolute coefficient and a larger length (number of probes) than the empirically observed DMR. We additionally report FWER-adjusted P-values, which are the proportion of permutations having at least one region as extreme as the empirically observed DMR.
Analysis of associations between infant methylation at VTRNA2-1 and season of conception and maternal 1-carbon biomarker concentrations
VTRNA2-1 methylation assayed at three CpG sites by bisulfite pyrosequencing was observed to be highly correlated (mean Spearman R = 0.93). Because percentage methylation values followed a distinct bimodal pattern, we used mean methylation, dichotomized at 40% , as our outcome variable. Association of SoC with dichotomized mean methylation was assessed using a Pearson chi-squared test. For testing associations of maternal biomarkers with dichotomized mean methylation, all biomarkers were back-extrapolated to time of conception, adjusted for gestational age, and analyzed in the logarithm, as described previously . Biomarker associations were analyzed in a logistic regression model using the glm function in R.
The Gene Expression Omnibus (GEO) accession number for the raw sequence reads for the four Bisulfite-seq libraries is GSE44806. The GEO accession number for the original 450k data sets is GSE59592.
beta mixture quantile dilation
differentially methylated region
long interspersed nuclear element
methylation quantitative trait locus
peripheral blood lymphocyte
short interspersed nuclear element
systemic interindividual variation index
season of conception
white blood cell
We thank Adam Gillum (Baylor College of Medicine) for assistance with the figures, Dr Yongtao Guan (Baylor College of Medicine) for help with the analysis of the LCL methylation data, and Dr Florence Le Calvez-Kelm and Mr Geoffroy Durand from the Genetic Cancer Susceptibility group (IARC, Lyon) for performing the HM450 beadchip assays. RAW was supported by grants from NIH/National Institute of Diabetes and Digestive and Kidney Diseases (1R01DK081557) and the USDA (CRIS 6250-51000-055). The Gambian study was supported by a Wellcome Trust grant WT086369MA (to BJH), and core funding MCA760-5QX00 to the International Nutrition Group by the UK Medical Research Council (MRC) and the UK Department for International Development (DFID) under the MRC/DFID Concordat agreement. Our thanks go to the study participants in The Gambia and staff at MRC Keneba, The Gambia. MNR, JMC and YYG were supported by a grant from IARC. HHV and ZH were supported by a Grand Challenges Exploration grant from the Bill and Melinda Gates Foundation.
- Jaenisch R, Bird A. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet. 2003;33:245–54.PubMedView Article
- Gluckman PD, Hanson MA, Buklijas T, Low FM, Beedle AS. Epigenetic mechanisms that underpin metabolic and cardiovascular diseases. Nat Rev Endocrinol. 2009;5:401–8.PubMedView Article
- Ng JW, Barrett LM, Wong A, Kuh D, Smith GD, Relton CL. The role of longitudinal cohort studies in epigenetic epidemiology: challenges and opportunities. Genome Biol. 2012;13:246.PubMed CentralPubMedView Article
- Feil R, Fraga MF. Epigenetics and the environment: emerging patterns and implications. Nat Rev Genet. 2012;13:97–109.PubMed
- Jirtle RL, Skinner MK. Environmental epigenomics and disease susceptibility. Nat Rev Genet. 2007;8:253–62.PubMedView Article
- Cedar H, Bergman Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat Rev Genet. 2009;10:295–304.PubMedView Article
- Rakyan VK, Down TA, Balding DJ, Beck S. Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011;12:529–41.PubMed CentralPubMedView Article
- Waterland RA, Michels KB. Epigenetic epidemiology of the developmental origins hypothesis. Annu Rev Nutr. 2007;27:363–88.PubMedView Article
- Rakyan VK, Blewitt ME, Druker R, Preis JI, Whitelaw E. Metastable epialleles in mammals. Trends Genet. 2002;18:348–51.PubMedView Article
- Dominguez-Salas P, Moore SE, Baker MS, Bergen AW, Cox SE, Dyer RA, et al. Maternal nutrition at conception modulates DNA methylation of human metastable epialleles. Nat Commun. 2014;5:3746.PubMed CentralPubMedView Article
- Waterland RA, Dolinoy DC, Lin JR, Smith CA, Shi X, Tahiliani KG. Maternal methyl supplements increase offspring DNA methylation at Axin fused. Genesis. 2006;44:401–6.PubMedView Article
- Waterland RA, Jirtle RL. Transposable elements: targets for early nutritional effects on epigenetic gene regulation. Mol Cell Biol. 2003;23:5293–300.PubMed CentralPubMedView Article
- Dominguez-Salas P, Moore SE, Cole D, da Costa KA, Cox SE, Dyer RA, et al. DNA methylation potential: dietary intake and blood concentrations of one-carbon metabolites and cofactors in rural African women. Am J Clin Nutr. 2013;97:1217–27.PubMed CentralPubMedView Article
- Treppendahl MB, Qiu X, Sogaard A, Yang X, Nandrup-Bus C, Hother C, et al. Allelic methylation levels of the noncoding VTRNA2-1 located on chromosome 5q31.1 predict outcome in AML. Blood. 2012;119:206–16.PubMed CentralPubMedView Article
- Cao J, Song Y, Bi N, Shen J, Liu W, Fan J, et al. DNA methylation-mediated repression of miR-886-3p predicts poor outcome of human small cell lung cancer. Cancer Res. 2013;73:3326–35.PubMedView Article
- Lee HS, Lee K, Jang HJ, Lee GK, Park JL, Kim SY, et al. Epigenetic silencing of the non-coding RNA nc886 provokes oncogenes during human esophageal tumorigenesis. Oncotarget. 2014;5:3472–81.PubMed CentralPubMed
- Paliwal A, Temkin AM, Kerkel K, Yale A, Yotova I, Drost N, et al. Comparative anatomy of chromosomal domains with imprinted and non-imprinted allele-specific DNA methylation. PLoS Genet. 2013;9, e1003622.PubMed CentralPubMedView Article
- Romanelli V, Nakabayashi K, Vizoso M, Moran S, Iglesias-Platas I, Sugahara N, et al. Variable maternal methylation overlapping the nc886/vtRNA2-1 locus is locked between hypermethylated repeats and is frequently altered in cancer. Epigenetics. 2014;9:783–90.PubMed CentralPubMedView Article
- Moore SE, Cole TJ, Poskitt EM, Sonko BJ, Whitehead RG, McGregor IA, et al. Season of birth predicts mortality in rural Gambia. Nature. 1997;388:434.PubMedView Article
- Waterland RA, Kellermayer R, Laritsky E, Rayco-Solon P, Harris RA, Travisano M, et al. Season of conception in rural Gambia affects DNA methylation at putative human metastable epialleles. PLoS Genet. 2010;6, e1001252.PubMed CentralPubMedView Article
- Kunde-Ramamoorthy G, Coarfa C, Laritsky E, Kessler NJ, Harris RA, Xu M, et al. Comparison and quantitative verification of mapping algorithms for whole-genome bisulfite sequencing. Nucleic Acids Res. 2014;42, e43.PubMed CentralPubMedView Article
- Adams D, Altucci L, Antonarakis SE, Ballesteros J, Beck S, Bird A, Bock C, Boehm B, Campo E, Caricasole A, et al: BLUEPRINT to decode the epigenetic signature written in blood. Nat Biotechnol. 2012;30:224-226.
- Zhang X, Moen EL, Liu C, Mu W, Gamazon ER, Delaney SM, et al. Linking the genetic architecture of cytosine modifications with human complex traits. Hum Mol Genet. 2014;23:5893–905.PubMedView Article
- Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98:288–95.PubMedView Article
- Michels KB, Binder AM, Dedeurwaerder S, Epstein CB, Greally JM, Gut I, et al. Recommendations for the design and analysis of epigenome-wide association studies. Nat Methods. 2013;10:949–55.PubMedView Article
- Jaffe AE, Murakami P, Lee H, Leek JT, Fallin MD, Feinberg AP, et al. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int J Epidemiol. 2012;41:200–9.PubMed CentralPubMedView Article
- Jaffe AE, Irizarry RA. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 2014;15:R31.PubMed CentralPubMedView Article
- Oakes CC, Claus R, Gu L, Assenov Y, Hullein J, Zucknick M, et al. Evolution of DNA methylation is linked to genetic aberrations in chronic lymphocytic leukemia. Cancer Discov. 2014;4:348–61.PubMed CentralPubMedView Article
- Gemma C, Ramagopalan SV, Down TA, Beyan H, Hawa MI, Holland ML, et al. Inactive or moderately active human promoters are enriched for inter-individual epialleles. Genome Biol. 2013;14:R43.PubMed CentralPubMedView Article
- Lou S, Lee HM, Qin H, Li JW, Gao Z, Liu X, et al. Whole-genome bisulfite sequencing of multiple individuals reveals complementary roles of promoter and gene body methylation in transcriptional regulation. Genome Biol. 2014;15:408.PubMed CentralPubMedView Article
- Slotkin RK, Martienssen R. Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet. 2007;8:272–85.PubMedView Article
- Harris RA, Nagy-Szakal D, Kellermayer R. Human metastable epiallele candidates link to common disorders. Epigenetics. 2013;8:157–63.PubMed CentralPubMedView Article
- Li X, Ito M, Zhou F, Youngson N, Zuo X, Leder P, et al. A maternal-zygotic effect gene, Zfp57, maintains both maternal and paternal imprints. Dev Cell. 2008;15:547–57.PubMed CentralPubMedView Article
- Amarasekera M, Martino D, Ashley S, Harb H, Kesper D, Strickland D, et al. Genome-wide DNA methylation profiling identifies a folate-sensitive region of differential methylation upstream of ZFP57-imprinting regulator in humans. FASEB J. 2014;28:4068–76.PubMedView Article
- Waterland RA, Garza C. Potential mechanisms of metabolic imprinting that lead to chronic disease. Am J Clin Nutr. 1999;69:179–97.PubMed
- Lee K, Kunkeaw N, Jeon SH, Lee I, Johnson BH, Kang GY, et al. Precursor miR-886, a novel noncoding RNA repressed in cancer, associates with PKR and modulates its activity. RNA. 2011;17:1076–89.PubMed CentralPubMedView Article
- Moore SE, Cole TJ, Collinson AC, Poskitt EM, McGregor IA, Prentice AM. Prenatal or early postnatal events predict infectious deaths in young adulthood in rural Africa. Int J Epidemiol. 1999;28:1088–95.PubMedView Article
- Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011;27:1571–2.PubMed CentralPubMedView Article
- Liu Y, Siegmund KD, Laird PW, Berman BP. Bis-SNP: Combined DNA methylation and SNP calling for Bisulfite-seq data. Genome Biol. 2012;13:R61.PubMed CentralPubMedView Article
- Shen L, Guo Y, Chen X, Ahmed S, Issa JP. Optimizing annealing temperature overcomes bias in bisulfite PCR methylation analysis. Biotechniques. 2007;42:48–58.PubMedView Article
- Harris RA, Wang T, Coarfa C, Nagarajan RP, Hong C, Downey SL, et al. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications. Nat Biotechnol. 2010;28:1097–105.PubMed CentralPubMedView Article
- Marabita F, Almgren M, Lindholm ME, Ruhrmann S, Fagerstrom-Billai F, Jagodic M, et al. An evaluation of analysis pipelines for DNA methylation profiling using the Illumina HumanMethylation450 BeadChip platform. Epigenetics. 2013;8:333–46.PubMed CentralPubMedView Article
- Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450k DNA methylation data. Bioinformatics. 2013;29:189–96.PubMed CentralPubMedView Article
- Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010;11:733–9.PubMedView Article
- Roessler J, Ammerpohl O, Gutwein J, Hasemeier B, Anwar SL, Kreipe H, et al. Quantitative cross-validation and content analysis of the 450k DNA methylation array from Illumina. Inc BMC Res Notes. 2012;5:210.View Article
- 1000 Genomes Project. http://www.1000genomes.org.
- 47.Bioconductor bumphunter reference manual. . http://www.bioconductor.org/packages/release/bioc/manuals/bumphunter/man/bumphunter.pdf.
- Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics. 2010;11:587.PubMed CentralPubMedView Article
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.