Cohort descriptions
The Lothian Birth Cohort 1921
Data were from the Lothian Birth Cohort 1921 (LBC1921), which is the basis of a longitudinal study of aging [13,15]. Participants were born in 1921 and most completed a cognitive ability test at about the age of 11 years in the Scottish Mental Survey 1932 (SMS1932) [36]. The SMS1932 was administered nationwide to almost all 1921-born children who attended school in Scotland on 1 June 1932. The cognitive test was the Moray House Test No. 12, which provides a measure of general cognitive ability and has a scoring range between 0 and 76. The LBC1921 study attempted to follow up individuals who might have completed the SMS1932 and resided at about the age of 79 years in the Lothian region (Edinburgh and its surrounding areas) of Scotland; 550 people (234 men, 43%) were successfully traced and participated in the study from the age of 79 years. To date, there have been four additional follow-up waves at average ages of 83, 87, 90, and 92 years. The cohort has been deeply phenotyped during the later-life waves, including blood biomarkers, cognitive testing, and psycho-social, lifestyle, and health measures [13]. Genome-wide single nucleotide polymorphisms and exome chip data are also available. DNA methylation measured in subjects at an average age of 79 years (n = 514) was used for analyses in this report.
Lothian Birth Cohort 1936
The association between DNA methylation and mortality was also investigated in a second study, the Lothian Birth Cohort 1936 (LBC1936) [13,14]. All participants were born in 1936. Most had taken part in the Scottish Mental Survey 1947 at a mean age of 11 years as part of national testing of almost all children born in 1936 who attended Scottish schools on 4 June 1947 [37]. The cognitive test administered was the same Moray House Test No. 12 used in the SMS1932. A total of 1,091 participants (548 men, 50%) who were living in the Lothian area of Scotland were re-contacted in later life. Extensive phenotyping has also been carried out in this study, with data collection waves at three time points [13]. Genome-wide single nucleotide polymorphisms and exome chip data are also available. DNA methylation was measured in 1,004 subjects at Wave 1 (mean age, 70 years). To date, there have been two additional follow-up waves at average ages of 73 and 76 years.
The Framingham Heart Study
The Framingham Heart Study (FHS) is a community-based longitudinal study of participants living in and near Framingham, MA, at the start of the study in 1948 [16]. The Offspring cohort comprised the children and spouses of the original FHS participants, as described previously [17]. Briefly, enrollment for the Offspring cohort began in 1971 (n = 5,124), and in-person evaluations occurred approximately every 4 to 8 years thereafter. The current analysis was limited to participants from the Offspring cohort who survived until the eighth examination cycle (2005 to 2008) and consented to genetics research. DNA methylation data of peripheral blood samples collected at the eighth examination cycle were available in 2,741 participants.
The Normative Aging Study
The US Department of Veterans Affairs (VA) Normative Aging Study (NAS) is an ongoing longitudinal cohort established in 1963, which included men who were aged 21 to 80 years and free of known chronic medical conditions at entry [18,19]. Participants were subsequently invited to medical examinations every 3 to 5 years. At each visit, participants provided information on medical history, lifestyle, and demographic factors, and underwent a physical examination and laboratory tests. DNA samples were collected from 1999 to 2007 from the 675 active participants and used for DNA methylation analysis. We excluded 18 participants who were not of European descent or had missing information on race, leaving a total of 657 individuals.
Brisbane Systems Genetics Study
The Brisbane Systems Genetics Study (BSGS) [20] is a cohort comprising adolescent monozygotic (MZ) and dizygotic (DZ) twins, their siblings, and their parents. They were originally recruited into an ongoing study of the genetic and environmental factors influencing cognition and pigmented nevi. DNA methylation was measured on 614 individuals from 117 families of European descent. Families consist of adolescent MZ twins (n = 67 pairs) and DZ twins (n = 111 pairs), their siblings (n = 119), and their parents (n = 139). Children have a mean age of 14 years (age range, 9–23 years) and parents a mean age of 47 years (age range, 33–75 years).
Ethics
LBC consent
Following informed consent, venesected whole blood was collected for DNA extraction in both LBC1921 and LBC1936. Ethics permission for the LBC1921 was obtained from the Lothian Research Ethics Committee (Wave 1: LREC/1998/4/183). Ethics permission for the LBC1936 was obtained from the Multi-Centre Research Ethics Committee for Scotland (Wave 1: MREC/01/0/56) and the Lothian Research Ethics Committee (Wave 1: LREC/2003/2/29). Written informed consent was obtained from all subjects.
FHS consent
All participants provided written informed consent at the time of each examination visit. The study protocol was approved by the Institutional Review Board at Boston University Medical Center (Boston, MA, USA).
NAS consent
The NAS study was approved by the Institutional Review Boards (IRBs) of the participating institutions. Participants have provided written informed consent at each visit.
BSGS consent
The BSGS study was approved by the Queensland Institute for Medical Research Human Research Ethics Committee. All participants gave informed written consent.
DNA methylation measurement
In all cohorts, bisulphite converted DNA samples were hybridised to the 12 sample Illumina HumanMethylation450BeadChips [38] using the Infinium HD Methylation protocol and Tecan robotics (Illumina, San Diego, CA, USA).
LBC1921 and LBC1936 DNA methylation
DNA was extracted from 514 whole blood samples in LBC1921 and from 1,004 samples in LBC1936. Samples were extracted at MRC Technology, Western General Hospital, Edinburgh (LBC1921) and the Wellcome Trust Clinical Research Facility (WTCRF), Western General Hospital, Edinburgh (LBC1936), using standard methods. Methylation typing of 485,512 probes was performed at the WTCRF. Raw intensity data were background-corrected and methylation beta-values generated using the R minfi package [39]. Quality control analysis was performed to remove probes with a low (<95%) detection rate at P <0.01. Manual inspection of the array control probe signals was used to identify and remove low quality samples (for example, samples with inadequate hybridization, bisulfite conversion, nucleotide extension, or staining signal). The Illumina-recommended threshold was used to eliminate samples with a low call rate (samples with <450,000 probes detected at P <0.01). Since the LBC samples had previously been genotyped using the Illumina 610-Quadv1 genotyping platform, genotypes derived from the 65 SNP control probes on the methylation array using the wateRmelon package [40] were compared to those obtained from the genotyping array to ensure sample integrity. Samples with a low match of genotypes with SNP control probes, which could indicate sample contamination or mix-up, were excluded (n = 9). Moreover, eight subjects whose predicted sex, based on XY probes, did not match reported sex were also excluded.
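As an illustration only (the analyses in this report were run in R with the minfi and wateRmelon packages), the two detection-based filters described above can be sketched in Python. The function name `qc_flags`, its arguments, and the toy matrix are hypothetical; the sketch assumes a probe-by-sample matrix of detection P values and computes both filters on the raw matrix, which may differ from the order in which the filters were actually applied.

```python
from typing import List, Tuple


def qc_flags(det_p: List[List[float]],
             p_cut: float = 0.01,
             min_probe_rate: float = 0.95,
             min_probes_per_sample: int = 450_000) -> Tuple[List[int], List[int]]:
    """Return (kept_probe_indices, kept_sample_indices).

    det_p[i][j] is the detection P value of probe i in sample j
    (hypothetical layout, assumed for this sketch).
    """
    n_probes = len(det_p)
    n_samples = len(det_p[0]) if n_probes else 0

    # A probe counts as detected in a sample when its P value is below p_cut.
    detected = [[p < p_cut for p in row] for row in det_p]

    # Probe filter: keep probes detected in at least min_probe_rate of samples.
    kept_probes = [i for i, row in enumerate(detected)
                   if n_samples and sum(row) / n_samples >= min_probe_rate]

    # Sample filter: keep samples with enough detected probes.
    kept_samples = [j for j in range(n_samples)
                    if sum(detected[i][j] for i in range(n_probes))
                    >= min_probes_per_sample]
    return kept_probes, kept_samples


# Toy data: 4 probes x 2 samples, with a sample threshold scaled down to 3.
toy = [[0.001, 0.3], [0.001, 0.001], [0.001, 0.2], [0.002, 0.5]]
print(qc_flags(toy, min_probes_per_sample=3))  # ([1], [0])
```

With the default thresholds, the function mirrors the 95% probe detection rate and the 450,000-probe sample call rate quoted in the text.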
FHS DNA methylation
Peripheral blood samples were collected at the eighth examination cycle (2005 to 2008). Genomic DNA was extracted from buffy coat using the Gentra Puregene DNA extraction kit (Qiagen) and bisulfite converted using the EZ DNA Methylation kit (Zymo Research Corporation). DNA methylation quantification was conducted in two laboratory batches. Methylation beta values were generated using the Bioconductor minfi package with background correction. Sample exclusion criteria included poor SNP matching of control positions, missing rate >1%, outliers from multi-dimensional scaling (MDS), and sex mismatch. Probes were excluded if their missing rate was >20%. In total, 2,635 samples and 443,304 CpG probes remained for analysis.
NAS DNA methylation
DNA was extracted from buffy coat using the QIAamp DNA Blood Kit (QIAGEN, Valencia, CA, USA). A total of 500 ng of DNA was used to perform bisulfite conversion using the EZ-96 DNA Methylation Kit (Zymo Research, Orange, CA, USA). To limit chip and plate effects, a two-stage age-stratified algorithm was used to randomize samples and ensure similar age distributions across chips and plates; we randomized 12 samples, drawn from across all the age quartiles, to each chip, and chips were then randomized to plates (each housing eight chips). Quality control analysis was performed to remove samples where >1% of probes had a detection P value >0.05. The remaining samples were preprocessed using the Illumina-type background correction without normalization as reimplemented in the Bioconductor minfi package, which was used to generate methylation beta values [39]. All 485,512 CpG and CpH probes were in the working set.
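The two-stage randomization can be illustrated with a minimal Python sketch. This is one possible reading of the description above, not the algorithm actually used: the function name `assign_chips_and_plates`, the round-robin dealing across quartiles, and the seed are all assumptions.

```python
import random
from typing import Dict, List


def assign_chips_and_plates(ages: Dict[str, float], seed: int = 0,
                            per_chip: int = 12,
                            chips_per_plate: int = 8) -> List[List[List[str]]]:
    """Return plates -> chips -> sample IDs (hypothetical layout)."""
    rng = random.Random(seed)

    # Stage 1: split sample IDs into age quartiles and shuffle within each.
    ordered = sorted(ages, key=lambda s: ages[s])
    n = len(ordered)
    quartiles = [ordered[k * n // 4:(k + 1) * n // 4] for k in range(4)]
    for q in quartiles:
        rng.shuffle(q)

    # Deal samples round-robin across quartiles so that every chip
    # draws from all four age quartiles.
    dealt: List[str] = []
    while any(quartiles):
        for q in quartiles:
            if q:
                dealt.append(q.pop())
    chips = [dealt[i:i + per_chip] for i in range(0, len(dealt), per_chip)]

    # Stage 2: randomize chips to plates.
    rng.shuffle(chips)
    return [chips[i:i + chips_per_plate]
            for i in range(0, len(chips), chips_per_plate)]


# Toy usage with 192 hypothetical samples (16 chips, 2 plates).
toy_ages = {f"s{i}": 50 + 0.1 * i for i in range(192)}
plates = assign_chips_and_plates(toy_ages)
print(len(plates), len(plates[0]), len(plates[0][0]))
```

When the number of samples is a multiple of 48, each chip in this sketch holds exactly three samples from each age quartile.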
BSGS DNA methylation
DNA was extracted from peripheral blood lymphocytes by the salt precipitation method [41] from samples that were time matched to sample collection of PAXgene tubes for gene expression studies in the Brisbane Systems Genetics Study [20]. Bisulphite converted DNA samples were hybridized to the 12 sample Illumina HumanMethylation450 BeadChips using the Infinium HD Methylation protocol and Tecan robotics (Illumina, San Diego, CA, USA). Samples were randomly placed with respect to the chip they were measured on and to the position on that chip in order to avoid any confounding with family. Box-plots of the red and green intensity levels and their ratio were used to ensure that no chip position was under- or over-exposed, with any outlying samples repeated. Similarly, the proportion of probes with detection P value less than 0.01 was examined to confirm strong binding of the sample to the array. Raw intensity values were background corrected using the Genome Studio software performing normalization to internal controls and background subtraction.
Mortality ascertainment
LBC mortality ascertainment
For both LBC1921 and LBC1936, mortality status was obtained via data linkage from the National Health Service Central Register, provided by the General Register Office for Scotland (now National Records of Scotland). Participant deaths and cause of death are routinely flagged to the research team on approximately a 12-weekly basis.
FHS mortality ascertainment
Deaths that occurred prior to 1 January 2013 were ascertained using multiple strategies, including routine contact with participants for health history updates, surveillance at the local hospital and in obituaries of the local newspaper, and queries to the National Death Index. We requested death certificates, hospital and nursing home records prior to death, and autopsy reports. When cause of death was undeterminable, the next of kin were interviewed. The date and cause of death were reviewed by an endpoint panel of three investigators.
NAS mortality ascertainment
Regular mailings to study participants have been used to maintain vital-status information, and official death certificates were obtained for decedents from the appropriate state health department. Death certificates were reviewed by a physician, and cause of death was coded by an experienced research nurse using ICD-9. Both participant deaths and cause of death are routinely updated by the research team, and the last available update was 31 December 2013.
Covariate measurement
LBC covariates
Mortality-associated variables assessed in LBC1921 and LBC1936 were used as covariates in the statistical models: educational attainment, age-11 cognitive ability, APOE e4 status (carriers versus non-carriers), smoking status, and the presence or absence of diabetes, high blood pressure, or cardiovascular disease. Age-11 cognitive ability (age-11 IQ) was measured in 1932 for LBC1921 and in 1947 for LBC1936 using the Moray House Test Number 12, described above. All other variables were measured at the late-life baseline waves (age 79 years for LBC1921 and age 70 years for LBC1936). APOE was genotyped from venous blood using PCR amplification of a 227-bp fragment of the APOE gene, which contains the two single nucleotide polymorphisms that are used to define the e2, e3, and e4 alleles [42] in LBC1921, and by TaqMan technology (Applied Biosystems, Carlsbad, CA, USA) in LBC1936. Subjects were then categorized by the presence or absence of the e4 allele. Social class was based on the most prestigious occupation held by the participant prior to retirement. It was grouped into five categories in LBC1921 and six categories in LBC1936, where Class III was split into manual and non-manual professions [43,44]. It was treated as a continuous variable with lower values representing the more prestigious classes. The other variables were determined via self-report: number of years of education (measured as a continuous variable), diabetes (yes/no), high blood pressure (yes/no), cardiovascular disease (yes/no), and categorical smoking status (current/ex-smoker, never smoked).
Given the known influence of blood cell count on methylation [21], we adjusted for five types of white blood cell count (basophils, monocytes, lymphocytes, eosinophils, and neutrophils) that were measured on the same blood sample that was analyzed for methylation. These data were collected and processed the same day; technical details are reported in McIllhagger et al. [45].
FHS covariates
At the eighth in-person examination visit, participants completed a questionnaire that inquired about their education, occupation, smoking status, and disease status. Highest level of educational attainment was assessed in eight categories: no schooling, grades 1 to 8, grades 9 to 11, completed high school or GED, some college but no degree, technical school certificate, associate degree, Bachelor’s degree, graduate or professional degree. Smoking status was dichotomized as current/past smokers versus those who reported never having smoked. Diabetes was defined as having fasting blood glucose ≥126 mg/dl or current treatment for diabetes. Hypertension was defined as having systolic blood pressure ≥140 mmHg, diastolic blood pressure ≥90 mmHg, or current treatment for hypertension. Cardiovascular disease was determined by a panel of three physicians, who reviewed participants’ medical records, laboratory findings, and clinic exam notes.
NAS covariates
At each in-person examination visit, participants completed a questionnaire that enquired about their smoking status, education, diabetes (self-reported diagnosis and/or use of diabetes medications), and diagnosis of coronary heart disease (validated on medical records, ECG, and physician exams). High blood pressure was defined as antihypertensive medication use or SBP ≥140 mmHg or DBP ≥90 mmHg at study visit. APOE-e4 allele status was assessed through genotyping on a Sequenom MassArray MALDI-TOF mass spectrometer.
Estimated naive T cell abundance
In LBC1921, LBC1936, FHS, and NAS, we considered the abundance of different subtypes of T cells, defined as follows: naive T cells were RA positive, IL7 Receptor positive; central memory T cells were RA negative, IL7 Receptor positive; and effector memory T cells were RA negative, IL7 Receptor negative. To estimate the naive T cells in our cohort studies, we used a prediction method that was developed on an independent dataset. The predictor of T cell counts (that is, naive CD4 T cell count) was found by applying a penalized regression model (elastic net) to regress T cell counts (dependent variable) on a subset of CpGs reported in Supplemental Table 3 from Zhang et al. [46]. By applying the resulting penalized regression model to our data, we arrived at predicted T cell counts.
Data availability
LBC methylation data have been submitted to the European Genome-phenome Archive under accession number EGAS00001000910; phenotypic data are available at dbGaP under the accession number phs000821.v1.p1. The FHS and NAS data are available at dbGaP under the accession numbers phs000724.v2.p9 and phs000853.v1.p1, respectively. BSGS methylation data are available from the NCBI Gene Expression Omnibus under accession number GSE56105.
Statistical analyses
Two measures of DNA methylation age (mage) were calculated. The Horvath [11] mage uses 353 probes common to the Illumina 27 K and 450 K Methylation arrays using data from a range of tissues and cell types. The Hannum [10] mage is based on 71 methylation probes from the Illumina 450 K Methylation array derived as the best predictors of age using data generated from whole blood. Of the Hannum age predictor probes, 70, 71, and 71 were included in the LBC, NAS, and FHS data, respectively. mage was calculated as the sum of the beta values multiplied by the reported effect sizes for the Hannum predictor. For the Horvath predictor, mage was determined in all cohorts using the online calculator (http://labs.genetics.ucla.edu/horvath/dnamage/). A third predictor, based on the three probes highlighted in the Weidner et al. paper [12], was also examined although, due to its poorer predictive accuracy, it was not included for the main analyses. To account for technical variability in the measurement of the methylation CpGs in the LBC studies, mage was adjusted for plate, array, position on the chip, and hybridisation date (all treated as fixed effect factors) using linear regression. In a sensitivity analysis, additional adjustments were made for white blood cell counts (the number of basophils, monocytes, lymphocytes, eosinophils, and neutrophils per volume of blood) or DNA methylation-estimated cell counts, as described elsewhere [21]. The residuals from these models were added to the mean predicted methylation age to give the new, adjusted measure of mage. The two methylation age predictors contained six overlapping probes. A methylation-based age acceleration index (Δage) was calculated for all subjects, defined as the adjusted methylation age in years minus chronological age at sample collection in years (Δage = mage - chronological age).
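The arithmetic described above (weighted sum of beta values, technical adjustment by adding residuals back to the mean, and Δage) can be sketched in Python for illustration. The probe names, effect sizes, and function names below are hypothetical; the sketch adjusts for a single technical factor for brevity, whereas the LBC models included plate, array, chip position, and hybridisation date together, and it omits any intercept term because none is stated in the text.

```python
from statistics import mean
from typing import Dict, Hashable, List


def weighted_sum_age(betas: Dict[str, float],
                     effect_sizes: Dict[str, float]) -> float:
    """Sum of beta values times effect sizes, over probes present in the data."""
    return sum(w * betas[p] for p, w in effect_sizes.items() if p in betas)


def adjust_for_factor(m_age: List[float], factor: List[Hashable]) -> List[float]:
    """Regress m_age on one fixed-effect factor and add the residuals
    to the overall mean (for one factor, the fit is the level mean)."""
    by_level: Dict[Hashable, List[float]] = {}
    for value, level in zip(m_age, factor):
        by_level.setdefault(level, []).append(value)
    level_mean = {level: mean(vals) for level, vals in by_level.items()}
    overall = mean(m_age)
    return [overall + (value - level_mean[level])
            for value, level in zip(m_age, factor)]


def age_acceleration(m_age: List[float], chron_age: List[float]) -> List[float]:
    """Delta-age: adjusted methylation age minus chronological age, in years."""
    return [m - c for m, c in zip(m_age, chron_age)]


# Toy usage with made-up values (not real probe weights).
print(weighted_sum_age({"probe_1": 0.5, "probe_2": 0.2},
                       {"probe_1": 40.0, "probe_2": 100.0, "probe_3": 10.0}))
adjusted = adjust_for_factor([70.0, 74.0, 80.0, 84.0], ["A", "A", "B", "B"])
print(age_acceleration(adjusted, [79.0, 79.0, 79.0, 79.0]))
```

Summing only over probes present in the data reflects the situation in which not every predictor probe passes quality control, as with the 70 of 71 Hannum probes available in LBC.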
Cox proportional hazards regression models were used to test the association between the Horvath and Hannum measures of Δage and mortality, adjusting for age at sample collection, and sex. Cox models in FHS further adjusted for laboratory batch (fixed effect) and used a robust variance estimator to account for familial relatedness. Hazard ratios for Δage were expressed per 5 years of methylation age acceleration. Schoenfeld residuals were examined to test the proportional hazards assumption. Sensitivity analyses, also using Cox proportional hazards regression, excluded deaths within the first 2 years of follow-up to eliminate the potential influences of (fatal) acute illness on the methylation measurements. Analyses to account for possible confounders/mediators included potential life-course predictors of mortality: age-11 IQ (LBC only), education in years, social class (LBC only), APOE e4 carrier status (LBC and NAS), smoking status, and self-reported diabetes, high blood pressure, and cardiovascular disease. A fully adjusted model was tested, in which all variables were entered together. Chronological age- and sex-adjusted linear regression models were used to explore the relationship between Δage and the additional covariates; for example, does methylation age acceleration depend on smoking or diabetes?
The results from the individual cohorts were meta-analyzed using the ‘meta’ package in R [47]. The cohorts were weighted based on the standard errors of the log hazard ratios. There was no evidence of cohort heterogeneity in the primary Cox model analyses according to the DerSimonian-Laird estimator of between-study variance, so fixed-effects models were considered.
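For illustration, a fixed-effect inverse-variance meta-analysis of log hazard ratios, together with the DerSimonian-Laird estimate of between-study variance, can be sketched in Python as follows. This is a hand-written sketch under the assumption of standard inverse-variance weights, not the ‘meta’ package itself; the function name and the input values are hypothetical.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(log_hr: List[float],
                      se: List[float]) -> Tuple[float, float, float]:
    """Return (pooled log HR, its standard error, DerSimonian-Laird tau^2)."""
    # Inverse-variance weights from the standard errors of the log HRs.
    weights = [1.0 / s ** 2 for s in se]
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hr)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)

    # Cochran's Q and the DerSimonian-Laird between-study variance.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hr))
    k = len(log_hr)
    c = w_sum - sum(w ** 2 for w in weights) / w_sum
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    return pooled, pooled_se, tau2


# Toy usage with made-up per-cohort estimates (log HR per 5 years of delta-age).
est, est_se, tau2 = fixed_effect_meta([0.10, 0.20], [0.10, 0.10])
lo, hi = math.exp(est - 1.96 * est_se), math.exp(est + 1.96 * est_se)
print(f"HR = {math.exp(est):.2f} (95% CI {lo:.2f} to {hi:.2f}), tau^2 = {tau2:.2f}")
```

A tau² estimate of zero, as in this toy example, corresponds to the situation in which fixed-effect and random-effects pooled estimates coincide.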
All analyses were performed in the statistical software R [48] with the Cox models utilizing the 'survival' library [49].
Finally, we calculated the heritability of Δage in the BSGS cohort. As mage was a better predictor of chronological age in the adult compared to adolescent samples, the difference between methylation age and chronological age was first standardized within generations (parents and offspring). Regression models were fitted to methylation age removing the effects of age and sex. Additionally, the regression on the adolescent samples included age² to account for the non-linearity between chronological and methylation age [11]. The residuals from these regressions were standardized to have a variance of 1 before combining the generations. See Additional file 8 for a graphical representation of the correction performed.
For each probe, the Intra Class Correlation of Δage for the various relative pairs was calculated using ANOVA as follows:
$$ ICC=\frac{MS_B-MS_W}{MS_B+MS_W} $$
where $MS_B$ is the Mean Square Between pairs and $MS_W$ is the Mean Square Within. The confidence intervals were based on the number of pseudo-independent relative pairs for each relationship.
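As a worked illustration of the formula above, the ICC for a set of relative pairs can be computed from a one-way ANOVA with two members per pair. The Python sketch below uses a hypothetical function name and made-up values and does not reproduce the confidence interval calculation.

```python
from typing import List, Tuple


def pair_icc(pairs: List[Tuple[float, float]]) -> float:
    """ICC = (MS_B - MS_W) / (MS_B + MS_W) for pairs of relatives."""
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    grand = sum(a + b for a, b in pairs) / (2 * n)

    # Between-pair mean square: 2 * squared deviation of each pair mean
    # from the grand mean, on n - 1 degrees of freedom.
    ms_between = sum(2 * ((a + b) / 2 - grand) ** 2 for a, b in pairs) / (n - 1)

    # Within-pair mean square: (a - b)^2 / 2 per pair, on n degrees of freedom.
    ms_within = sum((a - b) ** 2 / 2 for a, b in pairs) / n

    return (ms_between - ms_within) / (ms_between + ms_within)


# Identical values within every pair give an ICC of 1.
print(pair_icc([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]))  # 1.0
```

Pairs whose members differ while the pair means coincide give the opposite extreme, an ICC of −1.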
The heritability for each probe was estimated by partitioning its variance into additive genetic ($V_a$) and environmental ($V_e$) components by fitting a linear mixed model of the form
$$ \mathbf{y}=\mu +\mathbf{Z}\mathbf{a}+\mathbf{e} $$
where y is the vector of adjusted methylation age, a is the additive genetic effects and e is the unique environmental effects (residuals). The model was fitted using QTDT [50].