Table 1 Methods used in published derivations of DNA methylation profile scores (MPS)

From: An overview of DNA methylation-derived trait score methods and applications

| Method: software | Brief summary (see Additional file 1 for a more detailed summary) | Literature examples of the application of MPS |
| --- | --- | --- |
| C+PT after linear regression | Marginal effects of DNAm sites are derived from linear regression. Selected probes have a methylome-wide association study (MWAS) p-value below a defined threshold. In a greedy algorithm, the most associated probe is selected first; each further probe is selected only if its correlation (r) with every genomically local probe already selected is below a defined threshold. Results are often reported for the p-value threshold that gives the highest out-of-sample prediction, but to avoid a winner’s curse effect, a single p-value threshold, identified from MPS results in an independent tuning cohort, should be applied in the target cohort. | BMI, height [58]; schizophrenia [59]; C-reactive protein levels [60]; Interleukin-6 [61] |
| Penalized linear regression: glmnet [62, 63] | In ridge regression, lasso, and elastic net, probe effect sizes are estimated jointly. In ridge regression, linear regression estimates are shrunk (dependent on penalty parameter λ1). In lasso, a proportion of probes have their effect size set to 0 (dependent on penalty parameter λ2). Elastic net regression requires the estimation of two penalty parameters (λ1, λ2); ridge regression and lasso are special cases of elastic net regression (when λ2 = 0 or λ1 = 0, respectively). | Major depressive disorder [64]; Smoking [65]; Alzheimer’s diseaseᵃ [66]; Incident diabetes [46]; Alcohol consumption, body fat percent, body mass index, lipoprotein cholesterol, waist-to-hip ratio [15, 67]; 109 proteins [20]; Electronic health records [68] |
| Linear mixed model BLUPᵇ: OSCA [52] | All probes have a predicted effect size, with effect sizes assumed to be drawn from a normal distribution and the total variance attributed to DNAm estimated from a restricted maximum likelihood (REML) analysis of the data. | ALS [69] |
| Linear mixed modelᵇ: OSCA [52], lme4 [70] | The effect size of each probe is estimated while fitting the joint effect of probes genome-wide in one (or several) random effects to control for unidentified background confounders. | ALS [69]; Parkinson’s disease [71]; Alzheimer’s and Parkinson’s disease, ALS, schizophrenia, rheumatoid arthritis [45] |
| Bayesian inference modelᵇ: BayesRR [72] | A linear mixed model, but within a Bayesian framework, to model any epigenetic architecture. Probe effects are assumed to be drawn from one of multiple normal distributions (including a null). Genetic effects can be modeled simultaneously. | BMI, smoking [72]; Cognitive ability [73] |

  1. C+PT: clumping + p-value thresholding of MWAS summary statistics; BLUP: best linear unbiased prediction
  2. ᵃ MPS derived from postmortem brain cortex
  3. ᵇ These methods account for correlations in DNAm between people resulting from family relationships
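
The greedy selection described in the C+PT row of the table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited studies: the probe names (`cg_a` and so on), the toy summary statistics, the 500 kb definition of "genomically local", and the use of the absolute value of r are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Probe:
    probe_id: str
    chrom: str
    pos: int      # genomic position in base pairs
    beta: float   # marginal MWAS effect size
    pval: float   # marginal MWAS p-value


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def clump_and_threshold(probes, ref_dnam, p_max, r_max, window_bp):
    """Greedy C+PT: keep probes with p < p_max, most associated first, skipping any
    probe whose |r| with an already selected local probe is >= r_max."""
    candidates = sorted((p for p in probes if p.pval < p_max), key=lambda p: p.pval)
    selected = []
    for cand in candidates:
        # "Local" is assumed here to mean same chromosome, within window_bp.
        local = [s for s in selected
                 if s.chrom == cand.chrom and abs(s.pos - cand.pos) <= window_bp]
        if all(abs(pearson(ref_dnam[cand.probe_id], ref_dnam[s.probe_id])) < r_max
               for s in local):
            selected.append(cand)
    return selected


def methylation_profile_score(selected, dnam_values):
    """MPS for one individual: weighted sum of DNAm values at the selected probes."""
    return sum(p.beta * dnam_values[p.probe_id] for p in selected)


# Toy MWAS summary statistics (made-up values).
probes = [
    Probe("cg_a", "chr1", 1_000, 0.5, 1e-8),
    Probe("cg_b", "chr1", 1_500, 0.4, 1e-6),     # local to cg_a, highly correlated
    Probe("cg_c", "chr1", 900_000, -0.3, 1e-5),  # correlated with cg_a but not local
    Probe("cg_d", "chr2", 2_000, 0.2, 0.2),      # fails the p-value threshold
    Probe("cg_e", "chr1", 1_800, -0.2, 1e-4),    # local to cg_a, weakly correlated
]

# Toy reference DNAm values (five samples per probe) used to estimate correlations.
ref_dnam = {
    "cg_a": [0.10, 0.20, 0.30, 0.40, 0.50],
    "cg_b": [0.12, 0.21, 0.33, 0.41, 0.52],
    "cg_c": [0.10, 0.20, 0.30, 0.40, 0.50],
    "cg_d": [0.30, 0.10, 0.20, 0.50, 0.40],
    "cg_e": [0.50, 0.10, 0.40, 0.20, 0.30],
}

selected = clump_and_threshold(probes, ref_dnam, p_max=1e-3, r_max=0.5,
                               window_bp=500_000)
print([p.probe_id for p in selected])  # ['cg_a', 'cg_c', 'cg_e']

# Score one hypothetical individual in the target cohort.
score = methylation_profile_score(selected, {"cg_a": 0.4, "cg_c": 0.6, "cg_e": 0.5})
```

In line with the table's caution about winner's curse, `p_max` would be chosen by comparing scores in an independent tuning cohort and then held fixed when the score is computed in the target cohort.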
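
The relationship between ridge regression, lasso, and elastic net summarized in the penalized regression row can be written out explicitly. The following is a sketch in the table's own convention (λ1 for the ridge penalty, λ2 for the lasso penalty); the symbols y (trait), x (DNAm value), β (probe effect), n (number of samples), and m (number of probes) are notation assumed here rather than taken from the source.

```latex
\hat{\beta}_{\mathrm{EN}}
  = \arg\min_{\beta}\;
    \sum_{i=1}^{n}\Bigl(y_i - \sum_{j=1}^{m} x_{ij}\beta_j\Bigr)^{2}
    + \lambda_1 \sum_{j=1}^{m} \beta_j^{2}
    + \lambda_2 \sum_{j=1}^{m} \lvert \beta_j \rvert
```

Setting λ2 = 0 leaves only the squared penalty (ridge regression), and setting λ1 = 0 leaves only the absolute-value penalty (lasso), which is what allows some probe effects to be exactly 0. Note that glmnet itself parameterizes the same family differently, with a single penalty strength λ and a mixing weight α, where α = 0 corresponds to ridge regression and α = 1 to lasso.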