From: An overview of DNA methylation-derived trait score methods and applications
Method: software | Brief summary (see Additional file 1 for a more detailed summary) | Literature examples of the application of MPS |
---|---|---|
CP+T after linear regression | Marginal effects of DNAm sites are derived from linear regression. Selected probes have an MWAS association p-value below a defined threshold. Probes are chosen by a greedy algorithm: the most associated probe is selected first, and each further probe is selected only if its correlation (r) with every genomically local probe already selected is below a defined threshold. Results are often reported for the p-value threshold that gives the highest out-of-sample prediction; to avoid a winner’s curse effect, however, a single p-value threshold should be identified from MPS results in an independent tuning cohort and then applied in the target cohort. | BMI, height [58]; schizophrenia [59]; C-reactive protein levels [60]; interleukin-6 [61] |
Penalized linear regression: | In ridge regression, lasso, and elastic net, probe effect sizes are estimated jointly. In ridge regression, linear regression estimates are shrunk toward zero (by an amount that depends on penalty parameter λ1). In lasso, a proportion of probes have their effect size set to 0 (the proportion depends on penalty parameter λ2). Elastic net regression requires the estimation of two penalty parameters (λ1, λ2), such that ridge regression and lasso are special cases of elastic net regression (when λ2 = 0 or λ1 = 0, respectively). | Major depressive disorder [64]; smoking [65]; Alzheimer’s disease<sup>a</sup> [66]; incident diabetes [46]; alcohol consumption, body fat percent, body mass index, lipoprotein cholesterol, waist-to-hip ratio [15, 67]; 109 proteins [20]; electronic health records [68] |
Linear mixed model BLUP<sup>b</sup>: OSCA [52] | All probes have a predicted effect size, with effect sizes assumed to be drawn from a normal distribution and the total variance attributed to DNAm estimated from a restricted maximum likelihood (REML) analysis of the data. | ALS [69] |
Linear mixed model<sup>b</sup>: OSCA [52], lme4 [70] | The effect size of each probe is estimated while fitting the joint effect of probes genome-wide as one (or several) random effects, to control for unidentified background confounders. | ALS [69]; Parkinson’s disease [71]; Alzheimer’s and Parkinson’s disease, ALS, schizophrenia, rheumatoid arthritis [45] |
Bayesian inference model<sup>b</sup>: BayesRR [72] | A linear mixed model, but with a Bayesian framework to model any epigenetic genetic architecture. Probe effects are assumed to be drawn from one of multiple normal distributions (including a null). Genetic effects can be modeled simultaneously. | BMI, smoking [72]; cognitive ability [73] |
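
The greedy probe selection described in the CP+T row can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any tool cited in the table. The function names (`clump_and_threshold`, `methylation_score`), the `window` parameter that defines "genomically local", and the use of the absolute Pearson correlation are all assumptions made for the example. Probes are assumed to lie on a single chromosome.

```python
import numpy as np


def clump_and_threshold(pvals, positions, dnam,
                        p_threshold=1e-5, r_threshold=0.5, window=1_000_000):
    """Greedy CP+T-style probe selection (illustrative sketch only).

    pvals     : (m,) MWAS association p-values, one per probe
    positions : (m,) genomic coordinates (single chromosome assumed)
    dnam      : (n, m) methylation matrix used to compute probe correlations
    window    : hypothetical distance (bp) within which probes count as local
    """
    pvals = np.asarray(pvals, dtype=float)
    positions = np.asarray(positions)
    dnam = np.asarray(dnam, dtype=float)

    # Probes passing the p-value threshold, most associated first.
    order = np.argsort(pvals, kind="stable")
    candidates = [int(j) for j in order if pvals[j] < p_threshold]

    selected = []
    for j in candidates:
        keep = True
        for k in selected:
            # Only compare against already-selected probes that are local.
            if abs(positions[j] - positions[k]) <= window:
                r = np.corrcoef(dnam[:, j], dnam[:, k])[0, 1]
                # Assumption: compare |r| with the threshold.
                if abs(r) >= r_threshold:
                    keep = False
                    break
        if keep:
            selected.append(j)
    return selected


def methylation_score(dnam, effects, selected):
    """Weighted sum of the selected probes using their marginal effect sizes."""
    dnam = np.asarray(dnam, dtype=float)
    effects = np.asarray(effects, dtype=float)
    return dnam[:, selected] @ effects[selected]
```

In line with the row's caveat about winner's curse, `p_threshold` would be chosen by comparing scores across thresholds in an independent tuning cohort, and only that single threshold would then be used to compute `methylation_score` in the target cohort.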