Fig. 2 | Genome Biology

From: Evaluation of commonly used analysis strategies for epigenome- and transcriptome-wide association studies through replication of large-scale population studies

a The number (x-axis) and percentage (y-axis) of replicated CpGs for age, BMI, and smoking (shown in columns). Each row displays one step of the analysis strategy. The reference model, shown in yellow, is the same in every column and row: Beta-3IQR dataset, standard linear model (LM), correction for measured cell counts, and correction for known technical confounders (TCs; bisulfite conversion plate and array row). Circles show the Bonferroni-corrected replication results averaged over the four leave-one-out analyses; bars indicate their range. In each row, the other (non-yellow) colors represent alternative options: (A) Data types: beta values without exclusion of outliers in green, M values in red, M values with outlier exclusion using the 3IQR method in blue, and RIN in purple. (B) Statistical models: linear mixed models (LMM) in green and robust linear mixed models (RLMM) in red. (C) Cell count adjustment: Houseman6 in green, Houseman3 in red, and none in blue (see the “Methods” section for details). (D) Hidden confounder (HC) correction: model 1 in purple, model 2 in green, and model 3 in red (see the “Methods” section for details).

b The number (x-axis) and percentage (y-axis) of replicated genes for age, BMI, and smoking (shown in columns). Each row displays one step of the analysis strategy. The reference model, shown in yellow, is the same in every column and row: Voom normalization, all genes included, standard linear model (LM), and correction for technical covariates (TC) and cell counts (CC). Circles show the Bonferroni-corrected replication results averaged over the four leave-one-out analyses; bars indicate their range. In each row, the other (non-yellow) colors represent alternative options: (A) Normalization methods: DESeq normalization in blue and edgeR in red. (B) Gene inclusion: removing very low-expressed genes (blue), low-expressed genes (red), or medium-expressed genes (green). (C) Statistical models: a limma linear model fit in red (limma), a standard GLM in blue, and the edgeR GLM adaptation in green. (D) Covariates: correcting solely for technical covariates (TC; blue) or cell counts (CC; red), replacing both with the first five principal components (5PCs; green), or adding five hidden confounders (HCs) to the technical covariates and cell counts (5HCs; purple).
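The circles and bars described in the caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that each leave-one-out analysis yields a set of discovery p-values and a set of replication p-values per feature (CpG or gene), that a feature counts as discovered when it passes a Bonferroni threshold over all tested features, and that it counts as replicated when it passes a Bonferroni threshold over the discovered features. The function names, the threshold definitions, and the toy p-values are hypothetical.

```python
from statistics import mean


def replication_summary(discovery_p, replication_p, alpha=0.05):
    """Return (n_replicated, pct_replicated) for one leave-one-out analysis.

    Assumed rule (not taken from the caption): Bonferroni over all tested
    features in discovery, then Bonferroni over the discovered features
    in replication.
    """
    n_tests = len(discovery_p)
    discovered = [f for f, p in discovery_p.items() if p < alpha / n_tests]
    if not discovered:
        return 0, 0.0
    threshold = alpha / len(discovered)
    # A feature missing from the replication set is treated as not replicated.
    replicated = [f for f in discovered if replication_p.get(f, 1.0) < threshold]
    return len(replicated), 100.0 * len(replicated) / len(discovered)


def summarize_splits(splits, alpha=0.05):
    """Average (circle) and range (bar) over the leave-one-out analyses."""
    results = [replication_summary(d, r, alpha) for d, r in splits]
    counts = [n for n, _ in results]
    pcts = [p for _, p in results]
    return {
        "mean_count": mean(counts),
        "count_range": (min(counts), max(counts)),
        "mean_pct": mean(pcts),
        "pct_range": (min(pcts), max(pcts)),
    }


# Toy, made-up p-values for two analyses (the figure uses four).
discovery = {"cg1": 1e-6, "cg2": 1e-6, "cg3": 0.5, "cg4": 0.9}
splits = [
    (discovery, {"cg1": 0.001, "cg2": 0.2}),
    (discovery, {"cg1": 0.001, "cg2": 0.001}),
]
summary = summarize_splits(splits)
```

In this sketch, `mean_count` and `mean_pct` correspond to a circle's x and y position, and `count_range` and `pct_range` correspond to the extent of the bars.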