Table 1 Accuracy of age prediction from fibroblast transcriptomes for various algorithms on two datasets. Cross-validation age prediction metrics are reported for our dataset of 133 individuals between 1 and 94 years old and for dataset E-MTAB-3037, with 22 individuals from newborn to 89 years old. Metrics: mean absolute error (MAE), median absolute error (MED), and R² goodness of fit for the line of best fit. Parameters shown for the regression algorithms are the best ones found by a grid search of the parameter space to minimize MAE. The LDA ensemble with 20-year bins (in italics) achieves a lower MAE and MED and a higher R² than competing methods. Other window sizes (15, 25, 35) did not improve performance above that of the 20-year bin size.

From: Predicting age from the transcriptome of human dermal fibroblasts

| Algorithm | Parameters | Mean absolute error (years) | Median absolute error (years) | R² |
| --- | --- | --- | --- | --- |
| Our dataset (133 individuals) | | | | |
| LDA ensemble | Age bin width = 10 | 9.5 | 4.0 | 0.68 |
| | Age bin width = 20 | 7.7 | 4.0 | 0.81 |
| | Age bin width = 30 | 8.2 | 4.0 | 0.77 |
| Gaussian naive Bayes ensemble | Age bin width = 10; uninformative priors | 16.5 | 7.0 | 0.20 |
| | Age bin width = 20; uninformative priors | 16.0 | 8.0 | 0.27 |
| | Age bin width = 30; uninformative priors | 15.7 | 7.0 | 0.30 |
| k-nearest neighbors ensemble | Age bin width = 10; Euclidean distance metric; k = 5 | 22.3 | 14.0 | −0.19 |
| | Age bin width = 20; Euclidean distance metric; k = 5 | 19.7 | 11.0 | 0.04 |
| | Age bin width = 30; Euclidean distance metric; k = 5 | 19.7 | 14.0 | 0.09 |
| Random forest ensemble | Age bin width = 10; n_trees = 100; min_impurity_split = 2 | 14.2 | 5.0 | 0.38 |
| | Age bin width = 20; n_trees = 100; min_impurity_split = 2 | 11.8 | 5.0 | 0.57 |
| | Age bin width = 30; n_trees = 100; min_impurity_split = 2 | 11.8 | 5.0 | 0.55 |
| Linear regression | N/A | 12.1 | 10.0 | 0.73 |
| Elastic net regression | Alpha = 0.1; L1/L2 ratio = 0.0 | 12.0 | 11.0 | 0.73 |
| Support vector regression | Kernel = second-order polynomial; C = 10; epsilon = 0.05; gamma = 0.0002 | 11.9 | 10.2 | 0.72 |
| E-MTAB-3037 (22 individuals) | | | | |
| LDA ensemble | Age bin width = 20 | 18.1 | 14.5 | 0.20 |
| Gaussian naive Bayes ensemble | Age bin width = 20; uninformative prior | 36.4 | 39.5 | −1.47 |
| k-nearest neighbors ensemble | Age bin width = 20; Euclidean distance metric; k = 5 | 34.9 | 36 | −1.25 |
| Random forest ensemble | Age bin width = 20; n_trees = 100; min_impurity_split = 2 | 31.9 | 28 | −0.82 |
| Linear regression | N/A | 23.5 | 18.8 | 0.04 |
| Elastic net regression | Alpha = 1.0; L1/L2 ratio = 0.6 | 20.0 | 18.8 | 0.36 |
| Support vector regression | Kernel = second-order polynomial; C = 1; epsilon = 0.05; gamma = 0.0002 | 19.7 | 15.4 | 0.31 |
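
The three metrics reported in Table 1 can be illustrated with a minimal sketch. It assumes that R² is the coefficient of determination of predicted against true age (1 − SS_res/SS_tot), which would be consistent with the negative values in the table but is not confirmed by the caption. The function name `age_prediction_metrics` and the example ages are hypothetical and are not taken from the study.

```python
from statistics import mean, median


def age_prediction_metrics(true_ages, predicted_ages):
    """Return (MAE, MED, R2) for paired true and predicted ages in years."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")

    abs_errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    mae = mean(abs_errors)    # mean absolute error
    med = median(abs_errors)  # median absolute error

    # Assumption: R2 = 1 - SS_res / SS_tot (coefficient of determination).
    # It becomes negative when predictions are worse than always guessing
    # the mean true age.
    mean_age = mean(true_ages)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_ages, predicted_ages))
    ss_tot = sum((t - mean_age) ** 2 for t in true_ages)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true ages are identical")
    r2 = 1 - ss_res / ss_tot
    return mae, med, r2


# Hypothetical ages for illustration only; not data from the study.
true_ages = [10, 30, 50, 70, 90]
predicted_ages = [14, 28, 50, 61, 95]
mae, med, r2 = age_prediction_metrics(true_ages, predicted_ages)
print(f"MAE = {mae:.1f} years, MED = {med:.1f} years, R2 = {r2:.2f}")
```

Because MED is insensitive to a few large errors while MAE is not, a method can show a low MED alongside a much higher MAE, as several ensemble rows in the table do.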