Table 1 Accuracy of age prediction from fibroblast transcriptomes, for various algorithms on two datasets. Cross-validation age prediction metrics are reported for our dataset of 133 individuals between 1 and 94 years old and for dataset E-MTAB-3037 with 22 individuals from newborn to 89 years old. Metrics: mean absolute error (MAE), median absolute error (MED), and R² goodness-of-fit for the line of best fit. Parameters shown for regression algorithms are those that minimized MAE in a grid search of the parameter space. LDA ensemble with 20-year bins (in italics) achieves a lower MAE and MED and a higher R² than competing methods. Other bin widths (15, 25, and 35 years) did not improve on the performance of the 20-year bin width.

From: Predicting age from the transcriptome of human dermal fibroblasts

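| Algorithm | Parameters | Mean absolute error (years) | Median absolute error (years) | R² |
| --- | --- | --- | --- | --- |
| Our dataset (133 individuals) | | | | |
| LDA ensemble | Age bin width = 10 | 9.5 | 4.0 | 0.68 |
| LDA ensemble | Age bin width = 20 | 7.7 | 4.0 | 0.81 |
| LDA ensemble | Age bin width = 30 | 8.2 | 4.0 | 0.77 |
| Gaussian naive Bayes ensemble | Age bin width = 10, uninformative priors | 16.5 | 7.0 | 0.20 |
| Gaussian naive Bayes ensemble | Age bin width = 20, uninformative priors | 16.0 | 8.0 | 0.27 |
| Gaussian naive Bayes ensemble | Age bin width = 30, uninformative priors | 15.7 | 7.0 | 0.30 |
| k-nearest neighbors ensemble | Age bin width = 10, Euclidean distance metric, k = 5 | 22.3 | 14.0 | −0.19 |
| k-nearest neighbors ensemble | Age bin width = 20, Euclidean distance metric, k = 5 | 19.7 | 11.0 | 0.04 |
| k-nearest neighbors ensemble | Age bin width = 30, Euclidean distance metric, k = 5 | 19.7 | 14.0 | 0.09 |
| Random forest ensemble | Age bin width = 10, n_trees = 100, min_impurity_split = 2 | 14.2 | 5.0 | 0.38 |
| Random forest ensemble | Age bin width = 20, n_trees = 100, min_impurity_split = 2 | 11.8 | 5.0 | 0.57 |
| Random forest ensemble | Age bin width = 30, n_trees = 100, min_impurity_split = 2 | 11.8 | 5.0 | 0.55 |
| Linear regression | N/A | 12.1 | 10.0 | 0.73 |
| Elastic net regression | Alpha = 0.1, L1/L2 ratio = 0.0 | 12.0 | 11.0 | 0.73 |
| Support vector regression | Kernel = second order polynomial, C = 10, epsilon = 0.05, gamma = 0.0002 | 11.9 | 10.2 | 0.72 |
| E-MTAB-3037 (22 individuals) | | | | |
| LDA ensemble | Age bin width = 20 | 18.1 | 14.5 | 0.20 |
| Gaussian naive Bayes ensemble | Age bin width = 20, uninformative prior | 36.4 | 39.5 | −1.47 |
| k-nearest neighbors ensemble | Age bin width = 20, Euclidean distance metric, k = 5 | 34.9 | 36 | −1.25 |
| Random forest ensemble | Age bin width = 20, n_trees = 100, min_impurity_split = 2 | 31.9 | 28 | −0.82 |
| Linear regression | N/A | 23.5 | 18.8 | 0.04 |
| Elastic net regression | Alpha = 1.0, L1/L2 ratio = 0.6 | 20.0 | 18.8 | 0.36 |
| Support vector regression | Kernel = second order polynomial, C = 1, epsilon = 0.05, gamma = 0.0002 | 19.7 | 15.4 | 0.31 |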
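
The three metrics named in the caption (MAE, MED, and R²) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `age_prediction_metrics` and the example ages are hypothetical. It also assumes that R² is the coefficient of determination of the predictions against the true ages, 1 − SS_res/SS_tot. That assumption is consistent with the negative R² values in the table, which a refitted least-squares line could not produce, but the source does not state the exact formula.

```python
from statistics import mean, median


def age_prediction_metrics(true_ages, predicted_ages):
    """Return (MAE, MED, R2) for paired true and predicted ages.

    Assumption: R2 is computed as 1 - SS_res / SS_tot, comparing the
    predictions directly with the true ages. It can be negative when
    predictions are worse than always guessing the mean true age.
    """
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")

    # Absolute error per individual, in years
    abs_errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    mae = mean(abs_errors)
    med = median(abs_errors)

    # Residual and total sums of squares
    mean_true = mean(true_ages)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_ages, predicted_ages))
    ss_tot = sum((t - mean_true) ** 2 for t in true_ages)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true ages are identical")
    r2 = 1 - ss_res / ss_tot

    return mae, med, r2


# Hypothetical ages for illustration only; not data from the study
true_ages = [5, 20, 40, 60, 80]
predicted_ages = [10, 18, 45, 50, 85]

mae, med, r2 = age_prediction_metrics(true_ages, predicted_ages)
print(f"MAE = {mae:.1f} years, MED = {med:.1f} years, R2 = {r2:.2f}")
```

Because MED ignores the size of the largest errors, a method can share the same MED as another while having a clearly higher MAE, as the three LDA ensemble rows for our dataset show.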