Predicting age from the transcriptome of human dermal fibroblasts

Table 1 Accuracy of age prediction from fibroblast transcriptomes for various algorithms on two datasets. Cross-validated age-prediction metrics are reported for our dataset of 133 individuals aged 1 to 94 years and for dataset E-MTAB-3037, with 22 individuals from newborn to 89 years old. Metrics: mean absolute error (MAE), median absolute error (MED), and R² goodness of fit for the line of best fit. The parameters shown for the regression algorithms are those that minimized MAE in a grid search of the parameter space. On our dataset, the LDA ensemble with 20-year bins achieves a lower MAE and MED and a higher R² than the competing methods; on E-MTAB-3037 it has the lowest MAE and MED. Other bin widths (15, 25 and 35 years) did not improve performance over the 20-year bin width.

Algorithm	Parameters	Mean absolute error (years)	Median absolute error (years)	R²
Our dataset (133 individuals)
LDA ensemble	Age bin width = 10	9.5	4.0	0.68
	Age bin width = 20	7.7	4.0	0.81
	Age bin width = 30	8.2	4.0	0.77
Gaussian naive Bayes ensemble	Age bin width = 10, uninformative priors	16.5	7.0	0.20
	Age bin width = 20, uninformative priors	16.0	8.0	0.27
	Age bin width = 30, uninformative priors	15.7	7.0	0.30
k-nearest neighbors ensemble	Age bin width = 10, Euclidean distance metric, k = 5	22.3	14.0	−0.19
	Age bin width = 20, Euclidean distance metric, k = 5	19.7	11.0	0.04
	Age bin width = 30, Euclidean distance metric, k = 5	19.7	14.0	0.09
Random forest ensemble	Age bin width = 10, n_trees = 100, min_impurity_split = 2	14.2	5.0	0.38
	Age bin width = 20, n_trees = 100, min_impurity_split = 2	11.8	5.0	0.57
	Age bin width = 30, n_trees = 100, min_impurity_split = 2	11.8	5.0	0.55
Linear regression	N/A	12.1	10.0	0.73
Elastic net regression	Alpha = 0.1, L1/L2 ratio = 0.0	12.0	11.0	0.73
Support vector regression	Kernel = second-order polynomial, C = 10, epsilon = 0.05, gamma = 0.0002	11.9	10.2	0.72
E-MTAB-3037 (22 individuals)
LDA ensemble	Age bin width = 20	18.1	14.5	0.20
Gaussian naive Bayes ensemble	Age bin width = 20, uninformative priors	36.4	39.5	−1.47
k-nearest neighbors ensemble	Age bin width = 20, Euclidean distance metric, k = 5	34.9	36.0	−1.25
Random forest ensemble	Age bin width = 20, n_trees = 100, min_impurity_split = 2	31.9	28.0	−0.82
Linear regression	N/A	23.5	18.8	0.04
Elastic net regression	Alpha = 1.0, L1/L2 ratio = 0.6	20.0	18.8	0.36
Support vector regression	Kernel = second-order polynomial, C = 1, epsilon = 0.05, gamma = 0.0002	19.7	15.4	0.31
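The three metrics reported in Table 1 can be computed as in the minimal Python sketch below. It assumes that R² is the coefficient of determination of predicted against true age, 1 − SS_res/SS_tot (the convention used by, for example, scikit-learn's r2_score), which would be consistent with the negative values in the table. The helper names (age_prediction_metrics, age_to_bin, bin_midpoint) are hypothetical, and mapping a predicted bin back to its midpoint age is an illustrative assumption, not the authors' ensemble method.

```python
from statistics import mean, median


def age_prediction_metrics(true_ages, predicted_ages):
    """Return (MAE, MED, R^2) for paired true and predicted ages in years.

    Assumption: R^2 is the coefficient of determination of the predictions,
    1 - SS_res / SS_tot, which is negative when the predictions fit worse
    than always predicting the mean true age.
    """
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("need two non-empty sequences of equal length")
    # absolute error of each individual's predicted age
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    mean_age = mean(true_ages)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_ages, predicted_ages))
    ss_tot = sum((t - mean_age) ** 2 for t in true_ages)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true ages are equal")
    return mean(errors), median(errors), 1.0 - ss_res / ss_tot


# Hypothetical helpers: one way a binned classifier could map ages to class
# labels and back. The table does not specify how the ensembles do this.
def age_to_bin(age, bin_width):
    """Index of the age bin, e.g. with width 20: ages 0-19 -> 0, 20-39 -> 1."""
    return int(age // bin_width)


def bin_midpoint(bin_index, bin_width):
    """Centre of a bin in years, used here as a point estimate of age."""
    return bin_index * bin_width + bin_width / 2


if __name__ == "__main__":
    true_ages = [10, 30, 50, 70]
    predicted_ages = [12, 28, 55, 60]
    # absolute errors are 2, 2, 5 and 10 years: MAE 4.75, MED 3.5, R^2 about 0.93
    print(age_prediction_metrics(true_ages, predicted_ages))
    # with 20-year bins a 94-year-old falls in bin 4, whose midpoint is 90.0
    print(age_to_bin(94, 20), bin_midpoint(age_to_bin(94, 20), 20))
```

Under this convention, predicting the mean true age for every individual gives R² = 0, and any predictor that does worse than that gives a negative R².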

ISSN: 1474-760X