Skip to main content

A statistical and biological response to an informatics appraisal of healthy aging gene signatures

A Research to this article was published on 07 September 2015


Jacob and Speed did not identify even a single example of a ‘150-gene-set’ that was statistically significant at classifying Alzheimer’s disease (AD) samples, or age in independent studies. We attempt to clarify the various misunderstandings, below.

In 2013 we discovered a 150 RNA ‘gene-set’ from healthy skeletal muscle (others named HAG) that proved to be a high performing, statistically significant, non-linear age classifier of multiple types of human tissue [1]. We profiled muscle samples from an independent birth cohort and found our HAG score correlated with ‘successful aging’ over two decades, using rank-order methods which incorporated the direction of gene expression change from the original model [1]. The following year, we found that the HAG score was regulated with age in human hippocampus and in human blood profiles from AD [1]. Since then over 50 HAG have been linked, by other laboratories, to the biology of age and dementia (see Additional file 1), robust confirmation of the biological validity of the HAG. We illustrated the potential utility of the HAG, by combining it with a set of AD regulated ‘disease’ genes [2] to yield a test that distinguished AD samples from age- and gender-matched controls (Sood et al Fig. 5 [1]). The original disease signature (the Lunnon et al AD signature [2]) was not statistically significant, on its own, in an independent AD cohort. Notably this 150 probe-set plus 48 gene prototype ‘AD blood assay’ (red-dot, Age+AD-disease) out performs all 150 gene-sets presented by Jacob and Speed (Figure 1a) and remains one of the only RNA assays validated in an independent AD cohort. We presented our transcriptomic age model as a logical starting point for machine learning requiring independent data, to produce a tool to facilitate clinical AD research (i.e. screen for accelerated ‘aging’, the risk factor for AD).

Fig. 1
figure 1

A guide to the graphing layouts produced by Jacob and Speed and bench-marking against our top-ranked AD signature. a Using the same code and sampling procedure as Jacob and Speed, we compare the top-ranked gene-set from Fig. 5 Sood et al. [1] with their performance of their random process. Critically, Jacob’s code creates two random sampling objects per R session – rand.sig.1 for GSE63060 and rand.sig.2 for GSE63061, reflecting the different gene content of the GSE63060 and GSE63061 arrays. Jacob and Speed do not implement any cross-cohort validation nor any consideration of correction for multiple testing. Notably, our top-ranked blood signature exceeds random sampling by a substantial margin and critically is ‘active’ in both blood data-sets. Note that their ‘random’ gene-lists are selected from entire gene-chip which contains thousands of published age and AD correlated genes (as listed in our 2015 article, supplementary information). b Using the same code and sampling procedure as Jacob and Speed, we present the performance of just the 150 tissue age genes from Sood et al. (blue dot) in blood. We then assess one example of choices made by Jacob and Speed, namely using 50 vs 75% cohort split during withincohort cross-validation. By selecting a 50% split the relative (to ‘random’) performance of our age-150 gene list is impaired. Since the Sood age-150 was never stated to be the only age classifier, we made a genuine attempt at sampling at random by removing known Age (and AD) genes from the sampling pool (as listed in our 2015 article, supplementary information [1]). As Jacob’s code does not implement any cross-cohort validation nor any consideration of correction for multiple testing, our age signature remains the only signature validated in two (blood) cohorts (and across other independent data)

Substantial misunderstandings of our article, by Jacob and Speed, led to them create a narrative through reference to Ein-Dor et al [Ref 2 in their letter], that our work was an example of a classification solution found using a small clinical cohort [3]. Typically such solutions do not transfer to independent clinical cohorts, because the model is over-fitted to all the characteristics of the first data set. Notably it was impossible for the HAG signature to belong to this category, as we did not select our gene list using an AD cohort. Importantly, our single HAG ‘150 gene list’ was confirmed to be statistically significant in two independent AD cohorts [1]. In contrast, Jacob and Speed calculated thousands of ROC values; did not apply any statistical test for significance, and never confirmed the performance of any of their gene lists in any independent cohorts. For clarity on this complex topic, Table 1 contrasts some of the major differences in approach and rigor, taken by our team versus the approach presented by Jacob and Speed. It is noteworthy that the approach we took is far more laborious, and we understand why Jacob and Speed may have chosen to skip over most of the following time consuming and important steps.

Table 1 A comparison of the approaches of Sood et al. and Jacob and Speed

To revisit Jacob and Speed analysis we used their computational code, and found that their within-cohort modeling of the AD data (an approach we did not use as it is unreliable [4] (Table 1)) yielded 150 gene-sets with performance that reflected laboratory batch-noise within the AD Illumina blood RNA arrays (AdNeuroMed consortium [2]). We illustrate this, by sampling using their code, but only from non-expressed (‘background’) genes (Additional file 1: Figure S1). This noise is unfortunate, but the illustration demonstrates that the performance of their random sampling protocol was not driven, as claimed, to ‘interconnected biology’ or ‘shared gene expression variance’. Furthermore, their claim that our 150-gene age signature (Fig. 1b, blue-dot) lay ‘within’ the performance range obtained via random sampling in blood is neither correct nor a fair like-for-like statistical comparison [4]. Our single 150 HAG test was statistically significant in two independent AD cohorts, and exceeded any background noise. Jacob and Speed random gene-sets require statistical correction for thousands of multiple-tests and notably they did not present any statistical significance values. The importance of our approach, using hypothesis driven signatures and robust external validation (i.e. using independent data to validate a model) over their ‘easy to perform’ within data sampling is neither controversial or recent knowledge - See Konig et al. and cited articles [4].

We also found that their (less reliable) ‘within cohort’ ROC performance approach was enhanced by some of their choices e.g. 50% data-splits resulted in ~ 10% gain in ROC value, in their favor (Fig. 1b). Their random gene lists were not actually ‘random’ from a biological perspective. Sampling occurred from known age and AD correlated genes (see methods). Removal of age and AD genes known at the time of our study (2015) also reduces the performance of sampling at random (Additional file 1: Figure S2). Curiously, Jacob and Speed did not match AD cases with controls, for chronological age and gender (the two greatest risk factors for AD). Instead they combined AD with the Mild Cognitive Impairment (MCI) samples - a heterogenous population, comprising several clinical conditions and many who never develop AD [5]. This was clinically invalid but also unnecessary, as all required information was included in the GEO repository e.g. age, gender and clinical sample status (embedded in GSE63060 and GSE63061 as visualized in his files supplied with their letter (Additional file 1: Figure S3)). Combining the AD and MCI clinical samples also exaggerated the within cohort performance of their approach while it impaired the performance of our top-ranked gene-set (Additional file 1: Figure S4). Critically Jacob and Speed should have been able to replicate the design of our study analysis as all data necessary was at GEO, as illustrated in Additional file 1: Figure S3 - a screen shot taken directly from implementing the code provided by Jacob and Speed.

Finally, Jacob and Speed have claimed that our tissue age model was itself ‘unremarkable’. A clue to the exceptional performance of our age signature was the observation that when using muscle RNA profiles as the external validation data (classification space), the age of brain samples was classified correctly [1]. However, we formally revisited this issue by comparing our [1] HAG signature with 10,000 gene-sets chosen at random from the same muscle cohort. We calculated performance of each recorded ‘at random’ gene-set, using cross-cohort ‘gold standard’ external validation (four independent muscle data-sets; code and methods are included in the Additional file 1). Our 150 HAG signature was ranked better than all 10,000 ‘random’ gene-sets (Additional file 1: Figure S5A). We calculated the statistical significance of the average ‘random’ gene-set, and the performance of an ‘age model selected at random’ was not significant (unadjusted p = 0.113), despite having an ROC value > 0.6. Even gene-sets discovered and tested within a single cohort were largely inferior to our age signature (Additional file 1: Figure S5B). Our observations lead to the conclusion that the claims made by Jacob and Speed are flawed, at least in part because they did not present a single example of a new 150-gene-set that significantly worked across independent data as a tissue age classifer, a muscle-based health prognostic or a blood AD classifier. Thus, while we agree with Ein-Dor et al, that classifiers built on small clinical cohorts should be treated with caution; reflecting the interactive network of tissue and cellular gene expression [6] and simple technical factors such as noise, as we did not build a disease classifier using AD or any other disease samples, we find Jacob and Speed's criticism [3] of our work unfounded.


  1. Sood S, Gallagher IJ, Lunnon K, Rullman E, Keohane A, Crossland H, et al. A novel multi-tissue RNA diagnostic of healthy ageing relates to cognitive health status. Genome Biol. 2015;16:185 Available from:

    Article  Google Scholar 

  2. Lunnon K, Ibrahim Z, Proitsi P, Lourdusamy A, Newhouse S, Sattlecker M, et al. Mitochondrial dysfunction and immune activation are detectable in early Alzheimer’s disease blood. J Alzheimers Dis. 2012;30:685–710 [cited 2014 Oct 19]. Available from:

    Article  CAS  Google Scholar 

  3. Jacob L, Speed TP. The healthy ageing gene expression signature for Alzheimer’s disease diagnosis: a random sampling perspective. Genome Biol. 2018;19:97 [cited 2019 Jun 9]. Available from:

    Article  Google Scholar 

  4. König IR, Malley JD, Weimar C, Diener H-C, Ziegler A. Practical experiences on the necessity of external validation. Stat Med. 2007;26:5499–511 [cited 2019 May 3]. Available from:

    Article  Google Scholar 

  5. Vos SJB, Verhey F, Frölich L, Kornhuber J, Wiltfang J, Maier W, et al. Prevalence and prognosis of Alzheimer’s disease at the mild cognitive impairment stage. Brain. 2015;138:1327–38 [cited 2018 Sep 4]. Available from:

    Article  Google Scholar 

  6. Zhang B, Gaiteri C, Bodea L-GG, Wang Z, McElwee J, Podtelezhnikov AA, et al. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell. 2013;153:707–20 [cited 2017 Feb 3]. Available from:

    Article  CAS  Google Scholar 

Download references


We thank a number of colleagues, including Professor Claes Wahlestedt, for their constructive comments, during the preparation of this response.

Author information

Authors and Affiliations



The computational code used was produced by Jacob and Speed and inspected by SS, JT and IG. The original Jacob and Speed codes were implemented by SS during 2016 and 2017. The most recent Jacob and Speed code was implemented by JT during 2018, and results inspected by IG and SS. The original Knn classification R code was designed by JT, SS and IG and implemented by SS. The muscle random sampling code with 4 study external validation was produced by SS, and the data analysis checked by JT and IG. All primary and secondary data analysis results were discussed by JT, PA, BP and HC at meetings during 2018, and the responses to Jacob and Speed were then drafted by JT, PA, SS and IG. Various iterations of the document were edited by RH and WK. The final response was reviewed and approved by all authors.

Corresponding authors

Correspondence to James A. Timmons or Iain J. Gallagher.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Correspondence to:

The Research to this article has been published in Genome Biology 2015 16:185:

Additional file

Additional file 1:

Supplementary figures S1-S5, Supplementary Methods and code. (DOCX 3780 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Timmons, J.A., Gallagher, I.J., Sood, S. et al. A statistical and biological response to an informatics appraisal of healthy aging gene signatures. Genome Biol 20, 152 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: