
An algorithm for determining networks from gene expression data enables the identification of genes potentially linked to aging in worms.

significantly coordinated patterns of expression, to machine learning methods that consider all possible combinations of genes and identify groups whose combined expression pattern can distinguish between different phenotypes, with no constraint that the genes in a group must be biologically related.
Network methods for interpreting gene expression data [11,14-19] fall in between these two extremes: they incorporate prior biological knowledge in the form of an interaction network, so that genes in a significant group are likely to participate in shared functions, but they consider many different combinations of genes, and so are more flexible than methods using pre-defined gene groups. Gene groups identified by these methods constitute novel biological hypotheses about which genes participate together in common functions related to the class variable.
Here, we propose a novel strategy for identifying subnetwork biomarkers: we incorporate a measure of topological modularity into the expression for subnetwork score. This yields subnetwork biomarkers that are biologically cohesive and that have different activity levels at different ages. Using two aging microarray datasets, we show that our method improves on previous approaches, yielding subnetworks that are more conserved across studies, and that perform better in a machine learning task.
We identify the subnetworks that play a role in worm aging, and then explore their connection with known longevity genes. Finally, we apply them to assign putative aging-related functions to longevity genes (genes that affect lifespan when deleted or perturbed). Worm is the ideal model organism for studying these questions, since it has the largest number of characterized longevity genes [20], and microarray datasets using worms of 4 or more ages are publicly available [2,21]. Our work builds on a family of successful algorithms that incorporate supervised information to find subnetworks with phenotype-dependent activity, which we discuss below.

Methods for extracting active subnetworks by integrating gene expression data, network connectivity, and supervised class labels
To date, some of the most successful network-based methods of gene group identification for class prediction have been the score-based subnetwork markers originally proposed in Ideker et al. [22] and developed and expanded in later works, e.g., [11,14,15,18,23,24]. Subnetworks identified using these approaches were recently shown to be highly conserved across studies and to perform better than individual genes or pre-defined gene groups at predicting breast cancer metastasis [11].
Most of these methods share the same basic architecture. Each algorithm aggregates genes around a seed node in a way that maximizes some measure of performance. In previous implementations, the score is a function of the subnetwork activity (often calculated as the mean expression value of the genes in the subnetwork) and the class label, i.e., subnetworks get high scores if their activity is different for different classes. Subnetworks are grown outward iteratively from a seed node, typically using a greedy search procedure to maximize subnetwork score: at every step, the network neighbour of the current subnetwork yielding the largest score increase is added to the subnetwork. Subnetwork scores are calculated differently in individual implementations (e.g., [18] uses the t-statistic and [11] uses mutual information) but are always solely a function of what we refer to as class relevance, i.e., of expression data and class labels. In particular, in all previous implementations the subnetwork score is insensitive to network topology: the only topological constraint is that subnetwork members must form a connected component. However, a large body of work in network theory has demonstrated the value of more sophisticated topological measures of network cohesiveness, or modularity [25,26]. In fact, many algorithms successfully identify groups of functionally related genes on the basis of network topology alone. The simple intuition behind these algorithms is that genes that are members of a highly interconnected group (that is only sparsely connected to the rest of the network) are more likely to participate in the same biological function or process. In biological networks, genes belonging to the same topological module are more likely to share functional annotations or belong to the same protein complex [27-29].
No score-based subnetwork method proposed to date takes advantage of the rich modular structure of biological interaction networks. Here, we propose incorporating topological modularity into the expression for subnetwork score, and show that this approach offers important advantages: increased conservation across studies, and improved performance on a learning task. For the remainder of the paper, we refer to subnetworks grown using scores that are a function of class relevance alone as regular subnetworks, and to those grown using our new scoring criterion as modular subnetworks.

Identifying active subnetworks in aging by trading off network modularity and class relevance
Here, we give a basic outline of our method for identifying subnetworks that are both highly modular and relevant to the class variable (Fig. 1), and then we discuss the novel aspect, the subnetwork scoring method, in detail; other algorithm parameters are listed in Materials and methods. We compared the performance characteristics of modular and regular subnetworks using two microarray studies of worm aging [2,21].

Identifying modular subnetworks
Our method is summarized in Fig. 2. First, we assign a weight to every edge in the interaction network that reflects the strength of the relation between the two genes that flank it (quantified using Spearman correlation). For genes $i$ and $j$ with normalized expression vectors $z_i$ and $z_j$, the weight $w_{ij}$ is defined as:

$$w_{ij} = a_{ij}\,\lvert \mathrm{corr}(z_i, z_j) \rvert, \quad \text{where } a_{ij} = \begin{cases} 1 & \text{if there is a network edge between nodes } i \text{ and } j \\ 0 & \text{otherwise} \end{cases}$$

Next, we grow subnetworks starting at particular seed genes in the network (see Materials and methods), using a greedy search that maximizes the subnetwork score

$$S = R + \beta M \quad \text{for some } \beta \ge 0,$$

where $R$ is the class relevance of the subnetwork and $M$ is its modularity. At every stage, the neighbour that leads to the highest score increase (without reducing either class relevance or modularity) is added to the subnetwork.
The intuition behind the modularity coefficient β is that it allows us to trade off the information in gene expression data with the prior knowledge about gene connectivity encoded in the functional interaction network: for noisy microarray studies, or ones with few samples, we should place a greater emphasis on prior knowledge by choosing higher values for β. Previous subnetwork scoring algorithms effectively assume that β = 0, or S = R.
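The greedy growth step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the score is taken to be S = R + βM (the form implied by "β = 0, or S = R"), and the function name `grow_subnetwork` and the `relevance` and `modularity` callables are hypothetical placeholders for the measures defined in the next two subsections.

```python
from typing import Callable, Dict, Set

Graph = Dict[str, Set[str]]          # gene -> set of neighbouring genes
Measure = Callable[[Set[str]], float]


def grow_subnetwork(graph: Graph, seed: str, relevance: Measure,
                    modularity: Measure, beta: float) -> Set[str]:
    """Greedily grow a subnetwork from `seed`, maximizing S = R + beta * M.

    Assumption: a neighbour is only eligible if adding it reduces neither
    class relevance R nor modularity M, and it must strictly raise S.
    """
    members = {seed}
    while True:
        r_now, m_now = relevance(members), modularity(members)
        best_gene, best_score = None, r_now + beta * m_now
        # Candidates are the network neighbours of the current subnetwork.
        frontier = set().union(*(graph[g] for g in members)) - members
        for cand in sorted(frontier):  # sorted only to make ties deterministic
            trial = members | {cand}
            r_new, m_new = relevance(trial), modularity(trial)
            if r_new < r_now or m_new < m_now:
                continue  # would reduce one of the two criteria
            score = r_new + beta * m_new
            if score > best_score:
                best_gene, best_score = cand, score
        if best_gene is None:
            return members  # no eligible neighbour improves the score
        members.add(best_gene)
```

Setting `beta=0` in this sketch reduces the search to one driven by class relevance alone, which is the behaviour the text attributes to previous scoring schemes.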

Class relevance R
We measure class relevance as the Spearman correlation between subnetwork activity and age, so that a subnetwork is considered age-related to the extent that its activity level either increases or decreases monotonically with increasing age (Fig. 1B).
Subnetwork activity is calculated as the mean expression level of subnetwork genes.
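As a concrete illustration of this definition, the sketch below computes subnetwork activity as the per-sample mean of member genes and class relevance as the Spearman correlation of that activity with age. The function names and the dictionary layout for expression data are hypothetical, and taking the absolute value (so that increasing and decreasing trends score equally) is our reading of the text rather than a confirmed detail.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def subnetwork_activity(expression, genes):
    """Mean expression over member genes, one value per sample."""
    n_samples = len(expression[genes[0]])
    return [mean(expression[g][s] for g in genes) for s in range(n_samples)]


def class_relevance(expression, genes, ages):
    # Assumption: relevance is the magnitude of the correlation with age.
    return abs(spearman(subnetwork_activity(expression, genes), ages))
```

Because several worms share the same age, the age vector contains many ties; the average-rank treatment above is one common way to handle them.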

Network modularity M
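To define the modularity of a connected set of genes in a network, we use a weighted generalization of the local measure proposed in Lancichinetti and Fortunato [30]. We calculate the modularity for a subnetwork as the edge weight internal to the subnetwork divided by the total edge weight of all subnetwork nodes, squared. For subnetwork $N$, we define the internal, external, and total weight:

$$w_{\mathrm{in}}(N) = \sum_{i, j \in N} w_{ij}, \qquad w_{\mathrm{ext}}(N) = \sum_{i \in N,\ j \notin N} w_{ij}, \qquad w_{\mathrm{tot}}(N) = w_{\mathrm{in}}(N) + w_{\mathrm{ext}}(N).$$

Then the modularity of $N$ can be written as

$$M(N) = \left( \frac{w_{\mathrm{in}}(N)}{w_{\mathrm{tot}}(N)} \right)^{2}.$$

For all subnetworks, $M$ lies between 0 and 1.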
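A small sketch of this local modularity measure follows. It rests on two assumptions that the text does not settle: that "squared" applies to the ratio of internal to total weight (the reading under which M is guaranteed to lie between 0 and 1), and that each internal edge is counted once. The function name and the edge-dictionary layout are hypothetical.

```python
def modularity(edge_weights, members):
    """Local modularity of the gene set `members`.

    edge_weights maps an undirected edge (u, v) to its weight w_uv.
    Assumption: M = (w_in / (w_in + w_ext)) ** 2, each edge counted once.
    """
    w_in = w_ext = 0.0
    for (u, v), w in edge_weights.items():
        endpoints_inside = (u in members) + (v in members)
        if endpoints_inside == 2:
            w_in += w    # edge lies entirely inside the subnetwork
        elif endpoints_inside == 1:
            w_ext += w   # edge links the subnetwork to the rest of the network
    w_tot = w_in + w_ext
    return (w_in / w_tot) ** 2 if w_tot > 0 else 0.0


# Three member genes joined by two internal edges, plus one edge leaving the set.
edges = {("a", "b"): 0.5, ("b", "c"): 0.5, ("c", "d"): 1.0}
m = modularity(edges, {"a", "b", "c"})  # internal 1.0, external 1.0
```

A subnetwork with no external edges scores 1 under this reading, and one with no internal weight scores 0.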
For each study, we grew subnetworks seeded at every node in the functional interaction network, so that corresponding subnetworks grown using different expression datasets could be directly compared. We used randomization tests to determine which subnetworks were significantly associated with age in each study.
For further details, see Materials and methods. Below, we compare these regular and modular subnetworks in terms of their robustness across studies and performance on a machine learning task.

Modular subnetworks are more robust across studies than regular subnetworks
Comparing the modular subnetworks m1-m5 and the regular subnetworks r1-r5 derived from both studies, we found that modular subnetworks identified as significant in one study were highly likely to be significant in the other study (i.e., seed genes of significant modular subnetworks were highly conserved across studies).
Fig. 3 shows that 15-18% of significant modular subnetworks were identified in both studies; in contrast, only 3-5% of significant regular ones were.
For each modular and regular network type, we also calculated the significance of the overlap between sets of significant seed genes using the hypergeometric test, and these values showed the same trend (Fig. 3). While all subnetwork types were more conserved across studies than would be expected by chance ($p < 10^{-3}$), modular subnetworks were much more conserved than regular ones: they had enrichment p-values ranging from $10^{-84}$ to $10^{-137}$, while regular subnetworks had p-values from $10^{-3}$ to $10^{-38}$.
While substantially more modular than regular subnetworks were conserved across studies, many subnetworks were identified in only one study; this can be partially accounted for by noise in the individual microarray studies, the fact that the two studies used different microarray platforms and different strains of worm, and the fact that the current functional interaction network is not complete and contains some errors.

Modular subnetworks trained on aging gene expression data from wild-type worms successfully predict age in fer-15 worms
We compared the performance of single genes, regular subnetworks, and modular subnetworks on a machine learning task: predicting worm age on the basis of gene expression levels (Fig. 4). We acquired sets of significant genes from [2]; g1 is made up of all the genes considered significant in that study, and g2 is the aging gene signature used for machine learning in [2] (i.e., g2 is the 100 most significant genes from g1). Using machine learning features drawn from gene sets g1-g2, regular subnetworks r1-r5, or modular subnetworks m1-m5 derived from the larger microarray study [2], we trained support vector regression (SVR) algorithms to predict the age of wild-type worms on the basis of gene expression (for details, see Materials and methods).
We then tested the performance of the learned feature weights on an independent data set in a different strain of worm (fer-15) [21].
Performance on the test set was quantified as the squared correlation coefficient (SCC) between worm ages predicted by the SVR and true worm ages (measuring performance in terms of mean-squared error would be inappropriate here, because the worms in the training and test sets had different lifespans). All p values reported in this section were calculated using the Wilcoxon ranksum comparison of medians test.
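The reason for preferring SCC over mean-squared error across strains can be seen in a toy example. The sketch below uses made-up ages, and the function names are ours: the predictions track the true ages perfectly in order and spacing but on a different time scale, as might happen when a model trained on one strain is applied to a strain with a different lifespan.

```python
from statistics import mean


def squared_correlation(pred, true):
    """Squared Pearson correlation between predicted and true values."""
    mp, mt = mean(pred), mean(true)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in true)
    return cov * cov / (var_p * var_t)


def mean_squared_error(pred, true):
    return mean((p - t) ** 2 for p, t in zip(pred, true))


# Hypothetical data: predictions are exactly half the true age.
true_age = [2.0, 4.0, 6.0, 8.0, 10.0]
pred_age = [1.0, 2.0, 3.0, 4.0, 5.0]

scc = squared_correlation(pred_age, true_age)  # unaffected by the rescaling
mse = mean_squared_error(pred_age, true_age)   # penalizes the rescaling
```

Here the SCC is 1 because the relationship is perfectly linear, while the MSE is large simply because the two time scales differ.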
To capture the typical performance of machine learners that used either genes or subnetworks as features, we considered four different sizes of feature set (5, 10, 25, or 50 features). Then, for each size of feature set, and for each set of genes (g1-g2) or subnetworks (r1-r5, m1-m5), we performed 1000 tests. For example, for the 25-feature SVRs, and for the m1 significant subnetworks, we randomly drew 25 subnetworks from m1, trained them on the wild-type worm data, and then tested them on the fer-15 data, and repeated that process of drawing, training, and testing 1000 times. Fig. 5 summarizes test results at each feature level, showing the typical performance of the best sets of genes, regular subnetworks, and modular subnetworks.
Full results for every parameter setting are available in Additional file 1, and p-value comparisons in Additional file 2.
Over all tests, the SVRs using 25 or 50 modular subnetwork features (of the m1 and m3 types) achieved the highest typical performance, with a median SCC of 0.91 between predicted and true worm age; this is a statistically significant 7% and 26% improvement over the best performances of regular subnetworks ($p < 10^{-83}$) and genes ($p < 10^{-202}$), respectively (Fig. 5).

Subnetworks vs. genes
Modular and regular subnetworks dramatically outperform significant genes across a range of parameters. For example, using 25 features (Fig. 5), the best modular subnetworks have a median SCC of 0.91 and the best regular subnetworks of 0.85, versus 0.70 for the 100-gene signature. This result was consistent across feature levels and parameter settings, and is highly significant for all tests: for every comparison between modular subnetwork features and gene features, we have $p < 10^{-15}$. For all sizes of feature set, the best-performing subnetworks (m3) always showed a median SCC at least 0.16 higher than the best-performing genes (g2), i.e., at least a 24% improvement.

Modular vs. regular subnetworks
For all sizes of feature set, the median SCC of the best modular subnetwork type always exceeded that of the best regular subnetwork type by 0.05-0.08, corresponding to a 6-10% performance improvement (Fig. 5). The performance difference between the best modular subnetworks and the best regular subnetworks is highly significant at all feature levels ($p < 10^{-32}$).
It was not only the best modular subnetworks that outperformed the best regular subnetworks; in fact, modular subnetworks significantly outperformed the best regular subnetworks for most parameter settings. With the exception of m5 (β = 1000), each modular subnetwork type significantly outperforms the best regular subnetwork type at all feature levels. For three types of modular subnetwork (m1-m3), the performance difference between them and the best regular subnetworks is highly significant (ranksum $p < 10^{-26}$ for every comparison); m4 outperforms the best regular subnetworks at $p < 10^{-5}$ for three feature levels, and at $p < 10^{-2}$ for 5 features; for m5, there is no consistent trend (Additional file 1). All pairwise comparisons (p-values) between regular and modular subnetworks are available in Additional file 2.

The role of the modularity coefficient β in machine learning
Different values of β correspond to giving different proportional weights to the information in gene expression data and to the prior knowledge about gene connectivity encoded in the functional interaction network: for noisy microarray studies, or ones with few samples, we might want to depend more on prior knowledge by choosing a high value for β.
For the Golden et al. dataset [2] that we used for training, we found that a value of β = 100 corresponds roughly to treating class relevance and modularity as equally important in the expression for subnetwork score: in simulations where we generated subnetworks using either modularity or class relevance alone as the scoring criterion (i.e. S = M or S = R), the median modularity of the S = M subnetworks was two orders of magnitude smaller than the median class relevance of the S = R ones, i.e., 'good' values for modularity are roughly 100 times smaller than 'good' values for class relevance.
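The calibration argument above amounts to simple arithmetic: if typical modularity values are about 100 times smaller than typical class relevance values, then multiplying modularity by β ≈ 100 puts the two terms on the same footing. The sketch below illustrates this with invented numbers; the function name and the sample values are hypothetical and are not taken from the study.

```python
from statistics import median


def balancing_beta(relevance_values, modularity_values):
    """Value of beta at which a typical beta * M equals a typical R.

    relevance_values: class relevance of subnetworks grown with S = R.
    modularity_values: modularity of subnetworks grown with S = M.
    """
    return median(relevance_values) / median(modularity_values)


# Hypothetical medians differing by two orders of magnitude.
r_values = [0.6, 0.8, 0.9]
m_values = [0.006, 0.008, 0.010]
beta_equal_weight = balancing_beta(r_values, m_values)
```

Values of β well below this balance point let expression data dominate the score, and values well above it let network topology dominate.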
As β becomes larger, the proportional contribution of class relevance to the expression for subnetwork score becomes smaller, and so for large enough values of β, the algorithm will behave essentially like other purely unsupervised network clustering algorithms that greedily aggregate nodes around a seed to maximize modularity [29-31]. In our tests, subnetworks generated using β = 50, 100, or 250 behaved virtually identically on the learning task; the performance of β = 500 subnetworks was typically a bit lower; and that of β = 1000 ones lower still. For large enough values of β, we would expect the typical performance of modular subnetworks to fall below that of regular subnetworks, because supervised feature selection is superior to unsupervised feature selection [32].
In the previous two sections, we established that modular subnetworks are more robust across studies than regular subnetworks and perform better in a worm age prediction task. Modular subnetworks grown using the coefficient β = 250 showed both the highest robustness across studies and the best performance on the test set, so we chose to analyze them in greater detail. For the remainder of the paper, we will explore the relation between these subnetwork biomarkers (generated from the larger microarray study [2]) and worm aging. The full set of these subnetworks is available in Additional file 2.

Modular subnetworks predict wild-type worm age with low mean-squared error
Here, we show using 5-fold cross-validation that modular subnetworks grown using β = 250 can predict the age of individual wild-type worms with low mean-squared error. Because it would be circular to predict age on the same dataset that was used to determine the features [33], we first divided the wild-type worm aging dataset into 5 stratified folds for cross-validation. We repeated the search for significant subnetworks 5 times, each time using 4/5 of the data to select significant subnetworks and train SVRs, and then the remaining 1/5 as a test set to evaluate the learned feature weights. We compared the performance of modular subnetworks with that of the top 100 differentially expressed genes reported in [2]. To construct SVRs using genes as features, we used the same 5 stratified folds, i.e., we used 4/5 of the data to select the top 100 most significant genes and learn feature weights, and the remaining 1/5 as test data, and repeated this process for each of the 5 folds. As in the original study [2], for each fold we selected the top 100 significant genes by performing an F-test and applying a False Discovery Rate [34] (FDR) correction.
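One way to build such stratified folds is sketched below. It is an illustration only: the function name, the round-robin assignment within each age group, and the fixed random seed are our assumptions, since the text does not describe how the folds were constructed.

```python
import random
from collections import defaultdict


def stratified_folds(ages, k=5, seed=0):
    """Split sample indices into k folds with ages spread evenly across folds."""
    rng = random.Random(seed)
    by_age = defaultdict(list)
    for idx, age in enumerate(ages):
        by_age[age].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for age in sorted(by_age):
        idxs = by_age[age]
        rng.shuffle(idxs)
        # Deal the samples of this age group out to the folds in turn;
        # the running offset keeps fold sizes balanced across age groups.
        for pos, idx in enumerate(idxs):
            folds[(pos + offset) % k].append(idx)
        offset += len(idxs)
    return folds


# Hypothetical design: 4 ages with 10 arrays each.
ages = [age for age in (2, 5, 8, 11) for _ in range(10)]
folds = stratified_folds(ages, k=5)
for test_fold in folds:
    train = [i for f in folds if f is not test_fold for i in f]
    # Feature selection (subnetwork search or gene ranking) and SVR training
    # would use `train` only; `test_fold` is held out for evaluation.
```

Repeating feature selection inside each training split, as the loop comment indicates, is what avoids the circularity mentioned in the text.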
For four different sizes of feature set (5, 10, 25 or 50), we generated 1000 different SVRs using either modular subnetworks or genes as features to capture their typical performance.All p-values reported here were computed using the Wilcoxon ranksum test.
At every size of feature set (5, 10, 25 or 50), modular subnetworks significantly outperform differentially expressed genes ($p < 10^{-28}$) according to the metrics of mean-squared error (MSE) and squared correlation coefficient (SCC) between predicted age and true age. For example, using feature sets of size 50, we obtained a median MSE of 7.9 for subnetworks vs. 11.2 for genes ($p < 10^{-98}$), and a median SCC of 0.77 for subnetworks vs. 0.69 for genes ($p < 10^{-65}$). Fig. 6A shows the median performance of modular subnetworks and genes across all tests, and Fig. 6B shows the predictions of a typical SVR learner built using 50 modular subnetworks as features. At every size of feature set, the MSE for genes was at least 1.76 higher than the corresponding MSE for subnetworks (i.e., at least 22% higher) ($p < 10^{-28}$), and the SCC for subnetworks was at least 0.05 higher ($p < 10^{-28}$).
Over all tests, the modular SVRs with 50 features achieved the best performance: a median SCC of 0.77 and a median MSE of 7.9. This SCC is substantially lower than the highest one achieved on the test set of pooled fer-15 worms in the last section (0.91) because predicting the age of an individual worm is more difficult than predicting the age of a large pooled group of age-matched worms (pooling removes individual variability).

Longevity genes play crucial roles in significant subnetworks
For these analyses, we compiled two sets of known longevity genes (see Materials and Methods, Additional file 3): L1, a set of 233 genes that extend lifespan when perturbed, and L2, a larger set of 494 genes that either shorten or extend lifespan when perturbed.

Significant subnetworks are enriched for known longevity genes
We found that significant subnetworks derived using both C. elegans aging microarray studies [2,21] were significantly enriched for both sets of longevity genes, relative to the background set of 12808 genes represented in the functional interaction network. All p-values reported here were calculated using the hypergeometric test.
For the Golden et al. [2] data, of the 1957 genes that play a role in significant subnetworks, 65 are in L1 ($p < 10^{-6}$) and 124 are in L2 ($p < 10^{-8}$), and of the 535 seed genes that produce significant subnetworks, 27 are in L1 ($p < 10^{-5}$) and 45 are in L2 ($p < 10^{-6}$). For the Budovskaya et al. [21] study, subnetwork seeds were highly enriched for known longevity genes, and the set of all subnetwork genes was slightly enriched for them. Of the 1559 seed genes of significant subnetworks, 43 are in L1 (p = 0.003) and 90 are in L2 ($p < 10^{-4}$), and of the 4158 genes represented in some subnetwork, 88 are in L1 (p = 0.048) and 181 are in L2 (p = 0.025).
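The enrichment calculation can be illustrated with an upper-tail hypergeometric test. The sketch below is a plain-Python version written for this illustration (the function name is ours, and the authors' exact procedure may differ in details such as tail convention); it is applied to the first comparison reported above: 65 of the 1957 subnetwork genes fall in L1, which contains 233 of the 12808 background genes.

```python
from math import comb


def hypergeom_upper_tail(population, marked, drawn, observed):
    """P(X >= observed) when `drawn` genes are sampled without replacement
    from `population` genes, of which `marked` belong to the gene set."""
    total = comb(population, drawn)
    tail = sum(
        comb(marked, i) * comb(population - marked, drawn - i)
        for i in range(observed, min(marked, drawn) + 1)
    )
    return tail / total  # exact integer arithmetic until the final division


# Numbers from the text: background 12808, L1 size 233,
# 1957 subnetwork genes, 65 of them in L1.
p_l1 = hypergeom_upper_tail(12808, 233, 1957, 65)
expected_by_chance = 1957 * 233 / 12808  # about 36 genes
```

The observed count of 65 is well above the roughly 36 genes expected by chance, which is why the resulting p-value is very small.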

Examples of significant subnetworks containing known longevity genes
While high-throughput experimental methods have helped to identify hundreds of worm longevity genes [20], their aging-related functions remain poorly understood. We found that subnetwork biomarkers are highly enriched for longevity genes. Thus, subnetworks can provide a molecular context for these genes in aging: they can be applied to uncover new connections between different longevity genes, or to assign putative aging-related functions to them.
In Figure 7, we show several representative examples of significant subnetworks derived from the Golden et al. [2] data that involve multiple known longevity genes. The complete list is given in Additional file 3; individual NAViGaTOR XML [35] and PSI-MI XML [36] files for each subnetwork are available from the supplementary website [37]. Subnetwork A involves longevity genes vit-2 and vit-5. B has known longevity genes age-1, daf-18, and vit-2; previous work has uncovered that a mutation in daf-18 will suppress the lifespan-extending effect of an age-1 mutation [38]. C contains longevity genes rps-3 and skr-1, which are involved in protein anabolic and catabolic processes, respectively. Subnetwork D contains longevity genes unc-60 and tag-300, which are both involved in locomotion.
E contains longevity genes fat-7 and elo-5, which are involved in fatty acid desaturation and elongation. Subnetwork F has longevity genes rps-22 and rha-2, and G has longevity genes blmp-1, his-71, and Y42G9A.4. Both blmp-1 and his-71 are involved in DNA binding.

Modular subnetworks participate in many different age-related biological processes
Aging is highly stochastic and affects many distinct biochemical pathways. We analyzed the union of all genes in significant modular subnetworks using biological process categories from the Gene Ontology [13] (GO) and pathways from the Kyoto Encyclopaedia of Genes and Genomes [39] (KEGG) databases to determine their relation to known mechanisms of aging. Full results are given in Tables 1 and 2; all functions and pathways shown in the tables and discussed below are significant at p < 0.05 after an FDR correction.
In total, we identified 27 KEGG pathways and 37 non-redundant GO biological processes (see Materials and Methods) that were significantly enriched for subnetwork genes. To test whether these pathways and processes were also related to aging, we calculated the significance of their overlap with the set of experimentally determined longevity genes (Additional file 4). We found that one third of the GO biological processes (12 of 37) and KEGG pathways (10 of 27) associated with subnetworks were significantly enriched for longevity genes (p < 0.05). Aging-associated GO categories enriched for subnetwork genes include 'locomotory behaviour,' which has recently been proposed as a biomarker of physiological aging [2], and 'determination of adult life span'; KEGG pathways include 'cell cycle' and several metabolic pathways (including 'citrate cycle,' 'glycolysis').

Modular subnetworks can be used to annotate longevity genes with novel functions
An important advantage of subnetwork over single-gene biomarkers is that they can be applied to infer novel functions for subnetwork members [40]. Most worm longevity genes were identified in high-throughput RNA interference screens, and thus many remain poorly characterized. And though several longevity genes do have some previously known functions, their aging-related function is still unknown.
We used modular subnetworks (derived from the expression data in [2]) to assign putative functions in aging to known longevity genes by annotating them with the Gene Ontology (GO) Biological Process categories that their associated subnetworks were significantly enriched for. In total, we provided 49 longevity genes with novel annotations; nine of these genes had no previous Gene Ontology biological process annotations (apart from those electronically inferred) or well-characterized orthologs (named NCBI KOGs [41]). The most significant novel annotation for each longevity gene is given in Table 3, as an example of our approach (poorly characterized genes are indicated with an asterisk). The full list of all longevity gene GO categories inferred by subnetwork annotations is available in Additional file 5, and on the supplementary website [37]. All GO categories in the tables are significant with p < 0.05 (after an FDR correction), and annotated to at least 25% of subnetwork genes.

Conclusions
Aging results not from individual genes acting in isolation of one another, but from the combined activity of sets of associated genes representing a multiplicity of different biological pathways. For the most part, the organization and function of these aging-related pathways remain poorly understood. In particular, the role of most longevity genes in aging is still unknown.
In this work, we showed that high-throughput information about which genes are likely associated with which other genes, in the form of a functional interaction network, can yield new insights into the transcriptional programs of aging. We identified modular subnetworks associated with worm aging (highly interconnected groups of genes that change activity with age) and showed that they are effective biomarkers for predicting worm age on the basis of gene expression. In particular, they outperform biomarkers of aging based on the activity of single genes or regular subnetworks. Furthermore, we found that modular subnetwork biomarkers were significantly enriched for known longevity genes. Thus, modular subnetwork biomarkers can provide a molecular context for each longevity gene in aging: in effect, each longevity subnetwork constitutes a biological hypothesis as to which genes interact with known longevity genes in some common age-related function.
This work is the first to use a new subnetwork performance criterion that incorporates modularity into the expression for subnetwork score, and the first to integrate network information with gene expression data to identify biomarkers of aging. The subnetwork biomarkers identified by our method are highly conserved across studies, and this opens the door to studying longevity genes (or indeed, any age-related gene set of interest) over a range of different health and disease conditions. In particular, we are interested in investigating the different subnetworks associated with longevity genes in diseases like cancer, and in aging across species.

Materials and methods
Functional interactions for C. elegans ORFs were downloaded from WormNet [46].
The network used in our analyses consists of the largest connected component of the network formed from all WormNet ORFs represented by some probeset in two separate worm aging microarray studies [2,21], and represents 12808 distinct C. elegans genes.

Longevity genes
We obtained L1, our high confidence set of genes that extend lifespan when perturbed or knocked out, from the recent list compiled in [47]. In total, 233 genetic perturbations that extend lifespan belonged to the largest connected component of WormNet made up of genes covered by both expression studies. We constructed L2, our larger set of longevity genes, by taking the union of L1 and the set of mutations that affect worm lifespan downloaded from the GenAge database [20]. This yielded 494 genes that either shorten or extend lifespan when perturbed (and are annotated to the network we use). Both gene lists are available in Additional file 4.

Seed genes
Previous methods [11,18] seed the subnetwork search process at a random subset of genes on the network; a problem with this approach is that different choices of seed genes might yield substantially different significant subnetworks. To avoid this bias, we grew subnetworks seeded from every node of the interaction network. For all machine learning tests, the total set of significant subnetworks was reduced to a non-redundant set, i.e., if two significant subnetworks shared more than 25% overlap (as measured with the Jaccard index), the lower-scoring subnetwork was deleted from the set of candidate features.
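The redundancy filter can be sketched as a single pass over subnetworks in order of decreasing score. This is one plausible reading of the rule above rather than the authors' code: the function names are hypothetical, and processing in score order is our assumption about how pairwise conflicts are resolved.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


def nonredundant(subnetworks, max_overlap=0.25):
    """Keep the higher-scoring member of every pair overlapping by more
    than `max_overlap`.

    subnetworks: iterable of (score, frozenset_of_genes) pairs.
    """
    kept = []
    ranked = sorted(subnetworks, key=lambda item: item[0], reverse=True)
    for score, genes in ranked:
        # A subnetwork survives only if it does not overlap too strongly
        # with any higher-scoring subnetwork that has already been kept.
        if all(jaccard(genes, other) <= max_overlap for _, other in kept):
            kept.append((score, genes))
    return kept


# Hypothetical example: the first two subnetworks share 2 of 4 genes (J = 0.5).
candidates = [
    (0.8, frozenset({"a", "b", "d"})),
    (0.9, frozenset({"a", "b", "c"})),
    (0.7, frozenset({"e", "f"})),
]
features = nonredundant(candidates)
```

In this example the 0.8-scoring subnetwork is dropped because it overlaps the 0.9-scoring one by more than 25%, while the disjoint third subnetwork is retained.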

Stopping criteria
For modular subnetworks grown iteratively out from a seed node, the search was halted when there were no nodes that would increase both subnetwork modularity and class relevance. For regular subnetworks, the search was halted when there were no nodes that would increase the subnetwork score (class relevance) past some threshold r (r = 0.01, 0.02, 0.05, 0.1 and 0.2 for regular subnetworks r1-r5), or when there were no remaining local nodes (i.e., nodes at most two edges away from the seed).

Identifying significant subnetworks
We calculate subnetwork significance using both self-contained and competitive gene set tests [8,48]. Our competitive test is identical to that used in [11], and our self-contained test is more stringent: we use the method suggested in [18].
For the self-contained test, we randomized the assignment of ages to worms (samples), and then repeated the search for subnetworks starting from each network node. The subnetwork score of the original subnetwork determined from the true data was then ranked against the corresponding subnetworks determined from the artificial data that seeded from the same gene. This process was repeated 1000 times.
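The self-contained test can be sketched as a standard label-permutation procedure. In the sketch below, `score_fn` is a hypothetical callable standing in for "regrow the subnetwork from the same seed using these age labels and return its score"; the add-one correction in the p-value is a common convention that we assume here, not a detail given in the text.

```python
import random


def permutation_p_value(score_fn, ages, n_perm=1000, seed=0):
    """Rank the observed score against scores obtained after shuffling ages.

    score_fn(labels) -> float is assumed to rerun the subnetwork search
    from a fixed seed gene under the given assignment of ages to samples.
    """
    rng = random.Random(seed)
    observed = score_fn(list(ages))
    shuffled = list(ages)
    at_least_as_high = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between samples and ages
        if score_fn(shuffled) >= observed:
            at_least_as_high += 1
    # Add-one correction (assumption): the p-value can never be exactly zero.
    return (at_least_as_high + 1) / (n_perm + 1)
```

With 1000 permutations and this correction, the smallest attainable p-value is 1/1001, so a subnetwork passes a p < 0.001 threshold only if no permuted score matches or exceeds the observed one.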
For the competitive test, we generated 100 artificial interactomes by randomizing the assignment of gene names to nodes on the functional interaction network and recalculating the weight for each network edge based on the new genes that flanked it (only for modular networks; regular networks do not use edge information). We repeated the search for significant subnetworks on each artificial interactome. Scores for subnetworks determined from the true interactome were ranked against the scores of all subnetworks generated from the artificial interactomes.
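Generating one artificial interactome amounts to permuting gene names over the nodes while leaving the wiring untouched. The sketch below shows that step only, with a hypothetical function name and an adjacency-dictionary representation that we assume for illustration; edge weights would then be recomputed from the expression profiles of the genes now sitting at each edge.

```python
import random


def relabel_network(graph, rng):
    """Return a copy of `graph` with gene names randomly reassigned to nodes.

    graph maps each gene to the set of its neighbours. The topology (degree
    of every node, number of edges) is preserved; only the labels move.
    """
    genes = sorted(graph)
    shuffled = genes[:]
    rng.shuffle(shuffled)
    new_name = dict(zip(genes, shuffled))
    return {
        new_name[gene]: {new_name[nbr] for nbr in neighbours}
        for gene, neighbours in graph.items()
    }


# Hypothetical three-gene chain a - b - c.
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
artificial = [relabel_network(toy, random.Random(i)) for i in range(100)]
```

Because each gene keeps its expression profile but lands on a random node, any association between expression and network position is destroyed, which is what the competitive test uses as its null model.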
Subnetworks were considered significant if they achieved p < 0.001 on the local self-contained test and p < 0.05 on the global competitive test.

Machine learning comparisons
We used ε-insensitive support vector regression (SVR) algorithms [49] to learn worm age as a function of the activity of regular subnetworks, modular subnetworks or differentially expressed genes. All SVRs were trained using a linear kernel and the default parameters provided by LIBSVM [42]. For SVR features made up of subnetworks, subnetwork activity for a sample was calculated as the mean activity of all the genes in the subnetwork.

GO and KEGG enrichment analyses
The union of all genes present in some significant modular subnetwork (β = 250; derived using data from [2]) was compared with the background network, i.e. the set of 12808 genes present in the largest connected component of the network formed from all WormNet ORFs represented by some probeset in both worm aging microarray studies [2,21].

Figure 4 (continued) - The activities of genes or subnetworks (subnetwork activity is calculated as the mean activity of its member genes) are used by Support Vector Regression algorithms to predict age on the basis of gene expression. Performance is typically measured using both the mean-squared error of the difference between true and predicted ages, and the squared correlation coefficient between true and predicted ages.

Figure 5 (continued) - Modular subnetworks are shown in green, regular subnetworks in blue, and gene sets in gray. This figure shows the best-performing type of modular subnetworks, regular subnetworks, and genes at each feature level. For modular subnetworks, this is type m3 at every feature level; for regular subnetworks, type r3 at 5 and 10 features, r2 at 25 features, and r4 at 50 features; for genes, g2 at all feature levels.
Support Vector Regression algorithms using 5, 10, 25, or 50 features were trained to predict age on the data from Golden et al. [2] and tested on Budovskaya et al. [21].
For each size of feature set, 1000 different Support Vector Regression learners were computed; curves show their median performance (quantified using the squared correlation coefficient between true and predicted age in the bottom panel), and error bars indicate the 95% confidence intervals for the medians (calculated using a bootstrap estimate).

Figure 6(b) (continued) - The performance of a typical Support Vector Regression learner built using 50 modular subnetworks as features; true worm age is shown on the x-axis, and predicted age on the y-axis.
Modular subnetworks grown using β = 250 can predict the age of individual wild-type worms in the original dataset (104 worm microarrays over 7 ages) with low mean-squared error and a high squared correlation coefficient. Again, we used support vector regression algorithms (SVRs) for all learning tasks.

Figure 1 - High-scoring subnetworks fulfil two criteria: they are modular and related to aging. (a) High-scoring subnetworks have high modularity, i.e., they are highly interconnected, and sparsely connected to the rest of the network. (b) High-scoring subnetworks have high class relevance, i.e., they have activity levels that increase or decrease as a function of worm age.

Figure 2 - Identifying modular subnetworks. (a) Start with the largest connected component of the functional interaction network representing all genes whose expression has been measured. (b) Weight every edge of the network with the absolute value of the Spearman correlation between the two genes flanking it. (c) Identify age-related subnetworks by growing subnetworks iteratively out from seed nodes.

Figure 3 - Modular subnetworks are highly conserved across studies. Modular subnetworks m1-m5 are shown in green and regular subnetworks r1-r5 in blue. Bar height shows the percentage overlap across studies for seed genes of significant modular and regular subnetworks derived from the data in Golden et al. and Budovskaya et al.; this is calculated as the size of the intersection of the sets of significant seed genes from both studies, divided by the size of their union. P-values above each bar show the significance of the overlap calculated using the hypergeometric test.

Figure 4 - Predicting worm age using machine learning.

Figure 5 - Subnetworks and genes predict the age of fer-15 worms.

Figure 6 - Modular subnetwork biomarkers of aging predict the age of individual wild-type worms. (a) Machine learners built from modular subnetworks or genes, predicting worm age in a cross-validation task on the data from Golden et al. using 5, 10, 25, or 50 features. For each size of feature set, 1000 different Support Vector Regression learners were computed; curves show their median performance (quantified using mean-squared error in the top panel, and the squared correlation coefficient between true and predicted age in the bottom panel), and error bars indicate the 95% confidence intervals for the medians (calculated using a bootstrap estimate). (b)

Table 3 -Assigning putative functions to longevity genes
The first column lists longevity genes, column 2 shows the most highly enriched Gene Ontology biological process in subnetworks containing that gene, and the p-value of the enrichment (hypergeometric test with FDR correction) is shown in column 3. Genes with no previously known manual GO BP annotation are indicated with an asterisk.