Observing metabolic functions at the genome scale

A modular approach is presented that allows the observation of the transcriptional activity of metabolic functions at the genome scale.


Background
The increasing availability of high-throughput data has allowed more and more analyses to be performed at the cell scale. After completion of genome sequencing for many spe-cies, the focus is shifting towards getting a global understanding of cell physiology. This task requires the integration of heterogeneous data at different scales, including genomic, transcriptomic, proteomic, and metabolomic data.
At the level of metabolism, good knowledge of the structure of metabolic networks has now been achieved for several species. A number of genome-wide models of metabolism have been reconstructed [1][2][3][4], but these structural models provide only a static representation of an organism's metabolism; the structure of a metabolic network is static for a given species, and only changes at a slow pace across species through evolution [5]. However, the usage of particular metabolic reactions by a given cell is highly dynamic. It changes very rapidly in time with modifications in the environment, in the cell cycle, or with stochastic fluctuations. Static representations, therefore, need to be extended toward truly dynamic descriptions.
Metabolic networks are also highly complex, formed by several hundreds of densely interconnected chemical reactions. To characterize such complex systems at the genome scale, it is necessary to identify smaller building blocks. Cellular networks have been shown to have a high degree of modularity, and are composed of groups of interacting elements and molecules that carry out specific biological functions [6]. In recent years, several methods have been proposed to decompose complex biological networks into subnetworks and to identify basic interaction modules [5,[7][8][9]. Although relevant progress has been achieved in detecting motifs and modules in transcriptional regulatory and protein-protein interaction networks [10][11][12][13][14][15][16], the building blocks of metabolic pathways still remain largely undiscovered. Evidence for the existence of modularity in metabolic pathways was recently proposed by Ravasz et al. [17], who showed that the high clustering degree observed in metabolic networks may imply a hierarchical modularity, in which modules are made up of smaller and denser modules in a fractal manner.
A complementary approach is provided by the concept of an 'elementary mode'. Elementary modes, and the very similar concept of 'extreme pathways', are minimal sets of reactions that can operate in steady state in a metabolic network [18][19][20]. They have already proven useful for studying many aspects of metabolism, including the prediction of functional properties of metabolic pathways, the measurement of robustness and flexibility, inferring the viability of mutants, the assessment of gene regulatory features, and so on [21]. Recently, it has been shown that they could even provide a basis for describing and understanding the properties of signaling and transcriptional regulatory networks [22,23]. All these applications, however, consider elementary modes as purely 'structural units'. Although the biological significance of elementary modes has already been mentioned [24], the use of elementary modes as true elementary 'functional units' of cellular metabolism has not been attempted so far. A few studies [25,26] have combined metabolic and transcriptomic data in order to find out whether co-expressed genes are part of a given metabolic pathway, but most of these approaches used complete metabolic pathways as metabolic units.
Here, we address the problem of identifying metabolic units in a genome-scale model of the yeast Saccharomyces cerevisiae by relying on elementary modes. Our study is based on the integration of dynamic gene expression data in various stress conditions into a genome-scale model of metabolism, modularly structured in elementary modes. We used a bioinformatics tool called BlastSets [27] to combine these two types of data in order to answer the following question: do enzymes that are involved in the same elementary mode have their corresponding genes co-expressed in particular conditions? We were able to identify active elementary modes, that is, elementary modes whose enzymes are induced or repressed in response to different environmental stresses; these elementary modes can thus be seen as functional units of the metabolic stress response.

Genome-wide computation of elementary modes
The computation of elementary modes in genome-wide models of metabolism is seriously hampered by the problem of combinatorial explosion. Even though the number of elementary modes is usually smaller in a real system than its theoretical limit and can be further reduced by taking into account various environmental or regulatory constraints, it is of no practical use to handle systems of thousands of elementary modes because such systems become impossible to interpret [28,29]. One possible approach to deal with this problem consists of decomposing a genome-scale metabolic network into smaller subunits. This kind of decomposition has already been proposed, but was based on network topology [30]; it consisted of finding the optimal decomposition that minimized the number of elementary modes. However, there is no guarantee that such subunits represent functionally coherent and biologically interpretable pathways.
We have developed an alternative approach for computing elementary modes at the genome scale. In the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, metabolic pathways are represented as a series of maps, where each map covers a precise biological function [31]. These maps are sufficiently small for the number of elementary modes inside each of them to remain in the hundreds (Table 1). Furthermore, because they have been manually drawn and annotated based on biological information, these units have a clear biological meaning and are easy to interpret. We thus considered each pathway map of the KEGG database as one subnetwork. We then computed the full set of elementary modes inside each of them using a classical algorithm [20] (Additional data file 1).
Because of their combinatorial nature, a number of different elementary modes usually share common reactions along their path. It often occurs that several elementary modes are almost identical except for a few branches at their extremities. Similarly, a given reaction can belong to a large number of different elementary modes. Figure 1a illustrates this property by showing some of the elementary modes between fumarate and 2-oxoglutarate in the citrate cycle (note that only 7 elementary modes have been drawn out of 99 calculated for the entire citrate cycle map). This combinatorial property, which is a major problem in large networks, is, on the contrary, welcome in our study: as our aim is to search for the most active route in a system, it guarantees that the full set of topologically possible routes will be considered in the search.
The use of KEGG maps for defining subnetworks aims at having entities that are as much as possible biologically coherent. The start and end points of elementary modes are compounds located at the boundaries between subnetworks. One drawback of this approach is that active metabolic routes that are spread over different KEGG maps may not be easily identified. To overcome this problem, we constructed two different collections of elementary modes, EM1 and EM2. EM1 contains the full set of single elementary modes computed with each KEGG pathway map being used as a subnetwork; each elementary mode from EM1 is entirely included in a single pathway map. EM2 was formed by combining all pairs of elementary modes from EM1 that are connected through a common boundary compound; elementary modes from EM2 thus spread over two different pathway maps (Figure 1b). The use of EM2 reduces the dependence of results on subnetwork boundaries since active elementary modes spread over different KEGG maps can now be identified. More details are provided in the 'Genome-wide computation of elementary modes' section in Materials and methods, and the full description of single elementary modes is available in Additional data file 1.

Elementary modes represent true functional units of metabolism
Functional activity is more significant in elementary modes than in entire pathways To elucidate whether elementary modes can be considered as true functional biological units, the stress response of yeast was investigated in a large number of different conditions. Towards this goal, we used microarray data obtained from several experimental analyses [32][33][34] (see the 'Expression data' section in Materials and methods) and a bioinformatics tool called BlastSets [27]. BlastSets enabled us to find similarities between the composition of two sets of genes or proteins derived from two different types of information (here, metabolic pathways and expression data). The elementary modes EM1 and EM2 were stored independently as two BlastSets collections. Entire KEGG pathways were also stored as a BlastSets collection, to find out whether stress responses involve entire pathways, as defined in KEGG, or only parts of these pathways, as represented by elementary modes. In many stress conditions, induced/repressed elementary modes were found with higher P values than whole pathways ( Table 2).
The numbers of detected induced/repressed elementary modes for each stress condition are shown in Table 3, as well as the number of different KEGG pathways these elementary modes belong to. The numbers obtained with EM1 and EM2 are relatively well correlated but there is no absolute relationship between them; in most cases, the number of induced/ repressed elementary modes is increased when compared to EM2, but a few of them show higher numbers with EM1. The same observation can be made about the number of KEGG pathways to which these elementary modes belong. In a majority of cases, elementary modes detected with EM1 are concentrated in a relatively small number of pathways, and EM2 increases this number by adding modes from adjacent pathways. But in a few cases, for example Thiuram, the number of pathways detected with EM2 is smaller than with EM1, indicating that these elementary modes tend to be isolated and poorly connected to adjacent pathways.
Examples of elementary modes induced in particular stress conditions are shown in Figure 2, including an induced elementary mode in the citrate cycle during stationary phase, and another induced one in sulfur metabolism in response to tetrachloro-isophthalonitrile exposure. The sets of induced enzymes detected by BlastSets are indeed highly connected. Fewer elementary modes could be identified from the sets of repressed enzymes and they are usually less connected, meaning that repressed enzymes are more dispersed in the mode. This fact has already been mentioned by Wei et al. [35] for the genetic model plant Arabidopsis thaliana, who observed that induced genes in the same metabolic pathway tend to be close and well connected to each other, while repressed genes are more distant.

Induced/repressed elementary modes are statistically significant
BlastSets applies a stringent threshold on P values (P value must be lower than 6.0 × 10 -5 for EM1 and 3.4 × 10 -6 for EM2; see 'Description of BlastSets' section in Materials and methods), which should already guarantee that identified elementary modes are statistically significant. Nevertheless, in order to further assess the reliability of our results, we created random gene expression values by random permutation of gene expression values in several stress responses. These random sets of induced/repressed genes were compared to elementary modes in BlastSets, in the same way as for stressinduced/repressed genes. No active elementary mode was identified using these random sets. The procedure was repeated for several conditions, always with the same result. This finding confirms that elementary modes found to be active in specific environmental stress conditions have a high statistical significance.

Pairing elementary modes to reconstruct induced/repressed routes
To identify complete metabolic routes that are spread over several KEGG pathway maps, we constructed the EM2 collection containing elementary modes grouped in pairs. Two elementary modes are grouped as a set in EM2 if they share a The first and second columns give the identifier and the name of each KEGG metabolic pathway. For each of them, the number of elementary modes computed is indicated in the third column and the number of elementary modes entered in the BlastSets database in the fourth column. In most cases, there is a difference between these two numbers because BlastSets eliminates redundant elementary modes and the ones involving only one enzyme.  The composition of the EM1 collection (left) and how elementary modes were merged to build the EM2 collection (right). Three independent sets from EM1 can be merged into two sets in EM2 if they share a common boundary compound. common boundary compound. These compounds act as bridges between individual pathway maps, enabling more extended induced/repressed routes to be identified by this approach.
In each stress situation, we could then infer a 'backbone' of induced/repressed metabolic routes. Backbones were constructed by selecting the pairs of elementary modes with the lowest P values and connecting them to each other, thanks to results from the EM2 collection (see 'Analysis of BlastSets results' section in Materials and methods). These backbones can be viewed as the main modules characterizing metabolic activity in terms of expression data in a given condition. They are provided for each individual condition in Additional data file 2.

Specialized and multitask elementary modes
To assess how the activity of elementary modes is distributed in response to a set of diverse environmental stresses, we computed the probability distribution P(k) to find a given induced/repressed elementary mode in k stress conditions ( Figure 3a). This distribution reveals a highly heterogeneous behavior: on one hand, a relatively low number of 'multitask' elementary modes are transcriptionally active in a large number of different conditions, while on the other hand, many 'specialized' elementary modes are active in a small number of conditions (less than three). About 77% of detected elementary modes appear to be conducting specialized tasks while the remaining 23% are involved in the more general stress response. This observed metabolic organization is far from a random distribution, where each induced/repressed elementary mode would have the same chance to be active in the vicinity of the average value. The deviation from a random distribution suggests that elementary modes involved in the stress response are governed by a more complex organization [36], that is, that they are organized into complex modules across the metabolic network.

Map of elementary mode activities
It is possible to reveal the various patterns of stress responses by drawing the 'activity map' of elementary modes. In Figure   3b, each line represents an elementary mode and each column a stress condition; induced elementary modes are shown in red and repressed modes in green in this representation, which is deliberately chosen to look similar to a microarray. Indeed, in the same way a microarray represents a map of the transcriptional activity of individual genes, we are here able to construct a map of genome-scale elementary mode activities, revealing the transcriptional activity of entire metabolic processes. It is particularly clear on this map that most of the identified elementary modes are either only induced or only repressed. While the three repressed patterns are very similar, induced patterns are more diverse and very few elementary modes are induced over all conditions, confirming the trend revealed by the distribution in Figure 3a.

Two main classes of stress responses
Our approach is able to provide new insights about metabolic activity in terms of expression data in particular conditions. We analyzed the raw expression data obtained for each stress condition in order to see which stresses lead to similar responses; the clustering tree of stress conditions based on raw expression data is provided as Additional data file 3. Among the 31 different conditions we studied, 12 had a too weak transcriptional response for any induced or repressed elementary mode to be detected. We noticed that, among the remaining 19 conditions that produced a sufficiently strong response, stresses could be divided into two main classes, which we hence denote as 'toxic' and 'non-toxic'. The toxic stress class mostly includes exposure of cells to toxic chemicals and metals. The non-toxic class, on the contrary, mostly includes other types of stresses, such as temperature changes, osmotic shocks, nutrient starvation, and so on. The list of conditions assigned to each class is provided in Table 4.
The metabolic backbones inside each class show recurrent similarities, which allowed us to construct a common backbone for each class (Figure 4). The two classes show a clearly distinct global response and few elementary modes are induced in both backbones, with the exception of the citrate cycle and nucleotide sugar metabolism. In addition, we represented both classes by networks where each node corresponds to a metabolic pathway and each edge denotes that at least one pair of elementary modes spanning both pathways Heat shock [32], induced sce00500 (starch and sucrose metabolism) 3.8e-4 sce00500.em13 (part of starch and sucrose metabolism) 4.2e-6 Results given by BlastSets for particular conditions. The second column gives the most significant full KEGG pathway found to be induced/repressed (that is, the one with the lowest P value, given in the third column). The fourth column gives the most significant elementary mode from EM1 found to be induced/repressed. These results are sorted from the highest to the lowest difference between the two P values.
is present in a stress response (see 'Construction of toxic and non-toxic networks' section in Materials and methods). The toxic response network is shown in Figure 5a and exhibits two components. The inner component is composed of a group of strongly connected pathways centered on sulfur metabolism, pyruvate metabolism and lysine biosynthesis metabolism. These pathways thus have a strong tendency to be activated simultaneously. They constitute the core of the toxic stress response and cover most parts of the toxic backbone described previously. The external component, in contrast, is composed of a sparse network with thinner connections. In the non-toxic network this bi-component nature is less clear, but it is still possible to identify a more strongly connected central component containing starch and sucrose metabolism, the pentose phosphate pathway, glycolysis, and arginine and proline metabolism (Figure 5b).

Insights about specific stress conditions
In some cases, the observed transcriptional metabolic response confirms earlier findings. Vido et al. [37] reported that cadmium exposure increases the synthesis of cysteine and perhaps of glutathione, which is essential for cellular detoxification. The synthesis of these two compounds is possible through the activation of the sulfur amino acid pathway. We observe that, among the three elementary modes activated in response to cadmium exposure, two have cysteine as their final product, and among these two, one elementary mode is a part of cysteine metabolism and another is a part of sulfur metabolism. Cysteine is also one of the compounds produced in the general backbone of the response to toxic stresses (Figure 4a).
Amino acid starvation is known to activate the transcription factor Gcn4p, which induces genes involved in amino acid biosynthetic pathways, except the cysteine pathway [38], although the genes involved in the biosynthesis of cysteine precursors (homocysteine and serine) are induced. This is exactly what we observe in response to amino acid starvation: several elementary modes from amino acid biosynthetic pathways are activated but none from the cysteine pathway, even if some elementary modes from the cysteine pathway are linked to modes activated during amino acid starvation.
Genes induced in stationary-phase cultures of yeast are associated with mitochondrial functions, that is, aerobic respiration and the citrate cycle [39]. ATP synthesis is thus very    important for yeast in the stationary phase. In our results, the elementary modes activated during the stationary phase are part of metabolic pathways linked to aerobic respiration, including glycolysis, the citrate cycle, pyruvate metabolism and oxidative phosphorylation.
Trehalose and glycerol are produced in large amounts by cells in stress situations [40]. Schade et al. [40] have shown that there is an overlap between the late cold response and the environmental stress response. This response corresponds to the production of glycerol and trehalose. This is what we observed in the general non-toxic backbone response ( Figure  4b): glycerol is produced just a few reactions after glycerone Transcriptional activity of elementary modes   Table 4 Composition of toxic and non-toxic stress classes Toxic class Non-toxic class Not assigned phosphate, and trehalose is present one step before D-glucose in the starch and sucrose metabolism KEGG map (the only reason why it cannot appear as an end product in our study is that it is not a boundary compound in KEGG maps). These examples, confirming previously observed results, enable us to be confident in the identification of metabolic processes found to be induced/repressed in response to other stress conditions.

Discussion
There have been growing developments in recent years towards a more systems-level approach for understanding living organisms. On one side, microarray technologies have generalized the study of the transcriptome of biological cells in various conditions, and on the other side, numerous efforts have been undertaken to construct and describe the properties of metabolic networks at the genome scale. It is timely, therefore, to integrate both efforts and move towards a genome-scale analysis of cell metabolism.
At the same time, it is believed that a better understanding of the metabolome will be an important step towards improving the efficiency of the drug discovery process [41]. Instead of concentrating on the 'genomic universe', that is, the levels of gene regulation and transcription, our approach shifts the focus to the 'biochemical universe', that is, the small molecules or metabolites that actually perform biological functions and allow organisms to live and thrive. This shift is symbolized by the microarray-style representation of Figure  3b, which instead of showing the transcriptomic activities of individual genes, displays the transcriptomic activity of entire metabolic functions, represented by elementary modes, at the genome-scale. Although this is still a long way from an accurate and quantitative representation of the actual metabolic activity of a whole cell, which would require metabolic flux measurements, we believe that this shift opens a new perspective with a wide range of potential applications.
A major challenge addressed in this work consisted of embedding a suitable modularity into the highly complex and interconnected structure of metabolic networks. Our approach for computing elementary modes at the genome scale using KEGG pathway maps presents a number of advantages. These maps provide a decomposition of the metabolic network into well-defined subnetworks, which are biologically coherent and easy to interpret. Each map is sufficiently small for the number of elementary modes to remain in the hundreds, thus avoiding the necessity of having to cope with the problem of combinatorial explosion of elementary modes in large systems. Furthermore, these maps provide a manually curated representation of metabolic pathways where most secondary metabolites have been removed, thus avoiding the need to use complex procedures to identify principal metabolic routes and to eliminate invalid metabolic connections.
Microarray experiments are subject to a number of factors and we observed discrepancies in data obtained by different authors in similar conditions (peroxide treatment and heat shock experiments are available from both Gasch et al. [33] and Causton et al. [32]) The question of reproducibility of microarray experiments has been recurrent, but large-scale cross-platform experiments have shown that microarray data are indeed reliable and reproducible when adequate care is taken in experimental design and data treatment [42]. Differences may indicate that the transcription of genes involved in the metabolic response to stress is finely regulated and can fluctuate depending on a large number of factors.
Challenges also remain to obtain a more accurate description of the transcriptional activity of elementary modes in a cell.
Our approach can be seen as 'discrete', since an elementary mode can only be assigned to three possible categories, that is, induced, repressed, or inactive. This division into three categories relies on a threshold on expression fold-change values, but enhanced statistical approaches could be researched to obtain a more subtle classification and avoid the need to set a threshold. Furthermore, with BlastSets, the localization of induced/repressed genes 'inside' an elementary mode is not taken into account, although this information could be relevant. For example, a repressed gene belonging to a group of genes coding for the same enzyme may have no influence on the activity of the elementary mode as a whole, while repression of a gene that is the only one to encode a particular enzyme would be important. Finally, the computation of elementary modes is based on a steady-state assumption and it remains to be seen to what extent these concepts can be extended to dynamic activity.

Genome-wide computation of elementary modes
Combinatorial explosion prevents the computation of elementary modes in large networks. For example, in a network of about 110 reactions the number of elementary modes was shown to be higher than two million [21]. Furthermore, even if more efficient algorithms were found, it would not be particularly useful to compute all the elementary modes in a genome-wide model of metabolism because the resulting set would be extremely difficult to interpret. We therefore opted for an alternative modular approach for computing elementary modes at the genome scale. Pathway maps of the KEGG database constitute a good basis for this task, as each of them represents a coherent and well-defined biological function and is sufficiently small for the number of elementary modes to remain in the hundreds.
We used the KEGG XML files for S. cerevisiae as a source for the metabolic model [43]. These files have the advantage of having been manually curated and they contain the same information as the graphical maps displayed by the KEGG database. They thus have been cleaned from invalid meta-  bolic connections due to very common compounds (ADP, ATP, and so on), which otherwise create artificial links between metabolic compounds that do not correspond to biologically valid metabolic routes.
A stoichiometric matrix was constructed for each pathway based on its XML description. A point of major importance to the computation of elementary modes is the definition of 'external metabolites'. They act as start and end points of elementary modes, and in our hierarchical approach they additionally enable elementary modes from different pathway maps to be connected to each other. We adopted the following rules for defining external metabolites: one, a metabolite located at the interface between two or more pathway maps is considered external to all of them; two, a metabolite that can only be either produced or consumed is considered external; and three, unbalanced ubiquitous metabolites are considered external. Rule one creates the vast majority of entry/exit points to elementary modes and allows connections between pathway maps. Rule two prevents the existence of 'inactive' metabolic branches, that is, branches of a metabolic network that cannot participate in any elementary mode. This happens, for example, when a branch ends up in a dead end: as steady state conditions are assumed when elementary modes are computed, no flux can be present in a dead-end branch since this would lead to accumulation of the compound at its extremity. Rule three was introduced to prevent particular branches of the metabolic network from collapsing due to inappropriate balancing. For example, CO 2 appears on the map of the citrate cycle and must be considered external for the cycle to be able to operate, otherwise this route would contain a dead end and become inactive for the reason stated above. The complete list of metabolites covered by rule three comprises H 2 O, O 2 , P, CoA, CO 2 , NH 3 , UDP, H 2 , and reduced and oxidized thioredoxin.
Once stoichiometric matrices had been constructed, elementary modes were computed using a classical algorithm [20]. The complete list of elementary modes for S. cerevisiae is provided as Additional data file 1, and was used to create the EM1 and EM2 BlastSets collections.

Expression data Data sources
We have chosen experiments analyzing the gene expression responses of the yeast S. cerevisiae to various environmental stresses. Three sets of microarray experiments have been selected for our study. Causton et al. [32] described the transcriptional response to environmental changes using genome-wide expression experiments; data are available on the Young lab website [44]. Gasch et al. [33] analyzed gene expression of yeast cells during the adaptation to stressful environments in order to identify the main patterns of response in these different conditions; data were downloaded from the Stanford MicroArray Database website [45]. Iwahashi et al. [34] studied transcriptional responses of yeast to physical and chemical stresses using microarray; data are available from the Yeast Environmental Stress database [34].
These three datasets enabled us to study a total of 31 stress conditions. Some of these stresses involved environmental changes or nutrient depletion, while others involved exposure to toxic compounds such as pesticides or fungicides. The latter include: ash, which refers to exposure to burned ash from an industrial incinerator; maneb, which is a fungicide used in the control of several diseases of fruit, vegetable, field crops and ornamentals; pentachlorophenol (PCP), which is an effective fungicide, herbicide and algicide used as a wood preservative; tetrachloro-isophthalonitrile (TPN), which is a fungicide used to prevent biofouling on ships and in agriculture; thiuram, which is a compound used as fungicide to prevent crop damage and to protect harvested crops; and zineb, which was rated as a pesticide of low toxicity and may be a weak mutagen.

Data processing
We plotted the distributions of the natural logarithm of foldchange values for the Causton, Gasch and Iwahashi datasets. For each of the three sets of data, the standard deviation was determined. A threshold was defined by multiplying the standard deviation by a constant, and this threshold was used to determine which genes were considered as significantly induced or repressed in each condition. Genes whose fold change was higher than the threshold were considered induced; genes whose fold change was lower than 1 divided by the threshold were considered repressed. For each condition, a set of induced genes and a set of repressed genes were constructed. Table 5 indicates the number of genes present in each set.

Creation of random data sets
For three particular conditions (one from each dataset), we re-assigned gene expression values randomly to all genes of the experiment. We then processed these random expression data using the same procedures as described above. The resulting sets of random induced/repressed genes were compared to elementary modes using BlastSets in the same way as real expression data.

Data integration and analysis Description of BlastSets
BlastSets is a bioinformatics tool that enables the integration of various biological data. This tool uses a standard representation for all types of data: data are structured in collections of sets of genes or proteins. Each collection corresponds to a biological source of information, and sets are composed of genes that share a similar property or value (close genes on a chromosome, co-expressed genes, proteins belonging to the same complex, proteins involved in the same metabolic pathway, and so on). The sets stored in the BlastSets database can  be compared to each other or to submitted custom sets. To evaluate the similarity between two sets, their composition in terms of genes/proteins is compared, and the hypergeometric distribution is used to decide if the number of genes in common between the two compared sets is statistically significant (P value). As an example, one can check if the genes found co-expressed in an experiment correspond to a set containing proteins involved in the same pathway.
A P value is considered significant by BlastSets if it is less than or equal to a certain threshold. Multiple comparisons are performed as a set is compared to a collection of sets. The P value significance threshold is thus adjusted to the considered target sets using a Bonferroni correction. This takes into account the number of comparisons conducted, which depends on the number of sets in the collection. The resulting threshold is 6.0 × 10 -5 when a set is compared to EM1 and 3.4 × 10 -6 when compared to EM2. All hits with higher P values were automatically rejected. Additional details about BlastSets can be found in [27].

Integration of elementary modes in BlastSets
We used BlastSets to evaluate the biological relevance of elementary modes by comparing them to the sets of induced or repressed genes described above. We created two different collections of sets of elementary modes named KEGG_EM_1 (EM1) and KEGG_EM_2 (EM2) in BlastSets. EM1 is a collection of single elementary modes, that is, enzymes involved in a given elementary mode are gathered in a set labeled by the name of the mode. EM2 is a collection of pairs of elementary modes: all enzymes involved in two elementary modes that are connected through a common external link form one set in EM2. These two collections of sets of elementary modes are stored in the BlastSets database, and can be queried against user-submitted data via the BlastSets website [46].

Analysis of BlastSets results
Sets of induced and repressed genes in various stress conditions were compared to elementary modes using BlastSets, and lists of elementary modes found to be similar to the submitted sets of induced/repressed genes were obtained. A Perl script was developed to analyze these results and, thus, make it possible to reconstruct the chain of elementary modes that have been activated or repressed in response to each stress condition.
We retrieved the elementary modes (single or pair) that had the highest similarity with the set of induced/repressed genes, called the 'best hit'. First, if the best hit was a single elementary mode, we browsed subsequent hits until we found a pair of elementary modes containing this best hit. Second, once this 'best elementary mode pair' had been found, the rest of the list was browsed in order to find further pairs of elementary modes that were connected to the best pair, that is, pairs of elementary modes having one mode in common with the best elementary mode pair. Third, we could display a chain of pairs of elementary modes that defines the backbone of the metabolic response. If the best hit was a pair of elementary modes, only the second and third steps were performed. Among the elementary modes that could be added to the backbone, we removed all those that were composed of less than three enzymes to ensure that they were significant enough and to avoid the inclusion of short modes that are not specific to a single pathway.

Construction of toxic and non-toxic networks
Using the files containing BlastSets results with EM2, we constructed a matrix representing the usage of elementary modes in response to the different stresses, each row corresponding to an elementary mode and each column corresponding to a stress condition. In each element of the matrix, 1 was entered if the elementary mode was identified in response to the stress, 0 if it was not. A program was developed to compute, for each pair of pathways, the number of conditions where at least one pair of elementary modes spanning both pathways was found to be induced.
In Figure 5, each pathway was represented by a node whose radius was set proportional to the natural logarithm of the number of elementary modes contained in that pathway (for pathways with only one mode, the value was set to 0.5). This radius does not depend on the stress response and is only aimed at enhancing large pathways. Two pathways were connected by an edge if the number of induced pairs of elementary modes spanning both of them was non-zero. We weighted the connections by setting the thickness of edges proportional to the number of stress conditions in which such pairs were found. The weight thus does not depend on the number of active elementary modes in both pathways but on the number of conditions where both pathways contain simultaneously activated elementary modes. For a clearer representation, all weights were reduced by one unit, so that edges of weight 1 are not visible and the smallest visible edges are those of weight 2.

Additional data files
The following additional data are available with the online version of this paper. Additional data file is a list of elementary modes for Saccharomyces cerevisiae. Additional data file 2 is a figure showing induced and repressed metabolic backbones for all stress conditions. Additional data file 3 is a figure of a clustering tree of stress conditions.