Network-based approaches for linking metabolism with environment
© BioMed Central Ltd 2008
Published: 24 November 2008
Progress in the reconstruction of genome-wide metabolic maps has led to the development of network-based computational approaches for linking an organism with its biochemical habitat.
The most commonly used network representations are 'metabolite-centric'. They consider metabolites as the nodes of the graph and two metabolites are linked if one can be converted into the other by an enzymatic reaction (Figure 1b, left). An alternative network representation is 'enzyme-centric'. It considers the enzymes as nodes and links enzymes that catalyze successive reactions (Figure 1b, right). Although several studies have provided insights into the structure and evolution of a metabolic network, very few have addressed the influence of environment on metabolic network structure in species from diverse environmental conditions. The availability of many completely sequenced genomes means that metabolic-network analysis can now be extended from a few model organisms to species from different branches of the tree of life and living in very different environments. This should enable the elucidation of general principles underlying metabolic networks.
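The two representations can be illustrated with a minimal sketch. The reaction list, the enzyme names (`E1` to `E3`) and the metabolite names below are hypothetical and serve only to show how the same set of reactions yields a metabolite-centric or an enzyme-centric graph; this is not code from either study.

```python
from itertools import product

# Hypothetical toy reactions: enzyme -> (substrates, products).
REACTIONS = {
    "E1": ({"A"}, {"B"}),
    "E2": ({"B"}, {"C", "D"}),
    "E3": ({"D"}, {"E"}),
}

def metabolite_graph(reactions):
    """Metabolite-centric view: a directed edge substrate -> product
    for every substrate/product pair of every reaction."""
    edges = set()
    for substrates, products in reactions.values():
        edges.update(product(substrates, products))
    return edges

def enzyme_graph(reactions):
    """Enzyme-centric view: a directed edge E_i -> E_j whenever a
    product of E_i is a substrate of E_j (successive reactions)."""
    edges = set()
    for e1, (_, products) in reactions.items():
        for e2, (substrates, _) in reactions.items():
            if e1 != e2 and products & substrates:
                edges.add((e1, e2))
    return edges

if __name__ == "__main__":
    print(sorted(metabolite_graph(REACTIONS)))
    print(sorted(enzyme_graph(REACTIONS)))
```

In this toy case the metabolite-centric graph contains four edges and the enzyme-centric graph contains two, which shows that the two views of the same reactions can differ in size and topology.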
Two recent studies, published in the Proceedings of the National Academy of Sciences by Eytan Ruppin and colleagues (Kreimer et al. and Borenstein et al.), provide important insights into links between the environment of an organism and the structure of its metabolic network. Using data from a large number of bacterial metabolic networks, Kreimer et al. address the question of how the topologies of the metabolic networks of different species reflect both genome size and the diversity of environmental conditions that the species encounter. Borenstein et al. set out to identify the 'seed set' of each species - the set of small molecules that must be acquired from the external environment - and to determine how this seed set differs across species from different environments.
Several studies have addressed a wide range of questions using network representations of small-molecule metabolism [5–7]. For instance, at the structural level, the metabolic network of an organism has been shown to have a scale-free topology with a few nodes (for example, pyruvate or coenzyme A) reacting with many other substrates [8, 9]. A distinguishing feature of such scale-free networks is the existence of a few highly connected metabolites, which participate in a very large number of metabolic reactions. By definition, when a large number of links integrate several substrates into a single highly connected component, fully separated modules will not exist. This has led to the notion of hierarchical modular structures within the fully connected metabolic network, where a 'module' is defined as a group of nodes that are more connected to each other than to other nodes in the network.
Kreimer et al.  have carried out a comprehensive, large-scale characterization of metabolic-network modularity (defined as in ) using 325 prokaryotic species with sequenced genomes and metabolic networks in the KEGG pathway database . They found that network size was an important topological determinant of modularity, with larger genomes exhibiting higher modularity scores (that is, a higher proportion of edges in the network forming part of modules than would be expected by chance). In addition, several environmental factors were shown to contribute to the variation in metabolic-network modularity across species. In particular, the authors found that endosymbionts and mammal-specific pathogens have lower modularity scores than bacterial species that occupy a wider range of niches. Moreover, among the pathogens, those that alternate between two distinct niches, such as insect and mammal, were found to have relatively high metabolic-network modularity. This supports the notion previously put forward by Parter et al.  that variability in the natural habitat of an organism promotes modularity in its metabolic network. Kreimer et al.  also reconstructed likely ancestral states, and found that modularity tends to decrease from ancestors to descendants; they attribute this to niche specialization and incorporation of peripheral metabolic reactions.
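A modularity score of this kind can be made concrete with a short sketch. The code below computes Newman's modularity Q for an undirected graph and a given partition into modules, that is, the fraction of edges falling within modules minus the fraction expected by chance given the node degrees. Whether this is exactly the score used by Kreimer et al. is an assumption, and the function name, edge list and module labels are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a module label.
    Q = sum over modules c of [ e_c / m - (d_c / (2 m))^2 ], where m is
    the number of edges, e_c the number of edges inside module c and
    d_c the summed degree of the nodes in module c.
    """
    m = len(edges)
    degree = defaultdict(int)
    intra = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    degree_sum = defaultdict(int)
    for node, k in degree.items():
        degree_sum[community[node]] += k
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Hypothetical example: two triangles joined by a single edge.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
TWO_MODULES = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
ONE_MODULE = {node: 0 for node in TWO_MODULES}

if __name__ == "__main__":
    print(modularity(EDGES, TWO_MODULES))
    print(modularity(EDGES, ONE_MODULE))
```

Placing each triangle in its own module gives a positive score, whereas lumping all nodes into one module gives a score of zero, because no more edges fall within the module than expected by chance.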
In line with the above effects of environmental diversity on network structure, Pal et al.  observed that bacterial metabolic networks grow by retaining horizontally acquired genes (genes acquired from other species) involved in the transport and catalysis of external nutrients, and that evolutionary changes in networks are primarily driven by adaptation to changing environments. Accordingly, horizontally transferred genes were found to be integrated at the periphery of the network, whereas the central parts remain evolutionarily stable. Indeed, genes encoding physiologically coupled reactions were often found to be transferred together, frequently in operons. This suggests that bacterial metabolic networks evolve by direct uptake of peripheral reactions in response to changing environments .
In this regard, a recent genome-wide study in yeast found that central and highly connected enzymes evolve more slowly than less connected ones and that duplicates of highly connected enzymes tend to have a higher likelihood of retention . Enzymes carrying high metabolic fluxes under natural biological conditions were also found to experience greater evolutionary constraints. Interestingly, however, it was shown that highly connected enzymes are no more likely to be essential to survival than the less connected ones .
Microorganisms constantly monitor their surroundings for the availability of nutrients and other chemicals, using both external and internal sensors to respond dynamically to environmental changes . Integration of the external environment with metabolism occurs through the import of compounds from the environment and results, for example, in a transcriptional response or an allosteric interaction with an enzyme [18–20]. In the second of the recent studies from Ruppin and co-workers, Borenstein et al.  propose a graph-theoretical approach to define these exogenously acquired compounds - the seed set of an organism - and have identified their repertoire across the tree of life (Figure 1b). This is one of the most comprehensive studies so far that links organisms' metabolic circuitry with their environment.
The authors represent the metabolic network of a given species as a directed graph in which nodes represent metabolites and edges correspond to the reactions that convert substrates into products. Using this representation, they identify the maximal set of metabolites that can be synthesized from a particular precursor metabolite, which enabled them to discover the seed-set compounds for each of the 478 prokaryotic species with available metabolic networks in the KEGG database. On the whole, they found that about 8-11% of the compounds in the metabolic network of an organism correspond to the seed set. The method identified seed compounds with a precision of 95% when benchmarked against a set of compounds experimentally characterized as being taken up from the environment by the rickettsia that cause the disease ehrlichiosis in humans and animals. Recall values (defined as the percentage of all exogenously acquired compounds that were correctly identified as seeds) based on the same dataset were low, suggesting that other factors might limit the identification of the seed compounds of an organism, such as the incompleteness of the metabolic network or ways of acquiring an exogenous compound that cannot be captured by currently available metabolic maps. The resulting compilation, which represents the overall static metabolic interface of each organism characterizing its biochemical habitat, enabled Borenstein et al. to trace the evolutionary history of both metabolic networks and growth environments.
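The idea of deriving a seed set from a directed metabolite graph can be sketched as follows. The sketch assumes that the seed set is the minimal set of compounds from which every other compound in the network can be reached, and that mutually interconvertible compounds (a strongly connected component) form one seed group from which a single member suffices. Under these assumptions the seed groups are the strongly connected components with no incoming edges from outside. The function names and the toy network are hypothetical, and the quadratic reachability search is chosen for clarity, not for genome-scale efficiency.

```python
from collections import defaultdict, deque

def reachable(graph, start):
    """All nodes reachable from start (including start), by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def seed_groups(edges):
    """Return the source strongly connected components of a directed graph.

    edges: iterable of (substrate, product) pairs.
    Each returned frozenset is one seed group: no compound outside the
    group can be converted into any compound inside it.
    """
    graph = defaultdict(set)
    nodes = set()
    for u, v in edges:
        graph[u].add(v)
        nodes.update((u, v))
    reach = {n: reachable(graph, n) for n in nodes}

    groups = []
    assigned = set()
    for n in sorted(nodes):
        if n in assigned:
            continue
        # Strongly connected component of n: nodes mutually reachable with n.
        component = {m for m in reach[n] if n in reach[m]}
        assigned |= component
        # Source component: no node outside the component reaches it.
        if all(n not in reach[m] for m in nodes - component):
            groups.append(frozenset(component))
    return groups

# Hypothetical toy network: glucose feeds a linear pathway, and x and y
# are interconvertible compounds that also feed into pyruvate.
TOY_EDGES = [("glucose", "g6p"), ("g6p", "pyruvate"),
             ("x", "y"), ("y", "x"), ("y", "pyruvate")]

if __name__ == "__main__":
    for group in seed_groups(TOY_EDGES):
        print(sorted(group))
```

In the toy network, glucose is a seed because nothing produces it, and x and y form a single seed group because each can be made from the other but neither can be made from anything else.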
When the seed sets identified in each organism were analyzed in detail, species living in variable environments were found to have more versatile seed sets, in terms of variability of size and diversity of composition. On the other hand, obligate endosymbionts such as Buchnera aphidicola, and microorganisms such as archaea that live in extreme and narrowly defined environments, were found to have much smaller seed sets. These results suggest that although organisms surviving in predictable environments can take up many compounds from their surroundings, this capability is still significantly more limited than in organisms that have to survive in a wide range of niches.
Borenstein et al. carried out a phylogenetic analysis of the seed sets across different taxa, which suggested not only that an accurate tree of life can be reconstructed from them but also that such a tree can provide insights into the evolutionary dynamics of seed compounds. In particular, the study revealed that novel compounds can be integrated into the metabolic network of an organism as either non-seeds or seeds, and that seed compounds are more likely to be lost during evolution than non-seed compounds. From the comparison with ancestral metabolic networks, Borenstein et al. suggest that the transition from seed to non-seed compound occurs 2.5 times more often than the reverse. This finding suggests that, of the two main current hypotheses of metabolic network evolution - the 'patchwork' and 'retrograde' models (see Box 1) - the retrograde model, in which pathways evolve in a direction opposite to the metabolic flow, might best explain the observed events. However, the observations of Borenstein et al. on the high overall rate of integration of non-seed compounds and the relatively high rate of transition of non-seed compounds into seed metabolites suggest that some aspects of network evolution could be explained by the patchwork and other models. The results highlight the fact that these models are not mutually exclusive but complementary, and might have contributed to pathway evolution to different extents [21, 22].
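To give a sense of how seed sets could serve as phylogenetic characters, the sketch below computes pairwise Jaccard distances between the seed sets of different species; such a distance matrix could then be passed to a standard tree-building method. The Jaccard distance is chosen here for illustration and is not necessarily the measure used by Borenstein et al.; the species names and seed compounds are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def distance_matrix(seed_sets):
    """Pairwise Jaccard distances between species, keyed by (species, species)."""
    names = sorted(seed_sets)
    return {(p, q): jaccard_distance(seed_sets[p], seed_sets[q])
            for p in names for q in names}

# Hypothetical seed sets for three hypothetical species.
SEED_SETS = {
    "species_1": {"glucose", "sulfate", "ammonia"},
    "species_2": {"glucose", "sulfate", "nitrate"},
    "species_3": {"methionine", "biotin"},
}

if __name__ == "__main__":
    for pair, dist in sorted(distance_matrix(SEED_SETS).items()):
        print(pair, round(dist, 3))
```

Species with largely overlapping seed sets end up close together, whereas species sharing no seed compounds are at the maximum distance of 1.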
It should be noted that studies such as those reported here have limitations: the incompleteness of metabolic maps, the reversibility of reactions, possible alternative mechanisms controlling metabolic import, and the failure to distinguish between catabolic and anabolic pathways can all potentially result in false positives in the identified seed sets. Nevertheless, it is exciting to note that seed sets obtained using the approach developed in these studies not only reflect the metabolic environments of the species themselves but also provide insight into their natural biochemical habitats - the union of all the metabolic environments an organism encounters.
Hence, such approaches can be exploited to study the interaction and association of microbes with other species thriving in similar habitats. This may help in the identification of host-parasite and symbiotic relationships between organisms and also enable the prediction and design of drugs that can precisely target an organism of interest without adversely affecting the host. With the availability of metagenomic data ranging from viromes to biomes , we anticipate that similar approaches can be applied to study metagenomic environments to decipher species relationships and dependencies occurring in large ecological niches, thereby providing insights into ecological imbalances or tradeoffs.
SCJ and MMB acknowledge financial support from the MRC Laboratory of Molecular Biology. SCJ acknowledges financial support from Cambridge Commonwealth Trust. MMB thanks Darwin College and Schlumberger Ltd for generous support. We thank A Wuster, R Janky, K Weber, V Espinosa-Angarica and JJ Díaz-Mejía for critically reading the manuscript and providing helpful comments.