Influence of metabolic network structure and function on enzyme evolution
© Vitkup et al.; licensee BioMed Central Ltd. 2006
Received: 6 September 2005
Accepted: 7 April 2006
Published: 9 May 2006
Most studies of molecular evolution are focused on individual genes and proteins. However, understanding the design principles and evolutionary properties of molecular networks requires a system-wide perspective. In the present work we connect molecular evolution on the gene level with system properties of a cellular metabolic network. In contrast to protein interaction networks, where several previous studies investigated the molecular evolution of proteins, metabolic networks have a relatively well-defined global function. The ability to consider fluxes in a metabolic network allows us to relate the functional role of each enzyme in a network to its rate of evolution.
Our results, based on the yeast metabolic network, demonstrate that important evolutionary processes, such as the fixation of single nucleotide mutations, gene duplications, and gene deletions, are influenced by the structure and function of the network. Specifically, central and highly connected enzymes evolve more slowly than less connected enzymes. Also, enzymes carrying high metabolic fluxes under natural biological conditions experience higher evolutionary constraints. Genes encoding enzymes with high connectivity and high metabolic flux are more likely to retain duplicates in evolution. In contrast to what has been reported for protein interaction networks, highly connected enzymes are no more likely to be essential than less connected enzymes.
The presented analysis of evolutionary constraints, gene duplication, and essentiality demonstrates that the structure and function of a metabolic network shape the evolution of its enzymes. Our results underscore the need for systems-based approaches in studies of molecular evolution.
Molecular networks and the genes encoding their building blocks represent two different levels of biological organization that interact in evolution. On the one hand, genetic changes such as point mutations, gene deletions, and gene duplications influence the structure and evolution of these networks. On the other hand, network function may constrain the kinds of mutations that can be tolerated, and thus how genes evolve. Existing work on the structure and evolution of molecular networks has mainly focused on protein interaction networks [1–6]. Such networks are very heterogeneous: they contain large macromolecular complexes, regulatory interactions, signaling interactions, and interactions of proteins that provide structural support for a cell. As a result, it is difficult to ascertain how network structure reflects network function. A large fraction of false positives and false negatives in protein interaction networks [7, 8] further complicates structure-function analysis. In contrast, cellular metabolic networks are relatively well-characterized in several model organisms such as Saccharomyces cerevisiae [9, 10] and Escherichia coli . Their function - biosynthesis and energy production - is also well understood, as is the relationship of network structure to network function.
In the present study, we ask how the topology of a metabolic network and the metabolic fluxes (a metabolic flux is the rate at which a chemical reaction converts reactants into products) through reactions in the network influence the evolution of metabolic network genes through point mutations and gene duplication. Our results suggest that both network structure and function need to be understood to fully appreciate how metabolic networks constrain the evolution of their parts. The present study has become possible with the recent publication of a comprehensive compendium of metabolic reactions in the yeast Saccharomyces cerevisiae . This compendium comprises 1,175 metabolic reactions and 584 metabolites, and involves about 16% of all yeast genes.
Using the stoichiometric equations that describe chemical reactions, we calculate the connectivity of an enzyme as the number of other metabolic enzymes that produce or consume the enzyme's products or reactants (see Materials and methods and Additional data file 1). In other words, a metabolic enzyme A and a metabolic enzyme B are connected if they share the same metabolite as either a product or reactant. Highly connected enzymes in this representation are enzymes that share metabolites with many other enzymes. Including the most highly connected metabolites and cofactors such as ATP or hydrogen in a network representation would render the network structure dominated by these few nodes, and would obscure functional relationships between enzymes. We thus excluded the top 14 most highly connected metabolites: ATP, H, ADP, pyrophosphate, orthophosphate, CO2, NAD, glutamate, NADP, NADH, NADPH, AMP, NH3, and CoA . The results we report below are qualitatively insensitive to the exact number of removed metabolites.
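The connectivity definition above can be illustrated with a minimal sketch. It assumes each enzyme is represented simply by the set of metabolites it produces or consumes, and it counts distinct neighbouring enzymes; the enzyme and metabolite names (E1 to E4, A, B, ...) are hypothetical and are not taken from the yeast network.

```python
from collections import defaultdict


def enzyme_connectivity(reactions, excluded=frozenset()):
    """Count, for each enzyme, the other enzymes that share a metabolite with it.

    reactions: dict mapping enzyme name -> set of metabolites it produces or consumes.
    excluded:  metabolites (for example ATP) ignored when drawing connections.
    """
    # Map each retained metabolite to the enzymes that use it.
    users = defaultdict(set)
    for enzyme, metabolites in reactions.items():
        for metabolite in metabolites:
            if metabolite not in excluded:
                users[metabolite].add(enzyme)

    # Two enzymes are neighbours if they appear together under any metabolite.
    neighbours = {enzyme: set() for enzyme in reactions}
    for enzymes in users.values():
        for enzyme in enzymes:
            neighbours[enzyme] |= enzymes - {enzyme}

    return {enzyme: len(others) for enzyme, others in neighbours.items()}


# Hypothetical toy network (illustration only).
toy_reactions = {
    "E1": {"A", "B", "ATP"},
    "E2": {"B", "C", "ATP"},
    "E3": {"C", "D"},
    "E4": {"X", "Y", "ATP"},
}

with_currency = enzyme_connectivity(toy_reactions)
without_currency = enzyme_connectivity(toy_reactions, excluded={"ATP"})
```

In this toy case, keeping ATP links E4 to enzymes with which it shares no pathway intermediate, whereas excluding ATP leaves E4 with zero connectivity; this mirrors the motivation given above for removing the most highly connected metabolites.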
Highly connected enzymes evolve slowly
Why do highly connected enzymes show greater evolutionary constraint (smaller Ka/Ks)? One possibility is that this correlation is primarily mediated by the corresponding gene expression level . Indeed, confirming previous observations , we found a significant negative correlation between the ratio Ka/Ks and mRNA expression levels (Spearman's rank correlation r = -0.33, P = 5.5 × 10^-10; Pearson's correlation r = -0.30, P = 3.6 × 10^-8). Information on mRNA expression of metabolic genes was obtained from the study by Holstege et al.  in which the number of mRNA molecules per cell was estimated based on microarray data. We also found a relatively weak correlation between connectivity and expression levels (Spearman's rank correlation r = 0.11, P = 4.6 × 10^-2). Nevertheless, a partial correlation analysis - controlling for mRNA expression levels - between gene connectivity and evolutionary constraint Ka/Ks shows that enzymes in highly connected parts of the network evolve slowly independently of expression levels (Spearman's partial correlation r = -0.18, P = 1.4 × 10^-3; the P value for Spearman's partial correlation was estimated by randomization).
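A partial rank correlation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: it applies the standard first-order partial correlation formula to Spearman coefficients, and it estimates the P value by shuffling the first variable, which is only one of several possible randomization schemes. All function names and the example data are hypothetical.

```python
import math
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation between x and y, controlling for z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    product = (1 - rxz ** 2) * (1 - ryz ** 2)
    if product < 1e-12:  # x or y is (almost) a monotone function of z
        return float("nan")
    return (rxy - rxz * ryz) / math.sqrt(product)


def permutation_p_value(x, y, z, n_perm=2000, seed=0):
    """Two-sided P value obtained by shuffling x (an assumed scheme)."""
    rng = random.Random(seed)
    observed = partial_spearman(x, y, z)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(partial_spearman(shuffled, y, z)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Made-up numbers: x could stand for connectivity, y for Ka/Ks, z for expression.
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
z = [2, 1, 4, 3, 5]
r_partial = partial_spearman(x, y, z)
```

With real data, x, y, and z would each hold one value per enzyme, and the number of permutations would be chosen according to the smallest P value one wishes to resolve.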
Enzymes that carry large metabolic fluxes evolve slowly
How well a metabolic network supports cell growth can be computationally quantified through the apparatus of metabolic flux analysis . In flux balance analysis, the constraints imposed by stoichiometry and reversibility of chemical reactions are used to restrict the space of feasible metabolic fluxes. The constrained system can be subjected to an optimization procedure to obtain a flux distribution that maximizes some desirable metabolic property. Because cellular growth rate is an important component of fitness in a single-celled organism, biomass production is often used as the property being optimized. The predictions of flux balance analysis are often in good agreement with experimental results for E. coli [18, 19] and S. cerevisiae .
Correlation between enzymatic flux magnitude and evolutionary constraint Ka/Ks
Maximum uptake rates (mmol/gDW/h) | Spearman's rank correlation (P value) with zero fluxes | Spearman's rank correlation (P value) without zero fluxes
 | -0.28 (P = 3.8 × 10^-3) | -0.25 (P = 3.6 × 10^-6)
 | -0.31 (P = 1.7 × 10^-3) | -0.22 (P = 5.7 × 10^-5)
 | -0.26 (P = 9.3 × 10^-3) | -0.21 (P = 1.2 × 10^-4)
 | -0.27 (P = 6.4 × 10^-3) | -0.20 (P = 2.5 × 10^-4)
 | -0.25 (P = 1.3 × 10^-2) | -0.20 (P = 1.8 × 10^-6)
 | -0.08 (P = 0.45) | -0.21 (P = 9.2 × 10^-5)
 | -0.010 (P = 0.39) | -0.19 (P = 3.7 × 10^-4)
Gene duplication correlation with connectivity and flux
Connectivity, essentiality, and metabolic robustness
Evolutionary constraints on enzymes are indirect indicators of metabolic robustness to amino acid changes, changes that a metabolic network has tolerated over millions of years of evolution. Another type of biological robustness is that against complete gene deletions. Robustness against gene deletions can be derived from laboratory studies in which the effects of gene deletions on growth rate and other indicators of fitness are studied [23, 24]. These studies determine essential genes, that is, genes whose elimination in one or more laboratory environments is effectively lethal. Our use of available essentiality data is motivated by the observation that highly connected proteins in protein interaction networks may be more likely to be essential to a cell . We carried out analyses using data on essential genes derived from a large scale gene deletion study by Giaever et al. , and used the Saccharomyces genome database (SGD)  to collect the essentiality data.
In sum, we demonstrate that both highly connected enzymes and enzymes that carry high metabolic fluxes in the yeast metabolic network have tolerated fewer amino acid substitutions in their evolutionary history. Why are enzymes carrying larger fluxes more constrained? The likely answer comes from the observation that most mutations affecting enzymatic activity may reduce rather than increase flux. Enzymes carrying high fluxes tend to have reaction products that enter a large number of metabolic pathways. Consequently, a mutational reduction in the activity of such enzymes should be more detrimental than a reduction in the activity of enzymes with lower flux.
We also show that the genes encoding enzymes with high flux have more duplicates. Importantly, we do not argue that duplications arise more frequently for genes whose products carry high flux, but that such duplications are more likely to be preserved in evolution, because of the advantage - higher flux - they provide. While a gene's duplicates can initially be preserved through an advantageous increase in metabolic flux, after divergence they may provide other functional benefits . Divergence of metabolic genes in their expression and regulation is well established for genes in intensely studied parts of metabolism, such as tricarboxylic acid cycle enzymes .
We found that the association between predicted enzymatic flux and evolutionary rate is most pronounced for carbon sources that dominate the natural environment of yeast. This suggests that one can use the association between flux and evolutionary constraint to search for conditions that dominated the evolution of metabolic networks. Similar analyses, which use genomic data to infer the environment that has shaped an organism's evolution, have been used before to show that carbon limitation may have influenced the evolution of the E. coli metabolic network more strongly than nitrogen limitation , and to show that yeast evolution favored fermentation over respiration .
A previous study by Hahn et al.  reported that, based on amino acid divergence, in the E. coli metabolic network there exists no statistically significant association between enzyme connectivity and evolutionary constraint. We emphasize that any contradiction between this earlier work and our results is only apparent. First, the earlier study was based on a much smaller set of enzymes (n = 108 as opposed to n = 350 here), and thus had less statistical power. Nevertheless, two different statistical measures in the previous study showed, as we do here, a negative association between connectivity and evolutionary constraint, albeit not at P < 0.05. Second, because of the lack of sufficient sequence information for a closely related sister species of E. coli, the previous study used only amino acid divergence Ka and not the preferable Ka/Ks to indicate evolutionary constraint. In fact, the correlation between connectivity and Ka is very similar between the present study and the previous work (Spearman's rank correlation r = -0.13, P = 1.2 × 10^-2 here versus Spearman's rank correlation r = -0.15, P = 7 × 10^-2 in the study by Hahn et al.).
It should not be surprising that the observed associations are weak in magnitude. The reason for the low magnitude is that many other factors influence the evolution of enzyme-coding genes. Two of these factors are gene expression levels (discussed in the paper) and constraints stemming from the tertiary and quaternary structure of enzymes, which may differ among enzymes (little is known about such constraints). The key point is that, besides all these other factors, metabolic network function and structure also have a clear influence on protein evolution.
How do our results on the yeast metabolic network relate to earlier work on protein interaction networks? There, a similar relationship between protein connectivity and evolutionary constraint has been suggested [4, 5]; however, this association exists for different reasons. Highly connected proteins in protein interaction networks may evolve slowly because a larger fraction of a highly connected protein's sequence is involved in protein interactions and may thus be evolutionarily constrained . In contrast, high protein connectivity in the metabolic network is established not through protein-protein interactions, but through consumption or production of widely used metabolites. In metabolic networks, mutations in enzyme-coding genes - changing reaction rates and concentrations - may have especially deleterious consequences for widely used metabolites. Consequently, highly connected metabolic enzymes may evolve slowly due to functional as opposed to structural constraints. Our ability to consider fluxes through enzymes in a metabolic network allows us to relate the functional role of each enzyme in a network to its rate of evolution. Such a functional analysis of a genome-scale network has no counterpart in any other genome-scale network studied thus far.
In conclusion, our analysis of evolutionary constraints, gene duplication, and essentiality demonstrates that the structure and function of a metabolic network shape the evolution of its enzymes. In the long run, system analyses of biological networks will allow us to increasingly place the evolution of genes in the larger context in which they operate, as building blocks of cellular networks.
Materials and methods
We used a comprehensive collection of the yeast S. cerevisiae metabolic reactions by Forster et al.  to calculate metabolic enzyme connectivities. In addition to enzymatic reactions assigned to 671 open reading frames (ORFs), the collection contains reactions unassigned to known ORFs, transport reactions, and reactions represented by large macromolecular complexes. These reactions were used to calculate other enzyme connectivities but were excluded from the main analysis. Large macromolecular complexes (containing several ORFs) were represented by single enzymatic nodes in the calculation of connectivities for other metabolic enzymes. In order to include only functional relationships in the calculation of the enzyme connectivities, we excluded the 14 highly connected metabolites and co-factors (as described in the main text). As a result of the exclusion, a small fraction (5%) of network enzymes became disconnected from the network (they have zero connectivity). These enzymes were not included in the analysis.
Flux balance analysis
Flux balance analysis (FBA) was used to obtain metabolic flux distributions as described previously [10, 17, 19]. The network by Forster et al.  was used in all flux balance calculations. The in silico network of yeast metabolism includes central carbon metabolism, transmembrane transport reactions, and pathways responsible for the synthesis and degradation of amino acids, nucleic acids, vitamins, cofactors, and lipids. In total, the network consists of 733 metabolites and 1,175 metabolic reactions. In the flux balance analysis, constraints limiting nutrient uptake, enforcing reaction irreversibility, and imposing steady-state conservation of metabolite concentrations are applied. The fluxes optimal for growth are then obtained by maximization of biomass production using linear optimization. Linear optimization was performed using the GNU Linear Programming Kit .
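The optimization described in this section can be written compactly as a linear program. The notation below is an assumption introduced for illustration and does not appear in the original text: S denotes the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, v_biomass the flux through the biomass-producing reaction, and v_k^max the maximum uptake rate of nutrient k.

```latex
\begin{aligned}
\max_{v}\quad & v_{\mathrm{biomass}} \\
\text{subject to}\quad & S\,v = 0 && \text{(steady-state metabolite concentrations)} \\
& v_j \ge 0 && \text{for each irreversible reaction } j \\
& 0 \le v_k \le v_k^{\max} && \text{for each nutrient uptake reaction } k
\end{aligned}
```

Because the objective and all constraints are linear in v, any linear programming solver can be applied; the optimal flux vector need not be unique, although the optimal biomass flux is.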
We identified duplicates in the S. cerevisiae genome using a previously described whole-genome analysis tool . Briefly, the tool locates gene duplicates in a genome using BLASTP  and aligns them globally with the Needleman and Wunsch dynamic programming alignment algorithm . Putative duplicate pairs with less than 40% amino acid similarity or fewer than 100 aligned amino acid residues were excluded; for the remaining pairs we calculated the number of substitutions per synonymous site (Ks) and the number of substitutions per non-synonymous site (Ka) using the maximum likelihood models of Muse and Gaut  and Goldman and Yang .
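The filtering step for putative duplicate pairs can be sketched as below. This is a simplified illustration and not the tool used in the study: it assumes that alignment has already been performed and that each pair carries a percent similarity and an aligned length. The record type, function names, and gene identifiers (g1 to g4) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    gene_a: str
    gene_b: str
    similarity_pct: float   # percent amino acid similarity of the alignment
    aligned_residues: int   # number of aligned amino acid residues


def filter_pairs(pairs, min_similarity_pct=40.0, min_aligned=100):
    """Drop pairs below 40% similarity or with fewer than 100 aligned residues."""
    return [
        p for p in pairs
        if p.similarity_pct >= min_similarity_pct
        and p.aligned_residues >= min_aligned
    ]


def duplicates_per_gene(pairs):
    """Count how many retained duplicate partners each gene has."""
    counts = Counter()
    for p in pairs:
        counts[p.gene_a] += 1
        counts[p.gene_b] += 1
    return dict(counts)


# Hypothetical candidate pairs (illustration only).
candidates = [
    AlignedPair("g1", "g2", 62.0, 310),
    AlignedPair("g1", "g3", 38.5, 420),   # excluded: similarity below 40%
    AlignedPair("g2", "g4", 55.0, 80),    # excluded: fewer than 100 residues
    AlignedPair("g3", "g4", 40.0, 100),   # kept: exactly at both thresholds
]

kept = filter_pairs(candidates)
duplicate_counts = duplicates_per_gene(kept)
```

Per-gene duplicate counts obtained in this way could then be compared against connectivity or flux; estimating Ka and Ks for the retained pairs requires codon-based likelihood models and is outside the scope of this sketch.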
The average Ka/Ks, Ka, and Ks values used in the analysis were obtained from the study by Kellis et al. . In a complementary approach, we also recalculated the average ratios using the maximum-likelihood method of Yang and Nielsen  and obtained qualitatively similar results.
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a figure showing examples of metabolic connectivity. (a) An example of the metabolic reaction network from sphingoglycolipid metabolism; metabolites are drawn as small circles (DHSP, sphinganine 1-phosphate; PETHM, ethanolamine phosphate; SPH, sphinganine; CDPETN, CDPethanolamine; ETHM, ethanolamine) and enzyme-encoding genes are shown in rectangles. (b) Metabolic connectivity of the dpl1 gene (solid edges), as defined by the reactions shown in (a). The dpl1 gene has a total of six metabolic connections: two established through ethanolamine phosphate (red edges); and four through sphinganine 1-phosphate (blue edges). Metabolic connections between other enzymes are shown by dashed edges. Additional data file 2 demonstrates the relationship between enzyme connectivity and the average amino acid divergence Ka. Spearman's rank correlation r = -0.13, P = 1.6 × 10^-2. Additional data file 3 shows the relationship between enzyme connectivity and the average silent divergence Ks. Spearman's rank correlation r = -0.056, P = 0.30. Additional data file 4 is a histogram of the calculated metabolic fluxes in the yeast network for aerobic growth on glucose (maximal uptake rate for glucose 15.3 mmol/g dry weight/h; oxygen 0.2 mmol/g dry weight/h). Note the small number of fluxes - representing glycolysis - with disproportionately large magnitudes. Similar flux distributions were also obtained for other growth conditions. Additional data file 5 shows the correlation between non-zero enzymatic flux through a reaction and the number of duplicates of the respective enzyme's coding gene. Additional data file 6 provides connectivity and evolutionary parameters (Ka/Ks, Ka, Ks) for yeast metabolic enzymes.
We thank Dr Andrey Rzhetsky, Dr Uwe Sauer, and Dr Eugene Koonin for valuable discussions. We also thank two anonymous reviewers for several very helpful suggestions.
- Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411: 41-42. 10.1038/35075138.
- Hirsh AE, Fraser HB: Protein dispensability and rate of evolution. Nature. 2001, 411: 1046-1049. 10.1038/35082561.
- Pal C, Papp B, Hurst LD: Highly expressed genes in yeast evolve slowly. Genetics. 2001, 158: 927-931.
- Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science. 2002, 296: 750-752. 10.1126/science.1068696.
- Jordan IK, Wolf DM, Koonin EV: No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol. 2003, 3: 1-12. 10.1186/1471-2148-3-1.
- Hahn MW, Conant GC, Wagner A: Molecular evolution in large genetic networks: does connectivity equal constraint? J Mol Evol. 2004, 58: 203-211. 10.1007/s00239-003-2544-0.
- Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002, 417: 399-403. 10.1038/nature750.
- Sprinzak E, Sattath S, Margalit H: How reliable are experimental protein-protein interaction data? J Mol Biol. 2003, 327: 919-923. 10.1016/S0022-2836(03)00239-0.
- Karp PD, Paley S, Romero P: The Pathway Tools software. Bioinformatics. 2002, 18: S225-S232.
- Forster J, Famili I, Fu P, Palsson BO, Nielsen J: Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res. 2003, 13: 244-253. 10.1101/gr.234503.
- Edwards JS, Palsson BO: The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc Natl Acad Sci USA. 2000, 97: 5528-5533. 10.1073/pnas.97.10.5528.
- Kharchenko P, Vitkup D, Church GM: Filling gaps in a metabolic network using expression information. Bioinformatics. 2004, 20: I178-I185. 10.1093/bioinformatics/bth930.
- Li W-H: Molecular Evolution. 1997, Sunderland: Sinauer Associates
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003, 423: 241-254. 10.1038/nature01644.
- Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17: 32-43.
- Holstege FC, Jennings EG, Wyrick JJ, Lee TI, Hengartner CJ, Green MR, Golub TR, Lander ES, Young RA: Dissecting the regulatory circuitry of a eukaryotic genome. Cell. 1998, 95: 717-728. 10.1016/S0092-8674(00)81641-4.
- Varma A, Boesch BW, Palsson BO: Biochemical production capabilities of Escherichia coli. Biotechnol Bioeng. 1993, 42: 59-73. 10.1002/bit.260420109.
- Edwards JS, Ibarra RU, Palsson BO: In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat Biotechnol. 2001, 19: 125-130. 10.1038/84379.
- Segre D, Vitkup D, Church GM: Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci USA. 2002, 99: 15112-15117. 10.1073/pnas.232349399.
- Forster J, Famili I, Palsson BO, Nielsen J: Large-scale evaluation of in-silico gene deletions in Saccharomyces cerevisiae. OMICS. 2003, 7: 193-202. 10.1089/153623103322246584.
- Strathern JN, Jones EW, Broach JR: The Molecular Biology of the Yeast Saccharomyces. Metabolism and Gene Expression. 1982, Cold Spring Harbor Press, NY
- Papp B, Pal C, Hurst LD: Metabolic network analysis of the causes and evolution of the enzyme dispensability in yeast. Nature. 2004, 429: 661-664. 10.1038/nature02636.
- Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B, et al: Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002, 418: 387-391. 10.1038/nature00935.
- Steinmetz LM, Scharfe C, Deutschbauer AM, Mokranjac D, Herman ZS, Jones T, Chu AM, Giaever G, Prokisch H, Oefner PJ, Davis RW: Systematic screen for human disease genes in yeast. Nat Genet. 2002, 31: 400-404.
- Dwight SS, Balakrishnan R, Christie KR, Costanzo MC, Dolinski K, Engel SR, Feierbach B, Fisk DG, Hirschman J, Hong EL, et al: Saccharomyces genome database: underlying principles and organisation. Brief Bioinform. 2004, 5: 9-22. 10.1093/bib/5.1.9.
- Mahadevan R, Palsson BO: Properties of metabolic networks: structure versus function. Biophys J. 2005, 88: L07-L09. 10.1529/biophysj.104.055723.
- Gu Z, Steinmetz LM, Gu X, Scharfe C, Davis RW, Li WH: Role of duplicate genes in genetic robustness against null mutations. Nature. 2003, 421: 63-66. 10.1038/nature01198.
- Wagner A: Robustness against mutations in genetic networks of yeast. Nat Genet. 2000, 24: 355-361. 10.1038/74174.
- Edwards JS, Palsson BO: Robustness analysis of the Escherichia coli metabolic network. Biotechnol Prog. 2000, 16: 927-939. 10.1021/bp0000712.
- Kuepfer L, Sauer U, Blank LM: Metabolic functions of duplicate genes in Saccharomyces cerevisiae. Genome Res. 2005, 15: 1421-1430. 10.1101/gr.3992505.
- McAlister-Henn L, Small WC: Molecular genetics of yeast TCA cycle isozymes. Prog Nucleic Acid Res Mol Biol. 1997, 57: 317-339.
- Wagner A: Inferring lifestyle from gene expression patterns. Mol Biol Evol. 2000, 17: 1985-1987.
- Makhorin A: GNU Linear Programming Kit. 2001, Boston: Free Software Foundation
- Conant GC, Wagner A: GenomeHistory: a software tool and its application to fully sequenced genomes. Nucleic Acids Res. 2002, 30: 3378-3386. 10.1093/nar/gkf449.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Needleman SB, Wunsch CD: A general method applicable to the search for similarities for amino acid sequences of two proteins. J Mol Biol. 1970, 48: 443-453. 10.1016/0022-2836(70)90057-4.
- Muse SV, Gaut BS: A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol. 1994, 11: 715-724.
- Goldman N, Yang Z: A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994, 11: 725-736.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.