Flux balance analysis accounting for metabolite dilution

A flux balance analysis method for gene essentiality prediction, which takes into account variation in biomass composition under different growth conditions.


Background
A practical approach to gaining biological understanding of complex metabolic networks requires the development of mathematical modeling, simulation, and analysis techniques. Traditional modeling techniques are based on mathematical approaches that require detailed and accurate information regarding reaction kinetics as well as enzyme and metabolite concentrations [1,2]. The lack of sufficient data limits the current applicability of such methods to small-scale systems. This hurdle is surpassed through the use of constraint-based modeling (CBM), which serves to analyze the functionality of genome-scale metabolic networks by relying solely on simple physical-chemical constraints [3,4]. Genome-scale CBM models have already been constructed for more than 50 organisms [5], including common model microorganisms [6,7], industrially relevant microbes [8][9][10][11], various pathogens [12][13][14][15], and recently for human cellular metabolism [16]. Flux balance analysis (FBA) is a key computational approach within the CBM modeling framework [17][18][19] and is frequently used to successfully predict various phenotypes of microorganisms, such as their growth rates, uptake rates, by-product secretion, and knockout lethality (see [3,5,20] for reviews). Traditional kinetic models of cellular metabolism are formulated as a set of differential equations that compute the time derivative of metabolite concentrations (denoted by ) as dependent on reaction rates (denoted by ; which, in turn, depend on metabolic concentration and kinetic constants, denoted by ) and metabolite dilution due to cellular growth (with μ denoting the growth rate) [21]: where S is an m × n stoichiometric matrix, m is the number of metabolites, n is the number of reactions, and S ij represents the stoichiometric coefficient of metabolite i in reaction j. A precise solution to Equation 1 requires determination of the kinetic parameters , which are generally unavailable, resulting in the development of the alternative CBM approach. In CBM, an entire space of possible solutions for the flux distribution is postulated, considering that the metabolic system is constrained by physicochemical, environmental and regulatory constraints. In FBA, this solution space is constrained by the assumption of a quasi steady-state, under which stoichiometric massbalance constraints enforce constant concentrations of intermediate metabolites over time: The uptake and secretion of a pre-defined set of metabolites from and to the environment is facilitated via the definition of exchange reactions in the stoichiometric matrix S [3]. A 1 The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel 3 Computer Science Department, Technion -Israel Institute of Technology, Haifa 32000, Israel Full list of author information is available at the end of the article pseudo growth reaction is defined to simulate the utilization of metabolites during growth, consuming the most abundant biomass constituents based on experimentally determined concentrations (that is, the j-th component in denotes the steady-state concentration of metabolite j). The objective of FBA is to find a steady-state flux distribution, , satisfying Equation 2 alongside additional enzymatic directionality and capacity constraints [3], together permitting a maximal growth rate μ. Accounting only for linear constraints, the resulting space of feasible flux distribution described by FBA is convex (forming a high-dimensional polytope), in which optimal biomass producing solutions can be efficiently searched for via linear programming (LP). The employment of a pseudo growth reaction in FBA to represent the utilization of metabolites as part of growth poses two fundamental problems. First, the metabolite composition of cellular biomass significantly varies across different growth media, genetic backgrounds and growth rates [22][23][24]. Indeed, previous work by Pramanik and Keasling [22,23] has shown that using the correct experimentally measured biomass composition of Escherichia coli under different growth media and growth rates significantly improves FBA flux predictions. However, as FBA is commonly applied to probe metabolic behavior under diverse genetic and environmental conditions for which no metabolite concentration data are available, it has become common practice to employ a constant biomass composition across all conditions [25]. Second, the growth reaction in various CBM models commonly accounts for no more than a few dozen metabolites, for which measured concentrations are available under a specific condition [23]. Ignoring the growth-associated dilution of the remaining metabolites (those not included in the biomass composition in use; required by Equation 1) may result in the prediction of biologically implausible flux distributions, leading to false predictions of gene essentiality and growth rates, as shown in the Results. This problem has been recently addressed by Kruse and Ebenhöh [26], who suggested a method that is based on network expansion to compute the set of producible metabolites under a given growth medium. This method, however, does not enable the prediction of feasible flux distributions that account for the growth-associated dilution of all intermediate metabolites. Another approach, recently suggested by Martelli et al. [27], predicts metabolic fluxes based on Von Neumann's model, which maximizes the growth rate in a metabolic network without assuming mass-balance nor utilizing prior knowledge of a biomass composition. However, similarly to FBA, flux distributions predicted by this method do not fully account for the growth-associated dilution of all intermediate metabolites.
In this paper we describe a variant of FBA, metabolite dilution flux balance analysis (MD-FBA), which aims to predict metabolic flux distributions by accounting for the dilution of all intermediate metabolites that are synthesized under a given condition. As shown below, accounting for growth dilution of intermediate metabolites is especially important for metabolites that participate in catalytic cycles, many of them being metabolic co-factors. Since CBM assumes a steady-state flux distribution and does not predict the actual concentration of the intermediate metabolites, we consider a uniform minimal dilution rate for all intermediate metabolites produced via a non-zero flux through some reaction (assuming a uniform concentration for all intermediate metabolites, following [28]). Figure 1 demonstrates an example network for which FBA and MD-FBA predict different flux distributions, leading to different growth rate and gene essentiality predictions. The biomass in this network is metabolite B, while the input metabolites available in the growth medium are A and X in Figure 1a, and only A in Figure 1b. The synthesis of the biomass precursor B is facilitated via two alternative pathways: through an efficient pathway via v 4 , producing one molecule of B per molecule of A; or through an inefficient pathway via reactions v 2 and v 3 , producing one molecule of B per two molecules of A. Reaction v 4 requires the presence of a co-factor metabolite C, which is recycled via reaction v 8 and synthesized via reactions v 6 and v 7 . Thus, in MD-FBA, the activation of the efficient pathway for synthesizing B via v 4 enforces de novo synthesis of co-factor C via v 6 and v 7 to balance the dilution of this co-factor (Figure 1a, red solid arrows). By contrast, since FBA does not account for metabolite dilution, it would predict a biologically implausible flux distribution in which the steady-state concentration of the co-factor C is maintained via reaction v 8 , without predicting the growth-associated demand for de novo synthesis of this co-factor (Figure 1a, b, blue dot-dash arrows). The different flux distributions predicted by FBA and by MD-FBA under the two growth media yield different growth rate and enzyme essentiality predictions. FBA predicts the activation of the efficient biosynthetic pathway for synthesizing metabolite B under both growth media, resulting in the same growth rate prediction under the two media. MD-FBA, on the other hand, predicts the activation of the efficient biosynthetic pathway when metabolite X is present in the growth medium (with a growth rate prediction similar to that of FBA; Figure 1a) and the activation of the inefficient pathway when metabolite X is absent (resulting in a lower growth rate; Figure 1b). When metabolite X is present in the growth medium, MD-FBA, unlike FBA, predicts that the biosynthetic pathway for the production of co-factor C is activated, with the reactions v 6 and v 7 being essential for achieving maximal growth rate (Figure 1a). When X is absent from the growth medium, FBA predicts r biomass v the activity of the efficient pathway through v 4 and v 8 , with the corresponding enzymes essential for obtaining a maximal growth rate. MD-FBA, however, predicts the inactivation of this efficient pathway and hence the inessentiality of v 4 and v 8 , while predicting the enzymes in the less efficient pathway v 2 and v 3 to be essential for growth ( Figure 1b). Next, we describe the implementation of MD-FBA as a mixed-integer linear programming (MILP) optimization problem and demonstrate its applicability in predicting met-abolic phenotypes, outperforming the commonly used FBA method.

MD-FBA -accounting for growth-associated dilution of all intermediate metabolites
Our method, MD-FBA, aims to predict a feasible flux distribution through a metabolic network under a given environmental and genetic condition, by maximizing the production rate of the biomass (that is, the flux through the biomass reaction) while satisfying a stoichiometric massbalance constraint, accounting for the growth-associated dilution of all produced intermediate metabolites, and satisfying enzymatic directionality and capacity constraints embedded in the model (similarly to FBA). MD-FBA is formulated as a MILP problem as defined in the Materials and methods.

Applying MD-FBA to predict metabolic phenotypes in Escherichia coli
As a benchmark for the prediction performance of MD-FBA, we applied it to the genome-scale metabolic network model of E. coli [6] to predict growth rates and gene essentiality under a diverse set of growth media and gene knockouts. The model of Feist et al. [6] accounts for 1,260 metabolic genes, 2,382 reactions and 1,668 metabolites. As an initial validation, we applied both MD-FBA and FBA to predict E. coli's growth rate for 91 gene knockout strains under 125 different media, yielding a total of 11,375 growth conditions for which measured optical density (OD) data are available via a phenotypic array in the ASAP database [29]. Each medium included a fixed set of metabolites (oxygen, phosphate, water, sulfate, carbon dioxide, hydrogen and metal ions) and alternating carbon and nitrogen sources (the full list of growth conditions (media and gene knockouts) as well as the experimental OD values are available in Additional files 1 and 2, respectively). Different gene knockouts were modeled by forcing a zero flux through the corresponding enzyme-catalyzed reactions [28]; different growth media were modeled by changing the bounds on metabolite uptake from the environment based on specification of the available metabolic nutrients in each medium [28]. Both FBA and MD-FBA predicted no growth for the wild-type strain under 13 growth media and hence these media were removed from further analysis. In an additional 16 growth media the correlation between the growth rates predicted by FBA and MD-FBA across all knockout strains was significantly high (Spearman r > 0.7) and hence these media were also removed from further comparison of the two methods (the results presented below are insensitive to specific choice of a Spearman correlation threshold). For each deletion strain, a Spearman correlation was calculated between the predicted growth rates and the measured OD values across the remaining 96 different Narrow arrows represent the growth-associated dilution of their attached metabolites. Note that the stoichiometric coefficients for reaction v 2 are two molecules of A per one molecule of D. v 1 and v 6 represent the uptake for metabolites A and X, respectively. B is the sole metabolite within the biomass, and hence the flux through v 5 represents the growth rate. Blue (dash-dot) and red (solid) arrows represent reactions predicted to be active by MD-FBA and FBA, respectively, while black (dashed) arrows represent all other reactions. The figure illustrates growth on two media: (a) growth on a medium in which both A and X are present; (b) growth on a medium including only metabolite A. FBA predicts the same growth rate, which is equal to v 1 under both media, while MD-FBA predicts a growth rate equal to v 1 when both A and X are present in the medium and a growth rate equal to 0.5v1 when only A is included in the medium. The latter is due to the fact that when X is absent from the growth medium, MD-FBA cannot activate reactions v 4 and v 8 , since the dilution of metabolites C and C* cannot be satisfied under this medium. The different flux distributions predicted by the two methods lead to different predictions of enzyme essentiality, as detailed in the main text. growth media. For 10 of the 91 gene deletion strains, both FBA and MD-FBA falsely predicted zero growth across all media and these strains were removed from further analysis. The median Spearman correlation obtained by MD-FBA was found to be slightly higher than that obtained via FBA (Wilcoxon test P-value = 0.0145; Figure 2). Several limitations of the MD-FBA method currently restrict its ability to markedly improve the growth rate predictions, as discussed below. Still, in some interesting specific cases MD-FBA outperforms FBA; for example, we examined two minimal growth media, N-acetyl-D-mannosamine and N-acetyl-D-glucosamine, under which the latter yields a higher measured growth rate across all knockout strains, while FBA predicted identical growth rates for all 81 knockout strains under both media. MD-FBA, on the other hand, predicted different growth rates for 67 of the knockout strains under the two growth media, correctly predicting a higher growth rate in N-acetyl-D-glucosamine in 87% of these cases. Extending the gene essentiality analysis under these media for other genes, not included in the ASAP dataset, revealed several additional scenarios in which MD-FBA and FBA predictions significantly differ. We found that, generally, MD-FBA predicts the activation of reactions involved in co-factor biosynthesis that are not activated by FBA (the distribution of reactions whose predicted activity pattern differ between MD-FBA and FBA across various metabolic subsystems is shown in Additional file 3). For example, under succinate minimal medium, MD-FBA predicts that genes in the ubiquinone-8 biosynthetic pathway are essential for growth, whereas FBA predicts these genes to be nonessential ( Figure 3). Specifically, both methods predict that the first part of this pathway, leading to the production of the biomass metabolite 2-octaprenyl-6-hydroxyphenol (black solid edges), is essential under succinate minimal medium, while only MD-FBA predicts that the remaining part of the pathway, leading to ubiquinone-8, is activated. Ubiquinone-8 is an important redox co-factor in E. coli's aerobic respiration, switching between a reduced (q8h2) state and an oxidized (q8) state. While both FBA and MD-FBA predict the cycling of ubiquinone-8 between the reduced and oxidized states under succinate minimal medium (as part of aerobic respiration), only MD-FBA predicts the corresponding requirement for de novo synthesis of this metabolite to accommodate for its growth-associated dilution. Notably, this scenario is similar to that described in the toy model in Figure 1a, where q8 and q8h2 correspond to co-factor metabolites C and C*. As a testimony to the correctness of these predictions, we found that a gene  Context-dependent activity of biosynthetic pathways for the co-factor ubiquinone-8 (q8h2). Edges represent reactions, circular nodes represent metabolites. Black (thin) edges represent reactions predicted to be active both by FBA and by MD-FBA and green (thick) edges represent reactions predicted to be inactive by FBA and active by MD-FBA. MD-FBA correctly predicts the pathway to be activated under succinate minimal medium (where q8h2 is used in aerobic respiration) and to be inactivated under other media. FBA falsely predicts the inactivity of the pathway (downstream to 2ohph), as it does not account for the dilution demand for the production of q8h2, which is not included in its biomass reaction (as it is used only under some environments). 2ohph, 2-octaprenyl-6-hydroxyphenol; 2ombzl, 2-octaprenyl-6-methoxy-1,4-benzoquinol; 2omhmbl, 2-octaprenyl-3methyl-5-hydroxy-6-methoxy-1,4-benzoquinol; 2ommbl, 2-octaprenyl-3-methyl-6-methoxy-1,4-benzoquinol; 2omph, 2-octaprenyl-6methoxyphenol; 2oph, 2-octaprenylphenol; 3ophb, 3-octaprenyl-4hydroxybenzoate; ahcys, s-adenosyl-l-homocysteine; amet, s-adenosyl-l-methionine; atp, adenosine-3-phosphate; co2, carbon dioxide; h, hydrogen; h2o, water; nad, nicotinamide-adenine-dinucleotide; nadh, nicotinamide-adenine-dinucleotide-reduced; o2, oxygen; pi, phosphate; q8h2, ubiquinone-8-reduced.
coding for an enzyme catalyzing two reactions in the ubiquinone-8 biosynthetic pathway, ubiG, was experimentally validated to be essential for E. coli growing under succinate minimal medium [30,31]. Adding ubiquinone-8 to the biomass reaction would indeed solve the false essentiality prediction of ubiG under succinate minimal medium, but would lead to a false essentiality prediction under glucose medium, where ubiG was shown to be nonessential for growth [32] -further emphasizing the advantage of accounting for metabolite dilution in FBA. As an additional validation, we applied MD-FBA to predict gene essentiality for 1,117 genes under glucose and glycerol minimal media, based on measurements from [32] and [33], respectively. Each gene in the dataset was experimentally determined to be either essential or non-essential and the accuracy of the essentiality predictions obtained by FBA and MD-FBA was assessed via an area under curve (AUC) score of the receiver operating characteristic (ROC) curve [34]. This curve represents the true positive and false positive rates as a function of the threshold on the predicted growth rate used to determine gene essentiality (experimental and predicted datasets are available in Additional file 4). Initially, we applied MD-FBA, utilizing the same definition of a biomass as in FBA (as performed above), obtaining very similar AUC scores of 0.888/0.873 and 0.873/0.875 for MD-FBA and FBA, respectively, under glucose/glycerol. However, following further inspection, we found that 15 of the 63 metabolic precursors that make up the biomass are actually designated as co-factors by Feist et al. [6]; hence, MD-FBA is likely to be able to predict the growthassociated demand for their synthesis specifically under the conditions in which they are required, without accounting for them explicitly in the biomass definition. For example, in Figure 1, the dilution of co-factor C is correctly predicted by MD-FBA in a context-dependent manner only when metabolite X is present in the medium, as C is not fixed to be included in the biomass. Falsely including metabolite C in the biomass, although it is required in only some media, would lead to a false prediction of lethality when metabolite X is absent from the growth medium. Given that such an inclusion of co-factors in the biomass may lead to false gene essentiality predictions, their removal from the biomass is likely to improve prediction performance. In order to remove these co-factors from the biomass, we performed the following pre-processing step: in each growth condition examined, each co-factor was in turn removed from the biomass and MD-FBA was then applied to test whether a dilution is predicted for the co-factor under a subset of the gene knockout strains. The analysis revealed three co-factors (10-formyltetrahydrofolate, 2-octaprenyl-6-hydroxyphenol, flavin adenine dinucleotide oxidized (FAD)) whose dilution is dynamically predicted by MD-FBA and they were subsequently removed from MD-FBA's biomass (dilution analysis results are available in Additional file 5).
Repeating the gene essentiality analysis with the reduced biomass considerably improved the prediction performance of MD-FBA ( Figure 4). Specifically, the AUC scores achieved by MD-FBA and FBA under glucose medium are 0.910 and 0.873, respectively, and under glycerol medium are 0.893 and 0.875, respectively. As further support for the assertion that the improved prediction performance is not a mere consequence of removing unnecessary biomass precursors, we re-applied FBA using the same reduced biomass (labeled FBA-r in Figure 4), which showed no improvement over FBA's original performance. These results clearly demonstrate the added-value of considering the context-dependent nature of co-factor requirements, which can change depending on both genetic and environmental factors.

Discussion
This study presents MD-FBA, a variant of FBA for predicting metabolic flux distributions by accounting for growthassociated dilution of all metabolites in a context-dependent manner. The method predicts feasible flux distributions maximizing the production rate of a predefined biomass while accounting for the dilution of all intermediate metabolites, and most importantly, for all metabolic co-factors involved in the process. MD-FBA was shown to successfully predict E. coli's gene essentiality under a variety of growth media and knockout strains, displaying a significant improvement upon the prediction performance of the commonly used FBA method. MD-FBA has two notable limitations, which may contribute to the relatively low improvement in growth rate prediction accuracy (compared to the marked advantage in predicting gene knockout lethality). First, MD-FBA employs a uniform lower bound on the dilution rate of intermediate metabolites which, along with the absence of reactions outside the scope of the network model that degrade intermediate metabolites, implicitly reflects the assumption of a uniform concentration of all intermediate metabolites. A natural extension of MD-FBA would be to consider different lower and upper bounds on concentrations of different metabolites, based on concentration statistics gathered via metabolomic measurements across a variety of conditions (for example, [24]). Notably though, changing the lower bound employed here to a range of possible values and incorporating an upper bound on dilution rates across all metabolites did not improve the prediction performance (data not shown). Second, MD-FBA, similarly to FBA, is based on the assumption that microbial species aim to maximize their growth rate and hence search for feasible flux distributions that maximize biomass synthesis rate. However, previous studies have questioned this hypothesis, suggesting alternative possible optimization criteria. Future studies should investigate the potential usage of such optimization criteria with MD-FBA [35].
More generally, CBM methods that do not rely on optimization may also benefit from variants that account for metabolite dilution during growth. A marked disadvantage of MD-FBA is its dependence on MILP, which is computationally more demanding than LP, utilized by FBA. To improve the run-time of MD-FBA, the amount of integer variables in the MD-FBA formulation may be reduced by employing a previous method to identify the metabolic 'scope' of the medium nutrients. Specifically, Handorf et al. [36] investigated the capacity to produce metabolites from available medium nutrients by applying FBA and a network expansion algorithm, resulting in a production scope for each set of medium metabolites. A potential improvement in run-time may be achieved by calculating the scope of the input growth medium and assigning integer variables only for metabolites in that derived scope, as all the other metabolites will never be able to satisfy their dilution demand. Speeding up the runtime may be of importance when applying MD-FBA to larger networks, such as the recently published human model [16], or when probing the network under multiple knockout configurations [37,38]. An interesting comparison can be made between MD-FBA and a method developed by Price et al. [39] for eliminating futile cycles via the identification of type III extreme pathways (that is, a unique set of convex basis vectors of the flux distribution solution space that do not include exchange reactions). While the extreme pathways method enables the elimination of thermodynamically impossible loops, MD-FBA removes infeasible solutions due to dilu-tion demands. Notably, the latter method also implicitly eliminates type III extreme pathways since these pathways do not satisfy dilution demands of the participating metabolites. Additionally, MD-FBA eliminates solutions that do not involve type III extreme pathways as demonstrated in Figure 1b: when metabolite X is absent from the growth medium, the cycle involving reactions v 4 and v 8 cannot be activated based on MD-FBA, since the dilution of co-factor C cannot be satisfied, although this cycle is not part of a type III extreme pathway. Another appealing application of MD-FBA could be the identification of missing reactions in the model by comparing predicted phenotypes with measured ones, in line with previous works using FBA for this purpose [40]. For example, suppose that in Figure 1a the biosynthetic pathway for metabolite C, through reactions v 6 and v 7 , was not included in the model. In this case, MD-FBA would predict metabolic flow through reactions v 2 and v 3 , such that the enzymes catalyzing these reactions are essential, contrary to experimental essentiality data. Utilizing a method similar to that used by Reed et al. [40], using MD-FBA can infer the missing reactions, v 6 and v 7 . Employing FBA for this purpose would not work since FBA predicts v 2 and v 3 to be non-essential, as the activity of reactions v 4 and v 8 do not depend on the presence of reactions v 6 and v 7 . While this work applied MD-FBA to predict metabolic phenotypes in E. coli, for which a comprehensive and accurate metabolic network model exists, the method can also be applied to any one of a growing number of reconstructed network models [20]. Importantly, the application of MD-FBA to other network models is straightforward and requires no model-specific data curation. To facilitate simple usage of MD-FBA, we provide an implementation of the method in the supplemental website [41]. A particularly interesting potential application of MD-FBA would be for modeling malignant proliferating cells in human cancer, potentially revealing the activity of biosynthetic pathways for various co-factors required to balance their growthassociated dilution. The latter may utilize the recently published model of human cellular metabolism by [16] or [42].
Overall, we expect that future use of MD-FBA will promote improved metabolic phenotypic predictions across a variety of organisms, growth conditions and genetic alterations.

Metabolite dilution flux balance analysis
To formulate a mass-balance constraint while accounting for metabolite growth dilution, we assume that each metabolite j that is produced by a certain reaction at a rate greater than zero (referred to as an 'active metabolite') has a nonzero concentration and should hence be diluted with a rate greater than zero (denoted by d j ). To compute a feasible flux distribution, , and a corresponding vector of dilution rates, , we employ the following optimization problem:

subject to
where a mass-balance constraint, accounting for the dilution of all active metabolites, is formulated in Equation 3. Equation 4 assigns a positive dilution rate above a predefined threshold (denoted by ε) for active metabolites, produced in some non-zero rate in the flux distribution . In our application of the method for E. coli we set ε = 10 -4 μmol/mg, which represents a common concentration of intermediate metabolites [6]. Notably, the model's predictions were robust to different choices of ε values (data not shown). Enzyme directionality and capacity constraints are formulated in Equation 5 by imposing and as lower and upper bounds on flux values. The above optimization problem is solved by formulating it as a MILP problem, replacing the Equation 4 constraint with the linear equations specified below: for each metabolite j in the model, we define an integer variable y j that denotes whether the metabolite is active (that is, being produced by some non-zero reaction in the model), via the following linear constraints: where R j denotes the set of reactions in which metabolite j participates. Equation 6 is a linear formulation of the statement 'if ν i ≥ ε then y j = 1' and Equation 7 is the symmetric for negative fluxes (that is, ν i ≤ -ε). Given the variables, Equation 4 can be formulated via the following constraints: which can be represented in linear form (since εμ < 1) as: A simplified formulation assuming a constant growth rate of μ = 1 in Equation 8 (for calculating the dilution rate of intermediate metabolites) gave qualitatively similar results to the above linear formulation (data not shown). The commercial solver CPLEX running on 64-bit Linux machines was used for solving LP and MILP problems within a few dozens of seconds per problem.