Mapping the landscape of metabolic goals of a cell

Genome-scale flux balance models of metabolism provide testable predictions of all metabolic rates in an organism, by assuming that the cell is optimizing a metabolic goal known as the objective function. We introduce an efficient inverse flux balance analysis (invFBA) approach, based on linear programming duality, to characterize the space of possible objective functions compatible with measured fluxes. After testing our algorithm on simulated E. coli data and time-dependent S. oneidensis fluxes inferred from gene expression, we apply our inverse approach to flux measurements in long-term evolved E. coli strains, revealing objective functions that provide insight into metabolic adaptation trajectories. Electronic supplementary material The online version of this article (doi:10.1186/s13059-016-0968-2) contains supplementary material, which is available to authorized users.


Flux Balance Analysis (FBA)
To mathematically formulate FBA, let S denote the stoichiometric matrix of dimensions m x n where m is the number of metabolites and n the number of metabolic fluxes, x the vector of metabolic fluxes (internal and external), c the vector of coefficients expressing the cellular objective (e.g., biomass), Z opt the optimal objective value, and x lb , x ub lower and upper bounds, respectively, on the metabolic fluxes implied by empirical evidence of irreversibility or by the composition of the growth medium. The FBA problem is formulated as: , where 0 is the vector of all zeroes and primes indicates transpose.

Inverse Flux Balance Analysis (invFBA)
Let us assume we have a set of measured metabolic flux distributions x i , where i = 1,…,N. Let us also assume that, due to measurement noise, these flux distributions are not necessarily optimal, even if they are feasible solutions of the FBA problem (Eq. S1). With x* denoting an optimal solution of Eq. 1, let ! ≥ 0 denote the suboptimality gap of x i , i.e., the distance between the measured objective function value and the predicted one. This implies: It is well known that for every maximization linear programming problem like Eq. 1 we can write a corresponding Lagrangian dual problem (24): where p, q 1 , and q 2 are the dual variables corresponding to the constraint = , ≥ and ≤ respectively. The optimal objective values of (the primal) problem (Eq. S1) and the dual problem (Eq. S2) are equal (strong duality (24)). Each element p i , i=1,…, m of p can be interpreted as a shadow price for metabolite i. Assuming that none of the inequality constraints in Eq. S1 are binding, there exists a set of prices p so that the value c k of one unit of flux x ! * is equal to the sum of the prices of the metabolites that make up x ! in the proportions given by the kth column of the stoichiometry matrix S.
For any measured (and feasible) metabolic flux distribution ! , and using Eq. S2, the strong duality theorem (which ensures the equality of the optimal primal and dual objectives) [25] implies the following set of conditions: These equations guarantee that ! is near-optimal, that is, its objective cost is ! away from the optimal objective cost value of Eq. 1. Put differently, the equations above describe a set of vectors c that lead to the near-optimality of x i . This set of c's is a cone C, namely, a set that contains all non-negative multiples of its elements. This conclusion is quite intuitive since it suffices to determine the objective coefficient vector up to a non-negative multiplicative constant. Fig. 1 illustrates the conic structure of the set of c's that validate the optimality of a given metabolic flux distribution.
So far, we have characterized a set of c vectors that are consistent with the measurements x i . It is important to note that there are many valid cellular objectives (an infinite number of c vectors in the cone C) that are consistent with the measured flux distribution. Essentially, FBA modeling cannot lead to unique inference of the cellular objective from measured fluxes. Collecting the constraints in Eq. S3 for all measurements i = 1,…,N and minimizing the total sub-optimality gap, we obtain the following optimization problem (see Appendix A for further details): We can interpret the above linear program as seeking a vector c that makes all measurement flux vectors x i as close as possible to optimal flux distributions in the FBA problem (Eq. S1).
The problem in Eq. 4 has the trivial solution c = 0, which is reasonable given our observation that we can only determine the objective coefficient vector up to a non-negative multiplicative constant. It follows that we need to introduce some form of regularization to restrict c to non-trivial choices.
One possible regularization is to add to the formulation in Eq. S4 the L2-norm equality constraint ||c|| 2 = 1. In this case, the objective coefficient vector c lies on the surface of the unit ball in R n . To gain more geometric insight into the proposed L2-regularized invFBA, consider the case of a single measured flux vector, say x (i.e., N = 1). Solving the problem in Eq. 4 amounts to minimizing c′(x j -x) over all c and all extreme points x j of the FBA polyhedron (feasible set of Eq. 1). We have where α is the angle between c and x j -x and ||x j -x|| cosα is the projection of x j -x onto c. Thus, and since ||c|| = 1, minimizing Eq. S5 over c is equivalent to projecting x on all facets of the FBA polyhedron and selecting the c that is perpendicular to the closest facet. As an example, in Fig. S3 (right), we compare the distances d j , j = 1,…,4, between x and the four facets, which yields d 1 as the minimum and sets the corresponding optimal objective coefficient vector to c 1 .
Solving the FBA problem in Eq. S1, in practice, often leads to multiple optimal solutions. To select a unique optimal solution from the optimal solution set, a second optimization is required. In particular, one minimizes the L1-norm of the metabolic flux distribution subject to the constraints of Eq. S1 and an additional constraint that guarantees the same objective value is achieved. The formulation is: An L1-norm is appealing because of its sparsity-inducing properties, which can help the biological interpretation of the solution. An L1-norm constraint can also be seen as a relaxation of the combinatorial problem that minimizes the number of nonzero elements of c in Eq. S4. In the same spirit, we propose the regularization constraint ! ! !!! = 1 and add it to Eq. S4, which leads to the formulation: Again motivated by the second step of FBA in Eq. S6, we propose a subsequent step in invFBA to minimize the L1-norm of c = (c 1 ,…,c n ) vectors that solve Eq. S6: Part of the optimal solution of Eq. S8 is a sparse c vector that renders the given set of measured metabolic flux distributions x 1 ,…,x N near-optimal in the FBA optimization (Eq. S1). One can then interpret non-zero elements of c as corresponding to important metabolic fluxes that are critical in the FBA optimization context and provide a minimal description of the cellular objective function. In the sequel, when we refer to the invFBA algorithm, we mean the two-step procedure of solving problems Eq. S7 and Eq. S8. Due to the L1 regularization and the use of multiple flux vectors x i as inputs to invFBA, the resulting c may not be perpendicular to one of the hyperplanes defining the FBA polytope; it can in fact be interior to the cone C containing all valid c's.
Problem [S8] is a linear programming problem and it can be viewed as the convex relaxation of a problem that minimizes the L0-norm of c (i.e., the number of nonzero elements of c) instead of minimizing the L1-norm of c. To pursue more sparse objective functions, an integer programming problem is introduced to minimize the L0-norm of c. This problem is formulated as: 1], the binary variable ! is the indicator of whether ! is nonzero (then ! = 1) or not (then,

LASSO version of invFBA
Here we provide an alternative LASSO version of invFBA that can be used instead of Eq. S7 and Eq. S8.
The key idea is to add a sparsity-inducing L1 penalty for c in the objective. The formulation is: where is a tunable parameter that controls the sparsity of c.

Objective Variability Analysis
To analyze the variability of each element in the objective function within the optimal solution space, we developed a method called Objective Variability Analysis (OVA). OVA is a heuristic optimization method used to compute the upper and lower bound of the elements in the objective function. The first step of OVA is to solve the linear optimization problem given in Eq .S8.
By solving this problem, we can obtain an optimal solution for c and the optimal objective value ! ! !!! = Z !"# ! . Let R be the set of reaction indexes which are important in the FBA problem. To find the possible range for each c r , r ϵ R , we solve and an identical problem with the only difference being that we maximize ! -| | (instead of minimizing) so as to find the largest possible value of c r and maintain a small L1 norm of c . Solving these problems yields upper and lower bounds on each elements in the objective function.
To apply OVA in practice, some extra constraints on c should be added to Eq. S10. Consider reactions represented in the flux vectors ! which have a flux equal to zero for all . For these reactions, the corresponding elements c j in c can take arbitrary feasible values because of the term ! ! in Eq. S10. For this reason, it is meaningless to run OVA on these c j. To that end, we set c j =0 for all those indices. In the case study involving simulated E. coli data, since the flux distributions are very sparse, we applied this technique and focused on the non-trivial reactions only.

Generation of noisy flux sets
In order to generate simulated feasible flux vectors around a defined point, containing a given amount of noise, we devised the following optimization problem: where r is a random objective function, x i is the noisy flux distribution, x* is the pre-computed optimal flux distribution, and σ 2 denotes the largest Euclidean distance between optimal flux and noisy flux. Changing the value of σ 2 yields different magnitudes of noise.