The authors have declared that no competing interests exist.

Developed the mathematical models and algorithms: NV. Wrote the Introduction and Methods sections: NV. Carried out the experiments: MPP. Wrote the Related work section and Text S1: MPP. Suggested the problem and initiated the research: TS. Contributed to the methodological and experimental parts of the paper: TS. Contributed to the development of the software: NV MPP TS. Contributed to the writing of the remaining sections of the paper: NV MPP TS.

Systemic approaches to the study of a biological cell or tissue rely increasingly on the use of context-specific metabolic network models. The reconstruction of such a model from high-throughput data can routinely involve large numbers of tests under different conditions and extensive parameter tuning, which calls for fast algorithms. We present

Metabolism comprises all life-sustaining biochemical processes. It plays an essential role in many aspects of biology, including the development and progression of many diseases. Because the metabolism of a living cell involves several thousand small molecules and their conversions, a full analysis of such a metabolic network is feasible only with computational approaches. In addition, metabolism differs significantly from cell to cell and across contexts. The efficient generation of context-specific mathematical models is therefore of high interest. We present

Cell metabolism is known to play a key role in the pathogenesis of various diseases

To maximize the predictive power of a metabolic model when conditioning on a specific context, for instance the energy metabolism of a neuron or the metabolism of the liver, recent efforts go into the development of

Most algorithms for context-specific metabolic network reconstruction (see ‘Related work’ section for a short overview) first identify a relevant subset of reactions according to some ‘omics’ information (typically expression data and bibliomics), and then search for a subnetwork of the global network that satisfies some mathematical requirements and contains all (or most of) these reactions

We present

Computing a minimal consistent reconstruction from a subset of reactions of a global network is, however, an NP-hard problem

A metabolic network of _{ij}

Given a metabolic network model with stoichiometric matrix

It has been suggested that network consistency can be detected by a single linear program (LP)

Note that A appears with stoichiometric coefficient 2 in the boundary reaction →2A.

A straightforward solution to the problem would involve iterating through all reactions, computing the maximum and minimum feasible flux of each reaction via an LP that satisfies the constraints in (1). Reactions with minimum and maximum flux zero would then be blocked. This is the idea behind the FVA (Flux Variability Analysis) algorithm and the
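The reaction-by-reaction test described above can be sketched in a few lines. This is a minimal illustration rather than the fastFVA implementation: it assumes that the constraints in (1) are the steady-state condition Sv = 0 together with lower and upper flux bounds, it uses SciPy's `linprog` as the LP solver, and it runs on a hypothetical four-reaction toy network invented for this example. The names `flux_ranges`, `blocked_reactions`, `S` and `bounds` are illustrative only.

```python
import numpy as np
from scipy.optimize import linprog


def flux_ranges(S, bounds):
    """Return (min, max) feasible flux per reaction subject to S v = 0 and bounds."""
    n_mets, n_rxns = S.shape
    b_eq = np.zeros(n_mets)
    ranges = []
    for j in range(n_rxns):
        c = np.zeros(n_rxns)
        c[j] = 1.0
        low = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds)    # minimise v_j
        high = linprog(-c, A_eq=S, b_eq=b_eq, bounds=bounds)  # maximise v_j
        ranges.append((low.fun, -high.fun))
    return ranges


def blocked_reactions(S, bounds, tol=1e-6):
    """Indices of reactions whose minimum and maximum flux are both zero."""
    return [j for j, (vmin, vmax) in enumerate(flux_ranges(S, bounds))
            if abs(vmin) <= tol and abs(vmax) <= tol]


# Hypothetical toy network (assumption: all reactions irreversible):
#   R0: -> A,  R1: A -> B,  R2: B ->,  R3: A -> C  (C is never consumed)
S = np.array([
    [1.0, -1.0,  0.0, -1.0],   # A
    [0.0,  1.0, -1.0,  0.0],   # B
    [0.0,  0.0,  0.0,  1.0],   # C
])
bounds = [(0.0, 10.0)] * 4

print(blocked_reactions(S, bounds))  # [3]
```

The sketch solves two LPs per reaction, so the number of LPs grows linearly with the size of the network.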

In most problems of interest there will be no single mode that renders the whole network consistent, and an iterative algorithm like the one described in the previous section must be used. For performance reasons it would therefore be desirable to be able to establish the consistency of as many reactions as possible in each iteration of the algorithm.

Since consistency implies nonzero fluxes, it is sufficient to optimize a function that just ‘pushes’ all fluxes away from zero. Formally, this amounts to searching for modes

Here we propose an approach to approximately maximize

Returning to the network of

By construction, the above approximation of the cardinality function applies only to nonnegative fluxes. In order to deal with reversible reactions that can also take negative fluxes, we can embed LP-7 in an iterative algorithm (as in the previous section), in which reversible reactions are first considered for positive flux via LP-7, and then they are considered for negative flux. The latter is possible by flipping the signs of the columns of the stoichiometric matrix that correspond to the reversible reactions under testing, in which case the fluxes of the transformed model are again all nonnegative, and the above approximation of the cardinality function can be used. This gives rise to an algorithm for detecting the consistent part of a network that we call
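The sign-flipping step can be illustrated without an LP solver. The sketch below assumes that reversing a reaction means negating its column of the stoichiometric matrix and replacing its bounds (lb, ub) by (-ub, -lb); this is one reading of the transformation, shown on a hypothetical toy network. The helper names `flip_reactions` and `residual` are invented for this example.

```python
def flip_reactions(S, bounds, flip):
    """Return copies of S and bounds with the reactions in `flip` reversed."""
    flip = set(flip)
    S_new = [[-x if j in flip else x for j, x in enumerate(row)] for row in S]
    bounds_new = [(-ub, -lb) if j in flip else (lb, ub)
                  for j, (lb, ub) in enumerate(bounds)]
    return S_new, bounds_new


def residual(S, v):
    """Compute S v; a flux vector v is at steady state when every entry is zero."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]


# Hypothetical toy network:
#   R0: -> A,  R1: A -> B,  R2: B ->,  R3: A <-> B (reversible)
S = [
    [1, -1,  0, -1],   # A
    [0,  1, -1,  1],   # B
]
bounds = [(0, 10), (0, 10), (0, 10), (-5, 10)]

v = [1, 2, 1, -1]                 # R3 carries negative flux
print(residual(S, v))             # [0, 0]

S_flipped, bounds_flipped = flip_reactions(S, bounds, flip=[3])
v_flipped = [1, 2, 1, 1]          # same flux pattern, now all nonnegative
print(residual(S_flipped, v_flipped))   # [0, 0]
print(bounds_flipped[3])                # (-10, 5)
```

A mode with negative flux through the reversible reaction corresponds to a mode of the flipped model with positive flux through it, which is why a method restricted to nonnegative fluxes can be reused for the backward direction.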

Independently of this work, a similar approach to network consistency testing was recently proposed, called OnePrune

The reconstruction problem involves computing a minimal consistent network from a global network and a ‘core’ set of reactions that are known to be active in a given context. Formally, given (i) a

Our approach hinges on the observation that a consistent induced subnetwork of the global network can be defined via a set of modes of the latter:

This simple result allows one to cast the reconstruction problem as a search problem over sets of modes of the global network:

Note that, without loss of generality, in NLP-8 we can restrict the search for _{j}

Note, however, that MILP-9 does not scale to large networks, for the following reasons: First, it requires computing all elementary modes of the global network, which can be a very large number

An alternative search strategy for computing L_{1}-norm minimization, which is a standard approach to computing sparse solutions to (convex) optimization problems
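As a rough illustration of how norm minimization yields sparse modes, the sketch below minimizes the sum of absolute fluxes over a penalty set while forcing a core reaction to carry flux. It is not the formulation used in this work: the lower threshold `eps` on core fluxes, the auxiliary variables `z`, the function name `sparse_mode` and the toy network are all assumptions made for this example, and SciPy's `linprog` is used as the LP solver.

```python
import numpy as np
from scipy.optimize import linprog


def sparse_mode(S, bounds, core, penalty, eps=1.0):
    """Minimise sum_{j in penalty} |v_j| s.t. S v = 0, bounds, v_j >= eps for j in core."""
    m, n = S.shape
    p = len(penalty)
    core = set(core)
    # Decision variables: x = [v (n fluxes), z (p auxiliary magnitudes)].
    c = np.concatenate([np.zeros(n), np.ones(p)])
    A_eq = np.hstack([S, np.zeros((m, p))])
    b_eq = np.zeros(m)
    # Encode z_k >= |v_j| as two inequalities per penalised reaction.
    A_ub = np.zeros((2 * p, n + p))
    for k, j in enumerate(penalty):
        A_ub[2 * k, j], A_ub[2 * k, n + k] = 1.0, -1.0            #  v_j - z_k <= 0
        A_ub[2 * k + 1, j], A_ub[2 * k + 1, n + k] = -1.0, -1.0   # -v_j - z_k <= 0
    b_ub = np.zeros(2 * p)
    var_bounds = [(max(lb, eps), ub) if j in core else (lb, ub)
                  for j, (lb, ub) in enumerate(bounds)]
    var_bounds += [(0, None)] * p
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=var_bounds)
    if res.status != 0:
        raise ValueError("LP did not solve to optimality: " + res.message)
    return res.x[:n]


# Hypothetical toy network with a short and a long route from A to B:
#   R0: -> A,  R1: A -> B,  R2: A -> C,  R3: C -> B,  R4: B ->  (core)
S = np.array([
    [1.0, -1.0, -1.0,  0.0,  0.0],   # A
    [0.0,  1.0,  0.0,  1.0, -1.0],   # B
    [0.0,  0.0,  1.0, -1.0,  0.0],   # C
])
bounds = [(0.0, 10.0)] * 5

v = sparse_mode(S, bounds, core=[4], penalty=[0, 1, 2, 3])
support = [j for j, x in enumerate(v) if abs(x) > 1e-6]
print(support)  # [0, 1, 4]
```

In this toy case the penalty steers the flux through the two-step route and leaves R2 and R3 at zero, so the support of the solution contains only the core reaction and the reactions it needs.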

The overall L_{1}-norm minimization LP constrained by the set L_{1} norm of fluxes in the penalty set L_{1} norm in LP-10. For example, suppose in the network of ^{5}

Input: A consistent metabolic network model

Output: A consistent induced subnetwork

flip the sign of the

swap the upper and lower bounds of

Input: A set

Output: The support of a mode that is dense in

The

The

Several algorithms have been published in recent years for extracting condition-specific models from generic genome-wide models like Recon 1. Among them, mCADRE

| | GIMME | MBA | iMAT | mCADRE | INIT | FASTCORE |
|---|---|---|---|---|---|---|
| Optimization | LP | MILP | MILP | MILP | MILP | LP |
| Computational cost | low | high | high | high | high | low |
| Function required | yes | no | no | yes | yes | no |
| Omics required | yes | optional | yes | yes | yes | no |
| Code available | yes | yes | yes | yes | no | yes |

GIMME

iMAT

mCADRE

INIT

The closest algorithm to L_{1}-norm minimization instead of pruning. The advantage of the former is that it can be encoded by a single LP, resulting in significant overall speedups (see ‘

Generic metabolic reconstructions such as Recon 2 are inconsistent models: they contain reactions that cannot carry nonzero flux because of gaps in the network (see next section). The first step towards obtaining a consistent context-specific reconstruction is therefore to extract the consistent part of a global generic model. This can be achieved by

We report results on two sets of problems, the first involving consistency verification of an input model, and the second involving the reconstruction of a context-specific model from an input model and a core set of reactions. The

In the first set of experiments we applied

c-Yeast (

c-Ecoli (

c-Recon1 (

c-Recon2 (

The results are shown in

| | c-Yeast | | c-Ecoli | | c-Recon1 | | c-Recon2 | |
|---|---|---|---|---|---|---|---|---|
| | #LPs | time | #LPs | time | #LPs | time | #LPs | time |
| fastFVA | 2408 | 3 | 3436 | 3 | 4938 | 9 | 11668 | 207 |
| CMC | 18 | 0.5 | 25 | 1 | 49 | 2 | 42 | 11 |
| | 7 | 0.1 | 2 | 0.2 | 9 | 0.4 | 19 | 5 |

In the second set of experiments, we used the

The results for the two settings are shown in

| | liver core set ( | | | | strict liver core set ( | | | |
|---|---|---|---|---|---|---|---|---|
| | | IR | #LPs | time | | IR | #LPs | time |
| MBA | 1826 | 1573 | 72279 | 7383 | 1887 | 1630 | 71546 | 6730 |
| | 1746 | 1546 | 20 | 1 | 1818 | 1627 | 20 | 1 |

The reconstructed models by

We also compared

Shown are mean values of sizes of reconstructed models (over 50 repetitions for each core set; standard deviations were small and are omitted to avoid clutter) as a function of the size of the core set.

To evaluate

As argued above, the reconstructions obtained by

Healthy (normal homozygote), partial (heterozygote) and full knock-out cases. See text for details.

We also used the

The key advantage of having a fast reconstruction algorithm is that it permits the execution of multiple runs in order to optimize for extra parameters or test different core sets extracted from the input data

Compactness is a key concept in various research areas of biology, such as the minimal genome


We would like to thank Ines Thiele, Ronan Fleming, Nils Christian, Evangelos Symeonidis, Nathan Price, and Rudi Balling for their feedback.