Sampling the Solution Space in Genome-Scale Metabolic Networks Reveals Transcriptional Regulation in Key Enzymes

Genome-scale metabolic models are available for an increasing number of organisms and can be used to define the region of feasible metabolic flux distributions. In this work we use as constraints a small set of experimental metabolic fluxes, which reduces the region of feasible metabolic states. Once the region of feasible flux distributions has been defined, a set of possible flux distributions is obtained by random sampling and the averages and standard deviations for each of the metabolic fluxes in the genome-scale model are calculated. These values allow estimation of the significance of change for each reaction rate between different conditions and comparison of it with the significance of change in gene transcription for the corresponding enzymes. The comparison of flux change and gene expression allows identification of enzymes showing a significant correlation between flux change and expression change (transcriptional regulation) as well as reactions whose flux change is likely to be driven only by changes in the metabolite concentrations (metabolic regulation). The changes due to growth on four different carbon sources and as a consequence of five gene deletions were analyzed for Saccharomyces cerevisiae. The enzymes with transcriptional regulation showed enrichment in certain transcription factors. This has not been previously reported. The information provided by the presented method could guide the discovery of new metabolic engineering strategies or the identification of drug targets for treatment of metabolic diseases.


Introduction
Systems Biology aims to use mathematical models to integrate different kinds of data in order to achieve a global understanding of cellular functions. The data to be integrated differ both in their nature and measurability. The availability of DNA microarrays allows for the comparative analysis or mRNA levels between different strains and conditions. These data provide genome-wide information, and changes in expression at different conditions are expressed in statistical terms such as p-values or Z-scores that quantify the level of significance in transcriptional changes. The availability of annotated genome-scale metabolic networks allowed mapping of the transcriptional changes in metabolic genes on to their corresponding metabolic pathways and defining significantly up or down regulated sub-networks [1]. Even though this allows for identification of transcriptional hot-spots in metabolism, this does still not provide information about whether there are any changes in metabolic fluxes in these pathways, as it has been shown that in general there is no clear correlation between gene expression and protein concentration [2] or metabolic flux [3,4].
Metabolic fluxes are the result of a complex interplay between enzyme kinetics, metabolite concentrations, gene expression and translational regulation. Metabolic fluxes can be directly measured using 13 C labeling experiments [5]. However, flux data obtained using this approach differ from gene expression data in two main features: 1) their determination is only possible for a relatively small subset of all the reactions in a genome-scale metabolic network and 2) they are indirect data in the sense that the fluxes are quantified obtained by fitting measured labeling patterns using a simple metabolic model. The complexity of the mRNA-flux dependence and the disparity in the nature of both kinds of data make their integration an important challenge.
In this paper we propose a method to integrate gene expression data with flux data by transforming a limited amount of quantitative flux data into a genome-scale set of statistical scores similar to the one obtained from DNA microarrays. In order to do that, a set of experimental exchange fluxes are fixed for each of the studied conditions or for each of the strains investigated, and a sampling algorithm is then used to obtain a set of flux distributions satisfying the experimental values. This approach allows for obtaining means and standard deviations for each flux in the genome-scale network. From the mean and standard deviation it is possible to derive statistical scores for the significance of flux change between conditions [6,7]. Random sampling in the region of feasible flux distributions has been previously used to study the statistical distribution of flux values and determine a flux backbone of reactions carrying high fluxes [8] as well as to define modules of reactions whose fluxes are positively correlated [9,10]. Also mitochondria related diseases have been analyzed using random sampling [11]. All the works published so far used the Hit and Run algorithm to perform the sampling [7].
By dividing the average difference among two conditions (e.g. carbon sources or mutant strains) by its standard deviation, it is possible to obtain Z scores for each metabolic flux. These scores can be transformed into p-values that measure the significance of change of each flux (see methods). By comparing these p-values with the p-values derived from gene-expression arrays, the enzymes in the network can be classified as: 1) enzymes that have a significantly correlated change both in flux and expression level (reactions showing transcriptional regulation), 2) enzymes that show a significant change in expression but not in flux (we will refer to them as showing post-transcriptional regulation) and 3) enzymes that show significant changes in flux but not a change in expression (metabolic regulation). Hereby we provide a framework that allows for global classification of reaction fluxes into those that are transcriptionally regulated, post-transcriptionally regulated and metabolically regulated (see Fig. 1). This will have substantial impact on the field of metabolic engineering where changes in gene-expression are often used as the key means to alter metabolic fluxes. In the paper we show the use of the presented framework for the analysis of the yeast Saccharomyces cerevisiae grown at different growth conditions and for the analysis of different deletion mutants.
The combined use of random sampling of genome-scale metabolic networks and expression data allows for global mapping of reactions that are either transcriptionally or metabolically regulated. This information can be used to guide the engineering of microbial strains or as a diagnosis tool for studying metabolic diseases in humans. In particular we should highlight that reactions in which there is no relation between gene transcription level and metabolic flux are not suitable targets for flux increase via gene over-expression. Through analysis of different data sets the method revealed that many changes in gene expression are not correlated with a corresponding change in metabolic fluxes. The use of gene-expression data alone can therefore be misleading. However, our method allowed for identification of many specific reactions that are indeed transcriptionally regulated, and we further identified that the expression of these enzymes is regulated a few key transcription factors. This fact suggests that the

Author Summary
The sequencing of full genomes and the development of high-throughput analysis technologies have made available both genome-scale metabolic networks and simultaneous transcription data for all the genes of an organism. Genome-scale metabolic models, with the assumption of steady state for the internal metabolites, allow the definition of a region of feasible metabolic flux distributions. This space of solutions can be further constrained using experimental flux measurements (normally production or uptake rates of external compounds). Here a random sampling method was used to obtain average values and standard deviations for all the reaction rates in a genome-scale model. These values were used to quantify the significance of changes in metabolic fluxes between different conditions. The significance in flux changes can be compared to the changes in gene transcription of the corresponding enzymes. Our method allowed for identification of specific reactions that are transcriptionally regulated, and we further identified that these reactions can be ascribed to a few key transcription factors. This suggests that the regulation of metabolism has evolved to contain a few flux-regulating transcription factors that could be the target for genetic manipulations in order to redirect fluxes. regulation of metabolism has evolved to contain a few fluxregulating transcription factors that could be the target for genetic manipulations in order to redirect fluxes.

Comparisons between different sampling methods
Here we propose a sampling method that finds extreme solutions among the feasible flux distributions of the metabolic network. These solutions correspond to the corners in the region of allowed flux distributions, and in mathematical terms they are elements of the convex basis of the region of feasible solutions (which is a convex set). The COBRA Toolbox [12] includes a random sampling option that uses the Hit and Run algorithm [13] to obtain points uniformly distributed in the region of allowed solutions. The difference between the two sampling methods is illustrated in Fig. 2.
In order to assess the accuracy of our sampling method to estimate the average fluxes and their standard deviations, we compared a set of internal fluxes measured with 13 C labeling [14] with predictions using 500 sampling points obtained using the sampling method in the convex basis and 500 sampling points obtained using the sampling algorithm implemented in the COBRA Toolbox. The results are summarized in Table 1 where our method is labeled Convex Basis (CB), because it samples elements of the convex basis of the region of allowed solutions (see above), and the method from the COBRA Toolbox is labeled Hit and Run (HR). The Z values in the table are the number of standard deviations that the real value is deviating from the calculated mean.
The means obtained by the two sampling methods are very similar for most of the reactions; however the standard deviations found using the HR algorithm are significantly smaller. With the HR method the real values for the fluxes in many cases deviate several standard deviations from the mean, A high value of Z indicates that the real value has a very low chance of being obtained using the considered sampling method (or in other words: the real value does not belong to the family of solutions that is generated by the sampling method). The number of samples with the HR algorithm was increased up to 5000 to check possible effects of the sample size on the standard deviation. Only small increases were observed for the standard deviations of the studied fluxes.
Using the CB algorithm we obtain higher standard deviations and the real flux is for most reactions less than one standard deviation away from the mean flux. We can therefore conclude that the CB sampling method gives more realistic standard deviations for the fluxes. This is important if we want to compare the significance of flux changes between conditions. An underestimated standard deviation would make some flux changes appear as being significant even though they may not be in reality, and our method therefore gives a more conservative list of significantly changed reaction fluxes than the HR algorithm.

Comparisons between different carbon sources and mutant strains
To evaluate our method we used data for the yeast S. cerevisiae. Data from growth on four different carbon sources (glucose, maltose, ethanol and acetate) in chemostat cultures and five deletion mutants (grr1D, hxk2D, mig1D, mig1Dmig2D and gdh1D) grown in batch cultures were used. The exchange fluxes and gene expression data for the mentioned conditions have been published earlier [15][16][17].
Our method obtains probability scores for each enzyme in the metabolic network (see methods) and this allowed us to classify the enzymes as transcriptionally regulated (correlation between flux and gene expression), post-transcriptionally regulated (changes in gene expression don't cause changes in flux) and metabolically regulated (changes in flux are not caused by changes in geneexpression). The cut-off chosen for this classification was a probability score above 0.95. Tables 2 and 3 show the 10 top scoring enzymes in each group (or fewer when less than 10 enzymes had a score exceeding 0.95). The method is illustrated in Fig. 3.
The method to identify the significance of flux changes relies on a set of measured external fluxes, and in some cases strains that don't show significant changes in external fluxes have changes in internal fluxes [18]. These changes cannot be identified with our method, and our estimations of the significance of flux changes can therefore be seen as conservative estimates. The lists of transcriptionally and metabolically regulated reactions are therefore more reliable than the list of post-transcriptionally regulated reactions (in which some fluxes may be changed in reality but their change pass undetected).
The reactions showing transcriptional regulation form a set of putative targets where enzyme over-expression or down regulation will influence the flux through these reactions. The reactions showing metabolic regulation points to parts of the metabolism where the pools of metabolites are possibly increasing or decreasing in connection with transcriptional changes and hereby counteracting possible changes in enzyme concentration as a result of transcriptional changes. This knowledge can be used to identify whether one should target changes in enzyme concentration (v max changes), e.g. through over-expression, or changes in enzyme affinity (K m changes), e.g. through expression of heterologous enzymes, in order to alter the fluxes.
Effects of different carbon sources. In the glucose to maltose transition, only two enzymes showed transcriptional change correlated with their flux. The a-glucosidase MAL32, responsible for the breakdown of maltose into glucose was upregulated and the glucose transporter HXT4 was down-regulated. The metabolic adjustment in terms of fluxes was minimal and only enzymes directly related with substrate uptake and utilization were detected. The changes in gene expression were also low and only 11 metabolic enzymes were significantly perturbed (without significant flux changes).
The glucose to ethanol and the glucose to acetate transitions showed widespread flux and expression changes, and they are therefore more interesting study cases. In the glucose-ethanol transition 19 enzymes showed transcriptional regulation and 22 other enzymes changed in expression but not in flux. For the glucose-acetate transition the same numbers were 33 and 23 respectively. We can see that about one half of the genes that changed in transcription level also showed significant flux changes, this proportion is higher than in the case of deletion mutants (as it will be discussed later). Among the enzymes showing transcriptional regulation, 14 were shared between the glucose-ethanol and glucose-acetate transitions. Interestingly, no overlap was found between the sets of enzymes that don't change in flux. Metabolic regulation was observed in 21 reactions for each case, among which 8 overlap.
The enzymes showing transcriptional regulation clearly show a down regulation of enzymes involved in glucose uptake and utilization (e.g. Glucose transporter HXT4 or Hexokinase 2) and the up-regulation of enzymes involved in the gluconeogenesis (e.g. Fructose-1,6-biphosphatase) and the TCA cycle (e.g. Succinate dehydrogenase or Citrate synthase). The AcCoA synthetase 2, responsible for supplying AcCoA to the TCA cycle is also transcriptionally up-regulated as well as the ATP synthetase (involved in the respiratory chain) and the external NADHubiquinone oxidoreductase 2, which supplies the necessary NAD + to oxidize ethanol or acetate in the cytoplasm and maintain the redox balance in the cell. Isocitrate lyase, a key component of the glyoxylate cycle, is also transcriptionally up regulated and this allows for net formation of malic acid that can further be converted to phopshoenolpyruvate (via oxaloacetate) that fuels the gluconeogenesis. All these changes in fluxes are consistent with general knowledge about the changes in metabolism from glucose to C2 carbon sources like ethanol and acetate, but what is interesting to see is that not all the reactions associated with these flux changes are transcriptionally regulated, but the cell have selected a few key reactions to regulate at the transcriptional level and these are identified using our method.
In order to make a deeper analysis, we performed an enrichment test to compare the transcription factors involved in the expression of the enzymes showing transcriptional regulation and the enzymes showing changes in expression but not in flux. We found three transcription factors that were strongly overrepresented in the metabolic genes showing transcriptional regulation. In the glucose-ethanol transition, the transcription factors Gcr1 and Gcr2 both appeared in 11 transcriptionally regulated genes and in none of the other genes, whereas the transcription factor Hap4 appeared in 11 transcriptionally regulated genes and 5 of the other regulated genes. For the glucose-acetate transition these numbers were 15-0, 11-0 and 15-0 for the same transcription factors. This means that certain transcription factors are especially involved in the transcriptional regulation of metabolic fluxes (the same kind of enrichment was observed in the deletion mutants, as will be discussed later), and to our knowledge this has not been previously reported. It basically implies that there is global regulation of major flux alterations, which is similar to what has experimentally been shown to be the case for galactose metabolism [19].
The top scoring metabolically regulated reactions, both for glucose-ethanol and glucose-acetate, are the Fructose biphosphate aldolase and the Triosephosphate isomerase. These reactions are known to operate close to the equilibrium and are therefore very sensitive to changes in the metabolic pools, which is consistent with metabolic regulation of the fluxes. In the considered cases the direction of these reactions is inverted. This can only be explained by a decrease in the fructose-1,6-diphosphate pool and an increase in the glyceraldehyde-3-phosphate and dihydroxyacetone pools. This hypothesis is supported by the fact that in chemostat cultures, there was not found any correlation between the glycolytic flux and the expression of the genes encoding these two enzymes [20]. The results for the glucose-ethanol change are summarized in Fig. 4.
Effects of gene deletions. As mentioned above several different deletion strains were evaluated and the number of enzymatic reactions showing transcriptional regulation was 26 for the grr1D strain, 25 for the hxk2D strain, 11 for the mig1D strain, 8 for the mig1Dmig2D strain and 0 for the gdh1D strain. The reactions showing post-transcriptional regulation were 73, 70, 46, 36 and 89 for the same strains, respectively. These numbers clearly show, that in contrast to growth on different carbon sources, most of the transcriptional changes do not result in correlated changes in metabolic fluxes. This indicates that many transcriptional changes are indeed happening in order to minimize the metabolic adjustment resulting from a gene deletion, and the most extreme case is the gdh1D strain, where no transcriptional changes seem to be correlated to flux changes. This behaviour is consistent with the MOMA (Minimization of Metabolic Adjustment) hypothesis [21].
There is a substantial overlap between the strains grr1D and hxk2D, 15 transcriptionally regulated reactions were shared between both strains and 28 post-transcriptionally regulated reactions were also shared. An enrichment test was performed in order to find    Pho2 and Bas1 are partner proteins that regulate the transcription of genes involved in purine and histidine biosynthesis [22]. It is possible that the slower growth rate observed in grr1D and hxk2D with respect to the wild type is due to a down-    regulation of purine and histidine biosynthesis resulting from lower activities of Pho2 or Bas1.
Strains grr1D and hxk2D show specific growth rates of 0.23 and 0.22 h 21 respectively [16], however the biomass yields were 0.09 g-DW g 21 and 0.2 g-DW g 21 . The specific glucose uptake rate for the hxk2D strain is significantly lower than for the grr1D strain as well as for the reference strain. This is associated with the observation that the glycolytic flux in the hxk2D strain shows transcriptional downregulation of five enzymes. In the upper glycolysis, the Hexokinase 2 has been deleted and the Phosphofructokinase 1 is down-regulated. The Phosphofructokinase 1 was also strongly down-regulated in the grr1D strain but the decrease in flux was not as large as in the hxk2D strain. In the lower glycolysis of the hxk2D, all the three iso-enzymes of Phosphoglycerate mutase were down-regulated as well as the phosphoglycerate kinase. No down-regulation for these enzymes was seen in the grr1D strain. The glycerol-3-phosphate dehydrogenase has two iso-enzymes. The first of those isoenzymes was upregulated in the grr1D and the hxk2D strains; however its expected flux decreased in both cases (more significantly in hxk2D). The second iso-enzyme did not show important changes in grr1D but was down-regulated in hxk2D. This is not the only case in which different iso-enzymes show different regulatory patterns, and in these cases our method for having flux estimations independent from transcriptome analysis is particularly useful.
The strain hxk2D showed a strong decrease in ethanol production compared to grr1D. All the alcohol dehydrogenase iso-enzymes were down-regulated in a similar way in both strains. The explanation of the differences in flux towards ethanol should be found in the pyruvate decarboxylase. Pyruvate decarboxylase 3 was strongly down-regulated in both strains, however, pyruvate decarboxylase 2 was up-regulated. This up-regulation was much more significant in grr1D, which could explain a higher flux from pyruvate to AcCoA in this strain. The results for the hxk2D mutant are summarized in Fig. 5.
The mig1D mutant shows a higher specific growth rate than the wild type. In general it is transcriptionally very similar to the wild type [16]. An enrichment test in transcription factors between transcriptional regulated and post-transcriptional regulated reactions was performed and the factor Sfp1 was found. This factor is known to regulate ribosome production and is nutrient sensitive [23]. This could mean that the deletion of MIG1 activates a response against starvation that results in an increased specific growth rate. Among the transcriptionally regulated reactions, a slight down-regulation of the PP pathway is observed together with an up-regulation of several amino-acid production pathways.
In the mig1Dmig2D mutant there is a slight decrease in the specific growth rate. All the 8 transcriptionally regulated reactions were down-regulated and belonged to amino-acid biosynthesis pathways. The enrichment test found the factors Cbf1 and Gcn4 (represented in 4 and 1 out of 8 reactions and 5 and 6 out of 36 reactions). Gcn4 is known to regulate amino-acid biosynthetic genes [24] and it seems that the up-regulation of amino-acid biosynthesis due to the deletion of MIG1 is cancelled by an opposite effect due to MIG2.
In all the mutants discussed above, the AcCoA carboxylase and the fatty acid synthesis showed metabolic regulation. This could indicate that in the studied cases the AcCoA pool was the main parameter responsible for adjusting the rate of lipid biosynthesis to match the changes in specific growth rates.
The experiments for the gdh1D mutant were performed in chemostat cultures [17] with the same dilution rate. The only observed change in exchange fluxes was a small decrease in glycerol production and only a few significant changes were identified in the metabolic fluxes. However, there were significant transcriptional changes in many metabolic pathways. This again points to the hypothesis that changes in transcription mainly results in altered metabolite levels such that metabolic homeostasis can be maintained. This is supported by metabolome analysis of this mutant, which showed that there were many changes in the metabolite levels [25] and in fact many of these changes were associated with changes also in the transcription of associated enzymes [26]. It is possible that the chemostat conditions, by imposing the same specific growth rate, forced the mutant strain to important transcriptional changes in order to keep the fluxes unchanged.
In Fig. 6 we aim to provide a global visualization of the changes for all the studied metabolic conditions.

Sampling in the region of feasible solutions
The steady state condition and the irreversibility of some reactions impose limitations on the flux distributions attainable by the cell [18]. The set of feasible solutions can be further constrained by fixing some fluxes to their experimental values. In general, the fluxes most accessible to experimental determination are those corresponding to uptake or secretion rates. After fixing a subset of fluxes, genome scale models still have a large number of degrees of freedom. In this study we used the genome scale model iFF708 for S. cerevisiae [27]. Random sampling has previously been performed [7] by enclosing the region of allowed solutions in a parallelepiped with the same dimensions as solution space (the null space of the stoichiometric matrix) and generating random points inside this parallelepiped. The points that lie inside the region of possible solutions are then selected. The COBRA Toolbox [12] uses a Hit and Run algorithm to generate random points in this way. In this work instead of sampling inside the region of allowed solutions we sampled at its corners.
In order to obtain corners in the space of allowed solutions we used the simplex method with a random set of objective functions to be maximized. The maximization of each of these objective functions will give a corner in the space of solutions. The constraints imposed upon each optimization are: The values of the measured fluxes (v exp ) are different between conditions. This fact changes the shape of the region of feasible solutions between different conditions. S is the stoichiometric matrix of the network. In order to reduce the effects of internal loops we first identified all the reactions that can get involved in loops using the FVA (Flux Variability Analysis) option in the COBRA Toolbox. The reactions that can be involved in loops are unbounded and show the default maximal or minimal value set in the COBRA Toolbox (1000 or 21000). If these bounds were kept, the means and standard deviations for these reactions would be unrealistic [6] and cannot be used for further analysis. In order to reduce the effect of loops, the default maximal and minimal fluxes for the reactions involved in loops, were set to a smaller value in order to reduce the loop effect. In order to select an appropriate value the bounds were increased from 0 in steps of 0.1 until the minimal value that allows obtaining flux distributions consistent with the experimental fluxes is found. These values went from 1 to 15 mmol h 21 g-DW 21 depending on each condition. Also no weights (eq. 4) were assigned to the reactions involved in loops in order to avoid objective functions that maximize the activity of loops.
Random objective functions were generated by selecting random pairs of reactions and assigning them random weights (the reactions involved in loops were excluded from these choices). The weights (w i ) assigned to each reaction were generated by dividing a random number between 0 and 1 by the maximal flux for this reaction obtained using FVA. This normalization was made to account for the different size orders of the different reactions. The objective functions take the form: One solution is obtained for each of the objective functions generated.
Our objective is to obtain means and standard deviations for each flux in each of the compared conditions and use them to get a Z-score quantifying the significance of change in each flux between the considered conditions. This score is equal to the difference between the means in each of the conditions divided by the standard deviation of this difference (note that the variance of the difference is the sum of the two variances and the standard deviation its square root).
The difference between averages in the numerator follows a normal distribution (according to the central limit theorem) with a standard deviation equal to the standard deviation of the flux (the denominator in eq. (5)) divided by the square root of the number of samples. Therefore, Z itself follows a normal distribution with a standard deviation equal to the inverse of the square root of the number of samples. The Z score measures the significance of change in terms of standard deviations. If the error in the Z score is lower than 0.15, no information would be lost in terms of classifying a reaction as significantly changed or not. The order of size of a genome-scale model is about 1000 reactions. A reasonable accuracy for the Zscores would be to expect errors higher than 0.15 on the Z score only for 1 reaction in the whole model. This means a p-value of 0.001. If we want to keep the error on the Z score under 0.15 with a probability of 0.999 we need 500 samples, and this was therefore selected as the sampling number.

Classification of enzymes according to their changes in flux and expression level
The Z-scores can be transformed into probabilities of change by using the cumulative Gaussian distribution. Once we have Z-  scores for the significance of flux changes and Z-scores for the significance of gene-expression changes we can obtain probabilities of having correlated expression and flux changes for each enzyme.
An increase in enzyme expression can result in an increase of flux (transcriptional regulation). In order to evaluate the probability for a reaction of being transcriptionally regulated we multiply the probability of its enzyme level changing by the probability of its flux changing in the same direction (obtained using the cumulative normal distribution).
If there is a decrease in expression and a decrease in flux, both Zscores are negative and we will use the absolute values of the Zs in eq. (6). If there is an increase in expression and a negative flux becomes more negative, we will use the absolute value of the Zscore for the flux change. If the direction of the flux changes between conditions, this change must be driven by the metabolic concentrations and no by transcriptional regulation, therefore a P tri of zero is assigned by default.
In the same way as in eq. (6) we can define probabilities for the expression level changing and for the flux not changing (post transcriptional regulation).
Now we use the error function because we want to evaluate the probability of change in any direction. The absolute value of Z is used in all the cases. The probability of a change in flux but not in transcription (metabolic regulation) can be obtained for each reaction as follows: Each of these three probabilities can be associated to each enzyme in the metabolic network. Table 4 summarizes the criteria to assign each type of regulation.