Physiologically Shrinking the Solution Space of a Saccharomyces cerevisiae Genome-Scale Model Suggests the Role of the Metabolic Network in Shaping Gene Expression Noise

Sampling the solution space of genome-scale models is generally conducted to determine the feasible region for metabolic flux distribution. Because the region for actual metabolic states resides only in a small fraction of the entire space, it is necessary to shrink the solution space to improve the predictive power of a model. A common strategy is to constrain models by integrating extra datasets such as high-throughput datasets and C13-labeled flux datasets. However, studies refining these approaches by performing a meta-analysis of massive experimental metabolic flux measurements, which are closely linked to cellular phenotypes, are limited. In the present study, experimentally identified metabolic flux data from 96 published reports were systematically reviewed. Several strong associations among metabolic flux phenotypes were observed. These phenotype-phenotype associations at the flux level were quantified and integrated into a Saccharomyces cerevisiae genome-scale model as extra physiological constraints. By sampling the shrunken solution space of the model, the metabolic flux fluctuation level, which is an intrinsic trait of metabolic reactions determined by the network, was estimated and utilized to explore its relationship to gene expression noise. Although no correlation was observed in all enzyme-coding genes, a relationship between metabolic flux fluctuation and expression noise of genes associated with enzyme-dosage sensitive reactions was detected, suggesting that the metabolic network plays a role in shaping gene expression noise. Such correlation was mainly attributed to the genes corresponding to non-essential reactions, rather than essential ones. This was at least partially, due to regulations underlying the flux phenotype-phenotype associations. Altogether, this study proposes a new approach in shrinking the solution space of a genome-scale model, of which sampling provides new insights into gene expression noise.


Introduction
As a powerful tool for biological interpretation and discovery, over 100 genome-scale metabolic networks for more than 35 organisms have been reconstructed (see http://gcrg.ucsd.edu/ InSilicoOrganisms/OtherOrganisms). These models have been widely used in molecular evolution studies [1][2][3], genome annotation [4,5], metabolic engineering [6][7][8][9], and cellular phenotype predictions [10][11][12]. The broad applications of genome-scale metabolic network reconstructions largely owe its success to a constraint-based modeling strategy. Instead of developing a system of partial differential equations such as that observed in kinetic models, the constraint-based modeling method converts a metabolic network into a stoichiometric matrix [13]. This matrix ensures balance of metabolites flow throughout the network, thus resulting in hundreds of mass balance constraints. These, together with the upper and lower bounds of the metabolic reactions in the network, define a solution space where actual flux distributions reside [14]. However, in reality, the metabolic behavior of a cell is under complex physiological regulations, many of which have not been reflected by the constraints mentioned above. This implies that the region for biologically relevant flux distribution is represented by small fraction of the entire solution space of a constraint-based model (CBM). Therefore, shrinking the predicted solution space is necessary to improve the predictive ability of a CBM.
Diverse constraint-based reconstruction and analysis (COBRA) methods have been developed to probe the solution space [15]. These could be basically classified into two groups, namely, unbiased and biased methods. Unbiased methods, like the Monte Carlo Markov Chain sampling method, are mainly used to characterize all possible flux distributions within the solution space. However, cells do not utilize most of these flux distributions [16]. On the other hand, Flux Balance Analysis (FBA) is the most extensively employed in biased methods. By optimizing an objective function, FBA attempts to capture the region where real flux distributions are most likely to reside [17]. FBA is capable of performing quantitative predictions with relatively high accuracy. During the past decades, the explosive growth of omics data offered a good opportunity to reduce the solution space of a CBM by integrating experimental datasets, including gene expression data [12,18], proteomic data [19], and metabolite concentration data [20]. In particular, algorithm innovations related to transcriptomic data integration have undergone substantial progress in the past decades [21]. Recently, Herrgård's group [22] systematically evaluated the predictive capabilities of eight classic methods. The results showed that in many conditions, these methods do not outperform simple FBA by using growth maximization and parsimony criteria when making phenotype predictions. Other studies have also described the limitations of the strategy [12,23]. Therefore, alternative approaches in locating real flux distributions are warranted.
Extensive experimental evidences suggest that cellular phenotypes could be synergistically regulated by global biological regulatory factors under certain rules [24][25][26][27]. Compared to mRNA levels, protein abundance, and metabolite concentration in cells, metabolic flux levels are more capable of reflecting cellular phenotypes [28]. Therefore, phenotype-phenotype correlations are very likely to be observed at the metabolic flux level. Exploring metabolic flux phenotype correlations would benefit certain bounds refinements of reactions, further contributing to CBM solution space shrinkage. Experimentally identified metabolic flux data on S. cerevisiae has been rapidly accumulating in the past decades. It has laid a solid foundation on flux phenotype correlation analysis; however, this kind of data has not been extensively utilized. In the present study, a meta-analysis of experimental flux data from publications has been performed to shrink the solution space of the yeast CBM.
A refined model benefits biological discoveries. In the present study, we applied the refined solution space of a S. cerevisiae model to explore the influence of the metabolic network on gene expression noise. Gene expression noise is defined as the stochastic fluctuation of a gene's expression at the protein level between isogenic cells under identical environment [29]. It is very important to understand the regulatory mechanism of gene expression noise because small variations at the noise level could result in a dramatic change at the phenotype level [30]. Previous studies have indicated that gene expression noise is regulated by several factors, including the structure of genes [31,32], gene essentiality [31], gene expression level [30,33], transcription/translation rate [34][35][36], chromatin remodeling [31,37], and regulation network [38][39][40]. A recent study indicated that gene expression noise could propagate and cause fluctuations in growth through corresponding metabolic reactions [41]. However, the interaction between gene expression noise and the metabolic network remains elusive.
In the present study, we first determined the correlations among several metabolic flux phenotypes through a systematic review of experimentally measured metabolic flux data. The correlations were quantified and imposed into a S. cerevisiae CBM to refine its solution space. Using the refined model, we examined whether the metabolic network influences expression noise of enzyme-coding genes.

Correlated metabolic flux phenotype quantification
All the metabolic flux data we collected were measured under the conditions that the yeast strains were cultivated aerobically with glucose as the only carbon source (see Materials and Methods; S1 Dataset). Because sugar uptake rate plays a central role in shaping the global metabolic system under a carbon-limited condition [26], the relationship of other flux phenotypes to glucose uptake rates was explored. In an aerobic condition, S. cerevisiae can be mainly in any of two metabolic states, respiration and fermentation. In the former state, glucose is completely oxidized into CO 2 through the mitochondrial respiration pathway, whereas in the latter, glucose is predominantly fermented into ethanol. Because of the existence of two metabolic patterns, it is necessary to identify the metabolic state that the S. cerevisiae is in prior to exploring flux-flux phenotype correlations.
Because the respiratory quotient (RQ) is an indicator of metabolic states, RQ values against glucose uptake rates (For convenience, 1 RQ was used for the y-axes) were plotted. Fig 1 shows that when the glucose uptake rate was <4mmol/ (g (DW)Áh), the 1 RQ values fluctuate within the range of 0.81-1.06. This finding was suggestive of the dominant role of respiration over fermentation. When the glucose uptake rate was >4 mmol/ (g (DW)Áh), the 1 RQ values gradually decreased as glucose uptake rate increased, which was indicative of a metabolic pattern transition from respiration to the fermentation state. This result is consistent with the study conducted by Postmaet al. [42]. Therefore, the glucose uptake rate "4 mmol/ (g (DW)Áh)" was defined as a "transition marker" for discriminating metabolic states.
Next, O 2 uptake rate, ethanol production rate, and glycerol secretion rate against glucose uptake rate were plotted, respectively. Results showed that when glucose uptake rate was <4 mmol/ (g (DW)Áh), the O 2 uptake rates positively correlated with glucose uptake rates (Fig 2  (A)). On the other hand, when the glucose uptake rate was >4 mmol/ (g (DW)Áh), the O 2 uptake rate negatively correlated with the glucose uptake rate (Fig 2(B)). Unlike the O 2 uptake rate, ethanol production rates were always positively correlated with glucose uptake rates, independent of the metabolic state of yeast (Fig 2(C)). Glycerol secretion rates were correlated with glucose uptake rates (Fig 2(D)). Based on these findings, the correlation between the three flux phenotypes (O 2 uptake rate, ethanol production rate, and glycerol secretion rate) and glucose uptake rate was quantified (Table A in S3 File).

Prediction of the upper and lower bounds of the growth rate under aerobic conditions
With the refined bounds for O 2 uptake rate, ethanol production rate, and glycerol secretion rate, FBA was employed to predict the upper and lower bounds of growth rate (see Materials and Methods). Fig 3 shows that almost all the experimentally identified growth rates were within the predicted space, indicating that our strategy was suitable for the refinement of flux bounds. In addition, when the glucose uptake rate was <4 mmol/(g(DW)Áh), the experimental data fitted well with the predicted upper bound (R 2 = 0.83), indicating the dominant role of the respiration pathway in supplying energy for growth. When the glucose uptake rate was >4mmol/ (g(DW)Áh), the data points dispersed in the space. This finding suggested that once the nutrients in the environment are sufficient to support a high glucose uptake rate, S. cerevisiae would be relaxed from evolutionary pressures for survival. Under this condition, "biomass growth" is just one of the several goals that a cell strives to achieve [43], and growth rate becomes a yeast strain-dependent phenotype.

Metabolic flux moderately correlates with mRNA level and protein abundance
Several studies have revealed a close relationship between metabolic flux and gene expression for various pathways in S. cerevisiae [24,44,45], but only a few studies directly identify this relationship [46]. In the present study, the average metabolic flux (AF) level of each metabolic reaction was estimated and compared to the corresponding gene expression level. showed that AF moderately correlated with mRNA level and protein abundance, which was consistent the findings of a previous study [46]. Contrary to the results of Bilu [46], our analysis suggested that the correlation between AF and protein abundance level is stronger than that between AF and mRNA level. This result reflected the fact that compared to mRNA level, protein abundance is more closely linked to metabolic flux level.   enzyme-coding genes (data not shown). A possible reason is that for many reactions, the flux levels are not controlled by enzyme concentrations, but by ubiquitous allosteric regulatory mechanisms [23]. Thus, we next examined the genes that were mostly likely to be related to enzyme-dosage sensitive reactions (DSRs). According to Newman et al., protein abundance levels are measured in both rich (YEPD) and minimal media (SD) [24]. Between the two conditions, several genes were differentially expressed, whereas others remained stable. It is reasonable to assume that the reactions that were catalyzed by enzymes whose level were divergently expressed between the two conditions were more likely to be DSRs. Therefore, the ratio of protein abundance obtained from SD to those obtained from YEPD was used for DSR gene retrieval. Only the genes whose expression levels follow the expression,|Log 2 (SD/YEPD)| > 1, were considered as DSR genes. Finally, a total of 41 genes were obtained and subjected to correlation analysis. In the DSR genes, FF moderately correlated with DM (rho = 0.27, p = 0.0480; Fig 4(A)). To ensure the robustness of this result, three different cutoffs (log 2 (SD/YEPD)> 0.5,0.7,1.0 respectively) were used to retrieve DSR genes for correlation calculations. The correlations between FF and DM were always positive(rho values ranging from 0.26 to 0.58; p<0.05). The more stringent the cutoff, the stronger the correlation. These findings ascertained that for DSR genes, the level of gene expression noise was subject to the tolerance of the metabolic network to flux fluctuations.

Reaction essentiality is related to DM and FF coupling
Gene expression noise undergoes selection pressures when it is beneficial or detrimental to the survival of an organism [30][31][32]. However, that selection pressure for traits that are not vital to cell growth may be relaxed. To determine potential influences of reaction essentiality on FF and DM correlation, the 41 DSR genes were subdivided into "essential reaction" and "nonessential reaction" groups. Among the 41 genes, 20 were associated with essential reactions, whereas 21 were associated with non-essential reactions. Correlation analysis indicated that for the genes corresponding to non-essential reactions, FF strongly correlated with DM (rho = 0.60, p = 0.0050; Fig 4(B)). However, we did not observe a significant correlation in the genes that corresponded to essential reactions (Fig 4(C)).
The DM and FF values between the essential and non-essential reaction groups were also compared. While no obvious differences were observed in the average DM values between the two groups, the FF values of non-essential reactions were generally higher than those of essential reactions (Fig 5). A simulation test was also performed to ensure the robustness of this result (see Materials and Methods), which showed that the simulated difference values between two groups were normally distributed around 0, with a standard deviation σ = 0.26. The observed difference value was at least four standard deviations away from the simulated mean difference. Therefore, the observation that FF values of non-essential reactions were higher than those of essential ones occurred by chance was highly unlikely (Fig 6).
To avoid the possibility that all the results obtained in this section were sampling artifacts of a specific part or the entire solution space, FF values were re-calculated by sampling another part of the shrunken space where the glucose uptake rate fluctuates within 20-22mmol/(g (DW)Áh). These FF values were also used in the correlation analysis, and the results were the same as those obtained earlier (Fig B in S3 File).

Physiological regulations mitigate FF and DM coupling
In the previous section, we revealed the difference between "non-essential reaction"-associated and "essential reaction"-associated genes in FF and DM coupling. The difference was speculated to be related to the flux phenotype-phenotype correlations that were quantified and imposed into the yeast model as additional constraints. To test this hypothesis, the FF values of DSR genes were re-calculated by sampling the yeast model's solution space without the constraints. Removal of the constraints did not remarkably affect the strong correlation with the "non-essential reaction"-related genes (rho = 0.59, p = 0.0051 ; Fig 7(A)). However, for genes     Fig 7(B)). These findings indicated that the global regulations underlying the phenotype-phenotype correlations mitigated the correlation between FF and DM in genes corresponding to essential reactions.

Flux constraint strength is not involved in expression noise regulation
In previous section, we have revealed that metabolic network has influence on gene expression noise. To know whether the influence is related to the extra constraint strength, we used FCS to estimate the level of flux constraints posed on each reaction (see Materials and Methods). Fig 8 showed that the majority of FCS values range within 0-0.1 (746/885), indicating the broad effect of the regulatory mechanism underlying the constraints. However, we did not observed a significant correlation between flux constraint strength and gene expression noise (Fig C in S3 File). This implied that the regulatory strength does not play a key role in shaping gene expression noise.

Discussion
Several studies have identified the limitations of using transcriptomic data in refining CBMs because complex regulations exist during transcription and translation. Because microbial metabolism is operated under certain global principles [24][25][26][27], we hypothesized that metabolic flux phenotypes are also under synergistical regulations, and thus correlate with each other. By summarizing the metabolic data that have accumulated in the past decades, we have identified several metabolic flux phenotype correlations, which were subsequently and successfully applied to the refinement of solution space. We expected to identify additional metabolic phenotype associations, for which the accumulated metabolic data during the past decades has served as solid foundation. FF is an intrinsic trait of a metabolic reaction that is determined by the metabolic network where a reaction is located. In the present study, FF was found to correlate with DM for the DSR genes, suggesting that the metabolic network influenced gene expression noise. In particular, DSR genes corresponding to NERs showed a strong correlation between flux fluctuation and gene expression noise, which, however, was not observed in those related to essential reactions. This is, at least partially, due to the underlying flux phenotype-phenotype associations, because after the removal of the extra constraints, FF strongly correlated with DM for the DSR genes related to essential reactions. We thus concluded that the metabolic network is also an important determinant that constrains gene expression noise, especially for genes associated with the reactions that are not essential for organism's survival. However, for reactions that are vital to cell growth, stochastic fluctuations at the flux level led by gene expression noise might be detrimental. Under this condition, the cell adopts additional physiological regulatory mechanisms to precisely, rather than strongly, control gene expression, and thus reduce expression noise. By using this physiological response, the connections between the metabolic network and gene expression noise is mitigated.
In summary, the present study has revealed the associations among several metabolic flux phenotypes through S. cerevisiae physiological data mining. The integration of these flux phenotype-phenotype correlations into a yeast model has resulted in a substantial shrinkage of its solution space. By analyzing the biologically more relevant solution space, we concluded that the metabolic network contributes, sometimes predominantly, to shaping gene expression noise.

Data sources
Metabolic flux data. A comprehensive search of English-language published reports was performed using three search engines (PubMed, Google Scholar, and ScienceDirect). Several publisher archives, including "Applied and Environmental Microbiology", "Yeast", "Enzyme and Microbial Technology", "Applied Microbiology Biotechnology", and "Microbial Cell Factories", were also thoroughly surveyed. The search terms included "Saccharomyces cerevisiae" or "S. cerevisiae" or "Yeast," plus "physiology" or "physiological," plus "metabolic" or "metabolism". After removing duplicates, a total of 160 full-text articles were obtained. The content of these articles were reviewed and 96 of these were accepted (S2 File). The retrieved metabolic data met following criteria: 1. The unit of measurement for growth rate is "h −1 "; The units of measurement for other metabolic flux data is (or could be converted to)"mmol/ (g (DW)Áh)"; 2. The yeast strains used in experiments did not undergo any artificial genetic manipulation at the molecular level such as gene deletion or overexpression; 3. All the wild-type yeast strains were cultivated aerobically in a steady state; and 4. Glucose was the only carbon source for yeast, and carbon was the only nutrient limitation of the culture.
After selection according to these criteria, we aggregated and normalized the data to generate a metabolic flux dataset (S1 Dataset), which includes glucose uptake rates, O 2 uptake rates, CO 2 production rates, respiratory quotient (RQ) values, ethanol production rates, and glycerol secretion rates.
Protein abundance. We considered three quantitative genome-wide measurements of protein abundance: genome-scale data from the study conducted by Ghaemmaghami et al. [47], and two genome-scale measurements in two conditions as performed by Newman et al. [29].
Gene expression noise. We considered two yeast genome-wide datasets of gene expression noise level based on the report of Newman et al. [29]. In their work, DM was used as a measure of gene expression noise. DM was defined as the difference of the gene specific noise and the median noise for proteins with the same abundance. In this way, DM represents genespecific noise independent of the corresponding gene expression level.
For protein abundance, mRNA levels, and gene expression noise, we averaged across data sets (after normalizing each data set by its mean) to minimize experimental noise.

Quantification of metabolic flux phenotype-phenotype correlations and identification of flux boundaries
We examined the relationship between the glucose uptake phenotype and three metabolic flux phenotypes, namely,O 2 uptake rate, ethanol production rate, and glycerol secretion rate, respectively.
Because O 2 uptake rate and ethanol production rate linearly well correlate with glucose uptake rate (Fig 2), we learned linear models for them. To ensure robustness and avoid publication bias in the linear model, a procedure analogous to leave-one-out cross-validation was performed [51]: The data from each paper were regarded as elementary dataset. All these datasets were aggregated to construct a "reference dataset". Next, leave-one-elementary dataset-out (LOE) datasets (LDs) were constructed by iteratively omitting each elementary dataset from the reference dataset (If m elementary datasets are available for analysis, then m LOE datasets will be obtained). Linear regression analysis was conducted using the data from each LOE dataset, and m slope values and corresponding R 2 values were obtained. The results showed that the m slope values were almost equal to each other, with R 2 >0.8, indicating the robustness of our modeling strategy.
By performing linear regression analysis, the slope value α (α oxy for oxygen uptake versus glucose uptake curve; and α eth for ethanol production versus glucose uptake curve) was defined. While fixing the α value, two β values, namely, β l and β u , were set through visual inspection to define a space covering > 90% of the following experimental identified points: where v glu represents glucose uptake rate; and lb (ub) represents lower (upper) bound for ethanol (or O 2 ) exchange rate.
Glycerol secretion rate did not correlate well with glucose uptake rate. Therefore, its bounds were defined, somewhat arbitrarily, by modifying the α value as follows: lb gly ¼ a gly v glu ub gly ¼ a gly v glu where lb gly (ub gly )represents the lower (upper) bound of the glycerol secretion rate.
All aforementioned α and β values are listed in Table A in S3 File. The newly refined glucose uptake-dependent bounds for O 2 consumption rate, ethanol production, rate, and glycerol secretion rate are as follows: lb oxy v oxy ub oxy ; ð1Þ lb eth v eth ub eth ; and ð2Þ lb gly v gly ub gly : Prediction of the upper and lower bound for growth rate S. cerevisiae genome-scale metabolic network model iMM904 was used because of its high predictive power among available yeast models [52,53]. To simulate yeast cultivated in glucoselimited media, fructose and ethanol uptake rates were fixed at 0 because these are potential carbon sources [52]. At each glucose uptake rate, v oxy , v eth and v gly values were uniformly sampled within the bounds defined by (1), (2), and (3), respectively, and the maximum growth v gro rate that the yeast could achieve was calculated through FBA as follows: subject to:S•v = 0 where v gro indicates the flux of "biomass growth";S is an M × N stoichiometric matrix (M is the number of metabolites, and N is the number of reactions);v is a vector containing metabolic flux v i of N reactions; and lb i (ub i ) represents the lower (upper) bound for v i . When lb oxy <0.016 mmol/ (g (DW)Áh) at a certain glucose uptake rate, we set lb oxy to 0.016 mmol/ (g (DW)Áh) [5]. After repeating this process 10,000 times, the biggest (smallest) value in the 10,000 simulated growth rate values was regarded as the maximum (minimum) growth rate v max (v min ) the yeast was able to achieve under a glucose uptake rate. The v max (v min ) under glucose uptake rates ranging from 0 to 24 (step size = 0.1) were all computed.
Because the upper (lower) bound for growth rate was likely to be a combination of several linear functions (Fig 3), the predicted v max (v min ) was divided into several segments according to glucose uptake rates, for each of which linear regressions of the data were performed separately (Fig A in S3 File).The resulting equations were used as upper (lower) bound for the growth rate (All the equations are listed in Table B in S3 File). For simplicity, the bounds were presented as follows: Estimation of the average flux, flux fluctuation and flux constraint strength  [41]; and (2) Data on mRNA levels to which AF would be compared, and data for gene expression noise to which FF would be compared, were all collected from rich media, under which glucose uptake rate ranges within 0.36-0.40 [54]. The upper (lower) bound for v oxy , v eth and v gly were set according to (1), (2), and (3), respectively. Facilitated with these additional constraints, The shrunken solution space was fully sampled to generate 500,000 eligible flux distributions. The sampled flux values of each metabolic reaction were extracted from the 500,000 flux distributions. Given a reaction, the top and bottom 5% simulated flux values were discarded. The remaining sampled flux values were used to calculate AF and FF as follows: where AF i is defined as average flux level of the i-threaction; and FF i is the flux fluctuation of the i-th reaction. v i,max (v i,min )represents the maximum (minimum) flux value among the sampled flux values of the i-th metabolic reaction.
To estimate the constraint strength posed on each reaction, we also sampled yeast model's solution space 500,000 times before imposing the extra constraints. After removing the top (bottom) 5% flux values of each reaction, FCS was defined as follows: FCS i ¼ v i;max À v i;min ðv i;max;ori À v i;min;ori Þ where v i,max,ori (v i,min,ori ) represents the i-th maximum (minimum) flux value, which was generated from the original solution space. Reactions and metabolites participating in the intracellular loops were not included in the analysis [45]. To exclude potential biases, reactions catalyzed by isozymes were also excluded in the correlation analysis, thus ensuring a one geneone reaction relationship. Bi-directional reactions were also not involved in AF and FF calculation.

Identification of essential and non-essential reactions
Setting "biomass growth" as the objective function to be optimized, FBA was conducted to predict the growth rate v ori of yeast model iMM904 without any extra constraints. We iteratively set reaction flux v i in the yeast model to 0 and employed FBA again to predict the growth rate v gro,i . When v gro,i = 0, this i-th reaction was defined as an "essential reaction." When v gro,i = v ori , this i-th reaction was defined as a "non-essential reaction." Simulation test for FF value differences FF values of the 41 DSR genes were shuffled and randomly divided into two groups, one consisting of 20 values and the remaining values were binned into another one. Difference in average FF values was calculated by subtracting the average FF value in one group by that of another. This process was repeated 100,000 times.

Statistical analysis
All simulations and statistical analyses were performed in MATLAB, which was interfaced with the COBRA toolbox and 'glpk' solver. All correlations described in this report are Spearman's rank correlations.  (Table A).Regression analysis of growth rate versus glucose uptake rate in each segments. (y = αx +β; y is for growth rate and x is for glucose uptake rate( Table B). (PDF)