The authors have declared that no competing interests exist.

Conceived and designed the experiments: SC NDP. Performed the experiments: SC. Analyzed the data: SC. Wrote the paper: SC NDP.

Current address: Faculty of Arts and Sciences (FAS) Center for Systems Biology, Harvard Society of Fellows, Harvard University, Cambridge, Massachusetts, United States of America.

There is a strong need for computational frameworks that integrate different biological processes and data-types to unravel cellular regulation. Current efforts to reconstruct transcriptional regulatory networks (TRNs) focus primarily on proximal data such as gene co-expression and transcription factor (TF) binding. While such approaches enable rapid reconstruction of TRNs, the overwhelming combinatorics of possible networks limits identification of mechanistic regulatory interactions. Utilizing growth phenotypes and systems-level constraints to inform regulatory network reconstruction is an unmet challenge. We present our approach ^{−172}), significantly better than using gene expression alone. We applied GEMINI to create an integrated metabolic-regulatory network model for ^{−14}) and revealed potential condition-specific regulatory mechanisms. Our results suggest that a metabolic constraint-based approach can be successfully used to help reconstruct TRNs from high-throughput data, and highlight the potential of using a biochemically-detailed mechanistic framework to integrate and reconcile inconsistencies across different data-types. The algorithm and associated data are available at

Cellular networks, such as metabolic and transcriptional regulatory networks (TRNs), do not operate independently but work in unison to determine cellular phenotypes. Further, the phenotype and architecture of one network constrain the topology of other networks. Hence, it is critical to study network components and interactions in the context of the entire cell. Typically, efforts to reconstruct TRNs focus only on immediately proximal data such as gene co-expression and transcription factor (TF)-binding. Herein, we take a different strategy by linking candidate TRNs with the metabolic network to predict systems-level responses such as growth phenotypes of TF knockout strains, and compare predictions with experimental phenotype data to select amongst the candidate TRNs. Our approach goes beyond traditional data integration approaches for network inference and refinement by using a predictive network model (metabolism) to refine another network model (regulation), thus providing an alternative avenue in this area of research. Understanding how the networks function together in a cell will pave the way for synthetic biology and has a wide range of applications in biotechnology, drug discovery and diagnostics. Further, we demonstrate how metabolic models can integrate and reconcile inconsistencies across different data-types.

The inference of transcriptional regulatory networks (TRNs) from high-throughput data is a central challenge in systems biology. TRN models provide a mechanistic framework for describing interactions between transcription factors and their target genes. Cellular phenotypes are influenced by the differential activity of these networks, and reconstructing the regulatory network enables one to understand the underlying molecular processes that cause phenotypic changes and better predict the response of a cell to an external perturbation.

Current network inference algorithms enable rapid reconstruction of TRNs by utilizing high-throughput data such as protein-DNA binding, DNA sequence or gene expression

We hypothesized that integrating regulatory interactions with metabolic networks would make it possible to more directly connect the regulatory interactions with their downstream phenotype, and thus allow us to use a broader range of data for network curation. Genome-scale models of metabolic networks have been constructed using growth phenotype data for a wide range of organisms, and these models accurately predict the response of the cell to environmental and genetic perturbations

To enable the concurrent analysis of transcriptional regulation and metabolism, we recently developed the Probabilistic Regulation of Metabolism (PROM) approach for integrating biochemical networks with TRNs in an automated fashion

PROM solves the forward problem of combining disparate networks to predict phenotype (e.g., flux and growth rates). In the work described herein, we iteratively use PROM to aid in solving the more challenging inverse problem

This new approach, Gene Expression and Metabolism Integrated for Network Inference (GEMINI), discerns functional regulatory interactions in high-throughput data by taking advantage of PROM, the growing amount of information in phenotype databases, and the observation by Barrett

Here we describe the GEMINI approach and then test it by building a genome-scale integrated model for yeast. We compare the refined network model across various high-throughput data sets, and demonstrate that GEMINI effectively recalls known mechanistic interactions. We then iteratively expand and refine the integrated model using published genome-wide chromatin immunoprecipitation, TF knockout gene expression and binding-site-motif data sets, and show the ability of our integrated metabolic and regulatory network model to predict growth phenotypes of transcription factor knockout strains in new conditions. We also use GEMINI to identify potential condition-specific interactions and post-transcriptional regulatory mechanisms in

GEMINI takes in a draft regulatory network and integrates it with the corresponding metabolic network and gene expression data using PROM. PROM uses conditional probabilities, viz. the probability of a given gene being ON or OFF when the regulating transcription factor is ON or OFF, to represent gene states and gene–transcription factor interactions. The ON/OFF state of the TFs is then used to determine the likelihood of an ON/OFF state of the target genes based on the probabilities estimated from microarray data. PROM then utilizes the Gene-Protein-Reaction (GPR) relationships present in the metabolic network models to connect the regulatory targets to the corresponding metabolic reactions. The GPRs take into account the presence of isozymes or multi-gene/multi-subunit complexes that may be involved in catalyzing each metabolic reaction. The probabilities are then used to constrain the fluxes through the metabolic network (detailed below), and an optimal state of the network that satisfies topological and transcriptional constraints is determined.
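As a rough illustration of the conditional probabilities described above, the sketch below estimates the probability that a target gene is ON given that its TF is OFF, using binarized expression values. The function names, the toy expression values, and the choice to binarize at a pooled 33rd-percentile cutoff are assumptions made for illustration; this is not the PROM implementation.

```python
from typing import Optional, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of a non-empty sequence (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def prob_target_on_given_tf_off(
    tf_expr: Sequence[float], target_expr: Sequence[float], threshold: float
) -> Optional[float]:
    """Fraction of arrays with the target ON among arrays where the TF is OFF.

    A gene is called OFF in an array when its expression is below `threshold`.
    Returns None when the TF is never OFF, so the interaction cannot be scored.
    """
    targets_when_tf_off = [t for f, t in zip(tf_expr, target_expr) if f < threshold]
    if not targets_when_tf_off:
        return None
    on_count = sum(1 for t in targets_when_tf_off if t >= threshold)
    return on_count / len(targets_when_tf_off)


# Hypothetical expression values for one TF and one target across 6 arrays.
tf = [1.0, 1.2, 0.8, 9.0, 8.0, 7.5]
target = [0.9, 6.0, 7.0, 9.5, 8.5, 7.0]

cutoff = quantile(tf + target, 0.33)  # pooled 33rd percentile (assumed scheme)
p = prob_target_on_given_tf_off(tf, target, cutoff)
```

In this toy data the TF is OFF in three arrays and the target remains ON in two of them, so the estimate is 2/3. A probability near 1 would indicate that the TF has little apparent effect on the target.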

Using this integrated metabolic-regulatory network, PROM can simulate metabolic phenotypes under different conditions using Flux Balance Analysis (FBA), which is formulated in terms of the stoichiometric coefficients S_{ij} and the reaction fluxes v_{j}.
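For reference, FBA is conventionally posed as a linear program. The statement below is the standard textbook form and not necessarily the exact notation used by the authors; the symbols c_{j}, v_{j}^{min} and v_{j}^{max} are assumptions introduced here.

```latex
\begin{aligned}
\max_{v}\quad & \sum_{j} c_{j}\, v_{j} \\
\text{subject to}\quad & \sum_{j} S_{ij}\, v_{j} = 0 \quad \text{for every metabolite } i, \\
& v_{j}^{\min} \le v_{j} \le v_{j}^{\max} \quad \text{for every reaction } j.
\end{aligned}
```

Here S_{ij} is the stoichiometric coefficient of metabolite i in reaction j, v_{j} is the flux through reaction j, and c_{j} selects the objective, typically the biomass (growth) reaction.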

Once the initial PROM model is built, GEMINI then performs

Unlike mass balance or thermodynamic constraints that cannot be violated, PROM imposes “soft” constraints on the system due to transcriptional regulation, thereby enabling us to force the model to match the measured phenotype. This procedure results in a flux solution that is geometrically closest to the flux state

We demonstrate the GEMINI approach using the model organism

The effectiveness of GEMINI was evaluated by measuring its ability to differentiate between the validated direct interactions and the remaining low-confidence interactions (putative/potential interactions), which were inferred using motif search algorithms

The initial TRN, formed by compiling the Yeastract interactions, was integrated with the yeast metabolic network

GEMINI performed ^{−172}, hyper-geometric test) for validated gold-standard interactions; this result suggests that GEMINI preferentially removed low-confidence interactions (

^{−172}, hyper-geometric test). Most of the interactions eliminated by GEMINI were found to have little supporting experimental evidence (interactions that did have strong supporting evidence were preferentially retained).

These results were robust to the chosen growth conditions – glucose, galactose, glycerol and ethanol minimal media all led to significant enrichment of gold-standard interactions (

Condition | Enrichment for Direct (p-value) | Enrichment for Indirect (p-value) | Final Network Size |
Glucose | 10^{−172} | 10^{−104} | 22059 |
Galactose | 10^{−96} | 10^{−55} | 22308 |
Glycerol | 10^{−179} | 10^{−100} | 22134 |
Ethanol | 10^{−144} | 10^{−86} | 22551 |
Rich/undefined Media | 10^{−42} | 10^{−39} | 28981 |

To determine whether a similar accuracy could have been obtained using expression data alone (i.e., without adding constraints based on the phenotypic outcomes predicted by the metabolic network), we compared our GEMINI results to a more commonly used approach for curating TRNs—sorting predicted interactions based on the correlated expression of the TFs and their putative target genes. Specifically, we measured the Mutual Information (MI) and Pearson's correlation among all of the interactions in our original YEASTRACT network.
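To make the expression-only baseline concrete, the following sketch scores TF-target pairs by Pearson's correlation and by mutual information, then ranks them. It is a minimal illustration under stated assumptions: the gene names and expression values are hypothetical, and the mutual information is a simple plug-in estimate on discrete states rather than the estimator used by ARACNE.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson's correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mutual_information(x_states, y_states):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(x_states)
    px, py = Counter(x_states), Counter(y_states)
    pxy = Counter(zip(x_states, y_states))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


# Hypothetical expression profiles (4 arrays) for a TF and two putative targets.
tf = [1.0, 2.0, 3.0, 4.0]
targets = {"geneA": [2.0, 4.0, 6.0, 8.0], "geneB": [3.0, 1.0, 4.0, 2.0]}

# Rank putative interactions by absolute correlation, strongest first.
scores = {gene: abs(pearson(tf, expr)) for gene, expr in targets.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

A network of a given size is then obtained by keeping the top-ranked interactions, which is the cutoff that was tuned in the comparison.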

To ensure the comparison was not biased towards GEMINI, we tuned the size of the network using MI and correlation over all possible values (overfitting to the best outcome that could be achieved for MI or correlation at any cutoff). The maximum enrichment obtained by MI and correlation (even when overfit) was lower than that obtained using GEMINI (the lowest p-value measured over all possible network sizes was 10^{−6} for MI and 10^{−3} for correlation;

To gain further insight into the types of interactions recalled by the different methods, we examined another subset of interactions having “indirect evidence”—interactions inferred based on changes in the mRNA or protein expression of a target gene after perturbing its putative regulator ^{−19} and 10^{−4} for the best cutoffs of MI and correlation, respectively); this is not surprising since the indirect relationships are defined by gene expression changes. However, GEMINI still outperformed these methods in recalling indirect interactions (p-value of 10^{−104}) for any network size (

The biological relevance of the interactions retained by GEMINI is also supported by the enrichment for biological processes relevant to the set of target genes for each regulator. As compared to regulons (target genes for each regulator) in the original network, regulons in the refined network were found to be more specific, on average, to a given metabolic pathway (p-value<0.01;

Comparison with TF knockout expression data from a recent study ^{−9};

One interesting observation from our results is that GEMINI can differentiate interactions from different sources based on their effect on the predicted phenotype. We next tested whether this property could be used to evaluate newly inferred interactions in the context of known interactions, and to reconcile any inconsistencies between these interactions and metabolic phenotypes. To simulate such a scenario, we added new interactions to the refined Yeastract network model and refined the expanded network model using GEMINI.

We chose three commonly used data types:

I. Interactions inferred based on sequence motif search learned from ChIP

II. Interactions inferred using the expression-based reverse engineering algorithm, CLR

III. Validated direct and indirect interactions in the literature measured using experiments such as large-scale TF knockout

We found that for both the motif and CLR networks, we could refine the network further and again significantly enrich for direct and indirect interactions (enrichment p-value compared to the original inferred network (direct, indirect) = (10^{−44}, 10^{−73}) and (10^{−13}, 10^{−31}) for motif and CLR, respectively;

Data Set | Enrichment for Direct (p-value) | Enrichment for Indirect (p-value) | Network Size (Initial/Final) |
I. Motif data | 10^{−44} | 10^{−73} | 38105/28807 |
II. Expression (CLR) | 10^{−13} | 10^{−31} | 24111/21954 |
III. Validated interactions | NA | NA | 29874/29808 |
Validated interactions (Quantitative Iteration) | 10^{−27} (single value reported) | | 29874/25000 |

In contrast to the inferred interactions, very few interactions (∼66) from the validated interaction data set (Network III) were removed by GEMINI. This interaction set is similar to the gold-standard set in the Yeastract database and was thus retained in the network. While these interactions were consistent with the simple lethal/non-lethal constraint we used in glucose minimal media, we predicted that by adding more constraints, we could narrow down the solution space further, and remove more phenotype-inconsistent interactions. With this aim, we employed PROM to quantitatively predict the growth rate (as opposed to just lethal/non-lethal outcomes). Doing so allowed us to partition the non-lethal predictions into two categories: suboptimal and optimal (^{−27};

By using an iterative approach, we increased the comprehensiveness of the integrated network model by adding new interactions (Network III) and iteratively refining the model using GEMINI. This process enriched the fraction of validated interactions in the network (shown in red) and improved the predictive ability of the integrated network model.

Importantly, we observed that the refined network had a greater consistency with growth phenotype data in new conditions than the original network. Thus, by learning only on glucose minimal medium, the network model had greater correlation with growth rate measurements in galactose minimal medium (correlation of 0.47, p-value = 10^{−7} vs. a correlation of 0.2, p-value = 0.04 for the original unrefined Yeastract model) and in urea minimal medium (correlation of 0.62, p-value = 10^{−14} vs. a correlation of 0.22, p-value = 0.02; data from Fendt

In this study, we developed a novel way to connect regulatory interactions with phenotype data using a metabolic network. Currently, accurate regulatory network reconstruction is hampered by the lack of high-throughput methods that directly connect inferred potential interactions to observable phenotypes, such as growth rate, to guide network inference. Using GEMINI, we demonstrated that we can identify functional regulatory interactions and refine high-throughput interaction data using phenotype-consistency as a constraint. We showed that integration with a predictive metabolic network model improves the quality and predictive ability of networks built from high-throughput data significantly more than using gene expression alone.

By applying the GEMINI approach to our yeast model, we identified phenotype inconsistencies for 80 TF knockout predictions. The majority of the inconsistencies (85%) were of the type NGG (No Growth – Growth), for which the model predicts lethality (or suboptimality), while the actual phenotype was non-lethal (or optimal). Because this scenario was the most commonly identified inconsistency type, we concentrated on reconciling this set alone. Also, this case is more tractable to resolve than the opposite case (GNG), which involves adding interactions from a very large multi-optimal solution space. Further, a TF knockout may be lethal or suboptimal due to a non-metabolic reason, meaning that even an optimal metabolic model would not be expected to resolve all GNG inconsistencies; in contrast, if a knockout is non-lethal and the model predicts it to be lethal, then that implies there is an inconsistency with the integrated model.
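The two inconsistency types can be made explicit with a small sketch. It assumes the 5% lethality cutoff used in this study; the function names, TF labels, and growth values are hypothetical and serve only to illustrate the NGG/GNG bookkeeping.

```python
def is_lethal(ko_growth: float, wt_growth: float, threshold: float = 0.05) -> bool:
    """Call a knockout lethal when its growth is below a fraction of wild type."""
    return ko_growth < threshold * wt_growth


def classify(predicted_lethal: bool, observed_lethal: bool) -> str:
    """Label a TF knockout prediction against the observed phenotype.

    NGG: model predicts No Growth, experiment shows Growth.
    GNG: model predicts Growth, experiment shows No Growth.
    """
    if predicted_lethal and not observed_lethal:
        return "NGG"
    if not predicted_lethal and observed_lethal:
        return "GNG"
    return "consistent"


# Hypothetical predicted growth rates (wild type = 1.0) and observed viability.
wt_growth = 1.0
predicted = {"TF_A": 0.01, "TF_B": 0.90, "TF_C": 0.85}
observed_lethal = {"TF_A": False, "TF_B": False, "TF_C": True}

labels = {
    tf: classify(is_lethal(growth, wt_growth), observed_lethal[tf])
    for tf, growth in predicted.items()
}
```

Only the knockouts labeled NGG would be passed on for reconciliation, in line with the choice described above.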

GEMINI integrates two different network models (metabolic and regulatory) and inconsistencies could arise due to either network. In this work, we assumed that the metabolic network, being better curated and having a biochemical basis, could be used to identify inconsistencies in the regulatory network. Additional evidence from the distribution of inconsistencies also supports our assumption (^{−104}), but not as strongly as when using the more predictive model by Zomorrodi and Maranas. This suggests that as the predictive ability of the metabolic models improves, we should be able to refine these interactions further. In theory, a trivial solution for resolving NGG inconsistencies is to remove all of the interactions for the respective TF. Interestingly, however, GEMINI resolved all 80 NGG inconsistencies without resorting to the trivial solution.

Furthermore, the elimination of phenotype-inconsistent interactions by GEMINI based on one condition might lead to inconsistent predictions in a different condition. We found that this was the case for a small fraction (4%) of the interactions that were phenotype-inconsistent in glucose minimal media, but were predicted to be consistent with growth phenotype data in galactose minimal media. Analyzing inconsistencies over different sets of conditions would help us avoid overfitting the model to the growth phenotype data. Further analysis across conditions would help uncover interactions that are condition-specific and post-transcriptionally regulated (discussed below).

In the present analysis, we used the predicted growth rate as the only phenotype to constrain the regulatory network. If the interactions regulating biomass-related metabolic reactions were enriched for potential interactions, running GEMINI would produce an apparent enrichment for direct gold-standard interactions as an artifact. We tested this by evaluating the metabolic genes whose knockout affected the maximum growth rate of the model. No difference was observed in the number of gold-standard interactions regulating this set of genes versus the rest (both sets had the same fraction (14%) of gold-standard interactions;

We predicted that the effectiveness of GEMINI would also depend on the scale of the regulatory network model used. GEMINI evaluates interactions in the context of other interactions in the network and so its effectiveness will depend on the size and degree of completeness of the entire network. To test this, we ran GEMINI using different fractions of the entire TRN and looked at the enrichment for gold-standard interactions. As expected, we found that GEMINI's effectiveness in refining the network increased with the size of the input network. To control for size bias on the enrichment p-value, we also looked at the fraction of gold-standard interactions in the initial and final refined network and observed the same effect (

GEMINI utilizes the mechanistic information in biochemical networks to refine high-throughput interaction data. We next sought to determine which parts of the yeast transcriptional regulatory network were prone to inconsistencies across different growth conditions (

The distribution of phenotype inconsistencies was exponential across the TRN, suggesting that a few TFs led to most of the inconsistencies. In contrast, the distribution of inconsistencies across the metabolic network was linear and did not reveal any strong trend towards specific metabolic genes.

In contrast to the regulatory network, analysis of the distribution of inconsistencies across the metabolic network did not reveal any strong trend towards specific metabolic pathways. The distribution was linear rather than exponential across the metabolic genes (

Among the carbon sources, galactose led to the least enrichment for both validated gold standard interactions and indirect interactions. Both glucose and galactose enter central metabolism at the level of glucose-6-P, but they lead to primarily fermentative or respiro-fermentative metabolism, respectively

Analysis of phenotype-consistent interactions inferred using GEMINI under different environmental conditions (

Glucose repression is one of the most well-studied processes in yeast and we focused on a subset (408) of these 1170 interactions that were predicted to be inactive only in glucose minimal media. The three TFs with the most interactions in this list (Rph1, Hsf1 and Adr1) are all activated during glucose starvation and are regulated via signaling and phosphorylation

This strategy shows the utility of looking across multiple conditions to identify discrepancies in the data, which might be due to additional biological regulation. This also highlights the importance of incorporating signaling networks into these integrated network models as they become available.

Given the large amount of data required to run GEMINI, we are currently restricted to a few well-studied systems with adequate expression, knockout phenotype and network data. However, with the development of automated methods for reconstructing metabolic networks

The regulatory network model used in this study, despite being genome-scale and much more comprehensive than the current integrated model for yeast

Regulatory network inference is a significant challenge today as the system is underdetermined and often results in multiple models that could explain the same data with equal efficacy. Thus, it is important to incorporate diverse heterogeneous data types like expression, binding and growth phenotype to constrain the solution space. GEMINI exploits this principle to refine high-throughput regulatory interaction data and identifies interactions that are consistent with various data types. Importantly, this is the first such approach that ties the inference of a transcriptional regulatory network from high-throughput data with a biochemically detailed metabolic network.

We believe this to be an important first step towards mechanistically refining a network model of one type (gene regulatory) using data from another network type (metabolic). Further, our approach highlights the potential of using a biochemically-detailed mechanistic framework to interpret high-throughput data and identify and reconcile inconsistencies across different data types. We find that the data types that are more consistent with each other also have greater evidence supporting their existence. While there are still several challenges ahead for regulatory network inference, the methods presented here lay the foundation for the rapid refinement of omics data using a mechanistic framework, which will advance the study of metabolic regulation and lead to better predictive models of the cell.

Using PROM, we predicted the growth outcome of knocking out each TF in the network under a specific condition. By comparing our simulations with experimental growth viability data, we identified and reconciled inconsistent predictions. TF knockouts were predicted to be lethal if the respective maximal growth rate prediction of the mutated organism was less than 5% of the wild-type growth rate

The closest flux state that represents the measured growth phenotype (

All of the steps in GEMINI are described in the pseudocode below:

The flux solutions in FBA have multiple possible states, while the growth rate or the objective function is unique. Since we relied only on the growth rate and the transcriptionally constrained reactions (part of the objective function in PROM) as the metric to refine the network, the final network structure was identical across different runs of GEMINI. To further investigate how alternate optimal solutions alter the effectiveness of GEMINI, we generated new flux solutions by introducing small changes to the growth threshold (step 3b in pseudocode and

The above analysis of inferring regulatory networks across alternate metabolic flux solutions also addresses the possibility of multiple alternate optima with respect to the regulatory network. We found that the same core set of interactions was removed across different runs. In addition, we compared networks generated using much larger changes in the growth rate threshold used for inferring the flux state v2. We once again found that while the refined network sizes changed across different thresholds, the networks were >95% similar to each other among the interactions that were retained. These results indicate that there is a strong global optimal state for the regulatory network and that, by perturbing the model and constraints, we still converge close to the global optimum. In terms of network refinement, all these results suggest that there is a core set of regulatory interactions that are removed across different constraints and conditions (

The value of κ, which determines the strength of the transcriptional regulatory constraint, was determined in a data-driven manner by tuning across a range of values. We set κ to be the lowest value above which there was no increase in the number of interactions removed (

We have used a metabolic network-based approach for prioritizing regulatory interactions for pruning. One can envision other approaches and metrics to prioritize these interactions. As an alternative metric, we sorted interactions based on probabilities instead of predicted flux difference (see step 3 in the pseudocode). While this seems to be a straightforward metric, it ignores the system-level effect of these interactions on the biochemical network. Using this approach on the Yeastract data, we obtained an enrichment of 10^{−20} for direct interactions. Note that even though we only use transcriptomic data to prioritize interactions, this approach yields higher enrichment than MI or correlation. This is because we prune interactions until the predicted systems-level growth phenotype matches the experimental measurement; thus the systems-level constraint is unchanged while only transcriptomic data are used for prioritizing interactions.
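The idea of pruning in priority order until the predicted phenotype matches the measurement can be sketched as a simple greedy loop. This is a simplified illustration, not the authors' pseudocode: the function `prune_until_consistent`, the toy interactions, the flux differences, and the stand-in consistency check (which in practice would be a PROM growth simulation) are all hypothetical.

```python
def prune_until_consistent(interactions, priority, is_consistent):
    """Remove interactions in descending priority until the model is consistent.

    `priority` maps an interaction to a score (for example, the flux difference
    of the reaction it regulates, or a probability-based score).
    `is_consistent` takes the retained interactions and returns True when the
    predicted phenotype matches the measured one.
    Returns (kept, removed).
    """
    kept = list(interactions)
    removed = []
    for interaction in sorted(interactions, key=priority, reverse=True):
        if is_consistent(kept):
            break
        kept.remove(interaction)
        removed.append(interaction)
    return kept, removed


# Hypothetical interactions for one TF and flux differences of regulated genes.
interactions = [("TF1", "geneA"), ("TF1", "geneB"), ("TF1", "geneC")]
flux_diff = {"geneA": 0.1, "geneB": 2.5, "geneC": 0.7}


def toy_consistency_check(kept):
    # Stand-in for a growth simulation: the knockout is predicted viable
    # only once the TF1-geneB interaction has been removed.
    return ("TF1", "geneB") not in kept


kept, removed = prune_until_consistent(
    interactions, lambda i: flux_diff[i[1]], toy_consistency_check
)
```

Swapping the `priority` function is all that is needed to compare metrics, which mirrors the point that the systems-level stopping criterion stays the same while the ordering changes.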

As a second alternative approach for prioritizing interactions, instead of sorting interactions based on the flux difference between the predicted (v1) and expected (v2) flux states, we assigned the reactions to two groups: a first group of reactions whose flux changed significantly between v1 and v2, based on a z-score threshold, and a second group containing the remaining reactions. Interactions that regulate these reactions were then pruned randomly, first from the first group and then from the second. The rationale is that this strategy does not rely heavily on the absolute flux difference between reactions and allows for alternate flux solutions. We once again found strong enrichment for gold standard interactions through this approach across different thresholds (p-value = 10^{−143}). This method is further discussed in

Both expression randomization and phenotype swapping removed the enrichment for gold-standard interactions (p-value = 1). We also performed bootstrapping of expression data to determine sensitivity to the gene expression data used. This was done by running GEMINI using random subsets comprising 80% of the expression data. We found strong enrichment in all of the runs (p-value < 10^{−90};

All parameters were left at the default value as recommended for running PROM (binarization threshold – 0.33 i.e. the 33^{rd} percentile of gene expression data (^{−220};

For the analysis to identify potential biases in the network architecture, we identified genes affecting maximal growth rate by doing a systematic single gene deletion of all the metabolic genes in the model in glucose minimal media. We identified interactions that regulate this set of genes and compared it with the rest of the interactions in the network. We found the fraction of gold standard interactions to be the same in both sets of interactions. Dead end reactions used for this analysis were identified using the removeDeadends algorithm in the COBRA toolbox in MATLAB.

We used the reconstructed yeast metabolic network by Zomorrodi and Maranas because it had the highest predictive ability among the available yeast models

Robust multi-array averaged (RMA)-normalized gene expression data consisting of 904 arrays in 435 conditions were obtained from the Many-Microbes Microarray Database

All the regulatory interaction data were obtained from the supplementary material of the respective publications

Growth phenotype data for yeast TF knockout strains grown in glucose, galactose, glycerol and ethanol minimal media were obtained from Kuepfer et al

Metabolic pathway enrichment analysis was done by overlapping genes in each regulon with genes in each pathway (like TCA cycle or glutamate metabolism) as defined in the metabolic network model. The p-value for overlap between the regulons and pathway genes was calculated using the hyper-geometric test.
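For readers unfamiliar with the test, a one-sided hypergeometric enrichment p-value can be computed directly from binomial coefficients. The sketch below is a generic illustration with hypothetical counts; the function name and argument names are assumptions, and the analyses in this study were performed in MATLAB.

```python
from math import comb


def hypergeom_enrichment_p(population: int, successes: int, draws: int, observed: int) -> float:
    """P(X >= observed) for a hypergeometric variable X.

    population: total number of genes considered
    successes:  genes annotated to the pathway
    draws:      genes in the regulon
    observed:   regulon genes that fall in the pathway
    """
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / comb(population, draws)


# Hypothetical example: 10 genes, 4 in the pathway, regulon of 3 with 2 overlapping.
p_value = hypergeom_enrichment_p(population=10, successes=4, draws=3, observed=2)
```

For this toy case the tail contains 40 of the 120 possible regulons, giving a p-value of 1/3. The same calculation applies to the enrichment for gold-standard interactions in a refined network.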

In the analysis to determine the functional significance of the interactions, the differentially expressed genes (FDR<0.05) were obtained from Reimand

For the comparison with PBM data

The sequence motif data were obtained from the supplement of MacIsaac

Mutual Information between interactions was measured using the algorithm ARACNE

All the simulations and statistical analyses were performed in MATLAB. The COBRA toolbox

Mutual Information (blue) and correlation (red) tuning across various network sizes. Plots show enrichment (shown as the negative log to the base 10 of the hypergeometric p-value) for direct interactions. The same gene expression data set used for GEMINI (904 arrays in 435 conditions) from the Many-Microbes Microarray Database was used for estimating MI and correlation. Interestingly, we observed that redoing the same analysis using interactions with positive Pearson's correlation alone yielded higher enrichments (minimum p-value = 10^{−10}), while interactions with negative Pearson's correlation did not lead to any enrichment for direct interactions (minimum p-value = 0.99).

(TIF)

MI (blue) and correlation (red) tuning across various network sizes for indirect interactions. Plots show enrichment (shown as the negative log to the base 10 of the hypergeometric p-value) for ^{−16}), while interactions with negative Pearson's correlation did not lead to any enrichment for indirect interactions (minimum p-value = 0.96).

(TIF)

Comparison of the distributions of the MI scores for the original and refined Yeastract networks. We found that the interactions retained by GEMINI do not consist only of the lower part of the total MI distribution, except for extremely low MI values close to zero. The pruning of the network by GEMINI is less trivial than simply raising the threshold to select for significant MI scores. The Kolmogorov-Smirnov test also revealed no significant difference (p-value = 1) between the two MI distributions.

(TIF)

Effect of the size of the input TRN

(TIF)

Assessment of the algorithm's sensitivity to the choice of the growth threshold used to determine lethal/non-lethal predictions. TF knockouts were predicted to be lethal if the respective maximal growth rate prediction of the mutated organism was less than 5% of the wild-type growth rate. The plot shows that the enrichment for gold standard interactions is robust to the choice of growth threshold over a reasonable range of values. While we used the value commonly used in the literature (5%), tuning this threshold indicated that higher enrichments could be achieved by varying this parameter. A threshold of 10% gave the highest enrichment, implying that a 10% cutoff might be a better threshold for identifying lethal interactions in yeast. In general, we recommend using the default values to avoid overfitting.

(TIF)

Estimating the value of κ in a data-driven manner by tuning across a range of values. We set κ to be the lowest value above which there is no increase in the number of interactions removed. We obtained a κ = 10 using this strategy.

(TIF)

Assessment of the algorithm's sensitivity to the choice of the κ parameter. The enrichment for gold standard interactions is robust to the value of κ chosen for a wide range of values above 10. Note that higher κ implies greater constraint due to transcriptional regulation.

(TIF)

Bootstrapping of expression data to determine sensitivity of the algorithm's performance (enrichment for gold standard interactions) to gene expression data size and variance. GEMINI was run using random subsets comprising 80% of the expression data. We found strong enrichment in all of the runs, while complete randomization of gene expression removed enrichment. These results suggest that GEMINI is robust to small changes in gene expression data, and that the array conditions were sufficiently diverse and the data set sufficiently powered for this analysis.

(TIF)
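The resampling scheme described above can be sketched as drawing random subsets of array conditions. This is a minimal sketch, assuming that each subset is drawn without replacement and that subsets are defined over conditions (columns of the expression matrix); both are assumptions, as are the function name `subsample_conditions`, the number of runs and the seed.

```python
import random


def subsample_conditions(n_conditions, fraction=0.8, n_runs=5, seed=0):
    """Return `n_runs` random subsets of condition indices, each holding
    `fraction` of the conditions, drawn without replacement."""
    rng = random.Random(seed)
    subset_size = round(fraction * n_conditions)
    return [
        sorted(rng.sample(range(n_conditions), subset_size))
        for _ in range(n_runs)
    ]


# Hypothetical data set with 10 array conditions.
subsets = subsample_conditions(n_conditions=10)
```

Each subset of indices would then be used to select the expression columns passed to one run of the algorithm, and the enrichment would be recorded per run.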

Alternative approaches to prioritize interactions. The normalized flux approach works as follows. We first estimated the flux difference between the predicted (v1) and expected (v2) flux states. We then normalized the flux differences to have zero mean and unit variance (z-scores). Reactions were then pooled into two groups based on a threshold z, which represents the deviation from the mean flux difference. Interactions that regulate these reactions were then pruned randomly, first from the first group (higher than the threshold) and then from the second group. The advantage of this approach is that it does not rely significantly on the absolute difference between reactions. However, this approach introduces a new parameter, the z-score threshold. The plot shows the enrichment for gold standard interactions over a range of z-score thresholds. The fact that we observe strong enrichments using different metrics and thresholds suggests that the systems-level constraints are more important than the order in which the different inconsistencies are resolved. As mentioned earlier, the flux solutions in FBA have multiple possible states, while the objective function (the growth rate and the transcriptionally constrained reactions) is usually unique.

(TIF)
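The normalized flux approach described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function name `prioritize_reactions`, the use of the absolute flux difference, the population standard deviation and the example fluxes are all assumptions. The sketch orders reactions only; mapping each reaction back to the interactions that regulate it is omitted.

```python
import random
import statistics


def prioritize_reactions(v1, v2, z_threshold=1.0, seed=0):
    """Order reactions for pruning: those whose flux-difference z-score
    exceeds the threshold come first, in random order within each group."""
    # Assumption: use the absolute difference between the two flux states.
    diffs = {rxn: abs(v1[rxn] - v2[rxn]) for rxn in v1}
    values = list(diffs.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("flux differences have zero variance")
    z = {rxn: (d - mean) / sd for rxn, d in diffs.items()}
    high = [rxn for rxn in z if z[rxn] > z_threshold]
    low = [rxn for rxn in z if z[rxn] <= z_threshold]
    rng = random.Random(seed)
    rng.shuffle(high)
    rng.shuffle(low)
    return high + low, z


# Hypothetical predicted (v1) and expected (v2) fluxes.
v1 = {"R1": 10.0, "R2": 1.0, "R3": 0.5, "R4": 2.0}
v2 = {"R1": 0.0, "R2": 1.0, "R3": 0.0, "R4": 1.5}
order, z_scores = prioritize_reactions(v1, v2)
```

In this example only R1 deviates strongly from the mean flux difference, so it is placed first and the remaining reactions follow in random order.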

Changing the threshold for binarizing gene expression data. The binarization threshold is used to binarize the gene expression data for estimating probabilities using PROM. We used the default value used for running PROM (0.33); i.e., genes below the 33^{rd} percentile of the overall expression distribution are considered to be OFF. If the binarization threshold is lowered, only genes with very low expression are considered to be OFF, and we would be unable to quantify interactions accurately. In addition, some interactions cannot be quantified at all because the corresponding genes are predicted to be ON in all conditions as a result of the low threshold (i.e., lost interactions). Decreasing the threshold to very low values (<0.1) decreases the accuracy of PROM, which leads to less comprehensive prediction. Increasing the threshold above 0.5 decreased the accuracy as well, as it results in genes that are ON being considered OFF. The ideal region for running PROM is around 0.3 to 0.4. We performed an additional analysis for GEMINI in which we tuned our predictions over a range of binarization threshold values. Our accuracy changes with the ability of PROM to accurately predict growth phenotype (

(TIF)
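The binarization step described above can be sketched as a quantile cutoff over all expression values. This is a minimal sketch and not the PROM implementation: the function name `binarize_expression`, the simple rank-based quantile rule and the toy matrix are assumptions made for illustration.

```python
def binarize_expression(matrix, quantile=0.33):
    """Mark each value ON (True) or OFF (False) relative to a cutoff taken
    at `quantile` of the pooled expression distribution."""
    values = sorted(x for row in matrix for x in row)
    if not values:
        raise ValueError("expression matrix is empty")
    # Assumption: a simple rank-based quantile without interpolation.
    cutoff = values[int(quantile * (len(values) - 1))]
    states = [[x >= cutoff for x in row] for row in matrix]
    return states, cutoff


# Hypothetical expression matrix: rows are genes, columns are conditions.
expression = [
    [1, 5, 9],
    [2, 6, 7],
    [3, 4, 8],
]
states, cutoff = binarize_expression(expression)
states_low, cutoff_low = binarize_expression(expression, quantile=0.0)
```

With the default quantile the two lowest values are called OFF. With a quantile of zero every gene is ON in every condition, which illustrates how a very low threshold leaves no OFF states from which to quantify an interaction.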

Enriched metabolic pathways in the refined Yeastract network.

(DOCX)

List of 1170 interactions that were predicted by GEMINI to be phenotype-inconsistent in only one of the four conditions (glucose, galactose, glycerol and ethanol). We predicted that these interactions might be true interactions that are conditionally inactive, and that the phenotype inconsistency might have arisen from post-transcriptional regulatory mechanisms inactivating these interactions in these conditions. We found that the top TFs with the most interactions in this list were inactivated through phosphorylation, consistent with our predictions.

(DOCX)
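The selection of interactions that are phenotype-inconsistent in only one condition can be sketched as a simple filter. This is an illustrative sketch only: the function name `inconsistent_in_one_condition`, the data layout (a mapping from each interaction to the set of conditions in which it was flagged) and the TF and gene labels are hypothetical.

```python
CONDITIONS = ("glucose", "galactose", "glycerol", "ethanol")


def inconsistent_in_one_condition(flags):
    """Return interactions flagged as inconsistent in exactly one condition,
    mapped to that condition."""
    result = {}
    for interaction, conditions in flags.items():
        unknown = set(conditions) - set(CONDITIONS)
        if unknown:
            raise ValueError(f"unknown conditions: {sorted(unknown)}")
        if len(set(conditions)) == 1:
            result[interaction] = next(iter(conditions))
    return result


# Hypothetical (TF, target) pairs and the conditions where each was flagged.
flags = {
    ("TF_A", "gene1"): {"glucose"},
    ("TF_A", "gene2"): {"glucose", "ethanol"},
    ("TF_B", "gene3"): set(),
    ("TF_C", "gene4"): {"glycerol"},
}
candidates = inconsistent_in_one_condition(flags)
```

Only the interactions flagged in a single condition are returned, and these would be the candidates for condition-specific inactivation.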

Analysis of alternate optimal solutions. We compared networks inferred from different flux states by introducing small changes to the expected growth rate. The similarity matrix below shows the network sizes for different growth thresholds (described in the

(DOCX)

We would like to thank Julie Bletz, Caroline Milne, James Eddy, Seth Ament, Nicholas Chia, Vangelis Simeonidis and Areejit Samal for critical readings of this manuscript. We thank Nitin Baliga and other members at the Institute for Systems Biology for their insightful comments and discussion.