Integrating Quantitative Knowledge into a Qualitative Gene Regulatory Network

Despite recent improvements in molecular techniques, biological knowledge remains incomplete. Any theorizing about living systems is therefore necessarily based on heterogeneous and partial information. Much current research has focused successfully on the qualitative behaviors of macromolecular networks. Nonetheless, such research cannot take into account available quantitative information such as time-series variations in protein concentration. The present work proposes a probabilistic modeling framework that integrates both kinds of information. Average case analysis methods are used in combination with Markov chains to link qualitative information about transcriptional regulations to quantitative information about protein concentrations. The approach is illustrated by modeling the carbon starvation response in Escherichia coli. It accurately predicts the quantitative time-series evolution of several protein concentrations using only knowledge of discrete gene interactions and a small number of quantitative observations on a single protein concentration. From this, the modeling technique also derives a ranking of interactions with respect to their importance during the experiment considered. This ranking is confirmed by the literature. Our method is thus novel principally in that it allows (i) a hybrid model integrating both a qualitative discrete model and quantitative data to be built, even from a small amount of quantitative information, (ii) new quantitative predictions to be derived, (iii) the robustness and relevance of interactions with respect to phenotypic criteria to be precisely quantified, and (iv) the key features of the model to be extracted and used as guidance for designing future experiments.


times [FIS] [CYA] growth
10 130 exponential

Since the model is formalized as a Piecewise Affine Differential Equation (PADE) system, either its biological graph (e.g., gene x activates the transcription of gene y) or its formalization as a dynamical system can be used to build an Event Transition Graph (ETG). As an application, we use the biological graph here. The corresponding ETG is pictured in Figure 6 of the manuscript. Notice that the switch between the two phases impacts the event transition graph by suppressing two transitions (fis+ → crp− and complex → fis−) in the stationary growth phase.
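The phase dependence of the ETG can be sketched as a set of directed transitions from which the two named transitions are removed in the stationary phase. This is only an illustrative sketch: the function name `etg_for_phase`, the string encoding of events, and the placeholder transition `("a+", "b-")` are assumptions made for the example, not part of the model described here.

```python
# Transitions are (source event, target event) pairs; labels follow the text.
# These two are the ones the text reports as suppressed in the stationary phase.
SUPPRESSED_IN_STATIONARY = {("fis+", "crp-"), ("complex", "fis-")}


def etg_for_phase(full_etg, phase):
    """Return the transition set of the ETG for the given growth phase.

    full_etg: iterable of (source, target) pairs for the exponential phase.
    phase: "exponential" or "stationary".
    """
    transitions = set(full_etg)
    if phase == "exponential":
        return transitions
    if phase == "stationary":
        return transitions - SUPPRESSED_IN_STATIONARY
    raise ValueError(f"unknown growth phase: {phase!r}")


# Hypothetical graph: the two named transitions plus one placeholder edge.
full_etg = {("fis+", "crp-"), ("complex", "fis-"), ("a+", "b-")}
stationary_etg = etg_for_phase(full_etg, "stationary")
```

Keeping one full transition set and deriving the phase-specific graph from it makes the difference between the two phases explicit in a single place.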

Training dataset and costs
For the sake of clarity, we present here the data used for training the model (i.e., for estimating the probability matrices). Note that three data points are needed to obtain this information.

Datasets. The concentration evolution rates can be determined for both phases from the corresponding figure. For instance, the growth rate of FIS in the stationary growth phase, computed from its relative values at times 2 and 80 minutes, equals $\exp\left(\frac{\log(100) - \log(10)}{80 - 2}\right) \approx 1.03$.
This value means that the FIS protein concentration increases by 3% each minute. To use it in our model, it is necessary to obtain the corresponding rate per transition of the model, and thus to know the number of iterations performed by the model in one minute. We argue that FIS, CYA and other proteins are degraded as soon as a sufficient number of their amino acids are degraded. In accordance with the N-end rule [Varshavsky, A. (1997). "The N-end rule pathway of protein degradation". Genes to Cells 2 (1): 13-28], we take a duration of 2 minutes as the minimal half-life for these amino acids. Thus, with a natural degradation rate of 5% per transition, the model runs n iterations to degrade half of the proteins present, where n satisfies $0.95^n = 0.5$. Here, n ≈ 14, implying that 7 iterations are performed per minute. The known concentration evolution rates in both phases, expressed on a per-iteration scale, are summarized below. Figure 4 depicts an example of a probability assignment that satisfies the expected growth ratios for protein FIS.
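The arithmetic above can be checked with a short sketch. The function names are illustrative, and two steps are assumptions rather than statements from the text: n is obtained by rounding up to the next integer, and the per-minute factor is converted to a per-iteration factor by taking its 7th root.

```python
import math


def per_minute_growth_factor(c_start, c_end, t_start, t_end):
    """Mean multiplicative growth factor per minute between two observations."""
    return math.exp((math.log(c_end) - math.log(c_start)) / (t_end - t_start))


def iterations_per_half_life(degradation_rate):
    """Smallest integer n such that (1 - degradation_rate)**n <= 0.5."""
    return math.ceil(math.log(0.5) / math.log(1.0 - degradation_rate))


def per_iteration_growth_factor(per_minute_factor, iterations_per_minute):
    """Assumed conversion: spread the per-minute factor evenly over iterations."""
    return per_minute_factor ** (1.0 / iterations_per_minute)


# FIS relative values 10 and 100 at times 2 and 80 minutes (from the text).
fis_per_minute = per_minute_growth_factor(10, 100, 2, 80)

# 5% degradation per transition and a 2-minute half-life (from the text).
n_half_life = iterations_per_half_life(0.05)
iterations_per_minute = n_half_life / 2.0

fis_per_iteration = per_iteration_growth_factor(fis_per_minute, iterations_per_minute)

print(round(fis_per_minute, 2), n_half_life, iterations_per_minute)
```

Under these assumptions, a 3% increase per minute corresponds to an increase of roughly 0.4% per model iteration.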

1 used for inference
2 used for validation

Figure 3: Event transition graph and an example of corresponding probabilities after an estimation based on experimental data such as given in Figure 2. Note that several probability assignments fit the experimental knowledge.

Model validation
To validate our modeling technique as applied to E. coli, we compare the predicted time series with those observed experimentally during both growth phases. As previously mentioned, the CYA and FIS concentration behaviors were investigated: the FIS and CYA observations are compared with their respective model predictions. A Pearson correlation test confirms the accuracy of the predictions. Note that, in the computed time series, a value is set to 1 if it is smaller than 1 and to 100 if it is greater than 100.
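The clamping rule and the correlation between observed and predicted series can be sketched as follows. This is a minimal illustration: the function names and the sample series are hypothetical, and only the Pearson coefficient is computed, not the significance test.

```python
import math


def clip_series(values, low=1.0, high=100.0):
    """Clamp each computed value to the range [low, high], as described above."""
    return [min(max(v, low), high) for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical series, for illustration only (not data from the experiment).
observed = [10.0, 30.0, 60.0, 100.0]
raw_prediction = [0.5, 28.0, 65.0, 130.0]

predicted = clip_series(raw_prediction)
r = pearson_r(observed, predicted)
```

A coefficient close to 1 indicates that the predicted series follows the same trend as the observations; it does not by itself show that the absolute values agree.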

Model predictions
It is also possible to predict the concentration evolution of other proteins, assuming, for instance, an initial concentration of 50% for each unknown protein. The resulting predictions are depicted in Figure 5.

Sensitivity of the ETG transitions
Computing the sensitivity of the model makes it possible to rank the transitions according to their partial derivatives. The higher the sensitivity of a transition, the more tightly its probability is constrained to a fixed value. Sensitivity is expressed as a percentage with the following meaning: if the given probability is changed by 1%, then the Euclidean distance between the expected growth ratios and their predictions changes by X% (the given sensitivity). Each reported sensitivity is computed as the mean over 100 transition matrices satisfying the observed FIS protein evolution. Such information is useful for classifying the transitions according to their importance in the system. The sensitivities are depicted in Figure 6.
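The definition above can be approximated by a finite-difference sketch. Everything model-specific here is an assumption: `predict` stands for a hypothetical function mapping transition probabilities to predicted growth ratios, the perturbed probabilities are not renormalized, and the baseline distance is assumed to be non-zero. The toy model at the end is invented purely to exercise the code.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transition_sensitivity(predict, probs, expected, index, rel_step=0.01):
    """Percent change in distance when probs[index] is increased by 1%."""
    base = euclidean_distance(predict(probs), expected)
    if base == 0.0:
        raise ValueError("baseline distance is zero; relative change undefined")
    perturbed = list(probs)
    perturbed[index] *= 1.0 + rel_step
    shifted = euclidean_distance(predict(perturbed), expected)
    return 100.0 * abs(shifted - base) / base


def mean_sensitivities(predict, assignments, expected):
    """Average each transition's sensitivity over several assignments."""
    n_transitions = len(assignments[0])
    return [
        sum(transition_sensitivity(predict, p, expected, i) for p in assignments)
        / len(assignments)
        for i in range(n_transitions)
    ]


def toy_predict(probs):
    """Hypothetical model: first probability matters ten times more."""
    return [1.0 + 0.1 * probs[0], 1.0 + 0.01 * probs[1]]


assignments = [[0.5, 0.5], [0.4, 0.6]]  # stand-ins for the 100 matrices
sens = mean_sensitivities(toy_predict, assignments, expected=[1.0, 1.0])
ranking = sorted(range(len(sens)), key=lambda i: sens[i], reverse=True)
```

In the toy model the first transition comes out as the most sensitive, which is the kind of ranking the averaged sensitivities are meant to provide.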

Figure 5: Event transition graph and corresponding sensitivities after an estimation based on experimental data such as given in Figure 2. Panels: stationary growth; exponential growth.