Model certainty in cellular network-driven processes with missing data

Mathematical models are often used to explore network-driven cellular processes from a systems perspective. However, a dearth of quantitative data suitable for model calibration leads to models with parameter unidentifiability and questionable predictive power. Here we introduce a combined Bayesian and Machine Learning Measurement Model approach to explore how quantitative and non-quantitative data constrain models of apoptosis execution within a missing data context. We find that model prediction accuracy and certainty depend strongly on rigorous, data-driven formulations of the measurement and on the size and make-up of the datasets. For instance, two orders of magnitude more ordinal (e.g., immunoblot) data are necessary to achieve accuracy comparable to quantitative (e.g., fluorescence) data for calibration of an apoptosis execution model. Notably, ordinal and nominal (e.g., cell fate observations) non-quantitative data synergize to reduce model uncertainty and improve accuracy. Finally, we demonstrate the potential of a data-driven Measurement Model approach to identify model features that could lead to informative experimental measurements and improve model predictive power.

In my view, the authors addressed many of the raised issues satisfactorily. However, I find the responses to Comments 2 and 17 somewhat lacking.

Comment 2: While the overall presentation has been improved, in some parts the manuscript still appears to have been sloppily put together. It does not help that the revised manuscript (RM) and the revised manuscript with highlighted changes (RMH) are not consistent with each other. The RM lists the legends of Figure 1-Figure 7, and the actual figures that follow immediately after p.25 are also labeled Figure 1-Figure 7. By contrast, the RMH lists the legends of Figure 1-Figure 5 and Box 1-Box 2 (p.20-p.24). As far as I can see, Box 1 and Box 2 are never mentioned in the RM. Presumably as a consequence, many of the figure references in the RM and in the supplemental (S.) material do not seem to be correct. Further, the RM has an Author Summary while the RMH does not. All these issues should be carefully and thoroughly resolved.
There are also a number of minor issues that appear in both the RM and the RMH. In the following, page and line numbers refer to the RM.
• Eq. 1: Please indicate that the right-hand side (rhs) of an ODE typically depends on x(t) as well, i.e., ẋ(t) = f(t, x(t), θ) rather than ẋ(t) = f(t, θ).
• line 268: "We subsequently calibrated the parameters to fluorescence data." Please provide a reference for the data used.
• line 303: "These data differ from synthetic ordinal and nominal datasets in that they lack a known reference to "ground truth" dynamics..." I am somewhat confused by this statement because S. Fig. 18 shows a time course that is labeled "true". Also, please provide the reference for the fractional cell death data in the main text.
• line 358: "Shown in Figure 5C-E, we represent this as a flexible assumption by encoding it in our priors (ii -iv) i.e., Cauchy distributions..." However, the prior in Figure 5C is a uniform rather than a Cauchy distribution.
• On occasion, y_obs is still used instead of ŷ to refer to observed data. See line 897, "...can be compared to data y_obs(t_i)", and, by contrast, Eq. 6, which uses ŷ. Similarly, see line 634, "...where data the, y_obs,...", and, by contrast, Eq. 13, which uses ŷ.
• S. Table 3: "The prior and posterior plots below are of Θ_j." There are no plots immediately below S. Table 3; please refer to the specific figures that contain the plots.
• S. Table 5, typo: "All kf use cell/copies-s as their units."
• S. Fig. 7: the x-axis label is partially cut off.
• S. Figs. 18, 19A, and 19B refer only to the Methods. Please provide the specific section as well.
• The log(k) − τ plot in S. Fig. 20 indicates a TRAIL dose of 50 ng/ml. However, the figure caption and the main text on p. 10 only mention a 25 ng/ml TRAIL dose.
The following sentences do not seem to be entirely correct:
• line 267: "We centered our the prior for..."

Comment 17: In response to this comment, the authors added on p. 14, line 659: "To represent the irreversible progression toward an apoptotic cell fate (and monotonic accumulation of apoptosis cell fate markers, tBID and cPARP), each ordinal constraint function is combined, using the sequential model..." and, on the same page, line 662: "The cumulative, and adjacent-category models may also be appropriate in other contexts." I find this response somewhat too thin.
In particular, the second sentence on the cumulative and adjacent-category models does not add much in my view and, as it stands, should be removed. Also, is the irreversible progression the reason why the authors chose the sequential model over the others, or is it the monotonic accumulation of the markers? Please clarify, and briefly motivate the choice of the sequential model. Could the other two models be used here? Why or why not (briefly)?
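To make the distinction I have in mind concrete, the following is a minimal sketch of my own, not the authors' implementation. It shows how the cumulative, sequential (continuation-ratio), and adjacent-category ordinal models map a latent score η to category probabilities. The logistic link, the cut points θ_k, and the function names are assumptions chosen for illustration. In the sequential form, each term is the probability of progressing past stage k given that stage k has been reached. This is one natural way to encode irreversible, stagewise progression. The other two forms lack this conditional interpretation, but their category probabilities still shift monotonically with η.

```python
import numpy as np

def logistic(z):
    # Logistic link F(z); the choice of link is an assumption for illustration.
    return 1.0 / (1.0 + np.exp(-z))

def cumulative_probs(eta, thresholds):
    # Cumulative model: P(Y <= k) = F(theta_k - eta).
    # Hypothetical cut points must be strictly increasing.
    cdf = logistic(np.asarray(thresholds) - eta)
    return np.diff(np.concatenate([[0.0], cdf, [1.0]]))

def sequential_probs(eta, thresholds):
    # Sequential (continuation-ratio) model:
    # s_k = P(Y > k | Y >= k) = F(eta - theta_k), i.e. progress past stage k.
    s = logistic(eta - np.asarray(thresholds))
    reach = np.concatenate([[1.0], np.cumprod(s)])  # P(Y >= k)
    stop = np.concatenate([1.0 - s, [1.0]])         # P(Y = k | Y >= k)
    return reach * stop

def adjacent_probs(eta, thresholds):
    # Adjacent-category model: log[P(Y = k+1) / P(Y = k)] = eta - theta_k.
    logits = np.concatenate([[0.0], np.cumsum(eta - np.asarray(thresholds))])
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

# Usage with hypothetical cut points for four ordinal categories.
thresholds = [-1.0, 0.5, 2.0]
for model in (cumulative_probs, sequential_probs, adjacent_probs):
    print(model.__name__, np.round(model(0.8, thresholds), 3))
```

For a binary outcome, all three forms coincide. They differ only once there are three or more ordered categories, which is the setting relevant to the immunoblot data.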