Time series experimental design under one-shot sampling: The importance of condition diversity

Many biological data sets are prepared using one-shot sampling, in which each individual organism is sampled at most once. Time series therefore do not follow trajectories of individuals over time. However, samples collected at different times from individuals grown under the same conditions share the same perturbations of the biological processes, and hence behave as surrogates for multiple samples from a single individual at different times. This implies the importance of growing individuals under multiple conditions if one-shot sampling is used. This paper models the condition effect explicitly by using condition-dependent nominal mRNA production amounts for each gene, quantifies the performance of network structure estimators both analytically and numerically, and illustrates the difficulty of network reconstruction under one-shot sampling when the condition effect is absent. A case study of an Arabidopsis circadian clock network model is also included.


1. A table describing the simulations performed and summarizing the behavior.
Response: We now summarize the simulations in Table 1 on page 20.

2. A description of how the adjacency matrix A was chosen from simulations.
Response: We now emphasize on page 17 that the adjacency matrix A is drawn i.i.d. from the prior distribution described in Appendix S2.
That is, instead of fixing a ground-truth A as in Fig 4, we fix a prior distribution on A, namely the split Gaussian prior described in Appendix S2 (assuming it is known that there is no autoregulation), and draw A i.i.d. from this prior with d_max = 3.
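To make the sampling procedure concrete, the following is a minimal sketch of drawing adjacency matrices i.i.d. from a prior of this general kind. It is not the exact prior of Appendix S2. The function name `sample_adjacency`, the uniform in-degree distribution, and the parameters `mean` and `std` are assumptions introduced for illustration, and "split Gaussian" is assumed here to mean an equal mixture of Gaussians centered at +mean and -mean.

```python
import numpy as np

def sample_adjacency(n_genes, d_max=3, mean=1.0, std=0.5, rng=None):
    """Draw one signed, weighted adjacency matrix A from a hypothetical prior.

    A[i, j] != 0 means gene i regulates gene j. Assumptions (not from the paper):
    in-degree is uniform on {0, ..., d_max}, and each edge weight is drawn from
    an equal mixture of N(+mean, std^2) and N(-mean, std^2).
    """
    rng = np.random.default_rng(rng)
    A = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        # Exclude j itself: no autoregulation, so the diagonal stays zero.
        candidates = np.delete(np.arange(n_genes), j)
        # Upper bound of integers() is exclusive, hence the + 1.
        in_degree = rng.integers(0, min(d_max, n_genes - 1) + 1)
        regulators = rng.choice(candidates, size=in_degree, replace=False)
        # Random sign times N(mean, std^2) gives the two-component mixture.
        signs = rng.choice([-1.0, 1.0], size=in_degree)
        A[regulators, j] = signs * rng.normal(mean, std, size=in_degree)
    return A

# Independent draws from the same prior, sharing one random generator.
rng = np.random.default_rng(0)
networks = [sample_adjacency(10, d_max=3, rng=rng) for _ in range(5)]
```

Each simulation run would then use a fresh draw of A rather than a single fixed ground-truth network, so that reported performance is averaged over the prior.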
3. An application on "real" data.
Response: We could not find a real biological dataset with both one-shot and multi-shot samples and a reliable ground truth network. As we discuss on pages 22-25, most expression datasets are one-shot data, and there is no well-accepted ground truth. Thus, to address the reviewer's comment, we generate expression data from a widely accepted Arabidopsis circadian clock model using a DREAM-challenge-like SDE model and run BSLR on both the one-shot and the multi-shot datasets. We show that even for such a sophisticated model, with nonlinear regulation, continuous-time dynamics, unobserved cytoplasmic and nuclear protein concentrations, and very few samples, the simple BSLR algorithm demonstrates a clear improvement from one-shot sampling to multi-shot sampling. The results also show that one-shot data is better paired with replicate averaging, while multi-shot data is better paired with no replicate averaging, illustrating the effect of the sampling method on data analysis decisions.
See the new discussion section "A case study on Arabidopsis circadian clock network" on pages 25-28 for the details.

4. A comparison of performance to other methods on some of the DREAM data.
Response: Because the focus of the paper is on the difference between one-shot and multi-shot sampling methods and how the difference can be mitigated by treating one-shot data under the same conditions as multi-shot data, we feel that a comparison to algorithms that do not take this difference into consideration would be a digression.
As a result, we choose GLRT (for single-gene recovery) and BSLR (for multi-gene recovery), which are theoretically optimal, practically competitive, and intuitively understandable algorithms, as opposed to other variants based on approximations. We believe the results on the two simple algorithms are easier to interpret for the purpose of this paper.
As for the DREAM challenge data, neither the real biological data nor the in silico data includes one-shot sampling. Hence they are not suitable for validating the results in this paper. However, as we mentioned in our response to the previous comment, we do include simulated expression data for a widely accepted Arabidopsis circadian clock network, generated using an SDE model similar to that of the in silico data in the DREAM challenges.
5. The section "On biological replicates" is a bit odd and seems tacked on. Particularly, the discussion of differential expression tools seems odd. The goal in those tools is not to infer gene regulation, but to infer whether the expression of some set of genes is changed given some experimental perturbation. Not sure what you mean here (line 551):