
I have read the journal’s policy and the authors of this manuscript have the following competing interests: PKS is a member of the SAB or Board of Directors of Merrimack Pharmaceutical, Glencoe Software, Applied Biomath, and RareCyte and has received research funding from Novartis and Merck. PKS declares that none of these relationships are directly or indirectly related to the content of this manuscript.

The goal of many single-cell studies on eukaryotic cells is to gain insight into the biochemical reactions that control cell fate and state. In this paper we introduce the concept of Effective Stoichiometric Spaces (ESS) to guide the reconstruction of biochemical networks from multiplexed, fixed time-point, single-cell data. In contrast to methods based solely on statistical models of data, the ESS method leverages the power of the geometric theory of toric varieties to begin unraveling the structure of chemical reaction networks (CRN). This application of toric theory enables a data-driven mapping of covariance relationships in single-cell measurements into stoichiometric information, one in which each cell subpopulation has its associated ESS interpreted in terms of CRN theory. In the development of ESS we reframe certain aspects of the theory of CRN to better match data analysis. As an application of our approach we process cytometry- and image-based single-cell datasets and identify differences in cells treated with kinase inhibitors. Our approach is directly applicable to data acquired using readily accessible experimental methods such as Fluorescence Activated Cell Sorting (FACS) and multiplex immunofluorescence.

We introduce a new notion, which we call the effective stoichiometric space (ESS), that elucidates network structure from the covariances of single-cell multiplex data. The ESS approach differs from methods that are based on purely statistical models of data: it allows a completely new and data-driven translation of the theory of toric varieties in geometry and specifically their role in chemical reaction networks (CRN). In the process, we reframe certain aspects of the theory of CRN. As illustrations of our approach, we find stoichiometry in different single-cell datasets, and pinpoint dose-dependence of network perturbations in drug-treated cells.


Single-cell, multiplex datasets have become prevalent [

A wide variety of tools have been developed for visualization of single-cell data, including t-SNE [

Unexpectedly, we have been able to sidestep some of the challenges posed by MAK in a cellular context by leveraging geometric aspects of dynamical systems and thereby obtaining analytical results from single-cell data. Chemical Reaction Network Theory (CRNT) is a branch of dynamical systems analysis that focuses primarily on topological features of a reaction network [

To illustrate this approach, we briefly review some basic definitions. For _{i} involved in a reaction:
_{i}} and {_{i}}, we define a

Provided the reverse reaction exists, the _{eq}. This equality can be rewritten in terms of

Given a network

We will focus on two objects associated with such systems: 1) the _{1} = _{−1} = 1, and its one-dimensional stoichiometric subspace is represented by a yellow line. The surface characterizes the network well, since any initial concentration ([

(a) Several simulated trajectories of the reaction network _{1} = _{−1} = 1. The steady state set is shown in cyan/magenta, along with some of its level sets for fixed values of [

More specifically, among MAK dynamical systems, the subset known as “complex-balanced” reaction networks (which includes the familiar case of “detailed-balanced networks” [

This is analogous to the case of a single reaction in _{i} is not the equilibrium constant of the isolated reaction, but is instead a constant that accounts for kinetic constants from the entire network. The satisfaction of these equalities implies that, in log-concentration space, the transformed steady state set, log(_{i} are additionally constrained [

More generally, the subset of reaction networks that obey log-linearity are called

We represent a single cell by a vector that includes as components the concentrations _{G}}. The localization of a reactant into different cellular compartments (e.g. nucleus and cytoplasm) or different macro-molecular assemblies is managed using the conventional compartmentalized formalism and simply adds elements to

We reframe the equations described in the introduction in terms of the distribution of chemical trajectories from a population of cells, _{fixed} is large in an appropriate sense. Typically, it is only possible to observe a subset of the species in a cellular reaction network. We find that when only a subset of the chemical species are observed,

Exploring non-complex-balanced networks by simulation and examples, we find that our analysis method still recovers subspaces tied to reaction network topology, analogous to how

With this theoretical background, we show that single-cell, multiplex data (sc-data) that can feasibly be obtained from mammalian cells using multiplexed flow cytometry (FACS) or multiplexed immunofluorescence (using CyCIF [

Suppose that a population of

Applied to the example in

In general, if we identify each cell in a population with a vector for the concentrations of all its relevant biochemical species

For finite but sufficiently long times _{i} ⊂ _{1} ⊂ _{2} ⊂ … ⊂

Following our earlier example of _{f} = 0.2, _{r} = 0.1, much slower than the original reaction. The trajectories now converge first to the earlier surface, since it is the steady state of the fast reaction. With enough time, those trajectories eventually converge to the steady state of both reactions (see

(a) Simulated trajectories are shown for the reaction network with the additional, slow reaction _{f} = 0.2, _{r} = 0.1. Trajectories first converge toward the steady state set of the fast reactions alone, the slow manifold, before slowly converging to the complete steady state (black). (b) An example of a log-linear steady state set (blue, parameterized as (^{2}, ^{3})) and its stoichiometric subspace (yellow) are depicted. Supposing we observe

For detailed-balanced reaction networks, slow manifolds are approximately the steady state sets of fast networks defined by ignoring slow reactions [

MAK assumes well-mixed, elementary reactions involving the collision of molecules, but single-cell experiments never provide data on all, or even most, of the chemical species participating in elementary reactions for any given biological process. However, assuming that MAK adequately describes the elementary reactions, our conclusions change minimally after accounting for these unobserved species, thanks to log-linearity. More specifically, given a complex-balanced MAK network _{obs} of _{obs} is the intersection of the stoichiometric subspace and the observable space (see

As an example in _{obs} shown in black, is the intersection between the plane

The fact that the observed orthogonal complement _{obs} is a subset of

In summary, not only is _{obs} composed of net reaction vectors, the equality in _{obs} with the intersection implies that _{obs} contains
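The intersection of the stoichiometric subspace with the observable space can be computed directly when the full network is known. The following is a minimal sketch, not the authors' implementation: the function names `null_space` and `observed_subspace`, the tolerance, and the three-species toy network are all assumptions made for illustration. It looks for combinations of reaction vectors whose unobserved components cancel, and returns them restricted to the observed species.

```python
import numpy as np

def null_space(mat, tol=1e-10):
    """Orthonormal basis (as columns) of the null space of `mat`, via SVD."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    _, s, vt = np.linalg.svd(mat)          # vt is square (full_matrices=True)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

def observed_subspace(reaction_vectors, observed_idx, tol=1e-10):
    """Basis of the vectors in the stoichiometric subspace that vanish on all
    unobserved species, restricted to the observed coordinates."""
    R = np.asarray(reaction_vectors, dtype=float).T    # species x reactions
    observed = set(observed_idx)
    unobserved_idx = [i for i in range(R.shape[0]) if i not in observed]
    if unobserved_idx:
        # coefficient vectors c with R_unobs @ c = 0
        coeffs = null_space(R[unobserved_idx, :], tol)
    else:
        coeffs = np.eye(R.shape[1])
    combos = R[list(observed_idx), :] @ coeffs
    if combos.shape[1] == 0:
        return np.zeros((len(observed_idx), 0))
    # orthonormalise and drop numerically zero directions
    u, s, _ = np.linalg.svd(combos, full_matrices=False)
    return u[:, s > tol]

# Hypothetical toy network A -> B -> C, with B unobserved.
reactions = [[-1, 1, 0],    # A -> B
             [0, -1, 1]]    # B -> C
basis = observed_subspace(reactions, observed_idx=[0, 2])
print(basis)   # one direction, proportional to the net reaction A -> C
```

In this toy case the only surviving direction is the net reaction from A to C, which illustrates how net reaction vectors arise once intermediate species are hidden.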

Whereas complex-balanced networks provide a sufficient condition for the previous results to hold, similar results hold for a larger class of MAK networks, relying on the log-linearity of steady states.

For example, take a simplified kinase(

The complexes

The orthogonal complement contains (1, 1, −1, −1), which would be seen in data, informing us that

Pérez-Millán et al. provide a sufficient condition for a reaction network to have “toric steady states” [^{⊥} of steady states in log coordinates need not coincide with the stoichiometric subspace, although ^{⊥} still relates to network topology.

Furthermore, the steady state set need only be a

Despite this, the previous log-linear constraint in

Although some classes of non-complex-balanced networks have been treated analytically [

Parameter | Log-Mean | Log-Variance |
---|---|---|
Concentration at t = 0 (CB) | 4 | 4 |
Kinetic Constants (CB) | 2.5, 3 | 0.05 |
Concentration at t = 0 (GRN) | 5 | 8 |
Unbound Production Constants (GRN) | 1 | 1 |
Bound Production Constants (GRN) | 3 | 3 |
Protein Binding-Unbinding Constants (GRN) | 3 | 1 |
Protein Degradation Rate (GRN) | 3 | none |

(a) Example complex-balanced simulation, analyzed by PCA, shows 11 small eigenvalues, as expected from the simulated network’s structure, leading to a gap (red line) that grows larger with time. PAD shows that the span of these 11 eigenvectors converges to the true stoichiometric subspace. The 10th and 11th eigenvalues decrease more slowly than the others, due to slow reactions in the simulation. (b) An example GRN simulation for

Having confirmed our conclusions about single-cell data covariance on a complex-balanced simulation, we turned to a non-complex-balanced model. We simulated a Gene Regulatory Network (GRN) with _{i}, _{i}, and ∼70% (chosen randomly) of the possible ^{2} protein-bound genes

Transcription/translation was lumped into a single protein production step for the sake of simplicity. The interpretation of this assumption is that mRNA turnover is faster than protein turnover, which is not biologically unreasonable [

When the distribution of trajectories at the end of the simulations was analyzed by PCA, the eigenvalues of the covariance matrix for all simulations exhibited gaps visible in

To understand the convergence of the reversible reactions and the _{A}, _{B}, that of their protein products _{A}, _{B}, and one protein-bound gene, _{B} and _{B} in a 1 and −1 ratio, even though this is not a reaction vector. Finally, we have

This possibly explains the origin of the ^{th} protein species and the second being the most active bound-state of the ^{th} gene), which is tilted ^{th} protein’s reaction vector (1, 0). In our simulation, multiple protein-bound variants existed for any gene, which adds

From this one small example, we see that log-linear constraints arise from complex-balanced reactions, from a balance between production and degradation, and from a biological, asymptotic case. We expect log-linear constraints to be mechanistically informative, even without complex-balancing, and thus our framework may be useful with further development in the analysis of general biological systems.

In the remainder of this paper we refer to the orthogonal complement of the minimal, linear set containing the log steady state as the Effective Stoichiometric Space (ESS). The previous examples demonstrate the potential value of the ESS for mechanistic analysis of biological systems, which may often be considered non-complex-balanced. At the very least, if one wants to constrain an asymptotically stable reaction network model using single-cell, multiplex data, a data-derived ESS identifies log-linear relations that must appear in the model’s dynamical equations; this is a fairly precise constraint, since generic polynomial equations are seldom log-linear.
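As a rough illustration of how an ESS might be estimated from data, the sketch below eigendecomposes the covariance of log-transformed concentrations and keeps the directions of smallest variance. It is a simplified sketch under stated assumptions, not the analysis pipeline used in this paper: the function name `effective_stoichiometric_space`, the toy reaction A + B ⇌ C, the equilibrium constant `K`, and the noise level are all hypothetical, and the number of ESS dimensions `n_small` is supplied by hand rather than read from a gap.

```python
import numpy as np

def effective_stoichiometric_space(concentrations, n_small):
    """Eigendecompose the covariance of log-concentrations.

    Returns the eigenvalues in ascending order and the `n_small`
    eigenvectors (as columns) with the smallest variance, whose span is
    taken here as the estimated ESS.
    """
    log_x = np.log(np.asarray(concentrations, dtype=float))
    cov = np.cov(log_x, rowvar=False)          # species x species
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    return eigvals, eigvecs[:, :n_small]

# Hypothetical population of cells at equilibrium of A + B <-> C,
# so that [C] = K [A][B] up to a small multiplicative noise.
rng = np.random.default_rng(0)
n_cells = 2000
K = 2.0
a = rng.lognormal(mean=0.0, sigma=1.0, size=n_cells)
b = rng.lognormal(mean=0.0, sigma=1.0, size=n_cells)
c = K * a * b * np.exp(rng.normal(0.0, 0.01, size=n_cells))
cells = np.column_stack([a, b, c])

eigvals, ess = effective_stoichiometric_space(cells, n_small=1)
print(eigvals)       # one eigenvalue far smaller than the other two
print(ess[:, 0])     # close to +/- (1, 1, -1) / sqrt(3)
```

In this toy population the single low-variance direction aligns with the reaction vector of A + B ⇌ C, because log[A] + log[B] − log[C] is nearly constant across cells.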

We analyzed a previously published multi-parameter Fluorescence-Activated Cell Sorting (FACS) dataset in which the levels of 11 phospho-proteins in the ERK/Akt signaling pathway were measured in naive primary human CD4+ T-cells [

FACS data from each condition were fit with a two-component Gaussian Mixture Model (GMM) to distinguish two empirical subpopulations, and the larger component was analyzed further. For each condition, the covariance matrix was eigendecomposed. Each eigenvalue spectrum showed at least one gap, denoted by an orange arrow in

(a) Eigenvalue spectra from PCA of the larger CD4+ subpopulations are shown for 4 of the 13 conditions (shifted to avoid overlap). Apparent gaps are denoted by orange arrows. (b) The small eigenvectors were linearly recombined by row reduction on their transpose, with complete pivoting, for ease of interpretation. The distribution of the linearly recombined entries from all 13 conditions is shown as a histogram (not including the 0s and 1s that are necessarily produced by row reduction), as well as with a Gaussian smoothing kernel of bandwidth 0.04. Peaks seem to appear at -1/3, -2/3, and -1. The null distribution for random, sparse constraints is also shown for comparison. (c) As an example, the recombined vectors for Condition A are shown, with bootstrapped 95% confidence intervals. Other conditions are similar in appearance.
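The linear recombination of small eigenvectors can be sketched as Gauss-Jordan elimination with complete pivoting, applied to a matrix whose rows are the eigenvectors. This is an illustrative implementation under our own assumptions (the function name `row_reduce_complete_pivoting`, the tolerance, and the example vectors are hypothetical), not necessarily the routine used for the figures.

```python
import numpy as np

def row_reduce_complete_pivoting(vectors, tol=1e-12):
    """Gauss-Jordan elimination with complete pivoting.

    `vectors` holds one vector per row. At each step the largest remaining
    entry (over all remaining rows and all columns) is used as the pivot,
    its row is scaled so the pivot equals 1, and the pivot column is
    cleared from every other row. The row space is unchanged.
    """
    M = np.array(vectors, dtype=float)
    n_rows, n_cols = M.shape
    pivot_cols = []
    for k in range(min(n_rows, n_cols)):
        sub = np.abs(M[k:, :])
        r, c = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[r, c] < tol:
            break                        # remaining rows are numerically zero
        r, c = int(r) + k, int(c)
        M[[k, r]] = M[[r, k]]            # move the pivot row into position k
        M[k] /= M[k, c]                  # scale so the pivot entry is 1
        for i in range(n_rows):
            if i != k:
                M[i] -= M[i, c] * M[k]   # clear the pivot column elsewhere
        pivot_cols.append(c)
    return M, pivot_cols

# Two hypothetical orthonormal "small eigenvectors" over four markers.
small_vecs = np.array([[0.5, 0.5, -0.5, 0.5],
                       [0.5, -0.5, 0.5, 0.5]])
reduced, pivots = row_reduce_complete_pivoting(small_vecs)
print(reduced)
print(pivots)
```

Because each pivot is the largest available entry, the remaining entries of each recombined vector have magnitude at most 1 relative to its pivot, which makes ratios such as -1/3 or -1 easy to read off.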

Gaps in eigenvalue spectra were identified by visual inspection, based on the presence of abrupt discontinuities, but the approach is not rigorous. Principled methods do exist to identify which gaps are significant [

Each ESS was defined by choosing the gap farthest right. The corresponding eigenvectors were then interpreted by linearly recombining them by row reducing their transpose with complete pivoting [

Additionally, the recombined eigenvectors’ entries (excluding the 1s and 0s necessarily produced by row reduction) had a distinct distribution (see

We also analyzed a Cyclic Immunofluorescence (CyCIF) dataset that comprises measurement of the levels and modification states of 26 antigens with a focus on phospho-states of proteins involved in apoptosis, Akt/Erk signaling, cell cycle progression, and cytoskeletal structure. The dataset is found in the Library of Integrated Network-based Cellular Signatures (

The eigenvalue spectra for each of the conditions also exhibited gaps, as denoted by orange arrows in

(a) The dominant subpopulations of the MCF10A cells were analyzed by PCA and the eigenvalue spectra are shown for 3 of the 24 conditions (shifted to avoid overlap). Some, but not all, apparent gaps are denoted by orange arrows. (b) Singular eigenvectors were linearly recombined by row reduction on their transpose, with complete pivoting. The distribution of entries is displayed, along with a null. (c) Condition A’s dominant subpopulation’s recombined, singular eigenvectors are shown. Net reactions link the various proteins, such as S6 with mTOR, or the two phosphoforms of S6 (235 and 240). Other conditions show similar sparseness.

For the CyCIF data, we analyzed the dose and drug dependence of the ESS associated with the dominant fluorescent signal in each channel. To compare the subspaces from any two conditions, we first performed PAD between all pairs of conditions, and then summarized the principal angles _{i} with the metric

Between conditions that did and did not involve exposure to drug (DMSO-only control samples), the ESS changed substantially, as shown in _{1} and _{2} having the same substrate

(a) Analyses of the ESS between conditions, as quantified by an angle-based metric for a common cutoff of a 10-dimensional ESS. Average metric between a drug condition and the four DMSO replicates (top) are plotted, as well as between any pair of non-zero doses of drug (bottom), arranged by increasing dose. The cumulative distribution of the metric is shown for the pairs between DMSO null replicates, the pairs that used the same drug, and the pairs that used different drugs. (b) Analyses of the LDA separation between the high-dimensional marker distributions, with analogous comparisons as above.
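Principal angles between two subspaces, the quantity underlying the comparison of ESSs across conditions, can be computed from the singular values of the product of orthonormal bases. The sketch below shows only that step; the function name `principal_angles` and the example subspaces are assumptions for illustration, and the summary metric used in the figure is not reproduced here.

```python
import numpy as np

def principal_angles(basis_a, basis_b):
    """Principal angles (radians, ascending) between the column spans of
    `basis_a` and `basis_b`, which must have full column rank."""
    qa, _ = np.linalg.qr(np.asarray(basis_a, dtype=float))
    qb, _ = np.linalg.qr(np.asarray(basis_b, dtype=float))
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Hypothetical example in three dimensions: two planes sharing one axis.
plane_xy = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
plane_xz = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
angles = principal_angles(plane_xy, plane_xz)
print(np.degrees(angles))
```

Two conditions with the same ESS would give angles that are all close to zero, whereas a change in the ESS shows up as one or more angles approaching 90 degrees.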

The steady state set is constrained by the non-log-linear relation:
_{1}[_{1}] ≫ _{2}[_{2}], i.e. one kinase being much more active than the other, we would observe the approximate log-linear relation _{1}[_{1}][_{r}[_{1}, such as by inhibition of an upstream activator or _{1} itself, we can reach the other limit _{1}[_{1}] ≪ _{2}[_{2}] at high enough doses. Then, the observed ESS would change to _{2}[_{2}][_{r}[

Between different doses of the same drug, the data shows the ESS remaining independent of dose almost 50% of the time with a precision comparable to experimental error (based on comparisons between the DMSO-only control samples), as shown in the cumulative distribution of

As a more conventional analysis of differences between distributions, we also compared conditions using the Linear Discriminant Analysis (LDA) [_{i} from

Single-cell, multiplex imaging and flow cytometry (or mass cytometry [

A characteristic of sc-data is that a subset of measured features (typically the levels, localization and modification states of genes and proteins) are observed to co-vary in individual cells. In the face of random fluctuation, patterns of covariance potentially contain information on interactions between biochemical species. A key question is how this covariance information should be analyzed to obtain insight into the underlying biochemical pathways. We find that eigendecomposition of covariance matrices from sc-data can be interpreted in terms of network stoichiometry and timescales, without model simulation, independent of kinetic parameters, and unhindered by unobserved species; the latter point is critical because most single-cell data is sparse with respect to the number of reactants that can be measured. These features of the ESS approach are a direct consequence of toric (log-linear) manifolds that arise from an assumption of mass-action kinetics applicable, at least approximately, to a broad class of networks, and hold even under the looser requirement that a steady state is a subset of an approximately toric manifold. If the steady state set fulfills this requirement, the ESS also requires that the steady states be asymptotically stable, and that single cells have reached the exponentially stable neighborhood of these steady states at the time of observation. This is looser than requiring that cells be at steady state or quasi-steady state, further expanding the situations in which the ESS framework is informative and applicable.

We tested our approach using synthetic data derived from various simplified reaction systems and also showed that it can be applied to FACS and multiplex imaging datasets. We extract features from the data that are consistent with an interpretation in a reaction network framework: integer-like stoichiometries for interacting species, and independence of network topology from the dose of a single drug used to perturb the network. Other kinds of sc-data, such as mass cytometry or sc-RNAseq, can potentially be analyzed using the same approach. Because this paper focuses on the theoretical aspects of toric geometries as applied to sc-data, we have not yet tested any of the biological conclusions derived from the analysis of experimental data. However, the interactions we infer are consistent with current understanding of well-studied human signal transduction networks and with previous publications [

Simulation of synthetic complex-balanced networks and GRNs suggests ways to tailor reaction network ODEs to better match sc-data. Assuming that the primary goal in fitting a network to data is to match the mean

One limitation of the network analysis approach described here is that identifying gaps in the eigenvalue spectrum is a heuristic procedure. Unfortunately, this is true of most other applications in which it is necessary to identify cutoffs in eigenvalue spectra. The relatively low dimensionality of FACS and CyCIF datasets further limits the applicability of those principled approaches that are available, including methods in random matrix theory. However, for larger datasets it will potentially be possible to apply principled methods for identifying gaps that are statistically significant.
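For readers who want an automated starting point, one simple heuristic is to place the cutoff at the largest jump between consecutive eigenvalues on a log scale. This is a stand-in technique, named plainly as such: it is neither the visual inspection used in this paper nor one of the principled random-matrix methods mentioned above, and the function name `largest_log_gap` and the example spectrum are hypothetical.

```python
import numpy as np

def largest_log_gap(eigenvalues):
    """Return how many eigenvalues lie below the largest gap in the
    log-eigenvalue spectrum. All eigenvalues must be strictly positive."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))   # ascending
    if np.any(vals <= 0):
        raise ValueError("eigenvalues must be strictly positive")
    gaps = np.diff(np.log(vals))       # jump between consecutive eigenvalues
    return int(np.argmax(gaps)) + 1    # count of eigenvalues below the gap

# Hypothetical spectrum with two near-singular directions.
spectrum = [1e-4, 2e-4, 1.0, 3.0]
n_small = largest_log_gap(spectrum)
print(n_small)
```

A spectrum with several gaps, such as one produced by slow reactions, would need each candidate gap examined in turn; this sketch reports only the single largest one.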

Cell regulatory networks are characterized by multistability and limit cycles. The relationship between our analysis and such network structures remains unclear and will require further theoretical work. Multistable, toric steady state sets exist, with many biologically relevant ones studied recently in the class of toric MESSI systems [

The promise of the ESS approach is that it provides a potentially powerful but as-yet unexplored, geometric framework for linking features in sc-data to reaction networks. This is a parallel to recent geometrical analysis of CRNT, in which toric varieties have played a key role. Toric varieties have aided in characterizing the central CRNT concept of complex-balancing [

For any complex-balanced reaction network ^{⊥} =

The equality _{v}(_{v} is non-zero for all

There is a unique steady state point _{e} in each coset of a complex-balanced reaction network [_{1}(_{2}(

While the variance along the orthogonal complement of

Furthermore, if the reaction network has a subnetwork with separably faster rates of convergence than the entire network, additional gaps may occur. In the case of detailed-balanced reaction networks, this follows from a singular perturbation approach: Giovangigli et al. showed that for separably fast and slow reactions in such networks, the critical steady-state manifold is equivalent to the steady-state manifold of a network containing only the fast reactions [_{fast} ⊃

Given a _{N} is the null space of ^{T}.

Split the matrix _{obs}, and _{unobs}, corresponding to

Now we show that the orthogonal complement of _{obs}, _{N−n} is the zero-vector in the (

_{n} ⊕ 0_{N−n} ⊂ _{N}, because for all

For the reverse inclusion, a vector in _{N} indicates that

ODE simulations of each reaction network were performed with the

All complex-balanced networks were chosen to have

Each random network was generated with all the nodes representing complexes containing either a single species, or any pair of species. Then, ∼ 0.03% of the possible edges were stochastically chosen. The graph was then symmetrized by adding all the reverse edges, to ensure reversibility. Rate constants were randomly assigned from a log-normal distribution (see

For the GRNs, for

The GRN simulations were not complex-balanced, both because the particular arrangement of irreversible reactions violates weak reversibility, and because the deficiency of the networks was large, indicating a measure zero probability of being complex-balanced.

The framework calls for analyzing chemical concentrations ^{th} chemical species’ signal _{i} = _{i} ⋅ _{i} for some constant _{i} for the cells in one subpopulation. Assuming an excess of antibodies for both the experimental setup of the FACS and CyCIF data, this is simply the requirement that detection is in the linear regime.

The method still works because the _{i}’s would only result in a shift of the affine subspace
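The effect of the proportionality constants can be written out in one line. The symbols below are chosen here for illustration and are assumptions rather than this paper's notation: $s_i$ is the measured signal of species $i$, $a_i$ its proportionality constant, and $x_i$ its concentration.

```latex
% Linear detection: s_i = a_i x_i with a_i constant within a subpopulation.
\begin{align}
  \log s_i &= \log a_i + \log x_i, \\
  \mathbb{E}[\log s_i] &= \log a_i + \mathbb{E}[\log x_i], \\
  \operatorname{Cov}(\log s_i, \log s_j) &= \operatorname{Cov}(\log x_i, \log x_j).
\end{align}
```

The constants therefore move the mean of the log-transformed data but leave its covariance matrix, and hence the eigenvectors used to define the ESS, unchanged.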

Both FACS and CyCIF data were fit with Gaussian mixture models (GMM) to match visible clusters. Cells from any single condition were fit with

GMMs were fit using the

Assuming sparse, random, linear constraints on a distribution, the covariance matrix would have singular eigenvectors whose span can be given by sparse vectors with random orientations. For either the FACS or CyCIF data, the null was given by row reducing

The 95% confidence intervals for the row-reduced vector entries in


Given recombined, singular vectors for the condition of activation with anti-CD3, anti-CD8 and inhibition of Protein Kinase C with G06976, we drew an edge between biomarkers if any vector entries had magnitude larger than 0.2.
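The thresholding rule above can be sketched in a few lines. The reading adopted here is an assumption: within each recombined vector, every pair of biomarkers whose entries both exceed the threshold in magnitude is joined by an edge. The function name `edges_from_vectors`, the marker names, and the example vectors are hypothetical.

```python
from itertools import combinations

import numpy as np

def edges_from_vectors(vectors, names, threshold=0.2):
    """Join two biomarkers by an edge whenever both have an entry of
    magnitude above `threshold` in the same recombined vector."""
    edges = set()
    for vec in np.atleast_2d(np.asarray(vectors, dtype=float)):
        involved = [names[i] for i in np.flatnonzero(np.abs(vec) > threshold)]
        edges.update(combinations(involved, 2))   # pairs keep marker order
    return sorted(edges)

# Hypothetical recombined vectors over four generic markers.
markers = ["M1", "M2", "M3", "M4"]
recombined = [[1.0, -0.9, 0.05, 0.0],
              [0.0, 0.1, 1.0, -0.33]]
edges = edges_from_vectors(recombined, markers)
print(edges)
```

Entries below the threshold, such as the 0.05 and 0.1 values in this example, are treated as noise and contribute no edges.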


Dear Dr Sorger,

Thank you very much for submitting your manuscript, 'Inferring reaction network structure from single-cell, multiplex data, using toric systems theory', to PLOS Computational Biology. As with all papers submitted to the journal, yours was fully evaluated by the PLOS Computational Biology editorial team, and in this case, by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

We would therefore like to ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer and we encourage you to respond to particular issues raised. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled 'Dataset', 'Figure', 'Table', 'Text', 'Protocol', 'Audio', or 'Video'.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool,

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at

If you have any questions or concerns while you make these revisions, please let us know.

Sincerely,

Pedro Mendes, PhD

Associate Editor

PLOS Computational Biology

Mona Singh

Methods Editor

PLOS Computational Biology


Reviewer's Responses to Questions

Reviewer #1: Review: Inferring reaction network structure from single-cell, multiplex data, using toric systems theory, PCOMPBIOL-D-19-01320.

I find this article a nice contribution to the literature; it proposes a novel combination of network structure and statistical study of reaction networks. Partial linearities in log coordinates of steady states of the system are exploited. I suggest acceptance after the authors incorporate my comments below in a revised version of their submission.

1) The stoichiometric subspace S is defined as the linear span of the reaction vectors on line 55, which is correct. However, the reaction vector v defined on line 47 is not the true reaction vector whenever some reactant R_i coincides with a product P_j. This does not necessarily happen, e.g. this is not the case in the example on line 198. So this is misleading and it becomes an obscure concept for the reader, while it is very easy to compute S if the whole reaction network is known. Please define S clearly and explain that its computation is straightforward if the network is known.

2) In display (3), it would be better to add the arrow from C to A curved back to the first node A on the left instead of adding a new node A. This weak reversibility is what allows for a complex balanced steady state and this is an equivalent condition for a linear network as this one.

3) On line 214 it seems that you are missing some references, e.g.: SIAM J. Appl. Dyn. Syst., 17(2), 1650–1682. The Structure of MESSI Biological Systems. Mercedes Pérez Millán and Alicia Dickenstein.

Please take a look at the notion of toric MESSI systems and how the networks are amenable to analytical treatment without simulations (as in several other recent articles). Another important missing reference is: J Theor Biol. 2009 Dec 21;261(4):626-36. The rational parameterization theorem for multisite post-translational modification systems. Thomson M, Gunawardena J. doi: 10.1016/j.jtbi.2009.09.003.

4) Please add clearly in all your figures (and their corresponding explanations) which are the values of the reaction rate constants that are being taken into account.

5) On line 232, explain which is the assumption needed in order that the consideration of a single protein production step is sensible.

6) Please make the definition of ESS more visible, add a pointer to it from the very beginning of the paper (and consider giving it earlier in the text).

7) It may be useful to take a look at section 5 of the MESSI reference above: when the positive steady states can be defined completely by binomial equations (linear after taking logarithms), the comparison of the matrix built from the exponents of these binomials and a matrix giving the linear conservation relations defining S can be used to detect multistationarity. In this case, the analysis cannot be done for all choices of parameters at the same time, but in regions of mono- or multistationarity separately (plus finer stability considerations).

8) Your example on page 12, line 344 is easily seen to be s-toric MESSI. You are missing the fact that d[D]/dt = - k_d [S][D] – k_{-d} [SD], thus adding this to d[S]/dt and equating to 0, one immediately gets two binomials: this new one and the previous one involving [E],[S],[F],[P] describing $E$! (without any limit!). By the way, it is unfortunate that the steady state set is denoted by $E$ while E denotes a chemical species. I’d suggest changing the letter for the steady state set.

Reviewer #2: The paper discusses an important question: given measurements of a set of chemical species, which reaction network can reproduce the measurements sufficiently well. The authors suggest a novel approach that requires single cell data and that the data indicates that the measurements approach a steady state. The idea is to analyze the spectrum of the covariance matrix of the measurements and look for a "gap", indicating a subset of eigenvalues that can be considered zero. The eigenvectors corresponding to the nonzero eigenvalues are then interpreted as "effective reactions" that span an effective stoichiometric space. This space is interpreted as the image of a matrix that parametrizes the positive real part of the steady state variety.

I have only one major concern with the paper: it is not precise enough when it comes to the claim that a reaction network is inferred from the data. Only if one assumes that the system is complex balanced is the effective stoichiometric space related to the reaction network: steady states of complex balanced systems are parameterized by the orthogonal complement of the stoichiometric space. Hence only in this case may one actually claim to have inferred a reaction network. If the system is not complex balanced, then the relation between the reaction network and the matrix that parametrizes the positive real part of the variety is more involved. I think the authors should describe this more clearly: if one chooses to interpret what is inferred as a reaction network, then the corresponding system is complex balanced (with all the constraints that imposes on the rate constants). If one does not want to impose complex balance, then one infers a monomial parametrization of the positive real part of the variety (which is remarkable enough).

A suggestion: the approach can be applied to any data set coming from single cell measurement. If one finds a gap in the covariance matrix, then this indicates that the data can be described by a model that has a remarkable property: the positive real part of the variety can be parameterized by monomials. For generic polynomials this is rarely the case, so maybe one could advertise this.

Some minor points:

- the caption of figure one: I think it should be steady state set, not steady state

- line 67: I think the name is Kirchhoff

- line 404: I do not think that log-linear manifolds arise from mass action kinetics in general. They require a special network structure; not every mass action network gives rise to such a manifold.

- lines 406-408: This sentence does not make sense to me. You say that you do not need QSS but that the steady state is asymptotically stable, and then you say that enough time should have passed to approach the QSS. What is it that you want to say?

- line 482: What does it mean when you say all trajectories within a coset are forward invariant? It is my understanding that the coset is forward invariant.

- line 488: You could state that the upper bound is for all trajectories within a coset, just to be precise.

- reference 18: As far as I know, the authors who first described mass action kinetics are Waage and Guldberg.

**********

Large-scale datasets should be made available via a public repository as described in the

Reviewer #1: No: The rate constants considered should be clearly specified.

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Reviewer #1: No

Reviewer #2: No


Dear Dr Sorger,

We are pleased to inform you that your manuscript 'Inferring reaction network structure from single-cell, multiplex data, using toric systems theory' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes.

In the meantime, please log into Editorial Manager at

One of the goals of PLOS is to make science accessible to educators and the public. PLOS staff issue occasional press releases and make early versions of PLOS Computational Biology articles available to science writers and journalists. PLOS staff also collaborate with Communication and Public Information Offices and would be happy to work with the relevant people at your institution or funding agency. If your institution or funding agency is interested in promoting your findings, please ask them to coordinate their releases with PLOS (contact

Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Computational Biology.

Sincerely,

Pedro Mendes, PhD

Associate Editor

PLOS Computational Biology

Mona Singh

Methods Editor

PLOS Computational Biology

PCOMPBIOL-D-19-01320R1

Inferring reaction network structure from single-cell, multiplex data, using toric systems theory

Dear Dr Sorger,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Sarah Hammond

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom