Stochastic shielding and edge importance for Markov chains with timescale separation

Nerve cells produce electrical impulses (“spikes”) through the coordinated opening and closing of ion channels. Markov processes with voltage-dependent transition rates capture the stochasticity of spike generation at the cost of complex, time-consuming simulations. Schmandt and Galán introduced the stochastic shielding approximation as a fast, accurate method for generating approximate sample paths with excellent first and second moment agreement with exact stochastic simulations. We previously analyzed the mathematical basis for the method’s remarkable accuracy, and showed that for models with a Gaussian noise approximation, the stationary variance of the occupancy at each vertex in the ion channel state graph could be written as a sum of distinct contributions from each edge in the graph. We extend this analysis to arbitrary discrete population models with first-order kinetics. The resulting decomposition allows us to rank the “importance” of each edge’s contribution to the variance of the current under stationary conditions. In most cases, transitions between open (conducting) and closed (non-conducting) states make the greatest contributions to the variance, but there are exceptions. In a 5-state model of the nicotinic acetylcholine receptor, at low agonist concentration, a pair of “hidden” transitions (between two closed states) makes a greater contribution to the variance than any of the open–closed transitions. We exhaustively investigate this “edge importance reversal” phenomenon in simplified 3-state models, and obtain an exact formula for the contribution of each edge to the variance of the open state. Two conditions contribute to reversals: the opening rate should be faster than all other rates in the system, and the closed state from which the opening transition originates should be sparsely occupied. When edge importance reversal occurs, current fluctuations are dominated by a slow noise component arising from the hidden transitions.


S1 Supporting Information
We establish the decomposition of the stationary variance and calculate η, the fraction of variance of the observable state arising from the hidden edges, providing explicit calculations for the 3-state process. We also review the connection between the population process and Gaussian approximations thereof, and give a detailed derivation of the Lyapunov equation for the 3-state case.
We show in Lemma 1 and Theorem 1 (see Results) that there is a unique decomposition of the stationary covariance into a sum of contributions from each edge in the graph (or, in the case of detailed balance, each reciprocal pair of edges). Since the stationary covariance matrix C satisfies the Lyapunov equation (Eq 46) and BB^⊤ decomposes into a sum of contributions from each edge, we can establish the linear decomposition of C as a sum of contributions from each edge. This allows us to rank each edge in terms of its importance, and it allows us to quantify the accuracy of the stochastic shielding approximation. This result holds for the exact underlying process N(t) and for any number of particles N_tot ≥ 1; the result is not based on an approximation. We derive this decomposition for the 3-state process in the following section. To our knowledge, this decomposition has not been shown previously.
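To make the edge ranking concrete, here is a minimal numerical sketch (not part of the proof; the rates, state labels, and variable names are hypothetical): it computes each edge's contribution to the stationary variance of an observable state via the spectral form of the per-edge Lyapunov solutions and ranks the edges.

```python
import numpy as np

# Hypothetical 3-state network: edges are (source, destination, rate), 0-indexed states.
edges = [(0, 1, 1.0), (1, 0, 2.0), (1, 2, 5.0), (2, 1, 0.5)]
n = 3
M = np.array([0.0, 0.0, 1.0])            # measurement vector: state 3 is observable

L = np.zeros((n, n))                     # graph Laplacian: d<N>/dt = L <N>
for i, j, a in edges:
    L[j, i] += a
    L[i, i] -= a

lam, V = np.linalg.eig(L)                # right eigenvectors are the columns of V
W = np.linalg.inv(V)                     # rows of W are the biorthogonal left eigenvectors
k0 = np.argmin(np.abs(lam))              # index of the trivial eigenvalue lambda_1 = 0
pi = np.real(V[:, k0] / V[:, k0].sum())  # stationary distribution

contrib = {}
for i, j, a in edges:
    zeta = np.zeros(n); zeta[i], zeta[j] = -1.0, 1.0    # stoichiometry vector of this edge
    J = pi[i] * a                                       # stationary edge flux (N_tot = 1)
    R = sum((M @ V[:, p]) * (M @ V[:, q]) * (W[p] @ zeta) * (W[q] @ zeta)
            / (-(lam[p] + lam[q]))
            for p in range(n) for q in range(n) if p != k0 and q != k0)
    contrib[(i + 1, j + 1)] = np.real(J * R)            # report edges with 1-indexed states

for edge, R in sorted(contrib.items(), key=lambda kv: -kv[1]):
    print("edge", edge, "contributes", round(R, 6))
print("sum of contributions:", round(sum(contrib.values()), 6),
      "   multinomial Var(N_3):", round(pi[2] * (1 - pi[2]), 6))
```

The sum of the per-edge contributions reproduces the full stationary variance of the observable state, illustrating the exactness of the decomposition for any choice of rates.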

Decomposition of the Stationary Variance: 3-State Process
This section gives a detailed description of the framework and derivation of results for the 3-state process. We also make the connection between our previous work, a Gaussian approximation of the exact process [10], and the results for the exact discrete-state formulation in the current paper.
The exact formulation of the 3-state process is a discrete-state, continuous-time Markov jump process N(t) (see Fig 3 for an illustration of the transition state graph; state 3 is the observable state, so the measurement vector in this case is M = (0, 0, 1)). Specifically, N(t) is an integer-valued process that can be described using the random time change representation [40,41] in terms of four independent unit-rate Poisson processes Y_1, Y_2, Y_3, Y_4:

N(t) = N(0) + Σ_{k=1}^{4} ζ_k Y_k( α_k ∫_0^t N_{i(k)}(s) ds ).

Here, N(t) = (N_1(t), N_2(t), N_3(t))^⊤ is the state occupancy vector at time t and satisfies Σ_{i=1}^{3} N_i(t) ≡ N_tot. The stoichiometry vectors ζ_k are given by

ζ_1 = (−1, 1, 0)^⊤,   ζ_2 = (1, −1, 0)^⊤,   ζ_3 = (0, −1, 1)^⊤,   ζ_4 = (0, 1, −1)^⊤,

where edge k represents a transition from source node i(k) to destination node j(k), i.e. ζ_k = e_{j(k)} − e_{i(k)}. The graph Laplacian L and matrix B for the 3-state process are given by

L = [ −α_1         α_2          0
       α_1     −(α_2 + α_3)    α_4
        0           α_3        −α_4 ],

B = [ −√J_1     √J_2      0        0
       √J_1    −√J_2    −√J_3     √J_4
        0         0      √J_3    −√J_4 ],

where J_k = N_tot α_k π_{i(k)} is the stationary flux along the k-th edge (computed below). The steady-state node occupancy probabilities π_i for the 3-state process are

π_1 = α_2 α_4 / (α_1 α_3 + α_1 α_4 + α_2 α_4),      (S4)
π_2 = α_1 α_4 / (α_1 α_3 + α_1 α_4 + α_2 α_4),      (S5)
π_3 = α_1 α_3 / (α_1 α_3 + α_1 α_4 + α_2 α_4).      (S6)

Detailed balance is easy to establish: the steady-state flux J_k along edge k satisfies

J_1 = π_1 α_1 = α_1 α_2 α_4 / (α_1 α_3 + α_1 α_4 + α_2 α_4) = π_2 α_2 = J_2,      (S7)
J_3 = π_2 α_3 = α_1 α_3 α_4 / (α_1 α_3 + α_1 α_4 + α_2 α_4) = π_3 α_4 = J_4.      (S8)
Note that the stationary distribution for the N_tot random walkers is the multinomial distribution with probability vector π = (π_1, π_2, π_3)^⊤. The stationary mean is N̄ = E[N] = N_tot π. The stationary covariance is C = N_tot (diag[π] − π π^⊤), i.e. the variance of the population at node i is N_tot π_i (1 − π_i) and the covariance of nodes i and j is −N_tot π_i π_j. If we let R = C_{3,3} be the stationary variance of state 3, then we have

R = Σ_{k=1}^{4} R_k,      (S9)

R_k = J_k Σ_{i=2}^{3} Σ_{j=2}^{3} (M v_i)(M v_j)(w_i^⊤ ζ_k)(w_j^⊤ ζ_k) / ( −(λ_i + λ_j) ).      (S10)

Note that this is a special case of Eq 42. In this formula, J_k is the stationary flux along the k-th edge, and λ_3 ≤ λ_2 < 0 are the two nontrivial eigenvalues of the graph Laplacian L (which always has λ_1 ≡ 0). The corresponding right and left eigenvectors are v_3, v_2 and w_3, w_2, respectively. Along with v_1 = π and w_1 = (1, 1, 1)^⊤, the eigenvectors are normalized to form a biorthogonal set, i.e. w_i^⊤ v_j = δ_ij.
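For a direct check of these moments against the exact process, the following sketch (hypothetical rates and simulation parameters) generates one Gillespie sample path of N(t) and compares the time-averaged mean and variance of N_3 with N_tot π_3 and N_tot π_3(1 − π_3).

```python
import numpy as np

# Hypothetical rates alpha_1..alpha_4 for the 3-state chain; edge k goes i(k) -> j(k).
alpha = np.array([1.0, 2.0, 5.0, 0.5])
src = np.array([0, 1, 1, 2])                     # i(k), 0-indexed
dst = np.array([1, 0, 2, 1])                     # j(k)
Ntot, T_burn, T_end = 50, 50.0, 500.0
rng = np.random.default_rng(0)

N = np.array([Ntot, 0, 0], dtype=float)          # start with all channels in state 1
t, m1, m2, Ttot = 0.0, 0.0, 0.0, 0.0
while t < T_end:
    prop = alpha * N[src]                        # propensity of each directed edge
    total = prop.sum()
    dt = rng.exponential(1.0 / total)
    if t > T_burn:                               # time-average N_3 and N_3^2
        w = min(dt, T_end - t)
        m1 += N[2] * w; m2 += N[2]**2 * w; Ttot += w
    k = rng.choice(4, p=prop / total)            # which transition fires next
    N[src[k]] -= 1; N[dst[k]] += 1
    t += dt

Z = alpha[0]*alpha[2] + alpha[0]*alpha[3] + alpha[1]*alpha[3]
pi3 = alpha[0] * alpha[2] / Z                    # Eq S6
print("mean N_3: simulated", m1 / Ttot, "  theory", Ntot * pi3)
print("var  N_3: simulated", m2 / Ttot - (m1 / Ttot)**2, "  theory", Ntot * pi3 * (1 - pi3))
```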
We can obtain the eigenvectors and eigenvalues explicitly for each of the twelve cases of the 3-state chain as a function of the parameter α (see Table 3 for a description of the cases; see the next section for the eigenvectors and eigenvalues). Then we can examine the range of applicability of the stochastic shielding method [9], the edge importance measure [10], and the connection to the power spectrum.

Explicit calculation of η (fraction of variance generated by the hidden edges)
To calculate η ≡ R_12/(R_12 + R_23), the fraction of the stationary variance of the observable state coming from the hidden edges in the 3-state chain, we derive expressions for the eigenvalues and eigenvectors of the graph Laplacian L as functions of the transition rates α_ij. The eigenvalues are

λ_1 = 0,    λ_{2,3} = ( −α_sum ± √(α_sum² − 4Z) ) / 2,

where α_sum = α_12 + α_21 + α_23 + α_32 and Z = α_12 α_23 + α_12 α_32 + α_21 α_32. Notice that α_sum = −Tr[L], where Tr[L] is the trace of L, and that λ_2 λ_3 = Z. The right eigenvectors v_i satisfy (L − λ_i I) v_i = 0, the left eigenvectors w_i satisfy w_i^⊤ (L − λ_i I) = 0, and they are normalized to form a biorthogonal set, w_i^⊤ v_j = δ_ij. Substituting these quantities into Eq S10 and simplifying, the variance components are

R_12 = N_tot α_12 α_21 α_23² α_32 / (2 α_sum Z²),
R_23 = N_tot α_12 α_23 α_32 [ (α_12 + α_21)² + Z ] / (2 α_sum Z²),

which correspond to the contributions made by the hidden and visible edges, respectively. Note that R_12 = R_21 and R_23 = R_32 due to detailed balance. The total variance is V_tot = 2(R_12 + R_23):

V_tot = N_tot α_12 α_23 α_32 (α_12 + α_21) / Z² = N_tot π_3 (1 − π_3).

It follows that (verified by Mathematica and simplifying terms by hand)

η = R_12 / (R_12 + R_23) = α_21 α_23 / [ (α_12 + α_21) α_sum ].

These expressions are equivalent to Eq 22 given in Results (note that the fluxes J_12 and J_23 appearing in Eq S10 follow directly from the stationary probabilities, Eqs S4–S6). Edge importance is reversed when R_12 > R_23, and this corresponds to the condition η > 1/2.
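As a numerical cross-check (not part of the original derivation; the rates and function names are illustrative), the sketch below evaluates η both from the closed form above and from the spectral decomposition, Eq S10, for rates with a fast opening transition and a sparsely occupied state 2.

```python
import numpy as np

def eta_closed_form(a12, a21, a23, a32):
    # Closed form derived above: eta = a21*a23 / ((a12 + a21) * alpha_sum).
    return a21 * a23 / ((a12 + a21) * (a12 + a21 + a23 + a32))

def eta_spectral(a12, a21, a23, a32):
    # Recompute eta from the per-edge spectral decomposition (Eq S10) as a check.
    L = np.array([[-a12,          a21,  0.0],
                  [ a12, -(a21 + a23),  a32],
                  [ 0.0,          a23, -a32]])
    lam, V = np.linalg.eig(L)
    W = np.linalg.inv(V)                          # biorthogonal left eigenvectors (rows)
    k0 = np.argmin(np.abs(lam))                   # index of the zero eigenvalue
    pi = np.real(V[:, k0] / V[:, k0].sum())
    def R(i, j, a):                               # contribution of edge i -> j to Var(N_3)
        zeta = np.zeros(3); zeta[i], zeta[j] = -1.0, 1.0
        s = sum(V[2, p] * V[2, q] * (W[p] @ zeta) * (W[q] @ zeta) / (-(lam[p] + lam[q]))
                for p in range(3) for q in range(3) if p != k0 and q != k0)
        return np.real(pi[i] * a * s)
    R12, R23 = R(0, 1, a12), R(1, 2, a23)         # hidden and visible contributions
    return R12 / (R12 + R23)

rates = (0.05, 1.0, 20.0, 1.0)   # hypothetical: fast opening 2->3, sparsely occupied state 2
print(eta_closed_form(*rates), eta_spectral(*rates))   # both approx 0.86 > 1/2: reversal
```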

Canonical transformation leading to edge importance reversal: detailed calculations
On page 16 of the main paper, we introduced a canonical transformation of a given set of transition rates that leads to edge importance reversal. Here we provide the details justifying our assertions. Given an arbitrary initial set of transition rates α⁰_ij > 0 and ε > 0, define the following rescaled transition rates:

α_12 = ε α⁰_12,    α_21 = α⁰_21,    α_23 = α⁰_23 / ε,    α_32 = α⁰_32.

We will show that as ε approaches 0, we eventually reverse the edge importance and then approach η = R_12/(R_12 + R_23) → 1. At the same time π_2 → 0, the timescale separation grows, as does the flux imbalance. The stationary probabilities satisfy

π_2 = ε α⁰_12 α⁰_32 / Z,    with Z = α⁰_12 α⁰_23 + ε α⁰_12 α⁰_32 + α⁰_21 α⁰_32,

so for any choice of α⁰_ij > 0, π_2 becomes arbitrarily small for sufficiently small ε. Similarly, it is easy to see that J_12 = π_1 α_12 = ε α⁰_12 α⁰_21 α⁰_32 / Z ∼ O(ε), while J_23 = π_2 α_23 = α⁰_12 α⁰_23 α⁰_32 / Z ∼ O(1), as ε → 0⁺. So the ratio J_12/J_23 = ε α⁰_21/α⁰_23 ∼ O(ε) as ε decreases. Thus, the condition that the majority of the stationary flux occurs along the observable edge is guaranteed to hold for sufficiently small ε.
In addition, as ε decreases the timescale separation grows without bound. The timescale factor ν = λ_3/λ_2 (the ratio of the two non-zero eigenvalues) grows quadratically in 1/ε: since λ_2 λ_3 = Z and λ_2 + λ_3 = −α_sum,

ν = λ_3/λ_2 ≈ α_sum²/Z ∼ (α⁰_23)² / ( ε² (α⁰_12 α⁰_23 + α⁰_21 α⁰_32) )   as ε → 0⁺.

The fraction of the stationary variance due to the hidden edges satisfies

η(ε) = α_21 α_23 / ( (α_12 + α_21) α_sum ) = 1 − O(ε) → 1   as ε → 0⁺.

Thus, for any positive values of α⁰_ij, accelerating the 2 → 3 transition while decelerating the 1 → 2 transition will eventually lead to strongly inverted edge importance, with an arbitrarily large fraction of the stationary variance in the occupancy of the third state arising from the fluctuations in the transitions between nodes 1 and 2.
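The following sketch tabulates π_2, the timescale factor ν, and η along this transformation for the (hypothetical) choice α⁰_ij = 1, illustrating the limits claimed above.

```python
import numpy as np

a12_0 = a21_0 = a23_0 = a32_0 = 1.0                  # hypothetical base rates
for eps in [1.0, 0.3, 0.1, 0.03, 0.01]:
    a12, a21, a23, a32 = eps * a12_0, a21_0, a23_0 / eps, a32_0   # rescaled rates
    asum = a12 + a21 + a23 + a32
    Z = a12 * a23 + a12 * a32 + a21 * a32
    lam2 = (-asum + np.sqrt(asum**2 - 4 * Z)) / 2    # slow nontrivial eigenvalue (small magnitude)
    lam3 = (-asum - np.sqrt(asum**2 - 4 * Z)) / 2    # fast nontrivial eigenvalue
    pi2 = a12 * a32 / Z                              # occupancy of the hidden closed state
    eta = a21 * a23 / ((a12 + a21) * asum)           # hidden-edge fraction of Var(N_3)
    print(f"eps={eps:5.2f}  pi2={pi2:.4f}  nu={lam3 / lam2:10.1f}  eta={eta:.4f}")
# As eps decreases: pi2 -> 0, nu grows like 1/eps^2, and eta -> 1 (edge importance reversal).
```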

Connection to Gaussian approximation
We now spell out the connection to our previous work [10], making it straightforward to go back and forth between the two formulations (the Gaussian approximation and the exact formulation). We illustrate this for the 3-state process, but the connection holds in general.
For a given set of fixed transition rates α_k and sufficiently large population size N_tot, we may approximate the integer-valued process N(t) with the solution of a Langevin equation with independent Gaussian white noise forcing arising from each directed edge [42, §8]. Namely, we have the Itô stochastic differential equation with linear drift L X and state-dependent noise amplitude B(X),

dX = L X dt + B(X) dW,

where L is the graph Laplacian given above and B(X) is the 3 × 4 matrix whose k-th column is √(α_k X_{i(k)}) ζ_k, i.e.

B(X) = [ −√(α_1 X_1)     √(α_2 X_2)       0              0
          √(α_1 X_1)    −√(α_2 X_2)   −√(α_3 X_2)     √(α_4 X_3)
              0              0          √(α_3 X_2)    −√(α_4 X_3) ].

As a further simplification, valid for yet larger N_tot, we may consider a linear Langevin equation with strictly additive noise, given by

dX = L X dt + B_0 dW,      (S40)

with L as above, where the matrix B_0 = B(N̄) depends on the equilibrium population size N̄ = [N̄_1, N̄_2, N̄_3] for a total population of N_tot ion channels, and W ∈ R⁴ is a standard 4-dimensional Brownian motion. Recall that π is the leading eigenvector of the graph Laplacian L, which means that N̄_i = N_tot π_i will change as a function of α, but we suppress this dependence in the notation for clarity. We note that the size of the fluctuations on each edge is reflected in the columns of B_0, and that this is the same matrix we consider in the 3-state model with timescale separation shown in Equation 8. For the linear additive Gaussian noise process, we showed in [10] that the stationary variance of the observable state 3 decomposes into a sum of contributions from each edge, as given in Equations S9 and S10.
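A minimal Euler–Maruyama integration of the additive-noise equation (S40) (hypothetical rates, population size, and step size) can be used to confirm that its stationary variance of X_3 is close to N_tot π_3(1 − π_3):

```python
import numpy as np

a1, a2, a3, a4 = 1.0, 2.0, 5.0, 0.5                 # hypothetical rates alpha_1..alpha_4
Ntot = 1000
Z = a1*a3 + a1*a4 + a2*a4
pi = np.array([a2*a4, a1*a4, a1*a3]) / Z            # Eqs S4-S6
J = Ntot * np.array([a1*pi[0], a2*pi[1], a3*pi[1], a4*pi[2]])   # stationary edge fluxes
zeta = np.array([[-1, 1, 0], [1, -1, 0], [0, -1, 1], [0, 1, -1]], dtype=float).T
L = np.array([[-a1,        a2, 0.0],
              [ a1, -(a2+a3),  a4],
              [0.0,        a3, -a4]])
B0 = zeta * np.sqrt(J)                              # 3x4 matrix, column k = sqrt(J_k) zeta_k

rng = np.random.default_rng(1)
dt, nsteps, burn = 1e-3, 200_000, 20_000
X = Ntot * pi.copy()                                # start at the deterministic equilibrium
x3 = np.empty(nsteps)
for step in range(nsteps):
    dW = rng.normal(0.0, np.sqrt(dt), size=4)       # increments of the 4-d Brownian motion
    X = X + (L @ X) * dt + B0 @ dW                  # Euler-Maruyama step for Eq S40
    x3[step] = X[2]
print("Langevin Var(X_3):", x3[burn:].var(), "   theory:", Ntot * pi[2] * (1 - pi[2]))
```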

Derivation of the Variance: 3-State Process
Lyapunov's equation (Eq 46) is well known to hold not only for linear Gaussian processes (multivariate Ornstein-Uhlenbeck processes) but also for discrete population processes in which the rates of transitions are linear functions of the population at each node, i.e. first-order transition networks, such as those we consider here (see [8], [44] page 1, ¶5).
In general, Q can be a function of time; nothing in the following derivation changes, even if Q is a discontinuous function (as in a voltage clamp experiment with sudden changes in the command potential).
To keep notation simple we write Q in place of Q(t).

To proceed, let N_i(t) be the function that is 1 when X(t) = i and zero otherwise, for i ∈ {1, 2, 3}. Then the average of the vector N = (N_1, N_2, N_3) can be found componentwise, and satisfies

d/dt E[N(t)] = E[N(t)] Q,

treating N as a row vector, where q_ij is the transition rate from state i to state j for i ≠ j and q_ii = −Σ_{j≠i} q_ij. If we instead consider N_tot random walkers moving independently, we end up with the same equation, since the averages of sums equal the sums of the averages. For the variance, we again consider a single random walker. If we have N_tot individuals moving independently, then the variances sum, but the case N_tot = 1 is easier to work with. Let X = (X_1, X_2, X_3) denote the row vector of indicator functions. Let M = E[X^⊤ X] be the matrix of second moments, so M_ij = E[X_i X_j], and recall the definition of the covariance

C(t) = E[X^⊤ X] − E[X]^⊤ E[X] = M − p^⊤ p,   where p(t) = E[X(t)].

Since there is only one particle, X_i X_j ≡ 0 whenever i ≠ j. If i = j, then X_i X_j = X_i² = X_i since X_i is either zero or one. Therefore,

M = diag(p)   and   C = diag(p) − p^⊤ p,      (S53)

where diag(p) is the diagonal matrix with entries given by the components of the vector p.
We wish to derive an expression for the rate of change of the covariance matrix C(t), i.e. the instantaneous covariance of X. (This is different from the lagged covariance matrix under stationary conditions).
Differentiating (S53),

dC/dt = diag(dp/dt) − (dp/dt)^⊤ p − p^⊤ (dp/dt) = diag(pQ) − Q^⊤ p^⊤ p − p^⊤ p Q.

Using the observation that

diag(p) Q = [ p_1 q_11   p_1 q_12   p_1 q_13
              p_2 q_21   p_2 q_22   p_2 q_23
              p_3 q_31   p_3 q_32   p_3 q_33 ]

(or, in general, (diag(p)Q)_ij = p_i q_ij), elementary algebra yields

dC/dt = Q^⊤ C + C Q + D,   with D = diag(pQ) − Q^⊤ diag(p) − diag(p) Q.

But, for example,

D_11 = p_1 q_11 + p_2 q_21 + p_3 q_31 − 2 p_1 q_11 = p_2 q_21 + p_3 q_31 − p_1 q_11 = p_2 q_21 + p_3 q_31 + p_1 (q_12 + q_13) = p_2 q_21 + p_1 q_12 + p_3 q_31 + p_1 q_13.

(For the 3-state chain model q_31 ≡ 0 ≡ q_13, but here we allow for the general case.) Similar arithmetic leads to

D_ii = Σ_{j ≠ i} (p_i q_ij + p_j q_ji),   D_ij = −(p_i q_ij + p_j q_ji)  for i ≠ j,

that is, D = Σ_{i ≠ j} p_i q_ij (e_j − e_i)(e_j − e_i)^⊤. Note this expression is valid whether or not we are in equilibrium. In equilibrium, if detailed balance holds, we would have p_i q_ij = p_j q_ji for all i and j, and the expression would simplify. Now we see that it is always possible to find the matrix B explicitly, i.e.

B B^⊤ = D,   where B has one column per directed edge i → j, namely √(p_i q_ij) (e_j − e_i).
Again, this expression is exact even out of equilibrium, when p is varying. (This also holds when Q is a time-varying function.) The steady state for C, when Q is a constant matrix, therefore satisfies

0 = Q^⊤ C + C Q + B B^⊤,

which is the Lyapunov equation Eq 46 with L = Q^⊤. In the stochastic shielding approximation, we replace B with a new matrix whose column for edge i → j is ε_ij √(p_i q_ij) (e_j − e_i), where the ε_ij reflect whether the noise contribution of edge i → j is included (ε_ij = 1) or neglected (ε_ij = 0). We can now address the question of how to write the stationary variance (e.g. C_33, the variance of the open state) as a sum of contributions from the edges. As we previously observed (see the HH Na and K channel figures in [10]), under stationary conditions with detailed balance the contributions of edges i → j and j → i are the same. This symmetry holds in the present case as well.
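The sketch below (hypothetical rates; the mask array plays the role of the ε_ij) builds B explicitly from p and Q, checks the stationary Lyapunov equation, and compares the full variance of the open state with the variance retained when the hidden edges 1 ↔ 2 are neglected.

```python
import numpy as np

a12, a21, a23, a32 = 1.0, 2.0, 5.0, 0.5             # hypothetical rates
Q = np.array([[-a12,          a12,  0.0],
              [ a21, -(a21 + a23),  a23],
              [ 0.0,          a32, -a32]])          # generator, q_ij = rate of i -> j
L = Q.T                                             # graph Laplacian convention used above
Z = a12*a23 + a12*a32 + a21*a32
p = np.array([a21*a32, a12*a32, a12*a23]) / Z       # stationary distribution (N_tot = 1)
C = np.diag(p) - np.outer(p, p)                     # stationary covariance, Eq S53

edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
cols = [np.sqrt(p[i] * Q[i, j]) * (np.eye(3)[j] - np.eye(3)[i]) for i, j in edges]
B = np.column_stack(cols)                           # one column per directed edge
print("Lyapunov residual:", np.abs(L @ C + C @ L.T + B @ B.T).max())   # ~ machine precision

def var_open(mask):
    # Variance of the open state produced by the masked noise matrix (eps_ij in mask).
    Bm = B * np.array(mask, dtype=float)
    lam, V = np.linalg.eig(L)
    W = np.linalg.inv(V)
    k0 = np.argmin(np.abs(lam))
    tot = sum(V[2, i] * V[2, j] * ((W[i] @ Bm) @ (Bm.T @ W[j])) / (-(lam[i] + lam[j]))
              for i in range(3) for j in range(3) if i != k0 and j != k0)
    return np.real(tot)

print("full:", var_open([1, 1, 1, 1]), "   hidden edges neglected:", var_open([0, 0, 1, 1]))
```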
Demonstration that the sum condition holds

Let C be a covariance matrix, i.e. C_ij = Cov(X_i, X_j) = E[∆_i ∆_j] for some vector-valued random variable [X_1, . . . , X_n]^⊤, where ∆_i = X_i − E[X_i]. Suppose the total Σ_{i=1}^{n} X_i ≡ const is preserved. Equivalently, the vector X is distributed such that, with probability 1, ∆_1 + . . . + ∆_n ≡ 0. Then

Σ_{i=1}^{n} Σ_{j=1}^{n} C_ij = E[ ( Σ_{i=1}^{n} ∆_i )² ] = 0.      (S78)

This condition holds for our networks because they conserve probability (or total population). And the sum includes each off-diagonal covariance term twice; hence condition (S78) is equivalent to the statement that the sum of the variances, Σ_i C_ii, equals −2 Σ_{i<j} C_ij.
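A short numerical check of condition (S78) for the stationary covariance of the 3-state chain (hypothetical rates):

```python
import numpy as np

a12, a21, a23, a32 = 1.0, 2.0, 5.0, 0.5          # hypothetical rates
Z = a12*a23 + a12*a32 + a21*a32
p = np.array([a21*a32, a12*a32, a12*a23]) / Z    # stationary distribution
C = np.diag(p) - np.outer(p, p)                  # stationary covariance for N_tot = 1
print(C.sum())                                   # sum over all i, j of C_ij: zero, Eq S78
print(np.trace(C), -2 * (C[0, 1] + C[0, 2] + C[1, 2]))   # sum of variances vs -2 * sum of covariances
```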