Intrinsic Noise Induces Critical Behavior in Leaky Markovian Networks Leading to Avalanching

The role intrinsic statistical fluctuations play in creating avalanches – patterns of complex bursting activity with scale-free properties – is examined in leaky Markovian networks. Using this broad class of models, we develop a probabilistic approach that employs a potential energy landscape perspective coupled with a macroscopic description based on statistical thermodynamics. We identify six important thermodynamic quantities essential for characterizing system behavior as a function of network size: the internal potential energy, entropy, free potential energy, internal pressure, pressure, and bulk modulus. In agreement with classical phase transitions, these quantities evolve smoothly as a function of the network size until a critical value is reached. At that value, a discontinuity in pressure is observed that leads to a spike in the bulk modulus demarcating loss of thermodynamic robustness. We attribute this novel result to a reallocation of the ground states (global minima) of the system's stationary potential energy landscape caused by a noise-induced deformation of its topographic surface. Further analysis demonstrates that appreciable levels of intrinsic noise can cause avalanching, a complex mode of operation that dominates system dynamics at near-critical or subcritical network sizes. Illustrative examples are provided using an epidemiological model of bacterial infection, where avalanching has not been characterized before, and a previously studied model of computational neuroscience, where avalanching was erroneously attributed to specific neural architectures. The general methods developed here can be used to study the emergence of avalanching (and other complex phenomena) in many biological, physical and man-made interaction networks.

x n ((m + 1)∆t) = b n (x x x(m∆t)), for every n ∈ N , (S1) or more general probabilistic updating rules [3,4]. The vast majority of complex networks of interest do not update their states in a synchronous manner. As a consequence, Boolean networks tend to oversimplify the dynamics of many real networks. To address this problem, a number of investigators have focused their effort on an asynchronous stochastic updating scheme that leads to stochastic asynchronous Boolean networks [5][6][7]. According to this scheme, the state of a Boolean network at time (m + 1)∆t is determined by randomly selecting a node in the network (usually uniformly among all nodes), by updating the state of this node using the associated Boolean function, and by leaving the states of the remaining nodes unchanged. In this case, x n ((m + 1)∆t) = b n * (x x x(m∆t)), for n = n * x n (m∆t), for n = n * , where n * is the node selected to be updated at time (m + 1)∆t. Although the previous modification results in a model than may be more realistic than the classical Boolean model, it does not take into consideration major features of real complex networks. In particular, the model does not account for the facts that state updating can occur at any time t (not necessarily at discrete times m∆t) and that the time of next updating as well as the node to be updated can be influenced by the current state of the network. To address these issues, a number of models have been proposed in the literature [8][9][10][11][12][13][14]. It turns out that the LMN model discussed in this paper effectively addresses these problems and provides a natural alternative to the stochastic Boolean network models studied in the literature; see [14]. As a matter of fact, if the network is at state x x x(t) at time t, then the time t + τ * at which the state of the network will be next updated can be determined by drawing a sample τ * from the exponential distribution e t (τ ) = n∈N α n (x x x(t)) exp − τ n∈N α n (x x x(t)) , τ > 0, where α n (x x x) = (1 − x n ) + n + f n (r n (x x x)) + x n − n + g n (r n (x x x)) .
Moreover, the node to be updated can be specified by drawing a sample n * from the probability distribution u t (n) = α n (x x x(t)) n ∈N α n (x x x(t)) , n ∈ N .
This implies that the LMN model is a continuous-time Boolean network model with state-dependent asynchronous node updating and Boolean functions b n (x x x) = 1 − x n assigned at each node n ∈ N .

Markovianity of the Fractional Activity Process
Because the activity process {X X X(t), t ≥ 0} is a Markov process, only a single transition from X X X(t) to X X X(t) + e e e n * or X X X(t) − e e e n * , for some n * ∈ N , can occur within the infinitesimally small time interval [t, t + dt), where e e e n * is the n * -th column of the N × N identity matrix. Since where K = {1, 2, . . . , K}, with K being the number of homogeneous sub-populations N k , and N k = |N k | (i.e., the cardinality of N k ), this also means that the fractional activity process {Y Y Y (t), t ≥ 0} can transition only once within [t, Here, e e e k is the k-th column of the K × K identity matrix multiplied by N −1 k and k ∈ K is such that n * ∈ N k . As a matter of fact, the transition probabilities are given by (1 − x n ) λ + k + φ k (ρ k (y y y)) dt x n λ + k + φ k (ρ k (y y y)) dt where p + n (x x x) is the propensity function of activation of the n-th node of the LMN (see Methods in Main Text) and the k-th element y k of y y y is given by N −1 k n∈N k x n . Moreover, φ k and λ + k are such that, for every k ∈ K, f n = φ k and + n = λ + k , for all n ∈ N k , and we have ignored terms that go to zero faster than dt. Finally, we have used the assumption that there exists a function ρ k such that r n (x x x) = ρ k (y y y), for every n ∈ N k . Likewise, we have that x n λ − k + γ k (ρ k (y y y)) dt = N k y k λ − k + γ k (ρ k (y y y)) dt, where p − n (x x x) is the propensity function of inactivation of the n-th node of the LMN (see Methods in Main Text), whereas γ k and λ − k are such that, for every k ∈ K, g n = γ k and − n = λ − k , for all n ∈ N k . The previous discussion shows that the fractional activity process {Y Y Y (t), t ≥ 0} is Markovian with propensity functions given by π + k (y y y) = N k (1 − y k ) λ + k + φ k (ρ k (y y y)) (S10) π − k (y y y) = N k y k λ − k + γ k (ρ k (y y y)) . (S11) As a consequence, the probability distribution P (y y y; t) satisfies the following master equation: ∂P (y y y; t) ∂t = k∈K π + k (y y y − e e e k )P (y y y − e e e k ; t) + π − k (y y y + e e e k )P (y y y + e e e k ; t) − π + k (y y y) + π − k (y y y) P (y y y; t) . (S12) Finally, it is important to note that, when the net input to a node n in the LMN is given by r n (x x x) = h n + a a a T n x x x, we can find a function ρ k such that r n (x x x) = ρ k (y y y), for every n ∈ N k . Indeed, if h n = η k , for every n ∈ N k , and a nn = w kk /N k , for every n ∈ N k , n ∈ N k , then, for every n ∈ N k , we have that which shows that ρ k (y y y) = η k + k ∈K w kk y k .

Linear Noise Approximation
By following [15] (see also [16]), we define the shift operators Σ − k and Σ + k as Σ − k ϕ(y 1 , . . . , y k−1 , y k , y k+1 , . . . , y K ) := ϕ(y 1 , . . . , y k−1 , y k − N −1 k , y k+1 , . . . , y K ) (S14) Σ + k ϕ(y 1 , . . . , y k−1 , y k , y k+1 , . . . , y K ) := ϕ(y 1 , . . . , y k−1 , y k + N −1 k , y k+1 , . . . , y K ), (S15) for k ∈ K and any function ϕ(y y y). By using a Taylor series expansion, we have that In this case, we can write the master equation (S12) as k + · · · × π + k (y y y)P (y y y; t) k + · · · × π − k (y y y)P (y y y; t) . (S18) If we now define the drift and diffusion functions by respectively, then This equation, together with the ansatz yields ∂P (y y y; t) ∂t In Eq. (S22), µ µ µ(t) solves the macroscopic differential equations: initialized by µ µ µ(0) = 0, whereas, for each t, W k (t), k ∈ K, are zero-mean correlated Gaussian random variables. We can now use a Taylor expansion of the drift and diffusion functions to obtain: for k ∈ K, where A kk (y y y) := ∂A k (y y y)/∂y k and A kk k (y y y) := ∂ 2 A k (y y y)/∂y k ∂y k , and likewise for the diffusion functions. In this case, Eq. (S23) becomes We can now replace the probability distribution P (y y y; t) of the fractional activity process with the probability distribution P (w w w; t) of the noise process, in which case we obtain by virtue of the fact that where the second equality is a consequence of the fact that dY k (t)/dt = 0 (except at time points on a set of measure zero at which the derivative is infinite). This implies that dw k (t)/dt = −N 1/2 k dµ k (t)/dt, by virtue of Eq. (S22).
Note now that µ µ µ(t) satisfies the macroscopic equations dµ k (t)/dt = A k (µ µ µ(t)). As a consequence, and by virtue of Eq. (S27), the noise probability distribution P (w w w; t) is approximately governed by the linear Fokker-Planck equation where ζ k = N k /N and we ignore the terms of O(N −1/2 k ). Note that this equation does not depend on N , since we have used the relation N k /N k = ζ k /ζ k , which is true at any point en route to the thermodynamic limit. The solution to this equation is a multivariate Gaussian density with zero mean and correlation matrix R(t) with elements r kk (t) = E[W k (t)W k (t)] that satisfy the following system of Lyapunov equations: for t > 0 and k, k ∈ K, initialized with r kk (0) = 0, for every k, k ∈ K, where ∆ is the Kronecker delta function.
We note here that there are two approximation steps involved with the LNA method. The first step is the ansatz given by Eq. (S22), whereas the second step is ignoring all terms of O(N −1/2 k ) in Eq. (S27). Theoretically speaking, justification of the second step is simple -we can take N to be large so that 1/ √ N k is small enough that any term in the expansion multiplied by 1/ √ N k is negligible. This requirement however is not useful in practice, since we cannot determine the appropriate value of N that satisfies the required condition. On the other hand, assuming that N is large enough for the previous approximation to be valid, Eq. (S22) can be justified only for monostable systems [15].

Leakiness and Irreducibility
In the theory of Markov processes, the state y y y is said to be accessible from another state y y y if there is a non-zero probability to transition (possibly through intermediate states) from y y y to y y y. We denote this by y y y → y y y. The states y y y and y y y are said to be communicating whenever y y y → y y y and y y y → y y y. In this case, we write y y y y y y . If all states y y y ∈ Y are communicating, then there is a non-zero probability to transition from any state to any other state, and the Markov process is said to be irreducible. An irreducible Markov process has a unique, strictly positive, and asymptotically stable stationary probability distribution that is independent of the initial condition.
It is often difficult to prove that a Markov processes is irreducible. The following results allow us to determine when the fractional activity process Y Y Y (t) in a LMN is irreducible. It turns out that leakiness is intimately related to irreducibility. Proposition 1. If λ + k > 0, for every k ∈ K, then 0 → y y y → 1 for all y y y ∈ Y, where 0 is the state of zero fractional inactivity and 1 is the state of maximum fractional activity.
Proof. For every y y y ∈ Y such that y k ≤ 1 − 1/N k , for some k ∈ K, we have π + k (y y y) = N k (1 − y k )[λ + k + φ k (ρ k (y y y))] ≥ λ + k + φ k (ρ k (y y y)) > 0, where the second inequality is due to the fact that λ + k > 0 and φ k (ρ k (y y y)) ≥ 0. Therefore, y y y → y y y + e e e k , where e e e k is the k-th column of the K × K identity matrix multiplied by N −1 k . Thus, for any state y y y ∈ Y such that y y y + e e e k ∈ Y, we have that y y y → y y y + e e e k . Let y y y = (n 1 /N 1 , n 2 /N 2 , . . . , n K /N K ) T , for some n 1 , n 2 , . . . , n K . A non-zero probability path from 0 to y y y is then given by the following sequence: n 1 transitions taking y y y → y y y + e e e 1 , followed by n 2 transitions taking y y y → y y y + e e e 2 , . . ., followed by n K transitions taking y y y → y y y + e e e K . Likewise, a non-zero probability path from y y y to 1 is given by the following sequence: N 1 − n 1 transitions taking y y y → y y y + e e e 1 , followed by N 2 − n 2 transitions taking y y y → y y y +ẽ e e 2 , . . ., followed by N K − n K transitions taking y y y → y y y + e e e K . Hence, 0 → y y y → 1, for all y y y ∈ Y. Corollary 1. If λ + k > 0, for every k ∈ K, and 1 → 0, then Y Y Y (t) is irreducible. Proof. From Proposition 1, and for any y y y , y y y ∈ Y, we have that y y y → 1 and 0 → y y y . Since, 1 → 0, this implies y y y → 1 → 0 → y y y and thus Y Y Y (t) is irreducible.
k > 0, for every k ∈ K, then 1 → y y y → 0 for all y y y ∈ Y. Proof. For every y y y ∈ Y such that y k ≥ 1/N k , for some k ∈ K, we have π − k (y y y) = N k y k [λ − k + γ k (ρ k (y y y))] ≥ λ − k + γ k (ρ k (y y y)) > 0, where the second inequality is due to the fact that λ − k > 0 and γ k (ρ k (y y y)) ≥ 0. Therefore, y y y → y y y − e e e k . Thus, for any state y y y ∈ Y such that y y y − e e e k ∈ Y, we have that y y y → y y y − e e e k . Let y y y = (n 1 /N 1 , n 2 /N 2 , . . . , n K /N K ) T , for some n 1 , n 2 , . . . , n K . A non-zero probability path from 1 to y y y is then given by the following sequence: N 1 − n 1 transitions taking y y y → y y y − e e e 1 , followed by N 2 − n 2 transitions taking y y y → y y y − e e e 2 , . . ., followed by N K − n K transitions taking y y y → y y y − e e e K . On the other hand, a non-zero probability path from y y y to 0 is given by the following sequence: n 1 transitions taking y y y → y y y − e e e 1 , followed by n 2 transitions taking y y y → y y y − e e e 2 , . . ., followed by n K transitions taking y y y → y y y −ẽ e e K . Hence, 1 → y y y → 0, for all y y y ∈ Y.
From Proposition 2, and for any y y y , y y y ∈ Y, we have that y y y → 0 and 1 → y y y . Since, 0 → 1, this implies y y y → 0 → 1 → y y y and thus Y Y Y (t) is irreducible.
By combining the previous results, we obtain the following theorem: is irreducible. Note that physical systems satisfy a condition referred to as microscopic reversibility. This means that any physically possible transition from one state to another is also possible in the reverse direction. According to Theorem 1, taking λ − k , λ + k > 0, for every k ∈ K, results in an irreducible LMN model that ensures microscopic reversibility since, in this case, transitions from one state to another are also possible in the reverse direction. Therefore, we can make a LMN that violates microscopic reversibility satisfy this condition by setting the leak parameters to positive (and, if necessary, minuscule) values.

Thermodynamic Stability, Robustness, and Critical Behavior
We now employ a parameter Ω = N/N 0 , where N is the net number of nodes in the network and N 0 is a normalizing constant. If N 0 1, we may approximately assume that Ω is continuous-valued. We use Ω to quantify the size of the network.
The probability distribution of the fractional activity process is given by the Boltzmann-Gibbs distribution where V Ω (y y y; t) := − 1 Ω ln P Ω (y y y; t) P Ω (y y y * is the potential energy function with y y y * Ω (t) being a state in Y at which P Ω (y y y; t) attains its (global) maximum at time t, and Z Ω (t) := y y y∈Y e −ΩVΩ(y y y;t) (S33) is the partition function. Based on this formula, we can define a number of quantities that can be used to characterize the behavior of a LMN (and, as a matter of fact, of any Markovian network [17]), when nodes are removed from the network. We adopt these quantities from classical thermodynamics where they are used to describe the behavior of a physical system as its volume contracts or expands [18][19][20][21].
In particular, we can define the internal energy, entropy, and Helmholtz free energy, and subsequently introduce the concepts of internal potential energy, free potential energy, internal pressure, pressure, and bulk modulus. In the following, we denote by A the stationary limit of a time-varying parameter A(t); i.e., A = lim t→∞ A(t).
The internal energy at time t is defined by where is the energy of state y y y ∈ Y and E t [·] denotes expectation with respect to the probability distribution P Ω (y y y; t). On the other hand, the Helmholtz free energy F Ω (t) at time t is defined by where S Ω (t) is the entropy at time t, given by In thermodynamic terms, the Helmholtz free energy measures the energy available in the LMN to do work when the number of nodes is kept fixed. Note that U Ω (t) ≥ 0 and 0 ≤ S Ω (t) ≤ ln |Y|, for every Ω and t ≥ 0, where |Y| is the cardinality of the state-space Y. It also turns out that F Ω (t) ≥ 0 and dF Ω (t)/dt ≤ 0, for every Ω and t ≥ 0, with equality only at steady-state [17,22,23]. where is the internal potential energy of the LMN. Moreover, Note that A Ω (t) is the portion of the Helmholtz free energy due to the internal potential energy of the LMN. For this reason, we refer to A Ω (t) as the free potential energy. In thermodynamic terms, the free potential energy measures the portion of the energy, not accounted for by the energy of the most likely state, available in the LMN to do work when the number of nodes is kept fixed. The quantities define the internal pressure, pressure, and bulk modulus of the LMN, respectively. The internal pressure quantifies the rate of change in internal potential energy with respect to a change in the number of nodes, whereas the pressure quantifies the rate of change in free potential energy. Moreover, the bulk modulus measures the network's resistance to changing pressure. Note that P is the rate of entropy change with respect to a change in the number of nodes. Moreover, a network with near zero bulk modulus experiences negligible changes in pressure under changes in the number of nodes. The inverse bulk modulus is known as compressibility.
In the following, we focus our interest on the stationary behavior of a LMN. Due to the irreducibility properties of LMNs, all stationary thermodynamic quantities are unique and characteristic to the particular network under consideration. From Eqs. (S32) & (S39), note that where E[·] denotes expectation with respect to the stationary probability distribution P Ω (y y y) and I Ω (y y y) := − lnP Ω (y y y) = ΩU Ω (y y y).
The function I Ω (y y y) quantifies the amount of information associated with the occurrence of state y y y at steady-state, known as the self-information of state y y y. Therefore, and from an information-theoretic perspective, the internal potential energy measures how far the self-information of the most likely state at steady-state is from the expected self-information of all network states (which is the entropy). Note that zero internal potential energy implies zero self-information for the most likely state. In this case, the network will be at the most likely state with probability one. As a consequence, we may consider the internal potential energy as a thermodynamic measure of the "stability" of a particular ground state of the stationary potential energy landscape V Ω (y y y) with smaller values indicating increasing stability of that state. From Eqs. (S31), (S37), (S39), (S41) & (S47), we also have that ΩV Ω (y y y)P Ω (y y y) + y y y∈Y lnP Ω (y y y) P Ω (y y y) = y y y∈Y ΩV Ω (y y y)P Ω (y y y) − y y y∈Y Therefore, the negative of the free potential energy is the self-information of the most likely state at steady-state. Note that the internal potential energy of a LMN with equally probable states at steadystate is zero, whereas its free potential energy equals − ln |Y|. On the other hand, the internal potential energy of a LMN with "crystalized" behavior around a unique ground state is also zero, and the same is true for the free potential energy. Moreover, by virtue of Eqs. (S44) & (S49). Therefore, the pressure gives the slope of the support curve of the self-information of the most likely state at steady-state, whereas the bulk modulus is proportional to the "curvature" of this curve. In general, lim where v ∞ is a constant, given by is known as large deviation rate function [24]. This function characterizes the decay rate of the probability distribution away from the ground states as Ω → ∞. Moreover, its steady-state value V ∞ (y y y) = lim t→∞ V ∞ (y y y; t) acts as a Lyapunov function for the macroscopic equations (S24) [17,25]. This means that the solution µ µ µ(t) of the macroscopic equations produces a downhill motion in the value of the potential energy landscape V ∞ (y y y) until it asymptotically reaches a stable stationary state. Analytical derivation of V ∞ (y y y; t) is not possible in general. However, when the macroscopic equations are monostable, the system size expansion of van Kampen implies that In these equations, µ µ µ(t) solves the macroscopic equations (S24), the covariance matrix is the correlation matrix of the random vector W W W (t) whose elements solve the Lyapunov equations (S30), whereas Z is a diagonal matrix with elements ζ 1 , ζ 2 , . . . , ζ K , where ζ k = N k /N . Clearly, V ∞ (y y y; t) is hyper-quadratic in this case around the macroscopic solution µ µ µ(t), which is now the unique ground state. Moreover, the shapes of the equipotential surfaces centered at µ µ µ(t) are ellipsoidal, determined by R −1 (t) and, therefore, by the Lyapunov equations (S30).
If the stationary solution of the master equation can be well-approximated by the LNA method, then for those values of Ω at which this is true, where K is the dimension of the state-space Y. Indeed, from Eqs. (S39) & (S59), we have that where tr[A] denotes the trace of a matrix A. To obtain the previous result, we used three well-known properties of the trace: the trace of a scalar is itself, the trace is a linear operator (and therefore it commutes with expectation), and the trace is invariant under cyclic permutations. Moreover, we used C when the LNA method provides a sufficiently accurate solution to the master equation at steady-state.
We can use the pressure as a measure of (thermodynamic) robustness of the LMN with respect to the network size Ω. We say that the LMN is robust against variations in network size if there is no appreciable change in pressure when adding or removing nodes. Therefore, the LMN is robust if the derivative ∂P Ω /∂Ω of the pressure is small. As a consequence of Eq. (S50), the LMN is robust if the bulk modulus is small (especially at small network sizes). This implies that a robust LMN must significantly resist changes in pressure. On the other hand, Eqs. (S50) & (S51) reveal that the LMN is robust if the network is characterized by a "blunt" self-information curve σ * (Ω) with small curvature. Note that, if Ω is sufficiently large, then Eq. (S54) implies that the pressure and bulk modulus will approximately be zero and the network will be robust to changes in size. This also implies that the slope of the self-information support curve σ * (Ω) will approximately be zero and the same will be true for its curvature.
It is important to emphasize here that we can use the bulk modulus B Ω to detect network sizes at which the LMN exhibits critical behavior. As a matter of fact, it is well-known that an intensive thermodynamic quantity, such as the pressure, may experience a sharp discontinuity when another thermodynamic variable, such as the network size, varies past a critical value. If the pressure P Ω of the LMN experiences such a discontinuity as the network size Ω varies past a critical value Ω c , then B Ω will effectively capture this discontinuity by a delta function located at Ω c , thus indicating that the network experiences phase transition at Ω c .
As a consequence of the previous discussion, if the LNA method is valid for large values of Ω, then B Ω 0 at these network sizes. Moreover, if the thermodynamic behavior of the LMN changes abruptly when Ω is decreased past a critical value Ω c , then B Ω will produce a sharp spike at this value. A critical network size can demarcate a discontinuous transition of potential wells associated with the ground states. This is a direct consequence of the fact that spike-like behavior in the bulk modulus indicates an abrupt change in the slope of the self-information support curve σ * (Ω) of the most likely state at steady-state, as predicted by Eq. (S50).

Equilibrium and Non-equilibrium LMNs
We now discuss nuanced technical issues which arise when one is interested in determining the equilibrium status of a LMN.
It can be shown (e.g., see [17,26]) that the entropy of a LMN satisfies the balance equation where s Ω (t) is the entropy production rate of the network, defined by s Ω (t) := 1 2 k∈K y y y∈Y ϕ + k (y y y; t)A + k (y y y; t) + ϕ − k (y y y; t)A − k (y y y; t) , whereas h Ω (t) is the heat dissipation rate, defined by h Ω (t) := 1 2 k∈K y y y∈Y ϕ + k (y y y; t) ln π + k (y y y − e e e k ) π − k (y y y) π − k (y y y + e e e k ) π + k (y y y) , with e e e k being the k-th column of the K × K identity matrix multiplied by N −1 k . In these equations, ϕ + k (y y y; t) := π + k (y y y − e e e k )P Ω (y y y − e e e k ; t) − π − k (y y y)P Ω (y y y; t) (S65) ϕ − k (y y y; t) := π − k (y y y + e e e k )P Ω (y y y + e e e k ; t) − π + k (y y y)P Ω (y y y; t) are the forward and reverse probability fluxes associated with the k-th node of the network, whereas A + k (y y y; t) := ln π + k (y y y − e e e k )P Ω (y y y − e e e k ; t) π − k (y y y)P Ω (y y y; t)

(S67)
A − k (y y y; t) := ln π − k (y y y + e e e k )P Ω (y y y + e e e k ; t) π + k (y y y)P Ω (y y y; t) are the affinities of the k-th node. The affinities can be viewed as thermodynamic forces that drive the corresponding probability fluxes [17]. The flux ϕ + k (y y y; t) quantifies the flow of probability from state y y y − e e e k to state y y y, whereas the flux ϕ − k (y y y; t) quantifies the flow of probability from y y y + e e e k to y y y. Note that we can write the master equation (S12) as ∂P Ω (y y y; t) ∂t = k∈K ϕ + k (y y y; t) + ϕ − k (y y y; t) .
We now consider two notions of equilibrium: dynamic and thermodynamic equilibrium. A LMN is said to reach dynamic equilibrium if the joint probability distribution P Ω (y y y; t) tends to a stationary distribution P Ω (y y y) as t → ∞. An irreducible LMN approaches dynamic equilibrium with unique stationary distribution that is strictly positive, asymptotically stable and independent of the initial condition. On the other hand, a LMN is said to reach thermodynamic equilibrium if the affinities A + k (y y y; t) and A − k (y y y; t) tend to zero as t → ∞, for every k ∈ K and y y y ∈ Y. It is clear from Eqs. (S65)-(S68) and Eq. (S69) that thermodynamic equilibrium implies dynamic equilibrium. However, dynamic equilibrium can be reached without the network being in thermodynamic equilibrium. As a matter of fact, a LMN that is in dynamic equilibrium is also in thermodynamic equilibrium only when the following detailed balance conditions are satisfied π + k (y y y − e e e k )P Ω (y y y − e e e k ) = π − k (y y y)P Ω (y y y) π − k (y y y + e e e k )P Ω (y y y + e e e k ) = π + k (y y y)P Ω (y y y), for every k ∈ K, y y y ∈ Y. One can ensure that a LMN reaches thermodynamic equilibrium by constraining the propensities of a LMN to satisfy a set of well-defined constraints, known as the Kolmogorov-Wegscheider conditions; see [17] for details. It turns out that 0 ≤ s Ω = h Ω , where s Ω := lim t→∞ s Ω (t) and similarly for h Ω , with equality only when the network is in thermodynamic equilibrium [17,26]. As a consequence, the rate of entropy production in a LMN at dynamic equilibrium must be equal to the rate of heat dissipation. Moreover, the network is in thermodynamic equilibrium if and only if these rates are zero (i.e., if and only if the network does not produce entropy or dissipates heat).
Living biological systems produce entropy and dissipate heat. As a consequence, these systems must operate away from thermodynamic equilibrium. To check whether this is true for a given LMN model, we can use the maximum absolute affinity at steady-state, given bȳ Then, a LMN model is at thermodynamic equilibrium if and only ifĀ max = 0.

Noise-induced Modes, Stochastic Transitions and Bursting
To explain why a noise-induced mode may appear at the origin of the state-space Y at low population sizes, let T e (y y y) be the mean escape time from a state y y y ∈ Y, defined as the average time required for the LMN to move from state y y y to any other state in Y. Since the fractional activity process is Markovian, governed by the master equation (S12), the time it spends at state y y y is an exponential random variable with rate parameter k∈K [π + k (y y y) + π − k (y y y)], which implies that T e (y y y) = 1 k∈K π + k (y y y) + π − k (y y y) .

(S73)
Clearly, if T e (y y y) = ∞, then the state y y y is absorbing. This means that, once the network reaches y y y, it can never leave that state. As a consequence, we can use T e (y y y * ) to assess the "stability" of a ground state y y y * of the potential energy landscape of the LMN, with higher values of T e (y y y * ) indicating that the state y y y * is more stable. From Eqs. (S10), (S11) & (S73), we have that As a consequence, when λ + k = 0 and λ − k > 0, for every k ∈ K, 1 then T e (y y y) = ∞ only when y y y = 0. If T e (0) = ∞, then the master equation (S12) will have a trivial solution P Ω (y y y; t) = ∆(y y y), since the network is initialized at 0 and it will never move to another state. In this case, the resulting potential energy landscape V Ω (y y y; t) will have a unique global minimum at 0 [as a matter of fact, V Ω (0; t) = 0, whereas V Ω (y y y; t) = ∞, for every y y y = 0].
Since real-world networks must be characterized by non-trivial dynamics, we must have T e (0) < ∞. The fact that, when T e (0) = ∞, the probability distribution P Ω (y y y; t) is concentrated at the origin suggests that a very large but finite T e (0) may be indicative of a probability distribution that assigns high probability to the zero state, creating a noise-induced mode at the origin of Y. As a matter of fact, we expect that the stationary probability distribution P Ω (y y y) to have a peak around y y y = 0 whenever the mean escape time of the fractional activity dynamics from the zero state is sufficiently large and the average time it takes for these dynamics to return to the zero state is small (see pages 100 & 101 in [27]).
In general, Eq. (S74) implies that where ζ k = N k /N . Clearly, reducing the net population size N increases T e (0). It moreover decreases the size of the state-space Y, in which case, there will be fewer states for the network to visit, improving the likelihood of visiting the zero state and thus reducing the average return time to that state. The confluence of these two effects may contribute to the creation of a noise-induced mode at the state of zero fractional activity.
Eq. (S75) shows that, in addition to the net population size N , other system-specific parameters may also influence the mean escape time from the origin. For example, reducing the value of λ + k +φ k (ρ k (0)), for every k ∈ K, will increase the mean time spent at the origin when the LMN reaches complete inactivity.
We should note here that similar arguments can be made to show that, under appropriate conditions, reducing the net population size and tuning system-specific parameters can result in a noise-induced mode at the state 1 of maximum fractional activity. Indeed, from Eq. (S74), we have This implies that, by reducing N and λ − k + γ k (ρ k (1)), for every k ∈ K, we can increase the mean time spent at 1 while improving the likelihood of visiting 1 and thus reducing the average return time to this state. As a consequence, the confluence of these two effects may contribute to the creation of a noise-induced mode at the state of maximum fractional activity.
Let us now define a parameter and assume that the macroscopic equations (S24) have only one stable fixed point µ µ µ * . Note that, when b = 0, the macroscopic equations (S24) imply that the origin of the state-space Y is also a fixed point (albeit an unstable one) for the macroscopic equations. Thus b is a bifurcation parameter, since the macroscopic equations predict that bifurcation takes place at b = 0. Regardless how close the LMN is at the bifurcation point, the macroscopic equations predict that there will be no stable fixed point at the origin, with the dynamics moving away from 0 and towards the stable fixed-point µ µ µ * . On the other hand, our previous discussion implies that, at sufficiently small population sizes, the LMN may behave as if there is a stable fixed point at the origin, in the sense that the network will be operating close to 0 with non-negligible probability. This is also true for small nonzero values of the bifurcation parameter b, since Eq. (S75) implies that the mean escape time from the origin is given by T e (0) = (N b) −1 . This clearly demonstrates that, intrinsic noise present in the network at small population sizes is capable of blurring the bifurcation point from being a single point at b = 0 to a small nonnegative neighborhood of 0 while "stabilizing" the unstable fixed point of the macroscopic equations located at the origin of the state-space Y.
Another way to see how intrinsic noise contributes to the creation and stability of a noise-induced mode is by means of the LNA method. For sufficiently large network sizes Ω, the LNA method predicts that the stationary probability distribution P Ω (y y y) can be sufficiently characterized by a multivariate Gaussian distribution tightly centered around µ µ µ * with most probability mass being assigned over the state-space Y which, for all practical purposes, can be thought of as being continuous. 2 As a consequence, the net probability mass assigned by this distribution over the state-space Y will approximately equal to 1, as expected. However, as the network size Ω decreases, the Gaussian distribution becomes wider around µ µ µ * and may appreciably extend beyond the state-space Y, which will be discrete for small enough Ω. In this case, the net probability mass assigned at values outside the state-space Y will not be negligible, and the net probability mass assigned over Y will be smaller than one. As we explained above, and under appropriate conditions, the Gaussian approximation will begin to break down by placing significant probability mass outside of Y. This may force the stationary distribution P Ω (y y y) to undergo a qualitative change where the lost probability mass may be thought of as being absorbed at the origin, creating a mode in P Ω (y y y) at 0.
From a potential energy landscape perspective, the width of the potential well associated with the peak of the stationary probability distribution at µ µ µ * will increase as Ω decreases, whereas its depth will decrease. This behavior is also influenced by other system-specific parameters that control the steadystate solution C of the Lyapunov equations (S30). On the other hand, the width and depth of the potential well associated with the peak of the stationary probability distribution at 0 will both increase as Ω decreases. If the network parameters are such that the potential well at µ µ µ * is sufficiently wide and shallow, then a state within this potential well will eventually move towards the potential well at 0 with high probability and stay there for an appreciable amount of time before exiting.
The previous discussion provides a clear explanation of the fact that intrinsic noise is an important factor for bursting. In addition to Ω (or N ) and C, this behavior also depends on how far µ µ µ * is from the origin 0, since bursting is clearly better pronounced when µ µ µ * is further away from the origin. In general however this requires a wider potential well at µ µ µ * . Hence, the network size Ω and any other system-specific parameter that affects the steady-state solution µ µ µ * of the macroscopic equations (S24) and the steady-state solution C of the Lyapunov equations (S30) will also affect bursting. This allows the LMN to control bursting by employing alternative strategies.

Defining Avalanches
A common way to define avalanches has originated from work on neural networks, since the emergence of avalanches is a fundamental property of such networks [28,29]. This definition is based on partitioning time into bins T i = [i∆t, (i + 1)∆t), i = 0, 1, . . ., of equal duration ∆t > 0 and associating to each bin T i a frame F i , defined as the portion {X X X(t), t ∈ T i } of the activity process during T i . The frame F i is said to be blank if the activity process X X X(t) is zero within T i ; otherwise, the frame F i is said to be active. Then, an avalanche is defined to be a sequence of consecutively active frames that is preceded and ended by a blank frame [28]. Note that, if where Note that H(t) is the net fractional activity of the activity process X X X(t) at time t. As a consequence, ≥ 0 is a small threshold that dictates the minimum percentage of nodes that can be active in the network in order for the network to be deemed active.
It is common to characterize an avalanche by two parameters: its duration and size. The duration D of an avalanche is defined to be the number of consecutively active frames multiplied by ∆t. On the other hand, its size Σ is simply the number of times within the duration D that an element of the activity process X X X(t) becomes active (switches from 0 to 1).
Unfortunately, the previous definitions depend on the choice of ∆t. Moreover, the definitions are sensitive to arbitrary shifts of the time axis. For this reason, we provide here an alternative definition for an avalanche that is not influenced by the previous factors. In particular, we say that an avalanche occurs within a time window [t, t + τ ), whenever the following three conditions are satisfied: (i) There exist some small dt > 0 such that H(t ) ≤ , for all t ∈ [t − dt, t); i.e., the network is inactive immediately before time t.
Using the previous definition, it is not difficult to see that the duration D is now given by τ , whereas the size is defined as before by replacing D by τ . Note that a given element of the activity process may transition from 0 to 1 multiple times throughout the duration. As a consequence, the size of an avalanche is not limited by N . To account for the effect of population size, it is common to consider the fractional avalanche size Σ/N [29].

Epidemiological Example
In epidemiology, a common model of disease spreading is the SIS model [30]. According to this model, the n-th individual in a directed weighted network G of N interacting individuals is assumed to be in one of two states with respect to a disease at time t: susceptible (S) and infected (I). In this case, the state of the epidemiological system at time t is characterized by a random vector X X X(t) whose n-th element X n (t) takes value 1, if the n-th individual is infected, and 0, if she is susceptible. It is assumed that recovery from infection does not confer resistance to the disease with infected individuals becoming susceptible after recovery. For example, bacterial infection is modeled well by the SIS model, since an individual who recovers (e.g., through the use of antibiotics) is susceptible to re-infection.
In the classical SIS model, infection can only be transmitted from an infected to a susceptible individual. Note however that some infections can be acquired from other sources, such as the environment, animals, terror attacks, or self-infection. For example, individuals may be colonized by bacteria and be healthy for long periods of time until the bacteria suddenly seize the opportunity to pathogenically infect the individual. As a consequence, we choose here to discuss a slightly more general version of the SIS model known as the SISa model [31,32].
Although the SISa model is general enough to describe a variety of infections (e.g., due to illnesses, computer viruses, or social contagions), we focused on a matter of pressing concern to public health: infections due to methicillin resistant staphylococcus aureus (MRSA). We use the SISa model as a simple model of MRSA outbreaks in which bacterial infections, due to self-infection, are possible and the number of individuals spreading MRSA infection can be small, in which case stochastic modeling is necessary [33]. MRSA has been studied in confined swine populations, where more invasive and comprehensive data collection is feasible [34].
We assume that the propensity by which the n-th individual transitions from the susceptible to the infected state depends on a net input r n (x x x) = h n + a a a T n x x x, where h n ≥ 0 is the propensity of the individual to become infected regardless of her social contacts and a a a T n is the n-th row of the adjacency matrix A of the underlying network of infectious social contacts. The element a nn of the adjacency matrix provides the rate at which the n-th susceptible individual will be infected by the n -th infected individual and, as such, it is assumed to be nonnegative; i.e., a nn ≥ 0, for every n, n ∈ N , with a nn = 0, for every n ∈ N . Clearly, r n (x x x) represents the total infectious influence to the n-th individual. As a consequence, we set p + n (x x x) = (1 − x n )r n (x x x). On the other hand, we assume that the propensity of the n-th infected individual to recover is constant, given by − n , which implies that p − n (x x x) = − n x n . To simplify the previous model, we assume one homogeneous population of individuals and study the fraction Y (t) of the population that is infected at time t, given by In this case, one finds that N a nn = w > 0, for every n, n ∈ N such that n = n , whereas h n = η > 0, + n = 0, and − n = λ > 0, for every n ∈ N . This implies an all-to-all connectivity, which can be justified by considering the fact that, in small populations, it is always possible for any two individuals to come in contact with each other. We take w > 0, because sick individuals can only spread infection, whereas we take η > 0, because infections may be acquired independently of the network state.
Because the system is one-dimensional, its state space is reasonably sized. It was therefore possible to numerically solve the master equation (S80) for P Ω (y; t) using the Krylov subspace approximation (KSA) method implemented by the Expokit software package [35]. To do so, we used a tight tolerance parameter of 10 −6 and a value K 0 = 30 for the dimension of the Krylov subspace. We then employed Eq. (S32) to evaluate the potential energy landscape V Ω (y; t) and used the solution to the master equation after 30 years as an approximation to the stationary probability distribution P Ω (y). From this distribution, we numerically evaluated the internal potential energy V Ω and entropy S Ω at steady-state. We then calculated the stationary free potential energy according to A Ω = V Ω −S Ω . We set N 0 = 200, in which case, Ω = N/200. By evaluating A Ω for Ω = 0.005, 0.01, . . . , 1, we computed the stationary pressure P Ω using Eq. (S43) and subsequently the bulk modulus B Ω using Eq. (S44). We approximated all derivatives with respect to Ω using backward differences with ∆Ω = 0.005. When required, we drew sample trajectories from the master equation using the exact Gillespie algorithm [36,37]. Finally, we numerically solved the macroscopic equation (S83) and the Lyapunov equation (S84) using the stiff 'ode23s' solver in MATLAB with the default parameters. This resulted in a unique and stable macroscopic stationary state µ * = 0.4719.
One may be curious about the rather long time scales involved in Fig. 4 of the Main Text. The infection dynamics occur on the order of months (e.g., an infected individual requires an average time of 1/λ = 17.5 days to recover). However, the long-term infection trends (i.e., outbreaks of infections that eventually die out, only for another outbreak to occur) take place on the time span of multiple years. Recall that the system is initialized with all individuals susceptible and none infected, and therefore nothing happens in the model until an infection is acquired from the environment. This process is captured by the parameter η which is very small, indicating that we are modeling a system where the environment (e.g., the pig pen) is kept relatively clean. Quantitatively, one sees from Eq. (S81) that π + (0) = N η, and thus the start of a new outbreak is a rare-event for small N and η. Therefore, over a time frame of many years, one may observe multiple outbreaks of MRSA infections.

Neural Network Example
A model that fits well within our framework has been put forth in the literature to explain biological neural networks [16]. This model is based on an interconnected directed weighted network G of N neurons in a set N = {1, 2, . . . , N } that can exist in one of two distinct states: an active state, during which a neuron fires an action potential, 3 and a quiescent state, during which a neuron is at rest. In this case, the state of the neural system at time t is characterized by a random vector X X X(t) whose n-th element X n (t) takes value 1, if the n-th neuron is active at time t, and 0 otherwise.
Here, we study the stochastic behavior of a group of interacting neurons embedded within a larger neural network [38]. We assume that, when the n-the neuron is inactive, it is driven to become active by a net input r n (x x x) = h n + a a a T n x x x, where h n > 0 is the external input to the neuron that quantifies the influence of surrounding neurons and external environmental factors, and a a a T n is the n-th row of the adjacency matrix A of the network. If no external input is present, h n may be chosen to account for the (small) rate at which a neuron might fire independently of the incoming synaptic input. The element a nn of the adjacency matrix A provides the synaptic weight from the n -th to the n-th neuron. If a nn > 0, then the n -th neuron excites the n-th neuron making it more likely to spike, whereas, if a nn < 0, then the n -th neuron inhibits the n-th neuron making it less likely to spike. Finally, a nn = 0 indicates that the n-th neuron has no synaptic input from the n -th neuron. It is assumed that a neuron does not regulate itself, in which case a nn = 0, for n ∈ N .
The propensity by which the n-th neuron transitions from the quiescent to the active state is assumed to monotonically depend on the total synaptic input r n (x x x) to the neuron by means of a function f n (r) = r > 0 tanh(r), where r > 0 is the Iverson bracket, taking value 1, if r > 0, and 0 otherwise. In this case, p + n (x x x) = (1 − x n ) r n (x x x) > 0 tanh(r n (x x x)), where we take + n = 0. On the other hand, the propensity of the n-th neuron to transition from the active to the quiescent state is assumed to be a constant − n , regardless of the system state, which implies that p − n (x x x) = − n x n . We simplify the previous model by assuming that the neural network under consideration consists of two homogeneous populations N 1 and N 2 of excitatory and inhibitory neurons, respectively. The fractional activity process Y Y Y (t) is now two-dimensional, with elements Y 1 (t) and Y 2 (t) given by Due to homogeneity, we find that N 1 a nn = w 11 > 0, for every n, n ∈ N 1 such that n = n , N 2 a nn = w 22 < 0, for every n, n ∈ N 2 such that n = n , N 1 a nn = w 21 > 0, for every n ∈ N 2 , n ∈ N 1 , and N 2 a nn = w 12 < 0, for every n ∈ N 1 , n ∈ N 2 . Moreover, h n = η 1 > 0, for every n ∈ N 1 , h n = η 2 > 0, for every n ∈ N 2 , + n = 0, − n = λ 1 > 0, for every n ∈ N 1 , and + n = 0, − n = λ 2 > 0, for every n ∈ N 2 . Note that the implied all-to-all connectivity is a common assumption in the neuroscience literature [16,39].
It is usually justified by noting that some regions of the brain are comprised of neurons that are highly interconnected among themselves.
When η + w e + w i > 0, we have that 0 → 1 by following a sequence of state transitions whereby all excitatory neurons become active one-by-one and all inhibitory neurons become active one-by-one. As a consequence, Y Y Y (t) will be irreducible in this case according to Proposition 2.
By following [16], we set λ = 0.1 ms −1 and η = 0.001. We also defined two new parameters w s and w d , given by w s := w e + w i and w d := w e − w i .
Note that w s < w d , since w i < 0. Moreover, Y Y Y (t) is irreducible for w s + η > 0. It turns out that the steady-state solution of the macroscopic equations (S90) & (S91) depends only on w s , whereas bursting is controlled by the value of w d . By following [16], we set w s = 0.2 and study the behavior of the neural network for various values of w d . We start with w d = 0.3, in which case the network is not balanced (i.e., w s w d ), and proceed to examine the effect of w d . To do so, we numerically solved the corresponding master equation for the joint probability distribution P Ω (y 1 , y 2 ; t) using the KSA method with tolerance parameter of 10 −30 and a value K 0 = 50 for the dimension of the Krylov subspace. We took the value of the tolerance parameter to be appreciably smaller than in the case of the SISa model in order to effectively deal with the increased dimensionality of the state-space Y. Moreover, we took the value of K 0 to be larger than the one used in the case of the SISa model in order to effectively deal with the increased cardinality of Y. Similarly to the case of the SISa model, we employed Eq. (S32) to evaluate the stationary potential energy landscape V Ω (y 1 , y 2 ). We set N 0 = 200, in which case, Ω = N/200, and used the solution to the master equation at 2,000 ms as an approximation to the stationary probability distribution P Ω (y 1 , y 2 ), since we noticed that the neural network is approximately at steady-state after that time. By using this distribution, we numerically evaluated the internal potential energy V Ω and entropy S Ω at steady-state. We then calculated the stationary free potential energy according to A Ω = V Ω − S Ω . By evaluating A Ω for Ω = 0.01, 0.02, . . . , 1, we computed the stationary pressure P Ω using Eq. (S43) and subsequently the bulk modulus B Ω using Eqs. (S44). We approximated all derivatives with respect of Ω using backward differences with ∆Ω = 0.01. 4 When required, we drew sample trajectories from the master equation using the exact Gillespie algorithm. Finally, we numerically solved the macroscopic equations (S90) & (S91) and the corresponding Lyapunov equations using the stiff 'ode23s' solver in MATLAB with the default parameters, which resulted in a unique and stable macroscopic stationary state µ µ µ * = (µ * 1 , µ * 2 ) T = (0.5032, 0.5032) T .