Short-term synaptic depression can increase the rate of information transfer at a release site

The release of neurotransmitters from synapses obeys complex and stochastic dynamics. Depending on the recent history of synaptic activation, many synapses depress the probability of releasing more neurotransmitter, which is known as synaptic depression. Our understanding of how synaptic depression affects the information efficacy, however, is limited. Here we propose a mathematically tractable model of both synchronous spike-evoked release and asynchronous release that permits us to quantify the information conveyed by a synapse. The model transits between discrete states of a communication channel, with the present state depending on many past time steps, emulating the gradual depression and exponential recovery of the synapse. Asynchronous and spontaneous releases play a critical role in shaping the information efficacy of the synapse. We prove that depression can enhance both the information rate and the information rate per unit energy expended, provided that synchronous spike-evoked release depresses less (or recovers faster) than asynchronous release. Furthermore, we explore the theoretical implications of short-term synaptic depression adapting on longer time scales, as part of the phenomenon of metaplasticity. In particular, we show that a synapse can adjust its energy expenditure by changing the dynamics of short-term synaptic depression without affecting the net information conveyed by each successful release. Moreover, the optimal input spike rate is independent of the amplitude or time constant of synaptic depression. We analyze the information efficacy of three types of synapses for which the short-term dynamics of both synchronous and asynchronous release have been experimentally measured. In hippocampal autaptic synapses, the persistence of asynchronous release during depression cannot compensate for the reduction of synchronous release, so that the rate of information transmission declines with synaptic depression. 
In the calyx of Held, the information rate per release remains constant despite large variations in the measured asynchronous release rate. Lastly, we show that dopamine, by controlling asynchronous release in corticostriatal synapses, increases the synaptic information efficacy in nucleus accumbens.

Let X be a discrete random variable with a finite sample space $\{x_1, x_2, \ldots, x_n\}$. The entropy of X, denoted by H(X), is the amount of uncertainty about the value of X and is calculated by

$$H(X) = -\sum_{i=1}^{n} P(X = x_i)\,\log P(X = x_i), \tag{1}$$

where P(.) is the probability measure. For two discrete random variables X and Y, the conditional entropy of Y given X, denoted by H(Y|X), describes the remaining uncertainty about the value of Y provided that the value of X is known. The conditional entropy is derived from

$$H(Y|X) = -\sum_{x}\sum_{y} P(X = x, Y = y)\,\log P(Y = y \mid X = x). \tag{2}$$

The mutual information between the two random variables X and Y is defined by

$$I(X;Y) = H(Y) - H(Y|X) \tag{3}$$

and quantifies the amount of information that can be obtained from X about Y.

The notions of entropy and mutual information extend to random processes as well. Let $\mathbf{X} = \{X_i\}_{i=1}^{\infty}$ be a discrete-time random process, where $X_i$ is the random variable corresponding to the value of $\mathbf{X}$ at time i. We represent by $X^n$ the first n instances of $\mathbf{X}$,

$$X^n = \begin{cases} (X_1, X_2, \ldots, X_n) & \text{if } n > 0 \\ \emptyset & \text{if } n \le 0. \end{cases} \tag{4}$$

The entropy rate of $\mathbf{X}$ is defined by

$$H(\mathbf{X}) = \lim_{n\to\infty} \frac{1}{n} H(X^n), \tag{5}$$

if the limit exists. The mutual information rate of two random processes $\mathbf{X}$ and $\mathbf{Y}$ is defined by

$$I(\mathbf{X};\mathbf{Y}) = \lim_{n\to\infty} \frac{1}{n} I(X^n;Y^n), \tag{6}$$

provided that the limit exists. Assume that the random processes $\mathbf{X}$ and $\mathbf{Y}$ are the input and output of a communication channel. Let $E_i$ be the (random) amount of energy that is consumed by the channel at time i. The energy-normalized information rate of the channel is defined by

$$\lim_{n\to\infty} \frac{I(X^n;Y^n)}{\sum_{i=1}^{n} E(E_i)}, \tag{7}$$

where E(.) is the expected value.
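The definitions of entropy, conditional entropy and mutual information above can be illustrated numerically. The following is a minimal sketch, not part of the original analysis. It assumes logarithms to base 2 (so that results are in bits) and uses a hypothetical helper, `asymmetric_channel_joint`, that builds the joint distribution of a binary asymmetric channel with input spike probability `alpha`, release probability `p` after a spike and release probability `q` without a spike.

```python
import math


def entropy(probs):
    """Shannon entropy (bits, an assumption) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(Y|X) = sum_x P(x) H(Y | X = x), with joint[x][y] = P(X=x, Y=y)."""
    total = 0.0
    for row in joint:
        px = sum(row)
        if px > 0:
            total += px * entropy([p / px for p in row])
    return total


def mutual_information(joint):
    """I(X;Y) = H(Y) - H(Y|X)."""
    py = [sum(col) for col in zip(*joint)]
    return entropy(py) - conditional_entropy(joint)


def asymmetric_channel_joint(alpha, p, q):
    """Hypothetical helper: joint[x][y] for P(X=1)=alpha,
    P(Y=1|X=1)=p and P(Y=1|X=0)=q."""
    return [
        [(1 - alpha) * (1 - q), (1 - alpha) * q],  # X = 0 (no spike)
        [alpha * (1 - p), alpha * p],              # X = 1 (spike)
    ]


if __name__ == "__main__":
    joint = asymmetric_channel_joint(alpha=0.2, p=0.7, q=0.1)
    print(mutual_information(joint))
```

A noiseless channel (`p = 1`, `q = 0`) driven at `alpha = 0.5` conveys one bit per time step, whereas a channel with `p = q` conveys nothing, because the output is then independent of the input.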

B. Recovery time constant and synaptic information efficacy
The speed of recovery from depression modulates the rate of information transfer at a release site. Slower recovery expands the impact range of the release history and, consequently, increases the effective memory length of the release site (Fig. S1A). For a synapse with a recovery time constant of τ = 100 msec (corresponding to e = 0.1), the relative variation of the mutual information rate caused by different initial states (various seed values $u_0$ in the algorithm in Fig. 1B) falls to 10% after 160 msec. For a synapse with faster recovery, τ = 28 msec (equivalent to e = 0.3), the effective memory length is reduced to 70 msec.
Faster recovery increases both the mutual information rate and the energy-normalized information rate of the release site (Fig. S1B). The mutual information rate changes substantially with the recovery coefficient, whereas the energy-normalized information rate is relatively robust. Specifically, as the recovery coefficient increases, the capacity of the release site is attained at higher input spike rates, but the input spike rate that yields the optimal energy-normalized information rate is practically independent of the recovery time constant. From Fig. S1B and Fig. 3B, we conclude that release sites with different depression dynamics can operate in their optimal energy-rate regime at the same input spike rate.
If the recovery of the synchronous spike-evoked release is faster than the recovery of asynchronous release, then depression can increase the mutual information rate (Fig. S1C) and energy-normalized information rate of the release site (Fig. S1D). We show that the differences in recovery coefficient among synapses (and release sites) create three distinct functional categories for short-term depression (Fig. S1E).

C. Model parameters
We set the parameters of the MRO model by establishing a correspondence with an updated version of the stochastic model of depression [1]. The release probability (synchronous or asynchronous) follows a first-order differential equation in the stochastic model,

$$\frac{dp_r}{dt} = \frac{p_0 - p_r}{\tau} - u\,p_r\,\delta(t - t_r), \tag{8}$$

where $p_r$, τ, $p_0$, u, and $t_r$ are the release probability, recovery time constant, default (maximum) release probability, depression coefficient and release timing, respectively.
In the absence of release, the release probability recovers exponentially to its default value. Assuming that the release probability at time t = 0 is $p_{in}$,

$$p_r(t) = p_0 - (p_0 - p_{in})\exp(-t/\tau). \tag{9}$$

Correspondingly, if we assume that in the MRO model the release probability at time index i = 0 is $p_{in}$, then it can be easily shown that after k steps of recovery (k successive quiescent intervals),

$$p_k = p_0 - (p_0 - p_{in})(1 - e)^k, \tag{10}$$

where e is the recovery coefficient of the MRO model. The discrete time k and the continuous time t are related through the time unit, ∆, of the MRO model,

$$k = \frac{t}{\Delta}. \tag{11}$$

To have similar recovery dynamics in the two models,

$$(1 - e)^k = \exp(-t/\tau), \tag{12}$$

and by substituting k from (11),

$$e = 1 - \exp(-\Delta/\tau). \tag{13}$$

This equation shows the relationship between the recovery coefficient of the MRO model and the recovery time constant of the synapse. For example, if the recovery time constant of a synapse is τ = 100 msec, then for a time unit of ∆ = 10 msec, the recovery coefficient of the MRO model should be e ≈ 0.1.
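The correspondence between the recovery time constant and the recovery coefficient can be checked numerically. The sketch below assumes the relation e = 1 − exp(−∆/τ), which reproduces the two values quoted in this supplement (e = 0.1 for τ = 100 msec and e = 0.3 for τ = 28 msec, both with ∆ = 10 msec). The function names are hypothetical and chosen only for illustration.

```python
import math


def recovery_coefficient(tau_ms, delta_ms):
    """Recovery coefficient e of the discrete model (assumed relation:
    e = 1 - exp(-delta / tau))."""
    return 1.0 - math.exp(-delta_ms / tau_ms)


def recover_discrete(p_in, p0, e, k):
    """Release probability after k quiescent time bins in the discrete model."""
    return p0 - (p0 - p_in) * (1.0 - e) ** k


def recover_continuous(p_in, p0, tau_ms, t_ms):
    """Release probability after t_ms of exponential recovery."""
    return p0 - (p0 - p_in) * math.exp(-t_ms / tau_ms)


if __name__ == "__main__":
    for tau in (100.0, 28.0):
        e = recovery_coefficient(tau, delta_ms=10.0)
        print(f"tau = {tau} msec -> e = {e:.3f}")
```

With this choice of e, the discrete recovery coincides with the continuous one at every multiple of ∆, which is the sense in which the two models have similar recovery dynamics.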
After a release at time $t_r$, the release probability of the synapse (described by (8)) decreases to

$$p_r(t_r^{+}) = (1 - u)\,p_r(t_r^{-}). \tag{14}$$

Therefore, the depression multiplier of the MRO model, c, can be derived from

$$c = 1 - u. \tag{15}$$

However, estimating c from (15) introduces a bias, because the release probability recovers slightly during a single time bin. Using an approximation and assuming a uniform distribution for $p_r$, we can derive a more accurate estimate of the depression multiplier.

The memory length, L, of the MRO model represents the number of previous release outcomes that are used to determine the current release probability of the synapse. We define the effective memory length of a synapse, $L_{eff}$, as the minimum value of L for which the mutual information rate of the synapse becomes independent of its past (characterized by the seed value in the algorithm in Fig. 1B). We can find the effective memory of a synapse (in milliseconds) by $L_{eff} \times \Delta$. For example, if the recovery time constant of a synapse is 100 msec (e = 0.1) and its depression coefficient is u = 0.67 (c = 0.5), then the effective memory of the synapse is approximately 160 msec (corresponding to $L_{eff} = 16$).
We use the context tree weighting algorithm [2] to calculate numerically the information rate of the synapse in a classical, stochastic model of depression [1]. In Fig. S2, we show that by increasing the memory length, L, the analytical mutual information rate of the MRO model converges to the numerical information rate estimates of the classical stochastic model of depression.

D. Proof of Theorems
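Proof of Theorem 1: By definition, the mutual information rate of the release site is

$$\lim_{n\to\infty} \frac{1}{n} I(X^n;Y^n), \tag{18}$$

where

$$I(X^n;Y^n) = H(Y^n) - H(Y^n|X^n). \tag{19}$$

Using the chain rule [3],

$$H(Y^n) = \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}). \tag{20}$$

For integer values a, b, we define

$$Y_a^b = (Y_a, Y_{a+1}, \ldots, Y_b). \tag{21}$$

The random variable $Y_i$ depends on the release probabilities at time i, $p_i$ and $q_i$, and the input spike variable at time i, $X_i$. Since $p_i$ and $q_i$ are functions of $(Y_{i-L}, \ldots, Y_{i-1})$,

$$H(Y^n) = \sum_{i=1}^{n} H(Y_i \mid Y_{i-L}^{i-1}). \tag{22}$$

Applying the chain rule to $H(Y^n|X^n)$,

$$H(Y^n|X^n) = \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}, X^n). \tag{23}$$

With a similar argument,

$$H(Y^n|X^n) = \sum_{i=1}^{n} H(Y_i \mid Y_{i-L}^{i-1}, X_i). \tag{24}$$

From (19), (22) and (24), and based on the definition of conditional mutual information,

$$I(X^n;Y^n) = \sum_{i=1}^{n} I(X_i;Y_i \mid Y_{i-L}^{i-1}). \tag{25}$$

The sample set of $Y_{i-L}^{i-1}$ consists of all the binary vectors of length L. For the sake of notational simplicity, instead of the binary vector we use its decimal representation $j \in \{0, 1, \ldots, 2^L - 1\}$, so that

$$I(X^n;Y^n) = \sum_{i=1}^{n} \sum_{j=0}^{2^L-1} P(Y_{i-L}^{i-1} = j)\, I(X_i;Y_i \mid Y_{i-L}^{i-1} = j). \tag{27}$$

Let $R_j$ represent the mutual information rate of the release site at state j. By definition,

$$R_j = I(X_i;Y_i \mid Y_{i-L}^{i-1} = j).$$

The release probabilities of the release site at state j, denoted by p(j) and q(j), are calculated from the algorithm in Fig. 1B. It can be easily shown that

$$R_j = h\big(\alpha p(j) + \bar{\alpha} q(j)\big) - \alpha\, h\big(p(j)\big) - \bar{\alpha}\, h\big(q(j)\big),$$

where $h(x) = -x\log x - (1-x)\log(1-x)$ is the binary entropy function, $\alpha = P(X_i = 1)$ is the input spike probability per time step, and $\bar{\alpha} = 1 - \alpha$.

Each state of the release site can transit to two other states and the transition probabilities are fully determined by the current state (Fig. 1D). Therefore, a Markov chain is used to model the state transitions of the release site. The transition matrix of the Markov chain, denoted by M, is a $2^L \times 2^L$ matrix and has two non-zero entries on each row. The pattern of the non-zero entries of M is shown in Fig. S3A. It is shown in [4] that for an irreducible aperiodic finite-state Markov chain, regardless of the initial state, the probability of each state j converges to a steady-state probability, denoted by $\pi_j$. We prove in Lemma 1 that the Markov chain of the release site in the MRO model is irreducible and aperiodic. Therefore,

$$\lim_{i\to\infty} P(Y_{i-L}^{i-1} = j) = \pi_j. \tag{30}$$

By interchanging the summations in (27),

$$I(X^n;Y^n) = \sum_{j=0}^{2^L-1} \sum_{i=1}^{n} P(Y_{i-L}^{i-1} = j)\, I(X_i;Y_i \mid Y_{i-L}^{i-1} = j).$$

Since $I(X_i;Y_i \mid Y_{i-L}^{i-1} = j) = R_j$ for every i, we then have

$$\frac{1}{n} I(X^n;Y^n) = \sum_{j=0}^{2^L-1} R_j\, \frac{1}{n}\sum_{i=1}^{n} P(Y_{i-L}^{i-1} = j).$$

Using (30) and the Cesàro mean theorem,

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} P(Y_{i-L}^{i-1} = j) = \pi_j.$$

Finally, combining the last two equations,

$$\lim_{n\to\infty} \frac{1}{n} I(X^n;Y^n) = \sum_{j=0}^{2^L-1} \pi_j R_j.$$

We note that the stationary probability vector $\vec{\pi} = (\pi_0, \ldots, \pi_{2^L-1})$ is calculated using the power method. We start with a random probability vector $\vec{x}_0$ and in each iteration i ≥ 0, we calculate

$$\vec{x}_{i+1} = \vec{x}_i M. \tag{37}$$

Then we substitute $\vec{x}_i$ with $\vec{x}_{i+1}$ and repeat (37). It is easily shown that the probability vector $\vec{x}_i$ converges to $\vec{\pi}$.

Figure S2: Comparison between the analytical mutual information rate of the MRO model and the numerical estimation of the information rate of a classical stochastic model of depression [1]. The mutual information rate is plotted as a function of synchronous spike-evoked release probability ($p_0$) for various values of memory length, L, and input spike rates. The recovery time constant of the stochastic model is 100 msec and the corresponding recovery coefficient of the MRO model is e = 0.1. In this synapse model, the asynchronous release probability is zero, and the release site is inactivated after each release (i.e., c = 0).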
Lemma 1. In the MRO model, the Markov chain of the release site is irreducible and aperiodic.
Proof. Let j and j′ be two arbitrary states of the release site corresponding to the binary vectors $(a_1, a_2, \ldots, a_L)$ and $(b_1, b_2, \ldots, b_L)$. We show that the state j′ is always accessible from the state j in the Markov chain M.
Assume that the Markov chain is in the state j at time i = 1. The release site can transit to the state $(a_2, a_3, \ldots, a_L, b_1)$ with a non-zero probability $P_1(b_1)$. Similarly, at each time i, 1 ≤ i ≤ L, the release site can transit from the state $(a_i, \ldots, a_L, b_1, \ldots, b_{i-1})$ to $(a_{i+1}, \ldots, a_L, b_1, \ldots, b_i)$ with the non-zero probability $P_i(b_i)$. Therefore, the probability of transition from $(a_1, a_2, \ldots, a_L)$ to $(b_1, b_2, \ldots, b_L)$ after L time steps is greater than or equal to $\prod_{i=1}^{L} P_i(b_i)$. This proves that the state j′ is accessible from the state j, and consequently, M is irreducible. Moreover, there is a non-zero transition probability from the state j = 0 to itself. Since every irreducible finite-state Markov chain with a self-loop is aperiodic [4], we conclude that M is aperiodic and the proof is complete.
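Proof of Theorem 2: The energy-normalized information rate of the release site is defined by (refer to Section A):

$$\lim_{n\to\infty} \frac{I(X^n;Y^n)}{\sum_{i=1}^{n} E(E_i)}, \tag{38}$$

where $E_i$ is the energy consumed by the release site to release a vesicle at time i. By assumption, one unit of energy is consumed at each release. Therefore,

$$E_i = Y_i$$

and

$$E(E_i) = P(Y_i = 1) = \sum_{j=0}^{2^L-1} P(Y_{i-L}^{i-1} = j)\,\big(\alpha p(j) + \bar{\alpha} q(j)\big), \tag{40}$$

where $\alpha = P(X_i = 1)$ and $\bar{\alpha} = 1 - \alpha$. Dividing the numerator and the denominator of (38) by n, it follows from (38), (40), Theorem 1 and the convergence of the state probabilities to $\pi_j$ that the energy-normalized information rate is

$$\frac{\sum_{j=0}^{2^L-1} \pi_j R_j}{\sum_{j=0}^{2^L-1} \pi_j \big(\alpha p(j) + \bar{\alpha} q(j)\big)}.$$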
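The results of Theorems 1 and 2 lend themselves to a short numerical sketch: build the Markov chain over the last L release outcomes, find its stationary distribution with the power method, and average the per-state rates. The code below is an illustration under stated assumptions rather than the authors' implementation. In particular, `release_probs` is a hypothetical stand-in for the algorithm in Fig. 1B (which is not reproduced in this supplement): it starts from the default probabilities and replays the last L outcomes, multiplying by c and d after a release and recovering with coefficients e and f otherwise. Logarithms are taken to base 2, and the most recent outcome is stored in the least significant bit of the state index.

```python
import math


def h2(x):
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def release_probs(j, L, p0, q0, c, d, e, f):
    """Assumed stand-in for the Fig. 1B algorithm: replay the last L
    release outcomes encoded in j, oldest first, from the defaults."""
    p, q = p0, q0
    for k in reversed(range(L)):
        if (j >> k) & 1:                      # release: depress
            p, q = c * p, d * q
        else:                                 # quiescent bin: recover
            p, q = p + e * (p0 - p), q + f * (q0 - q)
    return p, q


def mro_rates(L, alpha, p0, q0, c, d, e, f, iters=2000):
    """Return (information rate, energy-normalized rate, stationary vector)."""
    n = 1 << L
    probs = [release_probs(j, L, p0, q0, c, d, e, f) for j in range(n)]
    # Probability of a release in state j, with or without an input spike.
    rel = [alpha * p + (1.0 - alpha) * q for p, q in probs]

    pi = [1.0 / n] * n
    for _ in range(iters):                    # power method: x <- x M
        nxt = [0.0] * n
        for j in range(n):
            shifted = (j << 1) & (n - 1)      # drop the oldest outcome
            nxt[shifted | 1] += pi[j] * rel[j]
            nxt[shifted] += pi[j] * (1.0 - rel[j])
        pi = nxt

    rates = [h2(rel[j]) - alpha * h2(p) - (1.0 - alpha) * h2(q)
             for j, (p, q) in enumerate(probs)]
    info = sum(w * r for w, r in zip(pi, rates))
    energy = sum(w * r for w, r in zip(pi, rel))  # expected releases per bin
    return info, info / energy, pi


if __name__ == "__main__":
    info, per_energy, _ = mro_rates(L=8, alpha=0.2, p0=0.7, q0=0.1,
                                    c=0.5, d=0.5, e=0.1, f=0.1)
    print(info, per_energy)
```

Each row of the transition matrix has only two non-zero entries, so the sketch never stores the full $2^L \times 2^L$ matrix; one power-method iteration costs a number of operations proportional to $2^L$. The parameter values in the usage example are illustrative only.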

E. Quantized release probabilities
In the MRO model, the release probabilities of the release site at time i are determined by the last L release outcomes, $Y_{i-L}^{i-1}$. Alternatively, the release probabilities at time i can be derived recursively from the release probabilities and the release outcome at time i−1. In this recursive approach, the state of the release site at time i is specified by the pair $(P_i, Q_i)$, where $P_i$ and $Q_i$ are the random variables corresponding to the synchronous spike-evoked and asynchronous release probabilities. Since $P_i$ and $Q_i$ are continuous variables, the number of states grows without bound as i increases. To avoid the limitations of infinite-state models, we quantize the release probabilities. For a quantization level of δ, the sample space of release probabilities is defined by

$$S = \{k\delta : k = 0, 1, \ldots, [1/\delta]\},$$

where [.] is the floor function.
Let $[x]_S$ represent the largest entry in S that is less than or equal to x, i.e., $[x]_S = \max\{y : y \in S,\ y \le x\}$.
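Also assume that $p_0, q_0 \in S$ are the default (maximum) synchronous spike-evoked and asynchronous release probabilities of the release site. The quantized release probabilities at time i + 1 are calculated recursively from

$$P_{i+1} = \begin{cases} [cP_i]_S & \text{if } Y_i = 1 \\ [P_i + e(p_0 - P_i)]_S & \text{if } Y_i = 0 \end{cases}$$

and

$$Q_{i+1} = \begin{cases} [dQ_i]_S & \text{if } Y_i = 1 \\ [Q_i + f(q_0 - Q_i)]_S & \text{if } Y_i = 0. \end{cases}$$

We refer to this model as the binary asymmetric channel with Quantized Release Probabilities, abbreviated as QRP. We note that since the synchronous spike-evoked and asynchronous release probabilities do not exceed $p_0$ and $q_0$, the sample space of (P, Q) can be reduced from S × S to $S_P \times S_Q$, where

$$S_P = \{s \in S : s \le p_0\}, \qquad S_Q = \{s \in S : s \le q_0\}.$$

The state transitions of the release site are modeled by a Markov chain on $S_P \times S_Q$ (Fig. S3B). For each state (p, q), the stationary probability, $\pi_{(p,q)}$, is calculated using the power method. Also, the mutual information rate of the binary asymmetric channel, $R_{(p,q)}$, is derived from

$$R_{(p,q)} = h(\alpha p + \bar{\alpha} q) - \alpha\, h(p) - \bar{\alpha}\, h(q), \tag{53}$$

where $h(x) = -x\log x - (1-x)\log(1-x)$ is the binary entropy function, $\alpha = P(X_i = 1)$ is the input spike probability per time step, and $\bar{\alpha} = 1 - \alpha$.

Theorem. In the QRP model, the mutual information rate of the release site is

$$\sum_{(p,q)\in S_P \times S_Q} R_{(p,q)}\, \pi_{(p,q)} \tag{54}$$

and its energy-normalized information rate is

$$\frac{\sum_{(p,q)\in S_P \times S_Q} R_{(p,q)}\, \pi_{(p,q)}}{\sum_{(p,q)\in S_P \times S_Q} (\alpha p + \bar{\alpha} q)\, \pi_{(p,q)}}. \tag{55}$$

Proof. Let $\mathbf{X}$ and $\mathbf{Y}$ be the input and output random processes of the QRP model (the top panel in Fig. S3B). By definition, the mutual information rate is

$$\lim_{n\to\infty} \frac{1}{n} I(X^n;Y^n), \tag{56}$$

where

$$I(X^n;Y^n) = H(Y^n) - H(Y^n|X^n).$$

Using the chain rule for $H(Y^n)$ and $H(Y^n|X^n)$,

$$H(Y^n) = \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}), \qquad H(Y^n|X^n) = \sum_{i=1}^{n} H(Y_i \mid Y^{i-1}, X^n).$$

The vector of release probabilities, $(P_i, Q_i)$, can be calculated from $Y^{i-1}$. Also, given $X_i$ and $(P_i, Q_i)$, the output $Y_i$ is independent of $Y^{i-1}$ and of the remaining inputs. Similarly, given $(P_i, Q_i)$, the output $Y_i$ is independent of $Y^{i-1}$. Hence,

$$H(Y_i \mid Y^{i-1}) = H(Y_i \mid P_i, Q_i), \qquad H(Y_i \mid Y^{i-1}, X^n) = H(Y_i \mid X_i, P_i, Q_i).$$

From the definition of conditional mutual information,

$$I(X^n;Y^n) = \sum_{i=1}^{n} I(X_i;Y_i \mid P_i, Q_i) = \sum_{i=1}^{n} \sum_{(p,q)\in S_P \times S_Q} P\big((P_i,Q_i) = (p,q)\big)\, I\big(X_i;Y_i \mid (P_i,Q_i) = (p,q)\big).$$

The term $I\big(X_i;Y_i \mid (P_i,Q_i) = (p,q)\big)$ is the mutual information rate of the binary asymmetric channel with release probabilities p and q, which is denoted by $R_{(p,q)}$. Therefore, substituting into (56),

$$\lim_{n\to\infty} \frac{1}{n} I(X^n;Y^n) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \sum_{(p,q)\in S_P \times S_Q} P\big((P_i,Q_i) = (p,q)\big)\, R_{(p,q)}.$$

By interchanging the summations and moving the limit inside,

$$\lim_{n\to\infty} \frac{1}{n} I(X^n;Y^n) = \sum_{(p,q)\in S_P \times S_Q} R_{(p,q)} \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} P\big((P_i,Q_i) = (p,q)\big).$$

The state of the release site at time i is given by $(P_i, Q_i)$ and the state transitions of the release site are modeled by a Markov chain with a transition matrix M of order $|S_P| \times |S_Q|$. We prove in Lemma 2 that the Markov chain M is uni-chain and its recurrent class is aperiodic. Therefore, it has a unique stationary distribution and the probability of each state (p, q) converges to its stationary probability $\pi_{(p,q)}$ [4]. That is,

$$\lim_{i\to\infty} P\big((P_i,Q_i) = (p,q)\big) = \pi_{(p,q)}. \tag{68}$$

Applying the Cesàro mean theorem to (68),

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} P\big((P_i,Q_i) = (p,q)\big) = \pi_{(p,q)},$$

which gives (54). Finally, similar to the proof of Theorem 2, $E(E_i) = P(Y_i = 1)$. In the QRP model,

$$P(Y_i = 1) = \sum_{(p,q)\in S_P \times S_Q} P\big((P_i,Q_i) = (p,q)\big)\,(\alpha p + \bar{\alpha} q).$$

Hence,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} E(E_i) = \sum_{(p,q)\in S_P \times S_Q} (\alpha p + \bar{\alpha} q)\, \pi_{(p,q)},$$

which, together with (54), gives (55).

Lemma 2. In the QRP model, the Markov chain of the release site is uni-chain and its recurrent class is aperiodic.

Proof. To show that the transition matrix of the QRP model is uni-chain, we need to prove that there exists only one recurrent class in M and the other states (if any) are transient. Let $(a_0, b_0) \ne (0, 0)$ be an arbitrary state in M. Consider the path $(a_0, b_0) \to (a_1, b_1) \to (a_2, b_2) \to \ldots$ in which every state $(a_i, b_i)$ transits to its depressed state, i.e., $(a_{i+1}, b_{i+1}) = ([ca_i]_S, [db_i]_S)$. As long as $(a_i, b_i) \ne (0, 0)$, the transition probability to the depressed state, $(a_{i+1}, b_{i+1})$, is positive. Moreover, if $a_i > 0$ then $a_{i+1} < a_i$, and if $b_i > 0$ then $b_{i+1} < b_i$. Since the number of states in the Markov chain is finite and the sequences $(a_i)_{i\in\mathbb{N}}$ and $(b_i)_{i\in\mathbb{N}}$ are monotonically decreasing to zero, there will be a large enough integer N such that $(a_N, b_N) = (0, 0)$. Therefore, there is a path from $(a_0, b_0)$ to the state (0, 0) in M. This implies that the state (0, 0) is accessible from every state in the Markov chain. Now assume that there are two recurrent classes $C_1$ and $C_2$ in the Markov chain. Since the states in $C_1$ have access to (0, 0), from the definition of recurrent states, $(0, 0) \in C_1$. With a similar argument, $(0, 0) \in C_2$. Therefore, $C_1 = C_2$ and there is only one recurrent class in M, meaning that M is a uni-chain.

Now we show that the period of the recurrent class is equal to one. Since the transition probability to the recovered state is always positive in the Markov chain M, we can consider the path $(a_0, b_0) = (0, 0) \to (a_1, b_1) \to (a_2, b_2) \to \ldots$ in which every state $(a_i, b_i)$ transits to its recovered state, i.e., $(a_{i+1}, b_{i+1}) = ([a_i + e(p_0 - a_i)]_S, [b_i + f(q_0 - b_i)]_S)$. It is clear that for each i, $a_{i+1} \ge a_i$ and $b_{i+1} \ge b_i$. Since the number of states is finite, there exists a finite integer N such that $(a_{N+1}, b_{N+1}) = (a_N, b_N)$. Therefore, the state $(a_N, b_N)$ transits to itself with probability $\alpha \bar{a}_N + \bar{\alpha}\, \bar{b}_N$, where $\bar{x} = 1 - x$. On the other hand, $(a_N, b_N)$ is accessible from (0, 0), meaning that it belongs to the recurrent class. Since a recurrent state with a loop is aperiodic [4], we conclude that $(a_N, b_N)$, and consequently the recurrent class of M, is aperiodic and the proof is complete.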
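The QRP construction can also be sketched in a few lines of code. The example below is a hedged illustration, not the authors' implementation. It assumes the recursion described in this section (multiplication by c and d after a release, recovery with coefficients e and f otherwise, followed by rounding down to the grid S), represents each quantized probability by its integer grid index to avoid floating-point drift, and uses logarithms to base 2. The function name `qrp_rates` and the default grid spacing are hypothetical.

```python
import math


def h2(x):
    """Binary entropy in bits; h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def qrp_rates(alpha, p0, q0, c, d, e, f, delta=0.01, iters=2000):
    """Return (information rate, energy-normalized rate, stationary dict).

    States are pairs of grid indices (a, b) standing for the release
    probabilities (a * delta, b * delta). Assumes p0 and q0 lie on the grid.
    """
    kp, kq = round(p0 / delta), round(q0 / delta)

    def floor_idx(x):
        return math.floor(x + 1e-9)           # guard against round-off

    def step(a, b, released):
        if released:                          # depressed state
            return floor_idx(c * a), floor_idx(d * b)
        return (floor_idx(a + e * (kp - a)),  # recovered state
                floor_idx(b + f * (kq - b)))

    # Power method on a sparse distribution over the reachable states.
    pi = {(kp, kq): 1.0}
    for _ in range(iters):
        nxt = {}
        for (a, b), w in pi.items():
            r = alpha * a * delta + (1.0 - alpha) * b * delta
            for released, pr in ((True, r), (False, 1.0 - r)):
                if pr > 0.0:
                    s = step(a, b, released)
                    nxt[s] = nxt.get(s, 0.0) + w * pr
        pi = nxt

    info = energy = 0.0
    for (a, b), w in pi.items():
        p, q = a * delta, b * delta
        r = alpha * p + (1.0 - alpha) * q
        info += w * (h2(r) - alpha * h2(p) - (1.0 - alpha) * h2(q))
        energy += w * r                       # expected releases per bin
    return info, info / energy, pi


if __name__ == "__main__":
    info, per_energy, pi = qrp_rates(alpha=0.2, p0=0.7, q0=0.1,
                                     c=0.5, d=0.5, e=0.1, f=0.1)
    print(info, per_energy, len(pi))
```

Because the chain is uni-chain with an aperiodic recurrent class, starting the iteration from the single state $(p_0, q_0)$ is sufficient. Only states that are actually reachable are stored, which is usually far fewer than $|S_P| \times |S_Q|$. Note that rounding down can stall recovery below $p_0$ once $e(p_0 - p)$ falls under δ, so a smaller δ gives a closer match to the unquantized dynamics.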

F. Comparison between the two models of short-term depression
We presented two models to calculate the mutual information rate of the release site during short-term depression: the MRO model and the QRP model. The mutual information rates and energy-normalized information rates of the two models are similar (compare Fig. S4A to Fig. 3B). We show in Fig. S4B that the relative difference between the calculated rates of the two models is negligible. Each model, however, has its own advantages and disadvantages. The state of the release site in the MRO model is a binary vector of length L, which corresponds to the last L release outcomes. The Markov chain of the release site consists of $2^L$ states and its transition matrix, M, grows exponentially with memory length. The pattern of non-zero entries in M is always fixed and does not depend on depression dynamics (Fig. S3A). In contrast, the state space of the QRP model consists of the quantized release probabilities, which is modeled by a Markov chain, M, of order $|S_P| \times |S_Q|$. Therefore, the size of the Markov chain in the QRP model can be much smaller than that of the MRO model, which reduces the computational resources required to calculate the information rate. However, the pattern of non-zero entries of M varies with the depression coefficients (Fig. S3B), making it more difficult to achieve further analytical advances.

Figure S4 (B): … the dashed lines show the relative difference of energy-normalized information rates. The parameters of the two models are similar to (A).

G. Distinct pools of vesicles
Studies indicate that asynchronous and synchronous release rely on the same pool of vesicles, whereas spontaneous release may draw on a distinct pool.
Here we consider the hypothetical scenario in which the rate of spontaneous release is similar to that of asynchronous release, in order to explore the consequences of having distinct pools of vesicles. Although our framework is based on the notion of a shared pool of vesicles, it can be slightly modified to cover these hypothetical cases as well. We calculate average depression multipliers, $\bar{c}$ and $\bar{d}$, by marginalizing c and d over the release modes. In this marginalization, $p_a$ is the average synchronous release probability, $q_a$ denotes the average asynchronous plus spontaneous release probability, and r is the ratio of the number of asynchronous releases to the total number of releases in the inter-spike intervals. The average depression multipliers replace c and d in the model and simulate the impact of distinct pools of vesicles.
We study the extreme case for which the rates of spontaneous release and asynchronous release are identical, r = 0.5. If spontaneous releases come from a distinct pool of vesicles, the release of a vesicle after an action potential depresses both synchronous and asynchronous release, but does not change the spontaneous release probability. For releases occurring in the inter-spike intervals, if the released vesicle belonged to the separate pool of spontaneous release, only the probability of spontaneous release is reduced and the other modes of release are not affected; otherwise, depression reduces the probabilities of synchronous and asynchronous release, but does not impact spontaneous release.

Figure S5: Information efficacy of a synapse with separate pools of vesicles. The mutual information rate of a synapse with a distinct pool of vesicles for spontaneous release is calculated numerically using the context tree weighting algorithm (filled circles). The average depression multipliers of the MRO model (with a shared pool of vesicles) are estimated and the mutual information rate of the synapse is calculated (solid lines). Information rates are plotted as a function of $q_0$ (the summation of asynchronous and spontaneous release probabilities) for different depression multipliers of synchronous release, c. The other parameters are d = 0.5, a = 0.2, $p_0$ = 0.7, and e = f = 0.1.
The mutual information rate of the synapse is calculated numerically using the context tree weighting algorithm [2]. We then update the MRO model with the average depression multipliers and calculate the mutual information rate of the synapse. As seen in Fig. S5, even in this extreme case (with an unrealistically high rate of spontaneous release), the mutual information rate of the synapse with a distinct vesicle pool for spontaneous release can be precisely estimated by the MRO model (with a shared pool of vesicles), provided that the depression multipliers are marginalized over the release modes.
We should note that the same simulation can be used to study a hypothetical synapse in which half of the vesicles of asynchronous release are supplied from a separate pool of vesicles. Our results show that the MRO model, equipped with average depression multipliers, can provide an accurate estimation of the information efficacy of synapses with partially distinct pools of vesicles.