Astrocytes as a mechanism for contextually-guided network dynamics and function

Astrocytes are a ubiquitous and enigmatic type of non-neuronal cell, found in the brains of all vertebrates. While traditionally viewed as supportive of neurons, it is increasingly recognized that astrocytes play a direct and active role in brain function and neural computation. On account of their sensitivity to a host of physiological covariates and their ability to modulate neuronal activity and connectivity on slower time scales, astrocytes may be particularly well poised to modulate the dynamics of neural circuits in functionally salient ways. In the current paper, we seek to capture these features via actionable abstractions within computational models of neuron-astrocyte interaction. Specifically, we examine how nested feedback loops of neuron-astrocyte interaction, acting over separated time scales, may endow astrocytes with the capability to enable learning in context-dependent settings, where fluctuations in task parameters may occur much more slowly than within-task requirements. We pose a general model of neuron-synapse-astrocyte interaction and use formal analysis to characterize how astrocytic modulation may constitute a form of meta-plasticity, altering the ways in which synapses and neurons adapt over time. We then embed this model in a bandit-based reinforcement learning task environment, and show how the presence of time-scale-separated astrocytic modulation enables learning over multiple fluctuating contexts. Indeed, these networks learn far more reliably than dynamically homogeneous networks and conventional non-network-based bandit algorithms. Our results support the notion that neuron-astrocyte interactions in the brain benefit learning over different time scales and the conveyance of task-relevant contextual information onto circuit dynamics.

As stated in the main text, the signal flow in the tripartite synapse can be revealed more clearly via a symbolic description. We denote the activities associated with the two neurons and the astrocyte by V_1^N, V_2^N, and V^A respectively, and the synaptic efficacy by V^S. The interplay between the astrocyte and the neuronal elements can then be represented by the arrows shown in Fig 1, in which two instrumental feedback loops can be identified. In the first loop, the signal flows from Neuron 1 to Neuron 2 through the synapse, and the signals of Neurons 1 and 2 jointly act on the synaptic efficacy (so-called Hebbian plasticity). Meanwhile, the signals of Neurons 1 and 2 also affect the astrocyte's activity, which in turn acts on the synaptic efficacy, forming a second loop on top of the signal flow from Neuron 1 to Neuron 2. Notice that the signals of Neurons 1 and 2 act together on the synapse and the astrocyte. This integration of two signals is drastically different from two separate signals and thus constitutes a high-order interaction.
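The two nested loops can be sketched as a toy discrete-time simulation. This is a minimal illustrative sketch, not the model of the main text: the Euler scheme, all coefficients, and the specific coupling terms are assumptions chosen only to exhibit the fast Hebbian loop and the slow astrocytic loop.

```python
import numpy as np

def phi(x):
    # bounded activation (logistic sigmoid), as used in the main-text model
    return 1.0 / (1.0 + np.exp(-x))

def motif_step(v1, v2, vs, va, u, dt=0.01, tau=0.01):
    """One Euler step of a toy tripartite-synapse motif.

    Loop 1 (fast): Neuron 1 -> synapse -> Neuron 2, with a Hebbian-like
    plasticity term driven by the joint pre/post activity phi(v1)*phi(v2).
    Loop 2 (slow): the same joint activity drives the astrocyte va, which
    in turn modulates the synaptic efficacy vs.  Coefficients are
    illustrative placeholders, not fitted values.
    """
    joint = phi(v1) * phi(v2)            # integrated (high-order) signal
    dv1 = -v1 + u                        # Neuron 1 driven by external input u
    dv2 = -v2 + vs * phi(v1)             # Neuron 2 driven through the synapse
    dvs = -vs + joint + np.tanh(va)      # Hebbian term plus astrocytic term
    dva = tau * (-va + joint)            # slow astrocytic integration
    return v1 + dt * dv1, v2 + dt * dv2, vs + dt * dvs, va + dt * dva
```

Running this loop shows the time-scale separation directly: the synaptic efficacy settles quickly while the astrocytic variable drifts slowly toward the same joint signal.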

B Graphical description of neuron-astrocyte population as a hypernetwork
We have claimed that the neuron-astrocyte population can be described by a two-layer hypernetwork, and we now introduce a formal graphical description of this hypernetwork. The brain is usually described as a network where nodes represent neurons and (directed) edges between nodes denote synaptic connections, which can be inhibitory or excitatory. In network theory, a generic (mono-layer) network can be defined by a graph G := (N, E), where N = {1, ..., n} is the set of nodes and E = {(i, j) | i, j ∈ N and i, j are connected} is the set of edges. We consider directed and weighted graphs; that is, an edge (i, j) ∈ E has a direction from node i to node j and carries a weight w_ij ∈ R. In addition, a network can have multiple layers (here, neural and astrocytic) that can share the same nodes or have different nodes [1]. When considering the large numbers of neurons and astrocytes present in the brain, we need to separate neurons and astrocytes into two groups because of their natural differences. In this regard, we prescribe that the neuron-astrocyte population has two layers, representing the group of neurons and the group of astrocytes respectively, denoted by the graphs G_n = {N_n, E_n} and G_a = {N_a, E_a}.
On the other hand, an edge only connects two nodes and thus describes a pairwise interaction. Because of the presence of high-order interactions within the tripartite synapse, the interconnections between the two layers cannot be represented by ordinary edges. We therefore extend the notion of an edge to that of a hyperedge, which can connect any number of nodes. A hyperedge H_i is defined as a non-empty subset of N, i.e., H_i ⊆ N with H_i ≠ ∅.
The whole neuron-astrocyte network can then be denoted by the hypernetwork M = {G_n, G_a, {H_i}}. This hypernetwork contains two layers, i.e., the neural layer and the astrocytic layer. The neural layer consists of all the neurons, and its intra-layer structure is consistent with a standard directed neural network; the astrocytic layer includes the astrocytes and embeds a directed network structure as well. Edges that connect a node to itself (self-loops) are excluded in both layers. Whenever neurons interact with an astrocyte, an inter-layer connection takes place, and we use a hyperedge to represent it: each hyperedge (the triangle shape in Fig 1B of the Main Text) connects one astrocyte and two neurons. Note that the hyperedge essentially includes the edge connecting the two neurons, which represents the plastic synapse. In this sense, the defined hyperedge captures the high-order interaction between the two neurons, the astrocyte, and the synapse. As a result, the neuron-astrocyte ensemble structure is well represented by this two-layer hypernetwork.
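The hypernetwork M = {G_n, G_a, {H_i}} can be encoded directly as a data structure. The following Python sketch is illustrative; the class and field names are our own, not taken from the paper's code.

```python
from dataclasses import dataclass, field

@dataclass
class Hypernetwork:
    """Two-layer hypernetwork M = {G_n, G_a, {H_i}} (illustrative sketch).

    Intra-layer edges are directed and weighted, with self-loops excluded;
    an inter-layer hyperedge connects one astrocyte with a pre/post pair
    of neurons, capturing the high-order tripartite interaction.
    """
    neurons: set = field(default_factory=set)
    astrocytes: set = field(default_factory=set)
    neural_edges: dict = field(default_factory=dict)   # (i, j) -> w_ij
    astro_edges: dict = field(default_factory=dict)    # (k, l) -> weight
    hyperedges: list = field(default_factory=list)     # (pre, post, astro)

    def add_neural_edge(self, i, j, w):
        assert i != j, "self-loops are excluded"
        self.neural_edges[(i, j)] = w

    def add_hyperedge(self, pre, post, astro):
        # the hyperedge subsumes the plastic synapse pre -> post
        assert (pre, post) in self.neural_edges
        assert astro in self.astrocytes
        self.hyperedges.append((pre, post, astro))
```

Each hyperedge thus references the synapse it contains, mirroring the triangle motif of Fig 1B.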

C Well-posedness of the neuron-astrocyte network model
From a mathematical point of view, it is important to check whether the neuron-astrocyte network model is posed and defined correctly. Our model is given by a set of continuous-time ODEs. Such a system is well posed if the solution to any initial value problem exists and is unique. Well-posedness is guaranteed if the vector field of the model is Lipschitz continuous in the state variables and continuous in time [2]. It is well known that common activation functions, such as the logistic sigmoid and tanh, are Lipschitz. Under the assumption that the external inputs are continuous in time, our model is well posed.
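The Lipschitz property of the activation functions can also be checked numerically. A small sketch (a grid-based slope estimate; the grid range and resolution are arbitrary choices) recovers the known Lipschitz constants, 1/4 for the logistic sigmoid and 1 for tanh:

```python
import numpy as np

def lipschitz_estimate(f, lo=-10.0, hi=10.0, n=100001):
    """Estimate the Lipschitz constant of a scalar function on [lo, hi]
    as the maximum absolute slope between adjacent grid points."""
    x = np.linspace(lo, hi, n)
    y = f(x)
    return np.max(np.abs(np.diff(y) / np.diff(x)))

# the logistic sigmoid has maximum derivative 1/4 (at 0); tanh has 1
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
```

Bounded derivatives of this kind are what make the full vector field Lipschitz in the state variables.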
The model is also well defined in the sense that, for any initial condition, the dynamics do not grow without bound but are confined to an appropriate subset of state space after some finite time, as shown in Section 4. Moreover, we can approximately estimate this region using the mathematical arguments that follow.

D Formal analysis of network motif dynamics
As the neuron-astrocyte model in this work is proposed for the first time, it is necessary to conduct a formal analysis, including a study of boundedness and of fixed-point conditions. Since the network is composed of network motifs, it suffices to consider the minimal model on a network motif.

Boundedness. Note that the network motif dynamics are given by system (1). We first define boundedness for a general dynamical system without inputs.

Definition 1. Consider a dynamical system ẋ = f(x), where f : R^n → R^n is continuous, and let x(t), t ≥ 0, be a trajectory of the system. x(t) is said to be ultimately bounded if there exist M > 0 and T > 0 such that ∥x(t)∥ ≤ M for all t ≥ T. Moreover, the system is said to be ultimately bounded if all trajectories are ultimately bounded.
For system (1) without external inputs, although the vector field is defined on all of R^5, we can show that the dynamics are in fact bounded after some finite time. Let X = (x_1, x_2, w_1, w_2, z)^⊤. As the activation functions are bounded, we denote the maximum values of |ϕ(·)| and |ψ(·)| by M_1 > 0 and M_2 > 0 respectively, and define the set Ω as the hyperrectangle determined by the bounds derived in the proof below (e.g., |z| ≤ |h|M_1^2/e).

Theorem 1. In the absence of external inputs, the dynamics of (1) are ultimately bounded in the set Ω.
Proof: Let z̄(t) be the solution of the scalar differential equation dz̄/dt = −e z̄ + |h|M_1^2 with z̄(0) = |z(0)|. One can obtain that z̄(t) = (z̄(0) − |h|M_1^2/e) exp(−et) + |h|M_1^2/e. As e > 0, the first term on the right-hand side converges to zero exponentially; therefore, for any ε > 0 there exists a time T > 0 such that z̄(t) ≤ |h|M_1^2/e + ε for all t > T. On the other hand, from the last equation of (1) and the boundedness of the activation functions, we have d|z|/dt ≤ −e|z| + |h|M_1^2. Then, by the comparison lemma [3, Lemma 3.4], we get |z(t)| ≤ z̄(t). It follows that there exists a time T_1 such that |z(t)| ≤ |h|M_1^2/e + ε for all t > T_1. Analogously, we can derive the bounds for the other variables defining Ω. This proves that the dynamics of (1) are ultimately bounded in the set Ω.
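The comparison argument can be illustrated numerically by integrating the scalar envelope equation. The parameter values below (e, h, M_1) are arbitrary placeholders, not values from the paper:

```python
import numpy as np

def upper_envelope(z0, e=0.5, h=0.8, M1=1.0, dt=1e-3, T=40.0):
    """Integrate the scalar comparison ODE  z̄' = -e*z̄ + |h|*M1**2
    (an upper bound on the z-dynamics of the motif).  Its solution
    converges exponentially to |h|*M1**2 / e from any initial condition."""
    z = z0
    for _ in range(int(T / dt)):
        z += dt * (-e * z + abs(h) * M1**2)
    return z

# with e=0.5, h=0.8, M1=1 the asymptotic bound is |h|*M1^2/e = 1.6
```

Trajectories started far above or below the equilibrium both collapse onto the same asymptotic value, which is exactly the bound used to build Ω.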
Along with the theorem on boundedness, we make some remarks.
1. Here we focus on the autonomous system. Under the condition that the inputs are bounded, one can also show boundedness of the system when external inputs are present [4].
2. The set Ω is positively invariant and attractive with respect to (1). Intuitively, by the definition of limit points [5], all positive limit sets of system (1), such as fixed points and limit cycles, must be contained in the attractive set Ω. This property helps in obtaining the following results about fixed points.
Fixed points. With boundedness in hand, we can study the fixed points of the system. Continue to consider system (1) without external inputs. Setting the right-hand sides of (1) to zero yields the equations (2). Each of these equations defines a nullcline in R^5, and together their solutions (intersections of the nullclines) yield the fixed points. First, let us consider the existence of fixed points, which can be proved easily using Brouwer's fixed-point theorem [6]. Define the mapping F on Ω whose fixed points coincide with the solutions of (2).

Theorem 2. System (1) has (at least) one fixed point in Ω.
Proof: It is easy to check that the mapping F is continuous. To prove the existence of a fixed point via Brouwer's fixed-point theorem, we need to show that the set Ω is compact and convex and that F maps Ω into itself. Since Ω is bounded and closed, compactness follows. In addition, Ω is a hyperrectangle in R^5 and hence convex. Therefore, there exists (at least) one fixed point of system (1) in Ω.
Next, we examine the uniqueness of the fixed point of (1). Let DF denote the Jacobian of F. To ensure that Eqs. (2) have a unique solution in the previously obtained set Ω, one sufficient condition is that the inequality ∥DF(X)∥ < 1 holds for all X ∈ Ω, where ∥·∥ denotes a matrix norm. We take the 1-norm of DF, i.e., the maximum of the column sums of absolute values, which gives the bound (5). Note that all the variables and the derivatives of the activation functions are bounded; it is therefore always possible to find conditions under which (5) is less than 1 for all points in Ω. To showcase this, consider the case of sigmoid and tanh activation functions; then one has ∥DF∥ < 1 for all X ∈ Ω if the conditions (6) are satisfied. It follows that the mapping F is a contraction on Ω [7]. By Banach's fixed-point theorem, a contraction mapping admits a unique fixed point [7]; hence F has a unique fixed point in Ω, which implies that system (1) has a unique fixed point, as stated in the following theorem.

Theorem 3. When (6) holds, system (1) has a unique fixed point in the defined domain, and this fixed point is located in the set Ω.
Remark 2. In the above derivation we used the 1-norm of DF and arrived at the sufficient conditions (6). Of course, one can use other norms and thereby obtain different sufficient conditions for the uniqueness of the fixed point.
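Banach iteration also gives a constructive way to locate the unique fixed point of a contraction. The sketch below uses a hypothetical two-dimensional contraction (Jacobian 1-norm bounded by 0.3 everywhere) rather than the actual map F of system (1):

```python
import numpy as np

def fixed_point(F, x0, tol=1e-10, max_iter=10000):
    """Banach iteration: for a contraction F, x_{k+1} = F(x_k) converges
    to the unique fixed point regardless of the starting point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_next = F(x)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence; F may not be a contraction")

# illustrative contraction on R^2: every Jacobian entry is at most 0.3
F = lambda x: np.array([0.3 * np.tanh(x[1]) + 0.1,
                        0.3 * np.tanh(x[0]) - 0.2])
```

Starting the iteration from very different points yields the same limit, which is the numerical counterpart of the uniqueness statement in Theorem 3.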
Next, going beyond a single fixed point, we investigate conditions for the existence of multiple fixed points. Owing to the involvement of many parameters and unspecified activation functions, it is difficult to characterize fully and analytically the parameter conditions for the existence of multiple fixed points. For simplicity, we restrict ourselves to sigmoid and hyperbolic tangent activation functions, i.e., ϕ(x) = 1/(1 + e^{−x}) and ψ(z) = (e^z − e^{−z})/(e^z + e^{−z}).
In (2), we substitute the third and fourth equations into the first two, arriving at the reduced set of equations (7). Now presume that the values of x_1, x_2, and z are large, so that ϕ(x_1) = 1/(1 + e^{−x_1}) = 1 − σ_1, ϕ(x_2) = 1/(1 + e^{−x_2}) = 1 − σ_2, and ψ(z) = (e^z − e^{−z})/(e^z + e^{−z}) = 1 − σ_3, where 0 < σ_1, σ_2, σ_3 < 1 are small. In doing so, (7) yields (8). To ensure that the solution given by (8) is a fixed point of system (1), the equalities (9) should also be satisfied. The question then becomes finding a collection of parameters a_1, ..., h such that (9) holds with σ_1, σ_2, σ_3 > 0 very small. We further simplify this problem by assuming that σ_1 = σ_2 = σ_3 =: σ. Substituting (8) then results in (10), which yields (11). The right-hand side of (11) is positive and monotonically increasing for σ ∈ (0, 1). Note that a_1, b_2, c_2, d_2, e, h are free parameters. When they are all positive, the left-hand side of (11) lies in a bounded interval and is monotonically decreasing for σ ∈ (0, 1), so the two sides must have exactly one intersection for σ ∈ (0, 1). In this case, there is a unique fixed point. On the other hand, the many free parameters give rise to other possibilities. Fixing some parameters to constant values, e.g., a_1 b_2 = 0.1, d_2 = e = 1, h = 1, one can calculate that when −1.5595 < c_2 < −1.1307, the two sides of (11) always have two intersections, as shown in Fig 2B, which results in two fixed points for the system. It can be expected that, when these restrictions on the parameters are relaxed, the system can more easily admit further fixed points.
Remark 3. We have shown above how the system can admit two fixed points, both analytically and numerically. The case of multiple fixed points is not rare because of the many parameters; an example with three fixed points can be obtained under some conditions, as in Fig 2 of the Main Text. A full treatment of cases with more fixed points is tedious and of marginal value, and is thus beyond the scope of this work.

E Singular perturbation analysis
By the change of time t′ = τt, we can rewrite the network motif dynamics without external inputs as (12), where ′ = d/dt′ denotes differentiation with respect to t′. Because τ is small, (1) (or, equivalently, (12)) defines a singular perturbation problem. Singular perturbation theory [8] has been developed over the past few decades to solve such problems. In the following, we analyze system (1) from the singular perturbation perspective.
By setting τ = 0 in (12), we obtain the singular limit of the system, given by (13). In this limit, the derivative of z is zero, so one can consider the z-variable as fixed at its initial condition, i.e., z(t) = z_0 ∈ R. This results in the following truncated system (14), which captures the dynamics of the neuronal layer, i.e., a coupled rate-based RNN with Hebbian learning of the synapses. Since z_0 is a constant in system (14), we treat it as a non-dynamic parameter that can take different values.
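The logic of the singular limit, freezing the slow variable and studying the fast flow, can be illustrated on a toy fast-slow pair (our own two-variable stand-in, not system (12)):

```python
import numpy as np

def simulate(tau, z0=0.5, dt=1e-3, T=5.0):
    """Toy fast-slow pair:  x' = -x + tanh(z),  z' = tau * (x - z).
    For small tau, x rapidly relaxes onto the value set by the
    quasi-frozen z, as in the singular limit where z(t) ~ z0."""
    x, z = 0.0, z0
    for _ in range(int(T / dt)):
        x += dt * (-x + np.tanh(z))
        z += dt * (tau * (x - z))
    return x, z
```

With tau near zero, x tracks tanh(z_0) while z barely moves over the fast horizon, which is precisely the regime in which the truncated system (14) is a valid description of the neuronal layer.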
As shown in Theorem 1, system (1) is bounded. Let z_min and z_max represent the minimum and maximum values that the variable z can take in Ω. Under the assumption that z(t) can span the whole admitted range in Ω, we have z_0 ∈ [z_min, z_max]. As a consequence, if the dynamics of the neuronal layer exhibit critical changes, such as changes in the number and/or stability of the fixed points, there exist bifurcations in (14) with respect to z_0, and these bifurcations are induced by the slowly varying astrocytic process itself.
In the following, we analyze the dynamics of the subsystem (14) in the spirit of the above idea.

Astrocytes regulate neural dynamics. We visualize how the fixed-point set of system (14) changes as a consequence of the perturbation induced by the constant astrocytic signal. Since there are four variables in Eqs. (2), the first step is to reduce the dimension; otherwise it is difficult to visualize the nullclines in 3D coordinates. By eliminating the variable w_2, we obtain Eqs. (15). Note that the activation function satisfies ψ_min ≤ ψ(z_0) ≤ ψ_max. Examining the change of the fixed points is therefore equivalent to studying the change of the intersections of (15a)-(15c) as ψ(z_0) varies in [ψ_min, ψ_max]. Recall that each equation of (15) defines a manifold in (x_1, x_2, w_1) ∈ Ω_motif ⊂ R^3. We can show these manifolds geometrically for given parameters: Fig 4 displays the situations for ψ(z_0) = ψ_min, ψ(z_0) = 0, and ψ(z_0) = ψ_max under a fixed parameter condition. In Fig 4, when ψ(z_0) ≈ −1 there is one fixed point (red dot); as ψ(z_0) increases, the position of this fixed point changes accordingly. When ψ(z_0) ≈ 1, two further fixed points exist. This means the number of fixed points increases at a certain value of ψ(z_0), and this is confirmed by the bifurcation diagram obtained with MatCont (see Fig 4D). It indicates that one branch of fixed points (red line) always exists. In contrast, the other branch of fixed points (blue line) exists only when ψ(z_0) ≥ 0.7818: a saddle-node bifurcation occurs at ψ(z_0) = 0.7818, where these two fixed points collide and annihilate each other. As the bifurcation happens in the limit τ → 0 of the original whole system, we call this process a pseudo-bifurcation resulting from the slow change of the astrocytic activity; the neural dynamics are thereby regulated by the astrocytic process in a top-down manner.
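A pseudo-bifurcation scan of this kind can be reproduced numerically on a one-dimensional stand-in for the frozen fast system. The scalar system below and its parameters are illustrative assumptions, not Eqs. (15); the point is only the mechanics of counting fixed points as the frozen astrocytic drive ψ(z_0) varies:

```python
import numpy as np

def count_fixed_points(psi0, w=3.0, b=-1.0, grid=np.linspace(-8, 8, 32001)):
    """Count zeros of f(x) = -x + w*tanh(x) + b*psi0 via sign changes on
    a dense grid.  The fixed-point count of the frozen fast system, as a
    function of the astrocytic parameter psi0, reveals where saddle-node
    transitions occur."""
    f = -grid + w * np.tanh(grid) + b * psi0
    return int(np.sum(np.abs(np.diff(np.sign(f))) > 0))
```

For this toy system the count drops from three fixed points to one as |psi0| grows past a critical value (near 1.3 for w = 3), the same saddle-node collision-and-annihilation scenario seen in Fig 4D.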

F Extended simulation results
In this section, we provide extra simulations that are complementary to the results in the main text.

Fig 5 shows the details of each method during the learning procedure. As shown in the plots, the regrets of every method are dense at the beginning because the agent needs to explore the environment. After enough time, the agent can take the optimal actions so that no further regret is incurred. It is observed that the neuron-astrocyte method no longer generates regret after about 2000 trials, while the other methods still incur regret in the remaining trials; the neuron-astrocyte method therefore takes the shortest time to converge. In the right plot, we can see that the neuron-astrocyte method attains a medium amount of asymptotic cumulative regret among all the methods, while the TS method has the lowest and the UCB method the highest.

F.2 Robustness analysis in diverse stationary bandits
We conduct a robustness analysis of all methods under various arm-probability conditions, specifically (µ_1, µ_2, µ_3) = (0.6 − λ, 0.6, 0.6 + λ) with 0 < λ < 0.4. The UCB method is excluded from this comparison because it is less competitive, with significantly larger regrets. As λ decreases, the arms' probabilities converge and the bandit becomes more challenging. To vary the difficulty of the bandit tasks, we change λ from 0.38 to 0.02 (see Fig 6). The neuron-astrocyte method performs similarly to the other RNN-based methods for larger λ values. However, our method tends to exhibit better and more robust performance in the more challenging bandits, in contrast to the others, which experience a decline in performance.
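The task family used in this robustness analysis is easy to reproduce. As a hedged sketch, the snippet below implements the (0.6 − λ, 0.6, 0.6 + λ) Bernoulli bandit with a plain ε-greedy baseline; this baseline is our own addition for illustration and is not one of the methods compared in Fig 6:

```python
import numpy as np

def run_epsilon_greedy(lam, trials=5000, eps=0.1, seed=0):
    """Play the stationary Bernoulli bandit with arm means
    (0.6 - lam, 0.6, 0.6 + lam) using epsilon-greedy, returning the
    cumulative regret relative to the best arm.  A simple baseline for
    probing task difficulty as lam shrinks."""
    rng = np.random.default_rng(seed)
    mu = np.array([0.6 - lam, 0.6, 0.6 + lam])
    counts, values = np.zeros(3), np.zeros(3)
    regret = 0.0
    for _ in range(trials):
        a = rng.integers(3) if rng.random() < eps else int(np.argmax(values))
        r = float(rng.random() < mu[a])          # Bernoulli reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean estimate
        regret += mu.max() - mu[a]
    return regret
```

Sweeping lam from 0.38 down to 0.02 with such a harness reproduces the difficulty gradient described above: the per-mistake regret shrinks with λ, but distinguishing the arms requires far more samples.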

F.4 Time-scale separation impact
The time-scale parameter has a mild impact on the learning performance in the stationary bandit. When τ = 0.1, the average cumulative regret is smallest. The algorithm also becomes more stable as τ decreases, in the sense that the standard deviation of the final regrets shrinks. In contrast, time-scale separation has an important influence on the learning performance in the non-stationary bandit. If there is no separation between the time scales, i.e., τ = 1, the cumulative regret keeps increasing and never reaches a final stationary value in any run, meaning the agent is unable to adapt to the changing environments. When τ = 0.1, the agent achieves stationary cumulative regret only occasionally over multiple runs; when τ ≤ 0.01, the agent always achieves a stationary asymptotic cumulative regret.

F.5 Flexible generalization to bandit tasks with different number of actions
In the previous demonstrations, we simulated bandit tasks with only 3 actions. Here, we show that the neuron-astrocyte networks and the designed learning algorithm can be easily generalized to other cases, using an illustrative example involving an 8-action Bernoulli bandit. In the stationary situation, the means of the actions are fixed at µ = (0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9). The non-stationary version is designed with the means changing abruptly from µ to 1 − µ every 5000 trials.
To accommodate the 8 actions of the bandit, the neuron-astrocyte learning algorithm is modified by simply expanding the dimension of the neuron-astrocyte module's outputs to 8, leaving all other settings unchanged. Simulations validate that our method can solve even these more challenging tasks.
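The dimension change described above can be sketched as a parameterized readout head. The softmax readout below is an illustrative stand-in; the actual output layer of the neuron-astrocyte module may differ:

```python
import numpy as np

def make_policy_head(n_hidden, n_actions, rng=np.random.default_rng(0)):
    """Readout mapping a hidden state to action probabilities via a
    softmax.  Generalizing from 3 to 8 arms only requires changing
    n_actions; the rest of the module is untouched."""
    W = rng.normal(scale=0.1, size=(n_actions, n_hidden))
    def policy(h):
        logits = W @ h
        p = np.exp(logits - logits.max())   # numerically stable softmax
        return p / p.sum()
    return policy
```

Because only the readout shape depends on the number of arms, the same construction accommodates any action count without touching the recurrent or astrocytic machinery.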

F.6 Impact of astrocyte-neuron ratio on learning performance
We now provide a more complete characterization of the effect of different ratios between astrocytes and neurons. Let R = #astrocytes / #neurons. As shown in Fig. 10, the ratio has an important influence on the learning performance in both the stationary and non-stationary bandit scenarios. In the stationary case, an increased ratio positively influences learning performance, leading to a reduction in asymptotic cumulative regret as more astrocytes are introduced. However, the dynamic nature of the flip-flop bandit introduces a subtler effect of the ratio: optimal cumulative regret occurs at intermediate ratios, whereas both excessively low and excessively high ratios detrimentally affect learning by increasing regret. This outcome suggests an interesting interpretation: in non-stationary environments, the network requires appropriate exploration of the environment. Too few astrocytes provide insufficient guidance for exploration, while an excess of astrocytes imposes overly strong regulation, limiting the flexibility of the neural network. The optimal ratio turns out to be around 7/10. In biology, the ratio of astrocytes to neurons is believed to be between 1:1.5 and 1:2, hence our modeling observation provides an interesting interpretation of the biology in this context.

Figure 10: Learning performance of neuron-astrocyte networks with astrocyte-to-neuron ratios ranging from 1/10 to 10/10. A is for the stationary bandit, while B is for the flip-flop non-stationary bandit.

F.7 Alternative model via potassium dynamics
Regarding slow dynamics potentially arising from passive potassium efflux, we have considered a null model of sorts, called the 'potassium only' model, with slow passive potassium dynamics [9] described as:

ẋ_i = −a_i x_i + Σ_{j=1}^n w_ij ϕ(x_j) − λ_i y_i + u_i,
τ_p ẏ_i = Σ_{j=1}^n p_j ϕ(x_j),

where y_i denotes the potassium activity. This model enacts a slow time scale, but without the nested feedback-loop structure that we believe is key to neuron-astrocyte interaction. We implemented the potassium-only model within the learning algorithm. After the same training procedure as in our primary results, we find that this null model can learn only the stationary (but not the non-stationary) version of the task (see Fig. 11). This lends credence to the idea that the unique interaction of astrocytes with neurons is important for the hypothesized functional benefits.
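A minimal numerical sketch of the potassium-only null model follows the two equations above; the coefficients, τ_p, and the random parameter draws are illustrative assumptions, not the trained values:

```python
import numpy as np

def simulate_potassium_only(T=2.0, dt=1e-3, n=3, seed=0):
    """Euler integration of the 'potassium only' null model:
        x_i' = -a_i x_i + sum_j w_ij phi(x_j) - lam_i y_i + u_i,
        tau_p y_i' = sum_j p_j phi(x_j).
    y is a slow, passively accumulating variable: there is no feedback
    loop from y back through a plasticity pathway."""
    rng = np.random.default_rng(seed)
    phi = lambda v: 1.0 / (1.0 + np.exp(-v))
    a, lam, u = np.ones(n), 0.1 * np.ones(n), rng.normal(size=n)
    W = 0.5 * rng.normal(size=(n, n))
    p = 0.05 * np.ones(n)
    tau_p = 10.0                          # slow potassium time scale
    x, y = np.zeros(n), np.zeros(n)
    for _ in range(int(T / dt)):
        dx = -a * x + W @ phi(x) - lam * y + u
        dy = np.full(n, p @ phi(x)) / tau_p   # shared passive drive
        x, y = x + dt * dx, y + dt * dy
    return x, y
```

Note that, as written, the potassium drive Σ_j p_j ϕ(x_j) is the same for every unit and only decays indirectly through its effect on x, which is what makes this a slow time scale without the nested feedback structure of the astrocytic model.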

Figure 1: The signal flow in the tripartite synapse structure.

Figure 2: Intersections of the two sides of (11), where the blue line is the right-hand side and the brown line is the left-hand side. A: one intersection when c_2 = −1.1; B: two intersections when c_2 = −1.3; C: one intersection when c_2 = −1.5595.



Figure 5: Regrets per trial and cumulative regrets of each method. A shows the regrets per trial for the different methods, while B shows the cumulative regrets over trials.

Figure 7: Learning performance in non-stationary Bernoulli bandits. For the flip-flop (panel A) and smooth-changing (panel B) cases, the cumulative regrets of the different methods (neuron-astrocyte, LSTM, vRNN, GRU, DUCB, SWUCB) are shown for a single simulation and as the average of 10 runs. The hyperparameters of DUCB and SWUCB have been carefully tuned to optimize their performance.

Figure 11: Cumulative regrets of an agent employing a neural network with passive potassium dynamics, where A is for the stationary bandit and B is for the flip-flop non-stationary bandit.

F.8

Figure 12: Additional projections of astrocyte and synaptic activity for both contexts (indicated as a and b) in the early, middle, and late phases of learning, highlighting the formation of distinct synaptic weight trajectories.

Fig 7 gives a comprehensive comparison of the different methods in the non-stationary bandits. The neuron-astrocyte method consistently attains the lowest, near-stationary asymptotic cumulative regret, in both single and multiple runs. In contrast, the other methods struggle to adapt to the evolving environments, resulting in steadily increasing regrets. This result is corroborated by experiments with both the flip-flop and smooth-changing non-stationary bandits.