A linear discriminant analysis model of imbalanced associative learning in the mushroom body compartment

To adapt to their environments, animals learn associations between sensory stimuli and unconditioned stimuli. In invertebrates, olfactory associative learning primarily occurs in the mushroom body, which is segregated into separate compartments. Within each compartment, Kenyon cells (KCs) encoding sparse odor representations project onto mushroom body output neurons (MBONs) whose outputs guide behavior. Associated with each compartment is a dopamine neuron (DAN) that modulates plasticity of the KC-MBON synapses within the compartment. Interestingly, DAN-induced plasticity of the KC-MBON synapse is imbalanced in the sense that it only weakens the synapse and is temporally sparse. We propose a normative mechanistic model of the MBON as a linear discriminant analysis (LDA) classifier that predicts the presence of an unconditioned stimulus (class identity) given a KC odor representation (feature vector). Starting from a principled LDA objective function and under the assumption of temporally sparse DAN activity, we derive an online algorithm which maps onto the mushroom body compartment. Our model accounts for the imbalanced learning at the KC-MBON synapse and makes testable predictions that provide clear contrasts with existing models.


Author summary
To adapt to their environments, animals learn associations between sensory stimuli (e.g., odors) and unconditioned stimuli (e.g., sugar or heat). In flies and other insects, olfactory associative learning primarily occurs in a brain region called the mushroom body, which is partitioned into multiple compartments. Within a compartment, neurons that represent odors synapse onto neurons that guide behavior. The strength of these synapses is modulated by a dopamine neuron that responds to one type of unconditioned stimulus (e.g., sugar), which implicates these synapses as a biological substrate for associative learning in insects. Modification of these synapses is imbalanced in the sense that dopamine-induced modifications only weaken the synapses and are temporally sparse. In this work, we propose a simple mechanistic model of learning in the mushroom body that accounts for this imbalance.

Introduction
Behavioral responses of animals are shaped in part by learned associations between sensory stimuli (e.g., odors) and unconditioned stimuli (e.g., sugar, heat or electric shock). A challenge in neuroscience is to understand the neural mechanisms that underlie associative learning. In invertebrates, the mushroom body is a well-studied brain region that plays a central role in olfactory associative learning [1][2][3][4]. The goal of this work is to propose a normative, mechanistic model of associative learning in the mushroom body that accounts for experimental observations and provides clear contrasts with existing models. The mushroom body is segregated into functionally independent compartments [5], Fig 1. Within each compartment, Kenyon cells (KCs), which encode sparse odor representations [6], form synapses with the dendrites of mushroom body output neurons (MBONs), whose outputs guide learned behavior [7]. Associated with each compartment is a single dopamine neuron (DAN) that responds to an unconditioned stimulus [8,9] and projects its axon into the mushroom body compartment, where it innervates the KC-MBON synapses to modulate plasticity, implicating the KC-MBON synapse as the synaptic substrate for associative learning in invertebrates.
Experimental evidence suggests that learning at the KC-MBON synapse is imbalanced in the sense that DAN-induced plasticity is one-sided and temporally sparse. In particular, coactivation of a KC and the DAN weakens the KC-MBON synapse (see Fig 1, right) and DAN-induced plasticity is independent of the MBON activity [5]. This suggests that DAN-induced plasticity is one-sided and that another mechanism, such as homeostatic plasticity, is responsible for strengthening the KC-MBON synapse. Furthermore, since each DAN responds to one type of unconditioned stimulus [2], which constitutes only a small fraction of all stimuli, the DAN activity is temporally sparse.
In this work, we propose a normative, mechanistic model of associative learning in the mushroom body that accounts for the imbalanced learning. We model each MBON as a linear discriminant analysis (LDA) classifier, which predicts whether an associated unconditioned stimulus is present (the class label) given a KC odor representation (the feature vector). Under this interpretation, the KC-MBON synapses and an MBON bias term define a hyperplane in the high-dimensional space of KC odor representations that separates odor representations associated with the unconditioned stimulus from all other odor representations, Fig 2. Here, 'normative' refers to the fact that our mechanistic model is interpretable as an algorithm for optimizing an LDA objective. The normative approach is top-down in the sense that the circuit objective is proposed first and then an optimization algorithm is derived and compared with known physiology. There are several advantages to this approach. First, it directly relates the circuit objective to its mechanism; for example, neural activities and synaptic weight updates are interpretable as steps in an algorithm for solving a relevant circuit objective. Second, the approach distills which aspects of the physiology are essential for optimizing the circuit objective and which aspects are not captured by the objective. Third, normative models are often analytically tractable, which allows them to be analyzed for any input statistics without resorting to exhaustive numerical simulation.
To derive our algorithm, we start with a convex objective for LDA (in terms of the KC-MBON synaptic weights). The objective can be optimized in the offline setting by taking gradient descent steps with respect to the KC-MBON synaptic weights. To obtain an online algorithm that accounts for the imbalanced learning, we take advantage of the fact that DAN activity is temporally sparse to obtain online approximations of the input statistics. Finally, we show numerically that our algorithm performs well even when DAN activity is not temporally sparse.
Our model makes testable predictions that are a direct result of the learning imbalance. First, our model predicts that DAN-induced plasticity at the KC-MBON synapse is sensitive to the time elapsed since the DAN was last active. Second, our model predicts that if the DAN is never active, then the KC-MBON synapses adapt to align with the mean KC activity (normalized by the covariance of the KC activity).

LDA model of the mushroom body compartment
We consider a simplified mushroom body compartment that consists of n KC axons, the axon terminals from one DAN and the dendrites of one MBON, Fig 1. At each time t = 1, 2, . . ., the vector x_t ∈ ℝⁿ encodes the KC activities and the scalar y_t ∈ {0, 1} indicates whether the DAN is active (y_t = 1) or inactive (y_t = 0). If the DAN is active, we refer to x_t as a conditioned odor response, whereas if the DAN is inactive, we refer to x_t as a neutral odor response. We assume the DAN activity is temporally sparse, which can be expressed mathematically as π₁ ≪ 1, where π₁ ≔ ⟨y_t⟩_t is the fraction of time that the DAN is active.
In our model, the MBON is a linear classifier that predicts the DAN activity y_t (class label) given the KC activities x_t (feature vector). Let w ∈ ℝⁿ be a synaptic weight vector whose i-th component represents the strength of the synapse between the i-th KC and the MBON. At each time t, the KC activities x_t are multiplied by the synaptic weight vector w to generate the total input to the MBON, denoted c_t ≔ w · x_t. The output (firing rate) of the MBON is given by

z_t = max(c_t − b, 0),

where b represents the 'bias' of the MBON; that is, the threshold below which the MBON does not fire. Under this interpretation, the KC-MBON synapses w and MBON bias b define a hyperplane H ≔ {x ∈ ℝⁿ : w · x = b} in the n-dimensional space of KC activities that separates conditioned odor responses from neutral odor responses, Fig 2. In this case, z_t > 0 (resp. z_t = 0) corresponds to the prediction y_t = 0 (resp. y_t = 1). In other words, the MBON is a linear classifier that is active when predicting there is no unconditioned stimulus and inactive when predicting there is an unconditioned stimulus, which is consistent with experimental observations [2].
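As a minimal sketch of this readout and its implicit classification rule (the weights and bias below are arbitrary placeholders, not values from the paper):

```python
import numpy as np

def mbon_output(w, b, x):
    """Rectified linear MBON readout: z = max(c - b, 0), where c = w . x
    is the total KC input and b is the firing threshold (bias)."""
    c = np.dot(w, x)
    return max(c - b, 0.0)

def predict_us(w, b, x):
    """Implicit classification: a silent MBON (z = 0) predicts that the
    unconditioned stimulus is present (y = 1); an active MBON predicts y = 0."""
    return 0 if mbon_output(w, b, x) > 0 else 1
```

A KC pattern on the far side of the hyperplane from the neutral odors silences the MBON, which is read out as a prediction that the unconditioned stimulus is present.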
We derive learning rules for the KC-MBON synaptic weights w (and bias b) that solve an LDA objective and are consistent with experimental observations [2,5]. LDA is a popular linear classification method that is optimal under the assumption that the neutral odor responses and conditioned odor responses are Gaussian with a common covariance matrix, but it works well in practice even when these assumptions do not hold [10].
Our starting point is the convex LDA objective

min_w L(w), where L(w) ≔ ½ wᵀΣw − w · (μ₀ − μ₁),     (1)

and where μ₀ and μ₁ denote the means of the neutral odor responses and conditioned odor responses, respectively, and Σ denotes the covariance of the neutral odor responses. In the offline setting, we can minimize L(w) by taking gradient descent steps with respect to w:

w ← w + η(μ₀ − μ₁ − Σw),     (2)

where η > 0 is the step size. However, computing the means μ₀, μ₁ and covariance Σ requires the MBON to have access to the entire sequence of inputs, which is an unrealistic assumption. To derive our online algorithm, we replace the averages μ₀, μ₁ and Σ in Eq 2 with online estimates. When the DAN is inactive (y_t = 0), we update the KC-MBON weights w according to the homeostatic plasticity rule

w ← w + η(μ_{0,t} − (c_t − z_t)(x_t − μ_{0,t})),     (3)

where μ_{0,t} denotes the running estimate of the mean neutral odor response and z_t denotes the running estimate of the mean total MBON input c_t conditioned on the DAN being inactive.
Here, μ_{0,t} and (c_t − z_t)(x_t − μ_{0,t}) are online estimates of μ₀ and Σw, respectively (see Methods section). The running means μ_{0,t} and z_t can be represented by biophysical quantities such as calcium concentrations at the pre- and postsynaptic terminals of the KC-MBON synapses. When the DAN is active (y_t = 1), we update the KC-MBON weights w according to the following DAN-induced plasticity rule

w ← w − η ℓ_{t−1} x_t,     (4)

where ℓ_{t−1} denotes the time elapsed since the last time the DAN was active (see Methods section). Algorithm 1 has only one hyperparameter, the learning rate η > 0, which corresponds to the timescale for learning in the mushroom body compartment. Hige et al. [5] showed that mushroom body compartments have distinct timescales for learning, which can be modeled by choosing different learning rates η > 0.
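Putting the homeostatic rule (Eq 3) and the DAN-induced rule (Eq 4) together, the online algorithm can be sketched in a few lines of Python. This is a simplified illustration under our reading of the updates; it omits the bias update described in the Methods section, and all variable names are ours:

```python
import numpy as np

def run_online_lda(X, y, eta=0.01):
    """Sketch of Algorithm 1. X has one KC activity vector x_t per row;
    y[t] = 1 when the DAN is active. Returns the learned weights w."""
    n = X.shape[1]
    w = np.zeros(n)
    mu0 = np.zeros(n)   # running mean of neutral odor responses, mu_{0,t}
    zbar = 0.0          # running mean of the total input c_t when the DAN is silent
    ell = 1             # time elapsed since the DAN was last active, l_{t-1}
    for t, (x, yt) in enumerate(zip(X, y), start=1):
        c = w @ x
        if yt == 0:
            # homeostatic rule (Eq 3), using the updated running estimates
            mu0 += (x - mu0) / t
            zbar += (c - zbar) / t
            w += eta * (mu0 - (c - zbar) * (x - mu0))
            ell += 1
        else:
            # one-sided DAN-induced depression (Eq 4), scaled by elapsed time
            w -= eta * ell * x
            ell = 1
    return w
```

On a toy stream in which neutral responses cluster around μ₀ and conditioned responses around μ₁, the weights settle near the direction Σ⁻¹(μ₀ − μ₁), as the offline analysis predicts.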

Numerical experiments
Next, we test Algorithm 1 on synthetic and real datasets. We test our algorithm on inputs for which our assumption π₁ ≪ 1 holds, but also on inputs for which π₁ ≈ 0.5. To evaluate our algorithm, we measure the running accuracy of the projections z_t over the previous min(100, t) iterations, where the algorithm is accurate at the t-th iterate if z_t = 0 and y_t = 1, or if z_t > 0 and y_t = 0.
Synthetic dataset. We begin by evaluating Algorithm 1 on a synthetic dataset generated by a mixture of 2 overlapping Gaussian distributions, so that the optimal accuracy is less than 1. The data points of the 2 classes are drawn from 2-dimensional Gaussian distributions with different means and a common covariance. We simulate datasets of 10⁵ data points, using the same means and covariance in each dataset but varying the frequency of class 1 samples. We consider the cases π₁ = 0.1, 0.2, 0.3, 0.4, 0.5. In Fig 3 (left) we plot the error and the accuracy of our model for varying π₁. Remarkably, while the derivation of Algorithm 1 relied on the fact that π₁ ≪ 1, the algorithm still performs well even when π₁ = 0.5.
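A hypothetical reconstruction of this protocol might look as follows (the means and covariance below are placeholders; the paper's exact values are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(T, pi1, mu0, mu1, cov):
    """Draw a stream of T samples: y_t ~ Bernoulli(pi1), then x_t from the
    class-conditional Gaussian with common covariance."""
    y = (rng.random(T) < pi1).astype(int)
    L = np.linalg.cholesky(cov)
    means = np.where(y[:, None] == 1, mu1, mu0)
    X = means + rng.standard_normal((T, len(mu0))) @ L.T
    return X, y

def running_accuracy(z, y, window=100):
    """Accuracy over the previous min(window, t) iterates, where z_t = 0
    predicts y_t = 1 and z_t > 0 predicts y_t = 0."""
    correct = ((z > 0) & (y == 0)) | ((z == 0) & (y == 1))
    return np.array([correct[max(0, t - window + 1):t + 1].mean()
                     for t in range(len(y))])
```

Because the two Gaussians overlap, even the optimal separating hyperplane leaves some mass on the wrong side, so the running accuracy saturates below 1.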
KC activities dataset. We test our model on KC activities reported in [11]. Campbell et al. recorded odor-evoked KC responses in the fly mushroom body. The dataset we tested on contains the responses of 124 KCs in a single fly to the presentation of 7 odors, see [11, Figure 1]. To ensure the KC responses are well conditioned, we add Gaussian noise with covariance εI₁₂₄, where ε = 0.01. We apply Algorithm 1 to the KC dataset. We first consider the case that odor 1 denotes the class 1 odor and odors 2-7 denote the class 0 odors, so π₁ = 1/7. We then consider the cases that odors 1-2 (resp. odors 1-3) denote the class 1 odors and the remaining odors denote the class 0 odors, so π₁ = 2/7 (resp. π₁ = 3/7). In Fig 3 (right) we plot the error and accuracy of our model for varying π₁. Impressively, the algorithm performs well (approximately 85% accuracy) even when the assumption π₁ ≪ 1 is violated.
Competing MBONs. Using the KC activities dataset, we model 2 MBONs with competing valences by running 2 instances of Algorithm 1 in parallel with different class assignments for the odors. We consider the case that odor 1 is aversive, odor 7 is attractive and the remaining odors are neutral. For MBON 1 (resp. MBON 2), we assume that odor 1 (resp. odor 7) denotes the class 1 odor and odors 2-7 (resp. odors 1-6) denote the class 0 odors, so that MBON 1 (resp. MBON 2) activity promotes approach (resp. avoidance) behavior. Let z_{i,t} denote the output of MBON i ∈ {1, 2}. At each iterate t, if odor 1 (resp. odor 7, odors 2-6) is presented, then the model is accurate if z_{1,t} = 0 and z_{2,t} > 0 (resp. z_{1,t} > 0 and z_{2,t} = 0, resp. z_{1,t} > 0 and z_{2,t} > 0), and inaccurate otherwise. We then repeat the experiment two more times, but with odor 2 (resp. odor 3) labeled as aversive and odor 6 (resp. odor 5) as attractive. In Fig 4, we plot the performance of the competing MBONs.

Summary
In this work, we proposed a normative model of the mushroom body compartment that accounts for imbalanced learning at the KC-MBON synapse. Testing our model on synthetic and real datasets shows that it performs well under a variety of conditions. In our model, DAN-induced plasticity at the KC-MBON synapse does not depend on the MBON activity, but rather on the time elapsed since the last time the DAN was active. This aspect of our model suggests testable predictions that provide clear contrasts with existing models of associative learning in the mushroom body.

Model predictions
Prediction 1: In the absence of DAN activity, the KC-MBON synapses will align with the mean KC activity normalized by the covariance of the KC activities. When presented with neutral odors, the synapses adapt according to the homeostatic update in Eq 3. Since this update is equal to η(μ₀ − Σw) on average (see Methods section), the KC-MBON synaptic weights equilibrate at w = Σ⁻¹μ₀. Experimentally, this prediction could be tested by first presenting a fly with neutral odors and simultaneously recording from multiple KCs and an MBON. The weights can be estimated from the neural activities (using, e.g., [12]) and compared with our prediction w = Σ⁻¹μ₀.
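This equilibrium is easy to verify numerically. The sketch below uses the exact statistics μ₀ and Σ in place of their running estimates (the particular values of μ₀ and Σ are arbitrary examples):

```python
import numpy as np

mu0 = np.array([1.0, -0.5])                # example mean KC activity
cov = np.array([[1.0, 0.3], [0.3, 0.5]])   # example KC covariance
w = np.zeros(2)
eta = 0.05
for _ in range(20000):
    # average homeostatic update from Eq 3: eta * (mu0 - cov @ w)
    w += eta * (mu0 - cov @ w)
target = np.linalg.solve(cov, mu0)         # predicted equilibrium, w = cov^{-1} mu0
```

After enough iterations, `w` matches `target` to numerical precision.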
Prediction 2: DAN-induced plasticity is proportional to the time elapsed since the DAN was last active. This follows directly from the update in Eq 4, in which the magnitude of the weight change scales with ℓ_{t−1}. Experimentally, this prediction could be tested by presenting a fly with conditioned odors separated by different time intervals and estimating the resulting changes in the synaptic weights.

Relation to existing models
There are a number of existing computational models of associative learning in the mushroom body [13][14][15][16][17], many of which are faithful to biophysical details and successfully capture important computational principles underlying associative learning in the mushroom body (see, e.g., [15]). Through extensive numerical simulations, these computational models can explain a number of phenomena. For example, Huerta and Nowotny [15] show that the organization of the mushroom body supports fast and robust associative learning, Bazhenov et al. [16] show that interactions between unsupervised and supervised forms of learning can explain how the timescale of associative learning depends on experimental conditions, and Peng and Chittka [17] show how complex forms of learning (e.g., peak shift) depend on different mechanistic aspects of learning in the mushroom body. In this work, we propose a top-down normative model of learning at the KC-MBON synapse, which contrasts with the bottom-up approach of these works, which build models closely tied to physiological evidence. In this way, our model is interpretable as an algorithm for optimizing a circuit objective, and its output can be predicted analytically for any environmental condition without needing to resort to numerical simulation. In addition, our normative model makes testable predictions that are in clear contrast with these models, providing a method for validating or invalidating our model.
In addition to these models, Bennett et al. [18] propose a reinforcement learning model in which the KC-MBON synapses are modified to minimize reinforcement prediction errors. They first consider a model in which the reinforcement signal is computed as the difference between DAN activities, so their plasticity rule requires 2 DANs to innervate a single mushroom body compartment, in contrast to experimental evidence showing that most compartments only receive inputs from a single DAN [9]. To account for this experimental observation, they propose a heuristic modification that adds a constant source of synaptic potentiation, which can be viewed as a form of homeostatic plasticity and is in line with experimental evidence. However, the modification is not normative and can fail to minimize prediction errors.
A significant difference between our model and these existing models is that DAN-induced plasticity depends on ℓ_{t−1}, the time elapsed since the DAN was last active. In our model, the variable ℓ_{t−1} is critical for balancing homeostatic plasticity and DAN-induced plasticity. In S1 Appendix, we consider a modification of our algorithm in which ℓ_{t−1} is replaced by a fixed constant ℓ̄.

Comparison of LDA to other linear classification methods
LDA is a linear classifier that is optimal under strict assumptions on the inputs, so it is worth considering other linear classification methods such as logistic regression and support vector machines (SVMs). Logistic regression is a classical method for estimating the probability of one class versus another. In terms of performance, there is evidence that there is not a substantial difference between logistic regression and LDA even when the assumptions for LDA are not met [19]. As a model of the insect mushroom body, however, we are unaware of an online algorithm for logistic regression that maps onto the mushroom body compartment and matches the experimental observations in [2,5].
SVMs are flexible linear classifiers that do not make assumptions about the underlying data distribution. Huerta et al. [13,15] proposed models of the mushroom body that are closely related to SVMs [20,21]; however, the DAN-induced synaptic update rules depend on the MBON activity, which is in contrast to recent experimental evidence [5].

Limitations
Our model is a dramatic simplification of the mushroom body, focused on providing a normative account of learning at the KC-MBON synapse that can explain how the balance between DAN-induced plasticity and homeostatic plasticity is optimally maintained. Consequently, our model does not account for a number of physiological details. For example, in order to implement an LDA algorithm, we do not sign-constrain the synaptic weight vector w, which violates Dale's law. In addition, we assume that the DAN activity is binary. In reality, the DAN may fire at different rates depending on the strength of the unconditioned stimulus, and the firing rate may affect the DAN-induced plasticity. We can modify our model to allow y_t to be any nonnegative scalar and replace the update in Eq 4 with Δw = −η ℓ_{t−1} y_t x_t. However, in this case the algorithm is not derived from an objective function for LDA, so its output is more challenging to interpret. Beyond such simplifications, there are other features, such as recently discovered feedback connections in the mushroom body that are relevant for associative learning [7,22], which are also not captured by our model.
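The graded-DAN modification can be sketched as follows (a hypothetical extension in which y_t is any nonnegative firing rate; the function name is ours):

```python
import numpy as np

def dan_update(w, x, ell, y, eta=0.05):
    """Generalized DAN-induced depression: dw = -eta * l_{t-1} * y_t * x_t.
    With binary y_t in {0, 1} this reduces to the rule in Eq 4."""
    return w - eta * ell * y * x
```

A stronger unconditioned stimulus (larger y_t) then produces proportionally stronger depression of the active KC-MBON synapses.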

Linear discriminant analysis
LDA is a statistical method for linear classification [23, section 4.3], which makes the following simplifying assumption: the conditional probability distributions p(x|y = 0) and p(x|y = 1) are both Gaussian with common full-rank n × n covariance matrix Σ; that is,

p(x|y = 0) ~ 𝒩(μ₀, Σ),  p(x|y = 1) ~ 𝒩(μ₁, Σ),

where μ₀ and μ₁ denote the means of the class 0 and class 1 feature vectors. In this case, the optimal decision criterion assigns class 0 (resp. class 1) to a feature vector x when w_opt · x > b_opt (resp. w_opt · x < b_opt), where

w_opt ≔ Σ⁻¹(μ₀ − μ₁),  b_opt ≔ ½(μ₀ + μ₁) · w_opt + log(π₁/π₀),

and π_i denotes the probability that a sample belongs to class i, for i = 0, 1. In particular, the hyperplane H = {x ∈ ℝⁿ : w_opt · x = b_opt} defines the optimal separation boundary for predicting whether a feature vector belongs to class 0 or class 1. While LDA assumes a specific generative model, it performs well in practice even when the assumptions do not hold [10]. The optimal weights w_opt can be expressed as the solution of the convex minimization problem in Eq 1, which we can solve by taking the gradient descent steps in Eq 2. Formally, taking the step size η to zero in Eq 2 yields the linear gradient flow

ẇ(t) = μ₀ − μ₁ − Σw(t),

whose solution is given by

w(t) = e^{−Σt} w(0) + (I_n − e^{−Σt}) Σ⁻¹(μ₀ − μ₁).

In particular, we see that the solution w(t) converges exponentially to the optimal solution w_opt.
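The convergence of the gradient dynamics to w_opt can be checked with a short numerical sketch (the statistics below are arbitrary examples):

```python
import numpy as np

mu0 = np.array([2.0, 0.0])
mu1 = np.array([-1.0, 1.0])
cov = np.array([[1.0, 0.2], [0.2, 2.0]])
w = np.zeros(2)
eta = 0.1
for _ in range(2000):
    w += eta * (mu0 - mu1 - cov @ w)        # gradient step on the LDA objective (Eq 2)
w_opt = np.linalg.solve(cov, mu0 - mu1)     # closed-form minimizer
```

Because the step size satisfies η < 2/λ_max(Σ), the iteration contracts toward `w_opt` at a rate set by the eigenvalues of Σ, mirroring the gradient flow solution above.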

An online algorithm for imbalanced learning
In the online setting, the class means μ₀, μ₁ and the covariance Σ are not available. Instead, at each time t the algorithm has access to the feature vector x_t and class label y_t. To derive our online algorithm, we make online approximations of the offline quantities μ₀, μ₁ and Σ that are based on the fact that the unconditioned stimuli are sparse in time, i.e., π₁ ≪ 1, where we recall that π₁ denotes the proportion of conditioned odors. First, we note that we can rewrite the sample class means as

μ₀ = (1/π₀) ⟨(1 − y_t) x_t⟩_t,  μ₁ = (1/π₁) ⟨y_t x_t⟩_t,

where π₀ ≔ ⟨1 − y_t⟩_t ≈ 1 is the fraction of odors that are neutral, and the sample covariance as

Σ = (1/π₀) ⟨(1 − y_t)(x_t − μ₀)(x_t − μ₀)ᵀ⟩_t.

Estimating the mean response to a neutral odor and the covariance. Since π₀ ≈ 1, we approximate

μ₀ ≈ ⟨(1 − y_t) x_t⟩_t,     (5)

Σ ≈ ⟨(1 − y_t)(x_t − μ₀)(x_t − μ₀)ᵀ⟩_t.     (6)

Therefore, in the online setting, we can keep running estimates of μ₀ and z ≔ w · μ₀ ≈ ⟨(1 − y_t) c_t⟩_t, where we recall that c_t = w · x_t, by performing the updates

μ_{0,t} ← μ_{0,t−1} + (1/t)(1 − y_t)(x_t − μ_{0,t−1}),  z_t ← z_{t−1} + (1/t)(1 − y_t)(c_t − z_{t−1}).     (7)

In view of Eq 6 and the definitions of c_t and z, we can replace Σw with the online approximation (c_t − z_t)(x_t − μ_{0,t}). We replace the first and third terms in the offline update in Eq 2, η(μ₀ − Σw), with the online approximation η(μ_{0,t} − (c_t − z_t)(x_t − μ_{0,t})), which is the homeostatic update in Eq 3.

Estimating the mean response to a conditioned odor. To obtain an online approximation of μ₁, we first note that 1/π₁ is approximately equal to the average time elapsed between class 1 samples. To see this, let t₁, t₂, . . . denote the subset of times such that y_t = 1. Then, letting t₀ = 0, we have

1/π₁ ≈ t_j / j = (1/j) Σ_{i=1}^{j} (t_i − t_{i−1}).

Thus, in the online setting, when the j-th class 1 sample is presented (i.e., y_t = 1), we can use the time elapsed since the last class 1 sample, t_j − t_{j−1}, as an online estimate of 1/π₁. Setting ℓ₀ = 1 and

ℓ_t ≔ 1 if y_t = 1, and ℓ_t ≔ ℓ_{t−1} + 1 if y_t = 0,

we see that at any time t such that y_t = 1, ℓ_{t−1} denotes the time elapsed since the last class 1 sample, so ⟨ℓ_{t−1} | y_t = 1⟩_t = 1/π₁.
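The running estimates in Eq 7 can be checked against the batch statistics they approximate; the sketch below uses an arbitrary synthetic stream whose true neutral mean is μ₀ = (1, 1, 1):

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 50000, 3
w = rng.standard_normal(n)                 # fixed weights for this check
y = (rng.random(T) < 0.05).astype(int)     # temporally sparse DAN activity
X = rng.standard_normal((T, n)) + 1.0      # KC activities with mean (1, 1, 1)
mu0_t, z_t = np.zeros(n), 0.0
for t in range(1, T + 1):
    x, yt = X[t - 1], y[t - 1]
    c = w @ x
    mu0_t += (1 - yt) * (x - mu0_t) / t    # Eq 7, first update
    z_t += (1 - yt) * (c - z_t) / t        # Eq 7, second update
```

After the stream, `mu0_t` tracks μ₀ and `z_t` tracks w · μ₀, as the approximations (5) and (6) require.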
Assuming that the variables ℓ_{t−1} and x_t are independent given y_t = 1, i.e., that the KC representation x_t of a conditioned odor is independent of the time elapsed since the last conditioned odor, ℓ_{t−1}, we see that μ₁ = π₁ ⟨ℓ_{t−1} | y_t = 1⟩_t ⟨x_t | y_t = 1⟩_t = ⟨y_t ℓ_{t−1} x_t⟩_t. We therefore replace the second term in the offline update in Eq 2, −ημ₁, with the online approximation −η y_t ℓ_{t−1} x_t.
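The identity μ₁ = ⟨y_t ℓ_{t−1} x_t⟩_t can be checked numerically (a sketch with arbitrary example statistics):

```python
import numpy as np

rng = np.random.default_rng(3)
T, pi1 = 200000, 0.05
mu1 = np.array([2.0, -1.0])
y = (rng.random(T) < pi1).astype(int)
X = rng.standard_normal((T, 2))            # neutral responses have mean 0
X[y == 1] += mu1                           # conditioned responses have mean mu1
ell, acc = 1, np.zeros(2)
for t in range(T):
    if y[t] == 1:
        acc += ell * X[t]                  # accumulate y_t * l_{t-1} * x_t
        ell = 1
    else:
        ell += 1
estimate = acc / T                         # time average, approximately mu1
```

The scaling by ℓ_{t−1} exactly compensates for the rarity of DAN-active samples, so the time average recovers μ₁ without knowing π₁.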
Estimating the bias. To estimate the bias b, we note that because π₀ ≈ 1 and ⟨ℓ_{t−1} | y_t = 1⟩_t = 1/π₁,

b = ½(μ₀ + μ₁) · w + log(π₁/π₀) ≈ ½(μ₀ + μ₁) · w − log⟨ℓ_{t−1} | y_t = 1⟩_t ≤ ½(μ₀ + μ₁) · w − ⟨log ℓ_{t−1} | y_t = 1⟩_t,

where the final inequality follows from the fact that log is concave and Jensen's inequality (with equality holding when the variance of ℓ_{t−1} given y_t = 1 is zero). Thus, assuming the variance of the time elapsed between conditioned odors is small, we can estimate the bias b in the online setting by updating, at the j-th time t for which y_t = 1,

b_t ← b_{t−1} + (1/j)(½(μ_{0,t} + x_t) · w − log ℓ_{t−1} − b_{t−1}).

Substituting these approximations into the offline update rules in Eq 2 yields our online algorithm (Algorithm 1).
In view of Jensen's inequality, if the variance of the time elapsed between conditioned odors is large, then the bias b will be overestimated, meaning that the MBON will be less active than optimal. In other words, irregular intervals between DAN activations bias the MBON to be less active (i.e., to predict more often that the unconditioned stimulus is present).
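The Jensen gap behind this overestimate can be illustrated with a toy example: two interval sequences with the same mean but different variances (the values are chosen arbitrarily):

```python
import numpy as np

regular = np.array([20.0, 20.0, 20.0, 20.0])   # zero-variance intervals between DAN activations
irregular = np.array([2.0, 2.0, 2.0, 74.0])    # same mean (20), large variance
gaps = [np.log(iv.mean()) - np.log(iv).mean() for iv in (regular, irregular)]
# gaps[0] is zero; gaps[1] is strictly positive. Since the log term enters the
# bias b with a negative sign, averaging log l over irregular intervals
# overestimates b.
```

The gap log⟨ℓ⟩ − ⟨log ℓ⟩ vanishes for regular intervals and grows with their variance, which is exactly the regime in which the online bias estimate drifts upward.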
Supporting information

S1 Appendix. Comparison with a modified algorithm. We consider a modification of Algorithm 1 in which the DAN-induced plasticity of the KC-MBON synapses does not depend on the time elapsed since the last time the DAN was active. (PDF)