
Conceived and designed the experiments: YL. Performed the experiments: YL. Analyzed the data: YL. Wrote the paper: YL.

The authors have declared that no competing interests exist.

It is widely believed that learning is due, at least in part, to long-lasting modifications of the strengths of synapses in the brain. Theoretical studies have shown that a family of synaptic plasticity rules, in which synaptic changes are driven by covariance, is particularly useful for many forms of learning, including associative memory, gradient estimation, and operant conditioning. Covariance-based plasticity is inherently sensitive. Even a slight mistuning of the parameters of a covariance-based plasticity rule is likely to result in substantial changes in synaptic efficacies. Therefore, the biological relevance of covariance-based plasticity models is questionable. Here, we study the effects of mistuning parameters of the plasticity rule in a decision making model in which synaptic plasticity is driven by the covariance of reward and neural activity. An exact covariance plasticity rule yields Herrnstein's matching law. We show that although the effect of slight mistuning of the plasticity rule on the synaptic efficacies is large, the behavioral effect is small. Thus, matching behavior is robust to mistuning of the parameters of the covariance-based plasticity rule. Furthermore, the mistuned covariance rule results in undermatching, which is consistent with experimentally observed behavior. These results substantiate the hypothesis that approximate covariance-based synaptic plasticity underlies operant conditioning. However, we show that the mistuning of the mean subtraction makes behavior sensitive to the mistuning of the properties of the decision making network. Thus, there is a tradeoff between the robustness of matching behavior to changes in the plasticity rule and its robustness to changes in the properties of the decision making network.

It is widely believed that learning is due, at least in part, to modifications of synapses in the brain. The ability of a synapse to change its strength is called “synaptic plasticity,” and the rules governing these changes are a subject of intense research. Theoretical studies have shown that a particular family of synaptic plasticity rules, known as covariance rules, could underlie many forms of learning. While it is possible that a biological synapse would be able to approximately implement such abstract rules, it seems unlikely that this implementation would be exact. Covariance rules are inherently sensitive, and even a slight inaccuracy in their implementation is likely to result in substantial changes in synaptic strengths. Thus, the biological relevance of these rules remains questionable. Here we study the consequences of the mistuning of a covariance plasticity rule in the context of operant conditioning. In a previous study, we showed that an approximate phenomenological law of behavior called “the matching law” naturally emerges if synapses change according to the covariance rule. Here we show that although the effect of slight mistuning of the covariance rule on synaptic strengths is substantial, it leads to only small deviations from the matching law. Furthermore, these deviations are observed experimentally. Thus, our results support the hypothesis that covariance synaptic plasticity underlies operant conditioning.

Synaptic plasticity that is driven by covariance is the basis of numerous models in computational neuroscience. It is the cornerstone of models of associative memory, and it also underlies models of gradient estimation and operant conditioning.

In order for a synapse to implement covariance-based plasticity, it must estimate and subtract the mean of a stochastic variable. In many neural systems, signals are subjected to high-pass filtering, in which the mean or “DC component” is attenuated relative to phasic signals. Such filtering is likely to implement an approximate, rather than exact, mean subtraction.

Here, we study the effect of incomplete mean subtraction in a model of operant conditioning, which is based on synaptic plasticity that is driven by the covariance of reward and neural activity. In operant conditioning, the outcome of a behavior changes the likelihood that the behavior will reoccur. The more a behavior is rewarded, the more likely it is to be repeated in the future. A quantitative description of this process of adaptation is obtained in experiments where a subject repeatedly chooses between two alternative options and is rewarded according to these choices. Choice preference is quantified using the ‘fractional choice’, C_{i}, the fraction of trials in which alternative i is chosen, and the ‘fractional income’, I_{i}, the fraction of rewards harvested from alternative i. According to Herrnstein's matching law, C_{i} = I_{i}. Deviations from matching are commonly described by the generalized matching law, C_{1}/C_{2} = (I_{1}/I_{2})^{k}, where an exponent k<1 corresponds to undermatching and k>1 to overmatching.
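As an illustration of the generalized matching law described above, the following Python sketch maps a pair of fractional incomes to the predicted fractional choice (the incomes and the exponent k are hypothetical values chosen for the example):

```python
def generalized_matching(I1, I2, k=1.0):
    """Fractional choice C1 predicted by the generalized matching law
    C1/C2 = (I1/I2)**k, together with C1 + C2 = 1."""
    ratio = (I1 / I2) ** k
    return ratio / (1.0 + ratio)

# Strict matching (k = 1): fractional choice equals fractional income.
print(generalized_matching(0.75, 0.25, k=1.0))  # 0.75
# Undermatching (k < 1): preference is less extreme than the incomes.
print(generalized_matching(0.75, 0.25, k=0.7))
```

For k = 1 the predicted fractional choice reproduces the fractional income exactly; for k<1 the preference is pulled toward indifference, which is the undermatching direction discussed below.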

In a recent study we showed that the matching law is a natural consequence of synaptic plasticity that is driven by the covariance of reward and neural activity.

Decision making is commonly studied in experiments in which a subject repeatedly chooses between two alternative actions, each corresponding to a sensory cue. For example, in many primate experiments, the stimuli are two visual targets, and the actions are saccadic eye movements to the targets. In our model, the two targets are represented by two populations of sensory neurons whose activities, N_{1} and N_{2}, fluctuate from trial to trial. Each sensory population drives a corresponding premotor population through synapses with efficacy W_{i}, such that the activity of premotor population i is M_{i} = W_{i}·N_{i}. Alternative 1 is chosen in trials in which M_{1}>M_{2}. Otherwise alternative 2 is chosen. This process of competition between the two premotor populations can be achieved by a winner-take-all network with lateral inhibition.

(A) The decision making network consists of two populations of sensory neurons whose activities, N_{i}, drive two premotor populations through plastic synapses with efficacies W_{i}.
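The winner-take-all readout described above can be sketched as follows; this is a minimal illustration assuming independent Gaussian sensory activities with mean 1 and standard deviation σ (the parameter values are illustrative, not those of the paper):

```python
import random

def choose(W1, W2, sigma=0.1, rng=random):
    """One trial of the winner-take-all decision: sample the sensory
    activities N_i, form the premotor activities M_i = W_i * N_i, and
    return the chosen alternative (1 or 2)."""
    N1 = rng.gauss(1.0, sigma)
    N2 = rng.gauss(1.0, sigma)
    M1, M2 = W1 * N1, W2 * N2
    return 1 if M1 > M2 else 2

rng = random.Random(0)
# With equal efficacies the two alternatives are chosen about equally often.
picks = [choose(1.0, 1.0, rng=rng) for _ in range(10000)]
print(picks.count(1) / len(picks))
```

A larger W_{1} biases choice toward alternative 1, and the trial-to-trial fluctuations in N_{i} make the choice stochastic, as discussed in the text.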

Consider the following plasticity rule, in which the change ΔW_{i} in the synaptic efficacy W_{i} is driven by the covariance of the reward and the neural activity N_{i}, except that a fraction γ of the mean is left unsubtracted. When γ = 0, the rule is an exact covariance rule. When γ≠0, the residual mean term acts as a bias that causes the synaptic efficacies to drift without bound.

In practice, synaptic efficacies are bounded and such divergence is prevented by synaptic saturation. We model the synaptic saturation by adding a polynomial decay term to the synaptic plasticity rule such that Eq. (2) becomes

The synaptic saturation is modeled here using a saturation stiffness parameter, ρ. When ρ = 1, the decay term is linear in W_{i}/W_{bound}; the larger the value of ρ, the weaker the decay for efficacies W_{i} below the bound W_{bound} and the steeper it becomes near W_{bound}.
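Eqs. (2)–(4) are not fully reproduced here, so the sketch below uses one hypothetical member of the covariance family, ΔW_{i} = η·[R·(N_{i} − (1−γ)·E[N_{i}]) − (W_{i}/W_{bound})^{ρ}]; the exact bias and saturation terms of the model may differ, and the sketch is meant only to illustrate how incomplete mean subtraction (γ≠0) and the polynomial decay interact:

```python
def update_weight(W, R, N, mean_N, eta=0.001, gamma=0.05,
                  rho=1.0, W_bound=1.0):
    """One plasticity step of a hypothetical covariance-family rule:
    a covariance-like term with incomplete mean subtraction (gamma != 0
    leaves a bias) plus a polynomial decay term with stiffness rho."""
    covariance_term = R * (N - (1.0 - gamma) * mean_N)
    decay_term = (W / W_bound) ** rho
    return W + eta * (covariance_term - decay_term)

# With activity pinned at its mean, only the bias and decay terms act:
# the efficacy settles where they balance instead of growing without bound.
W = 0.5
for _ in range(20000):
    W = update_weight(W, R=1.0, N=1.0, mean_N=1.0)
print(W)
```

With the activity at its mean, the covariance term reduces to the bias γ·R·E[N], so the efficacy relaxes toward the value at which the bias and the decay cancel; this is the divergence-preventing role of the saturation term described in the text.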

The dynamics of Eq. (4) are stochastic and therefore difficult to analyze. If the plasticity rate η is small, then many trials with different realizations of choices and rewards are needed in order to make a substantial change in the value of the synaptic efficacies. Therefore, intuitively, the stochastic dynamics of Eq. (4) can be viewed as an average deterministic trajectory with stochastic fluctuations around it, and we expect this average deterministic dynamics to become a better approximation to the stochastic dynamics as the plasticity rate η becomes smaller. The conditions under which this intuitive picture is valid are discussed below. Assuming that the fractional incomes satisfy I_{1}, I_{2}≠0, and that γ>0, we show that the average dynamics converge to a fixed point of the synaptic efficacies, W_{i}^{*}, and we compute the resultant probability of choice, Eq. (7).

Consider Eq. (7). When γρ = 0, the susceptibility of behavior is equal to 1: the fractional choice equals the fractional income and behavior follows the matching law. When γρ>0, the susceptibility is smaller than 1, corresponding to undermatching.

For insights into the dependence of the susceptibility on γ, it is useful to consider the differential contributions of the covariance term, and the bias and saturation terms in Eq. (5). The smaller the value of γ, the larger the contribution of the covariance term, making the rule more similar to the idealized covariance-based plasticity rule that yields matching behavior. In contrast, the larger the value of γ, the larger the contribution of the bias and saturation terms, which drive the efficacies W_{1} and W_{2} to become similar independently of the fractional income. In the limit of γ→∞, the efficacies of the two synapses become equal and the alternatives are chosen with equal probability. Thus, the larger the value of γ in Eq. (7), the smaller the susceptibility of behavior.

Consider the case of an infinitely hard bound, ρ→∞ in Eq. (4). As long as W^{*}<W_{bound}, the decay term vanishes, lim(W^{*}/W_{bound})^{ρ} = 0. Because of the incomplete mean subtraction, the two synapses are expected to grow continuously until they reach W_{bound}. For W^{*}>W_{bound}, the decay term diverges, (W^{*}/W_{bound})^{ρ}→∞. Thus both synaptic efficacies are expected to become equal to the synaptic bound W_{bound}, and the two alternatives will be chosen with equal probability.
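The hard-bound limit can be checked numerically: as ρ grows, the decay term (W^{*}/W_{bound})^{ρ} vanishes for efficacies below the bound and diverges above it (the values of W^{*} below are illustrative):

```python
W_bound = 1.0
for rho in (1, 4, 16, 64):
    below = (0.8 / W_bound) ** rho  # W* < W_bound: decay term -> 0
    above = (1.2 / W_bound) ** rho  # W* > W_bound: decay term -> infinity
    print(rho, below, above)
```

This is why an infinitely stiff saturation pins both efficacies at W_{bound}: the decay is negligible until the bound is reached and overwhelming beyond it.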

In the limit of low noise in the activity of the sensory neurons, σ≪1, choice behavior is independent of the value of σ. For insight into this independence, we consider the dual role of trial-to-trial fluctuations in the neural activity of the sensory neurons in our model. Information about past incomes is stored in the synaptic efficacies such that the stronger synapse corresponds to the alternative that yielded a higher income in the past, biasing choice toward that alternative. For this reason we denote the difference in synaptic efficacies as ‘signal’. The trial-to-trial fluctuations in the neural activity of the sensory neurons underlie the stochasticity of choice. In the absence of such fluctuations, the synaptic efficacies determine choice such that the chosen alternative is the one that corresponds to the larger synaptic efficacy. The larger these fluctuations, the more random choice is. We refer to this effect as ‘noise’. However, these fluctuations also play a pivotal role in the learning process. Changes in synaptic efficacy are driven by the covariance of the reward and the neural activity of the sensory neurons. The larger the fluctuations in the activity of these neurons, the larger the covariance and therefore the larger the learning signal, increasing the difference between the synaptic efficacies that correspond to the “rich” and “poor” alternatives. Thus, an increase in the stochasticity in the activities of the sensory neurons increases both the signal and the noise. We show that when σ≪1, the ratio of the signal to the noise is independent of σ.

Eq. (7) is derived assuming that the stochastic dynamics, Eq. (4), has converged to the fixed point of the average trajectory, Eq. (5), and that σ≪1. To test the validity of these assumptions, we compared the analytical approximation with numerical simulations of the model.

(A) The probability of choice as a function of fractional income. Each point corresponds to one simulation of the model, Eq. (4), in a concurrent VI reward schedule with fixed baiting probabilities. The level of deviation from matching behavior (black line) depends on the level of incomplete mean subtraction, γ, and on the synaptic saturation stiffness, ρ. Red squares, γ = 0.05, ρ = 1; blue diamonds, γ = 0.5, ρ = 1; gray triangles, γ = 0.5, ρ = 4; colored lines are the analytical approximations, Eq. (7). (B) Susceptibility of behavior as a function of γ. In order to quantify the effect of γ on the deviation from matching behavior, we repeated the simulations of A for many values of γ and measured the susceptibility of behavior (the slope of the resultant curve; see text).

In the previous section we analyzed the behavioral consequences of mistuning of the plasticity rule in a particular network model. The question of robustness is equally applicable to the parameters of the decision making network as it is to the parameters of the synaptic plasticity rule. Therefore, in this section we study the robustness of matching behavior to the mistuning of the parameters of the network.

There are various ways in which the decision making network can be mistuned. We chose to study the effect of a bias in the winner-take-all network, because this is a generic form of error that is likely to significantly affect choice behavior. It is plausible that a winner-take-all network will be able to choose the alternative that corresponds to the larger activity of the two premotor populations in trials in which M_{1} and M_{2} are very different. However, if M_{1} and M_{2} are similar in their level of activity, it is likely that a biological implementation of a winner-take-all mechanism, which is not finely tuned, will be biased in favor of one of the alternatives. Formally, we assume that alternative 1 is chosen in trials in which (M_{1}−M_{2})/(M_{1}+M_{2})>ε, where ε is a bias. The unbiased case studied in the previous section corresponds to ε = 0. In contrast, ε>1 or ε<−1 correspond to a strong bias such that choice is independent of the values of M_{1} and M_{2}. With the same assumptions as in the derivation of Eq. (7), I_{1}, I_{2}≠0 and σ≪1, we show that the resultant choice bias δC_{1} is proportional to the deviation of the susceptibility of behavior from unity. For the idealized covariance rule the susceptibility is equal to 1, and therefore δC_{1} = 0 for any value of the bias ε. This robustness of matching behavior to bias in the winner-take-all network is due to the fact that the idealized covariance based plasticity rule can compensate for the bias in the decision making network in almost any neural architecture. In contrast, when the mean subtraction is incomplete, the choice bias δC_{1} is proportional to the bias ε. The larger the deviation of the plasticity rule from the idealized covariance rule, the larger the proportionality constant. Thus, there is a tradeoff between the robustness of matching behavior to changes in the plasticity rule and robustness to changes in the parameters of decision making. The larger the mistuning of the plasticity rule, the smaller the robustness of matching behavior to mistuning of the parameters of the decision making network.
Importantly, the level of noise in the sensory populations strongly affects the bias in behavior, which depends on the ratio ε/σ. This contrasts with the independence of the susceptibility of behavior from σ.
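A sketch of the biased winner-take-all readout described above, under the same illustrative Gaussian-activity assumptions as before (the ε and σ values are arbitrary):

```python
import random

def biased_choice(W1, W2, eps=0.0, sigma=0.1, rng=random):
    """Choose alternative 1 when (M1 - M2)/(M1 + M2) > eps, where
    M_i = W_i * N_i and the N_i are noisy sensory activities."""
    N1 = rng.gauss(1.0, sigma)
    N2 = rng.gauss(1.0, sigma)
    M1, M2 = W1 * N1, W2 * N2
    return 1 if (M1 - M2) / (M1 + M2) > eps else 2

rng = random.Random(1)
# With equal efficacies, a positive bias eps shifts choice toward alternative 2.
picks = [biased_choice(1.0, 1.0, eps=0.05, rng=rng) for _ in range(10000)]
print(picks.count(1) / len(picks))
```

Because the comparison is between ε and a fluctuation of size proportional to σ, the behavioral effect of the bias is governed by the ratio ε/σ, as stated in the text.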

To study the validity of Eq. (8) numerically, we simulated the synaptic plasticity rule of Eq. (4) in the decision making model with a biased winner-take-all mechanism, and measured the choice bias δC_{1} for different values of ε and γ.

(A) The probability of choice as a function of fractional income. Each point corresponds to one simulation of the model (Eq. (4) with ρ = 1) in a concurrent VI reward schedule with fixed baiting probabilities. The level of deviation from matching behavior (black line) depends on the bias in the winner-take-all mechanism. Red squares, ε = −3σ; blue diamonds, ε = 0; gray triangles, ε = 3σ; γ = 0.05; colored lines are the analytical approximation, Eq. (8). (B) Choice bias. The simulation of A was repeated for different values of ε for two values of γ (blue dots, γ = 0.5; red dots, γ = 0.05), and the probability of choosing alternative 1 for a fractional income of I_{1} = 0.5 was measured. Lines correspond to the expected probability of choice from the analytical approximation, Eq. (8).

In this study we explored the robustness of matching behavior to inaccurate mean subtraction in a covariance-based plasticity rule. We have shown that (1) although this deviation from the idealized covariance rule has a substantial effect on the synaptic efficacies, its behavioral effect is small. (2) The direction of the behavioral effect of incomplete mean subtraction is towards the experimentally observed undermatching. (3) When the plasticity rule is mistuned, matching behavior becomes sensitive to the properties of the network architecture. Thus, there is a tradeoff between the robustness of matching behavior to changes in the plasticity rule and robustness to changes in the parameters of the decision making network.

Covariance-based, Hebbian synaptic plasticity dominates models of associative memory. According to the popular Hopfield model, the change in the synaptic efficacy between pairs of neurons is proportional to the product of their activities in the training session, measured relative to their average activity. In our model, matching behavior, C_{1} = I_{1}, defines a hypersurface in the space of synaptic efficacies. Any set of synaptic efficacies that resides within this hyperspace is a fixed point of the family of synaptic plasticity rules that is driven by the covariance of reward and neural activity (in the average trajectory approximation).
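The covariance form of the Hebbian rule described above can be sketched generically as follows; this is a textbook-style construction for binary patterns with mean activity a, not the specific formulation of any one study:

```python
import random

def covariance_weights(patterns, a):
    """Hebbian covariance rule for associative memory: W[i][j] is
    proportional to the summed product of pattern activities measured
    relative to their mean activity a (no self-connections)."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for xi in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += (xi[i] - a) * (xi[j] - a)
    return W

# Two random binary patterns with mean activity a = 0.5.
rng = random.Random(2)
pats = [[rng.randint(0, 1) for _ in range(8)] for _ in range(2)]
W = covariance_weights(pats, 0.5)
```

The resulting weight matrix is symmetric with zero diagonal, the standard structure assumed by Hopfield-type models.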

Several studies have reported stochastic gradient learning in a model in which changes in the synaptic efficacy are driven by the product of the reward with a measure of past activity known as the ‘eligibility trace’.

The level of fine-tuning required for normal brain functioning is unknown, and robustness represents a major open issue for many models of brain systems. For example, models of the short term memory of analog quantities, such as eye position in the oculomotor neural integrator, require fine-tuning of neural parameters.

Undermatching in our model is the outcome of inaccurate mean subtraction, whether it is incomplete or overcomplete. This result is expected to hold in other symmetrical decision making models: when the mean subtraction is inaccurate, synaptic efficacies are determined by a combination of a covariance term, and bias and saturation terms. The bias and saturation terms are not influenced by the correlation between the neural activity and the reward. Therefore they drive the synaptic efficacies to values that are independent of the fractional income. If the architecture of the decision making network is symmetrical with respect to the two alternatives (as is the case in our model for ε = 0), they will drive the synaptic efficacies in the direction of a symmetrical solution for which the two alternatives are chosen with equal probability, which corresponds to undermatching.

We hypothesize that the observed matching behavior results from a synaptic plasticity rule that is driven by an approximation to the covariance of reward and neural activity. In this case, behavior adapts because synapses in the brain perform a statistical computation and ‘attempt’ to decorrelate the reward and the fluctuations in neural activity. However, a very different class of matching models has been proposed, in which the brain performs computations that are “financial.” According to these models, subjects keep track of financial quantities such as return or income from each alternative and make choices stochastically according to the difference or ratio of the financial quantities between the two alternatives, leading to matching.

As was described above, the identity of choice in the network is determined by the premotor activities M_{i} = W_{i}·N_{i}: alternative 1 is chosen in trials in which M_{1}>M_{2}. Otherwise alternative 2 is chosen. Thus, the fraction of trials in which alternative 1 is chosen, or the probability that it is chosen, is determined by the fluctuations of the sensory activities. Defining x_{d} = (δN_{1}−δN_{2})/(2·E[N]) and x_{s} = (δN_{1}+δN_{2})/(2·E[N]), where δN_{i} = N_{i}−E[N], the condition M_{1}>M_{2} can be written as W_{d}·(1+x_{s})+W_{s}·x_{d}>0, where W_{s} = (W_{1}+W_{2})/2 and W_{d} = (W_{1}−W_{2})/2. Because N_{1} and N_{2} are independent Gaussian variables with a coefficient of variation σ, x_{d} and x_{s} are independent Gaussian variables with zero mean and variance σ^{2}/2. The assumption that I_{1},I_{2}≠0 implies that in the limit of σ→0, W_{d}/W_{s} remains of order σ, and the probability of choosing alternative 1 is C_{1} = Φ(√2·W_{d}/(σ·W_{s})), where Φ is the cumulative distribution function of the standard normal distribution (Eq. (11)).
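Under the assumptions stated above (independent Gaussian activities with mean 1 and coefficient of variation σ), the Gaussian approximation to the choice probability can be verified against a Monte Carlo estimate; this is a numerical check, not part of the original derivation:

```python
import math
import random

def mc_choice_prob(W1, W2, sigma=0.1, trials=200000, seed=3):
    """Monte Carlo estimate of P(W1*N1 > W2*N2) for independent
    Gaussian activities N_i with mean 1 and standard deviation sigma."""
    rng = random.Random(seed)
    wins = sum(
        W1 * rng.gauss(1.0, sigma) > W2 * rng.gauss(1.0, sigma)
        for _ in range(trials)
    )
    return wins / trials

def gaussian_choice_prob(W1, W2, sigma=0.1):
    """Small-sigma approximation: Phi(sqrt(2) * W_d / (sigma * W_s))."""
    W_s, W_d = (W1 + W2) / 2.0, (W1 - W2) / 2.0
    z = math.sqrt(2.0) * W_d / (sigma * W_s)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(mc_choice_prob(1.02, 0.98), gaussian_choice_prob(1.02, 0.98))
```

For efficacies close to each other (W_{d}/W_{s} of order σ) the two estimates agree closely, consistent with the small-σ approximation.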

Next we use Eq. (11) to compute two quantities that will become useful later:

In this section we compute the dependence of deviations from matching behavior on γ, assuming that synaptic efficacies are given by the fixed point of the average trajectory, Eq. (5). The precise conditions for the correctness of this approach are discussed above.

According to Eq. (11), the probability of choice depends on the ratio of the synaptic efficacies; thus scaling the synaptic efficacies by a positive number does not change the probabilities of choice. For clarity, we scale the synaptic efficacies of Eq. (5) (assuming ρ = 1) and express the fixed point in terms of W_{s} and W_{d}. In this derivation we assumed that I_{1},I_{2}≠0 and used the fact that C_{1}+C_{2} = 1 and I_{1}+I_{2} = 1. Substituting Eqs. (13), (14), (23) and (24) in Eqs. (18) and (19) yields an implicit equation relating C_{1} and I_{1} (Eq. (27); note that I_{1} is determined by the reward schedule). Next we use Eq. (27) to show that:

In the limit of σ→0 the model undermatches.

The level of undermatching is proportional to γ (Eq. (6)).

Expanding Eq. (27) around I_{1} = 0.5 yields a closed-form solution for C_{1} (Eq. (7)).

(1) As was discussed above, the assumption that I_{1},I_{2}≠0 in the limit of σ→0 implies that W_{d}/W_{s} remains of order σ. Thus, C_{1} and I_{1} in Eq. (28) are the values at the fixed point, and therefore a more accurate notation would have included an asterisk. However, in order to keep the notations simple, the asterisk is omitted.

(2) Taking the dominant terms in σ in Eq. (27) yields Eq. (29).

(3) In order to obtain a closed form approximation to Eq. (29), we expand Eq. (12) around W_{d} = 0, yielding Eq. (30).

In order to study the effect of bias in the winner-take-all network on choice behavior, we assume that alternative 1 is chosen in trials in which (M_{1}−M_{2})/(M_{1}+M_{2})>ε, where ε is a bias. Formally, to leading order in σ, this condition becomes x_{d}>ε−W_{d}/W_{s}, so that the probability of choosing alternative 1 is C_{1} = Φ(√2·(W_{d}/W_{s}−ε)/σ). The assumption that I_{1},I_{2}≠0 implies that in the limit of σ→0 the choice bias δC_{1} = C_{1}−I_{1} is given by Eq. (42).

Expanding Eqs. (34) and (42) around I_{i} = 0.5 yields Eq. (8).

Rewriting Eq. (5), the fixed point equation for the synaptic efficacies W_{i} can be separated into a term that scales both efficacies equally and a term that depends only on the ratio W_{1}/W_{2}. Therefore, the first term in the right hand side of Eq. (50) does not affect the probabilities of choice. The saturation stiffness parameter ρ affects the probability of choice through the second term, and this effect is equivalent to the scaling of the mistuning parameter γ by ρ. Thus, assuming that synaptic efficacies converge to the fixed point of the average trajectory, Eq. (5), the effect of deviations of the saturation stiffness parameter from unity on choice is equivalent to the scaling of γ by ρ.

The synaptic saturation term also changes the effective plasticity rate, which will change the conditions of applicability of the average trajectory approximation. This analysis goes beyond the scope of this manuscript and will be discussed elsewhere. In short, changing the value of ρ changes the effective plasticity rate.

According to Eq. (3), when γ<0, the bias term drives the synaptic efficacies W_{i} to decrease toward a lower bound W_{low}. Near W_{low}, the synaptic dynamics are described by Eq. (52).

The fixed point of the average trajectory of Eq. (52) is

The analytical results presented in this paper hold for a general diminishing-return reward schedule. They are demonstrated in the simulations using a concurrent VI reward schedule, in which each alternative is baited with a reward stochastically at a fixed rate, and a baited reward remains available until that alternative is next chosen.
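A minimal Python sketch of such a concurrent VI schedule, assuming per-trial baiting probabilities (the values below are illustrative; the parameters actually used are listed in the next paragraph):

```python
import random

class ConcurrentVI:
    """Concurrent variable-interval schedule: each alternative is baited
    with a fixed per-trial probability, and a bait persists until the
    alternative is chosen."""
    def __init__(self, p1, p2, seed=0):
        self.p = [p1, p2]
        self.baited = [False, False]
        self.rng = random.Random(seed)

    def step(self, choice):
        """Bait the alternatives, then harvest at `choice` (0 or 1).
        Returns 1 if a reward was collected, else 0."""
        for i in (0, 1):
            if self.rng.random() < self.p[i]:
                self.baited[i] = True
        reward = 1 if self.baited[choice] else 0
        self.baited[choice] = False
        return reward

# Under random choice, the overall reward rate reflects both baiting rates.
sched = ConcurrentVI(0.4, 0.1, seed=4)
rewards = [sched.step(sched.rng.randint(0, 1)) for _ in range(10000)]
print(sum(rewards) / len(rewards))
```

Because a bait persists until collected, the income from the leaner alternative grows when it is visited rarely; this persistence is what gives the schedule its diminishing-return character.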

The sum of baiting probabilities in all simulations was kept constant at 0.5; σ = 0.1; the plasticity rate was η_{0} = 0.001. Each symbol in the figures corresponds to a simulation of 10^{6} trials with fixed baiting probabilities. Susceptibility was measured by computing the least-square-error linear fit.
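Measuring susceptibility as the least-square-error linear fit of fractional choice against fractional income can be sketched as follows (the data below are synthetic, standing in for simulation results with susceptibility 0.8):

```python
def least_squares_slope(xs, ys):
    """Slope of the least-square-error linear fit of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic undermatching data: C1 = 0.5 + k*(I1 - 0.5) with k = 0.8.
incomes = [0.1, 0.3, 0.5, 0.7, 0.9]
choices = [0.5 + 0.8 * (I - 0.5) for I in incomes]
print(least_squares_slope(incomes, choices))  # slope ≈ 0.8
```

A slope of 1 corresponds to exact matching; a slope below 1 quantifies the level of undermatching.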

I am indebted to H. S. Seung for many fruitful discussions and encouragement, and to D. Hansel and M. Shamir for their helpful comments on the manuscript.