Dynamic Excitatory and Inhibitory Gain Modulation Can Produce Flexible, Robust and Optimal Decision-making

Behavioural and neurophysiological studies in primates have increasingly shown the involvement of urgency signals during the temporal integration of sensory evidence in perceptual decision-making. Neuronal correlates of such signals have been found in the parietal cortex, and separate studies have demonstrated attention-induced gain modulation of both excitatory and inhibitory neurons. Although previous computational models of decision-making have incorporated gain modulation, their abstract forms do not permit an understanding of the contribution of inhibitory gain modulation. Thus, the effects of co-modulating both excitatory and inhibitory neuronal gains on decision-making dynamics and behavioural performance remain unclear. In this work, we incorporate time-dependent co-modulation of the gains of both excitatory and inhibitory neurons into our previous biologically based decision circuit model. We base our computational study on two classic motion-discrimination tasks performed in animals. Our model shows that by simultaneously increasing the gains of both excitatory and inhibitory neurons, a variety of the observed dynamic neuronal firing activities can be replicated. In particular, the model can exhibit winner-take-all decision-making behaviour with higher firing rates and within a significantly more robust model parameter range. It also exhibits short-tailed reaction time distributions even when operating near a dynamical bifurcation point. The model further shows that neuronal gain modulation can compensate for weaker recurrent excitation in a decision neural circuit, and support decision formation and storage. Higher neuronal gain is also suggested in the more cognitively demanding reaction time version than in the fixed delay version of the task.
Using the exact temporal delays from the animal experiments, fast recruitment of gain co-modulation is shown to maximize reward rate, with a timescale that is surprisingly near the experimentally fitted value. Our work provides insights into the simultaneous and rapid modulation of excitatory and inhibitory neuronal gains, which enables flexible, robust, and optimal decision-making.


Introduction
Perceptual decision-making often requires temporal integration of sensory information and its subsequent transformation to a categorical motor choice [1]. The decision or response times in speeded perceptual decision tasks can range from tens of milliseconds to a second or more [2,3]. Perceptual decision-making is not a simple standalone feed-forward sensorimotor integration process, but is distributed and subject to various neuromodulatory or cognitive control processes, possibly to enhance decision performance [1,4-12].
In this work, we incorporate a time-varying excitatory-inhibitory gain modulation mechanism into our previous cortical microcircuit model of decision-making [55,56]. The model is based on, and compared very closely with, two classic visual motion discrimination experiments performed in animals: a reaction time task and a cued response task [57]. Our model is constrained by both the neuronal and behavioural data of the reaction time task. The model naturally captures the essential characteristics of the neuronal firing rates throughout a trial in both experiments, with weaker gains in the seemingly less cognitively demanding cued response task. Dynamical systems analysis is used to provide insights into the flexibility and robustness of the network dynamics under the co-modulation of neuronal gains. We also show that with gain modulation, strong recurrent synapses are not necessary for making and storing decisions. Finally, using realistic temporal delays in the reaction time task, our model simulations show that rapid recruitment of gains can optimize decision performance, and suggest that the animals may adopt such a strategy. Part of this work has been presented at the Computational and Systems Neuroscience 2010 meeting [58].

Two-choice motion-discrimination tasks
The classic experiments of [57] involved primates performing two versions of a dot-motion-discrimination task. In the reaction time (RT) task (Figure 1A), subjects were trained to make a saccadic eye movement (e.g. leftward or rightward) in the direction of the coherent motion of the random-dot stimulus, at their own pace. The stimulus was presented until a saccade was detected. In the fixed duration (FD) task (Figure 1B), they were instead given a fixed viewing duration (1 s), after which they were required to withhold their response until a cue to respond was given. Neuronal firing activity in the lateral intraparietal area (LIP) and behavioural performance were simultaneously recorded (Figure 1C,D).

A biological cortical circuit model for decision-making
We used a neural circuit model of decision-making [55] that consists of two competing excitatory neural populations, each selective to a presented stimulus, e.g. with opposite motion direction selectivities in a motion discrimination task. An implicit population of interneurons provides common inhibitory feedback in the network (see Figure 2A). This model was reduced and approximated from a spiking neuronal network model of 2000 neurons [59] to an effectively analyzable model.
Input synaptic currents and output firing rates. As in [55] and [56], the input-output function of a single noisy excitatory cell is

$$r_i = g_E(t)\, f_E(I_i) \tag{1}$$

where $r_i$ is the population-averaged firing rate, $I_i$ is the total synaptic input current to a neuron, and $i = L$ or $R$ denotes selectivity to a leftward or rightward motion stimulus, respectively. The non-linear input-output function $f_E(I)$ is approximated from the first-passage time input-output relation of a leaky integrate-and-fire neuron [46,55,60] (see Eq. (9) in the Materials and Methods section). $g_E(t)$ and $g_I(t)$ represent the time-varying gain modulation parameters (see Figure 2B) of excitatory and inhibitory cells, respectively. For the inhibitory interneuronal population, we assume that $r_I = g_I f_I(I_I)$ is linear so that it can be implicitly embedded in the reduced two-variable model for analysis [55]. Following [56], we assume that recurrent excitation and inhibition in the network are mediated through NMDA and GABA$_A$ receptors, respectively. At any given time, the total synaptic currents to the two neural populations are given by

$$\begin{aligned}
I_{L,\mathrm{total}} &= J_{LL} S_L + J_{LR} S_R - J_{EI} S_I + I_b + I_{\mathrm{motion},L} + I_{\mathrm{target}} + I_{\mathrm{noise},L} \\
I_{R,\mathrm{total}} &= J_{RR} S_R + J_{RL} S_L - J_{EI} S_I + I_b + I_{\mathrm{motion},R} + I_{\mathrm{target}} + I_{\mathrm{noise},R}
\end{aligned} \tag{2}$$

where $J_{ij}$ ($> 0$) represents the effective synaptic coupling from neural pool $j$ to $i$. Within a selective population, $J_{ii}$ is a constant (= 0.32 nA) times the (dimensionless) strength of the recurrent excitation, $w_+ > 1$, while between selective populations, $J_{ij}$ is the same constant times $w_- = 1 - f(w_+ - 1)/(1 - f) < 1$, where $f = 0.15$ is the fraction of selective neurons. $w_+$ ($w_-$) can be viewed as representing synaptic potentiation (depression) between neurons in the same (different) excitatory population after learning [61,62]. Unless otherwise specified, we used $w_+ = 2.1$, as in [56].
$I_b$, $I_{\mathrm{target}}$, $I_{\mathrm{motion}}$ and $I_{\mathrm{noise}}$ are the input currents due to overall background inputs from neurons outside the local network, static choice targets within the response fields of the LIP neurons, output of upstream motion-selective MT/V5 neurons, and noise from the motion stimulus and from within the brain, respectively (Figure 2A, C). $S_L$ and $S_R$ are the synaptic gating variables of NMDA receptors, i.e. the population-averaged fraction of open channels. $S_I$ is the gating variable of GABA$_A$ receptors.
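As a small worked example of the coupling definitions above, the following Python sketch computes $w_-$, $J_{ii}$ and $J_{ij}$ from $w_+$, $f$ and the 0.32 nA constant quoted in the text. The function name and argument names are illustrative assumptions, not part of the original model code. Note that the expression for $w_-$ implies $f w_+ + (1 - f) w_- = 1$, which the last line checks.

```python
def coupling_strengths(w_plus=2.1, f=0.15, j_unit_nA=0.32):
    """Return (J_ii, J_ij, w_minus) for the two selective populations.

    w_plus, f and the 0.32 nA constant are taken from the text; the function
    itself is a hypothetical helper written for illustration.
    """
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    # Depression factor between different selective populations.
    w_minus = 1.0 - f * (w_plus - 1.0) / (1.0 - f)
    j_same = j_unit_nA * w_plus    # coupling within a selective population
    j_diff = j_unit_nA * w_minus   # coupling between selective populations
    return j_same, j_diff, w_minus


j_same, j_diff, w_minus = coupling_strengths()
print(f"w_minus = {w_minus:.4f}, J_ii = {j_same:.4f} nA, J_ij = {j_diff:.4f} nA")
# The mean relative weight stays at 1 by construction.
print(abs(0.15 * 2.1 + 0.85 * w_minus - 1.0) < 1e-12)
```

With the default values, potentiation within a population ($w_+ = 2.1$) is balanced by mild depression between populations ($w_- \approx 0.81$).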
Following the deduction of [55], the network can be further reduced to a two-variable model, as we may consider

$$J_{ii} S_i + J_{ij} S_j - J_{EI} S_I + I_b \equiv J_{ii,\mathrm{eff}} S_i + J_{ij,\mathrm{eff}} S_j + I_{b,\mathrm{eff}} \tag{3}$$

with $J_{ij,\mathrm{eff}} < 0$ allowing competition via effective mutual inhibition between the two selective excitatory populations (see Eq. (10) in the Materials and Methods section).

Author Summary
Perceptual decision-making involves not only simple transformation of sensory information to a motor decision, but can also be modulated by high-level cognition. For example, the latter may include strategic allocation of limited attentional resources over time in a decision task to improve performance. At the neurophysiological level, there is evidence supporting attention-induced neuronal gain modulation of both excitatory and inhibitory cortical neurons. In the context of perceptual discrimination tasks performed by animals, we make use of a biologically inspired computational model of decision-making to understand the computational capabilities of such co-modulation of neuronal gains. We find that dynamic co-modulation of both excitatory and inhibitory neurons is important for flexible and cognitively demanding decision-making, while also enhancing the robustness of the decision circuit's functions. Our model captures the neuronal activity and behavioural data in the animal experiments remarkably well. Decision performance in a reaction time task can be optimized, maximizing the rate of receiving reward, by using fast gain recruitment. Our experimentally fitted timescale is near the optimal one, suggesting that the animals performed almost optimally. By providing both computational simulations and theoretical analyses, our computational model sheds light on the multiple functions of rapid co-modulation of neuronal gains during decision-making.

Choice targets and motion stimulus inputs. The input current due to the encoded target (Figure 2C) can be modeled as

$$I_{\mathrm{target}} = J_{\mathrm{target}}\, \mu_{\mathrm{target}}$$

where $J_{\mathrm{target}}$ is the coupling strength for the choice target and $\mu_{\mathrm{target}}$ is the firing rate of upstream visual areas encoding the target. We set the choice target onset time, $t_{\mathrm{target}}$, to 1300 ms after the start of a trial simulation. The firing rate of upstream neurons encoding the target, $\mu_{\mathrm{target}}$, first attains 70 Hz before decaying exponentially to 30 Hz with a time constant of $\tau_{ad} = 120$ ms. The exact exponential time course, which is adopted from [56], is not important for our model's computations, but follows the experimental data in [17], mimicking visual adaptation to the stationary visual target stimuli. Firing rates of upstream MT neurons, selective for a particular direction of motion, can be assumed to increase (decrease) roughly linearly with the motion strength when the motion is in the preferred (anti-preferred) direction of the neuron, in the regimes of motion strength experimentally tested [63] (Figure 2D). For simplicity, we assume that the slopes of this linear function for motion in the preferred and anti-preferred directions differ only in sign (our results are qualitatively similar if we assume lower slopes, e.g. 2-3 times shallower, for the anti-preferred direction; see Figure S4). Thus the external current encoding the motion stimulus relayed to the LIP neurons is expressed as

$$I_{\mathrm{motion},i} = J_{MT}\, \mu_0 \left(1 \pm \frac{c}{100\%}\right), \quad t \ge t_{\mathrm{motion}}$$

where $J_{MT}$ is the coupling strength for the motion stimulus, $c$, ranging from 0% to 100%, represents the motion coherence of the random dots, and $t_{\mathrm{motion}} = 2100$ ms is the motion stimulus onset time.
The sign + (−) denotes motion in the preferred (anti-preferred) direction of the neuron. $\mu_0 = 40$ Hz corresponds to the mean firing rate of MT neurons for the ambiguous zero motion coherence ($c = 0\%$) [63].

Figure 1. The trial begins with the appearance of a fixation point followed by two choice targets, and then a motion stimulus in the form of computer-generated random dots. The motion stimulus has a fraction of the dots moving towards either the left or the right choice target, constituting the motion coherence of the stimulus. The subject is trained to discriminate this motion coherence and make a motor choice (saccade) in the direction of coherent motion towards the corresponding choice target (right in the above figure). In the RT task, the subject makes a saccade once it has accumulated sufficient evidence in support of its decision, and the motion stimulus is removed once a saccade is made. In the FD task, the motion stimulus is presented for a fixed duration (e.g. 1 second) before it is removed during a delay period. The subject has to remember the direction of coherent motion to guide its saccadic choice. (C,D) LIP neural firing rate timecourses from the RT task (C) and the FD task (D).

Time-varying gain modulation. We assume the simplest form of gain modulation, a "gain field" [24,46,64] $g$ with an effective time constant and amplitude (Figure 2B inset; cf. the sigmoidal function in [65], the hyperbolic function in [17], and the more complex functions in [15], [16]):

$$g_X(t) = 1 + g_{0X}\left[1 - \exp\left(-\frac{t - t_{gX}}{\tau_g}\right)\right], \quad t \ge t_{gX} \tag{4}$$

where $X = E$ or $I$ denotes the excitatory or inhibitory population, and $g_X = 1$ before the onset time $t_{gX}$. $t_{gI} = 2000$ ms is the onset time of gain modulation of inhibitory neurons, while $t_{gE} = 2040$ ms is that of excitatory neurons. The 40 ms onset delay is used to replicate the signature 'dip' phenomenon [18]. We chose a time constant of $\tau_g = 120$ ms in our simulations to fit the neuronal and behavioural data of [57]. This is also around the timescale of a rapid covert shift of attention in area LIP [66].
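The target and motion-stimulus inputs described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the target rate is assumed to jump to 70 Hz at target onset (the text only says it "first attains" 70 Hz), the MT rate is assumed to be zero before motion onset, and the slope of the MT rate is assumed to be $\mu_0$ per 100% coherence. Function and constant names are hypothetical.

```python
import math

T_TARGET_MS = 1300.0   # choice-target onset (from the text)
T_MOTION_MS = 2100.0   # motion-stimulus onset (from the text)
TAU_AD_MS = 120.0      # adaptation time constant (from the text)
MU_0_HZ = 40.0         # MT firing rate at zero coherence (from the text)


def target_rate_hz(t_ms, peak_hz=70.0, adapted_hz=30.0):
    """Upstream target-encoding rate: 0 before onset, then decay from peak to adapted."""
    if t_ms < T_TARGET_MS:
        return 0.0
    decay = math.exp(-(t_ms - T_TARGET_MS) / TAU_AD_MS)
    return adapted_hz + (peak_hz - adapted_hz) * decay


def mt_rate_hz(t_ms, coherence_pct, preferred):
    """Upstream MT rate, linear in coherence (assumed slope: MU_0_HZ per 100%)."""
    if t_ms < T_MOTION_MS:
        return 0.0  # assumption: no motion-driven input before motion onset
    sign = 1.0 if preferred else -1.0
    return MU_0_HZ * (1.0 + sign * coherence_pct / 100.0)


for t in (1200.0, 1300.0, 1500.0, 2200.0):
    print(t, round(target_rate_hz(t), 2),
          round(mt_rate_hz(t, 51.2, True), 2),
          round(mt_rate_hz(t, 51.2, False), 2))
```

Multiplying these rates by the corresponding coupling strengths ($J_{\mathrm{target}}$, $J_{MT}$) would give the input currents; those values are listed in the parameter tables and are not reproduced here.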
Our results are independent of the specific neural implementation of the gain modulation mechanism. Setting $g_0 = 0$ leads to a gain factor of 1, which we assumed throughout the pre-motion stimulus epoch of a trial. In the RT task, upon motion stimulus onset, $g_{0E} = 2$ and $g_{0I} = 0.1$. In the FD task, the gains were scaled down to $g_{0E} = 0.1$ and $g_{0I} = 0.06$ during motion stimulus presentation to keep the firing rate encoding the accumulated decision below the fixed response threshold. However, upon the cue to respond (at 4000 ms), the gains were increased to those of the RT task (with the same time constant). The specific values of $g_0$ were selected to fit the experimental data. The increase in gain during motion stimulus presentation reflects the assumption that attentional modulation of decision-making grows over the course of evidence accumulation, creating an urgency-to-respond signal during this process.
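To make the gain parameters above concrete, here is a hedged Python sketch of the gain time course. It assumes a saturating exponential that equals 1 before onset and approaches $1 + g_0$ afterwards, which is consistent with the statement that $g_0 = 0$ gives a gain factor of 1; the exact functional form used in the simulations should be checked against the Materials and Methods. Names such as `gain` and `gains` are illustrative.

```python
import math

TAU_G_MS = 120.0                              # gain time constant (from the text)
T_GAIN_ONSET_MS = {"E": 2040.0, "I": 2000.0}  # onset times (from the text)
G0_RT = {"E": 2.0, "I": 0.1}                  # RT-task amplitudes (from the text)
G0_FD = {"E": 0.1, "I": 0.06}                 # FD-task amplitudes during motion


def gain(t_ms, g0, t_onset_ms, tau_ms=TAU_G_MS):
    """Assumed form: 1 before onset, then saturating exponentially towards 1 + g0."""
    if t_ms < t_onset_ms:
        return 1.0
    return 1.0 + g0 * (1.0 - math.exp(-(t_ms - t_onset_ms) / tau_ms))


def gains(t_ms, amplitudes):
    """Return (g_E, g_I) at time t_ms for a dict of amplitudes keyed by 'E' and 'I'."""
    return tuple(gain(t_ms, amplitudes[p], T_GAIN_ONSET_MS[p]) for p in ("E", "I"))


for t in (1999.0, 2030.0, 2160.0, 3000.0):
    g_e, g_i = gains(t, G0_RT)
    print(f"t = {t:.0f} ms: g_E = {g_e:.3f}, g_I = {g_i:.3f}")
```

Between 2000 ms and 2040 ms only the inhibitory gain has started to rise, which is the window associated with the 'dip' in the text. The post-cue increase in the FD task is not modelled in this sketch.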
Note that while representing the gain modulation of the explicit excitatory selective populations is straightforward (see Eq. 1), the effect of changing the gain of the inhibitory interneurons implicit in our effectively two-population model is subtler. Increasing the inhibitory gain decreases the effective synaptic couplings $J_{ii}$ and $J_{ij}$ by a linear factor of this gain, and decreases the mean background synaptic input $I_b$ by another linear factor (see Eq. 11 in the Materials and Methods section).
Dynamical equations. The slowest decay time constant in the model is that of NMDA receptors ($\tau_S = 100$ ms). All other dynamical variables operate on much faster timescales and are assumed to reach their steady states rapidly. Thus the dynamical equations governing the network are [55,56]

$$\frac{dS_i}{dt} = -\frac{S_i}{\tau_S} + (1 - S_i)\,\gamma\, r_i, \quad i = L, R \tag{5}$$

where $\gamma = 0.641$ is a fitted parameter [55].
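The behaviour of a single gating variable can be illustrated with a short Python sketch. It assumes the standard form of the gating equation from [55], $dS/dt = -S/\tau_S + (1 - S)\gamma r$, with time in seconds and $r$ in Hz, and holds the firing rate constant (in the full model $r$ depends on $S_L$ and $S_R$ through the input currents). Under that assumption the steady state is $S^* = \tau_S \gamma r / (1 + \tau_S \gamma r)$. The forward-Euler scheme and step size are illustrative choices, not the integration method of the paper.

```python
TAU_S = 0.1    # NMDA gating time constant in seconds (100 ms in the text)
GAMMA = 0.641  # fitted parameter from the text


def ds_dt(s, rate_hz):
    """Right-hand side of the gating equation for a given firing rate (assumed form)."""
    return -s / TAU_S + (1.0 - s) * GAMMA * rate_hz


def integrate_gating(rate_hz, s0=0.0, dt=1e-4, t_end=2.0):
    """Forward-Euler integration of S(t) at a fixed firing rate; returns S(t_end)."""
    s = s0
    for _ in range(int(round(t_end / dt))):
        s += dt * ds_dt(s, rate_hz)
    return s


def steady_state(rate_hz):
    """Analytical fixed point obtained by setting dS/dt = 0."""
    x = TAU_S * GAMMA * rate_hz
    return x / (1.0 + x)


for rate in (5.0, 30.0, 70.0):
    print(rate, round(integrate_gating(rate), 4), round(steady_state(rate), 4))
```

The saturation of $S$ at high rates is what limits the recurrent drive, and the 100 ms time constant sets the slow timescale on which evidence is integrated.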
Model parameters. In any parameterized model, there is always a fine balance between incorporating more biological details and reducing the number of model parameters. There are certainly simpler models than ours, with fewer parameters, such as the drift-diffusion model, which, with or without gain modulation, can account for the behavioural data [15,16,67,68]. However, as stated earlier, these abstract models cannot be directly related to neuronal and synaptic properties, and thus cannot realistically incorporate inhibitory gain modulation.
Other than the new parameters pertaining to our gain modulation mechanism, all model parameters and their values are identical to those in our previous modelling work [55] and [56], more directly following the latter. The more critical adopted and new parameters are listed in Tables 1 and 2, respectively.
The new parameters, pertaining to excitatory and inhibitory gain modulation, consist only of the onset times ($t_{gE}$, $t_{gI}$), the time constant ($\tau_g$), and the amplitudes ($g_{0E}$, $g_{0I}$). These parameters were used to provide a qualitative rather than quantitative fit to the neural and behavioural data, and we simulated predictions over a range of parameter values until the desired fits were isolated. The onset times are constrained to qualitatively but reliably replicate the signature dip in the firing rates right after motion stimulus onset, and are fixed throughout this study. $\tau_g$ allows the network to dynamically change configuration from the dip period to the motion stimulus period. The value of $\tau_g$ is obtained from fitting the neuronal and behavioural data, and we shall explore its effects towards the end of the paper. It also affords us an opportunity to study the optimal timescale of gain recruitment in RT tasks, similar to [54], but with a more realistic model and simulation setting (the time parameters in the simulated experimental trial closely follow those of [57] and [15]). The model's ability to form and store decisions is independent of the values of $t_{gE}$, $t_{gI}$ and $\tau_g$; it depends instead on the gain amplitude parameters ($g_{0E}$, $g_{0I}$). We shall show that our model is not sensitive to our chosen values of these amplitude parameters. The details of how the new parameters were constrained are provided in the Materials and Methods section and in Table 2.
Epochs in a trial. A simulated trial can be divided into separate epochs [17]: (i) fixation-only ($t < t_{\mathrm{target}}$), with baseline firing rates; (ii) fixation-target ($t_{\mathrm{target}} \le t < t_g$), with an initial phasic burst of activity that then adapts to a steady firing rate; (iii) fixation-target-gain ($t_g < t \le t_{\mathrm{motion}}$), where the neuronal gains are assumed to start increasing; and (iv) fixation-target-gain-motion ($t_{\mathrm{motion}} < t \le t_{\mathrm{threshold/cue}}$), with an additional random-dot motion stimulus, during which the firing rates of the two selective populations start to diverge. $t_{\mathrm{threshold/cue}}$ is the time of threshold crossing, when a saccade is initiated, in the RT task, or the time of cue presentation in the FD task.

Table footnotes: 1 Wang (2002), 2 Wong and Wang (2006), 3 Wong et al. (2007), 4 Eckhoff et al. (2011), 5 Churchland et al. (2008), and 6 Britten et al. (1993).

A motor action (saccadic eye movement) is triggered when the higher population firing rate crosses a prescribed threshold of 70 Hz (cf. Figures 7 and 9 in [57]). In the RT task, the time elapsed from motion onset to threshold crossing yields the decision time. A non-decision latency of 245 ms, due to sensory signal transduction and motor saccadic preparation, is added to the decision time to obtain the observable RT.
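The decision rule just described can be sketched in Python as follows. The threshold (70 Hz), non-decision latency (245 ms) and motion onset (2100 ms) are taken from the text; the function name, the sampled-trajectory interface, and the convention of ignoring samples before motion onset are illustrative assumptions.

```python
THRESHOLD_HZ = 70.0      # saccadic threshold (from the text)
NON_DECISION_MS = 245.0  # sensory and motor latency (from the text)
T_MOTION_MS = 2100.0     # motion-stimulus onset (from the text)


def reaction_time_ms(times_ms, rates_left_hz, rates_right_hz):
    """Return (choice, RT in ms) at the first threshold crossing after motion onset.

    Returns None if neither population reaches threshold in the sampled trial.
    """
    for t, r_l, r_r in zip(times_ms, rates_left_hz, rates_right_hz):
        if t < T_MOTION_MS:
            continue  # assumption: crossings before motion onset are not decisions
        if max(r_l, r_r) >= THRESHOLD_HZ:
            choice = "L" if r_l > r_r else "R"
            decision_time = t - T_MOTION_MS
            return choice, decision_time + NON_DECISION_MS
    return None


# Hypothetical firing-rate samples, for illustration only.
times = [2000.0, 2100.0, 2200.0, 2300.0, 2400.0]
left = [30.0, 35.0, 50.0, 72.0, 80.0]
right = [30.0, 35.0, 25.0, 15.0, 10.0]
print(reaction_time_ms(times, left, right))
```

In this made-up trial the leftward population crosses threshold 200 ms after motion onset, so the observable RT is 200 ms plus the 245 ms latency.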
Excitatory-inhibitory gain modulation in the reaction time task
As shown in Figure 3A, our model captures the essential timecourse of the neural data in the RT task. Similar to previous models [55,56,59], our model reproduces the faster ramping up (down) of firing rates for larger motion coherences when the motion is into (out of) the response field of an LIP neuron. Our model can also reproduce the psychometric (accuracy) and chronometric (reaction time, RT) data of the experiment of [57] (Figure 3B and C), and the RT distributions for both correct and error trials (Figure 3D and E). In addition, our model naturally captures the characteristic dip at motion onset by modulating inhibitory gain before excitatory gain, providing an alternative mechanism to that of our earlier work [56]. In the RT task experiments [57], the firing rates of LIP neurons during motion stimulus presentation were observed to diverge from a level higher than the adapted firing rate during the target epoch (Figure 1C). Our model is able to replicate this phenomenon, unlike previous work [56,69,70,71].
The activity timecourse of our two-variable $(r_R, r_L)$ model can be better understood by investigating its dynamics in a two-dimensional state space called the phase plane (see Materials and Methods, Figure S1 and Figure 4). For simplicity, we shall consider only the unbiased motion stimulus condition (i.e. zero motion coherence, $c = 0\%$). For non-zero motion coherence, for both correct and error trials, we refer the reader to [55], [56], and [72].
Following the procedure of previous work [55,56], we first set both dynamical equations in Eq. (5) to $dS_i/dt = 0$ and solve them. The solutions of each equation (called a nullcline; orange and green curves in Figure 4) can be plotted in the two-dimensional $(S_R, S_L)$ phase plane. We transform the dynamical variables (synaptic gating variables) $S_R, S_L$ to firing rates $r_R, r_L$ (see Materials and Methods), enabling direct comparison with the experimental data.
Intersections of the nullclines, by definition, give us the steady states of the network. Steady states can either be stable, i.e. attractors (black filled circles in Figure 4), or unstable (open circles in Figure 4; see Materials and Methods and Figure S1). In the firing-rate space, a steady state can be symmetric, i.e. lie along the phase plane diagonal, or asymmetric, i.e. off-diagonal. This means the firing rates of the competing selective populations can be equal (symmetric) or unequal (asymmetric).
Symmetric (on-diagonal) attractors allow stable, steady, equal firing rates of both selective populations, which prevents decision-making and categorical choice. A symmetric unstable steady state, on the other hand, can force the network's state to move off the diagonal, causing the firing rate of one (winning) population to increase while that of the other (losing) population decreases, enabling decision-making and categorical choice. Asymmetric 'choice' attractors ensure that the firing rates of the winning (losing) populations reach a stable steady state, and do not increase (decrease) without bound. Our modus operandi then is to let the network reach symmetric attractors during the fixation and fixation-target periods and then, using gain modulation, approach (along the diagonal line) a symmetric unstable steady state during the fixation-target-gain-motion period (see Movie S1).
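The distinction between a symmetric attractor and a symmetric unstable steady state can be illustrated numerically. The Python sketch below uses a deliberately simple toy system of two mutually inhibiting populations with a logistic input-output curve; it is not the reduced model of this paper, and its `slope` parameter is a generic steepness that should not be identified with $g_E$ or $g_I$. The steady state is classified from the trace and determinant of a finite-difference Jacobian.

```python
import math


def sigmoid(u):
    """Logistic function, used here as a generic saturating input-output curve."""
    return 1.0 / (1.0 + math.exp(-u))


def toy_field(x, y, slope=1.0, w_self=6.0, w_cross=6.0):
    """Toy two-population rate model (hypothetical; NOT the model of the paper)."""
    dx = -x + sigmoid(slope * (w_self * x - w_cross * y))
    dy = -y + sigmoid(slope * (w_self * y - w_cross * x))
    return dx, dy


def jacobian(field, x, y, h=1e-6, **params):
    """Central-difference Jacobian ((dfx/dx, dfx/dy), (dfy/dx, dfy/dy))."""
    fx_xp, fy_xp = field(x + h, y, **params)
    fx_xm, fy_xm = field(x - h, y, **params)
    fx_yp, fy_yp = field(x, y + h, **params)
    fx_ym, fy_ym = field(x, y - h, **params)
    return (
        ((fx_xp - fx_xm) / (2 * h), (fx_yp - fx_ym) / (2 * h)),
        ((fy_xp - fy_xm) / (2 * h), (fy_yp - fy_ym) / (2 * h)),
    )


def classify(jac):
    """Classify a 2D steady state from the trace and determinant of its Jacobian."""
    (a, b), (c, d) = jac
    trace = a + d
    det = a * d - b * c
    if det < 0:
        return "saddle"    # unstable, but attracting along one direction
    if det > 0 and trace < 0:
        return "stable"    # attractor
    if det > 0 and trace > 0:
        return "unstable"  # repelling in all directions
    return "marginal"


# (0.5, 0.5) is a symmetric steady state of the toy system for any slope.
print(toy_field(0.5, 0.5))
for slope in (0.2, 1.0):
    print(slope, classify(jacobian(toy_field, 0.5, 0.5, slope=slope)))
```

In the toy system the symmetric state is an attractor for a shallow input-output curve and a saddle for a steep one. A saddle of this kind attracts along the diagonal and repels across it, which is the sense in which a symmetric unstable steady state permits winner-take-all competition.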
During the fixation period (Figure 4A, inset, shaded region), the firing rates of both competing selective populations are at a low, spontaneous/baseline stable state. This is represented by the network starting off and remaining at a low symmetric attractor (Figure 4A).
When the choice targets appear, a burst of input current ($I_{\mathrm{target}}$) transforms the nullclines such that there is only one attractor, which is symmetric (Figure S1A, Movie S1). This precludes any winner-take-all dynamics and allows the firing rates of both competing populations to be reliably activated to an equal, high level. After adaptation, the network settles at a symmetric attractor (Figure 4B) with a higher firing rate than that during the fixation period. The grey region of Figure 4B represents the basin of attraction of this symmetric attractor: trajectories starting in this region are attracted to it. Although additional asymmetric attractors are present, the large basin of attraction shows that the symmetric attractor is very stable. Even with noise (see the noisy trajectory in Figure 4B, dark blue), winner-take-all dynamics, and consequently any decision-making during the target period, are prevented, and both populations fire at the same rate prior to the onset of the motion stimulus (Figure 4B, inset, shaded region), consistent with experimental data [17].

So far, we have ensured that our model behaves similarly to our previous work [55,56], although we shall show later that the presence of multiple stable states during the fixation and fixation-target periods is not necessary. However, the model starts to differ from here onward. Immediately upon motion stimulus onset, the gain of the inhibitory neural population is turned on, creating a lower but nearby symmetric unstable steady state (Figure 4C, solid nullclines). Only trajectories starting on the diagonal line (called the stable manifold, see Materials and Methods, and represented by the black curve with arrows pointing towards this symmetric unstable steady state) are attracted to this unstable steady state; all others are repelled away.
Since the attractor formed after target adaptation is symmetric, i.e. on the diagonal (Figure 4B), the network starts from and moves along the diagonal line (with equal firing rates) towards this lower unstable steady state (sample trajectory in Figure 4C, dark blue). This creates an equal 'dip' in firing rates. Before the network can reach the unstable steady state, the gains of the excitatory selective populations are activated. Consequently, the symmetric unstable steady state is raised (Figure 4C, dashed nullclines), and the population firing rates increase, once again along the diagonal line (with equal firing rates). This yields the recovery from the 'dip'. Our delayed gain onset can thus create the dip phenomenon and the recovery from it (Figure 4C, inset; Figure 3A) without lowering the overall inputs to the system (as implemented in previous modelling work).
The input current due to the motion stimulus ($I_{\mathrm{motion}}$) increases the net input to the network, causing the symmetric unstable steady state (after momentarily becoming stable, see Movie S1) to be at an even higher firing rate (Figure 4D). As we shall show, the co-modulation of both excitatory and inhibitory gains is necessary to allow this symmetric unstable steady state to be at a higher activity level than the adapted target firing rate (compare with Figure 4B). After briefly moving towards the symmetric unstable steady state (sample noisy trajectory in Figure 4D, dark blue), the network eventually gets perturbed off the diagonal line. The network is then repelled away from this unstable steady state to another curve (called the unstable manifold, see Materials and Methods, and shown by the black curve with arrows pointing away from the symmetric unstable steady state) and towards one of the asymmetric 'choice' attractors. This causes the firing rates of the competing selective populations to diverge such that the firing rate of one population ramps up while that of the other ramps down, exhibiting winner-take-all behaviour and forcing a decision. Prior to reaching one of these choice attractors, a motor action (saccade) is made when the network crosses the motor/saccadic threshold (70 Hz in our case; dashed horizontal black lines in all panels and insets of Figure 4). The various epochs within a trial are summarized in Movie S1. Additionally, the network can be reset before the start of the next trial by allowing the gains to decay to a low value after the threshold is crossed (see Figure S2).

Robust and flexible decision dynamics with excitatory-inhibitory gain modulation
Having accounted for the observed neural and behavioural data, we shall now demonstrate how gain modulation of both excitatory and inhibitory neurons is necessary for flexible and robust decision-making. For simplicity, we shall consider an unbiased stimulus input, i.e. the zero motion coherence condition (see [72] for biased stimulus input).
Following [55], we will map out the range of possible stable and unstable steady states, i.e. the stability (bifurcation) diagram of the system, as a function of a variable or parameter of interest. Parameter regimes where both asymmetric attractors and symmetric unstable steady states exist are regimes of decision-making and categorical choice. Figure 5 plots the stability diagram as a function of the net (target and motion) stimulus input $I_{\mathrm{stim}} = I_{\mathrm{target}} + I_{\mathrm{motion}}$ for a single selective excitatory population in the absence (black) and presence (grey) of excitatory-inhibitory gain modulation. Each diagram consists of stable (bold lines) and unstable (dashed lines) loci of steady states. The upper and lower stable branches denote the 'choice' attractors for the winning and losing selective populations, respectively. Two of the stable loci and one unstable locus together form a continuous smooth line (labelled 'symmetric' in Figure 5) that represents the symmetric steady states along the phase-plane diagonal in Figure 4. Its stability changes from stable to unstable and back to stable (Figure 5) as the net stimulus input increases; the unstable region (double arrowheads) is where decision-making is possible. The decision-making regime with excitatory-inhibitory gain modulation (grey double arrowheads) is clearly much larger than that without gain modulation (black double arrowheads); i.e. co-modulation of excitatory-inhibitory gains can enhance the robustness of the decision-making process.
As in [56], in the absence of gain modulation, the selective excitatory populations transition from a low-activity spontaneous attractor during the fixation epoch (circle, Figure 5) to a high-activity attractor during the choice target epoch (triangle, Figure 5) due to the choice target input, without forming any decision. Later in the trial, the motion stimulus further increases the net stimulus input. Since the symmetric curve increases monotonically with net stimulus input, this further increase in the input would force the symmetric steady state further to the right in the stability diagram than during the target period. Although this state has a higher firing rate, it is stable and thus does not allow any winner-take-all dynamics or decision-making.
Previous modelling work solved this problem by reducing the input current due to the choice targets upon motion stimulus onset [56,69-71], on the assumption that covert attention is divided between the choice targets and the motion stimulus. The explicit reduction in target input compensated for the increase in input at the onset of the motion stimulus. However, this manipulation restricts the dynamic range of the neural firing rates. For example, the firing rates diverge at a level lower than the pre-motion stimulus firing activity level, contrary to some experimental findings [57,73]. Furthermore, the choice targets remain on display throughout a trial, rendering this implementation questionable. In this work, we provide an alternative, biologically plausible mechanism: modulating the gains of both excitatory and inhibitory neurons. Despite the increase in net stimulus input due to the motion stimulus, gain modulation enables a transition from the symmetric attractor during the target period (triangle in Figure 5) to a higher-activity symmetric unstable steady state during the motion period (square in Figure 5). The firing rates thereby diverge from a higher activity level during the motion period than that of the adapted target firing rate.
In addition to making the decision process more robust and more dynamic, excitatory-inhibitory modulation can also create a wider range of firing rates that can be achieved when storing a categorical choice (compare the firing rates of the black and grey upper stable branches, which represent the neural storage of choice). The higher firing rates of the grey upper stable branch more easily allow a fixed motor threshold to be crossed, i.e. a motor action to be initiated. It is precisely this flexibility that allows the mechanism to be used for other behavioural task paradigms, e.g. the FD task, as we will show later. This wide range of firing rates can also allow the threshold (currently fixed at 70 Hz) to vary more widely, adding another dimension towards a more flexible decision-making strategy for the speed-accuracy trade-off [68,74-76], provided there is a separate, independent neural mechanism to instantiate such a decision/motor threshold [77,78].
To further demonstrate the inflexibility of either excitatory or inhibitory gain modulation as opposed to their co-modulation, we plot the stability diagram of a single selective excitatory population with respect to each (excitatory or inhibitory) gain parameter. Figure 6A shows, for a fixed stimulus input, the effects on the excitatory neural population as the excitatory gain $g_E$ is varied with the inhibitory gain $g_I$ fixed at the control value of 1. As $g_E$ is increased from 1, the selective populations can transition from a regime with multiple high stable (HMS) steady states (including a symmetric stable state) to one with a very high single stable (HSS) firing activity. Similar transitions occur if we decrease the inhibitory gain $g_I$ from 1 (Figure 6B). This means that increasing (decreasing) excitatory (inhibitory) gains in isolation does not allow any winner-take-all dynamics, as both populations fire at the same level (Figure 6C).
Conversely, we may decrease (increase) excitatory (inhibitory) gains in isolation. The selective populations transition through the decision-making (DM) regime (where symmetric unstable steady states coexist with asymmetric attractors) to a single low stable branch (low single steady state, LSS). Increasing the inhibitory gain leads to an additional regime with multiple steady states including a low spontaneous stable state (LMS). The decision-making regime has a lower firing rate than that during the target period. Consequently, the firing rates of the selective populations diverge at a lower firing rate than the adapted target firing rate. Thus, a decrease (increase) in excitatory (inhibitory) gain in isolation can enable decision-making, but with a smaller dynamic range (e.g. lower firing rates) and slower decisions (Figure 6D) than found in experiments [57,73].
To more completely understand the interplay between stimulus inputs and gain modulation parameters, we extend the stability analysis of Figures 5 and 6A, B to the gain (g_E, g_I) space for different epochs (and hence overall stimulus inputs) within a trial. As observed in Figure 6B (and Figure 5), in general, there are five distinct dynamical regimes for a single excitatory selective population (LSS, LMS, DM, HMS and HSS). We can see in Figure 7 that in the gain space, the range for g_E is larger than that for g_I due to the generally steeper input-output function of inhibitory than excitatory neurons. Moreover, it should be noted that our inhibitory gain modulation is not exactly multiplicative (see Materials and Methods).
During the fixation epoch (Figure 7A), μ_0 = 0 Hz, and the state without gain modulation, (g_E, g_I) = (1, 1), lies within the LMS regime (black filled circle, Figure 7A). We have implemented the state in this regime purely to be consistent with previous modelling work [55,56]. However, a trial need not necessarily start within this regime; an alternative regime could be LSS (e.g. with a smaller g_E value). In the presence of a target stimulus with an (adapted) firing rate of 30 Hz, all the dynamical regimes generally shift upward (Figure 7B). In particular, the LMS regime becomes much smaller and is not conducive for the population state to exist there without very precise fine-tuning. A transition to the HMS regime therefore makes sense. Again, in principle, (g_E, g_I) need not be (1, 1) (black filled circle, Figure 7B), but can be anywhere within the HMS or even the HSS regime as long as the activity is high. We chose (1, 1) to fit the neural data in [57]. Note that the DM regime has significantly increased at this stage of the trial.

Figure 5. Robust decision-making regime with excitatory-inhibitory gain increase. Stability diagram of a single selective excitatory population as a function of the net stimulus input I_stim with zero motion coherence. Black: without gain modulation, g_E = 1, g_I = 1. Grey: gains increase to g_E = 3, g_I = 1.1. Solid and dashed lines are the stable and unstable steady states, respectively. Double horizontal arrows show the range where a symmetric unstable steady state (dashed symmetric curves) co-exists with asymmetric stable steady states (upper and lower stable branches). These are the dynamic ranges of decision-making under these two conditions. Circle, triangle and square represent the fitted firing rate for the net stimulus input during fixation, target and motion periods, respectively. Vertical dashed double arrows show the winner-take-all effect (from the square) during motion stimulus and gain increase, either transiting to the upper winning branch or lower losing branch. Note that with gain modulation, the upper branch is mostly higher than the 70 Hz response threshold, enabling saccade initiation to the winning direction. doi:10.1371/journal.pcbi.1003099.g005
With both choice target and motion stimulus onset (30 + 40 Hz, respectively), the dynamical regimes are only slightly altered (Figure 7C). This is because the overall input into each selective population is primarily dominated by the target input (Figure 2C). The DM regime is rather wide in both Figures 7B and C. The black filled circle in Figure 7C ((g_E, g_I) = (3, 1.1)) shows the model's fit to both neural and experimental data (Roitman and Shadlen (2002)), which is close to the transition of dynamical regimes (bifurcation point) between DM and HMS. Typically, when approaching such a boundary, the dynamics of the network are slow, and the RT distributions can exhibit long tails [72,79]. However, if the gains continuously increase over time (creating a form of 'urgency' signal), we can curb such behaviour (Figure 3D), which is not observed in [57] (see Figure 3E; [15,65]).
Overall, Figure 7 summarizes our analyses and shows that our model's gain parameters are robust and insensitive to small perturbations, and yet tightly constrained by both neural and behavioural data. In particular, Figure 7 shows that for any afferent input, an increase in inhibitory gain alone can lead to more robust dynamical regimes than an increase in excitatory gain alone. However, the firing rate of the symmetric unstable steady state would become too large (~85 Hz, see Movie S2) and would not fit the experimentally observed divergence point of firing rates [57]. The robustness of the dynamical regimes can be further and continuously enhanced by an appropriate increase in both excitation and inhibition, i.e. a larger increase in excitatory than inhibitory gains.

Figure 6. [...] show direction of change as g_E or g_I varies, respectively. Vertical dashed lines partition regimes of g_E in (A) and g_I in (B), respectively. A regime can have a single symmetric stable steady state, which is either low (LSS) or high (HSS), or multiple stable steady states: one symmetric and two asymmetric, with two asymmetric unstable steady states. The symmetric steady state can be low (LMS) or high (HMS). Or it may have a symmetric unstable steady state with asymmetric stable and unstable steady states. This constitutes the decision-making (DM) regime. (C,D) Sample activity timecourses showing either no winner-take-all behaviour (C) or divergence at low firing rates (D), when the excitatory (inhibitory) gain is increased (decreased) in isolation (C), or when the excitatory (inhibitory) gain is decreased (increased) in isolation (D), respectively. doi:10.1371/journal.pcbi.1003099.g006

Low gains in a cued response task with fixed viewing duration (FD) and a delay period
Comparison between model and experiment.
Unlike the RT task, in a cued-to-response version of the decision task with a fixed viewing duration (FD), participants may not need to search for a trade-off between response time and accuracy, and can emphasize only the latter [57,80]. We hypothesize that such FD tasks require less cognitive effort, and hence smaller (one quarter) gain modulation values in our model than in the RT task. Our fitted gain parameters are selected to be within the decision-making regime (Figure 7C, open circle).
In the FD task experiments, the neural firing rates during the 1 s motion stimulus epoch are generally lower than in the RT task (Figure 1C,D). Furthermore, the firing rates also diverge at a lower level in the FD task. In fact, this divergence can even take place at a lower activity level than the (adapted) target firing rate [18,80]. Most importantly, the firing rates in the FD task are observed to diverge during motion presentation but are maintained at a low activity level during the delay period. Our model with low gain modulation can account for these neuronal effects (Figure 8A, B) of [57]. The lower gains cause slower ramping of neuronal activity (compare with Figure 3A). Although the fixed viewing duration of 1 s would allow sufficient time to integrate sensory information with higher gains, it does not with lower gains. Thus, decisions are less accurate in the FD task than in the RT task (Figure 8B inset). This is also consistent with the account in [59].
In Figures 8A and B, we can see that during the motion stimulus period, although the firing rates have already diverged and thus the decisions have been formed, the activities are maintained at levels lower than the response threshold of 70 Hz. This is achieved by having lower excitatory and inhibitory gains. Thus, in the FD task, it is clear that the threshold is actually a motor response threshold rather than a decision threshold per se; the decision is already formed during motion stimulus presentation prior to the delay period and response cue. When the cue to respond is presented, the gains are increased to values as in the RT task. The firing rate of the winning population rapidly crosses the threshold and a corresponding saccade is initiated.
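The distinction between forming a decision and crossing the motor threshold can be illustrated with a minimal sketch. The function name, the time grid and the firing-rate values below are hypothetical and are not taken from the model simulations; only the 70 Hz threshold comes from the text.

```python
def first_crossing_time(times_ms, rates_hz, threshold_hz=70.0):
    """Return the first time at which the rate reaches the threshold, or None."""
    for t, r in zip(times_ms, rates_hz):
        if r >= threshold_hz:
            return t
    return None


# Hypothetical winning-population trace (illustrative values only): the
# activity stays below threshold during the delay, then ramps up once the
# gains are raised at the response cue.
times = [0, 100, 200, 300, 350, 400]
rates = [20.0, 45.0, 50.0, 50.0, 65.0, 80.0]
crossing = first_crossing_time(times, rates)
```

In this toy trace the choice is already separated from baseline well before the threshold is reached, which mirrors the interpretation of the threshold as a motor rather than a decision threshold.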

Strong recurrent excitation is not necessary for forming and storing decisions
Previous work has suggested that strong recurrent excitation is necessary for the formation of a decision and its maintenance in working memory during a delay period [55,59]. In the absence of gain modulation, with weak recurrent excitation (w_+ ≤ 1.6), the stability diagram with respect to the net stimulus input current is shown in Figure 9A (black trace). Since there is neither an unstable steady state nor choice attractors, the network is incapable of forming or maintaining decisions (Figure 9B). However, with excitatory and inhibitory gain enhancements, for the same weak recurrent excitation and range of net stimulus input current, the network can have a symmetric unstable steady state (Figure 9A, dark grey trace) and can still perform very similar functions of decision formation and storage (Figure 9C) as in Figure 8A. We have used a higher excitatory gain, g_E = 1.8, compared to g_E = 1.1 in Figure 8A, while maintaining g_I = 1.06 in both cases.
Although there is no intrinsic hysteresis in its stability diagram (i.e. no LMS), the network can still sustain its decision throughout the delay period (with the removal of motion stimulus input) as long as the target stimulus remains (which is the case in [57]). Upon response cue onset, the excitatory and inhibitory gains are increased (g_E = 5, g_I = 1.1), so that the upper stable branch, i.e. that of the winning 'choice' attractor, has firing rates that are higher than the motor decision threshold (Figure 9A, light grey trace), enabling the crossing of the saccadic threshold (Figure 9C).
Thus, we have demonstrated that with co-modulation of excitation and inhibition, strong intrinsic recurrent excitation is not necessary for decision formation and storage. This can be explained heuristically by first noting that the input due to recurrent excitation I is proportional to w_+ g_E r, where r is the (pre-synaptic) firing rate. Due to the multiplicative nature of these parameters and variables, a high excitatory gain g_E can compensate for weak recurrent excitation w_+.
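The heuristic can be made concrete with a short numerical sketch. It considers only the product w_+ g_E r and ignores inhibition and the non-linear input-output function; the "strong" weight of 1.8 and the presynaptic rate of 20 Hz are hypothetical values chosen for illustration, whereas w_+ = 1.6 and g_E = 1.8 are the values quoted in the text.

```python
def recurrent_drive(w_plus, g_e, rate_hz):
    """Quantity proportional to the recurrent excitatory input: w_+ * g_E * r."""
    return w_plus * g_e * rate_hz


rate = 20.0  # hypothetical presynaptic firing rate (Hz)

# Hypothetical strongly recurrent network without gain modulation.
strong_no_gain = recurrent_drive(w_plus=1.8, g_e=1.0, rate_hz=rate)

# Weakly recurrent network (w_+ = 1.6) with the excitatory gain from the text.
weak_with_gain = recurrent_drive(w_plus=1.6, g_e=1.8, rate_hz=rate)

# Smallest excitatory gain at which the weak network matches the strong one.
needed_gain = 1.8 / 1.6
```

Under these assumptions a gain of about 1.125 would already equalise the two products, so g_E = 1.8 more than offsets the weaker recurrent weight.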

Optimal decision performance with fast recruitment of gains
We have so far been assuming a single time constant of gain modulation. However, this fitted time constant may not necessarily be the optimal time constant for the tasks which we have discussed.
In particular, an RT task involves a speed-accuracy tradeoff: slow RTs are more accurate while fast RTs may lead to more errors [54,81]. Since only the correct trials are rewarded and error trials are penalised with a lengthened trial duration, performance can be measured by the average reward rate, which can be quantitatively defined as the total number of correct trials divided by the total time spent in a block or multiple blocks of trials. Thus, a form of optimal performance in an RT task would require maximizing the reward rate. The duration of a trial depends not only on the subject's RT but also on the experimental task design (e.g. inter-trial interval, and various other temporal delays). In particular, a trial in [57] can consist of several temporal delays, each contributing to the trial duration [15]: (i) between appearance of the fixation point and the monkey achieving stable fixation; (ii) before the appearance of choice targets; (iii) before the motion stimulus was presented; (iv) the recorded RT of the monkey; (v) saccade duration; (vi) a possible delay before reward was provided depending on whether the choice was correct; and finally, (vii) an inter-trial interval before the reappearance of the fixation point. When calculating the overall time spent within a trial, we follow the procedure in [15]. The average trial duration (TD) is as follows.
On correct trials, the trial duration TD_cor (Eq. (6)) accounts for the fact that the subjects had to wait a minimum time after the onset of the motion stimulus before reward was delivered. On error trials, the trial duration TD_err (Eq. (7)) takes into account the timeout following error choices and the RT-dependent timeout that was imposed to prevent impulsive guesses. For simplicity, we do not include trials on which fixation was broken without an immediate saccade being made to a choice target. These were rare in the experiment of [57] and not included in the data set used in their publication (Jamie Roitman, personal communication). The mean reward rate for a trial of a particular coherence is given by

\[
\text{Reward Rate} = \frac{\text{Fraction of correct choices}}{\text{Mean trial duration}}. \qquad (8)
\]
The mean trial duration for a particular coherence is the weighted mean of the average trial durations for correct and error trials. Since all six coherences used were uniformly distributed within each block, the overall fraction of correct choices was simply the arithmetic mean of the fraction of correct choices for the individual coherences. Similarly, the overall mean TD was the mean of the mean trial durations for the six coherences.
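The averaging procedure described above can be sketched as follows. This is an illustrative implementation under the stated definitions (Eq. (8), weighted mean trial duration per coherence, arithmetic means across equally frequent coherences); the function names are ours, and the per-coherence trial durations would have to come from Eqs. (6) and (7), which are not evaluated here.

```python
def mean_trial_duration(p_correct, td_correct_s, td_error_s):
    """Weighted mean of the average correct- and error-trial durations."""
    return p_correct * td_correct_s + (1.0 - p_correct) * td_error_s


def overall_reward_rate(p_correct_by_coh, td_correct_by_coh, td_error_by_coh):
    """Overall reward rate (rewards per second) for equally frequent coherences."""
    n = len(p_correct_by_coh)
    mean_tds = [
        mean_trial_duration(p, td_c, td_e)
        for p, td_c, td_e in zip(p_correct_by_coh, td_correct_by_coh, td_error_by_coh)
    ]
    overall_p = sum(p_correct_by_coh) / n   # arithmetic mean of accuracies
    overall_td = sum(mean_tds) / n          # mean of the mean trial durations
    return overall_p / overall_td
```

For instance, with two hypothetical coherences of accuracy 0.5 and 1.0, correct-trial durations of 2 s and error-trial durations of 4 s, the mean trial durations are 3 s and 2 s, and the overall reward rate is 0.75 / 2.5 = 0.3 rewards per second.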
In order to understand how fast gain modulation should be recruited in an RT task, we calculated the overall mean reward rate as a function of the time constant of gain modulation τ_g (Figure 10A). We find that the optimal time constant of gain modulation operates at a relatively short timescale with τ*_g ~ 190 ms (Figure 10A inset), for which accuracy is maximized while the mean TD is minimized (Figure 10B, C, respectively). This short time constant indicates the need for the fast recruitment of gain modulation in order to maximise reward rate. Our fitted time constant of 120 ms suggests that although monkeys in the experiments are not performing optimally, they are not too far from optimality. In particular, recruiting gain with a time constant of 120 ms would cost an average reward rate of 0.33 rewards per minute.
Modelling the behavioural data with this optimal time constant produces left-shifted psychometric curves with a coherence threshold α*_RT = 5.19% and a slope β*_RT = 1.33, corresponding to a better performance than the experimental data and our model's fit to it (Figure 11A, upper panel). On the other hand, chronometric curves are shifted upward, revealing slower responses when compared to the experiment and our model's fit (Figure 11A, lower panel), a consequence of the speed-accuracy tradeoff.
When calculating this optimal gain time constant, we have assumed that subjects recruit gain modulation with the same timescale throughout multiple blocks of trials, irrespective of task difficulty (motion coherence). However, if coherences are known, higher reward rates may be achieved by the optimal recruitment of gain modulation for each individual coherence [82]. This could be possible for experiments in which the coherence is fixed within a block. Then the optimal gain time constant decreases with coherence, except for zero motion coherence, where it is near our model's fit of τ_g = 120 ms (Figure 11B). For lower non-zero motion coherences, it is longer than the single optimal gain time constant, whereas for higher coherences it is shorter. Since the optimal gain time constants are larger for lower coherences (c = 3.2% and 6.4%), a longer duration of evidence accumulation becomes possible, and thus RTs are lengthened (Figure 11A, lower panel) and accuracies are improved (Figure 11A, upper panel).

Figure 9. Weak recurrent excitation in the FD task. (A) Stability diagram for a single selective excitatory population as a function of net stimulus input current I_stim, with weak recurrent excitation (w_+ ≤ 1.6) in the absence of gain modulation (g_E = g_I = 1), with low gains (g_E = 1.8, g_I = 1.06) and large gains (g_E = 5, g_I = 1.1). Solid and dashed lines show stable and unstable steady states, respectively. (B) Without any gain modulation, the network can neither perform decision-making nor store decisions. (C) With sufficient gain modulation, the network can form a decision and store it below a motor threshold (dotted horizontal line at 70 Hz). Upon the cue to respond, a higher set of gains allows the threshold to be crossed and a saccade to be made. doi:10.1371/journal.pcbi.1003099.g009
This task-difficulty dependent gain modulation strategy has a reward rate (averaged over the motion coherences) that is higher than that of the previous task-difficulty independent gain modulation strategy. In fact, this difficulty-dependent strategy can lead to an average reward rate of 9.66 rewards per minute, which is 0.25 rewards per minute greater than the reward rate using a single time constant of gain modulation (Figure 11C) throughout multiple blocks of trials.
The optimal time constants are applicable when we use the realistic temporal delays of [57]. These impose temporal penalty delays to discourage the monkeys from responding impulsively. Eq. (7) shows that fast, erroneous responses lengthen the error trial duration (TD_err) more than slow ones, according to the exp(−RT/1000 ms) term (see the dashed curves in Figure S3A, lower panel). Although an erroneous response in the 0% coherence condition is theoretically ill-defined, responses for this condition in the experiment were randomly (with probability 0.5) deemed erroneous (see Figure S3A, middle panel) and thus also incurred the error delays. Since accuracy in the 0% coherence condition is 0.5, TD_err and TD_cor are equiprobable. The reward rate (see Eq. (8)) for responses that are too fast is therefore lower (see Figure S3A, upper panel). Furthermore, the minimum trial duration (see Eq. (6)), even for correct responses, also discourages fast responding. However, if only RTs plus a fixed inter-trial interval (Figure S3B, lower panel), instead of TD, were used to maximize reward rate [81], then the optimal time constant for 0% coherence would indeed approach 0, i.e. it would be optimal to respond randomly and immediately (Figure S3B, upper panel).
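The last point can be checked with a minimal sketch. It assumes that accuracy at 0% coherence is 0.5 regardless of RT and that the trial duration is simply RT plus a fixed inter-trial interval; the 1 s interval and the RT grid are hypothetical values chosen for illustration.

```python
def reward_rate_fixed_iti(rt_s, accuracy=0.5, iti_s=1.0):
    """Reward rate when trial duration is RT plus a fixed inter-trial interval."""
    return accuracy / (rt_s + iti_s)


# Hypothetical grid of candidate reaction times (seconds).
rts = [0.1 * k for k in range(1, 21)]

# Accuracy does not depend on RT at 0% coherence, so the reward rate falls
# monotonically with RT and the fastest response maximises it.
best_rt = max(rts, key=reward_rate_fixed_iti)
```

With an RT-dependent error timeout such as that of Eq. (7), the denominator would no longer grow monotonically from its smallest value at the fastest RT, which is why the optimum shifts away from immediate responding in that case.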

Discussion
Top-down cognitive control such as attention has been suggested to form an integral component in perceptual decision-making [30]. Our current study is inspired by the findings in [52], which show that attention can induce gain modulation of both excitatory and inhibitory neurons, and also by those of [53], which suggest that attention can have a time-varying nature. However, there has not yet been any study on how time-dependent gain modulation of both excitatory and inhibitory neurons can affect decision dynamics and performance.
Specifically focusing on the two highly-studied reaction time (RT) and fixed delay (FD) decision task paradigms, we use both computational simulations and dynamical systems analysis of a biologically inspired decision-making model to address this issue. In our study, we have shown that simultaneous dynamic gain modulation of both excitatory and inhibitory neurons is capable of reproducing the experimentally observed dynamic range of neural activities throughout an entire trial. Our model is able to robustly reproduce realistic temporal dynamics including the signature dip phenomenon in the firing rates (shortly after motion stimulus onset) without artificially lowering the overall stimulus input as implemented in previous modelling work [56,69,70,71]. Interestingly, there is some evidence to show that this dip is possibly related to lateral inhibition in the neural circuit [83]. We are also able to replicate the behavioural data of the monkey experiments, including slow mean and short-tailed RT distributions even when the network is operating near a dynamical bifurcation point (Figures 5 and 7C).
Without specifying any particular neural mechanism, our excitatory-inhibitory gain modulation can allow the same local cortical circuit to flexibly adapt over time to different decision-making task demands. By adopting higher gains in the RT than in the FD task, we were able to capture not only the behavioural data as in previous models [55,56,59] but also better replicate the neuronal activity timecourse of recorded LIP neurons. In particular, our model suggests that the presumed decision threshold in the FD task could actually be more of a motor activation threshold than an actual decision threshold: the decision is already made during the stimulus presentation. Our weaker gain implementation is based on the hypothesis that an RT task, which requires optimizing a speed-accuracy trade-off, is more cognitively demanding than the FD task.
When both excitatory and inhibitory neuronal gains are co-modulated, our model becomes more robust to small changes with respect to the stimulus input (Figure 5). Furthermore, Figure 7 shows that decision-making computations are robust when both gains are increased by an appropriate amount. From a broader perspective, this adds further support to our previous work that modulation of both excitation and inhibition is necessary to produce robust decision-making without sacrificing optimal decision performance [72,84]. In a more realistic setting, decision-making is usually influenced by a multitude of (e.g. sensory) information through modulation of the neuronal firing rate during temporal integration [56,73]. Thus, it may be important to have a decision network operating with a larger capacity to allow more potentially useful information to be stored during sensory integration. Such higher information storage capacity or larger decision bandwidth has recently been investigated in other contexts [71,85]. This decision bandwidth was previously shown to be relatively small with weak recurrent excitation (small w_+) [55].
In the absence of gain modulation, weak recurrent excitation can lead to decisions not being stored in working memory (owing to an inability to sustain neural firing activity), and can prevent the subsequent motor action from being triggered (owing to neural firing activity which is lower than the motor threshold) [55]. With even weaker recurrent excitation, a decision may not be formed at all [55]. This could impede performance in the FD task, which requires working memory during a delay period [84]. Our work here demonstrates that enhanced excitatory and inhibitory gains can compensate for such weak recurrent excitation to make decisions, and even store them in working memory.
Our gain modulation mechanism is thus more flexible than conventional decision-making models [55,56,59]. Incidentally, Figure 9C is comparable to experimental data in [18,80]. Recent experimental evidence has shown that parietal cortical neurons without persistent activity can still show some form of decision-making (winner-take-all) capabilities [86], thus supporting our proposed mechanism (see Figure 9A). Furthermore, by reducing the gains in our model after threshold crossing, we can also easily clear any storage of decisions, resetting the system towards a low single stable state (LSS) at the end of a task trial (Figure S2), as observed in many experiments [17,57,73,80]. This deviates from previous computational work, which requires negative current [87] or transient synchronized firing [88] to reset the system. Clearly, to inform such post-decision shutting down of gains, a different neural circuit may be necessary, and the basal ganglia may be a putative candidate [77,89].
Previous decision-making models have also incorporated time-varying gain modulations or urgency signals to allow flexible reconfiguration in the model dynamics, or in some cases, attempted to capture the characteristics of (especially LIP) neuronal firing rates throughout a decision-making trial [6,13,14,15,16,17,21,54,55,56,65,69,70,71,82,89,90]. Although the models that incorporated urgency signals [15,16,21,54] share some similarities with our model, their instantiations differ in distinctive ways. Specifically, in [15], the urgency signal multiplies the instantaneous evidence (i.e. drift rate and noise in a drift-diffusion process), while in [21] the decision bound or threshold generally decreases over time. In a two-choice task, these two mechanisms are equivalent [16]. The work in [54] resembles our model the most, in which the slope of the input-output function of the population firing rate is modulated over time and the decision network transits from a leaky to a competitive integrator (see also similar discussions in [91,92]), whereas our gain modulation mechanism is more multiplicative (see Eq. (1)). Generally, these models are not as biologically grounded as our model, which was previously reduced from a spiking neuronal network model [55,56]. Our more realistic model can be directly compared with, and qualitatively accounts for, the full dynamical range of the LIP neuronal activity throughout a trial (e.g. compare with [57,73,80]). In addition, as in [15], our model uses realistic temporal delays and replicates the reaction time distributions in [57]. Finally, the most important distinctive feature of our model is that it incorporates the gain modulation of the inhibitory neural population for flexible and robust decision-making, which none of the previous modelling work has investigated.

Figure 11. [...] Comparing optimal τ_gc for each individual motion coherence (triangles, dash-dotted curve) with model fit (τ_g = 120 ms, black bold line) and single optimal (τ*_g = 190 ms, dashed line). (C) Comparing RR for optimal τ_gc for each individual coherence with that of model fit (τ_g = 120 ms) and single optimal (τ*_g = 190 ms). Dotted: average of the RR with τ_gc over all motion coherences. doi:10.1371/journal.pcbi.1003099.g011
Our work also demonstrates, for the first time, how dynamic excitatory-inhibitory gain modulation in a biologically realistic model can give rise to optimal decision performance. Using realistic temporal delays from the experiment of [57], we found that our model's excitatory-inhibitory gain modulation timescale (τ_g) that maximizes reward rate (RR) in an RT task is not far from the value we obtained when fitting both the neural and behavioural data. This suggests that the monkeys may be performing not far from optimality, although a more in-depth study similar to [93] may be required to further support this claim. Interestingly, we found that the fitted gain time constant resides slightly on the left and steeper side of the RR-vs-τ_g curve (Figure 10A), i.e. τ_g < τ*_g. It would have been less cognitively demanding for the subject to operate around the shallower right side of the curve, where the cost in RR is smaller. The optimal gain time constant found in our model may not be the true value, but simply the optimal value given all other parameter settings. Precise experimental verification will be required to measure and disentangle the contributions of different parameters. Alternative model parameter settings, such as a higher noise level or a different MT neuronal firing output (e.g. weaker dependence on motion coherence in the anti-preferred direction), would lead to different optimal gain time constants (Figures S4, S5). However, changing these would introduce additional free model parameters. In this work, we prefer to maintain the model parameters based on previous work, especially since the main focus of our work concerns the flexibility and robustness afforded by the co-modulation of excitatory and inhibitory gains. It is however interesting to note that the fast 120 ms time constant we used to fit the data is similar to that of the urgency signal deduced in [21] from the same dataset [57].
The first part of our optimal performance study is based on the assumption that subjects employ the same 'strategy', and thus a fixed timescale of gain modulation, throughout the whole block of trials. However, in experiments in which coherence is fixed within a block [76,81] and can be determined, adjustment of gains based on task difficulty is the optimal strategy in the RT task. We confirm this in the second part of our optimal reward study, which shows that by strategizing the timecourse of gain recruitment (e.g. via rapid feedback) for different task difficulties (e.g. motion coherences), a higher optimal RR can be achieved. The main contribution to the higher RR comes from the higher motion coherences (easier tasks). The task-difficulty dependent recruitment of gains, however, may not be practically adopted by subjects. It is plausible that this flexible, more optimal strategy may be too demanding to be practically implemented. The slight increase in RR may not be worth the cognitive effort ([76] make a similar argument in the context of response threshold modulation instead of gain modulation). Furthermore, our optimal time constants are applicable when temporal delays such as those in [15,57] are used to penalise fast errors more than slower ones.
Since in this case, fast errors reduce the reward rate more, the optimal time constant of gain-modulation gets skewed towards a longer time. Not including such temporal penalisations, but using a fixed inter-trial interval leads to much shorter optimal time constants of gain modulation (see Figure S3B upper panel). Assuming that the subject knows the coherence, the optimal time constant for the 0% coherence condition then approaches 0, since the subject can receive more trials and increase its reward rate by guessing as fast as possible. Provided subjects are attempting to maximize reward rate, the optimal time constant should be much shorter on tasks that include a fixed inter-trial interval (e.g. [81]) and longer on those which penalise (either in time or through explicit punishments) fast errors.
Finally, it would be interesting to test our proposed simultaneous excitatory-inhibitory gain modulation mechanism in behaving animals performing perceptual decision-making tasks. For example, the task can include a fraction of trials with some cue to capture the subject's attention, while putative excitatory and inhibitory neurons are recorded. The neural activities and behavioural performance could then be compared between trials with or without the attentional cue.

Materials and Methods
Following [55] and [56], the non-linear input-output function of a single noisy excitatory cell is approximated from the first-passage time input-output relation of a leaky integrate-and-fire neuron [46,55,60].
where I is the total synaptic input current to a neuron, and i = L or R, denoting selectivity to a leftward or rightward motion stimulus, respectively. We follow the fitted parameters of a = 270 Hz/nA, b = 108 Hz and d = 0.154 s as in [55]. τ_ref = 2 ms is the absolute refractory period of the neuron, although our results are similar if we ignore this term. For the inhibitory interneuronal population, we assume that f_I(I_I) is linear so that it can be implicitly embedded in the reduced two-variable model for analysis [55]. This yields f_I(I_I) [...]. Note that J_ij,eff < 0 is required to allow competition via effective mutual inhibition between the two selective excitatory populations. This means that J_ii > [...]. To allow all defined J's to be > 0, we shall replace J_ij,eff with −|J_ij,eff|. To simplify the notation, we shall henceforth remove the label "eff". Note that this effectively makes our implementation of inhibitory gain strictly non-multiplicative, although the results would be similar if it were; in fact, multiplicative gain modulation is sufficient but not necessary for producing our results [91]. I_noise in Eq. (2) is assumed to be primarily filtered by fast AMPA receptors (with decay time constant τ_noise of 2 ms) via an Ornstein-Uhlenbeck process [55,94], where σ_noise is the standard deviation of the noise and η is Gaussian-distributed white noise with zero mean and unit variance.
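For orientation, the excitatory input-output relation can be sketched in its unmodulated form. The expression used below, f(I) = (aI − b) / (1 − exp[−d(aI − b)]), is the form used in the reduced model of [55] as we understand it; it is an assumption here, and it deliberately omits the excitatory gain and refractory terms of Eq. (1). The parameter values are those quoted in the text.

```python
import math

A_HZ_PER_NA = 270.0  # a, from the text
B_HZ = 108.0         # b, from the text
D_S = 0.154          # d, from the text


def f_io(current_na):
    """Unmodulated firing rate (Hz) for a total input current in nA.

    Assumed form: (a*I - b) / (1 - exp(-d*(a*I - b))), without the gain and
    refractory terms of Eq. (1).
    """
    x = A_HZ_PER_NA * current_na - B_HZ
    if abs(x) < 1e-9:
        # Removable singularity at a*I = b: the limit of x / (1 - exp(-d*x)) is 1/d.
        return 1.0 / D_S
    return x / (1.0 - math.exp(-D_S * x))


rate_at_half_na = f_io(0.5)  # roughly 27 Hz under these assumptions
```

The function is smooth and strictly increasing: it approaches zero for strongly negative aI − b and becomes approximately linear, with slope a, for large inputs.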

Simulations and stability analyses
We performed noisy simulations of our model in MATLAB using a forward Euler-Maruyama numerical scheme [95] with a time-step of $dt = 0.1$ ms. Smaller time steps do not affect our results. In order to compute the average firing rates and behavioural statistics, we performed 5000 trials of noisy simulation for each set of model parameters.
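The forward Euler-Maruyama update for the noise current can be sketched as follows. This is not the MATLAB code used for the simulations. It assumes the Ornstein-Uhlenbeck form of [55], $\tau_{noise}\, dI_{noise}/dt = -I_{noise} + \eta(t)\sqrt{\tau_{noise}}\,\sigma_{noise}$, and the value of `sigma_noise` is a placeholder rather than the fitted value. The function name and arguments are hypothetical.

```python
import math
import random


def simulate_ou_noise(n_steps, dt=1e-4, tau_noise=2e-3,
                      sigma_noise=0.02, seed=0):
    """Forward Euler-Maruyama integration of an OU noise current.

    dt and tau_noise are in seconds (0.1 ms and 2 ms, as in the text);
    sigma_noise (nA) is an assumed placeholder value.
    """
    rng = random.Random(seed)
    current = 0.0
    trace = []
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)  # unit-variance Gaussian sample
        drift = (-current / tau_noise) * dt
        diffusion = sigma_noise * math.sqrt(dt / tau_noise) * xi
        current += drift + diffusion
        trace.append(current)
    return trace


if __name__ == "__main__":
    trace = simulate_ou_noise(200000)
    mean = sum(trace) / len(trace)
    std = math.sqrt(sum((x - mean) ** 2 for x in trace) / len(trace))
    print("sample std (nA):", std)
```

For this assumed form, the stationary standard deviation of the continuous-time process is $\sigma_{noise}/\sqrt{2}$. Note that the stochastic term scales with $\sqrt{dt}$ rather than $dt$, which is what distinguishes the Euler-Maruyama scheme from a deterministic Euler step.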
The activity timecourse of our model can be best understood by analysing its dynamics on the two-dimensional state/phase space called the phase plane (see Figure S1). We set both dynamical equations (Eq. (5)) to $dS_i/dt = 0$ and solve for the $S_i$'s, where $i = L, R$. The solutions for each equation (called a nullcline) can be plotted in the $(S_R, S_L)$ phase plane. Since the firing rates $r_i = S_i / [\gamma \tau_S (1 - S_i)]$ are monotonic functions of the average gating variables $S_i$, transforming to $(r_R, r_L)$ coordinates yields the same qualitative dynamics.
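The change of coordinates from gating variables to firing rates can be sketched as below. The values of $\gamma$ and $\tau_S$ are assumptions taken from the reduced model of [55], not from this section, and the function names are hypothetical.

```python
# Assumed constants (reduced model of [55]); see Table 1 for fitted values.
GAMMA = 0.641  # dimensionless
TAU_S = 0.1    # s


def rate_from_gating(s):
    """Steady-state firing rate (Hz) for a gating variable 0 <= s < 1."""
    if not 0.0 <= s < 1.0:
        raise ValueError("gating variable must lie in [0, 1)")
    return s / (GAMMA * TAU_S * (1.0 - s))


def gating_from_rate(r):
    """Inverse map: gating variable corresponding to a firing rate r >= 0."""
    k = GAMMA * TAU_S * r
    return k / (1.0 + k)


if __name__ == "__main__":
    for s in (0.1, 0.3, 0.5, 0.7):
        print(s, rate_from_gating(s))
```

Because the map is strictly increasing and invertible on $[0, 1)$, nullcline intersections and their ordering are preserved when the phase plane is redrawn in firing-rate coordinates.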
Intersections of the nullclines give the steady states (fixed points) of the network. For our purposes, these steady states can be stable, i.e. point attractors (Figure S1A), or (semi-)unstable, i.e. saddle points (Figure S1B). In a noiseless system, trajectories near an attractor will move towards it (the local velocities, which together form the vector field, are shown by the length of the arrows in Figure S1A). For an unstable steady state, only trajectories starting on a unique curve are attracted into it. This curve is called the stable manifold of the unstable steady state. Trajectories on all other parts of the plane are repelled away from it to another associated curve, called its unstable manifold [55,96].
The loci of all steady states (stable and unstable) as a function of a parameter yield a stability (bifurcation) diagram. Phase-plane and stability (bifurcation) analyses of our two-variable network model were done using XPPAUT [97].
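The distinction between attractors and saddle points described above can be illustrated with a linearization around a steady state. The sketch below is not the XPPAUT procedure used for the analyses. It classifies a fixed point of a generic two-variable system from the trace and determinant of its Jacobian, and the example Jacobian entries are hypothetical numbers chosen only to mimic two populations coupled by mutual inhibition.

```python
def classify_steady_state(j11, j12, j21, j22):
    """Classify a fixed point of a 2D system from its Jacobian entries.

    A negative determinant means eigenvalues of opposite sign (saddle).
    A positive determinant with negative trace means both eigenvalues
    have negative real part (attractor).
    """
    trace = j11 + j22
    det = j11 * j22 - j12 * j21
    if det < 0:
        return "saddle"
    if det > 0 and trace < 0:
        return "attractor"
    if det > 0 and trace > 0:
        return "repeller"
    return "non-hyperbolic"


if __name__ == "__main__":
    # Hypothetical symmetric Jacobians: self-decay on the diagonal,
    # effective mutual inhibition off the diagonal.
    print(classify_steady_state(-1.0, -0.5, -0.5, -1.0))  # weak inhibition
    print(classify_steady_state(-1.0, -2.0, -2.0, -1.0))  # strong inhibition
```

In the symmetric example, the eigenvalue along the diagonal (equal rates) stays negative, while the eigenvalue along the anti-diagonal changes sign once the cross-coupling exceeds the self-decay. This is the direction along which the firing rates of the two populations diverge.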

Constraining model parameters
Apart from the new parameters mentioned in Table 2, we maintained the parameters as in previous work, as reported in Table 1. These parameters were used to provide a qualitative rather than quantitative fit to the neural and behavioural data, and we simulated predictions at a range of parameter values until the desired fits were isolated. In order to constrain our new parameters, we first ensured that our excitatory and inhibitory gain maximal amplitude parameters ($g_{0E}$ and $g_{0I}$) were in the dynamical regime that allowed decision-making (Figure 7). We then fitted the excitatory and inhibitory gain onset times $\tau_{gE}$, $\tau_{gI}$ to replicate the characteristic dip in neural firing rates at motion onset. Finally, we fitted the gain modulation time constant $\tau_g$ to the behavioural (reaction time and accuracy) data, while ensuring that the neural activities were not unrealistically low.

Supporting Information
Figure S1 Two-dimensional state/phase space called the phase plane. The firing rate $r_L$ of the population selective towards leftward (L) motion is plotted against the firing rate $r_R$ of the population selective towards rightward (R) motion. Orange and green curves represent nullclines, where $dS_L/dt = 0$ and $dS_R/dt = 0$, respectively. The synaptic gating variables $S_{L/R}$ are transformed to firing rates, preserving the same qualitative dynamics. Intersections of the nullclines yield the steady states of the system. (A) A symmetric stable steady state (symmetric attractor). Trajectories starting near this steady state are attracted into it, with local velocities given by the arrows. This set of arrows is called the vector field. The set of all starting points for trajectories attracted into this attractor is called its basin of attraction. In this figure this is the entire phase plane. (B) A symmetric unstable steady state, called a (symmetric) saddle point. Only trajectories starting on a unique curve (shown in light blue) are attracted into it. This curve is called the stable manifold of the unstable steady state. Trajectories on all other parts of the plane are eventually repelled away from the unstable steady state, to another curve (shown in yellow), called its unstable manifold. There are also two asymmetric attractors. The stable manifold of the unstable steady state separates the basins of attraction of these two attractors.
(TIF)
Figure S2 Post-decision shutdown. After the firing rate of one of the selective populations has crossed the motor threshold (70 Hz) for saccade initiation, the gains of both excitatory and inhibitory neurons are allowed to decay towards 0. (A) Phase plane at the end of a trial. The firing rate $r_L$ of the population selective towards leftward (L) motion is plotted against the firing rate $r_R$ of the population selective towards rightward (R) motion. Orange and green curves represent nullclines, where $dS_L/dt = 0$ and $dS_R/dt = 0$, respectively. Only a low, symmetric attractor is present. Post-decision, trajectories start from either the upper left or the bottom right, and move towards this attractor, with local velocities shown by the arrows. (B) As a result, the firing rates of both winning and losing populations are reset to baseline before the start of the next trial, as observed in experiments. (TIF)
Figure S3 Task-dependent optimal timescale of gain modulation for maximizing reward rate for individual coherences. (A) Using the realistic temporal delays from Roitman and Shadlen (2002). (B) In a reaction time task with a fixed intertrial interval (ITI). Upper, middle and lower panels show reward rate (RR), accuracy, and trial duration (A) or RT + ITI (B). The ITI shown here is 1500 ms. Solid and dashed curves in lower panels show correct and error trials, respectively. Black arrows in upper panels show how the optimal time constant of gain modulation changes with increasing coherence. (TIF)
Figure S4 Shallower slopes in relation to motion coherence in the anti-preferred direction lead to slightly longer reaction times and poorer accuracy. (A) Timecourse of input currents for equal slopes (black) and 2 times shallower slopes (red) of input current out of the response field, in relation to motion coherence.
(B) Activity timecourse of the model, averaged over multiple trials, with different motion coherences for (left) equal slopes and (right) a 2 times shallower slope of input current in relation to motion coherence in the anti-preferred direction. Response threshold at 70 Hz; compare with Figure 1. (C,D) Accuracy (C) and mean RT (D) generated by the model and in the experiment of [57]. The model with equal slopes (see also Figure 3) and a modified version with a 2 times shallower slope of input current in relation to motion coherence in the anti-preferred direction (red) are shown. (E) Network model performance as a function of the time constant of gain modulation $\tau_g$ using equal slopes (black) as in the main manuscript (see Figure 10) and using a 2 times shallower slope (blue) of input current in relation to motion coherence in the anti-preferred direction. Upper panel: mean reward rate (RR); middle panel: accuracy; lower panel: mean trial duration (TD). Dashed horizontal lines show our model's fit to the data with equal slopes and $\tau_g = 120$ ms. Vertical lines in the upper panel show the optimal timescale of gain modulation. Inset: mean reward rate zoomed in around the optimal timescale, showing that the optimal timescale is only around 20 ms shorter (~170 ms compared to ~190 ms) when using shallower slopes for input current in relation to motion coherence in the anti-preferred direction. Shallower slopes in relation to motion coherence in the anti-preferred direction reduce the discriminatory ability of the network. Thus, the firing rates for the different motion coherences ramp up/down closer together (B), resulting in poorer accuracy (C). Furthermore, the changes are more pronounced for the higher motion coherences (A): the input currents to the losing population are generally greater, leading to greater competition between the two populations.
This in turn slows integration by the winning population, and hence lengthens the reaction time (D). This leads to a very slightly shorter optimal gain time constant (E). (TIF)
Figure S5 Optimal timescale of gain modulation for maximizing mean reward rate for different noise levels. Network model performance as a function of the time constant of gain modulation $\tau_g$ using the noise level (black) in the main manuscript (see Figure 10) and another with a higher (twice the standard deviation) noise level (red). Upper panel: mean reward rate (RR); middle panel: accuracy; lower panel: mean trial duration (TD). Dashed horizontal lines show our model's fit to the data with $\tau_g = 120$ ms and noise as reported in the main manuscript. Vertical lines in the upper panel show the optimal timescale of gain modulation. Inset: mean reward rate zoomed in around the optimal timescale, showing the optimal timescale is around 200 ms longer, ~320 ms (compared to ~190 ms), for a higher (twice the standard deviation) noise level. (TIF)
Movie S1 Dynamics of decision-making over an entire trial. View slideshow to play. Upper and lower panels show phase planes and activity timecourses, respectively, throughout the various epochs of a trial for coherence $c = 0\%$. The firing rate $r_L$ of the population selective towards leftward (L) motion is plotted against the firing rate $r_R$ of the population selective towards rightward (R) motion. Orange and green curves represent nullclines, where $dS_L/dt = 0$ and $dS_R/dt = 0$, respectively. The intersections of the nullclines yield the steady states. Arrows denote the local velocities of trajectories. The network starts from a low symmetric attractor during the fixation period. A burst of input current at the onset of choice targets reconfigures the network such that only a single stable steady state is present, preventing decision-making. After adaptation, the network settles to a high symmetric attractor, with additional asymmetric attractors also present.
At motion onset, the inhibitory gain is increased, forming a nearby unstable steady state with a lower firing rate. The network moves towards this steady state along the diagonal line (with equal firing rates), but before it can reach it and the firing rates diverge, the excitatory gain is increased, raising this unstable steady state. This reproduces the "dip" phenomenon. When the input due to the motion stimulus comes into effect, the network is reconfigured initially to a symmetric stable steady state (which prevents early divergence of firing rates), and then to a symmetric unstable steady state with a firing rate that is higher than the adapted target firing rate. This unstable steady state becomes more unstable (nullclines come closer together) as both gains increase dynamically. Consequently, the firing rates of the competing selective populations diverge and ramp up (down) more and more quickly towards the winning (losing) 'choice' attractor. The motor/response threshold of 70 Hz is crossed prior to the network reaching the corresponding 'choice' attractor, and a saccade is initiated. The increasing instability caused by both the attractor network dynamics and the increasing gains creates an urgency signal, leading to short-tailed RT distributions, even if the network operates close to a dynamic transition (bifurcation) point.

(PPTX)
Movie S2 Larger dynamic ranges allow robust decision-making for co-modulation of excitatory and inhibitory gains. Stability diagrams are shown as a function of excitatory gain $g_E$ for different inhibitory gains $g_I$. Black dashed lines show parameters that fit the behavioural and neural experimental data, namely $(g_E, g_I) = (3, 1.1)$. For each stability diagram as a function of $g_E$, dark shades show stable branches, while light shades show unstable ones. As the inhibitory gain $g_I$ is reduced, the dynamic range of the network decreases. Furthermore, our fitted parameters are not sensitive to small perturbations. If we increase or decrease $g_E$ or $g_I$ slightly, our model still adequately performs its decision-making computations. However, if we had chosen a much smaller value of $g_I$ as our parameter, small perturbations in parameter values would have rendered the network incapable of performing its decision-making computations. On the other hand, we may increase $g_I$, which would lead to the network performing its decision-making computations with a larger dynamic range. However, the unstable branch would then have a very large firing rate (~85 Hz) and would not fit the neural experimental data. (AVI)