Training Excitatory-Inhibitory Recurrent Neural Networks for Cognitive Tasks: A Simple and Flexible Framework

doi:10.1371/journal.pcbi.1004792

Fig 1.

Recurrent neural network (RNN).

A trained RNN of excitatory and inhibitory rate units r(t) receives time-varying inputs u(t) and produces the desired time-varying outputs z(t). Inputs encode task-relevant sensory information or internal rules, while outputs indicate a decision in the form of an abstract decision variable, probability distribution, or direct motor output. Only the recurrent units have their own dynamics: inputs are considered to be given and the outputs are read out from the recurrent units. Each unit of an RNN can be interpreted as the temporally smoothed firing rate of a single neuron or the spatial average of a group of similarly tuned neurons.
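
The relationship between u(t), r(t), and z(t) described in this caption can be sketched as a discrete-time simulation. This is a minimal illustration, not the authors' implementation: the Euler update, the rectified-linear activation, the time constant `tau`, the absence of noise, and all function and variable names are assumptions introduced here.

```python
import numpy as np

def run_rate_rnn(u, W_in, W_rec, W_out, dt=1.0, tau=10.0):
    """Euler-integrate a leaky rate RNN; return rates r(t) and outputs z(t).

    Assumed (not taken from the caption): rectified-linear rates,
    time constant tau, and noise-free dynamics.
    """
    n_steps = u.shape[0]
    n_units = W_rec.shape[0]
    alpha = dt / tau
    x = np.zeros(n_units)            # internal state of the recurrent units
    r = np.maximum(x, 0.0)           # firing rates r(t)
    rates = np.empty((n_steps, n_units))
    outputs = np.empty((n_steps, W_out.shape[0]))
    for t in range(n_steps):
        # Only the recurrent units have dynamics; the inputs u[t] are given.
        x = (1.0 - alpha) * x + alpha * (W_rec @ r + W_in @ u[t])
        r = np.maximum(x, 0.0)
        rates[t] = r
        outputs[t] = W_out @ r       # outputs z(t) are read out from the rates
    return rates, outputs

rng = np.random.default_rng(0)
n_in, n_units, n_out, n_steps = 2, 20, 2, 50
u = rng.random((n_steps, n_in))
W_in = rng.random((n_units, n_in))
W_rec = rng.normal(scale=0.1, size=(n_units, n_units))
W_out = rng.random((n_out, n_units))
rates, outputs = run_rate_rnn(u, W_in, W_rec, W_out)
```

The readout is a plain linear map of the rates, which mirrors the statement that outputs have no dynamics of their own.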

Table 1.

Parameters for stochastic gradient descent (SGD) training of recurrent neural networks (RNNs).

Unless noted otherwise in the task description, networks were trained and run with the parameters listed here.

Table 2.

Summary of tasks.

In the multisensory integration and parametric working memory tasks, networks receive both positively (pos.; increasing function) and negatively (neg.; decreasing function) tuned versions of the same input.

Fig 2.

Perceptual decision-making task.

(A) Inputs (upper) and target outputs (lower) for a perceptual decision-making task with variable stimulus duration, which we refer to as VS here. The choice 1 output must stay low during fixation (fix.); during the decision (dec.) period it must go high if the choice 1 input is larger than the choice 2 input and stay low otherwise. The choice 2 output is defined analogously. There are no constraints on the outputs during the stimulus period. (B) Inputs and target outputs for the reaction-time version of the integration task, which we refer to as RT. Here the outputs are encouraged to respond after a short delay following stimulus onset. The reaction time is defined as the time it takes for the outputs to reach a threshold. (C) Psychometric function for the VS version, showing the percentage of trials on which the network chose choice 1 as a function of the signed coherence. Coherence measures the difference between the evidence for choice 1 and the evidence for choice 2; positive coherence indicates evidence for choice 1 and negative coherence evidence for choice 2. Solid line is a fit to a cumulative Gaussian distribution. (D) Psychometric function for the RT version. (E) Percentage of correct responses as a function of stimulus duration in the VS version, for each nonzero coherence level. (F) Reaction time for correct trials in the RT version as a function of coherence. Inset: Distribution of reaction times on correct trials. (G) Example activity of a single unit in the VS version across all correct trials, averaged within conditions after aligning to the onset of the stimulus. Solid (dashed) lines denote positive (negative) coherence. (H) Example activity of a single unit in the RT version, averaged within conditions and across all correct trials aligned to the reaction time.
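
The threshold-based definition of reaction time in panel B can be illustrated with a short sketch. The threshold value, the choice to measure time from stimulus onset, the synthetic ramping output, and the helper name `reaction_time` are all assumptions made for this example; the caption only states that the reaction time is the time needed for the outputs to reach a threshold.

```python
import numpy as np

def reaction_time(outputs, t, threshold, stimulus_onset):
    """Return (reaction time, choice) for the first threshold crossing
    at or after stimulus onset, or None if no output reaches threshold.

    outputs has shape (n_steps, n_choices); t holds the time of each step.
    """
    after_onset = t >= stimulus_onset
    crossed = (outputs >= threshold) & after_onset[:, None]
    if not crossed.any():
        return None
    first = np.argmax(crossed.any(axis=1))     # index of first crossing step
    choice = int(np.argmax(outputs[first])) + 1  # 1-based choice label
    return t[first] - stimulus_onset, choice

# Synthetic trial (hypothetical): output 1 ramps up after onset at 200 ms.
t = np.arange(0.0, 1000.0, 10.0)
out1 = np.clip((t - 200.0) / 400.0, 0.0, 1.0)
out2 = np.zeros_like(t)
outputs = np.stack([out1, out2], axis=1)
rt, choice = reaction_time(outputs, t, threshold=0.5, stimulus_onset=200.0)
```

Trials on which no output reaches the threshold return `None`, so they can be excluded before computing a reaction-time distribution.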

Fig 3.

Perceptual decision-making networks with different constraints.

(A) Psychometric function (percent choice 1 as a function of signed coherence) and connection weights (input, upper-right; recurrent, upper-left; and output, lower) for a network in which all weights may be positive or negative, trained for a perceptual decision-making task. Connections go from columns (“pre-synaptic”) to rows (“post-synaptic”), with blue representing positive weights and red negative weights. Different color scales (arbitrary units) were used for the input, recurrent, and output matrices but are consistent across the three networks shown. In the psychometric function, solid lines are fits to a cumulative Gaussian distribution. In this and the networks in B and C, self-connections were not allowed. In each case 100 units were trained, but only the 25 units with the largest absolute selectivity index (Eq 30) are shown, ordered from most selective for choice 1 (large positive) to most selective for choice 2 (large negative). (B) A network trained for the same task as in A but with the constraint that excitatory units may only project positive weights and inhibitory units may only project negative weights. All input weights were constrained to be excitatory, and the readout weights, considered to be “long-range,” were nonzero only for excitatory units. All connections except self-connections were allowed, but training resulted in a strongly clustered pattern of connectivity. Units are again sorted by selectivity but separately for excitatory and inhibitory units (20 excitatory, 5 inhibitory). (C) Same as B but with the additional constraint that excitatory recurrent units receiving input for choice 1 and excitatory recurrent units receiving input for choice 2 do not project to one another, and each group sends output to the corresponding choice.
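
The sign constraint in panel B (excitatory units project only positive weights, inhibitory units only negative weights, no self-connections) can be sketched as follows. This is one possible way to impose the constraint, assumed for illustration: taking absolute values of an unconstrained matrix and multiplying each pre-synaptic column by a sign. The 80/20 split of the 100 units and the helper name are also assumptions; the caption gives only the total of 100 units.

```python
import numpy as np

def ei_recurrent_weights(W_raw, is_excitatory):
    """Build a sign-constrained recurrent matrix from unconstrained weights.

    Columns are pre-synaptic units and rows are post-synaptic units, as in
    the caption. Excitatory columns become non-negative, inhibitory columns
    non-positive, and the diagonal (self-connections) is set to zero.
    """
    signs = np.where(is_excitatory, 1.0, -1.0)
    W = np.abs(W_raw) * signs[None, :]   # scale each column by its sign
    np.fill_diagonal(W, 0.0)             # no self-connections
    return W

rng = np.random.default_rng(0)
n_exc, n_inh = 80, 20                    # assumed split of the 100 units
is_excitatory = np.arange(n_exc + n_inh) < n_exc
W_raw = rng.normal(size=(n_exc + n_inh, n_exc + n_inh))
W = ei_recurrent_weights(W_raw, is_excitatory)
```

Keeping the unconstrained matrix `W_raw` as the trainable parameter means the sign constraint holds after every update without any projection step.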

Fig 4.

Context-dependent integration task.

(A) Psychometric function, showing the percentage of trials on which the network chose choice 1 as a function of the signed motion (upper) and signed color (lower) coherence in motion-context (black) and color-context (blue) trials. (B) Average population responses in state space during the stimulus period, projected to the 3-dimensional subspace capturing variance due to choice, motion, and color as in [5]. Only correct trials were included. The task-related axes were obtained through a linear regression analysis. Note that “choice” here has a unit-specific meaning that depends on the preferred choice of the unit as determined by the selectivity index (Eq 30). For both motion (black) and color (blue), coherences increase from light to dark. Upper plots show trials during the motion context, and lower plots show trials during the color context. (C) Normalized responses of four recurrent units during the stimulus period show mixed representation of task variables. Solid lines indicate the preferred choice and dashed lines the nonpreferred choice of each unit. (D) Denoised regression coefficients from the linear regression analysis. By definition, the coefficients for choice are almost exclusively positive.

Fig 5.

Constraining the connectivity.

Connectivity after training for the context-dependent integration task (Fig 4), when the connection matrix is (A) unstructured and (B) structured. Both networks consist of 150 units (120 excitatory, 30 inhibitory). In B the units are divided into two equal-sized “areas,” each with a local population of inhibitory units (I_S and I_M) that only project to units in the same area. The “sensory” area (green) receives excitatory inputs and sends dense, “long-range” excitatory feedforward connections E_M ← E_S to the “motor” area (orange) from which the outputs are read out. The sensory area receives sparse excitatory feedback projections E_S ← E_M from the motor area.
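
The structured connectivity in panel B can be expressed as a boolean mask over the recurrent weight matrix. The sketch below follows the caption's description (150 units, two equal-sized areas, local inhibition, dense feedforward and sparse feedback excitation), but several details are assumptions: the feedback density of 0.2, the exclusion of long-range excitatory-to-inhibitory connections, the unit ordering, and the helper name `two_area_mask`.

```python
import numpy as np

def two_area_mask(n_exc=120, n_inh=30, feedback_density=0.2, seed=0):
    """Return mask[post, pre] of allowed connections and the unit groups.

    Assumptions: feedback_density is a made-up value, and the only
    connections allowed between areas are E_M <- E_S and E_S <- E_M.
    Self-connections are left enabled; the caption does not address them.
    """
    rng = np.random.default_rng(seed)
    n = n_exc + n_inh
    E_S = np.arange(0, n_exc // 2)                  # sensory excitatory
    E_M = np.arange(n_exc // 2, n_exc)              # motor excitatory
    I_S = np.arange(n_exc, n_exc + n_inh // 2)      # sensory inhibitory
    I_M = np.arange(n_exc + n_inh // 2, n)          # motor inhibitory
    sensory = np.concatenate([E_S, I_S])
    motor = np.concatenate([E_M, I_M])

    mask = np.zeros((n, n), dtype=bool)
    mask[np.ix_(sensory, sensory)] = True           # local connections
    mask[np.ix_(motor, motor)] = True
    mask[np.ix_(E_M, E_S)] = True                   # dense feedforward
    mask[np.ix_(E_S, E_M)] = rng.random((E_S.size, E_M.size)) < feedback_density
    groups = {"E_S": E_S, "E_M": E_M, "I_S": I_S, "I_M": I_M}
    return mask, groups

mask, groups = two_area_mask()
```

Multiplying the recurrent weights elementwise by `mask` would keep disallowed connections at zero throughout training.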

Fig 6.

Multisensory integration task.

(A) Example inputs for visual-only (left), auditory-only (middle), and multisensory (both visual and auditory, right) trials. Network units receive both positively tuned (increasing function of event rate) and negatively tuned (decreasing function of event rate) inputs; panels here show positively tuned input corresponding to a rate of 13 events/sec, just above the discrimination boundary. As in the single-stimulus perceptual decision-making task, the outputs of the network were required to hold low during “fixation” (before stimulus onset), then the output corresponding to a high rate was required to hold high if the input was above the decision boundary and low otherwise, and vice versa for the output corresponding to a low rate. (B) Psychometric functions (percentage of high choices as a function of the event rate) for visual, auditory, and multisensory trials show multisensory enhancement. (C) Sorted activity on visual-only and auditory-only trials for three units selective for choice (high vs. low, left), modality (visual vs. auditory, middle), and both (right).

Fig 7.

Parametric working memory task.

(A) Sample positively tuned inputs, showing the case where f₁ > f₂ (upper) and f₁ < f₂ (lower). Recurrent units also receive corresponding negatively tuned inputs. (B) Percentage of correct responses for different combinations of f₁ and f₂. This plot also defines the colors used for each condition, labeled by f₁, in the remainder of the figure. Due to the overlap in the values of f₁, there are 7 distinct colors representing 10 trial conditions. (C) Lower: Correlation of the tuning a₁ (see text) at different time points to the tuning in the middle of the first stimulus period (blue) and middle of the delay period (green). Upper: The tuning at the end of delay vs. middle of the first stimulus (left) and the end of delay vs. middle of the delay (right). (D) Single-unit activity for a unit that is positively tuned for f₁ during both stimulus periods (left), and for a unit that is positively tuned during the first stimulus period but negatively tuned during the second stimulus period (right). (E) Proportion of significantly tuned units based on a simple linear regression of the firing rates as a function of f₁ at each time point.

Fig 8.

Eye-movement sequence execution task.

(A) Task structure (for Sequence 5) and (B) sample inputs to the network. During the intertrial interval (ITI) the network receives only the input indicating the current sequence to be executed. Fixation is indicated by the presence of a fixation input, which is the central one of the 9 possible dot positions on the screen. During each movement, the current dot plus two possible target dots appear. (C) State-space trajectories during the three movements M1, M2, and M3 for each sequence, projected onto the first two principal components (PCs) (71% variance explained; note the different axis scales). The network was run with zero noise to obtain the plotted trajectories. The hierarchical organization of the sequence of movements is reflected in the splitting off of state-space trajectories. Note that all sequences start at fixation, or dot 5 (black), and are clustered here into two groups depending on the first move in the sequence. (D) Example run in which the network continuously executes each of the 8 sequences once in a particular order; the network can execute the sequences in any order. Each sequence is separated by a 1-second ITI during which the eye position returns from the final dot in the previous trial to the central fixation dot. Upper: Eye position in “screen” coordinates. Lower: x- and y-positions of the network’s outputs indicating a point on the screen. Note the continuity of dynamics across trials.
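
The projection in panel C, onto the first two principal components with a stated fraction of variance explained, can be sketched with a singular value decomposition. The data here are synthetic and the helper name `project_onto_pcs` is hypothetical; the sketch shows only the generic PCA computation, not the authors' analysis code.

```python
import numpy as np

def project_onto_pcs(X, n_components=2):
    """Project rows of X (time points x units) onto the leading PCs.

    Returns the projected trajectory and the fraction of total variance
    captured by the retained components.
    """
    Xc = X - X.mean(axis=0)                       # center each unit's activity
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = S ** 2                            # proportional to PC variances
    explained = variances[:n_components].sum() / variances.sum()
    return Xc @ Vt[:n_components].T, explained

# Synthetic activity (assumption): 2 latent dimensions mixed into 40 units.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 40))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 40))
proj, explained = project_onto_pcs(X)
```

Because the two components usually capture different amounts of variance, plots of `proj` often need different axis scales, as the caption notes.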

Fig 9.

Estimated performance during training for networks in the Results.

(A)-(I) Percentage of correct responses. (J) Error in eye position. For each network the relevant figure in the main text and a brief description are given. Black lines are for the networks shown in the main text, while gray lines show the performance of 5 additional networks trained for the same tasks but with different initial weights. Red lines indicate the target performance; training terminated when the mean performance on several (usually 5) evaluations of the validation dataset exceeded the target performance. In I the target performance indicates the minimum, rather than mean, percentage of correct responses across conditions. The number of recurrent units (green) is indicated for each network. The training times in minutes of “real-time” (blue) are estimates for a MacBook Pro running OS X Yosemite 10.10.4, with a 2.8 GHz Intel Core i7 CPU and 16 GB of 1600 MHz DDR3 memory. GPUs were not used in the training of these networks.
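
The termination rule described in this caption, stopping once the mean performance over several validation evaluations exceeds the target, can be sketched as below. The validation scores and the target of 85 are hypothetical values chosen for illustration, and the function name `should_stop` is not from the source.

```python
def should_stop(history, target, n_evals=5):
    """Return True once the mean of the last n_evals validation scores
    exceeds the target performance."""
    if len(history) < n_evals:
        return False                      # not enough evaluations yet
    recent = history[-n_evals:]
    return sum(recent) / n_evals > target

# Hypothetical validation scores (percent correct) from successive evaluations.
scores = [70, 78, 84, 86, 88, 90, 91, 92]
history = []
stopped_at = None
for step, score in enumerate(scores):
    history.append(score)
    if should_stop(history, target=85):
        stopped_at = step                 # index of the evaluation that ended training
        break
```

For panel I, where the target refers to the minimum rather than the mean across conditions, each score would be the per-evaluation minimum over conditions rather than the average.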
