Recurrent neural networks that learn multi-step visual routines with reinforcement learning
Fig 5
Propagation of enhanced activity across the representation of the target curve during curve-tracing.
A. Upper, example stimulus presented to one of the networks. The target curve starts with a red pixel. Lower, activity of recurrent units in the input layer across time. The orange color denotes an increase in activity. Note the spread of enhanced activity over the representation of the target curve, starting at the red pixel. B. Test accuracy for curves up to N+4 pixels long, where N is the maximum length used during training. At the beginning of training, the model does not generalize to longer curves. At the end of training, a model trained with curves up to 9 pixels long generalized to curves up to 13 pixels long (p < 10⁻⁶, Wilcoxon signed-rank test). C. Activity of an example unit in the recurrent group elicited by the target (orange) or distractor curve (blue), and activity of the corresponding unit in the feedforward group (brown). The activity elicited by the target curve is enhanced compared to that elicited by the distractor curve. D. Average activity of neurons in area V1 of the visual cortex of monkeys during a curve-tracing task, when their receptive field (RF) fell on the target curve (orange) or on the distractor curve (blue). Adapted from [26]. E. Distribution of the modulation index across recurrent units of the neural networks. A positive value indicates an enhanced response to the target curve. F. Distribution of the modulation index in area V1 of the visual cortex of monkeys (from [17]). G. Distribution of the modulation latency across units of the network. The onset of modulation is delayed for units representing pixels farther from the beginning of the curve (7 pixels away) compared to units representing pixels closer to it (2 pixels away) (p < 10⁻¹⁵, Mann-Whitney U test). H. The minimum number of timesteps needed to reach 85% accuracy increased with curve length, indicating the need for recurrent processing. Error bars, 95% confidence intervals. I. Distribution of the modulation latency across recording sites in monkeys performing the curve-tracing task, adapted from [18]. Dark green represents RFs that were close to the fixation point, and light green represents RFs that were farther from the fixation point.
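The caption does not define the modulation index or the modulation latency. The following is a minimal sketch under stated assumptions. The modulation index is assumed to be the common contrast (target − distractor) / (target + distractor), which is positive when the target curve elicits the stronger response, as in panels E and F. The latency criterion (the first timestep at which the target response exceeds the distractor response by more than a fixed threshold) is a hypothetical choice made for illustration. The function names, the threshold and the toy traces are all hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def modulation_index(r_target: float, r_distractor: float) -> float:
    """Assumed definition: (target - distractor) / (target + distractor).

    Positive values indicate a stronger response to the target curve.
    Returns 0.0 when both responses are zero to avoid division by zero.
    """
    total = r_target + r_distractor
    if total == 0:
        return 0.0
    return (r_target - r_distractor) / total


def modulation_latency(
    target: Sequence[float], distractor: Sequence[float], threshold: float
) -> Optional[int]:
    """Hypothetical criterion: index of the first timestep at which the
    target response exceeds the distractor response by more than
    `threshold`. Returns None if modulation never reaches the threshold."""
    for t, (a, b) in enumerate(zip(target, distractor)):
        if a - b > threshold:
            return t
    return None


# Toy activity traces for a single unit (made-up numbers, not data from the paper).
target_trace = [0.2, 0.2, 0.5, 0.8, 0.9]
distractor_trace = [0.2, 0.2, 0.3, 0.3, 0.3]

# Mean response over the trace, then the assumed modulation index.
mean_target = sum(target_trace) / len(target_trace)
mean_distractor = sum(distractor_trace) / len(distractor_trace)
print(modulation_index(mean_target, mean_distractor))
print(modulation_latency(target_trace, distractor_trace, threshold=0.1))
```

Under these assumptions, a unit representing a pixel farther along the curve would be expected to cross the threshold at a later timestep, which corresponds to the delayed onset described for panel G.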