Recurrent neural networks that learn multi-step visual routines with reinforcement learning

doi:10.1371/journal.pcbi.1012030

Fig 8

Model performance in the trace-then-search task.

A. Example stimulus shown to one of the networks (upper) and the resulting spread of enhanced activity, shown in orange (lower). The enhanced activity first spreads over the curve starting at the blue cue and reaches the target marker at the other end, which cues the color that needs to be selected during the search operation. B. Test accuracy for curves of length up to N+4 pixels, where N is the maximum length in the curriculum. Generalization performance improved when the network learned to trace longer curves (p = 1.5·10⁻⁴ for curves of 13 pixels, Wilcoxon signed-rank test). C. Normalized response enhancement for target pixels, averaged across units. Each curve is normalized by its maximum over time. First, the curve connected to the fixation point is labeled with enhanced activity (trace operation, green curves); then the units that represent the correct eye movement target, i.e. those with the same color as the target marker, enhance their activity (search operation, red trace). D. In the visual cortex of monkeys, the response enhancement also labels the segments of the target curve first (green trace), before it labels the position of the eye movement target (red trace; adapted from [25]). E. Distribution of the modulation latency across model units (230,000 stimuli and 16 networks). The response modulation of the trace operation precedes that of the search operation (p = 1.5·10⁻⁵, Mann-Whitney U test). F. Distribution of the modulation latency across recording sites in monkeys solving the trace-then-search task (adapted from [25]).

doi: https://doi.org/10.1371/journal.pcbi.1012030.g008
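
The normalization described for panel C (each curve divided by its maximum over time) can be illustrated with a minimal sketch. The function names, the example time courses, and the half-maximum latency criterion below are assumptions made for illustration only; the caption does not state how the authors computed modulation latency.

```python
import numpy as np


def normalize_by_peak(enhancement):
    """Scale a response-enhancement time course so its maximum over time is 1."""
    enhancement = np.asarray(enhancement, dtype=float)
    peak = enhancement.max()
    if peak <= 0:
        raise ValueError("time course has no positive enhancement")
    return enhancement / peak


def half_max_latency(enhancement):
    """Index of the first time step at which the normalized curve reaches 0.5.

    The half-maximum criterion is an assumption for this sketch,
    not the criterion used in the paper.
    """
    normalized = normalize_by_peak(enhancement)
    return int(np.argmax(normalized >= 0.5))


# Made-up time courses (arbitrary units), not data from the paper.
trace = np.array([0.0, 0.2, 0.6, 0.8, 0.8, 0.7])    # trace operation
search = np.array([0.0, 0.0, 0.1, 0.3, 0.9, 1.2])   # search operation

trace_latency = half_max_latency(trace)
search_latency = half_max_latency(search)
print(trace_latency, search_latency)
```

With these made-up inputs the trace curve crosses half of its peak earlier than the search curve, which mirrors the ordering of the two operations described in panels C and E.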