Recurrent neural networks that learn multi-step visual routines with reinforcement learning

doi:10.1371/journal.pcbi.1012030

Fig 7. Search-then-trace task.

A. Example stimulus shown to one of the networks. Upper panel, visual stimulus. Lower panel, orange shading shows the propagation of enhanced activity among recurrent units of the input layer, starting at the representation of the red marker, which is highlighted as the result of the search operation. From there, the enhanced activity spreads along the curve (trace operation). B. We tested how well the models generalized to curves that were longer than those presented during training. Generalization was better for networks that had been trained on longer curves (x-axis). For example, networks trained on curves up to a length of 9 pixels generalized to curves with 13 pixels (p<10⁻⁶, Wilcoxon signed-rank test). C. Normalized response enhancement for the target marker and target curve. Each curve is normalized by its maximum over time. First, the activity of the unit with an RF at the location of the target marker was enhanced (search operation, red curve). Thereafter, enhanced activity propagated across the target curve connected to it (trace operation, green curves). D. In the visual cortex of monkeys, the representation of the target marker is enhanced (red) before the enhanced activity spreads over the V1 representation of the target curve (green; adapted from [25]). E. Distribution of the latency of the response enhancement across 260,000 stimuli and 19 networks. The latency of the modulation related to the search operation was shorter than that related to curve-tracing (p<10⁻¹⁵, Mann-Whitney U test). F. Distribution of the latency of response enhancements across V1 neurons in monkeys solving the search-then-trace task (adapted from [25]).
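The normalization in panel C and the latency comparison in panel E can be illustrated with a minimal sketch. It assumes that latency is read off as the first time step at which the normalized enhancement reaches a fixed threshold. That criterion, the function names, and the numbers below are hypothetical and are not taken from the paper, which may estimate latency differently.

```python
def normalize_by_max(curve):
    """Scale a response-enhancement time course so its maximum over time is 1."""
    peak = max(curve)
    if peak <= 0:
        raise ValueError("curve has no positive enhancement to normalize by")
    return [value / peak for value in curve]


def latency(curve, threshold=0.5):
    """Return the first time step at which the normalized curve reaches
    `threshold` (a hypothetical criterion), or None if it never does."""
    for step, value in enumerate(normalize_by_max(curve)):
        if value >= threshold:
            return step
    return None


# Synthetic time courses (made-up numbers, not data from the paper).
marker = [0.0, 0.2, 0.8, 1.0, 1.0, 1.0]       # unit at the target marker (search)
curve_pixel = [0.0, 0.0, 0.1, 0.3, 1.2, 2.0]  # unit along the target curve (trace)

marker_latency = latency(marker)
curve_latency = latency(curve_pixel)
```

With these synthetic inputs the marker unit crosses the threshold earlier than the curve unit, mirroring the search-before-trace ordering described in the caption. A distribution of such latencies over many stimuli could then be compared with a rank-based test such as the Mann-Whitney U test named in panel E.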

doi: https://doi.org/10.1371/journal.pcbi.1012030.g007