Recurrent neural networks that learn multi-step visual routines with reinforcement learning
Fig 6
A,B. More challenging curve-tracing stimuli with long spirals (A) or with many distractors (B). C. Accuracy of networks that were trained on the curve-tracing task with one distractor and tested on the curve-tracing task with 10 distractors. Networks trained with RELEARNN solved the task equally well regardless of the number of distractors (p = 0.17, Mann-Whitney test). Networks trained with BPTT generalized less well (p < 10⁻⁵, Mann-Whitney test). Feedforward networks could not be trained on the curve-tracing task at all, i.e. they remained at chance level. D. Activity of units in the accessory network whose RFs fall on the selected curve (blue traces) or the non-selected curve (orange traces), at different distances from the blue pixel that is the target of the eye movement. Continuous and dotted traces show the activity of accessory units representing pixels nearer to and farther from the saccade target, respectively. The credit-assignment signal thus propagates in the direction opposite to the spread of enhanced activity, starting from the selected eye-movement target. This credit-assignment signal is absent from the representation of the distractor curve. E. Activity of units in the accessory network at the beginning of the selected and non-selected curves, for curves that were one pixel (left panel) or five pixels (right panel) longer than the curves used during training. If the curve length was similar to that in the curriculum, the credit-assignment signal propagated to the beginning of the selected curve (the red fixation point on correct trials) and training was effective. However, if the curves were much longer, the credit-assignment signal did not spread to all other pixels of the selected curve and training failed.
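Panel C compares accuracy distributions with a Mann-Whitney test. The sketch below illustrates what that test computes. It is not the authors' analysis code, and the function name `mann_whitney_u` is hypothetical. It ranks the pooled samples, averaging ranks over ties, and computes U for the first sample. It then converts U to a two-sided p-value with a plain normal approximation that applies no continuity or tie correction to the variance. Its p-values can therefore differ somewhat from library implementations, and an exact method is preferable for very small samples.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no continuity
    or tie correction of the variance). Returns (U for x, p-value)."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering each value's position in x + y.
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        i = j + 1
    r1 = sum(ranks[:n1])              # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2       # U statistic for x
    mu = n1 * n2 / 2                  # mean of U under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, p


# Toy values only, not data from the figure:
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

When the two samples do not overlap at all, U for the lower sample is 0 and p is small. When the samples are identical, U equals its null mean n1·n2/2 and p is 1.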