Reinforcement Learning of Targeted Movement in a Spiking Neuronal Model of Motor Cortex
Figure 1
A virtual forearm, whose joint angle is controlled by one flexor and one extensor muscle, is trained to align to a target. A proprioceptive preprocessing block translates muscle lengths into a representation of arm configuration. Plasticity is restricted to the mapping between the sensory representation units and the motor command representation units (dashed oval). Motor units drive the muscles to change the joint angle. The Actor (above) is trained by the Critic, which evaluates error and provides a global reward or punishment signal.
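The loop described in the caption can be illustrated with a minimal, non-spiking sketch. The caption says only that the Critic evaluates error and emits a global reward or punishment signal. The specific rule used here (reward when the absolute angular error shrinks between steps, punishment when it grows) is an assumption made for illustration, as are the names `critic_signal` and `update_joint_angle`, the `gain` parameter, and the linear arm model.

```python
def critic_signal(prev_angle, new_angle, target_angle):
    """Hypothetical Critic: compare error before and after a movement.

    Returns +1 (reward) if the joint moved closer to the target,
    -1 (punishment) if it moved farther away, and 0 if the error
    is unchanged. This rule is an assumption, not taken from the caption.
    """
    prev_error = abs(target_angle - prev_angle)
    new_error = abs(target_angle - new_angle)
    if new_error < prev_error:
        return 1
    if new_error > prev_error:
        return -1
    return 0


def update_joint_angle(angle, flexor_drive, extensor_drive, gain=1.0):
    """Hypothetical arm model: the net drive from the flexor/extensor
    pair shifts the joint angle linearly (flexion increases the angle)."""
    return angle + gain * (flexor_drive - extensor_drive)


if __name__ == "__main__":
    target = 30.0
    angle = 10.0
    # One step in which the flexor is driven harder than the extensor.
    new_angle = update_joint_angle(angle, flexor_drive=3.0, extensor_drive=1.0)
    signal = critic_signal(angle, new_angle, target)
    print(new_angle, signal)
```

In a full model, the scalar returned by the Critic would be broadcast to every plastic synapse in the sensory-to-motor mapping, which is what makes the signal global rather than specific to a synapse or unit.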