Multi-layer network utilizing rewarded spike time dependent plasticity to learn a foraging task
Fig 1
The network organization is a simplification of the information processing flow in the visual pathway: sensory input is mapped into higher-level representations, which are then used for decision making in the prefrontal cortex [4]. Input indicating the position of food particles relative to the virtual agent (positioned in the center of the field) was simulated as a set of excitatory inputs to the input layer neurons. In the model, each input layer cell sends one excitatory and one inhibitory connection to each cell in the middle layer, where object representations are built. Each middle layer cell sends one excitatory and one inhibitory connection to each of the 9 cells in the output layer (3x3). The most active cell in the output layer determines the direction of the subsequent movement. Excitatory connections from the input layer to the middle layer are subject to non-rewarded STDP. Excitatory connections from the middle layer to the output layer are subject to rewarded STDP, where the reward depends on whether a move results in food acquisition. The strength of the inhibitory connections from a given cell always matches the average strength of the excitatory outputs of the same presynaptic cell.
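Two of the rules in this caption can be illustrated with a minimal Python sketch: the inhibitory strength tied to the mean excitatory output of each presynaptic cell, and the selection of a move from the most active output cell. The function names, the nested-list data layout, the use of a single activity value per cell, and the mapping from a cell's grid position to a movement offset are all assumptions made for illustration; they are not taken from the model's implementation.

```python
# Illustrative sketch only; names, data layout, and the position-to-direction
# mapping are assumptions, not the authors' implementation.

def inhibitory_weights(excitatory):
    """Return one inhibitory strength per presynaptic cell.

    excitatory[i][j] is the excitatory weight from presynaptic cell i to
    postsynaptic cell j. Each inhibitory strength is set to the mean of
    that presynaptic cell's excitatory output weights.
    """
    return [sum(row) / len(row) for row in excitatory]


def choose_move(output_activity):
    """Pick a move from a 3x3 grid of output-layer activity.

    output_activity[r][c] is the activity (e.g. a spike count) of the cell
    at row r, column c. The most active cell wins; its position relative
    to the grid center is returned as a (d_row, d_col) offset (assumed
    mapping). Ties go to the first cell in row-major order.
    """
    cells = [(r, c) for r in range(3) for c in range(3)]
    best_r, best_c = max(cells, key=lambda rc: output_activity[rc[0]][rc[1]])
    return (best_r - 1, best_c - 1)


if __name__ == "__main__":
    weights = [[0.5, 1.5], [2.0, 4.0]]
    print(inhibitory_weights(weights))
    activity = [[0, 1, 0], [0, 0, 5], [2, 0, 0]]
    print(choose_move(activity))
```

In this sketch the inhibitory strengths would be recomputed whenever STDP changes the excitatory weights, so that they keep tracking the mean; how often the model performs that update is not stated in the caption.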