Learning spatial hearing via innate mechanisms
Fig 5
Innate circuit detecting midline alignment as an intrinsic reward.
(A) An interactive procedure in which the innate Teacher circuit detects midline alignment; its output serves as the intrinsic reward signal for reinforcement learning, without any external labeling. (B) The Teacher circuit implementation, in which the left and right LSO outputs are combined. Circuits with similar connectivity and tuning curves have been found in the inferior colliculus (IC). (C) Sampled tuning curve of the Teacher circuit, showing its basic function as a midline detector: it fires when the agent faces the sound. In the reinforcement learning procedure, an output spike means positive reward and no spike means no reward, offering an alternative model of the auditory orienting response (AOR). (D) Test errors of the Student after training, in the frontal semicircle.
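
The midline-detection idea in panels (B) and (C) can be illustrated with a minimal sketch. This is not the circuit implementation used in the study: the sinusoidal level-difference model, the sigmoid LSO rates, the balance rule for combining the two sides, and all constants and function names (`lso_rates`, `teacher_spikes`, `intrinsic_reward`) are assumptions chosen only to show how a binary intrinsic reward could arise from left/right LSO outputs.

```python
import math

# All constants below are illustrative assumptions, not values from the study.
MAX_ILD_DB = 20.0   # assumed peak interaural level difference (dB)
SLOPE = 0.5         # assumed sigmoid slope of the toy LSO response (per dB)
TOLERANCE = 0.1     # assumed maximum left/right imbalance for a Teacher spike


def lso_rates(azimuth_deg):
    """Toy left/right LSO firing rates in [0, 1] for a source at azimuth_deg.

    Positive azimuth means the source lies to the agent's right.
    """
    # Hypothetical level-difference model: right minus left, in dB.
    ild = MAX_ILD_DB * math.sin(math.radians(azimuth_deg))
    right = 1.0 / (1.0 + math.exp(-SLOPE * ild))
    left = 1.0 / (1.0 + math.exp(SLOPE * ild))
    return left, right


def teacher_spikes(azimuth_deg):
    """Assumed combination rule: spike when the two LSO outputs are balanced."""
    left, right = lso_rates(azimuth_deg)
    return abs(left - right) < TOLERANCE


def intrinsic_reward(source_deg, heading_deg):
    """Reward of 1.0 if the Teacher spikes, 0.0 otherwise (no external label)."""
    return 1.0 if teacher_spikes(source_deg - heading_deg) else 0.0


if __name__ == "__main__":
    # Sample the toy tuning curve across the frontal semicircle.
    for azimuth in range(-90, 91, 30):
        print(azimuth, teacher_spikes(azimuth))
```

In this sketch the reward depends only on the source direction relative to the agent's heading, so an agent that turns to face the sound is rewarded without ever being told the true source location. The toy level-difference model is only meaningful in the frontal semicircle, since it cannot distinguish front from back.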