Fig 1.
(a) Generative model for dynamic predictive coding. (b) Parameterization of the model. The higher-level state modulates the lower-level transition matrices through a top-down network (“hypernetwork”). (c) A possible neural implementation of the generative model using cortical pyramidal neurons. Pyramidal neurons receive the top-down embedding vector via synapses on their apical dendrites and the current recurrent state vector via their basal dendrites, and output the next state vector. (d) Schematic depiction of an inference step when the dynamics at the lower level are stable. The higher-level state remains stable because prediction errors are minimal. (e) Depiction of an inference step when the lower-level dynamics change. The resulting large prediction errors drive updates to the higher-level state to account for the new lower-level dynamics.
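The modulated transition in (b) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper’s exact parameterization: the dimensions, the single linear layer standing in for the hypernetwork, and the ReLU nonlinearity are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): lower-level state r (n-dim),
# higher-level state r_h (m-dim), K basis transition matrices.
n, m, K = 8, 4, 3
V = rng.standard_normal((K, n, n)) * 0.1  # basis transition matrices V_k
H = rng.standard_normal((K, m)) * 0.1     # top-down "hypernetwork" (one linear layer here)

def transition(r, r_h):
    """One generative step: r_h is mapped to modulation weights w, which
    mix the basis matrices into an effective transition matrix."""
    w = H @ r_h                          # one modulation weight per basis matrix
    V_eff = np.tensordot(w, V, axes=1)   # V_eff = sum_k w_k * V_k
    return np.maximum(V_eff @ r, 0.0)    # next lower-level state (ReLU)

r_next = transition(rng.standard_normal(n), rng.standard_normal(m))
print(r_next.shape)  # (8,)
```

A fixed higher-level state thus fixes one effective lower-level dynamics, matching the stable regime depicted in (d).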
Fig 2.
Predictive coding of natural videos and learned space-time receptive fields.
(a) Inference on an example input image sequence of 10 frames. Top to bottom: input sequence; the model’s prediction of the current input from the previous step (the first-step prediction is zero); prediction error (the predicted input subtracted from the actual input); the model’s final estimate of the current input after prediction error minimization. (b) The trained DPC network’s responses to the natural image sequence in (a). Each plotted line represents the responses of a model neuron over the 10 time steps. Top: responses of the 20 most active lower-level neurons (some colors are repeated); middle: responses of seven randomly chosen higher-level neurons; bottom: predicted transition dynamics (each line is the modulation weight for a basis transition matrix at the lower level). (c) 40 example spatial receptive fields (RFs) learned from natural videos. Each square tile is a column of U reshaped to a 16 × 16 image. (d) Space-time RFs (STRFs) of four example lower-level neurons. First column: the spatial RFs of the example neurons. Next seven columns: the STRFs of the example neurons revealed by reverse-correlation mapping. (e) Left: space-time plots of the example neurons in (d). Right: space-time plots of the RFs of two simple cells in the primary visual cortex of a cat (adapted from [25]).
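The error minimization behind the bottom row of (a) can be sketched as plain gradient descent on the lower-level state. The sizes, learning rate, and the quadratic term pulling the estimate toward the top-down prediction are illustrative assumptions, not the paper’s exact objective.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes (assumed): 16 x 16 patches (256 pixels), 64 lower-level neurons.
p, n = 256, 64
U = rng.standard_normal((p, n)) * 0.1  # spatial dictionary (columns = spatial RFs)
x = rng.standard_normal(p)             # current input frame (flattened)
r_pred = rng.standard_normal(n) * 0.1  # top-down prediction of the current state

r = r_pred.copy()                      # start the estimate at the prediction
lr, lam = 0.05, 0.1
for _ in range(200):
    err = x - U @ r                    # pixel-space prediction error
    # Gradient step: reconstruct the input while staying near the prediction.
    r += lr * (U.T @ err - lam * (r - r_pred))

print(np.linalg.norm(x - U @ r) < np.linalg.norm(x - U @ r_pred))  # True
```

The shrinking residual corresponds to the fading prediction-error row in (a) as inference proceeds.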
Fig 3.
Hierarchical temporal representation with different timescales.
(a) Autocorrelation of the lower- and higher-level responses in the trained network. Shaded areas denote ±1 standard deviation. Dotted lines show fitted exponential decay functions. Left: responses recorded during natural video stimuli; right: during white noise stimuli. (b) Autocorrelation of neural responses recorded from MT and LPFC of monkeys. Adapted from Murray et al. [6]. (c) Inference for an example Moving MNIST sequence in a trained network. The red dashed boxes mark the time steps at which the dynamics of the input changed. (d) The network’s responses to the Moving MNIST sequence in (c). Note the changes in the higher-level responses after the input dynamics changed (red dashed boxes); this gradient-based change minimizes prediction errors. (e) Higher-level responses to Moving MNIST sequences visualized in the 2D space of the first two principal components. Left: responses colored by motion direction; right: responses colored by digit identity. (f) Comparison of decoding performance for motion direction versus digit identity using the lower- and higher-level neural responses. Error bars: ±1 standard deviation from 10-fold cross-validation. Orange: chance accuracies.
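The timescale analysis in (a) can be reproduced in miniature: compute the autocorrelation of a response trace and fit an exponential decay. Here AR(1) processes stand in for lower- and higher-level responses (an assumption for illustration; in practice the network’s recorded traces would be used).

```python
import numpy as np

rng = np.random.default_rng(2)

def autocorr(x, max_lag):
    """Normalized autocorrelation of a 1-D response trace."""
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / var for k in range(max_lag)])

def ar1(rho, T=20000):
    """AR(1) surrogate trace; its autocorrelation decays as rho**lag,
    so its intrinsic timescale is tau = -1 / log(rho)."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = rho * x[t - 1] + rng.standard_normal()
    return x

def fit_timescale(ac, lags):
    # Log-linear fit of ac ~ exp(-lag / tau) over the positive values.
    mask = ac > 0
    slope = np.polyfit(lags[mask], np.log(ac[mask]), 1)[0]
    return -1.0 / slope

lags = np.arange(1, 10)
tau_low = fit_timescale(autocorr(ar1(0.5), 10)[1:], lags)   # "lower level"
tau_high = fit_timescale(autocorr(ar1(0.9), 10)[1:], lags)  # "higher level"
print(tau_low < tau_high)  # True: the higher level decays more slowly
```

The fitted tau plays the role of the dotted exponential fits in (a) and of the intrinsic timescales reported for MT versus LPFC in (b).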
Fig 4.
Flash-lag illusion and object representations in apparent motion.
(a) The flash-lag test conditions used by [26]. The moving ring could have an initial trajectory (top) or no trajectory (bottom). At the time of the flash (bright disk), the ring could move along the initial trajectory, stop, or reverse its trajectory. Adapted from [26]. (b) Two test conditions (left) regarding the initial trajectory of the moving object (a digit) in the flash-lag experiment with the model, and four test conditions (right) for the object’s motion at the time of the flash. The flashed object was shown at time t and turned off at time t + 1 (the same as the “Terminate” condition). (c & d) Psychophysical estimates reported by [26] for human subjects when the moving object had an initial trajectory (c) or no initial trajectory (d). (e) Perceived location of the flashed object in the DPC model at time t + 1. Error bars indicate ±1 standard deviation (measured across presentations of different digits). (f) Perceived displacement between the moving object (with an initial trajectory) and the flashed object in the DPC model for the four test conditions. (g) Same as (f) but with no initial trajectory for the moving object. (h) Illustration of the prediction-error-driven dynamics of the model’s perception of the moving object when the trajectory reversed at time t + 1. The red ellipsis between panels denotes the prediction error minimization process. (i) Interference patterns during human apparent motion perception with continuous motion (left) and reversed motion (right) at short latency (fast detection task). Brighter colors denote more interference. Dashed arrows represent the object’s motion direction. Adapted from [29]. (j) Same as (i) but at long latency (slow discrimination task) [29]. (k) Perceived location of the moving object in the DPC model at time t + 1, probed at short versus long latency during prediction error minimization. Positive values denote distance along the original trajectory; negative values denote distance along the reversed trajectory.
Short and long latency correspond to the “Early percept” and “Late percept” in part (h), respectively. (l) Perceived location of the digit at all latencies during the prediction error minimization process in part (h).
Fig 5.
Cue-triggered activity recall in the DPC model.
(a) The experimental setup of Xu et al. (adapted from [1]). A bright dot stimulus moved from START to END repeatedly during conditioning. Activities of neurons whose receptive fields (colored ellipses) lay along the dot’s trajectory were recorded. (b) Generative model combining an associative memory with DPC. The red part denotes the augmented memory component that binds the initial content vector r0 and the dynamics vector rh to encode an episodic memory. (c) Depiction of the memory encoding process. The presynaptic memory activity and the postsynaptic prediction error jointly shape the memory weights G. (d) Depiction of the recall process. The prediction error on the partial observation drives the convergence of the memory estimates and recalls the higher-level dynamics vector as a top-down prediction. The red dotted box depicts the prediction error between the missing observation of rh and its prediction; this error is ignored during recall, implementing a form of robust predictive coding [49]. (e) The image sequence used to simulate conditioning and testing for our memory-augmented DPC network. (f) Responses of the lower-level neurons of the network. Colored lines represent the five most active lower-level neurons at each step. Left to right: neural responses during conditioning, and during testing with a single start frame, middle frame, and end frame. (g, h) Normalized pairwise cross-correlations of (g) primary visual cortex neurons (adapted from [1]) and (h) the lower-level model neurons. Top: during conditioning; middle two: testing with the starting stimulus before and after conditioning; bottom: the difference between cross-correlations, “After” minus “Before” conditioning.
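The encoding and recall in (c) and (d) can be sketched with a toy associative memory. All sizes and learning rates here are assumptions, and for simplicity the memory activity is a fixed random code for the episode rather than an inferred estimate; the two essentials survive: the weights G change as presynaptic memory activity times postsynaptic prediction error, and recall from a partial cue masks out the error on the missing rh entries (robust predictive coding).

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative sizes (assumed): content vector r0, dynamics vector rh,
# memory population m. G predicts the concatenated pattern [r0; rh] from m.
d0, dh, dm = 16, 8, 32
G = np.zeros((d0 + dh, dm))
m_code = rng.standard_normal(dm) / np.sqrt(dm)  # memory activity for this episode

r0 = rng.standard_normal(d0)
rh = rng.standard_normal(dh)
pattern = np.concatenate([r0, rh])

# Encoding: weight change = presynaptic activity (m) x postsynaptic error.
for _ in range(200):
    err = pattern - G @ m_code
    G += 0.1 * np.outer(err, m_code)

def recall(cue, n_iters=500, lr=0.02):
    """Cue with a partial observation (r0 only); the error on the
    missing rh entries is masked out, i.e., ignored during recall."""
    m = np.zeros(dm)
    target = np.concatenate([cue, np.zeros(dh)])
    mask = np.concatenate([np.ones(d0), np.zeros(dh)])
    for _ in range(n_iters):
        err = mask * (target - G @ m)   # robust: no error on missing entries
        m += lr * G.T @ err             # error-driven update of memory estimate
    return (G @ m)[d0:]                 # top-down prediction of rh

rh_recalled = recall(r0)
cos = rh_recalled @ rh / (np.linalg.norm(rh_recalled) * np.linalg.norm(rh))
print(cos > 0.9)  # True: the cue alone recovers the dynamics vector
```

Recovering rh from the start frame alone is the mechanism behind the cue-triggered sequence recall shown in (f).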
Fig 6.
Three-level DPC model learns progressively more abstract temporal representations.
(a) Generative model for three-level DPC. (b) Schematic depiction of the inference process. Observation nodes are omitted for clarity. (c) Inference for an example Moving MNIST sequence with “straight bouncing” dynamics. Red time steps mark the moments when the first-level prediction error exceeded the threshold, causing the network to transition to a new second-level state (see Methods). For these time steps, the predictions (second row) are generated by the second-level neurons; the rest are generated by the first-level neurons, as in Fig 3. (d) The network’s responses to the Moving MNIST sequence in (c). Left to right: first-level responses, second-level responses, third-level responses, first-level modulation weights, second-level modulation weights. (e) Same as (c) but with “clockwise bouncing” dynamics. (f) Same as (d) but for the sequence in (e). (g) Third-level responses to the Moving MNIST sequences visualized in the 2D space of the first two principal components. Left: responses colored by bouncing type; right: responses colored by motion direction. (h) Comparison of decoding performance for bouncing type versus motion direction using the modulation weights generated by the second and third levels. Error bars: ±1 standard deviation from 10-fold cross-validation. Orange: chance accuracies.