Table 1.
Mathematical notation used in the paper.
Vector and matrix variables are listed together with their dimensions; all other variables are scalars.
Fig 1.
Graphical model of the generative process assumed by temporal predictive coding.
x_k correspond to hidden states, y_k to observations, and u_k to control inputs. Circles denote latent variables, squares denote observations, and arrows denote conditional dependence between variables (the absence of an arrow indicates conditional independence).
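In the linear case, this graphical model corresponds to the pair of equations below (a sketch written in terms of the A, B and C matrices referenced in later figures; the Gaussian noise covariances Σ_x and Σ_y follow the convention of Fig 5, and the nonlinear variant simply wraps the predictions in nonlinearities):

\[
\begin{aligned}
x_k &= A x_{k-1} + B u_k + \omega_x, & \omega_x &\sim \mathcal{N}(0, \Sigma_x),\\
y_k &= C x_k + \omega_y, & \omega_y &\sim \mathcal{N}(0, \Sigma_y).
\end{aligned}
\]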
Fig 2.
Possible neural implementations of temporal predictive coding.
A: Potential neural circuit implementing the iterative recurrent predictive coding algorithm. For simplicity, we have depicted each neural ‘layer’ as possessing only two neurons. B: Version of the model in which the prediction errors are represented by the difference between the membrane potentials at the soma and at the apical dendrites (depicted as ellipses). C: Neural circuitry required to implement the single-iteration predictive coding algorithm. This model no longer includes a separate set of neurons explicitly storing the estimate from the previous time step; instead, the temporal prediction errors are computed naturally through recurrent connections. For simplicity, we omitted the control inputs Bu_k, which can be implemented in the same way as the recurrent inputs to the error neurons or apical dendrites.
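As a rough illustration of the iterative algorithm in panel A, the following minimal sketch (Python/NumPy; the variable names and the fixed step size eta are our own choices, not taken from the paper) updates the state estimate by gradient descent on the two squared prediction errors, assuming the linear model with identity noise covariances:

    import numpy as np

    def tpc_inference_step(x_hat, x_prev, y, A, C, eta=0.1):
        # Temporal prediction error: current estimate vs. prediction from the previous estimate
        eps_x = x_hat - A @ x_prev
        # Sensory prediction error: observation vs. prediction from the current estimate
        eps_y = y - C @ x_hat
        # Gradient descent on the summed squared errors with respect to x_hat
        x_hat = x_hat + eta * (C.T @ eps_y - eps_x)
        return x_hat, eps_x, eps_y

Repeating this update several times per observation gives the iterative algorithm of panel A; running it exactly once corresponds to the single-iteration circuit of panel C.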
Fig 3.
The tracking task and the impact of inference step size and the number of inference steps on performance.
A: The dynamics of the true hidden state, represented as a 3-dimensional vector at each time step, with entries corresponding to position (x1), velocity (x2) and acceleration (x3). B: The projected noisy observations of the true system state in A. C: Estimates of the acceleration by different models, zoomed in on the interval between time steps 560 and 600. D: Difference in MSE between tPC and the Kalman filter for varying numbers of inference steps and inference step sizes. tPC stands for temporal predictive coding and KF for Kalman filter. All values are in arbitrary units (a.u.).
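The tracking setup of panels A-B and the role of the two inference hyperparameters in panel D can be sketched as follows (Python/NumPy; the time step dt, noise levels, step size eta and number of inference steps k are illustrative values of our own, and the control input is omitted):

    import numpy as np

    rng = np.random.default_rng(0)
    T, dt, eta, k = 1000, 0.1, 0.2, 5   # trial length, Euler step, inference step size, inference steps

    # Constant-acceleration dynamics: state = (position x1, velocity x2, acceleration x3)
    A = np.array([[1., dt, 0.],
                  [0., 1., dt],
                  [0., 0., 1.]])
    C = rng.standard_normal((3, 3))     # linear projection to the observation space

    x_true = np.zeros(3)
    x_hat = x_prev = np.zeros(3)
    mse = 0.0
    for t in range(T):
        x_true = A @ x_true + rng.normal(0., 0.01, 3)   # noisy hidden dynamics (panel A)
        y = C @ x_true + rng.normal(0., 0.01, 3)        # noisy observations (panel B)
        x_hat = A @ x_prev                              # start from the temporal prediction
        for _ in range(k):                              # k gradient steps (cf. panel D)
            eps_x = x_hat - A @ x_prev
            eps_y = y - C @ x_hat
            x_hat = x_hat + eta * (C.T @ eps_y - eps_x)
        mse += np.sum((x_hat - x_true) ** 2) / T
        x_prev = x_hat
    print(f"mean squared tracking error: {mse:.4f}")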
Fig 4.
Effects of learning parameters A and C.
A, B: Estimates of the state and observation trajectories, respectively, by different models. ‘True’, ‘Learnt’ and ‘Random’ denote the predictive coding model with true, learnt and random A and C, respectively. For simplicity, only the first dimension of the latent state and of the observation is shown; the other two dimensions show similar performance. C, D: MSE of the predictions at the hidden and observation levels, respectively. Boxplots were obtained from 40 trials for each model. Both x and y are in arbitrary units (a.u.).
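In the ‘Learnt’ condition, A and C can be updated with the local, Hebbian-style rules that gradient descent on the prediction errors yields; a minimal sketch under the same linear, identity-covariance assumptions as above (the learning rate alpha is our own choice):

    import numpy as np

    def tpc_learning_step(A, C, x_hat, x_prev, y, alpha=1e-3):
        # Prediction errors evaluated after inference has settled
        eps_x = x_hat - A @ x_prev
        eps_y = y - C @ x_hat
        # Each update is an outer product of a prediction error and
        # the corresponding presynaptic activity
        A = A + alpha * np.outer(eps_x, x_prev)
        C = C + alpha * np.outer(eps_y, x_hat)
        return A, C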
Fig 5.
Performance with non-identity noise covariance.
A: True and learnt A and C matrices for different underlying noise covariance matrices. B, C: MSE of the predictions at the hidden and observation levels for different noise covariance matrices. Error bars were obtained from 40 trials.
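With non-identity noise covariances, the quantity minimised during inference and learning becomes a precision-weighted sum of squared prediction errors; a sketch consistent with the generative model given after Fig 1 (the exact form and notation in the paper may differ):

\[
\mathcal{F}_k = \tfrac{1}{2}\, \varepsilon_y^\top \Sigma_y^{-1} \varepsilon_y
             + \tfrac{1}{2}\, \varepsilon_x^\top \Sigma_x^{-1} \varepsilon_x,
\qquad
\varepsilon_y = y_k - C x_k,
\quad
\varepsilon_x = x_k - A x_{k-1} - B u_k .
\]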
Fig 6.
Representations developed by the model when trained with patches from movies of dynamic natural scenes.
A: First 10 frames of 2 example training movies used in our experiments. Patches were extracted from movies obtained from pexels.com, pixabay.com and commons.wikimedia.org (for Wikimedia attributions see https://github.com/C16Mftang/temporal-predictive-coding). B: The projective fields (columns of C) developed Gabor-like filters after training. C: Space-time receptive fields developed by the hidden neurons of the tPC model.
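Panel B can be reproduced qualitatively by reshaping each column of the learnt C into an image patch; a minimal sketch (the patch side length of 16 and the use of matplotlib are our own assumptions):

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_projective_fields(C, patch=16, n=10):
        # Each column of C maps one hidden neuron to a (patch x patch) image patch
        fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
        for i, ax in enumerate(axes):
            ax.imshow(C[:, i].reshape(patch, patch), cmap="gray")
            ax.axis("off")
        plt.show()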
Fig 7.
A: A free-body diagram of a simple pendulum with a mass m attached to a string of length L. Also shown are the forces applied to the mass; the restoring force −mg sin θ is a net force toward the equilibrium position. B: A phase portrait of the pendulum simulation showing the predictions of our linear and nonlinear models against the ground-truth data. The vector field (i.e. the set of small arrows) was created by computing the derivatives of θ1 and θ2 at t = 0 on a grid of 30 points over the ranges −π to +π and −4 to +4 for θ1 and θ2, respectively. C: The barplot shows the difference between the mean prediction errors of the linear and nonlinear models over 100 simulations with varying noise profiles. The mean errors are significantly different (p ≪ 0.001).
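The dynamics behind panel B follow from the restoring force shown in panel A: with θ1 the angle and θ2 the angular velocity, the derivatives are θ̇1 = θ2 and θ̇2 = −(g/L) sin θ1. A minimal sketch of the vector-field computation (the values of g and L are illustrative; the paper's simulation parameters are not reproduced here):

    import numpy as np

    def pendulum_derivs(theta1, theta2, g=9.8, L=1.0):
        # theta1: angle; theta2: angular velocity
        dtheta1 = theta2
        dtheta2 = -(g / L) * np.sin(theta1)   # from the restoring force -m*g*sin(theta)
        return dtheta1, dtheta2

    # Vector field on a grid spanning the axis ranges of panel B
    th1, th2 = np.meshgrid(np.linspace(-np.pi, np.pi, 30),
                           np.linspace(-4.0, 4.0, 30))
    d1, d2 = pendulum_derivs(th1, th2)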