Fig 1.
Modular predictive coding and image reconstruction after learning.
a Example of a novel image I presented to the network after learning on 1000 other images in a total of 5000 randomly sampled presentations. b Input f0 to the first level after retinal and LGN processing. c1 Activity of level-1 familiarity neurons f1 receiving 1-to-1 input from n1-neurons through the constrained connection matrix V1. d1 Reconstruction of the preprocessed image based on the steady-state activity of the level-1 familiarity neurons, f0 ≈ U1 f1. e1 Activity of level-1 novelty neurons n1 receiving 1-to-1 input from the LGN and localized input from f1-neurons through the constrained connection matrix U1. Orange circles represent RFs of three neighboring f1-neurons. c2, d2, e2 The corresponding quantities for level 2. The number of f2-neurons is 50% of the number of f1-neurons and 25% of the number of n1-neurons (= number of pixels in the visual input).
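The population ratios stated at the end of this caption can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `population_sizes` and the 64-by-64 image size are assumptions, not values taken from the paper.

```python
def population_sizes(height, width):
    """Neuron counts implied by the caption's ratios for a height x width image."""
    n1 = height * width  # one level-1 novelty neuron per pixel
    f2 = n1 // 4         # f2-neurons are 25% of the n1-neurons
    f1 = 2 * f2          # f2-neurons are 50% of the f1-neurons
    return {"n1": n1, "f1": f1, "f2": f2}


# 64 x 64 is an assumed image size, used only for illustration.
sizes = population_sizes(64, 64)
print(sizes)  # {'n1': 4096, 'f1': 2048, 'f2': 1024}
```

Under these ratios, each level halves the number of familiarity neurons relative to the population below it.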
Fig 2.
Anatomical versus effective connectivity.
A Schematic anatomical connectivity pattern in the early ventral visual cortex shows recurrent synaptic connections within each level (Eq (4)). Lower-level ‘familiarity’ neurons (green) project to ‘novelty’ neurons (red) at the next higher level. B Effective connectivity expressing causal relationships results in a purely feedforward network (Eq (5)). At each level, familiarity and novelty information is extracted from the familiarity representation at the previous level.
Fig 3.
Local recurrent connectivity emerging from unsupervised learning improves reconstruction quality.
a Evolution of reconstruction error (δ, golden, averaged across 50 consecutive presentations) during 5000 random presentations from a set of 1000 training images. Average activity of level-1 novelty neurons (nav, magenta) mirrors decrease in δ. Initial sharp decrease in n1 activity is explained by average activity increase of level-1 familiarity neurons (fav, black), which quickly learn to extract the most dominant component, mean local brightness (cf. d2). b1–b3 Reconstruction of a single novel image (not used for training) after 0, 100, and 5000 presentations based on f1 activities. c1–c3 Activity of all novelty neurons n1 (same number of neurons as pixels in image), showing the reduction of local novelty with decreasing reconstruction error. d1–d3 Corresponding activities of f1-neurons in response to the novel image. Image representation is gradually refined, despite the image not having been presented to the network. e1–e3 Evolution of overlapping receptive fields (RFs, rows of V1), shown separated for visualization, of nine representative nearest-neighbor familiarity neurons.
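The smoothing of δ in panel a can be sketched as a block average. This assumes that "averaged across 50 consecutive presentations" means non-overlapping blocks of 50; a sliding window would be an equally plausible reading. The helper `block_average` and the synthetic error trace are hypothetical and stand in for the network's actual per-presentation errors.

```python
def block_average(errors, window=50):
    """Average consecutive, non-overlapping blocks of `window` values."""
    averages = []
    for start in range(0, len(errors), window):
        block = errors[start:start + window]
        averages.append(sum(block) / len(block))
    return averages


# Synthetic, made-up error trace that decays with the presentation number.
errors = [1.0 / (1.0 + t / 500.0) for t in range(5000)]
smoothed = block_average(errors)
print(len(smoothed))  # 100 blocks for 5000 presentations
```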
Fig 4.
Novelty detection and familiarity information fill-in at different spatial scales.
A familiar image is either occluded by a narrow, 1-pixel-wide ‘curtain’ (a) or by a 7-pixel-wide ‘fence’ (b). Circles represent the RFs of level-1 (smaller circle) and level-2 neurons, respectively. Novelty neurons at level 1 are activated by the fine curtain (c1), while at level 2, curtain information is filtered out (c2). For the fence, novelty at level 1 is only detected at some edges (d1), but novelty of the wide fence bars is detected at level 2 (d2). Reconstruction of the image (LGN activity) based on the level-1 and level-2 familiarity neurons for the curtain (e1, 2) and the fence occlusions (f1, 2) shows how the occluded parts of the image are filled in despite the reduced number of neurons.
Fig 5.
Reconstruction and novelty as a function of occlusion patch size k.
a, b Example of a novel image with occlusion patches (grey squares) of side length k = 1 (a) and 10 pixels (b). For each patch size, the total occluded area is 5% of the image area. c Reconstructions of occluded and non-occluded images progressively become more similar across levels. Average distance δ (across 200 novel images) between the non-occluded original image I and the same image occluded by the patches (black curve, ‘level 0’), and between the reconstructions of those two images (F−1(I) for the non-occluded image and its counterpart for the occluded image) based on the familiarity neurons at level 1 (blue) and level 2 (red), respectively. d Average activity of level-1 (blue) and level-2 (red) novelty cells over the occluded (solid) and non-occluded image areas (dotted curves). Novelty neurons in the occluded parts of the image are more active than novelty neurons in the non-occluded parts of the image as long as the patch size k is smaller than the RF diameter (8 and 18 pixels for levels 1 and 2, respectively; cf. blue and red circles).
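The occlusion protocol in panels a and b (square patches of side k covering 5% of the image) can be sketched as follows. Several details are assumptions rather than the authors' procedure: patches are placed on a k-spaced grid so that they never overlap, the image is taken to be 100 by 100 pixels, and the names `occlusion_mask` and `occluded_fraction` are hypothetical.

```python
import random


def occlusion_mask(height, width, k, fraction=0.05, seed=0):
    """Boolean mask with square k x k patches covering about `fraction` of the area."""
    rng = random.Random(seed)
    n_patches = round(fraction * height * width / (k * k))
    # Candidate top-left corners on a k-spaced grid, so patches cannot overlap.
    cells = [(r, c)
             for r in range(0, height - k + 1, k)
             for c in range(0, width - k + 1, k)]
    mask = [[False] * width for _ in range(height)]
    for r, c in rng.sample(cells, n_patches):
        for dr in range(k):
            for dc in range(k):
                mask[r + dr][c + dc] = True
    return mask


def occluded_fraction(mask):
    """Fraction of pixels marked as occluded."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# 100 x 100 is an assumed image size; k = 1 and k = 10 match panels a and b.
for k in (1, 10):
    mask = occlusion_mask(100, 100, k)
    print(k, occluded_fraction(mask))
```

Keeping the occluded area fixed while varying k means that the number of patches scales as 1/k², so small k yields many scattered pixels and large k yields a few large squares.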
Fig 6.
Non-classical effect of endstopping is present in the effectively feedforward architecture of modular predictive coding.
a Activity of level-1 novelty neurons averaged across the RF of bar-detecting level-1 familiarity neurons decreases as bar height exceeds approximately one-half of the RF diameter. Activity is shown relative to the baseline activity for a short bar of height 0.4 RF diameters that is five times brighter than the background. Black circles indicate the responses of bar detectors relative to the baseline for short and tall bars in panels b and c, respectively. b Average activities of level-1 novelty neurons in a neighborhood of 3.5-by-3.5 RF diameters around each bar detector in response to the short bar. Central dark red square (black circle) represents the average activity of all novelty cells within the RF of a bar detector, further averaged across all bar detectors. The same double averaging was applied to the novelty neurons within the RFs of the neighboring familiarity neurons. While these RFs are highly overlapping, they are displayed here side-by-side. c Same as in panel b, but in response to a tall bar of height 3 (instead of 0.4) RF diameters. The averaged activity of novelty neurons (representing the local reconstruction error) within the bar detector's RF (black circle) is lower than that for the short bar (compare black circles) and, for the same reason, also lower than that recorded at the ends of the bar (top and bottom yellow patches).