Fig 1.
A. Structure of the retro-cueing task performed by monkeys from [10]. B. Input patterns corresponding to an example trial. Colour denotes the activation value of an input unit. Note that inputs near the edges wrap around because the space is circular. C. Schematic RNN architecture. Green dots denote recurrent units. Only 3 input units and 2 output units are shown (there were 36 and 17, respectively). D. Distribution of errors made by the models at test (note that errors were binned into 40-degree-wide bins for plotting; circles and vertical bars correspond to M±SEM), shown with the best-fit von Mises (circular normal) density function (line). E. Behavioural performance of the monkeys from [10] (circles) alongside the best-fitting mixture model (line).
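The binning and density described for panel D can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the simulated error data, and the choice of 30 models with 500 trials each are all assumptions made for the example.

```python
import numpy as np

def von_mises_pdf(x, mu, kappa):
    """Von Mises (circular normal) density; x and mu in radians."""
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

def bin_errors(errors_deg, bin_width=40):
    """Proportion of errors per bin; errors in degrees within [-180, 180]."""
    edges = np.arange(-180, 180 + bin_width, bin_width)
    counts, _ = np.histogram(errors_deg, bins=edges)
    centres = edges[:-1] + bin_width / 2
    return centres, counts / counts.sum()

# Hypothetical data: rows are models, columns are test trials.
rng = np.random.default_rng(0)
errors = np.rad2deg(rng.vonmises(mu=0.0, kappa=5.0, size=(30, 500)))

# Bin each model's errors, then summarise across models as M and SEM.
props = np.array([bin_errors(e)[1] for e in errors])
mean = props.mean(axis=0)
sem = props.std(axis=0, ddof=1) / np.sqrt(props.shape[0])
```

With 40-degree-wide bins the circle is split into 9 bins, so `mean` and `sem` each hold 9 values.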
Fig 2.
The geometry of cued mnemonic representations learnt by the RNNs.
A. Visualisation of the pre- (left) and post-cue (right) geometry of cued items reported in [10]. Population responses from LPFC binned into 4 colour categories (denoted by marker colour) for each stimulus location (upper, lower; denoted by marker shape). Data plotted in a reduced-dimensionality space, defined by the first 3 principal components (PCs) of the 8 location-colour pairs. Planes of best fit for each location shown as grey quadrilaterals. B. Analogous visualisation of the hidden activity patterns from two example RNN models for the pre-cue (left) and post-cue (right) delay. The two locations (L1, L2) correspond to triangles and squares, respectively. The percentage of total variance explained by each PC axis is shown in square brackets. Chosen models (in this and subsequent figures) correspond to the ones with geometries qualitatively most and least similar to the group-level average (in this case, the average post-cue geometry). Note that the principal components were calculated separately for each delay (and model); thus the PC axes shown, e.g., in the upper left (example model 1, pre-cue) panel are not the same as the ones shown in the upper right (example model 1, post-cue) panel. C.-D. Between-plane angles θ for the pre- (red dots) and post-cue delay (navy triangles) timepoints, respectively. Lighter and darker colours correspond to the values for individual models and grand averages, respectively. E. Phase alignment angles ψ between the cued planes in the post-cue delay. F. Proportion of total variance explained (PVE) by the first 3 PCs for the PCA models fit to activation patterns from individual networks. Values for individual models shown in lighter, and grand averages in darker colours. G. Subspace alignment index (AI) between location-specific planes during the pre- (red) and post-cue (navy) delay intervals. Individual models shown as points; bars correspond to the grand averages. AI reported for 2- and 3-dimensional subspaces. H. AI for the unrotated (light grey) and rotated (dark grey) subspaces. For a description, refer to the main text. Statistically significant contrasts denoted by asterisks (*: p < .05, **: p < .01, ***: p < .001).
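As an illustration of how a between-plane angle θ could be obtained from planes of best fit, the sketch below fits a plane to each set of points in the 3-dimensional PC space and measures the angle between the plane normals. It is a simplified version under stated assumptions: it returns only the unsigned angle in [0, 90] degrees, it does not reproduce whatever sign convention the main text uses, and the function names are hypothetical.

```python
import numpy as np

def plane_normal(points):
    """Unit normal of the least-squares plane through an (n, 3) point cloud."""
    centred = points - points.mean(axis=0)
    # Right singular vectors are ordered by decreasing variance;
    # the last one is orthogonal to the best-fit plane.
    _, _, vt = np.linalg.svd(centred)
    return vt[-1]

def plane_angle_deg(points_a, points_b):
    """Unsigned angle between two best-fit planes, in [0, 90] degrees."""
    cos = abs(plane_normal(points_a) @ plane_normal(points_b))
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))

# Hypothetical example: 4 colour points per location, as in the figure.
square_xy = np.array([[1, 1, 0], [1, -1, 0], [-1, -1, 0], [-1, 1, 0]], dtype=float)
square_xz = square_xy[:, [0, 2, 1]]            # same square, lying in the xz-plane
parallel = plane_angle_deg(square_xy, square_xy + [0, 0, 1])   # approximately 0
orthogonal = plane_angle_deg(square_xy, square_xz)             # approximately 90
```

Parallel planes give an angle near 0 degrees and orthogonal planes an angle near 90 degrees.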
Fig 3.
The geometry of the uncued item representations.
A. Visualisation of the pre- (left) and post-cue (right) geometry of the uncued memory items reported in [10]. Population activity patterns from macaque LPFC were averaged across the cued items and binned into 4 uncued colour categories (denoted by marker colours). Data from each delay period were plotted in a reduced-dimensionality space, defined by the first 3 PCs of the 8 location-colour pairs. Planes of best fit for each location shown as grey quadrilaterals. All conventions as described in Fig 2A. B. Hidden activity patterns for the uncued items from two example models, visualised in a 3-dimensional space. All conventions as in Fig 2B. C. Between-plane angles (θ, left) and phase alignment angles (ψ, right) between the two uncued colour planes in the post-cue delay. Individual models and population mean shown as transparent and opaque markers, respectively. D. Alignment index for the uncued subspaces in the pre- and post-cue delays. Note that prior to the presentation of the retro-cue, the coding format is location-based. Therefore, the pre-cue data are the same as shown in Fig 2E. E. Visualisation of the cued and uncued memory items on trials where the upper location was cued, as reported in [10]. Population activity patterns from macaque LPFC were averaged across the uncued items to calculate the cued item representations, and across cued items for the uncued representations, prior to binning into 4 colour categories each. Data from each delay plotted in the reduced-dimensionality space corresponding to the first 3 PCs. F. Visualisation of the hidden activity patterns for cued and uncued items in the post-cue delay, on trials where L1 was cued. Data from two example models. Note that the plane for L2 is severely compressed and thus hard to see. G. Left: Angles θ (rectified) between cued and uncued planes in the post-cue delay, averaged across the two retro-cue locations. Values for individual models shown in grey, average across networks in black. Right: Complementary AI values between the cued and uncued subspaces, averaged across the two retro-cue locations. AI calculated for 2- and 3-dimensional subspaces. Black circles correspond to values for individual models; grey bars denote the means across all networks. H. Colour discriminability index (CDI) for the pre-cue subspaces (averaged across both locations, black circles) and cued and uncued subspaces in the post-cue delay (blue triangles and green crosses, respectively).
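For readers unfamiliar with the alignment index, the sketch below implements one commonly used definition: the variance of dataset B captured by the top principal axes of dataset A, normalised by the variance captured by B's own top principal axes. Whether this matches the exact definition used here is an assumption; the main text should be consulted for the authors' formula. The function name and the simulated data are hypothetical.

```python
import numpy as np

def alignment_index(data_a, data_b, n_dims=2):
    """AI of the top-n_dims subspace of A with respect to the variance of B.

    data_a, data_b: (n_conditions, n_units) activity matrices.
    Returns a value between 0 (orthogonal) and 1 (fully aligned).
    """
    cov_a = np.cov(data_a, rowvar=False)
    cov_b = np.cov(data_b, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reverse the columns
    # to put the leading principal axes first.
    _, evecs_a = np.linalg.eigh(cov_a)
    top_a = evecs_a[:, ::-1][:, :n_dims]
    evals_b = np.linalg.eigvalsh(cov_b)[::-1]
    captured = np.trace(top_a.T @ cov_b @ top_a)
    return captured / evals_b[:n_dims].sum()

# Hypothetical data: 8 conditions, 4 units. A varies only in units 0-1,
# B only in units 2-3, so the two subspaces are orthogonal.
rng = np.random.default_rng(1)
data_a = np.zeros((8, 4))
data_a[:, :2] = rng.normal(size=(8, 2))
data_b = np.zeros((8, 4))
data_b[:, 2:] = rng.normal(size=(8, 2))

ai_self = alignment_index(data_a, data_a)    # close to 1
ai_orth = alignment_index(data_a, data_b)    # close to 0
```

Under this definition, a subspace compared with itself gives an AI of 1 and two orthogonal subspaces give an AI of 0, the two extremes that bound the values plotted.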
Fig 4.
A. Training loss of an example RNN model plotted against training epochs. The three dots correspond to the timepoints chosen for the subsequent analysis: untrained (orange), plateau (magenta) and trained (purple) stages, respectively. B. Scatterplot of the weights between the two retro-cue input units and recurrent units for an example model. C. Comparison of the plane angles θ and AI between the cued planes across different training stages for the pre- (top row) and post-cue (bottom row) delays. Left: Plane angles at the untrained (orange), plateau (magenta) and trained (purple) stages. Values for individual models shown in transparent colours, circular mean in opaque. Right: AI at the same training stages. Mean across models depicted as bars with individual datapoints overlaid. D. Partial plots showing the log of training epochs (y-axis) versus the three representational geometry metrics used as regression predictors (left to right: AI Cued, AI Cued/Uncued and AI Uncued). Values for individual networks shown as grey dots, regression lines of best fit in red alongside the 95% slope confidence intervals in navy dashed lines.
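Because plane angles are circular quantities, they are summarised with a circular rather than an arithmetic mean. A minimal sketch of the standard circular mean is given below; the function name and example values are illustrative only.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Circular mean of angles given in degrees; result in [-180, 180]."""
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    # Average the unit vectors, then take the direction of the resultant.
    return np.rad2deg(np.arctan2(np.sin(angles).mean(), np.cos(angles).mean()))

near_zero = circular_mean_deg([10, 350])     # close to 0, not 180
near_pi = circular_mean_deg([170, -170])     # close to +180 or -180, not 0
```

The two examples show why the distinction matters: the arithmetic mean of 10 and 350 degrees is 180, which points in the opposite direction to both angles.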
Fig 5.
Comparison of cued item geometry across networks trained under various post-cue maintenance pressure conditions.
A. Example trials for a subset of the parametric family of retro-cueing tasks used. Overall trial length was constant, and the ratio between pre- and post-cue delay lengths was varied. Conventions as in Fig 1B. B. Comparison of cued subspace AI during the pre- (red circles) and post-cue (blue triangles) intervals between networks trained with different post-cue delay lengths. Asterisks denote the significance levels of the contrasts described in the main text: ***: p < .001, **: p < .01, ns: p ≥ .05.
Fig 6.
A. Cross-temporal generalisation scores for decoders trained to discriminate between colour pairs. Data have been averaged across all trained networks. Black lines indicate the junctions between task events: stimulus presentation, pre-cue delay, retro-cue, and post-cue delay. Left: Average scores for models trained with a fixed-length delay interval. Right: Analogous plot for models trained with variable delay lengths. Note that the networks show a stable (cross-generalisable) memory code between the 10 and 16 cycle marks, which covers the range of temporal variability experienced during training (with the post-cue delay starting at the 10 cycle mark and lasting until between the 11 and 18 cycle marks). B. Boxplots showing the distribution of mean cross-temporal decoding accuracy scores, averaged across the two memory delays, for all models. Variable delay interval length condition shown on the left in green, fixed delay length condition on the right in sand. Asterisks denote the results of the one-sided one-sample t-tests against the chance (50%) decoding level; *** corresponds to p < .001. C.-D. Plane angles θ between the cued subspaces in the pre- and post-cue delay periods, respectively. E. Phase alignment angles ψ between the cued subspaces in the post-cue delay.
Fig 7.
Behavioural findings from the probabilistic cue paradigm.
A. Task structure. Example valid trial (where the retro-cue and probe inputs match) shown on the left, example invalid trial shown on the right. B. Distribution of errors made by the models after convergence (dots and spikes, M±SEM) shown with a best-fit von Mises pdf (line). Performance on trials where the retro-cue matched the probe (valid) shown in green, mismatch (invalid) trials shown in red. Data plotted for models trained under 2 different cue validity conditions: 75% and 50%. Note that errors were binned into 40-degree-wide bins for plotting. C. Comparison of the mixture model parameters (left to right panels: K, pT, pNT and pU) across cue validity conditions (50% and 75%, corresponding to lavender triangles and orange circles, respectively) and trial types (valid, invalid; shown on the x-axis). Asterisks denote the significance levels of the post-hoc tests described in the main text: ***: p < .001, **: p < .01, ns: p ≥ .05. Comparisons within a validity level shown in orange and lavender, comparisons across different validity levels (but for the same trial type) denoted in black.
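The four parameters in panel C (K, pT, pNT, pU) are those of a three-component mixture model of response errors. The sketch below shows the standard form of such a model, in which responses are drawn from a von Mises distribution centred on the target with probability pT, from one centred on the non-target with probability pNT, and from a uniform distribution with probability pU. That the model fitted here has exactly this form is an assumption; the function names are hypothetical.

```python
import numpy as np

def von_mises_pdf(x, mu, kappa):
    """Von Mises density; x and mu in radians."""
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

def mixture_pdf(resp, target, nontarget, K, pT, pNT, pU):
    """Density of a response (radians) under the three-component mixture."""
    assert abs(pT + pNT + pU - 1.0) < 1e-9, "mixture weights must sum to 1"
    return (pT * von_mises_pdf(resp, target, K)          # target responses
            + pNT * von_mises_pdf(resp, nontarget, K)    # non-target (swap) responses
            + pU / (2 * np.pi))                          # uniform random guesses

def neg_log_likelihood(params, resp, target, nontarget):
    """Negative log-likelihood for params = (K, pT, pNT); pU is the remainder."""
    K, pT, pNT = params
    pU = 1.0 - pT - pNT
    return -np.sum(np.log(mixture_pdf(resp, target, nontarget, K, pT, pNT, pU)))
```

Parameter estimates could then be obtained by minimising `neg_log_likelihood` with a numerical optimiser, subject to K > 0 and the weights lying in [0, 1].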
Fig 8.
Neural geometry findings from the probabilistic cue paradigm.
A. Distribution of plane angles θ and phase-alignment angles ψ formed in the delay intervals for all trained networks. Values for the pre-cue, post-cue and post-probe delays shown as red circles, blue triangles and grey squares, respectively. Data from individual models shown in transparent, and grand averages in solid colours. Plots for models trained under the deterministic (retrocue validity = 100%) and non-deterministic retrocue validity conditions (75% and 50%) ordered top to bottom rows, respectively. B. Colour discriminability index (CDI) for the pre-cue, post-cue and post-probe colour planes. Pre-cue values were averaged across both locations, whilst the post-cue values for the cued and uncued subspaces were averaged across valid and invalid trials. Bars correspond to the mean across trained networks, error bars denote SEM. Panels correspond to results from models trained under 100%, 75% and 50% retrocue validity conditions, from left to right. C. Scatterplots showing the relationship between the normalised CDI benefit (y-axis, see main text) and the difference in mixture model parameter estimates on valid and invalid trials (x-axes). Panels correspond to the memory precision parameter K, probability of choosing the target item pT, probability of choosing the non-target item pNT, and probability of making random guesses pU, from left to right. Results of a Spearman correlation between the variables on the x- and y-axes are shown on each panel. Lines of best fit plotted in red.