Temporally discontinuous engagement of the vmPFC in remote memory retrieval

Systems-level consolidation is the time-dependent reorganisation of a memory trace in the neocortex, with the ventromedial prefrontal cortex (vmPFC) being particularly implicated. Capturing the precise temporal evolution of this crucial process in humans has long proved elusive. Here, we used multivariate methods and a longitudinal functional MRI design to detect, with high granularity, the extent to which autobiographical memories of different ages were represented in vmPFC and how this changed over time. We observed an unexpected biphasic involvement of vmPFC during retrieval, rising and falling around an initial peak of 8-12 months, before re-engaging for older two and five year old memories. Remarkably, when re-examined eight months later, representations of individual memories had undergone their hypothesised strengthening or weakening over time. We conclude that the temporal recruitment of vmPFC in autobiographical memory retrieval seems to be non-linear, revealing a previously-unknown feature of systems-level consolidation that is absent from current theories.


Introduction
Autobiographical memories are the cherished ghosts of our past. Through them we visit places long departed, see faces once familiar, hear voices now silent and re-experience emotions since dormant. These memories of our personal past experiences can be many decades old, yet we are often able to recall them on a whim and with ease. How the brain represents autobiographical memories over a lifetime is one of the central, and as yet unanswered, questions of memory neuroscience.

Our fleeting present transitions into our autobiographical past through the modification of synaptic connectivity over the course of a few hours (Kandel, 2001), an undisputedly hippocampal-dependent process (Guzowski, 2002; Morris et al., 2003; Pastalkova et al., 2006; Runyan & Dash, 2005). A memory's journey does not end there, however, because over the course of time these memories come to be represented in the neocortex, a process termed systems-level consolidation (Frankland & Bontempi, 2005), although the precise timeframe for this is unknown. Whether the hippocampus ever fully relinquishes its involvement in autobiographical memory retrieval is a long-standing debate. One theory asserts that the hippocampus plays a short-term role before memories become fully consolidated to the neocortex and can be retrieved without the hippocampus (Squire, Genzel, Wixted, & Morris, 2015). Other accounts posit that the hippocampus is necessary for the vivid retrieval of autobiographical memories in perpetuity (

In humans, a putative functional homologue in the context of autobiographical memory is the ventromedial prefrontal cortex (vmPFC). Damage to this region in humans has been linked to impoverished recall of recent and remote autobiographical memories (e.g., Bertossi, Tesini, Cappelli, and Ciaramelli, 2016).
In a recent review, however, McCormick, Ciaramelli, De Luca, and Maguire (2017) noted that it is difficult to come to a firm conclusion about the status of autobiographical memory in vmPFC-damaged patients. This is due to the dearth of studies examining autobiographical memory retrieval in detail in patients with selective bilateral vmPFC damage.

photograph, chose a phrase to help remind them of this memory during the subsequent scanner task, and rated its characteristics. (B-F) Subjective ratings (means +/- 1 SEM; see also Supplemental Table 1A) of memory characteristics at each time period for Experiment 1, averaged across the two sets of memories. Ratings were on a scale of 1 to 5, where 1 was low and 5 was high. For emotional valence: 1-2 = negative, 3 = neutral, 4-5 = positive. * p < 0.05, ** p < 0.01, *** p < 0.001.

Objective scoring of memory details
To complement the subjective ratings of memory characteristics with a more objective assessment of their content, transcripts of participants' memory interviews were scored using the Autobiographical Interview protocol (Levine, Svoboda, Hay, Winocur, & Moscovitch, 2002; Materials and Methods). In total for this first experiment, 10,187 details were scored. The mean (SD) number of internal details (bound to the specific 'episodic' spatiotemporal context of the event) and external details (arising from a general 'semantic' knowledge or references to unrelated events) are shown in Supplemental Table 1B (see also Figure 2). They were then compared across time periods. In contrast to the subjective ratings of memory detail, there was no difference in the number of details recalled across memories from different time periods (F(4.54,131.66) = 1.92, p = 0.101). As expected, the number of internal and external details differed (F(1,29) = 206.03, p < 0.001), with more internal details recalled for every time period (all p < 0.001).
No interaction between time period and type of detail was observed (F(7,203)

Table 1B) number of internal and external details at each time period, averaged across the two sets of autobiographical memories.

Memory representations in the vmPFC
Ventromedial prefrontal cortex was delineated as the ventral medial surface of the frontal lobe and the medial portion of the orbital frontal cortex (Mackey & Petrides, 2014 10 and the medial part of BA 11 (see Figure 3A, and Materials and Methods).

On each trial, the photograph and associated pre-selected cue phrase relating to each event were displayed on a screen for 3 seconds. Following removal of this cue, participants then closed their eyes and recalled the memory. After 12 seconds, the black screen flashed white twice, to cue the participant to open their eyes. The participant was then asked to rate how vivid the memory recall had been using a five-key button box, on a scale of 1-5, where 1 was not vivid at all, and 5 was highly vivid (see Figure 3B).

We used Representational Similarity Analysis (RSA) to quantify the extent to which the strength of memory representations in the vmPFC differed as a function of memory age. This was achieved by contrasting the similarity of neural patterns when recalling the same memory with their similarity to other memories to yield a "neural representation" score for each memory (see Materials and Methods and Figure 3C). As there were two memories recalled per time period, the neural representation scores were averaged to produce one value for that period.

representation score calculation using RSA. The neural pattern similarity across trials recalling the same memory (orange) minus the mean pattern similarity between that memory and other memories (yellow) generates a "neural representation" score. A score significantly higher than zero indicates a neural pattern distinct to that memory is present in the vmPFC.

We anticipated an increase in the strength of memory representations at some point between 0.5M and 24M, in line with the results of Bonnici et al. (2012) and Bonnici and Maguire (2017).
This is what we observed, where the most recent 0.5M memories were undetectable (t(29) = 0.72, p = 0.477) in vmPFC, in contrast to the distinct neural signatures observed for 4M (t(29) = 2.85, p = 0.008), 8M (t(29) = 3.09, p = 0.004) and 12M (t(29) = 3.66, p < 0.001) old memories (see Figure 4A). These changes in the strength of memory representations were significant across time periods (F(7,203) = 2.22, p = 0.034), with an observed increase in vmPFC recruitment from 0.5M to 8M (t(29) = 2.07, p = 0.048) and 12M (t(29) = -2.20, p = 0.036). However, what was observed for the following two time periods was unexpected: an apparent disengagement of the vmPFC over the next eight months, as we observed weak detectability of memory representations in vmPFC for 16M (t(29) = 1.85, p = 0.074) and 20M (t(29) = 1.03, p = 0.310) old memories. Neither 16M (t(29) = -1.06, p = 0.298) nor 20M memories (t(29) = -0.40, p = 0.691) were more strongly represented than the recent 0.5M old memories. In contrast, the more remote 24M (t(29) = 4.34, p < 0.001) and 60M (t(29) = 3.55, p = 0.001) memories were detectable in the vmPFC, and significantly more so than the most recent memories (24M vs 0.5M, t(29) = -2.93, p = 0.007; 60M vs 0.5M, t(29) = -2.54, p = 0.017) as well as the more temporally proximal 20M old memories (24M vs 20M, t(29) = -2.50, p = 0.018; 60M vs 20M, t(29) = -2.32, p = 0.028).

The experimental design afforded us the opportunity to verify this biphasic pattern. As we sampled two memories per time-point, this time-dependent pattern should be evident in both sets of memories. As shown in Figure 4B, the two sets of memories followed a similar time-course of changes in representation within vmPFC. This is a compelling replication, given that the two memories from each time-period were unrelated in content as a prerequisite for selection, recalled in separate sessions in the scanner and analysed independently from each other.
separately for the two sets of autobiographical memories.

Our main focus was the vmPFC, given previous work highlighting specifically this region's role in representing autobiographical memories over time (Bonnici & Maguire, 2017). We also scanned within a partial volume, so were constrained in what other brain areas were available for testing (see Materials and Methods). Nevertheless, we examined the same areas as Bonnici et al. (2012) and Bonnici and Maguire (2017), and in no case did we observe a significant change in memory detectability across time periods in the entorhinal/perirhinal cortex (F(7,203)

Following scanning, participants completed three additional ratings. They were asked to indicate the extent to which the memories were changed by the 6 repetitions during scanning on a scale ranging from 1 (not at all) to 5 (completely). They reported that the memories were not changed very much by repetition (mean: 2.61, SD: 0.74). They were also asked how often during scanning they thought about the memory interview one week previously on a scale of 1 (not at all) to 5 (completely), with participants indicating they rarely thought about the interview (mean: 2.29, SD: 1.01). Finally, participants were asked the extent to which the recall of memories from each time period unfolded in a consistent manner over the course of the session. A difference was observed (F(7,203) = 2.78, p = 0.009), with the most recent 0.5M old memories being rated as more consistently recalled than the most remote 60M memories (t(29) = 3.97, p = 0.012).

Rationale and predictions for Experiment 2
The biphasic pattern we observed in the fMRI data did not manifest itself in the subjective or objective behavioural data. In fact, the only difference in those data was higher ratings for the most recent 0.5M old memories. However, these were paradoxically the most weakly represented memories in the vmPFC, meaning the neural patterns were not driven by memory quality. The objective scoring of the memories confirmed comparable levels of detail provided for all memories, without any significant drop in episodic detail or increase in the amount of semantic information provided as a function of time. Therefore, the amount or nature of the memory details were not contributing factors.

Nevertheless, to verify that the results genuinely represented the neural correlates of memory purely as a function of age, one would need to study the effects of the passage of time on the individual neural representations. Therefore we invited the participants to revisit eight months later to recall the same memories again both overtly and during scanning; 16 of the participants agreed to return. In order to generate specific predictions for the neural representations during Experiment 2, we took the actual data for the 16 subjects from Experiment 1 who returned eight months later (see

representations. Overall, therefore, while an increase in detectability in vmPFC of the 0.5M memories eight months later is an obvious prediction, the unexpected predictions generated by the Experiment 1 data were a decrease in detectability of the previously well-represented 12M old memories and an increase in the detectability of the previously undetectable 20M old memories, with no concomitant changes in the behavioural data.

Figure 5. Predicted fMRI changes eight months later in Experiment 2.
Predicted mean +/- 1 SEM changes in the neural representations of individual autobiographical memories after eight months (dark grey line), based on shifting the original observed data forward by two time-points for the 16 subjects from Experiment 1 (green line) who returned for Experiment 2. Light grey arrows indicate the hypotheses. * p < 0.05, ** p < 0.01.

Experiment 2 (eight months later)
One week prior to the fMRI scan, with the assistance of the personal photographs and previously chosen phrases which were used as cues in Experiment 1, the participants verbally recalled and rated the characteristics of their autobiographical memories just as they had done eight months previously (see Materials and Methods and Figure 6A).

and 60M to 68M (t(15) = 9.67, p < 0.001; Figure 6C). Recalling memories eight months later was also perceived as more effortful (F(1,15) = 43.32, p < 0.001), from 0.5M to 8M (t(15) = -7.81, p < 0.001), 4M to 12M (t(15) = -3.30, p = 0.039), 16M to 24M (t(15) = -1.95, p = 0.021), and 20M to 28M (t(15) = -4.03, p = 0.009; Figure 6D). The elapsed time between experiments also led to a reduction in the reported personal significance of memories (F(1,15) = 11.82, p = 0.004), from 24M to 32M (t(15) = 3.58, p = 0.022; Figure 6E). Ratings of emotional valence also changed over the eight month period (F(1,15) = 9.78, p = 0.007), with a reported attenuation of the positivity of memories from 12M to 20M (t(15) = 3.87, p = 0.012; Figure 6F). In addition to these main ratings, no difference was reported in the extent to which memories were recalled from a first or third person perspective (F(1,15) = 0.513, p = 0.485) over the eight month period.
The extent to which memories were recalled as active or static was altered

Table 2A) of memory characteristics at each time period for Experiment 1 (blue line, n=16) and how the ratings of the same memories differed eight months later during Experiment 2 (red line, n=16) averaged across the two sets of memories in both cases. Ratings were on a scale of 1 to 5, where 1 was low and 5 was high. For emotional valence: 1-2 = negative, 3 = neutral, 4-5 = positive. Asterisks indicate significant differences in memory ratings between Experiment 1 and 2; * p < 0.05, ** p < 0.01, *** p < 0.001.

Objective scoring of memory details
As with Experiment 1, transcripts of participants' memory interviews during Experiment 2 were scored using the Autobiographical Interview protocol (Levine et al., 2002; see Materials and Methods). A total of 6,444 details were scored (see Supplemental Table 2B for means, SD). There was a difference in the number of details recalled across different time periods in Experiment 2 (F(7,105) = 2.49, p = 0.021). However, this difference was only observed for external details (F(7,105) = 3.25, p = 0.004), with more provided for 28M memories than 12M memories (t(15) = -4.68, p = 0.008).

finding which was inconsistent with the predictions generated by Experiment 1 was a decrease in the representation of 24M old memories when they were 32M of age (t(15) = -2.69, p = 0.009). However, this prediction was based on the assumption that memories do not undergo further dynamic shifts in neural representation between two and five years, which may not be the case, and we did not have 32M data from Experiment 1 to corroborate this finding.

For completeness, Figure 8B plots the neural representation scores for the two sets of memories in Experiment 2.
As previously observed in Experiment 1, the two sets of memories displayed a similar time-course in terms of their neural representations, despite being recalled in separate scanning sessions, in a randomised order and analysed separately.

As with Experiment 1, when examining other areas within the partial volume, in no case did we find a significant difference in memory detectability across time periods.

Following scanning, participants completed three additional ratings. They were asked to indicate the extent to which the memories were changed by the 6 repetitions during scanning on a scale ranging from 1 (not at all) to 5 (completely). As in Experiment 1, they reported that the memories were not changed very much by repetition (mean: 2.56, SD: 0.81). They were also asked how often they thought of the experience of recalling the memories in Experiment 1 while performing the scanning task in Experiment 2 on a scale of 1 (not at all) to 5 (during every memory). Participants indicated they rarely thought about Experiment 1 (mean: 1.75, SD: 0.93). Finally, the consistency of recall across time periods during the scanning session did not differ in Experiment 2 (F(7,105)

This study exploited the sensitivity of RSA to detect not only the extent to which memories of different ages were represented in the vmPFC, but how these representations changed over time. During Experiment 1, we observed detectability in vmPFC for memories at 4M to 12M of age, which was also evident at 24M and 60M. As expected, recent 0.5M old memories were poorly represented in vmPFC in comparison. Curiously, however, the same was observed for memories that were 16M to 20M old. This pattern persisted across separate sets of memories and was replicated in a follow-up study eight months later with the same participants and memories.
Behavioural data failed to account for these time-dependent representational changes in either experiment, and this pattern was not evident in other brain areas that we examined.

schema. This refers to the abstraction of elements common to multiple experiences which help guide future memory recall by constraining the search to representations matching that template (Hebscher & Gilboa, 2016). Recent memories may not rely on this process to such a large extent, either because they have not yet been assimilated into existing schemas, or are sufficiently close to the current spatiotemporal context to not require such reorientation to representations of the past.

For memories that were 4M to 12M of age we observed a progressive increase in memory detectability in vmPFC. This could reflect the increased adoption of relevant schema to retrieve a memory, as the vmPFC integrates established memories (Schlichting & Preston, 2015). But with integration comes interference. The more fused and embedded within other memories a single representation becomes, the more difficult it is to avoid drifting into connected memories during recall. The resistance of patients with vmPFC damage to the lure of schematically related content during retrieval highlights this natural propensity in healthy controls (Warren, Jones, Duff, & Tranel, 2014). Given that the most recent 12 months represent a congested memory space (Crovitz & Schiffman, 1974), retrieval is likely to also depend on another proposed function of the vmPFC in memory: suppressing those memories which are not relevant (Eichenbaum, 2017). Patients with vmPFC lesions tend to confuse memories from different events (Schnider, von Daniken, & Gutbrod, 1996). This has been attributed to a preconscious filtering out of irrelevant traces, a process which when impaired leads to spontaneous confabulation in patients (Schnider, 2003).
Therefore, memory retrieval during this period may represent a delicate balance between locating a memory through the elements it shares with others, and then reliving it through suppressing them.

(Rubin, 1982). Therefore, the months that follow may represent the point at which a single memory trace has reached optimal stability through consolidation, with minimal interference from previously related events which have now decayed. As a result, neither the guidance of a relevant schema nor the inhibition of irrelevant memories is essential for its recall at this time.

These interpretations are of course speculative, but also presumptive, in that they assume the neural patterns represent content-related processes rather than either the content or process alone. The neural patterns could theoretically represent memory-specific content, but this is not easily reconciled with the observed biphasic time-course. Conversely, if the patterns represented a generic process common to all autobiographical memory retrieval, detection of individual memories would be impossible as their patterns would not be sufficiently unique. Therefore, it is more reasonable to assume that, when required, the vmPFC retrieves and processes the content of each individual memory in a consistent fashion. That is not to say that structural changes in the region are unnecessary for recall; in fact they may be a prerequisite (Bero et

and possibly serves to strengthen the original memory trace (Lee, 2008). Human episodic memory is also sensitive to such disruption (Hupbach, Gomez, Hardt, & Nadel, 2007)

week before the scan effectively resets the recall recency of all memories, removing it as a confound. Furthermore, if recall one week before had somehow accelerated the consolidation process, one would expect 0.5M old memories to be more detectable in the vmPFC than they were.
The second and third potential reconsolidation windows were the repeated mental recall in the scanner during Experiment 1, and overt memory recall during the Experiment 2 interview. These could theoretically alter the neural data of Experiment 2. However, given that seven out of the eight specifically hypothesised temporally sensitive changes in neural representations were supported, an altered consolidation time-course appears highly unlikely. Again, the recency of memory recall was now matched for Experiment 2, and participants reported very low frequency of recall between experiments. This suggests that repeatedly recalling the memories during the first experiment did not affect the rate at which participants recalled them subsequently. Increased representational similarity across repetitions also predicts subsequent retrieval success (Xue et al., 2010). Despite the differences in representational similarity scores within Experiment 1, this did not appear to exert an influence as we did not observe any significant change in the number of details recalled between the two experiments.

One other possible interpretation of the unexpected engagement and disengagement of the vmPFC for memories of different ages is that it may be mirrored by a systematic change in the content of memories. For example, certain types of events take place at a particular time of year and may be common to all participants, such as a seasonal holiday. However, participants were recruited over a period of five months in an evenly spaced manner, making it unlikely that such events would fall into the same temporal windows across participants. The occurrence of personal events such as birthdays would also be naturally random across participants. The use of personal photographs as memory cues also limited the reliance on time of year as a method for strategically retrieving memories.
Furthermore, the nature of memory sampling was that unique, rather than generic, events were eligible, reducing the likelihood of events which are repeated annually being included. Memory detectability was high at 12-month intervals such as one, two and five years in this study, suggesting perhaps it is easier to recall events which have taken place at a similar time of year to the present. However, this should have been reflected in behavioural ratings, and in equivalently strong neural representations for recent memories, but neither was observed. Most importantly, if content rather than time-related consolidation was the main influence on memory detectability, then we would not have observed any change in neural representation scores from Experiment 1 to Experiment 2, rather than the hypothesised shifts which emerged.

In the light of our hypotheses, Experiment 2 generated one anomalous finding. Twenty-four month old memories from Experiment 1 were no longer well represented eight months later. If the interpretation of the results of Experiment 1 is correct, these memories were originally the most challenging to retrieve, requiring intervention of the vmPFC. In this context, the lower neural representation values for Experiment 2 imply they became less challenging to recall. So perhaps these particular memories were disproportionally affected by a reconsolidation process whereby the repeated vivid recollection in Experiment 1 strengthened the memory trace and reduced the reliance on the vmPFC for Experiment 2. An additional possibility is that memories of around 32M of age are simply not as reliant on vmPFC, and that the biphasic pattern we observed is in fact a feature that iterates again between 24M and 60M. We cannot verify this in the current experiment, as we did not sample this time-period during Experiment 1.

Unlike that study, we did not detect representations of 0.5M old memories in vmPFC. It could be that classification-based MVPA is more sensitive to detection of memory representations than RSA; however, the current study was not optimised for such an analysis because it necessitated an increased ratio of conditions to trials. The Bonnici studies also involved many fewer memories that were recalled more times, which may also have influenced their results. Nonetheless, the increase in memory representation scores from recent to remote memories has been replicated and additionally refined in the current study with superior temporal precision.

Given that the medial prefrontal cortex is often associated with value and emotional processing (Grabenhorst & Rolls, 2011), could these factors have influenced the current findings?
Humans display a bias towards consolidating positive memories (Anderson & Hanslmayr, 2014) and remembered information is more likely to be valued than that which is forgotten (Rhodes, Witherby, Castel, & Murayama, 2017). Activity in the vmPFC during autobiographical memory recall has been found to be modulated by both the personal significance and emotional content of memory (Lin, Horner, & Burgess, 2016). However, in the current two experiments memories were matched across time periods on these variables. In the eight months between experiments, some memories actually decreased slightly in their subjective ratings of significance and positivity, suggesting that these factors are an unlikely driving force behind the observed remote memory representations in the vmPFC.

In conclusion, the current results revealed a two-stage systems-level consolidation process which was remarkably preserved across completely different sets of memories in one experiment, and closely replicated in a subsequent longitudinal experiment with the same participants and memories. They support the notion that the vmPFC becomes increasingly important over time for the retrieval of remote memories, perhaps by indexing and processing memory traces elsewhere in the neocortex. Two particularly novel findings emerged. First, this process occurs relatively quickly, by four months following an experience. Second, vmPFC involvement after this time fluctuates in a highly consistent manner, depending on the precise age of the memory in question. Further work is clearly needed to explore the implications of these novel findings, including studies looking at vmPFC connectivity with other brain areas such as the hippocampus.
Overall, we conclude that our vmPFC findings may be explained by a dynamic interaction between the changing strength of a memory trace, the availability of temporally adjacent memories, and the resultant differential neural circuitry recruited to successfully retrieve the past. The path to consolidation may not be long, but it is winding.

Thirty healthy, right-handed participants (23 female) took part (mean age 25.3, SD 3.5, range 21-32). All had normal or corrected-to-normal vision. Each participant gave written informed consent for participation in the study, for data analysis and for publication of the study results. Materials and methods were approved by the University College London Research Ethics Committee.

approach was adopted to balance temporal precision with the availability of suitable memories at more remote time-points.

Participants were asked to describe in as much detail as possible the specific autobiographical memory elicited by a photograph. General probes were given by the interviewer where appropriate (e.g., "what else can you remember about this event?"). Participants were also asked to identify the most memorable part of the event which took place within a narrow temporal window and unfolded in an event-like way. They then created a short phrase pertaining to this episode, which was paired with the photograph to facilitate recall during the subsequent fMRI scan (see Figure 1A). Participants were asked to rate each memory on a number of characteristics (see main text, Figures 1 and 6, Supplemental Tables 1 and 2), and two memories from each time period which satisfied the criteria of high vividness and detail, and ease of recall were selected for recollection during the fMRI scan.

Behavioural Analyses
The interview was recorded and transcribed to facilitate an objective analysis of the details, and the widely-used Autobiographical Interview method was employed for scoring (Levine et al., 2002). Details provided for each memory were scored as either "internal" (specific events, temporal references, places, perceptual observations and thoughts or emotions) or "external" (unrelated events, semantic knowledge, repetition of details or other more general statements). To assess inter-rater reliability, a subset of sixteen memories (n=2 per time period) were randomly selected across 16 different subjects and scored by another experimenter blind to the aims and conditions of the study. Intra-class coefficient estimates were calculated using SPSS statistical package version 22 (SPSS Inc, Chicago, IL) based on a single measures, absolute-agreement, 2-way random-effects model.

As two memories per time period were selected for later recall in the scanner, behavioural ratings were averaged to produce one score per time period.
Differences in subjective memory ratings across time periods were analysed using a one-way repeated measures ANOVA with Bonferroni-corrected paired t-tests. Differences in objective memory scores of internal and external details across time periods were analysed using a two-way repeated measures ANOVA with Bonferroni-corrected paired t-tests. A threshold of p < 0.05 was used throughout both experiments. All ANOVAs were subjected to Greenhouse-Geisser adjustment to the degrees of freedom if Mauchly's sphericity test identified that sphericity had been violated.

Task during fMRI scanning
Participants returned approximately one week later (mean 6.9 days, SD 1) to recall the memories while undergoing an fMRI scan. Prior to the scan, participants were trained to recall each of the 16 memories within a 12 second recall period (as in Bonnici et al., 2012 and Bonnici & Maguire, 2017), when cued by the photograph alongside its associated cue phrase. There were two training trials per memory, and participants were asked to vividly and consistently recall a particular period of the original event which unfolded across a temporal window matching the recall period.

MRI data acquisition
Structural and functional data were acquired using a 3T Siemens Trio MRI system (Siemens, Erlangen, Germany). Both types of scan were performed within a partial volume which incorporated the entire extent of the ventromedial prefrontal cortex (see Figure 3A).

[...] coil sensitivity profiles, the images were normalized using a prescan, and a weak intensity filter was applied as implemented by the scanner's manufacturer. To improve the SNR of the anatomical image, three scans were acquired for each participant, coregistered and averaged. Additionally, a whole brain 3D FLASH structural scan was acquired with a resolution of 1 x 1 x 1 mm.

Functional data were acquired using a 3D echo planar imaging (EPI) sequence which has been demonstrated to yield improved BOLD sensitivity compared to 2D EPI acquisitions (Lutti, Thomas, Hutton, & Weiskopf, 2013). Image resolution was 1.5 mm³ and the field-of-view was 192 mm in-plane. Forty slices were acquired with 20% oversampling to avoid wrap-around artefacts due to imperfect slab excitation profile. The echo time (TE) was 37.30 ms and the volume repetition time (TR) was 3.65 s. Parallel imaging with GRAPPA image reconstruction (Griswold et al., 2002) and an acceleration factor of 2 along the phase-encoding direction was used to minimize image distortions and yield optimal BOLD sensitivity. The dummy volumes necessary to reach steady state and the GRAPPA reconstruction kernel were acquired prior to the acquisition of the image data as described in Lutti et al. (2013). Correction of the distortions in the EPI images was implemented using B0-field maps obtained from double-echo FLASH acquisitions (matrix size 64x64; 64 slices; spatial resolution 3 mm³; short TE=10 ms; long TE=12.46 ms; TR=1020 ms) and processed using the FieldMap toolbox available in SPM (Hutton et al., 2002).

MRI data analysis
Preprocessing
fMRI data were analysed using SPM12 (www.fil.ion.ucl.ac.uk/spm). All images were first bias corrected to compensate for image inhomogeneity associated with the 32 channel head coil (Van Leemput, Maes, Vandermeulen, & Suetens, 1999). Fieldmaps collected during the scan were used to generate voxel displacement maps. EPIs for each of the twelve sessions were then realigned to the first image and unwarped using the voxel displacement maps calculated above. The three high-resolution structural images were averaged to reduce noise, and co-registered to the whole brain structural scan. EPIs were also co-registered to the whole brain structural scan. Manual segmentation of the vmPFC was performed using ITK-SNAP on the group averaged structural scan normalised to MNI space. The normalised group mask was warped back into each participant's native space using the inverse deformation field generated by individual participant structural scan segmentations. The overlapping voxels between this participant-specific vmPFC mask and the grey matter mask generated by the structural scan segmentation were used to create a native-space grey matter vmPFC mask for each individual participant.

[...] against which to test the neural data, nor did we want to make assumptions regarding the spatial distribution of informative voxels in the vmPFC.

As participants recalled two memories per time-point, the dataset was first split into two sets of eight time points, which were analysed separately using RSA. To characterise the strength of memory representations in the vmPFC, the similarity of neural patterns across recall trials of the same memory was first calculated using the Pearson product-moment correlation coefficient, resulting in a "within-memory" similarity score. Then the neural patterns of each memory were correlated with those of all other memories, yielding a "between-memory" similarity score.
For each memory, the between-memory score was then subtracted from the within-memory score to provide a neural representation score (see Figure 3C). This score was then averaged across the two memories at each time-point. Results for the left and the right hemispheres were highly similar, and therefore the data we report here are from the vmPFC bilaterally. A distinctive neural pattern associated with the recall of memories at each time period would yield a score significantly higher than zero, which was assessed using a one-sample t-test. Strengthening or weakening of memory representations as a function of remoteness would result in a significant difference in memory representation scores across time periods, and this was assessed using a one-way repeated measures ANOVA with post-hoc two-tailed paired t-tests. The range of values that we observed is entirely consistent with those in other studies employing a similar RSA approach in a variety of learning, memory and navigation tasks in a wide range of brain regions (Bellmund, Deuker, Navarro
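
The neural representation score described above (within-memory similarity minus between-memory similarity) can be illustrated with a minimal sketch. This is not the authors' analysis code. The function names, the data layout (a dictionary mapping a memory label to a list of per-trial voxel patterns) and the choice to average over all trial pairs are assumptions made for illustration only; the text does not specify how trial pairs were combined.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length patterns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean(values):
    return sum(values) / len(values)


def representation_score(patterns, memory):
    """Within-memory minus between-memory similarity for one memory.

    `patterns` maps a memory label to a list of trial patterns (hypothetical
    layout). Averaging over all trial pairs is an assumption of this sketch.
    """
    own = patterns[memory]
    # Similarity between different recall trials of the same memory.
    within = [pearson(own[i], own[j])
              for i in range(len(own)) for j in range(i + 1, len(own))]
    # Similarity between trials of this memory and trials of all other memories.
    between = [pearson(trial, other_trial)
               for label, trials in patterns.items() if label != memory
               for trial in own for other_trial in trials]
    return mean(within) - mean(between)
```

A score above zero indicates that recall trials of a memory resemble one another more than they resemble trials of other memories, which is the property tested against zero with the one-sample t-test.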

Memory interview
Participants were presented with the 16 photographs and cue phrases associated with the autobiographical memories in Experiment 1 and were asked to describe in as much detail as possible the specific event which they had recalled previously. General probes were given by the interviewer where appropriate (e.g., "what else can you remember about this event?"). The interviewer used summarised transcripts from Experiment 1 to verify that the same memory and details were being recalled. Participants then rated each memory on the same characteristics assessed in Experiment 1. The memory interview during Experiment 2 was also recorded and transcribed.

Behavioural Analyses
The analysis of subjective and objective ratings for Experiment 2 followed exactly the same procedure as Experiment 1. The extent to which subjective ratings for the same memory had changed between Experiment 1 and Experiment 2 was assessed using a two-way (experiment x time period) repeated measures ANOVA with Bonferroni-corrected paired t-tests. Differences in objective memory ratings across experiments were analysed using a two (experiment) x two (detail) x eight (time period) repeated measures ANOVA with Bonferroni-corrected paired t-tests.
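
As a rough illustration of the post-hoc step, the sketch below computes a paired t statistic and applies a Bonferroni correction to a set of p-values. The analyses reported here were run in SPSS; this Python version, its function names and the example values are hypothetical, and the number of comparisons actually corrected for is not stated in the text, so it is left as an input.

```python
import math


def paired_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for a paired t-test on matched samples."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Unbiased variance of the paired differences.
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    return mean_diff / math.sqrt(var_diff / n), n - 1


def bonferroni(p_values, alpha=0.05):
    """Multiply each p-value by the number of comparisons (capped at 1).

    Returns the adjusted p-values and whether each falls below `alpha`.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p < alpha for p in adjusted]
```

Converting the t statistic to a p-value requires the t distribution, which is omitted here to keep the sketch free of third-party dependencies.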

Task during fMRI scanning
Participants returned approximately one week later for the fMRI scan (mean 5.5 days, SD 3.7). Prior to scanning, only one reminder training trial per memory was deemed necessary given the prior experience of performing the task in Experiment 1. The scanning task remained unchanged from Experiment 1, aside from the re-randomisation of trials within each session. When the least vivid trials were excluded, the mean number of trials (/6) selected for analysis from each time period

MRI data acquisition
Structural and functional data were acquired using the same scanner and scanning sequences as Experiment 1. However, the prior acquisition of the partial volume structural scans negated the need to include these in the protocol of Experiment 2.

MRI data analysis
Preprocessing
fMRI data were preprocessed using the same pipeline as Experiment 1, with the additional step of co-registering the functional scans of Experiment 2 to the structural scans of Experiment 1, which enabled the use of the vmPFC masks from Experiment 1. First-level GLMs of each recall trial were constructed in an identical manner to Experiment 1.

Representational Similarity Analysis
RSA of the Experiment 2 fMRI data was conducted in an identical manner to Experiment 1. The average number of voxels analysed in the vmPFC across the two sets of memories for all participants was 5228 (SD: 1765). To generate predicted changes in representations in the eight months from Experiment 1 to Experiment 2, the scores from Experiment 1 were shifted by two time-points, and a two-tailed paired t-test was performed on each memory's original neural representation score and its expected score eight months later (see Figure 5). To ascertain whether the observed neural representation scores had changed between Experiments 1 and 2, a two-way (experiment x time period) repeated measures ANOVA was performed. To investigate if these changes mirrored the predictions generated by the original data, paired t-tests were performed between the actual neural representation scores for each memory from Experiment 1 and Experiment 2, one-tailed if there was a hypothesised increase or decrease.

Acknowledgements
We thank David Bradbury and Imaging Support for technical assistance.

The studies were approved by the University College London Research Ethics Committee: #6743/002 Systems-Level Consolidation of Autobiographical Memories. Written informed consent was obtained from each participant for participation in the study, for data analysis and for publication of the study results.