We are remarkably adept at inferring the consequences of our actions, yet the neuronal mechanisms that allow us to plan a sequence of novel choices remain unclear. We used functional magnetic resonance imaging (fMRI) to investigate how the human brain plans the shortest path to a goal in novel mazes with one (shallow maze) or two (deep maze) choice points. We observed two distinct anterior prefrontal responses to demanding choices at the second choice point: one in rostrodorsal medial prefrontal cortex (rd-mPFC)/superior frontal gyrus (SFG) that was also sensitive to (deactivated by) demanding initial choices and another in lateral frontopolar cortex (lFPC), which was only engaged by demanding choices at the second choice point. Furthermore, we identified hippocampal responses during planning that correlated with subsequent choice accuracy and response time, particularly in mazes affording sequential choices. Psychophysiological interaction (PPI) analyses showed that coupling between the hippocampus and rd-mPFC increases during sequential (deep versus shallow) planning and is higher before correct versus incorrect choices. In short, using a naturalistic spatial planning paradigm, we reveal how the human brain represents sequential choices during planning without extensive training. Our data highlight a network centred on the cortical midline and hippocampus that allows us to make prospective choices while maintaining initial choices during planning in novel environments.
We are remarkably adept at inferring the consequences of our actions, even in novel situations. However, the neuronal mechanisms that allow us to plan a sequence of novel choices remain a mystery. One hypothesis is that anterior prefrontal brain regions can jump ahead from an initial decision to evaluate subsequent choices. Here, we examine how the brain represents initial versus subsequent choices of varying difficulty during spatial planning in novel environments. Specifically, participants visually searched for the shortest path to a goal in pictures of novel mazes that contained one or two path junctions. We monitored the participants’ brain activity during the task with functional magnetic resonance imaging (fMRI). We observed, in the anterior prefrontal brain, two distinct responses to demanding choices at the second junction: one in the rostrodorsal medial prefrontal cortex (rd-mPFC), which also signalled less demanding initial choices, and another one in the lateral frontopolar cortex (lFPC), which was only engaged by demanding choices at the second junction. Notably, interactions of the rd-mPFC with the hippocampus, a region associated with memory, increased when planning required extensive deliberation and particularly when planning led to accurate choices. Our findings show how humans can rapidly formulate a plan in novel environments. More broadly, these data uncover potential neural mechanisms underlying how we make inferences about states beyond a current subjective state.
Citation: Kaplan R, King J, Koster R, Penny WD, Burgess N, Friston KJ (2017) The Neural Representation of Prospective Choice during Spatial Planning and Decisions. PLoS Biol 15(1): e1002588. https://doi.org/10.1371/journal.pbio.1002588
Academic Editor: Matthew F. S. Rushworth, Oxford University, UNITED KINGDOM
Received: July 13, 2016; Accepted: December 14, 2016; Published: January 12, 2017
Copyright: © 2017 Kaplan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: fMRI data and accompanying statistical images are available from the UCL Institutional Data Access/Ethics Committee for researchers who meet the criteria for access to confidential data. To make a request, please contact Ms. Selina Mir by email at email@example.com. Quantitative data for the X-Y and bar plots shown in Figs 2, 3, 4 and 5 are in the Supporting Information.
Funding: The Wellcome Trust Centre for Neuroimaging is supported by core funding from the Wellcome Trust (Strategic Award Grant 091593/Z/10/Z). This research was supported by grants from Medical Research Council to NB, separate grants from the Wellcome Trust to KJF and NB, and a Sir Henry Wellcome Fellowship (101261/Z/13/Z) to RKa. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: BOLD, blood-oxygen-level dependent; dACC, dorsal anterior cingulate cortex; dlPFC, dorsolateral prefrontal cortex; fMRI, functional magnetic resonance imaging; FPC, frontopolar cortex; FWE, family-wise error; GLM, general linear model; IPS, intraparietal sulcus; ITI, intertrial interval; lFPC, lateral frontopolar cortex; LOC, lateral occipital cortex; MNI, Montreal Neurological Institute; PCC, posterior cingulate cortex; PFC, prefrontal cortex; pgACC, pregenual anterior cingulate cortex; PMC, premotor cortex; PPC, posterior parietal cortex; PPI, psychophysiological interaction; pSMA, pre-supplementary motor area; rd-mPFC, rostrodorsal medial prefrontal cortex; ROI, region of interest; RT, reaction time; SD, standard deviation; SEM, standard error of the mean; SFG, superior frontal gyrus; SVC, small-volume correction; vmPFC, ventromedial prefrontal cortex
Goal-directed behaviour rests on being able to rapidly evaluate the potential consequences of future actions. For example, consider the neuronal processing required for planning a new route home when a road you normally take is closed. Although previous studies have implicated anterior prefrontal regions in planning [1–5], it has been difficult to tease apart the relative contributions of different prefrontal cortex (PFC) regions (i.e., rostral versus caudal or lateral versus medial PFC) that respond to choices later in a sequence [6–7]. Moreover, the neural representation of how we rapidly make a series of novel choices remains unclear, because planning studies generally rely on extensive learning about the outcomes of alternative choices [2–5,7].
Here, we ascertained whether different anterior PFC regions signal uncertainty about novel sequential choices in a distinct manner during plan formation. Specifically, we were interested in whether rostrodorsal medial PFC (rd-mPFC), a brain region associated with imagining/simulating potential choices [8–10], might be biased towards responding to choices later in a sequence, even in the absence of prior learning about the consequences of choices.
We created a spatial planning task that would require little to no learning, in which participants could call on an internal model of space deployed during exploration of the physical world. Our task required participants to choose the shortest route between a start and goal location during functional magnetic resonance imaging (fMRI) scanning: participants viewed one of 220 mazes with either two routes (shallow mazes) or four routes (deep mazes) to the goal. Shallow mazes only had one choice point at the start location, whereas deep mazes also offered a second choice point deeper into the maze. This design enabled us to see how responses to plan formation were modified by the depth of prospection (i.e., the number of choice points) and the uncertainty about those choices (i.e., the difference in lengths between the two available paths from each choice point). After planning their route, participants were asked to make a decision, at a specified choice point in a given maze, about the direction of the shortest path (i.e., optimal choice) to the goal location. This gave us an additional measure (reaction time [RT]) to quantify the uncertainty about a choice beyond the difference in available path lengths (Fig 1A). As with shallow mazes, participants were only prompted to make one choice after seeing a deep maze, but until the choice point was highlighted, they did not know which choice point (starting point or the choice point further in the maze) would be probed.
(A) During a 3.25-s planning phase, participants had to infer the shortest path from the starting point in the maze (red square) to the goal location (green square) and remember the chosen direction for each choice point along the shortest path. Half of the mazes (shallow mazes) had two paths and only one choice point (red square), whereas the other half (deep mazes) had four paths and two choice points (red square and another point further in the maze). After 3.25 s, a choice point was highlighted (choice highlight) for 250 ms. The highlighted location could either be the red square or the second choice point along the shortest path for deep mazes. In shallow mazes, only the red starting location was highlighted. Crucially, for deep mazes, participants were tested on one choice point before starting the next trial. Subsequently, the choice period featured a first-person viewpoint of the highlighted location, where participants had a maximum of 1.5 s (Deep Maze mean: ~545 ms; Shallow Maze mean: ~440 ms) to choose the correct direction on the shortest path (left, forward, right, or equal) with a button box. Immediately following the button press, an intertrial interval (ITI) screen appeared for 1.5 s before a new trial began. (B) Left: Example shallow maze trial with a large path length difference (less demanding choice). Right: Example shallow maze trial with a small path length difference (demanding choice). (C) Left: Example deep maze trial with a small path length difference (demanding) initial choice at the red square and large (easy) path length difference at the second (prospective) choice point. Right: Example deep maze trial with a large (easy) path length difference initial choice at the red square and small (demanding) path length difference at the second (prospective) choice point. Deep mazes contained a combination of small, medium, or large path length differences at first (initial) and second (prospective) choice points.
(D) Overhead view (not shown during the experiment) of three example mazes showing which path lengths contribute to each parametric regressor in our fMRI analyses: initial (left), prospective (centre), and unchosen (right) path length differences in deep mazes. Initial path length differences in shallow mazes represent the difference between the only two available paths. The black line highlights the shortest path.
Participants made correct choices 84.0% of the time (standard deviation [SD] = 5.13%; n = 29) during the fMRI experiment, with an average RT of 492 ms (SD = 150 ms). In deep mazes, when participants were prompted with choices that were at junctions deeper in the maze (i.e., the second/prospective choice point of a two choice sequence), they made correct choices 84.9% (SD = 9.89%) of the time, with an average RT of 671 ms (SD = 172 ms). There was no significant difference (t(28) = 1.84; p = 0.077; SD = 6.62%) in behavioural accuracy (percentage of correct choices) between deep (mean = 85.2%; SD = 6.33%) and shallow trials (mean = 82.9%; SD = 5.88%). In contrast, there was a significant difference in RT (t(28) = 14.3; p < 0.001; SD = 39.3 ms), with greater RTs for deep (mean = 545 ms; SD = 148 ms) versus shallow trials (mean = 440 ms; SD = 152 ms). Notably, mean RTs were not correlated with accuracy across participants (r = 0.257; p = 0.178).
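The paired comparisons above can be roughly checked from the reported summary statistics. The following is a minimal sketch, assuming that the SD quoted next to each t statistic is the SD of the per-participant deep-minus-shallow differences (the text does not state this explicitly); the function name is hypothetical. Because the inputs are rounded means, the results only approximate the reported values.

```python
import math

def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t statistic from the mean and SD of within-participant differences."""
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error

n = 29  # participants reported in the text

# RT: deep (545 ms) minus shallow (440 ms); 39.3 ms is ASSUMED to be the SD of differences
t_rt = paired_t_from_summary(545 - 440, 39.3, n)

# Accuracy: deep (85.2%) minus shallow (82.9%); 6.62% is ASSUMED to be the SD of differences
t_acc = paired_t_from_summary(85.2 - 82.9, 6.62, n)

print(f"RT: t({n - 1}) = {t_rt:.2f}")
print(f"Accuracy: t({n - 1}) = {t_acc:.2f}")
```

Under this assumption the sketch gives values of about 14.4 for RT and 1.87 for accuracy, close to the reported t(28) = 14.3 and t(28) = 1.84 given the rounding of the means.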
Investigating the effect of path length differences on participant choice accuracy and RT in deep mazes, we observed a significant interaction between initial (i.e., first choice point) and prospective (i.e., at the second choice point) path length differences for both accuracy (F(2,27): 25.6; p < 0.001; Fig 2A) and RT (F(2,27): 11.4; p < 0.001; Fig 2B). There was a significant positive linear trend for accuracy and initial path length differences (F: 19.4; p < 0.001) but no similar linear trend for RT (F: 0.18; p = 0.674). Notably, we observed positive and negative significant linear trends with prospective path length differences for accuracy (F: 13.9; p = 0.001) and RT (F: 7.5; p = 0.011), respectively.
(A) Accuracy during choice phase. Left: Significant interaction (p < 0.001) for initial versus prospective path length differences in deep mazes. Deep mazes are split by small–small, small–medium, small–large, medium–small, medium–medium, medium–large, large–small, large–medium, and large–large path length differences at the initial choice point (i.e., the shortest options for either choice at the starting location) and the two available paths at the prospective choice point, respectively. Right: Significant positive linear trend in accuracy (p < 0.001) with increasing path length differences for shallow mazes. Shallow mazes are split by small, medium, and large path length differences. (B) Log RT during the choice phase. Left: Significant interaction (p < 0.001) for initial versus prospective path length differences in deep mazes. Deep mazes are split by small–small, small–medium, small–large, medium–small, medium–medium, medium–large, large–small, large–medium, and large–large path length differences at the initial choice point (i.e., the shortest options for either choice at the starting location) and the two available paths at the prospective choice point, respectively. Right: Significant negative trend (p < 0.001) in log RT with increasing path length differences in shallow mazes. Shallow mazes are split by small, medium, and large path length differences. See S1 Data for participant data.
In shallow mazes, we observed a significant main effect of path length difference for both accuracy (F(2,27): 173.1; p < 0.001; Fig 2A) and RT (F(2,27): 52; p < 0.001; Fig 2B). As expected, there was a significant positive linear trend for accuracy (F: 354.6; p < 0.001) with larger path length differences, whereas there was a significant negative trend with RT (F: 81; p < 0.001; Fig 2). In deep mazes, accuracy was much lower when there were both small initial and prospective path length differences (Fig 2A).
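A linear trend across the three ordered levels (small, medium, large) can be expressed as a contrast with weights (−1, 0, +1) computed per participant and tested against zero; for such a single-degree-of-freedom contrast, F equals t squared. The sketch below illustrates this general approach and is not necessarily the exact procedure used here. The function name and the toy values are hypothetical and are not study data.

```python
import math
import statistics

def linear_trend_F(scores_by_participant, weights=(-1, 0, 1)):
    """F statistic (1, n-1 df) for a linear contrast across ordered conditions."""
    # One contrast score per participant: weighted sum over the ordered conditions
    contrast = [sum(w * s for w, s in zip(weights, row)) for row in scores_by_participant]
    n = len(contrast)
    t = statistics.mean(contrast) / (statistics.stdev(contrast) / math.sqrt(n))
    return t * t, n - 1

# Toy accuracy values (small, medium, large path length difference); NOT study data
toy = [
    (0.60, 0.80, 0.95),
    (0.55, 0.85, 0.90),
    (0.70, 0.75, 1.00),
    (0.65, 0.90, 0.95),
]
F, df = linear_trend_F(toy)
print(f"Linear trend: F(1,{df}) = {F:.1f}")
```

A positive mean contrast score corresponds to accuracy increasing with path length difference; applying the same weights to log RT would yield a negative mean contrast for the trend described above.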
We then investigated the influence of path length differences at the initial choice point on prospective choice behaviour. Unsurprisingly, when participants were prompted with the prospective choice point, we observed a significant (p < 0.05) main effect of the prospective choice path length difference (F(2,27): 6.57; p = 0.005; S1 Fig) on these choices and a linear increase in accuracy with larger path length differences (F(2,27): 8.47; p = 0.007). However, we found no significant difference in prospective choice accuracy when split by the initial path length difference (F(2,27) = 0.887; p = 0.424; S1 Fig). Investigating prospective choice RT, we observed a main effect of the path length difference at the prospective choice point (F(2,27) = 6.40; p = 0.005; S1 Fig) and also a main effect when RTs were split by the (unprobed) initial path length difference (F(2,27) = 5.70; p = 0.009; S1 Fig). As with accuracy, there was a significant linear trend, with higher prospective choice RTs for smaller path length differences at the prospective choice point (F(2,27) = 6.0; p = 0.021). However, we did not observe a significant linear decrease in RT when we split prospective choice RTs by the path length difference of the initial choice/starting point (F(2,27) = 3.1; p = 0.089). Taken together, these results suggest that the path length difference of the initial choice did not affect performance on prospective choices but did influence deliberation time (i.e., RT).
To assess the impact of planning sequential choices with varying processing demands, we classified deep maze trials by the path length difference between the shortest path and the other paths separately (i.e., initial, prospective, and unchosen path length differences; see Fig 1D for schematic of each path length comparison and S4 Table for list of regressors). Additionally, we asked whether subsequent choice behaviour (RT and accuracy) as well as other aspects of the planning task (e.g., the length of the shortest available path and whether the first or second choice was prompted during deep maze trials) also explained brain activity during the planning phase. To summarize, we included the following parametric modulators for deep maze trials: the path length difference between the two shortest paths present at the starting point (Initial Path Length Difference), the path length difference at the optimal second choice point (Prospective Path Length Difference), the path length difference between the longest/least viable path in the initially unchosen direction and the shortest path (Unchosen Path Length Difference), participants’ subsequent log RT during the choice phase (Log RT), the length of the shortest available path, whether participants answered the subsequent choice trial correctly or not (Accuracy), and whether participants were prompted to make an initial or prospective choice (Prompted Choice). Importantly, the same parametric modulators were included for shallow maze trials except for Prospective Path Length Difference, Unchosen Path Length Difference, and Prompted Choice.
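The three path length regressors for a deep maze can be illustrated with a short sketch. It reflects one reading of the definitions above: the four paths are grouped into two branches by their first move at the starting point, and every difference is taken relative to the overall shortest path. The function name, the branch representation, and the example lengths are hypothetical.

```python
def path_length_differences(branch_a, branch_b):
    """Return (initial, prospective, unchosen) differences for a deep maze.

    Each branch is a pair of path lengths to the goal that share the same
    first move at the starting point.
    """
    # The optimal branch is the one containing the overall shortest path
    optimal, other = sorted([sorted(branch_a), sorted(branch_b)], key=lambda b: b[0])
    shortest = optimal[0]
    initial = other[0] - shortest        # best option in each starting direction
    prospective = optimal[1] - shortest  # two paths at the optimal second choice point
    unchosen = other[1] - shortest       # longest path in the initially unchosen direction
    return initial, prospective, unchosen

# Hypothetical path lengths in arbitrary units
print(path_length_differences((10, 12), (15, 21)))
```

For this example maze the initial difference is 5, the prospective difference is 2, and the unchosen difference is 11, so the demanding choice sits at the second choice point.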
We only report clusters that survive family-wise error (FWE) correction for multiple comparisons (p < 0.05) at the statistical threshold of p < 0.005 uncorrected. The only exception is in the hippocampus, where all reported activations contain a peak-voxel that survives (p < 0.05) small-volume correction (SVC) for the bilateral hippocampus.
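To relate the uncorrected threshold to the Z-scores reported in this section, the conversion between a Z-score and its tail probability can be sketched as follows. This assumes one-tailed tests on a standard normal distribution, which the text does not state explicitly; the function name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def one_tailed_p(z):
    """Upper-tail probability of a standard normal Z-score."""
    return 1.0 - standard_normal.cdf(z)

# Z-score equivalent of the uncorrected p < 0.005 threshold (one-tailed, ASSUMED)
z_threshold = standard_normal.inv_cdf(1 - 0.005)
print(f"Z threshold: {z_threshold:.2f}")

# Example Z-scores of the kind reported in this section
for z in (2.8, 3.86, 5.08):
    print(f"Z = {z}: p = {one_tailed_p(z):.2g}")
```

Under this assumption the threshold corresponds to a Z-score of about 2.58, so a voxel with Z = 2.8 exceeds the uncorrected threshold, although cluster-level significance additionally depends on cluster extent.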
Prospective Path Length Difference
We first asked whether, during deep maze trials, there were fMRI responses specifically related to inferences about the prospective choice point, i.e., blood-oxygen-level dependent (BOLD) changes related to choosing between the two paths at the second choice point that were not fully explained by path length differences at the initial choice point. We observed a very large cluster peaking in dorsal anterior cingulate cortex/pre-supplementary motor area (dACC/pSMA; x = 6; y = 23; z = 37; Z-score: 5.08; Fig 3), with a sub-peak extending into rd-mPFC (x = −15; y = 38; z = 34; Z-score: 2.8; Fig 3), that responded to smaller prospective path length differences. Notably, there were also significant clusters in lateral frontopolar cortex (lFPC; x = −27; y = 53; z = 4; Z-score: 3.86; Fig 3), posterior parietal cortex (PPC; x = 3; y = −73; z = 55; Z-score: 4.41), left inferior temporal cortex (x = −57; y = −43; z = −17; Z-score: 3.68), and right cerebellum (x = 30; y = −55; z = −26; Z-score: 3.73; S8 Table).
(A) Regions significantly responding to smaller prospective path length differences. Left: Coronal image showing rd-mPFC/superior frontal gyrus (SFG). Centre: Sagittal image showing dACC/pSMA. Right: Coronal image showing left lFPC. (B) Pregenual anterior cingulate cortex/ventromedial PFC (pgACC/vmPFC) region significantly engaged by larger prospective path length differences. (C) Effect size for an 8-mm sphere around the rd-mPFC/SFG (left), dACC/pSMA (centre), and left lFPC (right) peak voxels that responded to smaller prospective path length differences displayed in A for three parametric modulators: initial (including both deep and shallow mazes), prospective, and unchosen path length differences (mean ± standard error of the mean [SEM]). (D) Effect size for an 8-mm sphere around the pgACC/vmPFC peak voxel that responded to larger prospective path length differences. For both C and D, asterisks indicate a significant correlation (p < 0.05) with path length differences. A positive effect size represents a BOLD correlation with larger path length differences, whereas a negative effect size represents a correlation with smaller path length differences. (E) Images centred on the rd-mPFC peak in A, which was the only region featured in A that significantly responded to decreasing prospective versus initial choice path length differences in shallow mazes. All highlighted regions survived cluster-level FWE correction at p < 0.05 and are displayed at an uncorrected statistical threshold of p < 0.005. The corresponding coordinate from the Montreal Neurological Institute (MNI) template image is listed below each brain image. See S2 Data for individual effect size data.
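The spherical regions of interest used for the effect size plots can be sketched as the set of voxels whose centres lie within a fixed distance of a peak voxel. The sketch below assumes that "8-mm sphere" denotes an 8-mm radius and that voxels are 3-mm isotropic; neither is stated in this section, and the function name is hypothetical.

```python
def sphere_offsets(radius_mm, voxel_size_mm):
    """Voxel offsets (i, j, k) whose centres lie within radius_mm of the peak voxel."""
    reach = int(radius_mm // voxel_size_mm)
    offsets = []
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            for k in range(-reach, reach + 1):
                distance_sq = (i * i + j * j + k * k) * voxel_size_mm ** 2
                if distance_sq <= radius_mm ** 2:
                    offsets.append((i, j, k))
    return offsets

# 8-mm radius and 3-mm isotropic voxels are both ASSUMPTIONS
roi = sphere_offsets(radius_mm=8, voxel_size_mm=3)

# rd-mPFC peak reported in the text (MNI coordinates in mm)
peak_mni = (-15, 38, 34)
roi_mni = [tuple(p + 3 * o for p, o in zip(peak_mni, off)) for off in roi]
print(len(roi_mni))
```

Averaging a parameter estimate over the voxels in such a sphere yields one effect size per participant, which is the kind of quantity plotted as mean ± SEM in the figure.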
Given that the rd-mPFC activation was a small sub-peak in a very large cluster centred on dACC/pSMA, we wanted to confirm whether there was truly a robust rd-mPFC signal selectively related to planning prospective choices and whether this signal differed from the other prefrontal responses observed in dACC/pSMA and lFPC. We therefore conducted a paired t test comparing responses to prospective path length differences versus initial path length differences in shallow mazes. We observed a significant rd-mPFC sub-peak (x = −15; y = 38; z = 28; Z-score: 3.61; Fig 3D) that responded to smaller prospective versus initial path length differences. The cluster covered the peak rd-mPFC voxel from the previous contrast and was centred on left dorsolateral prefrontal cortex (dlPFC; x = −18; y = 17; z = 43; Z-score: 4.33; see S2 Fig for images of dlPFC peak). Crucially, this cluster was much smaller than the previous rd-mPFC result and did not include the dACC/pSMA region that responded to prospective path length differences. Likewise, we observed no significant difference in lFPC responses to smaller prospective versus initial path length differences. Of particular interest, the significant effect in rd-mPFC was driven by its significant response to both large initial and smaller prospective path length differences (Fig 3C)—a pattern that was not observed in lFPC or dACC/pSMA.
The null result suggesting that lFPC does not respond to smaller prospective versus initial path length differences in shallow mazes should be interpreted with caution. Our general linear model (GLM) based on path length differences did not distinguish whether these fMRI results were due to the number of paths or the depth of planning. Indeed, when using a Shannon entropy model that compared RT-fitted uncertainty for prospective path length differences versus the absolute value of the difference between all four available path lengths (see S1 Text for details), we found that both rd-mPFC and lFPC selectively responded to prospective uncertainty (S1 Text).
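The intuition that smaller path length differences imply greater choice uncertainty can be illustrated with Shannon entropy over choice probabilities. The sketch below is not the RT-fitted model described in S1 Text: the softmax choice rule, the `inverse_temperature` parameter, the function name, and the example lengths are all assumptions introduced for illustration.

```python
import math

def choice_entropy(path_lengths, inverse_temperature=1.0):
    """Shannon entropy (bits) of a softmax choice over paths; shorter paths are preferred."""
    shortest = min(path_lengths)
    # Subtracting the shortest length keeps the exponentials numerically stable
    weights = [math.exp(-inverse_temperature * (length - shortest)) for length in path_lengths]
    total = sum(weights)
    probabilities = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical path lengths (arbitrary units)
print(choice_entropy([10, 10]))  # two equal paths
print(choice_entropy([10, 11]))  # small difference
print(choice_entropy([10, 16]))  # large difference
```

Two equally long paths give the maximum of 1 bit, and entropy falls as the difference grows. Computing it over the two paths at one choice point versus over all four paths gives two different uncertainty measures, which is the kind of comparison the entropy model above draws.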
In the reverse contrast, larger path length differences at the prospective choice point elicited responses in pregenual anterior cingulate cortex/ventromedial PFC (pgACC/vmPFC; x = −3; y = 38; z = −11; Z-score: 3.69; Fig 3B). Notably, this finding is in contrast to a model-based analysis (see Supplemental Results, S1 Text) in which no parallel activation in pgACC/vmPFC related to decreasing prospective uncertainty was observed. This is possibly due to the inclusion of all path length differences and not just the two shortest paths available at either choice point.
Initial Path Length Difference
We also examined whether in both deep and shallow planning trials there were regions that responded to the difference between the two shortest path lengths available at the initial/first choice point (see Fig 1D for illustration). We found that larger path length differences at the initial choice point elicited responses in the temporoparietal junction (TPJ)/angular gyrus, vmPFC (S3 Fig), and posterior cingulate cortex (PCC; see S3 Fig and S7 Table). Notably, rd-mPFC (t(28): 2.41; p = 0.023) but not lFPC (t(28): −0.468; p = 0.644; Fig 3C) significantly responded to increasing initial path length differences (see Table 1 for rd-mPFC and lFPC t-values related to the other parametric regressors of interest). It is important to note that this vmPFC cluster, which only responded to large initial path length differences (S3 Fig), is rostral and superior to the pgACC/vmPFC cluster responding to both large initial and prospective path length differences.
Following our results related to larger path length differences, smaller path length differences at the first choice point elicited responses in the dACC/pSMA, along with right dlPFC, anterior insula, and PPC (see S7 Table). The dissociation between regional responses increasing and decreasing with initial path length differences reflects similar responses to larger versus smaller reward prediction errors observed during value-guided choice [12–14].
Unchosen Path Length Difference
In a separate comparison, we examined responses to the difference between the shortest and the least viable counterfactual/unchosen path (i.e., what regions corresponded to an exhaustive search or pruning of all potential paths). We found that larger unchosen path length differences engaged the right angular gyrus/TPJ (x = 51; y = −61; z = 25; Z-score: 5.98; Fig 4A and S9 Table), which was the strongest response we observed in any region relating to a path length difference regressor. Additionally, we found PCC (x = 12; y = −46; z = 37; Z-score: 4.93; Fig 4A) and right striatum (x = 27; y = 8; z = 1; Z-score: 4.58; Fig 4A) responses related to larger unchosen path length differences. Notably, in our Shannon entropy model analysis, which did not have a specific parametric regressor accounting for unchosen path length differences, we found that right angular gyrus/TPJ and PCC both significantly related to increasing prospective uncertainty (Supplemental Results, S1 Text). Taken together, these analyses suggest that angular gyrus and PCC prune unviable paths in deep mazes that afford demanding prospective choices.
(A) Regions that significantly responded to larger unchosen path length differences between the shortest and unchosen path. Top left: Sagittal image showing posterior PCC. Top centre: Coronal image showing right striatum. Top right: Sagittal image showing right angular gyrus. Bottom: Effect size for an 8-mm sphere around the PCC (left), right striatum (centre), and right angular gyrus (right) regions that responded to larger unchosen path length differences displayed in A, for three parametric modulators: initial (deep and shallow mazes), prospective, and unchosen path length differences (mean ± SEM). Asterisks indicate a significant correlation (p < 0.05) with path length differences. A positive effect size represents a positive BOLD correlation with larger path length differences, whereas a negative effect size represents a correlation with smaller path length differences. (B) dACC region significantly responding to smaller unchosen path length differences. All highlighted regions survived cluster-level FWE correction at p < 0.05 and are displayed at an uncorrected statistical threshold of p < 0.005. See S3 Data for individual effect sizes.
In contrast, smaller unchosen path length differences engaged dACC (x = 9; y = 17; z = 34; Z-score: 4.41; Fig 4B) and bilateral lateral occipital cortex (LOC; left: x = −24; y = −88; z = −4; Z-score: 4.71; right: x = 27; y = −91; z = 10; Z-score: 5.1; S9 Table). Our post hoc region of interest (ROI) analyses revealed that neither rd-mPFC (t(28): −0.19; p = 0.85) nor lFPC (t(28): 0.399; p = 0.693) significantly encoded the unchosen path, further suggesting that these regions corresponded to rapid sequential inference but not necessarily an exhaustive search of all possible paths.
Asking whether other aspects of mazes (beyond path length differences) influenced neural responses during planning, we investigated whether any fMRI signals during planning correlated with subsequent RT during the choice phase. During planning, fMRI signals in an extremely large portion of cortex—peaking in visual cortex—positively correlated with subsequent RT (S10 Table). The large visual cortical cluster also encompassed ventral temporal regions extending into the bilateral posterior hippocampus (left: x = −27; y = −37; z = −11; Z-score: 4.97; small-volume corrected (SVC) p < 0.001), peaking in the right hippocampus (x = 24; y = −37; z = −8; Z-score: 5.0; SVC p < 0.001; Fig 5). Notably, the right posterior hippocampus peak showed a significantly stronger relationship with subsequent RT in deep versus shallow maze trials (t(28) = 2.71; p = 0.011; Fig 5C).
(A) Coronal image showing higher left hippocampal activity (circled in blue) during planning prior to correct versus incorrect choices. Subthreshold right hippocampal activity that was higher for correct choices is also visible. (B) Sagittal image showing ventral temporal activity extending into right posterior hippocampus (circled in blue) that positively correlated with the distance of the shortest route between the starting and goal location. (C) Top: Sagittal image showing posterior right hippocampal activity during planning that positively correlated with subsequent log RT. Bottom: Effect size for an 8-mm sphere around right posterior hippocampus peak voxel showing that the correlation with log RT is significantly higher (p < 0.05) for deep versus shallow planning trials. All hippocampal peak voxels presented survive correction for multiple comparisons (p < 0.05) across the whole hippocampal volume, but clusters are shown at p < 0.005 uncorrected for visualization purposes. (D) Top: Sagittal image showing medial extent of rd-mPFC (peak voxel same as 3A) that exhibited increased functional connectivity with the right posterior hippocampus in deep versus shallow maze planning trials. Bottom left: Effect size for an 8-mm sphere around rd-mPFC peak voxel (mean ± SEM) showing significantly increased functional connectivity with hippocampus for deep versus shallow planning trials. Bottom right: Hippocampal rd-mPFC functional connectivity (mean ± SEM) was significantly higher during deep planning trials prior to correct versus incorrect choices. See S4 Data for individual effect sizes presented in 5C and 5D.
We also observed similar significant responses in smaller clusters in middle temporal gyrus and dACC (see S10 Table). Likewise, we observed significant (p < 0.05) positive correlations with increased subsequent RT in right angular gyrus/TPJ (t(28) = 3.53; p = 0.002) and PCC (t(28) = 3.01; p = 0.006) regions relating to larger unchosen path length differences, which provides additional evidence that these regions prune unviable paths during deep planning trials. The only negative correlation with subsequent RT was in the insula extending into a large portion of white matter (x = 27; y = −10; z = 10; Z-score: 4.72).
Length of Shortest Path
We investigated which regions responded to the distance of the shortest available path length (i.e., whether the optimal path was distal or proximal to the goal location, irrespective of the other available paths). We observed responses in inferior occipital cortex extending into right posterior hippocampus (x = 33; y = −37; z = −8; Z-score: 4.79; SVC p < 0.001; Fig 5B) that correlated with increasing length of the shortest available path to the goal, along with dACC (S11 Table). Conversely, bilateral TPJ, pgACC/vmPFC, rd-mPFC, precuneus, posterior superior temporal sulcus, and lateral PFC (see S4 Fig and S11 Table) correlated with decreasing distance of the shortest available path length.
Further characterizing the functional contribution of different brain regions, we asked if the responses of different regions during the planning phase related to whether participants subsequently made a correct or incorrect choice. We observed a left hippocampal activation (x = −18; y = −13; z = −17; Z-score: 3.87; SVC p = 0.044; Fig 5A) that preceded correct choices with a subthreshold activation in right anterior hippocampus. Additionally, bilateral cerebellum and motor cortex activations during the planning phase related to correct choices (S12 Table). However, the spatial extent of these performance results should be interpreted with caution, because the hippocampal cluster extended into a large portion of white matter.
Conversely, there was a significant dACC/pSMA cluster (x = 6; y = 17; z = 49; Z-score: 6.89; S5 Fig) that preceded subsequently incorrect choices, which was the strongest activation observed in any contrast. We then tried to determine whether this response was feedback related, because it could have been due to an unobserved choice point. However, we found no significant difference between deep and shallow planning (t(28) = 1.32; p = 0.198; S5 Fig). Likewise, adding a regressor encoding whether the initial or prospective choice point was highlighted in deep mazes (Prompted Choice) did not modify the robustness of the dACC/pSMA activation. Notably, we also observed significant clusters in bilateral anterior insula and intraparietal sulcus (IPS) preceding incorrect choices (S12 Table).
Deep versus Shallow Planning Effects
Investigating whether any regions responded differently to initial path length differences in deep versus shallow mazes, we found that a large cluster in PPC responded more strongly to smaller initial path length differences in shallow versus deep mazes. Likewise, we also observed smaller but significant clusters in premotor cortex (PMC) and dlPFC (S6 Fig and S13 Table). We did not observe any other significant clusters responding to initial path length differences in deep versus shallow mazes.
Next, we investigated whether, during the planning phase, any regions outside of the hippocampus showed responses that correlated differently with subsequent RT for deep versus shallow planning trials. We found that visual cortex and right PMC correlated with increasing RT more strongly during shallow planning trials (S6 Fig and S13 Table) but did not find any other significant responses. When splitting responses to the length of the shortest path, we observed that inferior temporal cortex and superior frontal gyrus (SFG) responded to longer optimal path lengths more during deep versus shallow planning trials. Lastly, left LOC, left PPC, and right IPS responses to incorrect choices were higher for shallow planning trials (see S6 Fig and S13 Table).
Psychophysiological Interactions of the Hippocampus
We conducted a psychophysiological interaction (PPI) analysis of whether the right posterior hippocampal region (Fig 5C) relating to longer subsequent RT was coupled with rd-mPFC as a function of planning depth (mazes affording single versus sequential choices). We tested which regions exhibited increased coupling with hippocampus for deep versus shallow maze planning trials. Taking an 8-mm sphere around the rd-mPFC peak that selectively responded to smaller prospective path length differences (x = −15; y = 38; z = 34), we observed significantly increased coupling between the hippocampus and rd-mPFC for deep versus shallow planning (t(28) = 2.69; p = 0.012; Fig 5D). Notably, the hippocampus coupled more strongly with rd-mPFC than with any other brain region (peak voxel, x = 12; y = 47; z = 28; Z-score: 3.95; in a separate cluster that did not survive FWE cluster correction p < 0.05 at the whole-brain level). We did not observe any other significant clusters that coupled with the hippocampus anywhere else in the brain for deep versus shallow planning.
To assess the functional relevance of hippocampal coupling with rd-mPFC during deep planning, we conducted a separate GLM analysis splitting deep planning trials based on whether the subsequent choice trial was answered correctly or not (see Supplemental Methods in S1 Text for details of the GLM). We found that hippocampal coupling with rd-mPFC was significantly higher for correct versus incorrect deep planning trials (t(28) = 3.04; p = 0.005; Fig 5D).
Using fMRI and a novel spatial planning paradigm, we examined how different brain regions respond to prospective versus initial choices. We observed two prefrontal regions, lFPC and rd-mPFC, that responded to smaller prospective path length differences (i.e., demanding second-step choices) during planning. Distinguishing the role of these two regions in planning prospective choices, we found that rd-mPFC most strongly responded to deep mazes with larger path length differences at initial/starting choice points and smaller path length differences at prospective choice points (Fig 4). In contrast, lFPC responded to smaller path length differences at prospective choice points without any significant response related to initial path length differences. Notably, we observed hippocampal signals that correlated with subsequent choice accuracy and response time, particularly in mazes affording sequential choices. Additionally, we observed enhanced hippocampal functional connectivity with rd-mPFC during deep maze planning that was higher prior to correct choices. In parallel, we found PCC and angular gyrus responses relating to unchosen paths during sequential planning, whereas vmPFC/pgACC activity related to larger initial and prospective path length differences. In what follows, we relate our prospective spatial planning findings to the wider decision-making literature and to the hippocampal and parietal signals we observed. We then speculate on potential planning computations that might occur during our task.
The Role of rd-mPFC versus lFPC during Planning
Highlighting distinguishable contributions to prospective planning for medial versus lateral anterior prefrontal regions, we find that rd-mPFC responds to difficult prospective choices while maintaining easier initial choices, whereas lFPC responds to prospective path length differences without being significantly modulated by initial path length differences. These findings are in line with the perceived capacity of anterior PFC to exploit recent reward trends during value-guided choice  and spatial navigation . More specifically, our findings suggest that rd-mPFC might be guiding computations related to chaining the whole sequence of choices, whereas lFPC more exclusively relates to robust planning at the second, more prospective choice point independent of the initial choice. Alternatively, when there are increased computational demands at the initial choice point, rd-mPFC might deactivate when it is not clear what the first step should be, allowing lFPC or dACC to take over more robust prospective planning. The ability of lFPC and dACC to help robustly compute second-step choices is in line with previous findings related to counterfactual signals in FPC [6,17–18] and dACC signals related to strategic shifts in decision-making [19–20], along with the smaller unchosen path length dACC signals presented here. Notably, our behavioural results showed initial choice path length differences modulate subsequent RT during prospective choices but not whether the choice was correct or not, which suggests more than one underlying computation occurring related to prospective planning. Taken together with our anterior PFC findings, these data broadly implicate at least two distinct anterior prefrontal computations when planning next-step choices in novel environments—one rapid and another more deliberative computation related to prospective planning.
This lateral versus medial distinction parallels previous research on anterior PFC, where lateral areas are believed to process stimulus-independent (i.e., counterfactual) information, whereas medial areas are engaged by stimulus-oriented information. Furthermore, prospective choice responses in rostral mPFC were primarily dorsal, but the exact location of responses was highly variable over participants, which may relate to the high anatomical variability between individuals in an evolutionarily complex region. Still, our observation of prospective planning responses throughout rostral mPFC is consistent with recent findings showing that different populations in mPFC contribute to internal strategy shifts (see [22–24] for human evidence and [25–28] for rodent evidence) and persistent activity in order to reevaluate sequential choices.
Our result showing increased lateral FPC responses to prospective path length differences might relate to the perceived function of FPC as a simultaneous evaluator of multiple options, perhaps due to a higher sampling capacity (i.e., capable of maintaining more information) than rd-mPFC. Simultaneous evaluation of multiple options is necessary whether a decision is a sequential choice problem or not and is supported by the putative role of FPC in the rapid learning of novel abstract rules and counterfactual choice [6,17,31]. Further work could focus on the influence of working memory load or cognitive control on types of planning [32–34] and how or whether different cognitive demands determine how a plan is formed or implemented and which prefrontal structures (e.g., dACC versus lFPC or rd-mPFC) are engaged.
Hippocampal Responses during Spatial Planning
Decisions often rely on prospection during multi-step events in order to anticipate a potential outcome, which is a process commonly linked with hippocampal-based memory ([7,35–37]; see  for review). Furthermore, spatial planning in novel environments is usually associated with the use of a hippocampal-based internal model formed by exploration of the physical world , yet corresponding evidence of hippocampal involvement during on the fly planning without extensive prior learning has been lacking. Here, we present evidence of posterior hippocampal responses related to increased deliberation for novel sequential choices and anterior hippocampal responses that relate to choice accuracy. Although our experiment is more akin to a perceptual decision-making task than virtual navigation, our results are still consistent with the role of the hippocampus during navigational planning [5,39–41]. More specifically, posterior hippocampal activity related to increasing distance between the start and goal locations—along with higher right posterior hippocampal activity prior to longer choice RT in deep mazes—helps link our spatial decision-making results to the putative role of the right posterior hippocampus, which is thought to encode memory related to the spatial layout of an environment [42–44].
In novel environments, posterior hippocampal functional connectivity with rd-mPFC increased during deliberative planning for deep mazes and was highest before choice trials that were answered correctly. Likewise, a recent fMRI study has shown increased anterior prefrontal coupling with the hippocampus during remembering and planning upcoming trajectories to goal locations. Oscillatory coupling between the posterior medial temporal lobe and rostrodorsal portions of mPFC has been observed during dynamic spatial imagery, and our data add further support that coupling between these regions could relate to comparison of novel choices with previous experience.
Notably, the hippocampus is also thought to play a key role in rapid incidental learning [46–47]. Our anterior hippocampus activation related to spatial planning performance illustrates how the hippocampus can contribute to quick model-based inferences during tasks with little to no learning. Yet it is still unclear how one-shot episodic learning might contribute to hierarchical planning. Investigating the neural representations of novel decisions might help uncover contextualization processes important for decision-making (e.g., chaining together sequential choices as a single decision outcome) and episodic memory (e.g., chaining together individual learned representations into a cohesive episode).
Potential Computations Underlying Plan Formation and Implementation
We have elaborated on the distributed neural responses that relate to rapid prospective planning, but the precise computations required for our task are unclear. One disadvantage of our task is the inability to probe the time scale of plan formation and implementation in novel environments, particularly when choice accuracy and RT are influenced differently by path length differences. Most planning studies test after extensive training and are biased towards action-by-action evaluation without the need to maintain prior choices [3–4,48–50]. With extensively trained choices, the neural computations leading to increased decision implementation/RT are well studied [51–52]. On the other hand, the anterior prefrontal regions selectively responding to prospective uncertainty make evaluations that are more akin to rapid approximation of the best looking trajectory or jumping ahead to the most important sub-goal, which are neural computations that have not been as well explored. Interestingly, this “jumping ahead” process resembles computations that facilitate generalization between similar sequential states (successor representations) during episodic learning [53–55] and also best-first forward search models . Exploring the interactions between the successor representation, time scales, and heuristic pruning during plan formation could potentially help disclose the computations underlying rapid and efficient multi-step planning in novel environments [57–59].
The Role of vmPFC and dACC during Spatial Planning
Given that our experiment does not separate responses related to plan formation and implementation, the role of the vmPFC and dACC in our task is unclear. We observed dACC/pSMA responses related to an exhaustive comparison of path lengths (comparing the shortest path with every other available path), with additional responses related to increased deliberation, longer distance between starting and goal locations, and, most prominently, subsequently incorrect choices. Taking into account the importance of the dACC in model updating [60–61], it is not surprising that dACC responses would relate to uncertainty about potential trajectories at different choice points. However, due to the poor temporal resolution of our task, it is unclear whether dACC/pSMA responses are related to checking back on an uncertain initial choice point , focusing on one choice point for an extended period of time , performance monitoring , or increased cognitive control caused by difficult choices (see [61, 65] for an in-depth discussion of the potential role of dACC in these behaviours).
In contrast with dACC, vmPFC responses did not relate to comparisons of all available path lengths. Although subgenual portions of vmPFC responded to larger path length differences at both initial and prospective choice points, we did not observe any vmPFC signals that correlated with subsequently correct choices or quicker subsequent RT. A potential explanation for this result could be that vmPFC initially helps locate task-relevant sub-goals and signals an update of the current state [19,66].
Our findings also uncovered parietal responses that parallel activations observed in dACC and vmPFC. Smaller path length differences at both initial and prospective choice points engaged structures like PPC that have previously been implicated in value-guided decision-making when there is surprise and/or time pressure [60,67]. Notably, in other areas of the parietal lobe, right angular gyrus/TPJ and PCC responses during planning related to large initial and unchosen path length differences but also correlated with increased subsequent choice RT. One way to reconcile these seemingly contradictory results is that angular gyrus and PCC might be responding to irrelevant paths that need to be pruned/ignored , which could then help us suddenly proceed/shift [69–71] to a subsequent decision during planning. Planning studies informed by recent work investigating divisive normalization during multi-alternative choice  and dACC–PCC interactions when pursuing unlikely choices  can potentially isolate the biophysical mechanism underlying pruning irrelevant alternatives during sequential decision-making.
Notably, vmPFC, TPJ, and PCC responses to larger initial path length differences (i.e., certainty) overlap with a brain network commonly observed during value-guided choice [14,73]. Specifically, regions that increased with the precision of beliefs about choices overlap with regions that respond to reward differentials, i.e., greater value differences between chosen and unchosen options during value-guided decision-making [12–13,74]. Likewise, PPC and dACC/pSMA responses are also observed both during difficult value-guided choices (i.e., smaller value differences between chosen and unchosen options) [12,14,75] and smaller initial path length differences. This suggests a similar mechanism guiding probabilistic choice in both spatial and value-guided decision-making, regardless of whether an explicit reward, like food or monetary gain, is present.
Internal World Models and Prospective Choice
We observed increased coupling between the hippocampus and rd-mPFC during sequential plan formation that also predicted subsequent performance. Notably, resting-state fluctuations in these same regions—along with angular gyrus and PCC—are also correlated and form the default network [76–78]. Promising clues relating internal models of the physical world to resting default network fluctuations might come from investigating hippocampal sharp-wave ripples: spontaneous oscillations that co-occur with the reactivation (and pre-activation) of hippocampal place cell ensembles [79–83]. Indeed, a recent study in macaques has shown that ripples selectively influence ongoing activity in the default network but not other resting-state networks . Additionally, reactivation of hippocampal representations of previously learned goal locations has been observed during pre-navigational planning in familiar environments in humans . Despite these promising findings, further research is still necessary to determine whether endogenous hippocampal interactions with cortical midline regions reflect reactivation/exploration of internal states in order to prepare decision-making networks for upcoming novel choices [59,70,85–86].
We present a task adapted from rodent spatial navigation that enabled us to elucidate core neural computations underlying our ability to make fast and robust multi-step inferences in the absence of prior learning [85–87]. Our findings highlight a unique contribution of brain regions that do not respond to an exhaustive search of possible options during planning like caudal PFC and premotor regions but rather maintain current choices while planning subsequent choices. These data offer preliminary evidence of rapid heuristic-based computations in rd-mPFC and the hippocampus during sequential planning that might further elucidate how we make inferences about states beyond a current subjective state.
Materials and Methods
Thirty-four healthy adult participants performing the fMRI experiment gave informed written consent and were studied and compensated (as approved by the local research ethics committee at University College London and in accordance with Declaration of Helsinki protocols). Due to poor participant performance (answering less than 75% of trials correctly) in the fMRI experiment, we removed five participants, leaving a final sample of 29 participants (14 female; 23.4 mean age in y; SD of 4.09 y). All participants were right-handed, had normal or corrected-to-normal vision, and reported good health with no prior history of neurological disease.
Stimuli were presented using the Cogent (http://www.vislab.ucl.ac.uk/cogent.php) toolbox running in MATLAB (Mathworks, Natick, MA, USA). Over the course of 220 trials, participants viewed 220 different mazes from a slightly tilted (overhead) viewpoint and later chose from first-person viewpoints within mazes generated using Blender (http://www.blender.org). All mazes had a starting location (a red square) towards the bottom of the maze and a goal location (a green square) further into the maze. Mazes differed by hierarchical depth (number of paths to a goal location): there were 110 mazes with two possible routes (shallow mazes) and 110 mazes with four possible routes (deep mazes).
In the scanner, participants were first presented with pictures of mazes of varying difficulty (from our overhead viewpoint) and then asked to determine the shortest path from a starting location (a red square) at the bottom of the screen to the goal location (a green square). The overhead view appeared on the screen for 3.25 s, after which a location (choice point) along the path was highlighted briefly for 250 ms with an orange circle. The choice point location could either be the starting location or, if there were four paths to the goal location, a second choice point. Crucially, participants would only have to make a decision about one choice point for each trial. At any choice point, it was necessary to choose between two different directions, which could be left, forward, or right, with an additional option to select equal if both routes were the same distance. No second choice points with two incorrect choices were ever chosen; only a second choice point along the optimal path after the starting location could be chosen (due to viewpoint constraints, only 47 choice points further were chosen versus the initial starting point/red square, which was chosen 53 times). After the choice point was highlighted, a “zoomed in” viewpoint of this location (always one square back and facing the same direction as the overhead viewpoint) was presented. Depending on the possible direction at the location, participants had less than 1,500 ms to decide whether to go left, forward, right, or occasionally either direction. If no button press was made within 1,500 ms, the trial counted as an incorrect trial and the experiment moved on to the 1,500-ms intertrial interval (ITI) phase. Participants never received any feedback or reward for making the correct choice. As soon as participants chose a direction, the ITI phase of a trial began. Participants repeated this trial sequence 110 times per session, for a total of two sessions. Sessions lasted approximately 10–15 min. Session order was counterbalanced between participants.
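As a rough consistency check on the timing described above, the following sketch adds up the phases of a single trial and multiplies by the 110 trials per session. It assumes that every trial uses the full 1,500-ms response window (an upper bound, because the ITI began as soon as a choice was made) and it ignores any scanner start-up or discarded volumes; the constant and function names are illustrative only.

```python
# Upper-bound session duration from the trial phases described in the text.
PLANNING_S = 3.25       # overhead view of the maze
HIGHLIGHT_S = 0.25      # choice point highlighted with an orange circle
MAX_RESPONSE_S = 1.5    # response window (assumed fully used: upper bound)
ITI_S = 1.5             # intertrial interval
TRIALS_PER_SESSION = 110


def max_trial_duration_s():
    """Longest possible duration of one trial, in seconds."""
    return PLANNING_S + HIGHLIGHT_S + MAX_RESPONSE_S + ITI_S


def max_session_duration_min(n_trials=TRIALS_PER_SESSION):
    """Longest possible duration of one session, in minutes."""
    return n_trials * max_trial_duration_s() / 60.0


if __name__ == "__main__":
    print(f"{max_trial_duration_s():.2f} s per trial")
    print(f"{max_session_duration_min():.1f} min per session")
```

Under these assumptions a trial lasts at most 6.5 s and a session just under 12 min, which falls within the reported 10–15 min.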
All participants completed a brief practice session consisting of 40 mazes/trials before the experiment (on a laptop outside of the scanner). Deep mazes contained another branch/choice between routes further in the maze, and the path length to reach the two choice points further in the maze was always equal. Mazes had square tiled floors and were 8 × 8, 9 × 9, or 10 × 10 squares in total area. In shallow mazes, path length differences were split between 2, 4, and 6, with one catch trial per session having equal path lengths. In deep mazes, path length differences were split between 2 (small difference), 4 (medium difference), or 6 (large difference) squares (for an example, see square tiles in the mazes presented in Fig 1) for the two paths at the starting location and a path length difference of 2, 4, or 6 squares at the optimal choice point in the maze. There was one catch trial for deep and shallow mazes in each session, each containing all equal path lengths (path length differences of 0). In sum, shallow trials could have a path length difference of 2, 4, or 6, while deep maze trials could be 2, 2; 2, 4; 2, 6; 4, 2; 4, 4; 4, 6; 6, 2; 6, 4; or 6, 6 (e.g., 4, 2 would have a medium path length difference of 4 at the starting location, whereas the second choice point would have a small path length difference of 2; see Fig 1C for examples).
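The condition space described above can be enumerated in a few lines. The sketch below is only an illustration of the design (catch trials with a difference of 0 are left out), and the names `DIFFS`, `LABELS`, `shallow_conditions`, and `deep_conditions` are hypothetical rather than taken from the experiment code.

```python
from itertools import product

# Path length differences in squares, as described in the text.
DIFFS = (2, 4, 6)
LABELS = {2: "small", 4: "medium", 6: "large"}


def shallow_conditions():
    """Shallow mazes: a single (initial) path length difference."""
    return [(d,) for d in DIFFS]


def deep_conditions():
    """Deep mazes: (initial difference, prospective difference) pairs."""
    return list(product(DIFFS, DIFFS))


if __name__ == "__main__":
    for initial, prospective in deep_conditions():
        print(f"{initial}, {prospective}: "
              f"{LABELS[initial]} initial, {LABELS[prospective]} prospective")
```

This yields three shallow conditions and nine deep conditions, with the pair (4, 2) corresponding to the example given in the text.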
Functional images were acquired on a 3T Siemens Trio scanner. BOLD T2*-weighted functional images were acquired using a gradient-echo EPI pulse sequence acquired obliquely at 45° with the following parameters: repetition time, 3,360 ms; echo time, 30 ms; slice thickness, 2 mm; inter-slice gap, 1 mm; in-plane resolution, 3 × 3 mm; field of view, 64 × 72 mm²; 48 slices per volume. A field-map using a double echo FLASH sequence was recorded for distortion correction of the acquired EPI. After the functional scans, a T1-weighted 3-D MDEFT structural image (1 mm³) was acquired to co-register and display the functional data.
Functional images were processed and analysed using SPM8 (www.fil.ion.ucl.uk/spm). The first five volumes were discarded to allow for T1 equilibration. Standard preprocessing included correction for differences in slice acquisition timing, realignment/unwarping to correct for inter-scan movement, and normalization of the images to an EPI template (specific to our sequence and scanner) that was aligned to the T1 Montreal Neurological Institute (MNI) template. Finally, the normalized functional images were spatially smoothed with an isotropic 8-mm full-width half maximum Gaussian kernel. For the model described below, all regressors, with the exception of six movement parameters of no interest, were convolved with the SPM hemodynamic response function. Data were also high-pass filtered (cut-off period = 128 s). Statistical analyses were performed using a univariate GLM with a rapid event-related experimental design.
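To illustrate what convolving a regressor with a hemodynamic response function involves, the sketch below builds a double-gamma response and convolves it with a per-scan stick function. This is not SPM's implementation: the shape parameters (peak at 6, undershoot at 16, ratio of 6) are assumed values resembling commonly quoted canonical defaults, and the response is sampled once per TR rather than at a finer time resolution. Only the TR of 3.36 s comes from the acquisition parameters above.

```python
import math

TR_S = 3.36  # repetition time reported in the acquisition parameters


def double_gamma_hrf(t, peak=6.0, undershoot=16.0, ratio=6.0):
    """Double-gamma response at time t (seconds); parameters are assumed."""
    if t <= 0:
        return 0.0
    pos = t ** (peak - 1) * math.exp(-t) / math.gamma(peak)
    neg = t ** (undershoot - 1) * math.exp(-t) / math.gamma(undershoot)
    return pos - neg / ratio


def convolve_with_hrf(stick, tr=TR_S, hrf_length_s=32.0):
    """Convolve a per-scan stick function with the HRF sampled every TR."""
    n_kernel = int(hrf_length_s / tr) + 1
    kernel = [double_gamma_hrf(i * tr) for i in range(n_kernel)]
    out = [0.0] * len(stick)
    for i, amplitude in enumerate(stick):
        if amplitude == 0:
            continue
        for j, weight in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += amplitude * weight
    return out


if __name__ == "__main__":
    # One event at the second scan of a 12-scan run.
    regressor = convolve_with_hrf([0, 1] + [0] * 10)
    print([round(v, 3) for v in regressor])
```

With a TR this long the predicted response is sampled coarsely (only a few points cover its rise and fall), so a real analysis would build regressors at a finer time resolution before downsampling to scan times.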
GLM1 was based on path length differences (see task description for possible path length differences): for the two shortest paths present at the starting point (Initial Path Length Difference), the path length difference between the shortest path and the longest unchosen path length that was not available at the second choice point (Unchosen Path Length Difference), the path length difference at the second choice point (Prospective Path Length Difference), log RT for the subsequent decision phase (Log RT), length of the shortest available path (Length of the Shortest Path), whether the participant made a correct choice during the subsequent choice phase (Performance), and whether the first or second choice point was prompted for deep maze trials (see Fig 1D for schematic showing the paths contributing to Initial Path Length Difference, Prospective Path Length Difference, and Unchosen Differences). For shallow trial regressors, there were only parametric regressors for Initial Path Length Difference, Log RT, Length of the Shortest Path, and Performance. Inferences about the effects of uncertainty were based upon t tests using the standard summary statistic approach for second-level random effects analysis (see S1 Text for additional follow-up GLMs and corresponding results and S5 Table for a complete table of conditions and parametric regressors for each GLM).
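One possible reading of the three path length difference regressors for a deep maze is sketched below. It assumes that each of the two initial directions leads to two routes, that the Initial difference compares the best route through each direction, that the Prospective difference compares the two routes at the second choice point on the optimal branch, and that the Unchosen difference compares the shortest route with the longest route on the other branch. These definitions are an interpretation of the description above and of Fig 1D, not the authors' code; the mean-centring of parametric values and all function names are likewise assumptions.

```python
import math


def path_length_differences(optimal_branch, other_branch):
    """Return assumed Initial/Prospective/Unchosen differences (in squares).

    Each argument is a pair of route lengths through one initial direction;
    `optimal_branch` must contain the overall shortest route.
    """
    best, second_on_branch = sorted(optimal_branch)
    other_best, other_worst = sorted(other_branch)
    if best > other_best:
        raise ValueError("optimal_branch must contain the shortest route")
    return {
        "initial": other_best - best,
        "prospective": second_on_branch - best,
        "unchosen": other_worst - best,
    }


def mean_centre(values):
    """Subtract the mean so a parametric regressor has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def log_rt(rts_ms):
    """Natural log of response times given in milliseconds."""
    return [math.log(rt) for rt in rts_ms]


if __name__ == "__main__":
    # Hypothetical route lengths for a "4, 2" deep maze.
    print(path_length_differences((10, 12), (14, 18)))
    print(mean_centre(log_rt([800.0, 950.0, 1200.0])))
```

In this hypothetical maze the Initial difference is 4 and the Prospective difference is 2, matching the "4, 2" condition from the task description.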
We conducted a PPI analysis to examine hippocampal coupling with rd-mPFC and the rest of the brain during deep versus shallow planning trials. The group-level right posterior hippocampus peak (x = 24, y = −37, z = −8) that correlated with increased RT served as a centre for the spherical region of interest (8-mm radius). The first eigenvariate from these ROIs constituted the physiological variable. The psychological variable was the contrast vector representing the task effect of deep versus shallow mazes. These regressors and their interaction term were estimated at the first level. Contrast images associated with the PPI regressor were then entered into a one-sample t test.
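The logic of the PPI model can be sketched as follows: a physiological regressor (the seed time course), a psychological regressor (deep versus shallow, coded here as +1 and −1), and their product. This is a deliberately simplified illustration. It forms the interaction directly on the measured signal, whereas SPM-style PPI typically forms it at the estimated neural level before reconvolving with the hemodynamic response, and the +1/−1 coding and function names are assumptions. The second function shows the group-level one-sample t statistic, whose degrees of freedom are n − 1 (28 for 29 participants).

```python
import math


def ppi_regressors(seed, is_deep):
    """Return (physiological, psychological, interaction) regressors.

    seed: seed-region time course, one value per scan.
    is_deep: booleans per scan, True for deep and False for shallow planning.
    """
    mean_seed = sum(seed) / len(seed)
    physio = [s - mean_seed for s in seed]           # mean-centred seed signal
    psych = [1.0 if d else -1.0 for d in is_deep]    # assumed contrast coding
    interaction = [p * q for p, q in zip(physio, psych)]
    return physio, psych, interaction


def one_sample_t(values):
    """One-sample t statistic against zero and its degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(variance / n), n - 1


if __name__ == "__main__":
    physio, psych, interaction = ppi_regressors(
        [1.0, 2.0, 3.0, 4.0], [True, True, False, False]
    )
    print(interaction)
    print(one_sample_t([0.2, 0.5, 0.1, 0.4]))
```

In a full analysis, the coefficient on the interaction term would be estimated per participant in a GLM that also contains the physiological and psychological regressors, and those per-participant estimates would be passed to the group-level test.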
Post hoc statistical analyses were conducted using 8-mm radius spheres in the MarsBar toolbox within SPM8 around the respective peak voxel specified in the GLM analysis. This allowed us to compare the effects of different parametric regressors of interest (e.g., to determine whether a length of the shortest available path effect was present in a region defined by an orthogonal main effect of prospective path length difference). This ensured we did not make any biased inferences in our post hoc analyses.
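For readers unfamiliar with spherical regions of interest, the sketch below lists the voxel centres that fall within 8 mm of a peak coordinate on a regular grid. It does not use or reproduce the MarsBar interface; the 3-mm isotropic voxel size is an assumption made for illustration, and the example peak is the rd-mPFC coordinate reported in the Results.

```python
def sphere_voxels(peak_mm, radius_mm=8.0, voxel_mm=3.0):
    """Voxel centres (mm) on a grid centred on the peak within the radius."""
    reach = int(radius_mm // voxel_mm)  # grid steps to search in each direction
    coords = []
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            for k in range(-reach, reach + 1):
                dx, dy, dz = i * voxel_mm, j * voxel_mm, k * voxel_mm
                if dx * dx + dy * dy + dz * dz <= radius_mm ** 2:
                    coords.append(
                        (peak_mm[0] + dx, peak_mm[1] + dy, peak_mm[2] + dz)
                    )
    return coords


if __name__ == "__main__":
    roi = sphere_voxels((-15.0, 38.0, 34.0))
    print(len(roi), "voxel centres in the sphere")
```

Averaging a parametric effect over such a sphere, defined from an orthogonal contrast, is what allows the post hoc comparisons described above to avoid circular inference.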
Given the previously hypothesized role of the hippocampus in spatial planning, we report whether hippocampal peak-voxels survive SVC for multiple comparisons (p < 0.05) based on a bilateral ROI of the hippocampus constructed using the SPM Anatomy toolbox [91–92]. For all analyses outside of the hippocampus, we report activations surviving an uncorrected statistical threshold of p = 0.005 and cluster-level correction for multiple comparisons (FWE p < 0.05), unless indicated otherwise. We also mention whether any significant clusters had a very large cluster extent (k > 2,000), and the cluster extent for every significant effect is reported in S7–13 Tables. Coordinates of brain regions are reported in MNI space. BOLD signal time courses in S5 Fig were plotted using the rfxplot toolbox.
S1 Fig. Prospective choice behaviour.
(A) Accuracy when participants are prompted on the prospective/second choice point further in the maze. Left: Fraction of correct prospective choices split by whether there was a small, medium, or large path length difference at the prospective choice point. Right: Fraction of correct prospective choices split by whether there was a small, medium, or large path length difference between the two shortest options available at the first initial choice point. (B) Log RT when participants are prompted on a choice point further in the maze. Left: Log RT for prospective choices split by whether there was a small, medium, or large path length differences at the second choice points. Right: Log RT for prospective trials split by whether there was a small, medium, or large path length difference between the two shortest options available at the first choice point.
S2 Fig. Large versus small prospective path length differences.
Anterior prefrontal peak for initial (in shallow mazes) versus prospective path length differences. Images showing dlPFC peak of cluster (circled in blue) also containing rd-mPFC sub-peak that significantly responded to small prospective versus small path length differences in shallow mazes (see Fig 3E). dlPFC cluster survives FWE correction p < 0.05 for multiple comparisons and is shown at p < 0.005 uncorrected.
S3 Fig. Response to initial choice path length differences.
Top: Sagittal image centred on vmPFC activation that responded to large initial path length differences. Regions responding to large initial path length differences shown in orange and regions responding to small initial path length differences shown in blue. Bottom: Effect size for 8-mm sphere around the vmPFC peak for initial, prospective, and unchosen path length differences. Asterisk signifies p < 0.05.
S4 Fig. Responses to proximal shortest available path length.
Images centred on rd-mPFC that correlated with smaller distance between the starting and goal location (i.e., length of the shortest available path).
S5 Fig. dACC activation during planning preceding incorrect choices.
(A) Left: Sagittal image showing dACC/pSMA activity during planning that was higher prior to incorrect versus correct choice trials. Right: Effect size for an 8-mm sphere around the dACC/pSMA peak voxel showing that there is no significant difference (p < 0.05) in the correlation with subsequently incorrect choices for deep versus shallow planning trials (mean ± SEM). Asterisks indicate a significant correlation (p < 0.05) with path length differences. A negative effect size represents a correlation with incorrect trials, whereas a positive effect size represents a correlation with correct trials. (B) Evoked dACC/pSMA BOLD response during planning separated by subsequently correct and incorrect choice trials (mean ± SEM).
S6 Fig. Deep versus shallow planning interactions.
(A) Images centred on PPC region that significantly responded to smaller initial path length differences in shallow versus deep mazes. (B) Images centred on left PPC region that significantly responded to subsequently incorrect choices in shallow versus deep mazes. (C) Images centred on right inferior temporal cortex region that responded to increasing distance between the starting and goal location in deep versus shallow mazes.
S1 Table. Bayesian model comparison of reduced forms of the behavioural model in Equation 1.
The first and second columns report the model variant and the corresponding Bayesian information criterion (BIC; summed across participants). A small BIC reflects greater (log) model evidence.
S2 Table. Shannon entropy values (H) by path length difference.
Several models of RT were compared using the BIC after penalizing for the number of parameters. A smaller BIC value indicates that a model has higher evidence, after penalizing its accuracy for its complexity or number of parameters.
S3 Table. Mean probability of making the correct choice using our Shannon entropy model.
S5 Table. GLM2.
Shannon entropy model-based regressors.
S6 Table. GLM3.
PPI correct or incorrect choice regressors.
S7 Table. Initial path length differences.
List of peak voxels for clusters found in the initial path length difference contrast. Please note that despite our stringent threshold (p < 0.005 activation threshold, cluster-based threshold p < 0.05), many of the activations are very large (k > 2,000) and span multiple brain regions. Consequently, the labels assigned to each cluster should be interpreted with caution.
S8 Table. Prospective path length differences.
List of peak voxels for clusters found in the prospective path length difference contrast. Please note that despite our stringent threshold (p < 0.005 activation threshold, cluster-based threshold p < 0.05), some activations are very large (k > 2,000) and span multiple brain regions. Consequently, the labels assigned to each cluster should be interpreted with caution.
S9 Table. Unchosen path length differences.
List of peak voxels for clusters found in the unchosen path length difference contrast.
S10 Table. Planning responses related to subsequent choice RT.
List of peak voxels for clusters found in subsequent choice RT contrast. Please note that despite our stringent threshold (p < 0.005 activation threshold, cluster-based threshold p < 0.05), some activations are very large (k > 2,000) and span multiple brain regions. Consequently, the labels assigned to each cluster should be interpreted with caution.
S11 Table. Length of shortest available path.
List of peak voxels for clusters found in the length of shortest available path contrast. Please note that despite our stringent threshold (p < 0.005 activation threshold, cluster-based threshold p < 0.05), some activations are very large (k > 2,000) and span multiple brain regions. Consequently, the labels assigned to each cluster should be interpreted with caution.
S12 Table. Accuracy.
List of peak voxels for clusters found in the accuracy contrast.
S13 Table. Deep versus shallow interactions.
List of peak voxels for clusters found in contrasts for deep versus shallow interactions related to different parametric regressors. Please note that despite our stringent threshold (p < 0.005 activation threshold, cluster-based threshold p < 0.05), some of the activations are very large (k > 2,000) and span multiple brain regions. Consequently, the labels assigned to each cluster should be interpreted with caution.
Data underlying plots in Fig 2 (panels A and B).
Data underlying plots in Fig 3 (panels C and D).
Data underlying plots in Fig 4 (panel A).
We would like to thank T. Behrens for helpful comments on an earlier version of this manuscript. We also thank T. Hauser, D. Ognibene, F. Rigoli, and B. Sengupta for helpful discussion and the staff at the Wellcome Trust Centre for Neuroimaging for assistance with scanning.
- Conceptualization: RKa JK NB KJF.
- Formal analysis: RKa.
- Funding acquisition: RKa KJF.
- Investigation: RKa RKo.
- Project administration: RKa.
- Resources: WDP KJF.
- Supervision: RKa KJF.
- Writing – original draft: RKa KJF.
- Writing – review & editing: RKa JK RKo WDP NB KJF.
- 1. Ramnani N, Owen AM. Anterior prefrontal cortex: insights into function from anatomy and neuroimaging. Nature Rev Neurosci. 2004;5: 184–194. pmid:14976518
- 2. Yoshida W, Ishii S. Resolution of uncertainty in prefrontal cortex. Neuron. 2006;50: 781–789. pmid:16731515
- 3. Wunderlich K, Dayan P, Dolan RJ. Mapping value based planning and extensively trained choice in the human brain. Nat Neurosci. 2012;15: 786–791. pmid:22406551
- 4. Balaguer J, Spiers H, Hassabis D, Summerfield C. Neural mechanisms of hierarchical planning in a virtual subway network. Neuron 2016; 90:893–903 pmid:27196978
- 5. Brown TI, Carr VA, LaRocque KF, Favila SE, Gordon AM, Bowles B, Bailenson JN, Wagner AD. Prospective representation of navigational goals in the human hippocampus. Science. 2016;352:1323–1326. pmid:27284194
- 6. Boorman ED, Behrens TE, Rushworth MF. Counterfactual choice and learning in a neural network centered on human lateral frontopolar cortex. PLoS Biol. 2011;9: e1001093. pmid:21738446
- 7. Doll BB, Duncan KD, Simon DA, Shohamy D, Daw ND. Model-based choices involve prospective neural activity. Nat Neuro. 2015;18: 767–72. pmid:25799041
- 8. Benoit RG, Gilbert SJ, Burgess PW. A neural mechanism mediating the impact of episodic prospection on farsighted decisions. J Neurosci. 2011;31: 6771–6779. pmid:21543607
- 9. Nicolle A, Klein-Flügge MC, Hunt LT, Vlaev I, Dolan RJ, Behrens TE. An agent independent axis for executed and modelled choice in medial prefrontal cortex. Neuron. 2012;75; 1114–1121. pmid:22998878
- 10. Wang Q, Luo S, Monterosso J, Zhang J, Fang X, Dong Q, Xue G. (2014) Distributed value representation in the medial prefrontal cortex during intertemporal choices. J Neurosci. 2014;34:7522–30 pmid:24872557
- 11. O’Keefe J, Nadel L. The Hippocampus as a Cognitive Map. 1978:114–52 (Oxford Univ Press)
- 12. Hampton AN, Bossaerts P, O’Doherty JP. The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. J. Neurosci. 2006;26: 8360–8367. pmid:16899731
- 13. FitzGerald TH, Seymour B, Dolan RJ. The role of human orbitofrontal cortex in value comparison for incommensurable objects. J Neurosci. 2009;29: 8388–8395. pmid:19571129
- 14. Bartra O, McGuire JT, Kable JW. The valuation system: A coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage. 2013;76: 412–427. pmid:23507394
- 15. Friston KJ, Buechel C, Fink GR, Morris J, Rolls E, Dolan RJ.Psychophysiological and modulatory interactions in neuroimaging. Neuroimage. 1997;6:218–29. pmid:9344826
- 16. Kovach CK, Daw ND, Rudrauf D, Tranel D, O’Doherty JP, Adolphs R. Anterior prefrontal cortex contributes to action selection through tracking of recent reward trends. J Neurosci. 2012;8434–42. pmid:22723683
- 17. Boorman ED, Behrens TE, Woolrich MW, Rushworth MF. How green is the grass on the other side? Lateral frontopolar cortex and the evidence in favor of alternative courses of action. Neuron. 2009;62: 733–743. pmid:19524531
- 18. Mansouri FA, Buckley MJ, Mahboubi M, Tanaka K. Behavioral consequences of selective damage to frontal pole and posterior cingulate cortices. Proc Natl Acad Sci USA 2015;112:E3940–9. pmid:26150522
- 19. Kolling N, Behrens TE, Mars RB, Rushworth MF. Neural mechanisms of foraging. Science 2012; 336:95–8. pmid:22491854
- 20. Kolling N, Wittman M, Rushworth MF. Multiple neural mechanisms of decision making and their competition under changing risk pressure. Neuron. 2014;81:1190–202. pmid:24607236
- 21. Burgess P. W., Simons J. S., Dumontheil I., & Gilbert S. J. (2005). The gateway hypothesis of rostral prefrontal cortex (area 10) function. In Duncan J., Phillips L., & McLeod P. (Eds.), Measuring the mind: Speed, control, and age (pp. 217–248). Oxford: Oxford University Press.
- 22. Schuck NW, Gaschler R, Wenke D, Heinzle J, Frensch PA, Haynes JD, Reverberi C. Medial prefrontal cortex predicts internally driven strategy shifts. Neuron. 2015;86: 331–340. pmid:25819613
- 23. Donoso M, Collins AG, Koechlin E. Human cognition. Foundations of human reasoning in the prefrontal cortex. Science 2014;344:1481–1486. pmid:24876345
- 24. Shenhav A, Buckner RL. Neural correlates of dueling affective reactions to win-win choices. Proc Natl Acad Sci USA. 2014;111: 10978–10983. pmid:25024178
- 25. Rich EL, Shapiro M. Rat prefronal cortical neurons selectively code strategy switches. J Neurosci. 2009;29: 7208–7219. pmid:19494143
- 26. Durstewitz D, Vittoz NM, Floresco SB, Seamans JK. Abrupt transitions between prefrontal neural ensemble states. Neuron. 2010;66: 438–448. pmid:20471356
- 27. Karlsson MP, Tervo DGR, Karpova AY. Network resets in medial prefrontal cortex mark the onset of behavioral uncertainty. Science. 2012;338: 135–139. pmid:23042898
- 28. Powell NJ, Redish AD. Representational changes of latent strategies in rat medial prefrontal cortex precede changes in behaviour. Nat Commun. 2016. 7: 12830. pmid:2763278
- 29. McGuire JT, Kable JW. Medial prefrontal cortical activity reflects dynamic re-evaluation during voluntary persistence. Nat Neurosci. 2015;18: 760–8. pmid:25849988
- 30. Boschin EA, Piekema C, Buckley MJ. Essential functions of primate frontopolar cortex in cognition. Proc Natl Acad Sci USA. 2015;112: E1020–1027. pmid:25691741
- 31. Boorman ED, Rushworth MF, Behrens TE. Ventromedial prefrontal and anterior cingulate cortex adopt choice and default reference frames during sequential multi-alternative choice. J Neurosci. 2013;33: 2242–2253. pmid:23392656
- 32. Otto AR, Gershman SJ, Markman AB, Daw ND. The curse of planning dissecting multiple reinforcement-learning systems by taxing the central executive. Psychological Science. 2013;24:751–61. pmid:23558545
- 33. Otto AR, Skatova A, Madlon-Kay S, Daw ND. Cognitive control predicts use of model-based reinforcement learning. J Cogn Neurosci, 2015;27:319–33. pmid:25170791
- 34. Gershman SJ, Markman AB, Otto AR. Retrospective revaluation in sequential decision making: A tale of two systems. J Exp Psych: Gen 2014: 182–194.
- 35. Wimmer GE, Shohamy . Preference by association: how memory mechanisms in the hippocampus bias decisions. Science. 2012; 338:270–3. pmid:23066083
- 36. Barron HC, Dolan RJ, Behrens TE. Online evaluation of novel choices by simultaneous representation of multiple memories. Nat Neurosci. 2013;16:1492–8. pmid:24013592
- 37. Bornstein AM, Daw ND. Cortical and hippocampal correlates of deliberation during model-based decisions for rewards in humans. PLoS Comput Biol. 2013;9:e1003387 pmid:24339770
- 38. Shadlen MN, Shohamy D. Decision Making and Sequential Sampling from Memory. Neuron. 2016. 90:927–39. pmid:27253447
- 39. Simon DA, Daw ND. Neural correlates of forward planning in a spatial decision task in humans. J Neurosci. 2011;31: 5526–5539. pmid:21471389
- 40. Viard A, Doeller CF, Hartley T, Bird CM, Burgess N. Anterior hippocampus and goal-directed spatial decision making. J Neurosci. 2011;31: 4613–4621. pmid:21430161
- 41. Howard LR, Javadi AH, Yu Y, Mill RD, Morrison LC, Knight R, Loftus MM, Staskute L, Spiers HJ. The hippocampus and entorhinal cortex encode the path and Euclidean distances to goals during navigation. Curr Bio 2014;24:1331–40. pmid:24909328
- 42. Abrahams S, Pickering A, Polkey CE, Morris RG. Spatial memory deficits in patients with unilateral damage to the right hippocampal formation. Neuropsychologia 35: 11–24. pmid:8981373
- 43. Bohbot V, Kalina M, Stepankova K, Spackova N, Petrides M, et al. Spatial memory deficits in patients with lesions to the right hippocampus and to the right parahippocampal cortex. Neuropsychologia 1998; 36:1217–1238. pmid:9842767
- 44. Spiers HJ, Burgess N, Maguire EA, Baxendale SA, Hartley T, et al. Unilateral temporal lobectomy patients show lateralized topographical and episodic memory deficits in a virtual town. Brain. 2001;124: 2476–2489. pmid:11701601
- 45. Kaplan R, Bush D, Bisby J, Horner AJ, Meyer SS, Burgess N. Medial prefrontal-medial temporal theta phase coupling in dynamic spatial imagery. J Cogn Neurosci. 2016. E-pub ahead of print. pmid:27779906
- 46. Lee SW, O’Doherty JP, Shimojo S. Neural computations mediating one-shot learning in the human brain. PLoS Biol. 2015;13:e1002137. pmid:25919291
- 47. Kumaran D, Hassabis D, McClelland JL. What Learning Systems do Intelligent Agents Need? Complementary Learning Systems Theory Updated. Trends Cogn Sci. 2016;20:512–34. pmid:27315762
- 48. Huys QJ, Eshel N, O’Nions E, Sheridan L, Dayan P, Roiser JP. Bonsai trees in your head: how the pavlovian system scupts goal-directed choices by pruning decision trees. PLoS Comput Biol. 2012;8;e1002410 pmid:22412360
- 49. Huys QJM, Lally N, Faulkner P, Eshel N, Seifritz E, Gershman SJ, Dayan P, Roiser JP. Interplay of approximate planning strategies. Proc Natl Acad Sci USA 2015:112:3098–3103. pmid:25675480
- 50. Solway A, Botvinick MM. Evidence integration in model-based tree search. Proc Natl Acad Sci USA. 2015;112:11708–11713. pmid:26324932
- 51. Smith PL and Ratcliff R. Psychology and neurobiology of simple decisions. Trends Neurosci. 2004. 27:161–8. pmid:15036882
- 52. Gold JI and Shadlen MN. The neural basis of decision making. Annu Rev Neurosci. 2007;30:535–74. pmid:17600525
- 53. Dayan P. Improving generalization for temporal difference learning: The successor representation. Neural Computation. 1993;5:613–24.
- 54. Gershman SJ, Moore CD, Todd MT, Norman KA, Sederberg PB. The successor representation and temporal context. Neural Computation 2012; 24:1553–68
- 55. Stachenfeld KL, Botvinick MM, Gershman SJ. Design Principles of the Hippocampal Cognitive Map. Advance Neural Info Process Sys 2014: 27
- 56. Bonet B, Geffner H. 2001. Planning as heuristic search. Artificial Intelligence 129,5–33.
- 57. Daw ND, Dayan P. The algorithmic anatomy of model-based evaluation. Philos Trans R Soc London B Biol Sci. 2014 369 pmid:25267820
- 58. Tervo DG, Tenenbaum JB, Gershman SJ. Toward the neural implementation of structure learning. Curr Opin Neurobiol. 2016;37:99–105. pmid:26874471
- 59. Gershman SJ, Daw ND. Reinforcement Learning And Episodic Memory in Humans and Animals: An Integrative Framework. Ann Rev Psychol 2017;68 EPUB pmid:27618944
- 60. O’Reilly JX, Schüffelgen U, Cuell SF, Behrens TE, Mars RB, Rushworth MF. Dissociable effects of surprise and model update in parietal and anterior cingulate cortex. Proc Natl Acad Sci USA. 2013; 110:E3660–9. pmid:23986499
- 61. Kolling N, Wittmann MK, Behrens TE, Boorman ED, Mars RB, Rushworth MF. Value, search, persistence and model updating in anterior cingulate cortex. Nat Neurosci. 2016;19:1280–5/ pmid:27669988
- 62. Stoll FM, Fontanier V, Procyk E. Specific frontal neural dynamics contribute to decisions to check. Nat Commun. 2016;7:11990. pmid:27319361
- 63. Hayden BY, Pearson JM, Platt ML. Neuronal basis of sequential foraging decisions in a patchy environment. Nat Neurosci. 2011;14:933–9. pmid:21642973
- 64. Ullsperger M, Fischer AG, Nigbur R, Endrass T. Neural mechanisms and temporal dynamics of performance monitoring. Trends Cogn. Sci. 2014.18:259–67. pmid:24656460
- 65. Shenhav A, Cohen JD, Botvinick MM. Dorsal anterior cingulate cortex and the value of control. Nat Neurosci. 2016;1286–91. pmid:27669989
- 66. Schuck NW, Cai MB, Wilson RC, Niv Y. Human Orbitofrontal Cortex Represents a Cognitive Map of State Space. Neuron 2016. 91:1402–12. pmid:27657452
- 67. Jocham G, Furlong PM, Kroger IL, Kahn MC, Hunt LT, Behrens TEJ. Dissociable contributions of ventromedial prefrontal and posterior parietal cortex to value-guided choice. NeuroImage 2014;100:498–506. pmid:24941453
- 68. McGuire JT, Nassar MR, Gold JI, Kable JW. Functionally dissociable influences on learning rate in a dynamic environment. Neuron. 2014;84: 870–881. pmid:25459409
- 69. Pearson JM, Hayden BY, Raghavachari S, Platt ML. Neurons in posterior cingulate cortex signal exploratory decisions in a dynamic multioption choice task. Curr Bio. 2009;19: 1532–1537. pmid:19733074
- 70. Pearson JM, Heilbronner SR, Barack DL, Hayden BY, Platt ML. Posterior cingulate cortex: adapting behavior to a changing world. Trends Cog Sci. 2011;15: 143–151. pmid:21420893
- 71. Crittenden BM, Mitchell DJ, Duncan J. Recruitment of the default mode network during a demanding act of executive control. eLife. 2015;4: e06481. pmid:25866927
- 72. Chau BK, Kolling N, Hunt LT, Walton ME, Rushworth MF. A neural mechanism underlying failure of optimal choice with multiple alternatives. Nat Neurosci. 2014;17:463–70. pmid:24509428
- 73. Rushworth MF, Noonan MP, Boorman ED, Walton ME, Behrens, TE. Frontal cortex and reward-guided learning and decision-making. Neuron. 2011;70: 1054–1069. pmid:21689594
- 74. Kable JW, Glimcher PW. The neural correlates of subjective value during intertemporal choice. Nat Neurosci. 2007;10: 1625–1633. pmid:17982449
- 75. Huettel SA, Song AW, McCarthy G. Decisions under uncertainty, probabilistic context influences activation of prefrontal and parietal cortices. J. Neurosci. 2005;25: 3304–3311. pmid:15800185
- 76. Greicius MD, Krasnow B, Reiss AL, Menon V. Functional connectivity in the resting brain: a network analysis of the default mode hypothesis. Proc Natl Acad Sci USA. 2003;100: 253–258. pmid:12506194
- 77. Raichle ME, MacLeod AM, Snyder AZ, Powers WJ, Gusnard DA, Shulman GL. A default mode of brain function. Proc Natl Acad Sci USA. 2001;98: 676–682. pmid:11209064
- 78. Buckner RL, Andrews-Hanna JR, Schacter DL. The brain’s default network: anatomy, function, and relevance to disease. Ann NY Acad Sci. 2008;1124: 1–38. pmid:18400922
- 79. Wilson MA, McNaughton BL. Reactivation of hippocampal ensemble memories during sleep. Science. 1994;265: 676–679. pmid:8036517
- 80. Skaggs WE, McNaughton BL. Replay of neuronal firing sequences in rat hippocampus during sleep following spatial experience. Science. 1996;271: 1870–1873. pmid:8596957
- 81. Kudrimoti HS, Barnes CA, McNaughton BL. Reactivation of hippocampal cell assemblies: effects of behavioral state, experience, and EEG dynamics. J Neurosci. 1999;19: 4090–4101. pmid:10234037
- 82. Nádasdy Z, Hirase H, Czurko A, Csicsvari J, Buzsáki G. Replay and time compression of recurring spike sequences in the hippocampus. J Neurosci. 1999;19: 9497–9507. pmid:10531452
- 83. Dragoi G, Tonegawa S. Preplay of future place cell sequences by hippocampal cellular assemblies. Nature. 2011;469: 397–401. pmid:21179088
- 84. Kaplan R, Adhikari MH, Hindriks R, Mantini D, Murayama Y, Logothetis NK, Deco G. Hippocampal sharp-wave ripples influence selective activation of the default mode network. Curr Bio. 2016 Mar 7;26(5): 686–91. pmid:26898464
- 85. Hopfield JJ. Neurodynamics of mental exploration. Proc Natl Acad Sci USA 2010;107:1648–53. pmid:20080534
- 86. Lee D, Seo H. Neural Basis of Strategic Decision Making. Trends Neurosci. 2016;39: 40–48. pmid:26688301
- 87. Battaglia PW, Hamrick JB, Tenenbaum JB. Simulation as an engine of physical scene understanding. Proc Natl Acad Sci USA. 2013;110: 18327–18332. pmid:24145417
- 88. Friston K, Frith C. A Duet for one. Conscious Cogn 2015;36:390–405. pmid:25563935
- 89. Weiskopf N, Hutton C, Josephs O, & Deichmann R. Optimal EPI parameters for reduction of susceptibility-induced BOLD sensitivity losses: A whole-brain analysis at 3 T and 1.5 T. Neuroimage. 2006;33: 493–504. pmid:16959495
- 90. Brett M, Anton JL, Valabregue R, Poline JB. Region of interest analysis using an SPM toolbox. 2002; Sendai, Organization for Human Brain Mapping.
- 91. Amunts K, Kedo O, Kindler M, Pieperhoff P, Mohlberg H, Shah NJ, Habel U, Schneider F, Zilles K. Cytoarchitectonic mapping of the human amygdala, hippocampal region and entorhinal cortex: intersubject variability and probability maps. Anat Embryol (Berl) 2005; 210:342–52.
- 92. Eickhoff SB, Stephan KE, Mohlberg H, Grefkes C, Fink GR, Amunts K Zilles K. A new SPM toolbox for combining probabilistic cytoarchitectonic maps and functional imaging data. Neuroimage 2005. 25:1325–35. pmid:15850749
- 93. Glascher J, Visualization of group inference data in functional neuroimaging. Neuroinformatics. 2009; 7:73–82. pmid:19140033