The Yin and the Yang of Prediction: An fMRI Study of Semantic Predictive Processing

Probabilistic prediction plays a crucial role in language comprehension. When predictions are fulfilled, the resulting facilitation allows for fast, efficient processing of ambiguous, rapidly-unfolding input; when predictions are not fulfilled, the resulting error signal allows us to adapt to broader statistical changes in this input. We used functional Magnetic Resonance Imaging to examine the neuroanatomical networks engaged in semantic predictive processing and adaptation. We used a relatedness proportion semantic priming paradigm, in which we manipulated the probability of predictions while holding local semantic context constant. Under conditions of higher (versus lower) predictive validity, we replicate previous observations of reduced activity to semantically predictable words in the left anterior superior/middle temporal cortex, reflecting facilitated processing of targets that are consistent with prior semantic predictions. In addition, under conditions of higher (versus lower) predictive validity we observed significant differences in the effects of semantic relatedness within the left inferior frontal gyrus and the posterior portion of the left superior/middle temporal gyrus. We suggest that together these two regions mediated the suppression of unfulfilled semantic predictions and lexico-semantic processing of unrelated targets that were inconsistent with these predictions. Moreover, under conditions of higher (versus lower) predictive validity, a functional connectivity analysis showed that the left inferior frontal and left posterior superior/middle temporal gyrus were more tightly interconnected with one another, as well as with the left anterior cingulate cortex. 
The left anterior cingulate cortex was, in turn, more tightly connected to superior lateral frontal cortices and subcortical regions—a network that mediates rapid learning and adaptation and that may have played a role in switching to a more predictive mode of processing in response to the statistical structure of the wider environmental context. Together, these findings highlight close links between the networks mediating semantic prediction, executive function and learning, giving new insights into how our brains are able to flexibly adapt to our environment.


Introduction
studies showing that participants modulate the strength of their semantic predictions, based on the predictive validity of the broader environmental context. Thus, in a block that contains a relatively higher proportion of associated (versus unrelated) word pairs, participants will use the prime to predict upcoming semantic features with greater strength than in a block that contains a relatively lower proportion of associated (versus unrelated) word pairs. The reason for this is that, so long as probabilistic predictions are based on the statistical probabilistic knowledge of the broader contextual environment, and so long as they have some expected utility, prediction should maximize the chances of optimal task performance (see [4,50] for discussion).
Obviously, the semantic priming relatedness proportion paradigm is less naturalistic than sentence or discourse processing paradigms that vary the predictability of the local context. However, it has the advantage of allowing the local semantic context to be held constant while varying the probability of a particular semantic prediction being confirmed or disconfirmed by an incoming target word. This means that neural processing of exactly the same set of target words, preceded by exactly the same set of contexts (prime words), can be contrasted across blocks of high versus low predictive validity. In this way, Relatedness (semantically associated versus unrelated word pairs) can be fully crossed with Predictive Validity (higher proportion of semantically associated word pairs versus lower proportion of semantically associated word pairs) in a 2 x 2 design (see Methods for further details). Thus, the neuroanatomical regions engaged in semantic predictive processing can be isolated while avoiding the types of confounds described above.
In previous work, we have used this paradigm in conjunction with ERP and MEG techniques. In an initial ERP study [41], we established that, just as in sentence processing paradigms, the N400 to target words was selectively attenuated when semantic predictions, based on the prime, were fulfilled: semantically associated target words in high predictive validity blocks showed more semantic facilitation than the same set of associated targets in low predictive validity blocks ([41]; see also [48,49]). In a second ERP/MEG study (supplemented by a preliminary fMRI analysis), we used source localization methods to show that the differential activity to semantically associated versus unrelated targets within the N400 time window, under conditions of higher but not lower predictive validity, localized to the left anterior temporal cortex [51]. This finding was consistent with a previous PET study that used a similar relatedness proportion priming paradigm and reported an effect of relatedness proportion in this anterior temporal region [52], although, because PET does not allow for an event-related design, the authors were unable to probe differential neural activity associated with the priming itself (the modulation of activity to associated versus unrelated word pairs).
Our previous ERP/MEG work using this paradigm suggested that the anterior temporal cortex was modulated by confirmed semantic predictions [51]. However, it left open the question of whether the other regions described above (the left IFG and post-S/MTG) also contribute to semantic predictive processing. In our ERP study [41], we showed some evidence of an ERP effect that was more prolonged and that had a more anterior distribution than the N400 effect, which was selectively evoked by unrelated target words in the high versus low predictive validity blocks. This effect may have reflected the suppression of medium probability semantic predictions that were disconfirmed by the unrelated targets (e.g. [9,53]). As noted above, semantic suppression has also been linked to activity within the left IFG. Although our MEG study showed no hint of modulation within the left IFG within the N400 time window, we were unable to examine neural activity past the N400 time window because there was too much ocular artifact in the later part of the evoked response.
In the present study, we used the same relatedness proportion semantic paradigm in conjunction with fMRI, a technique that has better spatial resolution than either ERP or MEG.
Our first aim was to determine which neuroanatomical regions were specifically modulated by semantic predictive processing. On the basis of our previous MEG/ERP findings, we expected to see increased modulation within the left anterior superior/middle temporal cortex (left ant-S/MTG) in the higher versus the lower predictive validity blocks, reflecting enhanced predictive semantic facilitation. We hypothesized that, under conditions of higher predictive validity, we would also see more activity to unrelated versus associated targets within the left IFG, recruited to suppress unfulfilled semantic predictions, and within the left post-S/MTG, which is often co-activated with the left IFG (e.g. [13,39,54]), and which may reflect increased lexico-semantic activity to unpredicted target words.
The second goal of this study was to begin to explore the neuroanatomical relationships between semantic prediction and adaptation. As discussed above, manipulating the proportion of associated word pairs within a block leads participants to modulate the strength (or certainty) of their semantic predictions. In fact, the relationship between adaptation and prediction is reciprocal: there is a large body of evidence from models of animal learning [55,56], connectionist models [57,58], and probabilistic Bayesian models [59,60] suggesting that prediction itself may be a computational mechanism that drives adaptation. Within at least some of these frameworks, the magnitude of the prediction error modulates the rate of adaptation (e.g. [61,62]).
In the brain, the main region implicated in mediating between prediction error and adaptation is the anterior cingulate cortex (ACC). While the precise role of the ACC is unclear, one proposal is that it continually monitors changes in statistical contingencies between stimuli [63] and stimulus-response mappings [64], using this information to weight the degree to which current prediction error influences adaptation to these statistical contingencies [65,66]. Learning itself may be mediated by a lateral superior/middle prefrontal-subcortical network to which the anterior cingulate cortex is closely connected, including the middle and superior lateral prefrontal cortices [63,67], and the thalamus and basal ganglia [68][69][70][71][72][73].
To explore the relationships between the regions mediating prediction and adaptation in the present study, we used a functional connectivity analysis [74,75]. If the left IFG and post-S/MTG are indeed recruited in response to unfulfilled predictions in the high predictive validity block, and if the anterior cingulate cortex mediates between prediction error and adaptation, then we should see more functional connectivity across these three regions under conditions of higher versus lower predictive validity.

Methods

Participants
Participants were 26 students recruited from universities in the Boston area, with an average age of 23.54 (age range: 20 to 30; 11 female). Three additional subjects were run in the study but were subsequently excluded from analysis: two because of artifacts in the data and one because of technical issues during scanning. All participants were native speakers of American English (who did not grow up speaking another language), without any prior history of neuropsychiatric disorders, and all were right-handed, as assessed using the Edinburgh Handedness Inventory [76]. This study was carried out with the explicit review and approval of the Partners Human Research Committee and Massachusetts General Hospital IRB (study protocol #2010P001683). Participants gave written informed consent and were compensated for taking part in the study in accordance with the approved IRB protocol.

Stimuli and Overall Experimental Design
The experimental design and stimuli have been previously described in detail [41,51]. Briefly, the fMRI design comprised two experimental factors, Relatedness (semantically associated versus semantically unrelated) and Predictive Validity (higher predictive validity versus lower predictive validity), which were fully crossed in a 2 x 2 design. The Relatedness manipulation was achieved by comparing associated word pairs (all with a forward association strength of .5 or higher on the University of South Florida Association Norms [77]) and unrelated word pairs (created by randomly redistributing the primes across the target items, and checking by hand to confirm that this did not accidentally result in any related pairs). The Predictive Validity manipulation was achieved by adding different numbers of associated or unrelated filler word pairs to two blocks, such that the overall proportion of associated and unrelated word pairs differed across these two blocks. In the higher predictive validity block, 50% of word pairs (200/400) were associated, while in the lower predictive validity block, only 10% of the word pairs (40/400) were associated. Importantly, each participant saw a core set of 40 controlled and counterbalanced items in each of the four conditions (associated and unrelated pairs in each of the two predictive validity blocks), and no participant saw the same word twice. The subsequent analysis was done on this core set of 40 items per condition. Forward association strength between prime and target, and log frequency for both prime and target, did not significantly differ between test items in each block. Eighty of the unrelated filler word pairs included an animal word, either in the prime or target position. These were necessary for participants to carry out a semantic monitoring task, as discussed below.
In the higher predictive validity block, where 50% of the word pairs were associated, the cumulative probability of encountering a given set of semantic features (e.g. a set of semantic features corresponding to the word, chair) following a prime word like table can be roughly estimated as the association strength of table (0.75 according to the Florida Association Norms [77]) multiplied by the broader probability of encountering an associated word in that block (0.5), i.e. 0.375. In the lower predictive validity block, however, where only 10% of the word pairs were associated, the cumulative probability of encountering the same set of semantic features following the same prime is only 0.075. Thus, participants should use the prime to predict upcoming semantic features with greater certainty in the higher than in the lower relatedness proportion block.
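The rough estimate described above can be written out as a short calculation. This is only an illustrative sketch of the arithmetic in the text; the function and variable names are our own and are not part of the original analysis.

```python
def cumulative_probability(association_strength, proportion_associated):
    """Rough estimate used in the text: forward association strength of the
    prime-target pair multiplied by the proportion of associated pairs in the block."""
    return association_strength * proportion_associated

# Association strength for table -> chair, as quoted in the text.
TABLE_CHAIR_STRENGTH = 0.75

# Block-level proportions of associated word pairs (200/400 and 40/400).
higher = cumulative_probability(TABLE_CHAIR_STRENGTH, 200 / 400)
lower = cumulative_probability(TABLE_CHAIR_STRENGTH, 40 / 400)

print(round(higher, 3), round(lower, 3))  # 0.375 0.075
```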

Stimuli Presentation and Task
Stimuli were projected onto a screen in white 20-point uppercase Arial font. Each trial began with a fixation cross, presented at the center of the screen for 200ms, followed by a 200ms blank screen. The prime word was then presented for 500ms, followed by a 100ms blank screen, and then the target word was presented for 900ms, followed by a 100ms blank screen. Thus, the stimulus onset asynchrony between prime and target was 600ms to encourage controlled rather than automatic semantic priming [47]. Participants were instructed to press a button on a handheld response box with their left thumb as quickly as possible when they saw a name of an animal. As noted above, the animal words appeared on 80/400 of the filler trials (there were no animal words in the experimental trials). This task ensured that participants processed the words semantically while at the same time not drawing their explicit attention to semantic relationships between primes and targets.
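As a check on the timing described above, the stimulus onset asynchrony can be derived from the listed event durations. The sketch below is illustrative only; the event labels are hypothetical names chosen for this example.

```python
from itertools import accumulate

# Event durations in milliseconds, in presentation order, as listed in the text.
TRIAL_EVENTS_MS = [
    ("fixation", 200),
    ("blank_after_fixation", 200),
    ("prime", 500),
    ("blank_after_prime", 100),
    ("target", 900),
    ("blank_after_target", 100),
]

def event_onsets(events):
    """Return a dict mapping each event name to its onset (ms from trial start)."""
    durations = [duration for _, duration in events]
    # The onset of each event is the sum of all earlier durations.
    starts = [0] + list(accumulate(durations))[:-1]
    return {name: start for (name, _), start in zip(events, starts)}

onsets = event_onsets(TRIAL_EVENTS_MS)
soa_ms = onsets["target"] - onsets["prime"]
print(soa_ms)  # 600
```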
Following previous studies, the lower predictive validity block was always presented first, followed by the higher predictive validity block. This ensured that, during the low proportion block, participants had not already adapted to using the prime as a strong predictor of the target. Although this introduced a potential confound of block order, most of the low-level variables that would normally be associated with trial order, such as lower attention and lower motivation, would predict reduced rather than increased effect sizes in the higher versus lower predictive validity block, going against our hypotheses.
Each block comprised 400 trials, which were divided into four runs of 100 trials each. The order of stimuli within each run was randomized using the OptSeq algorithm to improve deconvolution of the hemodynamic response [78]. For this purpose, fixation trials of different lengths (varying between 2 and 10 seconds) were added.
The fMRI experiment was carried out as part of a larger study in which participants were also recruited to participate in a separate MEG session in which we used the same relatedness proportion design reported in [51]. The order of the two sessions was counterbalanced, and two completely distinct stimuli sets were created such that no primes or targets were repeated across sessions for any participants.

Structural and functional MRI data acquisition
Structural and functional magnetic resonance images were acquired on a 3T Siemens Trio scanner with a 32-channel head coil. fMRI data were acquired over eight runs (4 runs per predictive validity block), each lasting for approximately 5 minutes. In each run, 130 functional volumes (36 axial slices (AC-PC aligned), 3mm slice thickness, 0.3mm skip, 200mm field of view, in-plane resolution of 3.125mm) were acquired with a gradient-echo sequence (repetition time = 2s, echo time = 25ms, flip angle = 90deg, interleaved acquisition). In addition, at the beginning and end of the scanning session, we acquired two T1-weighted high-resolution structural images (1mm isotropic multi-echo MPRAGE: TR = 2.53s, flip angle = 7deg, four echoes with TE = 1.64ms, 3.5ms, 5.36ms, 7.22ms). We used the higher quality structural scan (based on visual inspection) for the subsequent analysis.

Data analysis
Pre-processing as well as the first and second level analyses of the fMRI data were conducted in Statistical Parametric Mapping 8 (SPM8, www.fil.ion.ucl.ac.uk/spm), supplemented by additional add-on toolboxes [79,80].

Preprocessing
The first four images in each run were discarded to ensure that transient non-saturation effects did not affect the analysis. The next step was to detect spikes and interpolate these bad slices from surrounding images (using the ArtRepair toolbox). On average 0.3% of slices (range 0 to 1.4%) were removed and interpolated. Then, images were slice-time corrected and the volumes were realigned to the first images of each run and then to each other. The functional images were co-registered to the structural image by co-registering the mean functional image to the structural MPRAGE. The anatomical images were segmented into grey and white matter, and the spatial normalization parameters acquired during this step were used to normalize the functional images. Finally, the images were smoothed with a 10mm FWHM Gaussian kernel.
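For readers less familiar with smoothing kernels, the 10mm FWHM quoted above can be related to the standard deviation of the Gaussian through the standard identity FWHM = 2 * sqrt(2 * ln 2) * sigma. The sketch below illustrates only this conversion; it is not the smoothing code used in SPM8, and the function names are our own.

```python
import math

def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def gaussian_weight(distance_mm, sigma_mm):
    """Unnormalized Gaussian weight at a given distance from the kernel centre."""
    return math.exp(-(distance_mm ** 2) / (2.0 * sigma_mm ** 2))

sigma_mm = fwhm_to_sigma(10.0)
print(round(sigma_mm, 2))  # about 4.25 mm

# By definition, the weight falls to half its peak at a distance of FWHM / 2.
half_height = gaussian_weight(5.0, sigma_mm)
```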

Standard functional activation analyses
First level statistical analysis: Individual participants. We modeled the data using a design matrix in which each of the two blocks (lower predictive validity and higher predictive validity) had four runs (following the experimental design described above). Each run had the following regressors: two for each level of Relatedness (associated and unrelated), one for the unrelated filler trials (the four higher predictive validity runs had an additional regressor for the associated fillers), and two for the probe animal-word filler trials (one for probe pairs in which the prime was an animal word, and one for trials in which the target was an animal word).
The trials were modeled from the start of the prime word until the end of the target word, i.e. the total time for each trial (1.8 s) was taken as its duration. All regressors were convolved with a canonical hemodynamic response function. Temporal derivatives [81] were included for all conditions. The realignment parameters for movement correction were also included in the model. In addition, we used additional regressors to covary for excessive movement at time points where the image intensity was greater than 3 × SD or composite motion was >0.5 mm. These covariates were created using the ART toolbox [80]. On average, the additional regressors were added for less than 5% of the time points.
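The logic of the outlier covariates can be sketched as follows. This is a simplified illustration and not the ART toolbox implementation: we assume that the intensity criterion is applied to the absolute deviation of each volume's global mean signal from the run mean, that composite motion is supplied as one value per volume, and that each flagged volume receives its own indicator regressor. All function names are hypothetical.

```python
from statistics import mean, pstdev

def flag_outlier_volumes(global_signal, composite_motion_mm,
                         sd_threshold=3.0, motion_threshold_mm=0.5):
    """Return indices of volumes exceeding either threshold (assumed criteria)."""
    mu = mean(global_signal)
    sd = pstdev(global_signal)
    flagged = []
    for index, (signal, motion) in enumerate(zip(global_signal, composite_motion_mm)):
        z = abs(signal - mu) / sd if sd > 0 else 0.0
        if z > sd_threshold or motion > motion_threshold_mm:
            flagged.append(index)
    return flagged

def outlier_regressors(n_volumes, flagged):
    """One indicator column per flagged volume (1 at that volume, 0 elsewhere)."""
    return [[1 if t == index else 0 for t in range(n_volumes)] for index in flagged]

# Toy data: 20 volumes with one intensity spike and one large movement.
global_signal = [100.0] * 20
global_signal[7] = 130.0
composite_motion_mm = [0.1] * 20
composite_motion_mm[12] = 0.8

flagged = flag_outlier_volumes(global_signal, composite_motion_mm)
regressors = outlier_regressors(len(global_signal), flagged)
print(flagged)  # [7, 12]
```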
We defined four contrasts to take to the second level for random effects group analysis:

[Lower Predictive Validity and Semantically Associated regressors (contrast value 1) versus Lower Predictive Validity and Semantically Unrelated regressors (contrast value -1)]
Second level statistical group analysis. We created two repeated measures ANOVA models for the effects of Relatedness and the effect of Predictive Validity. The first model was created to look at the effect of all word pairs relative to the implicit baseline as well as the main effect of Predictive Validity and consisted of the within-subject effect (26 subject regressors) as well as one regressor for the effect of Higher predictive validity (versus the implicit baseline; contrast (a)) and another for the effect of lower predictive validity (versus the implicit baseline; contrast (b)).
The second model was created to investigate the main effect of Relatedness and interaction between Relatedness and Predictive Validity. This consisted of the within-subject effect (26 subject regressors) and one regressor each for the effect of Relatedness in the Higher predictive validity (contrast (c)) and the lower predictive validity blocks (contrast (d)). Within these models statistical parametric maps (SPMs) were created for the t-statistics of the effects of interest, namely the main effects of Predictive Validity, Relatedness and the interaction between the two as well as word pairs compared to implicit baseline.
To home in on effects specifically related to semantic and lexico-semantic processing, we defined three a priori regions of interest to be used for small volume correction [82]: (a) the left anterior superior/middle temporal gyrus (left ant-S/MTG), which, as noted in the introduction, may act as a hub, mapping distributed conceptual-semantic features on to amodal semantic representations, and which has been shown in recent semantic priming MEG studies to be sensitive to highly automatic [24] and predictive [51] semantic facilitation in the N400 time window; (b) the left inferior frontal gyrus (left IFG), which has been associated with top-down semantic suppression [33,34,40]; and (c) the left posterior superior/middle temporal gyrus (left post-S/MTG), which has been associated with lexico-semantic processing [21]. All three regions were defined functionally as spheres of 5mm radius around a peak MNI coordinate, revealed using the search term 'semantic' on Neurosynth software for automatic meta-analysis.
We report whole-brain effects at a voxel-level threshold of p<0.005, and either (a) a cluster-level familywise error-corrected (FWE) threshold of p<0.05 or (b) a small volume correction FWE-corrected at the peak of a priori regions of interest described above (the three regions of interest were combined into one image for small volume correction to account for multiple comparisons across these three regions). All reported coordinates are in MNI space.
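To illustrate what a spherical region of interest of 5mm radius amounts to on a voxel grid, the sketch below enumerates grid points within 5mm of a centre coordinate. The centre coordinate and the 2mm isotropic grid are hypothetical values chosen purely for illustration; the actual peak coordinates and voxel size are not assumed here, and this is not the procedure used in SPM8.

```python
def sphere_voxels(center_mm, radius_mm=5.0, voxel_size_mm=2.0):
    """Return grid coordinates (mm) lying within radius_mm of center_mm.

    The grid is assumed to be isotropic and aligned so that the centre
    coordinate itself falls on a grid point.
    """
    cx, cy, cz = center_mm
    steps = int(radius_mm // voxel_size_mm)
    coords = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                dx, dy, dz = i * voxel_size_mm, j * voxel_size_mm, k * voxel_size_mm
                if dx * dx + dy * dy + dz * dz <= radius_mm ** 2:
                    coords.append((cx + dx, cy + dy, cz + dz))
    return coords

# Hypothetical centre coordinate, for illustration only.
example_center = (-54.0, -40.0, 4.0)
roi = sphere_voxels(example_center)
print(len(roi))
```

Combining several such spheres into a single mask, as described above for the small volume correction, would correspond to taking the union of their coordinate sets.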

Functional task-related connectivity analysis
In addition to activation analyses, we also carried out a hypothesis-driven connectivity analysis using the generalized context-dependent psychophysiological interactions (gPPI) toolbox [84] to determine whether there was a difference between the higher and lower predictive validity blocks in the patterns of connectivity from two seed regions: the left inferior frontal cortex and the left anterior cingulate cortex.
First level statistical analysis: Individual participants. Our seed regions were the left inferior frontal cortex (IFG) and the left anterior cingulate cortex (ACC). For each of these regions, we specified a seed cluster based on relevant contrasts in the activation analyses (see Results). We entered the time series from each seed into the model as explanatory variables in each of our four conditions. Although we were only interested in the contrast between higher predictive validity versus lower predictive validity, we modelled all four conditions in order to mirror the structure of the activation analysis model described above. This gave rise to eight regressors: four interaction regressors describing connectivity from the left IFG seed in each of our four conditions (psychophysiological interactions), and four regressors corresponding to the connectivity from the left ACC seed in each of our four conditions (psychophysiological interactions). The design matrix also included regressors corresponding to activity in each of our four experimental conditions, regressors for activity in the left ACC and left IFG seeds and their interaction, and regressors for the three-way psychophysiological interactions between the left IFG, left ACC and activity in each condition across the rest of the brain.
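The core idea of the interaction regressors can be sketched in a few lines: for each condition, the seed time series is multiplied by that condition's task regressor. This is a deliberately simplified illustration and not the gPPI toolbox implementation, which forms the interaction at the estimated neural level before reconvolving with the hemodynamic response. The condition labels and toy values below are hypothetical.

```python
def mean_center(series):
    """Subtract the mean so the interaction is not dominated by the seed's mean level."""
    average = sum(series) / len(series)
    return [value - average for value in series]

def ppi_terms(seed_timeseries, condition_regressors):
    """Return one interaction regressor per condition: seed x task regressor."""
    seed = mean_center(seed_timeseries)
    return {
        name: [s * p for s, p in zip(seed, regressor)]
        for name, regressor in condition_regressors.items()
    }

# Toy example: 8 time points, four conditions (hypothetical labels and values).
seed_timeseries = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
condition_regressors = {
    "higher_associated": [1, 0, 0, 0, 1, 0, 0, 0],
    "higher_unrelated":  [0, 1, 0, 0, 0, 1, 0, 0],
    "lower_associated":  [0, 0, 1, 0, 0, 0, 1, 0],
    "lower_unrelated":   [0, 0, 0, 1, 0, 0, 0, 1],
}

interaction = ppi_terms(seed_timeseries, condition_regressors)
print(interaction["higher_associated"])
```

With two seeds, applying the same step to each seed's time series gives the eight interaction regressors described above.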
We then defined four contrasts to take to the second level for a random effects group analysis, each modeling a psychophysiological interaction against the implicit baseline and each collapsing across the two levels of Relatedness:
Second level statistical analysis: Functional connectivity group analysis. To determine whether connectivity from each of these regions (the left IFG and left ACC) differed between the higher and lower predictive validity blocks (a main effect of Predictive Validity), we used two design matrices, one for each seed. Each model consisted of a within-subject effect (thus 26 subject regressors) and two regressors that collapsed across Relatedness: one for connectivity (psychophysiological interaction) at higher predictive validity (contrasts (a) or (c)) and one for connectivity at lower predictive validity (contrasts (b) or (d)). Within these models statistical parametric maps (SPMs) were created for the t-statistics of the effects of interest, the main effects of Predictive Validity.
We report effects at a voxel-level threshold of p<0.005 and either (a) a cluster-level FWE-corrected threshold of p<0.05, or (b) a small volume correction FWE-corrected threshold that allowed us to home in on connectivity from our seed regions to our two temporal regions of interest: the left ant-S/MTG and the left post-S/MTG. All reported coordinates are in MNI space.

Behavioral data
The participants detected, on average, 83% of animal words in the higher predictive validity block and 84% of animal words in the lower predictive validity block (overall range: 69% to 96%). This small difference was not statistically significant, p>0.05. These data show that participants were on task and attending to the semantic features of each word.

Standard functional activation analyses
The directed contrast comparing all word pairs and the implicit baseline revealed increased activity to the word pairs across a bilateral but left lateralized network distributed across the frontal cortices (inferior frontal cortices, pre-central cortices), occipital cortices, temporal cortices (the temporal fusiform cortices and, on the left, the middle temporal cortex) and subcortical regions (left and right caudate extending through the putamen and pallidum into the thalamus). The reverse contrast showed more activity to the implicit baseline than word pairs within the occipital lobe, extending into the precuneus, see Fig 1 and Table 1. There were no clusters that showed main effects of Predictive Validity.
Several clusters showed main effects of Relatedness (collapsed across higher and lower predictive validity blocks): the directed contrast between unrelated versus associated showed that there was significantly less activity to associated than unrelated word pairs (hemodynamic response suppression) within the left and right temporal fusiform gyri, extending into occipital areas, and in the left anterior/middle cingulate, extending into supplementary motor area (SMA), see Fig 2 and Table 2. No clusters, however, showed more activity to associated than unrelated word pairs (hemodynamic response enhancement).
In the whole brain analysis (voxel-level, p<0.005), there was an interaction between Relatedness and Predictive Validity (for the directed contrast: [(Higher Predictive Validity and Semantically Unrelated - Higher Predictive Validity and Semantically Associated) - (Lower Predictive Validity and Semantically Unrelated - Lower Predictive Validity and Semantically Associated)]) in two of the three regions hypothesized, and both these effects reached significance on small-volume correction: the left IFG (dorsal portion: Z = 2.71, p FWE < .05) and the left post-S/MTG (Z = 2.93, p FWE = .01). The IFG effect appeared to be bilateral (right Z = 3.42) although, as we had no a priori reason to look at the right IFG, this did not survive small volume FWE correction.
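The directed interaction contrast above is a difference of differences across the four cells of the 2 x 2 design. The sketch below spells out the corresponding weights and applies them to a set of made-up condition estimates; the numerical values are hypothetical and serve only to show the arithmetic, not to reproduce any reported result.

```python
# Weights for (Higher Unrelated - Higher Associated) - (Lower Unrelated - Lower Associated).
INTERACTION_WEIGHTS = {
    "higher_unrelated": 1,
    "higher_associated": -1,
    "lower_unrelated": -1,
    "lower_associated": 1,
}

def contrast_value(estimates, weights):
    """Weighted sum of condition estimates for a given contrast."""
    return sum(weights[name] * estimates[name] for name in weights)

# Hypothetical condition estimates, for illustration only.
example_estimates = {
    "higher_unrelated": 1.0,
    "higher_associated": 0.4,
    "lower_unrelated": 0.5,
    "lower_associated": 0.6,
}

interaction_value = contrast_value(example_estimates, INTERACTION_WEIGHTS)
print(round(interaction_value, 3))  # 0.7
```

A positive value indicates a larger unrelated-minus-associated difference in the higher than in the lower predictive validity block.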
Follow-up comparisons within the left IFG cluster showed a near-significant priming effect in the higher predictive validity blocks (Z = 2.67, p FWE = 0.054). In the lower predictive validity blocks, there was a trend towards the opposite effect in this region, with more activity to the Associated than the Unrelated word pairs, i.e. hemodynamic response enhancement (Z = 2.56, p FWE = 0.071).
Although in the left ant-S/MTG the interaction between Relatedness and Predictive Validity was not significant, we carried out planned comparisons within this region at each level of Relatedness, given our previous findings using MEG and preliminary fMRI findings using an FIR model [78,86,87]. Consistent with our previous findings [51], we saw an effect of Relatedness in the higher predictive validity block (Z = 3.1, p FWE <0.01), but not in the lower predictive validity block (Z<1).

fMRI functional task-related connectivity analysis
We first looked at the connectivity patterns from a seed in the left IFG, which was defined based on the functional activation in the left IFG for the interaction between Predictive Validity and Relatedness. We compared connectivity from this seed between the higher and lower predictive validity blocks. This revealed significantly more connectivity under conditions of higher (versus lower) predictive validity to: (a) bilateral anterior cingulate cortex and paracentral gyrus (whole-brain voxel-level, p<0.005, cluster-level FWE-corrected, p<0. [...] Table 3 for full set of coordinates within these clusters. The reverse contrast (lower predictive validity > higher predictive validity) did not reveal any significant effects.
Figure caption: Yellow-red: more activity to word pairs than implicit baseline. Blue: less activity to word pairs than implicit baseline. Effects are shown at a voxel-level significance threshold of p<0.005, and include clusters consisting of 10 or more contiguous voxels. See Table 1 for the full list of peaks. Gray masks cover subcortical regions in which activity is displaced in the surface visualisations.
Table note: All regions shown reached a cluster-level significance threshold (after family-wise error correction) of p<0.05. Anatomical locations, MNI coordinates, and approximate Brodmann areas (BA) correspond to the p-values and z-scores of representative peaks within each cluster. Both the AAL atlas and the SPM anatomy toolbox [85] were used to define the anatomical regions reported. Only one peak per anatomical region is reported for each hemisphere. The cluster-level p-values indicate the cluster-level significance after family-wise error correction, and k indicates the number of contiguous voxels within each cluster. doi:10.1371/journal.pone.0148637.t001
We next looked at the connectivity patterns from a seed in the left ACC, which was defined based on the functional activation in this region for the main effect of Relatedness.
This revealed significantly more connectivity under conditions of higher (versus lower) predictive validity (whole-brain voxel-level, p<0.005, cluster-level FWE-corrected, p<0.05) to (a) the right superior frontal cortex extending into the middle frontal gyrus as well as the left superior frontal gyrus, and (b) a cluster that extended bilaterally from the thalami into the posterior part of the caudate, pallidum and putamen, see Fig 4 and Table 3. Once again, the reverse contrast did not reveal any effects.

Discussion
In this study we used fMRI with a semantic priming relatedness proportion paradigm to characterize the neuroanatomical regions engaged in semantic prediction and adaptation. This paradigm allowed us to determine how hemodynamic activity was modulated to identical associated and unrelated prime-target pairs under conditions of both higher and lower predictive validity. Across both predictive validity blocks, we observed reduced activity in response to associated compared to unrelated word pairs (hemodynamic response suppression) within bilateral temporal fusiform cortices and the left anterior/middle cingulate and SMA, consistent with some previous fMRI studies of semantic priming [13,88,89,90]. Under conditions of higher predictive validity we saw significant hemodynamic response suppression in three additional regions: the left ant-S/MTG, the left IFG and the left post-S/MTG. In the latter two regions, the effect differed qualitatively from the effect seen in the lower predictive validity block, driving a significant interaction between relatedness and predictive validity. A functional connectivity analysis showed that, under conditions of higher versus lower predictive validity, the latter two regions were more tightly interconnected with one another, as well as with the ACC. Also under conditions of higher versus lower predictive validity, the left ACC was more tightly functionally connected with a lateral prefrontal-subcortical network.
Figure caption: Yellow-red: more activity to unrelated than associated word pairs. Effects are shown at a voxel-level significance threshold of p<0.005, and include clusters consisting of 10 or more contiguous voxels. See Table 2 for the list of peaks within these clusters. Gray masks cover subcortical regions in which activity is displaced in the surface visualisations.

Distinct neurocognitive mechanisms engaged to fulfilled and unfulfilled semantic predictions?
We interpret the hemodynamic response suppression effect within the left anterior S/MTG under conditions of higher predictive validity as reflecting facilitated semantic processing of semantically associated targets that confirmed prior semantic predictions. This interpretation is based on our recent MEG/ERP study using the same paradigm in an overlapping set of participants [51], which showed that this region was modulated between 350 and 450 ms, the time window that corresponds to the N400, an ERP component that is selectively sensitive to semantic facilitation (e.g. [10,41]). We suggest that, under conditions of higher predictive validity, this region acted as a 'hub' that used context in a predictive fashion to facilitate access to semantic representations that were highly distributed across multiple cortical regions [22,23,52,91].
Table 2 note. All regions shown reached a cluster-level significance threshold (after family-wise error correction) of p<0.05. Anatomical locations, MNI coordinates, and approximate Brodmann areas (BA) correspond to the p-values and z-scores of representative peaks within each cluster. Both the AAL atlas and the SPM anatomy toolbox [85] were used to define the anatomical regions reported. Only one peak per anatomical region is reported for each hemisphere. The cluster-level p-values indicate the cluster-level significance after family-wise error correction, and k indicates the number of contiguous voxels within each cluster. doi:10.1371/journal.pone.0148637.t002

Although the hemodynamic response suppression effect within the left anterior S/MTG was significant in the higher but not in the lower predictive validity blocks, the difference in its modulation across the two blocks (the interaction between relatedness and predictive validity) was not significant. This may be because, as in our previous MEG study [51], this region showed a numerical trend towards a relatedness effect in the lower predictive validity blocks, perhaps reflecting weaker semantic facilitation (see also [24]). On this account, any difference in modulation within this region across the two blocks was quantitative rather than qualitative, and signal loss due to susceptibility artifact in this region may have reduced our power to detect this quantitative difference statistically.
Figure legend (fragment). Effects are shown at a voxel-level significance threshold of p<0.005, k>10. Yellow-red: more activity to Unrelated than Associated word pairs in the higher predictive validity blocks or less activity to Unrelated than Associated word pairs in the lower predictive validity blocks. Yellow circles indicate regions that reached a small volume correction FWE-corrected threshold of p<0.05 at the peak of a priori regions of interest. The left anterior superior/middle temporal gyrus, indicated with the blue square, was an a priori region of interest that did not show a significant interaction between Relatedness and Predictive Validity, although it did show a significant effect of Relatedness in the higher predictive validity blocks.
Fig 4 legend (fragment). Effects are shown at a voxel-level significance threshold of p<0.005, and include clusters consisting of 10 or more contiguous voxels. Yellow-red: more functional connectivity from seed regions in higher predictive validity blocks than lower predictive validity blocks. Red circles indicate clusters that reached a cluster-level FWE-corrected threshold of p<0.05. Yellow circles indicate regions that reached a small volume correction FWE-corrected threshold of p<0.05 at the peak of a priori regions of interest. Graphs show the contrast estimates (and standard errors) from representative peaks within regions that reached cluster or small volume corrected significance. See Table 3 for the full list of peaks. doi:10.1371/journal.pone.0148637.g004
We offer a different interpretation of the hemodynamic response suppression effect observed under conditions of higher predictive validity within the left IFG and posterior portion of the left S/MTG. Neither of these regions showed modulation within the N400 time window in our previous MEG study [51], and we suggest that their modulation was primarily driven by increased activity to the semantically unrelated word-pair trials in which the targets disconfirmed prior semantic predictions. More specifically, we suggest that, under conditions of higher predictive validity, the left IFG mediated the top-down suppression of semantic features that were predicted on the basis of prime words but that were unfulfilled by unrelated target words, while the left post-S/MTG reflected increased lexico-semantic processing of these unpredicted unrelated targets. This interpretation of left IFG modulation is based on a large number of fMRI and lesion studies that have implicated this region in suppressing semantic features that act as distractors for performance on a wide variety of tasks, ranging from the disambiguation of word meaning [36,37,38,39,54] to cued semantic association [33,92].
Table 3 note. All regions shown reached a cluster-level significance threshold (after family-wise error correction) of p<0.05. Anatomical locations, MNI coordinates, and approximate Brodmann areas (BA) correspond to the p-values and z-scores of representative peaks within each cluster. Both the AAL atlas and the SPM anatomy toolbox [85] were used to define the anatomical regions reported. Only one peak per anatomical region is reported for each hemisphere. The cluster-level p-values indicate the cluster-level significance after family-wise error correction, and k indicates the number of contiguous voxels within each cluster. † The contrast estimate for this peak is shown in Fig 4. doi:10.1371/journal.pone.0148637.t003
In the present study, we suggest that, by suppressing semantic features that were predicted by primes but unfulfilled by unrelated targets, the increased left IFG activity aided participants' classification of the unrelated targets' semantic features, as required by the task. More generally, this interpretation is in line with the proposed role of the ventrolateral prefrontal cortex in aspects of executive function, particularly the selection of a class of contextually relevant information from sets of potential competing distractors to serve a particular goal ([93,94,95,96]; see [97,98] for more general reviews of prefrontal function, and see [40] for discussion in relation to language processing). Of particular relevance to the current findings, this account is consistent with previous findings reporting that the left IFG is more active in trials in which words disconfirm highly semantically predictive contexts than in trials with non-predictive contexts [13,35].
In at least some of these previous studies (e.g. during the resolution of ambiguity [39,54] and during semantic priming [13]), the left IFG was co-activated with posterior portions of the left temporal cortex, just as in the present study. We suggest that, in our study, the increased activity within the left post-S/MTG reflected increased demands of lexico-semantic processing of target words [21,25,26,27]. Lexico-semantic processing within the left posterior S/MTG can be dissociated from the more purely semantic function of the left anterior temporal cortex discussed above (see also [99]). More specifically, in the present study, we suggest that the increased activity within the left post-S/MTG reflected the increased demands of mapping word-form (phonological or orthographic) representations of unrelated targets onto their corresponding semantic features, which had not been pre-activated. On this account, the top-down suppression of unfulfilled semantic predictions within the left IFG and the bottom-up lexico-semantic processing of unrelated targets within the left post-S/MTG are functionally linked. This interpretation is in keeping with the assumptions of many connectionist architectures (e.g. [100]) as well as neural frameworks that posit links between these two regions (e.g. [101]). In the present study, it is further supported by our functional connectivity analysis, which showed that these two regions were more tightly functionally connected in the higher than the lower predictive validity blocks. This finding is consistent with the well-described structural connections between these two regions through the arcuate fasciculus [102,103,104,105,106], as well as with previous reports that these two regions are tightly interconnected at rest (e.g. [107,108]) and in association with different aspects of language processing (e.g. [109,110,111]).
Notably, the pattern of modulation within both the left IFG and the left post-S/MTG was somewhat different under conditions of lower predictive validity, where there was no hint of hemodynamic response suppression, even at lower thresholds of significance. Indeed, within the left IFG, there was a near-significant reversed priming effect, with more activity to associated than unrelated word pairs: so-called hemodynamic response enhancement (see [112] for a general review of factors that can contribute to this type of reverse hemodynamic priming effect). We tentatively suggest that this reflected a more reactive strategy of matching the semantic features of prime and target (see [14] for evidence that semantic matching is associated with hemodynamic response enhancement) in order to aid task performance (see also [24] for discussion). On this account, whether we engage or disengage in semantic predictive processing is not only a function of the statistical structure of the wider contextual environment, but also of participants' specific tasks and goals (see [1], sections 3.4 and 3.5 for discussion). One might therefore expect to see quite different patterns of modulation in association with tasks in which semantic prediction is not necessarily beneficial to performance (see [45,113] for discussion in relation to behavioral findings).

Adaptation
Our use of the relatedness proportion paradigm also afforded us the opportunity to explore relationships between semantic prediction and adaptation. As noted in the Introduction, prediction and adaptation are reciprocally linked: not only can adaptation to the statistical structure of the broader contextual environment modulate the strength of predictions (the underlying logic of this paradigm), but prediction itself may be the driving force behind adaptation, an idea that is central to theories of classical conditioning [55,56], connectionist learning [57,58] and Bayesian inference and learning [59,60]. The basic idea is that, at any given time, an agent's graded predictions are compared with new inputs, and any differences between these predictions and the state of the system after these new inputs are encountered (prediction error) are used to update the agent's knowledge about the statistical contingencies that best explain these inputs (within a connectionist framework, these are encoded as graded connections, and within a Bayesian framework, they can be described as probabilistic beliefs). By iteratively predicting and updating knowledge on the basis of new observations, the agent's predictions will, over time, become increasingly accurate such that overall prediction error is minimized and the agent's knowledge accurately reflects the statistical structure of her environment.
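The iterative predict-and-update loop described above can be made concrete with a minimal sketch. It assumes a simple delta rule of the kind used in classical conditioning models. The function names, the learning rate and the alternating trial sequence are hypothetical choices made for illustration. This is not a model that was fitted to the data in this study.

```python
def delta_rule_update(estimate, outcome, learning_rate):
    """Move the current estimate toward the outcome by a fraction of the prediction error."""
    prediction_error = outcome - estimate  # difference between the new input and the prediction
    return estimate + learning_rate * prediction_error


def learn_relatedness_proportion(trials, learning_rate=0.1, initial_estimate=0.5):
    """Track the estimated probability that a target is associated with its prime.

    trials: iterable of 1 (associated pair) or 0 (unrelated pair).
    Returns the list of estimates held after each trial.
    """
    estimate = initial_estimate
    history = []
    for outcome in trials:
        estimate = delta_rule_update(estimate, outcome, learning_rate)
        history.append(estimate)
    return history


if __name__ == "__main__":
    # Hypothetical block in which every second prime is followed by an associated target.
    block = [1, 0] * 50
    estimates = learn_relatedness_proportion(block)
    print(f"estimate after {len(block)} trials: {estimates[-1]:.3f}")
```

With a fixed learning rate, the estimate keeps fluctuating around the true proportion rather than settling on it exactly. This is one reason why accounts of adaptation often allow the weight given to prediction error to vary with the reliability of prior knowledge.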
Our functional connectivity data provide evidence that, under conditions of higher versus lower predictive validity, regions associated with semantic prediction error (the response to unfulfilled predictions within the left IFG and post-S/MTG) are more tightly connected to a region that is thought to play a critical role in monitoring changes in the statistical contingencies between stimuli or stimulus-response mappings: the ACC (see [63,64,65,66]). One possibility is that these tighter functional connections reflected a role of the ACC in using its assessment of the reliability of the agent's prior knowledge about these mappings to weight the degree to which current prediction error (the response to unpredicted target words associated with left IFG and post-S/MTG activity) influenced the rate of adaptation [65,66]. Adaptation (learning) itself may have been mediated by a lateral prefrontal-subcortical network, to which the anterior cingulate was also more functionally interconnected under conditions of higher versus lower predictive validity. This included the superior lateral frontal cortices and subcortical regions (thalamus and basal ganglia), which have previously been implicated in language monitoring [68,69], pattern-based sequential learning [70,71,72] and adaptation [73].
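The idea that the reliability of prior knowledge weights the influence of prediction error on adaptation can be illustrated with a minimal Bayesian sketch. It assumes a Beta-Bernoulli learner, in which the belief about the probability of an associated target is a Beta(alpha, beta) distribution. This is an assumption made for illustration only. The models proposed in [65,66] may differ, for example by also tracking how quickly the environment changes. All names and parameter values are hypothetical.

```python
def beta_update(alpha, beta, outcome):
    """Bayesian update of a Beta(alpha, beta) belief after one binary outcome (1 or 0)."""
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return alpha + outcome, beta + (1 - outcome)


def posterior_mean(alpha, beta):
    """Current estimate of the probability of an associated target."""
    return alpha / (alpha + beta)


def effective_learning_rate(alpha, beta):
    """Weight given to the next prediction error.

    For this model the updated mean equals
    old_mean + (outcome - old_mean) / (alpha + beta + 1),
    so stronger prior evidence (larger alpha + beta) gives a smaller weight.
    """
    return 1.0 / (alpha + beta + 1)


if __name__ == "__main__":
    # Hypothetical weak and strong priors, both centred on 0.5.
    for alpha, beta in [(1, 1), (50, 50)]:
        before = posterior_mean(alpha, beta)
        after = posterior_mean(*beta_update(alpha, beta, 1))
        rate = effective_learning_rate(alpha, beta)
        print(f"prior=({alpha},{beta}) rate={rate:.4f} mean {before:.3f} -> {after:.3f}")
```

In this sketch the same prediction error shifts a weakly held belief far more than a strongly held one, which is the sense in which an estimate of reliability can set the rate of adaptation.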

Open questions
Our findings raise a number of open questions. One set of questions concerns the relationships between activity within the neuroanatomical regions discussed here and various ERP components that have been associated with confirmed and disconfirmed semantic predictions. As discussed, based on our previous MEG/ERP findings using the same paradigm [51], we interpret the modulation of activity within the anterior temporal cortex in the high predictive validity block as reflecting activity within the N400 time window to confirmed semantic predictions. It is tempting to link the modulation of the left IFG (together with the left post-S/MTG) to another ERP component: a more prolonged anteriorly-distributed negativity effect, which, in our ERP study using this same paradigm, was selectively enhanced to targets that disconfirmed semantic predictions [41]. Similar to the left IFG, anterior negativities have been linked to the suppression of semantic features that are predicted on the basis of context with medium certainty but that are not fulfilled by target words [9,53,114] (see Methods for explanation of why participants in this study are likely to have predicted upcoming semantic features with medium certainty in the higher predictive validity blocks).
It is important to recognize, however, that several other later (post-N400) ERP components have also been linked to unfulfilled semantic predictions, including a series of late positivity components (see [6,115] for reviews). Late positivities tend to be evoked by inputs that violate high certainty predictions that are generated not only at the level of semantic features, but also at other level(s) of representation (see [5] for discussion). For example, an anteriorly distributed positivity effect is evoked by words that violate or conflict with high certainty predictions about contingencies between semantic features and word-form (strong lexical predictions, e.g. [116]), while a posteriorly distributed or P600 effect is evoked by words that violate or conflict with high certainty predictions about contingencies between semantic features and syntactic properties (strong predictions about likely structure [115]). These late positivity effects may be linked to particularly rapid adaptation to new statistical environments. Thus, one possibility, which could be explored in future work, is that they are associated with further recruitment of the anterior cingulate, which, as discussed above, is thought to monitor changes in statistical contingencies in the environment, and indeed was first characterized as monitoring errors or conflicts between pre-potent predictions and bottom-up evidence [66,117,118].
A second set of open questions concerns the relationship between these findings and predictive processing during sentence and discourse processing. As we have discussed, the advantage of the relatedness proportion semantic priming design is that it was able to isolate predictive processing while holding both the local context and target words constant across conditions. However, it is necessarily more artificial than examining prediction during higher-level language comprehension, and here we explored just two levels of predictive validity. It will therefore be important for future studies to determine whether the same set of regions is modulated by predictive constraint in a more graded fashion during sentence and discourse-level processing.

Conclusions
We have shown clear differences in the modulation of activity within left temporal and inferior frontal cortices to the same associated and unrelated context prime-target pairs under conditions of higher versus lower predictive validity. Based on these results, we have suggested that the anterior superior/middle temporal cortex plays a role in predictive semantic facilitation, while the left inferior frontal cortex and the posterior superior/middle temporal cortex together mediate the suppression of unfulfilled medium-certainty semantic predictions and the lexico-semantic processing of unpredicted inputs, respectively. We have also shown that, under conditions of higher predictive validity, the latter two regions were not only more tightly interconnected with one another, but also with the anterior cingulate cortex, which, in turn, was more tightly connected with a lateral prefrontal-subcortical network. This is consistent with a role of the anterior cingulate in mediating between prediction error and adaptation. This work therefore paves the way towards understanding how our brains use prediction error to adapt to our ever-changing real-world communicative environments.