Similar Brain Activation during False Belief Tasks in a Large Sample of Adults with and without Autism

  • Nicholas Dufour,

    Affiliation Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America

  • Elizabeth Redcay,

    Affiliation Department of Psychology, University of Maryland, College Park, Maryland, United States of America

  • Liane Young,

    Affiliation Department of Psychology, Boston College, Boston, Massachusetts, United States of America

  • Penelope L. Mavros,

    Affiliation Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America

  • Joseph M. Moran,

    Affiliation Psychology Department, Harvard University, Cambridge, Massachusetts, United States of America

  • Christina Triantafyllou,

    Affiliations Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America

  • John D. E. Gabrieli,

    Affiliations Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America

  • Rebecca Saxe

    Affiliations Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America, McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America


Reading about another person’s beliefs engages ‘Theory of Mind’ processes and elicits highly reliable brain activation across individuals and experimental paradigms. Using functional magnetic resonance imaging, we examined activation during a story task designed to elicit Theory of Mind processing in a very large sample of neurotypical (N = 462) individuals, and a group of high-functioning individuals with autism spectrum disorders (N = 31), using both region-of-interest and whole-brain analyses. This large sample allowed us to investigate group differences in brain activation to Theory of Mind tasks with unusually high sensitivity. There were no differences between neurotypical participants and those diagnosed with autism spectrum disorder. These results imply that the social cognitive impairments typical of autism spectrum disorder can occur without measurable changes in the size, location or response magnitude of activity during explicit Theory of Mind tasks administered to adults.


Theory of Mind (‘ToM’) is the capacity to represent mental states, such as thoughts, beliefs, desires, feelings, plans, suspicions and doubts [1]. Consideration of others’ mental states helps people in many everyday activities: teaching, flirting, coordinating and cooperating, playing games, conducting minor and massive deceptions, making moral judgments and appreciating fiction. Individuals with autism spectrum disorders (ASD) have impaired ToM. For example, children with ASD are disproportionately delayed on tasks that tap inferences about other people’s beliefs [2]. The neural mechanism of this impairment remains unknown. However, in neurotypical (NT) adults and children, fMRI studies reveal a remarkably reliable group of brain regions recruited during a ToM task of belief reasoning [3]–[9]. These regions include the right and left temporo-parietal junction (RTPJ and LTPJ), right anterior superior temporal sulcus (RSTS), the medial precuneus (PC), and the medial prefrontal cortex (MPFC).

Previous authors have suggested that ToM impairments in ASD could be caused by impaired function in the brain regions typically involved in ToM [10]–[13]. Attempts to characterize the function of these ToM-relevant brain regions in adults with ASD have yielded conflicting results, however. Some studies suggest that activations in ToM regions show no difference between ASD and NT individuals [14], [15]. Others find reduced activity (i.e. hypo-activity) [13], [16], or the opposite pattern, hyper-activity, in ASD [17], [18], while still others find evidence of all three patterns depending on the specific task demands [16], [19].

One factor contributing to these conflicting results may be that sample sizes are small, and individual variability is large. Small samples of individuals with ASD are problematic because individuals with ASD may be highly heterogeneous in their neural responses (e.g., [20]). Small samples of NT individuals are equally problematic, because they allow for calculation of only the mean of the typical response, not its distribution. Understanding the typical distribution is critical if neural measures are to be useful in a clinical or diagnostic setting. For most clinical applications, it is less important to describe differences between groups of individuals (e.g. studies of this nature have an average of 14 adults with ASD vs. 14 NT adults [13]), and more important to be able to describe the neural activity pattern of each specific individual, relative to typical and atypical distributions. For example, using fMRI to help diagnose an individual with ASD would require comparing each individual to the typical distribution.

In the current study, we therefore aggregated data collected over 8 years from 462 NT participants. This large sample allowed us to investigate individual differences in neural responses to a belief-reasoning ToM task, and measure any difference between NT participants and high-functioning adults with ASD with unusually high sensitivity. We also tested whether the response of ToM regions in NT individuals is related to basic demographic factors that may be relevant for ASD, including gender, age, and IQ.


All studies whose data are used in the current paper were reviewed and approved by the MIT IRB, the Committee on the Use of Human Experimental Subjects. Participants provided written, informed consent, in accordance with the guidelines of the MIT Committee on the Use of Human Experimental Subjects, and were compensated monetarily for their time.

Typical Participants

Data were analyzed from 462 NT participants (mean age = 24.9 years, range: 18–69 years; 223 male). IQ was measured in 61 of these participants using the Kaufman Brief Intelligence Test (KBIT-2, IQ mean = 117.2, range: 82–134, SD = 12.2). NT participants are summarized in Table 1.

ASD Participants

31 participants with a clinical diagnosis of ASD (mean age = 32.5 years, range: 18–66 years; 26 male) were included in this analysis, having volunteered to participate in one of three previous studies [21]–[23]. In addition to a clinical diagnosis of ASD, the Autism Diagnostic Observation Schedule (ADOS) was administered (ADOS communication score mean = 3.2, SD = 1.3; ADOS social score mean = 5.9, SD = 2.1). Each ASD participant had a combined social and communicative score ≥ 7 (the criterion for inclusion in the study). IQ was measured for all but one male participant with ASD (KBIT-2, IQ mean = 116.8, range: 69–141, SD = 15.7). In previous studies in our lab, these participants were found to have significant behavioral deficits in ToM [21], [22] in a moral judgment task.

For direct NT vs. ASD comparison, a set of 27 NT participants was chosen based on pairwise match with 27 ASD participants on IQ (±10 points), age (±5 years), and gender. The pairs were also matched on all experimental parameters (e.g. the coil used, the TR and slice thickness, the modality of the stimuli, the number of stimuli per condition, the presentation duration of the stimuli, and the task the participant performed). The remaining 4 ASD participants were excluded from this comparison because they could not be matched to a specific NT participant. These samples each contained 22 males, and were matched on age (ASD mean age = 31.0 years, range 18–66, SD = 11.5; NT mean age = 30.6 years, range 19–50, SD = 9.3) and IQ (ASD mean IQ = 117.9, range 90–141, SD = 12.7; NT mean IQ = 115.1, range 83–133, SD = 12.2); these 54 participants were termed the ‘matched’ sample.
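The matching criteria can be illustrated with a short sketch. The text does not state which pairing algorithm was used, so the greedy nearest-neighbour strategy below is an assumption, as are the `Participant` fields and function names; only the tolerances (IQ ±10 points, age ±5 years, same gender) come from the description above, and the experimental parameters (coil, modality, task, etc.) are omitted for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Participant:
    pid: str      # hypothetical identifier
    age: float    # years
    iq: float     # KBIT-2 score
    gender: str

def is_match(a, b, iq_tol=10, age_tol=5):
    """True if two participants satisfy the stated matching tolerances."""
    return (a.gender == b.gender
            and abs(a.iq - b.iq) <= iq_tol
            and abs(a.age - b.age) <= age_tol)

def greedy_pairs(asd, nt):
    """Pair each ASD participant with the closest still-unused NT participant.

    Greedy matching is an illustrative assumption, not the authors' stated method.
    Returns (pairs, unmatched_asd).
    """
    available = list(nt)
    pairs, unmatched = [], []
    for a in asd:
        candidates = [b for b in available if is_match(a, b)]
        if not candidates:
            unmatched.append(a)  # excluded from the matched sample
            continue
        best = min(candidates,
                   key=lambda b: (abs(a.iq - b.iq), abs(a.age - b.age)))
        available.remove(best)   # each NT participant is used at most once
        pairs.append((a, best))
    return pairs, unmatched
```

ASD participants for whom no NT participant falls within tolerance end up in `unmatched`, mirroring the exclusion described in the text.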

fMRI Tasks

All participants were presented with verbal stories in English that described a character who acquired a false belief (Belief condition) or a physical representation that became false, such as an outdated photograph or map (Photo condition). For example, one Belief story was: “The morning of high school dance Sarah placed her high heel shoes under her dress and then went shopping. That afternoon, her sister borrowed the shoes and later put them under Sarah’s bed.” One sample Photo story was: “Sargent famously painted the south bank of the river in 1885. In 1910 a huge dam was built, flooding out the whole river basin, killing the old forests. Now the whole area is under water.” (More example stimuli are available at

Across conditions, the stories were matched for length (see Table 2 for more details about the tasks). Each participant read or heard an equal number of stories in the two conditions. Localizers were designed to present between 10 and 16 stories per condition to participants, though due to extenuating circumstances a small number of participants were presented with as few as 5 stories per condition while others saw as many as 24 (mean = 13.2). The stories were presented either visually as text on a screen (to 420 participants), or aurally through headphones (to 73 participants). In separate blocks of the same experiment, 121 participants also saw stimuli from other conditions (e.g. physical descriptions of objects, lists of unconnected words) but those conditions were not included in the current analyses. The duration of the stimulus block corresponded, on average, to 0.47 seconds per word (SD = 0.06 s), followed by 10–12 seconds of rest (these values were constant within each variant of the task, see Table 2). After reading or hearing each story, participants performed one of three tasks: true/false (TF, e.g. “In the painting the south bank of the river is wooded. True/False”, N = 304), fill in the blank (FITB, e.g. “In the painting the south bank of the river is… Wooded/Flooded”, N = 101), or word match-to-sample (MTS, e.g. in the preceding story, did you read “Painted”?, N = 88). Task was held constant within participants, but varied across participants. These tasks correspond to the functional localizers used in previously published studies [24]–[30].

fMRI Methods

Participants were scanned on a 3T Siemens scanner at the Martinos Imaging Center at the McGovern Institute for Brain Research at the Massachusetts Institute of Technology. NT participants were scanned between 2006 and 2013. ASD participants were scanned between 2007 and 2013. Matched NTs were scanned between 2007 and 2013. Functional data were acquired using echo-planar imaging with an in-plane voxel size of 3.125×3.125 mm, TE = 30 ms, flip angle = 90°, and TR = 2 s. Slice thickness varied from 3.1 mm to 4.8 mm (mean 4.0 mm, SD 0.2 mm). Participants were scanned with either a 12-channel (N = 419) or a 32-channel (N = 74) receive coil, both Siemens products. Data were analyzed using SPM8 and in-house software. The data were realigned to account for motion, smoothed with a 5 mm Gaussian smoothing kernel and normalized to a standard template in Montreal Neurological Institute space.

ROI Analyses

Seven functional ROIs from the ToM network were defined in individual participants, using the contrast Belief>Photo, consistent with previous literature (e.g. [4], [6]): right and left temporoparietal junction (RTPJ, LTPJ), the precuneus (PC), the dorsal, middle and ventral components of MPFC (DMPFC, MMPFC and VMPFC) and the right superior temporal sulcus (RSTS).

To identify individual functional ROIs, initial “hypothesis spaces” for each of the 7 regions were defined based on group random effects analysis and used as a guide to identify clusters of activation representing the ROI in participants. To ensure independence, the participants were split randomly into two groups (first half N = 247, second half N = 246), and the hypothesis space from one group’s random effects analyses was used to define ROIs in the participants belonging to the other group. ASD participants were evenly distributed between the two groups. The hypothesis spaces consisted of all voxels contained in a continuous cluster of suprathreshold voxels that included the region representative of the ROI. The ROI hypothesis spaces were approximately spherical, except the RSTS, which was elongated following the sulcus. Averaged across both halves, the DMPFC comprised 1,185 voxels, all z>20 mm, centered at xyz coordinates (−1 mm, 53 mm, 29 mm). The MMPFC comprised 1,094 voxels, all between z>0 mm and z<20 mm, centered at (1 54 12). The VMPFC comprised 774 voxels, all z<0 mm, centered at (1 50 −12). The RSTS comprised 3,002 voxels, all z<6 mm, centered at (55 −10 −16). The RTPJ comprised 2,812 voxels, all z>6 mm, centered at (54 −52 23). The LTPJ similarly comprised 2,444 voxels, all z>6 mm, centered at (−52 −58 25). Finally, the PC hypothesis space consisted of 3,339 voxels centered at (1 −56 34). Average ROI hypothesis spaces are available as binary images in the NIfTI-1 file format at

Each participant’s contrast image (Belief>Photo) was masked iteratively with the seven hypothesis spaces. After each masking, candidate voxels were identified within the hypothesis space: a voxel was a candidate if it was individually significant at p<0.001 (uncorrected) and contiguous with at least 10 other voxels significant at p<0.001. From this set of candidates, the voxel with the peak T was selected, along with all other candidate voxels that were contiguous with and not more than 9 mm from the peak. From each ROI, five parameters were extracted: the size of the ROI (number of voxels included), the mean T value across voxels included in the ROI, and the x-, y-, and z-coordinates of the ROI’s “center of mass,” defined as the average position of ROI voxels weighted by their T values. The presence or absence of an identified ROI in each region was itself used as an additional parameter.
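The ROI definition steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the in-house software used in the study: contiguity is taken to mean face adjacency (6-connectivity), "contiguous with the peak" is read as "in the same suprathreshold cluster as the peak", the critical T value `t_crit` corresponding to p<0.001 is supplied by the caller, and the voxel dimensions in `voxel_mm` are placeholder values. The center of mass is returned in millimetres relative to the voxel grid origin rather than in MNI coordinates.

```python
import math
from collections import deque

# Face-adjacent (6-connected) neighbours; this connectivity rule is an assumption.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def clusters(voxels):
    """Split a set of (i, j, k) voxel indices into face-connected clusters."""
    remaining = set(voxels)
    found = []
    while remaining:
        seed = remaining.pop()
        cluster, queue = {seed}, deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBORS:
                n = (i + di, j + dj, k + dk)
                if n in remaining:
                    remaining.remove(n)
                    cluster.add(n)
                    queue.append(n)
        found.append(cluster)
    return found

def define_roi(t_map, t_crit, voxel_mm=(3.125, 3.125, 4.0),
               min_neighbors=10, radius_mm=9.0):
    """t_map: {(i, j, k): T} already masked by one hypothesis space.

    Returns a dict of ROI parameters, or None if no ROI is found.
    """
    supra = {v for v, t in t_map.items() if t > t_crit}
    # A candidate voxel must be contiguous with at least `min_neighbors` others.
    candidates = [c for c in clusters(supra) if len(c) >= min_neighbors + 1]
    if not candidates:
        return None
    peak = max((v for c in candidates for v in c), key=t_map.get)
    home = next(c for c in candidates if peak in c)

    def dist_mm(v):
        return math.sqrt(sum(((a - b) * s) ** 2
                             for a, b, s in zip(v, peak, voxel_mm)))

    roi = [v for v in home if dist_mm(v) <= radius_mm]
    total_t = sum(t_map[v] for v in roi)
    # T-weighted average position of the ROI voxels.
    com = tuple(sum(t_map[v] * v[ax] * voxel_mm[ax] for v in roi) / total_t
                for ax in range(3))
    return {"size": len(roi), "mean_t": total_t / len(roi),
            "center_of_mass_mm": com, "peak": peak}

# Toy example: a 12-voxel line with a peak at one end, plus an isolated
# high-T voxel that is ignored because it has no suprathreshold neighbours.
t_map = {(i, 0, 0): 4.0 for i in range(12)}
t_map[(0, 0, 0)] = 6.0
t_map[(20, 0, 0)] = 10.0
roi = define_roi(t_map, t_crit=3.0)
```

A return value of `None` corresponds to the "ROI not found" outcome that the analysis treats as a sixth, binary parameter.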

The reliability of ROI parameters within participants was assessed by split-half analysis. Two contrast images were defined, one from even run data and another from odd run data, in each participant. The correlation of the ROI even and odd parameter values was measured across participants. Significance was established by iteratively permuting (5000 permutations) the even-half data across participants to generate an empirical ‘null’ distribution. We report individual differences as reliable if the true pairing showed a higher correlation than 90% of the empirical null distribution. (Note that since these analyses are based on half of the data per subject, they are conservative estimates of the reliability of individual differences measured based on the full dataset per individual).
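As a rough sketch of this reliability test, the function below correlates one ROI parameter between odd and even runs across participants and then ranks the observed correlation against a permutation null. The function and argument names are hypothetical, and the use of Pearson correlation is an assumption, since the text does not name the correlation measure.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half_rank(odd, even, n_perm=5000, seed=0):
    """Return (observed r, fraction of permuted r values below the observed r).

    odd[i] and even[i] hold the same ROI parameter for participant i,
    estimated from odd and even runs respectively.
    """
    rng = random.Random(seed)
    observed = pearson_r(odd, even)
    shuffled = list(even)
    below = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the participant pairing
        if pearson_r(odd, shuffled) < observed:
            below += 1
    return observed, below / n_perm
```

Under the criterion stated above, a parameter would count as reliable when the returned rank exceeds 0.90.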

Next, we sought (i) to remove variance from the ROI parameters associated with ‘nuisance’ demographic and experimental variables to better reveal differences (if any) between ASD and NT groups and (ii) to evaluate the effects (if any) of our demographic and experimental variables on ROI parameters. To both these ends, a multivariate Generalized Linear Model (GLM) was constructed for each ROI parameter with a nine-column (age, gender, group, modality, coil, number of stimuli per condition, mean words per stimulus, task type and the intercept term) predictor matrix using data from 493 (462 NT and 31 ASD) participants (see Table 3). For the binary statistic that indicated whether or not the ROI of interest was identified in a given subject, the GLM presumed a binomial distribution and a logit link function. The GLM used a normal distribution and an identity link function for all other ROI statistics. Regressors, except the intercept, were mean-centered prior to regression. Correction for multiple comparisons was performed with Bonferroni correction for the nuisance predictors, across all predictors (age, gender, etc.) and all dependent measures (mean T, number of voxels, etc.), within each ROI, as detailed below. With the exception of the beta values that relate the predictors to the probability of discovery (which is binomial), these beta values directly relate the size of the effect in parameter units (e.g., mm) per regressor unit (e.g., years). Given the number of predictors being used, we evaluated the estimability of the predictors of interest, particularly the group predictor, using Belsley’s collinearity test [31]. The group predictor of interest was never found to exceed the standard tolerance (a variance decomposition proportion greater than 0.5 and a condition index greater than 30) established in the MATLAB collintest function. All predictor/parameter pairs were found to lie well within tolerance, across all tests.

Table 3. Representation of the analysis methods used in this study.

In addition to the large-sample GLM, three other samples were considered in turn using an identical procedure, the only variation being the subset of participants from which the predictor matrix was constructed (see Table 1, Table 3). These groups were: (1) the ‘Matched’ sample: 27 ASD participants were matched pairwise to 27 NT participants based on gender, age, IQ, coil, stimulus modality, and task (both ASD and NT participants are drawn from [21]–[23]), (2) the ‘ASD only’ sample: an analysis of variability within participants diagnosed with ASD and (3) the ‘IQ’ sample: including all 91 participants for whom IQ was collected. In each sample, all non-degenerate predictors were used (i.e., predictors whose values were defined in all participants and varied within-group, see Table 3). Estimability was assessed in the same way as in the Full sample. Across all tests, only one predictor, for one parameter, was found to exceed this tolerance: IQ when used to predict the mean T value of the MMPFC in the IQ sample. Thus, our predictors of interest were properly estimable in our models.

Because of the very large number of comparisons, we corrected p values using three different correction factors (m) according to Bonferroni’s formula, [corrected p] = 1 − (1 − [uncorrected p])^m. For our key a priori predictors of interest (ASD vs NT in the full sample and the ‘Matched’ sample, and IQ in the ‘IQ’ sample), we corrected for the 6 dependent variables (i.e. the ROI parameters) per ROI, resulting in m = 6. The effect of ADOS score was measured in the ASD-only sample, and since it has two parameters (a social and a communication score), we used m = 12. All of the remaining predictors were treated as exploratory, so effects of these predictors are reported as significant corrected for both the number of dependent variables (6) and the number of nuisance predictors (9), resulting in m = 54 (exploratory predictors are only considered in the full sample). Any relationship found to be significant at p<0.01 uncorrected is discussed as a ‘trend,’ though corrected p-values are always reported for consistency.
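To make the correction concrete, the sketch below implements the formula as written, with the three correction factors named in the text. The function and dictionary names are illustrative. The expression 1 − (1 − p)^m is often referred to as the Šidák form; for small p it is close to the simple Bonferroni product m·p, which is included here only for comparison.

```python
def corrected_p(p_uncorrected, m):
    """Corrected p value using the formula in the text: 1 - (1 - p)^m."""
    return 1.0 - (1.0 - p_uncorrected) ** m

def bonferroni_product(p_uncorrected, m):
    """Simple m * p correction, capped at 1 (shown for comparison only)."""
    return min(1.0, m * p_uncorrected)

# Correction factors described in the text.
CORRECTION_FACTORS = {
    "a_priori": 6,      # 6 ROI parameters per ROI
    "ados": 12,         # 6 parameters x 2 ADOS scores
    "exploratory": 54,  # 6 parameters x 9 nuisance predictors
}

# How the 'trend' cutoff (p = 0.01 uncorrected) looks after each correction.
trend_after_correction = {
    name: corrected_p(0.01, m) for name, m in CORRECTION_FACTORS.items()
}
```

This shows why an effect can meet the uncorrected ‘trend’ criterion while its reported corrected p value is far from significance, especially under m = 54.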

In the matched sample, the significance of the effect of group on the mean value of each ROI parameter was also measured and corrected for multiple comparisons nonparametrically. The objective of these nonparametric tests was to select an alpha using an empirical distribution such that the probability of any parameter within an ROI being a false positive result was 0.05. To this end, we permuted the group labeling randomly 25,000 times. In each permutation, the significance of the difference in means between the randomly generated groups was measured by a t-test. This yielded 25,000 p values for each parameter within an ROI. The p values for that ROI were pooled together and sorted, and the p value at the 0.83rd percentile (i.e., the 5th percentile divided by 6, the number of comparisons per ROI) was chosen. This p value represents an empirical threshold such that, for a given ROI, the chance of obtaining at least one p value less than it for any parameter is 5%. We also tested whether the groups differed in the variance of any ROI parameter using a similar strategy, with two important differences: the p value was calculated based on an Ansari-Bradley test, a nonparametric two-sample test of equal variances, and the “found/not found” parameter was omitted, since the mean and variance of a vector of 1s and 0s are directly related by a deterministic function.
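The permutation procedure for the empirical threshold can be sketched as follows. To stay within the Python standard library, the two-sample p value here uses a Welch statistic with a normal approximation as a stand-in for the t-test named in the text; that substitution, the function names, and the data layout are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, variance

def two_sample_p(a, b):
    """Two-sided p value for a difference in means.

    Normal approximation to a Welch statistic: a stand-in for the t-test.
    """
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def empirical_threshold(params, labels, n_perm=25000, alpha=0.05, seed=0):
    """Empirical per-ROI alpha from permuted group labels.

    params: {parameter name: [value per participant]} for one ROI.
    labels: list of 0/1 group codes, one per participant.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    pooled = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random relabelling preserves group sizes
        for values in params.values():
            g0 = [v for v, g in zip(values, shuffled) if g == 0]
            g1 = [v for v, g in zip(values, shuffled) if g == 1]
            pooled.append(two_sample_p(g0, g1))
    pooled.sort()
    # alpha divided by the number of parameters, e.g. 5% / 6 = 0.83%.
    index = int(len(pooled) * alpha / len(params))
    return pooled[index]
```

An observed p value for any parameter of that ROI would then be compared against the returned threshold rather than against a nominal 0.05.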

Whole-brain analyses

Whole brain analyses were conducted on the main contrast of interest (Belief>Photo). To correct for multiple comparisons, nonparametric whole-brain analysis was performed using SnPM, which estimates the false-positive rate directly from the data. Each test used 3 mm variance smoothing and 5,000 permutations, with no global normalization, grand mean scaling, or threshold masking. The corrected p-value for filtering was 0.05, with an uncorrected T-value minimum threshold of 3, and a voxel-cluster combining theta value of 0.5. Voxel-cluster combining was performed jointly by the Fisher, Tippett and Mass voxel-cluster combining functions. Permutations were repeated for each predictor of interest; all demographic and experimental predictor variables were included in each model as nuisance regressors using a modified SnPM plugin designed to support nuisance regressors (see Table 3).

To look for subtle group differences, we also conducted a second, more sensitive whole-brain analysis. We used a more lenient voxel-wise threshold (p<0.001, uncorrected) and, in place of whole-brain correction for multiple comparisons, validated the results using a split-half analysis. We used data from each participant’s even and odd runs, separately, to identify clusters showing a group difference (NT>ASD, or ASD>NT) in the response to Belief>Photo stories. We identified clusters in either the even or odd run random effects analyses, and extracted the response in those clusters in the other half of the data; clusters are reported as significant if the corresponding group difference was observed in the left-out data at p<0.05 uncorrected.


ROI results

Seven functional ROIs from the ToM network were defined in individual participants, using the contrast Belief>Photo, consistent with previous literature [4], [6]: RTPJ (in 464/493 individual participants, or 94%), LTPJ (87%), PC (91%), DMPFC (68%), MMPFC (64%), VMPFC (55%) and RSTS (85%). In the matched sample, ROI definition was successful in both groups: RTPJ (NT: 96%, ASD: 96%), LTPJ (89%, 82%), PC (82%, 93%), DMPFC (74%, 70%), MMPFC (48%, 59%), RSTS (93%, 74%).

The goal of this project was to explain individual differences in the size, magnitude and/or position of brain regions involved in ToM. Before testing individual differences, however, it was critical to determine that (i) there was variability in these measures, and (ii) the differences between participants on these measures were reliable (i.e. that inter-individual differences do not simply reflect noise in the measurement). All ROI parameters showed reasonable variability. The standard deviation of the mean T-value ranged between 0.5 and 1 across ROIs, and the standard deviation of ROI size (measured in number of voxels) ranged between 60 and 100 voxels. In order to test whether this variability reflects stable individual differences, we compared the correlation of ROI parameters from independent halves of the data from each individual to an empirical permutation-based ‘null’ distribution of these correlations. Both mean T and ROI size were reliable within individuals, compared to variability across participants, for all ROIs (mean T: all r>0.25, rank>96%; size: all r>0.13, rank>90%), except VMPFC. Center of mass was somewhat less reliable: the x position was reliable (rank>90%) for RTPJ, LTPJ and MMPFC; the y position was reliable for RTPJ, PC, DMPFC, MMPFC, and RSTS; and the z position was reliable for RTPJ, DMPFC, VMPFC, and RSTS.

Next we used multivariate GLM analyses to estimate whether any variance in the size, position or response magnitude of ToM brain regions is explained by whether an individual has been diagnosed with ASD.

For the large sample analysis, we compared all of the participants with ASD (N = 31, 26 male) to all of the NT participants (N = 462, 197 male). No parameter of any ROI was significantly predicted by the group membership (ASD vs. NT) of the individual (all p>0.22 for all ROIs, see Figure 1, Table 4). Furthermore, the odds ratio favoring the null hypothesis (no difference between the distributions) over the alternative hypothesis (a difference between NT and ASD), for all regions and all parameters was greater than 1.8∶1 (Bayes factor, [32]), with two exceptions: for the mean T in VMPFC (0.8∶1) and the probability of finding RSTS (1.1∶1) the odds of the null and alternative hypotheses were approximately equal. No ASD participant fell outside of 3 standard deviations of the typical distribution on any measure for any ROI. The confidence intervals on the coefficient estimates were quite small, indicating a high degree of confidence that if any differences exist, those differences are very small. For instance, if there exists a difference in the mean T value of the RTPJ voxels between ASD and NT participants, we are 99% certain this difference is less than a T value of 0.3 in either direction.

Figure 1. Histograms of ROI mean T-values for six Theory-of-Mind ROIs individually defined in each participant.

The T-values are from the Belief>Photo contrast in each participant. The minimum T-value for defining an ROI was 3. Blue bars show NT participants while green squares show individual ASD participants. Insets show individually defined ROI for a representative participant. The regions shown are: (top left) right temporoparietal junction, (top right) left temporoparietal junction, (middle left) precuneus, (middle right) dorsal medial prefrontal cortex, (bottom left) ventral medial prefrontal cortex, and (bottom right) right anterior superior temporal sulcus.

Table 4. Summary of effects of group membership on ROI parameters in the Full and Matched sample.

Next we compared participants with ASD to NT participants in the ‘Matched’ group. Again, we found no significant difference between participants with ASD and the matched controls on any ROI parameter (all p>0.24, Table 4). For these comparisons, the odds ratio favored the null hypothesis over the alternative hypothesis (i.e. ratio>1.1∶1) for all regions and all parameters, with one exception, the probability of finding activity in the PC (0.85∶1). We also confirmed these results using nonparametric tests of group differences (which do not assume that the measured variables are normally distributed and used the null distributions to establish a corrected alpha). No ROI parameter showed an effect of group (the closest to significance was p>0.07, against a corrected threshold of p<0.01). We conducted a similar analysis to test whether the ASD group showed a more heterogeneous response (i.e. some participants showing hypo-activation while others showed hyper-activation). There was no evidence of increased variance in the ASD group, for any parameter for any ROI (the closest to significance was p>0.01, against a corrected threshold of p<0.002).

In the ‘ASD only’ group, we next considered the effect of ADOS scores (i.e. social and communicative symptom severity). No significant effects of ASD severity were found. At the level of a trend (i.e. p<0.01 uncorrected), an individual’s social ADOS score predicted the position of the RSTS along the STS (along the anterior-posterior axis), with greater social ADOS scores predicting more anterior RSTS ROIs (t(14) = 3.03, p = 0.11, β = 4.97±4.88 mm/ADOS point).

Finally, the effect of IQ was assessed in the ‘IQ group’. Higher IQ significantly increased the chances of identifying the PC (t(83) = 2.76, p = 0.03, β = 0.10±0.09), the MMPFC (t(83) = 2.88, p = 0.02, β = 0.06±0.05), and the VMPFC (t(83) = 2.74, p = 0.04, β = 0.06±0.06). Further, higher IQ predicted significantly greater mean T-value (t(74) = 3.35, p = 0.008, β = 0.03±0.02) and size (t(74) = 3.14, p = 0.01, β = 2.63±2.21) of the PC.

As an exploratory analysis, we looked for effects of other demographic and experimental parameters. We found that gender and age did not affect any ROI parameter, even at the level of a trend, nor did the modality of the stimuli.

In the full sample, the variable with the greatest effect was the choice of coil. The 32-channel coil produced ROIs with significantly greater mean T-value in all ROIs except VMPFC (RTPJ: t(455) = 7.54, p<0.0001, β = 0.57±0.20, LTPJ: t(422) = 6.55, p<0.0001, β = 0.49±0.19, PC: t(437) = 6.05, p<0.0001, β = 0.46±0.19, DMPFC: t(328) = 3.92, p = 0.006, β = 0.21±0.14, MMPFC: t(308) = 4.50, p<0.0005, β = 0.26±0.15, RSTS: t(410) = 4.25, p<0.0001, β = 0.36±0.14). RTPJ and RSTS were also significantly larger in participants scanned with the 32-channel coil (RTPJ: t(455) = 4.27, p = 0.0001, β = 33.68±20.42, RSTS: t(410) = 4.25, p = 0.0016, β = 24.85±15.13). All ROIs were numerically, but not significantly, more likely to be found using the 32-channel coil versus the 12-channel coil. Coil choice did not affect the center of mass of the ROIs.

A larger number of words per stimulus slightly but significantly decreased the mean T-value of the RTPJ (t(455) = −3.99, p = 0.004, β = −0.020±0.013). At the level of a trend (i.e. p<0.01 uncorrected), a similar effect was observed in the LTPJ (mean T-value: t(422) = −3.32, p<0.06, β = −0.017±0.013; and size: t(422) = −2.72, p = 0.3, β = −1.30±1.23). However, the results pertaining to the mean number of words per stimulus come with a caveat. One version of the ToM task (localizer B, see Table 1) had substantially more words per stimulus than any other version (58 vs 31); localizer B was also the only version that used a match-to-sample task, so these effects may be related to stimulus length, task, or the specific stimuli used in this experiment.

The number of stimuli per condition did not predict any ROI parameter significantly. At the level of a trend, the VMPFC tended to have fewer voxels as the number of stimuli increased (t(260) = −2.62, p = 0.4, β = −4.13±4.10).

In sum, ROI analyses suggest that while individuals differ reliably in the size and response magnitude (and to a lesser extent, position) of brain regions associated with ToM, these neural parameters were not affected by whether an individual was diagnosed with ASD. Within the range of ADOS scores in the current sample, autism severity did not explain variance in these ROI parameters, either. Only experimental parameters, such as the MRI coil used, and demographic variables, such as IQ, explained some of the variance across participants. However, ROI analyses inevitably provide a limited window on the brain, so to look further for differences between groups in ToM brain regions, we conducted whole brain analyses.

Whole brain analysis results

In the whole-brain analyses, the main effect identifies brain regions significantly recruited during Belief compared to Photo stories, controlling for variance explained by any of the nuisance regressors. This analysis identified robust activation in all of the regions previously associated with Theory of Mind, including bilateral TPJ, medial precuneus and posterior cingulate, MPFC, and STS (see Figure 2, Table 5). It also identified activation in other regions, including (bilaterally) the hippocampus, the parahippocampal gyrus, the temporal poles, the amygdala, and the dorsolateral prefrontal cortex.

Figure 2. Results of the random effects analysis of the main effect (Belief>Photo).

In blue, regions identified in NT participants (N = 462) as responding to the false belief condition more than the false photo condition (p<0.001 uncorrected). Results from ASD participants are in red (p<0.001, shown in purple because of overlap). Data are overlaid on the MNI template brain. The volume is centered at [0 mm −54 mm 28 mm], showing the LTPJ and RTPJ (visible in the coronal and axial slices), the PC (in all slices) and the MPFC (in the sagittal slice).

Table 5. Table of clusters and peaks identified in the random effects analysis of the main effect, in the matched sample, separated by group and sorted by size.

Next, we compared activation in individuals with ASD vs NT adults, in both the Matched and Full sample. There were no significant differences in activation when correcting for multiple comparisons. We then repeated the whole brain analysis in the Full sample using a lenient threshold (p<0.001 voxel-wise, uncorrected) in half of the data, and validated the results in the remaining half (p<0.05). Two clusters were identified by the contrast NT>ASD × Belief>Photo: one in left anterior IPS (14 voxels, peak at [−32 mm, −40 mm, 40 mm]), and the other in left posterior IPS (38 voxels, peak at [−34 mm −38 mm 42 mm]). The anterior IPS cluster was identified in both odd and even halves of the data (independent validation in even half: t(462,31) = 3.15, p = 0.002), whereas the posterior IPS cluster was found only in the odd half, but validated in the even half (t(462,31) = 2.16, p = 0.03; see Figure 3). In both regions, both groups showed higher responses to the Photo than the Belief stories, but ASD participants showed greater activation during the Photo stories than NT participants. No regions were reliably recruited more in ASD than in NT individuals, for Belief>Photo.
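The split-half logic described above (discover clusters at a lenient threshold in one half of the data, then test them in the held-out half) can be sketched on synthetic numbers. This is a minimal illustration and not the pipeline used in the study: the function names, the simulated contrast values, the use of Welch's t statistic, and the fixed cut-offs of 3.6 and 2.0 (rough stand-ins for p<0.001 and p<0.05) are all assumptions made for the example.

```python
import random
import statistics
from math import sqrt


def welch_t(a, b):
    """Welch's t statistic for the difference in means (a minus b)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def split_half(nt_odd, asd_odd, nt_even, asd_even, discover_t, validate_t):
    """Select voxels in the odd half, then test them in the even half.

    Each argument is a list of participants; each participant is a list of
    per-voxel contrast values (for example Belief minus Photo).
    Only the NT > ASD direction is tested here.
    Returns (selected voxel indices, validation t or None, validated flag).
    """
    n_vox = len(nt_odd[0])
    selected = [
        v for v in range(n_vox)
        if welch_t([p[v] for p in nt_odd], [p[v] for p in asd_odd]) > discover_t
    ]
    if not selected:
        return selected, None, False
    # One value per person: the held-out data averaged over the selected voxels.
    nt_mean = [statistics.mean([p[v] for v in selected]) for p in nt_even]
    asd_mean = [statistics.mean([p[v] for v in selected]) for p in asd_even]
    t_val = welch_t(nt_mean, asd_mean)
    return selected, t_val, t_val > validate_t


def simulate(n_people, n_vox, effect_voxels, effect, rng):
    """Synthetic data: unit-variance noise plus an effect in chosen voxels."""
    return [
        [rng.gauss(effect if v in effect_voxels else 0.0, 1.0) for v in range(n_vox)]
        for _ in range(n_people)
    ]


rng = random.Random(0)
true_voxels = {3, 4}  # hypothetical voxels where NT > ASD by construction
nt_odd = simulate(462, 30, true_voxels, 2.0, rng)
nt_even = simulate(462, 30, true_voxels, 2.0, rng)
asd_odd = simulate(31, 30, set(), 0.0, rng)
asd_even = simulate(31, 30, set(), 0.0, rng)

# Illustrative cut-offs only; a real analysis would convert t to p values.
selected, t_val, validated = split_half(
    nt_odd, asd_odd, nt_even, asd_even, discover_t=3.6, validate_t=2.0
)
print(selected, validated)
```

Because the validation test uses data that played no part in selecting the voxels, a cluster that appears only through noise in the discovery half is unlikely to pass it.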

Figure 3. Clusters identified in a split-half analysis for which group has a significant effect on the condition difference.

Two clusters, both in the intraparietal sulcus (IPS), were found to be reliable in the split-half analysis. The mean beta, averaged across individuals, for each condition in each group is shown. pIPS denotes posterior IPS.

A variety of other experimental covariates yielded clusters of activation, although we treat these as exploratory. These are listed in Table 6.

Table 6. Other clusters identified (Corrected). X, Y, and Z coordinates are mm, in MNI Space. A (−) indicates clusters that correlate negatively with the covariate.


The main question we sought to address in this paper was whether adults diagnosed with ASD show differences in the magnitude or location of activations in ToM-associated brain regions, compared to a large sample of NT participants. In order to answer this question, we aggregated data across multiple experiments to produce a large sample of NT individuals (N = 462) and a moderately large sample of high functioning individuals with ASD (N = 31). We tested whether the magnitude of neural responses to stories about people’s beliefs, versus about physical representations like photographs, differed between groups either in targeted regions of interest or in whole brain analyses. These analyses identified no reliable differences between groups in the previously identified ToM brain regions. These results suggest that differences in activation between these groups of participants during explicit Theory of Mind tasks, if they exist, are very small and could not be used to diagnose ASD.

Effects of ASD on ToM activations

We used two complementary analysis strategies: ROI analyses focused on previously identified ToM brain regions are more sensitive, whereas whole brain analyses look for differences between groups anywhere in the brain, and therefore are less restricted. For both kinds of analyses, we conducted two comparisons. First, we compared the ASD group to the whole group of NT individuals, using simultaneous nuisance regressors to control for variance associated with demographic and experimental differences among participants. Second, we compared the ASD group to a smaller sample of NT individuals, one-to-one matched to the ASD group on age, gender, IQ and experimental parameters. For both comparisons, we found no reliable differences between groups in the size, response magnitude, or probability of identifying above-threshold voxels, in any ToM ROI (see Figure 1).

In addition to the absence of mean differences between the groups, we found no evidence that even a subset of individuals with ASD differed significantly from the typical population. The ROI parameters of individuals with ASD fell squarely within the distribution of typical values, rarely straying more than 2 SD from the typical means and never more than 3 SD. We also tested the hypothesis that the ASD group was more heterogeneous than the NT group. For example, similar mean activation could mask differences between the groups if the ASD group included a bimodal distribution: some individuals showing hypo-activation while others show hyper-activation. We found no evidence for this hypothesis, as the variance of the ROI parameters did not differ significantly across groups in the matched sample.
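The two checks in this paragraph can be made concrete with a small sketch: expressing each ASD value in SD units of the NT distribution, and testing whether the groups differ in variance. The data are synthetic and the function names are hypothetical. The text does not state which variance test was used, so the permutation test on the log variance ratio below is an assumption chosen for illustration.

```python
import random
from math import log, sqrt


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Sample variance (n - 1 in the denominator)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def z_scores(asd_values, nt_values):
    """Express each ASD value in SD units of the NT distribution."""
    mu = mean(nt_values)
    sd = sqrt(variance(nt_values))
    return [(x - mu) / sd for x in asd_values]


def variance_permutation_p(a, b, n_perm, rng):
    """Two-sided permutation p-value for a difference in variance.

    Each group is centred on its own mean first, so that a difference in
    means is not mistaken for a difference in spread.
    """
    mean_a, mean_b = mean(a), mean(b)
    ca = [x - mean_a for x in a]
    cb = [x - mean_b for x in b]
    observed = abs(log(variance(ca) / variance(cb)))
    pooled = ca + cb
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(log(variance(pooled[:len(a)]) / variance(pooled[len(a):])))
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


rng = random.Random(1)
nt = [rng.gauss(0.0, 1.0) for _ in range(462)]

# Hypothetical bimodal group: mean of exactly zero, but split between
# "hypo" and "hyper" values. A test on means alone would see nothing.
asd_bimodal = [2.5] * 15 + [-2.5] * 15 + [0.0]
# Hypothetical group drawn from the same distribution as the NT sample.
asd_typical = [rng.gauss(0.0, 1.0) for _ in range(31)]

p_bimodal = variance_permutation_p(asd_bimodal, nt, 500, rng)
p_typical = variance_permutation_p(asd_typical, nt, 500, rng)
print(max(abs(z) for z in z_scores(asd_typical, nt)), p_bimodal, p_typical)
```

In this toy case the bimodal group has the same mean as the NT sample but a much larger variance, which is the pattern the heterogeneity hypothesis predicts and which the study did not observe.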

In the whole brain analyses, permutation-based correction revealed no significant differences between ASD and NT individuals, in either the full sample or the matched sample. Because our results overall suggest a null result (namely, no difference between groups), we also examined the same analyses at a more lenient threshold in half of the data (in case there are true differences between groups that fall just below the threshold for significance), and then validated the results in the left-out half. We found two regions of parietal cortex with reliable effects; however, the group differences in these regions were in an unpredicted direction. In both regions, both groups showed more activation during Photo than Belief stories, but the ASD group showed more activation than the NT group during Photo stories. Furthermore, these regions were not near any of the regions implicated in ToM by the overall contrast of Belief>Photo. While intriguing, differences in these regions therefore do not seem likely to explain impairments in ToM typically observed in ASD. We could not identify any region that both (a) was reliably recruited for Belief more than Photo stories in 462 NT individuals, and (b) showed significantly less, or more, activation in the same contrast in 31 individuals with ASD.
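For readers unfamiliar with permutation-based correction, one common variant is the maximum-statistic approach: group labels are shuffled many times, the largest test statistic anywhere in the brain is recorded each time, and the observed map is thresholded at a high percentile of those maxima. The sketch below illustrates that idea on synthetic data. It is an assumption that this variant resembles what was run; the function names, group sizes, voxel count, and number of permutations are hypothetical and far smaller than in a real whole brain analysis.

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def welch_t(a, b):
    """Welch's t statistic for the difference in means (a minus b)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def voxel_stats(group_a, group_b):
    """Absolute Welch t at every voxel (participants are rows)."""
    n_vox = len(group_a[0])
    return [
        abs(welch_t([p[v] for p in group_a], [p[v] for p in group_b]))
        for v in range(n_vox)
    ]


def max_stat_threshold(group_a, group_b, n_perm, alpha, rng):
    """Familywise-corrected |t| threshold from the permutation maxima."""
    n_a = len(group_a)
    pooled = group_a + group_b  # new list, so the inputs are not reordered
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # shuffles whole participants, breaking group labels
        maxima.append(max(voxel_stats(pooled[:n_a], pooled[n_a:])))
    maxima.sort()
    index = min(len(maxima) - 1, int((1 - alpha) * len(maxima)))
    return maxima[index]


rng = random.Random(2)
n_vox = 20
# Hypothetical data: voxel 0 carries a large group difference, the rest are noise.
group_a = [[rng.gauss(3.0 if v == 0 else 0.0, 1.0) for v in range(n_vox)]
           for _ in range(60)]
group_b = [[rng.gauss(0.0, 1.0) for v in range(n_vox)] for _ in range(20)]

threshold = max_stat_threshold(group_a, group_b, n_perm=200, alpha=0.05, rng=rng)
observed = voxel_stats(group_a, group_b)
survivors = [v for v, t in enumerate(observed) if t > threshold]
print(round(threshold, 2), survivors)
```

Because the threshold is taken from the distribution of the maximum over all voxels, it is stricter than an uncorrected per-voxel cut-off, which is why effects can appear at p<0.001 uncorrected and still fail correction.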

Effects of other experimental parameters on ToM activations

Using a similar analysis strategy, we also found that gender does not affect activity in ToM brain regions; nor do the modality of the stimuli (visual vs. aural) or the experimental task. The absence of an effect of gender is particularly noteworthy, because the full sample contained a large number of male and female participants. Behavioral measures of ToM often reveal an advantage for female individuals [33], [34]; apparently this advantage is not due to measurable differences in ToM-associated brain region activity as elicited by the false belief task.

One factor that did have a significant effect on ROI parameters was the coil used. The 32-channel coil has documented higher SNR [35]; we found that this difference translated into larger ROIs that were more likely to be detected in individual participants. Thus, our results suggest that for individually-defined ROI analyses, the increased SNR of the 32-channel coil provides a clear benefit.

Interpreting the current results

With regard to our key null results, the current study has advantages and disadvantages. On the one hand, the large sample size provides more power and sensitivity to detect effects where they exist. In particular, although our sample of ASD individuals was only moderately large, the very large sample of NT individuals gives us very high confidence in the true mean of the ROI parameters in NT individuals. Finding that the ASD group mean does not differ from the NT mean is thus strong evidence that these data cannot be attributed to different population distributions.
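A short calculation shows what the unbalanced design buys. Assuming, for illustration only, that both groups share a common standard deviation, the standard error of the difference between group means is sd × sqrt(1/n_NT + 1/n_ASD). The function name below is hypothetical; the group sizes are those reported in the text.

```python
from math import sqrt


def se_of_difference(n_a, n_b, sd=1.0):
    """Standard error of a difference in means, assuming a shared SD."""
    return sd * sqrt(1.0 / n_a + 1.0 / n_b)


full = se_of_difference(462, 31)     # about 0.186 SD
balanced = se_of_difference(31, 31)  # about 0.254 SD
floor = 1.0 / sqrt(31)               # about 0.180 SD, the limit as n_NT grows

print(round(full, 3), round(balanced, 3), round(floor, 3))
```

With 462 NT participants, the uncertainty contributed by the NT mean is almost negligible, so the standard error is within a few percent of the floor set by the 31 ASD participants and roughly a quarter smaller than it would be with 31 participants per group. Under this simplifying assumption, further gains in precision would have to come from a larger ASD sample.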

However, these results cannot be interpreted as ruling out any differences in the neural mechanisms for ToM in individuals with ASD. One qualification of the current results is that the parameters measured here (i.e. response magnitude to Belief vs. Photo stories) provide only a limited measure of a region’s function. Other measures include the functional connectivity of each region and the within-region spatial pattern of responses [36], [37]. Individuals with ASD may differ in these other measures of ToM region function [14], [38]. Indeed, using multi-voxel pattern analysis, we found reliable differences between a subset of these same ASD and NT individuals in the pattern of activity in ToM regions [21].

A second qualification is that these results apply to a specific functional task: an explicit, verbal false belief task. It may be that deficits in theory of mind in individuals with ASD disproportionately affect implicit or spontaneous consideration of others’ mental states, but not performance on explicit tasks [39]. fMRI studies using tasks that elicit spontaneous or implicit social processing may be more likely to find hypo-activation [23], [40]–[42], whereas those with tasks that demand explicit social judgments find normal or hyper-activation [19]. For example, spontaneous processing of irony may produce hypo-activation in ASD [43], whereas explicit instructions eliminate the hypo-activation, and may even cause hyper-activation [19].

Finally, a third qualification is that the ASD participants in the current sample are very high functioning. Although they meet clinical diagnostic criteria for ASD (and have been shown to have behavioral deficits in ToM tasks in a previous study [22]), these individuals are highly verbal and pass first-order false belief tasks. Thus, our results do not rule out gross differences in the ToM regions of lower-functioning individuals with ASD. On the other hand, the individuals in our sample are diagnosed with ASD because of disproportionate difficulties with social interaction and communication. Also, we found no evidence that within our participants increasing ASD severity had any effect on the measured ROI parameters. So the current results imply that social cognitive impairments can occur without measurable changes in the magnitude or position of ToM brain regions. Collectively, the current results provide strong evidence that the neural differences between high functioning adults with ASD and NT participants are not due to gross changes in ToM brain regions.

A common hypothesis is that the lack of performance differences between NT and high-functioning ASD individuals is a function of the development of compensatory processes in the ASD individuals. Our findings provide evidence against this hypothesis. Compensation predicts that successful performance on explicit ToM tasks would be supported by activity in regions other than (or in addition to) ToM regions. For example, one possible prediction might be that individuals with ASD pass false belief tasks by recruiting the mechanisms that NT individuals use to solve the logically similar ‘False Photograph’ tasks, such as the fronto-parietal network [44], [45]. In contrast to these predictions, we found no sign of any increased compensatory activation during Belief stories, in ASD compared to NT individuals, in any region.

These results leave open a number of key questions. First, it will be important to identify the neural differences between adults with ASD and NT individuals that do account for behavioral differences in ToM. One possibility is that individuals with ASD are highly heterogeneous, so that different neural sources explain the behavioral delays in different individuals. As noted above, though, we do not see evidence for this possibility in the current data. Another possibility, also discussed above, is that the difficulties in theory of mind processing are related to the online use of these regions in real-world social interactions. Second, it will be important to determine what social contexts lead to atypical as well as typical recruitment of these brain regions in ASD. Perhaps ToM brain regions can be recruited during explicit tasks but atypical interaction with other brain regions and networks results in hypoactivation during implicit tasks. Third, the current study focused on adults. It will be important in future research to test whether the developmental trajectory of ToM brain regions differs in children with ASD compared to NT children, even if the mature states of the system are reasonably similar. Finally, it would be useful to extend these analyses to lower-functioning individuals with ASD.

Nevertheless, the implications of this study are that (i) social-cognitive impairments can occur without large differences in the activation of ToM brain regions; and (ii) hypo-activation during explicit Theory of Mind tasks will not be useful for diagnosing ASD.


The authors wish to acknowledge Marina Bedny, Emile Bruneau, Hyowon Gweon and Jorie Koster-Hale for collecting fMRI data, and Diana Gerr for help curating the dataset.

Author Contributions

Conceived and designed the experiments: ER LY JM JG RS. Performed the experiments: ND ER LY PM JM. Analyzed the data: ND. Contributed reagents/materials/analysis tools: CT. Wrote the paper: ND RS.


  1. Premack D, Woodruff G (1978) Does the chimpanzee have a theory of mind. Behav Brain Sci 1: 515–526.
  2. Baron-Cohen S (1989) The autistic child's theory of mind: A case of specific developmental delay. J Child Psychol Psychiatry 30: 285–297.
  3. Castelli F, Happé F, Frith U, Frith C (2000) Movement and mind: A functional imaging study of perception and interpretation of complex intentional movement patterns. Neuroimage 12: 314–325.
  4. Frith U, Frith CD (2003) Development and neurophysiology of mentalizing. Phil Trans R Soc Lond B Biol Sci 358: 459–473.
  5. Vollm BA, Taylor ANW, Richardson P, Corcoran R, Stirling J, et al. (2006) Neuronal correlates of theory of mind and empathy: A functional magnetic resonance imaging study in a nonverbal task. Neuroimage 29: 90–98.
  6. Saxe R, Kanwisher N (2003) People thinking about thinking people: The role of the temporo-parietal junction in. Neuroimage 19: 1835–1842.
  7. Wolf I, Dziobek I, Heekeren HR (2010) Neural correlates of social cognition in naturalistic settings: A model-free analysis approach. Neuroimage 49: 894–904.
  8. Mar RA (2011) The neural bases of social cognition and story comprehension. Annu Rev Psychol 62: 103–134.
  9. Sommer M, Döhnel K, Sodian B, Meinhardt J, Thoermer C, et al. (2007) Neural correlates of true and false belief reasoning. Neuroimage 35: 1378–1384.
  10. Frith U (2001) Mind blindness and the brain in autism. Neuron 32: 969–979.
  11. Gallagher HL, Happe F, Brunswick N, Fletcher PC, Frith U, et al. (2000) Reading the mind in cartoons and stories: An fMRI study of 'theory of mind' in verbal and nonverbal tasks. Neuropsychologia 38: 11–21.
  12. Castelli F, Frith C, Happe F, Frith U (2002) Autism, asperger syndrome and brain mechanisms for the attribution of mental states to animated shapes. Brain 125: 1839–1849.
  13. Lombardo MV, Chakrabarti B, Bullmore ET, Baron-Cohen S (2011) Specialization of right temporo-parietal junction for mentalizing and its relation to social impairments in autism. Neuroimage 56: 1832–1838.
  14. Gilbert SJ, Bird G, Brindley R, Frith CD, Burgess PW (2008) Atypical recruitment of medial prefrontal cortex in autism spectrum disorders: An fMRI study of two executive function tasks. Neuropsychologia 46: 2281–2291.
  15. Nieminen-von Wendt T, Metsähonkala L, Kulomäki T, Aalto S, Autti T, et al. (2003) Changes in cerebral blood flow in asperger syndrome during theory of mind tasks presented by the auditory route. Eur Child Adolesc Psychiatry 12: 178–189.
  16. Kennedy DP, Courchesne E (2008) Functional abnormalities of the default network during self-and other-reflection in autism. Soc Cogn Affect Neurosci 3: 177–190.
  17. Dichter GS, Felder JN, Bodfish JW (2009) Autism is characterized by dorsal anterior cingulate hyperactivation during social target detection. Soc Cogn Affect Neurosci 4: 215–226.
  18. Mason RA, Williams DL, Kana RK, Minshew N, Just MA (2008) Theory of mind disruption and recruitment of the right hemisphere during narrative comprehension in autism. Neuropsychologia 46: 269–280.
  19. Wang AT, Lee SS, Sigman M, Dapretto M (2006) Neural basis of irony comprehension in children with autism: The role of prosody and context. Brain 129: 932–943.
  20. Hasson U, Avidan G, Gelbard H, Vallines I, Harel M, et al. (2009) Shared and idiosyncratic cortical activation patterns in autism revealed under continuous real-life viewing conditions. Autism Res 2: 220–231.
  21. Koster-Hale J, Saxe R, Dungan J, Young LL (2013) Decoding moral judgments from neural representations of intentions. Proc Natl Acad Sci USA 110: 5648–5653.
  22. Moran JM, Young LL, Saxe R, Lee SM, O′Young D, et al. (2011) Impaired theory of mind for moral judgment in high-functioning autism. Proc Natl Acad Sci USA 108: 2688–2692.
  23. Redcay E, Dodell-Feder D, Mavros PL, Kleiner M, Pearrow MJ, et al. (2012) Atypical brain activation patterns during a face-to-face joint attention game in adults with autism spectrum disorder. Hum Brain Mapp.
  24. Dodell-Feder D, Koster-Hale J, Bedny M, Saxe R (2010) fMRI item analysis in a theory of mind task. Neuroimage 55: 705–712.
  25. Kliemann D, Young L, Scholz J, Saxe R (2008) The influence of prior record on moral judgment. Neuropsychologia 46: 2949–2957.
  26. Young L, Camprodon JA, Hauser M, Pascual-Leone A, Saxe R (2010) Disruption of the right temporoparietal junction with transcranial magnetic stimulation reduces the role of beliefs in moral judgments. Proc Natl Acad Sci USA 107: 6753.
  27. Young L, Scholz J, Saxe R (2011) Neural evidence for “intuitive prosecution”: The use of mental state information for negative moral verdicts. Soc Neurosci 6: 302–315.
  28. Young L, Nichols S, Saxe R (2010) Investigating the neural and cognitive basis of moral luck: It’s not what you do but what you know. Rev Phil Psych 1: 333–349.
  29. Young L, Saxe R (2008) The neural basis of belief encoding and integration in moral judgment. Neuroimage 40: 1912–1920.
  30. Young L, Saxe R (2009) An fMRI investigation of spontaneous mental state inference for moral judgment. J Cogn Neurosci 21: 1396–1405.
  31. Belsley DA, Kuh E, Welsch RE (2005) Regression diagnostics: Identifying influential data and sources of collinearity. Wiley-Interscience.
  32. Rouder JN, Speckman PL, Sun D, Morey RD, Iverson G (2009) Bayesian t tests for accepting and rejecting the null hypothesis. Psychon Bull Rev 16: 225–237.
  33. Baron-Cohen S, Jolliffe T, Mortimore C, Robertson M (1997) Another advanced test of theory of mind: Evidence from very high functioning adults with autism or asperger syndrome. J Child Psychol Psychiatry 38: 813–822.
  34. Baron-Cohen S, Wheelwright S, Hill J, Raste Y, Plumb I (2001) The “Reading the mind in the eyes” test revised version: A study with normal adults, and adults with asperger syndrome or high-functioning autism. J Child Psychol Psychiatry 42: 241–251.
  35. Triantafyllou C, Polimeni JR, Wald LL (2011) Physiological noise and signal-to-noise ratio in fMRI with multi-channel array coils. Neuroimage 55: 597–606.
  36. Biswal B, Zerrin Yetkin F, Haughton VM, Hyde JS (1995) Functional connectivity in the motor cortex of resting human brain using echo-planar mri. Magn Reson Med 34: 537–541.
  37. Haxby JV, Gobbini MI, Furey ML, Ishai A, Schouten JL, et al. (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293: 2425–2430.
  38. Kleinhans NM, Richards T, Sterling L, Stegbauer KC, Mahurin R, et al. (2008) Abnormal functional connectivity in autism spectrum disorders during face processing. Brain 131: 1000–1012.
  39. Senju A, Southgate V, White S, Frith U (2009) Mindblind eyes: An absence of spontaneous theory of mind in asperger syndrome. Science 325: 883–885.
  40. Groen W, Tesink C, Petersson K, Van Berkum J, Van Der Gaag R, et al. (2010) Semantic, factual, and social language comprehension in adolescents with autism: An FMRI study. Cereb Cortex 20: 1937–1945.
  41. Pelphrey KA, Carter EJ (2008) Brain mechanisms for social perception. Ann N Y Acad Sci 1145: 283–299.
  42. Adolphs R, Sears L, Piven J (2001) Abnormal processing of social information from faces in autism. J Cogn Neurosci 13: 232–240.
  43. Wang AT, Lee SS, Sigman M, Dapretto M (2007) Reading affect in the face and voice: Neural correlates of interpreting communicative intent in children and adolescents with autism spectrum disorders. Arch Gen Psychiatry 64: 698.
  44. Saxe R, Schulz LE, Jiang YV (2006) Reading minds versus following rules: Dissociating theory of mind and executive control in the brain. Soc Neurosci 1: 284–298.
  45. Wellman HM, Baron-Cohen S, Caswell R, Gomez JC, Swettenham J, et al. (2002) Thought-bubbles help children with autism acquire an alternative to a theory of mind. Autism 6: 343–363.