Language Differences in the Brain Network for Reading in Naturalistic Story Reading and Lexical Decision

Differences in how writing systems represent language raise important questions about whether there could be a universal functional architecture for reading across languages. In order to study potential language differences in the neural networks that support reading skill, we collected fMRI data from readers of alphabetic (English) and morpho-syllabic (Chinese) writing systems during two reading tasks. In one, participants read short stories under conditions that approximate natural reading, and in the other, participants decided whether individual stimuli were real words or not. Prior work comparing these two writing systems has overwhelmingly used meta-linguistic tasks, generally supporting the conclusion that the reading system is organized differently for skilled readers of Chinese and English. We observed that language differences in the reading network depended strongly on task. In lexical decision, a pattern consistent with prior research was observed in which the Middle Frontal Gyrus (MFG) and right Fusiform Gyrus (rFFG) were more active for Chinese than for English, whereas the posterior temporal sulcus was more active for English than for Chinese. We found a very different pattern of language effects in a naturalistic reading paradigm, during which significant differences were observed only in visual regions not typically considered specific to the reading network, and in the middle temporal gyrus, which is thought to be important for direct mapping of orthography to semantics. Indeed, in areas that are often discussed as supporting distinct cognitive or linguistic functions between the two languages, we observed an interaction between language and task. Specifically, language differences were most pronounced in MFG and rFFG during the lexical decision task, whereas no language differences were observed in these areas during silent reading of text for comprehension.


Introduction
Writing systems differ dramatically in how they represent language in written form: alphabets and syllabaries emphasize fidelity to the spoken forms of the language, whereas morphosyllabic systems combine probabilistic information about both sound and meaning [1][2][3]. Theoretical models also differ on whether distinct cognitive processes are required for reading these two kinds of writing system, as exemplified by models of English and Chinese.
One view holds that reading in English and Chinese involves the same set of mappings among orthographic (written) and phonological (spoken) forms of words and their semantics (meaning, [4]). Thus, the same basic processes are engaged by English and Chinese, but the "division of labor" [5] between them differs by degree. Consistent with this approach, we have shown that statistical learning models with the same functional architecture and learning rules simulate a range of effects in typical and disordered reading in both English and Chinese [6,7].
Another view holds that English and Chinese writing systems differ qualitatively in the cognitive and neural processes they engage. English is characterized as involving the application of spelling-to-sound rules, so that the spoken form of a word can be "assembled" from its smaller, alphabetic parts (e.g., [8]). This process is associated with activity in the posterior superior temporal gyrus (pSTG, [9,10]). Chinese, in contrast, is characterized as permitting only "addressed" phonology, or the retrieval from memory of whole syllables based on whole characters [3], a process further related to processing of non-linear spatial arrangements of orthographic forms, which is associated with activity in the left middle frontal gyrus (MFG, [11,12]).
Thus far, language differences in the reading system have been observed entirely in tasks in which participants must make judgments about single words or pairs of words presented in isolation (e.g., [13][14][15]). Such tasks involve a host of ancillary meta-linguistic and decision processes that may determine how reading processes are flexibly deployed to meet task demands [16,17] and that are unlikely to be engaged during reading under normal ecological conditions. Further, recent studies have demonstrated very large task by stimulus interactions throughout the reading system [18][19][20].
In the current study, we examined whether main effects of language on the organization of the reading system might be embedded in task by language interactions. We tested this by comparing activity in the reading network for English and Chinese under a naturalistic reading task and a commonly used artificial laboratory task (lexical decision). Data from the lexical decision task are consistent with prior comparisons between English and Chinese (e.g., the meta-analyses by [11,21]). In contrast, data from the naturalistic reading task reveal a very different pattern of language effects. Language by task interactions are also observed in many regions, raising concerns that effects of language observed in studies of the reading system may need to be understood in the context of interactions between language properties and task demands.

Participants
Sixteen monolingual adults with normal vision participated in each experiment from each of two sites (one in the US, the other in China). Participants were matched across sites for age and education. They provided written informed consent and were paid an hourly stipend. Each participant performed only one experimental task. The study was approved by ethical review boards at all sites (the Weill Cornell Medical College IRB and Yale's Human Research Protection Program in the US, and the IRB of the BNU Imaging Center for Brain Research (BICBR) in China).
Experiment 1. Chinese speakers in Experiment 1 (8 males, 8 females) had a mean age of 21.8 (range: 18-25). English speakers in Experiment 1 (8 males, 8 females) had a mean age of 24.5 (range: 18-25). Data from Chinese speakers were collected at Beijing Normal University, and data from English speakers were collected at the Yale Magnetic Resonance Research Center.

Materials
Experiment 1. For the naturalistic reading task, six fairy tales (by Hans Christian Andersen) in Chinese translation were translated to English phrase by phrase, in order to match the timing of the experiment as closely as possible between languages. Each story had an average of 96 phrases, with an average of 12 characters/words per phrase. A full story was presented in each run and was split into 4 blocks, two for the reading task and two for the listening task. Reading and listening blocks were intermixed in each run. Data from the periods during which the story was presented in spoken form are not included in the current analyses.
Experiment 2. For the lexical decision task, stimuli in Chinese comprised real characters, pseudo-characters and "artificial" character-like stimuli, with 30 stimuli in each condition designed to manipulate wordlikeness parametrically [22,23]. The real characters were selected to be "phonograms", comprising a combination of a phonetic component that provides probabilistic information about pronunciation and a semantic component that provides probabilistic information about meaning. Three types of pseudo-characters were constructed, reflecting a parametric manipulation of wordlikeness: pseudo-characters containing both phonetic and semantic components, pseudo-characters containing only semantic components, and pseudo-characters containing neither phonetic nor semantic components. Two types of artificial stimuli were also included: one in which the position of legal orthographic structures was reversed, violating orthotactic constraints of Chinese [24], and one in which the strokes that make up a real character were randomly rearranged, destroying any larger-scale orthographic information. Ninety additional real character stimuli were included in order to balance the number of "word" responses in the lexical decision task. Filler frequency, alignment (left-right), number of radicals and strokes were matched to the target stimuli.
Stimuli for the English lexical decision task were also designed to vary parametrically in word-likeness. Thirty low-to-moderate frequency real words from four to six letters long were selected from the English Lexicon Project (http://elexicon.wustl.edu/default.asp), and a set of length- and bigram-frequency-matched pseudo-words was created using the same software. Two kinds of consonant string were created by combining either high-frequency bigrams (pairs of letters that occur frequently together, such as "WR" and "GL") or low-frequency bigrams, such as "KY" and "BK." Bigram frequencies were taken from Berndt, Reggia, and Mitchum [25]. Finally, a non-text condition was created by rearranging the strokes of individual letters to destroy any letter-level orthographic information. Thirty additional real-word stimuli, matched to the critical stimuli on length, frequency, and bigram frequency, were included as filler items. Note that although these are conceptually parallel manipulations, they necessarily differ in many details, making it impossible to examine effects of stimulus type across languages.

Procedure
The same general procedures were used for both languages in each experiment. In both experiments, participants were familiarized with the task (for the lexical decision experiment, they were given a short practice session with different items), then lay comfortably in the scanner and viewed stimuli via rear projection.
Experiment 1. In the naturalistic reading experiments, participants read a story presented phrase by phrase at the center of the screen. Based on pilot work in which a comfortable reading rate was established, each phrase was presented for 2 seconds. Text was presented during long blocks of approximately one minute in length during which an average of 26 (range: 22-29) phrases were presented, yielding a mean block duration of 52s (range: 44-58s). The slight variability in block duration was introduced to avoid ending blocks mid-sentence. Each block was followed by 20s of rest, after which the story was continued in the auditory modality. Data from these listening blocks were not analyzed for the current report. After each story, a set of four multiple choice comprehension questions was presented.
Experiment 2. In the lexical decision experiments, individual stimuli were presented one at a time at the center of the screen, and participants were instructed to respond as quickly and accurately as possible, pushing a button with their right index finger for real words/characters, or with their right middle finger for non-word/non-character stimuli. A fast random-interval event-related design was used. On each trial, a 200ms fixation cross was presented, followed by a stimulus presented for 500ms, followed by a randomly jittered ITI (mean: 5.3s, range: 1-14s). The task was completed in two consecutive runs. Stimulus presentation was controlled, and response time and accuracy were recorded, using E-Prime software.
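The block timing in Experiment 1 follows directly from the fixed 2-second phrase rate. A minimal arithmetic sketch in Python is given here as a consistency check; the function names are illustrative, and the 2.5 s repetition time is the value reported in the MRI data acquisition section.

```python
PHRASE_S = 2.0   # presentation time per phrase, as stated in the Procedure
TR_S = 2.5       # repetition time reported in the acquisition section

def block_duration_s(n_phrases: int) -> float:
    """Duration of a reading block that presents n_phrases phrases."""
    return n_phrases * PHRASE_S

def phrases_in_block(duration_s: float) -> int:
    """Number of phrases that fit in a block of the given duration."""
    return round(duration_s / PHRASE_S)

mean_duration = block_duration_s(26)                  # mean of 26 phrases per block
shortest, longest = phrases_in_block(44), phrases_in_block(58)
volumes_per_mean_block = mean_duration / TR_S         # functional volumes per mean block
```

A mean of 26 phrases gives the reported 52 s block, and the reported 44-58 s range corresponds to 22 to 29 phrases per block.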
Data from the lexical decision experiment in Chinese are taken from a study that also included a symbol detection task, and revealed task by stimulus interactions in this population [19], and data from the English lexical decision experiment are taken from a parallel study [26].

MRI data acquisition
Functional and anatomical images were collected using 3T Siemens Magnetom TrioTim syngo MR systems, with 12-channel head coils, using identical equipment and data collection parameters at all three sites. Functional images were collected using a gradient-recalled-echo echo-planar imaging sequence sensitive to the BOLD signal. Forty-one axial slices were collected with the following parameters: TR = 2500 ms, TE = 30 ms, flip angle = 90°, FOV = 20 cm, matrix = 64 x 64, 3 mm thickness, yielding a voxel size of 3.125 x 3.125 x 3 mm, interleaved slices with no gap. Following the acquisition of functional data, high-resolution T1-weighted anatomical reference images were obtained using a 3D magnetization prepared rapid acquisition gradient echo (MPRAGE) sequence, TR = 2530 ms, TE = 3.45 ms, flip angle = 7°, FOV = 25.6 cm, matrix = 256 x 256, with 1 mm thick sagittal slices.
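The reported voxel dimensions follow from the field of view and the acquisition matrix. The short Python sketch below shows the arithmetic; the function name is illustrative and a square field of view is assumed.

```python
def in_plane_voxel_mm(fov_cm: float, matrix: int) -> float:
    """In-plane voxel edge length in mm, assuming a square field of view."""
    return fov_cm * 10.0 / matrix

functional_mm = in_plane_voxel_mm(20.0, 64)    # EPI: 200 mm / 64
anatomical_mm = in_plane_voxel_mm(25.6, 256)   # MPRAGE: 256 mm / 256
```

With a 3 mm slice thickness, the functional value reproduces the 3.125 x 3.125 x 3 mm voxel stated above, and the anatomical value corresponds to 1 mm in-plane resolution.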

MRI data analysis
Functional data were analyzed using AFNI ([27]; program names appearing in parentheses below are part of the AFNI suite). Cortical surface models were created with FreeSurfer (available at http://surfer.nmr.mgh.harvard.edu/), and functional data were projected into anatomical space using SUMA ([28,29]; AFNI and SUMA are available at http://afni.nimh.nih.gov/afni).
Surface-based spatial normalization of anatomical and functional data was accomplished using FreeSurfer [30] and SUMA [29]. Anatomical data were reconstructed (to3d), and a surface model for each participant was made with FreeSurfer: cortical meshes were extracted from the structural volumes, and then inflated to a sphere and registered anatomically [30]. Using the surface atlas, an averaged subject was created by averaging surfaces, curvatures, and volumes from all participants in both the Chinese and English groups. The averaged surface was converted into SUMA [29] as a standard mesh on the SUMA surfaces. The standard mesh was then converted to a volume and transformed to Talairach space (@auto_tlrc), using the N27 template [31] for visualization and reference purposes. Functional data were normalized by transforming volumes resulting from AFNI into surface representations using the standardized surfaces, and computing averages over surfaces. Reported Talairach coordinates are based on a 2x2x2mm AFNI volume created from the average surface in each experiment (3dSurf2Vol).

Preprocessing
The same preprocessing was applied to data from both experiments. After reconstructing 3D AFNI datasets from 2D images (to3d), the anatomical and functional datasets for each participant were co-registered using positioning information from the scanner. The first 3 volumes were discarded, and functional datasets were preprocessed to correct slice timing (3dTshift) and head movements (3dvolreg), reduce extreme values (3dDespike) and remove linear and quadratic drifts (3dDetrend) from the time series of each run, with no smoothing or filtering.

General linear models
Preprocessed data were analyzed using a general linear model (GLM, 3dDeconvolve) for each experiment separately. The model for Experiment 1 included a predicted hemodynamic response constructed by convolving the onsets and durations of reading blocks with a model hemodynamic response function (HRF; waver) to test for task vs. rest effects, along with six regressors of no interest for head movement, and an additional regressor for the listening task. The model for Experiment 2 included a regressor constructed by convolving the onsets of trials in the lexical decision task with a gamma function HRF (3dDeconvolve), to compute task vs. rest effects, along with six regressors of no interest for head movement and an additional regressor for filler trials.
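The construction of a block regressor can be illustrated with a minimal Python sketch: a boxcar marking the reading blocks is convolved with a gamma-variate response and then sampled once per volume. This is not the AFNI implementation. The gamma parameters (b = 8.6, c = 0.547), the 0.1 s time grid, and the block onsets are assumptions chosen for illustration; only the 2.5 s TR, the 52 s mean block length and the 20 s rest come from the text.

```python
import math

TR = 2.5    # s, from the acquisition parameters
DT = 0.1    # s, fine time grid used for the convolution (assumption)

def gamma_hrf(t, b=8.6, c=0.547):
    """Gamma-variate response t**b * exp(-t/c), scaled so its peak (at t = b*c) is 1."""
    if t <= 0:
        return 0.0
    peak = (b * c) ** b * math.exp(-b)
    return t ** b * math.exp(-t / c) / peak

def block_regressor(onsets, durations, n_vols):
    """Convolve a block boxcar with the HRF and sample it at each volume."""
    n_fine = int(round(n_vols * TR / DT))
    boxcar = [0.0] * n_fine
    for onset, dur in zip(onsets, durations):
        start = int(round(onset / DT))
        stop = min(n_fine, int(round((onset + dur) / DT)))
        for i in range(start, stop):
            boxcar[i] = 1.0
    kernel = [gamma_hrf(k * DT) for k in range(int(round(25.0 / DT)))]
    conv = [0.0] * n_fine
    for i, x in enumerate(boxcar):
        if x:
            for k, h in enumerate(kernel):
                if i + k < n_fine:
                    conv[i + k] += h * DT
    step = int(round(TR / DT))
    return [conv[v * step] for v in range(n_vols)]

# Hypothetical run: two 52 s reading blocks separated by 20 s of rest.
regressor = block_regressor(onsets=[10.0, 82.0], durations=[52.0, 52.0], n_vols=80)
```

The resulting column rises a few seconds after each block onset, plateaus during the block, and returns to baseline during rest. In a full model it would sit alongside the motion regressors and the regressor for the listening task.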

Group analysis
Group analyses were conducted in the standard surface space [30]. In order to remove site effects from the data, FIRST-BIRN algorithms (from BXH/XCEDE tools (1.10.3) available at http://www.nitrc.org/projects/bxh_xcede_tools/) were used to estimate the signal-to-fluctuation-noise ratio (SFNR) for each participant's entire data set, for inclusion as a covariate in all group-level analyses. This procedure has been shown to dramatically reduce effects introduced by variability between scanners at different sites [32]. Activation maps and regions reported as active in tables were obtained by first thresholding individual voxels at p < .005 (uncorrected), and then applying a cluster-size threshold based on Monte Carlo simulations (AlphaSim), resulting in a corrected threshold of p < .05.
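For readers unfamiliar with SFNR, the following Python sketch shows one common definition: the temporal mean of a time series divided by the standard deviation of its residuals after a second-order polynomial trend is removed. This is an assumption about the general form of the measure and not the BXH/XCEDE implementation used in the study. The time series is synthetic and the function names are illustrative.

```python
import statistics

def quadratic_residuals(y):
    """Residuals of y after least-squares removal of a quadratic trend."""
    n = len(y)
    t = [2.0 * i / (n - 1) - 1.0 for i in range(n)]   # time rescaled to [-1, 1]
    X = [[1.0, ti, ti * ti] for ti in t]
    # Normal equations, stored as an augmented 3 x 4 matrix.
    M = [[sum(r[j] * r[k] for r in X) for k in range(3)]
         + [sum(r[j] * yi for r, yi in zip(X, y))] for j in range(3)]
    for col in range(3):                              # elimination with pivoting
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    beta = [0.0, 0.0, 0.0]
    for j in (2, 1, 0):                               # back substitution
        beta[j] = (M[j][3] - sum(M[j][k] * beta[k] for k in range(j + 1, 3))) / M[j][j]
    return [yi - sum(b * x for b, x in zip(beta, r)) for yi, r in zip(y, X)]

def sfnr(timeseries):
    """Mean signal divided by the SD of the detrended residuals."""
    return statistics.fmean(timeseries) / statistics.pstdev(quadratic_residuals(timeseries))

# Synthetic voxel: baseline 1000, slow drift, and +/-1 fluctuation noise.
voxel = [1000 + 0.5 * i + 0.01 * i * i + (1 if i % 2 else -1) for i in range(100)]
voxel_sfnr = sfnr(voxel)
```

In practice a summary value is typically obtained by averaging the voxel-wise ratio over a region, yielding one number per participant that can enter the group model as a covariate.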

Region of Interest (ROI) selection for Language x Task analyses
In order to test for interactions between task and language, we selected four regions that are often discussed in the literature as distinguishing Chinese reading from English: left middle frontal gyrus (MFG), left posterior superior temporal gyrus and sulcus (pST) and a portion of ventral occipito-temporal cortex (vOT, or fusiform gyrus, FFG) associated with the processing of visual word form (in the left hemisphere) as well as its right hemisphere homologue. Regions of interest were taken from three meta-analyses: Tan et al. [11], Bolger et al. [21], and Wu et al. [33]. In the case of MFG, we observed that the same coordinates were identified as peak activations for Chinese readers in the Tan et al., and Wu et al. analyses. Tan et al. specifically observed that this region was not active for English readers, and discussed the implications of this finding at length (see also [34]). These coordinates were not identified as a peak in the Bolger et al. analyses, but activation in the same location, which was contiguous with activity in the inferior frontal gyrus, was identified in that study. Thus, we selected a single set of coordinates as most representative of the Chinese-specific MFG activation across studies.
In the case of left and right FFG, and left pST, multiple loci were identified across meta-analyses, raising the problem of how to create representative ROIs for these regions. In each case our first step was to collect all of the loci identified in the meta-analyses as corresponding to the gross anatomical location under consideration. We then created 6mm spheres around each locus, and combined voxels identified as overlapping across those loci to create ROIs. In Table 1, we present all of the loci used, data for both tasks and both languages from each, and indicate which overlapping regions were combined to create the ROIs.
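The ROI construction can be sketched in Python as follows. Two readings of the text are assumed: "6mm spheres" is taken as a 6 mm radius, and "combined" is taken as the union of spheres that share at least one voxel. The coordinates are hypothetical and are not the loci listed in Table 1.

```python
from itertools import product

VOXEL_MM = 2     # grid spacing, matching the 2x2x2 mm volume used for coordinates
RADIUS_MM = 6    # assumption: "6mm spheres" read as a 6 mm radius

def sphere_voxels(center, radius=RADIUS_MM, step=VOXEL_MM):
    """Set of grid points (mm coordinates) lying within `radius` of `center`."""
    cx, cy, cz = (step * round(c / step) for c in center)   # snap to the grid
    span = range(-radius, radius + 1, step)
    return {(cx + dx, cy + dy, cz + dz)
            for dx, dy, dz in product(span, repeat=3)
            if dx * dx + dy * dy + dz * dz <= radius * radius}

def merge_overlapping(loci):
    """Union the spheres that share at least one voxel into a single ROI."""
    rois = []
    for locus in loci:
        sphere = sphere_voxels(locus)
        touching = [r for r in rois if r & sphere]
        rois = [r for r in rois if not (r & sphere)]
        rois.append(sphere.union(*touching))
    return rois

# Hypothetical loci: two nearby left-hemisphere peaks and one right-hemisphere peak.
loci = [(-44, -56, -12), (-40, -52, -14), (40, -60, -10)]
rois = merge_overlapping(loci)
```

Here the two neighbouring peaks collapse into one ROI while the distant peak remains its own ROI. If the intended rule was instead the intersection of overlapping spheres, `sphere.union` would be replaced by an intersection.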
Each ROI was used as a mask to extract the mean coefficient (3dmaskave) for the task > rest contrast in each language and task. The task by language interaction was analyzed in each ROI via a 2 x 2 ANCOVA, with SFNR for each participant as a covariate.
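The logic of the Language x Task test can be illustrated with a simplified Python sketch that computes the interaction as a difference of differences between cell means, together with the corresponding F ratio for a balanced between-subjects design. This deliberately omits the SFNR covariate, so it is a plain 2 x 2 ANOVA contrast and not the ANCOVA reported here. All values are invented for illustration.

```python
import statistics

# Invented ROI coefficients (task > rest), four hypothetical participants per cell.
cells = {
    ("Chinese", "lexical decision"): [1.2, 1.0, 1.4, 1.2],
    ("English", "lexical decision"): [0.4, 0.6, 0.5, 0.5],
    ("Chinese", "naturalistic"):     [0.9, 1.1, 1.0, 1.0],
    ("English", "naturalistic"):     [1.0, 0.9, 1.1, 1.0],
}
means = {k: statistics.fmean(v) for k, v in cells.items()}

# Language effect within each task, then the difference between those effects.
lang_effect_ldt = means[("Chinese", "lexical decision")] - means[("English", "lexical decision")]
lang_effect_nat = means[("Chinese", "naturalistic")] - means[("English", "naturalistic")]
interaction = lang_effect_ldt - lang_effect_nat

# F ratio for the interaction contrast in a balanced design (n per cell).
n = 4
ss_interaction = n * interaction ** 2 / 4            # contrast weights are +1/-1
ss_error = sum(sum((x - means[k]) ** 2 for x in v) for k, v in cells.items())
df_error = 4 * (n - 1)
f_interaction = ss_interaction / (ss_error / df_error)
```

A large interaction with a language effect confined to lexical decision is the pattern described as task-dependent; adding a covariate would adjust the cell means and remove one error degree of freedom.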

Whole brain ANOVA analysis for Language x Task
To further identify brain regions showing an interaction between language and task, each participant's GLM dataset was mapped onto the averaged subject's surface. For each node of the surface, a language-by-task ANOVA (3dANOVA2) was conducted. The resulting surface was mapped back (3dSurf2Vol) onto the volume dataset of the averaged subject. Brain regions showing a language-by-task interaction were identified by setting the threshold at voxel-wise p < .005 and cluster-wise p < .05.

Experiment 1: Naturalistic reading
In order to compare activity between the two languages during naturalistic reading, matched groups of monolingual Chinese and English readers read and listened to a set of children's stories in their respective languages while we collected fMRI data. Stories were presented phrase by phrase at a constant rate based on preliminary data from a self-paced reading task, as shown in Fig 1. Long (approximately one minute) blocks of reading were interspersed with equal intervals during which audio recordings of excerpts from the same story were presented (picking up where the printed text left off). Participants had no explicit task while reading, but were asked simple comprehension questions at the end of each story.
Task vs. rest contrast for each language. A general linear model (GLM) analysis of BOLD responses to reading compared to rest revealed highly similar maps across groups (Fig 1C, Table 2). For both scripts, robust activation was observed throughout visual regions in the occipital and temporal cortices bilaterally, in regions associated with semantic processing throughout the temporal lobe, and in temporal and frontal regions associated with phonological processing. Regions previously identified as playing a particular role in reading were identified in both groups, including the left FFG "visual word form area" [35,36], a posterior middle temporal (pMTG) region associated with mapping from print to meaning [37,38], and a pST region associated with mapping from print to speech [9,10]. Regions previously identified as specific to the Chinese reading network, such as the MFG and right FFG [11,21,33], were strongly activated for both languages.
Activity in many areas associated with reading (and language more generally) was strongly bilateral for both groups, as is often the case in comparisons of language tasks against rest. This pattern is likely to include activation related to a heterogeneous collection of processes ranging from basic word recognition to sentence and discourse comprehension, along with attendant perceptual, memory, and reasoning processes. It is beyond the scope of the current analyses to disentangle these from one another, but we note that some discourse processes do seem to be right lateralized (see, e.g., [39]), whereas contrasts designed to identify reading-specific processes typically produce left-lateralized maps.
Negative activations shown in Fig 2C and Table 2 are generally typical of the "default mode network" (e.g., [40]), including precuneus, medial temporal cortex, posterior cingulate and anterior supramarginal gyrus. Deactivation was also observed throughout the insula. It is important to note that many functions associated with the default mode network, such as prospective memory and theory-of-mind reasoning (e.g., [41,42]), are also potentially important for discourse comprehension. Thus, the large effect of greater activity during the rest periods than during reading may conceal more transient, positive activations throughout these regions associated with discourse comprehension.
Direct contrast between the two languages. A direct contrast between Chinese and English (Table 3) revealed robust differences in visual areas of the occipital and temporal lobes, consistent with the greater visual complexity of Chinese characters relative to English words. The precuneus is negatively activated in both languages, consistent with its participation in the default mode network, as discussed above; this region is more strongly deactivated for English than for Chinese, leading to a positive (Chinese > English) activation in the difference image. The region exhibiting this contrast is contiguous with inferior visual regions, where the positive activation is due to stronger activation in Chinese than in English. Activity in MTG, along with a number of semantic processing regions in temporal cortex, was also greater for Chinese than for English, consistent with the view that reading in Chinese makes greater use of mapping from orthography to semantics than reading in English [7]. A small portion of precentral gyrus was also more strongly activated for Chinese than for English, consistent with evidence for more automatic engagement of motor processes associated with writing during reading for complex characters than for alphabetic text [43,44]. Language differences were also observed in bilateral FFG: readers in both groups showed robust activity throughout the ventral occipital cortex, but this activity was stronger in Chinese than in English. In contrast with prior studies of language differences in reading, no differences were found in MFG or pST.

Experiment 2: Lexical decision
The results of Experiment 1 are consistent with models in which the same cognitive processes are involved in reading for both English and Chinese. Such models predict that reading will engage a similar network of regions across languages, but with relatively greater dependence on semantic processing during Chinese reading, because of the statistical structure of mappings from orthography to phonology [7]. These data are difficult to reconcile with prior studies showing qualitative differences in the reading network between the two languages. We therefore hypothesized that language differences observed in prior studies might in fact reflect a language by task interaction; that is, differences in the apparent functional organization of the reading system between Chinese and English are largest during metalinguistic tasks. To test this hypothesis, in Experiment 2, we collected data from parallel lexical decision tasks conducted in English and Chinese. We first asked whether this task would replicate prior differences (showing main effects of language in MFG, FFG and pST), and then considered data from the two tasks in a factorial design to test directly for interactions between task and language.
Task vs. rest contrast for each language. In these lexical decision experiments, age-matched groups of native English and Chinese readers were presented with stimuli that varied in graduated levels of word-likeness from real words to scrambled text, and asked to decide on each trial whether the stimulus was a real word/character or not. As shown in Fig 2B, it is not entirely possible to match these stimulus manipulations across scripts. For this reason, and in order to parallel the analyses of the natural reading data, we tested for differences between the two languages in activity for task vs. rest. Overall patterns of activity (Fig 2C, Table 4) included many of the reading-related regions identified in the naturalistic story reading, including bilateral vOT and inferior frontal gyrus (IFG) in both languages. Additional task-related activity was observed throughout precentral gyrus, likely due to the explicit motor demands of the lexical decision task. Consistent with our prior studies of metalinguistic tasks, we found robust activity in the insula during the lexical decision task; this is of particular interest given that the insula was deactivated during naturalistic story reading, and may be related to the insula's role in error-monitoring [45,46].
Deactivation was observed in the angular gyrus, and in the left superior temporal sulcus (this deactivation was restricted to the anterior portion of STS for English, and was more extensive in Chinese). This again contrasts with the results of the naturalistic reading experiment, in which STS was strongly activated for both languages. The STS, especially its anterior aspect, is typically associated with semantic processing [47], but we have observed it to be negatively correlated with the reading network in other laboratory tasks that do not explicitly require semantic processing [22]. The dynamics of activation in this area during lexical decision are an interesting question for future research, given the influence of semantic variables on behavior in this task [17].
Direct contrast between the two languages. A direct contrast between the two languages (Fig 2C, Table 5) reveals limited similarities between the lexical decision data and the naturalistic reading data in Experiment 1: large differences between Chinese and English are observed in visual areas of the occipital and temporal lobes. Otherwise, the results more closely resemble those of prior comparisons between the two languages based on data from meta-linguistic decision tasks. Unlike naturalistic reading, lexical decision was associated with language differences in the mid-fusiform (putative VWFA) bilaterally, and the MFG (also bilaterally). In both of these cases, differences reflected greater and more extensive activation for Chinese than for English.
The language difference observed in MTG is due to a greater deactivation relative to rest observed for Chinese readers. Note that this location in MTG is anterior to the region associated with orthographic to semantic mapping that was found in the naturalistic task. A similar pattern of deactivations explains the contrast in AG. As in Experiment 1, the large contiguous "blob" in the difference image, encompassing inferior/lateral portions of the occipital lobe along with the precuneus and other superior midline areas, comprises two different kinds of effect. The midline regions are strongly deactivated in English, but are neither activated nor deactivated in Chinese. The more lateral visual regions typically associated with reading-related processes show greater activation for Chinese than English (without a difference in sign) and more bilateral activity in Chinese than English, consistent with prior fMRI studies of laboratory reading tasks. In addition, stronger activity is observed in superior parietal cortex and in the post- and precentral gyri for Chinese compared to English. Thus, in striking contrast to the data from Experiment 1, a task requiring a meta-linguistic decision gave rise to robust differences between groups in regions that have previously been identified as specific to Chinese.
Language-by-task interaction in whole brain analyses. We conducted a whole brain ANOVA to test for Language x Task interactions (Table 6, Fig 3). Significant interactions were observed in a large cluster of bilateral regions including visual cortex, precuneus, superior parietal lobule, angular gyrus and fusiform gyrus, as well as a number of regions in superior pre-/post-central gyrus, supramarginal gyrus, anterior and middle temporal gyrus, middle frontal gyrus and insula.
Two types of Language x Task interaction were observed (Fig 3). One group of regions was activated more strongly for Chinese than for English during the lexical decision task, but showed a smaller or null language effect during naturalistic reading. Those regions included FFG, MFG, insula, pre/post-central gyrus, as well as a region near anterior supramarginal gyrus. Another group of regions, including posterior MTG and angular gyrus, was deactivated for Chinese during the lexical decision task (LDT) and less deactivated for English, whereas during naturalistic reading these regions were positively activated in both languages, with no difference between Chinese and English.
Comparison of naturalistic and artificial task data. The whole-brain analyses revealed a very large network of regions that produce a Language x Task interaction. In order to test whether specific regions identified in the literature show this interaction, we selected a set of regions of interest from meta-analyses of the Chinese reading literature, and tested for the same interaction based on ROI analyses (Fig 4). Regions were selected based on prior meta-analyses that have identified differences between alphabetic and morphosyllabic scripts [11,21], and a meta-analysis of Chinese reading [33].
Interactions in which language effects are seen only during the lexical decision task imply that language differences in the tested region are related to meta-linguistic task demands. This pattern was found in the MFG, where the Language x Task interaction was reliable, F(1,56) = 11.29, p < .005. The effect of Language for lexical decision was significant, F(1,28) = 18.12, p < .001, but the Language effect for naturalistic reading was not, F(1,28) < 1. A similar pattern was observed in right FFG: Language x Task, F(1,56) = 25.78, p < .001; Language effect for lexical decision, F(1,28) = 40.36, p < .001; Language effect for naturalistic reading, F(1,28) = 2.11, p = .15.
In the left FFG, there was a significant Language by Task interaction, F(1,56) = 151.35, p < .001. Although significant Language effects were found in both contexts, with Chinese greater than English, the Language effect for lexical decision, F(1,28) = 46.84, p < .001, was nonetheless stronger than for naturalistic reading, F(1,28) = 5.60, p = .025. Thus, although the effects are much larger under task demands that bias orthographic processing, small but reliable differences are observed in this region during naturalistic reading, consistent with both the greater visual complexity of the stimuli [48] and their more complex mappings to phonology and semantics [18]. The ROI identified in left pST, when mapped to the surface, produced distinct regions in STG and MTG. Because these regions are thought to be functionally distinct, an anatomical mask was used to divide this ROI into superior and middle temporal portions. In both of these ROIs, a very different pattern was observed from the MFG and FFG, such that there was no interaction between Language and Task, Fs < 1. Instead, a main effect of Task was observed in both STG, F(1,56) = 21.73, p < .001, and MTG, F(1,56) = 40.59, p < .001, such that activity was greatest during naturalistic reading, and did not differ between the two groups. This region is associated with mapping from spelling to sound in alphabetic languages [9] and is rarely observed to be active in studies of Chinese character recognition (but see [23]). The lack of activity in this region during lexical decision for both languages is not entirely surprising, given the relatively weak demands that task puts on mapping from orthography to phonology [49].

Discussion
The overall pattern of results from naturalistic reading and lexical decision reveals substantial flexibility in the deployment of cortical resources in response to task demands. Dramatic differences in brain activity were observed across tasks. For example, much of the superior temporal sulcus and the supramarginal and angular gyri were strongly activated during story reading, but deactivated during lexical decision. These regions are associated with various aspects of language processing, and seem to interact in important ways during spoken and written language comprehension [50][51][52]; it is thus not surprising to see their activity associated with a task that involves comprehension of an extended discourse. In contrast, within-task differences between languages were relatively subtle, generally reflecting a differential degree of activity in the same network across writing systems. Further, very different patterns of language-specificity were observed across the two tasks. Language by task interactions were observed in regions previously identified as playing a specific role in Chinese reading [11,12,53,54]. Comparisons of language differences across tasks are of particular interest, because thus far our understanding of the brain network for reading (and thus of how it may differ across writing systems) has been based almost entirely on data from artificial tasks. Language differences in the reading network need to be understood in the context of interactions between task and stimulus properties [18][19][20], and we have shown here that naturalistic reading studies can add valuable data to this discussion.
During naturalistic reading, differences between English and Chinese were observed throughout the visual cortex, in the left middle temporal gyrus and ventral occipitotemporal regions bilaterally. Differences in visual regions are likely related to the greater visual complexity of Chinese characters relative to English words [12,21,54,55].
Language differences in MTG are in line with the notion that mapping from orthography to semantics plays a more important role in Chinese character recognition than in English word recognition [13,56]. This difference in affordances for orthography-to-semantics mappings arises because Chinese characters contain probabilistic cues to meaning that are unrelated to pronunciation [1]. Activity in the MTG has been shown to be task-dependent. For example, Booth et al. [13] observed MTG activity during a semantic judgment task, but not during a closely matched phonological judgment task (see also [57,58]). Thus it is interesting to note that this region is strongly activated in both languages during natural reading, and that language differences are predictable from differences in the relative demands of each script on direct mapping from orthography to semantics [7,23].
As in the story reading task, language effects in the lexical decision task were observed in visual regions. Aside from this, the differences between the brain networks for reading in English and Chinese diverge sharply from those observed under naturalistic reading. The portions of the ventral temporal cortex (fusiform gyrus, or FFG) associated with visual processing were generally more strongly activated during both tasks for Chinese than for English. An anterior portion of this region in particular is strongly associated with visual word recognition [18,59]. Region of interest analyses of the left and right FFG revealed strong language by task interactions. Activity in this region was greater overall for lexical decision in both hemispheres, and this difference was exaggerated for Chinese readers relative to English readers.
Robust activity was observed in an anterior portion of the middle frontal gyrus for Chinese, whereas activity in this region was weaker and less extensive for English. This is consistent with prior research using artificial tasks such as phonological and semantic similarity judgment [12,13,34,60]. Activity in this region was similar across languages during the story reading task. Investigation of a region of interest based on meta-analyses of Chinese and English reading revealed a large language by task interaction in the middle frontal gyrus, consistent with the notion that language differences in this region may be driven by the meta-linguistic decision-making demands of this task, or by visual processing that is particularly important for processing Chinese characters in isolation [11,34].
The notion that MFG is related to visual processing demands is supported by evidence that it is strongly activated during lexical decision tasks in Korean, a transparent alphabet that has in common with Chinese only its visually dense and non-linear character structure [54,61]. On the other hand, we show here that MFG activity is strongly modulated by task (see also BOOTH, etc.). Analysis of effects of stimulus class [19] shows that activity in the MFG tracks with behavioral measures of difficulty during LDT. Notably, stimulus effects are not observed in MFG for the same stimuli in a one-back task [22], which arguably puts an even greater premium on visual processing, but requires less in the way of meta-linguistic decision-making. Thus, we would suggest that the data so far are most consistent with the view that the specific activation of MFG for Chinese is related to an interaction between the visual complexity of the orthography and task demands related to meta-linguistic decisions. These results may have implications for the relationship between MFG activity and dyslexia in Chinese, which, given the task-dependency of MFG activation, might be explained in terms of struggling readers' relative inability to flexibly deploy different components of the reading network in the face of task demands [13], rather than in terms of a constitutive deficit localized to MFG. In any event, comparisons of results from naturalistic reading tasks in struggling readers across languages are likely to reveal interesting language differences.
Activity in the posterior superior temporal cortex is often associated with mapping from orthography to phonology in alphabetic languages [10,37,62,63]. The lack of activity in this region during reading tasks in Chinese is often attributed to the lack of "assembled phonology" [3,11]. That is, mappings from written to spoken forms in Chinese are not componential in the way that alphabetic representations are. Instead, whole characters, or their phonetic components, are probabilistically associated with whole syllables [3]. This region was not activated during lexical decision in either language. Further, analyses of these data by Yang & Zevin [64] revealed no activity in pST in analyses of task vs. rest, nor did we find evidence for sensitivity to specific stimulus classes. This area was, however, equally active in both languages during naturalistic reading. This result is difficult to interpret, because the pST is involved in a wide range of language processes that are engaged by understanding coherent narratives [65] in addition to its role in spelling-to-sound mappings.
The whole-brain GLM analyses show a very large Language x Task interaction in the insula as well, such that the language difference is restricted to the LD task. That activity is restricted to the LD task is not surprising, given the insula's role in error monitoring [45,46], and the language difference may be explained in terms of the lower overall accuracy in the Chinese behavioral data. Yang et al. [19] and Yang & Zevin [64] further showed insula activity in response to the conditions with the lowest accuracy in the lexical decision task in both languages. This is a clear example of a region that behaves very differently under artificial task conditions and naturalistic reading.
Thus, throughout the reading network, and particularly in those regions thought to differ in skilled Chinese and English readers, there are important differences between the patterns that emerge under different task demands. This does not seem to be due to differences in the overall power to resolve contrasts across tasks; consideration of the peak inferential statistical values and overall extent of activation in Tables 2 and 4 suggests that the block design used in Experiment 1 produces much stronger contrasts than Experiment 2, and yet between-language contrasts in left MFG and bilateral FFG are greatest in Experiment 2. Instead, it seems as though the overall reorganization of patterns of brain activity in response to task demands can interact with the affordances specific writing systems provide for meeting those demands.
In Chinese, it is possible to construct pseudo-characters that have cues to both phonology and semantics, whereas in English, pseudo-words based on the monomorphemic words used in most studies contain no cues to meaning. In English, it is not possible to create orthographically word-like strings that do not have a plausible pronunciation, whereas in Chinese this is easily done. A sizeable minority of Chinese characters contains neither cues to meaning nor cues to pronunciation, and so it is possible to construct pseudo-characters that also lack these cues, but are orthographically legal. Thus, the stimulus parameters that are available for manipulation by experimenters differ greatly between the two languages. Whereas English readers can distinguish words from non-words by pooling information about spelling, pronunciation and meaning at multiple levels of description, Chinese readers are much more dependent on determining whether they recognize the particular orthographic configuration at the whole character level. This notion, that the different writing systems have different task-specific affordances, contrasts with the view that differences between patterns of brain activity observed in reading tasks reflect cultural differences in brain organization resulting from experience with different scripts.
Reading connected text silently for meaning more directly simulates the behavior we wish to understand by studying reading in the brain than making metalinguistic decisions about isolated words or characters. But it is not necessarily the case that the language differences we observed in our "naturalistic" task are the "real" differences between the reading networks in the two languages. During both silent reading and lexical decision, a number of ancillary cognitive and linguistic abilities are active simultaneously, creating ambiguity about structure-function relationships across languages. For example, differences between Chinese and American English readers in midline regions and the inferior temporal lobe, well outside the typical reading network, may plausibly be related to cultural differences in narrative comprehension, as these regions are associated with discourse comprehension [66]. The activity observed in the posterior STS is also difficult to interpret. Is there some probabilistic orthography-to-phonology mapping that is, for some reason, more likely to be engaged during processing of connected text? Or is the activity observed during story reading more related to general language comprehension?
What is clear from these data is that brain activity is organized very differently during different tasks, and that language effects can interact with task. A full understanding of the reading network, and of how (or whether) it differs in important ways between languages, will require interpretation of main effects of language in the context of potential interactions with task demands.