Artificial Language Training Reveals the Neural Substrates Underlying Addressed and Assembled Phonologies

Although behavioral and neuropsychological studies have suggested two distinct routes of phonological access, their neural substrates have not been clearly elucidated. Here, we designed an artificial language (based on Korean Hangul) that can be read either through addressed (i.e., whole word mapping) or assembled (i.e., grapheme-to-phoneme mapping) phonology. Two matched groups of native English-speaking participants were trained in one of the two conditions, one hour per day for eight days. Behavioral results showed that both groups correctly named more than 90% of the trained words after training. At the neural level, we found a clear dissociation of the neural pathways for addressed and assembled phonologies: There was greater involvement of the anterior cingulate cortex, posterior cingulate cortex, right orbital frontal cortex, angular gyrus and middle temporal gyrus for addressed phonology, but stronger activation in the left precentral gyrus/inferior frontal gyrus and supramarginal gyrus for assembled phonology. Furthermore, we found evidence supporting the strategy-shift hypothesis, which postulates that, with practice, reading strategy shifts from assembled to addressed phonology. Specifically, compared to untrained words, trained words in the assembled phonology group showed stronger activation in the addressed phonology network and less activation in the assembled phonology network. Our results provide clear brain-imaging evidence for the dual-route models of reading.


Introduction
A key component of reading is phonological access, that is, the association of visual forms of words with their sounds. Many behavioral and neuropsychological studies have suggested two distinct routes of phonological access [1][2][3][4]. For the indirect (or assembled phonology) route, visual words are transformed into phonology through grapheme-to-phoneme correspondences (GPC). It is believed that readers of alphabetical languages mainly rely on assembled phonology, although there are further variations between shallow orthography (e.g., Italian) and deep alphabetical orthography (e.g., English). For the direct (or addressed phonology) route, phonological access either is mediated by semantics or relies on direct associations between the visual forms of words and their sounds. For logographic languages such as Chinese, phonological access mainly relies on addressed phonology. Within alphabetical languages (especially those with deep orthography such as English), it is also believed that high-frequency words and orthographically irregular words are accessed mainly through the addressed phonology route. In contrast, low-frequency regular words and pseudowords are accessed through the assembled phonology route [3].
In spite of these theoretical discussions and empirical investigations, a clear dissociation of the two phonological access pathways has not yet been established. Existing studies based on natural language materials have usually failed to reveal strong qualitative differences between the two pathways. For example, stronger activation in the left inferior frontal gyrus was consistently found when brain activity during familiar word reading was subtracted from that during pseudoword reading, suggesting this region's role in assembled phonology [10][11][12][21,23,25]. However, the reverse subtraction (familiar words minus pseudowords) showed very little difference and thus failed to identify regions for addressed phonology [21,23,25,28]. Moreover, the exact localization of the brain regions for the two pathways of phonological access is still under debate. For instance, the definition of a key region responsible for assembled phonology, the temporoparietal area, varied from the posterior superior temporal gyrus, angular gyrus, to supramarginal gyrus in different studies [27]. Some studies also failed to reveal any difference in the temporoparietal regions when participants read pseudowords vs. familiar words [10,11,24].
The inconsistent results across studies can be attributed to at least two causes. First, reading a natural language often involves both neural pathways, whose relative contributions depend on word familiarity (i.e., familiar words rely more on addressed phonology than do unfamiliar words) [3,29]. It is therefore not clear which pathway is used by a given reader when reading a particular word whose familiarity may vary across readers. Second, contrasts of natural language materials are often confounded by factors such as task difficulty, visual form, phonology, and semantics. Specifically, the familiar word and pseudoword conditions differ in task difficulty and in whether they involve semantic processing. Similarly, the irregular and regular word conditions, especially for low-frequency words, often differ in task difficulty (i.e., leading to the regularity effect, with regular words named faster than irregular words) [30][31][32]. Finally, logographic and alphabetic scripts differ in visual form, phonology, and task difficulty.
To overcome these limitations and to clearly dissociate the neural pathways of addressed and assembled phonologies, we created an artificial language by adopting the visual forms and phonologies from 120 Korean Hangul characters (see Fig. 1A). Korean Hangul is ideal for our purpose because of its logographic visual appearance but alphabetic orthography [33]. In other words, Korean Hangul can be read either through addressed or assembled phonology with the same visual forms. Semantics were excluded to avoid any potential effects of semantic processing on phonological access. The design of the artificial language in the addressed-phonology condition was the same as that used in Xue et al. [34] except that no semantics were provided. Therefore, this study investigated the addressed phonology that is not mediated by semantics. We trained two matched groups of native English-speaking participants in the U.S. with either addressed or assembled phonology, one hour per day for eight days. During fMRI scans, we used two reading tasks to investigate the neural mechanisms of addressed and assembled phonologies with different demand levels of phonological access: a naming task (Fig. 1B) that requires cognitive effort and a perceptual task (Fig. 1C) that emphasizes automatic phonological access.
Two specific issues were addressed in this study. First, we examined the neural mechanisms of addressed and assembled phonologies by comparing brain activation patterns elicited by trained words in the addressed group vs. the assembled group. It should be noted that the two groups were strictly matched on behavioral performance, visual complexity, and other linguistic factors such as visual form and phonology. The effect of semantics was also removed because the artificial words were not given meanings. Second, we examined the strategy-shift hypothesis, which postulates that practice can shift word reading strategy from sub-lexical (i.e., assembled phonology) to more automatic lexical reading (i.e., addressed phonology) [3,29,35]. We compared brain activation patterns elicited by trained words versus new words whose sounds had to be assembled based on their components in the assembled group. If the strategy-shift hypothesis were true, the trained words would elicit more activation in the addressed phonology pathway than would the novel words.

Participants
Forty-three native English speakers (20 males; mean age = 21.19 ± 1.97 years old, with a range from 19 to 27 years) participated in this study. They were divided into two groups: one was trained on "addressed phonology" (n = 21) and the other on "assembled phonology" (n = 22). The two groups were matched on nonverbal intelligence (Raven's Advanced Progressive Matrices) [36] and performance on English reading tasks [word identification and word attack from the Woodcock Reading Mastery Tests-Revised (WRMT-R) [37], phonemic decoding efficiency and sight word efficiency from the Test of Word Reading Efficiency (TOWRE)] [38] (Table 1). Based on participants' self-report, 9 participants in the addressed group and 8 participants in the assembled group were monolinguals. The remaining participants considered themselves bilingual, with their second language being one of the alphabetic languages (e.g., Spanish, French, or German). None of the participants had previous experience with the Korean language. Because their second language was an alphabetic language, we believed that second language orthography would not be a significant confound when we contrasted the two groups. All participants had normal or corrected-to-normal vision, had no previous history of neurological or psychiatric disease, and were strongly right-handed as judged by Snyder and Harris's handedness inventory [39]. Informed written consent was obtained from the participants before the experiment. This study was approved by the IRBs of the University of California, Irvine and the University of Southern California.

Materials
Sixty English words, 60 English pseudowords and 120 artificial language words were used in the study (see Fig. 1 for examples). All English materials were presented in gray-scale with 226 × 151 pixels in size, and the artificial language words were 151 × 151 pixels in size.
The artificial language words were constructed using 22 Hangul letters (12 consonants and 10 vowels). We selected the phonemes that are easy to pronounce for native English speakers because this study focused specifically on learning form-sound association, not on learning new phonemes. To confirm our judgment, three native English-speaking college students were asked to listen to the phonemes one-by-one and assess the ease of pronouncing the phonemes on a 5-point scale (1: very difficult to pronounce; 5: very easy to pronounce). The average scores across the judges were higher than 3 for each of the phonemes used in this study. The artificial language words were divided into two groups, one for training and the other (not trained) for examining transfer of learning. The two groups of words were strictly matched on the number of letters, as well as on the complexity and frequency of each letter. The sounds of the English materials and artificial language materials (both words and phonemes) were recorded from a native English female speaker and a native Korean female speaker, respectively. All the sounds were denoised and normalized to the same length (600 ms) and loudness using Audacity 1.3 (audacity.sourceforge.net).

Training Procedure
Using a computerized learning program, we trained participants to learn the association of visual forms and sounds of 60 artificial language words for eight days (one hour per day). Two training conditions (i.e., addressed-phonology and assembled-phonology training) were designed based on the same set of materials to contrast the neural bases of addressed and assembled phonologies (see Fig. 1A). In the addressed group, participants were asked to memorize each character as a whole. Because Korean Hangul has a shallow orthography with consistent correspondence between letters and their pronunciations in words/characters, participants would implicitly acquire the grapheme-phoneme correspondence (GPC) rules through learning if we used the original pronunciations of the letters. Thus, to avoid implicit acquisition of the GPC rules, we assigned each word a new pronunciation (borrowed from one of the 60 artificial language words used for training in the study). In the assembled group, participants first learned the pronunciations of the 22 letters one by one and then assembled the phonology of the characters from their letters. In order to encourage the use of GPC rules instead of simply memorizing the association between letters or characters and their pronunciations, 30 new characters (untrained words consisting of learned letters) were tested at the end of each training session. For both groups, several types of learning tasks were designed to facilitate the acquisition of visual forms, sounds and their associations. They included naming, naming with feedback, fast naming (reading sets of ten words randomly selected from the 60 trained words as fast as possible), and a phonological choice task (selecting the correct pronunciation for the presented word from four sounds). It should be noted that, except for the type of training, all other variables such as time-on-task were controlled across the two groups.

fMRI Task
Participants were scanned while performing two reading tasks (i.e., perceptual and naming tasks) often used in previous studies [10,24,34,42,43]. Both tasks consisted of four types of stimuli, namely English words, English pseudowords, trained artificial language words, and untrained artificial language words. Each type of materials contained 60 items. Stimulus presentation and response collection were programmed using Matlab (Mathworks) and the Psychtoolbox (www.psychtoolbox.org) on a laptop. Rapid event-related design was used for both tasks, with the five types of materials pseudo-randomly mixed. For both tasks, trial sequences were optimized with OPTSEQ (http://surfer.nmr.mgh.harvard.edu/optseq/) [44].
Two runs of the perceptual task were performed both before and after training (Fig. 1C). During each run, the stimuli were presented either in visual, auditory, or audiovisual modality. We focused on the visual modality in this paper because the purpose of this paper is to reveal the neural mechanisms of word reading.
Each trial lasted for 600 ms, with a jittered inter-stimulus interval varying randomly from 1.4 to 6.4 sec (mean = 1.9 sec) to improve the design efficiency. Participants were asked to carefully view and/or listen to the stimuli. To ensure that participants were awake and attentive, they were instructed to press a key whenever they noticed that the visual word was underlined. This happened 6 times per run. Participants correctly responded to 10.0 ± 1.0 of 12 underlined words at the pre-training stage and 11.3 ± 0.8 at the post-training stage, suggesting participants were attentive to the stimuli during the perceptual task.
The naming task also included two runs, which could only be performed after training (Fig. 1B). Each run consisted of 120 trials, with 30 trials for each condition. In each trial, a word was presented for 1000 ms, followed by a 1000 ms blank interval. A jittered inter-stimulus interval varying randomly from 0.5 to 4 s (mean 1.2 s) was used to improve design efficiency. Participants were asked to read each visual word as fast and accurately as possible. Participants' responses were recorded through an MRI-compatible microphone connected to a laptop.

Processing and Evaluation of Oral Responses
Participants' oral responses (reading out loud) recorded from the scanner were first denoised using Audacity 1.3 (http://audacity.sourceforge.net) to remove scanner noise. The reaction time (RT) for each trial was calculated using the following formula: RT = response time point (RTP) − trial onset. The RTP was defined as the first time point of 3 continuous points (within the time window of 300-2500 ms after the stimulus onset) whose intensity was higher than one standard deviation above the mean. The RTP was first automatically identified by a computer program in Matlab, and then manually checked one-by-one by the experimenter.
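The RTP rule described above can be illustrated with a short sketch. This is not the authors' Matlab program. It assumes that the recording for one trial is available as a list of intensity values starting at trial onset, that the mean and standard deviation are computed over that whole trial (the text does not say over which samples they were computed), and that `sample_rate_hz` is known; the function name and parameters are hypothetical.

```python
from statistics import mean, stdev

def detect_response_onset(intensity, sample_rate_hz,
                          window_ms=(300, 2500), run_length=3, n_sd=1.0):
    """Return the RT in ms (relative to trial onset at index 0), or None.

    The onset is the first sample inside the search window that starts a run
    of `run_length` consecutive samples above mean + n_sd * SD.
    Assumption: mean and SD are taken over the whole trial recording.
    """
    threshold = mean(intensity) + n_sd * stdev(intensity)
    start = int(window_ms[0] * sample_rate_hz / 1000)
    stop = min(len(intensity), int(window_ms[1] * sample_rate_hz / 1000) + 1)
    for i in range(start, stop - run_length + 1):
        if all(intensity[j] > threshold for j in range(i, i + run_length)):
            return i * 1000.0 / sample_rate_hz
    return None  # no supra-threshold run found; such a trial needs manual checking
```

Requiring several consecutive supra-threshold samples makes the detector ignore isolated clicks, and restricting the search to the 300-2500 ms window excludes noise before any plausible response.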
To calculate the accuracy of the naming task, we had two research assistants evaluate the sounds. The agreement rates between the two evaluators were very high: 97.59% (ranging from 91.67% to 100%) for English materials and 92.75% (ranging from 81.67% to 100%) for artificial words. Items that were initially scored differently by the two evaluators were evaluated by them jointly again to make final agreed-upon decisions.
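As a small illustration of the scoring step, the sketch below computes a simple percent agreement between two evaluators and lists the items that would go to joint re-evaluation. It assumes that the reported agreement rate is plain percent agreement over items (the text does not name the statistic), and the function names are hypothetical.

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of items given the same score by both evaluators."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return 100.0 * matches / len(scores_a)

def disputed_items(scores_a, scores_b):
    """Indices of items scored differently, to be re-evaluated jointly."""
    return [i for i, (a, b) in enumerate(zip(scores_a, scores_b)) if a != b]
```

Under this assumption, agreement on 55 of 60 items gives 91.67% and agreement on 49 of 60 items gives 81.67%, which matches the lower ends of the two ranges reported above.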

MRI Data Acquisition
Data were acquired with a 3.0 T Siemens MRI scanner in the Dana & David Dornsife Cognitive Neuroscience Imaging Center at the University of Southern California. A single-shot T2*-weighted gradient-echo EPI sequence was used for functional imaging acquisition with the following parameters: TR/TE/θ = 2000 ms/25 ms/90°, FOV = 192 × 192 mm, matrix = 64 × 64, and slice thickness = 3 mm. Forty-one contiguous axial slices parallel to the AC-PC line were obtained to cover the whole cerebrum and part of the cerebellum. An anatomical MRI was acquired using a T1-weighted, three-dimensional, gradient-echo pulse-sequence (MPRAGE) with TR/TE/θ = 2530 ms/3.09 ms/10°, FOV = 256 × 256 mm, matrix = 256 × 256, and slice thickness = 1 mm. Two hundred and eight sagittal slices were acquired to provide a high-resolution structural image of the whole brain.

Image Preprocessing and Statistical Analysis
Initial analysis was carried out using tools from the FMRIB's software library (www.fmrib.ox.ac.uk/fsl) version 4.1.2. The first three volumes in each time series were automatically discarded by the scanner to allow for T1 equilibrium effects. The remaining images were then realigned to compensate for small head movements [45]. Translational movement parameters never exceeded 1 voxel in any direction for any participant or run.
The images in the naming task were denoised using MELODIC independent components analysis within FSL [46]. An average of 11.60 components (ranging from 1 to 25) was removed from each scanning run. The images in the perceptual task were not denoised because there was little effect of head movements on the BOLD signal. All data were spatially smoothed using a 5-mm full-width-half-maximum Gaussian kernel. The smoothed data were then filtered in the temporal domain using a nonlinear high-pass filter with a 60-s cutoff. A 2-step registration procedure was used whereby EPI images were first registered to the MPRAGE structural image, and then into the standard (Montreal Neurological Institute [MNI]) space, using affine transformations with FLIRT [45] to the avg152 T1 MNI template.
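For readers unfamiliar with the smoothing specification, the following sketch converts a full-width-half-maximum value into the standard deviation of the corresponding Gaussian and builds a normalized one-dimensional kernel. It is only an illustration of the arithmetic, not the FSL implementation; the 3-mm voxel size follows from the acquisition parameters given earlier (192 mm FOV over a 64-voxel matrix, 3-mm slices), and the truncation at three standard deviations is an assumption.

```python
import math

def fwhm_to_sigma(fwhm_mm):
    """Standard deviation (mm) of a Gaussian with the given FWHM (mm)."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def gaussian_kernel_1d(fwhm_mm, voxel_size_mm, truncate_sd=3.0):
    """Normalized 1-D Gaussian weights in voxel units.

    Assumption: the kernel is truncated at `truncate_sd` standard deviations.
    """
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_size_mm
    half_width = int(math.ceil(truncate_sd * sigma_vox))
    weights = [math.exp(-0.5 * (k / sigma_vox) ** 2)
               for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

A 5-mm FWHM corresponds to a standard deviation of about 2.12 mm, that is, roughly 0.7 voxels at 3-mm resolution.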
At the first level, the data from the perceptual and naming tasks were separately modeled with the general linear model within the FILM module of FSL for each participant and each run. Events were modeled at the time of the stimulus presentation. The events' onsets and durations (600 ms for the perceptual task and 2000 ms for the naming task) were convolved with a canonical hemodynamic response function (double-gamma) to generate the regressors used in the general linear model. Temporal derivatives and the 6 motion parameters were included as covariates of no interest to improve statistical sensitivity. Null events (i.e., fixation) were not explicitly modeled, and therefore constituted an implicit baseline. For the naming task, only correct responses were included in the analysis. The incorrect trials were modeled as nuisance variables to avoid their potential confounding effect. Eight contrast images (English words − baseline, English pseudowords − baseline, trained words − baseline, untrained words − baseline, English pseudowords − English words, trained words − English words, untrained words − English words, and trained words − untrained words) were computed separately by task and run for each participant.
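The construction of a single task regressor can be sketched as follows: a boxcar built from event onsets and durations is convolved with a double-gamma response function and then sampled once per TR. This is a minimal illustration, not the FSL code. The gamma shape parameters (6 and 16) and the undershoot ratio (1/6) are commonly used values chosen here as assumptions and are not necessarily FSL's defaults; the function names, the 0.1-s time grid, and the 32-s kernel length are likewise hypothetical.

```python
import math

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1.0 / 6.0):
    """Double-gamma response at time t (s); shape values are assumptions."""
    if t < 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - under_ratio * under

def build_regressor(onsets_s, duration_s, n_volumes, tr_s=2.0, dt=0.1,
                    hrf_length_s=32.0):
    """Convolve an event boxcar with the HRF and sample it at each volume."""
    n_fine = int(round(n_volumes * tr_s / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets_s:
        first = int(round(onset / dt))
        last = int(round((onset + duration_s) / dt))
        for i in range(first, min(last, n_fine)):
            boxcar[i] = 1.0
    kernel = [double_gamma_hrf(k * dt) for k in range(int(round(hrf_length_s / dt)))]
    convolved = [0.0] * n_fine
    for i, b in enumerate(boxcar):
        if b == 0.0:
            continue  # only event samples contribute to the convolution
        for k, h in enumerate(kernel):
            if i + k >= n_fine:
                break
            convolved[i + k] += b * h * dt
    step = int(round(tr_s / dt))
    return [convolved[v * step] for v in range(n_volumes)]
```

For example, `build_regressor([10.0], 0.6, 30)` models one perceptual-task event (600 ms) starting 10 s into a 30-volume run with the 2-s TR reported above; a naming-task event would use a duration of 2.0 s instead.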
Two second-level models (fixed-effects models) were separately constructed for the two reading tasks. For the perceptual task, training effect was calculated across the four runs (two at the pre-training stage and the other two at the post-training stage) for each condition and for each participant by using the contrast of post-training minus pre-training. For the naming task (administered after the training only), a second-level analysis was performed to average across the two runs for each participant.
For both tasks, the data from second-level analyses were then input into the third-level analyses which included three contrasts: 1) trained words − English words in the addressed group vs. trained words − English words in the assembled group; 2) trained words vs. untrained words in the assembled group; 3) English pseudowords vs. English words for all participants. One-sample and two-sample T tests were performed for within-subject and between-subject analyses, respectively. In these analyses, the two groups' English conditions were used as their own high-level baseline to control for potential group differences in baseline activation and test-retest variability of training effect. Group activations were computed using a random-effects model (treating participants as a random effect) with FLAME stage 1 only [47][48][49]. Unless otherwise indicated, group images were thresholded with a height threshold of z > 2.3 and a cluster probability, P < 0.05, corrected for whole-brain multiple comparisons using the Gaussian random field theory.

Region of Interest Analysis
To examine whether the left precentral gyrus and fusiform gyrus are involved in the assembled phonology, we defined two regions of interest (ROIs) as two spheres each with a radius of 6 mm and centered at the peaks of activation found in the contrast of English pseudowords minus English words in the naming task (left precentral gyrus: x = −52, y = 0, z = 40; left fusiform gyrus: x = −46, y = −56, z = −20). The ROI analyses were performed by extracting parameter estimates (betas) of each event type from the fitted model and averaging across all voxels in the cluster for each participant. Percent signal changes were calculated using the following formula: [contrast image/(mean of run)] × ppheight × 100%, where ppheight is the peak height of the hemodynamic response versus the baseline level of activity [50].
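The two ROI steps, selecting voxels within a 6-mm sphere and converting contrast values to percent signal change, can be sketched as below. This is an illustration under stated assumptions rather than the authors' pipeline: voxel positions are assumed to be given in MNI millimetres, the left precentral peak is read as x = −52, y = 0, z = 40, the percent signal change is computed per voxel and then averaged (the text does not specify the order), and all function names are hypothetical.

```python
def sphere_voxels(center_mm, radius_mm, voxel_coords_mm):
    """Indices of voxels whose centre lies within the sphere (inclusive)."""
    cx, cy, cz = center_mm
    r2 = radius_mm ** 2
    return [i for i, (x, y, z) in enumerate(voxel_coords_mm)
            if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r2]

def percent_signal_change(contrast_values, run_mean_values, ppheight):
    """Mean over voxels of [contrast / (mean of run)] * ppheight * 100.

    Assumption: the ratio is formed voxel by voxel before averaging.
    """
    per_voxel = [c / m * ppheight * 100.0
                 for c, m in zip(contrast_values, run_mean_values)]
    return sum(per_voxel) / len(per_voxel)

# Hypothetical usage with the left precentral gyrus peak as the sphere centre.
LEFT_PCG_PEAK_MM = (-52, 0, 40)
example_coords = [(-52, 0, 40), (-52, 6, 40), (-52, 8, 40), (-46, 0, 40), (-48, 4, 44)]
roi_indices = sphere_voxels(LEFT_PCG_PEAK_MM, 6.0, example_coords)
```

Scaling by ppheight expresses the parameter estimate relative to the peak of the modeled response, so that values are comparable across event types with different durations.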

Behavioral Results
Behavioral results during the post-training scan showed that both groups correctly named more than 90% of the trained words (see Fig. 2). The assembled group also correctly named more than 85% of the untrained words. These results suggest that our training was effective and the assembled group had learned the GPC rules.
We then examined the behavioral differences between trained words in the addressed group and those in the assembled group, between trained words and untrained words in the assembled group, and between English words and pseudowords, because the subsequent analysis of fMRI data would focus on these three contrasts. First, we compared the trained words in the addressed versus assembled group by performing a two-way ANOVA [material (i.e., trained words and English words) and group (i.e., addressed and assembled groups)]. For reaction time (RT), the two groups did not show significant differences in the trained words (group: F(1,41) = 1.52, n.s.; group-by-material interaction: F(1,41) = 2.45, n.s.). For accuracy (CR), the assembled group performed better than the addressed group on the trained words, while there were no significant group differences for English words (group-by-material interaction: F(1,41) = 10.89, p < .01). These results suggest that RT is matched between the trained words across the two groups. The differences in CR between the two conditions should not affect the subsequent fMRI analysis because we only included the correctly named words in the fMRI analysis.
Second, we compared the trained words and untrained words in the assembled group. Results showed that participants performed better on trained words than on untrained words, as indicated by the significantly higher accuracy (t(21) = 7.45, p < .001) and shorter RT (t(21) = 5.26, p < .001). Any potential effect of different RT on the BOLD response in this contrast was controlled statistically by adding RT as a covariate in subsequent fMRI analysis.
Finally, we compared behavioral performance for English words versus pseudowords in the addressed and assembled groups. For both reaction time (RT) and accuracy rates (CR), participants performed better reading English words than pseudowords (RT: F(1,41) = 349.76, p < .001; CR: F(1,41) = 7.20, p = .01). Any potential effect of different RT on the BOLD response in this contrast was also controlled statistically by adding a covariate of demeaned RT in subsequent fMRI analysis. More importantly, the addressed group and assembled group did not show any significant differences in either RT (group: F(1,41) = 0.01, n.s., group-by-material interaction: F(1,41) = 0.10, n.s.) or CR (group: F(1,41) = 1.65, n.s., group-by-material interaction: F(1,41) = 2.46, n.s.), which further confirmed that the two groups were matched on English reading performance.

Neural Bases of Addressed and Assembled Phonologies
To reveal the neural bases of addressed and assembled phonologies, we first compared neural activities elicited by the trained words in the addressed group (relying on addressed phonology) with those elicited by the trained words in the assembled group (relying on assembled phonology) in the naming task. The BOLD responses in the English word condition were used as high-level baseline to control for group-related variability in BOLD response. As noted in the ''Behavioral Results'' section, behavioral performance on trained words in the naming task was matched in the two groups. Consequently, regions showing stronger activation for the addressed group's trained words were deemed as responsible for addressed phonology, whereas those showing stronger activation for the assembled group's trained words as responsible for assembled phonology. Results showed that greater activations for the assembled group were found in the left supramarginal gyrus [SMG, extending to superior occipital gyrus (SOG)] (see Tables 2 & 3), whereas greater activations for the addressed group were found in the right orbital frontal cortex (OFC) and middle temporal gyrus (MTG) (see Tables 4 & 5 and Fig. 3A). Anterior cingulate cortex (ACC), posterior cingulate cortex (PCC), and right angular gyrus (AG) also showed more activation in the addressed group.
We then used the data from the perceptual task to compare the training effects for the trained words in both addressed and assembled groups. The English word condition was used as the baseline to control for test-retest fluctuations of the BOLD response. Specifically, the training effect was defined as follows: post-training contrast (i.e., trained words − English words) minus pre-training contrast (i.e., trained words − English words). Consistent with the results of the naming task, the left SMG showed greater activation for the assembled group than the addressed group (see Fig. 3B and Tables 2 & 3), but no regions showed more activation for the addressed group than the assembled group. The bilateral precentral gyrus [PCG, extending to the inferior frontal gyrus (IFG)] also showed more activation for the assembled group. Regions showing greater activation for addressed phonology than for assembled phonology in the naming task were not replicated in the perceptual task, probably because of its lower demand on phonological access.
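The training-effect definition used for the perceptual task is a difference of differences, which the following sketch makes explicit for a list of voxelwise parameter estimates. The function names and the example values are hypothetical and serve only to show the arithmetic.

```python
def contrast(condition, baseline):
    """Voxelwise difference: condition minus baseline."""
    return [c - b for c, b in zip(condition, baseline)]

def training_effect(pre_trained, pre_english, post_trained, post_english):
    """(post: trained - English) minus (pre: trained - English), per voxel."""
    pre = contrast(pre_trained, pre_english)
    post = contrast(post_trained, post_english)
    return [po - pr for po, pr in zip(post, pre)]

# Hypothetical estimates for two voxels.
effect = training_effect(pre_trained=[1.0, 2.0], pre_english=[1.0, 1.0],
                         post_trained=[3.0, 2.5], post_english=[1.5, 1.0])
```

Because English words are subtracted within each session, a uniform shift in signal between the two scanning sessions affects both conditions equally and cancels out of the training effect.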
Finally, we compared the neural activities for English words versus pseudowords, i.e., the contrast that was often used by previous studies [10][11][12][21,23,25], in both naming and perceptual tasks. We found English pseudowords elicited stronger activation in the left PCG/IFG, fusiform gyrus (FG), and bilateral inferior occipital gyrus (IOG) than English words (Table 2 and Fig. 3C&D). In contrast, English words showed stronger activation in an extensive network than pseudowords in the naming task, including ACC, PCC, right OFC, bilateral middle frontal gyrus (MFG), AG, and MTG (Tables 4 & 5 and Fig. 3C). However, no regions showed more activation for English words than pseudowords in the perceptual task (Fig. 3D).
In the above whole-brain analyses, the left PCG/IFG and FG showed more activation for English pseudowords than words. However, those two regions did not show greater activation for trained words in the assembled group than those in the addressed group in the naming task. To examine whether those two regions are involved in assembled phonology, we further extracted the percent signal changes from those two regions in the naming task. In the PCG/IFG, trained words in the assembled group elicited greater activation than those in the addressed group, but the activations for English words did not differ across the two groups (group-by-material interaction: F(1,41) = 5.92, p < .05) (Fig. 3E), suggesting the left PCG/IFG is responsible for assembled phonology. In the left FG, the two groups did not show any differences for either English words or trained words (group-by-material interaction: F(1,41) = 0.09, n.s.) (Fig. 3E), suggesting the left FG is involved in both addressed and assembled phonologies.

Practice Shifts Word Reading Strategies from Assembled Pathway to Addressed Pathway
Previous behavioral studies suggest that practice can shift participants' reading strategies from sub-lexical to more automatic lexical reading [3,35]. We therefore hypothesized that trained words in the assembled group should show more activation in regions for addressed phonology and less activation in regions for assembled phonology than untrained words in the same group. This hypothesis was confirmed: activation in the addressed phonology network (i.e., PCC, left MTG, and left AG) was stronger for trained words than for untrained words (Tables 4 & 5 and Fig. 4A). In contrast, activation in the assembled phonology network (i.e., bilateral PCG/IFG and SMG) was weaker for trained words than for untrained words (Table 2). These results suggest that with increasing word familiarity, reading had a greater reliance on the addressed phonology pathway.
Similarly, in the perceptual task, training-induced increases in the addressed phonology network (i.e., bilateral MTG) for the trained words in the assembled group were greater than those for untrained words in the assembled group (see Fig. 4B and Tables 4 & 5). In contrast, training-induced increases in the assembled phonology network (i.e., the left PCG/IFG and SMG) were greater for the untrained words than the trained words in the assembled group, although they did not survive the stringent whole-brain correction perhaps due to the lower phonological demands of the perceptual task.

Discussion
Using an artificial language training paradigm, we found clear evidence for separate neural substrates underlying addressed and assembled phonologies: (1) addressed phonology depended more on the ventral pathway, including the anterior cingulate cortex, posterior cingulate cortex, right orbital frontal cortex, angular gyrus and middle temporal gyrus, whereas assembled phonology relied more on the dorsal pathway, including the left precentral gyrus/inferior frontal gyrus and supramarginal gyrus; (2) training of addressed and assembled phonology increased activations in the respective pathways; and (3) the recruitment of the two neural pathways was modulated by word familiarity: familiar words recruited more of the addressed pathway and less of the assembled pathway than did novel words.
Figure 3. Neural pathways of addressed and assembled phonologies. Brain maps of trained words in the addressed group vs. those in the assembled group in the naming task (A) and in the perceptual task (B); brain maps of English words vs. English pseudowords in the naming task (C) and in the perceptual task (D). Red indicates regions showing more activations for the first element than the second in each contrast, while green indicates the reverse (second > first element). All activations were thresholded at z > 2.3 (whole-brain corrected) and rendered onto PALS-B12 atlas [67,68] via average fiducial mapping using caret software [69].
Compared with previous studies that relied on the contrasts of natural language materials [7][10][11][12][19][20][21][22], the artificial language training paradigm used in this study has several advantages: (1) words for the addressed and assembled groups were constructed using the same set of words, and consequently strictly matched on visual form and phonology; (2) the two groups were trained using the same procedure, matched on the number of repetitions and overall learning time; (3) semantic effects were removed because the artificial words were not given meanings; and (4) the contrast between addressed phonology (i.e., trained words in the addressed group) and assembled phonology (i.e., trained words in the assembled group) to a great extent reduced co-activations of the two reading routes. Consistent with these arguments, our results showed that the contrast between the trained words in the addressed group and those in the assembled group revealed much clearer results than the contrast between familiar words and pseudowords.
Table 3. Cluster size (number of voxels) and cluster-level significance of brain regions for assembled phonology.
Results of our study made two significant contributions to the literature on the neural bases of reading. First, the artificial language training paradigm allowed us to better specify (and to resolve some related debates about) the neural substrates involved in the dual routes of phonological access during reading. In general, the two distinct neural pathways associated with addressed and assembled phonologies found in this study are consistent with the results of a previous meta-analysis on natural language materials [27], as well as previous findings of differential engagement of the ventral and dorsal neural pathways in lexical access and phonological processing [51][52][53][54]. Our results together with previous studies provide strong support for the dual-route models of reading, although they are not able to differentiate between localist (e.g., the dual-route cascaded model) [3] and connectionist models (e.g., the connectionist dual process model) [55,56].
More importantly, our results may help resolve the continuing debate over the functional localization of the left temporoparietal cortex in assembled phonology. Although there is a general consensus about the involvement of the left temporoparietal cortex in grapheme-to-phoneme conversion [57][58][59][60], the exact location of temporoparietal activation varies greatly across existing studies (from the left posterior superior temporal gyrus (STG) to the AG and SMG) [27]. Early lesion studies labeled the left AG as the center for grapheme-to-phoneme conversion [60]. However, more recent neuroimaging studies have suggested that the left AG and the left posterior STG (and adjacent SMG) are engaged in semantic and phonological processing, respectively [51,60,61,62]. Consistent with this view, a recent meta-analysis of existing neuroimaging studies of dyslexia found consistent under-activation in the left STG and SMG for impaired readers [63]. A recent transcranial magnetic stimulation (TMS) study also found selective disruptions of phonological processing when TMS was applied over the SMG, but not when it was applied over the AG [64]. Similarly, a recent neuroimaging study revealed a critical role of the left SMG as well as posterior MTG in orthography-phonology mapping after controlling for potential confounds such as task difficulty, word frequency, spelling-sound consistency, imageability, and length in letters [51]. In sum, the literature is mixed in terms of whether the STG is involved in assembled phonological access. When we used artificial language training to control for problems such as co-activations of both phonological routes and inherent variations in natural language materials, our results showed that activations in the left SMG, but not in the left posterior STG, were greater for assembled phonology than for addressed phonology.
These results suggest that the STG may not play a critical role in assembled phonology, and its activations in previous studies with natural languages might have been due to co-activations. It appears that the left SMG may be the main region involved in assembled phonology or orthography-phonology mapping.
Table 5. Cluster size (number of voxels) and cluster-level significance of brain regions for addressed phonology.
Another refinement our results may provide to the dual-route model concerns the role of semantics. In the dual-neural-route model of reading, Jobard et al. [27] have proposed that regions in the ventral pathway are responsible for semantic access. There is ample evidence for the involvement of the ventral pathway in semantic access, but it is not clear whether the pathway is involved only in semantic-based access. By training our participants only in orthography-phonology mapping (i.e., no semantics), we found that the ventral pathway (e.g., OFC, MTG) was involved in addressed phonology without the mediation of semantics. This finding poses an interesting question for future research: What are the overlapping and dissociated neural mechanisms between the semantic and nonsemantic routes in addressed phonology? This question can be addressed by comparing words trained with and without semantics. Results from such studies can help further refine the dual-route cascaded model (DRC) [3].
In addition to the refinements and specifications made to dual-route models of reading, the second contribution of our study is to provide direct experimental evidence for neural changes associated with the switch from assembled phonology to addressed phonology as a result of learning. Some dual-route models of reading [3,29] have proposed that practice can shift participants' reading strategies from sub-lexical (i.e., assembled phonology) to more automatic lexical reading (i.e., addressed phonology). In particular, the phonologies of low-frequency regular words in alphabetic languages are mainly accessed through the assembled phonology route, whereas those of high-frequency words are mainly accessed through the addressed phonology route. In support of this view, we found that, compared to untrained words, trained words elicited more activation in the addressed pathway and less activation in the assembled pathway. This result clearly shows the shift in the neural pathway of reading as a result of increased familiarity with the new words. These results might also help to reconcile mixed findings in previous natural language studies that could have involved the co-activations of the two reading routes.
Three limitations of this study should be discussed. First, this study used a between-subject design to avoid interference between the addressed and assembled conditions. Although we matched the two groups of participants carefully, they might still have differed in ways that could affect the results. Future studies should consider a within-subject design to confirm the findings in this study. Second, trained and untrained words in the assembled group differed in task difficulty. We tried to remove the potential effect of task difficulty in the fMRI analysis by adding a covariate of RT. However, such a statistical control cannot eliminate all potential confounds. Finally, the present study successfully identified the neural pathways associated with the addressed and assembled phonologies by relying on the methodological merits of an artificial language training paradigm. However, the artificial language used in this study was different from natural languages in several important aspects. These differences might limit the generalization of our findings to natural languages to some extent. First, unlike natural languages, the artificial language used in this study had only a limited vocabulary size, which would eliminate some well-documented effects such as the neighborhood effect and regularity effect, and would impede the acquisition of the inherent structures of words such as the combination of letters (i.e., bigram, trigram). Second, although participants' accuracy in naming the artificial language words was generally high, their reading speed was still much slower than word reading in their native language. In other words, the two reading mechanisms found in this study were based on an early stage of word reading. There is evidence that reading networks shift from the left temporoparietal cortex to the left occipitotemporal ventral areas with the improvement of reading skill [65,66].
Therefore, the early and late stages of reading might differ in the engagement of the two neural routes. Future research on natural language development would help to clarify this question. Finally, as noted before, the DRC model [3] has proposed that addressed phonology can be accessed either directly (i.e., nonsemantic route) or mediated by semantics (i.e., semantic route) in natural language. Because the artificial language used in this study had no semantics, we could not explore the neural substrate of the semantic route or examine the neural overlap and dissociation between the semantic and nonsemantic routes.
In sum, by using an artificial language training paradigm and thus overcoming the limitations of natural language materials, our study provides (1) a refined picture of neural substrates for addressed and assembled phonologies and (2) direct evidence of neural mechanisms involved in the strategy shift from assembled to addressed phonology during the process of learning to read.