
Language selective or non-selective in bilingual lexical access? It depends on lexical tones!

  • Xin Wang ,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Writing – original draft, Writing – review & editing

    x.wang1@mq.edu.au

    Affiliations Department of Linguistics, Faculty of Human Sciences, Macquarie University, Sydney, Australia, Department of Education, University of Oxford, Oxford, United Kingdom

  • Bronson Hui,

    Roles Data curation, Project administration, Writing – review & editing

    Affiliation Second Language Studies Program, Michigan State University, East Lansing, Michigan, United States of America

  • Siyu Chen

    Roles Data curation, Project administration, Software

    Affiliation Department of Languages and Linguistics, University of Greenwich, London, United Kingdom

Abstract

Much of the literature on bilingual spoken word recognition is based on bilinguals of non-tonal languages. In the Mandarin spoken word recognition literature, lexical tones are often considered equally important as segments in lexical processing, but it is unclear whether and how lexical tones contribute to bilingual language processing. One recent study demonstrated that tonal bilinguals require both tonal and segmental information to be available to induce cross-language lexical competition during bilingual lexical access, even without phonological overlap between the target and non-target language. The current study investigates whether overt phonological overlap between the target and non-target language would similarly require both tonal and segmental information to induce cross-language lexical competition. We employed two auditory lexical decision experiments with both Mandarin-English bilinguals and English monolinguals to test whether inter-lingual homophones (IHs) would induce lexical competition from the non-target language, L1 Mandarin. Our results show that cross-language lexical competition was observed only in the presence of lexical tones, in addition to segmental overlap.

Introduction

In the domain of spoken word recognition, recognizing an auditory stimulus is considered a process of matching spoken input with mental representations associated with word candidates, and then selecting the best candidate amongst those activated, each of which will be at least partially consistent with the input. Most theories of spoken word recognition (e.g., the Cohort model and the TRACE model) centre on the debate regarding how the sensory input activates lexical representations and how the best candidate is selected by eliminating alternatives [1–5]. Nevertheless, all current theories of spoken word recognition acknowledge the need to account for competition among candidates for lexical access and selection. These theories have been shaped by considerable evidence showing that words that reside in sparse phonological neighbourhoods (e.g., wolf has only a few phonologically similar words: woof, wooly, and wool) are recognized more easily than words that reside in dense phonological neighbourhoods (e.g., cat has many phonologically similar words: bat, at, cab, rat, chat, that, mat, cattle, pat, gnat, and so on) [6,7]. This difference indicates that our lexical processor needs not only to interpret the unfolding sensory input, but also to inhibit the activation of non-target candidates. This inhibitory effect, which is required for successful spoken word recognition, has been extended to issues in bilingualism in order to understand how bilinguals recognize spoken words in one language that sound similar to words in the other [8–12]. In general, two main questions are being explored in this area. First, when bilingual listeners hear spoken words (especially interlingual homophones), do they activate word candidates in both languages or only those in the target language? Second, do bilingual listeners use language-specific phonetic features to select the word candidates in the target language?

The first issue has been widely discussed and debated in the bilingual literature, especially in the areas of bilingual visual word recognition [13] and bilingual language production [14]. In general, the debate has centered on two opposing views: a language selective view, which predicts that linguistic input in one language should activate only the target language; and a language non-selective view, which predicts that linguistic input in one language can induce co-activation of both languages. As a result, a language selective view would imply separate lexicons for the two languages, while a language non-selective view suggests that the bilingual lexicon is integrated [15]. In the domain of visual word recognition, there is ample evidence showing that bilingual lexical access is language non-selective for recognizing inter-lingual homographs or cognates [16–20]. That is, visual input in one language can activate a bilingual’s other language, evidenced by faster or slower responses to inter-lingual homographs or cognates, depending on the task. In particular, cross-language masked priming studies have provided a strong test of language non-selectivity in bilingual lexical access for both within-script and cross-script readers, who effectively processed prime-target pairs in different languages even when they were unaware of the existence of the prime words [21–28].

Bilingual spoken word recognition

In the auditory domain, however, both empirical and modeling efforts to understand bilingual word recognition are more limited. It is less clear whether language non-selectivity equally applies in the auditory domain during bilingual lexical access, particularly given the rich sub-lexical cues encoded in auditory input. In fact, there are good reasons to test this, because visual input differs from auditory input in the following ways [10,11]. First, many languages, such as English and Spanish, use the same writing system, such that the visual input, especially cognates or inter-lingual homographs, does not differentiate language membership, while the auditory input contains acoustic-phonetic cues that do differentiate language membership (e.g., Spanish and English differ from each other in voice onset time). Second, the time course of stimulus presentation differs across the visual and auditory modalities: the letters of a written word are usually presented simultaneously, while spoken words unfold over time. In fact, some evidence suggests that bilinguals might adopt language-specific processing strategies in speech perception, on the basis of language-specific acoustic-phonetic cues [29,30]. With this in mind, it is reasonable to posit that bilinguals might use language-specific cues to guide lexical access.

One line of research has adopted the visual world paradigm to investigate whether phonologically similar words across languages can activate both languages even under a monolingual experimental situation [9], [12], [31]. In this experimental procedure, bilingual participants are instructed in either L1 or L2 to move or click on target objects in a visual array and their eye fixations to objects within the array are recorded while they perform this task. Typically, each display consists of 4 different objects: the target object (e.g. the speaker); the cross-language competitor whose name in the non-target language is phonologically similar to the target (e.g., ‘the match’ in Russian, pronounced as /spi:tʃki/); and two filler objects. For example, Spivey and Marian [31] reported strong cross-language competition effects using this paradigm, reflected by a larger proportion of fixations to the cross-language competitor objects than to the phonologically unrelated distractors. That is, bilingual participants looked at the competitors for longer (52% vs. 37% of the time) and/or more often (31% vs. 13%) than the phonologically unrelated distractors. These differences were interpreted as the result of the spoken input activating both the target and the cross-language competitor. In turn, this was taken as support of the language non-selective view.

However, other researchers have stressed the importance of other factors that might constrain language selectivity or non-selectivity in auditory word processing. First, Weber and Cutler [12] demonstrated that language selectivity or non-selectivity in bilingual word recognition might depend on language status (i.e., whether the target language is L1 or L2). Adopting the same eye-tracking technique, they found that Dutch-English bilinguals fixated longer on competitor objects only when the target language was L2 English (e.g., the English target kitten activated the Dutch competitor kist), but not when the target language was L1 Dutch (e.g., the Dutch target kist failed to activate the English competitor kitten). This asymmetry suggests that co-activation in bilingual word recognition is not unconditional, but rather depends on the characteristics of the auditory input. Second, using the same experimental paradigm, Ju and Luce [9] found that Spanish-English bilinguals fixated cross-language competitors (i.e., non-target objects whose English name pliers is phonologically similar to the Spanish target playa 'beach') more frequently than phonologically unrelated distractors (e.g., ojo 'eye'), but only when the Spanish target words were altered to have English-appropriate voice onset times (VOTs). These results suggest that bilingual listeners might be sensitive to language-specific cues that guide lexical access.

Other experimental paradigms have also been used to investigate the issue of language selectivity or non-selectivity in the auditory domain. For instance, in a cross-modal priming paradigm, Dutch-English bilinguals completed a lexical decision task on visual word targets that were preceded by auditory word primes [11]. The critical conditions for comparison were: (1) targets that were inter-lingual homophones of primes (IHs, e.g., lease primed by /liːs/, where lease sounds like lies ‘groin’ /liːs/ in Dutch); and (2) non-IH prime-target pairs (e.g., frame primed by /freɪm/, which is not a lexical item in Dutch). The authors found that reaction times in the IH condition were significantly slower than those for the non-IH targets. Given that longer reaction times reflect a larger cohort [32], the results showed that the cohort formed upon hearing the IH (e.g., /liːs/) was larger than that formed upon hearing the control prime (e.g., /freɪm/), suggesting that both the English and Dutch meanings of the inter-lingual homophones were activated. The inhibitory effect on the IH prime-target pairs was interpreted as cross-language lexical competition, the result of language non-selectivity.

Similar effects for IHs were also reported, in a more straightforward way, by Lagrou, Hartsuiker, and Duyck [10], whose Dutch-English bilinguals performed an auditory lexical decision task in both L1 Dutch and L2 English. In their experiments, Dutch-English bilinguals responded to IHs in Dutch (e.g., bij, pronounced similarly to ‘bye’ in English) more slowly than to non-IHs in Dutch (e.g., vol). However, monolingual Dutch listeners did not show this inhibitory effect on IH words. Along the same lines, they tested the same bilinguals with English IH and non-IH words in an auditory lexical decision task and obtained the same pattern relative to monolingual English listeners. These results were taken as evidence supporting language non-selective access in bilingual language processing, because the inhibitory effect was driven by cross-language lexical competition.

Furthermore, both Schulpen et al. [11] and Lagrou et al. [33] investigated whether sub-lexical cues (i.e., language-specific acoustic information, such as accent) would modulate cross-language activation so as to guide lexical access. They found that bilinguals were sensitive to language-specific sub-phonemic cues, but lexical access was still language non-selective. That is, both languages were active and competing for access during processing, regardless of accent. Interestingly, the same authors reported that bilingual auditory word recognition in a sentence context was modulated by semantic constraints and speaker accent, although neither restricted cross-language activation [33]. These results suggest that language-specific speech cues are exploited by bilingual listeners to guide lexical access, without eliminating cross-language competition.

Thus, an overview of the relevant literature seems to suggest that bilingual lexical access in the auditory modality is largely language non-selective, but is also sensitive to language-specific phonetic cues that can be used to guide lexical access. In particular, an interesting question to ask is the extent to which bilingual listeners use the language-specific speech cues in cross-language lexical competition and access. To date, the majority of research has primarily focused on segmental information (e.g., phonemes) in triggering language co-activation (i.e. cross-language lexical activation/competition); it remains underexplored whether and to what extent supra-segmental information (e.g., lexical tones) would affect word recognition in bilinguals of tonal languages. Thus, the primary objective of the present study is to investigate the role of lexical tone representation in cross-language processing in tonal bilinguals (e.g., Mandarin-English bilinguals).

Lexical tone processing in monolinguals and bilinguals

Belonging to the Sino-Tibetan language family, Mandarin Chinese is a tonal monosyllabic language, which utilizes four different tones to disambiguate lexical meanings [34]. For example, the Mandarin word ma can refer to “mother” when pronounced in Tone 1 (high flat), “hemp” when pronounced in Tone 2 (rising), “horse” when pronounced in Tone 3 (low dipping) and “scold” when pronounced in Tone 4 (falling). Thus, lexical access in Mandarin needs to involve both segmental information (i.e., consonants and vowels) and supra-segmental information (i.e., lexical tones). By contrast, in English, pitch contours do not alter lexical meanings. In addition, Mandarin syllables predominantly follow a CV (consonant-vowel) structure, with the only exceptions being syllables which include the nasals /n/ and /ŋ/ in the coda position [35]. Because a Mandarin syllable can be articulated in as many as four tones, most tonal syllables are homophones of other morphemes/words/characters. Given the limited number of legal syllables in Mandarin, one syllable is shared by eleven characters on average [36].

Because Mandarin tones are lexical, it has been reported in the literature that they are a critical cue in constraining spoken word recognition [37–41]. In an auditory priming paradigm, where participants performed lexical decision on auditory targets following auditory primes, Lee [37] investigated the role of lexical tones in word recognition. There were four conditions, differing in the phonological relationship between primes and targets: 1) prime-target overlap in only segmental information (e.g., lou3 ‘hug’—lou2 ‘hall’); 2) prime-target overlap in only tone (e.g., cang2 ‘hide’—lou2 ‘hall’); 3) prime-target overlap in both segments and tone (e.g., lou2 ‘hall’—lou2 ‘hall’); and 4) no prime-target overlap (e.g., pan1 ‘climb’—lou2 ‘hall’). Under two different inter-stimulus intervals (ISIs: 250 ms and 50 ms), reliable priming was observed only in the condition where both tones and segments overlapped between primes and targets. The author concluded that segmental overlap alone was not sufficient to facilitate target recognition, and thus that lexical tones were critical in producing priming.

Further, Malins and Joanisse [38] used the visual world paradigm to compare the role of tonal information and segmental information in Mandarin spoken word recognition. Native Mandarin speakers were presented with auditory Mandarin words and instructed to press a button on a keypad corresponding to the position of a target picture in a visual array. The visual stimulus display consisted of the target (e.g., chuang2 ‘bed’), a competitor item whose name overlapped phonologically with the target in either segments (e.g., chuang1 ‘window’), onset and tone (e.g., chuan2, ‘ship’), rhyme and tone (e.g., huang2 ‘yellow’), or only tone (e.g., niu2 ‘cow’), in addition to two unrelated distractors. The authors found that the time course over which listeners resolved competition between items differing in segments was comparable to that over which listeners resolved competition between items differing in tone. Following this, they concluded that tones not only constrained the cohort size, but also played a role that was comparable to segmental information in Mandarin spoken word recognition.

Therefore, it is clear that lexical tones provide important independent cues for lexical access within a tonal language, yet there is limited evidence showing what role lexical tones play in cross-language processing. In fact, research has primarily explored how tonal bilinguals process a non-tonal language differently at the perceptual level due to their extensive experience with lexical tones [42,43]. For instance, Ortega-Llebaria et al. [43] demonstrated that Mandarin speakers of English were more sensitive to F0 than Spanish speakers of English and native English speakers when recognizing English words; in addition, Mandarin speakers were faster at retrieving a falling F0 than a rising F0 in an English lexical decision task. This study suggests that tonal bilinguals process pitch contours differently even in a non-tonal language, compared to non-tonal bilinguals and monolinguals. However, this evidence does not offer insights into how tonal information would affect word recognition in the non-tonal language.

One recent study tapping into the effect of lexical tones on cross-language processing at the lexical level employed a translation task with Mandarin-English bilinguals [44]. Tonal bilinguals were instructed to select the correct Mandarin translations, visually presented on the computer screen, of auditory English words. The pitch contours of the auditory English words were manipulated to either match or mismatch the tones of their Mandarin translations, and the bilinguals were found to be sensitive to this manipulation. While one could argue that the tonal effect in the translation task was partly driven by the visually presented Chinese characters, another recent study demonstrated a tonal effect in cross-language processing in a more straightforward but implicit way. Wang, Wang, and Malins [45] demonstrated that Mandarin-English bilinguals implicitly accessed their L1 Mandarin words when recognizing auditory English words in the visual world paradigm, and that this cross-language lexical activation/competition was sensitive to lexical tones. For example, when Mandarin-English bilinguals listened to the word ‘rain’, whose Mandarin translation is yu3, they were instructed to pick the target picture of rain from an array of 4 pictures on the computer screen. Among the 4 pictures was a competitor whose Mandarin name was either a homophone of the Mandarin translation of the target (e.g., ‘feather’–yu3) or an item that overlapped with the Mandarin translation in segments but not in tones (e.g., ‘fish’—yu2). The eye-movement data showed a significant effect only in the homophone condition, where both segments and tones overlapped with the Mandarin translations of the targets: participants fixated the competitors (e.g., ‘feather’) before picking the targets (e.g., ‘rain’). This competition effect was interpreted as lexical competition from the competitor picture, with the source of the lexical activation in Mandarin being the phonological overlap between the competitor and the target translation in both segments and tones. This evidence indicated that Mandarin-English bilinguals implicitly accessed their L1 Mandarin when recognizing L2 words and that lexical activation in L1 Mandarin was driven by both segments and tones. This study was the first to demonstrate a cross-language tonal effect in English-only input, a non-tonal language, without any manipulation of pitch contour or phonological overlap between the target and non-target language.

Thus, it remains unclear whether a pitch contour on an English word would have any effect on cross-language lexical activation in Mandarin in a direct and explicit way, without involving a translation process. One way to address this issue is to test whether words pronounced similarly or identically across languages can elicit language co-activation with or without tones, as in Schulpen et al. [11] and Lagrou et al. [10,33], where cross-language homophones elicited lexical competition in non-tonal bilinguals. Homophones are words that have the same pronunciation but differ in meaning, spelling, or grammatical class [46]. In the same vein, inter-lingual homophones (IHs) are words that are pronounced similarly across languages but differ in meaning, spelling, or grammatical class. Thus, for Mandarin-English bilinguals, IHs share segments but not tones. The main purpose of the current study is to investigate whether lexical tones are crucial to cross-language lexical activation/competition when tonal bilinguals are exclusively processing a non-tonal language. Given the compelling evidence within and across languages that supra-segmental information is crucial in activating Mandarin words, we hypothesize that Mandarin-English inter-lingual homophones sharing only segments are not sufficient to trigger parallel language activation as in non-tonal bilinguals, and that lexical tonal information needs to be available along with segments.

The present study

We aim to test whether the presence of lexical tones is critical to cross-language lexical competition; to this end, tone can be manipulated on English target words in order to measure lexical activation in the non-target tonal language (Mandarin). Experiments 1–2 were designed to compare results from Mandarin-English bilinguals in order to test whether IHs with versus without lexical tones would produce cross-language inhibitory effects similar to those in Lagrou et al. [10]. We instructed bilingual participants to identify words in their L2 English and presented inter-lingual homophones either as naturally produced native English words (Experiment 1) or as English words superimposed with lexical tones (Experiment 2). We hypothesize that lexical tones are obligatory in guiding lexical access to L1 Mandarin; thus, IHs with lexical tones should induce cross-language lexical competition, whereas IHs without lexical tones should not. In order to confirm that any such difference is due to bilinguals’ knowledge of Mandarin, we also tested native English speakers, who should show no difference under this manipulation of pitch contour. In other words, monolingual English listeners should behave the same regardless of whether IHs are natural or superimposed with lexical tones, as they have no knowledge of Mandarin. Note that our logic is not to compare Mandarin-English bilinguals and English monolinguals within Experiment 1 or 2, but to compare within bilinguals and within monolinguals across experiments, to show that the presence of lexical tones produced a cross-language lexical competition effect in bilinguals but not in monolinguals.

Experiment 1 (Auditory lexical decision task)

Materials and methods

Participants.

Twenty-three Mandarin-English bilinguals and 22 English monolinguals from the University of Oxford participated in Experiment 1 for monetary compensation. All participants provided written informed consent for the study, which was approved by the Departmental Research Ethics Committee (DREC) in accordance with the procedures prescribed by Oxford University for ethical approval of all research involving humans (CUREC). The bilingual participants were asked to rate their English proficiency with respect to the four skills (i.e., speaking, listening, writing, and reading) on a Likert scale from 1 (very poor) to 7 (native-like), and to report the IELTS scores with which they were admitted to study in the UK. Means and SDs are reported in Table 1. The bilingual participants were not informed that their L1 Mandarin knowledge would be relevant to the experiment. The whole experiment was conducted in English.

Table 1. Means (SD) based on bilingual participants’ self-ratings of their English language skills on a 1–7 Likert scale (1 = very poor and 7 = native like) and IELTS scores in Experiments 1 and 2.

https://doi.org/10.1371/journal.pone.0230412.t001

Stimuli.

Three types of stimuli were selected for the study: interlingual homophones (IHs), non-interlingual homophones (non-IHs), and non-words. In selecting interlingual homophones, a systematic comparison of Mandarin and English phonemes yielded 24 pairs of phonemes that were considered sufficiently similar across Mandarin and English, comprising 10 vowels and 14 consonants [47]. There could, in theory, be 140 CV syllables (14 consonants × 10 vowels) in Mandarin sounding similar to English syllables. We (the authors, highly proficient Mandarin-English bilinguals) compiled a table of which of these possible CV syllables are permissible and sound similar across Mandarin and English (see S1 Appendix). Among these, 50 are legal in both languages, 50 are legal only in Mandarin, 17 are legal only in English, and 23 are illegal in both languages. In Mandarin, each syllable can be articulated with different lexical tones; taken together, 150 meaningful tonal syllables in Mandarin correspond to the 50 meaningful syllables in English (e.g., both 法 ‘law’, pronounced as /fa/ in Tone 3, and 发 ‘hair’, pronounced as /fa/ in Tone 4, correspond to far in English). In addition, these Mandarin syllables (words) are semantically unrelated to their counterparts in English.

These 50 IH items were then rated by 5 highly proficient Mandarin-English bilinguals on a Likert scale from 1 (completely different) to 7 (the same) for the similarity of their pronunciation between Mandarin and English. All the raters were native speakers of Mandarin and had received undergraduate and postgraduate training in General Linguistics or English Linguistics, such that they had some training in assessing the linguistic similarity between two languages. They were given English monosyllabic words (e.g., bay) and their counterparts in Mandarin (e.g., 被 /bei4/ ‘quilt’) and assessed how similar they were after reading each pair aloud in both languages. In the similarity rating, the most frequent Chinese character (word) among characters with the same segments/syllable but different tones, based on the SUBTLEX-CH database [48], was chosen to represent the Mandarin counterpart. Eventually, we selected the 37 interlingual homophones (e.g., me—/mi4/ as in 密 ‘secret’) with a mean rating of 4.0 or above across the 5 bilingual raters (see S2 Appendix). Another 37 monosyllabic non-IHs (e.g., sale) were selected and matched item by item in frequency and phonological neighborhood density with the IHs, based on the Irvine Phonotactic Online Dictionary (IPhOD) database [49] (see Table 2 & S3 Appendix). An independent t-test confirmed that there was no statistically significant difference in log frequency (t(72) = .05, p = .96) between the IHs (M = 3.12, SE = 2.56) and the non-IHs (M = 3.11, SE = 2.57). There was also no statistical difference in phonological neighborhood density (t(72) = .99, p = .32) between the IHs (M = 39.49, SE = 1.18) and the non-IHs (M = 39.65, SE = 4.42). Controlling the number of phonemes was unrealistic because there are not enough English words with CV (consonant-vowel) structures to serve as non-IHs once both frequency and phonological neighborhood density are matched. This limitation is due to the fact that most Mandarin syllables do not have a coda, while English monosyllabic words with a CV structure are less common [35]. As a result, the IHs consist of 8 items of 3 phonemes and 29 items of 2 phonemes; by contrast, the non-IHs all contain 3 phonemes. Nasal codas, like /n/, were counted as independent phonemes, not as part of the vowel in the CV syllables. In addition, 74 monosyllabic non-words, all containing 3 phonemes, were generated from the same database [49].
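For transparency, the matching check reported above can be reproduced with standard t-tests in R. The following is a minimal sketch, in which the file name and column names (type, log_freq, pnd) are illustrative assumptions, not the authors' actual scripts.

```r
# Hypothetical stimulus table: one row per item with columns
#   type      "IH" or "nonIH"
#   log_freq  log-transformed word frequency (IPhOD)
#   pnd       phonological neighbourhood density (IPhOD)
stim <- read.csv("stimuli.csv")

# Independent two-sample t-tests with pooled variance (df = 72 for
# 37 + 37 items); both comparisons should be non-significant (p > .05)
t.test(log_freq ~ type, data = stim, var.equal = TRUE)
t.test(pnd ~ type, data = stim, var.equal = TRUE)
```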

Table 2. Stimulus characteristics of Experiments 1–2: Means (SD) of log frequency, phonological neighborhood density (PND), and number of phonemes.

https://doi.org/10.1371/journal.pone.0230412.t002

Auditory recording.

The speaker who recorded the stimuli was a 25-year-old, highly proficient, female simultaneous Mandarin-English bilingual. She grew up in a household where her father was a native speaker of English and her mother a native speaker of Mandarin; as a result, she reported acquiring both languages simultaneously, speaking mostly Mandarin at home and mostly English at school. To ensure that the selected bilingual could produce native-like English words, we asked her to read aloud an English passage and recorded her voice. Six native English speakers judged the recording on a 1–5 Likert scale (1 = native English speaker, no accent; 5 = strong foreign accent) for the nativeness of her English. Out of the six raters, five rated her English as 1, and one rated it 2. All stimuli were recorded at 44.1 kHz in a quiet room using the open source software Audacity, version 2.0.3 [50]. Prior to the actual recording, the speaker was given time to familiarize herself with the stimuli and read them aloud for practice. All tokens were trimmed for programming purposes and normalized to a peak amplitude of -1.0 dB.

Procedure.

Participants were tested in a quiet room, wearing a headset, in front of a testing computer. Prior to the experiment, they were given written instructions for the task in English. Each trial started with a 500 ms fixation cross (+) at the center of the screen (black, font size 30, on a white background), followed by the presentation of the auditory stimulus through the headset. Participants were instructed to press YES if the auditory stimulus was a word in English, and NO otherwise. Visual feedback, either ‘Correct!’ or ‘Incorrect’ as appropriate, was presented at the bottom of the screen for 200 ms immediately after the response. The inter-trial interval was 250 ms. Responses were recorded, and reaction times (RTs) were measured from word onset until the motor response on the YES or NO key on the keyboard. Stimuli were presented in a random order programmed in E-Prime 2.0 [51].

Results and discussion

Participants who made errors on more than 30% of the total trials were excluded from the analysis. As a result, 22 of the 23 bilingual participants and all 22 monolingual participants were included in the final analysis. Statistical analyses were performed using linear mixed-effects models [52,53]. Unlike more traditional ANOVAs, mixed-effects models take raw, unaveraged data as input and incorporate random effects of both participants and items within a single analysis. In addition, we employed maximal random-effect structures in the models and included random slopes for repeated-measures factors [54], to avoid Type I errors. The fixed-effect factors were Word Type (IHs vs. non-IHs) and Group (bilinguals vs. monolinguals); subjects and items were random factors. Models were fitted with the lme4 package (version 1.1-7) and the lmerTest package in R (version 3.1.0; The R Foundation for Statistical Computing, 2008). For the error analysis, the binomial function (i.e., glmer) was used to assess the statistical significance of error rates across conditions; for the reaction time analysis, the lmer function was used to assess the statistical significance of response times across conditions. Following standard conventions, any t-value greater than 2.0 or p-value smaller than .05 was deemed significant.
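As a concrete illustration of the analysis just described, the R sketch below fits the error and reaction time models with maximal random-effect structures. The data frame dat and its column names are assumptions for exposition, not the authors' code; Word Type varies within subjects but between items, and Group varies between subjects but within items, which determines the random slopes.

```r
library(lme4)
library(lmerTest)  # adds p-values for lmer fixed effects

# dat (hypothetical): one row per trial, with columns
#   error      1 = incorrect, 0 = correct response
#   rt         offset reaction time in ms
#   word_type  IH vs. non-IH (within subjects, between items)
#   group      bilingual vs. monolingual (between subjects, within items)
#   subject, item   random-effect grouping factors

# Error rates: logistic mixed-effects model
m_err <- glmer(error ~ word_type * group +
                 (1 + word_type | subject) + (1 + group | item),
               data = dat, family = binomial)

# Reaction times: linear mixed-effects model on correct trials
m_rt <- lmer(rt ~ word_type * group +
               (1 + word_type | subject) + (1 + group | item),
             data = subset(dat, error == 0))
summary(m_rt)
```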

Because non-words were not of theoretical interest in the lexical decision task, we present and analyze only the data reflecting our factorial design: Group (monolingual vs. bilingual) x Word Type (IH vs. non-IH). Mean error rates and response times for IH and non-IH words are presented in Table 3. The overall analysis of error rates showed neither a main effect of Group (z = .38, p = .71) nor a main effect of Word Type (z = 1.72, p = .09). There was a marginal interaction between Group and Word Type (z = 1.85, p = .06). Restricting the analysis to each group, the error rates of the bilingual results showed no statistical difference between the non-IH and IH conditions (z = 1.6, p = .11). Similarly, the error analysis of the monolingual results showed no significant difference between the non-IH and IH conditions (z = .04, p = .97). These results indicate that neither group found either type of word particularly difficult to process; however, the non-IH words were slightly more difficult (compared to the IH words) for bilinguals than for monolinguals (i.e., the interaction).

Table 3. Mean offset reaction times (RTs in milliseconds) (SD) and error rates (ERs in percentages) (SD) (Experiment 1 & 2).

https://doi.org/10.1371/journal.pone.0230412.t003

Analyses of reaction times were based on responses measured from word offsets. The offset measure was chosen because of the variation in the duration of the critical word stimuli (Min. = 346 ms, Max. = 981 ms, Mean = 617 ms, SD = 133 ms); the average word durations for IHs, non-IHs, and nonwords were 555 ms, 681 ms, and 930 ms, respectively. The offset measure was therefore considered more sensitive than the onset measure, being free from the confounding variation in stimulus durations [55]. Offset RTs were calculated as the difference between the latencies logged by E-Prime (i.e., from stimulus onset to the motor response on the keyboard) and the duration of the stimuli as measured in Praat [56]. Reaction times more than 2 SD above or below the mean were excluded from the analysis (3%), as were trials on which an error occurred (16.1%).
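The offset measure and trimming procedure amount to a few lines of R; again, the column names (rt_onset, duration) are hypothetical.

```r
# Offset RT = onset-to-response latency (logged by E-Prime) minus the
# stimulus duration (measured in Praat)
dat$rt_offset <- dat$rt_onset - dat$duration

# Drop error trials, then trim offset RTs beyond 2 SD of the mean
correct <- subset(dat, error == 0)
m <- mean(correct$rt_offset)
s <- sd(correct$rt_offset)
trimmed <- subset(correct, abs(rt_offset - m) < 2 * s)
```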

The overall maximal mixed-effects analysis of the RTs showed a main effect of Group (t = 2.71, p < .01) and an interaction between Group and Word Type (t = 2.87, p < .01). However, there was no main effect of Word Type (t = 0.097, p = .93). These results show that the bilingual participants responded to L2 English spoken targets much more slowly than their monolingual counterparts, and that the two groups responded to the experimental manipulation (IH vs. non-IH) differently. Restricting the analysis to just the bilingual group, the mixed-effects analysis of the RTs showed no main effect of Word Type (t = .18, p = .86). Thus, bilinguals treated IHs and non-IHs similarly, showing no evidence of activating the Mandarin lexicon in a way that interfered with lexical access in English. However, restricting the analysis to just the monolingual group, the mixed-effects analysis of the RTs showed a main effect of Word Type (t = 2.29, p = .025), indicating that monolinguals responded to IHs more slowly than to non-IHs. The outcome of the statistical models for both groups is presented in Table 4.

Table 4. Linear mixed-effects analysis results for Experiment 1.

https://doi.org/10.1371/journal.pone.0230412.t004

It was predicted that if bilingual lexical access is language non-selective, the bilinguals would respond more slowly to the IHs than to the non-IHs (i.e., showing IH inhibitory effects due to cross-language lexical competition). On the other hand, if lexical access is language selective, IH effects would not be observed. The current results show a clear pattern in favor of language selectivity. The critical result is that bilinguals did not show any disadvantage in responding to IHs, which contradicts previous results [10,11], [33] and indicates that lexical access was language-specific in the current experiment. In addition, bilinguals and monolinguals showed a contrast in responding to IHs vs. non-IHs in the above analysis, as monolinguals showed an advantage in processing the non-IHs over the IH items. It is safe to rule out cross-language lexical competition as the source of this inhibitory effect on the IHs, because the monolinguals had no knowledge of Mandarin. We believe the difference shown by monolinguals is instead due to the characteristics of our stimuli in the IH and non-IH conditions: most of the IHs are open syllables, to match their Mandarin counterparts, while most of the non-IHs are closed syllables matched on the other psycholinguistic variables. Given the constraints on selecting appropriate stimuli in both languages, this difference in phonotactic structure was impossible to avoid, because English has quite a small number of open monosyllabic words while Mandarin is a predominantly open-syllable language. On the other hand, this contrast indicates that bilinguals process their L2 English differently from monolinguals; otherwise, they too should have shown inhibitory effects on the IHs. We return to this in the General Discussion, with regard to whether and how the inhibitory effect observed in monolinguals affects our interpretation of the results.

Thus, Experiment 1 did not show cross-language lexical competition effects with inter-lingual homophones when lexical tones were absent. To further investigate this, Experiment 2 was designed to understand whether the same stimuli would produce cross-language lexical competition with the presence of lexical tones. In order to test this hypothesis, we superimposed Mandarin tones onto the English words and non-words in Experiment 2.

Experiment 2 (Toned auditory lexical decision task)

Materials and methods

Participants.

Participants were recruited from the same population as in Experiment 1, consisting of 22 Mandarin-English bilinguals and 21 English monolinguals. Self-reported English proficiency with respect to the four skills (i.e., speaking, listening, writing, and reading) and IELTS scores are reported in Table 1. Two-sample independent t-tests showed that the reported proficiency and the IELTS scores in this sample were not statistically different from those in Experiment 1 (Table 1).

Stimuli.

The materials and design of Experiment 2 were the same as in Experiment 1, except that Mandarin tones were superimposed onto all stimuli. For the IHs, the tone chosen for superimposition was that of the most frequent Chinese character corresponding to the English IH in pronunciation, based on the SUBTLEX-CH database [48]. For example, the IH my, /maɪ/, corresponds to at least nine Chinese characters, the most frequent of which is 买 ‘to buy’ /maɪ3/ (Pinyin: mai3); thus, the third tone was chosen for the superimposition of my. As each non-IH was selected to match an IH item-by-item in both frequency and phonological neighbourhood density, we superimposed onto it the same tone used for its matched IH. The four Mandarin tones were assigned randomly to the nonwords.
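The tone-assignment rule is in effect a frequency lookup. A minimal sketch in R, assuming a hypothetical table subtlex derived from SUBTLEX-CH with one row per tonal syllable:

```r
# subtlex (hypothetical): one row per tonal syllable, with columns
#   syllable  segmental syllable, e.g., "mai"
#   tone      lexical tone (1-4)
#   freq      SUBTLEX-CH frequency of the most common character with
#             that syllable-tone combination
assign_tone <- function(syll, subtlex) {
  cand <- subtlex[subtlex$syllable == syll, ]
  cand$tone[which.max(cand$freq)]
}

# e.g., assign_tone("mai", subtlex) returns 3 if mai3 (买 'to buy')
# is the most frequent character sharing that syllable
```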

Auditory recording.

To test whether native speakers of Mandarin are sensitive to tonal information when processing English, we superimposed the Mandarin tones of the IHs onto the English stimuli in Experiment 2. That is, each homophone was superimposed with the Mandarin tone corresponding to that of its inter-lingual counterpart. Tokens were recorded by the same simultaneous bilingual speaker as in Experiment 1. To create the experimental tokens, we followed the same tone-superimposition procedure adopted in [44], and used natural rather than synthesized speech, as synthesized speech has been reported to impose more difficulty on bilingual listeners [57]. The tone superimposition was implemented as follows. A native Mandarin speaker (the experimenter) trained the Mandarin-English bilingual to produce the 4 Mandarin tones on a given syllable, to ensure that the speaker could produce Mandarin tones naturally. Because it is easier and more consistent to produce Mandarin tones on novel English syllables in a fixed order, the speaker produced the four tones in the Tone 1 to Tone 4 sequence for each syllable. After some training and practice, the Mandarin-English bilingual was comfortable and proficient in producing English words with the 4 Mandarin tones. In addition, prior to the recording session, a sequence of the four Mandarin tones on a novel syllable /pha/ [Pinyin: pa] (i.e., pa1, pa2, pa3, and pa4) was played to the speaker as an example to follow. Thus, the bilingual speaker pronounced each given English word and non-word, and then pronounced each word/non-word with the 4 Mandarin tones in sequence from Tone 1 to Tone 4. Throughout the entire recording session, no Mandarin was used at all. Furthermore, the recordings were independently judged by another native speaker of Mandarin, and tonal tokens judged ‘awkward’ were re-recorded; these re-recorded items were all nonwords. Only toned syllables matching the experimental items and design were used in testing. All the auditory stimuli were trimmed for programming purposes and normalized so that all tokens had the same peak amplitude (-1.0 dB).

Procedure.

The procedure of Experiment 2 was the same as in Experiment 1.

Results and discussion

The data trimming and analysis procedures were the same as in Experiment 1. As a result, 1 bilingual participant and 1 monolingual participant were excluded from the analyses because their error rates exceeded 30%. Mean response times and error rates for IHs and non-IHs are presented in Table 3, alongside the results from Experiment 1. The error rate analysis of the bilingual results showed no statistical difference between the IH and non-IH conditions (z = 1.67, p = .094). However, the monolingual results showed a significant difference between the IH and non-IH conditions (z = 3.20, p < .01). This means that monolinguals found the IH words more difficult to process than the non-IH words, while this was not the case for bilinguals. A combined analysis of both groups' error rates showed a marginal main effect of Group (z = 1.89, p = .059), but neither a main effect of Word Type (z = 1.64, p = .10) nor an interaction between Word Type and Group (z = 1.58, p = .11). These results indicate that bilinguals encountered more difficulty during lexical processing in their L2 than monolinguals did.

As in Experiment 1, the reaction time analysis was based on offset RTs. Reaction times more than 2 SD above or below the mean were excluded from the analysis (2.2%), as were trials on which an error occurred (21.6%). The overall maximal mixed-effects analysis of the RTs showed main effects of Word Type (t = 2.57, p = .01) and Group (t = 3.58, p < .001). There was no interaction between Word Type and Group (t = .22, p = .82). As in Experiment 1, these results showed that the bilingual participants responded to L2 English spoken targets much more slowly than their monolingual counterparts. However, unlike in Experiment 1, the IH items became more difficult for bilinguals in the presence of lexical tones, compared to the non-IH items (a 125 ms difference). Restricting the analysis to just the bilingual group, the mixed-effects analysis of the RTs showed a main effect of Word Type (t = 2.47, p = .017). Unlike in Experiment 1, with lexical tones present, the bilinguals treated IHs and non-IHs differently, responding to the IH words significantly more slowly than to the non-IH words. This result suggests that the Mandarin lexicon was activated and created lexical competition during the processing of English words, consistent with previous studies [10,11], [33]. In other words, lexical tones are a critical cue for lexical access in the non-target Mandarin when Mandarin-English listeners are processing English only. Restricting the analysis to just the monolingual group, the mixed-effects analysis of the RTs showed a main effect of Word Type (t = 2.88, p < .01), indicating that the monolinguals responded to the IHs much more slowly than to the non-IHs, similar to the monolingual results in Experiment 1. The outcome of the statistical models is presented in Table 5.

Table 5. Linear mixed-effects analysis results for Experiment 2.

https://doi.org/10.1371/journal.pone.0230412.t005

Combined analysis of Experiment 1 and 2

To confirm our hypothesis, it is crucial to demonstrate that the superimposed tones altered responses to the different word types for bilinguals but not for monolinguals. That is, lexical tones superimposed onto English words should have activated lexical representations in Mandarin, inducing lexical competition for IHs but not for non-IHs in bilinguals; as a result, bilinguals' responses to the IH words should be slowed relative to the non-IH words in Experiment 2 but not in Experiment 1. In addition, this change, which depends on knowledge of the non-target language (Mandarin), should not be observed in monolinguals. Thus, it is critical to show a three-way interaction in our statistical model across experiments: Word Type x Group x Tone, where ‘Tone’ is a coded variable indicating that the items in Experiment 1 were untoned and those in Experiment 2 were toned.

First, we ran a linear mixed-effects analysis, with maximal random-effect structures, to determine whether the response differences between IHs and non-IHs, with or without lexical tones, differed between bilinguals and monolinguals. That is, to confirm our hypothesis that the toned words elicited lexical competition for bilinguals but not monolinguals, we would need to demonstrate a three-way Tone * Word Type * Group interaction. In the error analysis, there was a main effect of Tone (z = 3.10, p < .01), suggesting that superimposed tones induced more overall difficulty in lexical processing, and a Tone * Word Type interaction (z = 2.98, p < .01), suggesting that superimposed tones induced more difficulty in processing IH words than non-IH words. However, there was no three-way interaction in the error analysis (z = .30, p = .76), suggesting that the degree of difficulty in lexical processing was comparable for both groups across Experiments 1 and 2 for both IH and non-IH words. In the reaction time analysis, as shown in Table 6, there were main effects of Group (t = 4.44, p < .001), Tone (t = 4.30, p < .001), and Word Type (t = 3.14, p < .01). In addition, there were interactions of Group * Tone (t = 2.04, p < .05) and Tone * Word Type (t = 3.40, p < .01), indicating that monolinguals and bilinguals behaved differently when perceiving superimposed tones and that this tonal manipulation altered responses to IH words relative to non-IH words. Importantly, we observed a three-way Group x Tone x Word Type interaction (t = 2.15, p = .035), indicating that lexical tones slowed bilinguals' responses to IHs relative to non-IHs, but did so far less for monolinguals (a sketch of this combined model is given below Table 6). These results confirm our hypothesis that superimposed lexical tones guided bilingual lexical access to the non-target language, Mandarin, and slowed responses to IHs as a consequence of cross-language lexical competition.

Table 6. Linear mixed-effects analysis results for both groups in Experiment 1 and 2.

https://doi.org/10.1371/journal.pone.0230412.t006
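Sketched in the same hypothetical R notation as before, the combined cross-experiment model adds Tone as a factor; the critical fixed-effect term is the three-way interaction. Because subjects took part in only one experiment, Tone varies between subjects, while the same items appear both toned and untoned, so Tone and Group vary within items.

```r
# dat_all (hypothetical): trials from both experiments; tone codes
# whether the stimuli carried superimposed Mandarin tones (Exp 2)
# or not (Exp 1)
m_all <- lmer(rt ~ group * tone * word_type +
                (1 + word_type | subject) +
                (1 + group * tone | item),  # maximal; simplify if it
              data = dat_all)               # fails to converge
summary(m_all)  # group:tone:word_type is the critical term
```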

Second, we ran a linear mixed-effects analysis on the bilingual data to determine whether the absence versus presence of lexical tones on the same syllables produced any significant change between Experiment 1 (without tones) and Experiment 2 (with tones). The bilingual results showed a strong interaction between Word Type (IH vs. non-IH) and Tone in both the error rate and reaction time analyses (z = 2.79, p < .01 in error rates; t = 3.09, p < .01 in reaction times), with a main effect of Tone in both analyses (z = 3.16, p < .01; t = 3.85, p < .001), and a main effect of Word Type in the reaction time analysis only (t = 2.67, p < .01), as shown in Table 7. In particular, the interaction indicated that the toned English words significantly slowed bilinguals' responses to the IH words relative to the non-IH words. These results suggest that the toned words elicited cross-language competition in lexical processing, confirming that the presence of lexical tones in Experiment 2 altered bilinguals' responses to IHs relative to non-IHs.

Table 7. Linear mixed-effects analysis results for bilinguals in Experiment 1 and 2.

https://doi.org/10.1371/journal.pone.0230412.t007

Finally, we ran a linear mixed-effects analysis on the monolingual data. The results showed a main effect of Word Type in both the error and reaction time analyses (z = 3.14, p < .01 and t = 2.95, p < .01). However, there was neither a main effect of Tone (t = 1.61, p = .11) nor an interaction between Word Type and Tone (t = 1.19, p = .24) in the reaction time analysis, as shown in Table 8. In the error analysis, there was a main effect of Tone (z = 4.92, p < .001), as well as an interaction between Word Type and Tone (z = 3.37, p < .001). These results indicate that monolinguals' reaction time pattern for IHs vs. non-IHs remained the same regardless of whether the items were presented with or without lexical tones.

Table 8. Linear Mixed-Effects analysis results for monolinguals in Experiment 1 and 2.

https://doi.org/10.1371/journal.pone.0230412.t008

General discussion

The current study was designed to address the role of lexical tones in bilingual spoken word recognition. In Experiment 1, we manipulated the phonological overlap between Mandarin and English words to test whether Mandarin-English bilinguals were sensitive to segmental overlap, such that it would induce language co-activation when listening to English words only. The results showed no main effect of inter-lingual homophones (IHs) for bilinguals, but a main effect of IHs for monolinguals. If language co-activation had occurred, we would have expected IH inhibitory effects for bilinguals due to cross-language lexical competition from L1 Mandarin. In addition, the delay observed for IHs in monolinguals could not be due to cross-language lexical competition, because the English monolinguals had no knowledge of Mandarin. Clearly, bilinguals and monolinguals treated the same stimuli differently, which suggests that they use different processing strategies when listening to the same items in English; we return to this difference below. In Experiment 2, we superimposed lexical tones onto the same English words used in Experiment 1. It was predicted that if lexical access was language non-selective in the presence of lexical tones, the bilinguals would demonstrate IH inhibitory effects, as in previous studies [10]. In light of the findings of Experiment 1, the monolinguals should produce a pattern similar to that experiment, as lexical tones should not affect monolinguals. The results confirmed these predictions: IH inhibitory effects emerged in bilinguals, and monolinguals continued to respond to the non-IHs more quickly. To summarize, we observed lexical competition from the non-target language (L1 Mandarin) only in the presence of lexical tones when Mandarin-English bilinguals recognized the target language (L2 English) in the auditory modality. These results point to the critical role of lexical tones in guiding lexical access to a tonal language when bilinguals are processing a non-tonal language. In the case of Mandarin-English bilinguals, whether lexical access is language selective (as in Experiment 1) or language non-selective (as in Experiment 2) can be constrained by language-specific features (i.e., lexical tones).

How do we account for the delay on IHs observed in monolinguals in both experiments? Importantly, this pattern was consistent across Experiments 1 and 2 in monolinguals. In an ideal situation, we would expect null effects in the monolinguals, providing a straightforward comparison with the bilinguals. In constructing the materials, however, there was an inherent constraint on matching the number of phonemes between the IHs and the non-IHs, due to the difference in syllable structure between Mandarin and English. As discussed earlier, Mandarin syllables predominantly follow the CV (consonant-vowel) structure, with the only exceptions being the nasals /n/ and /ŋ/ in coda position [35], while English syllables mainly follow the CVC (consonant-vowel-consonant) structure [58]. In other words, we were unable to match the number of phonemes between the IHs and non-IHs: when frequency and phonological neighborhood density were matched item-by-item, most IHs had CV structures and all non-IHs had CVC structures (see details in Table 2). In spoken word recognition by native listeners of English, the coda (i.e., final consonant) plays a critical role in both the TRACE [5] and Cohort [2] models, as there is only a very small number of CV monosyllabic words in English. That is, most monosyllabic English words are closed syllables, and native listeners are biased towards closed-syllable words rather than open-syllable words. In TRACE, the coda provides an additional cue for selecting a word candidate; in the Cohort model, open syllables potentially generate larger cohorts, so selecting word candidates takes longer. Therefore, both models predict that monosyllabic words with a CVC structure are easier and faster to process than words with a CV structure for native English listeners. This is exactly what we observed in both experiments: native listeners of English responded to the non-IH items (CVC structure) more quickly than to the IH items (CV structure). It is important to note that the slower responses to IH items in monolinguals still provide a useful comparison across the two groups, as they show that the absence of IH inhibitory effects in bilinguals in Experiment 1 is not confounded by the nature of the stimuli (i.e., the IH items generated slower responses for native listeners).

But why did the bilinguals fail to show a similar inhibitory effect on the IH items in Experiment 1, in contrast to Experiment 2? The contrast between the bilinguals and monolinguals in Experiment 1 suggests that bilinguals used a different processing strategy in L2, which might be influenced by their L1. In fact, it is well documented in the literature that L1 phonological structures can have a persistent impact on L2 processing skills [30], [59–65]. For example, Nguyen-Hoan and Taft [65] observed that even early bilinguals in Australia (mean age of arrival: 1.64–2.24 years) who became English-dominant showed L1 influence on their phonological processing in L2. In a phoneme deletion task (Exp. 1), participants from various L1 backgrounds were asked to delete the first or final ‘sound’ of a monosyllabic word or non-word in an utterance (e.g., flat or flaz). Monolingual English speakers deleted phoneme-sized sounds most often, implying that their interpretation of a ‘sound’ was at the phonemic level. In contrast, L1-Chinese and L1-Vietnamese participants tended to delete units larger than phonemes, which suggests that these bilinguals’ phonological processing was affected by their morpho-syllabic L1. In the same vein, Experiment 1 showed that our English listeners were sensitive to the coda, while Mandarin listeners were not. This difference is also consistent with the metrical segmentation strategy [66], which proposes that listeners exploit the lexical statistics of their language in speech segmentation: here, English listeners were sensitive to phonemes, while Mandarin listeners were sensitive to syllables in identifying word boundaries.

This analysis is crucial for understanding the IH inhibitory effects observed in bilinguals in Experiment 2, further confirming that the inhibition was due to cross-language lexical competition induced by both lexical tones and segments. The logic is that if bilingual listeners were sensitive to the coda, or to segmental overlap between L1 Mandarin and L2 English alone, we would expect inhibition on the IH items in Experiment 1; however, this is not what we observed. The contrast between Experiments 1 and 2 in bilinguals suggests that language-specific cues (i.e., lexical tones) are crucial to bilingual lexical access for bilinguals with one tonal and one non-tonal language. That is, both tonal and segmental information need to be available to induce language co-activation. These findings differ from previous studies of bilingual lexical access in showing that segmental overlap was insufficient to activate the non-target Mandarin lexical representations. This is in line with the results reported by Lee [37], who found that segmental overlap alone between a prime and target was not sufficient to generate priming in Mandarin spoken word recognition. Two reasons can be proposed for the inability of segmental overlap alone to activate Mandarin lexical representations. First, lexical tones have a role comparable to that of segments in Mandarin spoken word recognition [38]; thus, lexical activation in Mandarin requires both segmental and supra-segmental information. Second, the large number of within-language homophones in Mandarin may encourage a broader distribution of cue weights, not only to tones but also to other information such as the context or adjacent syllables, in order to allow efficient recognition [36].

In addition, at the theoretical level, our findings shed light on the mechanism of pitch processing. A longstanding but under-researched debate in the speech perception literature concerns whether the processing of pitch contours during word recognition is sensitive to language context [67,68]. Here, language context can mean the language mode of communication (e.g., the language of the conversation) or specific acoustic-phonetic cues within a word or sentence. For Mandarin-English bilinguals, lexical tones are an important cue to a word’s language membership, particularly during code-switching. The current study was conducted entirely in an English context: the instructions were given in English and directed participants to treat the stimuli as English. However, this extra-word language context did not prevent the lexical processor from accessing the Mandarin lexicon when language-specific phonetic cues (i.e., the lexical tones in Experiment 2) were present. In other words, within-word tonal information overrode the extra-word language context and guided lexical access to the non-target language. These results are in line with the findings of Quam and Creel [67] with adult Mandarin-English bilinguals, showing that within-word phonetic cues were more consequential than extra-word language context for language-specific phonological encoding such as lexical tones. This conclusion is also supported by findings from child bilinguals reported by Singh and Quam [68], which show that children aged 4–5 years integrate lexical tones into word meanings when within-word cues accompany the extra-word language context, but not on the basis of extra-word language context alone.

One limitation of the current study is the unnaturalness of the stimuli in Experiment 2, in which English words were superimposed with Mandarin lexical tones. As shown earlier, the error data indicate that Experiment 2 was more difficult than Experiment 1 overall, and that IH words became more difficult than non-IH words with the superimposed pitch contours; in other words, the Mandarin pitch contours carried by English monosyllables appeared to interfere with lexical processing in general, for both monolinguals and bilinguals. In our post-experiment debriefing, some participants indeed reported difficulty in recognizing some items in Experiment 2. One way to improve the study methodologically would be to manipulate pitch contours in a way that is consistent with English input in natural speech. For instance, English words can be pronounced with pitch contours similar to Mandarin Tone 2 or Tone 4, whereas Tone 3 is rarely observed in English pitch contours. Our stimuli superimposed with Tone 3 may thus have biased bilinguals more strongly towards the Mandarin lexicon, generating more lexical competition, while being considerably more difficult for monolinguals to process because of their unnaturalness in English speech.
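For concreteness, a contour manipulation of this kind could be scripted through Praat [56], for example via its parselmouth Python interface, as sketched below. The file name, F0 targets, and the straight-line approximation of a falling (Tone-4-like) contour are hypothetical placeholders rather than our actual stimulus-preparation procedure.

```python
# A minimal sketch (not our stimulus-preparation script) of superimposing a
# Mandarin-like pitch contour on an English word with Praat [56], scripted
# via parselmouth. File name and F0 targets are hypothetical; Tone 4 is
# approximated as a linear high fall.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("stimulus.wav")  # hypothetical English monosyllable

# Build a Praat Manipulation object (time step, pitch floor, pitch ceiling).
manipulation = call(snd, "To Manipulation", 0.01, 75, 600)

# Replace the original pitch points with a falling (Tone-4-like) contour.
pitch_tier = call(manipulation, "Extract pitch tier")
call(pitch_tier, "Remove points between", snd.xmin, snd.xmax)
call(pitch_tier, "Add point", snd.xmin, 250.0)  # assumed onset F0 (Hz)
call(pitch_tier, "Add point", snd.xmax, 150.0)  # assumed offset F0 (Hz)
call([pitch_tier, manipulation], "Replace pitch tier")

# Resynthesize with overlap-add (PSOLA) and save the manipulated token.
resynthesized = call(manipulation, "Get resynthesis (overlap-add)")
resynthesized.save("stimulus_tone4.wav", "WAV")
```

A Tone-2-like contour would simply use a rising pair of F0 targets; keeping the segmental material fixed while varying only the contour would allow a direct test of how contour naturalness modulates the IH effect.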

The current study adds to the bilingual literature by providing empirical evidence that a supra-segmental dimension serves as an important representational and processing mechanism in bilingual spoken word recognition. The contrast between Experiments 1 and 2 shows that lexical tones are a critical cue for inducing cross-language lexical competition, over and above the segmental overlap across languages demonstrated in previous studies [10,33]. Notably, this result is consistent with, and complementary to, recent work by Wang et al. [45], in which lexical tones were mandatory for eliciting cross-language lexical competition during unconscious translation even when the English input contained no phonological overlap with the non-target language, Mandarin. Thus, with or without overt phonological overlap between the target and non-target language, supra-segmental information is crucial for tonal bilinguals in activating lexical representations in the non-target tonal language. With regard to modeling bilingual spoken word recognition, the current data lend support to the modified TRACE-T model [69], which encodes both Mandarin tones and phonemes. In TRACE-T, the middle of the three representational layers consists of both phonemes and tones, and the unfolding input is continuously mapped onto the relevant representations. Extending this model to the bilingual situation, cross-language activation requires bottom-up acoustic-phonetic information that overlaps cross-linguistically at both the segmental and the supra-segmental level (i.e., interlingual homophones), which can in turn be mapped onto specific Mandarin representations at the word level. These activated representations then induce cross-language competition at the lexical level by laterally inhibiting rival candidates in both languages, causing a delay relative to non-IHs, whose competition involves only one language.
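To illustrate the competition dynamic just described, the toy simulation below implements a schematic interactive-activation fragment: word nodes accumulate bottom-up support and laterally inhibit one another, and recognition time is the number of cycles until the best candidate reaches threshold. This is a sketch of the principle, not the published TRACE-T implementation [69]; the support values, inhibition and decay rates, and threshold are all assumed for exposition.

```python
import numpy as np

def cycles_to_recognition(support: np.ndarray, inhibition: float = 0.3,
                          decay: float = 0.1, threshold: float = 0.9,
                          max_cycles: int = 500) -> int:
    """Cycles until the best-supported word node reaches threshold.

    On each cycle a node gains its bottom-up support and is laterally
    inhibited in proportion to the summed activation of all other nodes;
    activations are clipped to [0, 1]. All parameters are illustrative.
    """
    act = np.zeros_like(support)
    for cycle in range(1, max_cycles + 1):
        rivals = act.sum() - act  # competitor activation seen by each node
        act = np.clip(act + support - inhibition * rivals - decay * act,
                      0.0, 1.0)
        if act.max() >= threshold:
            return cycle
    return max_cycles

# Non-IH: the English target is well supported; competitors only weakly.
non_ih_support = np.array([0.25, 0.05, 0.05])
# IH: segments AND tone also match a Mandarin word, which enters the race
# as a strongly supported cross-language competitor.
ih_support = np.array([0.25, 0.05, 0.05, 0.20])

print("non-IH recognized in", cycles_to_recognition(non_ih_support), "cycles")
print("IH recognized in", cycles_to_recognition(ih_support), "cycles")
```

With these assumed parameters, the non-IH configuration settles in 5 cycles and the IH configuration, which adds a strongly supported Mandarin competitor, in 8, reproducing the qualitative delay we observed for IH items.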

Conclusion

In summary, our findings support the requirement of a precise input-representation match for language co-activation, and hence for cross-linguistic interaction, echoing Ju and Luce [9] in the auditory modality. This precise match should be considered at multiple levels, from the sub-phonemic to the supra-segmental; that is, language-specific phonetic features may gate cross-language activation, and whether bilingual lexical access is language selective or non-selective in the auditory domain can depend on those features. Our results contribute to the literature in two ways. First, we demonstrate that L1-Mandarin L2-English bilinguals showed lexical competition from the non-target language only when both tonal and segmental information were available in the input, suggesting that language selective or non-selective access can be constrained by language-specific cues. Second, we demonstrate that the difference between Mandarin-English bilinguals and English monolinguals in their phonological processing strategies reflects L1 transfer: L1 Mandarin processing is syllable-based whereas L2 English processing is phoneme-based.

Supporting information

S1 Appendix. Identification of Mandarin inter-lingual homophones with corresponding English words (Chinese characters).

https://doi.org/10.1371/journal.pone.0230412.s002

(DOCX)

S2 Appendix. Interlingual homophones (IH) selected with similarity ratings.

https://doi.org/10.1371/journal.pone.0230412.s003

(DOCX)

S3 Appendix. The list of IHs (inter-lingual homophones) and non-IHs.

https://doi.org/10.1371/journal.pone.0230412.s004

(DOCX)

Acknowledgments

We thank Marcus Taft, ZhenGuang Cai, and three anonymous reviewers for their comments on earlier versions of this paper. The paper is partially based on data from Bronson Hui’s Master’s thesis, completed under the supervision of Dr. Xin Wang.

References

  1. Dahan D., Magnuson J. S. (2006). Spoken word recognition. Handbook of Psycholinguistics, 2, 249–284.
  2. Marslen-Wilson W. D. (1987). Functional parallelism in spoken word recognition. Cognition, 25(1), 71–102.
  3. Marslen-Wilson W. D. (1989). Access and integration: projecting sound onto meaning. Lexical Representation and Process, pp. 3–24. Cambridge, MA: MIT Press.
  4. Marslen-Wilson W. D., Welsh A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology, 10(1), 29–63.
  5. McClelland J. L., Elman J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1–86. pmid:3753912
  6. Luce P. A. (1986). Neighborhoods of Words in the Mental Lexicon. Research on Speech Perception, Technical Report No. 6 (Doctoral thesis). Retrieved from ERIC.
  7. Luce P. A., Pisoni D. B. (1998). Recognizing spoken words: The neighbourhood activation model. Ear and Hearing, 19(1), 1–36. pmid:9504270
  8. Chambers C. G., Cooke H. (2009). Lexical competition during second language listening: sentence context, but not proficiency, constrains interference from the native lexicon. Journal of Experimental Psychology: Learning, Memory and Cognition, 35, 1029–1040.
  9. Ju M., Luce P. (2004). Falling on Sensitive Ears: Constraints on Bilingual Lexical Activation. Psychological Science, 15(5), 314–318. pmid:15102140
  10. Lagrou E., Hartsuiker R. J., Duyck W. (2011). Knowledge of a second language influences auditory word recognition in the native language. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(4), 952–965. pmid:21500950
  11. Schulpen B., Dijkstra T., Schriefers H. J., Hasper M. (2003). Recognition of interlingual homophones in bilingual auditory word recognition. Journal of Experimental Psychology: Human Perception and Performance, 29(6), 1155–1178. pmid:14640836
  12. Weber A., Cutler A. (2004). Lexical competition in non-native spoken-word recognition. Journal of Memory and Language, 50, 1–25.
  13. Van Heuven W. J., Dijkstra T., Grainger J. (1998). Orthographic neighbourhood effects in bilingual word recognition. Journal of Memory and Language, 39(3), 458–483.
  14. Costa A., Miozzo M., Caramazza A. (1999). Lexical selection in bilinguals: Do words in the bilingual's two lexicons compete for selection? Journal of Memory and Language, 41(3), 365–397.
  15. Dijkstra T., Van Heuven W. J. (2002). The architecture of the bilingual word recognition system: From identification to decision. Bilingualism: Language and Cognition, 5(3), 175–197.
  16. Dijkstra T., Grainger J., Van Heuven W. J. (1999). Recognition of cognates and interlingual homographs: The neglected role of phonology. Journal of Memory and Language, 41(4), 496–518.
  17. Dijkstra T., Timmermans M., Schriefers H. (2000). On being blinded by your other language: Effects of task demands on interlingual homograph recognition. Journal of Memory and Language, 42(4), 445–464.
  18. Van Assche E., Duyck W., Hartsuiker R. J., Diependaele K. (2009). Does bilingualism change native-language reading? Cognate effects in a sentence context. Psychological Science, 20(8), 923–927. pmid:19549082
  19. Van Hell J. G., Dijkstra T. (2002). Foreign language knowledge can influence native language performance in exclusively native contexts. Psychonomic Bulletin & Review, 9(4), 780–789.
  20. Zhou H., Chen B., Yang M., Dunlap S. (2010). Language non-selective access to phonological representation: evidence from Chinese-English bilinguals. The Quarterly Journal of Experimental Psychology, 63(10), 2051–2066. pmid:20486019
  21. Basnight-Brown D. M., Altarriba J. (2007). Differences in semantic and translation priming across languages: The role of language direction and language dominance. Memory & Cognition, 35(5), 953–965.
  22. De Groot A. M., Nas G. L. (1991). Lexical representation of cognates and noncognates in compound bilinguals. Journal of Memory and Language, 30(1), 90–123.
  23. Finkbeiner M., Forster K., Nicol J., Nakamura K. (2004). The role of polysemy in masked semantic and translation priming. Journal of Memory and Language, 51(1), 1–22.
  24. Grainger J., Frenck-Mestre C. (1998). Masked priming by translation equivalents in proficient bilinguals. Language and Cognitive Processes, 13, 601–623.
  25. Perea M., Carreiras M. (2008). Do orthotactics and phonology constrain the transposed-letter effect? Language and Cognitive Processes, 23, 69–92.
  26. Wang X. (2013). Language dominance in translation priming: Evidence from balanced and unbalanced Chinese–English bilinguals. The Quarterly Journal of Experimental Psychology, 66(4), 727–743. pmid:22989220
  27. Wang X., Forster K. I. (2010). Masked translation priming with semantic categorization: Testing the Sense Model. Bilingualism: Language and Cognition, 13(3), 327–340.
  28. Wang X., Forster K. I. (2015). Is translation priming asymmetry due to partial awareness of the prime? Bilingualism: Language and Cognition, 18(4), 657–669.
  29. Grosjean F. (1988). Exploring the recognition of guest words in bilingual speech. Language and Cognitive Processes, 3(3), 233–274.
  30. Pallier C., Colomé A., Sebastián-Gallés N. (2001). The influence of native-language phonology on lexical access: Exemplar-based versus abstract lexical entries. Psychological Science, 12(6), 445–449. pmid:11760129
  31. Spivey M., Marian V. (1999). Cross talk between native and second languages: partial activation of an irrelevant lexicon. Psychological Science, 10, 281–284.
  32. Norris D., McQueen J. M., Cutler A. (1995). Competition and segmentation in spoken word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1209–1228. pmid:8744962
  33. Lagrou E., Hartsuiker R. J., Duyck W. (2013). The influence of sentence context and accented speech on lexical access in second-language auditory word recognition. Bilingualism: Language and Cognition, 16(3), 508–517.
  34. De Francis J. (1986). The Chinese language: fact and fantasy. Honolulu: University of Hawaii Press.
  35. Lin Y. (2007). The sounds of Chinese. Cambridge: Cambridge University Press.
  36. Tan L., Perfetti C. (1997). Visual Chinese character recognition: does phonological information mediate access to meaning? Journal of Memory and Language, 37, 41–57.
  37. Lee C. Y. (2007). Does horse activate mother? Processing lexical tone in form priming. Language and Speech, 50(1), 101–123.
  38. Malins J. G., Joanisse M. F. (2010). The roles of tonal and segmental information in Mandarin spoken word recognition: an eyetracking study. Journal of Memory and Language, 62(4), 407–420.
  39. Malins J. G., Joanisse M. F. (2012). Setting the tone: An ERP investigation of the influences of phonological similarity on spoken word recognition in Mandarin Chinese. Neuropsychologia, 50(8), 2032–2043. pmid:22595659
  40. Schirmer A., Tang S. L., Penney T. B., Gunter T. C., Chen H. C. (2005). Brain responses to segmentally and tonally induced semantic violations in Cantonese. Journal of Cognitive Neuroscience, 17(1), 1–12. pmid:15701235
  41. Taft M., Chen H. C. (1992). Judging homophony in Chinese: The influence of tones. Advances in Psychology, 90, 151–172.
  42. Liu C., Rodriguez A. (2012). Categorical perception of intonation contrasts: effects of listeners’ language background. The Journal of the Acoustical Society of America, 131(6), EL427–433. pmid:22713017
  43. Ortega-Llebaria M., Nemoga M., Presson N. (2017). Long-term experience with a tonal language shapes the perception of intonation in English words: how Chinese-English bilinguals perceive “Rose?” vs. “Rose”. Bilingualism: Language and Cognition, 20(2), 367–383.
  44. Shook A., Marian V. (2016). The influence of native-language tones on lexical access in the second language. Journal of the Acoustical Society of America, 139(6), 3102–3109.
  45. Wang X., Wang J., Malins J. (2017). Do you hear ‘feather’ when listening to ‘rain’? Lexical tone activation during unconscious translation: Evidence from Mandarin-English bilinguals. Cognition, 169, 15–24. pmid:28803218
  46. Caramazza A., Costa A., Miozzo M., Bi Y. C. (2001). The specific-word frequency effect: implications for the representation of homophones in speech production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(6), 1430–1450. pmid:11713878
  47. Defence Language Institute (1974). A contrastive study of English and Mandarin. Monterey.
  48. Cai Q., Brysbaert M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS ONE, 5(6), e10729.
  49. Vaden K. I., Halpin H. R., Hickok G. S. (2009). Irvine Phonotactic Online Dictionary, Version 2.0 [data file]. Available from http://www.iphod.com.
  50. Audacity Development Team (2013). Audacity (version 2.0.3) [computer software]. Available from http://audacity.sourceforge.net/
  51. Schneider W., Eschman A., Zuccolotto A. (2002). E-Prime User’s Guide. Pittsburgh: Psychology Software Tools, Inc.
  52. Baayen R. H. (2008). Analyzing linguistic data: a practical introduction to statistics using R. Cambridge: Cambridge University Press.
  53. Baayen R. H., Davidson D. J., Bates D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
  54. Barr D. J., Levy R., Scheepers C., Tily H. J. (2013). Random effects structure for confirmatory hypothesis testing: keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
  55. Goldinger S. D. (1996). Auditory Lexical Decision. Language and Cognitive Processes, 11(6), 559–568.
  56. Boersma P., Weenink D. (2013). Praat: doing phonetics by computer [Computer program]. Version 5.3.52, retrieved 12 June 2013 from http://www.praat.org/.
  57. Axmear E., Reichle J., Alamsaputra M., Kohnert K., Drager K., Sellnow K. (2005). Synthesized speech intelligibility in sentences: A comparison of monolingual English speaking and bilingual children. Language, Speech, and Hearing Services in Schools, 36, 244–250.
  58. Kessler B., Treiman R. (1997). Syllable structure and the distribution of phonemes in English syllables. Journal of Memory and Language, 37, 295–311.
  59. Bialystok E., Miller B. (1999). The problem of age in second language acquisition: Influences from language, structure, and task. Bilingualism: Language and Cognition, 2, 127–145.
  60. Birdsong D., Molis M. (2001). On the evidence for maturational constraints on second language acquisition. Journal of Memory and Language, 44, 235–249.
  61. Cutler A., Mehler J., Norris D., Segui J. (1992). The monolingual nature of speech segmentation by bilinguals. Cognitive Psychology, 24, 381–410. pmid:1516360
  62. Flege J. E., Bohn O., Jang S. (1997). Effects of experience on non-native speakers’ production and perception of English vowels. Journal of Phonetics, 25, 437–470.
  63. Hernandez A., Li P. (2007). Age of acquisition: Its neural and computational mechanisms. Psychological Bulletin, 133, 638–650. pmid:17592959
  64. Hernandez A., Li P., MacWhinney B. (2005). The emergence of competing modules in bilingualism. Trends in Cognitive Sciences, 9, 220–225.
  65. Nguyen-Hoan M., Taft M. (2010). The impact of a subordinate L1 on L2 auditory processing in adult bilinguals. Bilingualism: Language and Cognition, 13(2), 217–230.
  66. Cutler A., Norris D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14(1), 113–121.
  67. Quam C., Creel S. C. (2017). Mandarin-English bilinguals process lexical tones in newly learned words in accordance with the language context. PLoS ONE, 12(1), e0169001. pmid:28076400
  68. Singh L., Quam C. (2016). Can bilingual children turn one language off? Evidence from perceptual switching. Journal of Experimental Child Psychology, 147, 111–125. pmid:27077335
  69. Shuai L., Malins J. G. (2016). Encoding lexical tones in jTRACE: a simulation of monosyllabic spoken word recognition in Mandarin Chinese. Behavior Research Methods, 1–12.