Bilinguals’ speech perception in noise: Perceptual and neural associations

The current study characterized subcortical speech sound processing among monolinguals and bilinguals in quiet and challenging listening conditions and examined the relation between subcortical neural processing and perceptual performance. A total of 59 normal-hearing adults, ages 19–35 years, participated in the study: 29 native Hebrew-speaking monolinguals and 30 Arabic-Hebrew-speaking bilinguals. Auditory brainstem responses to speech sounds were collected in a quiet condition and with background noise. The perception of words and sentences in quiet and background noise conditions was also examined to assess perceptual performance and to evaluate the perceptual-physiological relationship. Perceptual performance was tested among bilinguals in both languages (first language (L1-Arabic) and second language (L2-Hebrew)). The outcomes were similar between monolingual and bilingual groups in quiet. Noise, as expected, resulted in deterioration in perceptual and neural responses, which was reflected in lower accuracy in perceptual tasks compared to quiet, and in more prolonged latencies and diminished neural responses. However, a mixed picture was observed among bilinguals in perceptual and physiological outcomes in noise. In the perceptual measures, bilinguals were significantly less accurate than their monolingual counterparts. However, in neural responses, bilinguals demonstrated earlier peak latencies compared to monolinguals. Our results also showed that perceptual performance in noise was related to subcortical resilience to the disruption caused by background noise. Specifically, in noise, increased brainstem resistance (i.e., fewer changes in the fundamental frequency (F0) representations or fewer shifts in the neural timing) was related to better speech perception among bilinguals. Better perception in L1 in noise was correlated with fewer changes in F0 representations, and more accurate perception in L2 was related to minor shifts in auditory neural timing. 
This study delves into the importance of using neural brainstem responses to speech sounds to differentiate individuals with different language histories and to explain inter-subject variability in bilinguals’ perceptual abilities in daily life situations.

The auditory brainstem response evoked by speech stimuli is used to examine how subcortical structures of the auditory pathway encode temporal and spectral aspects of speech sounds (e.g., [29-31]). The frequency-following response (FFR) evoked by these speech stimuli can closely mimic the waveform of the acoustic stimulus [32]. The FFR has been studied in bilingual populations to examine whether lifelong language experience can affect subcortical auditory processing. Enhancements in FFRs because of bilingualism were reflected in different aspects of the neural response. For example, some studies showed greater consistency and stability of FFRs among bilinguals compared to monolinguals [25,28,33] and among bilinguals who are more proficient in the language [25] and have more years of bilingual experience [27]. Other studies showed that exposure to a second language induces earlier neural latencies [20], more pronounced and robust FFRs [22,23], and a larger representation of the fundamental frequency (F0) component [24,26,27], which is used to recognize and track speech, and serves as an important cue for speech perception in challenging listening conditions [26,34-38]. Notably, differences between monolinguals and bilinguals in FFRs were observed mainly when the speech stimulus was presented in background noise compared to quiet conditions (e.g., [20,24]). Further, better representation of F0 among bilinguals was associated with better attentional and cognitive abilities, mainly when individuals were tested in noise (e.g., [24,25]).
To the best of our knowledge, this study is the first to combine perceptual and brain measures in bilingual populations. This combination is important for translating knowledge regarding physiological outcomes into practice and for examining how the advantage in subcortical responses interacts with a disadvantage in perception. Further, in the current study, we examined how the perceptual-physiological correlation varies across the two languages of bilinguals. For this purpose, in addition to comparing the perceptual performance of bilinguals and monolinguals, bilinguals were examined in both of their languages: the first language (L1, Arabic) and the second language (L2, Hebrew). This was done to determine whether the subcortical physiological mechanism predicts bilinguals' perceptual performance in general or only in relation to the language of the processed stimuli.
In summary, the current study was designed to answer the primary research questions: Does subcortical processing predict perceptual performance? And how do L1 and L2 modify the perceptual-physiological correlation?

Participants
Sixty right-handed, first-year college students (41 females, mean age = 24.6 ± 3.7 years) were recruited to participate in the study. All had obtained a similar level of formal education (mean ± standard deviation = 13.3 ± 2.6 years) and had normal cognitive function (based on the Wechsler Intelligence Test [39]). In addition, all participants exhibited normal hearing thresholds in both ears (≤ 20 dB HL pure-tone air conduction thresholds for octave frequencies 250 through 8000 Hz [40]) and absolute peak and interpeak latencies within normal limits [41] to 100-μs clicks presented at 80 dB nHL at a rate of 13.3/s, with a 10.66 ms recording window. To control for knowledge of music, a factor known to affect subcortical processing (e.g., [42-50]), only participants with no to minimal musical knowledge (less than one year of experience in elementary school) were included. Professional musicians were excluded from the study. Participants provided written informed consent before participating and were compensated with either a coffee coupon or course credit for participating. The Ethics Committee of the University of Haifa approved the study protocol.
Participants were divided into two groups based on their language history and knowledge: Arabic-Hebrew bilinguals and Hebrew monolinguals. All participants were asked to fill out a questionnaire about their demographic information and language profile [adapted from 12,51]. The information reported below is based on self-reports. The means and percentages reported are based on group averages.
The bilingual group consisted of 30 Arabic-Hebrew speakers. Bilinguals were exposed to Arabic (L1) from birth and to Hebrew (L2) from age three. All bilinguals considered themselves dominant in L1 and reported intensive exposure to this language. Bilinguals received more than 10 years of formal academic education in Hebrew (12.3 ± 3.9 years), passed the high school matriculation exams with Hebrew as their L2, and used Hebrew extensively in their academic studies (about 45% of the time). The monolingual group consisted of 30 participants who reported only knowledge of Hebrew and had no substantial learning or proficiency in Arabic.
Data of one monolingual participant were excluded because of excessive noise in the electrophysiological recordings caused by high levels of myogenic activity. Consequently, data from 59 participants (30 bilinguals and 29 monolinguals) were included in the final analysis. Except for language history and knowledge, the two groups were similar in chronological age (t (57) = -1.857, p = 0.07), gender (t (57) = -0.085, p = 0.933) and years of formal education (t (57) = 1.529, p = 0.132). All participants underwent an electrophysiological recording and performed perceptual tasks in the hearing lab at the University of Haifa.

Electrophysiology
During the electrophysiological session, participants sat in a comfortable reclining chair in a sound-treated, electrically shielded booth and were asked to stay calm during passive exposure to the stimuli. Lights inside the audiological booth were dimmed during recording. Brainstem responses were collected using the Biologic Navigator Pro System (Natus Medical Inc., Mundelein, USA). A vertical montage for Ag-AgCl electrode placement was applied. The non-inverting electrode was placed at the midline (Cz), the inverting electrode on the right earlobe (A2), and the ground electrode on the left earlobe (A1). The maximum permissible impedance level for each electrode was less than 5 kΩ, and the inter-electrode impedance was less than 3 kΩ.
Stimuli and conditions. A 40-ms synthesized /da/ syllable was used. This universal syllable was chosen since it is shared across many languages [ 85.33 ms, including a pre-stimulus period of 15 ms. Brainstem responses were elicited in response to the speech syllable in quiet and in noise conditions. In noise, the syllable was presented with 80 dB SPL continuous, white noise with a signal-to-noise ratio (SNR) of 0 dB. The stimuli in the two listening conditions were presented monaurally to the right ear via electromagnetically shielded biologic insert earphones (580-SINSER) while leaving the left ear unoccluded. Similar to the protocol used in Krizman et al. [54], all participants in the current study heard a movie soundtrack played at < 40 dB SPL (an insufficient intensity to mask the stimulus in the right ear) with the unoccluded ear. This was done to promote stillness and rule out differences in state as a potential confound.
Recording. Two blocks of 3000 artifact-free sweeps were collected in each of the two listening conditions. Trials with activity exceeding ± 25 μV were rejected. Responses were online band-pass filtered from 100 to 2000 Hz, which captures the limits of the brainstem and the inferior colliculus phase-locking and minimizes collecting myogenic noise and cortical activities [29, 55, 56]. The total recording time was 20 minutes.
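The rejection rule described above can be illustrated with a minimal sketch. It assumes that each sweep is available as a list of voltage samples in microvolts; the function name `average_artifact_free` and the toy data are hypothetical and do not describe the recording system's internal processing.

```python
def average_artifact_free(sweeps, threshold_uv=25.0):
    """Average the sweeps whose samples all stay within +/- threshold_uv.

    Returns (mean_waveform, number_of_accepted_sweeps).
    """
    # A sweep is rejected if any sample exceeds the threshold in magnitude.
    accepted = [s for s in sweeps if max(abs(v) for v in s) <= threshold_uv]
    if not accepted:
        raise ValueError("no artifact-free sweeps to average")
    n_samples = len(accepted[0])
    mean_waveform = [
        sum(s[i] for s in accepted) / len(accepted) for i in range(n_samples)
    ]
    return mean_waveform, len(accepted)


# Toy data (hypothetical): the second sweep contains a 40 uV deflection.
sweeps = [
    [1.0, -2.0, 3.0],
    [0.5, 40.0, -1.0],
    [3.0, 2.0, -1.0],
]
waveform, n_kept = average_artifact_free(sweeps)
```

In this toy case the second sweep is discarded and the remaining two are averaged sample by sample.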
Data averaging and analysis. A final waveform was created for each listening condition by averaging the two blocks collected. The final waveform comprised 6000 artifact-free sweeps. In total, two final averaged waveforms were analyzed for each participant, one for the quiet condition and the other for the noise condition. Transient peaks and those reflecting the harmonic portion of the stimulus were visually identified and manually marked. Detailed information regarding the FFR components is described in previous reports (e.g., [31,32,57]). In this study, all components (V, A, C, D, E, F, and O peaks) were detected in the quiet condition. However, we considered analysis of the V, A, and O peaks reflecting the initiation and the offset of the response, and peak F corresponding to the voicing of the speech sound. We focused solely on these peaks because the detectability of the remaining peaks (C, D, and E) was relatively poor in the noise condition (64.4%, 66.1%, and 76.2%, respectively). A peak was considered reliable if it was present in >85% of participants [58]. Peaks of interest (V, A, F, and O) were identified by the two authors. The second author marked the waveforms separately to verify uniform marking and was also blinded to participants' identities and group. In addition, to avoid bias, the second author was blinded to the condition under which the recording was conducted. Measures of both timing (latency) and magnitude (amplitude) were applied for peaks V, A, F, and O. Also, a fast Fourier transform (FFT) in Matlab (The Mathworks) using the Brainstem Toolbox [31] was performed to calculate F0 amplitude.
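As a rough illustration of how a spectral amplitude in the F0 region can be obtained from an averaged waveform, the sketch below evaluates a direct discrete Fourier transform at selected frequencies. It is not the Brainstem Toolbox routine: the sampling rate, the 100 to 120 Hz band, and the function names `amplitude_at` and `mean_band_amplitude` are assumptions made for the example only.

```python
import math


def amplitude_at(waveform, fs, freq_hz):
    """Single-sided spectral amplitude of `waveform` at `freq_hz` (direct DFT)."""
    n = len(waveform)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / fs)
             for i, x in enumerate(waveform))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / fs)
             for i, x in enumerate(waveform))
    return 2.0 * math.hypot(re, im) / n


def mean_band_amplitude(waveform, fs, f_lo, f_hi, step_hz=1.0):
    """Mean spectral amplitude over [f_lo, f_hi], sampled every step_hz."""
    n_steps = int(round((f_hi - f_lo) / step_hz))
    freqs = [f_lo + k * step_hz for k in range(n_steps + 1)]
    return sum(amplitude_at(waveform, fs, f) for f in freqs) / len(freqs)


# Toy signal (assumed values): unit-amplitude 110 Hz sine, 0.1 s at 2000 Hz.
fs = 2000
wave = [math.sin(2 * math.pi * 110 * i / fs) for i in range(200)]
peak = amplitude_at(wave, fs, 110)               # amplitude at the sine frequency
band = mean_band_amplitude(wave, fs, 100, 120)   # mean over the assumed F0 band
```

Averaging over a band rather than reading a single frequency makes the measure less sensitive to small differences in where the F0 peak falls, at the cost of a lower numerical value.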

Perception
In this part, bilinguals were tested with Arabic (L1) and Hebrew (L2) speech stimuli, with language order counterbalanced, while monolinguals were examined only in Hebrew. Bilinguals were given a 10-minute break between Arabic and Hebrew perceptual tasks. Here, we examined the ability of participants to perceive wordlists and sentences presented in quiet and with background noise (detailed below). Participants performed the perception part individually while sitting in front of a laptop. Participants were asked to listen to a given stimulus and repeat it. This procedure continued until all stimuli had been presented. Each participant heard each stimulus only once, no stimulus was used twice, and no feedback was provided. The presentation order of the speech stimuli (wordlists and sentences) and listening conditions (quiet and noise) was randomized across participants.
Stimuli and conditions. Participants were presented with two types of speech stimuli (wordlists and sentences), which differed in the contextual cues included. At the wordlists level, words from audiological speech and hearing tests were used to compose each wordlist [59]. Seven bi-syllabic words from unrelated semantic categories were used in each wordlist. At the level of the sentences, plausible, syntactically correct sentences from Arabic and Hebrew versions of the Hagerman test [60] were adapted and used. Each sentence consisted of seven words with semantic and syntactic redundancies.
Instead of relying on one type of speech stimulus, we included two types to reflect the perception of the individual. This was done because previous studies have shown that bilinguals show more difficulty in noise as the linguistic complexity of the task increases [7,17,61-67]. Here, to evaluate the relation between physiology and perceptual measures, we combined the perceptual accuracy of the individual in both wordlists and sentences to reflect the perceptual performance in general, regardless of the effect of task complexity on performance.
Wordlists and sentences in Arabic and Hebrew were created. The average number of syllables across Arabic and Hebrew stimuli was similar (p > 0.19), as were the root-mean-square (RMS) amplitudes (p > 0.7). Stimuli across the two languages were also similar in terms of frequency (tested in a pilot study). Uncommon words, cognates, and false cognates were not included in the data set. An Arabic speaker recorded the Arabic stimuli, and a Hebrew speaker recorded the Hebrew stimuli, to avoid bias resulting from pronunciation problems.
Stimuli were presented binaurally to participants at 80 dB SPL in two listening conditions: quiet and a 4-talker, babble noise at a fixed level of SNR = 0 dB. For the noise condition, the babble used for Arabic stimuli was in Arabic, and the babble used for Hebrew stimuli was in Hebrew. The babble noise was selected because it closely resembles a natural situation where individuals need to extract a target speech from a background of competing voices. We assessed and normalized the amplitudes of the Arabic and Hebrew babbles to ensure equal intensity and analyzed a set of acoustic parameters to reveal similar babble characteristics across languages [68]. Stimuli were presented to participants through software developed and used previously [69].
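The fixed SNR of 0 dB means that the target speech and the babble have equal RMS levels. A minimal sketch of how a masker can be scaled to a requested SNR is given below; the helper names `rms` and `scale_noise_to_snr` and the sample values are hypothetical, and the presentation software used in the study [69] may have implemented this differently.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale `noise` so that 20*log10(rms(speech)/rms(noise)) equals snr_db."""
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20.0))
    gain = target_noise_rms / rms(noise)
    return [x * gain for x in noise]


# Toy samples (hypothetical); at 0 dB SNR the two RMS levels match.
speech = [0.2, -0.4, 0.4, -0.2]
babble = [0.05, 0.1, -0.1, -0.05]
scaled = scale_noise_to_snr(speech, babble, snr_db=0.0)
mixture = [s + n for s, n in zip(speech, scaled)]
```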
Scoring. Participants' perceptual responses were digitally audio-recorded using a Mini USB recorder. Two trained coders who were blind to the study goals coded each participant's responses. One point was given for each word that the participant could repeat. The perceptual accuracy for each condition was examined. This was done by calculating the percentage of correct responses accomplished over all wordlists and sentences given in each listening condition.
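The scoring rule can be sketched as follows. In the study, trained coders judged each repeated word; the exact string matching used here, the function name `percent_correct`, and the placeholder words are assumptions for illustration only.

```python
def percent_correct(trials):
    """Percentage of target words repeated correctly across all trials.

    `trials` is a list of (target_words, repeated_words) pairs; one point
    is given for each target word that appears in the repetition.
    """
    total = 0
    correct = 0
    for target_words, repeated_words in trials:
        repeated = set(repeated_words)
        total += len(target_words)
        correct += sum(1 for word in target_words if word in repeated)
    return 100.0 * correct / total


# Placeholder items (hypothetical): one seven-word list and one seven-word sentence.
wordlist = ["w1", "w2", "w3", "w4", "w5", "w6", "w7"]
sentence = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
trials = [
    (wordlist, ["w1", "w2", "w4", "w6", "w7"]),  # 5 of 7 repeated
    (sentence, sentence),                         # 7 of 7 repeated
]
score = percent_correct(trials)
```

Pooling words across wordlists and sentences before dividing mirrors the combined accuracy measure described above, so each word contributes equally to the score.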

Statistical analysis
Data from 59 participants (30 bilinguals and 29 monolinguals) were included in the final analysis and are reported below. We focused on condition (quiet vs. noise) and group (monolinguals vs. bilinguals) differences in the electrophysiological analysis. Dependent variables included the latencies and amplitudes of the V, A, F, and O peaks and the F0 amplitude. In the perceptual analysis, within-participant comparisons were also conducted (bilingual L1 vs. L2) as a main effect of language. All statistical analyses were completed using IBM SPSS Statistics V25. Repeated measures analyses of variance (ANOVA) were performed. Pairwise comparisons were used when required. All statistical analyses were adjusted for multiple comparisons using Bonferroni corrections [70], and effect sizes were indicated using partial eta squared (η²p). Shapiro-Wilk tests were used to test for normal distribution within each group. Levene's tests were conducted to test the homogeneity of variance for all measures. Finally, Pearson correlation coefficients (r) were calculated to study the perceptual-physiological association. Detailed analyses are presented in the Results subsections below.

Electrophysiology
A repeated measures ANOVA with condition (quiet, noise) as the within-participant factor and group (monolingual, bilingual) as the between-participant factor was used.
Latency. Main effects of condition and group were observed for peaks V, A, F, and O (see Table 1). Bonferroni pairwise comparisons revealed that latencies were significantly prolonged in noise compared to quiet (p values < 0.001), and significantly earlier latencies were observed in the bilingual group compared to the monolingual group (p ≤ 0.005). Significant condition × group interactions were observed for all peak latencies (p ≤ 0.01). These significant interactions indicate that the presence of noise prolonged the peak latencies in one group to a greater extent than in the other. Specifically, post hoc t tests showed that whereas the latencies of the peaks were comparable in quiet (see Fig 1A, p ≥ 0.133), earlier latencies were observed in the bilingual group compared to the monolingual group in noise. The earlier latencies of bilinguals compared to monolinguals in noise are illustrated in Fig 1B.
Amplitude. A main effect of condition was observed for peaks V, A, F, and O, and F0 amplitude (Table 1, p < 0.001). Background noise diminished the amplitudes of all peaks (see Fig 1) and F0 amplitude in both groups. No significant main effect of group was observed (Table 1, p ≥ 0.1). Condition × group interactions were also not significant (Table 1, p ≥ 0.168). Monolinguals and bilinguals demonstrated similar V, A, F, and O peak amplitudes in the two listening conditions, and F0 amplitudes were comparable in quiet (monolingual mean amplitude (μV) = 7.495 ± 3.422, bilingual mean amplitude (μV) = 7.936 ± 2.730) and noise (monolingual mean amplitude (μV) = 3.729 ± 2.957, bilingual mean amplitude (μV) = 3.908 ± 2.247) across the two groups.

Perceptual-physiological association
To examine the perceptual-physiological association, we first calculated the delta (difference) between quiet and noise conditions in physiological and perceptual measures. The degree of [...] We chose the onset of the response specifically because of its sensitivity to the effects of adverse listening conditions [76,77]. In perception, the deterioration in accuracy due to noise was calculated (delta perception = accuracy in quiet minus accuracy in noise; larger values reflect more deterioration in accuracy).
Fig 1. Grand average subcortical responses to speech stimuli obtained from bilinguals (green) and monolinguals (red) recorded in quiet (A) and noise (B). ** p ≤ 0.01; *** p ≤ 0.001. Significant group differences between peak latencies (as revealed in t tests for independent samples) were found in noise. The tables below represent means ± SD (ms) for peak latencies in quiet (right) and noise (left), and the p value for the group differences.
Correlation between delta F0 and delta perception. This correlation was conducted to test whether larger effects of noise on the neural representation (i.e., larger F0 deltas) were correlated with greater susceptibility to the degradative effects of noise on the perception task (i.e., larger perception deltas). As shown in Fig 3A, this correlation was significant and positive among bilinguals in L1 (r = 0.4, p = 0.04), indicating that bilingual listeners who were more susceptible to the effect of noise, and who showed greater perceptual deterioration in their dominant language, also demonstrated greater susceptibility in the neural encoding of F0. This correlation was not significant among monolinguals (r = 0.102, p = 0.597) or among bilinguals when tested in their L2 (r = -0.069, p = 0.717).
Correlation between V latency shift and delta perception. This correlation was conducted to examine whether greater changes in the latency of the physiological response due to the addition of noise are related to greater deterioration in perception. As demonstrated in Fig 3B, a significant correlation was observed among bilinguals when tested in their L2 (p = 0.01), indicating that larger shifts in V latency were associated with more deterioration in accuracy. However, this correlation was not found for monolinguals (r = -0.318, p = 0.1) and was marginally significant for bilinguals when tested in their L1 (r = 0.349, p = 0.059).
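The delta-and-correlate analysis described in this subsection can be sketched in a few lines. The study used SPSS; the pure-Python helpers `deltas` and `pearson_r` and all numbers below are hypothetical illustrations, not study data. The sketch assumes that the physiological delta is computed as quiet minus noise, in the same direction as delta perception, which is stated in the text only for the perceptual measure.

```python
import math


def deltas(quiet, noise):
    """Per-participant change from quiet to noise (quiet minus noise)."""
    return [q - n for q, n in zip(quiet, noise)]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four listeners (not study data).
accuracy_quiet = [98, 95, 97, 96]        # percent correct
accuracy_noise = [80, 60, 75, 66]
f0_quiet = [8.0, 7.5, 9.0, 6.0]          # F0 amplitude
f0_noise = [5.0, 2.0, 5.5, 1.5]

delta_perception = deltas(accuracy_quiet, accuracy_noise)
delta_f0 = deltas(f0_quiet, f0_noise)
r = pearson_r(delta_f0, delta_perception)
```

With this sign convention, a positive r means that listeners whose F0 representation was reduced more by noise also lost more perceptual accuracy.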

Discussion
The current study characterized the subcortical neural processing of speech sounds and the perceptual performance of normal-hearing bilinguals and monolinguals in quiet and noisy conditions. We also examined how this subcortical neural processing is related to perceptual performance. Our results demonstrated that the effect of lifelong experience with two languages (bilingualism) was reflected mainly in perceptual and neural measures in challenging listening conditions. Specifically, perceptual performance was worse for bilinguals than monolinguals in noise, but neural timing was earlier. Further, the current study showed brain-behavior associations. Among bilingual individuals, perceptual performance in noise was associated with the extent of subcortical resistance to the disruption caused by background noise. These findings are discussed in the following subsections.

The effect of noise on perceptual accuracy and neural response
Our results indicated a considerable decline in perceptual accuracy in the presence of background noise and substantial changes in the morphology of the auditory brainstem responses. Overall, monolinguals and bilinguals (in both languages) achieved significantly lower accuracy in the noisy condition than in the quiet condition, and the neural responses elicited from both groups showed longer latencies and smaller amplitudes in the noisy condition (Figs 1B and 2B). The poorer perceptual performance of bilinguals in L2 compared to L1 or relative to the performance of monolingual listeners is consistent with previous findings [8,11-14,16,78-85] and can be attributed to multiple factors that may independently or interactively affect bilinguals' performance, such as late age of language acquisition [7,86,87], lower proficiency in L2 [11,79,86,88], limited exposure to the languages [16,89-91], and co-activation between languages [92-96]. Specifically, previous studies have suggested that bilinguals tend to show perceptual difficulties in their L2 because they acquire the language at a later stage of life and exhibit low proficiency in it (e.g., [7,11,79,86-88]). In addition, studies have proposed that because bilinguals split their resources across different languages, they are less exposed to each one. This limited exposure to each language may lead to less precise automatic processing and consequently pose more challenges to the listener [16,89-91]. In addition, as bilinguals' two languages are co-activated even when only one language is needed for the task [92-96], bilingual listeners need to manage more competing distractors during language processing. This co-activation may result in fewer resources available during speech processing and increase listening effort and perceptual difficulty [6,16].
Notably, in the current study, bilinguals showed a significant decrease in their perceptual accuracy in background noise, even when tested in their L1. The limited exposure to each language and the co-activation of languages may explain the disadvantage in speech perception in noise seen in bilinguals even when operating in their dominant language [16].
From the physiological aspect, the responses elicited in noise were diminished and degraded, consistent with previous findings [30,47,76,77,97-99]. This reduction can reflect neural desynchronization [98,100] and less efficient efferent processing [20]. Neural group differences were reflected in noise, where earlier neural latencies were observed in bilinguals compared to their monolingual peers. However, since the participants were normal-hearing, young adults with no clinical condition, it is likely that the prolonged responses observed among monolinguals do not reflect deficiencies. Instead, they indicate that the neural encoding of speech became more resistant to the detrimental effects of background noise in bilinguals. Considering differences in the linguistic experience between bilinguals and monolinguals and consistent with the evidence that FFRs are dynamic and malleable to the effects of immediate or long-term auditory experiences (e.g., [20,24-27,42,44-50,58,101-108]), we suggest that the language experience likely underlies the group differences observed in the neural responses in the noise condition. The enriched language environment of bilinguals and their need to manage two linguistic systems may explain the early latencies observed, as these listeners become faster in detecting the characteristics of speech stimuli. Further, since bilinguals manage two linguistic systems that can compete with each other (e.g., [93,95,96]), these listeners are required to enhance their attention to focus on the target signals and to prevent interference from those that are irrelevant [24,25,109-115]. Therefore, the earlier FFR neural latencies observed in bilinguals may reflect their advantage in attentional control, mainly as FFRs are known to be sensitive to the effects of these skills [24,25,116-119].
In the current study, no group differences were found in the amplitudes of the neural response peaks or in the magnitude of the fundamental frequency. These findings are consistent with a recent study [20] suggesting that the effects of bilingualism are not detectable by measuring amplitude. However, in contrast to our results, other previous studies showed differences between monolinguals and bilinguals in the magnitudes of F0 and suggested that bilinguals encode F0 more robustly than monolinguals do (e.g., [24-26]). Several factors may explain the inconsistent findings. Some can be related to methodological differences. Specifically, a longer syllable was used in previous studies than the one used in the current study (170 ms vs. 40 ms). Consequently, the characteristics of the stimulus differ. For example, in previous studies, the 170 ms syllable consisted of a 50 ms formant transition and a 120 ms steady-state portion. Consequently, the sustained vowel period in the 170 ms stimulus is likely to capture subtle enhancements in the FFR. However, the shorter stimulus used in the current study may have theoretically restricted us from obtaining group differences in the amplitudes of F0. Furthermore, it is possible that since noise diminished the entire response dramatically (Fig 1A versus 1B), subtle differences in amplitude measures, which are known to be more variable and less stable compared to the latency measure [120], were not observed in the current analysis. Taken together, our results do not rule out the possibility that bilinguals and monolinguals differ in the amplitude aspect. Rather, they suggest that in this specific design, differences might be confined to specific factors that should be examined in more detail.

Perceptual-physiological associations
Regarding perceptual-physiological associations, our results demonstrated that subcortical processing played a role in the perceptual abilities of bilinguals. Our findings indicate that perception is related to the degree of subcortical resilience to the disruption caused by background noise. Specifically, in noise, increased brainstem resistance (i.e., fewer changes in F0 representation or fewer shifts in V latency) was related to better speech perception abilities among bilinguals, as indicated by less deterioration in perception (Fig 3). Alternatively stated, bilinguals who exhibited more disruption of brainstem processing in noise had more perceptual difficulties. These correlations further the understanding of the neural processes underlying perception of speech among bilinguals (especially given that these correlations were not significant in the monolingual group) and suggest that subcortical processes could be one source that explains inter-subject variability in daily challenging listening conditions. The significant correlations observed among the bilingual listeners, who were more affected perceptually by the detrimental effect of noise than their monolingual peers, suggest that the brainstem processing (low-level information) may be exploited in conditions that are more challenging for the individual. This suggestion corresponds with the Reverse Hierarchy Theory (RHT), which states that processing starts at high-level areas; however, when the task demands increase, more reliance on lower levels is needed to search for more optimal representation [121]. Also, the correlations found mainly among bilinguals suggest greater recruitment of subcortical perceptual areas by these individuals, which aligns with the bilingual anterior-to-posterior and subcortical shift (BAPSS) model [18,122]. The BAPSS model posits that with experience, bilinguals do not rely only on the typical regions during processing. Instead, they may recruit other areas, such as automatic subcortical or posterior regions, to manage the co-activation and competition between languages.
The current correlations may be evidence of the interplay between central and peripheral processes [123]. The literature has shown that FFRs are determined by peripheral processes and affected by central processes [45,124-127]. In the ascending pathway, better brainstem processing (reflected in the current study as less susceptibility to the effects of noise) may provide a good platform for higher cortical processing, which can improve perceptual performance in noise [128-130]. In the opposite view of the descending path, the associations found in the current study can also be explained by the corticofugal (top-down) system [43,44,46,108,121,131-136], by which cortical processes that are critical for understanding distorted signals [137-143] project backward to tune structures in the auditory periphery [131,136,144,145], which might enhance or modify features of the target speech subcortically. Consequently, associations between perceptual performance and subcortical neural responses can also indirectly reflect the effect of the auditory cortex. In this regard, previous studies have shown that bilingual experience increases grey matter density in various cortex regions, including executive control regions [146-149], which are essential in challenging listening conditions [150,151]. Thus, it can be argued that bilinguals who tend to use more cortical resources in background noise may have more efficient backward processes, and consequently, their brainstem responses were found to be less susceptible to the effect of noise. Future studies that examine brainstem and cortical event-related potentials (simultaneously) and their relationship with perceptual abilities could investigate the above assumptions and shed light on the interplay between central and peripheral processes.
Within the bilingual group, the correlations found yielded interesting results. The latencies of the subcortical neural response were linked more to bilinguals' perceptual performance in L2 and marginally in L1. At the same time, the encoding of the fundamental frequency was correlated with their perceptual performance in L1. To the best of our knowledge, these correlations are novel and have not been reported to date. Still, they can be partially explained by the results of Tremblay and colleagues [152], who demonstrated that language experience affects listeners' use of F0, a cue that is important for word segmentation and comprehension of signals masked by other interferences (e.g., [36,37,153]); in this case, the perception of speech embedded in noise. Also, it can be argued that since bilinguals demonstrate differences in high-level (i.e., cortical) processes in their two languages [7,17,61,65,66,154-156], the backward pathway that may modify features subcortically (consistent with the explanation of the corticofugal effect discussed above) might also differ, leading to different perceptual-neural correlations in bilinguals' two languages. However, future studies should examine perceptual-neural associations in other groups of bilinguals to shed light on the mechanism underlying the current findings and better understand the variability in bilinguals' L1 and L2 associations.
No correlations were found between subcortical processing and perceptual abilities in noise among monolingual listeners in the current study. Neither the changes in F0 amplitudes nor the shifts in the latency of the V peak predicted accuracy among monolingual listeners. These results align with those of Yellamsetty and Bidelman [98], who showed that F0 amplitudes failed to predict the accuracy of listeners' identification, but contradict others (e.g., [34,76,128]) who found a relationship between subcortical neural processing and perceptual performance in noise among monolingual speakers. We suggest that the main reason for not finding a significant perceptual-neural correlation among monolinguals in the current study is that noise degraded their perception to a lesser extent than it did for bilinguals. Consequently, it is likely that lexical access and perception occurred rapidly and automatically [157,158], and no reliance on lower levels was needed [113]. We hypothesize that a correlation would likely be found when testing monolinguals in a more challenging noise condition that poses a greater perceptual challenge and, consequently, a greater need for reliance on lower levels. Consistent with this hypothesis, Song and colleagues [35] found a significant perceptual-neural correlation in their monolingual sample. Their participants were more adversely affected by the addition of noise (the highest accuracy obtained was 75%, and the average accuracy was 40.56%, both lower than those of the current study). Nevertheless, this assumption needs to be tested in future studies because additional methodological factors may explain the inconsistency between the different studies. Among others, studies that found a significant correlation [34,76,128] used a 170 ms syllable, which differs from the one used in the current study.

Future directions, implications, and limitations
Our results demonstrate differences between monolinguals and bilinguals in auditory brainstem responses elicited by speech in noise, even in a selected sample of normal-hearing young adults. This finding highlights the importance of considering an individual's language background when assessing brain processing and listening skills. Based on the current findings, we expect more pronounced differences in bilingual clinical populations with poorer perceptual abilities (for example, hearing-impaired individuals or older adults). Future studies are needed to evaluate this issue. Further, by establishing relationships between brainstem processing and perception of speech in noise among bilinguals, the current study provides new insights into how bilingualism shapes the brain and the impact this has on everyday listening situations. Our results can help determine which individuals may have a biological signature for excessive difficulty in challenging listening conditions. These findings can aid in the development of programs to help individuals predicted to encounter more difficulties in such conditions. For example, bilinguals with a brainstem system that is more susceptible to noise effects might benefit from speech-in-noise training or from using assistive listening devices that enhance the signal-to-noise ratio in academic settings.
Future studies are needed to determine the generalizability of the findings while considering the current study's limitations. In this study, perceptual and physiological stimuli were administered differently (binaurally versus monaurally and in the presence of white noise versus babble noise). In the physiological part, we followed previous protocols that presented the speech stimulus monaurally (e.g., [24,25,54,58]). Monaural stimulation leaves one ear unoccluded, maximizing participants' alertness and promoting stillness. In addition, a continuous, stable masker was preferred in the physiological part (similar to [20,30,159], which used the 40 ms /da/) because our very short stimulus was difficult to combine with a constantly changing masker. Therefore, a future study that uses the same stimulation conditions in perceptual and neural measures is needed. Furthermore, as some differences between our findings and previous ones may be attributable to the use of a shorter syllable, a follow-up study should examine the effects of stimulus duration and characteristics on physiological results. Such a study is critical for determining whether the observed physiological differences depend on the characteristics of the speech being tested.

Conclusions
The current study addresses the subcortical neural aspects underlying speech perception in noise among bilinguals. We provide evidence for differences between monolinguals and bilinguals in perceptual performance and auditory brainstem responses evoked by speech stimuli presented in challenging listening conditions. The current results show that the susceptibility of the auditory system to noise at subcortical levels correlates with the ability of bilinguals to perceive speech presented in noise and may help predict which individuals are more prone to the detrimental effect of noise. Implications of the current results were discussed, and future studies were proposed.