Incidental Acquisition of Foreign Language Vocabulary through Brief Multi-Modal Exposure

First language acquisition requires relatively little effort compared to foreign language acquisition and happens more naturally through informal learning. Informal exposure can also benefit foreign language learning, although evidence for this has been limited to speech perception and production. An important question is whether informal exposure to spoken foreign language also leads to vocabulary learning through the creation of form-meaning links. Here we tested the impact of exposure to foreign language words presented with pictures in an incidental learning phase on subsequent explicit foreign language learning. In the explicit learning phase, we asked adults to learn translation equivalents of foreign language words, some of which had appeared in the incidental learning phase. Results revealed rapid learning of the foreign language words in the incidental learning phase, showing that informal exposure to multi-modal foreign language leads to foreign language vocabulary acquisition. The creation of form-meaning links during the incidental learning phase is discussed.


Introduction
There are many advantages to learning a foreign language (FL), such as a better understanding of another culture or a better chance of employment in an increasingly multilingual society [1]. However, learning a FL can be a difficult and frustrating experience. Informal exposure to a FL requires little effort and benefits FL learners. For example, in childhood, such exposure has been shown to help FL learners acquire a more native-like accent as adults [2]. Advanced learners can also improve their FL speech perception by watching a FL film with FL subtitles [3]. Furthermore, exposure to a short FL weather report resulted in an increased sensitivity to the words heard in the weather report compared to other FL words [4]. Thus, informal exposure to spoken FL can give rise to speech perception and production benefits. However, can it lead to the acquisition of vocabulary through linking new FL forms with existing meaning representations?
In order to acquire form-meaning links, FL learners are often encouraged to read in the FL [5,6]. This type of informal exposure provides an incidental learning situation, where a few new words are acquired while learners read for pleasure. However, the incidental acquisition of FL vocabulary through reading is only suitable for more advanced FL learners. In order to be able to derive meaning from context, it is estimated that learners need to know at least 95% of the words in a text [7]. Beginner learners simply do not possess enough FL knowledge to achieve this. A multi-modal situation, which presents both verbal and pictorial information, may be more appropriate for learners of all levels, as in this case the meaning of the words can be derived from the pictorial information. In such a situation, and with complete beginners, it is so far unclear whether form-meaning links can be acquired incidentally.
Here we investigated the effects of a brief multi-modal incidental learning situation on subsequent explicit FL word learning with complete beginners of the FL. The current study differs from prior studies on FL vocabulary learning (see [4][5][6][8][9][10] for example) as it focused on incidental learning, with complete novices of the FL, and measured the potential acquisition of FL vocabulary after minimal exposure to the FL. Furthermore, the current study addresses the creation of form-meaning links through a few exposures to new FL word forms with their corresponding pictures.
As studies of incidental FL vocabulary learning have highlighted the need for sensitive measures of vocabulary knowledge [5,8,9,11], we used a methodology based on the savings paradigm to measure the acquisition of FL vocabulary. The savings paradigm is more sensitive than typical recognition and recall tests [12][13][14][15] and has been used in recent studies of language attrition to detect traces of knowledge [16][17][18][19]. The idea of the savings paradigm originally comes from Ebbinghaus, who noticed that once something had been learnt, a certain amount of residual knowledge remained in memory (referred to as the "forgetting curve"); this residual memory trace facilitated relearning by reducing the number of trials to criterion, a phenomenon now known as "savings" [20]. Importantly, in contrast to prior studies, the present study used the savings paradigm to detect traces of new FL vocabulary knowledge that has not necessarily reached the threshold for explicit recognition or recall.
As illustrated in Figure 1A, phase 1 of the experiment, the incidental learning phase, made use of multi-modal FL stimuli by presenting auditory and written FL words with a picture illustrating the meaning. Participants engaged in a letter-search task in order to provide an incidental learning situation. Importantly, participants did not know the FL and were unaware that their acquisition of FL vocabulary would be assessed later on. In order to complete the task, participants only needed to attend to the written word form: the auditory word form and the picture were irrelevant for the task. However, the meaning of the FL word could be inferred from the picture. In phase 2, the explicit learning task, participants were asked to learn the meaning of FL words through a translation recognition task. Auditory FL word forms from phase 1 (old words) as well as new auditory FL words not previously encountered were presented simultaneously with an English word that was either the correct or incorrect translation. It was expected that in the incidental learning phase, participants would start building some knowledge about the old words, and that this would help them reach the translation recognition threshold faster for these words than for completely new words during the subsequent explicit learning phase (Figure 1B).
To ensure that differences in performance for the old and the new words in the explicit learning task could be attributed to acquisition rather than to attentional arousal, in the incidental learning phase, a different group of participants (mismatched group) saw picture stimuli that did not match the correct meaning of the words. If attentional arousal leads to an advantage for the old words, the results for this group should not differ from the group where the pictures matched the meaning of the words, as both groups were exposed to the same FL word forms.
Another group of participants (multi-session group) took part in phase 2 of the experiment the next day rather than immediately after phase 1, and they completed the translation recognition task once again one week later. This multi-session group was used to explore whether the incidentally acquired form-meaning links were transitory or became embedded in memory after a relatively long retention interval.
Figure 1. In phase 1 (incidental learning), participants were exposed to 40 FL words in a letter-search task in which both the auditory and written forms of a FL (Welsh) word were presented simultaneously with a picture illustrating the meaning of the word (8 repetitions each). In phase 2 (explicit learning), participants were presented with an auditory Welsh word and were asked to indicate with a button press whether the written English word presented simultaneously on the screen was its correct translation or not. The 40 words from phase 1 (old words) as well as 40 new words were used for this part of the experiment. It was expected that in the incidental learning phase, participants would start building some knowledge about the old words, and that this would help them reach the translation recognition threshold faster for these words than for completely new words during the subsequent explicit learning phase. doi:10.1371/journal.pone.0060912.g001

Ethics Statement
This research was approved by the School of Psychology Ethics Committee at the University of Nottingham, and all participants gave written informed consent prior to taking part.

Participants
Sixty-six participants took part in the experiment and received payment for their participation. Participants were all native English speakers with no prior knowledge of Welsh. They were split into four groups. Two groups of 16 participants completed phases 1 and 2 of the study in a single session: the matched picture group (mean age 21.6, 11 females) and the mismatched picture group (mean age 21.0, 15 females). A multi-session group of 18 participants (mean age 18.9, 15 females) completed phase 1 on the first day of the study, phase 2 the next day, and returned one week later to complete phase 2 once more (one participant from this group did not return one week later and was therefore removed from the analyses). A further 16 participants (mean age 25.0, 12 females) were included as a control group and only completed phase 2 of the study.

Stimuli
Welsh was chosen as the FL because it uses the same script as English but is sufficiently different from English so that participants could not simply guess the meaning of the words based on phonological or orthographic similarity. The stimuli consisted of 80 Welsh words (both the written and auditory forms) and 80 pictures corresponding to these words [21]. The words were split into two sets, and these were matched for category [21], word frequency in English (based on CELEX and British National Corpus) and word length in Welsh. None of the words were Welsh-English cognates. In phase 1 of the experiment (incidental learning phase), participants were exposed to one set of words (counterbalanced across participants and groups) with their corresponding pictures, whilst in phase 2 of the experiment (explicit learning phase), all words were used. For the mismatched picture group, each word was presented with a randomly assigned picture in the incidental learning phase (e.g. a picture of a dog presented with the auditory and written Welsh word "bwrdd" meaning "table"), and the same picture was used for that word on all trials of the incidental learning phase. The words used in phase 1 were labeled "old words" and the words participants were exposed to for the first time in phase 2 were labeled "new words". For the control group, one set of words was also classified as "old words" and the other as "new words" (counterbalanced across participants) to perform the analysis, even though all of the words were presented for the first time in phase 2 for this group.
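The mismatched pairing described above can be illustrated with a short sketch. It assumes that "randomly assigned" means each word received the picture of a different word and kept it throughout phase 1; the function name, the seed, and the placeholder items are hypothetical and are not the study's materials or software.

```python
import random

def assign_mismatched_pictures(words, seed=None):
    """Map each word to the picture of a *different* word.

    The returned mapping is fixed, so a word keeps the same (wrong)
    picture on every trial of the incidental learning phase.
    """
    if len(words) < 2:
        raise ValueError("need at least two words to mismatch pictures")
    rng = random.Random(seed)
    pictures = list(words)
    # Reshuffle until no word is paired with its own picture.
    while True:
        rng.shuffle(pictures)
        if all(w != p for w, p in zip(words, pictures)):
            return dict(zip(words, pictures))

# Hypothetical placeholder items standing in for one 40-word set.
words = [f"word_{i:02d}" for i in range(40)]
mapping = assign_mismatched_pictures(words, seed=1)
```

Rejection sampling is used here only because it is short; any procedure that yields a pairing with no word matched to its own picture would serve the same purpose.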

Procedure
In phase 1 (incidental learning), participants were asked to perform a letter-search task. In each trial, they were presented first with a letter and then a written Welsh word. Their task was to indicate with a button press whether or not the word contained the letter. Each word was presented 4 times with a letter that was included in it and 4 times with a letter that was not (320 trials in total). Although irrelevant to the task, the corresponding auditory Welsh words and pictures were presented simultaneously with each written Welsh word. Participants were told that the words would be in a FL, but they were not informed that the FL was Welsh.
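As a minimal sketch of the trial structure just described, the following builds a phase 1 trial list in which every word appears 4 times with a letter it contains and 4 times with a letter it does not. The placeholder words, the random letter choice, and the trial order are assumptions; the source does not say how letters were selected or how trials were ordered, and the sketch treats every character as one letter.

```python
import itertools
import random
import string

def build_letter_search_trials(words, reps_per_type=4, seed=None):
    """Return shuffled letter-search trials: reps_per_type 'present' and
    reps_per_type 'absent' trials for every word."""
    rng = random.Random(seed)
    trials = []
    for word in words:
        present = sorted(set(word))
        absent = sorted(set(string.ascii_lowercase) - set(word))
        for _ in range(reps_per_type):
            trials.append({"word": word, "letter": rng.choice(present), "contains": True})
            trials.append({"word": word, "letter": rng.choice(absent), "contains": False})
    rng.shuffle(trials)
    return trials

# 40 hypothetical placeholder strings, not the Welsh stimuli.
words = ["".join(p) for p in itertools.islice(itertools.permutations("abcdefg", 4), 40)]
trials = build_letter_search_trials(words, seed=0)
print(len(trials))  # 320 = 40 words x 8 presentations
```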
In phase 2 (explicit learning), participants were presented with each auditory Welsh word and were asked to indicate with a button press whether the written English word presented simultaneously on the screen was the correct translation or not. Each Welsh word was presented once with the correct translation and once with a foil in each block. The foils were chosen randomly from amongst the correct English translations and were different for each block. After each trial, participants received feedback on the screen ("correct" or "incorrect") and they were instructed to use this feedback to learn the correct translations. At the end of each block (160 trials), the percentage of correct answers was calculated and displayed on the screen, and the task continued until a criterion of 80% correct answers in one block was met or after a maximum of 4 blocks (this was reduced to a maximum of 3 blocks for the multi-session group). For this part of the experiment, participants were informed that they would be asked to learn some Welsh words; however, they were not told that some of the words had already been presented in phase 1.
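The block structure and stopping rule of phase 2 can be sketched as follows. This is an illustration under stated assumptions rather than the experiment's software: `respond` is a hypothetical stand-in for the participant's button press, trial-by-trial feedback is omitted, and foils are simply redrawn at random in each block, which does not strictly guarantee a different foil every time.

```python
import random

def build_block(translations, rng):
    """translations maps each FL word to its correct English word.
    Each FL word yields one match trial and one foil trial."""
    english = list(translations.values())
    trials = []
    for fl_word, correct in translations.items():
        # The foil is another word's correct translation.
        foil = rng.choice([e for e in english if e != correct])
        trials.append((fl_word, correct, True))
        trials.append((fl_word, foil, False))
    rng.shuffle(trials)
    return trials

def run_explicit_phase(translations, respond, criterion=0.80, max_blocks=4, seed=None):
    """Run blocks until one block reaches the criterion or max_blocks is hit.
    Returns the proportion correct for each completed block."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(max_blocks):
        trials = build_block(translations, rng)
        n_correct = sum(respond(fl, en) == is_match for fl, en, is_match in trials)
        accuracies.append(n_correct / len(trials))
        if accuracies[-1] >= criterion:
            break
    return accuracies

# Hypothetical placeholder vocabulary: 80 FL words with English translations.
translations = {f"fl_{i:02d}": f"en_{i:02d}" for i in range(80)}
perfect = lambda fl, en: translations[fl] == en
print(run_explicit_phase(translations, perfect, seed=0))  # [1.0]
```

A responder who always answers "yes" scores 50% in every block and therefore runs through all 4 blocks without reaching criterion.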

Results
The number of hits and false alarms in blocks 1 and 2 of phase 2 of the experiment were used to calculate d' (d-prime) scores (see [22]) for all groups of participants for both old and new words. As participants had reached criterion in block 2 and therefore did not proceed to block 3, we did not analyze the results of block 3. Furthermore, the analyses of block 2 yielded the same results as the analyses of block 1, and therefore we only report the results of block 1 throughout the Results section.
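For readers unfamiliar with the measure, d' is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The sketch below shows one common way to compute it. The log-linear correction for rates of exactly 0 or 1 is an assumption, since the source does not state how extreme rates were handled, and the example counts are invented for illustration.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total) is
    applied so that rates of 0 or 1 do not give infinite z-scores.
    This correction is an assumption, not taken from the paper.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Invented example: 40 old words in one block, each shown once with the
# correct translation (40 match trials) and once with a foil (40 foil trials).
print(round(d_prime(hits=30, misses=10, false_alarms=10, correct_rejections=30), 2))
```

A d' of 0 corresponds to chance performance, which is why the above-chance tests in the Results compare d' against zero.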

Single-session Groups
3.1.1. Matched vs. mismatched picture groups. Accuracy. The overall error rate in the letter-search task of the incidental learning phase (phase 1) was low (5.8%).
The d' scores for block 1 of phase 2 were analyzed using a mixed-design ANOVA with group as a between-subject factor (matched and mismatched picture groups) and word type (new and old words) as a within-subject factor. The results showed significant main effects of word type, F(1, 30) = 5.43, p < .05, [...] Figure 3).
An analysis of hits only (correct match trials) revealed that the matched picture group were significantly faster at responding to the old words compared to their responses to the new words, F1(1, 30) = 6.69, p < .05, ηp² = .18, F2(1, 79) = 6.45, p < .05, ηp² = .08, whereas there was a trend for the mismatched picture group to be slower at responding to the old words compared to the new words, F1(1, 30) = 3.23, p = .08, ηp² = .09, F2(1, 79) = 3.46, p = .07, ηp² = .04. We do not report the full analyses of response times for hits as they yield the same results as the accuracy analyses.
3.1.2. Control vs. matched picture groups. Accuracy. A mixed-design ANOVA with group as a between-subject factor (matched picture group and control group) and word type as a within-subject factor (old and new words) revealed a significant main effect of word type, F(1, 30) = 9.53, p < .01, ηp² = .24; however, the main effect of group was only marginally significant, F(1, 30) = 3.92, p = .06, ηp² = .12. Crucially, the interaction between word type and group was significant, F(1, [...]
Response Times. A mixed-design ANOVA with group as a between-subject factor (matched picture group and control group) and word type (new and old words) as a within-subject factor revealed no significant main effects of group, F1(1, 30) = 1.41, p = .24, ηp² = .05, F2(1, 79) = 57.05, p < .001, ηp² = .42, or word type, Fs < 1, and no interaction, Fs < 1.
3.1.3. Control vs. mismatched picture groups. Accuracy. A mixed-design ANOVA with group as a between-subject factor (mismatched picture group and control group) and word type as a within-subject factor (old and new words) revealed neither a main effect of word type, F(1, 30) = 3.67, p = .07, ηp² = .11, nor of group, F(1, 30) = 2.73, p = .11, ηp² = .08, and no interaction between word type and group, F(1, 30) = 1.02, p = .32, ηp² = .03.
Response Times. A mixed-design ANOVA with group as a between-subject factor (mismatched picture group and control group) and word type (new and old words) as a within-subject factor revealed neither a main effect of word type, F1(1, 30) = 4.14, p = .05, ηp² = .12, F2(1, 79) = 2.61, p = .11, ηp² = .03, nor a main effect of group, F1 < 1, F2(1, 79) = 14.16, p < .001, ηp² = .15. However, there was a strong trend for an interaction between group and word type, [...]

Multi-session Group
Accuracy. Error rates in the letter-search task of phase 1 were again low (6.4%).
The d' scores for block 1 of phase 2 were submitted to a repeated-measures ANOVA with word type (new and old) and delay between phases (one day and one week) as within-subject factors. After a one-week delay, many participants only completed one block of trials as they reached criterion in block 1, and therefore we did not analyze the results of block 2 for the multi-session group. The results showed a main effect of word type, indicating that d' scores were significantly higher overall for old words than for new words (M = 1.03, SE = 0.12 vs. M = 0.65, SE = 0.11), F(1, 16) = 10.78, p < .01, ηp² = .40. Furthermore, there was a main effect of delay between phases, indicating that d' scores were overall higher one week later than the next day (M = 1.23, SE = 0.15 vs. M = 0.45, SE = 0.08), F(1, 16) = 42.81, p < .001, ηp² = .73. This was expected, however, as participants returned to complete the translation recognition task one week later, having already completed 2 or 3 blocks of learning on this task (depending on when they reached the 80% criterion level) the day after phase 1. This explains the overall higher accuracy scores one week later relative to the first block after a one-day delay. Importantly, there was no interaction between word type and delay between phases, F < 1, which indicates that participants scored significantly higher for the old words both the next day, F(1, 16) = 8.82, p < .01, ηp² = .36, and one week later, F(1, 16) = 7.93, p < .05, ηp² = .33. Finally, similarly to the single-session groups, d' scores in phase 2 for the new words were significantly above chance, t(16) = 2.95, p < .01, d = 0.72.
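The effect sizes reported above can be recovered from the test statistics with two standard identities: partial eta squared equals F x df_effect / (F x df_effect + df_error), and Cohen's d for a one-sample t-test equals t divided by the square root of n. The sketch below applies them to the multi-session values as a consistency check; the function names are illustrative, and n = 17 is inferred from the 16 degrees of freedom.

```python
import math

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def cohens_d_one_sample(t_value, n):
    """Cohen's d for a one-sample t-test with n observations."""
    return t_value / math.sqrt(n)

# Values reported for the multi-session group.
print(round(partial_eta_squared(10.78, 1, 16), 2))  # 0.4  (word type)
print(round(partial_eta_squared(42.81, 1, 16), 2))  # 0.73 (delay)
print(round(cohens_d_one_sample(2.95, 17), 2))      # 0.72 (new words vs. chance)
```

All three values agree with the reported statistics.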

Discussion
The results revealed incidental acquisition of FL vocabulary through a brief exposure to multi-modal stimuli. Being exposed to the written and auditory word forms of the FL words, as well as a picture illustrating the meaning of the word, resulted in incidental acquisition of FL vocabulary knowledge as shown by the higher scores for these words in the translation recognition task both immediately after the incidental learning task as well as the next day. In addition, the incidental learning effect remained one week later in the subsequent explicit learning task.
Participants in the mismatched picture group did not benefit from being exposed to the old words in the incidental learning phase; in fact, they suffered from being exposed to the wrong pictures, as shown by significantly slower responses to the old words than the new words in the explicit learning phase. This disadvantage caused by the mismatched pictures in the incidental learning phase indicates that this group made form-meaning links that were incorrect. Thus, the higher scores for the words included in the incidental learning phase for the groups exposed to the correct pictures are due to the representation of form-meaning links rather than simple arousal.
An important question is what kind of learning best explains the results of both the matched and mismatched picture groups. Crucially, the observed acquisition of vocabulary reflects more than paired-associate learning between the auditory FL word form and the written native language word form, as this pairing was not presented in phase 1. Here, participants were exposed to the written FL word form (necessary to complete the letter-search task), the auditory FL word form and the meaning of the word via the picture. Written English translations were not presented in phase 1. One explanation for the results is that participants linked the FL word forms with the semantic representation of the words activated by the pictures during phase 1. Then, when the auditory FL word forms were presented in phase 2, participants activated the meaning of the FL words (acquired via the pictures in phase 1) and from there, they could access the written native language word form and reach a decision as to whether the translation was correct. Equally, translation recognition could have occurred if the written English word form activated its meaning, which in turn was linked to the FL word form. Either way, participants relied on form-meaning links acquired during phase 1 to complete the translation recognition task. This interpretation is compatible with Dobel et al. [23], who also argued that form-meaning links were created during their statistical learning paradigm. In their study, participants were exposed to novel phonological word forms (pseudowords in the native language) in combination with pictures, with correct pairings occurring more frequently than incorrect ones. After completing 5 sessions over 5 consecutive days, participants achieved 90% accuracy in a translation test. The authors concluded that their results showed learning beyond mere stimulus-stimulus association, as the native language word forms used in the translation test were never presented during the statistical learning paradigm.
An alternative explanation for the observed incidental learning effect found here is based on the cascading activation model of speech production (see [24][25][26][27]). This model predicts that even irrelevant pictures will automatically activate their conceptual representation, and that this in turn will cascade down to activate the lexical representations. Applying this model here would suggest that during the incidental learning phase, the presentation of the line drawings automatically activated the semantic representation for the concept and that this in turn activated the native language lexical representation of the word. As a consequence, it is possible that links were created between the latter and the FL lexical representations (phonological and/or orthographic). However, our task did not involve naming, and it is less clear whether pictures that are irrelevant would activate lexical representations in a task that does not require explicit picture naming (but see [28]). Crucially, even though the cascading model predicts the activation of the native language word form during the processing of the line drawing, the concept still needs to be activated first. Therefore, links could have been created between the FL word forms and BOTH the concept AND the native language word form, i.e., both form-meaning and form-form links. It is also important to remember that all this happened extremely rapidly and in parallel while participants were performing the letter-search task, which required attention to be focused on the FL written word form.
Our current data do not allow us to rule out the second explanation for the locus of the incidental FL vocabulary learning. However, what is certain is that representations in the mental lexicon, either semantic and/or lexical, were automatically activated during the incidental learning phase, and that this in conjunction with the processing of the FL word forms was responsible for the learning. Furthermore, the current study does not enable us to describe the neural mechanisms responsible for the creation and consolidation of form-meaning links, nor was it the aim of the experiment. However, these would likely involve working memory structures (for example the episodic buffer) [29][30] with a rapid initial familiarization stage followed by a slower consolidation process as proposed by the complementary learning systems model of memory [31].
Regardless of the precise locus of the form-meaning links, the acquisition of FL vocabulary occurred very rapidly in the incidental learning phase, as FL words were only presented 8 times. This is much faster FL word learning with complete novices than found in previous studies. For example, McLaughlin, Osterhout and Kim [10] reported the first evidence of vocabulary learning (the learning of word forms) after 14 hours of exposure to French in a classroom setting. However, learners only became sensitive to the semantic properties of the FL words after approximately 60 hours of exposure. Another interesting study using informal exposure to a 7-minute Chinese weather report showed that some participants became sensitive to the spoken word forms included in the report as opposed to new FL word forms [4]. This study used a similar approach to ours, as the learning was incidental, and the words were presented between 2 and 8 times each in the weather report. However, the results revealed sensitivity to the word forms, which is an early stage of FL word learning, but not to the meaning of the words.
Another important aspect of the present data is the persistence of the incidental learning the next day as well as one week later, which highlights the long-lasting impact of informal multi-modal FL exposure in vocabulary learning. This suggests that multi-modal FL exposure through activities such as games or watching FL films with subtitles could facilitate subsequent formal vocabulary learning even days later.
The new methodology to measure vocabulary acquisition in the current study was based on the savings paradigm. The results indicate that this paradigm is sensitive enough to detect differences in lexical knowledge between words presented in an incidental learning phase and completely new words. This more sensitive measure of vocabulary acquisition could be used in future incidental learning studies as an alternative to traditional recognition and recall vocabulary tests because it does not require explicit vocabulary knowledge.
Overall, our results show for the first time incidental vocabulary learning beyond the form level with complete beginners in the FL. Importantly, this learning persisted the next day as well as one week later. Learning and being able to use FL vocabulary fluently takes a long time, and the present findings show that incidental vocabulary acquisition through multi-modal exposure can play an important role in facilitating this process.