Can adults learn L2 grammar after prolonged exposure under incidental conditions?

Panagiotis Kenanidis; Ewa Dąbrowska; Miquel Llompart; Diana Pili-Moss

doi:10.1371/journal.pone.0288989

Abstract

While late second language (L2) learning is assumed to be largely explicit, there is evidence that adults are able to acquire grammar under incidental exposure conditions, and that the resulting knowledge may be implicit in nature. Here, we revisit the question of whether adults can learn grammar incidentally and investigate whether word order and morphology are susceptible to incidental learning to the same degree. In Experiment 1, adult English monolinguals were exposed to an artificial language (Kepidalo) that had case marking and variable word order: a canonical Subject-Object-Verb order and a non-canonical Object-Subject-Verb order. In a five-session online study, participants received vocabulary training while being incidentally exposed to grammar, and completed a series of picture-selection and grammaticality judgment tasks assessing grammatical knowledge. Despite extensive exposure to input, and although vocabulary performance increased significantly across sessions, learners’ grammatical comprehension showed little improvement over time, and this improvement was limited to Subject-Object-Verb sentences. Furthermore, in the grammaticality judgment tasks, participants were better at detecting word order violations than case marking violations. Experiment 2 further increased the amount of incidental exposure and tested native speakers of German, a language with richer morphology. Testing was followed by a metalinguistic awareness questionnaire. Although greater learning effects were observed, participants continued to have difficulties with case marking. The findings also demonstrated that language outcomes were modulated by learners’ level of metalinguistic awareness. Taken together, the results of the two experiments underscore adult learners’ difficulty with case marking and point towards the presence of a threshold in incidental L2 grammar learning, which appears to be tightly linked to prior first language experience. In addition, our findings continue to highlight the facilitative role of conscious awareness in L2 outcomes.

Citation: Kenanidis P, Dąbrowska E, Llompart M, Pili-Moss D (2023) Can adults learn L2 grammar after prolonged exposure under incidental conditions? PLoS ONE 18(7): e0288989. https://doi.org/10.1371/journal.pone.0288989

Editor: Claudia Felser, Potsdam University, GERMANY

Received: November 27, 2022; Accepted: July 7, 2023; Published: July 26, 2023

Copyright: © 2023 Kenanidis et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data and code are available on OSF: DOI 10.17605/OSF.IO/3JY52.

Funding: We acknowledge financial support by an Alexander von Humboldt Professorship (ID- 1195918) awarded to the second author.

Competing interests: The authors have declared that no competing interests exist.

Introduction

There is a general consensus that mastering a second language (L2) is a notoriously demanding task, particularly for adult learners, and that native-like attainment is therefore very rarely achieved. Previous studies on L2 acquisition indicate that learning a language later in life is largely guided by explicit learning [1–3]. In line with this, studies comparing L2 learning under incidental and intentional conditions have demonstrated that providing explicit information about the learning target leads to greater learning gains [4–6]. By contrast, first language (L1) acquisition is thought to occur mostly unconsciously and automatically (e.g., [7–10]).

Interestingly, however, a growing body of research on artificial language learning has provided evidence that participants can rapidly develop knowledge about different aspects of a novel (L2) grammar even under incidental conditions (e.g., [11–13]). Yet, a recurrent pattern in these studies is that participants typically perform only slightly above chance, rarely exceeding 60% accuracy [14, 15]. These findings may be attributable to three distinct yet potentially overlapping factors. First, adults’ capacity to learn grammatical rules incidentally may, to a certain extent, be affected by maturational constraints, resulting in low learning rates [9, 16]. Second, language learning outcomes are frequently assessed after a limited amount of exposure to input, usually confined to a single session, which may be insufficient for learners to develop robust grammatical knowledge (for studies with an extensive language training regimen, see [17, 18]). In fact, notwithstanding their differences, contemporary cognitive models of L2 acquisition (e.g., [2, 9, 19, 20]) converge in suggesting that repeated exposure to input and practice can lead to better language learning outcomes. Third, adult L2 attainment appears to be substantially affected by learners’ previous L1 experience [1, 21, 22]. Hence, given that in the majority of studies the target population consisted of native speakers of English [11–13], a language with fixed word order, it remains far from clear to what extent previous learning outcomes reflect limits on learning under incidental exposure conditions per se, and to what extent they are additionally modulated by prior L1 experience.

The present study set out to revisit the question of whether adults can learn novel grammatical structures incidentally, while addressing the aforementioned gaps in the literature. To this end, we examined whether word order and inflectional morphology are susceptible to learning under incidental conditions to the same extent, as well as the degree to which their learnability is affected by the amount of exposure to the novel language and by the similarity between the L1 and the target language. Furthermore, most studies have tested the learning of one grammatical feature at a time (e.g., case marking [13, 17]; word order [23–25]; noun-adjective agreement [26]; verb morphology [27]). As a result, surprisingly little is known about the order in which different aspects of grammar are acquired when adult learners are incidentally exposed to multiple grammatical features simultaneously. Comparing the simultaneous acquisition of several grammatical features makes it possible to study how participants weight novel linguistic cues, which one(s) they prioritize and how the development of one may influence the development of the other(s), thus providing important insights into the language acquisition process. Note that, in this paper, the terms acquisition and learning are used interchangeably to refer to the process by which one learns language.

Background

Incidental and intentional L2 learning

As part of the effort to gain a better understanding of the fundamental cognitive mechanisms underlying L2 learning, researchers have explored how learners acquire linguistic knowledge under two different conditions: incidental and intentional exposure. In this context, intentional exposure refers to the condition in which participants are given explicit information about the learning targets or are instructed to engage in deliberate hypothesis testing and memorization of rules [3, 28, 29]. Such conditions primarily promote the engagement of explicit learning processes, which are thought to contribute mainly to the development of explicit knowledge, often indexed by learners’ ability to verbalize the acquired rules [29, 30]. In contrast, incidental exposure is operationalized as the situation in which participants are informed neither about the learning target nor about the subsequent test phase [29, 31]. To achieve this, a cover task is used that is intended to focus learners’ attention on another activity that requires them to process the input for meaning, rather than overtly encouraging them to focus consciously on the linguistic structure. Learning under incidental conditions is considered to favor the involvement of implicit learning processes, which result in the acquisition of implicit knowledge [30].

The terms intentional and incidental are often conflated with the terms explicit and implicit, respectively, and the two pairs are occasionally used interchangeably. However, they are not identical. The former two terms are more appropriate for describing the experimental conditions that researchers design to investigate the type of learning taking place and the nature of the L2 knowledge participants develop. In contrast, the latter two refer to the internal learning process that is engaged while acquiring new knowledge. This distinction can account for previous findings showing that participants can engage both explicit and implicit learning processes and can acquire both explicit and implicit knowledge irrespective of the conditions to which they are exposed [11, 27, 32–35]. In accordance with this distinction, the terms intentional and incidental will be used here to refer to the environmental conditions under which learning takes place, without making any assumptions about the underlying learning processes.

While a series of studies on artificial and semi-artificial languages has demonstrated a significant advantage for learning novel grammatical structures under intentional over incidental exposure conditions [24, 35–37], learners have been shown not only to succeed in learning grammatical structures incidentally but, in some cases, to develop knowledge that is partly implicit (e.g., [13, 38, 39]). Although these findings may suggest that some aspects of L2 grammar can be learned without intention or awareness, the overall learning effect observed is generally limited. For example, Rebuschat and Williams [30] exposed adult learners to an artificial language consisting of English vocabulary and German word order and evaluated learning of word order via a grammaticality judgment task (GJT). Participants in the incidental group performed at only ~55% accuracy, while the addition of an elicited imitation task resulted in a larger learning effect (~62%). Subsequent studies have found similar learning effects [23, 24]. Small learning effects have also been reported in studies targeting the learning of novel case markers. Rogers et al. [13] tested L1 English speakers’ ability to learn case marking incidentally by presenting them with a semi-artificial language consisting of English phrases and Czech nouns marked either for nominative (-a) or accusative case (-u). This design was aimed at directing participants’ attention to the target grammatical markers, thereby facilitating noticing. Despite this, and although learners showed above-chance performance, mean accuracy was only ~56%. These results have been corroborated by subsequent studies testing the learning of inflectional morphology under incidental exposure [12, 40, 41].
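To make the notion of "above-chance" performance concrete, the sketch below computes a one-sided exact binomial p-value for a single learner's accuracy on a two-choice task such as a GJT. It is a hypothetical illustration only: the function name and trial counts are assumptions, and group-level analyses in published studies generally rely on other tests (e.g., t-tests against chance or regression models).

```python
from math import comb


def p_above_chance(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= correct) if the learner were guessing.

    `chance` is 0.5 for two-alternative tasks such as a grammatical/ungrammatical GJT.
    """
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical learner at ~55% accuracy, tested on few vs. many items.
for total in (60, 600):
    correct = round(0.55 * total)
    print(f"{correct}/{total} correct: p = {p_above_chance(correct, total):.4f}")
```

With these assumed numbers, the same 55% accuracy does not reach the conventional .05 threshold over 60 items but does over 600, which illustrates why small learning effects require many observations to be detected reliably.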

One potential explanation for these findings relates to the limitations imposed on learners by the nature of implicit learning. Incidental contexts are thought to engage primarily implicit learning processes [42]. According to the literature on L2 acquisition, the ability to learn a language implicitly decreases with age [16, 43]. However, this age effect does not apply uniformly to all components of implicit learning [44]. Therefore, adults may retain the ability to learn simple structures implicitly but may face problems with low-salience and abstract linguistic patterns and rules [45], which may explain learners’ severe difficulties with L2 inflectional morphology.

Additionally, implicit learning requires extensive and repeated exposure to input for new linguistic knowledge to develop [1]. Despite this, most studies examine learning outcomes shortly after exposure to novel structures, and this exposure may be too brief for robust learning to occur [14]. Importantly, prior work has shown that merely increasing exposure to the linguistic stimuli within the same experimental session does not significantly improve learning gains [41, 46]. In contrast, accuracy scores tend to improve when exposure is extended to at least a second session [18]. One explanation for these divergent findings may lie in sleep-related memory consolidation of new information. Such consolidation effects have been demonstrated both for cognitive abilities, such as implicit learning [47], and for novel word learning [48]. Hence, studies examining grammar learning after minimal exposure to input may underestimate adults’ learning abilities. Thus, given the scarcity of (micro-)longitudinal studies in this area, the extent to which adults can learn grammatical rules under incidental conditions remains unclear, as does whether relatively extensive incidental exposure would result in higher levels of L2 grammatical accuracy. To gain a fuller understanding of adult learners’ capacity for incidental grammar learning, in the present study language exposure was spread over five separate sessions, which allowed us to track how learning develops over time.

L1 transfer in L2 grammar learning

Another reason for the meagre learning effects observed in previous studies may be that the acquisition of various aspects of grammar is generally challenging for late L2 learners [49–51]. These difficulties can be traced to various factors, such as input frequency [52, 53], complexity of features [54], emotions [55], individual differences in cognitive abilities [56] and prior L1 knowledge [57–59]. Among these factors, how prior linguistic experience influences the perceived difficulty of L2 structures has received considerable attention in the L2 learning literature. Previous research has shown that, at the initial stages of learning a new language, participants tend to use L1 processing strategies to interpret L2 sentences [60–63]. Prior L1 experience tunes the perceptual system, which can interfere with subsequent L2 processing. Associative learning mechanisms are thus hindered by such learned attention effects [58, 64]. Specifically, earlier L1 experience with a cue (e.g., a temporal adverb such as yesterday or today) that reliably leads to a particular outcome (e.g., temporal reference) may block the acquisition of another cue that is also relevant for the interpretation (e.g., past tense -ed) [65]. Such effects can be particularly detrimental for cues that lack perceptual salience, have low communicative value (e.g., agreement) and are not present in the L1.
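The blocking account described above is commonly formalized with the Rescorla–Wagner learning rule, in which all cues present on a trial share a single prediction error. The simulation below is a hypothetical illustration of that mechanism, not a model fitted in this study: the cue labels, the learning rate and the trial counts are assumptions.

```python
def rescorla_wagner(trials, weights=None, rate=0.1, lam=1.0):
    """Update cue-outcome associative strengths with the Rescorla-Wagner rule.

    trials  -- sequence of sets, each naming the cues present on one trial;
               the outcome (here, past temporal reference) occurs on every trial.
    weights -- starting strengths (cues not listed start at zero).
    rate    -- combined salience/learning-rate parameter (alpha * beta).
    lam     -- maximum associative strength the outcome can support (lambda).
    """
    v = dict(weights or {})
    for present in trials:
        for cue in present:
            v.setdefault(cue, 0.0)
        error = lam - sum(v[cue] for cue in present)  # prediction error shared by all present cues
        for cue in present:
            v[cue] += rate * error
    return v


# Blocking: L1-like experience in which the adverb alone predicts the outcome...
l1_experience = rescorla_wagner([{"adverb"}] * 100)
# ...followed by L2 input in which the adverb and the suffix -ed co-occur.
blocked = rescorla_wagner([{"adverb", "-ed"}] * 100, weights=l1_experience)
# Control: both cues are encountered together from the start.
control = rescorla_wagner([{"adverb", "-ed"}] * 100)

print(f"-ed after prior adverb experience: {blocked['-ed']:.3f}")
print(f"-ed without prior experience:      {control['-ed']:.3f}")
```

Because the adverb already predicts the outcome almost perfectly, little prediction error remains for -ed to absorb, so its strength stays near zero, whereas in the control run the two cues split the available strength roughly evenly.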

However, studies directly addressing the effect of L1 experience on artificial language learning are scarce. Some evidence of this effect comes from Williams and Kuribara [25], who exposed L1 English speakers to a semi-artificial language consisting of English words and Japanese syntax. Participants in the incidental exposure condition were informed about the function of the case markers and were presented with a number of sentences, mainly canonical sentences and a minority of scrambled structures. They were then tested on their knowledge of the different word order regularities of the language. The results of a GJT containing novel lexis and some new structures showed that while participants learned the canonical structures, they did not reliably reject the new ungrammatical sentences, particularly those that were grammatical in English, indicating that they did not generalize the notion of scrambling to new sentences. Instead, learners developed a strong preference for canonical word orders. Additional evidence is provided by Gao and Ma [35]. In a replication of Tagarelli et al. [24], L1 Chinese participants were presented with sentences that had Chinese vocabulary and German grammar and allowed for three grammatical structures, one simple and two complex, which differed in terms of verb placement. Following exposure, participants trained under both incidental and instructed conditions completed a GJT and an elicited production task. Whereas in the original study the incidental exposure group of L1 English speakers learned both the simple and one of the complex patterns, linguistic complexity did not emerge as a significant predictor for the L1 Chinese speakers, who performed close to chance on all structures. According to the authors, this pattern can be attributed to the fact that Chinese allows verbs to occur later in the sentence, causing strong L1 interference.
Similar findings appear to emerge from studies examining incidental learning of mappings between novel determiners and semantic properties of nouns. In one of their experiments, Leung and Williams [66, 67] introduced L1 Chinese and L1 English speakers to a miniature artificial determiner system and told them that the determiners encoded the distance between the speaker and the object (gi and ro for near objects and ul and ne for far objects). However, participants were not informed that these determiners also encoded the shape of objects (gi and ul referred to long objects, while ro and ne referred to flat objects). Subsequently, participants were tested on their ability to incidentally learn the relationships between the determiners and their shape meanings. Both groups were visually presented with noun phrases (e.g., gi shoelace vs ul tissue) in their native language and were asked to indicate, as quickly and accurately as possible, first, whether the object presented was long or flat and, second, whether it was near or far. L1 Chinese speakers, but not L1 English speakers, managed to learn the hidden associations, taking advantage of the fact that the shape distinction is explicitly encoded in the classifier system of (written) Chinese. Using the same artificial determiner system and experimental design, Cayado and Chan [67] tested Chinese–English bilinguals’ and native English speakers’ ability to learn the associations between determiners and fire/water semantic categories (gi and ul for water-related words and ro and ne for fire-related words), a distinction that is also marked in written Chinese. Here, test items were presented in English to both groups. While both groups showed evidence of learning, the Chinese–English bilinguals responded faster than the L1 English speakers despite testing taking place in their L2. Overall, then, earlier studies suggest that different patterns of performance may arise depending on learners’ L1 background.
Yet, to date, the role of L1 experience and transfer in artificial language learning has not been comprehensively tested, which limits both the generalizability of previous findings and our understanding of the potential and limits of adult incidental grammar learning. Therefore, an additional aim of the current study was to address this gap by investigating how prior linguistic experience moderates L2 grammar learning under incidental exposure conditions.

Artificial language paradigms

Natural languages are highly complex; consequently, isolating and examining how learners acquire a particular structure or pattern, and which factors are involved in the acquisition process, can be difficult. This problem can be overcome by using artificial language paradigms [for reviews, see 68, 69]. Such paradigms allow researchers to exert full control over the type of structures or patterns tested, the degree of (di)similarity to learners’ known language(s), the amount of input that learners are exposed to and the type of exposure. In contrast to natural languages, artificial languages allow participants to achieve high levels of proficiency within a short amount of time. Furthermore, given that the vast majority of artificial language studies are conducted in a controlled laboratory environment, researchers are able to specify the desired inclusion criteria and focus on specific structures, while also avoiding potential confounds associated with participant characteristics [70].

However, these paradigms also have potential limitations, chief among them concerns over their ecological validity: given the simplified nature of the input and the target structures, results from artificial languages may not fully scale up to natural languages. This seems particularly relevant for semi-artificial languages, where attaching artificial or unknown morphological markers to real words, often known to learners, may increase the salience of these markers and, consequently, their learnability [27, 33]. Despite these concerns, previous neuroimaging studies suggest significant parallels in brain activity during artificial and natural language processing [71–73]. Additionally, performance on artificial language learning measures has been found to correlate positively with accuracy on natural language learning measures [74]. Therefore, the methodological advantages that artificial language paradigms offer, together with the similarities in the neural correlates and mechanisms underlying the processing of novel and native language constructions, allow such paradigms to serve as ‘test tube’ models of natural languages [38], making them a particularly useful tool for investigating L2 learning and bilingualism [75, 76].

The present study.

The primary goal of the current study is to investigate whether adults can learn different aspects of the grammar of a novel language under incidental exposure conditions. Extending earlier work, we explore whether prolonged incidental exposure can lead to more robust learning effects. Additionally, we examine the extent to which grammar attainment is influenced by learned attention effects stemming from learners’ prior L1 experience. The findings of two experiments are reported here. In the first experiment, adult L1 English speakers were exposed to an artificial language over five separate sessions, during which they were trained on the vocabulary of the language and completed a series of grammatical comprehension tests. The artificial language, Kepidalo, had variable word order and case marking on nouns and adjectives, features that are not present in English. To further tease apart the effects of incidental exposure and prior L1 experience on grammar learning, in a follow-up study (Experiment 2) we repeated the same experiment with native speakers of German, a morphologically richer language, while also increasing the amount of exposure to six sessions. Experiment 2 also investigated the relationship between learning outcomes and participants’ level of metalinguistic awareness, which was assessed by means of a post-test questionnaire.

The following research questions were addressed in the two experiments:

RQ1. Can adult learners acquire grammar under incidental exposure conditions?
RQ2. If so, what aspects do they acquire (word order, case marking, agreement marking)?
RQ3. Is extensive incidental exposure sufficient to obtain robust learning effects?
RQ4. To what extent does learners’ L1 background modulate L2 grammar learning?
RQ5. To what extent is L2 grammar learning associated with metalinguistic awareness of the target structures?

Regarding RQ1, based on previous research demonstrating at least some grammar learning after a single session of incidental exposure to the linguistic stimuli (e.g., [12, 13, 23, 40]), and considering the extensive amount of artificial language input participants received, we predicted that evidence of grammar learning would be found for both L1 groups. For RQ2, following previous studies, we hypothesized that both the L1 English and the L1 German participants would show greater learning effects for word order than for morphology, given the low salience of morphosyntactic cues. For RQ3, given the scarcity of available research, our predictions were more tentative. While both Rogers [40] and Williams [46] failed to find better performance after increasing or even doubling the amount of exposure to stimuli within the same session, Pili-Moss, Brill-Schuetz, Faretta-Stutenberg and Morgan-Short [77], who provided a session-by-session analysis of the data originally collected by Morgan-Short et al. [18], found that learners’ grammatical abilities improved over time (see also [78] for a similar pattern of results). Given that in the present study participants completed each session on a different day, allowing for consolidation effects to occur, we hypothesized that grammatical comprehension would increase as a function of time. Nevertheless, we still expected persistent difficulties with inflectional morphology throughout the study. Regarding RQ4, we predicted that the L1 English learners would show strong L1 transfer effects, leading to relatively low accuracy scores at the early stages of exposure, with performance then improving over time (Experiment 1). Since the L1 German participants have prior experience with case marking and word order variation from their L1, we expected them to outperform the L1 English group on all aspects of grammar (Experiment 2).
Finally, for RQ5 (Experiment 2), we hypothesized that developing knowledge of which learners are aware would result in better learning outcomes, given the facilitative role of such knowledge in L2 learning [17, 39, 79–81]. In addition, we expected the effect of metalinguistic awareness on learning to become stronger with increased exposure to artificial language input. Predictions regarding RQs 1, 2, and 5 were borne out, while our results regarding RQs 3 and 4 were less conclusive.

Experiment 1

Method

Participants.

Forty-one adults with a mean age of 22.02 years (SD = 4.17, range = 18–35) participated in the study. All participants were monolingual native speakers of English resident in the United Kingdom. Participants were recruited via email and social media (Facebook and Twitter; N = 14) and through Prolific, an online participant recruitment platform (https://www.prolific.co; N = 27). To ensure that participants fulfilled the inclusion criteria, the following filters were applied: English speaking Monolingual; Nationality: United Kingdom; Country of Birth: United Kingdom; Age: 18–35; Country of Residence: United Kingdom. Participants gave their consent electronically and received monetary compensation for their time (60.70 GBP). Based on self-reports, participants had on average 15.93 years of formal education (SD = 1.79, range = 12–21).

Artificial language learning game.

Participants were exposed to the novel artificial language in the context of an online computer-based game. In this game, the learners’ task was to travel to Tikon, a fictitious galaxy, and complete a number of challenges in order to collect four weapons that would help them defend Earth from an alien invasion. To accomplish this goal, participants had to learn an artificial language, Kepidalo.

The lexicon of Kepidalo comprised 14 disyllabic pseudowords: 8 nouns, 4 verbs and 2 adjectives (see S1 Appendix). The verbs designated semantically transitive events and always occurred with a direct object. All nouns and adjectives were overtly marked for case. The nouns were evenly distributed across two classes. In the nominative case, nouns belonging to Class 1 were marked with the suffix -i, whereas nouns of Class 2 bore the suffix -a. In the accusative case, nouns of both classes took the suffix -o. The novel words were constructed to be easily pronounceable by participants.
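The nominal case system described above can be summarized as a lookup from (class, case) to suffix. In the sketch below, the stems alg (Class 1) and flub (Class 2) are taken from the pretraining examples (Alg-i, Flub-a); the lowercase spelling and the function name are assumptions, and adjectives are left out because their suffixes are not spelled out in this section.

```python
# Suffixes of Kepidalo nouns by (noun class, case), as described in the text.
NOUN_SUFFIXES = {
    (1, "NOM"): "i",
    (2, "NOM"): "a",
    (1, "ACC"): "o",
    (2, "ACC"): "o",  # the accusative does not distinguish the two classes
}


def inflect_noun(stem: str, noun_class: int, case: str) -> str:
    """Attach the case suffix to a noun stem (spelling here is illustrative)."""
    try:
        return stem + NOUN_SUFFIXES[(noun_class, case)]
    except KeyError:
        raise ValueError(f"unknown class/case combination: {noun_class}/{case}") from None


for stem, cls in (("alg", 1), ("flub", 2)):
    print(stem, inflect_noun(stem, cls, "NOM"), inflect_noun(stem, cls, "ACC"))
# alg algi algo
# flub fluba flubo
```

One consequence is visible in the output: only nominative forms reveal a noun's class, since both classes share the accusative -o.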

In terms of syntax, Kepidalo was a verb-final language in which the order of subject and object was free, thus exhibiting either a canonical (SOV; 1a) or a non-canonical (OSV; 1b) word order. Adjectives were optional, occurred postnominally and carried an inflection morpheme that agreed in class and case with the noun they modified.

(1) a. Noun_NOM – (Adj_NOM) – Noun_ACC – (Adj_ACC) – Verb
    b. Noun_ACC – (Adj_ACC) – Noun_NOM – (Adj_NOM) – Verb
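Because the order of subject and object is free, it is the case suffixes rather than word position that identify who is doing what to whom. The sketch below contrasts a case-based interpretation with an SOV word-order heuristic of the kind an L1 English learner might apply. It handles only adjective-free three-word sentences; the noun forms follow the suffixes described above, while the verb tumel is a hypothetical placeholder (the actual Kepidalo verbs are listed in S1 Appendix).

```python
NOMINATIVE = ("i", "a")
ACCUSATIVE = ("o",)


def roles_from_case(sentence: str) -> dict:
    """Assign subject/object in an adjective-free Kepidalo clause using case suffixes."""
    *nouns, verb = sentence.split()  # Kepidalo is verb-final
    if len(nouns) != 2:
        raise ValueError("this sketch handles only Noun-Noun-Verb clauses")
    roles = {"verb": verb}
    for noun in nouns:
        if noun.endswith(NOMINATIVE):
            roles["subject"] = noun
        elif noun.endswith(ACCUSATIVE):
            roles["object"] = noun
    if "subject" not in roles or "object" not in roles:
        raise ValueError("expected one nominative and one accusative noun")
    return roles


def roles_from_order(sentence: str) -> dict:
    """Heuristic that ignores case and treats the first noun as the subject (SOV)."""
    first, second, verb = sentence.split()
    return {"subject": first, "object": second, "verb": verb}


for s in ("algi flubo tumel", "flubo algi tumel"):  # SOV (1a), then OSV (1b)
    print(s, "| case:", roles_from_case(s)["subject"], "| order:", roles_from_order(s)["subject"])
# algi flubo tumel | case: algi | order: algi
# flubo algi tumel | case: algi | order: flubo
```

The two strategies agree on canonical SOV sentences and diverge only on OSV sentences, which is why OSV items are the ones that reveal whether case marking has been learned.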

A total of 400 Kepidalo sentences were generated for the experiment. Within these sentences, all lexical items (nouns, verbs and adjectives) occurred an equal number of times, with each noun being assigned to the subject and the object positions with equal frequency (45 times in each position). In addition, 290 of the sentences had SOV word order, while the remaining 110 had a non-canonical OSV word order. All sentences were three to five words long and had an average duration of 1967 ms (range = 1622–2742 ms). The auditory stimuli were synthesized using the Google Cloud Text-to-Speech service. We opted for a Polish-accented synthesized voice to contribute to the impression that participants were learning a new language spoken by an alien character. Furthermore, a slightly slower than normal speaking rate was used (0.75, where 1 corresponds to the normal rate), as this is thought to facilitate L2 comprehension [82, 83].
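For readers producing similar stimuli, the speaking-rate setting corresponds to the speakingRate field of a Google Cloud Text-to-Speech v1 text:synthesize request, where 1.0 is the normal rate and the documented range is 0.25 to 4.0. The sketch below only builds the JSON request body; the function name, the pl-PL language code, the LINEAR16 encoding and the example sentence are assumptions (the exact voice used in the study is not stated here), and sending the authenticated request is omitted.

```python
import json


def build_synthesize_request(text: str, speaking_rate: float = 0.75,
                             language_code: str = "pl-PL",
                             audio_encoding: str = "LINEAR16") -> dict:
    """Return a JSON-serializable body for a Text-to-Speech v1 text:synthesize call."""
    if not 0.25 <= speaking_rate <= 4.0:
        raise ValueError("speakingRate must be within [0.25, 4.0]")
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": audio_encoding, "speakingRate": speaking_rate},
    }


body = build_synthesize_request("algi flubo tumel")  # hypothetical Kepidalo sentence
print(json.dumps(body, indent=2))
```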

The novel sentences described the actions of eight alien cartoon characters that corresponded to the eight nouns in the artificial language. The aliens appeared in two different colors, dark red or light green, each of which corresponded to one of the two Kepidalo adjectives. Short animated scenes in which the aliens were seen performing one of four simple actions (approaching, catapulting, chasing, or jumping over) were generated and then converted to GIF format. The reason for using GIFs instead of videos was twofold: first, GIFs are small in size and thus take less time to load, even on devices with slower network connections [84]. Second, because GIFs loop continuously, participants can spend as much time as they need processing the stimuli, while the need to interact with the device during playback is minimized [85].

General procedure.

The experiment was conducted online through the Gorilla experiment builder [gorilla.sc; 86] and could only be accessed via desktop or laptop computers. Participants were first asked to fill out a short online background questionnaire concerning their demographics and prior language experience. Those who met all the inclusion criteria individually participated in five separate sessions within a 10-day span. The interval between sessions was at least 24 hours but not more than 48 hours. The experimental stimuli and scripts for all tasks used in this study are available on Gorilla Open Materials (https://app.gorilla.sc/openmaterials/484592).
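The scheduling constraints can be expressed as a simple eligibility check. The sketch below assumes that the 24–48 hour interval is measured between the start times of consecutive sessions; this reading, the function name and the example dates are assumptions rather than details of the Gorilla implementation.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(hours=24)
MAX_GAP = timedelta(hours=48)
MAX_SPAN = timedelta(days=10)
N_SESSIONS = 5


def schedule_ok(session_starts: list) -> bool:
    """True if five sessions respect the 24-48 h gaps and the 10-day window."""
    if len(session_starts) != N_SESSIONS:
        return False
    starts = sorted(session_starts)
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    within_gaps = all(MIN_GAP <= gap <= MAX_GAP for gap in gaps)
    return within_gaps and starts[-1] - starts[0] <= MAX_SPAN


first = datetime(2022, 3, 7, 10, 0)  # hypothetical start date
daily = [first + timedelta(days=i) for i in range(5)]
too_soon = daily[:4] + [daily[3] + timedelta(hours=12)]
print(schedule_ok(daily), schedule_ok(too_soon))  # True False
```

Under this reading, four gaps of at most 48 hours already cap the total span at 8 days, so the 10-day check never binds and is kept only to mirror the stated rule.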

A summary of the tasks that participants completed in each session, and the order in which they were administered, is provided in Table 1. Over the first four sessions, participants were trained on the vocabulary of the novel language and were tested on their knowledge of the language’s grammar (word order, case marking). During the final session, two additional tasks designed to probe grammatical knowledge were administered. Each session also included a cognitive test measuring individual differences (not discussed here). The order of presentation of the tasks was the same for all participants.


Table 1. Summary of the artificial language tests administered during the study.

https://doi.org/10.1371/journal.pone.0288989.t001

Pretraining.

The first session of the study began with training on the nouns of the novel language. Participants were told that, as part of their mission, they would need to learn the names of the inhabitants of the Tikon galaxy. A four-alternative forced-choice (4AFC) task modeled after Llompart and Reinisch [87, 88] was used to assess learning of the lexical items. The task contained two phases with an identical trial structure: a training phase consisting of 64 familiarization trials, and a test phase in which each of the 8 nouns was presented twice, for a total of 16 trials. In each trial, four aliens of the same color were presented simultaneously, one in each corner of the screen. When participants clicked on a “Play” button, the name of one of the aliens was presented auditorily in the nominative form (e.g., Alg-i, Flub-a). The participants’ task was to choose the picture that matched the word they had just heard. Visual feedback on accuracy was provided in the form of a green tick for correct answers or a red cross for incorrect answers. Following this, the target alien appeared on the screen in isolation for 1500 ms and the corresponding noun was presented auditorily again. The location of the target items on the screen was randomized, and the same trial sequence was used for all participants.

The familiarization phase was mandatory for all participants, whereas performance in the test phase determined whether participants were ready to proceed to the next task. Participants who achieved 100% accuracy on the test trials proceeded immediately to the next task, while those who failed to reach this criterion (n = 16) were given 16 additional practice trials. After these additional trials, all participants moved on to the next task regardless of their final score.

Lexical training.

During each of the first four sessions, participants were exposed to 270 auditorily presented Kepidalo sentences, which were pseudo-randomly selected from the full set of 400 sentences originally generated. To control for possible recency and primacy effects, four randomized lists were created, one for each session. The sentences were divided into three training blocks of 90 sentences each. The order in which the sentences appeared was the same across participants. To keep participants motivated throughout the task, they were told that each correct answer would earn them a unit of solar energy, propelling them towards the next planet in the game, whereas each incorrect answer would decrease their solar energy by 1 unit. All participants advanced to the next task, regardless of the number of correct answers.

Lexical training involved a two-alternative forced-choice (2AFC) task. In each trial, two short animated scenes, each showing two aliens performing an action, were displayed simultaneously on the screen (Fig 1), while a sentence describing the events depicted in one of the two scenes was played (see example 2). Participants were instructed to indicate, as quickly and accurately as possible, which of the two scenes the sentence described. Participants could replay the sentence a second time if they wished, and the animated scenes looped continuously until they responded. The side of the target scene was counterbalanced, and participants received visual and auditory feedback immediately after they provided a response. At the end of each of the three blocks, a display showing participants’ cumulative score was presented.

(2). Velg-a pog-a prad-o kov-o varek
velg-NOM green-NOM prad-ACC red-ACC jump-over
the green velg jumps over the red prad

Fig 1. Screenshot of a training trial in the lexical training task (Left scene: The (green) velg is jumping over the (red) prad, Right scene: the (green) velg is approaching the (red) prad).

https://doi.org/10.1371/journal.pone.0288989.g001

Crucially, the target and distractor scenes were designed to differ in a single element. Specifically, in trials testing knowledge of verbs, the target and distractor scenes differed in the action that the aliens performed; in the noun trials, the two scenes differed with regard to one of the alien characters; finally, in the trials testing adjective learning, the color of one of the aliens in the distractor scene was changed. There were 90 trials for each of the three lexical categories (verbs, nouns, adjectives), interspersed among the three training blocks. The majority of the sentences had the canonical SOV word order (200), while the rest were OSV (70).
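The composition of one session’s training list (270 trials, 90 per lexical category, 200 SOV and 70 OSV, three blocks of 90, the same order for all participants) can be illustrated with a short Python sketch. It is not the authors’ materials: `build_session_list` is a hypothetical helper, the selection of 270 sentences from the 400-sentence pool is not modeled, and because the crossing of word order with lexical category is not reported, word order is assigned to trials at random here as an assumption.

```python
import random
from collections import Counter

# Counts taken from the text; the crossing of word order with lexical
# category is not reported, so random assignment is an assumption here.
CATEGORIES = ("noun", "verb", "adjective")
TRIALS_PER_CATEGORY = 90
N_SOV, N_OSV = 200, 70
BLOCK_SIZE = 90


def build_session_list(seed):
    """Hypothetical helper: one session's 270 training trials split into 3 blocks.

    A fixed seed yields the same trial order for every participant; using a
    different seed per session gives a different list for each session.
    """
    rng = random.Random(seed)
    word_orders = ["SOV"] * N_SOV + ["OSV"] * N_OSV
    rng.shuffle(word_orders)  # assumed: word order assigned at random to trials
    categories = [c for c in CATEGORIES for _ in range(TRIALS_PER_CATEGORY)]
    trials = [{"category": c, "word_order": w} for c, w in zip(categories, word_orders)]
    rng.shuffle(trials)  # pseudo-random presentation order
    return [trials[i:i + BLOCK_SIZE] for i in range(0, len(trials), BLOCK_SIZE)]


blocks = build_session_list(seed=1)
print([len(b) for b in blocks])
print(Counter(t["word_order"] for b in blocks for t in b))
```

Seeding a local `random.Random` instance, rather than the global generator, is what makes the order reproducible across participants in this sketch.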

Grammatical comprehension test.

At the end of the lexical training trials, participants were tested on the grammar of the artificial language with 90 new sentences, 70 with SOV and 20 with OSV word order. Participants heard the same 90 sentences in all four sessions, but in a different pseudorandom order in each session. All participants saw the stimuli in the same sequence.

A two-alternative forced-choice task (2AFC) was used for testing grammar learning and the procedure followed was similar to the one used for lexical training; participants heard a sentence in the artificial language and viewed two animated scenes: a target scene which depicted the meaning of the sentence and a distractor scene in which the agent and patient roles of the two aliens were reversed. No feedback regarding accuracy was displayed. The side of the target video (left or right) was counterbalanced within each list.

Grammaticality judgement task.

In the final session, grammatical knowledge was assessed by means of a Grammaticality Judgement Task (GJT), in which participants were presented with novel sentences (i.e., sentences that had not been used in the preceding sessions) and were asked to decide whether each sentence was correct or incorrect. The GJT consisted of 80 sentences. Half of the sentences were grammatical and the other half contained various kinds of grammatical violations (Table 2), with each violation type occurring 8 times.

Table 2. Types of ungrammatical sentences in the grammaticality judgement task.

https://doi.org/10.1371/journal.pone.0288989.t002

SOV and OSV patterns appeared with equal frequency during the task. Within each construction, i) half of the sentences were grammatical and half ungrammatical, and ii) half of the sentences included an adjective, which modified either the subject (10 sentences) or the object (10 sentences). The sentences appeared in random order, but the order of presentation was the same for all participants. No feedback was given on responses.
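The counting constraints of the GJT design can be checked with a small Python sketch. The five violation types of Table 2 are represented by placeholder labels (V1 to V5), and the distribution of violation types and adjectives across word orders (4 of each violation type per word order; 5 subject and 5 object adjectives per word order and grammaticality cell) is a hypothetical allocation that merely satisfies the totals reported above. `build_gjt_design` is a hypothetical helper.

```python
from collections import Counter

# Placeholder labels for the five violation types listed in Table 2.
VIOLATION_TYPES = [f"V{i}" for i in range(1, 6)]


def build_gjt_design():
    """Hypothetical 80-item design: 2 word orders x 2 grammaticality levels x 20 items."""
    items = []
    for order in ("SOV", "OSV"):
        for grammatical in (True, False):
            # Assumed: 10 of every 20 items carry an adjective (5 subject, 5 object).
            adjectives = ["subject"] * 5 + ["object"] * 5 + [None] * 10
            if grammatical:
                violations = [None] * 20
            else:
                # Assumed: 4 items of each violation type per word order (8 in total).
                violations = [v for v in VIOLATION_TYPES for _ in range(4)]
            for adjective, violation in zip(adjectives, violations):
                items.append({"order": order, "grammatical": grammatical,
                              "adjective": adjective, "violation": violation})
    return items


items = build_gjt_design()
print(len(items), Counter(i["violation"] for i in items if i["violation"]))
```

The pairing of adjectives with particular violation types is arbitrary in this sketch; only the marginal counts are meant to mirror the text.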

Final grammatical comprehension test.

In the final task, participants were once again tested on their knowledge of the grammar of the artificial language by means of a 2AFC task that was identical to the Grammatical Comprehension Test blocks administered in the previous sessions. The auditory stimuli consisted of the same 40 grammatical sentences that were presented in the GJT, thus SOV and OSV appeared equally frequently. Each sentence was accompanied by a target video that correctly depicted the sentence and a distractor video showing reversed subject/object roles.

Results

Data analysis.

For the Lexical Training and the Grammatical Comprehension test, mean accuracy scores and mean reaction times were calculated for each participant in each of the four sessions and were then averaged across participants. Performance is summarized in Table 3. For the Pretraining task, individual scores were calculated as the total number of correct responses during the training phase and the first test block (80 items in total). For the GJT, following signal detection theory [89], participants’ ability to discriminate between correct and incorrect sentences was measured by d-prime (d’). Specifically, four scores were obtained for each participant: hits (grammatical sentences judged as acceptable), misses (grammatical sentences judged as unacceptable), false alarms (ungrammatical sentences judged as acceptable) and correct rejections (ungrammatical sentences judged as unacceptable). From these scores, d’ scores were calculated for each participant [90] using the psycho package [version 0.6.1; 91] in RStudio [92]. A d’ score of 0 indicates chance performance, and higher d’ scores indicate greater discrimination sensitivity. Finally, for the Final Grammatical Comprehension test (henceforth, FGCT), individual scores were calculated as the number of correct responses provided by each participant. A summary of participants’ performance on the three artificial language tasks is presented in Table 4. Correlation matrices showing the relationship between the three artificial language tasks and performance on the Lexical Training and Grammatical Comprehension trials in each session are provided in the S2 Appendix. Data from each task (except Pretraining) were analyzed separately using mixed-effects models. To further explore significant interactions, post-hoc pairwise comparisons were performed using the emmeans package [version 1.8.1.1; 93].
In addition, we calculated effect sizes for the models, measured by marginal and conditional R², using the r.squaredGLMM function from the MuMIn package [version 1.47.1; 94]. We also obtained odds ratios and confidence intervals for the predictor variables using the tab_model function from the sjPlot package [version 2.8.11; 95], and computed Spearman-Brown split-half reliability for all test measures using the splithalf package [version 0.8.2; 96]. These reliability coefficients are reported in S3 Appendix. All data and R scripts for the analyses are available on OSF (https://osf.io/3jy52/?view_only=2664569e74964a5b84c2c3989f70e41f).
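Two of the scores described above reduce to short formulas: d’ is the difference between the z-transformed hit rate and false-alarm rate, and the Spearman-Brown formula steps a correlation between two test halves up to full-test reliability, 2r / (1 + r). The Python sketch below illustrates both; it is not the authors’ R code. `dprime` and `spearman_brown` are hypothetical helper names, and the log-linear correction for hit or false-alarm rates of 0 or 1 is an assumption (the adjustment applied by the psycho package may differ). The splithalf package additionally averages over many random splits, which is not modeled here.

```python
from statistics import NormalDist


def dprime(hits, misses, false_alarms, correct_rejections, loglinear=True):
    """d' = z(hit rate) - z(false-alarm rate).

    With loglinear=True, 0.5 is added to each numerator and 1 to each
    denominator so that rates of 0 or 1 do not produce infinite z-scores
    (an assumed log-linear correction, not necessarily psycho's default).
    """
    adj = 0.5 if loglinear else 0.0
    hit_rate = (hits + adj) / (hits + misses + 2 * adj)
    fa_rate = (false_alarms + adj) / (false_alarms + correct_rejections + 2 * adj)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def spearman_brown(r_half):
    """Full-test reliability from the correlation between two halves: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)


# Hypothetical participant: 38/40 grammatical items accepted, 25/40 ungrammatical items accepted.
print(round(dprime(hits=38, misses=2, false_alarms=25, correct_rejections=15), 3))
print(spearman_brown(0.5))
```

Equal hit and false-alarm rates give d’ = 0, which is why a score of 0 corresponds to chance discrimination.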

Table 3. Mean accuracy and reaction times across sessions in the lexical training and grammatical comprehension blocks.

https://doi.org/10.1371/journal.pone.0288989.t003

Table 4. Descriptive statistics for performance on the pretraining and post-test grammar tasks.

https://doi.org/10.1371/journal.pone.0288989.t004

Lexical training.

Since Lexical Training scores were obtained from a 2AFC picture selection task, chance-level performance was 50%, or 135 correct responses out of 270 trials per session. Participants’ performance was above chance from the first session onwards, and their ability to discriminate between target and distractor scenes continued to improve throughout the study.
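Whether an individual score is above the 50% chance level of a 2AFC task can be checked with an exact one-sided binomial test. The Python sketch below is illustrative only: `p_above_chance` and `min_correct_for_significance` are hypothetical helpers, and the analyses reported in this paper rely on mixed-effects models rather than per-participant binomial tests. The same functions apply to the Grammatical Comprehension test by setting n = 90.

```python
from math import comb


def p_above_chance(correct, n, chance=0.5):
    """One-sided exact binomial p-value: P(X >= correct) if every response were a guess."""
    return sum(comb(n, k) * chance**k * (1 - chance)**(n - k) for k in range(correct, n + 1))


def min_correct_for_significance(n, alpha=0.05, chance=0.5):
    """Smallest number of correct responses whose one-sided p-value falls below alpha."""
    for k in range(n + 1):
        if p_above_chance(k, n, chance) < alpha:
            return k
    return n + 1  # not reachable at this alpha


# 270 lexical-training trials per session; 135 correct is exactly chance level.
print(p_above_chance(135, 270))
print(min_correct_for_significance(270))
```

Note that 135/270 correct yields a one-sided p-value just above .5, because P(X = 135) is counted in the upper tail.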

With regard to performance by distractor type, accuracy was higher on trials involving noun distractors (M = 88%, SD = 8.1%), followed by trials including verb (M = 79.8%, SD = 15.7%) and adjective distractors (M = 68.9%, SD = 18.2%). Regarding Word Order, participants achieved similar accuracy for SOV (M = 79.2%, SD = 11.9%) and for OSV sentences (M = 77.9%, SD = 12.4%).

To determine how accuracy on the lexical training task changed as a function of time and whether the presence of pretrained words affected learning, trial-by-trial data from the Lexical Training trials were submitted to a mixed-effects logistic regression model using the lme4 package [version 1.1-30; 97] in R. This type of model is well suited to analyzing binary response data [98, 99]. Response accuracy, coded as correct (1) or incorrect (0), was entered as the binary dependent variable. Pretraining score was included in the model as a continuous predictor, centered and scaled using the scale() function in R. Session (contrast coded as -1, -0.5, 0.5 and 1 for Sessions 1 to 4, respectively) and Word Order (contrast coded as -0.5 and 0.5 for OSV and SOV, respectively) were also entered in the model as predictors. Because these contrast codes are centered on zero, the intercept corresponds to the grand mean, so that the predictors and their interactions can be interpreted in a manner analogous to ANOVA, and the sign of each regression weight (positive or negative) indicates the direction of the overall effect of that predictor. The model contained all two-way interactions between the predictors. The predicted probabilities of correct responses for all contrasts of interest were computed using the ggeffects package [version 1.1.3; 100].
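The predictor coding described above can be made concrete with a small Python sketch that attaches the reported contrast codes to trial records and z-scores the Pretraining scores the way R’s scale() does (mean-centered, divided by the sample standard deviation). `scale` and `code_trials` are hypothetical helpers; scaling one score per participant, rather than the trial-level column, is an assumption. The formula string is only an approximate lme4 rendering of the best-fitting model implied by the text, and whether the random slopes were correlated with the intercepts is assumed.

```python
from statistics import mean, stdev

# Contrast codes as reported in the text; both sets sum to zero.
SESSION_CODES = {1: -1.0, 2: -0.5, 3: 0.5, 4: 1.0}
WORD_ORDER_CODES = {"OSV": -0.5, "SOV": 0.5}

# Approximate lme4 formula implied by the text (assumed correlated random slopes);
# (a + b + c)^2 expands to all main effects and two-way interactions.
LME4_FORMULA = ("correct ~ (session_c + word_order_c + pretraining_z)^2"
                " + (1 + session_c | participant) + (1 + pretraining_z | item)")


def scale(values):
    """Center and scale like R's scale(): subtract the mean, divide by the sample SD (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def code_trials(trials, pretraining_by_participant):
    """Add coded predictors to trial dicts with keys participant, session, word_order, correct."""
    ids = sorted(pretraining_by_participant)
    z = dict(zip(ids, scale([pretraining_by_participant[i] for i in ids])))
    return [
        {**t,
         "session_c": SESSION_CODES[t["session"]],
         "word_order_c": WORD_ORDER_CODES[t["word_order"]],
         "pretraining_z": z[t["participant"]]}
        for t in trials
    ]


# Hypothetical data: two participants with Pretraining scores of 60 and 80 (out of 80).
rows = code_trials([{"participant": "p1", "session": 1, "word_order": "OSV", "correct": 1},
                    {"participant": "p2", "session": 4, "word_order": "SOV", "correct": 0}],
                   {"p1": 60, "p2": 80})
print(rows)
```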

Data were initially fitted to a model containing random intercepts for participants and items. To determine the best random-effects structure, random slopes for all fixed effects were first tested separately, each being compared with the random-intercepts-only model by means of likelihood ratio tests using the anova() function in R. Random slopes were then added to the model one at a time, starting with the one that improved model fit the most, and were retained only if the model converged and fitted the data significantly better than the previous base model (the model without that random slope), as determined by likelihood ratio tests. The best-fitting model contained random intercepts for participants and items, by-participant random slopes for Session and by-item random slopes for Pretraining. The output of the best-fitting model is shown in Table 5.
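The forward selection of random slopes described above can be sketched in Python as follows. This is an illustration of the procedure, not the authors’ script: `fit` is a hypothetical callback that fits a model with a given set of random slopes and returns its log-likelihood and convergence status, and a fixed number of extra parameters per slope (`df_per_slope`) is a simplifying assumption, since in lme4 adding a correlated slope adds a variance plus one covariance per existing term. The chi-square reference distribution is also known to be conservative when testing variances at the boundary of zero.

```python
from math import erfc, exp, factorial, sqrt


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and even df only."""
    if df == 1:
        return erfc(sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))
    raise ValueError("only df = 1 or even df are supported in this sketch")


def lrt_p(loglik_reduced, loglik_full, df):
    """Likelihood ratio test: 2 * (logLik_full - logLik_reduced) ~ chi-square(df)."""
    return chi2_sf(max(0.0, 2 * (loglik_full - loglik_reduced)), df)


def select_slopes(fit, candidates, df_per_slope=2, alpha=0.05):
    """Forward selection; fit(frozenset_of_slopes) -> (loglik, converged) is hypothetical."""
    base = frozenset()
    base_ll, _ = fit(base)
    # Step 1: test each slope on its own against the intercepts-only model.
    gains = {s: fit(frozenset([s]))[0] - base_ll for s in candidates}
    # Step 2: add slopes one at a time, largest improvement first.
    for s in sorted(candidates, key=gains.get, reverse=True):
        trial = base | {s}
        ll, converged = fit(trial)
        if converged and lrt_p(base_ll, ll, df_per_slope) < alpha:
            base, base_ll = trial, ll  # keep the slope; this becomes the new base model
    return base
```

A toy `fit` that looks up invented log-likelihoods in a dictionary is enough to exercise the selection logic without fitting any real model.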

Table 5. Mixed-effects model fitted to the lexical training data.

https://doi.org/10.1371/journal.pone.0288989.t005

The model revealed significant effects of Session and Pretraining. The positive coefficients indicate that accuracy on the vocabulary trials improved significantly across sessions, and that learners who learned the lexical items better in the Pretraining task were more likely to respond accurately in the Lexical Training trials. The interaction between Session and Pretraining was not significant, although its direction suggests that the effect of Session was stronger for participants with higher Pretraining scores. Finally, performance did not differ significantly between the two Word Orders, and there was no interaction between Session and Word Order, as learners responded with approximately equal accuracy to both types of sentences across sessions (Fig 2). Specifically, the estimated accuracy was 87% for SOV and 86% for OSV sentences.

Fig 2. Predicted probability of a correct answer as a function of Session and Word Order in Lexical Training (left) and Grammatical Comprehension test (right) in L1 English participants.

https://doi.org/10.1371/journal.pone.0288989.g002

Grammatical comprehension test.

For each session of the task, 50%, or 45 correct responses out of 90 trials, represents chance-level performance. As shown in Table 3, accuracy in the grammatical comprehension trials was above chance across all sessions, but remained stable over time. With regard to Word Order, performance was better overall on SOV (M = 74.7%, SD = 15.8%) than on OSV sentences (M = 29.4%, SD = 16.7%).

A mixed-effects logistic regression model was fitted with Accuracy (correct = 1, incorrect = 0) as the binary dependent variable and with Session and Word Order, contrast coded as in the Lexical Training model above, as predictors. To examine whether participants’ initial knowledge of words affected grammar learning, Pretraining was also entered as a predictor. The model also included all two-way interactions between the three variables. The random-effects structure was selected following the procedure outlined above. The final model contained random intercepts for participants and items, by-participant and by-item random slopes for Session, and by-participant random slopes for Word Order.

According to the model (Table 6), the effect of Session was not significant: although its estimate was positive, learners’ overall accuracy did not increase reliably over time. However, there was a significant effect of Word Order and a significant interaction between Word Order and Session, which qualifies the main effect of Session. The positive coefficient for Word Order indicates that participants were more accurate on SOV than on OSV sentences (estimated accuracy of 81% and 23%, respectively), and a follow-up analysis of the interaction revealed that the effect of Session was significantly stronger for SOV than for OSV sentences, indicating that the difference in performance between the two Word Orders increased across sessions (Fig 2).

Table 6. Mixed-effects model fitted to the grammatical comprehension test data.

https://doi.org/10.1371/journal.pone.0288989.t006

Grammaticality judgement task.

Performance on the task is summarized in Table 7. Overall, learners judged 64% (SD = 7.2%) of the test sentences correctly, but performance was driven primarily by accuracy on grammatical rather than on ungrammatical test sentences. The descriptive statistics show variation in performance on different types of violation, with participants achieving higher scores on sentences involving word order violations than on sentences that contained case marking violations. Performance on sentences with SOV and OSV word order was nearly indistinguishable (M = 64.5%, SD = 6.5%, and M = 63.6%, SD = 9.3%, respectively).

Table 7. Mean percentage of correct responses (SDs) and d’ scores by sentence type in the grammaticality judgement task.

https://doi.org/10.1371/journal.pone.0288989.t007

To assess the extent to which participants learned the syntactic structure of the artificial language, the data were submitted to a mixed-effects logistic regression model with Accuracy as a binary outcome variable (correct = 1 vs. incorrect = 0) and Word Order (contrast coded with OSV as -0.5 and SOV as 0.5), Pretraining, Grammaticality (contrast coded with ungrammatical sentences as -0.5 and grammatical as 0.5) and Error Type (contrast coded with case marking sentences as -0.5 and word order sentences as 0.5) as independent variables. All two-way interactions between these predictors were also entered in the full model. The model included random intercepts for participants and items, and by-participant random slopes for Grammaticality and Error Type.

The model (see S4 Appendix for the full model) returned a significant effect of Grammaticality (β = 4.536, z = 6.844, p < .001); the positive coefficient indicates that learners judged grammatical sentences more accurately than ungrammatical ones, with predicted accuracies of 98% and 38%, respectively. There was also a significant effect of Pretraining (β = 0.288, z = 2.003, p = .045) and a significant interaction between Grammaticality and Pretraining (β = 1.298, z = 2.675, p = .007), suggesting that participants with higher scores in the Pretraining task performed more accurately in this task, and that the effect of Pretraining differed across levels of Grammaticality. A follow-up simple slope analysis, which involves estimating and comparing the slopes of the covariate trend at each level of a factor, showed that the effect of Pretraining was positive for grammatical sentences but negative and non-significant for ungrammatical items. The model also showed a positive coefficient for Error Type (β = 2.940, z = 6.292, p < .001), suggesting that, overall, participants performed better on word order than on case marking sentences (96% and 58% predicted accuracy, respectively), as well as significant interactions between Grammaticality and Error Type (β = -4.466, z = -5.465, p < .001) and between Pretraining and Error Type (β = 0.438, z = 2.075, p = .038). Regarding the former interaction, post-hoc analyses revealed that the effect of Grammaticality, while significant for both types of sentences, was larger in the case marking condition: the predicted probability of a correct response increased by 94 percentage points from ungrammatical (4%) to grammatical (98%) sentences, compared with an increase of 10 percentage points for word order sentences (89% for ungrammatical and 99% for grammatical).
For the latter interaction, simple slope analysis found that the effect of Pretraining was stronger for word order sentences and significant only for them.
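Because the predictors are contrast coded as ±0.5 and Pretraining is centered, the point estimates of the simple slopes can be recovered by hand from the coefficients reported above: the slope of Pretraining at one level of a factor is the Pretraining coefficient plus the interaction coefficient times that level’s code, with the other contrast-coded predictors held at their balanced average of 0. The Python sketch below illustrates this arithmetic only; `simple_slope` is a hypothetical helper, and standard errors and significance tests (as provided by emmeans) are not reproduced.

```python
# Coefficients (log-odds scale) as reported in the text for the GJT model.
B_PRETRAINING = 0.288
B_GRAM_X_PRE = 1.298   # Grammaticality x Pretraining
B_PRE_X_ERROR = 0.438  # Pretraining x Error Type


def simple_slope(b_main, b_interaction, code):
    """Slope of a centered covariate at one level of a +/-0.5 contrast-coded factor,
    with all other contrast-coded predictors held at 0 (their balanced average)."""
    return b_main + b_interaction * code


slopes = {
    "grammatical": simple_slope(B_PRETRAINING, B_GRAM_X_PRE, 0.5),
    "ungrammatical": simple_slope(B_PRETRAINING, B_GRAM_X_PRE, -0.5),
    "word order": simple_slope(B_PRETRAINING, B_PRE_X_ERROR, 0.5),
    "case marking": simple_slope(B_PRETRAINING, B_PRE_X_ERROR, -0.5),
}
for level, slope in slopes.items():
    print(f"{level}: {slope:+.3f}")
```

The signs agree with the text: positive for grammatical items, negative for ungrammatical items, and larger for word order than for case marking sentences.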

Final grammatical comprehension test.

Overall, participants exhibited high accuracy on SOV sentences (M = 74.9%, SD = 18.6%), but were below chance on OSV sentences (M = 32%, SD = 23%). A further mixed-effects logistic regression model (see S4 Appendix for the full model) was built with Accuracy (correct = 1 vs. incorrect = 0) as a binary outcome; Pretraining, Word Order (contrast coded with OSV as -0.5 and SOV as 0.5) and their interaction as fixed effects; random intercepts for participants and items; and by-participant random slopes for Word Order. A significant effect of Word Order was found (β = 2.522, z = 5.538, p < .001), confirming that participants reliably identified the correct picture for SOV items (81% predicted accuracy) but had difficulty doing so with OSV sentences (26% predicted accuracy), even after four sessions of incidental exposure to the grammatical structure of the artificial language. Finally, neither the effect of Pretraining (β = 0.035, z = 0.374, p = .708) nor the interaction between Pretraining and Word Order (β = 0.055, z = 0.137, p = .891) was significant.
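With Word Order coded as ±0.5 and Pretraining centered at 0, the Word Order coefficient equals the difference in log-odds between the predicted SOV and OSV accuracies at average Pretraining, and exp(β) is the corresponding odds ratio. The short Python check below applies this to the reported values; it is a worked illustration, not part of the authors’ analysis, and small discrepancies are expected because the predicted percentages are rounded.

```python
from math import exp, log


def logit(p):
    """Log-odds of a probability."""
    return log(p / (1 - p))


beta_word_order = 2.522    # reported Word Order estimate (SOV = +0.5, OSV = -0.5)
p_sov, p_osv = 0.81, 0.26  # reported predicted accuracies (rounded)

log_odds_gap = logit(p_sov) - logit(p_osv)  # should approximate beta_word_order
odds_ratio = exp(beta_word_order)           # odds of a correct SOV vs. OSV response

print(f"log-odds gap = {log_odds_gap:.3f}, beta = {beta_word_order}, OR = {odds_ratio:.2f}")
```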

Discussion

As we have seen, our participants’ accuracy scores on the grammatical tasks were relatively low (M = 64.6% in the Grammatical Comprehension test; M = 64% in the GJT; M = 53.4% in the FGCT), confirming that learning new grammatical constructions under incidental exposure conditions, without instruction about the structure of the language or feedback on the accuracy of performance, is particularly challenging for adult L2 learners [cf. 13, 36, 80]. This difficulty seems to hold even after extensive exposure to the artificial language, and despite the fact that participants performed relatively well on the lexical trials even in the early stages of the experiment. Specifically, while participants succeeded in interpreting SOV sentences, their ability to identify the correct scene upon hearing stimuli with the less frequent OSV word order did not exceed chance level at any point during the study. This could be taken to suggest that participants relied primarily on word order cues, rather than on inflectional morphology, to process the Kepidalo sentences. Further support for this interpretation comes from participants’ performance on the GJT. As shown in Table 7, learners were more accurate on sentences with word order violations, especially those containing verb placement errors, than on sentences containing case marking violations. These results are in line with similar recent artificial language learning studies [11, 12] showing that, under incidental exposure conditions, adult learners are more likely to develop knowledge of word order than of case marking rules. This difficulty with case marking could be at least partially attributed to the effects of learned attention [58]. Specifically, a common theme across these studies is that the participants targeted were native English speakers.
Hence, learners’ prior L1 experience with English, a fixed word order language without case marking, may have driven them to look for word order cues when processing the novel sentences, which would in turn block the learning of the low-salience inflectional markers. Furthermore, the fact that they were required to learn word order patterns (i.e., SOV or OSV) that differ from the canonical word order of their native language could also have led them to focus their attention on this aspect of the language.

However, one piece of evidence indicates that participants might have learned more than their performance implies. Specifically, performance on the Grammatical Comprehension test shows that, although participants performed better on SOV than on OSV items, they did not consistently apply the dominant word order to all sentences at any point during the task. Instead, as shown in Fig 2, the mean accuracy rates for the two word order patterns approximated the relative proportion of each pattern in the input participants received during the lexical training part of each session (i.e., SOV = 74%, OSV = 26%). Interestingly, learners’ performance in the FGCT followed a similar pattern (SOV: M = 74.9%, OSV: M = 32%). These results could be taken to imply that learners were indeed sensitive to the presence of two distinct word order patterns, as well as to their frequency of occurrence. This finding seems to be in accord with previous studies showing that adult L2 learners may be capable of learning the probabilities of occurrence of different patterns in the input without necessarily learning the grammatical forms [101].
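The probability-matching interpretation above can be made concrete with a little arithmetic. Consider a hypothetical learner who ignores case marking and, on every trial, adopts the SOV reading with probability equal to the share of SOV in the training input (200 of 270 sentences) and the OSV reading otherwise. Such a learner is expected to be correct on SOV items with that probability and on OSV items with the complementary probability. The Python sketch below computes these expectations with exact fractions; the learner model and `matching_accuracy` are illustrative assumptions, not an analysis reported in this paper.

```python
from fractions import Fraction


def matching_accuracy(n_sov_input, n_osv_input):
    """Expected accuracy of a hypothetical learner who ignores case marking and
    picks the SOV reading with probability equal to the SOV share of the input."""
    p_sov = Fraction(n_sov_input, n_sov_input + n_osv_input)
    return {"SOV": p_sov, "OSV": 1 - p_sov}


acc = matching_accuracy(200, 70)  # per-session lexical-training input

# Expected overall accuracy on the 90-item comprehension test (70 SOV, 20 OSV)
# and on a test with equally many SOV and OSV items, such as the FGCT.
overall_gct = (70 * acc["SOV"] + 20 * acc["OSV"]) / 90
overall_balanced = (acc["SOV"] + acc["OSV"]) / 2

print({k: float(v) for k, v in acc.items()}, float(overall_gct), float(overall_balanced))
```

Under this assumed strategy, overall accuracy on a balanced test is exactly 50%, regardless of the input proportions.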

Experiment 2

As mentioned in the introduction, a large body of literature has shown that early adult L2 learning is heavily influenced by prior L1 knowledge [58, 102, 103]. Such effects can lead learners to mistakenly engage L1-tuned processing, affecting the route of L2 development. The results of Experiment 1 appear to be in line with these observations and raise the question of whether similar patterns of performance would be found for speakers of a language that relies primarily on case marking cues. In addition, note that in Experiment 1 there was a small, non-significant increase in learners’ overall grammatical comprehension scores across sessions. Although this cannot be taken as a clear indication of learning, it is consistent with the idea that incidental acquisition requires an extensive amount of linguistic input [104]. The slow learning rate may therefore be partly attributable to difficulties associated with learning under incidental conditions per se.

To investigate the extent to which the results obtained in Experiment 1 stem from learners’ L1 experience and/or from a limited capacity to learn novel grammatical structures through incidental exposure, in Experiment 2 we examined whether the pattern of performance obtained for the L1 English participants could be replicated with native speakers of a language that is morphologically richer than English, namely German. Experiment 2 also had two further aims. The first was to test whether additional incidental exposure would result in further incremental increases in performance or whether performance would stabilize at a suboptimal level. To this end, the amount of exposure was extended to six sessions, providing more opportunities to learn the rules of the language. Second, we aimed to examine whether learning in incidental conditions depends on learners’ ability to make explicit inferences about the grammatical structure of the language. While previous studies have tracked the development of awareness during the test phase using source attributions and confidence ratings (e.g., [11, 32, 105]), here we tested the extent to which learners’ metalinguistic awareness at the level of understanding [106], as measured by a post-test questionnaire, predicted vocabulary and grammar learning, and whether this effect varied across sessions.