Can adults learn L2 grammar after prolonged exposure under incidental conditions?

  • Panagiotis Kenanidis,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Chair of Language and Cognition, Department of English and American Studies, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

  • Ewa Dąbrowska,

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing

    Affiliations Chair of Language and Cognition, Department of English and American Studies, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, Department of English Language and Linguistics, University of Birmingham, Birmingham, United Kingdom

  • Miquel Llompart,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Writing – review & editing

    Affiliations Chair of Language and Cognition, Department of English and American Studies, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, Department of Translation and Language Sciences, Universitat Pompeu Fabra, Barcelona, Spain

  • Diana Pili-Moss

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Institute of English Studies, Faculty of Education, Leuphana Universität Lüneburg, Lüneburg, Germany


While late second language (L2) learning is assumed to be largely explicit, there is evidence that adults are able to acquire grammar under incidental exposure conditions, and that the acquisition of this knowledge may be implicit in nature. Here, we revisit the question of whether adults can learn grammar incidentally and investigate whether word order and morphology are susceptible to incidental learning to the same degree. In experiment 1, adult English monolinguals were exposed to an artificial language (Kepidalo) that had case marking and variable word order: a canonical Subject-Object-Verb order and a non-canonical Object-Subject-Verb. In a five-session online study, participants received vocabulary training while being incidentally exposed to grammar, and completed a series of picture-selection and grammaticality judgment tasks assessing grammatical knowledge. Despite extensive exposure to input, and although performance on vocabulary increased significantly across sessions, learners’ grammatical comprehension showed little improvement over time, and this was limited to Subject-Object-Verb sentences only. Furthermore, participants were better at detecting word order than case marking violations in the grammaticality judgment tasks. Experiment 2 further increased the amount of incidental exposure whilst examining native speakers of German, which exhibits higher morphological richness. Testing was followed by a post-test metalinguistic awareness questionnaire. Although greater learning effects were observed, participants continued to have difficulties with case marking. The findings also demonstrated that language outcomes were modulated by learners’ level of metalinguistic awareness. Taken together, the results of the two experiments underscore adult learners’ difficulty with case marking and point towards the presence of a threshold in incidental L2 grammar learning, which appears to be tightly linked to prior first language experience. 
In addition, our findings continue to highlight the facilitative role of conscious awareness on L2 outcomes.


There is a general consensus that mastering a second language (L2) is a notoriously demanding task, particularly for adult learners, and, therefore, native-like attainment is very rarely achieved. Previous studies on L2 acquisition indicate that learning a language at a later stage in life is largely guided by explicit learning [1–3]. Consistent with this, studies comparing L2 learning under incidental and intentional conditions have demonstrated that the presentation of explicit information about the learning target leads to greater learning gains [4–6]. By contrast, first language (L1) acquisition is a process that is thought to occur mostly unconsciously and automatically (e.g., [7–10]).

Interestingly, however, a growing body of research on artificial language learning has provided evidence that participants can rapidly develop knowledge about different aspects of a novel (L2) grammar even under incidental conditions (e.g., [11–13]). Yet, a recurrent pattern in these studies is that participants typically achieve performance that is only slightly above chance and rarely exceeds 60% accuracy [14, 15]. These findings may be attributable to three different yet potentially overlapping factors. First, adults’ capacity to learn grammatical rules incidentally may, to a certain extent, be affected by maturational constraints, resulting in low learning rates [9, 16]. Second, language learning outcomes are frequently assessed after a limited amount of exposure to input, usually confined within a single session, which may be insufficient for learners to develop robust grammatical knowledge (for studies with an extensive language training regimen see [17, 18]). In fact, notwithstanding their differences, contemporary cognitive models of L2 acquisition (e.g., [2, 9, 19, 20]) converge in suggesting that repeated exposure to input and practice can lead to better language learning outcomes. Third, adult L2 attainment appears to be substantially affected by learners’ previous L1 experience [1, 21, 22]. Hence, given that in the majority of studies the target population consisted of native speakers of English [11–13], a fixed word order language, the extent to which previous learning outcomes can be attributed to limits in learning under incidental exposure conditions per se, or whether they are additionally modulated by previous L1 experience, is still far from clear.

The present study set out to revisit the question of whether adults can learn novel grammatical structures incidentally, while addressing the aforementioned gaps in the literature. To this end, we examined if word order and inflectional morphology are susceptible to learning under incidental conditions to the same extent, as well as the degree to which their learnability is affected by the amount of exposure to the novel language and the level of similarity between the L1 and the new target language. Furthermore, most studies have focused on testing the learning of one grammatical feature at a time (e.g., case marking [13, 17]; word order [23–25]; noun-adjective agreement [26]; verb morphology [27]). As a result, surprisingly little is known about the order in which different aspects of grammar are acquired when adult learners are incidentally exposed to multiple grammatical features simultaneously. Comparing the simultaneous acquisition of various grammatical features makes it possible to study how participants weight novel linguistic cues, which one(s) they prioritize and how the development of one may influence the development of the other(s), thus providing important insights into the language acquisition process. Note that, in this paper, the terms acquisition and learning will be used interchangeably to refer to the process by which one learns language.


Incidental and intentional L2 learning

As part of the effort to gain a better understanding of the fundamental cognitive mechanisms underlying L2 learning, researchers have explored how learners acquire linguistic knowledge under two different conditions: incidental and intentional exposure conditions. In this context, intentional exposure refers to the condition in which participants are given explicit information about the learning targets or are instructed to engage in deliberate hypothesis testing and memorization of rules [3, 28, 29]. Such conditions primarily promote the engagement of explicit learning processes, which are thought to contribute mainly to the development of explicit knowledge, often signified by learners’ ability to verbalize the acquired rules [29, 30]. In contrast, incidental exposure is operationalized as the situation where participants are not informed about the learning target and the subsequent test phase [29, 31]. To achieve this, a cover task is used which is intended to focus learners’ attention on another activity that requires processing the input for meaning, instead of overtly encouraging them to consciously focus on the linguistic structure. Learning under incidental conditions is considered to favor the involvement of implicit learning processes, which result in the acquisition of implicit knowledge [30].

Intentional and incidental conditions are usually conflated with the terms explicit and implicit, respectively, and they are occasionally used interchangeably. However, they are not identical. The former two terms are more appropriate for describing the experimental conditions researchers design to investigate the type of learning that is taking place, as well as the nature of the L2 knowledge participants develop. In contrast, the latter two refer to the internal learning process that is engaged while acquiring new knowledge. The distinction can account for previous findings showing that participants can engage both explicit and implicit learning processes and can acquire both explicit and implicit knowledge irrespective of the conditions to which they are exposed [11, 27, 32–35]. In accordance with this distinction, the terms intentional and incidental will be used with reference to the environmental conditions under which learning is taking place without making any assumptions about the underlying language processes.

While a series of studies on artificial and semi-artificial languages has demonstrated a significant advantage of learning novel grammatical structures under intentional over incidental exposure conditions [24, 35–37], learners have been shown to succeed not only in learning grammatical structures incidentally, but, in some cases, in developing knowledge that is partly implicit (e.g., [13, 38, 39]). Although these findings may suggest that some aspects of L2 grammar can be learned without intention or awareness, the overall learning effect observed is generally limited. For example, Rebuschat and Williams [30] exposed adult learners to an artificial language consisting of English vocabulary and German word order and evaluated learning of word order via a grammaticality judgment task (GJT). Participants in the incidental group only performed with ~55% accuracy, while the addition of an elicited imitation task resulted in an increased learning effect (~62%). Subsequent studies have found similar learning effects [23, 24]. Small learning effects have also been reported in studies targeting the learning of novel case markers. Rogers et al. [13] tested L1 English speakers’ ability to learn case marking incidentally by presenting them with a semi-artificial language consisting of English phrases and Czech nouns marked either for nominative (-a) or accusative case (-u). This design was aimed at directing participants’ attention to the target grammatical markers, thereby facilitating noticing. Despite that, and although learners showed above-chance performance, mean accuracy was only ~56%. These results have been corroborated by subsequent studies testing inflectional morphology learning under incidental exposure [12, 40, 41].

One potential explanation for these findings relates to the limitations imposed on learners by the nature of implicit learning. Incidental contexts are thought to engage primarily implicit learning processes [42]. According to the literature on L2 acquisition, the ability to learn a language implicitly decreases with age [16, 43]. However, this age effect does not apply uniformly to all components of implicit learning [44]. Therefore, adults may retain the ability to learn simple structures implicitly but may face problems with low-salience and abstract linguistic patterns and rules [45], which may explain learners’ severe difficulties with L2 inflectional morphology.

Additionally, implicit learning requires extensive and repeated exposure to input for the development of new linguistic knowledge [1]. Despite this, most studies examine learning outcomes shortly after exposure to novel structures, which may be too brief for robust learning to occur [14]. Importantly, prior work has shown that a mere increase in exposure to the linguistic stimuli over the same experiment does not significantly improve learning gains [41, 46]. In contrast, accuracy scores tend to improve when the length of exposure is extended to at least a second session [18]. One explanation for these different findings may be found in sleep-related memory consolidation of new information. Such memory effects have been demonstrated both for cognitive abilities, such as implicit learning [47], and for novel word learning [48]. Hence, it is likely that the findings of studies examining grammar learning after minimal exposure to input may underestimate adults’ learning abilities. Thus, given the scarcity of (micro-)longitudinal studies in this area, the extent to which adults can learn grammatical rules under incidental conditions is still unclear, and so is whether receiving relatively extensive incidental exposure would result in higher levels of L2 grammatical accuracy. To gain a fuller understanding of adult learners’ capacity for incidental grammar learning, in the present study, language exposure was spread over five separate sessions, which allowed us to track how learning develops over time.

L1 transfer in L2 grammar learning

Another reason for the meagre learning effects observed in previous studies may be tied to the fact that the acquisition of various aspects of grammar is particularly challenging for late L2 learners [49–51]. The cause of these difficulties can be traced to various factors, such as input frequency [52, 53], complexity of features [54], emotions [55], individual differences in cognitive abilities [56] and prior L1 knowledge [57–59]. Among these factors, the way prior linguistic experience influences the perceived difficulty of L2 structures has received considerable attention in the L2 learning literature. Previous research has shown that, at the initial stages of learning a new language, participants tend to use L1 processing strategies to interpret L2 sentences [60–63]. Prior L1 experience tunes the perceptual system, interfering with subsequent L2 processing. Associative learning mechanisms are, thus, hindered by such learned attention effects [58, 64]. Specifically, earlier L1 experience with a cue (e.g., a temporal adverb such as yesterday or today) that reliably leads to a particular outcome (e.g., temporal reference) may block the acquisition of another cue that is also relevant for the interpretation (e.g., past tense -ed) [65]. Such effects can be particularly detrimental for cues that lack perceptual salience, have low communicative value (e.g., agreement) and are not present in the L1.

However, studies directly addressing the effect of L1 experience on artificial language learning are scarce. Some evidence of this effect comes from Williams and Kuribara [25], who exposed L1 English speakers to a semi-artificial language consisting of English words and Japanese syntax. Participants in the incidental exposure condition were informed about the function of the case markers and were presented with a number of sentences, including mainly canonical sentences and a minority of scrambled structures. They were then tested on their ability to learn the different word order regularities of the language. The results of a GJT containing novel lexis and some new structures showed that while participants learned the canonical structures, they did not reliably reject the new ungrammatical sentences, particularly those that were grammatical in English, indicating that they did not generalize the notion of scrambling to new sentences. Instead, learners developed a strong preference for canonical word orders. Additional evidence is provided by Gao and Ma [35]. In a replication of the Tagarelli et al. [24] study, L1 Chinese participants were presented with sentences that had Chinese vocabulary and German grammar, allowing for three grammatical structures, one simple and two complex structures, which differed in terms of verb placement. Following exposure, participants trained in both incidental and instructed conditions completed a GJT and an elicited production task. While in the original study the incidental exposure group of L1 English speakers learned both the simple and one of the complex patterns, linguistic complexity did not emerge as a significant predictor for the L1 Chinese speakers, who performed close to chance on all structures. According to the authors, performance can be attributed to the fact that Chinese allows for verbs to occur later in the sentence, causing strong L1 interference. 
Similar findings appear to emerge from studies examining incidental learning of mappings between novel determiners and semantic properties of nouns. In one of their experiments, Leung and Williams [66, 67] introduced L1 Chinese and L1 English speakers to a miniature artificial determiner system and instructed that the determiners encode the distance between the speaker and the object (gi and ro for near objects and ul and ne for far objects). However, they were not informed that these determiners also referred to the shape of objects (gi and ul referred to long objects while ul and ne referred to flat objects). Subsequently, participants were tested on their ability to incidentally learn the relationships between the determiners and their shape meanings. Both groups were visually presented with noun phrases (e.g., gi shoelace vs ul tissue) in their native language and were asked to indicate, as quickly and accurately as possible, first whether the object presented was long or flat and, secondly whether the object was near or far. L1 Chinese speakers, but not L1 English speakers, managed to learn the hidden associations, taking advantage of the fact that the shape distinction is explicitly encoded in the classifier system of (written) Chinese. Using the same artificial determiner system and experimental design, Cayado and Chan [67] tested Chinese–English bilinguals’ and native English speakers’ ability to learn the associations between determiners and fire/water semantic categories (gi and ul for water-related words and ro and ne for fire-related words), a distinction that is also marked in written Chinese. Here, test items were presented in English to both groups. While both groups showed evidence of learning, Chinese–English bilinguals responded faster than the L1 English speakers despite testing taking place in their L2. Thus, overall, earlier studies suggest that different patterns of performance may arise depending on learners’ L1 background. 
Yet, to date, the role of L1 experience and transfer in artificial language has not been comprehensively tested, limiting the generalizability of previous findings as well as the potential and limits of adult incidental grammar learning. Therefore, an additional aim of the current study was to remedy this by investigating how prior linguistic experience moderates L2 grammar learning under incidental exposure conditions.

Artificial language paradigms

Natural languages are highly complex; consequently, isolating and examining how learners acquire a particular structure or pattern and what factors are involved in the acquisition process can be a difficult endeavor. This problem can be overcome by using artificial language paradigms [for reviews, see 68, 69]. The use of such paradigms allows for exerting full control over the type of structures or patterns to be tested, the degree of (dis)similarity to learners’ known language(s), the amount of input that learners are exposed to and the type of exposure. In contrast to natural languages, artificial languages allow participants to achieve high levels of proficiency within a short amount of time. Furthermore, given that the vast majority of artificial language studies are conducted within a controlled laboratory environment, researchers are afforded the opportunity to specify the desired inclusion criteria and focus on specific structures, while also avoiding potential confounds associated with the characteristics of the participants [70].

However, the use of these paradigms does have some potential limitations, the main one being concerns over their ecological validity. This is because, given the simplified nature of the input and the target structures, the results from artificial languages may not fully scale up to natural languages. This seems to be particularly relevant for semi-artificial languages, where the insertion of artificial or unknown morphological markers to real words, often known by learners, may increase the salience of these markers, and consequently their learnability [27, 33]. Despite these concerns, previous neuroimaging studies suggest the existence of significant parallels in the brain activity during artificial and natural language processing [71–73]. Additionally, performance on artificial language learning measures has been found to correlate positively with accuracy on natural language learning measures [74]. Therefore, the methodological advantages that these artificial language paradigms offer and the similarities in the neural correlates and mechanisms underlying processing of novel and native language constructions allow such paradigms to serve as ‘test tube’ models of natural languages [38], thus making them a particularly useful tool to investigate L2 learning and bilingualism [75, 76].

The present study.

The primary goal of the current study is to investigate whether adults can learn different aspects of novel language grammar under incidental exposure conditions. Extending earlier work, we explore whether prolonged incidental exposure can lead to more robust learning effects. Additionally, we examine the extent to which grammar attainment is influenced by learned attention effects stemming from learners’ prior L1 experience. The findings of two experiments are reported here. In the first experiment, adult L1 English speakers were exposed to an artificial language over five separate sessions, during which they were trained on the vocabulary of the language and completed a series of grammatical comprehension tests. The artificial language, Kepidalo, had variable word order and case marking on nouns and adjectives, features that are not present in English. To further tease apart the effects of incidental exposure and prior L1 experience on grammar learning, in a follow-up study (i.e., Experiment 2) we repeated the same experiment, but this time with native speakers of German, a morphologically richer language, while also increasing the amount of exposure to six sessions. In addition, Experiment 2 also investigates the relationship between learning outcomes and participants’ level of metalinguistic awareness, which was assessed by means of a post-test questionnaire.

The two experiments addressed the following research questions:

  1. RQ1. Can adult learners acquire grammar under incidental exposure conditions?
  2. RQ2. If so, what aspects do they acquire (word order, case marking, agreement marking)?
  3. RQ3. Is extensive incidental exposure sufficient to obtain robust learning effects?
  4. RQ4. To what extent does learners’ L1 background modulate L2 grammar learning?
  5. RQ5. To what extent is L2 grammar learning associated with metalinguistic awareness of the target structures?

Regarding RQ1, based on previous research demonstrating at least some grammar learning after a single session of incidental exposure to the linguistic stimuli (e.g., [12, 13, 23, 40]), and considering the extensive amount of artificial language input participants received, we predicted that evidence of grammar learning would be found for both L1 groups. For RQ2, following previous studies, we hypothesized that both the L1 English and the L1 German participants would show greater learning effects for word order than for morphology, given the low salience of morphosyntactic cues. For RQ3, given the scarcity of available research, our predictions are more tentative. While both Rogers [40] and Williams [46] failed to find better performance after increasing or even doubling the amount of exposure to stimuli within the same session, Pili-Moss, Brill-Schuetz, Faretta-Stutenberg and Morgan-Short [77], who provided a session-by-session analysis of the data originally collected in Morgan-Short et al. [18], found that learners’ grammatical abilities improved over time (see also [78] for a similar pattern of results). Given that in the present study participants completed each session on different days, allowing for consolidation effects to occur, we hypothesized that grammatical comprehension would increase as a function of time. Despite that, we still expected persistent difficulties with inflectional morphology throughout the study. Regarding RQ4, it was predicted that the L1 English learners would show strong L1 transfer effects leading to relatively low accuracy scores at the early stages of exposure, with performance then improving as a function of time (experiment 1). Since the L1 German participants have prior experience with case marking and word order variation from their L1, we expected them to outperform the L1 English group on all aspects of grammar (experiment 2). 
Finally, for RQ5 (experiment 2), we hypothesized that the development of knowledge of which learners are aware would result in greater learning outcomes, given its facilitative role in L2 learning [17, 39, 79–81]. In addition, it was expected that the effect of metalinguistic awareness on learning would become stronger with increased exposure to artificial language input. Predictions regarding RQs 1, 2, and 5 were borne out, while our results regarding RQs 3 and 4 were less conclusive.

Experiment 1



Forty-one adults with a mean age of 22.02 years (SD = 4.17, range = 18–35) participated in the study. All participants were monolingual native speakers of English who were resident in the United Kingdom. Recruitment was conducted via email and social media (Facebook and Twitter; N = 14) and through Prolific, an online participant recruitment platform (N = 27). To ensure that participants fulfilled the inclusion criteria, the following filters were applied: English speaking Monolingual, Nationality: United Kingdom, Country of Birth: United Kingdom, Age: 18–35, Country of Residence: United Kingdom. Participants were asked to electronically consent to take part in the study and received monetary compensation for their time (60.70 GBP). Based on self-reports, participants had on average 15.93 years of formal education (SD = 1.79, range = 12–21).

Artificial language learning game.

Participants were exposed to the novel artificial language in the context of an online computer-based game. In this game, the learners’ task was to travel to Tikon, a fictitious galaxy, and complete a number of challenges in order to collect four weapons that would help them defend the earth from an alien invasion. In order to accomplish their goal, participants had to learn an artificial language, namely Kepidalo.

The lexicon of Kepidalo comprised 14 disyllabic pseudowords: 8 nouns, 4 verbs and 2 adjectives (see S1 Appendix). The verbs designated semantically transitive events and always occurred with a direct object. All nouns and adjectives were overtly marked for case. The nouns were evenly distributed into two classes. The nominative case of nouns belonging in Class 1 was marked with the suffix -i, whereas nouns of Class 2 bore the suffix -a. In the accusative case, nouns of both classes took the suffix -o. The novel words were constructed to be easily pronounceable by participants.

In terms of syntax, Kepidalo was a verb-final language in which the order of subject and object was free, thus exhibiting either a canonical (SOV; 1a) or a non-canonical (OSV; 1b) word order. Adjectives were optional, occurred postnominally and carried an inflection morpheme that agreed in class and case with the noun they modified.

  (1) a. Noun-NOM – (Adj-NOM) – Noun-ACC – (Adj-ACC) – Verb
      b. Noun-ACC – (Adj-ACC) – Noun-NOM – (Adj-NOM) – Verb
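
The case-marking and word-order rules described above can be illustrated with a short sketch. It is not the authors' stimulus-generation script. The stems alg and flub are taken from the example forms Alg-i and Flub-a given in the noun training description, while the adjective stem mek and the placeholder VERB are hypothetical. The text does not list the adjective suffixes, so the sketch assumes that adjectives reuse the noun suffixes.

```python
# Suffixes from the text: nominative -i (Class 1) or -a (Class 2);
# accusative -o for both classes.
NOMINATIVE_SUFFIX = {1: "i", 2: "a"}
ACCUSATIVE_SUFFIX = "o"


def inflect(stem, noun_class, case):
    """Attach the case suffix to a stem; case is 'NOM' or 'ACC'."""
    if case == "NOM":
        return stem + NOMINATIVE_SUFFIX[noun_class]
    if case == "ACC":
        return stem + ACCUSATIVE_SUFFIX
    raise ValueError(f"unknown case: {case}")


def noun_phrase(stem, noun_class, case, adjective=None):
    """Noun plus an optional postnominal adjective agreeing in class and case."""
    words = [inflect(stem, noun_class, case)]
    if adjective is not None:
        # Assumption: adjectives take the same suffixes as the nouns they modify.
        words.append(inflect(adjective, noun_class, case))
    return words


def sentence(subject, obj, verb, order="SOV"):
    """Build a verb-final sentence.

    subject and obj are (stem, noun_class, adjective_or_None) tuples.
    """
    subject_np = noun_phrase(subject[0], subject[1], "NOM", subject[2])
    object_np = noun_phrase(obj[0], obj[1], "ACC", obj[2])
    if order == "SOV":
        words = subject_np + object_np
    elif order == "OSV":
        words = object_np + subject_np
    else:
        raise ValueError(f"unknown word order: {order}")
    return " ".join(words + [verb])


# "VERB" and "mek" are placeholders, not items from the Kepidalo lexicon.
print(sentence(("alg", 1, None), ("flub", 2, None), "VERB"))          # algi flubo VERB
print(sentence(("alg", 1, "mek"), ("flub", 2, None), "VERB", "OSV"))  # flubo algi meki VERB
```

Because the verb is always final and the accusative suffix is shared by both classes, the only cue distinguishing SOV from OSV in this sketch is the case suffix on each noun phrase.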

A total of 400 Kepidalo sentences were generated for the purpose of the experiment. Within these sentences, all lexical items (nouns, verbs and adjectives) occurred an equal number of times, with each noun being assigned to the subject and the object positions with equal frequency (45 times in each position). In addition, 290 of the sentences were SOV while the remaining 110 sentences had a non-canonical OSV word order. All sentences were three to five words long and had an average duration of 1967 ms (range = 1622–2742 ms). The auditory stimuli were synthesized using the Google Cloud Text-to-Speech service. We opted for the use of a Polish-accented synthesized voice to contribute to the impression that participants were learning a new language spoken by an alien character. Furthermore, a speaking rate slightly slower than normal was employed (0.75, with 1 being the normal rate), as this is thought to facilitate L2 comprehension [82, 83].

The novel sentences described the actions of eight alien cartoon characters that corresponded to the eight nouns in the artificial language. The aliens appeared in two different colors, dark red or light green, each of which corresponded to one of the two Kepidalo adjectives. Short animated scenes in which the aliens were seen performing one of four simple actions (approaching, catapulting, chasing, or jumping over) were generated and were then converted to GIF format. The reason for using GIFs instead of videos was twofold. First, they are small in size, thus taking less time to load even on devices with slower network connections [84]. Second, because GIFs loop continuously, participants can spend as much time as they need processing the stimuli, and the need to interact with the device during playback is minimized [85].

General procedure.

The experiment was conducted online through the Gorilla experiment builder [86] and could only be accessed via computers and laptops. Participants were first asked to fill out a short electronic background questionnaire concerning their demographics and prior language experience. Those who met all the inclusion criteria individually participated in five separate sessions within a 10-day span. The time interval between sessions was at least 24 hours but not more than 48 hours. The experimental stimuli and scripts for all tasks used in this study are available on Gorilla Open Materials.
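
The session timing constraints stated above (five sessions, 24 to 48 hours apart, within a 10-day span) can be expressed as a simple check. The sketch below is illustrative only: the function name is hypothetical, and it assumes that intervals are measured between session start times, which the text does not specify.

```python
from datetime import datetime, timedelta


def schedule_is_valid(
    session_starts,
    n_sessions=5,
    min_gap=timedelta(hours=24),
    max_gap=timedelta(hours=48),
    max_span=timedelta(days=10),
):
    """Return True if the session start times satisfy the timing constraints."""
    if len(session_starts) != n_sessions:
        return False
    ordered = sorted(session_starts)
    # Gap between each pair of consecutive sessions.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if any(gap < min_gap or gap > max_gap for gap in gaps):
        return False
    # Total time from the first to the last session.
    return ordered[-1] - ordered[0] <= max_span


first = datetime(2023, 1, 2, 9, 0)  # arbitrary example date
daily = [first + timedelta(days=i) for i in range(5)]
print(schedule_is_valid(daily))
```

Note that four gaps of at most 48 hours already keep the first and last session within eight days, so under these assumptions the 10-day limit is never the binding constraint.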

A summary of the tasks that participants had to complete in each session, and the order in which they were administered is provided in Table 1. Over the first four sessions, participants were trained on the vocabulary of the novel language and were tested on their knowledge of the language’s grammar (word order, case marking). During the final session, two additional tasks designed to probe grammatical knowledge were administered. Each session also included a cognitive test measuring individual differences (not discussed here). The order of presentation of the tasks was the same for all participants.

Table 1. Summary of the artificial language tests administered during the study.


The first session of the study began with training on the nouns of the novel language. Participants were told that, as part of their mission, they would need to learn the names of the inhabitants of the Tikon galaxy. A four-alternative forced-choice (4AFC) task modeled after Llompart and Reinisch [87, 88] was used to assess learning of the lexical items. The task contained 2 identical phases: a training phase consisting of 64 familiarization trials, and a test phase during which each of the 8 nouns was presented twice, for a total of 16 trials. In each trial, four aliens of the same color were presented simultaneously, one in each corner of the screen. When participants clicked on a “Play” button, the name of one of the aliens was presented auditorily in the nominative form (e.g., Alg-i, Flub-a). The participants’ task was to choose the picture that matched the word they had just heard. Visual feedback on accuracy was provided in the form of a green tick for correct answers or a red cross for incorrect answers. Following this, the target alien appeared on the screen in isolation for 1500 ms and the corresponding noun was presented auditorily again. The location of the target items on the screen was randomized and the same trial sequence was used for all participants.

The familiarization phase was mandatory for all participants, whereas in the test phase participants’ scores determined whether they were ready to proceed to the next phase. Participants who achieved 100% accuracy on the test trials proceeded immediately to the next task, while those who failed to achieve the target score (n = 16) were given an additional 16 practice trials. After these additional trials, all participants were allowed to move on to the next task regardless of their final score.

Lexical training.

During each of the first four sessions, participants were exposed to 270 auditorily presented Kepidalo sentences, which were pseudo-randomly selected from the total set of 400 sentences that were originally generated. To control for possible recency and primacy effects, four randomized lists were created, one for each session. The sentences were divided into three training blocks, each containing 90 sentences. The order in which the sentences appeared was the same across participants. To keep participants motivated throughout the task, they were told that each correct answer would award a unit of solar energy which would propel them towards the next planet in the game, whereas each incorrect answer would decrease their solar energy by 1 unit. All participants advanced to the next task, regardless of the number of correct answers.

Lexical training involved a two-alternative forced-choice task (2AFC). In each trial two short animated scenes, each showing two aliens performing an action, were simultaneously displayed on the screen (Fig 1), while a sentence describing the events depicted in one of the two scenes was played (2). Participants were instructed to indicate, as quickly and accurately as possible, which of the two scenes the sentence described. Participants could replay the sentence a second time if they wished, and the animated scenes looped continuously until they responded. The side of the target scene was counterbalanced and participants received visual and auditory feedback immediately after they provided a response. At the end of each of the three blocks, a display showing participants’ cumulative score was presented.

(2) Velg-a pog-a prad-o kov-o varek
    velg-NOM green-NOM prad-ACC red-ACC jump-over
    ‘the green velg jumps over the red prad’
Fig 1. Screenshot of a training trial in the lexical training task (Left scene: The (green) velg is jumping over the (red) prad, Right scene: the (green) velg is approaching the (red) prad).

Crucially, the target and distractor scenes were designed to differ by one single element. Specifically, in trials testing knowledge of verbs, the target and distractor scenes differed in terms of the action that the aliens performed; in the noun test trials, the two scenes differed with regards to one of the alien characters; finally, in the trials testing adjective learning, the color of one of the aliens in the distractor scene was changed. There were 90 trials for each of the three lexical categories involved (verbs, nouns, adjectives) interspersed among the three training blocks. The majority of the sentences had a canonical SOV word order (200), while the rest were OSV (70).

Grammatical comprehension test.

At the end of the lexical training trials, participants were tested on the grammar of the artificial language by being exposed to 90 new sentences. Seventy of those sentences displayed SOV word order and 20 were OSV. Participants heard the same 90 sentences over the four sessions. However, a different pseudorandom order was created for each session. All participants saw the stimuli in the same sequence.

A two-alternative forced-choice task (2AFC) was used for testing grammar learning and the procedure followed was similar to the one used for lexical training; participants heard a sentence in the artificial language and viewed two animated scenes: a target scene which depicted the meaning of the sentence and a distractor scene in which the agent and patient roles of the two aliens were reversed. No feedback regarding accuracy was displayed. The side of the target video (left or right) was counterbalanced within each list.

Grammaticality judgement task.

In the final session, grammatical knowledge was assessed by means of a Grammaticality Judgement Task (GJT) in which participants were presented with novel sentences (i.e., sentences that were not used in the preceding sessions) and were asked to decide whether the sentences were correct or incorrect. The GJT consisted of 80 sentences. Half of the sentences were grammatical and the other half contained various kinds of grammatical violations (Table 2). Each type of violation occurred 8 times.

Table 2. Types of ungrammatical sentences in the grammaticality judgement task.

SOV and OSV patterns appeared with equal frequency during the task. Within each construction, i) half of the sentences were grammatical and half ungrammatical, and ii) half of the sentences included an adjective which modified the subject (10) or the object (10) of the sentence. The sentences appeared in random order, but the order of presentation was the same for all subjects. No feedback was given on responses.
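The composition of the GJT item set can be checked with a short sketch. The counts (80 sentences, half grammatical, 8 items per violation type, equal numbers of SOV and OSV sentences) come from the task description, and the five violation types follow from 40 ungrammatical items at 8 per type. The labels `violation_1` to `violation_5` are placeholders for the types listed in Table 2, and the even split of each violation type across the two word orders is an assumption made for illustration only.

```python
from collections import Counter

WORD_ORDERS = ("SOV", "OSV")
# Placeholder labels; the actual violation types are those listed in Table 2.
VIOLATION_TYPES = tuple(f"violation_{i}" for i in range(1, 6))
GRAMMATICAL_PER_ORDER = 20
# Assumption: each violation type is split evenly across the two word orders.
UNGRAMMATICAL_PER_TYPE_AND_ORDER = 4


def build_gjt_design():
    """Return one record per GJT sentence (word order, grammaticality, violation)."""
    items = []
    for order in WORD_ORDERS:
        for _ in range(GRAMMATICAL_PER_ORDER):
            items.append({"order": order, "grammatical": True, "violation": None})
        for violation in VIOLATION_TYPES:
            for _ in range(UNGRAMMATICAL_PER_TYPE_AND_ORDER):
                items.append({"order": order, "grammatical": False, "violation": violation})
    return items


design = build_gjt_design()
# Tally ungrammatical items per violation type and all items per word order.
violation_counts = Counter(i["violation"] for i in design if not i["grammatical"])
order_counts = Counter(i["order"] for i in design)
```

Under these assumptions the design yields 80 items, 40 per word order, and 8 items for each violation type.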

Final grammatical comprehension test.

In the final task, participants were once again tested on their knowledge of the grammar of the artificial language by means of a 2AFC task that was identical to the Grammatical Comprehension Test blocks administered in the previous sessions. The auditory stimuli consisted of the same 40 grammatical sentences that were presented in the GJT, thus SOV and OSV appeared equally frequently. Each sentence was accompanied by a target video that correctly depicted the sentence and a distractor video showing reversed subject/object roles.


Data analysis.

For the Lexical Training and the Grammatical Comprehension test, mean accuracy scores and mean reaction times were calculated for each individual for each of the four sessions and were then averaged across participants. Performance is summarized in Table 3. For the Pretraining task, individual scores were calculated as the total number of correct responses during the training phase and the first test block (80 items in total). For the GJT task, following signal detection theory [89], participants’ ability to discriminate between correct and incorrect sentences was measured by d-prime (d’). Specifically, four scores were obtained for each participant: hits (grammatical sentences judged as acceptable), misses (grammatical sentences judged as unacceptable), false alarms (ungrammatical sentences judged as acceptable) and correct rejections (ungrammatical sentences judged as unacceptable). From these scores, d’ scores were calculated for each participant [90] using the ’psycho’ package [version 0.6.1; 91] in RStudio [92]. A d-prime score of 0 indicates chance performance and high d’ scores indicate greater discrimination sensitivity. Finally, for the Final Grammatical Comprehension test (henceforth, FGCT), individual scores were calculated as the number of correct responses provided by each participant. A summary of participants’ performance on the three artificial language tasks is presented in Table 4. Correlation matrices showing the relationship between the three artificial language tasks and performance on the Lexical Training and Grammatical Comprehension trials in each session are provided in the S2 Appendix. Data from each task (except Pretraining) were analyzed separately using mixed-effects models. To further explore significant interactions, post-hoc pairwise comparisons were performed using the emmeans package [version; 93]. 
Finally, we calculated effect sizes for the models, measured by marginal and conditional R2, using the r.squaredGLMM function from the MuMIn package [version 1.47.1; 94], odds ratios and confidence intervals for the predictor variables using the tab_model function from the sjPlot package [version 2.8.11; 95] and Spearman-Brown split-half reliability for all test measures using the splithalf package [version 0.8.2; 96]. These reliability coefficients are reported in S3 Appendix. All data and R scripts for the analyses are available at (
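To make the sensitivity measure concrete, the following is a minimal Python sketch of a d' computation from the four response counts defined above. It is not the code used in the study, which relied on the psycho package in R. The log-linear correction (adding 0.5 to the hit and false-alarm counts and 1 to the trial totals) is an assumption made here to keep both rates away from 0 and 1, and the function name `d_prime` is hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction is applied so that neither rate
    can be exactly 0 or 1, which would make the z-transform infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant: 40 grammatical and 40 ungrammatical GJT sentences.
example = d_prime(hits=38, misses=2, false_alarms=25, correct_rejections=15)
# A participant who accepts half of the sentences of each kind is at chance.
at_chance = d_prime(hits=20, misses=20, false_alarms=20, correct_rejections=20)
```

A participant who accepts grammatical and ungrammatical sentences at the same rate obtains a d' of 0, in line with the interpretation of chance performance given above.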

Table 3. Mean accuracy and reaction times across sessions in the lexical training and grammatical comprehension blocks.

Table 4. Descriptive statistics for performance on the pretraining and post-tests grammar tasks.

Lexical training.

Since Lexical Training scores were obtained from a 2AFC picture selection task, chance-level performance was 50% or 135 correct responses (out of 270 trials). Participants’ performance was greater than chance from the first session onwards and their ability to discriminate between correct and distractor scenes continued to improve throughout the study.

With regards to performance by distractor type, accuracy was higher on trials involving noun distractors (M = 88%, SD = 8.1%), followed by trials including verb (M = 79.8%, SD = 15.7%) and adjective distractors (M = 68.9%, SD = 18.2%). Regarding Word Order, participants achieved similar accuracy for SOV (M = 79.2%, SD = 11.9%) and for OSV sentences (M = 77.9%, SD = 12.4%).

To determine how accuracy rates on the lexical training task changed as a function of time and whether the presence of pretrained words affected learning, trial-by-trial data from the Lexical Training trials were submitted to a mixed-effects logistic regression model using the lme4 package [version 1.1–30; 97] in R. This type of model is well suited to analyzing binary response data [98, 99]. Response accuracy, coded as correct (1) or incorrect (0), was entered as the categorical dependent variable. Pretraining score was included in the model as a continuous variable. Scores from the Pretraining task were centered and scaled using the scale() function in R. Session (contrast coded as -1, -0.5, 0.5 and 1, for Sessions 1 to 4, respectively) and Word Order (contrast coded as -0.5 and 0.5, for OSV and SOV respectively) were also entered in the model as predictors. Contrast coding allows for recentering categorical variables by making the intercept the grand mean (i.e., 0), so that the predictors and their interactions can be interpreted in a manner analogous to ANOVA. By doing so, the direction of the overall effect of predictors in the model is indicated by the regression weights (positive or negative). The model contained all the two-way interactions between the predictors. The predicted probabilities of correct responses for all contrasts of interest were computed using the ggeffects package [version 1.1.3; 100].
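The predictor coding described above can be illustrated with a brief Python sketch. The contrast values are those given in the text. The `scale` helper imitates the default behavior of R's scale() (centering on the mean and dividing by the sample standard deviation), and the pretraining scores used in the example are invented for illustration.

```python
from statistics import mean, stdev

# Contrast codes as reported for the four-session model.
SESSION_CODES = {1: -1.0, 2: -0.5, 3: 0.5, 4: 1.0}
WORD_ORDER_CODES = {"OSV": -0.5, "SOV": 0.5}


def scale(values):
    """Center values on their mean and divide by the sample SD (n - 1)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical Pretraining scores (out of 80) for four participants.
pretraining_raw = [80, 72, 76, 68]
pretraining_z = scale(pretraining_raw)

# One coded trial: Session 3, SOV sentence, first participant.
coded_trial = {
    "session": SESSION_CODES[3],
    "word_order": WORD_ORDER_CODES["SOV"],
    "pretraining": pretraining_z[0],
}
```

Because both sets of contrast codes sum to zero and the covariate is centered, a value of 0 on each predictor corresponds to the midpoint of that predictor, which is what allows the main effects to be read in an ANOVA-like manner.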

Data were initially fitted to a model containing random intercepts for participants and items. To determine the best random-effects structure, random slopes for all fixed effects were first tested separately and then compared to the random intercepts only model by means of likelihood ratio tests using the anova() function of R. Subsequently, random slopes were added to the model one at a time, starting from the one that improved the model’s fit the most and were retained if the model converged and fitted the data significantly better than the previous base model (the model without the random slope) as determined by likelihood ratio tests. The best-fitting model contained random intercepts for participants and items, by-participant random slopes for Session and by-item random slopes for Pretraining. The output of the best-fitting model is shown in Table 5.
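The likelihood ratio comparison that underlies this selection procedure can be sketched as follows. This is an illustrative Python version of the test that anova() performs on two nested models in R, not the analysis code itself. The log-likelihood values in the example are invented, and the number of extra parameters contributed by a random slope (here 2, a variance and a covariance with the intercept) is an assumption that depends on how the random-effects structure is specified.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even degrees of freedom.
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    # Closed form for odd degrees of freedom.
    tail = math.erfc(math.sqrt(half))
    extra = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return tail + math.exp(-half) * extra


def likelihood_ratio_test(loglik_base, loglik_extended, extra_params):
    """Compare two nested models; return the test statistic and its p-value."""
    statistic = 2.0 * (loglik_extended - loglik_base)
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical log-likelihoods for an intercepts-only model and a model
# that adds one by-participant random slope.
stat, p_value = likelihood_ratio_test(-1000.0, -995.0, extra_params=2)
retain_slope = p_value < 0.05
```

In this invented example the slope would be retained, since the extended model fits significantly better than the base model, provided that it also converges.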

Table 5. Mixed-effects model fitted to the lexical training data.

The model revealed significant effects of Session and Pretraining, which, as indicated by the positive coefficients, suggest that vocabulary learning rates improved significantly across sessions and that learners who exhibited better learning of the lexical items in the Pretraining task were more likely to achieve higher accuracy in the Lexical Training trials. Furthermore, the interaction between Session and Pretraining, albeit not significant, suggests that the effect of Session was stronger for participants with higher Pretraining scores. Finally, performance did not differ significantly between the two Word Orders and there was no interaction between Session and Word Order, as learners appeared to respond with approximately equal accuracy to both types of sentences across sessions (Fig 2). Specifically, the estimated accuracy was 87% for SOV and 86% for OSV sentences.

Fig 2. Predicted probability of a correct answer as a function of Session and Word Order in Lexical Training (left) and Grammatical Comprehension test (right) in L1 English participants.

Grammatical comprehension test.

For each Session of the task, 50% or 45 correct responses (out of 90 trials) represent chance-level performance. As shown in Table 3, accuracy in the grammatical comprehension trials was above chance across all sessions, but performance remained stable over time. With regards to Word Order, overall, performance was better on SOV (M = 74.7%, SD = 15.8%) than on OSV sentences (M = 29.4%, SD = 16.7%).
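Because the test contained unequal numbers of SOV and OSV sentences, overall accuracy is a frequency-weighted mean of the two word-order means rather than their simple average. The sketch below works through this arithmetic with the figures reported above (70 SOV and 20 OSV sentences per session). The function name is hypothetical.

```python
def weighted_accuracy(accuracy_by_order, items_by_order):
    """Combine per-condition accuracies (in %) weighted by number of items."""
    total_items = sum(items_by_order.values())
    weighted_sum = sum(
        accuracy_by_order[order] * items_by_order[order]
        for order in accuracy_by_order
    )
    return weighted_sum / total_items


items = {"SOV": 70, "OSV": 20}
accuracy = {"SOV": 74.7, "OSV": 29.4}

overall = weighted_accuracy(accuracy, items)   # about 64.6%
chance_correct = 0.5 * sum(items.values())      # 45 of 90 trials
```

The weighted mean of roughly 64.6% agrees with the overall Grammatical Comprehension accuracy cited in the discussion of Experiment 1, and shows that above-chance overall performance can coexist with below-chance performance on the less frequent OSV sentences.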

A mixed-effects logistic regression model was fitted with Accuracy (correct = 1, incorrect = 0) as categorical dependent variable and with Session and Word Order, contrast coded in the same way as described in the model on Lexical Training data above, as predictors. In order to examine whether participants’ initial knowledge of words affects grammar learning, Pretraining was also entered as predictor. Furthermore, the model included all the two-way interactions between the three variables. The random-effects structure was selected following the process outlined above. The final model contained random intercepts for participants and items, by-participant and by-item random slopes for Session and by-participant slopes for Word Order.

According to the model (Table 6), the effect of Session was not significant, suggesting that, despite an improvement in performance as shown by the positive estimate of the effect, overall, learners’ accuracy rates did not increase substantially over time. However, there was a significant effect of Word Order and a significant interaction between Word Order and Session, qualifying the main effect of Session. The positive coefficient for the Word Order effect suggests that participants were more accurate on SOV than on OSV sentences, corresponding to an estimated accuracy of 81% and 23% respectively, and a follow-up analysis on the interaction revealed that the effect of Session was significantly higher for SOV sentences as compared to OSV ones, indicating that the difference in performance between the two Word Orders increased across sessions (Fig 2).

Table 6. Mixed-effects model fitted to the grammatical comprehension test data.

Grammaticality judgement task.

Performance on the task is summarized in Table 7. Overall, learners judged 64% (SD = 7.2%) of the test sentences correctly, but performance was driven primarily by accuracy on grammatical rather than on ungrammatical test sentences. The descriptive statistics show variation in performance on different types of violation, with participants achieving higher scores on sentences involving word order violations than on sentences that contained case marking violations. Performance on sentences with SOV and OSV word order was nearly indistinguishable (M = 64.5%, SD = 6.5%, and M = 63.6%, SD = 9.3%, respectively).

Table 7. Mean percentage of correct responses (SDs) and d’ scores by sentence type in the grammaticality judgement task.

In order to assess the extent to which participants learned the syntactic structure of the artificial language, data were submitted to a mixed-effects logistic regression model with Accuracy as a binary outcome variable (correct = 1 vs incorrect = 0) and Word Order (contrast coded with OSV as -0.5 and SOV as 0.5), Pretraining, Grammaticality (contrast coded with ungrammatical sentences as -0.5 and grammatical as 0.5) and Error Type (contrast coded with case marking sentences as -0.5 and word order sentences as 0.5) as independent variables. All the two-way interactions between the aforementioned predictors were also entered in the full model. The model included random intercepts for participants and items, by-participant random slopes for Grammaticality and Error Type.

The model (see S4 Appendix for the full model) returned a significant effect of Grammaticality (β = 4.536, z = 6.844, p < .001), indicating that learners judged grammatical sentences more accurately than ungrammatical ones, as shown by the positive coefficient, with predicted accuracies of 98% and 38% respectively. There was a significant effect of Pretraining (β = 0.288, z = 2.003, p = .045), and a significant interaction between Grammaticality and Pretraining (β = 1.298, z = 2.675, p = .007), suggesting that participants with higher scores in the Pretraining task performed more accurately in this task and that the effect of Pretraining differed across the levels of the Grammaticality variable. Follow-up simple slope analysis, which involves estimating and comparing the slopes of the covariate trend for each level of a factor variable, showed that the effect of Pretraining was positive for grammatical sentences but negative and non-significant for ungrammatical items. Finally, the model showed a positive coefficient for the effect of Error Type (β = 2.940, z = 6.292, p < .001), suggesting that, overall, participants performed better on word order than case marking sentences, with 96% and 58% predicted accuracies respectively, and significant interactions between Grammaticality and Error Type (β = -4.466, z = -5.465, p < .001) and between Pretraining and Error Type (β = 0.438, z = 2.075, p = .038). Regarding the former interaction, post-hoc analyses revealed that the effect of Grammaticality, while significant for both types of sentences, was bigger for those in the case marking condition, with the predicted probability of a correct response increasing by 94 percentage points from ungrammatical (4%) to grammatical (98%), as compared to an increase of 10 percentage points for word order sentences (89% for ungrammatical and 99% for grammatical). 
For the latter of the two interactions, simple slope analysis found that the effect of pretraining was stronger and significant only for word order sentences.
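With contrast-coded predictors, the size of the Grammaticality effect within each Error Type can be approximated directly from the reported coefficients. The sketch below does this for the Grammaticality by Error Type interaction. It is a simplification rather than the post-hoc procedure used in the study: it holds Pretraining at its centered value of 0, averages over Word Order, and ignores the remaining interactions. The function name is hypothetical.

```python
# Coefficients reported for the GJT model (log-odds scale).
BETA_GRAMMATICALITY = 4.536
BETA_GRAMMATICALITY_X_ERROR_TYPE = -4.466

# Contrast codes reported for Error Type.
ERROR_TYPE_CODES = {"case_marking": -0.5, "word_order": 0.5}


def grammaticality_effect(error_type):
    """Log-odds difference between grammatical and ungrammatical sentences.

    Grammaticality is coded -0.5 / 0.5, so its two levels differ by one unit
    and the simple effect equals the main-effect coefficient plus the
    interaction coefficient multiplied by the Error Type code.
    """
    code = ERROR_TYPE_CODES[error_type]
    return BETA_GRAMMATICALITY + BETA_GRAMMATICALITY_X_ERROR_TYPE * code


case_effect = grammaticality_effect("case_marking")   # about 6.77 log-odds
order_effect = grammaticality_effect("word_order")    # about 2.30 log-odds
```

The much larger log-odds difference for case marking sentences mirrors the pattern described above, in which accuracy on ungrammatical case marking items was far lower than on their grammatical counterparts.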

Final grammatical comprehension test.

Overall, participants exhibited high accuracy on SOV (M = 74.9%, SD = 18.6%), but were below chance on OSV sentences (M = 32%, SD = 23%). A further mixed-effects logistic regression model (see S4 Appendix for the full model) was built with Accuracy (correct = 1 vs. incorrect = 0) as a binary outcome, Pretraining, Word Order (contrast coded with OSV as -0.5 and SOV as 0.5) and their interaction as fixed effects, random intercepts for participants and items and random by-participant slopes for Word Order. A significant effect of Word Order was found (β = 2.522, z = 5.538, p < .001), which confirmed that participants reliably identified the correct picture for SOV items (81% predicted accuracy) but encountered difficulties doing so when presented with OSV sentences (26% predicted accuracy) even after four sessions of incidental exposure to the grammatical structure of the artificial language. Finally, the effect of Pretraining was not significant (β = 0.035, z = 0.374, p = .708), nor was the interaction between Pretraining and Word Order (β = 0.055, z = 0.137, p = .891).
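The relationship between the Word Order coefficient and the predicted accuracies can be illustrated with the logit transform. The sketch below converts the reported coefficient into an odds ratio and checks that the gap between the two predicted probabilities, expressed in log-odds, is close to the coefficient. Exact agreement is not expected, since the reported probabilities are rounded and the model intercept is not given in the text.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


BETA_WORD_ORDER = 2.522  # reported coefficient (SOV = 0.5, OSV = -0.5)

# Odds of a correct response on SOV relative to OSV sentences.
odds_ratio = math.exp(BETA_WORD_ORDER)

# Log-odds gap implied by the reported predicted accuracies (81% vs. 26%).
implied_gap = logit(0.81) - logit(0.26)
```

The implied gap of about 2.50 log-odds is close to the reported coefficient of 2.522, which corresponds to the odds of a correct response being roughly twelve times higher for SOV than for OSV sentences.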


As we have seen, our participants’ accuracy scores on the grammatical tasks were relatively low (M = 64.6% in the Grammatical Comprehension test; M = 64% in the GJT; M = 53.4% in the FGCT), confirming that learning new grammatical constructions under incidental exposure conditions, without instruction about the structure of the language or feedback on accuracy of performance, is particularly challenging for adult L2 learners [cf. 13, 36, 80]. This difficulty seems to hold even after extensive exposure to the artificial language and despite the fact that participants performed relatively well on the lexical trials even in the early stages of the experiment. Specifically, while participants succeeded in interpreting the SOV sentences, their ability to identify the correct scene upon hearing stimuli that had the less frequent OSV word order did not exceed chance level at any point during the study. This could be taken to suggest that participants might have relied primarily on word order cues, as opposed to inflectional morphology, in order to process the Kepidalo sentences. Further support for this interpretation comes from participants’ performance on the GJT. As shown in Table 7, learners were more accurate when presented with sentences that had word order violations, especially with those containing verb placement errors, than when asked to detect the grammaticality of sentences that contained case marking violations. These results are in line with similar recent artificial language learning studies [11, 12] showing that under incidental exposure conditions, adult learners are more likely to develop knowledge of word order than case marking rules. This difficulty with case marking could be at least partially attributed to the effects of learned attention [58]. Specifically, a common theme across these studies is that the participants targeted were native English speakers. 
Hence, learners’ prior L1 experience with English, a fixed word order language without case marking, may have driven them to look for word order cues when processing the novel sentences, which would in turn block the learning of the low-salient inflectional markers. Furthermore, the fact that they are required to learn word order patterns (i.e., SOV or OSV) that are different from the canonical word order of their native language could have also led them to focus their attention on this aspect of the language.

However, there is one piece of evidence indicating that participants might have learned more than their performance implies. Specifically, performance on the Grammatical Comprehension test shows that, although participants demonstrated superior performance on SOV compared to OSV items, they did not consistently apply the dominant word order to all sentences at any point during the task. Instead, as shown in Fig 2, the mean accuracy rates for the two word order patterns approximated the relative proportion of each pattern in the input participants received during the lexical training parts in each session (i.e., SOV = 74%, OSV = 26%). Interestingly, learners’ performance in the FGCT followed a somewhat similar pattern (SOV: M = 74.9%, OSV: M = 32%). These results could be taken to imply that learners were indeed sensitive to the presence of two distinct word order patterns, as well as their frequency of occurrence. This finding seems to be in accord with previous studies showing that adult L2 learners may be capable of learning the probabilities of occurrence of different patterns in the input without necessarily learning the grammatical forms [101].
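The contrast between consistently applying the dominant word order and matching the input proportions can be made explicit with a toy calculation. The sketch below assumes a hypothetical learner who ignores case marking entirely and assigns the subject-first (SOV) reading with a fixed probability. This is an illustration of the reasoning only, not a model fitted to the data. The input counts (200 SOV and 70 OSV training sentences per session) are taken from the description of the lexical training task.

```python
SOV_TRAINING_SENTENCES = 200
OSV_TRAINING_SENTENCES = 70


def expected_accuracy(p_sov_reading):
    """Expected accuracy for a learner who ignores case marking.

    The learner picks the SOV reading with probability p_sov_reading, which
    is correct for SOV sentences and incorrect for OSV sentences.
    """
    return {"SOV": p_sov_reading, "OSV": 1.0 - p_sov_reading}


total = SOV_TRAINING_SENTENCES + OSV_TRAINING_SENTENCES
p_sov_input = SOV_TRAINING_SENTENCES / total   # about 0.74

# Strategy 1: always apply the dominant word order.
always_sov = expected_accuracy(1.0)            # SOV 100%, OSV 0%
# Strategy 2: match the proportion of SOV sentences in the input.
matching = expected_accuracy(p_sov_input)      # SOV about 74%, OSV about 26%
```

Under this simplification, the observed accuracy rates resemble the probability-matching strategy much more closely than the strategy of always applying the dominant order, which would predict near-perfect accuracy on SOV and near-zero accuracy on OSV sentences.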

Experiment 2

As mentioned in the introduction, a large body of literature has shown that early adult L2 learning is heavily influenced by effects arising from prior L1 knowledge [58, 102, 103]. Such effects can lead learners to mistakenly engage L1-tuned processing affecting the route of L2 development. The results of Experiment 1 appear to be in line with these observations and raise the question as to whether similar patterns of performance would be found for speakers of a language that relies primarily on case marking cues. In addition, note that, in Experiment 1, there was a small but non-significant increase in learners’ overall grammatical comprehension scores across sessions. Though this cannot be taken as a clear indication of improvement in learning, it seems to be consistent with the idea that incidental acquisition requires extensive amounts of linguistic input [104]. Therefore, the slow learning rate seems to be partly attributable to difficulties associated with learning under incidental conditions per se.

In order to investigate the extent to which the results obtained in Experiment 1 stem from learners’ L1 experience and/or from the limited capacity to learn novel grammatical structures through incidental exposure, in Experiment 2, we examined whether the pattern of performance obtained for the L1 English participants could be replicated with native speakers of a language that is morphologically richer than English, namely German. In addition to this, Experiment 2 had two further aims. The first of these was to test whether additional incidental exposure would result in further incremental increases in performance or whether performance would stabilize at a suboptimal level. To do so, the amount of exposure was extended to six sessions, thus providing more opportunities for learning the rules of the language. Second, we also aimed at examining whether learning in incidental conditions depends on the learners’ ability to make explicit inferences about the grammatical structure of the language. While previous studies have focused on tracking the development of awareness during the test phase using source attributions and confidence ratings (e.g., [11, 32, 105]), here, we tested the extent to which learners’ metalinguistic awareness at the level of understanding [106], as measured by a post-test questionnaire, was predictive of vocabulary and grammar learning and we assessed whether this effect varied across sessions.



Thirty-eight participants, all native speakers of German and residents of Germany, with a mean age of 25.45 (SD = 4.39, range = 18–35), took part in this study. Participants were recruited through Prolific, gave consent to participate electronically, and were paid €88 for their time. No participants reported having prior exposure to a language with a rich morphological system, such as Latin or Russian. On average, participants reported having 15.93 (SD = 3.10; range = 6–22) years of formal education.

General procedure.

Participants were invited to take part in an online study consisting of six sessions, during which they were exposed to the same artificial language stimuli as in Study 1. The procedure was identical up to the point described in Study 1 (see Table 1). In Session 5, after completing the GJT and the FGCT, participants were once more asked to complete a Lexical Training task and a Grammatical Comprehension test. The same four tasks were given again in Session 6, but in a different order, with the Lexical Training and Grammatical Comprehension test preceding the GJT and the FGCT. After completing these measures, a post-test questionnaire was administered, aimed at investigating participants’ explicit knowledge of the grammatical rules of the artificial language.

Metalinguistic awareness questionnaire.

The questionnaire consisted of two parts. The first part comprised 6 questions. Participants were initially presented with a grammatical OSV sentence accompanied by a target and a distractor scene and were asked to select the scene described by the sentence and provide an explanation for their choice. Following that, participants heard five ungrammatical sentences, one for each type of violation that occurred in the GJT (see Table 2), and were asked whether they could identify and explain the error in each sentence.

The second part of the questionnaire diverged from the traditional metalinguistic awareness tests, which typically consist exclusively of error corrections and explanations, as it included 45 True/False questions that were designed to probe participants’ explicit knowledge of the word order and case-marking rules of the artificial language (see Gorilla Open Materials).


Data analysis.

Data obtained from the five artificial language learning tasks (Pretraining, Lexical Training, Grammatical Comprehension test, GJT, FGCT) were coded following the procedure outlined in Study 1. Tables 8 and 9 provide descriptive statistics for the L1 German group’s performance on the Lexical Training task and the Grammatical Comprehension test, and on the remaining measures, respectively. Overall accuracy scores up to session 4 are also provided to allow for a more direct comparison with scores obtained in Experiment 1. For the metalinguistic awareness questionnaire, data from each part were coded separately. Responses in the first part of the questionnaire were coded as correct if the learners identified the error involved in the sentence (e.g., Stimulus: ‘kovo Prado Algi mulek’ ADJ PLACEMENT VIOLATION, Response: ‘the adjective comes after the noun’), while partial credit (0.5) was given if they supplied a partial explanation of the error (e.g., Response: ‘kovo is wrong at the beginning’). All transcribed responses were coded independently by two of the authors (inter-rater reliability measured with Cohen’s kappa was 0.88) and any discrepancies were discussed and resolved through consensus. Participants could get a maximum of six points for this component of the questionnaire. For the second part, each response was coded as 1 if correct and 0 if incorrect. Following that, scores from the two parts were combined to obtain a total metalinguistic awareness score for each participant. Correlation matrices for all measures are reported (S2 Appendix) and Spearman-Brown split-half reliability coefficients for all test measures (S3 Appendix) are provided in the Supplementary Materials. As in Study 1, we fit a series of mixed effects models on the item-by-item responses on each task separately and significant interactions were followed by post-hoc pairwise comparisons using the emmeans package in R. 
Last, effect sizes (marginal and conditional R2) were computed with the r.squaredGLMM function from the MuMIn package, odds ratios and confidence intervals for the predictor variables with the tab_model function from the sjPlot package, predicted probabilities with the ggeffects package and Spearman-Brown split-half reliability for test measures with the splithalf package. These reliability coefficients are provided in S3 Appendix.
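The questionnaire scoring and the inter-rater agreement statistic can be sketched in a few lines of Python. The coding values for the first part (1, 0.5 or 0 per question, six questions) and the second part (1 or 0 for each of the 45 True/False questions) follow the description above. Combining the two parts by simple summation is an assumption, as are the function names and the example ratings, which are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes to the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1.0 - expected)


def total_awareness_score(part1_codes, part2_codes):
    """Assumed total: sum of part 1 codes (0, 0.5, 1) and part 2 codes (0, 1)."""
    return sum(part1_codes) + sum(part2_codes)


# Hypothetical codes given by two raters to six part 1 responses.
rater_1 = [1, 1, 0.5, 0, 0, 1]
rater_2 = [1, 1, 0.5, 0, 0.5, 1]
kappa = cohens_kappa(rater_1, rater_2)

# Maximum possible score under the summation assumption: 6 + 45.
max_score = total_awareness_score([1] * 6, [1] * 45)
```

Under the summation assumption, the total metalinguistic awareness score would range from 0 to 51.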

Table 8. Mean accuracy (%) and reaction times across sessions in the lexical training and grammatical comprehension blocks for the L1 German learners.

Table 9. Descriptive statistics for performance on the pretraining, post-tests grammar tasks, and the metalinguistic awareness questionnaire for the L1 German learners.

Lexical training.

Average accuracy on the Lexical Training items was greater than chance from as early as the first session and showed an upward trend until Session 4, where it stabilized until the end of the study (Table 8). Accuracy rates were higher for noun (M = 90.2%, SD = 10.2%) and verb trials (M = 88.2%, SD = 14.3%), compared to trials testing adjective learning (M = 78.1%, SD = 18.3%). Furthermore, participants responded with similar accuracy regardless of the Word Order of the auditory stimulus (SOV: M = 85.9%, SD = 13%, and OSV: M = 84.4%, SD = 13.8%).

In order to estimate the effect of Session and Pretraining on lexical learning, data were submitted to a mixed-effects logistic regression model. The model had Accuracy as a binary dependent variable and, as predictors, Session (contrast coded as -1.5, -1, -0.5, 0.5, 1 and 1.5 for Sessions 1 to 6, respectively), Word Order (contrast coded with OSV as -0.5 and SOV as 0.5), Pretraining and Metalinguistic Awareness (both scaled and centered), and all the two-way interactions. The procedure for building the model and for identifying the best random-effects structure was identical to that described in Study 1. The model had random intercepts for participants and items, by-participant random slopes for Session and Word Order, and by-item random slopes for Session.
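The predictor coding described above can be sketched as follows. This is an illustration only, not the authors' code: the contrast values are those reported in the text, while the function names are hypothetical and the scaling is assumed to use the sample standard deviation (as R's scale() does).

```python
from statistics import mean, stdev

# Contrast codes as reported in the text; both sets sum to zero.
SESSION_CODES = {1: -1.5, 2: -1.0, 3: -0.5, 4: 0.5, 5: 1.0, 6: 1.5}
WORD_ORDER_CODES = {"OSV": -0.5, "SOV": 0.5}


def code_trial(session, word_order):
    """Return the (Session, Word Order) contrast codes for one trial."""
    return SESSION_CODES[session], WORD_ORDER_CODES[word_order]


def scale_and_center(scores):
    """Z-score a continuous predictor (e.g., Pretraining) across participants.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    m, s = mean(scores), stdev(scores)
    return [(score - m) / s for score in scores]
```

Because every predictor is centered on zero under this scheme, each main effect in the model can be read as the effect at the average level of the other predictors.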

The results, shown in Table 10, revealed main effects for Session and Word Order, which indicate that response accuracy improved for both word orders across sessions and was higher for SOV than for OSV sentences, as can be seen from the positive coefficients of the effects and the predicted probabilities for the two word orders (95% and 92%, respectively). A significant interaction between Session and Word Order was also found. Post-hoc analyses revealed that the effect of Session was stronger for SOV sentences (Fig 3), suggesting that, as the study progressed, participants were more likely to select the correct image upon hearing sentences with the SOV structure than upon hearing OSV sentences. Furthermore, there were significant effects of Pretraining and Metalinguistic Awareness, suggesting that participants with better scores on these measures exhibited greater lexical learning gains. The model also showed a significant interaction between Metalinguistic Awareness and Session, for which a follow-up simple slope analysis showed that the effect of Metalinguistic Awareness varied across sessions, becoming stronger over time.

Fig 3. Predicted probability of a correct answer as a function of Session and Word Order in Lexical Training (left) and Grammatical Comprehension test (right) in L1 German participants.

Table 10. Mixed-effects model fitted to the lexical training data from L1 German participants.

Grammatical comprehension test.

As shown in Table 8, participants, on average, demonstrated above-chance performance across all sessions. Learners’ ability to correctly identify the target scene seemed to improve between Sessions 1 and 2 and between Sessions 2 and 3. However, accuracy remained stable from this point onwards, albeit with small fluctuations. In terms of Word Order, participants were more accurate with SOV (M = 81.7%, SD = 15%) than with OSV sentences (M = 40.1%, SD = 28.6%).

In order to determine whether there were any associations between participants’ reported level of explicit knowledge of the target grammatical rules and grammatical comprehension, a mixed-effects logistic regression model was used. Accuracy was modeled as a binary dependent variable. Metalinguistic awareness and Pretraining, Word Order and Session, both contrast coded as in the model on Lexical Training, and all the two-way interactions were added as fixed effects. The random-effects structure included random intercepts for participants and items as well as by-participant random slopes for Session, Word Order and Metalinguistic Awareness and a by-item random slope for Pretraining. The model building procedure was identical to that detailed in Study 1.

The results of the model are shown in Table 11. The model returned significant main effects for Session and Word Order, which means that while learners improved significantly across sessions, accuracy throughout the study was driven primarily by correct responses on SOV sentences, which corresponds to an estimated 90% accuracy as opposed to 35% for OSV sentences. Additionally, there was a significant interaction between the two predictors which was followed up by post hoc analysis that revealed that the difference in performance on SOV and OSV stimuli remained significant across sessions (Fig 3). The model also showed a significant effect of Metalinguistic Awareness indicating that learners’ level of conscious knowledge was predictive of grammatical comprehension scores. Moreover, there was a significant interaction between Session and Metalinguistic Awareness. A post-hoc simple slope analysis revealed that the effect of Metalinguistic Awareness became progressively more pronounced over the course of the study. In the first session, it was negative and nonsignificant. In the second session, it became positive but remained nonsignificant. It finally reached significance in Session 3 and was significant in all the subsequent sessions.

Table 11. Mixed-effects model fitted to the grammatical comprehension test data from L1 German participants.

Grammaticality judgment tasks.

Table 12 displays the descriptive statistics for accuracy scores in the two GJTs. Overall, in both tasks, performance at the group level was above chance (for GJT 1: M = 71.3%, SD = 12.2%, and for GJT 2: M = 72.4%, SD = 13.1%). Participants exhibited similar performance patterns in both tasks, with better performance on grammatical than ungrammatical sentences and on sentences containing word order violations (Verb placement and Adjective before noun) than on sentences with case marking errors (Adjective noun agreement, Two accusatives and Two nominatives).

Table 12. Mean percentage of correct responses (SDs) and d’ scores by sentence type in the grammaticality judgement task for the L1 German learners.
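For readers unfamiliar with d’, the sensitivity index reported in Table 12 can be sketched as below. The text does not state how d’ was computed, so this is an assumption-laden illustration: it treats a correct rejection of an ungrammatical sentence as a hit and an incorrect rejection of a grammatical sentence as a false alarm, and it applies a log-linear correction so that rates of exactly 0 or 1 remain finite. The function name and argument names are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count, 1 to each
    total) keeps the z-transform finite when a rate would be 0 or 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    false_alarm_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # Inverse of the standard normal CDF.
    return z(hit_rate) - z(false_alarm_rate)
```

A participant who rejects ungrammatical and grammatical sentences at the same rate obtains a d’ of 0, whereas one who rejects mostly ungrammatical sentences obtains a positive value.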

Data from the two GJTs were analyzed jointly in a mixed-effects logistic regression model. The model included Accuracy as the outcome variable and Metalinguistic Awareness, Pretraining, Word Order (all coded as in the Grammatical Comprehension test model above), Grammaticality (contrast coded with ungrammatical sentences as -0.5 and grammatical as 0.5), Error Type (contrast coded with case marking sentences as -0.5 and word order sentences as 0.5), and Test Time (contrast coded with GJT 1 as -0.5 and GJT 2 as 0.5) as predictors. The model’s random-effects structure included random intercepts for participants and items, by-participant random slopes for Grammaticality and Error Type, and by-item random slopes for Test Time and Metalinguistic Awareness.

The model output (See S4 Appendix for the full model) showed significant effects of Grammaticality (β = 3.226, z = 6.695, p < .001) and Metalinguistic Awareness (β = 1.02, z = 6.725, p < .001), indicating that participants were less accurate when judging ungrammatical items relative to grammatical ones, with a difference in the predicted probability of a correct answer of 30% (68% vs. 98%), and that participants with higher scores on the metalinguistic awareness questionnaire were more likely to judge the grammaticality of the sentences correctly. In addition, the model returned significant interactions between Grammaticality and Pretraining (β = 0.878, z = 2.062, p = .039) and between Grammaticality and Metalinguistic Awareness (β = -1.507, z = -3.551, p < .001). Follow-up simple slope analyses showed that the effect of Pretraining was significant only for grammatical sentences, whereas the effect of Metalinguistic Awareness was significant only for ungrammatical sentences (Fig 4). A significant positive coefficient for Error Type (β = 2.206, z = 7.428, p < .001) was also detected, meaning that, overall, learners were more accurate in judging the grammaticality of word order than case marking sentences (97% and 78% predicted accuracies respectively). Moreover, the model showed significant interactions between Error Type and Grammaticality (β = -5.020, z = -8.783, p < .001) and between Error Type and Pretraining (β = 0.400, z = 2.600, p = .009). With regard to the Error Type and Grammaticality interaction, the result of follow-up analyses revealed that the effect of Grammaticality was stronger and significant only for sentences in the case marking condition. The probability of a correct answer was higher for grammatical (98%) than for ungrammatical sentences (17%), while no such difference was observed for word order sentences (98% vs. 96%). 
With respect to the interaction between Error Type and Pretraining, a simple slope analysis showed that the effect of Pretraining was significant only for word order sentences. Last, an interaction between Test Time and Metalinguistic Awareness emerged, and the results of a simple slope analysis showed that the effect of Test Time was stronger for participants who achieved higher scores on the metalinguistic awareness questionnaire and significant only for those with higher scores (mean + 1 SD).
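The logic of the simple slope analyses reported above can be illustrated with the coefficients given in the text for Metalinguistic Awareness (β = 1.02) and its interaction with Grammaticality (β = -1.507). The sketch below is not the authors' analysis: it only computes point estimates on the log-odds scale, assumes all other contrast-coded or centered predictors are held at 0, and says nothing about significance. The function and variable names are hypothetical.

```python
import math


def simple_slope(main_effect, interaction, moderator_value):
    """Slope of a predictor at a given value of its moderator (log-odds scale)."""
    return main_effect + interaction * moderator_value


def odds_ratio(beta):
    """Convert a logistic regression coefficient to an odds ratio."""
    return math.exp(beta)


# Coefficients as reported in the text for the joint GJT model.
BETA_AWARENESS = 1.02
BETA_GRAMMATICALITY_X_AWARENESS = -1.507

# Grammaticality is contrast coded: ungrammatical = -0.5, grammatical = 0.5.
slope_grammatical = simple_slope(BETA_AWARENESS, BETA_GRAMMATICALITY_X_AWARENESS, 0.5)
slope_ungrammatical = simple_slope(BETA_AWARENESS, BETA_GRAMMATICALITY_X_AWARENESS, -0.5)
```

Under these assumptions, the estimated awareness slope is roughly 0.27 for grammatical sentences and roughly 1.77 for ungrammatical sentences, which matches the direction of the reported pattern: Metalinguistic Awareness mattered mainly for judgments of ungrammatical items.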

Fig 4. Predicted probability of a correct answer as a function of grammaticality and metalinguistic awareness scores in the grammaticality judgement task.

Final grammatical comprehension tests.

Descriptive statistics of the scores on the FGCTs in Session 5 (FGCT 1) and Session 6 (FGCT 2) revealed that learners were more accurate with SOV (for FGCT 1: M = 83.2%, SD = 18.3%, and for FGCT 2: M = 86.5%, SD = 16.1%) than OSV items (for FGCT 1: M = 44%, SD = 36.3%, for FGCT 2: M = 44.7%, SD = 40.2%).

Results from the two tests were analyzed using a mixed-effects logistic regression model. Accuracy was used as the outcome variable and Metalinguistic Awareness, Pretraining, Word Order and Test Time (contrast coded with FGCT 1 as -0.5 and FGCT 2 as 0.5) and all their two-way interactions were included as predictors. The best-fitting model contained random intercepts for participants and items, and by-participant random slopes for Word Order.

The model (see S4 Appendix for the full model) revealed main effects of Word Order (β = 2.865, z = 5.192, p < .001), Test Time (β = 0.377, z = 2.715, p = .007) and Metalinguistic Awareness (β = 1.960, z = 7.262, p < .001). The positive estimates for the effects suggest that participants performed significantly better on SOV compared to OSV items (estimated accuracies of 94% and 45%, respectively), that accuracy improved significantly between FGCT 1 and 2 (estimated accuracies of 74% and 81%, respectively), and that participants with higher Metalinguistic Awareness scores showed better grammatical comprehension skills. Finally, two interactions approached but did not reach significance: between Pretraining and Metalinguistic Awareness (β = 0.557, z = 1.904, p = .057) and between Test Time and Metalinguistic Awareness (β = 0.359, z = 1.882, p = .060). Simple slope analyses revealed that the effect of Metalinguistic Awareness was weaker for participants with lower Pretraining scores and stronger for FGCT 2 than for FGCT 1.

L1 experience effects.

To investigate the impact of prior L1 experience on L2 grammatical learning, the L1 German group’s performance on the GJT1 and the FGCT1, both administered in session 5, was compared to that of the L1 English learners from Study 1. Before completing these tasks, both groups had received the same amount of exposure to the artificial language.

We first analyzed data from the GJT using a mixed-effects logistic regression model, with Accuracy (correct = 1, incorrect = 0) modeled as the binary outcome variable. The model included Grammaticality (contrast coded as in the GJT model above), Group (contrast coded, with the L1 English group as -0.5 and the L1 German group as 0.5), Error Type (contrast coded with case marking sentences as -0.5 and word order sentences as 0.5) and their interaction as predictors. The random-effects structure of the model had random intercepts for Participants and Items, and random slopes for Grammaticality and Error Type over Participants and random slopes for Group over Items.

The model (See S4 Appendix for the full model) revealed main effects of Grammaticality (β = 3.731, z = 7.806, p < .001) and Error Type (β = 2.395, z = 6.603, p < .001), indicating that participants in both groups performed better on grammatical than ungrammatical sentences and on word order than case marking sentences. Furthermore, a significant interaction between Grammaticality and Error Type (β = -4.159, z = -6.124, p < .001) was detected, for which post-hoc analysis showed that, overall, the effect of Grammaticality was significant for both error types but stronger for items in the case marking condition. Furthermore, the interaction between Group and Error Type (β = -0.792, z = -2.339, p = .019) was also significant, and a post-hoc analysis revealed that the L1 German learners performed significantly better on case marking sentences compared to their L1 English counterparts (with an estimated 76% accuracy vs. 59% respectively). This difference in performance was particularly evident in learners’ judgements of ungrammatical sentences containing case marking violations (Fig 5).

Fig 5. The distribution of scores across L1 groups and types of violation.

The black diamonds indicate group means and the solid black lines indicate chance accuracy level.

Finally, another mixed-effects logistic regression model was built to determine the potential effect of L1 on groups’ performance in the first FGCT. Accuracy was entered as the dependent variable and Word Order (contrast-coded), Group (contrast-coded as in the previous two models) and their interaction as predictors. The model (see S4 Appendix for the full model) also had random intercepts for participants and items, and by-participant random slopes for Word Order. The model returned significant effects for both Word Order (β = 2.632, z = 6.883, p < .001) and Group (β = 0.783, z = 2.645, p = .008), showing that, while both groups performed better on SOV than on OSV sentences, the L1 German group had higher accuracy overall than the L1 English group (with an estimated 74% vs. 55% accuracy, respectively).


The primary goal of Experiment 2 was to test whether the results of Experiment 1 would be replicated with learners who have a different L1 background. In line with our prediction, L1-German learners were found to be more accurate than the L1-English group in all grammar tasks. This difference in performance appears to be consistent with previous studies suggesting that L2 sentence processing is modulated by learners’ L1 background, such that syntactic structures that are similar in L1 and L2 are learned more easily (e.g., [102, 107]), reflecting positive transfer from L1 to a new language in the absence of instruction on grammar structure.

Despite the differences in grammatical comprehension abilities between the two groups, closer inspection of the data revealed compelling similarities in their learning outcomes. Specifically, both groups had higher accuracy rates on SOV than on OSV sentences and were more accurate in judging sentences as ungrammatical when they contained a word order violation than when they involved a case marking error. These findings suggest that, despite experiencing positive transfer effects, adults continue to face difficulties in learning new grammatical rules under incidental conditions.

Another contribution of Experiment 2 involved testing whether the provision of additional incidental exposure suffices for obtaining stronger learning effects. Results demonstrated that increasing exposure, by providing 540 additional lexical training items over 2 sessions, leads only to very small learning gains. Performance in the Grammatical Comprehension blocks increased by 2.5% from session 4 to session 6 (i.e., ~2 additional correct responses over the last 2 sessions), while overall accuracy increased only marginally between GJT1 and GJT2. Counter to these results, learners became significantly more accurate from FGCT1 to FGCT2. However, this increase in accuracy came only from better identification of SOV sentences. Collectively, Experiment 2 does not allow for a clear answer to the question of whether further incidental exposure can lead to better grammatical comprehension skills (the possibility that one or two additional sessions of exposure could result in significant increases in accuracy, albeit unlikely, cannot be precluded), but it does show the inefficiency of incidental learning as the sole L2 learning mode.

Finally, the data suggest a strong link between the development of grammatical comprehension, even under incidental conditions, and the emergence of conscious knowledge about the target linguistic structures. Characteristically, although performance in the Grammatical Comprehension blocks improved over time, this improvement was detected only in participants who exhibited average or above-average performance in the post-test questionnaire. Moreover, the metalinguistic awareness effect became stronger over time. While counterintuitive at first sight, this finding seems to reflect the fact that learners likely required a relatively large amount of input before developing an understanding of the target grammatical structures. As shown in Table 8, participants showed limited grammar learning in session 1 (55.5%), which may not have been enough for such an effect to emerge. However, as performance began to diverge from chance and learning scores improved, the effect of metalinguistic awareness became more pronounced. Learners with high metalinguistic awareness scores were also more likely to perform better in both the GJT and the FGCT. Regarding the former task, metalinguistic awareness was found to be more predictive of participants’ accuracy on ungrammatical sentences, which aligns with earlier findings that responses to this type of stimuli are more likely to tap into learners’ explicit knowledge [108]. Analyses of the data also revealed a relationship between degree of metalinguistic awareness and lexical learning. First, participants who demonstrated a higher level of awareness of their grammatical knowledge performed better in the Lexical Training task. Secondly, in both the Grammatical Comprehension test and the FGCT, the effect of metalinguistic awareness was greater for learners who achieved higher scores in the Pretraining task. 
Overall, these findings appear to highlight the facilitative role of explicit knowledge that is available to awareness in the process of learning new grammatical structures [106, 109].

General discussion

The current study examined adult learners’ capacity to learn new grammatical rules from incidental exposure. Our main goal was to examine whether adults can learn different aspects of grammar incidentally and the degree to which learning is modulated by their prior L1 experience. In the two experiments presented here, learners with different L1 backgrounds (English and German) were exposed to Kepidalo, a new artificial language that included case marking while allowing for a relatively flexible word order with a canonical SOV and a non-canonical OSV order. Furthermore, in an effort to extend previous literature, we investigated if extensive incidental exposure can lead to higher levels of learning than those reported in previous studies and whether the degree of learners’ conscious knowledge of the grammatical regularities contributed to their learning outcomes.

Learning novel word order and case marking rules through extensive incidental exposure (RQ1, RQ2, & RQ3)

The results of Experiments 1 and 2 showed that, even after extensive incidental exposure to the artificial language, overall, both L1-English and L1-German learners continue to exhibit only partial knowledge of the language’s grammatical rules. These findings appear to align well with previous artificial language studies showing that learning nonnative structures under incidental conditions poses a great challenge for adult learners, leading to generally low learning effects (e.g., [11, 13, 40]).

That is, however, not to say that the learning difficulties observed apply uniformly to all grammatical aspects of the language. Rather, it appears that participants succeeded in learning certain aspects of word order. Evidence of this comes from the GJT, where performance on sentences containing verb placement violations was found to be close to ceiling, indicating that participants from both language groups possessed well-developed knowledge of the verb-final rule of the language. This is in accord with the learning outcomes reported by Rebuschat et al. [12] and Walker, Monaghan, Schoetensack, and Rebuschat [110]. We hypothesize that the high accuracy on trials involving verb placement violations observed here can be attributed to a number of factors that work in unison. Firstly, the rigidity of verb placement may have rendered the verb position a consistent and reliable cue to word order, making it more accessible and learnable to participants. Secondly, given that items that appear towards the end of sentences are privileged during sentence processing [111–113], it is likely that the systematic occurrence of verbs in sentence-final position increased their overall perceptual salience, making it easier for participants to learn them and, consequently, identify them in the sentence. This allowed learners to notice the erroneous verb placements in the GJT and correctly judge these sentences as ungrammatical. A third factor may relate to the morphological properties of the novel verbs. In fact, all nonce-verbs created for this study were phonologically similar. They all were disyllabic, had the same syllable structure (CVCVC) and carried the suffix -ek. These properties may have additionally increased the salience of words referring to verbs and facilitated their recognition. Thus, to detect the ungrammaticality of the sentences, participants may have simply relied on the absence of these target properties from the sentence-final word. 
Finally, learners’ knowledge of the verb placement rule may be tightly linked to the effective learning of verb referents [12, 110]. This seems to be fully in line with previous research demonstrating a robust relationship between the development of vocabulary and grammar (e.g., [12, 114–116]).

The GJT results also suggest that, by session 5, participants had developed an at least partial understanding of the noun–adjective word order. However, the learning effects observed were not equivalent for the two language groups. While the L1 German participants responded with above-chance accuracy on sentences containing adjective placement violations (Table 12), the L1 English learners’ performance was not greater than chance (Table 7). Following the above-mentioned claim regarding the relation between vocabulary and grammar, we argue that one likely reason for the difference in accuracy between the two groups may be tied to the better learning of the word-referent mappings for adjectives in the L1 German group (M = 78.1%) compared to the L1 English group (M = 68.9%). A second reason for these results may stem from participants’ previous foreign language learning experiences. While none of the L1 German learners had previous knowledge of additional case marked languages, all of them had learned English and nearly all of them (33/38) had been introduced to either Spanish or French in the school setting (foreign language learning is compulsory in Germany). Importantly, in both Spanish and French, color adjectives strongly tend to occur in post-nominal position [117, 118]. Although none of the German participants included in this study reported working knowledge of any additional languages other than English, it remains possible that their earlier linguistic experience with Spanish or French may have facilitated the detection of similarities between the previously taught languages and Kepidalo, leading to positive transfer effects. Such a possibility would be in line with the idea of lexical and syntactic transfer from L2 to L3 [119–121].

Yet, the learning of the noun–adjective word order was not as effective as that observed for the verb placement rule. We hypothesize that this difference in learning outcomes is directly related to input frequency. Indeed, there is a great deal of research emphasizing the role of frequency in L2 learning [52, 122–124]. Similar to L1 acquirers, L2 learners are sensitive to the frequency of syntactic constructions in the input, and thus, constructions that occur frequently are often more fluently processed than the less frequent ones. In Kepidalo, in contrast to nouns and verbs, adjectives were optional and appeared only in half of the sentences, providing fewer opportunities for learning. This is reflected in the lower learning outcomes for both the referents for adjectives and the noun–adjective word order, compared to the referents for verbs and the verb placement rule. Further support for the role of input frequency comes from the finding that additional exposure to language led to an improvement in L1 German participants’ performance on adjective placement violation trials between sessions 5 and 6.

Finally, further evidence of grammar learning emerges from participants’ responses in the Grammatical Comprehension tests. In particular, throughout the study, participants in both language groups exhibited superior performance on sentences that had the canonical SOV word order over those with the non-canonical OSV order. This pattern of performance may be linked to the relative frequencies of occurrence of the two word orders in the input. Hence, it is likely that the higher frequency of the SOV word order promoted learning of this pattern. An additional factor that may have contributed to this frequency effect can be traced to the noun pretraining task. Recall that in that task, all nouns appeared in the nominative case only (e.g., Alg-i). Thus, the early presentation of nouns with the nominative case ending might have led participants to memorize these forms as unanalyzable chunks (e.g., Algi) and take them to be the only potential word-forms that can be mapped onto the different referents. Later, the increased likelihood of occurrence of these word-forms in the highly salient sentence-initial position during the lexical training and grammar test blocks may have facilitated their recognition, resulting in higher accuracy on the SOV sentences.

The learning of the canonical SOV pattern can also be taken to reflect learners’ subject-first preference when assigning grammatical roles to noun phrases in transitive sentences [125–127]. One potential explanation for this preference is that placing the subject/agent before the object/patient allows learners to engage a simple sequential processing strategy for determining the meaning of the sentences. Given the increased complexity of the artificial language, and due to lack of explicit instruction, learners may have adopted this strategy as it is less cognitively taxing than using case marking information [128, 129]. Accordingly, the stronger subject-first preference observed in the initial stages of exposure may reflect learners’ tendency to adopt this strategy, firstly, due to insufficient evidence regarding the presence of a second word order and, secondly, because it frees up more cognitive resources which they can use for processing the sentences for meaning, given that they are still attempting to learn the novel vocabulary.

A further explanation for the better learning attested for the SOV pattern can be linked to L1 experience effects. However, the manner in which these effects emerge is different for the two language groups. First, regarding the L1 German participants, L1 effects may stem directly from the application of L1-based strategies to L2 sentence processing. Native speakers of German tend to exhibit a preference for subject-first (SOV) readings when presented with Noun-Noun-Verb sentences in their L1 [130–132]. Accordingly, similar to our findings, it has been shown that L1 German L2 learners are also inclined to interpret non-native Noun-Noun-Verb strings as SOV [133–135]. On the other hand, the effect of L1 experience follows a more indirect path in the case of the L1 English learners. In contrast to native speakers of German, L1 English speakers usually assign an OSV interpretation to Noun-Noun-Verb sequences in their native language [133, 136]. Thus, the pattern of performance observed here cannot be directly attributed to L1 transfer. Rather, our results appear to be in line with the idea of meta-transfer [137]. In order to determine the grammatical roles of subject and object in a sentence, native speakers of English rely strictly on word order cues. Subsequently, when exposed to a new language, instead of simply employing the English surface word order (i.e., SVO), L1 English learners tend to transfer their sensitivity to word order as the main processing strategy from their L1 to L2. Thus, potential meta-transfer effects are thought to be strongly constrained by input frequency. In this study, the higher frequency of the SOV order as well as the early presentation of nouns in the nominative case may have led participants to abandon their L1-based preferences for SVO, or for OSV, when presented with Noun-Noun-Verb sentences, in favor of the interpretation that was more available, namely SOV. 
Interestingly, our results seem to corroborate earlier findings from natural language learning studies showing that, in contrast to native speakers of Japanese, L1 English learners of Japanese tend to over-rely on the canonical SOV order [137–139].

Despite our participants’ success with the canonical SOV order, accuracy on OSV sentences was below chance levels across all sessions in both groups. While factors like input frequency and L1 experience can be used to account for the better learning of the canonical order documented in Experiments 1 and 2, they can equally be employed for explaining participants’ difficulties with the non-canonical OSV order. Indeed, previous studies have shown that grammatical constructions that occur less frequently in the input or share fewer structural properties with previously learned languages present more processing difficulties for non-native speakers and tend to be acquired later [140–143].

Most importantly, however, the limited learning of the non-canonical word order could be seen as an outcome of participants’ struggles with learning case marking. Results of the GJT in both Experiments 1 and 2 revealed that participants showed a strong tendency to incorrectly accept ungrammatical trials that contained case marking violations (Tables 7 & 12). This appears to be fully in agreement with the well-documented difficulties of adult learners in processing and acquiring L2 inflectional morphology (e.g., [51, 60, 144, 145]).

One of the main sources of these difficulties is thought to be the low perceptual salience of inflectional morphemes [146, 147]. In contrast to most lexical items, morphemes are usually made up of a single segment or syllable, are unstressed, and, due to their word-final position, are often likely to be fused with surrounding items, making them hard to perceive and learn. These low salience effects can be additionally influenced by two factors, namely learned attention and decomposition. First, the fact that neither English nor German marks the singular nominative and accusative cases on nouns may have led learners to direct their attention to other L1-related cues for interpretation, instead of attending to case marking. While word order appears to be an obvious candidate for native speakers of English, as mentioned earlier, we suspect that this was also the cue L1 German learners used, at least during the initial stages of learning. Secondly, limitations in adult learners’ decomposition ability [148, 149] may have led participants to treat the novel words as unanalyzable wholes, instead of decomposing them into stems and suffixes, hampering the detection and processing of case marking information. It should, however, be mentioned that, although both language groups experienced difficulties in learning case marking, given the marginal use of case marking in English, these were far more pronounced in the L1 English group. In sum, these results seem to further confirm the idea that acquiring L2 morphology incidentally can be particularly challenging for adult learners [11–13, 40], while also demonstrating that such learning difficulties can persist even for learners who have native knowledge of a morphologically rich language.

In light of these findings, the present study joins the handful of previous studies examining the simultaneous learning of word order and case marking [11, 12, 110] in showing that word order is more susceptible to learning in incidental contexts of exposure and, as a result, is acquired faster. Importantly, while learned attention and cue salience tend to reduce the noticeability of grammatical morphemes, rendering them less learnable for non-native speakers, sensitivity to sequential probabilities is thought to remain available in adults [16, 150], enabling them to extract sequential patterns from structured or unstructured input. Indeed, a number of studies have found a strong association between sequential learning and L2 sentence processing (e.g., [151, 152]). In fact, sensitivity to sequential/temporal cues appears to be pervasive in L2 acquisition. For instance, work on cue weighting in L2 speech perception shows that early L2 learners tend to rely primarily on temporal (i.e., duration) instead of spectral information in order to distinguish between speech sounds [153–155], even in cases where their L1 makes little use of duration [156]. Thus, overall, our findings appear to highlight the important role of sequential processing in L2 acquisition [8, 157, 158].

Another objective of the present study was to examine whether the provision of extensive incidental exposure to input can increase the robustness of novel grammar learning (RQ3). Our results show that despite the improvement in participants’ performance over the course of the study, not all aspects of grammar were learned reliably. Specifically, while extensive incidental exposure was sufficient for learning rules related to surface word order patterns, it was not enough to lead to the acquisition of case marking. Thus, our results echo previous studies evidencing relatively low learning effects for L2 case marking under incidental conditions (e.g., [11, 13]), thereby suggesting that in order for low-salience morphosyntactic forms such as inflectional morphemes to be effectively learned by adult L2 learners, explicit types of instruction, as well as sufficient exposure, should be considered basic preconditions [4, 159, 160]. It should, however, be acknowledged that although the training regimen employed here is described as extensive, the amount of artificial language input provided makes up only a limited proportion of the input learners usually receive in naturalistic contexts over an extended period of time. Hence, the observed learning outcomes can only be taken to be reflective of the very early phases of language acquisition.

The effect of L1 experience on incidental L2 grammar learning (RQ4)

Turning to the fourth research question, our results indicate the presence of significant L1 effects in the learning of novel grammatical constructions. Specifically, we found that the L1 German learners demonstrated better grammatical comprehension abilities than their L1 English counterparts and were more sensitive to both case marking and word order violations. Characteristically, the number of participants who demonstrated learning of case marking was highest among the L1 German group (Fig 4). In addition, according to the results from the Grammatical Comprehension blocks, the discrepancy in performance between the two language groups seems to be numerically greater in the later sessions, indicating that the influence of L1 may become stronger as participants receive more input. Thus, although providing extensive exposure to input under incidental conditions did not result in robust grammar learning, it allowed for strong L1 effects to emerge.

These findings, however, do not mean that L1 experience does not play a role upon the very first exposure to the novel language. In fact, as mentioned earlier, such effects are expected to be at play from the very first stages of L2 learning [161, 162], influencing learners’ preference for certain cues over others (e.g., SOV over OSV or word order over case marking). What could, then, cause this delayed emergence of the influence of L1 background in the accuracy rates of the two language groups? Firstly, early L1 effects may have been masked by the increased complexity of the Grammatical Comprehension test in the initial stages of exposure. In addition, the meaning-focused nature of the preceding Lexical Training task may have biased participants towards processing the language for meaning, leading learners from both groups to engage equivalent processing strategies and, hence, hindering the detection of L1-specific influence. Most importantly, this pattern of development appears to be congruent with earlier research suggesting a piecemeal integration of cues during the early stages of L2 learning [22, 62, 163–165]. In particular, learners are thought to first rely on a single cue or pattern and, upon learning it, to proceed to the next one, which, in turn, enables them to use both cues in a combinatory manner. The order in which different cues are learned is largely influenced by their availability (i.e., how often they are present in the L2 input) and by transfer effects stemming from prior L1 knowledge. Therefore, the developmental pattern that emerged from the current study can be seen to reflect this incremental learning process. During the initial stages of exposure, learners from both L1 groups relied on the more salient and available SOV order as the main cue to agent and patient identification. As they received more input, their performance began to diverge.
While the vast majority of L1 English learners continued to show a strong reliance on word order, given its dominance in L1 processing, L1 German learners were more likely to demonstrate a gradual shift from the initial reliance on word order to the more reliable case marking, benefitting from their prior extensive experience with case marking. Despite this, it should be noted that although the aforementioned developmental pattern accounts well for the current data, learners’ performance was subject to large individual differences. Indeed, not all native speakers of German exhibited learning of case marking and, conversely, not all participants from the L1 English group failed to pick up on the case marking cue. All in all, the results show that prior L1 experience plays a crucial role in learning novel grammatical constructions under incidental exposure conditions. However, a considerable amount of input exposure may be required in order to capture such L1 effects.

The role of awareness in incidental L2 grammar learning (RQ5)

Our final research question concerned the extent to which learning grammar under incidental conditions is contingent upon the development of explicit knowledge of which learners are aware. As already stated in the interim discussion, the effect of metalinguistic awareness was ubiquitous across all tasks administered, underscoring its role in the course of the L2 learning process and its importance for the less salient aspects of grammar [1, 160, 166–168]. Crucially, this indicates that the emergence of metalinguistic awareness of the underlying grammatical patterns remains predictive of learning outcomes even in conditions where learning occurs unintentionally and without explicit feedback.

Overall, the findings yielded by this study raise two important points that are worth discussing. First, the present results are compatible with previous findings from artificial language learning studies showing that higher levels of awareness, either at the level of noticing or at the level of understanding [106], are associated with greater learning gains following incidental exposure, a trend that appears to be particularly strong for learning of case marking (e.g., [11, 13, 17, 40]). Furthermore, although our participants’ level of awareness was identified as a strong predictor of grammar learning, its effect only reached significance by the third session, indicating that a substantial amount of incidental exposure is likely required before making valid explicit inferences about the less salient grammatical aspects of a novel language.

Secondly, and building on the previous point, even though the predictive power of conscious awareness suggests that L2 grammar learning was largely driven by explicit knowledge, it does not mean that implicit knowledge was entirely absent. Note that we relied on offline measures to gauge both participants’ level of metalinguistic awareness and learning outcomes. Awareness was assessed via a retrospective questionnaire in which, similar to retrospective verbal reports, the provision of a correct answer or rule description does not necessarily entail that learners have only developed explicit knowledge [39, 169]. Grammar learning, on the other hand, was assessed using untimed 2AFC tasks and GJTs, both of which are thought to allow participants to consciously reflect upon the learned material before arriving at an answer, increasing the likelihood of engaging metalinguistic processes [160]. Thus, it is possible that the effect of metalinguistic awareness as well as the learning effects may be somewhat modulated by task-specific characteristics. The extent to which similar results can be obtained using online measures of learning (e.g., visual-world eye-tracking paradigm, word-monitoring task) is a question that needs to be further explored (see Pili-Moss [81] for similar results with a different type of accuracy measure). Nevertheless, the present findings appear to be in line with earlier research on L2 learning suggesting that the emergence of metalinguistic awareness plays an important, if not critical, role for the creation of L2 grammatical knowledge [1, 166, 168].


Using a novel artificial language, Kepidalo, the current study attempted to provide further insight into adult learners’ ability to acquire new grammatical structures under incidental exposure conditions. While more replications are required in order to verify the generalizability of the reported results, the current set of findings replicates and extends earlier studies by demonstrating that learning difficulties, particularly with the less salient aspects of grammar, persist even after extensive exposure to language input, indicating that incidental language learning continues to pose a significant challenge for adult learners. Nevertheless, the magnitude of the difficulties was found to differ as a function of L1 background. In particular, leveraging their prior experience with a morphologically rich L1, native speakers of German showed better learning of both word order and case marking compared to their L1 English counterparts. This suggests that the role of L1 transfer should be carefully considered when investigating incidental learning of novel grammatical structures. Finally, the study revealed a strong link between metalinguistic awareness and language learning rates, highlighting, once again, the role of awareness in L2 learning. Importantly, however, participants exhibited some signs of learning during the initial stages of exposure and before there was an observable effect of metalinguistic awareness, bringing up the question of how explicit knowledge arises in incidental contexts of exposure. Earlier work from the field of consciousness suggests that explicit knowledge may emerge from implicit representations [170–172]. Further research is needed to shed light on this topic.

Furthermore, it should be acknowledged that the learning outcomes of both language groups are characterized by large individual differences. While examining the sources of individual variation was outside the focus of the current study, future research will benefit from investigating how individual differences relate to the learning of various grammatical structures under incidental exposure conditions. In addition, although grammar learning was found to be modulated by L1 experience, this effect emerged only after participants had received adequate input. It is, hence, recommended that future work consider including more than a single session in order to better gauge any potential L1 effects. Whether further increasing the typological similarity between the artificial language and participants’ L1 (e.g., Russian or Japanese, which mark case on the noun rather than the determiner, as in German) would facilitate learning also remains to be seen.


  1. 1. Ellis NC. At the interface: Dynamic interactions of explicit and implicit language knowledge. Stud Second Lang Acquis. 2005; 27(2):305–352.
  2. 2. Ellis NC, Wulff S. Usage-based approaches to L2 acquisition. In: VanPatten B, Keating GD, Wulff S, editors. Theories in second language acquisition. Abingdon: Routledge; 2020. p. 128–161.
  3. 3. Rebuschat P, editor. Implicit and explicit learning of languages. Amsterdam: John Benjamins Publishing Company; 2015 Sep 15.
  4. 4. Goo J, Granena G, Yilmaz Y, Novella M. Implicit and explicit instruction in L2 learning. In: Rebuschat P, editor. Implicit and explicit learning of languages. Amsterdam: John Benjamins; 2015. p. 443–482.
  5. 5. Norris JM, Ortega L. Effectiveness of L2 instruction: A research synthesis and quantitative meta‐analysis. Lang Learn. 2000 Sep; 50(3):417–528.
  6. 6. Spada N, Tomita Y. Interactions between type of instruction and type of language feature: A meta‐analysis. Lang Learn. 2010 Jun; 60(2):263–308.
  7. 7. Dąbrowska E. Experience, aptitude and individual differences in native language ultimate attainment. Cognition. 2018 Sep 1; 178:222–235. pmid:29886057
  8. 8. Ellis NC. Sequencing in SLA: Phonological memory, chunking, and points of order. Stud Second Lang Acquis. 1996 Mar; 18(1):91–126.
  9. 9. Paradis M. Declarative and procedural determinants of second languages. Philadelphia: John Benjamins Publishing Company; 2009.
  10. 10. Hulstijn JH. Implicit and incidental second language learning: Experiments in the processing of natural and partly artificial input. In: Dechert HW, Raupach M, editors. Interlingual processes. Tübingen: Narr; 1989 p. 49–73.
  11. 11. Grey S, Williams JN, Rebuschat P. Incidental exposure and L3 learning of morphosyntax. Stud Second Lang Acquis. 2014 Dec; 36(4):611–645.
  12. 12. Rebuschat P, Monaghan P, Schoetensack C. Learning vocabulary and grammar from cross-situational statistics. Cognition. 2021 Jan 1; 206:104475. pmid:33220942
  13. 13. Rogers J, Révész A, Rebuschat P. Implicit and explicit knowledge of inflectional morphology. Appl Psycholinguist. 2016 Jul;37(4):781–812.
  14. 14. DeKeyser R. Implicit and Explicit learning. In: Doughty C, Long MH, editors. Handbook of second language acquisition Oxford: Blackwell; 2003. p. 312–348.
  15. 15. Leow RP. ISLA: How implicit or how explicit should it be? Theoretical, empirical, and pedagogical/curricular issues. Lang Teach Res. 2019 Jul;23(4):476–493.
  16. 16. DeKeyser RM. The robustness of critical period effects in second language acquisition. Stud Second Lang Acquis. 2000; 22:499–533.
  17. 17. Brooks PJ, Kempe V. Individual differences in adult foreign language learning: The mediating effect of metalinguistic awareness. Mem cogn. 2013 Feb;41(2):281–296. pmid:23055121
  18. 18. Morgan-Short K, Faretta-Stutenberg M, Brill-Schuetz KA, Carpenter H, Wong PCM. Declarative and procedural memory as individual differences in second language acquisition. Biling-Lang Cogn. 2014; 17:56–72.
  19. 19. DeKeyser R. Skill acquisition theory. In: VanPatten B, Keating GD, Wulff S, editors. Theories in second language acquisition. Abingdon: Routledge; 2020. p. 83–104.
  20. 20. Ullman MT. The declarative/procedural model. In: VanPatten B, Keating GD, Wulff S, editors. Theories in second language acquisition. Abingdon: Routledge; 2020. p. 128–161.
  21. 21. MacWhinney B. The Competition Model: The input, the context, and the brain. In: Robinson P, editor. Cognition and Second Language Instruction. New York: Cambridge University Press; 2001. p. 69–90.
  22. 22. MacWhinney B, Bates E (1989) The crosslinguistic study of sentence processing. New York: Cambridge University Press.
  23. 23. Ruiz S, Tagarelli KM, Rebuschat P. Simultaneous acquisition of words and syntax: Effects of exposure condition and declarative memory. Front Psychol. 2018:1168. pmid:30050480
  24. 24. Tagarelli KM., Ruiz S, Vega JLM., Rebuschat P. Variability in second language learning: The roles of individual differences, learning conditions, and linguistic complexity. Stud Second Lang Acquis. 2016 Jun;38(2):293–316.
  25. 25. Williams JN, Kuribara C. Comparing a nativist and emergentist approach to the initial stage of SLA: An investigation of Japanese scrambling. Lingua. 2008 Apr 1;118(4):522–553.
  26. 26. Denhovska N, Serratrice L. Incidental learning of gender agreement in L2. J Psycholinguist Res. 2017 Oct;46(5):1187–211. pmid:28409318
  27. 27. Godfroid A. The effects of implicit instruction on implicit and explicit knowledge development. Stud Second Lang Acquis. 2016 Jun;38(2):177–215.
  28. 28. Hulstijn JH. Theoretical and empirical issues in the study of implicit and explicit second-language learning: Introduction. Stud Second Lang Acquis. 2005 Jun;27(2):129–140.
  29. 29. Williams JN. Implicit learning. In Ritchie WC, Bhatia TK editors. The New Handbook of Second Language Acquisition. Bingley: Emerald Group Publishing Limited; 2009. p. 319–353.
  30. 30. Rebuschat P, Williams JN. Implicit and explicit knowledge in second language acquisition. Appl Psycholinguist. 2012 Oct;33(4):829–856.
  31. 31. Isbell DR, Rogers J. Measuring implicit and explicit learning and knowledge. In: Winke P, Brunfaut T, editors. The Routledge handbook of second language acquisition and language testing. Routledge. 2020 Dec. p. 304–313.
  32. 32. Hamrick P, Rebuschat P. How implicit is statistical learning. In: Rebuschat P, Williams J, editors Statistical learning and language acquisition. Berlin: De Gruyter Mouton; 2012. p. 365–382.
  33. 33. Maie R, DeKeyser RM. Conflicting evidence of explicit and implicit knowledge from objective and subjective measures. Stud Second Lang Acquis. 2020 May;42(2):359–382.
  34. 34. Miller ZF, Godfroid A. Emotions in incidental language learning: An individual differences approach. Stud Second Lang Acquis. 2020 Mar;42(1):115–141.
  35. 35. Gao J, Ma S. Learning Condition, Linguistic Complexity, And First Language Transfer In Semiartificial Language Learning: A Conceptual Replication And Extension Of Tagarelli et al. (2016). Stud Second Lang Acquis. 2021 May;43(2):355–378.
  36. 36. Monaghan P, Schoetensack C, Rebuschat P. A single paradigm for implicit and statistical learning. Top Cogn Sci. 2019 Jul;11(3):536–54. pmid:31338980
  37. 37. Robinson P. Learning simple and complex second language rules under implicit, incidental, rule-search, and instructed conditions. Stud Second Lang Acquis. 1996 Mar;18(1):27–67.
  38. 38. Morgan-Short K, Steinhauer K, Sanz C, Ullman MT. Explicit and implicit second language training differentially affect the achievement of native-like brain activation patterns. J Cogn Neurosci. 2012 Apr 1;24(4):933–47. pmid:21861686
  39. 39. Rebuschat P, Hamrick P, Riestenberg K, Sachs R, Ziegler N. Triangulating measures of awareness: A contribution to the debate on learning without awareness. Stud Second Lang Acquis. 2015 Jun;37(2):299–334.
  40. 40. Rogers J. Awareness and learning under incidental learning conditions. Lang. Aware. 2017 Apr 3;26(2):113–33.
  41. 41. Rogers J. Levels of awareness, depth of processing, and the learning of L2 case markings. In: Leow RP, editor. The Routledge handbook of second language research in classroom learning. New York: Routledge; 2019. p. 76–88.
  42. 42. Hulstijn JH. Incidental and intentional learning. In: Doughty CJ, Long MH, editors. The Handbook of Second Language Acquisition. Oxford: Blackwell Publishing Limited; 2003. p.349–381.
  43. 43. Long MH. Second language acquisition and task-based language teaching. Oxford, UK: Wiley-Blackwell; 2015.
  44. 44. Arciuli J. The multi-component nature of statistical learning. Philosophical Transactions of the Royal Society B: Biol Sci. 2017 Jan; 372(1711):20160058. pmid:27872376
  45. 45. Li S, DeKeyser R. State of the Scholarship. Stud Second Lang Acquis. 2021; 43:473–497.
  46. 46. Williams JN. Initial incidental acquisition of word order regularities: Is it just sequence learning?. Lang Learn. 2010 Dec;60: 221–244.
  47. 47. Durrant SJ, Taylor C, Cairney S, Lewis PA. Sleep-dependent consolidation of statistical learning. Neuropsychologia. 2011 Apr 1;49(5):1322–1331. pmid:21335017
  48. 48. Davis MH, Di Betta AM, Macdonald MJ, Gaskell MG. Learning and consolidation of novel spoken words. Journal of cognitive neuroscience. 2009 Apr 1;21(4):803–20. pmid:18578598
  49. 49. DeKeyser RM (2005). What makes learning second‐language grammar difficult? A review of issues. Language learning, 55, 1–25.
  50. 50. Hopp H. Second Language Sentence Processing. Annu Rev Linguist. 2022 Jan 14;8:235–256.
  51. 51. Larsen‐Freeman D. Not so fast: A discussion of L2 morpheme processing and acquisition. Language Learning. 2010 Mar;60(1):221–230.
  52. 52. Ellis NC. Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Stud Second Lang Acquis. 2002 Jun;24(2):143–188.
  53. 53. MacWhinney B. Models of the emergence of language. Annu Rev Psychol. 1998 Feb;49(1):199–227. pmid:15012469
  54. 54. Housen A, Simoens H. Introduction: Cognitive perspectives on difficulty and complexity in L2 acquisition. Stud Second Lang Acquis. 2016 Jun;38(2):163–175.
  55. 55. Dörnyei Z. Individual differences: Interplay of learner characteristics and learning environment. Lang Learn. 2009 Dec;59:230–48.
  56. 56. DeKeyser R., Koeth J. Cognitive aptitudes for second language learning. In: Hinkel E editor. Handbook of research in second language teaching and learning. New York: Routledge; 2011.p. 395–406.
  57. 57. Abutalebi J, Green D. Bilingual language production: The neurocognition of language representation and control. J Neurolinguistics. 2007 May 1;20(3):242–275.
  58. 58. Ellis NC. Selective attention and transfer phenomena in L2 acquisition: Contingency, cue competition, salience, interference, overshadowing, blocking, and perceptual learning. Appl Linguist. 2006 Jun 1;27(2):164–194.
  59. 59. MacWhinney B. Entrenchment in second-language learning. In: Schmid HJ, editor. Entrenchment and the psychology of language learning: How we reorganize and adapt linguistic knowledge. Berlin/Boston: De Gruyter Mouton; 2017. p. 343–366.
  60. 60. Ellis NC, Sagarra N. The bounds of adult language acquisition: Blocking and learned attention. Stud Second Lang Acquis. 2010 Dec;32(4):553–80.
  61. 61. Morett LM, MacWhinney B. Syntactic transfer in English-speaking Spanish learners. Biling-Lang Cogn. 2013 Jan;16(1):132–51.
  62. 62. Matessa M, Anderson JR. Modelling focused learning in role assignment. Lang Cogn Process. 2000 Jun 1;15(3):263–92.
  63. 63. Williams JN. Learning without awareness. Stud Second Lang Acquis. 2005 Jun;27(2):269–304.
  64. 64. Ellis NC. Usage-based and form-focused language acquisition: The associative learning of constructions, learned-attention, and the limited L2 endstate. In: Robinson P, Ellis NC, editors., Handbook of cognitive linguistics and second language acquisition. London: Routledge; 2008. p.372–405.
  65. 65. Ellis NC, Sagarra N. Learned attention in adult language acquisition: A replication and generalization study and meta-analysis. Stud Second Lang Acquis. 2011 Dec;33(4):589–624.
  66. 66. Leung JH, Williams JN. Crosslinguistic differences in implicit language learning. Stud Second Lang Acquis. 2014 Dec;36(4):733–55.
  67. 67. Cayado DK, Chan RK. The Influence of Prior Linguistic Knowledge on Second Language Semantic Implicit Learning: Evidence from Cantonese–English Bilinguals. Language Learning. 2022 Nov 22: 1–26.
  68. 68. Culbertson J, Schuler K. Artificial language learning in children. Annu Rev Linguist. 2019 Jan 14;5:353–73.
  69. 69. Weiss DJ. Introduction: The use of artificial languages in bilingualism research. Biling-Lang Cogn. 2020 Jan;23(1):72–3.
  70. 70. Hulstijn JH. Second language acquisition research in the laboratory: Possibilities and limitations. Stud Second Lang Acquis. 1997 Jun;19(2):131–43.
  71. 71. Friederici AD, Steinhauer K, Pfeifer E. Brain signatures of artificial language processing: Evidence challenging the critical period hypothesis. P Natl Acad Sci. 2002 Jan 8;99(1):529–34. pmid:11773629
  72. 72. Silva S, Folia V, Hagoort P, Petersson KM. The P600 in implicit artificial grammar learning. Cogn Sci. 2017 Jan;41(1):137–57. pmid:26913833
  73. 73. Tagarelli KM, Shattuck KF, Turkeltaub PE, Ullman MT. Language learning in the adult brain: A neuroanatomical meta-analysis of lexical and grammatical learning. NeuroImage. 2019 Jun 1;193:178–200. pmid:30826361
  74. 74. Ettlinger M, Morgan‐Short K, Faretta‐Stutenberg M, Wong PC. The relationship between artificial and second language learning. Cogn Sci. 2016 May;40(4):822–47. pmid:26201508
  75. 75. Grey S. What can artificial languages reveal about morphosyntactic processing in bilinguals?. Biling-Lang Cogn. 2020 Jan;23(1):81–6.
  76. 76. Morgan-Short K. Insights into the neural mechanisms of becoming bilingual: A brief synthesis of second language research with artificial linguistic systems. Biling-Lang Cogn. 2020 Jan;23(1):87–91.
  77. 77. Pili-Moss D, Brill-Schuetz K, Faretta-Stutenberg M, Morgan-Short K. Contributions of declarative and procedural memory to accuracy and automatization during second language practice. Biling-Lang Cogn. 2020; 23(3):639–651.
  78. 78. Brooks PJ, Kwoka N, Kempe V. Distributional effects and individual differences in L2 morphology learning. Language Learning. 2017 Mar;67(1):171–207.
  79. 79. Andringa S. The emergence of awareness in uninstructed L2 learning: A visual world eye tracking study. Second Language Research. 2020 Jul;36(3):335–57.
  80. 80. Hama M, Leow RP. Learning without awareness revisited: Extending Williams (2005). Stud Second Lang Acquis. 2010 Sep;32(3):465–91.
  81. 81. Pili-Moss D (2022) Long-term memory predictors of adult language learning at the interface between syntactic form and meaning. PLoS ONE 17(10): e0275061. pmid:36190977
  82. 82. Matsuura H, Chiba R, Mahoney S, Rilling S. Accent and speech rate effects in English as a lingua franca. System. 2014 Oct 1;46:143–50.
  83. 83. McBride K. The effect of rate of speech and distributed practice on the development of listening comprehension. Comput Assist Lang Learn. 2011 Apr 1;24(2):131–54.
  84. 84. Bakhshi S, Shamma DA, Kennedy L, Song Y, De Juan P, Kaye JJ. Fast, cheap, and good: Why animated GIFs engage us. Proceedings of the 2016 chi conference on human factors in computing systems; 2016 May 7–12; Sam Jose, US: ACM; 2016. p. 575–586.
  85. 85. Miltner KM, Highfield T. Never gonna GIF you up: Analyzing the cultural significance of the animated GIF. Soc Media Soc. 2017 Aug;3(3):2056305117725223.
  86. 86. Anwyl-Irvine AL, Massonnié J, Flitton A, Kirkham N, Evershed JK. Gorilla in our midst: An online behavioral experiment builder. Behav Res Methods. 2020 Feb;52(1):388–407. pmid:31016684
  87. 87. Llompart M, Reinisch E. Articulatory information helps encode lexical contrasts in a second language. J Exp Psychol Hum Percept Perform. 2017 May;43(5):1040. pmid:28263636
  88. 88. Llompart M, Reinisch E. The phonological form of lexical items modulates the encoding of challenging second-language sound contrasts. J Exp Psychol Learn Mem Cogn. 2020 Aug;46(8):1590. pmid:32162959
  89. 89. Macmillan NA, Creelman CD. Detection theory: a user’s guide. 2md ed. Mahwah: Lawrence Erlbaum.
  90. 90. Huang Y, Ferreira F. The application of signal detection theory to acceptability judgments. Front Psychol. 2020 Jan 31;11:73. pmid:32082223
  91. 91. Makowski D. The psycho package: An efficient and publishing-oriented workflow for psychological science. J Open Source Softw. 2018 Feb 5;3(22):470.
  92. 92. R Development Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2019.
  93. 93. Lenth RV. Emmeans: Estimated Marginal Means, aka Least-Squares Means, R package Version 1.8. 1–1.
  94. 94. Bartón K. MuMIn: Multi-model inference. R package version 1.47.1. 2022.
  95. 95. Lüdecke D. sjPlot: Data Visualization for Statistics in Social Science. R package version 2.8.11. 2022.
  96. 96. Parsons S. Splithalf: Robust estimates of split half reliability. Journal of Open Source Software. 2021 Apr 23;6(60):3041. 041.
  97. 97. Bates D, Mächler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software. 2015;67(1):1–48.
  98. 98. Baayen RH, Davidson DJ, Bates DM. Mixed-effects modeling with crossed random effects for subjects and items. J Mem Lang. 2008 Nov 1;59(4):390–412.
  99. 99. Jaeger TF. Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. J Mem Lang. 2008 Nov 1;59(4):434–46. pmid:19884961
  100. 100. Lüdecke D. ggeffects: Tidy data frames of marginal effects from regression models. J. Open Source Softw. 2018;3(26), 772.
  101. 101. Hudson Kam CL, Newport EL. Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Lang Learn Dev. 2005 Apr 1;1(2):151–95.
  102. 102. MacWhinney B. The logic of the Unified Model. In: Gass S, A. Mackey A, editors. Handbook of second language acquisition. New York: Routledge; 2011, p. 211–227.
  103. 103. Murakami A, Alexopoulou T. L1 influence on the acquisition order of English grammatical morphemes: A learner corpus study. Stud Second Lang Acquis. 2016 Sep;38(3):365–401.
  104. 104. Erlam R, Ellis R. Task-based language teaching for beginner-level learners of L2 French: An exploratory study. Can Mod Lang Rev. 2018 Feb;74(1):1–26.
  105. 105. Dienes Z, Scott R. Measuring unconscious knowledge: Distinguishing structural knowledge and judgment knowledge. Psychol Res. 2005 Jun;69(5):338–351. pmid:15944859
  106. 106. Schmidt R. Attention, awareness, and individual differences in language learning. In: Chan WM, Chin KN, Bhatt SK, Walker I, editors. Perspectives on individual characteristics and foreign language education. Berlin: De Gruyter Mouton; 2012. p. 27–50.
  107. 107. McManus K. Crosslinguistic influence and second language learning. 1st ed. New York: Routledge; 2021.
  108. 108. Gutiérrez X. The construct validity of grammaticality judgment tests as measures of implicit and explicit knowledge. Stud Second Lang Acquis. 2013 Sep;35(3):423–49.
  109. 109. Ellis NC. Constructions, chunking, and connectionism: The emergence of second language structure. In: Doughty CJ, Long MH, editors. The handbook of second language acquisition. Oxford: Blackwell; 2003. p. 33–68.
  110. 110. Walker N, Monaghan P, Schoetensack C, Rebuschat P. Distinctions in the acquisition of vocabulary and grammar: An individual differences approach. Lang Learn. 2020 Jun;70(S2):221–254.
  111. 111. Freudenthal D, Pine JM, Aguado‐Orea J, Gobet F. Modeling the developmental patterning of finiteness marking in English, Dutch, German, and Spanish using MOSAIC. Cogn Sci. 2007 Mar 4;31(2):311–341. pmid:21635299
  112. 112. Shoemaker E, Rast R. Extracting words from the speech stream at first exposure. Second Lang Res. 2013 Apr;29(2):165–183.
  113. 113. Slobin DI. Crosslinguistic evidence for the language-making capacity. In: Slobin DI, editor. The crosslinguistic study of language acquisition. NJ: Lawrence Erlbaum Associates; 1985. p. 1157–1256.
  114. 114. Bates E, Bretherton I, Snyder LS. From first words to grammar: Individual differences and dissociable mechanisms. Cambridge: Cambridge University Press; 1991.
  115. 115. Marchman VA, Bates E. Continuity in lexical and morphological development: A test of the critical mass hypothesis. J Child Lang. 1994 Jun;21(2):339–366. pmid:7929685
  116. Service E, Kohonen V. Is the relation between phonological memory and foreign language learning accounted for by vocabulary acquisition? Appl Psycholinguist. 1995;16(2):155–172.
  117. Montrul S. The acquisition of Spanish: Morphosyntactic development in monolingual and bilingual L1 acquisition and adult L2 acquisition. Amsterdam: John Benjamins Publishing; 2004.
  118. Laenzlinger C. French adjective ordering: Perspectives on DP-internal movement types. Lingua. 2005 May 1;115(5):645–689.
  119. Ellis NC, Larsen-Freeman D. Language emergence: Implications for applied linguistics—Introduction to the special issue. Appl Linguist. 2006 Dec 1;27(4):558–589.
  120. Flynn S, Foley C, Vinnitskaya I. The cumulative-enhancement model for language acquisition: Comparing adults’ and children’s patterns of development in first, second and third language acquisition of relative clauses. Int J Multiling. 2004 Jan 1;1(1):3–16.
  121. Rothman J. On the typological economy of syntactic transfer: Word order and relative clause attachment preference in L3 Brazilian Portuguese. IRAL Int Rev Appl Linguist Lang Teach. 2010;48:245–273.
  122. Ellis NC, Ferreira–Junior F. Construction learning as a function of frequency, frequency distribution, and function. Mod Lang J. 2009 Sep;93(3):370–385.
  123. Gass SM, Mackey A. Frequency effects and second language acquisition: A complex picture? Stud Second Lang Acquis. 2002 Jun;24(2):249–260.
  124. Hamrick P, Rebuschat P. Frequency effects, learning conditions, and the development of implicit and explicit lexical knowledge. In: Connor-Linton J, Amoroso L, editors. Measured language: Quantitative approaches to acquisition, assessment, processing and variation. Washington, DC: Georgetown University Press; 2014. p. 125–140.
  125. Bates E, MacWhinney B. Functionalism and the competition model. In: Bates E, MacWhinney B, editors. The crosslinguistic study of sentence processing. Cambridge: Cambridge University Press; 1989. p. 3–73.
  126. Jackendoff R. Précis of foundations of language: Brain, meaning, grammar, evolution. Behav Brain Sci. 2003 Dec;26(6):651–665. pmid:15377127
  127. VanPatten B. Input processing and grammar instruction in second language acquisition. New Jersey: Ablex Publishing Corporation; 1996.
  128. Baten K. Processability Theory and German case acquisition. Lang Learn. 2011 Jun;61(2):455–505.
  129. Sasaki Y. Processing and learning of Japanese double-object active and causative sentences: An error-feedback paradigm. J Psycholinguist Res. 1998 Jul;27(4):453–479.
  130. Bader M, Meng M. Subject-object ambiguities in German embedded clauses: An across-the-board comparison. J Psycholinguist Res. 1999 Mar;28(2):121–143.
  131. Gorrell P. Parsing theory and phrase-order variation in German V2 clauses. J Psycholinguist Res. 1996 Jan;25(1):135–156. pmid:8789370
  132. MacWhinney B, Bates E, Kliegl R. Cue validity and sentence interpretation in English, German, and Italian. J Verbal Learning Verbal Behav. 1984 Apr 1;23(2):127–150.
  133. Bates E, MacWhinney B. Second-language acquisition from a functionalist perspective: Pragmatic, semantic, and perceptual strategies. Ann N Y Acad Sci. 1981 Dec;379:190–214.
  134. Kilborn K. Sentence processing in a second language: The timing of transfer. Lang Speech. 1989 Jan;32(1):1–23. pmid:2622299
  135. Weyerts H, Penke M, Münte TF, Heinze HJ, Clahsen H. Word order in sentence processing: An experimental study of verb placement in German. J Psycholinguist Res. 2002 May;31(3):211–268. pmid:12092710
  136. MacWhinney B. Second language acquisition and the competition model. In: de Groot AMB, Kroll JF, editors. Tutorials in bilingualism: Psycholinguistic perspectives. Mahwah, NJ: Lawrence Erlbaum Associates; 1997. p. 113–144.
  137. Kilborn K, Ito T. Sentence processing strategies in adult bilinguals. In: MacWhinney B, Bates E, editors. The crosslinguistic study of sentence processing. Cambridge: Cambridge University Press; 1989. p. 257–291.
  138. Sasaki Y. Paths of processing strategy transfers in learning Japanese and English as foreign languages: A competition model approach. Stud Second Lang Acquis. 1994 Mar;16(1):43–72.
  139. Rounds PL, Kanagy R. Acquiring linguistic cues to identify agent: Evidence from children learning Japanese as a second language. Stud Second Lang Acquis. 1998 Dec;20(4):509–542.
  140. Abbot‐Smith K, Behrens H. How known constructions influence the acquisition of other constructions: The German passive and future constructions. Cogn Sci. 2006 Nov 12;30(6):995–1026. pmid:21702844
  141. Bybee J. From usage to grammar: The mind’s response to repetition. Language. 2006 Dec 1:711–733.
  142. Denhovska N, Serratrice L, Payne J. Frequency and working memory effects in incidental learning of a complex agreement pattern. Lingua. 2018 May 1;207:49–70.
  143. Kempe V, Brooks PJ. Second language learning of complex inflectional systems. Lang Learn. 2008 Dec;58(4):703–46.
  144. Hopp H. Grammatical gender in adult L2 acquisition: Relations between lexical and syntactic variability. Second Lang Res. 2013 Jan;29(1):33–56.
  145. Jiang N. Morphological insensitivity in second language processing. Appl Psycholinguist. 2004 Oct;25(4):603–34.
  146. Brown R. A first language: The early stages. London: George Allen & Unwin Ltd; 1973.
  147. Ellis NC. Salience, cognition, language complexity, and complex adaptive systems. Stud Second Lang Acquis. 2016 Jun;38(2):341–51.
  148. Gao Z, Wiener S, MacWhinney B. Acquisition of Chinese Verb Separation by Adult L2 Learners. Languages. 2022 Aug 29;7(3):225.
  149. Neubauer K, Clahsen H. Decomposition of inflected words in a second language: An experimental study of German participles. Stud Second Lang Acquis. 2009 Sep;31(3):403–35.
  150. Janacsek K, Fiser J, Nemeth D. The best time to acquire new skills: Age‐related differences in implicit sequence learning across the human lifespan. Dev Sci. 2012 Jul;15(4):496–505. pmid:22709399
  151. Granena G. Individual differences in sequence learning ability and second language acquisition in early childhood and adulthood. Lang Learn. 2013 Dec;63(4):665–703.
  152. Suzuki Y, DeKeyser R. The interface of explicit and implicit knowledge in a second language: Insights from individual differences in cognitive aptitudes. Lang Learn. 2017 Dec;67(4):747–790.
  153. Bohn OS. Cross-language speech perception in adults: first language transfer doesn’t tell it all. In: Strange W, editor. Speech perception and linguistic experience: theoretical and methodological issues. Timonium, MD: York Press; 1995. p. 279–304.
  154. Escudero P, Benders T, Lipski SC. Native, non-native and L2 perceptual cue weighting for Dutch vowels: The case of Dutch, German, and Spanish listeners. J Phon. 2009 Oct 1;37(4):452–465.
  155. Morrison GS. Effects of L1 duration experience on Japanese and Spanish listeners’ perception of English high front vowels. Burnaby: Simon Fraser University; 2002.
  156. Iverson P, Evans BG. Learning English vowels with different first-language vowel systems: Perception of formant targets, formant movement, and duration. J Acoust Soc Am. 2007 Nov;122(5):2842–2854. pmid:18189574
  157. Bybee J. Sequentiality as the basis of constituent structure. In: Givón T, Malle BF, editors. The evolution of language out of pre-language. Amsterdam: Benjamins; 2002. p. 109–132.
  158. Frank SL, Bod R, Christiansen MH. How hierarchical is language use? Proc R Soc Lond B Biol Sci. 2012 Nov 22;279(1747):4522–31. pmid:22977157
  159. Cintrón-Valentín MC, Ellis NC. Salience in second language acquisition: Physical form, learner attention, and instructional focus. Front Psychol. 2016:1284. pmid:27621715
  160. Ellis R. Form-focused instruction and the measurement of implicit and explicit L2 knowledge. In: Rebuschat P, editor. Implicit and explicit learning of languages. Amsterdam: John Benjamins; 2015. p. 417–441.
  161. Hartsuiker RJ, Bernolet S. The development of shared syntax in second language learning. Biling-Lang Cogn. 2017 Mar;20(2):219–34.
  162. MacWhinney B. A unified model of first and second language learning. In: Hickmann M, Veneziano E, Jisa H, editors. Sources of variation in first and second language acquisition: Languages, contexts, and learners. Amsterdam: John Benjamins; 2018. p. 287–312.
  163. Cheng PW, Holyoak KJ. Complex adaptive systems as intuitive statisticians: Causality, contingency, and prediction. In: Roitblat HL, Meyer JA, editors. Comparative approaches to cognitive science. Cambridge, MA: MIT Press; 1995. p. 271–302.
  164. Ellis NC, Hafeez K, Martin KI, Chen L, Boland J, Sagarra N. An eye-tracking study of learned attention in second language acquisition. Appl Psycholinguist. 2014 May;35(3):547–79.
  165. McDonald JL. The development of sentence comprehension strategies in English and Dutch. J Exp Child Psychol. 1986 Apr 1;41(2):317–35.
  166. DeKeyser R, editor. Practice in a second language: Perspectives from applied linguistics and cognitive psychology. New York, NY: Cambridge University Press; 2007.
  167. Ellis NC. Cognitive and social aspects of learning from usage. In: Cadierno T, Eskildsen SW, editors. Usage-based perspectives on second language learning. Berlin/Boston: De Gruyter Mouton; 2015. p. 49–73.
  168. Schmidt R. Attention. In: Robinson P, editor. Cognition and second language instruction. Cambridge: Cambridge University Press; 2001. p. 3–32.
  169. Shanks DR, St. John MF. Characteristics of dissociable human learning systems. Behav Brain Sci. 1994 Sep;17(3):367–395.
  170. Dienes Z, Perner J. A theory of implicit and explicit knowledge. Behav Brain Sci. 1999 Oct;22(5):735–808. pmid:11301570
  171. Haider H, Frensch PA. The generation of conscious awareness in an incidental learning situation. Psychol Res. 2005 Jun;69(5):399–411. pmid:15944861
  172. Karmiloff-Smith BA. Beyond modularity: A developmental perspective on cognitive science. Eur J Disord Commun. 1994 Jan 1;29(1):95–105.