Oral vocabulary training program for Spanish third-graders with low socio-economic status: A randomized controlled trial

Although the importance of vocabulary training in English speaking countries is well recognized and has been extensively studied, the same is not true for Spanish–few evidence based vocabulary studies for Spanish-speaking children have been reported. Here, two rich oral vocabulary training programs (definition and context), based on literature about vocabulary instruction for English-speaking children, were developed and applied in a sample of 100 Spanish elementary school third-graders recruited from areas of predominantly low socio-economic status (SES). Compared to an alternative read-aloud method which served as the control, both explicit methods were more effective in teaching word meanings when assessed immediately after the intervention. Nevertheless, five months later, only the definition group continued to demonstrate significant vocabulary knowledge gains. The definition method was more effective in specifically teaching children word meanings and, more broadly, in helping children organize and express knowledge of words. We recommend the explicit and rich vocabulary instruction as a means to fostering vocabulary knowledge in low SES children.


Introduction
Vocabulary knowledge is an important aspect in learning to read. In confirmation of this, high correlations between vocabulary and reading comprehension have been repeatedly reported in the literature for English-speaking children [1]. In particular, for at-risk children, such as children with low socio-economic status (SES) or learning difficulties, vocabulary deficits tend to be difficult to overcome and can accompany them throughout their entire academic career [2][3][4]. This can subsequently bring extra burdens to these students' already challenging educational path [5,6].
Studies with Spanish-speaking children report a similar pattern of results. According to a recent OECD report ( [7]; p. 216), around 12% of variance in reading performance of Spanish children is linked to socio-economic differences. Specifically in regard to vocabulary, it has PLOS ONE | https://doi.org/10.1371/journal.pone.0188157 November 29, 2017 1 / 25 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 been reported that differences in vocabulary levels between low and middle/high SES Spanish children are common and that this difference remains throughout the elementary school years [8]. Likewise, a study with Peruvian fourth-graders has shown that reading literacy was significantly correlated to children's SES and vocabulary knowledge levels [9]. Based on the strong body of evidence regarding the effects of vocabulary intervention for English-speaking children in the elementary school years, research summary reports (e.g., [10][11][12]) and books for educational practitioners (e.g., [13,14]) recommend that vocabulary should be taught by providing rich and varied language experiences, by explicitly teaching individual words and word-learning strategies, and by fostering word awareness. Additionally, the teaching methods used should promote active processing of the words and enable multiple encounters in order to boost learning outcomes.
Interestingly, these recommended teaching methods are still not common practice at the elementary school level in Spanish-speaking countries. Based on our observations in some schools, one procedure consists of giving teachers lists of words (e.g., [15]) that children should learn in each primary grade. As a consequence, vocabulary knowledge, although recognized by the teachers as an important skill, is sometimes treated as a component of reading comprehension that does not need specific instruction, and the teaching of vocabulary is confined to writing definitions of words (mostly pre-selected from the text book) after reading a text passage.
While there is a disconnect between research and practice in many countries (e.g., [16]), in the case of Spain, one of the reasons for this gap is the fact that educational policies are still not strongly guided by evidence. Some researchers in Spain, along with the Ministry of Education, have attempted to compile important findings related to reading research in the form of local and national reports [17][18][19]. However, most of these reports are largely based on results of studies with English-speaking populations. This can be problematic, as English and Spanish differ in many aspects, including in areas related to the transparency of the orthography [20], prosodic features [21,22], as well as differences in speech production [23] and the rate of learning to decode from print [20]. In fact, in the case of transparency, Share [24] has argued that English is an outlier orthography and that common models of reading developed using evidence gathered from English-speaking participants are "ill equipped to serve the interests of a universal science of reading" (p. 584). Consequently, it is plausible that teaching vocabulary in English and Spanish may require different strategies. Additionally, the language differences identified have been associated with tuitional practices (e.g., [25]). Thus, it is possible that practices in the classroom between Spain and English-speaking countries need to be adapted due to differences in the attitude and practices of parents related to supporting their children's literacy development at home. For these reasons, there is still a need to generate more evidence supporting theoretically-motivated models which are applicable to Spanish-speaking populations, taking into account environmental and language-specific differences.
In the area of vocabulary, the lack of theoretically motivated, comprehensive evidencebased training programs to foster vocabulary knowledge in Spanish-speaking children is quite apparent. To the best of our knowledge, there are only three published studies that have examined vocabulary training in Spanish-speaking children [26][27][28]. In the first study [26], Spanish pre-school aged children were taught just five words and all were from the same semantic category. In the second study with Spanish fifth-graders [27], although a larger set of training words were used, no random allocation of children to groups was undertaken. Instead, the training and control groups were composed of children from different schools. Thus, it is possible that uncontrolled differences between the two schools could partially explain the results reported. Finally, the third study with Chilean kindergarten children [28] does mention the use of a randomized assignment and a control group. Nevertheless, procedures and statistical analysis are poorly reported and, consequently, the results are hard to interpret.
In sum, although the basic underlying developmental and cognitive mechanisms of acquiring vocabulary may not differ between Spanish and English-speaking children, linguistic, cultural, school-related, and parental practice issues may influence the effectiveness of teaching methods. Thus, it cannot simply be assumed that a method which has been found to be effective in English will also have the equivalent effect in Spanish, and the efficacy of any method for improving vocabulary on Spanish-speaking children must be empirically determined in well-designed studies. Accordingly, the present work was designed to investigate the effects of two methods of oral vocabulary training in a sample of Spanish children. Such research would also have the potential to trigger discussions among educators in schools about more effective ways of fostering vocabulary development of elementary-school-aged children.
In English-speaking countries, in which the importance of fostering vocabulary using evidence-based teaching methods is more recognized, there are still researchers who argue that there are too many words to be taught explicitly (which is the recommended technique). This is mostly due to the assumed complexity of word knowledge [29] and the observed number of words learned along with the pace of word learning in the school years [30,31]. However, this does not necessarily mean that explicit teaching cannot operate as an additional learning channel alongside others, such as incidental learning [32], learning word meanings from context [33], wide reading [34], and reading aloud [35], all of which have also shown their value in supporting vocabulary development.
Moreover, in the case of at-risk children, comparative studies point to an advantage of explicit teaching of vocabulary for kindergarten children [36] as well as for elementary school children [37]. These authors argued that the encounters with words and text experienced by children with reading difficulties, low motivation to read, and poor language environment will not be equally productive or necessarily lead to vocabulary gain in comparison to average achievers. In accordance with this view, Perfetti [38] claims that children with comprehension difficulties will be able to learn fewer words during their reading experiences than children with well-developed reading comprehension skills. Therefore, in order to try to raise the level of at-risk children's vocabulary knowledge to that of average achievers, explicit and systematic vocabulary training at a young age is recommended [39].
The first step in order to develop effective and systematic vocabulary training is to try to understand and describe word knowledge and what it means to know a word. As a starting point, one could define vocabulary as the number of words a child possesses in his/her mental lexicon, no matter how superficial the knowledge is. This is known as breadth of vocabulary. However, a large amount of research undertaken in the area of vocabulary acquisition suggests that such a simple definition is not adequate (e.g., [29,[40][41][42][43]). A description of the qualitative aspects of word knowledge, such as its richness (e.g., polysemy, derivation), structure (semantic fields and links), and relation to other knowledge should also be considered. Collectively, these factors can be denoted as the depth of vocabulary knowledge. Importantly, it has been argued that vocabulary depth and the efficiency with which knowledge about words can be accessed have more influence on the higher-order processes in reading comprehension than does vocabulary breadth [44]. In accordance with this idea, the lexical quality hypothesis [38,43] suggests that high-quality mental representations of word knowledge are responsible for facilitating the access and integration of word knowledge when reading text. The quality of these mental representations of word knowledge would be determined by linguistic knowledge (phonological, orthographic, morphological, syntactic, and semantic) and knowledge stability (consistent and reliable retrieval), its synchrony of activation (integration of all knowledge components), and its fast access. In summary, although theories regarding vocabulary acquisition may vary in some aspects, they tend to agree that vocabulary knowledge is complex and develops incrementally, and reading comprehension is supported by both broad (quantity) and deep (richness) word knowledge.
In practical terms, this means that vocabulary intervention programs should be comprehensive and aim to not just increase the number of words that children recognize, but also to grow depth of word knowledge, that is, its quality, recall promptness, and correct use. One example of such a comprehensive evidence-based vocabulary intervention program for explicit teaching of vocabulary to elementary school children is the "rich vocabulary instruction" developed by Beck and colleagues [13]. This type of instruction is based on concepts of repeated exposure to words, deep processing of word meanings, and retrieval practices. Additionally, these authors stressed the importance of using "student-friendly" word definitions, that is, definitions specially modified to be focused and easily understood by the students, as well as being embedded in anchor sentences. This rich instruction has repeatedly shown positive effects on the knowledge of taught words [45,46], as well as transfer learning effects on control words and reading comprehension of passages containing the words [47][48][49]. Interestingly, a rich extended version of instruction which additionally included activities aimed at motivating children to use the taught words beyond the classroom, and so, indirectly fostering word awareness, was particularly useful in bringing about significant improvements in story comprehension. Word awareness is a term used in the field of vocabulary research to refer to metalinguistic knowledge of words [50], and is defined as awareness of, interest in, and appreciation of words [51]. It has been argued that the correlation between vocabulary and reading comprehension can be partially explained by metalinguistic awareness [52]. However, evidence about the effects of word awareness on vocabulary knowledge and reading comprehension in controlled settings is still scarce and more studies are needed to understand the mechanisms regarding how this metalinguistic knowledge specific to words affects word learning and reading comprehension [53].
Apart from comprehensive vocabulary training programs, various studies investigating the effects of single teaching methods for promoting vocabulary development have been of central importance to guide the development of effective and systematic vocabulary training for elementary school children.
Stahl and Fairbanks' meta-analysis [54] reported that teaching methods using definitional or contextual emphasis produced large effects and were similarly effective in teaching the meanings of target words and in using the words correctly in cloze-type sentences (sentences in which the target word is left blank for the student to complete; [55]). For comprehension of text passages containing the trained words, a small advantage for a combination of both definitional and contextual methods was found. In contrast, learning transfer effects captured by standardized measures of vocabulary and reading comprehension not containing the taught words were, in general, small.
Similar results were reported in the more recent meta-analysis carried out by Elleman and colleagues [53]. Of note, these authors found that vocabulary instruction had a larger impact on customized rather than on standardized measures of vocabulary and reading comprehension. Their conclusion was that the existing standardized measures of reading comprehension may not be sensitive enough to capture changes related to vocabulary training.
An important systematic review of vocabulary instruction was given in the influential National Reading Panel report [10]. Based on analysis of data trends, some of the recommendations for vocabulary teaching practices were: (a) teach vocabulary indirectly and directly; (b) provide repetition and multiple exposures to words; (c) provide rich contexts for words; (d) actively engage children in tasks; (e) combine teaching methods. Similarly, the relatively new research synthesis developed by the National Reading Technical Assistance Center [12] advocates for frequent exposure to words, explicit instruction, and engaging and interactive activities.
Nash and Snowling [56] directly compared definitional and contextual methods in a sample of 7 to 8-year-olds with poor vocabulary knowledge. The definition program consisted of reading aloud pre-determined, simplified dictionary definitions and asking children to think of a personal experience in which the word would fit. The context program consisted of presenting the word of the day in a short text passage. Children were asked to find words that "would give clues to the meaning of the new word" and write them down in a semantic map. At the end of the context program, children were also asked to think of a personal experience in which the words would fit. Both groups significantly improved their knowledge of the taught words. Nevertheless, only the context group maintained the gained word knowledge three months later. Nash and Snowling [56] concluded that the semantic representations built through the context method were more persistent, well-specified, and stable. Due to its potential to teach children how to independently find out the meaning of words beyond the teaching program, the contextual method was recommended over the definitional. The positive effects of contextually based vocabulary instruction for third-graders was also found by Nelson and Stage [57]. This training involving teaching multiple meanings of words using information embedded in a context brought about significant gains in reading comprehension, especially for children with low initial vocabulary knowledge.
In contrast, Jenkins and colleagues [58] found that a definitional method was more effective to teach children word meanings than a contextual method. In this study, the definition method was "richer," as in addition to the direct word definitions, it provided two examples for each target word in sentences. In this case, the provision of a "student-friendly context" to the words could have boosted the effects of the definition methodology. The conflicting results between Nash and Snowling [56] and Jenkins and colleagues [58] suggest that, despite these studies both looking at "definitional" and "contextual" methods, there were probably some important underlying differences in how these methods were implemented. Indeed, Beck and McKeown [59] point out the problem with attaching specific labels to training methods, given that under the same name very different teaching concepts can be found. Thus, a detailed examination of individual aspects under teaching methods is imperative in order to enable a more precise interpretation of the methodology's effect.
One common aspect that is present in the effective vocabulary interventions reported above is that they all include a high level of discussion about and around words. This is consistent with the idea that, as oral vocabulary knowledge develops before children start learning to read, oral language comprehension could form a base for the later development of reading comprehension [60]. This would mean that at the point of learning to read, a child's ability to derive word meanings from context would extend from oral to written language [34]. In an intervention study with children with reading comprehension difficulties, Clarke and colleagues [61] contrasted an oral language training, a text comprehension training, and a combined oral-text training. In the vocabulary component of the oral language training, graphic organizers (semantic maps), verbal reasoning tasks, mnemonics and illustrations to support the multiple-context learning approach were used. Results showed that the long-term gains achieved in reading comprehension were significantly higher for the children in the oral language training only group compared to both the text comprehension only group and the combined oral language and text comprehension training group. These authors concluded that improvement in children's oral vocabulary was the main mediator of the positive effects of the oral language training on reading comprehension.
According to Graves [31], listening to and speaking about words can be a great source for vocabulary learning at all ages. Nevertheless, reading instruction usually focuses on practicing to decode from print in the beginning elementary years (mostly high frequent and already known words), and only slowly moves towards strongly emphasizing comprehension of words and content from written text in later years [62]. Thus, a rich oral vocabulary training for children in the transition years, which would correspond to third-or fourth-grade in Spain, could potentially provide these children with extra support in boosting their word learning, and thus assisting children to better cope with the emerging higher reading comprehension demands.
Considering this background, the present study was designed to investigate the effectiveness of two oral vocabulary training programs especially developed for Spanish-speaking children. We had two main goals: (1) to determine if either of two different types of rich learning experiences would be effective at increasing vocabulary knowledge beyond what might be gained through classroom reading activities that did not include explicit vocabulary instruction and (2) to explore if learning transfer effects to items not taught and standardized tests not containing the taught words could be produced by indirectly fostering word awareness in the children. We hypothesized that these two programs based on rich instruction would be equally effective and promote significantly more vocabulary knowledge gains of the taught items in comparison to the control group. In relation to transfer and more broad effects of the trainings, for non-taught equivalent items, an advantage was expected for the context methodology, as this combined word awareness, encountering words in stories, and eliciting word relations. Further, if the effects of both definition and context methods on word awareness were sufficiently strong, both training groups would show statistically significant gains in the standardized measures of receptive and expressive vocabulary at both post-tests beyond that of the control group. Finally, in relation to the reading comprehension measure, again a possible advantage for the children in the context group was expected, as activities allowed extra experience in encountering and manipulating text passages and stories.

Participants
Three public schools located in low SES neighborhoods were selected based on recommendations from teaching professionals working in the region. Rather than selecting children based on some pre-determined set of criteria, we wanted the study to have high external validity that potentially better captures the reality in the classrooms as well as to adhere to inclusive educational practices [63]. For these reasons, no screening procedure was performed apart from the grade constraint.
The study was approved by the project's ethics committee. Signed participation forms were received from the principals of all three schools who participated in this study. Data was only collected from children who had returned signed parental consent forms. The final sample consisted of 100 third-graders (58 boys, 42 girls) from five third-grade classes with a mean age of eight years and two months (range 7;5-9;6) at the commencement of the study. Ninety-six children were native Spanish-speakers and the remaining four children were Spanish language learners (Arabic native speakers). Three children were receiving extra tutoring classes in specific subjects as part of the program "alumnado con necesidades específicas de apoyo educativo" [students with specific educational support need] and two children were attending the program "alumnado con necesidades educativas especiales" [students with special education needs].

Design
Children within classes were randomly assigned to one of the two training groups, or to the control group. Following this random allocation, due to the higher number of boys in some classes, some girls were randomly selected to be reassigned to ensure that both genders were represented in all groups. Also, for ethical reasons, we ensured that the four Spanish language learners were randomly allocated to one of the two experimental groups only (two children in the context and two in the definition group). Thus the final allocations were definition group (n = 33; 13 girls), context group (n = 34; 15 girls) and control group (n = 33; 14 girls). Assuming that the training methods were effective, we expected to find medium to large effects for knowledge of the taught words (see meta-analysis [53]). A priori power calculations using G Ã Power [64] indicated that the overall size of the sample and the well balanced nature of the design ensured sufficient statistical power was present for the planned group main comparisons and the expected effect size in the vocabulary knowledge of taught words (with power set to 0.8, f = 0.32, equivalent to η 2 of 0.09 and to Cohen's d of 0.64).
Children's evaluation assessments took place at the beginning of the school year just before the intervention started (pre-test), immediately after the intervention ended (post-test 1), and five months later at the end of the school year (post-test 2). Except for the reading comprehension test, which allows group testing, children were tested individually within the schools in multiple sessions no longer than 30 minutes each.
Evaluations were performed by nine research assistants. The recruited assistants were final year undergraduate students undertaking a "Teacher Education" university program and who had experience in teaching children at elementary school age. Training sessions for the evaluations were held for all research assistants together and it included written material with information about planning, schedules, research ethics, and the instruments' application rules and answer sheets.

Measures
The following measures were taken at all three time points: Receptive vocabulary. The Spanish version of the standardized Peabody Picture Vocabulary Test (PPVT-III; [65]) was used. In this test, the child selects one of four pictures to match a spoken word in meaning. Unlike in English, the Spanish version of the PPVT-III does not have two parallel forms, so children were tested at all time points with the same items.
Expressive vocabulary. The standardized Vocabulary subtest from the Spanish version of the Wechsler Intelligence Scale for Children IV (WISC-IV; [66]) was used. In this task the child was required to define orally a list of words.
Reading comprehension. The standardized multiple-choice test Comprensión Lectora de Complejidad Lingüística Progresiva [67] was used. The third-grade version with two parallel forms takes around 30 minutes and consists of 21 items organized in four main tasks that address comprehension at sentence and short text levels. The texts consist of a group of sentences connected by a common topic and characterized by simple grammatical structures and topics common to children's experiences at this age. The tasks involve interpreting the meaning of a sentence by marking another sentence that has an equivalent meaning, demonstrating the literal understanding of a short text passage by identifying the main characters and their actions, or showing inferential understanding of concepts not explicitly mentioned in the text by marking statements about the text as 'true' or 'false'. Form A and Form B were used to control for test-retest effects.
Knowledge of taught and control words. Consistent with the theoretical concept of incremental word knowledge [29,40,42], a self-report measure of vocabulary knowledge (VK) was used to estimate children's knowledge of taught and control words. Our VK task contained 30 words (15 words selected at random from the 60 taught words in the training methods, plus the 15 untaught control words). The final list of 30 words was the same for all children. Analogous to the vocabulary subtest from WISC-IV, the VK task consisted of asking children to explain orally the meaning of the words. Children's answers for each of the words were written down by the examiner and were later scored by two independent raters using a scale from zero to four points according to their correctness and quality ( Table 1). The correctness of the definitions was judged based on the entries of the target words in three age-appropriate pre-selected dictionaries. Inter-rater reliability at pre-test (κ = 0.79, p < .001), post-test 1 (κ = 0.73, p < .001), and post-test 2 (κ = 0.76, p < .001) pointed to an acceptable scoring classification system [68,69]. Also the criterion-related validity analyses of the instrument showed acceptable results, with moderate correlations at pre-test between the VK and the WISC-IV Vocabulary subtest, r = .58, p < .001, and between the VK and the PPVT-III, r = .58, p < .001 [70].

Selection of words
The words to be taught in the intervention, along with control words, were extracted from children's books adequate for the age group and selection was based on a number of criteria: type ("tier two" words, as defined by Beck and colleagues [13]), grammatical class (adjective, verb, and noun [8]), frequency (medium frequency [71]), productivity (number of derivate forms), and richness (number of definitions). Both productivity and richness were calculated based on the entries for the target words provided in three pre-selected dictionaries appropriate for elementary school children. From the final list of 75 words, 60 were randomly selected to be taught in the intervention (S1 Appendix) and the remaining 15 served as control words

Training sessions
Based on the principle of distributed practice [72], the intervention consisted of twenty sessions with three sessions per week over a seven week period plus a final closing session. In each session, except for the closing session, three words-one verb, one adjective and one noun-were to be taught in small groups of four to nine children (as children were randomly assigned to groups clustered in classes, group size varied according to class size). However, due to unforeseen changes to school schedules during the intervention phase, three training sessions had to be cancelled. The nine words that were planned to be taught on those days were transferred to the sessions that followed. This meant that in the first eight sessions the teaching plan of three words per session was followed. In the remaining nine sessions, four words per session were taught. For both the control and training groups, each session lasted 50 minutes. The small group sessions were held in separate rooms within the school. Thus, it was not possible for the children to observe or hear directly what was happening in a different intervention group.
For ethical reasons and because the implementation of a waiting list null control group was not viable in this project, children in the control group were offered an alternative intervention. Thus, the sessions for the control group consisted of reading aloud to the children combined with craft work related to the story being read. The reading aloud was based on the TWA approach (Think before, think While, think After reading), which has shown some benefit for English-speaking children [73,74] and is considered a traditional method found in Spanish books commonly used by elementary schools teachers (e.g., [75]). The story books used were the same books from which the taught and control words were originally selected. Thus, children in the control group were exposed to the same words as the children in the training groups, but the control group children did not receive any explicit teaching of word meanings. This is consistent with the research question of whether these methods would provide an advantage over the standard approach to reading instruction in Spanish classrooms.
Each session for the two training groups was divided into three parts: warm-up (part I), core program (part II), and recall game (part III). The activities in Parts I and III were identical for both groups. In Part I, the words of the day were introduced in a motivating and playful manner with a short activity lasting about ten minutes. The main goal was to get children involved, motivated, and curious about the words. Additionally, this first encounter was to give children the opportunity to acquire orthographic and phonological information from the visual and auditory inputs presented. Part III was based on the idea of retrieval practices [72]. It also lasted around ten minutes and consisted of recall games aimed at strengthening the recall and retrieval pathways for the newly learned words.
For Part II, a longer time slot of 30 minutes was reserved. According to McKeown and colleagues [49], choosing the most appropriate vocabulary teaching method depends on the specific instructional goals. As stated in the introduction, in this project we had two main objectives. The first was to enhance depth of vocabulary knowledge by providing rich learning experiences. To achieve this, the core intervention was based on several teaching principles, including repeated exposure to materials in various contexts [13], deep processing [76,77], modeling [78], and scaffolding [79,80]. The second goal was to explore learning transfer effects by indirectly fostering word awareness. It was hoped that the core activities would furthermore have the potential to motivate children to be curious about words, to enjoy playing with and investigating words, their usage, multidimensionality, nuances of meaning, and interrelatedness [13,31]. Additionally, to encourage the children to think about words outside of the intervention sessions, a few extra homework activities were included. Examples of these activities included, for example, asking children to find out what the longest word is in Spanish, to ask their parents what their favorite word was, to write down the first word they heard when they woke up on the following day.
As the focus of the intervention was on oral vocabulary, all activities required an oral response from the children. Specifically, children were asked to pay attention to information presented in written form on a paper or poster, or in a picture (visual), or to listen (auditory), then think, and lastly explain or tell something to the group (oral).
The definition method. This method involved the direct instruction of dictionary like definitions of words. The central idea was that words were presented and treated in isolation. The focus of the activities was the definitions of words themselves in the sense that children were taught what the components or characteristics of a "good" definition are. High-quality definitions were defined as the ones that are effective in helping others to understand the meaning of an unknown word. The main role of the trainer in this group was to call children's attention to the structure and components in word definitions, such as synonyms, antonyms, anchor sentences, and examples. Fig 1 shows a summary of the activities used in the definition training and their main goals.
The context method. In this method the taught words were always embedded in a short text or dialogue. The most important aspect was that the trainer should try not to give a direct and explicit dictionary-like definition of the words being taught at the onset of each session. Rather, children were encouraged to formulate their own verbal definitions based on the information encountered in the presented contexts, in the discussions with the trainer and their group mates, and also by taking advantage of their prior experiences and background knowledge. The main role of the trainer was to help and guide children in building and structuring their own word knowledge network, using their own words and personally relevant experiences. Thus, the activities were designed to explicitly foster the connection between incoming information and prior knowledge. For both training methods, in each session just one of the activities listed was used to teach the words of the day. The sequence of activities followed the order pictured in Figs 1 and 2, and this cycle was repeated until the end of the intervention.
The same nine research assistants who performed the pre-, post-, and follow-up evaluations (as evaluators) delivered also the seven-week training program (as trainers). To avoid contamination, each research assistant was randomly assigned to a single training method. Thus, although some taught more than one group of children, each assistant taught only one intervention condition (either control, definition, or context). Additionally, due to the fact that each research assistant served both as trainer and evaluator, we ensured that the research assistants did not evaluate the children they taught in the intervention sessions, in order to avoid bias at post-test.
Training sessions were held for the research assistants of each intervention method and control group separately to avoid cross-contamination. The training material included a short summary of the main ideas of the project, research ethics, general instructions for the intervention program, a plan of the activities including a description of each task (objective, duration, materials needed, procedure), as well as instructions on how to fill in session protocols. As part of the training, recruited instructors were additionally advised about strategies for improving behaviour in the classroom [81,82].

Implementation fidelity
Session protocols were completed by the trainers after each session as an additional control to ensure training method compliance (content fidelity). It included a description of what words were taught, what activities were performed during a particular session, how much time each of the activities took, and additional comments about the behavior of the children and other incidents or interruptions. As an additional measure of integrity, structured observation protocols were filled out by a trained third-party observer (quality of delivery). Due to resources and time constraints, only a randomly selected number of sessions were observed. Both these structured protocols as well as the protocols completed by the trainers were reviewed by the principal investigator each week to ensure adherence to the protocols and to promptly identify any potential problems.

Oral vocabulary training
In order to offer ongoing support to the trainers during the implementation of the intervention, meetings with all assistants were performed on a regular basis. Just as in the initial Oral vocabulary training training sessions, separate meetings were held for the assistants from each teaching method to avoid contamination. In these meetings, assistants could exchange experiences and discuss any problems encountered in the sessions in relation to using the materials and in dealing with children's behaviors.

Statistical analyses
The recommended method to assess the relative effectiveness of training studies is mixed design ANCOVA that takes into account pre-test variation between children [83]. Thus, for each set of results reported an ANCOVA analysis was performed with one between-subjects factor, Group (definition, context, control), and one within-subjects factor, Time (post-test 1, post-test 2), with pre-test scores entered as covariates. The ANCOVA paradigm was implemented as a linear mixed effect model using the lme4 package [84] in the R environment [85]. Consequently, the reported coefficients (bs) represent the estimate of the difference between two groups being compared. Exact p values cannot be calculated for these types of analyses and the significance of parameters must be assessed by inspecting the confidence intervals [86]. Confidence intervals which contain zero indicate a non-significant parameter-that is, the difference between the two values being compared is not significant. Finally, vocabulary knowledge of the taught and control words was analyzed at the individual item level, rather than at the subject level to simultaneously account for the crossed random effects of participants and items [87]. This technique also minimizes the impact of missing data. Accordingly, for the analyses involving the taught and control words the b values represent the difference in the mean VK rating scores, per item. For other analyses the b values represent the difference in subject means. In sum, in the main analyses that follow, what is being compared is whether there is a significant different between groups on each task, after controlling for pre-test differences.

Descriptive analyses
Preliminary analysis, including missing values analysis, showed that assumptions required for the statistical models used were met to an acceptable degree. Table 2 shows the mean score and range for all measures, broken down by time point and group. Omnibus ANOVAs confirmed that there were no significant differences between groups on any of the tasks at pretest. Of note, the proportion of children falling below the 50th percentile at pre-test was 64% for receptive vocabulary, 82% for expressive vocabulary and 72% for reading comprehension. This pattern of results is in line with previous results reporting a link between SES levels and literacy performance in Spain [7,8].

Implementation checks
Based on the implementation data collected, content fidelity was considered sufficient. Nevertheless, trainers reported difficulty in managing children's behavior. Thus, additional behavior management training was provided and extrinsic motivational strategies were introduced from session seven onwards in order to try to minimize the negative effects on learning outcomes. For example, a list of behavior ground rules was developed in collaboration with the trainers. Accordingly, children could collect stickers won after each session if they were compliant to the behavioral rules.

Training effects
A summary of the regression coefficients for all analyses can be found in Table 3  It should be noted that both the covariate of pre-test knowledge, as well as the interaction of pre-test knowledge with Time were significant in these analyses, indicating that children with higher pre-test scores improved more at both post-test 1 and post-test 2 compared to children with lower pre-test scores (and this is also true of all subsequent analyses). This result extends evidence of Matthew effects in reading beyond the oft-cited reports among English learners [88] and underlines the necessity of controlling for pre-test knowledge.
Turning now to potential transfer learning effects involving the control words, at post-test . Again, the only significant factor in determining later expressive vocabulary knowledge was the covariate pre-test vocabulary knowledge, and there were no differences between the two training methods either at post-test 1 or post-test 2.
Finally, with respect to reading comprehension, no effect of either training method was found, either at post-test 1 (b definiton = 0. 13 . As for the standardized measures of vocabulary, the only significant factor in determining later reading comprehension was the covariate pre-test reading comprehension. There were no differences between the two training methods either at post-test 1 or post-test 2.

Discussion
This study set out to test the efficacy of two rich oral vocabulary training methods in comparison to a read-aloud control group in a sample of third-grade Spanish speaking children from schools located in low SES neighborhoods. As expected, below average scores in the standardized measures of vocabulary and reading comprehension were found for children in this sample, and this is in accordance with the literature with English-and Spanish-speaking children [5,6,8].
The main results of the intervention, which corroborate the large body of evidence about vocabulary instruction for English-speaking elementary school children, confirm the high effectiveness of rich instruction [10,12,[45][46][47][48][49]53,54,56,58]. It is important to note that the children in the control group were incidentally exposed to the training words in the books that they read and this may have allowed some incidental learning of the training words to occur within this group [26,67,89]. Indeed, reading aloud has been shown in the past to be effective in improving vocabulary learning [90] due to children's ability to implicit learn words from context. In this sense, we effectively stacked the deck against ourselves, potentially making it more difficult to find statistically significant differences between the training groups and the control group. Nevertheless, using such a control group does have an advantage compared to using a null control group. By using a read-aloud activity for the control group, we were able to determine whether our training methods provided meaningful gains compared to the children's usual reading activities. Considering these results, we conclude that both definition and context methods of rich vocabulary instruction were more effective in teaching children the meaning of the target words when assessed at the end of the intervention in comparison to the simple exposure to the words that the control group received during their story reading sessions-that is, compared to activities children would likely undertake in reading classes.
Furthermore, five months after the intervention had terminated, children from the definition method still demonstrated a significant learning advantage over the control group. In contrast, the word knowledge advantage shown by the context group over the control group was no longer significant. This suggests that the positive effects of the contextual method boosted word learning only while it was being applied. At the same time, the results suggest that the definition method provided persistent improvement in word knowledge. This pattern of results is contrary to our expectations and those reported by Nash and Snowling [56].
One possible explanation for the long-term advantage of the definition group lies in the methodology itself and its adequacy for this age group. Developmentally speaking, the children who participated in the study were at an age where children in general are just starting to develop their metalinguistic abilities (around 8 years old; [91]). This is supported by informal observations in some activities at the beginning of the study-for example, even when children could correctly judge whether a definition was "good," or whether a word was used correctly in context, they nevertheless often struggled to express the reason why they thought so. Thus, reflecting about one's language choices and expressing word knowledge in the form of a general decontextualized definition appeared to be very challenging for these children at the onset of this study. Because the activities in the definition method were designed to clearly identify the relevant elements of a definition as well as to teach how to anchor and structure definitions, they provided children with additional support in organizing and expressing the word knowledge being acquired. In other words, in addition to accumulating new semantic knowledge, the children in this group were learning how to better express semantic knowledge in the form of a clearly structured definition by following an explicit model. In contrast, in the context group, although children were exposed to more words and stories compared to children in the definition group, the manner in which this knowledge was added to the already existing knowledge structures was less prescribed and less systematic. Consequently, these children had to rely more heavily on their own learning strategies for organizing the knowledge being presented.
Moreover, the way in which the activities were designed in the context condition meant that the success of this method was more dependent on the ability of the trainer in moderating the discussions and personal stories told by the children. As a result, even though children in the context group were able to express some of the attained word knowledge in the short-term, this knowledge may have been poorly anchored and was potentially attached to unstable structures that did not facilitate retention and accumulation of further word knowledge in the longrun. This suggests that for children with poor vocabulary knowledge or learning difficulties, a methodology that additionally guides word learning by providing a clearer model of word definition might be more suitable. A similar argument was made by Sternberg [92], in which an elaborated and rich pre-existing knowledge was said to facilitate further learning. It would, therefore, be plausible that a teaching method less dependent on children's own word learning strategies to accommodate knowledge to already existing (in this case potentially poor) knowledge structures would be more beneficial to children with low vocabulary knowledge. Consequently, the clear (pre-determined) structure and explicit models of student-friendly definitions offered and trained in the definition method used in this work would be more adequate to support these children's word learning processes.
In relation to the potential of the training methods to produce learning transfer effects to items not taught in the sessions, compared to the control group, only the children from the context group showed significant higher levels of knowledge for the control words immediately after the intervention. That said, the advantage for the definition group over the control group just failed to reach significance [95%CI -0.01, 0.39]. Given this confidence interval, a more practical interpretation of the definition result is that it too was more effective in improving word knowledge than the control method of mere exposure.
The success of the context method in demonstrating transfer effects is in accordance with the predictions: in addition to indirectly fostering word awareness, it was designed to elicit word relatedness and to allow children to encounter a larger number of words within dialogues and stories. This combination of effects would potentially increase the probability of acquiring knowledge about words not taught in the intervention. Nevertheless, we predicted an advantage for the context method over both the control and the definition group at posttest 1. However, no such advantage was found. Interestingly, five months later, a similar pattern to that seen with the taught words was found with the control words. Children in the definition group showed significantly higher levels of knowledge of words not taught in the intervention compared to the children in the control group, but the advantage shown by the context group had all but disappeared. According to the theory of spreading activation [93], improving representations of related concepts, as in the case of words not directly taught in an intervention, could be facilitated by a well-organized word knowledge. As stated previously, the definition methodology seems to have been more adequate in helping children to better structure and express their word knowledge-and crucially, both the gains in semantic knowledge and the ability to better express this knowledge appear to be persistent, long-term gains.
The long-term advantage of the definition method for the non-taught items could be interpreted in two additional ways. The first possibility refers to the effect of word awareness as a means of boosting word learning beyond the intervention sessions. If this method was effective in making children more curious about and attentive to words in general, it is plausible that children could improve their vocabulary knowledge of the control words as well as the taught words. If that were the case, statistically significant gains in the standardized measures of receptive and expressive vocabulary in children pertaining to the definition group would be expected. However, no such differences were found between the groups for either of the standardized vocabulary measures, suggesting that none of the methods had a significant impact on word awareness.
The second explanation for the long-term advantage of the definition method for the non-taught items involves the already mentioned general effect of the definition method in enabling children to express their word knowledge more precisely. If this were the case, statistically significant improvements would be expected in favor of the definition group in the WISC-IV vocabulary subtest, which similarly demands the ability of defining words orally. Yet, no differences were found in this measure. This raises the question about why improvements were found in the VK test of non-taught words, but none were found in the standardized test of expressive vocabulary, which also contained non-taught words. One possibility is that the VK test is more sensitive than the WISC-IV, both in terms of the items used and the scoring scale. In the first instance, all words in the VK test were age appropriate as they were taken from age-appropriate books. In contrast, the WISC-IV is designed for use with a wide age range (6 to 16 years old). Consequently, the first words in the WISC-IV (e.g., vaca [cow]) are probably too easy for the majority of children in this sample while the last ones (e.g. locuaz [loquacious]) are almost certainly too difficult. Consequently, these items would have very low power to discriminate within the sample, and in effect, a reduced number of items would be responsible for most of the variation found in the WISC scores. Accordingly, this would reduce the sensitivity of the test. The second factor to consider is the difference in scale between the two measures. The theory-based VK test employed a five-point scale and therefore would allow for the capture of smaller changes in word knowledge. In comparison, the WIS-C-IV vocabulary subtest uses a three-point scale (not known/more or less known/known).
Finally, for reading comprehension, no significant differences were found between any of the groups. This is in accordance with the literature, which has shown that vocabulary instruction has a larger impact on customized rather than on standardized measures of vocabulary and reading comprehension [10,36,53,54]. Although, some studies have reported increases in reading comprehension after vocabulary training [48,49], the texts used in these studies were conceived for the intervention and contained the taught words. In contrast, in this study none of the trained words appeared in the reading comprehension standardized tasks. In this sense, more general effects from vocabulary training to reading comprehension were targeted. The results suggest that the training methods were not robust enough in fostering children's word awareness to make a significant contribution to increasing performance in the reading comprehension measure. It should be noted that, while the hypotheses regarding the direct effects of the intervention were based on clear empirical evidence regarding the effect size [53], the hypothesis regarding the impact on reading comprehension (and indeed expressive and receptive vocabulary) were more theoretically based, and without clear empirical evidence for a specific effect size.
If we consider the theories about the relationship between vocabulary knowledge and reading comprehension, there are specific possibilities in which reading comprehension could be improved via a vocabulary training program. Firstly, if a direct relation is assumed, in the sense that knowing more words in a text would facilitate its comprehension, then improved comprehension could be achieved by teaching the specific words that come up in the tested texts (as it was the case in [48,49]; and in studies of text readability, e.g., [94]). Conversely, if the possibility of an indirect relationship between vocabulary and reading comprehension is assumed, the intervention would need a design that triggers not only an enrichment of vocabulary knowledge, but also a reorganization or restructuring of linguistic and metalinguistic information within the lexicon of the child. This could be accomplished in basically two ways. Either the intervention is designed to teach an enormous amount of words that would, in effect, accelerate or provide additional support of the natural process of learning words (as per developmental theories; [95]) or the specific teaching method must be thought to have the potential to trigger these restructuration processes even when fewer words are taught (as was the hope in this study). The study by Clarke and colleagues [61] did report significant improvements in standardized tests of reading comprehension by teaching the same number of words as were taught in this study. Nevertheless, apart from differences in sample characteristics (children with reading difficulties) and methodology (training length, intensity), in Clarke's study [61] the oral vocabulary training was only one component of a broader intervention for oral language abilities. One could argue that this intervention had the potential to promote only part of the effects triggered by the more comprehensive oral language intervention employed by Clarke and colleagues. Thus, the results of this study suggest that perhaps a small effect might exist, but this study did not have the power to detect it.
In summary, the results of this study demonstrate the effectiveness of explicit vocabulary instruction based on oral language activities using a sample of Spanish-speaking children from low SES. This is consistent with the works of Beck and colleagues [45,48,49], Nash and Snowling [56], Jenkins and colleagues [58], and Clarke and colleagues [61]. More specifically, the present results suggest that the rich oral vocabulary instruction based on the definition method was more effective to teach target word meanings. Moreover, children appeared to additionally profit from long-lasting and specific effects of the training in regard to structuring and expressing their word knowledge more precisely.

Limitations and future directions
An important issue to be considered when interpreting the overall results of the intervention is related to children's behavior. There were children in the sample who displayed disruptive behavior and some trainers reported that the strategies provided in their training sessions were not enough to create an optimal learning atmosphere. In response to this, additional behavior management strategies, based on extrinsic motivation, were introduced from the seventh session onwards in order to try to minimize the negative effects on the learning process. A description of cases would go beyond the scope of this work, but it is relevant to say that considering the numerous class disruptions experienced during the first half of the intervention, the positive effects found in this study are impressive, and are a reason to believe that children are highly skilled word learners when they are exposed to a rich language environment. Furthermore, even though behavioral issues were theoretically covered in the training provided to the evaluators, based on reviewing the protocols and from the weekly meetings, we decided to provide additional training and behavior management techniques to the trainers. When working with populations with behavioral problems, recruiting specialized and more experienced teachers is recommended. An alternative would be to implement more extensive training, along with a trial phase to allow trainers to get to know the children and gather initial experience using the behavioral strategies in the group before intervention officially starts.
A second limitation is that, due to the schools cancelling some sessions, the original plan to teach three words per session had to be modified to teach four words per session. This meant that less time could be spent teaching each of the words of the day, and children had less time and fewer opportunities to talk about each word. Especially in the case of the context method, it has been reported that this teaching technique would require more practice time to produce similar word learning effects compared to the definition method [58]. Potentially, this could have reduced the effectiveness of the two training methods for the second half of the study.
Despite the mentioned limitations, this study nevertheless fills an important gap in the literature, as to the best of our knowledge, it is the first evidence-based vocabulary training program undertaken with Spanish-speaking children which has used a randomized controlled design. Additionally, the inclusion of the five-month follow-up evaluation allowed us to assess the long-term efficacy of the two methods, and the importance of this was highlighted by the fact that the results changed from post-test 1 to post-test 2. Such data not only allow for more accurate cost-benefit estimations of potential interventions, but also enable a deeper understanding of the specific learning effects potentially triggered by the particular teaching techniques. The inclusion of the session protocols as a means of accessing implementation fidelity also allowed us to identify potential problems and take corrective action.
Further qualitative analyses of children's answers are planned. These should serve as basis for generating hypotheses for future exploration of the positive effects of the definition method in relation to fostering concept formation as well as a self-teaching strategy when learning new words or expressing word knowledge. The clear structure taught using this method could have the potential to help children when further learning new words independently, as they could possibly be trained to take special note of specific information and to develop word storage mechanisms that are supported by a pre-structure. The approach may be especially valuable for children with language comprehension difficulties and with below average vocabulary knowledge, as these children may benefit from scaffolding of more explicit teaching methods. From the perspective of teaching practices, this is also a more straightforward method and easier to apply for less experienced teachers, who might rely more strongly on instructions given in a manual.
Finally, various actions were planned and undertaken to facilitate the dissemination of these results to a wider, non-academic audience. These included oral presentations in the participating schools, written summaries for parents, and a vocabulary program manual in Spanish with detailed description of all activities in the training groups for elementary school teachers (a short summary in Spanish can be found in the supplementary materials; S3 Appendix). However, it is important to say that we are aware of the limitations of research and sample representativeness as well as the complexity and dynamics of school reality. Therefore, by no means it is a claim that the definition vocabulary program, as it was used in this work, is the best program for Spanish-speaking children at this age. In order to inform policy, stronger evidence based on larger scale studies and systematic mapping of vocabulary teaching practices in Spanish-speaking countries is needed. What can be said, based on the experiences gained implementing this project, is that the explicit rich vocabulary instruction can be recommended over one of the traditional methods practiced in Spanish schools. In addition, the definition program designed for this study can be used as a basis for further development of vocabulary intervention studies and for discussion with and among educators working with Spanish-speaking populations.