Risk Factors for Low Receptive Vocabulary Abilities in the Preschool and Early School Years in the Longitudinal Study of Australian Children

Receptive vocabulary development is a component of the human language system that emerges in the first year of life and is characterised by onward expansion throughout life. Beginning in infancy, children's receptive vocabulary knowledge builds the foundation for oral language and reading skills. The foundations for success at school are built early, hence the public health policy focus on reducing developmental inequalities before children start formal school. The underlying assumption is that children's development is stable, and therefore predictable, over time. This study investigated this assumption in relation to children's receptive vocabulary ability. We investigated the extent to which low receptive vocabulary ability at 4 years was associated with low receptive vocabulary ability at 8 years, and the predictive utility of a multivariate model that included child, maternal and family risk factors measured at 4 years. The study sample comprised 3,847 children from the first nationally representative Longitudinal Study of Australian Children (LSAC). Multivariate logistic regression was used to investigate risks for low receptive vocabulary ability from 4–8 years and sensitivity-specificity analysis was used to examine the predictive utility of the multivariate model. In the multivariate model, substantial risk factors for receptive vocabulary delay from 4–8 years, in order of descending magnitude, were low receptive vocabulary ability at 4 years, low maternal education, and low school readiness. Moderate risk factors, in order of descending magnitude, were low maternal parenting consistency, socio-economic area disadvantage, low temperamental persistence, and NESB status. The following risk factors were not significant: One or more siblings, low family income, not reading to the child, high maternal work hours, and Aboriginal or Torres Strait Islander ethnicity. 
The results of the sensitivity-specificity analysis showed that a well-fitted multivariate model featuring risks of substantive magnitude does not do particularly well in predicting low receptive vocabulary ability from 4–8 years.


Introduction
The initiation and expansion of language from childhood onwards is a universal and remarkable developmental accomplishment [1]. Receptive vocabulary is a central marker of language development as well as general ability and literacy. From onset in infancy, it develops rapidly in the preschool and school years, from around 200 words in the second year [2], to 20,000 words at 8 years [3]. Because vocabulary is also characterized by its continued growth throughout life, it has featured as a prominent and longstanding measure frequently used in omnibus scales of ability and intelligence [4,5]. To the extent that measures of vocabulary predict general ability and literacy they are also associated with onward academic competency and success [6].
Longitudinal studies have shown a consistent pattern of early emergence of disparities in language acquisition that persist over time [7] and have far reaching consequences for children's success at school and opportunities beyond school [8,9,10,11,12]. Because of the crucial role of oral language as a developmental means for literacy, education and employment, language acquisition is one of the major pathways that is seen to support human capability formation [13].
The rapid growth in receptive vocabulary in the preschool and school years invites a careful consideration of the underlying mechanisms that drive this growth. Necessarily, investigations of predictors of variation in the onset and growth of language abilities require large numbers of children and collection of data on a broad range of candidate predictors. The trade-off is that measures of language ability are generally narrow because direct behavioural assessment of children's language abilities is expensive. Such large-scale studies are rare in the field of language acquisition and largely designed to investigate a number of key developmental outcomes, not just language development. Therefore, in population level studies, measurement is typically restricted to one component of language [1,12,14] or one composite measure of general language ability [9,15]. This stands in contrast to smaller, 'purposive' studies which often measure specific aspects of language in great detail but do not necessarily capture growth or change within an ecological context.
A recent study of receptive vocabulary growth from 4-8 years showed substantial variation in receptive vocabulary ability at 4, 6 and 8 years and that predictor significance and strength also varied over time [1]. The study produced some counterintuitive effects: For example, factors associated with low receptive vocabulary ability at 4 years of age were maternal Non-English Speaking Background (NESB), low school readiness, child not read to at home, four or more siblings, low family income, low birthweight, low maternal education, maternal mental health distress, low maternal parenting consistency, and high child temperament reactivity. Yet, none of these factors were associated with an onward lower rate of growth from 4-8 years. Instead, the following factors were associated with a higher rate of receptive vocabulary growth: Maternal NESB, low school readiness and maternal mental health distress, although these higher rates of growth did not completely close the receptive vocabulary ability gap for children with and without these risks. The only risk factor associated with a lower onward rate of growth was socio-economic area disadvantage, which was actually not a risk for low receptive vocabulary ability at the intercept (i.e., 4 years) [1].
In a study that used data from the 1970 British Cohort Study, low receptive vocabulary at 5 years was a significant risk factor for low functional literacy in adulthood [12]. Five-year-olds with receptive vocabulary ability 1 SD below the mean (11.5% of the sample), were more than 3 times as likely to have low functional literacy in adulthood compared to 5-year-olds who met normative expectations for receptive vocabulary ability. The odds were substantially higher for children with receptive vocabulary ability 2 SD below the mean (3.9% of the sample). These children were almost 7 times as likely to have low functional literacy in adulthood, compared to children who met normative expectations for receptive vocabulary ability at 5 years. However, low receptive vocabulary ability at 5 years did not foretell low functional literacy attainment for most children. Sixty-eight percent of the 5-year-olds with low receptive vocabulary ability, defined as 1 SD below the mean, attained onward functional literacy. Once again, and somewhat counterintuitively, the percentage of children who attained functional literacy was higher for children who performed 2 SD below the mean than children who performed 1 SD below the mean. Eighty percent of children with low receptive vocabulary ability 2 SD below the mean attained functional literacy, compared to 68% of children with low receptive vocabulary ability 1 SD below the mean [12]. In a study that used data from the Children of the NLSY79 [14], race disparities in receptive vocabulary ability evident at 5 years persisted through to 13 years.
What remains unknown is the extent to which low language ability persists or improves beyond toddlerhood, and the extent to which other factors in the child's wider developmental environment are associated with this in the preschool and early school years. This study had two aims. First, to investigate the extent to which low receptive vocabulary at 4 years was associated with onward low receptive vocabulary at 8 years. Second, to estimate the predictive utility of a multivariate model that included child, maternal and family risk factors measured at 4 years toward the prediction of low language ability at 8 years. This study offers a rare assessment of a comprehensive set of risk factors for the persistence of low receptive vocabulary in kindergarten- and school-age children in a representative population level sample.

Ethics Statement
The Longitudinal Study of Australian Children (LSAC) is conducted in a partnership between the Australian Government Department of Social Services (DSS), the Australian Institute of Family Studies (AIFS) and the Australian Bureau of Statistics (ABS). The study has ethics approval from the Australian Institute of Family Studies Ethics Committee. The Ethics Committee is registered with the Australian Health Ethics Committee, a subcommittee of the National Health and Medical Research Council (NHMRC). Caregivers gave written informed consent to the survey. As the study children were all minors at the time these data were collected, written informed consent was obtained from the caregiver on behalf of each of the study children. The signed consent forms are retained by the field agency (ABS).

Access and Use of LSAC Data
Confidentialised LSAC data are publicly available. Researchers can apply to the Australian Government Department of Social Services for permission to access and use Longitudinal Study of Australian Children (LSAC) data (DSS website. Available: http://www.dss.gov.au/our-responsibilities/families-and-children/programs-services/growing-up-in-australia-the-longitudinal-study-of-australian-children-lsac. Accessed 2014 Feb 7Aug 2).

Study Design
The study sample comprised 3,847 children who participated in the Longitudinal Study of Australian Children (LSAC). The LSAC is a national longitudinal study that commenced in 2004. Data collection for the study is led by a consortium of expert Australian researchers [16]. Guided by a bioecological model of child development [17], data are collected on child, parental, family, community and school characteristics that influence children's development at different ages (i.e., a developmental pathways approach) [18,19]. The measurement framework is comparable to indicator frameworks used internationally [20]. Indicator frameworks group variables that influence child development into key domains, for example time, income, human capital, psychological capital and social capital [21]. This has produced a comprehensive (and expanding) set of independent variables that researchers can select from to model in relation to specific developmental outcomes, which, in this study, is language development.
The study uses a cross-sequential design of biennial face-to-face visits with the family and study child. In this study we used data from the child cohort collected at 4 and 8 years. The child cohort comprised 4,983 children at 4 years and 4,331 children at 8 years. Response rates to the outcome measure, the Adapted Peabody Picture Vocabulary Test-III (PPVT), and the candidate predictors varied; 4,406 children completed the PPVT at 4 years, 4,273 completed the PPVT at 8 years; and 3,847 children completed the PPVT at 4 and 8 years.
The LSAC sampling frame was extracted from the Medicare Australia enrolment database, which was validated to ensure coverage of Australian children within the target age-range. The initial study sample was designed to be representative of Australian children within the selected age cohort, proportional to the regional distribution of children in the Australian population. An initial sample size of 5,000 was chosen to ensure there would still be a sufficient sample for detailed analysis after attrition over the number of years of the longitudinal study.
The study entailed a two-stage clustered design, first selecting postcodes then children within postcodes. Stratification was used to ensure proportional geographic representation for states/territories and capital city statistical division/rest of state areas. Cluster sampling was utilised because it provides a cost-effective way to conduct face-to-face interviews, as well as an opportunity to collect and analyse community-level effects. Postcodes were selected with probability proportional to size selection where possible, and with equal probability for small population postcodes. Children were selected from 311 postcodes [22,23].
Analyses show that the initial sample was broadly representative of the general Australian population when compared with 2001 Census data, but slightly under-representative of families who were single-parent, non-English speaking and living in rental properties. Attrition somewhat increased these biases. For example, the overall attrition rate between ages 4 and 8 was 13%, but children with mothers classified as Non-English speaking background decreased from 15.7% at age 4 to 13.8% at age 8, an attrition rate of 23%. The proportion of mothers who had a year 11 or less education decreased from 39.2% at age 4 to 36.5% at age 8, an attrition rate of 19% [24].
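The subgroup attrition rates quoted above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that the reported percentages are shares of the full child cohort at each wave (4,983 children at 4 years and 4,331 at 8 years, as given under Study Design), and the function name is hypothetical. Because the published percentages are rounded, the results agree with the reported rates only approximately.

```python
def subgroup_attrition(n_start, share_start, n_end, share_end):
    """Percentage of a subgroup lost between two waves.

    Assumes each share is a proportion of the whole cohort at that wave.
    """
    members_start = n_start * share_start
    members_end = n_end * share_end
    return 100 * (1 - members_end / members_start)


# Cohort sizes taken from the Study Design section.
N_AGE4, N_AGE8 = 4983, 4331

overall = 100 * (1 - N_AGE8 / N_AGE4)
nesb = subgroup_attrition(N_AGE4, 0.157, N_AGE8, 0.138)
low_education = subgroup_attrition(N_AGE4, 0.392, N_AGE8, 0.365)

print(f"overall: {overall:.1f}%")
print(f"maternal NESB: {nesb:.1f}%")
print(f"year 11 or less: {low_education:.1f}%")
```

Under these assumptions the overall and maternal education figures round to the reported 13% and 19%, and the NESB figure falls between 23% and 24%.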

Measures
The selection of measures used in this paper is guided by the LSAC ethos, empirical and/or theoretical associations with language development, and our previous work in the area.
The LSAC was funded as part of the (then) Department of Family and Community Services Stronger Families and Communities Strategy, which aimed to establish partnerships to strengthen families and communities and develop and deliver solutions at a local level [25]. The survey was designed with an explicit aim of use by a range of Commonwealth and State and Territory departments, and the research community. The variables selected for inclusion in the LSAC were based upon known risk factors from established longitudinal studies and a consideration of those variables which could meaningfully impact policy interventions [25].
Our variable selection in this paper is guided by our previous work in this area [1]. The study conceptual model [17,26] posited multiple domains of influence on child development. Among these domains are characteristics related to the child, the mother, and the family home environment. Variables in this predictor set met one of the following two criteria: (1) Evidence of an independent association with English language abilities in a representative population level sample of preschool and school age children; or (2) conceptual relevance to language abilities, in the absence of empirical evidence. Many of the measures are benchmarked against Australian census collections while others are referenced to large scale Australian and international child development studies.
Our previous work, using a multi-level modelling approach to receptive vocabulary growth, revealed some counter-intuitive findings [1]. In the multivariate model, risks for receptive vocabulary delay at 4 years, in order of magnitude, were: Maternal Non-English Speaking Background (NESB), low school readiness, child not read to at home, four or more siblings, low family income, low birthweight, low maternal education, maternal mental health distress, low maternal parenting consistency, and high child temperament reactivity. None of these risks were associated with a lower rate of growth from 4-8 years. Instead, maternal NESB, low school readiness and maternal mental health distress were associated with a higher rate of growth, although not sufficient to close the receptive vocabulary gap for children with and without these risks at 8 years. Socio-economic area disadvantage was not a risk for low receptive vocabulary ability at 4 years but was the only risk associated with a lower rate of growth in receptive vocabulary ability. These variables were detailed extensively in that paper. The majority of these are single-item measures.
Response variable. Our response variable is a measure of one of the major component systems of language development: the semantic system. Children were assessed with the Adapted Peabody Picture Vocabulary Test-III (PPVT), a test of receptive vocabulary designed for the LSAC study [27]. The Adapted PPVT-III is a shortened version of the PPVT-III [28]. The Adapted PPVT-III was developed and validated specifically for Australian children at 4, 6 and 8 years, and the technical properties are described in detail by Rothman [27,29]. The Pearson product-moment correlation between the full PPVT-III and the Adapted PPVT-III was 0.93 for all children. After administration to the LSAC children, a one-parameter (Rasch) item response model was fitted to the data, which consisted of correct and incorrect responses; this gave the Adapted PPVT-III a person-separation reliability of 0.76 [27,29].
The Adapted PPVT-III was administered directly to each child during the home interview. For each word presented, the child was shown a card containing four pictures and was asked to point to the picture corresponding to the word (e.g., "Show me wrapping"). Scaled scores for the Adapted PPVT-III were used in all analyses. Table 1 contains the median ages and age ranges in months for the study children along with the mean Adapted PPVT-III Rasch scaled score with SDs and associated ranges.
Age standardizing. In this study we wanted to examine whether children's receptive vocabulary, relative to their peers, at age 4 (51-67 months) is predictive of their receptive vocabulary, relative to their peers, at age 8 (95-119 months). Our previous work [1] established the relationship between child age and receptive vocabulary (PPVT), as illustrated in Figure 1. A linear regression model estimated a slope of 0.35 at age 4; that is, for every month older the study children get, we expect an increase of 0.35 in their PPVT scores. Therefore, we decided to adjust for child age, by comparing children's receptive vocabulary to their peers closest to them in age.
To do this we created age-groups of approximately equal size within the 4-year and 8-year age-cohorts (see Table 2). We then created PPVT z-scores where children were ranked relative to their peers within each of these age groups. Once this was completed we re-assembled the PPVT z-score rankings for each of the age 4-year and age 8-year cohorts. Thus, a child at 51 months of age who is -2 standard deviations on the PPVT measure compared to their age peers can be compared with a child of 59 months who is also -2 standard deviations on the PPVT measure. A similar method was used by Sanson et al. [30] in establishing outcome measures for the LSAC.
Finally, as we were interested in the risks of low receptive vocabulary, we defined the children in each of the age 4 and age 8 cohorts as having low receptive vocabulary where they fell at or below the lowest 15th percentile in PPVT performance. This enabled us to ask whether, relative to their peers, having low receptive vocabulary at 4 years is predictive of their receptive vocabulary at 8 years.
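The age-standardising steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure as implemented for the study: the age-band boundaries, the example scores, the function names and the nearest-rank percentile rule are all hypothetical, and the real age groups are those defined in Table 2.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def age_band(age_months, upper_bounds):
    """Index of the first band whose inclusive upper bound covers the age."""
    for index, upper in enumerate(upper_bounds):
        if age_months <= upper:
            return index
    raise ValueError(f"age {age_months} is above the last band")


def standardise_within_bands(children, upper_bounds):
    """Add a z-score computed relative to peers in the same age band."""
    by_band = defaultdict(list)
    for child in children:
        by_band[age_band(child["age_months"], upper_bounds)].append(child)
    scored = []
    for members in by_band.values():
        scores = [c["ppvt"] for c in members]
        band_mean, band_sd = mean(scores), stdev(scores)
        scored.extend({**c, "z": (c["ppvt"] - band_mean) / band_sd} for c in members)
    return scored


def flag_low(scored, percentile=15):
    """Flag children at or below the given percentile of the pooled z-scores."""
    zs = sorted(c["z"] for c in scored)
    # Nearest-rank rule (an assumption): smallest z with at least
    # `percentile` percent of the cohort at or below it.
    rank = max(1, math.ceil(percentile / 100 * len(zs)))
    cutoff = zs[rank - 1]
    return [{**c, "low": c["z"] <= cutoff} for c in scored]


# Hypothetical children and hypothetical band boundaries (months).
children = [
    {"id": 1, "age_months": 52, "ppvt": 58.0},
    {"id": 2, "age_months": 53, "ppvt": 64.0},
    {"id": 3, "age_months": 54, "ppvt": 70.0},
    {"id": 4, "age_months": 60, "ppvt": 62.0},
    {"id": 5, "age_months": 61, "ppvt": 68.0},
    {"id": 6, "age_months": 62, "ppvt": 74.0},
]
upper_bounds = [56, 67]

flagged = flag_low(standardise_within_bands(children, upper_bounds))
for child in flagged:
    print(child["id"], round(child["z"], 2), child["low"])
```

In this toy data the older band scores higher on average, so children 1 and 4 have different raw scores but the same z-score relative to their age peers, which is the comparison the age adjustment is designed to allow.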
Candidate predictors. A total of 29 candidate predictor variables were used. These were grouped into child, maternal, and family and home environment characteristics. All predictor variables were measured when the child was 4 years of age. Predictors were modelled as risk variables with the lowest level of risk as the reference category (see Table 3). For each variable, the percentage of the sample in each of the reference categories, along with the percentage of the sample in each of the other categories are shown in Table 3. For example, in examining birthweight we used children of 'normal' birthweight as our reference group. For school readiness, we used children in the quintile with the highest Who Am I? scores as our reference group. The analytic sample for each candidate predictor varied somewhat, depending on item completeness.
The child characteristics in our models were: Gender, ethnicity, birth weight, ear infections, school readiness and temperament.
There were equal proportions of girls and boys in the sample. A small proportion of children (n = 121; 2.8%) were of Aboriginal and/or Torres Strait Islander descent and were coded to distinguish them from those who were not. Primary carers were asked to report their child's birth weight, which was subsequently coded into those children who were born with low birth weight (<2500 grams; 6.3%) and those at or above this weight (≥2500 grams). A single item indicator of ongoing ear infections at 4 years was included.
In addition to the Adapted PPVT-III, each study child was directly assessed at 4 years using the Who Am I? [31]. This is a measure of school readiness and comprises 11 items in which children write their names, copy shapes and write words and numbers. It has been extensively calibrated for use in the LSAC and has well demonstrated item characteristics, high internal reliability (0.89), and excellent distributional properties [29]. Rothman [29] reported a correlation between the Adapted PPVT-III and the Who Am I? of 0.31 for the LSAC children at age 4. In this report, study children have been grouped into quintiles of performance based on the total Who Am I? score with high quintiles representing higher levels of performance.
Child temperament was measured at 4 years with the Short Temperament Scale for Children (STSC) [32]. The STSC measures three dimensions of temperament: persistence, reactivity and sociability. Each temperament dimension was assessed through parent report using four items, rating the frequency of the behaviours on a 6-point Likert scale of occurrence from "almost never" to "almost always". Where data were missing for any of the items making up a dimension of temperament, respondents were coded as missing for that variable. Three composites were constructed based on the respective items and each was then divided into quintiles with higher quintiles representing the positive aspects of each dimension.
The maternal characteristics in our models were: Age at the birth of the child, problematic alcohol use, smoking, mental health distress, education, hours of paid employment and parenting. The biological mother's age at the birth of the child was grouped into categories representing teen birth (<20 years), 20-39 years and 40 or more years at birth, with the vast majority of mothers (94.5%) of study children in the age range 20-39 years.
Information on current tobacco and alcohol use was gathered from the mothers. We defined problematic alcohol use as reported daily alcohol consumption exceeding 2 standard drinks and/or frequent binge drinking of 5 or more alcoholic drinks at least 2-3 times per month; 12.0% of mothers were so classified. Study children's mothers were also asked about tobacco use and categorised as either current smokers (21.8%) or not current smokers.
In this study, we used the Kessler-6 (K-6) scale to measure maternal non-specific psychological distress. Women with scores of 8 or more were classified as having symptomatic psychological distress. This threshold is consistent with other studies using the K-6 [33,34]. Recent Australian findings indicate that the scale has robust characteristics as an indicator of mental health [35]. Fifteen per cent of mothers reported symptomatic psychological distress.
In Australia, at the time of this study, 10 years of education was compulsorily mandated. Maternal education in years was grouped into three levels according to those who had completed 11 years (36.4%), 12 years (32.9%), and those who had completed more than 12 years (i.e. University education) (30.7%).
Mothers were variously employed at the time when the children were first measured. We used total hours of paid maternal employment to distinguish mothers who were not in paid employment (0 hours), from those in part time paid employment (1-37 hours; Mean = 17.8) and in full time paid employment (≥38 hours; Mean = 44.9). Similar proportions of women were either not in paid employment (41.0%) or working part time (45.5%) with the remainder (13.5%) working full time.
The parenting characteristics of both parents were measured in a self-complete form, using four measures of parenting warmth, hostility, consistency and inductive reasoning developed for the LSAC [36]. We use the mother's responses in this report. Responses to each item were on a 5-point Likert scale, ranging from "almost never" to "always/almost always". Items for each measure were summed to create a composite score with higher levels representing more positive parenting characteristics. Item and scale properties for the LSAC parenting measures have been extensively documented [37]. Ordinal scale reliabilities [38] were 0.72 for maternal hostility, 0.82 for consistency and 0.93 and 0.94 for warmth and inductive reasoning respectively.
The characteristics of the family home environment in our models were: NESB, family structure, number of siblings, income, health care card, financial hardship, socioeconomic disadvantage, reading to the study child, playgroup and child care. As the focus of this study is explicitly on English language development and because language development is known to vary where more than one language is spoken in the home, we used the mother's Non-English Speaking Background (NESB) as a general indicator for language other than English spoken in the household at 4 years. About 14% of mothers were predominately non-English speaking at the time of the interview.
With respect to family composition, two variables were selected as candidate predictors of vocabulary development: Family structure (sole parent vs. other) and number of siblings (0, 1, 2, 3, 4+). About 11.5% of the study children were living in single mother families at 4 years. The majority had one or two siblings (49.5% and 27.4% respectively) or were singletons (10.9%) at age 4. The study questionnaire did not permit the establishment of birth order.
Families were asked to report their total weekly family income from all sources. Responses were partitioned into relatively equal quintiles: those families earning under $600, $600-$999, $1000-$1499, $1500-$1999, and $2000 or more per week. In Australia, where income falls below a defined threshold and/or certain hardship criteria are met, families also qualify for a health care card. About 19.7% of LSAC families had a health care card and this is used as an indicator of financial need in the LSAC families. Additionally, an indicator of family hardship was derived where families reported that, due to a shortage of money over the last 12 months: they had not been able to pay gas, electricity or telephone bills on time; they had not been able to pay the mortgage or rent on time; adults or children had gone without meals; they had been unable to heat or cool their home; they had pawned or sold something; or they had sought assistance from a welfare or community organisation. About thirty per cent of families reported at least one of these occurrences in the previous 12 months at 4 years.
An area measure of socioeconomic disadvantage was also estimated for each participating family. The family home was coded with Socio-Economic Indexes for Areas (SEIFA) disadvantage, indexed in quintiles; lower quintiles represent greater levels of disadvantage. The neighbourhood SEIFA disadvantage index summarizes information from the Australian Census of Population and Housing as this relates to economic and social disadvantage in small areas, such as low income, low educational attainment and high unemployment [39]. These data were linked at the Statistical Local Area (SLA) level or, where this was not available, the child's postcode.
Several indicators of the child's learning environment were gathered. The frequency with which the primary caregiver read to the study child was assessed via face-to-face interview. A total of 129 (3.0%) parents reported not reading to the child at all, 802 (18.8%) reported reading 1 or 2 days a week, 1,255 (29.4%) reported reading to the child 3-5 days a week, and 2,085 (48.8%) reported reading to the child daily. Mothers were asked if their child had attended a playgroup in the 12 months prior to the 4-year interview, with about one third indicating this to be the case. Finally, hours per week in child care or early education were identified by asking the primary carer "how many hours a week on average does the child go to school, kindergarten, pre-school, and/or day-care?". A total of 258 (6.3%) attended 8 or less hours a week, 2,705 (65.8%) attended 9-20 hours a week, 818 (19.9%) attended 21-30 hours a week, and 332 (8.1%) attended 31 or more hours a week. Table 4 shows the child, maternal and family home environment variables measured at age 4 and the percentage of the sample, at different levels of risk in the low PPVT group versus the

Data Analysis
The data were analysed using logistic regression in SAS 9.3 [40]. The surveylogistic procedure was used to account for the complex survey design of the LSAC. Logistic regression was used to estimate the odds of children being in the low group of PPVT (lowest 15% in each age-group) at wave 3 of the LSAC (age 7-9 years).
Our analysis proceeded in four steps. First, we used logistic regression to estimate the association between each of our candidate predictors and PPVT group (i.e., low vs. middle-high) at 8 years. Second, we grouped candidate predictors by their estimated effect size. Third, we established a multivariate logistic regression model. Finally, we undertook a sensitivity-specificity analysis to examine the predictive utility of our multivariate model.
To establish the effect sizes for the various predictors we used the odds ratios as estimated by logistic regression. For logistic regression, the odds ratio itself represents the effect size of interest [41]. Although some schemes exist denoting a correspondence between odds ratios and a substantive effect size (e.g. [41,42]), the judgement of what denotes a substantive effect size rests with the researcher, and must be considered within the context of the field of study [43]. For this paper we established an odds ratio of 2 or greater as our cut-off for a moderate effect size. Table 5 contains the odds ratios for low receptive vocabulary at 8 years (bottom 15%), based on the candidate predictors of low performance at 4 years.
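For a single binary risk factor, the unadjusted odds ratio from an unweighted logistic regression equals the cross-product ratio of the corresponding 2x2 table, so the screening rule can be illustrated without fitting a model. The sketch below is a simplification: it ignores the survey weights and clustering handled by the surveylogistic procedure, it uses a Wald 95% confidence interval, and the counts and function names are hypothetical rather than LSAC data.

```python
import math


def odds_ratio_with_ci(risk_low, risk_not_low, ref_low, ref_not_low, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    risk_*: children with the risk factor; ref_*: reference category.
    *_low / *_not_low: counts with and without low PPVT at 8 years.
    """
    estimate = (risk_low * ref_not_low) / (risk_not_low * ref_low)
    # Standard error of the log odds ratio.
    se_log = math.sqrt(
        1 / risk_low + 1 / risk_not_low + 1 / ref_low + 1 / ref_not_low
    )
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


def meets_cutoff(estimate, cutoff=2.0):
    """Apply the paper's criterion of an odds ratio of 2 or greater."""
    return estimate >= cutoff


# Hypothetical counts, for illustration only.
estimate, lower, upper = odds_ratio_with_ci(60, 140, 90, 710)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print("meets cut-off:", meets_cutoff(estimate))
```

A predictor passing this screen would be carried into the multivariate model, where its adjusted odds ratio may fall below the cut-off once the other predictors are taken into account.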
The remaining eight candidate predictors of low PPVT at age 8 were not statistically significant in the bivariate model: male gender, low birthweight, ear infections, teenage mother status, low maternal inductive reasoning, low maternal warmth, not attending playgroup, and daily use of non-parental child care.

Multivariate Analysis
The twelve predictors meeting the initial cut-off criterion of an odds ratio of 2.00 or higher in the unadjusted analyses were then included in the multivariate model (see Table 6). In the multivariate model, low PPVT at 4 years, low maternal education and low school readiness showed moderate or higher adjusted effects on the odds of low PPVT at 8 years. The odds of low PPVT at age 8 were 3.49 times higher for children who had low PPVT at age 4 relative to children who had middle-high PPVT at age 4; 2.55 times higher for children whose mothers had only completed year 11 of Australian Schooling relative to children of University educated mothers; and 2.50 times higher for children in the lowest quintile of school readiness at age 4 relative to those in the highest quintile of school readiness.
The following risk factors had adjusted effects on low PPVT at 8 years below our criterion of moderate effect size (2.0): Low maternal parenting consistency, socio-economic area disadvantage, low temperamental persistence, and NESB status. The odds of low PPVT at age 8 were 1.89 times higher for children of mothers in the lowest quintile for maternal parenting consistency relative to those children of mothers in the highest quintile of parenting consistency; 1.78 times higher for children living in the most disadvantaged neighbourhoods relative to those children in the least disadvantaged neighbourhoods; 1.65 times higher for children who were in the lowest quintile of temperamental persistence relative to those in the highest quintile; and 1.59 times higher for children of mothers who spoke a language other than English relative to those children whose mother spoke English.
The following risk factors were not statistically significant in the multivariate model: Aboriginal and/or Torres Strait Islander status, high maternal work hours, low family income, child not read to at home, and one or more siblings.
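An odds ratio multiplies the odds, not the probability, of an outcome, so "3.49 times higher odds" does not mean "3.49 times as likely". The following minimal sketch applies the adjusted odds ratios reported above to a reference-group risk. The 10% baseline is a hypothetical value chosen for illustration and is not an estimate from the study, and the function name is our own.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical reference-group risk of low PPVT at age 8 (assumption, not study data).
BASELINE_PROB = 0.10

# Adjusted odds ratios as reported in the text for the multivariate model.
ADJUSTED_ORS = {
    "low PPVT at 4 years": 3.49,
    "low maternal education": 2.55,
    "low school readiness": 2.50,
    "low maternal parenting consistency": 1.89,
    "socio-economic area disadvantage": 1.78,
    "low temperamental persistence": 1.65,
    "NESB status": 1.59,
}

for name, odds_ratio in ADJUSTED_ORS.items():
    prob = apply_odds_ratio(BASELINE_PROB, odds_ratio)
    print(f"{name}: OR {odds_ratio:.2f} -> implied probability {prob:.1%}")
```

Under this assumed baseline, each implied probability is smaller than the baseline multiplied by the odds ratio, because the conversion from odds back to probability is non-linear. Each adjusted odds ratio also holds the other predictors in the model constant.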
Sensitivity-Specificity Analysis. Odds ratios and statistically significant associations do not necessarily lead to clear predictive relationships. Inevitably, researchers and practitioners turn their attention to asking how our knowledge of the associations demonstrated here might affect decisions about screening children for low language ability, predicting the persistence of this status, and offering interventions. Our intention in this section is not to produce a practical algorithm for screening children for low language status per se. That would be premature. What we do want to do, though, is widen the developmental description beyond just the empirical documentation of language growth and its associations. This entails assessing the implications of the emerging evidence of high intra-individual variability in the growth of language, as measured here, and of the relatively low level of association between language measures and important predictors, and what these might portend for those interested in early detection and intervention. To do this, we include analyses of the predictive utility of these estimates.
We have provided the odds ratios for each predictor measure in Tables 5 and 6. The predictive utility of a variable in a logistic regression may be assessed not only from the odds ratio, but also by estimating the area under the Receiver Operating Characteristic (ROC) curve, and by calculating sensitivity, specificity, positive predictive value and negative predictive value [44]. At the outset we would note that the single best predictor of low receptive vocabulary at age 8 was low receptive vocabulary at age 4 (unadjusted OR 5.32). Using just low receptive vocabulary at age 4 as a single predictor, we calculated an area under the ROC curve of 0.62. This may be interpreted as the probability that a randomly chosen child with low PPVT at age 8 will be rated as at higher risk than a randomly chosen child who does not have low PPVT at age 8 (that is, it is a measure of how well the model discriminates cases from non-cases) [45].
In contrast, the multivariate model estimates the probability that each study child has low PPVT at age 8, based on all of the selected candidate predictors at age 4. The multivariate model had an area under the ROC curve of 0.75, which shows that adding other candidate predictors to age 4 PPVT performance improves discriminative utility; in other words, including additional predictors has improved how well the model discriminates cases from non-cases. However, in order to evaluate predictive utility the researcher needs to establish a cut-point at which a child is classified as a case or non-case (that is, at what estimated probability we predict a child will have low PPVT at age 8) [46,47].
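The pairwise interpretation of the area under the ROC curve can be made concrete with a short sketch. The function below counts, over every (case, non-case) pair, how often the case receives the higher predicted probability, with ties counted as half. The scores are invented for illustration and are not LSAC data, and the function name is hypothetical.

```python
from itertools import product


def auc_pairwise(case_scores, noncase_scores):
    """Share of (case, non-case) pairs in which the case has the higher score.

    Ties count as half a win, which matters when the predictor is dichotomous.
    """
    wins = 0.0
    for case, noncase in product(case_scores, noncase_scores):
        if case > noncase:
            wins += 1.0
        elif case == noncase:
            wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical predicted probabilities of low PPVT at age 8 (illustration only).
cases = [0.30, 0.22, 0.15, 0.09]            # children who did have low PPVT
noncases = [0.25, 0.12, 0.08, 0.05, 0.03]   # children who did not

print(auc_pairwise(cases, noncases))  # 16 of 20 pairs ranked correctly -> 0.8
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. For a single dichotomous predictor, many pairs are tied, and the same calculation reduces to the average of sensitivity and specificity.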
Following examination of the ROC curve (see Figure 2), a probability criterion threshold of 0.13 was chosen to give approximately equal sensitivity (0.694) and specificity (0.711). Hosmer and Lemeshow [44] would term this 'acceptable' discrimination. Using this criterion, of the 402 children with low PPVT at age 8, the model correctly identified 279 (69.4%), and of the 2,779 children without low PPVT at age 8, the model correctly identified 1,975 (71.1%) (Table 7). However, determining acceptable classification discrimination is more complex than estimating the proportion of children with low PPVT at age 8 who were correctly identified (sensitivity) and the proportion of children without low PPVT at age 8 who were correctly identified (specificity). The proportion of children classified as low who are actually low (positive predictive value), and the proportion of children classified as not low who are actually not low (negative predictive value), are also of importance. The positive predictive value of the model at a probability threshold of 0.13 is much lower than the observed sensitivity and specificity: of 1,083 children predicted to have low PPVT at 8 years, only 279 actually had low PPVT (25.8%). That is, the model incorrectly predicted that 804 of these children would have low PPVT at 8 years when in fact they did not. The negative predictive value is much higher: of 2,098 children predicted not to have low PPVT at 8 years, 1,975 (94.1%) actually did not have low PPVT. So, at a practical level, if decisions were being made about offering ongoing professional monitoring of developmental status, or offering intervention, our model is good at predicting those children who will not have low receptive vocabulary status at age 8. In other words, we know who is less likely to need monitoring or intervention. Admittedly, of the children we predicted at age 4 not to have low receptive vocabulary at age 8, we will have misclassified the 5.9% who actually go on to have low receptive vocabulary at age 8. This might be deemed a tolerable miss-rate.
In contrast, of the children predicted at age 4 to go on to have low receptive vocabulary at age 8, 74.2% would actually go on to be classified as not having low receptive vocabulary at age 8. In other words, it is possible that considerable resources will be deployed towards children who, in all likelihood, won't need intervention. Decisions would need to be made as to whether this represents a tolerable deployment of monitoring and/or intervention to assist the 25.8% of children classified at age 4 who would otherwise have a low receptive vocabulary status at age 8.
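The four classification measures discussed above all follow from one 2x2 table. As a minimal sketch, the counts reported in the text at the 0.13 threshold (279 correctly identified cases out of 402, and 1,975 correctly identified non-cases out of 2,779) can be arranged as follows. The function and variable names are our own.

```python
def predictive_values(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # non-cases correctly cleared
        "ppv": tp / (tp + fp),          # flagged children who are cases
        "npv": tn / (tn + fn),          # cleared children who are non-cases
    }


# Cell counts derived from the figures reported in the text (threshold 0.13).
TP = 279            # low PPVT at 8, predicted low
FN = 402 - 279      # low PPVT at 8, predicted not low (123)
TN = 1975           # not low at 8, predicted not low
FP = 2779 - 1975    # not low at 8, predicted low (804)

for name, value in predictive_values(TP, FN, TN, FP).items():
    print(f"{name}: {value:.1%}")
```

The positive predictive value is low despite reasonable sensitivity and specificity because only 402 of the 3,181 children (about 13%) had low PPVT at age 8, so false positives drawn from the much larger non-case group outnumber the true positives.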
We would note that it is possible to select different threshold criteria for classification. Changes in the selection of the threshold criterion produce changes in sensitivity and specificity, and these, in turn, affect positive and negative predictive values. Lowering the threshold identifies at age 4 more of the children who actually go on to be classified as having low receptive vocabulary at age 8, but at the cost of identifying a larger number of ultimately unaffected children. For example, a cut-off of 0.05 increases sensitivity (92.2%), decreases specificity (28.2%), decreases positive predictive value (15.7%), and increases negative predictive value (94.3%). In contrast, increasing the threshold criterion to 0.20 decreases sensitivity (50.0%), increases specificity (86.1%), increases positive predictive value (34.1%), and decreases negative predictive value (92.2%). While our criterion was selected to balance sensitivity and specificity, the establishment of the threshold criterion rests with decisions and values about the cost and effectiveness of monitoring and/or treatment, the outcome benefits, and the consequences of not intervening.
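The trade-off described above can be sketched by sweeping the threshold over a set of predicted probabilities. The data below are invented for illustration and do not reproduce the study's results. Classifying a child as a predicted case when the probability is at or above the threshold is also an assumption, since the text does not state how boundary values were handled.

```python
def metrics_at_threshold(probs, labels, threshold):
    """Sensitivity and specificity when predicting 'low' at prob >= threshold."""
    tp = fp = tn = fn = 0
    for prob, is_case in zip(probs, labels):
        predicted_low = prob >= threshold  # assumed rule: at or above the cut-off
        if predicted_low and is_case:
            tp += 1
        elif predicted_low and not is_case:
            fp += 1
        elif not predicted_low and is_case:
            fn += 1
        else:
            tn += 1
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


# Hypothetical predicted probabilities and outcomes (1 = low PPVT at age 8).
probs = [0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.22, 0.25, 0.30]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1]

# The three thresholds discussed in the text.
for threshold in (0.05, 0.13, 0.20):
    result = metrics_at_threshold(probs, labels, threshold)
    print(threshold, result)
```

Raising the threshold can only move children from "predicted low" to "predicted not low", so sensitivity never increases and specificity never decreases as the cut-off rises. Which point on that curve is preferable depends on the costs and benefits noted in the text, not on the model itself.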

Discussion
Vocabulary acquisition is a major component of language development and a central marker of the semantic system. In this study we describe the extent to which low language ability, as measured by receptive vocabulary, persists or improves beyond toddlerhood, and the extent to which other factors in the child's wider developmental environment are associated with this in the preschool and early school years. To do this, we estimated the extent to which low receptive vocabulary at 4 years was associated with low receptive vocabulary at 8 years, and estimated the contribution of child, maternal and family risk factors measured at age 4 to predicting low receptive vocabulary at 8 years.
The results show substantial variability in receptive vocabulary performance in the epoch from age 4 to age 8, with weak, or at best modest, contributions from child, maternal and family factors. This finding in children aged 4 to 8 mirrors existing studies demonstrating high variability in vocabulary acquisition in children in the toddler range. The poor positive predictive value of persistent low language ability from toddlerhood has been attributed to the immaturity of the emergent language system, whereby low language ability is a transient variant of typical language development for most children. The results of our study suggest that, for most children, low receptive language ability at 4 years is also a transient variant of typical language development. Therefore, the catch-up pattern observed in follow-up studies of two-year-olds was also evident in this study of language outcomes for four-year-olds. Our findings also demonstrate that the configuration of risks for low receptive vocabulary ability from 4-8 years was quite different to the configuration of risks in studies of toddlers. In toddler studies, maternal and family environment risk factors are conspicuous by their absence. Instead, child risk factors dominate the etiological models. These risk factors are subtle and best characterised as nonclinical neurobiological vulnerabilities that result in a slower start in language acquisition [48,49]. In this study, parental and family level risks were predominant, suggesting that these factors are increasingly important over time [50].
There are limitations to these findings. First, in developing our multivariate logistic model, we set an odds ratio of 2 to represent a criterion effect size of substantive interest. This is not a commonly used device, particularly with logistic regression, and other approaches to predictor selection and retention could have been taken. For example, researchers commonly choose to enter all variables that were statistically significant at initial screening into the multivariate model. To test the robustness of our approach, we repeated the analysis, entering all statistically significant variables from the bivariate modelling into a multivariate model and then using stepwise elimination of non-significant variables until a final model was achieved. The results of this analysis were consistent with the results presented in this paper. There were no substantial changes in the pattern of effects or in the model's predictive utility. To test whether the model's predictive utility could be improved by a more inclusive set of predictors, and by treating predictors as continuous rather than categorical where appropriate, we also developed a predictive logistic model using all 29 candidate predictors. This model represented a marginal improvement on the final model presented in this paper, but did not change the pattern of findings in a meaningful fashion.
Second, our modelling only permits examination of the predictive utility of a single language component: semantics, as measured by receptive vocabulary. We accept that a clinical diagnosis of language impairment is based on psychometric assessment of multiple dimensions of language as well as naturalistic observation and information about the child's communication in the home and other environments such as school. We do not recommend our model as a clinical screening tool, but as an illustration of the difficulties in effectively implementing screening in a population representative sample based on a broad range of child, maternal and family characteristics, without extensive and time-consuming testing. Further, different dimensions of language (e.g., grammar vs. semantics) have different growth characteristics that produce different patterns of growth over time. For example, children's acquisition of grammar proceeds in known steps towards the adult grammar. This means that children's performance can be interpreted in relation to progress towards a known developmental endpoint, that is, mastery of the adult grammar [51]. By contrast, receptive vocabulary acquisition does not have a developmental endpoint and can only be referenced to the normative performance of children the same age, as in this study. This raises the question of whether the poor prediction of low language ability and the catch-up pattern observed in this study were specific to the dimension of language we assessed, that is, receptive vocabulary. We think this is unlikely. Findings from a study that comprehensively measured both the semantic and grammar dimensions of language revealed that most children with a history of low language ability at 2 years met normative expectations on measures of grammar and semantics at 7 years [52]. Notwithstanding this, group differentiation was greater on measures of grammar than receptive vocabulary.
Third, it could be argued that interest in and concern about the persistence of low language ability and its prediction in the period from 4 to 8 years focus "too late" in the language development trajectory. And yet, what we demonstrate here is that the high variability in language acquisition and the weak predictive associations empirically documented in infants and toddlers are also observed in the epoch from 4 years to 8 years. This is substantively important in shaping our thinking about the nature of language variability. It is also the case that what happens in language development from 4 to 8 years is equally important in terms of onward lifecourse outcomes and has a direct bearing on readiness to learn.
Fourth, it could be argued the predictive utility of our models would have been enhanced had we included more extensive measures of child cognitive or language ability at age 4, or if we had combined measures into composites to reduce measurement error. We cannot comment in detail on the predictive utility of measures which were unavailable to us, but would note that the variables included in the LSAC were based upon known risk factors from established longitudinal studies and a consideration of those variables which could meaningfully impact policy interventions [25]. It would not be appropriate to collapse these measures into composites. Further, we would note that although composites can potentially reduce measurement error, the disadvantages of composites include difficulty in differentiating the independent and interactive effects of each component and the loss of potentially valuable information [53].
Finally, it could be argued that, as we have no measure of any remedial treatment, speech therapy or other interventions which children in our study received between ages 4 and 8, we understate the efforts of parents and educators in assisting those children who started with low receptive vocabulary. If our model had included measures of intervention between ages 4 and 8, we would likely have a stronger predictive model of receptive vocabulary at age 8. However, whilst this seems a plausible limitation, this study was explicitly aimed at examining age 4 predictors of receptive vocabulary at age 8. Further, this does not change our thesis that variability in children's language development is an important developmental phenomenon, and that the predictive utility of models of language is under-examined.
Notwithstanding these limitations, what are the implications of these findings?
First, much of the extant literature on language development trajectories is focussed on documenting the typical variability in language growth and assessing its predictors. This work is relatively silent about the utility of these predictors. Some of this reticence reflects methodological constraints which preclude offering these insights. But where there are good longitudinal data, of well described samples and populations, with relatively robust and replicated measures, some advance should be made in extending the implications of the findings towards demonstrating what they might practically mean. How do these associations guide planning, funding, and arranging services and interventions?
In prompting for more of this, we would suggest that the high developmental variability in receptive vocabulary performance from age 4 to age 8 will make it challenging to accurately identify children at age 4 for intervention on the basis of their low receptive vocabulary performance and a mix of child, maternal and family risks. We found the strongest predictor of low receptive vocabulary at age 8 was low receptive vocabulary at age 4. However, despite a moderate odds ratio with vocabulary ability at 8, receptive vocabulary at age 4 was nonetheless a limited predictor of persistently low receptive vocabulary ability at 8 years. Dale et al. reported comparable findings in a population-level sample of young twins from 2-4 years [54] and 4-12 years [55]. Their conclusion from these studies, consistent with ours, was that prediction of persistent language impairment is not accurate enough to justify screening to identify children who are likely to have future language impairment. This means that identifying preschool age children for targeted vocabulary intervention will mostly include children who move within the typical range of language ability in the early years of school. On the other hand, not all children will catch up and this group of children will likely require ongoing support for language learning through the school years. Beyond low age 4 receptive vocabulary, we found that a well-fitted multivariate model featuring risks of substantive magnitude does not do particularly well in improving prediction of low receptive vocabulary ability at age 8. While we are not proposing that the specific model we have fitted here be used for screening general populations, the empirical literature on the prediction of language outcomes for unselected populations of children is not encouraging [56]. Is it likely that we will see models developed with greater predictive utility? What if this is as good as it gets?
These are unanswered questions for future studies. Future studies could investigate the classification accuracy of predicting language status at a later age (e.g., age 8) from two or more measures of low language performance at younger ages (e.g., age 4 and age 6), as opposed to a single measure of low language performance at an earlier age (e.g., age 4).
Finally, some readers may be concerned that we are dismissive of the risks for persistent language difficulties. This is far from the case. In a recent Australian study, the financial costs associated with parents accessing health services for pre-schoolers with communication disorders were substantial [57]. In two related studies, substantial over- and under-servicing were identified [58,59]. In an RCT of a population-based early language promotion program, language outcomes for children in the control arm met normative expectations on psychometric tests of expressive and receptive language [60]. These are confronting findings and speak to the need for well-judged approaches to responding to need.
The emergent finding in studies of language growth trajectories is that variability in language growth needs to be seriously considered. Those who start below age expectations may not be those who continue to stay behind age expectations. Children need to be monitored throughout their development. The "best" predictor may actually be the persistence of low language performance over time rather than status at a given point in time. Our findings illustrate the tension between stability and change over time. This needs to be considered when observing developmental phenomena, particularly in a population representative sample rather than a sample of children already identified as 'at risk'. Whilst it is difficult to predict with great accuracy which children will have poor language at age 8, it is also clear that some children are exposed to far greater risk than others.

Future research
By school age, English-speaking children have reached adult levels of competence in some dimensions of language (e.g., the speech sound system, clause structure) while other dimensions such as receptive vocabulary ability continue to develop throughout life. Once children start formal schooling, reading and writing become new dimensions of the language system. While oral language is the substrate for literacy, literacy also facilitates receptive vocabulary knowledge by giving children access to experiences, and words to describe those experiences, beyond the day-to-day. It will be important for future research to determine the extent to which low receptive vocabulary ability from 4 to 8 years projects to low literacy ability, and to identify the configuration of risks for low literacy ability.