Common Genetic Variants in FOXP2 Are Not Associated with Individual Differences in Language Development

  • Kathryn L. Mueller, 
  • Jeffrey C. Murray, 
  • Jacob J. Michaelson, 
  • Morten H. Christiansen, 
  • Sheena Reilly, 
  • J. Bruce Tomblin

Abstract


Much of our current knowledge regarding the association of FOXP2 with speech and language development comes from singleton and small family studies where a small number of rare variants have been identified. However, neither genome-wide nor gene-specific studies have provided evidence that common polymorphisms in the gene contribute to individual differences in language development in the general population. One explanation for this inconsistency is that previous studies have been limited to relatively small samples of individuals with low language abilities, using low density gene coverage. The current study examined the association between common variants in FOXP2 and a quantitative measure of language ability in a population-based cohort of European descent (n = 812). No significant associations were found for a panel of 13 SNPs that covered the coding region of FOXP2 and extended into the promoter region. Power analyses indicated we should have been able to detect a QTL variance of 0.02 for an associated allele with MAF of 0.2 or greater with 80% power. This suggests that, if a common variant associated with language ability in this gene does exist, it is likely of small effect. Our findings lead us to conclude that while genetic variants in FOXP2 may be significant for rare forms of language impairment, they do not contribute appreciably to individual variation in the normal range as found in the general population.

Introduction


Language acquisition requires the interplay of complex biological and behavioural/learning systems, combined with a stimulating and responsive environment where language serves as a tool for social engagement. There is now strong evidence that the neurobiological pathways supporting language learning are genetically influenced (see for instance [1]). Some of the strongest evidence in support of this comes from findings of rare mutations in FOXP2.

FOXP2 was identified via a large multi-generational pedigree—the so-called ‘KE’ family—that appeared to show an unusual autosomal dominant pattern of inheritance for speech and language impairment. Historically, there has been considerable disagreement over how impairments in this family are best characterized. The first published report described “a severe form of developmental verbal apraxia”, since both articulation and expressive language were noted to be affected (p. 352 [2]). The same year Gopnik [3] published work characterising the family’s communication difficulties as “developmental dysphasia” (more commonly known as Specific Language Impairment; p. 715). She described affected family members as “feature-blind”; arguing for a selective grammatical deficit in the use of rule-based morphological paradigms (e.g., the grammatical features that mark tense, number and agreement [3,4]). This was soon contested by evidence showing that affected family members are impaired in aspects of language unrelated to syntax, including phonology and semantics [2,5]. Fletcher [6] proposed a more likely source of the deficits to be in the speech and language production system.

More recent accounts of the KE phenotype have placed greater emphasis on the motor speech aspects of the family’s impairment. The dyspraxic elements first noted by Hurst et al. (1990) affected not just articulation, but also non-linguistic oromotor movements [7–9]. However, they were best exemplified in speech because of the fine-tuned motor movements necessary for oral language. In addition to oromotor weakness, family members presented with mixed dysarthric features [10]. It has been hypothesized that the expressive language impairments (e.g., phonological and syntactic) seen in this family derive from these lower level deficits in oromotor planning and execution [8,11].

Although the literature has primarily focused on expressive language, the KE phenotype is broader and includes deficits in receptive vocabulary [5], grammatical, and syntactic abilities [7]. There is also evidence of cognitive impairment, with more profound deficits in the verbal domain [7]. Because of the involvement of speech, cognitive and motor impairments, not all family members of the KE family would meet the selection criteria for studies of atypical forms of language development (or SLI), as proposed in Gopnik’s original assessment of the family [3, 4]. The discovery of frank neurological dysfunctions among some family members [12] also runs contrary to the current definition of the disorder [13]. This might lead us to question the relevance of understanding the genetic underpinnings of this severe phenotype for ‘common forms’ of language impairment (e.g., SLI). “However, (it is also possible that) the identification of a specific candidate gene and mutations … can allow the development of targeted investigations in cellular or animal models, which, in turn, can point to mechanisms that might be relevant to more common forms of language-related conditions” (p. 287; [14]).

The initial linkage study of the KE family mapped FOXP2 to a 5.6-cM region of 7q31 between D7S2459 and D7S643, a region that became known as SPCH1 (MIM 602081; [15]). Linkage analysis of the family, and mapping of a translocation breakpoint in an unrelated child with a similar phenotype, led to the identification of a gene in this region, FOXP2 (forkhead box P2; [15,16]). Affected members of the KE family were found to carry a heterozygous point mutation in exon 14 of the gene that was absent in unaffected relatives [16]. This yielded an arginine-to-histidine substitution, R553H (G -> A transition), altering a key residue (Arg553). Lai et al. (2001) proposed that the KE phenotype is caused by haploinsufficiency of FOXP2 during embryogenesis, leading to the abnormal development of neural structures for speech and language [16].

The FOX genes encode a family of transcription factors with a characteristic winged-helix—or forkhead box (“fox”) DNA-binding domain [17]. They regulate a wide variety of cellular and developmental processes, including some in the central nervous system [18]. FOXP2 is highly conserved across species with only three amino acid changes between mice and humans, two of which have occurred in the human lineage since diverging from the chimpanzee [19, 20]. The gene is organized into 19 exons, three of which are alternatively spliced leading to different isoforms [16, 21]. Exons 12–14 encode the DNA-binding domain necessary for transcription factor function [17, 22].

Since the discovery of the KE family mutation, many cases of de novo and familial mutations in FOXP2 have been reported in the form of point mutations (missense, nonsense and frameshift mutations), small and large scale deletions, sequence alterations; as well as chromosomal alterations, including translocations and genomic copy number variants [23–33]. With such heterogeneity, delineating the precise phenotype(s) associated with the gene is challenging. Individuals with a disruption in FOXP2 typically present with a severe motor speech disorder, usually verbal dyspraxia. Beyond that, receptive and/or expressive language and/or cognitive abilities and/or more generalised motor skills may also be affected (for a comprehensive review of singleton and family case studies, see [32]). While motor speech impairments seem to be universal in these cases, language impairments are also common and usually considered a core feature of the phenotype [34].

These reports of speech and language impairments in individuals and families with FOXP2 mutations raise the question as to whether common variants in the gene might be associated with individual differences in the general population. Sequencing of FOXP2 in children with severe dyspraxia has suggested a low prevalence for etiological variants of approximately 2% [26]. Studies that have screened moderate-sized samples of children with and without language impairment have found no evidence of a common variant associated with language [35–37].

Despite the lack of positive findings, it would be unwise to reject the possibility that FOXP2 has a connection to individual differences in language ability on the basis of these mutation searches alone. O’Brien et al. (2003) used sib-pair linkage and family-based association methods to investigate three microsatellites within FOXP2 [36]. They found no evidence of linkage or association to SLI as either a binary or quantitative trait. Further sequencing of exon 14 in a subgroup of the sample showed no evidence of functional mutations. Newbury et al. (2002) used a combination of SNPs and microsatellite markers spanning the coding regions of FOXP2 to investigate quantitative measures of SLI [37]. No mutations were found in the forkhead region of the gene. More recently, however, Rice et al. [38] reported nominally significant associations for four SNPs proximal to or within FOXP2 with a general measure of language ability.

A series of genome-wide linkage and association scans have also failed to detect any signal of association to FOXP2 for either typical [39, 40] or impaired language abilities [41–52]. Collectively, cohort studies of FOXP2 suggest that common variants are unlikely to be involved in more ‘common’ forms of developmental language impairment identified via clinical and population-based samples [36, 37]. However, these studies have been limited in number, by relatively small sample sizes, and by low density gene coverage. They have also focussed only on individuals with impaired speech and language abilities, with unaffected family members comprising the control group.

This study differs from previous research in that, as well as including a large sample of children with language impairment, it also contains a large number of individuals with abilities across the normal spectrum. Thus, it is sensitive to the discovery of a quantitative trait locus. Additionally, this study employed a more extensive panel of tag SNPs to cover the linkage disequilibrium (LD) structure of FOXP2 than previous studies, including markers in the promoter region of the gene. By comprehensive SNP genotyping of FOXP2 in a large population-based sample with a continuum of language ability, the current study aimed to address the question of whether common genetic variants in FOXP2 contribute to individual differences in language development.

Results


The primary data for this study came from two earlier studies conducted in Iowa and Illinois. The first group of participants (the Longitudinal cohort) was originally ascertained as part of a cross-sectional study on the prevalence of language disorders in kindergarten [53]. Subsequently, another group (the School-Based cohort) was recruited from a separate study on language abilities in school-aged children. Combined, the total sample comprised 812 children.

All children had been tested for spoken language ability as part of their original study using age-appropriate standardized tests (see Materials and Methods for details). These children represented the full range of spoken language ability, although children with low language abilities were oversampled. As a consequence, the average language ability of the sample in this study was approximately one-third of a standard deviation below the mean (mean Z-score = -.35, SD = 1.10), with a range of -3.35 to 2.59.

Children also provided DNA samples. Thirteen tag SNPs were selected to cover the haplotype block structure of FOXP2, and genotyped using Taqman single SNP assays. Sample recruitment, assessment of language abilities and genotyping methods are described further in the Materials and Methods section.

Tests for association to language ability as a quantitative trait (LCOMP; see Methods) consisted of 13 one-way ANOVAs using Proc GLM within SAS, where the genotype at each tag SNP was treated as a class variable. Table 1 summarizes language ability according to genotype at each tag SNP, and the results of the genotype test for association. Overall, we found no statistically significant association between FOXP2 and language ability (p > .05, Table 2). In one case (rs12155328) the nominal p level approached significance; however the effect size was quite small, and the p level was substantially higher than the .0038 level needed to exceed correction for multiple testing.
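The per-SNP test described above can be sketched in a few lines. This is an illustrative sketch only, not the SAS Proc GLM analysis the authors ran: the function names and the toy scores are hypothetical, and converting F to a p-value (which needs the F distribution) is left out. It shows the one-way ANOVA F statistic, the R2 effect size as the between-genotype share of total variance, and the Bonferroni threshold for 13 tests.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within, r_squared) for lists of scores,
    one list per genotype group."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, r_squared

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests

# Hypothetical language z-scores for the three genotypes at one SNP
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w, r2 = one_way_anova(toy_groups)
threshold = bonferroni_alpha(0.05, 13)  # 13 tag SNPs tested
```

With 13 tests at a family-wise level of .05, the per-test threshold is .05 / 13, which rounds to the .0038 level quoted in the text.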

Table 1. Means and standard deviations (SD) of language composite scores for tag SNP genotypes within FOXP2 and effect sizes (R²) of genotypes on language in the combined Iowa sample.

Table 2. Means and standard deviations of language composite scores by genotype at rs1916988.

The data above were based on the Longitudinal and School-Based samples combined. Although the phenotype measures overlapped in these two cohorts and prior research based on the Longitudinal sample has demonstrated that all measures for the two cohorts are highly correlated [54], the two groups were ascertained differently. The Longitudinal cohort over-sampled children with low language ability, whereas the School-Based cohort was truly a population sample. These differences resulted in the Longitudinal sample having a lower average language ability level (M = -0.51, SD = 1.11) than the School-Based one (M = -0.09, SD = 0.99). Thus, it remained possible that combining groups might obscure statistically significant associations.

Therefore, the data were analysed for an interaction of genotype effects at each tag SNP with sample membership. One SNP yielded a significant genotype by cohort interaction (rs1916988: F(2, 772) = 5.76, p = .003) after adjustment for multiple testing (see Table 2). A test for simple effects of genotype by cohort showed a significant genotype effect in the School-Based sample, F(2, 302) = 4.24, p = .015. There was no significant effect of this SNP in the Longitudinal sample. A comparison of genotype means in the School-Based sample (Table 2) showed that the TT genotype group had significantly lower language abilities than the TC group (p < .05), suggestive of a dominance effect for the C allele. By comparison, in the Longitudinal sample, the TT group averaged higher scores than groups carrying the C allele, although these differences were non-significant. Thus, the effects in the two samples ran in opposite directions.

These results leave the status of association between rs1916988 in FOXP2 and individual differences in language ability unclear. The School-Based sample that yielded a significant association had a distribution of language abilities that was very similar to the normative samples used in the design of the designated language tests, whereas the Longitudinal sample comprised an excess of children with poor language abilities. It is therefore possible that the strength of the association is dependent upon the overall level of language ability in the sample tested.

In order to resolve this ambiguity, we obtained data from a third sample of participants in a longitudinal birth cohort study (Early Language in Victoria Study: ELVS; [55]). ELVS was a population sample assessed for language ability with a subset of the same measures used in the Iowa Longitudinal and School-Based samples.

Mean z-scores for the ELVS sample by genotype are shown in Table 2. A test for genotype effects at rs1916988 showed no differences between mean language scores across the three genotype groups, F(2, 305) = 1.23, p = .29. Thus, these data are consistent with the results of the Longitudinal sample. The effect sizes for the ELVS and Longitudinal samples were similar and the direction of the effects, albeit non-significant, was the same. When the Longitudinal and ELVS samples were combined via meta-analysis, the weighted R² effect size was .086. However, the lower bound of the 95% CI was -0.009 and the upper bound was 0.18. Thus, even in the combined samples, the effect was small and non-significant.

In order to assess whether a combination of SNPs was predictive of the language phenotype in a multivariate setting, we also fit a predictive model using Random Forests (RF; [56]). In essence, this analysis fits decision trees by splitting the data (i.e. the individuals) recursively based on genotypes at the different SNPs. In doing this, it aims to group individuals with similar language scores together. Data are divided into a training set and a test set, resulting in a less biased estimate of the predictive power of the RF. Random Forests have been repeatedly used in such genetic association settings, especially where genetic interactions (epistasis) are of interest [57–59]. The advantage of RF is that it looks at all SNPs simultaneously, rather than in isolation, and if there are interactions between SNPs, or other kinds of combinations that are predictive, these will be detected.
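The recursive splitting idea can be illustrated with a single split step. The sketch below is not the authors' Random Forest implementation; it only shows how one node of a regression tree might choose the SNP and genotype cut-off that best groups individuals with similar scores, measured by the reduction in the sum of squared errors. Coding genotypes as minor-allele counts (0, 1, 2), the SNP names, and the toy data are all assumptions made for illustration. A full forest would repeat this recursively over many bootstrap samples and random SNP subsets.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(genotypes, scores):
    """Find the split 'allele count <= threshold' that most reduces SSE.

    genotypes: list of dicts mapping SNP name -> minor-allele count (0/1/2)
    scores:    list of language scores, one per individual
    Returns (snp, threshold, sse_reduction); snp is None if nothing helps.
    """
    parent_sse = sse(scores)
    best = (None, None, 0.0)
    for snp in genotypes[0]:
        for threshold in (0, 1):
            left = [s for g, s in zip(genotypes, scores) if g[snp] <= threshold]
            right = [s for g, s in zip(genotypes, scores) if g[snp] > threshold]
            if not left or not right:
                continue  # a split must send individuals to both sides
            gain = parent_sse - sse(left) - sse(right)
            if gain > best[2]:
                best = (snp, threshold, gain)
    return best

# Hypothetical data: snp_a tracks the score, snp_b does not
toy_genotypes = [
    {"snp_a": 0, "snp_b": 1},
    {"snp_a": 0, "snp_b": 0},
    {"snp_a": 2, "snp_b": 1},
    {"snp_a": 2, "snp_b": 0},
]
toy_scores = [-1.0, -1.0, 1.0, 1.0]
chosen_snp, chosen_threshold, chosen_gain = best_split(toy_genotypes, toy_scores)
```

In the toy data the split on `snp_a` removes all within-group variation, whereas `snp_b` yields no gain, so the step selects `snp_a`.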

When using LCOMP as the response variable (quantitative outcome), the correlation between the prediction on an independent test set (i.e., subjects held out of the training set) and the actual LCOMP values was r = -.0178 (95% CI [-0.09, 0.05], p = 0.61; power = 0.8 for r = 0.115 at α = 0.05). RF also has a built-in measure of variable importance, which can be used as an indicator of how much predictive power a SNP carries alone or in combination (e.g. epistasis) with other SNPs. No SNP had an importance score significantly greater than those obtained through permutation of the data. These results were robust even in the face of RF parameter tuning (RF typically needs little or no parameter tuning for optimal performance). Taken together, these results suggest that even in a multivariate machine learning paradigm, SNPs in FOXP2 have little or no explanatory power for language phenotypes in our sample.
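A confidence interval for a correlation of this kind is commonly obtained with the Fisher z-transformation. The text does not say how the reported interval was computed or how many predictions it was based on, so the sketch below is an assumption on both counts: it uses the Fisher method and takes n = 812 (the full combined sample) purely for illustration.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via Fisher's z."""
    z = math.atanh(r)                    # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)          # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# r is taken from the text; n = 812 is an assumed sample size
ci_low, ci_high = fisher_ci(-0.0178, 812)
```

Under these assumptions the interval comes out close to the reported one when rounded to two decimals, and it comfortably includes zero.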


Discussion

This is the first study to investigate the association of common variants in FOXP2 to individual differences in language ability in a large sample with a range of language abilities. Much of our current knowledge regarding the neural correlates of FOXP2 comes from intensive study of a single multiplex family (the ‘KE’ family) that displays an unusual speech and language phenotype due to a missense mutation in the FOX domain. Etiological point mutations and gross chromosomal rearrangements (e.g., deletions and translocations) have also been reported in singletons and small family studies [23–33].

A few studies have considered whether common variants in FOXP2 are associated with language impairment (e.g., [37, 38]). However, these have been limited with regard to sample size, comprising a relatively small number of affected individuals and their family members. O’Brien et al. (2003) have previously tested for association of common variation within FOXP2 and a sample with a range of language abilities (i.e., the Longitudinal cohort 1 in the current study); however, coverage of the gene was limited [36].

In this study, we considered the full range of language abilities existent in the general population of unrelated individuals, and selected tag SNPs to cover the majority of LD structure found in FOXP2. We genotyped 13 common polymorphisms in 812 individuals, testing for association to a quantitative measure of language ability, with null results. One SNP provided evidence of an association in a subgroup of the participants in this study; however, these findings did not replicate. In conjunction with previous research indicating the rare and specific nature of FOXP2 mutations in the etiology of speech and language disorders, these findings lead us to conclude that common variants are unlikely to exert a large effect in typical language development. Using the combined longitudinal and school samples, this study was powered to detect a QTL variance of 0.02 for an associated allele with MAF of 0.2 or greater with 80% power [60]. This means we had sufficient power to detect genetic effects responsible for at least 2% of the variance in our composite measure of language ability. Furthermore, this study had the additional advantage of an independent sample to test for replications of any positive findings. Our largest effect size (R²) for the combined sample was 0.001; thus it is possible that small genetic effects from FOXP2 contributing to individual differences in language ability may exist. If so, these effects are likely very small and would therefore form part of a large polygenic background for individual differences in language. One possibility is that the SNPs within FOXP2 each contribute some unique effects and that a combination of these effects could be large enough to be detectable; however, our use of Random Forest regression analysis did not yield any significant evidence of such effects.

In spite of this being one of the largest studies of its kind, we may still have been underpowered to detect common risk variants in FOXP2 of small effect size (i.e., those which affect less than 2% of variance in the composite language measure). This issue could be addressed by screening SNPs in a larger sample size, which would boost power. Also, with the advent of cost-effective whole exome and whole genome sequencing, it should soon be possible to determine the population effect of rare variants (i.e., infrequent alleles of large effect) in FOXP2 for phenotypes involving speech and language.

Nevertheless, FOXP2 has been critical in providing ‘a molecular window’ into the genetic bases of speech and language impairments [34] in that identification of the gene has opened up avenues of investigation into signaling pathways [61–64]. In part, this is because FOXP2 serves as a regulatory gene whose primary role is to modify the timing and expression of downstream genetic targets [17]. As such, it likely represents one of many elements in gene networks involved in speech and language development.

This role for FOXP2 as a regulator in a network of genes important for language is demonstrated by evidence showing that up-regulation of FOXP2 coincides with the down-regulation of expression in another gene in the 7q region, CNTNAP2 [64, 65]. CNTNAP2 encodes CASPR2, a neurexin found at the nodes of Ranvier in myelinated nerve fibers. It is expressed in the human cerebral cortex, specifically the frontal and temporal lobes and the striatum [66]; regions that are important for language and cognition [67]. Common polymorphisms in CNTNAP2 have been associated with language delay in autism [66] and the general population [68]; and more specifically with phonological memory [65] and reading abilities in language impairment [69, 70].

The null findings from this study may have implications for the study of the evolutionary properties of FOXP2. These data suggest that the mutations in FOXP2 with negative functional consequences may be under considerable selection pressure. Whether this selection is based on poor language or other concomitant functions is not clear. A study by Ayub et al. (2013) investigated whether recent positive selection on FOXP2 is also associated with positive selection on any known target genes [18]. They examined four different populations and found strong evidence for selection in Europeans, but not in the Han Chinese, Japanese or Yoruba populations. This may suggest selection of FOXP2 targets has occurred fairly recently, after the divergence of the populations, from local adaptation.

This study failed to reject the null hypothesis that common polymorphisms in FOXP2 are not associated with population differences in language ability, building on previous research by examining coding regions in the 5’ promoter region of the gene that could affect transcription factor binding. However, sequence analysis of FOXP2 indicates a promoter region flanking exon s1 upstream of the gene [21], and it is entirely possible that our approach to genotyping failed to detect a putative signal from this region. Therefore, we cannot exclude the possibility that regulatory processes governing the expression of FOXP2 are important for individual differences in language development. This is important because FOXP2 expression levels in turn affect the expression of putative target genes, including those involved in neurite outgrowth and striatal plasticity [63, 71]. Gene knock-in of the humanized version of FOXP2 to mice has been found to specifically affect cortico-basal ganglia circuits (including the striatum; [72]), and facilitate both declarative and procedural learning [73], two learning processes thought to be crucial for language acquisition.

Ultimately, the aim of future research into FOXP2 will be to characterize the regulatory networks or pathways of which the gene is a part, the implications of these for cellular and neuronal processes (for example, synaptic plasticity), and the role of these in shaping the mechanisms for language learning.

Materials and Methods

The study was approved by the Institutional Review Board at the University of Iowa, which subscribes to the basic ethical principles underlying the conduct of research involving human subjects, as laid out in "The Belmont Report". Parents provided written consent for their children’s participation in the project and for use of their DNA.

Participants were a sub-set of two larger studies on childhood language development and disorders.

Longitudinal Sample


The Longitudinal cohort (n = 500) was initially ascertained as part of a cross-sectional study on the prevalence of language disorders in kindergarten (7,218 participants screened; [53]), and was subsequently enrolled in a longitudinal study of outcomes in children with and without language impairment (see [74]). All children in this sample were monolingual English speakers, had normal hearing and had no reported neurodevelopmental disabilities. Because the longitudinal study was concerned with language impairment, it was intentionally designed to oversample for children with poor language abilities. To correct for this in the current study, we employed a weighting system. Children were assigned a weight value that represented the reciprocal of the probability that the child would be sampled from the original population. Children with high probabilities were given low weighting values, and children with low probabilities were given high values. This has the effect of reducing the contribution of children with language impairment to the study norm, and means the study sample approximates the original cross-section sample from which it was drawn (see [54] for details of this weighting method).
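The weighting scheme described above amounts to inverse-probability weighting. The following is a minimal sketch under stated assumptions: the sampling probabilities and scores are invented for illustration, and the function names are hypothetical rather than taken from the weighting method in [54].

```python
def sampling_weights(probabilities):
    """Weight each child by the reciprocal of their sampling probability."""
    return [1.0 / p for p in probabilities]

def weighted_mean(values, weights):
    """Mean of values in which each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical example: two children with low scores who were very likely to
# be sampled (p = 0.8) and two typical children who were less likely (p = 0.2)
toy_scores = [-1.5, -1.0, 0.5, 1.0]
toy_probabilities = [0.8, 0.8, 0.2, 0.2]

toy_weights = sampling_weights(toy_probabilities)
unweighted = sum(toy_scores) / len(toy_scores)
weighted = weighted_mean(toy_scores, toy_weights)
```

In this toy case the unweighted mean is pulled down by the oversampled low scorers, while the weighted mean moves back toward the value expected in the original population.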

Language Phenotype.

The phenotype employed in the Longitudinal sample (see Table 2) is based on a scheme proposed by Tomblin et al. (1996; [75]). It comprised five subtests from a standardized language measure, the Test of Language Development-2:P (TOLD-2:P; [76]), and a narrative production and comprehension screen [77]. The subtests were selected to represent norm-based performance across three domains (vocabulary, grammar, and narration) and two modalities (comprehension and expression) of language. Raw scores were converted into standard scores based upon local norms [75] and combined to form an overall language composite. Factor analysis of these five measures of language showed that a single factor accounted for co-variance among the measures [78]. Thus, a composite score can be used as an appropriate representation of the language phenotype. This has the advantage of limiting the number of inferential tests, and enhancing reliability.

Participants also completed the Block Design and Picture Completion tests of the Wechsler Preschool and Primary Scale of Intelligence (WPPSI; [79]). These tests of nonverbal (or performance) IQ were chosen to prevent confound with language abilities, as assessed by verbal and total IQ scores. Any proband with a nonverbal IQ <70 was excluded from the study on the basis of intellectual disability.

School Sample


In addition to the prevalence/longitudinal sample described above, a separate group of participants (n = 318) were recruited in 2007/2008 from a study on language abilities in school-aged children.

Language Phenotype.

Children in participating schools in grades one to four were screened using the verbal subscales of the Iowa Tests of Basic Skills [80], which have been found by our laboratory to be good predictors of receptive and expressive language abilities. Children with scores suggestive of poor language abilities, along with a random sample likely to have normal language, were then administered a more comprehensive test battery for their age (see Table 3). Again, all children were tested for normal hearing and according to parent report had no neurodevelopmental disability. The assessment of language ability in the School-Based sample paralleled that of the Longitudinal sample, although specific measures were changed to reflect the different age levels of participants [81, 82]. A composite language score (LCOMP) was derived in the same way as for the Longitudinal sample, although scores were not weighted. Similarly, participants also completed a nonverbal IQ test [83]. Again, any proband with a nonverbal IQ <70 was excluded from the study on the basis of intellectual disability.

ELVS Sample

Participants and Phenotypes.

Participants in the ELVS sample were part of a birth cohort (Early Language in Victoria Study, N = 1,910) recruited in and around the metropolitan area of Melbourne, Australia. Data for the current study were obtained when children were around seven years of age. Consent for participation was obtained from the parents of all children and the children also assented to participate. As a part of the 7-year wave of assessment, participants provided DNA from saliva samples. As with the Iowa sample, all children were of European ancestry and had no developmental disabilities or hearing loss. Measures of language ability were age-appropriate measures of listening and speaking similar to those used in the Iowa Longitudinal and School-Based samples (Table 3; [82, 84]). Again, a composite score was derived to represent the children’s overall language ability. DNA and language phenotypes were available for 308 participants in the current study.

Participants from all cohorts were monolingual speakers of English with normal hearing and without any comorbid neurodevelopmental disorders (e.g., autism), based on parental report.

DNA Processing and Genotyping

DNA for the Iowa cohorts was obtained for 818 probands from blood, buccal swabs and saliva, and processed using standard protocols. DNA for the ELVS group (n = 308) comprised saliva samples only. Briefly, DNA was extracted from buccal swabs using procedures described in Richards et al. [85], and from whole blood using a modified version of Qiagen’s DNA Blood Maxi Kit (Qiagen, Inc., Valencia, CA). Samples derived from saliva were processed using Oragene’s standard 0.5mL and 4mL protocols (DNA Genotek, Ontario, Canada).

Single nucleotide polymorphisms (SNPs) were selected to cover the haplotype block structure of FOXP2, including exonic regions of high linkage disequilibrium in the 5’ promoter and 3’ ends of the gene (Fig 1; [86–89]). We included variants in the promoter region of the gene because these can result in different isoforms, which affect the amount and timing of protein production during gene expression. Sequence analyses have provided evidence for at least one promoter region flanking exon s1 of the gene, located more than 300kb 5’ to exon s1 as first described by Bruce and Margolis [23].

Note: LD structure from Base-pair location is based on the National Center for Biotechnology Information dbSNP Build 131 of the human genome (NCBI dbSNP 131).

SNPs were selected on the basis of a minor allele frequency (MAF) greater than 0.2 in the CEPH population (Utah Residents with Northern and Western European Ancestry; Table 4; The International HapMap Project). Selection of the correct reference population is an important design consideration because alleles and haplotypes vary by ethnicity [90]. The rationale for using a minimum MAF of 0.2 was that, in our study population, we should have been able to detect a QTL variance of 0.02 for an associated allele with MAF of 0.2 or greater with 80% power. Exceptions to this are rs2244419 and rs2396720, which have lower MAFs but were chosen only after assays in that region of linkage disequilibrium (LD) failed quality control (QC) standards. A senior geneticist with extensive experience in this area approved all SNP selections.
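
The power statement above can be illustrated with a rough back-of-the-envelope calculation. The Python sketch below is not the procedure used in the study; it assumes a 1-df additive test, a marker in complete LD with the causal variant, and a chi-square approximation with non-centrality N·q²/(1 − q²), where q² is the proportion of trait variance explained. The function name `power_one_df` and both significance levels (including the Bonferroni-style 0.05/13) are assumptions made for illustration.

```python
from statistics import NormalDist


def power_one_df(n, variance_explained, alpha):
    """Approximate power of a 1-df association test for a quantitative trait.

    Assumption: the test statistic follows a non-central chi-square
    distribution with 1 df and non-centrality n * q2 / (1 - q2).
    """
    if not 0 <= variance_explained < 1:
        raise ValueError("variance_explained must be in [0, 1)")
    ncp = n * variance_explained / (1 - variance_explained)
    shift = ncp ** 0.5
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # A non-central chi-square with 1 df is (Z + shift)^2, so the tail
    # probability can be written with the standard normal CDF alone.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical settings: n = 812 and a QTL variance of 0.02 come from the text;
# the two alpha levels are illustrative choices, not the study's.
for alpha in (0.05, 0.05 / 13):
    print(f"alpha={alpha:.4f} power={power_one_df(812, 0.02, alpha):.3f}")
```

Because this sketch ignores incomplete LD between marker and causal variant, it will tend to give more optimistic values than a dedicated power calculator.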

Table 4. SNP location, minor allele frequency and call rates for markers in FOXP2.

Because allele frequencies can vary by population, we aimed to minimize potential confounds arising from genetic ancestry by genotyping only samples from Caucasian individuals. Ethnicity was determined via parental report. Allelic variation was determined via the Sequence Detection Systems 2.2 software (SDS, Applied Biosystems). Genotype data were uploaded into a Progeny database (Progeny Software, LLC, South Bend, IN) and integrated with phenotypic information via Microsoft Access.

Quality Control and Statistical Analyses

DNA samples can vary in both quality and concentration, affecting both genotype call rate and accuracy: poor-quality DNA results in low call rates and inaccurate genotyping. Initial quality control steps included both biological and technological quality checks. Prior to genotyping, biological samples were quantified using the Thermo Scientific NanoDrop 1000 Spectrophotometer. Samples suspected of contamination and those of low quality were excluded from the study. Taqman Pre-designed Genotyping Assays (Applied Biosystems, Foster City, CA, USA) were tested on CEPH control plates prior to use.

Genotyping was performed using Taqman Pre-designed Genotyping Assays under standard reaction conditions (Applied Biosystems, Foster City, CA, USA). Genotypes for each SNP and each individual were called using the algorithm in SDS, which is based on the relative signal intensity of each allele for each SNP. Plots were then inspected manually by a lab technician with special expertise in this area. Low quality genotype calls were excluded from data analysis. There was no reassignment of SNP genotypes.

Further steps in quality control were performed using PLINK (v1.07; [91]). The first of these involved calculating SNP call rates. The call rate per SNP is the percentage of individuals whose genotypes are called for a given SNP. It is an important part of quality control because low call rates can lead to inaccuracies in genotype calls.

Table 4 details chromosomal location, alleles, minor allele frequencies, and call rates for the SNPs genotyped in this study. In general, call rates were higher in the School-Based Sample than the Longitudinal Sample. With the exception of rs12155328 in the Longitudinal cohort, all SNP call rates were ≥ 90% (see Table 4 for call rates of individual SNPs).
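
The call-rate check described above can be sketched in a few lines of Python. This is an illustration only, not the PLINK commands used in the study: the toy genotypes, the use of `None` for a missing call, the function names, and the treatment of 90% as a cut-off are all assumptions.

```python
def snp_call_rates(genotypes):
    """Return {snp: fraction of individuals with a called genotype}.

    `genotypes` maps a SNP name to a list of calls, one per individual;
    a missing call is represented by None (an assumed convention).
    """
    rates = {}
    for snp, calls in genotypes.items():
        called = sum(1 for g in calls if g is not None)
        rates[snp] = called / len(calls)
    return rates


def low_call_rate_snps(rates, threshold=0.90):
    """List SNPs whose call rate falls below the (assumed) threshold."""
    return sorted(snp for snp, rate in rates.items() if rate < threshold)


# Hypothetical data: ten individuals, two made-up SNPs.
toy = {
    "snp_a": ["AA", "AG", "GG", "AG", "AA", "AG", "GG", "AA", "AG", "AA"],
    "snp_b": ["CC", None, "CT", None, "TT", "CT", "CC", "CT", "CC", "CT"],
}
rates = snp_call_rates(toy)
print(rates)
print(low_call_rate_snps(rates))
```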

Another indicator of data quality is heterozygosity. An excessive or reduced proportion of heterozygous genotypes may be indicative of sample contamination or of inbreeding. The heterozygosity of samples in this study was .40, compared to .41 in the reference dataset (CEU population, NCBI Genome Build 131), a difference that is not statistically significant.
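
To make the heterozygosity measure concrete, the sketch below computes the observed proportion of heterozygous genotypes and the proportion expected from an allele frequency under random mating, 2p(1 − p). The genotype encoding (two-letter strings, `None` for missing) and the function names are assumptions for illustration, and the data are made up.

```python
def observed_heterozygosity(calls):
    """Proportion of called genotypes that are heterozygous."""
    called = [g for g in calls if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    het = sum(1 for g in called if g[0] != g[1])
    return het / len(called)


def expected_heterozygosity(allele_freq):
    """Expected heterozygosity 2p(1 - p) for a biallelic marker."""
    return 2 * allele_freq * (1 - allele_freq)


# Hypothetical genotype calls for one SNP; None marks a missing call.
calls = ["AA", "AG", "GG", "AG", None, "AG", "AA", "AG", "GG", "AA", "AG"]
print(observed_heterozygosity(calls))
print(expected_heterozygosity(0.5))
```

A large gap between the observed and expected values is the kind of signal that would prompt a closer look at sample quality.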

Hardy-Weinberg equilibrium (HWE) may also be used as a measure of genotyping fidelity. Genotyping error is indicated by SNPs that show significant deviation from HWE, based on a pre-specified level of significance (p ≤ .001). However, such deviation may also be a signal of genetic association. For this reason, HWE was tested only in the control samples. There was no significant deviation from HWE in any of the cohorts.
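
As an illustration of the HWE check, the following Python sketch applies the textbook chi-square goodness-of-fit test (1 df) to genotype counts for a single biallelic SNP. It is a simplified stand-in rather than the test implemented in PLINK, and the function name and example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (chi2, p_value), using a chi-square distribution with 1 df.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, the upper tail of the chi-square is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the first set sits exactly at HWE proportions,
# the second has no heterozygotes at all.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```

Under the threshold quoted above, a SNP would be flagged when the returned p-value is at or below .001.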

Tests for association with language ability as a quantitative trait comprised 13 one-way ANOVAs using Proc GLM within SAS, where the genotype at each tag SNP was treated as a categorical variable. In addition to performing univariate statistical association tests, we also examined whether the genotyped SNPs were predictive of language phenotypes in a multivariate setting. It is conceivable that SNPs interact or otherwise show combined predictive power that might not be uncovered by traditional tests of association. To examine this possibility, we used Random Forests [56], a machine learning approach that fits an ensemble of decision trees to the data. In this setting, the SNP genotypes were the predictor variables, and the quantitative LCOMP trait was used as the response or outcome variable. These analyses were performed using R (version 3.2.3) and the R randomForest package (version 4.6-12).
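
The univariate test can be illustrated with a minimal pure-Python version of the one-way ANOVA F statistic, with genotype as the grouping factor. This is a sketch only, not the SAS Proc GLM analysis used in the study: the function name and the toy trait values are hypothetical, and the p-value is omitted because the F distribution is not available in the Python standard library.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a trait grouped by genotype.

    `groups` maps a genotype label to a list of trait values.
    Returns (F, df_between, df_within).
    """
    groups = {k: v for k, v in groups.items() if v}  # drop empty groups
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and spare observations")
    grand_mean = sum(sum(v) for v in groups.values()) / n_total
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2
                     for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2
                    for g, v in groups.items() for x in v)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical language composite scores by genotype at one made-up SNP.
toy_scores = {"AA": [1.0, 2.0, 3.0], "AG": [2.0, 3.0, 4.0], "GG": [3.0, 4.0, 5.0]}
print(one_way_anova_f(toy_scores))
```

With three genotype classes the test has 2 numerator degrees of freedom, so it does not assume that the effect of the allele is additive.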


Acknowledgments

We thank Jonathan Bjork for his assistance in the preparation of this study.

Author Contributions

Conceived and designed the experiments: KLM JBT JCM MHC. Performed the experiments: KLM. Analyzed the data: KLM JBT JJM. Contributed reagents/materials/analysis tools: JBT JCM JJM. Wrote the paper: KLM JBT JCM MHC JJM. Provided cohort data: SR.


References

  1. Graham SA, Fisher SE. Decoding the genetics of speech and language. Curr Opin Neurobiol. 2013;23(1):43–51. pmid:23228431
  2. Hurst JA, Baraitser M, Auger E, Graham F, Norell S. An extended family with a dominantly inherited speech disorder. Dev Med Child Neurol. 1990;32(4):352–5. pmid:2332125
  3. Gopnik M. Feature-blind grammar and dysphasia. Nature. 1990;344(6268):715-.
  4. Gopnik M. Feature blindness: A case study. Language acquisition. 1990;1(2):139–64.
  5. Vargha-Khadem F, Passingham RE. Speech and language defects. Nature. 1990;346(6281):226. pmid:2374587
  6. Fletcher P. Speech and language defects. Nature. 1990;346(6281):226. pmid:2374587
  7. Vargha-Khadem F, Watkins K, Alcock K, Fletcher P, Passingham RE. Praxic and nonverbal cognitive deficits in a large family with a genetically transmitted speech and language disorder. Proceedings of the National Academy of Sciences. 1995;92(3):930–3.
  8. Alcock KJ, Passingham RE, Watkins KE, Vargha-Khadem F. Oral dyspraxia in inherited speech and language impairment and acquired dysphasia. Brain and Language. 2000;75(1):17–33. pmid:11023636
  9. Vargha-Khadem F, Watkins KE, Price CJ, Ashburner J, Alcock KJ, Connelly A, et al. Neural basis of an inherited speech and language disorder. Proc Natl Acad Sci U S A. 1998;95(21):12695–700. pmid:9770548
  10. Liégeois F, Morgan A, Connelly A, Vargha-Khadem F. Endophenotypes of FOXP2: Dysfunction within the human articulatory network. Europ J Paediatr Neurol. 2011;15(4):283–8.
  11. Watkins KE, Gadian DG, Vargha-Khadem F. Functional and structural brain abnormalities associated with a genetic disorder of speech and language. Am J Hum Genet. 1999;65(5):1215. pmid:10521285
  12. Liégeois F, Baldeweg T, Connelly A, Gadian DG, Mishkin M, Vargha-Khadem F. Language fMRI abnormalities associated with FOXP2 gene mutation. Nature neuroscience. 2003;6(11):1230–7. pmid:14555953
  13. Leonard LB. Children with specific language impairment: MIT press; 2014.
  14. Newbury DF, Monaco AP, Paracchini S. Reading and Language Disorders: The Importance of Both Quantity and Quality. Genes. 2014;5(2):285–309. pmid:24705331
  15. Fisher SE, Vargha-Khadem F, Watkins KE, Monaco AP, Pembrey ME. Localisation of a gene implicated in a severe speech and language disorder. Nat Genet. 1998;18(2):168–70. pmid:9462748
  16. Lai CSL, Fisher SE, Hurst JA, Vargha-Khadem F, Monaco AP. A forkhead-domain gene is mutated in a severe speech and language disorder. Nature. 2001;413(6855):519–23. pmid:11586359
  17. Shu W, Yang H, Zhang L, Lu MM, Morrisey EE. Characterization of a new subfamily of winged-helix/forkhead (Fox) genes that are expressed in the lung and act as transcriptional repressors. J Biol Chem. 2001;276(29):27488–97. pmid:11358962
  18. Ayub Q, Yngvadottir B, Chen Y, Xue Y, Hu M, Vernes SC, Fisher SE, Tyler-Smith C. FOXP2 targets show evidence of positive selection in European populations. The American Journal of Human Genetics. 2013 May 2;92(5):696–706. pmid:23602712
  19. Enard W, Przeworski M, Fisher SE, Lai CSL, Wiebe V, Kitano T, et al. Molecular evolution of FOXP2, a gene involved in speech and language. Nature. 2002;418(6900):869–72. pmid:12192408
  20. Enard W. FOXP2 and the role of cortico-basal ganglia circuits in speech and language evolution. Curr Opin Neurobiol. 2011;21(3):415–24. pmid:21592779
  21. Bruce HA, Margolis RL. FOXP2: novel exons, splice variants, and CAG repeat length stability. Hum Genet. 2002;111(2):136–44. pmid:12189486
  22. Li S, Weidenfeld J, Morrisey EE. Transcriptional and DNA binding activity of the Foxp1/2/4 family is modulated by heterotypic and homotypic protein interactions. Molecular and Cellular Biology. 2004;24(2):809–22. pmid:14701752
  23. Feuk L, Kalervo A, Lipsanen-Nyman M, Skaug J, Nakabayashi K, Finucane B, et al. Absence of a Paternally Inherited FOXP2 Gene in Developmental Verbal Dyspraxia. The American Journal of Human Genetics. 2006;79(5):965–72. pmid:17033973
  24. Zeesman S, Nowaczyk MJM, Teshima I, Roberts W, Cardy JO, Brian J, et al. Speech and language impairment and oromotor dyspraxia due to deletion of 7q31 that involves FOXP2. American Journal of Medical Genetics Part A. 2006;140(5):509–14. pmid:16470794
  25. Lennon PA, Cooper ML, Peiffer DA, Gunderson KL, Patel A, Peters S, et al. Deletion of 7q31.1 supports involvement of FOXP2 in language impairment: Clinical report and review. American Journal of Medical Genetics Part A. 2007;143A(8):791–8. pmid:17330859
  26. MacDermot KD, Bonora E, Sykes N, Coupe AM, Lai CSL, Vernes SC, et al. Identification of FOXP2 truncation as a novel cause of developmental speech and language deficits. The American Journal of Human Genetics. 2005;76(6):1074–80. pmid:15877281
  27. Palka C, Alfonsi M, Mohn A, Cerbo R, Franchi PG, Fantasia D, et al. Mosaic 7q31 Deletion Involving FOXP2 Gene Associated With Language Impairment. Pediatrics. 2012;129(1):e183–e8. pmid:22144704
  28. Tomblin JB, O'Brien M, Shriberg LD, Williams C, Murray J, Patil S, et al. Language features in a mother and daughter of a chromosome 7; 13 translocation involving FOXP2. Journal of Speech, Language and Hearing Research. 2009;52(5):1157.
  29. Rice GM, Raca G, Jakielski KJ, Laffin JJ, Iyama‐Kurtycz CM, Hartley SL, et al. Phenotype of FOXP2 haploinsufficiency in a mother and son. American journal of medical genetics Part A. 2012;158(1):174–81.
  30. Žilina O, Reimand T, Zjablovskaja P, Männik K, Männamaa M, Traat A, et al. Maternally and paternally inherited deletion of 7q31 involving the FOXP2 gene in two families. American journal of medical genetics Part A. 2012;158(1):254–6.
  31. Laffin JJS, Raca G, Jackson CA, Strand EA, Jakielski KJ, Shriberg LD. Novel candidate genes and regions for childhood apraxia of speech identified by array comparative genomic hybridization. Genet Med [Internet]. 2012. Available from:
  32. Turner SJ, Hildebrand MS, Block S, Damiano J, Fahey M, Reilly S, et al. Small intragenic deletion in FOXP2 associated with childhood apraxia of speech and dysarthria. American journal of medical genetics Part A. 2013;161(9):2321–6.
  33. Roll P, Vernes SC, Bruneau N, Cillario J, Ponsole-Lenfant M, Massacrier A, et al. Molecular networks implicated in speech-related disorders: FOXP2 regulates the SRPX2/uPAR complex. Hum Mol Genet. 2010;19(24):4848–60. pmid:20858596
  34. Fisher SE, Scharff C. FOXP2 as a molecular window into speech and language. Trends Genet. 2009;25(4):166–77. pmid:19304338
  35. Meaburn E, Dale PS, Craig I, Plomin R. Language-impaired children: No sign of the FOXP2 mutation. Neuroreport. 2002;13(8):1075. pmid:12060812
  36. O'Brien EK, Zhang X, Nishimura C, Tomblin JB, Murray JC. Association of specific language impairment (SLI) to the region of 7q31. Am J Hum Genet. 2003;72(6):1536–43. pmid:12721956
  37. Newbury DF, Bonora E, Lamb JA, Fisher SE, Lai CSL, Baird G, et al. FOXP2 Is Not a Major Susceptibility Gene for Autism or Specific Language Impairment. The American Journal of Human Genetics. 2002;70(5):1318–27.
  38. Rice ML, Smith SD, Gayán J. Convergent genetic linkage and associations to language, speech and reading measures in families of probands with Specific Language Impairment. Journal of Neurodevelopmental Disorders. 2009;1(4):264–82. pmid:19997522
  39. Luciano M, Evans D, Hansell N, Medland S, Montgomery G, Martin N, et al. A genome‐wide association study for reading and language abilities in two population cohorts. Genes, Brain and Behavior. 2013;12(6):645–52.
  40. Harlaar N, Meaburn EL, Hayiou-Thomas ME, Davis OS, Docherty S, Hanscombe KB, et al. Genome-Wide Association Study of Receptive Language Ability of 12-Year-Olds. Journal of Speech, Language, and Hearing Research. 2014;57(1):96–105. pmid:24687471
  41. The SLI Consortium. A genomewide scan identifies two novel loci involved in specific language impairment. The American Journal of Human Genetics. 2002;70(2):384–98. pmid:11791209
  42. The SLI Consortium. Highly significant linkage to the SLI1 locus in an expanded sample of individuals affected by specific language impairment. The American Journal of Human Genetics. 2004;74(6):1225–38. pmid:15133743
  43. Bartlett CW, Flax J, Logue MW, Vieland VJ, Bassett A, Tallal P, et al. A Major Susceptibility Locus for Specific Language Impairment Is Located on 13q21. The American Journal of Human Genetics. 2002;71(1):45–55. pmid:12048648
  44. Bartlett CW, Flax JF, Logue MW, Smith BJ, Vieland VJ, Tallal P, et al. Examination of Potential Overlap in Autism and Language Loci on Chromosomes 2, 7, and 13 in Two Independent Samples Ascertained for Specific Language Impairment. Hum Hered. 2004;57(1):10–20. pmid:15133308
  45. Nudel R, Simpson N, Baird G, O'Hare A, Conti‐Ramsden G, Bolton P, et al. Genome‐wide association analyses of child genotype effects and parent‐of‐origin effects in specific language impairment. Genes, Brain and Behavior. 2014;13(4):418–29.
  46. Eicher J, Powers N, Miller L, Akshoomoff N, Amaral D, Bloss C, et al. Genome‐wide association study of shared components of reading disability and language impairment. Genes, Brain and Behavior. 2013;12(8):792–801.
  47. Monaco AP. Multivariate Linkage Analysis of Specific Language Impairment (SLI). Ann Hum Genet. 2007;71(5):660–73.
  48. Addis L, Friederici AD, Kotz SA, Sabisch B, Barry J, Richter N, et al. A locus for an auditory processing deficit and language impairment in an extended pedigree maps to 12p13.31‐q14.3. Genes, Brain and Behavior. 2010;9(6):545–61.
  49. Falcaro M, Pickles A, Newbury DF, Addis L, Banfield E, Fisher SE, et al. Genetic and phenotypic effects of phonological short-term memory and grammatical morphology in specific language impairment. Genes, Brain and Behavior. 2008;7(4):393–402.
  50. Gialluisi A, Newbury DF, Wilcutt EG, Olson RK, DeFries JC, Brandler WM, et al. Genome‐wide screening for DNA variants associated with reading and language traits. Genes, Brain and Behavior. 2014;13(7):686–701.
  51. Pourcain B, Cents RA, Whitehouse AJ, Haworth CM, Davis OS, O’Reilly PF, et al. Common variation near ROBO2 is associated with expressive vocabulary in infancy. Nature communications. 2014;5.
  52. Wiszniewski W, Hunter JV, Hanchard NA, Willer JR, Shaw C, Tian Q, et al. TM4SF20 Ancestral Deletion and Susceptibility to a Pediatric Disorder of Early Language Delay and Cerebral White Matter Hyperintensities. The American Journal of Human Genetics. 2013;93(2):197–210. pmid:23810381
  53. Tomblin JB, Records NL, Buckwalter P, Zhang X, Smith E, O'Brien M. Prevalence of specific language impairment in kindergarten children. Journal of Speech, Language, and Hearing Research. 1997 Dec 1;40(6):1245–60. pmid:9430746
  54. Tomblin JB. General Design and Methods. In: Tomblin JB, Nippold MA, editors. Understanding individual differences in language development across the school years. New York: Psychology Press; 2014. p. 11–46.
  55. McKean C, Mensah FK, Eadie P, Bavin EL, Bretherton L, Cini E, Reilly S. Levers for Language Growth: Characteristics and Predictors of Language Trajectories between 4 and 7 Years. PloS one. 2015 Aug 4;10(8):e0134251. pmid:26241892
  56. Breiman L. Random Forests. Machine Learning. 2001 Oct;45(1):5–32.
  57. Chen X, Ishwaran H. Random forests for genomic data analysis. Genomics. 2012 Jun 30;99(6):323–9. pmid:22546560
  58. Goldstein BA, Polley EC, Briggs F. Random forests for genetic association studies. Statistical applications in genetics and molecular biology. 2011 Jan 1;10(1).
  59. Michaelson JJ, Alberts R, Schughart K, Beyer A. Data-driven assessment of eQTL mapping methods. BMC genomics. 2010 Sep 17;11(1):502.
  60. Purcell S, Cherny S, Sham P. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics. 2003;19(1):149–50. pmid:12499305
  61. Konopka G, Friedrich T, Davis-Turak J, Winden K, Oldham Michael C, Gao F, et al. Human-Specific Transcriptional Networks in the Brain. Neuron. 2012;75(4):601–17. pmid:22920253
  62. Spiteri E, Konopka G, Coppola G, Bomar J, Oldham M, Ou J, et al. Identification of the Transcriptional Targets of FOXP2, a Gene Linked to Speech and Language, in Developing Human Brain. The American Journal of Human Genetics. 2007;81(6):1144–57. pmid:17999357
  63. Vernes SC, Oliver PL, Spiteri E, Lockstone HE, Puliyadi R, Taylor JM, et al. Foxp2 regulates gene networks implicated in neurite outgrowth in the developing brain. PLoS Genet. 2011;7(7):e1002145. pmid:21765815
  64. Vernes SC, Spiteri E, Nicod J, Groszer M, Taylor JM, Davies KE, et al. High-throughput analysis of promoter occupancy reveals direct neural targets of FOXP2, a gene mutated in speech and language disorders. Am J Hum Genet. 2007;81(6):1232–50. pmid:17999362
  65. Vernes SC, Newbury DF, Abrahams BS, Winchester L, Nicod J, Groszer M, et al. A functional genetic link between distinct developmental language disorders. N Engl J Med. 2008;359(22):2337–45. pmid:18987363
  66. Alarcón M, Abrahams BS, Stone JL, Duvall JA, Perederiy JV, Bomar JM, et al. Linkage, Association, and Gene-Expression Analyses Identify CNTNAP2 as an Autism-Susceptibility Gene. The American Journal of Human Genetics. 2008;82(1):150–9. pmid:18179893
  67. Lieberman P. On the nature and evolution of the neural bases of human language. American Journal of Physical Anthropology. 2002;119(S35):36–62.
  68. Whitehouse AJO, Bishop DVM, Ang Q, Pennell C, Fisher SE. CNTNAP2 variants affect early language development in the general population. Genes, Brain and Behavior. 2012;11(4):501-.
  69. Newbury DF, Paracchini S, Scerri TS, Winchester L, Addis L, Richardson AJ, et al. Investigation of Dyslexia and SLI Risk Variants in Reading- and Language-Impaired Subjects. Behavior Genetics. 2011;41(1):90–104. pmid:21165691
  70. Peter B, Raskind W, Matsushita M, Lisowski M, Vu T, Berninger V, et al. Replication of CNTNAP2 association with nonword repetition and support for FOXP2 association with timed reading and motor activities in a dyslexia family sample. Journal of Neurodevelopmental Disorders. 2011;3(1):39–49. pmid:21484596
  71. French CA, Jin X, Campbell T, Gerfen E, Groszer M, Fisher S, et al. An aetiological Foxp2 mutation causes aberrant striatal activity and alters plasticity during skill learning. Mol Psychiatry. 2011;17(11):1077–85. pmid:21876543
  72. Reimers-Kipping S, Hevers W, Pääbo S, Enard W. Humanized Foxp2 specifically affects cortico-basal ganglia circuits. Neuroscience. 2011;175:75–84. pmid:21111790
  73. Schreiweis C, Bornschein U, Burguière E, Kerimoglu C, Schreiter S, Dannemann M, et al. Humanized Foxp2 accelerates learning by enhancing transitions from declarative to procedural performance. Proceedings of the National Academy of Sciences. 2014;111(39):14253–8.
  74. Tomblin JB, Nippold MA. Understanding individual differences in language development across the school years: Psychology Press; 2014.
  75. Tomblin JB, Records NL, Zhang X. A system for the diagnosis of specific language impairment in kindergarten children. Journal of Speech, Language and Hearing Research. 1996;39(6):1284.
  76. Newcomer PL, Hammill DD. Test of Language Development-2: Primary: Pro-ed Austin, TX; 1988.
  77. Culatta B, Page JL, Ellis J. Story retelling as a communicative performance screening tool. Language, Speech, and Hearing Services in Schools. 1983;14:66–74.
  78. Tomblin JB, Zhang X. The dimensionality of language ability in school-age children. Journal of Speech, Language, and Hearing Research. 2006 Dec 1;49(6):1193–208. pmid:17197490
  79. Wechsler D. Wechsler Preschool and Primary Scale of Intelligence. San Antonio, TX: Pearson; 1967.
  80. Hoover H, Dunbar S, Frisbie D, Oberley K, Ordman V, Naylor R, et al. The Iowa Tests: Interpretive Guide for Teachers and Counselors, Form A, Levels 5–8. Itasca, IL: Riverside Publishing; 2001.
  81. Dunn LM, Dunn LM. Peabody Picture Vocabulary Test-Revised. Circle Pines, MN: American Guidance Service; 1981.
  82. Wiig E, Secord W, Semel E. Clinical Evaluation of Language Fundamentals-Preschool. San Antonio, TX: Psychological Corporation; 1992.
  83. Wechsler D. Wechsler Intelligence Scales for Children-III. San Antonio, TX: Pearson; 1991.
  84. Dunn LM, Dunn LM. Peabody Picture Vocabulary Test-4th Edn. San Antonio, TX: Pearson; 2007.
  85. Richards B, Skoletsky J, Shuber AP, Balfour R, Stern RC, Dorkin HL, et al. Multiplex PCR amplification from the CFTR gene using DNA prepared from buccal brushes/swabs. Hum Mol Genet. 1993;2(2):159–63. pmid:7684637
  86. Gibbs RA, Belmont JW, Hardenbol P, Willis TD, Yu F, Yang H, et al. The international HapMap project. Nature. 2003;426(6968):789–96. pmid:14685227
  87. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome research. 2002;12(6):996–1006. pmid:12045153
  88. Thorisson GA, Smith AV, Krishnan L, Stein LD. The International HapMap project web site. Genome research. 2005;15(11):1592–3. pmid:16251469
  89. Karolchik D, Hinrichs AS, Kent WJ. The UCSC genome browser. Current protocols in bioinformatics. 2009:1.4.1–4.33.
  90. Jorgensen TJ, Ruczinski I, Kessing B, Smith MW, Shugart YY, et al. Hypothesis-Driven Candidate Gene Association Studies: Practical Design and Analytical Considerations. American Journal of Epidemiology. 2009;170:986–93. pmid:19762372
  91. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007;81.