Identification of Candidate Genes for Dyslexia Susceptibility on Chromosome 18

Background
Six independent studies have identified linkage to chromosome 18 for developmental dyslexia or general reading ability. Until now, no candidate genes have been identified to explain this linkage. Here, we set out to identify the gene(s) conferring susceptibility by a two-stage strategy of linkage and association analysis.

Methodology/Principal Findings
Linkage analysis: 264 UK families and 155 US families, each containing at least one child diagnosed with dyslexia, were genotyped with a dense set of microsatellite markers on chromosome 18. Association analysis: Using a discovery sample of 187 UK families, nearly 3000 SNPs were genotyped across the chromosome 18 dyslexia susceptibility candidate region. Following association analysis, the top-ranking SNPs were then genotyped in the remaining samples. The linkage analysis revealed a broad signal that spans approximately 40 Mb from 18p11.2 to 18q12.2. Following the association analysis and subsequent replication attempts, we observed consistent association with the same SNPs in three genes: melanocortin 5 receptor (MC5R), dymeclin (DYM) and neural precursor cell expressed, developmentally down-regulated 4-like (NEDD4L).

Conclusions
Along with already published biological evidence, MC5R, DYM and NEDD4L make attractive candidates for dyslexia susceptibility genes. However, further replication and functional studies are still required.

We performed the first two quantitative-trait locus (QTL) based genome-wide linkage screens for DD using 89 United Kingdom (UK) and 119 United States (US) families [23]. Both revealed their most significant QTLs at chromosome 18p11.2 (DYX6 [MIM 606616]), with various reading-related measures. An independent set of 84 UK families replicated this linkage at 18p11.2, and also revealed linkage at 18q12.2.
We then performed a multi-variate linkage study with the original 89 UK families to explore the contribution of six different reading-related traits to DYX6 [24]. Dropping any one of these reading-related measures from the multi-variate model significantly reduced the linkage at DYX6, thereby implying that all measures were influenced by the underlying QTL.
Here, we conduct linkage analysis by genotyping our UK and US families to an approximate density of 1 microsatellite marker every 5 cM. A further 91 UK and 39 US families were also similarly genotyped. These new families continue to support linkage to chromosome 18. Subsequently, we performed a high-density association study by genotyping nearly 3000 SNPs from 18p11.31

Ethics Statement
Ethical approval for this study for the UK samples was acquired from the Oxfordshire Psychiatric Research Ethics Committee (OPREC O01.02). Written informed consent to participate in this study was obtained from all individuals prior to taking blood or buccal samples for DNA extraction, with the understanding that they could withdraw from the study at any time. Research plans and consent forms for the US families were reviewed and approved by the Institutional Review Boards of both the University of Colorado and the University of Nebraska Medical Center.

Sample collection
The UK families were identified at clinics and schools of the Berkshire area, and have been detailed previously [23]. Families were ascertained if the proband had a British Ability Scales (BAS) single-word reading score >2 standard deviations (SDs) below that predicted by their intelligence quotient (IQ), derived from their verbal and non-verbal reasoning scores, and if at least one other sibling had a record of reading disability. Proband exclusion criteria included other disorders such as specific language impairment (SLI [MIM 606711]), autism (MIM 209850) or attention deficit-hyperactivity disorder (ADHD [MIM 143465]). These criteria identified some probands with high IQ scores and BAS scores within the 'normal' range. Therefore, after collecting 173 UK families the criteria were changed such that the proband's BAS single-word reading score had to be ≥1 SD below the population mean for their age-band (rather than below that predicted by IQ), along with an IQ ≥ 90, and the requirement of reading disability in another sibling was dropped. A further 116 UK families were then collected with these new criteria. The total sample now comprises 289 UK families, with 685 siblings measured for a series of reading-related quantitative traits.
The 155 US families were drawn from the Colorado Learning Disabilities Research Center (CLDRC) twin study of reading disability. Twin pairs were identified from the records of 27 Colorado school districts and ascertained if at least one member had a school history of reading difficulty. Monozygotic twins were excluded, but additional non-twin siblings were included. Each child was assessed for a series of psychometric measures as detailed previously [23].
Briefly, the psychometric measures include graded tests for single-word reading (READ) and spelling (SPELL), tests for orthographic coding by an irregular word task (OC-irreg; UK families only) and a forced choice task (OC-choice; a pseudohomophone detection task), and tests for phoneme awareness (PA) and phonological decoding (PD).
We also report a new collection of 317 UK cases of DD recruited through the Dyslexia Research Centre clinics in Oxford and Reading, and the Aston Dyslexia and Development clinic in Birmingham. These cases are between 8 and 18 years old, and have a BAS2 single-word reading score ≤100 (at chronological age) and >1.5 SDs below that predicted by IQ.
Population controls were taken from the Human Random Control (HRC) panel of the European Collection of Cell Cultures (ECACC). We analysed the DNA of 287 unique samples from these cell lines. Three assumptions are made about these controls. Firstly, that ~5% have DD due to the prevalence of this disorder in the UK. Secondly, that they are unrelated to our DD individuals, which is reasonable as they have been randomly ascertained from the UK and Ireland. Thirdly, that they come from the same ethnic origin as our DD samples, which is important to prevent population stratification from affecting our association analyses.

Genotyping
Highly polymorphic microsatellite markers were genotyped by semi-automated fluorescent genotyping techniques with the ABI3700 machine and Genotyper® software from Applied Biosystems as previously described [23].
SNPs were genotyped with GoldenGate assays from Illumina® [36] or iPLEX assays from SEQUENOM® [37,38], according to the manufacturers' instructions. Two GoldenGate assays of 1,536 SNPs were created for genotyping in a single multiplex reaction. After amplification, hybridisation and washing steps, the arrays were scanned and analysed to generate genotypes, which were then verified by eye. SEQUENOM®'s Assay Design software was used to design the PCR and extension primers for each SNP after downloading and processing sequences with Biomart and RealSNP, respectively. Genotypes were called automatically and verified by eye using SEQUENOM®'s Typer software.

Gene and SNP Selection
The SNP selection and genotyping were conducted in two stages. Both stages utilised the International HapMap Project (IHMP) genotype data from 30 Centre d'Etude du Polymorphisme Humain (CEPH) trios to guide SNP selection (see Table S1). Our second stage superseded the first as more genotype data were then available from the IHMP, thereby increasing the number of available polymorphisms to test and the resolution of linkage disequilibrium. Exclusion criteria for the IHMP SNPs were deviations from Hardy-Weinberg (H-W) equilibrium (p-value < 0.001), low genotype call rates (≤50%), Mendelian inheritance errors (>0) and low minor allele frequency (<5%). HAPLOVIEW [39] created blocks of SNPs in strong LD according to the definition of Gabriel et al. [40], and subsequently selected haplotype-tagging SNPs (htSNPs) within each block to tag all haplotypes of ≥3% frequency. htSNPs from all "genic blocks" in the candidate region were selected for genotyping, where we define a "genic block" as a Gabriel block covering any part of a gene (including any additional upstream or downstream sequence).
Three genes remained totally uncovered by any block, whilst all others were covered completely or partially by at least one block. Details for all genes in the candidate region were downloaded from the University of California, Santa Cruz (UCSC) Genome Table Browser, using the July 2003 and May 2004 freezes for the first and second stages, respectively (see Table S2).
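The exclusion criteria described in this section can be expressed as a simple per-SNP filter. The following is a minimal sketch only, assuming the thresholds are read as H-W p-value < 0.001, call rate ≤ 50%, any Mendelian inheritance error, and minor allele frequency < 5%. The `SnpQc` record, its field names and the example SNP identifiers are hypothetical and are not taken from the IHMP data format or from HAPLOVIEW.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality metrics (hypothetical record, for illustration only)."""
    snp_id: str
    hwe_p: float          # p-value of the Hardy-Weinberg equilibrium test
    call_rate: float      # fraction of samples with a called genotype (0..1)
    mendel_errors: int    # number of Mendelian inheritance errors in trios
    maf: float            # minor allele frequency (0..0.5)


def passes_qc(snp: SnpQc) -> bool:
    """Return True if the SNP survives all four exclusion criteria."""
    if snp.hwe_p < 0.001:        # deviates from H-W equilibrium
        return False
    if snp.call_rate <= 0.50:    # low genotype call rate
        return False
    if snp.mendel_errors > 0:    # any Mendelian inheritance error
        return False
    if snp.maf < 0.05:           # low minor allele frequency
        return False
    return True


if __name__ == "__main__":
    # Made-up example records, not real HapMap SNPs.
    candidates = [
        SnpQc("snp_example_1", hwe_p=0.40, call_rate=0.98, mendel_errors=0, maf=0.21),
        SnpQc("snp_example_2", hwe_p=0.0004, call_rate=0.99, mendel_errors=0, maf=0.30),
        SnpQc("snp_example_3", hwe_p=0.75, call_rate=0.97, mendel_errors=0, maf=0.02),
    ]
    kept = [s.snp_id for s in candidates if passes_qc(s)]
    print(kept)
```

Only SNPs passing such a filter would then be passed on to block definition and htSNP selection.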

Data handling and analysis
The Integrated Genotyping System (IGS) [41] was used to store and check genotypes for Mendelian inheritance problems. MERLIN (1.1.1) [42] was used to detect unlikely double recombinants which might indicate further genotyping errors. PEDSTATS (0.6.9) [43] was used to assess levels of H-W equilibrium. STRUCTURE [44,45] was used to assess population structure within and between sample sets by comparing the genotypes of 28 SNPs at genomic loci unlinked to DD. STRUCTURE was executed with a burn-in length of 1,000,000 followed by 1,000,000 iterations until completion. Analyses were performed within and between each sample set assuming K = 1 sub-population (i.e. no population stratification), and K = 2 and 3 sub-populations (assuming a model of admixture), and revealed a homogeneous ancestry of samples.
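To illustrate what an assessment of H-W equilibrium involves, the sketch below computes a one-degree-of-freedom chi-square goodness-of-fit test from the three genotype counts of a biallelic SNP. It is an assumption-laden illustration, not a description of the test that PEDSTATS applies; the function name and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up genotype counts for illustration.
    print(hwe_chi_square(25, 50, 25))   # counts in exact H-W proportions
    print(hwe_chi_square(50, 0, 50))    # complete heterozygote deficit
```

A SNP whose p-value falls below the chosen threshold would be flagged as deviating from H-W equilibrium.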

Linkage analysis
Multi-point linkage analysis was performed. GENEHUNTER (2.1_r2 beta) [46] was used to apply the traditional Haseman-Elston (HE) sibling-pair squared trait-differences model [47] or a variance components (VC) framework without dominance variance and with a single-trait mean. The DeFries-Fulker (DF) regression technique [48] was applied with scripts and macros for the SAS package [49].
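The idea behind the traditional HE model can be sketched as an ordinary least-squares regression of the squared sibling-pair trait difference on the proportion of alleles shared identical by descent (IBD) at a locus, where a negative slope is suggestive of linkage. The code below is a simplified single-point illustration under that assumption and does not reproduce GENEHUNTER's multi-point implementation; the function name and the toy data are hypothetical.

```python
def haseman_elston(pairs):
    """Regress squared sib-pair trait differences on IBD sharing.

    pairs: iterable of (trait_sib1, trait_sib2, ibd_proportion), where
    ibd_proportion is the estimated proportion of alleles shared IBD (0..1).
    Returns (slope, intercept) of the least-squares line.
    """
    pairs = list(pairs)
    if len(pairs) < 2:
        raise ValueError("at least two sibling pairs are required")
    xs = [ibd for _, _, ibd in pairs]
    ys = [(t1 - t2) ** 2 for t1, t2, _ in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("IBD sharing does not vary across pairs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Toy data: pairs sharing more alleles IBD are more alike in trait value.
    toy_pairs = [
        (0.0, 2.0, 0.0),
        (0.0, 1.0, 1.0),
        (1.0, 3.0, 0.0),
        (2.0, 3.0, 1.0),
    ]
    slope, intercept = haseman_elston(toy_pairs)
    print(slope, intercept)
```

In practice the significance of the slope would be assessed with a one-sided test, which this sketch omits.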

Association analyses
Family-based samples with their quantitative traits were analysed with the 'total association' option within QTDT (2.5.1) [50,51]. The association tests were not adjusted for linkage. All traits were tested against each SNP individually. Qualitative population-based analyses were performed with PLINK (1.01) [52], which supports the genotype and allele count tests. Quantile-quantile plots validated these tests (see Figure S1).
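As an illustration of an allele count test in a case-control sample, the sketch below builds the 2 × 2 table of allele counts from genotype counts and computes a Pearson chi-square statistic with one degree of freedom. It is a minimal sketch of the general technique and is not PLINK's code; the function name and the example genotype counts are hypothetical.

```python
import math


def allele_count_test(case_genotypes, control_genotypes):
    """Allelic chi-square test for one biallelic SNP.

    Each argument is a tuple of genotype counts (n_AA, n_AB, n_BB).
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    def allele_counts(genotypes):
        n_aa, n_ab, n_bb = genotypes
        return 2 * n_aa + n_ab, 2 * n_bb + n_ab   # counts of allele A, allele B

    a, b = allele_counts(case_genotypes)      # case alleles
    c, d = allele_counts(control_genotypes)   # control alleles
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the allele table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up genotype counts for illustration.
    print(allele_count_test((120, 150, 47), (90, 140, 57)))
```

The genotype test mentioned in the text would instead compare the three genotype classes directly in a 2 × 3 table with two degrees of freedom.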

Linkage analysis of DD susceptibility on chromosome 18
The 89 UK families and 116 (of the 119) US families used for the original whole genome-wide scans and the 84 UK families used for replication [23], were genotyped with a denser set of microsatellite markers. We detected the same linkages as previously reported (see Figures S2 and S3). Subsequently, we genotyped a third set of 91 UK families and a second set of 39 US families with microsatellite markers to the same high density.
Both new samples reveal linkage at 18q12.2: the 91 UK families with OC-irreg (LOD≈1.5; see Figure S2) and the 39 US families with OC-choice (LOD≈3.5; see Figure S3). Linkage at 18p11.2 was also observed in the 39 US families with PD, SPELL and OC-choice (LOD≈1.5) and also at 18q21 with PD and READ (LOD>3.5). Combining all 264 UK families together produced linkage at 18q12 with a LOD>2 and at 18p11.2, 18q12.2 and 18q22.3 with a LOD≈1.5 with various traits (see Figure S2).
For analysis with the DF regression technique, UK or US families were selected if at least one child scored >1.5 SDs or >2.0 SDs, respectively, below the normative mean for any one of the reading-related traits (see Figure 1). This selection yielded 188 UK families and 133 US families. DF analysis revealed strong linkage.
This linkage analysis therefore provides supporting evidence of a DD susceptibility gene on chromosome 18. However, we were unable to narrow down the region and instead extended the linkage to the other side of the centromere. We therefore carried out a high-throughput genotyping and association strategy of all genic regions in the broader candidate region spanning 18p11.31 to 18q21.31. A total of 768 samples, including quality control samples, were genotyped in 8 × 96-well plates, which represented the most cost-effective strategy. To enable family-based tests, the 188 UK families that gave strongest linkage to chromosome 18 with the DF analysis were selected for genotyping. Following QC procedures one family was removed. The remaining 187 UK families included 68 from the original 89 families used in the whole genome-wide linkage scan, 53 from the 84 families used in the replication of DYX6, and 66 from the 91 families newly reported here.

Association analysis: discovery stage
Altogether, 2,895 SNPs in our candidate region were successfully genotyped in 759 samples, and of these >97% of SNPs and >99% of samples had a genotype call rate >96%. The 759 samples included 187 successfully genotyped UK families, known henceforth as the "discovery sample" (see Table 1).
Quantitative association analyses with each of the six quantitative traits were performed on all 2,895 SNPs in the discovery sample (see Figure S4 and Table S3 for the complete results, and  Table S4 for

Association analysis: replication stage
The 11 most significant SNPs (p-values < 0.001) from the discovery stage were selected for replication in independent samples. Another 14 highly ranked SNPs (p-values between 0.001 and 0.002) were also selected if they were compatible with the iPLEX assays. In total, 25 SNPs were selected for further genotyping (see Table 2) in our independent samples consisting of 102 UK families, 152 US families, 317 UK DD cases and 287 UK population controls (see Table 1). Tests in the family samples were performed with QTDT whilst tests in the cases and controls were performed with PLINK (see Tables S5 & S6 for the complete results).
Significant results were observed in the same direction as the discovery sample for 5 SNPs with the QTDT analysis and 5 SNPs with either of the population-based analyses (p-values < 0.05; see Table 3). Of particular note are two SNPs that gave significant results in both the 102 UK families and the case-control samples: rs1299348 within MC5R and rs11873029 within DYM. Also of note are the two SNPs rs8094327 and rs12606138 within NEDD4L that both replicated in the case-control samples.

Discussion
In the present study we confirm linkage to DD on chromosome 18 by genotyping an extended sample of our UK and US families with a denser set of microsatellite markers. By combining all UK or US families, the linkage signals appear broad (>40 Mb) and span the centromere from 18p11.2 to 18q12.2.
We then performed a high-throughput SNP genotyping experiment covering genes from 18p11.31 to 18q21.31 in a subset of 187 UK families. Highly associated SNPs were then genotyped in the remaining independent samples: 102 UK families, 152 US families, 317 UK DD cases and 287 UK controls (see Table 1). We found associations for SNPs within several genes, particularly MC5R, DYM and NEDD4L, with the same trend in independent samples. Between samples, the associations were not always with the same trait or test, and so not all are replications in the purest sense. However, we know from the multivariate linkage analysis that all six traits are influenced by the same underlying QTL(s) on chromosome 18. Furthermore, there is consistency between the observed linkage and association patterns. The linkage signals in both UK and US samples are spread across chromosome 18, spanning some 40 Mb, and accordingly we find associations to genes located along this chromosome (see Figure 2). Hence, it would be attractive to speculate that this spread of associations across the candidate region explains the broad spread of linkages. Indeed, READ gives strongest linkage to 18p11.21 in the discovery sample (see Figure 1), and the genes PTPN2, C18orf1, C18orf15, MC5R and ZNF519 at 18p11.21 each appear strongly associated with READ (see Table S4). On the q-arm, the strongest linkage is at 18q12.2 in the discovery sample with PD, READ, OC-irreg and PA, and the genes C18orf34 and RIT2, at 18q12.1 and 18q12.3 respectively, are the most strongly associated for all or most of these traits. Also on the q-arm, at 18q21, are DYM and NEDD4L, which together are strongly associated with these traits too.
MC5R is a G-protein-coupled 7-transmembrane receptor [53] that binds melanocortins, including the neuropeptides adrenocorticotropic hormone (ACTH) and α-, β- and γ-melanocyte-stimulating hormones (α-, β- and γ-MSH). The melanocortins are involved in skin physiology, behaviour, learning and memory [54,55,56]. ACTH, or peptides derived from it, are implicated in attention, visual attention, analytical thinking, spatial awareness and musical ability [57,58,59,60]. MC5R is a single exon gene less than 1 kb in length. Here, we find association to the SNP rs1299348 within this gene in 3 independent samples. The major allele of ~65% frequency confers risk to DD susceptibility.
DYM is nearly 500 kb long, and mutations in this gene cause Dyggve-Melchior-Clausen syndrome [MIM 223800] [61], characterised by short trunk dwarfism, developmental delay, microcephaly and psychomotor retardation. We find here an association to a single SNP, rs11873029, in the discovery sample. We then found association to a proxy SNP in our independent UK families and UK case and control samples. The two SNPs are found in separate introns of DYM, and the major alleles of ~85% frequency confer risk to DD susceptibility.
NEDD4L has 78% amino acid sequence identity with neural precursor cell expressed, developmentally down-regulated 4 (NEDD4 [MIM 602278]) and is implicated in the regulation of the epithelial sodium channel [62]. A SNP within NEDD4L (rs4149601) has been separately associated to ADHD [63], hypertension and blood pressure [64,65,66,67], and another SNP (rs2288774) has also been associated with ADHD [63]. DD shows strong co-morbidity with ADHD [68] and has also been tentatively associated to low blood pressure [69]. Proxies of these two SNPs that were genotyped in the discovery sample did not reveal association with DD. However, a marked haplotypic diversity has previously been observed within NEDD4L such that opposite alleles of the same SNPs associate with hypertension in different white populations [65,70]. NEDD4L is about 350 kb in length, and we identify here three associated intronic SNPs in our discovery sample that are separated by about 135 kb. Two of these SNPs replicated in our case and control samples (see Table 3). The major alleles, of 70-80% frequency, for each of these SNPs appear to confer risk for DD susceptibility.

Table 3. Results for any SNPs that replicated in any independent sample.

The power of this study was limited by a relatively small sample size. Another limitation was the failure of some SNPs to genotype and the technical constraints of genotyping other SNPs, as this has led to some genetic variability remaining untested. A further issue not yet addressed is that of multiple testing. The p-values reported here are all uncorrected for multiple testing. Given that 2,895 SNPs and six quantitative traits were analysed in the discovery stage analysis, a total of 17,370 tests were performed in the discovery sample (see Table 1). In the follow-up replication stages, 144 tests were performed in the independent UK families, 120 tests in the US families and 48 tests with the independent UK cases and controls.
A Bonferroni-corrected significance threshold for the discovery stage is therefore 2.87 × 10⁻⁶, and the thresholds for the replication stages are 3.47 × 10⁻⁴ (UK families), 4.17 × 10⁻⁴ (US families) and 1.04 × 10⁻³ (UK cases and controls). None of the SNPs reached this level of significance in either the discovery or replication stages. However, a simple Bonferroni correction is highly conservative here as many of the SNPs are in strong LD with each other and the traits themselves are highly correlated.
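The arithmetic behind these thresholds can be checked with a short calculation. The sketch below assumes a family-wise significance level of 0.05, which is not stated explicitly in the text but is consistent with the reported values; the function name and dictionary labels are illustrative only. Note that 0.05/17,370 is approximately 2.879 × 10⁻⁶, so the discovery-stage figure quoted above appears to be truncated rather than rounded.

```python
ALPHA = 0.05  # assumed family-wise significance level (not stated in the text)


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Number of tests per stage, as reported in the text.
tests_per_stage = {
    "discovery (2,895 SNPs x 6 traits)": 2895 * 6,
    "replication, UK families": 144,
    "replication, US families": 120,
    "replication, UK cases and controls": 48,
}

if __name__ == "__main__":
    for stage, n_tests in tests_per_stage.items():
        threshold = bonferroni_threshold(n_tests)
        print(f"{stage}: {n_tests} tests, threshold {threshold:.3e}")
```

Because the correction treats all tests as independent, it is best viewed as a lower bound on the appropriate threshold when SNPs are in LD and traits are correlated.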
We also recognise that our association results were not as consistent or significant as the linkage study, and this may in part be due to rare variations present in individual families. However, with our approach we did find consistent association to genetic variants in several independent samples for MC5R, DYM and NEDD4L. Published biological evidence for these genes makes them attractive candidates with respect to DD susceptibility. We suggest that further independent samples be tested for these genes.

Figure S1. Quantile-quantile plots of the association analyses. Quantile-quantile plots were created for the association analyses performed in QTDT and TDT. The left column displays the SNPs from the first stage, the middle displays the SNPs from the second stage, and the right displays both stages combined. The x-axis is the expected test statistic and the y-axis is the observed test statistic.