CELSR2 is a candidate susceptibility gene in idiopathic scoliosis

A Swedish pedigree with an autosomal dominant inheritance of idiopathic scoliosis was initially studied by genetic linkage analysis, prioritising genomic regions for further analysis. This revealed a locus on chromosome 1 with a putative risk haplotype shared by all affected individuals. Two affected individuals were subsequently exome-sequenced, identifying a rare, non-synonymous variant in the CELSR2 gene. This variant is rs141489111, a c.G6859A change in exon 21 (NM_001408), leading to a predicted p.V2287I (NP_001399.1) change. This variant was found in all affected members of the pedigree, but showed reduced penetrance. Analysis of tagging variants in CELSR1-3 in a set of 1739 Swedish-Danish scoliosis cases and 1812 controls revealed significant association (p = 0.0001) to rs2281894, a common synonymous variant in CELSR2. This association was not replicated in case-control cohorts from Japan and the US. No association was found to variants in CELSR1 or CELSR3. Our findings suggest a rare variant in CELSR2 as causative for idiopathic scoliosis in a family with dominant segregation and further highlight common variation in CELSR2 in general susceptibility to idiopathic scoliosis in the Swedish-Danish population. Both variants are located in the highly conserved GAIN protein domain, which is necessary for the auto-proteolysis of CELSR2, suggesting its functional importance.


Introduction
Idiopathic scoliosis is the most common spinal deformity manifesting in children and adolescents, and is characterised by an abnormal structural curvature of the spine. The prevalence of idiopathic scoliosis is approximately 2-3% worldwide [1,2]. The cause of this disorder remains elusive, but studies on twins have shown a heritability of approx. 40%, indicating the importance of genetic factors [3].
Genome wide association studies have identified several common genetic variants that modulate the susceptibility to this disorder [4][5][6][7] and linkage studies and exome sequencing in families with a high burden of idiopathic scoliosis have suggested some rare variants that modulate susceptibility to this disorder [8][9][10][11][12][13]. The function of the associated genes/variants is in most cases unclear and their role in the pathogenesis of idiopathic scoliosis unknown. It is furthermore clear that much of the underlying genetic risk factors and mechanisms still remain to be identified.
The aim of the current study was to identify the hypothesized single gene underlying an apparently dominant form of idiopathic scoliosis in a family from Sweden. An initial linkage analysis was combined with subsequent exome sequencing to enable prioritization of genome regions for risk variant search. Our analysis highlighted a rare non-synonymous variant in CELSR2 as a plausible idiopathic scoliosis risk variant. The identification of CELSR2 and understanding how it might contribute to the risk of idiopathic scoliosis may shed further light on the pathways and mechanisms involved in the pathogenesis of idiopathic scoliosis. US case-control cohort. The US case-control dataset consisted of an independent cohort of 9,312 Caucasian subjects (1,360 cases and 7,952 controls). Case subjects were recruited at Texas Scottish Rite Hospital for Children in Dallas, Texas as previously described [6] and genotyped using Illumina HumanCoreExome Beadchip array. Informed consents were obtained from all research subjects. The study was approved by the Institutional Review Board of the University of Texas Southwestern Medical Center. For controls, we utilized a single dataset of individuals downloaded from the dbGaP web site [17] from the Geisinger Health System-MyCode, eMERGE III Exome Chip Study under phs000957.v1.p1 (https://www.ncbi.nlm.nih. gov/projects/gap/cgi-bin/study.cgi?study_id=phs000957.v1.p1). The dbGaP controls were previously genotyped on the same microarray platform used for cases. Only subjects self-reported as Non-Hispanic White (NHW) were included in the present study. Phenotypes of all controls were reviewed to exclude any with associated musculoskeletal or neurological disorders.

Family genotyping & linkage analysis
1 ug of genomic DNA from each of the 15 participating family members was used at the SNP&SEQ genotyping facility in Uppsala, Sweden for genotyping on the Illumina HumanOm-niExpressExome-8v1-2 array according to standard protocols (Illumina, San Diego, US) and the results were analyzed using the software GenomeStudio 2011.1 from Illumina. Two controls were also run in parallel. Genotyping was based on cluster files generated from the signal intensities from more than 800 DNA samples processed in parallel to this project. Samples with low (<95%) genotyping rate, markers with low (<99%) success rate, and monomorphic

Exome sequencing
Two individuals, heterozygote carriers of the putative chromosome 1 risk haplotype, were chosen for exome sequencing. 120ng of genomic DNA was used to prepare AmpliSeq libraries run on Ion Torrent (Life Technologies, USA) sequencing according to standard protocols. The sequencing was performed at the Uppsala Genome Center, Uppsala, Sweden. The sequence reads were aligned to the hg19 genome reference assembly using the Ion Proton pipeline (Life Technologies, USA) and single nucleotide variants (SNVs) were identified in each sample using the Torrent Suite Software (Life Technologies, USA).
The list of variants was filtered to retain only non-synonymous/splice/stop variants with a read depth of at least 10x. We also removed any variant with a frequency of 1% or higher in the 1000Genomes dataset (either all populations or Europeans, http://browser.1000genomes. org/, [22]) or the ExAC dataset (all populations or Europeans, exac.broadinstitute.org, [23]).
Based on our linkage and haplotype analysis, we further limited our analysis to variants shared by both exome-sequenced individuals as heterozygous, and located within the linked region on chromosome 1 (GRCh37: 1:84,722,102-113,819,478). Finally, the sequence quality underlying any such variants was manually assessed using the IGV browser (http://www. broadinstitute.org/igv/, [24]) to eliminate false-positive variant calls due to e.g. read-end artifacts.

Bioinformatic analysis
The tissue expression of any genes carrying putative candidate scoliosis risk variants was assessed through the FANTOM5 promoterome browser available at http://fantom.gsc.riken. To analyse the effects of the p. V2287I mutation in CELSR2, we compared the predicted 3D structure of the protein either with a Valine (V) or Isoleucine (I) at position 2287 of the protein. We used the RaptorX tool (available at raptorx.uchicago.edu [31]) for 3D structure predictions with default parameters.
A Taqman assay C_166310564_10 from Life Technologies (Thermo Fisher, Waltham, US), following the manufacturer 0 s instructions, was used to assess the frequency of the rare CELSR2 variant, identified through exome sequencing of the family, in a previously described cohort of Swedish-Danish individuals with idiopathic scoliosis [15].

Genotyping of common CELSR1, 2 and 3 variants in a Swedish-Danish case-control cohort
The Agena (http://agenabio.com/, San Diego, US, previously Sequenom) MassARRAY 1 System, combined with iPLEX 1 chemistry, was used to genotype a set of common variants in CELSR1, 2 and 3. The variants were chosen using the LD TAG SNP Selection tool at https:// snpinfo.niehs.nih.gov/snpinfo/snptag.php, [32]) to tag each gene with an LD threshold of r 2 = 0.8. For practical reasons, only the variants fitting into two iPLEX pools were selected. The 20 common variants in CELSR1-3 were genotyped in the same cohort of Swedish-Danish cases and controls as used previously [15]. The HWE of each marker in controls was assessed, and any sample with <80% call-rate was removed.

Replication studies of associated CELSR2 variants
The genotyping and imputation analysis in the Japanese cohort were carried out as previously described [16].
In the US cohort, initial per-sample quality control measures were applied and we excluded sex inconsistencies and any with missing genotype rate per person more than 0.03. The remaining samples were merged using the default mode in PLINK. 1.9 [33]. Duplicated or related individuals were removed as described in [34]. We used principal component analysis (PCA) [35] on the merged data projected onto HapMap3 samples as recommended by [36] to correct for possible stratification. We used PLINK for per-SNPs quality control including genotyping call-rate per marker (>95%) and deviation from Hardy-Weinberg equilibrium (cutoff p-value = 10 −4 ). To check the association for common SNPs around the CELSR2 gene we performed genotype imputation using minimac3 [37] with the 1000G-Phase3.V.5 reference panel as described in the instructions provided by the software developer available at (http:// genome.sph.umich.edu/wiki/Minimac3_Imputation_Cookbook). Only common SNPs (MAF>0.05) with imputation quality r 2 >0.3 were included for further analysis. Genetic association for the imputed allele dosages for the region around the CELSR2 gene (2Mb each side) was performed by Mach2dat software [38] using logistic regression with gender and ten principal components as covariates.

Identification of candidate genomic regions
Fifteen family members (Fig 1) were genotyped using the Illumina HumanOmniExpressExome-8v1-2 array (genotyped individuals are marked with Roman numerals). The average genotyping call rate, per SNP with sample call rate > 0, was >99%. The average call rate per sample was >99%. A set of 19,030 pruned-in genetic markers with a 100% call rate was used for the subsequent non-parametric linkage analysis. S1 Fig shows the linkage for all chromosomes. The only region of the genome harboring an NPL linkage signal >1, indicating increased sharing of that chromosomal region by affected individuals, was at ca. 100-150 cM on chromosome 1 (Fig 2). This region thus emerged as the most plausible region harboring a putative scoliosis risk gene/variant, shared by the majority of affected family members.
We found that one haplotype co-segregated with idiopathic scoliosis in the pedigree, as shown in blue in Fig 1 (all other haplotypes are marked as white) and in further detail in S2 Table. This putative risk haplotype was found in one copy in all affected individuals, as well as in unaffected individuals I:II, I:IV and II:VII as shown in Fig 1, and most likely explains the linkage signal we identified. Based on this analysis, the putative disease variant was hypothesized to lie between 84, 8 Mb (Genome Reference Consortium version 37, GRCh37), yielding a candidate region of circa 30Mb containing approx. 300 known genes. We restricted our subsequent search for scoliosis risk variants to this genomic region.

Exome sequencing analysis & CELSR2
We exome sequenced II:I and II:III, two members of the pedigree who shared one copy each of the putative risk haplotype on chromosome 1, in order to look for shared rare variants within the haplotype. We obtained >35M reads from each individual, with >95% on target and an average base coverage depth of >100, >98% of target bases covered at least 1x, and >57M bases in target regions (S2 Fig). In total, 64,948 variants were called in the exome analysis; the full set of variants is available as S1 Data. Of these, 40,818 were found in both individuals. 40,291 variants were autosomal, and 16,290 variants were exonic/affecting splice sites. Of these, 7162 variants were predicted to result in stop codon loss or gain, or non-synonymous changes and 61 of them were located within the region of linkage on chromosome 1. Filtering by frequency (allowing <1% minor allele frequency in 1000Genomes ALL/EUR or ExAC ALL/EUR), only three variants remained. Two of these variants were further excluded as being highly likely technical artifacts (S3 Fig). The final filtered list of high-quality non-synonymous variants within the linkage region on chromosome 1 thus consisted of only one single-nucleotide variant. This finding was confirmed by Sanger sequencing (S4 Fig shows individual I:III (wildtype) and I:I (heterozygote carrier)). This variant was rs141489111, a non-synonymous variant resulting in a c.G6859A (NM_001408) change in exon 21 of the cadherin EGF lag seven-pass G-type receptor 2 (CELSR2) gene. Rs141489111 is predicted to result in a p.V2287I (NP_001399.1) change in the protein.
This variant was found on the genotyping array used for the initial linkage analysis. The cosegregation of this variant and the chromosome 1 risk haplotype with scoliosis within the entire pedigree could thus be assessed (green boxes in Fig 1).
The The V2287I variant is located within the ancient and highly conserved GAIN domain [39] (Fig 3). Fig 4 shows the predicted three-dimensional structure of the GAIN domain of CELSR2 (pdb-files available on demand). While the V2287I change is not expected to have major effects on the structure of the protein, the predicted structure of CELSR2 indicates that it is located in close proximity to the H2355-T2357 autoproteolysis cut site [39].
Genotyping of rare CELSR2 variant in an independent cohort 1737 Swedish individuals with idiopathic scoliosis [15] were successfully genotyped for the CELSR2 rs141489111 variant (genotyping success rate of >99%). We identified five independent carriers of the rs141489111 variant, including the proband of the Swedish large family. This equates to an allele frequency of 0.1439% in this case cohort. We found no indication that the four carriers were directly related to the family under study or to each other.

Genotyping of common CELSR1-3 variants in an independent cohort
To further investigate CELSR2 and its related genes CELSR1 and CELSR3, we selected a set of twenty common tagging variants within these genes. Removing any samples with a genotyping success rate <80% yielded data for 1731 cases and 1783 controls.
No significant association was found to any of the variants in CELSR1 or CELSR3, but two variants in CELSR2 showed significant association at the p<0.05 level (Table 1 and  . The more strongly associated variant rs2281894 represents a predicted synonymous R2060R within the GAIN domain of the CELSR2 protein (Figs 3 and 4). These two markers are neither in linkage disequilibrium with each other nor with the rare rs141489111 variant identified in the large family.

Replication study of common CELSR2 variants
The two common variants in CELSR2 that showed significant association to idiopathic scoliosis in the Swedish-Danish population, rs6698843 and rs2281894, were further tested for association in two additional case-control datasets. In a genome-wide association study (GWAS) dataset from Japan (previously described in detail in [16]), neither marker showed association

Discussion
Idiopathic scoliosis has proven to be a complex disorder with high genetic heterogeneity. By linkage analysis and exome sequencing of a family with a high burden of idiopathic scoliosis we identified a rare missense mutation in the highly conserved CELSR2 gene. We suggest a novel causative mechanism in the aetiology of idiopathic scoliosis.
The rare CELSR2 variant co-segregating with idiopathic scoliosis in the pedigree under study is a previously described variant with a frequency of approx. 0.08-0.16% in the general European population. The position of the rare CELSR2 rs141489111 variant, as well as the Gprotein coupled receptor (GPCR) auto-proteolysis-inducing (GAIN) domain within which it is situated, is strongly conserved throughout evolution, arguing for the functional importance of this protein. Upon genotyping of a large cohort of independent Swedish-Danish idiopathic scoliosis patients, we did not find an elevated prevalence. This variant does thus not alone explain a significant part of idiopathic scoliosis susceptibility at the population level, while it remains a plausible candidate for rare, monogenic forms of scoliosis.
In order to further understand the possible importance of CELSR2 in idiopathic scoliosis, we genotyped common variants within CELSR2 and the related genes CELSR1 and 3, in a large independent case-control cohort. Interestingly enough, we found an association of idiopathic The name of each marker, position (build 38), the allele counts and allele frequencies are shown. Also the P-values as well as the odds ratio (OR) and 95% confidence interval (CI) for markers with P-values < 0.05. All P-values are uncorrected.
https://doi.org/10.1371/journal.pone.0189591.t001 scoliosis to another variant, rs2281894, also situated in the GAIN domain of CELSR2. This association could not be replicated in large cohorts of Japanese and US case-control datasets. It remains unknown if this is due to differences in diagnostic criteria/selection of samples or perhaps due to population differences in the underlying genetic risk factors. Future studies, aiming to dissect in further detail the overall importance of the CELSR1-3/associated genes contribution to scoliosis in multiple populations will be undertaken to understand this further. CELSR2 is highly expressed in neuronal tissues, including adult temporal lobe, occipital pole and postcentral gyrus, in addition to whole brain, neurons and fetal and adult spinal chord (as assessed through FANTOM5). In CELSR2-deficient mice, the development and planar organization of ependymal cilia are compromised, leading to fatal hydrocephalus [40].
Compound heterozygosity for mutations in CELSR2 have recently been suggested to cause Joubert syndrome, a ciliopathy disorder, in a young girl [41]. This girl showed a phenotype of cortical heterotopia, microophthalmia, severe growth retardation, cone-shaped epiphyses and growth hormone deficiency. This phenotype is obviously more severe than idiopathic scoliosis and supports the notion that a drastic change in the function of CELSR2, e.g. a gain-of-function due to the mutation, would likely be lethal or cause a more severe phenotype than idiopathic scoliosis. Based on this, it seems more plausible that the mutation identified in the current family would involve a loss of function, with heterozygote carriers having approximately half of the wildtype levels of functioning CELSR2. This is mirrored by the situation in ADGRG6 (also called GPR126), a gene previously associated to idiopathic scoliosis [4]. The protein product of this gene also carries a GAIN domain, and also needs to be auto-proteolysed to function properly. The ADGRG6 variants that have been associated to idiopathic scoliosis have been intronic, and presumably only mildly or not directly functional. In contrast, homozygosity for a mutation that severely reduces the auto-proteolytic function of ADGRG6 has been shown in severe arthrogryposis multiplex congenita [42]. This further argues for the importance of the GAIN domains and supports our idea that any mutations abolishing their function (in ADGRG6 or CELSR2) would likely lead to more severe phenotypes than idiopathic scoliosis.
The penetrance of scoliosis in carriers of the CELSR2 variant in the present study is incomplete, suggesting additional unknown modifiers of disease risk (genetic or environmental). This is a common finding even in clearly monogenic situations (reviewed in e.g. [43]). Another example of this is the reduced penetrance of POC5 mutations segregating with idiopathic scoliosis in the families studied by Patten et al. [13]. They suggest the possible presence of a second risk variant within the families. Another thing to keep in mind is that idiopathic scoliosis is a continuous variable that we dichotomise, and that the difference between a classification as unaffected and mild scoliosis may be very subtle.
The cellular function of CELSR2 is not fully understood but it is known to be important for axon guidance, neuronal migration, and cilium polarity [44][45][46]. CELSR2 is located at the plasma membrane and belongs to the flamingo cadherin subfamily [40]). This group of proteins is involved in contact-mediated communication between cells and consists of a large extra-cellular domain, a seven-pass transmembrane domain and a cytoplasmic tail [47]. The rare variant identified in the family induces a change from valine to isoleucine in the extracellular part of the protein, in the so-called GAIN domain. The GAIN domain is required and necessary for autoproteolysis of the protein [39], and while the result of changing this particular valine to an isoleucine remains unclear, it is plausible to suggest a possible effect on this mechanism.
In conclusion, we have identified CELSR2 as a putative novel risk gene idiopathic scoliosis, and we hypothesise that this effect may be mediated through a disruption of the auto-proteolytic mechanism of the GAIN domain in the CELSR2. Future studies will focus on functional studies to understand the specific mechanisms underlying this disruption. Association of tagging variants in CELSR2 with idiopathic scoliosis in a Swedish case-control dataset. The plot is produced using LocusZoom, available at locuszoom.org. The x-axis shows the position of the variants on the chromosome (in Mb), and relative to CELSR2 and neighbouring genes. The left y-axis shows the -log of the association. The most strongly associated variant, rs2281894, is marked with a diamond; the colour of the other markers (circles) is determined by their linkage disequilibrium (LD) with rs228194 (based on hg19/1000Genomes Nov2014 EUR). The right y-axis shows the recombination rate in the region as a light blue line. (PDF) S1 Table. Phenotypes. Phenotypic information on the family members included in the linkage/exome sequencing analysis. Individual numbering is as in Fig 1. (PDF) S2 Table. Haplotypes. Chromosome 1 Merlin most probable haplotypes flow (haplotypes coded as A, B, C etc, and colour coded for simplicity). Haplotypes are produced using Merlin, according to the most likely pattern of gene flow (-best) and using the-horizontal flag. (PDF) S1 Data. All exonic variants in II:I and II:III. Compressed variant list, annotated by Annovar, containing all the exonic variants found in either II:I or II:III (or both). (TXT)