Exome-Sequencing Confirms DNAJC5 Mutations as Cause of Adult Neuronal Ceroid-Lipofuscinosis

We performed whole-exome sequencing in two autopsy-confirmed cases and an elderly unaffected control from a multigenerational family with autosomal dominant adult neuronal ceroid lipofuscinosis (ANCL). A novel single-nucleotide variation (c.344T>G) in the DNAJC5 gene was identified. Mutational screening in an independent family with autosomal dominant ANCL found an in-frame single codon deletion (c.346_348delCTC) resulting in p.Leu116del. These variants fulfill all genetic criteria for disease-causing mutations: they are found in unrelated families with the same disease, exhibit complete segregation between the mutation and the disease, and are absent in healthy controls. In addition, the associated amino acid changes affect evolutionarily highly conserved residues and are predicted to functionally affect the encoded protein (CSPα). The mutations are located in a cysteine-string domain, which is required for membrane targeting/binding, palmitoylation, and oligomerization of CSPα. We performed a comprehensive in silico analysis of the functional and structural impact of both mutations on CSPα. We found that these mutations dramatically decrease the affinity of CSPα for the membrane. We did not identify any significant effect on the palmitoylation status of CSPα. However, a reduction of CSPα membrane affinity may change its palmitoylation and affect proper intracellular sorting. We confirm that CSPα has a strong intrinsic aggregation propensity; however, it is not modified by the mutations. A complementary disease-network analysis suggests a potential interaction with other NCL genes/pathways. This is the first replication study of the identification of DNAJC5 as the disease-causing gene for autosomal dominant ANCL. 
The identification of the novel gene in ANCL will allow us to gain a better understanding of the pathological mechanism of ANCLs and constitutes a great advance toward the development of new molecular diagnostic tests and may lead to the development of potential therapies.


Introduction
The neuronal ceroid lipofuscinoses (NCLs) are the most common group of inherited neurodegenerative diseases in children, with an incidence in the U.S. of approximately 1 in 12,500 live births [1]. NCLs encompass a genetically heterogeneous group of disorders, clinically characterized by progressive deterioration of cognitive and motor skills, visual impairment, and premature death [2]. The age at onset of the clinical symptoms, together with differences in the ultrastructural features of the lipopigment inclusions, underlies the nosological spectrum of the NCLs: infantile (INCL, Santavuori-Haltia, MIM 256730), late-infantile (LINCL, Jansky-Bielschowsky, MIM 204500), juvenile (JNCL, Batten disease, Spielmeyer-Vogt, MIM 204200), adult (ANCL, Kufs' disease, MIM 204300), and Northern epilepsy (NE, progressive epilepsy with intellectual disability) [2].
Adult onset NCLs (ANCLs) represent between 1.3% and 10% of NCLs cases [4]. ANCLs are rapidly worsening conditions with a wide range of age at onset (6-62 yr) and broad clinical variability. Two main clinical subtypes have been described: progressive myoclonus epilepsy (type A), and dementia with motor disturbances, such as cerebellar, extrapyramidal signs, and dyskinesia (type B). However, there is some overlap, or a continuum of signs between the two types, particularly late in the course of the disease. Therefore, it is not always easy to differentiate them [4]. Unlike other NCLs there is an absence of retinal degeneration [5]. Pathologically, the ceroid-lipofuscin accumulates mainly in neurons and contains subunit C of the mitochondrial adenosinetriphosphate synthase (SCMAS) [6], but has different ultrastructural appearances such as granular osmiophilic deposits (GRODs) and fingerprint, curvilinear, or rectilinear structures [7].
ANCLs are genetically heterogeneous with either a sporadic, autosomal recessive (Kufs' disease, MIM 204300) or dominant (Parry's disease, MIM 162350) pattern of inheritance in confirmed cases [4,5]. Three known NCL genes were previously associated with atypical ANCLs such as PPT1 [8,9], CLN5, and N-Sulfoglucosamine Sulfohydrolase (SGSH, MIM 605270) [10], suggesting that some of the ANCLs were not distinct genetic entities and raising the possibility that they actually represent an extreme of a clinical spectrum of low penetrance and variable expressivity of NCL mutants [10,11]. This knowledge, although significant, could not contribute to molecular elucidation of the ANCLs.
That was the state of the art when we began an exome-sequencing study of the largest family with autosomal dominant ANCL known to date (10 affected members over 5 generations) (Figure 1A) [12]. However, during the course of this study, two genes, a well-known NCL gene (CLN6) [11] and DNAJC5 (MIM 611203) [13], were associated with autosomal recessive (locus NCL4A) and dominant (locus NCL4B) cases of ANCL, respectively. There was no apparent correlation of the underlying genetic defect with the clinical course and the ultrastructural features between the studies and, to date, neither of them has been independently replicated. Independent replication studies are especially important in ANCLs, where a critical evaluation of the literature led to the rejection of 68 (out of 118) cases published as Kufs' disease [5], which indicates a high rate of misdiagnosis in ANCLs. These results also indicate that the genetic architecture of ANCL is more varied and complex than previously thought.
In this study, we performed exome sequencing in three members of a family with autosomal dominant ANCL with early dementia [12] (Figure 1A). Several recent studies have successfully identified the disease-causing genetic variant in non-NCL diseases with similar pedigrees by exome sequencing [14]. Exome sequencing is a very powerful technique, especially in families that are not big enough for classical linkage studies. In previous studies, the pathogenic variant was identified by sequencing as few as two or three individuals [15]. This method is a hypothesis-free approach that allows for targeted enrichment and resequencing of nearly all exons of protein-coding genes. Protein-coding exons account for only 1% of the human genome, but 85% of Mendelian diseases are caused by mutations in this genomic space.

Exome Sequencing
Overall, a mean of 158.6 million reads was generated for the three samples. Approximately 80% of these were aligned to the human reference genome (hg18) and 95% of these fell onto targeted and enriched exons. Reads not corresponding to the targeted bases of the exome were discarded (less than 2%). The mean exome coverage was 94.7% with >5-fold coverage. After filtering for a minimum length of good-quality sequence of 30 and a minimal sequence read depth of 5× (standard parameters), we identified on average 38,179 ± 84 coding single nucleotide substitutions (SNSs) at a transitions-to-transversions ratio of 3.05, and 2,931 ± 10 small insertions and deletions (indels), among all three individuals.
To identify pathogenic variants, we consecutively filtered these variants by subjecting them to an analytical pipeline for high-confidence variant calling and annotation. Briefly, we discarded (a) common and known variants present in HapMap, the dbSNP130 database or the 1000 Genomes Project at a frequency of greater than 5% for heterozygous and 30% for homozygous calls, and (b) variants in intergenic or intronic regions [16]. After filtering, 2,337 ± 102 SNPs and 210 ± 2 indels were identified as novel. Next, we focused on functionally significant variants such as missense, nonsense, or splice-site changes, with sufficient and consistent depth and quality values plus a high rate of concordance among the three samples. Thus, we identified 96 SNSs (95 missense, 1 nonsense) and 13 indels that were private variants for these members of this family (Table 1). There were 24 heterozygous non-synonymous coding variants and three heterozygous indels (Table 1) that were present in both affected individuals but absent in the control. In order to remove systematic artifacts and rare variants, we checked these variants against an additional Washington University exome database of 59 individuals, which reduced the list to 19 variants and three indels (Table 2). These 22 high-quality variants had a median SNP quality score of 228 (range 61-6095) and a median read depth of 124.5 (range 9-381), and were selected for validation using two different genotyping technologies: the MassARRAY SNP (Sequenom, Inc) and KASPar v4.0 SNP (KBioscience) genotyping systems [17,18,19]. Only one variant turned out to be a false positive, most likely due to a mapping error. Thus, the false discovery rate was low (1 of 22 variants, 0.045).
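The filtering logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Variant` record, the gene labels, the sample names and the toy data are hypothetical, and the real pipeline also applied depth, quality and concordance filters that are not modeled here. Only the frequency ceilings (5% heterozygous, 30% homozygous), the functional classes and the 1-in-22 false-positive count come from the text.

```python
from dataclasses import dataclass

# Frequency ceilings quoted in the text: variants above these are discarded.
MAX_POP_FREQ = {"het": 0.05, "hom": 0.30}
# Functional classes retained in the text (indels kept as their own class here).
FUNCTIONAL = {"missense", "nonsense", "splice-site", "indel"}


@dataclass(frozen=True)
class Variant:
    gene: str             # hypothetical gene label
    effect: str           # e.g. "missense", "synonymous", "intronic"
    zygosity: str         # "het" or "hom"
    pop_freq: float       # frequency in public databases (0.0 if unreported)
    carriers: frozenset   # sample names carrying the variant


def is_candidate(v, affected, controls):
    """Keep rare, functional variants seen in every affected sample and in no control."""
    if v.pop_freq > MAX_POP_FREQ[v.zygosity]:
        return False
    if v.effect not in FUNCTIONAL:
        return False
    return affected <= v.carriers and not (controls & v.carriers)


# Toy data (hypothetical): two affected relatives and one unaffected control.
affected = frozenset({"affected_1", "affected_2"})
controls = frozenset({"control"})
variants = [
    Variant("GENE_A", "missense", "het", 0.0, frozenset({"affected_1", "affected_2"})),
    Variant("GENE_B", "missense", "het", 0.12, frozenset({"affected_1", "affected_2"})),
    Variant("GENE_C", "synonymous", "het", 0.0, frozenset({"affected_1", "affected_2"})),
    Variant("GENE_D", "nonsense", "het", 0.0, frozenset({"affected_1", "affected_2", "control"})),
    Variant("GENE_E", "indel", "het", 0.0, frozenset({"affected_1"})),
]
candidates = [v.gene for v in variants if is_candidate(v, affected, controls)]
print(candidates)

# False discovery rate reported in the text: 1 false positive among 22 validated calls.
false_discovery_rate = 1 / 22
print(round(false_discovery_rate, 3))
```

In this toy set only `GENE_A` survives: the others are removed for being common, non-functional, present in the control, or absent from one affected sample, respectively.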

Identification of the causative variant
Next, we carried out an extended co-segregation analysis within the family for all the validated SNPs. In total, we genotyped three affected samples and three elderly healthy individuals (See methods). As a result, only two SNSs, one located in PDCD6IP (Programmed cell death 6-interacting protein, MIM 608074) and one in DNAJC5, plus a deletion in the LIPJ gene (lipase-like, abhydrolase domain containing 1, MIM 613921) segregated perfectly with disease status ( Table 2).
It has been shown that disease genes display significant functional clustering in molecular networks [20,21]. One-third of known disorders with two or more associated genes were found in physical clusters of genes with the same phenotype [20]. Therefore, we used a disease-network approach with all NCL genes as a training group (see Methods) to prioritize further validation among these three genes (PDCD6IP, DNAJC5 and LIPJ). Surprisingly, we found that these three genes were among the top five in the combined analysis (see Table S1), suggesting that they may be functionally or structurally related to the NCL genes and constitute true candidates as ANCL causative genes. We also noticed that, of these three variants, those in PDCD6IP and DNAJC5 have the highest Genomic Evolutionary Rate Profiling (GERP) scores and are predicted to be damaging for the protein (Table 2). The deletion in the LIPJ gene was predicted to remove a potential donor splice site by the Human Splicing Finder server (consensus value (CV) of 69.84 for the wild type and 36 for the mutant, a reduction (ΔCV) of 48.45%) [22]. Taking this evidence into account, PDCD6IP, DNAJC5 and LIPJ were all considered plausible candidates for the genetic defect responsible for ANCL. Therefore, we genotyped each variant in a cohort of 1,600 ethnically matched controls (3,200 chromosomes). The p.G429S variant in the PDCD6IP gene and the deletion at the 3′ splice site in the LIPJ gene turned out to be rare variants with a MAF of 0.01 and 0.03, respectively.
The only variant remaining was a single nucleotide substitution (c.344T>G) that causes a p.L115R amino acid change in the DNAJC5 gene. Sanger sequencing was performed to confirm the presence of the mutation in DNAJC5 in all affected individuals in the family (Figure 1B). We also analyzed the exome-sequencing data of all individuals for mutations in genes previously associated with ANCLs, namely PPT1, CLN5, SGSH and CLN6 [8,9,10,11], and found no non-synonymous changes in any of them.
The fact that the c.344T>G variant in DNAJC5 was present in all the affected individuals but not in 1,600 control individuals strongly indicates that this is the underlying genetic cause of the ANCL phenotype in this family. In order to confirm that mutations in the DNAJC5 gene cause ANCL, we used Sanger sequencing to analyze the entire coding sequence plus the exonic flanking regions in three other independent autosomal dominant familial cases of ANCL and one of LINCL (internal validation set). We found an in-frame single codon deletion (c.346_348delCTC), which causes p.L116del (Figure 1B), in a second family. We designed a KASPar assay to test this deletion, and did not detect it in more than 3,200 chromosomes from our control samples. Neither of these variants has been previously reported in dbSNP (build 134) or the 1000 Genomes Project database (20101123 release).
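The relationship between the coding-DNA positions and the protein-level names of the two variants can be checked with simple integer arithmetic. The sketch below is illustrative only; the helper name `codon_of` is an assumption of this example, and the codon assignments used are those of the standard genetic code.

```python
def codon_of(cdna_pos):
    """Map a 1-based coding-DNA position to (codon number, position within codon)."""
    index = cdna_pos - 1
    return index // 3 + 1, index % 3 + 1


# c.344T>G falls on the second base of codon 115, consistent with p.L115R.
print(codon_of(344))

# c.346_348delCTC removes all three bases of codon 116, consistent with p.L116del.
print([codon_of(p) for p in (346, 347, 348)])

# Standard genetic code: every CTN codon encodes leucine and every CGN codon
# encodes arginine, so a T>G change at the second base of a CTN leucine codon
# always yields arginine.
LEU_CTN = {"CTT", "CTC", "CTA", "CTG"}
ARG_CGN = {"CGT", "CGC", "CGA", "CGG"}
mutated = {codon[0] + "G" + codon[2] for codon in LEU_CTN}
print(mutated == ARG_CGN)
```

The deleted triplet CTC is itself a leucine codon, which is why the deletion removes exactly one leucine without shifting the reading frame.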
Thus, combining all the genetic analyses, we found that different novel mutations in DNAJC5 are present in unrelated families and exhibit perfect segregation with disease status. These variants are located in highly conserved regions (GERP score of 5.34), are predicted to be damaging and are not present in 1,600 controls. Therefore, our results replicate the recent report of mutations in the DNAJC5 gene as a cause of ANCL [13].
The p.L115R and p.L116del mutations affect highly conserved dileucine residues located immediately N-terminal to the cysteine-string domain (CSD) (Figure 1C); this region is responsible for palmitoylation, membrane binding/targeting and oligomerization of CSPα [24,25,26]. Therefore, we took advantage of the existence of robust in silico tools (see Methods), which are widely validated with curated experimental data, to test the functional impact of the mutations on these processes.

Palmitoylation Analysis
First, we performed an in silico analysis of the impact of the mutations on the palmitoylation status of CSPα. As shown in Figure 2A, changes in the p.C121-124L residues, used as a positive control, showed a significant reduction in the levels of predicted palmitoylation (3.217 ± 2.723) compared to the wild type (WT) (6.942 ± 2.579) (p < 0.05, Kruskal-Wallis test), which is in agreement with experimental data [25,26]. In contrast, we did not find any significant (p > 0.05) change in the pattern of palmitoylation induced by p.L115R (7.093 ± 1.919) and p.L116del (6.677 ± 2.395) relative to the wild-type sequence, although the p.L115R mutation eliminates palmitoylation of p.C113.

Hydrophobicity Profile of CSPα
The CSD is a highly hydrophobic region, and residues in the N-terminal half are likely to play a key role in membrane association [25,26]. The general index of hydrophobicity (mean ± SD, for the segment spanning residues 110-120) for WT is 1.699 ± 0.69, for p.L115R [...] (Figure 2B). A nonparametric ANOVA (Kruskal-Wallis statistic), followed by Dunn's multiple comparison test, showed that the p.L115R mutation induces a significant reduction in the global hydrophobicity index (p < 0.05), whereas the reduction is not significant for the p.L116del mutation.
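To illustrate why a leucine-to-arginine substitution lowers a segment's mean hydrophobicity far more than deleting one leucine, the sketch below averages per-residue hydropathy values over a short peptide. The text does not name the hydrophobicity scale used, so the Kyte-Doolittle scale here is an assumption, and the six-residue peptides are hypothetical stand-ins rather than the actual CSPα segment.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the study's scale is not stated).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average hydropathy over all residues of the peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def hydropathy_profile(peptide, window=3):
    """Sliding-window mean hydropathy, one value per window start."""
    return [
        mean_hydropathy(peptide[i:i + window])
        for i in range(len(peptide) - window + 1)
    ]


# Hypothetical peptides: a dileucine flanked by cysteines and alanines.
wild_type = "ACLLCA"
leu_to_arg = "ACRLCA"   # mimics a Leu-to-Arg substitution
leu_deleted = "ACLCA"   # mimics deletion of one leucine

for name, peptide in [("WT", wild_type), ("L>R", leu_to_arg), ("Ldel", leu_deleted)]:
    print(name, round(mean_hydropathy(peptide), 3), hydropathy_profile(peptide))
```

In this toy case the substitution replaces one of the most hydrophobic residues on the scale with the most hydrophilic one, whereas the deletion only removes a hydrophobic residue, which is qualitatively in line with the larger effect reported for p.L115R.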

Membrane binding/targeting analysis
Next, we evaluated in silico the propensity of CSPα to interact with the lipid bilayer and the impact of the mutations on the hydropathy of the CSD. We measured the free energy of transfer of the CSD from water to a phosphocholine bilayer interface. In this thermodynamic free energy scale, stable TM helical segments show favorable water-to-membrane transfer free energies (ΔG < 0). We found that WT CSPα has a ΔG of −5.69 kJ/mol, while the mutations p.L116del (ΔG −5.13 kJ/mol) and p.L115R (ΔG −4.32 kJ/mol) reduce this propensity by amounts similar to the positive control p.C113-118-9S (ΔG −4.58 kJ/mol). This result suggests that the mutated domain will release less energy on binding to the membrane than the WT. In addition, we examined the difference between the octanol and interface scales (ΔG_woct − ΔG_wif), a measure that identifies segments that tend to prefer a transbilayer helix conformation relative to an unfolded interfacial location. The wild-type sequence showed a ΔG of 6.55 kJ/mol, while p.C113-119S (ΔG 6.88 kJ/mol), p.L116del (ΔG 7.24 kJ/mol) and p.L115R (ΔG 8.24 kJ/mol) reduced this propensity. This finding suggests that the mutations alter the domain such that CSPα would prefer to be in water rather than associated with the membrane. The results of our simulation are in agreement with the weak affinity for the membrane reported for the CSD of CSPα [25,26]. In addition, these results suggest that the mutations reduce the intrinsic tendency of CSPα to bind to the membrane.
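The size of the effect on the water-to-interface transfer free energy can be expressed as a difference relative to the wild type. The short sketch below does this arithmetic, reading the reported interface values as negative (favorable) free energies in kJ/mol; that reading of the signs, and the dictionary keys, are assumptions of this example.

```python
# Water-to-bilayer-interface transfer free energies (kJ/mol), read as negative values.
delta_g = {
    "WT": -5.69,
    "p.L116del": -5.13,
    "p.L115R": -4.32,
    "positive control": -4.58,
}

# Positive differences mean the transfer is less favorable than for the wild type.
delta_delta_g = {
    name: round(value - delta_g["WT"], 2)
    for name, value in delta_g.items()
    if name != "WT"
}

for name, ddg in sorted(delta_delta_g.items(), key=lambda item: item[1]):
    print(f"{name}: {ddg:+.2f} kJ/mol relative to WT")
```

On this reading, p.L115R shows the largest loss of favorable transfer energy, slightly exceeding the positive control, while p.L116del shows a smaller shift in the same direction.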
In addition, both p.L115R (0.03036 ± 0.001724) and p.L116del (0.5190 ± 0.05983) reduce the probability of CSPα forming membrane-binding domains compared to WT (0.7369 ± 0.03764), and exhibit values similar to the positive control p.C113-116S (0.2269 ± 0.01267) (Figure 2C-D). Thus, the number of residues expected to be part of the membrane-binding domain decreased from 18 residues for WT to less than one for p.L115R, 12 residues for p.L116del, and 5 residues for the positive control (p.C113-116S). This result suggests that the mutations dramatically reduce what has been experimentally demonstrated to be the minimum required membrane-binding domain [25,26].

Oligomerization Analysis
We employed the prediction of amyloid structure aggregation (PASTA) software to identify the regions of CSPα that are likely to stabilize CSPα-CSPα dimers into aggregates [27]. We found that the region between residues p.F110-P138 has a strong propensity to assemble into antiparallel β-sheets (see Figure S1). This is the same core region for oligomerization that was identified experimentally [24,28]. On average this segment has a pairing energy score of −25 in WT, −25.4 in p.L115R, and −25.3 in p.L116del, supporting the antiparallel assembly. In addition, we found that on average the aggregation propensity [h(k)] of the WT CSD is around 0.04, as it is for p.L115R (0.04) and p.L116del (0.04). The positive control (Aβ-40) has an aggregation propensity around 0.05 (see Figure S1) [27].
Due to the lack of sufficient structural data for this region of CSPα (only the tertiary structural information for residues  at the N-terminal region of the mouse homologue of CSPα is currently known), it was not possible to accurately simulate the effect of the mutations at a structural level; thus, we performed an initial analysis assuming an unfolded state. In general, CSPα has a tendency to be unfolded, as shown in Figure 3A by the Z-fold score [29]. Next, we calculated the intrinsic aggregation propensity of CSPα, initially assuming a model from the unfolded to the fibrillar state [30]. We found that between residues p.F110-N130 there is a region with a high propensity to aggregate (Zagg score > 1): WT (Zagg score: 3.320 ± 1.182) (Figure 3A) [31]. This intrinsic propensity is slightly, but not significantly, higher for the mutation p.L116del (Zagg score: 3.401 ± 1.180) and slightly lower for p.L115R (Zagg score: 3.041 ± 1.282) (Figure 3D). In addition, the propensity to form protofibrillar species (Ztox score) is also high in this region of WT (Ztox score: 2.787 ± 0.8957), p.L115R (Ztox score: 2.384 ± 0.9925) and p.L116del (Ztox score: 2.867 ± 0.8973) (Figure 3C). There is a high correlation (r = 0.97) between the Ztox score and the Zagg score for the region between p.F110-N130, which supports the idea that the CSD of CSPα forms stable fibrillar species.
However, in a folded state, regions that are highly aggregation prone are protected from aggregation because they are buried within the native structure without exposure to the solvent and are hence unavailable for intermolecular interactions [32]. It is also crucial to note that it has been suggested that the CSD may adopt a helical or beta-sheet structure that aligns CSPα parallel to the plane of the membrane [26], possibly allowing CSPα to adopt a folded state. Therefore, we tested the propensity of CSPα to be protected, and found that the region p.F110-N130 may be protected in WT (14.07 ± 6.110), p.L115R (13.97 ± 5.743) and p.L116del (12.07 ± 6.421) (lnP < 5 for nonprotected residues; lnP > 5 for protected residues) [32] in a folded state (Figure 3B).
In summary, the in silico analysis indicates that the p.L115R and p.L116del mutations do not directly affect the palmitoylation of CSPα or its propensity to form aggregates per se. Instead, by modifying the membrane-targeting sequence, they alter the physical and chemical properties of CSPα, weakening membrane binding and consequently affecting the intracellular sorting of CSPα. The assumptions underlying in silico analyses carry potential errors and biases; therefore, these results should be interpreted cautiously. As proof of principle, we used the controls with the strongest experimental support available to validate our findings. Nevertheless, further experimental studies are needed to understand the pathogenicity of these mutations.

Discussion
We have confirmed that mutations in DNAJC5 cause autosomal dominant ANCL [13]. ANCLs are clinically and genetically very heterogeneous disorders [4], and the resulting difficulty in making an accurate diagnosis has been a limiting factor in the identification of their genetic cause [5]. Nevertheless, two different groups have now concurrently and independently identified the same gene, DNAJC5, and the same mutations using different and complementary approaches, which consolidates and validates the results [13] (shown here). Mutations in DNAJC5 currently explain approximately 25% of autosomal dominant ANCLs [13]. Therefore, it is possible that other forms of ANCL have another genetic cause.
This is the first replication study of the identification of mutations in the DNAJC5 gene in ANCLs. By performing whole-exome sequencing in a multigenerational family with autosomal dominant ANCL (Figure 1A), we have identified a novel single-nucleotide variation (c.344T>G) in DNAJC5. In addition, using Sanger sequencing we found an in-frame single codon deletion (c.346_348delCTC) in an independent family (Figure 1B). Thus, these variants fit the genetic criteria for disease-causing mutations: they are present in unrelated families (Figure 1B); they exhibit perfect segregation with disease status (Table 2); they are not present in any healthy controls tested; they are located in evolutionarily highly conserved residues (Figure 1C); and they are predicted to functionally affect the encoded protein (CSPα) (Table 2).
Whole exome sequencing has proved useful to identify the pathogenic variant in monogenic disease [14,15]. It is a rapid and cost-effective method that only requires sequencing in a small number of individuals. To date, different approaches to whole exome sequencing have been used including various designs, filtering methodologies and analytical strategies [15]. Some examples are sequencing several affected individuals from different families, sequencing two affected individuals from the same family, and combining whole-exome sequencing data with linkage data, among others [13,14,15]. Our design included two distantly-related affected individuals and a healthy sibling of one of the affected (and first cousin of the other affected individual) ( Figure 1A). By selecting the two most distantly-related affected individuals available, the number of non pathogenic variants due to relatedness is dramatically reduced ( Table 1).
In fact, 674 ± 38 novel non-synonymous variants were found in each sample, but only 95 were shared by the two affected individuals. After subtracting the variants in common between the control and the affected individuals, 25 SNSs remained (Table 1). Taking into account that on average 50% of genetic variants are shared between siblings and 12.5% between first cousins, we expected approximately 40 SNSs to remain after filtering against the control. To control for false-positive variants due to technical errors associated with next-generation sequencing platforms, all three samples were run in the same flow cell. Thus, it is likely that the number of variants shared by both affected individuals was initially overestimated, and that these artifacts were then filtered out by comparison with the control.
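The expectation of roughly 40 remaining variants follows from the sharing fractions quoted above. A minimal sketch of that arithmetic, assuming a per-sample mean of about 674 novel non-synonymous variants as the starting point (the variable names are illustrative and not taken from the study's pipeline):

```python
# Mean number of novel non-synonymous variants per sample (assumed starting point).
novel_per_sample = 674

# Average fractions of variants shared by relationship, as quoted in the text.
SHARED_FIRST_COUSINS = 0.125
SHARED_SIBLINGS = 0.50

# Variants expected to be shared by the two affected first cousins.
expected_shared_by_affected = novel_per_sample * SHARED_FIRST_COUSINS

# The control is a sibling of one affected individual, so about half of those
# shared variants are expected to be present in the control and filtered out.
expected_after_control = expected_shared_by_affected * (1 - SHARED_SIBLINGS)

print(round(expected_shared_by_affected, 1))
print(round(expected_after_control, 1))
```

The result, about 42, matches the "approximately 40" expected after filtering against the control; the observed count after that step was lower (25 SNSs).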
As done in other studies [15], additional affected family members were genotyped to identify the variants that are present in all of the affected individuals. A major challenge of whole-exome sequencing is to uniquely identify the causative variant. Most of the current approaches are based first on removing candidate variants (non-synonymous, nonsense and splice-site variants) that are present in public databases (1000 Genomes Project and dbSNP), and second on selecting only the variants present in the affected individuals. As a result, we found three unique variants located in the PDCD6IP, LIPJ and DNAJC5 genes, which were present in all the affected family members but not in public databases. Interestingly, these three genes also exhibited the highest GERP scores (Table 2). In order to elucidate the real cause of ANCL in this family, a large series of population controls was screened to verify whether the variants in the PDCD6IP and LIPJ genes were present in healthy individuals. Only 16 and 8 heterozygous carriers of the variants in the PDCD6IP and LIPJ genes were found, respectively. It is important to note that these variants were not present in the 1000 Genomes Project or in the dbSNP database at the time of the analysis. Like other studies [14,15], this study shows that pathogenic or causative variants can be identified by performing exome sequencing in a small number of family members. Selecting two affected family members who are distant in the pedigree and one unaffected sibling of one of the affected individuals, and running the three samples simultaneously, significantly decreases the number of potential candidate variants. Our study also shows that although the public databases are very useful in removing commonly found variants (Table 1), they are still not comprehensive enough to eliminate all possible rare variants and unequivocally identify the causative mutation. Most importantly, screening a large number of unaffected individuals is still necessary.
The DNAJC5 gene encodes CSPα, which is a key element of the synaptic molecular machinery and accounts for 1% of all vesicle proteins [33,34], as well as being part of the general exocytotic machinery [35]. The synaptic vesicle localization and chaperone activity of CSPα suggest that it may function in rescuing synaptic proteins that have been unfolded by activity-dependent stress [36]. Deletion of CSPα in flies and mice results in neurodegeneration and impairs synaptic function [36,37,38,39]. CSPα has mostly been found associated with vesicles; however, it has a weak membrane affinity. Furthermore, there is an inverse correlation between membrane targeting of CSPα and its palmitoylation and adequate intracellular trafficking [25,26]. Site-directed mutations that enhance membrane association, such as p.C121-124L, prevent adequate palmitoylation and lead to accumulation of CSPα in the ER and Golgi apparatus [25,26]. In contrast, residue changes such as p.C113-119S (p.L115R, p.L116del, shown here in Figure 2C-D) reduce binding to the membrane, resulting in inadequate palmitoylation. They exhibit a localized punctate pattern throughout the cytoplasm [25,26], co-localize with markers of the ER-Golgi intermediate complex (Golgi SNARE proteins), and show a significant reduction in synaptic regions [13].
A potential neuron-specific effect of this disproportionate and persistent CSPα missorting is a depletion of CSPα [13], and possibly of some of its SNARE partners, at the synapse, leading to a disruption in neurotransmission and synaptic dysfunction as displayed by the CSPα null animal models (mimicking loss of function) [36,37,38,39].
CSPα self-associates, forming oligomers [24,28,40]. Residues p.G83-C136 constitute the core region for CSPα oligomerization [24]. Our analysis revealed that a more localized region between residues p.F110-P138 has a tendency to form antiparallel β-sheet species (see Figure S1), consistent with reports that CSPα-CSPα dimers are temperature-stable and SDS-resistant [28,40]. We did not find any significant increase in the intrinsic tendency to aggregate caused by the mutations (Figure 3C-D). However, the unattached form of the protein may undergo conformational changes that facilitate a more reactive oligomeric state (Figure 3A, D). The effective increase in the concentration of soluble CSPα produced by these mutations can also increase the level of macromolecular crowding, which in turn may dramatically enhance CSPα's own propensity to aggregate, as has been shown for α-synuclein [41].
In a macromolecular crowded environment, the equilibrium between protein folding and protein-protein interactions is driven towards the lower volume (folded to unfolded) species [42]. CSPα has been shown to have a high affinity for unfolded proteins [23]. In a crowded environment, this can increase the likelihood of CSPα interacting with itself (more stable CSPα-CSPα dimers can be generated) and with other amyloidogenic partners such as α-synuclein [38,43]. Recently, it has been shown that CSPα-CSPα dimers appear to be the main form of CSPα found in the brains of carriers of the ANCL mutations [13].
Another potential target of abnormal CSPα interactions is the synaptic proteins, especially those that are central to synaptic vesicle exocytosis, including proteins from the v- and t-SNARE complexes and the putative Ca2+ sensor synaptotagmin 1, which undergoes palmitoylation in the Golgi [44]. Abnormal CSPα-CSPα dimers may impede appropriate synaptic vesicle targeting and subsequently disrupt neurotransmission. Indeed, it was recently shown that increasing CSPα dimerization inhibits synaptic transmission [28].
Synaptic dysfunction has been consistently reported in several human and animal models of NCLs [45]. Several NCL-encoded proteins have been found in synaptic compartments [46,47,48]. Furthermore, signs of synaptic dysfunction (reduction in synaptic vesicle number) and degeneration have been demonstrated in PPT1-deficient neurons in vitro [49], and synaptic pathology (redistribution of the SNARE complex and aggregates of Syp/SNAP25) occurs early in disease progression in the congenital form of NCL [50]. Synaptic involvement in two different mouse models of INCL was also recently demonstrated [45,51]. The PPT1 null animal model displays alterations in the endocytic/recycling pathway of synaptic vesicles associated with impaired depalmitoylation of the SNARE proteins [45], unlike CSPα null models, which do not exhibit any primary defects in endocytosis or vesicle recycling [52]. To date, the mechanisms causing synaptic vulnerability in NCLs remain poorly understood.
Our genetic results confirm that DNAJC5 is the disease-causing gene of some ANCLs with autosomal dominant inheritance [13]. The in silico analysis suggests that reduced membrane binding and subsequent missorting of CSPα may play a crucial role in the pathogenicity of these mutations. This mislocalization can by itself affect the palmitoylation status and the propensity to aggregate. Thus, a dominant-negative mechanism resulting from the propensity of CSPα to self-aggregate may be involved in the pathogenicity. The mutated CSPα may aggregate with the wild type, inducing mislocalization and a subsequent reduction of CSPα levels in the synapse. Since CSPα is a synaptic protein and the null animal models show a progressive neurodegenerative phenotype, a better understanding of the cellular and molecular characteristics of synaptic vulnerability will be important for our understanding of NCL pathogenesis and for the effective development of therapeutic approaches.

Patients and Study Design
The Institutional Review Board (IRB) at the Washington University School of Medicine in Saint Louis approved the study. Prior to their participation, a written informed consent was reviewed and obtained from family members. The Human Research Protection Office (HRPO) approval number for our ADRC Genetics Core family studies is 93-0006.
To identify the gene underlying disease in this family, the affected individuals 5:5 and 6:7 and the unaffected sibling control 5:2 were selected for exome sequencing. The affected samples are distant enough in the family tree that they share about 1/8 of their genome and, since both are affected, they should harbor the same single causative variant. The control is an unaffected sibling of one affected sample, which in theory should allow us to reduce the non-pathogenic variants shared by these individuals by approximately 50%. By combining the data from these three individuals we were able to narrow down the number of potential pathogenic variants. We also used DNA available from another affected (6:9) and two unaffected samples (5:11, 6:11) to perform an extended segregation analysis.

Validation set of patients with ANCLs
There were at least 5 similarly affected individuals in the family over 3 generations with apparent autosomal dominant inheritance. The proband had behavioral difficulties, probably OCD-like, that started in the mid-20s. The first generalized seizure occurred in the early 30s and was followed by cognitive regression (loss of speech-language and memory). There were also mobility problems by the early 30s.
A patient selected from the second family is 70 years old with a history of tremor and progressive encephalopathy, with rectilinear inclusions only (no FP, CL, GRODS) on skin biopsy. NCL gene testing was negative. The daughter has early-adult-onset tremors and myoclonic jerks, with rectilinear inclusions in buffy coat. NCL gene testing was negative. Another patient has late-infantile NCL with onset of progressive myoclonic epilepsy at age four, visual loss at age five, and curvilinear and fingerprint bodies on nerve/muscle biopsy. This patient is a carrier of a change c.896C>T/P299L in the CLN6 gene [11].
An individual from the third family (two sisters) had childhood absence seizures, followed by the onset in her late 20s of myoclonus, generalized seizure disorder, and dementia. There was a history of autosomal dominant transmission, and electron-microscopic studies revealed an accumulation of dense osmophilic material with a vague internal architecture resembling fingerprint shapes and occasional curvilinear bodies [53].

Exome sequencing
Enrichment of coding exons and flanking intronic regions was performed using a solution hybrid selection method with the SureSelect Human All Exon 50 Mb kit (Agilent Technologies, Santa Clara, California) following the manufacturer's standard protocol. This step was performed by the Genome Technology Access Center at Washington University in St Louis. The captured DNA was sequenced by paired-end reads on the HiSeq 2000 sequencer (Illumina, San Diego, California). Next, raw sequence reads were aligned to the reference genome NCBI 36/hg18 using Novoalign version V2.07.00 (Novocraft Technologies, Selangor, Malaysia). Base/SNP calling was performed with SAMtools version 0.1.7. SNP annotation was carried out using version 5.07 of the SeattleSeq Annotation server (see URL) [16].

SNP Genotyping
Two different genotyping technologies were used: MassARRAY SNP (Sequenom, Inc) and KASPar. The MassARRAY system is PCR-based: products of different sizes are analyzed by SEQUENOM MALDI-TOF mass spectrometry [17,18,19]. The KBioscience Competitive Allele-Specific PCR genotyping system (KASP) is a FRET-based endpoint-genotyping technology, v4.0 SNP (KBioscience) [17,18,19]. SNP genotype call rates were >98%, and the number of samples that failed to give data for >98% of SNPs was extremely small.
Protein-coding exons and 100 bases of flanking upstream and downstream intronic sequence of DNAJC5 (Transcript: ENST00000360864) were amplified on Applied Biosystems (Carlsbad, California, USA) 96-Well GeneAmp PCR System 9700 Thermal Cyclers using a touchdown protocol. All PCR products to be sequenced were amplified under the same conditions (25-µl volume containing 10× PfuUltra HF reaction buffer (Stratagene, La Jolla, California, USA), 5× Betaine (Sigma-Aldrich, St Louis, USA), 100 µmol/l each dNTP, 200 nmol/l each primer, 0.4 PfuUltra High-Fidelity DNA Polymerase (Promega); PCR profile: 94°C followed by 34 cycles of 45 s at 94°C, 45 s at 62°C, and 1 min at 72°C). PCR product purification was completed with ExoSAP-IT (USB Corporation). Sequencing was performed in both forward and reverse directions using BigDye Terminator v3.

Disease-network analysis of NCLs genes
It has been shown that genes that are associated with phenotypically close disorders are prone to have similar molecular signatures, especially in inherited disease [21]. Here, we have used a disease-network analysis approach as supporting in silico evidence of the role of the candidate genes we identified by exome sequencing. We found that the analysis is robust across different algorithms and random subsets of training NCL disease genes.
We used Endeavour [55], which was trained with all possible features except BLAST. Endeavour has recently been benchmarked using 450 pathway maps and 826 disease marker sets, containing a total of 9911 and 12,432 genes, respectively. The reported area under the receiver operating characteristic curve is 0.97 for pathway gene sets and 0.91 for disease gene sets [56].
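For readers unfamiliar with the benchmark metric, the area under the ROC curve equals the probability that a randomly chosen true gene is scored above a randomly chosen decoy gene, with ties counted as one half. The sketch below is a generic illustration of that definition and is not Endeavour's code: the function name `roc_auc` and the scores are hypothetical, and it assumes that higher scores mean stronger prioritization.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve from pairwise comparisons.

    Counts how often a positive (true gene) outscores a negative
    (decoy gene); ties contribute 0.5.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical prioritization scores, for illustration only.
true_genes = [0.9, 0.8, 0.4]
decoy_genes = [0.5, 0.3, 0.2, 0.1]
print(roc_auc(true_genes, decoy_genes))
```

An AUC of 1.0 means every true gene outranks every decoy, and 0.5 corresponds to random ranking.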
Endeavour is available at http://homes.esat.kuleuven.be/~bioiuser/endeavour/tool/endeavourweb.php. ToppGene [57] was used with the default training parameters. Both tools were trained using the causal genes of other NCLs (NCL Mutation Database, URL) along with genes that are associated with phenotypically close disorders (i.e., the differential diagnosis of ANCLs). They were tested against the variants that we identified by exome sequencing. As a control, we used the entire genome as the training group (see Table S1).
We used the Clustering and Scoring Strategy for Palmitoylation Sites Prediction (CSS-Palm) system. The program has shown strong prediction performance in jackknife validation (sensitivity 82.16% and specificity 83.17% at a cut-off score of 2.6) [58].

Analysis of the impact of mutations on the hydropathy of CSPa
The hydrophobicity profile was calculated with ProtScale (http://web.expasy.org/protscale/) using the scale of Kyte and Doolittle [59].
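A hydropathy profile of this kind is a sliding-window average of per-residue scale values. The sketch below illustrates the calculation with the Kyte and Doolittle scale; it is not the ProtScale implementation, the function name `hydropathy_profile`, the window size, and the toy peptides are assumptions made for illustration, and no window weighting is applied.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Unweighted sliding-window mean hydropathy along a sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy peptides (hypothetical, not the CSPα sequence): a hydrophobic
# residue in a cysteine-rich stretch replaced by a charged one.
wild_type = "CCLCC"
mutant = "CCRCC"
print(hydropathy_profile(wild_type, window=3))
print(hydropathy_profile(mutant, window=3))
```

Replacing a leucine with an arginine lowers every window that contains the substituted position, which is the kind of local change a profile comparison between wild-type and mutant sequences is meant to expose.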
The TMHMM program was used to predict the transmembrane domain in the entire sequence of CSPα (accession number Q9H3Z4). We included the A108-139K residues in the analysis because it has been proven experimentally that this region weakens the membrane affinity of CSPα [25,26].
TMHMM has been shown to correctly predict 97-98% of transmembrane helices. Additionally, it can discriminate between soluble and membrane proteins with both specificity and sensitivity better than 99% [60].
The TMHMM Server v. 2.0 (prediction of transmembrane helices in proteins) is available at http://www.cbs.dtu.dk/services/TMHMM/. Using MPEx, we measured the free energy of transfer of this domain from water to a phosphocholine bilayer interface: ΔG = −8.12 for p.KPK137-139del and ΔG = −6.97 for p.C121-124L. We also calculated the octanol/interface values for these changes; this measure identifies segments that tend to prefer a transbilayer helix conformation relative to an unfolded interfacial location. We found a ΔG of 2.91 kJ/mol for p.C121-124L, a ΔG of 3.24 kJ/mol for p.KPK137-139del, and a ΔG of 6.55 kJ/mol for the wild type. These two changes in residues increased the membrane affinity of CSPα, in agreement with the experimental findings [25,26]. Apart from the enrichment of hydrophobic residues in the cysteine-string domain, the β-barrel analysis did not find any membrane-spanning strands or regions [61].
The TMX approach uses an experiment-based whole-residue hydropathy scale (WW scale), which includes the backbone constraint, to identify TM helices of membrane proteins with an accuracy greater than 99% [61,62].

Analysis of conformational changes induced by the mutations
Tertiary structural information is available only for the N-terminal region (residues 1-109) of the mouse homolog of CSPα (PDB entry 2CTW; RIKEN Structural Genomics Initiative); therefore, it was not possible to simulate the effect of the mutations at the structural level.
Prediction of amyloid structure aggregation (PASTA), benchmarked on a dataset of 179 peptides derived from the literature, showed close to an 80% true-positive rate with a ~20% false-positive rate at a PASTA energy threshold of −4.0 [27,64]. PASTA is available at http://protein.cribi.unipd.it/pasta/.
The Zyggregator method is based on three physico-chemical properties of the polypeptide chain: hydrophobicity, charge, and the propensity to adopt α-helical or β-sheet structures. This method reproduces to a remarkable extent (r = 0.85) the changes in the aggregation rates observed experimentally for single amino acid substitutions [31].
The Ztox score uses the same equation as Zagg, but its parameters are fitted on a database of polypeptide chains whose aggregation resulted in protofibrillar species rather than amyloid fibrils [65]. The propensities predicted by Ztox to form protofibrillar aggregates correlate very strongly with their in vivo effects (Stox, r = 0.83) [31]. Zyggregator (prediction of protein aggregation propensities) is available at http://www-vendruscolo.ch.cam.ac.uk/zyggregator.php.
The folding propensity profile (Z-fold) is defined in terms of the physicochemical properties of the amino acids, such as hydrophobicity, secondary-structure propensity, and electrostatic charge [29]. The sequence-based prediction of folding rates is available at http://www-vendruscolo.ch.cam.ac.uk/camfold.php.
The CamP method predicts whether regions of a structure are protected from hydrogen exchange with an accuracy in the range 80%-100%; a region is protected if ln P > 5 for all of its amide groups [32]. The sequence-based prediction of protein flexibility in native states is available at http://www-vendruscolo.ch.cam.ac.uk/camp.new.php.