ATP1A3 Mutations and Genotype-Phenotype Correlation of Alternating Hemiplegia of Childhood in Chinese Patients

Alternating hemiplegia of childhood (AHC) is a rare and severe neurological disorder. ATP1A3 was recently identified as the causative gene. Here we report the first genetic study in a Chinese AHC cohort. We performed whole-exome sequencing on three trios and three unrelated patients, and screened an additional 41 typical cases and 100 controls by PCR-Sanger sequencing. ATP1A3 mutations were detected in 95.7% of typical AHC patients. At least 93.3% were de novo. Four late-onset, atypical AHC patients were also mutation positive, suggesting the need for testing ATP1A3 mutations in atypical cases. In total, 13 novel missense mutations (T370N, G706R, L770R, T771N, T771I, S772R, L802P, D805H, M806K, P808L, I810N, L839P and G893R) were identified in our study. By homology modeling of the mutant protein structures and calculation of an extensive list of molecular features, we identified two statistically significant molecular features, solvent accessibility and distance to metal ion, that distinguished disease-associated mutations from neutral variants. A logistic regression classifier achieved 92.9% accuracy, averaged over 100 repetitions of five-fold cross-validation. Genotype-phenotype correlation analysis showed that patients with epilepsy were more likely to carry the E815K mutation. In summary, ATP1A3 is the major pathogenic gene of AHC in Chinese patients; mutations have distinctive molecular features that discriminate them from neutral variants and are correlated with phenotypes.


Introduction
Alternating hemiplegia of childhood (AHC, OMIM 614820) is a rare and severe neurological disorder [1]. AHC occurs mostly in sporadic cases, though familial cases have been reported [2,3]. It is characterized by episodic hemiplegia or quadriplegia attacks, accompanied by other paroxysmal symptoms, including oculomotor abnormalities, dystonia, seizures and autonomic disturbances. The age of onset is usually before 18 months [4,5]. Most patients also display developmental delay and progressive cognitive impairment. AHC was initially regarded as a hemiplegic migraine variant, but genes responsible for familial hemiplegic migraine such as CACNA1A, ATP1A2 and SLC1A3 have not been confirmed as causative for AHC [6,7,8]. The gene encoding the sodium-potassium (Na+/K+) ATPase α3 subunit (ATP1A3) has recently been identified as a causal gene for sporadic AHC by three groups [2,9,10], with an 82.2% positive rate and at least 78.9% de novo in European/American samples. However, there had been no large genetic study of a Chinese cohort. Such a study may not only confirm the main causal gene in Chinese patients, but also result in the discovery of novel mutations.
The protein encoded by ATP1A3 is a subunit of an integral membrane protein responsible for maintaining the sodium and potassium concentration gradients. It has two structural conformations, E1 and E2, which selectively bind three Na+ and two K+, respectively [11]. The E2 structure of its homologous protein was resolved in 2007 [11], and the E1 structure has recently been resolved as well [12]. The protein crystal structures show that the ion binding sites are located inside transmembrane helices M4 to M8. The mutation D801N, demonstrated by Heinzen et al. [2], apparently prevents the binding of K+ in the E2 conformation. Other variants, however, are not so close to the ion binding sites, and their influence on protein function remains unclear. An extensive survey of the molecular features of the mutations may shed light on the etiology of the disease and be useful for predicting the pathogenicity of novel ATP1A3 variants.
Besides its association with AHC, ATP1A3 was first reported to be associated with rapid-onset dystonia-parkinsonism (RDP, OMIM 128235), a distinctive autosomal-dominant movement disorder [13]. Rosewich et al. noted that AHC and RDP may make up a continuum of a dystonic movement disorder, but they also have a considerable list of different characteristics [9]. Furthermore, ATP1A3 mutations identified in AHC and RDP patients so far have no overlap except for one mutation, D923N [3,14,15]. This implies that different mutations in ATP1A3 may give rise to different phenotypes. Finally, there is a need to investigate the genotype-phenotype correlations of AHC, which would be valuable in clinical diagnosis.

Standard protocol approvals, registrations, and patient consents
This study was approved by the Institutional Review Board at Peking University First Hospital. Written informed consent was obtained from all participants, or from their parents in the case of minors. The diagnosis of all the patients was made according to clinical diagnostic criteria for typical AHC as follows [4,16]: (1) onset of paroxysmal events before 18 months of age, (2) repeated bouts of hemiplegia involving right and left side of the body in some attacks, (3) episodes of bilateral hemiplegia or quadriplegia starting either as generalization of a hemiplegic episode or bilateral from the start, (4) other paroxysmal disturbances including tonic/dystonic attacks, nystagmus, strabismus, dyspnoea and other autonomic phenomena occurring during hemiplegic bouts or in isolation, (5) immediate disappearance of all symptoms upon sleep, with probable recurrence of long-lasting bouts 10-20 min after awakening, (6) evidence of developmental delay, mental retardation, neurologic abnormalities, choreoathetosis and dystonia or ataxia, (7) not attributable to other disorders. Patients whose age of onset was later than 18 months but who fulfilled the other AHC diagnostic criteria were considered atypical cases [17]. A total of 51 AHC patients were enrolled, including a pair of monozygotic twins who were counted as one patient. Of these, 47 fulfilled the diagnostic criteria for typical AHC. Four patients were considered atypical cases, since their age of onset was later than 18 months. Venous blood samples were obtained from the participants and their parents. Detailed clinical phenotypes including the disease onset, initial symptoms, paroxysmal and non-paroxysmal manifestations, EEG and magnetic resonance imaging (MRI) results were acquired by face-to-face interviews, questionnaires, and telephone follow-up.

Whole-exome sequencing
Genomic DNA was extracted from venous blood according to standard protocol from three trios and three sporadic cases. Exomes were captured with the Agilent SureSelect Human All Exon 50 Mb Kit and sequenced on an Illumina HiSeq 2000. Sequence analysis was performed following the best practice of GATK v1.6 [18]. Variants that fit the de novo dominant model were selected for further functional annotation and filtered following the ANNOVAR protocol [19]. Local assembly was performed to handle structural variations and insertions/deletions [20]. A gene was reported as a candidate gene if it had a de novo functional mutation (including insertion/deletion, splicing, nonsense and missense mutation) in at least one of the trios and had a functional or missense mutation in at least one other unrelated case. ATP1A3 was confirmed as the only candidate gene.
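The candidate-gene rule stated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Call` record, its field names and the example gene and sample labels are hypothetical and do not come from the study's pipeline, which relied on GATK and ANNOVAR.

```python
from dataclasses import dataclass

# Variant classes the text treats as "functional".
FUNCTIONAL = {"indel", "splicing", "nonsense", "missense"}

@dataclass(frozen=True)
class Call:
    sample: str    # hypothetical sample identifier
    gene: str
    effect: str    # a member of FUNCTIONAL, or e.g. "synonymous"
    de_novo: bool  # True only when both parents lack the variant (trios)

def candidate_genes(trio_calls, singleton_calls):
    """Genes with a de novo functional call in at least one trio proband
    and a functional call in at least one other unrelated case."""
    de_novo_probands = {}
    carriers = {}
    for c in list(trio_calls) + list(singleton_calls):
        if c.effect not in FUNCTIONAL:
            continue
        carriers.setdefault(c.gene, set()).add(c.sample)
        if c.de_novo:
            de_novo_probands.setdefault(c.gene, set()).add(c.sample)
    hits = []
    for gene, probands in de_novo_probands.items():
        # Require a carrier who is not the de novo proband itself.
        if any(carriers[gene] - {p} for p in probands):
            hits.append(gene)
    return sorted(hits)

# Made-up calls: GENE_A is de novo in trio T1 and also mutated in case S1.
trio_calls = [
    Call("T1", "GENE_A", "missense", True),
    Call("T2", "GENE_B", "missense", True),
    Call("T1", "GENE_C", "synonymous", True),
]
singleton_calls = [
    Call("S1", "GENE_A", "missense", False),
    Call("S2", "GENE_C", "missense", False),
]
candidates = candidate_genes(trio_calls, singleton_calls)  # ["GENE_A"]
```

In this toy input GENE_B is dropped because no second case carries a functional variant, and GENE_C is dropped because its only de novo call is synonymous.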

Sanger sequencing
Mutations identified by next-generation sequencing were validated by PCR-Sanger sequencing. Primer sequences were the same as those published in previous work [9]. In the remaining unrelated patients, all 23 exons of ATP1A3 were sequenced.
Mutation sites were sequenced in parents to determine whether the mutations were transmitted or de novo. One hundred unrelated healthy individuals were sequenced as normal controls.

Statistical analysis of molecular features of the variants
Variants in ATP1A3 were collected from our data, published genetic studies of AHC and RDP, the HGMD database [21], the 1000 Genomes Project [22] and the NHLBI Exome Sequencing Project [23]. For each missense variant, we took the crystal structure of pig ATP1A1 as reference (identity 86.2%, similarity 93.2%). We modeled its mutant protein 3D structure in both the E1 [12] and E2 [11] conformations by homology modeling with SWISS-MODEL [24] followed by energy minimization [25]. An extensive list of molecular features was calculated by SAPred [26], PyMOL and other tools (see ''Web resources'' section). For each molecular feature, its correlation with neutral versus disease-associated status, and with AHC-associated versus RDP-associated status, was calculated using Spearman's rho and the P-value of the corresponding term in logistic regression in the R software. Lasso [27] was employed for robust feature selection: 10,000 bootstrap resamples produced 10,000 simulated datasets; in each dataset Lasso selected as few discriminative features as possible; the number of times each feature was chosen was recorded and ranked. The two most significantly correlated features were selected to build a logistic regression classifier. The decision boundary of the classifier was further checked under different weighting schemes. The frequency weight was set according to how many individuals carried the variant. The precision weights for mutation hotspots, recurrent mutations and singleton mutations were set to 2, 1 and 0.75, respectively. A third strategy used all mutations for training without weighting. The accuracy of the classifier was assessed by 100 repetitions of five-fold cross-validation.
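The repeated five-fold cross-validation described above can be sketched as follows. The original analysis was done in R; this is a pure-Python stand-in, not the authors' code. The gradient-descent fit, the function names and the synthetic `toy_data` are all assumptions made for illustration.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Unpenalised logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)              # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            z = max(-30.0, min(30.0, z))     # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    """Class 1 when the linear score is positive, i.e. P > 0.5."""
    return 1 if w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)) > 0 else 0

def repeated_cv_accuracy(X, y, k=5, repeats=100, seed=0):
    """Return (mean accuracy, per-repeat accuracies) for repeated k-fold CV."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    accs = []
    for _ in range(repeats):
        rng.shuffle(idx)                     # new random partition each repeat
        correct = 0
        for f in range(k):
            test = idx[f::k]
            held = set(test)
            train = [i for i in idx if i not in held]
            w = fit_logistic([X[i] for i in train], [y[i] for i in train])
            correct += sum(predict(w, X[i]) == y[i] for i in test)
        accs.append(correct / len(X))
    return sum(accs) / len(accs), accs

def toy_data(n_per_class=15, seed=1):
    """Synthetic, standardised two-feature data (NOT the study's variants)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append([rng.gauss(1.0, 0.5), rng.gauss(-1.0, 0.5)])
        y.append(1)
        X.append([rng.gauss(-1.0, 0.5), rng.gauss(1.0, 0.5)])
        y.append(0)
    return X, y
```

Calling `repeated_cv_accuracy(*toy_data())` returns the mean accuracy together with the 100 per-repeat values; sorting those values and reading off the 2.5th and 97.5th percentiles is one way to obtain an empirical interval of the kind reported with the classifier accuracy.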

Genotype-phenotype correlation analysis
Clinical data of the AHC patients were collected from our own Chinese cohort as well as the reports by Heinzen et al. [2] and Rosewich et al. [9]. Potential population differences in ATP1A3 mutations between Chinese and European/American patients were assessed. The mutation frequencies of the two populations were compared using Fisher's exact test, and the population differentiation was measured by Fst based on data from the HapMap Project [28]. The incidence rate of each symptom was also compared. Symptoms present in every patient, such as hemiplegia and abnormal eye movement, were excluded from our analysis. The three mutation hotspots were analyzed for correlations with each symptom and with the Flunarizine treatment effect by Fisher's exact test. Multiple hypothesis testing was corrected by FDR correction [29].
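The two tests named above can be illustrated with a short, self-contained Python sketch. It assumes a two-sided Fisher's exact test and the Benjamini-Hochberg procedure for the FDR step; the text does not state which FDR method was used, so that choice is an assumption, and the function names are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))   # tolerance for float ties
    return min(1.0, total)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):             # walk from largest to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Textbook 2x2 example (not study data): rows are groups, columns outcomes.
p_demo = fisher_exact_two_sided(3, 1, 1, 3)   # 34/70, about 0.486
```

For a genotype-phenotype table, `a` and `b` would be the counts of carriers with and without a symptom, and `c` and `d` the corresponding counts among non-carriers; the resulting P-values for all hotspot-symptom pairs would then be passed together to `benjamini_hochberg`.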

Web resources
The SAP disease-association predictor [26] (SAPred; http://sapred.cbi.pku.edu.cn/) is an automatic pipeline that predicts disease-associated single amino acid polymorphisms and takes an extensive list of molecular features into account. In this study we used its intermediate feature matrix.
DisEMBL [33] (http://dis.embl.de) was used to predict whether a mutation is located in disordered regions.
ProtScale [34] (http://web.expasy.org/protscale) was used to compute amino acid scale profiles.
PyMOL was used to view the crystal structure and calculate distances between atoms.

Genetic findings
We recruited 47 patients who fulfilled the diagnostic criteria for typical AHC. The male to female ratio was 1:0.62 (Table 1). Whole-exome sequencing of three trios and three sporadic cases produced about 18.8 gigabases of raw sequence data per individual. Of the raw reads, 98.1% could be properly mapped to the human reference genome. On average, each base in the capture region was sequenced to a depth of about 200X. Following our analysis pipeline, rare mutations in ATP1A3 were identified in all six patients. All mutations were missense and were validated by Sanger sequencing. Together with Sanger sequencing of all ATP1A3 exons in the other 41 typical AHC patients, 95.7% (45/47) of patients were found to carry ATP1A3 mutations. In total, nineteen missense mutations (at nucleotide level, Table 2) were identified. Eight of them had been reported in AHC cases before, and eleven were novel mutations reported for the first time. These novel mutations were located at highly conserved sites (Figure S1) and were absent in 100 normal controls, the public 1000 Genomes database and the NHLBI Exome Sequencing Project. These data confirmed ATP1A3 as the main causal gene in the Chinese cohort.
In addition, there were four atypical AHC cases in our cohort whose age at onset was later than 18 months. They all carried missense mutations in ATP1A3 as well (Table 1). Patient A01203 carried a de novo hotspot mutation G947R, while patient A05203 carried a novel de novo mutation G893R. Unexpectedly, patient A04103 carried a de novo mutation D923N, which was previously reported to be associated with RDP and in an AHC family [3,14,15], and patient A01403 inherited a novel mutation L770R from his unaffected mother. All these mutations were at conserved loci and absent in 100 normal controls, the public 1000 Genomes data and the NHLBI Exome Sequencing Project data.
The sequencing data had been submitted to dbGaP with accession number phs000660.v1.p1. All the mutations had been deposited in the LOVD database (www.lovd.nl/ATP1A3) and dbSNP.

Discriminative molecular features
In addition to mutations identified in our own cohort, we collected all previously reported mutations in ATP1A3 in AHC patients and RDP patients, as well as variants in normal individuals (neutral variants) (Table S1). We found no overlap among the three groups of variants except for D923N, which was first reported in RDP patients [14,15] and recently in familial AHC patients [3] as well as in one of our atypical patients (Patient A04103). When viewed on top of the protein domains (Figure 1 and Table S1), there was a clear difference (Fisher's exact test P-value <0.001). Mutations associated with AHC were predominantly located in transmembrane domains (73.7% of AHC-associated mutations were located in transmembrane domains, compared to 13.8% of those not associated, P-value <0.001), especially αM5 and αM6, which were nearest to the metal ion binding sites [11]. Most neutral variants were away from transmembrane domains (5.3% compared to 64.6%, P-value <0.001). In contrast, mutations associated with RDP showed no location bias (36.4% compared to 50.0%, P-value 0.517).
Further, we built 3D structure models of mutant proteins in both E1 and E2 conformations for each missense mutation/variant (18 of them classified as neutral variants, 35 as AHC-associated, 8 as RDP-associated and 1 as both AHC and RDP associated; Table S2) and calculated 71 molecular features including protein aggregation property, amino acid composition, conservation, distance to metal ion site, secondary structure, solvent accessibility, protein stability and stereo-chemistry property (Table S2).
Among AHC-associated missense mutations, there were twenty-three singleton mutations, seven of which were reported by Heinzen et al. [2], three by Rosewich et al. [9], one by Ishii et al. [10] and twelve in our study. Given the lack of recurrence, to ensure quality we excluded them from the classifier building procedure. Compared to neutral variants, in addition to the tendency to be located in transmembrane regions, the sites of the disease-associated mutations also showed lower solvent accessibility, closer distance to metal ion binding sites and higher conservation. These four classes of features were also frequently selected in the robust feature selection procedure by Lasso (Figure 2A and Table S3A). To minimize the possibility of overfitting, we chose the two features that were at the top of the list for both correlation and selection frequency. They were 'cbeta_wt_E2' (the density of β carbon atoms within 10 Å of the mutated site in the wildtype protein E2 conformation; calculated by SAPred) and 'dist_metal_E2' (the minimum distance from the mutated site to the metal ion binding pocket in the wildtype protein E2 conformation; calculated by PyMOL). The higher the value of 'cbeta_wt_E2' and the lower the value of 'dist_metal_E2', meaning less accessible and closer to the metal ion, the more likely the mutation was disease-associated. From the scatterplot (Figure 2B), it was apparent that most disease-associated mutations clustered in the lower right corner. Based on these two features, we built a logistic regression model. The average accuracy over 100 repetitions of five-fold cross-validation was as high as 92.9% (95% confidence interval on the empirical distribution: 71.4% to 100%). Importantly, this classifier predicted almost all singleton missense mutations as disease-associated, except for D220N and G893R. D220N was reported as a de novo mutation in one trio by Heinzen et al., and G893R was de novo in one atypical case in our sample. When we trained other classifiers taking mutation frequency weights into consideration, the decision boundaries shifted slightly, but the prediction result still held (Figure 2B).
Figure 2. Discriminative features and classifier for disease-associated mutations versus neutral variants. (A) Discriminative effect and correlation of each molecular feature based on the training dataset. X-Y axes showed the correlation between each feature and variant category through logistic regression analysis and Spearman's rho calculation. The dot color represented the class the molecular feature belonged to, and the size represented the selection frequency in the robust feature selection procedure. The selected features were labeled. (B) Classifiers and their prediction results. X-Y axes represented the selected features 'cbeta_wt_E2' (the number of β carbon atoms within 10 Å of the mutated site in the E2 wildtype protein structure) and 'dist_metal_E2' (the minimum distance from the mutated site to the metal ion binding pocket in the E2 wildtype protein structure; unit Å), belonging to the 'solvent accessibility' and 'distance to metal site' classes, respectively. The violin diagrams showed the distribution of each feature in each variant category. The crosses indicated singleton mutations in AHC, while dots were mutations used in the training dataset. The size of the dots represented precision weights. The solid line corresponded to the simplest model without any weight, while the three dotted lines, from left to right according to the intercept on the X axis, were the decision boundaries of three different models (see Methods section): using all mutations for training; weighting the training dataset with frequency weight; weighting the training dataset with precision weight. doi:10.1371/journal.pone.0097274.g002
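For reference, the two-feature logistic regression classifier described in this section has the standard form below. The coefficients are placeholders, since the fitted values are not given in the text; only their expected signs follow from the stated direction of effect (higher 'cbeta_wt_E2' and lower 'dist_metal_E2' make a disease association more likely).

```latex
\[
P(\text{disease-associated} \mid x_1, x_2)
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)\bigr)},
\qquad
x_1 = \texttt{cbeta\_wt\_E2},\quad
x_2 = \texttt{dist\_metal\_E2},
\]
\[
\text{with } \beta_1 > 0,\ \beta_2 < 0,
\qquad
\text{decision boundary } (P = 0.5):\quad
\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0 .
\]
```

Because the boundary is linear in the two features, it appears as a straight line in a scatterplot of 'dist_metal_E2' against 'cbeta_wt_E2', and reweighting the training data changes only the slope and intercept of that line.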
Using the same methods, comparison of AHC-associated mutations versus RDP-associated mutations revealed differences between them in terms of transmembrane region and alteration of stereo-chemistry properties (Table S3B), but none of the features reached statistical significance, which could be at least partly due to the small sample sizes.

Genotype-phenotype correlation
We collected detailed phenotype data from our cohort, including both typical and atypical AHC patients, but the following statistical analyses were only performed on the 47 typical cases. All patients had abnormal eye movement and hemiplegia. Up to 70.2% of the patients developed their first symptom before four months and 40.4% before two months. Abnormal eye movement was the initial symptom in 57.4% of the patients at a median age as early as 2 months, making it the most common and earliest-onset symptom.
Previous genetic studies on AHC were mainly conducted on European/American samples. The genetic architecture of a Chinese AHC cohort was first revealed in this study. We assessed the population differences between Chinese and European/American patients. We found that no mutation showed a significant difference in allele frequency (Figure 3A), and overall the ATP1A3 gene exhibited no marked population differences (mean Fst was 0.04; Figure 3B). However, the incidence rates of two symptoms were different. Dystonia was exhibited by 71.7% of European/American AHC patients, while the rate was as high as 91.5% in Chinese patients (Fisher's exact test P-value 0.007); epilepsy was exhibited by 54.5% of European/American AHC patients, while the rate was only 17.0% in Chinese patients (Fisher's exact test P-value <0.001).
A mouse mutant strain, Myshkin, that carries a heterozygous I810N mutation in ATP1A3 has been reported to show generalized seizures [35]. This mutation corresponds to I810N in human ATP1A3, and one patient in our cohort (A00403) who carried this mutation did exhibit epilepsy. Unfortunately, because of the limited number of samples positive for I810N, we could not verify the statistical association between epilepsy and I810N. However, investigating the genotype-phenotype correlation for the three mutation hotspots, we found that patients exhibiting epilepsy were more likely to carry the E815K mutation (Fisher's exact test P-value <0.001, FDR corrected P-value 0.010).
Forty-one patients were treated with Flunarizine, 28 of whom showed reduced severity, duration, or frequency of hemiplegic attacks (Table 1), although none had been completely cured. We found no correlation between treatment effects and the three mutation hotspots, which was not surprising because the direct target of Flunarizine is not ATP1A3.

A web site of ATP1A3 variants and predictions
We set up a freely available website at http://ahc.cbi.pku.edu.cn for continued update of genetic variations and other related information on AHC and ATP1A3, and for prediction of the functional effects of ATP1A3 variants.

ATP1A3 causes AHC mainly through de novo mutations
Our study was the first reported genetic study of AHC in Chinese Han patients. All ATP1A3 mutations identified in typical cases were de novo except for one familial case. This case came from a three-generation family with two affected individuals (Figure S2). The proband (Patient A02903) inherited the G947R mutation from her affected mother. The mother had onset of eye deviation and dystonia at six months. She developed alternating episodes of hemiplegia at one year of age with or without eye deviation and dystonia, at a frequency of once per month, lasting about one hour per episode. She had no seizures. Flunarizine reduced her frequency of hemiplegia to once every half a year and also alleviated symptoms. She had mild developmental delay. She was 26 years old at last follow-up, and could take care of herself and do housework when not having an episode. The grandmother had no neurological symptoms and was confirmed to have no mutation at the position. The grandfather had passed away and thus was not available for sequencing, but he had had no neurological symptoms. Therefore, the mother's mutation was likely de novo.

Application of the mutation classifier on ATP1A3
Given the high cross-validation accuracy and stable decision boundary, our classifier would be useful for predicting whether novel missense variants are neutral or predisposing for AHC or RDP. This method would be applicable to other genetic diseases, especially those without hotspot mutations. From Figure 2B, we noticed that two missense mutations (T613M and S684F) associated with RDP were a little far away from the cluster of disease-associated mutations and interspersed among the neutral variants. This agreed with the fact that RDP was less severe than AHC in some aspects such as cognitive development [9]. Moreover, G854V, which was collected from the NHLBI Exome Sequencing Project, was clearly inside the cluster of disease-associated mutations. We suspected that the carrier was susceptible to RDP, but had been free from triggers.

Genetic testing would facilitate differential diagnosis
AHC often manifested with ocular deviation and dystonia. These early symptoms were often misdiagnosed as epileptic seizures and treated with antiepileptic drugs. Epilepsy may coexist in some AHC patients [16,36,37]. Recognizing early clinical features of AHC and video EEG monitoring of episodes are important for differential diagnosis. ATP1A3 mutation screening can be highly effective for differential diagnosis, especially in the early stages of AHC.
Besides our samples, in which ATP1A3 mutations were found in 95.7% of the typical AHC cases and in all four atypical cases, other studies have reported several atypical patients who also carried ATP1A3 mutations. Heinzen et al. reported one AHC family in which an affected individual had onset of episodes of whole body tonic stiffening at three years old and was confirmed to have   [3]. Taken together, this evidence suggests that mutation analysis of the ATP1A3 gene is helpful for confirming atypical AHC cases.
Given the high positive rate of ATP1A3 mutations in atypical cases and that one of them (Patient A01203) shared the hotspot mutation G947R, the typical and atypical cases may share the same pathogenesis. This suggests that the diagnostic criterion of onset of paroxysmal events before 18 months of age may need to be relaxed.

Potential genetic complexity exists
The aforementioned D923N mutation was also discovered in one of the atypical AHC patients in our cohort (Patient A04103). This mutation had previously been reported in two unrelated RDP patients [14,15], one of whom was atypical for RDP because his age of onset was earlier and he had paroxysmal episodes. Although the vast majority of AHC and RDP mutations had no overlap, this one mutation seemed to cause distinct phenotypes, indicating that other genes or epigenetic and environmental factors may modify the clinical features.
Mutation L770R was discovered in an atypical AHC patient (Patient A01403) and inherited from his unaffected mother. This mutation was located at a highly conserved site and absent from normal controls, and predicted to be highly associated with disease by our classifier with 98.6% probability, implying that it was likely a functional mutation. The mother carried the L770R mutation but had no clinical symptoms of AHC or RDP at last follow up when she was 36 years old. It may be due to incomplete penetrance.

Development of targeted and more effective treatments is needed
Flunarizine is a drug developed to treat migraine [38]. It is used in AHC patients in China, Europe, Canada, and Japan, but not commonly used in the U.S. Flunarizine reduced the severity, duration, or frequency of the hemiplegic attacks in 68.3% of the patients, but could not lead to complete cure. Flunarizine is a nonselective calcium entry blocker targeting the CACNA family [39], not ATP1A3. There is an urgent need for a new drug to specifically target ATP1A3.
In summary, this study was the first genetic analysis performed in a Chinese AHC cohort. The ATP1A3 mutation rate was 95.7% in our typical patients. We identified 13 novel missense mutations of ATP1A3. The majority of mutations were de novo. Genotype-phenotype correlation analysis showed that patients with epilepsy were more likely to carry the E815K mutation. The logistic regression classifier accurately predicted whether a missense variant was neutral or predisposing for disease. Genetic testing of ATP1A3 mutations is helpful for early diagnosis and for confirming atypical cases of AHC.

The 'Spearman's rho' was the Spearman correlation between features and category, 'Pvalue of term in logistic regression' was the P-value of the term in logistic regression between features and category, 'Selected Frequency' was the selection frequency in the robust feature selection procedure, and 'Feature class' meant the class to which each feature belonged. (XLSX)