Large scale analyses of genotype-phenotype relationships of glycine decarboxylase mutations and neurological disease severity

Monogenetic diseases provide unique opportunity for studying complex, clinical states that underlie neurological severity. Loss of glycine decarboxylase (GLDC) can severely impact neurological development as seen in non-ketotic hyperglycinemia (NKH). NKH is a neuro-metabolic disorder lacking quantitative predictors of disease states. It is characterized by elevation of glycine, seizures and failure to thrive, but glycine reduction often fails to confer neurological benefit, suggesting need for alternate tools to distinguish severe from attenuated disease. A major challenge has been that there are 255 unique disease-causing missense mutations in GLDC, of which 206 remain entirely uncharacterized. Here we report a Multiparametric Mutation Score (MMS) developed by combining in silico predictions of stability, evolutionary conservation and protein interaction models and suitable to assess 251 of 255 mutations. In addition, we created a quantitative scale of clinical disease severity comprising of four major disease domains (seizure, cognitive failure, muscular and motor control and brain-malformation) to comprehensively score patient symptoms identified in 131 clinical reports published over the last 15 years. The resulting patient Clinical Outcomes Scores (COS) were used to optimize the MMS for biological and clinical relevance and yield a patient Weighted Multiparametric Mutation Score (WMMS) that separates severe from attenuated neurological disease (p = 1.2 e-5). Our study provides understanding for developing quantitative tools to predict clinical severity of neurological disease and a clinical scale that advances monitoring disease progression needed to evaluate new treatments for NKH.

Introduction Enzyme dysfunction underlies many pathologies, including a large number of neurological disorders, the metabolic consequences of which affect the central and/or peripheral nervous system. Clinical presentations of neuro-metabolic disorders include movement disorders [1], seizures, childhood epilepsies [2], and/or peripheral neuropathy [3]. Glycine decarboxylase (GLDC also known as P-protein) is an enzyme that catalyzes the cleavage of glycine, the first step of the mitochondrial glycine cleavage system (GCS). Other GCS components are aminomethyl transferase (AMT; T-protein), glycine cleavage system H-protein (GCSH), and dihydrolipoyl dehydrogenase (DLD; L-protein). Loss of GLDC-protein activity completely abrogates GCS function. Catabolism of glycine by the GCS is an essential metabolic process. Degradation of glycine feeds into one-carbon folate metabolism through the formation of 5,10-methylene THF [4], which in turn is utilized to synthesize nucleotides and proteins. Perturbations in the GCS are involved in a number of disease states, including cancer and neural tube defects (NTDs) [5][6][7]. Loss of function mutations in GLDC are the primary cause for the rare neuro-metabolic disorder non-ketotic hyperglycinemia (NKH), accounting for approximately 85% of NKH cases [8].
NKH affects approximately 1 in 76,000 births [8], although some populations, such as the Finnish, have a higher rate due to founder mutations and consanguinity [9,10]. A high incidence of NKH has also been reported amongst the Amish [11]. NKH results from loss of GLDC-or (to a lesser degree) T-protein activity. This causes an acute increase of glycine in plasma and cerebral spinal fluid (CSF) [12]. But plasma glycine is not predictive of clinical severity; furthermore, it is a challenge to continuously monitor glycine in the CSF. NKH is typically characterized as either severe or attenuated type [12,13]. Severe NKH causes intractable seizures, failure to thrive, lack of developmental milestones and often premature death. Attenuated patients are often able to control seizures, go to school, and live into adulthood. But lack of mutation-based predictors of disease progression and a quantitative scale of disease severity scale impedes both management of NKH as well as development of therapies to treat and cure the disease.
In this study, we developed a multi-parametric mutation scale applicable to all but 4 of 255 missense NKH mutations across the GLDC gene and thereby assigned a multiparametric mutation score (MMS) to 251 patient mutations. The MMS was further optimized against a newly developed patient-based clinical outcomes score that was based on major symptomatic domains extracted from a comprehensive review of clinical cases reported in the literature. Our findings yield a quantitative tool with high predictive value to support disease management and emerging treatments associated with >95% of known clinical NKH mutations.

Positional distribution of NKH Missense Mutations
The most recently published comprehensive list of NKH missense mutations lists 171 unique mutations across the length of GLDC [8]. To this, we added 85 missense mutations based on additional literature and the Clinvar database. This yielded a list of 256 mutations ascribed to 213 unique residues (S1 Table, Fig 1A). NKH missense mutations were found in the C-terminus, which houses most of the GLDC active site, as well as the N-terminus, suggesting substantial distribution throughout the protein. Of the twelve of the most frequently mutated residues (which give rise to 18 mutations; Fig 1B) five appear in the N-terminus and seven in the C-terminus, and are not concentrated in any specific domains (based on conservation analyses as well as domains predicted from primary and secondary structure; Fig 1C). One of the 256 mutations is located in the mitochondrial leader sequence (M1I). Of the remaining 255, only 49 mutations have been assessed for loss of enzymatic activity compared to the wild type. Of these, only 9 have been analyzed for their potential effect on GLDC-protein structure based on in silico 3D-modeling [14]. Together, the findings summarized in Fig 1A-1C show the overall lack of annotation for NKH mutations and demonstrate the need for improved tools and analyses to better understand mutations across the gene, how they may cause deficiency in the encoded protein and thereby impact disease severity.

Structural annotation of GLDC
Comparative structural analyses. Since comparative structural analysis provides a robust path to understanding functions of conserved domains, we generated a high-confidence homology model for human P-protein using Synechocystis sp. 6803 PLP-bound glycine decarboxylase (PDB: 4LHC) as a template (Fig 2A). The model in Fig 2A shows a global mean quality estimate (GMQE) of 0.77 (where 1 is the highest possible score). Most of the uncertainty in this model comes from a flexible loop consisting of amino acids 360-384. This region is missing in the bacterial orthologue catalogued in the Protein Database because the region can exist in either a disulfide-bridge or open form, resulting in a low electron density [15]. SWISSmodel predicted this region to be in the open conformation, which is the conformation seen in the holoenzyme [15].
GLDC's enzymatic function is dependent on the binding of its cofactor pyridoxal phosphate (PLP) at Lys754 (Fig 2B). Active site residues ( Fig 2C) were defined as any amino acids within 5 Angstroms of the PLP-bound Lys754 or substrate glycine. Active site tunnel residues ( Fig 2D) were defined as residues equivalent to those constructing the active site tunnel observed in the bacterial structure (based on sequence alignment). The dimerization interface ( Fig 2E) was defined as residues predicted to be within 5 Å of the GLDC-protein α'-subunit of the αα' dimer (although there is no direct evidence that dimerization is required for enzymatic function). These three conserved, functional regions provide annotation for much of the Cterminus and they account for 40 NKH-causing mutations, only seven of which have been previously characterized.
N-terminal active site function. Although comparisons with the bacterial structure accounted for mutations at and around the active site, they failed to explain the presence of high density of pathogenic mutations in N-terminal regions, that appear to be devoid of functional annotation. We undertook additional evolutionary analyses to predict N-terminal domain function of GLDC. GLDC belongs to the PLP-enzyme Fold Type I family [16]. However, it is an unusual member as all other PLP Fold Type I enzymes in this family form α 2 homodimers (where each α-subunit is~500 amino acids) while GLDC is either a single polypeptide over 1000 amino acids or, in some bacteria and archaea, an αβ heterodimer, with the α-and β-subunits corresponding to the N-and C-termini respectively (Fig 3A). In these αβ orthologs, there is structural and sequence similarity between the α and β subunits, leading to the suggestion that they came from the same evolutionary precursor [17]. We found that in human GLDC-protein, pBLAST alignment of the N-and C-termini shows a region of similarity (26% identity, 48% positives) between amino acids 238-349 and 572-785 (S1 Fig). Notably, the C-terminal region contains the active site, Lys754. In addition, large regions of the N-terminus (113-472) and C-terminus (531-906) are significantly structurally similar by rigid FAT-CAT alignment (p = 3.61e-13, RMSD = 2.59Å; Fig 3B). The structure of the C-and N-termini also both show similarity to other Fold Type I carboxylases. The defining feature of this fold is https://doi.org/10.1371/journal.pcbi.1007871.g001 the 7-stranded β-sheet packed by α-helices [18], which is evident in both N-and C-termini ( Fig 3C).
Taken together, these analyses suggested a conserved PLP-binding fold in the N-terminus. Accordingly, I-Tasser COFACTOR prediction for P-protein predicts the N-terminus to be a PLP-binding site based on structural similarity to other PLP-binding enzymes (Fig 3D), although with a low confidence score of 0.05 (range: 0-1). However, the known C-terminus PLP-binding site was not predicted by I-Tasser, enabling us to retain the N terminus projection (despite the low score). Intriguingly, however, the N-terminal PLP Fold is lacking an active site lysine, having instead a glutamine at that position. However, site-directed mutagenesis of another PLP Fold Type I enzyme's active site lysine showed that lysine, while essential for catalysis, was not essential for PLP-binding [19]. Thus, we do not predict that this pocket has enzymatic activity. Rather, we predict that the N-terminus non-covalently binds PLP. It  has been observed that PLP is needed for GLDC-protein to fold properly [20], suggesting a rationale for why 11 clinical NKH-mutations cluster in the fold-predicted PLP-binding region ( Fig 3E; which has no other known function).

Macromolecular interaction of H-Protein with GLDC-protein.
The interaction of GLDC with H-protein is essential for GLDC function, but the molecular coordinates of the interaction remain unknown. A lipoyllysine group on H-protein accepts the amino-methyl moiety produced by decarboxylation of glycine by GLDC-protein and transfers the moiety to T-protein [21]. We predicted that this lipoyllysine accesses the active site through the observed active site tunnel. To model this interaction, a homology model for H-protein was generated with SWISS-Model using a published crystal structure for bovine H-protein (98% sequence similarity to human H-protein) as the template. The ClusPro 2.0 server was used to model the interaction of H-Protein with the homology model for human GLDC-protein. ClusPro 2.0 produced 101 potential interaction models (S1 Appendix). To select the best model, a novel scoring system was developed based on conservation of the two proteins' interacting residues and proximity of the modified lysine of H-protein to the entry of the active site tunnel in GLDC-protein. Conservation was chosen to select the best model because highly conserved protein surfaces are a predictor of a protein binding site [22]. Accordingly, GLDC-protein contains a highly conserved region at the site where H-protein should bind (Fig 4A). To further test our conservation parameters, we scored the interaction between human T-protein with H-protein (which are expected to interact) and T-and GLDC-protein (not expected to interact). Like the H-GLDC interaction, the H-T interactions had scores close to the max of 2 while the T-GLDC interaction did not (Fig 4B), confirming the utility of this method. To test the accuracy of all three scoring parameters, the interaction between E. coli T-and H-protein was modeled using ClusPro 2.0 (S1 Appendix) and scored (S1 Table). The highest-ranking computational docking interactions were compared to the previously published crystal structure of the docking interaction [23], and it was found that the second highest scoring model (score = 2.84 of 3) was visually similar to the crystal structure (S2 Fig).
We performed two rounds of H-and GLDC-protein scoring to ensure that we obtained the highest-possible scoring model (S2 Table; Fig 4C and 4D) with a score of 2.81 out of 3 ( Fig  4D). This model was selected as our predicted model of the interaction between human GLDC-and H-protein ( Fig 4E). Finally, of seven known NKH-mutations at the predicted H-GLDC interface, five are predicted by Mutabind to negatively affect the interaction between H-and GLDC-protein ( Fig 4F). These mutations point to the importance of the salt bridges formed between Arg373 and Lys376 of GLDC-protein and Glu77 and Glu124 of H-protein for stabilizing the interaction (Fig 4G). Thus, our interaction models provide crucial information for these six residues with NKH-mutations that were previously not understood.
As summarized in Table 1, the rates of mutation in the N-term PLP binding site (0.3) and H-GLDC interface (0.35) are higher than the baseline mutation rate in the protein (0.21) and approach those seen in the active site (0.33) ( Table 1), but these increases are not statistically significant by the hypergeometric test suggesting that mutations throughout the protein have some capacity to confer defect.
Large-scale analysis of disease mutations. Comparative structural and evolutionary analyses undertaken in i-iii, enabled annotation of 51 previously uncharacterized NKH mutations. Although this reflects a 100% increase in mutation annotation, a large number of mutations (~150) remain in need of annotation. Since many missense mutations are expected to impact protein folding, we initiated large-scale studies that incorporate the assessment of the Gibbs free energy changes (ΔΔG) that arise as a consequence of mutation, as well as other changes in intrinsic properties of amino acids. ΔΔG provides a benchmark measure to predict the change in stability of a monomeric protein caused by a point mutation. Although many predictive online tools exist [24], we used CUPSAT because it makes fast and accurate ΔΔG predictions and is thus ideally suited for the large number of missense mutations seen in NKH and GLDC. We provided the SWISS-Model generated GLDC homology model as the input and defined destabilizing mutations as those with predicted ΔΔG < -1.5 kcal/mol (see Methods) to yield stability predictions for 251 of 255 missense mutations (see S3 Table). The remaining four mutations in a small, uncrystallized region at the beginning of the N-terminus could not be assessed and were not pursued further.
Fig 5A provides a pictographic representation that suggests that of the 251 mutations, 105 were predicted to be destabilizing (< -1.5 kcal/mol) with 42 being very destabilizing (< -5 kcal/mol). Destabilizing mutations were seen throughout the protein with 44 being found in the N-terminus and 61 being found in the C-terminus (S3 Table). The two most common NKH clinical mutations, both of which are known to cause severe disease are predicted to be destabilizing (R515S ΔΔG = -2.47 kcal/mol; S564I ΔΔG = -3.23 kcal/mol). In total, 4 (R515S, S564I, G771R, and V905G) of the top 10 most common missense mutations are predicted to be destabilizing. But the majority of mutations are predicted to have (i) negligible effect (ΔΔG = -1.5 to 1.5 kcal/mol; N = 96), (ii) stabilizing effect (ΔΔG = 1.5 to 5 kcal/mol; N = 35), or (iii) very stabilizing effect (ΔΔG > 5 kcal/mol; N = 15). This suggests that while ΔΔG provides valuable information on predicted stability for NKH missense mutations, it is not sufficient as a comprehensive parameter for the impact of these mutations.
Multiparametric mutation scores (MMS) that incorporate ΔΔG and other protein parameters, have previously been successfully used to predict missense mutation effect on protein dysfunction [25]. Therefore, we created an MMS that incorporated four broad categories of 1) stability effects, 2) conservation of mutated amino acid, 3) position of the mutated amino acid, and 4) change in amino acid properties caused by substitution (Fig 5B and 5C). Specifically, we defined two stability parameters (stabilizing and destabilizing substitutions), two conservation parameters (conservation of residue and of amino acid substitution), eight location-based parameters (sheet, helix, c-terminus, active site, active region, N-terminus PLP pocket, H-protein interface, and dimerization interface mutations), and six parameters based on change in amino acid properties (change in polarity, charge, aromaticity, codon availability, size, and to/

Fig 4. Model of the interaction between GLDC and H-Protein of the Glycine Cleavage System. [A]
Model showing amino acid conservation scores in GLDC calculated using the Consurf server. Highly conserved amino acids are shown in red, while poorly conserved residues are shown in blue. The highly conserved surface (deep red) found at the entry of the active site tunnel suggests this region may be a binding site for Hprotein.
[B] Scoring of models generated for the GLDC + H interaction (blue), T + H interaction (orange), and the GLDC + T interaction (gray). Conservation of interacting surfaces was scored from 0-1, with the most conserved surface assigned a score of 1. GLDC and T-protein likely do not interact, thus their interaction was included as a negative control. Accordingly, the GLDC + T scores were lower than the GLDC + H and H + T interaction scores, validating our novel scoring method.
[C] Scoring of GLDC + H-protein interaction models scored using the conservation of interacting amino acids with the distance between the H-protein and the entry to the GLDC-protein active site tunnel as an additional parameter. H-protein was allowed to bind to any location on the GLDC-protein surface.
[D] H-protein was constrained to a region of GLDC protein based on the highest scoring results of the first round of scoring, and scoring was repeated for the resulting models.
[F] Seven known NKH mutations were found at the predicted H-GLDC interface. The ΔΔG caused by the mutation was estimated using Mutabind. 5 of 7 mutations are predicted to caused deleterious effects to the H-GLDC protein interaction, with 3 being high confidence predictions.
[G] Salt bridges between Glu77 and Glu124 of H-protein and Arg373, and Lys376 of GLDC-protein. These GLDC-protein residues are known to cause NKH when mutated.
https://doi.org/10.1371/journal.pcbi.1007871.g004 from proline). Together these contributed eighteen distinct parameters which, when applied to 251 NKH causing-missense mutations, yielded an MMS for each mutation (S2 Table). Phi correlation analysis of the eighteen parameters demonstrated independence between all parameters. There were light negative correlations between conservation of amino acid substitution and change in polarity (φ = -0.44, with 1 being perfect correlation) and change in volume (φ = -0.51) (Fig 5B). However, these correlations are statistically weak and hence both parameters were retained. The distribution of the scores indicates a lack of distribution bias in scoring ( Fig 5D). Only 4 (R212K, R377Q, E495Q, and N709S) of 251 mutations received a score of 0, confirming MMS yields value for the vast majority of known mutations. Mutations with scores of 1-2 were considered mild (N = 58), 3-4, moderate (N = 93), and �5 severe (N = 99) (S4A Table), consistent with reports that NKH disease in the patient population is more likely to be moderate to severe (rather than mild) [12,13].

Clinical outcomes score to assess patient disease severity status
NKH is a multi-system neuro-metabolic disorder. Hennermann et al [12] have classified disease on the basis of presence and absence of neurological/brain features, while Swanson et al [13] classified disease based on reaching developmental outcomes. But the lack of a disease severity scale based on a quantitative, dynamic progression of different NKH symptoms across severe and attenuated disease, has limited linking genotype to phenotype.
To develop such a clinical severity scale, we began by building a comprehensive list of symptoms associated with NKH. Prior studies utilized a list of 12 NKH symptoms [12]. We reviewed 131 patient records from 26 publications over the last 15 years, to identify fifty-eight unique symptoms (Table 2), that were identified and classified into eleven categories of hyperglycinemia, cognitive disorders, seizures, muscle/movement control, brain malformations/ injury, respiration, hormonal disorders, hearing, eyesight, immune system and digestion. A category was considered a major disease domain if it was represented in at least of 30% of patients with recorded symptoms, with the exception of glycine elevation, because while it provides a diagnostic criterion, it does not correlate well to the severity of neurobehavioral disease [13]. As shown in Table 3, four major domains emerged, namely (and in order of frequency) cognitive disorders (81%), seizures (73%), muscle and movement dysfunctions (35%), and brain malformations (32%). They encompassed 46 of 58 (79% of) symptoms. Respiratory defects were seen in 17% of patients, which is likely a result of under-reporting of the symptom. Regardless, respiratory issues usually self-resolve (barring when intubation was removed because of the overall poor prognosis), and respiration was not included in symptomatic domains. Hearing, eyesight, immune system, hormonal and digestive disorders were each seen in less than 3% of cases and therefore not included. A Likert-like scale was used to assign major domain scores of 0-3 based on severity in each domain (Table 3). Cognitive disorders and muscle/movement control were assigned linearly from 0-3. The seizure domain was assigned a non-linear step increase of 1 to 3 corresponding to transition from controlled seizure activity to uncontrolled seizure activity (and capture the severity of intractable seizure compared to controlled seizure). The brain malformation domain was assigned a binary choice of 0 or 3, because any brain malformation is expected to seriously impact neurological disease. Summation of all four domains yielded a patient clinical outcome score (COS), with a maximal score of 12.
In the assessment of patient COS, we removed 45 records where patients had died, because death can occur due to a single acute event that may not reflect multi-symptom disease severity. Most of these were predominantly of pediatric patients and associated with a range of mutations (S4B Table). An additional 12 patient records were also excluded because they contained information in only 1 (of four) major domains. The remaining 74 patient records were quantitatively scored for their associated major disease domains (Tables 4 and 5, S2 Table). The majority of records from patients were scored for 3 out of 4 major disease domains in both homozygous and compound heterozygotes (Tables 4 and 5). Seizures and cognitive disorders were the two most common disease domains, although muscle/movement control and brain malformation were also often reported. As shown in Fig 6A, the majority of patient COS's ranged from 2-9, with two patients showing the maximal scores of 12. Severe patients scored above 5 (having a severe score of 3 in at least one domain and moderate score of 2 in another). In our cohort, there were 29 patients with severe disease. 43 patient records showed attenuated disease, and two individuals were asymptomatic (despite being homozygous for the pathogenic A802V mutation; Fig 6A). Of the 74 patients, 40 were male and 34 were female, and both genders showed similar range distributions of severe and attenuated COS's ( Fig 6B). Analyses for age suggest that severe disease was more prominent in children < 5 in both genders (Fig 6C).
We examined whether COS could be corroborated with known information about patient mutations for both homozygous and compound heterozygous mutations. First, for the 24 patients with homozygous mutations, each allele was assigned the same COS (see Table 4), which also served as the overall patient COS. When multiple patients were homozygous for the same mutation, an average COS was calculated. As shown in Table 6, four of the 18 � Number assigned on basis of S4A homozygous mutations (R515S, T269M, A389V, A802V) are amongst the top 10 mutations found in clinical NKH. R515S received a COS of 9 (out of 12) consistent with its association with severe disease [26]. T269M, A389V, A802V, received COS's of lower than 5, in keeping with their association with attenuated disease [27][28][29]. Overall, 14 out of 16 attenuated (COS � 5) cases were associated with low MMS, while 4 of 8 severe cases (COS greater than 5) were associated with MMS � 5. There was no overt positional bias of COS in GLDC-protein ( Fig 7A). For compound heterozygotes, since this group had two alleles, the observed patient COS was considered to be the composite contribution of both alleles without the assumption that each allele contributes exactly 50% to the clinical outcome. As shown in Table 7, 27 of 29 (93.1%) of patients with attenuated disease (COS<5) showed lower MMS scores (<5). 2 of 21 (9.5%) patients with severe disease (COS>5) showed higher MMS scores (�5). Notably of 11 patients with truncations/deletions and COS>5, all showed MMS scores below 5. Examination of the location of missense mutation and deletions/truncations in each heterozygote suggested that deletions or truncations early in the gene maybe associated with severe disease (Fig 7B).

Weighted-Optimization of the MMS by the COS
For both homozygous and compound heterozygous mutations in a patient, the MMS is a better predictor of attenuated versus severe disease. This suggested that the MMS needed further optimization against clinical disease to more accurately capture the determinants of severe disease. It should also be noted that the MMS does not incorporate human factors such as genetic background which is known to play a prominent role in disease manifestation in genetic disorders. Homozygous mutations. The MMS was first applied to all 18 homozygous mutations associated with 24 clinical cases (summarized in Fig 8A). As shown in Table 6, every patient mutation was assigned an MMS score in addition to the previously determined COS. Ten variants that were found in homozygous form of GLDC in the Exome Aggregation Consortium (ExAC) database, hosted by the Broad Institute, (Table 8) were also scored. The ExAC database utilizes genomic data from healthy individuals; thus, homozygosity of these mutations indicates that they are non-pathogenic. They were included therefore as a non-pathogenic control group and were assigned a COS of zero. The COS was applied as a function of the MMS of homozygous mutations (S3 Fig, Fig 8A), and this correlation was used to optimize parameter weights in order to yield a model with more biological and clinical value. Weighting was automated using Python. Each parameter weight was optimized using the linear regression, done with the linear regression class in Python's Scikit Learn module. The final correlation between COS and the weighted MMS (WMMS) yielded an R 2 -value of 0.79 ( Fig 8B). As shown in Fig 8B, R515S, the severe and dominant NKH mutation, remained predicted to be pathogenic with a WMMS of 4.3. Less pathogenic but prevalent mutations A389V (WMMS = 1.2), and A802V (WMMS = 0.3) and T269M (WMMS = -0.8) are predicted to be attenuated.
R515S scored in five parameters. R515 is a highly evolutionary conserved, C-terminal residue, and the substitution to serine is destabilizing and causes a change in charge and volume (see S5 Table, S6 Table). A389V and A802V each scored in 4 parameters, including the negatively scoring conserved evolutionary substitution parameter. A389 is a conserved residue in  an α-helix, while A802 is a C-terminal residue and part of a C-terminal loop that is within 5 Å of the predicted N-term PLP pocket. Substitution from alanine to valine causes a change in volume but is a well-tolerated substitution throughout evolution. T269M only scored in one parameter, as the mutation results in a change in polarity. Fig 9A. Compound heterozygous scoring was complicated by the fact that each individual bears two different mutations, often with one of the mutations being a deletion, nonsense mutation, intronic mutation, or a mutation in the mitochondrial signal sequence. Of the 50 compound heterozygous patients in our cohort, 18 patients had two missense mutations and 32 patients had one mutation that was not a missense mutation. Each pair of missense mutations was assessed across 18 parameters, described in Methods. MMS parameters were designed specifically for missense mutations; thus, to facilitate scoring of the 32 patients, deletions, nonsense mutations, intronic mutations, and mutations in mitochondrial signal sequence were added as four additional parameters. Heterozygous mutations were trained without the assumption that each allele contributes exactly 50% to the clinical outcome (Methods; Table 7).

Compound heterozygous mutations. The overall strategy for application of COS and MMS (S3 Fig) for compound heterozygous individuals is summarized in
We also scored variants from healthy individuals (obtained from dbGaP), which were included as non-pathogenic controls with a COS of zero (Table 9). MMS parameters and deletion, nonsense, intronic, and mitochondrial signal sequence mutation scores were optimized using the best fit line of the COS vs the composite score using the linear regression class in Python's Scikit Learn module (see Methods). The R 2 of resulting best fit of COS vs composite score fit obtained with heterozygous mutations (Fig 9B) was slightly lower than that observed with homozygous mutations (0.73 vs 0.79) (Fig 8B). We suggest that with more robust clinical data, these differences as well as differences in weighting between homozygous and heterozygous-trained parameters (Table 10), would converge. Nonetheless, our analyses informed that COS is proportional to WMMS.
We further examined cases where both mutations were heterozygous by plotting the score of one allele as a function of the score of the second allele (Fig 9C). Individuals that were asymptomatic (COS = 0), attenuated (COS = 1-5), and severe (COS > 5) could be separated based on gating these populations (Fig 9C). Overall, this plot shows that severe disease requires two severe mutations, while attenuated disease is often either a mixture of mild and severe or two moderate mutations. Based on the asymptomatic cohort clustering, healthy individuals can be compound heterozygous for GLDC variants if one of them is very mild. These results support the use of WMMS's as a predictive tool for NKH outcome. severe NKH patients (COS > 5). For both the homozygous-trained ( Fig 10A) and heterozygous-trained ( Fig 10B) WMMS's, each disease category shows significant separation. Most importantly, severe disease can be significantly distinguished (by student's t-test) from attenuated disease with a p-value of 1.2e-5 for homozygous patients and a p-value of 3.5e-7 for heterozygous patients. These data support the use of WMMS's as a predictive tool for the clinical outcome of NKH based on retrospective clinical analyses.

Discussion
We used computational approaches to undertake large scale, comparative and evolutionary analyses to enable multiparametric in silico assessment of 251 (of 255) NKH disease causing missense mutations based on structure-function properties intrinsic to GLDC protein. Further, our data ascribe either conserved, evolutionary function or clinical disease severity to 89 previously uncharacterized mutations. In contrast, prior studies have cumulatively reported on biochemical characterization of 49 mutations. Our evolutionary analyses support a new PLP binding function for the N-terminal PLP domain and predict residues that form the junction for H-GLDC interactions. Four of the top 10 clinical mutations (R515S, S564I G771R, and V905G) could only be annotated by the MMS. The development and utilization of COS (rather than a biochemical activity) are particularly advantageous for a large gene like GLDC (1020 aa) with hundreds of disease-causing mutations. But as with many rare diseases, clinical symptom presentation of NKH disease is highly heterogeneous. We therefore based our clinical outcomes scale/score on multiple (~50) symptoms that were, aggregated into four major disease domains. A major limitation of retrospective analyses of clinical records, is the variability in the published data content. This variability was somewhat decreased by excluding records that only contained one (of four) major symptomatic domains. Removing patients who died may decrease the power of prediction for severe cases in the WMMS model. Nonetheless the cohort analyzed captured a dynamic range of disease outcomes, including severe disease in young patients.
In studies with compound heterozygotes, mutations were trained without the assumption that each allele contributes exactly 50% to the clinical outcome. The extent to which an allele contributes to clinical disease is poorly understood and may well be affected by the overall genetic background of the patient. Thus, further adjustments may be needed to optimize the multi-parametric mutation score of compound heterozygotes. Further optimization is also  needed for other non-missense mutations, such as considering the position of a deletion or premature stop codon in relation to the protein active site. Nonetheless we were able to score 50 compound heterozygotes and 24 homozygotes (total of 74 patient records) based on a comprehensive survey of the clinical literature and in future work could be rapidly scaled to all mutations, missense or otherwise.
In conjunction with pathogenic destabilizing mutations in target proteins, multiple additional factors influence disease progression in all genetic diseases. A nonsense or deletion early in the coding region is likely to be very severe, but much milder if located in the C-terminus after the protein active site. Intronic mutations that are not at the splice site will likely have less impact while those at the splice site can have differential effects depending on location. Finally, the overall genetic background has a profound influence on the emergence and progression of disease. While these parameters cannot be incorporated into the first step of developing a mutation-based score, weighting the MMS against the COS enables influence of parameters of 'clinical' relevance into the weighted MMS (WMMS). The WMMS was particularly important in analyses of compound heterozygous mutations in cases where the second mutation is a deletion whose location can dramatically alter severity. In conclusion, our data suggest that WMMS is sufficiently robust to distinguish between severe and attenuated disease based on optimization using retrospective analyses of patient records. We therefore suggest that it presents a powerful tool to initiate and refine future analyses in larger prospective studies with active recruitment/review of patients and their medical records to strengthen management and prediction of disease course.

Homology modeling of P-and H-proteins
Human homology models were generated for GLDC and H-proteins using the SWISS-model (Swiss Institute of Bioinformatics), which generates homology models as described previously [30][31][32]. Human GLDC was modeled using as the template the solved crystal structure for Synechocystis sp. PCC 6833 P-protein holoenzyme (PDB ID = 4lhc; sequence identity with human = 56.8%). Human H-protein was modeled using as the template bovine H-protein crystal structure (PDB ID = 3wdn, sequence identity with human = 98.0%).
The active site, active site tunnel, and dimerization interface functional regions of human P-protein were inferred from sequence comparison to the Synechocystis holoenzyme crystal structure.

Protein imaging
All 3D protein images were generated using the free protein-modeling software Jmol.

NKH mutation analysis
A comprehensive list of NKH-causing missense mutations was compiled through a literature search of previously published mutations and mining of missense mutations catalogued in the ClinVar database hosted by the National Center for Biotechnology Information (NCBI). Missense mutations in ClinVar reported as "Benign" or "Likely Benign" were excluded. Secondary structure location of missense mutations was determined using jMol. Allele frequencies, if reported, were extracted from the Exome Aggregation Consortium (ExAC) database (Broad Institute) [33].

Ligand prediction
Structure-based ligand predictions were done using the I-Tasser COFACTOR tool provided by the Zhang Lab [34,35] (University of Michigan).

Protein-protein interaction modeling
The human P-and H-protein docking models were generated using the ClusPro 2.0 server [36][37][38] (Boston University). Models were generated as previously described. Briefly, the interacting proteins were docked using the fast-fourier transform (FFT) method. Highly populated clusters were selected and screened by CHARMM minimization.

Model ranking
ClusPro 2.0 models were ranked using three equally weighted parameters: conservation of Pprotein interacting residues, conservation of H-protein interacting residues, and distance of the active site lipoylated lysine from H-protein from the active site entry tunnel of P-protein.
Scores were normalized such that the highest scoring model in a particular parameter received a score of 1, and the lowest scoring model received a score of 0. Interacting residues were defined as residues within 4 Angstroms from the docking partner. Conservation scores were generated using the Consurf server [39][40][41], which assigns a conservation score based on a multiple sequence alignment (MSA) with 150 unique orthologs. Parameters were tested using a previously-published crystal structure of the interaction between E. coli T-and H-protein (PDB: 3A8K) [23]. The interaction was modeled using ClusPro 2.0, and the models were scored using the above parameters. Top models were compared to the crystal structure. The interaction between human T-and H-protein and P-and T-protein (as a negative control) were also modeled using the above methods. Human T-protein has a previously published crystal structure (PDB: 1WSV) [42].

Mutation effect on protein stability
Multiple online tools for prediction of ΔΔG caused by point mutations were preliminarily assessed, including CUPSAT, MuPro, PopMusic, STRUM, mCSM, FoldX, Dynamut, I-Mutant 3.0, SDM, and DeepDDG. CUPSAT uses protein structure to make fast and accurate ΔΔG predictions from environment-specific atomic and torsion angle potentials. The high speed of CUPSAT predictions lends itself to analyses of diseases associated with a large number of missense mutations [43]. CUPSAT was also recently found to be the most accurate predictor for disease-causing mutations in PIXT2 that were known to be destabilizing [44]. Because of these considerations and because predictions are based on structural, site-specific information about the mutated residue, predictions of the stability effects of NKH-causing mutations were generated using CUPSAT. The SWISS-Model generated GLDC homology model was provided as the input. We defined destabilizing mutations as mutations with a predicted ΔΔG < -1.5 kcal/mol because a study found that 45 missense mutations associated with protein loss-offunction with experimentally derived ΔΔG's had an average value of -1.67 kcal/mol [45].

Multi-Parametric Mutation Score (MMS)
An eighteen-parameter test based on stability effects, conservation, location, and amino acid properties was used to score each mutation. Parameters were defined as the following: i. Destabilizing ΔΔG as predicted by CUPSAT. ii. Highly stabilizing mutation (positive ΔΔG, because rigidity in the protein is known to cause a loss of function). iii. Evolutionary conservation of the mutated amino acid (based on 150 homologs from different species because high conservation indicates that the residue is vital to the function or fold of the protein) iv. Conservation of amino acid substitution (based on Blosum62 matrix which indicates how likely it is that one amino acid will be substituted for another), v. Location in secondary structure domains of αhelix or vi. β-sheet because secondary structures contribute to fold and function of proteins vii. location in C-terminus because the catalytic activity is primarily in the C terminus viii.
Residue is part of the active site (directly involved in binding/stabilizing PLP or substrate glycine, because it is essential to the catalytic function of the protein). ix. Mutation is within 7 Angstroms of the active site (because mutations in 3D proximity to the active site can affect its fold) x. Residue is part of the N-term PLP pocket (this study, because this will affect stability and/or catalytic activity.) xi. Residue is at P-H interaction interface (because this too can affect stability and/or catalytic activity). xii. Residue is at the known dimerization interface (because dimerization may be important to function). Xii-xviii. The mutation results in a change of amino acid properties, including change in xiii. polarity, xiv. charge, xv. aromaticity, xvi. to/ from proline, xvii. tRNA availability, and xviii. volume. Phi correlation analysis demonstrated independence between all parameters, except for slight negative correlations between conservation of amino acid substitution and change in polarity (φ = -0.44, with 1 being perfect correlation) and change in volume (φ = -0.51) ( Fig  5B). However, these correlations are statistically weak and therefore do not negate the validity of including each of these parameters.
These eighteen parameters were combined to create a multiparametric mutation score (MMS), which scores mutations based the broad categories 1) stability effects, 2) conservation of mutation amino acid, 3) position of the mutated amino acid, and 4) change in amino acid properties caused by substitution. Mutations with scores of 1-2 were considered mild, 3-4, moderate, and �5 severe.

NKH Patient clinical outcome scoring
Phenotypic data were collected from case studies for 131 patients in the literature where the genotype was known and the patient had at least one missense mutation [13,14,[27][28][29]. A comprehensive list of NKH symptoms in the clinical data was created (Table 2), and we developed a clinical outcome scoring scale based on the four major symptomatic domains of 1) seizures, 2) cognitive disorders, 3) muscle/movement control and 4) brain malformations ( Table 3). The cognitive disorders and muscle/movement control domains were assigned linear Likert-like scores from 0-3 which represent the observed severity gradation of these domains in NKH patients. The seizure domain has a non-linear increase from 1 to 3 for controlled seizure activity and uncontrolled seizure activity, respectively. This increase more accurately captures the severity of the intractable seizure phenotype. The brain malformation domain is a binary choice of 0 or 3, because unlike the other symptom domains which have gradations of severity, brain malformations are a presence/absence binary. We concluded that the seriousness of a structural brain malformations warranted a score of 3. Summation of all four domains yields a patient clinical outcome score (COS), with a maximal score of 12.
Patients who were deceased at the time of the case report were not scored, leaving 86 patients. Domains that were reported as asymptomatic in the case report were scored 0. Domains that were not mentioned were left blank. Patients for whom only one major domain was able to be scored (12 of the 86 patients) were excluded from further analyses, leaving a 74-patient cohort.

Homozygous mutation clinical outcome scores
Twenty-four of the 74 patient cohort were homozygous for 18 NKH-causing missense mutations. In cases where the mutation was present in one individual, the mutation was assigned the same COS as the individual. In cases where more than one individual was homozygous for the same mutation, an average COS was taken (Table 4).

Weighted Multiparametric Mutation Score (WMMS)
i. Homozygous Mutations. The MMS was first applied to all 18 homozygous mutations (listed in S6 Table) associated with 24 clinical cases. Every patient mutation was assigned an MMS score in addition to the previously determined COS. Ten variants of GLDC were found in homozygous form in the Exome Aggregation Consortium (ExAC) database, hosted by the Broad Institute were also scored. The ExAC database utilizes genomic data from healthy individuals; thus, homozygosity of these mutations indicates that they are non-pathogenic. They were included therefore as a non-pathogenic control group and were assigned a COS of zero.
ii. Compound heterozygous mutations. For compound heterozygous mutants, both GLDC mutations were listed in S7 Table. Each set of mutations was assessed across the 18 parameters for missense mutations. If both of the individual's GLDC mutations met a parameter condition, the individual was assigned a score of 1 for that parameter. If only one mutation met the parameter condition, the individual was given a score of 0.5. If neither met the condition, the individual was given a score of 0. Four additional parameters were included to account for non-missense mutations. These parameters included deletions, frameshift/premature stop codon mutations, intronic mutations, or mutations in the mitochondrial leader sequence. The summation across these parameters yielded an MMS.
Variants from healthy individuals (obtained from dbGaP), were included as non-pathogenic controls with a COS of zero. MMS parameters were optimized as above using the best fit line of the COS vs the composite score.
iii. For both homozygous and heterozygous mutations. Ideally, the homozygous-trained parameters and heterozygous-trained parameters would have approximately equal weights. The weight of the stabilizing mutations and the change in polarity parameters regressed to 0 for each dataset, indicating that these parameters have a negligible effect on disease severity. Active site mutations in each case were amongst the top weighted parameter with weights of 2.48 for homozygous and 3.92 for heterozygous mutations (Table 10). This indicates, as would be expected, that active site mutations severely affect protein function. For both the homozygousand heterozygous-trained sets, the parameters helix mutations, change in proline and codon availability are within weights of +/-1. For the homozygous mutations, none of the mutations were in the active region, and thus this parameter could not be optimized and was set to 0. The other 11 parameters have weight differences larger than +/-1, indicating either that the parameter weighting for these parameters is biased by the data that it's trained on, or that these parameters have different degrees of importance for heterozygous and homozygous mutations.
Coefficients for the 18 missense mutation parameters and 4 non-missense mutation parameters were calculated by minimizing the sum of squares between the observed clinical outcome scores and the predicted clinical outcome scores from the MMS using the linear regression, done with the linear regression class in Python's Scikit Learn module. We note that heterozygous mutations were trained without the assumption that each allele contributes exactly 50% to the clinical outcome.