Dissecting mutational allosteric effects in alkaline phosphatases associated with different Hypophosphatasia phenotypes: An integrative computational investigation

Hypophosphatasia (HPP) is a rare inherited disorder characterized by defective bone mineralization and is highly variable in its clinical phenotype. The disease occurs due to various loss-of-function mutations in ALPL, the gene encoding tissue-nonspecific alkaline phosphatase (TNSALP). In this work, a data-driven and biophysics-based approach is proposed for the large-scale analysis of ALPL mutations-from nonpathogenic to severe HPPs. By using a pipeline of synergistic approaches including sequence-structure analysis, network modeling, elastic network models and atomistic simulations, we characterized allosteric signatures and effects of the ALPL mutations on protein dynamics and function. Statistical analysis of molecular features computed for the ALPL mutations showed a significant difference between the control, mild and severe HPP phenotypes. Molecular dynamics simulations coupled with protein structure network analysis were employed to analyze the effect of single-residue variation on conformational dynamics of TNSALP dimers, and the developed machine learning model suggested that the topological network parameters could serve as a robust indicator of severe mutations. The results indicated that the severity of disease-associated mutations is often linked with mutation-induced modulation of allosteric communications in the protein. This study suggested that ALPL mutations associated with mild and more severe HPPs can exert markedly distinct effects on the protein stability and long-range network communications. By linking the disease phenotypes with dynamic and allosteric molecular signatures, the proposed integrative computational approach enabled to characterize and quantify the allosteric effects of ALPL mutations and role of allostery in the pathogenesis of HPPs.


Introduction
Hypophosphatasia (HPP) is a rare autosomal dominant or recessive metabolic disorder that constitutes a complex, multisystemic disease [1,2]. The clinical manifestation of this disease is highly diverse and is often linked with the mutational landscape and the inheritance mechanism [2]. From a prenatal lethal form with no skeletal mineralization to a mild form with late adult onset, HPP can be classified into six subtypes including perinatal, infantile, prenatal benign, childhood, adult and odonto [3]. In general, perinatal and infantile subtypes represent severe forms of HPP, while childhood, adult, odonto, and prenatal benign subtypes constitute mild phenotypes. [4] HPP is caused by loss-of-function mutations in the ALPL gene encoding the Tissue Nonspecific Alkaline Phosphatase (TNSALP). [5] TNSALP is a membrane-bound metalloenzyme, whose activity is reduced by various mutations in the ALPL gene, leading to the increased inorganic pyrophosphate, which in turn causes different HPP phenotypes [6]. As an enzyme replacement therapy, ENB-0040 is a bone-targeted, recombinant human TNSALP that prevents the manifestations of HPP [7]. However, to date, there is no established treatment for HPPs [8], due to the little knowledge about the relationship between mutations in the gene responsible for HPPs and phenotypic variability.
The ALPL gene is localized on chromosome 1p36.1-34 and consists of 12 exons distributed over 50 kb [9,10], ALPL mutation detection is important for recurrence risk assessment and prenatal diagnosis [11]. To date, more than 400 ALPL gene mutations have been reported worldwide and approximately 80% of these mutations are missense. Although HPP is caused by homozygous, heterozygous, or compound heterozygous ALPL mutations [12], most of these mutations cause changes in a single amino acid in TNSALP. Mutations in ALPL would reduce the enzyme activity of TNSALP mutant proteins to a varying degree, with residual activities often exhibiting different enzymatic properties from wild-type TNSALP. By measuring the effects of amino acid changes on TNSALP dimer stability, the relationship between ALPL missense variants and TNSALP enzymatic activity can be predicted [13]. Despite recent progress, the correlation between these mutations and the six HPP subtypes is not well established, and thus the molecular mechanisms underlying the genotypic-phenotypic relationships of HPP remain unclear [14]. A more detailed molecular characterization of ALPL mutations can help to better understand mechanisms driving the pathogenesis, and onset of HPP. In addition, understanding and characterization of the molecular determinants of pathogenic mutations may enhance a toolkit for inhibiting TNSALP and design of targeted therapies [15]. level, structural modeling, energetic analysis, protein structure network (PSN) modeling, and ENM-based dynamics analysis, different molecular and network signatures were determined and attributed to three types of mutations. Second, statistical analysis including creation of a random forest model, was performed to test which molecular signatures can serve as robust predictors for classifying the three kinds of mutations. Finally, through integration of longrange perturbation dynamics and network-based approaches, we quantified the allosteric potential of selected mutation residues. Our study characterizes the mutational landscape of ALPL through modeling and analysis of molecular signatures and allosteric effects of mutations [52], providing new insights into the genotype-phenotype interrelationship in HPP [53].

Data collection of ALPL mutations
With the goal of classifying various ALPL variants according to different phenotypes, 242 single-point loss of function variants were selected in HPP patients with mild and severe phenotypes and the control group from the ALPL gene mutation database (https:// alplmutationdatabase.jku.at, accessed on 11 January 2021), the Human Gene Mutation Database (http://www.hgmd.cf.ac.uk/ac/index.php), the Locus Specific Mutation Databases (http://www.hgmd.cf.ac.uk/docs/oth_mut.html), and literature in the PubMed database. The collated data set of ALPL mutations associated with mild, severe HPP phenotypes and putative functionally non-pathogenic variants, as well as their molecular signatures, are listed in S1 Table.

Sequence conservation and coevolution analyses
By using the Consurf server, [54] the conservation scale for each residue in the TNSALP protein was calculated (range from 1 through 9, where 1 denotes the node least conserved and 9 denotes the most conserved sites). The refined multiple sequence alignment (MSA) of the TNSALP protein can also be retrieved from the Consurf server for the following up Shannon information entropy S(i) and mutual information (MI) calculations. S(i) measures the variability of specific sites of protein sequences, which is calculated as where P(ai) is the probability of occurrence of amino acid type in the i th column. S(i) varies in the range 0�S(i)�3.0, and a lower S(i) implies higher evolutionary conservation. Similarly, MI was applied as a measure of the degree of intra-molecular coevolution between residues. The MI associated with the i th and j th sequence positions is defined as a N×N matrix of the form Pðxi; yjÞ PðxiÞPðyjÞ ð2Þ where P(xi, yj) is the joint probability of observing amino acid types x and y at the respective sequence positions, i and j; P(xi) is the marginal/singlet probability of amino acid of type x at the i th position. Gaps are counted as residue type 21.
[55] I(i, j) varies in the range [0, I max ], corresponding to fully uncorrelated and most correlated pairs of residues. The coevolution of a mutation was measured by the average MI values corresponding to each residue. The calculation of S(i) and MI was performed by Evol [56].

Structural modeling and protein stability analysis
A protein homology model for human TNSALP was constructed by using the MODELLER V9.19 [57] platform, using a template corresponding to the human placental alkaline phosphatase (PDB id: 1ZED [58]) and a very slow refinement. The template had a sequence identity of 57% to TNSALP and an X-ray crystal structure resolution of 1.57 Å. The quality of the modeled structure was evaluated by Verify 3D [59], PROCHECK, [60] ProSA, [61] and ERRAT [62]. Each single ALPL missense variant and its mutant structures were automatically generated by FoldX [63] and its effect on protein stability can be measured by the difference folding Gibbs free energy (ΔΔG values) between the wild type (WT) and the mutated forms of TNSALPs. The accessible surface area (ASA) is the atomic surface area of a protein that is accessible to a solvent, and the relative ASA (RASA) attribute is the per-residue ratio between the calculated ASA and 'standard' ASA for a particular mutational residue. As a measure of amino acid side-chain accessibility, the RASA also serves as a quantitative predictor of variant pathogenicity [64]. The RASA calculation of TNSALP was performed using through PSAIA 1.0. [65] Amino acid contact energy network analysis Amino Acid Contact Energy Networks (AACENs) for the TNSALPs were constructed and analyzed by using the NACEN R package based on static structures [66]. A node in the network denotes a single amino acid residue, and edges are defined by the environment-dependent residue contact energy between two nodes [67,68].
Based on AACENs, some network centralities have been defined. [69] The simplest PLOS COMPUTATIONAL BIOLOGY centrality measure is the degree centrality (DC) of a node i in AACENs, defined as the total number of nodes to which it is directly connected to. The betweenness centrality (BC) was defined as the number of times residue i was included in the shortest path between each pair of residues in the protein, normalized by the total number of pairs. It is calculated by where n jk is the number of shortest paths connecting j and k, while n jk (i) is the number of shortest paths connecting j and k and passing through i. The closeness centrality (CC) for a node was calculated by the reciprocal of the average shortest path length, which can be calculated as follows: where N is the set of all modes and n is the number of nodes in the network. The clustering coefficient (C) measures the degree to which nodes tend to cluster together and is defined as: where K i is the degree of node i and e i is the number of connected pairs between all neighbors of i.

Dynamic features based on elastic network models
In ENM approaches [35,36], each node represents a Cα atom in proteins and each edge is a spring γ for connecting two sites within a given cutoff distance r c . Two most commonly used ENM methods, the GNM and the ANM, are adapted in this paper. The total potential energy of the ANM and GNM systems with N nodes are expressed as where R ij and R 0 ij are the instantaneous and equilibrium distances between nodes i and j, and Γ ij is the ij th element of the N×N Kirchoff matrix Γ, which is written as In our study, r c between protein nodes were 7 Å and 13 Å for GNM and ANM, respectively. In comparison with GNM, which only measures fluctuation, ANM provides additional information on the motion directions of each residue. Herein, the GNM and ANM calculations were performed by ProDy [70].
The normal modes are extracted by eigenvalue decomposition Γ = U^U T . U is the orthogonal matrix whose k th column U k is the k th mode eigenvector, and^is the diagonal matrix of eigenvalues, λ k . Mean-square fluctuations (MSF) of a residue are given by where k B and T represent the Boltzmann constant and temperature, respectively. Based on ENM calculations, several other kinds of matrices can be generated. Form the perturbation response scanning (PRS) matrix [71], two dynamic features of effectiveness and sensitivity were defined as row and column averages of the matrix, respectively. The effector residues most effectively propagate signals in response to external perturbations. The sensor residues can easily sense signals and respond with dynamic changes. Directly from the Kirchoff matrix, its eigenvalues can be used to ascertain how important each node is to maintain the overall mechanical connectedness of the network. This amounts to measuring how much the network Laplacian spectrum changes when the connections, or couplings, of a node with its neighbors are deleted. As such, the mechanical bridging score (MBS) for a given mutation reflects the response ability of this residue. In the symmetrical stiffness matrix, the elements describe the effective spring constants associated with each residue pair. The stiffness for individual mutational residues is obtained by averaging all the elements in the corresponding row/column of the matrix. A detailed description of these dynamic features can be found in Bahar's work [38]. The statistical analysis of all features was performed in the R language (https://www.rproj ect.org/, R Core Team, 2017), version 3.6.2. Plots were generated using the R package ggplot2, version 3.3.3 [72]. Statistical significance was determined by the Wilcoxon signed ranked test (P <0.01) in our analysis.

Molecular dynamics simulations
MD simulations were performed on WT and mutant TNSALPs for the conformational dynamics study. The minimized structures were then slowly heated from 0 to 303.15 K over 100 ps and a two-step equilibration (each 100 ps) was carried out to ensure the correct temperature and pressure of the solvated system was attained. Firstly, the temperature was set at 303.15 K (NVT-constant number of particles, volume and temperature) using the V-rescale thermostat. Subsequently, equilibration at 1 atm (NPT-constant number of particles, pressure and temperature) was performed using the Parrinello-Rahman barostat [73]. The equilibrated systems were subjected to production MD for 500 ns using the GPU-enabled version of the GROMACS software package (version 5.1.4) [74] with the AMBER99SB-ILDN force field [75], using a integration step of 2 fs. The structures were immersed in an octahedral box filled with TIP3P water molecules, imposing a minimum distance of 15 Å between the solute and the box. The nonbonded interaction potential was smoothly switched off between 10 and 12 Å, beyond which coulombic interactions were treated with the particle-mesh Ewald method [76]. The LINCS algorithm [77] was used to constrain the hydrogen containing bonds. The atomic positions were saved every 50000 steps (100 ps) for analyses. For each system (WT and mutant), three independent replicas of 500 ns were performed. GROMACS analysis toolkit utilities were used to analyze MD trajectories produced during the last 400 ns production run, and root mean square deviations (RMSDs) and root mean square fluctuations (RMSFs) were calculated for each system.

Allosteric mutation analysis
Two methods were used to investigate the allosteric effects of mutations. The dynamic residue network analysis (DRN) was performed by the program, MD-TASK [78]. For each protein system, the network was constructed by extracting C α and C β atoms of MD trajectories as nodes, and the edge was created when two nodes were within 6.5 Å. The network measure of BC was also calculated based on DRNs. Then, the allosteric communication pathways were calculated by connecting mutational sites and allosteric sites with the shortest edges, using the Floyd-Warshall algorithm [79]. The AlloSigMA server [80,81] was used to evaluate the allosteric effects of each mutation based on the structure-based statistical mechanical model of allostery. This statistical mechanical model estimates the allosteric free energy difference Δg of each residue to perform the allosteric signaling and mutation analysis in TNSALP.

Sequence and structural landscapes of ALPL mutations
To clarify the sequence characteristics of genotype and phenotype underlying ALPL, we first divided all mutations into three categories according to the severity of HPP: the mild form, severe form, and control group. Fig 2A shows the distribution of mutational data, while the detail description can be found in S1 Text. The primary sequence analysis showed that there were 261 conserved residues (conservation score > 5, see S2 Fig PLOS COMPUTATIONAL BIOLOGY more than 71.0% of the sites are highly conserved, while in the mild group, this proportion is 65.0%, and in the control group, it is much lower, at only 27.3% ( Fig 2B). It is worth noting that half of these highly conserved sites in the control group are from the intersection of the mild group and the severe group. Furthermore, the evolutionary landscape of the three types of ALPL mutations was plotted in terms of S(i) and MI profiles. In general, disease-causing variants were more likely to occur at conserved sites, while the non-pathogenic variants are more likely to occur at the less conservative sites (Fig 3A). The distributions of S(i) and MI scores for mutations were very similar, and a significant positive correlation between coevolution and entropy was observed (S3A Fig), and the distributions of S(i) and MI scores for mutations were also very similar. In contract with S(i), the higher MI values of amino acids correspond to residues that were highly coevolutionary. As shown in Fig  3B, pathogenic mutations including both severe and mild mutations occurred at lower coevolutionary residues.
The 3D structure of TNSALP was then computationally modeled by using the crystal structure of human placental alkaline phosphatase as the template (S4 Fig and S1 Text). By mapping all the mutations onto the modeling TNSALP structure, we identified the spatial distribution of the three kinds of mutations ( Fig 2C and 2D). The major difference in the spatial distribution between pathogenic and control groups was that the control group had no mutations at the active site, thereby suggesting the role of active sites in HPP pathogenicity. According to the RASA (Fig 3C), residues were classified into three classes, i.e., buried areas (RASA < 5%), semi-exposed areas (between 5% and 30%), and fully exposed areas (RASA > 30%). Although different types of mutations were distributed across all functional domains, disease-causing variants were more likely to occur in buried areas (122 mutations, 65.9%). In contrast, nonpathogenic variants were more likely to occur at the surface (32 mutations, 56.1%).
The statistical analysis of S(i), MI and RASA for control, mild, and severe mutations showed statistical significance between the control and mild, and control and severe mutations; however, there was no statistical significance between the mild and severe mutations (Fig 3D-3F). Accordingly, structural environment and evolutionary studies have found that the HPP-causing ALPL mutations, especially severe mutations, are located in conserved, less coevolutionary, and buried residues with small S(i), MI and RASA scores.

Protein stability effects of severe and mild mutations: Structural determinants of TNSALP stability and energetic hotspots
To explore the thermodynamic determinants of mutational hotspots and evaluate their functional significance, FoldX approach [63] was adopted for the predicting of the free energy changes induced by single point mutations. As shown in Fig 4A, we found that most of the disease mutations were associated with the increase in the folding free energy (ΔΔG>0), thus leading to the reduced protein stability. Importantly, the largest destabilization changes upon substitutions correspond to the group of mutations with severe phenotype (Fig 4A). At the same time, the mean value of ΔΔG for most mutations in the control group, were smaller than 1.0 kcal/mol. Furthermore, as shown in Fig 4B, the predicted folding free energy changes ΔΔG for mutations that cause a severe phenotype were significantly larger than that the ones in the mild group and in the control group. These findings indicated that most of the disease-causing mutations are associated with a significant decrease in the protein stability, and this trend varies in different phenotypes.
In addition, ΔΔG provides a benchmark or metric that can be used to compare with physicochemical properties for large-scale analysis of other mutations. We have compared the ΔΔG with the RASA, S(i) and MI, and weak correlations were found among them (S3B- S3D Fig).
The low coefficients imply that the three properties provide additional measures of ΔΔG for describing ALPL mutations, namely pathogenic mutations tend to have a high ΔΔG value and low RASA, S(i), and MI values. For example, pathogenic mutations, including G220R, G221R, PLOS COMPUTATIONAL BIOLOGY G326R, G129R, P108L, R223W, and G63R (severe mutations), and G339R and A468V (mild mutations) showed obvious energy changes but with low RASA, S(i), and MI scores. Based on the above analysis, we concluded that most of the "severe" and "mild" mutations were predicted to have large ΔΔG values and differed in terms of evolutionary conservation and physicochemical properties.
It is worth noting, however, that severe mutations R272L, N47I, L299P, R272H, H438L, and R450C, and mild mutations R272H, R450H, V424A, K264R, E21K, V424M, R184Q, E84V, N47I, K422R, and T148I showed a relatively moderate free energy change (Fig 4). At the same time, some non-pathogenic mutations I269V, H482N, F327C, A488S, T277A, T255I, I317L, and D239Y, displayed larger free energy changes. Structurally, these mutations were distributed in different domains and did not cluster together (Fig 4C). These results indicated that functional effects of pathogenic mutations may be determined not only by the local energetic changes but could be also influenced by changes in the long-range allosteric interactions and potential reorganization in the global interaction networks.

Modeling of residue interaction networks reveals allosteric signatures of ALPL mutations
Network theory has become a widely used method to quantify protein structures and functions in terms of their topological connectivity [82]. In our analysis, we used the difference between network centrality measures of mutant and WT values to describe the topological change of TNSALP AACENs caused by ALPL mutations. For each amino acid substitution, four common network parameters including DC, BC, CC, and C were calculated (see Materials and Methods for a detailed definition of these parameters). The mutation-induced changes ΔDC, ΔBC, ΔCC, and ΔC were computed on each mutation and analyzed for the mild, severe, and control groups, as illustrated in Figs 5 and 6.
The profiles in Fig 5A-5D indicted that most of the mutations can induce a wide range of changes in the values of DC, CC, and C network parameters. In general, some common peaks were found in the ΔDC, ΔBC, and ΔCC profiles, such as the severe mutation R391H and three mild mutations G162S, T166I, and R272C. Detailed analysis of the ΔDC profile showed that its peaks corresponded to pathogenic mutations, including severe mutations R391C, Q207P, D294A, and L289F and mild mutations A468V, G162S, Y117C, G144E, E291K, and D406G. These significant peaks aligned precisely with functional centers, including hotspot positions involved in the crown domain, the homodimer interface, and the Ca 2+ binding site, thereby suggesting their latent functional roles. In comparison, smaller variations were observed in the

PLOS COMPUTATIONAL BIOLOGY
ΔBC profile (Fig 5B), but much sharper peaks for severe mutations F355I, D378H and A446T and mild mutations E452K, Y117C, G162S, R152C, and T166I were found. Functional domain analysis revealed that E452 and D378, R391, and A446, R272 are involved in the active site, crown domain, and Ca 2+ binding site, respectively. Eleven peaks were found in both the ΔCC and ΔC profiles. In the ΔCC profile (Fig 5C), eight corresponded to pathogenic mutations, including G162S, T166I, G63R, T165I, L208F, E291K, G220R, and N47I, and only one (E291K) was located at the functional domain. In the ΔC profile (Fig 5D), the other set of pathogenic mutations, peaks corresponded to R335T, N47I, E429K, D294Y, G455D, V459F, E354D and V431A, including many functional sites. Among these, R335T and G455D were located at the active site, D294Y at the Ca 2+ binding site, and E429K at the crown domain.

PLOS COMPUTATIONAL BIOLOGY
A series of ENM-based dynamics descriptors, including GNM-based MSF, PRS-based effectiveness and sensitivities, and ANM-based mechanical bridging score (MBS) and stiffness, have been introduced to enhance the predictive ability of disease-related mutations [38]. Here, we employed these dynamic features to systematically characterize different types of ALPL mutational sites. First, mapping the three types of mutations onto the dynamic profiles based on the TNSALP protein provided primary insight into the interpretation of the functional impact of variants in light of the intrinsic dynamics of the mutational site (Fig 5E-5I). As shown in the MSF profile (Fig 5E), pathogenic mutations including both mild and severe mutational sites always demonstrated lower MSF values, except for some pathogenic mutations at the Ca 2+ binding domain. Some particular pathogenic mutations, located at minimal positions that correspond to hinge sites have been found, such as M62, G63, S65, A377, D378, and H381, which belong to severe mutational sites, and D60, V461, and G473, which correspond to mild mutational sites. Structurally, these hinge mutational hotspots are located at the active site and the dimer interface. These conformational dynamic signatures of disease-associated mutations have been revealed both at the genome [83] and proteome-levels [33].
Regarding the PRS analysis, two matrices were used to quantify the allosteric effect of each residue, which are effectiveness and sensitivity, for evaluating the propensity of residues to act as sensors or as effectors of allosteric signals. As shown in the profiles in Fig 5F and 5G, the distribution of effectiveness and sensitivity for ALPL mutation hotspots shows different trends. In general, most of the pathogenic mutations match almost exactly with some peaks of the effectiveness profile, with sharp dominant peaks being found for three severe mutational sites (G63, A377, and V459). On the other hand, the peaks of the sensitivity profile correspond to some control mutational sites, while pathogenic mutational sites have smaller sensitivity, such as V95 and G473 in the mild group and G63, S65, and V459 in the severe group. As two unusual ENM-based dynamic features, the distribution of mutations in MBS and stiffness profiles are shown in Fig 5H and 5I. For the MBS profile, the overall distribution of pathogenic mutational sites is larger than that of control mutational sites, highlighting the importance of pathogenic mutational sites in maintaining the stability of the ALPL protein.
In addition, both ΔBC and ΔCC profiles showed significant differences between groups of mutations ( Fig 6). As shown in Fig 6B, the mean values of ΔBC for the severe, mild, and control groups were 0.05, 0.16 and 0.21, respectively, with P = 1.889e-07, 9.906e-09, and 0.0005129, respectively, by the Wilcoxon signed rank test between the severe and mild, severe and control, and mild and control groups. In contrast, as shown in Fig 6C, the mean values of ΔCC for the severe, mild, and control groups were 0.03, 0.17 and 0.17, with P = 9.646e-05, 0.0001094, and 0.0003274, respectively, by the Wilcoxon signed rank test, between the severe and mild, severe and control, and mild and control groups.
An important revelation of the protein network centrality analysis is that pathogenic ALPL mutations typically lead to significantly higher variations in ΔBC and ΔCC parameters as compared to the control group (Fig 6). Given topological nature of the network parameters measuring the global effect of mutations, the observed magnitude of the mutation-induced changes in the betweenness and closeness centrality suggests that pathogenic mutations, and especially severe mutations, may significantly alter the long-range connectivity of the interaction network. In particular, the ΔBC profile displayed the most pronounced changes and highlighted the potential allosteric effects for pathogenic mutations [84].
To clearly observe the ability of dynamic parameters to distinguish mutations in different groups, statistical analysis was further performed on the data of severe, mild, and control mutational sites. As shown in Fig 6E, the control group had the highest mean MSF, with P = 1.8e-05 and 0.00021 by the Wilcoxon signed rank test, in comparison with mild and severe mutations, showing significant differences between both pathogenic mutational sites and the control group. However, significant differences were not found between mild and severe mutational sites, with P = 0.75. Similar significant results were also found for effectiveness ( Fig  6F), sensitivity (Fig 6G), and MBS (Fig 6H), while significant differences in stiffness could not be found among the three groups of mutations (P> 0.01, Fig 6I).

Atomistic simulations and dynamic network modeling determine the effects of severe phenotype-associated mutations on allosteric communications
Based on previous results, some severe phenotype-related mutations have relatively low ΔΔG but higher ΔBC values in network topology. To gain energetic and topological insights, we compared ΔBC and ΔΔG values for all mutations. The scatterplot revealed that N47I, L289F and M355I were three severe mutations with large ΔBC but low ΔΔG (Fig 7A). To determine the dynamic effects induced by severe phenotype-associated mutations, we performed a set of

PLOS COMPUTATIONAL BIOLOGY
three independent replicas, all-atom MD simulations for WT and three mutant TNSALPs caused by N47I, L289F, and M355I variants (Fig 7). The RMSD values of Cα atoms served as an overall measurement of the departure of the structures from the initial coordinates (S5A- S5D Fig). All the systems became convergent for duration of 100-500 ns, and the RMSD values became stable during the last 400 ns of the simulations. The conformational dynamics results (S5E- S5H Fig) revealed that the profile of RMSF was very consistent in three replicas of each system and also consistent between the WT and mutant systems, and no conformational change occurred at the active site, further explaining why ΔΔG for these severe mutations was small. The differential fluctuation ΔRMSF, with respect to WT, showed considerable changes in regions around the ion binding pocket and the Ca 2+ binding site located in the other monomer, with two of the largest peaks at residues N190 and S258 (Fig 7B). This suggests the existence of allosteric communication between the mutant sites and these two regions. Accordingly, in the four systems, we selected mutant sites in chain A as starting points and N190 and S258 of chain B as the target residues to further identify specific allosteric pathways caused by severe phenotype-associated mutations. In addition, the comparisons of BC values based on DRNs among three independent MD replicas (S6 Fig), as well as between AACENs and DRN (Figs 7C and 7D and S7) were performed. The overall similar profiles of BC not only show the reproducibility of MD simulations, but also indicate that BC provides a robust network index. The difference is that DRN can capture more peaks that correspond to disease mutations, such as T166I, M295V, I395V, and S445F (mild mutations) and P292T and G334D (severe mutations) in WT (Fig 7C), while P108L, T166I, M295V, and G473S (mild mutations) and A348T (severe mutations) in N47I mutant (Fig 7D).
The shortest path algorithm based on dynamic network models was employed to identify specific allosteric pathways ( Table 1). As shown in Fig 8, the intermolecular interaction of A: G473-B: V95 was predicted as the key bridge in WT, and the other interaction of A: G386-B: Q106 was predicted as the key bridge in mutants. It was, therefore, determined that WT and mutant TNSALPs use different allosteric interfaces for signal transmission. Overall, pathway plasticity was found to exist, although the structures between WT and mutants were well conserved, conforming to the recent function-centric allosteric regulation study [85]. Among these pathways, a common region including three active site residues (T100!P108!D109) corresponding to mutational sites was also found, suggesting that these mutational paths may lead to the same functional state. In particular, the molecular signatures showed that T100, P108, and D109 have relatively high effectiveness and ΔBC, demonstrating that these nodes are key sites for structural signal propagation.
The comparison of allosteric pathways reveals several important insights. First, by using the same starting mutational sites and ending points, the mutant TNSALPs exhibit shorter allosteric pathways, which is in accordance with our hypothesis that the mutants caused by severe mutations have higher allosteric propensities. The system with the largest reduction in signaling path nodes is the mutant caused by M355I, which has the highest ΔBC and a relatively small ΔΔG. In WT, the path from M355 to S258 passed through the nodes F462!S463!L471!H472!G473 on the dimeric interface of the chain with mutations and the active sites P108 and D109 in the other chain (Fig 8A). In contrast, none of these sites participated in the signaling paths in the mutants. Hence the whole length becomes shorter and straight (Fig 8B). The same situation occurs in the path from M355I to N190 (Fig 8C and 8D). Second, the pathways in mutants involve fewer active sites, interfacial residues, and mutational sites. This comparison further highlights these functional regions as hubs for long-range allosteric communication in WTs, while it also means that the functional role of these reduced nodes may be lost if a severe mutation occurs. An extreme situation is observed for N47I, in which the path from I47 to S258 does not pass through the active area and employs the shortest Table 1. The constituent residues of the shortest pathway from the three severe mutational sites (N47, L289, and M355 to N190/S258). Residues in the two chains are denoted by different colors.

PLOS COMPUTATIONAL BIOLOGY
and most straight path. Last, the path patterns are more robust in WT than in mutant states, suggesting that severe mutations introduce more pathway plasticity. For example, the shortest pathways from residues N47, L289, and M355 to N190 of the other chain in the WT shared some similar nodes, including P108, D109, G126, R184, D185, Y187, and S188, in the other chain, with most of located at the active site or ion binding pocket, and with all crossing the same area and to the ending residue N190 (orange circle region in Fig 8C and 8D). However, the situation was different in mutants. Fewer residues were shared in the three mutants, and only residue P108 showed the shortest pathway from I47, F289, and I355 to N190. The situation was similar in the paths from residues I47, F289, and I355 to S258 of the other chain. Three paths shared residues I204, A205, Y206, L252, and T255 of the other chain in the WT and none in the mutants.
As shown in Fig 8A, among the four mutations (E451K, M355I, R391C, and Y117C) with the largest ΔBC but low ΔΔG values, only M355I was related to the severe HPP phenotype. At the time of waiting, the ALPL mutation database has been updated by including several more severe mutations. We were surprised to find that the new patients with severe phenotypes Allosteric paths originating at three mutational sites and terminating at S258 in the WT (A) and mutant states (B), as well as terminating at N190 in the WT (C) and mutant states (D), respectively. The TNSALP structure is depicted as represented by a semitransparent colored cartoon, and the starting and ending residues of all the paths are represented as green and cyan spheres, respectively. The alpha-carbon of the path through the residues is shown as silver (chain A) and orange (chain B) spheres, in which active sites are represented by red spheres.
https://doi.org/10.1371/journal.pcbi.1010009.g008 PLOS COMPUTATIONAL BIOLOGY related to E452K and R391C were reported to have mild and control mutations. To further investigate their allosteric effects and whether they are an intrinsic dynamic of these severe mutational sites, single-residue perturbation analysis was performed by using the AlloSigMA. The results showed that perturbations of all these five mutational sites induced intermolecular allosteric effects (Fig 9) in the form of both positive and negative modulation with maximum values of Δg (N47!chain B) = 2.44 kcal/mol, Δg (N289!chain B) = 1.07 kcal/mol, Δg (N335!chain B) = 0.99 kcal/mol, Δg (R391!chain B) = 1.05 kcal/mol and Δg (E452!chain B) = 0.86 kcal/mol, respectively. Although ΔΔG and Δg have different biological means referring to folding stability and allosteric ability, the comparison of them was also performed for these five mutations (S8 Fig). The most interesting finding is that N47I has smallest ΔΔG but large Δg, which is in agreement with MD simulation result that the mutant is stable but show most different allosteric landscapes. The analysis exhibited an evident influence on the stability of some functional regions, including the Ca 2+ binding and crown domain, thereby revealing that severe mutations can induce changes in the stability of other sites and affect the catalytic activity of proteins.
Taken together, the MD results suggested that severe ALPL mutations may affect the signal transduction pathway between the two monomers of TNSALP. Allosteric pathways in WTs are robust and involve some functional sites consisting of active, interfacial, and mutational residues. In the allosteric mutant states, the overall pathway pattern is more "flexible", with shorter pathways involving fewer functional sites, resulting in a severe phenotype. The allosteric free energy calculation once again suggest that ΔBC is a good and facile indictor for predicting severe mutations, whose molecular pathogenicity does not alter the local stability at active sites to change the protein folding energy but instead generates alternative and longrange molecular effects.

Machine learning identifies key molecular features for classification of different mutation types
To determine the relationship between the above studied features, their redundancy and ability to classify and predict various mutation types, we first computed the pairwise correlations between different prediction scores by using Spearman's rank correlation coefficient ( Fig  10A). These features can be classified into three major types according to their related sequence, network, and dynamics information. Among sequence-based features, S(i) has high positive correlations with MI and RASA, but these three features all have negative correlations with conservation calculated by Consurf. For the four network parameters, ΔDC, ΔBC and ΔCC show significant correlations; only ΔC is an independent factor. The dynamics-based matrices that show more complex correlations have been divided into two groups: stiffness, PLOS COMPUTATIONAL BIOLOGY MSF, and sensitivity; effectiveness and MBS. A highly positive correlation was predicted within each group, but a negative correlation was found between the two groups. The special index ΔΔG did not show any correlation with other features. Accordingly, these features could complement our understanding of the molecular signatures of HPP-causing mutations.
Next, we employed Radom Forest (RF) model to classify mild, severe, and control groups using the 14 features mentioned above (see S2 Text for details). We considered classifying the model containing five variables, whose accuracy of repeated cross-validation was 80.5%, as the optimized model, which was used for comparison with the model using all features. To be more specific, the feature importance was ranked by the mean decrease accuracy (Fig 10B). Most interestingly, we found that the top four important features were all network-based features, including ΔDC, ΔBC, ΔCC, and ΔC, suggesting an essential role of protein network topology in controlling mutation events. ΔΔG is an important feature and is also an important determinant of protein stability in mutations. To our surprise, features relating to ENM protein dynamics were not ranked as important predictors. This may be caused by the loopy structure of the TNSALP protein. By considering only the top five features, the RF model yielded an accuracy of 96.8%. However, taking all features into account, the accuracy of the RF model dropped to 92.2%. The inclusion of more features reduces the predictive accuracy of the RF model, which led us to focus on the specific interpretability of each parameter.
To aid in the interpretability of the classification effect of each parameter, we further evaluated the performance of the 14 features of the dataset composed of mild and severe mutations, mild and control mutations, and severe and control mutations. To this end, the ROC curves with the AUCs of the 14 metrics for each comparison group were plotted, and the related values are listed in Table 2. First, we examined the difference in the AUCs for the 14 functional features of the severe and control groups ( Fig 10C); ΔΔG showed appreciable performance with an AUC of 0.8447. Moreover, the S(i), conservation score, and ΔBC also showed good performance with AUCs, all greater than 0.73. We then tested the performance of the 14 features between mild and control group mutations (Fig 10D), as well as the mild and severe group mutations (Fig 10E). In the classification of the mild group and control group, two sequence-based and four dynamics-based parameters showed moderate predictive performance. The AUCs of the conservation score, entropy, MSF, effectiveness, sensitivity, and MBS In contrast, in the classification of the mild and severe groups, we found that ΔBC showed the best performance, with an AUC = 0.7471, while ΔΔG showed the second-best performance, with an AUC = 0.7048. The most interesting finding is that ΔΔG is a good indicator for identifying disease mutations, while ΔBC is the best indicator for classifying mild and severe mutations. The machine learning analysis showed that sequence, dynamics and network features contribute to classification of ALPL mutations and their phenotypes.

Discussion
As the biological hallmark of HPP, the reduced activity of TNSALP is caused by loss-of-function mutations in the ALPL gene; varying levels of reduced activity are related to mild or severe HPP phenotypes. Despite some progress, more research is needed to obtain a comprehensive understanding of genotype-phenotype interrelationships in HPP. Thus, in this study, the functional landscape of ALPL mutations was established by investigating a set of features generated from the protein sequence, network topology, and ENM dynamics calculation and their relationship with different HPP phenotypes. Further structure-function studies including MD simulation and structural-communication pathway analysis of mutant TNSALPs have supported our arguments that mutational hotspot sites often correspond to global mediators of allosteric interactions. In addition, a machine learning classifier was developed not only to examine the relationship between the pathogenicity status of mutations and their biophysical attributes, but also for the prediction of mutations in the control, mild, and severe groups. We have found that in addition to ΔΔG as the commonly used predictor, coevolutionary conservation and network-based features also yield strong signals in machine learning predictions and provide orthogonal information. These findings suggest that the study of molecular signatures and allosteric regulation of ALPL mutations may be a step toward defining a greater quantitative genotype-phenotype interrelationship in HPP.
Through the large-scale analysis of disease-causing ALPL mutations, we propose the following possible molecular principles underlying HPP-related mutations. First, HPP pathogenicity is largely due to the structural instability of TNSALP caused by ALPL mutations, which have variable effects on enzyme activity. Thus, for such cases, ΔΔG is a satisfactory and acceptable index for predicting pathogenic mutations, especially for distinguishing mutations between the control and severe groups. There is also the possibility of extrapolating our methods of using ΔΔG to estimate changes in protein stability upon mutations as a popular way to predict pathogenic mutations in other diseases. Many pervious works have proposed that BC and ΔBC are good network predictors for identifying important mutations [86,87]. Second, in our paper, we further demonstrate that ΔBC is also a good indicator to distinguish mild and severe groups in pathogenic mutations. We speculate that mutations in the severe group have stronger allosteric effects than mutations related to the milder forms of HPP, serving as "allosteric mutations" [88,89]. Analysis of allosteric properties of severe mutations adds an additional confirmatory layer to segregate variants in "mild" and "severe" states, thereby furthering understanding of the allosteric basis of loss-of-function mutations. Third, for the classification of mild mutations and differentiation from the control group, coevolutionary conservation has been shown to be the most important predictor, thereby suggesting that mild pathogenicity may be related to amino acid changes with small evolutionary substitution probability. However, to reach the elusive goal of establishing the precise relationship between ALPL mutation genotypes and HPP phenotypes and a more reliable prediction model or score, such as protein regulatory and functional binding site prediction [90], more clinical data on mutations [91] and data on enzyme activity [13] are needed. Verify-3D analysis with 91.37% of the amino acids scoring > 0.2 in the 3D/1D profile; Only 5% of all the residues showed bad Psi degrees, and majority of which located in the terminal loops indicated that the model is well constructed. ProSA Z-score of − 9.07 shows the Z-value of the protein was similar to native protein of equivalent sizes(D). The reliability of the model was also shown by the ProSA energy plot with no obvious problematic regions with a positive value in the ProSA energy plot(E). Besides, the ERRAT plot is expressed as percentage of protein with calculated error value falls below the 95% rejection limit, and an ERRAT score of 84.97 indicates a good quality model. If the residues corresponding to the peaks of DRN BC has the mild and severe mutations we collected, it is highlighted as yellow and red diamonds, respectively. DRNs can capture more peaks that correspond to disease mutations, such as T68M, T165I and I395V (mild mutations) and G334D (severe mutations) in L289F mutant, while T68M, V95M, T165I, D378V and L414M (mild mutations) and D378V (severe mutations) in M355I mutant.