Comparison of the Internal Dynamics of Metalloproteases Provides New Insights on Their Function and Evolution

Metalloproteases have evolved in a vast number of biological systems, being one of the most diverse types of proteases and presenting a wide range of folds and catalytic metal ions. Given the increasing understanding of protein internal dynamics and its role in enzyme function, we are interested in assessing how the structural heterogeneity of metalloproteases translates into their dynamics. Therefore, the dynamical profile of the clan MA type protein thermolysin, derived from an Elastic Network Model of protein structure, was evaluated against those obtained from a set of experimental structures and molecular dynamics simulation trajectories. A close correspondence was obtained between modes derived from the coarse-grained model and the subspace of functionally-relevant motions observed experimentally, the later being shown to be encoded in the internal dynamics of the protein. This prompted the use of dynamics-based comparison methods that employ such coarse-grained models in a representative set of clan members, allowing for its quantitative description in terms of structural and dynamical variability. Although members show structural similarity, they nonetheless present distinct dynamical profiles, with no apparent correlation between structural and dynamical relatedness. However, previously unnoticed dynamical similarity was found between the relevant members Carboxypeptidase Pfu, Leishmanolysin, and Botulinum Neurotoxin Type A, despite sharing no structural similarity. Inspection of the respective alignments shows that dynamical similarity has a functional basis, namely the need for maintaining proper intermolecular interactions with the respective substrates. These results suggest that distinct selective pressure mechanisms act on metalloproteases at structural and dynamical levels through the course of their evolution. This work shows how new insights on metalloprotease function and evolution can be assessed with comparison schemes that incorporate information on protein dynamics. The integration of these newly developed tools, if applied to other protein families, can lead to more accurate and descriptive protein classification systems.


Introduction
Proteases are a vast class of enzymes found in all kingdoms of life that participate in a wide range of biological processes [1]. They present different catalytic chemistries, structures, specificities, oligomeric states, and are grouped into distinct families and clans according to different classification schemes. Examples are the MEROPS [2], SCOP [3], and CATH databases [4], which use a combination of sequence-and structure-based methods for grouping distinct proteins. The need for a better understanding of their function has also led to the search of other common features shared between a wide range of known class members [5,6]; perhaps the most pervasive similarity was identified by Tyndall et al., where it was observed an almost universal binding of the Aspartic, Serine, Cysteine, and Metallo-proteases to the extended βstrand conformations of substrates, products, and inhibitors of peptidic and non-peptidic origin [5]. Nonetheless, there is still the need for a better characterization of the multiple factors governing protease function and evolution. We therefore tested if the employment of novel protein comparison tools can, when combined with conventional comparison methods, help in the search of additional features shared between distinct protease members.
Protein internal dynamics plays an important role on enzyme function, since it encompasses the space of catalytically-relevant structural changes occurring in a given fold during the reaction path [7][8][9][10][11][12][13]. These structural changes span a broad range of time-scales and magnitudes; from harmonic vibrations of bonds and angles occurring at the femtosecond time-scale, to global conformational fluctuations of large domains at the microsecond time-scale, some of them associated with substrate binding and product release [14][15][16][17][18]. Therefore, understanding the relation between protein internal dynamics and its structural and functional features is a challenging task, since it depends not only on the analysis of protein dynamics at different time scales, but also on their functionally-relevant molecular states (e.g. bound vs unbound state).
There is an ongoing debate on how protein internal dynamics is subject to evolutionary selection due to its functional significance and how it is related to sequence and structure evolution [19,20]. It has been shown that internal dynamics and backbone flexibility are conserved in homologous proteins [6,[21][22][23][24][25][26][27]. Specifically, it has been observed that: i) low-frequency, largeamplitude normal modes tend to be evolutionarily conserved [24,28] and; ii) there is a significant correspondence between low-energy modes determined for superfamily structural cores and the modes of structural variance observed within protein superfamilies [29,30]. These findings support the notion that conservation of protein dynamics is subject to evolutionary selection and were based on the observation that ligand binding can be described in many cases by few lowenergy normal modes [31,32]. However, similarities of low-energy modes observed between non-homologous proteins with the same architecture and even between unrelated proteins [26,33], together with the observation that low-energy modes are more robust to random mutations suggests that protein dynamics may not always be subject to evolutionary selection [34][35][36].
Additional insights have been provided by phylogenetic studies addressing the evolution of normal modes. It was shown that changes in protein dynamics are associated with functional divergence in enzymatic families and that non-homologous enzymes that perform similar functions also share similar motions related to catalysis [37,38]. Protein dynamics has also been found to diverge between structurally related proteins at functionally important sites [39,40], and this divergence has been argued to be dependent on other functional requirements, such as intracellular localization [41]. Finally, protein dynamics has also been associated with the evolution of new enzymatic functions and on the promiscuity of enzymes [42][43][44][45]. These findings suggest that protein dynamics plays an important role in the function and evolution of enzymes, although the extent to which evolution has selected for this particular trait still remains unclear.
The development of dynamics-based comparison methods has been crucial to the above mentioned studies and has provided insights that may not be detectable from sequence or structure comparison methods alone [27,40,[46][47][48][49][50][51][52]. These methods rely on Essential Dynamics Analysis (EDA), to retrieve the collective motions of protein structures, which are typically obtained from Molecular Dynamics (MD) simulations or alternatively by simplified, coarsegrained representations of protein structure, such as Elastic Network Models (ENM). The similarity between modes of two distinct structures can be compared based on different quantitative schemes and therefore new relations between proteins can be sought based on their dynamical properties [50].
In the search for common features shared between different types of proteases, Carnevale et al. carried out pairwise structure-and dynamics-based alignment of 17 representative protease structures with minimal mutual sequence identity and distinct folds [6]. In most cases the division into distinct folds was consistent with the division in clans of the MEROPS classification scheme, indicating that structures with different evolutionary origins adopt distinct folds. Nonetheless, significant structural similarities among proteases of different clans were identified, thus suggesting a convergent evolutionary process. Indeed, in pairs of structures showing higher structural similarity, the aligned segments in both structures consisted on regions close to the active-site, even for pairs from distinct clans (i.e. different catalytic chemistries). The authors proposed that a criterion for catalytic activity not dependent on chemical determinants could be at play, namely the dependence on specific and concerted protein motions related to function. A close correspondence of the internal dynamics between some pairs was identified, which consisted in rearrangements of active-site surroundings that lead to distortions of the substrate-accommodating pockets. Therefore, it was suggested that a "dynamical selection" process, operated by the necessity to interact with substrates in well-defined geometrical arrangements, may lead to convergence or conservation of the internal dynamics in proteases. Indeed, recent work reported by Micheletti further identified significant dynamical similarities between proteases from different clans with no detectable structural similarity [50]. It was therefore suggested that dynamics-based comparison methods could be useful in detecting functionally-related features shared by proteases that otherwise would remain elusive using only sequence-and structure-based methods.
Metalloproteases (MPs) are one of the most diverse types of proteases, presenting a wide range of folds and catalytic metal ions. They are divided in more than 40 families identified among all kingdoms of life. In the case of the MEROPS MA clan, where most of the known MPs are grouped by common ancestry, its members are characterized by a single catalytic zinc ion, a consensus HEXXH sequence motif and a common fold architecture. However, this structural conservation is not observed at the domain level since members from different families have distinct domain composition and topology. MPs are therefore attractive candidates to study the relationship between structure and dynamics within a protein clan. In this work, the suitability of employing coarse-grained methods to the study of MP internal dynamics was first made by comparison of ENM-derived internal dynamics profile of thermolysin to those obtained by Principal Component Analysis (PCA) of crystal structures and EDA of MD simulation trajectories. Subsequently, an analysis of pairwise structural-and dynamics-based alignments of a representative set of MPs from 13 families of the MA clan was performed. It was found for members of this clan that dynamical similarity does not appear to correlate with structural similarity. Interestingly, pairs having high dynamical similarity despite having no structural similarity were identified. Inspection of the produced alignments indicates that in these cases conservation of internal dynamics has a functional basis, namely to dictate proper interactions with the substrate. Our data show the suitability of using simple comparison schemes that incorporate information on dynamics to provide new insights on MPs function and evolution, unveiling their potential as tools to study the role of internal dynamics in protein evolution.

Internal Dynamics of Thermolysin
PCA of the structural set was made to obtain the respective Principal Components (PC), using the ProDy software [53,54]. The 3N x 3N covariance matrix was calculated over n, where N = 316 is the number of residues in thermolysin (represented by the respective C α atom coordinates), and n = 112 is the number of thermolysin crystal structures (Uniprot ID: P00800) retrieved from the PDB [55], with corresponding IDs (Table A in S1 File). Structures were initially superposed to the unbound crystal structure (PDB ID: 1L3F) to obtain mean coordinates, then iteratively superposed until convergence to eliminate rigid-body translational and rotational differences.
MD simulation of thermolysin in the unbound form (PDB ID: 1L3F) was performed using GROMACS 4.6.1 simulation package with GPU acceleration [56][57][58], with the AMBER99S-B-ILDN force field [59] (Sim1). The system was solvated with explicit Simple Point Charge (Extended) water model (SPC/E, [60]) and placed in a dodecahedral box, each edge at least 12 Å from the protein surface. The system was charge-neutralized by addition of two sodium counter-ions and minimized in two steps to remove atom clashes and bond contacts: first by a steepest descent minimization algorithm (2000 steps), followed by a conjugated gradient algorithm (1000 steps). The energy-minimized model was coupled to the V-rescale thermostat (300 K, coupling time constant 0.1 ps [61]) and Berendsen barostat (1 bar, coupling time-constant 0.6 ps [62]) and then equilibrated, where the force-constant of positional restraints for heavy-atoms was decreased from 1000 kJ/mol, 100 kJ/mol to 10 kJ/mol in three consecutive steps (100 ps). A production phase was finally run for 20 ns, with an integration step of 2 fs. Long-range electrostatics were treated with the Particle-Mesh Ewald algorithm and distance constraints between all H-bonds was enforced by the LINCS algorithm [63]. Although the employed force field does not appropriately represent the interactions between the catalytic zinc metal ion and the coordinated residues [64], this metal ion is not considered to have a structural role [65]. An additional non-biologically relevant zinc ion found at the active site of the crystallographic structure was removed. The protonation states of active site residues from the conserved HEXXH sequence motif were manually attributed, with H142 and H146 monoprotonated at the N δ position and E143 and E146 not protonated, taking into account its specific pKa. A replicate of the MD simulation was carried out (Sim2), where all abovementioned simulation set-up parameters were kept unchanged. After removal of the global rotation and translation of the protein, simulation trajectories show a RMSD convergence after 1 ns of the production phase (Fig A in S2 File). Therefore, only the 1-20 ns interval of full trajectories were used for EDA with the ProDy software, with the 1 ns frame being used as the reference structure. EDA is based on PCA, with the difference that the respective PCs (termed here as ED modes) are calculated based on n = 9500 trajectory frames taken at intervals of 2 ps, considering only the N = 316 C α atom coordinates.
The Anisotropic Network Model (ANM, [66][67][68]) of thermolysin in the unbound form (PDB ID: 1L3F) was calculated using the ProDy software. The C α atom coordinates were used as node representations of each residue of the protein (N = 316) to build the respective 3Nx3N Hessian matrix. Variations of the model included additional nodes, matching the coordinates of the catalytic zinc ion alone (N = 317) or the catalytic zinc ion and the four calcium ions (N = 321). The uniform force constant γ = 1 was used to calculate the overall potential of the system and the interaction cutoff distance r c = 15 Å was used to generate the respective Kirchhoff matrix of inter-residue contacts.
The collectivity degree, κ, was used as a measure of the number of atoms significantly affected by a given PC, ED or NM mode [69]. This value varies from κ = 1 for modes describing global translations of the protein to κ = N -1 if only one atom is affected (N = 316). The overlap, or correlation cosine, between two modes is given by the dot product of the respective eigenvectors after normalization, being equal to one if two modes are identical. The subspace overlap between two sets of modes is given by corresponding Root Mean Square Inner Product (RMSIP) value [70]. The respective overlap between the covariance matrices was also calculated, with a value of 1 if the two matrices are identical and of 0 when the respective subspaces are orthogonal [71].

Selection of representative structures
A representative set of MP structures from distinct families belonging to the MA clan of the peptidase database MEROPS (release 9.9) was selected [2]. In MEROPS, members are grouped in families based on their sequence similarity. Families are further grouped in clans when there is detectable structural homology, implying common ancestry. For each of the 23 (out of 42) families that contain members with resolved structures in the PDB, a representative structure was selected based on the following criteria: i) unbound structure containing no inhibitor molecule, substrate or substrate analog molecule and; ii) the structure with highest resolution or with no mutations. Only unbound crystal structures were selected since they are assumed to represent the native conformation of the corresponding protein. Also, the degree and mode of conformational change upon substrate or inhibitor binding is not well characterized for most clan members as it is for thermolysin, and the effects associated with ligand heterogeneity can be ruled out. The resulting set is comprised of 13 structures belonging to distinct families. For each structure, the respective information from the SCOP and CATH databases was retrieved [3,4]. In MEROPS, MA clan members are grouped into the MA(E) and MA(M) subclans, commonly termed as gluzincins and metzincins, respectively. These two subclans are divided based on the nature of the third zinc ligand: in MA(E) a glutamate, 18-72 residues apart from the conserved HEXXH motif towards the C-terminal; in MA(M) an histidine or aspartate in the extended HEXXHXXGXXH/D motif [1]. All structures correspond to the monomeric form of the proteins and the majority is characterized by a two-domain peptidase unit, with a conserved N-terminal domain and a more variable C-terminal domain. The active site is generally located between these two domains. Despite these general features, the set is structurally heterogeneous, with proteins containing domains that differ at the class, architecture and topology level of CATH classification criteria. In most cases, MEROPS classification is coincident with SCOP classification at the family level, except for the SCOP family 55505 "Neurolysin-like" that combines MEROPS families M2, M3 and M32; and also family 55487 "Zinc Protease" that combines M7 and M35 families. In the case of families M10 and M12, which are divided into subfamilies due to deep sequence divergence between their members, a representative was selected for each subfamily. Representatives of the MA(E) subclan families M3, M4 and M27 are endopeptidases and M1, M2 and M32 are exopeptidases, while in the MA(M) subclan all representatives are endopeptidases.

Structure-based alignments
Pairwise structural alignment of structures was performed with the DaliLite web-server for all 78 distinct pairs [72]. The embedded DALI algorithm identifies blocks of residues between two distinct structures that have similar inter-residue distances. Matching regions are evaluated based on a knowledge-based score and the produced alignment is the one maximizing this value for a variable number of distinct blocks. The statistical significance of the alignment is quantified in terms of a Z-Score that compares the obtained score with the one expected for a pair of structurally unrelated structures of the same size. A Z-Score greater than 2.0 is considered significant and was used as threshold value for a pair of structures to be considered as structurally similar [72]. In the MEROPS, a structure is grouped in a predefined clan if a Z-Score greater than 6.0 is obtained between the structure and at least one member of that clan. Therefore, not all pairs of representatives are expected to be structurally similar.

Dynamics-based alignments
Dynamics-based alignment of all pairs used for structural alignments were performed using the ALADYN web server [51]. First, the implemented algorithm calculates the low-energy modes for each structure and then it detects regions of both proteins with similar dynamic profiles. Calculation of modes is based on the coarse-grained β-Gaussian ENM, where amino acids are represented by a two-centroid representation: C α atom for the main chain and; C β for the orientation of the side chains (except for Gly residues) [73,74]. This ENM has been shown to describe protein motions similarly to the employed ANM [75]. The dynamics-based alignment is made by rewarding superpositions of proteins regions that exhibit high overlap between the 10 first modes for each amino acid pair within the cutoff distance of 7 Å. This allows for the alignment of proteins with different sequences and size. Following the optimization of scoring function, the statistical significance of the resulting alignment is evaluated against a reference probability distribution of scores obtained from alignments of unrelated protein pairs, being expressed in terms of a P-value. Dynamics-based alignments of two structures with P-value smaller than 0.02 are considered statistically significant and was used as the cut-off value to consider two structures as dynamically similar [51].

Results and Discussion
Internal Dynamics of Thermolysin Thermolysin, the MA clan type peptidase, is a 316 residue-long thermostable neutral MP from Bacillus thermoproteolyticus [1]. It presents endopeptidase activity towards peptide substrates, cleaving peptidic bonds preferentially close to aromatic residues. The active site contains a catalytic zinc ion bound to two histidines (H142 and H146) and one glutamate (E166) residue, and an additional catalytic glutamate (E143) residue [76,77]. It is located at the bottom of a pocket formed by the two protein domains: the N-terminal domain composed mostly of βsheets containing the conserved HEXXH sequence motif with the corresponding H142, E143 and H146; the C-terminal domain composed mostly of α-helices, where E166 is located. Given our interest in analyzing and comparing the dynamical properties of MPs, we first characterized the dynamical features of thermolysin, particularly those that are functionally relevant. In order to do this, we used the ProDy software, and the type of results produced can be found in Figs 1 and 2.
Several studies of thermolysin have been made in the last decades, with multiple crystal structures obtained in different conditions available in the PDB. These structures provide snapshots of the motions undergone by the protein upon interaction with different molecules, including substrate-analogs and inhibitors, thus giving the opportunity to describe the conformational subspace related with its function [78][79][80]. For this purpose, we applied PCA to a set of 112 thermolysin crystal structures in order to characterize the collective modes of atomic displacements (see Methods section for details on PCA calculations) [78,81]. As shown in Table 1, a small set of PCs describes the majority of atomic positional variations occurring in the structural set, with the first six PCs (PC1-PC6) describing 80% of the total variance. PC1 alone accounts for 35% of the variance, with a cumulative displacement of < 5 Å. The respective structural variation for each residue along PC1 is shown in Fig 1A and 1B. Although these structural differences are small (0.60±0.08 Å), they are nonetheless highly collective, affecting approximately half of the atoms in the protein (0.43 κ 0.69). Higher fluctuations are seen for the N-terminal region (including the surface residues 42-62). Regions 105-117 and 210-230, which form the top of the active site pocket and contain residues involved in substrate binding [82], also exhibit high, anti-correlated variations (Fig 1A inset). Conversely, the pocket bottom region where the zinc binding residues are located shows relatively lower variation, with the exception of the catalytic residue E143. The structural fluctuations seen for this  catalytic residue towards the pocket bottom reflects local accommodations of the structure to the presence of different ligand molecules in the active site [65,83]. Projection of each structure onto the subspace spanned by PC1-PC2 is shown in Fig 1C. Two distinct clusters corresponding to the subsets of bound and unbound forms are obtained, as seen by the distribution of structures along the PC1 axis (P-value in Table 1). This indicates that the presence of molecules in the active site is associated with conformational changes in the protein that are effectively described by PC1. The variations described by PC1 point to an opening-closure movement of the active site pocket, i.e. an hinge-bending motion of N-and C-terminal domains with the vertex at the pocket bottom. Large scale, hinge-bending displacements were initially described for thermolysin and related neutral proteases [84]. The correspondence with hinge-bending motions was made by analysis of short MD trajectories and later confirmed by the reported unbound crystallographic structure of thermolysin [83,85]. Therein, the detected hinge-bending motions were related to transitions between "open" and "closed" conformations of the unbound and bound forms of the protein, respectively, indicating their functional role. Therefore, the current results show that PC1 alone can describe to a reasonable extent the functionally-relevant conformational changes of thermolysin, with motions in the positive direction along the PC1 axis describing an "opening" of the active site pocket and in the negative direction to its "closure".
In order to test if the conformational changes described by the PCA are intrinsic, i.e. if they are encoded in the internal dynamics of thermolysin, we performed EDA on snapshot conformations from two replicates (Sim1 and Sim2) of a 20 ns MD simulation trajectory of the protein in its unbound state (see Methods section for details) [86,87]. A similar analysis has been previously reported for other proteases of different clans (with distinct catalytic chemistry) [79,88]. EDA is focused on the subspace of PCs, typically the top-ranking modes, that describe the majority of collective atomic motions along a simulation trajectory, and which are typically related to protein function. As indicated in Table 1, the first six EDA-derived PCs (ED1-ED6) have lower variance values than the corresponding structurally-derived PCs (PC1-PC6), with ED1-ED6 of Sim1 accounting for only 52% of the total variance (40% for Sim2). However, in terms of collectivity degrees, EDs and PCs are similar (0.35 κ 0.71 for Sim1 and 0.33 κ 0.61 for Sim2), reflecting the collective nature of motions described by ED1-ED6 modes. Although conformational changes related to ligand-binding typically occur at longer timescales, EDA of MD simulations of a few ns generally provides a reasonable description of the full conformational space explored by the protein [70,71]. During the total simulation time analyzed (19 ns after RMSD convergence), the sampled subspaces are convergent for each simulation. This is shown by comparing the covariance of residue fluctuations between two time Table 1. Fraction of variance and collectivity of the first 6 PC, ED and NM modes obtained for thermolysin. The collectivity degree is expressed as κ and reflects the portion of atoms in the structure affected by a given mode. intervals, the interval of the initial 11.4 ns and the full time interval. The overlap of covariance matrices obtained is 0.63 for Sim1 and 0.65 for Sim2. Moreover, the overlap between subspaces explored during the two time intervals yields a RMSIP of 0.93 for both Sim1 and Sim2. When comparing Sim1 against Sim2, the sampled conformational subspaces defined by the respective ED1-ED6 modes are similar, with a RMSIP of 0.79. However, the overlap between the respective covariance matrices is relatively low (0.38). These results indicate that the two simulations exhibit similar dynamical behavior during the analyzed time window, although the sampled conformational space explored are distinct, as it has been reported for other proteins [89,90]. While it remains uncertain if the protein explores a distinct conformational space on longer time scales, which would require simulation times from μs to ms or other methods more suitable to characterize the protein potential energy surface, the remaining analysis is focused on the correspondence between the results obtained from the current EDA with the experimentally-derived PCA of thermolysin. Therefore, in order to compare the obtained EDs with structurally-derived PCs, the overlap between each of the respective first six modes was calculated. As shown is Fig 2A, the highest value was obtained between modes ED3 and PC1 from Sim1, with an overlap of 0.72 (for Sim2 an overlap of 0.71 was obtained between ED1 and PC1). The large overlap between ED3 and PC1 from Sim1 translates in similar directions of residue fluctuations shown in Fig 1A and 1D, particularly in the region comprising the active site pocket. Further confirmation of the high similarity between these modes is obtained in Fig 1E, where cross-projection of crystal structures along PC1 and ED3 from Sim1 yields a distribution along the diagonal with a clear separation between bound and unbound subsets. Indeed, the large cumulative overlap of 0.85 between ED1-ED3 from Sim1 and PC1 (0.80 for ED1-ED3 and PC1 of Sim2) show that functionally-relevant conformational changes are effectively captured by the first three ED modes and therefore can be said to be encoded in the internal dynamics of the protein. However, there is low correspondence between subspaces defined by the respective modes, as given by the low RMSIP value of 0.49 for Sim1 and 0.46 for Sim2 between ED1-ED3 and PC1 (for ED1-ED10 and PC1-PC10 the RMSIP is 0.56 for Sim1 and 0.53 for Sim2) [70,91]. The obtained results show that a high conformational space is sampled during MD simulations, which includes a functionally-relevant subspace (particularly the ones described by ED3 from Sim1 or ED1 from Sim2) that is only explored upon interaction with the ligand (as described by PC1). This is in line with conformational selection models of protein function described elsewhere [80,92].

PC
ENMs have been extensively evaluated against experimental and computational benchmarks [73,[93][94][95][96][97][98][99][100][101]. They have also been used previously to describe functional aspects of MP internal dynamics, although the relation between collective motions of MPs and their function was only indirectly established [50]. Specifically, it was shown that Atrolysin E and other nonmetallo proteases have similar dynamical profiles in the respective active site regions. However, regions close to metal-centers and active sites tend to exhibit similar and relatively restrained, dynamical profiles [102,103]. Since MPs contain a metal-center in their active site, the functional relevance of the respective ENM-predicted dynamical profiles should be more thoroughly addressed. For this purpose, we first tested the suitability of using coarse-grained ENMs in reproducing the internal dynamics of thermolysin. The ANM implemented in the ProDy software was employed to the unbound structure of thermolysin and the space of collective motions was characterized by Normal Mode Analysis (NMA) [54,93,[104][105][106].
The ANM employs a C α -based node representation of protein structure. Given the presence of metal ions in clan MA members, the addition of node representations for metal ions was also evaluated for thermolysin, using a similar approach to the one made for another ENM [103]. Variations of thermolysin ANM were generated to represent the zinc ion found at the active site and also four additional calcium ions found at the N-terminal domain. While the later are crucial for the thermostability of the protein, the zinc ion is not considered to play a crucial structural role, since its metal-substituted forms present very similar tertiary structures [65]. Fig 1F shows that the respective residue square fluctuations profiles of the model variants produced are almost identical to the C α -only ANM (r > 0.995 for both variants), with only slight variations on fluctuation amplitudes in the 220-226 region, at the top of the active site pocket. The close similarity between the two variants indicates that the introduction of calcium nodes does not produce significant changes in the global dynamical profile of the protein, but the inclusion of the zinc node alone produces small local changes in the dynamic profile of the active site pocket. Since the calcium nodes are introduced at a highly clustered region of the network of inter-residue contacts, its topology is mostly unchanged. The introduction of the zinc node, on the other hand, leads to new network connections in the active site region, producing slight variations on its topology that nonetheless result in very similar dynamical profiles. Therefore, the inclusion of additional nodes for metal ion representation produces only local changes on the ANM-derived dynamical profile of thermolysin, the clan MA type protein.
The results above are in line with recent studies where the employment of coarse-grained models was evaluated for globular folds [21,36], such as the one of thermolysin. It was found that the respective ENMs can effectively capture their essential dynamics [21] and that these models are robust to local perturbations [36]. Given the conserved tertiary structure of clan MA members, the accuracy of the respective ENMs in describing their internal dynamics is not expected to greatly increase with the inclusion of additional metal ion nodes. We therefore focused the remaining analysis on the results obtained for the C α -based ANM of thermolysin (N = 316), particularly on the subset of NMA-derived low-frequency NM modes (NM1-NM6).
As shown in Table 1, the variance of NM1-NM6 is significantly lower than the corresponding PCs and similar to ED modes, accounting for 57% of the total variance. In particular, NM1 accounts for only 18% of the total variance, similar to ED1 from Sim1 and Sim2. In terms of collectivity degrees, the three lowest-frequency modes (NM1-NM3) are highly collective (0.63 κ 0.64), in the range of PCs and EDs, and the remaining modes (NM4-NM6) have significantly lower values (0.09 κ 0.38). The overlap between NM modes and PCs is shown in Fig 2B. Remarkably, a large overlap of 0.71 is found between NM1 and PC1. Fig 1A shows the close correspondence between NM1 mode directions and PC1. Again, a high similarity of motions is observed in the region comprising the active site pocket. This similarity is also reflected in terms of the corresponding profiles of residue fluctuations in Fig 1F. Cross-projection of the structural set along NM1 and PC1 shown in Fig 1G further confirms the close correspondence between these modes. This indicates that the lowest-frequency mode (NM1) predicted by the ANM can effectively reproduce the functionally-relevant conformational change described by PC1 [68]. Indeed, structural variations described by the first PCs are well covered by the low-frequency modes, with a large cumulative overlap of 0.81 between NM1-NM3 and PC1 and a RMSIP of 0.66 between subspaces defined by NM1-NM3 and PC1-PC3 (for NM1-NM10 and PC1-PC10 the RMSIP is 0.58).
Comparison between NMs and EDs was also made by calculating the overlap between corresponding modes. As it can be seen in Fig 2C, there is significant overlap between NM modes and EDs from Sim1, with a particularly large overlap of 0.82 between NM1 and ED3 (0.69 between ED1 and NM1 from Sim2). Remarkably, a high RMSIP of 0.72 is also obtained between subspaces defined by NM1-NM3 and ED1-ED3 for both Sim1 and Sim2 (for NM1-NM10 and ED1-ED10 the RMSIP is 0.76 for Sim1 and 0.74 for Sim2). Therefore, it can be said that the employed coarse-grained ANM can reproduce fairly well the conformational space explored by thermolysin during the two independent MD simulations.
In conclusion, the results show that both MD simulations and ANM provide a reasonable description of thermolysin internal dynamics, particularly the subspace of collective motions with functional relevance. This prompted the use of ENM-based methods to study other evolutionarily-related MPs and to quantitatively compare their internal dynamics. The obtained results will be discussed in following section.

Structural and Dynamical Alignments
Structure-and dynamics-based alignments of a representative set of 13 MA clan proteins ( Table 2) was made using the DaliLite and ALADYN web-servers, respectively, as described in Methods section. The employed algorithm in ALADYN is based on the β-Gaussian ENM, where a two-nodes per residue representation is used. Using this approach, the inclusion of additional metal ion nodes could not be evaluated as in the previous section for thermolysin ANM. However, given the structural heterogeneity of clan representatives, the difference between their respective dynamical profiles is expected to surpass the local changes produced in each protein by the inclusion of metal nodes.
The scores obtained for each of the resulting 78 aligned pairs are shown in Fig 3 and Table B in S1 File, being classified into three distinct groups: 1) MA(E), with both proteins belonging to the MA(E) subclan; 2) MA(M), for proteins belonging to the MA(M) subclan and; 3) Mixed, with each protein belonging to different subclans. Given that each representative protein is not a homologue of the remaining representative proteins [2], the simplified mapping of scores shown in Fig 3A can be seen as an additional layer to the typical representation of protein relatedness in terms of sequence similarity, providing a characterization of the clan in terms of structural and dynamical diversity. As it will be discussed below, it allows for the identification of unnoticed functional similarities between distinct proteins and provides a description of how structure and internal dynamics of proteins are related within a given clan. It can be useful, e.g. in the field of structural genomics, where protein function assignment could be made based not only on sequence and structural similarity, but also on information obtained from dynamics-based alignments with a set of proteins with known catalytic function.
The threshold values to consider a pair either structurally or dynamically similar are based on the employed methods. Overall, the majority of pairs analyzed are structurally similar; with 87% of pairs having Z-scores > 2.0 [72]. However, 69% of pairs are not dynamically similar, since they have P-values > 0.02 [51]. Regarding the structural conservation of the three types of pairs considered, all 21 MA(M) pairs are structurally similar while for MA(E) pairs this is the case for 87% of the 15 pairs. In the case of mixed pairs, 81% of the 42 are structurally similar, although with Z-scores clustered near the threshold of structural similarity. Regarding the internal dynamics of subclan members, remarkably, only 14% of MA(M) pairs have dynamical similarity, while 53% of MA(E) pairs also present it. In the case of mixed pairs, only 17% have dynamical similarity. Since the internal dynamics of proteins are ultimately dependent on their structure, structural similarity is expected to be related with dynamical similarity, i.e. pairs with higher structural similarity scores would tend to have higher dynamical similarity scores. However, no correlation is apparent between Z-scores and P-values for all analyzed pairs, although it is noted that pairs with Z-scores > 19.0 are associated with a high dynamical similarity scores (P-value < 0.001), corresponding to MA(E) pairs M3-M2, M3-M32 and M32-M2. As it will be discussed below, MA(E) pairs M32-M8 and M32-M27 are particularly relevant since their respective alignments have Z-Scores < 2.0 and P-values < 0.001. In the case of MA(M) pairs, there is an apparent inverted relation between dynamical and structural similarities, since pairs with lower structural similarity (lower Z-scores) show higher dynamical similarity (lower Pvalues), e.g. pair members M35-M12(B), M35-M12(A), M8-M10(A).
A similar analysis was made by Carnevale et al., where no general trend between structural and dynamical similarity scores for pairs of proteases from different clans and catalytic chemistries was found [6]. In that case, no specific correlation was expected since the analyzed representatives were considered to have different evolutionary origin and therefore share minimal structural homology. Nonetheless, structural and dynamical similarity was identified for some pairs, with the authors arguing for convergent evolutionary pressure to be at play in such cases. It was therefore suggested that in cases where structurally variability is observed, a compensatory mechanism for dynamical conservation could maintain the catalytic capacity of the  proteins. Conversely, since in this study MP representatives are from the same clan and therefore have assumed common ancestry, no structural or dynamical similarity between two representatives is expected to reflect a divergent evolutionary process. Given that structural and dynamical similarities are apparently not correlated in this case, it is suggested that selective pressure is acting independently on structure and internal dynamics. Fig 3B provides an overview of how individual representatives are related in terms of structural and dynamical similarity. Only pairs considered to be structurally-or dynamically-similar are linked in the weighed graphs, where edge width is related to Z-score and P-value. In terms of structural similarity, members are more connected to representatives belonging to the same subclan, but overall there are also inter-subclan connections, reflecting the structural conservation across the entire clan. There is one exception for representative M27, Botulinum Neurotoxin Type A Light Chain, which shows low connectivity to the remaining representatives.
Regarding dynamical similarity, there is less connectivity between representatives, which may reflect divergence of internal dynamics along the clan. Also, subclan similarity is less pronounced as it can be seen for the case of, e.g. representative M8, which has higher connectivity with MA(E) subclan members than with those of MA(M) subclan where it belongs to. Notable exceptions are representatives Extracellular Small Neutral Protease (M7) and Interstitial Collagenase (M10(A)) that share no dynamical similarity with the remaining MP representatives. These two proteins are relatively small in size; Extracellular Small Neutral Protease is a 132-residue long protein and the structure of Interstitial Collagenase corresponds only to the 168-residue long catalytic domain. Although their internal dynamics are also characterized by hingebending motions (not shown), the amplitude of motions of their smaller domains has no correspondence with the motions of the larger domains of the remaining representatives.
These results provide a quantitative measure of the structural and dynamical similarity that characterizes MA clan members and provides a "horizontal" view on MP evolution. In order to understand if there is a functional basis for the conservation or divergence of such features, a chosen set of alignments was further inspected. For pairs with high structural and dynamical similarity M32-M2, M3-M32 and M3-M2, the structural homology between the corresponding representatives Angiotensin Converting Enzyme (M2), Neurolysin (M3) and Carboxypeptidase Pfu (M32) had been previously reported [107][108][109]. Indeed, these proteins are grouped in the same SCOP family and their high dynamical similarity can be related directly to their structural resemblance. Nonetheless, Angiotensin Converting Enzyme and Carboxypeptidase Pfu have no attributed CATH codes and Neurolysin presents three domains, including the common 3.40.390.10 domain found in other representatives. The alignments of Neurolysin and Carboxypeptidase Pfu, which represent the pair with highest dynamical similarity, were chosen for further inspection. Neurolysin is a 681 residue-long endopeptidase that cleaves the 13-residue peptide neurotensin but it has also activity towards a diverse set of oligopeptide sequences [107]. Carboxypeptidase Pfu is a 499-residue long thermostable carboxypeptidase homodimer with broad substrate specificity [108].
Both structures are mainly α-helical in content and characterized by a deep narrow channel that divide the structures into two domains, with a wider opening at one end and with the active site located at the bottom. This prevents activity towards large, folded substrates. Their structure-based alignment is shown in Fig 4A. The alignment produces a RMSD of 3.7 Å for 449 amino acids used with 15% sequence identity. Aligned regions consist on core regions surrounding the active sites and α-helices that constitute the channel base and walls. The respective dynamics-based alignment, shown in Fig 4B, produces a lower number of equivalent residues, with RMSD of 2.8 Å for 371 amino acids with 13.5% sequence identity (RMSIP of 0.870). It reveals that these regions undergo very similar deformations resembling hinge motions, most likely corresponding to channel opening for substrate access. The aligned portions in both structure-and dynamics-based alignments have high degree of identity.
The motions produced by the employed β-Gaussian ENM are in agreement with findings reported for each MP. Indeed, hinge movements of Carboxypeptidase Pfu sub-domains were previously argued to be involved in channel closure and this conformational change was proposed to have a functional role, namely to extend the number of interactions with the substrate [108]. In the case of Neurolysin, it was suggested that both domains are rigid but present some mobility in relation to each other due to looser packing at the base of the channel [107]. In both cases, the amplitude of hinge movements is restricted due to the presence of cross-domain segments that tighten the channel at one end and limit the access of longer substrates to the buried active site. Remarkably, Carboxypeptidase Pfu cross-domain α-helix (α 4 , residues [81][82][83][84][85][86][87][88][89][90][91][92][93][94][95][96][97][98][99], that contains the conserved R92 considered to have a crucial role in substrate-length restriction, is almost perfectly aligned with an equivalent cross-domain α-helix of Neurolysin (α 5 , residues 137-152) in both structure-and dynamics-based alignments. Moreover, it is noted that Neurolysin residue Lys148 and the conserved Arg92 of Carboxypeptidase Pfu are very close positioned in the dynamics-based alignment of the respective structures, that together with sharing the chemical nature suggests their functional equivalence (Fig B in S2  File). Therefore, the conservation of internal dynamics between these two MPs may not be only a direct consequence of their structural similarity, but may also have a functional basis. Specifically, the requirement to have specific channel opening amplitudes in order to restrain substrate-length, while more local variations in structure allow for distinct specificities in terms of substrate sequence recognition and binding.
Further support for the conservation of internal dynamics between these two MPs comes from the findings reported for Angiotensin Converting Enzyme, which shares high structural and dynamical similarity with both Neurolysin and Carboxypeptidase Pfu [110,111]. The authors showed that for this MP hinge-bending motions have a functional role, since channel opening allows for substrate access to the active site. Furthermore, it was shown that these motions are conserved between other M2 family members. Finally, the presumed dynamical resemblance between M2 family members and another member of the M3 family, Carboxypeptidase Dcp, has been previously noted [112]. Together with these findings, results obtained for M2, M3 and M32 representatives suggest that conservation of internal dynamics is not limited to homologues, but that it can be extended for other families that share some structural similarity. Pairs M32-M8, M27-M8 and M32-M27 were also analyzed due to their high dynamical similarity with no structural similarity. Both Carboxypeptidase Pfu (M32) and Botulinum Neurotoxin Type A (M27) belong to the Gluzincin subclan but have no attributed CATH numbering. Leishmanolysin (M8) belongs to the Metzincin subclan and has four CATH domains, with the particular feature of lacking the conserved 3.40.390.10 domain shared by other Metzincin subclan representatives. The pair of Leishmanolysin and Carboxypeptidase Pfu was chosen for further inspection, as it corresponds to the only Mixed pair with very high dynamical similarity. Leishmanolysin (also termed GP63) is a 478 residue-long surface glycoprotein from Leishmania major that occurs as a dimer and has activity towards a wide variety of peptidic substrates [113]. It adopts a predominantly compact fold composed of mostly β-sheet secondary structure elements, in contrast with the predominant α-helical structure adopted by Carboxypeptidase Pfu [114]. The structure differs from Carobxypeptidase Pfu and other MPs as it is composed of three domains: the N-terminal domain that contains the active site HEXXH sequence motif; the central domain, that presents an unique 62 residue-long insertion between the conserved Glycine of the HEXXHXXGXXH/D metzincin sequence motif and the third active site residue Histidine; and the C-terminal domain, which contains the membrane anchoring point and is composed mainly of β-strand and random coil elements, being positioned at one end of the active site cleft base. Both N-and central domain form the active site cleft. EDA of Leishmanolysin MD trajectories revealed that large-scale, collective motions dominate the conformational changes explored by the protein. These consist in hinge-bending motions characterized by rigid body movements of the N-terminal domain relative to the central and C-terminal domains, with the hinge axis located at base of the active site cleft [115]. The corresponding structure-based alignment is shown in Fig 5A, with a RMSD of 4.6 Å for 88 amino acids used with 9% sequence identity. Unlike the pairs with high structural and dynamical similarity with aligned regions spanning almost the entirety of the structures, the aligned portions of Leishmanolysin and Carboxypeptidase Pfu are restricted to the N-terminal domain, including the α-helix containing the active site HEXXH sequence motif. Indeed, this was typically observed for other non-structurally similar pairs of MPs [1]. The overall orientation of the structures is kept, with the substrate-binding pockets and active residues being almost identically positioned in both structures and with the positioning of the C-terminal domain of Leishmanolysin at the more open end of Carboxypeptidase Pfu channel. Dynamics-based alignment shown in Fig 5B produces a slightly higher number of equivalent residues, with an RMSD of 3.3 Å for 91 amino acids with 7.7% sequence identity (RMSIP of 0.832). Remarkably, the dynamics-based alignment results in a complete horizontal rotation of Leishmanolysin in relation to Carboxypeptidase Pfu when compared to the structure-based alignment. This results in the positioning of its C-terminal domain at the cross-domain segments that constitute the more closed end of Carboxypeptidase Pfu channel and the active sites become approximately 10 Å apart from each other. Nonetheless, the substrate binding-pocket of both MPs retain their orientation and regions undergoing hinge-bending motions are almost identically positioned, indicating their dynamical equivalence. This is also observed in the structure-based and dynamics-based alignments produced for Leishmanolysin and Botulinum Neurotoxin Type A (data not shown), thus suggesting that there is conservation of internal dynamics even when remarkable structural differences between members occur. Moreover, this conservation has a functional basis, namely to allow for proper orientation of interactions between the proteins and their substrates. Therefore, the relatedness of Leishmanolysin, Botulinum Neurotoxin Type A and Carboxypeptidase Pfu, which has so far been only considered at the clan level, becomes evident when their internal dynamics are taken into account.

Conclusions
This work provides a quantitative characterization of the structural and dynamical diversity occurring within the MEROPS MA clan of MPs. It shows that metalloproteases of this clan have distinct dynamical profiles despite their overall structural similarity. Also, it is shown that in cases where high dynamical similarity is observed, the predominant modes correspond to hinge-bending motions associated with substrate-binding. These motions are functionally relevant and appear to be conserved in the clan even when remarkable structural differences between its members occur. Therefore, besides providing a description of the structural and dynamical features of a set of proteins, this type of analysis can also provide new insights on enzyme function that remained unnoticed so far. For metalloproteases, it is suggested that the need to maintain proper substrate interactions has an important role on the conservation of their internal dynamics. Therefore, the type of interactions between a protein and its ligand and the associated motions should be more carefully considered when comparing the internal dynamics of a diverse set of functionally distinct proteins. Together, this work contributes to the development of simple and effective approaches that incorporate quantitative analysis of dynamical similarity between proteins to study the evolution of metalloprotease internal dynamics and the factors governing them.
Supporting Information S1 File. Thermolysin structures (Uniprot ID: P00800) retrieved from the PDB. Unbound structures in bold. Reference structure for PC, NM and MD simulations in bold underlined ( Table A). List of Z-scores and P-values obtained for the alignments of MP representative structures (Table B). (DOCX) S2 File. Root Mean Square Deviations of residue C α obtained for Sim1 and Sim2 (Fig A). Dynamics-based alignment of Neurolysin (Blue, PDB ID: 1I1I) and Carboxypeptidase Pfu (red, PDB ID: 1KA4). Neurolysin K148 and Carboxypeptidase Pfu R92 in bond representation. Active site residues in colored surface representations (Neurolysin: H474, E 475, H478 and E503; Carboxypeptidase Pfu H269, E270, H273 and E299) (Fig B). (DOCX) Internal Dynamics of Metalloproteases