Josephin Domain Structural Conformations Explored by Metadynamics in Essential Coordinates

The Josephin Domain (JD), i.e. the N-terminal domain of Ataxin 3 (At3) protein, is an interesting example of competition between physiological function and aggregation risk. In fact, the fibrillogenesis of Ataxin 3, responsible for the spinocerebbellar ataxia 3, is strictly related to the JD thermodynamic stability. Whereas recent NMR studies have demonstrated that different JD conformations exist, the likelihood of JD achievable conformational states in solution is still an open issue. Marked differences in the available NMR models are located in the hairpin region, supporting the idea that JD has a flexible hairpin in dynamic equilibrium between open and closed states. In this work we have carried out an investigation on the JD conformational arrangement by means of both classical molecular dynamics (MD) and Metadynamics employing essential coordinates as collective variables. We provide a representation of the free energy landscape characterizing the transition pathway from a JD open-like structure to a closed-like conformation. Findings of our in silico study strongly point to the closed-like conformation as the most likely for a Josephin Domain in water.


S1 Text Molecular Dynamics and Metadynamics Lowest Energy States
the Josephin Domain (JD) in explicitly modeled water classical Molecular Dynamics (MD) for 500 ns. In Figure S1 the evolution of the calculated by averaging the last 100 ns of each MD simulation heck if the system conformation is stable under simulation reasonably stable in the last 100 ns of each simulation ( Figure S1).
C α -C α Root Mean Square Deviation (RMSD) relative to the average structure (calculated by averaging the last 100 ns of each MD simulation). The RMSD plot is represented for each MD replica.
shows the JD's secondary structure probability, calculated over the available NMR contained in 1YZB, 2AGA, 2DOS, 2JRI, several configurations at the lowest energy state coming from metadynamics and the overall MD trajectory in the last 100 ns. MD data coming from the five replicas are used as an ensemble. The JD secondary structure showed to be , metadynamics and classical MD simulations. Figure S3) have been also investigated with particular attention to hydrophobic contacts for available NMR data and metadynamics lowest energy states. Contact matrixes which are not present in 1YZB and 2JRI. Those areas are: α3, a region between β1 and α5 and β6 ( Figure   S2). Figure S2. Secondary structure percentage, calculated over the available NMR configurations contained in 1YZB, 2AGA, 2DOS, 2JRI, several configurations at the lowest energy state coming from metadynamics and the overall MD trajectory in the last 100 ns for the simulated five replicas. The alpha-helix (blue), beta-sheet (orange) and unstructured coil (green) are represented. Figure S3. Protein-Protein contacts been considered), calculated as average over the available NMR configurations contained in 1YZB, 2AGA, 2DOS, 2JRI, and several configurations at the lowest energy state coming fr are highlighted similar contacts in lowest energy states

Josephin Domain Structural Conformations Explored by Metadynamics in Essential Coordinates
matrix and Protein-Protein hydrophobic contacts matrix , calculated as average over the available NMR configurations contained in 1YZB, 2AGA, 2DOS, 2JRI, and several configurations at the lowest energy state coming from the metadynamics simulation. In yellow are highlighted similar contacts in lowest energy states, half-closed and closed NMR structures.

Josephin Domain Structural Conformations Explored by Metadynamics in Essential Coordinates
3 Protein hydrophobic contacts matrix (all heavy atoms have , calculated as average over the available NMR configurations contained in 1YZB, 2AGA, 2DOS, metadynamics simulation. In yellow closed and closed NMR structures.

Section 2 Principal Component Analysis
Principal Component Analysis (PCA) was applied to the whole data coming from the MD trajectory pertaining to the five replicas used as ensemble in order to provide collective motion directly related to the hairpin closure. In detail, after the alignment of the protein Cα positions, the covariance matrix was calculated and diagonalized.
The visualization of the first 10 eigenvalues ( Figure S4), representative of the variance of the principal components, clearly highlights that the first eigenvalue accounts for the 50 % of the total variance in the data set. Figure S4. Eigenvalues obtained applying the PCA to the classical MD trajectory.

Section 3 Metadynamics Collective Variables
The estimation of the free energy profile was performed by employing the reweighted-histogram procedure [1,2], taking into account four collective variables: the projection along the first PCA eigenvector, the JD Radius of Gyration (RG), the hairpin angle [3] and the alphaRMSD variable (parameters r 0 =0.08 nm, n=6, and m=12).

PCA eigenvector
The first eigenvector derived from the PCA was used as CV for a well-tempered Metadynamics simulation of 500 ns starting from the open-like 1YZB model [4]. To perform Metadynamics, a Gaussian width of 0.1 was used. Along the simulation, the initially Gaussian deposition rate value of 0.2 kJ/mol·ps was used and it gradually decreased on the basis of an adaptive scheme, with a bias factor of 20. The Gaussian width value was the same order of magnitude as the standard deviation of the CVs (Table 1), calculated during unbiased simulations (production MD).

Radius of Gyration
The JD Cα-Radius of Gyration (RG), already used in previous studies [3,5] to analyze the JD conformational transition, was considered as second CV. Given that the NMR models (1YZB, 2JRI, 2DOS, and 2AGA) considered in this work present a different number of residues, the RG has been calculated by considering all the residues in common among the above mentioned PDB models. In detail, the residue range 1Met-171Asp (according to 1YZB numbering) has been chosen.

AlphaRMSD
The alphaRMSD variable represents the distance from the alpha helix configurations to measure the number of segments that have an alpha helical configuration. This is done by calculating the following sum of functions of the RMSD distances: where the sum runs over all possible segments of alpha helix.

Angle
As third reaction coordinate we selected a large-scale bending angle between the globular and the helical hairpin subdomain. This reaction coordinate has been previously used in a recent literature [3] to estimate the JD free energy profile. The angle is here calculated from the centers of mass of the Cα atoms from three distinct JD regions ( Figure S5): globular subdomain (residues 111-113, 122-125 and 162-165), hinge (residues 32-35) and loop (residues 45-48 and 58-61) [3].

Convergence of the free energy estimation
The convergence of the Metadynamics computational procedure performed i between low energy states and ii i) Several recrossing events between low energy states can be identified analyzing the time evolution of the hairpin angle and radius of gyration ( estimation of the protein free energy state, as reported in ii) The free energy difference between low energy states at different times along the simulation ( S7) was calculated to assess the convergence. The reasonably stable from 100 to 500 ns. The uncertainty, calculated as the SD from the asymptotic value of the free energy obtained from the last part of the simulation is 0.5 kJ/mol. Figure S6. a) Plot of the Gaussian Height added to the system along the recrossing events between the low small. These events lead to the convergence in the

onvergence of the free energy estimation
Metadynamics simulation was demonstrated by following a computational procedure performed in a recent work [6], which requires to check ii) fluctuations of the free energy difference, around a Several recrossing events between low energy states can be identified analyzing the time evolution of the hairpin angle and radius of gyration ( Figure S6). These events lead to the convergence in the estimation of the protein free energy state, as reported in Figure S7.
The free energy difference between low energy states at different times along the simulation ( ) was calculated to assess the convergence. The Figure S7 shows that the free energy difference is from 100 to 500 ns. The uncertainty, calculated as the SD from the asymptotic value of the free energy obtained from the last part of the simulation is 0.5 kJ/mol.
. a) Plot of the Gaussian Height added to the system along the Metadynamics ow energy states can be identified, even when the added Gaussian small. These events lead to the convergence in the estimation of the JD free energy profile.  (Right) Convergence of the Free Energy calculation. ΔG is calculated using 1.54 < Ewell1 < 1.56 nm and 1.77 < Ewell2 < 1.79 nm. c) (Left) Free Ene obtained by reweighting the Metadynamics the Free Energy calculation. ΔG is calculated using 62 < Ewell1 < 64 ° and 97 < Ewell2 < 99 °. The uncertainty, calculated as the SD from the asymptotic value of the free energy obtained from the last part of the simulation is 0.5 kJ/mol. rgy profile as function of the hairpin angle data. The Energy wells are highlighted in blue. (Right) Convergence of the Free Energy calculation. ΔG is calculated using 62 < Ewell1 < 64 ° and 97 < Ewell2 < 99 °. The uncertainty, mptotic value of the free energy obtained from the last part of the simulation is