Molecular Simulation-Based Structural Prediction of Protein Complexes in Mass Spectrometry: The Human Insulin Dimer

Protein electrospray ionization (ESI) mass spectrometry (MS)-based techniques are widely used to provide insight into structural proteomics under the assumption that non-covalent protein complexes being transferred into the gas phase preserve basically the same intermolecular interactions as in solution. Here we investigate the applicability of this assumption by extending our previous structural prediction protocol for single proteins in ESI-MS to protein complexes. We apply our protocol to the human insulin dimer (hIns2) as a test case. Our calculations reproduce the main charge and the collision cross section (CCS) measured in ESI-MS experiments. Molecular dynamics simulations for 0.075 ms show that the complex maximizes intermolecular non-bonded interactions relative to the structure in water, without affecting the cross section. The overall gas-phase structure of hIns2 does exhibit differences with the one in aqueous solution, not inferable from a comparison with calculated CCS. Hence, care should be exerted when interpreting ESI-MS proteomics data based solely on NMR and/or X-ray structural information.

ESI-MS has also been used for structural proteomics in combination with experimental structural biology techniques (e.g. X-ray and NMR) and/or computational techniques (e.g. homology modeling and protein-protein docking) [3,[19][20][21][22][23]. These applications are based on the assumption that the vaporization of non-covalent protein complexes from aqueous solution into the gas phase (as occurs during ESI-MS) in general preserves the characteristic structural determinants of the complexes in water [24][25][26][27][28]. This assuption is consistent with the avaliable CCS data for some biomolecules and with the fact that intact non-covalent protein complexes in ESI-MS are indeed transferred into the gas phase [29][30][31][32][33][34]. However, direct proove for this concept has not been forthcoming at the atomistic structural level, because the structural determinants of gas-phase protein complexes have remained largely unknown [28]. Thus, the preservation of these determinants on passing from solution into the gas phase is still under debate for protein complexes. Predicting the structure of protein complexes under ESI-MS conditions, and in particular assessing whether native interactions in the gas phase reflect those in the aqueous phase, is therefore important for ESI-MS based structural proteomic studies.
A straightforward approach to improve the structural prediction is to run molecular dynamics (MD) simulations and select models that are consistent with the CCS [35,36]. However, these investigations have limited predictive power as no validations are provided against the main charge and the simulation is basically used as a tool to generate structural ensembles from which specific conformers can be selected [35]. More elaborate protocols have been developed for single proteins in the gas phase [24,[37][38][39][40][41][42][43][44][45][46][47].
These approaches have predicted ensembles of structures consistent with the experimentally measured charge and CCS [38,39]. They have further suggested that desolvation leads to more compact overall protein structures while preserving the majority of the secondary and tertiary structures [24,37,38,40]. In addition, the fraction of hydrogen bonds (relative to the theoretical maximum) increases significantly upon passing from aqueous solution (on average 43%) to the gas phase (on average 56%) [41]. This suggests that proteins in the gas phase may be trapped in a low energy state, structurally close to the native state in water [42]. Our recent studies further indicate that the ionization state of a gas-phase protein is the result of the balance between repulsive electrostatic terms and stabilizing forces that include salt bridges, hydrogen bonds, p-charge and long-range electrostatic interactions [37,38]. Therefore, these simulation schemes appear instrumental to predict the structural determinants of protein complexes.
Recently, we have proposed an efficient approach to sample exhaustively the proteins' protonation state space, based on a hybrid Monte Carlo (MC)/MD scheme [38]. Here, we extend this computational scheme, originally developed for single protein ESI-MS structural predictions, to a protein complex, the human insulin dimer (hIns 2 hereafter, supporting information (SI), Figure S1). hIns 2 is present in vivo [48]. It is used for the treatment of diabetes and obesity [49,50]. Our predictions reproduce the experimental main charge state and CCS. They further show that, in the sub-ms time scale (possible times of the ESI-MS experiments to form stable gas-phase structures, ranging from ms to s [28]) the overall gas-phase structure of the complex rearranges already significantly. The final gas-phase structure differs distinctively from the solution structure as large amplitude reorganizations take place in order to maximize intra-and intermolecular hydrogen bond interactions, which are necessary for the formation of stable gasphase structures. Hence, our current work provides evidence against the assumption that non-covalent complexes being transferred into the gas phase generally preserve their structural determinants in solution.

Results/Discussion
Here (i) we first employed our hybrid MC/MD scheme-based protocol, used for single proteins [38], to explore the protonation state space of hIns 2 and to identify the main charge state and its most probable conformer. Then, (ii) we performed sub-ms MD simulations on the latter in the gas phase to investigate its structural features. Both steps were validated by comparison with experiments. Comparison with independent MD simulations of the protein dimer, with different initial condition and/or force field was additionally made.
(i) The protocol developed by us and used here for protonation state space exploration [38] was applied to hIns 2 in the gas phase with different charge states ([hIns 2 ] q , q = 1+,2+,…,15+) (see the Methods and Text S1). The protocol uses both MC and MD simulations and is based on standard force field energy augmented by additional energy terms associated with the gas-phase basicity (GB) of ionizable residues [38]. The initial structure of our calculations was taken from MD simulations of the protein complex in water at physiological pH, which was in turn based on a high resolution X-ray structure (see SI, Figure S1 for further details).
The GB corrected energies correlate with density functional theory (DFT) results much better than the energies derived from the original force field (Figure 1, see the Methods for further details). The corrected energies of the complex for fifteen charge states turned out to decrease largely already after few hundred steps ( Figure 2A for the case of [hIns 2 ] 6+ and SI, Figure S2 and Figure S3). The identified lowest energy protonation states for each charge state are reported in Table S1.
The protocol was validated by predicting the experimental main charge (which is usually the maximum charge for folded proteins [51]) of the complex under ESI. This is q = 6+ [52]. We used the fact that the charge state of protein ions with the apparent gasphase basicity (GB app , see its definition in the Text S1) close to the GB of the solvent, from which the protein ions are formed, reproduces the experimental maximum charge states under ESI. The theoretical values are within 6% of the experimental values for 13 proteins [53]. Following published procedures [53,54], we estimated the maximum charge by calculating the intersection of the GB app fitted line as a function of protein complex net charge with the line of the solvent GB. The intersection occurs at q = 6+ ( Figure 2B), matching the experimentally measured main charge for hIns 2 generated from a solution at pH = 7.4 [52].
(ii) 0.075 ms long MD simulations in the gas phase were performed on the lowest energy protonation state for the main charge state, i.e. [hIns 2 ] 6+ . The simulations appeared to be gradually equilibrated already after ,55 ms as indicated by the convergence of the backbone heavy atoms root mean square deviations (RMSD) of the complex and other structural properties (e.g. radius of gyration (R g ) and center-of-mass distance between monomers) as a function of simulated time ( Figure S4A to Figure  S4D). The convergence of the simulations has been probed by the cosine content of the first principal component (PC) according to the Hess method [55]. When the cosine content is close to 1, it means that the system is far from convergence. The cosine contents of our systems turn out to be close to 0, indicating a good sampling of insulin dimer conformations (see Text S1 for details).
Overall our simulations indicate that the b-strand secondary elements are more stable than the a-helices, i.e. the average contents of b-sheets in water and in the gas phase are 4.061.2% and 3.961.1%, respectively, while the ones of a-helices are 38.763.0% and 2.864.0%, respectively. These findings are consistent with the lower stability of a-helices than b-sheets in the gas phase observed from previous simulations [56,57]. Specifically, the two-stranded antiparallel b-sheet motif at the interface of the two insulin monomers was well maintained during MD simulations ( Figure 3A and Figure 3B). This motif was

Author Summary
Electrospray ionization (ESI) mass spectrometry (MS) plays a pivotal role in proteomics and structural biology. The applications of ESI-MS to protein complexes make use of the assumption that the vaporization of protein complexes into the gas phase (as occurs during ESI-MS) preserves the structural determinants of the complexes that are observed in water. We used computational methods to investigate this key issue by studying the gaseous structure of a pharmacologically relevant protein complex. The complex in the gas phase differs in a subtle yet significant way from the solution structure. This finding is likely of general relevance for protein-protein complexes. Hence, our work implies that the assumption used in proteomic studies, i.e. that in the gas phase non-covalent complexes generally preserve the representative structural determinants observed in the aqueous phase, needs to be reconsidered. Therefore we suggest that the analysis of complexes should be performed on an individual base rather than by generalized principles.
Our MD simulations of the protein complex in solution suggest that 230.268.6 hydrogen bonds are formed between the protein complex and the solvent. A significant fraction of these (,12%) is replaced by hydrogen bonds within the protein complex on passing from solution into the gas phase. Hence, several of the protein hydrogen bonds functionalities, forming hydrogen bonds with the solvent, rearrange in the gas phase so as to form intra-and intermolecular hydrogen bonds not present in solution. Figure 4 shows the reorganization of one of the hydrogen bond networks between solution and gas phase (panel A and B, respectively). In contrast, the intermolecular van der Waals contacts did not reveal significant changes (Table S2). This may be due, at least in part, to the fact that these contacts are maximized both in solution (because of the hydrophobic effect [60]) and in the gas phase.
The R g of the complex in the gas phase (1.3060.01 nm) decreased compared to the one in water (1.3760.01 nm, see Figure S4A and Table S2). This is consistent with previous simulations both on single proteins and protein complexes [35,37,38].
During the simulations we also observed progressive rearrangements of the two insulin monomers with respect to each other. Specifically one of the two monomers (monomer I (cyan) in Figure 3A) first rotated by about 30 degrees relative to the other after 6 ms ( Figure 3C), with additional small rearrangements when the helices unfolded ( Figure 3B), and then stepwise rotated backward by about 20 degrees between 28 ms and 55 ms. The angle values between monomer I -b-sheet -monomer II at the end of the MD simulation were similar (,1.5 degree of difference) to the initial values.
Accordingly, the evolution in time of the number of hydrogen bonds within the whole complex and those between monomers decreased (from 90.364.6 to 80.664.7 and from 14.661.6 to 13.461.5, respectively) after 6 ms and stepwise increased (from 80.664.7 to 92.864.6 and from 13.461.5 to 15.361.5, respectively) from 28 ms to 55 ms ( Figure S4E and Figure S4F). At the end of the simulation, these numbers were larger than those of the complex in water (Table S2). Stepwise rearrangements to maximize inter-and intra-molecular hydrogen bonds in the   formation of gas phase structures has also been observed in monomeric proteins [61]. In contrast, the number of van der Waals contacts first increased and then decreased, then increased again and finally were maintained in the latter part (after 55 ms) of the simulations (Figure 3E). At the end of the simulation the number was comparable to the starting situation, as we discussed above.
Next, we investigated the largest scale motions of the system by essential dynamics analysis (EDA) [62] (see Methods). In the combined water-and-gas-phase trajectories (see Methods for details), the largest scale motion involves a fast compaction of the complex and an unfolding/refolding transition of the a-helices ( Figure S5A). Instead in the converged part of trajectory of the gas phase (i.e. the latter 0.020 ms, see discussion above and Figure 3) the largest scale motion entails a twisting of each subunit relatively to the other ( Figure S5B). This suggests that after the system achieves equilibration in the gas phase, the compaction motion is less relevant than in solution, possibly because of the observed structural changes on passing from water into the gas phase. Notably, the largest scale motion calculated for the entire simulation in the gas phase is similar to that of the combined water-and-gas-phase trajectories ( Figure S5C). Thus, the initial, non-equilibrated gas phase motion retains a ''memory'' of the simulation in solution.
Importantly, the calculated CCS values obtained from the gas phase simulations (12.860.2 nm 2 ), reproduced the experimentally determined value (12.9 nm 2 ) [52]. However, we found that the calculated CCS values were not sensitive enough to detect the subtle, yet significant structural arrangements described above (Figure 3 and Table S3). Indeed, the calculated CCS shows no correlation with other gas-phase structural properties ( Figure S6). As a further test to prove this issue we also calculated the CCS (Table S3) before (e.g. at time 5.7 ms) and after (e.g. at time 8.1 ms) the turn of monomer I. The CCS variation is about 0.1 nm 2 (from 12.6 nm 2 to 12.7 nm 2 ), a value within one standard deviation from the average values.
Finally, to check the dependence of conformational dynamics of the complex on the microscopic initial conditions and on the force field, we performed additional MD simulations (see Table S2 and Figures S7, S8 and S9) on the lowest energy protonation state for the main charge state (q = 6+) in the gas phase. Specifically, we performed (1) two additional 0.035 ms long OPLS/AA-based [63] vacuum independent simulations with different starting velocities and (2) one additional 0.025 ms long vacuum simulation with the GROMOS 43a1 force field [64]. Selected averaged structural properties calculated from these simulations are similar to each other (Table S2). The only exception is the slightly more compact structure obtained from the GROMOS 43a1-based simulation. This may be due, at least in part, to the overestimation of London forces in this force field [65]. Taken altogether, these results indicate that our calculations are basically independent of the initial microscopic conditions and the adopted force field. Despite the similarities of the observed average structural properties, several possible pathways and intermediate conformers exist upon transfer from water into the gas phase (Figures 3, S7, S8 and S9), consistently with what has been observed previously [66][67][68].

Conclusions
We have reported a systematic exploration of the charge and conformational space of the hIns 2 non-covalent complex in the gas phase by using a hybrid MC/MD approach and sub-millisecond MD simulations. The long time required for observing structural changes such the unfolding of the helices (,25 ms), as well as other conformational rearrangements, confirms that conformational changes in the gas phase may happen over long time scales (from ms to ms) [28,69,70]. Our calculations correctly reproduce the experimental main charge and the CCS measured in solution at pH = 7.4 [52]. Hence, molecular simulations approaches such as the one reported here may be a useful tool to (study and) complement the structural analysis of protein complexes via ESI-MS. We suggest that distinct protein complexes differ from one another when their structural properties are determined in gas phase or in solution. This is due to a substantial structural reorganization as a consequence of the maximization of intra-and monomer II. (D) CCS values. The experimental value of 12.9 nm 2 , as reported [52], at the main charge state is indicated by a red solid line and its 5% variations are indicated by the dashed lines. The average value from our MD simulation in the gas phase is 12.860.2 nm 2 . (E) Number of contact pairs between the carbon atoms of the monomers within 0.60 nm. doi:10.1371/journal.pcbi.1003838.g003 Therefore, care should be exerted when interpreting ESI/IM-MS data that are solely based on NMR and/or X-ray structural information. Consistent with this, recent experimental work also illustrates that the comparison between measured and calculated CCS based on X-ray structures can only provide a semiquantitative estimate [71][72][73][74]. This may be attributed to the considerable uncertainties (from 0 to ,40%) involved in the experimental measurements of CCS related to drag enhancement of protein ions in the drift tube and other factors [71][72][73], as well as to the compaction of protein structure in the gas phase in comparison to the corresponding X-ray crystal structure [73].
Computational approaches such as ours or those by other groups [24,56,75], may therefore be instrumental to understand how desolvation affects the structure and stability of other protein complexes. Such simulations may establish whether the present findings can be generalized. This type of calculations may be of help for the development of efficient strategies to optimize experimental factors to control the gaseous protein ion structure in ESI-MS experiments.

Methods
We first performed MD simulations in water based on the X-ray structure of hIns 2 (1.0 Å ) (PDB ID: 1MSO [76]). The protonation states of residues in solution were assigned according to the corresponding pK a values calculated by using the H++ webserver [77]. As a result, H26, H31, R43, K50 and N-terminal residues (G1, F22, G52 and F73) were positively charged and E4, E17, E34, E42 and C-terminal residues (N21, T51, N72 and T102) were negatively charged. The total charge of the complex is 0. hIns 2 was inserted into a water box with edges of 71652663 Å 3 (in total 22,519 atoms). The AMBER ff99SB-ILDN force field [78][79][80][81] and TIP3P force field [82] were used for the protein complex and for water, respectively. Periodic boundary conditions were applied. Electrostatic interactions were calculated using the Particle Mesh Ewald (PME) method [83], and the cutoff for the real part of the PME and for the van der Waals interactions was set to 0.9 nm. All bond lengths were constrained using the LINCS algorithm [84]. Constant temperature and pressure conditions were achieved by coupling the systems with a Nosé-Hoover thermostat [85,86] and an Andersen-Parrinello-Rahman barostat [87]. A time-step of 2 fs was employed. The protein complex underwent 1000 steps of steepest-descent energy minimization with 1000 kJ?mol 21 ?Å 22 harmonic position restraints on the protein complex, followed by 2500 steps of steepest-descent and 2500 steps of conjugate-gradient minimization without restraints. The system was then gradually heated from 0 K up to 300 K in 20 steps of 2 ns. 100 ns long MD simulation at 300 K and 1 atm pressure was carried out using GROMACS 4.5.5 [88]. The structure nearest to the average conformation of the complex in aqueous MD simulation (see Figure S1) was employed as starting structure for the MC/MD exploration of the protonation state space. The solvent molecules were removed.
The MC/MD simulations (see Text S1 for details) were based on the OPLS/AA [63] force field energies augmented by additional energy terms associated with the GB of ionizable residues [38]. To validate the augmented term, the energies of 60 selected protonation states for q = 6+ without and with the GB correction, as well as with DFT were calculated using the Becke exchange and Lee-Yang-Parr correlation functional (BLYP) [89,90] and the TZV2P Gaussian basis set [91]. As in ref. [37,38,92], only the N-terminal, C-terminal, R, K, H, Q, D, and E residues were allowed to protonate or deprotonate. We chose the OPLS/AA [63] force field because it offers the most complete set of base/conjugate acid pairs for these residues, e.g. the force-field parameters for the deprotonated arginine residue are missing in AMBER [93] or CHARMM [94] force fields. Issues related to a particular choice for the force field have been carefully addressed in our earlier work [37,38]. Specially, we showed that three different force fields (GROMOS 41a1 [64], AMBER99 [93], and OPLS/AA [63]) give the same gas-phase charge state for nine proteins of different size and fold, when the calculations were limited to protonation states containing the ionized residues common to all of the three force fields [38]. We considered protonation states at total charge states from q = 1+ to q = 15+ (this includes the experimentally measured q = 6+ [52]). The MC/MD protocol converged after a number of MC steps in the range of 1,500 to 6,500, depending on the charge state (over a total of ,4,000 to ,120,000,000 possible protonation states for each charge, see Table S4) were performed for various charge states.
The lowest energy protonation state for the main charge state (q = 6+) underwent MD simulations at 300 K for 0.075 ms in the gas phase with the same setup as the one described for the aqueous MD simulation, except that the time-step was 1.5 fs and the force fields was OPLS/AA [63]. To check for dependence on the microscopic initial conditions, additional two MD simulations, each 0.035 ms long, on the same protonation state were performed using different starting velocities. To check for the dependence of the results from the force field, we also performed 0.025 ms long MD simulation using GROMOS 43a1 [64]. The latter force field along with OPLS/AA [63], unlike others such as AMBER [93] and CHARMM [94], have standard parameters for deprotonated arginine residues. The latter are present in the identified lowest energy protonation state of [hIns 2 ] 6+ (see Table  S1). Furthermore, MD simulations on other lower energy protonation states at the main charge state, with charges located on different residues, have been also carried out (see Table S5, Table S6 and Text S1).
Secondary structure elements were detected by using Define Secondary Structure of Proteins (DSSP) [95]. All figures for the visualization of structures were drawn using PyMOL (Molecular Graphics System, Version 1.3, Schrödinger LLC). CCS values were calculated for structures every 73.5 ns using the trajectory method [96] implemented in the MOBCAL code [97]. The EDA [62] was carried out for the whole (0.010 ms long) trajectory in water combined with the whole (0.075 ms long) trajectory in the gas phase, for the whole gas-phase one alone and for the converged part (0.055 to 0.075 ms) of the trajectory in the gas phase. The EDA was performed after iterative superposition of the MD trajectories on the crystal structure of hIns 2 . The ProDy (Protein Dynamics & Sequence Analysis) interface [98] implemented in VMD1.9.1 [99] was used for the visualization of EDA. The MC calculations were carried out using standard Metropolis sampling [100] written as a bash/awk shell script, the MD using GROMACS 4.5.5 [88]. Figure S1 MD simulation of hIns 2 in water. (A) Primary sequence of hIns 2 (each monomer consists of two chains of 21 and 30 amino acids linked by 2 disulfide bridges derived from a precursor molecule). The letters colored in red and blue represent chargeable sites of acidic (E, D, and C-terminal) and basic groups (R, K, H, and N-terminal), respectively, in solution. (B) hIns 2 Xray structure (PDB ID: 1MSO [76]). Monomer I (residues 1-51) and II (residues 52-102) are colored in blue and red, respectively.

Supporting Information
Each insulin monomer is composed of two peptide chains (A and B, colored in dark and light, respectively) linked by two disulfide bonds (shown as green sticks, sulfur atom in yellow). (C) Backbone atoms RMSD (in nm) from the starting conformation of hIns 2 during the 100 ns long MD simulation in water. RMSD of the entire hIns 2 , of monomer I, and of monomer II are colored in black, red, and green, respectively. (D) B-factor (in Å 2 ) plotted for Ca atoms of hIns 2 from MD simulation and X-ray. The last 5 ns long MD trajectory of hIns 2 has been used in the calculation of Bfactors. The experimental values are obtained from the X-ray structure data of hIns 2 [76]. In general, the starting structure of protein complex for gas-phase calculations is generated from MD simulations in water (light blue background, steps 1-3). After selecting representative starting structures and a random generation of initial protonation states, structures for low energy gas phase protonation states are derived in an iterative procedure (blue background) beginning with high-temperature MD simulations in the gas phase. Subsequently, the lowest energy conformation within equally spaced time windows is obtained by geometry optimization. The optimized structures are then employed in the MC procedure using GB corrected force field energies and a Metropolis test to define the current lowest energy protonation state. For the next iteration, a new protonation state is generated. Convergence is reached when the program fails to generate a new protonation state for ten consecutive iterations. The procedure converges in a relatively small number of MC steps indicated by our current work on a protein complex and previous calculations of single similar-sized proteins [38]. (B) Probability that a pair of DFT conformers with DE DFT less than 10 kJ/mol falls within DE c in the GB corrected force field energies. The probability is calculated by counting the number of pairs falling within DE c . (C) The number of ionized residues (circles) in the most probable protonation states of the hIns 2 as a function of the protein net charge (q). Standard deviation from the average is given as error bars. The minimum and the maximum numbers of possible ionized residues for each total charge are indicated by the green and the red lines, respectively. The vertical dashed blue line indicates the main charge state in ESI-MS [52]. Table S1 The lowest energy protonation states for charge states from 1+ to 15+. The positive and negative charged residues are indicated by ''+'' and ''2'', respectively. (DOCX) Table S2 Average structural properties of MD simulations in the gas phase of [hIns 2 ] 6+ with the lowest energy protonation state. From left to right: length of simulation (Length in ms); radius of gyration (R g in nm); radius of gyration of backbone atoms (R g,BB in nm); radius of gyration of monomer I (R g,MI in nm); radius of gyration of monomer II (R g,MII in nm); collision cross section (CCS in nm 2 ); total surface area (SA in nm 2 ); center-of-mass distance between monomers (COM P-P in nm); number of hydrogen bonds in protein-protein interface (HB P-P ); number of hydrogen bonds in complex (HB); number of hydrogen bonds in complex (HB); number of contact pairs between the carbon atoms of the monomers defined by a cutoff of 0.60 nm (Cont P-P ). Standard deviations were reported in parenthesis.
(DOC)   Table S6 Average structural properties of MD simulations in the gas phase of hIns 2 at the main charge state with the most probable protonation states. From left to right: radius of gyration (R g in nm); radius of gyration of backbone atoms (R g,BB in nm); radius of gyration of monomer I (R g,MI in nm); radius of gyration of monomer II (R g,MII in nm); collision cross section (CCS in nm 2 ); total surface area (SA in nm 2 ); centerof-mass distance between monomers (COM P-P in nm); number of hydrogen bonds in protein-protein interface (HB P-P ); number of hydrogen bonds in complex (HB); number of hydrogen bonds in complex (HB); number of contact pairs between the carbon atoms of the monomers defined by a cutoff of 0.60 nm (Cont P-P ). Standard deviations are reported in parenthesis.