Effect of Ca2+ on the promiscuous target-protein binding of calmodulin

Calmodulin (CaM) is a calcium sensing protein that regulates the function of a large number of proteins, thus playing a crucial part in many cell signaling pathways. CaM has the ability to bind more than 300 different target peptides in a Ca2+-dependent manner, mainly through the exposure of hydrophobic residues. How CaM can bind a large number of targets while retaining some selectivity is a fascinating open question. Here, we explore the mechanism of CaM selective promiscuity for selected target proteins. Analyzing enhanced sampling molecular dynamics simulations of Ca2+-bound and Ca2+-free CaM via spectral clustering has allowed us to identify distinct conformational states, characterized by interhelical angles, secondary structure determinants and the solvent exposure of specific residues. We searched for indicators of conformational selection by mapping solvent exposure of residues in these conformational states to contacts in structures of CaM/target peptide complexes. We thereby identified CaM states involved in various binding classes arranged along a depth binding gradient. Binding Ca2+ modifies the accessible hydrophobic surface of the two lobes and allows for deeper binding. Apo CaM indeed shows shallow binding involving predominantly polar and charged residues. Furthermore, binding to the C-terminal lobe of CaM appears selective and involves specific conformational states that can facilitate deep binding to target proteins, while binding to the N-terminal lobe appears to happen through a more flexible mechanism. Thus the long-ranged electrostatic interactions of the charged residues of the N-terminal lobe of CaM may initiate binding, while the short-ranged interactions of hydrophobic residues in the C-terminal lobe of CaM may account for selectivity. This work furthers our understanding of the mechanism of CaM binding and selectivity to different target proteins and paves the way towards a comprehensive model of CaM selectivity.

Introduction Calmodulin (CaM) , Fig 1a and 1b, is a promiscuous Ca 2+ -sensing protein that plays part in many physiologically important cellular processes [1]. Its two lobes, connected by a flexible linker, have one beta sheet and two EF-hand motifs each, Fig 1a-1c. The EF-hand binds a Ca 2+ ion which induces tertiary structure rearrangements of lobe helices, exposing hydrophobic residues [2]. This allows CaM to bind and regulate a myriad of target proteins such as ion channels, kinases and G-protein coupled receptors. The Ca 2+ -signaling and olfactory transduction pathways (Fig 1d and 1e) are two examples of cell-signaling pathways where CaM is involved. In the Ca 2+ -signaling pathway, CaM activates and regulates the myosin light chain kinase IV (MLCK) and calcineurin (CaN), among others [3]. Through this, CaM plays part in regulating a variety of biological processes including metabolism, proliferation and learning. In olfactory  2+ ions are represented as black spheres and the beta sheets are marked with gray color. c) The EF-hand motif. d) The role of CaM in the Ca 2+ -signaling pathway. CaM activates the myosin light chain kinase IV (MLCK) and phosphorylase kinase (PHK), calcineurin (CaN), Ca 2+ /calmodulin-dependent protein kinase (CAMK), nitric oxide synthase 1 (NOS), adenylate cyclase 1 (ADCY) and phosphodiesterase 1A (PDE1). This affects downstream processes such as contraction, metabolism, proliferation, learning etc. e) The role of CaM in olfactory transduction. CaM inhibits the cyclic nucleotide-gated (CNG) channel and activates Ca 2+ /calmodulin-dependent protein kinase II (CaMKII). CaMKII then inhibits adenylate cyclase 3 (ADCY3). Proteins marked by a star are included in our CaM binding study. The pathways in d) and e) are adapted from KEGG [3]. simulations of C-CaM using Markov state modeling (MSM), in which target protein binding was proposed to occur by C-CaM adopting the bound conformation before binding to the protein [31]. However, the analysis did not cover the differences of CaM N-terminal domain (N-CaM) and C-CaM dynamics, mechanisms and binding modes, nor did the study explore the possibility of different binding mechanisms linked to different binding modes. Binding a ligand can be conceptualized using two frameworks: conformational selection and induced fit. In pure conformational selection, the apo protein adopts a holo-like state before binding [32]. In pure induced fit, the ligand binds in a typical apo state that is not ideal, which causes subsequent rearrangements before reaching a high-affinity holo state [33]. In reality, binding likely involves a combination of the two mechanisms. However, spectroscopy experiments, as well as extensive MD simulations, may shed light on which mechanism is dominating. If the apo protein samples the holo-like state, conformational selection is typically assumed to be dominating, otherwise induced fit is assumed [34].
Here, we analyze thermally enhanced MD simulations of calmodulin with different Ca 2+occupancy and use spectral clustering to elucidate calmodulin selective promiscuity. We search for indicators of conformational selection by mapping solvent exposure of residues from sampled states (clusters) to contacts of already existing structures of CaM/peptide complexes. Moreover, we gain knowledge about the characteristics behind different binding modes of the two lobes, as well as the difference between holo and apo binding modes.

Calmodulin system preparation and equilibration
For this project, we considered four different binding states: holo and apo calmodulin, as well as Ca 2+ bound only in the N-term (N-holo) and Ca 2+ bound only in the C-term (C-holo), Table 1. The simulations of N-holo, C-holo and holo CaM used structure 3CLN [35], while the apo simulations used structure 1LKJ [36]. N-holo was generated by removing Ca 2+ from C-CaM and C-holo by removing Ca 2+ from N-CaM from the holo structure. The systems were built using Charmm-gui [37,38], where the protein was solvated in a box of about 21000 water molecules. The systems were then ionized with 0.15 M NaCl. Charmm36 was chosen as force field [39], and TIP3P [40] as water model. The modified parameters of Charmm27 force field from Marinelli and Faraldo-Gomez [41] were used to parameterize Ca 2+ ions. 5000 steps of minimization were carried out, followed by a 50 ps NVT ensemble equilibration with strong harmonic restraints on the protein atoms. The box was then scaled, relaxing pressure with Berendsen barostat [42], while gradually releasing the position restraints for 350 ps.

Molecular dynamics simulation parameters
The MD parameters used in these simulations are extensively described elsewhere [43]. Calmodulin was simulated in an NPT ensemble with a 1 atm pressure and 2 fs time step. The short-ranged electrostatic interactions were modeled with a 1.2 nm cutoff where the switching function started at 1.0 nm, and the long-ranged ones with PME [44]. We used a Nose-Hoover thermostat [45], an isotropic Parinello-Rahman barostat [46], and constrained hydrogen bonds with LINCS [47].
The MD simulations were run at a constant temperature of 303.5 K. In addition to regular MD simulations, temperature enhanced simulations were performed; temperature replica exchange [48] (T-REMD) and replica exchange solute tempering [49][50][51] (REST). In T-REMD, parallel simulations of independent replicas are run at different temperatures. A random walk in temperature space is achieved by employing the Metropolis criterion periodically to accept coordinate exchanges between neighboring replicas. Conformations obtained at higher temperatures are propagated down to the lower temperatures through exchanges. Because barriers are more easily passed at higher temperatures, the efficiency is increased when the free energy landscape is rugged. However, the number of replicas needed to span a certain temperature interval scales with system size [52]. For this reason, T-REMD may not always be efficient. To alleviate this, REST only modifies the hamiltonian of the system for the solute (protein), and not the solvent [51]. However, the relative efficiencies of regular MD, T-REMD and REST depend on the ruggedness of the free energy landscape [43].
We used 25 replicas in both T-REMD and REST. The replicas of T-REMD spanned a temperature range of 299.13-326.09 K, while the REST replicas were simulated at temperatures between 300.0-545.0 K. The temperature ranges for REST and T-REMD were chosen using the "Temperature generator for REMD simulations" [53], considering only the protein for REST. Exchanges between neighboring replicas were attempted every 2 ps, where half of the replicas were involved in each attempt.
The REST simulations were performed with the Plumed 2.3b [54] plug-in patched with Gromacs version 5.1.2 [55], where the charge of the atoms in the hot region were scaled, as well as the interactions between the two regions and the proper dihedral angles [56]. Analysis was performed using the replica at the lowest temperature. and was carried out on the protein heavy atoms. The first four residues in the apo structure were removed, because those are missing in the 3CLN structure.
The apo CaM ensemble is more diffusive and generally well sampled by regular MD, while holo CaM free energy landscape is more hierarchical, and thus sampled more efficiently by temperature enhanced methods [43]. For this reason, more MD data is used in the apo analysis while more REST data is used for holo, Table 1. The three methods may sample different regions with varying efficiency, which is why data from all three simulations is used. In a first step, the protein was divided into quasi-rigid domains using SPECTRUS [57]. This procedure exploits fluctuations between residues to determine which parts of the protein move together. The clustering and post-processing described hereafter were done on each quasi-rigid domain.

Identifying states through spectral clustering
To further reduce dimensionality and complexity of the dataset while preserving the most important features, we used spectral clustering [58]. The advantage of spectral clustering compared to regular clustering techniques like k-means is three-fold. First, it is able to accurately assign points to non-convex clusters. Second, non-linear dimensionality reduction is intrinsic to the algorithm. For high-dimensional data, such as MD trajectories, non-linear dimensionality reduction improves clustering by circumventing the curse of dimensionality where the sparsity of the data increases with increased dimensionality [59]. Third, the number of clusters is the same as the number of dimensions onto which the points are projected. This feature becomes advantageous as the number of free parameters is reduced.
In spectral clustering, the data manifold is first approximated by a graph with adjacency, or similarity, matrix A. It contains the local relationships between points and is constructed given the matrix of distances d. The distances d ij are passed through a radial basis function, or Gaussian kernel, with scaling parameter σ, yielding the graph edge-weights where d ij is the dissimilarity between conformation x i and x j . The geodesic distance between two points is the distance between these points along the manifold, the shortest distance on the graph. The size of the scaling parameter, σ, influences the accuracy of geodesic distances, and should not be too large nor too small. A too large σ would yield short-cuts, thus causing non- The quasi-rigid domains of CaM were first identified. Then, interatomic inverse distance matrices were used to cluster the frames into states with spectral clustering. The states were analyzed by computing interhelical angles and secondary structure frequencies.
Finally, the solvent exposure of each state was mapped to contacts formed in CaM structures, followed by a classification of binding modes with spectral clustering. https://doi.org/10.1371/journal.pcbi.1006072.g003 Effect of Ca 2+ on the promiscuous target-protein binding of calmodulin convex clusters to be poorly defined. A too small σ, on the other hand, would result in a disconnected graph. Here, σ is the average nearest neighbor distance between configurations. The random walk matrix, related to the Laplacian [58], is then constructed The degree matrix, D, is diagonal with D ii being the degree of node i. The first k eigenvectors are computed and normalized per row to obtain points projected onto the k-sphere. These points are clustered into k clusters using k-means with centers projected onto the sphere. The choice of number of clusters is guided by the maximum eigengap, the difference between two consecutive eigenvalues. The representative structure of a cluster is chosen as the structure with smallest RMSD with respect to the other structures in the cluster. A cluster with all structures represents the conformational heterogeneity of one state.
Here, in practice, the dissimilarity between conformations, d ij , is measured as the distance between contact maps of inverse inter-atomic distances (iiad-cmap). This general metric is effective for all proteins and, unlike root-mean-square-deviation (RMSD), does not rely on structural alignment. The inverse distances make larger distances small so that far-away motions are cancelled out, without requiring a cutoff. The frame-to-frame distance matrix is compiled by computing the distances between each frame iiad-cmap.

Analysis of states
Each state, or cluster of frames, was analyzed to provide statistical information about its specific characteristics and molecular features. The features included interhelical angles, secondary structure and importance of states for target protein binding.
Extracting molecular features of states. A common approach to study calmodulin conformations is to compute its interhelical angles [2,24]. The interhelical angles in part define the exposed hydrophobic interface and thus selectivity. The angle, α, between two helices, u and v, was determined by their dot product, u Á v = kukkvk cos(α). We defined a helix vector as the principal eigenvector of the points corresponding to the heavy backbone atoms in sequence direction. The angles between each pair of helices A − D (N-CaM and first half of linker) and E − H (second half of linker and C-CaM) were calculated, following the helix definitions from Kuboniwa et. al [2]. The helix boundaries are shown in Table 2. We compared the angles found with the method used here to the results in [2,24], as well as with the experimentally obtained structures of protein-unbound holo and apo CaM, Table 3, and CaM bound to target proteins, Tables 4 and 5.
DSSP secondary structure assignment [72] frequencies were computed to detect the fraction of time each residue is involved in α-helices, β-sheets or coils. In order to easily spot differences between the different conformational states, as well as between the four binding states, the difference in secondary structure frequency was compared to a typical holo ensemble.
To be able to evaluate the fluctuations in the holo simulations, the typical holo ensemble was represented by one third of the T-REMD trajectory. A conformational state of CaM was characterized by the average solvent exposure of each residue. Solvent exposure of a residue was calculated as the number of water-oxygens in contact with any heavy atom of the residue. The cutoff was chosen to be 4.5 Å. This was not only done for each state, but also for the complete holo, apo, C-holo and N-holo ensembles, respectively. To characterize allosteric communication between the two lobes, we calculated the residue solvent exposures of C-holo and N-holo compared to holo and apo ensembles. To obtain the relative solvent exposure of a state, the cluster-specific solvent exposure was divided by the average solvent exposure of all states.
Distances and DSSP secondary structure assignment were calculated with MDtraj [73], and molecular visualization was done with VMD [74].
Kinetic analysis. We calculated the average time taken to transition from one state to another in the MD simulations. The kinetic rates are taken as the inverse of this average time.
Propensity of states to exhibit conformational selection. We studied the involvement of the different states (clusters) in binding various targets in a conformational selection mechanism by comparing them with CaM in complex with targets. This analysis required four steps: 1) defining the CaM residues in contact in the different available CaM-complex structures, 2) computing the relative solvent exposure of each residue for the different sampled states, 3) mapping these two results, and 4) classifying different binding modes. These steps are depicted in the bottom left part of Fig 3. A contact in CaM-complexes was defined as a CaM-residue heavy atom being within 4.5 Å of a target-protein residue atom. For an ensemble of structures, the residue pair were said to be in contact if they were within 4.5 Å for 70% of the structures. A contact vector, m (k) , was devised for each complex, k, marking CaM contact residues with ones and CaM non-contact residues with zeros, 1CLM [61] 1DMO [24] 1CLL [62] 1CFC [2] 1EXR [63] 1OOJ [64] 1OSA [65] 1PRW [66] 1UP5 [67] 3CLN [35] 4CLN [68] 4BW7 [69] 4BW8 [69] 3IFK [70] 1AK8 [71] https If conformational selection dominates the binding to a target protein, the CaM residues in contact to this protein should be exposed to solvent when CaM is unbound. Therefore, the relative solvent exposure, s ðiÞ j , of residue j in state i, was calculated for each residue. The matching score, or total solvent exposure, for a state i and complex k was computed as the sum over all residues in contact to the target protein, This was normalized over states to provide a distribution for each CaM-complex. To identify patterns of different classes of binding modes, these distributions were clustered. For this, spectral clustering was performed again, taking the distance matrix d to be the distance between the distributions. For each class, the mean distribution, with standard deviation, of total relative solvent exposure was computed. Moreover, the residue types (polar, apolar, charged) involved in the contacts were identified and the average distribution of residue types as well as the standard deviation were calculated.

Results and discussion
To characterize the ability of calmodulin to bind to calcium and target peptides through conformational selection, the protocol outlined in Fig 3 was used. The configurations of quasirigid domains in molecular trajectories were partitioned with spectral clustering and the obtained conformational states (hereby referred to as states) were analyzed.

Conformational selection aspects of Ca 2+ -binding in the two lobes
We sampled conformations of apo and holo CaM using MD, T-REMD and REST, Table 1 Table 3, while pink squares correspond to structures with CaM bound to a target peptide, Tables 4 and 5. The black circles are mean values of CaM interhelical angles inferred from NMR data [2,24], used for comparison. Furthermore, to allow a comparison between interhelical angles in the holo and apo ensemble, the sampled apo conformations are shown as a gray shadow in Fig 4 (holo) and the sampled holo conformations are shown as a gray shadow in Fig 5 (apo). A simple sanity check, S5-S10 Figs, showed that most states were sampled by all three simulation methods, with a few exceptions. Examples of such are states that are separated by a high free energy barrier and therefore not sampled by the regular MD simulations within these timescales, or states that could not be sampled by the replica exchange simulations because of the relatively short time scales simulated. Similar conclusions can be drawn from the kinetic analysis of MD trajectories (S11 Fig). A change in interhelical angles due to Ca 2+ -binding can be seen in Fig 4 (holo) and Fig 5  (apo), in agreement with previous work [2,24,26] (see S1 Text for more extensive description and details). We note also that the conformational ensemble generated by MD simulations is broader than the ones observed experimentally for both protein-bound and unbound CaM. In agreement with NMR studies of holo CaM [60], the protein-unbound CaM adopts the conformations of CaM bound to specific target proteins, which is seen as an overlap between the pink squares and the MD data (colored dots) in Fig 4, thus hinting at a possible conformational selection mechanism of target protein binding. Fig 6 displays the difference in secondary structure frequency for each residue compared to holo. Red dots denote the helical, black the strand and blue the coil content. Positive values indicate a gain in secondary structure element compared to holo, and negative ones a loss. The helical break around residue 74-82 [26][27][28] is seen to different extents in all Ca 2+ -binding states of CaM, Fig 6. This break occurs in the linker and yields a compact state with the two lobes in contact. This compact state resembles the binding mode where CaM is wrapped around the target protein and was suggested to be an intermediate conformation during target protein-and Ca 2+ -binding [69,75]. Furthermore, binding Ca 2+ rearranges the helices in the lobes and pins the beta sheets between residues 27/63 and 100/136 instead of 28/62 and 99/ 137, Fig 6. Apo CaM is more flexible, allowing the beta sheets to shift between 26-28/62-64 and 99-101/135-137, Fig 6. This highlights the rigidity of holo compared to apo CaM. Due to the flexibility of apo, a C-CaM state (Fig 5b, state 2) is found where the end of the fourth binding loop is involved in a beta sheet, deforming the binding loop, in agreement with [31]. We Effect of Ca 2+ on the promiscuous target-protein binding of calmodulin hypothesize that this state may inhibit Ca 2+ -binding. In this state, residues 129-131 are more prone to form a beta sheet with 99-101 and 135-137, S12 Fig. This extra sheet does not form in apo N-CaM because the coiled loop between helices C and D is slightly shorter than the one in apo C-CaM between helices G and H. The flexibility obtained by removing Ca 2+ from C-CaM allows these beta sheet formations to occur.
To investigate the possibility for conformational selection during the Ca 2+ -binding process, we searched for potential overlaps in interhelical angle space between apo and holo. We observe holo-like states in apo N-CaM, Fig 5a. (states 3, 5 and 8, plotted with colors lime, cyan and cerise), as well as C-CaM holo-like states that have been reported in previous work [31,76]  Effect of Ca 2+ on the promiscuous target-protein binding of calmodulin frequencies are plotted in S13 Fig, which show similar secondary structure frequencies as apo (Fig 6b), with a shift of beta sheet to residues 28/62. The C-CaM interhelical angles of holo and apo overlap more than the N-CaM interhelical angles do. The overlap in interhelical angles and shift of beta sheet in N-CaM, make these states likely intermediate states in Ca 2+ -binding, involved in conformational selective binding of Ca 2+ .
To assess whether Ca 2+ -binding allosterically modulates the conformational ensemble of the other lobe, we used the MD, T-REMD and REST datasets of C-holo (Ca 2+ in C-CaM) and N-holo (Ca 2+ in N-CaM), Table 1. Solvent exposure for each residue in N-holo and C-holo was compared to the apo (Fig 7a and 7c), and holo (Fig 7b and 7d) ensembles. This shows that the Ca 2+ -free lobe tends to transition to an apo-like conformational ensemble, while the Ca 2+ -bound lobe stays in the holo-like ensemble. Although there is a small allosteric influence between the lobes, as seen in Fig 6c and 6d, where N-CaM mostly lacks beta sheet in C-holo, the solvent exposure analysis indicates that the overall conformation of one lobe is mostly independent of the conformation of the other. These results indicate that binding of Ca 2+ to one of the lobes does not yield a significant population shift in the other lobe and thus that binding is likely not conformationally cooperative between the two lobes. Cooperative binding between sites and lobes has been investigated but the results obtained in different studies have been largely inconsistent, thus making it difficult to validate our results. This is because many different parameter sets in the models used to fit the experimental data perform equally well [77]. Effect of Ca 2+ on the promiscuous target-protein binding of calmodulin

Conformational selection aspects of target protein-binding
To investigate conformational selection aspects of calmodulin binding to target proteins, the relative solvent exposure of the different CaM residues in different states was calculated and mapped to the contacts in a set of CaM complexes, Tables 4 and 5 and Fig 3. The underlying idea is that in a conformational selection mechanism, water molecules around an exposed residue will be replaced by the target peptide. Therefore, the residues that are in contact with the target peptides should be found exposed to solvent in a state involved in a binding mechanism dominated by conformational selection. To then characterize different modes of binding, the distributions of total solvent exposure per state were clustered and divided into classes, as described in the Materials and Methods section.
Holo C-CaM complexes can be divided into five classes according to binding depth. The clustering of distributions of total relative solvent exposure of residues involved in binding holo C-CaM complexes (Table 4) yielded five subclasses. Through visual inspection, we aligned these along a depth gradient with deep, intermediate and shallow subclasses, Fig 8, S14 and S15 Figs.
Deep binding is characterized by a large hydrophobic target peptide residue buried in the CaM hydrophobic cleft. It is favored by states 3 and, especially, 6. These states expose hydrophobic residues from deep within the cleft simultaneously; PHE92, LEU105, ILE100, VAL121, VAL136 and PHE141. State 6 is characterized by parallel F and G helices, Fig 4b, and lacking beta sheet structure, S16 Fig. This state, which could be favorable for binding target proteins, is a high free energy state, only observed in the temperature enhanced simulations, S17 Fig. Binding is also likely when CaM is in the state 3 conformation, but binding to state 3 may require more subsequent rearrangements than binding to state 6. An example of deep binding is shown by the contact interface of CaM-CNG ligated ion channel (2M0K) where the a tryptophan is buried in the hydrophobic cleft, Fig 8c. Coincidentally, the residues that give rise to significant signals in relative solvent exposure, S18-S20 Figs and S1 Fig, are also highly conserved in eukaryotes through evolutionary time [78].
Intermediate depth binding specifically favors state 6 and 2, Fig 8. In intermediate depth binding, the residue pointing towards the cleft is smaller than in deep binding, as in the case of CaM-Kv7.2-Kv7.3 (5J03) binding interface with a leucine pointing towards the cleft , Fig 8d. In the superficial binding class, the residues in contact are roughly equally exposed to solvent in all states, Fig 8a and S18-S20 Figs. As seen in the distribution of residue types that are part of contacts, Fig 8b, less hydrophobic contacts are formed for more shallow binding modes. Moreover, the shallow binding modes include a larger fraction of charged residues, and do not favor any particular state since the residues needed for contact are always equally exposed to solvent. Thus, the long-ranged electrostatic interactions of the charged residues may initiate binding, yielding fast and promiscuous binding, while the CaM hydrophobic For deeper binding, a mix of states gives exposure to all necessary contact residues, namely states 1, 2, 4 and 6, Fig 9a. State 6 is slightly more favorable for complexes with deeper binding, associated with more hydrophobic contacts S21 Fig. However, since not one state exposes all necessary residues, the binding of the N-CaM could be dominated by induced fit. After binding to any state, the rest of the residues necessary for binding will be exposed to reduce free energy. In state 4, CaM binds its own helix A, S2 Fig. This state has been observed before with N-CaM forming salt bridges to the linker [79]. It is also represented by structure 1PRW [66] which overlaps state 4 in the interhelical angle space , Fig 4a. It may display a possible intermediate conformation during a flexible binding process in the N-term.
The hydrophobic cleft of N-CaM is not as open as in C-CaM. This is reason to believe that while C-CaM binding may be dominated by conformational selection, N-CaM binding may need a more flexible mechanism with intermediate states to facilitate deep binding. Also here, however, deep binding involves a large hydrophobic target protein residue buried in the CaM cleft, Fig 9a. As in C-CaM binding, the depth of binding in the N-CaM hydrophobic cleft determines which interacting residues and therefore which states are more favorable for binding. The contacts in deeper binding are formed through a combination of CaM residues ILE27, LEU32, VAL35, ILE63, PHE68 and MET71, S23-S25 Figs and S2 Fig. These contacts are here associated to a large hydrophobic residue from the target peptide being buried or facing the Effect of Ca 2+ on the promiscuous target-protein binding of calmodulin hydrophobic cleft. Furthermore, these CaM residues are also reported as highly conserved in eukaryotes. [78] In summary, states 1 and 2 expose residue PHE68 and MET71 which are required even in shallow binding. Therefore, state 1 and 2 are presumably favorable for initiating binding to both shallow and deeper binding modes. State 6 exposes residues in contact in slightly deeper binding and state 4 is probably an intermediate state visited in on the way towards deep binding.
Apo CaM target-protein binding is more shallow than holo CaM binding.  Table 5 and S27 Fig. These residues are more exposed to solvent in the states that favor binding to the analyzed target proteins, suggesting that they are important for C-CaM binding.
Interestingly, mutating CaM either at PHE89 and PHE141 to LEU causes arrhythmias and long QT syndrome [80][81][82]. Moreover the Ca 2+ affinity has been shown to decrease for the PHE141 to LEU mutation [83]. This residue is, in fact, more exposed in state 1 and 5 (S3 and S27 Figs), the two previously identified holo-like states, Fig 5, which indicates that this residue is important for both calcium as well as target protein binding. Effect of Ca 2+ on the promiscuous target-protein binding of calmodulin As in holo, apo C-CaM target protein binding involves a larger fraction of hydrophobic residues than apo N-CaM binding. When binding Na V 1.6 (3WFN), for example, a LEU is buried in the CaM hydrophobic cleft , Fig 10a. In the structures of apo N-CaM binding, however, we only observe superficial binding, Fig 10b and S28 Fig. An example is binding to Myosin VIIa (5WSU), where a VAL is facing the cleft, Fig 10b. The two classes from clustering instead distinguish whether polar contacts or more hydrophobic contacts are formed. The hydrophobic residues are well packed within N-CaM and a lack of hydrophobic cleft characterizes shallow binding.
Comparison of apo and holo target peptide binding. For apo CaM binding, the residues that bind to target proteins are in general equally exposed to solvent in all states, S27 and S29 Figs. Therefore, apo does not need to stay in distinct states for binding, but uses its configurational flexibility to adjust the binding pose after initial contacts are made. This flexibility that stems from the low free energy barriers and diffusivity of apo CaM [43], is possibly linked to apo not showing a diverse set of binding interfaces. It has previously been proposed that C-CaM in N-holo is semi-open [25], which would facilitate binding to target proteins, but later it was shown that the semi-open state exists in even pure apo C-CaM [2,24,31] and apo C-CaM complexes [8]. Here, the semi-open state is shown to exist in the unbound apo C-CaM ensemble, whereas the N-CaM ensemble is mostly closed. This is in line with the finding where the apo C-CaM interhelical angles overlap holo more than the interhelical angles of apo N-CaM, Fig 5. Despite the overlap of interhelical angles, and thus semi-openness of apo C-CaM, the true open state and deep binding with a large hydrophobic residue buried in the CaM hydrophobic pocket is an effect of Ca 2+ and thus only observed in holo CaM. Binding to the protein is then most likely facilitated by the holo C-CaM cluster 6, where the loss of beta sheet opens the cleft and exposes a larger hydrophobic pocket, allowing for deeper binding, S1 and S16 Figs and Fig  4. These well-defined states of holo allow for binding to target peptides in a mechanism dominated by conformational selection.

Conclusions
We used molecular dynamics and temperature enhanced MD with an agnostic spectral clustering scheme to search for conformational selection of CaM Ca 2+ and target protein binding. We found that the Ca 2+ -state of one lobe does not significantly influence the conformation of the other lobe, but binding Ca 2+ may occur through conformational selection facilitated by a transition state that is visited by both apo and holo CaM. It is observed here by overlapping conformations in interhelical angle space. In N-CaM, the transition between apo and holo likely involves a beta sheet residue shift.
The target protein binding modes of apo CaM are more shallow than those of holo CaM, and the charged interactions dominate due to the induced burial of the hydrophobic binding interface from Ca 2+ -depletion. The residues involved in binding are equally exposed in apo CaM states and thus all observed states are equally likely to initiate binding to the target protein.
The notion that not only the linker, but also the binding interface of C-CaM shows configurational flexibility has previously been proposed both through extensive MD simulations [31], but also in structural studies [2,24,25]. The conformation of C-CaM interface is suggested to vary more than the N-CaM interface [25]. Here, holo C-CaM shows distinct states exposing hydrophobic residues that are otherwise screened from water. Such states are absent in N-CaM, which also exhibits less binding heterogeneity than C-CaM in the CaM-complex structures. This is seen as the shallow class of C-CaM show more shallow binding than the shallow class of N-CaM, while the deep binding class of C-CaM shows deeper binding than the deep binding class of N-CaM. The distinct states of holo C-CaM with a clear mapping to bound structures indicate a tendency for C-CaM binding to be selective and dominated by conformational selection, while N-CaM, which lacks this clear mapping, likely binds through a more flexible mechanism involving intermediate states.
For general protein-ligand binding, strong and long-ranged interactions have been thought to favor binding dominated by induced fit, while short-ranged interactions would favor conformational selection. [84] Our results support and extend this idea to CaM by showing that weak hydrophobic interactions dominate deep binding modes which indeed tend to be dominated by conformational selection in C-CaM. In the flexible binding of N-CaM, on the other hand, hydrophobic interactions occur less frequently. We hypothesize that the long-ranged electrostatic interactions of the N-CaM charged residues may initiate fast binding while the hydrophobic pocket in C-CaM may account for selectivity. Previously published studies using NMR [23] proposed that C-CaM is selective while N-CaM binds afterwards through induced fit mechanism, or a coupled conformational selection mechanism initiated by C-CaM [60]. This is in line with the argument given here on C-CaM selectivity and N-CaM flexibility.
This study opens the way towards understanding the process and mechanisms behind calmodulin Ca 2+ -sensing. In a next step, the second aspect of binding, induced fit, may be studied. The full mechanism of specific target peptide binding could also be simulated starting from the conformational selection state of CaM identified here. (TIFF) S1 Data. Interhelical angles, secondary structure assignments, and data used for solvent exposure analysis of all the states in holo and apo C-CaM and N-CaM. (ZIP) S1 Text. Explanation of interhelical angles and comparison of sampled states and experimental structures. A discussion of derived rate constants is also provided. (PDF)