Experimental conditions or the presence of interacting components can lead to variations in the structural models of macromolecules. However, the role of these factors in conformational selection is often ignored by in silico methods that extract dynamic information from protein structural models. Structures of small peptides, considered building blocks for larger macromolecular structural models, can differ substantially in the context of a larger protein. This limitation is more evident when large multi-subunit macromolecular complexes are modeled using structures of the individual protein components. Here we report an analysis of variations in structural models of proteins with high sequence similarity. These models were analyzed for sequence features, for the role of scaffolding segments (including interacting proteins or affinity tags) and for the chemical components of the experimental conditions. Conformational features in these structural models could be rationalized by conformational selection events, perhaps induced by experimental conditions. This analysis was performed on a non-redundant dataset of protein structures from different SCOP classes. The sequence-conformation correlations that we note here suggest additional features that could be incorporated by in silico methods that extract dynamic information from protein structural models.
Citation: Srivastava SK, Gayathri S, Manjasetty BA, Gopal B (2012) Analysis of Conformational Variation in Macromolecular Structural Models. PLoS ONE 7(7): e39993. https://doi.org/10.1371/journal.pone.0039993
Editor: Peter Csermely, Semmelweis University, Hungary
Received: February 21, 2012; Accepted: May 30, 2012; Published: July 9, 2012
Copyright: © 2012 Srivastava et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: No current external funding sources for this study.
Competing interests: The authors have declared that no competing interests exist.
The substantial improvement in the methodology of protein structure determination is reflected by an exponential increase in the number of structures deposited in the Protein Data Bank (PDB). Functional annotation and mechanistic interpretation of several of these structural models, however, remain a significant hurdle. Information on protein dynamics and conformational variation is an important input for mechanistic interpretation. While this information is captured experimentally by Nuclear Magnetic Resonance (NMR) spectroscopy, structural models determined by X-ray crystallography must be subjected to computationally intensive analyses to extract dynamic information. In silico strategies to obtain dynamic information are time-consuming and have an inherent limitation: they do not explicitly incorporate experimental errors or artifacts induced by experimental conditions. While experimental errors can, in principle, be incorporated in computational simulations, doing so requires access to unprocessed experimental data that are not currently freely available. Experimental conditions, on the other hand, are available either with the structural coordinates or in the manuscripts that describe the structures in more detail. An examination of protein structural models together with their experimental conditions could therefore help de-convolute conformational selection induced during the structure determination process.
It is increasingly apparent that a single structural model of a protein is likely to be incomplete in its information content, given that it provides a single representation of several flexible segments and alternative conformations. It is thus imperative to de-convolute the dynamics and alternate conformations from a structural model to obtain a more functionally relevant model of a biological molecule. In silico strategies such as Molecular Dynamics (MD) simulations, from-CONstraints-to-COORDinates (CONCOORD) analysis or, more often, normal mode analysis are employed to extrapolate the dynamic motions of a protein from a single experimentally determined structural model. These techniques, however, do not explicitly incorporate features such as experimental conditions or the propensity of a protein stretch to adopt conformations other than the one modeled by the experimenter. The large number of structures in the Protein Data Bank suggests that a systematic analysis of these parameters could be a useful source of information for interpreting protein structures solved at high resolution. A reliable de-convolution of dynamic information that accounts for experimental artifacts could also aid structure-based functional annotation. Indeed, a protocol that incorporates dynamic information from small protein domains to predict structural variations in large macromolecular complexes could provide valuable mechanistic information. An essential requirement towards these goals is an estimate of the influence of experimental parameters on the selection of alternate conformations that were modeled in X-ray crystal structures or retained in an NMR-derived structural ensemble. In this study, we examine differences between structural models that share high sequence similarity to estimate the extent of context-dependent remodeling or conformational selection.
The dataset for this analysis comprised structural models derived by X-ray and NMR methods, encompassing five Structural Classification of Proteins (SCOP) classes. Multi-protein complexes, as well as structures of peptides determined both independently and as part of larger proteins, were included in this analysis. Structural variations within this dataset were examined for intrinsic (sequence-based) features as well as external (experimental) parameters. This analysis highlights structural differences and provides a dataset to test in silico methods that extract dynamic properties of proteins while explicitly incorporating the influence of experimental parameters on structural models.
A mechanistic interpretation of the function and regulation of a protein depends crucially on information about the dynamic motions and alternate conformations that its structure can adopt. An estimate of the extent of conformational variation in structural models of proteins that share high sequence similarity can provide vital inputs for incorporating alternate conformations into a given molecular model. These data, however, require additional information to distinguish inherent flexibility from structural variations that can be explained by experimental conditions. Experimental context in this case includes factors that influence conformation through interactions between polypeptide fragments, concentration-dependent and osmolyte-induced effects, and ligand interactions. A representative dataset of protein structural models was collated to examine the effect of experimental conditions on conformational selection.
Dataset of Proteins for Comparative Analysis
The dataset for this analysis includes high-resolution crystal structures and NMR structural ensembles of proteins determined in the free state (apo), in complex with ligands or as components of larger macromolecular complexes. A pictorial description of this dataset is shown in Figure 1. Multi-domain proteins and membrane and cell surface proteins were not included in this study, as there were no suitable NMR entries for multi-domain proteins and very few structures of membrane and cell surface proteins. Protein structures were retrieved from the PDB based on folds, superfamilies and families, which yielded a total of 1086 folds, 1777 superfamilies and 3464 families. Further pruning based on sequence and structural criteria resulted in 233 structures spread across five classes of proteins, viz., α, β, α+β, α/β and small proteins. A subset of 31 protein pairs that shared high sequence similarity but showed prominent differences in conformation was chosen for detailed analysis (Table 1, Table S1). Information on disordered proteins was obtained from the DISPROT database. From an initial set of 183 protein-protein and 82 protein-nucleic acid complexes, 90 protein complexes and 35 protein-nucleic acid complexes were selected for further analysis. We found 52 protein-protein complexes and 20 protein-nucleic acid complexes that showed substantial variation in their structures between the free form and the form in larger complexes or, in some cases, between different multi-protein complexes. Although peptides are not a true SCOP class, 110 peptide structures were also included to examine the influence of context on structure. Of these, 45 peptide structures had an equivalent stretch (sequence identity >80%) in a larger protein (Figure 1B). The final dataset of protein complexes and peptide structures that show conformational variation is listed in Tables 2 and 3.
(A) The initial dataset of proteins was compiled for a representative sampling of folds and families. After selecting protein-structural pairs based on experimental and sequence criteria, the dataset for analysis included 31 different protein pairs across five different structural classes. (B) Bar diagrams represent the protein-protein, protein-nucleic acid complexes and peptides used in this study. Dark blue bars in all the classes represent the initial selection from a set of 183 protein-protein complexes, 82 protein-nucleic acid complexes and 110 peptide structures. The final composition of this dataset (shown here in gray and light blue bars) is based on the sequence and structural criteria described in the methods section of this manuscript.
Variations between Solution and Crystal Structures
A comparison between crystal and NMR structures provides experimental evidence for conformational variation and sampling. In the all-α class, most, though not all, differences between the solution and crystal structures could be attributed to ligand binding. For example, the S100 protein has been structurally characterized in the Ca2+-free form (PDB: 1K9P), in the Ca2+-bound form (PDB: 1K96) and in solution (PDB: 1A03). In the crystal structure, the stretch proximal to the ligand binding site adopts a helical conformation, whereas it is unstructured in the NMR structure despite the presence of bound Ca2+. Another example of a conformational change induced by ligand binding is provided by the crystal (PDB: 1GU2) and solution (PDB: 1E8E) structures of oxidized cytochrome c, which reveal structural differences close to the heme binding pocket. These include the stretch I28–N36 (ITDGKIFFN), which adopts a helical conformation in the crystal structure but is unstructured in solution. The segments A48–T54 (ACASCHT) and G61–I70 (GKNIVTGKEI) adopt α-helical and β-sheet conformations in the crystal structure, as opposed to hydrogen-bonded turns in solution. These structural variations are highlighted in Figure 2A.
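The comparisons above amount to locating residue ranges whose secondary structure differs between two models of the same sequence. The Python sketch below is illustrative only: it assumes that per-residue DSSP codes have already been extracted for both models and aligned one-to-one, and the eight-to-three-state mapping and the `min_len` threshold are our own assumptions rather than parameters used in this study.

```python
# Sketch: find stretches whose secondary structure differs between two aligned models.
# Assumption: a common reduction of DSSP codes to three states
# (H, G, I -> helix; E, B -> strand; anything else, including turns and bends -> coil).
HELIX_CODES = set("HGI")
STRAND_CODES = set("EB")


def to_three_state(dssp: str) -> str:
    """Reduce a per-residue DSSP string to H (helix), E (strand) or C (coil)."""
    reduced = []
    for code in dssp:
        if code in HELIX_CODES:
            reduced.append("H")
        elif code in STRAND_CODES:
            reduced.append("E")
        else:
            reduced.append("C")
    return "".join(reduced)


def differing_segments(ss_a: str, ss_b: str, first_resnum: int = 1, min_len: int = 3):
    """Return (start, end, state_a, state_b) for runs of at least `min_len`
    consecutive residues whose three-state assignments differ.

    Both strings must describe the same residues in the same order;
    `first_resnum` is the residue number of the first character.
    """
    a, b = to_three_state(ss_a), to_three_state(ss_b)
    if len(a) != len(b):
        raise ValueError("secondary structure strings must be aligned (equal length)")
    segments = []
    run_start = None
    for i in range(len(a) + 1):
        # The extra iteration at i == len(a) closes a run that reaches the C-terminus.
        differs = i < len(a) and a[i] != b[i]
        if differs and run_start is None:
            run_start = i
        elif not differs and run_start is not None:
            if i - run_start >= min_len:
                segments.append(
                    (run_start + first_resnum, i - 1 + first_resnum, a[run_start:i], b[run_start:i])
                )
            run_start = None
    return segments


# Hypothetical assignments for a 14-residue stretch (not taken from any PDB entry).
crystal = "CCHHHHHCCEEEEC"
solution = "CCTTSCCCCCCEEC"
print(differing_segments(crystal, solution))  # [(3, 7, 'HHHHH', 'CCCCC')]
```

With the default `min_len=3`, the two-residue strand/coil mismatch at positions 10–11 is ignored; lowering `min_len` to 2 reports it as a second segment.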
(A) All α class (B) All β class (C) α+β class (D) α/β class (E) Small proteins. A comprehensive list of these parameters is compiled in Table 1.
Plastocyanins are a good example of structural differences in the β class of proteins. The X-ray (2GIM) and solution (1FA4) structures of Anabaena variabilis plastocyanin differ in their secondary structure content (Figure 2B). The β-strands are less well defined in solution than in the crystal structure, where they form extended β-strands. In addition, residues S52–S60 (SADLAKSLS) and E90–G96 (EPHRGAG) of A. variabilis plastocyanin, and the corresponding region of the Phormidium laminosum homologue (PDB: 2Q5B), are α-helical in the crystal structures but unstructured in solution.
Pilin structures (α+β class in SCOP) exemplify variations in this structural class. Structural descriptions include N. gonorrhoeae strain MS11 pilin, the truncated toxin-coregulated pilin from V. cholerae, P. aeruginosa strain K pilin and the ΔK122–4 pilin, which has also been examined by NMR. The ΔK122–4 crystal structure (PDB: 1QVE) exhibits a characteristic type IVa pilin fold, with the N-terminal α-helix (α1–C) packed onto a four-stranded antiparallel β-sheet. Although the relative positions of the core secondary structure elements are well conserved among the crystal structures, they differ considerably between the crystal and NMR (PDB: 1HPW) structures of ΔK122–4 pilin. Superposition of these structures shows that in the solution structure of ΔK122–4, the N-terminal α-helix A31–G55 (AQLSEAMTLASGLKTKVSDIFSQDG) is shifted by one turn and thus deflected away from the β-sheet. The C-terminal residues V78–A88 (VAKVTTGGTA) form a β-strand in the crystal structure, whereas they are unstructured in solution (Figure 2C).
ADP-ribosylation factors (ARF-1) belong to the α/β class of proteins. The structural comparison in this case was made using four structural models, including the GDP-bound structure of human ARF-1 (1HUR), the structure of rat ARF-1 (1RRF) and the solution structure of human ARF-1 (1U81). A comparison between the crystal and solution structures reveals several changes. The region P76–N84 (PLWRHYFQN) is helical in the solution NMR structure (1U81) but unstructured in the crystal structures. Other differences include regions M18–M22 (MRILM), V43–V53 (VTTIPTIGFNV) and T85–V92 (TQGLIFVV), which are β-strands in the crystal structures of these ARFs but are unstructured or adopt turns/bridges in solution. Similarly, R99–E113 (RVNEAREELMRMLAE) is a well-defined α-helical stretch in the crystal structure, whereas in solution this stretch is a mix of a hydrogen-bonded turn (R99–E102), a short helix (E102–L107) and another hydrogen-bonded turn (M108–E113; Figure 2D). Another prominent example is rubredoxin, where the major difference between the X-ray (PDB: 1BRF) and NMR (PDB: 1RWD) structures is the absence of β-strands in solution (Figure 2E).
Structural Variation Due to Conformational Restraints in a Larger Macromolecular Complex
Designing an experimental construct that allows a recombinant protein to be purified to homogeneity in large amounts is a critical step towards structure determination. Important variables in this step include the length of the recombinant protein and the choice of an affinity or solubilization tag. A particularly dramatic case of a change in the fold of a protein due to a change in sequence length is that of the human PRP-8 D4 domain, whose structure has a different fold from that determined for a shorter D4 construct (Figure 3A). In the case of multi-protein complexes, co-expression and co-purification of interacting proteins often provides a viable route towards structural characterization. Protein-protein interactions often involve conformational changes that make the complex more stable and tractable for crystallization. These conformational changes can also be context-dependent. An example is synaptobrevin, a member of the vesicle-associated membrane protein (VAMP) family and a component of the neuronal SNARE (soluble N-ethylmaleimide-sensitive factor attachment protein receptor) complex. Isolated synaptobrevin is largely unstructured in solution but forms a well-defined helix in the SNARE complex. The structure of synaptobrevin (residues 27–57) in complex with Clostridium botulinum neurotoxin type F (PDB: 3FII) shows a largely disordered segment with a short β-strand at the N-terminus and a short α-helix at the C-terminus, whereas the same segment is a helix in the neuronal synaptic fusion complex (PDB: 1SFC). A superposition of the two structures is shown in Figure 3B. A search for similar stretches in the PDB yielded several protein complexes in which this stretch is an ordered α-helix. For example, synaptobrevin in the complexin-SNARE complex (PDB: 1KIL) shows a well-defined α-helix similar to that in other SNARE complexes (PDB: 1N7S, 3HD7, 3IPD).
Recombinant proteins of different lengths (based on different expression constructs) can also differ in secondary structure composition. For example, in the catalytic domains of protein tyrosine phosphatases (PTPs), the inclusion of an additional stretch of ca. 45 residues substantially influences solubility and the propensity to crystallize. This stretch either adopts an α-helical conformation or is involved in dimerization. Context-dependent conformational changes are more common in protein-nucleic acid complexes (Figure 3C). Indeed, successful structure determination of protein-nucleic acid complexes is often only possible in the presence of the interacting components (Table 2).
Structural differences in (A) the human splicing protein Prp-8 (full-length and N-terminal deletion variants). These structures illustrate sequence length-dependent structural changes. (B) and (C) depict structural changes in protein-protein and protein-nucleic acid complexes.
Peptide Structures Exemplify Conformational Selection
Structural differences in peptide structures have been extensively examined in the case of amyloid peptides and chameleon sequences. For instance, the NMR structure of an eleven-residue peptide from the amyloid β A4 protein (PDB: 1QWP) adopts an α-helical conformation. The same sequence, however, variously adopts β-strand conformations (PDB: 3MOQ, 2BEG, 2OTK), α-helical segments (PDB: 1Z0Q, 1IYT, 1BA4, 1AML) or coiled-coil conformations (PDB: 1HZ3) as part of a larger protein sequence (Figure 4A; Figure S1). Another representative example is the NMR structure of a peptide from the C2 domain of Factor VIII (PDB: 1CFG), which is α-helical in isolation. The same sequence in the context of the entire C2 domain of Factor VIII (PDB: 3HNB, 3HNY, 3HOB, 1D7P, 3CDZ, 1IQD) adopts a β-strand conformation (Figure 4B). It is relevant to note in this context that the secondary structure prediction for this peptide (using PSIPRED) is 22% β-strand and 63% α-helix.
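Percentages such as the 22% β-strand and 63% α-helix figures quoted above are simple residue counts over a three-state prediction. A minimal Python sketch, assuming the prediction is available as a string of H, E and C characters (one per residue); the example string is hypothetical and does not reproduce the Factor VIII peptide.

```python
def ss_composition(ss: str) -> dict:
    """Percentage of residues predicted as helix (H), strand (E) and coil (C)."""
    if not ss:
        raise ValueError("empty secondary structure string")
    unknown = set(ss) - set("HEC")
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    n = len(ss)
    # Round to one decimal place, as percentages are usually reported.
    return {state: round(100.0 * ss.count(state) / n, 1) for state in "HEC"}


# Hypothetical 10-residue prediction.
print(ss_composition("HHHHHHEECC"))  # {'H': 60.0, 'E': 20.0, 'C': 20.0}
```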
Limitations of Temperature Factor and CONCOORD Simulations to Examine Conformational Variation
High B-factors, classical indicators of conformational variation or flexibility, are often ambiguous due to experimental limitations. A case in point is synaptobrevin, which is part of two different complexes, one with botulinum neurotoxin (PDB: 3FII) and the other with SNARE complex proteins (PDB: 1SFC). In this case, the unstructured segment (PDB: 3FII) showed slightly lower B-factors than the structured segment (PDB: 1SFC). We stress, however, that the vast majority of segments that show conformational variability in this dataset can be clearly flagged by their high B-factors relative to the rest of the protein. In these cases, alternate conformations are also easily identifiable by in silico methods. For example, in the Prevent-host-death (Phd) protein, the region 50–73 forms an α-helix in complex with the Death-on-curing (Doc) protein (PDB: 3K33) but remains unstructured in isolation (PDB: 3HRY). The temperature factors for this region show a marked increase in 3HRY, whereas in 3K33, where the region is structured, its B-factors are below the average value for the protein. Consistent with these experimental data, this stretch in 3HRY shows high RMS fluctuations in a CONCOORD analysis that correlate well with changes in secondary structure. The Dictionary of Secondary Structure of Proteins (DSSP) output for this stretch in 3HRY shows a largely turn-dominated profile interspersed with 3₁₀-helices, bends and α-helices at several points during the simulation (Figure 5).
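Flagging a stretch whose B-factors are high relative to the rest of the protein can be sketched as a comparison of normalized (Z-score) B-factors. This is an illustration under stated assumptions, not the procedure used in this study: it assumes one B-factor per residue (for example the Cα value), and both the Z-score normalization and the threshold of 1.0 are our own choices.

```python
from statistics import mean, pstdev


def normalized_b_factors(b_factors):
    """Convert per-residue B-factors to Z-scores relative to the whole chain."""
    mu = mean(b_factors)
    sigma = pstdev(b_factors)
    if sigma == 0:
        return [0.0 for _ in b_factors]
    return [(b - mu) / sigma for b in b_factors]


def flag_stretch(b_factors, start, end, threshold=1.0):
    """Mean Z-score of residues start..end (0-based, inclusive) and whether it exceeds
    the (hypothetical) threshold."""
    z = normalized_b_factors(b_factors)
    stretch_mean = mean(z[start:end + 1])
    return stretch_mean, stretch_mean > threshold


# Hypothetical chain: eight well-ordered residues followed by a mobile two-residue tail.
b = [20.0] * 8 + [60.0] * 2
print(flag_stretch(b, 8, 9))  # (2.0, True)
print(flag_stretch(b, 0, 3))  # (-0.5, False)
```

Normalizing against the chain, rather than using raw values, makes stretches comparable across structures refined at different resolutions; as the synaptobrevin case shows, such a flag is a heuristic and can miss context-dependent disorder.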
CONCOORD and temperature factor analysis of Prevent host death protein (Phd: 3HRY) that shows a disordered-to-ordered conformational transition upon forming a complex with the Death on curing protein (Phd-Doc complex: 3K33). The grey bar represents the region in the Phd protein that undergoes structural change upon forming the Phd-Doc complex.
Comparison Between the Secondary Structure Propensity and Conformational Variations
Secondary structure propensity is reflected in several cases of conformational differences between solution and crystal structures. For example, in the crystal structure (PDB: 1NZN) of the cytosolic domain of human mitochondrial fission protein Fis1, the region E5–S13 (EAVLNELVSVED) is α-helical, whereas it is unstructured in solution (PDB: 1PC2). PSIPRED predicts an α-helix for this stretch. These results from the comparative analysis of X-ray and NMR pairs are summarized in Table 1, and a comprehensive list of root-mean-square deviations (RMSD) for this dataset is compiled in Table S2. This aspect of conformational selection is also seen in multi-protein complexes. In the synaptosomal-associated protein in complex with botulinum neurotoxin BoNT/A (PDB: 1XTG), the region M167–G204 is unstructured. In the truncated neuronal SNARE complex (PDB: 1N7S), however, this stretch is helical, consistent with the secondary structure prediction. A summary of these observations, along with the output of DISOPRED predictions, is compiled in Table 2.
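A comparison between a PSIPRED prediction and the observed conformation of a stretch can be summarized as the fraction of residues whose three-state assignments agree (a Q3-style score), together with the positions that disagree. A minimal sketch, assuming both strings use H/E/C codes and are aligned residue for residue; the example strings are hypothetical.

```python
def three_state_agreement(predicted: str, observed: str) -> float:
    """Fraction of aligned positions where predicted and observed states agree."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


def disagreeing_positions(predicted: str, observed: str, first_resnum: int = 1):
    """Residue numbers at which the prediction and the observed structure differ."""
    return [i + first_resnum for i, (p, o) in enumerate(zip(predicted, observed)) if p != o]


# Hypothetical stretch predicted as helix but observed partly as coil.
pred = "HHHHHHHH"
obs = "CCHHHHCC"
print(three_state_agreement(pred, obs))                  # 0.5
print(disagreeing_positions(pred, obs, first_resnum=5))  # [5, 6, 11, 12]
```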
Effect of Experimental Conditions on Conformational Differences
The composition of a crystallization condition can influence the secondary structure of a protein and hence facilitate conformational selection. This analysis is compiled in Tables S3 and S4. The compilation in Table S3 suggests that polyethylene glycols (PEGs, in the molecular weight range 200–4000) were used in the crystallization of ca. 80% of the proteins in this dataset, while a minority (ca. 10%) were crystallized with salts such as ammonium sulphate. PEGs serve to aggregate protein molecules, often inducing secondary structural features and thus increasing the chance of crystallization. This observation may rationalize the finding that, in the dataset of structural pairs (X-ray and NMR; Table S3), most crystal structures showed more secondary structural elements than the corresponding solution structures. An ideal comparison would involve a pair of structural models (X-ray/NMR) determined under identical conditions, but such pairs are difficult to obtain because of divergent experimental requirements: NMR requires monodisperse solution behavior of the protein sample, whereas crystallization requires conditions that promote systematic aggregation. In multi-protein complexes, conformational selection is also facilitated by crystallization agents. For example, the crystallization condition of the Prevent host death protein (3HRY), in which the stretch 50–73 is unstructured, contains ethylene glycol and PEG 8000 as precipitants. Ethylene glycol is known to decrease α-helicity, and its interaction with proteins is enhanced in the presence of high molecular weight PEG. Hydrophobic interactions are known to increase at high salt concentrations. Such interactions could have facilitated the folding of the stretch L630–E710 in DNA topoisomerase 2 (PDB: 2RGR), whose crystallization condition has a much higher salt concentration than that of the structure without bound DNA (PDB: 1BGW).
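Tallies such as the fraction of conditions containing PEG in the 200–4000 range can be obtained by scanning free-text crystallization records. The regular expressions below are a rough, hypothetical sketch: real condition strings (for example "PEG MME 2000" or "PEG 3K") use many spellings that these patterns do not catch, and the example conditions are invented for illustration.

```python
import re

# Matches e.g. "PEG 8000", "PEG-3350", "peg4000"; captures the molecular weight.
PEG_PATTERN = re.compile(r"\bPEG[\s-]*(\d+)", re.IGNORECASE)
# Matches both "ammonium sulphate" and "ammonium sulfate".
AMMONIUM_SULPHATE_PATTERN = re.compile(r"ammonium\s+sul(?:ph|f)ate", re.IGNORECASE)


def classify_condition(text: str) -> dict:
    """Flag PEG (and whether any PEG falls in the 200-4000 range) and ammonium sulphate."""
    peg_weights = [int(w) for w in PEG_PATTERN.findall(text)]
    return {
        "peg": bool(peg_weights),
        "peg_200_4000": any(200 <= w <= 4000 for w in peg_weights),
        "ammonium_sulphate": bool(AMMONIUM_SULPHATE_PATTERN.search(text)),
    }


def fraction_with(conditions, key):
    """Fraction of condition strings for which `key` is flagged."""
    flags = [classify_condition(c)[key] for c in conditions]
    return sum(flags) / len(flags)


# Invented example conditions.
conditions = [
    "20% PEG 8000, 10% ethylene glycol, 0.1 M HEPES pH 7.5",
    "2.0 M ammonium sulfate, 0.1 M Tris pH 8.5",
    "25% PEG-3350, 0.2 M NaCl",
]
print(fraction_with(conditions, "peg"))  # 0.6666666666666666
```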
Perhaps coincidentally, the reported denaturation of β-sheets at low pH also correlates with the structure of the T-cell surface glycoprotein CD4 (PDB: 1CDJ, 1G9M), which shows well-defined β-strands compared with its structure in complex with two other proteins, where it is unstructured. Representative cases of conformational changes induced by crystal packing effects are illustrated in Figure S1. It is, however, difficult to correlate crystallization conditions, or the high protein concentration in an NMR experiment, with the packing in a protein structure. This analysis is summarized in Table S5.
The packing fraction of proteins varies in the range of 0.66 to 0.84, with an average packing density of about 0.75. A comparative analysis of packing density and cavities in matched NMR and crystal structures from all classes of proteins was performed using Voronoia, with the grid level for all input PDB files set to 0.2. This analysis, however, did not yield new information, apart from confirming that NMR structures tend to have a slightly higher packing density than crystal structures.
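Using the definition given in the methods, PD = V_vdW / (V_vdW + V_SE), per-atom values and a simple protein-level average can be sketched as follows. This is not Voronoia itself: the atomic volumes are assumed to be precomputed, and averaging per-atom values is our assumption about how a single protein-level number might be obtained.

```python
from statistics import mean


def packing_density(v_vdw: float, v_se: float) -> float:
    """PD = V_vdW / (V_vdW + V_SE) for one atom (both volumes in the same units)."""
    total = v_vdw + v_se
    if total <= 0:
        raise ValueError("volumes must sum to a positive value")
    return v_vdw / total


def mean_packing_density(atom_volumes):
    """Average per-atom packing density over (V_vdW, V_SE) pairs."""
    return mean(packing_density(v, s) for v, s in atom_volumes)


# Hypothetical volumes for two atoms.
print(mean_packing_density([(3.0, 1.0), (1.0, 1.0)]))  # 0.625
```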
Conformational changes in proteins often provide the first step to rationalize a functional role or to build a mechanistic hypothesis for a biological observation. Deducing conformational variations is thus an important step in functional annotation. This information is also crucial for structural models that form the basis for in silico modeling of homologous proteins or that serve as fragments for de novo structure prediction. An understated feature of currently available structural models is that they implicitly incorporate the effects of experimental conditions, of limitations inherent to the structure determination method and data, and of the length of the recombinant protein construct. In an extreme case, these limitations yield alternate structural models for an identical protein sequence. This was noted most recently for the human PRP-8 D4 domain, whose structure has a different fold from that determined for a shorter D4 construct (Figure 3A). In this study, we examined representative structural models in the PDB for evidence of conformational selection or context-dependent modeling. The dataset for this analysis was spread across different structural families and multi-component (protein-protein and protein-nucleic acid) complexes. This diverse set of protein structures was evaluated for sequence features (secondary structure propensity, disorder) that could suggest alternate conformations. In particular, we examined a skewed distribution of highly fluctuating residues (G, A, S, P, D) over weakly fluctuating residues (I, L, M, Y, F, W, H) in irregular structural elements (loops), chameleon sequences and intrinsically disordered proteins. The next step involved an examination of context-dependent structural variations that could be ascribed to experimental conditions, packing or the induction of secondary structure by binding to cognate partners. The results of this analysis are compiled in Figure 6 and Figure S1.
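The residue-composition criterion described above can be expressed as counts of highly fluctuating (G, A, S, P, D) and weakly fluctuating (I, L, M, Y, F, W, H) residues in a segment. A minimal sketch; using their ratio as a single summary number is our assumption rather than a measure defined in this study. The example segments are stretches quoted in the Results.

```python
HIGHLY_FLUCTUATING = set("GASPD")
WEAKLY_FLUCTUATING = set("ILMYFWH")


def fluctuation_profile(segment: str):
    """Return (n_high, n_low, ratio) for a one-letter amino acid segment.

    The ratio is infinite when the segment contains no weakly fluctuating residues.
    """
    seq = segment.upper()
    n_high = sum(aa in HIGHLY_FLUCTUATING for aa in seq)
    n_low = sum(aa in WEAKLY_FLUCTUATING for aa in seq)
    ratio = n_high / n_low if n_low else float("inf")
    return n_high, n_low, ratio


print(fluctuation_profile("SADLAKSLS"))  # (6, 2, 3.0)  plastocyanin S52-S60
print(fluctuation_profile("ITDGKIFFN"))  # (2, 4, 0.5)  cytochrome c I28-N36
```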
This analysis suggests that methods to de-convolute dynamic information are better served by incorporating both sequence features (for example, disorder propensity, ambivalent secondary structures and chameleonic sequences) and experimental conditions that nucleate or aid conformational selection.
These data are based on information presented in Tables 1–3. Abbreviations: PSIPRED score, differences between predicted and observed secondary structure; Disorder promoting residues and Chameleon sequences, classification based on amino acid composition; Salt, pH and PEG, effects of ionic strength, pH and high concentrations of polyethylene glycol; Packing induced and Technique/Resolution, differences between solution and crystal structural models.
Static structural models, such as those obtained from single-crystal X-ray diffraction, incorporate dynamic information at multiple levels. B-factors and ligand-induced displacements provide insight into potential conformational changes and conformational sampling. So-called consensus structures, which involve different levels of structural overlap among multiple crystal structures, have been proposed as a route to obtain dynamic information that is otherwise not evident from single crystal structures. An alternative approach uses diffuse scattering, which originates from fluctuations about the average electron density and appears as a background on an X-ray film. This analysis, however, requires ultra-high-resolution data, as higher-order scattering makes a significant contribution at high resolution. Furthermore, these studies require robust scaling between the vibrational density of states to compare experimental and theoretical temperature factors. The dataset used in this manuscript was compiled with the aim of including protein structural models determined by different experimental methods, and it does not contain crystal structures at the resolution required to analyze diffuse scattering. To examine whether potential conformational variants could be deduced from a given crystal structure, we performed an analysis using CONCOORD. A significant number of outliers, however, suggests that both normal mode and CONCOORD analyses, the preferred routes to examine structural variations in the absence of detailed MD simulations, are inadequate (Figure 5). Do segments that show conformational differences share characteristics with the so-called chameleon sequences? The sequence analyses presented in Table 3 broadly support that perspective.
The sequence composition also suggests more scope for residue fluctuations, supporting the view that structural models represent conformational selection influenced by experimental conditions.
Taken together, this analysis suggests that experimental conditions substantially influence conformational selection. The experimentally determined structural model, which is the template for in silico methods that derive dynamic information, can thus bias interpretations of conformational variation and dynamics. This study presents a case for a more comprehensive inclusion of physico-chemical parameters associated with experimental conditions in the interpretation of protein structural data. This analysis also emphasizes the need to incorporate information on chameleon sequences in protein structural models when inferring the dynamic properties of proteins.
Dataset of Structures Used in this Analysis
The initial compilation of protein structures was based on the SCOP database (version 1.73). Upon identification of candidate structural models, an advanced search of the PDB was performed to obtain the corresponding protein structure determined either in solution by NMR or as part of a larger macromolecular complex. The following criteria were used to obtain the dataset for this analysis: (i) the resolution cut-off for X-ray crystal structures was set at 3.00 Å (3.9 Å for complexes); and (ii) only structures with a minimum overall sequence identity of 30% in a pairwise alignment were selected. The EMBOSS Align program was used for the pairwise alignments, and PyMOL was used for superposition of the structure pairs. The dataset of protein structural pairs had a total of 31 pairs of structures belonging to five SCOP classes. The dataset of disordered proteins was collated from DISPROT, and homologues of these disordered proteins for which PDB files were available were compiled from the PDB. Peptide structures were obtained from the PRF database within the DBGET integrated database retrieval system, with the peptide length limited to 10–40 amino acids. A total of 110 peptide structures that contained only naturally occurring amino acids were chosen for the study. Based on the availability of comparable sequences within large protein structures, a dataset of 45 peptide structures was compiled.
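The two selection criteria can be sketched as a simple filter. The record layout, field names and identifiers below are hypothetical. The identity function counts identical, non-gap aligned columns over the full alignment length (gaps included), which is our understanding of how EMBOSS reports identity; this should be checked against actual EMBOSS output before relying on it.

```python
from dataclasses import dataclass


@dataclass
class StructurePair:
    """Hypothetical record for a candidate X-ray structure and its NMR or complex counterpart."""
    xray_id: str
    partner_id: str
    xray_resolution: float  # in angstroms
    in_complex: bool        # True if the X-ray structure is part of a macromolecular complex
    aligned_a: str          # pairwise alignment, gaps written as "-"
    aligned_b: str


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap columns divided by alignment length (gaps included), in percent."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


def passes_criteria(pair: StructurePair, max_res=3.0, max_res_complex=3.9, min_identity=30.0) -> bool:
    """Apply the resolution cut-off (3.0 or 3.9 angstroms) and the 30% identity threshold."""
    limit = max_res_complex if pair.in_complex else max_res
    identity = percent_identity(pair.aligned_a, pair.aligned_b)
    return pair.xray_resolution <= limit and identity >= min_identity


example = StructurePair("hyp_xray", "hyp_nmr", 2.5, False, "ACDE-F", "ACDEGF")
print(round(percent_identity(example.aligned_a, example.aligned_b), 1))  # 83.3
print(passes_criteria(example))  # True
```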
RMSD Calculation, Temperature Factor and Normal Mode Analysis
The root mean square deviation (RMSD) was calculated between one X-ray crystal structure and the average structure of the NMR ensemble using LSQMAN. The average RMSD was taken for further analysis as the deviation between the two representative structures. The ensemble average for the NMR structure was calculated using MOLMOL. B-factor analysis was also performed on all the X-ray structures in the dataset presented in this work. Packing densities and cavities of the protein molecules for each structure in the dataset were calculated using Voronoia. In this method, packing density is defined as PD = V_vdW / (V_vdW + V_SE), where V_vdW is the atomic volume inside the atoms' van der Waals radii and V_SE is the remaining solvent-excluded volume. Water molecules were removed from the coordinate files, and only monomers of each structure were used to calculate the packing parameters; for solution NMR structures, the averaged structure was used. A grid level of 0.2 was assigned for calculating the packing densities and cavities of each structure.
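The RMSD between a crystal structure and the averaged NMR ensemble can be sketched in plain Python. This omits the least-squares superposition that LSQMAN performs: it assumes all models have already been superposed onto a common frame and matched atom for atom (for example, Cα atoms only), so it illustrates only the averaging and deviation steps.

```python
from math import sqrt
from typing import List, Tuple

Coord = Tuple[float, float, float]


def ensemble_average(models: List[List[Coord]]) -> List[Coord]:
    """Per-atom mean coordinates over pre-superposed NMR models."""
    n_models = len(models)
    n_atoms = len(models[0])
    if any(len(m) != n_atoms for m in models):
        raise ValueError("all models must contain the same atoms")
    return [
        tuple(sum(m[i][k] for m in models) / n_models for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a: List[Coord], coords_b: List[Coord]) -> float:
    """Root mean square deviation between two matched, pre-superposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(coords_a, coords_b) for k in range(3))
    return sqrt(sq / len(coords_a))


# Hypothetical two-atom example: two NMR models and one crystal structure.
nmr_models = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
crystal = [(1.0, 0.0, 0.0), (2.0, 0.0, 2.0)]
average = ensemble_average(nmr_models)  # [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(round(rmsd(crystal, average), 3))  # 1.414
```

Averaged coordinates can have distorted local geometry when the ensemble is poorly converged, so the averaged structure is best treated as a reference point for deviations rather than as a physical model.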
Analysis of Conformational Dynamics
In addition to the crystal structures, we used the CONCOORD (from CONstraints to COORDinates) tool to predict and analyze the likely motions of segments and motifs in the proteins of our dataset. All simulations were performed for 1000 ps with default parameters to generate 1000 conformations. Fluctuations in the regions that differ between the structural models were analyzed over the course of each simulation using per-residue root mean square fluctuation (RMSF) plots. Changes in secondary structure were analyzed using DSSP.
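Per-residue RMSF over an ensemble such as the CONCOORD conformations is $\mathrm{RMSF}_i = \sqrt{\langle \lVert \mathbf{x}_i - \langle \mathbf{x}_i \rangle \rVert^2 \rangle}$, averaged over conformations. The minimal sketch below assumes the conformations have already been superposed on a common frame and reduced to one atom per residue (for example, C-alpha); reading CONCOORD output files is left out, and both function names are hypothetical.

```python
import numpy as np


def rmsf(ensemble: np.ndarray) -> np.ndarray:
    """Per-position root mean square fluctuation.

    ensemble: array of shape (n_conformations, n_positions, 3) holding
    conformations already superposed on a common frame.
    Returns an array of length n_positions.
    """
    x = np.asarray(ensemble, dtype=float)
    if x.ndim != 3 or x.shape[2] != 3:
        raise ValueError("expected shape (n_conformations, n_positions, 3)")
    mean = x.mean(axis=0)                   # average position of each atom
    sq_dev = ((x - mean) ** 2).sum(axis=2)  # squared displacement per conformation
    return np.sqrt(sq_dev.mean(axis=0))     # average over conformations, then root


def flexible_positions(values, residue_ids, cutoff):
    """Residue numbers whose RMSF exceeds `cutoff` (hypothetical helper)."""
    return [r for r, v in zip(residue_ids, values) if v > cutoff]
```

The cutoff passed to `flexible_positions` is a free choice for exploration; the text does not specify a threshold, and the RMSF plots themselves were inspected directly.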
Sequence Analysis of the Regions of Conformational Change
Peptide segments that show conformational differences between X-ray and NMR structures, or between isolated proteins and protein complexes, were used as templates to search for similar sequences using BLAST (Basic Local Alignment Search Tool). The sequence identity cut-off relative to the template segment was set at 80%. The secondary structure propensities of the protein sequences in this dataset were determined using PSIPRED. For disordered proteins, sequence analysis was performed using both PSIPRED and DISOPRED.
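The 80% identity cut-off can be applied to BLAST+ tabular output (`-outfmt 6`, default 12 columns) with a short parser like the one sketched here. The record and function names are hypothetical. Note that BLAST's `pident` column refers to the aligned region only, so a separate coverage check against the template length may be needed; the text does not state whether one was applied.

```python
from typing import Iterable, List, NamedTuple


class BlastHit(NamedTuple):
    query: str     # template segment identifier
    subject: str   # matched sequence identifier
    pident: float  # percent identity over the aligned region
    length: int    # alignment length
    evalue: float


def parse_outfmt6(lines: Iterable[str]) -> List[BlastHit]:
    """Parse BLAST+ tabular output produced with -outfmt 6 (default columns)."""
    hits: List[BlastHit] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        f = line.split("\t")
        # Default column order: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hits.append(BlastHit(f[0], f[1], float(f[2]), int(f[3]), float(f[10])))
    return hits


def filter_by_identity(hits: List[BlastHit], min_identity: float = 80.0) -> List[BlastHit]:
    """Keep hits at or above the sequence identity cut-off."""
    return [h for h in hits if h.pident >= min_identity]
```

Separating parsing from filtering keeps the cut-off a single parameter, which makes it straightforward to test how sensitive the retrieved set is to the 80% threshold.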
Representative examples of structural variations in protein models that can be rationalized by oligomerization or crystal packing. (A) Human mitochondrial Fis1: 1NZN/1PC2; (B) Interleukin 8: 3IL8/1IKM; (C) Pancreatic spasmolytic peptide: 1PSP/1PCP; (D) Sterol carrier protein-2: 1C44/1QND; (E) The allergen PHL P2: 1WHO/1BMW.
Comparison between X-ray and NMR structures in different classes of proteins.
Comparison of root mean square deviations (r.m.s.d.) between the NMR ensemble and crystal structures.
Analysis of experimental conditions for X-ray and NMR pairs.
Analysis of experimental conditions in multi-protein complexes.
Conceived and designed the experiments: SKS BG. Performed the experiments: SKS SG BAM. Analyzed the data: SG BG. Contributed reagents/materials/analysis tools: BG. Wrote the paper: BG.
- 1. Berman H, Henrick K, Nakamura H, Markley JL (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res 35: D301–D303.
- 2. Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, et al. (2000) SCOP: a structural classification of proteins database. Nucleic Acids Res 28: 257–259.
- 3. Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, et al. (2007) DisProt: the Database of Disordered Proteins. Nucleic Acids Res 35: D786–793.
- 4. Otterbein LR, Kordowska J, Witte-Hoffmann C, Wang CL, Dominguez R (2002) Crystal structures of S100A6 in the Ca(2+)-free and Ca(2+)-bound states: the calcium sensor mechanism of S100 proteins revealed at atomic resolution. Structure 10: 557–567.
- 5. Sastry M, Ketchem RR, Crescenzi O, Weber C, Lubienski MJ, et al. (1998) The three-dimensional structure of Ca(2+)-bound calcyclin: implications for Ca(2+)-signal transduction by S100 proteins. Structure 6: 223–231.
- 6. Enguita FJ, Pohl E, Turner DL, Santos H, Carrondo MA (2006) Structural evidence for a proton transfer pathway coupled with haem reduction of cytochrome c″ from Methylophilus methylotrophus. J Biol Inorg Chem 11: 189–196.
- 7. Brennan L, Turner DL, Fareleira P, Santos H (2001) Solution structure of Methylophilus methylotrophus cytochrome c: insights into the structural basis of haem-ligand detachment. J Mol Biol 308: 353–365.
- 8. Schmidt L, Christensen HE, Harris P (2006) Structure of plastocyanin from the cyanobacterium Anabaena variabilis. Acta Crystallogr D Biol Crystallogr 62: 1022–1029.
- 9. Ma L, Soerensen GO, Ulstrup J, Led JJ (2000) Elucidation of the paramagnetic R1 relaxation of heteronuclei and protons in Cu(II) plastocyanin from Anabaena variabilis. J Am Chem Soc 122: 9473–9485.
- 10. Parge HE, Forest KT, Hickey MJ, Christensen DA, Getzoff ED, et al. (1995) Structure of the fibre-forming protein pilin at 2.6 A resolution. Nature 378: 32–38.
- 11. Craig L, Taylor RK, Pique ME, Adair BD, Arvai AS, et al. (2003) Type IV pilin structure and assembly: X-ray and EM analyses of Vibrio cholerae toxin-coregulated pilus and Pseudomonas aeruginosa PAK pilin. Mol Cell 11: 1139–1150.
- 12. Audette GF, Irvin RT, Hazes B (2004) Crystallographic analysis of the Pseudomonas aeruginosa strain K122–4 monomeric pilin reveals a conserved receptor-binding architecture. Biochemistry 43: 11427–11435.
- 13. Keizer DW, Slupsky CM, Kalisiak M, Campbell AP, Crump MP, et al. (2001) Structure of a pilin monomer from Pseudomonas aeruginosa: implications for the assembly of pili. J Biol Chem 276: 24186–24193.
- 14. Seidel RD, Amor JC, Kahn RA, Prestegard JH (2004) Conformational changes in human Arf1 on nucleotide exchange and deletion of membrane-binding elements. J Biol Chem 279: 48307–48318.
- 15. Bau R, Rees DC, Kurtz DM, Scott RA, Huang H, et al. (1998) Crystal Structure of Rubredoxin from Pyrococcus Furiosus at 0.95 Angstroms Resolution, and the structures of N-terminal methionine and formylmethionine variants of Pf Rd. Contributions of N-terminal interactions to thermostability. J Biol Inorg Chem 3: 484–493.
- 16. Hazzard J, Sudhof TC, Rizo J (1999) NMR analysis of the structure of synaptobrevin and of its interaction with syntaxin. J Biomol NMR 14: 203–207.
- 17. Agarwal R, Schmidt JJ, Stafford RG, Swaminathan S (2009) Mode of VAMP substrate recognition and inhibition of Clostridium botulinum neurotoxin F. Nat Struct Mol Biol 16: 789–794.
- 18. Sutton RB, Fasshauer D, Jahn R, Brunger AT (1998) Crystal structure of a SNARE complex involved in synaptic exocytosis at 2.4 A resolution. Nature 395: 347–353.
- 19. Chen X, Tomchick DR, Kovrigin E, Arac D, Machius M, et al. (2002) Three-dimensional structure of the complexin/SNARE complex. Neuron 33: 397–409.
- 20. Ernst JA, Brunger AT (2003) High resolution structure, stability, and synaptotagmin binding of a truncated neuronal SNARE complex. J Biol Chem 278: 8630–8636.
- 21. Madan LL, Gopal B (2008) Addition of a polypeptide stretch at the N-terminus improves the expression, stability and solubility of recombinant protein tyrosine phosphatases from Drosophila melanogaster. Protein Expr Purif 57: 234–243.
- 22. Baeten L, Reumers J, Tur V, Stricher F, Lenaerts T, et al. (2008) Reconstruction of protein backbones from the BriX collection of canonical protein fragments. PLoS Comput Biol 4: e1000083.
- 23. Mezei M (1998) Chameleon sequences in the PDB. Protein Eng 11: 411–414.
- 24. Streltsov VA, Varghese JN, Masters CL, Nuttall SD (2011) Crystal structure of the amyloid-beta p3 fragment provides a model for oligomer formation in Alzheimer’s disease. J Neurosci 31: 1419–1426.
- 25. Luhrs T, Ritter C, Adrian M, Riek-Loher D, Bohrmann B, et al. (2005) 3D structure of Alzheimer’s amyloid-beta(1-42) fibrils. Proc Natl Acad Sci U S A 102: 17342–17347.
- 26. Tomaselli S, Esposito V, Vangone P, van Nuland NA, Bonvin AM, et al. (2006) The alpha-to-beta conformational transition of Alzheimer’s Abeta-(1-42) peptide in aqueous media is reversible: a step by step conformational analysis suggests the location of beta conformation seeding. Chembiochem 7: 257–267.
- 27. Gilbert GE, Baleja JD (1995) Membrane-binding peptide from the C2 domain of factor VIII forms an amphipathic structure as determined by NMR spectroscopy. Biochemistry 34: 3022–3031.
- 28. Liu Z, Lin L, Yuan C, Nicolaes GA, Chen L, et al. (2010) Trp2313-His2315 of factor VIII C2 domain is involved in membrane binding: structure of a complex between the C2 domain and an inhibitor of membrane binding. J Biol Chem 285: 8824–8829.
- 29. Pratt KP, Shen BW, Takeshima K, Davie EW, Fujikawa K, et al. (1999) Structure of the C2 domain of human factor VIII at 1.5 A resolution. Nature 402: 439–442.
- 30. Ngo JC, Huang M, Roth DA, Furie BC, Furie B (2008) Crystal structure of human factor VIII: implications for the formation of the factor IXa-factor VIIIa complex. Structure 16: 597–606.
- 31. McGuffin LJ, Bryson K, Jones DT (2000) The PSIPRED protein structure prediction server. Bioinformatics 16: 404–405.
- 32. Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT (2004) The DISOPRED server for the prediction of protein disorder. Bioinformatics 20: 2138–2139.
- 33. Tanaka S, Ataka M (2002) Protein crystallization induced by polyethylene glycol: A model study using apoferritin. J Chem Phys 117: 3504–3510.
- 34. Kumar V, Sharma VK, Kalonia DS (2009) Effect of polyols on polyethylene glycol (PEG)-induced precipitation of proteins: Impact on solubility, stability and conformation. Int J Pharm 366: 38–43.
- 35. Morimoto K, Furuta E, Hashimoto H, Inouye K (2006) Effects of high concentration of salts on the esterase activity and structure of a kiwifruit peptidase, actinidain. J Biochem 139: 1065–1071.
- 36. Lin SY, Li MJ, Ho CJ (1999) pH-dependent secondary conformation of bovine lens alpha-crystallin: ATR infrared spectroscopic study with second-derivative analysis. Graefes Arch Clin Exp Ophthalmol 237: 157–160.
- 37. Fleming PJ, Richards FM (2000) Protein packing: dependence on protein size, secondary structure and amino acid composition. J Mol Biol 299: 487–498.
- 38. Rother K, Hildebrand PW, Goede A, Gruening B, Preissner R (2009) Voronoia: analyzing packing in protein structures. Nucleic Acids Res 37: D393–395.
- 39. Schellenberg MJ, Ritchie DB, Wu T, Markin CJ, Spyracopoulos L, et al. (2010) Context-dependent remodeling of structure in two large protein fragments. J Mol Biol 402: 720–730.
- 40. Minor DL Jr, Kim PS (1994) Context is a major determinant of beta-sheet propensity. Nature 371: 264–267.
- 41. Minor DL Jr, Kim PS (1996) Context-dependent secondary structure formation of a designed protein sequence. Nature 380: 730–734.
- 42. Plaxco KW, Gross M (1997) Cell biology. The importance of being unfolded. Nature 386: 657, 659.
- 43. Wright PE, Dyson HJ (1999) Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J Mol Biol 293: 321–331.
- 44. de Groot BL, van Aalten DM, Scheek RM, Amadei A, Vriend G, et al. (1997) Prediction of protein conformational freedom from distance constraints. Proteins 29: 240–251.
- 45. Ruvinsky AM, Vakser IA (2010) Sequence composition and environment effects on residue fluctuations in protein structures. J Chem Phys 133: 155101.
- 46. Kleywegt GJ, Jones TA (1997) Detecting folding motifs and similarities in protein structures. Methods Enzymol 277: 525–545.
- 47. Koradi R, Billeter M, Wuthrich K (1996) MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph 14: 51–55, 29–32.
- 48. Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22: 2577–2637.
- 49. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403–410.