Principal component analysis of alpha-helix deformations in transmembrane proteins

α-helices are deformable secondary structural components regularly observed in protein folds. The overall flexibility of an α-helix can be resolved into constituent physical deformations such as bending in two orthogonal planes and twisting along the principal axis. We used Principal Component Analysis to identify and quantify the contribution of each of these dominant deformation modes in transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. Using three α-helical samples from Protein Data Bank entries spanning these three cellular contexts, we determined that the relative contributions of these modes towards total deformation are independent of the α-helix’s surroundings. This conclusion is supported by the observation that the identities of the top three deformation modes, the scaling behaviours of mode eigenvalues as a function of α-helix length, and the percentage contribution of individual modes to total variance were comparable across all three α-helical samples. These findings highlight that α-helical deformations are independent of cellular location and will prove to be valuable in furthering the development of flexible templates in de novo protein design.


α-helices are deformable bodies
The α-helix is an essential secondary structural component commonly observed in native state protein folds. α-helices are broadly classified as a series of backbone atoms arranged in a right-handed helix with a large dipole moment through backbone carbonyl groups that all point in the same direction. The Ramachandran diagram considers backbone steric clashes and degrees of freedom to identify which dihedral angles are most appropriate for the α-helix [1]. The helical geometry is typically specified as having a periodicity of 3.6 residues and a rise of 5.4 Å per helix turn. Although these parameters are generally used to specify the α-helix, it is by no means an immutable structure. α-helices are flexible bodies, as evidenced by the variety of helical deformations that are recorded in Protein Data Bank (PDB) submissions [2]. The ability to quantify the deformations of flexible elements in a protein fold is paramount for the development of flexible templates in computational de novo protein design.
The earliest computational protein design strategies focused on rigid backbone templates. The atomic coordinates of these templates were fixed to simplify the design process and reduce the combinatorial complexity in searching for an optimal protein fold [3]. Studies done with these fixed templates identified sets of side-chain conformations, known as rotamers, that could build a stable protein core for the de novo protein [3]. These protein cores were well-suited for folding by hydrophobic collapse, thereby providing a low-energy structure which could stabilize the surface regions [3]. Although the rigid backbone template is a relatively simple model, it has been criticized for ignoring backbone flexibility. For example, the superposition of 20 different nuclear magnetic resonance structures of PDB entry 1AEL shows slight positional variations in the backbone atom positions [3]. This implies that rigid templates do not properly balance packing energies and deformation energies [4]. Flexible templates offer more design parameters to refine, which introduces the possibility that these templates can further optimize the free energy of a protein fold, with the drawback of greater computational complexity. These additional parameters stem from backbone flexibility on the atomic scale and the collective flexible motions of secondary structures. The collective deformations experienced by α-helices can be resolved into individual deformation modes (such as bending and twisting), which, from a computational standpoint, represent additional degrees of freedom in the de novo protein design process over existing rigid template design studies [5,6].
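As an illustration of the standard geometry quoted above, the following minimal sketch generates α-carbon coordinates for an idealized, undeformed α-helix. The periodicity (3.6 residues per turn) and pitch (5.4 Å per turn) come from the text; the α-carbon radius of 2.3 Å and the function name `ideal_helix_ca` are assumptions made for this example only.

```python
import numpy as np

def ideal_helix_ca(n_residues, residues_per_turn=3.6, pitch=5.4, radius=2.3):
    """Return an (n_residues, 3) array of idealized C-alpha coordinates in Å.

    residues_per_turn and pitch follow the standard values quoted in the text;
    radius (2.3 Å) is an assumed C-alpha distance from the helix axis.
    """
    i = np.arange(n_residues)
    theta = 2.0 * np.pi * i / residues_per_turn   # 100 degrees per residue
    rise = pitch / residues_per_turn              # 1.5 Å per residue
    # Counter-clockwise rotation with increasing z gives a right-handed helix.
    return np.column_stack((radius * np.cos(theta),
                            radius * np.sin(theta),
                            rise * i))

helix = ideal_helix_ca(19)
# Distance between consecutive C-alpha atoms along the ideal helix.
ca_ca = np.linalg.norm(np.diff(helix, axis=0), axis=1)
```

With these assumed parameters, consecutive α-carbons are separated by roughly 3.8 Å, and 18 residue steps (five full turns) advance 27 Å along the axis.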

α-helix flexibility is analyzed through constituent deformation modes
α-helix flexibility can be investigated using Principal Component Analysis (PCA) on the atomic coordinates of α-helices collected from the PDB. PCA is a data-driven analysis that can be performed on a sample of static α-helical structures to reveal their principal components. In this context, we use the terms principal component and deformation mode interchangeably: although they originate from two distinct models (PCA and normal mode analysis, respectively), the two models draw similar conclusions on the flexibility of an α-helix. These modes are each represented by one physical deformation and their individual contribution to the overall deformation of the α-helix is quantified by an eigenvalue (λ). We illustrate the three dominant principal components exhibited in α-helices in Fig 1. Previous work identified that the three dominant modes of flexibility from the PCA of α-helices are two bending modes and one twist mode [4]. The two largest eigenvalues capture two nearly degenerate bending modes in two orthogonal planes, which is owed to the approximate cylindrical symmetry of an α-helix [4]. The third largest eigenvalue represents a twisting mode along the principal axis of the α-helix [4]. Each deformation mode has a pair of extreme cases, which are shown individually in each subfigure of Fig 1A-1C, but when these extremes are superimposed, they provide a visual aid on the bounds between which an α-helix may deform (see S1 Fig). The work done by Emberly et al. determined these three dominant deformation modes and studied the scaling behaviour of the eigenvalues as a function of the α-helix length [4]. We aim to expand on that research by elaborating on how the dominant deformation modes and scaling behaviour depend on the location of the α-helix in the cell, namely, whether the protein is surrounded by membrane or aqueous environments.
In the past decades, bioinformaticians struggled with the scarcity of high-resolution structural information of transmembrane proteins [7][8][9]. The amount of publicly available transmembrane data over time has been tracked by Stephen White and co-workers, who catalogue high-resolution structures of membrane proteins as part of their mpstruc database [10]. In 2003, at the time of the work completed by Emberly et al. [4], 88 membrane proteins were listed on the mpstruc database [10]. This shortage of data would not have supported a comprehensive and convincing analysis comparing the deformation modes of α-helices in soluble proteins and membrane proteins. Our work covers three different α-helix types: transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. We aim to substantiate and validate the conclusions reached by Emberly et al. [4] using a dataset that is over 500% the size of theirs. Furthermore, we expand the study of dominant principal components into several cellular environments to examine how an α-helix's cellular milieu affects the physical deformations it experiences in its native state.
As an α-helix approaches its native state conformation, the total deformation it experiences will be partitioned between bending and twisting. We study this partition using the variance explained by each principal component as a function of the α-helix length across membrane and aqueous environments. If these profiles are similar between cellular environments, then the variance explained by each deformation mode would exclusively rely on α-helix geometry. The variance explained by each principal component as a function of the α-helix length consequently describes an important relationship between the proportion of deformation manifested as bending or twisting, the cellular milieu of the α-helix, and the α-helix length; however, these profiles would not describe differences in α-helical mechanical properties (intensive properties) across cellular milieus. For example, prior work from Bavi et al. used molecular dynamics to estimate the Young's modulus of α-helices from M. tuberculosis and E. coli homolog mechanosensitive channels [11]. Their work concludes that the Young's modulus from α-helix stretching simulations is higher in a vacuum than it is in water [11], but this result would not describe exactly how variance is partitioned between the constituent modes.

Transmembrane and soluble proteins have notable similarities and differences
Transmembrane α-helices and α-helices in soluble proteins have different amino acid compositions. The analysis done by Baeza-Delgado et al. on amino acid composition in α-helices revealed that transmembrane α-helices possess glycine and large hydrophobic amino acids such as leucine, valine, isoleucine, and phenylalanine more frequently whereas polar amino acids like glutamate, lysine, asparagine, arginine, and glutamine were less prevalent [8]. Although their study had 792 transmembrane α-helices and 7348 α-helices in soluble proteins compared to our study with 6075 transmembrane α-helices and 6716 α-helices in soluble proteins, our conclusions on the most prevalent amino acid types were the same (S2 Fig).
In a bioinformatic study of the yeast membrane proteome where membrane-embedded transmembrane residues were compared with extramembrane residues, it was concluded that for a fixed degree of residue burial, transmembrane regions evolve 42% more slowly than extramembrane regions using the ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions at the DNA level [12]. The transmembrane regions evolve more slowly since the membrane environment imposes greater selective constraint than the aqueous environment surrounding the extramembrane regions [12][13][14]. Even more, residue evolutionary rate scales in a strong, positive, and linear trend with relative solvent accessibility in both transmembrane and extramembrane regions of membrane proteins [12]. Although extramembrane regions of membrane proteins and soluble proteins have different functional roles, they are both surrounded by an aqueous environment and have similar linear relationships between residue-level evolutionary rate and relative solvent accessibility [12].
Hydrogen bonding is a crucial force in preserving native state transmembrane protein folds. A polar residue in a transmembrane protein is thermodynamically unfavourable unless it is in a hydrogen bonded state as a result of the low dielectric constant of the membrane environment [15]. Transmembrane apolar to polar mutations can lead to non-native hydrogen bonding which can compromise protein function and lead to diseased phenotypes [15]. The glycine-to-arginine mutation alone leads to 4.8% of all transmembrane domain phenotypic mutations, which is statistically more frequent than its occurrence in soluble proteins [15]. More generally, Partridge et al. determined that residues which participate in hydrogen bonds "are overrepresented as molecular causes of disease when they replace a native [transmembrane domain] residue" [16].
Transmembrane α-helices exhibit structural irregularities more frequently than α-helices in soluble proteins. The standard α-helix is defined in terms of several key metrics including the number of residues per turn (which falls between 3.4 and 4.0) and the rise per residue (between 1.36 Å and 1.76 Å) [17]. α-helix structural irregularities include kinks, the 3₁₀-helix, and the π-helix [17]. If the local bending angle at a residue within an α-helix is greater than 20˚, then the hydrogen bond between residue i and i+4 is broken, and it is consequently called a kinked helix [17]. Hall et al. determined that 44% of transmembrane α-helices had a significant helical kink, with 35% of those kinks caused by proline [18]. The angles of proline-based helical kinks are modulated by proximal serines and threonines [18,19]. Non-proline kinks were mainly associated with serines and glycines at the center of the kink [7,18]. In particular, the serine side chain of residue i forms a hydrogen bond with either residue i−4 or i+4 [7,18]. The 3₁₀-helix is a tight-turning and tall α-helix with a periodicity of less than 3.4 residues per helix turn and a rise of greater than 1.76 Å per residue [17]. The π-helix is a wide-turning and short α-helix with a periodicity of greater than 4.0 residues per helix turn and a rise of less than 1.36 Å per residue [17]. Kinks (K), kinks associated with tight turns (K−3₁₀), and kinks associated with wide turns (K−π) are more frequently observed irregularities in transmembrane α-helices than in α-helices in soluble proteins [17]. More specifically, the ratios (TM:soluble) are 6:1 for K, 9:5 for tight turns, and 11:4 for wide turns [17]. These irregularities are biologically relevant as White et al. show that serine and threonine motifs shape the local structure of transmembrane α-helices through local kinking to improve both solvation and flexibility [20].
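The thresholds quoted above can be expressed as a small decision rule. The sketch below is only illustrative: the function name `classify_turn`, the string labels, and the "unclassified" fallback for parameter combinations that the text does not describe are assumptions, not part of the classification scheme in [17].

```python
def classify_turn(residues_per_turn, rise_per_residue):
    """Label a helical segment from its periodicity and rise (Å per residue).

    Thresholds follow the ranges quoted in the text; the fallback label
    for combinations outside those ranges is a hypothetical addition.
    """
    if 3.4 <= residues_per_turn <= 4.0 and 1.36 <= rise_per_residue <= 1.76:
        return "alpha"            # standard α-helix
    if residues_per_turn < 3.4 and rise_per_residue > 1.76:
        return "3-10"             # tight-turning, tall
    if residues_per_turn > 4.0 and rise_per_residue < 1.36:
        return "pi"               # wide-turning, short
    return "unclassified"

labels = [
    classify_turn(3.6, 1.50),
    classify_turn(3.0, 2.00),
    classify_turn(4.4, 1.15),
    classify_turn(3.0, 1.50),
]
```

Kink detection is deliberately left out, since it depends on the local bending angle rather than on these two metrics.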
In response to the similarities and differences between transmembrane and soluble proteins on a residue-level, we studied the effect of an α-helix's cellular environment on its deformation modes, the scaling behaviour of its eigenvalues, and the contribution of each physical deformation to the overall flexibility of the secondary structure.

Results and discussion
There are notable comparisons between transmembrane proteins and soluble proteins highlighted by previous research on amino acid propensity, residue-level evolutionary rates, hydrogen bonding, and the frequency of structural irregularities. We investigated the effect of the surrounding environment on the deformation behaviours of transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. As deformable bodies, the flexibility of an α-helix can be quantified through the collective deformations of its residues using Principal Component Analysis (PCA) [4].

The total deformation of an α-helix can be resolved into deformation modes
N α-helices of a given length (L residues) were collected from PDB entries (see Methods). Once the α-helices were structurally aligned, the raw data for PCA consisted of an N by 3L matrix of transformed 3D α-carbon atomic coordinates. We decided to use the α-carbon positions instead of all backbone atoms because α-carbon positions appropriately capture the geometry of the backbone and to remain consistent with Emberly et al. [4]. Upon performing PCA, the total deformation of the α-helical sample was segmented into constituent modes, with each mode describing a part of the total deformation. The contribution of each mode to the flexibility of an α-helix is quantified with an eigenvalue (λ). These eigenvalues measure the variance in Å² captured by an individual deformation mode. The eigenvalues associated with each of the 3L principal components were computed for transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins in the range 10 ≤ L ≤ 25 for a total of 48 sets of eigenvalues.
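The PCA step described above can be sketched as an eigendecomposition of the covariance matrix of the N by 3L coordinate matrix. This is a minimal illustration, not the pipeline used in the study: the function name `pca_modes` is hypothetical, the input is random placeholder data rather than aligned PDB coordinates, and normalizing the covariance by N − 1 is an assumption.

```python
import numpy as np

def pca_modes(coords):
    """PCA of an (N, 3L) matrix of aligned C-alpha coordinates (Å).

    Returns eigenvalues (variance in Å^2, largest first) and the matching
    modes as columns. Using N - 1 in the covariance is an assumption.
    """
    centered = coords - coords.mean(axis=0)        # displacements from the mean helix
    cov = centered.T @ centered / (coords.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]              # re-sort so the dominant mode comes first
    return eigvals[order], eigvecs[:, order]

# Placeholder data standing in for N aligned helices of L residues each.
rng = np.random.default_rng(0)
N, L = 500, 18
coords = rng.normal(size=(N, 3 * L))

eigvals, modes = pca_modes(coords)
```

Each column of `modes` has 3L entries, so it can be reshaped to (L, 3) to give one displacement vector per α-carbon; the eigenvalues sum to the total variance of the sample.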

The deformation modes have different magnitudes in different cellular milieu
In Fig  , in which they also report a nearly degenerate pair of PCA bending modes with nearly identical eigenvalues [4]. Across all three α-helix types in Fig 2A, the twisting mode represented a smaller contribution to the total deformation with the third largest eigenvalue.
The deformation modes that we elucidated from our samples were larger in magnitude (i.e. the eigenvalues were larger) than those published by Emberly et al. [4] in the range of 10 ≤ L ≤ 25. This implies that the total variance in each of our α-helical samples was greater than the total variance in their dataset. This is because their threshold for accepting candidate α-helices (selecting unbroken series of residues with dihedral angles {ϕ,ψ = −50˚±30˚,−50˚±30˚}) [4] was more stringent than ours. In other words, their study was more likely than ours to reject α-helices with more extreme deformation types.
On the topic of total variance exhibited by a helical dataset, since there are different physical constraints in the plasma membrane and the cytoplasm due to differences in hydrogen bonding and electrostatic interactions between the two environments, the total variance in helical deformation will be different in each cellular setting. Therefore, for each respective mode in transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins, the eigenvalues should not equal one another, and the amplitude of the individual deformation modes cannot be meaningfully compared across different cellular milieus. To address differences in total variance between each dataset, we normalized the eigenvalues by the total variance in their respective datasets as shown in Fig 2B. The resulting percentage of variance explained is a more worthwhile metric to compare since it describes on a percentage basis the way that total deformation is partitioned between constituent modes.
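The normalization described above amounts to dividing each eigenvalue by the sum of all eigenvalues in its dataset. The sketch below uses made-up eigenvalues and a hypothetical function name, `percent_variance_explained`, to show why two samples with different total variance can still share the same percentage profile.

```python
import numpy as np

def percent_variance_explained(eigvals):
    """Convert eigenvalues (Å^2) into percentages of the total variance."""
    eigvals = np.asarray(eigvals, dtype=float)
    return 100.0 * eigvals / eigvals.sum()

# Made-up eigenvalues: sample_b has 2.5 times the total variance of sample_a.
sample_a = np.array([8.0, 7.0, 3.0, 1.0, 1.0])
sample_b = 2.5 * sample_a

profile_a = percent_variance_explained(sample_a)
profile_b = percent_variance_explained(sample_b)
```

Although the raw eigenvalues differ, `profile_a` and `profile_b` are identical, which is the property that makes the percentage of variance explained comparable across cellular environments.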
In the range 10 ≤ L ≤ 25, focusing on individual deformation modes, we found the eigenvalues between transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins were different. This suggests that the eigenvalues of the deformation modes of an α-helix depend on its cellular environment, owing to differences in the physical constraints of these environments. The amplitudes of the α-helical deformation modes rely on the environmental constraints which restrict their deformation. Other metrics such as the helix's scaling behaviour may not necessarily be reliant on these constraints. To investigate this claim further, we studied the scaling behaviour of the three dominant deformation modes.

PLOS ONE
Fig 1. [...] atom of a standard α-helix (with a periodicity Δθ of 3.6 residues per helix turn, a rise Δz of 1.5 Å per residue) to its corresponding atom on the deformed α-helix. The tails of these arrows are all translated to the corresponding atom on the deformed α-helix to more easily illustrate how each atom is pulled under the influence of a particular deformation mode.
https://doi.org/10.1371/journal.pone.0257318.g001
For Bend 1 and Bend 2, the scaling exponents were very similar between transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. The scaling exponent of Twist is similar between the three different types of helices, especially for the transmembrane α-helices and α-helices in soluble proteins. Moreover, the scaling exponents of the twisting mode are consistently lower than the scaling exponents of the two bending modes across all three α-helix types. The distinction between the bending mode exponents and the twisting mode exponent exists due to the way in which the deformation modes induce displacements away from a mean α-helical structure: for bending modes, these displacements increase quadratically with α-helix length (δx ∼ L²/R, λ_bend ∝ L⁴) [4]; however, for the twisting mode, these displacements increase linearly with helix length (δx ∼ Lδθ, λ_twist ∝ L²) [4]. In this approach, the scaling of PCA eigenvalues of an α-helix was likened to the scaling of a fluctuating elastic rod in thermal equilibrium [4], irrespective of the rod's surrounding environment.
The final column of Table 1 summarizes a key conclusion made by Emberly et al. in their comparisons of the principal components of PCA with the dynamical normal modes of normal mode analysis (NMA) [4]. Unlike PCA, which summarizes a set of related static atomic structures, NMA describes protein dynamics through the collective motions of atoms [21][22][23]. Emberly et al. used a spring model describing the thermodynamics of a free α-helix to determine normal mode eigenvalues representative of the lowest energy deformations and described an inverse relationship between the principal component eigenvalues and the spring constants [4]. In their study, since the top three principal components agreed with the three lowest-energy normal modes, they concluded that the scaling behaviours between PCA modes and normal modes must also match [4]. By approximating an α-helix as an elastic rod, they identified that the two bending modes scale as λ_bend ∝ L⁴ and that the twisting mode scales as λ_twist ∝ L² [4]. In other words, the data-driven methods of PCA and the fundamental physics arguments of NMA reach the same conclusions on how α-helices behave as deformable bodies.
In principle, the results of the NMA should be the same regardless of the environment in which the elastic rod is located, so the α-helical normal modes identified by Emberly et al. are extendable to membrane environments [4]. The results in Table 1 show consistency in PCA scaling behaviour of mode eigenvalues between transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. This is evidence that the way deformation depends on α-helix geometry (i.e., scales with helix length) is independent of cellular microenvironment.
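A scaling exponent of the kind compared in Table 1 can be estimated by fitting a power law to eigenvalues collected at each helix length. The sketch below assumes a linear least-squares fit in log-log space, which may differ from the fitting procedure actually used; the function name `scaling_exponent` and the synthetic eigenvalues, generated to follow the elastic-rod predictions exactly, are illustrative only.

```python
import numpy as np

def scaling_exponent(lengths, eigvals):
    """Estimate x in lambda ∝ L^x from a straight-line fit of log(lambda) vs log(L)."""
    slope, _intercept = np.polyfit(np.log(lengths), np.log(eigvals), 1)
    return slope

# Sixteen helix lengths, 10 <= L <= 25, as in the text.
lengths = np.arange(10, 26)

# Synthetic eigenvalues following the elastic-rod predictions (not real data).
bend_eigvals = 1e-4 * lengths**4.0
twist_eigvals = 1e-2 * lengths**2.0

bend_exponent = scaling_exponent(lengths, bend_eigvals)
twist_exponent = scaling_exponent(lengths, twist_eigvals)
```

On real eigenvalues the fitted exponents would deviate from the ideal values of 4 and 2, and comparing those deviations across the three α-helix types is what the scaling analysis addresses.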

The contribution of each deformation mode as a fraction of total α-helix flexibility
Next, we investigated the percentage of contribution made by each deformation mode to the overall flexibility. Since the eigenvalues each measure the variance in Å² captured by an individual deformation mode and the total variance was different in each of the three α-helical samples that we investigated, it would be worthwhile to normalize the eigenvalues across all three α-helical samples as a percentage of their total variance (from all 3L deformation modes) for 10 ≤ L ≤ 25. Then, eigenvalue trends can be observed independent of the differences in total variance between the three α-helix samples.
Table 1. Scaling of mode eigenvalues with α-helix length (λ ∝ L^∎). Columns: Transmembrane α-helices; Extramembrane α-helices; α-helices in soluble proteins; α-helices in soluble proteins [4]. [Table values not recovered.]
The eigenvalues of the deformation modes are normalized in Fig 3 to display trends across the principal component number and trends along the α-helix length. When comparing transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins in Fig 3, the sixteen lines in each panel are all generally concave up. By inspection, the blue lines, which describe the relative contribution of each deformation mode for L = 25, have a much greater concavity (steeper initial 'slope') than the red lines, which describe the relative contribution of each deformation mode for L = 10. This means that the fraction (λ_Bend1 + λ_Bend2)/λ_Twist is much greater in 25-residue α-helices than in 10-residue α-helices. This follows our intuition well since we expect large, exaggerated bends to hold a greater contribution to the total deformation in the longer α-helices. In fact, the percentage of variance explained by the twisting mode is lower in 25-residue α-helices than in 10-residue α-helices across all three α-helix types shown in Fig 3. While the fourth and fifth deformation modes are not negligible in magnitude when compared with the three dominant deformation modes, we decided to focus on the first three because they capture the majority of variance explained. This is illustrated more clearly in Fig 4, where we can more closely examine how Bend 1, Bend 2, and Twist, the most prominent physical deformations, contribute the majority of variance explained in each cellular environment.
Following each of the pink lines in Fig 4 from left to right, the summed contributions of the first three principal components describe around 60% of the variance explained for L = 10 and the variance explained rises to around 75% as the α-helix length increases to L = 25. This observation is invariant to changes in the location of α-helices in the cell. This remarkable similarity between transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins is another indication towards α-helix principal components relying primarily, if not solely, on the geometry as opposed to its cellular environment.
The relative importance of the bending modes in explaining the total variance within all three samples increases as the α-helix gets longer, as illustrated by the red and blue lines in Fig 4. The relative importance of the twist mode in explaining the total variance within all three samples decreases as the α-helix gets longer, as illustrated by the green line in Fig 4. These directional trends match the results of the previous study done by Emberly et al. on 680 α-helices in coiled-coil structures [4]. In these coiled-coil motifs, once α-helix lengths exceeded 80 residues, higher-order harmonics of the bend mode became lower in energy than the twist mode (i.e. the higher-order harmonics of the bend mode explain a greater percentage of variance than the twist mode) [4]. This means that in α-helices in coiled-coil motifs with lengths greater than 80 residues (and in free α-helices with lengths exceeding 33 residues), the twisting mode will cease to be the third lowest normal mode (and therefore will no longer be the third largest eigenvalue in PCA either, as we had represented in Fig 2) [4]. This is consistent with the steady decrease in the percentage of variance explained by the twisting mode in the range 10 ≤ L ≤ 25 across all three α-helix types that we observed in Fig 4. The diminishing importance of the twisting mode across all α-helix types as L increases implies that higher-order harmonics of the bending mode will overshadow the twisting mode in longer α-helices regardless of the α-helix's location in the cell. This overshadowing of the twisting mode will rarely be a concern in transmembrane α-helices, and consequently in transmembrane protein design, since the thickness of the cell membrane imposes a natural constraint on the maximal length of transmembrane α-helices.

Fig 3. Each line represents the percentage of total variance explained by the first ten principal components for α-helices of a certain length (L).
Sixteen lines are plotted to illustrate this trend in the range 10 ≤ L ≤ 25. The length of the α-helix in question is represented by the colour and thickness of each line. These distributions were plotted for (A) transmembrane α-helices, (B) extramembrane α-helices, and (C) α-helices in soluble proteins. The structures of PDB entries 3JBR [24] and 5AM9 [25] are shown for illustrative purposes.
https://doi.org/10.1371/journal.pone.0257318.g003
Returning to the computational work on an α-helix's Young's modulus by Bavi et al., they determined that water acts as a 'lubricant' as the TM1 α-helix in a mechanosensitive channel pore is elongated [11]. At first glance, since the reported Young's modulus of their simulated α-helix is higher in a vacuum than it is in water (i.e., the α-helix is stiffer in a vacuum than in water) [11], it would appear to contradict our conclusion that deformation mode scaling behaviour and percentage of variance explained are independent of cellular surroundings. However, deformation modes (including the profiles of variance explained) across cellular milieus cannot be directly compared with an intensive property like Young's modulus. For Bavi et al., the difference in Young's modulus is attributed to changes in the number of hydrogen bonds between the solvent and the helix [11], but for our study, a constant number of native state hydrogen bonds are automatically accounted for in the static PDB structure of each α-helix. Consequently, it is possible to have a lower Young's modulus in an aqueous environment, while also maintaining the same percentage of variance explained profile seen in both a membrane environment and an aqueous environment.
We considered the possibility that the resolution of the protein structures used in our study could affect the deformation mode eigenvalues, scaling behaviour, and percentage of total variance explained that we observe. The average resolution of soluble proteins collected in our study is 2.31 Å and the average resolution of membrane proteins collected in our study is 3.02 Å (see the histograms in S4 Fig). We repeated our analysis on structures within our original three datasets that have a resolution of ≤ 3 Å. The ten largest eigenvalues of 18-residue α-helices across the three datasets in protein structures with a resolution of ≤ 3 Å are presented in S5 Fig. Using these eigenvalues, the scaling exponents (in S2 Table) and the percentage of variance explained by each deformation mode (in S6 and S7 Figs) were calculated. The results of our high-resolution analysis closely match the ones presented in our main study, except for the extramembrane α-helices' scaling behaviour. With a resolution of 3 Å as an upper bound, the extramembrane α-helix dataset shrank to about 20% of its original size. As presented in S2 Table, this resulted in a Bend 2 scaling exponent of 2.9 (NMA predicts a scaling exponent of 4 for bending modes) and a Twist scaling exponent of 2.1 (NMA predicts a scaling exponent of 2 for the twisting mode).
Future work stemming from our analysis could go in several directions. We decided to use L α-carbons in each α-helix for PCA to remain consistent with Emberly et al. [4] and pursued the assumption that in any one α-helix, if side chain-environment interactions led to some native state structural deformation of the backbone, then it might be manifested in the corresponding α-carbon coordinates that we see in the PDB. It would be worthwhile to include side chain identities in PCA, which would imply that the dataset would need to be segmented by cellular microenvironment, α-helix length, as well as by sequence. This would require a far greater amount of data than is available now. Moreover, in future work, α-helices could be stratified by their degree of solvent exposure, but this would also require more data than is available now, especially for membrane proteins.
Fig 4. The percentage of total variance explained by each of the first three principal components individually (red, blue, and green) and combined (pink) for α-helices with helix lengths (L) in the range 10 ≤ L ≤ 25. The red, blue, and green lines represent the contributions of Bend 1, Bend 2, and Twist modes respectively towards explaining the total variance. The pink line represents the summed contributions of the first three principal components towards explaining the total variance. These results are plotted for (A) transmembrane α-helices, (B) extramembrane α-helices, and (C) α-helices in soluble proteins. The structures of PDB entries 3JBR [24] and 5AM9 [25] are shown for illustrative purposes.
https://doi.org/10.1371/journal.pone.0257318.g004
In addition to including residue identity and degree of solvent exposure, future analyses could include all α-helix backbone atoms. This would open the possibility of using torsion angle representations since this approach follows the assumption that bond lengths are invariant. Since the distance between α-carbons is not uniform, this internal representation would not be accurate with the α-carbon dataset we used to pursue this study. Furthermore, an analysis of all α-helix backbone atoms could lead to an improved understanding of how the prevalence of structural irregularities such as kinks in transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins depends on α-helix length.
In our analysis, the top three deformation modes are manifested as Bend 1, Bend 2, and Twist specifically because PCA outputs the principal components using an orthogonal basis. We selected PCA as it is considered a data-driven counterpart to NMA [4]. It is possible as future work to analyze the α-helix atomic coordinates using other data-driven approaches such as Independent Component Analysis (ICA), which will not force the components into an orthogonal basis. At the same time, the independent components likely will present the results differently in such a way that they would not be directly comparable to NMA.

Conclusion
We investigated the relationship between the cellular surroundings of an α-helix and its deformation modes by performing PCA on three α-helical samples representative of three different cellular contexts: transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins. Our findings confirmed that for α-helices with lengths in the range of 10-25 residues, the total deformation is described primarily by two nearly degenerate bending modes and a twisting mode. The eigenvalues, which quantify the variance in the sample captured by each individual deformation mode, were calculated across all three cellular milieus, and their scaling behaviour as a function of α-helix length was studied using a power law function. The scaling exponents were consistent across the three types of α-helices even though the eigenvalues were not comparable. The independence of deformation mode scaling behaviour from cellular surroundings supports the theory and applicability of normal mode analysis in diverse cellular contexts [4]. The different physical constraints of each cellular environment led to differences in the total variance of each dataset, implying that the amplitudes of individual deformation modes were different across the three samples. We then studied the contribution of each deformation mode as a fraction of the total deformability in our α-helical samples by plotting mode eigenvalues that were normalized by the total variance of their respective datasets. From these plots, we inferred that the relative contributions of the bending modes and the twisting mode towards the total deformation depended on the length of the α-helix and not on its environment.
The similarity between the scaling behaviour and percentage of variance explained profiles of transmembrane α-helices, extramembrane α-helices, and α-helices in soluble proteins can be incorporated in flexible templates in computational protein design to refine the structures of de novo transmembrane proteins.
Methods
667 PDB entries classified as α-helical transmembrane proteins were collected from the mpstruc database for Membrane Proteins of Known 3D Structure [10]. These PDB files have α-helix annotations. The corresponding entries were collected from the Orientations of Proteins in Membranes (OPM) Database from the University of Michigan [26,27]. The OPM PDB files modify the standard Research Collaboratory for Structural Bioinformatics (RCSB) PDB entries by rotating the coordinate system of the 3D atomic coordinates [26,27]. They set the origin (0,0,0) at the center of the membrane bilayer as illustrated in S8A Fig. The z-axis points to the extracellular space and is normal to the membrane. The OPM PDB files also include the '½ of bilayer thickness' remark at the top of the file [26,27]. This reported bilayer thickness was used to determine which α-carbons are located inside the membrane.
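The membrane test described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the example remark line and coordinates are made up, and two details are assumptions not stated in the text, namely that the remark ends with a colon followed by the numeric value, and that an α-carbon lying exactly on the slab boundary counts as inside.

```python
def parse_half_thickness(remark_line):
    """Extract the half bilayer thickness (in angstroms) from a remark line.

    Assumed layout (hypothetical): the numeric value follows the last colon.
    """
    return float(remark_line.split(":")[-1])


def in_membrane(z, half_thickness):
    """Return True if a z-coordinate lies inside the bilayer slab.

    With the origin at the bilayer centre and the z-axis normal to the
    membrane, the slab spans -half_thickness to +half_thickness.
    The inclusive boundary is an assumption.
    """
    return abs(z) <= half_thickness


# Hypothetical remark line and alpha-carbon z-coordinates (angstroms).
remark = "REMARK      1/2 of bilayer thickness:   15.0"
half_thickness = parse_half_thickness(remark)
ca_z = [-22.4, -14.9, -3.1, 0.0, 9.8, 15.0, 18.7]

membrane_mask = [in_membrane(z, half_thickness) for z in ca_z]
```

Because the coordinate system is centred on the bilayer, only the z-coordinate of each α-carbon is needed for this test.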
RCSB PDB files have α-helix annotation information whereas OPM PDB files have transmembrane region information. When these two pieces of information are brought together, transmembrane α-helical regions can be properly identified and annotated. Each residue (or more specifically, the α-carbon associated with each residue) of the 667 α-helical transmembrane proteins was annotated as part of an α-helix, as part of a transmembrane region, as part of a transmembrane α-helix (both), or as having no annotation (neither).
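A minimal sketch of this four-way annotation is given below, together with a helper that collects runs of consecutive residues sharing a label so that helices of a given length can be extracted. The function names and the per-residue flags are hypothetical, and all label strings other than 'Alpha Helix' are placeholders chosen for illustration.

```python
from itertools import groupby


def annotate_residue(in_helix, in_membrane):
    """Combine the helix flag and the membrane flag into one label."""
    if in_helix and in_membrane:
        return "Transmembrane Alpha Helix"  # both
    if in_helix:
        return "Alpha Helix"                # helix outside the membrane
    if in_membrane:
        return "Transmembrane"              # membrane region, not helical
    return "None"                           # neither


def segments(labels, target):
    """Return (start_index, length) for each run of consecutive `target` labels."""
    runs = []
    index = 0
    for label, group in groupby(labels):
        run_length = len(list(group))
        if label == target:
            runs.append((index, run_length))
        index += run_length
    return runs


# Hypothetical per-residue flags for a short stretch of eight residues.
helix_flags = [False, True, True, True, True, True, False, False]
membrane_flags = [False, False, True, True, True, False, False, True]

labels = [annotate_residue(h, m) for h, m in zip(helix_flags, membrane_flags)]
tm_helix_runs = segments(labels, "Transmembrane Alpha Helix")
```

In this toy input, residues 2 to 4 carry both flags, so `tm_helix_runs` contains a single run starting at index 2 with length 3.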
Once annotation is complete, the output files are imported into MATLAB for structural alignment. To prepare the input data for PCA, N α-helices of equal amino acid length (L) must first be superposed. The goal is to optimally overlay each candidate α-helix (represented as an L by 3 matrix) with the ideal α-helix using only translations and rotations. We parameterized an ideal α-helix with a periodicity Δθ of 3.6 residues per helix turn, a rise Δz …

The entire methodology outlined above was repeated for two other types of α-helices: extramembrane α-helices and α-helices in soluble proteins. This was done to verify Emberly et al.'s results [4] on α-helix deformation modes and to highlight any potential differences in α-helix flexibility that would arise from its dependence on the surrounding environment. The 667 PDB entries that were used to collect transmembrane α-helix data were also used to collect extramembrane α-helix data. α-carbon atomic coordinates annotated with 'Alpha Helix' in S8B Fig were used as extramembrane α-helix data for import into MATLAB for superposition as well as for PCA. 959 PDB entries were consulted to acquire the data for α-helices in soluble proteins. Files resembling the one in S8B Fig for soluble proteins were prepared in Python 3, and the data were imported into MATLAB for superposition and PCA as outlined in the above methodology.
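The superposition step described above, overlaying a candidate α-helix on an ideal α-helix using only a translation and a rotation, can be sketched with the Kabsch algorithm. This is a swapped-in standard technique: the text does not state which algorithm was used, and the study itself performed this step in MATLAB. The function names are hypothetical. The rise of 5.4 Å per turn is the typical value quoted for α-helices and the α-carbon radius of 2.3 Å is a commonly cited approximation; both are assumptions here rather than the parameters of this study.

```python
import numpy as np


def ideal_helix(n_residues, residues_per_turn=3.6, rise_per_turn=5.4, radius=2.3):
    """Alpha-carbon coordinates (L x 3) of an ideal helix, centroid at the origin.

    rise_per_turn and radius (angstroms) are assumed values.
    """
    i = np.arange(n_residues)
    theta = 2.0 * np.pi * i / residues_per_turn
    z = (rise_per_turn / residues_per_turn) * i
    coords = np.column_stack((radius * np.cos(theta), radius * np.sin(theta), z))
    return coords - coords.mean(axis=0)


def superpose(candidate, reference):
    """Translate the candidate centroid to the origin, then rotate it onto the
    centred reference with the least-squares rotation (Kabsch algorithm)."""
    p = candidate - candidate.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T


# Usage: displace an ideal helix by a known rigid motion, then recover it.
reference = ideal_helix(12)
angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
rot_x = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(angle), -np.sin(angle)],
                  [0.0, np.sin(angle), np.cos(angle)]])
moved = reference @ (rot_z @ rot_x).T + np.array([5.0, -3.0, 2.0])
aligned = superpose(moved, reference)
```

Because the test helix differs from the reference only by a rigid motion, `aligned` coincides with `reference` to numerical precision. For real candidate α-helices a nonzero residual remains, and it is this residual deformation that the PCA then analyzes.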
Once the main deformation modes of each α-helix type were characterized as shown in Fig 1, the scaling behaviour of each mode was studied for each α-helix type (i.e., the relationships between eigenvalues (λ) and α-helix length (L) were elucidated). The scaling exponents recorded in Table 1 were calculated using a log-log plot of the PCA mode eigenvalues against the α-helix lengths (10 ≤ L ≤ 25) using the Curve Fitting Toolbox in MATLAB. The three dominant deformation modes were inspected individually under a power law function. When the eigenvalue data were fit to the relationship log(λ) = a log(L) + b, the parameter a was the appropriate scaling exponent to fulfill the λ ∝ L^a relationship in Table 1.

In the first step of superposition, the centroid of the candidate α-helix is translated to the origin. (C) In the second step of superposition, the candidate α-helix is rotated with respect to the ideal α-helix. (D) The displacement between the z-coordinate of α-carbon 6 in candidate α-helix 3 of the sample and the z-coordinate of α-carbon 6 in the mean α-helix is one of many data points in the raw data for PCA. (E) The raw data for PCA is an N by 3L matrix recording the displacements between each atomic coordinate of the transformed candidate α-helix and the corresponding atomic coordinate in the mean α-helix. (TIF)

S1 Table. The power law relationship between the eigenvalues (λ) of the first three deformation modes and the α-helix length (L).
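The log-log fit used to obtain the scaling exponents can be sketched as an ordinary least-squares line through (log L, log λ). This is an illustration under stated assumptions: the study used the Curve Fitting Toolbox in MATLAB, whereas this sketch uses a hand-written regression. The function name is hypothetical, and the eigenvalues are synthetic numbers generated from an arbitrary exponent, not results from the paper.

```python
import math


def fit_power_law(lengths, eigenvalues):
    """Fit log(lambda) = a * log(L) + b by ordinary least squares.

    Returns (a, b); a is the scaling exponent in lambda ∝ L^a.
    """
    xs = [math.log(length) for length in lengths]
    ys = [math.log(value) for value in eigenvalues]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic eigenvalues (assumption): an exact power law with an arbitrary
# exponent of 3.0 and prefactor 0.01, for helix lengths 10 <= L <= 25.
lengths = list(range(10, 26))
synthetic_eigenvalues = [0.01 * length ** 3.0 for length in lengths]

exponent, intercept = fit_power_law(lengths, synthetic_eigenvalues)
```

On this noise-free input the fit recovers the exponent used to generate the data, and exp(`intercept`) recovers the prefactor. With real eigenvalues the same call would be made once per deformation mode and per α-helix type.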