Conformational Preference of ‘CαNN’ Short Peptide Motif towards Recognition of Anions

Among several ‘anion binding motifs’, the recently described ‘CαNN’ motif occurring in the loop regions preceding a helix, is conserved through evolution both in sequence and its conformation. To establish the significance of the conserved sequence and their intrinsic affinity for anions, a series of peptides containing the naturally occurring ‘CαNN’ motif at the N-terminus of a designed helix, have been modeled and studied in a context free system using computational techniques. Appearance of a single interacting site with negative binding free-energy for both the sulfate and phosphate ions, as evidenced in docking experiments, establishes that the ‘CαNN’ segment has an intrinsic affinity for anions. Molecular Dynamics (MD) simulation studies reveal that interaction with anion triggers a conformational switch from non-helical to helical state at the ‘CαNN’ segment, which extends the length of the anchoring-helix by one turn at the N-terminus. Computational experiments substantiate the significance of sequence/structural context and justify the conserved nature of the ‘CαNN’ sequence for anion recognition through “local” interaction.

Analyses of multiple families of proteins with different folds have shown that specific sites, mainly comprising the main-chain atoms of three to four amino acid residues, participate in the formation of a "functional surface" where anions usually reside, and that the interaction is mediated by proper positioning of the polypeptide backbone [11][12][13][14]. Among the several recognized 'anion binding motifs' (e.g. 'CαNN' [13], 'nests' [14][15][16], 'structural P-loop' [17] and 'cup' [18], along with motifs for recognizing adenine, adenine-containing nucleotides and their analogues [12,19,20]), the recently identified 'CαNN' motif [13], common to more than 100 fold-representative protein structures in the FSSP database, has a particularly characteristic feature. This motif consists of main-chain atoms of three consecutive residues (Cα₋₁, N₀, N₊₁), is often present in the active sites of proteins and participates directly in several key regulatory functions [1][2][3][21].
This 'CαNN' motif, evolutionarily conserved in sequence (having the GXX 'sequence motif') and conformation, normally occurs in a loop region preceding a helix (the anchoring helix) [13,22,23]. Further, this motif usually has the spatial geometry of a right-handed βαα or βαβ backbone conformation; upon interaction with an anion it undergoes a conformational change from a non-helical to a helical state at the 'CαNN' segment, which extends the anchoring helix by one turn towards its N-terminus.
Fifty years ago, the chemistry Nobel laureate C.B. Anfinsen hypothesized that "… the information … of the native secondary and tertiary structures (of proteins) is contained in the amino acid sequence itself" [24]. As the 'CαNN' motif is conserved in sequence and conformation through evolution, it is worth asking whether the information for its anion recognition, and for the conformational change that accompanies it, is embedded in the local sequence. Moreover, whether the protein tertiary structure participates in the N-terminal helix extension upon anion binding also remains to be established. In a recent publication [25], using complementary spectroscopic techniques, we presented evidence for the interaction of a sulfate (SO₄²⁻) ion with a 'CαNN' motif segment (Gly-Lys-Gln from protein 1MUG) in an 18-residue designed chimeric peptide sequence in a context-free system, as reported in protein crystal structures.
To assess the significance of the conserved sequence pattern in the 'CαNN' motif, and to characterize whether the interaction of an anion with this protein segment is 'local' or 'global', we report here, using computational and experimental approaches, the interactions of both the sulfate (SO₄²⁻) and phosphate (HPO₄²⁻) ions (anions required by all cells for maintaining their normal function) [26] with several chimeric polypeptide sequences in which the residues of the 'CαNN' anion-binding structural motif have been appended at the N-terminus of context-free designed sequences. Molecular docking and molecular dynamics (MD) simulation studies have been employed to validate the nature of the anion-binding interaction and its thermodynamic feasibility, and to monitor the accompanying conformational changes in kinetic detail, while ESI-MS experiments are used to confirm the binding of the anion(s) to the peptide. We describe here the conformational preference of the 'CαNN' motif during anion recognition, which substantiates that the information regarding its anion recognition is embedded in the 'local sequences'. This study should help in understanding the sequence/structural context of anion recognition in proteins as proposed [12,13,14], along with the relevance of the associated thermodynamic parameters in the binding interaction.

Results and Discussion
In order to rationalize the conformational preference of the 'CαNN' motif during its interactions with anion(s) [sulfate (SO₄²⁻) and phosphate (HPO₄²⁻) ions [11][12][13][16]], a series of context-free 18-residue and 5-residue chimeric polypeptide sequences containing the naturally occurring 'CαNN' anion-binding structural motif have been designed. Interaction with the anion(s) is studied using different computational approaches, and the results are discussed below.
However, the 5-residue sequence (SCPS224Ac) of the 'experimental NMR structure' has been obtained by truncating the anchor helix part of the 'experimental NMR structure' of CPS224Ac.
b. Model structures. Two model structures have been generated for each 18-residue chimeric peptide (CPS224Ac, CPS226 and CPS228) using Accelrys Discovery Studio 2.5.5 [36]. In the 'native model' structure, the φ, ψ dihedral angles of the first four residues ('CαNN' segment) at the N-terminus have been fixed to the φ, ψ dihedral angles of the individual residues found in the respective crystal structure: PDB (Protein Data Bank) codes 1MUG, 1YCC and 1JW9, respectively (Table 1) [1,34,35]. In the 'extended model' structure, the φ, ψ dihedral angles of the same four residues have been fixed to 180°, 180°. In both cases the remaining 14 residues at the C-terminus have been fixed as right-handed α-helices (φ = −57° and ψ = −47°). The backbone geometry of both model structures has been validated using the DSSP program [37].
Short five-residue sequences (SCPS224Ac, SCPS226 and SCPS228) of the 'native model' structure and of the 'extended model' structure have been obtained similarly by truncating the anchor helix part from the respective conformation of CPS224Ac, CPS226 and CPS228.
Docking experiments, using AutoDock 4.2 [38,39], have been performed between the 'experimental NMR structure' of CPS224Ac [25] and the anion(s) (separately with the sulfate and the phosphate ion) using several grid sizes, to identify the anion recognition site in the peptide and to estimate the associated interaction parameters. The results show that, in all 250 iterations for each of the sulfate and phosphate ions and for all grid sizes, only the 'CαNN' motif segment, present at the N-terminus of the context-free helical peptide, recognizes the sulfate/phosphate ion, through non-covalent interaction mediated by hydrogen bonds (H-bonds) (Figure 1a, b). MGL Tools (version 1.5.4) [40], used for monitoring the interaction of the anion (sulfate/phosphate ion) with the 'experimental NMR structure' of peptide CPS224Ac, confirms that, for both anions, two of the four oxygen atoms interact simultaneously with the constituent atoms of the 'CαNN' motif (Cα-H atom of Gly2, main-chain N-H atom of Lys3 and main-chain N-H atom of Gln4) of peptide CPS224Ac (Figure 1a, b), as observed for the corresponding atoms (Cα-H atom of Gly108, main-chain N-H atom of Lys109 and main-chain N-H atom of Gln110) in the crystal structure of DNA glycosylase (1MUG) (Figure S2). Of these two interacting oxygen atoms, in each set of 250 docked conformers, one interacts concurrently with the Cα-H atom of Gly2 and the main-chain N-H atom of Lys3, while the other interacts only with the main-chain N-H atom of Gln4, as observed in the crystal structure of 1MUG and proposed for the 'CαNN' motif [1,13].
This result supports the observation of our earlier NMR study on CPS224Ac [25], which showed that upon interaction with the sulfate ion only the chemical shifts of the main-chain N-H and CαH atoms of Lys3 and of Gln4 were altered (a downfield shift for N-H and an upfield shift for CαH, compared with the values in the absence of the sulfate ion), while the others, including the ε-CH₂ and side-chain NH₂ of the Lys residues, remained unchanged (Figure S1). However, the NMR results cannot resolve exactly how the oxygen atoms of the sulfate ion interact, a detail that this docking experiment provides.

II. Molecular Docking Experiment
Interacting parameters. Interaction between the 'CαNN' (Cα₋₁, N₀, N₊₁) motif of the 'experimental NMR structure' of CPS224Ac and the sulfate/phosphate ion is mediated through H-bonds: the Cα₋₁-H···O H-bond is of the weak type [41,42], while the two N-H···O H-bonds are of the conventional (moderate/strong) type, as observed in the respective crystal structure [13]. Distance and angle parameters between each oxygen atom of the sulfate (SO₄²⁻)/phosphate (HPO₄²⁻) ion and the constituent atoms of the 'CαNN' motif (Cα-H atom of Gly2, main-chain N-H atom of Lys3 and main-chain N-H atom of Gln4) are calculated for each set of 250 docked conformers of the 'experimental NMR structure' of CPS224Ac (Figure 2a, b, c, d and Table 2) to describe the H-bond interactions, as in the crystal structures, where the interactions are identified on the basis of distance ((X)H···O < 3 Å) and angle (X-H···O > 90°) constraints [13].
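The geometric criterion quoted above can be illustrated with a minimal sketch. The function names and the example coordinates below are hypothetical and are not taken from the docking output; only the two thresholds ((X)H···O < 3 Å and X-H···O > 90°) come from the text.

```python
import math

def angle_deg(x, h, o):
    """Angle X-H...O (in degrees) measured at the hydrogen atom."""
    v1 = [a - b for a, b in zip(x, h)]  # vector H -> X
    v2 = [a - b for a, b in zip(o, h)]  # vector H -> O
    dot = sum(p * q for p, q in zip(v1, v2))
    n1 = math.sqrt(sum(p * p for p in v1))
    n2 = math.sqrt(sum(q * q for q in v2))
    cos_a = max(-1.0, min(1.0, dot / (n1 * n2)))  # guard against rounding
    return math.degrees(math.acos(cos_a))

def is_hbond(x, h, o, max_dist=3.0, min_angle=90.0):
    """Apply the distance and angle constraints quoted in the text."""
    return math.dist(h, o) < max_dist and angle_deg(x, h, o) > min_angle

# Hypothetical coordinates (in Angstrom), for illustration only.
donor, hydrogen, oxygen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)
print(is_hbond(donor, hydrogen, oxygen))
```

Applied to every docked conformer, such a check would give the per-conformer distances and angles that are then summarized statistically.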
The (X)H···O distances (< 3 Å) and the almost linear ∠X-H···O angles (> 120°) (where X = Cα₋₁/N₀/N₊₁) (Table 2) obtained for the 250 anion (sulfate/phosphate) docked conformers of CPS224Ac ('experimental NMR structure') have been fitted with a normal distribution (Figure 2a, b, c, d), for which the mean value (μ) and the standard deviation (σ) are shown in the inset. The goodness of fit, given by the adjusted R² values in Figure 2 (a, b, c, d), shows that for both anions the (X)H···O distances and ∠X-H···O angles largely follow a normal distribution in which almost all values cluster around the mean with very little variance, and comply well with the ranges observed for such interactions in the crystal structures reported in the literature [13]. The low value of σ/μ (~10⁻²) indicates that the mean can be taken as representative of the actual value, which matches well with that observed in the crystal structure of DNA glycosylase (PDB ID 1MUG) [1] resulting from the interaction of a sulfate ion (Table S1, Figure S2). The interaction, thus mediated through H-bonds, indicates that the docked structures are well justified and are therefore good candidates for MD simulation.
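The σ/μ criterion can be made concrete with a short sketch. The distances below are invented placeholder values, not data from Table 2, and the helper name is an assumption; the sketch only shows how μ, σ and their ratio would be obtained from a set of docked-conformer distances.

```python
from statistics import NormalDist

def summarize(values):
    """Estimate mu and sigma of a normal distribution from samples
    and return them with the relative spread sigma/mu."""
    fitted = NormalDist.from_samples(values)
    return fitted.mean, fitted.stdev, fitted.stdev / fitted.mean

# Placeholder H...O distances in Angstrom (illustrative only).
distances = [2.00, 2.02, 1.98, 2.01, 1.99]
mu, sigma, rel = summarize(distances)
print(f"mu = {mu:.3f} A, sigma = {sigma:.3f} A, sigma/mu = {rel:.3f}")
```

A ratio of the order of 10⁻² means the spread is about one percent of the mean, which is the sense in which the mean can stand in for the individual values.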
Similar ranges of interacting parameters [(X)H···O distances and ∠X-H···O angles (where X = Cα₋₁/N₀/N₊₁)] are also observed for each set of 250 individual docked conformations of SCPS224Ac (the 5-residue short 'experimental NMR structure') for both the anions. This validates the idea that the 'CαNN' sequence with comparable conformation, even when the sequence is very short, can recognize the anion through H-bond interaction in a similar manner (Table S2 and Figure S3). Table 2). MGL Tools (version 1.5.4) [40] indicates that out of the four oxygen atoms of the sulfate/phosphate ion, two are interacting simultaneously with the constituent atoms of the 'CαNN' motif (Cα₋₁ Gly2/N₀ Lys3/N₊₁ Gln4) only. Out of these two oxygen atoms, one is interacting concurrently with the Cα-H atom of Gly2 and the main-chain N-H atom of Lys3, while the other interacting oxygen atom interacts only with the main-chain N-H atom of Gln4, similar to that observed in the 'experimental NMR structure' and the reported crystal structure (1MUG) [1], and proposed for the 'CαNN' motif [13].
For the short model structures (native as well as extended), similar interaction between the sulfate/phosphate ion and the 'CαNN' motif segment is observed. However, the corresponding values of the parameters vary according to the conformation of the 'CαNN' motif segment and the interacting anion, as observed for the respective 18-residue chimeric analogue (Table S2).
Interacting parameters. Similar ranges of (X)H···O distance and ∠X-H···O angle constraints (where X = Cα₋₁/N₀/N₊₁) are obtained for the 'native model structure' of CPS224Ac (Table 2) as well as for its 'short native model structure' (SCPS224Ac) (Table S2) (Table 2 and Table S2), with respect to those of the 'experimental NMR structure' and the 'native model structure', support the conclusion. This corroborates the idea that 'helical' conformation at the 'CαNN' motif segment triggers favourable interaction with the anion.
a.i.3. Estimated free energy of binding/Calculated apparent binding energy. Although the magnitude of the 'binding free energy' varies with the conformational state of the 'CαNN' motif segment as well as with the interacting anion (Table 2), a single cluster of 'free energy of binding' values (calculated by the program built into the AutoDock module) [43] is obtained for each set of 250 docked conformers obtained from the 'experimental NMR structures', 'native model structures' and 'extended model structures' of CPS224Ac as well as their short sequence (SCPS224Ac), for both the sulfate and the phosphate ion. This clearly establishes the presence of only one site (the '-Gly-Lys-Gln-' segment of the 'CαNN' motif at the N-terminus of CPS224Ac and SCPS224Ac) for anion recognition in the overall sequence, and thus emphasizes the 'local' nature of the interaction.
The average value(s) of the 'estimated free energy of binding' obtained from the interaction of the sulfate/phosphate ion with the different structures of CPS224Ac, reported in Table 2 (for SCPS224Ac, Table S2), may not be a realistic absolute measure; however, they give a relative estimate of the anion-binding potential of the 'CαNN' motif in its different conformational states, as well as of its anion preference. The observed difference of ~2 kcal/mol in average free energy of binding between the 'experimental NMR'/'native model' conformers and the 'extended model' structure indicates that the 'helical' conformation at the 'CαNN' segment has a higher affinity for the anion than the 'non-helical' conformation (the two differ in the backbone dihedral angles of the interacting segment). This would rationalize the extension of the helical structure into the 'CαNN' segment from a non-helical one, to accommodate the tetrahedral sulfate/phosphate ion for favourable interaction, as proposed by Denessiouk et al. [13].
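To give a feel for what a ~2 kcal/mol difference would mean if it were a true free energy difference, the standard relation K₁/K₂ = exp(−ΔΔG/RT) can be evaluated. This is a rough illustration only: docking scores are relative estimates, as noted above, and the temperature of 298.15 K and the function name are assumptions made for the example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def affinity_ratio(ddg_kcal_per_mol, temperature_k=298.15):
    """Ratio of binding constants implied by a free energy difference
    ddG = dG(state 1) - dG(state 2); negative ddG favours state 1."""
    return math.exp(-ddg_kcal_per_mol / (R_KCAL * temperature_k))

# Helical versus extended conformer, taking ddG ~ -2 kcal/mol from the text.
print(round(affinity_ratio(-2.0), 1))
```

Under these assumptions the helical conformer would bind roughly 30-fold more strongly than the extended one; the number should be read as an order-of-magnitude guide, not as a measured quantity.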
a.ii. Peptide CPS226 and CPS228. The 'model structures'. The appearance of a single 'binding free energy' cluster for each set of 250 docked conformers for both the 'model structures' ('native' and 'extended') of the CPS226 and CPS228 peptides, along with their short sequences, confirms that the 'CαNN' segment (Ser-Ala-Lys for CPS226/SCPS226 and Gly-Gly-Leu for CPS228/SCPS228) [34,35] appended at the N-terminus of the sequences acts as the only interacting site for the sulfate/phosphate ion, as observed for CPS224Ac (and also SCPS224Ac) (Figure S3, S4, S5). Magnitudes of the average 'binding free energy' as well as the X-H···O (where X = Cα₋₁/N₀/N₊₁) interacting parameters (distances and angles) vary depending on the conformation as well as the sequence of the 'CαNN' motif segment in the peptide and the interacting anion (Table 2) Table 2, Table S1). Short sequences of both the peptides also show similar interactions (Table S2, Figure S3).
However, comparatively poor interaction of anion (sulfate/phosphate ion) with the 'CαNN' motif segment in the 'extended model' structures is observed for both the sequences (CPS226,

The ranges of the parameters comply well with those obtained for the respective crystal structures and described by Denessiouk et al. [13] (Table 2 and Table S2), as observed for CPS224Ac and SCPS224Ac.
b. Interactions at a glance. In spite of the presence of three Lys residues in the anchoring helix, and regardless of the hydrophilic or hydrophobic nature of the amino acids in the 'CαNN' motif, it is found that in all the peptides only the consecutive Cα₋₁, N₀ and N₊₁ main-chain atoms of the 'CαNN' motif located at the N-terminus of the anchor helix participate in anion recognition. One might consider that the positive end of the helix macro-dipole contributes to the anion recognition [44]. However, a more plausible explanation is the interaction of the anion with the localized microdipoles arising from the NH (CONH) and CαH of the protein/peptide main chain through H-bonds, which are to a large extent electrostatic in nature [45,46], since changes in charge at the C-terminus of the helix have been found not to affect sulfate binding in proteins [47]. In addition, the more acidic nature of the H of the main-chain CONH in comparison with that of the side-chain ε-NH₂ of Lys would favour anion recognition by the main-chain NH (as CONH) over the side-chain NH₂, through formation of a stronger H-bond (N-H···O), which is explicitly observed in our study. This observation is in contrast with the recent anion-binding study of Lys-incorporated hexapeptides, where Bianchi et al. [16] 'unexpectedly' observed the participation of the Lys side-chain ε-NH₂ along with the main-chain CONH in the interaction with the HPO₄²⁻ anion. Thus the X-H···O (X = Cα₋₁, N₀ and N₊₁) distance and angle parameters obtained for the recognition of the sulfate and the phosphate ion separately by the 'CαNN' segment through H-bonds (Table 2, Figure 2 and Table S2), which comply well with those reported for the respective crystal structures [13] (Table S1), emphasize that the 'CαNN' motif segment has an intrinsic affinity for the anions and that the information regarding anion binding is embedded in its local sequences.
Thus one can justify the 'conserved nature' of the 'CαNN' sequence for anion recognition through 'local' interaction.
However, a consensus can be drawn from the comparison of the binding free energies (which give a relative estimate of the feasibility of interaction) obtained from the docking experiments (Table 2, Table S2). Interaction with the sulfate ion is likely to be preferred by the 'CαNN' motif segment over that with the phosphate ion, although both interactions are thermodynamically favorable. This is similar to the observation of Demuth et al. [48], which showed the dominance of the interaction of the sulfate ion over the phosphate ion with a designed oligopeptide in which hydrophilic side chains (Ser, Arg) play a crucial role in the intermolecular interaction through salt bridges. Moreover, the observed preference for the sulfate ion is also consistent with the fact that, although the 'CαNN' motif is considered a novel structural motif in proteins for recognition of the phosphate ion, in several cases a sulfate ion occupies the anion interaction position during crystallization of these proteins [13].
Our study thus highlights the significance of the conserved nature and the conformational preference of the 'CαNN' motif sequence for recognition of the anion(s), as suggested by Denessiouk et al. [13]. The comparatively 'poor' interaction between the 'extended model' structure of all the peptides (CPS224Ac and SCPS224Ac; CPS226 and SCPS226; CPS228 and SCPS228) and the anion(s) (Table 2 and Table S2) clearly corroborates that a helical conformation at this segment is required for thermodynamically favorable anion recognition through H-bonds by the 'CαNN' motif. This strongly supports the extension of the 'helical' structure into the 'CαNN' motif segment from the 'non-helical' structure as a result of a co-operative effect initiated by the approach of the anion towards the motif segment and augmented by the attached helix, as observed in the interaction of the sulfate ion with CPS224Ac using CD and NMR spectroscopy [25]. This point is further clarified below from MD simulations.

III. Molecular Dynamics Experiment
Computer simulated Molecular Dynamics (MD) study is one of the best theoretical approaches for investigating the protein/ peptide conformational microstates as well as the atomic-level details during the binding interaction from kinetic approach [49][50][51]. Moreover, simulations with explicit water help to obtain more realistic picture of the interaction.
To investigate the anion-induced helical conformation at the 'CαNN' motif segment, MD simulations (for 40 ns at 276 K) are carried out starting from the sulfate ion-bound conformer(s) of the 'experimental NMR structure' of CPS224Ac obtained from the docking experiments. The results, summarized in Figure 3 and Figure 4, show that during the 'sulfate-peptide' interaction (~400 ps), of the two interacting oxygen atoms of the sulfate ion, one interacts concurrently with the Cα atom of Gly2 and the main-chain N atom of Lys3, while the second interacts only with the main-chain N atom of Gln4 of the 'CαNN' motif through H-bonds (Figure 3a), similar to that found in 1MUG (Figure S2) and reported for the anion-'CαNN' motif interaction [13] (a few snapshots of the MD trajectory are shown in Figure S6). This establishes that the 'CαNN' anion-binding motif, comprising the Cα₋₁, N₀ and N₊₁ backbone atoms from three consecutive residues (Gly-Lys-Gln), binds the anion through 'local' interactions even in a context-free system, as observed in the docking experiments as well as in the NMR experiment [25] of CPS224Ac with the sulfate ion. However, once the sulfate ion leaves the 'CαNN' motif, it remains a free sulfate ion until 40 ns and does not bind to any group/atom, not even to the ε-NH₂ group of Lys, which is consistent with the single interacting site observed in the docking experiment. To confirm the statistical significance of this sulfate ion-'CαNN' interaction, additional MD experiments have been carried out with the bound anion for different durations (20-40 ns) and with different initial velocities. A similar type of interaction of the sulfate ion with the 'native model' structure of CPS224Ac (residence time ~1000 ps) at 276 K (Figure S7) is also observed.
To assess the residue-specific conformational microstates observed during the simulation (~40 ns) of the 'experimental NMR structure' of CPS224Ac with the sulfate ion, the backbone dihedral angles (φ, ψ) are plotted on a Ramachandran plot (Figure 3b). It can be seen that the backbone dihedral angles for the residues Ala5-Aib15 are mostly confined to the right-handed helical conformation throughout the simulation. However, an interesting scenario is observed for the 'CαNN' residues, especially Gln4 and Lys3, although the starting conformation (sulfate ion docked structure) of -Gly-Lys-Gln- is β-αR-αR, as described by Denessiouk et al. [13]. The backbone dihedral angles (φ, ψ) of Gln4 (Figure 3b, details not shown) are distributed among the right-handed helical conformation (up to ~400 ps), a non-canonical helical conformation (~400-7000 ps) and the PPII conformation (~7000-40,000 ps), while for Lys3 the φ and ψ values remain in the right-handed helical conformation [φ = −70° (±15), ψ = −57° (±12)] as long as the sulfate ion interacts with the 'CαNN' segment (Gly-Lys-Gln) of CPS224Ac (~400 ps). Once the sulfate ion ceases to interact with the peptide segment, the φ and ψ values of Lys3 undergo an immediate transition from the helical form to the PPII conformation [φ = −70° (±15), ψ = 150° (±20)] (Figure 3a, c) and remain there until the end of the simulation (400-40,000 ps). Similar distributions of backbone dihedral angles are obtained for the 'native model' structure with sulfate ion interaction during the MD simulations (data not shown).
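The assignment of a (φ, ψ) pair to a conformational region can be sketched as follows. The rectangular windows used here are simplifying assumptions chosen to enclose the values quoted above for the helical and PPII states; they are not the region definitions used to prepare Figure 3, and the function name is hypothetical.

```python
def classify_phi_psi(phi, psi):
    """Assign a backbone (phi, psi) pair, in degrees, to a coarse region.
    The window boundaries are illustrative assumptions."""
    if -120.0 <= phi <= -30.0 and -90.0 <= psi <= -10.0:
        return "alpha_R"   # right-handed helical region
    if -110.0 <= phi <= -40.0 and 110.0 <= psi <= 180.0:
        return "PPII"      # polyproline II region
    return "other"

# Values taken from the text: helical Lys3, PPII Lys3, and the
# fully extended (180, 180) conformation of the 'extended model'.
for phi, psi in [(-70.0, -57.0), (-70.0, 150.0), (180.0, 180.0)]:
    print(phi, psi, classify_phi_psi(phi, psi))
```

Applying such a classifier frame by frame to a residue gives a time series of states from which the helical-to-PPII transition time can be read off.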
The NH-NH (i, i+1) distance (< 4 Å) is a good measure of the prevalence of the helical conformation [31]. Tracking this Euclidean distance, it is observed that the NH-NH distance between Lys3 and Gln4 remains ~2.5 Å as long as the sulfate ion is attached to the motif sequence (~400 ps). It suddenly increases to > 4.5 Å once the sulfate ion moves away from the peptide segment (~400 ps) and remains large (> 4.5 Å) until the end of the simulation (Figure 4). An increase in the NH-NH distance between Gln4 and Ala5 from ~2.5 Å to ~3.8 Å is also observed once the interaction with the sulfate ion fades away. The concurrent increase in the NH-NH distances between Lys3-Gln4 and Gln4-Ala5 reflects the destabilization of the helical population at the 'CαNN' motif resulting from the loss of the sulfate ion interaction. However, the restriction of the Gln4 (φ, ψ) conformation to a non-canonical helical form even in the absence of the sulfate ion (up to ~7 ns), supported by the NH-NH distance between Gln4 and Ala5 (Figure 4), may be attributed to the presence of the neighboring helicogenic α-aminoisobutyric acid (Aib/B) residue (Aib6) [29,30]. This effect of Aib on Gln4 can be inferred from the MD simulations of the short five-residue sequence SCPS224Ac with the sulfate ion, where ejection of the sulfate ion from the 'CαNN' segment (~400 ps) immediately shifts Gln4 towards a non-helical conformation (PPII) (Figure S8).
The MD results thus corroborate that the helical conformation of the 'CαNN' motif segment embedded in the context-free peptide sequence is sustained only during the time of sulfate ion interaction [~400 ps for the 'experimental NMR structure' and ~1000 ps for the 'native structure'; Figure 3 and Figure S7]. On ejection of the sulfate ion, the helical conformation of the 'CαNN' motif undergoes an immediate transition to a non-helical 'extended structure', and at no point does the segment acquire any nascent form of helical structure despite the long simulation time (40 ns). This strongly suggests that the 'CαNN' motif peptide segment itself has limited or no intrinsic helical propensity, and supports a cooperative transition of the 'CαNN' motif into helix, induced by anion binding. In this new conformation the anchoring helix is extended by one additional turn towards the N-terminus, as proposed for the 'CαNN' motif by Denessiouk et al. [13] and observed in the CPS224Ac-sulfate ion interaction through NMR spectroscopy [25]. The observed m/z values [for the sulfate ion, m/z 926.5 with an isotopic spacing of 0.5 indicating a doubly charged species; for the phosphate ion, m/z 926.4 with an isotopic spacing of 0.5 indicating a doubly charged species] obtained in ESI-MS experiments confirm the interaction of the sulfate and phosphate ions with the peptide CPS224Ac (Figure 1c, d). As the biophysical experiments confirm anion recognition, the loss of the sulfate ion in the MD experiments may be interpreted as an indicator of the lack of intrinsic helical propensity of the 'CαNN' segment rather than of a lack of anion binding, since the force field evidently does not fully capture the experimental observation.
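The way a helix-loss time can be read from an NH-NH distance trace is sketched below. The trace is an invented toy series, the function name is hypothetical, and the 4 Å cutoff is the helical criterion quoted above; the 1 ps frame spacing matches the trajectory saving interval given in the Methods.

```python
def last_helical_time(distances, dt_ps=1.0, cutoff=4.0):
    """Return the time (ps) of the last frame whose NH-NH distance is
    below the helical cutoff, or None if no frame satisfies it."""
    last = None
    for frame, d in enumerate(distances):
        if d < cutoff:
            last = frame
    return None if last is None else last * dt_ps

# Toy trace in Angstrom: helical for three frames, then extended.
trace = [2.5, 2.6, 2.4, 4.6, 4.8]
print(last_helical_time(trace))
```

On a real trajectory, transient fluctuations below the cutoff after the anion has left would shift this estimate, so a smoothed trace or a minimum dwell time may be preferable.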
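The charge-state assignment from the isotopic spacing follows from the fact that neighbouring isotope peaks differ by about one mass unit, so they appear about 1/z apart on the m/z axis. A minimal sketch (function names are hypothetical):

```python
def charge_from_isotope_spacing(spacing_mz):
    """Estimate the charge state z from the m/z spacing between
    adjacent isotope peaks (spacing is approximately 1/z)."""
    return round(1.0 / spacing_mz)

def total_ion_mass(mz, charge):
    """Mass of the charged species: m/z multiplied by z."""
    return mz * charge

z = charge_from_isotope_spacing(0.5)   # spacing reported in the text
print(z, total_ion_mass(926.5, z))     # sulfate adduct peak from the text
```

A spacing of 0.5 therefore corresponds to z = 2, in line with the doubly charged species reported for both adducts.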

IV. Conclusion
The current work primarily substantiates that the information regarding anion recognition by the 'CαNN' motif (found in naturally occurring proteins) is embedded within its primary structure, and justifies the 'conserved nature' of the sequences in the 'CαNN' segment. Complementary computational techniques along with ESI-MS experiments clearly establish that, even in the absence of a proteinaceous environment ('CαNN' segments appended as a part of an isolated helix, without any tertiary structural effect), the 'CαNN' motif sequences can recognize the sulfate/phosphate ion through 'local' interactions, corroborating its intrinsic affinity for the anions. In the MD simulation, the detailed view of the loss of helical structure at the N-terminus immediately after the interaction with the anion (sulfate ion) fades out confirms that a conformational switch from the non-helical to the helical state accompanies anion interaction at this 'CαNN' anion-binding motif segment, as proposed in the literature. The binding free energy (negative value) and the interaction parameters obtained from the computational study suggest that anion recognition/binding is thermodynamically favorable; however, it depends on the conformation of the motif segment and the nature of the interacting anion. The helical conformation induced at this four-residue segment of the 'CαNN' anion-binding motif upon anion binding may arise from a co-operative effect initiated during the approach of the anion towards the 'motif' segment and augmented by the anchoring helix, which may have implications for the nucleation of folding.
Moreover, the detailed picture of the stereochemistry of anion recognition obtained with these computational techniques, which is difficult to extract from ¹H-NMR data because the results of NMR experiments are weighted averages, is consistent with what has been proposed for the 'CαNN' anion-binding structural motif in proteins and validates our previous results on the sulfate ion interaction with the peptide CPS224Ac, as monitored by NMR and other spectroscopic techniques [25]. This study will aid our understanding of the influence of the sequence/structural context on anion binding in proteins, which not only sheds light on aspects of protein folding but also provides guidelines for designing 'peptide-based model scaffold' host receptors for anion-binding sites. Such scaffolds are likely to be adaptable for novel recognition purposes at the molecular level and usable in various biotechnological applications.

Docking
To characterize molecular recognition and binding of sulfate (SO 4 22 ) and phosphate (HPO 4 22 ) ion by the 'C a NN' anion recognition segment with different conformations in a series of designed peptide sequences, AutoDock 4.2 software package [38,39] is used using Mac OS X 10.6.6 as an operating system. AutoDock Tools (version 1.5.4) is used to prepare the input files for docking experiments (rigid docking). Atomic coordinates of sulfate (SO 4 22 ) and phosphate (HPO 4 22 ) ion are obtained from protein data bank and docked separately with 'experimental NMR structure' and 'native model structure' along with the 'extended model structure' of 18-residue chimeric and short 5-residue sequences. All the hydrogen atoms are added and partial charges (Gasteiger charge) are assigned in the standard PDBQT file format. Different grid values [i.) first eight residues (up to Lys8) from the N-terminus acetyl along with their side chains and ii.) first five residues (up to Ala5) from the N-terminus acetyl with their side chains for all 18-residue sequences, while for all short sequence up to Ala5 including side chains from the N-terminus acetyl] are used to confirm the anion recognition site in the peptides. Although for both the grid values, identical site of interaction along with similar binding energy [e.g. for 'experimental NMR structure' of CPS224Ac 24.04 Kcal/mol for the grid value up to Lys8 (Grid points: 44(x) 44(y) 44(z)); while 23.96 Kcal/mol for grid value up to Ala5 (Grid points: 40(x) 40(y) 40(z))] are obtained in all the respective cases; however, for comparison of interaction parameters among the peptides ( Table 2 and Table S2) results for the grid value containing up to Ala5 is shown (as there is only 5 residues in short sequence). A user-defined protocol of 150 randomly generated individuals with a maximum number of 2.5610 7 energy evaluations and a maximum number of 2.7610 4 generations is employed for running the docking programme. 
For each case, 250 anion-bound conformations (iterations) are generated using Genetic Algorithm (GA-LS) searches with a mutation rate of 0.02 and a crossover rate of 0.8. The docking results are validated using the same AutoDock package.
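The search settings quoted above can be collected in one place, which makes them easier to check against a docking parameter file. The Python sketch below is illustrative only and is not taken from the authors' input files: the keyword names are the genetic-algorithm keywords of the AutoDock 4 docking parameter file (DPF) format as we understand it, the mapping of the 250 generated conformations onto `ga_run` is an assumption, and the helper name `dpf_lines` is hypothetical. All other DPF lines (maps, ligand, local-search settings) are left out.

```python
# Sketch only: GA search settings quoted in the text, written as
# AutoDock 4 DPF-style "keyword value" lines. Everything not stated in
# the text (maps, ligand file, local-search parameters) is omitted.
GA_SETTINGS = {
    "ga_pop_size": 150,            # randomly generated individuals
    "ga_num_evals": 25_000_000,    # maximum energy evaluations (2.5 x 10^7)
    "ga_num_generations": 27_000,  # maximum generations (2.7 x 10^4)
    "ga_mutation_rate": 0.02,
    "ga_crossover_rate": 0.8,
    "ga_run": 250,                 # assumed mapping: 250 docked conformations per case
}


def dpf_lines(settings):
    """Return one 'keyword value' line per setting, in insertion order."""
    return [f"{key} {value}" for key, value in settings.items()]


if __name__ == "__main__":
    print("\n".join(dpf_lines(GA_SETTINGS)))
```

Keeping the values in a single dictionary means the same settings can be reused for every peptide conformation and anion, so that differences in binding energy are not caused by differences in the search protocol.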
All the images are developed using Avogadro (version 1.0.3) and Chimera (version 1.6.2) [52]. Further, the images are rendered using POV-Ray (version 3.6).

MD Simulations
Molecular dynamics studies are carried out on peptide-sulfate complexes: (a) sulfate bound to CPS224Ac ('experimental NMR' structure and 'native model' structure) and (b) sulfate bound to SCPS224Ac ('experimental NMR' structure), using version 4.0 of the GROMACS package [53,54] with the OPLS-AA force field [55]. The starting structure of each complex is the lowest-energy structure of the sulfate (SO₄²⁻) ion bound to the particular conformation of the peptide, obtained from the docking experiments. The sulfate (SO₄²⁻) ion is built using GaussView 5.0.8 [56], and geometry optimization with partial charge calculation is done using Gaussian 09 [57] with the 6-311++G(d,p) basis set.
Each peptide-sulfate complex is placed within a cubic box with a distance of approximately 1.0 nm between the periphery of the complex and the sides of the box. The box is filled with pre-equilibrated SPC (simple point charge) water [58]. Charge neutrality is achieved by inserting four Cl⁻ and two Na⁺ ions in place of six water molecules for the long sequence, and one Cl⁻ and two Na⁺ ions in place of three water molecules for the short sequence. Both systems are subjected to an initial energy minimization using the steepest descent algorithm. The systems are then subjected to NVT (constant number of particles, volume and temperature) ensemble dynamics to allow relaxation of the water molecules while the peptide-sulfate complexes are restrained to their initial coordinates with a force constant of 100 kJ mol⁻¹, followed by NPT (constant number of particles, pressure and temperature) equilibration [59], for 100 ps each. The systems are heated in increments of 50 K for 20 ps until the desired temperature of 276 K (3 °C) is reached. Subsequently, production MD simulations at constant pressure and temperature are carried out at 276 K (3 °C) with a time step of 2 fs using the Berendsen coupling method [59]. The temperature and pressure coupling time constants are 0.1 and 0.5 ps, respectively. The LINCS algorithm [60] is applied to constrain all bonds during the simulations. The PME method [61] with a cut-off of 1.0 nm is used to calculate the non-bonded interactions, and the neighbor list is updated every 10 steps. Structures are saved to the trajectory every 1 ps and analyzed using various GROMACS utilities and custom-written Perl and Fortran programs. The hydrogen bonding interactions between the sulfate ion and the peptides are calculated using a simple Coulomb point charge model adapted from the DSSP program [37] for calculating secondary structure from hydrogen bonds.
For the S=O···H-N interaction, the equation used is

$$E = 332\left(\frac{q_S q_H}{r_{SH}} + \frac{q_S q_N}{r_{SN}} + \frac{q_O q_H}{r_{OH}} + \frac{q_O q_N}{r_{ON}}\right)\ \text{kcal/mol},$$

where $r$ represents inter-atomic distances in Å and $q$ represents the partial charge on each atom: $q_S = 1.6$, $q_O = -0.9$, $q_H = 0.3$, $q_N = -0.5$. For the S=O···H-Cα interaction, the equation used is

$$E = 332\left(\frac{q_S q_H}{r_{SH}} + \frac{q_S q_{C^\alpha}}{r_{SC^\alpha}} + \frac{q_O q_H}{r_{OH}} + \frac{q_O q_{C^\alpha}}{r_{OC^\alpha}}\right)\ \text{kcal/mol},$$

where $r$ and $q$ are defined as above, with $q_S = 1.6$, $q_O = -0.9$, $q_H = 0.06$, $q_{C^\alpha} = -0.1$. Interactions with $E$ between −0.5 and −1.0 kcal/mol and with $E \le -1.0$ kcal/mol are considered weak and strong hydrogen bonds, respectively [41,42].
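The point-charge energy defined above can be evaluated directly from atomic coordinates. The following Python sketch is illustrative and is not the authors' Perl or Fortran code: the function name `hbond_energy`, the donor keys `"N"` and `"CA"`, and the collinear test geometry are assumptions introduced here, while the factor 332 and the partial charges are those quoted in the text (the sulfate charges sum to 1.6 + 4 × (−0.9) = −2.0).

```python
import math

# Partial charges quoted in the text (units of the elementary charge).
Q_S, Q_O = 1.6, -0.9
DONORS = {
    "N": {"q_h": 0.3, "q_x": -0.5},    # S=O...H-N
    "CA": {"q_h": 0.06, "q_x": -0.1},  # S=O...H-C(alpha)
}


def hbond_energy(s, o, h, x, donor="N"):
    """Coulomb energy in kcal/mol for an S=O...H-X contact.

    s, o, h, x are (x, y, z) coordinates in Angstrom of the sulfur,
    the sulfate oxygen, the donor hydrogen and the donor heavy atom.
    """
    q = DONORS[donor]
    return 332.0 * (
        Q_S * q["q_h"] / math.dist(s, h)
        + Q_S * q["q_x"] / math.dist(s, x)
        + Q_O * q["q_h"] / math.dist(o, h)
        + Q_O * q["q_x"] / math.dist(o, x)
    )


if __name__ == "__main__":
    # Hypothetical collinear geometry: S-O 1.5 A, O...H 1.9 A, H-X 1.0 A.
    s, o, h, x = (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.4, 0.0, 0.0), (4.4, 0.0, 0.0)
    print(f"N donor:  {hbond_energy(s, o, h, x, 'N'):.2f} kcal/mol")
    print(f"CA donor: {hbond_energy(s, o, h, x, 'CA'):.2f} kcal/mol")
```

In a trajectory analysis the function would be applied to every saved frame and every candidate donor, and the resulting energies compared with the cut-offs given in the text.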

ESI-MS
Binding of sulfate/phosphate added species of CPS224Ac (through m/z value) is studied in negative mode in an ESI-FTMS instrument (Apex Ultra 70, Bruker Daltonics; direct infusion mode) using H₂O/CH₃CN (1:1) with 0.1% NH₃.

Figure S1 Effect of addition of sulfate ion to peptide CPS224Ac using NMR spectroscopy. a) 1D spectra (HN region) indicate that only K3 and Q4 suffer a change in chemical shift values (downfield shift). b) ΔδCαH (Hα secondary chemical shift in ppm) in the absence as well as in the presence of sulfate ion (identified through individual TOCSY experiments). c) Cluster of the 10 best-ranked NMR-derived structures of sulfate-added CPS224Ac, generated from NMR constraints using the programme DYANA. (TIF)

Figure S8 Distribution of backbone dihedral angles (φ, ψ) of the Gln4 residue during the MD simulation of sulfate ion interactions with the 'experimental NMR structure' of SCPS224Ac, emphasizing its existence in helical conformation during interaction of the sulfate ion with the 'CαNN' motif segment of the peptide and the role of the Aib6 residue pertaining to the non-canonical helical conformation of Gln4 in CPS224Ac in the absence of sulfate ion. Two snapshots of the interaction are shown, one when the sulfate ion is close to the 'CαNN' segment and one when it is apart from the segment. (TIF)

Table S1 Representation of interaction parameters ((X)H…O distance (Å) and ∠X-H…O angle (°)) of the anion (sulfate ion) with the 'CαNN' segment obtained from the respective crystal structures of proteins reported in the PDB (a: Barrett et al., 1998; b: Louie and Brayer, 1990; c: Lake et al., 2001). (DOC)
Table S2 Interaction parameters between the sulfate/phosphate ion and the related 'CαNN' segment of the short 5-residue peptides (truncated versions of the 18-residue sequences) in a context-free system (250 docked structures of the individual conformation) are described in terms of X-H…O (where X = Cα₋₁/N₀/N₊₁) distances (Å) and angles (°) indicating the nature of H-bond formation (mean value of the parameters in parentheses). The estimated binding free energy gives a relative affinity for the anion, which depends on the conformational status of the 'CαNN' segment. (DOC)