Local Conformational Changes in the DNA Interfaces of Proteins

When a protein binds to DNA, a conformational change is often induced so that the protein fits the DNA structure. To characterize such changes, quantitative analyses of local protein conformations in the DNA-free and DNA-bound forms were conducted. The results showed that conformational changes are more frequent in DNA interfaces than in non-interfaces, and that DNA interfaces show more conformational variation in the DNA-free form. As expected, the former indicates that interaction with DNA influences protein structure. The latter suggests that the intrinsic conformational flexibility of DNA interfaces is important for adjusting their conformation to DNA. The amino acid propensities of the conformationally changed regions in DNA interfaces show that hydrophilic residues are enriched in these regions relative to the conformationally unchanged regions. The same trend is observed in disordered regions, suggesting that intrinsic flexibility is important not only for DNA binding but also for interactions with other molecules. These results demonstrate that fragments destined to become DNA interfaces have intrinsic flexibility and are composed of amino acids capable of binding to DNA. This suggests that the prediction of DNA binding sites may be improved by integrating the amino acid preference for DNA interfaces with that for disordered regions.


Introduction
Protein-DNA interactions play essential roles in many cellular functions, such as transcription, replication, recombination, and DNA packaging. To understand the recognition mechanisms of individual DNA binding proteins, the structures of both the DNA-bound and DNA-free forms have been analyzed [1,2,3,4,5]. Flexible regions have been reported to undergo conformational changes in order to recognize specific DNA targets [4,5,6,7,8,9,10]. For example, the β2/β3 connecting loop of the papillomavirus E2 protein, which is unstructured in the free form, adopts a β-hairpin conformation in the complex to form electrostatic contacts with DNA backbone phosphates [4,6]. This conformational change of the loop has also been observed in molecular dynamics simulations [7,8]. Another example is the linker region of MATα2 [5]. In this case, two independent copies of the complex were found in the asymmetric unit: the flexible linker [9] adopted an α-helix in one copy of MATα2 and a β-strand in the other. Such a sequence has been termed a chameleon sequence, and its conformational transition is thought to be important for DNA recognition [10].
The sequence characteristics of such proteins have also been examined. Dunker and other groups developed methods to predict the intrinsically disordered regions of proteins on the basis of X-ray, nuclear magnetic resonance, and circular dichroism data [11,12]. Such regions are thought to undergo a disorder-to-order transition when they interact with a binding partner [11]. Genome-wide application of these methods indicated that transcription factors, especially eukaryotic ones, contain a higher proportion of intrinsically disordered regions [13,14]. The proposed roles of these regions are to facilitate DNA searching and to modulate the specificity and affinity for DNA [15]. Although conformational changes in the flexible regions of DNA binding proteins are well recognized, the local structural rearrangements of proteins upon DNA binding have not yet been analyzed comprehensively.
To assess structural rearrangements, a classification of the 3D geometries of local protein structures is necessary. Historically, Pauling first proposed that protein structures could be represented as strings of secondary structures [16], and secondary structures have since often been used to compare protein structures [17,18]. However, secondary structures are too coarse to detect subtle local conformations because they describe only the hydrogen-bonding pattern of the backbone atoms. Structural alphabets, in which letters are assigned to representative local conformations, were therefore proposed to describe local structures more precisely [19]. Structural alphabets have been reported to classify protein structures more precisely than secondary structures [20,21] and have been applied to structure prediction [22,23], 3D structure comparison [24,25], motif searches [26], protein-protein interaction analysis [27,28], and de novo protein design [29].
In this study, conformational changes in DNA binding proteins were analyzed quantitatively using structural alphabets [30]. Using sets of proteins whose structures were solved in both the DNA-free and DNA-bound forms, DNA interfaces were found to have higher conformational flexibility than non-interfaces. Conformationally changed regions in DNA interfaces were also found to be enriched in glycine, proline, and the hydrophilic residues previously found in intrinsically disordered regions [11,12,31]. This result indicates that DNA interface fragments are composed of amino acids that confer both high flexibility and DNA binding capability.

Materials and Methods
Data Preparation for DNA-bound and DNA-free Forms
Non-redundant pairs of crystal structures of DNA-free and DNA-bound forms were prepared as follows. A flow chart of the data preparation and a schematic diagram of the reduction of the dataset redundancy are given in Figure 1.
To prepare the dataset of DNA-bound forms, the Protein Data Bank (PDB; December 2010 version [32]) was searched for all protein-DNA complexes solved at a resolution better than 3 Å. Proteins co-crystallized with ssDNA, Z-DNA, or RNA were then discarded. Antibodies, artificial DNA binding proteins, and a structure of the trp repressor crystallized with a high concentration of isopropanol (PDB ID: 1mi7) were also excluded. The PDB was then searched for DNA-free forms of the DNA-bound proteins (>90% sequence identity) solved at a resolution better than 3 Å. To reduce dataset redundancy, the selected proteins were clustered at a 30% sequence identity threshold using Blastclust [33]. Within each cluster, subclusters were then formed at a 90% sequence identity threshold, and the subcluster containing the largest number of protein chains was selected as the representative of the cluster. Finally, 126 representative cluster pairs of DNA-bound and DNA-free forms were obtained. Hereafter, these representative sets are referred to as DB_bound and DB_free, respectively.
To evaluate the conformational variation in the DNA-free forms of the proteins, another dataset was prepared by extracting the clusters of DB_free that contained at least two protein chains. This dataset, referred to as DB_free(≥2), contained 86 clusters. The members of DB_bound, DB_free, and DB_free(≥2) are listed in Table S1.
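The representative-selection step can be sketched in Python as below. This is a minimal illustration, assuming the 30% and 90% identity clustering (e.g., from Blastclust) has already been parsed into a hypothetical mapping from chain ID to (cluster, subcluster) labels. The chain and cluster IDs are made up, and breaking ties by first occurrence is an assumption, since the text does not say how ties were resolved.

```python
from collections import Counter, defaultdict


def pick_representatives(assignments):
    """Return {cluster_id: subcluster_id}, choosing the subcluster with the most chains.

    `assignments` (hypothetical format) maps a chain ID to a
    (cluster_id, subcluster_id) tuple from the 30% / 90% identity clustering.
    """
    counts = defaultdict(Counter)
    for cluster_id, sub_id in assignments.values():
        counts[cluster_id][sub_id] += 1
    # Counter.most_common orders equal counts by first insertion, so ties go to
    # the subcluster encountered first (an assumption, not from the study).
    return {c: subs.most_common(1)[0][0] for c, subs in counts.items()}


assignments = {
    "1abcA": ("c1", "c1.s1"),
    "1abcB": ("c1", "c1.s1"),
    "2xyzA": ("c1", "c1.s2"),
    "3defA": ("c2", "c2.s1"),
}
print(pick_representatives(assignments))  # {'c1': 'c1.s1', 'c2': 'c2.s1'}
```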

Assignment of 11 Structural Alphabets
A library of 10 four-residue fragments developed by Kolodny et al. [30], plus one fragment code introduced in this study, was used to describe the protein structures. The fragments were labeled with the alphabets A to J (Fig. 2(a)), and their conformations are given in Table S2; hereafter, these labels are called structural alphabets. In addition to the original 10 fragments, a "Y" code was introduced to describe fragments in which at least one Cα atom was not determined in the crystal structure. Such fragments are thought to adopt multiple conformations in the crystal. Each structural fragment was assigned the best-matching alphabet in terms of the root mean square deviation of the Cα atoms (cRMS). Fragments corresponding to the 5% outliers of the cRMS distributions were discarded from the analysis. In addition, fragments were excluded if their sequences were not identical among the proteins of the same cluster. To capture the conformational features of the fragments intuitively, the ten alphabet structures were further grouped into three conformation classes (extended, loop-like, and helix-like) based on the cRMS (Fig. 2(b)).
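The best-match assignment can be sketched as follows. The sketch assumes that cRMS means the Cα RMSD after optimal rigid-body superposition (computed here with the Kabsch/SVD method) and that the 5% outlier cut is applied to the pooled distribution of best-match cRMS values. The two templates are idealized toy coordinates labeled helix_toy and strand_toy; they are not the Kolodny et al. library, and all function names are hypothetical.

```python
import numpy as np


def crms(p, q):
    """C-alpha RMSD between two (N, 3) coordinate arrays after optimal superposition (Kabsch)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    u, s, vt = np.linalg.svd(p.T @ q)
    # d = -1 flags a reflection; flipping the smallest singular value keeps a proper rotation.
    d = np.sign(np.linalg.det(u) * np.linalg.det(vt))
    msd = ((p ** 2).sum() + (q ** 2).sum() - 2.0 * (s[0] + s[1] + d * s[2])) / len(p)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative round-off


def assign_alphabet(fragment, library):
    """Return (label, cRMS) of the best-matching library entry, or ("Y", None) if a C-alpha is missing."""
    if any(atom is None for atom in fragment):
        return "Y", None
    scores = {label: crms(fragment, ref) for label, ref in library.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


def inlier_mask(crms_values, outlier_fraction=0.05):
    """True for fragments whose best-match cRMS is not in the top `outlier_fraction` tail."""
    values = np.asarray(crms_values, dtype=float)
    return values <= np.quantile(values, 1.0 - outlier_fraction)


# Toy 4-residue C-alpha templates (hypothetical; NOT the Kolodny et al. library).
def toy_helix(n=4, radius=2.3, rise=1.5, twist_deg=100.0):
    t = np.radians(twist_deg) * np.arange(n)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t), rise * np.arange(n)])


def toy_strand(n=4, rise=3.3, zigzag=1.0):
    k = np.arange(n)
    return np.column_stack([rise * k, zigzag * (k % 2), np.zeros(n)])


TOY_LIBRARY = {"helix_toy": toy_helix(), "strand_toy": toy_strand()}
```

A rotated and translated copy of a template should be assigned to that template with a cRMS of essentially zero, which is the property the superposition step is there to guarantee.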

DNA Interface/non-interface Assignment
A DNA interface residue was defined as a solvent-exposed residue that is also in contact with the DNA. To identify such residues, the solvent accessible surface area (ASA) of each protein in the DNA-bound form was first calculated with the bound DNA removed, using the ASC program [34]. The relative ASA of each residue was then calculated as the ratio of its surface area in the protein structure to that of the same residue type X in a Gly-X-Gly tripeptide in the trans conformation. Surface residues were defined as those with a relative ASA of more than 20%. The ASA of the protein was also calculated with the bound DNA present. A surface residue whose ASA differed between the calculations with and without the bound DNA was considered to be within the DNA interface. A fragment containing at least one DNA interface residue was considered a DNA interface fragment, and all remaining fragments were regarded as DNA non-interface fragments.
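The residue and fragment rules above amount to the following sketch. The ASA inputs are assumed to come from an external calculation (such as the ASC program), and the Gly-X-Gly reference areas are passed in rather than hard-coded; the numbers in the example are toy values, the function names are hypothetical, and the small tolerance used to decide that two ASA values "differ" is an assumption.

```python
def interface_residues(asa_free, asa_complex, ref_asa, surface_cutoff=0.20, tol=1e-6):
    """Flag residues that are surface-exposed without DNA and whose ASA changes when DNA is present.

    asa_free:    per-residue ASA computed with the bound DNA removed
    asa_complex: per-residue ASA computed with the bound DNA present
    ref_asa:     per-residue ASA of the same residue type X in Gly-X-Gly
    """
    flags = []
    for free, cplx, ref in zip(asa_free, asa_complex, ref_asa):
        is_surface = free / ref > surface_cutoff  # relative ASA above 20%
        touches_dna = abs(free - cplx) > tol      # ASA differs with DNA present
        flags.append(is_surface and touches_dna)
    return flags


def fragment_labels(residue_flags, length=4):
    """Label each overlapping fragment "interface" if any of its residues is an interface residue."""
    return [
        "interface" if any(residue_flags[k:k + length]) else "non-interface"
        for k in range(len(residue_flags) - length + 1)
    ]


flags = interface_residues(
    asa_free=[50, 5, 80, 60, 40, 90, 70],
    asa_complex=[50, 5, 30, 60, 40, 90, 70],
    ref_asa=[100] * 7,
)
print(fragment_labels(flags))  # ['interface', 'interface', 'interface', 'non-interface']
```

Because every 4-residue window that overlaps a single interface residue is labeled as an interface fragment, one contact residue can mark up to four fragments.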

Conformational Changes Upon DNA Binding
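Conformational changes were defined as differences between the structural alphabets assigned to a fragment in the DNA-free form and in the corresponding DNA-bound form. For each fragment l, the probability of a conformational change from alphabet i in the DNA-free form upon DNA binding, $w^{l}_{\text{chg},i}$, was calculated, and these probabilities were averaged over each region, where "region" denotes either the DNA interface or the DNA non-interface hereafter:
$$f^{\text{region}}_{\text{chg},i} = \frac{1}{L_{\text{region}}} \sum_{l \in \text{region}} w^{l}_{\text{chg},i},$$
where $L_{\text{region}}$ is the number of fragments in the region. The alphabet propensity, i.e., the log ratio of the frequencies of conformational change in the DNA interfaces and non-interfaces, was then calculated as
$$R^{\text{interface/non-interface}}_{\text{chg},i} = \ln \frac{f^{\text{interface}}_{\text{chg},i}}{f^{\text{non-interface}}_{\text{chg},i}}.$$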
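The sketch below illustrates the region averaging and the natural-log propensity. It assumes one plausible definition of $w^{l}_{\text{chg},i}$ as an explicit assumption: for a cluster with several DNA-free and DNA-bound members, $w^{l}_{\text{chg},i}$ is taken as the fraction of (free, bound) member pairs in which the free member has alphabet i and the bound member has a different alphabet. The averaging over the fragments of a region and the log ratio follow the text; the function names are hypothetical.

```python
import math
from collections import defaultdict


def change_probabilities(free_alphabets, bound_alphabets):
    """w[i]: fraction of (free, bound) member pairs whose free member has alphabet i
    and whose bound member has a different alphabet (assumed definition)."""
    pairs = [(f, b) for f in free_alphabets for b in bound_alphabets]
    w = defaultdict(float)
    for f, b in pairs:
        if f != b:
            w[f] += 1.0 / len(pairs)
    return w


def region_frequency(fragments, alphabet=None):
    """Average w over the L_region fragments of a region; alphabet=None sums over all alphabets.

    `fragments` is a list of (free_alphabets, bound_alphabets) tuples, one per fragment l.
    """
    total = 0.0
    for free_alphabets, bound_alphabets in fragments:
        w = change_probabilities(free_alphabets, bound_alphabets)
        total += sum(w.values()) if alphabet is None else w.get(alphabet, 0.0)
    return total / len(fragments)


def log_propensity(f_a, f_b):
    """Natural-log propensity ln(f_a / f_b); 0 means equal frequencies."""
    return math.log(f_a / f_b)


interface = [(["A"], ["C"]), (["B"], ["B"]), (["A", "A"], ["A", "D"])]
print(region_frequency(interface))  # 0.5
```

Applied to the overall frequencies reported in the Results (23.1% for DNA interfaces and 14.7% for non-interfaces), log_propensity(0.231, 0.147) is about 0.45, i.e., changes are roughly 1.6 times more frequent in the interfaces.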

Conformational Variations in the DNA-free Forms
To determine the conformational variation in the DNA-free forms, the intrinsic conformational variation, observed as differences between the alphabets assigned to the same fragment in different crystal structures, was considered. The expected alphabet variation of fragment l, $w^{l}_{\text{var},i}$, was calculated from $q^{l}_{i}$, which is defined as the probability that two members randomly selected from the set of fragment l both have alphabet i. For the disordered conformation "Y," $q^{l}_{Y}$ was defined as 1. The alphabet propensity of the conformational variation in the DNA interfaces relative to that in the non-interfaces, $R^{\text{interface/non-interface}}_{\text{var},i}$, was then calculated in the same way as for the conformational changes.
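The pair probability $q^{l}_{i}$ defined above can be computed as in the following sketch, assuming the two members are drawn without replacement from the DNA-free members of fragment l (whether replacement is used is an assumption). How $q^{l}_{i}$ is combined into $w^{l}_{\text{var},i}$ is not reproduced here, and the function name is hypothetical.

```python
from collections import Counter


def same_alphabet_probability(alphabets, i):
    """q_i: probability that two members drawn without replacement both have alphabet i.

    `alphabets` lists the alphabet of fragment l in each DNA-free member of a cluster.
    Following the text, q_Y is fixed to 1 for the disordered code "Y".
    """
    if i == "Y":
        return 1.0
    n = len(alphabets)
    if n < 2:
        raise ValueError("need at least two members, as in DB_free(>=2)")
    n_i = Counter(alphabets)[i]
    return n_i * (n_i - 1) / (n * (n - 1))


print(same_alphabet_probability(["A", "A", "B"], "A"))  # 2*1 / (3*2) = 0.333...
```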

Propensity Calculations
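To measure the relative difference between a pair of frequencies, propensities were calculated for various pairs of frequencies. As an example, the calculation of residue propensities is described here. The frequency of each amino acid r in the conformationally changed fragments was calculated for the DNA interfaces ($f^{\text{interface}}_{\text{chg},\text{res}=r}$) and non-interfaces ($f^{\text{non-interface}}_{\text{chg},\text{res}=r}$) by counting the positions at which $r_{l,j} = r$, where $r_{l,j}$ is the jth amino acid residue of fragment l. This count is written as $\sum_{l} \sum_{j=1}^{4} \delta(r_{l,j} - r)$, where the function δ(x) is 1 if x is 0 and 0 otherwise, and is normalized by the total number of residues counted. The frequencies of each amino acid r in the conformationally unchanged fragments were calculated in the same way for the DNA interfaces ($f^{\text{interface}}_{\text{unchg},\text{res}=r}$) and non-interfaces ($f^{\text{non-interface}}_{\text{unchg},\text{res}=r}$). The residue propensities for the DNA interfaces ($R^{\text{chg/unchg}[\text{interface}]}_{\text{res}=r}$) and non-interfaces ($R^{\text{chg/unchg}[\text{non-interface}]}_{\text{res}=r}$) were then calculated as the natural logarithms of the ratios of the corresponding frequencies, e.g., $R^{\text{chg/unchg}[\text{interface}]}_{\text{res}=r} = \ln\left( f^{\text{interface}}_{\text{chg},\text{res}=r} / f^{\text{interface}}_{\text{unchg},\text{res}=r} \right)$.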
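The δ-counting and the log-ratio propensity can be sketched as follows. Fragments are represented as one-letter amino acid strings, and normalizing by the total number of residue positions counted is an assumption; the example sequences are toy data and the function names are hypothetical.

```python
import math
from collections import Counter


def residue_frequencies(fragments):
    """f_r: share of residue positions j in the fragments with r_{l,j} == r.

    Counting r_{l,j} == r plays the role of delta(r_{l,j} - r) in the text.
    """
    counts = Counter(res for fragment in fragments for res in fragment)
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()}


def residue_propensity(frags_a, frags_b, r):
    """ln(f_a(r) / f_b(r)), e.g. changed vs unchanged fragments of one region.

    Undefined (raises) if r is absent from either set.
    """
    fa = residue_frequencies(frags_a).get(r, 0.0)
    fb = residue_frequencies(frags_b).get(r, 0.0)
    return math.log(fa / fb)


changed = ["GPSN", "KGDT"]    # toy conformationally changed fragments
unchanged = ["LAVI", "GFMW"]  # toy conformationally unchanged fragments
print(residue_propensity(changed, unchanged, "G"))  # ln(0.25 / 0.125) = ln 2
```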

Alphabet Propensities for Conformationally Changed, Order-to-order, and Disorder-to-order Fragments
Analogously to the residue propensities, three sets of alphabet propensities were calculated: (1) propensities for the DNA interfaces ($R^{\text{bound/free}[\text{interface}]}_{\text{alp}=i}$) and non-interfaces ($R^{\text{bound/free}[\text{non-interface}]}_{\text{alp}=i}$), to characterize the alphabets induced by conformational changes upon DNA binding; (2) propensities for the DNA interfaces ($R^{\text{bound/free}[\text{interface}]}_{\text{order-order},\text{alp}=i}$) and non-interfaces ($R^{\text{bound/free}[\text{non-interface}]}_{\text{order-order},\text{alp}=i}$), to characterize order-to-order conformational changes; and (3) propensities for the DNA interfaces ($R^{\text{bound/free}[\text{interface}]}_{\text{disorder-order},\text{alp}=i}$) and non-interfaces ($R^{\text{bound/free}[\text{non-interface}]}_{\text{disorder-order},\text{alp}=i}$) of the fragments that undergo a disorder-to-order conformational change upon DNA binding. Details of these calculations are provided as supplementary information (Text S1).
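Since the exact formulas are given in Text S1, the following is only a hedged sketch of one way to split fragments by change type and compute a bound/free alphabet propensity: the frequency of an alphabet among the DNA-bound forms divided by its frequency among the DNA-free forms of the same fragments, on a natural-log scale. Treating "Y" as the only disordered code follows the alphabet definitions above; the function names and example pairs are hypothetical.

```python
import math


def change_type(free, bound):
    """Classify a (DNA-free, DNA-bound) alphabet pair; "Y" marks a disordered fragment."""
    if free == bound:
        return "unchanged"
    if free == "Y":
        return "disorder-to-order"
    if bound == "Y":
        return "order-to-disorder"
    return "order-to-order"


def bound_free_propensity(pairs, alphabet):
    """ln(freq. of `alphabet` in the bound forms / freq. in the free forms) over `pairs`.

    Undefined (raises) if either frequency is zero.
    """
    n = len(pairs)
    free_freq = sum(f == alphabet for f, _ in pairs) / n
    bound_freq = sum(b == alphabet for _, b in pairs) / n
    return math.log(bound_freq / free_freq)


pairs = [("A", "C"), ("C", "B"), ("Y", "C"), ("A", "A"), ("B", "C")]
changed = [p for p in pairs if change_type(*p) != "unchanged"]
order_order = [p for p in pairs if change_type(*p) == "order-to-order"]
print(bound_free_propensity(changed, "C"))      # ln 3
print(bound_free_propensity(order_order, "C"))  # ln 2
```

For disorder-to-order pairs the free form is always Y, so this simple ratio cannot be applied to them as is; the definition used for that case is given in Text S1.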

Statistical Reliability
Because the number of protein structures used in this study was limited, the statistical reliability of the calculated values was evaluated. The BCa bootstrap procedure [35] was used to estimate confidence intervals for the frequencies from which the propensities were calculated. A total of 10,000 bootstrap datasets were constructed by resampling DB_bound and DB_free. In this test, the reliability criterion was set to an 85% two-sided confidence interval around the average value.
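A BCa interval for one pooled frequency can be sketched as below with NumPy and the standard-library NormalDist. As an illustrative assumption, each cluster contributes a count of changed fragments and a count of all fragments, and clusters are the resampling unit; the 85% level and 10,000 replicates follow the text, while the function names and toy counts are hypothetical. SciPy's scipy.stats.bootstrap with method="BCa" is a library alternative; the manual version makes the bias-correction and acceleration steps explicit.

```python
import numpy as np
from statistics import NormalDist


def pooled_frequency(changed, total):
    """Frequency of conformational change pooled over clusters: sum(changed) / sum(total)."""
    return changed.sum() / total.sum()


def bca_interval(changed, total, level=0.85, n_boot=10_000, seed=0):
    """BCa bootstrap interval for the pooled frequency, resampling clusters with replacement."""
    changed = np.asarray(changed, dtype=float)
    total = np.asarray(total, dtype=float)
    n = len(changed)
    rng = np.random.default_rng(seed)
    theta_hat = pooled_frequency(changed, total)

    # Bootstrap replicates: each row of idx is one resampled set of clusters.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = changed[idx].sum(axis=1) / total[idx].sum(axis=1)

    # Bias correction z0 from the share of replicates below the point estimate.
    nd = NormalDist()
    below = np.clip(np.mean(boot < theta_hat), 0.5 / n_boot, 1 - 0.5 / n_boot)
    z0 = nd.inv_cdf(float(below))

    # Acceleration a from leave-one-cluster-out jackknife estimates.
    jack = np.array([pooled_frequency(np.delete(changed, k), np.delete(total, k)) for k in range(n)])
    diff = jack.mean() - jack
    denom = 6.0 * np.sum(diff ** 2) ** 1.5
    a = float(np.sum(diff ** 3) / denom) if denom > 0 else 0.0

    # Adjusted percentiles of the bootstrap distribution.
    alpha = 1.0 - level
    probs = [nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
             for z in (nd.inv_cdf(alpha / 2), nd.inv_cdf(1 - alpha / 2))]
    lo, hi = np.quantile(boot, probs)
    return float(theta_hat), float(lo), float(hi)


changed_counts = [3, 5, 2, 8, 1, 4, 6, 2, 3, 5]      # toy per-cluster changed fragments
total_counts = [20, 25, 15, 30, 10, 22, 28, 12, 18, 24]
print(bca_interval(changed_counts, total_counts))
```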

Dataset Preparation and Structural Alphabet Assignment
One hundred and twenty-six representative pairs of clusters in the DNA-free (DB_free) and DNA-bound (DB_bound) forms were obtained, with less than 30% sequence identity between clusters. The representative of each cluster was the subcluster with the largest number of members (Table 1 and Table S1). The proteins of the 126 clusters had dsDNA binding domains belonging to different structural classes according to the SCOP classification (version 1.75) [36]: 43 all-α proteins, 12 all-β proteins, 30 α and β proteins (α/β), 22 α and β proteins (α+β), 11 multi-domain proteins (α and β), 1 small protein, and 1 coiled-coil protein. The remaining 32 proteins were not classified in the SCOP database. The structures were divided into overlapping 4-residue fragments, each fragment was assigned its structural alphabet, and the changes in the alphabets in the DNA interfaces and non-interfaces were analyzed. The datasets contained 4963 fragments in the DNA interfaces and 20826 in the non-interfaces. Longer fragments would require a larger fragment library to describe conformations with errors comparable to those of 4-residue fragments; given the limited available data, 4-residue fragments were used to obtain statistically significant results.
Table 1 footnotes: **The molecular names were extracted from the PDB headers of the DNA-bound forms. If a header did not describe the molecule (e.g., "Putative protein"), the molecular name was taken from the literature. ***✓ indicates that the cluster was used for DB_free(≥2). doi:10.1371/journal.pone.0056080.t001

Fragments in DNA Interfaces Tend to have more Intrinsic Variations in their Conformations than those in DNA Non-interfaces
The frequencies of conformational change upon DNA binding were calculated for the DB_free and DB_bound datasets. A conformational change was defined as a change in the alphabet between the DNA-free and DNA-bound forms. The frequencies of conformational change in the DNA interfaces ($f^{\text{interface}}_{\text{chg}}$) and non-interfaces ($f^{\text{non-interface}}_{\text{chg}}$) were 23.1% and 14.7%, respectively (Fig. 3(a)). This result indicates that DNA interfaces tend to undergo more conformational change upon DNA binding than non-interfaces.
It was anticipated that fragments in DNA interfaces might have a greater intrinsic propensity to change conformation so as to adjust to the DNA structure. To examine this hypothesis, the conformational variations of the fragments in the DNA-free forms were calculated using the dataset DB_free(≥2), in which each cluster has at least two members in the DNA-free form. The frequencies of conformational variation in the DNA interfaces ($f^{\text{interface}}_{\text{var}}$) and non-interfaces ($f^{\text{non-interface}}_{\text{var}}$) are shown in Fig. 3(b). In the DNA-free forms, the conformational variation in the DNA interfaces (24.4%) was higher than that in the non-interfaces (14.1%), indicating that DNA interfaces are intrinsically flexible. The flexibility of DNA interfaces was also pointed out in a previous analysis of a small set of DNA binding proteins (7 proteins) [37]; this finding was reconfirmed here using a larger set.

No Specific Alphabets are Responsible for the Conformational Changes in DNA Interfaces
Next, the frequency of conformational change was compared among the alphabets to reveal which local structures were affected most often. The alphabet propensities of conformational change in the DNA interfaces relative to the non-interfaces ($R^{\text{interface/non-interface}}_{\text{chg},i}$, where i is one of the 11 alphabets) are shown in Fig. 4(a), and those for conformational variation ($R^{\text{interface/non-interface}}_{\text{var},i}$) are shown in Fig. 4(b). In Fig. 4(a), a positive value indicates that a fragment with the given alphabet in the free form changes its conformation upon DNA binding more frequently in the DNA interfaces than in the non-interfaces. In Fig. 4(b), a positive value indicates that a fragment with the given alphabet shows conformational variation more frequently in the DNA interfaces than in the non-interfaces. As expected, conformational changes occurred more frequently in the DNA interfaces for all alphabets (positive values in Fig. 4(a)), and the propensities of A, D, E, F, G, and Y were significantly high. These six alphabets are therefore likely to change conformation more often in the interfaces; for B, C, H, I, and J, however, the errors are too large to conclude that the propensities are significantly elevated. The frequencies of conformational variation in the DNA interfaces were also significantly higher than those in the non-interfaces for all alphabets except G (Fig. 4(b)), although the large errors make it difficult to determine which alphabet varies most often.

Conformationally Changed Fragments in the DNA Interfaces have Amino Acids Suitable for Producing Flexibility and Binding to DNA
To reveal whether specific amino acids in the DNA interfaces affect the conformational change upon DNA binding, two pairs of residue propensities were calculated. First, the amino acid propensity in conformationally changed fragments relative to conformationally unchanged fragments ($R^{\text{chg/unchg}[\text{interface}]}_{\text{res}=r}$, $R^{\text{chg/unchg}[\text{non-interface}]}_{\text{res}=r}$) was determined for the DNA interfaces and non-interfaces (Fig. 5(a)). A positive value indicates that the amino acid is observed more frequently in conformationally changed fragments than in unchanged fragments, and zero indicates that both frequencies are equal. For example, a value of 1.0 means that the amino acid is 2.7 times more frequent in conformationally changed fragments than in unchanged fragments. Second, the amino acid propensity in the DNA interfaces relative to the non-interfaces was calculated for the conformationally changed and unchanged fragments (Fig. 5(b)). Figure 5(a) shows that Asn, Gly, Pro, Ser, Asp, Thr, and Lys have positive values, indicating that they are favored in conformationally changed fragments located in non-interfaces. In contrast, Trp, Cys, Ala, Arg, Leu, Met, Phe, Ile, and Val were disfavored in the conformationally changed fragments of the non-interfaces. These results clearly show that hydrophilic residues, Gly, and Pro are located in the more flexible fragments of the non-interfaces, a trend also found in disordered regions [11,12,31]. However, there were no significant differences between the propensities in the DNA interfaces and non-interfaces (filled and open bars in the figure), indicating that whether a fragment changes conformation depends essentially on its amino acid composition and not on its location.
The propensities of the amino acid frequencies in the DNA interfaces relative to the non-interfaces for conformationally changed fragments ($R^{\text{interface/non-interface}[\text{chg}]}_{\text{res}=r}$) and conformationally unchanged fragments ($R^{\text{interface/non-interface}[\text{unchg}]}_{\text{res}=r}$) are shown in Fig. 5(b). A positive value indicates that the amino acid is more frequent in the DNA interfaces than in the non-interfaces, and zero indicates that both frequencies are equal. In the conformationally unchanged fragments, the amino acids favored in the DNA interfaces were Arg, Thr, Lys, Gly, Ser, His, and Asn, whereas Glu, Asp, Leu, Phe, Val, Ala, and Ile were disfavored. The importance of basic and hydrophilic residues in DNA interfaces has been emphasized in several previous reports [38,39,40]. Gly has also been reported to be favored in protein-DNA interfaces but not in protein-protein interfaces [41]. Again, no significant differences were detected between the propensities of the conformationally changed and unchanged fragments. These amino acid propensities indicate that the amino acid preference depends solely on the location of a fragment, that is, on whether it is in a DNA interface, whereas whether a conformational change occurs upon DNA binding depends on the types of amino acids that constitute the fragment.
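Because the propensities are natural logarithms of frequency ratios, a propensity R converts back to a fold change as follows; the numbers are values quoted in this section and the next.

```latex
R = \ln\frac{f_{a}}{f_{b}} \iff \frac{f_{a}}{f_{b}} = e^{R}
\qquad
e^{1.0} \approx 2.72, \quad e^{0.8} \approx 2.23, \quad e^{-2.2} \approx 0.11
```

A negative value such as -2.2 therefore corresponds to a roughly ninefold decrease (1/0.11 ≈ 9).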

Three Specific Alphabets Appear more Often in Conformationally Changed Fragments Located in DNA Interfaces
Next, the alphabets that specifically increased upon DNA binding were evaluated to determine whether they differed between the DNA interfaces and non-interfaces. To this end, the alphabet propensities $R^{\text{bound/free}[\text{interface}]}_{\text{alp}=i}$ for the DNA interfaces and $R^{\text{bound/free}[\text{non-interface}]}_{\text{alp}=i}$ for the non-interfaces were calculated; they are shown in Fig. 6(a). Here, a positive value indicates that the alphabet is observed more frequently in the DNA-bound forms than in the free forms.
For the fragments that undergo a conformational change and are located in DNA interfaces (Fig. 6(a)), the helix-like conformation C increased most significantly (a natural-log propensity of 0.8 means that C is exp(0.8) ≈ 2.2 times more frequent in the DNA-bound forms than in the DNA-free forms). The helix-like conformations J (0.4) and I (0.3), the extended conformation B (0.5), and the loop-like conformations H (0.5) and A (0.3) also increased upon DNA binding. The relative frequency of Y was significantly reduced upon DNA binding (-2.2) because disorder-to-order conformational changes often occurred in the DNA interfaces. In particular, the conformations B, C, and H increased significantly compared with those in the non-interfaces. In the non-interfaces, by contrast, the loop-like conformations A (natural-log propensity = 0.4), F, and G (0.2) had positive values, indicating that these conformations were induced upon DNA binding. The helix-like conformation I and the extended conformation E (0.1) also increased slightly in the DNA-bound forms, whereas Y (-0.9) was disfavored. These results indicate that disordered fragments, even in non-interfaces, tend to become ordered upon DNA binding.
Figure 6. Propensities of the structural alphabets under various conditions. (a) Propensities of the structural alphabets that undergo a conformational change upon DNA binding. (b) Propensities of the structural alphabets that undergo an order-to-order conformational change upon DNA binding. (c) Frequencies of the structural alphabets that undergo a disorder-to-order conformational change (Y to one of A-J) upon DNA binding. (d) Propensities of the structural alphabets that undergo a disorder-to-order conformational change (Y to one of A-J) upon DNA binding. In (a) through (d), fragments located within DNA interfaces and those within non-interfaces are shown by filled and open bars, respectively. A positive propensity indicates that the alphabet is observed more frequently in DNA-bound forms than in free forms. Error bars indicate the 85% bootstrap confidence intervals. doi:10.1371/journal.pone.0056080.g006
Next, the reasons why the three above-mentioned conformations (B, C, and H) increased in the DNA interfaces were considered. Conformational changes from Y to A-J were significantly more frequent in the DNA interfaces than in the non-interfaces. It was therefore expected that the alphabet distribution in the DNA interfaces would be more strongly affected by disorder-to-order conformational changes. Figure 6(b) shows the propensities of the structural alphabets that undergo an order-to-order change (that is, from an A-J conformation to a different A-J conformation) upon DNA binding. In both the DNA interfaces and the non-interfaces, the values for all alphabets were nearly zero, indicating no alphabet preference except for H, which increased in the DNA interfaces but not in the non-interfaces in these order-to-order changes. Inspection of the protein-DNA complex structures containing H showed that this loop-like conformation stabilizes protein-DNA interactions in various ways. However, owing to the limited data, no common feature could be identified that explains why H increases upon DNA binding.
Next, for the fragments that underwent a disorder-to-order conformational change, the frequencies of the alphabets in the DNA interfaces and non-interfaces (Fig. 6(c)) and the alphabet propensities (Fig. 6(d)) were calculated. Neither the frequencies nor the propensities differed significantly between the DNA interfaces and the non-interfaces, indicating that the structures induced by disorder-to-order changes in DNA interfaces are similar to those in non-interfaces. Because disorder-to-order changes are more frequent in DNA interfaces, these results suggest that changes from Y to B or C occur more frequently there. Consequently, B and C are considered to be the two most preferred alphabets in disorder-to-order conformational changes. The reasons why the B and C conformations are favored remain a subject for future investigation.

Conclusion
In this study, conformational changes in 4-residue fragments between the DNA-free and DNA-bound forms were analyzed using structural alphabets, which enable a precise description of the variety of local protein conformations. The results revealed the importance of intrinsic conformational flexibility for DNA binding: (1) intrinsic conformational variations are more frequent in DNA interfaces than in non-interfaces, and (2) conformationally changed fragments in DNA interfaces favor disorder-promoting amino acids. In addition, three specific alphabets (B, C, and H) were found to appear more often in the DNA interfaces upon DNA binding, although the roles of these conformations in DNA binding vary. These findings may contribute to more accurate prediction of the DNA binding sites of proteins and of the potential conformational changes in the complex form.