Conformational Dissection of a Viral Intrinsically Disordered Domain Involved in Cellular Transformation

Intrinsic disorder is abundant in viral genomes and provides conformational plasticity to their protein products. In order to gain insight into its structure-function relationships, we carried out a comprehensive analysis of structural propensities within the intrinsically disordered N-terminal domain from the human papillomavirus type-16 E7 oncoprotein (E7N). Two E7N segments located within the conserved CR1 and CR2 regions present transient α-helix structure. The helix in the CR1 region spans residues L8 to L13 and overlaps with the E2F mimic linear motif. The second helix, located within the highly acidic CR2 region, presents a pH-dependent structural transition. At neutral pH the helix spans residues P17 to N29, which include the retinoblastoma tumor suppressor LxCxE binding motif (residues 21-29), while the acidic CKII-PEST region spanning residues E33 to I38 populates polyproline type II (PII) structure. At pH 5.0, the CR2 helix propagates up to residue I38 at the expense of PII structure, due to charge neutralization of acidic residues. Using truncated forms of HPV-16 E7, we confirmed that pH-induced changes in α-helix content are governed by the intrinsically disordered E7N domain. Interestingly, while at both pH values the region encompassing the LxCxE motif adopts α-helical structure, the isolated 21-29 fragment including this stretch is unable to populate an α-helix even at high TFE concentrations. Thus, the E7N domain can populate dynamic but discrete structural ensembles by sampling α-helix-coil-PII-β-sheet structures. This high plasticity may modulate the exposure of linear binding motifs responsible for its multi-target binding properties, leading to interference with key cell signaling pathways and eventually to cellular transformation by the virus.


Introduction
In 1894, Emil Fischer proposed that proteins must acquire a unique three-dimensional globular structure in order to achieve functionality. This hypothesis was first challenged by the discovery of gene sequences that encoded unfolded proteins [1] and by the existence of proteins that have more than one minimum energy state in their folding landscapes [2]. These groups of proteins, which are functional but lack a compact, well-defined secondary or tertiary structure in solution, are known as "natively unfolded" or "intrinsically disordered proteins" (IDPs) [3,4]. After the first examples of IDPs were introduced [5,6], a high proportion of IDPs or proteins with intrinsically disordered domains (IDDs) were uncovered as increasing genomic information for different organisms became available. IDPs and IDDs are widespread in nature, representing 28% of prokaryotic proteomes and more than 32% of eukaryotic proteomes [7]. They are involved in crucial biological processes such as signaling, molecular recognition and cell homeostasis, and are associated with human pathologies including cancer, neurodegenerative and cardiovascular diseases, among others [8]. Several algorithms have been developed for predicting intrinsic disorder. Some of them are based on the overall residue hydrophobicity/net charge ratio [9], whereas others are based on energy content estimated from amino acid composition [10]. However, experimental information about the dynamics of IDPs and their conformational ensembles in solution is required for better understanding structure-function relationships in intrinsic disorder and for improving the accuracy of algorithm predictions.
The most widely used experimental techniques for IDP studies include protease mapping, Far-UV circular dichroism spectroscopy (CD), nuclear magnetic resonance (NMR), In-cell NMR, analytical ultracentrifugation (AUC), fluorescence correlation and vibrational spectroscopy, among others [11]. IDPs and IDDs present hydrodynamic properties that indicate that they are extended in solution and present low levels of consolidated secondary structure, high conformational flexibility and dynamic residual secondary structure ensembles, which are highly sensitive to changes in the local environment such as pH, ionic strength and temperature [2,3]. It has been proposed that the multiple protein interactions involving IDPs can be mediated by linear motifs, short linear interaction-prone segments [12] often present within disordered regions that undergo disordered to ordered transitions upon binding [13]. Some IDPs present strong conformational tendencies towards the bound conformation in the unbound state, suggesting the presence of local preferences for transient secondary structure elements within binding sites [14].
The proportion of IDDs and IDPs was noticed to be particularly high in viral genomes, where a small number of gene products or their combination is sufficient for completion of the viral life cycle. This increased proportion of disordered regions has been associated with high adaptability and mutation rates, and also with the high structural flexibility that allows interaction with multiple cellular and viral targets conferring a multifunctional nature [15,16]. Paradigmatic examples are papillomaviruses (PVs), which are small double-stranded DNA viruses with circular genomes of less than 9 kbp and only eight polypeptide products [17]. PVs are medically relevant human pathogens, since persistent infections can lead to different types of cancers, in particular cervical cancer [18]. Since PVs lack the required enzymes for viral genome replication and transcription, they depend on the cell machinery in order to carry out their viral life cycle [19]. In this regard, early proteins E6 and E7 interact with the tumor suppressors p53 and retinoblastoma tumor suppressor (pRb), respectively, leading to their proteasomal degradation [20]. Degradation of pRb mediated by E7 causes the release of the general transcription factor E2F, promoting S phase entry and therefore cell cycle progression [21]. E7 is the main transforming product from PVs [22] and has been reported to interact with a large number of cellular targets affecting multiple cell regulatory pathways [23,24]. These features highlight the binding promiscuity of the E7 oncoprotein, and may explain the many different mechanisms through which E7 can lead to cell transformation and cancer [25].
We have shown that HPV16 E7 can self-assemble into defined spherical oligomers (E7SOs) upon removal of its coordinated zinc atoms [39]. The E7N domain faces the solvent in E7SOs, providing solubility to the otherwise insoluble E7C oligomer [39,40]. The highly stable E7SOs present amyloid-like properties, display chaperone holdase-like activity [39,41] and were shown to be located in the cytosol of HPV-transformed cell lines and cancerous tissue, where they can interact with numerous binding partners [25]. Moreover, the E7 oncoprotein is under the repressive control of the PV E2 master regulator [21], whose open reading frame is disrupted upon integration of the viral genome into the host chromosome. In the absence of E2, E7 levels become deregulated, producing an increase of oligomeric states in the cytosol that may contribute to cellular transformation. The interaction between E2 and E7 takes place through the IDD E7N domain and suggests a mutually sequestering mechanism mediated by oligomerization or aggregation that may balance repression or overexpression of E7 [42].
Given the medical relevance of HPV and the prototypic nature of the HPV E7 oncoprotein as a model viral IDP [30,43], we set out to experimentally dissect transient secondary structure elements within the E7N domain by using a fragmentation approach combined with Far-UV CD and NMR spectroscopies, AUC, and solvent stabilization. We identified two sequence stretches with strong propensity towards α-helical structure, located within the CR1 and CR2 regions respectively. The helix within CR1 is not affected by changes in pH, while the C-terminal region of the helix within CR2 alternates with polyproline type II (PII) structure depending on charge neutralization of the highly acidic stretch. Many of the structures acquired by E7 fragments depend on their sequence context and map to protein interaction sites, highlighting their structural plasticity and functional relevance. We discuss our results in the light of the promiscuous binding properties of the E7N domain and on their possible effect on the local conformational equilibria of the CKII-PEST region, which modulates protein turnover.
Comparison of the Far-UV CD spectra of the E7N sub-fragments in 10 mM Tris.Cl buffer at pH 7.5 showed a lack of canonical secondary structure with a general appearance of disorder (Figure 2A). Most sub-fragments presented similar spectra, with molar ellipticity values around -15,000 deg·cm²·dmol⁻¹ resembling those of the previously described E7  domain [27]. However, two sub-fragments presented significant changes in different regions of their spectra: the minimal LxCxE fragment E7 (21-29) showed an atypical shape with a positive band at ~230 nm that suggested a turn-type structure, and the E7 (25-40) fragment presented a largely increased negative minimum at ~200 nm (around -23,000 deg·cm²·dmol⁻¹) characteristic of PII conformation [46] (Figure 2A), suggesting that not "all disorder" in the fragments was similar.
[Figure caption fragment. C) Maximum percentage of α-helix content induced by TFE in E7N and the sub-fragments in 20 mM Tris.Cl buffer pH 7.5 (white bars) and 20 mM sodium formate buffer pH 5.0 (gray bars). The percentage of residues in α-helix conformation was calculated by fitting the TFE titrations of each fragment to a two-state coil-helix equilibrium model (Figure S1 and Table 2). The symbol (*) indicates no α-helix induction. D) pH titration curves for different fragments followed by molar ellipticity at 222 nm in 10 mM citrate-phosphate buffer containing 30% TFE.]
A further means of stabilizing local secondary structure in peptides is the anionic detergent sodium dodecyl sulfate (SDS), which is known to stabilize α-helix in E7N at supra-micellar concentrations (CMC = 5 mM), and may stabilize β-sheet conformations at sub-micellar concentrations [27]. In order to minimize charge effects, we worked at pH 3.0, where all the acidic residues were neutralized. At 25 mM SDS (supra-micellar concentration), all fragments tested except for E7 (25-40) stabilized α-helix to different extents (Figure 5A), in agreement with the TFE results. At 1 mM SDS (sub-micellar concentration), we found that only E7N showed a tendency to form a β-sheet structure. However, sub-fragments E7 (16-31), E7  and E7 (25-40) showed Far-UV CD spectra similar to those in aqueous buffer, indicating no stabilizing effect of SDS (Figure 5B). The E7 (1-20) fragment stabilized α-helix even at sub-micellar SDS concentrations (Figure 5A-B), which confirmed the high propensity of this fragment towards α-helical populations and was in line with the results from TFE experiments, where high helical content was induced irrespective of pH (Figure 3D).
Overall, these results confirmed the helical propensities observed in TFE experiments and suggested that the formation of β-sheet structure within E7N might involve the establishment of long-range interactions.
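The two-state coil-helix equilibrium model used to fit the TFE titrations (Figure S1 and Table 2) can be illustrated with a minimal sketch. It assumes that the free energy of helix formation decreases linearly with TFE concentration, so that the model is described by a free energy in water and an m-value; the exact equation, sign conventions and units used for the fits are not stated in this section, so the function names and the parameter values below are hypothetical placeholders and not fitted results.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def helix_fraction(tfe, dg_h2o, m_value, temp_k=298.15):
    """Helical fraction for a two-state coil <-> helix equilibrium.

    dg_h2o  : free energy of helix formation in water (kcal/mol);
              positive values mean the coil is favored in water (assumed convention).
    m_value : assumed linear decrease of that free energy per unit of TFE.
    """
    dg = dg_h2o - m_value * tfe                    # linear free energy dependence
    k_helix = math.exp(-dg / (R_KCAL * temp_k))    # equilibrium constant [helix]/[coil]
    return k_helix / (1.0 + k_helix)


def observed_ellipticity(tfe, dg_h2o, m_value, theta_coil, theta_helix, temp_k=298.15):
    """Population-weighted signal (e.g. molar ellipticity at 222 nm)."""
    f_helix = helix_fraction(tfe, dg_h2o, m_value, temp_k)
    return theta_coil + f_helix * (theta_helix - theta_coil)


if __name__ == "__main__":
    # Illustrative parameters only: the transition midpoint is dg_h2o / m_value.
    for tfe in (0, 10, 20, 30, 40, 50):
        print(tfe, round(helix_fraction(tfe, dg_h2o=1.5, m_value=0.075), 3))
```

In this form the transition midpoint lies at the TFE concentration where the free energy is zero, which is why fragments with similar free energies in water and similar m-values are expected to show similar helix stability.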
We estimated the frictional ratio of the peptides, f/f_min, which is a useful parameter for describing the shape of macromolecules in solution [50] (see Materials and Methods). The f/f_min values of globular proteins are nearly constant, increasing from 1.15 to 1.3 over the 5 to 1000 kDa molecular mass range, while f/f_min values for IDPs are significantly larger and increase from 1.5 to 3 for molecular masses ranging from 5 to 200 kDa [3,50]. The f/f_min values for all E7N fragments ranged between 1.40 and 1.53 (Table 3), which indicated that the E7N domain as well as the E7 (1-20) and E7 (16-40) fragments had hydrodynamic properties characteristic of IDPs. Finally, we analyzed the hydrodynamic radius (R_H) for the different fragments by using AUC at pH 7.5 and for the E7N domain by pulsed field gradient NMR experiments (PFG-NMR) at pH 7.5 and pH 5.0. PFG-NMR measures molecular diffusion rates and accurately complements the overall analysis of size and shape of proteins in solution by AUC. The R_H values for the E7N domain from AUC and NMR measurements at pH 7.5 and similar concentrations (~1 mM) were 17 ± 1 Å and 15.3 ± 0.2 Å, respectively. The small discrepancy between the R_H values obtained may be due to differences intrinsic to both techniques. Both R_H values obtained for E7N were significantly larger than the R_H value expected for a 40-residue globular protein (13.8 Å) [51], confirming the IDP nature of E7N. However, both values were also smaller than the R_H value predicted for a fully unfolded polymer (18.1 Å) [51], which suggested some degree of compaction of the domain that may be due to the presence of transient long-range interactions. PFG-NMR experiments showed a further decrease in the R_H value at pH 5.0 (14.3 ± 0.3 Å, Table 3), suggesting that protonation of the acidic residues contributed to E7N compaction.
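The reference radii quoted for a 40-residue chain (13.8 Å for a globular protein and 18.1 Å for a fully unfolded polymer) can be reproduced with the empirical power-law relations of Wilkins et al. (1999), R_H = 4.75·N^0.29 Å for native proteins and R_H = 2.21·N^0.57 Å for highly denatured chains. That these are the relations behind [51] is an assumption made here on the basis of the numerical match. The compaction index follows the definition used with those relations (1 for a chain as compact as a folded protein, 0 for one as expanded as a denatured chain); it is shown as an illustration and is not part of the analysis reported in the text.

```python
def rh_folded(n_residues):
    """Expected hydrodynamic radius (in Angstrom) of a native globular protein."""
    return 4.75 * n_residues ** 0.29


def rh_unfolded(n_residues):
    """Expected hydrodynamic radius (in Angstrom) of a highly denatured chain."""
    return 2.21 * n_residues ** 0.57


def compaction_index(rh_measured, n_residues):
    """1.0 = as compact as a folded protein, 0.0 = as expanded as a denatured chain."""
    folded = rh_folded(n_residues)
    unfolded = rh_unfolded(n_residues)
    return (unfolded - rh_measured) / (unfolded - folded)


N_E7N = 40  # residues in the E7N domain

# Measured R_H values quoted in the text (Angstrom).
measured = {
    "AUC, pH 7.5": 17.0,
    "PFG-NMR, pH 7.5": 15.3,
    "PFG-NMR, pH 5.0": 14.3,
}

if __name__ == "__main__":
    print("folded reference:", round(rh_folded(N_E7N), 1))
    print("unfolded reference:", round(rh_unfolded(N_E7N), 1))
    for label, rh in measured.items():
        print(label, round(compaction_index(rh, N_E7N), 2))
```

All three measured values fall between the two references, and the index increases from pH 7.5 to pH 5.0, consistent with the compaction upon protonation described above.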
Chemical shift assignment of poorly dispersed E7N spectra
An essential step towards the description of fluctuating conformational equilibria in IDPs is the complete assignment of residue chemical shifts by NMR. A first inspection of the one-dimensional ¹H spectrum of E7N in aqueous solution at pH 5.0 showed poorly dispersed amide proton resonances (data not shown), with most peaks within the 7.8-8.5 ppm range, making assignment a rather challenging task. The methyl proton region also showed poor dispersion of peaks, confirming the expected disordered nature of E7N. Using two-dimensional ¹H-¹H and ¹H-¹³C spectroscopy we were able to fully assign the proton and carbon resonances of E7N in aqueous solution at pH 5.0 (Figure 6 and BMRB code: 19269). Due to the great signal overlap in the amide proton region, the ¹H-¹³C HMQC-TOCSY spectrum was essential to assign the different spin systems, while the sequential assignment of E7N was based on NOE connectivity in the ¹H-¹H NOESY spectrum (Figure 6A). In the ¹H-¹³C HMQC spectrum, residue resonances were mainly grouped by amino acid type, a feature also characteristic of IDPs (Figure 6B).
Interestingly, in all the experimental conditions under which we studied E7N (see below), both prolines, P6 and P17, were fully in the trans configuration. Only one set of peaks was observed for each proline residue and, as determined from ¹³Cβ and ¹³Cγ chemical shifts [52], they corresponded to the trans conformation. In addition, in the NOESY spectrum, the diagnostic strong Hδ-Hα (i, i-1) NOEs for the trans-proline isomer were observed for the P6 and P17 residues, while no Hα-Hα (i, i-1) cross peak, characteristic of the cis-proline conformation, was detected for either proline residue (data not shown). This is in contrast with the linker region joining the disordered N-terminal and the globular C-terminal domains of E7, which also harbors two proline residues (P41 and P47) that present a cis-trans isomeric equilibrium [29].
Transiently populated α-helical conformations by NMR chemical shift analysis
Secondary chemical shifts are normally used to assess both the location and population of secondary structure elements in proteins. However, secondary structure in IDPs is typically transient and confined to short individual helical or extended segments, with ensemble-averaged structured populations that can be as low as a few percent. Because chemical shifts represent a population-weighted average of local conformational preferences, secondary shifts for IDPs are generally small and, consequently, transient secondary structures are difficult to detect [53]. Frequently, uncertainties in the random coil shifts can lead to ambiguities in this type of analysis of IDPs [54]. Short-range NOEs can also be used to corroborate secondary structure propensities detected through chemical shifts, although in our case this was not possible because of the significant signal superposition in the amide and alpha regions of the proton spectrum. As a consequence, in order to estimate the E7N secondary structure propensities, we compared the chemical shifts of the domain under pH and solvent polarity conditions that stabilize particular conformations of the peptide as revealed by Far-UV CD experiments (with and without TFE at pH 5.0 and 7.5). The presence of TFE caused significant chemical shift changes and line broadening in the ¹H spectrum, suggesting that the peptide underwent a major structural change in this condition, in agreement with the results from Far-UV CD experiments. At pH 7.5 and 3 mM E7N (the concentration required for NMR assignments at ¹³C natural abundance), we were able to measure chemical shifts using 50% TFE, where maximal helical content is reached (Figure 7A and Figure S1), in spite of the fact that the spectral quality was poorer due to signal broadening (data not shown).
In contrast, at pH 5.0 we observed E7N precipitation at 3 mM peptide and high TFE concentrations, and for this reason we assigned the resonances at 12% TFE, where the α-helical stabilization is still significant (Figure 7B and Figure S1) and no precipitation was detected. Thus, we fully assigned E7N in the four experimental conditions: pH 7.5 and pH 5.0, in the presence or absence of TFE (BMRB code: 19269 and Tables S1-S3). Although the structure of E7N in TFE:water mixtures was also largely disordered, an increase in chemical shift dispersion indicated a higher degree of stabilization of secondary structure elements. The ¹³Cα and ¹Hα shifts are the most robust indicators of residual secondary structures [53]. Positive differences in ¹³Cα chemical shifts and negative differences in ¹Hα chemical shifts in TFE solutions with respect to water both indicate α-helical propensity for a given segment of residues. The chemical shift data at pH 7.5 (Figure 7A) clearly highlight consecutive negative ¹Hα and positive ¹³Cα chemical shift differences for regions encompassing residues ⁸LHEYML¹³ and ¹⁷PETTDLYCYEQLN²⁹, indicating the presence of two distinct regions with α-helical conformation within E7N. The similar ΔG H2O and m-values obtained for the helix-coil transition in fragments E7 (1-20) and E7  which contain these regions (Table 2) suggest that both helixes have similar thermodynamic stability. Figure 7B shows the analysis of chemical shift differences at pH 5.0. Because of the different TFE contents, the magnitudes of the shifts at pH 5.0 and pH 7.5 are not comparable. However, the same segments that display α-helical preferences at pH 7.5 also do so at pH 5.0.
In addition, the second helix is propagated to the C-terminal stretch of the domain, comprising the acidic CKII-PEST region and spanning from residue P17 to I38 as judged by ¹³Cα and ¹Hα chemical shifts. Although the shifts suggest that the helical structure in this segment is less populated than in the N-terminal region of the helix, this acidic region is able to partially stabilize the α-helical conformation at low pH. The NMR experiments showed that a total of 20 residues are involved in α-helix at pH 7.5 and 50% TFE. In contrast, Far-UV CD TFE titrations yielded values of ~10% residues in α-helix conformation in this condition (Table 2), which corresponds to an average of 4 residues fully populating the α-helical conformation [55]. This value was five-fold lower than the number of residues identified by NMR, suggesting that α-helical segments within E7N were partially populated, with average values of about 20%, and highlighting the dynamic nature of these conformations.
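The comparison between the CD and NMR estimates is simple arithmetic, sketched below using only the numbers quoted in the text: about 10% helix by CD in the 40-residue E7N domain, and 20 residues with helical chemical shift signatures by NMR. The variable and function names are illustrative.

```python
N_RESIDUES_E7N = 40          # length of the E7N domain
CD_HELIX_FRACTION = 0.10     # ~10% helix from Far-UV CD at pH 7.5, 50% TFE
NMR_HELICAL_RESIDUES = 20    # residues with helical shifts at pH 7.5, 50% TFE


def fully_helical_equivalents(helix_fraction, n_residues):
    """Number of residues that would be helical if each were 100% populated."""
    return helix_fraction * n_residues


def mean_helix_population(helix_fraction, n_residues, n_helical_sites):
    """Average helical population per residue within the segments seen by NMR."""
    return fully_helical_equivalents(helix_fraction, n_residues) / n_helical_sites


equivalents = fully_helical_equivalents(CD_HELIX_FRACTION, N_RESIDUES_E7N)
fold_difference = NMR_HELICAL_RESIDUES / equivalents
population = mean_helix_population(CD_HELIX_FRACTION, N_RESIDUES_E7N, NMR_HELICAL_RESIDUES)

if __name__ == "__main__":
    print("fully helical residue equivalents:", equivalents)
    print("NMR residues / CD equivalents:", fold_difference)
    print("mean helical population:", population)
```

CD reports the ensemble-averaged helix content, whereas the chemical shift analysis reports which residues sample the helix at all, so the ratio of the two gives the average occupancy of the helical segments.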

Analysis of E7N polyproline type II elements by Far-UV CD and coupling constants
Polyproline type II structure is prevalent in unfolded or intrinsically disordered proteins [56]. However, this conformation is not trivial to identify as it can be stabilized only locally at the individual residue level, and can present features characteristic of disordered chains, often confused with "random coil" properties [56]. Detailed analysis of Far-UV CD spectra can be used to detect PII structure. The intensity of the minimum typical of "disordered" states at ~198-200 nm and of the positive band at 218-220 nm are sensitive probes for measurement of PII content [46]. Moreover, low temperature stabilizes PII conformations, leading to a decrease in the minimum at 198 nm and to an increase in the positive band at 218 nm [56]. In order to study the regions within E7N that can adopt PII structure, we analyzed the Far-UV CD spectra of all the E7N sub-fragments at different temperatures (Figure S5). Figure 8A shows this analysis for the isolated E7 (25-40) fragment. Low temperature led to a decrease in ellipticity at ~200 nm and to an increase in ellipticity at ~218 nm (Figure 8A). The difference spectrum (5 °C - 75 °C) showed the induction of a strong positive band at 218 nm and a concomitant decrease at ~200 nm, suggesting that PII structure was prevalent within E7 (25-40) (Figure 8A inset). This was in agreement with the results from pH titration of this fragment, which also showed a difference spectrum (pH 8.0 - pH 3.0) suggestive of the induction of PII structure (Figure 8A inset). Further analysis of the difference spectra for all sub-fragments revealed an increase in PII content at pH 7.5 with respect to pH 5.0 only in the E7 (25-40) and E7 (16-40) fragments (Figure 8B), which suggested that deprotonation of the polyacidic stretch at neutral pH led to an increase in charge repulsion that favored an extended PII structure.
This was a strong indication that the poly-acidic CKII-PEST region was enriched in PII structure, as we previously described for the inter-domain CKII-PEST containing peptide of BPV-E2 [38].
We aimed at obtaining further evidence of PII structure within E7N by complementing the secondary chemical shifts study with the analysis of ¹J CαHα couplings by NMR [57]. Given that observed scalar couplings are population-weighted averages of couplings sampled over various conformations, any deviation from random coil values can be interpreted as a secondary coupling contribution, in analogy to secondary chemical shifts. In particular, ¹J CαHα coupling constants are a reliable indicator of both α-helical and PII structures at the residue level. Regions populating PII structure are difficult to identify by NMR because the PII backbone conformation does not exhibit characteristic proton or carbon chemical shift deviations from random coil values as found in α-helix and β-sheet conformations [58]. A characteristic pattern of short-range NOE connectivity is also distinctive of PII conformations [59]. Nonetheless, the number of observable inter-residue correlations in the E7N NOESY spectra was limited and precluded the use of NOE connectivity to identify PII regions. A good indication of PII structures is also provided by ¹J CαHα constants, which present values larger than random coil values [60]. However, α-helices also display increased ¹J CαHα constant values. Therefore, regions that do not present chemical shift deviations but exhibit relatively large ¹J CαHα values are candidates to populate PII conformations.
In order to have further confirmation of α-helical propensities and to investigate the presence of PII structure in E7N, we measured ¹J CαHα values, obtained from a proton-coupled HMQC spectrum (Table S4). Figure 8C shows the difference between observed and random coil ¹J CαHα values [61], Δ¹J CαHα, for pH 5.0 and pH 7.5. The deviations of Δ¹J CαHα values were in agreement with the preference for α-helical conformation of the ⁷TLHEYML¹³ and ¹⁷PETTDLYCYEQLND³⁰ tracts at both pH values, as determined from chemical shift analysis (Figure 7). In water at pH 5.0 and 20 °C, the constants displayed positive deviations from H9 to Q16 and from L22 to I38 for residues whose constants could be obtained, providing an indication of residual α-helical structure in these regions. The slight discrepancies between information derived from chemical shifts and coupling constants may be due to the use of constant random coil values that do not take into account neighbor and side chain charge effects [61]. We further measured the ¹J CαHα constants and Δ¹J CαHα values in aqueous buffer at pH 7.5 and at low temperature (4 °C), where PII conformation is stabilized. Despite the large peak overlap, we were able to measure four coupling constants in the PEST region (E32, E33, D36 and I38, Figure 8C), which presented positive differences with respect to tabulated random coil values. The fact that at pH 7.5 the PEST region exhibits positive Δ¹J CαHα values (Figure 8C) yet no significant chemical shift dispersion even in the presence of TFE (Figure 7A) makes it a candidate for PII stabilization. Taken together, the NMR results strongly suggest that the ³³EEEDEI³⁸ tract presents PII conformation at pH 7.5.
Finally, both CD and NMR experiments indicate that when acidic side chains in the E7 (25-40) region are deprotonated at pH 7.5, the electrostatic repulsion stabilizes extended conformations such as PII and precludes the formation of α-helix. Conversely, at low pH, where acidic side chains are neutralized, α-helix conformations are favored.
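The decision logic used in this section to distinguish α-helix from PII at the residue level can be summarized in a short sketch. Helical residues combine a positive ¹³Cα and a negative ¹Hα chemical shift difference, whereas PII candidates show a positive Δ¹J CαHα without a significant chemical shift deviation. The cut-off values and the example residues below are hypothetical placeholders chosen for illustration; they are not taken from the data reported here.

```python
from dataclasses import dataclass


@dataclass
class ResidueObservation:
    name: str
    d_ca: float   # 13C-alpha chemical shift difference (ppm)
    d_ha: float   # 1H-alpha chemical shift difference (ppm)
    d_1j: float   # observed minus random coil 1J(CaHa) (Hz)


def classify(obs, ca_cut=0.3, ha_cut=0.05, j_cut=0.5):
    """Assign a qualitative propensity; all cut-offs are illustrative assumptions."""
    helical_shifts = obs.d_ca > ca_cut and obs.d_ha < -ha_cut
    no_shift_deviation = abs(obs.d_ca) <= ca_cut and abs(obs.d_ha) <= ha_cut
    if helical_shifts:
        return "alpha-helix"
    if no_shift_deviation and obs.d_1j > j_cut:
        # Large coupling without a shift deviation: candidate for PII.
        return "PII candidate"
    return "unassigned"


# Made-up example values, for illustration only.
examples = [
    ResidueObservation("helix-like", d_ca=1.0, d_ha=-0.20, d_1j=1.0),
    ResidueObservation("PII-like", d_ca=0.05, d_ha=0.00, d_1j=1.2),
    ResidueObservation("coil-like", d_ca=0.00, d_ha=0.00, d_1j=0.1),
]

if __name__ == "__main__":
    for obs in examples:
        print(obs.name, "->", classify(obs))
```

Checking the chemical shifts first reflects the argument in the text: an increased coupling constant alone cannot separate α-helix from PII, so the absence of a shift deviation is what singles out the PII candidates.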

Discussion
In the present work we performed a structural dissection of the HPV-16 E7N IDD [27] by making use of a fragmentation approach in combination with spectroscopic and biophysical techniques. This domain belongs to a prototypical viral oncoprotein, which is considered as the main transforming protein from PVs and shares sequence similarity and functional properties with related viral oncoproteins. Moreover, the E7N IDD contains several linear recognition motifs located in its CR1 and CR2 regions that are responsible for its multi-target binding properties [37].
Most sub-fragments used for the structural dissection display similar molar ellipticity values and disordered-like Far-UV CD spectra with a minimum at ~200 nm, and present transient α-helical populations that can be stabilized by TFE to different degrees (Figures 2 and 3). Two fragments within CR2 show no α-helix propensity, even at high TFE, and display distinct features in their spectra: E7 (21-29), comprising the LxCxE motif, presents a spectrum typical of a turn-type structure, while E7 (25-40), including the CKII-PEST region, displays a drastically increased negative ellipticity at 198 nm, pointing to a PII-type structure within this segment [56,62]. In addition, AUC experiments showed that E7N (E7(1-40)), the CR1 (E7(1-20)) and CR2 (E7(16-40)) fragments are monomeric and extended in solution, with similar f/f_min values characteristic of IDPs. These results show that although E7N and its conserved regions present global hydrodynamic properties typical of IDPs, not "all disorder" within the E7N domain is similar.
We found a pH-dependent increase in α-helical content within E7  at high TFE that was not observed in either full-length E7 or in the truncated version E7   (Figure 4), which indicates that the information required for pH-dependent α-helix formation is local and lies within the E7N domain. By performing Far-UV CD and NMR measurements, we identified three regions with propensity to populate transient secondary structural elements: two α-helixes located within the CR1 and CR2 regions, respectively, and a region rich in PII structure within CR2 (Figures 6-8). NMR studies revealed that at pH 7.5 the α-helixes spanned residues L8 to L13 (Helix I) within CR1 and P17 to N29 (Helix II) within CR2, while residues E33 to E37 within the acidic stretch present PII conformation. At pH 5.0, Helix I remains unchanged, while Helix II propagates into the acidic region, encompassing residues P17 to I38 (Figure 9). Far-UV CD measurements of the E7 (25-40) fragment support these results, showing a decrease in PII structure at low pH, which is presumably due to a reduced electrostatic repulsion between the acidic residues that relaxes the extended conformation and favors the propagation of the α-helix into this region. This is similar to what is observed in poly-glutamic sequences, which also preferentially populate PII conformational space when glutamate side chains become deprotonated [63]. These results reveal a conformational switch between PII structure and α-helix within the CR2 acidic stretch, which is triggered by charge neutralization at low pH.
Interestingly, fragment E7 (21-29), comprised within Helix II, is not able to stabilize an α-helix in isolation even at high TFE (Figure 2), which suggests that the residue required for helix initiation is not contained within this peptide [47]. NMR experiments show that P17 is the first residue of Helix II in E7N, in agreement with the fact that the minimal region capable of forming an α-helix is E7 (16-31), which contains P17. These results, together with previous evidence showing that proline is frequently found as an N-capping residue in helixes [64], lead us to hypothesize that P17 may act as the initiation site for Helix II in E7N. In addition, the high conservation of P17 within HPV E7 proteins [37,43] suggests a functional role for this structural element across HPV E7 proteins. Although further testing will be required to assess this possibility, the compaction of E7N at low pH measured here by PFG-NMR and reported by previous size exclusion chromatography experiments [27] suggests that Helix II within E7N may be further stabilized through transient long-range interactions between the E7 CR1 and CR2 regions. In addition, SDS was able to stabilize a β-sheet type structure in E7 (1-40) that was not observed in any of the E7N sub-fragments (Figure 5), indicating that local sequence information is not sufficient for the formation of this structure within E7N and suggesting the presence of transient long-range interactions, whose nature and extent will need to be addressed by future NMR studies.
Two of the main interaction motifs within E7N show different conformational properties related to their target-bound forms. The E2F-mimic motif forms an α-helix in the bound form [65,66], while the region corresponding to this motif (residues 8-14) presents high α-helical propensity in the unbound form. This suggests that binding of the E2F-mimic motif to Rb may involve the selection of preformed structural elements [14], although further work will be required to test this hypothesis. On the other hand, the LxCxE Rb-binding motif presents an extended β-strand-like conformation in the bound state [44,67]. This region presents an extended, turn-like Far-UV CD spectrum in isolation, but is part of transient Helix II in the context of the E7N domain. These different structural tendencies of the LxCxE motif might be largely influenced by the neighboring sequence context, by environmental conditions, or by transient long-range interactions. The population of conformations that do not correspond to the Rb-bound state in this region could constitute an example of functional "misfolding" within IDPs, where conformations populated by the unbound state partially protect binding sites from undesired contacts [68]. Alternatively, the helix conformation may mediate binding to additional targets by this region. Interestingly, kinetic studies of complex formation between the retinoblastoma tumor suppressor AB domain and E7N fragments including E7 (21-29) revealed that, if present, the sampling of conformational ensembles in this region is very fast [69], in contrast to the E7N-E7C hinge region, where the interaction mechanism with a specific antibody (M1) displays a slow conformational selection step involving proline isomerization [29].
The different timescales of conformational equilibria within E7N may be associated to a diversity of binding mechanisms, a salient feature of intrinsically disordered domains that may further contribute to their multi-target binding properties.
CKII and PEST sites are contiguous in most HPV E7 proteins, evolved together, and are functionally coupled in Rb binding [24,37]. Phosphorylation of S31 and S32 within the CKII-PEST region increases Rb binding affinity and PII content in this region [24,70]. Results from this work show that the presence of negative charge destabilizes Helix II and increases PII content in the CKII-PEST region, which may favor an extended conformation of the LxCxE motif that enhances Rb interaction affinity. The helix-PII equilibrium in this E7N region may also be related to protein degradation. We have previously shown that phosphorylation of a CKII-PEST motif within an intrinsically disordered ''hinge'' region of BPV-1 E2, which shows the presence of both α-helix and PII separated by a turn, leads to an increase in protein turnover in vivo [38]. This example introduced a new concept whereby phosphorylation was not used as a ''label'' for recognition by kinases or the degradation machinery but instead destabilized local structure within this region, causing a decrease in thermodynamic stability that made the CKII-PEST site more accessible for degradation [38]. Although this hypothesis must be further tested, switching between α-helix and PII structure induced by physiological pH variations or phosphorylation might hinder the propagation of Helix II into the acidic region at pH 7.5, leaving the PEST site more accessible in the form of PII, with direct consequences on E7 turnover.
Interestingly, a very recent publication reported secondary α-helical structures from homology and ab initio modeling in the same regions that we describe here by solvent stabilization and NMR [71]. There is an important difference in that the extent of formation of Helix II is largely pH dependent and is in equilibrium with PII structure, something that cannot be predicted by modeling. In any case, the ID domain of HPV-16 E7 samples a discrete number of conformations, but its most stable structure in solution is extended and intrinsically disordered.
In summary, E7 (1-40) is a paradigmatic example of an IDD defined through bioinformatic and experimental analyses, which is able to populate dynamic but discrete structural ensembles that are tuned by pH. As the acidic residues are deprotonated, their side chains repel each other electrostatically, and the peptide adopts a more extended conformation involving an increase in PII at the expense of a loss in α-helix. The papillomavirus (PV) life cycle depends on the differentiation from basal epithelial cells to keratinocytes, which may involve substantial changes in the physicochemical environment inside the cell, including parameters such as crowding, redox state or pH. Our results suggest that these changes could affect the conformational equilibria and function of E7 through sampling of α-helix-coil-PII-β-sheet structures that may modulate the exposure of post-translational modification, degradation and protein interaction sites. This conformational plasticity may potentiate E7 interference with key cell signaling pathways required for completion of the viral life cycle and contribute to cellular transformation by the virus.

Buffers and solutions
Unless stated otherwise, measurements at pH 7.5 were performed in 10 mM Tris.Cl buffer and at pH 5.0 in 10 mM sodium formate buffer, both with 1 mM DTT at 20 ± 0.1 °C. All chemical reagents were of analytical grade (purchased from Sigma Aldrich) and all solutions were prepared with distilled and deionized water (Milli-Q plus) and filtered through 0.22 μm membranes prior to use.

Far-UV Circular Dichroism (CD) spectroscopy
CD measurements were carried out on a Jasco J-810 spectropolarimeter (Jasco, Japan) with cell paths of 0.1 and 0.2 cm. CD spectra were recorded between 195 and 260 nm at standard sensitivity and at a scanning speed of 50 nm/min with a response time of 8 s, a data pitch of 0.1 nm and a bandwidth of 2 nm. All spectra were an average of at least 8 scans. Baseline measurements using buffer alone were subtracted from the measured spectra. The protein concentration used for E7 (1-98) and E7 (27-98) was 10 μM. Peptide concentrations ranged from 20 to 80 μM depending on the peptide size. Raw data were converted to molar ellipticity using the following equation: Where deg is the raw signal in millidegrees, [c] is the protein concentration in molar units, # bonds is the number of peptide bonds (number of amino acids - 1), and L is the path length in cm.
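The conversion described above can be sketched in a few lines. This is a minimal illustration, not the authors' script: the function name and the example numbers are hypothetical, and the factor of 10 is the standard one that yields units of deg cm² dmol⁻¹ per residue when the raw signal is in millidegrees, the concentration in mol/L and the path length in cm. The scaling constant used in the original equation is assumed, since it is not shown here.

```python
def mean_residue_ellipticity(signal_mdeg, conc_molar, n_residues, path_cm):
    """Convert a raw CD signal (millidegrees) to mean residue ellipticity.

    Assumed units of the result: deg cm^2 dmol^-1 per peptide bond.
    The number of peptide bonds is taken as (number of amino acids - 1),
    as stated in the text.
    """
    n_bonds = n_residues - 1
    return signal_mdeg / (10.0 * conc_molar * n_bonds * path_cm)


# Hypothetical reading: -10 mdeg for a 40-residue peptide at 20 uM in a 0.1 cm cell.
example = mean_residue_ellipticity(-10.0, 20e-6, 40, 0.1)
```

Keeping the number of residues as the input, rather than the number of bonds, mirrors how peptide fragments are named in the text and avoids an off-by-one slip when switching between fragments.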

Analysis of alpha helical content in different solvent mixtures
2,2,2-trifluoroethanol (TFE) measurements. TFE stabilization of helical conformations is stronger at lower temperatures, with little variation between 5 °C and 25 °C [47]. TFE titration curves were carried out at 20 °C, a temperature that produces maximal stabilization in model peptides [47]. Samples were dissolved in 20 mM Tris.Cl buffer at pH 7.5 and in sodium formate buffer at pH 5.0 and equilibrated in 0 to 53% TFE (volume of TFE added/total volume added) [62].
Data analysis. The mean residue ellipticity at 222 nm, which is assumed to be proportional to helical content, was plotted as a function of the [TFE]/[H2O] ratio and was fit to a two-state ''coil to helix'' equilibrium model [47]. This model assumes that the free energy for α-helix formation depends linearly on the [TFE]/[H2O] ratio. To extract the thermodynamic parameters ΔG0,H2O and m, molar ellipticity data were fitted to the following equation: Where n is the number of residues for a given peptide. The TFE titration data from the full-length protein E7 (1-98) and the truncated E7 (27-98) protein could not be fitted to the two-state model. In this case, the percentage of residues in α-helical conformation was calculated from the initial and final molar ellipticity values at 222 nm.
pH titration curves at 30% TFE. The samples were prepared in 10 mM citrate-phosphate buffer ranging from pH 3.0 to 8.0 mixed with 30% TFE and incubated for one hour at room temperature.
Data analysis. The mean residue ellipticity at 222 nm as a function of pH was fit to a variant of the Henderson-Hasselbalch equation [72] to extract the global pKa value for each fragment, assuming that all ionizable groups are deprotonated in the initial state and protonated in the final state: Where y is the measured signal, Din and Pin are the spectroscopic signals of the deprotonated and protonated states, respectively, and Dm and Pm account for the linear variation of the signals with pH. The fractions of deprotonated and protonated species (fD and fP) are defined as:
Sodium dodecyl sulfate (SDS). Experiments were carried out in 10 mM sodium formate buffer at pH 3.0. Samples were equilibrated for one hour at room temperature with different concentrations of SDS (100 μM to 25 mM). The critical micelle concentration (CMC) of SDS in the sodium formate buffer used in the present work was previously reported [27].
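The two fitting models described above (the two-state coil to helix model for TFE titrations and the Henderson-Hasselbalch variant with sloping baselines for pH titrations) can be sketched as follows. This is an illustrative reading of the text, not the authors' code. Several points are assumptions: the sign convention ΔG = ΔG0,H2O - m·[TFE]/[H2O], the use of kcal/mol, treating the coil and helix ellipticities as free parameters (the original equation involves the number of residues n in a way that is not shown here), and a plain single-pKa titration with no cooperativity term. All function and parameter names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1 (unit choice is an assumption)


def helix_fraction(ratio, dg_water, m, temp_k=293.15):
    """Two-state helix population at a given [TFE]/[H2O] molar ratio.

    Assumes the free energy of helix formation varies linearly with the ratio:
    dG = dg_water - m * ratio (sign convention assumed).
    """
    dg = dg_water - m * ratio
    k = math.exp(-dg / (R_KCAL * temp_k))  # coil <-> helix equilibrium constant
    return k / (1.0 + k)


def ellipticity_tfe(ratio, dg_water, m, theta_coil, theta_helix, temp_k=293.15):
    """Population-weighted signal at 222 nm for the two-state model."""
    f_helix = helix_fraction(ratio, dg_water, m, temp_k)
    return theta_coil + f_helix * (theta_helix - theta_coil)


def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch fraction of the deprotonated species (fD)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def ellipticity_ph(ph, pka, d_in, d_m, p_in, p_m):
    """Signal y as the sum of two states, each with a linear pH baseline.

    d_in, p_in: signals of the deprotonated and protonated states.
    d_m, p_m: slopes describing the linear drift of each signal with pH.
    """
    f_d = fraction_deprotonated(ph, pka)
    f_p = 1.0 - f_d
    return (d_in + d_m * ph) * f_d + (p_in + p_m * ph) * f_p
```

Either function can be handed to a nonlinear least-squares routine to recover ΔG0,H2O and m, or the global pKa, from a titration curve. At the midpoint of each transition (ΔG = 0, or pH = pKa) both models return the average of the two state signals, which is a quick sanity check on a fit.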

Analysis of polyproline type II (PII) structure by circular dichroism
Low temperatures were used in order to stabilize PII structures [56]. CD spectra for the different E7 peptides were measured at temperatures ranging from 5 to 75 °C. The propensity to adopt PII structure for each peptide was assessed by analyzing the difference spectrum (5 °C - 75 °C) at 218 nm [46].
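The difference-spectrum readout described above amounts to a point-by-point subtraction. The sketch below assumes each spectrum is held as a mapping from wavelength (nm) to mean residue ellipticity. The function names and the numbers in the example are hypothetical and are not data from this work.

```python
def difference_spectrum(spectrum_cold, spectrum_hot):
    """Cold-minus-hot difference at every wavelength present in both spectra."""
    shared = sorted(set(spectrum_cold) & set(spectrum_hot))
    return {wl: spectrum_cold[wl] - spectrum_hot[wl] for wl in shared}


def pii_signal(spectrum_cold, spectrum_hot, wavelength_nm=218):
    """Difference (for example 5 C minus 75 C) at a single wavelength."""
    return spectrum_cold[wavelength_nm] - spectrum_hot[wavelength_nm]


# Hypothetical ellipticities (deg cm^2 dmol^-1) at two wavelengths.
cold = {218: -2000.0, 222: -3000.0}
hot = {218: -4500.0, 222: -3500.0}
delta_218 = pii_signal(cold, hot)
```

A more positive difference at 218 nm is commonly read as a larger PII population at low temperature, so the single number returned by `pii_signal` allows fragments to be ranked against each other.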

Analytical ultracentrifugation of peptides
Sedimentation velocity (SV) analytical ultracentrifugation (AUC) experiments were performed for peptides E7 , E7 (1-20) and E7  in 50 mM Tris.Cl buffer at pH 7.5 (Figure S3). All AUC experiments were performed on a Beckman Coulter XL-I analytical ultracentrifuge. SV experiments were performed at 20 °C, at a rotor speed of 50000 rpm using the 8-hole An-50 Ti rotor. Cells were equipped with sapphire windows. Titanium double-sector centerpieces from Nanolytics Inc. were used. Cells with centerpieces of 0.3 cm optical path were filled with 100 μl of sample and solvent reference. SV profiles were acquired over 24 hours, using absorbance optics, at intervals of 13 min for each cell. Density and viscosity of the buffer, which are required for the analysis, were calculated with SEDNTERP software, from John Philo (http://www.jphilo.mailway.com/). The partial specific volume of the peptides was calculated from the amino acid sequences using SEDNTERP. Analyses of SV experiments were made using the continuous distribution c(s) and the noninteracting species model analysis of SEDFIT software from P. Schuck (http://www.analyticalultracentrifugation.com). In order to study non-ideality effects in solution, various protein concentrations were assessed (dilutions 1:1, 1:2 and 1:3). The sedimentation and diffusion coefficients s0 and D0app were derived from linear approximations to infinite dilution (Figure S4) and used to obtain the apparent molecular mass (Mapp) and hydrodynamic radius (RH) for each peptide according to the Stokes-Einstein and Svedberg equations:

$$R_H = \frac{RT}{N_A\, 6\pi\eta\, D_{0app}}$$

$$M_{app} = \frac{s_0\, RT}{D_{0app}\,(1 - \bar{v}\rho)}$$

Where R is the gas constant, T the absolute temperature, N_A Avogadro's number, η the solvent viscosity, ρ the solvent density and $\bar{v}$ the partial specific volume of the peptide (see Ref. [50] for a complete description of the methods). The f/fmin value depends on the hydration, surface roughness, shape and flexibility of the particle. This value is the ratio of the hydrodynamic radius (RH), which depends on the size of the molecule in solution, to the minimum theoretical hydrodynamic radius of the anhydrous volume (Rmin):

$$f/f_{min} = \frac{R_H}{R_{min}}$$

Where Rmin depends on the cubic root of the molar mass (M) and the non-hydrated (partial specific) volume of the particle ($\bar{v}$):

$$R_{min} = \left(\frac{3 M \bar{v}}{4\pi N_A}\right)^{1/3}$$
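The chain of calculations described in this section (extrapolation to infinite dilution, Svedberg, Stokes-Einstein and the frictional ratio) can be sketched as follows. This is a hedged illustration rather than the SEDFIT or SEDNTERP procedure. It assumes CGS units throughout (s in seconds, D in cm² s⁻¹, viscosity in poise, density in g ml⁻¹, partial specific volume in ml g⁻¹), uses the relation s⁻¹ = s0⁻¹ + ks·s0⁻¹·c given in the Supporting Information for the concentration dependence, and all function names are hypothetical.

```python
import math

R_GAS = 8.314462618e7   # gas constant in erg mol^-1 K^-1 (CGS)
N_A = 6.02214076e23     # Avogadro's number, mol^-1


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_s0(conc, s_values):
    """Fit 1/s = 1/s0 + (ks/s0) * c and return (s0, ks)."""
    slope, intercept = linear_fit(conc, [1.0 / s for s in s_values])
    s0 = 1.0 / intercept
    return s0, slope * s0


def apparent_mass(s0, d0, vbar, rho, temp_k=293.15):
    """Svedberg equation: M = s0 R T / (D0 (1 - vbar * rho)), in g/mol."""
    return s0 * R_GAS * temp_k / (d0 * (1.0 - vbar * rho))


def hydrodynamic_radius(d0, eta, temp_k=293.15):
    """Stokes-Einstein equation: R_H = R T / (N_A 6 pi eta D0), in cm."""
    return R_GAS * temp_k / (N_A * 6.0 * math.pi * eta * d0)


def minimal_radius(mass, vbar):
    """Radius (cm) of an anhydrous sphere of molar mass M and specific volume vbar."""
    return (3.0 * mass * vbar / (4.0 * math.pi * N_A)) ** (1.0 / 3.0)


def frictional_ratio(r_h, r_min):
    """f/fmin as the ratio of hydrodynamic to minimal radius."""
    return r_h / r_min
```

Multiplying a radius in cm by 1e8 gives Å. Values of f/fmin well above 1 indicate a particle that is more extended or more hydrated than a compact sphere of the same mass.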

NMR Experiments
NMR experiments were performed on a Bruker 600 MHz Avance III spectrometer equipped with a 5 mm triple resonance cryoprobe incorporating shielded z-axis gradient coils. All the heteronuclear 13C-1H correlation experiments were carried out at natural abundance. Pulsed field gradients were appropriately employed to achieve suppression of the solvent signal and spectral artifacts. The proton carrier was centered on the H2O frequency. Quadrature detection in the indirectly detected dimensions was obtained using the States-TPPI or the echo-antiecho method and the spectra were processed with the NMRPipe software [73] and analyzed using NMRView [74].

Chemical shift assignment
In order to explore the conformational properties of E7 (1-40), all proton and carbon resonances of an unlabelled 3 mM sample of E7 (1-40) were assigned at 20 °C, for which the best quality spectra were obtained, in the following solutions: H2O at pH 5.00 and 7.50, containing 5% D2O; TFE-d2 (50%), final pH 7.5; TFE-d2 (12%), final pH 5.0. All NMR samples contained 10 mM TCEP to avoid oxidation during spectra acquisition. The experimental conditions were carefully controlled such that no oxidation or precipitation of the peptide occurred during measurements and that the pH of the sample remained stable.
Proton and carbon chemical shift assignments were achieved using a combination of homonuclear and heteronuclear standard 2D experiments. 1H-1H NOESY (100 and 250 ms mixing times), 1H-1H TOCSY (30 and 70 ms mixing times), 1H-13C HMQC, optimized to observe aliphatic or aromatic regions, and HMQC-TOCSY (70 ms mixing time) spectra were acquired. The NOESY experiments were acquired with 4096 (t2) and 1024 (t1) complex points, with spectral widths of 9615 and 6000 Hz in the direct and indirect dimensions, respectively. The TOCSY data sets consisted of 2048 (t2) and 800 (t1) complex points with the same spectral widths used in the NOESY experiments. The 1H-13C HMQC and 1H-13C HMQC-TOCSY data were collected with 2048 (t2) and 800 (t1) complex data points, and spectral widths of 9615 (1H) and 8906 (13C) Hz, respectively. The 13C carrier was centered at 40.0 ppm. For the aromatic 1H-13C HMQC, 2048 (t2) and 256 (t1) complex data points were collected, the spectral widths were 9615 and 3774 Hz, respectively, and the carbon carrier was at 130.0 ppm. Chemical shifts in water at pH 5.0 were deposited in the Biological Magnetic Resonance Bank (BMRB code: 19269). For the other experimental conditions see Tables S1, S2 and S3.

Transient secondary structures of E7 (1-40)
Typically, secondary chemical shifts are a convenient and accurate way to determine the secondary structure of a polypeptide [75]. However, E7 (1-40) exhibits shifts that are significantly smaller than those of ordered proteins, as expected for an IDP. Therefore, to estimate the transient secondary structures of E7 (1-40), we compared the chemical shifts of the domain observed under different experimental conditions that stabilize particular conformations of the peptide (different pH values and temperatures, and the presence of TFE as a co-solvent). In this way, we were able to investigate the secondary structure conformations that E7 (1-40) populates in solution. To assist in the study of the local conformational preferences of E7 (1-40), the 1J CαHα coupling constants were also measured. In this case, proton-coupled 1H-13C HMQC spectra were acquired. To increase resolution, the spectral width of the indirectly detected dimension was lowered to a fraction of the 13C spectral window. The alpha region was selected and the rest of the signals were folded into the reduced spectral window. Thus, the 1H-13C HMQCs were acquired with the 13C carrier also centered at 40.0 ppm, but with 2048 (t2) and 2048 (t1) complex data points, and spectral widths of 9615 (1H) and 2868 (13C) Hz, respectively.
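The gain in resolution from narrowing and folding the 13C dimension can be illustrated with a short calculation. The sketch below uses the nominal digital resolution, taken here as spectral width divided by the number of complex points in t1, before any zero filling or linear prediction. That simplification and the function name are assumptions made for illustration.

```python
def digital_resolution_hz(spectral_width_hz, complex_points):
    """Nominal digital resolution (Hz per point) in the indirect dimension."""
    return spectral_width_hz / complex_points


# Standard 1H-13C HMQC: 8906 Hz over 800 complex t1 points.
standard = digital_resolution_hz(8906, 800)

# Folded, proton-coupled HMQC: 2868 Hz over 2048 complex t1 points.
folded = digital_resolution_hz(2868, 2048)

improvement = standard / folded
```

With the values quoted in the text, the folded experiment samples the 13C dimension roughly eight times more finely than the standard one (about 1.4 Hz versus about 11.1 Hz per point), which is what makes small variations in one-bond couplings measurable.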

Diffusion measurements
The pulsed field gradient NMR self-diffusion measurements were performed using the PFG-SLED sequence [76]. Dioxane (10 μL, 2% in H2O) was added to the sample (300 μL) as an internal standard [77]. The lengths of all pulses and delays in the sequence were held constant and 19 spectra were acquired with the strength of the diffusion gradient varying between 5% and 95% of its maximum value. The pulse gradient width was 4 ms, and the length of the diffusion delay was calibrated for the sample in order to give a maximal decay of 80-90% for the protein and dioxane signals (150 ms and 20 ms for E7N and dioxane, respectively). The E7N intensity decrease was mostly homogeneous throughout the entire proton spectrum (except for the dioxane and TCEP signals). Therefore, we were able to fit the E7N (1-40) intensity decay by following both the aliphatic and the aromatic-amide regions of the spectrum. A T2 filter was used to selectively observe the dioxane signal, without interference from the protein, and therefore to reduce the experimental error, especially at high gradient strengths. The dioxane NMR spectra were acquired with 16 K complex points, and the protein NMR spectra with 4 K complex points. Hydrodynamic radius (RH) values for E7N in different experimental conditions were calculated as follows:

$$R_H = \frac{D_{diox}}{D_{E7}}\, R_{H}^{diox}$$

Where D_diox and D_E7 are the measured diffusion coefficients of dioxane and E7 (1-40), respectively, and R_H^diox is the effective hydrodynamic radius of dioxane, taken to be 2.12 Å [77]. Theoretical hydrodynamic radii (in Å) were calculated from the empirical equations for folded proteins:

$$R_H = 4.75\, N^{0.29}$$

And for unfolded proteins:

$$R_H = 2.21\, N^{0.57}$$

Where N is the number of residues [51]. For IDPs, we used an equation that takes into account the main sequence determinants of the RH [78]:

$$R_H = (1.24\, P_{Pro} + 0.904)\,(0.00759\, |Q| + 0.963)\; 2.49\, N^{0.509}$$

Where P_Pro is the fraction of proline residues, 2/40 in E7 (1-40), and |Q| the absolute net charge. |Q| was calculated at pH 5.0 and 7.5 using the Protein Calculator v3.3 software, which uses the pKa values of isolated residues for the individual amino acids.
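The dioxane-referenced calculation and the two length-based reference values can be sketched as below. The relation used for the measured radius is the usual ratio of diffusion coefficients scaled by the dioxane radius of 2.12 Å quoted in the text, and the folded and unfolded predictions use the widely cited empirical power laws in the number of residues. The function names and the diffusion coefficients in the example are hypothetical and are not measurements from this work.

```python
def rh_from_dioxane(d_dioxane, d_protein, rh_dioxane=2.12):
    """Hydrodynamic radius (A) from the dioxane/protein diffusion ratio.

    Both diffusion coefficients must be in the same (arbitrary) units,
    since only their ratio enters the calculation.
    """
    return (d_dioxane / d_protein) * rh_dioxane


def rh_folded(n_residues):
    """Empirical radius (A) expected for a folded protein of N residues."""
    return 4.75 * n_residues ** 0.29


def rh_unfolded(n_residues):
    """Empirical radius (A) expected for a fully unfolded chain of N residues."""
    return 2.21 * n_residues ** 0.57


# Reference window for a 40-residue peptide such as E7 (1-40).
lower, upper = rh_folded(40), rh_unfolded(40)

# Hypothetical measurement: dioxane diffuses 8 times faster than the peptide.
measured = rh_from_dioxane(8.0, 1.0)
```

Placing a measured radius between the folded and unfolded limits gives a quick sense of how compact the ensemble is, and repeating the calculation at each pH shows whether the peptide compacts or expands.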

Supporting Information
Figure S1 TFE titration curves followed by Far-UV CD at pH 7.5 and pH 5.0. Rows represent the TFE titration curves for E7N and E7N sub-fragments. The name of each fragment is indicated at the right of its row. A) Far-UV CD spectra at different TFE percentage mixtures measured at pH 5.0. The arrows show the direction of change upon increasing TFE percentage. B) TFE titration curves followed by ellipticity at 222 nm at different [TFE]/[buffer] molar ratios at pH 7.5 (black circles) and pH 5.0 (open circles). The data were fitted to a two-state coil-helix equilibrium model (see Materials and Methods and Table 2). For peptides E7 (21-29) and E7 (25-40), the data could not be reliably fitted to the two-state model due to the lack of a transition. The vertical dashed lines in the E7N panel show the molar ratios corresponding to 12%, 30% and 53% TFE, respectively. The buffers used for all spectra were 10 mM Tris.Cl buffer (pH 7.5) and 10 mM sodium formate buffer (pH 5.0). All measurements were performed at 20 °C.
Table S1, for s0 and D0app. The dependence of the inverse of the sedimentation coefficient, s⁻¹, on the concentration c (g ml⁻¹) is analyzed by the equation $s^{-1} = s_0^{-1} + k_s s_0^{-1} c$, where s0 is the sedimentation coefficient extrapolated to zero concentration, and ks is the concentration dependence coefficient for s (ml g⁻¹). For the apparent diffusion coefficient Dapp, the dependence on the concentration c (g ml⁻¹) is analyzed by the equation $D_{app} = D_{0app} + k_D D_{0app} c$, where D0app is the diffusion coefficient extrapolated to zero concentration, and kD is the concentration dependence coefficient for Dapp (ml g⁻¹).
(TIF)
Figure S5 Analysis of polyproline type II (PII) propensity within the E7N domain. Difference spectrum between 5 °C and 75 °C for E7N and E7N sub-fragments performed in 10 mM Tris.Cl buffer at pH 7.5 (full line) and 10 mM sodium formate buffer at pH 5.0 (dotted line). The fragment names are indicated inside each panel.

(TIF)
Table S1 1H and 13C chemical shift assignments of E7N in aqueous solution containing 10 mM TCEP and 5% D2O at 20 °C and pH 7.5. a 1H chemical shifts are reported in ppm with an accuracy of ±0.02 ppm. 13C chemical shifts are reported in ppm with an accuracy of ±0.1 ppm. b Carbon chemical shifts first, and proton chemical shifts in brackets. c These signals may be interchangeable.