Characterization of Intrinsically Disordered Prostate Associated Gene (PAGE5) at Single Residue Resolution by NMR Spectroscopy

Background The Cancer-Testis antigens (CTAs) are proteins expressed in the human germ line and in certain cancer cells. CTAs form a large gene family, representing 10% of X-chromosomal genes. They have high potential for cancer-specific immunotherapy; however, their biological functions are currently unknown. Prostate associated genes (PAGEs) are classified as CTAs. PAGE5, also called CT16, is one of six proteins belonging to this protein family. Methodology/Principal findings In this study we show, using bioinformatics, chromatographic and solution state NMR spectroscopic methods, that PAGE5 is an intrinsically disordered protein (IDP). Conclusion/Significance This study is the first structural characterization of a PAGE family protein. It also shows how solution state NMR spectroscopy can be used effectively to identify molecular recognition features (MoRFs) in IDPs, which often appear as transiently populated secondary structures.


Introduction
The Cancer-Testis antigens (CTAs) are expressed mainly in normal human trophoblasts and germ line, i.e. testis and placenta, but not in other healthy cells [1]. Some cancer cells turn on CTA expression through epigenetic regulation, i.e. by DNA hypomethylation and histone post-translational modifications [2]. The majority of CTA genes are X-chromosome linked, and CTAs represent 10% (99 in total) of all X-chromosomal genes [3]. These genes include the GAGE, MAGE, SSX, NXF, SPANX, CSAGE and ESO/LAGE gene families, which have been found by X-chromosome DNA sequencing and sequence analysis [3]. The expression profile of X-chromosome linked CTAs is more restricted than that of non-X-linked ones [4]. Their limited expression profile, together with their ready recognition by the immune system of cancer patients, makes CTAs highly useful for cancer-specific immunotherapy, i.e. they have great potential as therapeutic vaccines against specific cancers [5,6,7,8,9]. Owing to the increasing interest in CTAs and their applications, a publicly available knowledge-based database of CTAs has recently been established (http://www.cta.lncc.br/index.php) [10]. Prostate associated genes (PAGEs), together with their sequentially homologous proteins, X antigens (XAGEs) and G antigens (GAGEs), are members of the GAGE gene family [11]. The exact biological functions of these proteins, either in prostate or in cancer, remain to be characterized, although recent studies have highlighted anti-apoptotic properties of PAGE4 [12] and GAGE7 [13]. Interestingly, cancer cell resistance to chemo- and radiotherapies has been associated with the anti-apoptotic features of GAGE7 [13]. There are six different PAGE proteins (PAGE1, 2, 2B, 3, 4 and 5) [11,14,15], expressed in prostate or testis and also in several cancer cells. PAGE5 has been recognized as a potential diagnostic marker for specific cancers, as increased expression levels are observed in melanoma, renal and lung cancer cells [4,16,17].
Members of the PAGE family are small proteins of 102-146 amino acids. A closer examination of their amino acid composition reveals a high abundance of charged/hydrophilic residues and few hydrophobic residues, characteristic of intrinsically disordered proteins (IDPs) [18,19]. Very recently, using bioinformatics tools together with CD and ¹H NMR spectroscopy, PAGE4 has been characterized as a disordered protein that contains an N-terminal nuclear localization signal (NLS). In addition, a biochemical assay showed that PAGE4 binds dsDNA [12]. However, more detailed structural studies of GAGE gene family products are needed.
Over the past several years, an increasing number of studies on IDPs and proteins with intrinsically disordered regions (IDRs) have been reported. This has increased our knowledge (and awareness) of proteins that lack a well-defined three-dimensional structure yet exhibit essential biological functions, challenging the "structure defines function" paradigm. In addition to the classical, rigid lock-and-key binding model established for many folded proteins, conformational dynamics in the form of conformational selection or induced fit is a general feature of protein interactions, and the interaction of a disordered protein with a ligand may induce (partial) folding of unstructured parts [20]. However, a protein-protein interaction does not necessarily require folding and may take place without well-ordered conformations, a property termed fuzziness [21]. IDPs and IDRs cannot necessarily be described as random flight chains; they often contain short recognition sites such as preformed structural elements (PSEs) [22], molecular recognition features (MoRFs) and eukaryotic linear motifs (ELMs). PSEs are short disordered regions in IDPs that tend to form transiently populated secondary structures, which may function as ligand binding sites [22]. MoRFs are short protein segments that undergo disorder-to-order transitions upon binding to their ligands [23]. ELMs, in turn, are disordered, exposed recognition sites with characteristic physicochemical properties that use distinct binding mechanisms [24]. A disorder-to-order transition upon binding is thermodynamically costly. In folded proteins the bound conformation may already exist, whereas in IDPs the disordered binding region must fold into a binding conformation, which adds an entropic penalty to the Gibbs free energy of binding. However, the disorder-to-order transition offers several functional benefits: low-affinity and reversible binding, fast ligand binding, and the ability to bind several ligands (moonlighting) [25]. Furthermore, it decouples affinity from specificity, enabling highly specific interactions of low affinity. Consequently, IDPs are often involved in regulatory processes and signaling. Among cellular compartments, the nucleus is the most enriched in IDPs and IDRs [18].
In this work, we have employed bioinformatics and chromatographic methods as well as solution state NMR spectroscopy for the structural and functional characterization of PAGE5. We show that PAGE5 is a structurally disordered protein that nevertheless contains transiently populated structural elements, and that these elements are more populated at lower pH. In addition, our preliminary studies revealed no PAGE4-like binding to double-stranded DNA. The present study is the first structural and dynamic characterization of a GAGE gene family protein at single-residue resolution.

Size exclusion chromatography and bioinformatics prediction of PAGE5
PAGE5 is highly soluble, remaining in solution at concentrations as high as 1.5 mM. In size exclusion chromatography (SEC), the last step of the purification procedure, PAGE5 elutes at a volume characteristic of a globular protein with a molecular weight of 44 kDa (Figure 1A). Since the molecular weight of monomeric PAGE5 is only 11.8 kDa, SEC suggests an approximately four times larger MW, i.e. a tetrameric protein. Because migration on an SEC column depends on the shape as well as the molecular weight of a protein, we studied the structural features of PAGE5 further by NMR spectroscopy.

We also used IUPred software to predict unstructured parts of PAGE5 and compared the prediction with other PAGE family proteins (Figure 1B). Residues are characterized as disordered if the disorder tendency (DT) exceeds 0.5. According to the prediction, all PAGE proteins are highly disordered. Exceptions are residues 62-73 of PAGE3, where DTs are between 0.37 and 0.49, and the N-terminal residues 1-10 of PAGE1, which are predicted to form a structured region with DT between 0.32 and 0.48. The residues of PAGE5 and PAGE2 corresponding to this region of PAGE3 show lower than average DT (<0.7) for residues 68-72. Of the other PAGE family proteins, only the N-terminus of PAGE2 has lowered DTs, i.e. 0.70-0.77 for residues 1-4. According to the IUPred prediction, PAGE4 is the most disordered, as shown by its highest average disorder tendency (>0.94). Regions with lowered DT may correspond to MoRFs and might explain the different appearance of PAGE proteins in distinct cancer types. ANCHOR software was used to predict MoRFs (Figure 1C). We also used the PSIPRED server to predict secondary structure elements in PAGE5. PSIPRED analysis suggests an α-helical segment for residues 67-79 in PAGE5 (data not shown).

Figure 1. (A) PAGE5 elutes as a single peak from a Superdex S75 (16/60) column, suggesting a molecular weight of 44 kDa, approximately four times higher than the actual molecular mass, 11.8 kDa. The void volume of the column was determined experimentally as 39 ml. The column was calibrated using ovalbumin (elution volume = 58 ml, MW = 43 kDa) and chymotrypsinogen (elution volume = 68 ml, MW = 25 kDa) as standard proteins (GE Healthcare). (B) IUPred software [47] prediction suggests that all PAGE family proteins are highly disordered. (C) MoRFs of PAGE5 predicted by ANCHOR software [48]. Residues forming MoRFs with probability larger than 80% are shaded. doi:10.1371/journal.pone.0026633.g001
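The 44 kDa estimate comes from comparing the elution volume of PAGE5 with the calibration standards. The fitting procedure is not stated, so the Python sketch below assumes the conventional linear relation between log10(MW) and elution volume, drawn through the two standards listed in the Figure 1 legend. The PAGE5 elution volume itself is not reported, so rather than reproducing the estimate, the sketch back-calculates where a 44 kDa globular protein would elute under this assumed calibration. The function names are illustrative only.

```python
import math

# Calibration standards from the Figure 1 legend: (elution volume in ml, MW in kDa)
STANDARDS = [(58.0, 43.0),   # ovalbumin
             (68.0, 25.0)]   # chymotrypsinogen
MONOMER_MW_KDA = 11.8        # theoretical molecular mass of monomeric PAGE5


def fit_log_mw(standards):
    """Two-point linear fit of log10(MW) versus elution volume (assumed procedure)."""
    (v1, m1), (v2, m2) = standards
    slope = (math.log10(m2) - math.log10(m1)) / (v2 - v1)
    intercept = math.log10(m1) - slope * v1
    return slope, intercept


def apparent_mw(elution_ml, slope, intercept):
    """Apparent MW (kDa) of a globular protein eluting at elution_ml."""
    return 10 ** (intercept + slope * elution_ml)


def elution_volume(mw_kda, slope, intercept):
    """Inverse of apparent_mw: expected elution volume (ml) for a given MW."""
    return (math.log10(mw_kda) - intercept) / slope


slope, intercept = fit_log_mw(STANDARDS)
print(f"A 44 kDa globular protein would elute near {elution_volume(44.0, slope, intercept):.1f} ml")
print(f"Apparent / monomer MW ratio: {44.0 / MONOMER_MW_KDA:.2f}")
```

With only two standards this is a straight line through two points; a fuller calibration would use more standards and Kav = (Ve − V0)/(Vt − V0), which is linear in Ve and gives a fit of the same form. The apparent-to-monomer ratio of about 3.7 is what underlies the "approximately tetrameric" reading, which, as noted above, can also arise from the extended shape of a disordered monomer.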

Assignment of NMR resonances in PAGE5
Next we employed NMR spectroscopy to characterize the structure and dynamics of PAGE5 in solution. Figure 2A shows a two-dimensional ¹⁵N,¹H correlation spectrum (¹⁵N heteronuclear single quantum coherence, HSQC) of ¹⁵N,¹³C-labeled PAGE5. The spectrum displays poorly dispersed ¹⁵N,¹H correlations, a hallmark of a disordered protein, stemming from the highly similar chemical environments of amide protons due to rapid interconversion of conformers. Inspection of aliphatic proton chemical shifts, especially the lack of dispersion in the methyl proton region, supports the observations made on amide proton chemical shifts and underscores the disordered nature of PAGE5 (data not shown). At pH 8.5, only 23 amide correlation peaks remained detectable, indicating accelerated exchange of amide protons with water, as they are not protected by a globular structure (Figure 2A and 2B). Further evidence of the disordered nature of PAGE5 was obtained by measuring steady-state {¹H}-¹⁵N heteronuclear NOEs, which report on the rigidity of the protein backbone (Figure 2C). For residues in secondary structure elements of rigid molecules, heteronuclear {¹H}-¹⁵N NOEs typically have values larger than 0.7. For a highly disordered protein backbone, hetNOEs display negative values or values very close to zero. The hetNOE plot as a function of the amino acid sequence of PAGE5 shows small positive and negative NOEs with several zero crossings, confirming the disordered nature of the PAGE5 backbone. However, some segments exhibit clearly positive hetNOEs, indicating transient structural rigidity in PAGE5 (vide infra).
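A steady-state hetNOE is the ratio of peak intensities recorded with and without ¹H saturation. The Python sketch below computes that ratio with a standard propagated noise error and applies the thresholds quoted above (above 0.7 for rigid secondary structure, near zero or negative for disordered backbone). The 0.1 cutoff used for "near zero" and the peak heights are hypothetical, chosen only for illustration.

```python
import math


def het_noe(i_sat, i_ref, noise_sat, noise_ref):
    """{1H}-15N NOE as the intensity ratio I_sat / I_ref, with propagated noise error."""
    noe = i_sat / i_ref
    error = abs(noe) * math.sqrt((noise_sat / i_sat) ** 2 + (noise_ref / i_ref) ** 2)
    return noe, error


def classify_noe(noe, rigid_above=0.7, flexible_below=0.1):
    """Coarse backbone label; the 0.1 'near zero' cutoff is an illustrative assumption."""
    if noe > rigid_above:
        return "rigid"
    if noe < flexible_below:
        return "flexible"
    return "intermediate"


# Hypothetical peak heights (arbitrary units): (saturated, reference)
peaks = {"res1": (78.0, 100.0), "res2": (21.0, 95.0), "res3": (-80.0, 110.0)}
for residue, (sat, ref) in peaks.items():
    noe, err = het_noe(sat, ref, noise_sat=2.0, noise_ref=2.0)
    print(f"{residue}: NOE = {noe:.2f} +/- {err:.2f} ({classify_noe(noe)})")
```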
The number of observed correlations in the ¹⁵N-HSQC spectrum indicated that a few ¹⁵N,¹H cross peaks were missing, owing to line broadening stemming from μs-ms timescale dynamics or to increased NH exchange rates with solvent (vide infra). Chemical shift assignment was initially attempted using iHNCACB [26] and CBCA(CO)NH [27] experiments, but this strategy turned out to be unsuccessful for PAGE5 despite the highly selective intraresidual and sequential magnetization transfer schemes, respectively, utilized in these experiments. As ¹³C′ chemical shifts in IDPs are typically less clustered than ¹³Cα/¹³Cβ shifts [28], a ¹³C′ chemical shift-based assignment approach was next employed, using i(HCA)CONH [29] and HNCO [30] experiments that provide solely intraresidual ¹H(i), ¹⁵N(i), ¹³C′(i) and sequential ¹H(i), ¹⁵N(i), ¹³C′(i-1) correlations, respectively. In this way, a nearly complete assignment of ¹Hᴺ, ¹⁵N, ¹³C′, ¹³Cα and ¹³Cβ resonances was obtained. However, one proline residue as well as the N-terminal segments ¹MSEH⁴ and ⁸SQSS¹¹ remained unassigned. We reasoned that the absence of NH correlations in the N-terminal part is due to rapidly exchanging amide protons. To extend the resonance assignments to these residues, we employed a suite of Hα-detected experiments that are less susceptible to fast NH exchange [31,32]. Using this approach, we obtained a nearly complete assignment of ¹Hα, ¹³C′, ¹³Cα and ¹⁵N resonances also in the N-terminal part of PAGE5 (Supplementary Table S1).
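The ¹³C′-based strategy links spin systems because the sequential ¹³C′(i-1) shift seen in the HNCO of residue i must equal the intraresidual ¹³C′(i-1) shift seen in the i(HCA)CONH of residue i-1. The Python sketch below shows only that matching idea on a single carbonyl dimension with a hypothetical 0.05 ppm tolerance and made-up shifts; real assignment work combines several dimensions and resolves ambiguities, and nothing here reproduces the software actually used.

```python
def link_spin_systems(intra_co, seq_co, tol_ppm=0.05):
    """
    intra_co: {spin_system: C'(i) shift from i(HCA)CONH}
    seq_co:   {spin_system: C'(i-1) shift from HNCO}
    For each spin system A, return the spin systems B whose C'(i-1) matches
    A's C'(i) within tol_ppm, i.e. candidates for the residue following A.
    """
    links = {}
    for a, co_i in intra_co.items():
        candidates = [b for b, co_prev in seq_co.items()
                      if b != a and abs(co_prev - co_i) <= tol_ppm]
        links[a] = sorted(candidates, key=lambda b: abs(seq_co[b] - co_i))
    return links


def walk_chain(links, start):
    """Follow unambiguous (single-candidate) links to build a sequential stretch."""
    chain = [start]
    while len(links.get(chain[-1], [])) == 1 and links[chain[-1]][0] not in chain:
        chain.append(links[chain[-1]][0])
    return chain


# Hypothetical carbonyl shifts (ppm) for three spin systems
intra = {"A": 176.10, "B": 174.50, "C": 177.30}
seq = {"A": 175.00, "B": 176.12, "C": 174.48}
links = link_spin_systems(intra, seq)
print(walk_chain(links, "A"))  # ['A', 'B', 'C']
```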

Chemical shift analysis reveals transiently populated secondary structure elements
NMR chemical shifts are extremely sensitive reporters of transient structural motifs. In proteins, so-called secondary chemical shifts can be used to probe fractional secondary structure, e.g. transient α-helices or extended conformations [33]. We compared the experimentally observed chemical shifts of PAGE5 with nearest-neighbor-corrected random coil chemical shifts obtained from Ac-QQXQQ-NH2 peptides recorded at neutral pH and mild urea concentration [34]. A positive (negative) deviation of ¹³Cα and ¹³C′ chemical shifts from the corresponding random coil shifts indicates α-helical (β-structure) propensity for a given segment of residues. The opposite trend holds for ¹⁵N chemical shifts, i.e. negative (positive) secondary shifts indicate a propensity for α-helical (β-structure) conformation. Figure 3A shows secondary chemical shifts of ¹³Cα spins as a function of the amino acid sequence of PAGE5. The chemical shift data reveal that PAGE5 is a mostly disordered protein but contains a few transiently populated secondary structure elements or local structural segments.
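The secondary chemical shift analysis described above amounts to subtracting a random coil reference from each observed shift and looking for consecutive runs of same-sign deviations. The Python sketch below does exactly that for ¹³Cα. The shift values are hypothetical (they are neither the PAGE5 data nor the reference set of [34]), and the minimum run length of four residues and the 0 ppm threshold are illustrative assumptions.

```python
def secondary_shifts(observed, random_coil):
    """Secondary shift = observed - random coil, per residue number (ppm)."""
    return {res: observed[res] - random_coil[res]
            for res in observed if res in random_coil}


def positive_runs(delta, min_len=4, threshold=0.0):
    """(first, last) residue ranges with >= min_len consecutive shifts above threshold."""
    runs, current = [], []
    for res in sorted(delta):
        if delta[res] > threshold and (not current or res == current[-1] + 1):
            current.append(res)
            continue
        if len(current) >= min_len:
            runs.append((current[0], current[-1]))
        current = [res] if delta[res] > threshold else []
    if len(current) >= min_len:
        runs.append((current[0], current[-1]))
    return runs


# Hypothetical 13C-alpha shifts (ppm); not the PAGE5 data or the reference set of [34]
observed = {64: 45.1, 65: 62.6, 66: 55.0, 67: 63.2, 68: 57.6, 69: 53.3, 70: 57.9}
random_coil = {64: 45.2, 65: 62.1, 66: 54.4, 67: 62.4, 68: 56.8, 69: 52.7, 70: 58.1}
print(positive_runs(secondary_shifts(observed, random_coil)))  # [(65, 69)]
```

A run of positive ¹³Cα secondary shifts flags a candidate helical segment; negative runs (β propensity) can be found the same way after flipping the sign of the deviations.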
¹³Cα shifts are the most reliable indicator of residual secondary structure and clearly highlight consecutive positive secondary chemical shifts for the region ⁶⁶Asp-Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu⁷⁷. This strongly suggests the presence of a fractional α-helical conformation in this region. These observations coincide closely with ¹³C′ chemical shift data, which display significant positive deviations from random coil shifts for residues ⁶⁵Thr-Asp-Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala⁷⁵; overall, the ¹³C′ data indicate that the polypeptide is biased toward both α-helical and β-strand (extended) conformations (not shown). In addition, large deviations from random coil shifts for residues in the C-terminal segment ⁹⁹Pro-Thr¹⁰⁰ hint at nascent local structural order in this short stretch. The region ³²Thr-Glu-Glu-Lys-Arg-Gln-Glu-Glu-Glu-Pro-Pro⁴² shows a much vaguer tendency toward negative ¹³Cα (as well as ¹³C′, not shown) secondary chemical shifts, so evidence for a more extended conformation there remains inconclusive. For a more quantitative analysis, we next employed the secondary structure propensity (SSP) score [35] calculated from ¹Hα, ¹³Cα and ¹³Cβ chemical shifts. In the SSP analysis, α-helical and extended (β-strand) structures receive positive and negative scores, respectively, where +1 and −1 indicate a fully formed α-helix or β-structure. For PAGE5, the regions ⁶⁵Thr-Asp-Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu⁷⁷ and ²⁹Gln-Gln-Pro-Thr-Glu-Glu-Lys-Arg-Gln-Glu-Glu-Glu-Pro-Pro⁴² populate α-helical and extended conformations, respectively, albeit with low propensities of 18% and 9% (Figure 3B). While this is in good accordance with the analysis based on ¹³Cα (and ¹³C′) secondary chemical shifts for the helical segment, some discrepancy exists for the extended structures. To conclude, the NMR chemical shift data correlate well with the secondary structure prediction by the PSIPRED algorithm, which suggested a propensity for α-helical conformation in residues ⁶⁷Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu-Lys-Ile⁷⁹.

Heteronuclear ²J_NCα and ¹J_NCα couplings reveal a tendency toward transient secondary structure
Although secondary chemical shifts are highly useful for identifying transiently populated secondary structure elements in IDPs, further evidence can be obtained from analysis of J couplings. Given that observed scalar couplings are population-weighted averages of the couplings sampled over various conformations, any deviation from random coil values can be interpreted as a secondary coupling contribution, in analogy to secondary chemical shifts. While a quantitative description of the relation between protein secondary structure and the one-bond coupling between ¹⁵N(i) and ¹³Cα(i) (¹J_NCα) or the two-bond coupling between ¹⁵N(i) and ¹³Cα(i-1) (²J_NCα) is difficult, ²J_NCα is extremely valuable in distinguishing α-helical or turn conformations from β-structure [36]. Indeed, a fully formed α-helix exhibits ²J_NCα couplings in the range 5.5-7 Hz, whereas β-structures display ²J_NCα couplings of 8-10 Hz [36,37]. Likewise, ¹J_NCα couplings larger than 11 Hz can be associated with β-strands, i.e. ψ angles of 120-180°, whereas values smaller than 9.5 Hz are typically not found for β-strands (ψ ≈ 100-180°). The observed ²J_NCα couplings for the ⁶⁴Gly-Thr-Asp-Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu⁷⁷ segment show a consecutive stretch of smaller than average values compared to the flanking regions, in good accordance with the transiently populated α-helix recognized in the secondary chemical shift analysis above (Figure 4A). Likewise, ¹J_NCα couplings show diminished values for this part of the PAGE5 sequence, providing further evidence of fractional α-helicity. In contrast, residues ³²Thr-Glu-Glu-Lys-Arg-Gln-Glu³⁸, which according to the SSP analysis populate β-strand for a fraction of the time, show slightly elevated ²J_NCα couplings, supporting the transient extended conformation for this region suggested by the secondary chemical shifts. Pro-31, within the longer segment 29-42, is likely to induce a kink in a β-strand. Interestingly, residues ⁹⁷Phe-Asp-Pro-Thr-Lys-Val¹⁰² also display small ¹J_NCα or ²J_NCα couplings, consistent with the short helical stretch suggested by the SSP score analysis. It is noteworthy that prolines have significantly larger ²J_NCα couplings than the vast majority of non-proline residues. However, Pro-99, located in the ⁹⁷Phe-Asp-Pro-Thr-Lys-Val¹⁰² motif, has a drastically smaller ²J_NCα coupling, further supporting local structural ordering of this segment (Supplementary Table S1).

Figure 3. … ¹³Cα, ¹³Cβ chemical shifts of PAGE5. (A) Chemical shift deviations from random coil shifts for ¹³Cα (red bars) as a function of primary structure. (B) Secondary structure propensity score for PAGE5, calculated from ¹Hα, ¹³Cα and ¹³Cβ chemical shifts. Regions with suggested transient secondary structure elements are shaded. doi:10.1371/journal.pone.0026633.g003
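Because an observed coupling is a population-weighted average, a simple two-state picture (helix versus everything else) turns a measured ²J_NCα into a rough helical population: J_obs = p·J_helix + (1 − p)·J_ref. The Python sketch below inverts that relation. It is only an illustration of the averaging argument, not an analysis performed in this study; the helix value of 6.25 Hz is simply the midpoint of the 5.5-7 Hz range quoted above, and the non-helical reference value and observed coupling are hypothetical.

```python
def helix_fraction(j_observed, j_helix, j_reference):
    """Two-state population estimate from J_obs = p*J_helix + (1 - p)*J_reference."""
    if j_helix == j_reference:
        raise ValueError("reference couplings must differ")
    p = (j_reference - j_observed) / (j_reference - j_helix)
    return min(1.0, max(0.0, p))  # clip to a physical population


J_HELIX = 6.25      # Hz, midpoint of the 5.5-7 Hz helical range
J_REFERENCE = 7.8   # Hz, hypothetical non-helical reference value
print(f"helical population ~ {helix_fraction(7.5, J_HELIX, J_REFERENCE):.2f}")
```

Real conformational ensembles have more than two states and the reference values carry uncertainties of the same order as the secondary contributions, so such an estimate is qualitative at best.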

Transiently populated secondary structures show decreased exchange rates with solvent
Labile amide protons in rigid, structured segments of the amino acid sequence are typically protected from chemical exchange with solvent. In contrast, residues in flexible parts of the polypeptide chain typically have solvent-exposed amide protons with modest protection against solvent exchange, i.e. they show increased exchange rates compared to residues in secondary structures. This exchange can be studied by H/D exchange spectroscopy, where the site-specific signal decay is monitored after dissolving the protein sample in D₂O. For IDPs this is often impractical, as H/D exchange is rapid compared to globular proteins. Instead, selective magnetization transfer from solvent protons to amide protons can be monitored with the so-called CLEANEX-PM experiment [38]. In this approach, water magnetization is selectively transferred to amide protons in a series of spectra with increasing mixing times. Figure 4B shows the ratio of amide proton cross-peak intensities in the exchange-transfer and reference spectra for two mixing times (10 ms and 25 ms). Residues that are less accessible to solvent show decreased ratios compared to solvent-exposed residues, especially at shorter mixing times. Strikingly, the C-terminal part of PAGE5, especially residues ⁶⁹Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu⁷⁷ and ⁹⁹Pro-Thr-Lys-Val¹⁰², exhibits significant protection from solvent exchange, indicating the presence of local structural motifs in these regions. In contrast, the N-terminal part of PAGE5 is clearly more prone to exchange with solvent.
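The ratios in Figure 4B grow with the exchange rate, and at short mixing times they grow roughly linearly with it. The Python sketch below uses the two-site build-up expression commonly used with CLEANEX-PM, I/I0 = k/(R1A + k − R1B)·[exp(−R1B·τm) − exp(−(R1A + k)·τm)], where A is the amide and B the water magnetization, together with the initial-rate approximation I/I0 ≈ k·τm. The relaxation rates and exchange rates are placeholders; the study itself reports intensity ratios rather than fitted rates.

```python
import math


def cleanex_ratio(k_ex, tau_m, r1a=1.0, r1b=0.5):
    """
    I/I0 build-up for water-to-amide exchange during the mixing time tau_m (s).
    k_ex: amide-water exchange rate (s^-1); r1a, r1b: apparent relaxation rates
    of the amide and water magnetization (s^-1). Default rates are placeholders.
    """
    decay_a = r1a + k_ex
    return k_ex / (decay_a - r1b) * (math.exp(-r1b * tau_m) - math.exp(-decay_a * tau_m))


def initial_rate_estimate(ratio, tau_m):
    """Short-mixing-time approximation I/I0 ~ k_ex * tau_m, solved for k_ex."""
    return ratio / tau_m


# Hypothetical exchange rates (s^-1) at the two mixing times used in Figure 4B
for k in (1.0, 10.0, 50.0):
    print(k, [round(cleanex_ratio(k, t), 3) for t in (0.010, 0.025)])
```

The sketch illustrates why protected residues stand out most clearly at the shorter mixing time: before the build-up saturates, the ratio is close to proportional to the exchange rate.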

Reduced spectral density mapping indicates restricted sub-nanosecond motions in regions with fractional ordering
Internal molecular dynamics differ between fully formed secondary structure elements and a random flight chain, because motional freedom is more restricted in the former. NMR spectroscopy offers a unique opportunity to study protein dynamics at the residue level by measuring ¹⁵N auto-correlated relaxation rates [39]. Observed variation in these rates therefore reports on differences in local molecular motions, which in turn indicate differences in the local rigidity or stiffness of the polypeptide backbone.
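Three different ¹⁵N relaxation parameters can readily be measured using a ¹⁵N-labeled sample: steady-state heteronuclear {¹H}-¹⁵N NOEs, and ¹⁵N longitudinal (R₁) and transverse (R₂) relaxation rates. Expressed in terms of the spectral density function, J(ω), for dipolar relaxation of ¹⁵N by ¹H and ¹⁵N chemical shielding anisotropy, they are defined as

$$\mathrm{NOE} = 1 + \frac{d^2}{4R_1}\,\frac{\gamma_H}{\gamma_N}\left[6J(\omega_H+\omega_N) - J(\omega_H-\omega_N)\right] \qquad (1)$$

$$R_1 = \frac{d^2}{4}\left[J(\omega_H-\omega_N) + 3J(\omega_N) + 6J(\omega_H+\omega_N)\right] + c^2 J(\omega_N) \qquad (2)$$

$$R_2 = \frac{d^2}{8}\left[4J(0) + J(\omega_H-\omega_N) + 3J(\omega_N) + 6J(\omega_H) + 6J(\omega_H+\omega_N)\right] + \frac{c^2}{6}\left[4J(0) + 3J(\omega_N)\right] + R_{ex} \qquad (3)$$

where $d = \left(\frac{\mu_0 h \gamma_H \gamma_N}{8\pi^2}\right)\langle r_{NH}^{-3}\rangle$ and $c = \Delta\sigma\,\omega_N/\sqrt{3}$; ωH and ωN are the Larmor frequencies of ¹H and ¹⁵N, γH and γN are the gyromagnetic ratios of ¹H and ¹⁵N, h is Planck's constant, μ₀ is the permeability of free space, r_NH is the N-H bond length (1.02 Å) and Δσ is the chemical shielding anisotropy, assuming an axially symmetric tensor (Δσ = −160 ppm). R_ex is the chemical exchange term, which adds to the observed R₂ rates if present.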
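To show how Eqs. 1-3 map a spectral density onto the three observables, the Python sketch below evaluates them at 800 MHz for a rigid, isotropically tumbling rotor, J(ω) = (2/5)·τc/(1 + ω²τc²). That Lorentzian form is an assumption made only for illustration; it is precisely the kind of model that does not describe an IDP well. It does, however, reproduce the qualitative behavior discussed in the text: slow tumbling gives large R₂ and NOEs approaching 0.8-0.9, whereas very fast motion drives the NOE negative. The constants are standard physical values; the function names are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi          # permeability of free space (T m A^-1)
H_PLANCK = 6.62607015e-34     # Planck constant (J s)
GAMMA_H = 2.6752218744e8      # 1H gyromagnetic ratio (rad s^-1 T^-1)
GAMMA_N = -2.7116e7           # 15N gyromagnetic ratio (rad s^-1 T^-1)
R_NH = 1.02e-10               # N-H bond length (m)
DELTA_SIGMA = -160e-6         # 15N CSA, axially symmetric tensor


def dipolar_csa_constants(proton_mhz):
    """Larmor frequencies (rad/s) and the d, c constants; only d**2 and c**2 enter the rates."""
    omega_h = 2 * math.pi * proton_mhz * 1e6
    omega_n = GAMMA_N * omega_h / GAMMA_H          # negative because gamma_N < 0
    d = MU0 * H_PLANCK * GAMMA_H * GAMMA_N / (8 * math.pi ** 2) / R_NH ** 3
    c = DELTA_SIGMA * omega_n / math.sqrt(3)
    return omega_h, omega_n, d, c


def lorentzian_j(omega, tau_c):
    """Spectral density of a rigid, isotropically tumbling rotor (illustrative assumption)."""
    return 0.4 * tau_c / (1 + (omega * tau_c) ** 2)


def relaxation_rates(tau_c, proton_mhz=800.0, r_ex=0.0):
    """Return (R1, R2, NOE) from Eqs. 1-3 for a correlation time tau_c (s)."""
    wh, wn, d, c = dipolar_csa_constants(proton_mhz)

    def j(omega):
        return lorentzian_j(omega, tau_c)

    d2, c2 = d ** 2, c ** 2
    r1 = d2 / 4 * (j(wh - wn) + 3 * j(wn) + 6 * j(wh + wn)) + c2 * j(wn)
    r2 = (d2 / 8 * (4 * j(0) + j(wh - wn) + 3 * j(wn) + 6 * j(wh) + 6 * j(wh + wn))
          + c2 / 6 * (4 * j(0) + 3 * j(wn)) + r_ex)
    noe = 1 + d2 / (4 * r1) * (GAMMA_H / GAMMA_N) * (6 * j(wh + wn) - j(wh - wn))
    return r1, r2, noe


for tau_c in (50e-12, 1e-9, 10e-9):
    r1, r2, noe = relaxation_rates(tau_c)
    print(f"tau_c = {tau_c:.0e} s: R1 = {r1:.2f}, R2 = {r2:.2f} s^-1, NOE = {noe:.2f}")
```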
As can be inferred from Eqs. 1-3, {¹H}-¹⁵N NOEs are sensitive to fast backbone motions on picosecond timescales, whereas ¹⁵N longitudinal (R₁) and transverse (R₂) relaxation rates are sensitive to motions on slower ps-ns timescales. In addition, ¹⁵N R₂ rates may include a contribution from even slower μs-ms motions due to conformational exchange. Hence, analysis of ¹⁵N relaxation rates enables dissection of backbone dynamics on ps-ms timescales.
Classical model-free analysis [40], as applied to globular proteins, distinguishes the overall rotational correlation time (τc) from fast, site-specific internal motions (τe). It is not an appropriate description of dynamics in IDPs, because the assumption that fast internal dynamics can be separated from overall molecular tumbling is violated. A more useful approach is the so-called reduced spectral density mapping (RSDM) [39,41,42], which describes spectral densities at three frequencies, J(0), J(ωN) and J(0.87ωH). Given that |γN/γH| ≈ 0.101, the approach makes the justified simplification that the high-frequency terms J(ωH − ωN), J(ωH) and J(ωH + ωN) can all be represented by a single spectral density, J(0.87ωH), assuming J(ω) ∝ ω⁻² in this frequency range. Eqs. 1-3 can then be solved for the three spectral densities:

$$J(0.87\omega_H) = \frac{4\sigma_{NH}}{5d^2} \qquad (4)$$

$$J(\omega_N) = \frac{4R_1 - 5\sigma_{NH}}{3d^2 + 4c^2} \qquad (5)$$

$$J(0) = \frac{6R_2 - 3R_1 - 2.72\,\sigma_{NH}}{3d^2 + 4c^2} \qquad (6)$$

where $\sigma_{NH} = R_1(\mathrm{NOE} - 1)\,\gamma_N/\gamma_H$ is the ¹H-¹⁵N cross-relaxation rate. J(0), which depends on both ¹⁵N R₂ and R₁, maps spectral densities on ps-ns timescales but also contains a contribution from slower μs-ms motions, mainly governed by conformational exchange (R_ex in Eq. 3). In contrast, J(0.87ωH) is sensitive only to sub-nanosecond motions, whereas J(ωN) is sensitive to ps-ns timescales, although faster (ps) and slower (ns) motions cannot readily be discriminated.
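The reduced spectral density mapping step can be evaluated directly from the measured R₁, R₂ and NOE of each residue. The Python sketch below implements the standard RSDM solutions J(0.87ωH) = 4σNH/(5d²), J(ωN) = (4R₁ − 5σNH)/(3d² + 4c²) and J(0) = (6R₂ − 3R₁ − 2.72σNH)/(3d² + 4c²), with σNH = R₁(NOE − 1)γN/γH and d, c as defined with Eqs. 1-3 at 800 MHz. The input rates in the example are hypothetical, not PAGE5 values, and the function name is illustrative.

```python
import math

MU0 = 4e-7 * math.pi          # permeability of free space (T m A^-1)
H_PLANCK = 6.62607015e-34     # Planck constant (J s)
GAMMA_H = 2.6752218744e8      # 1H gyromagnetic ratio (rad s^-1 T^-1)
GAMMA_N = -2.7116e7           # 15N gyromagnetic ratio (rad s^-1 T^-1)
R_NH = 1.02e-10               # N-H bond length (m)
DELTA_SIGMA = -160e-6         # 15N CSA, axially symmetric tensor


def reduced_spectral_densities(r1, r2, noe, proton_mhz=800.0):
    """
    Reduced spectral density mapping from R1 (s^-1), R2 (s^-1) and the {1H}-15N NOE.
    Returns (J(0), J(omega_N), J(0.87 omega_H)) in s/rad.
    """
    omega_n = GAMMA_N * 2 * math.pi * proton_mhz * 1e6 / GAMMA_H
    d2 = (MU0 * H_PLANCK * GAMMA_H * GAMMA_N / (8 * math.pi ** 2) / R_NH ** 3) ** 2
    c2 = (DELTA_SIGMA * omega_n / math.sqrt(3)) ** 2
    sigma_nh = r1 * (noe - 1) * GAMMA_N / GAMMA_H      # 1H-15N cross-relaxation rate
    denom = 3 * d2 + 4 * c2
    j_high = 4 * sigma_nh / (5 * d2)                   # J(0.87 omega_H)
    j_n = (4 * r1 - 5 * sigma_nh) / denom              # J(omega_N)
    j_0 = (6 * r2 - 3 * r1 - 2.72 * sigma_nh) / denom  # J(0), includes any R_ex
    return j_0, j_n, j_high


# Hypothetical rates for one residue at 800 MHz (not PAGE5 data)
j0, jn, jh = reduced_spectral_densities(r1=1.5, r2=3.4, noe=0.2)
print(f"J(0) = {j0 * 1e9:.3f}, J(wN) = {jn * 1e9:.3f}, J(0.87wH) = {jh * 1e9:.4f} ns/rad")
```

In this framework an increased J(0) together with a decreased J(0.87ωH) points to restricted ps-ns motion, while an increased J(0) alone may instead reflect an R_ex contribution, which is how the patterns for residues 69-78 and Glu-12 are read in the text.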
The measured ¹⁵N R₂ and R₁ rates for PAGE5 are shown in Supplementary Figure S1 and Supplementary Table S1. In particular, the experimental ¹⁵N R₂ rates (average ≈3.39 s⁻¹), measured at 800 MHz ¹H frequency, are significantly lower than predicted for a globular protein of similar size (≈11 s⁻¹), confirming that PAGE5 is an IDP. Inspection of the R₂/R₁ ratio (Supplementary Figure S1) reveals several residues with an elevated R₂/R₁ ratio, i.e. residues whose relaxation is dominated by slower timescale motions, implying restricted motional freedom for a few segments, e.g. ³⁷Gln-Glu³⁸, ⁶⁹Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu⁷⁶ and ¹⁰⁰Thr-Lys¹⁰¹. These correspond to the transient structural elements identified by secondary chemical shift and J coupling analysis. A more detailed relaxation analysis in terms of spectral density mapping at three frequencies is shown in Figure 5A. Restricted backbone motion on ps-ns timescales is observed for residues ⁶⁹Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu-Lys⁷⁸, as indicated by increased J(0) and decreased J(0.87ωH) spectral densities. Interestingly, however, J(0.87ωH) values show no significant decrease for ⁷¹Gln-Gln-Glu⁷³, suggesting restricted backbone dynamics or conformational exchange on a slower ms timescale. Increased J(0) values can also be seen for residues ³⁰Gln-Pro-Thr-Glu-Glu-Lys-Arg-Gln³⁷ and ¹⁰⁰Thr-Lys¹⁰¹. However, the former, highly charged segment shows no significant decrease in ps-timescale dynamics, as evidenced by relatively uniform J(0.87ωH) values.
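Residues with an elevated R₂/R₁ ratio, as discussed above, can be picked out by comparing each residue with the distribution over the whole sequence. The text does not state a numerical criterion, so the Python sketch below assumes a simple "mean plus one standard deviation" cutoff; the function name, the cutoff and the synthetic rates are illustrative only.

```python
from statistics import mean, stdev


def elevated_r2_r1(r1, r2, n_sd=1.0):
    """Residues whose R2/R1 exceeds mean + n_sd * SD (the cutoff choice is an assumption)."""
    ratios = {res: r2[res] / r1[res] for res in r1 if res in r2}
    values = list(ratios.values())
    cutoff = mean(values) + n_sd * stdev(values)
    flagged = sorted(res for res, ratio in ratios.items() if ratio > cutoff)
    return flagged, cutoff


# Synthetic rates (s^-1): uniform except one residue with slower motions
r1_rates = {res: 1.5 for res in range(1, 11)}
r2_rates = {res: 3.0 for res in range(1, 11)}
r2_rates[5] = 6.0
print(elevated_r2_r1(r1_rates, r2_rates))
```

Because an elevated R₂/R₁ can arise either from restricted ps-ns motion or from an R_ex contribution, such a screen only nominates residues; the spectral density analysis is what separates the two cases.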
Ten of the first 12 N-terminal NH resonances are broadened beyond detection due to increased NH exchange with the solvent. Possible conformational exchange can be probed at Glu-12, which flanks this region. Glu-12 shows an increased J(0) value without a concomitant decrease in J(0.87ωH), indicating that the additional line broadening is caused by μs-ms timescale motion in the N-terminal part of PAGE5. The very C-terminal residues display large-amplitude motion on the fast ps timescale, manifested by very low J(0) values as well as large negative heteronuclear NOEs (Figure 2C).

Hydrodynamic radius indicates PAGE5 exists as a monomer in solution
The molecular weight estimated by SEC remained ambiguous: it was not clear whether PAGE5 exists as a monomer, dimer, trimer or tetramer. To further analyze the oligomerization state of PAGE5, we used the PG-SLED diffusion NMR experiment to determine the hydrodynamic radius (Rh) of PAGE5 in solution [43]. By relating the apparent translational diffusion coefficients (D_trans) measured for PAGE5 and for the reference compound 1,4-dioxane, with a known Rh of 2.12 Å, according to

$$R_h^{\mathrm{PAGE5}} = R_h^{\mathrm{dioxane}}\,\frac{D_{trans}^{\mathrm{dioxane}}}{D_{trans}^{\mathrm{PAGE5}}}$$

we obtained Rh(PAGE5) ≈ 31.8 Å. This agrees well with a theoretical Rh of 30.2 Å for a monomeric IDP, predicted using a method that takes into account the amino acid composition of the protein, as described by Marsh and Forman-Kay [44]. It is also comparable to results obtained for other proteins [44].
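In a gradient-based diffusion experiment the signal decays as I = I0·exp(−k·g²), where k is proportional to D_trans with a proportionality factor set by the pulse-sequence timings. When protein and dioxane are measured in the same experiment that factor cancels, so the ratio of fitted decay constants equals the ratio of diffusion coefficients, and the Stokes-Einstein relation gives Rh ∝ 1/D. The Python sketch below fits ln I against g² with `numpy.polyfit` and applies the ratio relation; the decay curves are synthetic and noise-free, and the function names are hypothetical.

```python
import numpy as np

RH_DIOXANE = 2.12  # hydrodynamic radius of 1,4-dioxane (Å), as given in the text


def decay_constant(gradient_strengths, intensities):
    """Fit ln(I) = ln(I0) - k * g**2 and return k (proportional to D_trans)."""
    g2 = np.asarray(gradient_strengths, dtype=float) ** 2
    slope, _intercept = np.polyfit(g2, np.log(np.asarray(intensities, dtype=float)), 1)
    return -slope


def hydrodynamic_radius(gradient_strengths, protein_intensities, dioxane_intensities):
    """R_h(protein) = R_h(dioxane) * D(dioxane) / D(protein), from the same experiment."""
    k_protein = decay_constant(gradient_strengths, protein_intensities)
    k_dioxane = decay_constant(gradient_strengths, dioxane_intensities)
    return RH_DIOXANE * k_dioxane / k_protein


# Synthetic, noise-free decays (gradient as a fraction of maximum); not measured data
g = np.linspace(0.05, 0.95, 12)
dioxane = np.exp(-3.0 * g ** 2)
protein = np.exp(-0.2 * g ** 2)
print(f"R_h = {hydrodynamic_radius(g, protein, dioxane):.1f} Å")  # 2.12 * 3.0 / 0.2
```

With real data, noise in the weak high-gradient points dominates the uncertainty of the fitted slope, which is why the reference compound is measured in the same sample and experiment.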

PAGE5 DNA binding studies and effect of pH on the secondary structure of PAGE5
Although secondary chemical shifts are highly useful for identifying PSEs, i.e. transiently populated helical or extended conformations that may establish interaction modules, not all such regions necessarily correspond to binding epitopes or MoRFs. DNA binding of PAGE5 was predicted using the DBS-Pred software package, which gave an 86% probability that PAGE5 binds DNA. This prediction, together with experimental data on the homologous PAGE4 protein [12], led us to study possible DNA binding by PAGE5. To this end, we employed a ¹⁵N-HSQC-based approach to monitor PAGE4-like, DNA binding-induced chemical shift perturbations in a PAGE5 sample upon addition of a double-stranded DNA fragment pool. In addition, we used an electrophoretic mobility shift assay (EMSA) with similar DNA fragments to detect DNA binding (Supplementary Figure S2). Although we observed neither chemical shift perturbations nor a mobility shift, this does not exclude the possibility that PAGE5 recognizes a specific DNA sequence.
To study the effect of pH on the structure of PAGE5, we compared chemical shifts at three pH values, 5.0, 6.5 and 8.5, all of which are above the theoretical pI of PAGE5 (4.13). In the ¹⁵N-HSQC spectrum measured at pH 8.5, where amide proton exchange with solvent is especially pronounced, the vast majority of amide cross peaks disappeared and only 23 remained visible, mainly belonging to hydrophobic residues (Figure 2A). However, the chemical shifts of these remaining residues did not change. Under acidic conditions (pH 5), amide proton and nitrogen chemical shifts were significantly altered, and the N-terminal HN resonances also became visible. Interestingly, comparison of ¹³Cα chemical shifts at pH 5.0 with the corresponding chemical shifts at pH 6.5 reveals an increasing propensity for α-helical conformation in the region ⁶⁵Thr-Asp-Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu⁷⁷ (Figure 5B). In contrast, the differences in ¹³Cα chemical shifts between pH 6.5 and pH 8.5 were insignificant (data not shown). These observations indicate that the α-helical propensity of the segment ⁶⁵Thr-Asp-Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu⁷⁷ further increases at acidic pH. According to Zbilut et al. [45], proteins that fold via transient secondary structures have a lower net charge and higher hydrophobicity than two-state folders. The charge distribution along the primary sequence of PAGE5 is rather uniform, except for residues 19-32, which contain no charged residues (Supplementary Figure S3). According to the hydropathy score plot, the hydrophobicity of PAGE5 is highest in the regions encompassing residues 22-27 and 71-81. Lowering the pH reduces the magnitude of the net charge of the latter region (71-81), which may explain the increased α-helical propensity observed in the chemical shift analysis. If transiently structured regions serve as MoRFs, the decreased intracellular pH of cancer cells may be biologically significant, promoting interactions between natively disordered PAGE5 and its binding partners.
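The argument that lowering the pH reduces the net charge of the helical region can be illustrated with a Henderson-Hasselbalch estimate. The Python sketch below uses the segment 65-79 (Thr-Asp-Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu-Lys-Ile), whose sequence is given in the text; residues 80-81 are not listed here, so the 71-81 region itself is not used. The side-chain pKa values are approximate textbook figures (real values depend on sequence context), and chain termini are ignored because the segment is internal.

```python
# Approximate side-chain pKa values (textbook-style; real values depend on context)
SIDE_CHAIN_PKA = {"D": 3.9, "E": 4.1, "H": 6.0, "K": 10.5, "R": 12.5}
ACIDIC = {"D", "E"}

# Residues 65-79 of PAGE5 as given in the text (Thr65 ... Ile79), one-letter code
SEGMENT_65_79 = "TDVEAFQQELALLKI"


def side_chain_charge(sequence, ph):
    """Net side-chain charge from Henderson-Hasselbalch; termini are ignored."""
    charge = 0.0
    for residue in sequence:
        pka = SIDE_CHAIN_PKA.get(residue)
        if pka is None:
            continue
        if residue in ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (pka - ph))
        else:
            charge += 1.0 / (1.0 + 10 ** (ph - pka))
    return charge


for ph in (5.0, 6.5, 8.5):
    print(f"pH {ph}: net side-chain charge of 65-79 = {side_chain_charge(SEGMENT_65_79, ph):+.2f}")
```

Under these assumptions the segment carries about −2 at pH 6.5 and 8.5 but only about −1.7 at pH 5.0, consistent with the observation that helicity increases at pH 5.0 but not between pH 6.5 and 8.5.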

Conclusion
Taken together, in the present study we have shown, using experimental data at single-residue resolution, that PAGE5, a member of the GAGE protein family, is an intrinsically highly disordered protein. However, a few regions, i.e. ⁶⁵Thr-Asp-Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu⁷⁷ and ⁹⁷Phe-Asp-Pro-Thr-Lys-Val¹⁰², show a propensity to form α-helical conformations. These regions were identified consistently by secondary chemical shift, J coupling, relaxation and solvent exchange data. Although the propensities for these secondary structure elements are low, the segment ⁶⁵Thr-Asp-Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu⁷⁷ largely overlaps the helix predicted by the PSIPRED algorithm (residues 67-79). Less compelling evidence of a transient extended conformation can be found for residues ²⁹Gln-Gln-Pro-Thr-Glu-Glu-Lys-Arg-Gln-Glu-Glu-Glu-Pro-Pro⁴²; if present, its population is low. It is plausible that these transiently populated secondary structure regions serve as PSEs or MoRFs of PAGE5, thus being potential interaction sites for the natural binding partners of PAGE5 in cancer cells and in germ line cells. Interestingly, we also found that at acidic pH the MoRF region (⁶⁵Thr-Asp-Val-Glu-Ala-Phe-Gln-Gln-Glu-Leu-Ala-Leu-Leu⁷⁷) populates α-helical secondary structure more prominently than at near-neutral pH (6.5). This study also illustrates how solution state NMR spectroscopy can be utilized to characterize unfolded proteins and to recognize transiently populated conformations at single-residue resolution.

NMR sample preparation
The gene encoding variant 2 of CT16 (GenBank accession code NM_001013435) was cloned into pGEX-2T as described previously [17]. ¹³C,¹⁵N-labelled PAGE5 was expressed in Escherichia coli BL21, using 2 g/l ¹³C D-glucose and 1 g/l ¹⁵NH₄Cl as the sole carbon and nitrogen sources, respectively. Glutathione S-transferase (GST)-fused PAGE5 was purified and thrombin-cleaved as described earlier [17]. Cleaved PAGE5 was applied onto a Superdex S75 size-exclusion column and eluted with NMR buffer containing 20 mM sodium phosphate, 50 mM NaCl, pH 6.5. Fractions containing PAGE5 were pooled and concentrated using a Vivaspin2 centrifugal concentrator (MWCO = 2 kDa) to a final protein concentration of 1 mM. Prior to NMR measurements, 7% D₂O was added to the sample. Protein concentrations were measured with the Bio-Rad Protein Assay (Bio-Rad), based on the method of Bradford, using bovine serum albumin (BSA) as a reference. NMR samples containing 7% D₂O were also prepared at different pH values, i.e. in 20 mM Bis-Tris, pH 5 and 20 mM Tris-HCl, pH 8.5.

Preparation of dsDNA pool and EMSA experiment
The degenerate dsDNA pool was prepared by PCR using the primer and template sequences described in [12]. The template contained a degenerate 10-base stretch in which each position could be any of the four nucleotides. In addition, a corresponding template with a 10-base stretch of G or C nucleotides was designed and used to prepare a GC-rich dsDNA pool. To label the dsDNA pools for EMSA, the PCR amplification was repeated in the presence of 67 nM [α-32P]dCTP. EMSA was performed by incubating 20-µl reactions containing 10 µM [α-32P]dCTP-labeled degenerate dsDNA pool; 80, 40, 20 or 0 µM PAGE5; 10% (v/v) glycerol; 50 mM KCl; and 20 mM HEPES (pH 7.4) at room temperature for 40 min, and running them on a 6% polyacrylamide gel in TBE buffer (pH 8.3). The dried gel was visualized using a Fuji BAS-1800 phosphorimager.
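Because every position in the 10-base stretch is randomized, the complexity of each pool and the average representation of each sequence follow from simple counting. The sketch below assumes a 10 µM labelled pool in a 20 µl reaction and equal representation of all sequences, which a PCR-amplified pool only approximates; the function names are hypothetical.

```python
"""Degenerate-pool arithmetic for the EMSA design (illustrative sketch)."""

AVOGADRO = 6.02214076e23  # mol^-1


def pool_complexity(stretch_length: int, alphabet_size: int) -> int:
    """Number of distinct sequences in a fully degenerate stretch."""
    return alphabet_size ** stretch_length


def copies_per_sequence(conc_um: float, volume_ul: float, n_sequences: int) -> float:
    """Average number of molecules of each distinct sequence in the reaction."""
    moles = conc_um * 1e-6 * volume_ul * 1e-6  # (mol/l) * l
    return moles * AVOGADRO / n_sequences


def molar_ratios(protein: list[float], dna: float) -> list[float]:
    """Protein:DNA molar ratios (unit-independent if both share a unit)."""
    return [p / dna for p in protein]


if __name__ == "__main__":
    n_pool = pool_complexity(10, 4)   # any of A, C, G, T at each of 10 positions
    s_pool = pool_complexity(10, 2)   # G or C only
    print(n_pool, s_pool)             # 1048576 1024
    print(f"{copies_per_sequence(10.0, 20.0, n_pool):.2e}")  # 1.15e+08
    print(molar_ratios([80, 40, 20, 0], 10))  # [8.0, 4.0, 2.0, 0.0]
```

Under these assumptions each of the roughly 10^6 possible 10-mers is present in about 10^8 copies, so a sequence-specific binder would not be limited by the absence of its target site from the pool.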
DNA titrations were performed at a constant dsDNA concentration of 40 µM, with the 13C,15N-labelled PAGE5 concentration increased from 10 µM to 40 µM. The titration buffer was 20 mM Bis-Tris, pH 6.5. Spectra were processed using the VNMR 6.1C and VNMRJ 2.1C software packages (Varian Inc., Palo Alto, CA) and analyzed with Sparky [52].
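The titration covers PAGE5:dsDNA molar ratios of 0.25 to 1. One common way to quantify spectral changes along such a titration is a weighted combined 1H/15N chemical shift change, Δδ = sqrt(ΔδH² + (α·ΔδN)²). The Sparky analysis itself is not described here, so the sketch below is an illustration under stated assumptions: peak positions are taken as already read into dictionaries, α = 0.14 is used as a commonly chosen weighting, and the intermediate titration points and peak positions are hypothetical.

```python
"""Titration bookkeeping and combined 1H/15N shift changes (illustrative sketch)."""
import math


def titration_ratios(protein_um: list[float], dna_um: float = 40.0) -> list[float]:
    """PAGE5:dsDNA molar ratio at each titration point."""
    return [p / dna_um for p in protein_um]


def combined_shift_change(d_h: float, d_n: float, alpha: float = 0.14) -> float:
    """Weighted 1H/15N shift change in ppm; alpha rescales 15N onto the 1H scale."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def perturbations(free: dict[int, tuple[float, float]],
                  bound: dict[int, tuple[float, float]],
                  alpha: float = 0.14) -> dict[int, float]:
    """Per-residue combined shift change for residues assigned in both spectra.

    Each dict maps residue number -> (1H ppm, 15N ppm).
    """
    result = {}
    for res, (h0, n0) in free.items():
        if res in bound:
            h1, n1 = bound[res]
            result[res] = combined_shift_change(h1 - h0, n1 - n0, alpha)
    return result


if __name__ == "__main__":
    print(titration_ratios([10, 20, 30, 40]))  # [0.25, 0.5, 0.75, 1.0]
    free = {70: (8.20, 121.0), 71: (8.05, 119.5)}   # hypothetical peak positions
    bound = {70: (8.23, 121.4)}                     # residue 71 not tracked here
    print({r: round(v, 4) for r, v in perturbations(free, bound).items()})
```

Residues missing from either peak list are skipped rather than assigned a zero change, so that line broadening beyond detection is not mistaken for the absence of binding.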

Supporting Information
Table S1 15N R1 and R2 relaxation rates, {1H}-15N heteronuclear NOE, heteronuclear 1JNC and 2JNC couplings, and chemical shifts of PAGE5. (PDF)

Figure S1 15N R1 and R2 relaxation rates and R2/R1 ratio of PAGE5 plotted as a function of primary structure. (PDF)

Figure 2.
Figure 2. 15N-HSQC spectra and heteronuclear NOE values suggest that PAGE5 is an IDP. (A) 2D 15N-HSQC spectra of uniformly 15N,13C-labelled PAGE5, recorded at pH 6.5 and 8.5. Assignments of the 23 HN signals that remain visible at high pH are labelled in the spectrum. These correlations belong mostly to hydrophobic amino acids, which are also located in the region of the possible PSE. (B) Sequence alignment of the proteins belonging to the PAGE family. Correlation peaks that remained visible at high pH (8.5) are labelled above the sequence with magenta spheres. Suggested transient α-helical and β structures are marked with rectangles and arrows, respectively. (C) Steady-state {1H}-15N heteronuclear NOE values as a function of amino acid sequence. Regions with suggested transient secondary structure elements are shaded. doi:10.1371/journal.pone.0026633.g002
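The steady-state {1H}-15N heteronuclear NOE in panel C is the ratio of peak intensities recorded with and without 1H saturation; low or negative values report on fast backbone motions typical of IDPs. A minimal sketch of that calculation, with the uncertainty propagated from the baseline noise of each spectrum, is given below. The intensities, noise levels and the cutoff used to pick out relatively more rigid residues are hypothetical.

```python
"""Heteronuclear NOE from peak intensities with noise-based errors (sketch)."""
import math


def het_noe(i_sat: float, i_ref: float, noise_sat: float, noise_ref: float) -> tuple[float, float]:
    """Return (NOE, uncertainty) from saturated and reference peak intensities."""
    if i_sat == 0 or i_ref == 0:
        raise ValueError("peak intensities must be non-zero")
    noe = i_sat / i_ref
    # Propagate the RMS baseline noise of each spectrum as relative errors.
    err = abs(noe) * math.sqrt((noise_sat / i_sat) ** 2 + (noise_ref / i_ref) ** 2)
    return noe, err


def residues_above(noe_by_residue: dict[int, float], cutoff: float) -> list[int]:
    """Residues whose NOE exceeds `cutoff`, i.e. those that are relatively less flexible."""
    return sorted(r for r, v in noe_by_residue.items() if v > cutoff)


if __name__ == "__main__":
    print(het_noe(50.0, 100.0, 5.0, 5.0))  # hypothetical intensities and noise
    print(residues_above({30: -0.4, 70: 0.35, 71: 0.4, 100: 0.1}, cutoff=0.3))  # [70, 71]
```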

Figure 3.
Figure 3. Secondary structure prediction from 1Hα, 13Cα and 13Cβ chemical shifts of PAGE5. (A) Deviations of 13Cα chemical shifts from random coil values (red bars) as a function of primary structure. (B) Secondary structure propensity score for PAGE5, calculated from 1Hα, 13Cα and 13Cβ chemical shifts. Regions with suggested transient secondary structure elements are shaded. doi:10.1371/journal.pone.0026633.g003
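Panel A rests on secondary shifts, Δδ = δobs - δrc: consecutive positive 13Cα deviations point to helical propensity and negative ones to extended or β conformations. The sketch below covers only this 13Cα step (not the combined 1Hα/13Cα/13Cβ propensity score of panel B) and assumes that random-coil reference values are supplied from a published table, which is not reproduced here. The threshold, minimum run length and example deviations are hypothetical.

```python
"""13C-alpha secondary shifts and runs of helix-like deviations (illustrative sketch)."""


def secondary_shifts(observed: dict[int, float], random_coil: dict[int, float]) -> dict[int, float]:
    """Delta-delta = delta_obs - delta_rc (ppm) for residues present in both dicts."""
    return {r: observed[r] - random_coil[r] for r in observed if r in random_coil}


def runs_above(deltas: dict[int, float], threshold: float, min_len: int) -> list[tuple[int, int]]:
    """(first, last) residue numbers of consecutive runs with delta > threshold."""
    runs = []
    start = prev = None
    for res in sorted(deltas):
        above = deltas[res] > threshold
        if above and start is not None and res == prev + 1:
            prev = res  # extend the current run
            continue
        if start is not None and prev - start + 1 >= min_len:
            runs.append((start, prev))  # close a sufficiently long run
        start, prev = (res, res) if above else (None, None)
    if start is not None and prev - start + 1 >= min_len:
        runs.append((start, prev))
    return runs


if __name__ == "__main__":
    # Hypothetical secondary shifts (ppm); real values need a published random-coil table.
    deltas = {64: 0.1, 65: 0.6, 66: 0.8, 67: 0.7, 68: 0.9, 69: 0.2, 70: 0.6}
    print(runs_above(deltas, threshold=0.5, min_len=3))  # [(65, 68)]
```

A gap in residue numbering (for example an unassigned residue) ends a run, so a single missing assignment can split what is really one helical segment.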

Figure 4.
Figure 4. One- and two-bond N-Cα scalar couplings of PAGE5 and HN exchange with water. (A) 1J(NCα) (red line) and 2J(NCα) (blue line) couplings in Hz. (B) Ratio of peak intensities in the CLEANEX experiments, recorded with 25 ms (red) and 10 ms (black) mixing times, to those in the reference 2D 15N-HSQC spectrum, plotted as a function of amino acid sequence. Regions with suggested transient secondary structure elements are shaded. doi:10.1371/journal.pone.0026633.g004
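In CLEANEX-type experiments the intensity ratio V/V0 grows with the amide-water exchange rate k, and it is commonly analysed with a two-site expression, V/V0 = k/(R1A + k - R1B) · [exp(-R1B·τm) - exp(-(R1A + k)·τm)], where A denotes the amide proton and B water. The sketch below evaluates that expression at the two mixing times of panel B and inverts it by bisection to estimate k from a measured ratio. The apparent relaxation rates and the exchange rate are hypothetical, the function names are our own, and the inversion assumes R1A > R1B, which makes the ratio increase monotonically with k.

```python
"""CLEANEX-style exchange analysis (illustrative sketch; parameters hypothetical)."""
import math


def cleanex_ratio(k_ex: float, tau_m: float, r1a: float = 1.5, r1b: float = 0.6) -> float:
    """Two-site V/V0 for amide (A) / water (B) exchange at rate k_ex (s^-1).

    tau_m is the mixing time in seconds; r1a and r1b are apparent relaxation
    rates of the amide proton and of water (hypothetical defaults, s^-1).
    """
    prefactor = k_ex / (r1a + k_ex - r1b)
    return prefactor * (math.exp(-r1b * tau_m) - math.exp(-(r1a + k_ex) * tau_m))


def fit_exchange_rate(ratio: float, tau_m: float, r1a: float = 1.5, r1b: float = 0.6,
                      k_max: float = 1.0e4, rel_tol: float = 1e-9) -> float:
    """Find k_ex such that cleanex_ratio(k_ex) equals `ratio`, by bisection."""
    if r1a <= r1b:
        raise ValueError("this inversion assumes r1a > r1b")
    if ratio <= 0.0:
        return 0.0
    if ratio >= cleanex_ratio(k_max, tau_m, r1a, r1b):
        raise ValueError("ratio too large for k_ex <= k_max")
    lo, hi = 0.0, k_max
    while hi - lo > rel_tol * max(1.0, lo):
        mid = 0.5 * (lo + hi)
        if cleanex_ratio(mid, tau_m, r1a, r1b) < ratio:
            lo = mid  # predicted ratio too small: exchange must be faster
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for tau_m in (0.010, 0.025):  # the two mixing times used in panel B
        r = cleanex_ratio(20.0, tau_m)  # hypothetical k_ex = 20 s^-1
        print(f"tau_m={tau_m * 1e3:.0f} ms  V/V0={r:.3f}  "
              f"fitted k_ex={fit_exchange_rate(r, tau_m):.2f} s^-1")
```

Because the ratio saturates for fast exchange, rates well above 1/τm are poorly determined, which is one reason to record two mixing times.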

Figure 5.
Figure 5. Reduced spectral density plots and the effect of pH on transiently populated secondary structures. (A) Spectral densities at zero frequency, J(0), at the 15N frequency, J(ωN), and at 0.87 times the 1H frequency, J(0.87ωH). Regions with suggested transient secondary structure elements are shaded. (B) 13Cα chemical shift perturbations, obtained by subtracting the shifts at pH 6.5 from those at pH 5. doi:10.1371/journal.pone.0026633.g005
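Reduced spectral density mapping converts R1, R2 and the heteronuclear NOE into the three quantities of panel A. The sketch below uses the commonly cited reduced-mapping expressions J(0.87ωH) = 4σNH/(5d²), J(ωN) = (4R1 - 5σNH)/(3d² + 4c²) and J(0) = (6R2 - 3R1 - 2.72σNH)/(3d² + 4c²), with σNH = R1(NOE - 1)γN/γH, d = (μ0/4π)ħγHγN/r³ and c = ωNΔσ/√3. The field strength (600 MHz), N-H bond length (1.02 Å) and 15N CSA (-160 ppm) are assumed values, not taken from this study, chemical exchange contributions to R2 are ignored, and the input rates are hypothetical.

```python
"""Reduced spectral density mapping from 15N relaxation data (illustrative sketch)."""
import math

HBAR = 1.054571817e-34      # J s
GAMMA_H = 2.6752218744e8    # rad s^-1 T^-1
GAMMA_N = -2.7126e7         # rad s^-1 T^-1 (15N)
MU0_OVER_4PI = 1e-7         # T m A^-1


def reduced_spectral_density(r1: float, r2: float, noe: float, field_mhz: float = 600.0,
                             r_nh: float = 1.02e-10, csa_ppm: float = -160.0):
    """Return (J(0), J(wN), J(0.87wH)) in s/rad from R1, R2 (s^-1) and the NOE."""
    b0 = 2.0 * math.pi * field_mhz * 1e6 / GAMMA_H            # field in tesla
    omega_n = GAMMA_N * b0                                    # 15N Larmor frequency, rad/s
    d = MU0_OVER_4PI * HBAR * GAMMA_H * GAMMA_N / r_nh ** 3   # dipolar constant
    c = omega_n * csa_ppm * 1e-6 / math.sqrt(3.0)             # CSA constant
    d2, c2 = d * d, c * c
    sigma = r1 * (noe - 1.0) * GAMMA_N / GAMMA_H              # cross-relaxation rate
    j_h = 4.0 * sigma / (5.0 * d2)
    j_n = (4.0 * r1 - 5.0 * sigma) / (3.0 * d2 + 4.0 * c2)
    j_0 = (6.0 * r2 - 3.0 * r1 - 2.72 * sigma) / (3.0 * d2 + 4.0 * c2)
    return j_0, j_n, j_h


if __name__ == "__main__":
    j0, jn, jh = reduced_spectral_density(1.6, 4.0, 0.1)  # hypothetical IDP-like rates
    print(f"J(0)={j0 * 1e9:.3f}  J(wN)={jn * 1e9:.3f}  J(0.87wH)={jh * 1e9:.4f} ns/rad")
```

Elevated J(0) together with lowered J(0.87ωH) in a segment is the usual signature of restricted fast motion, which is how such plots help to locate transiently structured regions.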

Figure S2
Figure S2 DNA binding test by EMSA. Lanes 1 to 3: 10 µM dsDNA pool containing a 10-bp stretch of S (G or C) nucleotides (S probe) incubated with 80, 40 and 20 µM CT16. Lanes 4 to 6: 10 µM dsDNA pool containing a 10-bp stretch of N (any) nucleotides (N probe) incubated with 80, 40 and 20 µM CT16. Lanes 7 to 9 are empty. Lanes 10 to 12: 1, 0.1 and 0.01 µM S probe. Lanes 13 to 15: 1, 0.1 and 0.01 µM N probe. Equal volumes were loaded. (PDF)

Figure S3 Hydropathy score and charge distribution at pH 5 and 6.5 plotted as a function of primary structure. The most hydrophobic regions are shaded. (PDF)
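The Figure S3 caption does not name the hydropathy scale or window size. A common choice is a sliding-window average of Kyte-Doolittle values, sketched below under that assumption; the default window of 9 residues is also an assumption, and the charge profile shown in the figure is not reproduced.

```python
"""Sliding-window hydropathy profile (sketch; Kyte-Doolittle scale assumed)."""

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9,
                       first_residue: int = 1) -> list[tuple[int, float]]:
    """(central residue number, mean hydropathy) for each full window.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    seq = sequence.upper()
    if not 1 <= window <= len(seq) or window % 2 == 0:
        raise ValueError("window must be odd and no longer than the sequence")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    half = window // 2
    return [
        (first_residue + i + half, sum(scores[i:i + window]) / window)
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    morf = "TDVEAFQQELALL"  # residues 65-77 as given in the text
    for pos, score in hydropathy_profile(morf, window=5, first_residue=65):
        print(pos, round(score, 2))
```

Each score is assigned to the central residue of its window, so the profile is shorter than the sequence by window - 1 positions.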