The role of electrostatic interactions and mutations that change charge states in intrinsically disordered proteins (IDPs) is well-established, but many disease-associated mutations in IDPs are charge-neutral. The Val66Met single nucleotide polymorphism (SNP) in precursor brain-derived neurotrophic factor (BDNF) is one of the earliest SNPs to be associated with neuropsychiatric disorders, and the underlying molecular mechanism is unknown. Here we report on over 250 μs of fully-atomistic, explicit solvent, temperature replica-exchange molecular dynamics (MD) simulations of the 91 residue BDNF prodomain, for both the V66 and M66 sequence. The simulations were able to correctly reproduce the location of both local and non-local secondary structure changes due to the Val66Met mutation, when compared with NMR spectroscopy. We find that the change in local structure is mediated via entropic and sequence specific effects. We developed a hierarchical sequence-based framework for analysis and conceptualization, which first identifies “blobs” of 4-15 residues representing local globular regions or linkers. We use this framework within a novel test for enrichment of higher-order (tertiary) structure in disordered proteins; the size and shape of each blob is extracted from MD simulation of the real protein (RP), and used to parameterize a self-avoiding heterogenous polymer (SAHP). The SAHP version of the BDNF prodomain suggested a protein segmented into three regions, with a central long, highly disordered polyampholyte linker separating two globular regions. This effective segmentation was also observed in full simulations of the RP, but the Val66Met substitution significantly increased interactions across the linker, as well as the number of participating residues. The Val66Met substitution replaces β-bridging between V66 and V94 (on either side of the linker) with specific side-chain interactions between M66 and M95. The protein backbone in the vicinity of M95 is then free to form β-bridges with residues 31-41 near the N-terminus, which condenses the protein. A significant role for Met/Met interactions is consistent with previously-observed non-local effects of the Val66Met SNP, as well as established interactions between the Met66 sequence and a Met-rich receptor that initiates neuronal growth cone retraction.
Intrinsically disordered proteins are proteins that have no well-defined structure in at least one functional form. Mutations in one amino acid may still affect their function significantly, especially in subtle ways with cumulative adverse effects on health. Here we report on molecular dynamics simulations of a protein that is critical for neuronal health throughout adulthood (brain-derived neurotrophic factor). We investigate the effects of a mutation carried by 30% of human population, which has been widely studied for its association with aging-related and stress-related disorders, reduced volume of the hippocampus, and variations in episodic memory. We identify a molecular mechanism in which the mutation may change the global conformations of the protein and its ability to bind to receptors.
Citation: Lohia R, Salari R, Brannigan G (2019) Sequence specificity despite intrinsic disorder: How a disease-associated Val/Met polymorphism rearranges tertiary interactions in a long disordered protein. PLoS Comput Biol 15(10): e1007390. https://doi.org/10.1371/journal.pcbi.1007390
Editor: Bert L. de Groot, Max Planck Institute for Biophysical Chemistry, GERMANY
Received: May 31, 2019; Accepted: September 10, 2019; Published: October 18, 2019
Copyright: © 2019 Lohia et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All trajectory and simulation files are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.fc0qr81. The file Val66Met_trajectories.tar.gz contains MD trajectories for the V66 prodomain sequence and M66 prodomain sequence simulations at 300K as described in the manuscript. The file setup_scripts.tar.gz contains bdnf-val66met simulation setup scripts, input coordinates, topologies, mdp and force field files. Software, version, availability: SPARTA+, Version 2.90, https://spin.niddk.nih.gov/bax/software/SPARTA+/. Hydropro, Version 10, http://leonardo.inf.um.es/macromol/programs/hydropro/hydropro.htm. Gromacs, Version 5.1.2, ftp://ftp.gromacs.org/pub/gromacs/gromacs-5.1.2.tar.gz. VMD, Version 1.9.2, https://www.ks.uiuc.edu/Research/vmd/. Cytoscape, 3.7.1., https://cytoscape.org/download.html.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
The physiological significance of intrinsically disordered proteins (IDPs), which can explore a wide range of conformational ensembles in their functional form, is now well-established [1–5]. More than 33% of eukaryotic proteins contain disordered regions longer than 30 residues , many of which are involved in critical biological functions, including transcriptional regulation  and cell signaling [7–9]. Long intrinsically disordered regions are particularly abundant among cancer-associated  and neurodegenerative-associated proteins [11, 12].
IDP amino acid sequences tend to be low-complexity [13, 14] and include numerous charged residues, often in long repeats [1, 15]. In contrast to ordered proteins, in which a complex sequence encodes a well-defined tertiary structure, an IDP sequence determines a heterogeneous conformational ensemble [16–18]. More than 35% of IDPs reported in DISPROT  are strong polyampholytes, and their ensemble properties can be predicted using statistical theories of polyampholytes from polymer physics and global properties of the sequence, including the fraction of charged residues and the separation of oppositely charged residues (Fig 1c) [20–23]. This role is consistent with the long-range nature of electrostatic interactions, which can affect coupling between distant residues in an otherwise disordered structure.
a) The two functional domains of precursor BDNF: the disordered prodomain considered in this manuscript and the structured mature domain BDNF (mBDNF). b) The mean hydrophobicity (〈H〉) per residue (top), given by the Kyte-Dolittle  score averaged over a three residue window, and scaled to fit between 0 and 1 was digitized (bottom) according to a cutoff at 〈H〉 > 0.37. Four or more contiguous residues above the cutoff were identified as forming a hydrophobic blob. Eight hydrophobic “h” blobs (darkgrey) are identified along with 3 “p” blobs of low hydrophobicity (light grey). c) The diagram of IDP states proposed by Das and Pappu , based on fraction of positive (f+) and negative (f−) charged residues, and annotated by the location of the simulated BDNF prodomain and each blob identified in panel b. d) Location of simulated BDNF prodomain and each blob on an Uversky diagram  of IDPs and globular proteins, as a function of absolute net charge per residue (|NCPR|) and 〈H〉, with the boundary line between folded and disordered proteins given by the equation in the legend. e) Blobs identified in panel b, colored according to (i) the region of the Das and Pappu  diagram in panel c or (ii) sign of net charge, where red is negatively-charged, blue is positively-charged, and white is neutral. The blob h2b contains the Val66Met SNP and is marked with star. Additional properties of the blob sequences can be found in Table 1.
Although IDP sequences are low-complexity and do not encode a well-defined structure, single residue substitutions can still have functional effects that are significant for the organism . More than 25% of disease-associated missense single nucelotide polymorphisms (SNPs) are found in IDPs . Although detectable, the relatively subtle functional effects of these SNPs may lead to relatively weak selection pressure, whether positive or negative, allowing the mutation to persist at high frequencies within a population. Numerous structural and simulation studies [26–32] have demonstrated clear effects of single charged-residue insertion, deletion, or substitutions on conformational ensemble and aggregation of IDPs monomers. Simple electrostatic models predict that modifications of residue charge will directly affect ensemble properties [20, 26, 33, 34]. Locally, such mutations can modulate residual secondary structure preferences via forming or breaking local salt-bridges or by introducing helix breaking residues [27, 31, 35].
For IDPs with a relatively low fraction of charged residues, typical of the Janus region of the state diagram proposed by Das and Pappu [20, 21] (Fig 1c), more subtle differences among neutral amino acids play an increasingly important role in determining the ensemble. More than 40% of disease-associated IDP polymorphisms annotated in the human UniProtKB/Swiss-Prot database  are substitutions between two charge-neutral residues. The extent to which such substitutions in IDPs can affect non-local aspects of the conformational ensemble is uncertain; these substitutions directly affect short-range interactions, and structure-based coupling between distant residues in IDPs is expected to be weak. Nonetheless, correlations between secondary structure of distant residues has been frequently observed in IDPs [27, 37, 38]; for example, several cancer mutations in transactivation domain of tumor suppressor p53 can lead to helicity changes in residues sequentially far away from the mutation sites .
In structured proteins, contacts between residues distant along the sequence are reflected in the tertiary structure, but developing a framework for describing the analogous property in IDPs has not been straightforward. Among traditional structural biology techniques, NMR has been most useful for characterizing IDPs, but is frequently limited to residual secondary structure (Ref. [11, 39] and references therein). Molecular dynamics (MD) simulations have played a significant role in understanding IDP structure and dynamics [40–45], but face limitations on chain length similar to those incurred in simulations of protein folding. Most unbiased simulations have been performed in implicit solvent and/or involve chains too short to meaningfully sample contacts between residues far apart on the peptide chain. Studies of aggregation among multiple shorter monomeric IDPs [46, 47] have provided some of the most useful frameworks for considering tertiary contacts between residues that are distantly connected along the peptide backbone. Point mutations are also known to affect these contacts via differential salt-bridge and hydrogen-bonding formations, with mutations that change charge states affecting conformational ensemble via altered salt-bridge networks .
Many SNPs in IDPs are associated with neurological, aging-associated neurodegenerative, or psychiatric disorders; despite an exponential increase in the amount of available genetic data, identifying the genetic origins of such disorders has proven remarkably challenging, with few variants identified as replicable predictors of disease. One of the earliest identified variants is the Val66Met SNP (rs6265) in precursor brain-derived neurotrophic factor (BDNF), a signaling protein that retains a critical role in neurogenesis and synaptogenesis throughout adulthood [48, 49] (Fig 1a). It has been implicated in maintenance of the hippocampus [50, 51], orientation selectivity in the visual system [52–54] and the mechanism underlying action of numerous antidepressants [55, 56], including rapidly acting low-dose ketamine . An extensive library of genome-wide association studies (GWAS) have repeatedly identified the Val66Met SNP as reducing hippocampal volume and episodic memory, as well as predicting increased susceptibility to neuropsychiatric disorders including schizophrenia, bipolar, and unipolar depression, but associations have been inconsistent and population dependent [57–61].
Difficulties in obtaining unambiguous disease associations at the precursor BDNF Val66Met SNP using GWAS are paralleled by challenges in characterizing its effects on the properties of the BDNF prodomain using structural techniques. A crystal structure of a homologous neurotrophic factor in complex with a shared receptor revealed a well-defined volume corresponding to the prodomain, but lacked resolvable density . The prodomain sequence falls in the Janus sequence region in the phase diagram proposed by Das and Pappu [20, 21].
It was subsequently revealed that the cleaved prodomains are found in monomeric states in vivo, and the M66 (but not V66) form binds to SorCS2 (sortilin-related VPS10p domain containing receptor 2), leading to axonal growth cone retraction  and eliminated synapses in hippocampal neurons . NMR measurements on the prodomain confirmed significant intrinsic disorder for both forms, with differential secondary structure preference around residue 66 . Tertiary contact distances from NOEs were not accessible, however, and uncertainty in interpretation of the NMR signal prevented evidence of non-local effects on secondary structure from being conclusive. Additional NMR experiments implicated residue 66 in binding of M66 prodomain to SorCS2 .
In this work, we aimed to provide insight into the following questions: (1) What interactions drive the secondary structure change local to residue 66 observed through NMR? (2) How can we meaningfully detect tertiary interactions in a long disordered protein? (3) Do effects on tertiary interactions explain the non-local secondary structure changes previously observed through NMR? (4) How and why does the Val66Met mutation change tertiary interactions, especially as a charge-neutral mutation? To achieve these aims, we conducted unbiased fully-atomistic replica-exchange MD simulations of the 91 residue BDNF prodomain in explicit solvent, for V66 and M66 sequence.
We begin by identifying globular regions, or blobs, within the protein using a sequence-based approach based on residue hydrophobicity; this is useful for both conceptualizing the long disordered protein in the absence of a well-defined topology, as well as focusing the analysis. We then compare our simulation results with previous NMR results of Anastasia et al.  and discuss the effects of the Val66Met SNP on residual secondary structure. We propose and apply an approach for decoupling short-range structural correlations from long-range structural correlations, by comparison with a simplified polymer model parameterized from the MD trajectories. We then discuss the effect of the Val66Met SNP on the network of correlated β strands between distant residues, illustrating how effects of the mutation propagate to tertiary contacts in which the mutation is not involved. Finally, we identify individual residue sidechains that drive the observed effects on this network. Our results suggest an important and previously-unconsidered role for specific Met-Met interactions in transducing the effects of the BDNF Val66Met SNP, and confirm the presence of weak but long-range structural correlations in a disordered protein.
Results and discussion
Prodomain sequence decomposition
The region of the BDNF prodomain studied using NMR , and simulated here, is 91 residues long. Conceptualization of long structured proteins relies heavily on the consecutive secondary structure elements that form the protein’s topology, allowing for a coarse cartoon-style representation. No such approach for constructing an IDP topology has been available. Our original motivation for identifying globular segments in the sequence was to improve statistical power in analyzing contacts, but we found the resulting topological description to be broadly useful for interpretation of results. We thus present this conceptual tool upfront for clarity.
To avoid ambiguity, we restrict use of the term “domain” to refer to the two major BDNF domains (mature domain and prodomain), and instead specify three levels of hierarchy below the domain level: the prodomain contains multiple “regions”, regions contain “groups”, and groups contain “blobs”. Blobs and groups were identified by sequence alone, as described in Methods, while regions were identified by Monte Carlo simulation of a simplified polymer representing the blobs.
The sequence-analysis approach outlined in Methods divides the sequence into alternating groups, classified as either hydrophobic (h groups) or non-hydrophobic (p groups). The prodomain is composed of six such groups, notated as p1-h1-p2-h2-p3-h3 from N-terminus to C-terminus. The h groups are further divided into blobs (Fig 1b), indexed with a letter. Each hydrophobic group contains two to four blobs: h1 contains h1a and h1b, h2 contains h2a and h2b, and h3 contains h3a, h3b, h3c, and h3d. We denote multiple consecutive blobs within a group by multiple letters: h3ab indicates the stretch of residues between the beginning of blob h3a and the end of blob of h3b. Each p group consists of just one blob. The results in Regions of tertiary enrichment led us to further designate Region I (containing p1 through h2), Region II (comprised of p3) and Region III (comprised of h3).
Since each blob sequence has its own properties (Table 1), this process also suggested a new, more tractable conceptualization of the long, disordered BDNF prodomain. Each blob can be analyzed individually according to Das and Pappu metrics  (Fig 1c) or Uversky metrics  (Fig 1d), while several other sequence properties of each blob are shown in Table 1. The Das and Pappu phase diagram  predicts the compactness of IDPs based on their fraction of positively (f+) and negatively (f−) charged residues (Fig 1c). Hydrophobic blobs h2b and h3a lie in the strong polyelectrolyte and Janus sequence region respectively. All the remaining hydrophobic blobs are classified as weak polyampholytes and, as isolated peptides, would be predicted to have compact globule conformations to shield hydrophobic residues . Linker blobs p1 and p2 also lie in the Janus sequence region, while blob p3 lies in the strong polyampholyte region. The charge distribution parameter κ  is also low for p3, which is predicted to have random coil conformations if present as an isolated peptide.
The Uversky diagram  characterizes proteins as globular or intrinsically disordered based on their normalized mean hydrophobicity (〈H〉) and absolute net charge per residue (|NCPR|) (Fig 1d). The proteins falling above the boundary line are predicted to be globular proteins, while the ones below that line are predicted to be IDPs. With the exception of hydrophobic blobs h2b and h3a, all hydrophobic blobs identified here fall in the globular side of the boundary. Blobs h2b, h3a and p1 fall on the disordered side of the boundary, while p2 and p3 fall deep in the disordered side of the boundary.
The blob h2b contains V/M66, and has several unique properties among the identified blobs: 1) it is located at the sequence midpoint 2) it is the only strong polyelectrolyte blob 3) it has the strongest NCPR (-0.38) among all the blobs 4) its sequence is composed almost entirely of two competing residue types, yielding the uncommon mix of a highly-charged, hydrophobic blob. Considering mean hydrophobicity alone, Uversky et al.  found 〈H〉 ∼ 0.48 ± 0.03 for a set of 275 folded proteins and 〈H〉 ∼ 0.39 ± 0.05 for a set of 91 unfolded proteins. By this criteria, we would expect the h2b sequence to be folded: for V66-h2b, 〈H〉 ∼ 0.54, while for M66-h2b, 〈H〉 ∼ 0.50. The full Uversky diagram also considers NCPR, and the high NCPR pushes h2b into the IDP region of the Uversky diagram .
More specifically, this blob sequence (HV/MIEELLD) has hydrophobic residues at i, i-1, i+3, and i+4 separated by acidic residues at i+1 and i+2. Helix formation would thus segregate hydrophobic residues from acidic residues but would also increase the density of like-charge residues. Similar sequences are observed in the activation domains of transcription factors: a motif of alternating hydrophobic and acidic residues folds into an amphipathic helix upon binding, and the interactions between the amphipathic helix and the binding partner are mediated by hydrophobic residues, not charged residues [68–72]. Staller et al.  have earlier reported that in the disordered acidic activation domain of Gcn4, the acidic residues keep key hydrophobic residues exposed to solvent and binding partners.
The blob h3a is a unique hydrophobic Janus blob with high NCPR. Janus sequences have intermediate compositional biases and their conformations are context dependent [20, 21]. The SNP blob h2b and the Janus blob h3a are separated by the long (15 residue) strong polyampholyte linker p3, which has well mixed charge (κ = 0.1). The blobs h1a and h3b are positively charged and all the remaining hydrophobic blobs are negatively charged (Fig 1e).
Comparison of experimental observables and their computational analogues
NMR spectroscopy  has previously confirmed the intrinsic disorder of the prodomain. Many of the common force-field and water model combinations used for MD simulations are optimized for folded proteins, and are not recommended for IDPs [73, 74]. Piana et al.  showed that several such force-field and water model combinations produced substantially more compact disordered states when compared with experiments. In order to predict accurate ensembles of the prodomain, we tested several force-field and water model combinations, optimized for IDPs, including a03sbws [75, 76] with Tip4p/2005 , a99sbws [76, 78] with Tip4p/2005 , a99sb*-ildn-q [78, 79] with Tip4p-D  and c36m  with Tip3p  on 30 residue fragments of the V66 prodomain using temperature replica-exchange molecular dynamics (T-REMD), further described in S1 Table. To minimize the effects of loss of long-range contacts in the 30 residue fragment, only ΔδCα were compared; ΔδCβ is more dependent on β-pairing within the sequence. Among all the force-fields tested, only a03sbws with Tip4p/2005 and a99sb-ildn with Tip3p yielded significant deviations from NMR. The three remaining force-fields compared reasonably well (ΔδCα RMSD <0.5 ppm) (S1 Fig, S1 Table). This is also consistent with the force-field comparison study by Robustelli et al. , which observed that for IDPs with little or no secondary structure, both c36m and a99sb*-ildn-q with Tip4p-D yielded the best agreement with experimental NMR measurements.
The a99sb*-ildn-q/Tip4p-D force-field was used for the full prodomain MD simulations further described in Methods. Fig 2 shows the Cα and Cβ secondary chemical shifts calculated from the full-length simulations using SPARTA+  (further described in Methods) and compares them with the NMR secondary chemical shifts obtained from Anastasia et al.  for the V66 and M66 sequences. We obtain good agreement with NMR secondary chemical shifts: the discrepancy at each residue is <0.7 ppm, which is less than the individual SPARTA+ prediction uncertainties of ∼ 1 ppm .
a) ΔδCα (top), ΔδCβ (middle), ΔδCα-ΔδCβ (bottom) values from NMR at 280K (black lines)  and MD at 300K for the V66 (a) and M66 (b) sequences. The gray region represents a discrepancy of more than 1 ppm from NMR secondary chemical shifts. Root-mean-squared deviation (RMSD) represents the deviation between the NMR and MD values. Error at each residue is calculated as the standard error in the mean, where n = 1088 is the product of the total number of replicas simulated and the average number of roundtrips per replica. Panels are annotated by a blob representation of the prodomain, as in Fig 1e(i); vertical grey lines in each panel represent the blob boundaries.
Comparison of the simulated hydrodynamic radii (Rh) generated from MD and from NMR/SAXS is an additional useful validation measure [73, 84, 85]. Rh was calculated from the trajectory using Hydropro  (further described in Methods). Mean hydrodynamic radii of both the V66 (〈Rh,V66〉 = 2.202 ± 0.006 nm) and M66 (〈Rh,M66〉 = 2.187 ± 0.005 nm) sequences are in excellent agreement with the experimental values from NMR diffusion measurements  (Rh,V66 = 2.24 ± 0.1 nm and Rh,M66 = 2.20 ± 0.1 nm) (Convergence and distribution are discussed in Methods). Error bars for simulation results represent statistical uncertainty and do not include the additional systematic uncertainty of about 5% or 0.1 nm associated with use of Hydropro . Although the M66 sequence is slightly more compact, the distributions of both Rh and the simulated radius of gyration (Rg) demonstrate that the V66 and M66 sequence populate closely overlapping ensembles (See Methods). Our results support previous reports [74, 82] on the importance of pairing a99sb*-ildn-q with the Tip4p-D water model in simulations of disordered proteins; prodomain simulations with Tip3P resulted in significantly more compact ensembles.
Effects of Val66Met on local and non-local secondary structure
Anastasia et al.  reported an increase in helical tendency for the M66 sequence within blob h2 and h3ab and an increase in β tendency within blob h3b in the V66 sequence (Fig 3a). Consistent with these NMR experiments , the M66 sequence demonstrates an increased tendency of forming helices within blob h2 and h3a relative to the same blobs in the V66 sequence (Fig 3b). Comparing the length of secondary structure formed at each residue (Fig 3c) reveals an even stronger effect of the mutation that would not have been detectable via NMR: Val66Met consistently increases the frequency of long helices formed within group h2.
a) ΔδCα-ΔδCβ values for the V66 and M66 sequences from NMR . Values on top are equivalent to the two NMR curves shown in Fig 2 (bottom panel), while the difference between the two curves is shown at the bottom. b) Helix (top) or β (bottom) propensity for each simulated residue of the 300K replica, defined as the probability of a given residue being part of a sequence of four or more consecutive residues whose dihedral angles place them in the helical (left) region or β (right) region of the Ramachandran map (further described in Methods). Errors represent standard error of a Bernoulli trial with n samples, where n = 1088 is the product of the total number replicas and the average number of roundtrips per replica. c) Difference (M66-V66) between probabilities of secondary structure formation of a given length, for helix (top) and β (bottom). Panels are annotated by a blob representation of the prodomain, as in Fig 1e(i); vertical grey lines in each panel represent the blob boundaries. d) Contact probability for each residue pair within the h2 group for V66 (top) and M66 (bottom) sequences. Each residue in group h2 is annotated with a circle representation and contacts found in at least 50% of the frames are represented with an edge. e) Difference (M66-V66) between the contact probabilities shown in panel d. Contacts with a population difference of at least 15% between the V66 and M66 sequences are represented by an edge.
In general, Cβ-branched amino acids, such as valine, have more restricted side-chain rotamers in helical conformation when compared with non-Cβ-branched amino acids. Creamer et. al.  ranked the entropic cost of helix formation for apolar side chains using simulations of an (Ala)8 sequence with the guest amino acid at the center, and reported a higher entropic cost of helix formation for valine when compared with methionine. In our simulations, the likelihood that V66 will be in a short helix decreases with temperature, while the opposite effect is observed for the M66 (S2 Fig). These trends are consistent with an increased entropic cost for helix formation at V66 relative to M66.
The helical structure within group h2 in M66 sequence is also stabilized by local sequence, including the favorable interaction between M66 (i) and F63 (i-3). MD simulations have previously shown the stability of a sulfur-aromatic contacts in a model helix . Fig 3d shows the residue level contact map within group h2. For the M66 sequence, M66 (i) more frequently contacts F63 (i-3) than any other residue within the blob: M66-F63 is formed 2.6 times as often as M66 (i)-E69 (i+3) (Fig 3d). We find that the largest change in intrablob contacts from V66 to M66 is the gain of contact at M66-F63 (1.7 times as often in M66 when compared with V66) followed by loss of contact at I67 (i+1)-L70 (i+4) (0.75 times as often in M66 when compared with V66) (Fig 3e). This is also consistent with a previously identified role for Met-Phe interactions [88–91].
While the effects of the Val66Met mutation on secondary structure in the blob which contains residue 66 (h2b) are not unexpected, we also observed an effect on secondary structure in group h1 and blobs h3a and h3b within group h3. As shown in Fig 3c, the increased frequency of long helices for blob h3a in the M66 sequence is comparable to the increase in blob h2b. We consider the possible tertiary origins of the non-local effects on secondary structure in Effects of Val66Met on the β-pairing network.
Regions of tertiary enrichment
The potential number of residue-residue contacts in the prodomain is 91 × 90/2 ∼ 4000, and each contact is formed infrequently (S3 and S4 Figs). Detecting significant differences for numerous weak signals is statistically prohibitive, even given the long simulations presented here. Dividing the sequence into blobs based on sequence hydrophobicity (Fig 1b), as described in Methods, helps address this analysis challenge. Such coarse-graining reduces the number of potential contacts to 11 × 10/2 = 55, while increasing the likelihood that any given contact will be formed.
We expect that even for a freely-jointed, self-avoiding heteropolymer (SAHP), contact probability between monomers would depend on monomer shape and separation, although a SAHP does not have tertiary structure. Inspired by the Kuhn treatment of real polymers , we propose that the expected intermonomer contact frequency in a SAHP can be a useful reference for detecting specific tertiary interactions (Fig 4), as long as the monomers mimic the blobs of the real protein (RP). In support of this approach, we find that within a given blob, the protein obeys Flory polymer scaling laws (S1 Text). The exponent varies across blobs (S5 Fig), capturing the intrinsic heterogeneity of the long polymer.
To decouple short-range and long-range structural correlations, this work grouped segments of the protein into blobs using sequence, and then compared contacts between the blobs to those expected for an analogous self-avoiding heteropolymer (SAHP). The SAHP was parameterized by extracting local properties (size and shape) of blobs from the real protein (RP) trajectory.
The predicted contact probabilities for this freely-jointed SAHP from Monte Carlo simulations (further described in Methods) are shown in Fig 5a. In the SAHP version of the prodomain, the chain is visibly segmented by the p3 blob. As shown in S6 Fig, shifting the p3 blob within the SAHP chain shifts the visible segmentation boundary, confirming that the p3 blob defines the segmentation. Based on this expectation, we define three regions: the pre-p3 blobs are “Region I”, p3 is “Region II”, and the post-p3 blobs are “Region III”. SAHP blobs within Region I are in contact for 61% of the frames, while SAHP blobs within Region III are in contact in 76% of the frames. In comparison, the average contact probability between Regions I and III is only 10% (Fig 5c).
a) Blob-blob contact probability for the V66 self-avoiding heteropolymer (SAHP) from Monte Carlo simulations (further described in Methods). The black boxes mark the regions identified. b) Blob-blob contact probability shown in panel a for the V66 (left) and M66 (right) sequences of the real protein (RP). The x and y axes are annotated with cartoon representation of the prodomain; circles are drawn to the scale of each blob’s size. c) Population of contacts in SAHP, RP (top) and enrichment in RP contacts with respect to SAHP contacts (bottom) for each region pair. The errors represent standard errors (n = 1088 as described in Methods). d) Difference (M66-V66) between the contact probabilities shown in panel b. e) Differences shown in panel d with respect to SAHP; interactions more frequently found in M66 or V66 sequence are in blue and orange respectively.
Fig 5b shows the probability of blob-blob contacts for both the V66 and M66 sequences of the RP, calculated analogously to those in the SAHP. The frequencies of contacts within Region I and within Region III were quantitatively consistent with the SAHP predictions. The total number of blob-blob contacts within Region I was enriched by 1.3 times the expected value for the SAHP. Within Region III, the total number was depleted by 0.9 times the expected value (Fig 5c).
In contrast, contacts between blobs on either side of the long p3 linker are more common in the RP than in the SAHP, and are also affected by the substitution at residue 66 (Fig 5c, 5d and 5e). Contacts between pre-linker Region I and post-linker Region III are about three times as common in the RP as in the SAHP, indicating specific tertiary interactions beyond those expected for a polymer undergoing a random-walk. Quantitatively, enrichment in the V66 sequence is 3.0±0.1 while enrichment in the M66 sequence is 3.4±0.1. The increased number of cross linker contacts are also consistent with the lower mean Rh and Rg for the M66 sequence.
Effects of Val66Met on the β-pairing network
To test whether the changes we observed in tertiary contacts at the blob, group, or region level could be due to a change in partnering β-strands, we applied a clustering approach. All frames were divided into 4 clusters, representing two independent collective variables with two possible values each: either a certain contact between blobs X and Y is formed or broken, and any residue in blob X is found within a stretch of 4 sequential residues in β conformation. The four clusters are thus represented as (contacting,absent), (contacting,present), (distant,absent), and (distant,present).
For each cluster, we calculated β propensity across all residues. If the X-Y contact reflects correlated β-strands, we expect a peak at residues in blob Y in the (contacting,present) cluster that is significantly higher than the signal for all other clusters. If the secondary structure in Y is used for clustering instead, the reciprocal peak (at blob X) should be reproduced. Furthermore, unless there are higher-order correlations between multiple sets of β-strands, β propensity should not depend on cluster for all residues not in blob X or Y.
This clustering process on all frames was carried out for all possible X and Y blobs, provided X and Y were not in the same group and were non adjacent blobs in sequence (S7–S17 Figs). For most pairs, there was no correlating peak in β structure. For some pairs, a peak was present in one direction but the reciprocal peak was not present in the opposite direction. This result reflected longer β-strands that extended to a neighboring blob, which had the true peak. One symmetrically significant peak (indicating correlated β structure) involving the h3b blob was observed in each sequence. The partner blob shifted from h2b in the V66 sequence to h1a in the M66 sequence (Fig 6, S15 Fig). A second correlated pair involving the blob p1 was also observed in each sequence. The partner blob for this pair shifted from h3d in the V66 sequence to h2b in the M66 sequence (S7, S12 and S17 Figs).
β propensities at each residue in the V66 sequence (top) and the M66 sequence (bottom) for four clusters. Frames were first clustered by whether the h3b-h2b (a) or h3b-h1a (b) contact was formed (purple) or broken (green), and then by whether β structure was present (solid) or absent (dashed) in h2b (panel a) or h1a (panel b). The dark-gray window indicates the contacting blob that is constrained to have high or vanishing values by construction of the cluster, while the white window indicates the contacting blob without constrained secondary structure. If the contact is coupled to simultaneous β-strand formation, the peak within the white window for the solid purple curve should be significantly higher than other curves. Errors represent standard error of a Bernoulli trial with n number of samples, where n is the product of the total number of unique replicas in a given cluster and the average number of roundtrips per replica. Panels are annotated by a blob representation of the prodomain, as in Fig 1e(i).
Despite a loss of correlated β-pairing, the contact between h2b and h3b is actually more probable in the M66 sequence than in the V66 sequence (Fig 5d). As discussed in Noteworthy residue-residue interactions stabilizing tertiary contacts, this result reflects a significant change at the residue level. In the M66 sequence, specific interactions between M66 and side-chains of residues within h3b form the contact, rather than backbone-backbone interactions. As the h3b side-chains stabilize the contact with h2b, the backbone of h3b is then free to pair with h1a, increasing the number of favorable long-range contacts and condensing the M66 sequence overall.
Noteworthy residue-residue interactions stabilizing tertiary contacts
As shown previosly in Effects of Val66Met on the β-pairing network, the Val66Met substitution causes loss of correlated β-strands between blobs h2b and h3b, while introducing correlated β-strands between blobs h3b and h1a. We consider here the effects of the substitution on these contacts at residue level. As shown in the absolute residue-residue contact probability maps (Fig 7a), both sequences frequently form contacts between hydrophobic residues in blobs h2b and h3b. The residue pairs most frequently forming the contact shift from V66-V94 in the V66 sequence to M66-M95 in the M66 sequence (Fig 7b). The residue-level contact maps also show a high probability of contacts between D72 and T91 in the V66 but not M66 sequence. As illustrated in Fig 7c, these contacts (between α carbons) are stabilized by salt-bridges between R93 and D74, in a conformation that is incompatible with a side-chain contact between V/M66 and M95.
a) Contact probability at each residue in h2b with each residue in h3 for V66 (left) and M66 (right) sequences. b) Difference (M66-V66) between the contact probabilities shown in panel a. c) Representative conformations of V66 sequence (top) and M66 sequence (bottom) showing preferred residue-level contacts in VDW representations. Residues are colored by residue type: blue:basic, red:acidic, cyan:polar, grey:hydrophobic except methionine, Met: yellow. The chain is colored according to the Das and Pappu diagram in Fig 1c. Tubes represent hydrophobic “h” blobs whereas lines represent non-hydrophobic linker “p” blobs. d) Contact probability between residues 63-69 and each residue in h3ab, when respective secondary structure is formed at each residue, for both the V66 (left) and M66 (right) sequences. Residue labels are colored according to residue type: blue:basic, red:acidic, grey:hydrophobic/polar and Met: yellow.
M95 is the only other methionine in the simulated sequence. The role of specific Met-Met interactions due to polarizable sulfur atoms is often under-appreciated, but such interactions are common in structures of folded proteins . Using ab initio calculations, Gómez-Tamayo et al.  predicted that Met-Met interactions are stronger than Met-aromatic or aromatic-aromatic interactions, due to the polarizability of sulfur. Although the fixed-charge force-field we are using (a99sb*-ildn-q) cannot explicitly capture polarizability, Gómez-Tamayo et al. demonstrate that this force-field preserves rankings of strong side-chain interactions involving methionine. In these simulations, the M66-M95 contact was about five times as common (10% of frames) as the analogous V66-M95 contact (2% of frames) (Fig 7a and 7b). Methionine-aromatic interactions also contribute to the increased number of Region I-III contacts: M66, but not V66, forms a frequent contact with F108 in blob h3d, which is also consistent with the favorable interactions between Met-Phe residues [88–90] (Fig 7a and 7b).
To determine which residue contacts between h2b and h3ab couple the secondary structure within the two blobs, we decomposed the residue-level contact maps into nine clusters. Each cluster was specified by two collective variables with three possible values each: secondary structure (helix, β, or coil) around residue 66 and secondary structure (helix, β, or coil) in h3ab (Fig 7d). The β-pairing at h2b-h3ab is stabilized via a combination of backbone hydrogen bonds between V66 and S92, salt-bridge between E64 and R93, and hydrophobic interactions between V66 and V94. The V66-M95 contact was only formed frequently within the (h2b—coil, h3ab—helix) cluster, and since this cluster was a very small part of the overall population, the contact overall was rare as well (Fig 7d). This cluster was more common in the M66 sequence, and contributes to the non-local increase in helicity around residue 95 (Fig 3b).
Summary and conclusion
We have carried out over 250 μs of fully-atomistic explicit solvent MD simulation of the 91 residue BDNF prodomain, with and without the disease-associated Val66Met mutation. These long simulations successfully reproduced the experimentally observed secondary chemical shifts and hydrodynamic radius. The simulations also correctly reproduced the location of both local and non-local secondary changes due to the Val66Met mutation in the BDNF prodomain.
We find that the highly disordered 91 residue prodomain, which as a whole falls in the Janus sequence region of the Das and Pappu phase diagram , can be meaningfully divided into 11 blobs based on sequence hydrophobicity alone. Among 8 hydrophobic blobs, we identified 2 blobs in the disordered region: the strong polyelectrolyte blob h2b (which contains Val66Met), and the Janus blob h3a. These are connected via the highly disordered long linker p3. The groups containing these unique blobs have biological significance as well: The sequence h2-p3-h3 is essential for intracellular trafficking of precursor BDNF .
We used the protein sequence to systematically design a tractable approach for coarse-graining analysis, by reducing the initial number of potential contacts from over 4000 to 55, while increasing the number of observations for each contact. Furthermore, it allowed us to isolate the most sensitive regions of the protein for examination at the residue level. This method, simply based on sequence hydrophobicity, may be a generally useful informatics strategy to suggest functionally significant regions in long disordered proteins. Our conclusions further suggest an important role for disorder heterogeneity within disordered proteins.
We were able to identify mechanisms through which a charge-neutral mutation can affect the residual secondary structure and tertiary contacts of a disordered protein. We further identified how these effects can be propagated to non-local residual secondary structure. Within its local blob h2b, the Val66Met mutation affects local contact preference due to local sequence effects (preferred Met-Phe contacts) and the reduced entropic cost of helix formation for the methionine sidechain.
The long, disordered, exposed Region II linker segregates the blob-level contact probability map: blobs within Region I or Region III have a high probability of contact, while Region I-III contacts are far less probable. We consistently observed this segregation in both simple self-avoiding hetropolymer simulations with beads mimicking identified blobs, and actual prodomain simulations. Val66Met increases the frequency of Region I-III contacts. We find here that the dominant mechanism involves replacing β-strand coupling between group h2 of Region I and group h3 of Region III with favorable Met/Met side-chain interactions between the same groups. The group h3 backbone is then exposed for interactions with the backbone of group h1, also of Region I. The non-local increase in helicity in group h3 may reflect stabilization of non-β structure by the Met-Met interactions.
Met/Met interactions have been shown to stabilize tertiary contacts in folded proteins and membrane proteins, but their role has not been investigated in disordered proteins. In general, our study supports previous observations [91, 94] that methionine plays a distinct role from true aliphatic residues in determining protein structure, and highlights the importance of mimicking its unique properties within fixed-charge force-fields.
Anastasia et al.  observed differential kinetics for interactions between the BDNF prodomain and SorCS2, and also observed that the SNP-containing blob h2b (H65 to L71) only interacts with SorCS2 in the M66 sequence. The increased interactions between M66 and SorCS2 could be attributed to increased helical propensity at that residue and/or specific Met-Met contacts. In the first mechanism, helix formation in the SNP blob segregates acidic and hydrophobic residues on opposite sides of the helix. It is possible that this preformed structure will stabilize binding. The second mechanism is suggested by the specific Met-Met interactions we observed in the isolated prodomain, as well as the high number of exposed methionines on the SorCS2 surface. It is also possible both mechanisms could contribute to stabilizing the complex, although this would require a more specific protein-protein interface.
To account for differences in starting coil conformation, we included six unique structures to represent residues 23-113 of BDNF prodomain. These structures were built using I-Tasser [95–97], Rosetta  or Modeller , and were simulated in a water box at 600K for 50 ns at a constant volume. From the six resulting trajectories, 64 structures with correct proline isomers were selected (based on at least 2ps time interval); in total, our study included 64 unique prodomain structures. All structures were cooled to 300K for 1ns, while prolines were restrained in trans-conformation. Each V66 replica was placed in a dodecahedron water box with approximately 30,500 Tip4p-D  water molecules and a 0.15M salt concentration (NaCl) for a total system size of approximately 124,000 atoms. The same volume for each replica was ensured by fixing the simulation box of each replica to the average box size (11 nm).
For the simulations we use the a99sb*-ildn-q force-field [78, 79] and the GROMACS 5.1.2 simulation package [100, 101], with a time step of 2 fs. Long-range electrostatics are calculated using the particle mesh Ewald (PME) method , with a 1 nm cutoff and a 0.12 nm grid spacing. Periodic boundary conditions are also used to reduce system size effects. The system was simulated using T-REMD  with an exchange frequency of 1ps for 2 μs, giving a total simulation time of 128 μs with NVT ensemble for each system. 64 replicas are used with temperatures ranging from 300-385K, with exponential spacing. A different random seed was used for the Langevin dynamics of each replica. The average exchange acceptance probability ranged between 0.19-0.23.
The minimum separation between the molecule and its image was less than 2 nm for less than 1% of the frames for both sequences and these frames were discarded from all analysis. Time-series of the relative measurements were generated every 100 ps. For both V66 and M66 sequences, initial 51.2 μs (800 ns × 64) trajectories were discarded for equilibration purposes, determined by plateauing of Rg (Fig 8a). Over the course of remaining 76.8 μs (1.2 μs × 64) simulations, each replica completes a minimum of 5 roundtrips and an average of 17 roundtrips for each sequence (Fig 8e). Simulation convergence was monitored using several metrics (Fig 8).
a) Trajectory (left) and distribution (right) of Rg and Rh for the 300K replicas. The shaded region represents the equilibration period discarded from the distribution and from all analysis presented in Results and Discussion. Experimental values of Rh from NMR diffusion  are 2.24±0.1 nm for the V66 sequence and 2.20±0.1 nm for the M66 sequence, and are indicated by dashed lines. b) Trajectory of helix (top) and β (bottom) propensity at residue 66 for the 300K replicas of both sequences. c) Trajectory of enrichment of total Region I-Region III contacts relative to SAHP in the 300K replica. The trajectory shows averages over a 100 ns moving window in panel a, b and c. d) Number of replicas forming the V66-V94 contact (top) and the M66-M95 contact (bottom) vs the simulation time. e) The number of round trips completed by each replica over the 1.2 μs production period.
Time-series of the Rg and end to end distance (Retoe) were calculated using respectively the g_gyrate and g_polystat utilities of Gromacs. We took Retoe as the distance between N-termini and C-termini N and O atoms respectively. Statistical uncertainties are provided for Retoe and Rg as the standard error in the mean, where n = 1088 is the product of the total number of replicas simulated (64) and the average number of roundtrips per replica (17).
〈H〉 at each residue is defined as the average Kyte-Dolittle  score with a window size of 3 residues, scaled to fit between 0 and 1. Any stretch of four or more residues with 〈H〉 > 0.37 is classified as a hydrophobic or “h” blob and any remaining stretch of four or more residues is classified as a non-hydrophobic linker or “p” blob. Multiple consecutive hydrophobic blobs without a “p” blob separating them are classified as a single group.
Secondary chemical shifts
Prior to the present study, Anastasia et al.  measured chemical shifts for the BDNF prodomain (residues 21-113) using NMR, and then used backbone NMR secondary chemical shifts to predict secondary structure via TALOS+  and SSP . For comparison with simulation data, we reinterpreted the chemical shifts directly from , deposited at Biological Magnetic Resonance Bank (BMRB). Cα secondary chemical shifts are calculated as follows: ΔδCα,MD = (δCα,MD − δCα,RC(300K)) and ΔδCα,NMR = (δCα,NMR − δCα,RC(280K)), where δCα,MD, δCα,NMR and δCα,RC are predicted Cα chemical shifts from MD simulation, NMR experiments and random coil respectively. δCα were calculated from MD simulated conformations using SPARTA+ . NMR experiments values were obtained from the data deposited at BMRB by Anastasia et al . Random coil δCα for the 91 residue BDNF prodomain were obtained using POTENCI  at pH 7, with a 0.15 M ion concentration, at 280K and 300K for NMR and MD respectively. Error at each residue is calculated as the standard error in the mean, where n = 1088 is the product of the total number of replicas simulated (64) and the average number of roundtrips per replica (17). Cβ secondary chemical shifts were calculated analogously.
Hydrodynamic radius calculation
The values for the Hydropro  parameters were: atomic level model with shell-method calculation, a = 0.29 nm, 6 minibead iterations, and σ = 0.1 to 0.2 nm. The temperature was taken to be 300 K, the solvent viscosity was 0.01 Poise, the solvent density was 1.0 gcm−3, the partial specific volume of the peptide 0.7313 cm3g−1 (V66 sequence) or 0.7304 cm3g−1 (M66 sequence), and molecular weight of the peptide was equal to 10044 Da (V66 sequence) or 10076 Da (M66 sequence). The resultant translational diffusion constants were then used for calculating Rh using the Stokes-Einstein equation. Error is calculated as the standard error in the mean, where n = 1088 is the product of the total number of replicas simulated (64) and the average number of roundtrips per replica (17).
Secondary structure calculation
Helix propensity or β propensity is expressed as the probability of a given residue being part of a sequence of four consecutive residues whose dihedral angles place them in the helical region or β region of the Ramachandran space. The helical region is defined as -100° <ϕ <-30° and -120° ≤ ψ ≤ 50° [42, 107, 108]. The β region is defined as ϕ <-80° and ψ > 50° or ψ < 120°. The error bars are the standard error of a Bernoulli trial with n number of samples, where n is the product of the total number of unique replicas in a cluster and the average number of roundtrips per replica. The length of secondary structure (SS-map)  were calculated with the above defined helical and β region.
Blob-level contact maps
As illustrated in Fig 9c, the excess distance between any two blobs i and j is defined as (1) where is the position vector of a blob i defined as the mean of its N-terminal N atom and the C-terminal O atom coordinates, calculated using g_traj utility of Gromacs. Two blobs i and j are in contact if the excess distance (de,ij) between the two is less than 0.55 nm. At residue level, two residues are in contact if the distance between Cα atoms of the two residues is 0.8 nm or less. Presented statistical uncertainties are the standard error in the mean, with n is the product of the total number of replicas forming the given contact and the average number of roundtrips per replica.
a) 〈Rg〉 vs 〈Retoe〉 for each blob of V66 sequence. Blobs are colored according to the Das and Pappu diagram in Fig 1c. Statistical error was smaller than the circles used for the representation of each blob. b) The distribution of normalized excess distances across all blob-pairs in the V66 RP, where |i − j| > 1. c) Relationship between the radius of gyration Rg,i, end to end distance Retoe,j, and excess distance dij, calculated for each blob or blob pair using a RP trajectory. d) The SAHP is a chain with each monomer representing a blob of the RP and modeled as a hard sphere. Each monomer i has radius aRg,i and is separated from monomer i + 1 by bond length (Retoe,i + Retoe,i+1)/2. Bond lengths are constrained and bond angles can rotate freely.
Self-avoiding heteropolymer simulation
The BDNF prodomain was approximated as a freely-jointed self-excluding heteropolymer with 11 monomers, each mimicking one of the blobs identified in Fig 1b. As illustrated in Fig 9d), the separation between monomers i and i + 1 (analogous to the Kuhn length for a homopolymer ) was constrained to be half the end to end distance for each of the analogous blobs: (2) where 〈Retoe,i〉 was determined from the coordinates of blob i residues in the MD simulations, shown in Fig 9a.
Two monomers i and j are considered to be overlapping if (3) where 〈Rg,i〉 was determined from the coordinates of residues in blob i in the MD simulations (Fig 9a), and a is a constant. In the MD simulations of the real protein, we observed that for almost all frames (Fig 9b), and thus we set a = 0.3.
The random walk was carried out using a simple Metropolis Monte Carlo, with the following move set: 1) a random bead i > 0 was selected, 2) a random displacement vector of magnitude 0.5 nm was generated in three cartesian dimensions, 3) was scaled so that , satisfying Eq 2, 4) the translation was applied for all j ≥ i.
Any trial move that caused an overlap according to Eq 3 was rejected, while all others were accepted. The Monte Carlo simulation was run for 5,000,000 steps (500,000 steps per moveable bead); additional steps did not change the outcome in Fig 5a.
S1 Table. Summary of force-field comparison simulations.
S1 Text. Heterogeneous behavior of individual blobs.
S1 Fig. Force-field comparison.
We ran T-REMD simulations of a 30 residue fragment of the V66 prodomain with several commonly used force-field and water model combinations. (a) Comparison of ΔδCα at 280K from MD ensembles for a99sb*-ildn-q [78, 79] with Tip4p-D , c36m , a99sbws [76, 78], a03sbws [75, 76], a99sb-ildn with Tip3p , calculated using SPARTA+  and NMR from Ref. . (b) Rg vs the simulation time, using a 100 ns moving window on left and Rg distribution for each force-field on right. Tip3p and a03sbws generates most collapsed and expanded Rg distribution respectively. The equilibration time and 〈Rg〉 is shown with vertical and horizontal dashed lines for each force-field. The Rg distribution and its mean does not include the simulation equilibration time. The 〈Rg〉 values are also reported in S1 Table.
S2 Fig. Effects of temperature and Val66Met mutation on helix propensity around residue 66.
The frequency of formation of a helix of a given length containing residue 66 in V66 (top) and M66 (bottom) sequences in the temperature range of 300K to 385 K. With the increase in temperature the color transitions from cooler (blue) to hotter (red). It is entropically unfavorable for V66 and its neighboring residue to be simultaneously in the helical region of the Ramachandran map, as indicated by the decreasing helical propensity with increasing temperature. For longer helices, the trend will depend more on the additional side-chains in the helix, and the trend with temperature is reversed, but it remains weaker than the analogous trend for the M66 sequence. Errors represent the standard error of a Bernoulli trial with n number of samples, where n is the product of the total number unique replicas forming the helix of given length at residue 66 at a given temperature and the average number of roundtrips per replica, 17.
S3 Fig. Residue level contacts for the entire prodomain.
a) Contact probability between every residue pair for V66 (left), M66 (middle) sequences and the difference between the two (right). Two residue pairs are considered to be in contact if the Cα-Cα distance between the two residues is less than or equal to 0.8 nm. Panels are annotated by a blob representation of the prodomain, as in Fig 1e(i); vertical and horizontal grey lines in each panel represent the blob boundaries. b) A linear network of transient tertiary contacts shown in panel a. The contact networks were build using Cytoscape  with a linear representation of residues. Each protein residue comprises a node in the network, with interactions between residues represented as edges. The strength of individual interactions can be interpreted by the thickness of the edge line on the network diagram. If the separation between residues forming the contact is more than 20, its edge is drawn above the node; otherwise, the edge is drawn at the bottom of the node. To focus on significant interactions, interactions showing more than 6% persistence were considered in the network visualization. The x axis is annotated with blob representation of the prodomain, as in Fig 1e(i).
S4 Fig. Residue level contacts for the entire prodomain including backbone-backbone, sidechain-sidechain, salt-bridge and hydrophobic contacts.
Contact probability between every residue pair for V66 (left) and M66 (middle) sequences and the difference between the two (right). Two residue pairs are in contact if the distance between backbone-backbone atoms between the two residues are 0.4 nm or less (1st row), if the distance between non hydrogen sidechain-sidechain atoms between the two residues are 0.4 nm or less (2nd row), if the distance between non hydrogen sidechain-sidechain atoms between the two hydrophobic residues are 0.4 nm or less (3rd row), if the two residue pairs are forming a salt-bridge with the distance between the donor and acceptor atoms < 0.32 nm (4th row). Panels are annotated by a blob representation of the prodomain, as in Fig 1e(i); vertical and horizontal grey lines in each panel represent the blob boundaries.
S5 Fig. Polymer scaling behavior for each identified blob and entire prodomain.
a) Mean distances between any residues i and j at 300K, for the entire V66 and M66 prodomains as well as each blob in the V66 (left) and M66 (right) sequences. Theoretical polymer scaling limits are represented by the curves 〈R|i−j|〉 = A|i − j|ν where A = 0.59 nm and ν is the Flory exponent. For good, theta, and bad solvent, ν = 3/5, 1/2, 1/3 respectively. b) Values of ν resulting from fits to each blob for V66 (left) and M66 (right) sequences. The x axis is annotated with a blob representation of the prodomain where blobs are colored according to the Das and Pappu diagram  in Fig 1c.
S6 Fig. Effect of perturbing monomer properties on freely-jointed, self-avoiding heteropolymer.
Contact probability maps from SAHP calculations, analogous to those in Fig 5a of the main text. The x and y axes are annotated with cartoon representation of the prodomain; circles are drawn to the scale of each blob’s size. Here the SAHP model is varied systematically by swapping the p3 blob with every other blob in the chain. As the p3 blob is shifted along the chain, p3 and p1 consistently bound a white “forbidden” region that has little interaction with the rest of the protein.
S7 Fig. β-pairing between blob p1 and each remaining blob, excluding adjacent or intra group pairs in the V66 (left) and M66 (right) sequences.
Frames were first clustered by whether the X-Y contact was formed (purple) or broken (green), and then by whether β structure was present in X (solid) or absent (dashed). The dark-gray window indicates the contacting blob that is constrained to have high or vanishing values by construction of the cluster, while the white window indicates the contacting blob without constrained secondary structure. If the contact is coupled to simultaneous β-strand formation, the peak within the white window for the solid purple curve should be significantly higher than other curves. Errors represent standard error of a Bernoulli trial with n number of samples, where n is the product of total number of unique replicas in a given cluster and average number of roundtrips per replica (17). X represents p1 and Y represents other blobs identified in the sequence and is annotated on the left for each panel.
S8 Fig. β-pairing between blob h1a and each remaining blob, excluding adjacent or intra group pairs in the V66 (left) and M66 (right) sequences.
Same as S7 Fig, but for h1a blob.
S9 Fig. β-pairing between blob h1b and each remaining blob, excluding adjacent or intra group pairs in the V66 (left) and M66 (right) sequences.
Same as S7 Fig, but for h1b blob.
S10 Fig. β-pairing between blob p2 and each remaining blob, excluding adjacent or intra group pairs in the V66 (left) and M66 (right) sequences.
Same as S7 Fig, but for p2 blob.
S11 Fig. β-pairing between blob h2a and each remaining blob, excluding adjacent or intra group pairs in the V66 (left) and M66 (right) sequences.
Same as S7 Fig, but for h2a blob.
S12 Fig. β-pairing between blob h2b and each remaining blob, excluding adjacent or intra group pairs in the V66 (left) and M66 (right) sequences.
Same as S7 Fig, but for h2b blob.
S13 Fig. β-pairing between blob p3 and and each remaining blob, excluding adjacent or intra group pairs in the V66 (left) and M66 (right) sequences.
Same as S7 Fig, but for p3 blob.
S14 Fig. β-pairing between blob h3a and each remaining blob, excluding adjacent or intra group pairs in the V66 (left) and M66 (right) sequences.
Same as S7 Fig, but for h3a blob.
S15 Fig. β-pairing between blob h3b and each remaining blob, excluding adjacent or intra group pairs in the V66 (left) and M66 (right) sequences.
Same as S7 Fig, but for h3b blob.
S16 Fig. β-pairing between blob h3c and each remaining blob, excluding adjacent or intra group pairs in the V66 (left) and M66 (right) sequences.
Same as S7 Fig, but for h3c blob.
The authors are grateful to Dr. Clay Bracken and Dr. Barbara Hempstead of Weill Cornell Medical Center for helpful discussions.
- 1. Uversky VN. Unusual biophysics of intrinsically disordered proteins. Biochim Biophys Acta. 2013;1834(5):932–51. pmid:23269364
- 2. Panchenko AR, Babu MM. Editorial overview: Linking protein sequence and structural changes to function in the era of next-generation sequencing. Curr Opin Struct Biol. 2015;32:viii–x. pmid:26199202
- 3. Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT. Prediction and Functional Analysis of Native Disorder in Proteins from the Three Kingdoms of Life. J Mol Biol. 2004;337(3):635–645. pmid:15019783
- 4. Dyson HJ, Wright PE. Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol. 2005;6(3):197–208. pmid:15738986
- 5. Uversky VN. Intrinsically Disordered Proteins and Their “Mysterious” (Meta)Physics. Front Phys. 2019;7:10.
- 6. Minezaki Y, Homma K, Kinjo AR, Nishikawa K. Human Transcription Factors Contain a High Fraction of Intrinsically Disordered Regions Essential for Transcriptional Regulation. J Mol Biol. 2006;359(4):1137–1149. pmid:16697407
- 7. Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN. Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J. 2005;272(20):5129–5148. pmid:16218947
- 8. Wright PE, Dyson HJ. Intrinsically disordered proteins in cellular signalling and regulation. Nat Rev Mol Cell Biol. 2015;16(1):18–29. pmid:25531225
- 9. Vucetic S, Xie H, Iakoucheva LM, Oldfield CJ, Dunker AK, Obradovic Z, et al. Functional anthology of intrinsic disorder. 2. Cellular components, domains, technical terms, developmental processes, and coding sequence diversities correlated with long disordered regions. J Proteome Res. 2007;6(5):1899–916. pmid:17391015
- 10. Iakoucheva LM, Brown CJ, Lawson JD, Obradović Z, Dunker AK. Intrinsic Disorder in Cell-signaling and Cancer-associated Proteins. J Mol Biol. 2002;323(3):573–584. pmid:12381310
- 11. Habchi J, Tompa P, Longhi S, Uversky VN. Introducing Protein Intrinsic Disorder. Chem Rev. 2014;114(13):6561–6588. pmid:24739139
- 12. Buée L, Bussière T, Buée-Scherrer V, Delacourte A, Hof PR. Tau protein isoforms, phosphorylation and role in neurodegenerative disorders. Brain Res Rev. 2000;33(1):95–130. pmid:10967355
- 13. Weathers EA, Paulaitis ME, Woolf TB, Hoh JH. Insights into protein structure and function from disorder-complexity space. Proteins Struct Funct Bioinforma. 2006;66(1):16–28.
- 14. Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK. Sequence complexity of disordered protein. Proteins Struct Funct Genet. 2001;42(1):38–48. pmid:11093259
- 15. Jorda J, Xue B, Uversky VN, Kajava AV. Protein tandem repeats—the more perfect, the less structured. FEBS J. 2010;277(12):2673–2682. pmid:20553501
- 16. Dyson HJ, Wright PE. Equilibrium NMR studies of unfolded and partially folded proteins. Nat Struct Biol. 1998;5(7):499–503. pmid:9665178
- 17. Mukhopadhyay S, Krishnan R, Lemke EA, Lindquist S, Deniz AA. A natively unfolded yeast prion monomer adopts an ensemble of collapsed and rapidly fluctuating structures. Proc Natl Acad Sci. 2007;104(8):2649–2654. pmid:17299036
- 18. Abeln S, Frenkel D. Disordered flanks prevent peptide aggregation. PLoS Comput Biol. 2008;4(12):e1000241. pmid:19096500
- 19. Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, et al. DisProt: the Database of Disordered Proteins. Nucleic Acids Res. 2007;35(Database issue):D786–93. pmid:17145717
- 20. Das RK, Ruff KM, Pappu RV. Relating sequence encoded information to form and function of intrinsically disordered proteins. Curr Opin Struct Biol. 2015;32:102–112. pmid:25863585
- 21. Das RK, Pappu RV. Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of oppositely charged residues. Proc Natl Acad Sci. 2013;110(33):13392–13397. pmid:23901099
- 22. Sawle L, Ghosh K. A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins. J Chem Phys. 2015;143(8):085101. pmid:26328871
- 23. Firman T, Ghosh K. Sequence charge decoration dictates coil-globule transition in intrinsically disordered proteins. J Chem Phys. 2018;148(12):123305. pmid:29604827
- 24. Uversky VN, Oldfield CJ, Dunker AK. Intrinsically Disordered Proteins in Human Diseases: Introducing the D 2 Concept. Annu Rev Biophys. 2008;37(1):215–246. pmid:18573080
- 25. Vacic V, Markwick PRL, Oldfield CJ, Zhao X, Haynes C, Uversky VN, et al. Disease-associated mutations disrupt functionally important regions of intrinsic protein disorder. PLoS Comput Biol. 2012;8(10):e1002709. pmid:23055912
- 26. Larini L, Gessel MM, LaPointe NE, Do TD, Bowers MT, Feinstein SC, et al. Initiation of assembly of tau(273-284) and its ΔK280 mutant: an experimental and computational study. Phys Chem Chem Phys. 2013;15(23):8916. pmid:23515417
- 27. Ganguly D, Chen J. Modulation of the Disordered Conformational Ensembles of the p53 Transactivation Domain by Cancer-Associated Mutations. PLOS Comput Biol. 2015;11(4):e1004247. pmid:25897952
- 28. Viet MH, Nguyen PH, Derreumaux P, Li MS. Effect of the English Familial Disease Mutation (H6R) on the Monomers and Dimers of Aβ40 and Aβ42. ACS Chem Neurosci. 2014;5(8):646–657. pmid:24949887
- 29. Viet MH, Nguyen PH, Ngo ST, Li MS, Derreumaux P. Effect of the Tottori Familial Disease Mutation (D7N) on the Monomers and Dimers of Aβ 40 and Aβ 42. ACS Chem Neurosci. 2013;4(11):1446–1457. pmid:24041307
- 30. Truong PM, Viet MH, Nguyen PH, Hu CK, Li MS. Effect of Taiwan Mutation (D7H) on Structures of Amyloid-β Peptides: Replica Exchange Molecular Dynamics Study. J Phys Chem B. 2014;118(30):8972–8981. pmid:25010208
- 31. Zhan YA, Wu H, Powell AT, Daughdrill GW, Ytreberg FM. Impact of the K24N mutation on the transactivation domain of p53 and its binding to murine double-minute clone 2. Proteins Struct Funct Bioinforma. 2013;81(10):1738–1747.
- 32. Xu L, Shan S, Wang X. Single Point Mutation Alters the Microstate Dynamics of Amyloid β-Protein Aβ42 as Revealed by Dihedral Dynamics Analyses. J Phys Chem B. 2013;117(20):6206–6216. pmid:23662730
- 33. Bah A, Forman-Kay JD. Modulation of Intrinsically Disordered Protein Function by Post-translational Modifications. J Biol Chem. 2016;291(13):6696–6705. pmid:26851279
- 34. He Y, Chen Y, Mooney SM, Rajagopalan K, Bhargava A, Sacho E, et al. Phosphorylation-induced Conformational Ensemble Switching in an Intrinsically Disordered Cancer/Testis Antigen. J Biol Chem. 2015;290(41):25090–25102. pmid:26242913
- 35. Conicella AE, Zerze GH, Mittal J, Fawzi NL. ALS Mutations Disrupt Phase Separation Mediated by α-Helical Structure in the TDP-43 Low-Complexity C-Terminal Domain. Structure. 2016;24(9):1537–49. pmid:27545621
- 36. Yip YL, Famiglietti M, Gos A, Duek PD, David FPA, Gateau A, et al. Annotating single amino acid polymorphisms in the UniProt/Swiss-Prot knowledgebase. Hum Mutat. 2008;29(3):361–366. pmid:18175334
- 37. Iešmantavičius V, Jensen MR, Ozenne V, Blackledge M, Poulsen FM, Kjaergaard M. Modulation of the Intrinsic Helix Propensity of an Intrinsically Disordered Protein Reveals Long-Range Helix–Helix Interactions. J Am Chem Soc. 2013;135(27):10155–10163. pmid:23758617
- 38. Feuerstein S, Solyom Z, Aladag A, Favier A, Schwarten M, Hoffmann S, et al. Transient Structure and SH3 Interaction Sites in an Intrinsically Disordered Fragment of the Hepatitis C Virus Protein NS5A. J Mol Biol. 2012;420(4-5):310–323. pmid:22543239
- 39. Mittag T, Forman-Kay JD. Atomic-level characterization of disordered protein ensembles. Curr Opin Struct Biol. 2007;17(1):3–14. pmid:17250999
- 40. Stanley N, Esteban-Martín S, De Fabritiis G. Progress in studying intrinsically disordered proteins with atomistic simulations. Prog Biophys Mol Biol. 2015;119:47–52. pmid:25814479
- 41. Ithuralde RE, Roitberg AE, Turjanski AG. Structured and Unstructured Binding of an Intrinsically Disordered Protein as Revealed by Atomistic Simulations. J Am Chem Soc. 2016;138(28):8742–8751. pmid:27348048
- 42. Knott M, Best RB, Hummer G, de Bakker P, Word J. A Preformed Binding Interface in the Unbound Ensemble of an Intrinsically Disordered Protein: Evidence from Molecular Simulations. PLoS Comput Biol. 2012;8(7):e1002605. pmid:22829760
- 43. Invernizzi G, Lambrughi M, Regonesi ME, Tortora P, Papaleo E. The conformational ensemble of the disordered and aggregation-protective 182-291 region of ataxin-3. Biochim Biophys Acta. 2013;1830(11):5236–47. pmid:23891935
- 44. Yedvabny E, Nerenberg PS, So C, Head-Gordon T. Disordered Structural Ensembles of Vasopressin and Oxytocin and Their Mutants. J Phys Chem B. 2015;119(3):896–905. pmid:25231121
- 45. Levine ZA, Shea JE. Simulations of disordered proteins and systems with conformational heterogeneity. Curr Opin Struct Biol. 2017;43:95–103. pmid:27988422
- 46. Levine ZA, Larini L, LaPointe NE, Feinstein SC, Shea JE. Regulation and aggregation of intrinsically disordered peptides. Proc Natl Acad Sci U S A. 2015;112(9):2758–63. pmid:25691742
- 47. Pappu RV, Wang X, Vitalis A, Crick SL. A polymer physics perspective on driving forces and mechanisms for protein aggregation. Arch Biochem Biophys. 2008;469(1):132–41. pmid:17931593
- 48. Korte M, Carroll P, Wolf E, Brem G, Thoenen H, Bonhoeffer T. Hippocampal long-term potentiation is impaired in mice lacking brain-derived neurotrophic factor. Proc Natl Acad Sci. 1995;92(19):8856–8860. pmid:7568031
- 49. Davies AM. Regulation of neuronal survival and death by extracellular signals during development. EMBO J. 2003;22(11):2537–45. pmid:12773370
- 50. Pezawas L, Verchinski BA, Mattay VS, Callicott JH, Kolachana BS, Straub RE, et al. The brain-derived neurotrophic factor val66met polymorphism and variation in human cortical morphology. J Neurosci. 2004;24(45):10099–102. pmid:15537879
- 51. Benjamin S, McQuoid DR, Potter GG, Payne ME, MacFall JR, Steffens DC, et al. The Brain-Derived Neurotrophic Factor Val66Met Polymorphism, Hippocampal Volume, and Cognitive Function in Geriatric Depression. Am J Geriatr Psychiatry. 2010;18(4):323–331. pmid:20220593
- 52. Huang ZJ, Kirkwood A, Pizzorusso T, Porciatti V, Morales B, Bear MF, et al. BDNF Regulates the Maturation of Inhibition and the Critical Period of Plasticity in Mouse Visual Cortex. Cell. 1999;98(6):739–755. pmid:10499792
- 53. Liu Bh, Li Yt, Ma Wp, Pan Cj, Zhang L, Tao H. Broad Inhibition Sharpens Orientation Selectivity by Expanding Input Dynamic Range in Mouse Simple Cells. Neuron. 2011;71(3):542–554.
- 54. Gao M, Maynard KR, Chokshi V, Song L, Jacobs C, Wang H, et al. Rebound Potentiation of Inhibition in Juvenile Visual Cortex Requires Vision-Induced BDNF Expression. J Neurosci. 2014;34(32):10770–10779. pmid:25100608
- 55. Autry AE, Monteggia LM. Brain-Derived Neurotrophic Factor and Neuropsychiatric Disorders. Pharmacol Rev. 2012;64(2):238–258. pmid:22407616
- 56. Björkholm C, Monteggia LM. BDNF—a key transducer of antidepressant effects. Neuropharmacology. 2016;102:72–79. pmid:26519901
- 57. Autry AE, Adachi M, Nosyreva E, Na ES, Los MF, Cheng Pf, et al. NMDA receptor blockade at rest triggers rapid behavioural antidepressant responses. Nature. 2011;475(7354):91–5. pmid:21677641
- 58. Soliman F, Glatt CE, Bath KG, Levita L, Jones RM, Pattwell SS, et al. A genetic variant BDNF polymorphism alters extinction learning in both mouse and human. Science. 2010;327(5967):863–6. pmid:20075215
- 59. Chen ZY, Bath K, McEwen B, Hempstead B, Lee F. Impact of genetic variant BDNF (Val66Met) on brain structure and function. Novartis Found Symp. 2008;289:180–8; discussion 188–95. pmid:18497103
- 60. Verhagen M, van der Meij A, van Deurzen PAM, Janzing JGE, Arias-Vásquez A, Buitelaar JK, et al. Meta-analysis of the BDNF Val66Met polymorphism in major depressive disorder: effects of gender and ethnicity. Mol Psychiatry. 2010;15(3):260–71. pmid:18852698
- 61. Notaras M, Hill R, van den Buuse M. The BDNF gene Val66Met polymorphism as a modifier of psychiatric disorder susceptibility: progress and controversy. Mol Psychiatry. 2015;20(8):916–30.
- 62. Feng D, Kim T, Özkan E, Light M, Torkin R, Teng KK, et al. Molecular and Structural Insight into proNGF Engagement of p75NTR and Sortilin. J Mol Biol. 2010;396(4):967–984. pmid:20036257
- 63. Anastasia A, Deinhardt K, Chao MV, Will NE, Irmady K, Lee FS, et al. Val66Met polymorphism of BDNF alters prodomain structure to induce neuronal growth cone retraction. Nat Commun. 2013;4:2490. pmid:24048383
- 64. Giza JI, Kim J, Meyer HC, Anastasia A, Dincheva I, Zheng CI, et al. The BDNF Val66Met Prodomain Disassembles Dendritic Spines Altering Fear Extinction Circuitry and Behavior. Neuron. 2018;99(1):163–178.e6. pmid:29909994
- 65. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982;157(1):105–32. pmid:7108955
- 66. Uversky VN, Gillespie JR, Fink AL. Why are “natively unfolded” proteins unstructured under physiologic conditions? Proteins. 2000;41(3):415–27. pmid:11025552
- 67. Holehouse AS, Das RK, Ahad JN, Richardson MOG, Pappu RV. CIDER: Resources to Analyze Sequence-Ensemble Relationships of Intrinsically Disordered Proteins. Biophys J. 2017;112(1):16–21. pmid:28076807
- 68. Brzovic PS, Heikaus CC, Kisselev L, Vernon R, Herbig E, Pacheco D, et al. The Acidic Transcription Activator Gcn4 Binds the Mediator Subunit Gal11/Med15 Using a Simple Protein Interface Forming a Fuzzy Complex. Mol Cell. 2011;44(6):942–953. pmid:22195967
- 69. Uesugi M, Nyanguile O, Lu H, Levine AJ, Verdine GL. Induced alpha helix in the VP16 activation domain upon binding to a human TAF. Science. 1997;277(5330):1310–3. pmid:9271577
- 70. Radhakrishnan I, Pérez-Alvarado GC, Parker D, Dyson HJ, Montminy MR, Wright PE. Solution Structure of the KIX Domain of CBP Bound to the Transactivation Domain of CREB: A Model for Activator:Coactivator Interactions. Cell. 1997;91(6):741–752. pmid:9413984
- 71. Canales Á, Rösinger M, Sastre J, Felli IC, Jiménez-Barbero J, Giménez-Gallego G, et al. Hidden α-helical propensity segments within disordered regions of the transcriptional activator CHOP. PLoS One. 2017;12(12):e0189171. pmid:29211802
- 72. Staller MV, Holehouse AS, Swain-Lenz D, Das RK, Pappu RV, Cohen BA. A High-Throughput Mutational Scan of an Intrinsically Disordered Acidic Transcriptional Activation Domain. Cell Syst. 2018;6(4):444–455.e6. pmid:29525204
- 73. Mercadante D, Milles S, Fuertes G, Svergun DI, Lemke EA, Gräter F. Kirkwood–Buff Approach Rescues Overcollapse of a Disordered Protein in Canonical Protein Force Fields. J Phys Chem B. 2015;119(25):7975–7984. pmid:26030189
- 74. Piana S, Donchev AG, Robustelli P, Shaw DE. Water Dispersion Interactions Strongly Influence Simulated Structural Properties of Disordered Protein States. J Phys Chem B. 2015;119(16):5113–5123. pmid:25764013
- 75. Best RB, Hummer G. Optimized Molecular Dynamics Force Fields Applied to the Helix−Coil Transition of Polypeptides. J Phys Chem B. 2009;113(26):9004–9015. pmid:19514729
- 76. Best RB, Zheng W, Mittal J. Balanced Protein–Water Interactions Improve Properties of Disordered Proteins and Non-Specific Protein Association. J Chem Theory Comput. 2014;10(11):5113–5124. pmid:25400522
- 77. Abascal JLF, Vega C. A general purpose model for the condensed phases of water: TIP4P/2005. J Chem Phys. 2005;123(23):234505. pmid:16392929
- 78. Lindorff-Larsen K, Piana S, Palmo K, Maragakis P, Klepeis JL, Dror RO, et al. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins. 2010;78(8):1950–8. pmid:20408171
- 79. Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, Simmerling C. Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins Struct Funct Bioinforma. 2006;65(3):712–725.
- 80. Huang J, Rauscher S, Nawrocki G, Ran T, Feig M, de Groot BL, et al. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat Methods. 2017;14(1):71–73. pmid:27819658
- 81. Jorgensen WL. Quantum and statistical mechanical studies of liquids. 10. Transferable intermolecular potential functions for water, alcohols, and ethers. Application to liquid water. J Am Chem Soc. 1981;103(2):335–340.
- 82. Robustelli P, Piana S, Shaw DE. Developing a molecular dynamics force field for both folded and disordered protein states. Proc Natl Acad Sci U S A. 2018;115(21):E4758–E4766. pmid:29735687
- 83. Shen Y, Bax A. SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network. J Biomol NMR. 2010;48(1):13–22. pmid:20628786
- 84. Rauscher S, Gapsys V, Gajda MJ, Zweckstetter M, de Groot BL, Grubmüller H. Structural Ensembles of Intrinsically Disordered Proteins Depend Strongly on Force Field: A Comparison to Experiment. J Chem Theory Comput. 2015;11(11):5513–5524. pmid:26574339
- 85. Meng F, Bellaiche MMJ, Kim JY, Zerze GH, Best RB, Chung HS. Highly Disordered Amyloid-β Monomer Probed by Single-Molecule FRET and MD Simulation. Biophys J. 2018;114(4):870–884. pmid:29490247
- 86. Ortega A, Amorós D, García de la Torre J. Prediction of hydrodynamic and other solution properties of rigid proteins from atomic- and residue-level models. Biophys J. 2011;101(4):892–8. pmid:21843480
- 87. Creamer TP, Rose GD. Side-chain entropy opposes alpha-helix formation but rationalizes experimentally determined helix-forming propensities. Proc Natl Acad Sci U S A. 1992;89(13):5937–41. pmid:1631077
- 88. Viguera AR, Serrano L. Side-chain interactions between sulfur-containing amino acids and phenylalanine in alpha-helices. Biochemistry. 1995;34(27):8771–9. pmid:7612617
- 89. Faure G, Bornot A, de Brevern AG. Protein contacts, inter-residue interactions and side-chain modelling. Biochimie. 2008;90(4):626–639. pmid:18086572
- 90. Valley CC, Cembran A, Perlmutter JD, Lewis AK, Labello NP, Gao J, et al. The methionine-aromatic motif plays a unique role in stabilizing protein structure. J Biol Chem. 2012;287(42):34979–91. pmid:22859300
- 91. Gómez-Tamayo JC, Cordomí A, Olivella M, Mayol E, Fourmy D, Pardo L. Analysis of the interactions of sulfur-containing amino acids in membrane proteins. Protein Sci. 2016;25(8):1517–24. pmid:27240306
- 92. Rubinstein M, Colby RH. Polymer physics. Oxford University Press; 2003.
- 93. Chen ZY, Ieraci A, Teng H, Dall H, Meng CX, Herrera DG, et al. Sortilin controls intracellular sorting of brain-derived neurotrophic factor to the regulated secretory pathway. J Neurosci. 2005;25(26):6156–66. pmid:15987945
- 94. Lim JM, Kim G, Levine RL. Methionine in Proteins: It’s Not Just for Protein Initiation Anymore. Neurochem Res. 2019;44(1):247–257. pmid:29327308
- 95. Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y. The I-TASSER Suite: protein structure and function prediction. Nat Methods. 2014;12(1):7–8.
- 96. Roy A, Kucukural A, Zhang Y. I-TASSER: a unified platform for automated protein structure and function prediction. Nat Protoc. 2010;5(4):725–738. pmid:20360767
- 97. Zhang Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics. 2008;9(1):40. pmid:18215316
- 98. Kim DE, Chivian D, Baker D. Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res. 2004;32(Web Server issue):W526–31. pmid:15215442
- 99. Šali A, Blundell TL. Comparative Protein Modelling by Satisfaction of Spatial Restraints. J Mol Biol. 1993;234(3):779–815. pmid:8254673
- 100. Berendsen HJC, van der Spoel D, van Drunen R. GROMACS: A message-passing parallel molecular dynamics implementation. Comput Phys Commun. 1995;91(1-3):43–56.
- 101. Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1-2:19–25.
- 102. Essmann U, Perera L, Berkowitz ML, Darden T, Lee H, Pedersen LG. A smooth particle mesh Ewald method. J Chem Phys. 1995;103(19):8577–8593.
- 103. Sugita Y, Okamoto Y. Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett. 1999;314(1-2):141–151.
- 104. Shen Y, Delaglio F, Cornilescu G, Bax A. TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J Biomol NMR. 2009;44(4):213–23. pmid:19548092
- 105. Marsh JA, Singh VK, Jia Z, Forman-Kay JD. Sensitivity of secondary structure propensities to sequence differences between α- and γ-synuclein: Implications for fibrillation. Protein Sci. 2006;15(12):2795–2804. pmid:17088319
- 106. Nielsen JT, Mulder FAA. POTENCI: prediction of temperature, neighbor and pH-corrected chemical shifts for intrinsically disordered proteins. J Biomol NMR. 2018;70(3):141–165. pmid:29399725
- 107. Nodet G, Salmon L, Ozenne V, Meier S, Jensen MR, Blackledge M. Quantitative Description of Backbone Conformational Sampling of Unfolded Proteins at Amino Acid Resolution from NMR Residual Dipolar Couplings. J Am Chem Soc. 2009;131(49):17908–17918. pmid:19908838
- 108. García AE, Sanbonmatsu KY. Alpha-helical stabilization by side chain shielding of backbone hydrogen bonds. Proc Natl Acad Sci U S A. 2002;99(5):2782–7. pmid:11867710
- 109. Iglesias J, Sanchez-Martínez M, Crehuet R. SS-map. Intrinsically Disord Proteins. 2013;1(1):e25323. pmid:28516013
- 110. Ahlstrom LS, Baker JL, Ehrlich K, Campbell ZT, Patel S, Vorontsov II, et al. Network visualization of conformational sampling during molecular dynamics simulation. J Mol Graph Model. 2013;46:140–9. pmid:24211466