Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models

Fig 3

Inferred Potts-ACE model generates sequences with high folding probabilities and diversities.

A. Folding probabilities P_nat(S_B|A), Eq [5], for four sets of 10^4 sequences A randomly generated with the Independent-site Model (IM, green), the Potts-ACE (red), the Potts-PLM (orange) and the Potts-Gaussian (blue) models vs. their Hamming distances to the consensus sequence of the ‘natural’ MSA of structure S_B used to infer the four models. Black symbols show results for the ‘natural’ sequences, sampled at inverse temperature β = 10^3 (Methods). Most sequences drawn from the Potts-ACE model have high folding probabilities, while most sequences drawn from the IM have low values of P_nat; sequences generated with the Potts-PLM model lie in between. Sequences drawn from the Potts-Gaussian model have very high folding probabilities but are very close to the consensus sequence, and fail to reproduce the diversity of sequences seen in the ‘natural’ MSA (black) and Potts-ACE (red) data. Hamming distances for the Potts-ACE and Potts-PLM models have been shifted, by a different offset for each model, to improve visibility. Filled ellipses show domains corresponding to one standard deviation of the effective Hamiltonian, Eq [6]. B. Scatter plot of the ‘energy’, Eq [9], computed with the inferred Potts-ACE model (x-axis) vs. the effective Hamiltonian, Eq [6] (y-axis), for the sequences in the MSA generated with the Potts-ACE model for structure S_B. Only sequences between the 90th and 100th percentiles of P_nat values are shown. Colors identify intervals of P_nat values; see the legend in the panel. The energy of the best folder has been subtracted from the energies computed with the Potts-ACE model, so that the minimal energy is zero.
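
The two bookkeeping steps mentioned in the caption, the Hamming distance to the consensus sequence of an MSA and the shift of energies so that the minimum is zero, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy alignment and the tie-breaking rule for the consensus are assumptions, and the energy shift assumes the best folder is the sequence of lowest energy.

```python
from collections import Counter

def consensus(msa):
    """Most frequent symbol in each alignment column.

    Assumption: ties are broken in favor of the symbol seen first in the column.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*msa))

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def shift_to_zero(energies):
    """Subtract the lowest energy so that the minimal shifted energy is zero.

    Assumption: the 'best folder' is taken to be the lowest-energy sequence.
    """
    e_min = min(energies)
    return [e - e_min for e in energies]

# Hypothetical toy alignment (not data from the paper).
toy_msa = ["ACDA", "ACDC", "ACEA", "GCDA"]
cons = consensus(toy_msa)
distances = [hamming(seq, cons) for seq in toy_msa]
```

For the toy alignment, `cons` is the column-wise majority sequence and `distances` holds one integer per sequence; in the figure these distances are the x-axis of panel A.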

doi: https://doi.org/10.1371/journal.pcbi.1004889.g003