Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models

doi:10.1371/journal.pcbi.1004889

Fig 3

Inferred Potts-ACE model generates sequences with high folding probabilities and diversities.

A. Folding probabilities P_nat(S_B|A), Eq [5], for four sets of 10⁴ sequences A randomly generated with the Independent-site Model (IM, green), the Potts-ACE (red), the Potts-PLM (orange) and the Potts-Gaussian (blue) models vs. their Hamming distances to the consensus sequence of the ‘natural’ MSA of structure S_B used to infer the four models. Black symbols show results for the ‘natural’ sequences, sampled at inverse temperature β = 10³ (Methods). Most sequences drawn from the Potts-ACE model have high folding probabilities, while most sequences drawn from the IM have low values of P_nat; sequences generated with the Potts-PLM model fall in between. Sequences drawn from the Potts-Gaussian model have very high folding probabilities but lie very close to the consensus sequence, and so fail to reproduce the diversity of sequences seen in the ‘natural’ MSA (black) and Potts-ACE (red) data. Hamming distances for the Potts-ACE and Potts-PLM models have been shifted to improve visibility. Filled ellipses show domains corresponding to one standard deviation of the effective Hamiltonian, Eq [6]. B. Scatter plot of the ‘energy’, Eq [9], computed with the inferred Potts-ACE model (x-axis) vs. the effective Hamiltonian, Eq [6] (y-axis), for the sequences in the MSA generated with the Potts-ACE model for structure S_B. Only sequences between the 90th and 100th percentiles of P_nat values are shown. Colors identify intervals of P_nat values; see the legend in the panel. The energy of the best folder has been subtracted from the Potts-ACE energies of all sequences, so that the minimal energy is zero.
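The quantities on the x-axis of panel A and the percentile filter of panel B can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes an MSA stored as an integer array (one row per sequence, one column per site) and a precomputed array of P_nat values; the function names `consensus_sequence`, `hamming_to_consensus` and `top_decile_mask` are hypothetical.

```python
import numpy as np

def consensus_sequence(msa):
    """Most frequent symbol in each column of an integer-encoded MSA."""
    msa = np.asarray(msa)
    n_states = msa.max() + 1
    # counts[a, i] = number of sequences carrying symbol a at site i
    counts = np.stack([(msa == a).sum(axis=0) for a in range(n_states)])
    # Assumption: ties are resolved toward the smallest symbol index
    return counts.argmax(axis=0)

def hamming_to_consensus(sequences, consensus):
    """Number of sites at which each sequence differs from the consensus."""
    return (np.asarray(sequences) != np.asarray(consensus)).sum(axis=1)

def top_decile_mask(p_nat):
    """Boolean mask selecting values at or above the 90th percentile."""
    p_nat = np.asarray(p_nat, dtype=float)
    return p_nat >= np.percentile(p_nat, 90)
```

In this sketch the consensus would be computed from the ‘natural’ MSA, and the distances from each model-generated set of sequences, which mirrors how the caption describes the x-axis of panel A.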

doi: https://doi.org/10.1371/journal.pcbi.1004889.g003