Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity

doi:10.1371/journal.pcbi.1002234

Figure 1.

The 465 loop data set.

Global superposition data set of 465 loops used to test sampling. All structures are shown as backbone cartoons. (a) 111 target loops from CDRH1 (12 residues), (b) 130 target loops from CDRH2 (7, 8 and 10 residues), (c) 111 target loops from CDRH3 (8 and 10–17 residues), (d) 21 loops from CASP9 target T0617 (12 residues), and (e) 92 globin EF loops (12, 13 and 15 residues).

Figure 2.

Density estimations of φ,ψ distributions.

Examples of DPM-HMM-estimated backbone dihedral-angle (φ,ψ) density distributions at selected positions of CDRH2 loop and anchor residues for prediction targets. The grey dots are the observed φ,ψ input data at a given alignment position. The contour lines show the density estimated from these φ,ψ pairs. The red dots indicate the actual φ,ψ values of the target structure. Position refers to the residue's place in the modeled loop, and the PDB code identifies the predicted target. (a) Position 1 of 1mfa [55], (b) position 6 of 1w72 [56], (c) position 6 of 1gig [57] and (d) position 9 (last anchor residue) of 1rmf [58].
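
The DPM-HMM estimator itself (a Dirichlet process mixture coupled to a hidden Markov model) is not reproduced here. As a much simpler, swapped-in illustration of what a φ,ψ density estimate is, the sketch below builds a kernel density estimate from product von Mises kernels, which respect the 360° periodicity of dihedral angles. The function names and the concentration parameter `kappa` are hypothetical choices, not values from the paper.

```python
import math


def bessel_i0(kappa, terms=50):
    """Modified Bessel function I0 from its power series (adequate for kappa up to ~20)."""
    return sum((kappa / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))


def torus_kde(samples_deg, kappa=8.0):
    """Return a density p(phi, psi) built from observed (phi, psi) pairs in degrees.

    Each observation contributes a product of two von Mises kernels, so the
    estimate wraps correctly at +/-180 degrees. kappa (kernel concentration)
    is a hypothetical smoothing choice for illustration only.
    """
    pts = [(math.radians(p), math.radians(s)) for p, s in samples_deg]
    # Normalizing constant of the 2-D product kernel on the torus.
    norm = (2.0 * math.pi * bessel_i0(kappa)) ** 2

    def density(phi_deg, psi_deg):
        x, y = math.radians(phi_deg), math.radians(psi_deg)
        total = sum(math.exp(kappa * (math.cos(x - p) + math.cos(y - s))) for p, s in pts)
        return total / (norm * len(pts))  # density per radian squared

    return density
```

For example, `torus_kde([(-60, -45), (-65, -40), (-57, -47)])` returns a function that is much larger near the α-helical region than at (60, 45). A fixed-bandwidth estimate like this one cannot adapt its smoothing to sparse input the way a mixture model with inferred components can, so it is only a conceptual stand-in.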

Figure 3.

Local versus global superposition.

The 97 candidate loops below the 1 Å cutoff on average Cα–Cα termini distance for target loop 3bpx from the T0617 data set, showing the various orientations of the candidate loops (grey) as backbone Cα traces. The reference loop is shown in red stick representation, the best candidate by local superposition in blue and the best candidate by global superposition in green. (a) Local superposition of the candidate loops onto the reference crystal structure; the average local RMSD is 1.86 Å. (b) Candidate loops superposed only at the take-off region (first residue at the N-terminus) of the loop; the average global RMSD of the candidates to the reference crystal structure is 3.17 Å.
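
The difference between the two panels can be sketched as two RMSD definitions. This minimal sketch assumes each loop is an N × 3 NumPy array of Cα coordinates and that candidates already sit in the target's frame (for example, after superposing the take-off region), so the global RMSD is computed without refitting, while the local RMSD refits each loop onto the reference with the Kabsch algorithm. The termini-distance filter is one plausible reading of the 1 Å cutoff (mean Cα deviation over the first and last residues); all function names are hypothetical.

```python
import numpy as np


def termini_distance(candidate, target):
    """Mean Calpha-Calpha distance over the two loop termini (first and last residue).

    Assumed interpretation of the caption's termini-distance cutoff.
    """
    return float(np.mean([np.linalg.norm(candidate[i] - target[i]) for i in (0, -1)]))


def global_rmsd(candidate, target):
    """RMSD in the shared frame, with no refitting of the loop itself."""
    diff = candidate - target
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def local_rmsd(candidate, target):
    """RMSD after optimal rigid-body superposition of the loop (Kabsch algorithm)."""
    P = candidate - candidate.mean(axis=0)
    Q = target - target.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def filter_candidates(candidates, target, cutoff=1.0):
    """Keep candidates whose termini deviate from the target's by less than cutoff angstroms."""
    return [c for c in candidates if termini_distance(c, target) < cutoff]
```

Because `local_rmsd` minimizes over all rigid-body fits, it can never exceed `global_rmsd` for the same pair, which is consistent with the lower local average (1.86 Å) than global average (3.17 Å) reported in the caption.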

Figure 4.

DPM-HMM sampling performance.

RMSD of the best candidate versus RMSD of the best template. The diagonal line is y = x; points below it indicate predictions better than the best template. The inset shows the percentage of better and worse predictions in each RMSD bin. When the best template's RMSD is below 1 Å, the chance that our method improves the loop is about 38%. Between 1 and 2 Å the chance exceeds 75%, and in the 2–3 Å range it exceeds 93%. Above 3 Å, the loop structures are always improved.
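
The inset percentages amount to counting, within each best-template RMSD bin, how many targets fall below the y = x diagonal. A minimal sketch, assuming half-open bins [0, 1), [1, 2), [2, 3) and [3, ∞) Å; the function name is hypothetical and is not taken from the paper.

```python
from bisect import bisect_right


def improvement_by_bin(pairs, edges=(1.0, 2.0, 3.0)):
    """Percentage of targets whose best candidate beats the best template, per bin.

    pairs: iterable of (best_template_rmsd, best_candidate_rmsd) in angstroms.
    edges: bin boundaries; the default gives [0,1), [1,2), [2,3), [3,inf).
    Returns one percentage per bin, or None for an empty bin.
    """
    counts = [[0, 0] for _ in range(len(edges) + 1)]  # [better, total] per bin
    for template_rmsd, candidate_rmsd in pairs:
        b = bisect_right(edges, template_rmsd)  # index of the bin holding template_rmsd
        counts[b][1] += 1
        if candidate_rmsd < template_rmsd:  # point lies below the y = x diagonal
            counts[b][0] += 1
    return [100.0 * better / total if total else None for better, total in counts]
```

With made-up pairs `[(0.8, 0.9), (0.6, 0.5), (1.5, 1.2), (2.4, 1.9), (3.6, 2.1)]`, the function returns `[50.0, 100.0, 100.0, 100.0]`: one of the two sub-1 Å targets improves, and every target in the higher bins does.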

Table 1.

Loop modeling template datasets and accuracy measure (RMSD) for the sampled candidates.

Figure 5.

Influence of the variation of input data.

RMSD of the best candidate versus the average RMSD among all templates. Data points are classified by the number of templates used as input for the DPM-HMM φ,ψ density estimation: grey filled circles are targets with fewer than 10 templates, open circles are targets with 10 to 30 templates, and black filled circles are targets with more than 30 templates.

Figure 6.

Dependence on input data: loop length and number of templates.

(a) Correlation of the best-candidate RMSD with loop length; the best-candidate RMSD correlates linearly with loop length. (b) Correlation of the best-candidate RMSD with the number of templates. The RMSD decreases as the number of templates increases up to about 30, suggesting that more than 30 templates do not further improve sampling in the DPM-HMM method.
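
The linear trend in panel (a) can be quantified with an ordinary least-squares fit and Pearson's r. A pure-Python sketch; the function name and the example values are hypothetical, and the paper's own fitting procedure is not stated in this caption.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = intercept + slope * x, plus Pearson's r."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Hypothetical (loop length, best-candidate RMSD in angstroms) pairs.
lengths = [7, 8, 10, 12, 13, 15, 17]
rmsds = [0.6, 0.7, 0.9, 1.2, 1.3, 1.6, 1.9]
intercept, slope, r = linear_fit(lengths, rmsds)
```

A positive `slope` with `r` close to 1 would correspond to the linear length dependence in panel (a); the same function could be applied to template counts below the ~30-template saturation point in panel (b).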

Figure 7.

CASP9 loop sampling.

Assessment of sampling efficiency for the 90 loops modeled in the CASP9 experiment (see Materials and Methods for selection). All loops were modeled with a very limited number of templates, mostly 1–5, and with templates of various lengths. For smaller loops (3–8 residues), global RMSD is mostly below 2.5 Å. For medium-sized loops (8–13 residues), global RMSD is between 1 and 3 Å. As loop length increases, the best-sampled conformations deviate more from the native structure. Beyond 20 residues the DPM-HMM fails, as shown by the rise in RMSD above 5 Å.

Figure 8.

Loop sampling comparison.

Boxplots of the RMSD sampling distributions of the DPM-HMM method and the LoopyMod method for loops of different difficulty: canonical (CDRH1) and non-canonical (CDRH3). The left two boxplots compare sampling for the canonical CDRH1 loops, and the right two for the non-canonical CDRH3 loops. In both cases, the DPM-HMM shows a tighter distribution and a lower median RMSD.
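
The boxplot comparison rests on two summary statistics per method: the median and the interquartile range (IQR). A sketch using Python's `statistics` module; note that `statistics.quantiles` defaults to the "exclusive" quartile method, which may differ slightly from the quartile rule of the plotting tool behind the figure, and the whiskers here are simply the minimum and maximum. Function names are hypothetical.

```python
import statistics


def box_summary(rmsds):
    """Min, quartiles, median, max and IQR of an RMSD sample (needs at least two values)."""
    q1, _, q3 = statistics.quantiles(rmsds, n=4)  # "exclusive" method by default
    return {"min": min(rmsds), "q1": q1, "median": statistics.median(rmsds),
            "q3": q3, "max": max(rmsds), "iqr": q3 - q1}


def tighter_and_lower(a, b):
    """True if sample a has both a smaller IQR and a lower median than sample b."""
    sa, sb = box_summary(a), box_summary(b)
    return sa["iqr"] < sb["iqr"] and sa["median"] < sb["median"]
```

Calling `tighter_and_lower(dpm_hmm_rmsds, loopymod_rmsds)` on the per-method RMSD samples for one loop class would express the caption's claim as a single test; the variable names are placeholders.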
