Defining an Essence of Structure Determining Residue Contacts in Proteins

doi:10.1371/journal.pcbi.1000584

Figure 1.

The concept of structural essence.

The concept of a minimal set of contacts essential for reconstructing the three-dimensional structure is illustrated with CheY (1e6k). A The native structure of 1e6k in ribbon representation (pink). B The Cα contacts visualized as a contact map; the inset highlights all Cα contacts (red) on the cartoon representation. C A subset selected from the native contact map (black); the inset shows the selected subset mapped onto the structure. D The structure reconstructed from the selected subset in ribbon representation (blue). E Superposition of the native and reconstructed structures. Reconstruction accuracy is measured as the Cα RMSD between the native structure and the reconstructed model after superposition.
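The accuracy measure used throughout, Cα RMSD after superposition, can be sketched as follows. This is a minimal illustration that assumes NumPy and uses the standard Kabsch algorithm for the optimal rigid-body rotation. The function name `superposed_rmsd` is hypothetical, and the captions do not say which superposition tool was actually used.

```python
import numpy as np


def superposed_rmsd(native: np.ndarray, model: np.ndarray) -> float:
    """Cα RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Uses the Kabsch algorithm (an assumption; any optimal rigid-body
    superposition gives the same RMSD). Rows must be matched residue by residue.
    """
    if native.shape != model.shape or native.ndim != 2 or native.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation: centre both coordinate sets on their centroids.
    p = model - model.mean(axis=0)
    q = native - native.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Rotate the centred model onto the centred native and measure the deviation.
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(native)))
```

A rotated and translated copy of a structure gives an RMSD of (numerically) zero. Its mirror image does not, because only proper rotations are allowed.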

Figure 2.

Subsets from random selection.

A Increasing fractions of contacts (from 10% to 100%) are selected at random and used for reconstruction. Two independent random selections are performed for every fraction, and the average Cα RMSD is reported for every protein in a SCOP class. Each class consists of three structures; in each class, ‘*’ denotes the protein that is three times as large as the other two. B The reconstruction accuracies of random subsets obtained with our method are compared with those of Chen and co-workers. Five proteins (1dd3, 1nxb, 1igd, 1bxy, 1d0d) are selected from the Chen dataset, and random subsets are generated with (i) our contact definitions, Cα 9.0 Å and Cβ 8.0 Å (red), and (ii) the contact definition of Chen et al., Cβ 7.5 Å (black). Subsets from (i) and (ii) are reconstructed with Tinker. (iii) The reconstruction accuracy reported by Chen et al. (blue).
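The random-subset experiment in panel A rests on two steps. First, a contact map is built from a distance cutoff. Second, a fraction of its contacts is sampled. A minimal sketch follows under these assumptions:

* one representative atom (Cα or Cβ) per residue;
* a pair counts as a contact when its distance is at most the cutoff;
* every residue pair i < j is considered.

These conventions, and the helper names `contact_map`, `random_subset` and `random_subsets_for_panel_a`, are illustrative assumptions, not the authors' code.

```python
import math
import random


def contact_map(coords, cutoff):
    """Residue pairs (i, j), i < j, whose representative atoms lie within `cutoff` Å.

    coords: one (x, y, z) tuple per residue, e.g. Cα (cutoff 9.0) or Cβ (cutoff 8.0).
    Assumption: the cutoff is inclusive and all pairs i < j are considered.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def random_subset(contacts, fraction, seed=None):
    """Uniformly sample round(fraction * |contacts|) contacts without replacement."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = round(fraction * len(contacts))
    # sorted() gives a sequence (random.sample no longer accepts sets) and a
    # reproducible order for a given seed.
    return set(rng.sample(sorted(contacts), k))


# Panel A protocol: fractions 10%, 20%, ..., 100%, two independent draws each.
FRACTIONS = [step / 10 for step in range(1, 11)]


def random_subsets_for_panel_a(contacts, n_draws=2, seed=0):
    """Map each fraction in FRACTIONS to `n_draws` independent random subsets."""
    rng = random.Random(seed)
    return {
        f: [random_subset(contacts, f, seed=rng.randrange(2**32)) for _ in range(n_draws)]
        for f in FRACTIONS
    }
```

Different contact definitions come from different inputs. With hypothetical per-residue coordinate lists `ca_coords` and `cb_coords`, `contact_map(ca_coords, 9.0)` and `contact_map(cb_coords, 8.0)` would correspond to definition (i) in panel B. `contact_map(cb_coords, 7.5)` would correspond to definition (ii).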

Table 1.

Dataset.

Figure 3.

Sequence-range based subset selection.

The reconstruction accuracies of the short-range (left) and long-range (right) subsets are shown (blue). The complete short-range (SR) and long-range (LR) contact subsets are used for reconstruction and are compared against random subsets of similar size (red). The class average is the mean Cα RMSD over the ensembles (the best quarter of models) of every protein. The sizes of the SR and LR subsets vary slightly across SCOP classes; however, the trend is preserved for both the Cα and the Cβ graphs. (Average sizes of the Cα graphs: All α: SR = 62.2%, LR = 37.8%; All β: SR = 51.1%, LR = 49.9%; α/β: SR = 55.5%, LR = 44.5%; α+β: SR = 51.1%, LR = 48.9%.)
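The sketch below covers two operations: splitting a contact set by sequence range, and averaging over the best quarter of an ensemble. The caption does not give the sequence-separation cutoff that separates SR from LR contacts, so `long_range_min_sep` is a hypothetical parameter. "Best quarter" is taken here to mean the 25% of models with the lowest Cα RMSD, with at least one model kept. The function names are also hypothetical.

```python
def split_by_sequence_range(contacts, long_range_min_sep):
    """Partition contacts (i, j) into short-range (SR) and long-range (LR) sets.

    long_range_min_sep is hypothetical: the caption does not state the cutoff.
    A contact is LR when |i - j| >= long_range_min_sep, otherwise SR.
    """
    short_range, long_range = set(), set()
    for i, j in contacts:
        if abs(i - j) >= long_range_min_sep:
            long_range.add((i, j))
        else:
            short_range.add((i, j))
    return short_range, long_range


def size_percentages(short_range, long_range):
    """SR and LR sizes as percentages of the whole contact set."""
    total = len(short_range) + len(long_range)
    return 100.0 * len(short_range) / total, 100.0 * len(long_range) / total


def best_quarter_mean(rmsds):
    """Mean RMSD over the best (lowest-RMSD) quarter of an ensemble, at least one model."""
    if not rmsds:
        raise ValueError("empty ensemble")
    ranked = sorted(rmsds)
    n = max(1, len(ranked) // 4)
    return sum(ranked[:n]) / n
```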

Figure 4.

Common neighbourhood of an edge, CNb(E_ij).

A contact E_ij (red) between nodes i (pink) and j (green) is shown. Let N_i be the set of neighbours of node i and N_j the set of neighbours of node j (grey). The CNb of edge E_ij is defined as the set of nodes adjacent to both i and j, CNb(E_ij) = N_i ∩ N_j. Here, the nodes k₁, k₂ and k₃ (yellow) share edges with both i and j. The triangles that k₁, k₂ and k₃ form with E_ij constitute the CNb triangles of E_ij.
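The common neighbourhood of an edge is taken here as the set of nodes adjacent to both i and j, CNb(E_ij) = N_i ∩ N_j. Computing it for every edge of a contact graph is a direct set intersection. The sketch below assumes contacts are given as (i, j) residue-index pairs without self-loops and builds the neighbour sets N_i from them. Each common neighbour k corresponds to one CNb triangle (i, j, k). The function name is hypothetical.

```python
from collections import defaultdict


def common_neighbourhoods(contacts):
    """Map every contact edge (i, j) to CNb(E_ij) = N_i ∩ N_j.

    contacts: iterable of (i, j) residue-index pairs with i != j (undirected edges).
    Each k in the result for (i, j) closes one CNb triangle (i, j, k).
    """
    contacts = list(contacts)  # iterated twice below
    neighbours = defaultdict(set)  # N_i for every node i
    for i, j in contacts:
        neighbours[i].add(j)
        neighbours[j].add(i)
    # i and j never appear in N_i ∩ N_j because there are no self-loops.
    return {(i, j): neighbours[i] & neighbours[j] for i, j in contacts}
```

The CNb size of an edge, `len(common_neighbourhoods(contacts)[(i, j)])`, is the quantity by which edges can then be rank-ordered.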

Table 2.

PIs* of common neighbourhood (CNb) and sequence-range rank-ordered subsets.

Figure 5.

Deriving the structural essence with the cone-peeling strategy.

A Contact map visualization of the common neighbourhoods. The cone-shaped landscape of the CNbs results from low-CNb edges occupying the base of the cone and high-CNb edges occupying the summit. The colour bar shows the range of CNb sizes. B The cone-peeling strategy characterizes the structural essence better than random selection. The algorithm removes all local contacts and selects the native contacts that have high CNb and are long-range in sequence. In all proteins, the subsets selected by cone-peeling (blue) reconstruct better than random subsets of similar size (red), consistently achieving PI > 1. For every protein, the ensemble average Cα RMSD is reported. The sizes of the final subsets and the PIs of the individual proteins are given in Table 1. C The essential contacts (blue) obtained from cone-peeling are highlighted on the native structure of 1e6k (red) using PyMOL [29]. With 4.3% of the Cα-Cα and 9% of the Cβ-Cβ contacts, the subsets achieve a PI of 1.74. D Overlay of the best reconstructed models onto the native structure (1e6k). The models reconstructed from the essential subsets obtained by the cone-peeling algorithm are superposed onto the native structure for comparison. The best models (in terms of Cα RMSD) are shown in ribbon representation (orange); the native structure is shown in cartoon representation (blue). The overlaid models have an average Cα RMSD of 4.5 Å to the native structure. Even though only the essential subsets of contacts are used, the secondary-structure elements in the reconstructed models are clearly distinguished from the regions between them.
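Panel B names two selection criteria: discard local contacts, and keep long-range contacts with high CNb. A deliberately simplified filter can illustrate them. This is not the cone-peeling algorithm of Figure 6, whose steps are not reproduced in these captions. The following are all hypothetical choices made only for illustration:

* the sequence-separation cutoff `min_seq_sep`;
* the retained fraction `keep_fraction`;
* the tie-breaking rule;
* the function names.

```python
import math
from collections import defaultdict


def cnb_sizes(contacts):
    """|N_i ∩ N_j| for every contact (i, j) of the native contact graph."""
    neighbours = defaultdict(set)
    for i, j in contacts:
        neighbours[i].add(j)
        neighbours[j].add(i)
    return {(i, j): len(neighbours[i] & neighbours[j]) for i, j in contacts}


def high_cnb_long_range_subset(contacts, min_seq_sep, keep_fraction):
    """Simplified selection, NOT the published cone-peeling algorithm.

    1. CNb sizes are computed on the full native contact graph.
    2. Local contacts (|i - j| < min_seq_sep, a hypothetical cutoff) are discarded.
    3. The remaining long-range contacts are ranked by CNb size (ties broken by
       residue indices) and the top `keep_fraction` is kept (hypothetical rule).
    """
    contacts = set(contacts)
    sizes = cnb_sizes(contacts)
    long_range = [c for c in contacts if abs(c[0] - c[1]) >= min_seq_sep]
    # Highest CNb first; the residue-index tuple makes the order deterministic.
    ranked = sorted(long_range, key=lambda c: (-sizes[c], c))
    k = math.ceil(keep_fraction * len(ranked))
    return set(ranked[:k])
```

Ranking by CNb drops edges from the base of the cone first, which mirrors the landscape described in panel A. The real algorithm's stopping criterion is not reproduced here.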

Figure 6.

The cone-peeling algorithm.
