Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

doi:10.1371/journal.pcbi.1008865

Fig 1.

The architecture of TripletRes, which formulates the contact-map prediction as a pixel-level labeling problem, where a pixel in the image represents a pair of residue positions in the contact-map of the query protein.

Starting from the MSA generated for the query sequence, three L×L×441 feature matrices (also called tensors) are computed for the three sets of coevolutionary features (PRE, PLM, COV). Here, L is the length of the query sequence, and 441 = 21×21 is the number of combinations of the 21 amino acid types (including the gap) at two positions in the MSA. Each tensor is input to a separate ResNet, where the first layer reduces the number of feature channels from 441 to 64, followed by instance normalization and 24 consecutive residual blocks, yielding an L×L×64 tensor. Details of a residual block are shown in the inset on the right-hand side. The outputs of the three ResNets are concatenated into an L×L×192 tensor, which is fed into a final ResNet. In this ResNet, the first layer again reduces the feature channels from 192 to 64, followed by instance normalization and 24 residual blocks to produce an L×L×64 tensor, which is further reduced to L×L×12. Finally, a softmax layer scales the values in the tensor to between 0 and 1 so that the 12 values for each pixel (i.e., residue pair) sum to one. Since a protein contact/distance map is symmetric, TripletRes averages the softmax outputs of residue pairs (i,j) and (j,i) to obtain the final L×L×12 distance-map prediction, where 12 is the number of distance bins. The contact-map is obtained by summing the probabilities of the first 4 distance bins.
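The last two steps described above (a per-pixel softmax, then symmetrization and summation over the first 4 bins) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names are hypothetical, the input `logits` stands in for the L×L×12 output of the final ResNet, and because the distance-bin boundaries are not given in this caption, the sketch only relies on the stated fact that the first 4 bins make up a contact.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Numerically stable softmax over the distance-bin axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def distance_and_contact_maps(logits, n_contact_bins=4):
    """Hypothetical helper. logits: (L, L, 12) output of the final layer."""
    probs = softmax(logits, axis=-1)  # each pixel now sums to 1 over 12 bins
    # Average the (i, j) and (j, i) predictions because the maps are symmetric.
    dist_map = 0.5 * (probs + probs.transpose(1, 0, 2))
    # Contact probability = total probability of the first n_contact_bins bins.
    contact_map = dist_map[..., :n_contact_bins].sum(axis=-1)
    return dist_map, contact_map
```

Because both averaged distributions sum to one over the 12 bins, their mean does too, so the symmetrized map is still a valid per-pair probability distribution and the resulting contact probabilities lie in [0, 1].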

Table 1.

Summary of long-range contact precision by TripletRes and control methods on 50 CASP11&12 FM targets and 195 CAMEO hard targets, sorted in ascending order of top-L precision.

p-values in parentheses are from a Student’s t-test between TripletRes and each control method; bold font highlights the best performer in each category.
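Long-range top-L (and top-L/5) precision is the fraction of the highest-scoring long-range residue pairs that are true contacts in the native structure. The sketch below computes it under the common CASP conventions, which are assumptions here because this excerpt does not define them: a long-range pair has sequence separation |i-j| ≥ 24, and a pair counts as a true contact if its native Cβ-Cβ distance is below 8 Å. The function name and its arguments are hypothetical.

```python
import numpy as np


def long_range_top_k_precision(pred, native_dist, k, min_sep=24, cutoff=8.0):
    """Hypothetical helper: precision of the k top-scoring long-range pairs.

    pred: (L, L) predicted contact probabilities (upper triangle is used).
    native_dist: (L, L) native residue distances in Angstroms
        (assumed C-beta distances, as in the CASP convention).
    min_sep, cutoff: assumed CASP long-range and contact thresholds.
    """
    L = pred.shape[0]
    # Indices (i, j) with j - i >= min_sep, i.e. long-range pairs only.
    i, j = np.triu_indices(L, k=min_sep)
    # Rank candidate pairs by predicted probability and keep the top k.
    order = np.argsort(-pred[i, j], kind="stable")[:k]
    if len(order) == 0:
        return float("nan")
    # A prediction is correct if the native distance is below the cutoff.
    is_true_contact = native_dist[i[order], j[order]] < cutoff
    return float(is_true_contact.mean())
```

For top-L precision pass `k=L`; for top-L/5 pass `k=L // 5`.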

Fig 2.

Comparisons of different strategies used to train TripletRes.

(a-c) Average long-range top-L/5 precision on the validation set over training epochs for different feature-extraction strategies, all trained with the same deep neural-network architecture, for three coevolutionary analysis methods used for contact-map prediction: (a) DCA based on pseudolikelihood maximization (PLM), (b) DCA based on the precision matrix (PRE), and (c) covariance analysis (COV). “Processed” means that the coevolutionary features are post-processed by Eqs A and B in S4 Text. (d) Average long-range top-L/5 precision on the validation set over training epochs for each individual coevolutionary feature set and for the TripletRes model, which ensembles all three feature sets. Each curve corresponds to the training of a single model. (e) Long-range top-L/5 and top-L precision obtained with different loss functions on the CASP FM and CAMEO hard targets.

Fig 3.

Long-range top-L precision of contact-maps predicted by TripletRes with versus without deep MSAs.

(a) Overall results on 50 CASP FM targets; (b,c) illustrative example of the contact-map and native structure of the T0896-D3 domain in CASP12; (d) overall results on 195 CAMEO hard targets; (e,f) illustrative example of the contact-map and native structure of PDB ID 6g3bB in CAMEO. In (a) and (d), dashed lines mark the average top-L long-range contact precision. In (c) and (f), dashed lines indicate the additional contacts predicted as a result of using deep MSAs.

Fig 4.

An illustrative example of a CASP13 domain T0957s1-D1 showing a comparison of top-L/5 long-range contact prediction by TripletRes and the control methods.

In each map, true contacts are shown in grey, true positives in red, and false positives in blue. (a-d) Comparison of TripletRes_CASP13, RaptorX-Contact, DMP, and ZHOU-Contact (upper-left triangle) against TripletRes (lower-right triangle). (e) Experimental structure of T0957s1-D1, with the long-range true-positive predictions by TripletRes in Region 1, Region 2, and elsewhere marked by yellow, magenta, and green dashed lines, respectively.

Table 2.

Performance comparison on CASP13 FM targets between TripletRes and the RaptorX-Contact, DMP, and ZHOU-Contact servers, sorted in ascending order of top-L long-range contact precision.
