Fig 1.
(A) A complete graph is constructed from the protein tertiary structure, with the adjacency matrix derived from the inter-residue distance matrix. (B) Raw node features consist of a distance-based feature xv and an angle-based feature xa.
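The construction in panel (A) can be illustrated with a minimal sketch. It computes the inter-residue distance matrix from one coordinate per residue and turns it into a dense, weighted adjacency matrix. The caption does not give the mapping from distance to edge weight, so the inverse-distance form and the `scale` parameter below are assumptions made only for illustration, as are the function names and the toy coordinates.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def adjacency_from_distances(dist, scale=4.0):
    """Assumed weighting (not from the caption): closer residues get larger weights in (0, 1]."""
    return [[1.0 / (1.0 + d / scale) for d in row] for row in dist]

# Three toy residues forming a 3-4-5 right triangle (made-up coordinates).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
dist = distance_matrix(coords)
adj = adjacency_from_distances(dist)
```

Because every pair of residues receives a non-zero weight, the resulting graph is complete, and the structural information is carried by the edge weights rather than by which edges exist.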
Fig 2.
Architecture of the GNN-based encoder.
The BiLSTM module extracts low-level node features from the primary structures of proteins. The graph convolution module extracts high-level node features based on the adjacency matrices. The readout module transforms node features into descriptors through a global max pooling layer. Each residual block (ResBlock) in the graph convolution module consists of two graph convolutional (GC) layers.
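A rough sketch of the residual block and the readout described above follows. It assumes a GC layer of the common form ReLU(A·H·W) with no normalization, and a skip connection that adds the block input to the output of the second GC layer. The caption specifies neither detail, so both are assumptions, and all function names and toy values are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def gc_layer(adj, feats, weight):
    """Assumed graph convolution: ReLU(A . H . W)."""
    return relu(matmul(matmul(adj, feats), weight))

def res_block(adj, feats, w1, w2):
    """Two GC layers plus a skip connection; weights must preserve the feature width."""
    out = gc_layer(adj, gc_layer(adj, feats, w1), w2)
    return [[h + o for h, o in zip(h_row, o_row)] for h_row, o_row in zip(feats, out)]

def global_max_pool(feats):
    """Readout: per-feature maximum over all nodes gives a fixed-length descriptor."""
    return [max(col) for col in zip(*feats)]

# Two nodes with two features each; identity weights keep the arithmetic easy to follow.
adj = [[1.0, 0.5], [0.5, 1.0]]
feats = [[1.0, 2.0], [3.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = res_block(adj, feats, identity, identity)
descriptor = global_max_pool(hidden)
```

The max pooling step is what makes the descriptor length independent of the number of residues, so proteins of different sizes map to vectors of the same dimension.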
Fig 3.
The contrastive learning framework for protein structure representation learning.
At each iteration, raw features Xq and Xk are extracted from the query protein structure and the key protein structure, respectively. Descriptors yq and yk are then encoded by the query encoder and the key encoder, respectively. The value of the loss function guides the optimization of the parameters θq of the query encoder, while the parameters θk of the key encoder are updated based on θq. At the end of the current iteration, yk is enqueued as a negative sample for the next iteration.
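The bookkeeping in this framework can be sketched as follows. The caption states only that θk is updated based on θq. The momentum rule below is the one used in MoCo-style contrastive learning and is assumed here, not confirmed by the caption. The queue length, the momentum value `m`, and the stand-in descriptors are hypothetical.

```python
from collections import deque

def momentum_update(theta_k, theta_q, m=0.999):
    """Assumed MoCo-style rule: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * k + (1.0 - m) * q for k, q in zip(theta_k, theta_q)]

# Fixed-size queue of key descriptors; the oldest entry is dropped once it is full.
negatives = deque(maxlen=3)

theta_q = [1.0, 2.0]   # query-encoder parameters (held fixed in this toy loop)
theta_k = [0.0, 0.0]   # key-encoder parameters
for step in range(4):
    y_k = [float(step), float(step) + 0.5]  # stand-in for the key encoder output
    # In training, the loss would be computed here from y_q, y_k and `negatives`,
    # and theta_q would be updated by gradient descent.
    theta_k = momentum_update(theta_k, theta_q, m=0.5)
    negatives.append(y_k)  # y_k serves as a negative sample in later iterations
```

Only the query encoder receives gradients in this scheme. The key encoder trails it slowly, which keeps the descriptors already stored in the queue approximately consistent with newly encoded keys.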
Table 1.
Ablation studies of length-scaling cosine distance, the dynamic training data partition strategy and the GNN-based encoder on SCOPe v2.07 and ind_PDB.
Table 2.
Ranking performance of GraSR and other baseline methods.
Fig 4.
Correlation between distance derived from the representations learned by GraSR/DeepFold and TM-score on (A) SCOPe v2.07 and (B) ind_PDB.
The Pearson correlation coefficient (PCC) is calculated for quantitative assessment.
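For reference, the PCC between two paired samples can be computed as in the sketch below. The `distances` and `tm_scores` values are made-up toy data, not results from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Toy data: a larger descriptor distance paired with a lower TM-score.
distances = [0.1, 0.4, 0.6, 0.9]
tm_scores = [0.9, 0.7, 0.4, 0.2]
r = pearson(distances, tm_scores)
```

A strongly negative value is the desirable outcome in this setting, since structurally similar pairs (high TM-score) should lie close together in descriptor space.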
Fig 5.
F1-score of GraSR and other baseline methods for each class in SCOPe.
a: All alpha proteins; b: All beta proteins; c: Alpha and beta proteins (a/b); d: Alpha and beta proteins (a+b); e: Multi-domain proteins (alpha and beta); f: Membrane and cell surface proteins and peptides; g: Small proteins.
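As a reminder of how a per-class F1-score is obtained, here is a minimal sketch that treats each class in turn as the positive class. The labels are made-up toy data and the function name is hypothetical. The paper's own evaluation code may differ.

```python
def per_class_f1(y_true, y_pred):
    """F1-score for every class, computed one-vs-rest from true and predicted labels."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[c] = 2 * precision * recall / total if total else 0.0
    return scores

# Toy labels using the SCOPe class letters.
y_true = ["a", "a", "b", "b", "c", "d"]
y_pred = ["a", "b", "b", "b", "c", "c"]
f1 = per_class_f1(y_true, y_pred)
```

Reporting the score per class, rather than a single average, exposes classes with few members that a method may handle poorly.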
Table 3.
Multi-class classification performance of GraSR and other methods.
Table 4.
Time cost of GraSR and other methods for protein structure retrieval from ind_PDB.
Fig 6.
Visualization by t-SNE of descriptors learned by GraSR and other methods.
a: All alpha proteins; b: All beta proteins; c: Alpha and beta proteins (a/b); d: Alpha and beta proteins (a+b); e: Multi-domain proteins (alpha and beta); f: Membrane and cell surface proteins and peptides; g: Small proteins.
Fig 7.
Protein structure superposition derived from the residue-level descriptors of GraSR.
(A) SCOPe-sid: d1v59a2 (red) and d1h6va2 (blue). (B) SCOPe-sid: d5dqpa_ (red) and d1ezwa_ (blue).