Multilabel prediction of virus target proteins via multimodal graph representation learning

doi:10.1371/journal.pcbi.1014320

Fig 1.

Framework of MultiVTP.

Our algorithm comprises four stages: subgraph sampling, feature extraction, feature integration, and multilabel prediction. First, multi-view subgraphs are sampled from the PPI network for each protein. Second, network topological properties (global and local) and multimodal features (traditional, sequence, and functional) are extracted to represent proteins. Third, these features are integrated and refined using Graphormer within the subgraphs. Finally, a progressive layered extraction model captures shared and virus-specific binding signatures, yielding prediction scores for each query protein.
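
The caption does not state how the multi-view subgraphs are sampled. As a minimal sketch, assuming each "view" is a k-hop ego subgraph around the query protein at a different radius and that the PPI network is stored as an adjacency dictionary, extraction could look like the following. The function names `k_hop_subgraph` and `multi_view_subgraphs` and the toy network are hypothetical, not MultiVTP's actual sampler.

```python
from collections import deque


def k_hop_subgraph(adj, seed, k):
    """Return (nodes, edges) of the subgraph induced by proteins within k hops of seed."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond radius k
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    nodes = set(dist)
    # Keep each undirected edge once, and only if both endpoints lie inside the subgraph.
    edges = {tuple(sorted((u, v))) for u in nodes for v in adj.get(u, ()) if v in nodes}
    return nodes, edges


def multi_view_subgraphs(adj, seed, radii=(1, 2)):
    """Hypothetical 'views': one ego subgraph per hop radius (an assumption, not the paper's rule)."""
    return {k: k_hop_subgraph(adj, seed, k) for k in radii}


# Toy undirected PPI network (hypothetical protein identifiers).
ppi = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}
views = multi_view_subgraphs(ppi, "A")
for radius, (nodes, edges) in views.items():
    print(radius, sorted(nodes), sorted(edges))
```

Varying the radius trades local detail against wider topological context; a real sampler might instead use random walks or attribute-based views.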

Fig 2.

Feature analysis and comparison on the dataset.

Statistical analysis of traditional features: (A) amino acid composition, (B) ratio, (C) predicted coil proportion, and (D) closeness. (E) AUPRs achieved by different types of traditional features. (F) SHAP analysis of traditional features. (G) t-SNE visualization of global topological properties for VTPs and non-VTPs. (H) Gene Ontology (GO) similarity among VTPs and between VTPs and non-VTPs for each virus. Significance is assessed using Wilcoxon rank-sum tests: **** p < 0.0001, *** 0.0001 ≤ p < 0.001, ** 0.001 ≤ p < 0.01, * 0.01 ≤ p < 0.05, and ns: p ≥ 0.05. (I) Performance of different protein embeddings.
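
The star notation in the caption is a fixed mapping from p-value to label. A minimal sketch of that mapping follows; the function name `significance_label` is hypothetical, and the p-values themselves would come from a Wilcoxon rank-sum test (for example, `scipy.stats.ranksums(x, y).pvalue`), which is not reimplemented here.

```python
def significance_label(p):
    """Map a p-value to the star notation used in the Fig 2 caption."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p < 0.0001:
        return "****"
    if p < 0.001:   # 0.0001 <= p < 0.001
        return "***"
    if p < 0.01:    # 0.001 <= p < 0.01
        return "**"
    if p < 0.05:    # 0.01 <= p < 0.05
        return "*"
    return "ns"     # p >= 0.05


for p in (0.00005, 0.0003, 0.004, 0.02, 0.3):
    print(p, significance_label(p))
```

Because each interval is closed on the left, boundary values such as p = 0.001 fall into the weaker category, as the caption specifies.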

Fig 3.

Interpretability and ablation studies on the dataset.

(A) Ablation studies at both feature and module levels. MM: multimodal features, GO: functional features, SEQ: sequence features, and TRA: traditional features. (B) Ablation experiments for the components of Graphormer. (C) Density distribution of attention values of VTPs and non-VTPs. (D) Top 20 biological processes enriched in the top 50% of proteins ranked by attention score. (E) Silhouette coefficients evaluating the separation of different samples after t-SNE dimensionality reduction. (F) t-SNE visualization of representations of VTPs (orange points) and non-VTPs (blue points).
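
Panel (E) scores how well VTPs and non-VTPs separate in the t-SNE embedding. For reference, the mean silhouette coefficient averages s(i) = (b − a) / max(a, b), where a is point i's mean distance to its own class and b its mean distance to the nearest other class. The pure-Python sketch below assumes Euclidean distance on 2-D coordinates; `mean_silhouette` and the toy points are hypothetical, and in practice `sklearn.metrics.silhouette_score` computes the same quantity.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        # Sum and count distances from point i to all other points, grouped by label.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]                                  # mean intra-class distance
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other class
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy 2-D embeddings: 1 = VTP, 0 = non-VTP (hypothetical values).
coords = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(mean_silhouette(coords, [0, 0, 1, 1]))  # well separated: close to 1
print(mean_silhouette(coords, [0, 1, 0, 1]))  # mixed classes: negative
```

Values near 1 indicate compact, well-separated classes; values near 0 or below indicate overlap.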

Fig 4.

Performance comparison of MultiVTP and baseline methods on the dataset.

(A) ROC curves of MultiVTP and HIVPRE for HIV-1 target prediction. (B) Precision-recall curves of MultiVTP and HIVPRE for HIV-1 target prediction. (C) Performance of MultiVTP and machine learning methods. BR: binary relevance, CC: classifier chains, and LP: label powerset. (D) Comparison of AUPR between MultiVTP and baseline methods (i.e., MLP and other machine learning models with the optimal learning strategy). Viruses are sorted from left to right in descending order of VTP counts. (E) Performance of various approaches after removing different fractions of training VTPs. (F) Few-shot evaluation of MultiVTP trained from scratch and fine-tuned.
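
The three learning strategies in panel (C) differ in how the protein-by-virus label matrix is turned into training targets. The sketch below illustrates only that target construction, not the classifiers or the paper's baseline setup; the helper names, the toy label matrix (rows are proteins, columns are viruses), and the toy features are hypothetical.

```python
def binary_relevance(Y):
    """BR: one independent binary target vector per label (one column of Y per virus)."""
    return [[row[j] for row in Y] for j in range(len(Y[0]))]


def label_powerset(Y):
    """LP: each distinct label combination becomes one class of a multiclass problem."""
    classes = {}
    targets = []
    for row in Y:
        key = tuple(row)
        if key not in classes:
            classes[key] = len(classes)  # assign class ids in order of first appearance
        targets.append(classes[key])
    return targets, classes


def classifier_chain_inputs(X, Y, j):
    """CC: the j-th classifier sees the original features plus labels 0..j-1.

    During training the true earlier labels are appended; at inference time the
    earlier classifiers' predictions would be appended instead.
    """
    return [list(x) + list(y[:j]) for x, y in zip(X, Y)]


Y = [[1, 0, 1], [0, 0, 1], [1, 0, 1], [0, 1, 0]]  # 4 proteins x 3 viruses
X = [[0.2], [0.5], [0.1], [0.9]]                  # one toy feature per protein
print(binary_relevance(Y))
print(label_powerset(Y))
print(classifier_chain_inputs(X, Y, 2))
```

BR ignores label correlations, CC models them through the chain order, and LP models them jointly at the cost of many sparse classes when label combinations are diverse.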

Fig 5.

Analysis of predicted and known VTPs in the human proteome across virus species.

(A) Distribution of predicted and known VTPs for each virus species. The results are classified into three categories: novel (newly predicted VTPs), recovered (overlap between predicted and known VTPs), and missing (known but unrecognized VTPs). (B) Distribution of GO terms enriched in predicted and known VTPs for each virus species. The results are classified into three categories: novel (terms enriched exclusively in predicted VTPs), recovered (terms enriched in both predicted and known VTPs), and missing (terms enriched exclusively in known VTPs). (C) Density map of the proportion of H1N1 VTPs among the interacting neighbors of different samples. (D) Novel pathways related to H1N1 VTP candidates. (E) Interaction network of relevant VTPs in the adherens junction pathway. (F) Distribution of novel, recovered, and missing labels, classified in the same way as the VTPs in (A). (G) Network topology and evolutionary conservation attributes of SVTPs and MVTPs. SVTP: single virus target protein, and MVTP: multiple virus target protein. (H) Top 10 biological processes enriched in MVTPs.
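
The novel/recovered/missing split used in panels (A), (B), and (F) is a set comparison between predicted and known items, applied per virus. A minimal sketch follows; the function names and the protein identifiers are hypothetical placeholders, and the same code applies unchanged to GO terms or labels.

```python
def categorize(predicted, known):
    """Split predicted vs. known items into novel, recovered, and missing sets."""
    predicted, known = set(predicted), set(known)
    return {
        "novel": predicted - known,      # predicted but not previously known
        "recovered": predicted & known,  # predicted and already known
        "missing": known - predicted,    # known but not predicted
    }


def summarize(per_virus):
    """Count each category for every virus: {virus: {category: count}}."""
    return {
        virus: {name: len(items) for name, items in categorize(pred, known).items()}
        for virus, (pred, known) in per_virus.items()
    }


# Hypothetical predicted and known VTP sets for two viruses.
per_virus = {
    "H1N1": ({"P1", "P2", "P3"}, {"P2", "P3", "P4"}),
    "HIV-1": ({"P5"}, {"P5", "P6"}),
}
print(summarize(per_virus))
```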
