A graph neural network-based approach for predicting SARS-CoV-2–human protein interactions from multiview data

doi:10.1371/journal.pone.0332794

Fig 1.

The analysis pipeline begins by constructing three distinct interaction networks (panel A): physical protein-protein interactions (PPI), gene ontology (GO)-based functional similarity, and protein sequence similarity networks.

Each network is separately encoded into embeddings using Graph Convolutional Networks (GCNs). Subsequently, embeddings from each network are integrated to represent each protein within a three-dimensional unit cube, where each dimension corresponds to a distinct biological perspective (panel B). Protein-protein similarity is computed using the Wasserstein distance, capturing the minimum “cost” to align multivariate distributions of protein embeddings derived from the three networks (panel C). Hierarchical clustering is applied to group similar proteins into clusters based on the Wasserstein distance matrix. Finally, probable targets or host factors of SARS-CoV-2 are predicted by identifying non-CoV-host proteins clustered closely with experimentally validated CoV-host proteins.
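The distance and clustering steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the caption does not say how each protein's distribution is constructed, so the sketch assumes every protein is a small, equal-size cloud of uniformly weighted points in the unit cube (one coordinate per network view). The function names, the average-linkage choice, and the toy data are all hypothetical. For equal-size uniform clouds, the exact 1-Wasserstein distance reduces to an optimal assignment problem, which is what `linear_sum_assignment` solves.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist, squareform


def wasserstein_1(cloud_a, cloud_b):
    """Exact 1-Wasserstein distance between two equal-size, uniformly weighted point clouds."""
    cost = cdist(cloud_a, cloud_b)            # Euclidean ground cost between all point pairs
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching of points
    return cost[rows, cols].mean()            # average cost of moving one unit of mass


def pairwise_wasserstein(clouds):
    """Symmetric protein-by-protein distance matrix with a zero diagonal."""
    n = len(clouds)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = wasserstein_1(clouds[i], clouds[j])
    return dist


def cluster_proteins(dist, n_clusters):
    """Hierarchical clustering on a precomputed distance matrix (average linkage is an assumption)."""
    condensed = squareform(dist, checks=False)  # condensed form expected by linkage
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data (hypothetical): 6 proteins, each a cloud of 16 points in the unit cube [0, 1]^3.
rng = np.random.default_rng(0)
low = [rng.uniform(0.0, 0.3, size=(16, 3)) for _ in range(3)]
high = [rng.uniform(0.7, 1.0, size=(16, 3)) for _ in range(3)]
clouds = low + high

dist = pairwise_wasserstein(clouds)
labels = cluster_proteins(dist, n_clusters=2)
```

In this toy setup the three "low" proteins and the three "high" proteins fall into separate clusters, because distances between the two groups are much larger than distances within each group.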

Table 1.

Quantitative comparison of our proposed method against baseline embedding and fusion methods.

Table 2.

Performance of the GCN on the three networks.

The first two columns give the number of nodes and the number of edges in each network. The remaining columns report the ROC and average precision scores for the validation and test edges.
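As a rough illustration of how such scores are typically computed for held-out edges, the sketch below implements the area under the ROC curve and average precision in plain Python. It assumes the usual link-prediction setup, which the caption does not spell out: held-out true edges are labeled 1, sampled non-edges are labeled 0, and the model assigns each candidate edge a score. The function names and the toy labels and scores are hypothetical, and the average-precision routine assumes there are no tied scores.

```python
def roc_auc(labels, scores):
    """Probability that a random positive edge outscores a random negative edge (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values taken at the rank of each positive edge."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])  # highest score first
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` candidates
    return total / hits


# Toy example (hypothetical): three held-out edges and three sampled non-edges.
edge_labels = [1, 1, 1, 0, 0, 0]
edge_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(edge_labels, edge_scores)
ap = average_precision(edge_labels, edge_scores)
```

The pairwise form of the AUC is quadratic in the number of edges, which is fine for a sketch but would normally be replaced by a rank-based computation on full networks.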

Fig 2.

Clustering results for the Wasserstein distance matrix.

Panel A shows a heatmap of the Wasserstein distance matrix annotated with the 10 clusters. Panel B shows heatmaps of Wasserstein sub-matrices of CoV-host proteins (rows) and non-CoV-host proteins (columns), each specific to a particular SARS-CoV-2 protein.

Table 3.

Details of the 10 clusters, including the total number of proteins, the number of CoV-host proteins, the number of non-CoV-host proteins, and the predicted interactions obtained from each cluster.

Fig 3.

Barycenters of clusters in Wasserstein space, illustrating dependencies between sequence similarity, GO similarity, and PPI network features.

Clusters 7, 8, and 9 demonstrate near-perfect dependence, highlighting the cohesion of these measures.

Fig 4.

Network diagram of predicted interactions.

Panel A shows existing interactions (between columns 1 and 2) and predicted interactions (between columns 2 and 3) at a distance threshold of 0.07. Panel B shows the same for predicted proteins that have experimentally verified interactions with at least three other viruses.
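The thresholding behind the two panels can be sketched as below. This is an assumption-laden illustration rather than the authors' procedure: it supposes that a non-CoV-host protein inherits a predicted interaction with a viral protein whenever its Wasserstein distance to one of that viral protein's known host targets is at most the threshold, and it leaves out any restriction to a shared cluster. All function names, protein labels, and distances here are hypothetical placeholders.

```python
def predict_interactions(dist, known, threshold=0.07):
    """Return (viral, known_host, candidate, distance) tuples for candidates within the threshold.

    dist:  nested dict, dist[host][candidate] -> distance between two human proteins
    known: dict mapping each viral protein to its set of experimentally validated host proteins
    """
    cov_hosts = set().union(*known.values())  # every validated CoV-host protein
    predictions = []
    for viral, hosts in known.items():
        for host in hosts:
            for cand, d in dist.get(host, {}).items():
                if cand in cov_hosts:
                    continue  # already a validated host, not a new prediction
                if d <= threshold:
                    predictions.append((viral, host, cand, d))
    return sorted(predictions)


def filter_by_virus_support(predictions, other_virus_targets, min_viruses=3):
    """Keep predictions whose candidate is a verified target of at least `min_viruses` other viruses."""
    return [
        p for p in predictions
        if len(other_virus_targets.get(p[2], set())) >= min_viruses
    ]


# Toy inputs (hypothetical placeholders, not data from the study).
dist = {
    "host_1": {"cand_1": 0.05, "cand_2": 0.20},
    "host_2": {"cand_1": 0.30, "cand_2": 0.06},
}
known = {"viral_A": {"host_1"}, "viral_B": {"host_2"}}
other_virus_targets = {"cand_1": {"virus_x", "virus_y", "virus_z"}, "cand_2": {"virus_x"}}

panel_a = predict_interactions(dist, known, threshold=0.07)
panel_b = filter_by_virus_support(panel_a, other_virus_targets, min_viruses=3)
```

Here `panel_a` corresponds to the three-column layout (viral protein, known host, predicted protein), and `panel_b` keeps only the predictions supported by at least three other viruses.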

Table 4.

Predicted human proteins that overlap with proteins targeted by other viruses.

Table 5.

Associations of FDA-approved drugs with the predicted host factors.

Table 6.

Datasets used in this study.
