Fig 1.
The analysis pipeline begins by constructing three distinct interaction networks (panel A): physical protein-protein interactions (PPI), gene ontology (GO)-based functional similarity, and protein sequence similarity networks.
Each network is separately encoded into embeddings using Graph Convolutional Networks (GCNs). Subsequently, embeddings from each network are integrated to represent each protein within a three-dimensional unit cube, where each dimension corresponds to a distinct biological perspective (panel B). Protein-protein similarity is computed using the Wasserstein distance, capturing the minimum “cost” to align multivariate distributions of protein embeddings derived from the three networks (panel C). Hierarchical clustering is applied to group similar proteins into clusters based on the Wasserstein distance matrix. Finally, probable targets or host factors of SARS-CoV-2 are predicted by identifying non-CoV-host proteins clustered closely with experimentally validated CoV-host proteins.
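The distance and clustering steps in panel C can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each protein is assumed to be a small cloud of equally weighted points in the unit cube (one axis each for PPI, GO, and sequence), the ground cost is assumed to be Euclidean, and average linkage is an arbitrary choice. For two equal-size, uniformly weighted point clouds, the 1-Wasserstein distance reduces to an optimal assignment problem, which SciPy solves exactly. The function names and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist, squareform


def wasserstein_point_clouds(x, y):
    """1-Wasserstein distance between two equal-size, uniformly weighted point clouds."""
    cost = cdist(x, y)  # Euclidean ground cost (an assumption)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return cost[rows, cols].mean()


def pairwise_wasserstein(clouds):
    """Symmetric matrix of Wasserstein distances between all pairs of clouds."""
    n = len(clouds)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = wasserstein_point_clouds(clouds[i], clouds[j])
    return dist


# Toy data (hypothetical): six "proteins", each a cloud of 16 points in [0, 1]^3.
rng = np.random.default_rng(0)
group_a = [rng.uniform(0.0, 0.3, size=(16, 3)) for _ in range(3)]
group_b = [rng.uniform(0.7, 1.0, size=(16, 3)) for _ in range(3)]
clouds = group_a + group_b

dist = pairwise_wasserstein(clouds)

# Hierarchical clustering on the precomputed distance matrix.
tree = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
```

In this toy setup the two groups occupy opposite corners of the cube, so cutting the tree into two clusters recovers them. With real embeddings the number of clusters and the linkage method would need to be chosen and validated.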
Table 1.
Quantitative comparison of our proposed method against baseline embedding and fusion methods.
Table 2.
Performance of the GCN on the three networks.
The first two columns give the number of nodes and the number of edges in each network. The remaining columns give the ROC and average precision scores on the validation and test edges.
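Scores of this kind can be computed for held-out edges as in the sketch below. It assumes that ROC refers to the area under the ROC curve, that edge scores come from an inner-product decoder with a sigmoid (a common choice for graph autoencoders, not confirmed by the source), and that negative edges are sampled non-edges. The function names and toy embeddings are hypothetical.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score


def score_edges(emb, edges):
    """Sigmoid of the inner product of the two endpoint embeddings (assumed decoder)."""
    src, dst = edges[:, 0], edges[:, 1]
    logits = np.sum(emb[src] * emb[dst], axis=1)
    return 1.0 / (1.0 + np.exp(-logits))


def evaluate_link_prediction(emb, pos_edges, neg_edges):
    """Return (ROC AUC, average precision) for held-out true edges vs. sampled non-edges."""
    scores = np.concatenate([score_edges(emb, pos_edges), score_edges(emb, neg_edges)])
    labels = np.concatenate([np.ones(len(pos_edges)), np.zeros(len(neg_edges))])
    return roc_auc_score(labels, scores), average_precision_score(labels, scores)


# Toy embeddings (hypothetical): nodes 0 and 1 point one way, nodes 2 and 3 the other.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]])
pos_edges = np.array([[0, 1], [2, 3]])  # held-out true edges
neg_edges = np.array([[0, 2], [1, 3]])  # sampled non-edges
roc, ap = evaluate_link_prediction(emb, pos_edges, neg_edges)
```

The same function would be called once with the validation edges and once with the test edges to fill the corresponding columns.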
Fig 2.
Clustering of the Wasserstein distance matrix.
Panel A shows a heatmap of the Wasserstein distance matrix annotated with 10 clusters. Panel B shows a heatmap of the Wasserstein sub-matrices of CoV-host proteins (rows) and non-CoV-host proteins (columns) specific to a particular SARS-CoV-2 protein.
Table 3.
Details of the 10 clusters, including the total number of proteins, the number of CoV-host proteins, the number of non-CoV-host proteins, and the predicted interactions obtained from each cluster.
Fig 3.
Barycenters of clusters in Wasserstein space, illustrating dependencies between sequence similarity, GO similarity, and PPI network features.
Clusters 7, 8, and 9 demonstrate near-perfect dependence, highlighting the cohesion of these measures.
Fig 4.
Network diagram of predicted interactions.
Panel A shows existing interactions (between column 1 and column 2) and predicted interactions (between column 2 and column 3) at a distance threshold of 0.07. Panel B shows the same for predicted proteins that have experimentally verified interactions with at least 3 other viruses.
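The thresholding step behind these predicted interactions can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes a candidate is predicted when it shares a cluster with a known CoV-host protein and their Wasserstein distance is at most the threshold (whether the comparison is strict is not stated in the source). The function name, argument names, and toy matrix are hypothetical.

```python
import numpy as np


def predict_host_factors(dist, is_known_host, labels, threshold=0.07):
    """List (known_host, candidate, distance) triples within the distance threshold.

    A pair qualifies only if both proteins fall in the same cluster (an assumption).
    """
    known = np.flatnonzero(is_known_host)
    candidates = np.flatnonzero(~is_known_host)
    predictions = []
    for c in candidates:
        for k in known:
            if labels[c] == labels[k] and dist[c, k] <= threshold:
                predictions.append((int(k), int(c), float(dist[c, k])))
    return predictions


# Toy example (hypothetical): proteins 0 and 3 are known CoV-host proteins.
dist = np.array([
    [0.00, 0.05, 0.20, 0.30],
    [0.05, 0.00, 0.25, 0.06],
    [0.20, 0.25, 0.00, 0.03],
    [0.30, 0.06, 0.03, 0.00],
])
is_known_host = np.array([True, False, False, True])
labels = np.array([1, 1, 2, 2])  # cluster assignment per protein

predictions = predict_host_factors(dist, is_known_host, labels)
pairs = [(k, c) for k, c, _ in predictions]
```

Here protein 1 is linked to known host 0 and protein 2 to known host 3. Protein 1 is also within 0.07 of protein 3, but that pair is skipped because the two lie in different clusters.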
Table 4.
Predicted human proteins that overlap with proteins targeted by other viruses.
Table 5.
Associations of FDA-approved drugs with the predicted host factors.
Table 6.
Datasets used in this study.