
Heterogeneous biological graph convolutional network for drug-target interaction prediction

  • Haoran Zhu,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Visualization, Writing – original draft

    Affiliations School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, China, School of Computer Science and Informatics, University of Liverpool, Liverpool, United Kingdom

  • Jianjia Wang,

    Roles Conceptualization, Funding acquisition, Project administration, Writing – review & editing

    jianjia.wang@xjtlu.edu.cn

    Affiliation School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, China

  • Zhen Hua,

    Roles Investigation, Software, Validation, Writing – review & editing

    Affiliation School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, China

  • Chaoqun Wang,

    Roles Formal analysis, Investigation, Validation, Writing – review & editing

    Affiliation School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, China

  • Zimu Zhang,

    Roles Data curation, Visualization, Writing – original draft

    Affiliation School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu, China

  • Tong Yu,

    Roles Data curation, Formal analysis, Software, Writing – original draft

    Affiliation School of Computer Engineering and Science, Shanghai University, Shanghai, Shanghai, China

  • Ling Ge

    Roles Data curation, Investigation, Resources, Supervision, Writing – review & editing

    Affiliation Department of Clinical Laboratory, Huaibei People’s Hospital, Huaibei, Anhui, China

Abstract

Drug–target interaction prediction plays a critical role in drug discovery by identifying potential therapeutic targets and elucidating underlying molecular mechanisms. However, existing computational methods generally rely on limited biological modalities and inadequately capture heterogeneous associations. To overcome these limitations, we propose a Heterogeneous Biological Graph Convolutional Network (HBGCN) that employs a hierarchical graph propagation architecture to integrate multimodal biological information and learn homogeneous and heterogeneous representations for drug–target interaction prediction. By incorporating both direct and indirect meta-paths, HBGCN captures complex relational dependencies among diverse biological entities. Experimental results demonstrate that HBGCN achieves competitive performance on benchmark datasets. Case studies indicate that HBGCN effectively identifies therapeutic drug candidates and reveals proteins and gene expression patterns associated with drug regulation. The source code and dataset are available at https://github.com/Saxon0918/HBGCN.

Introduction

The prediction of drug–target interactions (DTIs) supports drug repositioning, adverse drug reaction detection, and molecular mechanism elucidation through the systematic analysis of binding patterns between bioactive compounds and targets [1]. Traditional DTI identification primarily relies on in vivo and in vitro experiments, including high-throughput screening and pharmacokinetic evaluation [2,3]. Although these approaches yield reliable results, they are typically constrained by high costs, substantial labor requirements, and limited scalability, particularly in large-scale studies. Consequently, computational approaches that exploit underlying biological characteristics have emerged as effective strategies for identifying potential drug–target interactions and improving the efficiency of drug discovery.

The general workflow of DTI prediction is illustrated in Fig 1. Following the construction of a multimodal dataset, hierarchical molecular structures and known intra-entity interactions are extracted as initial features, which are subsequently refined through computational methods. Based on the strategies for information integration, existing DTI prediction studies are generally classified into feature-based and graph-based approaches.

Fig 1. The workflow of DTI prediction, including four components: datasets, entity representation, models, and clinical experiments.

https://doi.org/10.1371/journal.pone.0348895.g001

Feature-based methods leverage machine learning and dimensionality reduction techniques to infer potential associations among biological entities. Probabilistic frameworks integrate heterogeneous similarity measures of drugs and targets within bipartite networks and employ probabilistic inference to improve prediction performance [4]. Low-rank matrix factorization techniques incorporate chemical structures, phenotypic profiles, and drug–drug interactions to uncover latent drug-disease associations [5]. Hierarchical clustering-based evaluation schemes mitigate biases in conventional data-splitting strategies and provide a biologically meaningful assessment of model generalizability [6]. In addition, some studies apply representation learning to derive high-dimensional feature vectors, which are subsequently used in random forest classifiers and Bayesian models for interaction prediction, while recent work further employs gradient boosting classifiers to improve representation discriminability [7–10]. Other studies employ random walk strategies, logistic regression, and ensemble learning to improve drug discovery [11–14].

Although machine learning approaches improve the efficiency of DTI prediction, their limited capacity to model subtle entity-specific features constrains predictive performance. In contrast, graph-based approaches integrate multimodal data by constructing heterogeneous networks, thereby revealing complex molecular interactions and functional associations. Some studies integrate attention mechanisms into graph neural networks (GNNs) to expand the receptive field, thereby capturing long-range dependencies among entities and uncovering potential DTI patterns [15–19]. Other studies leverage generative adversarial networks and contrastive learning strategies to improve embedding alignment across distinct entities, thereby enhancing prediction accuracy [20,21]. In addition, meta-learning frameworks that employ subgraph matching and weakly supervised information bottlenecks enhance predictive performance [22]. Recent work integrates higher- and lower-order biological information to characterize complex biological associations [23]. Furthermore, combining large language models for biological text representation with GNN-based structural encoding and knowledge graph reasoning further improves model robustness [24–26].

Despite recent advances in biological interaction prediction, existing studies exhibit limited performance and generalization capability when applied to sparse biological networks. Moreover, most approaches focus on modeling associations between specific pairs of entities and do not fully exploit the complex relationships inherent in biological systems. To address these limitations, we propose a graph convolutional network (GCN) framework that learns both homogeneous and heterogeneous representations of diverse biological entities, thereby improving the accuracy of DTI prediction. The main contributions are summarized as follows.

  • We propose a Heterogeneous Biological Graph Convolutional Network (HBGCN) that hierarchically integrates multimodal information to enhance the representation learning of drugs and targets within a heterogeneous biological network.
  • We construct a biological network dataset consisting of drugs, diseases, genes, and proteins to support DTI prediction.
  • Comparative experiments with state-of-the-art algorithms demonstrate the effectiveness of HBGCN in drug repositioning, while case studies further validate its performance in identifying potential drug–protein pairings and gene regulatory mechanisms.

The remainder of this paper is organized as follows. Materials and methods provides a detailed description of the dataset construction, feature extraction, and the architecture of HBGCN. Results and discussion presents the implementation details and experimental results. Conclusion summarizes the key findings of this study and discusses potential directions for future research.

Materials and methods

Overview

In this section, we introduce the Heterogeneous Biological Graph Convolutional Network (HBGCN), which integrates multi-source biological data to predict drug-target interactions. As illustrated in Fig 2(a), the framework incorporates four fundamental types of biological entities, including drugs, genes, proteins, and diseases. To comprehensively capture complex associations, drug-target relationships are further classified into direct and indirect interactions. The HBGCN framework comprises three major components. The drug-target meta-path component leverages relational pathways to enhance biological interpretability. The similarity network integrates multiple similarity measures to refine interaction patterns among entities. The feature learning and interaction prediction module applies graph convolutional mechanisms to extract high-level representations and infer potential associations.

Fig 2. The workflow of the HBGCN.

(a) The associations between drugs and targets in the model. (b) The four types of meta-paths between drugs and targets. (c) Construction of a heterogeneous drug-gene-protein-disease network by integrating multiple drug-related datasets. (d) The overall architecture of HBGCN, which consists of feature learning and interaction prediction modules.

https://doi.org/10.1371/journal.pone.0348895.g002

Data acquisition and preprocessing

We construct a heterogeneous network consisting of four types of nodes and nine types of edges. Specifically, drug information is obtained from the DrugBank v3.0 database [27]. Motivated by existing drug network construction strategies [28,29], drug similarities are computed from structural fingerprints using the Jaccard similarity coefficient, which quantifies the proportion of maximal common substructures (MCS) shared between drug molecules [30]. Given two drug interaction graphs dr1 and dr2, their similarity JC(dr1, dr2) is calculated as

$$JC(dr_1, dr_2) = \frac{|dr_1 \cap dr_2|}{|dr_1 \cup dr_2|} \tag{1}$$
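A minimal Python sketch of the Jaccard coefficient in Eq 1, assuming each drug is represented as a set of substructure keys (for example, the on-bits of a structural fingerprint). The keys below are hypothetical placeholders, not DrugBank data, and returning 0.0 for two empty sets is an assumed convention.

```python
def jaccard(substructs_a: set, substructs_b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| between two substructure sets."""
    union = substructs_a | substructs_b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(substructs_a & substructs_b) / len(union)


# Hypothetical substructure keys for two drugs
dr1 = {"benzene", "amine", "ether", "trifluoromethyl"}
dr2 = {"benzene", "amine", "ketone"}
print(jaccard(dr1, dr2))  # 2 shared keys out of 5 distinct keys -> 0.4
```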

Disease entities are obtained from the Medical Subject Headings (MeSH) database, with similarities determined based on semantic relationships defined in the MeSH hierarchy, in which diseases are organized into coarse-grained parent and fine-grained child categories [31]. The semantic representation of each disease is derived from the cumulative semantic contributions of the disease itself and its related subcategories. Given two diseases di1 and di2, the similarity S(di1, di2) is calculated as

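$$S(di_1, di_2) = \frac{\sum_{t \in T_{di_1} \cap T_{di_2}} \left( D_{di_1}(t) + D_{di_2}(t) \right)}{\sum_{t \in T_{di_1}} D_{di_1}(t) + \sum_{t \in T_{di_2}} D_{di_2}(t)} \tag{2}$$

where $T_{di}$ is the set of MeSH terms related to disease di (including di itself) and $D_{di}(t)$ is the semantic contribution of term t to disease di.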
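To illustrate the semantic-similarity computation, the sketch below follows the decay-based scheme widely used for MeSH disease similarity: a disease contributes 1 to itself, and each ancestor term receives its child's contribution multiplied by a decay factor (0.5 here). Both the decay value and the toy hierarchy are assumptions for illustration; the exact weighting used to build the HBGCN dataset may differ.

```python
def semantic_contributions(disease, parents, decay=0.5):
    """Return D(t) for the disease and every ancestor term t in its MeSH DAG."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        next_frontier = []
        for term in frontier:
            for parent in parents.get(term, []):
                value = decay * contrib[term]
                # keep the largest contribution when a term is reachable via several paths
                if value > contrib.get(parent, 0.0):
                    contrib[parent] = value
                    next_frontier.append(parent)
        frontier = next_frontier
    return contrib


def disease_similarity(d1, d2, parents, decay=0.5):
    """Shared contributions divided by the total semantic value of both diseases."""
    c1 = semantic_contributions(d1, parents, decay)
    c2 = semantic_contributions(d2, parents, decay)
    shared = c1.keys() & c2.keys()
    numerator = sum(c1[t] + c2[t] for t in shared)
    return numerator / (sum(c1.values()) + sum(c2.values()))


# Hypothetical toy hierarchy (child -> parents), not the real MeSH tree
parents = {
    "major depression": ["mood disorders"],
    "bipolar disorder": ["mood disorders"],
    "mood disorders": ["mental disorders"],
}
sim = disease_similarity("major depression", "bipolar disorder", parents)
print(round(sim, 4))  # (0.5 + 0.5 + 0.25 + 0.25) / (1.75 + 1.75) -> 0.4286
```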

Gene and protein interaction networks are derived from HumanNet [32] and Human Protein Reference Database [33], respectively. Drug-disease and disease-protein associations are retrieved from the Comparative Toxicogenomics Database [34]. Drug-gene and drug-protein associations are collected from DGIdb v5.0 [35] and reference [1], respectively. Disease-gene associations are obtained from the DisGeNET v6.0 database [36].

Table 1 summarizes the number and density of nine types of relationships. The constructed dataset consists of 542 drugs, 394 diseases, 11,153 genes, and 1,512 proteins. Density is defined as the fraction of known associations among all possible pairs of the corresponding entity sets. Notably, disease-protein associations exhibit the highest density, indicating extensive connectivity, whereas drug-gene and drug-protein associations remain relatively sparse due to the limited number of known interactions. Although gene-gene interactions involve numerous connections, their density remains low due to the vast interaction space. These variations in density highlight the structural heterogeneity of the dataset and reflect the diverse nature of biological relationships.
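The density definition can be made concrete with a short helper. Only the entity counts (542 drugs and 394 diseases) come from the text; the link count in the usage line is a hypothetical placeholder, and counting unordered pairs without self-pairs for homogeneous networks is an assumption about how the possible pairs are enumerated.

```python
def density(num_links, n_a, n_b=None):
    """Fraction of known associations among all possible pairs.

    Heterogeneous relation (n_b given): every (a, b) pair is possible.
    Homogeneous relation (n_b omitted): unordered pairs without self-pairs (assumed).
    """
    if n_b is None:
        possible = n_a * (n_a - 1) // 2
    else:
        possible = n_a * n_b
    return num_links / possible


N_DRUGS, N_DISEASES = 542, 394  # entity counts reported for the dataset
print(N_DRUGS * N_DISEASES)  # 213548 possible drug-disease pairs
hypothetical_links = 20000  # placeholder, not the value in Table 1
print(f"{density(hypothetical_links, N_DRUGS, N_DISEASES):.4f}")
```

Because the number of possible pairs grows with the product of the entity counts, a relation such as gene-gene (11,153 genes) can have many links and still a very low density.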

Table 1. Dataset statistics for the four entity types and the nine types of homogeneous and heterogeneous associations.

https://doi.org/10.1371/journal.pone.0348895.t001

Heterogeneous learning framework

This section introduces the proposed heterogeneous learning framework, which constructs a unified biological network over drugs, diseases, genes, and proteins by integrating meta-path semantics with entity similarity information. Building on this network, HBGCN is designed to capture relational dependencies among entities for identifying potential drug–target interactions.

Specifically, the meta-path is a predefined multi-hop schema among different entities, allowing models to capture higher-order interactions beyond direct associations. As illustrated in Fig 2(b), drug–disease associations are categorized into direct (drdi) and indirect associations. The indirect associations are further divided into three pathways, including drug-gene-disease (drgdi), drug-protein-disease (drpdi), and drug-gene-protein-disease (drgpdi). These meta-paths describe how drugs can be connected to diseases via intermediate entities, thereby encoding complex relational semantics.
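One way to see what the indirect meta-paths encode is to count their instances by chaining association matrices: entry (i, j) of A_drg · A_dig^T counts the drug-gene-disease paths from drug i to disease j. The sketch below uses tiny hypothetical binary matrices oriented like the associations listed in Algorithm 1 (drug × gene, disease × gene, and so on). It illustrates the meta-path semantics only and is not the HBGCN implementation.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


# Hypothetical toy associations: 2 drugs, 3 genes, 2 proteins, 2 diseases
A_drg = [[1, 1, 0],
         [0, 0, 1]]          # drug x gene
A_dig = [[1, 1, 0],
         [0, 1, 1]]          # disease x gene
A_drp = [[1, 0],
         [0, 1]]             # drug x protein
A_dip = [[0, 1],
         [1, 1]]             # disease x protein

# Number of drug-gene-disease (drgdi) and drug-protein-disease (drpdi) path instances
paths_drgdi = matmul(A_drg, transpose(A_dig))
paths_drpdi = matmul(A_drp, transpose(A_dip))
print(paths_drgdi)  # [[2, 1], [0, 1]]
print(paths_drpdi)  # [[0, 1], [1, 1]]
```

A nonzero entry means the drug reaches the disease through at least one intermediate entity of that type; in this toy example, drug 1 reaches disease 1 only through gene 2.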

The heterogeneous biological network constructed from entity similarities and meta-paths is shown in Fig 2(c). By extracting features from homogeneous and heterogeneous associations, we generate the initial vector representations of drugs (Vdr), genes (Vg), proteins (Vp), and diseases (Vdi), which are subsequently input into the graph convolutional model.

The architecture of HBGCN is illustrated in Fig 2(d). The model achieves DTI prediction by progressively aggregating information from biological entities. To effectively extract features of homogeneous similarity networks of drugs and diseases, we individually propagate their initial representations through GCNs, which are calculated as

$$H_x^{(l+1)} = \sigma\left(\hat{A}_x H_x^{(l)} W_x^{(l)}\right) \tag{3}$$

where $x \in \{dr, di\}$ and $W_x^{(l)}$ is the weight matrix for the lth graph convolutional layer. The normalized adjacency matrix is calculated by $\hat{A}_x = \tilde{D}_x^{-1/2}(A_x + I_x)\tilde{D}_x^{-1/2}$, where $I_x$ is the identity matrix. An entry of $A_x$ is set to 1 if a known link exists between the two entities and 0 otherwise. $\tilde{D}_x$ is the degree matrix of $A_x + I_x$, and $\sigma$ is the ReLU activation function. The inputs of the first layer are $H_{dr}^{(0)} = V_{dr}$ and $H_{di}^{(0)} = V_{di}$. Incorporating neighboring information from the homogeneous networks captures local features of drugs and diseases, so that their relationships are interpreted within a broader context.
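The propagation rule described above (self-loops, symmetric degree normalization, a per-layer weight matrix, and ReLU) can be sketched in plain Python as follows. The three-node similarity graph, the features, and the identity weight matrix are hypothetical; a practical implementation would use a tensor library with sparse matrices.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_self = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_self]
    return [[inv_sqrt_deg[i] * a_self[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical homogeneous network: drug 0 - drug 1 - drug 2 (a path)
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
h0 = [[1.0, 0.0],
      [0.0, 1.0],
      [1.0, -1.0]]   # initial features (stand-in for V_dr)
w0 = [[1.0, 0.0],
      [0.0, 1.0]]    # identity weights keep the example easy to follow

a_norm = normalize_adjacency(adj)
h1 = gcn_layer(a_norm, h0, w0)
print([[round(v, 3) for v in row] for row in h1])  # [[0.5, 0.408], [0.816, 0.0], [0.5, 0.0]]
```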

In addition, fully connected (FC) layers are employed to refine the embedding representations of the four distinct entities within their respective homogeneous networks, which are calculated as

(4)

where $x \in \{dr, g, p, di\}$; $W_x$ and $b_x$ represent the weight matrix and bias vector of each layer.

Different from homogeneous networks, where edges represent uniform relationships, heterogeneous networks involve various types of edges, each carrying distinct semantic information. To capture direct characteristics of drug-disease associations, we employ a GCN to learn the topological dependencies among nodes, calculated as

(5)

where the inputs of the first layer are the drug and disease representations from the preceding layers; $\sigma$ is the ReLU activation function; and $W^{(l)}$ is a layer-specific trainable weight matrix.
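For the direct drug–disease GCN, one common construction (assumed here, since the exact adjacency is not spelled out) stacks drugs and diseases into a single node set and places the association matrix Adrdi in the off-diagonal blocks, so that each propagation step mixes drug features into diseases and vice versa. The sketch builds that block adjacency and applies one normalized propagation step without weights; the association values and stand-in features are hypothetical.

```python
import math


def bipartite_adjacency(assoc):
    """Block adjacency [[0, A], [A^T, 0]] with drugs indexed first, then diseases."""
    n_dr, n_di = len(assoc), len(assoc[0])
    n = n_dr + n_di
    adj = [[0] * n for _ in range(n)]
    for i in range(n_dr):
        for j in range(n_di):
            if assoc[i][j]:
                adj[i][n_dr + j] = 1
                adj[n_dr + j][i] = 1
    return adj


def propagate(adj, h):
    """One weight-free step: D^-1/2 (A + I) D^-1/2 @ H."""
    n = len(adj)
    a_self = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_self]
    return [[sum(inv_sqrt[i] * a_self[i][k] * inv_sqrt[k] * h[k][c] for k in range(n))
             for c in range(len(h[0]))] for i in range(n)]


A_drdi = [[1, 0],
          [1, 1]]                 # hypothetical: 2 drugs x 2 diseases
adj = bipartite_adjacency(A_drdi)

# Stand-in features: column 0 marks drugs, column 1 marks diseases
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
h1 = propagate(adj, h)
print(adj)  # [[0, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
```

After one step, each drug row carries a nonzero disease component (column 1) and each disease row a nonzero drug component, which is how a GCN on this block matrix exchanges information across the two entity types.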

In contrast, for indirect associations, we separately establish heterogeneous GCNs among various biological entities to explore potential latent relationships. Specifically, indirect GCNs are defined as

(6)

where the inputs of the first layer are the corresponding drug, gene, protein, and disease representations; $\sigma$ is the ReLU activation function; and the remaining symbols are defined analogously to the corresponding symbols in Eq 5.

To capture global features of entities within the heterogeneous network, we employ a relational graph convolutional network (RGCN). Specifically, the representation of each node is updated by aggregating information from its neighbors, where corresponding weights are determined by the types of surrounding nodes and the relationships between them. As a result, RGCN can uncover valuable relational patterns, thereby enhancing the learning ability and predictive performance of the framework. The specific operation is

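$$h_i^{(l+1)} = \sigma\left(\sum_{r=1}^{R} \sum_{j \in \mathcal{N}_i^r} \frac{1}{|\mathcal{N}_i^r|} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)}\right) \tag{7}$$

where R is the number of edge types in the network; $\mathcal{N}_i^r$ represents the set of neighbors of node i under edge type r, and $1/|\mathcal{N}_i^r|$ is the corresponding normalization constant; $W_r^{(l)}$ represents the weight parameter of edge type r; $W_0^{(l)}$ represents the weight parameter of node i itself; $\sigma$ is an activation function; and $h_i^{(l)}$ is the embedding of node i in the lth RGCN layer. When l = 0, $h_i^{(0)}$ is the representation of node i produced by the preceding layers, and the output of the last RGCN layer is taken as the updated node embedding.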
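A plain-Python sketch of a single relational graph convolution in the spirit of Eq 7: each node combines a self-connection term with relation-specific messages from its neighbors, each relation's messages are averaged over the neighbors of that type (the usual RGCN normalization, assumed here), and ReLU is applied. Node names, relation names, and weights are hypothetical.

```python
def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def rgcn_layer(h, neighbors, w_rel, w_self):
    """One RGCN layer.

    h: node -> feature vector; neighbors: relation -> (node -> neighbor list);
    w_rel: relation -> weight matrix; w_self: weight matrix for the self-connection.
    """
    out = {}
    for i, h_i in h.items():
        total = matvec(w_self, h_i)  # self-connection term W_0 h_i
        for r, adjacency in neighbors.items():
            nbrs = adjacency.get(i, [])
            if not nbrs:
                continue
            norm = 1.0 / len(nbrs)  # assumed normalization 1 / |N_i^r|
            for j in nbrs:
                msg = matvec(w_rel[r], h[j])
                total = [t + norm * m for t, m in zip(total, msg)]
        out[i] = [max(0.0, t) for t in total]  # ReLU (assumed activation)
    return out


# Hypothetical toy network: one drug, one disease, one gene, two relation types
h = {"dr0": [1.0, 0.0], "di0": [0.0, 1.0], "g0": [1.0, 1.0]}
neighbors = {
    "drug-disease": {"dr0": ["di0"], "di0": ["dr0"]},
    "drug-gene": {"dr0": ["g0"], "g0": ["dr0"]},
}
identity = [[1.0, 0.0], [0.0, 1.0]]
w_rel = {"drug-disease": identity, "drug-gene": [[0.5, 0.0], [0.0, 0.5]]}

h1 = rgcn_layer(h, neighbors, w_rel, identity)
print(h1)  # {'dr0': [1.5, 1.5], 'di0': [1.0, 1.0], 'g0': [1.5, 1.0]}
```

Because each relation type has its own weight matrix, the drug node weighs its disease neighbor and its gene neighbor differently, which is the property that distinguishes RGCN from a plain GCN on the merged graph.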

Finally, the updated representations of drugs, genes, proteins, and diseases are input into a multilayer perceptron (MLP) to transform the feature space and generate the final embeddings.

(8)

where $x \in \{dr, g, p, di\}$, the set of entities associated with x is determined by the heterogeneous network, and W and b are the weight and bias of the MLP. The activation function of each layer is ReLU.

Interaction prediction

To quantify the interrelationships among biological entities, we define interaction scores based on the cosine similarity between their final embedding vectors. For example, the score between drugs and diseases is calculated as

$$S_{drdi}(i, j) = \frac{dr_i \cdot di_j}{\lVert dr_i \rVert \, \lVert di_j \rVert} \tag{9}$$

Analogously, we derive the relational strengths for drug-gene and drug-protein associations. During training, to ensure that each type of entity effectively captures the structural characteristics of the heterogeneous network, we adopt Mean Squared Error (MSE) as the loss function, which is defined as

(10)

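where Lossdrdi is the MSE between the predicted drug–disease interaction scores and the known drug–disease associations Adrdi, the remaining four loss terms are defined analogously to Lossdrdi, and the weighting coefficient balances the contributions of direct and indirect association losses.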
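The scoring and loss described above can be sketched as follows: pairwise cosine similarity between final drug and target embeddings yields a score matrix, and each association type contributes a mean squared error against its binary association matrix. Which of the five terms count as direct and which as indirect is not specified here, so the combining helper simply takes the two groups as inputs; the embeddings, labels, and loss values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros (assumed convention)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def score_matrix(sources, targets):
    """Score for every (source, target) pair, e.g. drugs x diseases."""
    return [[cosine(s, t) for t in targets] for s in sources]


def mse(pred, label):
    """Mean squared error between a score matrix and a 0/1 association matrix."""
    sq = [(p - y) ** 2 for p_row, y_row in zip(pred, label) for p, y in zip(p_row, y_row)]
    return sum(sq) / len(sq)


def weighted_loss(direct_terms, indirect_terms, weight):
    """Direct losses count fully; indirect losses are scaled by the weighting coefficient."""
    return sum(direct_terms) + weight * sum(indirect_terms)


# Hypothetical final embeddings: 2 drugs, 2 diseases
drugs = [[1.0, 0.0], [0.0, 1.0]]
diseases = [[1.0, 0.0], [1.0, 1.0]]
A_drdi = [[1, 0], [0, 1]]  # hypothetical known drug-disease associations

S_drdi = score_matrix(drugs, diseases)
loss_drdi = mse(S_drdi, A_drdi)
print(round(loss_drdi, 4))  # 0.1464
```

With a coefficient of 0.05, an indirect loss term contributes only one twentieth of its raw value, which keeps large indirect losses from dominating the direct ones.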

Algorithm 1 illustrates the overall workflow of model training. During this phase, all trainable parameters are initialized with random values. The loss is calculated by propagating the input data through the network, followed by backpropagation to update model parameters and data representations.

Algorithm 1 Training Procedure of HBGCN

Input:

- Drug features Vdr; disease features Vdi; gene features Vg; protein features Vp

- Drug-disease associations Adrdi; drug-gene associations Adrg; drug-protein associations Adrp; disease-gene associations Adig; disease-protein associations Adip

- Training epochs E; pre-defined learning rate lr; weighting coefficient

Output:

- Interaction scores between drugs and targets (drug–disease, drug–gene, and drug–protein)

Method:

1: Randomly initialize the weight W and bias b of each layer.

2: for epoch = 1 to E do

3:  Feature Learning module

4:  Calculate homogeneous drug vector dr1 and disease vector di1 by Eq 3.

5:  Calculate dr2, g1, p1, and di2 by Eq 4.

6:  Calculate direct drug vector dr3 and disease vector di3 by Eq 5.

7:  Update dr4, dr5, g2, g3, p2, p3, di4, and di5 by Eq 6.

8:  Update the entity vectors by Eq 7.

9:  Calculate the final vectors dr, g, p, and di by Eq 8.

10.

11:  Interaction prediction module

12:  Calculate the drug–disease, drug–gene, and drug–protein interaction scores by Eq 9.

13: end for

Results and discussion

In this section, we present the experimental settings and results. Extensive experiments are conducted to evaluate the performance of HBGCN, including comparisons of predictive accuracy with baseline methods, drug–target case studies, ablation studies, and hyperparameter analyses.

Implementation details

In our implementation, we employ five-fold cross-validation, where each test set consists of a randomly selected subset of known positive drug-target interactions and an equal number of negative samples. The feature embedding dimensions for the FC, GCN, and RGCN modules are set to 256, while the output dimensions of the three-layer MLP are 256, 128, and 64, respectively. The model is trained for 2000 epochs using the Adam optimizer to minimize the loss function. The learning rate is 0.0001, and the weighting coefficient in the loss function is 0.05. To evaluate predictive performance, we adopt standard metrics commonly used in DTI prediction tasks: the area under the ROC curve (AUC), the area under the precision-recall curve (AUPR), F1-score, precision, and recall.
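The balanced five-fold protocol can be sketched as follows. The sketch assumes negatives are drawn uniformly from unlabeled drug–target pairs without reuse across folds and that each fold's test positives are hidden from training; the pair lists and the seed are hypothetical.

```python
import random


def five_fold_splits(positives, all_pairs, k=5, seed=0):
    """Split known positives into k folds; pair each test fold with as many sampled negatives."""
    rng = random.Random(seed)
    pos = sorted(positives)
    rng.shuffle(pos)
    known = set(pos)
    negatives = sorted(p for p in all_pairs if p not in known)  # unlabeled pairs
    rng.shuffle(negatives)
    if len(negatives) < len(pos):
        raise ValueError("not enough unlabeled pairs for balanced test sets")
    splits, offset = [], 0
    for fold in range(k):
        test_pos = pos[fold::k]
        test_neg = negatives[offset:offset + len(test_pos)]
        offset += len(test_pos)
        held_out = set(test_pos)
        train_pos = [p for p in pos if p not in held_out]  # test positives hidden from training
        splits.append((train_pos, test_pos, test_neg))
    return splits


# Hypothetical data: 6 drugs x 4 targets, 10 known interactions
all_pairs = [(d, t) for d in range(6) for t in range(4)]
positives = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 0), (4, 0), (5, 1)]

for train_pos, test_pos, test_neg in five_fold_splits(positives, all_pairs):
    print(len(train_pos), len(test_pos), len(test_neg))  # 8 2 2 for every fold
```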

Comparison of predictive performance

To evaluate the predictive performance of HBGCN, we compare it with multiple state-of-the-art algorithms. Some of these methods incorporate graph-based learning with attention mechanisms and similarity-based matrix decomposition for DTI prediction, while others employ traditional methods such as collaborative filtering and low-dimensional vector projection. To ensure the rigor of comparative experiments, all models are trained and evaluated on the same benchmark dataset. Notably, for baselines that do not utilize all modalities, we provide only the required input data to avoid introducing unavailable information. The details of the baseline methods are summarized as follows.

  • DMHGNN [37]: A double multi-view heterogeneous graph neural network that jointly learns from a heterogeneous network informed by meta-path semantics and drug–target pair graphs.
  • GCNMM [38]: A graph convolutional network based on meta-paths and mutual information for DTI prediction.
  • HMLKGAT [39]: A multi-layer graph model enhanced by adaptive attention mechanisms to capture associations among biological entities within a drug-protein-disease heterogeneous network.
  • DRAGNN [16]: A weighted local information augmented graph neural network for drug repositioning.
  • MGRMF [40]: A similarity-based approach utilizing low-rank matrix factorization with multi-graph regularization to enhance the accuracy of drug-disease association predictions.
  • LBMFF [25]: A method that extracts the drug and disease fusion similarity matrix through BERT and refines feature embedding using a graph convolutional network.
  • DRWBNCF [41]: A neural collaborative filtering framework designed to infer novel potential drugs for diseases.
  • REDDA [42]: A heterogeneous graph neural network model that integrates multiple biological relationships.
  • LAGCN [43]: A layer-wise attention graph convolutional network to effectively capture hierarchical drug-disease associations.
  • MGRNNM [44]: A model that predicts the interactions between drugs and target proteins by integrating similarity measures and interaction patterns.
  • DTINet [1]: A drug-target interaction prediction method based on the feature-based low-dimensional vector projection scheme.

The experimental results for drug–target interaction prediction are presented in Table 2, where diseases serve as targets. Except for the AUC, which is slightly lower than that of DRWBNCF, HBGCN achieves the highest AUPR (0.9611), F1-score (0.8912), precision (0.8761), and recall (0.9072), outperforming the best baseline by 5.43%, 7.40%, 5.97%, and 8.92%, respectively. Compared with other graph convolutional models, HBGCN leverages both direct and indirect molecular interactions to integrate multimodal biomedical information, thereby enhancing DTI prediction performance. The superior recall demonstrates that HBGCN effectively reduces false negatives and improves sensitivity in identifying true positive associations. Furthermore, some methods rely on traditional matrix decomposition techniques, which, despite demonstrating effectiveness in capturing global structural patterns, exhibit limitations in refining feature representations through hierarchical information propagation.

Case study

Drug repositioning.

To evaluate the effectiveness of HBGCN, we conducted two case studies on drug repositioning for depressive disorder and autistic disorder. To ensure an unbiased assessment, all known associations related to the target diseases were removed from the training set. HBGCN was subsequently employed to estimate interaction scores between drugs and diseases, and the top ten potential therapeutic candidates for each disorder were identified. The predicted results were validated by exploring the PubMed database for supporting evidence in publications and clinical studies.
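The de novo protocol can be illustrated with two small helpers: one zeroes the column of a target disease in the training association matrix so the model never sees its known drugs, and one ranks drugs by their predicted score for that disease. Drug names and scores are hypothetical, and the model training step between the two helpers is omitted.

```python
def mask_disease(assoc, disease_idx):
    """Copy of a drug x disease matrix with all known links to one disease removed."""
    masked = [row[:] for row in assoc]
    for row in masked:
        row[disease_idx] = 0
    return masked


def top_k_drugs(scores, drug_names, k=10):
    """Rank drugs by predicted score for one disease (highest first)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(drug_names[i], scores[i]) for i in order[:k]]


A_drdi = [[1, 0], [1, 1], [0, 1]]          # hypothetical 3 drugs x 2 diseases
train_assoc = mask_disease(A_drdi, 1)      # hide all links to disease 1
print(train_assoc)  # [[1, 0], [1, 0], [0, 0]]

scores_for_disease_1 = [0.31, 0.87, 0.55]  # hypothetical model outputs
print(top_k_drugs(scores_for_disease_1, ["drug_a", "drug_b", "drug_c"], k=2))
# [('drug_b', 0.87), ('drug_c', 0.55)]
```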

Depressive disorder is a mental health condition characterized by disturbances in mood, sleep, appetite, and cognitive function. The top ten candidate drugs predicted by HBGCN for the treatment of depressive disorder are shown in Fig 3(a) and Table 3. Several predicted drugs have been reported in the literature as therapeutic agents for depressive disorder, including antidepressants (the selective serotonin reuptake inhibitor fluoxetine and the tricyclic antidepressant amitriptyline), antipsychotics (clozapine, risperidone, and olanzapine), the mood stabilizer carbamazepine, and the benzodiazepine diazepam. In contrast, other candidates, such as propofol and topiramate, have not yet been formally confirmed for the treatment of depressive disorder.

Table 3. Top ten predicted drugs for depressive disorder.

https://doi.org/10.1371/journal.pone.0348895.t003

Fig 3. The distribution of interaction scores for drug repositioning.

(a) The distribution of interaction scores for Depressive Disorder. (b) The distribution of interaction scores for Autistic Disorder. Red dots indicate the ten highest-scoring drugs, grey dots represent drugs with a standardized score below 0.6, and blue dots denote drugs with moderate scores.

https://doi.org/10.1371/journal.pone.0348895.g003

To further assess the biological plausibility of the predicted candidates, we examined existing pharmacological evidence reported in the literature. Specifically, fluoxetine, which inhibits the serotonin transporter, is widely prescribed for depressive disorder [45]. Furthermore, its neuroplasticity-enhancing properties promote synaptic remodeling, potentially contributing to sustained therapeutic efficacy. Clozapine modulates glutamatergic and γ-aminobutyric acid (GABA) signaling, enhances neuroplasticity and neurotrophic factor expression, and reduces neuroinflammatory responses [46]. Amitriptyline modulates monoaminergic neurotransmission to produce antidepressant effects [47]. As an established mood stabilizer, carbamazepine has been shown to reduce manic and depressive episodes by modulating voltage-gated sodium channels and neurotransmitter activity [48]. In addition, although the efficacy of propofol and topiramate in treating depressive disorder remains unconfirmed, pharmacological studies indicate that both agents enhance inhibitory neurotransmission, which may contribute to anxiolytic and antidepressant-like effects [49,50].

Autistic disorder is a neurodevelopmental condition characterized by impairments in social interaction, communication deficits, and repetitive behaviors. The top ten candidate drugs predicted for the treatment of autistic disorder are shown in Fig 3(b) and Table 4. The predicted candidates mainly include neuropsychiatric drugs, such as olanzapine, clomipramine, risperidone, quetiapine, imipramine, aripiprazole, amitriptyline, thioridazine, and nortriptyline. In addition, tetracycline is identified as a potential candidate, although its therapeutic association with autistic disorder has not yet been reported in the literature.

Table 4. Top ten predicted drugs for autistic disorder.

https://doi.org/10.1371/journal.pone.0348895.t004

As the candidate with the highest interaction score, olanzapine is an atypical antipsychotic that modulates neurotransmission through antagonism of dopamine and serotonin receptors, thereby alleviating behavioral symptoms such as irritability and aggression [51]. Clomipramine modulates serotonin systems to alleviate obsessive–compulsive–like symptoms [52]. Risperidone improves repetitive, aggressive, and self-injurious behaviors in individuals with autistic disorder, with some limitations related to tolerability [53]. Quetiapine and aripiprazole, both second-generation antipsychotics, modulate dopaminergic and serotonergic neurotransmission to alleviate irritability and aggressive behaviors associated with the disorder [54]. Although tetracycline has not been confirmed for the treatment of autistic disorder, its derivative minocycline has been shown to alleviate autism-like behaviors in mice by inhibiting microglial activation, reducing neuroinflammation, and improving hippocampal neurogenesis [55].

These findings demonstrate that HBGCN effectively identifies pharmacologically relevant compounds for diseases, highlighting its potential application in drug repositioning.

Drug-gene interaction prediction.

Tamoxifen is a selective estrogen receptor (ER) modulator that inhibits breast cancer progression by interfering with tumor growth signaling pathways. The top ten predicted genes associated with tamoxifen are shown in Fig 4 and Table 5. The predicted genes, including ESR1, PGR, GREB1, FOXA1, CYP2D6, CYP2C9, AURKA, BRCA1, and E2F7, mainly participate in estrogen receptor signaling, drug metabolism, cell cycle regulation, and tumor proliferation. In contrast, an association between tamoxifen and the candidate gene SOX5 has not yet been reported in the literature.

Fig 4. The distribution of drug-gene interaction prediction for Tamoxifen, with red dots representing the top ten important genes.

https://doi.org/10.1371/journal.pone.0348895.g004

Table 5. Top ten predicted genes for Tamoxifen.

https://doi.org/10.1371/journal.pone.0348895.t005

We further investigated existing functional and pharmacogenomic evidence to elucidate the regulatory mechanisms through which the predicted genes participate in tamoxifen response. ESR1, which encodes estrogen receptor α, and the ER signaling genes PGR, GREB1, and FOXA1 modulate the transcriptional response to tamoxifen and influence hormone receptor signaling pathways involved in breast cancer progression [56–58]. CYP2D6 and CYP2C9 are involved in the metabolic activation of tamoxifen [59], whereas AURKA, BRCA1, and E2F7 play essential roles in regulating cancer cell proliferation, DNA damage response, and cell cycle progression [60,61]. Although the predicted association of SOX5 remains unconfirmed, previous studies suggest that SOX5 participates in apoptosis regulation, and its dysregulation may contribute to tumor development [62].

These findings indicate that the predicted genes are involved in multiple biological processes related to tamoxifen response, supporting the effectiveness of HBGCN in predicting drug-gene interactions.

Drug-protein interaction prediction.

As shown in Fig 5, the predicted drug–protein interactions for clozapine are categorized into eight classes according to protein functions, biological pathways, and physiological processes. Notably, receptor proteins receive high predicted interaction scores, which aligns with the pharmacological profile of clozapine as an atypical antipsychotic that regulates synaptic transmission. Table 6 presents detailed information for the top ten protein targets. These targets primarily include neurotransmitter receptors and signaling-related proteins involved in dopaminergic, adrenergic, and cholinergic pathways.

Fig 5. The distribution of drug-protein interaction predictions for Clozapine.

The proteins are categorized into eight classes based on biological functions and involvement in physiological processes.

https://doi.org/10.1371/journal.pone.0348895.g005

Table 6. Top ten predicted proteins for Clozapine.

https://doi.org/10.1371/journal.pone.0348895.t006

Specifically, dopamine receptors, including the D2 and D4 subtypes, are identified as major targets, consistent with their role in modulating dopaminergic neurotransmission, which is central to alleviating the positive symptoms of schizophrenia [63]. Clozapine also interacts with 5-hydroxytryptamine (serotonin) receptors, contributing to the management of mood instability [64]. The alpha-2A adrenergic receptor is critical for controlling norepinephrine release and mediating the sedative properties of the drug [65]. For the unconfirmed targets, previous studies suggest that clozapine may promote neuroplasticity and cognitive enhancement through mechanisms such as the inhibition of sodium ion influx [66,67].

These findings suggest that the predicted protein targets are associated with multiple neurotransmitter systems and signaling pathways related to clozapine pharmacology.

Ablation study

To comprehensively evaluate the contribution of each component in HBGCN, we conduct an ablation study with three variants, each of which removes a specific component.

  • w/o protein: The first variant removes the protein component and relies solely on the drug-gene-disease heterogeneous network to learn embedding representations.
  • w/o gene: The second variant removes the gene component and relies solely on the drug-protein-disease heterogeneous network to learn embedding representations.
  • w/o indirect GCN: The third variant removes the indirect association GCN modules that encode multi-hop connectivity between drugs, diseases, and intermediate entities, while retaining the remaining components.
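The indirect associations removed in the third variant connect drugs and diseases through intermediate entities. One common way to materialize such meta-path connectivity, assumed here rather than taken from the HBGCN implementation, is to multiply binary association matrices along the path (drug-gene times gene-disease, drug-protein times protein-disease). Switching a meta-path off then loosely mimics the w/o gene and w/o protein variants on a toy graph. The function names and matrices are hypothetical.

```python
import numpy as np


def metapath_adjacency(a_drug_mid, a_mid_disease):
    """Count drug-X-disease paths through one intermediate entity type X.

    Entry (i, j) of the product is the number of intermediates X that are
    linked to both drug i and disease j.
    """
    return a_drug_mid @ a_mid_disease


def indirect_drug_disease(a_dg, a_gd, a_dp, a_pd, use_gene=True, use_protein=True):
    """Binary indirect drug-disease graph built from the enabled meta-paths."""
    counts = np.zeros((a_dg.shape[0], a_gd.shape[1]), dtype=int)
    if use_gene:
        counts += metapath_adjacency(a_dg, a_gd)  # drug-gene-disease paths
    if use_protein:
        counts += metapath_adjacency(a_dp, a_pd)  # drug-protein-disease paths
    return (counts > 0).astype(int)               # keep only whether a path exists


# Hypothetical toy graph: 2 drugs, 2 genes, 2 proteins, 3 diseases.
a_dg = np.array([[1, 0], [0, 1]])
a_gd = np.array([[1, 0, 0], [0, 1, 0]])
a_dp = np.array([[1, 0], [0, 0]])
a_pd = np.array([[0, 0, 1], [0, 0, 0]])

full = indirect_drug_disease(a_dg, a_gd, a_dp, a_pd)
no_protein = indirect_drug_disease(a_dg, a_gd, a_dp, a_pd, use_protein=False)
print(full.tolist())        # [[1, 0, 1], [0, 1, 0]]
print(no_protein.tolist())  # [[1, 0, 0], [0, 1, 0]]
```

In this toy graph, the link between drug 0 and disease 2 exists only through a protein, so it disappears when the protein meta-path is disabled.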

The results in Table 7 demonstrate that HBGCN achieves the best performance in drug candidate prediction. Removing either the protein or the gene component leads to only minor performance degradation, with an average decrease of approximately 1% across all evaluation metrics. In contrast, removing the indirect association encoder causes a substantial decline, with AUC and AUPR decreasing by 12.72% and 13.73%, respectively. These findings indicate that gene- and protein-mediated signals provide partially complementary information, whereas indirect multi-hop associations contribute critical relational context that cannot be adequately captured by direct links or homogeneous similarities alone. The ablation study confirms the contribution of each component and underscores the importance of incorporating diverse biological interactions in DTI prediction.

Hyperparameter study

We investigate the sensitivity of key hyperparameters and report the performance of HBGCN on the DTI prediction task under different parameter settings, as shown in Fig 6.

  • The loss weighting coefficient: Since the loss of indirect drug–disease associations is significantly larger than that of direct associations, the candidate values of the coefficient are set to [0.03, 0.05, 0.1, 0.5, 1.0]. As shown in Fig 6(a), HBGCN achieves its best performance when the coefficient is set to 0.05, and performance decreases progressively as the coefficient increases beyond this value. These findings indicate that appropriately reducing the contribution of indirect associations improves the accuracy of DTI prediction.
  • The number of training epochs: The second hyperparameter investigated is the number of training epochs. Fig 6(b) illustrates the variation of five evaluation metrics and the training loss under different epoch settings. As the number of epochs increases, the training loss decreases significantly, while all metrics gradually increase and eventually stabilize. We therefore set the number of training epochs to 2000 for HBGCN.
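The role of the loss weighting coefficient can be made concrete with a small sketch. It assumes, as the hyperparameter discussion implies, that the coefficient scales the indirect drug–disease loss added to the direct loss; the binary cross-entropy form of each term and the names `bce`, `total_loss`, and `weight` (our placeholder for the coefficient) are assumptions rather than the HBGCN implementation.

```python
import numpy as np


def bce(pred, label, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(label * np.log(pred) + (1 - label) * np.log(1 - pred)))


def total_loss(direct_pred, direct_label, indirect_pred, indirect_label, weight=0.05):
    """Direct drug-disease loss plus a down-weighted indirect loss.

    `weight` is a placeholder for the loss weighting coefficient; 0.05 gave the
    best result in the hyperparameter study. The BCE form is an assumption.
    """
    return bce(direct_pred, direct_label) + weight * bce(indirect_pred, indirect_label)


# Hypothetical toy predictions and labels.
direct_pred = np.array([0.9, 0.2])
direct_label = np.array([1.0, 0.0])
indirect_pred = np.array([0.5, 0.5])
indirect_label = np.array([1.0, 0.0])

# Sweep the candidate values from the hyperparameter study.
for weight in [0.03, 0.05, 0.1, 0.5, 1.0]:
    loss = total_loss(direct_pred, direct_label, indirect_pred, indirect_label, weight)
    print(weight, round(loss, 4))
```

Because the indirect term is multiplied by the coefficient, a small value such as 0.05 keeps a large indirect loss from dominating the gradient of the direct objective.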
Fig 6. Hyperparameter sensitivity analysis of HBGCN.

(a) The loss weighting coefficient. (b) The number of training epochs. Prediction performance improves as relatively more weight is assigned to direct drug–disease associations and as the number of training epochs increases.

https://doi.org/10.1371/journal.pone.0348895.g006

Conclusion

In this study, we propose a novel multimodal graph convolutional network, termed HBGCN, for drug–target interaction prediction. The proposed model exploits similarities among homogeneous entities and meta-paths among heterogeneous entities to characterize complex biological relationships. By integrating heterogeneous information, HBGCN captures higher-order semantic dependencies and overcomes the limitations of conventional methods that rely primarily on direct interactions. Through hierarchical graph propagation, the model iteratively aggregates and refines node representations, thereby drawing pharmacologically related drugs and targets together within a shared latent space. The interaction score is then computed from the similarity between drug and target representations in this space to quantify the predicted association strength of each drug–target pair. Extensive experiments demonstrate that HBGCN outperforms existing methods on benchmark datasets. In drug repositioning tasks, HBGCN improves on the best baseline by 5.43% in AUPR and 7.4% in F1-score. Case studies demonstrate the potential of HBGCN to identify therapeutic drug candidates and elucidate underlying pharmacological mechanisms.
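The two mechanisms summarized above, propagation over a graph followed by a similarity-based interaction score, can be sketched as follows. This is a minimal illustration that assumes standard GCN layers with symmetric normalization D^-1/2 (A + I) D^-1/2 and a sigmoid of the inner product as the similarity score; HBGCN's actual layer structure, hierarchy, and scoring function may differ, and all names here are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (self-loops added)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def propagate(adj, features, weights):
    """Stacked GCN layers: H <- ReLU(A_norm @ H @ W) for each weight matrix."""
    a_norm = normalized_adjacency(adj)
    h = features
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)  # aggregate neighbors, then ReLU
    return h


def interaction_scores(drug_emb, target_emb):
    """Sigmoid of inner products: one score per drug-target pair."""
    return 1.0 / (1.0 + np.exp(-drug_emb @ target_emb.T))


# Hypothetical bipartite graph: 2 drugs and 3 targets with toy links.
rng = np.random.default_rng(0)
r = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)
adj = np.block([[np.zeros((2, 2)), r], [r.T, np.zeros((3, 3))]])
features = np.eye(5)  # one-hot node features
weights = [rng.normal(size=(5, 8)), rng.normal(size=(8, 4))]

h = propagate(adj, features, weights)
scores = interaction_scores(h[:2], h[2:])  # rows: drugs, columns: targets
print(scores.shape)  # (2, 3)
```

Since the ReLU outputs are non-negative, every inner product is at least zero and every score is at least 0.5 in this sketch; a model trained with negative samples would learn embeddings that separate the pairs more clearly.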

Despite these promising results, the current framework derives initial molecular features exclusively from inter-entity relationships. Therefore, future work will focus on incorporating intra-entity elemental distribution patterns and multiview biological evidence within a unified learning objective to improve the biological coherence of the learned embeddings. In addition, the current framework relies on sparse biological associations, which may introduce noise and limit the robustness of representation learning. To address these limitations, advanced graph learning techniques, such as link-based attributed graph clustering and hypergraph structure discovery, may further denoise sparse associations and provide principled structural priors for capturing informative features.

Supporting information

S1 File. Dataset.

The dataset comprises four types of biological entities, including drugs, diseases, genes, and proteins, along with their corresponding interaction and association data.

https://doi.org/10.1371/journal.pone.0348895.s001

(ZIP)

References

  1. Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, et al. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8(1):573. pmid:28924171
  2. Torrance CJ, Agrawal V, Vogelstein B, Kinzler KW. Use of isogenic human cancer cells for high-throughput screening and drug discovery. Nat Biotechnol. 2001;19(10):940–5. pmid:11581659
  3. Katz DA, Murray B, Bhathena A, Sahelijo L. Defining drug disposition determinants: a pharmacogenetic-pharmacokinetic strategy. Nat Rev Drug Discov. 2008;7(4):293–305. pmid:18382463
  4. Fakhraei S, Huang B, Raschid L, Getoor L. Network-based drug-target interaction prediction with probabilistic soft logic. IEEE/ACM Trans Comput Biol Bioinform. 2014;11(5):775–87. pmid:26356852
  5. Hu P, Huang Y, Mei J, Leung H, Chen Z, Kuang Z. Learning from low-rank multimodal representations for predicting disease-drug associations. BMC Med Inform Decis Mak. 2021;21:1–13.
  6. Bai P, Miljković F, Ge Y, Greene N, John B, Lu H. Hierarchical clustering split for low-bias evaluation of drug-target interaction prediction. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2021. p. 641–4.
  7. Zhu Y, Ning C, Zhang N, Wang M, Zhang Y. GSRF-DTI: a framework for drug-target interaction prediction based on a drug-target pair network and representation learning on a large graph. BMC Biol. 2024;22(1):156. pmid:39020316
  8. Yu H, Chen J, Xu X, Li Y, Zhao H, Fang Y, et al. A systematic prediction of multiple drug-target interactions from chemical, genomic, and pharmacological data. PLoS One. 2012;7(5):e37608. pmid:22666371
  9. Madhukar NS, Khade PK, Huang L, Gayvert K, Galletti G, Stogniew M, et al. A Bayesian machine learning approach for drug target identification using diverse data types. Nat Commun. 2019;10(1):5221. pmid:31745082
  10. Zhao B-W, Su X-R, Hu P-W, Huang Y-A, You Z-H, Hu L. iGRLDTI: an improved graph representation learning method for predicting drug-target interactions over heterogeneous biological information network. Bioinformatics. 2023;39(8):btad451. pmid:37505483
  11. Yan X-Y, Zhang S-W, He C-R. Prediction of drug-target interaction by integrating diverse heterogeneous information source with multiple kernel learning and clustering methods. Comput Biol Chem. 2019;78:460–7. pmid:30528728
  12. Yang J, He S, Zhang Z, Bo X. NegStacking: drug-target interaction prediction based on ensemble learning and logistic regression. IEEE/ACM Trans Comput Biol Bioinform. 2021;18(6):2624–34. pmid:31985434
  13. Xuan P, Chen B, Zhang T, Yang Y. Prediction of drug-target interactions based on network representation learning and ensemble learning. IEEE/ACM Trans Comput Biol Bioinform. 2021;18(6):2671–81. pmid:32340959
  14. Zhao B-W, Su X-R, Yang Y, Li D-X, Li G-D, Hu P-W, et al. Regulation-aware graph learning for drug repositioning over heterogeneous biological network. Inf Sci. 2025;686:121360.
  15. Xie Y, Wang X, Wang P, Bi X. A pseudo-label supervised graph fusion attention network for drug–target interaction prediction. Expert Syst Appl. 2025;259:125264.
  16. Meng Y, Wang Y, Xu J, Lu C, Tang X, Peng T, et al. Drug repositioning based on weighted local information augmented graph neural network. Brief Bioinform. 2023;25(1):bbad431. pmid:38019732
  17. Li M, Guo Z, Wu Y, Guo P, Shi Y, Hu S, et al. ViDTA: enhanced drug-target affinity prediction via virtual graph nodes and attention-based feature fusion. In: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2024. p. 42–7.
  18. Zhang C, Sun J, Xing L, Zhang L, Cai H, Guo M. MHANDTI: drug-target interaction prediction model based on heterogeneous graph multi-hop attention networks. In: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2024. p. 475–8.
  19. Hu J, Bewong M, Kwashie S, Zhang W, Nofong VM, Wu G, et al. A heterogeneous network-based contrastive learning approach for predicting drug-target interaction. In: 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2024. p. 294–9.
  20. Sun C, Xuan P, Zhang T, Ye Y. Graph convolutional autoencoder and generative adversarial network-based method for predicting drug-target interactions. IEEE/ACM Trans Comput Biol Bioinform. 2022;19(1):455–64. pmid:32750854
  21. Tian Z, Yu Y, Ni F, Zou Q. Drug-target interaction prediction with collaborative contrastive learning and adaptive self-paced sampling strategy. BMC Biol. 2024;22(1):216. pmid:39334132
  22. Wang Y, Xia Y, Yan J, Yuan Y, Shen H-B, Pan X. ZeroBind: a protein-specific zero-shot predictor with subgraph matching for drug-target interactions. Nat Commun. 2023;14(1):7861. pmid:38030641
  23. Zhao B-W, Wang L, Hu P-W, Wong L, Su X-R, Wang B-Q, et al. Fusing higher and lower-order biological information for drug repositioning via graph representation learning. IEEE Trans Emerg Topics Comput. 2024;12(1):163–76.
  24. Hua Y, Feng Z, Song X, Wu X-J, Kittler J. MMDG-DTI: drug–target interaction prediction via multimodal feature fusion and domain generalization. Pattern Recognit. 2025;157:110887.
  25. Kang H, Hou L, Gu Y, Lu X, Li J, Li Q. Drug-disease association prediction with literature based multi-feature fusion. Front Pharmacol. 2023;14:1205144. pmid:37284317
  26. Li D, Yang Y, Cui Z, Yin H, Hu P, Hu L. LLM-DDI: leveraging large language models for drug-drug interaction prediction on biomedical knowledge graph. IEEE J Biomed Health Inform. 2026;30(1):773–81. pmid:40601466
  27. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, et al. DrugBank 3.0: a comprehensive resource for “omics” research on drugs. Nucleic Acids Res. 2011;39(Database issue):D1035-41. pmid:21059682
  28. Peng L, Bai Z, Liu L, Yang L, Liu X, Chen M, et al. DTI-MvSCA: an anti-over-smoothing multi-view framework with negative sample selection for predicting drug-target interactions. IEEE J Biomed Health Inform. 2025;29(1):711–23.
  29. Chen R, Xia F, Hu B, Jin S, Liu X. Drug-target interactions prediction via deep collaborative filtering with multiembeddings. Brief Bioinform. 2022;23(2):bbab520. pmid:35043158
  30. Hattori M, Okuno Y, Goto S, Kanehisa M. Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc. 2003;125(39):11853–65. pmid:14505407
  31. Lowe HJ, Barnett GO. Understanding and using the medical subject headings (MeSH) vocabulary to perform literature searches. JAMA. 1994;271(14):1103–8. pmid:8151853
  32. Kim CY, Baek S, Cha J, Yang S, Kim E, Marcotte EM, et al. HumanNet v3: an improved database of human gene networks for disease research. Nucleic Acids Res. 2022;50(D1):D632–9. pmid:34747468
  33. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, et al. Human protein reference database–2009 update. Nucleic Acids Res. 2009;37(Database issue):D767-72. pmid:18988627
  34. Davis AP, Wiegers TC, Johnson RJ, Sciaky D, Wiegers J, Mattingly CJ. Comparative toxicogenomics database (CTD): update 2023. Nucleic Acids Res. 2023;51(D1):D1257–62. pmid:36169237
  35. Cannon M, Stevenson J, Stahl K, Basu R, Coffman A, Kiwala S, et al. DGIdb 5.0: rebuilding the drug-gene interaction database for precision medicine and drug discovery platforms. Nucleic Acids Res. 2024;52(D1):D1227–35. pmid:37953380
  36. Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, Ronzano F, Centeno E, Sanz F, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res. 2020;48(D1):D845–55. pmid:31680165
  37. Ning Q, Wang Y, Zhao Y, Sun J, Jiang L, Wang K, et al. DMHGNN: double multi-view heterogeneous graph neural network framework for drug-target interaction prediction. Artif Intell Med. 2025;159:103023. pmid:39579417
  38. Cao S, Cai B, Qiu Z, Chang T, Wuyun Q, Wu F-X. Graph convolution network based on meta-paths and mutual information for drug-target interaction prediction. BMC Bioinform. 2025;26(1):275. pmid:41204097
  39. Li D, Xiao Z, Sun H, Jiang X, Zhao W, Shen X. Prediction of drug-disease associations based on multi-kernel deep learning method in heterogeneous graph embedding. IEEE/ACM Trans Comput Biol Bioinform. 2024;21(1):120–8. pmid:38051617
  40. Ai C, Yang H, Ding Y, Tang J, Guo F. Low rank matrix factorization algorithm based on multi-graph regularization for detecting drug-disease association. IEEE/ACM Trans Comput Biol Bioinform. 2023;20(5):3033–43. pmid:37159322
  41. Meng Y, Lu C, Jin M, Xu J, Zeng X, Yang J. A weighted bilinear neural collaborative filtering approach for drug repositioning. Brief Bioinform. 2022;23(2):bbab581. pmid:35039838
  42. Gu Y, Zheng S, Yin Q, Jiang R, Li J. REDDA: Integrating multiple biological relations to heterogeneous graph neural network for drug-disease association prediction. Comput Biol Med. 2022;150:106127. pmid:36182762
  43. Yu Z, Huang F, Zhao X, Xiao W, Zhang W. Predicting drug-disease associations through layer attention graph convolutional network. Brief Bioinform. 2021;22(4):bbaa243. pmid:33078832
  44. Mongia A, Majumdar A. Drug-target interaction prediction using multi graph regularized nuclear norm minimization. PLoS One. 2020;15(1):e0226484. pmid:31945078
  45. Zhou Y, Lu Y, Wu Y, Kong W, Zhou L, Guo X, et al. Safety of antidepressants commonly used in 6-17-year-old children and adolescents: a disproportionality analysis from 2014-2023 on the basis of the FAERS database. PLoS One. 2025;20(8):e0330025. pmid:40802817
  46. Gammon D, Cheng C, Volkovinskaia A, Baker GB, Dursun SM. Clozapine: why is it so uniquely effective in the treatment of a range of neuropsychiatric disorders? Biomolecules. 2021;11(7):1030. pmid:34356654
  47. Valadez-Lemus RE, Góngora-Alfaro JL, Jiménez-Vargas JM, Alamilla J, Mendoza-Muñoz N. Nanoencapsulation of amitriptyline enhances the potency of antidepressant-like effects and exhibits anxiolytic-like effects in Wistar rats. PLoS One. 2025;20(2):e0316389. pmid:40019891
  48. Grunze A, Amann BL, Grunze H. Efficacy of carbamazepine and its derivatives in the treatment of bipolar disorder. Medicina (Kaunas). 2021;57(5):433. pmid:33946323
  49. Tian F, Lewis LD, Zhou DW, Balanza GA, Paulk AC, Zelmann R, et al. Characterizing brain dynamics during ketamine-induced dissociation and subsequent interactions with propofol using human intracranial neurophysiology. Nat Commun. 2023;14(1):1748. pmid:36991011
  50. Pearl NZ, Babin CP, Catalano NT, Blake JC, Ahmadzadeh S, Shekoohi S, et al. Narrative review of topiramate: clinical uses and pharmacological considerations. Adv Ther. 2023;40(9):3626–38. pmid:37368102
  51. Callaghan JT, Bergstrom RF, Ptak LR, Beasley CM. Olanzapine: pharmacokinetic and pharmacodynamic profile. Clin Pharmacokinet. 1999;37(3):177–93. pmid:10511917
  52. Deb S, Roy M, Lee R, Majid M, Limbu B, Santambrogio J, et al. Randomised controlled trials of antidepressant and anti-anxiety medications for people with autism spectrum disorder: systematic review and meta-analysis. BJPsych Open. 2021;7(6):e179. pmid:34593083
  53. Curnow E, Rutherford M, Maciver D, Johnston L, Prior S, Boilson M, et al. Mental health in autistic adults: a rapid review of prevalence of psychiatric disorders and umbrella review of the effectiveness of interventions within a neurodiversity informed perspective. PLoS One. 2023;18(7):e0288275. pmid:37440543
  54. Henneberry E, Lamy M, Dominick KC, Erickson CA. Decades of progress in the psychopharmacology of autism spectrum disorder. J Autism Dev Disord. 2021;51(12):4370–94. pmid:34491511
  55. Luo Y, Lv K, Du Z, Zhang D, Chen M, Luo J, et al. Minocycline improves autism-related behaviors by modulating microglia polarization in a mouse model of autism. Int Immunopharmacol. 2023;122:110594. pmid:37441807
  56. Wu J-R, Zhao Y, Zhou X-P, Qin X. Estrogen receptor 1 and progesterone receptor are distinct biomarkers and prognostic factors in estrogen receptor-positive breast cancer: Evidence from a bioinformatic analysis. Biomed Pharmacother. 2020;121:109647. pmid:31733575
  57. Wu Y, Zhang Z, Cenciarini ME, Proietti CJ, Amasino M, Hong T, et al. Tamoxifen resistance in breast cancer is regulated by the EZH2-ERα-GREB1 transcriptional axis. Cancer Res. 2018;78(3):671–84. pmid:29212856
  58. Palaniappan M, Nguyen L, Grimm SL, Xi Y, Xia Z, Li W, et al. The genomic landscape of estrogen receptor α binding sites in mouse mammary gland. PLoS One. 2019;14(8):e0220311. pmid:31408468
  59. Schroth W, Antoniadou L, Fritz P, Schwab M, Muerdter T, Zanger UM, et al. Breast cancer treatment outcome with adjuvant tamoxifen relative to patient CYP2D6 and CYP2C19 genotypes. J Clin Oncol. 2007;25(33):5187–93. pmid:18024866
  60. Chen R, Zhou Z, Meng X, Lei Y, Wang Y, Wang Y. Emerging opportunities to treat drug-resistant breast cancer: discovery of novel small-molecule inhibitors against different targets. Front Pharmacol. 2025;16:1578342. pmid:40949156
  61. Clements KE, Thakar T, Nicolae CM, Liang X, Wang H-G, Moldovan G-L. Loss of E2F7 confers resistance to poly-ADP-ribose polymerase (PARP) inhibitors in BRCA2-deficient cells. Nucleic Acids Res. 2018;46(17):8898–907. pmid:30032296
  62. Jiang J, Wang Y, Sun M, Luo X, Zhang Z, Wang Y. SOX on tumors, a comfort or a constraint? Cell Death Discov. 2024;10(1):67.
  63. Van Tol HH, Bunzow JR, Guan HC, Sunahara RK, Seeman P, Niznik HB, et al. Cloning of the gene for a human dopamine D4 receptor with high affinity for the antipsychotic clozapine. Nature. 1991;350(6319):610–4. pmid:1840645
  64. Roth BL, Craigo SC, Choudhary MS, Uluer A, Monsma FJ Jr, Shen Y, et al. Binding of typical and atypical antipsychotic agents to 5-hydroxytryptamine-6 and 5-hydroxytryptamine-7 receptors. J Pharmacol Exp Ther. 1994;268(3):1403–10. pmid:7908055
  65. Tsai SJ, Wang YC, Yu Younger WY, Lin CH, Yang KH, Hong CJ. Association analysis of polymorphism in the promoter region of the alpha2a-adrenoceptor gene with schizophrenia and clozapine response. Schizophr Res. 2001;49(1–2):53–8. pmid:11343863
  66. López Ordieres MG, Rodríguez de Lores Arnaiz G. Clozapine administration modifies neurotensin effect on synaptosomal membrane Na+, K+ -ATPase activity. Neurochem Res. 2009;34(12):2226–32. pmid:19562485
  67. Morais M, Patrício P, Mateus-Pinheiro A, Alves ND, Machado-Santos AR, Correia JS, et al. The modulation of adult neuroplasticity is involved in the mood-improving actions of atypical antipsychotics in an animal model of depression. Transl Psychiatry. 2017;7(6):e1146. pmid:28585931