SubGE-DDI: A new prediction model for drug-drug interaction established through biomedical texts and drug-pairs knowledge subgraph enhancement

Biomedical texts provide important data for investigating drug-drug interactions (DDIs) in the field of pharmacovigilance. Although researchers have attempted to extract DDIs from biomedical texts and predict unknown DDIs, the lack of accurate manual annotations significantly hinders the performance of machine learning algorithms. In this study, a new DDI prediction framework, the Subgraph Enhance model for DDI (SubGE-DDI), was developed to improve the performance of machine learning algorithms. The model uses drug-pair knowledge subgraph information to achieve large-scale plain-text prediction without many annotations. It treats DDI prediction as a multi-class classification problem and predicts the specific DDI type for each drug pair (e.g. Mechanism, Effect, Advise, Interact and Negative). The drug-pair knowledge subgraphs were derived from a large drug knowledge graph combining various public datasets, such as DrugBank, TWOSIDES, OFFSIDES, DrugCentral, Entrez Gene, SMPDB (The Small Molecule Pathway Database), CTD (The Comparative Toxicogenomics Database) and SIDER. SubGE-DDI was evaluated on a public dataset (the SemEval-2013 Task 9 dataset) and compared with other state-of-the-art baselines. SubGE-DDI achieves an 83.91% micro F1 score and an 84.75% macro F1 score on the test dataset, outperforming the other state-of-the-art baselines. These findings show that the proposed drug-pair knowledge subgraph-assisted model can effectively improve the prediction of DDIs from biomedical texts.


Introduction
Drug-drug interaction (DDI) refers to a change in a drug's effects due to the presence of another drug [1]. This commonly occurs in polypharmacy, when the effects of one drug alter the effects of other drugs in a combination regimen. A DDI may enhance or weaken the efficacy of a drug, causing adverse drug reactions (ADRs), which can even be life-threatening in severe cases [2]. Databases such as DrugBank [3], TWOSIDES [4], DDInter [5], KEGG [6], BIOSNAP [7] and MecDDI [8] have been established to provide information on interactions between drugs and to promote the development of new drugs while avoiding ADRs. Currently, the rapid growth in the number of biomedical publications makes it increasingly difficult to manually extract valuable DDI information from articles, despite its critical importance [9]. This necessitates the development of automated DDI extraction methods.
Although traditional neural networks were used extensively for DDI extraction from biomedical texts before 2018, more effective methods are needed because those models depend heavily on manual labeling and deliver unsatisfactory prediction accuracy. As a result, the BERT model (proposed by Jacob Devlin et al. in 2018) has been widely applied in recent years [14], and BERT and its derivative models have been employed for DDI extraction with good results. Chen et al. [15] proposed a novel method based on BioBERT [16] for extracting DDIs without adding any external drug information, in which drug names are not converted into standard tokens. Molina et al. [17] proposed a framework that leverages Gaussian noise injection to enhance the performance of DDI prediction. In addition, some researchers have shown that external drug-related information can further improve the effectiveness of such models [18][19]. For instance, Asada et al. (2021) proposed a new method that extracts DDIs by combining information from external drug databases and large-scale plain text [20,21].
In addition, studies in the field of Knowledge Graphs (KGs) have shown that a KG can effectively integrate multiple entity types and the complex relationships between biological entities. These methods improve the extraction of informative high-order semantic features, which enhances DDI prediction accuracy [22]. Deep learning has significantly enhanced DDI prediction, giving rise to a variety of frameworks that capitalize on different information sources. One approach optimizes the Biomedical Knowledge Graph (BKG) by integrating local and global information, as demonstrated by Ren et al. [23]. Another, proposed by Su et al. [24], leverages an attention-based KG representation learning framework. Meanwhile, Gu et al. [25] employed supervised contrastive learning with DDI data as negative samples to transform drug embedding vectors and predict interactions. Su et al. [26] adopted KG2ECapsule, which utilizes a capsule graph neural network to generate high-quality negative samples for DDI prediction. Tang et al. (2023) [27] proposed a novel approach called DSIL-DDI that derives domain-agnostic representations of DDIs from a source domain, enhancing model generalizability and interpretability.
Although previous studies primarily performed DDI extraction from biomedical texts relying solely on text information and drug molecular features, Duan et al. [28] introduced an improved approach in which molecular structure information is integrated with text, emphasizing functional group structure to enhance the extraction process. Further, He et al. [29] optimized model performance by integrating 3D molecular graph structure and position information. These methods improved DDI extraction using drug molecular structures and achieved good results. However, drug polymorphism limits structure-based DDI extraction. To address the limitations of methods that rely on chemical structures alone, we incorporate additional biomedical entities, such as diseases and pathways, into DDI extraction, providing a better understanding of drug interactions.
In this study, a novel DDI extraction framework involving an external BKG, SubGE-DDI, was proposed as an alternative strategy to overcome these limitations. The model combines text features and drug pair-related KG information for DDI extraction. First, a BKG was built from several public biomedical databases, including DrugBank, TWOSIDES, OFFSIDES [30], SIDER [31], SMPDB [32], DrugCentral [33], Entrez Gene [34], and CTD [35], to generate the KG features of the drug pairs. The representation of the targeted drug pairs was enhanced with position embeddings via a CNN. Moreover, three pre-trained models were compared to determine the most suitable one for DDI extraction. The pre-trained PubMedBERT-based [36] text feature extraction model was adopted because it achieved the highest F1 scores on both micro-averaged and macro-averaged metrics. Furthermore, a Subgraph-Attentional Graph Convolutional Network (SubAGCN) was designed to exploit the KG. SubAGCN can effectively anchor the relevant subgraph of a drug pair in the KG and generate inference paths within the subgraph through a novel attention module. Three hidden-layer fusion methods were proposed to combine the subgraph features generated by SubAGCN with the text features generated by PubMedBERT. The resulting mixed vectors, which include drug-pair KG interaction messages and text features, are employed to conduct the final classification task. The results show that SubGE-DDI can accurately extract DDI relationships and improve both macro-averaged and micro-averaged metrics, obtaining F1 macro and F1 micro scores of 84.75% and 83.91%, respectively.

Datasets
Training data were obtained from the SemEval-2013 Task 9 dataset [10]. This dataset was created to investigate automatic drug recognition and DDI extraction algorithms based on biomedical texts. Its data come from two public sources, DrugBank and MedLine: a total of 730 and 175 articles about DDIs were obtained from DrugBank and MedLine, respectively. The target drugs and interaction types in these articles were annotated by experts. The dataset was then divided into training and testing sets. Published models for extracting DDIs from biomedical texts have been compared on this dataset [15,20,28].
The dataset defines the following four interaction labels and a negative label.
• Mechanism (Mech.): Used to annotate DDIs described by their pharmacokinetic (PK) mechanism (e.g. Grepafloxacin may inhibit the metabolism of theobromine).
• Effect (Eff.): Used to annotate DDIs describing an effect (e.g. About 46% of uninfected volunteers develop rash while receiving SUSTIVA and clarithromycin) or a pharmacodynamic (PD) mechanism (e.g. Chlorthalidone may potentiate the action of other antihypertensive drugs).
• Advise (Adv.): Used when a recommendation or advice regarding a drug interaction is given (e.g. UROXATRAL should not be used in combination with other alpha-blockers).
• Interact (Int.): Used when a DDI appears in the text without providing any additional information (e.g. The interaction of omeprazole and ketoconazole has been established).
• Negative (Neg.): Used when there is no interaction between the two drugs.
In summary, DDI extraction is a multiclass classification task that classifies target entity pairs in input sentences. If a sentence contains n drug entities, n(n-1)/2 candidate entity pairs are generated. Herein, the two drugs of the target pair are replaced with the placeholders "Drug1" and "Drug2" to ensure the generalization of features. An example of this preprocessing is shown in Table 1.
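As a sketch of this pair-generation step (function and field names here are our own, not from the paper), each sentence with n drug mentions expands into n(n-1)/2 candidate instances, with the target pair blinded to the "Drug1"/"Drug2" placeholders:

```python
from itertools import combinations

def make_pair_instances(sentence, entities):
    """Generate one classification instance per unordered drug pair.

    For n entities this yields n*(n-1)/2 instances; in each instance the
    two target mentions are replaced by the placeholders "Drug1"/"Drug2",
    mirroring the preprocessing described in the paper.
    """
    instances = []
    for d1, d2 in combinations(entities, 2):
        text = sentence.replace(d1, "Drug1").replace(d2, "Drug2")
        instances.append({"text": text, "pair": (d1, d2)})
    return instances
```

Note that a real implementation would blind mentions by character offsets rather than string replacement, since drug names can recur or overlap.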
Most samples in Table 2 are negative pairs, indicating no interaction between the drugs in the sentence. This imbalance often degrades the performance of machine learning-based DDI extraction models. In this study, negative instances were filtered as much as possible according to a negative-instance filtering strategy adopted from previous studies [37][38][39], listed below (Table 3).
Rule 1. Drug pairs in which both mentions refer to the same drug, or one mention is an abbreviation of the other, should be filtered.
Rule 2. Drug pairs that appear in a coordinate structure should be filtered.
Rule 3. Drug pairs in which one drug is a special case of the other should be filtered.
To ensure accurate mapping between drug pairs in the sentences and entities in the knowledge graph, drug names were mapped to their corresponding DrugBank IDs; any pair without a DrugBank ID was excluded from the analysis. Many negative pairs remained in the training and testing sets after filtering (Table 4). Therefore, a multi-focal loss was used to counter the sample imbalance. Multi-focal loss is a loss function [40] that assigns a different weight to each DDI type. For a sample of DDI type i, it is determined as follows:

FL(p_t) = -α_i (1 - p_t)^γ log(p_t), (1)

where m, α_i, and p_t represent the number of DDI types, the weight of each DDI type as defined in Eq (2), and the prediction of the DDI model for the true type, respectively (γ = 2).
Count_i and m represent the number of samples of the i-th DDI type and the number of different types, respectively.
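The loss above can be sketched in a few lines of numpy. Since Eq (2) is not reproduced here, the per-type weights below use simple inverse-frequency normalization, which is one common choice and only an assumption about the paper's exact formula:

```python
import numpy as np

def inverse_freq_weights(counts):
    """Per-type weights alpha_i (assumed inverse-frequency form of Eq (2)):
    rarer DDI types receive larger weights; weights sum to 1."""
    counts = np.asarray(counts, dtype=float)
    w = 1.0 / counts
    return w / w.sum()

def multi_focal_loss(probs, label, alpha, gamma=2.0):
    """Focal loss (Eq (1)) for one multi-class example.

    probs: predicted class probabilities; label: true class index;
    alpha: per-class weights; gamma: focusing parameter (paper uses 2).
    The (1 - p_t)^gamma factor down-weights easy, well-classified samples.
    """
    p_t = probs[label]
    return -alpha[label] * (1.0 - p_t) ** gamma * np.log(p_t)
```

A confidently correct prediction thus contributes far less loss than an uncertain one, which is what mitigates the dominance of the abundant Negative class.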

The overview of SubGE-DDI
This study introduces SubGE-DDI, a novel model for extracting DDIs. SubGE-DDI leverages SubAGCN to extract biomedical subgraph knowledge for the corresponding drug-pair entities. Text information about DDIs is obtained using the pre-trained PubMedBERT-based text feature extraction model. The representation of the targeted drug pair is enhanced with position embeddings via a CNN. Finally, three hidden-layer fusion methods were compared to determine the best approach for fusing the subgraph features and the text features. An illustration of the proposed SubGE-DDI model is presented in Fig 1. For example, the sentence "Administration of valproic acid decreases oral clearance of temozolomide by about 5%." is input, where "valproic acid" and "temozolomide" form the drug pair. The subgraph information for the drug pair is then obtained through SubAGCN. The positional information of the drug pair, the text information, and the subgraph information are combined into a fusion feature, which is used as the input for DDI prediction.

Drug pairs features in knowledge subgraph
The biomedical KG is a comprehensive representation of human biology. For example, other nodes (e.g. pathways, proteins) may change when a drug node in the KG changes, triggering a series of reactions and producing various physiological outcomes. By learning the structural features of the biomedical knowledge graph surrounding targeted drug pairs, we obtain important signals that can reveal the biological mechanisms underlying their effects on the body. Graph structure features are encoded in the node and edge structure of the network and can be computed directly from the graph. However, learning over the entire KG is slow and consumes a massive amount of RAM. Zhang et al. [41] (2018) demonstrated that local enclosing subgraphs contain enough information to effectively learn the structural features of the entire graph. In this study, we focus on the local subgraph around each drug pair in the KG and on the positional information of the nodes within that subgraph. Specifically, the k-hop neighboring nodes were extracted for the target drug pair (u, v): N_k(u) = {s | d(s, u) ≤ k} and N_k(v) = {s | d(s, v) ≤ k}, where d(·,·) denotes the distance between two nodes on G_KG. The enclosing subgraph is induced by the intersection of these node sets, G_Sub = {(i, r, j) | i, j ∈ N_k(u) ∩ N_k(v), r ∈ R}. Meanwhile, the representation of each node i in G_Sub is initialized as h_i^(0) = [h_i^(0); p_i], where p_i = [one-hot(d(i, u)) ⊕ one-hot(d(i, v))] and ⊕ denotes concatenation.
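The enclosing-subgraph extraction described above can be sketched as a pair of breadth-first searches, one from each endpoint of the drug pair (a minimal illustration with our own helper names; the paper's implementation over the full KG is not shown):

```python
from collections import deque

def k_hop_nodes(adj, start, k):
    """BFS distances up to k hops; adj maps node -> iterable of neighbours."""
    dist = {start: 0}
    q = deque([start])
    while q:
        n = q.popleft()
        if dist[n] == k:
            continue  # do not expand past the k-hop frontier
        for m in adj.get(n, ()):
            if m not in dist:
                dist[m] = dist[n] + 1
                q.append(m)
    return dist

def enclosing_subgraph(adj, u, v, k=2):
    """Nodes within k hops of BOTH endpoints, plus the per-node distance
    labels (d(i, u), d(i, v)) that feed the one-hot position encoding p_i."""
    du, dv = k_hop_nodes(adj, u, k), k_hop_nodes(adj, v, k)
    nodes = set(du) & set(dv)
    return nodes, {i: (du[i], dv[i]) for i in nodes}
```

The distance pairs returned here are exactly what the one-hot position encoding p_i is built from, so nodes "between" the two drugs are distinguishable from peripheral ones.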
A new GCN-based model, named SubAGCN, was designed to fully learn the structural features of subgraphs and to summarize subgraph information into a graph-based pathway for potential drug interactions. Notably, the pathway is a sparse subgraph, since drug interactions often involve complex interplays among many types of biomedical entities. Therefore, a layer-independent, relation-aware self-attention module was designed to assign a weight to each edge in G_Sub.
Specifically, the self-attention module consists of two parts. First, a weight should be assigned to each edge in the subgraph, since the importance of the relationship between two entities may differ even when the edge type is the same. Therefore, following Brockschmidt [42], β and γ were computed so that the model can dynamically weight features based on the information at the edge's target node:

β_j^(t), γ_j^(t) = g(h_j^(t); θ_g), (3)

where g and h_j^(t) represent a single linear layer and the representation of the edge's target node at layer t, respectively.
The unimportant edges were then removed from the subgraph, keeping only the important ones. Second, the neighboring nodes in the subgraph were taken into account to generate the attention weight α_{i,j}, the signal-strength score of the edge between nodes i and j [43,44]:

α_{i,j} = Tanh((W_I h_i^(0))^T (W_J h_j^(0) + r_{i,j}) / √d_k),

where W_I and W_J represent the weights of the individual linear layers corresponding to the source and target nodes, respectively; r_{i,j} represents the relation between nodes i and j; √d_k scales by the size of the node input vector h^(0); and Tanh(·) is the tanh function for non-linear transformation, which constrains α_{i,j} to the range -1 to 1. A threshold hyperparameter z was designed to screen out unimportant edges: the weight τ_{i,j} of an edge is set to 0 when α_{i,j} is lower than z.
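The thresholding step described above, setting an edge's weight to 0 whenever its attention score falls below z, is straightforward to sketch:

```python
import numpy as np

def prune_edges(alpha, z):
    """Keep attention scores >= z; zero out the rest (edge pruning).

    alpha: array of per-edge attention scores in [-1, 1];
    z: threshold hyperparameter from the paper.
    """
    alpha = np.asarray(alpha, dtype=float)
    return np.where(alpha < z, 0.0, alpha)
```

Zeroed edges then contribute nothing during message passing, which is what makes the learned inference pathway a sparse subgraph.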
A key subgraph important for the targeted drug pair was thus obtained using the self-attention module. The subgraph information was then integrated using the following message-passing scheme. For each node v, the neighbor message b_v^(t) is

b_v^(t) = Σ_{u∈N_v} (1/|N_v|) W_r^(t) h_u^(t),

where N_v and W_r^(t) represent the set of all neighbors of node v in the subgraph and the weight matrix of the relation r between nodes u and v at layer t, respectively. Basis factorization (Schlichtkrull et al. [45]) was used to decompose W_r^(t) into a linear combination of a small number of basis matrices {V_b}_{b∈B} to avoid overfitting:

W_r^(t) = Σ_{b∈B} a_{r,b}^(t) V_b^(t).

The biomedical-entity representation h_v^(t) of node v is then updated with the neighbor message b_v^(t):

h_v^(t+1) = σ(b_v^(t) + W_self h_v^(t)),

where W_self represents the weight matrix that transforms the node's own embedding.
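Basis factorization itself can be illustrated in a few lines; the dimensions below are arbitrary toy values, not the paper's hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rel, n_basis, d = 5, 2, 4  # toy sizes: relations, bases, hidden dim

V = rng.normal(size=(n_basis, d, d))      # shared basis matrices V_b
coef = rng.normal(size=(n_rel, n_basis))  # per-relation coefficients a_{r,b}

# Each relation-specific weight matrix is a linear combination of the
# shared bases, so the parameter count scales with n_basis rather than
# with the (much larger) number of relation types in the KG.
W = np.einsum('rb,bij->rij', coef, V)     # shape (n_rel, d, d)
```

This is the standard R-GCN trick from Schlichtkrull et al. [45] for keeping per-relation weights from overfitting when many relation types are present.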
In addition, the mean of all node embeddings in the subgraph at layer t is used to represent the subgraph information h_{G_Sub}^(t):

h_{G_Sub}^(t) = (1/|G_Sub|) Σ_{i∈G_Sub} W_Sub h_i^(t),

where W_Sub and h_i^(t) represent the weight matrix for node transformation in the subgraph and the embedding of node i at layer t, respectively.
Finally, a layer-aggregation mechanism was employed to integrate the representations generated by each layer [46]: the node and subgraph embeddings from each layer, such as h_v and h_{G_Sub}, were concatenated. The node embeddings of the target drug pair and the subgraph embedding were then concatenated to obtain the drug-pair representation h_dp.

Drug pairs features in text
Each drug pair in the SemEval-2013 Task 9 dataset has a corresponding text description. DDI extraction is the task of identifying drug pairs in input sentences that describe interactions and assigning the correct interaction type to each pair. Herein, BERT [14] was used as the basic model to extract features of drug pairs from texts. The BERT model is very difficult to train from scratch, with hundreds of millions of parameters; researchers therefore typically load pre-trained model parameters before training on a specific task. In this study, three pre-trained models were compared, and the most suitable one was selected for extracting text features. PubMedBERT was eventually integrated into the framework to enhance training efficiency and extract drug-pair features more effectively from text. PubMedBERT is a pre-trained model trained on a recent collection of PubMed abstracts, comprising 21 GB of data with 14 million abstracts and 3.2 billion words. A preprocessed input sentence is converted into a real-valued fixed-size vector via the BERT-based model (Fig 1A). Specifically, given an input sentence S = (w_1, ..., w_n) in which drugs d_1 and d_2 are mentioned, the sentence is first split into word pieces via the WordPiece algorithm [47] to obtain the corresponding token embeddings e_i^t. Each token embedding e_i^t is then converted into a real-valued pre-trained contextualized embedding e_i^w ∈ R^{d_w} via the BERT model. In addition, d_p-dimensional drug-relative position embeddings e_i^{p1} and e_i^{p2} were prepared for each word piece, encoding its positions relative to d_1 and d_2. e_i^{p1} and e_i^{p2} were concatenated to obtain the position embedding for each word piece: e_i^p = [e_i^{p1}; e_i^{p2}].
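The drug-relative positions that index the embedding tables e^{p1} and e^{p2} can be sketched as follows (a toy illustration with our own function name; the actual embedding tables are learned parameters):

```python
def relative_positions(n_tokens, d1_idx, d2_idx):
    """For each token i, its offsets (i - d1_idx, i - d2_idx) to the two
    drug mentions; each offset would index a learned embedding table to
    produce e_i^{p1} and e_i^{p2} respectively."""
    return [(i - d1_idx, i - d2_idx) for i in range(n_tokens)]
```

The offsets are signed, so tokens before and after a drug mention receive distinct position embeddings.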

Features fusion methods
In this section, three different feature-fusion methods for combining the features obtained from subgraphs and texts are described. Their performances were compared, and the best-performing one (Fusion method 2) was selected.

Fusion method 1
A Bi-directional Long Short-Term Memory (BiLSTM) network [48] was used to process the pre-trained embeddings, which yielded fairly desirable results in the studies of Chen et al. [15] and Dou et al. [49]. The BiLSTM further emphasizes contextual features in sentences. e^w is fed into the BiLSTM to obtain forward and backward sentence representations (Fig 2B):

l^w = LSTM_forward(e^w), r^w = LSTM_backward(e^w),

where l^w = (l_1^w, ..., l_n^w), r^w = (r_1^w, ..., r_n^w), l^w, r^w ∈ R^{d_lstm}, and d_lstm represents the hidden-layer size of the LSTM.
The outputs e^lstm = (e_1^lstm, ..., e_n^lstm) and e^t = (e_1^t, ..., e_n^t) were obtained as follows:

e^lstm = [l^w; r^w], e^t = [e^lstm; e^p]. (12)

z_i was then introduced as the concatenation of the k input embeddings around e_i^t, and a convolution was performed on z_i:

e_{i,j} = f(W_j^text ⊙ z_i + b^text),

where ⊙ represents the element-wise product, b^text is a bias term, and f(·) is the GELU function [50]. The weight tensor of the convolution is W^text ∈ R^{d_c×(d_w+2d_p)×k}, W_j^text denotes the j-th slice of W^text, and k is the window size. In addition, max-pooling was used to convert the output of each filter in the convolutional layer into a fixed-size vector e^text. Finally, e^text was concatenated with h_dp as shown in Eq (16):

H = [e^text; h_dp], (16)

where H represents the final result vector, which includes the text features and the subgraph features of the targeted drug pair.

Fusion method 2
The basic process of Fusion method 2 (Fig 1A) is similar to that of Fusion method 1 (Fig 2B). However, the output of BERT is not passed through the BiLSTM but is delivered directly to the CNN after concatenation with the position embeddings. Therefore, all equations are the same as in Fusion method 1 except Eq (12), which becomes e^t = [e^w; e^p].
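A minimal numpy sketch of this pipeline, concatenating contextual and position embeddings, convolving, and max-pooling into one fixed-size sentence vector, is shown below. Shapes and names are ours, and tanh stands in for the paper's GELU non-linearity:

```python
import numpy as np

def fuse_method2(e_w, e_p, W, b):
    """Fusion method 2 sketch: concatenate BERT outputs e_w (n, d_w) with
    position embeddings e_p (n, 2*d_p), run a width-k convolution with d_c
    filters, then max-pool over time to get one fixed-size vector."""
    e_t = np.hstack([e_w, e_p])                     # (n, d_w + 2*d_p)
    d_c, d_in, k = W.shape
    n = e_t.shape[0]
    pad = k // 2
    padded = np.vstack([np.zeros((pad, d_in)), e_t, np.zeros((pad, d_in))])
    conv = np.empty((n, d_c))
    for i in range(n):
        window = padded[i:i + k].T                  # (d_in, k)
        # each filter: elementwise product with the window, then sum
        conv[i] = np.tensordot(W, window, axes=([1, 2], [0, 1])) + b
    # tanh used here as a stand-in for GELU; max-over-time pooling
    return np.tanh(conv).max(axis=0)                # (d_c,)
```

The pooled vector corresponds to e^text, which is then concatenated with the subgraph representation h_dp to form H.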

Fusion method 3
For Fusion method 3 (Fig 2C), the text embeddings e^w and the drug-pair representation h_dp are first concatenated:

e^wdp = [e^w; h_dp].

Second, e_{d_1}^p and e_{d_2}^p, the position embeddings corresponding to drugs d_1 and d_2, are concatenated with the sentence position embedding e^p:

e_{dp}^p = [e^p; e_{d_1}^p; e_{d_2}^p].

Finally, e = [e^wdp; e_{dp}^p] is formed to emphasize the drug-mention location information within the drug-pair representation. e is then passed to the CNN as in Fusion method 1, producing the final result vector H, which includes the text features and the subgraph features of the targeted drug pair.

DDIs extraction using subgraph information
The resulting vector H was used as the input to the prediction layer:

s = W^pred H,

where s = [s_1, ..., s_m] are the prediction scores, W^pred ∈ R^{m×d_p} is a weight matrix that converts H into prediction scores, and m is the number of DDI types. A softmax function then converts s into the probabilities of the possible interactions, p^t = [p_1^t, ..., p_m^t]:

p^t = softmax(s). (21)

Finally, the loss was calculated from p^t (as defined in Eq (21)) using Eq (1).

Experiments
Evaluation metrics. Two common F1 scores, micro-F1 (F1_micro) and macro-F1 (F1_macro), were used to evaluate the performance of the method. Both can be calculated from precision (P) and recall (R). Micro precision, recall and F1 were calculated as follows:

P_micro = Σ_i TP_i / Σ_i (TP_i + FP_i), R_micro = Σ_i TP_i / Σ_i (TP_i + FN_i), F1_micro = 2·P_micro·R_micro / (P_micro + R_micro), (22)

where TP, FP and FN represent the numbers of true positive, false positive and false negative cases, respectively, and i indexes the i-th DDI type. Although F1_micro is a common metric in DDI extraction studies, Eq (22) shows that it gives more weight to the majority classes. In this study, macro precision, recall and F1 were determined as follows:

P_macro = (1/m) Σ_i P_i, R_macro = (1/m) Σ_i R_i, F1_macro = (1/m) Σ_i F1_i. (23)

F1_macro weighs each type equally, regardless of its sample size, according to Eq (23). Opitz [51] (2019) pointed out that F1_macro is more suitable for unevenly distributed datasets. Moreover, we employ AUC and AUPR metrics for both micro-averaged and macro-averaged evaluations: AUC is the area under the receiver operating characteristic curve, and AUPR is the area under the precision-recall curve.
Experimental setting. SubGE-DDI was implemented with the PyTorch library. PubMedBERT was used to convert tokens into 768-dimensional vectors. Drug node embeddings were initialized by Xavier uniform initialization with a gain of √2. The batch size was set to 32, and Adam was used as the optimizer. The initial learning rate and the number of fine-tuning epochs were 5e-5 and 5, respectively. Other important settings are shown in Table 5.
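The difference between the two averages can be made concrete with a minimal implementation (our own sketch, not the paper's evaluation script):

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts, guarding against /0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def micro_macro_f1(per_class):
    """per_class: list of (tp, fp, fn) tuples, one per DDI type.

    Micro-F1 pools the counts first (majority classes dominate);
    macro-F1 averages the per-class F1 scores (all classes equal).
    """
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    micro = prf(tp, fp, fn)[2]
    macro = sum(prf(*c)[2] for c in per_class) / len(per_class)
    return micro, macro
```

With one well-classified majority class and one poorly classified minority class, micro-F1 stays high while macro-F1 drops, which is why macro-F1 is the stricter metric on imbalanced DDI data.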

Model performance
The SubGE-DDI model was evaluated on the test set of SemEval-2013 Task 9, and its performance was compared with that of other state-of-the-art models (Table 6).
Both micro and macro metrics were used to evaluate the performance of the proposed model (F1 micro and F1 macro scores).
To ensure fair and reproducible comparisons, the same pre-processed data were applied to all models, regardless of their original dataset sizes. Compared with the other methods, SubGE-DDI achieved the best F1 scores, precision and AUPR values on both micro-averaged and macro-averaged metrics, obtaining F1_micro, Precision_micro, AUPR_micro, F1_macro, Precision_macro and AUPR_macro of 83.91%, 85.02%, 90.96%, 84.75%, 89.22%, and 73.00%, respectively. The results of SRGU-CNN [52] and RHCNN [37] demonstrate that models using pre-training achieve better performance than models using other machine learning algorithms. In addition, SubGE-DDI outperformed the models that use BERT only (SciBERT [53], BioBERT, and PubMedBERT), further indicating that the subgraph features improve DDI extraction.
The average micro-averaged F1 scores over five-fold cross-validation on the training dataset are shown in Table 7. The model with position features and subgraph features scored higher in all categories than the baseline models, indicating that the proposed framework can effectively alleviate the category imbalance problem.
The ROC curves (receiver operating characteristic curves) and PR curves (precision-recall curves) are shown in Fig 3.

Ablation experiments
Ablation experiments were performed to explore the role of each component of SubGE-DDI. A model using only text features served as the baseline, and both micro-averaged and macro-averaged metrics were compared as additional features were added or removed. The performance of the model was worst when it was based on PubMedBERT alone, and it did not improve significantly after the position embeddings were added (Table 8), indicating that the text features are not significantly related to the position embeddings. However, the model performed better when the subgraph features were added than when the position features were added, and the best performance was achieved when both were added. The improvement from adding position features and subgraph features together was greater than the sum of the improvements from adding them separately (F1_micro: 1.24% vs. -0.23% and 0.5%; F1_macro: 1.90% vs. 0.08% and 0.69%). These findings indicate that the position features help the subgraph features improve the performance of DDI extraction. Interestingly, the model integrating subgraphs achieved the highest recall scores on both micro-averaged and macro-averaged metrics, indicating that leveraging information from drug-pair knowledge subgraphs improves the model's capacity to identify potential DDI relationships. In addition, the three fusion methods were compared on both micro-averaged and macro-averaged metrics using the SemEval-2013 Task 9 test set (Table 9). Fusion method 2 performed best on both metrics and was therefore selected.
Finally, to obtain the most suitable pre-trained language model for our experiments, we analyzed three pre-trained language models: SciBERT, BioBERT and PubMedBERT. Each model was evaluated using the F1 scores of both micro-averaged and macro-averaged metrics under different feature combinations, as shown in Table 10.
The results showed that all three pre-trained language models performed better when all the features were combined than in the 'BERT-only' setting. Because the PubMedBERT model obtained the best F1 scores on both micro-averaged and macro-averaged metrics, it was selected as the pre-trained language model.

The influence of different loss functions
To determine the impact of the multi-focal loss function, we compared the per-type (binary) F1 scores of the model trained with cross-entropy loss and the model trained with multi-focal loss, as shown in Fig 4. The results demonstrate that the multi-focal loss performs better on all DDI types except "Int.", for which its performance is similar to that of the cross-entropy loss (57.81±4.92% vs. 57.90±6.05%). This suggests that the multi-focal loss can alleviate category imbalance in multi-classification tasks, improving the Negative, Mechanism, Effect and Advise types by 0.04%, 1.12%, 0.36%, and 3.86%, respectively.

Error analysis
Our model exhibits a consistent trend of higher precision than recall. To investigate the underlying causes of this pattern, we conducted a comprehensive error analysis, calculating the confusion matrix of the test set both before and after normalization, as depicted in Fig 5. Most importantly, Fig 5B shows that many "Int." samples were incorrectly classified as "Effect", while "Effect" samples were rarely misidentified as "Int.". Analysis of Eq (22) and Eq (23) indicates that there are too few true positive samples for "Int.", which lowers its recall and thus reduces the overall recall. We then reviewed the label definitions in the SemEval-2013 Task 9 dataset, where "Int." is described as "This type is used when a DDI appears in the text without providing any additional information". "Int." is therefore a vague category: if a specific term in the sentence unambiguously refers to a particular DDI category other than "Int.", misclassification may result. To further illustrate, consider the sentence:
• 'Drug1 may interact with aminoglutethimide or Drug2 (causing too great a decrease in adrenal function).'
The sentence links Drug1 and Drug2 through the term "interact", yet the word "decrease" is more proximate to Drug2 than "interact" is. This proximity might confuse the model, potentially leading to an incorrect categorization as 'Effect'.

Case study
In addition, we present four prediction cases as a case study. Cases 1, 2 and 3 are predicted correctly by SubGE-DDI but incorrectly when PubMedBERT is used alone (Table 11); Case 4 shows the opposite result. Cases 1 and 2 present scenarios in which PubMedBERT makes an incorrect prediction while SubGE-DDI predicts correctly, even though DRUG1 and DRUG2 are close together in the sentence.
Hence, leveraging information about drug pairs in the BKG may be beneficial when predictions that rely on the contextual surroundings alone are difficult. Nevertheless, we infer that information obtained from the BKG may lead to an inaccurate prediction in Case 4, where the term "effects" is in close proximity to DRUG1. Despite this, SubGE-DDI still showed good performance in practical tasks.

Conclusion
In this study, a novel framework called SubGE-DDI, which combines external subgraph features from a BKG with semantic text features, is proposed for DDI prediction in biomedical text. Our experimental results indicate that the developed framework can efficiently predict DDIs and that the subgraph features obtained from relevant knowledge graphs can improve DDI extraction. Some limitations remain, which we will address in future work. First, we shall develop more efficient subgraph extraction methods to improve the quality of the subgraphs and hence the performance of SubGE-DDI; to this end, meta-paths or adaptive propagation-depth decision strategies will be adopted [55,56]. In addition, we plan to investigate the application of data augmentation techniques to facilitate DDI extraction.

Fig 1 .
Fig 1. Overview of SubGE-DDI. This figure illustrates the workflow of the SubGE-DDI framework, which consists of three key parts: subgraph information, text features and the fusion part. The subgraph information section is shown in the main figure, and the text features section and fusion part are illustrated in (A). https://doi.org/10.1371/journal.pcbi.1011989.g001

Fig 3 .
Although the model had a strong DDI extraction ability, the category imbalance still existed (Fig 3B).

Table 3. Examples of filtered instances for the defined rules (the mentioned entities are in italic).
Rule 1: Repeated oral administration of coumaphos in sheep: interactions of coumaphos with bishydroxycoumarin, trichlorfon, and phenobarbital sodium.
Rule 2: Other strong inhibitors of CYP3A4 (e.g., itraconazole, clarithromycin, nefazodone, troleandomycin, ritonavir, nelfinavir) would be expected to behave similarly.
Rule 3: The concurrent use of tetracycline and penthrane (methoxyflurane) has been reported to result in fatal renal toxicity.
https://doi.org/10.1371/journal.pcbi.1011989.t003

Table 6 . Evaluation on SemEval-2013 Task 9 test set.
a No reproduction is given because the code is not public or does not apply to the experimental dataset.
b Bold font indicates the best of all results.
https://doi.org/10.1371/journal.pcbi.1011989.t006