MGAE-DC: Predicting the synergistic effects of drug combinations through multi-channel graph autoencoders

Accurate prediction of the synergistic effects of drug combinations can reduce the experimental costs of drug development and facilitate the discovery of novel, efficacious combination therapies for clinical studies. Drug combinations with high synergy scores are regarded as synergistic, while those with moderate or low synergy scores are regarded as additive or antagonistic, respectively. Existing methods usually exploit the synergy data only from the perspective of synergistic drug combinations and pay little attention to additive or antagonistic ones. They also usually do not leverage the common patterns of drug combinations across different cell lines. In this paper, we propose a multi-channel graph autoencoder (MGAE)-based method for predicting the synergistic effects of drug combinations (DC), denoted MGAE-DC for short. An MGAE model is built to learn drug embeddings by treating not only synergistic combinations but also additive and antagonistic ones as three input channels. The latter two channels guide the model to explicitly characterize the features of non-synergistic combinations through an encoder-decoder learning process, so the drug embeddings become more discriminative between synergistic and non-synergistic combinations. In addition, an attention mechanism fuses the drug embeddings learned in each cell line across cell lines, and a set of decoders shared across cell lines extracts a common drug embedding that captures invariant patterns. These invariant patterns further improve the generalization performance of our model. Using both the cell line-specific and the common drug embeddings, our method predicts the synergy scores of drug combinations with a neural network module. Experiments on four benchmark datasets demonstrate that MGAE-DC consistently outperforms state-of-the-art methods.
An in-depth literature survey shows that many drug combinations predicted by MGAE-DC are supported by previous experimental studies. The source code and data are available at https://github.com/yushenshashen/MGAE-DC.


Introduction
Drug combination therapy, a treatment modality that combines two or more therapeutic agents, is a widely used paradigm for various complex diseases such as cancer [1], hypertension [2] and infectious diseases [3]. Compared with monotherapy, drug combination therapy has the advantages of enhancing efficacy, overcoming drug resistance and reducing dose-dependent toxicity [4]. However, most drug combinations show additive effects, i.e., effects equal to the sum of the single-drug administrations, while only a few show synergistic or antagonistic effects, i.e., effects greater or lower than the sum of their individual administrations [5]. Drug combinations with strong synergistic effects, or synergistic drug combinations (SDCs), are attractive new candidate therapies for clinical studies [6].
The effects of drug combinations are context-dependent. There is a pressing need to accurately identify the synergistic effects of drug combinations for a given disease. Early studies for discovering novel SDCs were mainly based on clinical trials. The biggest problem with such trial-based methods is that they may expose patients to unnecessary or even harmful treatments [7]. Although experimental methods such as high-throughput screening can efficiently evaluate the synergy scores of many drug combinations across hundreds of cell lines and reduce the potential harm of trial-based methods, they cannot test the complete drug combination space because of the combinatorial explosion [8]. Therefore, fast and efficient computational methods have become increasingly popular for identifying novel, reliable and efficacious SDCs for clinical studies [9].
Recent methods for drug synergy prediction usually first construct informative features for drugs and cell lines and then build prediction models over these features [10]. The quality of the extracted features for drugs and cell lines is critical to the prediction performance [11]. Early studies relied on hand-crafted features such as the drug molecular fingerprint, a numerical vector indicating the presence of drug substructures. For example, Sidorov et al. trained a random forest (RF) and an extreme gradient boosting (XGBoost) model on drug fingerprints separately for each cell line to predict the synergistic effects of drug combinations [12]. Preuer et al. and Kuru et al. both concatenated drug fingerprints and cell line genomic data as input features, used a normalization strategy to account for the heterogeneity of the input data, and built conical layers to model drug synergies [13,14]. However, the performance of these methods is limited by the design of hand-crafted features, which is labor-intensive and relies on expert experience.
To address these limitations, deep learning methods have been proposed [15]. For instance, Wang et al. developed DeepDDS, which automatically extracts features from drug chemical data and gene expression profiles to predict SDCs for given cancer cell lines [16]. In this model, drug chemical structures are treated as graphs, and drug features are learned by a graph convolutional network (GCN) that efficiently encodes molecular topology. Zhang et al.'s SDCNet [17] treats cell lines as different relations and models the synergy data of drug combinations with a relational GCN (R-GCN), in which nodes are drugs and edges are SDCs. Cell line-specific decoders are adopted to reconstruct the known SDCs and to predict new ones for each cell line, with the help of invariant features of drug combinations learned across cell lines. Readers are referred to [9,10] for a more comprehensive review.
This context dependence also holds across cell lines: a drug combination may be synergistic in one cell line but additive or antagonistic in another [18]. Meanwhile, drug combinations may share invariant patterns across different cell lines [19]. Both the unique and the common features of drug combinations play indispensable roles in drug synergy prediction. In addition, existing methods are usually developed on the data of synergistic combinations and pay little attention to non-synergistic ones, which limits their prediction performance.
In this paper, we propose a novel deep learning method, MGAE-DC, for predicting the synergistic effects of drug combinations across cell lines. Not only synergistic combinations but also additive and antagonistic ones are fed as separate channels into a multi-channel graph autoencoder (MGAE) network. Because the non-synergistic combinations are explicitly modeled, the learned representations of drug combinations are more discriminative between synergy and non-synergy than those of existing methods. After the cell line-specific drug embeddings are obtained via MGAE, an attention mechanism fuses them across cell lines, and decoders shared across cell lines extract a common drug embedding that represents cell line-invariant features. Thus, both cell line-unique and common patterns are captured to improve the generalization performance across different cell lines. We evaluate MGAE-DC against state-of-the-art methods on four benchmark datasets. The comparison results, together with an ablation study, demonstrate the effectiveness of the proposed method in predicting the synergistic effects of drug combinations. A literature study indicates that the predictions of MGAE-DC are well supported by previously reported experimental data.

MGAE-DC architecture
MGAE-DC consists of an embedding module and a predictor module (Fig 1). The embedding module is implemented by an MGAE to learn low-dimensional drug embeddings. As shown in Fig 1A, the synergy data of drug combinations in each cell line are represented as three graphs, i.e., a synergistic graph, an additive graph and an antagonistic graph. The nodes are drugs, and the edges of the three graphs are determined by high, moderate and low synergy scores, respectively. Then, as shown in Fig 1B, MGAE is applied to the graphs of a specific cell line to learn the features of combinations in that cell line; the resulting latent features are called cell line-specific embeddings. An attention mechanism fuses the cell line-specific embeddings of each drug, and the patterns of combinations common to all cell lines are exploited through partial parameter sharing in the decoders. The model is trained by optimizing two types of reconstruction errors: one on the three input graphs of every cell line, and the other induced by the cell line-common decoders, which reconstruct the synergy data of all cell lines in a single loss. Finally, as shown in Fig 1C, a predictor module is trained on the concatenated features of the learned drug embeddings, drug fingerprints and cell line features to predict the synergy scores of drug combinations.

Comparative results on all cell lines
We evaluate the performance of MGAE-DC in predicting the synergy scores of drug combinations on all cell lines, in comparison with state-of-the-art methods. We adopt the stratified 10-fold cross-validation strategy. Table 1  We further adopt leave-one-drug-out, leave-one-cell-line-out and leave-drug-pairs-out strategies to comprehensively evaluate the generalization performance of our method on novel drugs, cell lines and drug pairs. S4 Table summarizes the results on the O'Neil dataset using the Loewe score. Consistent with previous results, all methods achieve relatively low predictive performance when generalizing to novel drugs, cell lines or drug pairs [13,20]. For the leave-one-drug-out strategy, EC-DFR achieves the lowest RMSE of 20.86, PRODeepSyn achieves the highest PCC of 0.46, and our method obtains the second-best performance in terms of all metrics. For the leave-one-cell-line-out strategy, MGAE-DC obtains the lowest RMSE, while MGAE-DC and DeepSynergy both achieve the highest PCC of 0.57. Similarly, for the leave-drug-pairs-out strategy, MGAE-DC obtains the lowest RMSE and the highest PCC, with values of 17.8 and 0.7, respectively. These results demonstrate the robustness of MGAE-DC when generalizing to novel cell lines or drug pairs.
Drug-drug synergy prediction has also been studied in the literature as a classification problem, i.e., predicting whether a drug combination is synergistic. For a comprehensive evaluation, we also adapt MGAE-DC into a classifier and compare it with state-of-the-art classification models on this task. Table 2 reports the classification performance of different methods on the O'Neil dataset using the Loewe score. Because negative samples make up a large proportion of the dataset, AUPR, which is a relatively fair metric on imbalanced data, is taken as the primary metric. MGAE-DC and PRODeepSyn both achieve the best AUPR, with a value of 0.67.
In addition, MGAE-DC achieves the best ACC, F1 and Kappa, with values of 0.95, 0.77 and 0.61, respectively. We conclude that MGAE-DC and PRODeepSyn are competitive methods on the classification task.
To show the methods' performance in more detail, we select three typical cell lines, MDAMB436, ES2 and LNCAP, in which the methods achieve superior, median and inferior performance in terms of RMSE, respectively (Fig 2B). For cell line MDAMB436, the first column shows the scatter plot of MGAE-DC predictions against the ground truth. The red straight line, fitted by least-squares regression between the predicted synergy scores and the ground truth, indicates a strong linear correlation between them. The second and third columns show the distributions of RMSE and PCC of different methods in this cell line, respectively. MGAE-DC and PRODeepSyn achieve similar performance and significantly outperform the other methods on all drug combinations from this cell line. Among the three types of drug combinations, additive combinations are the majority but are less important, whereas synergistic and antagonistic combinations are rare but are more attractive candidates for clinical study. Therefore, we exclude the additive combinations and further evaluate the methods on the synergistic and antagonistic combinations separately. The fourth column shows that MGAE-DC outperforms the other methods on both synergistic and antagonistic combinations, which are of most interest to us. The results in the other two cell lines, ES2 and LNCAP, are similar and further demonstrate the effectiveness of MGAE-DC in predicting the synergistic effects of drug combinations in a specific cell line.

Ablation study
We conduct an ablation study to investigate the contributions of the cell line-common drug embeddings and of the two input channels of additive and antagonistic drug combinations. Specifically, we evaluate and compare the performance of the following variants of MGAE-DC:
• MGAE-DC (FIN) is the variant of MGAE-DC that uses only the drug molecular fingerprint as drug features.
• MGAE-DC (SPE) is the variant of MGAE-DC that uses the drug molecular fingerprint and the corresponding cell line-specific embeddings as drug features.
• MGAE-DC (COM) is the variant of MGAE-DC that uses the drug molecular fingerprint and the corresponding cell line-common embeddings as drug features.
• MGAE-DC (SYN) is the variant of MGAE-DC that only uses the synergistic combinations to learn the drug embeddings.
• MGAE-DC (SYN+ADD) is the variant of MGAE-DC that uses the synergistic and additive combinations to learn the drug embeddings.
Table 3 summarizes the results of the ablation study on the O'Neil dataset using the Loewe score. Compared with MGAE-DC (FIN), MGAE-DC achieves a lower RMSE and a higher PCC, indicating that the drug embeddings learned from the synergy data of drug combinations provide information beyond the molecular fingerprint features. MGAE-DC also outperforms MGAE-DC (SPE) and MGAE-DC (COM), which demonstrates that the cell line-specific and -common drug embeddings both contribute to the prediction and contain complementary information. When the additive or antagonistic combinations are added as input channels, we observe that each channel benefits training and improves on the model trained on the single channel of synergistic drug combinations. Moreover, the antagonistic channel is more effective than the additive one. Finally, when the low-dimensional cell line embeddings by Wang et al. [11], which integrate the PPI network with omics data, are incorporated, the resulting variant, MGAE-DC (CELL), further improves on MGAE-DC.

Interpretation of the model
Because the drug embedding in a specific cell line is critical in determining whether drug combinations are synergistic in that cell line, we examine whether the model captures cell line-specific features in the synergy data of drug combinations. To visualize the cell line-specific drug embeddings, we map them into a two-dimensional space with t-SNE. Taking the drug Zolinza as an example (Fig 3), we find that its embeddings in cell lines belonging to the same tissue are clustered together. Moreover, for many cell lines, the Euclidean distances between the embeddings are similar to the corresponding synergy distances, which is consistent with the same drug combination behaving similarly in cell lines with similar embeddings. The embeddings of other drugs show the same phenomenon (S1 Fig), demonstrating that the model can capture cell line-specific features in the synergy data of drug combinations.

The effect of different predictor
To evaluate the prediction performance of MGAE-DC more comprehensively, we replace the neural network (NN) predictor with other predictors, including the light gradient boosting machine (LightGBM), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), random forest (RF) and support vector regressor (SVR). Table 4 summarizes the performance of models with different predictors on the O'Neil dataset with the Loewe score. The best performance, with the lowest RMSE of 12.73 and the highest PCC of 0.83, is achieved with the NN. One possible reason is that our NN incorporates batch normalization, which can reduce the dependence of the model on parameter initialization, accelerate convergence and improve generalization.

Case study
We further analyze the predictions of MGAE-DC for previously untested drug combinations based on the O'Neil dataset with the Loewe score. An in-depth literature survey shows that many cases are consistent with previous studies. For example, vinorelbine and paclitaxel interfere with mitotic spindle function through different mechanisms of action,   [26], 5-FU [27] and doxorubicin [28], respectively, and additive effects were found for all of these combinations. The predictions of MGAE-DC are consistent with these studies, with predicted synergy scores of 1.14, -12.10, 3.62 and -10.22, respectively. In summary, the above in vivo and in vitro experimental results demonstrate the potential of MGAE-DC for predicting novel, reliable SDCs for clinical study.

Discussion
In this paper, we propose an MGAE-based method, MGAE-DC, for predicting the synergistic effects of drug combinations. Our method considers the synergy data from the perspective of not only synergistic combinations but also additive and antagonistic ones, and integrates both the unique and the common features of drug combinations across different cell lines. Experiments on four benchmark datasets demonstrate that MGAE-DC achieves consistent and robust performance and outperforms state-of-the-art methods. MGAE-DC can be a valuable tool for facilitating the discovery of rational combination therapies for clinical study.
In the ablation study, we have demonstrated that the cell line-specific and -common drug embeddings contain complementary information and both benefit the prediction performance. Moreover, the additive and antagonistic drug combination data, which are explicitly incorporated, complement the synergistic combination data by enhancing the discriminative capacity of the drug embeddings. Furthermore, integrating the cell line embeddings learned from the PPI network and omics data further improves the prediction results; since the synergistic effects of drug combinations are context-dependent, an accurate context embedding is helpful.
MGAE-DC still has some shortcomings. Besides the synergy data of drug combinations, other data resources such as drug response data, drug-protein interactions and drug-disease interactions may also improve the model performance, but they have not yet been included in MGAE-DC. For example, Jiang et al. leveraged the synergy data of drug combinations, drug-protein interactions and protein-protein interactions to build heterogeneous graph embeddings that improve model performance in specific cell lines [29]. Hence, incorporating more prior knowledge to build more powerful and robust prediction models is a direction for our future work. In addition, a strong synergistic effect does not guarantee that a drug combination is effective; other characteristics such as sensitivity and side effects are also critical for clinical study [18]. Therefore, beyond the synergistic effects of drug combinations, we are interested in developing models that simultaneously consider the sensitivities and side effects of combinations when making predictions.

Data collection and preprocessing
The synergy data of drug combinations are mainly collected from four datasets: O'Neil, ALMANAC, CLOUD and FORCINA [30]. Each drug combination in the datasets is represented by a drug-drug-cell line triplet, and its synergistic effect is quantified by four synergy types, namely Loewe additivity (Loewe), Bliss independence (Bliss), zero interaction potency (ZIP) and highest single agent (HSA). In general, combinations with higher synergy scores are more synergistic, and vice versa [18]. A drug combination can be roughly classified as a synergistic, additive or antagonistic combination according to the thresholds of each synergy type. Following previous studies [31], the thresholds are {0, 30} for the Loewe score, {-3.37, 3.68} for the Bliss score, {-3.02, 3.87} for the HSA score and {-4.48, 2.64} for the ZIP score: combinations with scores higher than the larger value are synergistic, combinations with scores lower than the smaller value are antagonistic, and the remaining combinations are additive. Table 5 summarizes the synergy data of drug combinations in the different datasets for each synergy type.
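The thresholding rule above can be sketched in a few lines. This is a minimal illustration, not the authors' preprocessing code: the names `THRESHOLDS`, `label_combination` and `binary_label` are hypothetical, and scores exactly equal to a threshold are assumed to be additive because the text only says "higher than" and "lower than". The binary label follows the classification setting described under "Methods for comparisons" (synergistic = positive, all others = negative).

```python
# Hypothetical helper: (antagonistic_below, synergistic_above) per synergy type,
# taken from the thresholds listed in the text.
THRESHOLDS = {
    "Loewe": (0.0, 30.0),
    "Bliss": (-3.37, 3.68),
    "HSA": (-3.02, 3.87),
    "ZIP": (-4.48, 2.64),
}


def label_combination(score: float, synergy_type: str) -> str:
    """Map a synergy score to 'synergistic', 'additive' or 'antagonistic'."""
    low, high = THRESHOLDS[synergy_type]
    if score > high:
        return "synergistic"
    if score < low:
        return "antagonistic"
    # Scores on or between the two thresholds are treated as additive (assumption at the boundaries).
    return "additive"


def binary_label(score: float, synergy_type: str) -> int:
    """Classification setting: synergistic combinations are positives (1), all others negatives (0)."""
    return int(label_combination(score, synergy_type) == "synergistic")
```

For example, a Loewe score of 35.2 is labeled synergistic, while a ZIP score of 3.0 is synergistic but a Bliss score of 3.0 is additive, which shows why the thresholds must be chosen per synergy type.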

The embedding module of MGAE-DC
Construction of the graphs in each cell line. The synergy data of drug combinations in an arbitrary cell line $r$ are represented as three graphs according to their synergy scores (Fig 1A): the synergistic graph $G_{r|1} = (V, E)$, the additive graph $G_{r|2} = (V, E)$ and the antagonistic graph $G_{r|3} = (V, E)$, where $V$ is the set of nodes (drugs) and $E$ is the set of edges in the corresponding graph. The adjacency matrices of the graphs are denoted as $A_{r|t} \in \mathbb{R}^{N \times N}$, $t = 1, 2, 3$, where $N$ denotes the number of drugs across all cell lines. The entries $A_{r|1}(i, j)$, $A_{r|2}(i, j)$ and $A_{r|3}(i, j)$ are set to 1 if the combination of drug $i$ and drug $j$ in cell line $r$ is synergistic, additive or antagonistic, respectively, and to 0 otherwise. Since the three graphs are undirected, $A_{r|t}(i, j) = A_{r|t}(j, i)$.
MGAE for graphs in a specific cell line. MGAE is applied to the three graphs to extract the cell line-specific drug embeddings, which contain the unique features of combinations in a specific cell line (Fig 1B).
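The graph construction can be illustrated with NumPy as a stack of three symmetric 0/1 matrices, one per channel. This is a hedged sketch: `build_cell_line_graphs`, `CHANNELS` and the input layout (drug-name pairs plus a label per pair, and a drug-to-row mapping shared by all cell lines so that every cell line uses the same $N$) are assumptions for illustration.

```python
import numpy as np

# Hypothetical channel order: t = 1, 2, 3 in the text maps to indices 0, 1, 2 here.
CHANNELS = {"synergistic": 0, "additive": 1, "antagonistic": 2}


def build_cell_line_graphs(pairs, labels, drug_index):
    """Return a (3, N, N) array of adjacency matrices for one cell line.

    pairs: list of (drug_a, drug_b) combinations measured in this cell line
    labels: matching list of 'synergistic' | 'additive' | 'antagonistic'
    drug_index: dict mapping drug name -> row index, shared across all cell lines
    """
    n = len(drug_index)
    adj = np.zeros((3, n, n), dtype=np.float32)
    for (drug_a, drug_b), label in zip(pairs, labels):
        i, j = drug_index[drug_a], drug_index[drug_b]
        t = CHANNELS[label]
        adj[t, i, j] = 1.0
        adj[t, j, i] = 1.0  # undirected graph: keep the matrix symmetric
    return adj
```

Each drug pair in a cell line lands in exactly one of the three channels, so the three matrices never share an edge for the same cell line.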
Encoder: learning the cell line-specific drug embeddings. Taking the three graphs of an arbitrary cell line $r$ and additional drug feature vectors as input, the encoder produces embeddings for the drugs. The three graphs are treated as a relational graph whose relations correspond to the graph types. The encoder assigns a separate processing channel to each type of graph, and then propagates and transforms information both within each graph and across the graphs. Specifically, the single-layer update rule for drug $i$ is defined by Eq (1):

$$h_{i,r}^{(l+1)} = \sigma\left( \sum_{t=1}^{3} \alpha_{r|t} \sum_{j \in \mathcal{N}_{i,r|t}} \frac{1}{|\mathcal{N}_{i,r|t}|} h_{j,r}^{(l)} W_{r|t}^{(l)} + h_{i,r}^{(l)} W_{0,r}^{(l)} \right) \quad (1)$$

where $h_{i,r}^{(l)} \in \mathbb{R}^{1 \times d}$ is the embedding of drug $i$ in cell line $r$ at the $l$-th layer and $d$ denotes the dimensionality of the drug embedding. $\mathcal{N}_{i,r|t}$ denotes the set of neighbor nodes of drug $i$ in the $t$-th graph, and $|\mathcal{N}_{i,r|t}|$, the number of these neighbors, is used as a normalization constant. $\alpha_{r|t}$ is a learnable parameter that weighs the contribution of drug $i$'s neighbors from the $t$-th graph; it represents graph-level attention within a specific cell line, so we call it specific attention. $W_{r|t}^{(l)}$ is a trainable graph-specific weight matrix at the $l$-th layer, while $W_{0,r}^{(l)}$ is a weight matrix for drug $i$ itself at the $l$-th layer. The function $\sigma(x)$ is the ReLU activation function. We initialize the drug embedding $h_{i,r}^{(0)}$ with the corresponding drug molecular fingerprint. Multiple layers are stacked to make the model more expressive and aware of the graph structure. Because each layer contributes differently to the final prediction, an attention mechanism is employed to calculate the cell line-specific embedding of drug $i$ in cell line $r$ through Eq (2):

$$h_{i,r}^{spe} = \sum_{l=1}^{L} b_r^{(l)} h_{i,r}^{(l)} \quad (2)$$

where $b_r^{(l)}$ is a trainable parameter that indicates the attention weight of the $l$-th layer in cell line $r$, and $L = 3$ is the number of network layers.
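A forward pass of the encoder can be sketched in NumPy. This is an illustrative sketch only, not the released TensorFlow implementation: the function names are hypothetical, the graph-level weights $\alpha_{r|t}$ and layer weights $b_r^{(l)}$ are plain arrays rather than trained parameters, and it assumes the update of Eq (1) is a mean over each graph's neighbors, scaled by $\alpha_{r|t}$, plus a self term, followed by ReLU. Drugs with no neighbors in a graph receive no message from it.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def encoder_layer(H, adj, W_rel, W_self, alpha):
    """One relational message-passing layer over the three graphs of a cell line.

    H: (N, d_in) drug embeddings at layer l
    adj: (3, N, N) adjacency matrices (synergistic, additive, antagonistic)
    W_rel: (3, d_in, d_out) graph-specific weights W_{r|t}
    W_self: (d_in, d_out) self-connection weight W_{0,r}
    alpha: (3,) graph-level "specific attention" weights alpha_{r|t}
    """
    out = H @ W_self
    for t in range(adj.shape[0]):
        deg = adj[t].sum(axis=1, keepdims=True)
        # Divide each row by |N_{i,r|t}|; rows with no neighbours stay all-zero.
        norm_adj = np.divide(adj[t], deg, out=np.zeros_like(adj[t]), where=deg > 0)
        out = out + alpha[t] * (norm_adj @ H @ W_rel[t])
    return relu(out)


def encode(X, adj, params, b):
    """Stack layers starting from fingerprints X, then combine layer outputs with weights b."""
    H = X
    layer_outputs = []
    for W_rel, W_self, alpha in params:
        H = encoder_layer(H, adj, W_rel, W_self, alpha)
        layer_outputs.append(H)
    # Layer-level attention: weighted sum of the L layer outputs.
    return sum(b_l * h_l for b_l, h_l in zip(b, layer_outputs))
```

The first layer maps fingerprint dimensions to $d$, and later layers map $d$ to $d$; normalizing by the degree keeps high-degree drugs from dominating the message sum.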
Decoder. Three decoders are adopted to reconstruct the three graphs from the cell line-specific drug embeddings learned by the encoder. The adjacency matrices of the reconstructed synergistic, additive and antagonistic graphs are denoted as $\hat{A}_{r|t} \in \mathbb{R}^{N \times N}$, $t = 1, 2, 3$. Each entry of a reconstructed adjacency matrix is a probability score indicating how likely the corresponding drug pair is connected. Specifically, using the embeddings of drug $i$ and drug $j$ learned by the encoder, the reconstructed adjacency matrix is calculated through Eq (3):

$$\hat{A}_{r|t}(i, j) = \sigma\left( h_{i,r}^{spe} D_{r|t} W_r D_{r|t} \left(h_{j,r}^{spe}\right)^{\top} \right) \quad (3)$$

where $W_r \in \mathbb{R}^{d \times d}$ is a trainable weight matrix shared by the different graphs, modeling global interactions of drug combinations across the graphs in cell line $r$; $D_{r|t} \in \mathbb{R}^{d \times d}$ is a diagonal matrix that captures the importance of each dimension of the drug embeddings $h_{i,r}^{spe}$ and $h_{j,r}^{spe}$ for the $t$-th graph of cell line $r$; and $\sigma$ is the sigmoid function.
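The decoder can be sketched as a bilinear score with one diagonal matrix per graph and a matrix shared by the three graphs. This is a hedged NumPy illustration, assuming the form $\sigma(h_i D_t W D_t h_j^{\top})$; `decode` is a hypothetical name, and the diagonal of each $D_{r|t}$ is passed as a row of `D_diag`.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode(Z, W_shared, D_diag):
    """Reconstruct the three adjacency matrices of one cell line.

    Z: (N, d) cell line-specific drug embeddings
    W_shared: (d, d) matrix shared by the three graphs (W_r)
    D_diag: (3, d) diagonal entries of D_{r|t}, one row per graph
    Returns a (3, N, N) array of edge probabilities.
    """
    recon = []
    for t in range(D_diag.shape[0]):
        D = np.diag(D_diag[t])
        logits = Z @ D @ W_shared @ D @ Z.T
        recon.append(sigmoid(logits))
    return np.stack(recon)
```

Sharing `W_shared` across graphs lets the three channels exchange information, while each graph only learns $d$ extra diagonal weights. The output is symmetric only when `W_shared` is symmetric, so a symmetric matrix is a natural (assumed) choice for undirected graphs.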

Loss function.
To make the reconstructed graphs consistent with the original input graphs, we sum the mean squared errors (MSE) of the three graphs as the loss function, calculated by Eq (4):

$$\mathcal{L}_r = \sum_{t=1}^{3} \frac{1}{|y_{r|t}|} \sum_{(i,j) \in y_{r|t}} \left( A_{r|t}(i, j) - \hat{A}_{r|t}(i, j) \right)^2 \quad (4)$$

where $(i, j)$ denotes the combination of drug $i$ and drug $j$ in cell line $r$, $y_{r|t}$ represents the set of combinations in the $t$-th graph, and $|y_{r|t}|$ is the number of combinations in the $t$-th graph, used to normalize the loss value.
MGAE for graphs in all cell lines. MGAE is applied to the graphs of all cell lines to extract the cell line-common drug embeddings, which capture the features of combinations shared among different cell lines.
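The per-cell-line loss can be sketched as one MSE per graph, each averaged over that graph's own set of scored pairs, then summed over the three graphs. This NumPy sketch is illustrative: `cell_line_loss` is a hypothetical name, and which pairs go into each $y_{r|t}$ (for example, whether sampled non-edges are included) is left to the caller because the text does not specify it.

```python
import numpy as np


def cell_line_loss(adj, recon, pairs_per_graph):
    """Sum over graphs t of the MSE on the scored pairs y_{r|t}.

    adj, recon: (3, N, N) true and reconstructed adjacency matrices
    pairs_per_graph: list of 3 lists of (i, j) index pairs
    """
    total = 0.0
    for t, pairs in enumerate(pairs_per_graph):
        if not pairs:
            continue  # an empty graph contributes nothing
        idx_i, idx_j = np.array(pairs).T
        err = adj[t, idx_i, idx_j] - recon[t, idx_i, idx_j]
        total += float(np.mean(err ** 2))  # divide by |y_{r|t}|
    return total
```

Normalizing each graph by its own size keeps the large additive graph from dominating the smaller synergistic and antagonistic graphs.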
Learning the cell line-common drug embeddings. The cell line-common drug embeddings are merged from the cell line-specific drug embeddings. Since the embeddings from each cell line contribute differently to the prediction, we adopt a cell line-level attention mechanism, which we call common attention. Specifically, given the cell line-specific embeddings $h_{i,r}^{spe}$ of drug $i$ in all cell lines, its cell line-common embedding is calculated using Eq (5):

$$h_i^{com} = \sum_{r=1}^{R} \alpha_r h_{i,r}^{spe} \quad (5)$$

where $R$ is the number of cell lines and $\alpha_r$ is a trainable parameter that indicates the attention weight of the drug embeddings from cell line $r$.
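The common-attention fusion is a weighted sum over the cell-line axis, which NumPy expresses as a single contraction. In this sketch `common_embedding` is a hypothetical name, and the weights are used as given: whether $\alpha_r$ is normalized (for example, with a softmax) is not stated in the text, so no normalization is applied.

```python
import numpy as np


def common_embedding(Z_specific, alpha):
    """Fuse cell line-specific embeddings into cell line-common ones.

    Z_specific: (R, N, d) specific embeddings for R cell lines
    alpha: (R,) cell line-level attention weights alpha_r
    Returns (N, d) with h_i^com = sum_r alpha_r * h_{i,r}^spe.
    """
    return np.tensordot(alpha, Z_specific, axes=(0, 0))
```

For two cell lines whose embeddings are all 1 and all 3, equal weights of 0.5 give a common embedding of 2 everywhere.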
Decoder. $3R$ decoders are adopted to reconstruct the graphs of all cell lines from the cell line-common drug embeddings learned above. The adjacency matrices of the reconstructed graphs in cell line $r$ are denoted as $\hat{M}_{r|t} \in \mathbb{R}^{N \times N}$, $t = 1, 2, 3$. Using the cell line-common embeddings of drug $i$ and drug $j$, the reconstructed adjacency matrix is calculated through Eq (6):

$$\hat{M}_{r|t}(i, j) = \sigma\left( h_i^{com} D_{r|t} W D_{r|t} \left(h_j^{com}\right)^{\top} \right) \quad (6)$$

where $W \in \mathbb{R}^{d \times d}$ is a trainable weight matrix shared by the graphs of all cell lines, modeling global interactions of drug combinations across the different graphs; $D_{r|t} \in \mathbb{R}^{d \times d}$ is a diagonal matrix that captures the importance of each dimension of the drug embeddings $h_i^{com}$ and $h_j^{com}$ for the $t$-th graph in cell line $r$; and $\sigma$ is the sigmoid function.
Loss function. The MSEs of the graphs from all cell lines are summed as the loss function, calculated by Eq (7):

$$\mathcal{L}_{com} = \sum_{r=1}^{R} \sum_{t=1}^{3} \frac{1}{|y_{r|t}|} \sum_{(i,j) \in y_{r|t}} \left( A_{r|t}(i, j) - \hat{M}_{r|t}(i, j) \right)^2 \quad (7)$$

where $(i, j)$ denotes the combination of drug $i$ and drug $j$ in cell line $r$, $y_{r|t}$ represents the set of combinations in the $t$-th graph of cell line $r$, and $|y_{r|t}|$ is the number of combinations in that graph, used to normalize the loss value.
End-to-end training of the embedding module. We adopt end-to-end optimization to train the embedding module so that it simultaneously learns low-dimensional cell line-specific and -common drug embeddings. The total loss combines the losses of the models for each cell line with the loss of the model for all cell lines, as calculated by Eq (8):

$$\mathcal{L} = \sum_{r=1}^{R} \mathcal{L}_r + \mathcal{L}_{com} \quad (8)$$

where $R$ is the number of cell lines. All trainable parameters of the model are first initialized with the Xavier uniform initialization method and then jointly optimized by gradient descent with the Adam optimizer. The model is implemented in TensorFlow (version 2).
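The cell line-common branch and the total loss can be sketched together in NumPy. This is a hedged illustration with hypothetical names: it assumes one shared $W$ with a separate diagonal $D_{r|t}$ per cell line and graph (the $3R$ decoders), and reads Eq (8) as an unweighted sum of the per-cell-line losses and the common loss. Any weighting between the terms would be an extra assumption.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def common_reconstruction(H_com, W, D_diag):
    """H_com: (N, d); W: (d, d) shared by all 3R graphs; D_diag: (R, 3, d). Returns (R, 3, N, N)."""
    R, T, _ = D_diag.shape
    n = H_com.shape[0]
    out = np.empty((R, T, n, n))
    for r in range(R):
        for t in range(T):
            D = np.diag(D_diag[r, t])
            out[r, t] = sigmoid(H_com @ D @ W @ D @ H_com.T)
    return out


def masked_mse(true, pred, pairs):
    """MSE over the listed (i, j) pairs of one graph; 0 if the graph has no scored pairs."""
    if not pairs:
        return 0.0
    i, j = np.array(pairs).T
    return float(np.mean((true[i, j] - pred[i, j]) ** 2))


def common_loss(adj_all, recon_all, pairs_all):
    """L_com: sum over cell lines r and graphs t of the per-graph MSE."""
    return sum(
        masked_mse(adj_all[r, t], recon_all[r, t], pairs_all[r][t])
        for r in range(adj_all.shape[0])
        for t in range(adj_all.shape[1])
    )


def total_loss(specific_losses, l_com):
    """Assumed unweighted reading of the total loss: sum_r L_r + L_com."""
    return float(sum(specific_losses)) + float(l_com)
```

Because a single `W` serves every cell line, the common embeddings are pushed toward structure that holds across cell lines, while the per-(r, t) diagonals absorb cell line-specific scaling.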

The predictor module of MGAE-DC
The predictor module takes the features of two drugs and one cell line as input and predicts the synergy score of the drug combination. Fig 1C shows the architecture of the predictor module, a three-layer neural network with batch normalization. Three types of features are considered for each drug: its cell line-specific embedding and its cell line-common embedding, both learned from the graph-structured synergy data of drug combinations in the embedding module, and its molecular fingerprint. The MSE loss function is used to train the predictor module.
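The input assembly and a forward pass of the predictor can be sketched in NumPy. This is a hypothetical illustration, not the paper's TensorFlow model. The feature order (specific, common, fingerprint for each drug, then cell line features), the placement of batch normalization before ReLU, the use of batch statistics without learned scale and shift, and the layer sizes in the usage example are all assumptions. Training with the MSE loss is not shown.

```python
import numpy as np


def build_features(fp, spe, com, cell_feat, i, j, r):
    """Concatenate features of drug i, drug j (specific, common, fingerprint) and cell line r.

    fp: (N, f) fingerprints; spe: (R, N, d) specific embeddings;
    com: (N, d) common embeddings; cell_feat: (R, c) cell line features.
    """
    drug_i = np.concatenate([spe[r, i], com[i], fp[i]])
    drug_j = np.concatenate([spe[r, j], com[j], fp[j]])
    return np.concatenate([drug_i, drug_j, cell_feat[r]])


def batch_norm(x, eps=1e-5):
    # Normalize each unit over the batch (batch statistics, no learned scale/shift).
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


def predictor_forward(X, weights):
    """Three dense layers: hidden layers use batch norm + ReLU; the last outputs one score."""
    h = X
    for k, (W, b) in enumerate(weights):
        h = h @ W + b
        if k < len(weights) - 1:
            h = np.maximum(batch_norm(h), 0.0)
    return h[:, 0]
```

A usage example with illustrative sizes: stack several `build_features(...)` rows into a batch `X`, create weights for dimensions `[X.shape[1], 32, 16, 1]`, and `predictor_forward(X, weights)` returns one predicted synergy score per drug-drug-cell line triplet.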

Methods for comparisons
To evaluate MGAE-DC in predicting the synergy scores of drug combinations, which is a regression task, we compare it with state-of-the-art methods including DeepSynergy [13], Matchmaker [14], PRODeepSyn [11], EC-DFR [20], HypergraphSynergy [32], TranSynergy [33] and SynPred [34]. Since some existing methods treat the prediction as a classification task, we also compare MGAE-DC with such methods, including DTF [23], DeepDDS [16], Jiang's method [29], SynPathy [35] and SDCNet [17]. For the classification task, synergistic drug combinations are labeled as positive samples, while all other combinations are treated as negative samples. All existing methods are reproduced using their publicly available programs. Detailed information on the existing methods is summarized in S5 Table.

Cross-validation strategies
For the experimental setup, we first perform stratified 10-fold cross-validation to evaluate the performance of MGAE-DC (Fig 4). In each cell line, the synergistic, additive and antagonistic drug combinations are each randomly split into ten equal-sized subsets. In each fold, one subset is selected from each type of combination, the selected subsets from all cell lines are concatenated as the test set, and the remaining samples form the training set. Then, to comprehensively evaluate the predictive and generalization performance, we further perform leave-one-drug-out, leave-one-cell-line-out and leave-drug-pairs-out cross-validation (Fig 4). In the leave-one-cell-line-out strategy, each cell line is used as the test cell line in turn: the samples in the test cell line form the test set, and the data of the other cell lines form the training set. Similarly, the leave-one-drug-out strategy treats each drug as the test drug in turn: the samples containing the test drug form the test set, and the remaining samples form the training set. The leave-drug-pairs-out strategy first randomly splits the drug pairs into ten equal-sized subsets; each subset is then used as the test drug pairs in turn, the samples containing the test drug pairs form the test set, and the remaining samples form the training set.
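The stratified split and the leave-one-drug-out mask can be sketched in NumPy. These helpers are hypothetical illustrations of the procedures described above, not the authors' code. `stratified_folds` assigns every (cell line, combination type) group across ten folds, so each test fold contains one subset of every type from every cell line. When a group size is not divisible by ten, `np.array_split` yields near-equal subsets, with the earlier folds receiving one extra sample.

```python
import numpy as np


def stratified_folds(cell_lines, labels, n_folds=10, seed=0):
    """Return a fold id in [0, n_folds) per sample, stratified by (cell line, label)."""
    rng = np.random.default_rng(seed)
    cell_lines = np.asarray(cell_lines)
    labels = np.asarray(labels)
    folds = np.empty(len(labels), dtype=int)
    for cl in np.unique(cell_lines):
        for lab in np.unique(labels):
            idx = np.flatnonzero((cell_lines == cl) & (labels == lab))
            rng.shuffle(idx)  # random split within the group
            for k, part in enumerate(np.array_split(idx, n_folds)):
                folds[part] = k
    return folds


def leave_one_drug_out(drug_a, drug_b, held_out):
    """Boolean (train, test) masks: test samples are those containing the held-out drug."""
    test = (np.asarray(drug_a) == held_out) | (np.asarray(drug_b) == held_out)
    return ~test, test
```

Fold `k` is used as the test set by selecting `folds == k`. The leave-one-cell-line-out split is the same kind of mask built on the cell line column instead of the drug columns.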

Hyperparameters selection
We consider different hyperparameter settings for the model, including the dimensionality of the drug embeddings, the learning rate, the dropout rate and the hidden units in the predictor  Table). The best hyperparameters are determined by grid search and are shown in boldface.

Evaluation metrics
The primary evaluation metrics for the regression task, i.e., predicting the synergy scores of drug combinations, are the MSE, the 95% confidence interval of the MSE, the root mean square error (RMSE) and the Pearson correlation coefficient (PCC). For the classification task, i.e., predicting whether drug combinations are SDCs, commonly used metrics are adopted, including the area under the receiver operating characteristic curve (AUC), accuracy (ACC), the area under the precision-recall curve (AUPR), precision and Cohen's Kappa.
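Several of these metrics can be written directly in NumPy. This is a minimal sketch: the function names are hypothetical, and AUPR is computed here as average precision (the mean of precision at the rank of each positive), which is one common estimator. The paper's exact AUPR computation, for example trapezoidal interpolation, is not specified, so results from other estimators may differ slightly. How the 95% confidence interval of the MSE is obtained is also not specified and is not shown.

```python
import numpy as np


def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def pcc(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def aupr(y_true, scores):
    """Average precision: mean of precision@k over the ranks k of the positive samples."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, float), kind="stable")
    hits = y_true[order] == 1
    cum_tp = np.cumsum(hits)
    ranks = np.arange(1, len(hits) + 1)
    return float(np.mean(cum_tp[hits] / ranks[hits]))


def cohen_kappa(y_true, y_pred):
    """(observed agreement - chance agreement) / (1 - chance agreement)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p_o = np.mean(y_true == y_pred)
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in np.union1d(y_true, y_pred))
    return float((p_o - p_e) / (1 - p_e))
```

With positives ranked first and third among four predictions, the average precision is (1/1 + 2/3)/2 ≈ 0.83. This shows why AUPR penalizes positives that are ranked below negatives, which matters on the imbalanced synergy data.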