Drug combination sensitivity scoring facilitates the discovery of synergistic and efficacious drug combinations in cancer

High-throughput drug screening has facilitated the discovery of drug combinations in cancer. Many existing studies adopted a full matrix design, aiming for the characterization of drug pair effects for cancer cells. However, the full matrix design may be suboptimal as it requires a drug pair to be combined at multiple concentrations in a full factorial manner. Furthermore, many of the computational tools assess only the synergy but not the sensitivity of drug combinations, which might lead to false positive discoveries. We proposed a novel cross design to enable a more cost-effective and simultaneous testing of drug combination sensitivity and synergy. We developed a drug combination sensitivity score (CSS) to determine the sensitivity of a drug pair, and showed that the CSS is highly reproducible between the replicates and thus supported its usage as a robust metric. We further showed that CSS can be predicted using machine learning approaches which determined the top pharmaco-features to cluster cancer cell lines based on their drug combination sensitivity profiles. To assess the degree of drug interactions using the cross design, we developed an S synergy score based on the difference between the drug combination and the single drug dose-response curves. We showed that the S score is able to detect true synergistic and antagonistic drug combinations at an accuracy level comparable to that using the full matrix design. Taken together, we showed that the cross design coupled with the CSS sensitivity and S synergy scoring methods may provide a robust and accurate characterization of both drug combination sensitivity and synergy levels, with minimal experimental materials required. Our experimental-computational approach could be utilized as an efficient pipeline for improving the discovery rate in high-throughput drug combination screening, particularly for primary patient samples which are difficult to obtain.


Introduction
Despite great advances in the understanding of cancer, there remains a major challenge to develop more effective anti-cancer treatments. Next generation sequencing has revealed the intrinsic heterogeneity in cancer genomes, which partly explains why patients respond differently to the same therapy [1]. To reach durable clinical responses, cancer patients who relapse and become refractory to standard chemotherapy need novel multi-targeted drug combinations which can effectively overcome the emergence of drug resistance [2][3][4]. Ideally, a potential drug combination should achieve therapeutic efficacy at reduced dosages, and therefore minimize the toxicity and other side effects associated with high doses of single drugs [5][6]. Therefore, two important properties for a drug combination must be evaluated: sensitivity and synergy. Sensitivity of a drug combination is defined as the level of treatment response, usually measured in the unit of percentage inhibition of cell viability or growth. In contrast, synergy of a drug combination is referred to the degree of drug interactions that contributes to the drug combination sensitivity independent of the single drug effects [7].
In order to identify sensitive and synergistic drug combinations, high-throughput drug screening has been applied on a large variety of cancer cell lines and more recently on patientderived cancer samples [8][9]. Many high-throughput drug combination screens test pairs of drugs at a dose matrix, for which the cell viability or growth inhibition effects are measured [10][11]. The dose-response matrix results from a full factorial design that involves multiple dose combinations of a drug pair, and thus demands a relatively large amount of cancer cells. For patient-derived cancer samples, which are challenging to obtain and restricted in volume, the full matrix design may be infeasible to test even with a minimal number of drug combinations. Furthermore, cancer samples of different genetic profiles are known to respond differently to the same treatment [12]. With the limited amount of drug combination data points, it becomes a daunting task for any machine learning approach to navigate the combinatorial space to pinpoint the most promising drug combinations that are selectively effective for individual cancer samples [13].
Furthermore, many existing computational tools for drug combination analysis focus on the degree of interaction, i.e. drug synergy, but not the sensitivity of drug combinations. For example, Combenefit [14] and SynergyFinder [15] have been developed to provide multiple reference models to score drug synergy, such that drug combinations that produce higher growth inhibition effects compared to the single drugs will be prioritized. There have been multiple methods to score drug synergy [16], while the choice of the best synergy scoring methods is under debate [7,17]. On the other hand, as synergy is a measure of drug interaction while sensitivity is a measure of drug combination efficacy, these two metrics are expected to capture distinct properties of a drug combination. Therefore, neglecting the drug combination sensitivity may lead to a biased prioritization of drug combinations that are unable to kill cancer cells despite strong synergy [18]. However, unlike the sensitivity of single drugs which can be directly derived from monotherapy dose-response curves [19], the sensitivity of a drug combination remains largely undefined, as the same sensitivity can be achieved using different dose combinations. Furthermore, there is a lack of scoring approaches to fully capture the synergy and sensitivity simultaneously, which should be ideally interpretable using the same scale, e.g. percentage inhibitions [16].
To overcome these challenges, we proposed a cost-effective experimental and computational procedure to facilitate the prioritization of drug combination synergy and sensitivity. We proposed a novel experimental design to allow either drug to span over multiple doses while the concentration of the other drug is fixed at its IC 50 concentration. The resulting drug combination dose-response curves were utilized to determine a drug combination sensitivity score (CSS). Using a large-scale drug combination study, referred to as the O'Neil data [20], we showed that the CSS is highly reproducible, suggesting its robustness to be utilized as a metric for characterizing drug combination responses. Furthermore, we found that the CSS can be predicted at high accuracy using chemical and pharmacological features of the drug combinations. To assess the degree of synergy from the cross design, we developed an S synergy score based on the difference between the observed CSS score and the baseline effect predicted by a reference model. As there is no consensus on the reference model, we evaluated multiple variants of it and found that the S score can detect the true synergistic and antagonistic drug combinations with high accuracy, irrespective of which the reference model is used. Compared to the full matrix design, the cross design requires minimal amount of experimental materials, while it still maintains a robust and accurate characterization of both drug combination sensitivity and synergy levels. We foresee that such a cross experimental design and its CSS and S scoring methods should allow a scale-up of drug combination testing especially for patient-derived cancer cells. The R scripts for calculating and predicting CSS are available at https://github.com/amalyutina/CSS.

The cross drug combination design
We proposed a cross design to test the synergy and sensitivity of a drug pair by first introducing the concepts of background drug and foreground drug: background drug is the drug fixed at its IC 50 concentration while foreground drug is added into the background drug with multiple concentrations. We allow either drug in the pair to be the background drug, so that two vectors of dose combinations will be intersected at the IC 50 concentrations (Fig 1A). The doseresponse curves for these two vectors are usually measured in a unit of inhibition percentages by cell viability or toxicity assays. Note that the cross design requires specifically the combinations at the IC 50 concentrations, which need to be determined based on the monotherapy dose response curves. As shown in Fig 1B, as long as a minimal of two concentrations are tested for a drug combination, the cross design will require less experimental materials than a full matrix design. For a combination screen that tests one concentration only, both the cross design and the full matrix design converge to a point estimate of drug combination effects at IC 50 . Therefore, we considered that the cross design is in general more cost-effective compared to the full matrix design.

Determination of the CSS drug combination sensitivity scores
With the drug combination dose-response curves determined in the cross design, the CSS summarizes the area under the curve similar to the scoring approaches [21][22]. Namely, a four-parameter log-logistic function is used to fit the dose-response curve for a concentration x of the foreground drug according to: y ¼ y min þ y max À y min 1 þ 10 lðlog 10 IC 50 À log 10 xÞ ; ð1Þ where y min and y max are the minimal and maximal percentage inhibition (the bottom and top asymptotes of the curve, 0 � y, y min , y max � 1); IC 50 is the concentration of the foreground drug with which the drug combination reaches 50% of y max -y min inhibition of the cell growth; λ is the slope of the dose-response curve. The dose-response curve (1) is transformed by substituting x with x' = log 10 (x) as: The area under the log 10 -scaled dose-response curve (AUC) is determined according to where [c 1 , c 2 ] is the log 10 concentration range for the foreground drug tested in the experiment, and m = log 10 (IC 50 ). . One drug is utilized as the background drug fixed at its IC 50 concentration while the other drug becomes the foreground drug with multiple doses being titrated. The resulting two dose-response curves will be summarized as the drug combination sensitivity score (CSS), from which the S synergy score can be determined as the deviation from a reference model which predicts the expected percentage inhibition effect from monotherapy dose responses. (B) Comparing the cross design with the full matrix design in terms of data size. Less materials are needed for the cross design when the size of the full matrix is two or above. The AUC is further normalized as the proportion of its maximal possible inhibition (i.e. 100% inhibition) according to: where inh min is the minimum inhibition level that is considered as the drug effect (by default it is fixed at 10%, assuming that the inhibition below 10% is experimental noise, S1 Fig).
The CSS for the foreground drug is defined as a percentage which varies between 0 and 100: As there are two drug combination dose-response curves depending on which drug is fixed as the background drug, we refer to the results of Eq (5) for either scenario as CSS 1 and CSS 2 . The two variants of CSS are expected to reflect similarly the summarized % inhibition for a given drug combination. We considered them as two samples that are generated from the same random variable, and estimated the CSS as an average of CSS 1 and CSS 2 , i.e.
The O'Neil drug combination data Dose-response was measured as percentage of cell viability and retrieved from the supplementary material of O'Neil et al. [20], which includes 22,737 drug combinations that involve 38 unique drugs in 39 cancer cell lines, representing 7 tissue types. At the first stage, single-drug screening was done using 8 concentrations to determine the IC 50 concentration for each drug with six replicates. At the second stage, a 4 by 4 dose matrix was utilized to cover the span of IC 50 concentrations for a drug pair with four replicates. To utilize the cross design, we picked up only the row and the column corresponding to the concentrations closest to the IC 50 of the single drugs. These two vectors thus allowed the fitting of drug combination dose-response curves with which the CSS can be calculated. The cell viability percentage was first transformed to inhibition percentage according to: In our analysis, the CSS for a drug combination was determined based on the average % inhibition of the four replicates. The robustness of the CSS scoring was assessed using the Pearson correlation across the four replicates. All the correlation analyses utilized Pearson correlations.

Predicting the CSS using machine learning approaches
With the CSS being determined for each drug combination, we sought to evaluate the prediction accuracy of multiple machine learning methods. We considered a drug combination as a combination of their targets and chemical fingerprints. We collected the known targets that have been experimentally validated for the 38 drugs from Drugbank [23] and ChEMBL [24]. Furthermore, we also utilized the Similarity Ensemble Approach (SEA) to predict additional secondary targets based on the chemical structures of the drugs [25]. The targets that were predicted with Z-score higher than 20, Tanimoto coefficient higher than 0.4 and P-value smaller than 0.01 were included, following the previously reported filtering strategy [25]. The MACCS fingerprints of the drugs were determined using the SMILES strings with the R package rcdk [26]. The resulting feature set for a single drug included 398 validated and predicted targets and 166 MACCS fingerprints (S1 and S2 Tables). The feature vector for a drug combination was determined as the bitwise OR operation over the features of its single drugs.
We compared three state-of-the-art machine learning methods for the CSS prediction: Elastic Net [27], Random Forests [28] and Support Vector Machines [29]. Elastic Net is a regularization and feature selection method that combines both ridge and lasso regression by including the L 1 and L 2 penalty terms, which are regulated by hyper parameters α and λ. The α parameter controls the penalty term in the elastic net by giving more power either to the lasso regression (when α is closer to 1) or to the ridge regression (when α is closer to 0). In our studies, α was selected from the interval [0.1, 1], which determines the level of compromise between the lasso and ridge regressions. The λ parameter regulates the level of shrinkage and was chosen to minimize the difference between predicted and actual CSS scores. Random Forests is an ensemble learning method that constructs multiple decision trees. In our studies, we set the number of randomly selected predictors that is used at each split of the decision tree equal to the rounded down square root of the number of variables. We utilized Support Vector Machines with Radial Basis Function Kernel, which can be used not only for classification but for regression problems. The tuning parameters are the cost parameter C that sets the penalty for prediction error of a training point and a smoothing parameter σ, based on the loss function in cross-validation.
We focused on the model performance for predicting new drug combinations within the same cell line, as the set of drug combinations in the training data did not overlap with that in the test data. For each of the 39 cell lines, the number of drug combinations ranges from 290 to 688, with an average of 338. We randomly sampled 70% of the drug combinations to train multiple machine learning models using 10-fold cross-validation, which splits the training data randomly into 10 equally folds, 9 of which were used to fit the model and the remaining one was used to evaluate the prediction accuracy. The model with the lowest RMSE out of the 10-fold cross validation was then used for predicting the CSS values for the remaining 30% of the novel drug combinations as the testing data. As the sampling was done randomly for each cell line, the training and the testing data were therefore balanced. Four metrics including coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE) and Pearson correlation (COR) were utilized for evaluating the prediction performance on the testing data. The whole procedure above was repeated 20 times for each cell line on the predefined seeds, and the final model performance was obtained as the mean values of these iterations. To benchmark the performance of the machine learning methods, we utilized one randomly selected technical replicate as the best possible prediction to obtain the upper limit of the performance. All the methods were implemented and evaluated using the R package caret [30].

Determination of the S drug synergy scores
The advantage of CSS is that it allows a direct comparison of the sensitivity between a drug combination and its single drugs, and hence facilitates the quantification of drug synergy. The degree of synergy is often calculated as the deviation of the observed drug combination effect from the reference, which is defined as the expectation effect if the drugs are not interacting. However, how much the expected effect should be is a matter of mathematical modelling with certain assumptions. As the choice of the 'best' synergy model is rather heuristic, we proposed three variants of CSS-based synergy scores (termed as S scores) by assuming the reference model as the sum, the maximal and the mean of the AUCs for the monotherapy drug responses: The AUC for a monotherapy drug response was defined according to [22]: where [a, b, c] were the parameters to fit a logistic function on the single drug response at concentration x: To evaluate the prediction accuracy of the S synergy scores, we defined a set of true synergistic and antagonistic drug combinations as the gold standard, which were determined using the full dose-response matrix data including the combination and monotherapy responses. We utilized the R package synergyfinder [31] to calculate multiple versions of synergy scores including the HSA (Highest Single Agency [32]), the Bliss [33], the Loewe [34] and the ZIP synergy scores [35]. The principles of these four models are summarized below: Consider that drug 1 at concentration x 1 and drug 2 at concentration x 2 were combined to produce the inhibition effect of y c , while their respective single drug effects were y 1 (x 1 ) and y 2 (x 2 ). The synergy score was calculated as the difference between y c and the expected effect y e if there is no synergy. Each synergy scoring took a different model for y e : 1. HSA: y e is the maximal single drug effect, defining 2. Bliss: y e is the expected effect of two drugs acting independently, defining 3. Loewe: y e is the expected effect of a drug combined with itself, defining 4. ZIP: y e is the expected effect of two drugs that do not potentiate each other, defining For each of the four models, the synergy scores were determined first for a given dose combination and then were averaged over the full dose-response matrix. With the four synergy scores determined for each drug combination, the true synergistic and antagonistic drug combinations are those with all four synergy scores consistently higher than 5 and lower than 5, respectively. The aim was then to use the S synergy score which was determined by the cross design data to predict the ground truth determined by the full matrix design. The areas under the ROC curve and the precision-recall curve were used for evaluating how well the S synergy scores can predict the consensus drug combinations determined using the full dose-response matrix data.

CSS values are highly reproducible and robust
We applied the CSS scoring on the O'Neil drug combination data, which consists of 22,737 drug combinations for 39 cancer cells [20]. We found that the CSS 1 and CSS 2 values calculated using either drug fixed at its IC 50 concentration were highly correlated (Pearson correlation = 0.82, p-value = 2×10 −16 ; Fig 2A). Both CSS 1 and CSS 2 values ranged from 0 to 50, with a marginal absolute difference of 5.62 ( Fig 2B). As a CSS score can be directly interpreted as a normalized average % inhibition of the drug combination response (Eqs 1-6), such a result implies that about 5% inhibition difference is expected between CSS 1 and CSS 2 . The high level of consistency holds true for all the 39 cancer cell lines and the majority of the 38 unique drugs, suggesting the robustness of the CSS scoring method (Fig 2C and 2D, S2 Fig). To evaluate the robustness of the CSS values further, we permuted the O'Neil data randomly and recalculated the correlations between CSS 1 and CSS 2 . The correlations deteriorated quickly to near zero (Pearson correlation = 0.075, S3 Fig), suggesting that the high correlation can be attributed to the robustness of CSS on the actual drug combination data. We also found high correlation between the CSS value and those derived from individual replicates (minimal Pearson correlation = 0.97, S3 Table). In order to check whether the CSS values are within the range of the CSS replicates for each drug combination, we calculated the minimal and maximal values over the CSS replicates for each drug combination and plotted them together with CSS values over the standard deviation of the CSS replicates. For a better visualization, we applied a generalized additive model to smoothen the CSS lines and obtain 95% pointwise confidence interval around the mean (S4 Fig). Only 4% of the drug combinations have the CSS values being out of the CSS replicate-based limits, however this can be explained by the higher variance over the replicates.
Notably, we found that drug combinations that involved bortezomib showed much lower correlation (0.26) between the CSS 1 and CSS 2 values compared to other drug combinations. Since the O'Neil data contains the replicates for single drug screening, we analyzed the coefficient of variation (CV) of the cell viability readout for each drug in the replicates. As expected, we found that bortezomib has the highest CV (0.26), suggesting a relative low quality of the drug combination sensitivity data involving this drug (S5 Fig). When filtering out the drug combinations with a decreasing threshold of difference between CSS 1 and CSS 2 , the remaining drug combinations showed an increasing CSS 1 and CSS 2 correlation (S6 Fig). We found that the threshold of 10 is close to the middle point that reached the correlation of 0.91 (i.e. the average of 1 and 0.82). We therefore applied an empirical threshold of 10 to filter out the drug combinations that were in poor quality, as CSS 1 and CSS 2 in these combinations showed bigger than 10% inhibition difference. In total, there were 18,905 drug combinations after the filtering, constituting the 83.1% of the original data. As expected, the correlation between CSS 1 and CSS 2 was further improved (Pearson correlation = 0.93, p-value = 2×10 −16 ). Furthermore, the mean absolute difference between CSS 1 and CSS 2 was 3.83, which was comparable to the variability determined from the technical replicates of CSS 1 and CSS 2 (2.92 and 3.06 respectively), suggesting that the difference between CSS 1 and CSS 2 is similar to what is expected when repeating the experiment. Taken together, CSS 1 and CSS 2 values are highly consistent and therefore supported their averaging as a summary for the drug combination sensitivity score.

CSS can be predicted using machine learning approaches
Given that the CSS is highly reproducible as a summary of the overall sensitivity of a drug combination, we explored whether CSS can be predicted using pharmacological and chemical information of the drugs. We considered a drug combination as a combination of its drugs' target profiles as well as their chemical fingerprints, with which the machine learning approaches illustrated in the previous section can be optimized by exploring the feature space using the training data. We examined three major machine learning methods for predictions: Elastic Net, Random Forests and Support Vector Machines.
We found that all of these machine learning approaches worked reasonably well, where Elastic Net consistently achieved the best performance, with a mean MAE of 4.01 which is comparable to that (2.07) of a technical replicate ( Table 1). Note that in our cross-validation setting the drug combinations in the test data were not present in the training data, however, the machine learning methods were still able to predict the CSS values for new drug Drug combination sensitivity score combinations by exploring the feature similarity in the drug targets and chemical fingerprints. The prediction performance thus validated our hypothesis that a drug combination can be considered as a combination of their drug target profiles and chemical-structural properties, with which the CSS score can be predicted with high confidence using state-of-the-art machine learning approaches.
Since both the drug target profiles and chemical fingerprints were considered as the drug combination features, we next evaluated their prediction performances separately using the Elastic Net method. For drug-target profiles we collected known targets that were experimentally validated as well as the additional secondary targets that were predicted with high confidence using the SEA method. For chemical fingerprints we used the MACCS fingerprint which contains 166 structural features [36]. As expected, when combining all the features the model achieved the best performance (Table 2). We found that in general drug target profiles were predictive of CSS, especially when including the experimentally validated targets. The predicted targets using the SEA method did not improve the prediction accuracy significantly, indicating that even though secondary drug target interactions may occur, most likely they have minor functional impact that may not lead to changes in cancer cell viability and thus does not contribute to the prediction of CSS. On the other hand, we found that chemical fingerprints were less predictive of CSS compared to the drug-target profiles, suggesting that the use of MACCS might be suboptimal to capture the relevant structural information for predicting the drug combination sensitivity. However, as the focus of this study was to show the validity of using machine learning methods to predict the CSS score, we decided to explore other chemical fingerprint features as a future step.
We considered the regression coefficients that were determined in the Elastic Net model as an indication of their importance to contribute to the CSS prediction. We found that certain drug target features were present with high coefficients across all the cell lines (Fig 3). For If the CSS profiles for two cell lines are similar, then their feature importance vectors are expected to be similar. We focused on the most important features that have their absolute coefficients greater than 3 for at least one cell line, resulting in 67 top features. We then utilized these feature importance scores to cluster the cancer cell lines, using unsupervised hierarchical clustering with the Euclidean metric. We found that cell lines of the same tissue type did not necessarily cluster together, indicating their distinctive drug combination response profiles. For example, we found that breast cancer cell lines did not form a single cluster due to the outlier MDAMB436. Indeed, MDAMB436 is the only triple negative breast cancer (TNBC) subtype, while the other cell lines are either ER positive (KPL1, ZR751 and T47D), or HER2 positive (EFM192B and OCUBM). It has been known that TNBC respond to anticancer drugs differently from ER and HER2 positive breast cancers due to distinct disease mechanisms [38]. The top drug combination features separated these two distinctive breast cancer subtypes, suggesting the validity of using CSS-predictive features to cluster cancers of different subtypes. Furthermore, we found that AKT targets (AKT1/2/3) were among the top features that showed higher importance in the non-TNBC group. A combination of an AKT inhibitor and a TOP1MT inhibitor therefore can be proposed to treat non-TNBC, but not necessarily for TNBC breast cancers. On the other hand, we found that CHEK1 and PARP3/4 targets were selected for the TNBC cell line MDAMB436 but not for the non-TNBC group, suggesting that a combination of a CHEK inhibitor and PARP inhibitor might be effective for TNBC. The mechanisms of actions for the proposed drug combination may prove interesting for experimental validations. Taken together, the pharmaco-features that were determined from the CSS prediction may help pinpoint the underlying target combinations, which are of pivotal importance to explain the drug combination responses.

The S synergy scores can predict the true synergy and antagonism
Next, we defined the degree of drug synergy as the differences between the dose-response curves of a drug combination and its single drugs. We derived three variants of the S synergy score (S sum , S max , S mean ) and compared them with the HSA, Bliss, Loewe and ZIP synergy scores that were determined using the full-dose response matrix. Although being determined using only one row and one column from the dose-response matrix, all the S synergy scores Table 3. Pearson correlations of the S synergy scores with those derived using four reference models that were calculated using the full dose-response matrices. Drug combination sensitivity score managed to obtain a good correlation with the synergy scores based on the full matrix design (Table 3). We found that the S synergy scores correlated relatively well with the HSA and Bliss scores, while the correlation started to decrease when comparing to the Loewe and ZIP scores. Since all the synergy scoring models utilized different assumptions for the reference of no synergy, we therefore did not expect a perfect correlation in such pairwise comparisons. For example, the Bliss model assumes that two non-interactive drugs act independently while the Loewe model assumes two non-interactive drugs act as one drug. Their differences in mathematical models have been discussed in our previous publications, such as [16] and [35].

Synergy
Of all the three variations of S synergy score, we found that S sum showed the best correlation with those determined using the full dose-response matrix. As S sum considers the additive effect of single drug sensitivities as the expectation of no synergy, it thus can be considered a more conservative scoring method compared to S max and S mean , where the maximal and average effect of single drugs were considered as reference separately. To control the false discovery rate of detecting synergistic combinations, we therefore proposed S sum as a more appropriate scoring method for the cross drug combination design.
Next, we evaluated the predicting accuracy of the S synergy scores for true synergistic and antagonistic drug combinations. To be able to formulate a binary classification problem, we first selected the true positive and true negative drug pairs by applying stringent criteria to determine the ground truth from the full dose response matrix data. The rationale of the ground truth was based on the assumption that full dose-response matrix can capture the truly synergistic and truly antagonistic drug combinations, as it allows the full factorial testing of all the possible doses for a given drug combination. However, as there exist different models for synergy scoring, we decided to apply the most stringent criteria to determine the ground truth, such that a truly synergistic or truly antagonistic drug combination has to fulfill the criteria of all the four existing synergy scoring models (HSA, Bliss, Loewe and ZIP). Namely, the drug combinations with all the four synergy scores (HSA, Bliss, Loewe and ZIP) higher than 5, or lower than -5, were classified as true synergistic or antagonistic drug combinations, respectively. The threshold of [-5, 5] was determined by the empirical distribution of the synergy Drug combination sensitivity score scores in the O'Neil data, assuming that most of the drug combinations are non-interactive (S8 Fig). Furthermore, we considered that a synergy score of [-5, 5] range corresponds to a [-5, 5] % inhibition which might be due to experimental variation, as indicated by the analysis of replicates from O'Neil dose-response matrix data (mean standard deviation of % inhibition is 6.6%). Therefore, any synergy score between [-5, 5] may be simply experimental noises. From the O'Neil data, we showed that 78.1% of the drug combinations were within the [-5, 5] range by at least one synergy scoring, suggesting that the majority of the drug combinations indeed should not be reliably considered as true synergistic nor true antagonistic (S8 Fig). As a result, we identified 3,716 true synergistic and 218 true antagonistic drug combinations. All the other drug pairs were considered as non-interactive and thus excluded from the analysis.
Once the ground truth had been determined using the full dose-response matrix data, we then asked the question: Can the S synergy scores that were determined using the cross design correctly identify the most significant synergistic and antagonistic drug combination hits that were confirmed using the full matrix design? We showed that the S synergy scores achieved the area under the ROC curves of 0.997 (S sum ), 0.996 (S max ) and 0.992 (S mean ) to detect the true synergistic and antagonistic combinations (Fig 4A). The area under the precision-recall curves were 0.9997 (S sum ), 0.9995 (S mean ) and 0.9998 (S sum ), suggesting that the S scores retrieved a majority of synergistic combinations with minimal false positive and false negative rates (S9 Fig). The S synergy score was derived from the cross design, where only two vectors of the drug combination responses are needed. Still, the S synergy scores can predict the most synergistic and antagonistic drug combinations with high accuracy. These results showed that the cross design can be reliably utilized as a cost-effective strategy in a primary screen to detect the most significant synergistic or antagonistic drug combinations. On the other hand, the S synergy score and the CSS drug combination sensitivity score utilize the same unit as the percentage of inhibition of cell viability. Therefore, the synergy score can be interpreted as the extra percentage inhibition effect beyond the expectation. We summarized both CSS and S scores for all the drug combinations as an S (sensitivity)-S (synergy) plot (Fig 4B; S4 Table). By applying a threshold of the 3 rd quantiles for CSS and S, we can clearly identify the most promising drug combinations that fulfill both the sensitivity and synergy criteria, while avoiding the false positive drug combinations that might be synergistic but do not achieve a sufficient high level of sensitivity (S10 Fig). Taken together, the combined use of CSS drug combination sensitivity score and the S synergy score allows a simultaneous evaluation of the sensitivity and synergy for a drug combination, which will facilitate a more systematic analysis of high-throughput drug combination data with much less experimental materials.

Discussion
Drug combinations may potentially lead to more durable clinical responses by overcoming intra-tumoral heterogeneity and drug resistance to monotherapies. Identifying drug combinations that are tailored for personalized medicine is a challenge, as the number of possible combinations may easily grow exponentially [39]. High-throughput drug combination screening has been increasingly utilized for early detection of true synergistic and effective drug combinations. However, systematic identification of drug combinations is difficult, as the concepts of synergistic versus effective drug combinations are often intertwined and sometimes interchanged without sufficient clarification. Furthermore, there is a lack of consensus on what the definition of synergy and sensitivity are, which might contribute to the poor reproducibility of many drug combination studies. The uncertainty about the endpoint measurement in drug combination screens brings additional complexity for any machine learning approach to tackle the prediction problem.
We developed a novel scoring approach called CSS for drug combinations that can be efficiently determined using the cross design. We found that the CSS is highly reproducible and therefore can be considered as a robust metric to characterize drug combination sensitivity. To understand the drug combination sensitivity, we implemented a systematic evaluation of the prediction accuracy of three machine learning methods. We showed that machine learning in general worked well for the prediction of CSS, where Elastic Net showed the best performance according to our cross-validation setting. We found that the drug target information for the compounds as well as their chemical fingerprints are highly predictive of the CSS values, with an accuracy comparable to the level of experimental replicates. Therefore, the rationale of considering a drug combination as a function of their target and fingerprint profiles can be justified. This would also allow the augmentation of single-drug screening and drug combination screening data together to train a machine learning model, as many drugs are multi-targeted which are equivalent to a drug combination with the same target profile. In our study, we utilized the SEA method to predict new targets of compounds, by applying a combination of thresholds of Z-score, Tanimoto coefficient and p-value suggested by the authors of SEA [25]. In addition to the predicted targets, we also included the known primary and secondary targets of the compounds, so that the risk of false negative is minimal. However, we could not find significant improvement on the prediction accuracy when including the SEApredicted secondary targets ( Table 2), suggesting that the unknown targets for a compound may contribute minimally to the drug combination sensitivity. In this study we focused on drug combination prediction within the same cell line. In the future, we could include the molecular features of the cell lines to improve the prediction accuracy, aiming to identify drug combination specific biomarkers across different cell lines. On the other hand, as the focus of the study is to propose the new experimental design and to justify its associated drug combination scoring methods, we tested the predictability of CSS using only conventional machine learning methods with standard cross-validation schemes. A more comprehensive evaluation of machine learning approaches may be developed by including multiple cross-validation schemes and data pre-filtering techniques. More advanced machine learning methods such as Deep Learning [13] or network-based methods [40] may further improve the prediction accuracy as well as help the biological understanding of the mechanisms of action.
A truly promising drug combinations shall reach sufficient therapeutic efficacy via a strong sensitivity and synergy. Therefore, both these aspects should be evaluated for the prioritization of most potential drug combination hits. While there have been multiple synergy scoring methods that can be applied to the full matrix design, they do not always produce consistent results. The truly synergistic and antagonistic drug combinations may therefore be determined by achieving the consensus across the different scoring methods [16]. Based on the CSS drug combination sensitivity scoring, we developed an S synergy score to quantify the degree of interactions in a drug pair, and showed that it can identify truly synergistic and antagonistic drug combinations accurately. Tailored for the cross design, the S synergy score can be used for the prioritization of a primary drug combination screen, after which only the most significant drug combinations should warrant a confirmation screen using the full matrix design. Furthermore, we proposed a novel S-S plot (Fig 4B) to visualize drug combination sensitivity and synergy using the same scale, which enables an unbiased way to explore high-throughput drug combination data more efficiently with minimal bias. Notably, the CSS is defined at the IC 50 concentrations of the background drugs. Therefore, a synergistic drug combination determined by the S score should be more therapeutically relevant than a drug combination where the synergy is detected at higher concentrations, which are often associated with unwanted off-target effects and side-effects.
The proposed cross design aimed for alleviating the limitation of the conventional doseresponse full matrix design which usually requires a large amount of cancer cells that may not be amenable especially from patient-derived samples. Empowering the cross design with the CSS and S scoring methods, we are foreseeing a lower technical barrier to carry out large-scale drug combination studies with minimal cellular materials. Although we showed the proof-ofconcept using the drug combination data involving cancer cell lines, the cross design coupled with the CSS and S scoring methods can be readily applied for ex_vivo drug screening, where the amount of patient-derived materials can be extremely limited and technically difficult to obtain due to culture constraints [41]. While the majority of drug combination screens are limited to cytotoxic and molecularly targeted molecules, the cross design should be also favored for the testing of combinations that involve immunotherapies and antibodies. Furthermore, the cross design is not only applicable for cancer but also for other diseases, as long as cellular phenotypes of interests can be measured at multiple dose levels. With the help of cross design and its data analysis tools, drug combination discovery can be more quickly advanced and may eventually lead to the validation of personalized drug combinations in clinical trials.
Supporting information S1 Table. Drug target profiles including experimentally-validated primary and secondary targets, and SEA-predicted secondary targets for the 38 compounds.