Using drug exposure for predicting drug resistance – A data-driven genotypic interpretation tool

Antiretroviral treatment history and past HIV-1 genotypes have been shown to be useful predictors for the success of antiretroviral therapy. However, this information may be unavailable or inaccurate, particularly for patients with multiple treatment lines often attending different clinics. We trained statistical models for predicting drug exposure from current HIV-1 genotype. These models were trained on 63,742 HIV-1 nucleotide sequences derived from patients with known therapeutic history, and on 6,836 genotype-phenotype pairs (GPPs). The mean performance regarding prediction of drug exposure on two test sets was 0.78 and 0.76 (ROC-AUC), respectively. The mean correlation to phenotypic resistance in GPPs was 0.51 (PhenoSense) and 0.46 (Antivirogram). Performance on prediction of therapy-success on two test sets based on genetic susceptibility scores was 0.71 and 0.63 (ROC-AUC), respectively. Compared to geno2pheno[resistance], our novel models display a similar or superior performance. Our models are freely available on the internet via www.geno2pheno.org. They can be used for inferring which drug compounds have previously been used by an HIV-1-infected patient, for predicting drug resistance, and for selecting an optimal antiretroviral therapy. Our data-driven models can be periodically retrained without expert intervention as clinical HIV-1 databases are updated and therefore reduce our dependency on hard-to-obtain GPPs.


Introduction
Prolonged chemotherapy against the human immunodeficiency virus type 1 (HIV-1) bears the risk of selection of resistant viral strains, ultimately leading to therapy failure [1][2][3][4][5][6]. Once a drug-resistant HIV-1 variant has been selected in a host, it can be transmitted to another host PLOS ONE | https://doi.org/10.1371/journal.pone.0174992 April 10, 2017 1 / 27 a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 [6,7]. Furthermore, drug-resistant viral variants are permanently archived in the body of the host and can promptly reemerge if drug pressure conveys them a competitive advantage to other viral variants [8]. In order to prevent premature therapy failure, the susceptibility of an HIV-1 variant to available antiretroviral drugs can be measured phenotypically or genotypically [4,[9][10][11][12]. Due to the high cost, limited accessibility and high turnaround time of phenotypic resistance assays, genotypic resistance determination has become the standard of care [4,9]. Phenotypic resistance assays afford direct, quantitative resistance assessments that take into account resensitizing mutations [13], as well as complex mutational patterns [14]. However, certain drugs show significantly decreased in-vivo efficacy at very low in-vitro susceptibility changes which are close to the inherent variability of the phenotypic assay [15]. Furthermore, viral strains with mutations that do not directly cause resistance, but are strongly associated with the emergence of drug resistance, may be deemed susceptible by in-vitro phenotypic drug-resistance assays. If the respective drugs are taken by patients harboring these strains, resistant variants will promptly emerge and compromise virologic response to therapy [16]. Determination of genotypic resistance is performed by sequencing the viral genes coding for the targets of antiretroviral drugs, and subsequently interpreting the resulting nucleotide sequence [12]. A handful of tools exist for interpreting HIV-1 genotypes with respect to drug resistance. Drug-resistance mutation tables list amino acid mutations that confer resistance to antiretroviral drugs [14,17]. Rules-based genotypic interpretation systems score an HIV-1 genotype according to a set of rules defined by experts. The score for each drug is subsequently discretized into two to five categories indicating increasing levels of resistance [18,19]. Datadriven genotypic interpretation systems rely on statistical models of drug resistance for interpreting an HIV-1 genotype. These models are trained on sets usually containing genotypephenotype pairs (GPP) [20,21] generated with in vitro phenotypic assays, and can thus potentially inherit their advantages and disadvantages.
HIV-1 substitutions resulting from chemotherapy are frequently divided in two groups: major drug-resistance mutations and minor drug resistance mutations, which can also occur as natural polymorphisms [14,17,[22][23][24][25][26]. While there is no consensus on the definition of these two groups of mutations, in the following, we list the defining criteria that tend to be used. Major drug resistance mutations are frequently present in viral genotypes from patients failing antiretroviral therapy, and appear very rarely in HIV-1 genotypes from therapy-naïve patients. In fact, detection of such mutations in drug-naïve patients is currently interpreted as transmission of a resistant variant from patients who have failed therapy. By themselves, major drug-resistance mutations can either be directly responsible for drug resistance, or be informative markers for drug resistance. The implications of a mutation with respect to drug resistance can be investigated through site-directed mutagenesis with subsequent phenotypic resistance testing of the produced viral variant [27]. Minor drug resistance mutations tend to be polymorphic, and do not cause drug resistance by themselves, although they may further decrease susceptibility to a drug in combination with major drug resistance mutations and / or compensate for decreased replicative capacity resulting from selection of major mutations. In population genetics, a polymorphism is defined as a substitution that is present in more than one percent of the population [26,28]. The role HIV-1 polymorphisms play in chemotherapeutic success remains controversial [23,24,[29][30][31]. Certain polymorphisms may tend to accumulate during chemotherapy while also being present in drug-naïve patients, albeit with a reduced frequency [22,23]. Polymorphisms present at baseline may influence the drug susceptibility of an HIV-1 variant [26,[30][31][32]. Differential polymorphism distribution among HIV-1 subtypes has been observed, however, significant implications for drug susceptibility only seem to originate from intra-subtype variability as opposed to inter-subtype variability [24,25,32]. The most convincing explanation for the subtype-specific distribution of natural polymorphisms seems to be the existence of subtype-specific resistance pathways rather than subtype-specific propensity for selecting drug resistance [32].
Before drug resistance assays became available, treatment history was frequently used for the selection of new drug regimens [33]. Nowadays, new drug regimens are sometimes selected on the basis of treatment history when no drug resistance test is available. Indeed, statistical models that use treatment history in place of the genotype for predicting the success of antiretroviral therapy have been reported to be comparable to those of models that use the genotype (and do not use therapy history) [34][35][36][37]. However, to our knowledge, these methods have not yet found their way into clinical practice. In our experience, in today's settings using therapy history as a proxy for genotype incurs substantial loss of predictive power. At the same time, a statistically significant increase in performance can be achieved by simultaneously using treatment history and the genotype for predicting the success of antiretroviral therapy [34,[38][39][40][41][42][43].
Drug exposure can be predicted from genotype since the virus acquires mutations as a result of being exposed to a drug. These mutations encompass but are not limited to drugresistance mutations. Thus, while some of these mutations may indicate clinically relevant drug resistance, others may also solely indicate that the virus has changed as a result of drug exposure. As drug susceptibility is a prerequisite for the success of antiretroviral therapy, the detection of drug exposure may pose a risk for therapeutic success (Fig 1). Reporting drug exposure from genotype is relevant if either no established resistance-associated mutations are detectable and / or in cases in which no treatment-history information is available. In this work, we present statistical models that use HIV-1 genotypes to produce predictions of drugexposure that are correlated with both therapeutic history and drug resistance. We have developed our method in close contact with prospective users. From the resulting experience, we expect the method to provide a significant clinical advance in bioinformatics-based therapysuccess prediction.
Note that this article is largely based on another publication from which we amply quote [44].

Results
We trained models for predicting whether an HIV-1 variant had been previously exposed to a certain drug. One or two models were trained for each of the drugs considered in this study (Methods). Specifically, Exposure models were trained with HIV-1 sequences and information on drug exposure. The development sets of ExposurePheno models included genotype-phenotype pairs (GPPs) in addition to the data included in Exposure models. Since a sufficient number of HIV-1 sequences with information on drug-exposure was not available for all drugs, Exposure models could not be trained for all drugs. Additionally, we trained a model for discriminating between HIV-1 sequences from treatment-naïve patients and HIV-1 sequences from treatment-experienced patients. In the following, we refer to a number of datasets that we used for training and validating our models. For the comfort of the reader, we summarize the contents of each of these datasets in Table 1. Furthermore, we depict the relationships of each of the datasets in Figure A in S1 File. Relationship between drug exposure, drug resistance, and therapeutic success. a) Prior to drug exposure, the virus typically does not carry drug-resistance mutations. In the absence of drug pressure, drugsusceptible virus can replicate at high titers (dark-green viral particles). b) If drug susceptibility is given, antiretroviral therapy frequently leads to the suppression of viral replication, which is a prerequisite for therapeutic success. While antiretroviral therapy is administered, however, drug concentrations fluctuate over the dosing interval and may vary within the different body compartments (orange-yellow gradient). This can give rise to sub-inhibitory concentrations in some compartments (light-yellow area in gradient), resulting in the selection of mutations that confer to the virus a selective advantage in the presence of the drug (light-green viral particles). These mutations need not result in virological therapy failure, since they may not enable the virus to to the PRRT dataset) and 6,214 integrase sequences (assigned to the IN dataset). PRRT and IN were further complemented with 36,774 and 5,262 sequences from therapy-naïve patients (short: therapy-naïve sequences), respectively, from the Los Alamos National Laboratory Sequence Database (LANLSD; http://www.hiv.lanl.gov/). The number of sequences in PRRT was reduced to 75,239 sequences after excluding sequences with more than 10% undetermined residues. After removal of duplicate sequences, PRRT included a total of 70,304 sequences (approximately 93% of the initially included sequences). The number of sequences in IN was reduced to 7,076 after excluding sequences with more than 10% undetermined residues. After duplicate removal, 5,523 sequences (approximately 48%) were left in IN. The number of sequences per subtype for PRRT and IN can be seen in Table 2. Sequences in PRRT and IN were randomly assigned either to the development sets D PRRT and D IN , respectively, or to the test sets T PRRT and T IN , respectively (Methods). Two additional test sets were created, TP and HIVdbExposure. TP contains sequences from T PRRT and T IN which were obtained during therapy pauses. HIVdbExposure was created from the treatment-change episode (TCE) repository in the HIV Drug Resistance Database (HIVdb) [14,46]. It contains nucleotide sequences and lists the sets of drug compounds that had been used by the patient before the sequence was obtained. The distribution of subtypes per sequence in HIVdbExposure can be seen in Table 2. Table 3 shows the number of sequences in datasets D PRRT , D IN , T PRRT , T IN , TP, and HIVdbExposure by drug exposure. In D PRRT , 37,557 sequences were therapy-naïve, of which 3,757 (10.0%) present transmitted drug resistance (TDR) [7]. A total of 1,917 sequences in D IN are therapy-naïve, of which 48 (approximately 2%) present TDR. Note that the duplicate removal procedure eliminated substantially more sequences from IN than from PRRT. T PRRT contains 2,056 therapy-naïve sequences, among which 219 (approximately 11%) present TDR, while T IN contains 154 therapy-naïve sequences with 3 (approximately 2%) presenting TDR. We applied the definition of the EuResist Standard Datum [40] to clinical HIV data in the EIDB and in the HIVdb TCE repository. This yielded two datasets of TCEs with binary labels for therapeutic success, the EuResistTCE (n = 1,650) and the HIVdbTCE (n = 1,000) datasets. Fig  2a) depicts the most frequent therapies in the EuResistTCE, while Fig 2b) does so for the TCEs in HIVdbTCE. The baseline sequences in EuResistTCE overlap with the sequences in PRRT partially; the baseline sequences of 619 TCEs are not included in PRRT. TCEs in EuResistTCE whose baseline sequences were obtained during a therapy pause were assigned to the EuRe-sistTCE TP test set. EuResistTCE contains 313 first-line therapies (19.0%) among which 44 (14.1%) present TDR in their baseline sequences. No therapy in HIVdbTCE is a first-line therapy. The Pheno dataset contains GPPs which were labeled susceptible or resistant using the resistance-factor (RF) cutoffs one and ten. Pheno was randomly split into the development and training sets D Pheno and T Pheno , respectively (Methods). The compositions of D Pheno and T Pheno are displayed in Table 4.

Training of models for predicting drug exposure
We trained linear Support Vector classifiers (SVC) [47,48] for discriminating between sequences from viruses with and without previous exposure to a certain drug. We trained SVCs on two kinds of development sets, Exposure or ExposurePheno. Specifically, each sequence in the development sets D PRRT

Sequence of integrase None
Exposure drug Cross-validation / development set for the compound drug. These datasets include sequences with drug exposure information but not GPPs.
Protease, reversetranscriptase or integrase sequence Binary drug-exposure label for drug Exposure naïvePRRT Cross-validation /development set for models discriminating sequences from treatment-exposed and treatment-naïve patients.
Protease and reversetranscriptase sequence Binary label indicating whether sequence was obtained from therapy-naïve patient (Continued) Predicting HIV-1 exposure to antiretroviral drugs exposure to a particular drug had occurred or not. We used D PRRT and D IN for creating one Exposure drug development set for each drug and subsequently trained one SVC on each of these development sets. We additionally created the development set Exposure naïvePRRT in which labels indicate whether viral sequences were derived from therapy-naïve or therapyexperienced patients. Subsequently, we trained one SVC on Exposure naïvePRRT . For creating the ExposurePheno drug development sets, we extended the data in the Exposure drug development sets with GPPs from D Pheno . In the ExposurePheno drug development sets, viral sequences from GPPs labeled as susceptible to the drug in question are treated as not having being exposed to the drug. Conversely, GPPs labeled as resistant to the drug in question are treated as having been exposed to the drug. We trained an SVC with each ExposurePheno drug development set. We do not consider the binary output of the SVC classifier but rather the reported signed distance from the decision boundary, a real number. We call this number drug-exposure score (DES).

Assessment and comparison of performance
We constructed DES models with 10-fold cross validation on the respective development sets. Then we used the DES reported by the resulting models and the predicted resistance factor for geno2pheno [resistance] , respectively to calculate and compare AUC performance of both models. This was done for the test sets T PRRT , T IN , TP, HIVdbExposure, T Pheno , EuResistTCE, EuResistTCE TP , and HIVdbTCE. Among the 7,275 protease and reverse-transcriptase nucleotide sequences contained in T PRRT and EuResistTCE, 23 (<0.01%) were not processed by gen-o2pheno [resistance] due to low sequence similarity. For the sake of performance comparison, these sequences were excluded. In the following, mean performances for the tested models are stated. In order to be able to compare the different models, these means were calculated only with the performances of the drugs that are common to Exposure and ExposurePheno models, as well as to geno2pheno [resistance] . p-values were calculated with a two-sided Wilcoxon signedrank test [49].

Assessment of performance for predicting drug exposure via cross validation.
We cross validated the SVC on each Exposure drug and each ExposurePheno drug development set, as well as on the Exposure naïvePRRT development set. Specifically, we performed ten repetitions of a five-fold cross validation on each development set while testing a series of values for the SVC c parameter (see Methods). One value of the c parameter was chosen for each The numbers of sequences by drug exposure for the development and test datasets are tabulated above. Columns including the abbreviation Comp. in their headers indicate the numbers of sequences from a certain dataset and with a certain drug exposure whose complete drug exposure history is known. The complete drug exposure history for all sequences from the HIVdbExposure dataset is available. In the following, p-values quantify the difference in the AUC distributions between Exposure models, ExposurePheno models or geno2pheno [resistance] . The best mean performance on the T PRRT dataset could be attained by Exposure models (μ = 0.78; σ = 0.06), while the mean performance of geno2pheno [resistance] was lower (μ = 0.71; σ = 0.07; p < 10 −4 ). On the T IN dataset, DES performance for RAL was comparable for models trained on ExposurePheno cross-validation sets (AUC = 0.71), but not for those trained on Exposure cross-validation sets (AUC = 0.62). On the HIVdbExposure dataset, the best mean performance with lowest standard deviation was achieved with Exposure models (μ = 0.76; σ = 0.09), while geno2pheno [resistance] achieved a lower mean performance (μ = 0.74; σ = 0.14; p = 0.43). The best mean performance on TP could be attained with Exposure and ExposurePheno models (μ = 0.61; σ = 0.08), while geno2pheno [resistance] displayed a lower mean performance (μ = 0.59; σ = 0.10; p = 0.3778).
2.3.3. Assessment of performance for predicting drug resistance. Fig 4 shows the correlation of DES with the logarithmized resistance factors from the T Pheno dataset. DES models trained on ExposurePheno cross-validation sets could attain substantially higher mean Predicting HIV-1 exposure to antiretroviral drugs

Assessment of performance for predicting therapy success.
We tested the performance of DES models and of geno2pheno [resistance] in predicting the binary therapy-success labels of the TCEs in EuResistTCE, EuResistTCE TP , and HIVdbTCE. For this purpose, we used DES and geno2pheno [resistance] , respectively, for calculating a genetic susceptibility score (GSS) for each TCE, an additive score that rates the susceptibility of the virus to the used drugs. For GSS calculation, the predictions of DES models and of geno2pheno [resistance] were translated into probability scores (Methods). The GSS for a TCE is the sum of the probability scores for its constituent drug compounds. The performances of GSS calculated with Exposure models, ExposurePheno models, and geno2pheno [resistance] , respectively, when predicting binary labels for therapeutic success are displayed in Table 5. On the EuResistTCE dataset, the best performance could be attained with Exposure and ExposurePheno models (AUC = 0.71), while the performance of geno2pheno [resistance] was lower (AUC = 0.68). On therapies with baseline sequences measured during a therapy pause (EuResistTCE TP ), ExposurePheno models displayed the best performance (AUC = 0.73), while the performance of geno2pheno [resistance] was lower (AUC = 0.66). The best performance on HIVdbTCE is displayed by geno2pheno [resistance] (AUC = 0.64), while the performance of drug-exposure models was lower (AUC 0.63). On average, ExposurePheno models displayed the highest performance with lowest standard deviation in predicting therapeutic success (μ = 0.69; σ = 0.05). The average performance of geno2pheno [resistance] when predicting therapeutic success was lower (μ = 0.66; σ = 0.02; p = 0.5).

Assessment of performance of drug-exposure models with cutoffbased categorization
HIV-1 nucleotide sequences can be submitted to our web service for interpretation with Expo-surePheno models (see Discussion). For the purpose of facilitating the use of DES by human experts, we estimated cutoffs for translating DES into clinically meaningful categories. We estimated two sets of cutoffs (see Methods for details). DEMax cutoffs translate DES into categories describing degrees of drug exposure (Table A in S1 File), while pheno cutoffs translate DES into categories describing degrees of drug resistance (Table B in S1 File). For determining and testing pheno cutoffs, we applied clinically relevant RF cutoffs to PhenoSense GPPs in Pheno (Table C in S1 File). After discretization of DES with DEMax cutoffs, we calculated their performance when predicting drug exposure on T PRRT , T IN , TP, and HIVdbExposure in terms of AUC (Table D in S1 File). Furthermore, we discretized DES with pheno cutoffs and calculated their misclassification rates when predicting drug resistance in T Pheno (Table E in  The numbers of phenotypes by drug in the D Pheno and T Pheno datasets are tabulated above. Phenotypes were measured with the Antivirogram™ or PhenoSense™ assays. Resistance-factor cutoffs one and ten were used for dichotomizing phenotypes into susceptible and resistant. Predicting HIV-1 exposure to antiretroviral drugs S1 File). The application of cutoffs to DES can be associated with a mild loss in predictive performance. When predicting drug exposure without application of cutoffs, average performances (AUC; mean calculated with all drugs for which an ExposurePheno model is available) on the T PRRT and T IN , TP, and HIVdbExposure datasets are 0.77 (0.06), and 0.57 (0.1), and 0.76 (0.11), respectively. After application of cutoffs, mean performance in predicting drug exposure is 0.76 (0.06), 0.58 (0.14), and 0.76 (0.11), respectively. When predicting phenotypic drug resistance, performance with and without application of cutoffs is difficult to compare HIVdbExposure was obtained from the HIVdb TCE repository and contains protease and reverse-transcriptase sequences. Performance on the test sets was compared to that of geno2pheno [resistance] . Bars depicting mean performances were calculated only using drugs that are common to Exposure and ExposurePheno models, as well as to geno2pheno [resistance] . Predicting HIV-1 exposure to antiretroviral drugs

Discussion
DES models constitute data-driven interpretation systems for HIV-1 protease, reverse-transcriptase, and integrase sequences. Two versions of DES models were trained and tested. Specifically, one version of the models is solely trained on genotypes and drug exposure information (Exposure models), while the other version additionally includes GPPs (Exposure-Pheno models). When compared to ExposurePheno models, Exposure models show a high performance when predicting drug exposure, but their correlation with RFs and their performance when predicting antiretroviral therapy success are lower. We chose to include GPPs in the training sets of ExposurePheno models for the following reasons. Both drug exposure and drug resistance are predictive of success of antiretroviral therapy [34,38-40]. The major factor leading to viral drug resistance is exposure to antiretroviral drugs. Specifically, drug resistance arises through the selection of HIV-1 strains with mutations that confer a replicative advantage in the presence of the drug. Thus, drug exposure indirectly causes drug resistance and therefore, both drug exposure and drug resistance are correlated with certain mutations in the genome of HIV-1. Nevertheless, drug exposure and drug resistance are not redundant, but can complement each other. For this reason, simultaneous interpretation of HIV-1 genotypes with respect to drug exposure and to drug resistance is useful for the prediction of the success of antiretroviral therapy. ExposurePheno models consider drug exposure and drug resistance jointly. For the purpose of including GPPs in the training set of classification models, RFs required categorization. Thus, we replaced the RFs in the GPPs with the labels susceptible and resistant. For the purpose of labeling, the RF cutoffs one and ten were applied to all GPPs, regardless of the drug-resistance test (Antivirogram or PhenoSense) and of the tested drug. GPPs with RFs between one and ten were not used for training the models. When clinically relevant categorization of GPPs is intended, different cutoffs for each drug and drug resistance test must be used [15]. However, rather than producing clinically relevant labels for training, we aimed at discriminating fully susceptible GPPs from those that have developed resistance to an extent well beyond the variability arising from the drug resistance test itself, for the following reasons. First, drug resistance is a continuum, and the creation of training instances with a clear separation in this continuum is adequate for the training of binary classification models. Second, clinically relevant cutoffs are selected under the (implicit) consideration of the pharmacokinetic properties of a drug. For example, the use of ritonavir as a booster for protease inhibitors (PIs) leads to an increased and sustained concentration of PIs in the body [50]. For this reason, clinically relevant cutoffs for boosted PIs are shifted upwards with respect to their unboosted counterparts [15]. However, we aim at discriminating viral sequences that display mutations as a consequence of drug exposure (or as the cause of resistance), without regard for drug concentrations in the blood of patient. The cutoffs one and ten are adequate for combining GPPs produced with the Antivirogram and PhenoSense assays; if other assays are used, other cutoffs might need to be selected. One advantage of ExposurePheno models over Exposure models is their higher performance. Another advantage is that they can make use of an additional data source, the GPPs. The use of GPPs allowed for the training of models for two additional drugs (EVG and RPV).
The interpretations provided by DES models can be used to addresses three questions: (1) Was an HIV-1 variant exposed to a certain drug? (2) Is an HIV-1 variant resistant to a certain drug? and (3) How does the effect of a drug combination therapy decompose into effects of its constituent drugs? In the following, we propose how DES models can be used for addressing the three questions mentioned above.
Ad question (1): When the prediction of drug exposure is required, we propose two ways in which DES can be used. For quantification and comparison of the degree of drug exposure between at least two groups of patients, we do not recommend translating DES into categories (by using cutoffs), since this leads to loss of precision. Instead, DES should be directly used for detecting differences between groups. Note that comparison of DES for different drugs requires normalization, e.g. via the calculation of z-scores (this is provided by our web service). If the prediction of the drug exposures of individual patients is required, cutoffs can be used in order to translate DES into clinically meaningful categories (this is also provided by our web service).
Ad question (2): When predicting drug resistance with DES, one should bear in mind that the correlation of DES with log RFs is weak to strong, depending on the drug in question (Fig  4). Correlation with drug resistance to the nucleotide and nucleoside reverse-transcriptase inhibitors (NRTIs) AZT, d4T, ddC, ddI, and TDF is lower than with resistance to other drugs. We interpret this to be the result of the high similarity of the resistance profiles among these NRTIs [17]. Mutations conferring resistance to one of these NRTIs confer resistance to the other NRTIs (cross-resistance) and are thus less discriminative of exposure and resistance to any specific drug among the NRTIs we mention above. Nonetheless, the correlation is sufficient for predicting the susceptible-intermediate-resistant (SIR) label of GPPs discretized with clinically relevant cutoffs (Table E in S1 File). While a mean misclassification rate of 0.27 (0.1) seems high, most of errors arise from misclassifying intermediate-labeled GPPs (μ = 0.1; σ = 0.06), for which the clinical relevance is uncertain [12], and from misclassification of susceptible-labeled GPPs that are predicted to be intermediate or resistant (μ = 0.1; σ = 0.06). For the drugs AZT, d4T, ddI, and TDF, misclassification of susceptible-and intermediate-labeled GPPs as resistant is especially high, which we also attribute to the high similarity of their resistance profiles. Misclassifying susceptible-labeled GPPs as intermediate or resistant can be clinically adequate, for the following reason. Phenotypic resistance measurements do not account for mutations that do not cause drug resistance at the time of resistance testing, but are indicative that drug resistance can be easily developed in the future [16]. Therefore, when such resistance mutations are present in the viral baseline genotype of a patient, and even if phenotypic resistance measurements indicate full drug susceptibility, classification of the genotype as non-susceptible will prevent selecting a combination of drug compounds that could quickly fail due to the emergence of drug resistant viral variants. Nonetheless, misclassifying susceptible-labeled GPPs as intermediate or resistant could also lead to rejection of a drug for treating a patient although the drug could have been a good choice. When the prediction of the results of phenotypic resistance tests is required, we recommend the use of interpretation systems that have been specifically designed and validated for this purpose, e.g. geno2pheno [resistance] . DES predictions are especially useful in two situations: first, when (imminent) drug resistance is not detected by other methods because the process of selection of drug-resistant variants has led to the selection of certain mutations that have not (yet) resulted in clinically relevant drug-resistance (Fig 3; ExposurePheno models have a higher performance in predicting drug exposure than geno2pheno [resistance] ). Second, when drug-resistant HIV-1 variants are in the process of reverting to the wild type after the removal of drug pressure (Fig 3 and Table 5; ExposurePheno models have a higher performance in predicting drug exposure and therapeutic success than geno2pheno [resistance] when genotypes were obtained during therapy pauses). Decision support for the choice of the use of the tool will be given in our follow-up paper. Ad question (3): DES are predictive of therapeutic success (Table 5). In order to facilitate the use of DES for deciding which drug could be useful components of combination antiretroviral therapy, as well as for interpretation of DES by human experts, our online web service offers the following features. (i) Calculation of DES. (ii) Calculation of z-scores, which normalize DES with respect to their distribution in therapy patients. These z-scores can be useful when DES for different drugs need to be compared or merged for the analysis of clinical data. (iii) Translation of DES into categories related to drug exposure and drug resistance via cutoffs. HIV-1 can mutate as a result of exposure to antiretroviral drugs, which does not necessarily entail clinically relevant drug resistance. Drug-exposure categories help the user to determine whether a viral variant has changed as a result of drug exposure. If a viral variant is not rated unexposed for a certain drug, drug resistance to that drug should be at least suspected. Drug-resistance categories indicate whether viral mutations are not only indicative of exposure to a particular drug, but also indicative of clinically relevant drug resistance. If a viral variant is not rated susceptible, drug resistance is highly likely. Drug-exposure and drug-resistance categories are useful when selecting the drug components of antiretroviral therapy. However, predictions are provided for each drug individually (as in most drug-resistance interpretation tools). Thus, the selection of an adequate drug combination under consideration of DES is still left to the expert. In a follow-up paper we will report on a DES-based model that does not require expert selection of drug combinations. Specifically, we are currently testing DES as input features for a model for predicting the success of combinations of antiretroviral drugs. This model will exploit DES for selecting the compounds of antiretroviral therapy. (iv) Presentation of the basis of the predictions by displaying the residues with the largest influence on the prediction.
In summary, in this study, we present a novel approach for training a data-driven interpretation system for drug exposure and drug resistance. We show that models trained on HIV-1 sequences from patients with known drug history can be used for predicting drug exposure, drug resistance, and therapeutic success, even if no GPPs are used. The inclusion of GPPs in the training sets of the models boosted the performance of the models when predicting in-vitro phenotypic drug resistance measurements and therapeutic success, but not when predicting drug exposure. Compared to geno2pheno [resistance] , the method could attain higher mean performances when predicting drug exposure and therapeutic success. The difference in performance was only statistically significant at the 5% level when predicting drug exposure on T PRRT . Note that many of the drugs in HIVdbTCE are not used any more due to their toxicity profiles or their comparatively low potency. A large advantage of DES models is that they are trained on clinical HIV data and freely available GPPs. In conjunction with a frequently updated database with HIV-1 data from routine clinical practice, such as the EIDB, DES models can be automatically updated on a regular basis. Thus, these models allow us to reduce our dependency on hard-to-obtain GPPs for offering a publicly available data-driven genotypic drug-exposure and drug-resistance interpretation system that is kept up to date. While regularly updatable interpretation systems are clearly the appropriate method for accounting for the growing richness of clinical data, innovative procedures may have to be put in place for adequate certification of such systems. DES models for protease and reverse transcriptase inhibitors have been integrated into the geno2pheno [resistance] server http://www.geno2pheno. org. Support for integrase inhibitors will follow. After a sequence has been submitted for prediction, the tab labeled Drug Exposure must be selected in order to view DES predictions. Note that in the input tab, sample nucleotide sequences can be loaded by selecting the appropriate action. On the website, mutations with the highest influence on the prediction are displayed. These are ordered by the magnitude of their influence. Mutations colored in red increase DES, while those colored in green decrease it.

Ethics statement
All data considered in this study had been previously de-identified. For this reason, consent was neither required nor given by human subjects.

Dataset construction
In the following, we describe a number of datasets that we used for training and validating our models. For the comfort of the reader, we summarize the contents of each of these datasets in Table 1. Furthermore, we depict the relationships of each of the datasets in Figure A in S1 File.

Datasets of genotypes and therapeutic history.
The PRRT and IN datasets contain HIV-1 nucleotide sequences and information on the antiretroviral compounds that were used before each sequence was obtained. When constructing these datasets, we disregarded episodes of treatment with a drug that lasted less than 30 days. The PRRT dataset was constructed by pooling 70,304 HIV-1 protease (PR) and reverse-transcriptase (RT) nucleotide sequences from two sources: 37,799 sequences from the EuResist Integrated Database (EIDB; http://www. euresist.org, downloaded April 11 th , 2014) [45], 9,627 of which were derived from drug-naïve patients (short: drug-naïve sequences), and 32,506 drug-naïve sequences from the Los Alamos National Laboratory Sequence Database (LANLSD; http://www.hiv.lanl.gov/; downloaded on March 31 st , 2015). Among the sequences in the PRRT dataset that were derived from therapyexperienced patients (short: drug-exposed sequences), 18,328 sequences were derived from patients whose complete drug history was available at the time of sequencing. The IN dataset includes a total of 5,523 integrase (IN) nucleotide sequences with the following characteristics: 3,382 sequences were extracted from the EuResist database, 1,240 of which are drug-naïve, 397 have been exposed to an integrase inhibitor (INI) and possibly other drugs, and 1,745 have been exposed only to drugs whose target is different from integrase. The complete drug history was available for 1,432 of the drug-exposed integrase sequences. Additionally, 3,782 drugnaïve integrase sequences from the LANLSD (downloaded on March 31 st , 2015) were added to the IN dataset. Inclusion criteria for the sequences were as follows. (1) Alignment with the MutExt software (http://www.schuelter-gm.de/mutext.html) must not have produced an error due to low sequence similarity to the reference sequence (2) at most 10% of the residues of the considered protein regions could not be determined by the sequencing procedure (considered protein regions are listed in Section Subtype Determination, Sequence Alignment and Encoding), (3) the amino-acid sequence resulting from nucleotide translation must be unique within the dataset, unless drug exposure differed between duplicates. The order of appearance of the sequences in the dataset determined which duplicate sequence was excluded, with sequences appearing first preempting inclusion of sequences appearing later. Older reverse-transcriptase sequences not covering amino-acid positions 221-230 were excluded as well.
PRRT and IN were split into development and test sets, as follows. For the purpose of rigorous validation, HIV-1 nucleotide sequences derived from the same patient were not allowed to be simultaneously present in the development and test sets. In the following, dataset nomenclature consists of an abbreviation describing a characteristic of the dataset and optionally PRRT or IN in subscript. The letters in subscript indicate whether the dataset is a subset of the PRRT or of the IN dataset. All HIV-1 nucleotide sequences in PRRT and IN that were obtained during a therapy pause were assigned to TP (n = 441). This dataset is only used for testing purposes, since we deem therapy-pause sequences valuable for testing and an insignificant minority in the much larger training set. In order to make sure that the test sets are patient-wise disjoint with respect to the development set, a set of test patients P was created iteratively. Initially, P included all patients with sequences in TP. Further patients from PRRT and IN were subsequently added to P by random selection until the number of available sequences for the patients in P was approximately 10% of the number of sequences in PRRT and IN. The test sets T PRRT and T IN contain the protease and reverse-transcriptase sequences available for the patients in P, respectively. The development sets D PRRT and D IN contain the sequences in PRRT and IN, respectively, that are not included in T PRRT , and T IN .

EuResistTCE dataset and standard datum definition.
In order to test the performance of our models in predicting therapeutic success, we created the EuResistTCE test set, as follows. We extracted a total of 9,201 therapy-change episodes (TCEs) from the EIDB [45]. These TCEs were constructed according to the definition of the EuResist Standard Datum [40]. In contrast to the EuResist Standard Datum, however, viral-load (VL) measurements were constrained to those not reaching a lower limit of quantification greater or equal than 50 HIV-1 RNA copies per milliliter of blood plasma. In summary, each TCE includes a protease and reverse transcriptase baseline sequence, the compounds that were prescribed to the patient, a baseline and a follow-up viral load (VL), and a binary label indicating whether the therapy was successful or not. The follow-up VL must have been measured within 4-12 weeks after therapy start, preferring the VL closest to week 8 after therapy start. Therapy success at follow up is defined as an at least 100-fold reduction in the VL or a VL of less than 400 HIV-1 RNA copies per ml of blood plasma. This definition of therapy success was used for producing binary labels for the TCEs. To allow performance comparison, only therapies including the following antiretroviral drugs were considered: 3TC, ABC, AZT, d4T, ddI, FTC, TDF, EFV, ETR, NVP, APV, ATV, DRV, FPV, IDV, LPV, NFV, SQV, TPV and ritonavir as a boosting agent (/r). Therapies including unboosted protease inhibitors (except for NFV, since the drug cannot be boosted) were excluded due to their comparatively inferior potency.
The baseline sequences of the EuResist TCEs partially overlap with the sequences in the datasets described above. A minority of baseline sequences were not included in any of the datasets described above because we could not ascertain whether drug exposure had occurred or the patient was therapy-naïve at the time of sequencing. We created a set of test TCEs, EuR-esistTCE, with a fraction of the initially extracted EuResist TCEs. Specifically, EuResistTCE only contained TCEs with baseline sequences that had not been derived from a patient with an HIV-1 nucleotide sequence in D PRRT or in D IN . A subset of the TCEs in EuResistTCE includes baseline sequences which were obtained during a therapy pause. We refer to these TCEs as EuResistTCE TP .

HIVdbExposure and HIVdbTCE datasets.
For testing the performance of our models in predicting drug exposure and therapeutic success, we created the HIVdbExposure and the HIVdbTCE test sets, respectively. The TCE repository in the HIV Drug Resistance Database was downloaded in its entirety on November 21 st , 2013 [14,46]. The TCE repository contains 58 TCEs from the EuResist database, which were discarded. A total of 1,384 sequences with drug-exposure information could be extracted from the repository. We assigned these sequences to the HIVdbExposure test set. For creating the HIVdbTCE test set, the EuResist Standard Datum definition was applied to therapies in HIVdbTCE whose drug compounds are investigated in this study (with the exception of ddC and raltegravir (RAL) for the sake of performance comparison).

Datasets of genotype-phenotype pairs.
A total of 7,597 GPPs were downloaded from the HIV Drug Resistance Database [14] on April 15 th , 2015 (Pheno dataset). The phenotypic drug-resistance assays used for producing the phenotypes were constrained to Antivirogram 1 [51] and PhenoSense 1 [52]. The genotypes are provided in the form of substitutions with respect to the reference sequence consensus B [14]. 3,323 GPP quantify protease-inhibitor (PI) resistance, 3,477 reverse-transcriptase-inhibitor (RTI) resistance, and 797 INI resistance. The T Pheno test set was created from the Pheno dataset by randomly sampling approximately 10% of the GPP. The rest of the GPPs in Pheno were assigned to the D Pheno development set. For training our models with the GPPs, we categorized the resistance factors (RFs) in D Pheno as susceptible or resistant. Specifically, GPPs with RFs lower or equal to one were categorized as susceptible, while GPPs with RFs greater or equal to ten were categorized as resistant. GPPs with RFs between one and ten were not used for training our models.

Naïve PRRT and Naïve IN datasets.
Transmitted drug resistance (TDR) in PR-and RT-naïve sequences was defined as the presence of at least one mutation in the list of drug resistance mutations for surveillance of transmitted HIV-1 drug resistance (DRMT) [7]. Since the list of DRMT only contains PR and RT mutations, TDR in IN sequences was defined as the presence of an INI drug-resistance mutation in the 2013 IAS list [17]. Following the methodology used for establishing the list of DRMT, INI drug-resistance mutations with a prevalence greater than 0.5% among sequences from the LANLSD in IN were not regarded as indicative of TDR [28]. The Naïve PRRT and Naïve IN were created by randomly sampling 2,500 LANLSD sequences without TDR from the PRRT and IN datasets, respectively. These sequences are not included in T PRRT , T IN , D PRRT or D IN . Naïve PRRT and Naïve IN are used by our web service for z-score calculation.

Subtype determination, sequence alignment and encoding
The subtype distribution in the PRRT and IN datasets was determined with the COMET subtyping tool [53]. Nucleotide sequences in PRRT and IN were aligned against HXB2 and translated, using MutExt (http://www.schuelter-gm.de). The resulting amino-acid sequences, along with amino-acid sequences in the Pheno dataset, were represented vectorially with a binary encoding. The vectorial representation considers substitutions, deletions and the presence of insertions within the following HXB2 amino-acid positions: protease 3-99, reverse transcriptase 40-230, and integrase 30-260. The presence of deletions and insertions was encoded for each amino-acid position, while the amino acids of which a specific insertion consists were not encoded.

Creation of exposure and ExposurePheno development sets
D PRRT and D IN were used for constructing the development sets Exposure drug for drug 2 {ABC, AZT, d4T, ddC, ddI, 3TC/FTC, TDF, EFV, ETR, NVP, RPV, ATV, DRV, APV/FPV, IDV, LPV, NFV, SQV, TPV, RAL, EVG} which contain an equal number of sequences exposed and not exposed to a certain drug, along with binary labels indicating exposure to the drug. Sequences not exposed to the drug were randomly selected from D PRRT or D IN , as they were in excess; these sequences were required to have been derived from patients whose complete drug exposure history is recorded. Where possible, half of the sequences not exposed to the drug were drug-naïve, and half of them were exposed to some other drug. A development set Exposure naïvePRRT containing an equal number of drug-naïve and drug-experienced PRRT sequences was constructed as well. An Exposure naïveIN development set was not created due to the fact that only a sufficient number of RAL-exposed integrase sequences was available.
The ExposurePheno drug development sets were created from the Exposure drug sets with additional supplementation of some GPPs from the D Pheno dataset. Specifically, genotypes with corresponding RFs classified as resistant were treated as drug-exposed sequences while those with corresponding RFs classified as susceptible were treated as sequences not exposed to the drug in question. Genotypes with corresponding RFs between the two cutoffs were not used for training (see Phenotypic Resistance Cutoffs). This procedure incremented the number of available drug-exposed sequences and allowed for the creation of the development sets ExposurePheno RPV and ExposurePheno EVG , as the number of available drug-exposed sequences for RPV and EVG was very low. Development sets for dolutegravir could not be created, as neither a sufficient number of resistant phenotypes nor a sufficient number of drugexposed sequences were available.

Training and selection of models for predicting drug exposure
For performing five repetitions of a 10-fold cross validation, each Exposure drug and each Expo-surePheno drug set was randomly partitioned five times into ten folds. Each fold contained an equal proportion of sequences with and without exposure to the drug in question. The partitions were used to cross validate linear support-vector classifiers (SVCs) [47,48] discriminating between sequences with and without exposure to a certain drug. The vectorial representation used to train each drug-specific model was constrained to the vector elements describing the drug's target protein (protease, reverse transcriptase or integrase). Each cross validation was performed with a certain value for the regularization parameter c for the SVM, specifically, c 2 {2 −8 , 2 −7 , . . ., 2 2 }. Performance was measured in terms of the area under the receivingoperator-characteristic curve (AUC) [54,55]. The signed distance to the classification hyperplane (also called decision value) was used as a score for predicting drug exposure. Thus, we call such decision values drug-exposure scores (DES). For each cross-validation set and vectorial representation, the model with the lowest value of c whose average performance was not significantly lower than the best average performance was selected (Benjamini-Hochberg-corrected Wilcoxon signed-rank test [49] with a significance threshold of 0.05). Finally, each cross-validation set and vectorial representation was used without partitioning to train a final SVC with the selected value of c. We refer to these SVCs as the final DES models.

Assessment and comparison of performance
The performance of the drug-exposure models was compared to that of geno2pheno [resistance] 3.3 (http://www.geno2pheno.org) [20]. The output of geno2pheno [resistance] includes a prediction of the resistance-factor (RF). Since geno2pheno [resistance] uses its own alignment program, performance comparison was constrained to the set of sequences which could be aligned without errors by geno2pheno [resistance] . Furthermore, the drug ddC was also excluded from performance comparison, as it is not supported by geno2pheno [resistance] any more.
Assessment of performance. Sequences in T PRRT , T IN , TP, EuResistTCE, EuResistTCE TP , HIVdbTCE, HIVdbExposure, and T Pheno were interpreted with the final drug-exposure models and geno2pheno [resistance] . Performance was assessed in three respects. First, the performance of DES and of geno2pheno [resistance] when predicting drug exposure was assessed using T PRRT , T IN , TP, and HIVdbExposure. These datasets contain HIV-1 sequences and a binary matrix indicating the previous exposure of these sequences to each individual drug compound. The area under the receiver operating characteristic curve (AUC) was used as a performance measure. Second, the performance of DES when predicting drug resistance was quantified with the correlation between DES for the genotypes in T Pheno and the corresponding log RFs. Unfortunately, performance in predicting drug resistance could not be compared to that of geno2pheno [resistance] , since the genotypes in T Pheno were only available as amino-acid sequences and geno2pheno [resistance] requires nucleotide sequences as an input. Third, the performance of DES and of geno2pheno [resistance] when predicting therapy success was assessed with EuResistTCE, EuResistTCE TP , and HIVdbTCE. For this purpose, DES and RF predictions were converted to probability scores. Specifically, DES, which are SVM decision values, were converted to probability scores as described by Platt [56]. Predicted RFs were converted to probability scores by fitting a two-component Gaussian-mixture model. In the Gaussian mixture model, one Gaussian is fitted to RFs that belong to the susceptible population, while the other Gaussian is fitted to RFs that belong to the resistant population. Subsequently, a sigmoid function is used for estimating the probability of resistance [20]. We define the probability of susceptibility as one minus the probability of resistance. Probability scores were used for calculating a genetic susceptibility score (GSS) for each therapy. A GSS consisted of the sum of the individual probability scores for each drug in the regimen. For each therapy, three GSS were calculated. The first two GSS were calculated using the probability scores derived with DES from Exposure and ExposurePheno models, respectively, while the third GSS was calculated with the probability scores derived with geno2pheno [resistance] . Performance in predicting therapeutic success was quantified with the AUC. Significance values in the Results section were calculated with a two-sided Wilcoxon signed-rank test [49].

Determination of parameters for our web service
Our drug-exposure models are freely available online (http://www.geno2pheno.org; see Discussion). For the purpose of facilitating the use of DES by human experts, we calculated two sets of parameters. The first set of parameters is used for calculating z-scores of DES. It includes a mean and a standard-deviation value for each drug, calculated with the nucleotide sequences in Naïve PRRT and Naïve IN . The second set of parameters includes cutoffs which translate DES into clinically meaningful categories related to drug exposure and drug resistance. For the purpose of displaying the sequence features with the largest influence on the predictions of DES models, we translated the Support Vectors and corresponding Support-Vector coefficients of each SVC into a linear function. In the following, we detail on the procedures we used for determining the z-score parameters, the cutoffs, and for extracting the weights for the input features. 4.8.1. Calculation of z-scores from drug-exposure scores. We interpreted each sequence in Naïve PRRT with each DES model for predicting exposure to protease inhibitors (PIs) and reverse-transcriptase inhibitors (RTIs). Likewise, we interpreted each sequence in Naïve IN with each DES model for predicting exposure to INIs. For each drug, we calculated the mean and standard deviation of the resulting DES. Our web service calculates the z-score for a sequence s and compound drug according to the following formula where z drug (s) is the z-score for sequence s and compound drug, δ drug (s) is the DES for sequence s and compound drug, μ drug is the mean DES value calculated with the Naïve PRRT or Naïve IN datasets, and σ drug is the corresponding standard deviation. 4.8.2. Estimation of cutoffs of drug-exposure scores. Two sets of cutoffs were determined for each final DES model. The following goals are addressed by each set of cutoffs: (1) prediction of drug exposure and (2) prediction of drug resistance. Each set of cutoffs includes a lower and an upper cutoff for the corresponding DES models. The description of the methods used for determination of these cutoffs follows.

Cutoffs maximizing the performance of the prediction of drug exposure (DEMax cutoffs).
ExposurePheno drug cross-validation sets were interpreted with the corresponding final DES models that were trained on them (this is also called calculation of reinsertion predictions). For each cross-validation set, an upper and a lower cutoff were estimated such that the AUC of the drug-exposure prediction is maximized. We call these cutoffs the DEMax cutoffs, and they allow for the discretization of a DES for a drug into the categories unexposed (U), possible exposure (PE) and probably exposed (E). A detailed description of the procedure with which DEMax cutoffs were determined follows.
Function (1) was defined for discretization of a value δ s 2 R associated to a sequence s by using the lower and upper cutoffs c L , c U 2 R.
Let Δ drug 2 R n be a vector of DES predicting the drug exposure of each of n sequences s to drug, and let E drug 2 {0,1} n be the corresponding vector of class labels, indicating whether each sequence s was exposed to the drug or not. Application of cutoffs c L , c U and Function (2) to a vector of DES Δ drug results in the discrete DES vector discretize(c L , c U , Δ drug ). For each bootstrap replicate, an upper and a lower cutoff c L and c U were determined as where AUC(discretize(c L , c U , Δ drug ), E drug ) is the AUC quantifying the performance of discretize(c L , c U , Δ drug ) in predicting exposure to drug for each sequence with a DES in Δ drug . AUC maximization was performed via grid search over c L and c U . ExposurePheno drug cross-validation sets were interpreted with the corresponding final models that were trained on them. Two thousand bootstrap replicates of the DES of each cross-validation set were created. For each bootstrap replicate and the corresponding class labels, an upper and a lower cutoff were determined by AUC maximization Eq (3). The resulting 2,000 upper and 2,000 lower cutoffs for each final drug-exposure model were averaged to yield the final set of cutoffs. We call these cutoffs the DEMax cutoffs. If a DES for a drug is less than both cutoffs for that drug, then we discretize that DES as unexposed (U). If a DES is greater or equal than the lower cutoff, but less or equal than the upper cutoff, we discretize that DES as intermediate exposure (IE). Finally, if a DES is greater than both cutoffs, then we discretize that DES as exposed (E).

Phenotypically-guided cutoffs for prediction of phenotypic in-vitro drug resistance (pheno cutoffs).
A set of clinically relevant cutoffs for PhenoSense GPPs was obtained from the HIVdb website [14] and is composed as follows. 3TC: 3 and 20; ABC: 3 and 6; AZT: 3 and 10; d4T: 1.5 and 2; ddI: 1.5 and 2; TDF 1.5 and 4; all non-nucleoside reverse-transcriptase inhibitors (NNRTIs): 3 and 10; and all INIs 4 and 20. The set of clinically-relevant cutoffs were used for discretizing PhenoSense GPPs in D Pheno into the categories susceptible (S), intermediate (I) or resistant (R), henceforth called the true labels. The genotypes associated with these GPPs were interpreted with the final DES models. For each drug, an upper and a lower DES cutoff yield predicted GPP labels. These cutoffs, which we call pheno cutoffs, are determined such that the sum of the penalties quantifying the differences between the true labels and the predicted labels is minimized. An individual penalty equals one, if the true label was R and the predicted label was S. If the true label is I, and the predicted label S, the penalty equals 0.75. All other differences between true and predicted labels were penalized with the value 0.5, while the equality of true and predicted labels was not penalized. Pheno cutoffs allow for discretization of a DES for a drug as susceptible (S), intermediate (I) or resistant (R). Further details on the cutoff-determination procedure, including the rationale for choosing the penalty values follow.
The error matrix E 2 R 3x3 Eq (4) was defined for penalizing the misclassification of a discretized value δ s with label l 2 {1,2,3} and predicted labell^2 1; 2; 3 E ðl;lÞ ¼ The rationale for choosing the values of the error matrix follows. Diagonal entries are zero, as correct classification incurs no penalty. From a clinical perspective, the worst kind of misclassification that can occur is the classification of a resistant viral strain (label 3) as susceptible (label 1), since the prescription of a therapy including a thus misclassified compound could compromise the susceptibility of all compounds in the therapy. Therefore, this kind of misclassification was assigned the maximum penalty, one. Misclassification of a resistant strain as intermediate (label 2) deserves a smaller penalty, as surpassing the lower cutoff indicates a clinically-relevant decrease in susceptibility, albeit implying that some susceptibility is given. Therefore, this kind of misclassification was assigned the penalty 0.75. All other types of misclassifications are considered equally undesirable, but less severe than the first two, and were assigned the penalty 0.5. Clinically-relevant cutoffs were used to discretize PhenoSense GPPs in D Pheno with Function (2), yielding their labels. The genotypes s associated with these GPPs were interpreted with the DES models. For each drug involved in a GPP, 2,000 bootstrap replicates of the PhenoSense GPPs in D Pheno were sampled. In order to assign to each of the three classes the same weight in this procedure, each bootstrap replicate was constructed using an equal number of GPPs with each label. For each drug, this number was equal to the maximum number of GPPs with a certain label. Each bootstrap replicate was used to determine a lower and an upper cutoffĉ L ,ĉ U which minimizes the sum of the penalties E ðl;l Þ for each label l = discretize(c L , c U , RF s ) with corresponding predictionl ¼ discretizeðĉ L ;ĉ U ; DES s Þ for a resistance factor RF and a drug-exposure score DES associated with genotype s. The resulting 2,000 cutoff pairs for each drug and DES model were averaged, yielding the final phenotypically guided cutoffs. If a DES for a drug is less than both cutoffs for that drug, then we discretize that DES as susceptible (S). If a DES is greater or equal than the lower cutoff, but less or equal than the upper cutoff, we discretize that DES as intermediate (I). Finally, if a DES is greater than both cutoffs, then we discretize that DES as resistant (R). 4.8.5. Extraction of input-feature weights from drug-exposure models. For the purpose of displaying the input features (i.e. HIV-1 substitutions, insertions, and deletions) with the largest influence on a DES interpretation, we represented the SVCs that produce DES as linear functions. Let x i 2 {0,1} p , i 2 {1, . . ., n} be the Support Vectors for a given DES model, α i 2 R their corresponding Support-Vector coefficients, and ρ 2 R their intercept. The linear-function representation for a DES model is given by where x s 2 {0,1} p is the encoding for input sequence s. Given an encoded sequence x s , the linear Function (5) produces the same numerical output as the corresponding DES SVC. The linear function consists of an offset (also called y-axis intercept) and p coefficients that correspond to the components of the vectors that encode each sequence. The vector P n i¼1 a i x i contains these coefficients (also called weights). Since the encoding of the sequence x s is binary, DES calculation can be performed by adding the offset to the coefficients that correspond to the input features that are present in sequence s. In our web service, we display for each drug a selection of features of the input sequence. These features have the largest absolute values of the coefficients in the linear-function representation of DES models. Features with positive coefficients increase DES and are displayed in red. Features with negative coefficients decrease DES and are displayed in green.