Pervasive and diverse collateral sensitivity profiles inform optimal strategies to limit antibiotic resistance

Evolved resistance to one antibiotic may be associated with "collateral" sensitivity to other drugs. Here, we provide an extensive quantitative characterization of collateral effects in Enterococcus faecalis, a gram-positive opportunistic pathogen. By combining parallel experimental evolution with high-throughput dose-response measurements, we measure phenotypic profiles of collateral sensitivity and resistance for a total of 900 mutant–drug combinations. We find that collateral effects are pervasive but difficult to predict because independent populations selected by the same drug can exhibit qualitatively different profiles of collateral sensitivity as well as markedly different fitness costs. Using whole-genome sequencing of evolved populations, we identified mutations in a number of known resistance determinants, including mutations in several genes previously linked with collateral sensitivity in other species. Although phenotypic drug sensitivity profiles show significant diversity, they cluster into statistically similar groups characterized by selecting drugs with similar mechanisms. To exploit the statistical structure in these resistance profiles, we develop a simple mathematical model based on a stochastic control process and use it to design optimal drug policies that assign a unique drug to every possible resistance profile. Stochastic simulations reveal that these optimal drug policies outperform intuitive cycling protocols by maintaining long-term sensitivity at the expense of short-term periods of high resistance. The approach reveals a new conceptual strategy for mitigating resistance by balancing short-term inhibition of pathogen growth with infrequent use of drugs intended to steer pathogen populations to a more vulnerable future state. Experiments in laboratory populations confirm that model-inspired sequences of four drugs reduce growth and slow adaptation relative to naive protocols involving the drugs alone, in pairwise cycles, or in a four-drug uniform cycle.


Introduction
The rapid emergence of drug resistance is an urgent threat to effective treatments for bacterial infections, cancers, and many viral infections [1][2][3][4][5][6]. Unfortunately, the development of novel a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 drugs is a long and arduous process, underscoring the need for alternative approaches to forestall resistance evolution. Recent work has highlighted the promise of evolution-based strategies for optimizing and prolonging the efficacy of established drugs, including optimal dose scheduling [7][8][9]; antimicrobial stewardship [10,11]; drug cycling [12][13][14]; consideration of spatial dynamics [15][16][17], cooperative dynamics [18][19][20][21], or phenotypic resistance [22][23][24]; and judicious use of drug combinations [25][26][27][28][29][30][31][32]. In a similar spirit, a number of recent studies have suggested exploiting collateral sensitivity as a means for slowing or even reversing antibiotic resistance [33][34][35][36][37][38]. Collateral evolution occurs when a population evolves resistance to a target drug while simultaneously exhibiting increased sensitivity or resistance to a different drug. From an evolutionary perspective, collateral effects are reminiscent of the trade-offs inherent when organisms are required to simultaneously adapt to different tasks, an optimization that is often surprisingly simple because it takes place on a low-dimensional phenotypic space [39,40]. If similarly tractable dynamics occur in the evolution of multidrug resistance, systematic optimization of drug deployment has the promise to mitigate the effects of resistance.
Indeed, recent studies in bacteria have shown that the sequential [38,[41][42][43][44][45][46] or simultaneous [47,48] deployment of antibiotics with mutual collateral sensitivity can sometimes slow the emergence of resistance. Unfortunately, collateral profiles have also been shown to be highly heterogeneous [49,50] and often not repeatable [51], potentially complicating the design of successful collateral sensitivity cycles. The picture that emerges is enticing but complex; although collateral effects offer a promising new dimension for improving therapies, the design of drug cycling protocols is an extremely difficult problem that requires optimization at multiple scales, from dynamics within individual hosts to those that occur at the hospital or community scale. Despite many promising recent advances, it is not yet clear how to optimally harness collateral evolutionary effects to design drug policies, even in simplified laboratory scenarios. The problem is challenging for many reasons, including the stochastic nature of evolutionary trajectories and-at an empirical level-the relative paucity of data regarding the prevalence and repeatability of collateral sensitivity profiles in different species.
In this work, we take a step toward answering these questions by investigating how drug sequences might be used to slow resistance in a simplified, single-species bacterial population. We show that even in this idealized scenario, intuitive cycling protocols-for example, sequential application of two drugs exhibiting reciprocal collateral sensitivity-are expected to fail over long time periods, though mathematically optimized policies can maintain long-term drug sensitivity at the price of transient periods of high resistance. As a model system, we focus on Enterococcus faecalis, a gram-positive opportunistic bacterial pathogen. E. faecalis are found in the gastrointestinal tracts of humans and are implicated in numerous clinical infections, ranging from urinary tract infections to infective endocarditis, in which they are responsible for between 5% and 15% of cases [52][53][54][55][56]. For our purposes, E. faecalis is a convenient model species because it rapidly evolves resistance to antibiotics in the laboratory [57,58], and fully sequenced reference genomes are available [59].
By combining parallel experimental evolution of E. faecalis with high-throughput doseresponse measurements, we provide collateral sensitivity and resistance profiles for 60 strains evolved to 15 different antibiotics, yielding a total of 900 mutant-drug combinations. We find that cross-resistance and collateral sensitivity are pervasive in drug-resistant mutants, though patterns of collateral effects can vary significantly, even for mutants evolved to the same drug. Notably, however, the sensitivity profiles cluster into groups characterized by selecting drugs from similar drug classes, indicating the existence of large-scale statistical structure in the collateral sensitivity profiles. To exploit that structure, we develop a simple mathematical framework based on a Markov decision process (MDP) to identify optimal antibiotic policies that

Isolates exhibit variability in fitness costs and collateral profiles
To investigate the potential impact of resistance evolution on fitness, we estimated the specific growth rate and the lag time in drug-free media for isolates selected from each of the 60 populations (four populations per selecting drug). The growth costs vary both for different selecting drugs and even across different populations selected by the same drug (Fig 2), similar to results in other species [49]. In some isolates-such as those selected by oxacillin (OXA) or nitrofurantoin (NIT)-growth rate and lag times are indistinguishable from those of the ancestral strains. On the other hand, isolates selected by CRO and FOF-selecting conditions that frequently result in collateral sensitivity-show dramatically reduced growth and an increased lag  (Table 1). After 8 days, a single mutant was isolated from each population. (B) The IC 50 for each of 15 drugs was estimated for all 60 mutants by nonlinear fitting of a dose-response curve (relative OD) to a Hill-like function (Materials and methods). (C) Main panel: resistance (red) or sensitivity (blue) of each evolved mutant (horizontal axis; 15 drugs × 4 mutants per drug) to each drug (vertical axis) is quantified by the log 2 -transformed relative increase in the IC 50 of the testing drug relative to that of WT (V583) cells. Although the color scale ranges from a 4× decrease to a 4× increase in IC 50 , it should be noted that both resistance to the selecting drug (diagonal blocks) and collateral effects can be significantly higher. Each column of the heat map represents a collateral sensitivity profile for one mutant. Right panel: enlarged first column from main panel. Mutants isolated from replicate populations evolved to DAP exhibit diverse sensitivity profiles. Although all mutants are resistant to the selecting drug (DAP), mutants may exhibit either sensitivity or resistance to other drugs. For example, the first and last replicates exhibit cross-resistance to CRO, whereas replicate 2 exhibits collateral sensitivity, and replicate 3 shows little effect. Data underlying this figure can be found in S1 Data. AMP, ampicillin; CHL, chloramphenicol; CIP, ciprofloxacin; CRO, ceftriaxone; DAP, daptomycin; DOX, doxycycline; FOF, fosfomycin; IC 50 , half-maximal inhibitory concentration; LVX, levofloxacin; LZD, linezolid; mut, mutant; NIT, nitrofurantoin; OD, optical density; OXA, oxacillin; RIF, rifampicin; SPT, spectinomycin; TET, tetracycline; TGC, tigecycline; WT, wild-type. time, suggesting that the selected resistance determinants are associated with strong pleiotropic effects even in drug-free media.
Our results indicate that collateral profiles can vary even when mutants are evolved in parallel to the same drug ( Fig 1C). For example, all four mutants selected by daptomycin (DAP) exhibit high-level resistance to the selecting drug, but replicates 1 and 4 exhibit collateral resistance to CRO, whereas replicate 2 exhibits collateral sensitivity, and replicate 3 shows little effect ( Fig 1C, right panel). To quantify the variation between replicates selected by the same drug, we considered the collateral profile of each mutant (i.e., a column of the collateral sensitivity matrix) as a vector in 15-dimensional drug resistance space. Then, for each set of replicates, we defined the variability V � We find that the variability is significantly correlated with average resistance to the selecting drug, even when one removes contributions to variability from the selecting drug itself (Fig 2C, left), indicating that collateral (rather than direct) effects underlie the correlation. Such a correlation might be expected if, for example, resistance arises from an accumulation of stochastic events following a Poisson-like distribution, in which the mean is proportional to the variance. We do note, however, that selection by spectinomycin (SPT) represents a notable exception to this trend. These results suggest that the repeatability of collateral effects is sensitive to the drug used for selection. As a result, certain drugs may be more appropriate for establishing robust antibiotic cycling profiles.
To further quantify the variability within and between isolates selected by different drugs, we calculated the pairwise Euclidean distance between collateral profiles of isolates selected in the same drug and pairs of isolates selected in different drugs (Fig 2C, right). We see that the distributions do have some overlap; that is, pairs of isolates selected by the same drug are sometimes more distinct from one another, by this metric, than pairs selected by different drugs. However, the distribution for different selecting drugs (blue) has a significantly shifted mean, indicating that isolates selected by the same drug are more similar to one another (on average) than to isolates selected by different drugs.

Cross-resistance to DAP appears frequently under selection by different drugs
DAP is a lipopeptide antibiotic sometimes used as a last line of defense against gram-positive bacterial infections, including vancomycin-resistant enterococci (VRE). Although DAP resistance was initially believed to be rare [61], it has become increasingly documented in clinical settings [62]. Recent work in a related enterococcal species has shown that cross-resistance to DAP can arise from serial exposure to chlorhexidine, a common antiseptic [63], but less is known about DAP cross-resistance following exposure to other antimicrobial agents. Surprisingly, our results indicate that DAP resistance is common when populations are selected by other antibiotics, with 64% of all evolved lineages displaying DAP cross-resistance and only 11% displaying collateral sensitivity ( Fig 3A).

Selection by linezolid leads to higher chloramphenicol resistance than direct selection by chloramphenicol
Surprisingly, we found that isolates selected by linezolid (LZD) developed higher resistance to chloramphenicol (CHL) than isolates selected directly by CHL ( Fig 3B). The isolates from LZD and from CHL exhibit similar growth and lag-time distributions in drug-free media ( Fig  2B), suggesting that this effect is not driven by fitness costs alone. To investigate further, we isolated LZD-selected mutants at days 2, 4, 6, and 8 of the laboratory evolution and measured the resistance of each to CHL. We find that early-stage (days 4-6) mutants exhibit low-level CHL sensitivity just prior to a dramatic increase in cross-resistance around day 8. These findings suggest that LZD selection drives the population across a CHL fitness valley, ultimately leading to levels of resistance that exceed those observed by direct CHL selection (Fig 3B,  inset).
To further investigate the repeatability of this phenomenon, we exposed 32 additional populations to increasing LZD concentrations in parallel over 8 days. Using the four initial LZD mutants as a guide, we measured the CHL susceptibility of isolates from each population at day 5 (Fig 3C, purple) and in isolates from 23 populations at day 8 (Fig 3C, blue; note that we identified contamination in nine populations after day 5 and therefore excluded them from the day-8 analysis; see Materials and methods). In addition, to account for potential heterogeneity in the original populations, we measured CHL susceptibility in three different (single colony) isolates from each of the original four populations selected in CHL (Fig 3C, black). On Blue or red circles correspond to the isolate, and black circles correspond to ancestral strains. Light green lines show fits to logistic growth function [60] given by g(t) = g 0 +K c (1+exp (4μ(λ−t)/K c +2)) −1 , where μ is the maximum specific growth rate, λ is the lag time, and K c is the carrying capacity. To reduce the number of free parameters, we fix K c = 0.5 to match that of the ancestral strain. (B) Maximum specific growth rate (μ, left) and lag time (λ, right) in drug-free media for isolates from each of the four populations selected by each drug. All values are normalized by the values measured in the ancestral strain. Error bars are standard errors of the mean estimated from three technical replicates for each isolate. (C) Left panel: variability in replicates for all 15 drugs versus the (log 2 -scaled) fold increase in IC 50 to the selecting drug (Spearman ρ = 0.58, p = 0.03 including the SPT mutants; ρ = 0.82, p < 10 −3 , without the SPT mutants). Variability is defined as V � P m i¼1 d i =m, where m = 4 is the number of replicates, and d i is the Euclidean distance between mutant i and the centroid formed by all vectors corresponding to a given selecting drug (S2 Fig). Right panel: histogram of Euclidean distances between collateral profiles in pairs of isolates selected by the same (red) or different (blue) drugs. Distributions exhibit significantly different means (p<10 −3 , Welch t test). To emphasize collateral, rather than direct, effects, the component(s) of each collateral profile corresponding to the selecting drug(s) were removed prior to calculating variability and pairwise Euclidean distances. Data underlying this figure can be found in S1 Data. AMP, ampicillin; CHL, chloramphenicol; CIP, ciprofloxacin; CRO, ceftriaxone; DAP, daptomycin; DOX, doxycycline; FOF, fosfomycin; LVX, levofloxacin; LZD, linezolid; mut, mutant; NIT, nitrofurantoin; OD, optical density; OXA, oxacillin; RIF, rifampicin; SPT, spectinomycin; TET, tetracycline; TGC, tigecycline.  50 for mutants evolved to LZD (blue). The width of the green line represents the confidence interval (± 3 standard errors of day 5, almost one-third (10 of 32) of the LZD-selected strains exhibited CHL resistance greater than that of any day-8 CHL-selected strains, whereas 25% (eight of 32) were more CHL-sensitive than even the ancestral strains. By contrast, on day 8, the vast majority of isolates (17 of 23) were highly CHL-resistant, with only a few strains (two of 23) exhibiting small levels of collateral sensitivity.
To identify genes that may be responsible for collateral CHL resistance, we PCR amplified and (Sanger) sequenced seven genes (30S ribosomal protein S10 (rpsJ), L3, and L4, which are genes for ribosomal proteins, and four genes for 23S rRNA: 23SA, 23SB, 23SC, 23SD) previously associated with LZD resistance [64] in a subset of 12 isolates. We selected the most CHLresistant isolate from each CHL population, two pairs of day-5/day-8 LZD-selected isolates that exhibited collateral sensitivity on day 5 and cross-resistance on day 8, two LZD-selected isolates with high-level collateral sensitivity to CHL on day 5, and two LZD isolates with large cross-resistance on day 8 ( Fig 3C; specific isolates marked by black arrows). We did not observe mutations in rpsJ, L3, or 23SB in any strain. In addition, the four LZD-selected isolates showed no mutations in any of the sequenced genes on day 5 ( Fig 3D). By contrast, all four of the LZD-selected strains contained at least three mutations in the 23S rRNA genes on day 8. Two of the CHL-selected isolates had mutations in L4, and one had a single mutation in the 23SC gene.
We observe a strong correlation between the level of CHL resistance and the total number of 23S rRNA mutations, similar to the dosing behavior previously observed for LZD [64]. This correlation suggests that the 23S mutations found in LZD-selected (and CHL-resistant) isolates from day 8-but missing in the CHL-sensitive isolates from day 5-may be responsible for the later-stage, high-level cross-resistance to CHL. Elucidating the precise evolutionary dynamics underlying differential selection for these mutations in LZD and CHL remains an open question, though the early (day 5) CHL sensitivity observed in LZD-selected isolates suggests that it may be necessary to cross a fitness valley in CHL resistance in order to eventually achieve higher CHL resistance.

Whole-genome sequencing reveals known resistance determinants and mutations in genes previously linked with collateral sensitivity
To investigate the genetic changes in drug-selected populations, we sequenced population samples from one (arbitrarily selected) evolved population per drug. In addition, we isolated and sequenced a single clone from each population. As controls, we sequenced two different isolates from the ancestral V583 stock as well as both single isolates and a population sample propagated in drug-free media. We then used breseq [65], an established computational pipeline capable of mutant identification in both clonal and population samples (Materials and methods). To minimize potential artifacts from sample preparation or analysis, we excluded from further analysis six populations in which variants identified by clonal and population samples did not share at least one mutation (but see S1 Data for results from all 15 populations). In addition, we limit our focus to those mutations estimated to occur with frequency greater than 30% in the population samples. This analysis revealed a total of 26 mutations in the nine populations (Table 2; note that the population selected in NIT contained no identifiable mutations). The control strain propagated in BHI contained no mutations relative to the ancestral strains. In the majority of the population samples, we identified mutations that likely confer resistance to the selecting drug. For example, we observed mutations in genes for drug targets associated with protein synthesis inhibitors (rpsJ [66], 30S ribosomal protein S5 [rpsE] [67]) and fluoroquinolones (DNA topoisomerase 4 subunit A [parC]; DNA gyrase subunit A, [gyrA] [68]). We also identified mutations in a sensor histidine kinase [69,70] (EF3290) in populations selected by cell wall inhibitors and mutations in 23S rRNA genes in the LZD-selected population [64,71]. Surprisingly, the DAP-selected population did not contain mutations in genes previously identified with DAP resistance [57,58], though we observe a mutation in rpsJ in both the clonal and population sequences. Although previous experiments have shown rpsJ not to confer DAP resistance in one genetic background of E. faecalis strain S613 [66], it may underlie the observed cross-resistance to other antibiotics. Finally, we observe no mutations in either the clonal or population sequencing for the Nit1 population, despite repeated experiments confirming Collateral sensitivity informs optimal anti-resistance strategies increased resistance to NIT. Because the resistance is relatively low level (IC 50 increases by approximately 50% relative to ancestor), it is possible the observed resistance corresponds to transient phenotypic resistance, similar to the postantibiotic effect or the cellular hysteresis observed when drugs are rapidly cycled [46]. Several of the mutations we identified occur in genes previously linked with collateral effects in other species. For example, mutations in the topoisomerase gene gyrA have been posited to induce collateral sensitivities via global transcriptional changes induced by modulated DNA supercoiling [35,72,73]. Similarly, mutations in ribosomal genes, such as rpsE, have been linked with multidrug resistance modulated by large-scale changes in the transcriptome [74].

Sensitivity profiles cluster into groups based on known classes of selecting drug
Our results indicate that there is substantial heterogeneity in collateral sensitivity profiles, even when parallel populations are selected with the same antibiotic. Although the genetic networks underlying these phenotypic responses are complex and, in many cases, poorly understood, one might expect that selection by chemically or mechanistically similar drugs would lead to profiles with shared statistical properties. For example, previous work showed (in a different context) that pairwise interactions between simultaneously applied antibiotics can be used to cluster drugs into groups that interact monochromatically with one another; these groups correspond to known drug classes [75], highlighting statistical structure in drug interaction networks that appear, on the surface, to be extremely heterogeneous. Recent work in bacteria has also shown that phenotypic profiles of mutants selected by drugs from the same class tend to cluster together in Pseudomonas aeruginosa [38] and Escherichia coli [76].
Similarly, we asked whether collateral sensitivity profiles in E. faecalis can be used to cluster resistant mutants into statistically similar classes. We first performed hierarchical clustering (Materials and methods) on collateral profiles of 52 different mutants (Fig 4, x-axis; note that we excluded mutants selected by CHL and NIT, which did not achieve resistance of at least 2× to the selecting drug). Despite the heterogeneity in collateral profiles, they cluster into groups characterized-exclusively-by selecting drugs from the same drug classes before grouping mutants from any two different drug classes. For example, inhibitors of cell wall synthesis (ampicillin [AMP], CRO, FOF, OXA) cluster into one group (noted by panel A in Fig 3), whereas tetracycline (TET)-like drugs (TET, doxycycline [DOX], tigecycline [TGC]) cluster into another (noted by panel B). This approach also separates SPT (aminoglycoside antibiotic class) from the TET class of antibiotics (TET, DOX, TGC) despite the fact that they both target the 30S subunit of the ribosome, suggesting that it may help identify drugs with similar mechanisms but statistically distinct collateral profiles.
We then performed a similar clustering analysis of the collateral responses across the 14 different testing drugs (Fig 3, y-axis), which again leads to groupings that correspond to known drug classes. One drug, FOF, provides an interesting exception. Mutants selected for FOF resistance cluster with those of other cell wall synthesis inhibitors (class A, columns). However, the behavior of FOF as a testing drug (last row) is noticeably distinct from that of other cell wall synthesis inhibitors (the three rows directly above FOF). Taken together, the clustering analysis reveals clear statistical patterns that connect known mechanisms of antibiotics to their behavior as both selecting and testing agents.

An MDP model predicts optimal drug policies to constrain resistance
Our results indicate that collateral sensitivity is pervasive, and although collateral sensitivity profiles are highly heterogeneous, clustering suggests the existence of statistical structure in the data. Nevertheless, because of the stochastic nature of the sensitivity profiles, it is not clear whether this information can be leveraged to design drug sequences that constrain evolution. It is important to note that our goal, at this stage, is not to design specific drug sequences that might be transferred directly to the clinic but, instead, to evaluate-in a simple setting-the feasibility of slowing resistance in even the most optimized cases. Given that resistance to the selecting drugs is often larger in magnitude than collateral (off-diagonal) effects, it is not clear a priori that a feasible strategy exists that prevents the inevitable march to high-level resistance, even in a highly idealized setting. . When clustering the testing drugs (rows), drugs from the same class are frequently but not always clustered together. For example, cell wall drugs such as AMP, OXA, and CRO form a distinct cluster that does not include FOF (bottom four rows). See also S1 Table. AMP, ampicillin; CIP, ciprofloxacin; CRO, ceftriaxone; DAP, daptomycin; DOX, doxycycline; FOF, fosfomycin; LVX, levofloxacin; LZD, linezolid; OXA, oxacillin; RIF, rifampicin; SPT, spectinomycin; TET, tetracycline; TGC, tigecycline. https://doi.org/10.1371/journal.pbio.3000515.g004 Collateral sensitivity informs optimal anti-resistance strategies To address this problem, we develop a simple mathematical model based on an MDP to predict optimal drug policies. MDP's are widely used in applied mathematics and finance and have a well-developed theoretical basis [77][78][79]. In an MDP, a system transitions stochastically between discrete states. At each time step, we must make a decision (called an "action"), and for each state-action combination there is an associated instantaneous "reward" (or cost). The action influences not only the instantaneous reward but also which state will occur next. The goal of the MDP is to develop a policy-a set of actions corresponding to each state-that will optimize some objective function (e.g., maximize some cumulative reward) over a given time period.
For our system, the state s t at time step t = 0,1,2,. . . is defined by the resistance profile of the population, a vector that describes the resistance level to each available drug. At each time step, an action a t is chosen that determines the drug to be applied. The system-which is assumed to be Markovian-then transitions with probability P a (s t+1 |s t ,a t ) to a new state s t i þ1 , and the transition probabilities are estimated from evolutionary experiments (or any available data). The instantaneous reward function R a (s) is chosen to be the (negative of the) resistance to the currently applied drug; intuitively, it provides a measure of how well the current drug inhibits the current population. The optimal policy π � (s) is formally a mapping from each state to the optimal action; intuitively, it tells which drug should be applied for a given resistance profile. The policy is chosen to maximize a cumulative reward function R c � h P 1 t¼0 g t R p ðs t Þi, where brackets indicate an expectation value conditioned on the initial state s 0 and the choice of policy π. The parameter γ (0�γ<1) is a discount factor that determines the timescale for the optimization; γ�1 leads to a solution that performs optimally on long timescales, whereas γ�0 leads to solutions that maximize near-term success.
To apply the MDP framework to collateral sensitivity profiles, we must infer from our data a set of (stochastic) rules for transitioning between states (i.e., we must estimate P a (s t+1 |s t ,a t )). Although many choices are possible-and different rules may be useful to describing different evolutionary scenarios-here, we consider a simple model in which the resistance to each drug is increased/decreased additively according to the collateral effects measured for the selecting drug in question. Specifically, the state s t+1 following application of a drug at time t is given by s t þ � C, where � C is one of the four collateral profiles (see Fig 1) measured following selection by that drug. Because resistance/sensitivity is measured using log-scaled ratios of IC 50 's, these additive changes in the resistance profile correspond to multiplicative changes in the relative IC 50 for each drug. For instance, if one selection step increases the IC 50 by a factor of 3, then two consecutive selection steps would increase IC 50 by a factor of 9. This model assumes that selection by a given drug always produces changes in the resistance profile with the same statistical properties. For example, selection by DAP increases the resistance to DAP (with probability 1) while simultaneously either increasing resistance to AMP (with probability 1/4), decreasing resistance to AMP (with probability 1/4), or leaving resistance to AMP unchanged (probability 1/2). Repeated application of the same drug will steadily increase the population's resistance to that drug, but the process could potentially sensitize the population to other drugs. This model implicitly assumes sufficiently strong selection that, at each step, the state of the system is fully described by a single "effective" resistance profile (rather than, for example, an ensemble of profiles that would be required to model clonal interference). Although we focus here on this particular model, we stress that this MDP framework can be easily adopted to other scenarios by modifying P a (s t+1 |s t ,a t ).
For numerical efficiency, we discretized both the state space (i.e., the resistance to each drug is restricted to a finite number of levels) as well as the measured collateral profiles (exposure to a drug leads to an increase/decrease of 0, 1, or 2 resistance levels; Fig 5A, S3 Fig). In practice, this means that resistance will eventually saturate at a finite value if a single drug is applied repeatedly. In addition, we restrict our calculations to a representative subset of six drugs (DAP, AMP, FOF, TGC, LZD, RIF). The set includes inhibitors of cell wall, protein, or RNA synthesis, and five of the six drugs (excluding RIF) are clinically relevant for Enterococcus infections. We note, however, that the results are qualitatively similar for different discretization schemes (S4 Fig) and for different drug choices (S5-S7 Figs).

Drug policies can be tuned to minimize resistance on different timescales
The optimal policy π � (s) is a high-dimensional mapping that is difficult to directly visualize. For intuition on the policy, we calculated the frequency with which each drug is prescribed as a function of resistance to each of the six individual drugs (S8 and S9 Figs; top panels). Not surprisingly, we found that when resistance to a particular drug is very low, that drug is often chosen as optimal. In addition, the specific frequency distributions vary depending on γ, which sets the timescale of the optimization. For example, the long-term optimal policy (γ = 0.99) yields a frequency distribution that is approximately independent of the level of resistance to FOF (S8 Fig, upper-right panel). By contrast, the frequency distribution for a short-term policy (γ = 0.1) changes with FOF resistance; at low levels of resistance, FOF is frequently applied as the optimal drug, but it is essentially never applied once FOF resistance reaches a certain threshold (S9 Fig, upper-right panel). Both the short-and long-term optimal policies lead to aperiodic drug sequences, but the resulting resistance levels vary substantially (S8 and S9 Figs, bottom panels). These differences reflect a key distinction in the policies: short-term policies depend sensitively on the current resistance level and maximize efficacy (minimize resistance) at early times, whereas long-term policies may tolerate short-term performance failure in exchange for success on longer timescales.

Optimal policies outperform random cycling and rely on collateral sensitivity
To compare the outcomes of different policies, we simulated the MDP and calculated the expected resistance level to the applied drug over time, hR(t)i, from 1,000 independent realizations ( Fig 5B). All MDP policies perform better than random drug cycling for the first 10-20 time steps and even lead to an initial decrease in resistance. The long-term policy (γ = 0.99, blue) is able to maintain low-level resistance indefinitely, whereas the short-term policy (γ = 0) eventually gives rise to high-level (almost saturating) resistance. Notably, if we repeat this calculation on an identical data set but with all collateral sensitivities set to 0, the level of resistance rapidly increases to its saturating value (Fig 5B, light red line), indicating that collateral sensitivity is critical to the success of these policies. We note that the timescales used here are not necessarily reflective of a clinical situation, and instead, our goal is to understand the performance of the optimization over a wide range of timescales.

Optimal policies highlight a new strategy for minimizing resistance
To understand the optimal policy dynamics, we calculated the time-dependent probability distributions P(Drug)-the probability of applying a particular drug-and P(Resist)-the probability of observing a given level of resistance to the applied drug-for the MDP following the long-term policy (γ = 0.99, Fig 5C and 5D). We also calculated the (steady-state) joint probability distribution characterizing the prescribed drugs at consecutive time steps (Fig 5E). The distributions reveal highly nonuniform behavior; after an initial transient period, RIF is applied most often, followed by FOF, whereas DAP is essentially never prescribed. Certain patterns also emerge between consecutively applied drugs; for example, FOF is frequently followed by RIF. In addition, although it is possible in principle for the same drug to be applied in consecutive steps, in practice it is rare (Fig 5E). Somewhat surprisingly, the distribution of resistance levels is highly bimodal, with the lowest possible resistance level occurring most often, followed by the highest possible level, the second lowest level, and then the second highest level (Fig 5D). The policy achieves a low average level of resistance not by consistently maintaining some intermediate level of resistance to the applied drug but, instead, by switching between highly effective drugs and highly ineffective drugs, with the latter occurring much less frequently. In other words, rare periods of high resistance are the price of frequent periods of very low resistance. These qualitative trends occur for other drug choices (S5-S7 Figs) and are relatively insensitive to the number of discretization levels chosen (S4 Fig). The results suggest a new conceptual strategy for minimizing resistance: interspersing frequent steps of instantly effective drugs (low resistance)-which provide short-term inhibition of pathogen growth-with rare steps of relatively ineffective drugs (high resistance), which provide little short-term inhibition but shepherd the population to a more vulnerable future state.

Optimal policies maintain lower long-term resistance than collateral sensitivity cycles
The resurgent interest in collateral sensitivity was sparked, in part, by innovative recent work that demonstrated the successful application of antibiotic switching, in which one drug promotes evolved sensitivity to the next drug [41]. To compare the performance of the MDP to that expected from collateral sensitivity cycles, we identified all collateral sensitivity cycles for the sixdrug network and calculated hR(t)i for 100 time steps of each cycle. We then determined the "best" cycle of a given length-defined as the cycle with the lowest mean value of hR(t)i over the last 10 time steps-and compared the performance of those cycles to the short-and long-term MDP policies (Fig 5F). The MDP long-term optimal solution (γ = 0.99) maintains resistance at a lower average value than for all of the collateral sensitivity cycles. For MDP policies with shorter time horizons (e.g., the instant gratification cycle, γ = 0), however, the collateral sensitivity cycles of three and four drugs (as well as the long-term MDP solution) lead to lower resistance at intermediate or longer time scales, reflecting the inherent trade-offs between instantaneous drug efficacy and long-term sustainability. One advantage of the MDP optimization is that it allows for explicit tuning of the policy (via γ) to achieve maximal efficacy over the desired time horizon.

Optimized drug sequences improve growth inhibition and reduce adaptation rates in laboratory evolution experiments
The MDP-based optimal policies perform well in stochastic simulations and highlight new strategies for potentially slowing resistance. However, the model contains a number of (B) Average level of resistance (hR(t)i) to the applied drug for policies with γ = 0 (red), γ = 0.7 (black), γ = 0.78 (magenta), and γ = 0.99 (blue). Resistance to each drug is characterized by 11 discrete levels arbitrarily labeled with integer values from −1 (least resistant) to 9 (most resistant). At time 0, the population starts in the second lowest resistance level (0) for all drugs. Symbols (circles, triangles, squares) are the means of 10 3 independent simulations of the MDP, with error bars ± SEM. Solid lines are numerical calculations using exact Markov chain calculations (see Materials and methods). Light red line, long-term optimal policy (γ = 0.99) calculated using the data in (A) but with collateral sensitivity values set to 0. Black shaded line, randomly cycled drugs (±SEM). (C) The time-dependent probability P(Drug) of choosing each of the six drugs when the optimal policy (γ = 0.99) is used. Inset, steady-state distribution P s s (Drug). (D) The probability P (Resist) of the population exhibiting a particular level of resistance to the applied drug when the optimal policy (γ = 0.99) is used. Inset, steady-state distribution P s s (Drug). (E) Steady-state joint probability distribution P(current drug, next drug) for consecutive time steps when the optimal policy (γ = 0.99) is used. (F) Average level of resistance (hR(t)i) to the applied drug for collateral sensitivity cycles of 2 (dark green, LZD-RIF), 3 (pink, AMP-RIF-LZD), or 4 (dark green, AMP-RIF-TGC-LZD) drugs are compared with MDP policies with γ = 0 (short-term, red) and γ = 0.99 (long-term, blue). For visualizing the results of the collateral sensitivity cycles, which give rise to periodic behavior with large amplitude, the curves show a moving time average (window size 10 steps), but the original curves are shown transparently in the background. Data underlying this figure can be found in S1 Data. AMP, ampicillin; DAP, daptomycin; FOF, fosfomycin; LZD, linezolid; MDP, Markov decision process; RIF, rifampicin; TGC, tigecycline. https://doi.org/10.1371/journal.pbio.3000515.g005 Collateral sensitivity informs optimal anti-resistance strategies assumptions that lead to an oversimplified picture of the true evolutionary dynamics. As a result, it is not clear whether optimized drug sequences from this model will be effective in real, evolving pathogen populations.
To test the performance of MDP-based drug cycles, we designed a laboratory evolution experiment comparing inhibitory effects of different drug cycling protocols over 20 days. For experimental feasibility, we restrict our focus to a subset of four drugs (FOF, RIF, AMP, TGC) and reduced the length of each evolutionary time step from 8 days-as in the original collateral sensitivity experiment (Fig 1)-to 2 days. First, we experimentally measured the collateral sensitivity matrix for the four-drug set following 2 days of laboratory evolution in eight replicate populations per drug (Fig 6A). We then calculated the optimal policy for two different values of γ (γ = 0.9, γ = 0.78), both corresponding to timescales commensurate with the planned experiment. In both cases, the steady-state distribution of drug application P(Drug) calls for frequent use of TGC and relatively rare use of FOF, though the specific distribution depends on the particular choice of γ (Fig 6B, top panel; see also S11 Fig).
An exact application of the optimal policy requires measuring the full sensitivity profile at each step and using that profile, in accordance with the policy, to choose the next drug in the sequence. However, simulations suggest that choosing a predetermined drug cycle-that is, a cycle drawn from a particular realization of the stochastic process-is expected to perform near optimally on the timescale of the experiment (S12 and S13 Figs). For experimental convenience, we choose a single MDP-derived cycle for each value of γ. For example, for γ = 0.9 the sequence involves ten 2-day time steps, with drugs applied in the following order: AMP-FO-F-RIF-TGC-AMP-TGC-AMP-TGC-FOF-TGC (Fig 6B, bottom; see S11 Fig for γ = 0.78 example). To evaluate the efficacy of the MDP-derived cycle, we exposed a total of 60 replicate populations to one of 13 different drug cycle protocols (including the two MDP-derived cycles) over a 20-day serial-passage laboratory evolution experiment. Every 40 hours, we measured the optical density (OD) of each population and then diluted each into fresh media containing the prescribed drug (added after a brief drug-free outgrowth phase, see Materials and methods). Drug concentrations were chosen to be just above the minimum inhibitory concentration (MIC) for ancestral populations-with MIC determined by complete absence of growth in ancestral strains after 24 hours under identical conditions-and the same concentration was applied at every time step calling for the associated drug. As a measure of drug efficacy, we defined the cumulative growth of a population at time t as the sum of the OD measurements up to and including time t. Note that because drug-resistant populations often reach a steady-state carrying capacity-in our case, about OD = 0.6-considerably faster than the 40-hour time window, cumulative growth is a conservative measure that underestimates differences in population size that would occur in exponentially growing populations (for example, in a chemostat).
In addition to the two MDP-derived drug protocols, we also tested protocols calling for repeated application of each drug alone (Fig 6C), each of the six possible two-drug cycles ( Fig  6D), a four-drug cycle consisting of repeated application of RIF-FOF-TGC-AMP ( (Fig 6E), and a drug-free control (dashed lines, Fig 6C-6E). The specific four-drug cycle (RIF-FOF-TG-C-AMP) is chosen as a representative example of the 4! = 24 possible four-drug permutations (S10 Fig). In all cases, cumulative growth was normalized to the value of the drug-free control at the end of the 20-day experiment. To compare results from the model with experiment, we mapped each of the discrete resistance levels to an OD value, with the highest level of resistance corresponding to drug-free growth (OD �0.6 each day) and the lowest resistance level corresponding to no growth (OD = 0); see S12 and S13 Figs. We found experimentally that cycles involving sequential application of drugs with (on average) mutual collateral sensitivity-for example, cycles of RIF-FOF or RIF-AMP (see (Fig 6A)-are among the best-performing two-drug cycles, as predicted by previous studies [41]. However, the MDP-derived protocols led to a reduction in cumulative growth, outperforming every other protocol, sometimes by substantial margins (Fig 6; S11 Fig). In addition to cumulative growth, we characterized each trajectory by calculating the adaptation rate, which is defined as the average rate of increase of instantaneous growth over time (i.e., the slope of the best-fit regression line for instantaneous growth versus time over days 2-20, S14 and S15 Figs). Adaptation rate, which is essentially an estimate of the average convexity of the cumulative growth curves, provides no information on the magnitude of the growth at each step but, instead, measures how rapidly that growth is increasing over time (starting with the first measurement after day 2). In addition to reducing cumulative growth, the MDP-derived sequences led to lower rates of adaptation than nearly every other protocol (Fig 6F; S11 Fig). A notable exception is the TGC-AMP cycle, which exhibits a (small) negative adaptation rate, reflecting the fact that growth at day 2 has already achieved relatively high levels-roughly 60% of drug-free growth-suggesting that adaptation largely occurs in that first period but is nearly absent after that. On the whole-when the data are grouped according to number of drugs in the cycle (1, 2, or 4)-the MDP cycles correspond to lower mean adaptation rates than the one-drug cycles (p = 0.04, one-sided two-sample t test), two-drug cycles (p = 0.05, one-sided two-sample t test), and the four-drug cycles (p = 0.01, one-sided two-sample t test) tested.

Discussion
Our work provides an extensive quantitative study of phenotypic and genetic collateral drug effects in E. faecalis. We have shown that cross-resistance and collateral sensitivity are widespread but heterogeneous, with patterns of collateral effects often varying even between mutants evolved to the same drug. Our results contain a number of surprising, drug-specific observations; for example, we observed a strong, repeatable collateral sensitivity to RIF when mutants were selected by inhibitors of cell wall synthesis. Additionally, cross-resistance to DAP is particularly common when cells are selected by other frequently used antibiotics. Because the Food and Drug Administration (FDA)/Clinical & Laboratory Standards Institute (CLSI) breakpoint for DAP resistance is not dramatically different than the MIC distributions found in clinical isolates prior to DAP use [80], one may speculate that even small collateral effects could have potentially harmful consequences for clinical treatments involving DAP. In addition, we found that selection by one drug, LZD, led to higher overall resistance to CHL than direct selection by CHL. Although CHL is rarely used clinically, the result illustrates that (1) collateral effects can be highly dynamic, and (2) indirect selection may drive a population across a fitness valley to an otherwise inaccessible fitness peak.
Our findings also point to global trends in collateral sensitivity profiles. For example, we found that the repeatability of collateral effects is sensitive to the drug used for selection,

Fig 6. Optimized drug sequences reduce cumulative growth and adaptation rates in laboratory evolution experiments. (A) Resistance
(red) or sensitivity (blue) of each evolved mutant (horizontal axis; 4 drugs × 8 mutant per drug) to each drug (vertical axis) following 2 days of selection is quantified by the log 2 -transformed relative increase in the IC 50 of the testing drug relative to that of wild-type (V583) cells. (B) Top: distribution of applied drug at time step 20 (approximate steady state) calculated over all realizations of the stochastic process using an optimal policy with γ = 0.9. Bottom: sequence of applied drug from one particular realization of the stochastic process with the optimal policy (γ = 0.9). (C-E) Cumulative population growth over time for populations exposed to single-drug sequences ([C], blue), two-drug sequences ([D], magenta), a four-drug sequence ([E], red), or the optimal sequence from panel B (black curves, all panels). Transparent lines represent individual replicate experiments and each thicker dark line corresponds to a mean over replicates. Dashed line, drug-free control (normalized to a growth of 1 at the end of the experiment). (F) Adaptation rate for single-drug (blue), two-drug (magenta), four-drug (red), and optimal sequences (black). Error bars are standard errors across replicates. Adaptation rate is defined as the slope of the best-fit linear regression describing time series of daily growth (see S15 Fig). Data underlying this figure can be found in S1 Data. AMP, ampicillin; FOF, fosfomycin; IC 50 , half-maximal inhibitory concentration; RIF, rifampicin; TGC, tigecycline. https://doi.org/10.1371/journal.pbio.3000515.g006 Collateral sensitivity informs optimal anti-resistance strategies meaning that some drugs may be better than others for establishing robust antibiotic cycling profiles. On the other hand, despite the apparent unpredictability of collateral effects at the level of individual mutants, the sensitivity profiles for mutants selected by drugs from known classes tend to cluster into statistically similar groups. As proof of principle, we show how these profiles can be incorporated into a simple mathematical framework that optimizes drug protocols while accounting for effects of both stochasticity and different time horizons. Within this framework, drug policies can be tuned to optimize either short-term or long-term evolutionary outcomes. The ability to systematically tune these timescales may eventually be useful in designing drug protocols that interpolate between short-term, patient-centric outcomes and long-term, hospital-level optimization.
Our results complement recent studies on collateral sensitivity and also raise a number of new questions for future work. Much of the previous work on collateral networks in bacteria has focused on gram-negative bacteria and highlighted the role of aminoglycosides in collateral sensitivity [36,41]. Many gram-positive bacteria, including enterococci, are intrinsically resistant to aminoglycosides [81], and we therefore included only one (SPT) in our study. In that case, however, we did observe collateral sensitivity to cell wall inhibitors (AMP and FOF) in SPT-selected populations, consistent with findings in other species [36,41], though it is not clear from our results whether aminoglycoside resistance would be associated with more widespread collateral sensitivity in E. faecalis. Recent work demonstrates that collateral profiles may be largely conserved across a wide range of E. coli isolates [76], offering hope that largescale analysis of clinical isolates may soon identify similar patterns in enterococci.
Multiple studies have shown that collateral profiles are heterogeneous [49,50], and optimization will therefore require incorporation of stochastic effects such as likelihood scores [51]. These likelihood scores could potentially inform transition probabilities in our MDP approach, leading to specific predictions for optimal drug sequences based on known fitness landscapes. Although we have quantified the variability in evolved populations in several ways (e.g., variability scores, interprofile distance, population sequencing), we cannot definitely comment on the source of that variability; it could arise, for example, from different fixation events in independent populations or, alternatively, from clonal interference and random sampling in isolating individual clones. Indeed, population sequencing does suggest some measure of heterogeneity, even when we limit our analysis to mutations occurring at greater than 30%. In any event, our results point to a rich collection of possible collateral profiles, meaning that successful approaches for limiting resistance will likely require incorporation of variability and heterogeneity.
Several previous studies have indicated that cycles involving mutually collaterally sensitive drugs may be chosen to minimize the evolution of resistance [41,42]. In the context of our MDP model, these cycles fall somewhere between the short-time-horizon optimization and the long-term optimal strategy, and in some cases, the collateral sensitivity cycling can lead to considerable slowing of resistance. However, our results indicate that the MDP optimizations on longer time horizons lead to systematically lower resistance, a consequence of intermixing (locally) suboptimal steps in which the drug is instantaneously less effective but shepherds the population to a more vulnerable evolutionary state. We also find experimentally that mutual collateral sensitivity cycles with two drugs do generally outperform most other two-drug and single-drug protocols-as predicted by previous studies-but they generally underperform the MDP-based sequences. We also find that the MDP-based sequences experimentally outperform a representative four-drug sequence. However, simulations suggest that some four-drug sequences are expected to perform better and some worse than the chosen cycle. Because it was not experimentally feasible to exhaustively test all possible four-drug cycles, it is possible that another four-drug cycle may have performed as well as the MDP. In fact, the existence of high-performing four-drug cycles would even, perhaps, be expected, given that some uniform cycles may be well suited to harness collateral sensitivities between pairs of drugs, even if they do so suboptimally. As a result, a significantly longer experiment may be required to evaluate comparative performance of truly optimal sequences against "best-case" four-drug cycles.
It is important to keep in mind several limitations of our work. Designing effective drug protocols for clinical use is an extremely challenging and multiscale problem. Our approach was not to develop a detailed, clinically accurate model but, instead, to focus on a simpler question: optimizing drug cycles in single-species host-free populations. Even in this idealized scenario, which corresponds most closely to in vitro laboratory experiments, slowing resistance is a difficult and poorly understood problem (despite much recent progress). Our results are promising because they show that systematic optimization is indeed possible given the measured collateral sensitivity profiles.
We have chosen to focus on a simple evolutionary scenario in which collateral effects accumulate over time based on the history of drug exposure. By using a simple model that can be analyzed in detail, our goal was to identify new conceptual strategies-and with them, experimentally testable predictions-for exploiting correlations in phenotypic resistance profiles. Although we have focused on an extremely simple model, the MDP framework can be readily extended to account for different evolutionary scenarios and to incorporate more complex clinically inspired considerations. For example, it would be straightforward to include fitness costs associated with different resistance profiles; in turn, the model might be extended to allow for drug-free periods ("drug holidays"), which potentially exploit these fitness costs to minimize resistance [50]. In addition, the current model inherently assumes that the dominant collateral effects are independent of the genetic background. In fact, collateral sensitivity profiles in cancer have been previously shown to be time dependent [50,82], epistasis certainly occurs [49,83], and population heterogeneity could limit efficacy of this strategy under some conditions [84]. Unfortunately, the frequency and relative impact of these confounding effects are difficult to gauge. However, the relative success of the MDP-inspired sequences in laboratory evolution experiments underscores the potential of the approach. In particular, our findings offer hope that strategies combining frequent use of highly effective drugs with rare periods of "evolutionary steering" by less effective drugs may be promising, even when the detailed assumptions of the model do not strictly hold.
Our future work will focus on experimentally characterizing dynamic properties of collateral effects and expanding the MDP approach to account for time-varying sensitivity profiles and epistasis. It may also be interesting to investigate collateral effects in microbial biofilms, in which antibiotics can have counterintuitive effects even on evolutionarily short timescales [85]. On longer timescales, elegant experimental approaches to biofilm evolution have revealed that spatial structure can give rise to rich evolutionary dynamics [86,87] and, potentially, but not necessarily, divergent results for biofilm and planktonic populations [88] Finally, our results raise questions about the potential molecular and genetic mechanisms underlying the observed collateral effects. The phenotypic clustering analysis presented here may point to shared mechanistic explanations for sensitivity profiles selected by similar drugs, and the full genome sequencing identifies candidate genes associated with increased resistance. However, it is important to remember that the population sequencing results correspond to only a small fraction of the evolved populations, which exhibit significant biological variability. The results may therefore not be representative of the dominant evolutionary trajectories leading to resistance for each drug. Fully elucidating the detailed genetic underpinnings of collateral sensitivity remains an ongoing challenge for future work. At the same time, because the MDP framework depends only on phenotypic measurements, it may allow for systematic optimization of drug cycling policies even when molecular mechanisms are not fully known.

Strains, antibiotics, and media
All resistant lineages were derived from E. faecalis V583, a fully sequenced vancomycin-resistant clinical isolate [89]. The 15 antibiotics used are listed in Table 1. Each antibiotic was prepared from powder stock and stored at −20˚C, with the exception of AMP, which was stored at −80˚C. Evolution and IC 50 measurements were conducted in BHI medium alone, with the exception of DAP, which requires an addition of 50 mM calcium for antimicrobial activity.

Laboratory evolution experiments
Evolution experiments to each antibiotic were performed in quadruplicate. Evolutions were performed using 1 mL BHI medium in 96-well plates with a maximum volume of 2 mL. Each day, populations were grown in at least three different antibiotic concentrations spanning both sub-and super-MIC doses. After 16-20 hours of incubation at 37˚C, the well with the highest drug concentration that contained visual growth was propagated into two higher concentrations (typically a factor 2× and 4× increase in drug concentration) and one lower concentration to maintain a living mutant lineage (always half the concentration that most recently produced growth). A 1/200 dilution was used to inoculate the next day's evolution plate, and the process was repeated for a total of 8 days of selection. On the final day of evolution, all strains were stocked in 30% glycerol. Strains were then plated on a pure BHI plate, and a single colony was selected for IC 50 determination. In the case of LZD mutants, days 2, 4, and 6 were also stocked for further testing. To avoid potential contamination, samples from each population were plated at regular intervals and also visualized using DIC microscopy to check for typical E. faecalis morphology. Contamination was identified in nine of the 32 LZD-selected populations after day 5, and those populations were therefore excluded from further analysis.

Measuring drug resistance and sensitivity
Experiments to estimate IC 50 were performed in replicate in 96-well plates by exposing mutants to a drug gradient consisting of 6-14 points-one per well-typically in a linear dilution series prepared in BHI medium with a total volume of 205 μL (200 μL of BHI, 5 μL of 1.5 OD cells) per well. After 20 hours of growth, the OD at 600 nm (OD600) was measured using an Enspire Multimodal Plate Reader (Perkin Elmer) with an automated 20-plate stacker assembly. This process was repeated for all 60 mutants as well as the WT, which was measured in replicates of eight.
The OD (OD600) measurements for each drug concentration were normalized by the OD600 in the absence of drug. To quantify drug resistance, the resulting dose-response curve was fit to a Hill-like function f(x) = (1+(x/K) h ) −1 using nonlinear least squares fitting, in which K is the IC 50 and h is a Hill coefficient describing the steepness of the dose-response relationship. A mutant strain was defined to be collaterally sensitive if its IC 50 had decreased by more than 3σ a relative to the ancestral strain (3σ a is defined as the uncertainty-standard error across replicates-of the IC 50 measured in the ancestral strain). Similarly, an increase in IC 50 by more than 3σ a relative to the ancestral strain corresponds to cross-resistance.

Estimating growth costs and lag times
To estimate growth costs and lag times associated with isolates from each evolved population, we measured OD time series over 5 hours in drug-free media (BHI). Overnight cultures were diluted 200× into individual wells of a 96-well plate, and OD time series were measured using an Enspire Multimodal Plate reader (Perkin Elmer). Background-subtracted growth curves were then fit with nonlinear least squares to a logistic growth function [60] given by g(t) = g 0 +K c (1+exp(4μ(λ−t)/K c +2)) −1 , where μ is the maximum specific growth rate, λ is the lag time, and K c is the carrying capacity, which we fix to K c = 0.5 to match that of the ancestral strain.

Hierarchical clustering
Hierarchical clustering was performed in Matlab using, as input, the collateral profiles � C for each mutant. The distance between each pair of mutants was calculated using a correlation metric (Matlab function pdist with parameter "correlation"), and the linkage criteria was chosen to be the mean average linkage clustering.

MDP model
The MDP model consists of a finite set of states (S), a finite set of actions (A), a conditional probability (P a (s 0 |s,a)) describing (action-dependent) Markovian transitions between these states, and an instantaneous reward function (R a (s)) associated with each state and action combination. The state of the system s2S is an n d -dimensional vector, with n d indicating the number of drugs and each component s i 2{r min ,r min +1,. . .,r max } indicating the level of resistance to drug i. The action a2A�{1,2,. . .,n d } is the choice of drug at the current step, and we take the reward function R a (s) to be the (negative of the) resistance level to the currently applied drug (i.e., the a-th component of s). The goal of the MDP is to identify a policy π(s), which is a mapping from S to A that specifies an optimal action for each state. The policy is chosen to maximize a cumulative reward function R c ¼ P 1 t¼0 g t hR p ðs t Þi, where t is the time step, s t is the state of the system at time t, R π (s t ) is a random variable describing the instantaneous reward assuming that the actions are chosen according to policy π, and brackets indicate an expectation value. The parameter γ (0�γ<1) is a discount factor that determines the relative importance of instantaneous versus long-term optimization. In other words, we seek an optimal policywhich associates the resistance profile of a given population to an optimal drug choice-that minimizes the cumulative expected resistance to the applied drug.
The MDP problem was solved using value iteration, a standard dynamic programming algorithm for MDP models. Briefly, the optimization was performed by first computing the optimal value function V(s), which associates to each state s the expected reward obtained by following a particular policy and starting in that state. Following the well-established value iteration algorithm [77][78][79], we iterate according to V i+1 (s) = max {a} (R a (s)+γ∑ s 0P(s 0 |s,a)V i (s 0 )). Given the optimal value function, the optimal policy is then given by the action that minimizes the optimal value function at the next time step.
Once the optimal policy π = π � is found, the system is reduced to a simple Markov chain with transition matrix T π � = P π � (s) (s 0 |s,π � (s)), where the subscript π � means that the decision in each state is determined by the policy π � (i.e., that a = π � (s) for a system in state s). Explicitly, the Markov chain dynamics are given by P t+1 (s) = T π � P t (s), with P t (s) the probability to be in state s at time step t. All quantities of interest-including P(Drug), P(Resist) (see Fig 5), and hR (t)i-can be calculated directly from P t (s). For example, hR(t)i = ∑ s2S P t (s)R π � (s), with R π � (s) the instantaneous reward for a system in state s under optimal policy π � .

Experiments to evaluate different drug sequence protocols
Experiments to evaluate different drug sequence protocols were performed in replicate in 96-well plates by exposing populations to antibiotic concentrations just above the WT MIC value, determined by an absence of measurable growth after 24 hours. Seed populations were grown overnight from single colonies and then diluted 1:200 into fresh BHI plates with the appropriate antibiotic concentration according to each prescribed policy. Populations were left to grow inside a plate reader for 40 hours while OD readings were taken every 20 minutes for at least the first 6 hours. To estimate daily growth, we took a final OD reading for each population after 40 hours. The populations were then diluted 1:200 into fresh BHI media, and following a brief 2-hour outgrowth phase, populations were then diluted immediately into preprepared plates containing the appropriate drug concentrations. The purpose of the outgrowth phase is to minimize drug-drug interactions and postantibiotic effects that may occur if the population were to be diluted into the next drug plate immediately. To avoid contamination, each plate was covered during growth phase. In addition, each experimental plate contained 36 control wells with BHI alone-no cells. If any of these wells displayed visible growth, the plate was considered to be contaminated and discarded; the experiment was then started again from the previous night's stock. During the 20-day experiment, only one such restart was required. Strains were stocked at −80˚C in 15% glycerol at the end of each 40-hour growth.
Cumulative growth and adaptation rate were estimated based on OD measurements at 40 hours. Adaptation rate was defined as the average rate of increase of instantaneous growth over time (i.e., the slope of the best-fit regression line for instantaneous growth versus time over days 2-20, S14 and S15 Figs). The instantaneous growth does not monotonically increase over time-as one might expect in adaptation to constant environments-and therefore, this definition (which estimates only a linear rate) differs from previous definitions based on nonlinear increases in exponential growth (see, for example, [27]).

Whole-genome sequencing
To identify any genomic changes that contributed to the measured collateral phenotypes identified, we sequenced 15 independently evolved drug mutants along with 2 V583 ancestors as well as a control V583 strain propagated in BHI for the 8 days. Each of the 15 drug-selected mutants and BHI control were subjected to both clonal and population sequencing. Populations were streaked from a frozen stock, grown up in BHI, and triple washed in PBS, and DNA was isolated using a Quick-DNA Fungal/Bacterial Kit (Zymo Research). The clonal samples were sequenced in two batches via the University of Michigan sequencing core, whereas the population samples were sequenced via the Microbial Genome Sequencing Center (MiGS) at University of Pittsburgh.
The resulting genomic data were analyzed using the high-throughput computational pipeline breseq, with default settings. Average read coverage depth was about 50 on batch 1, 300 on batch 2, and 200 on the population sequencing batch. Briefly, genomes were trimmed and subsequently aligned to E. faecalis strain V583 (accession numbers: AE016830-AE016833) via Bowtie 2. A sequence read was discarded if less than 90% of the length of the read did not match the reference genome or a predicted candidate junction. At each position, a Bayesian posterior probability was calculated, and the log 10 ratio of that probability versus the probability of another base (A, T, C, G, gap) is calculated. Sufficiently high consensus scores are marked as read alignment evidence (in our case, a consensus score of 10). Any mutation that occurred in either of the two control V583 strains was filtered from the results. Average level of resistance (hR(t)i) to the applied drug for policies with γ = 0 (red), γ = 0.7 (black), γ = 0.9 (magenta), and γ = 0.99 (blue). Resistance to each drug is characterized by 11 discrete levels ranging from −1 (least resistant) to 9 (most resistant). At time 0, the population starts in the second lowest resistance level (0) for all drugs. Symbols (circles, triangles, squares) are the means of 10 3 independent simulations of the MDP, with error bars ± SEM. Solid lines are numerical calculations using exact Markov chain calculations (see Methods). Black shaded line, randomly cycled drugs. (C) The probability P(Resist) of the population exhibiting a particular level of resistance to the applied drug when the optimal policy (γ = 0.99) is used. (D) The time-dependent probability P(Drug) of choosing each of the six drugs when the optimal policy (γ = 0.99) is used. (E) Steady-state joint probability distribution P(current drug, next drug) for consecutive time steps when the optimal policy (γ = 0.99) is used. (F) Average level of resistance (hR(t)i) to the applied drug for collateral sensitivity cycles of two (dark green, AMP-LVX) drugs are compared with MDP policies with γ = 0 (short-term, red) and γ = 0.99 (long-term, blue). For visualizing the results of the collateral sensitivity cycles, which give rise to periodic behavior with large amplitude, the curves show a moving time average (window size 10 steps), but the smoothed curves are shown transparently in the background. AMP, ampicillin; DAP, daptomycin; LZD, linezolid; MDP, Markov decision process; RIF, rifampicin; TGC, tigecycline. (EPS) S8 Fig. Optimal policy statistics and sample trajectories for γ = 0.99. The optimal policy π � (s) is a mapping from the set of all possible resistance profiles (S) to the set of drugs (A). The policy associates each resistance profile with a unique (optimal) drug. Top panels: frequency with which each drug is prescribed (according to the optimal policy) as a function of the level of resistance to an individual drug (horizontal axis). More specifically, for each of the six panels, the state space is partitioned into 11 distinct subsets, with each subset containing all states characterized by a given level of resistance to the particular drug in question (horizontal axis). The colored bars then show how frequently each of the six drugs is prescribed (according to the optimal policy) across all states within that subset. Bottom left panel: single simulated trajectory showing drug choice over time. Bottom right panel: single simulated trajectory of the instantaneous reward R, which corresponds to the resistance level to the applied drug. Blue curve is the specific trajectory; black curve is a moving average of the trajectory with a window size of 20. sequence of applied drug from one particular realization of the stochastic process with the optimal policy (γ = 0.78). (C-E) Cumulative population growth over time for populations exposed to single-drug sequences ([C], blue), two-drug sequences ([D], magenta), a four-drug sequence ([E], red), or the optimal sequence from panel B (black curves, all panels). Transparent lines represent individual replicate experiments, and each thicker dark line corresponds to a mean over replicates. Dashed line, drug-free control (normalized to a growth of 1 at the end of the experiment). (F) Adaptation rate for single-drug (blue), two-drug (magenta), four-drug (red), and optimal sequences (black). Error bars are standard errors across replicates. Adaptation rate is defined as the slope of the best-fit linear regression describing time series of daily growth (see Fig 20). As a whole, the MDP cycles correspond to lower mean adaptation rates than the one-drug cycles (p = 0.003, one-sided two-sample t test), two-drug cycles (p = 0.004, one-sided two-sample t test), and the four-drug cycles (p = 0.002, one-sided two-sample t test) tested. Data underlying this figure can be found in S1 Data. IC 50 , half-maximal inhibitory concentration; MDP, Markov decision process. drugs × 8 mutants per drug) to each drug (vertical axis) following 2 days of selection is quantified by the log 2 -transformed relative increase in the IC 50 of the testing drug relative to that of wild-type (V583) cells. The profile is then discretized into four levels of resistance. (B) Top: distribution of applied drug at time step 20 (approximate steady state) calculated using an optimal policy with γ = 0.9. Bottom: sequence of applied drug from one particular realization of the stochastic process with the optimal policy (γ = 0.9). (C-E) Cumulative population growth (simulations) over time for populations exposed to single-drug sequences ([C], blue), twodrug sequences ([D], magenta), a four-drug sequence ([E], red), or the optimal sequence from panel B (black curves, all panels). Black circles correspond to the true optimal (i.e., applying the MDP policy directly) and performs only slightly better, on average, than the fixed sequence in panel B. At each time step, resistance level to each drug is converted to an OD value using a linear conversion, with the highest resistance level corresponding to growth of drug-free cells (OD�0.6) and the lowest resistance level corresponding to OD = 0. Transparent lines represent individual replicate experiments, and each thicker dark line corresponds to a mean over replicates. Dashed line, drug-free control (normalized to a growth of 1 at the end of the experiment). (F) Adaptation rate for single-drug (blue), two-drug (magenta), four-drug (red), and optimal sequences (black). Error bars are standard errors across replicates. Adaptation rate is defined as the slope of the best-fit linear regression describing time series of daily growth. Data underlying this figure can be found in S1 Data. IC 50 , half-maximal inhibitory concentration; MDP, Markov decision process; OD, optical density. (EPS) S13 Fig. Optimized drug sequences reduce cumulative growth and adaptation rates in numerical simulations of the laboratory evolution experiments. Compare with S11 Fig. (A) Resistance (red) or sensitivity (blue) of each evolved mutant (horizontal axis; 4 drugs × 8 mutants per drug) to each drug (vertical axis) following 2 days of selection is quantified by the log 2 -transformed relative increase in the IC 50 of the testing drug relative to that of wild-type (V583) cells. The profile is then discretized into four levels of resistance. (B) Top: distribution of applied drug at time step 20 (approximate steady state) calculated using an optimal policy with γ = 0.78. Bottom: sequence of applied drug from one particular realization of the stochastic process with the optimal policy (γ = 0.78). (C-E) Cumulative population growth (simulations) over time for populations exposed to single-drug sequences ([C], blue), twodrug sequences ([D], magenta), a four-drug sequence ([E], red), or the optimal sequence from panel B (black curves, all panels). Black circles correspond to the true optimal (i.e., applying the MDP policy directly) and performs only slightly better, on average, than the fixed sequence in panel B. At each time step, resistance level to each drug is converted to an OD value using a linear conversion, with the highest resistance level corresponding to growth of drug-free cells (OD�0.6) and the lowest resistance level corresponding to OD = 0. Transparent lines represent individual replicate experiments, and each thicker dark line corresponds to a mean over replicates. Dashed line, drug-free control (normalized to a growth of 1 at the end of the experiment). (F) Adaptation rate for single-drug (blue), two-drug (magenta), four-drug (red), and optimal sequences (black). Error bars are standard errors across replicates. Adaptation rate is defined as the slope of the best-fit linear regression describing time series of daily growth. Data underlying this figure can be found in S1 Data. IC 50 , half-maximal inhibitory concentration; MDP, Markov decision process; OD, optical density. (EPS) S14 Fig. Estimated adaptation rate in laboratory evolution experiments based on = 0.9 MDP policy. Daily growth, which is defined as the OD measured at the end of each 48-hour period (normalized to drug-free control), for populations exposed to single-drug (blue), twodrug (magenta), four-drug (red), and optimal (black) drug sequences. All time series start at day 2 (i.e., following 48 hours of adaption). Transparent curves correspond to individual replicate experiments; solid dark lines show the (average) best-fit linear regression. Adaptation rate is defined as the slope of the regression line. Data underlying this figure can be found in S1 Data. MDP, Markov decision process. (EPS) S15 Fig. Estimated adaptation rate in laboratory evolution experiments based on = 0.78 MDP policy. Daily growth, which is defined as the OD measured at the end of each 48-hour period (normalized to drug-free control), for populations exposed to single-drug (blue), twodrug (magenta), four-drug (red), and optimal (black) drug sequences. All time series start at day 2 (i.e., following 48 hours of adaption). Transparent curves correspond to individual replicate experiments; solid dark lines show the (average) best-fit linear regression. Adaptation rate is defined as the slope of the regression line. Data underlying this figure can be found in S1 Data. MDP, Markov decision process; OD, optical density. (EPS) S1 Table. Labeling scheme for dendrogram (Fig 4)