Long-term coexistence and regional heterogeneity of antibiotic-resistant infections reproduced by a simple spatial model

Antibiotic-resistant infections are a growing threat to human health. Mathematical models are a common tool to compare potential intervention strategies, but often struggle to reproduce a ubiquitous pattern seen in data: the long-term coexistence of both drug-susceptible and drug-resistant strains. Here we show that simple models of infection in structured or spatially-heterogeneous host populations lead to persistent coexistence where well-mixed models fail. This coexistence is robust over a wide range of treatment coverage, drug efficacies, costs of resistance, and mixing patterns. Perhaps more importantly, this model can explain other puzzling spatiotemporal features of drug resistance epidemiology that have received less attention, such as large differences in the prevalence of resistance between geographical regions that consume similar amounts of antibiotics or that neighbor one another. Our analysis identifies key features of host population structure that can be used to assess drug resistance risk and highlights the need to include spatial or demographic heterogeneity in models to guide resistance management.


Introduction
Antibiotic resistance is a major threat to our ability to treat bacterial infections. Over the past century, resistance to each new class of drugs has appeared soon after clinical use began. Today, drug-resistant infections are estimated to cost perhaps $20 billion annually 1 . Individual bacteria that are resistant to multiple classes of antibiotics are now common in species such as Streptococcus pneumoniae, Pseudomonas aeruginosa, and Clostridium difficile 2 , and nearly untreatable strains of Neisseria gonorrheae 3 , Klebsiella pneumoniae 4 , and Acinetobacter baumannii 5 have recently been identified. These trends have lead to speculations about a "post-antibiotic future", in which routine medical procedures such as surgeries, childbirth, and dental work might become as high-risk as they were in the pre-WWII era due to a lack of effective antibiotic prophylaxis and treatment 6 .
Beyond sensational headlines, there is widespread interest among experts in predicting the future morbidity, mortality, and economic impact of drug-resistant infections, so that appropriate investments to counter these trends can be encouraged from government and industry. Mathematical modeling has traditionally played a key role in predicting the dynamics of infectious diseases 7 , and has a long history of application to antibiotic-resistant infections 8 . Despite this, there are several recurrent trends in the spatiotemporal dynamics of drug-resistant strains of bacteria that have confounded modelers. Since models must, at minimum, be able to explain current trends before being trusted for forecasting, these fundamental disagreements between models and data have either discouraged efforts to make predictions or lead to widespread suspicion of existing predictions (e.g. the Review on Antimicrobial Resistance 9, 10 ). We outline three of these issues below.

I) Frequencies of resistance over time seem to defy competitive exclusion
For a very large number of bacteria-antibiotic combinations, the prevalence of resistant strains has not reached 100% in the population, but has rather appeared to saturate at an intermediate frequency. Prominent examples include Escherichia coli and aminopenicillins in Europe over the past decade 11,12 , methicillin-resistant Staphylococcus aureus in the United States 2 , and penicillin resistance in S. pneumoniae in Spain from the 1980's to 1990's 13,14 (Figure 1i) among other examples 15,16 . This long-term coexistence of drug-susceptible and drug-resistant strains can be difficult to explain. Resistant strains are clearly selected in treated individuals, but generally carry a fitness cost such that susceptible strains do better in untreated individuals [17][18][19] . Mathematical models of infection dynamics under treatment predict that in most conditions the population will move towards an equilibrium where only a single strain persists, with drug-resistant strains reaching fixation when treatment is common enough and costs of resistance are low enough, and drug-susceptible strains dominating otherwise 20 . More generally, long-term coexistence of two strains would seem to defy the ecological principle of competitive exclusion [21][22][23] , which dictates that when two species compete for a single resource (here, the susceptible population), only one may survive. After an initial increase, drug-susceptible and drug-resistant strains have appeared to co-exist stably for ∼ 15 years. 24 . II) Percentage of S. pneumoniae isolates from 2005 that were resistant to penicillin versus the number of daily doses of penicillin administered divided by population size, for each country participating in the European Centres for Disease Prevention and Control 11,25 . The same quantity of antibiotic prescriptions can lead to very dramatic differences in rates of resistance. III) Time-averaged percent of K. pneumoniae isolates resistant to carbapenems in Italy, France, Germany, Belgium, Luxembourg, Austria, Slovenia, and Croatia from 2010-2015 11 (blue/yellow) and S. pneumoniae isolates resistant to penicillin in three provinces of Spain from 1990-1998 (green/red) 13,14 . The year-to-year deviation from this average is less than 3%. The equivalent map of Italy and its neighbors for these years can be seen in Supp. Fig. S1. Neighboring regions can have vastly different rates of resistance which persist for long time periods. . Note that for S. pneumoniae and macrolide or penicillin, "% resistant" includes isolates classified as "non-susceptible".
Dozens of previous mathematical models have been constructed in an attempt to explain this coexistence. While many models have focused in particular on S. pneumoniae, these and others have attempted to be as general as possible so the conclusions can be extended to other bacterial species. Studies have repeatedly found that models of well-mixed populations in which individuals can only be infected with one strain (either susceptible or resistant) at a time do not generally support coexistence 20,[26][27][28] . Including more details of within-host infection dynamics in models has suggested some potential mechanisms of coexistence. Drug resistant and susceptible strains can co-exist if dual infection of individuals with both is possible, but only for a very narrow range of values for both the treatment coverage level and the cost of resistance 20,26,27 . If de novo generation of one strain in an individual infected with the other can occur (presumably via point mutation or horizontal gene transfer from non-pathogenic co-colonizers), then coexistence is again possible 20 , but such a process is likely to lead to low-level mutation-selection balance, not coexistence at near intermediate levels. Recently, Davies et al. 28 have suggested that co-colonization of hosts with subsequent within-host competition results in a type of frequency-dependent selection that helps maintain coexistence. This occurs because resistant strains have an advantage when they are rare, because they will mostly co-colonize susceptible hosts who, when treated, will be rid of competitors. This mechanism is quite robust to parameter values, but only relevant to commensal bacteria for which co-colonization is common.

2/33
Another set of mechanisms shown to promote coexistence is host population structure. If there are two completely isolated sub-populations 20 , or if treated and untreated individuals interact extremely rarely 26 , then susceptible and resistant strains can both persist, but this is unlikely to apply to any realistic scenario. Modeling transmission in a population in which both contact patterns and the probability of being treated are age-dependent and informed by data for S. pneumoniae reveals slightly more opportunity for coexistence 27 . However, it is still far from general, since in this model coexistence only occurs for costs of resistance <8%, does not display full the range of resistance prevalence levels -and their dependence on antibiotic useobserved in data, and is somewhat reliant on the division of the bacterial population into discrete serotypes which each support different resistance levels.
There are also ecological effects known to promote coexistence of species more generally 29 , but their relevance to bacterial resistance to antibiotics remains unclear. For example, models of competition show that coexistence can occur when species compete more strongly among themselves than with other species 26 . The obvious candidate for such a mechanism in the context of infectious diseases is strain-specific immunity 30 , with the effect that hosts are less susceptible to re-infection by a strain with which they have previously been infected. Strain-specific immunity leads to balancing selection, since low frequency strains have an advantage. However, there is generally not expected to be a connection between the resistance status of a strain and its immunogenicity (e.g. serotype). Lehtinen et al. 31 have recently shown that such a connection may not be necessary, as linkage between a locus under balancing selection and a polymorphic locus that influences the relative fitness of resistant and susceptible strains (such as duration of carriage for S. pneumoniae) can promote coexistence of resistant and susceptible strains. The relevance of these mechanisms to most antibiotic resistant bacteria remains uncertain.
Overall, despite some progress in explaining the long-term coexistence of drug-sensitive and drug-resistant strains, many models only reproduce coexistence for small regions of parameter space or species-specific scenarios, and the ultimate set of causes for real-world coexistence is far from fully understood.

II) Different frequencies of resistance are seen in regions with similar rates of drug prescription
Another puzzling trend in spatiotemporal antibiotic resistance data is the appearance of very different levels of resistance in regions that seem to have the same levels of drug usage. While overall, there is a strong correlation between the amount of antibiotics consumed in a region and the prevalence of drug resistant strains 25,32 , there are many cases that diverge from this trend. For instance, comparing countries participating in the European Centre for Disease Prevention and Control (ECDC), the rate of resistance can vary by as much as 30% even between countries that have equivalent rates of outpatient prescription of penicillin, such as Spain and Poland 11,25 (Figure 1ii). Other examples can be found at smaller spatial scales, for instance between cities in the American South 33 and provinces in Spain 14 .
While this finding has not yet been specifically examined in the context of theoretical models, all of the models mentioned in the previous section 20, [26][27][28]31 admit only one stable equilibrium for a given set of parameters, including antibiotic consumption level. Therefore, if each region is considered to be an isolated and well-mixed population, extra mechanisms are needed to explain different resistance levels for the same overall antibiotic use. It could be that one or both of the regions in question had not yet reached a steady state prevalence of resistance by the time of sampling, though when longitudinal surveys of these same resistant strains are available, they generally suggest that levels have approximately equilibrated (e.g. ECDC data 11 ). Another possibility is that the cost of resistance varies between regions, for example due to different mechanisms of resistance or different genetic backgrounds, but experimental evidence to support or refute this idea is lacking so far. Finally, the discrete regions considered during resistance surveillance are in reality neither completely isolated nor well-mixed. Movement of individuals and microbes between regions leads to interdependence of resistance dynamics. Within a region, there could be heterogeneous distributions of treatment and infections which impacts the overall resistance level. Consequently, models that attempt to explain this trend must consider the connected nature of regions across spatial scales.

III) Neighboring regions can show very different frequencies of resistance
Another confounding element of antibiotic resistance epidemiology is the high degree of heterogeneity in resistance levels between neighboring populations at many differing scales. This is slightly different from the preceding point (II): not only do we find two regions with the same levels of antibiotic use but with different levels of resistance, we find that these regions can be bordering each other. Neighboring regions are likely to exchange infected individuals frequently, which would be expected to ameliorate differences in resistance levels over time, just like the predictions of well-mixed models. In contrast, they can actually sustain sharp gradients in the frequency of resistance seemingly indefinitely.
The scales at which this phenomenon is observed range from the size of nations down to neighborhoods of a city ( Figure  1iii). For instance, in Europe, the frequency of carbapenems resistance in K. pneumoniae has been much much higher in Italy than in neighboring states for many years 11 . Similar trends have led to differences in the frequency of certain resistant strains across the United States 34,35 . At a smaller scale, in the long-term study of S. pneumoniae strains in Spain revealed a large difference in frequency between penicillin resistance in Aragon and its neighboring regions 13,14,24,36 . At an even smaller scale, recent (2015) data on the United States revealed that in four neighborhoods in Greater Miami, rates of cefazolin resistance in

3/33
Proteus mirabilis differed by a staggering 89 % overall (rates of 8%, 13%, 88%, 97%) 33 . The fact that these large differences in resistance levels appear frequently in data at different spatial scales and between neighbors that disburse similar amounts of drug therefore require a unifying explanation.

* * *
In this paper we propose a general mechanism to explain these perplexing spatiotemporal trends (I-III) in antibiotic resistance levels that have eluded previous mathematical models. We consider that host populations may be structured, with different levels of mixing within and between regions, and that the use of antibiotic treatment may be heterogeneous. Spatial differences in treatment rates could arise, for example, from local differences in prescribing guidelines or norms, availability of care, or care-seeking behavior; from the presence of hospitals or other facilities with higher rates of antibiotic usage, or, an increased use of antibiotics following spatially-localized viral epidemics. Such a situation leads to a sort of coarse geographic and/or demographic heterogeneity which can slightly tip the odds in the favor of either a drug-susceptible or -resistant strain in a local region. Using this model, we evaluate under what conditions we observe the three trends mentioned above: drug-resistant strains co-existing at intermediate levels (I), regions with similar treatment levels experiencing different prevalences of resistance (II), and neighboring regions sustaining large, long-term differences in resistance levels despite inter-region spread of infections (III).

A general model for the evolution of drug resistance in a spatially-structured population
To better understand the mechanisms that could be responsible for the spatiotemporal motifs seen in drug resistance data, we developed a simple model for competition between strains of an infection in a structured population. We assume that the total population of size N is divided up into M subpopulations of uniform size D (Figure 2). Following the language of population genetics, we will call these subpopulations 'demes' 37 , although similarly structured models in epidemiology have also called them 'households' 38 . This structure is assumed to be static. Individuals within a deme contact and have the potential to spread infection to any other individual in the same deme, with the same rate (i.e. each deme is well-mixed). Demes may be connected to a subset of other demes in the population, and infection can also be spread between individuals in distinct yet connected demes. These demes could represent any subdivision of a human population of interest, such as different countries, regions within a nation, cities within a region, neighborhoods within a city, and so forth. The wild-type strain (green) is transmitted at rate κ within a deme and rate β between demes. Individuals infected with any strain recover with a rate g. The resistant strain (red) pays a cost c in its transmission rate with or without drug. C) Kinetics of infection within (bottom) and from (top) a drug-treated deme (blue). Transmission of the resistant strain is unaffected by treatment, but transmission of the wild-type strain (green) is reduced by a factor (1 − ε), where ε is the drug efficacy.

4/33
Within this population we consider the concurrent spread of drug-susceptible and drug-resistant strains of an infection. Individuals can be classified based on their infection status: either uninfected and susceptible to infection, infected with a wild-type drug-susceptible strain, or infected with a drug-resistant strain. Infected individuals spread the disease to uninfected individuals at rate κ if they are in the same deme ('within-deme' transmission rate) and rate β if they are in different but connected demes ('between-deme' transmission rate). We assume that κ ≥ β . Infected individuals recover and become susceptible again at rate g. We do not allow for any super-infection or co-infection: only susceptible individuals can be infected.
Infected individuals may receive drug treatment, and if they are infected with the drug-susceptible strain then treatment reduces the rate at which they transmit the infection to others (in the same or connected demes) by a factor (1 − ε), where ε is the efficacy of the drug, 0 < ε < 1. We assume that the resistant strain is perfectly resistant to the drug, so that transmission rates of individuals infected with it are unaffected by the presence of treatment. However, the drug-resistant strain incurs a fitness cost c, which results in lower transmission rates relative to the wild-type strain with or without treatment: For each individual, we assume that whether or not they will receive treatment if they become infected is a pre-determined, static property, and that treatment is administered as soon as they are infected and continues for the duration of infection. However, considering ε < 1 is one way to consider relaxations of this assumption.
Another possibility for the effect of treatment is that it accelerates disease recovery, and we consider this possibility in the Supporting Information.
Our model is an extension of the classic Susceptible-Infected Susceptible (SIS) model for disease transmission 39,40 to a case with two disease strains (sometimes called an SI 1 I 2 S model [41][42][43]. We assume that each deme is large enough that the dynamics can be considered deterministically. This model is structurally neutral 44 , meaning that if two identical strains are simulated (e.g. if ε = 0 and c = 0), they will continue to persist ad infinitum at the same level at which they were initiated.

Absence of coexistence in an unstructured population
We first consider the dynamics of competition between drug-resistant and drug-susceptible strains in a single large, well-mixed deme (an unstructured population) ( Figure 3A). A fraction f of all individuals are treated with drug immediately upon infection by either strain. Infection spreads between all individuals with the within-deme transmission rate κ. We track the proportion of the total population who are infected with the wild type and untreated (w u ), infected with the wild type and treated (w t ), infected with the resistant strain and untreated (r u ), and infected with the resistant strain and treated (r t ). The dynamics are then described by the following set of differential equations: The basic reproductive ratio, defined as the average number of secondary infections caused by a single infected individual in an otherwise susceptible population, can be defined for each disease strain in this model: The equilibria of this system are completely determined by the R 0 values and show competitive exclusion (details in Supporting Information). If both infections have R 0 < 1, then neither strain of infection will persist and the equilibrium will consists only of susceptible individuals. Otherwise, only the strain with the larger R 0 will persist (at a frequency 1 − 1/R 0 , Figure 3B) while the other will go extinct. For the parameters we have chosen, the resistant strain persists if and only if the coverage and efficacy of treatment are enough to offset the cost of resistance, ε f > c ( Figure 3C). Therefore, when the population is well-mixed, the heterogeneity introduced in the population when only some individuals are treated (0 < f < 1) is insufficient to allow coexistence of wild-type and drug-resistant infections.

Limited coexistence in two connected subpopulations with heterogeneous treatment distribution
We next consider the case of two separate but connected subpopulations of equal size. Individuals within and between demes are still well-mixed, but the within-deme transmission rate (κ) is different from the between-deme transmission rate (β ), where we assume κ ≥ β . Treatment is assigned to all individuals within only one of the two demes, so that the demes themselves can be labeled with the subscripts u (untreated deme) and t (treated deme). The system is then described by the following set of differential equations (see Supporting Information for a derivation): where w i (r i ) is the fraction of all individuals in deme i infected with the wild-type (resistant) strain. From the next-generation operator technique 45 , we can derive the value of R 0 for each of two strains, taking into account the heterogeneity in transmission rates across the demes due to treatment: When the within-and between-deme transmission rates are equal (κ = β ), R 0 reduces to the expression in the one-deme case (Eq. (2)). Otherwise, R 0 is always reduced by population structure when between-deme spread is less than within-deme spread (κ ≤ β ).
These two-deme population supports qualitatively different infection dynamics than the single-deme case ( Figure 4). While most parameter regimes still lead to persistence of only the drug susceptible or the drug resistant strain (even when R w 0 > 1 and R r 0 > 1), there is now a stable equilibria where there is a mixture of both strains co-existing ( Figure 4D). This region of coexistence occurs when the between-deme infection rate is relatively low compared to the within-deme rate (β /κ < 0.7 when ε = 0.9) and for intermediate values of the cost of resistance c. The mathematical derivations of these boundary regions and their physical intuition is included in the Supplementary Information. Interestingly, in this mixed equilibrium, both demes contain a mixture of susceptible and resistant strains, a stricter type of strain coexistence than a situation in which each strain dominates a different deme.
This simple example, consisting of only two spatially-connected neighborhoods of equal size, demonstrates that spatial heterogeneity in strain fitness (induced by heterogeneity in drug treatment) can lead to stable coexistence at not only the population level but also the subpopulation level, suggesting a resolution for the ubiquity of coexistence in empirical data (point (I) in the Introduction). This host population structure produces a situation in which even though a strain is disfavored in a global sense by having a smaller R 0 value than its competitors (as defined in Eq (4) above), it can be locally favored and thus  avoid extinction. However, in this simple example, the region in parameter space where coexistence is seen is relatively small, motivating the question of whether more complex structures will expand the stability of coexistence. This two-deme example cannot explain the observation that regions receiving the same amount of treatment or regions neighboring one another often support very different levels of resistance (see (II) and (III) in the Introduction). In this case there is only one treated deme, and for a given set of kinetic parameters, there is only one stable level of resistance. No alternative outcomes are possible. And although it is indeed the case that the two neighboring demes in this scenario have different frequencies of resistance when coexistence occurs, this is obviously due to the fact that individuals in one deme receive treatment and individuals in the other do not. Replicating the real-world data requires that we provide situations in which neighbors with the same rate of drug treatment still have very different rates of resistance. In the subsequent sections, we examine whether more complex population structures can recreate these observed patterns.

Robust coexistence in more complex population structures
Our results for simple one-and two-deme scenarios suggest that host population structure can promote coexistence of drugsusceptible and resistant strains of an infection. To investigate this relationship further, we extended our model to include an arbitrary number of demes and connectivity patterns. For simplicity, we assume all demes are the same size (D). Each deme may only be able to spread infection to a sub-set of other demes, and the connectivity of the population is described by the adjacency matrix ∆ i j (∆ i j = 1 if an individual in deme j can be infected by an individual in deme i). Treatment is assigned at the level of the deme, described by indicator variable T i (where T i = 1 if deme i is drug-treated and T i = 0 otherwise). The system of equations describing the fraction of individuals in each deme who are infected, with either the wild-type (w i ) or drug-resistant strain (r i ) iṡ We used this model to examine the spatio-temporal trends in drug resistance for a set of theoretical population structures. Each population consisted of M demes, and each deme was randomly connected to d other "neighbors" (which specifies the adjacency matrix ∆ i j ), therefore representing a random regular graph 46 ( Figure 5A). Although in reality much more complex human population structures are possible, these simplified structures provide a good base to examine the impact of specific properties of structure on infection dynamics. Each deme was independently assigned to be "treated" uniformly-at-random (meaning if an individual in that deme became infected they would receive treatment, while individuals "untreated" demes would not), such that the total fraction of treated demes is ρ. For each randomly-generated population structure and treatment allocation, the model in Eq. (5) was simulated until the level of drug-susceptible and drug-resistant strains had reached an equilibrium in each deme.
Using populations of M = 20 demes each connected to d =3, 4, or 5 other demes, we found that robust coexistence of resistance and susceptible strains within the entire population and within individual demes was possible for a wide range of population structures and parameters ( Figure 5C). We defined 'robust' coexistence to mean that at least 80% of demes had at least 10% of infections caused by each strain. For the baseline parameter values we used (κ=0.25, β =0.05, g=0.1, ε=0.9, c=0.2), treatment levels between 25 and 55% supported robust coexistence in at least some population structures and treatment allocation schemes. For a lower cost of resistance (c = 0.05), robust coexistence occurred at a lower range of treatment levels (10% to 35%). The degree of connectivity of the demes (d) does seem to impact the likelihood of coexistence (treatment levels supporting coexistence do not depend on d), but other details of the structure beyond the average degree have an impact as well. For example, of all the graphs with d = 3, Graphs 11 and 13 have the most coexistence, but Graph 11 requires treatment levels a full 10% lower than Graph 13. A more detailed study of the role of these properties is in a later section. The more important parameter appeared to be the amount of between-deme infection spread (β ) arising from contact between "neighboring" demes relative to the within-deme spread (κ). As β increases relative to κ, the region of coexistence decreases, which is consistent with our findings that single well-mixed populations result in competitive exclusion of strains even for incomplete treatment.
These results show that even relatively simple host population structures can reconcile mathematical models with empirical observations of long-term coexistence between drug-susceptible and drug-resistant strains. Coexistence is possible for a broad range of costs of resistance, treatment levels, and population structures. As observed in data, the overall prevalence of resistance in a population is roughly correlated with the treatment coverage ( Figure 5B). These results also provide a connection between our model and the data for another spatio-temporal trend, (II): the same overall rate of drug consumption can lead to different prevalence of drug resistance in different populations (even if treatment efficacy and cost of resistance are the same). For example, in Figure 5B, a particular treatment level (e.g. ρ = 0.1 for the red parameter set) can be associated with very different rates of resistance in different population structures (up 40% difference, similar to the data in Fig. 1).     Randomly generated population structures on which infection was simulated. Each node represents a deme (a well-mixed sub-population of individuals), and each edge indicates that infection can spread in either direction between those two demes. The total population consists of M = 20 demes and each deme has either d =3, 4, or 5 randomly connected neighbors. Ten example populations for each d were created and indexed for infection simulations. B) Fraction of infections that are resistant in the entire population (y-axis) versus fraction of demes treated, ρ (x-axis). Each color represents a different parameter set (blue background -baseline, red background -lower cost of resistance, teal background -more between-deme connectivity). Numbers show data points for individual populations indexed in (A). For each population, 100 simulations were run, each with a different spatial arrangment of treatment but perserving overall treatment levelρ. The colored envelope is created by shading between sigmoidal curves that were fit to the results for the d = 3 and d = 5 graphs. C) For each example population structure shown in (A) (x-axis) and each treatment level (ρ), the proportion of simulations that resulted in robust coexistence between drug-susceptible and drug-resistant strains is shown (by the colored area of the box). Robust coexistence was defined as at least 80% of demes supporting both strains at frequencies above 10%. All simulations used kinetic parameters κ = 0.25, g = 0.1, and ε = 0.9.

9/33
Large differences in resistance levels are possible even between connected regions With these complex multi-deme populations, we examined whether neighboring regions with equal levels of antibiotic consumption could sustain vastly different amounts of resistance (observation (III). Rather than looking at the total level of resistance, we wanted to know the distribution of differences between resistance levels among neighboring untreated regions. For each population structure and parameter set, we chose all nearest-neighbor pairs which were both untreated, and generated a distribution of the pairwise differences in resistance levels between these neighbors ( Figure 6A). We found that large differences in resistance levels between neighbors were common, with 34% of pairs receiving identical treatment differing by more than 10% resistance prevalence, for our baseline parameter values when averaging across all population structures with degree d = 3 or d = 4. In the same simulations, 2% of pairs overall -and up to 15% in some structures -differed by greater than 30% resistance frequency, a value near the upper limit of observations from data shown in Figure 1. Differences of up to 60% between neighbors were observed in some individual simulations.
While populations with higher levels of between-deme mixing (higher β /κ, teal parameter set) often supported persistence of both drug-sensitive and drug-resistant strains at equilibrium ( Figure 5B), they did not support large difference in resistance levels between demes receiving the same treatment. In this parameter regime, susceptible and resistant strains are more segregated between untreated and treated demes, respectively, explaining why simulations on these structures rarely produced "robust" coexistence as we defined it (over 80% of demes supporting at least 10% of each strain) ( Figure 5C). We found that populations with higher degrees of connection between demes (i.e. d = 5 graphs) rarely had pairwise differences in prevalence greater than 20%. Overall, these findings corroborate the intuition that adding edges between demes or raising the 10/33 between-deme infection rate of connected demes brings the system closer to well-mixed population and hinders coexistence.

Properties contributing to higher resistance
In order to understand how specific properties of host population structure contribute to the frequency of resistance at within an individual deme and across a community of interconnected demes, we simulated infection dynamics on a large set of randomly generated structures and then statistically analyzed the predictors of resistance levels. To generate a large ensemble of random population structures which varied in many graph-theoretic properties, we used the Watts-Strogatz 47 algorithm to create 1000 unique networks of M =50 demes each. This method allowed us to study graphs with different average degree, variance in degree, and a set of other properties such as centrality, efficiency, and mean path length (as shown in Figure S2). Details are provided in the Methods.
In each graph, we generated 50 different allocations of treatment at overall coverage levels of both ρ = 0.24 and ρ = 0.4. The number of treated neighbors and next-nearest neighbors was recorded for each deme. We simulated infection with both strains until an equilibrium was reached, at which point the frequency of resistance was recorded for each deme and for each population overall. At the lower treatment level, less than 50% infections were generally resistant to drug, whereas at the higher level resistance was more than 50% of infections ( Figure S3).
Collecting the results for each graph-treatment allocation scenario, we employed LASSO (least absolute shrinkage and selection operator) regression to identify the most important factors contributing to the frequency of resistant infections (Table  1). We first examined predictors of the resistance level within a deme. The most important contributor to resistance was whether or not the deme itself was treated, followed by the distribution of treatment among the deme's neighbors. The centrality of the deme -a measure of its overall level of connectivity within the network -emerged as a less important predictor.

Regression on deme properties
Regression on graph properties  Table 1. Properties of demes and graphs in order of importance in accurately and sparsely predicting the frequency of drug-resistant infections according to LASSO regression. The shorthand with "U" and "T" indicates the frequency of pairs and triplets on untreated and treated demes; thus, U-T-T indicates the frequency of three connected demes of which two are drug-treated, normalized by the number of all three-deme combinations in the graph. The rank is determined from the order in which the coefficient for that predictor becomes non-zero as the regularization constraint on the sum of coefficients increases from zero (Fig. S4), provided that the predictor is included in all subsequent models. The sign is the direction of the relationship between the property and the frequency of resistance in the model which minimizes the mean-squared-error upon cross-validation. If the sign is listed as "-", then the parameter is not used in the "best" (error-minimizing) model. The predictors whose sign changes depending on ρ are highlighted in red. ρ is the fraction of demes in the population receiving treatment. LASSO regression was conducted separately for two ρ values (0.24, 0.4). The mean-square error as a function of the regularizaton constrain after five-fold cross-validation is shown in Fig. S5.
We next evaluated predictors of resistance at the level of the entire population. Surprisingly, the simplest statistics of a graph -including the mean and variance in the number of neighbors of each deme -were the least important contributors to resistance. Rather, the best predictors involved a mix of treatment-clustering properties (frequency of triples involving two treated demes, for instance) and node-clustering properties (such as clustering coefficient and efficiency). Local efficiency and clustering coefficient are both measures of how interconnected the neighbors of a given deme are, on average. Global efficiency and mean path length are both measures of the ease of moving between any two random demes in the population. When treatment levels are lower and less than half of infections are resistant (ρ = 0.24), increasing each of these properties is 11/33 associated with decreased resistance levels (− sign in Table 1). However, for three of them -local and global efficiency and average path length -this trend actually reverses when most infections are resistant (ρ = 0.4) (+ sign in Table 1). Therefore, these properties tend to always favor the more prevalent strain, and hence act against coexistence. The more interconnected a network is, the harder it is for multiple strains to coexist. A similar effect occurs at the level of individual demes with the centrality measure: it always acts against coexistence within a deme. Highly-interconnected, hub-like demes are more likely to harbor the more common strain, while demes that are on the periphery of a network are more likely to support the strain which is rarer in the population overall. Hence, when resistance is rare, it may be preferentially found in less connected demes, whereas when it is more common, it may cluster in more central demes.
Interestingly, lower clustering coefficient always promotes resistance in a population. The reason why this trend is the opposite of efficiency and path length, which change sign depending on treatment level ρ, is not entirely clear. However, it is worth observing that the treatment-clustering properties are not independent of the clustering coefficient; in fact, since the number of treated demes is always the same (for a fixed value treatment level ρ), these frequencies are directly dependent on the clustering coefficient. The fact that some of these change sign depending on treatment level while others do not may explain the unchanging sign of the clustering coefficient.
Robustness of results to mechanism of treatment action Throughout this paper, we have modeled the effect of antibiotic treatment as reducing the ability of an infected individual to spread infection to others. Hence, treated individuals are still infected, and therefore immune to infection with the other strain, but do not contribute to transmission (or contribute less, if treatment is imperfect, ε < 1).
Alternatively, treatment could instead accelerate the rate at which an individual recovers from infection, a mechanism that has been used in many previous models of antibiotic resistance 26,27 . To determine if the mechanism of treatment action had any influence on our findings about the spatio-temporal patterns of resistance and to facilitate comparison of our results with those of other models, we considered a variant of the model where in treated individuals, the recovery rate increases from g → g + τ (see Supplementary Methods for details).
We found that all of the results reported in the main text for the model with treatment reducing transmission rate are qualitatively recapitulated in a model where treatment increases recovery rate. When there is only one deme, where a fraction of individuals are treated, we do not see coexistence for any parameter values (Fig. S6). For two demes, one of which is drug-treated, we see either competitive exclusion or robust coexistence, and the shape of the coexistence regime -which occurs when there is much more disease spread within vs between demes and costs of resistance are intermediate -is similar in both cases (Fig. S7), with the region of coexistence being slightly larger in the model with accelerated recovery rates.
When there are multiple demes, the overall prevalence of resistant infection as a function of the treatment coverage and the transmission parameters is very similar between the different models of drug effects, as is the frequency of "robust coexistence" wherein 80% or more demes contain 10% or more of each strain (Fig. S8). When we look at the distribution of pairwise differences in resistance levels between neighboring untreated demes, we again see very similar results (Fig. S9). The most noteworthy difference is that for graphs of degree 5 and a resistant strain with very low cost (c = 0.05), the modal difference increases from near 0 to near 10% (Fig. S9C).

Discussion
The spread prevalence of antibiotic-resistant bacterial infections displays several surprising spatial and temporal trends that defy canonical models of infection dynamics. The goal of this paper was to create a simple and flexible model for competition between drug-susceptible and drug-resistant strains of an infection in a structured host population and examine whether it could capture several of these trends. The first perplexing pattern observed in the data is the long-term coexistence of both susceptible and resistant strains at intermediate levels, which many other models have attempted to explain 20, 26-28 . Our model reproduces coexistence for a wide range of parameter values and in fact suggests that it is a natural feature to be expected in spatially (or otherwise) heterogeneous models. In addition, our model also captures more nuanced trends observed in surveillance data that other models have ignored. Regions that administer similar amounts of a particular antibiotic may experience persistent differences in drug resistance levels for many years, and these "frozen gradients" are observed even if regions are neighboring one another. These phenomena were robustly reproduced even in our simple model. We found that in a collection of partially-mixing subpopulations, subtle details about the pattern of connectivity and how treatment is distributed between them can lead to large differences in resistance prevalence between regions that otherwise seem similar. We were able to identify specific covariates that contribute to resistance levels and to promoting coexistence of resistant and susceptible strains.
Increasing levels of antibiotic resistance are widely considered to be a major public health threat, and attempts are continually underway to forecast resistance levels with or without additional interventions using at least some type of mathematical modeling. Our findings highlight the fact that infection models which rely on assumptions of well-mixed populations are unlikely to be 12/33 a useful tool for this task. As these models cannot reproduce ubiquitous trends observed in existing data on drug resistance epidemiology, they are unlikely to make predictions that are even qualitatively trustworthy. For example, it is very important to know whether resistance for specific bug-drug combinations will tend towards 100% or settle at an intermediate level, and so any model must be able to explain the phenomena of coexistence for most strains. While clearly any degree of resistance complicates clinical management, the economic impact of partial vs complete resistance in a population could differ dramatically. In addition, by ignoring the details of spatial or demographic heterogeneity, we may miss an opportunity to more effectively target intervention or surveillance to particular subpopulations. The obvious difficulty is that such models will may have to be highly-tailored towards a particular disease, and currently, there is a stark lack of data required to construct or calibrate realistic spatial models. More routine high-resolution mapping of resistance levels over space and time, as well as data to estimate population connectivity patterns over which disease spreads, are sorely needed.
Throughout this paper we have considered a relatively simple type of population structure: identical subpopulations ("demes") each contain well-mixed populations of individuals, and these subpopulations can be connected to others through an arbitrary graph structure. Connections indicate paths over which infection can spread, and we have assumed that transmission is characterized by only two parameters: a within-deme rate and a between-deme rate. For each deme, either every individual receives treatment if infected, or none do. In reality, population structure could of course be much more complicated. There may be multiple levels of structure -from a household to countries -with much more complicated patterns of connectivity and a continuum of transmission rates. Structures may be dynamic, as movement and interaction patterns of humans, animals, or disease-carrying material change over time. The mechanism leading to population structure varies depending on the infection of interest. For hospital-acquired infections, the movement of medical staff or the spatial arrangement of patients may be most important, while for community-acquired infections, the nature of close household, school, or workplace contacts may be of primary interest. For sexually-transmitted infections, networks of sexual contacts determine disease spread, while for diseases of livestock, relatively well-mixed contact within crowded farms along with patterns of transfer between farms may create the overall "meta-population" structure. Host population structures are notoriously difficult to measure in practice, but some approximation of them is likely needed to create realistic models for the spread of drug resistant infections. Technological advancements that allow better tracking of human mobility, such as wearable proximity censors [48][49][50] or fitness trackers, cell phone location data [51][52][53][54] , or large-scale transit networks 55,56 , provide one source of population structure data. Advances in genetic epidemiology allow us to retrospectively trace the spread of infection between individual hosts or communities using pathogen phylogenies [57][58][59][60][61] , which also provides information on population structure.
Spatial heterogeneity in transmission rates is understood to be an important factor in the spread of vector-borne infections, such as malaria and dengue. "Risk mapping" projects that attempt to identify the covariates most important for determining disease incidence, measure these variables at high spatial resolution, and use them for predicting disease distribution patterns, are a well-established tool 62-65 . While environmental factors such as temperature and rainfall which affect insect life cycles are clearly important in vector-borne disease transmission, more recent work has identified human factors such as housing materials and movement patterns as determinants of infection rates [65][66][67][68][69] . Results of these risk maps can be used to allocate prevention or treatment strategies, predict changes in disease incidence under changing environmental conditions, or model transmission routes. In our simulated populations, we found that particular features of population connectivity and the distribution of antibiotic consumption could predict resistance levels, suggesting that similar spatial risk-based assessment may be useful in the study of drug resistance. Much more fine-scaled surveillance will be needed to test these ideas on empirical data. Spatial heterogeneity has been understood by population geneticists to be an important factor in the spread of genes since the days of Fisher 70 and later Kimura 71 . Other classic work uncovered mechanisms by which the coexistence -as opposed to competitive exclusion -of species could be promoted by spatial heterogeneity 72,73 . More recently, Gavrilets & Gibson 74 examined a two-deme system with two competing types, each of which has higher fitness than the other in one of the two demes, and observed a phase diagram very similar to that for our two-deme infection model (Fig. 4), in which polymorphic equilibria were possible. This idea was later extended from two demes to multiple connected demes 75 , where it was demonstrated that spatial heterogeneity can preserve a species in spite of fitness differences which in a well-mixed model would drive it to extinction. Other work involving more complex population structures 76 or more complex fitness distributions across space 77,78 has examined how the rate of invasion of rare mutants depends on spatial heterogeneity. Interestingly, there may be asymmetry between how heterogeneity affects resident strains and their invaders 78 . However, all of this previous work involves traditional population-genetic models, such as Moran or Wright-Fisher processes, and may not be generalizable to infection dynamics. For example, the role of network structure in modulating the fixation probability of new strains differs dramatically between Moran processes 76 and infection models 79 .
In the context of drug resistance, previous work on within-host infection models has examined how spatial heterogeneity can accelerate the initial rate of emergence of drug resistance mutations [80][81][82][83][84][85] . Drug levels may differ between compartments with high or low drug penetration, or may follow continuous spatial gradients. The time it takes for enough mutations to accumulate to produce a fully-resistant strain may be dramatically reduced in the presence of spatial heterogeneity, which 13/33 allows step-wise accumulation of mutations and avoids crossing fitness valleys. The major difference between this work and our own is that we focus on how spatial structure has an additional role in maintaining balanced coexistence after resistance mutations are initially introduced.
While our results show that population structure can explain observed spatiotemporal patterns of antibiotic resistance, it is unlikely be the only mechanism contributing to either long-term coexistence or differences between regions. Persistence of both drug-susceptible and drug-resistant strains is also promoted by confection and superinfection [26][27][28] , by overlaps between resistance status and serotype 26,27 or other traits under balancing selection 31 , and by heterogeneities in transmission or recovery rates as opposed to treatment coverage 26,27 . Regions receiving similar overall levels of drug but experiencing different frequencies of resistant infections could alternatively be explained by local differences in i) prescribed vs consumed doses, ii) who receives drugs (e.g. age group, hospital vs outpatient) and for what type of condition, iii) genetic background of the pathogen and the cost of resistance, or iv) transmission or recovery rates. It is difficult to judge precisely how realistic our results are, since our models of infection dynamics and population structure are both dramatically simplified, but it likely cannot explain coexistence over the entire range of parameters for which it is observed empirically. In addition, in our model it is relatively rare to get large differences in resistance prevalence between regions with similar treatment rates unless they have very different connectivity patterns within or between them, and unless there are large spatial or demographic heterogeneities in treatment use that overlap with clustering of disease transmission. In reality, multiple mechanisms probably act in concert to explain the observed patterns. Moreover, there is unlikely to be single explanation for each different bug-drug pair for which these trends are observed. Other disease-specific factors that may be relevant to the ecology of antibiotic resistance include whether the bacteria is primarily pathogenic or also a commensal colonizer, whether infection is more common in the community or in healthcare facilities, whether resistance is carried on plasmids or chromosomally, and whether there are environmental reservoirs and if they are exposed to antibiotics.

Model assumptions
Our two-strain model of infectious disease dynamics makes a number of simplifying assumptions. We assume that individuals can transmit the infection immediately upon becoming infected, hence ignoring any sort of "latent" phase or "exposed" class. Birth, death, and migration into and out of the population are not modeled. The infection is assumed to be non-lethal. Infected individuals all recover to be susceptible again, which presumes that there is no long-lasting immunity to re-infection. We do not allow for any co-infection (simultaneous infection with both strains) or super-infection (replacement of one strain within an individual by another), which implies that there is complete cross-immunity during infection. We model resistance as a binary trait: there are only two strains, either completely susceptible to the drug, or completely resistant. In reality, there may be strains with intermediate levels of resistance. Individuals who are assigned a status of "treated" will receive treatment immediately upon being infected, every time they are infected. This ignores the possibility that there may be a delay to receiving treatment or that treatment behavior may vary within an individual over time. Treatment is assumed to prevent transmission of the infection, but may not be perfectly effective at doing this (which could approximate the effect of treatment delay or imperfect drug efficacy). In the Supplementary Methods, we consider an alternative model in which treatment acts to increase the rate of recovery from infection instead.
By using differential equations to model infection, we assume that the population sizes of individual demes are large enough so that variation from the average behavior is not important. We additionally assume that resistance is pre-existing and that stochastic extinction of either strain from the population is not possible. For the most part, we examine only the equilibrium behavior of the models.
For each model (one-deme, Eqs.(1), two-deme, Eqs (3), and multi-deme, Eqs. (5)), the transmission rates κ and β were scaled so that the model behavior did not depend on the size of the total population N or the size of each deme D. In the one-deme case, κ was divided by N, and in the two-and multi-deme cases, both κ and β were divided by D. A more detailed derivation of this scaling is given in the Supplementary Methods. This scaling is equivalent to saying that individuals are limited in how many contacts they can have with others, and that this limit is independent of the total population size.

Analytic results
The basic reproductive ratio R 0 was calculated for the one and two deme systems (Eqs1 and 3) using the next-generation technique 45 .

Numerical results
The differential equations were numerically integrated in MATLAB, using the Runge-Kutta solver provided in the function ode45. The initial condition for each parameter-population structure combination was each deme having a level of infection

14/33
with the wild-type that would occur if there were no resistant strain (a fraction 1 − 1/R 0 of individuals are infected with the wild type), and a very small level of infection with the resistant strain (10 −3 ). The rest of the population was susceptible.
The system was integrated until an equilibrium was reached, which was defined as the point at which the time derivatives for all strains in all demes changed by less than 10 −4 per day. To ensure that the equilibrium reached by the solver is stable, once an equilibrium was reached by the above standard, we applied a small random fluctuation with magnitude not exceeding 1% to the value of all strains in all demes. If the same equilibrium was arrived at after repeating this process three times, the equilibrium was taken to be sufficiently stable and used as the result of that simulation. To determine whether there was only a single positive and stable equilibrium for the multi-deme system for a given structure and parameter set, we simulated the same systems many times with uniformly-at-random initial conditions. For each deme, the initial fractions [w i , r i ] were both drawn from the uniform distribution on [0, 0.7]. If their sum exceeded 1, both were re-drawn. We found that the same equilibrium was always reached ( Figure S10). More details are given in the Supplementary Methods.

Network generation
Random regular graphs: Random regular graphs are randomly-generated graphs in which every node has the same degree (number of incident edges). There are multiple existing algorithms to create random regular graphs. Our random regular graphs were generated according to a pairwise-construction mechanism 86 implemented in MATLAB 87 .
Watts-Strogatz graphs: The Watts-Strogatz algorithm is a method of constructing networks with M total nodes and M * K total edges, but with heterogeneous properties. In the original construction, a ring lattice is formed with each of the M nodes connected with K edges to neighbors in a ring formation. Then, with probability p, the target of each edge is rewired to a uniformly random node. For our 1000 graphs, we selected M = 50, but allowed degree K to be selected uniformly-at-random for each graph from 3, 4, 5, or 6, and allowed p to be selected uniformly-at-random for each graph from [0.1, 0.6]. The resulting graphs have a broad distribution all graph properties of interest ( Figure S2) and of the frequency of resistance ( Figure S3).

Network statistics
A set of graph-theoretical properties were tested for their relationship to resistance levels within demes and within populations. A description of these properties is given below.
Eigencentrality: For a node i, its eigencentrality is a measure of not only its centrality (having a high degree, for example), but is also weighted according to how high the centrality scores of its neighbors are; as an example, an academic paper with many citations may have high centrality, but if the papers citing it have themselves low centrality then the eigencentrality of the original paper may be low. This is written as an eigenvector-type equation for the eigencentrality E i : where λ is a normalization factor.
Global efficiency: In general, the average efficiency of a graph is defined as Eff where M is the number of demes/nodes and d i j is the smallest number of edges between nodes i and j. Therefore, graphs that have higher efficiency have on average shorter path lengths between nodes. In the context of infection, we might say that if an outbreak in a completely healthy population begins in one node, graphs with higher efficiencies see infections spread across the entire graph faster than graphs with lower efficiencies.
Since D = 50 is fixed in all of our structures examined in this section, the only varying factor is the distances between nodes; the smaller the distances between all given nodes, the higher the efficiency. Global efficiency involves comparing the graph to the most efficient possible graph with as many nodes. Since this simply involves another prefactor which is equal for all of our graphs, our global efficiency is equal to the average efficiency.
Local efficiency: Local efficiency is defined as Eff where G i is the subgraph in which we keep the neighbors of node i but remove i itself, and the efficiency is defined as above.
Clustering coefficient: Global clustering coefficients are obtained by counting triplets, i.e., three nodes connected either in a triangle (three edges, a closed neighborhood) or with only two edges (an open neighborhood). The clustering coefficient used here is given by counting up all closed triplets (triangles) and dividing by the total number of triplets in the graph.
Average path length: Given any two nodes, we can calculate the path length as the smallest number of edges that must be traversed to move from one to the other. The average path length is the average over the distribution that arises from calculating this smallest number for all possible pairs of nodes. LASSO 88 is an advanced regression technique which assigns not only a regression coefficient, but also a hierarchical importance to each covariate. This is accomplished by forcing the sum of all the regression coefficients to be less than a certain number; in optimal fitting, some coefficients are then set to zero (i.e., deemed unimportant) rather than assigned a non-zero value. This is usually accomplished by adding the sum of all the regression coefficients, multiplied by a regularization parameter λ , to the quantity being minimized (such as sum of the squared error). When λ is large, having few covariates in a model becomes

15/33
more important than the model fitting the data accurately. Therefore, the error-minimizing model usually occurs for some intermediate value of λ .
We employed Matlab's lasso (and lassoPlot) function (which imposes the LASSO constraint on the L 1 norm of all fit coefficients onto a linear regression) with the various graph (and node) properties as possible model variables. The target to fit was either the rate of resistant infections (when considering an entire population of demes) or simply the prevalence of resistant infections (within a single deme).
Although a rigorous and robust significance test for the LASSO is still lacking, we can compute the mean-squared error of the predictions of the model to the actual data for cross-validation. Here we choose five-fold validation, meaning that the data is randomly split into five chunks of equal size. Four of these chunks are used to fit the regression, and then the mean-squared error between the actual fifth chunk and its prediction by the regression is measured. If this mean-squared error is sufficiently small, the model is a good fit. The results of this crossvalidation procedure for LASSO regression fits to both deme and population-level properties are shown in Fig. S5.

Supporting Information
Analytic results for well-mixed single deme with treatment The dynamics of competition between drug-susceptible and drug-resistant strains of infection in an infinite, well-mixed population where a fraction f of individuals receive treatment if infected is described by Eqns. 1. This system of equations tracks the proportion of the population in each treatment and infection category (Infected with the wild type and untreated (w u ), infected with the wild type and treated (w t ), infected with the resistant strain and untreated (r u ), and infected with the resistant strain and treated (r t )). These equations were developed by first writing down the dynamics of the number of individuals in each state: In Eqs. (S.1), the transmission rate k represents the rate at which each infected individual contacts and infects each susceptible individual.
To reduce the variables in this system and remove any dependence on total population size, we let κ = kN and x i = X i /N (the fraction of individuals instead of the total number, for X = W, R and i = u,t), which then leads to Eqns. 1. By keeping κ = kN as the constant parameter, we are implicitly assuming that the total rate at which a single individual can contact and infect others when the population is completely susceptible (kN) is independent of the population size. This is a reasonable assumption for many infections which require some type of active contact, and here allows us to reduce the number of parameters required. None of our results depend on this scaling, and the effect of changing population size is equivalent to changing κ in this deterministic model.
There are three possible equilibria of Eqns. (1). In each of these equilibria, there is no possible coexistence of infections. These equilibria are as follows:

No infection survives
This uninfected equilibrium is only stable when 2. Only wild-type (drug-susceptible) strain survives This infected equilibrium is stable when where the later condition implies c > f ε.

21/33
3. Only drug-resistant strain survives This infected equilibrium is stable when where the former condition implies c < f ε.

Analytic results for two connected subpopulations
In an effort to be more general than in the main text, we consider the case of a population of total size N divided into two demes, with a fraction f of the total population is contained in the drug-treated deme. The system describing the number of individuals in each state can then be written as: To create a nondimensional version of the system, we instead track the fraction of individuals in each deme who have a particular infection status, so , and κ = kN and β = bN, which gives the altered system: Using the next-generation technique 45 , we arrive at the following values for the reproductive numbers, In general, this system has nine equilibria. However, only one of these is stable and physically realizable (e.g. having between 0% and 100% of individuals infected) for any given parameter set. These are the values that are plotted in the main text. These outcomes fall into one of two qualitative categories. Either one strain drives the other to competitive extinction (true in Fig. 4 for c > 0.4 and c < 0.06, for instance) or both strains exist in both demes simultaneously. The boundaries separating this region from the competitive-exclusion region are R r 0 = R w 0 , the lower bound when viewed as a function of the cost of resistance c, and the curve which solves where [w * u , w * t , r * u , r * t ] denote the equilibrium values. This second curve, in words, is where the drug-treated population is infected at the same rate by both drug-resistant and drug-susceptible strains.
For the parameters chosen here and used throughout the paper, it is impossible to have R w 0 < 1. For sufficiently high c, we can have R r 0 < 1. However, this line in parameter space lies deep within the region where the population is infected with the wild-type strain only.

Scaling and nondimensionalization of the multi-deme model
The dynamics of infection in a network of interconnected demes can be described by the following system of equationṡ where X i represents the total number of individuals in deme i infected with strain X and D i is the total number of individuals in deme i. Assuming all demes are of the same size D allows us to scale the system on D and remove it as a parameter. Writing x i = X i /D, κ = kD and β = bD gives us the reduced system presented in Eqs (5).
The apparent absence of multiple equilibria Although the generic system of equations modeled by Eqns. (5) admits multiple equilibria, we actually never observed any dynamic phenomena other than a steady convergence to a single stable fixed point, even when initial conditions were wildly different from that considered in the main text. It is not uncommon for infection models to admit equilibria which are physically unrealizable, i.e., taking values outside the unit simplex -it may therefore be the case that any other stable equilibria are simply unphysical.
However, to comprehensively demonstrate this fact -either theoretically or numerically -is outside the scope of the current work. To prove such a result theoretically would necessarily draw on ideas of nonlinear algebra or algebraic geometry, such as intersection theory, which are outside the present scope; to demonstrate numerically that for all parameter sets and graph structures there is only one stable equilibrium is computationally infeasible, even on graphs of a given small size.
To illustrate the idea that there seems to be only one stable equilibrium per system, we fixed an exact distribution of treatment with ρ = 0.35 to be used on every random regular graph used in the main text. This means that, since all nodes in all graphs are labelled (and the random regular graphs in the main text have twenty demes each), seven labelled demes (here 1,3,8,11,15,19,20) were drug-treated irrespective of graph structure. We then performed, for all 30 graphs, 100 trials with uniformly random initial conditions over every deme, i.e., every deme can start with any possible arrangement of wild-type and resistant infectives (so long as the sum of the two does not exceed 1). We then compared the equilibria seen with the equilibria found for the initial condition used in the main body of the text.
There are many possible metrics which would illustrate the relative closeness of all the equilibria seen. In Fig. S10 we have elected to plot the maximum matrix norm (the 2-norm) of the difference between the equilibrium seen in any of the 100 trials and the equilibrium from the main text. That is to say, all of the trial equilibria differ in norm from the main-text equilibrium by an amount not more than that plotted in the figure for a given graph structure. The parameters used were our default (κ, β , g, ε, c) = (0.25, 0.05, 0.1, 0.9, 0.2).
In all of these cases, the maximum norm deviation is extremely small. This could be due to two possible reasons. The first is due simply to aggregated numerical error from the many steps involved of a) calculating the equilibrium (which involves a threshold of stopping when the time-derivatives in all demes fall below a certain threshold, which can allow for some very small freedom in what an "equilibrium" is in a given trial), b) calculating the matrix norm (which amplifies these small errors). The second explanation is that while multiple stable equilibria may exist, they are extremely close together, with almost clinically-unmeasurable differences in the rates of wild-type and resistant infections within a given deme in a given community.

Model with effects of treatment on recovery rates
To demonstrate the robustness of the results described in the main text, we explored a second model in which the drug, rather than inhibiting the spread of the infection, instead speeds up recovery. The effect of treatment is to increase the rate of recovery from g to g + τ. Here, τ represents the idea that an infected individual will seek a clinician, obtain a prescription for drug, and clear the infection as a result of drug at times represented by Poisson distributions, the average of which is τ. While in a mathematical sense there is no bound on τ in the way there is a bound on ε in the original model (where ε < 1), there is a realistic range for τ. For instance, if (g + τ) = 1, then it would take the average infected individual only one day to be diagnosed, prescribed, and cured -this is perhaps an unrealistic situation to model. We therefore usually used a value of τ between 0.1 and 0.3.
This model matches qualitatively with that discussed in the main text, in that in reproduces the spatial motifs in the same way. The two models are also quantitatively quite similar for reasonably-akin parameter sets. Here we reproduce the main results for this alternative model.

23/33
A single well-mixed deme Here, the system can be written aṡ Here, R 0 for the resistant strain is unchanged from the main text. For the wild-type strain, For the purpose of comparing this model, the value of τ for which R w 0 in the main-text model (with our baseline ε = 0.9, f = 0.5) equals R w 0 in this alternative model is τ = 0.163.

Two connected demes
In this model, the equivalent of Eqs S.9 can be written aṡ (S.14) Using the next-generation technique, we were able to derive the values of R 0 in a two-deme system for the alternative model, as an analog to Eq. S.10: The expressions are slightly more complicated, but the curves generated are very similar to the main text. Indeed, Fig. S7 shows the analog of Fig. 4, and the two are very similar upon visual inspection. As in the main text, here the boundaries of the coexistence region are the curves R w 0 = R r 0 (lower boundary) and the implicit curve where more wild-type infectives than resistant infectives are produced per unit time in the drug-treated deme (upper boundary).

Scaling and nondimensionalization of the multi-deme model
Here, the model can be summarized as a set of ODEs akin to Eq. (1) for each deme: 16) As in the main model, writing w i = W i /D, r i = R i /D,κ = kD and β = bD gives a nondimensionalized system, Resistance Surveillance Network 11 . The year-to-year deviation from the average value shown in Figure 1 is less than 3%.

25/33
Occurance across graphs Populations were created as random networks of 50 demes using a variant of the Watts-Strogatz algorithm. Treatment was randomly assigned to a portion ρ of demes so that an overall desired fraction treated was achieved (here, ρ = 0.24). Distributions show properties for 1000 such networks. The first six properties are intrinsic to the network structure, while the latter six describe how treatment is allocated over the network. Quantities of the form X-Y give the proportion of all pairs of connected demes in which one deme has treatment status X and one has treatment status Y (U=untreated, T=treated). Quantities of the form X-Y-Z give the same information for triples of connected demes (order ignored). A description of the methods for creating the networks and for calculating each property is given in the Methods.  Figure S4. Results of LASSO regression to identify predictors of resistance. Plots show the regression coefficients obtained as a function of the internal constraint on L 1 norm of coefficients ("regularization parameter", λ ). Each curve is a different predictor variable (described in Table 1). Analysis were separately run to predict deme-level resistance from deme-level properties (left side) or population-level resistance from graph-level properties (right side). Each analysis was conducted for two different levels of drug coverage in the population (either a fraction ρ = 0.24 or ρ = 0.4 of demes treated). Numerical labels on the curves denote the order in which the coefficients for the predictors become non-zero as the constraint is relaxed (λ → 0), and match up with the order of entries in Table 1.

27/33
Regularization parameter (λ) Since this percentage jumps from 0% to 100% at a critical value f * depending on kinetic parameters, there is no coexistence in one deme in this model.  Figure S8. Dynamics of drug-resistant infections in populations consisting of networks of inter-connected demes, in the model where drug-treated inviduals experience accelerated recovery from infections with the wild-type strain. A) Randomly generated population structures on which infection was simulated. Each node represents a deme (a well-mixed sub-population of individuals), and each edge indicates that infection can spread in either direction between those two demes. The total population consists of M = 20 demes and each deme has either d =3, 4, or 5 randomly connected neighbors. Ten example populations for each d were created and indexed for infection simulations. B) Fraction of infections that are resistant in the entire population (y-axis) versus fraction of demes treated, ρ (x-axis). Each color represents a different parameter set (blue background -baseline, red background -lower cost of resistance, teal background -more between-deme connectivity). Numbers show data points for individual populations indexed in (A). The colored envelope is created by shading between sigmoidal curves that were fit to the results for the 3-regular and 5-regular graphs. C) For each example population structure shown in (A) (x-axis) and each treatment level (ρ), the proportion of simulations that resulted in robust coexistence between drug-susceptible and drug-resistant strains is shown (by the colored area of the box). Each of the 100 simulations for each population structure had a different random spatial arrangement of treatment, where with probability ρ individuals in a given deme would all receive treatment if infected. Robust coexistence was defined as at least 80% of demes supporting both strains at frequencies above 10%. Figure S10. Difference between equilibrium values obtained for infection dynamics in multi-deme systems as initial conditions vary. For each of the population structures depicted in Fig. 5A, treatment was allocated randomly across demes with an overall proportion treated ρ = 0.35. For each graph-treatment allocation combination, 100 different sets of initial conditions (levels of resistant and sensitive infections in each deme) were chosen uniformly at randomly, and infection dynamics were run until an equilibrium was reached as described in the Methods. The maximum difference (in matrix 2-norm) was calculated between the equilibria seen in any of 100 trials and the equilibrium values used for results reported in the main text. Very low maximum norm values suggest there is a single stable equilibrium value for each parameter set.