Localization, epidemic transitions, and unpredictability of multistrain epidemics with an underlying genotype network

Mathematical disease modelling has long operated under the assumption that any one infectious disease is caused by one transmissible pathogen spreading among a population. This paradigm has been useful in simplifying the biological reality of epidemics and has allowed the modelling community to focus on the complexity of other factors such as population structure and interventions. However, there is an increasing amount of evidence that the strain diversity of pathogens, and their interplay with the host immune system, can play a large role in shaping the dynamics of epidemics. Here, we introduce a disease model with an underlying genotype network to account for two important mechanisms. One, the disease can mutate along network pathways as it spreads in a host population. Two, the genotype network allows us to define a genetic distance across strains and therefore to model the transcendence of immunity often observed in real world pathogens. We study the emergence of epidemics in this model, through its epidemic phase transitions, and highlight the role of the genotype network in driving cyclicity of diseases, large scale fluctuations, sequential epidemic transitions, as well as localization around specific strains of the associated pathogen. More generally, our model illustrates the richness of behaviours that are possible even in well-mixed host populations once we consider strain diversity and go beyond the"one disease equals one pathogen"paradigm.

Introduction Viral species are known to often undergo rapid evolution. Since the early 20th century, influenza viruses have been described as having marked variability and unpredictable behaviour [1]. Subsequent RNA virus studies of the 20th and 21st century have focused on, among others, the Zaire ebolavirus, strains of the SARS-CoV species, and HIV-1, all possessing high mutation rates [2]. These frequent mutations contribute to the antigenic evolution of these viruses, allowing them to evade recognition by the human immune system [3].
Despite the long-standing knowledge of subtypes and strains within viral species, mathematical disease modelling has continued to model viral diseases with one underlying pathogen. Notably, influenza violates the "one disease, one pathogen" paradigm: numerous types, subtypes, and strains of influenza viruses challenge the human immune system, driving vaccine effectiveness below 50% in most recent years [4][5][6][7]. Models which fail to account for antigenic variation of a pathogen may lead to biased characterizations of epidemic emergence and progression.
Modelling multi-strain pathogens with consideration for antigenic properties requires the inclusion of cross-protective effects, in which the immunity acquired towards one strain offers partial protection towards another strain based on their antigenic similarity. Cross-protection is seen in numerous viral species [8][9][10]. In general, more similar strains will have greater cross-protective effects, as with seasonal influenza [11]. However, cross-protective immunity is not necessarily a monotonically decreasing function of antigenic distance. Antibody-dependent enhancement has been observed in dengue viruses, in which a past infection may in fact increase the risk of severe infection [12,13]. Regardless, approximations of cross-protection may be made through antigenic distance or genetic distance. This relationship may be determined by a function of genetic distance to approximate the unique antigenic distances between all strains.
Several models have been proposed in the growing sub-discipline of multistrain disease modelling [14]. These models balance biological assumptions with computational tractability through reduction via symmetry (e.g. antigenic neighbourhoods [15]), age structure [16], and deciding to capture either infection history or immune status [14], among other modelling choices. Cross-protective immunity has been explored in two-strain models [17], multi-strain models with a restricted number of antigenic loci and alleles [18], and temporary cross-protective immunity in dengue models [19] capable of producing cyclical and chaos-like infection progression. However, the effects of an underlying genotype network structure-governing viable mutation pathways and genetic distances between strains-have not been thoroughly explored with multistrain models. Genotype networks consist of nodes that represent strains, with edges connecting strains that differ by one nucleotide or amino acid in some antigenic region of a gene or protein [20]. Genotype networks are a complementary structure to phylogenetic trees, and are a useful way of representing genetic distance necessary for crossimmunity in multi-strain models.
Moreover, the genotype network gives us a proxy through which we can specify potential mutation pathways between strains. Mechanisms for pathogen mutation have previously been included in mathematical models [21,22], often to consider the emergence of antiviral resistance [23][24][25][26]. Particularly, these models predict the emergence of sequential epidemic transitions-with a first epidemic threshold defining the emergence of macroscopic disease spread and a second marking the emergence of treatment resistant strains [24]. However, such models are often limited to only two pathogen strains as they require specification of the fitness cost associated with resistance. We therefore aim to introduce a more general model, allowing large number of strains to mutate along specific network pathways. While this general model could consider a complex fitness landscape over this genotype network, we focus on the case of neutral evolution and show how the previous results discussed here can all co-exist within a single, fairly simple model.
We introduce a multistrain Susceptible-Infectious-Recovered-Susceptible (multistrain SIRS) epidemic model with an underlying genotype network, allowing the disease to evolve along plausible mutation pathways as it spreads in a well-mixed population. We then investigate the effects of genotype network structure on the emergence of an endemic state and on the fitness distribution of strains across the genotype network. Altogether, our results challenge the typical phenomenology of epidemic models. We observe localization of infections in the genotype space and identify two epidemic transitions. The first corresponds to the emergence of an endemic state where new infections mostly target the susceptible portion of the population, and the second marks the point where recovered individuals significantly contribute to new infections. Between these thresholds, we find chaos-like behaviour which can be maintained for arbitrarily long times, yielding time series with epidemic cycles featuring large, unpredictable fluctuations.

Methods
We study the spread of infectious disease within a well-mixed population for a defined genotype network of the chosen pathogen. Our model is as follows.
The underlying epidemiological dynamics correspond to a simple SIRS model, but where we add a genotype network defined as a set of potential mutations, meaning an infection of strain i 2 [1, N] can mutate along the network to a neighbouring strain j 2 N i , where N i specifies the set of first network neighbours of strain i. Biologically, this network is defined such that neighbouring strains i and j differ by one unit of genetic distance.
The strains spread within a well-mixed host population. Host individuals are defined as susceptible (S) if they possess no immunity to any strain of a disease, see Fig 1. Susceptible individuals progress to infectious state I i at transmission rate β for every contact with individuals infectious with strain i, occurring at rate βI i for every susceptible individual. Note that this basic transmission rate is held constant for all strains, as we focus on neutral evolution as a first approximation.
Individuals in I i can either: (i) recover at rate γ to state R i and acquire direct immunity for strain i and partial immunity to strain j 6 ¼ i; or (ii) become infected with strain I j via mutation at a rate μ for all strains j in N i . Individuals in R i will either: (i) lose immunity and progress to S at rate α, or (ii) become infected with strain j 6 ¼ i to which they only possessed partial immunity and progress to I j at a reduced rate β � , where β � is an exponentially decaying function of genetic distance between strains i, j. Specifically: where x ij is the genetic distance between strain i, j (approximated by shortest path of length x ij = x ji between strains i, j in the genotype network) and Δ is the characteristic length of immunity transcendence (0 < Δ < 1). This model makes the simplifying assumption that an individual only possesses immunity to the most recent strain of infection, existing in one immune state at a time.
Altogether, the dynamics of our model can be followed by the following set of ordinary differential equations (ODEs), where A ij is an element of the adjacency matrix of the genotype network, equal to 1 if there is mutation pathway between i and j and 0 otherwise. The total proportion of individuals infected can be obtained by summing over all strains, I(t) = ∑ i I i (t), and we also focus on its asymptotic value We therefore have 5 important epidemiological parameters: transmission rate β, recovery rate γ, rate of waning immunity α, mutation rate μ and immunity transcendence Δ. Unless mentioned otherwise, we fix the recovery rate γ = 1 such that other rates are defined in units of infectious period (and Δ in units of genetic distance). The other parameters then allow us to investigate different regimes of interest. As Δ ! 0, immunity becomes strain specific with no cross-protective effects. As Δ ! 1, immunity becomes broad-reaching to the point of universal protection across all strains. With β � as a function of distance, we are able to reduce model complexity by avoiding specification of b � ij among all strains, whose values may not be known in real-world applications. Instead, we rely on the inverse relationship between antigenic distance and cross-protection that has been observed in influenza viruses [11]. Note that this relationship may not be monotonically decreasing for all pathogens, in which case β � may be defined by a function of genetic distance unique to the pathogens.
Our most important assumption is perhaps that only the most recent infection is relevant for cross-immunity effects. Indeed, I i and R i specify the pathogen involved only in the most recent infection for every individual. The alternative would have been to model an infectious state I i,j,. . . for all unique infection histories, of complexity Oð2 n Þ if order does not matter and complexity Oðn!Þ if it does. While a big assumption, focusing on the most recent infection reduces the complexity to OðnÞ. Computational feasibility would be largely restricted otherwise, which would limit the analysis of the effects of genotype network structure [14,27]. The infection history approximation enables genotype networks to be large enough to contain complex structure, necessary to investigate the role of genotype networks in epidemic progression.

Results
We focus our attention on the consequences of the genotype network underlying the spread of the disease. In order to gain as much insights as possible on how it affects prevalence of a disease, we keep the network itself simple using well-known graph toy models composed of lattices, chains and stars.

Localization in genotype space
We first ask which strains can be expected to have an advantage, not because of their own fitness or of our epidemiological parameters (as they all share the same β, γ and α), but because of their position in the genotype network. We use three simple network structures-a star, a square lattice, and a chain, all containing 25 strains-and run our model to produce a large outbreak with β = 25 much greater than the expected SIRS epidemics threshold of β c = 1. Accordingly, we set the evolutionary dynamics to be much slower than that of epidemic spread with μ = 10 −3 . We then let the system reach its endemic steady state, where the derivatives in Eqs (2)-(4) essentially go to zero such that the system is at equilibrium.
We observe a localization of infections by strain within genotype networks as shown in Fig  2, top row. Stationary or endemic infection counts differ from strain to strain. This holds true even with the assumption of neutral evolution, based solely on their position in the network and the resulting cross-protective immune effects. Epidemics can therefore be localized around a minority of strains, as is clear in the lattice and chain networks.
We quantify this localization phenomena with Kish's effective sample size [28], referred to here as effective participation ratio n � eff ¼ n eff =n ¼ ð As n � eff ! 1, all strains contribute an equal number of infections. As n � eff ! n À 1 , only one strain contributes infections. In Fig 2, top row, we observe lower n � eff in the lattice and chain, indicating greater localization. A small number of strains are able to escape strong cross-protective immunity in the corners of the lattice and at the ends of the chain, while such heterogeneity is not seen in the star and ring networks.
As network structure determines infection localization, so does the transcendence of immunity. In Fig 2, middle row, we see n � eff as a function of β and immunity transcendence Δ, revealing regimes of strong localization in the lattice and chain networks where n � eff remains small. High values of Δ > 10, indicating far-reaching cross-protection, are associated with localization in these two networks. The structure of the star and loop networks allow them to escape localization effects influenced by large Δ.
Stationary infection counts I � are also influenced by immunity transcendence Δ. In Fig 2, bottom row, we see reductions in I � as Δ increases. As cross-protective effects increase, a higher β becomes necessary to maintain infections. Again we see the importance of network structure, with different values of Δ required between networks to affect I � .

Sequential phase transitions
We then look at the behaviour of the endemic state as we vary the basic transmission rate β. We know from classic SIRS model that there should be an epidemic transition at β c = 1, marking a transition between a disease-free phase where the disease is too weak to establish itself in the population if β < 1, and an endemic phase for larger values. Yet, one interesting result of Fig 2, bottom row is that the epidemic threshold now seem to increase with transcending immunity Δ. This is somewhat surprising given that Δ does not matter for any one strain, which should still be able to survive on its own following SIRS dynamics once β > β c = 1.
In Fig 3, we take a deeper look at the phase diagram under varying transmission rate and observe a second epidemic transition. More precisely, if α is not too large, I � is no longer a concave function of the transmission rate; it emerges as expected at β c = 1 but has a new inflection point at a much higher β value. This means that only modest increases in I � are seen when β is just above to the epidemic threshold β c = 1, in contrast to standard SIS-like models in which this regime experiences the most rapid rate of change in I � as a function of β [29].
We conjecture that this second phase transition is governed by what we call the immune invasion threshold, corresponding to the point at which infected nodes starts to infect recovered nodes (of other strains) effectively. To see this, in Fig 3(a), we compare the bifurcation diagrams of two models: with and without waning immunity. In the latter case, in the stationary state, a node infected with strain i can only infect recovered nodes of strains j 6 ¼ i (since S � = 0). The immune invasion threshold β I can thus be estimated from β c if α 7 ! 0. Surprisingly, even though β c = 1 whenever α > 0, it is no longer the case when α = 0.
Previous work has demonstrated a similar phenomenon in which I � as a function of β is dependent upon the level of transcendence of immunity [30,31]. This indicates the importance of transcending immunity in multi-strain modelling. However, our model differs by showing the development of a second transition that is absent from these models, having potential consequences regarding the robustness of I � with change in β. The second transition reveals that little changes may occur in I � below the second transition, while I � may change rapidly above this transition in our model. Furthermore, the structure of the genotype network itself is critical to the nature of this second transition. This difference with previous work PLOS COMPUTATIONAL BIOLOGY could be consequential in evaluating the utility of public health interventions aimed at reducing transmission rate.
To derive the immune invasion threshold, let us rewrite the stationary state quantities I � , R � when α = 0. We have where T ij � ð1 À e À x ij =D Þ. Isolating R � i in the second equation and reinjecting the solution in the first one, we obtain a self-consistent equation for the fI � i g, Interestingly, the fI � i g do not depend upon β. However, we know that such solution is possible only if I � i > 0 8i, and this break down at β I when R � ¼ Therefore, the immune invasion threshold β I has the following explicit expression where the fI � i g are evaluated from Eq (5). We observe a direct relationship between Δ and β I . Namely, when Δ ! 1, T ij ! 0 for all i, j, hence β I ! 1, as seen from Eq (6). When Δ ! 0, T ij ! 1 for all i 6 ¼ j, and T ii = 0, hence For large networks, β I � γ � 1 in the limit Δ ! 0. Therefore, we conclude that increasing Δ increases the immune invasion threshold, which makes sense based on intuition alone.
The relationship between the immune invasion threshold and the genotype structure is further explored in Fig 4 for the three toy networks across multiple values of Δ. As Δ increases, immunity becomes wide-reaching in genetic distance, approaching the effects of universal immunity or a universal vaccine. This has the effect of necessitating higher β to produce the same I � as lower values of Δ. Importantly, because of the sum in the denominator of Eq (6), the immune invasion threshold β I is not simply set by the diameter of the genotype network (i.e., the maximum value of x ij ), and is instead set by the entire network structure. While strains maximally distant from each other can much better infect recovered individuals, competition between strains also play an important role: central strains can still infect individuals and grant them better immunity due to their central position in the network. Thus, the network structure plays a nontrivial role in setting the exact value of β I as determined by Eq (6).

Rich dynamics in between epidemic thresholds
Beyond the features of the endemic state, we observe rich prevalence dynamics throughout the epidemic when transmission rates are between the epidemic threshold β c = 1 and the immune invasion threshold β I � β c . By comparing the top, middle, and bottom rows of Fig 5 we see infection counts throughout the epidemic simulation while the transmission rate lays in different regimes, decreasing from β > β I to values closer to β c = 1.
For transmission rate below the immune invasion threshold (bottom two rows), we see oscillations in the overall infection counts across all three networks before converging on an endemic value, resembling a dampened pseudo-chaotic behaviour. Both cyclical and chaotic infection progression have been observed in modelling by Gupta et al., dependent upon the strain structure [32]. The strain structure introduced by the genotype network enables a clear depiction of these phenomena and allows us to show how the network structure itself impacts the range of parameters where chaotic behaviour is expected.
Noting the different time scales shown, the chain rapidly converges on its endemic state while the star undergoes drastic oscillations before convergence. We see variation in infection counts at the strain level, with the infection counts for 3 of the 25 strains shown. At the strain level we see convergence occurring on different time scales within the same network, as well as variability in oscillatory nature.
In comparison, the top row of Fig 5 shows the rapid convergence on the endemic state when the transmission rate is high (β = 25). There still exists infection localization, as indicated by different endemic infection counts at the strain level, as well as variability in convergence time between strains. However, the oscillatory nature is profoundly absent at transmission rates well above β I . In contrast, as transmission rates are lowered towards β c = 1 in the bottom rows of Fig 5, we see the oscillations preserved but stretched across a broader timescale. Importantly, as the timescale of oscillations is stretched, their minimal values decrease by orders of magnitude. In practice, this shows that any finite size simulations of the dynamics captured by our model would likely lead to strain extinction, with potential to reemerge through mutations. Discrete events are unfortunately not captured in ODE models as they assume continuous values, or infinite population.

Discussion
The introduction of an underlying genotype network to a multistrain model has demonstrated the emergence of cyclicity, infection localization, and sequential phase transitions, all in one model. Simple mathematical arguments have allowed us to solve for the transitions observed and highlight the nontrivial impact of the structure of the genotype network. Rich infection dynamics are seen between the epidemic threshold and the immune invasion threshold. Altogether, what these results show is that many features of infectious disease dynamics often explained by environmental factors or host behaviour, such as cyclicity [33], unpredictability [34] and sequential transitions [35], can also be explained by adding a layer of biological complexity in the form of a genotype network. Our results thus highlight the importance of going beyond the "one disease, one pathogen" paradigm, with complex dynamics emerging from even the most simple genotype network structures. Integration of the ODEs on three toy genotype networks-The chain (left column), square lattice (middle column), and star (right column)-All with 25 strains. We fix the recovery rate γ = 1, the mutation rate μ = 1/1000, the waning immunity rate α = 1/5000 and the transcending immunity Δ = 4 and vary the transmission rate: β = 25 (Top), β = 1.7 (Middle), β = 1.1 (Bottom). The system is initialized with a small fraction 10 −5 of infections on an "end strain" (end for the chain, corner for the lattice, leaf for the star). On the chain we see successive activation of all strains, with the system stabilizing once the entire network is explored and evolution reaches a dead-end. The star sees cycles caused by activation of the leaf strains. The lattice is much more interesting, with loops causing a random-like succession of strains to cycle. The dynamics become more interesting for the bottom row, with transmission rates between the epidemic and immune invasion thresholds, with cycles and chaos-like dynamics. The closer we get to the true epidemic thresholds β c = 1, the longer the interesting transient dynamics. https://doi.org/10.1371/journal.pcbi.1008606.g005

PLOS COMPUTATIONAL BIOLOGY
Future work needs to be done to integrate this modelling approach with real genomic data. Likewise, the interplay of our results with the finite size and the contact structure of the host population needs to be investigated; as does the role of strain extinction and emergence. Different modelling approaches will need to be considered, such as explicitly modelling the growth and evolution of the genotype network as it co-evolves (albeit on a different timescale) with the spread of the infectious disease in the host population. Coupling the large modelling literature on growing networks [36] with that of network epidemiology [29] should lead to a richer understanding of how networks, both biological and social, impact epidemics. One other temporal feature that should be taken into account is the immune history of individuals. Whereas we currently only consider the most recent infection, the total immune history could grow exponentially with the number of strains considered and therefore represents an important modelling challenge. Finally, this type of model could also be appropriate to reimagine vaccination strategies. The literature on targeted immunization and influential spreaders on networks could then be leveraged [37][38][39][40], but rather than targeting central individuals the objective would be to best hinder and block the immune evasion of the pathogen as it mutates along its genotype network.
In terms of applying these models to specific scenarios, there is a need for unbiased pathogen genomic data, as well as an understanding of their antigenic properties, to inform models that account for these features using real-world data and to refine the cross-protective immune effects between strains of a pathogen. Similarly, we need more realistic models to take advantage of the growing body of genomic data available and refine the mechanisms driving mutation and immunity. We call for the refinement of immune mechanisms and immune history to allow their incorporation in mathematical disease models. Further understanding of how pathogens explore genotype space, the growth of genotype networks, the role of host immunity towards past strains, and the influence of the above on the fitness landscape of pathogens will better inform models incorporating multiple strains, cross-protective effects, and the evolution of a pathogen.