An Exact Relationship Between Invasion Probability and Endemic Prevalence for Markovian SIS Dynamics on Networks

Understanding models which represent the invasion of network-based systems by infectious agents can give important insights into many real-world situations, including the prevention and control of infectious diseases and computer viruses. Here we consider Markovian susceptible-infectious-susceptible (SIS) dynamics on finite strongly connected networks, applicable to several sexually transmitted diseases and computer viruses. In this context, a theoretical definition of endemic prevalence is easily obtained via the quasi-stationary distribution (QSD). By representing the model as a percolation process and utilising the property of duality, we also provide a theoretical definition of invasion probability. We then show that, for undirected networks, the probability of invasion from any given individual is equal to the (probabilistic) endemic prevalence, following successful invasion, at the individual (we also provide a relationship for the directed case). The total (fractional) endemic prevalence in the population is thus equal to the average invasion probability (across all individuals). Consequently, for such systems, the regions or individuals already supporting a high level of infection are likely to be the source of a successful invasion by another infectious agent. This could be used to inform targeted interventions when there is a threat from an emerging infectious disease.


Introduction
'Compartmental' models [1]- [4], in which interacting individuals exist in discrete states, for example: susceptible, infectious or recovered, capture the essence of real-world host-to-host infection dynamics. Transition between states is often represented as a timehomogeneous Poisson process [2], [3], which can depend on the states of other individuals. Assuming a large and evenly-mixed population, in which every individual interacts equally with every other individual, many important results have been derived by using a mean-field approximation. For example, the deterministic susceptible-infectious-recovered/removed (SIR) model exhibits an invasion threshold whereby, depending on the combined effect of the rate at which individuals make infection-causing contacts and the rate at which infected individuals recover, a small number of initial infecteds will either cause a significant outbreak, the size of which can be calculated and is often referred to as the 'final size', or the infection rapidly dies out [1]. Likewise, the deterministic susceptible-infectious-susceptible (SIS) model also exhibits an invasion threshold such that, depending on the same factors, the infection either persists at some constant endemic level or rapidly dies out [5]. The way in which these thresholds, and the final size and endemic prevalence, are affected when an immunisation process is included has been investigated. By comparing these and similar models with real statistical data it has also been possible to quantify the invasion threshold, and expected impact on the population, for various diseases [6], [7]. Therefore, we have some way of determining optimum vaccination policies for the eradication or control of specific diseases.
To reflect the probabilistic nature of the real-world process of invasion, and disease transmission in general, stochastic descriptions are required. Moreover, the probability of invasion from a single infectious individual clearly depends on that individual's particular relationships with others, e.g. some individuals may be better connected than others. In order to capture such heterogeneity, the population can be represented as a contact network [8] which defines, for each individual, the subset of the population with which it has direct contact. If the population is then assumed to be infinite such that the number of neighbours with which an individual has contact is described by a probability distribution, it is sometimes possible to compute the invasion probability and (fractional) final size for the stochastic SIR model by utilising percolation theory [8]- [10]. For finite populations, there are many numerical methods by which to measure the final size distribution [11]. However, finite contact networks do pose conceptual difficulties; indeed, endemic behaviour is often associated with non-trivial stationary distributions which for many finite systems do not exist, while the theoretical definition of invasion probability depends on a framework which posits an infinite population such that invasion corresponds to indefinite persistence.
In the next section we will introduce the network-based SIS stochastic model and explain, following Harris [12], how it can be represented graphically. We will also show how such a graphical representation can be used to prove an important equation, which we state, and which expresses the property known as 'duality'. This property was discovered by Holley and Liggett [13], and Harris [14], but it is usually discussed in relation to undirected nonweighted networks. We find that it is also relevant in the context of directed weighted networks. In 'Theoretical Results' we will define and justify, for an arbitrary strongly connected network, exact mathematical quantifiers for both endemic prevalence and invasion probability. We also prove an exact relationship between them (see Pastor-Satorras and Vespignani [15] for a discussion of endemicity in random scale-free networks, and Gilligan and van den Bosch [16] for an overview of 'invasion and persistence' in epidemiological models). In 'Numerical Simulation' we will discuss how these quantifiers are to be measured through stochastic simulation and provide some examples. Notably, we will illustrate our theoretical results on the largest strongly connected (5,119 node) component T ex (File S1) of a particular heterogeneous transmission network. The full network comprises 11,480 nodes and was generated from simulations on a complex model of the spread of H5N1 avian influenza through the British poultry flock [17], [18]. It exhibits extensive heterogeneity including complex spatial structures, heterogeneous transmission strengths varying over many orders of magnitude, clustering and directed links. It therefore serves as a somewhat rare example of 'realistic' epidemic contact structures.

Markovian SIS Dynamics on a Contact Network
In Markovian SIS dynamics, an individual is able to flip repeatedly between two states: susceptible and infectious. This happens via a locally influenced time-homogeneous Poisson transmission process and an individual-specific time-homogeneous Poisson recovery process (on recovery an individual returns to the susceptible state). In the context of individuals interacting in this way on a regular square lattice the SIS model is also known as the contact process [19] (see Liggett [20] for theoretical results, and Durrett and Levin [21] for an ecological perspective). We will consider the dynamics of the Markovian SIS model on the full set of networks which are finite and static (our theoretical results will apply to the subset that are strongly connected). A generic weighted network (that can also be directed) will be denoted by a matrix T, where T ij indicates the rate parameter of the Poisson process in which individual j infects individual i, assuming j is infectious and i is susceptible (i,j[f1,2, . . . ,Ng where N is the population size). Thus, T combines the rates at which the individuals interact with the probability that infection occurs when an infectious individual contacts a susceptible individual. In this way, T captures features of the network and the infectious agent. We will also refer to a vector C~(c 1 ,c 2 , . . . ,c N ) where c i is the rate parameter of the Poisson process in which i recovers when it is infectious.
Assuming the system is in a specific stochastic state, let l i be the infectious pressure on individual i such that i is on course to be infected via the 'first arrival' of a Poisson process with rate parameter l i . Similarly, let m i be the recovery pressure on individual i such that i is on course to recover via the 'first arrival' of a Poisson process with rate parameter m i . We can now define the stochastic model with the following equations: where I i~1 if i is infectious and S i = 1 if i is susceptible, and both are zero otherwise. Given an initial configuration, such that the state of each individual is known, direct simulation can be employed to produce statistically accurate realisations of the stochastic model for any T and C. This framework, which represents the dynamics of several sexually transmitted diseases [22]- [25] and computer viruses [26]- [29], exhibits the phenomena of both invasion and endemic prevalence (See Durrett [30] for a discussion of methods and results relating to SIS dynamics on random scale-free networks).

Graphical Representation
Harris [12] showed that the contact process, and equivalently Markovian network-based SIS dynamics, can be fruitfully represented as follows: Each member of the population is assigned its own positive real number line or 'time line'. Next, a Poisson point process with intensity c i is placed on the time line corresponding to each individual i[V where V~f1,2, . . . ,Ng. Following Grimmett [31], such points are called 'points of cure'. Then, for each ordered pair of individuals (i,j), such that T ij =0, we place arrows going from j's time line to i's time line according to a Poisson point process with intensity T ij . These arrows are called 'arrows of infection'. Under such a representation, the probability that there is at least one path from 0 on a time line corresponding to an individual in subset A(V to t on the time line corresponding to individual k is equal to the probability that k will be infected at time t when only the members of A are initially infected (paths take into account the direction of time and the directions of the arrows of infection and are blocked by the points of cure). An example of this kind of representation, for a fully connected network of three individuals, is given in figure 1.

Notation
Assume that we have a network T, with an associated vector of recovery parameters C. Now, graphically representing Markovian SIS dynamics on this network, and adopting the notation of Harris [19], let j A t (with A(V ) be the set of individuals such that i[j A t if and only if there is at least one path from 0 on a time line corresponding to an individual in A to t on i's time line (see figure 1 for an example of a path). We will use P T,C to denote the appropriate probability measure. Thus, the probability that at least one member of a subset B will be infected at time t, given that only the members of subset A are initially infected, i.e. at t~0, can be expressed as P T,C (j A t \B=1).

The Duality Property for General Networks
The property of 'duality' pertaining to Markovian SIS dynamics (and the contact process) can be expressed as follows: where A,B(V , T is an arbitrary weighted contact network and T T is the transpose of T. Note that for undirected networks T~T T which simplifies the relationship and, in this case, the process is said to be 'self-dual'. To understand equation 2, it is simplest to firstly consider the case where T is undirected and where the subsets contain a single individual; A~j and B~i. Then the probability of existence of a path from j at time 0 to i at time t is expressed by the left hand side of equation 2. With reference to figure 1, reversing the arrow of time (turning the diagram upside down), we see that the probability that a path exists from individual i at time 0 to j at time t is the same calculation since the Poisson point processes are the same in reverse. This is expressed by equation 2.
When the network T possesses an asymmetry, the reverse process needs to be computed on the transposed network to be equivalent since the direction of infection is reversed. More generally, the probability that there exists at least one path from an individual in set A at time 0 to an individual in set B at time t is equal to the probability that there is at least one path from an individual in set B at time 0 to an individual in set A at time t in the transposed network, and this statement is represented by equation 2.

The Quasi-Stationary Distribution as a Quantifier of Endemic Prevalence
An immediate problem with finite systems is that there are no genuine stationary distributions corresponding to endemic infection because the long-term behaviour is always guaranteed extinction (of the infectious agent). However, from a practical point of view, it is sometimes possible to obtain something like the endemic stationary distribution; figure 2a illustrates this clearly for our example network T ex (File S1) (figure 2b illustrates the probabilistic nature of the invasion process which is discussed in subsequent sections). While a genuine stationary distribution does not exist, it is clear that we can still measure something similar to endemic prevalence through stochastic simulation since ultimate extinction is, in this case, very unlikely on short time scales.
From a theoretical perspective, the situation can be made precise. Our system is finite and Markovian with a single absorbing state (extinction of the infection). Also, if the network under consideration is strongly connected such that infection can be transmitted, via some route, from any individual to any other individual, then the transient states form a single commuting class. In this case, if we initiate the system in a transient state and condition on the survival of the infection, then the system tends to a unique distribution referred to as the quasi-stationary distribution (QSD) [32]-[] [34]. Let us denote by A an arbitrary subset of the population represented by a strongly connected network T, and let C be the vector of individual-specific recovery parameters. We can now unambiguously define P A T,C (quasi{prevalence) to be the probability that at least one individual in subset A of the network is infectious in the QSD. In the notation of the graphical representation introduced earlier, we can write: . It can be argued that the quasi-stationary distribution has practical relevance (i.e. is a good 'representation' of the endemic situation [34]) if the rate of convergence to this distribution, when conditioning on non-absorption, is rapid compared to the rate at which the system decays to inevitable absorption when it is 'initiated' in the QSD [32]. This can often be the case for SIS dynamics for which, according to Nåsell [34], 'it is easy to find examples where the expected time to extinction even for a rather small population exceeds the age of the universe'. More specifically, Simonis [35] has shown that the contact process on large but finite multi-dimensional homogeneous square lattices, where the initial state is all-infected, will be near to the upper invariant measure of the corresponding infinite process, restricted to the finite set, for most of its lifetime (assuming the corresponding infinite process is supercritical).
Let us consider the following quantities for an arbitrary strongly connected network T and arbitrary C: ~The probability that at least one member of subset A is infected at time t given that all individuals are infected at t~0. 2. P T ,C (j V t \V =1)~The probability that the infection survives to time t given that all individuals are infected at t~0. 3. P T ,C (j V t \A=1)=P T,C (j V t \V =1)~The probability that at least one member of subset A is infected at time t given that all individuals are infected at t~0 and given that the infection survives to time t.
Note that in the limit as t?? quantity 3 is equal to P A T,C (quasi{prevalence), the probability that at least one member of A is infected in the QSD.
In figure 3a, the way in which these three quantities may vary with respect to time is illustrated for a scenario in which the QSD has practical relevance. In such a scenario, the quantifier P A T,C (quasi{prevalence) is able to capture the value at which P T,C (j V t \A=1) initially 'plateaus' before its slow decay to zero. We have defined P A T,C (quasi{prevalence) in terms of the process which occurs when all individuals are initially infected. We could, however, define P A T,C (quasi{prevalence) in terms of the process which occurs when only the members of B5V are initially infected since the QSD, as defined by Daroch and Seneta [32], [33], is independent of initial conditions. Nonetheless, the process which occurs when all individuals are initially infected is unique in that it has the maximum expected time to extinction across all possible initial states (this follows from j B t (j V t (with B5V ) -see, for example, Grimmett [31]). Therefore, this is the nonconditioned process which is most appropriately described by the QSD.

The Equivalent Quantifier of Invasion Probability
In the context of finite contact networks, invasion probability has not been given a rigorous theoretical definition (see Nåsell [36] for a discussion of the threshold phenomenon in the stochastic SIS model). For dynamics with a genuine stationary endemic distribution, which (in this context) can only exist when the population size is infinite, the probability of invasion can be defined as the probability that the infection will persist indefinitely. For finite populations, this is complicated because we have to distinguish between infections that fail to invade and infections which successfully invade but then subsequently die out.
In this section we show that the quantifier of invasion probability, which is as equally meaningful and relevant as our quantifier for endemic quasi-prevalence, for outbreaks initiated on the members of a subset A of a strongly connected network T (with associated vector C) can be defined as:  Here we illustrate how it is possible for the quantifiers P A T,C (quasi{prevalence) and P A T,C (quasi{invasion) to capture critical information about the model. If T is undirected then these quantifiers are numerically the same and have equal practical relevance (as is seen by assuming that T is the same undirected network in (a) and (b), above Note that the definition of P A T,C (quasi{invasion) implies that quasi-invasion is certain when all individuals are initially infected. Our definition corresponds to the intuitive notion of invasion, i.e. a 'large' outbreak, or the 'attainment' of the QSD.
Let us now consider the following quantities: 4. P T,C (j A t \V =1)~The probability that the infection survives to time t given that only the members of A are infected at t~0. 5. P T,C (j V t \V =1)~The probability that the infection survives to time t given that all individuals are infected at t~0. 6. P T,C (j A t \V =1)=P T,C (j V t \V =1)~The quotient of quantities 4 and 5.
It follows from duality that the three quantities, 4, 5 and 6, are all equal respectively to the three quantities, 1, 2 and 3, provided that we transpose T. Note also that, in the limit as t??, quantity 6 is equal to P A T,C (quasi{invasion). Quantity 4 denotes the survival probability up to time t. We see in figure 3b that this 'plateaus' in exactly the same way as quantity 1 for the transposed network (figure 3a). This plateau, which is captured by quantity 6 in the limit as t??, i.e. P A T,C (quasi{invasion), corresponds to the 'achievement' of the QSD and thus with successful quasi-invasion. Our quantifier of invasion probability can be generalised as: where X is a transient stochastic state in which the infection is present. P X S (t) is the probability of survival to time t given that the system is initiated in state X , and P max S (t) is the probability of survival to time t given that the initial state is that which maximises the expected time to extinction. In this form, the definition becomes applicable to other Markovian infection dynamics (on strongly connected networks) which permit endemic behaviour, e.g. susceptible-infected-removed-susceptible (SIRS) dynamics. It is the existence of a unique QSD which enables our definition to capture the probability of invasion in the same way as for SIS dynamics. Note that the definition of quasi-prevalence can also be generalised to any infection model with a unique QSD.

The Prevalence-Invasion Relationship
Our main result can be stated as the following (prevalenceinvasion) relationship: for any subset A of a weighted and strongly connected contact network T, conditional on Markovian SIS dynamics. Equation 6 can be re-written as: lim t??
which holds because of the property of duality.
Note that for a single individual we have: that is, the probability of quasi-invasion from a given individual i is equal to the probability that it is infected in the QSD (in the transposed network). By summing over all i[V and dividing by N we get where P global T,C (quasi{invasion) is the probability of quasi-invasion given that the infection is seeded on a single individual selected uniformly at random, and P global T,C (quasi{prevalence) is the average fraction of the population that are infected in the QSD. An implication of the global-level relationship is that, for (strongly connected) directed networks, reversing the transmission processes (transposing T) will result in an interchange between these two quantifiers without affecting the 'stability' of the quasi-stationary behaviour, i.e. the rate at which the system decays to inevitable extinction when 'initiated' in the QSD is the same for T and its transpose. This can be understood by observing that given the infection is initiated by individual i, the probability that i is infectious at any t §0 is the same for T and its transpose.
For the case where T is undirected (T~T T ), the relationship implies that: and for a single individual i[V: and globally: Numerical Simulation The probability P i T ,C (quasi{prevalence) that individual i is infectious in the QSD is equal to the proportion of time for which i is infected after the system has 'reached' this distribution, assuming the infection does not die out. Therefore, we measure this as where n is a large number of simulated consecutive global events which occur when the probabilities for the system states obey the QSD. Dt k is the simulated time between the (k{1) th event and the k th event, and I k i~1 if i is in the infectious state between these events but is zero otherwise. The total simulated time is denoted by t~P n k~1 Dt k . Obviously, the larger we can make n, the more exact our measurement becomes. Notice that the events corresponding to a change in the state of individual i, and thus to a change in the value of I k i , represent a very small fraction of the total global simulated events (assuming the population size is not extremely small).
In practice, we run the simulation for a sufficiently long time such that the system is likely to be described by the QSD before we start to compute P i T,C (quasi{prevalence). For example, were our simulation to produce an infectious time line similar to the one in figure 2a we would only include the data from after, say, t~30 in our computation. Note that in every simulation there is always the possibility of extinction, even after the QSD has been 'reached'. However, we can safely discard any post-extinction data since the quantity we are interested in is conditioned on non-extinction. This method of measurement is only valid for systems in which quasi-stationary behaviour can be easily identified i.e. systems for which the QSD has significant practical relevance.

Measurement of P i T,C (quasi-invasion) by Stochastic Simulation
In measuring the probability of quasi-invasion through stochastic simulation, the key requirement is separating major outbreaks from minor outbreaks. Therefore, we look for dichotomised behaviour in relation to the time until extinction by carrying out large numbers of simulations, each initiated in the same stochastic state. For example, we can measure P i T,C (quasi{invasion) as the fraction of simulations in which an outbreak initiated by individual i persists for some sufficiently large number E I of infection events, such that a clear distinction can be made between the simulations in which long-term persistence occurs and the ones in which it does not. This is because P i T,C (quasi{invasion) corresponds to the value at which P T,C (j i t \V =1) initially plateaus. Without observation of this dichotomised behaviour it may not be possible to classify every simulation. For example, the histogram of infection events per simulation in figure 2b enables us to clearly identify the simulations which exhibit long-term persistence, even though we have only tracked each simulation to the 300th infection event i.e. E I~3 00. If such a histogram cannot be produced from the simulations, E I can be increased in order to try and 'uncover' the dichotomised behaviour.
In general, so long as this kind of dichotomised behaviour is found, the practical issues which emerge in measuring quasiinvasion probability by stochastic simulation for the SIS framework are minor. In other words, this method of measurement is valid, and easily carried out, in systems where P T,C (j i t \V =1) significantly plateaus i.e. systems where P i T,C (quasi{invasion) has significant practical relevance.

Simulations on our Example Network
By varying a scalar multiplier of a network matrix we can attempt to investigate infections of varying transmissibility spreading through the same population. Figure 4 illustrates our theoretical results, via this method of investigation, for a single individual in our example network T ex (File S1), clearly showing the relationship between quasi-invasion probability and endemic quasi-prevalence. We supply these numerical results to illustrate the prevalence-invasion relationship. We do of course obtain the same qualitative behaviour for any strongly connected network as proved in 'Theoretical Results'.
To obtain measurements of P i T,C (quasi{prevalence) for individual i~2332, 100 simulations were run for each of 20 different multipliers of T ex , and in each simulation the first 10 million events, out of 11 million, were discarded. Each simulation was initiated with sufficient infected individuals such that the probability of early extinction was negligible. As extinction never occurred, none of the data was discarded.
To obtain measurements of P i T T ,C (quasi{invasion), 1 million simulations were run for each of the same 20 multipliers. In each simulation, the infection was initiated on individual i~2332 and the process was tracked for a maximum of 500 infection events.

Simulations on a Small Square Lattice
An undirected and homogeneously weighted square lattice T of 25 individuals was investigated (see figure 5). Due to the small population size, the probability of extinction on a relatively short timescale was significant, even when starting from all-infected. This network enables us to illustrate the numerical measurement of our quantifiers in a scenario where the QSD has less practical relevance, i.e. where endemic quasi-stationary behaviour and dichotomised persistence are not 'recognisable' phenomena. In this case, we can compute P i T,C (quasi{invasion) (~P i T,C (quasi{prevalence)) by directly measuring P T,C (j i t \V =1)=P T,C (j V t \V =1) at increasing time points and then estimating its convergent value. Thus, for two different global transmission parameters (0.8, 0.5), and two different initial states (all-infected, one infected), 1 million simulations were allowed to run up to some specific point in simulated time (the global recovery parameter was always set to 1). For each simulation, the time at which extinction occurred was recorded so that the probability of survival up to increasing points in time could be measured.
In figure 5a, our quantifier is able to capture an important feature of the model, i.e. the value at which P T,C (j i t \V =1) (~P T,C (j V t \i=1)) plateaus before its inevitable decay to zero. Figure 5b gives an example of a scenario where, although our quasi-invasion and quasi-prevalence quantifiers are clearly defined, their practical relevance is less obvious. This is because the transmission parameter was sufficiently low such that early extinction was the dominant behaviour.

Computational Efficiency in the Measurement of Invasion Probability and Endemic Prevalence -A New Perspective
Through duality, we can approximate P A T,C (quasi{prevalence) by measuring P T T ,C (j A t \V =1) at increasing points in time (as in figure 5) in order to estimate the value at which it may initially plateau. This could, in certain circumstances, be much more efficient than trying to establish global quasi-stationary behaviour and then computing the proportion of time for which the infection is present in A. Conversely, if we wish to approximate P i T,C (quasi{invasion), for all i[V , it may be more computationally efficient to first establish global quasi-stationary behaviour in the transposed network and then measure the proportion of time each individual spends infected.

Discussion
By considering the unique QSD associated with Markovian SIS dynamics on finite strongly connected networks, along with its implications under duality, we have provided meaningful mathematical definitions for both endemic prevalence (quasi-prevalence) and invasion probability (quasi-invasion). Utilising these definitions, we have also provided a general statement of the exact relationship between invasion probability and endemic prevalence at the individual and population level, for any finite undirected Figure 5. Here we illustrate a method of measurement, through stochastic simulation, for P i T,C (quasi{invasion) (~P i T,C (quasi{prevalence)), where T is an undirected and homogeneously weighted square lattice of 25 individuals (we look for the value towards which P T,C (j i t \V=1)=P T,C (j V t \V=1) converges). For (a), the global transmission parameter (T g ) was set to 0.8. For (b), the global transmission parameter was 0.5. The global recovery parameter was set to 1 in both cases. The figure illustrates how these quantifiers are well defined but not always practically relevant. doi:10.1371/journal.pone.0069028.g005 network of arbitrary heterogeneity (including undirected networks with weighted links and individual-specific recovery parameters). The relationship also has implications in the context of directed networks.
We note that for two specific homogeneous networks (infinite square lattice and infinite 'great circle'), invasion probability (in these cases, the probability of indefinite persistence) from a single initial infected has been shown to be equal to the fraction of the population infected in the upper invariant measure [31], [37]. Furthermore, the relationship between the probability of longterm persistence and quasi-stationary distributions has previously been investigated (see Chaterjee and Durrett [38] and, for the related concept of 'metastability', see Schonmann [39] and Simonis [35]). However, although the prevalence-invasion relationship follows easily from a combination of the QSD and duality, to our knowledge this is the first general statement of this exact relationship for any finite strongly connected network. We have thus related two fundamental epidemiological quantifiers in systems where they cannot usually be calculated analytically due to complexity.
It is generally easier to collect empirical data on endemic prevalence rather than directly on invasion risk. In the case of undirected networks, prevalence data can thus be utilised to inform invasion risk. This method echoes Anderson and May's [7] estimation of the 'basic reproductive ratio' of measles from the total number of susceptible individuals in England and Wales (using data from Fine and Clarkson [40]). When other infectious agents exhibit qualitatively similar behaviour on the same undirected network, we can expect that the individuals carrying the greatest level of endemic infection are also those most likely to initiate new successful invasions. This lends support to the targeting of high-risk individuals in these systems as an effective strategy for the mitigation and control of emerging epidemics.

Supporting Information
File S1 Contact network data. (ZIP)