
## Abstract

We investigate methods to vaccinate contact networks—i.e. removing nodes in such a way that disease spreading is hindered as much as possible—with respect to their cost-efficiency. Any real implementation of such protocols would come with costs related both to the vaccination itself, and gathering of information about the network. Disregarding this, we argue, would lead to erroneous evaluation of vaccination protocols. We use the susceptible-infected-recovered model—the generic model for diseases making patients immune upon recovery—as our disease-spreading scenario, and analyze outbreaks on both empirical and model networks. For different relative costs, different protocols dominate. For high vaccination costs and low costs of gathering information, the so-called acquaintance vaccination is the most cost efficient. For other parameter values, protocols designed for query-efficient identification of the network’s largest degrees are most efficient.

## Author summary

Finding methods to identify important spreaders is one of the most important topics of network theory. Such methods also yield protocols for choosing whom to vaccinate in targeted vaccination campaigns. Earlier studies typically make some assumption about what information is available about the contact network that the disease spreads over. They then try to optimize an objective function: either the average outbreak size in disease simulations or, more simply, the size of the largest connected component. For public-health practitioners, however, gathering the network information cannot be detached from the decision process. Their cost function includes the costs of both the vaccination itself and the mapping of the network. This is the first paper to evaluate the cost efficiency of vaccination protocols. This problem is much more relevant, and not much more complicated, than the oversimplified objective functions optimized in previous studies. We find a “no-free-lunch” situation, where different protocols proposed in the past are most efficient in different cost scenarios. However, some methods are never cost efficient because of the amount of information they need. Which protocol is best depends on network structure in a non-trivial way. We use both analytical and simulation techniques to reach these conclusions.

**Citation:** Holme P, Litvak N (2017) Cost-efficient vaccination protocols for network epidemiology. PLoS Comput Biol 13(9): e1005696. https://doi.org/10.1371/journal.pcbi.1005696

**Editor:** Matthew (Matt) Ferrari, The Pennsylvania State University, UNITED STATES

**Received:** January 6, 2017; **Accepted:** July 25, 2017; **Published:** September 11, 2017

**Copyright:** © 2017 Holme, Litvak. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** "Colorado Springs" is available (printed) in Ref. 19. "Iceland" is available (printed) in Ref. 20. "HIV" is available (printed) in Ref. 18. "Prostitution" is available as an online attachment to: Luis E. C. Rocha, Fredrik Liljeros and Petter Holme, Simulated epidemics in an empirical spatiotemporal network of 50,185 sexual contacts, PLoS Comp. Biol. 7, e1001109 (2011). All other data sets are available at http://www.sociopatterns.org/wp-content/uploads/2013/09/detailed_list_of_contacts_Hospital.dat_.gz (hospital), http://www.sociopatterns.org/wp-content/uploads/2015/09/primaryschool.csv.gz (school 1 & 2), http://www.sociopatterns.org/files/datasets/003/ht09_contact_list.dat.gz (conference).

**Funding:** PH was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) http://www.nrf.re.kr/ funded by the Ministry of Education (2016R1D1A1B01007774). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Infectious disease is a major burden to global health. Infections spread from person to person over human contact networks. The propagation speed is an emergent property of both the pathogenesis in the infected individual and the contacts between people. By understanding the contact networks, we should thus be able to better predict and mitigate disease outbreaks. These are the premises of network epidemiology [1, 2]. One of its most active questions is how to exploit the contact network in targeted vaccination campaigns [3, 4]. Until now, targeted vaccination has mostly been a theoretical topic. The medical practice of network-based immunization has been limited to few cases and simple methods, the most famous being “ring vaccination” [5]. This strategy, which was used to eradicate smallpox, works by vaccinating all network neighbors of an infectious person [6]. Nevertheless, network immunization could be important in future disease control, especially for sexually transmitted infections (where the network links are evident) [7] or livestock diseases (where one node is a farm and links are connections by transport) [8].

In the theoretical literature, the problem of targeted vaccination has typically been formulated as follows. Given some knowledge of the contact network, identify the individuals that are potentially most important for disease spreading. To carry out a targeted vaccination campaign, one would first need to gather information about the contact network. One would then use this information to vaccinate the important individuals, or otherwise reduce their impact. There are thus three major costs involved in such an endeavor:

- the cost of the disease itself, which we use as our base unit;
- the cost of gathering information about the network, *c*_{info}, per inquiry and in units of the cost of a person getting the disease;
- the cost of vaccinating, *c*_{vacc}, per vaccinated person and in the same units.

We can thus evaluate the cost efficiency of a vaccination protocol by measuring the net saving *χ* per person, in units of the cost of one infection:

$$N\chi = \Omega - \Omega' - c_{\mathrm{vacc}}\, f N - c_{\mathrm{info}}\, n \qquad (1)$$

Here Ω and Ω′ are the expected outbreak sizes without and with vaccination, respectively. The outbreak size is the number of individuals who have had the disease once the outbreak has died out. *N* is the number of individuals, *f* is the fraction of individuals to vaccinate, and *n* is the number of inquiries needed to obtain the information. Ω corresponds to the no-vaccination scenario and thus does not depend on *f*. Both *n* and Ω′ depend on the specific vaccination protocol, but we drop this dependence in Eq (1) for brevity.
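The total net saving is $N\chi = \Omega - \Omega' - c_{\mathrm{vacc}} f N - c_{\mathrm{info}}\, n$. The sketch below evaluates it for made-up numbers, which are purely hypothetical and not taken from any of our data sets. The function name `net_saving` is our own.

```python
def net_saving(omega, omega_prime, N, f, n_inquiries, c_vacc, c_info):
    """Total net saving N*chi, in units of the cost of one infection.

    omega        -- expected outbreak size without vaccination
    omega_prime  -- expected outbreak size when a fraction f is vaccinated
    n_inquiries  -- number of inquiries n the protocol needs
    """
    averted = omega - omega_prime            # infections avoided
    vaccination_cost = c_vacc * f * N        # fN vaccinations
    information_cost = c_info * n_inquiries  # n inquiries
    return averted - vaccination_cost - information_cost


# Hypothetical example: vaccinate 10% of 1000 people, one inquiry per vaccinee.
saving = net_saving(omega=600, omega_prime=200, N=1000, f=0.1,
                    n_inquiries=100, c_vacc=0.5, c_info=0.1)
print(saving)  # 340.0
```

A positive value means the campaign saves more, in units of infections, than it costs.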

By reformulating the vaccination problem as a cost-optimization problem, one can evaluate the protocols proposed in the literature in a way that is more useful for decision makers. In this paper, we use this approach to evaluate seven vaccination protocols for many kinds of cost scenarios and underlying networks. We use eight different empirical networks of human contacts, representing sexual interaction or proximity. We also use the configuration model, a popular method to generate synthetic uncorrelated random networks with a given degree sequence.

Before proceeding to the details of our approach, we give a brief overview of recent analytical advances on the vaccination problem. The simplest vaccination protocol is to vaccinate random individuals. This *Random* (*R*) protocol often serves as a baseline in the literature; see e.g. Refs. [9–12]. In a seminal paper, Cohen, Havlin and ben Avraham [9] proposed the more effective *Acquaintance* (*A*) vaccination. In their approach, one also starts with randomly selected individuals. Instead of vaccinating them, however, one asks them to name someone they have met in such a way that contagion could occur. In an uncorrelated network, the probability that a given node of degree *k* is named in this way is proportional to *k*. This bias is desirable. High-degree nodes are important to vaccinate, not only because they have more people to spread the disease to, but also because they have more people to get the disease from.

Let *f*_{c} denote the fraction of the population that must be vaccinated in order to prevent a global outbreak. Formally, as *N* → ∞, *f*_{c} = inf{*f*: Ω′(*f*)/*N* = *o*(1)}. We use a superscript on *f*_{c} to denote a specific vaccination protocol. It was shown numerically in Refs. [9, 10] that $f_c^{A} < f_c^{R}$. An implicit analytical expression for $f_c^{A}$ in uncorrelated networks (the configuration model) was derived in Ref. [10]. Similar results were obtained in Ref. [11] for a more general model of infection spreading and in Ref. [13] for an imperfect vaccine. Ref. [14] did the same for the weighted configuration model, where edge weights represent contact probabilities.

A large empirical study based on the 2006 census of the Greater Toronto Area [12] suggests that vaccinating the top-degree nodes is most effective; we call this the *Degree* (*D*) vaccination protocol. However, such a strategy requires information about the entire network, which makes it hard to implement. For analytical results on degree-based vaccination and an implicit expression for $f_c^{D}$, we refer to Ref. [11]. In this paper, we optimize Eq (1) rather than *f*_{c}. We find that the *Degree* protocol is never the most efficient: in all scenarios, the cost of complete knowledge does not justify the gain in Ω′. Like us, Ref. [15] treats vaccination as a cost problem, but it does not consider the cost of information gathering. Its authors find that the picture of Ref. [12] needs to be modified when the costs of vaccination and treatment are balanced. In that case, it can be beneficial to vaccinate lower-degree nodes.

In addition to the *Acquaintance* protocol, we consider two strategies recently developed for quick detection of high-degree nodes: the *Random walk* (*RW*) strategy [16] and the *Two-step heuristic* (*TSH*) [17]. We also consider two other protocols that require complete knowledge of the network: *Coreness* and *Collective influence* (*CI*). All protocols are described in full below.

## Methods

In this section we introduce the methods, data sets and network models we use.

### SIR simulation

We assume that an infectious disease is spreading over a static contact network represented as a graph *G* = (*V*, *E*). *V* is the set of *N* vertices, or nodes, representing individuals. *E* is the set of *M* undirected edges representing pairs of individuals between whom the disease may spread. At any given time, each node is in one of three states: susceptible (S), infectious (I) or recovered (R).

- Susceptible nodes do not have the disease, but can get it.
- Infectious nodes have the disease and can spread it.
- Recovered nodes do not have the disease and cannot get it.

We assume a disease outbreak starts at time *t* = 0. At the beginning, all nodes are susceptible except one randomly chosen node, which is infectious. If an edge connects a susceptible individual and an infectious individual, the susceptible one becomes infectious at rate *β*. Every infectious node recovers at rate *ν*. In this setting, the infection and recovery times are independent exponential random variables. An infectious node therefore transmits the disease along a given edge before recovering with probability *β*/(*β* + *ν*).

The SIR model is essentially determined by the ratio between *β* and *ν*. In the well-mixed, differential-equation version of the SIR model, this ratio is called *R*_{0}. The actual values of *β* and *ν* are only needed to calculate times, such as the time to reach peak prevalence or extinction. In this paper, we set *ν* = 1, which is equivalent to measuring time in units of 1/*ν*. To simulate this model efficiently, we perform one infection or recovery event in every iteration of the algorithm. The probability that the next event is an infection is

$$p_{\mathrm{inf}} = \frac{\beta M_{SI}}{\beta M_{SI} + N_I} \qquad (2)$$

Here *M*_{SI} is the number of edges between infectious and susceptible individuals, and *N*_{I} is the prevalence, i.e. the number of infectious individuals [18]. On average, the time increment since the last iteration is 1/(*βM*_{SI} + *N*_{I}). If an infection event is not performed, one performs a recovery event. In an infection event, the S-I edge is chosen uniformly at random among all S-I edges. In a recovery event, the infectious individual to recover is likewise selected uniformly at random among all infectious individuals.
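The event-driven scheme just described can be sketched in Python as follows. This is a minimal illustration, not our simulation code, and it assumes the network is given as adjacency lists (`adj` and `sir_outbreak` are our own hypothetical names). For simplicity it copies the set of S-I edges into a list at every infection event, which costs O(*M*) per event. A faster implementation would use an indexable set.

```python
import random


def sir_outbreak(adj, beta, nu=1.0, rng=None):
    """Simulate one SIR outbreak on a static network, one event per iteration.

    adj[i] lists the neighbors of node i. Returns (outbreak size, time of
    extinction), where the outbreak size counts every node ever infected.
    """
    if rng is None:
        rng = random.Random()
    state = ["S"] * len(adj)
    seed = rng.randrange(len(adj))             # a single random initial case
    state[seed] = "I"
    infectious = [seed]
    si_edges = {(seed, j) for j in adj[seed]}  # (infectious, susceptible) pairs
    size, t = 1, 0.0
    while infectious:
        rate_inf = beta * len(si_edges)        # beta * M_SI
        rate_rec = nu * len(infectious)        # nu * N_I
        total = rate_inf + rate_rec
        t += rng.expovariate(total)            # exponential waiting time, mean 1/total
        if rng.random() < rate_inf / total:    # infection event, cf. Eq (2)
            _, j = rng.choice(list(si_edges))  # uniformly random S-I edge
            state[j] = "I"
            infectious.append(j)
            size += 1
            for k in adj[j]:
                si_edges.discard((k, j))       # j is no longer susceptible
                if state[k] == "S":
                    si_edges.add((j, k))       # new S-I edges from j
        else:                                  # recovery event
            idx = rng.randrange(len(infectious))
            i = infectious[idx]
            infectious[idx] = infectious[-1]   # swap-and-pop removal
            infectious.pop()
            state[i] = "R"
            for k in adj[i]:
                si_edges.discard((i, k))
    return size, t
```

Averaging the first return value over many runs gives an estimate of the expected outbreak size.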

For all contact networks and parameter values, we average over 300,000 or more runs of the SIR model. Each run represents an independent realization of the (random) costs of the entire process. We therefore used the normal approximation of the sample average to verify, with 99% confidence intervals, that our estimates of the mean costs are very accurate. The exact values of the confidence intervals are not informative for our purpose and are thus omitted. We use *β* = 1/32, 1/16, 1/8, 1/4, 1/2, 1, 2, 4, 8, 16, 32 and (as mentioned) *ν* = 1.
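The confidence-interval check can be sketched as follows. The sketch assumes `samples` holds one net-cost value per independent SIR run, and the helper name `mean_with_ci` is hypothetical. It uses only the standard library, with `statistics.NormalDist` supplying the 99% normal quantile.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_with_ci(samples, confidence=0.99):
    """Return the sample mean and the half-width of its normal-approximation
    confidence interval (about 2.576 standard errors at 99%)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(samples) / math.sqrt(len(samples))
    return mean(samples), half_width


# Hypothetical per-run net costs, in units of one infection.
m, h = mean_with_ci([12.0, 9.5, 11.0, 10.5, 13.0, 8.0])
print(f"{m:.2f} +/- {h:.2f}")
```

With 300,000 runs, the standard error shrinks by a factor of about 550 compared with a single run. The resulting intervals are therefore narrow.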

### Vaccination protocols

We compare the performance of seven vaccination protocols. Five of these have been analyzed in the literature. The other two are proposed by us in this work, adapted from cost-efficient methods for finding the highest-degree vertices. The vaccination protocols range from simple to complex and use different amounts of information about the network.

#### Random vaccination.

The simplest way of vaccinating a fraction *f* of a population is to just pick *fN* persons uniformly at random [19]. In this case, we can assume the information cost to be zero as all we need is a list of contact information of the population.

#### Acquaintance vaccination.

An elegant way of exploiting the network structure to find high-degree individuals to vaccinate is the *Acquaintance* vaccination scheme by Cohen, Havlin and ben Avraham [9]. In the literature, it is often assumed that each individual is sampled a Poisson(*f*)-distributed number of times. Each time, the sampled individual names one neighbor to vaccinate. When the named neighbor has already been vaccinated, no vaccination occurs and the next individual is sampled at random. As a result, the average fraction of vaccinated individuals *v*(*f*) is smaller than *f*. The exact formula for *v*(*f*) is given e.g. in Refs. [10, 11]. Naturally, *v*(*f*) is close to *f* when *f* is small. Here we instead assume that when a randomly sampled individual names a contact who has already been vaccinated, the individual is asked to name another contact. We discard the rare cases in which all contacts of a random individual have already been vaccinated, and thus assume that *v*(*f*) = *f*. The information cost of this protocol is then *fNc*_{info}, since one inquiry is needed for every vaccinated node.
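The variant of acquaintance vaccination used here can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the network is given as adjacency lists, and the function name is hypothetical. Asking the respondent to "name another contact" until an unvaccinated one comes up is modeled as a uniform choice among the unvaccinated contacts, which yields the same distribution.

```python
import random


def acquaintance_vaccination(adj, f, rng=None):
    """Return the set of vaccinated nodes (about f*N of them).

    A random respondent names a random contact who is not yet vaccinated;
    respondents whose contacts are all vaccinated are discarded. One inquiry
    per vaccinee, so the information cost is len(result) * c_info.
    """
    if rng is None:
        rng = random.Random()
    # Only nodes with at least one neighbor can ever be named.
    nameable = sum(1 for nb in adj if nb)
    target = min(round(f * len(adj)), nameable)
    vaccinated = set()
    while len(vaccinated) < target:
        respondent = rng.randrange(len(adj))
        candidates = [j for j in adj[respondent] if j not in vaccinated]
        if not candidates:
            continue                              # discarded (rare) case
        vaccinated.add(rng.choice(candidates))    # the named contact
    return vaccinated
```

On a star with four leaves and *f* = 0.2, a single node is vaccinated. It is the hub whenever the respondent is a leaf, which happens with probability 4/5. This shows the degree bias that makes the scheme effective.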

#### Random-walk vaccination.

If one spends more effort mapping out the network, one can do significantly better than acquaintance vaccination at finding the high-degree vertices. This is the idea of *Random walk* vaccination. Under this heuristic, one keeps a list of the *fN* vertices with the highest observed degrees, updated during a random walk of inquiries. The method is based on Ref. [16], which proposed it to find high-degree nodes in the World Wide Web and social networks in a cost-efficient way. Let *k*_{i} be the degree of node *i*. When the random walk is at node *i*, it jumps to a random node with probability *α*/(*k*_{i} + *α*). With the complementary probability, it proceeds to a randomly chosen neighbor of *i*. The rationale is that the stationary probability of *i* in such a random walk is proportional to *k*_{i} + *α*. We use *α* = 3, following the recommendation of Ref. [16] that *α* should be of the order of the average degree. We use the same value for all networks, because the performance is robust with respect to *α*. The second parameter, *m*, is the number of steps of the random walk. Rather than fixing this parameter, we use the value that optimizes *χ*.

The information cost of this protocol is *c*_{info} times the number of steps in which the random walk moves to a neighbor of the present node, rather than jumping to a random node. On average, in stationarity, the information cost is *mc*_{info}(1 − *α*/(⟨*k*⟩ + *α*)); cf. Ref. [16].
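A sketch of the random-walk search, assuming adjacency-list input and a hypothetical function name. Following the cost rule stated above, only moves to a neighbor are counted as inquiries, and uniform jumps are treated as free. Recording the degree of every visited node is our own bookkeeping assumption, not a detail taken from Ref. [16].

```python
import random


def random_walk_top_degree(adj, f, m, alpha=3.0, rng=None):
    """Random-walk search for high-degree nodes (after Ref. [16]); a sketch.

    At node i the walk jumps to a uniformly random node with probability
    alpha / (k_i + alpha) and otherwise moves to a random neighbor.
    Returns the round(f*N) nodes with the highest observed degree and the
    number of neighbor steps, which we count as inquiries.
    """
    if rng is None:
        rng = random.Random()
    n = len(adj)
    i = rng.randrange(n)
    observed = {i: len(adj[i])}        # node -> degree seen during the walk
    inquiries = 0
    for _ in range(m):
        k = len(adj[i])
        if k == 0 or rng.random() < alpha / (k + alpha):
            i = rng.randrange(n)       # uniform jump (no inquiry counted)
        else:
            i = rng.choice(adj[i])     # step to a random neighbor
            inquiries += 1
        observed[i] = len(adj[i])
    top = sorted(observed, key=observed.get, reverse=True)[:round(f * n)]
    return top, inquiries
```

Larger *m* yields a better ranking at a higher information cost. This is the trade-off that the optimization of *χ* over *m* resolves.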

#### Two-step heuristic.

We also try a protocol that, like the *Random walk* protocol in the previous section, was developed to identify high-degree nodes in social media cost-efficiently. We call it the *Two-step heuristic*. Like *Random walk*, it has a parameter that tunes the amount of information used in the search process [17]. This protocol consists of two stages. In the first stage, one randomly chooses *n*_{1} nodes and considers a reduced network of these *n*_{1} nodes and their neighbors. In the second stage, one measures the exact degrees of the *n*_{2} highest-degree nodes of the reduced network. For simplicity, we set *n*_{1} = *n*_{2} = *n*, which is not far from the expected optimal parameter setting [17]. This gives *n*(*f*, *G*) = 2*n* inquiries and a total information cost of 2*nc*_{info}.
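Below is a minimal sketch of the two-step heuristic with *n*_{1} = *n*_{2} = *n*, using a hypothetical function name. As a simplifying assumption of ours, a node's degree in the reduced network is approximated by the number of first-stage respondents who name it. The exact bookkeeping in Ref. [17] may differ.

```python
import random
from collections import Counter


def two_step_top_degree(adj, f, n, rng=None):
    """Two-step heuristic with n1 = n2 = n (after Ref. [17]); a sketch.

    Stage 1: n random respondents list their contacts; a node's degree in the
    reduced network is approximated by how often it is named.
    Stage 2: the n most-named nodes are asked for their exact degree.
    Returns up to round(f*N) highest-degree nodes found, and the 2n inquiries.
    """
    if rng is None:
        rng = random.Random()
    N = len(adj)
    respondents = rng.sample(range(N), n)                 # stage 1
    named = Counter(j for i in respondents for j in adj[i])
    shortlist = [j for j, _ in named.most_common(n)]      # stage 2 candidates
    shortlist.sort(key=lambda j: len(adj[j]), reverse=True)  # exact degrees
    return shortlist[:round(f * N)], 2 * n
```

Because the cost is fixed at 2*n* inquiries, *n* plays the same role as *m* in the *Random walk* protocol.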

#### Degree.

Both *Random walk* and *TSH* aim to rank nodes by degree cost-efficiently. For comparison, we also use the exact degrees, which can only be obtained by knowing the entire network. The information cost of this protocol is thus *Nc*_{info}. Ranking by degree has also been discussed as a vaccination protocol [19].

#### Coreness.

There are structures other than degree that could be exploited for mitigating disease spreading. Coreness reflects not only the degree of a node but also the connectedness of its neighborhood. The idea that dense clusters (“core groups” in the epidemiological literature) are important for disease spreading dates back to Ref. [20]. Coreness is not the only metric that captures this property, but it is a simple and straightforward one. It is a byproduct of the *k*-*core decomposition*, which analyzes the network by successively removing nodes from it. The removal proceeds level by level, starting at *k* = 0 and increasing *k* by one each time. At level *k*, one deletes all nodes with degree ≤ *k*. If other nodes drop to degree ≤ *k* during the deletion process, one deletes these too, until all remaining nodes have degrees larger than *k*. The coreness of a node is the value of *k* at which it was deleted.
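The deletion procedure translates directly into code. The sketch below is our own, for adjacency lists. It processes levels *k* = 0, 1, 2, … in order and records the level at which each node is deleted. For larger networks, NetworkX's `core_number` computes the same quantity.

```python
def coreness(adj):
    """Coreness of every node via level-by-level deletion."""
    deg = [len(nb) for nb in adj]
    alive = set(range(len(adj)))
    core = [0] * len(adj)
    k = 0
    while alive:
        # At level k, delete nodes of degree <= k, including nodes whose
        # degree drops to <= k while others are being deleted.
        stack = [i for i in alive if deg[i] <= k]
        while stack:
            i = stack.pop()
            if i not in alive:
                continue                 # already deleted via another path
            alive.remove(i)
            core[i] = k
            for j in adj[i]:
                if j in alive:
                    deg[j] -= 1
                    if deg[j] <= k:
                        stack.append(j)
        k += 1
    return core


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
print(coreness([[1, 2, 3], [0, 2], [0, 1], [0]]))  # [2, 2, 2, 1]
```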

The use of coreness as an estimate of importance for disease spreading was proposed in Ref. [21] and further refined in Ref. [22]. To use it, one would need to map out the entire network, i.e. all its *M* edges. In reality, however, the inquiries would be implemented node by node. We therefore choose a simplified approach: we assume that knowing the complete network takes one inquiry per node, so the total information cost is *Nc*_{info}. Note that this is a more demanding inquiry, because it requires individuals to list all their contacts. Since we still use the same cost, the performance of *Coreness* relative to its cost will be slightly overestimated compared to the protocols above.

#### Collective influence.

Finally, we use an even more elaborate algorithm that, like coreness, requires full information about the network. We stick with the authors’ rather non-descriptive name, *Collective Influence* (*CI*) [23]. It starts by defining the quantity

$$x_l(i) = (k_i - 1) \sum_{j:\, d(i,j) = l} (k_j - 1) \qquad (3)$$

where *k*_{i} is the degree (number of neighbors) of *i* and *d*(*i*, *j*) is the distance (fewest number of edges in any path) between *i* and *j*. The algorithm deletes the node with the largest *x*_{l}(*i*) and recalculates *x*_{l} for the reduced network. It repeats this procedure until *fN* nodes have been deleted. As *l* grows, the ranking stabilizes but the computation time increases, so the choice of *l* is a trade-off between speed and precision. We follow Ref. [23] and set *l* = 3. Just like coreness, collective influence needs all the network information, so the total cost of information gathering is *Nc*_{info}.
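For concreteness, here is a naive sketch of the greedy procedure, assuming adjacency-list input and hypothetical function names. The quantity computed is *x*_{l}(*i*) = (*k*_{i} − 1) Σ (*k*_{j} − 1), summed over the nodes *j* at distance exactly *l* from *i*. After each deletion, the sketch recomputes every *x*_{l} by breadth-first search, which is simple but slow. A practical implementation would update only the nodes near the deleted one.

```python
from collections import deque


def ci_value(adj, removed, i, l):
    """x_l(i) = (k_i - 1) * sum over j at distance exactly l of (k_j - 1),
    computed on the network with the nodes in `removed` deleted."""
    def degree(v):
        return sum(1 for u in adj[v] if u not in removed)

    dist = {i: 0}
    queue = deque([i])
    ball_surface = 0
    while queue:
        v = queue.popleft()
        if dist[v] == l:                  # at distance l: count, do not expand
            ball_surface += degree(v) - 1
            continue
        for u in adj[v]:
            if u not in removed and u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return (degree(i) - 1) * ball_surface


def collective_influence_order(adj, f, l=3):
    """Greedily delete the node of largest x_l, recomputing after each deletion."""
    removed, order = set(), []
    target = round(f * len(adj))
    while len(order) < target:
        best = max((v for v in range(len(adj)) if v not in removed),
                   key=lambda v: ci_value(adj, removed, v, l))
        removed.add(best)
        order.append(best)
    return order
```

Ties, including the case where all remaining values are zero, are broken by the lowest node index. This choice is arbitrary.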

### Networks

Ideally, the underlying network of our study should be as realistic as possible for a given pathogen. Our knowledge of the structure of contact networks is advancing, and some data sets are available. We use those that record actual contacts between people and disregard those where contacts are inferred from interaction on social media, etc. [24]. We also want to understand how network size and higher-order structures affect the performance of the algorithms. For this, it is desirable to have models that generate contact networks. We study one of the simplest such models, the configuration model. We choose it not because it generates networks with very realistic structure, but because it lets us compare our results to other studies, in particular analytical ones.

#### Configuration model.

The input to the configuration model is a degree sequence, i.e. a sequence of desired degrees of the nodes of the network. The model then repeatedly picks random pairs of nodes and adds an edge between them if their actual degrees are less than their desired degrees. When all nodes have their desired degrees, the network is complete. The model does not enforce a simple graph. If there is already an edge between a selected pair of nodes, one still adds another edge, and links from a vertex to itself are also allowed. The empirical graphs in our study are simple graphs by construction. We therefore convert the output of the configuration model to a simple graph by deleting multiple edges and self-loops. In the literature, this construction is sometimes called the erased configuration model [25].
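Below is a sketch of the erased configuration model, with a hypothetical function name. Note that we swap in the standard stub-matching formulation, a uniformly random pairing of half-edges, instead of the pair-picking procedure described above. The erasure of self-loops and multiple edges follows the text.

```python
import random


def erased_configuration_model(degrees, rng=None):
    """Erased configuration model via stub matching; a sketch.

    Node i gets degrees[i] half-edges ("stubs"); a uniformly random pairing
    of the stubs forms the edges. Self-loops and multiple edges are then
    erased, giving a simple graph as sorted adjacency lists.
    """
    if rng is None:
        rng = random.Random()
    if sum(degrees) % 2:
        raise ValueError("the sum of the degrees must be even")
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    neighbors = [set() for _ in degrees]
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:                 # erase self-loops
            neighbors[a].add(b)    # sets silently erase multiple edges
            neighbors[b].add(a)
    return [sorted(nb) for nb in neighbors]
```

After erasure, realized degrees can fall slightly below the desired ones. The effect is largest for the highest-degree nodes.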

Like many previous studies, we focus on networks with a power-law degree distribution, where the probability of a vertex having degree *k* is proportional to *k*^{−γ}. We truncate the degree distribution at *N*^{1/(γ − 1)}. Such a truncation improves the precision of the estimated average outbreak sizes. At the same time, it preserves the limiting degree distribution and the order of magnitude of the maximum degree.
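The truncated power-law degree sequence can be sampled as below (the function name is hypothetical). Neither the minimum degree `k_min = 1` nor the parity fix is specified in the text; both are our assumptions. The parity fix redraws one node's degree until the degree sum is even, as stub matching requires.

```python
import random


def power_law_degrees(N, gamma=2.5, k_min=1, rng=None):
    """Draw N degrees with P(k) proportional to k**(-gamma) for
    k_min <= k <= N**(1/(gamma-1)); redraw one node until the sum is even."""
    if rng is None:
        rng = random.Random()
    k_max = int(N ** (1.0 / (gamma - 1.0)))       # truncation point
    ks = list(range(k_min, k_max + 1))
    weights = [k ** -gamma for k in ks]
    degrees = rng.choices(ks, weights=weights, k=N)
    while sum(degrees) % 2:                       # stub count must be even
        degrees[rng.randrange(N)] = rng.choices(ks, weights=weights)[0]
    return degrees
```

For *N* = 10,000 and *γ* = 2.5, the cutoff is *N*^{2/3} ≈ 464.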

The parameter values we use are: *γ* = 2.5 (as a typical value of empirical networks) and *N* = 625, 1,250, 2,500, 5,000, or 10,000. We generate 100 networks of each combination of parameter values.

#### Empirical networks.

The first type of empirical network that we use represents self-reported sexual contacts. Two of these data sets, which we label *HIV* and *Colorado Springs*, were gathered by so-called contact tracing. In contact tracing, individuals testing positive for HIV were required to report their recent contacts. The *HIV* data set comes from the first study [26] that used an observed contact network between HIV patients to argue that HIV is sexually transmitted. *Colorado Springs* is a larger and more recent contact-tracing data set based on patients from its namesake city in Colorado, USA [27]. Contact tracing does not follow contacts of uninfected individuals. Accordingly, *HIV* includes only positive cases, while *Colorado Springs* also includes uninfected individuals who had sex with HIV-positive individuals.

We also use two networks of self-reported sexual contacts not related to contact tracing. One (*Iceland*) comes from Icelandic men who have sex with men [28]. The other (*Prostitution*) comes from a Brazilian web forum where sex buyers report their encounters with prostitutes [29].

The final type of empirical network is the so-called proximity network. In these networks, a link represents a pair of people being close to each other at some time. These data sets all come from the Sociopatterns project (sociopatterns.org). They were collected with radio-frequency identification sensors given to people in specific social settings. Such sensors record a contact if two persons are within 1–1.5 m of each other. The three settings are a conference [30] (*Conference*), a hospital [31] (*Hospital*), and a school (*School 1* and *2*) [32].

The original proximity data sets along with *Prostitution* are time resolved. We construct static networks by aggregating all contacts. (Ideally these data sets should be analyzed as temporal networks [33]—then one could get around the assumption that the past accurately predicts the future [34, 35]. However, that is outside the scope of this paper.)

We list the basic statistics—sizes, sampling durations, etc.—of the data sets in Table 1.

In Table 1, *N* is the number of individuals and *M* is the number of links. *x* is the connectance, i.e. the fraction of vertex pairs that are linked. *C* is the clustering coefficient of the original network. *C*′ is the average clustering coefficient of random graphs with the same expected degree sequence as the original network [36].

## Results

### Numerical results

We start by evaluating the vaccination protocols in some detail for the *Colorado Springs* data set. Then we proceed to take a cruder look at all the data sets to see how network structure affects the results.

#### A case study.

The *Colorado Springs* network serves well as an example, since it is of intermediate size in our collection and has typical features, such as a heterogeneous degree distribution. In this section, we set *β* = 2, a modest value in the interesting range where the disease can spread throughout the population. In Fig 1, we plot the optimal saved cost *Nχ*_{opt} as a function of two parameters: the relative cost of information *c*_{info} and the relative cost of vaccination *c*_{vacc}. The general pattern is unsurprising. The protocols needing the most information (*CI*, *Degree* and *Coreness*) are also the ones that depend most on *c*_{info*}. *Random*, which needs no information at all, depends only on *c*_{vacc}. The three protocols whose information use depends on *f* (*Acquaintance*, *Random walk* and *TSH*) are affected by both *c*_{info} and *c*_{vacc}. From the heat maps, it is hard to see which protocol is best, except perhaps that *Acquaintance* has the largest *χ* for high *c*_{info}. This means that the efficiencies of the best-performing protocols are relatively similar.

Fig 1. Optimal saved cost *Nχ*_{opt} for the *Colorado Springs* network as a function of the costs of information retrieval and vaccination, with *β* = 2.

The performance of the protocols can be better understood by measuring *f*_{opt}, the fraction of vertices that should be vaccinated to minimize the total costs (see Fig 2). For protocols whose information costs do not depend on *f*, *f*_{opt} obviously does not depend on *c*_{info}. For the other ones (*Acquaintance*, *Random walk* and *TSH*), *f*_{opt} decreases with both *c*_{vacc} and *c*_{info}. Hence, more information does make these protocols more accurate. This can be seen even more clearly in Fig 3, where we set *f* = *f*_{opt} and study the optimal parameter values (*m*_{opt} and *n*_{opt}) of the *Random walk* and *TSH* protocols. Both protocols naturally have larger parameter values the cheaper the information is. For *Random walk*, the optimal *m*-value is largest when *c*_{info} is as low, and *c*_{vacc} as high, as possible. A high *c*_{vacc} gives a small optimal *f* (see Fig 2), which lowers the cost needed for gathering information. For high *c*_{vacc} and low *c*_{info}, the relative cost of information gathering is thus so low that the rather small marginal benefit of longer random walks is still affordable.

The plot shows results for the same network, as a function of the same parameters, as in Fig 1.

For the *TSH* protocol, the largest parameter value occurs at an intermediate value of *c*_{vacc}, still with *c*_{info} as low as possible. The increase of the parameter value with *c*_{vacc} can be understood in the same way as for *Random walk*. The eventual decrease for *c*_{vacc} ≈ 0.1, like the other non-monotonicities in the plot, can be related to how Ω′ responds to changing *f*_{opt}.

#### Network-structural effects.

The picture painted in the previous section remains roughly true for other data sets and *β* values. In this section, we go directly to our main question: which vaccination protocol is the most cost-effective? Fig 4 shows the results for *β* = 2. The corresponding figures for the other *β*-values we study can be found in the Supplementary information. They lead to roughly the same conclusions. For small *β*, i.e. small outbreak sizes, the results are affected by noise, so the regions are less clear-cut.

In this figure, *β* = 2 (for other parameter values, see Supplementary information, S1–S4 Figs).

For most of the data sets, the ranking depends on the cost region:

- *Acquaintance* vaccination is the most efficient protocol for relatively high information costs.
- *TSH* is the most efficient for low *c*_{info} and high *c*_{vacc}.
- *Random walk* is the most efficient for the rest of the parameter space.

One exception is *Prostitution*, the largest and sparsest network, where *CI* is the most cost-effective despite requiring global knowledge of the network structure. This network also has a zero clustering coefficient, i.e. no triangles, because only heterosexual contacts are recorded. Still, its size and sparsity seem to be more fundamental differences from the other networks (cf. Ref. [37]). To understand the role of clustering, one could perform the same study on model networks where the clustering can be controlled. The densest network, *Hospital*, also differs in that *TSH* performs best over the entire parameter space. *Random* is never the most efficient. This means that there is network structure that can be exploited for all data sets and parameter values. *Coreness* and *Degree* do not perform best under any circumstances.

In addition to the empirical contact networks, we also study scale-free networks of different sizes (Fig 5). These networks behave slightly differently from the empirical networks:

- *CI* dominates the high-*c*_{vacc}, low-*c*_{info} region.
- *Acquaintance* dominates the low-*c*_{vacc}, high-*c*_{info} region.
- *Random walk* is best for intermediate *c*_{vacc} and *c*_{info}.
- *TSH* is the best protocol for some low *c*_{info} values and intermediate *c*_{vacc} values.

#### Analysis of the optimal *f*.

In this section, our goal is to understand the regularities behind the numerical results. An exact analysis is available only for the asymptotic behavior of the *Random*, *Acquaintance* and *Degree* strategies in the configuration model; see e.g. [10, 11]. However, the analytical expressions are cumbersome and do not provide sufficient qualitative insight. Results for the other vaccination strategies are currently not available. We therefore resort to heuristic arguments based on the exact results in the literature.

Dividing both sides of Eq (1) by *N*, we write

$$\chi = \Delta(f) - c_{\mathrm{vacc}}\, f - c_{\mathrm{info}}\, \frac{n(f, G)}{N} \qquad (4)$$

where

$$\Delta(f) = \frac{\Omega - \Omega'(f)}{N} \qquad (5)$$

is the fraction of the population that has avoided the disease due to vaccination. For any vaccination strategy, Δ(*f*) is obviously non-decreasing in *f*. Recall also that *f*_{c} is the fraction of the population that needs to be vaccinated in order to prevent a global outbreak. In other words, if *f* ≥ *f*_{c}, then Ω′(*f*) is negligible compared to *N*, so Δ(*f*) ≈ Ω/*N*. Furthermore, one expects that for small *ϵ* > 0 the additional gain Δ(*f* + *ϵ*) − Δ(*f*) decreases to zero as *f* approaches *f*_{c}. This diminishing-returns property is closely related to the submodularity of spreading processes, which is used, for example, in solving influence maximization problems [38].

Following a widely used approach in epidemiology and network science, we consider a continuous version of Eq (4). In this version, all functions of *f* ∈ [0, 1] are differentiable and all vanishing terms are neglected. Formally proving that the process converges to its continuous representation as *N* → ∞ is a challenging mathematical problem. However, it is common to analyze the continuous version in its own right. In the continuous version of the system, our observations above can be summarized as follows:

1. Δ′(*f*) > 0 for *f* < *f*_{c};
2. Δ(*f*) = Ω/*N* and Δ′(*f*) = 0 for *f* ≥ *f*_{c};
3. Δ′(*f*) → 0 as *f* → *f*_{c}.

This behavior of Δ′(*f*) is schematically depicted in Fig 6.

We proceed by analyzing the optimal fraction of vaccinated individuals, *f*_{opt}. Note that Eq (4) directly implies *f*_{opt} ≤ *f*_{c}. Indeed, it is not optimal to vaccinate a fraction greater than *f*_{c}, because the negative part of the right-hand side of Eq (4) will grow while the positive part remains the same. In the continuous version, the maximal gain in Eq (4) is achieved at *f* = *f*_{opt}, which is a solution of

$$\Delta'(f) = c_{\mathrm{vacc}} + \frac{c_{\mathrm{info}}}{N}\,\frac{\partial n(f, G)}{\partial f}. \tag{6}$$

Since *n*(*f*, *G*) is non-decreasing in *f* (and *c*_{vacc} > 0), it follows that Δ′(*f*_{opt}) > 0.
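The optimality condition in Eq (6) can be checked numerically. The sketch below uses a hypothetical toy Δ(*f*) = Δ_max[1 − (1 − *f*/*f*_{c})²] for *f* < *f*_{c} (constant Δ_max above), with *f*_{c} = 0.3 and Δ_max = 0.4, and assumes, again hypothetically, a protocol that spends one query per vaccinated individual, so that *n*(*f*, *G*)/*N* = *f*. It maximizes the per-capita gain on a fine grid; the maximizer should lie below *f*_{c} and satisfy Δ′(*f*_{opt}) ≈ *c*_{vacc} + *c*_{info}.

```python
F_C, DELTA_MAX = 0.3, 0.4  # hypothetical f_c and Omega / N


def toy_delta(f: float) -> float:
    """Toy Delta(f): increasing and concave below f_c, constant from f_c on."""
    return DELTA_MAX if f >= F_C else DELTA_MAX * (1.0 - (1.0 - f / F_C) ** 2)


def toy_delta_prime(f: float) -> float:
    """Analytic derivative of toy_delta."""
    return 0.0 if f >= F_C else 2.0 * DELTA_MAX / F_C * (1.0 - f / F_C)


def gain_per_capita(f: float, c_vacc: float, c_info: float) -> float:
    """Per-capita gain with the hypothetical query count n(f, G) / N = f."""
    return toy_delta(f) - c_vacc * f - c_info * f


def optimal_fraction(c_vacc: float, c_info: float, steps: int = 100_000) -> float:
    """Grid search for the f in [0, 1] that maximizes the per-capita gain."""
    return max((i / steps for i in range(steps + 1)),
               key=lambda f: gain_per_capita(f, c_vacc, c_info))


c_vacc, c_info = 0.5, 0.2
f_opt = optimal_fraction(c_vacc, c_info)
print(f"f_opt = {f_opt:.4f} (f_c = {F_C})")
print(f"Delta'(f_opt) = {toy_delta_prime(f_opt):.4f}, "
      f"right-hand side of Eq (6) = {c_vacc + c_info:.4f}")
```

For these numbers the maximizer can also be found by hand: setting (2Δ_max/*f*_{c})(1 − *f*/*f*_{c}) = 0.7 gives *f*_{opt} = 0.22125.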

Consequently, we have two rules of thumb for anticipating the value of *f*_{opt}. First, one expects *f*_{opt} to be smaller for more effective strategies. This is because *f*_{opt} ≤ *f*_{c}, and *f*_{c} can be viewed as an indicator of how effective a vaccination strategy is at preventing the epidemic: when *f*_{c} is small, a global outbreak is prevented by vaccinating only a small fraction of individuals. Second, a higher value of the right-hand side of Eq (6) also indicates a smaller *f*_{opt}, as illustrated in Fig 6, where this value is represented by the dashed line.

We will now compare *f*_{opt} for different vaccination strategies.

*Random* (*R*) is the best-studied vaccination strategy. Assume that the underlying graph is a configuration model. If the degree distribution has a finite variance, then *f*_{c}^{R} can be obtained directly from Eq (3.5) in Ref. [10] by equating the reproduction number to its critical value 1. Specifically, we have:
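$$f_{c}^{R} = 1 - \frac{\mathbb{E}[D]}{p\,\mathbb{E}[D(D-1)]}, \tag{7}$$

where *D* is the degree of a uniformly chosen node and *p* is the probability that the infection is transmitted along an edge. This value is positive if a global outbreak occurs when no vaccination takes place. When the variance is infinite, as in our case with *γ* = 2.5, *f*_{c}^{R} = 1, so a global outbreak cannot be prevented by random vaccination.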
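The random-vaccination threshold can be evaluated directly from a degree sequence. The sketch below uses the standard bond-percolation form for the configuration model, *f*_{c}^{R} = 1 − E[*D*]/(*p* E[*D*(*D* − 1)]), where *D* is the degree of a uniformly chosen node and *p* the per-edge transmission probability; for an SIR model with infection rate *β* and unit recovery rate, *p* = *β*/(*β* + 1) is a common choice, but this mapping is an assumption here. The second part samples hypothetical power-law degree sequences with *γ* = 2.5: for finite *N* the estimate stays below 1, but because E[*D*²] diverges it typically drifts towards 1 as *N* grows.

```python
import random
from typing import List, Sequence


def critical_random_fraction(degrees: Sequence[int], p: float) -> float:
    """f_c^R = 1 - E[D] / (p E[D(D-1)]) for the configuration model, clipped to [0, 1]."""
    n = len(degrees)
    mean_d = sum(degrees) / n
    mean_dd1 = sum(d * (d - 1) for d in degrees) / n
    if mean_dd1 == 0:
        return 0.0  # no paths of length two: no global outbreak to prevent
    return min(1.0, max(0.0, 1.0 - mean_d / (p * mean_dd1)))


def powerlaw_degrees(n: int, gamma: float = 2.5, k_min: int = 2,
                     seed: int = 1) -> List[int]:
    """Hypothetical degree sequence: inverse-transform sample of a Pareto tail, rounded down."""
    rng = random.Random(seed)
    return [int(k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)))
            for _ in range(n)]


# Regular graph with degree 4: f_c^R = 1 - 4 / (p * 12).
print(critical_random_fraction([4] * 1000, p=0.5))
# Heavy-tailed sequences (gamma = 2.5): estimates typically grow towards 1 with N.
for n in (10**3, 10**4, 10**5):
    print(n, round(critical_random_fraction(powerlaw_degrees(n), p=0.5), 3))
```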

Applying Eq (6), we obtain that *f*_{opt}^{R} satisfies

$$\Delta'(f^{R}_{\mathrm{opt}}) = c_{\mathrm{vacc}}. \tag{8}$$
When *f*_{c}^{R} is large, one expects *f*_{opt}^{R} to be quite large for low *c*_{vacc} and to decrease as *c*_{vacc} becomes larger (see Fig 6). This is indeed the case in our case study in Fig 2. The relatively slow growth of Δ(*f*) also explains the modest gain in Fig 4.

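For the *Acquaintance* (*A*) strategy in the configuration model, *f*_{c}^{A} can be computed using Theorem 3.3 of Ref. [10], as long as the reproduction number in Eq (3.13) of Ref. [10] is smaller than one. The optimal fraction of vaccinated individuals *f*_{opt}^{A} satisfies

$$\Delta'(f^{A}_{\mathrm{opt}}) = c_{\mathrm{vacc}} + \frac{c_{\mathrm{info}}}{N}\left.\frac{\partial n(f, G)}{\partial f}\right|_{f = f^{A}_{\mathrm{opt}}}. \tag{9}$$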
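Compared to the *Random* strategy, the right-hand side has an extra positive term. Moreover, for the same epidemic on the same graph, it holds that *f*_{c}^{A} < *f*_{c}^{R} (except in degenerate cases, such as graphs in which all nodes have the same degree, where the two strategies coincide); see Ref. [10]. Hence, using Fig 6, we deduce that *f*_{opt}^{A} should be considerably smaller than *f*_{opt}^{R}. We see that this is indeed the case in Fig 2.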
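For reference, the acquaintance protocol itself, in the spirit of Cohen et al. [9], can be sketched as follows: repeatedly pick a uniformly random node and vaccinate a uniformly random neighbor of it. The graph is a plain adjacency dictionary; the rule that repeated picks of an already vaccinated node are simply absorbed, and the cap on the number of draws, are our own assumptions. On a star graph, almost every draw lands on the hub.

```python
import random
from typing import Dict, List, Set

Graph = Dict[int, List[int]]


def acquaintance_vaccination(graph: Graph, f: float, seed: int = 0) -> Set[int]:
    """Vaccinate round(f * N) distinct nodes, each a random neighbor of a random node."""
    rng = random.Random(seed)
    nodes = list(graph)
    target = round(f * len(nodes))
    vaccinated: Set[int] = set()
    draws = 0
    while len(vaccinated) < target and draws < 100 * len(nodes):  # cap is arbitrary
        draws += 1
        u = rng.choice(nodes)
        if graph[u]:  # isolated nodes have no acquaintance to name
            vaccinated.add(rng.choice(graph[u]))  # repeats are absorbed by the set
    return vaccinated


def star(n_leaves: int) -> Graph:
    """Star graph: hub 0 connected to leaves 1..n_leaves."""
    g: Graph = {0: list(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        g[leaf] = [0]
    return g


g = star(99)
hub_hits = sum(0 in acquaintance_vaccination(g, f=0.01, seed=s) for s in range(200))
print(f"hub vaccinated in {hub_hits} of 200 runs")
```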

The cost efficiency of the *Acquaintance* and *Random* strategies is harder to compare, because *Acquaintance* targets high-degree nodes while *Random* does not; on the other hand, *Random* has no information costs. To take extreme examples, *Random* will yield a higher gain on a regular graph, while *Acquaintance* will yield a higher gain on a star graph. In the case study in Fig 4, the gain for the *Acquaintance* strategy is similar to that of the *Random* strategy, while in other data sets *Acquaintance* outperforms the other protocols, especially when information costs are high (see Fig 4).
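The two extreme examples can be quantified by the expected degree of the node that a single draw hits: a uniformly random node for *Random*, and a random neighbor of a random node for *Acquaintance*. The sketch below computes both exactly for a small ring (a regular graph) and a star. It ignores information costs and outbreak sizes, so it only shows how strongly each strategy is biased toward hubs, and it assumes there are no isolated nodes.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]


def mean_degree_random(graph: Graph) -> float:
    """Expected degree of a uniformly chosen node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def mean_degree_acquaintance(graph: Graph) -> float:
    """Expected degree of a uniformly chosen neighbor of a uniformly chosen node."""
    total = 0.0
    for nbrs in graph.values():
        total += sum(len(graph[v]) for v in nbrs) / len(nbrs)
    return total / len(graph)


def ring(n: int) -> Graph:
    """Cycle on n >= 3 nodes: every node has degree 2."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def star(n_leaves: int) -> Graph:
    """Star graph: hub 0 connected to leaves 1..n_leaves."""
    g: Graph = {0: list(range(1, n_leaves + 1))}
    g.update({leaf: [0] for leaf in range(1, n_leaves + 1)})
    return g


for name, g in (("ring", ring(100)), ("star", star(99))):
    print(name, mean_degree_random(g), mean_degree_acquaintance(g))
```

On the ring both kinds of draw hit degree-2 nodes, so *Acquaintance* only adds information costs; on the 100-node star an acquaintance draw hits a node of expected degree 98.02, against 1.98 for a uniformly random draw.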

The *Degree* (*D*), *Coreness* (*C*) and *CI* strategies should be the most effective in the configuration model, because they target the nodes with the highest potential for spreading the infection. A formula for the average outbreak size in the configuration model when nodes of degree *s* are removed with a given probability is given in Ref. [11], but these results do not directly apply when a fraction *f* of the highest-degree nodes is removed.
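A minimal sketch of the *Degree* protocol is shown below, together with the size of the largest connected component left after removing the vaccinated nodes (a common proxy for outbreak size). Breaking ties by node label is our arbitrary choice, and the two-star test graph is hypothetical. Note that ranking by degree needs the degree of every node, regardless of *f*.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, List[int]]


def degree_vaccination(graph: Graph, f: float) -> Set[int]:
    """Vaccinate the round(f * N) highest-degree nodes (ties broken by node label)."""
    target = round(f * len(graph))
    ranked = sorted(graph, key=lambda u: (-len(graph[u]), u))
    return set(ranked[:target])


def largest_component_after_removal(graph: Graph, removed: Set[int]) -> int:
    """Size of the largest connected component once `removed` nodes are deleted (BFS)."""
    seen: Set[int] = set(removed)
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


# Two stars (hubs 0 and 10) joined hub-to-hub: 20 nodes in total.
g: Graph = {0: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            10: [0, 11, 12, 13, 14, 15, 16, 17, 18, 19]}
g.update({leaf: [0] for leaf in range(1, 10)})
g.update({leaf: [10] for leaf in range(11, 20)})
hubs = degree_vaccination(g, f=0.1)
print(hubs, largest_component_after_removal(g, hubs))
```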

The fraction *f*_{opt} for these strategies satisfies the same Eq (8) as *f*_{opt}^{R}, that is, Δ′(*f*) = *c*_{vacc}. A comparison between *f*_{opt}^{D} and *f*_{opt}^{A} may go both ways, as is easily illustrated by Fig 6. On the one hand, one expects that *f*_{c}^{D} < *f*_{c}^{A}, because both strategies target high-degree nodes, but only *Degree* identifies them precisely, while *Acquaintance* is just a heuristic. On the other hand, Δ′(*f*_{opt}) is smaller for the *Degree* than for the *Acquaintance* strategy. The same argument applies to *Coreness* and *CI*; however, these protocols do not target nodes of large degree *per se*, so, depending on the network, *f*_{c}^{C} and *f*_{c}^{CI} might be smaller or larger than *f*_{c}^{A}.
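That the comparison can go both ways can be illustrated with the hypothetical toy form Δ(*f*) = Δ_max[1 − (1 − *f*/*f*_{c})²] for *f* < *f*_{c}. Setting Δ′(*f*) = *r*, with *r* the right-hand side of Eq (6), gives *f*_{opt} = *f*_{c}(1 − *r f*_{c}/(2Δ_max)), clipped at 0. The numbers below are invented: *Degree* gets the smaller *f*_{c} = 0.2 and *r* = *c*_{vacc}, while *Acquaintance* gets *f*_{c} = 0.3 and *r* = *c*_{vacc} + *c*_{info}, under the assumption of one query per vaccinated individual.

```python
def toy_f_opt(f_c: float, rhs: float, delta_max: float = 0.5) -> float:
    """Solve Delta'(f) = rhs for the toy Delta(f) = delta_max * (1 - (1 - f / f_c) ** 2)."""
    return max(0.0, f_c * (1.0 - rhs * f_c / (2.0 * delta_max)))


c_vacc = 0.5
for c_info in (0.0, 1.5):
    f_degree = toy_f_opt(f_c=0.2, rhs=c_vacc)                 # no f-dependent query cost
    f_acquaintance = toy_f_opt(f_c=0.3, rhs=c_vacc + c_info)  # one query per vaccinee
    print(f"c_info = {c_info}: f_opt^D = {f_degree:.3f}, f_opt^A = {f_acquaintance:.3f}")
```

With free information, *Acquaintance* vaccinates more than *Degree* (0.255 versus 0.180); with *c*_{info} = 1.5 the order flips (0.120 versus 0.180). The same toy also shows both rules of thumb: a smaller *f*_{c} caps *f*_{opt}, and a larger right-hand side of Eq (6) pushes it down.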

In the case study in Fig 2 we obtain but . The very large values of *f*_{opt}, especially for *Coreness*, in Fig 2 signal that these strategies are in fact inefficient for the *Colorado Springs* case study. In Fig 4, for the same case study, we observe that *Degree*, *Coreness* and *CI* have very small gains. The efficiency of *CI* on the configuration model (Fig 5) and on the *Prostitution* data set (Fig 4), for similar parameter values, is an interesting finding that deserves further research. A possible explanation is the small number of triangles, a feature that the *Prostitution* data set and the configuration model share.

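Finally, consider the *Random-walk* (*RW*) and *TSH* strategies. Since these protocols target nodes with large degrees but do not identify them precisely, one expects the *Degree* protocol to be more effective in preventing a global outbreak, but not by much. Therefore, *f*_{c}^{RW} and *f*_{c}^{TSH} should be slightly larger than *f*_{c}^{D}. The optimal value *f*_{opt} satisfies

$$\Delta'(f_{\mathrm{opt}}) = c_{\mathrm{vacc}} + \frac{c_{\mathrm{info}}}{N}\left.\frac{\partial n(f, G)}{\partial f}\right|_{f = f_{\mathrm{opt}}}. \tag{10}$$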
For large enough *N*, we expect the last term above to be small, so Δ′(*f*_{opt}) is close to its value for the *Degree* protocol. Invoking Fig 6, we expect *f*_{opt}^{RW} and *f*_{opt}^{TSH} to be close to *f*_{opt}^{D}, especially when *c*_{info} is low, and to decrease as *c*_{info} increases. These are exactly the results in Fig 2. Because *Degree* requires information about the entire network, the net gain of *Random walk* and *TSH* should be considerably higher than that of the *Degree* strategy when *c*_{info} is large enough. Indeed, we observe that the *Degree* strategy never yields the largest gain.
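A simplified random-walk search for high-degree nodes, loosely in the spirit of Refs. [16, 17], can be sketched as follows. A walk with occasional uniform restarts visits nodes roughly in proportion to their degree, and each step is counted as one query that reveals the visited node's degree; the round(*fN*) highest-degree nodes seen are then vaccinated. The restart probability, the query accounting and the ranking rule are assumptions for illustration, not the exact *Random walk* protocol evaluated in this paper.

```python
import random
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]


def random_walk_vaccination(graph: Graph, f: float, steps: int,
                            restart: float = 0.15,
                            seed: int = 0) -> Tuple[Set[int], int]:
    """Return the vaccinated set and the number of queries (one per step)."""
    rng = random.Random(seed)
    nodes = list(graph)
    target = round(f * len(nodes))
    observed: Dict[int, int] = {}  # node -> degree revealed by querying it
    u = rng.choice(nodes)
    for _ in range(steps):
        observed[u] = len(graph[u])
        if rng.random() < restart or not graph[u]:
            u = rng.choice(nodes)     # jump to a uniformly random node
        else:
            u = rng.choice(graph[u])  # move to a uniformly random neighbor
    # May return fewer than `target` nodes if the walk saw too few distinct nodes.
    ranked = sorted(observed, key=lambda v: (-observed[v], v))
    return set(ranked[:target]), steps


star: Graph = {0: list(range(1, 100))}
star.update({leaf: [0] for leaf in range(1, 100)})
chosen, queries = random_walk_vaccination(star, f=0.01, steps=50, seed=7)
print(chosen, queries)
```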

Comparing *Random walk* and *TSH* to the *Acquaintance* strategy is trickier, since the latter also targets high-degree nodes, but at lower cost. On the other hand, *Random walk* and *TSH* target such nodes more accurately. The comparison among the three randomized strategies (*Acquaintance*, *Random walk*, and *TSH*) thus depends on the interplay between accurate targeting and information costs. This explains why *Acquaintance* sometimes performs better than *Random walk* and *TSH*.

## Discussion

We have discussed how to make theoretical studies of targeted vaccination more practically useful for decision makers. Instead of evaluating vaccination protocols under some assumption about what is known about the network, we frame their evaluation as a cost-benefit problem. From this starting point, we have evaluated the cost efficiency of seven network-based vaccination methods. There is no universally best method. Rather, depending on the network structure and the relative vaccination and information costs, the best method (at least for the networks and parameters we explore) seems to be one of four: *Acquaintance*, *TSH*, *CI* and *Random walk*. We make this point using both analytical calculations and simulations.

*Acquaintance* vaccination is almost always the most efficient for low *c*_{vacc} and large *c*_{info}. After *Random*, it is the protocol that uses the least network information. For very high *c*_{info}, *Random* will trivially be the most efficient (keep in mind that *c*_{info} can, in principle, be larger than one), but we never observe this. *TSH* dominates the region of large *c*_{vacc} and low *c*_{info} for denser networks (for very sparse networks, *CI* can also be the most efficient). *Random walk* dominates intermediate values of *c*_{vacc} and *c*_{info}, something we find hard to rationalize and leave to future investigations.

*CI* performs well for very sparse networks with few triangles, especially in the region of large *c*_{vacc} and low *c*_{info}. *Degree* is never the most efficient, meaning that vaccinating exactly in order of degree is not so important that it is worth obtaining all the network information. Furthermore, *Coreness* is also never the most efficient, which supports Refs. [39] and [23] (but disagrees with Ref. [21]).

In practical applications, one would in principle need to know the parameters, both for the SIR model and for calculating the cost [4]. For sexually transmitted diseases, for example, this is not impossible. If one were to base a pilot HIV pre-exposure prophylaxis campaign on mapping a sexual network like that of Ref. [28] (which, in addition to the network itself, could give the contact rates), then one could assume a per-contact transmission probability of 1–2% [40]. Furthermore, the societal cost of a positive HIV case is well understood [41]. With these parameters at hand, it should be possible to narrow the candidate protocols down to one or two.
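To connect a per-contact transmission probability such as the 1–2% quoted above to the per-edge transmission probability of a network SIR model, one simple assumption is that the *k* contacts within a partnership transmit independently, so that the probability of at least one transmission is 1 − (1 − *p*)^*k*. The contact counts below are hypothetical.

```python
def per_partnership_transmission(p_contact: float, n_contacts: int) -> float:
    """Probability of at least one transmission in n_contacts independent contacts."""
    return 1.0 - (1.0 - p_contact) ** n_contacts


for p_contact in (0.01, 0.02):
    for n_contacts in (1, 10, 50):
        t = per_partnership_transmission(p_contact, n_contacts)
        print(f"p = {p_contact:.2f}, k = {n_contacts:>2}: T = {t:.3f}")
```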

To proceed towards increased realism and applicability, one would also need to take social mechanisms into account. Parallel to the targeted immunization problem, there is an emerging field studying vaccination as a social-psychological problem. One issue is that, for voluntary vaccination, it is irrational to become vaccinated if almost everyone else is vaccinated (the disease would not spread anyway, and there are side effects and discomfort associated with being vaccinated). Conversely, it is irrational not to vaccinate if almost nobody is vaccinated, leading to a typical game-theoretical dilemma [42]. Another line of work in this direction discusses how awareness of a spreading disease affects contact networks, and subsequently the spreading dynamics [43], or how vaccination and awareness diffusion can create synergistic effects [44]. Other papers study how social influence affects parents' decisions to vaccinate their children (e.g. Ref. [45]). To make theoretical vaccination studies fully realistic and most useful to decision makers, one would need to combine such social aspects with the cost-benefit approach of this paper.

## Supporting information

### S1 Fig. The most cost efficient vaccination strategies for empirical networks as a function of the costs of information retrieval and vaccination.

The figure corresponding to Fig 4, but for *β* = 1/2.

https://doi.org/10.1371/journal.pcbi.1005696.s001

(PDF)

### S2 Fig. The most cost efficient vaccination strategies for empirical networks as a function of the costs of information retrieval and vaccination.

The figure corresponding to Fig 4, but for *β* = 1.

https://doi.org/10.1371/journal.pcbi.1005696.s002

(PDF)

### S3 Fig. The most cost efficient vaccination strategies for empirical networks as a function of the costs of information retrieval and vaccination.

The figure corresponding to Fig 4, but for *β* = 4.

https://doi.org/10.1371/journal.pcbi.1005696.s003

(PDF)

### S4 Fig. The most cost efficient vaccination strategies for empirical networks as a function of the costs of information retrieval and vaccination.

The figure corresponding to Fig 4, but for *β* = 8.

https://doi.org/10.1371/journal.pcbi.1005696.s004

(PDF)

### S5 Fig. The most cost efficient vaccination strategies for the configuration model with a power-law degree distribution as a function of the costs of information retrieval and vaccination.

The figure corresponding to Fig 5, but for *β* = 1/2.

https://doi.org/10.1371/journal.pcbi.1005696.s005

(PDF)

### S6 Fig. The most cost efficient vaccination strategies for the configuration model with a power-law degree distribution as a function of the costs of information retrieval and vaccination.

The figure corresponding to Fig 5, but for *β* = 1.

https://doi.org/10.1371/journal.pcbi.1005696.s006

(PDF)

### S7 Fig. The most cost efficient vaccination strategies for the configuration model with a power-law degree distribution as a function of the costs of information retrieval and vaccination.

The figure corresponding to Fig 5, but for *β* = 4.

https://doi.org/10.1371/journal.pcbi.1005696.s007

(PDF)

### S8 Fig. The most cost efficient vaccination strategies for the configuration model with a power-law degree distribution as a function of the costs of information retrieval and vaccination.

The figure corresponding to Fig 5, but for *β* = 8.

https://doi.org/10.1371/journal.pcbi.1005696.s008

(PDF)

## References

- 1. Keeling MJ, Eames KT. Networks and epidemic models. J Royal Soc Interface. 2005;2(4):295–307.
- 2. Pastor-Satorras R, Castellano C, Van Mieghem P, Vespignani A. Epidemic processes in complex networks. Rev Mod Phys. 2015;87:925–979.
- 3. Lü L, Chen D, Ren XL, Zhang QM, Zhang YC, Zhou T. Vital nodes identification in complex networks. Phys Rep. 2016;650:1–63.
- 4. Wang Z, Bauch CT, Bhattacharyya S, d’Onofrio A, Manfredi P, Perc M, et al. Statistical physics of vaccination. Phys Rep. 2016;664:1–113.
- 5. Giesecke J. Modern Infectious Disease Epidemiology. 2nd ed. London: Arnold; 2002.
- 6. Strassburg MA. The global eradication of smallpox. Am J Infect Control;10:53–59. pmid:7044193
- 7. Liljeros F, Edling CR, Amaral LAN. Sexual networks: Implication for the transmission of sexually transmitted infection. Microbes Infect. 2003;5:189–196. pmid:12650777
- 8. Bajardi P, Barrat A, Natale F, Savini L, Colizza V. Dynamical patterns of cattle trade movements. PLOS ONE. 2011;6:e19869. pmid:21625633
- 9. Cohen R, Havlin S, ben Avraham D. Efficient Immunization Strategies for Computer Networks and Populations. Phys Rev Lett. 2003;91:247901. pmid:14683159
- 10. Britton T, Janson S, Martin-Löf A. Graphs with specified degree distributions, simple epidemics, and local vaccination strategies. Adv Appl Probab. 2007;39(4):922–948.
- 11. Lelarge M. Efficient control of epidemics over random networks. ACM SIGMETRICS Performance Evaluation Review. 2009;37(1):1–12.
- 12. Ventresca M, Aleman D. Evaluation of strategies to mitigate contagion spread using social network characteristics. Social Networks. 2013;35(1):75–88.
- 13. Ball F, Sirl D. Acquaintance vaccination in an epidemic on a random graph with specified degree distribution. J Appl Probab. 2013;50(4):1147–1168.
- 14. Deijfen M. Epidemics and vaccination on weighted graphs. Math Biosci. 2011;232(1):57–65. pmid:21536052
- 15. Wang B, Suzuki H, Aihara K. Evaluating Roles of Nodes in Optimal Allocation of Vaccines with Economic Considerations. PLOS ONE. 2013;8(8):1–9.
- 16. Avrachenkov K, Litvak N, Sokol M, Towsley D. Quick Detection of Nodes with Large Degrees. In: Bonato A, Janssen J, editors. Algorithms and Models for the Web Graph: 9th International Workshop, WAW 2012, Halifax, NS, Canada, June 22-23, 2012. Proceedings. Berlin, Heidelberg: Springer; 2012. p. 54–65.
- 17. Avrachenkov K, Litvak N, Prokhorenkova LO, Suyargulova E. Quick Detection of High-Degree Entities in Large Directed Networks. In: Proceedings of the 2014 IEEE International Conference on Data Mining. ICDM’14. Washington, DC, USA: IEEE Computer Society; 2014. p. 20–29.
- 18. Holme P. Model versions and fast algorithms for network epidemiology. Journal of Logistical Engineering University. 2014;5:51–56.
- 19. Pastor-Satorras R, Vespignani A. Immunization of complex networks. Phys Rev E. 2002;65:036104.
- 20. Yorke JA, Hethcote HW, Nold A. Dynamics and control of the transmission of Gonorrhea. Sex Transm Dis. 1978;5:51–56. pmid:10328031
- 21. Kitsak M, Gallos LK, Havlin S, Liljeros F, Muchnik L, Stanley HE, et al. Identification of influential spreaders in complex networks. Nature Phys. 2010;6:888–893.
- 22. Hébert-Dufresne L, Grochow JA, Allard A. Multi-scale structure and topological anomaly detection via a new network statistic: The onion decomposition. Sci Rep. 2016;6:31708. pmid:27535466
- 23. Morone F, Makse HA. Influence maximization in complex networks through optimal percolation. Nature. 2015;524:65–68. pmid:26131931
- 24. Villani A, Frigessi A, Liljeros F, Nordvik MK, de Blasio BF. A Characterization of Internet dating network structures among Nordic men who have sex with men. PLoS ONE. 2012;7(7):1–8.
- 25. van der Hofstad R. Random Graphs and Complex Networks; 2016.
- 26. Auerbach DM, Darrow WW, Jaffe HW, Curran JW. Cluster of cases of the acquired immune deficiency syndrome: Patients linked by sexual contact. Am J Med. 1984;76(3):487–492. pmid:6608269
- 27. Klovdahl AS, Potterat JJ, Woodhouse DE, Muth JB, Muth SQ, Darrow WW. Social networks and infectious disease: The Colorado Springs study. Social Science & Medicine. 1994;38(1):79–88.
- 28. Haraldsdottir S, Gupta S, Anderson RM. Preliminary studies of sexual networks in a male homosexual community in Iceland. J Acquir Immune Defic Syndr. 1992;5(4):374–381. pmid:1548573
- 29. Rocha LEC, Liljeros F, Holme P. Information dynamics shape the sexual networks of Internet-mediated prostitution. Proc Natl Acad Sci USA. 2010;107:5706–5711. pmid:20231480
- 30. Isella L, Stehlé J, Barrat A, Cattuto C, Pinton JF, van den Broeck W. What’s in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol. 2011;271:166–180. pmid:21130777
- 31. Vanhems P, Barrat A, Cattuto C, Pinton JF, Khanafer N, Régis C, et al. Estimating potential infection transmission routes in hospital wards using wearable proximity sensors. PLoS ONE. 2013;8:e73970. pmid:24040129
- 32. Stehlé J, Voirin N, Barrat A, Cattuto C, Isella L, Pinton JF, et al. High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE. 2011;6:e23176. pmid:21858018
- 33. Masuda N, Holme P. Predicting and controlling infectious disease epidemics using temporal networks. F1000Prime Rep. 2013;5:6. pmid:23513178
- 34. Lee S, Rocha LEC, Liljeros F, Holme P. Exploiting temporal network structures of human interaction to effectively immunize populations. PLoS ONE. 2012;44:e36439.
- 35. Starnini M, Machens A, Cattuto C, Barrat A, Pastor-Satorras R. Immunization strategies for epidemic processes in time-varying contact networks. J Theor Biol. 2013;337:89–100. pmid:23871715
- 36. Bayati M, Kim JH, Saberi A. A sequential algorithm for generating random graphs. Algorithmica. 2010;58:860–910.
- 37. Holme P. Temporal network structures controlling disease spreading. Phys Rev E. 2016;64:022305.
- 38. Kempe D, Kleinberg J, Tardos E. Maximizing the spread of influence in a social network. Proc 9th Intl Conf on Knowledge Discovery and Data Mining. 2003; p. 137–146.
- 39. Holme P. Epidemiologically optimal static networks from temporal network data. PLoS Comput Biol. 2013;9:e1003142. pmid:23874184
- 40. Wilton J. Putting a number on it: The risk from an exposure to HIV; 2012. Available from: http://www.catie.ca/en/pif/summer-2012/putting-number-it-risk-exposure-hiv.
- 41. Hutchinson AB, Farnham PG, Dean HD, Ekwueme DU, Del Rio C, Kamimoto L, et al. The economic burden of HIV in the United States in the era of highly active antiretroviral therapy: evidence of continuing racial and ethnic differences. J Acquir Immune Defic Syndr. 2006;43(4):451–457. pmid:16980906
- 42. Wang Z, Andrews MA, Wu ZX, Wang L, Bauch CT. Coupled disease–behavior dynamics on complex networks: A review. Physics of Life Reviews. 2015;15:1–29.
- 43. Funk S, Gilad E, Watkins C, Jansen VAA. The spread of awareness and its impact on epidemic outbreaks. Proc Natl Acad Sci USA. 2009;106(16):6872–6877. pmid:19332788
- 44. Shaw LB, Schwartz IB. Enhanced vaccine control of epidemics in adaptive networks. Phys Rev E. 2010;81:046120.
- 45. Brunson EK. The impact of social networks on parents’ vaccination decisions. Pediatrics. 2013;131(5):e1397–e1404. pmid:23589813