Conceived and designed the experiments: MS JHJ. Performed the experiments: MS. Analyzed the data: MS. Wrote the paper: MS.

The authors have declared that no competing interests exist.

The dynamics of infectious diseases spread via direct person-to-person transmission (such as influenza, smallpox, HIV/AIDS, etc.) depends on the underlying host contact network. Human contact networks exhibit strong community structure. Understanding how such community structure affects epidemics may provide insights for preventing the spread of disease between communities by changing the structure of the contact network through pharmaceutical or non-pharmaceutical interventions. We use empirical and simulated networks to investigate the spread of disease in networks with community structure. We find that community structure has a major impact on disease dynamics, and we show that in networks with strong community structure, immunization interventions targeted at individuals bridging communities are more effective than those simply targeting highly connected individuals. Because the structure of relevant contact networks is generally not known, and vaccine supply is often limited, there is great need for efficient vaccination algorithms that do not require full knowledge of the network. We developed an algorithm that acts only on locally available network information and is able to quickly identify targets for successful immunization intervention. The algorithm generally outperforms existing algorithms when vaccine supply is limited, particularly in networks with strong community structure. Understanding the spread of infectious diseases and designing optimal control strategies is a major goal of public health. Social networks show marked patterns of community structure, and our results, based on empirical and simulated data, demonstrate that community structure strongly affects disease dynamics. These results have implications for the design of control strategies.

Understanding the spread of infectious diseases in populations is key to controlling them. Computational simulations of epidemics provide a valuable tool for the study of the dynamics of epidemics. In such simulations, populations are represented by networks, where hosts and the interactions among them are represented by nodes and edges, respectively. In the past few years, it has become clear that many human social networks have a remarkable property: they exhibit strong community structure. A network with strong community structure consists of smaller sub-networks (the communities) that have many connections within them, but only a few between them. Here we use both data from social networking websites and computer-generated networks to study the effect of community structure on epidemic spread. We find that community structure not only affects the dynamics of epidemics in networks, but also has implications for how networks can be protected from large-scale epidemics.

Mitigating or preventing the spread of infectious diseases is the ultimate goal of infectious disease epidemiology, and understanding the dynamics of epidemics is an important tool to achieve this goal. A rich body of research has shown that the basic reproductive number R_{0}, a quantity central to developing intervention measures or immunization programs, depends crucially on the variance of the distribution of contacts. Contact networks with a high variance in the number of contacts result in a higher R_{0} than one would expect from contact networks with a uniform degree distribution, and the existence of highly connected individuals makes them an ideal target for control measures.

While degree distributions have been studied extensively to understand their effect on epidemic dynamics, the community structure of networks has generally been ignored. Despite the demonstration that social networks show significant community structure

In this article, we aim to understand how community structure affects epidemic dynamics and the control of infectious disease. Community structure exists when connections between members of a group of nodes are denser than connections between members of different groups of nodes.

Using both simulated and empirical social networks, we show how community structure affects the spread of diseases in networks, and specifically that these effects cannot be accounted for by the degree distribution alone. The main goal of this study is to demonstrate how community structure affects epidemic dynamics, and what strategies are best applied to control epidemics in networks with community structure.

We computationally generate networks with community structure by creating small subnetworks of locally dense communities, which are then randomly connected to one another. A particular feature of such networks is that the variance of their degree distribution is relatively low, and thus the spread of a disease is only marginally affected by it.

Panels show the effect of community structure on (a) final size, (b) duration, and (c) peak prevalence (i.e. maximum frequency of population infected). Each point represents the average of at most 2000 simulation runs (only simulations with a final size of at least 2% of the population were included in calculating the averages). Error bars are omitted because the ranges are smaller than the plotting points. The different colors represent different transmission rates, corresponding to three different values of R_{0}.

Epidemics in populations with community structure show a distinct dynamical pattern depending on the extent of community structure. In networks with strong community structure, an infected individual is more likely to infect members of the same community than members outside of the community. Thus, in a network with strong community structure, local outbreaks may die out before spreading to other communities, or they may spread through various communities in an almost serial fashion, and large epidemics in populations with strong community structure may therefore last for a long time. Correspondingly, the incidence rate can be very low, and the number of generations of infection transmission can be very high, compared to the explosive epidemics in populations with less community structure.

(a) and (b): Typical incidence curves of disease outbreaks in a network with medium community structure ((a): _{0}≈

In order to halt or mitigate an epidemic, targeted immunization interventions or social distancing interventions aim to change the structure of the network of susceptible individuals in such a way as to make it harder for a pathogen to spread.

(a) The correlation coefficient ^{2}

Color code denotes the difference in the average final size S_{m} of disease outbreaks in networks immunized using method m: S_{degree} − S_{betweenness}, S_{degree} − S_{randomwalk}, and S_{betweenness} − S_{randomwalk}.

To test the efficiency of targeted immunization strategies on real networks, we used interaction data of individuals at five different universities in the US taken from a social network website.

The bars show the difference in the average final size S_{m} of disease outbreaks in networks immunized using method m: S_{degree} − S_{randomwalk}, S_{acquaintance1} − S_{CBF}, and S_{acquaintance2} − S_{CBF}.

In practice, identifying immunization targets may be impossible using such algorithms, because the structure of the contact network relevant for the spread of a directly transmissible disease is generally not known. Thus, algorithms that are agnostic about the full network structure are necessary to identify target individuals. The only algorithm we are aware of that is completely agnostic about the network structure identifies target nodes by picking a random contact of a randomly chosen individual.

It is important to note a crucial difference between algorithms such as CBF (henceforth called stochastic algorithms) and algorithms such as those that calculate, for example, the betweenness centrality of nodes (henceforth called deterministic algorithms). A deterministic algorithm always needs the complete information about each node (i.e. either the number or the identity of all connected nodes).

In the computationally generated networks, CBF outperformed the acquaintance method in large areas of the parameter space.

In empirical networks, CBF did particularly well on the network with the strongest community structure (Oklahoma), especially in comparison to the similarly effective acquaintance method with

The speed of an algorithm is assessed by counting the nodes that have to be visited by the algorithm until the desired vaccination coverage is achieved. Each visit is counted, even if the same node has been visited before. The bars show the difference in node visits (Δ visits) between the acquaintance2 method and the CBF method. Red bars mean that the CBF method has visited fewer nodes; the difference is given by the height of the bar. A black bar indicates that the acquaintance2 method has visited fewer nodes. With the exception of a vaccination coverage of 30% in the North Carolina network, the CBF method is always faster. (Data for the speed comparison between acquaintance1 and CBF are not shown; the acquaintance1 method is always faster, but significantly less effective, see middle column in

A great number of infectious diseases of humans spread directly from one person to another, and early work on the spread of such diseases has been based on the assumption that every infected individual is equally likely to transmit the disease to any susceptible individual in a population. One of the most important consequences of incorporating network structure into epidemic models was the demonstration that heterogeneity in the number of contacts (degree) can strongly affect how R_{0} is calculated.

An important caveat to mention is that community structure in the sense used throughout this paper (i.e. measured as modularity

Identifying important nodes to affect diffusion on networks is a key question in network theory that pertains to a wide range of fields and is not limited to infectious disease dynamics. There are, however, two major issues associated with this problem: (i) the structure of networks is often not known, and (ii) many networks are too large to compute, for example, centrality measures efficiently. Stochastic algorithms like the proposed CBF algorithm or the acquaintance method address both problems at once. To what extent targeted immunization strategies can be implemented in an infectious disease/public health setting, given practical and ethical considerations, remains an open question. This is true not only for the strategy based on the CBF algorithm, but for most strategies that are based on network properties. As mentioned above, the contact networks relevant for the spread of infectious diseases are generally not known. Stochastic algorithms such as the CBF or the acquaintance method are at least in principle applicable when data on network structure are lacking.

Community structure in host networks is not limited to human networks: animal populations are often divided into subpopulations, connected by limited migration only.

To investigate the spread of an infectious disease on a contact network, we use the following methodology: Individuals in a population are represented as nodes in a network, and the edges between the nodes represent the contacts along which an infection can spread. Contact networks are abstracted by undirected, unweighted graphs (i.e. all contacts are reciprocal, and all contacts transmit an infection with the same probability). Edges always link two distinct nodes (i.e. no self loops), and there is at most one edge between any pair of nodes (i.e. no parallel edges). Each node can be in one of three possible states: (S)usceptible, (I)nfected, or (R)esistant/immune (as in standard SIR models). Initially, all nodes are susceptible.

Simulations with immunization strategies implement those strategies before the first infection occurs. Targeted nodes are chosen according to a given immunization algorithm (see below) until a desired immunization coverage of the population is achieved, and then their state is set to resistant.

After this initial set-up, a random susceptible node is chosen as patient zero, and its state is set to infected. Then, during a number of time steps, the initial infection can spread through the network, and the simulation is halted once there are no further infected nodes. The unit of time we use is one day. The basic reproductive number scales as R_{0} ∼ T(<k^{2}>/<k> − 1), where T is the transmissibility and <k> and <k^{2}> are the mean degree and the mean squared degree of the network (R_{0} ≈ 3).
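As a small worked example of the scaling R_{0} ∼ T(<k^{2}>/<k> − 1), the following sketch evaluates the expression for a given degree sequence. The function name `r0_estimate`, the example degree sequences, and the value T = 0.5 are illustrative assumptions, not values taken from the simulations.

```python
def r0_estimate(degrees, transmissibility):
    """Evaluate T * (<k^2>/<k> - 1) for a degree sequence (illustrative helper)."""
    n = len(degrees)
    mean_k = sum(degrees) / n                   # <k>
    mean_k2 = sum(k * k for k in degrees) / n   # <k^2>
    return transmissibility * (mean_k2 / mean_k - 1)


# Every node has 8 contacts (zero variance), assumed T = 0.5:
print(r0_estimate([8] * 100, 0.5))              # 3.5
# Same mean degree, but higher variance in the number of contacts:
print(r0_estimate([4] * 50 + [12] * 50, 0.5))   # 4.5
```

Both sequences have a mean degree of 8; only the variance differs, which is what raises the second estimate.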

After a simulation, we record the total number of cases infected (the epidemic size), the maximum frequency of infection at any point during the simulation (the peak prevalence), and the number of days that have passed between the first infected case and the simulation stop (the duration of the epidemic).
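The simulation procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used for the results: the per-day transmission probability `beta`, the per-day recovery probability `gamma`, and all function and variable names are hypothetical.

```python
import random


def simulate_sir(adj, beta, gamma, immunized=(), rng=None):
    """Discrete-time SIR on an undirected graph given as {node: set(neighbors)}.

    beta:  assumed per-day probability that an infected neighbor transmits.
    gamma: assumed per-day probability that an infected node recovers (must be > 0).
    Returns (epidemic_size, peak_prevalence, duration_in_days).
    """
    rng = rng or random.Random()
    state = {v: "S" for v in adj}
    for v in immunized:                  # immunization precedes the first infection
        state[v] = "R"
    susceptible = [v for v in adj if state[v] == "S"]
    if not susceptible:
        return 0, 0.0, 0
    patient_zero = rng.choice(susceptible)
    state[patient_zero] = "I"
    infected = {patient_zero}
    cases, peak, days = 1, 1 / len(adj), 0
    while infected:                      # halt once no infected nodes remain
        days += 1
        newly_infected = set()
        for v in infected:
            for w in adj[v]:
                if state[w] == "S" and rng.random() < beta:
                    newly_infected.add(w)
        recovered = {v for v in infected if rng.random() < gamma}
        for w in newly_infected:
            state[w] = "I"
        for v in recovered:
            state[v] = "R"
        infected = (infected | newly_infected) - recovered
        cases += len(newly_infected)
        peak = max(peak, len(infected) / len(adj))
    return cases, peak, days
```

The three returned values correspond to the epidemic size, the peak prevalence, and the duration recorded after each simulation.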

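In order to understand the effect of community structure, we generated networks with 2000 nodes from scratch with varying degrees of community structure. The strength of community structure is generally measured as network modularity Q, defined as Q = Σ_{i} (e_{ii} − a_{i}^{2}), where e_{ij} is the fraction of edges that connect nodes in community i to nodes in community j, and a_{i} = Σ_{j} e_{ij}. In a network without community structure, e_{ij} = a_{i} a_{j}, and Q = 0.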
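To make the modularity measure concrete, the sketch below computes the widely used Newman-Girvan form, Q = Σ_{i} (e_{ii} − a_{i}^{2}), from an edge list and a known assignment of nodes to communities. That this is exactly the form used here is an assumption, and the function name `modularity` and its input format are illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) for an undirected edge list.

    edges:     list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    within = defaultdict(int)   # edges with both ends in community c
    ends = defaultdict(int)     # edge ends attached to community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] += 1
        ends[cv] += 1
        if cu == cv:
            within[cu] += 1
    # e_cc = within[c] / m, and a_c = ends[c] / (2m)
    return sum(within[c] / m - (ends[c] / (2 * m)) ** 2 for c in ends)


# Two triangles joined by a single edge, one community per triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, labels)
```

In this example Q equals 5/14; placing all six nodes in a single community gives Q = 0.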

To generate networks with community structure, we initialize a network by creating 50 small-world communities (as found in various social networks, see e.g. ref.

The quantity ^{2}^{2}_{0}_{0}

We used the network data collected on the social network website Facebook (

Thus, in order to obtain contact network data that are relevant for the spread of infectious diseases transmitted directly from person to person by the respiratory or close-contact route, we make the following assumption: individuals who have a friendship relation in the network, and who either (a) have the same dormitory residence or (b) major in the same field and are in the same class year, are likely to be in close enough physical contact on a regular basis to be able to transmit an infection to each other. Using the raw friendship data and the available information on dormitory residence, major, and class year, we extract the subgraph that reflects this assumption. Having extracted the subgraph, we remove all nodes that are not part of the largest connected component (i.e. small subgraphs that are not part of the larger network). The networks thus reduce to the following contact networks:

Caltech (620 nodes and 7,255 edges,

Princeton (5,112 nodes and 28,684 edges,

Georgetown (7,651 nodes and 79,799 edges,

Oklahoma (10,386 nodes and 163,225 edges,

North Carolina (13,081 nodes, 88,266 edges,
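The extraction of a contact network from the friendship data can be sketched as follows. The input format, the attribute keys `dorm`, `major`, and `year`, and the treatment of `None` as missing information are all assumptions made for illustration; the actual data format is not specified here.

```python
from collections import deque


def contact_network(friendships, attrs):
    """Keep friendship edges whose endpoints share a dorm, or share major and class year.

    friendships: iterable of (u, v) pairs.
    attrs: {node: {"dorm": ..., "major": ..., "year": ...}} (hypothetical field names).
    Returns the adjacency dict of the largest connected component.
    """
    adj = {}
    for u, v in friendships:
        a, b = attrs[u], attrs[v]
        same_dorm = a["dorm"] is not None and a["dorm"] == b["dorm"]
        same_class = (a["major"] is not None and a["major"] == b["major"]
                      and a["year"] is not None and a["year"] == b["year"])
        if u != v and (same_dorm or same_class):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    best, seen = set(), set()
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:                      # breadth-first search from start
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return {v: adj[v] & best for v in best}
```

Nodes without any qualifying edge never enter the adjacency dict, so they are dropped together with the smaller components.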

We note that the modularity

The algorithms used to identify nodes can be divided into two classes: deterministic and stochastic algorithms. Deterministic algorithms require the complete information about each node (i.e. either the number or the identity of all connected nodes).

We identify target nodes by ranking nodes according to one of the following three criteria: degree, betweenness centrality, and random-walk centrality.

The degree of a node simply denotes the number of edges incident to a node.

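The betweenness centrality C_{B}(i) of a node i is defined as C_{B}(i) = Σ_{s≠i≠t} σ_{st}(i)/σ_{st}, where σ_{st} is the number of shortest paths between nodes s and t, and σ_{st}(i) is the number of those shortest paths that pass through node i.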

The random-walk centrality of a node _{ij}_{is}

Nodes are ranked according to the measure chosen (i.e. degree, betweenness centrality, or random-walk centrality). We then immunize nodes going from high to low rankings, until the desired immunization coverage is achieved.
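The ranking step can be sketched as below, using degree as the default measure. The function name `immunize_by_rank`, the `score` parameter, and the tie-breaking rule (ties keep their original order) are illustrative assumptions; a betweenness or random-walk centrality function could be passed as `score` in the same way.

```python
def immunize_by_rank(adj, coverage, score=None):
    """Return the nodes to immunize: the highest-scoring nodes up to the coverage.

    adj:      {node: set(neighbors)} for an undirected graph.
    coverage: desired fraction of the population to immunize (0 to 1).
    score:    optional function node -> number; defaults to the node degree.
    """
    score = score or (lambda v: len(adj[v]))
    n_targets = round(coverage * len(adj))
    ranked = sorted(adj, key=score, reverse=True)   # high to low ranking
    return set(ranked[:n_targets])


# Star network: node 0 is connected to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
targets = immunize_by_rank(star, 0.2)
```

With a coverage of 20% on this five-node star, the single target is the hub, node 0.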

We use two stochastic algorithms to identify target nodes without knowledge of the full network structure. In the algorithms described below, targets are identified and immunized if they have not been immunized before.

The first algorithm, acquaintance immunization, has been described by Cohen et al.: pick a random node v_{0}, then pick a random acquaintance v_{1} of v_{0} as the immunization target.
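A minimal sketch of acquaintance immunization in its simplest form (immunize a random contact of a randomly chosen node) is given below. The function name, the visit counter, and the `max_visits` guard are assumptions added for illustration; variants that require a contact to be named more than once are not covered.

```python
import random


def acquaintance_immunization(adj, coverage, rng=None, max_visits=1_000_000):
    """Immunize random neighbors of randomly chosen nodes until coverage is reached.

    adj: {node: set(neighbors)}. Returns (immunized_nodes, number_of_node_visits).
    max_visits is a hypothetical guard against searches that cannot finish.
    """
    rng = rng or random.Random()
    nodes = list(adj)
    n_targets = round(coverage * len(nodes))
    immunized = set()
    visits = 0
    while len(immunized) < n_targets and visits < max_visits:
        v0 = rng.choice(nodes)              # randomly chosen individual
        visits += 1
        if not adj[v0]:                     # no contacts to name
            continue
        v1 = rng.choice(list(adj[v0]))      # random contact of v0
        visits += 1
        immunized.add(v1)                   # no effect if v1 is already immunized
    return immunized, visits
```

Only the neighbors of the sampled node are inspected, so the procedure needs no knowledge of the full network; every visit is counted, including repeat visits to the same node.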

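We propose another strategy, the community-bridge-finder (CBF) strategy, which rests on the observation that some individuals act as bridges between communities. The goal of the CBF algorithm is to identify such individuals based on random walks, without knowledge of the network structure, and thus without knowledge of the communities in a network. The algorithm works as follows: pick a random node v_{i = 0} and follow a random path. At each node v_{i≥2}, check whether v_{i} has more than one connection to the previously visited nodes. If it does, the walk continues. If the only such connection is the one from v_{i} to v_{i−1}, then v_{i−1} is a potential target. In that case, pick two random neighbors of v_{i} and check whether they connect to any previously visited node v_{j<i}. If they do, v_{i−1} is dismissed as a potential target and the walk continues; if they do not, v_{i−1} is identified as a target and immunized.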

(a) A random walk follows the path starting from v_{0} to v_{1} and v_{2}, at which point it starts checking for connections of v_{2} to v_{0} and v_{1}. (b) Since there is more than one connection (v_{2}-v_{1} and v_{2}-v_{0}), the walk continues to v_{3}. (c) Apart from the obvious v_{3}-v_{2}, there are no connections from v_{3} to any of the previously visited nodes, so v_{2} is a potential target. (d) The algorithm then picks two random neighbors of v_{3} to check for connections to previously visited nodes, and finds one (to v_{0}). (e) Hence, v_{2} is dismissed as a potential target, and the random walk continues to v_{4}. Again, v_{4} does not back-connect to any previously visited node (except, of course, to v_{3}), and thus v_{3} is identified as a potential target. (f) Again, two random neighboring nodes are picked to check for connections to previously visited nodes. Since no back connections can be found, v_{3} is identified as a target and immunized.
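The walk illustrated in panels (a) to (f) can be sketched in code as follows. This is a reading of the illustrated steps, not the original implementation. Several details are assumptions: the two probed neighbors exclude v_{i−1}, a new walk starts at a random node after each immunization, the set of visited nodes is a sliding window of at most `max_path` nodes, and `max_steps` is a hypothetical guard against endless searches.

```python
import random
from collections import deque


def cbf_immunization(adj, coverage, rng=None, max_path=10, max_steps=1_000_000):
    """Sketch of a community-bridge-finder walk on {node: set(neighbors)}."""
    rng = rng or random.Random()
    nodes = list(adj)
    n_targets = round(coverage * len(nodes))
    immunized = set()
    path = deque([rng.choice(nodes)], maxlen=max_path)   # FIFO window of visited nodes
    for _ in range(max_steps):
        if len(immunized) >= n_targets:
            break
        neighbors = list(adj[path[-1]])
        if not neighbors:                                # isolated node: restart elsewhere
            path = deque([rng.choice(nodes)], maxlen=max_path)
            continue
        current = rng.choice(neighbors)                  # random-walk step to v_i
        path.append(current)
        if len(path) < 3:                                # checks start at v_2
            continue
        previous = path[-2]                              # v_{i-1}
        visited = set(path) - {current}
        if len(adj[current] & visited) > 1:              # several back connections
            continue
        # Only the edge to `previous` leads back: `previous` is a potential target.
        others = [w for w in adj[current] if w != previous]
        probes = rng.sample(others, min(2, len(others)))
        if any(adj[w] & visited for w in probes):        # a probe links back: dismiss
            continue
        immunized.add(previous)                          # target confirmed
        path = deque([rng.choice(nodes)], maxlen=max_path)
    return immunized


# Two fully connected groups of four nodes, joined by the single edge 3-4.
clique_a, clique_b = {0, 1, 2, 3}, {4, 5, 6, 7}
bridge = {v: (clique_a if v in clique_a else clique_b) - {v} for v in range(8)}
bridge[3] = bridge[3] | {4}
bridge[4] = bridge[4] | {3}
found = cbf_immunization(bridge, 0.25, rng=random.Random(0))
```

On this small example, only the two nodes that form the bridge between the groups (3 and 4) can pass both checks, so they are the nodes that end up immunized.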

An algorithmic search for community bridges as described above can potentially take a very long time, depending on the structural features of the network. For example, the frequency of nodes that can potentially meet the immunization requirement set by the algorithm might be smaller than the desired immunization coverage. To prevent endless searches for community bridges, two additional checks are implemented. First, the number of nodes in any running random path does not exceed 10 (this is implemented using a first-in-first-out list that keeps track of the visited nodes). Second, we keep track of all nodes visited, and if a node has been visited at least

Results from simulations with the same parameters and settings as

(2.38 MB TIF)

Degree distributions of the empirical networks used in the main text. Main panels show cumulative frequency distributions; insets show non-cumulative frequency distributions.

(2.24 MB TIF)

We thank Sebastian Bonhoeffer, Roland Regoes and Jamie Lloyd-Smith for helpful comments.