Efficient mitigation strategies for epidemics in rural regions

Containing an epidemic at its origin is the most desirable mitigation. Epidemics have often originated in rural areas, with rural communities among the first affected. Disease dynamics in rural regions have received limited attention, and results of general studies cannot be directly applied, since population densities and human mobility factors in rural regions are very different from those in cities. We create a network model of a rural community in Kansas, USA, by collecting data on contact patterns and computing rates of contact among a sampled population. We model the impact of different mitigation strategies that detect closely connected groups of people and frequently visited locations. Within those groups and locations, we compare the effectiveness of random and targeted vaccinations using a Susceptible-Exposed-Infected-Recovered compartmental model on the contact network. Our simulations show that targeted vaccination of only 10% of the sampled population reduced the size of the epidemic by 34.5%. Additionally, if 10% of the population visiting one of the most popular locations is randomly vaccinated, the epidemic size is reduced by 19%. Our results suggest a new implementation of a highly effective strategy for targeted vaccinations through the use of popular locations in rural communities.


Introduction
Influenza A (H1N1), commonly known as swine flu, continues to be the dominant influenza virus in circulation across the globe, with many countries and overseas territories reporting laboratory-confirmed cases, including thousands of deaths [1]. In fact, pandemic virus strains, such as the current H1N1, can often be traced back to rural regions. For example, the H1N1 2009 virus is suspected to have originated in La Gloria, a small town near Veracruz, Mexico. Also, the previous strain of H1N1, commonly known as the Spanish Flu of 1918, which wrought devastation around the world, originated within the rural State of Kansas near Fort Riley. Other instances of epidemics originating in rural regions include the swine flu that originated in September 1988 at a hog barn in Walworth County, Wisconsin, the H5N3 virus that was identified at La Garnache farm in France in late January 2009, and the Asian flu, a category 2 avian influenza, in Guizhou, China, in 1956.
For analysis and containment purposes, large cities are generally considered to be infection hubs owing to the large population densities and mobility indices. Consequently, most spatio-temporal research on infectious human diseases focuses on large cities, such as Portland [2], Chicago [3], and Dresden [4], which respectively represent excellent examples of an agent-based model, a multiscale meta-population model, and a social-structure model, defining different levels of detail and complexity. Another approach to characterize the heterogeneous epidemic terrain of a human population is based on the construction of the network representing contacts among people. Studies of this type include [5], [6], and [7].
Various immunization strategies have been formulated for urban populations. Some of these strategies assume that the degree distribution of the human contact network can be approximated as scale-free. One such strategy is targeted immunization, wherein the nodes with the highest connectivity are deemed the most critical for spreading the infection, and hence those highly connected people are chosen for vaccination [8]. However, global immunization strategies require knowledge of the entire network of individuals and are complex from both computation and implementation standpoints. Conversely, localized mitigation strategies, such as acquaintance immunization [9], randomly choose a subset of the population and then randomly select a set of their acquaintances to vaccinate. These strategies require a lower fraction of the population to be vaccinated than random global immunization to dampen the impact of an epidemic. Other localized immunization methods target acquaintances of randomly selected people according to their estimated contact characteristics [10].
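To make the contrast between global and localized strategies concrete, the following minimal Python sketch compares degree-targeted immunization, which needs the whole network, with acquaintance immunization, which only queries the neighbor lists of randomly sampled people. The graph representation (a dict of neighbor sets), the function names, and the rounding of the vaccination budget are illustrative assumptions, not the exact procedures of [8] or [9].

```python
import random


def targeted_by_degree(adj, fraction):
    """Global targeted immunization: choose the best-connected nodes.

    Requires the full adjacency structure, i.e. knowledge of the whole network.
    """
    k = round(fraction * len(adj))
    ranked = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    return set(ranked[:k])


def acquaintance_immunization(adj, fraction, rng=None):
    """Local strategy: sample random people, immunize a random acquaintance of each.

    Only the neighbor lists of the sampled people are consulted.
    """
    rng = rng or random.Random()
    nodes = sorted(adj)
    k = round(fraction * len(nodes))
    immunized = set()
    attempts = 0
    while len(immunized) < k and attempts < 100 * len(nodes):
        attempts += 1
        person = rng.choice(nodes)
        if adj[person]:  # isolated people have no acquaintance to name
            immunized.add(rng.choice(sorted(adj[person])))
    return immunized


# Hypothetical star network: node 0 is a hub linked to nodes 1..9.
star = {0: set(range(1, 10)), **{i: {0} for i in range(1, 10)}}
print(targeted_by_degree(star, 0.1))  # {0}
print(acquaintance_immunization(star, 0.1, random.Random(7)))
```

In the star, most sampled people name the hub as their acquaintance, which illustrates why acquaintance immunization tends to reach highly connected nodes without knowing the full network.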
Disease dynamics in rural regions [11] [12] have received limited study. Since the population densities and human mobility indices of rural regions are very different from those of cities, it is imperative to develop specific mitigation schemes to impede the spread of epidemics right from their likely source, the rural location. Furthermore, recent studies show that rural residents are less likely than urban residents to obtain certain preventive health services [13]. These factors necessitate research on predictive and optimal preventive strategies in rural regions.
This paper takes a unique look at rural regions and presents mitigation strategies tailored for rural Clay County in Kansas, USA. We propose mitigation strategies based on a contact network model developed using data collected through a survey campaign conducted in rural Clay and Kearny counties in Kansas, USA. By characterizing the contact structure of rural regions, we are able to investigate the influence of this structure and of various mitigation strategies on the speed, shape, and size of the outbreak. Our analysis shows that, although global targeted strategies are the most efficient in mitigating epidemics with a limited amount of resources, they can also be infeasible due to partial knowledge of the population and conflicts with individual rights. Random vaccine distribution in selected popular locations within a rural community offers the opportunity to indirectly reach the individuals who play a significant role in epidemic propagation. We demonstrate with simulations that this location-based strategy can be 55% as effective as the best global targeted strategy.

Methods
Our simulation results on epidemic spreading in rural communities are based on data collected through distributed surveys. This data is used to construct a contact network and to analyze the epidemic. Several random and targeted mitigation strategies are investigated through an SEIR model with parameters estimated in an analysis of the recent H1N1 influenza [3].

Survey Data
In Spring 2009, we surveyed residents of two rural Kansas counties through a visit to a county seat and through mailed surveys, under our direct personal supervision. In January 2009, the research protocols, survey forms, and informed consent procedures were approved by the Institutional Review Board (IRB) of Kansas State University (Human Subjects Committee, University Research Compliance Office, 203 Fairchild Hall, Manhattan, Kansas). All potential participants received informed consent information in a cover letter attached to their surveys; signatures were not required from the participants, as a way of protecting their privacy.
The mailed surveys were well accepted, with response rates of 64.8% for Clay County and 41% for Kearny County. The survey consisted of 30 short questions, a question concerning visits to local businesses and locations, a question concerning visits to cities within the surrounding region, and a set of contact questions. The spread of an epidemic in rural areas may be influenced by both the vulnerability of the population and the extent of their contacts with each other. Vulnerability includes susceptibility to infection due to both poor health and a lack of preventive measures, such as vaccination. Once an epidemic has begun, the willingness of the population to comply with precautionary health measures can influence the rate and extent to which the epidemic spreads. The survey assessed all of these factors.
Survey results yielded four measures of risk factors important to the spread of epidemics: health risk, contact risk, prevention risk, and compliance risk. To what extent did these risks overlap? Each possible pairwise combination of risks was evaluated and summarized in Table 1. It is interesting to note that people with the most contacts tended to have the least preparedness for an epidemic, and people who were willing to visit others even during an epidemic were among those most at risk because of their health status. Additionally, those who tended to visit friends and family members more often during normal times were also likely to retain this behavior even under epidemic conditions. This property of rural communities is interesting, since it provides a degree of stability within the contact network and increases the accuracy of our epidemic analysis.
Table 1. Pairwise overlap of health, contact, prevention, and compliance risks.

Health/Contact: Those who would be most vulnerable or susceptible to an epidemic were actually most likely to be engaging in multiple contacts with friends, family, and guests.

Health/Prevention: Of those respondents from families in which all members had been vaccinated, only 34.1% had one or more at-risk health conditions, compared to 49.6% of those from families in which at least some of the members had not been vaccinated (p<0.09). Those who were most susceptible to an epidemic were least likely to be prepared for it in terms of anticipatory vaccination.

Health/Compliance: The percentage of respondents with one or more at-risk health conditions tended to rise as a function of their unwillingness to comply with a directive to stay at home during an epidemic: no visits (46.1%), one or two visits (38.8%), and three or more visits (75.0%) (p<0.08). Those who were willing to visit others even during an epidemic were among those most at risk because of their health status.

Contact/Prevention: As contact levels rose from low to medium to high, the percentage of households with full vaccinations fell from 32.1% to 19.7% and 20.8%, respectively (p<0.25 by chi-square test; r = 0.11, p<0.18). Those with the most contacts tended to have the least preparedness for an epidemic.

Contact/Compliance: As contact levels rose from low to medium to high, the percentage of respondents who would not comply with health directives to remain at home rose from 29.4% to 40.0% to 44.7%, respectively (p<0.28 by chi-square test; r = 0.13, p<0.12). Those who tended to visit friends and family members more often during normal times were also likely to retain this behavior even under epidemic conditions.

Prevention/Compliance: The percentage of respondents from fully vaccinated families declined from 29.1% to 22.0% to 0.0% as respondents shifted from no visits to one or two visits to three or more visits during an epidemic (p<0.08 by chi-square test; r = 0.17, p<0.04). Those making the most visits were the least likely to be protected by vaccination.

Survey results were also used to construct the weighted contact network. For this purpose, we used the survey responses about frequently visited locations and the levels of contact. The respondents were asked to identify, from a set of locations, those they visit on a typical day. The responses were captured as a binary vector $L_i$ for each respondent $i$, with each element corresponding to a location. The contact questions asked the respondents to estimate the number of individuals with whom they made contact at each of three levels of contact. Contact levels were classified into Proximity contact (coming within 5 feet of another person, even if in passing), Direct-Low contact (directly touching another person for a short period of time in what most people would consider a low-risk situation of being infected), and Direct-High contact (directly touching another person for an extended period of time or in what most people would consider a relatively high-risk situation of being infected). The responses to the contact questions are quantified as values $n_{x,i}$ that represent the number of individuals contacted by respondent $i$ on a typical day at contact level $x$.

Contact Network Construction
With these responses, the rural community is represented as a weighted contact network in which each survey respondent is a node and links between nodes represent contact between respondents. Each link has a weight that represents the normalized measure of contact between the connected pair of respondents. The weight $w_{i,j}$ of each link is the average of three sub-weights $w_{x,i,j}$, one for each contact level $x$, that quantify the interaction between node $i$ and node $j$; the values of $w_{i,j}$ lie in the interval [0, 1]. We capture the location responses in the parameter $m_{i,j} = (1+l_{i,j})/(1+d)$, where $d$ is the total number of locations and $l_{i,j}$ is the dot product of the location vectors $L_i$ and $L_j$ of nodes $i$ and $j$, i.e., the number of locations both respondents visit. For a given type of contact $x$, the sub-weight function depends on the node degrees $n_{x,i}$ and $n_{x,j}$ and on the parameter $m_{i,j}$. When either $n_{x,i}$ or $n_{x,j}$ is zero, the sub-weight should be equal to zero. The sub-weight should also increase monotonically with both $n_{x,i}$ and $n_{x,j}$, approaching unity when both are large. When a pair of nodes visit all the locations, $m_{i,j}$ is equal to unity and the sub-weight should be maximum. On the other hand, $m_{i,j}$ has a small positive minimum, $1/(1+d)$, to allow for interactions outside the locations included in the survey. For given $n_{x,i}$ and $n_{x,j}$, the sub-weight should be minimum when $l_{i,j}$ is equal to zero and should increase monotonically with increasing $l_{i,j}$, and consequently with $m_{i,j}$.
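The location parameter can be computed directly from the binary location vectors. The following is a minimal sketch, assuming each $L_i$ is stored as a Python list of 0/1 entries over the same $d$ locations; the function names are hypothetical.

```python
def location_overlap(L_i, L_j):
    """l_ij: dot product of two binary location vectors (shared locations)."""
    if len(L_i) != len(L_j):
        raise ValueError("location vectors must cover the same d locations")
    return sum(a * b for a, b in zip(L_i, L_j))


def location_factor(L_i, L_j):
    """m_ij = (1 + l_ij) / (1 + d); ranges from 1/(1 + d) up to 1."""
    d = len(L_i)
    return (1 + location_overlap(L_i, L_j)) / (1 + d)


# Hypothetical survey with d = 4 locations.
print(location_factor([1, 1, 0, 0], [1, 0, 0, 1]))  # one shared location -> 0.4
print(location_factor([1, 1, 1, 1], [1, 1, 1, 1]))  # both visit all -> 1.0
print(location_factor([1, 0, 0, 0], [0, 1, 0, 0]))  # nothing shared -> 0.2
```

The last case shows the small positive minimum $1/(1+d)$, which keeps pairs that share no surveyed location from being disconnected purely on location grounds.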
For each contact level $x$, we compute the sub-weight $w_{x,i,j}$ between node $i$ and node $j$ according to a simple function (Eq. 1) that follows the desired behavior and depends on a parameter $p_x$ for each contact level. We selected the values of $p_x$ so that they reflect the relative importance of each of the three contact categories from the survey responses, constraining $p_{\text{Proximity}}$ to be less than $p_{\text{Direct-Low}}$ and $p_{\text{Direct-Low}}$ to be less than $p_{\text{Direct-High}}$. The values of $p_x$ were estimated by matching the epidemic curve of the H1N1 outbreak in La Gloria, Mexico [14], with the average epidemic curve obtained from the weighted network simulations for Clay Center, Kansas. Based on the minimum squared error, we found the best-fitting values for Proximity, Direct-Low, and Direct-High contacts to be $p_{\text{Proximity}} = 0.0025$, $p_{\text{Direct-Low}} = 0.015$, and $p_{\text{Direct-High}} = 1.0$. Figure 1 compares the number of newly infected individuals in La Gloria [14] with the corresponding simulated number of newly infected individuals in Clay Center, Kansas, given the best estimated values for the three contact levels. This parameter estimation was done through an SEIR epidemic model, which we describe in the following sections. With the estimated model parameters, we created the weighted contact network shown in Figure 2(a). Figure 2(i) has been used to create Figure 3, where not only nodes representing people, but also nodes representing popular locations in Clay Center are shown.
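As an illustration only, the sketch below uses a hypothetical sub-weight function, $w_{x,i,j} = 1 - \exp(-p_x\, n_{x,i}\, n_{x,j}\, m_{i,j})$, which is not necessarily the form of Eq. 1. It was chosen because it satisfies every requirement stated for the sub-weight: it is zero when either count is zero, increases with $n_{x,i}$, $n_{x,j}$, and $m_{i,j}$, and tends to unity when both counts are large. The link weight is then the average over the three contact levels, using the best-fitting $p_x$ values reported in the text; the dictionary keys and function names are assumptions.

```python
import math

# Best-fitting contact-level parameters reported in the text.
P_X = {"proximity": 0.0025, "direct_low": 0.015, "direct_high": 1.0}


def sub_weight(p_x, n_xi, n_xj, m_ij):
    """HYPOTHETICAL sub-weight w_{x,i,j}; not necessarily the paper's Eq. 1.

    Zero if either count is zero, increases with n_xi, n_xj and m_ij,
    and tends to 1 as both counts grow.
    """
    return 1.0 - math.exp(-p_x * n_xi * n_xj * m_ij)


def link_weight(n_i, n_j, m_ij, p=P_X):
    """w_{i,j}: average of the sub-weights over the three contact levels.

    n_i and n_j map each contact level x to the reported count n_{x,i}.
    """
    return sum(sub_weight(p[x], n_i[x], n_j[x], m_ij) for x in p) / len(p)


# Hypothetical respondents sharing one of four locations (m_ij = 0.4).
alice = {"proximity": 30, "direct_low": 6, "direct_high": 2}
bob = {"proximity": 15, "direct_low": 3, "direct_high": 0}
print(round(link_weight(alice, bob, 0.4), 3))
```

Because bob reports no Direct-High contacts, that sub-weight is zero and only the Proximity and Direct-Low terms contribute to the average.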
We performed a sensitivity analysis on each $p_x$, varying its value by up to 15%. These variations, shown in Table 2, produce a maximum variation of 3.4% in total infection cases, with most changes resulting in a variation of less than 1%.
To assess our choice of the structure of the sub-weight $w_{x,i,j}$ in Eq. 1, we constructed another weighted contact network based only on data about commonly visited locations. In this case, the weights are computed as $w^{l}_{i,j} = l_{i,j}/d$. The network constructed in this way has 38% of its nodes isolated, i.e., with node degree equal to zero, and when we simulated the SEIR model, only 63% of the infected nodes coincided with the infected nodes obtained using the same model on our contact network. Consequently, using only location data produces non-negligible differences in the results, and this network is not considered in the following analysis.
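The location-only weighting can be sketched as follows. It shows how any pair of respondents that shares no surveyed location receives $w^{l}_{i,j} = 0$, which is how nodes end up isolated in that network. The toy vectors and function names are hypothetical.

```python
from itertools import combinations


def location_only_weights(vectors):
    """w^l_{i,j} = l_{i,j} / d for every pair that shares at least one location."""
    d = len(next(iter(vectors.values())))
    weights = {}
    for i, j in combinations(sorted(vectors), 2):
        l_ij = sum(a * b for a, b in zip(vectors[i], vectors[j]))
        if l_ij > 0:
            weights[(i, j)] = l_ij / d
    return weights


def isolated_fraction(vectors, weights):
    """Share of nodes with node degree zero (no positive-weight link)."""
    linked = {v for pair in weights for v in pair}
    return 1 - len(linked) / len(vectors)


# Hypothetical respondents over d = 3 locations; "c" shares no location.
toy = {"a": [1, 0, 0], "b": [1, 1, 0], "c": [0, 0, 1]}
w_l = location_only_weights(toy)
print(w_l)                          # only the pair ("a", "b") is linked
print(isolated_fraction(toy, w_l))  # one of three nodes is isolated
```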

Network Metrics
To describe the characteristics of the weighted contact network in detail, we select some graph-theoretical metrics that reflect the local and global properties of the graph [15]. Some relevant metrics for the contact network are listed in Table 3. The contact network is composed of 138 nodes ($N$) and 9222 links. It is important to note that the network is not far from a fully connected network, which would have $N(N-1)/2 = 9453$ links. However, links can differ greatly in importance because of the structure of the link weights. For this reason, we select the node strength as one metric to characterize a node. The strength $s_i$ of node $i$ is defined as the sum of the weights $w_{i,j}$ of all links between node $i$ and its neighbors $N_i$, $s_i = \sum_{j \in N_i} w_{i,j}$. The node strength is analogous to the node degree in a binary network, which measures the number of contacts or neighbors of a node. The second metric we compute is the average shortest path. To compute shortest paths properly, we define the distance $d_{i,j}$ between any neighbor nodes $i$ and $j$ as $d_{i,j} = 1 - w_{i,j}$. The distance defined in this way is always nonnegative, since $w_{i,j} \in [0, 1]$, and is short when the link weight $w_{i,j}$ is high. The third metric, the network diameter, is defined as the longest of the shortest paths. As a centrality measure, we compute the betweenness $b_i$ of a node $i$, defined through the shortest paths between pairs of nodes that pass through node $i$:
$$b_i = \sum_{h<j,\; h,j \neq i} \frac{\sigma_{h,j,i}}{\sigma_{h,j}},$$
where $\sigma_{h,j}$ is the total number of shortest paths from node $h$ to $j$ and $\sigma_{h,j,i}$ is the number of those shortest paths that pass through node $i$. A node that appears in many shortest paths has high betweenness. Each $b_i$ is normalized by the maximum number of shortest paths that can pass through a node, $(N-1)(N-2)/2$. Another node-level measure is the clustering coefficient $c_i$ of a node $i$, which measures the level of connection among the neighbors of node $i$:
$$c_i = \frac{1}{k_i(k_i-1)} \sum_{j,h} \left(\hat{w}_{i,j}\,\hat{w}_{j,h}\,\hat{w}_{h,i}\right)^{1/3},$$
where $\hat{w}_{i,j} = w_{i,j}/\max(w_{i,j})$ and $k_i$ is the degree of node $i$ in the binary version of the weighted contact network [16]. By averaging over all individual clustering coefficients, we obtain the average clustering coefficient of the contact network. The $k$-core of a graph is a maximal subgraph in which each vertex has strength at least $k$. The node coreness is the maximum value $k$ such that the node still belongs to the $k$-core, i.e., the node is removed only when the next, higher core is extracted.
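Several of these metrics can be sketched in a few lines of Python on a weighted network stored as a symmetric dict of dicts (`weights[i][j]` holds $w_{i,j}$). Strength is a weighted row sum, and shortest paths use Dijkstra's algorithm with link length $d_{i,j} = 1 - w_{i,j}$. For the clustering coefficient, the sketch assumes the geometric-mean form $c_i = \frac{1}{k_i(k_i-1)}\sum_{j,h}(\hat{w}_{i,j}\hat{w}_{j,h}\hat{w}_{h,i})^{1/3}$ with $\hat{w}_{i,j} = w_{i,j}/\max(w_{i,j})$, which uses exactly the quantities $\hat{w}_{i,j}$ and $k_i$ introduced here; that [16] uses this form is an assumption. Betweenness is omitted for brevity, and all names are hypothetical.

```python
import heapq
from itertools import permutations


def strength(weights, node):
    """s_i = sum of w_{i,j} over the neighbors j of node i."""
    return sum(weights[node].values())


def shortest_distances(weights, source):
    """Dijkstra's algorithm with link length d_{i,j} = 1 - w_{i,j} (never negative)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale heap entry
        for v, w in weights[u].items():
            candidate = d_u + (1.0 - w)
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


def path_stats(weights):
    """Average shortest path and diameter over all reachable ordered pairs."""
    lengths = [d for s in weights
               for t, d in shortest_distances(weights, s).items() if t != s]
    return sum(lengths) / len(lengths), max(lengths)


def weighted_clustering(weights, node):
    """ASSUMED geometric-mean form of c_i, with w_hat = w / max(w)."""
    w_max = max(w for nbrs in weights.values() for w in nbrs.values())
    neighbors = list(weights[node])
    k = len(neighbors)  # degree k_i in the binary network
    if k < 2:
        return 0.0
    total = 0.0
    for j, h in permutations(neighbors, 2):  # ordered pairs of distinct neighbors
        w_jh = weights[j].get(h, 0.0)        # 0 when j and h are not linked
        total += (weights[node][j] * w_jh * weights[h][node] / w_max ** 3) ** (1 / 3)
    return total / (k * (k - 1))


# Hypothetical three-person network as a symmetric dict of dicts.
toy = {"a": {"b": 0.9, "c": 0.2},
       "b": {"a": 0.9, "c": 0.8},
       "c": {"a": 0.2, "b": 0.8}}
print(strength(toy, "a"))
print(shortest_distances(toy, "a"))  # a -> c is shorter via b (0.1 + 0.2) than direct (0.8)
print(path_stats(toy))
print(weighted_clustering(toy, "a"))
```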
The coreness measures the depth of a node in the core of the network, where a higher value indicates that the node is deeper in the core. The discrete values for the strength classes are obtained by a fine quantization of the node strength (step size on the order of $10^{-6}$). In the spectral domain, the maximum eigenvalue is the largest eigenvalue of the weighted adjacency matrix $W$ representing the network. The elements of $W$ are the weights $w_{i,j}$; the matrix is symmetric and has zeros on the main diagonal. A large maximum eigenvalue corresponds to a small epidemic threshold in the Susceptible-Infected-Susceptible model [17]. Networks often display some level of grouping of nodes in an organized fashion that allows them to be divided into different clusters or communities. One popular method of detecting communities is to maximize a quantity known as modularity [18]. Modularity measures the difference between the number of edges within each community and the expected number of edges in the same community in a random network with the same node degrees, summed over all communities within the graph. Using the weighted version of the algorithm described in [19] [18], we found two communities within our contact network. With a modularity value of 0.1087, 61% of the population fell into community 1 and the remaining 39% composed community 2.
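The remaining metrics can be sketched in the same style. The coreness function repeatedly peels the node of minimum remaining strength, a strength-based analogue of k-core decomposition, without the fine quantization of strength classes used in this work. The largest eigenvalue is found by power iteration on $W + I$; the identity shift avoids oscillation on bipartite-like structures. In mean-field analyses of the SIS model the epidemic threshold is commonly written as $1/\lambda_{\max}$, which is why a large $\lambda_{\max}$ means a small threshold. Modularity is evaluated for a given partition with the standard weighted definition; the community-detection algorithm of [19] [18] itself is not reproduced. All names and toy data are hypothetical.

```python
import math


def strength_coreness(weights):
    """Strength-based coreness: repeatedly peel the node of minimum remaining
    strength; its coreness is the highest peeling level reached so far.
    Assumes a symmetric dict of dicts; no quantization is applied."""
    remaining = {v: dict(nbrs) for v, nbrs in weights.items()}
    s = {v: sum(nbrs.values()) for v, nbrs in remaining.items()}
    core, level = {}, 0.0
    while remaining:
        v = min(remaining, key=lambda u: s[u])
        level = max(level, s[v])
        core[v] = level
        for u, w in remaining.pop(v).items():
            remaining[u].pop(v, None)  # neighbor loses its link to the removed node
            s[u] -= w
    return core


def max_eigenvalue(W, iters=10_000, tol=1e-12):
    """Largest eigenvalue of a symmetric, non-negative matrix W.

    Power iteration on W + I; the shift is subtracted again at the end.
    """
    n = len(W)
    x = [1.0 / math.sqrt(n)] * n
    norm = prev = 0.0
    for _ in range(iters):
        y = [x[i] + sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        if abs(norm - prev) < tol:
            break
        prev = norm
    return norm - 1.0


def modularity(W, community):
    """Weighted Q = (1/2m) * sum_ij [W_ij - s_i s_j / 2m] * [c_i == c_j]."""
    n = len(W)
    s = [sum(row) for row in W]  # node strengths
    two_m = sum(s)               # total weight, each link counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:
                q += W[i][j] - s[i] * s[j] / two_m
    return q / two_m


# Hypothetical examples.
toy = {"a": {"b": 1.0, "c": 1.0, "d": 0.5},
       "b": {"a": 1.0, "c": 1.0},
       "c": {"a": 1.0, "b": 1.0},
       "d": {"a": 0.5}}
print(strength_coreness(toy))  # pendant d peels first; the triangle forms the core
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(max_eigenvalue(path))    # close to sqrt(2)
pairs = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
print(modularity(pairs, [0, 0, 1, 1]))  # 0.5
```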

SEIR Model on the Contact Network
We extend the weighted compartmental model [17] to represent the different disease states of individuals: Susceptible (S), Exposed (E), Infectious (I), and Recovered (R). The states and the transitions between states are specific to each disease and its characteristics, requiring customization for each disease. In this model, we selected $\beta = 0.4$, where $\beta$ is the rate of infection across a link between a susceptible individual and an infected individual; $\varepsilon \approx 0.909$, where $\varepsilon$ is the transition rate parameter between the Exposed and Infected compartments; and $\delta = 0.4$, where $\delta$ is the transition rate parameter between the Infected and Recovered compartments [20].
The network topology plays an important role in the spreading process through the transition from S to E, which occurs when an I individual contacts an S individual and successfully infects him or her. The probability $f_{i,t}$ that node $i$ is not infected at time $t$ depends, for each neighbor node $j$, on the probability $p_{j,t-1}$ that $j$ was infected at the previous time step, on its contact with node $i$ ($w_{i,j}$), and on the probability that it successfully infects node $i$ ($\beta$) [17].
The probability that a node is infected (transition from S to E) at time $t$ is then $1 - f_{i,t}$. The remaining transitions are topology-independent and depend only on the rate parameters $\varepsilon$ and $\delta$ of the disease model. When an individual has contracted the disease and has transitioned into the exposed compartment, the next transition, to state I, occurs with rate $\varepsilon$. Once infected, a node attempts to infect its susceptible neighbors until it transitions to the recovered state. Each infected node recovers with recovery rate $\delta$.
Once a node has recovered (R), it remains recovered for the remainder of the simulation. The recovered compartment serves as an accumulator of all the cases; thus, the number of recovered individuals R at the end of a simulation is a good approximation of the total number of cases caused by the outbreak. The blue curve in Figure 1 has been computed using the above model.
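The SEIR dynamics described above can be sketched as a discrete-time stochastic simulation. The sketch assumes the common product form $f_{i,t} = \prod_{j \in N_i}(1 - \beta\, w_{i,j}\, p_{j,t-1})$, with $p_{j,t-1}$ equal to 1 when neighbor $j$ was infectious at the previous step and 0 otherwise, and it uses $\beta$, $\varepsilon$, and $\delta$ directly as per-step probabilities; both choices are assumptions consistent with, but not confirmed by, the text. Immunized nodes are kept in S, and the function returns the final number of recovered nodes, i.e., the epidemic size. The function name and toy network are hypothetical.

```python
import random


def simulate_seir(weights, seeds, beta=0.4, eps=0.909, delta=0.4,
                  immune=frozenset(), rng=None, max_steps=1000):
    """One stochastic, discrete-time SEIR run on a weighted contact network.

    weights: symmetric dict of dicts with weights[i][j] = w_ij in [0, 1].
    ASSUMPTIONS: beta, eps and delta act as per-step probabilities, and a
    susceptible node escapes infection with probability
    f_i = prod over infectious neighbors j of (1 - beta * w_ij).
    Immunized nodes stay in S for the whole run.
    Returns the number of recovered nodes at the end (the epidemic size).
    """
    rng = rng or random.Random()
    state = {v: "S" for v in weights}
    for v in seeds:
        state[v] = "I"
    for _ in range(max_steps):
        infectious = [v for v, st in state.items() if st == "I"]
        exposed = [v for v, st in state.items() if st == "E"]
        if not infectious and not exposed:
            break  # no active cases left
        new_state = dict(state)  # synchronous update: read step t-1, write step t
        for i, st in state.items():
            if st != "S" or i in immune:
                continue
            f = 1.0  # probability that node i escapes infection this step
            for j, w in weights[i].items():
                if state[j] == "I":
                    f *= 1.0 - beta * w
            if rng.random() < 1.0 - f:
                new_state[i] = "E"  # S -> E
        for v in exposed:
            if rng.random() < eps:
                new_state[v] = "I"  # E -> I
        for v in infectious:
            if rng.random() < delta:
                new_state[v] = "R"  # I -> R
        state = new_state
    return sum(1 for st in state.values() if st == "R")


# Hypothetical three-person network; average final size over 200 seeded runs.
toy = {"a": {"b": 0.9, "c": 0.2},
       "b": {"a": 0.9, "c": 0.8},
       "c": {"a": 0.2, "b": 0.8}}
sizes = [simulate_seir(toy, ["a"], rng=random.Random(seed)) for seed in range(200)]
print(sum(sizes) / len(sizes))
```

Comparing the average final size with and without an `immune` set gives a percentage reduction of the kind used to rank the mitigation strategies.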

Epidemic Simulations
The analysis of the epidemic evolution and the evaluation of multiple mitigation strategies are performed using an SEIR model on the contact network. We propose different immunization strategies that can be implemented as vaccinations or antiviral treatments. The immunization strategies are classified into three categories based on individuals, locations, and communities. In each individual immunization strategy, nodes are chosen either deliberately, based on a node metric, or randomly.
The random selection of nodes as recipients of an immunization represents an unbiased distribution of resources and is the simplest method of distribution. The node metrics selected for the targeting strategies include node strength, node coreness, and node betweenness. Node strength, as a measure of how well an individual is connected with the rural population, is an intuitive measure of how likely a node is to be infected by other nodes, as well as how likely the node is to pass the infection on to others. Therefore, to mitigate the infection using node strength, we target the nodes with the highest strength. The node coreness is a measure of how deep a node is in the core of a network; this depth reflects the maximum strength of the nodes iteratively removed from the network periphery before the node itself is removed. From a topological perspective, the core of the network is vital for its connectivity. Therefore, a targeted removal or immunization of the core nodes serves to hinder and disrupt the connectivity that allows the spread of the infection. The betweenness of a node measures how many shortest paths between all pairs of nodes pass through the node. Thus, targeting nodes with the highest betweenness serves to disrupt the shortest paths that the virus can take, forcing it onto longer routes. We applied these different targeting strategies globally on the entire network, and then within the communities and selected locations. The immunization of a node is implemented by forcing the immunized node to remain in the susceptible state throughout the epidemic, so that it can never become infected or transmit the infection.
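A minimal sketch of the two global selection rules follows, assuming a node metric (strength, coreness, or betweenness) has already been computed for every node and stored in a dict. The function names and the rounding of the 10% budget are assumptions; with $N = 138$, rounding gives a budget of 14 people.

```python
import random


def targeted_selection(metric, fraction=0.10):
    """Global targeted strategy: immunize the nodes with the highest metric value."""
    k = round(fraction * len(metric))
    return set(sorted(metric, key=metric.get, reverse=True)[:k])


def random_selection(nodes, fraction=0.10, rng=None):
    """Unbiased baseline: immunize a uniformly random subset of the same size."""
    rng = rng or random.Random()
    pool = sorted(nodes)
    k = round(fraction * len(pool))
    return set(rng.sample(pool, k))


# Hypothetical node strengths for a 20-node network.
strengths = {node: 0.1 * node for node in range(20)}
print(targeted_selection(strengths))                      # the two strongest nodes
print(random_selection(strengths, rng=random.Random(1)))  # two random nodes
```

Either set can be passed as the immunized group to an SEIR simulation, so that both rules are compared under the same budget.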
Table 4 shows the percentage reduction in the number of cases with respect to the unmitigated epidemic for different criteria used to select the 10% of the global population to be immunized. The most effective strategy is the one in which the 10% of nodes with the highest strength are selected, in line with previous results, followed by the one based on the selection of the 10% of nodes with the deepest coreness. However, these types of strategies have an inherent problem: how can we detect those special nodes in practice? Fortunately, the data collected on location popularity can help to solve this problem. The survey respondents associated themselves with various locations in the county by indicating which ones they typically visit. We used two criteria to select the locations for targeting and random strategies. For the random strategies, we chose the locations having the highest average value of the desired metric and being associated with at least 10% of the population. For the targeting strategies, we chose the locations visited by more than 10% of the population, and we immunized 10% of the population by selecting the nodes with the highest combined sum of the desired metric within those locations. Table 5 shows the percentage reduction in the number of cases when the selected people are immunized among the group visiting a particular location.
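The location-based random strategy can be sketched in two steps: among locations visited by at least 10% of the population, pick the one whose visitors have the highest average metric value, then immunize people drawn uniformly at random from its visitors. The sketch assumes that the budget is 10% of the whole population and that it is capped by the number of visitors; the data layout (a dict from location to a set of visitor IDs) and the function names are hypothetical.

```python
import random


def pick_location(visitors, metric, n_population, min_share=0.10):
    """Among locations visited by at least min_share of the population, return
    the one whose visitors have the highest average metric value."""
    eligible = {loc: people for loc, people in visitors.items()
                if len(people) >= min_share * n_population}
    if not eligible:
        raise ValueError("no location reaches the popularity threshold")

    def average(loc):
        return sum(metric[v] for v in eligible[loc]) / len(eligible[loc])

    return max(eligible, key=average)


def location_random_selection(visitors, location, n_population,
                              fraction=0.10, rng=None):
    """Immunize fraction * n_population people drawn uniformly at random from
    the visitors of one location (capped at the number of visitors)."""
    rng = rng or random.Random()
    pool = sorted(visitors[location])
    k = min(len(pool), round(fraction * n_population))
    return set(rng.sample(pool, k))


# Hypothetical data: 10 people with precomputed strengths, three locations.
strengths = {person: float(person) for person in range(10)}
visits = {"store": {7, 8, 9}, "church": {0, 1, 2, 3, 9}, "cafe": {5}}
key = pick_location(visits, strengths, n_population=10, min_share=0.2)
print(key)  # store: the cafe is below the threshold, and store visitors have the highest average strength
print(location_random_selection(visits, key, n_population=10, rng=random.Random(3)))
```

Only the visitor lists and a metric averaged per location are needed here, which is the practical advantage over identifying individual high-strength nodes.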

Results and Discussion
Our results span the two investigated areas, namely risk assessment and the evaluation of mitigation strategies. Concerning risk assessment, very few rural respondents (2%) did not have a high level of risk in at least one of the four areas assessed: health risk, contact risk, prevention risk, and compliance risk. Over 75% of households did not have complete uptake of flu vaccine, nearly half of respondents had at least one major health risk, and nearly two-fifths of respondents said they would not comply with directives to stay at home during an epidemic. Risk levels were positively associated, suggesting that risks compounded one another, a situation that poses greater problems for any attempt to predict or reduce the spread of epidemics in rural areas. Married respondents were substantially less likely to report selected health risks (38% vs. 69%). Other demographic factors had relatively small associations with health, compliance, prevention, or contact risks, although some nonlinear associations between income and the risk factors were noted, with middle-income respondents having the lowest risk levels compared to lower- or higher-income respondents.
Concerning the evaluation of mitigation strategies, Table 4 shows that the random immunization of 10% of the population (first strategy) reduces the epidemic size by 11.40%, with no substantial gain. However, if the 10% of nodes with the highest node strength are immunized (second strategy), the epidemic size is reduced by 34.57%, more than three times the reduction achieved by random immunization. In the interesting case where the immunized 10% are randomly selected within the group of people frequently visiting a specific popular location (third strategy), an intermediate benefit of about 19% epidemic size reduction is obtained. Identifying the specific locations visited by the highest-strength nodes clearly improves the efficiency of a random immunization campaign when the campaign is conducted at those locations. Figure 4 shows the curves of newly infected nodes over time under free evolution and for the three mitigation strategies discussed.
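The ratios behind the comparisons in this section, and behind the 55% figure quoted in the Introduction, follow directly from the reported reductions:

$$\frac{34.57\%}{11.40\%} \approx 3.03, \qquad \frac{19\%}{34.57\%} \approx 0.55.$$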
Our simulations suggest that information and immunization activities for rural communities should be carried out in specific locations, called key locations, which are frequently visited not only by most people but also by the key people (highest-strength nodes). Detecting key locations requires some amount of data collection and analysis. However, detecting key locations is much easier than identifying the highest-strength nodes (individuals with high levels of contact). In other words, the probability of immunizing a highest-strength node when selecting a node at random in a key location is much higher than the probability of immunizing a highest-strength node when selecting a node at random from the entire population.
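The argument can be written out with notation introduced here only for illustration: let $T$ be the set of highest-strength nodes, $N$ the population size, and $V_\ell$ the set of people who visit a key location $\ell$. A single uniformly random pick reaches a node of $T$ with probability

$$P_{\text{population}} = \frac{|T|}{N}, \qquad P_{\ell} = \frac{|T \cap V_\ell|}{|V_\ell|},$$

so sampling at location $\ell$ is preferable whenever $|T \cap V_\ell|/|V_\ell| > |T|/N$. Drawing $k$ people without replacement, the expected number of highest-strength nodes immunized is $k\,|T \cap V_\ell|/|V_\ell|$ at the key location, compared with $k\,|T|/N$ for random selection from the whole population.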
In the presence of limited antiviral and vaccination resources, government health agencies should seek to use the most effective methods of distribution to mitigate the threat. Here, we have investigated the distribution of immunizations to 10% of the population through various targeting strategies. This work is of particular interest to rural regions, as they are more likely than urban areas to face resource shortages because of smaller budgets.
Additionally, because of lower population densities, rural regions are more likely than urban areas to have a small set of local businesses and locations. Therefore, classifying and analyzing popular locations to be targeted for vaccine distribution is a feasible task. This work has shown the benefit of being able to select proper distribution locations, a strategy that can be implemented without full knowledge of every individual within the rural population.