
Role of genetic heterogeneity in determining the epidemiological severity of H1N1 influenza

  • Narmada Sambaturu ,

    Contributed equally to this work with: Narmada Sambaturu, Sumanta Mukherjee

    Roles Conceptualization, Formal analysis, Methodology, Software, Validation, Visualization, Writing – original draft

    Affiliation IISc Mathematics Initiative, Indian Institute of Science, Bangalore, Karnataka, India

  • Sumanta Mukherjee ,

    Contributed equally to this work with: Narmada Sambaturu, Sumanta Mukherjee

    Roles Conceptualization, Formal analysis, Methodology, Software, Validation, Visualization

    Affiliation IISc Mathematics Initiative, Indian Institute of Science, Bangalore, Karnataka, India

  • Martín López-García,

    Roles Formal analysis, Methodology, Validation, Visualization, Writing – original draft

    Affiliation Department of Applied Mathematics, University of Leeds, Leeds, United Kingdom

  • Carmen Molina-París,

    Roles Supervision, Writing – review & editing

    Affiliation Department of Applied Mathematics, University of Leeds, Leeds, United Kingdom

  • Gautam I. Menon ,

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliations Computational Biology and Theoretical Physics groups, The Institute of Mathematical Sciences, Chennai, Tamil Nadu, India, Homi Bhabha National Institute, Training School Complex, Anushaktinagar, Mumbai, Maharashtra, India

  • Nagasuma Chandra

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliations IISc Mathematics Initiative, Indian Institute of Science, Bangalore, Karnataka, India, Department of Biochemistry, Indian Institute of Science, Bangalore, Karnataka, India


Genetic differences contribute to variations in the immune response mounted by different individuals to a pathogen. Such differential response can influence the spread of infectious disease, indicating why such diseases impact some populations more than others. Here, we study the impact of population-level genetic heterogeneity on the epidemic spread of different strains of H1N1 influenza. For a population with known HLA class-I allele frequency and for a given H1N1 viral strain, we classify individuals into sub-populations according to their level of susceptibility to infection. Our core hypothesis is that the susceptibility of a given individual to a disease such as H1N1 influenza is inversely proportional to the number of high affinity viral epitopes the individual can present. This number can be extracted from the HLA genetic profile of the individual. We use ethnicity-specific HLA class-I allele frequency data, together with genome sequences of various H1N1 viral strains, to obtain susceptibility sub-populations for 61 ethnicities and 81 viral strains isolated in 2009, as well as 85 strains isolated in other years. We incorporate these data into a multi-compartment SIR model to analyse the epidemic dynamics for these (ethnicity, viral strain) epidemic pairs. Our results show that HLA allele profiles which lead to a large spread in individual susceptibility values can act as a protective barrier against the spread of influenza. We predict that populations skewed such that a small number of highly susceptible individuals coexist with a large number of less susceptible ones, should exhibit smaller outbreaks than populations with the same average susceptibility but distributed more uniformly across individuals. Our model tracks some well-known qualitative trends of influenza spread worldwide, suggesting that HLA genetic diversity plays a crucial role in determining the spreading potential of different influenza viral strains across populations.

Author summary

Levels of immunity to strains of H1N1 influenza can vary, depending on the individual. This strongly influences how the disease spreads in a population. Accounting for such variations is a major challenge for the epidemiology of infectious diseases. We study the effect of population-level genetic heterogeneity on the epidemic spread of different strains of H1N1 influenza. We model the immune response of specific ethnicities to a number of H1N1 viral strains, using this information to study disease spread for these (ethnicity, viral strain) epidemic pairs. Our results show that larger genetic diversity at the level of immune response, leading to the presence of susceptibility sub-populations with a broad distribution of susceptibilities, protects against the spread of influenza in a population. We also show that populations with a small number of highly susceptible individuals, but with a large number of less susceptible ones, should exhibit smaller outbreaks than populations with the same average susceptibility but where it is more uniformly distributed. Our work captures some qualitative trends of influenza spread worldwide, providing a first attempt at understanding how susceptibility heterogeneities arising from variations in immune response determine disease spread in populations.


A central aim of epidemiological studies is to identify factors that place some populations at greater risk of contracting an infectious disease than others [1]. Such factors can be associated with each of the three legs of the “epidemiologic triad” for infectious diseases, the combination of an external causative agent, a susceptible host, and an environment that links these two together [2]. Each of these could vary across populations. However, even if the causative agent was unique and environmental factors assumed to be largely common, variations intrinsic to the host can lead to large inhomogeneities in epidemic progression across populations [1, 2]. Such variations are ignored in standard formulations of compartment models for infectious diseases, which project all properties of the host onto a small set of states describing the host status. These states are typically taken to be susceptible, infected or recovered, with respect to the progress of the disease [3].

The influenza pandemic of 2009 originated in a new influenza virus, pandemic H1N1 2009 influenza A (pH1N1), to which a large fraction of the population lacked immunity [4]. The virus responsible is thought to have arisen from a mixture of a North American swine virus that had jumped between birds, humans and pigs, with a second Eurasian swine virus that circulated for more than 10 years in pigs in Mexico before crossing over into humans [4]. This pandemic caused extensive outbreaks of disease in the summer months of 2009, across the USA, Brazil, India and Mexico, leading on to high levels of disease in the winter months. The pandemic virus had almost complete dominance over other seasonal influenza viruses and was unusual in its clinical presentation, with the most severe cases occurring in younger age groups [4].

The severity of the H1N1 2009 pandemic can be assessed in terms of the basic reproduction number (R0), a fundamental dimensionless epidemiological parameter representing the average number of secondary infections caused by a typical infectious individual in a fully susceptible population. An R0 > 1 leads to an expected exponential increase in the number of infected individuals at early times, an increase which saturates before decreasing as infected individuals recover, whereas for R0 < 1, the number of infected individuals decreases monotonically. We compile estimates of R0 values for the pH1N1 epidemic across several countries from the literature, and list them in Table 1. Substantial variation in R0 values, ranging from about 1.2 at the lower end to values of 3 and above at the upper end, is evident from this table. This variation across countries illustrates the need to account for host-specific susceptibilities to disease. The immune response of the host, modulated by prior infections and vaccinations, is usually the central factor influencing R0, although location-specific contact rates and health-seeking behaviour contribute as well. In this work, we study how the spread of influenza in a population is affected by variation in naïve host immune response.

Table 1. R0 values for the pH1N1 epidemic in different parts of the world, compiled from literature.

Epidemics are typically modelled through deterministic compartmental-type models, represented by coupled non-linear ordinary differential equations. The SIR model is particularly well suited for studying the spread of influenza, since H1N1 is a virus which spreads from person-to-person through contact, without requiring a vector for transmission. The lack of a long incubation period and a relatively rapid recovery makes it possible to ignore the effects of immigration and emigration, as well as of births and deaths due to natural causes [3]. Models such as the SIR model and related models typically assume that individuals in the population are all alike, which allows one to reduce the number of model parameters to be estimated from data, and leads to mathematical models that can be more feasibly studied from an analytical or computational perspective. However, increasing efforts have been devoted during recent years to assessing the impact of individual heterogeneities in disease spread [14]. These heterogeneities can be of very different nature, when considering for example populations structured in specific spatial configurations [15–19], such as households [20] or age-structured populations [15], or when there exist heterogeneous individual susceptibilities, infectivities or recovery periods due, for example, to genetic [15, 21] or behavioural [22] reasons. Network or individual-based models provide a methodology for simulating each individual as a separate entity (an agent) with a specified susceptibility, an individual-specific ability to infect others as well as a specified time to recovery, while also being flexible enough to incorporate specific interaction patterns between agents. Such models, however, typically require estimating a large number of parameters. Individual-based models come with substantial overheads in terms of computational resources. In addition, their inherent stochasticity makes extensive averaging necessary [23, 24].

A straightforward generalisation of the simplest version of the SIR model involves sub-dividing populations into smaller groups or sub-populations. Individuals in each sub-population can be considered to be homogeneous, but individuals across different sub-populations can be modelled as responding differentially to the disease, as in the models of [18–21, 25–29]. Prior work has mainly focused on the theoretical analysis of these models, and relatively few attempts have been made to incorporate clinical or biological heterogeneities known to be relevant at the individual level, into population-level epidemic models. Incorporating such individual-level immunological information into population-level epidemic models accounting for susceptibility or infectivity heterogeneities has been recently identified as a major challenge for mathematical epidemiology [30].

Both innate and adaptive immune responses are initiated when an individual is exposed to the influenza virus. The innate response induces chemokine and cytokine production. Type I interferons are among the most important cytokines produced by the innate immune response and act to stimulate dendritic cells (DCs), enhancing their antigen production. The adaptive immune system can recognise the presence of an intracellular virus and mount a response only if a molecule called the human leukocyte antigen (HLA) binds to and ‘presents’ fragments of viral proteins (epitopes) to the extracellular environment. Professional antigen presenting cells such as DCs present viral antigens to CD4+ T-cells through HLA class-II and to CD8+ T-cells through HLA class-I molecules. The CD4+ T-helper cells promote a B-cell response and antibody secretion. HLA class-I molecules can be found on the surface of all cells, and interact with T-cell receptors (TCRs) present on CD8+ T-cells [31, 32]. These cells are also called cytotoxic T lymphocytes, or CTLs.

The central role of HLA-mediated presentation of antigens in the magnitude and specificity of CTL response in infectious diseases in general [33], and in influenza A in particular [34, 35], has been well studied. A recent study shows that the targeting efficiency of HLA, a function of the binding score of a given HLA allele and the conservation score of a given protein, correlates with the magnitude of the CTL response, and also with the mortality due to influenza A infection [36]. These studies also show that considering a single HLA allele is insufficient to determine the strength of the CTL response [34, 37].

Each individual has 6 HLA class-I alleles, the combination of all 6 alleles being referred to as an HLA genotype. Cross-reactivity between HLA alleles can result in two individuals with completely different HLA genotypes presenting the same number of high affinity epitopes [38]. Also, some alleles correlate with stronger (HLA-A*02 [34]) or weaker (HLA-A*24 [36]) CTL response to the influenza A virus. This raises a number of questions. Does a high risk allele always correlate with a severe influenza epidemic, or can the presence of diverse HLA alleles offset this risk? Are there specific patterns of susceptibility resulting from diversity in HLA, which can confer greater protection to a population? We answer these questions by using the full HLA genotype of each individual, and with an assumption that a person who presents a larger number of high affinity viral epitopes will mount a stronger CTL immune response than one who presents a smaller number [33–37, 39–43]. We use genetic diversity in HLA alleles to inform epidemiological parameters at the population level and study their influence on the epidemiological spread of H1N1 influenza.

We assume that all other factors affecting disease spread, such as contact patterns [44], health-seeking behaviour [45] and migration [46] are uniform among all individuals in a population, and across all populations. Such factors have been studied in the literature [44–46], largely using theoretical models or data collected for small cohorts. Immunological memory of an individual is also an important aspect of the immune response, and can be affected by factors such as the strain with which an individual was first infected [47, 48], prior history of infections [48] and inherited factors [49]. For lack of data regarding these factors, the model described in this work does not incorporate age and immunological history explicitly. To offset this limitation, we focus first on H1N1 strains isolated during the 2009 pandemic, for which immunological memory and vaccination proved insufficient to curb the spread of disease [4, 50]. We mine these data for characteristics which correlate with epidemic size, and test whether these correlations hold for strains isolated in years other than 2009.

In a previous paper [51], we developed a method to group together individuals who can be expected to have a similar CTL response, using the frequency of occurrence of HLA class-I alleles and the full proteome of the pathogen. We formulated an algorithm to generate all possible HLA genotypes given the frequency of occurrence of each allele in a particular population. Algorithms available through the IEDB resource [52] were used to predict the epitopes presented by each such HLA genotype. Clustering was then carried out on these HLA genotypes based on the number of epitopes presented from each viral protein. In this work, we use the algorithm presented in [51] to generate HLA genotypes and thereby predict high affinity epitopes presented by each such genotype. We thus identify sub-populations of individuals with comparable susceptibility to the virus. The relevant parameter in this case is the total number of such epitopes presented, irrespective of the viral protein from which these epitopes originate. We cluster individuals into groups based on this information, and use the clustering results to calculate the rate at which susceptible individuals become infected. This rate can be connected to the parameter β which appears in the conventional compartmental SIR model, which can be used to track the progress of the epidemic through the population. The prevalence of different HLA class-I alleles in different parts of the world is available through the Allele Frequency Net Database (AFND) [53]. Each population in the AFND is given an ethnicity tag. We predict epidemic sizes using our model for 61 such ethnicities, as well as for 81 strains of influenza A (H1N1) virus isolated in 2009, and 85 strains isolated before or after 2009, for which the genome (and hence proteome) sequence is known [54, 55].

Our results show that if we assume that the susceptibility of a given individual is inversely proportional to the number of high affinity epitopes that this individual presents for a given viral strain, we can qualitatively reproduce some known trends of influenza spread worldwide. Moreover, although the basic reproduction number R0 for a given population and a given viral strain remains the main parameter that controls the epidemic size, other characteristics of the population can also significantly impact epidemic spread. In particular, we show that a composition of HLA genotypes which results in sub-populations with widely differing susceptibilities confers protection against the spread of influenza. Moreover, populations where most of the individuals are less susceptible but where a small sub-set of individuals is highly susceptible, are better in terms of containing the disease than populations that are otherwise configured, even if they have the same value of R0. We show that the full distribution of susceptibilities across a population is required to predict the final epidemic size, but that one can extract useful information from low order moments of this distribution. Although these results are derived from pH1N1 strains, we find that the same trends apply even for viral strains isolated before or after 2009. We also show that populations with frequent occurrence of an allele associated with high risk for one strain do not always experience severe epidemics when considering influenza strains in general. We verify these conclusions by comparisons to synthetic data.

Materials and methods

To model epidemics at the population level, we use a deterministic SIR epidemic model. We describe a population as being formed out of a number of sub-populations. Each sub-population is defined according to their specific susceptibility to the viral strain. To define these sub-populations in practice, starting from biological data, we employ the probabilistic method developed in [51]. This method uses well-tested and benchmarked algorithms for epitope prediction [52] to predict the viral epitopes presented by individuals represented by different HLA class-I genotypes. We link these genotypes to individual susceptibility against the pathogen. We can then group individuals with comparable susceptibilities into well-defined sub-populations.

We represent different epidemic scenarios in terms of epidemic pairs, formed by considering both the pathogen (different influenza strains) and the specific population (in this work, ethnicities) with different sub-population structures. We then use the SIR framework to track the spread of influenza through the population. The ordinary differential equations used in the model are coded in Matlab and solved numerically using Matlab’s ode45 solver.

Generating HLA class-I genotypes

The frequency of different HLA class-I alleles for different ethnicities estimated through large-scale genotyping is available from public databases [53]. Each individual possesses three pairs of HLA class-I genes. One HLA-A, -B and -C allele is obtained from each parent. Provided we assume that these 6 alleles occur independently of each other, we can draw 2 alleles each from the full set of possible A, B and C alleles, sampling them according to the empirically measured prevalence of each allele in the population. Each combination of 6 alleles is referred to as an HLA genotype. The likelihood of finding an individual with the exact HLA genotype generated is given by the product of the likelihoods of finding each of the 6 alleles comprising the genotype. A generated genotype is only accepted if the likelihood of finding an individual with that genotype is larger than $10^{-6}$.
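The sampling and acceptance rule described above can be sketched as follows. This is a minimal illustration, not the implementation of [51]: the allele names and frequencies in `FREQS` are made-up placeholders, and the function names are hypothetical. Only the independence assumption, the product rule for the genotype likelihood and the $10^{-6}$ threshold come from the text.

```python
import random

# Hypothetical allele frequencies for one population (illustrative values only).
FREQS = {
    "A": {"A*02:01": 0.6, "A*24:02": 0.4},
    "B": {"B*07:02": 0.5, "B*35:01": 0.5},
    "C": {"C*04:01": 0.7, "C*07:02": 0.3},
}
THRESHOLD = 1e-6  # acceptance threshold on the genotype likelihood, as in the text


def sample_genotype(freqs, rng):
    """Draw two alleles per locus independently, weighted by allele frequency."""
    genotype = []
    for locus in ("A", "B", "C"):
        alleles = list(freqs[locus])
        weights = [freqs[locus][allele] for allele in alleles]
        genotype.extend(rng.choices(alleles, weights=weights, k=2))
    return tuple(genotype)


def genotype_likelihood(genotype, freqs):
    """Product of the frequencies of the six alleles in the genotype."""
    likelihood = 1.0
    for allele in genotype:
        locus = allele[0]  # in this toy table the first character names the locus
        likelihood *= freqs[locus][allele]
    return likelihood


def generate_genotypes(freqs, n_draws, seed=0):
    """Sample genotypes and keep those whose likelihood exceeds THRESHOLD."""
    rng = random.Random(seed)
    accepted = {}
    for _ in range(n_draws):
        genotype = sample_genotype(freqs, rng)
        likelihood = genotype_likelihood(genotype, freqs)
        if likelihood > THRESHOLD:
            accepted[genotype] = likelihood
    return accepted
```

With real allele tables containing many rare alleles, the threshold discards genotypes built from several low-frequency alleles; in the toy table above every genotype passes.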

Forming susceptibility sub-populations

An adaptive CD8+ T-cell mediated immune response can only be mounted against a virus if epitopes from the virus are presented by HLA class-I molecules. The binding between the epitope and the passing CTL takes place through a receptor called the T cell receptor (TCR). Not all TCRs are capable of recognising all viral epitopes. Thus if an individual presents a large number of high affinity epitopes, it is reasonable to assume that there is an enhanced probability that one or more of these epitopes can be recognised by their TCRs. Such individuals can be argued to have low susceptibility to the virus. Conversely, the ability of the immune system to present only a small number of epitopes will reduce the chance that they can be recognised. Such individuals can be argued to be more susceptible to the viral infection. This link between HLA class-I genotypes and disease susceptibility is supported, among others, by [33–37, 39–43].

Predicting epitopes.

For a given H1N1 influenza viral strain V and particular ethnicity E forming an epidemic pair (E, V), we predict the entire set of epitopes presented by each HLA class-I allele in that ethnicity using different algorithms available through the IEDB analysis resource [52]. A consensus of three algorithms is used: an artificial neural network [56], a stabilized matrix method [57], and a combinatorial peptide-library based method [58]. These three algorithms use very different approaches for predicting epitopes for a given HLA allele. In a study carried out by Sette et al., all peptides with strong binding affinity (IC50 < 50 nM) to their cognate allele were found to be immunogenic [59]. We restrict ourselves to predictions with high likelihood of being immunogenic by requiring coincident prediction by all three algorithms, and by only considering epitopes with predicted IC50 < 50 nM. From these results, we compute the number of high affinity epitopes presented by each individual, represented by their HLA genotype, in the population.
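The consensus filter can be illustrated with a short sketch. It assumes the per-method predictions are already available as peptide-to-IC50 tables; the function names and data layout are hypothetical and do not correspond to any IEDB interface. Counting distinct peptides over the alleles of a genotype is also an assumption made here for illustration.

```python
IC50_CUTOFF_NM = 50.0  # strong-binder cutoff quoted in the text


def consensus_epitopes(predictions):
    """Return peptides that every method predicts below the IC50 cutoff.

    predictions: dict mapping method name -> dict of peptide -> predicted IC50 (nM),
    all for the same HLA allele.
    """
    strong_sets = [
        {peptide for peptide, ic50 in by_peptide.items() if ic50 < IC50_CUTOFF_NM}
        for by_peptide in predictions.values()
    ]
    return set.intersection(*strong_sets) if strong_sets else set()


def epitope_count(genotype, epitopes_by_allele):
    """Number of distinct high affinity epitopes presented by a genotype.

    Assumption: a peptide presented by several alleles is counted once.
    """
    presented = set()
    for allele in set(genotype):
        presented |= epitopes_by_allele.get(allele, set())
    return len(presented)
```

A peptide that one method scores at 40 nM and another at 60 nM is rejected, which is what makes the consensus stricter than any single predictor.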

Susceptibility sub-populations.

The clustering of HLA genotypes into sub-populations is carried out on the basis of the number of epitopes presented, under the hypothesis that more susceptible individuals present fewer epitopes. Thus, we cluster individuals so that individuals within the same group present a similar number of epitopes, whereas individuals from different groups present different numbers of epitopes. The susceptibility of each such group is then

$$s_i \propto \frac{1}{e_i}, \tag{1}$$

where $s_i$ relates to the susceptibility of individuals in group i, and $e_i$ denotes the average number of epitopes presented by the HLA genotypes belonging to sub-population i. A discussion of the proportionality constant is provided in the section Estimating the proportionality constant.

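Using the number of individuals N in the population and the classification of genotypes in clusters, we can calculate the fraction of individuals $x_i$ in each sub-population $i \in \{1, \dots, m\}$, as

$$x_i = \frac{N_i}{N}, \tag{2}$$

where $N_i$ is the number of individuals in sub-population i.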

All the calculations described above are for a single (ethnicity, viral strain) epidemic pair. The values of all these parameters must be recalculated for each such epidemic pair being studied, since, among others, the parameter m depends on (E, V).
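As a rough illustration of how the pairs $(x_i, e_i)$ could be obtained from a list of genotypes, the sketch below groups genotypes into fixed-width bins of epitope count. The binning rule, the `bin_width` parameter and the likelihood weighting of the group averages are assumptions made for this example; the clustering actually used follows [51] and is not reproduced here.

```python
def susceptibility_groups(genotypes, bin_width=5):
    """Group genotypes by epitope count and summarise each group.

    genotypes: list of (likelihood, epitope_count) pairs.
    Returns a list of (x_i, e_i) sorted by increasing e_i, where x_i is the
    fraction of the population in group i and e_i is the likelihood-weighted
    mean number of epitopes presented in that group.
    """
    total = sum(likelihood for likelihood, _ in genotypes)
    bins = {}
    for likelihood, count in genotypes:
        key = int(count // bin_width)  # simple fixed-width binning (assumption)
        weight, weighted_count = bins.get(key, (0.0, 0.0))
        bins[key] = (weight + likelihood, weighted_count + likelihood * count)
    return [
        (weight / total, weighted_count / weight)
        for _, (weight, weighted_count) in sorted(bins.items())
    ]
```

The number of groups m is simply the length of the returned list, so it changes with the epidemic pair, in line with the remark above.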

Mathematical model

For each epidemic pair (E, V) we use an SIR-based model to study the spread of influenza. Each population is divided into susceptibility sub-populations; see Fig 1. Our main assumptions are:

  1. The population is closed and spatially well-mixed.
  2. All individuals in the population have equal infectivity and recovery rates.
  3. Individuals in each sub-population have the same susceptibility.
  4. Individuals in different sub-populations have different susceptibilities.
Fig 1. Model sketch.

The SIR model with susceptibility sub-populations used in this work. (a) Initially, individuals belong to one of the susceptibility sub-populations. Infection is seeded by $a$ initially infected individuals. (b) At the end of the epidemic, all individuals are either recovered or have never been infected.

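We use the SIR epidemic model of [21], considering a closed population of N susceptible individuals and $a$ initially infected individuals. The dynamics of the epidemic are represented by the coupled equations

$$\frac{dS_i}{dt} = -\beta_i S_i I, \quad i \in \{1, \dots, m\}, \tag{3}$$

$$\frac{dI}{dt} = \sum_{i=1}^{m} \beta_i S_i I - \gamma I, \tag{4}$$

$$\frac{dR}{dt} = \gamma I. \tag{5}$$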

Here $S_i(t)$, $I(t)$ and $R(t)$ are the numbers of susceptible (in sub-population i), infected and recovered individuals at time t, and the initial conditions are given by

$$S_i(0) = N_i = x_i N, \quad \sum_{i=1}^{m} N_i = N, \quad I(0) = a, \quad R(0) = 0. \tag{6–10}$$

We use a = 1 in our numerical calculations to represent a single infective individual who introduces the disease into a fully susceptible population.
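A minimal numerical sketch of this model is given below. The paper solves the equations with Matlab's ode45; the fixed-step fourth-order Runge-Kutta integrator here is a simple stand-in chosen to keep the example dependency-free, and the step size, end time and function names are assumptions.

```python
def sir_rhs(state, betas, gamma):
    """Right-hand side of the multi-group SIR model.

    state = [S_1, ..., S_m, I, R]; dS_i/dt = -beta_i*S_i*I,
    dI/dt = sum_i beta_i*S_i*I - gamma*I, dR/dt = gamma*I.
    """
    m = len(betas)
    infected = state[m]
    new_infections = [betas[i] * state[i] * infected for i in range(m)]
    return (
        [-rate for rate in new_infections]
        + [sum(new_infections) - gamma * infected, gamma * infected]
    )


def rk4_step(state, dt, betas, gamma):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return [s + h * ki for s, ki in zip(state, k)]

    k1 = sir_rhs(state, betas, gamma)
    k2 = sir_rhs(shifted(k1, dt / 2), betas, gamma)
    k3 = sir_rhs(shifted(k2, dt / 2), betas, gamma)
    k4 = sir_rhs(shifted(k3, dt), betas, gamma)
    return [
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    ]


def simulate(sizes, betas, gamma, a=1.0, dt=0.05, t_end=400.0):
    """Integrate from S_i(0) = N_i, I(0) = a, R(0) = 0 and return the final state."""
    state = list(sizes) + [a, 0.0]
    for _ in range(int(t_end / dt)):
        state = rk4_step(state, dt, betas, gamma)
    return state
```

Because the right-hand sides sum to zero, the total number of individuals N + a is conserved along the trajectory, which gives a quick sanity check on any integrator used.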

The parameter $\beta_i$ governing the infection of susceptible individuals belonging to the ith sub-population is assumed to be a composite of three factors,

$$\beta_i = \alpha \, c \, s_i. \tag{11}$$

We take $\alpha s_i \in [0, 1]$ to represent the probability of a successful contact between a susceptible individual from the ith sub-population, and an infective individual, leading to infection. The quantity α accounts for factors such as the infectiousness of the pathogen, or the infectivity of the infective individual, while $s_i$ is related to the susceptibility of individuals in sub-population i. The parameter c represents the average number of contacts per individual per unit time. We note here that, since the dimensions of c are person$^{-1}$ time$^{-1}$, $\beta_i$ has dimensions person$^{-1}$ time$^{-1}$. An alternative notation in the literature takes the infection rate to have units time$^{-1}$, with S and I representing proportions of susceptible or infected individuals, rather than numbers. This would be equivalent to working with the alternative parameter $\beta_i N$.

Since individuals in all the ethnicities are considered to be homogeneously mixed and all our numerical computations are carried out with the same number of individuals ($N + a = 10^4$), we assume the parameter c to be the same for all the epidemic pairs under consideration. Further, since our interest is in analysing the impact of susceptibility heterogeneities in the spread dynamics, we take α to be the same regardless of the epidemic pair (E, V) under consideration. Thus, when comparing the spread dynamics between two epidemic pairs, heterogeneity in susceptibilities emerges as the main factor in our models determining the difference in these dynamics.

Finally, we note that the parameter β, given by

$$\beta = \frac{1}{N} \sum_{i=1}^{m} N_i \beta_i, \tag{12}$$

can be seen as the counterpart of $(\beta_1, \dots, \beta_m)$ when the population is considered homogeneous. It corresponds to the parameter widely used in the literature and estimated from epidemiological data for different pathogens and populations, usually by estimating the basic reproduction number R0.

Estimating the proportionality constant

For a given (E, V) pair, and using Eq (1), the susceptibility of each sub-population is inversely proportional to the average number of epitopes presented by individuals in that group. Thus we can write $s_i = z / e_i$, where z is a proportionality constant which captures other components of the immune system that affect susceptibility, including all aspects of the innate and humoral immune response. We assume these aspects to be the same across all individuals and pairs, since only heterogeneities related to HLA profiles are considered in this work. Then, $\beta_i$ is given by

$$\beta_i = \alpha \, c \, s_i = \frac{y}{e_i}, \tag{13}$$

where $y = \alpha c z$ accounts for contributions to $\beta_i$ that are assumed to be the same across different individuals and pairs. The value of β in Eq (12) can be calculated as a weighted average of the $\beta_i$ values, as

$$\beta = \sum_{i=1}^{m} x_i \beta_i = y \sum_{i=1}^{m} \frac{x_i}{e_i}. \tag{14}$$

The quantity β is henceforth referred to as average susceptibility. We note that our algorithm reports ei = 0.07 as the minimum value of the average number of epitopes presented by a sub-population in any epidemic pair, so that β is always finite.

One way to obtain y is to scale to an experimentally determined value for β, given a specific ethnicity and viral strain (E0, V0). Values for β have historically been estimated using techniques such as serotyping the same set of people at different time points to estimate the change in the fraction of individuals susceptible to a given pathogen. Other methods are reviewed in [60]. Once we have a value of β for one epidemic pair (E0, V0), we can calculate values xi and ei for this epidemic pair using the HLA genotype generation, epitope prediction and clustering methods outlined above. These can be inserted into Eq (14), allowing us to compute the value y, which we have assumed to be the same across all epidemic pairs. Values of xi and ei for each pair (E, V) can be used, together with this value of y, to get a β for any pair (E, V).

In this work, we use the value of R0 estimated in [13] for the Mexico City population for the 2009 H1N1 pandemic originating in Mexico La-Gloria. This was chosen as a reference because HLA class-I allele frequency for this ethnicity, as well as the protein sequence of this viral strain were available. In [13], an exponential curve was fit to the data of number of infections over time during the initial phase of the epidemic. The distribution thus estimated was used to compute R0. The R0 estimated in this manner was 1.72. We use this R0 to compute β for this epidemic pair, and use the epitopes and sub-populations for the pair (E0, V0) = (Mexico City Mestizo pop 2, A/Mexico/LaGloria-8/2009) to estimate y. We note that we are using a particular β estimated in the literature for a specific pair (E0, V0) for computing y, and then considering y to be the same across different pairs. By doing this, we are scaling the rate of the event $S_i + I \to I + I$ in all the simulations for any pair (E, V) to the value of β obtained from data for the given pair (E0, V0).
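The calibration step can be sketched in a few lines. The conversion from R0 to β assumes the homogeneous-population relation R0 = βN/γ; the text does not spell out this step, so treat it as an assumption, along with the function names and the toy group values used in the usage note.

```python
def calibrate_y(r0_ref, gamma, n, groups_ref):
    """Estimate y from a reference R0.

    groups_ref: list of (x_i, e_i) for the reference pair (E0, V0).
    Assumes beta = R0 * gamma / N, then inverts beta = y * sum_i x_i / e_i.
    """
    beta_ref = r0_ref * gamma / n
    return beta_ref / sum(x / e for x, e in groups_ref)


def betas_for_pair(y, groups):
    """Sub-population infection parameters beta_i = y / e_i for any pair (E, V)."""
    return [y / e for _, e in groups]
```

For instance, with hypothetical reference groups `[(0.5, 3.6), (0.5, 12.0)]`, `calibrate_y(1.72, 1/3, 9999, ...)` returns a value of y which, fed back through `betas_for_pair`, reproduces an R0 of 1.72 for the reference pair by construction.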

Summary statistics for comparing epidemics

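We focus on the following global epidemiological characteristics:

  • R(∞), the number of individuals who have been infected by the end of the epidemic.
  • FI, the final epidemic size, expressed as the fraction of the population infected during the epidemic.
  • R0, the basic reproduction number.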

In our model, the ability of an individual to transmit the disease does not depend on the sub-population that the infected individual belongs to, since infectivity is considered to be the same across sub-populations. The SIR model of Eqs (3)–(10) was analysed in [21], where it was proved that $R(\infty)$ is the only positive solution of

$$R(\infty) = N + a - \sum_{i=1}^{m} N_i \, e^{-\beta_i R(\infty)/\gamma}, \tag{15}$$

and FI can be derived from $R(\infty)$ by applying $FI = R(\infty)/(N + a)$. The basic reproduction number R0 is the number of secondary infections that a typical infected person causes when introduced into a large population of susceptible individuals. In the classical SIR model for homogeneous populations, R0 is given by

$$R_0 = \frac{\beta N}{\gamma}. \tag{16}$$
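The final size relation can be solved without integrating the differential equations. The sketch below finds the positive root of R = N + a − Σ_i N_i exp(−β_i R/γ) by bisection; this relation follows from dividing the equation for each S_i by the equation for R and using conservation of the total population. The bisection method, the tolerance and the normalisation of FI by N + a are assumptions made for this example.

```python
import math


def final_size(sizes, betas, gamma, a=1.0, tol=1e-9):
    """Positive root R(inf) of R = N + a - sum_i N_i * exp(-beta_i * R / gamma)."""
    total = sum(sizes) + a

    def residual(r):
        still_susceptible = sum(
            n * math.exp(-b * r / gamma) for n, b in zip(sizes, betas)
        )
        return total - still_susceptible - r

    # residual(0) = a > 0 and residual(total) < 0, and the residual is concave,
    # so it changes sign exactly once on (0, total).
    lo, hi = 0.0, total
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def final_fraction(sizes, betas, gamma, a=1.0):
    """FI taken here as R(inf) / (N + a); the normalisation is an assumption."""
    return final_size(sizes, betas, gamma, a) / (sum(sizes) + a)
```

Comparing this root with R(T) from a long simulation is one way to check that the simulation has run long enough for the epidemic to die out.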

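In order to calculate R0 for our system of equations (Eqs (3)–(10)), we consider the case when a small number of infected individuals is introduced into a large population of N susceptible individuals. We assume the number of susceptible individuals ($S_i(0) = N_i$ for all i) to be large, such that $a \ll N_i$. This approaches the limit in which there is an unlimited source of susceptible individuals at the beginning of the epidemic. Then $a(t)$, the number of initially infected individuals who are still infectious at time t, declines as

$$\frac{da(t)}{dt} = -\gamma \, a(t), \tag{17}$$

and thus $a(t) = a(0) e^{-\gamma t}$. Let $I^{(1)}(t)$ be the number of secondary infections caused up to time t, with $I^{(1)}(0) = 0$, by the $a$ initially infected individuals. Then

$$\frac{dI^{(1)}(t)}{dt} = \sum_{i=1}^{m} \beta_i N_i \, a(t), \tag{18}$$

so that $I^{(1)}(t) = \frac{a(0)}{\gamma} \left(1 - e^{-\gamma t}\right) \sum_{i=1}^{m} \beta_i N_i$.

The basic reproduction number is given by the long-time limit of the number of secondary infections caused by a single initial infective. Since

$$\lim_{t \to \infty} I^{(1)}(t) = \frac{a(0)}{\gamma} \sum_{i=1}^{m} \beta_i N_i, \tag{19}$$

by setting a(0) = 1 we get

$$R_0 = \frac{1}{\gamma} \sum_{i=1}^{m} \beta_i N_i. \tag{20}$$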

For m = 1, this expression leads to the well-known basic reproduction number for the homogeneous case (Eq (16)).
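As a quick check of this reduction, the sketch below computes R0 as the sum of β_i N_i divided by γ and compares it with the closed-form count of secondary infections, a(0)(1 − exp(−γt)) Σ_i β_i N_i / γ, at large t. The function names are hypothetical.

```python
import math


def basic_reproduction_number(sizes, betas, gamma):
    """R0 = (1 / gamma) * sum_i beta_i * N_i."""
    return sum(b * n for b, n in zip(betas, sizes)) / gamma


def secondary_infections(t, sizes, betas, gamma, a0=1.0):
    """Secondary infections caused up to time t by a0 initial infectives,
    assuming an effectively unlimited pool of susceptibles."""
    force = sum(b * n for b, n in zip(betas, sizes))
    return a0 * force / gamma * (1.0 - math.exp(-gamma * t))
```

With a single group, `basic_reproduction_number([N], [beta], gamma)` is simply beta * N / gamma, the homogeneous expression.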

Parameters characterising epidemic pairs.

Our model predicts values of FI and R0 for each pair (E, V). Any given epidemic pair (E, V) corresponding to an ethnicity E and a viral strain V has a susceptibility profile described by the number m of sub-populations, and by vectors (β1, …, βm) and (N1, …, Nm). The susceptibility profile of any epidemic pair (E, V) is described by a Susceptibility Profile Vector (SPV)

The quantities FI and R0 can be expected to directly depend on the SPV(E, V), where we omit (E, V) from now on for ease of notation. For example, it is clear that for a given epidemic pair, R0 directly depends on the total number of individuals, N, the recovery rate, γ, and the average susceptibility β = (1/N) Σi βi Ni.

On the other hand, the quantity of central interest to epidemic modeling, the final epidemic size FI for a given epidemic pair, could depend on the full distribution of the SPV. For concreteness, we examine the dependence of FI on the lower order moments of the distribution, such as the standard deviation, the skewness and the coefficient of variation, defined respectively as

We note that a long left tail of the distribution represented by SPV would result in Sk(SPV) < 0, indicating the presence of a small number of individuals with susceptibility significantly lower than the mean. On the other hand, when the population has a small representation of individuals with susceptibility significantly higher than the mean, we have Sk(SPV) > 0.
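As an illustration of these summary characteristics, the sketch below computes the mean, standard deviation, coefficient of variation and skewness of a susceptibility profile. It assumes population (not sample) moments in which each βi is weighted by its sub-population size Ni; the exact normalisation used for the paper's definitions is not visible here, and the function name and example values are hypothetical.

```python
import math


def spv_statistics(betas, sizes):
    """Weighted moments of a susceptibility profile (beta_i occurring N_i times).

    Assumes population moments: each beta_i is weighted by N_i / N.
    """
    total = float(sum(sizes))
    mean = sum(b * n for b, n in zip(betas, sizes)) / total
    variance = sum(n * (b - mean) ** 2 for b, n in zip(betas, sizes)) / total
    std = math.sqrt(variance)
    if std == 0.0:
        # Homogeneous profile (m = 1): no spread, skewness undefined.
        return {"mean": mean, "std": 0.0, "cv": 0.0, "skew": float("nan")}
    third = sum(n * (b - mean) ** 3 for b, n in zip(betas, sizes)) / total
    return {"mean": mean, "std": std, "cv": std / mean, "skew": third / std ** 3}


# A few highly susceptible individuals among many less susceptible ones.
few_high = spv_statistics([1e-4, 5e-4], [9000, 1000])
# The mirror image: a few individuals with low susceptibility.
few_low = spv_statistics([1e-4, 5e-4], [1000, 9000])

print(few_high["skew"], few_low["skew"])
```

With these conventions, the first profile has positive skewness and the second has negative skewness, matching the interpretation of Sk(SPV) given above.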


The workflow used in this paper is summarised in Fig 2.

Fig 2. Workflow.

Summary of the steps carried out in this work. Inputs from external sources are shown in dotted parallelograms.


To compute FI, we solve Eqs (3)–(10) with N + a = 10⁴ individuals, a = 1. Each simulation is run over the time interval (0, T), where T is large enough to ensure that the epidemic has died out. In particular, T is chosen to be large enough for each considered epidemic pair so that R(T) ≈ R(∞) obtained from the simulation satisfies Eq (15) with some error ϵ < 10⁻². The recovery rate used was γ = 1/3 day⁻¹ [13].
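The simulation procedure can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the model takes the common multi-compartment form dSi/dt = −βi Si I and dI/dt = I (Σi βi Si − γ), with R obtained from conservation of the total population, and it takes FI to be the fraction of the initially susceptible population that is eventually infected. The consistency check uses the final-size relation that follows from these assumed equations, R(∞) = N + a − Σi Ni exp(−βi R(∞)/γ). The function name, step size and stopping rule are hypothetical choices.

```python
import math


def simulate_sir(betas, sizes, gamma=1.0 / 3.0, a=1.0, dt=0.05,
                 t_max=5000.0, i_stop=1e-6):
    """Integrate the assumed multi-compartment SIR model with classical RK4.

    Runs until the number of infected individuals drops below i_stop
    (or t_max is reached). Returns (FI, R(T), residual of the final-size relation).
    """
    def rhs(s, i):
        flows = [b * x * i for b, x in zip(betas, s)]   # new infections per group
        return [-f for f in flows], sum(flows) - gamma * i

    s = [float(n) for n in sizes]
    i = float(a)
    t = 0.0
    while t < t_max and i > i_stop:
        k1s, k1i = rhs(s, i)
        k2s, k2i = rhs([x + 0.5 * dt * k for x, k in zip(s, k1s)], i + 0.5 * dt * k1i)
        k3s, k3i = rhs([x + 0.5 * dt * k for x, k in zip(s, k2s)], i + 0.5 * dt * k2i)
        k4s, k4i = rhs([x + dt * k for x, k in zip(s, k3s)], i + dt * k3i)
        s = [x + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
             for x, p, q, r, w in zip(s, k1s, k2s, k3s, k4s)]
        i += dt / 6.0 * (k1i + 2.0 * k2i + 2.0 * k3i + k4i)
        t += dt

    n_total = float(sum(sizes))
    r_t = n_total + a - sum(s) - i                     # R(T) by conservation
    predicted = n_total + a - sum(
        n * math.exp(-b * r_t / gamma) for b, n in zip(betas, sizes))
    residual = abs(r_t - predicted)                    # should be well below 1e-2
    fi = (n_total - sum(s)) / n_total                  # fraction of susceptibles infected
    return fi, r_t, residual


# Hypothetical two-group population with N + a = 10^4 and a = 1.
fi, r_t, residual = simulate_sir([1.4e-4, 0.2e-4], [3333, 6666])
print(fi, r_t, residual)
```

Stopping once I(t) falls below a small threshold is one simple way of choosing T so that the epidemic has died out; the residual returned by the function plays the role of the error ϵ in the text.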

The input to Eqs (3)–(10) was determined for 61 ethnicities and 81 viral strains isolated in 2009, leading to the study of 4,941 epidemic pairs. Of these, 1,392 cases had R0 > 1, and 718 cases had FI > 0.5. The distributions of SPV characteristics across these 4,941 epidemic pairs are provided in Fig 3. The number m of susceptibility sub-populations varied from 1 (578 cases) to 23 (1 case, A/Giessen/6/2009 with Kenya Nandi ethnicity). The most common value for m was 5, seen in 647 cases spanning 80 strains and 32 ethnicities. Details regarding ranges of calculated parameters for strains isolated before or after 2009 can be found in the supporting information; see S1 Fig. All estimated parameters are provided for all epidemic pairs in a supplementary file; see S1 File.

Fig 3. Variations in SPV characteristics, 2009 strains.

Histograms for the values of the different susceptibility profile vector characteristics for the 4,941 epidemic pairs involving H1N1 strains isolated in 2009: (a) σ(SPV); (b) Sk(SPV); (c) CV(SPV); (d) m; and (e) β.

Results presented in upcoming sections are for H1N1 strains isolated in 2009, unless stated otherwise.

Dependence of epidemic size and R0 on average susceptibility

We first examine the relationship between the average susceptibility (β), the basic reproduction number (R0) and the epidemic size (FI); see Fig 4. We note that Eq (16) predicts a linear relationship between R0 and β. As can be seen in Fig 4(a), most (E, V) pairs have β < 2 × 10⁻⁴ person⁻¹ day⁻¹, while pairs with higher values of β correspond to those with large epidemic sizes (FI > 0.6). These pairs have R0 > 7, implying β > 2.33 × 10⁻⁴ person⁻¹ day⁻¹ from Eq (20).

Fig 4. R0 cannot predict epidemic size exactly.

The dependence of FI on R0 and β is shown for: (a) all epidemic pairs involving strains isolated in 2009; (b) epidemic pairs involving strains isolated in 2009, and with R0 < 7. Only epidemic pairs with m > 1 are plotted. The red line shows the epidemic size in the case of homogeneous susceptibilities. We see that when R0 > 1, FI takes on a wide range of values for any given R0.

In Fig 4(b) we focus on epidemic pairs with R0 < 7. In this plot, there are a large number of points with epidemic size FI ≈ 0. Upon closer examination, these points turn out to have R0 < 1, as expected. We note that R0 = 1 implies β = 0.33 × 10⁻⁴ person⁻¹ day⁻¹, which corresponds to the point in Fig 4(b) where the epidemic size starts to rise above 0. In all further plots, we focus on the (E, V) pairs where 1 < R0 < 7 and m > 1, leading to the analysis of 956 epidemic pairs.

No single parameter predicts epidemic size

In Fig 4(b), where the relationship between β and FI is shown, it can be seen that a high value of average susceptibility leads to a larger epidemic. The red line corresponds to the epidemic size when the susceptibility compartment is homogeneous (i.e., m = 1). We see that this line forms an upper bound on the FI values for epidemic pairs with m > 1. It has been proved that the final epidemic size is always lower in an epidemic pair with heterogeneous susceptibility, than an epidemic pair with the same average susceptibility but with homogeneous susceptibility [21, 28, 44, 61]. The predictions in our simulations agree with this result. However, we observe a spread of FI values when considering epidemic pairs containing heterogeneous susceptibilities and having the same average susceptibility β; see Eq (12). This shows that heterogeneity plays a role in determining the extent of an epidemic even when the average susceptibility remains constant.
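The effect of heterogeneity at fixed average susceptibility can be illustrated without running a full simulation. The sketch below assumes the model form dSi/dt = −βi Si I with a common pool of infected individuals, for which the final size satisfies R(∞) = N + a − Σi Ni exp(−βi R(∞)/γ); this relation is an assumption made for illustration and is solved here by bisection. The function name and the three example profiles are hypothetical; all three share the same average β.

```python
import math


def final_size(betas, sizes, gamma=1.0 / 3.0, a=1.0, tol=1e-10):
    """Fraction of the initially susceptible population that is eventually infected.

    Solves g(R) = R - (N + a - sum_i N_i * exp(-beta_i * R / gamma)) = 0 by bisection.
    g is convex with g(0) = -a < 0 and g(N + a) > 0, so the positive root is unique.
    """
    n_total = float(sum(sizes))

    def g(r):
        return r - (n_total + a - sum(
            n * math.exp(-b * r / gamma) for b, n in zip(betas, sizes)))

    lo, hi = 0.0, n_total + a
    while hi - lo > tol * (n_total + a):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    r_inf = 0.5 * (lo + hi)
    return (r_inf - a) / n_total


sizes = [3333, 6666]
# Same average susceptibility (6e-5 person^-1 day^-1), different spread.
fi_wide = final_size([1.4e-4, 0.2e-4], sizes)      # large spread in beta_i
fi_narrow = final_size([0.8e-4, 0.5e-4], sizes)    # small spread in beta_i
fi_hom = final_size([0.6e-4], [9999])              # homogeneous, m = 1

print(fi_wide, fi_narrow, fi_hom)
```

In this example the homogeneous population gives the largest final size, and the profile with the wider spread gives the smallest, in line with the bound and the spread of FI values described above.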

To study what aspects of this heterogeneity have the greatest impact on epidemic size, we examine the dependence of FI on the characteristics of the susceptibility profile vector discussed above (m, β, σ(SPV), CV(SPV) and Sk(SPV)); see Fig 5. The main trends that can be identified are the following:

  • Epidemic pairs leading to positive skewness of the SPV seem to yield smaller epidemic sizes on average; see Fig 5(a).
  • Pairs corresponding to SPV with larger coefficient of variation also yield smaller epidemic sizes; see Fig 5(c).
  • Epidemic pairs containing more sub-populations (larger m) correspond to small epidemic sizes; see Fig 5(d).

Fig 5. FI as a function of SPV characteristics.

The dependence of FI on several characteristics of the susceptibility profile vector of each (E, V) pair involving an H1N1 strain isolated in 2009: (a) skewness of the SPV; (b) standard deviation of the SPV; (c) coefficient of variation of the SPV; (d) number of susceptibility sub-populations, m.

In our data, a positive value for Sk(SPV) corresponds to epidemic pairs with R0 < 2. Fig 4 shows that even with such small values for R0, FI can take on a wide range of values, going up to 0.8. Yet, epidemic pairs in our data set with positive Sk(SPV) always have FI < 0.2; see Fig 5(a). This suggests that having a positive skewness, corresponding to a distribution where most of the people have low susceptibility, but a small number of people have susceptibility significantly larger than the mean, lends some protective effect to the population.

Although σ(SPV) does not directly affect R0, it influences it indirectly due to the positive correlation between β and σ(SPV). To remove this correlation, one can analyse CV(SPV) instead; see Fig 5(c). This figure indicates that epidemic pairs with larger values of CV(SPV) lead to smaller epidemic sizes.

We provide correlation coefficients r(θ, τ) ∈ (−1, 1) between our summary statistics τ ∈ {FI, R0} and SPV characteristics θ ∈ {m, β, CV(SPV), σ(SPV), Sk(SPV)} in Table 2. The parameter β provides the best predictor for both R0 and FI. On the other hand, the heterogeneity described by CV(SPV), and the skewness of the susceptibility distribution described through Sk(SPV), also emerge as good predictors of FI.

Table 2. Correlation coefficients r(θ, τ) between summary statistics of the epidemic and SPV characteristics.

To further examine the connections between the SPV characteristics θ ∈ {m, β, CV(SPV), σ(SPV), Sk(SPV)} and τ ∈ {FI, R0} and concentrating specifically on the role of σ(SPV) and m, we describe two case studies below.

Case study 1—σ(SPV).

Fig 5(b) shows that most of the epidemic pairs in our data set have σ(SPV) < 10⁻⁴ person⁻¹ day⁻¹. Although the correlation between σ(SPV) and FI is not statistically significant (see Table 2), we notice that a high value of σ(SPV) (> 1.5 × 10⁻⁴ person⁻¹ day⁻¹) corresponds to moderate values for FI. We examine two (E, V) pairs with high σ(SPV); see pairs 1 and 2 in Table 3 and their corresponding epidemic dynamics in Fig 6. These two pairs have similar values for σ(SPV), and yet have significantly different epidemic sizes (0.48 for pair 1, and 0.74 for pair 2). We also see from Fig 6(b) and 6(d) that the infection runs its course faster in pair 2 than in pair 1. Both these phenomena can be explained by the fact that pair 2 has a significantly higher β (2.33 × 10⁻⁴ person⁻¹ day⁻¹, compared to 1.42 × 10⁻⁴ person⁻¹ day⁻¹ for pair 1). We can also see from Fig 6 that the sub-population with highest βi is the one most affected by the infection, while the sub-populations with low βi remain largely uninfected, in both pairs 1 and 2. We will see in further sections that β and σ(SPV) together correlate well with epidemic size.

Fig 6. Case study 1—σ(SPV).

Simulation results for epidemic pairs 1 (a)-(b) and 2 (c)-(d) in Table 3. Distribution of βi values in the population (left): the x-axis represents values of ln(βi), and the y-axis shows values of Ni. The red vertical line corresponds to the average susceptibility β. Time course of the epidemic (right) in terms of variables Si(t) (solid) for each sub-population, and I(t) (dashed).

Table 3. Case study 1—Studying the predictive power of σ(SPV).

Case study 2—m.

In Fig 5(d) there appears to be some negative correlation between m and FI, with larger values of m corresponding to smaller epidemic sizes; see Table 2. However, we note that this is more an artefact of the data than a predictive trend, and it is possible to have epidemic pairs with a large value of m but very different final epidemic sizes and epidemic time-course dynamics. This can be seen for example in Fig 7 for epidemic pairs 3 and 4 from Table 4. Once again, the pair with higher average susceptibility has both a larger epidemic size, and also a faster time course for the spread of the disease.

Fig 7. Case study 2—m.

Simulation results for epidemic pairs 3 (a)-(b) and 4 (c)-(d) in Table 4. Distribution of βi values in the population (left): the x-axis represents values of ln(βi), and the y-axis shows values of Ni. The red vertical line corresponds to the average susceptibility β. Time course of the epidemic (right) in terms of variables Si(t) (solid) for each sub-population, and I(t) (dashed).

Table 4. Case study 2—Studying the predictive power of m.

Dependence of R0 on SPV characteristics

We address the question of whether R0 can be estimated from the SPV characteristics θ ∈ {m, CV(SPV), σ(SPV), Sk(SPV)}; see Fig 8. The linear relationship between β and R0 follows from Eq (16). In Fig 8(d), we once again observe that (E, V) pairs with m > 10 have low R0. As observed in case study 2, this is more an artefact of biases in the real data than a general trend. In Fig 8(b), we plot σ(SPV) against R0. Although σ(SPV) does not directly affect R0, we see this shape due to the relationship in the data between β and σ(SPV).

Fig 8. R0 as a function of SPV characteristics.

The dependence of the basic reproduction number R0 on several characteristics of the susceptibility profile vector of each (E, V) pair considering H1N1 strains isolated in 2009: (a) skewness of the SPV; (b) standard deviation of the SPV; (c) coefficient of variation of the SPV; (d) number of susceptibility sub-populations, m. Only epidemic pairs (E, V) with 1 < R0 < 7 and m > 1 are plotted.

Epidemic size largely correlates with select pairs of parameters

Earlier, we examined the correlation between FI and the SPV characteristics θ ∈ {m, β, CV(SPV), σ(SPV), Sk(SPV)} independently. This raises the question of whether accounting for pairs of such parameters might provide a more accurate prediction of FI. We study here how pairs of parameters are related to epidemic size in all (E, V) pairs with 1 < R0 < 7 and m > 1. We find that pairs involving the average susceptibility β, as well as the heterogeneity parameters Sk(SPV), CV(SPV) and σ(SPV) are better predictors of the final epidemic size than these quantities individually. Plots involving these parameters are shown in Fig 9, while multiple correlation coefficients are shown in Table 5. All other parameter pairs are plotted in supporting information; see S2 Fig. In particular, note that:

  • Epidemic pairs containing a susceptibility profile vector leading to large values of CV(SPV), small values of β, and positive Sk(SPV) experience smaller final epidemic sizes.
  • Epidemic pairs with positive Sk(SPV) are also the ones with small average susceptibility, and they lead to small final epidemic sizes.

Fig 9. FI as a function of pairs of SPV characteristics.

(a) (Sk(SPV), σ(SPV)); (b) (CV(SPV), σ(SPV)); (c) (CV(SPV), β); (d) (Sk(SPV), β); (e) (σ(SPV), β); (f) (CV(SPV), Sk(SPV)). FI is shown as a colourbar.

Table 5. Correlation coefficients r((θ1, θ2), τ) between summary statistics of the epidemic and SPV characteristics pairs.

From Fig 9, we see that for a given β, FI decreases with increasing σ(SPV). It also decreases as Sk(SPV) is made more positive, or as CV(SPV) is increased. This shows that for intermediate values of β such as the ones shown in Fig 9, a higher spread in βi values helps to protect the population against the epidemic spread. In other words, a population with higher genetic heterogeneity in susceptibility to a virus, leading to susceptibility sub-populations with a large spread in susceptibilities, can be expected to have a smaller epidemic than a population where most of the people have similar susceptibility, despite both populations being non-homogeneous in susceptibility.

We also observe that for a given value of β, an (E, V) pair with Sk(SPV) > 0 or only slightly negative corresponds to a smaller FI than one for which Sk(SPV) is a large negative value. We interpret this in the following way: populations containing a small sub-set of individuals with heightened susceptibility, but in which most of the individuals are less susceptible, are better protected against the disease than populations where the susceptibility is more uniformly distributed, even if the mean susceptibility is the same.

We provide in Table 5 multiple correlation coefficients between our summary statistics τ ∈ {FI, R0} and SPV characteristics pairs (θ1, θ2) ∈ {m, β, CV(SPV), σ(SPV), Sk(SPV)}².
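For readers unfamiliar with multiple correlation, the sketch below shows one standard way to compute it for two predictors from the three pairwise Pearson coefficients, R² = (r1² + r2² − 2 r1 r2 r12) / (1 − r12²). Whether Table 5 was computed in exactly this way is not stated in the text, so this is an assumption; the function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def multiple_correlation(x1, x2, y):
    """Multiple correlation of y with the pair (x1, x2).

    Uses the two-predictor formula; x1 and x2 must not be perfectly correlated.
    """
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    r_squared = (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 ** 2)
    # Clamp to [0, 1] to absorb floating-point rounding.
    return math.sqrt(min(1.0, max(0.0, r_squared)))


# Toy data: y_exact is an exact linear combination of x1 and x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y_exact = [2.0 * a + 3.0 * b for a, b in zip(x1, x2)]
y_noisy = [1.0, 3.0, 2.0, 5.0, 4.0]

print(multiple_correlation(x1, x2, y_exact))
print(multiple_correlation(x1, x2, y_noisy))
```

The multiple correlation coefficient is never smaller than the absolute value of either single-parameter coefficient, which is why pairs of SPV characteristics can only match or improve on the individual correlations.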

Predictions broadly track trends in pH1N1 (2009) burden

The 2009 pandemic of H1N1 was closely tracked by many organisations in the world, including the World Health Organization (WHO). For example, [62, Fig 3] indicates that certain areas of the world experienced a larger number of cases than others. In particular, we see that China and Japan experienced worse epidemics than Russia, which tends to have relatively smaller epidemics.

To compare the predictions of our model with these observations, we select viral strains isolated in these regions during the 2009 pandemic, and ethnicities corresponding to these countries. Note that our model works with individual ethnicities, while the available data are for countries, each of which comprises multiple ethnicities. We find that different ethnicities from the same country experience widely differing epidemic sizes for the same viral strain; see S1 File. For this comparison, we select ethnicities available in our data set from each of these countries, for which the predictions most closely resemble the observations in [62, Fig 3]; see Fig 10. As can be seen in Fig 10, our method predicts that most Chinese ethnicities will experience severe epidemics regardless of the viral strain. On the other hand, Russia and Japan are predicted to experience smaller epidemics for most viral strains. However, we note that for most of the Japanese strains, the Japanese ethnicity will suffer larger epidemic sizes than the Russian one, thus qualitatively agreeing with what can be observed in [62, Fig 3].

Fig 10. Qualitative trends captured by our model.

Epidemic sizes predicted by our model for different ethnicities and strains corresponding to China, Russia, and Japan, during the 2009 influenza A/H1N1 pandemic. Qualitatively, the predictions broadly track trends observed worldwide.

An interesting case is the strain A/Japan/921/2009 (H1N1), which was one of the strains circulating in Japan during the 2009 pandemic. This strain is predicted to cause severe epidemics in most ethnicities, and this holds true across all the 61 ethnicities considered in the data set.

The ethnicity Russia Tuva pop 2 is predicted to experience moderate epidemics for viral strains isolated in Russia, but a slightly worse epidemic for the strain A/Hyogo/2/2009 (H1N1), isolated in Japan. Thus, our methods bear out the idea that the severity of an influenza epidemic in a given country should not be dictated entirely by the genetic makeup of the hosts, but should also depend on the particular strain of the pathogen circulating in this country. Our predictions suggest that the ability of HLA class-I alleles in the ethnicity Russia Tuva pop 2 to present epitopes from the influenza A (H1N1) virus changes significantly across different viral strains.

The results described above show that even with a model that only incorporates susceptibility heterogeneities in terms of epitope presentation through HLA class-I alleles, we can qualitatively explain some essential trends observed across the world during the 2009 H1N1 pandemic. This serves as a qualitative validation of our methodology. Moreover, our results suggest that while some trends in influenza spread worldwide can be explained by the average susceptibility of each ethnicity to each strain, others might have an explanation related to the particular genetic diversity within each ethnicity for a given strain. For example, when analysing pairs 5 and 6 in Table 6, we can see that the same value of β can lead to different epidemic sizes for the same strain when considering the China Yunnan Province Lisu and the Japan pop 5 ethnicities. This is likely related to the fact that Sk(SPV) is significantly more negative for the Chinese ethnicity, and the coefficient of variation is smaller, leading to a larger epidemic size. A similar behaviour can be seen when considering pairs 7-9 in Table 6. Larger reproduction numbers can still lead to smaller epidemic sizes if Sk(SPV) is closer to 0 (or positive), and for more heterogeneous populations (larger values of CV(SPV)), which might explain smaller epidemic sizes in, for example, the Kenya Luo ethnicity compared to the Chinese ones [62, Fig 3].

Table 6. Select case studies to study the observed behaviour in Fig 10.

Similar trends are observed in H1N1 strains isolated in years other than 2009

We carry out parameter estimation and simulations as described in previous sections, for 85 strains of H1N1 influenza isolated in years other than 2009. This includes 15 strains isolated before 2000, 21 strains isolated between 2000 and 2008 (inclusive), and 49 strains isolated after 2009. We find that the trends identified in Fig 9 apply even for these strains; see supporting information S3 Fig.

Response of some indigenous ethnicities to H1N1

Several studies have reported that during the 2009 pandemic, indigenous ethnicities experienced more severe epidemics than their non-indigenous counterparts [63–65]. The indigenous ethnicities in our data set are USA Alaska Yupik, Australia Yuendumu Aborigine, and Australia Cape York Peninsula Aborigine. We find that the ethnicity USA Alaska Yupik is always predicted to have a worse epidemic than non-indigenous ethnicities from the USA, irrespective of the strain being considered. Since our data set does not include any non-indigenous ethnicities from Australia, we are unable to verify whether or not a similar statement holds true for the Australian aboriginal ethnicities.

In general, we find that the ethnicity Australia Cape York Peninsula Aborigine, with average FI = 0.14 when considering all 166 viral strains, is predicted to experience a marginally worse epidemic than Australia Yuendumu Aborigine, whose average FI = 0.08. Interestingly, this trend is reversed when we focus on the strains A/Auckland/1/2009 and A/Auckland/597/2000 isolated in Australia. Australia Cape York Peninsula Aborigine has R0 < 1 for both these strains, but Australia Yuendumu Aborigine has R0 = 1.49 for the strain A/Auckland/1/2009; see Table 7.

Table 7. Case study 3—Studying Australian aboriginal ethnicities.

Based on the observations during the 2009 pandemic, it has been suggested that aboriginal communities should be prioritised during vaccination [63, 64]. However, the predictions in Table 7 suggest that, at least from the perspective of HLA alleles and downstream CTL response, each influenza strain and each aboriginal community needs to be assessed independently. Using our model, it is possible to predict whether or not a new strain will cause a worse epidemic than a strain in the data set, within the constraints of the assumptions made. Predictions such as these could help optimise the deployment of resources when combating a new strain of influenza.

High risk alleles for one strain do not always correlate with severe epidemics in general

The frequency of the HLA class-I allele HLA-A*24 has been found to correlate with mortality rate due to the pandemic H1N1 (2009) influenza virus [36]. We rank ethnicities in our data set in descending order of their average FI across all 166 strains of influenza, and find that the ethnicity USA Alaska Yupik has the highest prevalence of allele HLA-A*24:02, and also has the worst average epidemic size; see Table 8. The ethnicity with the next highest frequency of allele HLA-A*24:02, Japan Central, has very low average epidemic size, and ranks 52nd among 61 ethnicities. The ethnicity Japan pop 3 has a frequency of the allele HLA-A*24:02 comparable to that of Japan Central, but is ranked 28th based on its average epidemic size. These results show that an allele whose frequent occurrence correlates with a high risk for one influenza strain does not always correlate with a severe epidemic when considering influenza strains in general. Rather, we need to estimate the full profile of the SPV, or at least the summary characteristics with strong correlation as described in previous sections.

Table 8. Top 3 ethnicities with high risk allele HLA-A*24:02.

Synthetic data supports the observed behaviour

Does the behaviour discussed in the preceding sections rely on correlations between SPV characteristics that are specific to the epidemic pairs we analyse? These correlations arise directly from genetic heterogeneities at the HLA genotype level corresponding to the 61 ethnicities and 166 viral strains considered here. However, we could frame our questions more generally. For example, we could ask if a positive skewness of the SPV would always be a protective characteristic for the population, given a fixed average β?

To address these and similar questions, we construct a synthetic data set of 10⁴ epidemic pairs created within the following parameter ranges: where emin and emax are the minimum and maximum values of ei in the real data set analysed in previous sections. These distributions have been chosen so that we obtain 10⁴ epidemic pairs with values in the interval 1 < R0 < 7, m > 1, with Ni and βi distributed within ranges that are comparable to those of the original data set.

For this synthetic data set, we plot in Fig 11 the predicted final epidemic size as a function of the different SPV characteristics. In Tables 9 and 10, correlation coefficients for single and paired SPV characteristics, and summary statistics FI and R0, are provided for the epidemic pairs in this synthetic data set. A direct inspection of the results in Fig 11 and Tables 9 and 10 leads to the following conclusions:

  • Large values of β lead to larger epidemic sizes. However, β alone cannot explain FI, and other characteristics of the SPV need to be taken into account, as for the original data set; see Fig 11(a).
  • Positive skewness leads to smaller epidemic sizes than negative skewness scenarios, as observed for the original data set; see Fig 11(b).
  • The larger the heterogeneity (in terms of σ(SPV) or CV(SPV)), the more protected the population is against epidemic spread. This is not a consequence of the value of m. Rather, it is the particular combination of βi and Ni values which has an impact on the epidemic dynamics; see Fig 11(c)–11(e).

Table 9. Correlation coefficients r(θ, τ) between summary statistics of the epidemic and SPV characteristics for the synthetic data set.

Table 10. Correlation coefficients r((θ1, θ2), τ) between summary statistics of the epidemic and SPV characteristics pairs, for the synthetic data set.

Fig 11. FI as a function of SPV characteristics—Synthetic data set.

Dependence of epidemic size on several characteristics of the SPV is analysed for the synthetic data set described in the text. (a) β; (b) Sk(SPV); (c) σ(SPV); (d) CV(SPV); (e) m.


Theoretical studies on epidemiological spread of disease in the presence of susceptibility heterogeneities have shown that final epidemic size is typically lower when susceptibility sub-populations are factored in, as compared to the case of homogeneous susceptibility [21, 44, 61]. We find that this result holds true when the sub-population sizes and disease transmission rates are informed by real-world data about immunological factors. The novelty in our approach is to propose how the susceptibility profile vector can be estimated from genetic sequence data, so that we can then deal with particular SPVs that might exist in reality for different ethnicities and viral strains. We also show that some summary statistics of the SPV (such as the skewness or the coefficient of variation) can help to better understand the predicted final size of the epidemic.

A limitation of our model is that factors such as age, prior infection history and vaccination are not included. While there have been studies which collect and analyse such data for small cohorts [47, 48, 65], gathering such information on the global scale required for this analysis requires the formation of consortia such as those existing for diseases such as cancer [37]. Also, we make the strong simplifying assumption that all aspects of the innate and adaptive immune system not affected by HLA class-I presentation can be pooled into a single proportionality constant, and are considered uniform among individuals within an ethnicity, and across ethnicities. While this helped focus the analysis on the role of HLA alleles in disease spread, incorporating other aspects of the immune system into epidemiological models is an important problem that must be addressed. Due to these limitations, predictions made by our model can only be used to draw comparisons between different epidemic pairs, particularly epidemic pairs consisting of the same ethnicity and different viral strains, and not for making absolute quantitative predictions.

A number of extensions of the line of work presented in this manuscript are possible. Presentation of epitopes by HLA class-I alleles is preceded by a number of steps including internalisation of the virus, proteasomal cleavage of viral proteins into shorter peptides, and transport of peptides through the TAP transport system [31]. The epitope prediction tools used in this work do not explicitly consider all these pre-processing steps in any single tool. Also, the prediction algorithms have lower accuracy for rare alleles. The model can be improved by plugging in different epitope prediction methods which overcome these limitations. Also, it would be useful to establish a more accurate, quantitative connection between si and ei than the simple inverse relation we have assumed. Two other possible mathematical forms, si ∝ 1/ln(ei + 1) and si ∝ 1/(ei + 1)², are explored in the supporting information; see S4 Fig.
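To give a feel for how the choice of functional form changes the spread of susceptibilities, the sketch below compares the three relations on a pair of epitope counts. The "inverse" form is written here as 1/ei, which is an assumption: the exact expression used for the simple inverse relation is not visible in this section. The function name, the normalisation (largest susceptibility scaled to 1) and the example counts are hypothetical.

```python
import math

# Candidate relations between the number of presented epitopes e_i and the
# relative susceptibility s_i. Only the last two are quoted in the text; the
# first is an assumed reading of "simple inverse relation".
FORMS = {
    "inverse": lambda e: 1.0 / e,
    "log": lambda e: 1.0 / math.log(e + 1.0),
    "square": lambda e: 1.0 / (e + 1.0) ** 2,
}


def relative_susceptibility(epitope_counts, form="inverse"):
    """Susceptibilities up to a proportionality constant, scaled so the largest is 1."""
    if any(e <= 0 for e in epitope_counts):
        raise ValueError("epitope counts must be positive for these forms")
    raw = [FORMS[form](e) for e in epitope_counts]
    top = max(raw)
    return [r / top for r in raw]


counts = [10, 100]   # hypothetical numbers of high affinity epitopes
ratios = {}
for name in FORMS:
    s_low_count, s_high_count = relative_susceptibility(counts, name)
    # How much more susceptible the individual presenting fewer epitopes is.
    ratios[name] = s_low_count / s_high_count

print(ratios)
```

The logarithmic form compresses differences between individuals, while the squared form amplifies them, so the three choices correspond to progressively wider susceptibility profiles for the same epitope counts.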

Spatial heterogeneities are known to allow for disease persistence, since asynchrony in the epidemic spread among different sub-populations located in different geographical locations can allow for global persistence, even if the epidemic locally dies out [19]. Since HLA alleles are inherited, it can be expected that families and households will have similar HLA genotypes, potentially introducing spatial inhomogeneity in the distribution of HLA alleles in a population. If such spatial information regarding HLA genotypes were gathered, it would be interesting to study how this affects epidemic dynamics and persistence. An agent based model incorporating variations in agent susceptibility along the lines indicated here, along with spatial information regarding each susceptible agent, would provide an idea of how such factors might modify the general conclusions described in this paper. A network model incorporating the social structure of individual contacts would indicate if the combination of varied susceptibility with a specified contact network structure between individuals might accelerate epidemic progress or retard it.


The incorporation of within-host immunological information into population-level epidemic models is a major challenge for epidemiological modeling [30]. In this paper, we address this question in a specific case, by modeling the impact of genetic diversity in terms of the HLA class-I genotype on the predicted epidemic dynamics of H1N1 influenza. To do this, we made use of HLA allele frequencies measured across different ethnicities, focusing on the number of high affinity epitopes presented by individuals within 61 ethnicities and for 81 H1N1 influenza A viral strains isolated in 2009 as well as 85 H1N1 influenza A viral strains isolated in other years. Our main hypothesis was that the susceptibility of individuals in a given ethnicity, for a given viral strain, is inversely proportional to the number of high affinity epitopes that these individuals can present. We then used a multi-compartment SIR model to study the spread dynamics of influenza for each (ethnicity, viral strain) epidemic pair, where the final epidemic size FI and the basic reproduction number R0 are used as the summary statistics for the purpose of comparison.

While the average susceptibility β is a central parameter, the susceptibility profile corresponding to each epidemic pair also plays an important role governing epidemic spread. In particular, when analysing epidemics with intermediate values of β (i.e., intermediate values of R0), more heterogeneous susceptibility profiles, as well as profiles showing positive skewness Sk(SPV), are more protective for the population as a whole against H1N1 influenza. Our model only considers heterogeneity from the perspective of the ability of a person’s HLA genotype to present epitopes from a given virus. However, even if at a qualitative level, our results support the idea that having a wide variety of HLA alleles represented among its individuals, resulting in a wide range of susceptibilities, benefits a population as a whole in terms of restricting the spread of an infectious disease.

Although our model does not incorporate other factors such as social and economic characteristics of each particular population or potential different infectivities for each viral strain, our results qualitatively capture several central trends of influenza spread worldwide. Thus, we can conclude that susceptibility of individuals in terms of the HLA genotype is an important factor that could explain the spread potential of different influenza viral strains among different ethnicities and populations. While some of these trends can just be explained due to larger or smaller values of R0 (i.e., the average susceptibility β), the reason for small epidemic sizes occurring for some particular ethnicities and viral strains might be related to the existence of high genetic diversity resulting in a wide range of susceptibilities in these populations, for these viral strains, with a positively skewed susceptibility profile vector.

Supporting information

S1 Fig. Variations in SPV characteristics, non-2009 strains.

Histograms for the values of the different susceptibility profile vector characteristics for the 5,185 epidemic pairs involving H1N1 strains isolated in years other than 2009: (a) σ(SPV); (b) Sk(SPV); (c) CV(SPV); (d) m; and (e) β.


S2 Fig. FI as a function of other pairs of SPV characteristics, 2009 strains.

(a) (CV(SPV), m); (b) (Sk(SPV), m); (c) (σ(SPV), m) and (d) (β, m).


S3 Fig. FI trends hold for H1N1 strains isolated before and after 2009.

FI as a function of pairs of SPV characteristics. (a) (Sk(SPV), σ(SPV)); (b) (CV(SPV), σ(SPV)); (c) (CV(SPV), β); (d) (Sk(SPV), β); (e) (σ(SPV), β); (f) (CV(SPV), Sk(SPV)). FI is shown as a colourbar.


S4 Fig. Other mathematical forms for .

The results presented in the paper use the form (column 1). Two other mathematical forms, (column 2) and (column 3) are explored here, for all 61 ethnicities and 166 viral strains. Only the pairs of SPV characteristics found to have high correlation with FI are shown. (a, b, c) (σ(SPV), β); (d, e, f) (CV(SPV), β); (g, h, i) (Sk(SPV), β). FI is shown as a colourbar. Trends in epidemic size hold across all considered mathematical forms.


S1 File. All calculated parameters.

Parameters m, β, βi, xi, σ(SPV), CV(SPV), Sk(SPV), FI and R0 for all epidemic pairs in the data set.



We thank Professor Frank Ball, University of Nottingham, UK, Dr. Jose Faro, University of Vigo, Spain, and Dr. Abhilash Mohan, Indian Institute of Science, India, for useful discussions. We also thank Proyasha Roy, Indian Institute of Science, India, for technical assistance.


  1. Bonita R, Beaglehole R, Kjellström T. Basic epidemiology. World Health Organization; 2006.
  2. Torrence M. Understanding epidemiology. St. Louis: Mosby; 1997.
  3. Coburn BJ, Wagner BG, Blower S. Modeling influenza epidemics and pandemics: insights into the future of swine flu (H1N1). BMC medicine. 2009;7(1):30. pmid:19545404
  4. Girard MP, Tam JS, Assossou OM, Kieny MP. The 2009 A (H1N1) influenza virus pandemic: A review. Vaccine. 2010;28(31):4895–4902. pmid:20553769
  5. Paine S, Mercer G, Kelly P, Bandaranayake D, Baker M, Huang Q, et al. Transmissibility of 2009 pandemic influenza A (H1N1) in New Zealand: effective reproduction number and influence of age, ethnicity and importations. Euro Surveill. 2010;15(24):1–9.
  6. Yang Y, Sugimoto JD, Halloran ME, Basta NE, Chao DL, Matrajt L, et al. The transmissibility and control of pandemic influenza A (H1N1) virus. Science. 2009;326(5953):729–733. pmid:19745114
  7. Haghdoost AA, Baneshi MR, Zolala F, Farvahari S, Safizadeh H. Estimation of basic reproductive number of Flu-like syndrome in a primary school in Iran. International journal of preventive medicine. 2012;3(6).
  8. Jesan T, Menon GI, Sinha S. Epidemiological dynamics of the 2009 influenza A (H1N1) v outbreak in India. Current Science. 2011; p. 1051–1054.
  9. Chan PP, Subramony H, Lai FY, Tien WS, Tan BH, Solhan S, et al. Outbreak of novel influenza A (H1N1-2009) linked to a dance club. Annals Academy of Medicine Singapore. 2010;39(4):299.
  10. Mostaço-Guidolin LC, Bowman CS, Greer AL, Fisman DN, Moghadas SM. Transmissibility of the 2009 H1N1 pandemic in remote and isolated Canadian communities: a modelling study. BMJ open. 2012;2(5):e001614. pmid:22942233
  11. Jin Z, Zhang J, Song LP, Sun GQ, Kan J, Zhu H. Modelling and analysis of influenza A (H1N1) on networks. BMC public health. 2011;11(1):S9. pmid:21356138
  12. Nishiura H, Chowell G, Safan M, Castillo-Chavez C. Pros and cons of estimating the reproduction number from early epidemic growth rate of influenza A (H1N1) 2009. Theoretical Biology and Medical Modelling. 2010;7(1):1. pmid:20056004
  13. Cruz-Pacheco G, Duran L, Esteva L, Minzoni A, Lopez-Cervantes M, Panayotaros P, et al. Modelling of the influenza A (H1N1) v outbreak in Mexico City, April-May 2009, with control sanitary measures. Euro surveillance: bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin. 2009;14(26):344–358.
  14. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature. 2005;438(7066):355–359. pmid:16292310
  15. Anderson RM, May RM. Spatial, temporal, and genetic heterogeneity in host populations and the design of immunization programmes. Mathematical Medicine and Biology: A Journal of the IMA. 1984;1(3):233–266.
  16. Favier C, Schmit D, Müller-Graf CD, Cazelles B, Degallier N, Mondet B, et al. Influence of spatial heterogeneity on an emerging infectious disease: the case of dengue epidemics. Proceedings of the Royal Society of London B: Biological Sciences. 2005;272(1568):1171–1177.
  17. Hagenaars T, Donnelly C, Ferguson N. Spatial heterogeneity and the persistence of infectious diseases. Journal of theoretical biology. 2004;229(3):349–359. pmid:15234202
  18. Hethcote HW, Van Ark JW. Epidemiological models for heterogeneous populations: proportionate mixing, parameter estimation, and immunization programs. Mathematical Biosciences. 1987;84(1):85–118.
  19. Lloyd AL, May RM. Spatial heterogeneity in epidemic models. Journal of theoretical biology. 1996;179(1):1–11. pmid:8733427
  20. Ball F, Lyne OD, et al. Stochastic multi-type SIR epidemics among a population partitioned into households. Advances in Applied Probability. 2001;33(1):99–123.
  21. Ball F. Deterministic and stochastic epidemics with several kinds of susceptibles. Advances in applied probability. 1985; p. 1–22.
  22. Kiss IZ, Green DM, Kao RR. The effect of contact heterogeneity and multiple routes of transmission on final epidemic size. Mathematical biosciences. 2006;203(1):124–136. pmid:16620875
  23. Economou A, Gómez-Corral A, López-García M. A stochastic SIS epidemic model with heterogeneous contacts. Physica A: Statistical Mechanics and its Applications. 2015;421:78–97.
  24. López-García M. Stochastic descriptors in an SIR epidemic model for heterogeneous individuals in small networks. Mathematical biosciences. 2016;271:42–61. pmid:26519788
  25. Rodrigues P, Margheri A, Rebelo C, Gomes MGM. Heterogeneity in susceptibility to infection can explain high reinfection rates. Journal of theoretical biology. 2009;259(2):280–290. pmid:19306886
  26. Katriel G. The size of epidemics in populations with heterogeneous susceptibility. Journal of mathematical biology. 2012;65(2):237–262. pmid:21830057
  27. Ball F, Clancy D. The final size and severity of a generalised stochastic multitype epidemic model. Advances in Applied Probability. 1993; p. 721–736.
  28. Bailey NTJ. The Mathematical Theory of Infectious Diseases and Its Applications. Mathematics in Medicine Series. Griffin; 1975.
  29. Hyman JM, Li J. An intuitive formulation for the reproductive number for the spread of diseases in heterogeneous populations. Mathematical biosciences. 2000;167(1):65–86. pmid:10942787
  30. Lloyd-Smith JO, Funk S, McLean AR, Riley S, Wood JL. Nine challenges in modelling the emergence of novel pathogens. Epidemics. 2015;10:35–39. pmid:25843380
  31. Kindt TJ, Goldsby RA, Osborne BA, Kuby J. Kuby immunology. Macmillan; 2007.
  32. Kreijtz J, Fouchier R, Rimmelzwaan G. Immune responses to influenza virus infection. Virus research. 2011;162(1):19–30. pmid:21963677
  33. Blackwell JM, Jamieson SE, Burgner D. HLA and infectious diseases. Clinical microbiology reviews. 2009;22(2):370–385. pmid:19366919
  34. Boon A, de Mutsert G, Graus Y, Fouchier R, Sintnicolaas K, Osterhaus A, et al. The magnitude and specificity of influenza A virus-specific cytotoxic T-lymphocyte responses in humans is related to HLA-A and-B phenotype. Journal of virology. 2002;76(2):582–590. pmid:11752149
  35. Thomas PG, Keating R, Hulse-Post DJ, Doherty PC. Cell-mediated protection in influenza infection. Emerg Infect Dis. 2006;12(1).
  36. Hertz T, Oshansky CM, Roddam PL, DeVincenzo JP, Caniza MA, Jojic N, et al. HLA targeting efficiency correlates with human T-cell response magnitude and with mortality from influenza A infection. Proceedings of the National Academy of Sciences. 2013;110(33):13492–13497.
  37. Horby P, Nguyen NY, Dunstan SJ, Baillie JK. The role of host genetics in susceptibility to influenza: a systematic review. PloS one. 2012;7(3):e33180. pmid:22438897
  38. Mukherjee S, Warwicker J, Chandra N. Deciphering complex patterns of class-I HLA—peptide cross-reactivity via hierarchical grouping. Immunology and cell biology. 2015;93(6):522–532. pmid:25708537
  39. Chapman SJ, Hill AV. Human genetic susceptibility to infectious disease. Nature Reviews Genetics. 2012;13(3):175–188. pmid:22310894
  40. Jeffery KJ, Usuku K, Hall SE, Matsumoto W, Taylor GP, Procter J, et al. HLA alleles determine human T-lymphotropic virus-I (HTLV-I) proviral load and the risk of HTLV-I-associated myelopathy. Proceedings of the National Academy of Sciences. 1999;96(7):3848–3853.
  41. Segal S, Hill AV. Genetic susceptibility to infectious disease. Trends in microbiology. 2003;11(9):445–448. pmid:13678861
  42. Stephens H, Klaythong R, Sirikong M, Vaughn D, Green S, Kalayanarooj S, et al. HLA-A and-B allele associations with secondary dengue virus infections correlate with disease severity and the infecting viral serotype in ethnic Thais. Tissue antigens. 2002;60(4):309–318. pmid:12472660
  43. Singh N, Agrawal S, Rastogi A. Infectious diseases and immunity: special reference to major histocompatibility complex. Emerging Infectious Diseases. 1997;3(1):41. pmid:9126443
  44. Andreasen V. The final size of an epidemic and its relation to the basic reproduction number. Bulletin of mathematical biology. 2011;73(10):2305–2321. pmid:21210241
  45. Van Cauteren D, Vaux S, de Valk H, Le Strat Y, Vaillant V, Lévy-Bruhl D. Burden of influenza, healthcare seeking behaviour and hygiene measures during the A (H1N1) 2009 pandemic in France: a population based study. BMC public health. 2012;12(1):947. pmid:23127166
  46. Nelson MI, Simonsen L, Viboud C, Miller MA, Holmes EC. Phylogenetic analysis reveals the global migration of seasonal influenza A viruses. PLoS pathogens. 2007;3(9):e131.
  47. Gostic KM, Ambrose M, Worobey M, Lloyd-Smith JO. Potent protection against H5N1 and H7N9 influenza via childhood hemagglutinin imprinting. Science. 2016;354(6313):722–726. pmid:27846599
  48. Lessler J, Riley S, Read JM, Wang S, Zhu H, Smith GJ, et al. Evidence for antigenic seniority in influenza A (H3N2) antibody responses in southern China. PLoS pathogens. 2012;8(7):e1002802. pmid:22829765
  49. Bahadoran A, Lee SH, Wang SM, Manikam R, Rajarajeswaran J, Raju CS, et al. Immune responses to influenza virus and its correlation to age and inherited factors. Frontiers in microbiology. 2016;7. pmid:27920759
  50. Scheible K, Zhang G, Baer J, Azadniv M, Lambert K, Pryhuber G, et al. CD8+ T cell immunity to 2009 pandemic and seasonal H1N1 influenza viruses. Vaccine. 2011;29(11):2159–2168. pmid:21211588
  51. Mukherjee S, Chandra N. Grouping of large populations into few CTL immune ‘response-types’ from influenza H1N1 genome analysis. Clinical & Translational Immunology. 2014;3(8):e24.
  52. Vita R, Overton JA, Greenbaum JA, Ponomarenko J, Clark JD, Cantrell JR, et al. The immune epitope database (IEDB) 3.0. Nucleic acids research. 2015;43(D1):D405–D412. pmid:25300482
  53. Gonzalez-Galarza FF, Christmas S, Middleton D, Jones AR. Allele frequency net: a database and online repository for immune gene frequencies in worldwide populations. Nucleic acids research. 2011;39(suppl 1):D913–D919. pmid:21062830
  54. Boutet E, Lieberherr D, Tognolli M, Schneider M, Bansal P, Bridge AJ, et al. UniProtKB/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view. Plant Bioinformatics: Methods and Protocols. 2016; p. 23–54.
  55. Consortium U, et al. UniProt: a hub for protein information. Nucleic acids research. 2014; p. gku989.
  56. Lundegaard C, Lamberth K, Harndahl M, Buus S, Lund O, Nielsen M. NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8–11. Nucleic acids research. 2008;36(suppl 2):W509–W512. pmid:18463140
  57. Peters B, Sette A. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC bioinformatics. 2005;6(1):132. pmid:15927070
  58. Sidney J, Assarsson E, Moore C, Ngo S, Pinilla C, Sette A, et al. Quantitative peptide binding motifs for 19 human and mouse MHC class I molecules derived using positional scanning combinatorial peptide libraries. Immunome research. 2008;4(1):2. pmid:18221540
  59. Sette A, Vitiello A, Reherman B, Fowler P, Nayersina R, Kast WM, et al. The relationship between class I binding affinity and immunogenicity of potential cytotoxic T cell epitopes. The Journal of Immunology. 1994;153(12):5586–5592. pmid:7527444
  60. Anderson RM. Directly transmitted viral and bacterial infections of man. In: The Population Dynamics of Infectious Diseases: Theory and Applications. Springer; 1982. p. 1–37.
  61. Andersson H, Britton T, et al. Heterogeneity in epidemic models and its effect on the spread of infection. Journal of applied probability. 1998;35(3):651–661.
  62. Pawaiya R, Dhama K, Mahendran M, Tripathi B, et al. Swine flu and the current influenza A (H1N1) pandemic in humans: A review. Indian J Vet Pathol. 2009;33(1):1–17.
  63. Flint SM, Davis JS, Su JY, Oliver-Landry EP, Rogers BA, Goldstein A, et al. Disproportionate impact of pandemic (H1N1) 2009 influenza on Indigenous people in the Top End of Australia’s Northern Territory. The Medical Journal of Australia. 2010;192(10):617–622. pmid:20477746
  64. La Ruche G, Tarantola A, Barboza P, Vaillant L, Gueguen J, Gastellu-Etchegorry M, et al. The 2009 pandemic H1N1 influenza and indigenous populations of the Americas and the Pacific. Eurosurveillance. 2009;14(42):19366. pmid:19883543
  65. Clemens EB, Grant EJ, Wang Z, Gras S, Tipping P, Rossjohn J, et al. Towards identification of immune and genetic correlates of severe influenza disease in Indigenous Australians. Immunology and cell biology. 2016;94(4):367–377. pmid:26493179