The authors have declared that no competing interests exist.

Conceived and designed the experiments: MA PM AV SM. Performed the experiments: LF MA. Analyzed the data: LF MA PM AV SM. Wrote the paper: LF MA PM AV SM. Developed the model: LF MA SM.

Social contact patterns among individuals encode the transmission route of infectious diseases and are a key ingredient in the realistic characterization and modeling of epidemics. Unfortunately, the gathering of high quality experimental data on contact patterns in human populations is a very difficult task even at the coarse level of mixing patterns among age groups. Here we propose an alternative route to the estimation of mixing patterns that relies on the construction of virtual populations parametrized with highly detailed census and demographic data. We present the modeling of the population of 26 European countries and the generation of the corresponding synthetic contact matrices among the population age groups. The method is validated by a detailed comparison with the matrices obtained in six European countries by the most extensive survey study on mixing patterns. The methodology presented here allows a large scale comparison of mixing patterns in Europe, highlighting general common features as well as country-specific differences. We find clear relations between epidemiologically relevant quantities (reproduction number and attack rate) and socio-demographic characteristics of the populations, such as the average age of the population and the duration of primary school cycle. This study provides a numerical approach for the generation of human mixing patterns that can be used to improve the accuracy of mathematical models in the absence of specific experimental data.

The dynamics of infectious diseases caused by pathogens transmissible from human to human strongly depend on contact patterns between individuals. High quality observational data on contact patterns, usually presented in the form of age-specific contact matrices, are difficult to gather and are currently available for only a few countries worldwide. Here we propose a computational approach, based on the simulation of a virtual society of agents, allowing the estimation of contact patterns by age for 26 European countries. We validate the estimated contact matrices against those obtained by the most extensive field study on contact patterns, with data collected in eight European countries. We show that our contact matrices share some common features, e.g. individuals tend to mix preferentially with individuals of their own age, and also display country-specific differences, which can be partly explained by differences in population structures due to different demographic trajectories followed after WWII. Our analysis highlights well-defined correlations between epidemiological parameters and socio-demographic features of the populations. This study provides the first estimates of contact matrices for many European countries where specific experimental data are still not available.

The accurate characterization of the structure of social contacts in mathematical and computational models of infectious disease transmission is a key element in the assessment of the impact of epidemic outbreaks and in the evaluation of effective control measures. For instance, the transmissibility potential of a disease and the final epidemic size strongly depend on mixing patterns between individuals of the population, which in turn depend on socio-demographic parameters (e.g. household size, fraction of workers and students in the population)

In this study we propose to overcome the above challenges by developing a general computational approach to derive mixing patterns from routinely collected socio-demographic data. In particular, we focus on contact matrices by age for 26 European countries, for which we are able to construct a synthetic society in silico by integrating available social and census data. The use of contact matrices is the simplest way to improve on the homogeneous mixing assumption while preserving the analytical transparency of the model. The proposed approach is based on the simulation of a virtual society of agents that allows the estimation of contact matrices by age in different social settings: household, school, workplace and general community. Unlike classical agent based approaches of epidemic transmission

Those matrices are appropriately combined in order to obtain the overall “adequate” total contact matrix for influenza-like-illness. In order to validate the proposed approach we compare the obtained contact matrices by age with the results of the Polymod study

The proposed method is extremely general and can be readily exported to other countries in the world for which the necessary social and demographic data can be gathered. We consider this approach an important step in order to overcome the current difficulties in real data gathering. Furthermore the computational path to the estimate of contact matrices represents a convenient scheme for the introduction of detailed individual based information in a wide range of modeling approaches working at the population level. For this reason we publicly release the entire collection of contact matrices to the scientific community (see

In order to provide a quantitative estimate of contact matrices for 26 European countries we used highly detailed data on the country-specific socio-demographic structures (e.g., household size and composition, age structure, rates of school attendance, etc.) available at the Statistical Office of the European Commission

The procedure to generate the synthetic populations is quite standard in the context of individual based models and is therefore discussed in detail in

The mathematical representation of epidemics relies on the description of the transmission process, which is usually modeled through the force of infection, that is, the rate at which a susceptible individual acquires the infection through interactions with infectious individuals. This quantity is proportional to the number of infectious individuals, the specific transmission probability of the infection during a contact and the overall rate of contacts of each individual with other individuals in the population. Although the vast majority of studies assume a homogeneous population (all individuals are equal, with the same average contact rate), the social and demographic structure of the population is generally reflected in heterogeneous contact patterns among individuals. Age is one of the main determinants of the mixing pattern of individuals: children tend to spend more time with other children and with members of their household, active adults mix with individuals in their workplace, and so on. Mixing patterns by age are generally defined by a contact matrix whose elements

To give an example, let us see in more detail the computation of the matrix of contacts within households,

In order to transform the frequency of contacts into contact matrices relevant for infectious disease spreading, we need to consider the following quantities:

We can therefore define the synthetic contact matrix for households

In order to define the “adequate” contact matrix

For each setting

Matrix

Notably, our matrices are computed by considering one-year age brackets, from 0 to 100 years and over; this is the most refined version of our data on frequencies of contacts. They can, however, be aggregated in different ways, depending on the purpose for which they are used: for instance, for childhood diseases one may prefer to group contact data for children according to educational levels.
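The aggregation step can be illustrated with a minimal sketch. It assumes that the matrix entries are total contact counts (or frequencies) between single years of age, so that coarser groups are obtained by plain summation; per-capita rates would instead need population weighting. The function name `aggregate_contacts`, the `bounds` argument and the toy numbers are hypothetical and are not taken from the released data.

```python
def aggregate_contacts(counts, bounds):
    """Sum a one-year-age contact-count matrix into coarser age groups.

    counts[i][j] -- contacts (counts or frequencies) between age i and age j
    bounds       -- sorted lower bounds of the groups, starting at 0,
                    e.g. [0, 6, 11, 19, 65] (an illustrative choice)
    """
    if not bounds or bounds[0] != 0 or list(bounds) != sorted(bounds):
        raise ValueError("bounds must be sorted and start at age 0")
    n_ages = len(counts)
    # Map every single year of age to the index of the group containing it.
    group_of = [sum(1 for b in bounds if b <= age) - 1 for age in range(n_ages)]
    k = len(bounds)
    grouped = [[0.0] * k for _ in range(k)]
    for i in range(n_ages):
        for j in range(n_ages):
            grouped[group_of[i]][group_of[j]] += counts[i][j]
    return grouped


# Toy 4x4 matrix (ages 0..3), made up purely for illustration.
toy = [
    [1, 2, 0, 0],
    [2, 1, 1, 0],
    [0, 1, 3, 1],
    [0, 0, 1, 2],
]
# Two groups: ages 0-1 and ages 2-3.
grouped = aggregate_contacts(toy, [0, 2])
```

Because the groups partition the age range, the total number of contacts is the same before and after aggregation, which gives a simple sanity check for any chosen grouping.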

Although we are dealing with very detailed data on the socio-demographic structures of European countries, there are a number of limitations and assumptions worth stating. First of all, although the relevant statistics could be gathered from other sources, we consider Eurostat as the only source of data on occupation rates. For this reason we excluded Belgium, Poland and Malta from our study, given the incomplete information on employment and schooling rates. Furthermore, the different household structures considered in our virtual society cover about 95% of the total number of households in Europe. However, we do not include families with an aggregate member or non-private households (such as rest homes, dorms, religious and military institutions). Finally, another limitation lies in the assumption of homogeneous mixing for the contacts occurring in the community at large (i.e., not occurring between household members, schoolmates and work colleagues). In fact, this implies disregarding any kind of preferential mixing, e.g. by age, as well as differences in the level of activity of individuals, which may vary by age, as documented by Polymod data

In the classic SIR model the population is divided into three compartments: susceptible (individuals that can acquire infection), infectious (individuals that have been infected and are able to transmit the pathogen) and recovered (individuals that are immune to the disease, e.g. because they recovered from the infection). In order to include the mixing patterns encoded in the contact matrices, each compartment is characterized by an age structure. Every susceptible individual of age
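As an illustration of how a contact matrix enters an age-structured SIR model, the following is a minimal sketch and not the model used in this study. It assumes a commonly used form of the force of infection, λ_i = β Σ_j C_ij I_j / N_j, integrated with a simple forward Euler scheme. The function `simulate_sir`, its parameters and all numerical values are hypothetical.

```python
def simulate_sir(contacts, pop, beta, gamma, seed, dt=0.1, days=300):
    """Forward-Euler integration of an age-structured SIR model.

    contacts[i][j] -- assumed contact rate of one individual of group i
                      with individuals of group j
    pop[i]         -- population size of group i
    beta           -- transmission probability per contact (assumption)
    gamma          -- recovery rate (1 / infectious period)
    seed[i]        -- initially infectious individuals in group i
    Returns the fraction of each group that has recovered by the end of the
    run, which approximates the final attack rate once the epidemic is over.
    """
    k = len(pop)
    S = [pop[i] - seed[i] for i in range(k)]
    I = [float(x) for x in seed]
    R = [0.0] * k
    for _ in range(int(days / dt)):
        # Force of infection on each group, driven by prevalence in all groups.
        lam = [
            beta * sum(contacts[i][j] * I[j] / pop[j] for j in range(k))
            for i in range(k)
        ]
        for i in range(k):
            new_inf = min(S[i], lam[i] * S[i] * dt)
            new_rec = gamma * I[i] * dt
            S[i] -= new_inf
            I[i] += new_inf - new_rec
            R[i] += new_rec
    return [R[i] / pop[i] for i in range(k)]


# Two toy age groups; group 0 has many more contacts than group 1.
contacts = [[10.0, 2.0], [2.0, 4.0]]
pop = [1000.0, 1000.0]
attack = simulate_sir(contacts, pop, beta=0.1, gamma=0.5, seed=[1.0, 0.0])
```

In this toy setting the more highly connected group ends with the larger attack rate, which is the kind of age heterogeneity a homogeneous mixing model cannot reproduce.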

To analyze post-pandemic H1N1 serological data collected in England and Wales in fall 2009

Representations in logarithmic scale of contact matrices by one-year age brackets for the United Kingdom in the different social settings. Frequency of contacts (in arbitrary units) increases from blue to red.

In

Although similar attributes can be observed in the synthetic contact matrices for all 26 countries under consideration (the representations of all matrices other than that of the UK are reported in

To assess more rigorously whether similarities between contact matrices identify specific groups of countries, we use a hierarchical clustering algorithm. The algorithm uses the average dissimilarity between two matrices

In particular, we isolated four main clusters (see

Clustering of countries on the basis of total matrices.
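The clustering step can be sketched as follows. This is a minimal average-linkage agglomerative clustering in plain Python, not the exact procedure of the study. The dissimilarity used here (mean absolute entry-wise difference after normalising each matrix to unit sum) is an assumption made only for illustration, and the labels and toy matrices are hypothetical.

```python
def dissimilarity(a, b):
    """Mean absolute entry-wise difference of two matrices, each first
    normalised so that its entries sum to 1 (an illustrative choice)."""
    total_a = sum(map(sum, a))
    total_b = sum(map(sum, b))
    n = len(a) * len(a[0])
    return sum(
        abs(x / total_a - y / total_b)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    ) / n


def average_linkage(matrices, n_clusters):
    """Merge the two clusters with the smallest average pairwise
    dissimilarity until only n_clusters remain."""
    names = list(matrices)
    dist = {
        (p, q): dissimilarity(matrices[p], matrices[q])
        for p in names for q in names if p != q
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [dist[(p, q)] for p in clusters[x] for q in clusters[y]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy 2x2 matrices with made-up values: A and B are diagonal-dominated,
# C and D are dominated by off-diagonal (between-group) contacts.
toy = {
    "A": [[4, 1], [1, 4]],
    "B": [[8, 2], [2, 8]],
    "C": [[1, 4], [4, 1]],
    "D": [[1, 5], [5, 1]],
}
result = average_linkage(toy, n_clusters=2)
```

Normalising each matrix before comparison makes the grouping depend on the shape of the mixing pattern rather than on the overall number of contacts; a different dissimilarity would in general produce a different grouping.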

In order to validate the data driven modeling approach at the origin of the synthetic contact matrices we compared our matrices with those obtained by the Polymod project

Another way to assess and validate our approach consists in the analysis of the prevalence by age profile generated by using Polymod matrices and our synthetic matrices in ILI epidemic models. As an example, we considered the epidemic prevalence generated by an age structured SIR model in a fully susceptible population (as detailed in the Materials and Methods section). The model includes the heterogeneity of contacts by age by introducing a force of infection across age groups modulated by the matrix

Furthermore, in order to validate numerical simulations against empirical data, we compared predictions of our and Polymod contact matrices to seroprevalence data collected in England and Wales at the end of the second wave of the 2009 H1N1 pandemic influenza

It is worth remarking that the profiles predicted by employing our matrices are smooth because the proportions of contacts are derived from the entire simulated population. Polymod matrices are instead based on the observation of a sample of the population, and this leads to a less regular seroprevalence profile (as can be seen for instance for the Netherlands, where prevalence for individuals aged 19–29 appears to be much lower than for the adjoining age groups). More generally, the prevalence predicted from the synthetic mixing patterns is highest among school-age children, intermediate for working ages and progressively declining in the elderly; prevalence among young children is at an intermediate level. This pattern is mainly driven by country-specific employment and schooling rates, along with the scholastic organization. Simulated seroprevalences, using our contact matrices in the same epidemic setting, for the countries not covered by Polymod are provided in

An intermediate choice between homogeneous mixing and country-specific contact patterns would be to consider mixing patterns as derived by appropriately averaging over the 26 country-specific contact matrices; therefore in this section we compare results obtained by assuming homogeneous mixing, the average European matrix and country-specific matrices. We considered an SIR model where all the basic parameters and scaling factors are set on the baseline yielding a basic reproduction number

The improvement obtained by using either the European average matrix or country-specific matrices compared to the homogeneous model is evident e.g. in terms of final attack rate which, as already noticed in previous computational studies

By applying to every country the average European contact matrix, large differences in terms of attack rate and peak day can be observed compared to the results obtained with the country-specific mixing patterns, especially for values of the basic reproduction number consistent with influenza epidemics (

The synthetic contact matrices allow us to analyze the effect of the different social and demographic structure of countries on the evolution of infectious diseases characterized by the same natural history. For the sake of simplicity we considered an SIR model with basic parameters and scaling factors corresponding to

The average age of the population is the single factor best explaining the basic reproduction number (correlation

In this work we propose a method based on the analysis of the contact network in a highly detailed virtual society, and we compute the corresponding matrices of adequate contacts for 26 European countries.

Our analysis highlights well-defined correlations between epidemiological parameters and socio-demographic features of the populations. Specifically, we found that the basic reproduction number is well explained by a linear model having the average age of the population and the duration of the primary school cycle as independent variables, whose values are easily derivable from routinely collected social and demographic data. In addition, the average age appears to be the main determinant in explaining differences in final attack rates between countries. In this perspective, the use of synthetic contact matrices helps to improve the accuracy of the predictions of mathematical models, which are increasingly used to support public health decisions.

It is worth remarking that the presented approach is based on routinely collected data, and that it can be easily extended to every country for which socio-demographic data are available. Notably, by providing information by one-year age brackets, our contact matrices are particularly suitable for childhood diseases, which require detailed information on contact patterns in the youngest age classes. Finally, our method may also be used retrospectively, in order to reconstruct contact patterns in the past by using data from previous census rounds; this would be useful to review classic results based on indirect estimates of contacts, such as WAIFW matrices

The authors would like to thank three anonymous reviewers for their helpful comments.