## Abstract

Several important aspects related to SARS-CoV-2 transmission are not well known due to a lack of appropriate data. However, mathematical and computational tools can be used to extract part of this information from the available data, like some hidden age-related characteristics. In this paper, we present a method to investigate age-specific differences in transmission parameters related to susceptibility to and infectiousness upon contracting SARS-CoV-2 infection. More specifically, we use panel-based social contact data from diary-based surveys conducted in Belgium combined with the next generation principle to infer the relative incidence and we compare this to real-life incidence data. Comparing these two allows for the estimation of age-specific transmission parameters. Our analysis implies the susceptibility in children to be around half of the susceptibility in adults, and even lower for very young children (preschoolers). However, the probability that adults and the elderly contract the infection decreases throughout the vaccination campaign, thereby modifying the picture over time.

## Author summary

Basic transmission dynamic characteristics of SARS-CoV-2, such as the probability of acquiring infection when exposed (“susceptibility”), and the probability of transmitting infection when infected (“infectiousness”) may be age-dependent. We present a computational method to estimate these age-specific characteristics using Belgian social contact and surveillance data. We found that children are less susceptible to infection than adults, with the former experiencing 20% to 50% of the susceptibility in adults, while the infectiousness is more difficult to discern. The force of infection (probability of acquiring infection per unit time) decreases over time for the oldest age groups first, following the roll-out of the vaccination campaign which targeted the elderly first.

**Citation:** Franco N, Coletti P, Willem L, Angeli L, Lajot A, Abrams S, et al. (2022) Inferring age-specific differences in susceptibility to and infectiousness upon SARS-CoV-2 infection based on Belgian social contact data. PLoS Comput Biol 18(3): e1009965. https://doi.org/10.1371/journal.pcbi.1009965

**Editor:** Claudio José Struchiner, Fundação Getúlio Vargas, BRAZIL

**Received:** October 7, 2021; **Accepted:** February 24, 2022; **Published:** March 30, 2022

**Copyright:** © 2022 Franco et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** R codes and all necessary data to run the codes (potentially aggregate) are available at https://github.com/nicolas-franco-unamur/Next-gen. CoMix data and age-class reaggregate PCR tests data are provided with the code. CoMix social contact data are also available via http://www.socialcontactdata.org. If needed, initial non-aggregate PCR tests data can be requested from Sciensano via the online form https://epistat.wiv-isp.be/datarequest.

**Funding:** This work received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program -- Project TransMID (NF, PC and NH, Grant Agreement 682540, https://cordis.europa.eu/project/id/682540). This project was also funded by the European Union’s Horizon 2020 Research and Innovations Programme -- Project EpiPose, Epidemic intelligence to Minimize COVID-19’s Public Health, Societal and Economical Impact (NH, grant number 101003688, https://cordis.europa.eu/project/id/101003688). LW gratefully acknowledges support from the Research Foundation Flanders (FWO) (postdoctoral fellowships 1234620N, https://www.fwo.be). NH acknowledges funding from the Antwerp Study Center for Infectious Diseases (ASCID, https://www.uantwerpen.be/en/research-groups/vaxinfectio/) and the Methusalem-Centre of Excellence consortium (VAX–IDEA, https://www.uantwerpen.be/en/research-groups/vax-idea/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

Since the start of the COVID-19 pandemic, a new respiratory disease caused by the SARS-CoV-2 coronavirus, many mathematical and statistical approaches have been considered to identify transmission dynamics and characteristics of the virus. Some of those characteristics are still not completely known due to the lack of appropriate data. However, these characteristics are necessary in order to correctly inform public health policies as well as to develop more advanced scientific tools like mathematical and computational models. Concerning COVID-19, as for most infectious diseases, it quickly became apparent that some of the disease characteristics are strongly age-dependent [1]. In particular, the susceptibility to SARS-CoV-2 infection as well as the infectiousness upon infection may be lower for children than for adults and the elderly, as shown by many studies mostly based on statistical approaches on incidence data [2–6]. Knowledge of such a difference could have an important impact on public health strategies in terms of prioritization of vaccination or the choice of targeted non-pharmaceutical interventions.

In this study, we propose a different method to estimate heterogeneous transmission parameters related to relative susceptibility and infectiousness using derived information from social contact data, and we illustrate this method using Belgian data. Social contact surveys [7] coupled with the next generation principle [8, 9] have been used for years to estimate key epidemiological parameters such as the basic (and effective) reproduction number (i.e., the average number of new infections caused by a *typical* infected individual during their entire infectious period in a (fully) susceptible population), relative incidence or differences in susceptibility [10]. The first large-scale social contact study, POLYMOD [11], collected social contact patterns for eight European countries between May 2005 and September 2006. In 2020–2021, social contact data have been collected in the so-called CoMix survey [12–15], initially in the United Kingdom, The Netherlands and Belgium and afterwards extended to other European countries. CoMix collected timely social contact information during the COVID-19 pandemic.

Social contact data can be used as a proxy to model SARS-CoV-2 transmission using the so-called *social contact hypothesis* [16], which implies that the age-specific number of infectious contacts is proportional to the self-reported age-specific number of social contacts by a proportionality factor. This proportionality factor, often denoted by *q*, assumes that the probability of transmission is homogeneous across the different age classes. In the current paper, we aim to disentangle and quantify the heterogeneous components of this proportionality factor further [17], elucidating information on relative age-specific susceptibility and infectiousness. Our approach is based on the method used by [10] to estimate susceptibility profiles for influenza A/H1N1. However, we have refined the method to include a larger number of age categories and applied the methodology to SARS-CoV-2 transmission using a numerical approach. These estimates could serve to inform heterogeneous COVID-19 mathematical models relying on social contact data, such as e.g. mechanistic models [18–21]. Social contact data are also used in [22, 23] to derive heterogeneous contributions to SARS-CoV-2 transmission using an approach based on the reproduction number. We go one step further using an approach based on the relative incidence derived from the next generation principle.

More specifically, we use the CoMix social contact data combined with daily incidence data on the number of new confirmed COVID-19 cases in Belgium over the period December 2020 to May 2021 to estimate the proportionality factor and its heterogeneous unmeasured components. We disentangle potential sources of heterogeneity in the acquisition of SARS-CoV-2 infection especially focusing on the comparison between children (infant, primary and secondary school) and adults. We also estimate the time evolution of the transmission parameters for different adult age classes throughout the vaccination campaign as carried out in Belgium showing an evolution of the proportionality factors over time. Then we present an illustration of the utility of heterogeneous proportionality factors by comparing the reproduction number estimated from the CoMix social contact data to the ones estimated from incidence of cases and hospitalizations, respectively.

## Results

### Estimation of susceptibility and infectiousness through proportionality factors

The proportionality factor is assumed to be age-specific and denoted by *q*_{ij}, where *i* and *j* index the age classes. *q*_{ij} can be further split into heterogeneous components:

$$q_{ij} = q \cdot a_i \cdot h_j,$$

where:

- The vector (*a*_{i}) represents age-specific differences in factors influencing transmission which are specifically related to the susceptibility of individuals, including, but not limited to, direct (immunological) susceptibility to infection upon exposure (e.g., due to age-specific heterogeneous risk behavior and/or compliance to non-pharmaceutical interventions not already captured by contact frequency, natural susceptibility from previous infection, differences in vaccination status, etc.). In order to distinguish it from direct susceptibility to infection, this vector will be referred to as *q-susceptibility*.
- The vector (*h*_{j}) describes age-specific differences in factors influencing transmission which are specifically related to the infectiousness of individuals, including, but not limited to, infectiousness after acquiring infection (e.g., due to differences in viral load upon exposure, proportion of asymptomatic individuals, differences in vaccination status, mask wearing, etc.). In order to distinguish it from infectiousness upon infection, this vector will be referred to as *q-infectiousness*.
- The remaining global proportionality factor *q* captures any remaining residual effect and is of no relevance when considering relative *q*-susceptibility or *q*-infectiousness.

In the remainder of this paper, we will talk about susceptibility or infectiousness when considering immunological aspects of disease transmission, while we will add the prefix *q* whenever quantities can carry additional effects related to susceptible and infectious individuals in order to avoid any ambiguity.

If we denote by *w* = (*w*_{j}) a vector representing the relative incidence within age class *j* (usually normalized such that ∑_{j} *w*_{j} = 1) and by **M**^{T} a matrix containing the social contact data (whose components *m*_{ji} represent the average daily number of individuals of age *j* who have a contact with a single individual of age *i*), then we have the following system:

$$\sum_{j} q\, a_i\, m_{ji}\, h_j\, w_j = R_t\, w_i \quad \text{for all } i,$$

where *R*_{t} represents the reproduction number. The core matrix of this system, with elements *q* *a*_{i} *m*_{ji} *h*_{j}, is called the *next generation matrix* and gives the number of new infections in a successive generation. Details concerning the construction of this matrix and the next generation principle can be found in Section Materials and methods.
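As a rough illustration of this system, the sketch below builds a toy next generation matrix and extracts its leading eigenpair by power iteration. It assumes the matrix entries take the form *q* *a*_{i} *h*_{j} *m*_{ji}; the three age classes, the contact matrix `m` and the function names are hypothetical and are not taken from the CoMix data or from the authors' R code.

```python
# Toy illustration with three hypothetical age classes (children, adults, elderly).
# All numbers below are made up for demonstration; they are not CoMix estimates.

def next_generation_matrix(m, a, h, q=1.0):
    """Entries k_ij = q * a_i * h_j * m_ji, where m[i][j] is the average daily
    number of contacts of a participant of age i with persons of age j."""
    n = len(m)
    return [[q * a[i] * h[j] * m[j][i] for j in range(n)] for i in range(n)]


def leading_eigenpair(k, iterations=1000):
    """Power iteration; returns (leading eigenvalue, eigenvector summing to 1)."""
    n = len(k)
    w = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(k[i][j] * w[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(nxt)  # valid estimate because w sums to 1
        w = [x / eigenvalue for x in nxt]
    return eigenvalue, w


m = [[8.0, 3.0, 0.5],
     [2.0, 6.0, 1.0],
     [0.5, 2.0, 2.0]]
ones = [1.0, 1.0, 1.0]

# Homogeneous proportionality factor (a_i = h_j = 1).
r_hom, w_hom = leading_eigenpair(next_generation_matrix(m, ones, ones))
# Children assumed half as q-susceptible as the other classes.
r_het, w_het = leading_eigenpair(next_generation_matrix(m, [0.5, 1.0, 1.0], ones))
# Rescaling q changes the eigenvalue but not the relative incidence.
r_q, w_q = leading_eigenpair(next_generation_matrix(m, ones, ones, q=0.3))
```

In this toy setting, lowering the children's *q*-susceptibility reduces their share of the relative incidence, while changing the global factor *q* only rescales the leading eigenvalue.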

Using our method, we compare the social contact matrix **M**^{T} extracted from the CoMix social contact survey in Belgium [12] and its derived relative incidence *w* to the incidence obtained from real-life data in Belgium coming from PCR positive tests [24]. This method allows for an estimation of either the relative *q*-susceptibility (*a*_{i}) or relative *q*-infectiousness (*h*_{j}) by age class, while assuming that the other set of parameters is known from the literature (i.e., holding one of the two vectors fixed). The chosen age groups are [0, 6) years, [6, 12) years, [12, 18) years, [18, 30) years and subsequent 10-year age classes up to 80+ years in order to account for the Belgian educational system. Due to the method, obtained results only have a relative interpretation, hence we present them under the assumption of a mean susceptibility of one for the first adult class [18, 30). The period of observation goes from 22 December 2020 to 26 May 2021 (and to 15 June 2021 for confirmed cases data). A detailed description of considered data, literature assumptions, fitting procedure and normalization method is presented in Section Materials and methods.
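To make the idea of the estimation concrete, here is a minimal single-wave sketch. It assumes the system ∑_{j} *q* *a*_{i} *m*_{ji} *h*_{j} *w*_{j} = *R*_{t} *w*_{i} holds exactly for the observed relative incidence, in which case *a*_{i} is proportional to *w*_{i} / ∑_{j} *m*_{ji} *h*_{j} *w*_{j}. This closed form is a simplification for illustration only: the actual fitting procedure over several waves with bootstrapping is the one described in Section Materials and methods, and the function name and all numbers below are hypothetical.

```python
def relative_q_susceptibility(m, h, w_obs, reference=0):
    """Relative a_i that make w_obs an eigenvector of the matrix (a_i * m_ji * h_j).

    m[i][j]   : contacts of a participant of age i with persons of age j
    h         : assumed (fixed) relative q-infectiousness per age class
    w_obs     : observed relative incidence per age class
    reference : index of the age class whose a_i is normalised to 1
    """
    n = len(m)
    # Infectious pressure on class i: sum_j m_ji * h_j * w_j
    pressure = [sum(m[j][i] * h[j] * w_obs[j] for j in range(n)) for i in range(n)]
    raw = [w_obs[i] / pressure[i] for i in range(n)]
    return [x / raw[reference] for x in raw]


# Hypothetical inputs for three age classes (not CoMix or Sciensano data).
m = [[8.0, 3.0, 0.5],
     [2.0, 6.0, 1.0],
     [0.5, 2.0, 2.0]]
h = [0.6, 0.8, 1.0]
w_obs = [0.2, 0.6, 0.2]

a = relative_q_susceptibility(m, h, w_obs, reference=1)
```

Because only ratios are identified, the result is expressed relative to a chosen reference class, mirroring the normalization to the first adult class used in the text.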

The estimated relative *q*-susceptibility for the whole period is presented in Fig 1 and implies that very young children in age group [0, 6) are about 0.182 (95% percentile bootstrap-based CI: 0.146–0.230) times as susceptible as the first adult age class [18, 30), whose relative susceptibility equals 1 (95% CI: 0.829–1.252). Primary school students aged [6, 12) have a relative susceptibility of 0.550 (95% CI: 0.427–0.629) and secondary school students aged [12, 18) a susceptibility of 0.603 (95% CI: 0.536–0.700). This shows a *q*-susceptibility that increases with age up to [18, 30), after which the relative *q*-susceptibility tends to decrease slightly. Note, however, that this *q*-susceptibility captures more than differences in immunological susceptibility to infection; the rather low relative *q*-susceptibility in the [12, 18) age class could therefore be influenced by (compliance to) non-pharmaceutical interventions, over and above the age-specific contact frequencies.

The estimation of *q*-susceptibility is performed by age class using the next generation principle under an assumption on age-specific infectiousness (0.54, 0.55, 0.56, 0.59, 0.7, 0.76, 0.9, 0.99, 0.99, 0.99). The calibration is performed on CoMix waves 12 to 23 (observation period: 22 December 2020 to 15 June 2021). Dots represent means and bars represent 95% percentile (nonparametric) bootstrap-based confidence intervals.
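For readers unfamiliar with percentile bootstrap intervals, the following generic sketch shows the principle on a simple statistic. The sample of daily contact counts, the function name and the default settings are hypothetical; the way the authors resample survey participants and refit the model is not reproduced here.

```python
import random


def percentile_bootstrap_ci(sample, statistic, n_boot=2000, level=0.95, seed=0):
    """Nonparametric percentile bootstrap confidence interval for statistic(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement, recompute the statistic, and sort the results.
    stats = sorted(
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    lower = stats[int(alpha * (n_boot - 1))]
    upper = stats[int(round((1.0 - alpha) * (n_boot - 1)))]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Hypothetical daily contact counts reported by 12 participants.
contacts = [0, 1, 1, 2, 3, 3, 4, 5, 7, 8, 12, 20]
ci_lower, ci_upper = percentile_bootstrap_ci(contacts, mean)
```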

The comparison of the relative incidence as estimated based on the positive PCR test data and the CoMix social contact data is presented in Fig 2. The social contact data are presented by waves starting with wave 12 on 22 December 2020 and an inter-survey wave interval of two weeks for subsequent waves (cf. details in Table A of S1 Appendix). The nationally collected data are represented in blue and the estimates coming from social contact data in two colors: in green, the initial estimate with a homogeneous proportionality factor (i.e., with *a*_{i} = 1 and *h*_{j} = 1 for all *i*, *j*) and in red, the estimate using heterogeneous *q*-susceptibility and infectiousness as presented in Fig 1. We clearly observe that estimates of the relative incidence under the homogeneous proportionality factor assumption (green) are very different from the empirical estimates (blue), especially for the young age groups. The relative incidence among adult age classes is estimated relatively well up to a constant, but the relative incidence for children coming from the homogeneous social contacts approach is clearly overestimated, except perhaps during times of school closure (see, e.g., wave 19). This finding provides a clear indication that SARS-CoV-2 transmission is different in children as compared to adults and that homogeneity assumptions should be avoided given that such assumptions could lead to erroneous projections.

In blue: relative incidence based on Belgian PCR data. In green: estimated relative incidence based on the next generation principle under the assumption of a homogeneous proportionality factor. In red: estimated relative incidence based on the next generation principle with an estimated age-specific *q*-susceptibility under the assumption on age-specific infectiousness as in Fig 1. Dots represent means and bars represent 95% percentile (nonparametric) bootstrap-based confidence intervals.

The result of estimating the *q*-infectiousness for the whole period is depicted in Fig 3. The estimates also show a potentially important heterogeneity concerning the proportionality factor on the infectiousness side. However, this reverse exercise provides less accurate results, with very large confidence intervals and some bootstrap estimates reaching zero, both being problematic when dealing with relative values. Those effects are the result of a lack of constraints for *q*-infectiousness. Indeed, while it is impossible to reach zero susceptibility for a specific age class when having at the same time non-zero incidence, it is technically allowed for age-specific infectiousness to be zero as the observed incidence could result from transmission from other age classes.

The estimation of *q*-infectiousness is performed by age class using the next generation principle under an assumption on age-specific susceptibility (0.4, 0.39, 0.38, 0.79, 0.86, 0.8, 0.82, 0.88, 0.74, 0.74). The calibration is performed on CoMix waves 12 to 23. Dots represent means and bars represent 95% percentile (nonparametric) bootstrap-based confidence intervals.

Exact estimates of the components of the *q*-susceptibility (*a*_{i}) and *q*-infectiousness (*h*_{j}) are provided in Tables E and I of S1 Appendix. Additional estimates under the assumption of homogeneity regarding infectiousness or susceptibility (i.e., estimating (*a*_{i}) under *h*_{j} = 1, ∀*j* or estimating (*h*_{j}) under *a*_{i} = 1, ∀*i*) are also presented in Figs D and L of S1 Appendix together with estimated values and the effect on the relative incidence. These additional estimates provide qualitatively similar results. Additional sensitivity analyses with regard to (*h*_{j}) and (*a*_{i}) showed that the variation of *q*-susceptibility estimates under different assumptions is clearly limited while sensitivity is greater concerning *q*-infectiousness estimates (see Figs B and C of S1 Appendix).

### Time evolution of proportionality factors

Since proportionality factors capture several effects, they also capture time-dependent effects such as the reduction in susceptibility and infectiousness as a result of the vaccination campaign. In order to account for such a time evolution, we also performed the previous analysis using groups of two consecutive CoMix waves instead of the full period. The decision to consider two CoMix waves (28 days) together is motivated by the fact that a sufficiently long non-holiday period is required, as social contacts in children are of importance and the heterogeneity of the transmission concerning adult classes is partially constrained by infections reported in children. Note that the gradual introduction of the alpha variant of concern might also interfere.

Estimates of the time-dependent *q*-susceptibility relying on the same (time-invariant) assumption with regard to the infectiousness vector are presented in Fig 4. A normalization of the relative values was performed over the different waves such that the average of the estimated factors for the age classes [0, 6), [6, 12) and [12, 18), i.e., (*a*_{[0,6)} + *a*_{[6,12)} + *a*_{[12,18)})/3, is assumed constant. This choice is motivated by the fact that the vaccination campaign did not include children during the entire study period, hence proportionality factors regarding susceptibility can be expected to be more stable for these age classes (although still influenced by the evolution of the proportion of susceptible individuals due to the ongoing epidemic). Thus, the results provide an estimate of the evolution of adults’ proportionality factors under the assumption that the factors for children [0, 18) are constant on average. A second normalization (global scaling) is performed under the assumption of a mean susceptibility of one for the first adult class [18, 30) for wave 12.
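The two normalization steps can be sketched as follows. This is an illustrative reading of the description above, with hypothetical raw estimates for five age classes (three child classes, the first adult class and the oldest class) over two periods; the function name and index conventions are assumptions.

```python
def normalise_over_time(raw, child_idx=(0, 1, 2), reference_idx=3):
    """Normalise per-period relative q-susceptibility estimates.

    raw[t][i] is the unnormalised estimate for age class i in period t.
    Step 1: rescale each period so the mean over the child classes equals
            the child mean of the first period (children assumed stable).
    Step 2: rescale everything so the reference adult class equals 1
            in the first period.
    """
    def child_mean(values):
        return sum(values[i] for i in child_idx) / len(child_idx)

    target = child_mean(raw[0])
    step1 = [[x * target / child_mean(period) for x in period] for period in raw]
    scale = 1.0 / step1[0][reference_idx]
    return [[x * scale for x in period] for period in step1]


# Hypothetical raw estimates; each period is only known up to its own constant.
raw_estimates = [
    [0.2, 0.5, 0.6, 1.1, 1.8],  # first period
    [0.4, 1.0, 1.2, 1.8, 1.0],  # later period
]
normalised = normalise_over_time(raw_estimates)
```

After both steps, changes over time in the adult classes can be read relative to a fixed average for children.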

The estimation of *q*-susceptibility is performed by age class and over time on groups of two consecutive CoMix waves, corresponding to a period of 4 weeks, under an assumption on age-specific infectiousness (0.54, 0.55, 0.56, 0.59, 0.7, 0.76, 0.9, 0.99, 0.99, 0.99). Dots represent means and bars represent 95% percentile (nonparametric) bootstrap-based confidence intervals.

As indicated previously concerning the estimation during the complete time-period, a decreasing *q*-susceptibility is observed through adult age classes (see Fig 1), with the oldest age class (80+) being the least susceptible among all adults aged 18 years or older. This is a priori in contrast with usual assumptions regarding age-specific susceptibility to SARS-CoV-2 infection. However, in Fig 4, we clearly observe the highest relative *q*-susceptibility in the 80+ age class for the earliest waves as compared to all other age groups, or at least an equal *q*-susceptibility by considering the lower side of the confidence interval. Moreover, the *q*-susceptibility in the oldest age class decreases rapidly over time towards the lowest relative *q*-susceptibility equal to 0.446 (95% CI: 0.266–0.660) among the adult age groups. In general, the estimated *q*-susceptibility is almost similar across the different adult classes during the first period with the exception of the oldest class 80+ with an estimated relative *q*-susceptibility of 1.844 (95% CI: 0.920–3.127). Overall, *q*-susceptibility estimates of other age classes tend to decrease over time, albeit at a slower pace and to a lesser extent. This is in line with the implementation of the vaccination policy in Belgium, giving vaccination priority to residents of nursing homes (CoMix waves 13–16, see schematic timeline in Fig A of S1 Appendix) and the elderly in the general population (CoMix waves 18–21), while going gradually down from old to young throughout the study period.

Exact values of the estimates in Fig 4 are provided in Table F of S1 Appendix as well as time-dependent *q*-susceptibility and *q*-infectiousness under the various constraints mentioned above.

### Time evolution of *R*_{t} using contact patterns

In order to check the utility and validity of the use of a heterogeneous proportionality factor, we illustrate its application by determining the reproduction number *R*_{t}, or more specifically the variation of the reproduction number over time, and comparing this evolution with *R*_{t} directly estimated from confirmed cases/hospitalizations data.

In Fig 5, the variation of the reproduction number computed from the CoMix data is compared to the reproduction number computed either from the number of cases [25] (panel a) or from hospitalizations [24] (panel b). Clearly, specific choices of *q*-susceptibility and *q*-infectiousness affect the computation and in Fig 5 we report results for the homogeneity scenario, heterogeneity scenario (corresponding to Fig 1) and temporal heterogeneity scenario (corresponding to Fig 4). A homogeneity assumption for *q*-infectiousness and *q*-susceptibility leads to a poor agreement with the reproduction number estimated from both confirmed cases and hospitalizations and is also characterized by a larger uncertainty. Using the estimated heterogeneous proportionality factor gives better agreement with the reproduction numbers estimated from cases and hospitalizations, whereas the use of temporal values does not lead to a substantial further improvement.

Reproduction number estimated from the CoMix data using the next generation approach in comparison to the reproduction number estimated from the number of confirmed cases (a) and from the number of hospitalizations (b). In green: estimated *R*_{t} under the assumption of a homogeneous proportionality factor. In red: estimated *R*_{t} with the estimated age-specific *q*-susceptibility under the assumption on age-specific infectiousness as in Figs 1 and 2. In blue: estimated *R*_{t} with the estimated temporal and age-specific *q*-susceptibility under the assumption on age-specific infectiousness as in Fig 4. Dots represent means and bars represent 95% percentile (nonparametric) bootstrap-based confidence intervals.

## Discussion

We have demonstrated in this paper that social contact data can be used to inform transmission parameters and to estimate age-specific characteristics of SARS-CoV-2 transmission. More specifically, the next generation approach enables us to disentangle age-specific differences in transmission rates while relying on temporal changes in social contact behavior measured using consecutive waves of a social contact panel study. Clearly, SARS-CoV-2 transmission is partly influenced by age-specific differences in contact behavior, but importantly, additional age-specific factors related to susceptibility and infectiousness, in a broad sense, are necessary to account for. We have shown that such factors imply a smaller susceptibility for children as compared to adults, with the estimated susceptibility in children being around half of the susceptibility in adults, and even less for very young children (Fig 1). This result is in accordance with results obtained using CoMix social contact data in England but using a calibration on the reproduction number instead of the next generation approach [22] as well as in accordance with results obtained from more standard statistical methods [3–6]. With respect to that, we assessed the impact of assuming homogeneous transmission parameters on the reproduction number, showing how (age-)heterogeneous parameters are necessary to correctly align the reproduction number from the CoMix data and the reproduction number estimated from infections or hospitalizations. Moreover, our method is able to estimate temporal transmission parameters and it shows a gradual decrease in susceptibility of adults in line with the progression of the Belgian vaccination campaign (Fig 4). This decrease implies a progressive change in the dynamics of the epidemic with largely unvaccinated childhood age groups gradually becoming more important drivers of SARS-CoV-2 transmission than predominantly vaccinated adult age groups.

However, our method suffers from several limitations. A potential bias which needs to be acknowledged is the use of PCR data which correspond to the observed relative incidence and do not necessarily correspond to the true relative incidence as each age class is not necessarily tested in the same way, even if we discard periods of strong variation in testing policy. Indeed, even in the absence of a change in testing policy (cf. Table B of S1 Appendix), age-specific differences in symptomatology, disease severity and the probability of developing symptoms upon infection lead to different shares of symptomatic and asymptomatic cases to be detected. Other approaches have also been investigated, for example using serological survey data instead of PCR data, but this was not successful on Belgian data given the limited amount of data and the poor synchronization between CoMix and serological survey periods. Moreover, using serological data requires addressing the difficulty of waning humoral immunity against SARS-CoV-2 infection. Despite the fact that we can infer *q*-susceptibility and *q*-infectiousness from the observed PCR test data, we cannot further disentangle both components by estimating the aforementioned quantities simultaneously. By comparing the two separate approaches, the estimation of the relative *q*-susceptibility seems most informative, since proportionality factors are better constrained by the data (cf. also sensitivity analysis in Figs B and C of S1 Appendix). More specifically, the estimated *q*-susceptibility was identifiable when fitting to reported incidence data while *q*-infectiousness estimates were estimated to be zero for certain age classes, which seems an artifact of the methodology (which could potentially be solved by using external constraints on *q*-infectiousness to avoid reaching unrealistic low values of transmission). 
Another limitation of the proposed method is that a further decomposition of *q*-susceptibility (or *q*-infectiousness) in immunological susceptibility (infectiousness) and other external factors relevant for transmission between susceptible and infectious persons is difficult, at least without availability of relevant additional data thereon. Nonetheless, an assessment and quantification of the (relative) *q*-susceptibility, *q*-infectiousness and the corresponding relative incidence provides useful insights into heterogeneous SARS-CoV-2 transmission dynamics.

## Materials and methods

### Social contact data

Our study is based on Belgian social contact data collected within the CoMix survey [12, 13] during the COVID-19 pandemic between December 2020 and May 2021. These data are stored, processed and stratified by age by means of the online Socrates tool [13, 26, 27]. Participants were asked to fill in a contact diary including all contacts made during a specific day, reporting the type of contact, location, and age of the contacted person, with a contact defined as an in-person conversation of at least a few words, or a skin-to-skin contact. The CoMix survey was repeatedly performed in different waves and different survey periods. More specifically, an initial survey period containing 8 waves was carried out between 4 March 2020 and 27 July 2020 targeting adults only. A second survey period, still ongoing in 2021, began on 11 November 2020 targeting participants of all ages. The waves are conducted with an interval of two weeks (14 days). For more detailed information on the CoMix survey and the stratification process, the reader is referred to [12, 13]. A detailed timetable of the CoMix waves and survey periods is presented in Table A of S1 Appendix. A schematic timeline of CoMix waves according to the evolution of the alpha variant of concern and vaccination campaign in Belgium is presented in Fig A of S1 Appendix.

We use the following notation. *N*_{i} denotes the number of individuals in the Belgian population of age *i* according to Belgian demographic data [28] and integrated into the Socrates tool [27]. In general, we use subscripts *i* as an index for the participant’s age, and *j* as an index for the contacted person’s age. The following observable quantities (dependent on the wave chosen) can be extracted from the survey:

- *m*_{ij} represents the average daily number of individuals of age *j* who are contacted by a participant of age *i*. The elements *m*_{ij} constitute a matrix **M** called the *social contact matrix*.
- *c*_{ij} is the per capita contact rate per day for participants of age *i* with persons of age *j* in the population. The elements *c*_{ij} constitute a matrix **C** called the *contact rate matrix*. This matrix is related to the social contact matrix by the relation *c*_{ij} = *m*_{ij}/*N*_{j}.

In theory, due to the reciprocal nature of contacts, the total number of contacts between members of two age classes, as reported by participants in each of the age groups, must be equal, hence *N*_{i}*m*_{ij} = *N*_{i}*c*_{ij}*N*_{j} = *N*_{j}*c*_{ji}*N*_{i} = *N*_{j}*m*_{ji}, which is equivalent to the condition that the contact rate matrix should be symmetric, i.e., *c*_{ij} = *c*_{ji}, ∀*i*, *j*. The social contact matrix **M** respects the relation *N*_{i}*m*_{ij} = *N*_{j}*m*_{ji}, but is in general not symmetric due to differences in *N*_{i} and *N*_{j}. In practice, the observed total numbers of contacts *N*_{i}*m*_{ij} and *N*_{j}*m*_{ji} are not necessarily equal due to sampling bias, hence, we calculate the reciprocal social contact matrix by:

$$m_{ij}^{\text{rec}} = \frac{N_i\, m_{ij} + N_j\, m_{ji}}{2\, N_i}.$$

All these notations and definitions are similar to those described in detail in [29], except that the subscripts *i* and *j* and order of indices are inverted here such that the definition of the social contact matrix **M** corresponds to the default output of the Socrates tool [27].
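The reciprocity correction and the link between **M** and **C** can be illustrated with the short sketch below. It assumes the common correction that averages the two reported totals, i.e. (*N*_{i}*m*_{ij} + *N*_{j}*m*_{ji}) / (2*N*_{i}); the two-class population and contact numbers are hypothetical, and the function names are not those of the Socrates tool.

```python
def reciprocal_contact_matrix(m, pop):
    """Impose N_i * m_ij == N_j * m_ji by averaging the two reported totals."""
    n = len(m)
    return [
        [(m[i][j] * pop[i] + m[j][i] * pop[j]) / (2.0 * pop[i]) for j in range(n)]
        for i in range(n)
    ]


def contact_rate_matrix(m, pop):
    """Per capita contact rates c_ij = m_ij / N_j."""
    n = len(m)
    return [[m[i][j] / pop[j] for j in range(n)] for i in range(n)]


# Hypothetical survey output for two age classes with unequal population sizes.
pop = [100.0, 200.0]
m_observed = [[4.0, 2.0],
              [1.5, 3.0]]

m_rec = reciprocal_contact_matrix(m_observed, pop)
c_rec = contact_rate_matrix(m_rec, pop)
```

Here the observed totals are 200 and 300 contacts between the two classes; after the correction both directions report 250, and the resulting contact rate matrix is symmetric.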

### Next generation principle

The *social contact hypothesis* [16] implies that the age-specific number of infectious contacts is proportional to the self-reported age-specific number of social contacts. There are two ways to interpret empirical social contact survey data in light of this hypothesis: either survey participants can be infected by their infectious contacts or participants can infect their susceptible contacts. Here, we consider the first interpretation as initial definition—since the CoMix survey did not specifically target infected persons, and symptomatic participants may have been less likely to participate in the survey. However, we will show that the two interpretations lead to the same mathematical result under the assumption of reciprocity of social contacts.

If we denote by *w*_{j} the incidence within age class *j* over a short observation interval (e.g. corresponding to a wave period), then *v*_{j} = *w*_{j}/*N*_{j} is the risk of being infected during the observation interval for that age class (incidence rate or force of infection). The new generation of infected people is given by:

$$w'_i = q \sum_j N_i\, m_{ij}\, v_j = q \sum_j \frac{N_i\, m_{ij}}{N_j}\, w_j = \sum_j k_{ij}\, w_j, \qquad k_{ij} = q\, \frac{N_i\, m_{ij}}{N_j},$$

where *q* is a general proportionality factor completely defining the relationship between infection and contact events. The *q*-factor accommodates several effects such as susceptibility to infection, infectiousness upon infection, duration of the infectious period, type and effectiveness of contacts, seasonality, pre-existing natural and vaccine-induced immunity, etc.

The elements *k*_{ij} define a matrix **K** called the *next generation matrix* (or *reproduction matrix*) since *k*_{ij} represents the mean number of individuals of age *i* that are infected through a single individual of age *j* during their entire infectious period (for which the time between consecutive generations of infected individuals is chosen to be equal to the average duration of infectiousness).

Note that under the reciprocity assumption leading to a symmetric matrix **C**, the relation *N*_{i}*m*_{ij} = *N*_{j}*m*_{ji} provides:

$$k_{ij} = q\, \frac{N_i\, m_{ij}}{N_j} = q\, m_{ji}, \qquad \text{i.e.} \qquad \mathbf{K} = q\, \mathbf{M}^{T},$$

corresponding to the second interpretation that survey participants (on the right side of the transpose contact matrix **M**^{T}) can directly infect their contacts (now on the left side) modulo the proportionality factor. This expression, which relies on the transpose of the social contact matrix **M** obtained as a direct output of the Socrates tool, is chosen because of its better numerical stability.

The recurrence relation of the next generation matrix **K**:

$$w^{(n+1)} = \mathbf{K}\, w^{(n)}$$

tends to a stable distribution due to the Perron–Frobenius theorem [30], i.e.,

$$\mathbf{K}\, w = R_t\, w,$$

with *R*_{t} corresponding to the reproduction number of SARS-CoV-2 [31], which is defined as the leading eigenvalue of the next generation matrix **K**. More specifically, the reproduction number *R*_{t} and the relative incidence *w* can be estimated by computing the leading eigenvalue and the corresponding right-eigenvector of **K**. *R*_{t} depends on the proportionality factor *q*, which might be unknown, whereas the relative incidence *w* is independent of *q* and can therefore be directly extracted from the social contact data **M**^{T}. The reproduction number is initially the basic reproduction number *R*_{0}, but becomes the effective reproduction number once social contact patterns evolve and the proportionality factor *q* captures the depletion of susceptibles. We emphasize here that the eigenvector *w* is only recovered up to a global constant and therefore individual components *w*_{j} have no meaning. What can be interpreted are relative ratios such as *w*_{i}/*w*_{j}, providing an estimate of the relative incidence in age class *i* as compared to the incidence in age class *j*. This vector is usually normalized such that ∑_{i} *w*_{i} = 1. In the same way, the incidence rate *v*_{i} can be recovered, in a relative sense, as the leading left-eigenvector of **M**^{T}.
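The independence of the relative incidence from *q* can be illustrated with a short power-iteration sketch in Python. It assumes a next generation matrix of the form *q* **M**^{T}; the helper names (`transpose`, `next_generation_matrix`, `leading_eigenpair`), the contact matrix and the two values of *q* are hypothetical choices made for illustration only.

```python
def transpose(matrix):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def next_generation_matrix(m, q):
    """Homogeneous next generation matrix, assumed here to be q times M^T."""
    return [[q * x for x in row] for row in transpose(m)]


def leading_eigenpair(matrix, iterations=1000, tol=1e-12):
    """Power iteration: leading eigenvalue and right eigenvector summing to 1."""
    size = len(matrix)
    w = [1.0 / size] * size
    eigenvalue = 0.0
    for _ in range(iterations):
        new_w = [sum(matrix[i][j] * w[j] for j in range(size)) for i in range(size)]
        total = sum(new_w)  # w sums to 1, so this converges to the eigenvalue
        new_w = [x / total for x in new_w]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w, eigenvalue = new_w, total
        if converged:
            break
    return eigenvalue, w


# Hypothetical reciprocal contact matrix for two age classes.
contact_matrix = [[4.0, 4.5],
                  [1.5, 8.0]]

r_low, w_low = leading_eigenpair(next_generation_matrix(contact_matrix, 0.05))
r_high, w_high = leading_eigenpair(next_generation_matrix(contact_matrix, 0.20))
print(r_low, w_low)
print(r_high, w_high)
```

In this sketch the two normalized eigenvectors coincide, while the leading eigenvalue scales linearly with the assumed value of *q*.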

The switch from a homogeneous proportionality factor *q* to a heterogeneous *q*_{ij} is performed by assuming:

$$q_{ij} = \bar{q}\, a_i\, h_j,$$

where the vector (*a*_{i}) acts on the susceptible side, the vector (*h*_{j}) acts on the infectiousness side, and *q̄* is a remaining global proportionality factor that captures any residual effect. This remaining factor has no influence on the computation of the relative incidence *w*. However, due to the presence of *q̄*, the vectors (*a*_{i}) and (*h*_{j}) only have a relative interpretation.

The heterogeneous next generation matrix is defined as:

$$\mathbf{K}^{*} = \left(k^{*}_{ij}\right), \qquad k^{*}_{ij} = q_{ij}\, m_{ji} = \bar{q}\, a_i\, m_{ji}\, h_j.$$

We note that we are working here with a next generation matrix with *small domain*. There also exists a next generation matrix with *large domain* taking explicitly into account the different states of the disease and their duration for each age class [8]. However, the small domain approach is appropriate here since we do not work with a dynamical system and heterogeneity in disease duration is part of the effects captured by the proportionality factors.

### Estimating relative *q*-susceptibility (*a*_{i}) and *q*-infectiousness (*h*_{j}) from COVID-19 age-structured indicators

The vectors (*a*_{i}) and (*h*_{j}) have an important impact on the determination of the leading right eigenvector in the system:

$$\mathbf{K}^{*}\, w^{*} = R_t\, w^{*}.$$

The relative incidence *w** obtained from the heterogeneous next generation matrix can be compared with the normalized relative incidence estimated from the observed incidence data in Belgium. Using this approach, we are able to determine the *q*-susceptibility and *q*-infectiousness corresponding to SARS-CoV-2 transmission in Belgium. However, the vectors (*a*_{i}) and (*h*_{j}) cannot be estimated simultaneously in a unique way since an indeterminacy remains [10, 17]. Nevertheless, the identifiability problem can be solved by imposing a constraint on one of the two vectors.
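The indeterminacy can be made concrete with a small Python sketch. It assumes heterogeneous matrix entries of the form *a*_{i} *m*_{ji} *h*_{j} (the global factor is omitted since it does not affect the eigenvector) and uses a closed-form choice of *a*_{i} that makes a given target vector an exact eigenvector. This closed form is only an illustration of non-identifiability, not the fitting procedure used in this study, and all names and numbers are hypothetical.

```python
def heterogeneous_ngm(m, a, h):
    """Entries a_i * m_ji * h_j; the global factor is left out on purpose."""
    size = len(m)
    return [[a[i] * m[j][i] * h[j] for j in range(size)] for i in range(size)]


def leading_right_eigenvector(matrix, iterations=2000, tol=1e-13):
    """Power iteration returning the leading right eigenvector, summing to 1."""
    size = len(matrix)
    w = [1.0 / size] * size
    for _ in range(iterations):
        new_w = [sum(matrix[i][j] * w[j] for j in range(size)) for i in range(size)]
        total = sum(new_w)
        new_w = [x / total for x in new_w]
        done = max(abs(x - y) for x, y in zip(new_w, w)) < tol
        w = new_w
        if done:
            break
    return w


def susceptibility_matching(m, h, target_w):
    """a_i = w_i / sum_j m_ji h_j w_j turns target_w into an exact eigenvector."""
    size = len(m)
    return [
        target_w[i] / sum(m[j][i] * h[j] * target_w[j] for j in range(size))
        for i in range(size)
    ]


# Hypothetical contact matrix and target relative incidence (two age classes).
contact_matrix = [[4.0, 4.5],
                  [1.5, 8.0]]
target_w = [0.15, 0.85]

# Two different (hypothetical) constraints on infectiousness.
h_flat = [1.0, 1.0]
h_other = [0.6, 1.0]

a_flat = susceptibility_matching(contact_matrix, h_flat, target_w)
a_other = susceptibility_matching(contact_matrix, h_other, target_w)

w_flat = leading_right_eigenvector(heterogeneous_ngm(contact_matrix, a_flat, h_flat))
w_other = leading_right_eigenvector(heterogeneous_ngm(contact_matrix, a_other, h_other))
print(a_flat, w_flat)
print(a_other, w_other)
```

Both pairs reproduce the same relative incidence although the relative susceptibilities differ, which is why a constraint on one of the two vectors is needed before the other can be estimated.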

For this study, we choose each time a heterogeneous constraint coming from the literature as well as a homogeneous constraint (whose results are only presented in S1 Appendix). The heterogeneous constraints are defined from the following assumptions:

- For the assumption on infectiousness *h*_{j} (estimation of *q*-susceptibility parameters *a*_{i}): We consider the probability of an asymptomatic COVID-19 infection in case of SARS-CoV-2 exposure in the Belgian population to be as assumed in [19] using data from [2]. Assuming that the relative infectiousness of asymptomatic versus symptomatic individuals is 0.51 [19], we obtain the following constraint:
- For the assumption on susceptibility *a*_{i} (estimation of *q*-infectiousness parameters *h*_{j}): The assumption is taken from [1]:

### Data and fitting procedure

We use Belgian data on daily incidence of COVID-19 confirmed by means of a positive PCR test, as provided by the Belgian Institute for Public Health, Sciensano [24]. In order to reduce testing biases, the period of study is restricted to a period with almost constant testing policy (mandatory testing for both symptomatic cases and asymptomatic close contacts or red zone travelers) and before biases are induced by the introduction of the EU Digital COVID Certificate (see [32] or Table A of S1 Appendix for a summary). Since there is a delay between a change in social contact behavior and its effect on the relative incidence, we consider PCR test results for the period starting 7 days after the onset of a specific CoMix wave and lasting for 14 days thereafter.

Concerning social contact data, the initial CoMix survey waves (1 to 8) are discarded due to a variable testing policy and a lack of information regarding child-child contacts. The three subsequent waves (9 to 11) are also discarded since, although child-child contacts were measured from then on, the information was collected using a different survey formulation. CoMix waves 12 to 23 correspond to a period with constant testing policy, an identical survey design and no vaccination in children, which implies that the results with regard to age classes [0–6), [6–12) and [12–18) years are expected to be more stable. The start of wave 12 corresponds to 22 December 2020, when the vaccination campaign in adults had not yet started, and the last wave considered corresponds to 26 May 2021, with PCR tests considered up to 15 June 2021 (thus when vaccination of the oldest individuals in the Belgian population was nearly completed).

The estimation of the parameters *a*_{i} or *h*_{i} is performed using the statistical software package R. A minimum-distance estimation is performed using the Hellinger distance [33] (which is suitable for distributions) between the model-based relative incidence *w** and the relative incidence derived from the observed data. The optimization is done by means of a random search numerical method [34], starting from an initial homogeneous prior (*a*_{i} = 1 or *h*_{i} = 1, ∀*i* = 1…10). The process uses a Gaussian random walk whose steps are repeated until no change in the distance is observed during 100 consecutive iterations. The sensitivity analysis is provided by repeating the process over 200 nonparametric bootstrap runs using the previous posterior as new prior. Uncertainty is quantified using means and 95% percentile confidence intervals (i.e., 2.5% and 97.5% quantiles of all bootstrap-based estimates).
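The minimum-distance random search can be sketched as follows. This is a simplified Python illustration rather than the R implementation used for the study: the step size (`step=0.05`), the greedy acceptance rule, the iteration cap, the positivity floor and the synthetic two-class data are all assumptions introduced here, and the bootstrap layer is omitted.

```python
import math
import random


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return math.sqrt(0.5 * sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(p, q)))


def model_incidence(m, a, h, iterations=100):
    """Leading right eigenvector (summing to 1) of the matrix a_i * m_ji * h_j."""
    size = len(m)
    w = [1.0 / size] * size
    for _ in range(iterations):
        w = [a[i] * sum(m[j][i] * h[j] * w[j] for j in range(size)) for i in range(size)]
        total = sum(w)
        w = [x / total for x in w]
    return w


def fit_susceptibility(m, h, observed_w, step=0.05, patience=100,
                       max_iterations=20000, seed=1):
    """Greedy Gaussian random walk, stopped after `patience` proposals without gain."""
    rng = random.Random(seed)
    a = [1.0] * len(m)  # homogeneous starting point
    best = hellinger(model_incidence(m, a, h), observed_w)
    stalled = 0
    for _ in range(max_iterations):
        if stalled >= patience:
            break
        candidate = [max(1e-6, x + rng.gauss(0.0, step)) for x in a]
        distance = hellinger(model_incidence(m, candidate, h), observed_w)
        if distance < best:
            a, best, stalled = candidate, distance, 0
        else:
            stalled += 1
    return a, best


# Synthetic check: generate "observed" incidence from a known relative susceptibility.
contact_matrix = [[4.0, 4.5],
                  [1.5, 8.0]]
h_fixed = [1.0, 1.0]
observed = model_incidence(contact_matrix, [0.5, 1.0], h_fixed)

initial_distance = hellinger(model_incidence(contact_matrix, [1.0, 1.0], h_fixed), observed)
a_hat, final_distance = fit_susceptibility(contact_matrix, h_fixed, observed)
print(initial_distance, final_distance, a_hat[0] / a_hat[1])
```

Only the ratio between the components of the fitted vector is meaningful in this sketch, which is consistent with the relative interpretation of the estimated factors.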

Since *q*-susceptibility and *q*-infectiousness vectors represent relative values, a normalization process should be chosen for the representation of the results. We choose here the following (two-step) normalization process:

- The average of the estimated factors (*a*_{i} or *h*_{i}) for the age classes [0,6), [6,12) and [12,18) (children, not subject to vaccination during the complete period) is assumed to be constant across all bootstrap runs and, if applicable, across wave groups (i.e. all combinations of two CoMix waves).
- The mean of *a*_{i} or *h*_{i} (across the bootstrap runs) for the first adult age class [18, 30) is set to one. This constraint is chosen because we mainly want to compare susceptibility and infectiousness of children versus adults while the age class [18, 30) is one of the most stable ones in the bootstrapping process. Note that this second normalization step is only a global scaling, thus conserving the confidence interval around one for the age class [18, 30).
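One possible reading of this two-step normalization is sketched below in Python. It assumes that bootstrap estimates are stored as one list per run, that the first three entries are the child age classes and the fourth is the [18, 30) class, and that the common child average in step one is set to the average over runs (an arbitrary choice, since step two rescales everything globally). The function name and the numbers are hypothetical.

```python
def normalize_bootstrap(estimates, child_classes=(0, 1, 2), reference_class=3):
    """Two-step normalization of relative factors across bootstrap runs."""
    # Step 1: rescale each run so that the child-class average is identical in all runs.
    child_means = [
        sum(run[i] for i in child_classes) / len(child_classes) for run in estimates
    ]
    target = sum(child_means) / len(child_means)  # assumed common value
    step_one = [
        [x * target / mean for x in run] for run, mean in zip(estimates, child_means)
    ]
    # Step 2: one global scaling so that the reference class has mean 1 across runs.
    reference_mean = sum(run[reference_class] for run in step_one) / len(step_one)
    return [[x / reference_mean for x in run] for run in step_one]


# Hypothetical bootstrap estimates: three runs, three child classes plus one adult class.
runs = [
    [0.2, 0.4, 0.6, 1.0],
    [0.3, 0.6, 0.9, 1.2],
    [0.1, 0.2, 0.3, 0.8],
]

normalized = normalize_bootstrap(runs)
child_averages = [sum(run[:3]) / 3 for run in normalized]
adult_mean = sum(run[3] for run in normalized) / len(normalized)
print(child_averages, adult_mean)
```

Because the second step is a single global factor, the relative spread of the [18, 30) estimates across runs is preserved.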

### *R*_{t} evolution from contact patterns

Via the next generation approach, the ratio of the eigenvalues of two next generation matrices can be used to evaluate the relative reduction in the reproduction number. This can be done to compare the temporal reproduction number derived from the CoMix survey with independent evaluations of the reproduction number. We use as comparison the *R*_{t} computed from the daily number of cases [25] and the daily number of hospitalizations [24]. In order to account for the time delays associated with infections and hospitalizations (e.g. time to develop symptoms, time to hospitalization, etc.), the reproduction number computed from the CoMix social contact data was shifted forward in time. A time shift of 7 (14) days is considered when comparing *R*_{t} estimates with the reproduction number computed from the number of confirmed cases (respectively hospitalizations). These time shifts are chosen in order to take account of a mean delay between infection and (symptomatic) testing as well as an additional delay between symptom onset and hospitalization [35]. As the reproduction number is known only up to an overall proportionality constant, we fix the reproduction number for CoMix wave 12 to be equal to the reproduction number computed from infections or hospitalizations. Uncertainty due to sampling variability is estimated via 10000 nonparametric bootstraps.
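The anchoring of the reproduction number to a reference wave can be sketched as follows, under the assumption that the proportionality constant is the same for all waves so that only eigenvalue ratios matter. The matrices, the reference value of 1.1 and the function names are hypothetical, and the time shifts and bootstrap described above are not included.

```python
def leading_eigenvalue(matrix, iterations=1000):
    """Leading eigenvalue of a non-negative matrix via power iteration."""
    size = len(matrix)
    w = [1.0 / size] * size
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * w[j] for j in range(size)) for i in range(size)]
        eigenvalue = sum(w)  # w summed to 1 before the product
        w = [x / eigenvalue for x in w]
    return eigenvalue


def anchored_reproduction_numbers(contact_matrices, reference_index, reference_rt):
    """Scale eigenvalue ratios so that the reference wave matches reference_rt."""
    # M and its transpose share the same eigenvalues, so no transposition is needed.
    eigenvalues = [leading_eigenvalue(m) for m in contact_matrices]
    reference = eigenvalues[reference_index]
    return [reference_rt * value / reference for value in eigenvalues]


# Hypothetical waves: the second wave has 20% fewer contacts in every cell.
wave_reference = [[4.0, 4.5],
                  [1.5, 8.0]]
wave_later = [[0.8 * x for x in row] for row in wave_reference]

rt_values = anchored_reproduction_numbers(
    [wave_reference, wave_later], reference_index=0, reference_rt=1.1
)
print(rt_values)
```

In this toy case a uniform 20% reduction in contacts translates into a 20% reduction of the anchored reproduction number, since the leading eigenvalue scales linearly with the matrix.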

## Supporting information

### S1 Appendix. Supplementary material.

Presentation of the complete results with all figures and tables containing values. Table A: Timetable of CoMix starting dates. Fig A. Schematic timeline of CoMix waves. Table B: Timetable of Belgian testing policy. Figs. B and C. Sensitivity analysis of the method under different assumptions. Figs. D to S: Estimates of relative proportionality factors and corresponding relative incidence under different assumptions. Table C to J: Exact estimates of the components of the proportionality factors.

https://doi.org/10.1371/journal.pcbi.1009965.s001

(PDF)

## Acknowledgments

We thank several researchers from the SIMID COVID-19 consortium (interuniversity collaboration between University of Antwerp (CHERMID) and UHasselt (DSI, CenStat)) as well as other researchers from the Interuniversity Institute of Biostatistics and statistical Bioinformatics (I-BioStat) (KU Leuven and UHasselt) for numerous constructive discussions and meetings. The authors thank the EpiPose consortium partners for useful discussions and for help in setting up the CoMix survey as part of EpiPose. The authors are also very grateful for access to the data from the Belgian Scientific Institute for Public Health, Sciensano.

## References

- 1. Davies NG, Klepac P, Liu Y, Prem K, Jit M, Pearson CAB, et al. Age-dependent effects in the transmission and control of COVID-19 epidemics. Nature Medicine. 2020;26(8):1205–1211. pmid:32546824
- 2. Wu JT, Leung K, Bushman M, Kishore N, Niehus R, de Salazar PM, et al. Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China. Nature Medicine. 2020;26(4):506–510. pmid:32284616
- 3. Goldstein E, Lipsitch M, Cevik M. On the Effect of Age on the Transmission of SARS-CoV-2 in Households, Schools, and the Community. The Journal of Infectious Diseases. 2020;223(3):362–369. pmid:32743609
- 4. Viner RM, Mytton OT, Bonell C, Melendez-Torres GJ, Ward J, Hudson L, et al. Susceptibility to SARS-CoV-2 Infection Among Children and Adolescents Compared With Adults: A Systematic Review and Meta-analysis. JAMA Pediatrics. 2021;175(2):143–156. pmid:32975552
- 5. Hu S, Wang W, Wang Y, Litvinova M, Luo K, Ren L, et al. Infectivity, susceptibility, and risk factors associated with SARS-CoV-2 transmission under intensive contact tracing in Hunan, China. Nature Communications. 2021;12(1):1533. pmid:33750783
- 6. Sun K, Wang W, Gao L, Wang Y, Luo K, Ren L, et al. Transmission heterogeneities, kinetics, and controllability of SARS-CoV-2. Science. 2021;371(6526):eabe2424. pmid:33234698
- 7. Hoang T, Coletti P, Melegaro A, Wallinga J, Grijalva CG, Edmunds JW, et al. A Systematic Review of Social Contact Surveys to Inform Transmission Models of Close-contact Infections. Epidemiology. 2019;30(5). pmid:31274572
- 8. Diekmann O, Heesterbeek JAP. Mathematical epidemiology of infectious diseases: model building, analysis and interpretation. vol. 5. John Wiley & Sons; 2000.
- 9. Santermans E, Goeyvaerts N, Melegaro A, Edmunds WJ, Faes C, Aerts M, et al. The social contact hypothesis under the assumption of endemic equilibrium: Elucidating the transmission potential of VZV in Europe. Epidemics. 2015;11:14–23. pmid:25979278
- 10. Flasche S, Hens N, Boëlle PY, Mossong J, van Ballegooijen WM, Nunes B, et al. Different transmission patterns in the early stages of the influenza A(H1N1)v pandemic: A comparative analysis of 12 European countries. Epidemics. 2011;3(2):125–133. pmid:21624784
- 11. Mossong J, Hens N, Jit M, Beutels P, Auranen K, Mikolajczyk R, et al. Social Contacts and Mixing Patterns Relevant to the Spread of Infectious Diseases. PLOS Medicine. 2008;5(3):1–1.
- 12. Coletti P, Wambua J, Gimma A, Willem L, Vercruysse S, Vanhoutte B, et al. CoMix: comparing mixing patterns in the Belgian population during and after lockdown. Scientific Reports. 2020;10(1):21885. pmid:33318521
- 13. Verelst F, Hermans L, Vercruysse S, Gimma A, Coletti P, Backer JA, et al. SOCRATES-CoMix: a platform for timely and open-source contact mixing data during and in between COVID-19 surges and interventions in over 20 European countries. BMC Medicine. 2021;19(1):254. pmid:34583683
- 14. Jarvis CI, Van Zandvoort K, Gimma A, Prem K, Auzenbergs M, O’Reilly K, et al. Quantifying the impact of physical distance measures on the transmission of COVID-19 in the UK. BMC Medicine. 2020;18(1):124. pmid:32375776
- 15. Gimma A, Munday JD, Wong KL, Coletti P, van Zandvoort K, Prem K, et al. CoMix: Changes in social contacts as measured by the contact survey during the COVID-19 pandemic in England between March 2020 and March 2021. medRxiv. 2021; pmid:34282423
- 16. Wallinga J, Teunis P, Kretzschmar M. Using Data on Social Contacts to Estimate Age-specific Transmission Parameters for Respiratory-spread Infectious Agents. American Journal of Epidemiology. 2006;164(10):936–944. pmid:16968863
- 17. Goeyvaerts N, Hens N, Ogunjimi B, Aerts M, Shkedy Z, Damme PV, et al. Estimating infectious disease parameters from data on social contacts and serological status. Journal of the Royal Statistical Society: Series C (Applied Statistics). 2010;59(2):255–277.
- 18. Willem L, Abrams S, Libin PJK, Coletti P, Kuylen E, Petrof O, et al. The impact of contact tracing and household bubbles on deconfinement strategies for COVID-19. Nature Communications. 2021;12(1):1524. pmid:33750778
- 19. Abrams S, Wambua J, Santermans E, Willem L, Kuylen E, Coletti P, et al. Modelling the early phase of the Belgian COVID-19 epidemic using a stochastic compartmental model and studying its implied future trajectories. Epidemics. 2021;35:100449. pmid:33799289
- 20. Franco N. COVID-19 Belgium: Extended SEIR-QD model with nursing homes and long-term scenarios-based forecasts. Epidemics. 2021;37:100490. pmid:34482186
- 21. Coletti P, Libin P, Petrof O, Willem L, Abrams S, Herzog SA, et al. A data-driven metapopulation model for the Belgian COVID-19 epidemic: assessing the impact of lockdown and exit strategies. BMC Infectious Diseases. 2021;21(1):503. pmid:34053446
- 22. Munday JD, Jarvis CI, Gimma A, Wong KLM, van Zandvoort K, Liu Y, et al. Estimating the impact of reopening schools on the reproduction number of SARS-CoV-2 in England, using weekly contact survey data. BMC Medicine. 2021;19(1):233. pmid:34503493
- 23. Chin T, Feehan DM, Buckee CO, Mahmud AS. Contact surveys reveal heterogeneities in age-group contributions to SARS-CoV-2 dynamics in the United States. medRxiv. 2021;
- 24. Sciensano, the Belgian public health institute; 2021. https://epistat.wiv-isp.be/covid/.
- 25. Data Science Institute UHasselt, COVID-19 Data Dashboard; 2021. https://gjbex.github.io/DSI_UHasselt_covid_dashboard/.
- 26. Willem L, Van Hoang T, Funk S, Coletti P, Beutels P, Hens N. SOCRATES: an online tool leveraging a social contact data sharing initiative to assess mitigation strategies for COVID-19. BMC Research Notes. 2020;13(1):293. pmid:32546245
- 27. Social Contact Rates (SOCRATES) Data Tool: as part of the socialcontactdata.org initiative; 2021. http://www.socialcontactdata.org/socrates-comix/.
- 28. StatBel, the Belgian statistical office; 2020. https://statbel.fgov.be/en.
- 29. Held L, Hens N, O’Neill P, Wallinga J. Handbook of Infectious Disease Data Analysis. Boca Raton: Chapman and Hall/CRC; 2020.
- 30. Meyer CD. Matrix Analysis and Applied Linear Algebra. Other Titles in Applied Mathematics. SIAM; 2000. Available from: https://books.google.be/books?id=-7JeAwAAQBAJ.
- 31. Diekmann O, Heesterbeek JAP, Metz JAJ. On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations. Journal of Mathematical Biology. 1990;28(4):365–382. pmid:2117040
- 32. History of Belgian testing policy and procedures; 2021. https://covid-19.sciensano.be/fr/procedures/historique-des-changements.
- 33. Beran R. Minimum Hellinger Distance Estimates for Parametric Models. The Annals of Statistics. 1977;5(3):445–463.
- 34. Rastrigin L. The convergence of the random search method in the extremal control of a many parameter system. Automaton & Remote Control. 1963;24:1337–1342.
- 35. Faes C, Abrams S, Van Beckhoven D, Meyfroidt G, Vlieghe E, Hens N. Time between Symptom Onset, Hospitalisation and Recovery or Death: Statistical Analysis of Belgian COVID-19 Patients. International Journal of Environmental Research and Public Health. 2020;17(20):7560. pmid:33080869