Estimation of introduction and transmission rates of SARS-CoV-2 in a prospective household study

Household studies provide an efficient means to study transmission of infectious diseases, enabling estimation of susceptibility and infectivity by person-type. A main inclusion criterion in such studies is usually the presence of an infected person. This precludes estimation of the hazards of pathogen introduction into the household. Here we estimate age- and time-dependent household introduction hazards together with within household transmission rates using data from a prospective household-based study in the Netherlands. A total of 307 households containing 1,209 persons were included from August 2020 until March 2021. Follow-up of households took place between August 2020 and August 2021 with maximal follow-up per household mostly limited to 161 days. Almost 1 out of 5 households (59/307) had evidence of an introduction of SARS-CoV-2. We estimate introduction hazards and within-household transmission rates in our study population with penalized splines and stochastic epidemic models, respectively. The estimated hazard of introduction of SARS-CoV-2 in the households was lower for children (0-12 years) than for adults (relative hazard: 0.62; 95%CrI: 0.34-1.0). Estimated introduction hazards peaked in mid October 2020, mid December 2020, and mid April 2021, preceding peaks in hospital admissions by 1-2 weeks. Best fitting transmission models included increased infectivity of children relative to adults and adolescents, such that the estimated child-to-child transmission probability (0.62; 95%CrI: 0.40-0.81) was considerably higher than the adult-to-adult transmission probability (0.12; 95%CrI: 0.057-0.19). Scenario analyses indicate that vaccination of adults can strongly reduce household infection attack rates and that adding adolescent vaccination offers limited added benefit.


Introduction
Transmission of SARS-CoV-2 occurs predominantly in indoor settings such as public transportation, workplaces, schools, and households [1][2][3][4].Infection can cause the respiratory and systemic disease COVID-19, but in general severity and progression of the disease are mild [5,6].This has hampered, even despite huge research attempts, to accurately quantify variations in susceptibility and infectiousness, and how these depend on host characteristics such as age and sex, type of infecting strain, pre-existing immunity, and vaccination.
Household studies are considered the gold standard for the study of infectious disease transmission, as they provide a setting in which transmission events can be pinned down to one or a small number of potential infectors [7][8][9][10][11][12][13][14][15][16][17][18].Classical analyses of household data use statistical regression techniques to estimate the fraction of persons that are infected over the course of a household outbreak (the secondary attack rate or SAR), stratified by person-type and household characteristics [19].For SARS-CoV-2, a meta analysis of 54 studies has revealed that secondary attack rates are higher when the index case is symptomatically infected, that transmission to adults occurs more often than to children, that transmission to spouses occurs more often than to other family contacts, but that there are no significant sex-differences in attack rates [4].These studies, while providing valuable information, do not provide estimates of parameters that have a biological interpretation, and in particular do not provide insight in the rates of direct person-to-person transmission.As a consequence, they also do not lend themselves to extrapolation and scenario analyses.In addition, most studies use a reactive design in which households are included only after an infected person has been detected in the household (see [20] for an exception).This makes these studies vulnerable to bias, e.g., household-size biased inclusion and bias toward inclusion of households with more severely infected index cases.Also, as households are only included after the first infection these studies cannot estimate the rates at which infections are introduced into the household.
We propose the prospective household-based cohort study as an attractive crossover of the reactive household study and the prospective cohort study.While classical person-based prospective cohort studies in principle can provide high-quality information on risk factors and confounding variables, they are also inefficient if the outcome (infection) occurs infrequently [21].This inefficiency is remedied by employing a household based inclusion, as it increases the number of events (i.e.infections).We illustrate this by using data from a prospective household study carried out in the Netherlands in the first year of the SARS-CoV-2 pandemic.The study contains a total of 1, 209 persons distributed over 307 households, of which 59 are infected during the study period.We analyse the data in a Bayesian framework using survival analysis for estimation of the hazards of introduction of SARS-CoV-2 into households, and stochastic SEIR epidemic models for estimation of within household transmission rates.As it is known that contacts between household members do not occur randomly [22], we stratify the analyses by person-type (child, adolescent, adult), and select the most likely contact structure based on statistical evaluation of competing models.We show that precise estimates of the type-specific introduction rates can be obtained together with the person-to-person transmission rates, and explore the impact of different vaccination strategies on reducing (1) the role of households as a multiplier of infection and (2) the probabilities of infection of specific persons (e.g., adults).

Introduction of SARS-CoV-2 into households
Given the uncertainties at the time on dominant transmission routes of SARS-CoV-2 several household studies were initiated in the Netherlands in 2020 [2,3].Inclusion of households in our study was fairly uniform from September 2020 until January 2021, and has been decreasing from January 2021 onward (Fig 1).Most households are included for the maximal followup period of 161 days, and SARS-CoV-2 has been introduced and established in almost one out of five households (59 of 307, 19%).To explore whether the hazards of introduction can be explained by simpler parametric functions with small number of parameters we perform a number of sensitivity analyses.For instance, a scenario which assumes fixed person-specific introduction hazards yield estimates of the introduction hazards of 0.00030 (0.00018-0.00047)per day for children, 0.00053 (0.00031-0.00086)per day for adolescents, and 0.00052 (0.00039-0.00067)per day for adults.Hence, this model confirms that the introduction hazard is lower for children than for

Within-household transmission of SARS-CoV-2
A total of 59 out of 307 households had a detected SARS-CoV-2 introduction over the course of the study, and in these 59 households 119 of 237 persons had documented SARS-CoV-2 infections.Of these 119 infections, 77 are considered primary or co-primary cases that introduced the infection in the households.Total numbers of children, adolescents, and adults in these households are 89, 31, and 117, corresponding numbers of primary and co-primary cases are 21, 8, and 48, and corresponding numbers of household infections are 19, 3, and 20.A concise overview of the household data is given in Table 1.
The household data are analyzed with a suite of transmission models that vary in the assumptions on the person-to-person transmission rates.Most models are variations and simple extensions of so-called proportionate mixing models in which the transmission rate β ij from a j-type infected person to an i-type susceptible person per infectious period can be written as b ij ¼ f 0 i g 0 j , where f 0 j denotes the absolute infectiousness of type j, and g 0 i the absolute susceptibility of type i.With three person-types these models have a maximum of 6 parameters.One parameter is redundant, however, and the proportionate mixing model can be formulated in terms of the transmission rate among persons of a reference class (β, here taken to be the adults) and the infectiousness and susceptibility of children (C) and adolescents (A) relative to the reference (f C , f A , g C , and g A )(Methods, see also [23]).An overview of the scenarios is given in Table 2.The analyses show that models that do not stratify the population by person-type ('no stratification'), or models that assume a separate transmission parameter for each persontype combination ('full model') perform less well than models of intermediate complexity.In particular, the 'variable infectivity' model with an additional parameter for child-to-child transmission performs best both with regard to LOO_IC and WBIC.In terms of the above Table 1.Overview of household infection data.Shown are household compositions (columns 1-3), the total number of households with a given compositions (column 4), and the total number of household infections within each group (columns 5-7).Full data are available in the online repository (https://github.com/mvboven/sars2-households).transmission matrix, this model is parameterised as follows:

Household composition Number of households
where z CC denotes the specific child-to-child transmission rate.
Table 2. Comparison of household models using information criteria.Model selection is based on the information criteria LOO_IC and WBIC.Shown are results for models that do not stratify the population by age ('no stratification'), and that stratify the population into children (0-12 years), adolescents (12-17 years), and adults (over 18 years).Stratified transmission models are considered in which susceptibility of different age groups is estimated while infectivity is assumed to be identical for different age groups ('variable susceptibility'), in which infectivity is estimated while susceptibility is fixed ('variable infectivity'), and in which both susceptibility and infectivity are estimated ('proportionate mixing').Within the 'variable infectivity' model we further consider sub-models with transmission rates for child-to-child, adolescent-to-adolescent, and adult-to-adult transmission.A saturated model with a separate parameter for each transmission rate is also considered ('full model').The number of within-household parameters is indicated by n.All models assume density-dependent transmission.Sensitivity analyses are performed with respect to the distribution of the infectious period and assumptions on the primary case(s) in the household.As final size distributions are invariant with respect to the mean of the infectious period [24], we focus on how the results are affected by variation in the infectious period distribution.We consider scenarios with an infectious period of fixed duration, and one with an infectious period with more variation than in the default scenario.The parameter estimates of the scenario with fixed infectious period are

Model
, where α = 50 is the shape parameter of the infectious period probability distribution. https://doi.org/10.1371/journal.pcbi.1011832.g003 Table 4. Estimates of within-household transmission parameters when coprimary cases are assumed to be infected within the household.Parameter estimates are shown for the variable infectivity model with separate child-to-child transmission.See Table 3 for details and comparison with the main analysis.

Parameter Estimate 95%CrI
Transmission rate among adults (β) (per person per infectious period) 0.22 (0.14-0.very close to those reported in Table 3.The same is true if variation in the infectious period is increased such that the 95% coverage is 0.61-1.48(gamma shape and rate parameter both equal to 20) instead of 0.74-1.30 in the default scenario.Second, we consider a scenario in which we assume that all cases that had originally been designated as co-primary cases (i.e.infected from outside of the household) had actually been infected within the household [3].
In this scenario, the precision with which introduction hazards can be estimated is decreased, while precision of within-household transmission parameter estimates is increased.Noteworthy, the overall within-household transmission rate increases, while the three peaks of the introduction hazard are still noticeable but are decreased in size.Overall patterns of withinhousehold transmission, including the high estimated child-to-child transmission rates, remain as in the main analyses.Table 4 shows the results of the within-household analyses when using the same model as in the main analyses.
To evaluate the correspondence between the household infection data and posterior predictions we perform posterior predictive checks for the most common household composition consisting of two adults and two children (cf.Table 1) using parameter estimates of the preferred model (Table 3).Of the 22 households with two adults and two children, we focus on the 18 households with a single primary introduction.SARS-CoV-2 is introduced into the household by a child in 4 households and by an adult in 18 households.Fig 4 shows the observed fraction of households with a given number of secondary infections (black dots, lines represent 95% binomial confidence ranges) with the corresponding posterior probabilities  1).Shown are the fractions of the households with a given number of secondary infections (black dots) with exact binomial confidence ranges (lines) stratified by primary case (child or adult), together with the corresponding posterior probabilities of secondary cases.https://doi.org/10.1371/journal.pcbi.1011832.g004(violin plots).Notice that the binomial confidence ranges are quite wide, owing to the small number of households with a given outcome.Overall, the posterior medians of the infection probabilities are within the binomial confidence ranges, and in the 18 households with an adult primary introduction also close to the observed fractions with a given outcome.For other household compositions the numbers are even smaller and comparisons of outcome data with posterior predictions are consequently less informative.

The impact of vaccination on household transmission
The above preparations enable quantification of the role of households as a multiplier of infection.We focus on the secondary attack rate (SAR) and the probability that a focal adult in the household is infected.Table 5 shows the results for various household compositions and vaccination scenarios.Specifically, we consider two vaccination scenarios, one in which all adults in the household are vaccinated, and another one in which both adolescents and adults are vaccinated.In both cases we assume a leaky vaccine that is highly effective in preventing infection (VE S = 0.9) but does not reduce infectiousness (VE I = 0).Additionally, we base all ensuing analyses on the transmission model with highest statistical support (Table 2).Due to the high estimated infectiousness of children, estimated SARs are invariably higher if a child is the primary case rather than an adolescent or adult.Differences in outbreak size can be substantial, especially in larger households.For instance, in households of six persons the estimated SAR is 0.50 if the child is the primary case, but less than 0.25 if it is the adolescent or adult.Noteworthy, estimates are less precise in households with one or more adolescents due to the relatively small number of adolescents in our study.
For the estimated parameters, vaccination with an effective but slightly leaky vaccine has the potential to strongly reduce the size of the household outbreaks (Table 5).For instance, in households consisting of a single child and a single adult, the SAR is 0.25 if the child is the primary case and no vaccination is applied, but just 0.029 if the adult has been vaccinated.In larger households, sizeable reductions can also be achieved, depending on the primary case.Focusing again on households of six persons, the SAR is reduced from 0.50 to 0.31 if a child is Table 5.Estimated secondary attack rates without and with vaccination.The vaccine is assumed to be 90% effective (VE S = 0.9) in preventing infection but not effective reducing infectiousness (VE I = 0).Shown are inferred secondary attack rates (SARs) for various household compositions.Estimates are represented by posterior medians and 2.5% and 97.5% posterior quantiles.Households consist of either a child and an adult (rows 1-2), an adolescent and an adult (rows 3-4), two children and two adults (rows 5-6), two adolescents and two adults (rows 7-8), or two children, two adolescents, and two adults (rows 9-11), thus including the most common household compositions with children and adolescents in the Netherlands (rows 1-8).For each household composition, SARs are calculated for all possible primary cases.Vaccination scenarios are considered in which adults are vaccinated, or in which both adults and adolescents are vaccinated.NA: households do not contain an adolescent.Estimates are based on 1,000 samples from the posterior distribution.

Household composition (numbers)
Primary the primary case, from 0.21 to 0.11 if the primary case is an adolescent, and from 0.24 to 0.18 if the primary case is an adult.Adding vaccination of adolescents does further decrease household outbreak size in households in which an adolescent is present.However, due to the smaller rates of transmission to and from adolescents (Fig 3) the added benefit of adolescent vaccination is smaller compared to adult vaccination.Second, we explore how adding adolescent vaccination might further reduce the probability of infection of a specific adult.This is especially relevant as adults have an intrinsically higher risk of severe disease, especially if the adult is immunocompromised or immunosuppressed [25,26].We focus on a number of illustrative scenarios for various household compositions and vaccination scenarios.The analyses show that the estimated probability of infection of the adult is high in the absence of vaccination, especially if a child is the primary infection in the household (Fig 5A), but can be strongly reduced by adult vaccination (Fig 5B).However, adding adolescent vaccination does not noticeably reduce the probability of adult infection further, as transmission is already strongly reduced and adults are already protected directly by vaccination (Fig 5C ).
Finally, to investigate the robustness of the above vaccination scenarios with respect to the assumed vaccine efficacy we perform scenario analyses with increased vaccine efficacy against infectiousness (VE I = 0.5 instead of VE I = 0).Table 6 shows that in this case the SARs only exceeds 10% in households of size 4 or more when the child is the primary case.In fact, the SAR is highest in a household of size 6 when a child is the primary case.Interestingly, for this particular household composition the impact of vaccination does not noticeably depend on the ability to reduce infectiousness after vaccination, as most infections in these households are caused by the highly infectious children (cf Table 5 with Table 6).

Discussion
We have shown that precise estimates of SARS-CoV-2 household introduction hazards as well as within-household transmission rates can be obtained in a modestly sized study.This has been possible by virtue of the prospective set-up in which households are included before the first infection in the household has been observed.The prospective design has the added benefit compared to reactive household studies that there is lower risk of bias, in particular bias caused by preferential inclusion of larger households (as it is more likely that an infection is introduced in a larger than in a smaller household) and by preferential inclusion of households with severe infections (as these are more likely noticed) [4,27].To make this design work logistically, we have employed a so-called passive-active follow-up strategy in which households are semi-passively followed during the at-risk period with an app, and in which more intensive follow-up is performed upon notification of acute respiratory symptoms in the household [3].
With regard to the introduction hazards we have taken an agnostic approach in which hazards are optimally informed by the data, i.e. without assuming a specific underlying population-level transmission model.This was done on purpose, as the early SARS-CoV-2 pandemic has been strongly affected by lock-downs and behavioral responses, and is not easily described by simple transmission models [28][29][30].With regard to within-household transmission, however, we fit a stochastic transmission model to the data.The within-household analyses provide estimates of age-stratified transmission rates, and in particular yield estimates of intrinsic transmissibility between children, adolescents, and adults in a population with low vaccination coverage and low pre-existing cumulative infection attack rates (� 10-20%, [31]).Our analyses have revealed that, compared with adolescents and adults, children were not the main source of introduction of SARS-CoV-2 into households, that adolescents played a minor role propagating the infection in the household, that children were the dominant transmission source in the household (see also [2,18,27]), and in general that quantitative estimates of introduction hazards can be obtained.In fact, we are not aware of other studies providing such estimates that are optimally informed by the data.In our analyses, the estimated introduction hazards range from � 0.0001 per child per day in epidemic troughs to � 0.001 per adolescent/ adult per day in epidemic upswings.Timing of the peaks and troughs relative to hospital admissions in the Netherlands (Fig 2 ) suggest that this approach has produced reliable results.
While we are convinced that within the confines of our data we have selected the transmission model that is in a statistical sense optimal among the models considered, we also acknowledge a number of weaknesses and alternatives.First, throughout we have assumed a so-called density-dependent transmission model in which each individual in the household makes a fixed number of contacts with each of the other household members per unit of time.This was Table 6.Estimated secondary attack rates without and with vaccination.The vaccine is assumed to be 90% effective (VE S = 0.9) in preventing infection and moderately effective reducing infectiousness (VE I = 0.5).Shown are inferred secondary attack rates (SARs) for various household compositions.Estimates are represented by posterior medians and 2.5% and 97.5% posterior quantiles.See Table 5 for details, and results without vaccination and with vaccination with a vaccine that does not reduce infectiousness.

Household composition (numbers)
Primary done for convenience, as it enables direct translation of the estimated transmission parameters to conditional infection probabilities irrespective of household size, but also because this model provides a slightly better fit to the data than a frequency-dependent transmission model.However, our data contains just 59 infected households, and has limited variation in the size of infected households (3-6 persons).Therefore, alternative contact scenarios cannot yet be discarded (see also [22]).Reassuringly, estimates of person-to-person transmission probabilities are very close under the density-and frequency-dependent contact models for the most common household composition of 1-4 persons.Second, we have assumed that if initial cases in the household are found at the same day, that these cases represent co-primary cases.If one or more of these co-primary cases would actually have been infected in the household, then including this information in the analyses increases within-household transmission rates while decreasing introduction hazards (see Results).In general, it will be very difficult to determine which person or persons have been the primary case or primary cases if onsets of symptoms are on the same day or only a few days apart.This inability to pinpoint the primary case is common to all household-based studies, and is a weakness that is not easily remedied.Fortunately, in sensitivity analyses our results appeared qualitatively robust to such trade-offs between introduction and transmission rates when more or fewer infections are assumed to represent primary case(s).
A third point of concern is that temporal information is used to estimate the introduction hazard but that only limited temporal information is used in the within-household analyses.In fact, temporal information only affects the probability that an additional infection will be introduced from outside the household.We have assumed that household outbreaks have a duration that is equal to the outbreak durations as defined in our earlier study [3].In these analyses most household outbreaks had a duration of 2 to 5 weeks (median: 26 days, range 13-126 days).Fortunately, the parameters seemed to be hardly affected by assumptions on the duration of the household outbreaks, as the estimated introduction hazard is very small compared to the within-household transmission rates (> 2 orders difference).For instance, setting the outbreak duration to either 14 or 28 days for all households hardly affected the results.More problematic, though, might be the implicit assumption that the introduction hazard may vary over time and by person-type, but does not depend on whether other household members have recently been infected outside the household.This can be unrealistic if household members share contacts outside of the household.Incorporation of such correlations in the introduction hazard is a major direction of future model development and applications.
A fourth and final point is the assumption that we are able to perfectly determine who has and who has not been infected.In other words, we assume perfect specificity and sensitivity of the testing chain, viz. the combination of both the tests and testing procedures.It is known that especially in reactive household studies these assumptions may not be met, and that, for instance, results can be affected if sampling is not sufficiently dense [2].Our study differs fundamentally from those reactive studies in the prospective app-based setup, and intensive follow-up upon the detection of a household infection [3].Still, it is conceivable that some misclassifications may have occurred.This applies in particular to the sensitivity of the testing procedures and recognition of primary cases in the household.This is due to the fact that some asymptomatic infections may have been missed, resulting in an underestimation of the introduction hazards.Such missed households would probably mostly occur in case that there would be a single primary case, as the probability of household detection increases with numbers of infections.This, in turn, could potentially have led to lower estimated within-household transmission rates.In all, however, we believe that our study has small probability of misspecification, and certainly much smaller than in other household studies [4].
Our findings demonstrate that prospective household-based studies hold considerable promise to study the interplay between vaccination-and infection-induced immunity on the household introduction hazard, the susceptibility to (re-)infection, and the infectiousness once infected in terms of biologically interpretable parameters.In addition, analysing such data using transmission models has the added advantage over traditional statistical analyses [2,27,32] that transmission model-based estimates can feed into impact analyses of interventions.

Ethics statement
This prospective cohort study was conducted in the Netherlands and was reviewed and approved by the Medical Ethical Committee Utrecht, The Netherlands (reference number 17-069/M), the Medical Ethical Committee of the Vrije Universiteit Medical Centre (VUmc), the Netherlands (reference number A2012.901), and the Medical Ethical Committee of Erasmus Medical Centre, the Netherlands (reference number MEC-2020-0609).Written informed consent was obtained from all participating household members and/or their legal representatives.

Study design and data collection
The Cokids household-based study into the causes of acute respiratory infections (ARIs) in households with underage children was conducted between August 2020 and August 2021.Enrollment was focused on the period between August 2020 and February 2021.Eligible households were selected from three existing Dutch birth cohort studies, and all had at least one child aged 0-17 years.Details of the study design are presented elsewhere [3].Here we briefly mention the salient aspects of the study relevant to our analyses.First, the study contained a core study with follow-up of a maximum of 161 days, and an extended study with follow-up period until July 2021.Since follow-up criteria are different between the core and extended follow-up, and could potentially lead to bias, we here only include the basic followup period for estimation of the introduction hazards.We did, however, include the 4 households with extended follow-up in the analyses of within-household transmission.Additionally, there were 4 households of size 2 in the core study that provided information on the introduction of SARS-CoV-2 into the household but not on within-household transmission as both persons were labelled as primary cases.There was only 1 vaccinated person in the data used for our analyses.
All participants were checked daily, using an app developed specifically for the study, for the occurrence of respiratory symptoms and fever.In addition, all participants were screened for a panel of respiratory viruses at a 4-6 week intervals, irrespective of symptoms.At new onset of respiratory symptoms or a SARS-CoV-2 positive test result, a household outbreak study was initiated, which included daily symptom recording, repeated PCR testing of nosethroat swabs, saliva, and fecal samples, and SARS-CoV-2 antibody testing using paired dried blood spots in all household members.Hence, an outbreak study with intensified follow-up of the household was initiated when (1) a household member developed new-onset respiratory symptoms or fever, or (2) a SARS-CoV-2 positive result was received on a screening test, or (3) a positive test result was received from an external testing site.Based on symptoms, the first case or cases in the household were marked as primary or coprimary infections.Details of the follow-up procedures are given elsewhere [3].
Hospital admission data have been retrieved from the National Institute for Public Health and the Environment's open data (https://data.rivm.nl/covid-19).To remove weekday effects we present these data as a centered 7-day moving average.

Transmission model
Our analyses are based on a multi-type susceptible-exposed-infectious-recovered (SEIR) transmission model.In this model individuals are classified as susceptible to infection (S), infected but not yet infectious (E), infected and infectious (I), or recovered and immune (R).Throughout we stratify the population by age as follows: children (under 12 years), adolescents (12-18 years), and adults (18 years and older).This stratification corresponds well with age at which children transition from primary to secondary school, and from secondary school to subsequent education in the Netherlands [2,3].
The within-household analyses use the number of infections that have occurred at the end of the household outbreaks [8,11,24,33].Statistical methodology based on such final size data is well-developed, and these methods have advantages over analyses that use all temporal information.First, the number of infections in the household can usually be determined with high certainty, while the timing of transitions between classes is often uncertain.Second, final size analyses are invariant with respect to the latent period, i.e. the time that individual spend in the exposed (E) class [33].Hence we may, without loss of generality, focus on a simplified susceptible-infectious-recovered (SIR) type model with no latent period.Third, final size data do not allow estimation of parameters with respect to calendar time, but only relative to other model parameters.This enables simplification of the model by reducing the number of parameters.Specifically, we can assume, again without loss of generality, that the mean of the infectious period is fixed at length 1 time unit, and set the basic reproduction number equal to the contact rate parameter in a reference class.Here, we assume that adults are the reference class.Variation in the infectious periods can affect the outcomes, however, and we assume that the infectious period is gamma distributed [33].Here we assume that the 95% coverage of the infectious period ranges from 0.74 to 1.30, i.e. approximately 6-10 days when the mean infectious period is 8 days.This seems within reasonable range [34,35].We supplement the default analyses with a scenario in which the infectious period is fixed at 1 time unit (i.e.delta-distributed), and with a scenario with increased variation in the infectious period such that the 95% coverage of the infectious period distribution ranges from 0.61 to 1.48 of the mean.
Further, with respect to the number of contacts in households of different sizes we focus on two extremes, viz a density-dependent contact model in our default scenario, and a frequencydependent contact model as alternative [23].In the density-dependent model each household member makes an identical expected number of contacts with each of the other household members per unit of time, while in the frequency-dependent transmission model each household member makes an identical number of contacts per unit of time.Hence, in the frequency-dependent model household members in larger households make fewer contacts with each of the other household members compared with individuals in smaller households.

The final size distribution
The statistical methods rely on the fact that we can compute the probability distribution of the final outbreak size of a given household.Let a, n and j 2 Z d �0 represent the number of primary household cases, initially uninfected members and the number of secondary infections, respectively.Here d is the number of types (in our analysis d = 3 age classes).Given fixed a and n, we want to calculate the probabilities Q j of the final size 0 � j � n.The probabilities Q j depend on a number of parameters.First, we require the transmission rate matrix b 2 R d�d �0 , in which the element β μν denotes the transmission rate from an individual of type ν to type μ.Next, let b denote the vector of probabilities of escape from external infection, i.e., 1 − b μ is the probability that an individual of type μ gets infected during the household outbreak by someone from outside the household if the person had not already been infected otherwise.Finally, the duration of each infection is independent and identically distributed (iid) random variable T i .The probabilities Q j are expressed in terms of the Laplace transform of T i , given by �ðxÞ ¼ E½e À xT i �.
In our analysis T i � Gammaða; a À 1 Þ, such that E½T i � ¼ 1 time unit, and ϕ(x) = (1 − x/α) −α .For integer vectors m and ℓ and a real vector x, we write the scalar function ϕ, we write ϕ(x) to denote the vector (ϕ(x 1 ), . .., ϕ(x d )) 0 .With these definitions and notation in place, the final size probability Q j is given by Q j ¼ P j n j � � where P j satisfies the recursive equation Although formal derivations of Eq (1) can be found elsewhere [24,33], here we give an intuitive derivation using elementary methods.In particular, our analyses do not make use of a Wald identity.We present the method for d = 1 (i.e.no age stratification), but the arguments easily generalize to d > 1.
We write m : ℓ to denote the set of indices m, m + 1, . .., ℓ, which is empty when m > ℓ.First, we enumerate the non-primary members of the household as 1 : n, and we split them into two groups: 1 : j and j + 1 : n.If we condition on how many members in the first group get infected, then we can easily calculate the probability that none of the second group get infected.Suppose that k individuals of 1 : j get infected, then this conditional probability equals where T 1 , . .., T a+k are the gamma-distributed lengths of the infectious periods of the a primary cases and k infected members in group 1 : j.The infectious periods of infected members can overlap, in which case we assume the transmission hazard is additive.With probability b n−j , no one in group j + 1 : n gets infected due to external contacts.The lengths T i are unknown, and therefore we take the expectation to integrate them out.As the T i are i.i.d., we get C k,j = ϕ(β(n − j)) a+k b n−j .Next, let P k denote the probability that exactly individuals 1 : k are infected.If k � j, then we can interpret P k as the probability that 1 : k of the first group 1 : j are infected, and none of j + 1 : n are infected.Moreover, the product J k;j ¼ P k � j k À � is equal to the joint probability that k arbitrary members of group 1 : j are infected (as opposed to exactly members in 1 : k), and none of j + 1 : n.Finally, let U k,j denote the unconditional probability that k members of 1 : j are infected (regardless of what happens to j + 1 : n), then we find using the definition of conditional probability that P k , and U k,j are related by Using the law of total probability, we find that P j k¼0 U k;j ¼ 1, and hence we find the recursive Eq (1) for P k , which includes the edge case P 0 = ϕ(βn) a b n , and therefore allows us to compute P j .Since the household division 1 : j and j + 1 : n was arbitrary, we can now compute the final size probability Q j ¼ P j n j � � .

Hazard of external infection
To estimate the hazard of external infection we use semi-parametric penalized splines [36].A related but simpler approach in which a fixed introduction probabilities are estimated for predefined periods is given in House et al [20].
In our framework, the spline p(t) is defined as a linear combination of cubic basis splines pðtÞ ¼ P K k¼0 w k B k ðtÞ, with 50 equidistant knots such that K = 52.To penalize large deviations of p(t), the weights w are equipped with a random-walk prior as follows.We choose w 0 � N ðÀ 7:5; 2:5Þ and define the other weights cumulatively as w k ¼ w 0 þ s P k '¼1 z ' , where z ' � N ð0; 10Þ.The diffusion parameter σ determines the smoothness of the spline and is given a weakly-informative prior σ 2 * InvGamma(1, 0.0005).For the adult age class, we then define the hazard of infection h 3 (t) = exp(p(t)).The hazards for the other age classes are proportional to the adult hazard, i.e. h 1 (t) = r 1 h 3 (t) for children and h 2 (t) = r 2 h 3 (t) for adolescents, where r 1 , r 2 > 0 are the relative hazards (Table 3).

Likelihood function
With the ingredients specified above, we can formulate the likelihood L(t, a, n, j) of observing a household infected at time t with a primary infected persons, n non-primary persons, and j persons infected over the course of the household outbreak: Lðt; a; n; jÞ ¼ hðtÞ 0 a exp À which is the product of the likelihood that the index cases are infected at time t, and the probability of final size j.This probability Q j depends on n and a, but also on t, because the probability of escape b depends on the external infection hazard h as follows: b ¼ expðÀ R tþDt t hðsÞdsÞ, where Δt is either set at 14 or 28 (days), or is defined as the household-specific period that an infected household is monitored.For households that were not infected during the study period, the final size is unobserved and the introduction time is right-censored.Therefore, the likelihood is given by Lðt; nÞ ¼ exp À where t is time that the household dropped out of the study and n is the household composition.In our inferential analyses, we approximate the cumulative hazards with sums R t 2 t 1 hðsÞds � P t 2 À 1 s¼t 1 hðsÞds.

Inference and model selection
We estimate the parameters in a Bayesian framework using Hamiltonian Monte Carlo.Except for the weakly-informative weights of the penalized spline (see above), none of the other parameters are given explicit prior distributions.Hence, these parameters have (improper) uniform prior distributions on their domains, making each possible parameter value equally likely a priori.To reduce correlations between parameters and facilitate mixing, we parameterize the proportionate mixing model and simplifications thereof in terms of absolute infectivities f i and absolute susceptibilities g i (i = 1, 2, 3), such that the transmission rates are given by β i,j = g i f j , and in particular the transmission rate in the reference class (i = 3, adults) is given by β � β 3,3 = g 3 f 3 .Using this formulation the relative infectivities and susceptibilities of the other person-types (Table 3) are given by f i f 3 and g i g 3 (i = 1, 2).Since one parameter is redundant, we have further taken g 3 � 1 without loss of generality.
All analyses are performed using Stan (version 2.29.0) and R (version 4.2.2) using the CmdStanR package (version 0.4.0) as interface between R and Stan [37].We run 10 Hamiltonian Monte Carlo (HMC) chains in parallel and base the analyses on 1,000 samples from 10 chains, where we have applied 1/10 thinning.In all model runs effective number of samples generally ranges from 900-1,100, while the convergence criterion R ranges from 0.99-1.01.In all analyses we observe strong contraction of all parameter posterior distributions relative to the prior distributions, indicating that all parameters can be estimated from the data.Data, scripts, and figures are available in the online repository at https://github.com/mvboven/sars2-households (doi.org/10.5281/zenodo.10534386).
Model selection is based on information criteria for singular statistical models [38,39].Specifically, we use the leave-one-out information criterion (LOO_IC) from the loo package (version 2.4.1) to gauge relative predictive performance, and calculate the widely applicable Bayesian information criterion (WBIC) using a run of the model at the appropriate sampling temperature to select the most likely data generating process.As estimation of the introduction hazards is already optimized for predictive performance, we have used LOO_IC and WBIC only for the within-household analyses in model runs that exclude external infection during the household outbreaks.

Fig 1 .
Fig 1. Lexis diagram of the study population.Lines show households from the date of inclusion in the study to infection of the first household member (brown dots), completion of the inclusion period without infection (blue dots at time in study of 161 days) or dropout (blue dots with time in study shorter than 161 days).Date of inclusion of the first household was 24 August 2020, and last date of the study was 29 July 2021.https://doi.org/10.1371/journal.pcbi.1011832.g001

Fig 2 .
Fig 2. Estimates of the household introduction hazard for adults.Shown are the posterior median of the introduction hazard per person (blue line) with associated 95% credible envelope (gray area).Household introduction hazards of children and adolescents are obtained by multiplication of the hazard for adults with the relative introduction hazards for children and adolescents.Also presented are the daily number of hospitalisations (yellow dots).To remove weekday effects the number of hospitalisations are represented by a 7-day moving average centered around the current day.https://doi.org/10.1371/journal.pcbi.1011832.g002 /doi.org/10.1371/journal.pcbi.1011832.t001

Fig 3 .
Fig 3.Estimated person-to-person transmission probabilities.Shown are posterior medians of the infectious contact probabilities, i.e. the probabilities that a transmission event would have occurred from an infected person over its infectious period if the contacted person had not already been infected by another person.Infectious contact probabilities are calculated from the person-to-person transmission rates per infectious period β ij and the Laplace transform of the scaled infectious period distribution: P i infected by j ð Þ ¼ 1 À 1 þ

Fig 4 .
Fig 4. Posterior predictive checks in common households of size four, containing two adults and two children (cf.Table1).Shown are the fractions of the households with a given number of secondary infections (black dots) with exact binomial confidence ranges (lines) stratified by primary case (child or adult), together with the corresponding posterior probabilities of secondary cases.

Fig 5 .
Fig 5. Estimates of the probability of infection of an adult for different household compositions, primary infection (child, adolescent, adult), and vaccination strategies.(A) no vaccination, (B) vaccination of adults, and (c) vaccination of adults and adolescents (C).Plots represent the posterior distribution (1,000 samples), and black dots indicate posterior medians.Vaccine efficacy for susceptibility is VE S = 0.9.Notice the difference in scale on the y-axis between (A) and (B)-(C).https://doi.org/10.1371/journal.pcbi.1011832.g005