The role of children in the spread of COVID-19: Using household data from Bnei Brak, Israel, to estimate the relative susceptibility and infectivity of children

One of the significant unanswered questions about COVID-19 epidemiology relates to the role of children in transmission. This study uses data on infections within households in order to estimate the susceptibility and infectivity of children compared to those of adults. The data were collected from households in the city of Bnei Brak, Israel, in which all household members were tested for COVID-19 using PCR (637 households, average household size of 5.3). In addition, serological tests were performed on a subset of the individuals in the study. Inspection of the PCR data shows that children are less likely to be tested positive compared to adults (25% of children positive over all households, 44% of adults positive over all households, excluding index cases), and the chance of being positive increases with age. Analysis of joint PCR/serological data shows that there is under-detection of infections in the PCR testing, which is more substantial in children. However, the differences in detection rates are not sufficient to account for the differences in PCR positive rates in the two age groups. To estimate relative transmission parameters, we employ a discrete stochastic model of the spread of infection within a household, allowing for susceptibility and infectivity parameters to differ among children and adults. The model is fitted to the household data using a simulated maximum likelihood approach. To adjust parameter estimates for under-detection of infections in the PCR results, we employ a multiple imputation procedure using estimates of under-detection in children and adults, based on the available serological data. We estimate that the susceptibility of children (under 20 years old) is 43% (95% CI: [31%, 55%]) of the susceptibility of adults. The infectivity of children was estimated to be 63% (95% CI: [37%, 88%]) relative to that of adults.

1 Household outbreak dynamic model Assume a household has N a adults and N c children, N = N a + N c . The dynamic model we use for describing an outbreak in a household is a stochastic age-of-infection model, similar to the chain-binomial models traditionally used for household outbreaks ( [1], ch. 14-15, [2], ch. 10), but somewhat more realistic in taking into account the generation-time distribution. The model is described below. The main parameters of the model are where β i j (i, j = a, c) are transmission coefficients from individuals of type j to individuals of type i. These parameters will be fitted using simulated maximum likelihood. In addition, our approach requires to estimate the generation-time distribution P τ (1 ≤ τ ≤ T ) based on previous data.
Infections of adults (children) occur according to a Poisson process with rate λ a (t) (λ c (t)). These rates (also called forces of infection) are given by The probability that a susceptible adult (child) becomes infected on day t is therefore p a (t) = 1 − e −λ a (t) , p c (t) = 1 − e −λ c (t) .
We denote by S a (t) and S c (t) the number of adults and children who are susceptible on day t, respectively. Further, denote by i a (t) and i c (t) the number of adults and children who become infected on day t, respectively. The dynamics is described by i a (t) ∼ Bin (S a (t), p a (t)) , The initial conditions are: and, if the index case is an adult, while if the index case is a child: The household outbreak ends at time t * when no further infectious can occur in a household.
The total number of infected among adults and children in the household is Under the assumption that adults and children differ in susceptibility to being infected and in degree of infectivity, we can write where γ is the susceptibility of children relative to that of adults, and δ is the infectivity of children relative to that of adults. Note that this parametrization restricts the structure of the β -matrix, since only three parameters are involved, rather than four.
An example of the use of the dynamic model is presented in S1 Fig, where 10,000 epidemics were simulated using the maximum-likelihood (ML) parameter estimates (β aa = 0.40, γ = 0.35, δ = 0.70) for a household of 2 adults and 4 children and an index case that is an adult (left) or a child (right). One can see the mean daily number of infected individuals (children and adults), excluding the index case, along the duration of the epidemic.

Determining the observed time period
Fitting the model to the available data requires to determine the duration of the observed period for each household, which is used in the simulations providing the likelihood calculations below. We restrict the observed time period as follows: (a) To set the start date of the observed period in a given household, we first identify the earliest indication of infection in the household, which is either onset of symptoms of a positive case or a first positive test. While testing a household was prompted by reporting of symptoms, in ∼ 10% of the households, the earliest indication was a positive test, which could be attributed either to missing reports of symptoms or to cases in which the symptomatic individual who prompted the initial testing was found to be negative (our data set does not include reports of symptoms for negative cases). In addition, in cases in which the reported symptoms onset was over four weeks prior to the first test, the onset date of symptoms was discarded (12 cases).
(b) If the earliest indication is an onset of symptoms, then the start date is set to be five days prior to this onset (the mean incubation period [3]). Otherwise the start date is set as 10 days prior to the first date of a positive test (the mean time from infection to detection, during March-April 2020, as estimated from Israeli epidemiological investigation data, see S2 Fig). This determination of the initial date reflects the approximate timing of infection of the index case.
(c) The end date of the observed period was set as the latest date on which some household member was tested positive for the first time.

Determining the age group of the index case
The reported onset dates of symptoms and test dates were also used to discern the index case in each household. Positive household members whose first indicative date (onset of symptoms or first positive test) was five or less days from the earliest indicative date of all positive household members were considered suspected index cases with equal probability. Thus, for example, if there were three positive household members whose first indicative date was within five days of the first indicative date in the household, each of the three were given a probability of being an index case of 1/3 (and the rest of the household were given probability 0). For each household, the probability that the index case is an adult (child) was obtained by summing the probability of being an index case for all the adults (children) in that household. See Figure 2 in the main text for examples of the determination of the observed period and the suspected index cases in two households, using the above criteria. S1 Table presents the number of households with a given probability assigned for the index case to be an adult. Note that in most cases this probability is either 1 or 0.  Table S1: Number of households with a given probability assigned for the index case to be an adult.

The likelihood function
Assume that we have information on a set of households indexed by h. For each household we have the data: where N a (h), N c (h) are the number of adults and children in the household, ρ(h) indicates the probability that the index case is an adult.
The variables I a (h) and I c (h) stand for the number of adults and of children who were infected, while D(h) is the duration of the observed outbreak, determined as described in the main text.
Consider a household h, and assume that the index case i is of type ind(h) (where ind = a for an adult or ind = c for a child index case). Our goal is to compute the probabilities P h (I a , I c | ind, β ) of the possible outcomes I a and I c , which are the total infected adults and total infected children in such a household, respectively, given values of the parameters β . In order to compute P h we run S = 1000 realizations of the simulation of the household model, for a household with N a (h) adults, N c (h) children and with index case of type ind, with the parameters β , for a duration of D(h) days. We generate a table of the fraction of the simulations in which each of the possible outcomes was obtained, thus obtaining estimates for the above probabilities.
Given the actual outcome (I a (h), I c (h)) in a household the likelihood corresponding to this household, taking into account the uncertainty regarding the identity of the index case, is The likelihood corresponding to the entire data set is given by the product of the likelihoods corresponding to each of the households: and the corresponding log-likelihood is To estimate the parameters β we maximize LL(β ) over the parameters β . Specifically, we consider the special structure of the β matrix given by (1).

Comparison with an alternative model
We compared our model with an alternative, naive model fit to the data. We considered a simpler model in which in each household, all the infected members were infected by the index case, with no secondary infections within the household. We assume that in a household with an adult index case, the probability that a child will be infected is p ca and the probability that an adult will be infected is p aa . Similarly, in a household with a child index case, the probability that a child will be infected is p cc and the probability that an adult will be infected is p ac . In this simple model, in a household with N a adults and N c children and an adult index case, the number of children infected in the household will be binomially distributed with n = N c , p = p ca , and the number of adults infected (excluding the index case) will be binomially distributed with n = N a − 1 and p = p aa , and similarly for a household with a child index case. We therefore refer to this model as the binomial model.

Bootstrap simulations
In order to obtain parametric bootstrap confidence intervals for the ML parameter estimates we ran 1000 simulations of the dynamic model using the ML parameter estimates and employing the exact same households as in the Bnei-Brak data, and generated the number of infected children and adults in each household. We then re-estimated the parameters for these simulated data sets. The results are shown in S4 Fig

Sensitivity analysis
Sensitivity analysis was performed to examine the stability of the results to variations in the assumptions made when fitting the data.

Sensitivity to observed epidemic duration
We examined the sensitivity to variations in the observed epidemic time-period of the households in the data. We modified the duration of the observed epidemic in each household in the data set as follows: If a household's first indicative date was some member's symptom onset date, then instead of using the mean incubation time to determine the start time of the epidemic, we sampled an incubation period from the incubation time distribution in the general COVID-19 cases data in Israel. If a household's first indicative date was some member's first test date, then instead of using the mean detection time to determine the start time of the epidemic, we sampled an detection period from the detection time distribution in the general COVID-19 cases data in Israel. We refitted the model to this data set to estimate the model parameters. This was repeated 100 times. The results of this test are shown in S5 Fig. Blue color indicates the ML parameter estimates obtained using the duration set as in the main text. In most cases the inferred parameter values were identical or very similar to the original estimates, demonstrating small sensitivity of the results to the exact number of days the epidemic was observed in each household.

Sensitivity to uncertainty regarding the index case age-group
We examined the sensitivity of the results to the fact that in some cases the age-group to which the index case belongs is uncertain. In cases where the index case age-group was not certain (in 75 of the 637 households -see S1 Table), we randomly selected an age-group for the index case according to the probability of the index case being in each of the age-groups (as Figure S5: Sensitivity to observed epidemic duration. β aa -the transmission parameter among adults, γ -relative susceptability of children, δ -relative infectivity of children. The blue color indicates the values used to generate the simulated data. described in the Methods section). The model parameters were re-estimated by fitting the model to this data set. This was repeated 100 times. The results of this test are shown in S6 Fig. Blue color indicates the ML parameter estimates obtained using the original data set. In most cases the inferred parameter values were identical or very similar to the original estimates, demonstrating small sensitivity of the results to the uncertainty regarding the index case age-group. Figure S6: Sensitivity to uncertainty regarding the index case age-group. β aa -the transmission parameter among adults, γrelative susceptability of children, δ -relative infectivity of children. The blue color indicates the values used to generate the simulated data.

Sensitivity to the mean generation-time
This analysis is aimed for testing the sensitivity of the estimation results to the mean of the generation-time distribution. Recall that in the main text the results were obtained using a generation-time distribution with a mean value of 4.5. Here, we generated 100 simulated data sets with the exact household types as in the Bnei-Brak data set, while employing the ML parameter estimates and a generation-time distribution with mean of 4 days. Then we fitted a model to this data set assuming a mean generation time of 4.5 days, as used in Section Results of the main text. The results are presented in the top row of S7 Fig. It shows that the obtained estimates are not too sensitive to these deviations from the actual mean generation time. We repeated this type of analysis, now generating 100 data sets with a mean generation time of 5 days, and fitting again a model to this data set assuming a mean generation time of 4.5 days. The results are presented in the bottom row of S6 Fig and here as well, the estimates are not too sensitive to such deviations. 6 Positive rates per 10,000 in different age groups S8 Fig shows the positive rates per 10,000 in different age groups through time. Since the re-opening of schools on September 1st the infection rates in children aged 15-19 has more than doubled, and seem to have triggered growing infection rates in other age groups as well.