Estimating and interpreting secondary attack risk: Binomial considered harmful

The household secondary attack risk (SAR), often called the secondary attack rate or secondary infection risk, is the probability of infectious contact from an infectious household member A to a given household member B, where we define infectious contact to be a contact sufficient to infect B if he or she is susceptible. Estimation of the SAR is an important part of understanding and controlling the transmission of infectious diseases. In practice, it is most often estimated using binomial models such as logistic regression, which implicitly attribute all secondary infections in a household to the primary case. In the simplest case, the number of secondary infections in a household with m susceptibles and a single primary case is modeled as a binomial(m, p) random variable where p is the SAR. Although it has long been understood that transmission within households is not binomial, it is thought that multiple generations of transmission can be safely neglected when p is small. We use probability generating functions and simulations to show that this is a mistake. The proportion of susceptible household members infected can be substantially larger than the SAR even when p is small. As a result, binomial estimates of the SAR are biased upward and their confidence intervals have poor coverage probabilities even if adjusted for clustering. Accurate point and interval estimates of the SAR can be obtained using longitudinal chain binomial models or pairwise survival analysis, which account for multiple generations of transmission within households, the ongoing risk of infection from outside the household, and incomplete follow-up. We illustrate the practical implications of these results in an analysis of household surveillance data collected by the Los Angeles County Department of Public Health during the 2009 influenza A (H1N1) pandemic.


Introduction
In infectious disease epidemiology, the household secondary attack risk (SAR) is the probability of infectious contact from an infected household member A to a susceptible household member B during A's infectious period, where we define infectious contact as a contact sufficient to infect B if he or she is susceptible. It is often called the secondary attack rate, but we prefer to call it a risk because it is a probability [1]. SARs can also be defined in other groups of close contacts, such as schools or hospital wards [2].
It has been understood since the work of En'ko in 1899 [39], Reed and Frost in 1928 [40], and Greenwood in 1931 [41] that within-household transmission is not binomial. The process is binomial only if the primary case (the first infected household member [42]) is the only possible source of infection for susceptible household members throughout his or her infectious period. However, binomial models continue to be used for the estimation of the SAR because it is thought that multiple generations of transmission within households can be safely neglected when the SAR is small. In its simplest form, this assumes that the number of secondary infections in a household with m susceptible individuals and a single primary case is a binomial(m, p) random variable, where p is the household SAR. A given transmission chain of length k from a primary case A to a given susceptible B has probability $p^k$, which decays exponentially as k increases. Up to and including the COVID-19 pandemic, the vast majority of studies of household transmission have used a binomial model (often a logistic regression model) to estimate the household SAR [2, 6, 7, 9, 11-13, 19, 22-28, 30-32, 35-38]. A smaller number of studies have used explicit statistical models of transmission [15-18, 20, 21, 29, 33, 43]. Here, we hope to establish that the latter approach should become universal.
Although the probability of each given transmission chain of length k from A to B decays as $p^k$, the risk of infection through k generations of transmission also depends on the number of transmission chains of length k. A transmission chain of length k ≥ 1 from A to B can be specified by choosing k − 1 individuals from the m − 1 susceptible household members other than B. Each ordering of these k − 1 individuals produces a unique transmission chain. For 1 ≤ k ≤ m, the total number of paths from A to B of length k equals the number of permutations of k − 1 objects chosen from m − 1 objects:

$$\frac{(m-1)!}{(m-k)!}. \tag{1}$$

For example, with m = 9 there are 1, 8, 56, and 336 paths of length k = 1, 2, 3, and 4, respectively.

Table 1. Number of paths from the primary case to a given susceptible.
Table 1 shows that the number of paths of length k can grow quickly with household size. Each path can carry infection from A to B, so the total risk of transmission from A to B along any path of length k can be much greater than $p^k$. A binomial model attributes this additional risk of infection to direct transmission from the primary case, so the estimated SAR is too high.
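The path-counting argument above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `n_paths` and `path_risk_bound` are hypothetical, and the second function uses only the elementary union bound (the chance that at least one path of length k transmits is at most the number of such paths times $p^k$).

```python
from math import perm


def n_paths(m: int, k: int) -> int:
    """Number of transmission chains of length k from the primary case A
    to a given susceptible B in a household with m susceptibles."""
    if not 1 <= k <= m:
        return 0
    # Ordered choice of the k - 1 intermediate cases from the m - 1
    # susceptibles other than B.
    return perm(m - 1, k - 1)


def path_risk_bound(m: int, k: int, p: float) -> float:
    """Union bound on the probability that at least one chain of length k
    carries infection from A to B: (number of paths) * p**k."""
    return n_paths(m, k) * p ** k


if __name__ == "__main__":
    for m in (2, 4, 9):
        print(m, [n_paths(m, k) for k in range(1, 5)])
    # Compare a single chain of length 2 with all chains of length 2.
    print(0.1 ** 2, path_risk_bound(9, 2, 0.1))
```

With nine susceptibles and p = 0.1, a single two-step chain has probability 0.01, but there are eight such chains, so the bound on two-step risk is of the same order as the direct risk p itself.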
The binomial variance assumes that infections in different household members are independent. Because each new infection in a household increases the risk of infection in the remaining susceptibles, infections within a household are positively correlated. This correlation makes the true variance in the number of infections larger than the binomial variance. To address this issue, cluster-adjusted variances [6,21,23,24,31] and random effects [32] have been used to account for correlation among household members. Because of the bias in the point estimate of the SAR, this adjustment for clustering does not produce confidence intervals that have the expected coverage probabilities.
In a disease where the latent period (between infection and the onset of infectiousness) is longer than the infectious period, multiple generations of infection can be separated in time. This was seen most famously by Peter Panum in a measles epidemic on the Faroe Islands in 1846 [44]. With such separation, a binomial model could be used to estimate the risk of infection within a follow-up interval designed to capture only the first generation of transmission. However, most infectious diseases can have overlapping generations of infection. For example, influenza has an average latent period of 1-2 days and an infectious period of 3-4 days [45]. In general, the binomial model cannot be salvaged by adjusting the follow-up time of households.
In its original usage, the SAR was defined as the probability that a susceptible in a household with a primary case is infected by within-household transmission, whether or not there were multiple generations of transmission within the household [8,9]. Here, we will call this the household final attack risk (FAR). With complete follow-up of all households, a cluster-adjusted binomial model could produce an unbiased estimate of the FAR. However, the estimated FAR will be biased upward if there are co-primary cases or if household members are at risk of infection from outside the household during the follow-up period [3,9,46]. Such conditions are common in practice, so the binomial model cannot be salvaged by returning to early interpretations of the SAR.
In its modern interpretation, the household SAR is an extremely useful measure of the transmissibility of infection. However, this interpretation requires us to abandon the use of binomial models for estimation. Here, we use probability generating functions and simulations to show that (1) a binomial model produces biased estimates of the household SAR even when the probability of transmission is small and (2) cluster adjustment of the variances does not produce interval estimates with the expected coverage probabilities. To estimate the household SAR, explicit statistical models of disease transmission such as longitudinal chain binomial models [40,47] or pairwise survival analysis [48-50] should always be used. We illustrate the practical implications of these results using household surveillance data collected by the Los Angeles County Department of Public Health during the 2009 influenza A (H1N1) pandemic.
For simplicity, our analytical calculations and simulations assume a uniform SAR within households (i.e., no variation in infectiousness or susceptibility) and no risk of infection from outside the household except for the primary case. These assumptions are not realistic: We intend to show that binomial models break down even under these ideal conditions. We use probability generating functions (PGFs) to calculate the true outbreak size distributions at different combinations of the number of susceptibles (m) and the SAR (p), and we verify these calculations in simulations of household outbreaks.

Household outbreak size distributions
Assume that each infectious member of a household makes infectious contact with each other member of the household with probability p during his or her infectious period. Let $p_{mi}$ be the probability that i out of m susceptibles are infected by within-household transmission in a household with a single primary case. Then

$$g_m(x) = \sum_{i=0}^{m} p_{mi}\, x^i \tag{2}$$

is the probability generating function (PGF) for the outbreak size distribution in a household with m susceptibles and one primary case. Because a household with zero susceptibles has zero secondary infections with probability one, $g_0(x) = 1$.
The PGF for the outbreak size distribution in a household with m + 1 susceptibles can be derived from the PGFs for smaller households. Imagine a household with m susceptibles of whom i were infected. Now imagine that the household had one more susceptible. There are two possible outcomes:
1. With probability $(1-p)^{i+1}$, the additional susceptible escapes infection from all i + 1 infected household members. The total number of infections in the household is i.
2. With probability $1 - (1-p)^{i+1}$, the additional susceptible gets infected. He or she acts like a primary case in a household containing the m − i susceptibles who escaped infection. There are i + 1 infections, and the number of infections among the remaining susceptibles has the PGF $g_{m-i}(x)$.
Combining these results, we conclude that

$$g_{m+1}(x) = \sum_{i=0}^{m} p_{mi}\, x^i \left[ (1-p)^{i+1} + \left(1 - (1-p)^{i+1}\right) x\, g_{m-i}(x) \right]. \tag{3}$$

The first few iterations yield

$$g_1(x) = (1-p) + p\,x,$$
$$g_2(x) = (1-p)^2 + 2p(1-p)^2\,x + p^2(3-2p)\,x^2,$$

which can be checked by hand. We calculated these polynomials using Python code in S2 File. As shown in Eq (2), the coefficient on $x^i$ in the PGF $g_m(x)$ is the probability that i of m susceptibles are infected in a household outbreak started by a single primary case. Using these probabilities, we can calculate the mean and variance of the number of infections among the m susceptibles.
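The two-outcome recursion described above can be evaluated numerically for a fixed p. The sketch below is an illustration written for this purpose and is not the code in S2 File; the function names are hypothetical. It also assumes that the analytical variance inflation factor is the ratio of the true outbreak-size variance to the binomial variance m·FAR·(1 − FAR), which is our reading rather than a definition stated in the text.

```python
def outbreak_size_pmf(m: int, p: float) -> list:
    """Return [p_m0, ..., p_mm], the coefficients of g_m(x), for 0 <= p <= 1."""
    q = 1.0 - p
    pmfs = [[1.0]]  # g_0(x) = 1: no susceptibles, no secondary infections
    for n in range(m):  # build g_{n+1} from g_0, ..., g_n
        new = [0.0] * (n + 2)
        for i, p_ni in enumerate(pmfs[n]):
            escape = q ** (i + 1)
            # Outcome 1: the extra susceptible escapes all i + 1 cases.
            new[i] += p_ni * escape
            # Outcome 2: the extra susceptible is infected and then acts as
            # a primary case for the n - i susceptibles who had escaped.
            for j, p_rest in enumerate(pmfs[n - i]):
                new[i + 1 + j] += p_ni * (1.0 - escape) * p_rest
        pmfs.append(new)
    return pmfs[m]


def far_and_vif(m: int, p: float):
    """Final attack risk and variance inflation relative to a binomial
    with the same mean (assumed definition; requires 0 < p < 1)."""
    pmf = outbreak_size_pmf(m, p)
    mean = sum(i * w for i, w in enumerate(pmf))
    var = sum((i - mean) ** 2 * w for i, w in enumerate(pmf))
    far = mean / m
    return far, var / (m * far * (1.0 - far))


if __name__ == "__main__":
    for m in (2, 4, 9):
        far, vif = far_and_vif(m, 0.1)
        print(f"m={m}: SAR=0.100, FAR={far:.3f}, VIF={vif:.2f}")
```

Working with numeric coefficients rather than symbolic polynomials keeps the sketch dependency-free; a symbolic version would need a computer algebra package.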

Household outbreak simulations
We simulated household outbreaks using Erdős-Rényi random graphs [51,52], where each pair of nodes is connected independently with probability p. In our graphs, each node represents a household member and p is the SAR. One node is fixed as the primary case, and all household members connected to the primary case by a series of edges are infected. We performed 40,000 simulations for each combination of household size and SAR. In each simulation, there were 200 independent households of the same size. We used logistic regression to calculate the proportion of susceptible household members who were infected with a naive 95% confidence interval. We then calculated a cluster-adjusted confidence interval using generalized estimating equations (GEE) with a robust variance estimate. The variance inflation factor (VIF) was calculated as the ratio of the robust variance to the naive variance. All confidence intervals were calculated on the logit scale as $\hat\beta \pm 1.96\,\hat\sigma$, where $\hat\beta = \operatorname{logit}(\hat p)$ is the estimated log odds of infection and $\hat\sigma$ is the naive or robust standard error estimate. Finally, we transformed the confidence intervals to the probability scale and estimated the coverage probabilities for the true household SAR and the true household FAR.
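The simulation design above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the code in S1 File or S2 File: the function names are hypothetical, and the robust variance is the sandwich estimator for an intercept-only model with households as clusters and an independence working correlation, which may differ in detail from the GEE fit used in the paper.

```python
import math
import random


def simulate_outbreak(m, p, rng):
    """Number of the m susceptibles infected; node 0 is the primary case."""
    n = m + 1
    adj = [[] for _ in range(n)]
    # Draw each undirected edge of the random graph independently.
    for a in range(n):
        for b in range(a + 1, n):
            if rng.random() < p:
                adj[a].append(b)
                adj[b].append(a)
    # Everyone in the primary case's connected component is infected.
    infected, stack = {0}, [0]
    while stack:
        for b in adj[stack.pop()]:
            if b not in infected:
                infected.add(b)
                stack.append(b)
    return len(infected) - 1


def expit(b):
    return 1.0 / (1.0 + math.exp(-b))


def binomial_intervals(counts, m, z=1.96):
    """Pooled proportion with naive and cluster-robust logit-scale CIs.
    counts[h] is the number infected among the m susceptibles of household h."""
    n_total = m * len(counts)
    p_hat = sum(counts) / n_total
    info = n_total * p_hat * (1.0 - p_hat)
    naive_var = 1.0 / info
    # Sandwich variance: squared household-level residuals in the middle.
    robust_var = sum((y - m * p_hat) ** 2 for y in counts) / info ** 2
    beta = math.log(p_hat / (1.0 - p_hat))

    def ci(var):
        half = z * math.sqrt(var)
        return expit(beta - half), expit(beta + half)

    return p_hat, ci(naive_var), ci(robust_var), robust_var / naive_var


rng = random.Random(2022)
counts = [simulate_outbreak(4, 0.2, rng) for _ in range(200)]
p_hat, naive_ci, robust_ci, vif = binomial_intervals(counts, 4)
print(f"true SAR=0.2, estimate={p_hat:.3f}, naive CI={naive_ci}, robust CI={robust_ci}")
```

Repeating the last four lines many times and recording how often each interval contains 0.2 (the SAR) or the analytical FAR gives coverage estimates of the kind described above.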
Source code. Simulations were implemented in Python 3 [53], and statistical analysis was performed in R [54]. The R code is available in S1 File, and the Python code is available in S2 File. All software used is free and open-source, and further details are given in the Supporting Information.

Household data analysis
To give a practical example of the consequences of using a binomial model to estimate the household SAR, we use influenza A (H1N1) household surveillance data collected by the Los Angeles County Department of Public Health (LACDPH) between April 22 and May 19, 2009. The data were collected using the following protocol [48]:
1. Nasopharyngeal swabs and aspirates were taken from individuals who reported to the LACDPH or other health care providers with acute febrile respiratory illness (AFRI), defined as a fever ≥ 100 °F plus cough, sore throat, or runny nose. These specimens were tested for influenza, and the age, gender, and symptom onset date of the AFRI patient were recorded.
2. Patients whose specimens tested positive for pandemic influenza A (H1N1) or for influenza A of undetermined subtype were enrolled as primary cases.Each of them was given a structured phone interview to collect information about his or her household contacts.They were asked to report the symptom onset date of any AFRI episodes among their household contacts.
3. When necessary, a follow-up interview was given 14 days after the symptom onset date of the primary case to assess whether any additional AFRI episodes had occurred in the household, including their illness onset date.
For simplicity, we assume all AFRI episodes among household members were caused by influenza A (H1N1) and that all household members except the primary case were susceptible to infection. All analyses use natural history assumptions adapted from Ref [20] and identical to those in Refs [49,50]. In the primary analysis, we assumed an incubation period of 2 days, a latent period of zero days, and an infectious period of 6 days. In a sensitivity analysis, we considered 7-day and 12-day infectious periods. We estimated the household SAR for 2009 pandemic influenza A (H1N1) using binomial models, a longitudinal chain binomial model, and parametric pairwise regression models. In each household, we censored observations at the end of the infectious period of the primary case. Thus, the models are fit only to infections that could have been caused by primary cases, giving the binomial models the best possible chance of accurately estimating the household SAR. For each assumed infectious period, all statistical models were fit to exactly the same data. For simplicity, we did not include any covariates in these analyses. Final size chain binomial models were not used because they require complete observation of each within-household epidemic, so they cannot be fit to data censored at the end of the infectious period of the primary case in each household.
Binomial models. Two binomial models were fit to the LACDPH household data. First, we used an intercept-only logistic regression model with unadjusted and cluster-adjusted confidence intervals [55]. Second, we used an intercept-only binomial GEE model [56] to get a second set of cluster-adjusted confidence intervals.
Longitudinal chain binomial model. The chain binomial model assumes that a given infectious person A makes infectious contact with a given susceptible household member B with an unknown probability p on each day that A is infectious. On day t, an individual B who is exposed to k infectious household members will escape infection with probability $q^k$ and be infected with probability $1 - q^k$, where q = 1 − p. The likelihood contribution from observation of individual B is the product of these daily contributions over all days on which B was at risk of infection. The overall likelihood is the product of the likelihood contributions of all susceptibles who were at risk of infection for at least one day. The household SAR is $1 - q^{\iota}$, where ι is the infectious period. Because p ∈ (0, 1), our likelihood was defined in terms of logit(p) = ln(p/q). To get a point estimate of the SAR, the unknown true q is replaced by a point estimate $\hat q = 1 - \hat p$. Standard maximum likelihood estimation was used to get point and interval estimates on the logit scale, which were transformed back to the probability scale. For simplicity, we have assumed that the probability of escaping infection from an infectious household member does not depend on how long he or she has been infectious or on any covariates. More sophisticated longitudinal chain binomial models can allow the escape probability to vary with the time since infection or with covariates [40,47].
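The likelihood just described is simple enough to write out directly. The sketch below is an illustration, not the analysis code in S3 File: the data layout (one `(k, infected)` record per susceptible-day at risk), the toy data, and the function names are all assumptions made for this example, and a ternary search stands in for a general-purpose optimizer. It returns a point estimate only; interval estimates would need the curvature of the log-likelihood at the maximum.

```python
import math


def log_likelihood(logit_p, person_days):
    """Chain binomial log-likelihood on the logit scale.
    person_days: iterable of (k, infected), one record per susceptible-day
    at risk, where k >= 1 is the number of infectious household members."""
    p = 1.0 / (1.0 + math.exp(-logit_p))
    log_q = math.log1p(-p)  # ln q, computed stably
    total = 0.0
    for k, infected in person_days:
        if infected:
            total += math.log(-math.expm1(k * log_q))  # ln(1 - q**k)
        else:
            total += k * log_q  # ln(q**k)
    return total


def fit_logit_p(person_days, lo=-12.0, hi=12.0, iters=200):
    """Maximize the log-likelihood by ternary search (it is unimodal in logit p)."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if log_likelihood(a, person_days) < log_likelihood(b, person_days):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)


def household_sar(logit_p, infectious_days):
    """SAR = 1 - q**iota for an infectious period of iota days."""
    p = 1.0 / (1.0 + math.exp(-logit_p))
    return 1.0 - (1.0 - p) ** infectious_days


# Hypothetical toy data: 30 susceptible-days, each exposed to one infectious
# household member, with 3 infections.
toy_person_days = [(1, False)] * 27 + [(1, True)] * 3
logit_hat = fit_logit_p(toy_person_days)
p_hat = 1.0 / (1.0 + math.exp(-logit_hat))
sar_hat = household_sar(logit_hat, 6)
print(f"daily p = {p_hat:.4f}, 6-day SAR = {sar_hat:.4f}")
```

When every record has k = 1, the maximum likelihood estimate reduces to infections divided by susceptible-days at risk, which gives a quick check on the optimizer.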
Pairwise survival analysis. Pairwise survival analysis estimates failure times in ordered pairs consisting of an infectious individual and a susceptible household member [57]. The pair AB is at risk of transmission starting with the onset of infectiousness in A, and failure occurs if A infects B. This failure time, called a contact interval, is right-censored if B is infected by someone other than A or if observation of the pair stops. To account for uncertainty about who-infected-whom, the overall likelihood is the sum of the likelihoods for all possible combinations of who-infected-whom consistent with the data [48]. The survival function S(τ, θ) is the probability that the contact interval is greater than τ, where θ is a parameter vector. If $\theta_0$ is the true value of the parameter and the infectious period is ι, then the household SAR is $1 - S(\iota, \theta_0)$. To get a point estimate of the SAR, the unknown true parameter $\theta_0$ is replaced by the maximum likelihood estimate $\hat\theta$.
We used intercept-only exponential, Weibull, and log-logistic regression models [58]. For the exponential distribution, $S(\tau, \lambda) = \exp(-\lambda\tau)$ where λ is the rate parameter. For the Weibull distribution, $S(\tau, \lambda, \gamma) = \exp[-(\lambda\tau)^{\gamma}]$ where λ is the rate and γ is the shape parameter. For the log-logistic distribution, $S(\tau, \lambda, \gamma) = [1 + (\lambda\tau)^{\gamma}]^{-1}$ for rate λ and shape γ. For all three distributions, λ > 0 and γ > 0, so we defined our likelihoods in terms of their natural logarithms ln λ and ln γ. Standard maximum likelihood estimation was used to get point estimates and a covariance matrix for the rate and shape parameters. To get a 95% confidence interval for the SAR, we sampled ln λ and ln γ from their approximate multivariate normal distribution, calculated the household SAR for each sample, and took the 2.5% and 97.5% quantiles of the calculated SARs as confidence limits.
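The mapping from fitted parameters to a SAR with a sampled confidence interval can be sketched as follows. This is an illustration only and does not use the transtat package: the function names are hypothetical, the parameter estimates and covariance matrix in the usage lines are made-up numbers, and the bivariate normal draws use a hand-written 2x2 Cholesky factor to keep the example dependency-free.

```python
import math
import random


def sar_exponential(ln_lam, iota):
    """1 - S(iota) for S(tau) = exp(-lam * tau)."""
    return 1.0 - math.exp(-math.exp(ln_lam) * iota)


def sar_weibull(ln_lam, ln_gam, iota):
    """1 - S(iota) for S(tau) = exp(-(lam * tau) ** gam)."""
    return 1.0 - math.exp(-(math.exp(ln_lam) * iota) ** math.exp(ln_gam))


def sar_loglogistic(ln_lam, ln_gam, iota):
    """1 - S(iota) for S(tau) = 1 / (1 + (lam * tau) ** gam)."""
    return 1.0 - 1.0 / (1.0 + (math.exp(ln_lam) * iota) ** math.exp(ln_gam))


def sampled_ci(sar_fn, mean, cov, iota, n_draws=10000, seed=1):
    """Approximate 95% CI for a two-parameter model by sampling
    (ln lam, ln gam) from a bivariate normal with the given mean and
    2x2 covariance, then taking empirical quantiles of the SAR."""
    rng = random.Random(seed)
    (v11, v12), (_, v22) = cov
    # Cholesky factor of the covariance matrix.
    l11 = math.sqrt(v11)
    l21 = v12 / l11 if l11 > 0.0 else 0.0
    l22 = math.sqrt(max(v22 - l21 ** 2, 0.0))
    draws = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draws.append(sar_fn(mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2, iota))
    draws.sort()
    return draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws) - 1]


# Hypothetical estimates, for illustration only.
mean = (math.log(0.05), math.log(1.2))
cov = ((0.04, 0.005), (0.005, 0.02))
point = sar_weibull(mean[0], mean[1], 6)
lower, upper = sampled_ci(sar_weibull, mean, cov, 6)
print(f"Weibull SAR = {point:.3f} ({lower:.3f}, {upper:.3f})")
```

Sampling on the log scale and transforming each draw keeps the interval inside (0, 1) without any further adjustment, which is the practical appeal of the approach described above.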
Goodness of fit. To see how well the SAR estimates fit the data, we simulated outbreaks in the Los Angeles households using SAR point estimates from the binomial model, the chain binomial model, and pairwise survival models. In each simulation, we calculated the total number of infections among susceptible household members. For each SAR estimate, we performed 4,000 simulations. We then compared the simulated household epidemics to the observed final size of the outbreak started by the primary cases (i.e., the total number of cases who can be linked to a primary case through one or more generations of transmission). For all infectious periods shorter than 12 days, there are a few observed cases that occur after the end of the initial within-household outbreak. Given the assumed infectious period, these late cases are excluded because they can only be explained by later introductions of infection into the household.
Source code. Statistical analyses were done with R [54], and the simulations were implemented in Python 3 [53]. The R code is available in S3 File, the Python code is available in S4 File, and the household data are available in S5 File. All software used is free and open-source, and further details are given in the Supporting Information.

Household outbreak simulations
Fig 1 shows the household FAR calculated using PGFs (lines) and from simulations (symbols) as a function of the true SAR and the number of susceptibles. There is excellent agreement between the analytical calculations and the simulations. Both show that the household FAR is larger than the household SAR when there is more than one susceptible. At a fixed SAR, the difference between the SAR and the FAR increases with household size. Thus, a binomial model will produce a point estimate of the SAR that is biased upward whenever there is more than one susceptible household member.
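The direction and size of this bias can be checked by hand in the smallest nontrivial household. The following worked example is our own illustration under the same idealized assumptions (uniform SAR p, a single primary case A, no infection from outside the household), with m = 2 susceptibles B and C. B is infected either directly by A or, failing that, through the two-step chain from A to C to B:

```latex
\begin{align*}
\mathrm{FAR} &= \underbrace{p}_{A \to B}
              + \underbrace{(1-p)\,p^{2}}_{A \not\to B,\ A \to C \to B}
              = p + p^{2}(1-p), \\
\mathrm{FAR} - \mathrm{SAR} &= p^{2}(1-p) > 0
              \quad \text{for } 0 < p < 1, \\
\frac{\mathrm{FAR} - \mathrm{SAR}}{\mathrm{SAR}} &= p(1-p).
\end{align*}
```

For example, p = 0.2 gives FAR = 0.232, so a binomial estimate would overstate the SAR by about 16% even with only two susceptibles. The same value follows from the mean of the m = 2 outbreak size distribution divided by 2.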
Fig 2 shows the VIF calculated using PGFs (lines) and from simulations (symbols) as a function of the true SAR and the number of susceptibles. Again, there is excellent agreement between the analytical calculations and the simulations. The variance of the number of infections within households is substantially larger than the binomial variance, and this difference increases with increasing household size. Thus, confidence intervals based on a binomial estimate will have coverage probabilities that are too low even if the estimated SAR is correct.
Fig 3 shows the household SAR coverage probabilities for unadjusted and cluster-adjusted binomial 95% confidence intervals. Even for small households, the coverage probabilities are below 95% and decrease rapidly as the true SAR increases. Cluster adjustment increases the coverage probabilities only slightly. With or without adjustment for clustering by household, a binomial model does not produce reliable point or interval estimates of the household SAR.
Fig 4 shows coverage probabilities of unadjusted and cluster-adjusted 95% confidence intervals for the household FAR. Coverage of the FAR is much higher than coverage of the SAR. However, the coverage probabilities for unadjusted confidence intervals are always below 95%, and they decrease with increasing household size or increasing SAR. Adjustment for clustering by household corrects this problem, producing coverage probabilities close to 95% for all household sizes. Under these ideal conditions, a binomial model can produce reliable point and interval estimates of the household FAR as long as clustering within households is taken into account. This does not imply that the FAR can be defined clearly or estimated accurately under more realistic conditions, and it does not imply that the FAR is an acceptable substitute for the SAR in practice.

Household data analysis
In the LACDPH pandemic influenza A (H1N1) data, there were 58 households with a total of 299 members. There were 99 infections, of which 62 were classified as primary cases because 4 of 58 households had two co-primary cases. There were 37 household contacts who were infected while under observation. The median household size was 5 with a range from 2 to 20. Both in this example and more generally, co-primary cases and varying household sizes are practical problems for estimation of the household SAR. There are three types of cases relevant to our analyses: Possible second generation cases are susceptible household members who are infected during the infectious period of the primary case, so it is possible that they were infected by the primary case. Final size cases are susceptible household members who could have been infected through a chain of transmission starting from a primary case. Late cases are susceptible household members who were infected after the end of the infectious period of the last final size case in the household. Given the assumed infectious period, these cases can only be explained by a new introduction of infection to the household. They are excluded from SAR estimation and from the observed final size for the assumed infectious period.
Table 2 shows the numbers of possible second generation cases, final size cases, and late cases for each assumed infectious period from 3 days (almost certainly too short) to 12 days (almost certainly too long). Assuming an infectious period of 6 days, there are 24 possible second generation cases, 26 final size cases, and 11 late cases. Assuming an infectious period of 7 days results in substantially larger numbers of possible second generation cases and final size cases. An infectious period of at least 12 days is required to account for all observed cases through within-household transmission. We show analyses with 6-day, 7-day, and 12-day infectious periods.
Table 3 shows point estimates and 95% confidence intervals for the household SAR. The point estimates for all binomial models are identical. As expected, binomial models produce much higher estimates than the chain binomial or pairwise regression models. Adjustment for clustering produced a wider confidence interval, with cluster-adjusted variance and GEE producing very similar results. The chain binomial and exponential pairwise regression models produced nearly identical point and interval estimates of the household SAR. For each infectious period, the Weibull and log-logistic pairwise regression models produced slightly different SAR estimates and wider confidence intervals than the exponential model. In all cases, the exponential model had the lowest AIC. The chain binomial and pairwise regression estimates are consistent with each other, but neither is consistent with the binomial estimates.
Fig 5 shows histograms of the simulated outbreak sizes in the LA households based on the four different SAR estimates that assume a 6-day infectious period. The binomial estimates predict outbreaks larger than observed, but the chain binomial and pairwise estimates predict outbreak size distributions centered near the observed outbreak size. Figs 6 and 7 show a similar pattern for estimates that assume 7-day and 12-day infectious periods, respectively. For the binomial estimates, the predicted outbreak sizes increase quickly with the assumed infectious period. For the chain binomial and pairwise regression estimates, the predicted outbreak sizes increase much more slowly. To the extent that a true household SAR exists, it is almost certainly below the binomial estimates and closer to the chain binomial and pairwise regression estimates.
An important advantage of the longitudinal chain binomial and pairwise regression models is that they can estimate the SAR using the entire period of household observation even when there is an ongoing risk of infection from outside the household. Table 4 shows point and interval estimates of the SAR based on the full data set collected by the LACDPH. As before, the chain binomial and pairwise exponential models produce nearly identical point and interval estimates. Using the full data set, the pairwise Weibull and log-logistic models produce point estimates closer to those of the one-parameter models than in Table 3, but their confidence intervals remain wider. All four models produce lower point estimates of the SAR when using the full data set than when using only the possible second generation data. Fig 8 shows the distribution of outbreak sizes under the pairwise exponential estimate of the SAR assuming 6-, 7-, and 12-day infectious periods. The light gray histograms in the background show the distributions based on the point estimates from Table 3, which used the possible second generation data. In all three cases, there is a small but clear improvement in the predictive fit of the model when the full data set is used. Similar results were seen for the longitudinal chain binomial and pairwise Weibull and log-logistic regression models (not shown but produced by S3 File).

Discussion
Studies of disease transmission in households and other clearly-defined groups at risk of infection are part of a glorious tradition in infectious disease epidemiology [8]. They remain one of the most effective means of obtaining critical information about routes and risk factors of transmission, the basic reproduction number, and the natural history of epidemic diseases [3,59]. Every author of the studies cited above has made an important contribution to infectious disease epidemiology and to public health. However, these studies should no longer be analyzed using binomial models. Even when the SAR is small, it is important to account for multiple generations of transmission within households. Unless these generations are clearly separated in time, a binomial estimate of the SAR will be biased upward and have a confidence interval with low coverage probability even if the variance is adjusted for clustering.
A binomial model can estimate the household FAR accurately if cluster-adjusted confidence intervals are used. However, the FAR was clearly defined in our simulations only because we made the following assumptions: (1) each household had at most one primary case, (2) susceptibles were not at risk of infection from outside the household, and (3) all households had the same size. In practice, these assumptions are extremely unlikely to hold. The LACDPH data had households with multiple primary cases and household sizes that varied from 2 to 20. For all assumed infectious periods shorter than 12 days, there were cases that could only be explained by the re-introduction of infection to the household. Unlike the FAR, the household SAR can be clearly defined and estimated even when there are multiple primary cases, ongoing risk of infection from outside the household, and varying household sizes.
The discrete-time chain binomial model [47] and pairwise survival models [48-50] require more detailed follow-up of each household than final size models, but they can account accurately for delayed entry, loss to follow-up, and the risk of infection from outside the household. If there are asymptomatic infections or if infection times cannot be determined with sufficient precision, data augmentation and Markov chain Monte Carlo (MCMC) can be used to account for the transmission of infection [60]. These longitudinal models also allow the probability or hazard of transmission to depend on individual-level, pairwise, and household-level covariates [50]. The household members in the LACDPH data varied in ways that could have affected their susceptibility and infectiousness, including age, sex, and use of antiviral prophylaxis. Simultaneous estimation of these effects is critical to preventing bias for contagious outcomes [61].
Accurate estimates of covariate effects on infectiousness and susceptibility can provide critical insight into the effectiveness of public health interventions such as handwashing, social distancing, antiviral prophylaxis or treatment, and vaccination. Whereas binomial models can be fit using almost any standard statistical package, the lack of available software has been a major obstacle to the adoption of statistical models of infectious disease transmission in household studies. Chain binomial models are available in the free and open source software package TranStat (www.cidid.org/transtat), which incorporates several advanced methods [62,63] and has been used in analyses of influenza [20], Zika virus [64], and Ebola virus [29]. Pairwise survival models are available in the free and open source transtat package for R, which was used to analyze the LA household data above. This package includes parametric models and semiparametric models [48-50, 57].
In the COVID-19 pandemic, there have been too few studies of SARS-CoV-2 transmission in households or other clearly-defined populations at risk of infection, leaving unanswered many questions about the modes and intensity of transmission and the predictors of infectiousness and susceptibility [59]. This has forced public health decisions that affect millions of lives to be made under crushing uncertainty. Household studies can provide critical scientific insights to guide public health interventions and policies. The results above show that replacing binomial models with statistical models of transmission will help infectious disease epidemiologists who conduct these studies contribute more effectively to the prevention and control of epidemics.

Fig 1. The household FAR as a function of the SAR for households with different numbers of susceptibles m. Lines show analytical calculations using probability generating functions, and symbols show estimates from 40,000 simulated household outbreaks. Each simulated household outbreak had a single primary case, so the total household size was m + 1.

Fig 2. The VIF as a function of the SAR for households with m susceptibles. Lines show analytical calculations, and symbols show estimates from 40,000 simulated household outbreaks. Each simulated household outbreak started with a single primary case, so the total household size was m + 1. For numerical stability, symbols are shown only for simulations with an observed FAR < 0.99.

Fig 3. Coverage probabilities of binomial 95% confidence intervals for the household SAR with different numbers of susceptibles (m). Gray lines are coverage probabilities for unadjusted confidence intervals, and black lines are coverage probabilities for cluster-adjusted confidence intervals. Each symbol represents 1,000 simulations with 100 households each.

Fig 4. Coverage probabilities of binomial 95% confidence intervals for the household FAR with different numbers of susceptibles (m). Gray lines are coverage probabilities for unadjusted confidence intervals, and black lines are coverage probabilities for cluster-adjusted confidence intervals. Each symbol represents 1,000 simulations with 100 households each.

Fig 5. Histograms of simulated final outbreak sizes in the LA households based on household SAR estimates assuming a 6-day infectious period.Vertical black lines indicate the observed final size of 26 cases.

Fig 6. Histograms of simulated final outbreak sizes in the LA households based on SAR estimates assuming a 7-day infectious period. Vertical black lines indicate the observed final size of 32 cases.

Fig 7.

Fig 8. Histograms of simulated outbreak sizes based on pairwise exponential SAR estimates using the full data (dark gray) superimposed on the corresponding histograms from Figs. 5-7 based on estimates using second generation data (light gray).For each assumed infectious period, a vertical black line shows the observed final outbreak size.

Table 2. The number of possible second generation cases, final size cases, and late cases for each assumed infectious period. There are always 37 total final size and late cases.

Table 3. Estimates of the household SAR with 95% confidence limits and Akaike information criterion (AIC) for pairwise regression models.

Table 4. Full-data estimates of the household SAR with 95% confidence limits and Akaike information criterion (AIC) for pairwise regression models.