Inference of R_{0} and Transmission Heterogeneity from the Size Distribution of Stuttering Chains


The authors have declared that no competing interests exist.

Conceived and designed the experiments: SB JOLS. Performed the experiments: SB. Analyzed the data: SB JOLS. Wrote the paper: SB JOLS.

For many infectious disease processes such as emerging zoonoses and vaccine-preventable diseases,

This paper focuses on infectious diseases such as monkeypox, Nipah virus and avian influenza that transmit weakly from human to human. These pathogens cannot cause self-sustaining epidemics in the human population, but instead cause limited transmission chains that stutter to extinction. Such pathogens would go extinct if they were confined to humans, but they persist because of continual introduction from an external reservoir (such as animals, for the zoonotic diseases mentioned above). They are important to study for two reasons: they pose a risk of emerging if they become more transmissible and, conversely, tracking them shows whether efforts to locally eliminate a pathogen by vaccination are succeeding. A crucial challenge for these ‘stuttering’ pathogens is to quantify their transmissibility, in terms of the intensity and heterogeneity of disease transmission by infected individuals. In this paper, we use monkeypox as an example to show how these transmission properties can be estimated from commonly available data describing the size of observed stuttering chains. These results make it easier to monitor diseases that pose a risk of emerging (or re-emerging) as self-sustaining human pathogens, or to decide whether a seemingly large chain may signal a worrisome change in transmissibility.

There are many circumstances in infectious disease epidemiology where transmission among hosts occurs, but is too weak to support endemic or epidemic spread. In these instances, disease is introduced from an external source and subsequent secondary transmission is characterized by ‘stuttering chains’ of transmission which inevitably go extinct. This regime can be defined formally in terms of the basic reproductive number,

A top priority in all of these settings is to quantify transmission, in order to determine the risk that the pathogen could emerge and become established in the human population of concern. This could occur due to demographic or biological changes that increase transmission, such as declining vaccine coverage

We use simulations and epidemiological data to explore the influence of transmission heterogeneity on inference from chain size data, and to show that the degree of heterogeneity can actually be inferred from such data. Building upon prior studies we assume that the offspring distribution, which describes the number of secondary infections caused by each infected individual, can be represented by a negative binomial distribution. This has been shown to be an effective model for the transmission dynamics of emerging pathogens

Knowledge of

Until now, estimation of individual variation in infectiousness (summarized by

We demonstrate the epidemiological significance of our ML approach by analyzing chain size data obtained during monkeypox surveillance in the Democratic Republic of Congo from 1980 to 1984

We define a ‘stuttering transmission chain’ as a group of cases connected by an unbroken series of transmission events. Transmission chains always start with a ‘spillover’ event in which a primary case (sometimes referred to as an index case) has been infected from an infection reservoir outside the population of interest. Mechanisms of spillover differ among pathogens and circumstances, but include animal-to-human transmission, infection from environmental sources or geographical movement of infected hosts. The primary case can then lead to a series of secondary cases via human-to-human transmission within the focal population. Sometimes no secondary transmission occurs, in which case a transmission chain consists of a single primary case. We define an infection cluster as a group of cases occurring in close spatio-temporal proximity, which may include more than one primary infection and thus be composed of more than one transmission chain. Some authors use ‘outbreak’ or ‘infection cluster’ for what we call a transmission chain.

To characterize the transmission of subcritical diseases, epidemiologists might record data describing the total disease incidence, the number of cases in each transmission chain, the number of transmission generations in each transmission chain, or complete contact tracing data. Because the collection of high-resolution epidemiological data is resource and labor intensive, there is great benefit to understanding the type and quantity of data needed for a specific type of assessment. For instance, total incidence data on its own is not sufficient to infer human-to-human transmissibility for subcritical infections, because the contribution of spillover cases is unspecified. However, chain size and contact tracing data can be used to infer

The detailed and accurate data describing human transmission of monkeypox virus in the 1980s

A) Ninety percent confidence regions for

Estimate | R_{0} | Dispersion parameter |
ML value for chain size analysis | 0.30 | 0.36 |
90% CI for chain size analysis | 0.22–0.40 | 0.16–1.47 |
95% CI for chain size analysis | 0.21–0.42 | 0.14–2.57 |
ML value for contact tracing analysis | 0.30 | 0.33 |
90% CI for contact tracing analysis | 0.22–0.40 | 0.19–0.64 |
95% CI for contact tracing analysis | 0.21–0.42 | 0.17–0.75 |

The chain size distributions predicted by models fitted under various assumptions about transmission heterogeneity exhibit subtle but important differences (

When incidence of an emerging disease increases, a frequent goal of surveillance is to assess whether this is attributable to a rise in transmissibility in the focal population, as manifested by an increased

A) Applying maximum likelihood estimation to simulated data shows the sensitivity of chain size analysis and contact tracing analysis for detecting a change in

Equally as important as detecting a change in

Number of chains simulated | Percentage when | Percentage when | Percentage when |
20 | 1.7 | 10.2 | 14.9 |
100 | 5.0 | 10.8 | 15.5 |
500 | 5.1 | 10.8 | 15.7 |

For many surveillance systems, large chains are more likely to be detected than isolated cases. This could give rise to biases in the chain size distribution data, which we address in a later section. In these situations, an alternative approach to detecting a change in

A) Size of an observed chain that is anomalously large as a function of

In some situations, a rapid response protocol might be instituted to quickly investigate worrisomely large chains. In this case, an anomalous size cutoff can be chosen based on there being real-time reports of the size of single chains (as distinct from considering the largest chain obtained from an entire surveillance data set). However, assuming an incorrect value of

Cumulative distribution threshold | | | |
95% | 4.9% ( | 4.9% ( | 4.9% ( |
99% | 0.88% ( | 1.29% ( | 1.94% ( |
99.9% | 0.09% ( | 0.22% ( | 0.43% ( |

In other situations, chain sizes may be evaluated collectively after a predefined period of surveillance. For the ML values of

By demonstrating the concordance of results based on chain size and contact tracing data when inferring

Due to the challenges of illustrating the dependence of inference error on three variables, this section considers two special cases of parameter values. First we fix

The axes represent the true

We limit our simulation results to

We summarized the error in

As with relative error, the absolute error

To further our understanding of the error in

In principle, bias-correction could be applied to

Assessing inference of transmission heterogeneity is complicated by the inverse relationship between

A) Error of

Because of the non-intuitive relationship between

Motivated by the observation that the ML estimator for

Since confidence interval calculations are independent of the particular metric used for quantifying inference error (e.g. insensitive to our use of

Overall, our characterization of the inference of

The preceding analyses have shown the potential for accurate inference of transmission parameters from chain size data, but we have not yet considered how imperfect case detection impacts inference results. We have also ignored complications arising when multiple chains are mixed into a single cluster. This latter scenario allows the possibility that some primary infections are falsely classified as secondary cases. Here we consider whether and how these types of data limitations impact inference results.

No surveillance system is perfect, and some cases will be missed. However, the mechanisms underlying imperfect observation can alter

By modeling observation as a two-step process, we can explore the impact of a diverse range of scenarios. We define the passive observation probability as the probability that any case will be detected by routine surveillance measures. This probability applies independently to all cases, so multiple cases in the same chain can be detected by passive surveillance. In some settings, there is an active surveillance program that investigates outbreaks that have been detected by the passive system. We define the active observation probability as the probability that a case will be detected by active surveillance, conditional on that case not having been detected by passive surveillance. Cases can be detected by active surveillance only if they belong to a transmission chain where at least one case is detected by passive surveillance. (When the active observation probability is zero or one, respectively, our observation model maps onto the ‘random ascertainment’ and ‘random ascertainment with retrospective identification’ scenarios previously analyzed

When the passive observation probability approaches one, essentially all cases are observed and so the inferred

A) The inferred value of

Imperfect observation tends to cause over-estimation of

Overall, our observation model suggests that inference of

A key challenge of analyzing chain size data for monkeypox and many other zoonoses is that primary infections are typically clinically indistinguishable from secondary infections. Yet each type of infection represents a distinct transmission process and ignoring this distinction can skew epidemiological assessments. In the context of chain size distributions, this causes a problem because multiple chains can be combined into one cluster. To improve our understanding of how inference of

The monkeypox dataset we analyze groups cases in terms of infection clusters rather than transmission chains. Our primary strategy to cope with this limitation was to consider all possible ways that the ambiguous infection clusters could be divided into chains (what we term the combinatorial approach). This effort was greatly facilitated by knowing how many primary cases were present in each infection cluster. We now consider the importance for transmission parameter inference of identifying primary cases correctly. We then consider the additional value of more detailed contact tracing data that allows disentanglement of clusters into individual chains.

To assess how clusters identified as having multiple primary infections (equivalent to the presence of ‘co-primary infections’) impact

ML estimates of

To determine the importance of disentangling transmission chains fully before performing inference, we considered two methods for dividing infection clusters with multiple primary infections into individual transmission chains (

Only 5 of the 19 clusters containing multiple primary infections had ambiguity with regard to the size of constituent chains. Thus the noticeable difference between the ML estimates of

Overall, our analysis of monkeypox data highlights how inference of transmission parameters from chain size data can be complicated when infection clusters may contain multiple primary infections. More generally, the challenge of properly differentiating primary from secondary infections is of fundamental importance for analysis of stuttering zoonoses. Even when well-trained surveillance teams are on site to assess transmission pathways, it may be impossible for them to decide between two equally likely infection sources. For instance, it can be difficult to decide if a mother contracted monkeypox because she cared for an infected child or because she contacted infected meat (in the same contact event as the child, or a later one). The theory presented here forms a foundation for further research on infection source assignment and its relationship to underlying transmission mechanisms. Future investigations can leverage existing methods of source assignment developed for supercritical diseases, which utilize various epidemiological data such as symptom onset time, risk factor identification and pathogen genetic sequence data

Several of our modeling assumptions deserve further exploration. In particular, the assumption that transmission can be described by independent and identical draws from a negative binomial offspring distribution is a simplification of some forms of transmission heterogeneity. For example, if heterogeneity is driven largely by population structure, such that susceptibility and infectiousness are correlated, then the relation between

Data acquisition is often the limiting factor for assessing the transmission of subcritical diseases that pose a threat of emergence. Our findings can assist future surveillance planning by drawing attention to the utility of chain size data when contact tracing data are too difficult to obtain. We have shown that both

We analyzed previously reported data describing monkeypox cases identified between 1980 and 1984 in the Democratic Republic of Congo (formerly Zaire)

The raw cluster data for monkeypox was obtained from

Chain size | Simple cluster analysis | Homogeneous assignment | Heterogeneous assignment |
1 | 84 | 114 | 120 |
2 | 19 | 16 | 7 |
3 | 11 | 11 | 12 |
4 | 5 | 2 | 3 |
5 | 2 | 2 | 3 |
6 | 4 | 2 | 2 |

We analyze the transmission dynamics of stuttering chains using the theory of branching processes

The choice of offspring distribution is important because it defines the relationship between the intensity and heterogeneity of transmission. We adopt a flexible framework by assuming secondary transmission can be characterized by a negative binomial distribution with mean

A key advantage of using a two-parameter distribution over a one-parameter distribution (such as the geometric or Poisson distribution) is that modulating

All simulated chains start with a single primary infection. Then the number of first generation cases is decided by choosing a random number of secondary cases according to a negative binomial distribution with mean
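The chain simulation described above can be sketched in a few lines. This is a minimal illustration in Python rather than the Matlab code used for the paper. It assumes the standard gamma-Poisson representation of the negative binomial (each case draws a gamma-distributed individual rate with shape k and mean R0, then a Poisson number of secondary cases). The function names, the `max_size` safety cap and the parameter values are hypothetical choices made for this example.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def negative_binomial_draw(r0, k, rng):
    """One offspring count with mean r0 and dispersion k (gamma-Poisson mixture)."""
    individual_rate = rng.gammavariate(k, r0 / k)  # shape k, scale r0/k, mean r0
    return poisson_draw(individual_rate, rng)


def simulate_chain_size(r0, k, rng, max_size=100_000):
    """Total number of cases in one chain, including the primary case."""
    size = 1                 # the primary (spillover) case
    current_generation = 1   # infectious cases in the current generation
    while current_generation > 0 and size < max_size:
        new_cases = sum(negative_binomial_draw(r0, k, rng)
                        for _ in range(current_generation))
        size += new_cases
        current_generation = new_cases
    return size


rng = random.Random(1)
sizes = [simulate_chain_size(0.3, 0.36, rng) for _ in range(20_000)]
mean_size = sum(sizes) / len(sizes)
print(f"mean simulated chain size: {mean_size:.3f}")
```

For a subcritical process the simulated mean chain size should approach 1/(1 - R0), which gives a quick sanity check on the sampler.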

To simulate imperfect observation, we first simulated a set of true transmission chains, then simulated whether each case would be observed according to the passive observation probability. Finally, for chains where at least one case was detected passively, we simulated which additional cases were observed according to the active observation probability.
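The two-step observation procedure can be sketched as follows. This is an illustrative Python version, not the authors' Matlab code. It takes a list of true chain sizes as input, and the names `observe_chains`, `p_passive` and `p_active` are assumptions made for this example.

```python
import random


def observe_chains(true_sizes, p_passive, p_active, rng):
    """Return the observed sizes of the chains that surveillance detects."""
    observed_sizes = []
    for n_cases in true_sizes:
        # Step 1: every case is independently detected by passive surveillance.
        passive = sum(rng.random() < p_passive for _ in range(n_cases))
        if passive == 0:
            continue  # no passive detection, so the whole chain is missed
        # Step 2: cases missed passively may be found by active follow-up.
        missed = n_cases - passive
        active = sum(rng.random() < p_active for _ in range(missed))
        observed_sizes.append(passive + active)
    return observed_sizes


rng = random.Random(0)
true_sizes = [1, 1, 1, 2, 3, 5]
print(observe_chains(true_sizes, p_passive=0.5, p_active=0.5, rng=rng))
```

Setting `p_active` to zero or one recovers the two limiting ascertainment scenarios: observed chains are thinned case by case, or any passively detected chain is observed in full.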

All calculations and simulations are performed with Matlab 7.9.0. Code is available in

The next two subsections derive the average size and variance of the distribution. As a by-product, we obtain a first order moment estimator for

Since the average number of cases per generation declines in a geometric series when

This relationship can be inverted to obtain the first moment estimator for
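As a worked illustration, the sketch below applies the first moment estimator to the chain size counts tabulated in this paper. It assumes the standard subcritical branching process result that the mean chain size is 1/(1 - R0), so that R0 is estimated as one minus the number of chains divided by the number of cases. The function name and dictionary layout are hypothetical.

```python
def first_moment_r0(chain_counts):
    """Estimate R0 as 1 - (number of chains) / (number of cases)."""
    n_chains = sum(chain_counts.values())
    n_cases = sum(size * n for size, n in chain_counts.items())
    return 1.0 - n_chains / n_cases


# Chain size -> number of chains, copied from the monkeypox chain size table.
simple_cluster = {1: 84, 2: 19, 3: 11, 4: 5, 5: 2, 6: 4}
heterogeneous = {1: 120, 2: 7, 3: 12, 4: 3, 5: 3, 6: 2}

print(f"simple cluster analysis:  {first_moment_r0(simple_cluster):.3f}")
print(f"heterogeneous assignment: {first_moment_r0(heterogeneous):.3f}")
```

Both columns contain the same 209 cases, so the estimate changes only through the number of chains: treating each cluster as a single chain (125 chains) gives a noticeably higher value than splitting clusters into 147 chains.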

An alternative expression for

The coefficient of variation (COV) provides quantitative perspective on the relationship between

Meanwhile, branching process theory shows that the variance of the chain size distribution is

The COV of the negative binomial offspring distribution increases as
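The two coefficients of variation can be computed directly. The sketch below assumes the standard moments: a negative binomial offspring distribution with mean R0 and dispersion k has variance R0(1 + R0/k), and a subcritical branching process has mean chain size 1/(1 - R0) and chain size variance equal to the offspring variance divided by (1 - R0)^3. The function names are hypothetical.

```python
import math


def offspring_cov(r0, k):
    """COV of the negative binomial offspring distribution."""
    variance = r0 * (1.0 + r0 / k)
    return math.sqrt(variance) / r0


def chain_size_cov(r0, k):
    """COV of the chain size distribution; requires r0 < 1."""
    mean = 1.0 / (1.0 - r0)
    variance = r0 * (1.0 + r0 / k) / (1.0 - r0) ** 3
    return math.sqrt(variance) / mean


for k in (0.1, 1.0, 10.0):
    print(f"k={k:5.1f}  offspring COV={offspring_cov(0.3, k):.2f}  "
          f"chain size COV={chain_size_cov(0.3, k):.2f}")
```

Both quantities fall as k grows, which reflects the inverse relationship between the dispersion parameter and transmission heterogeneity.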

The COV for the offspring distribution (i.e. the distribution for the number of transmission events caused by each case, panel A) and chain size distribution (panel B) are both a function of

Beyond determining the relationship between

Based on

Noting that the Gamma function

For a given value of

A) The probability distribution for chain sizes for various parameter choices, when transmission is described by a negative binomial offspring distribution. B) Same as panel A but with logarithmically scaled axes, to highlight lower frequencies and larger chain sizes. C) The weighted probability density for the same
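For readers who want to reproduce curves of this kind, a minimal Python sketch of the chain size distribution follows. It assumes the standard closed form for a negative binomial branching process, P(j) = Γ(kj + j - 1) / [Γ(kj) Γ(j + 1)] · (R0/k)^(j-1) / (1 + R0/k)^(kj + j - 1), evaluated on the log scale with the log-gamma function for numerical stability. The function names and the truncation point `max_size` are choices made for this example.

```python
import math


def chain_size_log_pmf(j, r0, k):
    """Log probability that a chain has exactly j cases (j >= 1)."""
    return (math.lgamma(k * j + j - 1) - math.lgamma(k * j) - math.lgamma(j + 1)
            + (j - 1) * math.log(r0 / k)
            - (k * j + j - 1) * math.log1p(r0 / k))


def chain_size_pmf(r0, k, max_size):
    """Probabilities for chain sizes 1..max_size as a list."""
    return [math.exp(chain_size_log_pmf(j, r0, k)) for j in range(1, max_size + 1)]


probabilities = chain_size_pmf(r0=0.3, k=0.36, max_size=500)
total = sum(probabilities)
mean = sum(j * p for j, p in enumerate(probabilities, start=1))
print(f"P(size = 1) = {probabilities[0]:.3f}, total = {total:.6f}, mean = {mean:.4f}")
```

For R0 below one the probabilities sum to one and the mean equals 1/(1 - R0), which provides a check on the implementation. The probability of a chain of size one reduces to (1 + R0/k)^(-k), the chance that the primary case infects nobody.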

We employ maximum likelihood estimation for

The ML estimate of

Solving for

The ML calculation for the dispersion parameter,
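A hedged sketch of the joint ML fit is shown below. It assumes the standard closed-form chain size distribution for a negative binomial branching process, uses the fact that the ML estimate of R0 from chain sizes is one minus the number of chains divided by the number of cases for any value of k, and then finds k by a simple search over a logarithmic grid. The grid search is a stand-in chosen for transparency, not the numerical method used for the paper, and all names are hypothetical.

```python
import math


def chain_size_log_pmf(j, r0, k):
    """Log probability that a chain has exactly j cases (j >= 1)."""
    return (math.lgamma(k * j + j - 1) - math.lgamma(k * j) - math.lgamma(j + 1)
            + (j - 1) * math.log(r0 / k)
            - (k * j + j - 1) * math.log1p(r0 / k))


def log_likelihood(chain_counts, r0, k):
    """Log-likelihood of independent chains; chain_counts maps size -> count."""
    return sum(n * chain_size_log_pmf(j, r0, k) for j, n in chain_counts.items())


def fit_r0_and_k(chain_counts):
    """ML estimate of R0 in closed form, then a grid search for k."""
    n_chains = sum(chain_counts.values())
    n_cases = sum(j * n for j, n in chain_counts.items())
    if n_cases == n_chains:
        raise ValueError("all chains have size one, so k cannot be estimated")
    r0_hat = 1.0 - n_chains / n_cases
    k_grid = [10.0 ** (e / 100.0) for e in range(-300, 301)]  # 0.001 to 1000
    k_hat = max(k_grid, key=lambda k: log_likelihood(chain_counts, r0_hat, k))
    return r0_hat, k_hat


# 'Heterogeneous assignment' column of the monkeypox chain size table.
counts = {1: 120, 2: 7, 3: 12, 4: 3, 5: 3, 6: 2}
r0_hat, k_hat = fit_r0_and_k(counts)
print(f"R0 estimate: {r0_hat:.2f}, k estimate: {k_hat:.2f}")
```

Because this example fits a single assignment of clusters to chains, its output need not match the tabulated ML values, which are based on the combinatorial treatment of ambiguous clusters.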

As mentioned, some of the monkeypox infection clusters could not be unambiguously divided into constituent chains. For our baseline ML inference of

Contact tracing investigations yield direct information about how many infections are caused by each infectious case. By analogy to

To study the precision and accuracy of our ML approach, we simulated many data sets for a range of values of

We use two metrics to summarize the error in inferred values of

Another useful metric for characterizing

Since the relative error scales with

Since the coefficient of variation of the negative binomial distribution is a function of

The inference error for

We use likelihood profiling to determine the confidence intervals for inferred values of

Our approach does not put any explicit constraints on the value of

To determine the associated confidence interval for

The two-dimensional confidence regions corresponding to a confidence level of

To test the accuracy of the ML confidence intervals, we use simulated data to determine the coverage probabilities of the univariate confidence intervals for

Prior research has shown that the distribution of the number of transmission generations before extinction for a set of stuttering chains can be used to infer

The joint likelihood of a chain having size

Since this now overlaps with the derivation of

We now assume that we have complete contact tracing data, meaning that for every infected individual we know exactly how many individuals they subsequently infected. The likelihood is given by

This means the ML estimate of R_{0} based on contact tracing data is:
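A sketch of this estimator, assuming the standard result that the ML estimate of the mean of a negative binomial distribution is the sample mean for any value of the dispersion parameter. The symbols $z_i$ (the number of secondary cases caused by traced case $i$) and $N$ (the number of traced cases) are introduced here for illustration and may differ from the notation used elsewhere in the paper.

```latex
\hat{R}_{0} = \frac{1}{N} \sum_{i=1}^{N} z_{i}
```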

Thus, when estimating

To determine whether two data sets on chain size distribution correspond to statistically distinct values of

For

The results presented in

The probability that a chain has a size less than


We are grateful to the editors and anonymous reviewers for insightful feedback that improved the content, organization and readability of the text.