Identifying the Age Cohort Responsible for Transmission in a Natural Outbreak of Bordetella bronchiseptica

Identifying the major routes of disease transmission and the reservoirs of infection is needed to increase our understanding of disease dynamics and improve disease control. Despite this, transmission events are rarely observed directly. Here we had the unique opportunity to study natural transmission of Bordetella bronchiseptica – a directly transmitted respiratory pathogen with a wide mammalian host range, including sporadic infection of humans – within a commercial rabbitry to evaluate the relative effects of sex and age on the transmission dynamics therein. We did this by developing an a priori set of hypotheses outlining how natural B. bronchiseptica infections may be transmitted between rabbits. We discriminated between these hypotheses by using force-of-infection estimates coupled with random-effects binomial regression analysis of B. bronchiseptica age-prevalence data from within our rabbit population. Force-of-infection analysis allowed us to quantify the apparent prevalence of B. bronchiseptica while correcting for age structure. To determine whether transmission occurs largely within social groups (in this case the litter) or from an external group, we used random-effects binomial regression to evaluate the importance of social mixing in disease spread. Taken together, the results of these two approaches support young weanlings – as opposed to, for example, breeder or maternal cohorts – as the age cohort primarily responsible for B. bronchiseptica transmission. Thus age-prevalence data, which are relatively easy to gather in clinical or agricultural settings, can be used to evaluate contact patterns and infer the likely age cohort responsible for transmission of directly transmitted infections. These insights shed light on the dynamics of disease spread and allow the best methods for effective long-term disease control to be assessed.


S1. R code (Long et al. 2010)

Estimating the force-of-infection
Here we use the observed age-specific prevalence data (n = 214 rabbits), collected from nasal swabs under "Sampling strategy one: force-of-infection" (see Materials and Methods for details). These data represent interval-censored infection-time data, such that each individual is either infected (Y=1) or not (Y=0) within a set interval of time. For a non-immunizing persistent infection such as B. bronchiseptica (see key assumptions of force-of-infection (FOI) models in the Statistical Analysis, M&M), the age-specific prevalence, P(a), can be estimated via the catalytic model:

$$P(a) = 1 - \exp\left(-\int_0^a \lambda(a')\,da'\right), \qquad \text{(S1)}$$

where λ(a) is what mathematical epidemiologists call the age-dependent FOI and what statisticians would call the age-dependent hazard (Hens, Aerts et al.; Muench 1959).
Assuming that the infection hazard is invariant with respect to age is often unrealistic (see Statistical Analysis, M&M). To incorporate age-dependency we use the piece-wise constant parametric model where, for pre-determined intervals, a constant FOI is assumed. Interval choice is based on some prior knowledge of the age classes of mixing cohorts in the population. When the FOI is assumed to be piece-wise constant across k age classes, and each segment j has a starting age $l_j$, a duration $d_j$ and a constant FOI $\lambda_j$, the integral in equation (S1) for an individual of age a whose age lies within the k'th age class will be:

$$\int_0^a \lambda(a')\,da' = \sum_{j=1}^{k-1} \lambda_j d_j + \lambda_k (a - l_k). \qquad \text{(S2)}$$

The negative log-likelihood for this model is coded as the R function loglikpc. The inner working of this function is as follows: line 1 calculates the duration of each age class (dur); line 2 sets the initial log-likelihood to zero; lines 3-9 are a loop that, for each individual in the data-set, calculates the integral corresponding to equation S2 (inte), evaluates the prediction from the catalytic model (p) and adds the log-likelihood for that individual; finally, line 11 outputs the negative log-likelihood. Note that, to ensure that the FOI values are positive, we actually estimate the log-values.
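As an illustration of the steps just described, a minimal R sketch of a piece-wise constant catalytic likelihood is given here. It is written from the verbal description and is not the authors' listing, so its line numbers need not match those quoted. The argument names par, data and cate follow the optim() call used later in this section; the local names lower and expo, and the treatment of ages beyond the last cut-off, are assumptions.

```r
# Sketch of a piece-wise constant catalytic negative log-likelihood.
# par  : log-FOI values, one per age class
# data : data frame with columns 'age' and 'sick' (0 = healthy, 1 = infected)
# cate : increasing vector of upper cut-off ages, one per age class
loglikpc <- function(par, data, cate) {
  dur <- c(cate[1], diff(cate))          # duration of each age class
  lower <- cate - dur                    # starting age of each age class
  ll <- 0                                # running log-likelihood
  for (i in seq_along(data$age)) {
    a <- data$age[i]
    # time spent in each class up to age a: 0 for classes not yet reached,
    # the full duration for classes already passed (assumption: ages beyond
    # the last cut-off accrue no further exposure)
    expo <- pmin(pmax(a - lower, 0), dur)
    inte <- sum(exp(par) * expo)         # integral in equation (S2)
    p <- 1 - exp(-inte)                  # catalytic model, equation (S1)
    ll <- ll + dbinom(data$sick[i], 1, p, log = TRUE)
  }
  -ll                                    # negative log-likelihood
}

# One infected individual aged 3, with classes (0, 2] and (2, 5]:
# the integral is 0.1 * 2 + 0.2 * 1 = 0.4, so the result is -log(1 - exp(-0.4))
nll_one <- loglikpc(log(c(0.1, 0.2)), data.frame(age = 3, sick = 1), cate = c(2, 5))
```

Working on the log scale inside the function (exp(par)) is what keeps every FOI estimate positive without constraining the optimizer.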
The data-set needs to be formatted so that each individual is a unique line, the column labeled 'age' gives the age of the individual, the column labeled 'sick' has a 0 (zero) for healthy individuals and 1 (one) for infected individuals. In the example below the dataset is named dat.
With this setup, we can now estimate λ(a) by minimizing the negative log-likelihood using the quasi-Newton 'BFGS' method (for Broyden, 1969; Fletcher, 1970; Goldfarb, 1970 and Shanno, 1970), generally regarded as the best performing method, as implemented in the optim() function of R.
First we define the vector giving the cut-off ages. We call this vector x. Then we provide some (arbitrary) initial values for the log-transformed age-specific FOI values. We call this vector para. Finally, we call the numerical optimizer in R and save the output as est:

    est = optim(par=log(para), fn=loglikpc, cate=x, method="BFGS", data=dat, control=list(trace=2, maxit=1000))

The maximum likelihood estimates for the log FOI are given in est$par.

Statistical uncertainty
Because of strong multi-collinearity among the λ(a) estimates, we use partial profile likelihoods to erect confidence intervals (Diggle 2006). That is, we profile the likelihood for each segment separately, maximizing the likelihood with respect to the other segments. This step is computationally expensive and a bit technical.
We first modify our likelihood function to flag the segment to be profiled (which) and what value to consider for that segment (wval). The modified function is:
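A minimal sketch of how such a function and the profiling loop might look is given here; it is an illustration under stated assumptions rather than the authors' listing. Only the argument names which and wval come from the text. The function name loglikpc_prof, the simulated data, the cut-off ages, the grid of values and the chi-squared cutoff used for the interval are all assumptions.

```r
# Sketch only: negative log-likelihood with one segment held fixed.
# which : index of the segment being profiled
# wval  : log-FOI value imposed on that segment
# par   : log-FOI values of the remaining (free) segments
loglikpc_prof <- function(par, data, cate, which, wval) {
  logfoi <- numeric(length(cate))
  logfoi[which] <- wval
  logfoi[-which] <- par
  lower <- c(0, head(cate, -1))                 # starting age of each class
  dur <- cate - lower                           # duration of each class
  # time each individual has spent in each class (rows: individuals)
  expo <- pmin(pmax(outer(data$age, lower, "-"), 0),
               matrix(dur, nrow(data), length(dur), byrow = TRUE))
  H <- drop(expo %*% exp(logfoi))               # cumulative hazard at each age
  # Bernoulli log-likelihood written out for numerical stability:
  # log P(infected) = log(1 - exp(-H)), log P(healthy) = -H
  -sum(ifelse(data$sick == 1, log(-expm1(-H)), -H))
}

# Simulated data (NOT the rabbitry data); cut-offs and FOI values are made up
set.seed(1)
x <- c(1, 4, 12)
true_foi <- c(0.05, 0.40, 0.10)
age <- runif(400, 0.2, 12)
lower <- c(0, head(x, -1))
expo <- pmin(pmax(outer(age, lower, "-"), 0),
             matrix(x - lower, length(age), length(x), byrow = TRUE))
dat <- data.frame(age = age,
                  sick = rbinom(400, 1, -expm1(-drop(expo %*% true_foi))))

# Profile segment 2: fix its log-FOI on a grid and re-optimize the others
grid <- seq(log(0.1), log(1.5), length.out = 21)
prof <- sapply(grid, function(w) {
  optim(par = log(c(0.1, 0.1)), fn = loglikpc_prof, data = dat, cate = x,
        which = 2, wval = w, method = "BFGS")$value
})

# Approximate 95% interval within the grid, using the likelihood-ratio
# cutoff qchisq(0.95, 1) / 2 (an assumption, not stated in the text)
keep <- prof <= min(prof) + qchisq(0.95, df = 1) / 2
ci <- exp(range(grid[keep]))
```

Each grid point needs its own optimization over the remaining segments, which is why this step is expensive; a finer grid gives smoother interval end-points at proportionally higher cost.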

Generalized linear models (GLMs)
To test for evidence of significant sibling-to-sibling transmission we used binomial regression with a complementary log-log link. We use the complementary log-log link here because the resultant parameter estimates then have a hazard (i.e. FOI) interpretation, as opposed to the 'odds' interpretation that would result from the commonly used logistic ('logit') link. This was done using the data set resulting from "Sampling strategy two: sibling-to-sibling transmission" (see Sampling Strategies in M&M for details). This data set (n = 160 kits in total) comprised the following columns: 'disease_conversion', the binary variable that denotes whether or not a co-housed susceptible sibling was infected at the end of the experiment; 'co-housed status', which gives the infection status of the co-housed siblings at the start of the experiment; 'facility', to control for possible differences between the two breeding houses; and 'family', to control for possible cohort effects. After importing the data into R (Crawley 2007), the GLM was run as follows:

    glm(disease_conversion ~ cohoused_status + facility, family = binomial(link = "cloglog"))
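To make the hazard interpretation concrete, the following sketch fits the same model formula to simulated data. The data frame name dat2, the factor levels, the sample size and the effect sizes are all hypothetical and are not taken from the study.

```r
# Simulated data (NOT the study data): 200 kits with made-up covariates
set.seed(42)
n <- 200
dat2 <- data.frame(
  cohoused_status = factor(sample(c("uninfected", "infected"), n, replace = TRUE),
                           levels = c("uninfected", "infected")),
  facility = factor(sample(c("A", "B"), n, replace = TRUE))
)

# Under a cloglog link, log(-log(1 - p)) is linear in the covariates,
# so exp(linear predictor) is the cumulative hazard over the experiment
eta <- -1.5 + 1.0 * (dat2$cohoused_status == "infected")
dat2$disease_conversion <- rbinom(n, 1, 1 - exp(-exp(eta)))

fit <- glm(disease_conversion ~ cohoused_status + facility,
           family = binomial(link = "cloglog"), data = dat2)

# Exponentiated slopes are hazard ratios rather than odds ratios
hazard_ratios <- exp(coef(fit)[-1])
```

With a logit link the same exponentiated coefficients would be odds ratios, which is the distinction drawn in the paragraph above.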

Generalized linear mixed models (GLMMs)
Next, to investigate whether (a) offspring of infected mothers have an increased instantaneous risk of becoming infected and (b) offspring of the same litter tend to have the same infection fate because of within-litter transmission, we used random-effect binomial regression (a generalized linear mixed model, GLMM) with litter as a random variable, using the data set resulting from "Sampling strategy three: maternal transmission" (see