Statistical Inference for Multi-Pathogen Systems

There is growing interest in understanding the nature and consequences of interactions among infectious agents. Pathogen interactions can operate at different scales, either within a co-infected host or in host populations where the pathogens co-circulate, and can be either cooperative or competitive. The detection of interactions among pathogens has typically relied on the study of synchrony in the oscillations of the protagonists, but as we show here, phase association provides an unreliable dynamical fingerprint for this task. We assess the capacity of a likelihood-based inference framework to accurately detect and quantify the presence and nature of pathogen interactions on the basis of realistic amounts and kinds of simulated data. We show that when epidemiological and demographic processes are well understood, noisy time-series data can contain sufficient information to allow correct inference of interactions in multi-pathogen systems. The power of inference depends on the strength and time course of the underlying mechanism: stronger and longer-lasting interactions are more easily and more precisely quantified. We examine the limitations of our approach with respect to stochastic temporal variation, under-reporting, and over-aggregation of data. We propose that likelihood shows promise as a basis for detecting the effects of pathogen interactions, quantifying them, and determining their (competitive or cooperative) nature on the basis of population-level time-series data.

Deterministic skeleton of the model
The deterministic skeleton of the model is completely described by the 16 equations given in Table S1.
Table S1. Deterministic skeleton of the model. The equations are organized in a grid corresponding to the schematic layout of the compartments shown in Fig. 1. The symbols are described in Table 1.

More on inference technology
Consider a data set Y consisting of observations of m variables y_i(t_j), i = 1, ..., m, at n points in time t_j, j = 1, ..., n. These data reflect the true state of the system only indirectly, through the conditional probability distribution of y_i(t_j) given the system's state x(t_j) at time t_j. In particular, specifying an observation model is equivalent to postulating the probability density f(y | x, t, θ) of observing the data y(t) given the state of the system x at time t. Here, θ represents a vector of unknown parameters to be estimated.
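As a concrete illustration of an observation model f(y | x, t, θ), here is a minimal Python sketch assuming a hypothetical binomial reporting process in which each true case is observed independently with probability ρ; this is an assumption for illustration only, not necessarily the observation model used in the paper:

```python
import math

def obs_loglik(y, x_cases, rho):
    """Log-probability of observing y reported cases given x_cases true
    new cases, under a hypothetical binomial reporting model with
    reporting probability rho: one possible form of f(y | x, t, theta)."""
    if y < 0 or y > x_cases:
        return float("-inf")
    # log C(x_cases, y) + y*log(rho) + (x_cases - y)*log(1 - rho)
    return (math.lgamma(x_cases + 1) - math.lgamma(y + 1)
            - math.lgamma(x_cases - y + 1)
            + y * math.log(rho) + (x_cases - y) * math.log(1.0 - rho))
```

For fixed x_cases and rho, the probabilities over all feasible y sum to one, as an observation density must.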

For likelihood-based inference, we calculate the likelihood of a chosen parameter set θ given the complete data (within the confines of the process and observation models). This likelihood function L(θ) is a product of conditional likelihoods L_{t_j}(θ), calculated at each of the n observation times t_j. In practice, it is more convenient to work with the natural logarithm of the likelihood, log L(θ), which is the sum of the conditional log-likelihoods.
The Markov property of the model allows one to calculate the conditional likelihoods L_{t_j}(θ) sequentially, starting from t_1. For each t_j, the conditional likelihood is the probability of the new observation given all preceding ones,

L_{t_j}(θ) = f(y(t_j) | y(t_1), ..., y(t_{j-1}), θ) = ∫ f(y(t_j) | x, t_j, θ) p(x | y(t_1), ..., y(t_{j-1}), θ) dx,

where p(x | y(t_1), ..., y(t_{j-1}), θ) is the prediction distribution of the state at time t_j. Furthermore, the calculation of the filtering distribution is simplified by identifying a recursive relation through an application of Bayes' theorem.
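This sequential calculation is what a bootstrap particle filter (sequential Monte Carlo) approximates. The following is an illustrative Python sketch, not the implementation used for the paper; `step`, `obs_lik`, and `x0_draw` are hypothetical user-supplied functions for the process model, the observation density, and the initial-state distribution:

```python
import math
import random

def particle_filter_loglik(y, step, obs_lik, x0_draw, n_particles=1000):
    """Bootstrap particle filter estimate of log L(theta) as a sum of
    conditional log-likelihoods log L_{t_j}.  `step(x)` propagates one
    particle over one observation interval, `obs_lik(y, x)` evaluates
    f(y | x), and `x0_draw()` samples an initial state."""
    particles = [x0_draw() for _ in range(n_particles)]
    loglik = 0.0
    for y_j in y:
        particles = [step(x) for x in particles]      # prediction
        w = [obs_lik(y_j, x) for x in particles]      # weighting
        loglik += math.log(sum(w) / n_particles)      # estimate of L_{t_j}
        # resample proportionally to the weights: filtering distribution
        particles = random.choices(particles, weights=w, k=n_particles)
    return loglik
```

The mean particle weight at each time estimates L_{t_j}(θ), and resampling carries the filtering distribution forward as the prediction sample for the next step.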

Construction of log-likelihood profiles
Each log-likelihood profile is constructed from log-likelihoods calculated at points on a two-dimensional grid, where the two dimensions represent the two parameters being inferred. For Figs. 3, 4, 5, 9, S1, and S2, these are the short-term interaction (φ = ξ) and the long-term interaction (χ). For Fig. S3, the parameters varied were the reporting rate (ρ) and the long-term interaction (χ). The grids span the values 0 to 2 for φ and χ, sampled at intervals of 0.2. For ρ, the grid spans 0.1 to 1, sampled at intervals of 0.1. When the results warranted refining, additional calculations were performed at finer intervals. This was necessary for the profiles of χ and ρ, since their peaks were much sharper. The points where these additional samples were taken can be read off directly from the graphs.
We estimate the log-likelihood at each of these sampled grid points. We performed at least 5 separate calculations for each sampled point, each using 30,000 particles. Likelihood values are averaged over these replicates, and the log of this mean is reported as the log-likelihood estimate.
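Numerically, averaging replicate likelihoods on the natural scale and then taking the log underflows if done naively, since each likelihood is the exponential of a large negative number. A small Python sketch of the standard log-mean-exp identity (an assumption about how one would implement this step; the text does not specify the arithmetic):

```python
import math

def log_mean_lik(replicate_logliks):
    """Log of the mean of replicate likelihoods, computed from replicate
    log-likelihoods without underflow (log-mean-exp)."""
    m = max(replicate_logliks)
    return m + math.log(sum(math.exp(l - m) for l in replicate_logliks)
                        / len(replicate_logliks))
```

Note that this differs from the mean of the log-likelihoods: averaging on the natural scale weights the replicates with higher likelihood more heavily.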
Profiles were then constructed for both parameters. This was done by fixing the parameter being profiled at each of its sampled values and maximizing over the other parameter. A smooth line was then fitted through the values at the sampled points, using the local polynomial regression fitting (loess) provided in R [3] with a span of 0.75 for the profiles of φ and 0.5 for those of χ and ρ. The 95% confidence interval is taken to be the set of values within χ²₁(0.95)/2 ≈ 1.92 log-likelihood units of the maximum, i.e., univariate confidence limits based on the χ² distribution with one degree of freedom.
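The extraction of a univariate confidence interval from a fitted profile can be sketched as follows, with a quadratic fit standing in for the loess smoother used in the text, purely for illustration:

```python
import numpy as np

def profile_ci(params, logliks, cutoff=1.92):
    """Quadratic approximation to a log-likelihood profile (a stand-in
    for the loess fit) and the interval where the fitted curve lies
    within `cutoff` log-likelihood units of its maximum.  cutoff = 1.92
    is chi2_1(0.95)/2, the univariate 95% threshold."""
    a, b, c = np.polyfit(params, logliks, 2)   # a*x^2 + b*x + c, a < 0
    x_hat = -b / (2 * a)                       # location of the maximum
    # solve a*(x - x_hat)^2 = -cutoff for the interval endpoints
    half_width = np.sqrt(cutoff / -a)
    return x_hat, (x_hat - half_width, x_hat + half_width)
```

The sharper the profile peak (the more negative a), the narrower the resulting interval, which is why the sharply peaked χ and ρ profiles required sampling at finer intervals.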
The log-likelihood surface shown in Fig. 6 follows the same process, except that here the parameter χ is fixed and the parameters δ and φ = ξ are varied. A smooth two-dimensional surface is fitted using an approximate Nadaraya-Watson kernel smoother, implemented in the R package fields: the function smooth.2d is used with a bandwidth of 0.11 and a resolution of 257 by 257. The 95% confidence region is taken to be the set of values within χ²₂(0.95)/2 ≈ 3 log-likelihood units of the maximum, i.e., bivariate confidence limits based on the χ² distribution with two degrees of freedom. While the choice of bandwidth and resolution can somewhat affect the precise boundary of the 95% confidence region, the qualitative picture remains unaltered.
The log-likelihood surface shown in Fig. 8 is a two-dimensional representation of the results used for Fig. S3, without any smoothing. The 95% confidence region is again taken to be the set of values within χ²₂(0.95)/2 ≈ 3 log-likelihood units of the maximum, i.e., bivariate confidence limits based on the χ² distribution with two degrees of freedom.
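Reading the bivariate 95% confidence region off an unsmoothed grid amounts to thresholding the grid at 3 log-likelihood units below its maximum; a minimal sketch:

```python
import numpy as np

def confidence_region(loglik_grid, cutoff=3.0):
    """Boolean mask of grid cells inside the bivariate 95% confidence
    region: cells within `cutoff` (approximately chi2_2(0.95)/2)
    log-likelihood units of the grid maximum."""
    return loglik_grid >= loglik_grid.max() - cutoff
```

With no smoothing, the region boundary is only resolved to the grid spacing at which the log-likelihoods were sampled.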

Changing epidemiology
All of our analysis so far was based on the epidemiology of a disease with R₀ of 2.7. Here, we look at a disease with a higher R₀ (β = 130 and R₀ = 5) under scenario III (partial and temporary cross-immunity, and delayed but permanent enhancement). We compare the inference results for the two diseases in Fig. S1. The inferences for the disease with higher R₀ are more precise. This is especially noticeable for the short-term interaction. This is to be expected: the dynamical impact of an interaction on a disease with larger R₀ is larger, and consequently much easier to infer.

More on under-reporting
We conducted two separate sets of experiments to address problems associated with under-reporting. In the first, we asked whether we are still able to infer interactions when the data presented are under-reported. To mimic this, we scaled previously generated data by 50% and attempted to infer both the short- and long-term interactions, assuming prior knowledge of the 50% reporting rate. The profiles are shown in Fig. S2.
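Deterministic scaling by 50% is the simplest way to mimic under-reporting; a stochastic alternative is binomial thinning of the case counts, in which each true case is reported independently with probability ρ. A hypothetical Python sketch of the thinning variant (the experiment in the text uses the deterministic scaling):

```python
import random

def under_report(cases, rho, seed=None):
    """Binomial thinning of a case-count series: each true case in each
    observation interval is reported independently with probability rho.
    A stochastic alternative to deterministic scaling by rho."""
    rng = random.Random(seed)
    return [sum(rng.random() < rho for _ in range(c)) for c in cases]
```

Deterministic scaling replaces each count c by ρ·c, the expected value of the thinned count, so the two approaches agree on average but the thinned series carries extra observation noise.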
In the second, we asked whether we are able to infer interactions from under-reported data when the reporting rate is not known a priori. In the paper, we presented a log-likelihood surface for the reporting rate ρ and the long-term interaction χ (Fig. 8). In Fig. S3, we present the log-likelihood profiles and the simulated data corresponding to that surface.

Estimating initial conditions
Our default estimation experiments assumed that the initial conditions (the fractions of the host population in each compartment, X_SS through X_RR) are known a priori. In practice, initial conditions would have to be estimated along with the interaction parameters. Here, we relax the assumption that the initial conditions are known and attempt to infer the interaction parameters and initial conditions simultaneously. We take the two data sets from scenario III previously examined in Fig. 5 and attempt to infer the long-term interaction parameter, χ.
In the insets of the profile figures, we show a close-up of the profiles near the peak. The plotted ∆loglik values are differences in the raw log-likelihood from a reference point set at ∆loglik = 0, indicated by the horizontal dashed line. This reference marks the 95% confidence bound, so parameter values with positive ∆loglik lie within the confidence interval. The gray dots indicate the repeated likelihood estimates (5 replicate SMC calculations for each profile point, with 30,000 particles in each SMC calculation). The profiles are created by fitting a smooth line through the log of the mean likelihoods (shown as black dots). The vertical red dashed line is plotted at the actual parameter value used to generate the simulated case data. Here, the short-term interaction is absent (φ = ξ = 1), and the other parameters not shown in the graph are taken from Table 1.
We proceed as follows:
1. For each profile point, we simulate the deterministic skeleton of the model to obtain the asymptotic range of each state variable.
2. We generate 100 guesses of initial conditions within this range.
3. For each guess, we attempt to maximize the fit of the model to the first 5 years of the data by varying only the initial conditions. We quantify the fit using the synthetic log-likelihood [4] based on several summary statistics (probes); the probes used include means, variances, and marginal data distributions ignoring the time ordering. The optimization was done with a Nelder-Mead algorithm via the probe.match function in the pomp package [5] for R [3].
4. We then select the initial condition estimate with the highest synthetic likelihood and proceed as described previously.
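The synthetic likelihood used in step 3 treats the simulated values of each probe as draws from a normal distribution and scores the observed probe value under that distribution. A one-probe Python sketch under that assumption (the procedure above combines several probes, which requires a multivariate normal):

```python
import math

def synthetic_loglik(probe_obs, probes_sim):
    """Synthetic log-likelihood [4] for a single probe (summary
    statistic): fit a normal distribution to the probe values from
    replicate simulations, then score the observed probe value under it."""
    n = len(probes_sim)
    mu = sum(probes_sim) / n
    var = sum((p - mu) ** 2 for p in probes_sim) / (n - 1)  # sample variance
    return -0.5 * (math.log(2 * math.pi * var)
                   + (probe_obs - mu) ** 2 / var)
```

Initial-condition guesses whose simulated probes cluster around the observed probe values score highest, which is the criterion used in step 4.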
In Fig. S4, we present two profiles for the long-term interaction, χ. In blue are the profiles based on the true initial conditions, while in black are the profiles with estimated initial conditions. They are plotted on the same log-likelihood scale to enable comparison. As expected, since more parameters have been estimated, the profiles in black are broader. Crucially, the long-term interaction remains well identified, i.e., χ is not fundamentally confounded with the initial conditions.
Figure S4. Inference of long-term interaction with estimated initial conditions. Log-likelihood profiles for the long-term interaction (χ), with true initial conditions [in blue] and estimated initial conditions [in black]. [Left] corresponds to the dataset used in Fig. 5 [Left], and [Right] corresponds to the dataset used in Fig. 5 [Right]. The empty dots indicate the repeated likelihood estimates (5 replicate SMC calculations for each profile point, with 30,000 particles in each SMC calculation). The parabolas have been fitted to the logarithm of the mean likelihoods (shown as filled dots) and are intended only to guide the eye. The vertical red dashed line is plotted at the true parameter value used to generate the simulated case data. Here, the short-term interaction is φ = ξ = 0.6, and the other parameters not shown in the graph are taken from Table 1.