A Robust Parameter Estimation Method for Estimating Disease Burden of Respiratory Viruses

Background Poisson model has been widely applied to estimate the disease burden of influenza, but there has been little success in providing reliable estimates for other respiratory viruses. Methods We compared the estimates of excess hospitalization rates derived from the Poisson models with different combinations of inference methods and virus proxies respectively, with the aim to determine the optimal modeling approach. These models were validated by comparing the estimates of excess hospitalization attributable to respiratory viruses with the observed rates of laboratory confirmed paediatric hospitalization for acute respiratory infections obtained from a population based study. Results The Bayesian inference method generally outperformed the classical likelihood estimation, particularly for RSV and parainfluenza, in terms of providing estimates closer to the observed hospitalization rates. Compared to the other proxy variables, age-specific positive counts provided better estimates for influenza, RSV and parainfluenza, regardless of inference methods. The Bayesian inference combined with age-specific positive counts also provided valid and reliable estimates for excess hospitalization associated with multiple respiratory viruses in both the 2009 H1N1 pandemic and interpandemic period. Conclusions Poisson models using the Bayesian inference method and virus proxies of age-specific positive counts should be considered in disease burden studies on multiple respiratory viruses.


Introduction
Acute respiratory infections accounted for 11-22% of global deaths of children under five, with a significant proportion caused by respiratory viruses [1]. However, obtaining reliable population based estimates for disease burden of respiratory viruses remains a challenge. These viruses usually cause overlapping clinical syndromes, making it difficult to assign viral aetiology based on the clinical presentations of patients [2]. Moreover, laboratory tests necessary for case confirmation are not always conducted in clinical settings owing to limited laboratory capacity [3]. Previous studies have used several statistical methods to quantify the morbidity and mortality burden associated with influenza and respiratory syncytial viruses (RSV) [4]. These methods first established a baseline level with the assumption of no virus circulation, and then defined the excess hospitalization or mortality as the difference between the observed and baseline. However, few of these methods were able to separately determine the burden attributable to different respiratory viruses and even fewer studies have assessed the burden of respiratory viruses other than influenza and RSV. One commonly used method, Poisson regression modeling, allows simultaneous assessment of cocirculating viruses and has become increasingly popular recently. But our previous study showed that the point estimates derived by the classical maximum likelihood method for respiratory viruses other than influenza were unrealistically small and even negative [5]. The challenge lies in resolving the overlapping peaks of these co-circulating viruses, and also in adjusting for the confounding effects of other seasonal factors such as temperature or humidity [6]. An alternative estimation method, Bayesian inference, could be used as it has the advantage of incorporating the prior knowledge on parameter distributions [7]. Another unsolved problem in disease burden studies is the choice of virus proxy variables. The numbers or proportions of specimens positive for different viruses in all specimens tested have been widely used in previous studies [8,9]. Other less frequently used proxies include influenza-like illness rates multiplied by laboratory-test positive proportions (ILI6LAB) [10]. Although virus attack rates could be different across age groups due to the heterogeneity in prior immunity and exposure risks [11][12][13], no studies have hitherto integrated age-specific virus data into the models, largely due to the lack of such data in most regions. In this study we evaluated the performance of various combinations of model assumption, virus proxy variables and inference methods, in estimating excess hospitalization attributable to several co-circulating respiratory viruses. The estimates have been validated by comparison with observed rates of laboratory confirmed paediatric hospitalization rates for acute respiratory infections obtained from a population based study.

Data source
Hospital admission records of the two major public hospitals on the Hong Kong Island (Queen Mary Hospital (QMH) and Pamela Youde Nethersole Eastern Hospital (PYNEH) were obtained from the Hong Kong Hospital Authority during the study period of October 2003-September 2010. We compiled weekly numbers of hospital admissions with any listed discharge diagnosis of acute respiratory diseases (ARD) for the age groups of ,1, 1-5 and 6-17 years, according to the International Classification of Diseases (9th Revision, ICD9) codes 460-466 or 480-487. Age specific virology data were obtained from the Microbiology Laboratory of QMH, which provides virology diagnostic services for both QMH and PYNEH, for influenza A (seasonal subtypes H3N2, sH1N1 and pandemic strain pH1N1), influenza B, respiratory syncytial virus (RSV), adenovirus and parainfluenza virus types 1-3. This laboratory tested a total of 80 611 specimens collected from both QMH and PYNEH during the study period, by using direct immunofluorescence tests (IF) and viral culture. Reverse transcription polymerase chain reaction (RT-PCR) was only routinely carried out during the 2009 pandemic [14]. Meteorological data were obtained from the Hong Kong Observatory.

Poisson model
Poisson models were first fitted to the age-stratified weekly admission numbers of acute respiratory diseases. A typical form of this model is where Y t denotes the numbers of age-specific hospital admissions at week t (t = 1,2,…,366), and follows a Poisson distribution with mean m t and variance Qm t . Here Q is an over-dispersion factor to adjust for the unequal mean and variance [15]. fluA t , fluB t , RSV t , adeno t and paraflu t denote the age-specific weekly counts of specimens positive for influenza A and B, RSV, adenovirus or parainfluenza viruses, respectively. s(t), s(Temp t ) and s(Humd t ) are the natural spline functions of time, weekly average temperature and relative humidity, respectively. Five degrees of freedom per year were used for the seasonal trend and two degrees of freedom for temperature and relative humidity. We used a Bayesian inference process based on Gibbs sampling (BUGS) [16] to estimate the parameters. A variety of Bayesian approaches have been widely applied to calculate the genetic distance in phylogenetic analysis [17] and to describe the transmission dynamics of influenza viruses [18]. By incorporating prior knowledge on the distribution of parameter with available data, the Bayesian inference method could provide a posterior distribution closer to the true underlying distribution [19]. Due to the known adverse effects of the viruses on hospital admissions, we assumed that the parameter of virus proxy variable followed a non-negative distribution. Therefore the coefficients of these variables b 1 , b 2 , b 3 , b 4 and b 5 were estimated by a Bayesian process, under the distribution assumption of Uniform[0,h]. The posterior distribution of each covariate parameter was estimated by repeating a Monte Carlo Markov Chain simulation for 50,000 iterations with 25,000 burn-in iterations. Based on our previous findings [20], the starting point of h was set to 10, to cover the range of excess risk from 0-20% associated with 10% increase in virus proxies.
In addition to age-specific positive counts, we tried different combinations of virus proxies with the Bayesian inference method on virus coefficients: age-specific proportions of positive specimens (Model 2), all-ages proportions (Model 3) or all-ages influenza-like illness rates multiplied by all-ages proportions (ILI6LAB, model 4). Besides the commonly adopted log linear Poisson regression models that assumed multiplicative effects of viruses, we also tried linear Gaussian models that assumed additive effects of influenza (Model 5) [10,21]. To compare the Bayesian approach with our previous models based on classical likelihood estimation, we fitted the classical log linear Poisson models with the proxies of agespecific counts (Model 6), age-specific proportions (Model 7) and all-ages proportions (Model 8).

Model validation
Baseline hospitalization for influenza A subtype H3N2 was first calculated from the model as the expected weekly numbers of admissions when the H3N2 proxy variable was set to zero and all the other variables were kept as the observed values. Excess hospitalization attributable to H3N2 was defined as the sum of difference between the observed and baseline hospitalization [22]. Similar calculation was repeated for other subtypes of influenza A, influenza B, RSV, adenovirus and parainfluenza, respectively. Annual excess rate of hospitalization was separately calculated for each year, by dividing the annual total number of excess hospitalization by the mid-year age-specific population in the Hong Kong Island obtained from the year 2006 census. Annual excess rates estimated by these statistical methods were then compared with the directly observed admission rates for a population based systematic sample of laboratory confirmed cases of respiratory virus infections, who were admitted into the QMH and PYNEH with any listed diagnosis of ARD during the same period. The details of data collection for the directly observed virologically confirmed hospitalization rates have been described elsewhere [23]. Briefly, nasopharyngeal aspirates from patients who were younger than 18 years and admitted with symptoms of acute respiratory infection on one chosen day (Wednesday or Thursday) of each week, were all tested for five respiratory viruses by IF. Since these two hospitals provide acute paediatric hospital services for approximately 70% of the population in Hong Kong Island, we could estimate the population based age-specific hospitalization rates from this cohort. We calculated the mean of absolute percentage difference between the annual age-specific estimates and corresponding virologically confirmed observed hospitalization rates, and chose the most optimal model as that with the smallest mean difference. We also assessed the lag effects of these viruses by replacing the virus proxy variables with the proxies at the weeks up to three weeks before the current (lag1, 2 Table 1. Mean absolute percentage difference between annual age-specific excess hospitalization rates and annual hospitalization rates of laboratory confirmed infections in a pediatric cohort.

Results
The mean absolute percentage difference between the annual age-specific rates of excess hospitalization derived from different models and the corresponding observed rates is shown in Table 1. In the models using the same virus proxies, the estimates from the models using the Bayesian inference showed smaller deviations from the observed rates than the classical likelihood estimates, particularly for RSV, parainfluenza and adenovirus. In the models using the Bayesian inference, compared to the other virus proxies, age-specific counts provided the estimates with smaller deviation from the true observed rates for most viruses ( Table 1). The loglink models (Model 1) offered the estimates closer to the observed rates than the identity-link models (Model 5), with the exception of parainfluenza. Overall, the log-link Poisson models using the Bayesian inference and the proxies of age-specific counts (Model 1) provided the most reliable estimates for the excess hospitalization associated with influenza A and B, RSV, parainfluenza and adenoviruses. Therefore we chose this model as the final one and presented the estimates from this model in the rest part of this paper. The lag effects up to three weeks were separately assessed by replacing the age-specific positive counts virus at the current week (lag 0) with those at one to three weeks before (lag 1-3). These models with different lag week consistently provided the estimates more deviant from the observed rates, compared to the proxy variables at the current week (Table 2).
Annual excess rates of hospitalization were slightly lower than the directly observed rates for influenza A subtypes sH1N1, H3N2, pH1N1 and influenza B in all the age groups, without any pattern of consistent under-or over-estimation observed in any of these age groups (Figure 1). For RSV, excess rates tended to be higher than the observed hospitalization rates, particularly for the ,1 age groups. Most of the estimates for parainfluenza were smaller than the observed rates. The greatest deviation from the observed rates was found in adenovirus.
Compared to the interpandemic period, the 2009 H1N1 pandemic was associated with an obvious increase in the observed rates of laboratory confirmed cases for RSV, but a decrease in other viruses (Table 3). Overall the model provided the estimates similar to the directly observed rates of all the viruses under study  during the pandemic period, except slight overestimation in H3N2 and influenza B, and underestimation in adenovirus. The model performance was comparable between the interpandemic and pandemic periods for all the viruses.

Discussion
Time series models have widely adopted by recent studies to estimate the disease burden of influenza and RSV [24,25]. In this study we compared the Bayesian inference method with the classical likelihood estimation, in terms of obtaining more reliable estimates for the disease burden of co-circulating viruses including influenza, RSV, parainfluenza and adenovirus. Under the assumption of positive association between respiratory virus activity and hospitalization, the Bayesian inference method successfully separated the individual effects of multiple respiratory viruses, which the previous models have not or only partially achieved [5,26]. With the exception of adenovirus, the model estimates closely matched the true hospitalization rates across different age groups that were observed in a pediatric cohort under a systematic surveillance for respiratory virus infections. We speculated that underestimation in adenovirus was probably due to its less clear seasonal pattern and relatively lower positive isolation rate compared to the other viruses ( Figure 2). Nevertheless, the models overall offered the satisfactory estimates which were within the close range of true hospitalization rates without exaggeration.
Taking the advantage of long standing virology data with linked age information in Hong Kong, this study for the first time added the age-specific virology data as proxy in the time series models for disease burden studies. We found that age-specific counts showed the best performance among all the proxies when combined with either the Bayesian or classical likelihood inference methods. In previous studies, we used all-ages proportion as proxy because it took into account the temporal variations in total numbers of specimens collected. However, this might not be the case for agespecific virology data, as relatively small numbers of total Table 3. Comparison of weekly directly observed rates (per 100,000 population) and excess rates of hospitalization associated with influenza estimated by the Bayesian approach, during the interpandemic period (4 January 2004-25 April 2009) and pandemic period (26 April 2009-14 August 2010).

Virus/Age group Interpandmic Pandemic
Directly observed rates Excess rates (95% CI) Directly observed rates Excess rates (95% CI) specimens tested in some age groups could have introduced spurious peaks in age-specific proportions. We also evaluated the performance of ILI6LAB proxy, which was found more closely correlated with the true incidence of influenza during the interpandemic or pandemic period [21,27]. We found this proxy provides the estimates closer to the observed rates than age-specific and all-ages proportions, but slightly worse than the proxy of agespecific counts in most viruses (Table 1 and Figure 3). Taken together, age-specific counts shall be recommended as proxy variables if such data are available. If age information is unavailable, ILI6LAB is probably the proxy that shall be considered.
The 2009 H1N1 pandemic was characterized with dramatically increased attack rates among children and young adults, but the severity of pandemic infections was comparable to the seasonal virus strains [28,29]. ARD admission rates in our pediatric cohort increased by a proportion ranging from 7% to 170% during the pandemic (Table 1), and many other studies also reported a similar magnitude of increase [30][31][32][33]. However, the admissions due to non-influenza infections decreased in the pandemic, except RSV. Our model estimates were able to capture this trend, showing the same change directions as the observed rates. However, large deviations were also observed in some age-virus categories, such as influenza B in the ,1 and 1-5 age groups. Further studies are warranted to fine tune our modeling approach in order to derive reliable estimates for different periods.
It has been widely accepted that Poisson distribution is appropriate to fit the low-frequency count data, but the log-link function commonly adopted in Poisson models has been criticized for its assumption of exponential increase in health outcomes along with one unit increase in virus proxies [8,34]. Some of recent studies switched to a more ''reasonable'' assumption of linear relation by adopting an identity-link function in Poisson models [35,36]. In this study we found that the log-link function yielded the estimates slightly closer to the true incidence of influenza hospitalizations than the identity-link. However, the key assumption on the association of virus proxies and health outcomes in Poisson models still remain to be proved. Further evidence on the mechanism of influenza transmission and pathogenicity in human community could probably help resolve this problem.
Our study has potential limitations. First, the Bayesian estimates are sensitive to the prior distributions and the prior assumption of nonnegative coefficient for virus proxy variables needs to be carefully justified. Since our virology data were obtained from the laboratory surveillance based on hospitalized inpatients, it is reasonable to assume that these virology data were positively associated with the increase of hospital admissions with viral respiratory infections. However, overestimation might exist if the assumption of prior distribution is not well justified, and caution needs to be taken when extending this approach to estimate the excess mortality of other respiratory viruses, as most viruses other than influenza cause only mild symptoms that might not necessarily lead to death [37]. Second, age-specific virus data requires long standing and intensive virology surveillance for multiple respiratory viruses, but such surveillance networks may not be available for influenza in many countries. Nevertheless, the importance of simultaneous assessment on other respiratory viruses, particularly RSV, has started to be recognized [26,38]. So we can expect these data will become available in more and more countries in the near future. Third, we only estimated the excess hospitalization of five respiratory viruses due to limited virology data. There are many other respiratory viruses (e.g. rhinovirus) and bacteria (e.g. Streptococcus pneumonia) also contribute greatly to ARD hospitalization in children, although the clinical significance of detection of some of these (e.g. rhinovirus) remains unclear. Further studies are needed to assess whether addition of more virology data could alter the performance of models.
In conclusion, age-specific counts of positive specimens are probably the best proxies for virus activity and should be used in the disease burden models if such data are available. In the absence of age-specific data, the Bayesian inference proposed in this study is superior to the classical likelihood inference method, as the former provides more reliable estimates on excess hospitalization respectively associated with multiple respiratory viruses.