Estimating Sensitivity of Laboratory Testing for Influenza in Canada through Modelling

Background The weekly proportion of laboratory tests that are positive for influenza is used in public health surveillance systems to identify periods of influenza activity. We aimed to estimate the sensitivity of influenza testing in Canada based on results of a national respiratory virus surveillance system. Methods and Findings The weekly number of influenza-negative tests from 1999 to 2006 was modelled as a function of laboratory-confirmed positive tests for influenza, respiratory syncytial virus (RSV), adenovirus and parainfluenza viruses, seasonality, and trend using Poisson regression. Sensitivity was calculated as the number of influenza positive tests divided by the number of influenza positive tests plus the model-estimated number of false negative tests. The sensitivity of influenza testing was estimated to be 33% (95%CI 32–34%), varying from 30–40% depending on the season and region. Conclusions The estimated sensitivity of influenza tests reported to this national laboratory surveillance system is considerably less than reported test characteristics for most laboratory tests. A number of factors may explain this difference, including sample quality and specimen procurement issues as well as test characteristics. Improved diagnosis would permit better estimation of the burden of influenza.


Introduction
Although influenza virus infection is associated with considerable morbidity and mortality [1][2][3], laboratory confirmation of clinical illness is the exception rather than the rule. Clinicians do not routinely seek laboratory confirmation for several reasons: diagnosis will often not alter patient management, a paucity of real-time, accurate, inexpensive testing methods [4] and because influenza is not recognized as the etiology of the clinical presentation [5]. Accurate diagnosis of influenza-like illness, however, could improve clinical care through reduced use of antibiotics and ancillary testing, and more appropriate use of antiviral therapy [6]. Although rapid influenza tests such as pointof-care tests are purported to generate results in a timely fashion to influence clinical care, the performance characteristics of the currently available tests are sub-optimal [7]. New technologies with improved sensitivity such as reverse-transcriptase polymerase chain reaction (RT-PCR) [8] as well as the use of more effective collection systems such as the flocked nasopharyngeal swab compared to traditional rayon wound swabs, and the recommendation to collect more ideal specimens, such as nasopharyngeal swabs rather than throat swabs are likely to improve diagnostic sensitivity [9][10][11][12]. The performance characteristics of currently available tests for influenza vary considerably and the overall sensitivities of these tests when used in routine practice are also dependent on the type of specimen collected, the age of the patient and point in their illness in which they are sampled [4,9,[13][14][15].
We sought to estimate the sensitivity of influenza testing based on results of a national respiratory virus surveillance system using a model-based method [1,2,[16][17][18].

Sources of data
Weekly respiratory virus identifications from September 1999 to August 2006 were obtained from the Respiratory Virus Detection Surveillance System (RVDSS), Public Health Agency of Canada [19,20]. The RVDSS collects, collates, and reports weekly data from participating laboratories on the number of tests performed and the number of specimens confirmed positive for influenza, respiratory syncytial virus (RSV), para-influenza virus (PIV), and adenovirus. Specimens are generally submitted to laboratories by clinicians in the course of clinical care, and by clinicians participating in one of our national influenza surveillance programs, (FluWatch [20]). Indicators of influenza activity are reported year round on a weekly basis to the FluWatch program. The RVDSS is supplemented by case reports of influenza positive cases [19,21]. From the case reports, influenza A was confirmed in all age groups and sporadic cases were confirmed in the off-season months of June through September. Infants and children under the age of 5 years accounted for 25% of the influenza A positive tests, and persons over the age 65 years another 35%. Unfortunately, FluWatch surveillance data does not provide the total number of tests by age. Testing practices are known to be varied [22,23]. The predominant testing methods used for influenza detection varied considerably by province or laboratory and over time. For the 2005/06 season a survey of laboratory techniques in current use indicated that culture accounted for 44% of the diagnostic tests with RT-PCR, rapid antigen tests and direct fluorescent-antibody assay (DFA) accounting for 21%, 19%, and 16% respectively [23].

Statistical Analysis
The weekly number of tests negative for influenza was modelled, using Poisson regression, as a function of viral identifications for influenza, RSV, adenovirus and PIV as well as a baseline consisting of seasonality, trend and holiday variables. The estimated baseline implicitly accounts for influenza tests on specimens taken from patients with respiratory infections due to respiratory pathogens other than the four viruses captured in the RVDSS, as long as both the testing behaviour of clinicians and respiratory illnesses caused by other respiratory pathogens follow a consistent seasonal pattern as prescribed by the model (see below, The Poisson regression model with a linear link function was estimated using SAS [24] PROC GENMOD: Coefficients b 5 to b 9 are multipliers. The weekly number of influenza negative tests estimated to be falsely negative is given by b 5 InflA w +b 6 InflB w . The weekly number of influenza negative tests attributed to RSV is given by b 7 RSVp w. , and similarly for adenovirus and PIV. For each positive influenza A test, an additional b 5 tests above baseline were performed and found to be negative. By specifying a linear link, a value of 0.33, say, for coefficient b 5 , means that for every test for which influenza A was confirmed, 0.33 additional tests, on average, were performed on truly influenza A positive specimens and found to be negativewhich corresponds to a sensitivity of 75%. Sensitivity was calculated as the number of influenza positive tests divided by the number of influenza positive tests plus the model-estimated number of false negative tests, or equivalently, the estimates of sensitivity for influenza A and B are given by 1/ (1+b 5 ) and 1/(1+b 6 ) respectively. The false negative rate is 1 minus sensitivity. While the null value for b 5 is zero, which indicates no statistical association between the number of influenza positive tests and the number of influenza negative tests, the corresponding null value for sensitivity is 1.
For each test confirmed positive for RSV, on average b 7 tests were performed for influenza and found to be negative for influenza. These b 7 tests are attributed to an RSV infection, however the number of influenza-negative tests that actually tested positive for RSV is unknown. If all specimens had been tested for the same viruses (panel tests), 1/b 7 would correspond to the sensitivity for RSV testing, and the sensitivity for adenovirus and PIV given by 1/b 8 and 1/b 9 respectively. Some laboratories are known to test for viruses sequentially [22], and so 1/b 7 -1/b 9 were not interpreted as estimates of the sensitivity for other viruses. Sequential testing may occur if a rapid test for influenza is negative and the laboratory then performs PCR or culture testing. Similarly in young children with a respiratory illness in the winter, rapid tests for RSV infection may be performed first, and only specimens with negative results submitted for subsequent testing for influenza or other respiratory viruses [25]. By contrast, many laboratories conduct panel tests for multiple viruses for ease of handling, decreased patient sampling, and recognition that coinfection can occur. Either form of sequential testing would not bias the estimate of sensitivity applicable to test results reported to RVDSS, though significant use of rapid antigen tests in the laboratories reporting to RVDSS would reduce the overall sensitivity. As a single specimen may undergo multiple tests, the false-negative rate applicable to a specimen that has undergone multiple tests would be expected to be much lower than the system average for individual tests. Parameters b 1. to b 4 account for trends and the seasonality of truly negative specimens (patients presenting with other acute respiratory infections).

Results
Over 50,000 tests for influenza were reported to the RVDSS each year, peaking in 2004/05 at 101,000. Overall 10% of the influenza tests were positive for influenza, ranging from 4% to 13% depending on the season. The proportion positive for RSV, parainfluenza and adenovirus averaged 9%, 3% and 2% respectively. As seen in Figure 1, no virus was identified in 75% of specimens submitted for testing (white area under the curve). Even for the winter months of December through April, one of these 4 viruses was identified on average in no more than 30% of the specimens. The strong and consistent synchronization of negative tests with influenza positive tests, as seen in Figure 1, is suggestive that false negative results contributed to the large number of negative tests during periods of influenza activity.
The sensitivity for influenza A testing averaged 33.7% (with model-estimated 95% confidence intervals of 33.3-34.1) for the 1999/2000-2005/06 period. Influenza B testing had a similar estimated sensitivity at 34.7 (95% CI 33.4-36.1). Estimated sensitivities varied somewhat from season to season, generally ranging from 30%-40% (Table 1), and provincial level estimates, as well, were within a similar range. Stratifying by province or season produced similar estimates for the sensitivity of influenza A testing: 32% (95% CI 30-34) and 36% (95% CI 33-41) respectively. Estimates of sensitivity based on test results reported to the RVDSS for individual laboratories with sufficient data to fit the model showed significant variation, with estimates of sensitivity ranging from 25-65%. As expected, laboratories using primarily rapid antigen tests had lower estimated sensitivities, and laboratories that used PCR methods had higher sensitivity estimates. However, information on testing procedures is limited primarily to the 2005/06 survey. As well, additional irregularities were noticed in the laboratory data and not all laboratories provided sufficient data to fit the model. Figure 2 illustrates a good model fit where the weekly number of influenza negative tests is well explained by the model covariates, with a few exceptions. Firstly, it is evident that additional specimens were tested during the SARS period, as indicated by the period where the number of weekly influenza negative tests exceeded the expected number, or equivalently, a period of successive positive residuals. Residuals typically capture random variation; hence represent tests that can not be allocated based on the specified model. In addition to the SARS period, testing appears to have been elevated for a number of weeks in January 2000 during the peak of the 1999/2000 A/ Sydney/05/97 (H3N2) season in which respiratory admissions were unusually elevated [26,27], and in December 2003, when an elevated risk of paediatric deaths associated with the A/Fujian/411/02 (H3N2) strain [28] was identified in the US. As these periods corresponded to a period of heightened public awareness due to severe influenza outbreaks, parameter estimation was repeated without these data points. Exclusion of these data points did not alter the sensitivity estimate for influenza.
The attribution of influenza negative test results to influenza and other viruses is illustrated in Figure 3. The baseline curve is the model estimate of the number of tests that were likely truly negative for all four viruses tested. A reduction in specimen collection and testing, primarily for viruses other than influenza, is also evident over the Christmas period ( Figure 3).
The weekly proportion of tests confirmed positive for influenza peaked each season at 15 to 30%. Accounting for the model estimated false negative rate suggests that during periods of peak influenza activity, 40-90% of tests were performed on specimens taken from persons recently infected with influenza. Influenza was confirmed in only 14% of specimens sent for testing over the winter period, whereas the sensitivity estimate would imply that up to 40% of influenza tests could be attributed to an influenza infection. The corresponding figures for the whole year indicate that 10% of specimens were confirmed positive for influenza and 30% of influenza tests could be model-attributed to an influenza infection annually.
Despite a relatively large number of tests in the off-season, the number of influenza positive tests was almost negligible; suggesting that the false positive rate applicable to RVDSS influenza testing is minimal.

Discussion
The model estimated sensitivity based on influenza test results reported to the RVDSS of 30-40% is much lower than the standard assay sensitivities documented in the literature. Standard sensitivities for diagnostic procedures used by participating laboratories ranged from 64% for rapid antigen tests to 95% for RT-PCR tests, averaging 75% for the study period [23]. As performance characteristics of specific tests are generally based on high quality specimens, the difference of approximately 40% is likely linked to any one of many operational procedures that affects the quality of the specimen and its procurement. Unlike validation studies, our samples are taken from a variety of clinical settings and processed with a variety of procedures across the country. As well, variation in the indications for diagnostic testing may vary across the country.
As there are many other respiratory pathogens that are not routinely tested for, or reported to the RVDSS, including human metapneumovirus (hMPV), coronaviruses, and rhinoviruses for which patients may seek medical care and present with influenza like illness [29][30][31][32], a large proportion of negative test results was expected. The overall model fit, and the general consistency of the sensitivity estimates, suggests that these many respiratory viruses were reasonably accounted for by the seasonal baseline and that the strong association between the number of influenza positive and influenza negative tests on a weekly basis is indicative of a significant number of false negative results, rather than the activity of another virus or viruses exactly synchronous with influenza. The latter would bias the estimated sensitivity of the system downwards. However, to significantly and consistently bias the estimate, the degree of synchronization would have to be fairly strong, persist over the whole study period, and occur in all provinces. Synchronization was not observed among the RVDSS viruses (influenza A, influenza B, RSV, adenovirus and PIV), and elsewhere other viruses such as rhinovirus, coronavirus and hMPV accounted for only a small proportion of the viral identifications and were not found to be synchronized with influenza [33]. As well, patients may present for care due to a secondary bacterial infection. While any specimen would likely test negative as the virus, at this point, is likely not detectable, the model would statistically attribute a negative test in this case to the primary infection; one of the four RVDSS viruses or to the seasonal baseline that represents other respiratory infections, depending on the level of viral activity at the time of the test. This is not considered a source of bias.
The large variation in false negative rates estimated for individual laboratories reporting to the RVDSS suggests that standardization of sample procurement, testing and reporting procedures would likely reduce the overall false negative rate. The accuracy of diagnostic tests is known to be affected by the quality of the specimen [10,11], its handling, the timing of collection after symptom onset, and the age of the patient [14,15]. Even with the most sensitive molecular methodologies, yield was shown to be strongly related to the time since onset of symptoms [9,14], with a 3-fold decline in proportion positive within 3 to 5 days after onset of symptoms for both RT-PCR and culture procedures. For most laboratory tests, specimen procurement within 72 hours of from the onset of symptoms is recommended [6], yet patients often present much later in the course of illness. Estimates of the median time since onset of symptoms suggest a delay of 3 and 5 days for outpatient and inpatients respectively [15], however these estimates are limited to patients with laboratory confirmed influenza. In addition, there are inherent differences in the performance characteristics of the currently used diagnostic tests [4,6,8,[34][35][36][37][38]. Lack of standardization between diagnostic tests and algorithms used in different laboratories reporting to the RVDSS adds to this complexity. The routine use of RT-PCR testing has only recently become available in Canada (only 20% of tests used RT-PCR methods as of 2005/06 [23]), but increased use of this modality is expected to improve accuracy.
Population or system level sensitivity estimates that include the effects of sample quality are limited. Grijalva and colleagues [39] estimated the diagnostic sensitivity in a capture recapture study of children hospitalized for respiratory complications at 69% for a RT-PCR based system and 39% for a clinical-laboratory based system (passive surveillance of tests performed during clinical practice, and using a variety of commercially available tests).
Though the expected proportion of influenza tests that were due to influenza infections is unknown and variable, our model estimate of 30% appears plausible. Cooper and colleagues [33] attributed 22% of telephone health calls for cold/flu to influenza over two relatively mild years, and elsewhere 20% of admissions for acute respiratory infections (including influenza) in adults aged 20-64 years were attributed to influenza, and 42% for seniors [1].
While there are limitations with this approach, there are no other simple alternatives to assist in the interpretation of the RVDSS data. It would have been helpful to analyze data based on each specimen sent for testing. With only the number of weekly tests and number of positive results, we were unable to calculate the number of specimens that were actually found to be negative for all four viruses, or to estimate the extent of co-infection. Coinfection, which was not accounted for in our model, could result in an under-estimation of the number of falsely negative tests, as the attribution of an influenza negative test that was actually coinfected with influenza and another respiratory virus would have to be split between the viruses. With auxiliary information associated with each specimen, model estimates of false negative rates based on, for example, test type, time since onset of symptoms, age of the patient, or clinical presentation would have allowed us to explore the reasons for the high false negative rates. As the false negative rate appears to be laboratory dependant (data not shown), this estimated range is applicable only to the RVDSS for the study period. A significant reduction in the false negative rate is anticipated as methods become standardized and with the uptake of the new RT-PCR methods. As positive results, particularly for culture, are often obtained a week or more after the specimen was received, some positive results may have been reported in a different week than the test. Multiple test results for a single specimen may have also contributed to reporting irregularities. These irregularities would tend to bias the estimated parameter towards zero, and hence the estimated sensitivity towards 1. Considering the overall model fit and the relative severity of influenza [1], we conclude that our estimate of sensitivity may be slightly over-estimated (number of false negatives under-estimated).
Poor test sensitivity contributes to the chronic underestimation of the burden of influenza in the general population. Since estimates of the burden of illness drive planning for preventive and therapeutic interventions, it is important to improve all aspects leading to improved diagnostic accuracy. We have illustrated a simple method that uses the surveillance data itself to estimate the system wide sensitivity associated with the weekly proportion of tests confirmed positive. Although our estimate of sensitivity is only applicable to the interpretation of the RVDSS data over the study period, similar estimates for specific cohorts or laboratory procedures may help guide further investigation into the reasons for the large number of false negative test results. The capacity for improved diagnostic accuracy will ultimately improve our understanding of the epidemiology of influenza.