Spectrum and Inoculum Size Effect of a Rapid Antigen Detection Test for Group A Streptococcus in Children with Pharyngitis

Background: The stability of the accuracy of a diagnostic test is critical to whether clinicians can rely on its result. We aimed to assess whether the performance of a rapid antigen detection test (RADT) for group A streptococcus (GAS) is affected by the clinical spectrum and/or bacterial inoculum size.

Methods: Throat swabs were collected from 785 children with pharyngitis in an office-based, prospective, multicenter study (2009–2010). We analysed the effect of clinical spectrum (i.e., the McIsaac score and its components) and inoculum size (light or heavy GAS growth) on the accuracy (sensitivity, specificity, likelihood ratios and predictive values) of a RADT, with laboratory throat culture as the reference test. We also evaluated the accuracy of a McIsaac-score–based decision rule.

Results: GAS prevalence was 36% (95CI: 33%–40%). The inoculum was heavy for 85% of cases (81%–89%). We found a significant spectrum effect on sensitivity, specificity, likelihood ratios and positive predictive value (p<0.05) but not negative predictive value, which was stable at about 92%. RADT sensitivity was greater for children with heavy than light inoculum (95% vs. 40%, p<0.001). After stratification by inoculum size, the spectrum effect on RADT sensitivity was significant only in patients with light inoculum, on univariate and multivariate analysis. The McIsaac-score–based decision rule had 99% (97%–100%) sensitivity and 52% (48%–57%) specificity.

Conclusions: Variations in RADT sensitivity only occur in patients with light inocula. Because the spectrum effect does not affect the negative predictive value of the test, clinicians who want to rule out GAS can rely on negative RADT results regardless of clinical features if they accept that about 10% of children with negative RADT results will have a positive throat culture. However, such a policy is more acceptable in populations with very low incidence of complications of GAS infection.


Introduction
Group A streptococcus (GAS) is found in 20% to 40% of cases of childhood pharyngitis; the remaining cases are considered viral [1]. Clinical examination cannot distinguish accurately between viral and GAS pharyngitis [2], and a diagnostic test based on a throat swab is recommended in most countries [3]. Throat culture on a blood agar plate in a microbiology laboratory for 48 hours is the reference test for diagnosis of GAS pharyngitis [4][5][6][7]. Rapid antigen detection tests (RADTs) have been proposed as an alternative to throat culture. Compared to laboratory culture, RADTs have high specificity (about 95%) and results are immediate.
Their main drawbacks are low sensitivity (about 85%, range 65.6% to 98.9%) [8,9] and variations in sensitivity by clinical spectrum of the disease (spectrum effect, or spectrum bias) [10][11][12]. The major issue with the spectrum effect is the generalisability of test performance. First, the overall population estimate might not be generalisable to patient subpopulations; second, the diagnostic accuracy observed in one study might not be applicable to other patients. Three studies have shown a significant spectrum effect on RADT sensitivity, but these studies had methodological limitations suggesting selection, indication, partial verification, and measurement biases [13][14][15]. No studies investigated the effect of clinical spectrum on sensitivity, specificity, likelihood ratios and predictive values of RADTs at the same time [13][14][15]. In addition, RADT sensitivity is affected by inoculum size [16][17][18], that is, the number of GAS colonies identified on throat culture, which is considered a proxy of the bacterial load on the swab. In vitro, the threshold of positivity of RADTs is between 10^5 and 10^7 colony-forming units per ml [19]. To date, several studies have evaluated the effects of clinical spectrum and inoculum size on RADT sensitivity, but these effects were studied only separately, and no study has analysed the potential relation of clinical spectrum and inoculum size.
Because of the need to reduce antibiotic consumption, clinical decision rules were developed to help clinicians determine which patients should undergo testing and/or treatment with antibiotics. A decision rule based on the McIsaac score was recently proposed for adults and children [20][21][22], but this McIsaac-score-based strategy was insufficiently validated in children [21][22][23]. In one validation study by McIsaac et al., the reader cannot evaluate the risk of selection bias because the distribution of scores in children was not reported [21]. Another validation study did not include children with low-risk scores (score ≤1) [22]. A third study aimed to validate the McIsaac score in children, but the rule was modified for application in low-resource settings and no longer integrated the use of a throat culture or RADT [23].
We aimed to determine whether the diagnostic accuracy of a RADT for GAS pharyngitis is affected by the clinical spectrum effect and/or bacterial inoculum size and to validate a McIsaac-score-based decision rule.

Study Design
This is a secondary analysis of data from an office-based, multicenter, prospective study that took place in France between October 2009 and June 2011 (unpublished data). The aims of the original study were to evaluate the frequency of GAS carriage in healthy children and to compare the performance of a RADT between children with pharyngitis and healthy children, with throat culture in a microbiology laboratory as the reference test (intermediate results presented at the 28th Annual Meeting of The European Society for Paediatric Infectious Diseases, Nice, France, 2010; abstract A-229-0021-00920). This ancillary analysis was restricted to patients with pharyngitis from the first year of inclusion. The STARD statement was followed for reporting the results of the study [24].

Patients
Eligible patients were children 3 to 15 years old who were evaluated by their paediatrician between October 2009 and May 2010, had a diagnosis of pharyngitis, and had not received antibiotics in the 7 days before inclusion. Pharyngitis was defined by inflammation of the pharynx and/or tonsils. In total, 17 French paediatricians who are part of a research and teaching network (ACTIV) participated in the study [25].

Throat Swabs
Throat samples were obtained by use of a double-swab collection-transportation system (Venturi Transystem Amies agar, COPAN Diagnostics, Corona, CA, USA). The RADT (StreptAtest, Dectrapharm, France), a GAS-specific immuno-chromatographic strip assay, was performed immediately with swab #1 collected in the paediatrician's office. Swab #2 was held at ambient temperature and sent within 72 hr to the Robert Debré Hospital laboratory by an express messenger service. On receipt, the swab was rolled over one-quarter of a trypticase soy agar plate with 5% sheep blood, and the inoculum was further distributed on the plate by streaking with a sterile wire loop. The plates were incubated anaerobically at 37°C and read after 18 to 24 hr. Negative cultures were reexamined after an additional 24 hr of incubation (48 hr total). β-haemolytic colonies were further investigated by latex agglutination (Prolex, Pro-Lab Diagnostics, Richmond Hill, ON, Canada). With GAS positivity, inoculum size was estimated as follows: 1+, <10 colonies; 2+, 11-50 colonies; and 3+, >50 colonies per plate. Data for patients with 1+ and 2+ cultures were combined to produce a dichotomized variable (light or heavy GAS growth). Microbiologists were blinded to individual clinical data and RADT results.
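The semi-quantitative grading and its dichotomization can be sketched as follows. This is a minimal illustration, not the laboratory's procedure: the function names are hypothetical, and because the stated cut-offs leave a count of exactly 10 colonies unassigned, the sketch assumes it falls in grade 1+.

```python
def grade_inoculum(colonies: int) -> str:
    """Return the semi-quantitative grade for a GAS-positive plate.

    Cut-offs follow the text (1+: <10, 2+: 11-50, 3+: >50 colonies).
    Assumption: a count of exactly 10 is graded 1+, since the text
    does not assign it to any grade.
    """
    if colonies < 1:
        raise ValueError("grading applies only to GAS-positive cultures")
    if colonies <= 10:
        return "1+"
    if colonies <= 50:
        return "2+"
    return "3+"


def dichotomize(grade: str) -> str:
    """Combine 1+ and 2+ into light growth; 3+ is heavy growth."""
    if grade in ("1+", "2+"):
        return "light"
    if grade == "3+":
        return "heavy"
    raise ValueError(f"unknown grade: {grade}")
```

For example, a plate with 30 colonies would be graded 2+ and counted as light growth.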

Clinical Data
We collected clinical data needed to calculate the McIsaac score for each patient. The McIsaac score predicts GAS pharyngitis and ranges from 0 to 5 [20][21][22]. McIsaac criteria include age 3 to 15 years, history of fever, tonsillar swelling or exudate, tender anterior cervical adenopathy, and absence of cough. As in other studies [13,26], the adenopathy criterion was modified to include any anterior cervical adenopathy ≥1 cm and/or presence of tenderness because tender lymph nodes can be difficult to assess in children. As done by other authors [13][14][15], extreme McIsaac scores were combined (scores 1 and 2, and 4 and 5), and for analysis of clinical components of the score, age was re-coded into 2 categories of equal range (3-8 years; 9-14 years).
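As a rough illustration of how the score described above could be computed, the sketch below assumes one point per criterion (consistent with the stated 0 to 5 range) and uses the modified adenopathy criterion. The class and field names are hypothetical and are not taken from the study database.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Clinical findings for one child (hypothetical field names)."""
    age_years: float
    fever_history: bool
    tonsillar_swelling_or_exudate: bool
    node_size_cm: float  # largest anterior cervical node, 0 if none
    node_tender: bool
    cough: bool


def mcisaac_score(f: Findings) -> int:
    """Sum the five criteria, assuming one point each."""
    criteria = [
        3 <= f.age_years <= 15,
        f.fever_history,
        f.tonsillar_swelling_or_exudate,
        # Modified criterion: node >= 1 cm and/or tenderness.
        f.node_size_cm >= 1 or f.node_tender,
        not f.cough,
    ]
    return sum(criteria)


def score_category(score: int) -> str:
    """Group scores as in the analysis: low (<=2), 3, high (>=4)."""
    if score <= 2:
        return "1-2"
    if score == 3:
        return "3"
    return "4-5"
```

Because every included child was 3 to 15 years old, the lowest score possible in this study population is 1, which is why the lowest category is labelled 1-2.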

Sample Size
In the original study, sample size was estimated so that the 95% confidence interval (95CI) for RADT sensitivity would have a precision of ±5%. Assuming sensitivity to vary between 75% and 95% and a GAS prevalence of 35%, a sample of 714 children with pharyngitis was needed. No other sample size calculation was performed for this secondary analysis.
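The text does not state which formula produced the figure of 714. The sketch below is therefore only an assumption: it uses the usual normal-approximation interval for a proportion, computes the number of GAS-positive children needed for a given half-width, and scales it by the expected prevalence. The function name and parameters are illustrative.

```python
import math


def n_for_sensitivity(sensitivity, half_width, prevalence, z=1.96):
    """Sample size so the CI for sensitivity has the given half-width.

    Assumes a normal-approximation (Wald) interval; this is a common
    textbook approach, not necessarily the one used in the study.
    Returns (GAS-positive children needed, total children needed).
    """
    n_positive = math.ceil(
        z ** 2 * sensitivity * (1 - sensitivity) / half_width ** 2
    )
    n_total = math.ceil(n_positive / prevalence)
    return n_positive, n_total


# Bounds of the sensitivity range assumed in the text, prevalence 35%.
for sens in (0.75, 0.95):
    print(sens, n_for_sensitivity(sens, 0.05, 0.35))
```

Under these assumptions the required total depends strongly on the assumed sensitivity, and the reported 714 lies between the totals obtained at the two ends of the 75% to 95% range.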

Ethics
The study protocol was approved by the Institutional Review Board Comité de Protection des Personnes Ile-de-France XI (no. 09016) and the French administrative authorities (CNIL, no. 1354254; Afssaps, no. 2009-A00086-51). Parent and patient approval for participation was obtained before inclusion. Data were double entered into 4D software version 6.4 (4D) and the database was fully anonymized.

Statistical Analysis
Hospital laboratory throat culture was considered the reference test. First, we described patient demographic features and overall prevalence of groups A, C and G β-haemolytic streptococci. Patients with group C or G β-haemolytic streptococci were considered GAS-negative in the analysis. Then, GAS prevalence by McIsaac score and each McIsaac criterion was described. Second, we used chi-square tests to investigate the clinical spectrum effect on RADT diagnostic accuracy (sensitivity, specificity, likelihood ratios and predictive values) by McIsaac score and each McIsaac criterion. Third, chi-square tests were used to compare the distribution of GAS inoculum size by office-to-laboratory delay (≤48 vs. >48 hr) and to study the effect of inoculum size (light or heavy GAS growth) on RADT sensitivity. Fourth, the distribution of heavy inocula by McIsaac score was explored by a chi-square test for trend, and the association of clinical signs and inoculum size was evaluated by comparing the frequency of heavy inoculum according to each clinical criterion by chi-square tests. Fifth, a binomial model with an identity link was used to evaluate the combination of clinical spectrum effect and inoculum size on RADT sensitivity [27]. Differences in RADT sensitivity were estimated by restricting the analysis to patients with a positive throat culture and using the result of the RADT as the outcome in the model. RADT sensitivity was investigated by univariate modeling according to each McIsaac criterion and then by multivariate modeling with all McIsaac criteria and inoculum size. Potential interaction between inoculum size and each McIsaac criterion was assessed by use of a Wald test. For multivariate modeling, variables were added one by one in increasing order of univariate statistical significance until nonconvergence of the model. Significant interactions were managed by introducing corresponding interaction terms in the multivariate model and, in case of non-convergence, by stratification on inoculum size. Sixth, the accuracy of the following McIsaac-score-based decision rule was studied in terms of sensitivity and specificity: (a) scores <2, no further testing or antibiotic; (b) scores 2-3, culture all, antibiotics only with positive culture; (c) scores ≥4, treat empirically with antibiotics [22]. Statistical analysis involved use of Stata/SE 12 (StataCorp, College Station, TX, USA).
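The three-level decision rule and its evaluation against throat culture can be sketched as follows. The text does not spell out how rule sensitivity and specificity were derived, so this sketch assumes that a child counts as rule-positive when the rule leads to antibiotic treatment (empirical treatment, or a positive culture at scores 2-3). Function names and the sample data are hypothetical.

```python
def rule_action(score: int) -> str:
    """Map a McIsaac score to the action prescribed by the rule."""
    if score < 2:
        return "no_test_no_antibiotic"
    if score <= 3:
        return "culture"
    return "empirical_antibiotic"


def receives_antibiotic(score: int, culture_positive: bool) -> bool:
    """Assumed rule outcome: is the child treated with antibiotics?"""
    action = rule_action(score)
    if action == "culture":
        return culture_positive
    return action == "empirical_antibiotic"


def rule_accuracy(patients):
    """Sensitivity and specificity of the rule versus throat culture.

    `patients` is an iterable of (score, culture_positive) pairs.
    """
    tp = fn = tn = fp = 0
    for score, culture_positive in patients:
        treated = receives_antibiotic(score, culture_positive)
        if culture_positive:
            tp += treated
            fn += not treated
        else:
            fp += treated
            tn += not treated
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical patients, for illustration only.
sample = [(1, True), (2, True), (3, False), (4, False), (5, True), (1, False)]
sensitivity, specificity = rule_accuracy(sample)
```

Under this reading, the only culture-positive children the rule can miss are those with scores below 2, and every culture-negative child with a score of 4 or 5 counts against specificity.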
GAS prevalence increased with increasing McIsaac score (p<0.001) (Tables 1 and 2), from 24.3% to 41.2% for children with ≤2 and ≥4 McIsaac score, respectively. Some clinical criteria were also associated with GAS: age <9 years, absence of cough, and presence of fever (Table 2).
We found significant variations in RADT sensitivity and specificity by McIsaac score (

Combined Effect of Clinical Spectrum and Inoculum Size
The results of univariate binomial modeling for RADT sensitivity were almost identical to those of stratified analysis (Tables 2 and 3): age ≥9 years and presence of cough were significantly associated with decreased RADT sensitivity (p<0.05). After stratification on inoculum size, wide and statistically significant variations in RADT sensitivity were found in patients with light inoculum (e.g., 37.4% variation in sensitivity by age, p<0.01), whereas variations in patients with heavy inoculum were neither wide (<6%) nor significant. Interactions were found between inoculum size and age (p = 0.03), anterior cervical adenopathy (p = 0.01), and tonsillar swelling or exudate (p = 0.07).
Because of these multiple interactions and because the multivariate model did not converge after interaction terms were introduced, 2 multivariate models of RADT sensitivity were fit by stratifying on inoculum size (Table 3). Results of multivariate modeling were close to those of univariate modeling. Light inoculum was associated with wide and statistically significant adjusted variation in RADT sensitivity (e.g., from 23.6% to 40.9%), whereas with heavy inoculum the adjusted variation was neither wide (≤6%) nor significant.
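For a single binary criterion, a binomial model with an identity link reduces to a difference between two proportions, so the univariate estimates above can be illustrated without a modeling library. The sketch below computes the difference in sensitivity between two subgroups of culture-positive children with a Wald confidence interval. The counts are hypothetical and the function name is illustrative; the multivariate and stratified models in the study would need a proper generalized linear model fit.

```python
import math


def sensitivity_difference(tp_a, n_a, tp_b, n_b, z=1.96):
    """Difference in RADT sensitivity between subgroups A and B.

    tp_* = RADT-positive children among n_* culture-positive children.
    Returns the difference and its Wald confidence interval, which is
    what a one-covariate identity-link binomial model estimates.
    """
    p_a = tp_a / n_a
    p_b = tp_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts: 45/50 vs. 30/50 RADT-positive.
diff, (lower, upper) = sensitivity_difference(45, 50, 30, 50)
```

An interval that excludes zero corresponds to a significant Wald test for that criterion in the univariate model.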

Discussion
Accuracy of a diagnostic test can vary across patient subgroups within a population, a phenomenon referred to as spectrum effect (or spectrum bias) [10][11][12]. In assessing the relation of clinical spectrum and inoculum size with RADT performance for GAS pharyngitis, we confirm that the spectrum effect has an impact on RADT sensitivity and report for the first time that it also affects the specificity, likelihood ratios and positive predictive value of the test. However, the spectrum effect had no impact on the negative predictive value of the RADT, which remained stable at about 92%, regardless of McIsaac score or its components. Clinicians do not want to expose their patients to GAS suppurative and nonsuppurative complications by withholding antibiotic treatment and therefore need a diagnostic test with high, stable negative predictive value. In this population with a GAS prevalence of 36%, a patient with a negative RADT result had a probability of GAS-positive throat culture close to 8% regardless of clinical features.
This study confirms the important effect of inoculum size on RADT sensitivity, already described in other studies, and reports for the first time an association of clinical spectrum and inoculum size. Our data suggest that having more streptococci in the throat (greater inoculum size) might be associated with more intense symptoms (higher McIsaac score) (p = 0.09), although we might not have had enough statistical power to validate this hypothesis. However, because we were not able to differentiate GAS carriers from truly infected patients (i.e., by assessment of streptococcal antibody response) [16,28], we cannot conclude whether these patients with low McIsaac scores and light inocula were more likely to be GAS carriers than truly infected patients.
We chose to investigate the McIsaac score in 3 categories rather than compare the accuracy of the RADT above and below a defined breakpoint [15] because this corresponds to the original aims of the score: to stratify patients into 3 levels of risk and to suggest a subsequent course of action (low-risk: no test, no antibiotics; intermediate risk: culture or RADT, antibiotics only for positive results; high-risk: empirical antibiotic treatment without testing) [20][21][22]. According to our results, the McIsaac score alone does not allow for ruling out or affirming the diagnosis of GAS pharyngitis because the reported GAS prevalence ranged from 25% (too-high level to rule out) to 44% (too-low level to affirm) in children with a score of 1 and 5, respectively. Moreover, the McIsaac-score-based decision rule had only 52.0% (47.5%-56.5%) specificity, which does not seem consistent with the current need to reduce antibiotic consumption to contain antimicrobial resistance.
One of the strengths of this study is its prospective, multicenter design, which limits the selection bias potentially present in other studies [13,14]. All included children underwent RADT and laboratory culture, which excluded the possible indication and verification biases of other studies [13,14]. The results of our study are close to those reported in the literature. GAS prevalence was 36.3% (32.9%-39.8%), which is close to that from a recent meta-analysis (37%, 32%-43%) [1]. The overall RADT sensitivity was 86.7% (82.2%-90.4%), which is close to that from another recent meta-analysis (85%, 84%-87%) [9]. A limitation is that we used a modified adenopathy criterion for the analyses. However, univariate and multivariate modeling results and the performance of the McIsaac-score-based decision rule were stable in a sensitivity analysis involving the original McIsaac adenopathy criterion instead of our modified criterion. Another limitation to our study is that some throat swabs were plated >48 hr after collection. This delay had no significant effect on the distribution of GAS inocula, but prolonged or inadequate shipping conditions could have resulted in the loss of viability of GAS if the original swab had only small numbers of bacteria. Therefore, some swabs with light inocula could have been falsely read as negative on throat culture. Because RADTs are more likely to give negative results with light inocula, these swabs might have led to a systematic decrease in number of RADT false-negative results and systematic increase in number of RADT true-negative results, with over-estimation of the negative predictive value of the RADT as a result. We can assume that this bias occurred at random because shipping and all bacteriologic investigations were blinded to clinical data and RADT results (non-differential measurement bias). Such a bias usually leads to loss of power, and our results regarding the stability of the negative predictive value of the RADT should be considered with caution until they are confirmed with a larger sample of patients.
The American Academy of Pediatrics advises that "Children with manifestations highly suggestive of viral infection […] generally should not be tested for GAS infection" [7]. Such guidelines cannot be formally validated because the criteria proposed to determine patients who should be tested are not integrated in an explicit decision rule and remain subject to personal interpretation. Because of GAS carriage and because GAS complications have become uncommon in North America and Western Europe [29], we advocate that the target sensitivity of diagnostic strategies for GAS pharyngitis should not be 100%. Moreover, discussions should not focus on sensitivity alone. We showed that the negative predictive value of a test can be stable across patient subgroups because variations in prevalence can be balanced by concomitant variations in sensitivity and specificity. Therefore, clinicians who want to rule out GAS can rely on negative RADT results regardless of clinical features if they accept that about 10% of children with negative RADT results will have a positive throat culture. However, these results should be considered with caution because they depend highly on the study population, the training of clinical and laboratory personnel and the microbiological devices and protocols used. Moreover, available data suggest that false-negative RADT results are as likely to occur in truly infected patients as in GAS carriers [16,28]. Therefore, such a policy would be more acceptable in populations with very low incidence of suppurative and non-suppurative complications of GAS infection [30]. The McIsaac-score-based decision rule is insufficiently accurate for children, and efforts are needed to develop more specific selective testing strategies for GAS pharyngitis in children.
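The argument that the negative predictive value can stay stable while prevalence, sensitivity and specificity all vary follows from Bayes' theorem and can be illustrated numerically. In the sketch below the two subgroups and all their values are hypothetical, chosen only to show the balancing effect; they are not estimates from this study.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV from Bayes' theorem: P(no disease | negative test)."""
    true_negatives = specificity * (1 - prevalence)
    false_negatives = (1 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Hypothetical low-score subgroup: lower prevalence, lower sensitivity.
npv_low = negative_predictive_value(0.75, 0.95, 0.25)
# Hypothetical high-score subgroup: higher prevalence, higher sensitivity.
npv_high = negative_predictive_value(0.90, 0.90, 0.44)

print(round(npv_low, 3), round(npv_high, 3))
```

With these illustrative inputs both subgroups give a negative predictive value of roughly 92%, even though prevalence nearly doubles between them, because the rise in prevalence is offset by the rise in sensitivity.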