Variability of the QuantiFERON®-TB Gold In-Tube Test Using Automated and Manual Methods

Background: The QuantiFERON®-TB Gold In-Tube test (QFT-GIT) detects Mycobacterium tuberculosis (Mtb) infection by measuring release of interferon gamma (IFN-γ) when T-cells (in heparinized whole blood) are stimulated with specific Mtb antigens. The amount of IFN-γ is determined by enzyme-linked immunosorbent assay (ELISA). Automation of the ELISA method may reduce variability. To assess the impact of ELISA automation, we compared QFT-GIT results and variability when ELISAs were performed manually and with automation.
Methods: Blood was collected into two sets of QFT-GIT tubes and processed at the same time. For each set, IFN-γ was measured in automated and manual ELISAs. Variability in interpretations and IFN-γ measurements was assessed between automated (A1 vs. A2) and manual (M1 vs. M2) ELISAs. Variability in IFN-γ measurements was also assessed on separate groups stratified by the mean of the four ELISAs.
Results: Subjects (N = 146) had two automated and two manual ELISAs completed. Overall, interpretations were discordant for 16 (11%) subjects. Excluding one subject with indeterminate results, 7 (4.8%) subjects had discordant automated interpretations and 10 (6.9%) subjects had discordant manual interpretations (p = 0.17). Quantitative variability was not uniform; within-subject variability was greater with higher IFN-γ measurements and with manual ELISAs. For subjects with mean TB Responses ±0.25 IU/mL of the 0.35 IU/mL cutoff, the within-subject standard deviation for two manual tests was 0.27 (CI95 = 0.22–0.37) IU/mL vs. 0.09 (CI95 = 0.07–0.12) IU/mL for two automated tests.
Conclusion: QFT-GIT ELISA automation may reduce variability near the test cutoff. Methodological differences should be considered when interpreting and using IFN-γ release assays (IGRAs).

Estimates of QFT-GIT variability have differed widely among studies that used different methods of performing the test, different indices to assess variability, and different study populations with varying prevalence and risk of Mtb infection. QFT-GIT variability in published studies has been attributed to temporal biologic fluctuations within subjects due to new Mtb infection [2,22], progression or treatment of human immunodeficiency virus (HIV) infection [23], response to treatment [24–27], differences in testing methods (such as differences in delay to incubation, duration of incubation, or incubation temperature) [14,28,29], and nonspecific test fluctuations due to random variation [2–4,21,30]. Determining the background variability (noise, beyond which a change represents a "true" change) is challenging, especially near the cutoff separating positive and negative test interpretations. This is of critical importance in detecting new infection.
QFT-GIT is a complex test and may be prone to nonspecific random variation. Technical errors attributable to test complexity appear to contribute to IGRA variability [19]. Few studies have assessed the nonspecific random variability of QFT-GIT when the test is repeated, using identical methods, on the same samples or on samples collected at the same time. Discordance in interpretation has been approximately 3.6% when QFT-GIT was repeated on the same sample in different ELISAs [2,10] and 8.0% to 8.3% when it was repeated in the same ELISA [29,31].
Although the development and initial evaluation of QFT-GIT relied on manual ELISA methods, automation may reduce QFT-GIT variability. Of the 126 measurements required for one QFT-GIT, 115 are automatable (Goodwin et al., manuscript in preparation). To our knowledge, no comparison of variability between tests performed manually and between tests performed using an automated workstation has been reported. To assess the impact of ELISA automation on QFT-GIT, we compared test results and measured variability when tests were performed with manual and automated methodologies.

Ethics Statement
The Centers for Disease Control and Prevention (CDC) and Wilford Hall Medical Center human subjects institutional review boards approved this study. All subjects provided written informed consent.

Subject Selection
After obtaining approval from human subjects review boards at the Centers for Disease Control and Prevention (CDC, Protocol # 5078) and Wilford Hall Medical Center (U.S. Air Force (USAF), Protocol # FWH20080002H), subjects were recruited from among CDC and USAF staff located in Atlanta, Georgia, and San Antonio, Texas, respectively, as part of a larger study investigating QFT-GIT variability. To increase the proportion of subjects with positive QFT-GIT results and to assess subjects with a continuous range of IFN-γ measurements (including those with IFN-γ measurements near the cutoff separating positive and negative interpretations), only persons with self-reported prior positive tuberculin skin test (TST) results were recruited. Prior unpublished assessments among a similar cohort found that 40% to 50% of persons with self-reported prior positive TST results were positive by QFT-GIT. Exclusion criteria were age less than 18 years or a history of a severe TST reaction (e.g., blistering, scarring, or anaphylaxis). All subjects provided written informed consent and completed a detailed study questionnaire.

QFT-GIT Procedure
Blood from each subject was collected at one morning visit into two sets of QFT-GIT tubes (Set 1 and Set 2) so that an automated ELISA and a manual ELISA could be performed from each set of tubes. Tubes were purchased from Cellestis Limited (Carnegie, Victoria, Australia), and each set included a Nil tube, a TB antigen tube, and a Mitogen tube. Each tube was labeled with a number and a barcode that (1) identified the specimen, (2) identified the tube type (i.e., Nil tube, TB antigen tube, or Mitogen tube), and (3) linked the specimen to subject and collection information. One mL of blood was collected into each tube, and tube contents were mixed with a Stuart rock and roll mixer (SciTech Instruments, Inc., Franklin, NJ) for 3 minutes at 33 rpm. Within one hour of blood collection, tubes were incubated at 37 ± 0.5 °C for 23 to 24 hours and then centrifuged at 3,000 × g for 10 minutes.
IFN-γ concentrations in plasma from Nil tubes (Nil), TB antigen tubes (TB), and Mitogen tubes (Mitogen) were determined by ELISAs performed on the day after blood collection using reagents included in QFT-GIT kits. ELISAs were performed either with the aid of an automated ELISA workstation (automated ELISA) or without it (manual ELISA). Triturus automated ELISA workstations (Grifols, USA, Inc., Miami, FL) were used in the CDC and USAF labs. For manual ELISAs, reagents were dispensed with Rainin LTS single and …
Test results were interpreted as indicated in the CDC guidelines and the Cellestis package insert [1,32]. The interpretation was “positive” if the Nil was ≤8.0 IU/mL and the TB Response was ≥0.35 IU/mL and ≥25% of the Nil. The interpretation was “negative” if the Nil was ≤8.0 IU/mL, the Mitogen Response was ≥0.5 IU/mL, and the TB Response was <0.35 IU/mL or <25% of the Nil. The interpretation was “indeterminate” if (1) the Nil was >8.0 IU/mL or (2) the Nil was ≤8.0 IU/mL, the Mitogen Response was <0.5 IU/mL, and the TB Response was <0.35 IU/mL or <25% of the Nil.
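The interpretation rules above can be expressed as a small decision function. This is a minimal Python sketch, not the manufacturer's or the study's software; it assumes, following the usual QFT-GIT convention, that the TB Response and Mitogen Response are the TB and Mitogen tube values minus the Nil, and the function name `interpret_qft_git` is hypothetical.

```python
def interpret_qft_git(nil, tb, mitogen, cutoff=0.35):
    """Classify one QFT-GIT result from tube IFN-gamma values (IU/mL).

    Assumption: TB Response = TB - Nil and Mitogen Response = Mitogen - Nil.
    """
    tb_response = tb - nil
    mitogen_response = mitogen - nil

    # A high background (Nil > 8.0 IU/mL) makes the test indeterminate.
    if nil > 8.0:
        return "indeterminate"
    # Positive: TB Response >= 0.35 IU/mL and >= 25% of the Nil.
    if tb_response >= cutoff and tb_response >= 0.25 * nil:
        return "positive"
    # Not positive: negative only if the mitogen control responded adequately.
    if mitogen_response >= 0.5:
        return "negative"
    return "indeterminate"


print(interpret_qft_git(nil=0.10, tb=0.60, mitogen=5.0))  # positive
```

As the criteria are written, a positive interpretation does not depend on the Mitogen Response; the Mitogen control only decides between negative and indeterminate when the TB criteria are not met.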

Statistical Methods
Variability in test interpretations was assessed by calculating the percentage of subjects with any discordance among the four ELISAs. Additionally, positive agreement, negative agreement, and agreement beyond chance (Cohen's kappa statistic, κ) were calculated for each pair of ELISAs. To assess variability in IFN-γ measurements (i.e., Nil, TB, and TB Response), distributions were compared using the Wilcoxon signed-rank test. Five additional indices of quantitative variability were examined for each pair of ELISAs: (1) within-subject coefficient of variation (W-S CV%), (2) intraclass correlation coefficient (ICC), (3) mean difference (bias), (4) smallest detectable difference (SDD), and (5) within-subject standard deviation (W-S SD). The last two were derived from the standard deviation of the differences (SD_diff). SDD = 1.96 × SD_diff is the smallest change in a subsequent measurement that must occur to detect a change beyond the variability (i.e., noise) with 95% certainty [33,34]. W-S SD = ±(SD_diff/√2) [35] and represents 68% of the variation expected around the true value [36]. Limits of agreement (LOA) = bias ± SDD and encompass the range around the bias that contains 95% of within-subject differences [37]. ICCs were calculated using the SAS macro ICC_SAS [38]. W-S CV% was calculated as described by Bland (root mean square approach) [39] for Nil and TB, and was estimated for TB Response using the formula √((W-S CV%_TB)² + (W-S CV%_Nil)²) (root sum square method for estimating aggregate uncertainty). The W-S CV% for the TB Response could not be determined directly because zero and negative mean values in the denominator inflated the estimate (some TB Response values were ≤0). A confidence level of 0.95 …
Table 5. IFN-γ means, medians, and ranges for the four tests (IU/mL).
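The difference-based indices defined above follow directly from the paired differences between two ELISAs. This is a minimal standard-library Python sketch, not the SAS code used in the study (ICCs are omitted); it assumes the standard deviation of the differences (SD_diff) is the sample standard deviation (n − 1 denominator) and that every pair has a positive mean for the Bland root-mean-square W-S CV%. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_variability(x1, x2):
    """Bias, SD_diff, SDD, W-S SD and 95% LOA for paired measurements."""
    diffs = [a - b for a, b in zip(x1, x2)]
    bias = mean(diffs)              # mean difference
    sd_diff = stdev(diffs)          # sample SD of the differences
    sdd = 1.96 * sd_diff            # smallest detectable difference
    ws_sd = sd_diff / math.sqrt(2)  # within-subject SD
    loa = (bias - sdd, bias + sdd)  # limits of agreement = bias +/- SDD
    return {"bias": bias, "sd_diff": sd_diff, "sdd": sdd,
            "ws_sd": ws_sd, "loa": loa}


def ws_cv_percent(x1, x2):
    """Bland's root-mean-square W-S CV% for duplicate measurements."""
    squared_cvs = []
    for a, b in zip(x1, x2):
        m = (a + b) / 2
        if m <= 0:
            # This is why W-S CV% was not computed directly for TB Response.
            raise ValueError("W-S CV% needs a positive pair mean")
        s = abs(a - b) / math.sqrt(2)  # within-subject SD of one pair
        squared_cvs.append((s / m) ** 2)
    return 100 * math.sqrt(mean(squared_cvs))


def tb_response_ws_cv(cv_tb, cv_nil):
    """Root-sum-square estimate of W-S CV% for TB Response."""
    return math.sqrt(cv_tb ** 2 + cv_nil ** 2)


print(paired_variability([0.12, 0.40, 1.10, 2.30], [0.10, 0.55, 0.90, 2.80]))
```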

Subject Characteristics
Study participation is depicted in Figure 1. Of the 268 people asked to participate, 55 declined and 55 were not eligible. Of the 158 persons enrolled, 146 had four ELISAs completed (one automated and one manual ELISA for the first set of QFT-GIT tubes, and one automated and one manual ELISA for the second set of QFT-GIT tubes, referred to as A1, M1, A2, and M2, respectively). Characteristics of the study subjects are shown in Table 1.
Forty subjects (27.4%) had at least one positive interpretation. Two subjects (1.4%) had three positive interpretations, eight subjects (5.5%) had two positive interpretations, and five subjects (3.4%) had one positive interpretation. One subject had three indeterminate interpretations with low Mitogen Responses of 0.249 to 0.474 IU/mL and one negative interpretation with a Mitogen Response of 0.55 IU/mL. Nil, TB, and TB Response values for the 15 subjects with discordant results among the four tests (excluding the one subject with three indeterminate results) are shown in Table 3. Results are grouped as either single discordant (one discordant/three concordant) or double discordant (two opposing pairs of concordant results) and additionally categorized into eight groups according to the specific nature of the discordance. Twelve subjects (categories 1-6) were discordant between first and second tests. Two subjects had both automated tests positive and both manual tests negative (category 7), and one had both automated tests negative and both manual tests positive (category 8).
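The single/double discordance grouping can be sketched as follows. This hypothetical helper (`classify_discordance`) distinguishes only the broad patterns described above (one result against three, or two opposing pairs split by method, by tube set, or crosswise); it does not reproduce the study's eight numbered categories, which are not fully specified in the text, and it assumes indeterminate results have already been excluded.

```python
from collections import Counter


def classify_discordance(a1, m1, a2, m2):
    """Classify four 'positive'/'negative' interpretations for one subject."""
    results = {"A1": a1, "M1": m1, "A2": a2, "M2": m2}
    counts = Counter(results.values())
    if len(counts) == 1:
        return "concordant"
    if sorted(counts.values()) == [1, 3]:
        # One result disagrees with the other three.
        odd = next(k for k, v in results.items() if counts[v] == 1)
        return f"single discordant ({odd} differs)"
    # Two-versus-two split: identify which pairing agrees.
    if a1 == a2 and m1 == m2:
        return "double discordant (automated vs. manual)"
    if a1 == m1 and a2 == m2:
        return "double discordant (first vs. second set)"
    return "double discordant (A1 = M2, M1 = A2)"


print(classify_discordance("positive", "negative", "positive", "negative"))
```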
Indices of interpretation variability between pairs of ELISAs are shown in Table 4. Seven (4.8%) subjects had discordant results with automated ELISAs compared to 10 (6.9%) subjects with manual ELISAs (p = 0.17). Results from the 15 subjects with discordant results are depicted in Figure 2. Five of the 7 subjects discordant with the two automated tests (71%) had both TB Responses within ±0.25 IU/mL of the QFT-GIT cutoff (0.1 to 0.6 IU/mL, gray dot-dashed lines) vs. 3 of 10 (30%) subjects discordant with the two manual tests.
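Pairwise agreement indices of the kind reported in Table 4 can be computed from two lists of interpretations. This Python sketch assumes positive and negative agreement are the proportions of specific agreement, 2a/(2a + b + c) and 2d/(2d + b + c), where a and d are concordant positive and negative pairs and b + c are the discordant pairs; the paper does not state its formulas, so treat these as assumptions. Indeterminate results are assumed to be excluded beforehand, and `pair_agreement` is a hypothetical name.

```python
def pair_agreement(r1, r2):
    """Agreement indices for two lists of 'positive'/'negative' results."""
    pairs = list(zip(r1, r2))
    n = len(pairs)
    a = sum(1 for x, y in pairs if x == y == "positive")  # both positive
    d = sum(1 for x, y in pairs if x == y == "negative")  # both negative
    discordant = n - a - d                                # b + c
    p_obs = (a + d) / n
    # Chance agreement from each test's marginal positive rate (Cohen's kappa).
    p1 = sum(1 for x in r1 if x == "positive") / n
    p2 = sum(1 for y in r2 if y == "positive") / n
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)
    return {
        "discordance_pct": 100 * discordant / n,
        "positive_agreement": 2 * a / (2 * a + discordant),
        "negative_agreement": 2 * d / (2 * d + discordant),
        "kappa": (p_obs - p_exp) / (1 - p_exp),
    }


# Hypothetical interpretations for ten subjects tested twice.
first = ["positive", "positive", "positive", "negative"] + ["negative"] * 6
second = ["positive", "positive", "negative", "positive"] + ["negative"] * 6
print(pair_agreement(first, second))
```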

Quantitative Results
Means, medians, and ranges for Nil, TB, and TB Response are shown in Table 5. There were no significant distributional differences between the two automated tests or between the two manual tests, but TB and Nil values were significantly greater in manual tests than in automated tests (p < 0.03). TB Response did not differ significantly between manual and automated tests. ICCs and W-S CV%s are shown in Table S1. Difference (Bland-Altman) plots for TB Response (Figure 3) show that variation increased as the mean of the paired measurements increased.
Variation was also examined within seven strata defined by each subject's mean TB Response across the four tests. Bias and 95% LOA are shown in Figure 4. The relatively large variability in the first stratum (<0.1 IU/mL) is due to grouping subjects with negative means, many of whom had large differences (also visible in Figure 3). The fourth stratum (0.2 to 0.499 IU/mL) shows variability in a range surrounding the QFT-GIT cutoff (0.35 ± 0.15 IU/mL). In this stratum, bias and LOA were greater for manual tests than for automated tests. As shown in Table S2, W-S SDs within this range were significantly higher for manual tests than for automated tests, as demonstrated by non-overlapping 95% confidence intervals (95% CI). SDDs for this range were also significantly higher for manual tests than for automated tests. When the range was expanded to 0.1 to 0.6 IU/mL (0.35 ± 0.25 IU/mL), W-S SDs remained significantly higher for manual tests (0.27, 95% CI: 0.22–0.37) than for automated tests (0.09, 95% CI: 0.07–0.12), as did SDDs (manual 0.75, 95% CI: 0.61–1.03; automated 0.25, 95% CI: 0.19–0.33).
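Because SDD = 1.96 × SD_diff and W-S SD = SD_diff/√2, the SDD equals 1.96 × √2 ≈ 2.77 times the W-S SD, which is why the SDDs and their confidence limits track the W-S SDs reported above. The sketch below restricts subjects to a band of mean TB Response and computes both indices for one pair of tests. The data layout (a dict of TB Responses keyed A1, M1, A2, M2), the example values, the half-open band [0.1, 0.6), and the name `band_indices` are all assumptions, since the text does not state how band edges were handled.

```python
import math
from statistics import mean, stdev

# SDD = 1.96 * SD_diff and W-S SD = SD_diff / sqrt(2), so SDD = 1.96*sqrt(2)*W-S SD.
SDD_PER_WS_SD = 1.96 * math.sqrt(2)


def band_indices(subjects, first, second, lo=0.1, hi=0.6):
    """W-S SD and SDD for one pair of tests within a mean-TB-Response band."""
    # The band is applied to the mean of all four tests for each subject.
    in_band = [s for s in subjects if lo <= mean(s.values()) < hi]
    diffs = [s[first] - s[second] for s in in_band]
    ws_sd = stdev(diffs) / math.sqrt(2)
    return {"n": len(in_band), "ws_sd": ws_sd, "sdd": SDD_PER_WS_SD * ws_sd}


# Hypothetical TB Responses (IU/mL); the third subject falls outside the band.
subjects = [
    {"A1": 0.3, "M1": 0.5, "A2": 0.4, "M2": 0.2},
    {"A1": 0.2, "M1": 0.1, "A2": 0.3, "M2": 0.4},
    {"A1": 2.0, "M1": 2.5, "A2": 1.8, "M2": 2.2},
]
print(band_indices(subjects, "M1", "M2"))
```

Applied to the reported values, the same factor maps the manual W-S SD of 0.27 IU/mL to about 0.75 IU/mL and the automated 0.09 IU/mL to about 0.25 IU/mL, matching the reported SDDs for the 0.1 to 0.6 IU/mL range.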

Discussion
This study assessed the precision of the QFT-GIT using both automated and manual ELISA methods. We determined repeatability of QFT-GIT when performed manually on two blood samples collected at the same time and when performed with the aid of an automated ELISA workstation on two blood samples collected at the same time. We observed discordance of 4.8% between two automated tests and 6.9% between two manual tests. Additionally, we evaluated reproducibility of QFT-GIT when one test was performed manually and one test was performed with the aid of an automated ELISA workstation on blood samples collected at the same time. We observed discordance of 3.4% to 9.0% for automated versus manual paired combinations. Eleven percent of subjects (including the one subject with one negative result and three indeterminate results) had at least one discordant result among the four tests. Quantitative indices of variability showed that variation in TB Response near the cutoff separating positive and negative test interpretations was significantly greater with the manual method than with the automated method.
Our discordance rates of 4.8% for two repeated automated QFT-GITs and 6.9% for two repeated manual QFT-GITs are slightly higher than those from two similar studies in which ELISAs were repeated on blood collected at the same time [2,10]. Both studies reported discordance rates of 3.6%; however, neither described the ELISA methods used.
QFT-GIT is a complex assay, but investigators rarely report how the ELISAs were performed.
Our estimates of QFT-GIT reproducibility when performed in the same lab using automated or manual methods ranged from 3.4% to 9.0%. Prior estimates of QFT-GIT reproducibility when ELISAs were performed in different labs using automated methods ranged from 3.3% to 6.6% [21]. Our finding of greater variability when the QFT-GIT ELISA is performed manually than when performed with the aid of an automated ELISA workstation is not surprising, given the complexity of the assay. In a prior study, we reported that a reduction in the number of steps required for QFT-GIT compared to QFT-G was associated with a significant reduction in the number of unusual measurements [19].
We and others have previously suggested the need for a zone of uncertainty surrounding the 0.35-IU/mL cutoff currently used to separate positive and negative QFT-GIT results [2–4,6,13,21,40]. Clinicians may need to repeat testing when initial results are within a borderline zone to increase diagnostic certainty. However, there is no consensus on the size of the zone, and different sizes have been suggested or applied. Our finding of greater variability when the QFT-GIT ELISA is performed manually than when aided by an automated workstation suggests that a broader borderline zone would be needed when using manual methods. Use of a broader borderline zone may, in turn, necessitate more repeat testing. Greater precision may justify the cost of an automated ELISA workstation.
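One way to use a method-specific SDD when reviewing serial results is to ask whether a change across the cutoff is larger than the noise expected for that method. This is a hypothetical illustration, not a validated clinical rule: it plugs in the near-cutoff SDDs reported above (about 0.25 IU/mL automated, 0.75 IU/mL manual), and the helper name `assess_serial_change` is invented for the example.

```python
CUTOFF = 0.35  # IU/mL, QFT-GIT positive/negative cutoff

# Near-cutoff SDDs reported for the 0.1 to 0.6 IU/mL range (illustrative only).
SDD_BY_METHOD = {"automated": 0.25, "manual": 0.75}


def assess_serial_change(previous, current, method):
    """Compare two serial TB Responses (IU/mL) against the method's SDD."""
    crossed = (previous < CUTOFF) != (current < CUTOFF)
    exceeds_sdd = abs(current - previous) > SDD_BY_METHOD[method]
    return {"crossed_cutoff": crossed, "exceeds_sdd": exceeds_sdd}


# A rise from 0.20 to 0.60 IU/mL crosses the cutoff under either method,
# but exceeds the SDD only for the automated method.
for method in ("automated", "manual"):
    print(method, assess_serial_change(0.20, 0.60, method))
```

The contrast mirrors the argument above: the same numeric change is distinguishable from noise with the more precise method but not with the less precise one.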
Our study has several limitations. First, small sample sizes in some strata resulted in wide confidence intervals for estimates of variability. Despite the small sample size, differences in variability between automated and manual TB Response in the stratum surrounding the cutoff were significant. Second, we studied only persons who reported a prior positive TST. While other populations may have different proportions of negative, positive, and borderline TB Response values, this limitation would not be expected to alter variability within strata of TB Response values.
In conclusion, automation of the QFT-GIT ELISA may reduce variability near the cutoff separating positive and negative interpretations. Methodological differences should be considered when interpreting and using IGRAs.