Performance and variability of QuantiFERON Gold Plus assay associated with phlebotomy type

Background: The QuantiFERON Gold Plus (Plus) assay has two approved methods for blood collection: direct in-tube (Plus direct) or the transfer of blood from a lithium heparin tube (Plus transfer). Currently, there is little data comparing the results of Plus and the QuantiFERON Gold In-Tube (Gold) based on blood collection method.
Methods: In 2017, high risk healthcare workers undergoing annual tuberculosis infection screening at Houston Methodist Hospital, a private hospital in the Texas Medical Center (Houston, TX, U.S.A.), gave consent and were enrolled in a study comparing the Gold, Plus direct, and Plus transfer assays. Blood was drawn concurrently for all 3 assays.
Results: Phlebotomy occurred on 300 consecutive, consented and enrolled participants in the study. The proportions of positive test results for the Gold, Plus direct and Plus transfer assays were 10% (29/300), 12% (35/299) and 17% (51/299), respectively. The agreement in the results of Gold versus Plus direct, Gold versus Plus transfer, and Plus direct versus Plus transfer was 91%, kappa (κ) = 0.56; 91%, κ = 0.59; and 85%, κ = 0.37, respectively.
Conclusions: Among high risk healthcare workers in a low prevalence tuberculosis setting, the Gold Plus assay had a higher proportion of positive results than the Gold in-tube assay. The agreement between the Gold, Plus direct and Plus transfer assays was unexpectedly low for simultaneously obtained samples. Blood transfer using lithium heparin offers individual clinics and public health programs greater ability to customize protocols, but variability of results still exists.


Introduction
There are several diagnostic and screening assays for tuberculosis (TB) infection (TBI), including the tuberculin skin test (TST) and interferon gamma release assays (IGRAs), which include the QuantiFERON-TB Gold in-tube (QFT-G) (QIAGEN, Germantown, MD, USA) and T-SPOT.TB (Oxford Immunotec, Inc., Marlborough, MA, USA) assays. A new IGRA, QuantiFERON-TB Gold Plus (QFT-P) (QIAGEN, Germantown, MD, USA), was approved by the U.S. Food and Drug Administration (FDA) for use in the United States on June 8, 2017. Major differences between the QFT-G and the QFT-P include the removal of TB7.7 peptides; the addition of a second antigen tube containing shorter peptides for ESAT-6 and CFP-10, aimed at eliciting a response from CD8+ T-cells, as well as the peptides directed at CD4+ cells in the first antigen tube; and the standardization of both blood collection and laboratory procedures (as described below).
The QFT-P is approved for a direct in-tube phlebotomy draw or an indirect phlebotomy draw into a lithium heparin (LiHp) tube, where the blood is subsequently transferred into the four QFT-P tubes. This standardized transfer procedure is expected to reduce indeterminate results caused by pre-analytical errors such as tubes not being shaken as the transfer will be conducted in the laboratory by trained technicians [1]. The second standardization is to bring uniformity to all labs by using a four point standard curve rather than an eight point standard curve, which is needed to calibrate and interpret optical density values into IFN-γ concentrations.
QFT-P is expected to be more sensitive than QFT-G; however, early publications on the sensitivity of QFT-P have shown sensitivity equal to that of QFT-G. Studies conducted in Japan, Italy, Germany, Belgium, and the Netherlands (low TB prevalence countries) found no significant differences between the third generation (QFT-G) and fourth generation (QFT-P) assays in sensitivity among bacteriologically and non-bacteriologically confirmed active TB patients or in specificity among healthy subjects with low or no risk for TB [2][3][4][5][6][7]. A study conducted among U.S. health care workers (HCWs) found a positivity rate of 4% in the study population when using QFT-G and 6% when using QFT-P, with 96% agreement between the assays [8].
Sensitivity and specificity are difficult to calculate for TBI assays because there is no "gold standard" for latent TBI (LTBI) [9]. Surrogate measures are often used, including active TB disease, which may underestimate the sensitivity of a test to detect TBI, and individuals with zero TB exposure risk, which may overestimate specificity. Specificity is usually estimated in IGRA/TST-negative, low-risk individuals with no known exposures to patients with TB disease; however, when assessing TBI diagnostic performance, results from active TB patients and individuals with no TB risk factors are considered lower in the hierarchy of standards than correlation of results to the exposure gradient of TB infection [10,11].
The goals of the current study were to: (1) Analyze the agreement and performance between the QFT-G and the QFT-P in a population of U.S. HCWs with greater than average risk, and (2) compare agreement and performance of QFT-P results based on different phlebotomy methods: directly collecting blood into the assay tubes (QFT-PD), and transferring the blood from a single standard heparinized tube into the QFT-P assay tubes (QFT-PT).

Methods
Eligible HCWs at the Houston Methodist Hospital (HMH), a private hospital located in the Texas Medical Center in Houston, TX, U.S.A., undergoing annual TB screening were consented for participation in the study. Participants completed a short questionnaire on TB risk factors (including questions on demographic factors, employment and medical history) and had blood drawn by well-trained phlebotomists at the HMH Outpatient Laboratory. The study was approved by the HMH IRB (Pro00016966).

Eligibility
HCWs were eligible for the study if they (1) had a previous positive tuberculin skin test (TST), (2) were foreign born, (3) had received a BCG vaccination, or (4) had immunosuppression due to a medical condition or medication. Exclusion criteria included: (1) not being eligible for a QFT-G test during annual TB screening (i.e. managers tested on a different cycle), (2) having a history of active TB disease or (3) having had a TST within three months of enrollment into the study (new employees).

Blood collection
Participants had a total of 10 mL of blood collected: 1.0 mL drawn directly into each of the three QFT-G tubes (Grey, Red, and Purple), 1.0 mL drawn directly into each of the two QFT-P antigen tubes (QFT-PD; Yellow and Green), and 5.0 mL drawn into a LiHp tube. A single positive and negative control was used for both the QFT-G and QFT-PD assays. The blood tubes were transferred to the TB laboratory within 10 hours of collection and incubated at 37˚C, meeting the manufacturer's guideline of initiating incubation within 16 hours of blood collection [12]. In the laboratory, the heparinized blood was transferred within 3 hours into each of the 4 QFT-P tubes, which were subsequently incubated within two hours of the transfer [13].

Sample processing and storage
Per the manufacturer's protocols, all QFT tubes (QFT-G, QFT-PD, QFT-PT) were incubated between 16 and 24 hours at 37˚C before being stored at room temperature until centrifugation [12]. QFT-G assays were run within 3 days of blood collection. QFT-P plasma was harvested and stored at -80˚C before being batched and tested. An eight-point standard curve was used to calibrate and interpret optical density into estimated IFN-γ for the QFT-G, but a four-point standard curve was used for calibration and interpretation for the QFT-P assays. Excess unused plasma from the positive and negative controls of the QFT-G assay was frozen and stored until the QFT-PD was run. This plasma was thawed along with the stored frozen QFT-P plasma, and the plasma from the positive control, negative control, TB1 and TB2 tubes was run with the QFT-PD assay.
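As a rough illustration of what a standard curve does, the sketch below fits a straight line to log-transformed standards and inverts it to estimate IFN-γ from an optical density (OD) reading. The log-log linear model, the standard concentrations, the OD readings, and the function names are all assumptions made for illustration; they are not taken from the package insert or from the analysis software used in this study.

```python
from math import exp, log


def fit_loglog_curve(concentrations, optical_densities):
    """Least-squares fit of ln(OD) = slope * ln(concentration) + intercept.

    Assumes every concentration and OD is strictly positive.
    """
    xs = [log(c) for c in concentrations]
    ys = [log(od) for od in optical_densities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def od_to_ifn_gamma(od, slope, intercept):
    """Invert the fitted line to estimate IFN-gamma (IU/mL) from an OD value."""
    return exp((log(od) - intercept) / slope)


# Hypothetical standards: concentrations in IU/mL and their OD readings.
standards_iu_ml = [0.25, 1.0, 4.0]
standards_od = [0.10, 0.40, 1.60]

slope, intercept = fit_loglog_curve(standards_iu_ml, standards_od)
sample_iu_ml = od_to_ifn_gamma(0.80, slope, intercept)
print(f"slope={slope:.3f}, estimated IFN-gamma={sample_iu_ml:.2f} IU/mL")
```

Because the number of standards determines how many points constrain the fitted line, a change from eight to four standards is one plausible route by which estimated IFN-γ values could shift slightly between assays.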

Statistical analysis
Sample size was calculated to detect a 2% increase in the proportion of positive tests in the QFT-P assay compared to the QFT-G assay using a two-sided test if the proportion of positive QFT-G assays was 6.5%. As this was a pilot study, 10% of the sample size was enrolled in the study [14]. Frequencies and proportions of test results (positive, negative, and indeterminate) were calculated for the QFT-G and the two QFT-P assays. The agreements between QFT-G and QFT-PD, QFT-G and QFT-PT, and QFT-PD and QFT-PT were analyzed using percent agreement and Cohen's kappa of inter-rater agreement (κ). The conservative cutoff of 0.7 IU/mL IFN-γ was utilized because reproducibility studies have identified measurements between 0.2 and 0.7 IU/mL as a "zone of uncertainty" where one is most likely to see reversions and conversions in serial testing [15]. Frequencies and proportions of QFT-P assays positive due to TB1, TB2, or both TB1 and TB2, with IFN-γ values greater than or equal to 0.35 IU/mL after subtracting the IFN-γ measured in the negative control (nil), were reported, as this is the manufacturer's cutoff. Boxplots were used to compare the absolute difference between TB1-nil or TB2-nil against the cutoff of 0.35 IU/mL. The frequencies and proportions of QFT-P assays positive due to TB1, TB2, or both TB1 and TB2 with IFN-γ values greater than 0.70 IU/mL were also reported. Risk factors for having a positive assay result were identified using univariate and multiple logistic regression. Risk factors were defined as having a prior positive TST or IGRA, previous treatment for TBI, history of autoimmune disease(s), taking immunosuppressive drugs, or receiving a vaccine within 6 weeks prior to having the QFT. All analyses were conducted using SAS 9.4 (Cary, NC). A P < 0.05 was considered statistically significant.
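To give a sense of the scale implied by such a sample size calculation, the sketch below applies a common normal-approximation formula for comparing two independent proportions. The text does not state which formula, significance level, or power the authors used, so the formula itself, alpha = 0.05, power = 0.80, the reading of "2% increase" as an absolute change from 6.5% to 8.5%, and the function name are all assumptions; a paired design such as this one could reasonably call for a different method.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of two
    independent proportions (normal approximation, unpooled variance).

    alpha and power are assumed defaults, not values from the study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Assumed reading: 6.5% positive with QFT-G versus 8.5% with QFT-P.
full_n = n_per_group(0.065, 0.085)
pilot_n = ceil(0.10 * full_n)  # the pilot enrolled 10% of the full sample size
print(full_n, pilot_n)
```

Under these assumptions the full study would need a few thousand participants, so a 10% pilot lands in the hundreds; the authors' own calculation may differ.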

Assay results and agreement
Twenty-nine of 299 (10%) participants had positive results on the QFT-G assay and one (0.3%) participant had an indeterminate result (Fig 1). The QFT-PD had 35 (12%) positive results and one indeterminate result, and the QFT-PT had 51 (17%) positive results and no indeterminate results (Fig 1). The percent agreement between the qualitative results of QFT-G and QFT-PD was 91% (κ = 0.56; Table 2). Of the 26 discordant results between QFT-G and QFT-PD, 16 (62%) had negative QFT-G and positive QFT-PD results. The percent agreement between QFT-G and the QFT-PT was 91% (κ = 0.59; Table 2). Of the 28 discordant results between the QFT-G and the QFT-PT, 24 (86%) had negative QFT-G test results and positive QFT-PT test results. The percent agreement was lowest between the QFT-PD and the QFT-PT (85%; κ = 0.37; Table 2). There was a significant difference in the proportion of results between the QFT-PD and QFT-PT (P<0.001).
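The relationship between percent agreement and κ for the QFT-G versus QFT-PD comparison can be sketched from a 2x2 table. The counts of 19 concordant positives, 10 QFT-G-only positives, and 16 QFT-PD-only positives follow from the figures reported above (29 and 35 positives, 26 discordant results of which 16 were QFT-G negative and QFT-PD positive). Treating the remaining 254 participants as concordant negatives is an assumption that ignores the indeterminate results, so the κ computed here is close to, but not expected to match exactly, the published value. The function name is illustrative.

```python
def agreement_and_kappa(both_pos, first_only, second_only, both_neg):
    """Percent agreement and Cohen's kappa for two binary tests.

    both_pos / both_neg: concordant counts.
    first_only / second_only: positive on only the first / second test.
    """
    n = both_pos + first_only + second_only + both_neg
    p_observed = (both_pos + both_neg) / n
    first_pos = (both_pos + first_only) / n
    second_pos = (both_pos + second_only) / n
    # Agreement expected by chance given each test's positivity rate.
    p_expected = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# QFT-G versus QFT-PD; 254 concordant negatives is an assumed value.
percent_agreement, kappa = agreement_and_kappa(19, 10, 16, 254)
print(f"agreement={percent_agreement:.1%}, kappa={kappa:.2f}")
```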
Of the participants who reported the results of their last TST and had results for all three QFT assays (n = 238), 131 (55%) had a positive test result on their last TST. Eight of the 10 participants with a previous positive QFT-G reported having a previous positive TST, and the remaining two participants with previous positive QFT-G assays did not know the results of their last TST. The agreements [%; κ] between pairs of assays in TST (+) participants are shown in Fig 2.
Using the cut-off for a positive assay of ≥0.35 IU/mL, 60% of the positive QFT-PD results were positive due to both TB1 and TB2, compared to 53% of the positive QFT-PT results (Table 3). When the cut-off was raised to ≥0.70 IU/mL, there was little difference in the proportion of assay results positive due to both TB1 and TB2 between QFT-PD and QFT-PT (52% versus 54%). Raising the cutoff for positivity to ≥0.70 IU/mL caused a large reduction in the number of positive QFT-PT assays (n = 51 versus n = 26), indicating that almost half of the positive QFT-PT results were near the cutoff (Table 3).
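The effect of moving the cutoff can be sketched with a small classifier that subtracts the nil value from each antigen tube and records which tube drove a positive call. This is a simplified sketch: it implements only the nil-subtracted cutoff described in this paper, omits the package insert's remaining interpretation criteria (including those for indeterminate results), and uses hypothetical IFN-γ values and a hypothetical function name.

```python
def classify_qft_plus(nil, tb1, tb2, cutoff=0.35):
    """Return (is_positive, driver) for one QFT-P result.

    nil, tb1, tb2: IFN-gamma in IU/mL for the nil, TB1 and TB2 tubes.
    driver is "TB1", "TB2", "both", or None when neither tube reaches
    the cutoff after nil subtraction.
    """
    tb1_positive = (tb1 - nil) >= cutoff
    tb2_positive = (tb2 - nil) >= cutoff
    if tb1_positive and tb2_positive:
        driver = "both"
    elif tb1_positive:
        driver = "TB1"
    elif tb2_positive:
        driver = "TB2"
    else:
        driver = None
    return driver is not None, driver


# Hypothetical participant whose TB1 response sits between the two cutoffs.
print(classify_qft_plus(nil=0.05, tb1=0.60, tb2=0.20))               # manufacturer's cutoff
print(classify_qft_plus(nil=0.05, tb1=0.60, tb2=0.20, cutoff=0.70))  # conservative cutoff
```

A result like this one is positive at 0.35 IU/mL but negative at 0.70 IU/mL, which is the kind of near-cutoff response that disappears from the positive count when the cutoff is raised.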
The analysis of risk factors for having a positive QFT assay found that daily consumption of green tea was significantly associated with having a positive QFT-G (OR = 4.84, P = 0.03), and receiving a vaccine within 6 weeks prior to the assay was significantly associated with a positive QFT-PD (OR = 6.09, P = 0.004) and QFT-PT (OR = 6.02, P = 0.004; Table 5).

Discussion
Within a select population of HCWs at higher than average risk in a low TB incidence country, the overall agreement between the QFT-G and the QFT-P, regardless of blood collection method, was high, but lower agreement was found between the QFT-PD and the QFT-PT. More positive results were found using the QFT-P than the QFT-G, and more positive results were found with the QFT-PT than with the QFT-G and QFT-PD. Overall, the agreement between the QFT-G, QFT-PD and QFT-PT was 85% or greater. This is high agreement; however, when comparing the percent agreements of the 3 assay pairs, the agreement between the QFT-PD and QFT-PT was 6 percentage points lower than the agreement between the QFT-G and either QFT-P assay. This difference is most likely due to the inherent variability of an immune-based assay being used in real world conditions. In addition, the sample size in this study was 299, and a larger sample size might show greater percent agreement between the QFT-PD and QFT-PT. Metcalfe et al. reported that the within-subject variability of the QFT-G for individuals in a low TB incidence setting was ± 0.6 IU/mL [16]. QFT results near the cutoff (0.35 IU/mL) can convert or revert upon re-testing. This may account for some of the discordant test results between the assays. When the cutoff for a positive QFT-P was raised to 0.7 IU/mL, the percent agreement between the QFT-PD and QFT-PT rose (85% versus 94%), and the agreement between the QFT-PT, QFT-PD, and the QFT-G increased when using the conservative definition of TB1 and TB2 results both ≥0.35 IU/mL when compared to at least one antigen tube with a value ≥0.35 IU/mL [17]. A large scale study comparing QFT-G and QFT-PD among U.S. HCWs reported an overall agreement between the two tests of 96% [8], which is similar to the percent agreement found in this study (91%). The QFT-G, QFT-PD, and QFT-PT assays showed high percent agreement but low κ, reflecting poor agreement (0.56, 0.59, and 0.37), in the current study.
The positivity among the large cohort of HCWs was found to be 4% with QFT-G and 6% with QFT-PD [8]. The positive QFT-G results and negative QFT-PD results were seen in 1% of HCWs, and negative QFT-G results and positive QFT-PD results were seen in 3% of HCWs [8]. These findings were similar to those seen in the current study (3% and 5%).
According to Feinstein et al., κ may be affected by the prevalence of test results [17]. The low κ may be partially accounted for by the low prevalence of positive and indeterminate test results. κ underestimates agreement on rare outcomes. When the percent agreement expected by chance (p_e) is high, the calculated κ values indicate low agreement [18].
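This prevalence effect can be seen in a minimal sketch using the standard definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The two 2x2 tables below are hypothetical and were chosen only so that both have 90% observed agreement; they are not data from this study.

```python
def kappa_components(both_pos, first_only, second_only, both_neg):
    """Return (p_o, p_e, kappa) for a 2x2 agreement table."""
    n = both_pos + first_only + second_only + both_neg
    p_o = (both_pos + both_neg) / n
    first_pos = (both_pos + first_only) / n
    second_pos = (both_pos + second_only) / n
    p_e = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


# Hypothetical tables with identical observed agreement (90 of 100).
balanced = kappa_components(45, 5, 5, 45)  # positives are common (50%)
rare = kappa_components(5, 5, 5, 85)       # positives are rare (10%)

for label, (p_o, p_e, kappa) in (("balanced", balanced), ("rare", rare)):
    print(f"{label}: p_o={p_o:.2f}, p_e={p_e:.2f}, kappa={kappa:.2f}")
```

With positives at 50% prevalence, chance agreement is 0.50 and κ is 0.80; with positives at 10% prevalence, chance agreement rises to 0.82 and κ falls to about 0.44, even though the two tests agree on the same share of participants.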
Discrepant results between the QFT-G and the QFT-P have been seen in other studies. Hoffmann et al. reported that nine out of 163 patients tested with both assays had discordant results; three (33%) of these cases had positive QFT-G results and negative QFT-P test results, while the other six cases were QFT-G negative with positive QFT-P results [4]. A study among migrant students in Germany found that QFT-PT had a conversion rate of 4% and a reversion rate of 7% [19]. A large multi-center study conducted in the Netherlands and Belgium found 50 of 1031 (5%) discordant test results between QFT-G and QFT-P assays, and 60% of these discordant results were in the borderline range of 0.25-0.8 IU/mL [6].
Several pre-analytical factors have been associated with indeterminate QFT-G results. These factors can affect the amount of IFN-γ measured in the assay and include duration between blood draw and incubation [15,20,21], blood volume [1,15], tube shaking [1], and ELISA batching [15]. Vigorous shaking has been found to increase the median IFN-γ measured in the nil and TB Ag tubes [1]. Other factors that have been shown to affect the QFT-G include environmental factors such as pre-incubation temperature [22,23] and season [24-26]. The greater frequency of positive assay results in the QFT-PT assay compared to the QFT-PD may have been caused by the increased agitation sustained by the blood in the QFT-PT assay during the transfer to the assay tubes. The number of positive results seen with the QFT-PT assay was reduced when the cutoff was raised to 0.7 IU/mL, potentially reducing the number of false positive results. Another source of variability between the QFT-G and QFT-P results may be the standard curve used. The QFT-G assays were analyzed with an eight-point standard curve, but the two QFT-P assays were analyzed using a four-point standard curve per the instructions in the FDA-approved package insert. Nemes et al. reported that QFT-G samples analyzed with an eight-point standard curve had significantly higher IFN-γ values than samples analyzed with four-point standard curves [15]. The QFT-G and QFT-PD used the same positive and negative control in this study. The IFN-γ was estimated using the eight-point and four-point standard curves for the QFT-G and QFT-PD assays, respectively. The standard curve used was not believed to have had an effect on the quantitative results of the assays.
Bittel et al. analyzed the differences in quantitative and qualitative QFT-G results between direct in-tube phlebotomy and blood collection using a LiHp tube [27]. Of the 107 HCWs screened, 98% had concordant qualitative results between the direct and transferred QFT-G [27]. A statistically significant difference was found between nil and mitogen IFN-γ measurements with the different phlebotomy methods indicating that the transfer affected the amount of IFN-γ being produced by PBMCs in the transferred tubes.
Different factors were identified as being associated with a positive QFT-G and QFT-P. Participants who claimed to regularly consume green tea reported drinking a median of one cup per day (IQR: 1-2 cups). Green tea consumption was added to the analysis as a potential risk factor due to anecdotal evidence that some of our subjects who consumed green tea regularly had discrepant IGRA results. Compounds in green tea, and thus green tea consumption, have been found to affect IFN-γ production [28]. Because our HCW population has a high proportion of individuals of Asian descent, we chose to include a green tea consumption question in our TB risk questionnaire to see if we were able to identify an association between green tea consumption and positive QFT results. There is some evidence that green tea can increase the amount of IFN-γ secreted by splenocytes from mice treated with green tea extract [28], but there has been no systematic research on the effects of vaccines prior to administering a TST or an IGRA. Of the 37 participants who received a vaccine within 6 weeks prior to having phlebotomy for the QFT assays, 20 (54%) received a flu vaccine. There is evidence that influenza vaccination reduces the risk of TB incidence among the elderly [29]. It has been shown that there is an increase in IFN-γ producing NK cells and CD8+ T cells after influenza vaccination, and PBMCs cultured with influenza vaccine produced a greater amount of IFN-γ post-exposure compared to pre-exposure to the vaccine [30]. This evidence indicates that recent vaccination(s) can affect the amount of IFN-γ being produced. Excess IFN-γ production should ideally be controlled for in the assay by subtracting the baseline (unstimulated) IFN-γ measured, preventing false-positive results; however, the authors of this study are unaware of any studies conducted to determine how vaccines affect IGRAs directly.
In addition, due to the small sample size of this project, these results should be interpreted cautiously. This pilot study had several limitations. First, the low number of TB infected participants and the low number of indeterminate results limit our ability to investigate the agreement between the two QFT tests. This also limits our ability to assess whether phlebotomy method can lower the number of indeterminate results among patients undergoing TBI testing at our institution. Although certain concerns may be raised when the QFT-Plus tubes are drawn after the Mitogen tube, there are no restrictions in the manufacturer's package insert regarding the specific order of blood tubes to be drawn during phlebotomy. Therefore, the potential bias caused by the order of blood tubes drawn, if any, would be minimal. The plasma for the QFT-G was run immediately, while the plasma for the QFT-P assays was frozen per the manufacturer's protocol and run at a later date, possibly introducing variability. The lack of a gold standard for TBI screening limits our ability to determine which assay is correct in the case of discrepant results.
In spite of limitations, the current study has many strengths. First, the study included data on risk factors for TBI and indeterminate IGRA results, and the availability of prior test results (TST and IGRA) for participants. Simultaneous testing of participants using the QFT-G, QFT-PD and QFT-PT minimized potential pre-analytical and analytical sources of variability. Last, the study took place during routine annual screening making the results more likely to be generalizable in other regularly screened groups in low incidence settings.
The QFT-P assay, no matter the FDA approved blood collection method, showed a high percent agreement with the QFT-G assay among a population of U.S. HCWs when compared to other studies reporting the agreement of the QFT-G and QFT-P; however, the Cohen's κ coefficients of inter-rater agreement indicated that the agreement between the assays was only fair to moderate. The QFT-PT assay had numerous potential false positive assay results compared to the QFT-PD assay, but a larger study is needed to determine how to control for this variability. The use of the conservative interpretation cutoff of 0.7 IU/mL for a positive test result accounted for over half of the discrepant results. Without a "gold standard", this study was unable to determine if the QFT-P was able to detect TBI with equal or greater sensitivity than the QFT-G. The option of collecting blood into a single LiHp tube prior to transferring the blood into assay tubes increases the ability of clinics and public health programs to customize their individual protocols to better meet their needs, whether on-site or in the field; however, it is currently unknown how using this alternate blood collection method may affect the performance of the QFT-P assay.