Clinical Diagnostic Utility of IP-10 and LAM Antigen Levels for the Diagnosis of Tuberculous Pleural Effusions in a High Burden Setting

Background
Current tools for the diagnosis of tuberculous pleural effusions are sub-optimal. Data about the value of new diagnostic technologies are limited, particularly in high burden settings. Preliminary case control studies have identified IFN-γ-inducible-10kDa protein (IP-10) as a promising diagnostic marker; however, its diagnostic utility in a day-to-day clinical setting is unclear. Detection of LAM antigen has not previously been evaluated in pleural fluid.

Methods
We investigated the comparative diagnostic utility of established (adenosine deaminase [ADA]), more recent (standardized nucleic-acid-amplification-test [NAAT]) and newer technologies (a standardized LAM mycobacterial antigen-detection assay and IP-10 levels) for the evaluation of pleural effusions in 78 consecutively recruited South African tuberculosis suspects. All consenting participants underwent pleural biopsy unless contra-indicated or refused. The reference standard comprised culture positivity for M. tuberculosis or histology suggestive of tuberculosis.

Principal Findings
Of 74 evaluable subjects 48, 7 and 19 had definite, probable and non-TB, respectively. IP-10 levels were significantly higher in TB vs non-TB participants (p<0.0001). The respective outcomes [sensitivity, specificity, PPV, NPV %] for the different diagnostic modalities were: ADA at the 30 IU/L cut-point [96; 69; 90; 85], NAAT [6; 93; 67; 28], IP-10 at the 28,170 pg/ml ROC-derived cut-point [80; 82; 91; 64], and IP-10 at the 4035 pg/ml cut-point [100; 53; 83; 100]. Thus IP-10, using the ROC-derived cut-point, missed ∼20% of TB cases and mis-diagnosed ∼20% of non-TB cases. By contrast, when a lower cut-point was used a negative test excluded TB. The NAAT had a poor sensitivity but high specificity. LAM antigen-detection was not diagnostically useful.

Conclusion
Although IP-10, like ADA, has sub-optimal specificity, it may be a clinically useful rule-out test for tuberculous pleural effusions. Larger multi-centric studies are now required to confirm our findings.


Introduction
Annually, over half a million pleural effusions are diagnosed world-wide, and pleural effusion is one of the commonest forms of extra-pulmonary tuberculosis (TB; [1]). In Africa, where TB is out of control, extrapulmonary TB is more common due to HIV co-infection. The diagnosis of TB pleural effusion is challenging. Pleural biopsy has a good yield (∼80%) but it is invasive, expensive and requires trained medical personnel [2]. Smear microscopy of pleural fluid has a dismal yield (∼5%) and culture takes several weeks to obtain [2]. Other rapid diagnostic tools such as nucleic acid amplification tests (NAAT) have poor sensitivity in pleural fluid (∼50%; [3]), though the performance outcomes of a standardized NAAT have not previously been evaluated in a high HIV sero-prevalence setting [3]. Given the drawbacks of existing tools, investigators have pursued the detection of measurable biomarkers, including IFN-γ levels [4] and adenosine-deaminase (ADA), as diagnostic adjuncts [2]. However, measuring IFN-γ is relatively expensive in high burden settings [5] and ADA is not widely available in clinical laboratories, and is non-specific even in high burden settings [2,6].
An alternative promising, but poorly studied, biomarker is IFN-γ-inducible protein of 10 kDa (IP-10). IP-10, a Th1-associated chemokine, was found to be a useful discriminatory tool in three case-control studies [7,8,9] but its clinical diagnostic utility in an unselected cohort of TB suspects is unknown. More recently, lipoarabinomannan (LAM) antigen detection was found to be useful for the diagnosis of TB when using urine obtained from Tanzanian TB suspects [10]. However, the utility of a standardized LAM antigen-capture ELISA assay (Clearview® TB ELISA) has not previously been evaluated in other body compartments including pleural fluid.
In this study we prospectively evaluated the comparative diagnostic utility of established (adenosine deaminase [ADA]), more recent (standardized nucleic-acid-amplification-test [NAAT]) and newer technologies (IP-10 levels and a standardized LAM mycobacterial antigen-detection assay) for the evaluation of pleural effusions in 78 South African tuberculosis suspects. The gold standard for tuberculosis was culture positivity for M. tuberculosis and/or histology in keeping with tuberculosis.

Patient recruitment, characterization and routine laboratory tests
Seventy-eight consecutive patients with suspected TB pleural effusion (persistent fever, night sweats or cough, chest pain, loss of weight, previous tuberculosis or recent TB contact, or any patient in whom, due to suggestive symptoms or signs, TB was part of the differential diagnosis) were prospectively recruited, after informed consent, at the Groote Schuur, Somerset and Victoria hospitals in Cape Town, South Africa, over a 12-month period ending 30 April 2008 (see figure 1 for an overview of the study plan). Those under 18 years of age, pregnant or refusing consent were not recruited. The data presented here are part of a parent study, which evaluated the role of unstimulated IFN-γ versus quantitative T cell responses for the diagnosis of TB pleural effusion. Four patients were excluded for other reasons and thus there were 74 patients with evaluable results (figure 1). Study approval was obtained from the University of Cape Town Health Sciences Faculty research ethics committee.
All patients had a history taken, detailed physical examination performed, routine haematological investigations, including testing for HIV infection, chest x-ray, sputum examination when possible (fluorescent microscopy for acid fast bacilli and mycobacterial culture using the MGIT 960 system), and aspiration of approximately 20 ml of pleural fluid (or closest obtainable volume) for biochemical (protein and glucose), cytological (for malignant cells, cell differential [Dif-Quick, American Scientific products]), and microbiological (Gram stain and culture for bacterial pathogens, mycobacterial fluorescence microscopy and culture for M. tuberculosis using the MGIT 960 system) evaluation.
For accurate characterization of disease, multiple closed pleural biopsies (approximately four) were undertaken using an Abrams needle under local anesthesia, by a trained internal medicine resident (specialist registrar). In 16 patients biopsies were not performed because of patient refusal, a contra-indication or a positive culture of fluid, or histology from another site, prior to attempted pleural biopsy. The reference standard for tuberculosis was culture positivity for M. tuberculosis and/or histology in keeping with tuberculosis (caseous necrosis or acid fast bacilli with or without granuloma formation). Patients were thus characterized as (i) definite TB: either positive M.tb culture (sputum, pleural fluid or tissue) and/or histology in keeping with tuberculosis, and a clinico-radiological picture consistent with TB with clinical response to anti-TB treatment; (ii) non-TB: alternative diagnosis made on histology or pleural fluid aspiration, and not treated for TB, and on 3 to 6 month follow-up there were no features to suggest TB; and (iii) probable TB: clinical picture of TB but not satisfying the definite TB criteria and treated for TB by the attending physician.

ADA, LAM antigen-detection, NAAT and IP-10 levels
Pleural fluid protein and ADA levels were derived using the Biuret and colorimetric methods, respectively. LAM antigen concentration in the pleural fluid was measured in duplicate, after a heating step to dissociate antigen and antibody, according to the manufacturer's instructions (Clearview® TB ELISA, ME, USA; see http://www.clearview.com/tb_elisa.aspx). An interim analysis was done in the first 24 recruited patients (14 definite or probable TB cases and 10 non-TB cases) to inform a go/no-go decision on further LAM testing. For nucleic-acid-amplification the Amplified Tuberculosis Direct Test (Genprobe, San Diego, CA) was performed in duplicate according to the manufacturer's instructions (H37Rv served as a positive control and M. intracellulare served as the negative control) and readouts were obtained using a Leader 50 Luminometer (Genprobe, San Diego, CA).
IP-10 levels were measured, according to the manufacturer's instructions, using a standardized Human IP-10 ELISA Kit (Hycult Biotechnology, Uden, The Netherlands). All assays were performed by an experienced laboratory technician who was blinded to patient and clinical details.

Bio-clinical score
To ascertain the relative value of newer tests in a high burden setting, regression models were fitted to identify variables independently associated with risk of tuberculosis, taking into account findings from the history, physical examination and pleural fluid biochemical data. The final bio-clinical scoring rule, incorporating age and protein levels, was developed by assigning a relative score or points to each of the variables included in the final multivariate model. Here we use the model to evaluate the relative incremental value of the different pleural diagnostic tests.

Statistical analysis
Categorical variables were compared using the χ² test or Fisher exact test and continuous variables were compared using Student's t-test, whenever appropriate. Non-parametric tests (Mann-Whitney) were used for non-normally distributed variables. Concordance between tests was measured using the kappa co-efficient. Diagnostic accuracy, including 95% confidence intervals, was assessed using sensitivity, specificity, predictive values and area under the ROC in the TB and non-TB sub-groups. The study report was prepared using the Standards for Reporting of Diagnostic Accuracy (STARD initiative) format (19).
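
The accuracy measures listed above all follow from a 2x2 table of test result against reference standard. The following is a minimal Python sketch of that calculation, not the analysis code used in this study. The function names are hypothetical, the counts in the usage lines are made up for illustration, and the Wilson score interval is an assumption, since the text does not state which confidence interval method was used.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%).

    Assumption: the paper does not name its CI method; Wilson is one
    common choice for small samples.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_accuracy(tp, fn, fp, tn):
    """Accuracy measures from a 2x2 table.

    tp/fn: reference-positive subjects with a positive/negative test.
    fp/tn: reference-negative subjects with a positive/negative test.
    Assumes every row and column total of the table is non-zero.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Likelihood ratios are undefined at the extremes; report infinity.
        "lr_pos": sens / (1 - spec) if spec < 1 else math.inf,
        "lr_neg": (1 - sens) / spec if spec > 0 else math.inf,
    }


# Illustrative counts only (not data from this study).
result = diagnostic_accuracy(tp=40, fn=10, fp=5, tn=45)
sens_ci = wilson_interval(40, 50)
print(result, sens_ci)
```

Predictive values computed this way depend on the proportion of TB cases in the sample, so they transfer only to settings with a similar prevalence.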

Demographic, clinical and biochemical data
A summary of the study plan is shown in figure 1. Of the 78 patients recruited 4 were excluded from the analysis (one because of a coagulated sample, one because an incorrect sample had been harvested [ascitic instead of pleural fluid], and two due to unverifiable patient details). Thus 74 patients had results for pleural fluid microscopy, biochemistry, ADA, PCR or IP-10 levels. There were 48, 19 and 7 patients with definite, non-TB and probable TB, respectively. Effusions in the non-TB group were due to several causes (2 lymphoma, 2 myeloproliferative disorders, 9 adeno or small cell carcinoma, 3 parapneumonic, and 3 due to other causes).

Results of smear, culture, histology and NAAT
Smear, pleural fluid culture, and biopsy (tissue culture and histology) were positive in 1, 27 and 41 of the 48 definite TB cases, respectively, and by definition, in none of the non-TB cases. None of the probable TB cases were culture or biopsy positive but all were treated empirically for TB based on clinical suspicion. Twenty one percent (16/74) of patients did not have a pleural biopsy (refused by 4 patients, contra-indicated in 2 patients, 1 in whom a liver biopsy was done, and 9 in whom the culture result was positive prior to a biopsy being done [6 sputum culture positive and 3 pleural fluid culture positive]). The diagnostic outcomes of protein levels at different cut-points are shown in table 1. Performance outcomes of the NAAT (manufacturer-derived cut-point) were poor and are shown in table 1, table 2 and figure 2.

IP-10 and ADA
The IP-10 scatter-plot (n = 73) and area under the ROC are shown in figure 3 and performance outcomes are shown in table 1. At a ROC-derived cut-point of 28170 pg/ml (definite vs non-TB) the sensitivity (%), specificity, +ve LR (likelihood ratio) and −ve LR (95%

LAM antigen
We measured LAM antigen levels (Clearview® TB ELISA, ME, USA) in the first 24 TB suspects recruited (12 TB, 2 probable TB and 10 non-TB). The sensitivity of LAM antigen was 8% (1 of 12 patients) and specificity was 100%.

Incremental diagnostic value of different tests
Incremental test value is summarised in table 3. ADA and IP-10 (at both cut-points) were more sensitive than the clinical evaluation alone, though histology did not add any further value. Adding IP-10 to ADA had little incremental benefit; however, at the lower IP-10 cut-point the % NPV (95% CI) improved from 79 (57; 92) for ADA alone, and 82 (59; 94) for ADA + IP-10, to 100 (71; 100).

Discussion
The diagnostic yield of current tools for pleural effusion is suboptimal. In this study we evaluated IP-10 and LAM antigen detection against other tools for the diagnosis of TB pleural effusion. Although IP-10 levels were significantly higher in TB compared to non-TB patients, the test was not clinically useful because of the sub-optimal specificity in the non-TB group. In three case control studies, two immunological [8,9] and one diagnostic [7], IP-10 levels were significantly higher in the TB compared to the control group. In the only study reporting diagnostic outcomes in 11 TB patients [7], IP-10 discriminated well between TB and malignant PE, and performed as well as unstimulated IFN-γ (AUC of 0.93 vs 0.99, respectively). Comparatively, although we found highly significant inter-group differences (TB vs non-TB; p<0.0001), IP-10 missed the diagnosis in ∼20% of TB patients and was falsely positive in ∼20% of non-TB patients using the ROC-derived cut-point. Thus, IP-10 was not a meaningful clinical discriminatory tool and levels were highly variable in patients with malignant and para-pneumonic effusions.
IP-10 is an IFN-γ-driven chemokine and a non-specific Th1 inflammatory marker and thus can be elevated in other disorders including pulmonary fibrosis [11], multiple sclerosis [12] and lymphoma [13]. It is also well recognized that case control studies, though useful for preliminary screening evaluation of new biomarkers, tend to overestimate diagnostic performance outcomes [14]. Moreover, statistical inter-group difference, as we demonstrate, does not necessarily imply clinical usefulness. By contrast, at the lower cut-point the IP-10 NPV was 100%, and the negative LR was 0.24, making it a promising rule-out test. This may be useful in clinical practice, particularly in undiagnosed pleural effusions when the histology is non-specific or non-representative of pleural tissue. In this context a negative IP-10 test may prompt a more vigorous search for other pathologies (e.g. using alternative diagnostic strategies including thoracoscopy), and reduce exposure to unwarranted or empiric anti-TB treatment and its attendant toxicity.
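
To illustrate how a negative likelihood ratio translates into a rule-out decision, the short Python sketch below converts a pre-test probability into a post-test probability via odds. The function name is hypothetical and the pre-test probabilities in the usage lines are assumed values chosen for illustration; only the negative LR of 0.24 is taken from the text.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pre-test odds x LR."""
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre_test_prob must lie strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Assumed pre-test probabilities of TB; the LR of 0.24 is quoted in the text.
for pre in (0.3, 0.5, 0.7):
    post = post_test_probability(pre, 0.24)
    print(f"pre-test {pre:.0%} -> post-test after a negative result {post:.1%}")
```

The same negative result lowers the probability of TB less, in absolute terms, when clinical suspicion is already high, which is why the usefulness of a rule-out test depends on the setting in which it is applied.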
By comparison the ADA (30 IU/L cut-point) had a better sensitivity, PPV and positive LR, though the specificity and NPV were sub-optimal, implying that ∼3 in 10 non-TB subjects would be erroneously treated for TB and ∼1 in 7 patients with a negative test would in fact have TB. At a higher cut-point (47 IU/L) the former misdiagnosis would be reduced, but not eliminated, and at the expense of missing ∼1 in 10 TB cases. At the lower cut-point ADA would be an excellent rule-out test but specificity would be poor. Nevertheless, ADA is cheaper and more widely available than IP-10. Thus IP-10, which can be measured with several commercially available ELISA assays, cannot replace ADA. Rather, it could be useful in a specific clinical context where ruling out TB would be useful.
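
The trade-off between cut-points described here can be made concrete by recomputing sensitivity and specificity at each candidate threshold. The Python sketch below does this for a continuous marker where values at or above the cut-point count as positive. The function name and the marker values are hypothetical and are not measurements from this study; only the 30 and 47 cut-points echo the text.

```python
def sweep_cut_points(values, has_tb, cut_points):
    """Return (cut_point, sensitivity, specificity) for each cut-point.

    values: marker level per subject; has_tb: matching list of booleans
    from the reference standard. A test is positive when value >= cut.
    Assumes at least one TB and one non-TB subject.
    """
    n_tb = sum(1 for d in has_tb if d)
    n_non_tb = len(has_tb) - n_tb
    rows = []
    for cut in cut_points:
        true_pos = sum(1 for v, d in zip(values, has_tb) if d and v >= cut)
        true_neg = sum(1 for v, d in zip(values, has_tb) if not d and v < cut)
        rows.append((cut, true_pos / n_tb, true_neg / n_non_tb))
    return rows


# Made-up marker levels for four TB and four non-TB subjects.
levels = [10, 40, 50, 60, 5, 20, 35, 45]
tb_status = [True, True, True, True, False, False, False, False]
for cut, sens, spec in sweep_cut_points(levels, tb_status, [8, 30, 47]):
    print(f"cut-point {cut}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the cut-point can only hold or raise sensitivity while holding or lowering specificity, which is the pattern behind using a low threshold for rule-out and a high threshold for rule-in.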
In the parent study using the same cohort of patients we show that unstimulated IFN-γ levels very accurately distinguish TB from non-TB effusions in African patients (data not shown). This preliminary analysis showed a modest correlation between IP-10 and IFN-γ levels (Spearman r = 0.3836). Why does IP-10, an IFN-γ-inducible chemokine, not correlate highly with IFN-γ levels, as it does in peripheral blood [15]? IP-10 is also regulated by other cytokines including IFN-α, IL-1β and IL-12 [16,17,18], disease phenotype can modulate cytokine half-life [19], and there may be differential cellular uptake and cellular regulation of IP-10 [17].
To meaningfully evaluate the relative clinical value of IP-10 and ADA we compared their utility to a simple bio-clinical score, generated through regression analysis, and relevant to a resource-poor setting [20,21]. Both IP-10 and ADA substantially improved the sensitivity and NPV compared to bio-clinical assessment alone. A combination of the clinical score with ADA improved the NPV but not to an extent that was clinically useful. A combination of the clinical score with IP-10, or IP-10 and ADA, added little incremental value. At the lower IP-10 cut-point NPV was still better than ADA at either cut-point (low or high). We also investigated the effect of HIV status on test performance outcomes. IP-10 levels were not significantly different in HIV-positive and HIV-negative patients, though HIV-positive patients were more likely to have a positive pleural fluid culture (non-significant; data not shown).
TB antigen detection has previously been investigated for its diagnostic utility in pleural effusions. However, tuberculostearic acid was found to have limited diagnostic utility [22]. More recently, a preliminary study from Africa suggested that detection of urinary LAM antigen was a useful diagnostic adjunct in TB suspects [10]. The assay has now been commercialized into a finalized prototype (see http://www.clearview.com/tb_elisa.aspx), which we evaluated in pleural fluid. For cost-containment purposes further testing of LAM was discontinued, after an interim analysis of the first 24 patient results, because of its poor diagnostic utility (only one out of 14 definite or probable TB cases tested positive for LAM antigen). Why is LAM antigen virtually undetectable in pleural fluid? Preliminary experiments excluded technical reasons and batch variability. To exclude the lack of antigen-protein dissociation (pleural fluid has a high protein content that may bind free LAM) a heating step was incorporated into the test protocol to ensure dissociation. The lack of LAM antigen detection in the majority of TB pleural effusions probably reflects the pauci-bacillary nature of pleural disease, though high affinity antigen-antibody binding cannot entirely be excluded.
NAAT performance outcomes are highly dependent on laboratory protocols and we therefore evaluated a standardized NAAT, hitherto not undertaken in an African setting, in pleural TB suspects. The sensitivity of commercial NAATs for pleural TB is highly variable (20 to 100%; summarized in detail in [3,23]). Although, in our study, specificity was high, the sensitivity was poor (only 6%). Putative reasons may include the paucibacillary nature of the disease, inhibitors in the pleural fluid, and sub-optimal mycobacterial nucleic-acid extraction efficiency. The latter is less likely as a culture control (∼10 organisms) was included in the experiment. However, the assay used has no inhibitor-specific internal positive control. Further studies are required to assess the impact of HIV-infection and inhibitors on TB-specific nucleic acid amplification tests in high burden settings.
We took several steps to minimize bias and ensure study validity, including consecutive recruitment with universally applied and pre-specified inclusion criteria, an experienced technician blinded to clinical details, invasive procedures to ensure accurate classification of patient and control sub-groups, and use of a pre-specified reference standard. We also provide, through comparison with clinical assessment and incremental test value over existing tools, information on clinical utility rather than test performance outcomes only (sensitivity, specificity etc.; [24]). Thus, there is an emphasis on 'diagnostic' rather than 'test' research. The sensitivity of the test is compared to the sensitivity of clinical assessment, which enables the incremental value of the test over the clinician's assessment to be determined. Therefore, a test with an apparently high sensitivity may have limited clinical utility if it has little or no incremental benefit over clinical assessment. It is also meaningful, for the purposes of determining value, to determine the incremental benefit of a new test over an existing one.
Nevertheless, this study has several limitations. The sample size was small and accuracy estimates relatively imprecise. We also evaluated the test in the same population used to determine the cut-points. Further work is now required to validate these cut-points in different populations. Also, results are generalisable only to high burden settings; further and larger studies are required to evaluate whether outcomes are different in low burden settings. For example, the high background rate of LTBI, host biological factors and early TB infection may have impacted on test results.
In conclusion, like ADA, IP-10 levels at the ROC-derived cut-point are not specific for tuberculosis, though at the lower cut-point IP-10 appears to be a promising rule-out test for TB in a high burden setting. Larger studies are required in other settings to confirm these findings.