Can Point-of-Care Urine LAM Strip Testing for Tuberculosis Add Value to Clinical Decision Making in Hospitalised HIV-Infected Persons?

Background The urine lipoarabinomannan (LAM) strip-test (Determine®-TB) can rapidly rule-in TB in HIV-infected persons with advanced immunosuppression. However, given high rates of empiric treatment amongst hospitalised patients in high-burden settings (∼50%), it is unclear whether LAM can add any value to clinical decision making, or identify a subset of patients with unfavourable outcomes that would otherwise have been missed by empiric treatment.
Methods 281 HIV-infected hospitalised patients with suspected TB received urine LAM strip testing, and were categorised as definite- (culture-positive), probable-, or non-TB. Both the proportion and morbidity of TB cases identified by LAM testing, early empiric treatment (initiated prior to test result availability) and a set of clinical predictors were compared across groups.
Results 187/281 patients had either definite- (n = 116) or probable-TB (n = 71). As a rule-in test for definite- and probable-TB, LAM identified a similar proportion of TB cases compared to early empiric treatment (85/187 vs. 93/187, p = 0.4), but a greater proportion than classified by a set of clinical predictors alone (19/187; p<0.001). Thirty-nine of the 187 (21%) patients with definite- or probable-TB were LAM-positive but missed by early empiric treatment, and of these 25/39 (64%) would also have been missed by smear microscopy. Thus, 25/187 (13%) of definite- or probable-TB patients with otherwise delayed initiation of TB treatment could be detected by the LAM strip test. LAM-positive patients missed by early empiric treatment had a lower median CD4 count (p = 0.008), a higher median illness severity score (p = 0.001) and increased urea levels (p = 0.002) compared to LAM-negative patients given early empiric treatment.
Conclusions LAM strip testing outperformed TB diagnosis based on clinical criteria but in day-to-day practice identified a similar proportion of patients compared to early empiric treatment. However, compared to empiric treatment, LAM identified a different subset of patients with more advanced immunosuppression and greater disease severity.


Introduction
The high early mortality (>25%) amongst hospitalised TB-HIV co-infected patients in resource-poor settings requires urgent attention [1,2]. The increased incidence of sputum pauci-bacillary and disseminated forms of TB in these patients limits the use of both traditional and new TB diagnostic tools [3][4][5][6]. Empiric TB treatment, based only on clinical and radiological findings, is common (∼50%) amongst hospitalised HIV-infected patients with advanced immunosuppression, given their high pre-test probability of disease and illness severity [7,8]. Formalized World Health Organisation (WHO) clinical algorithms are available to guide empiric treatment and, despite modest diagnostic accuracy in ambulatory patients [9,10], evidence suggests that their use may reduce mortality amongst hospitalised HIV-infected patients [8].
Although death due to undiagnosed TB is common in hospitalised HIV-infected patients in Africa [2,8], empiric treatment guidelines and practices are inconsistent, vary between hospitals, and may needlessly expose patients to toxic drugs. Thus, there remains a need for simple bedside tools to help guide the early initiation of TB treatment, where a microbiological diagnosis may be unavailable or delayed.
Urinary lipoarabinomannan (LAM) has more recently been evaluated for the diagnosis of TB in HIV-infected patients [11,12]. In hospitalised HIV-infected patients, a urinary LAM ELISA (Alere, USA) has an overall sensitivity of 59-67%, increasing to as high as 85% in patients with CD4 <50 cells/µl, and an overall specificity of 80-94% [13][14][15]. In addition, LAM positivity has been associated with higher mycobacterial burden, more severe illness, and higher mortality [15][16][17]. As an alternative to the ELISA kit, LAM can now be detected by a simple, low-cost (∼US$3.5) point-of-care lateral flow assay that provides results in 25 min from just 60 µl of urine [11,12]. Initial evaluation studies have confirmed equivalent performance of the urine LAM strip test compared to the LAM ELISA assay in different settings [13,18].
However, is this test really useful in 'real world' clinical practice in high HIV prevalence settings? The real value in any diagnostic lies in its ability to provide information beyond that deducible from basic clinical and radiographic data, such that it adds incremental value to routine clinical practice. Useful tests add value to clinical decision-making by ruling-in patients not otherwise routinely identifiable, pinpointing otherwise unrecognizable patients with the highest risk of morbidity and mortality, or by reducing unnecessary treatment. In this context, our study investigated whether point-of-care urine LAM strip testing offered any value over basic clinical and radiological screening, and whether testing was redundant in the context of routine 'real world' day-to-day clinical practice where empiric treatment is commonly used. We therefore evaluated LAM strip test performance against physician-led empiric treatment decisions and a set of clinical predictors.

Study Population
A study outline is shown in Figure 1. In total, 335 adult in-patients were prospectively recruited from four hospitals (three district- and one tertiary-level) in Cape Town, South Africa, between July 2009 and December 2010. Patients were referred for study inclusion by emergency-room or clinic doctors if they were suspected to have HIV-TB co-infection and needed in-patient care. All patients provided written informed consent and the University of Cape Town Faculty of Health Sciences Human Research Ethics Committee approved the study. Clinical information collected included: demographics, past history of TB, co-morbidity, symptoms, vital signs (including weight) and a modified early warning score (MEWS) of illness severity [19]. Blood was taken for HIV, CD4 and renal function testing. A chest radiograph (CXR) was performed in all patients.

TB Diagnostic Sampling and Testing
Consultant-led hospital-based clinicians not associated with the study team determined the timing and extent of TB diagnostic work-up and the commencement of empiric anti-TB treatment. Routine hospital practice includes the collection, where possible, of two sputum samples in patients able to expectorate and, if extrapulmonary TB is suspected, the collection of 1-2 non-sputum samples from clinically involved sites (e.g. fine needle aspirate of lymph node, pleural fluid aspirate/biopsy, ascitic tap, lumbar puncture, pericardial aspiration etc.). Further details of biological samples collected for TB culture, stratified by patient TB diagnostic category, are outlined in Table 1. Concentrated fluorescence smear microscopy was performed on NALC/NaOH-processed sputum and non-sputum samples, and cultures were performed using the MGIT 960 liquid culture system (BD Diagnostics, USA).

TB Reference Standard and Case Definitions
The reference standard for definite-TB was liquid culture positivity for Mycobacterium tuberculosis from at least a single sample. Given the significant potential for misclassification bias due to the challenges of sampling extra-pulmonary compartments, the significant proportion of sputum-scarce patients, and the limited performance of a single liquid TB culture in HIV-infected patients [20], patients were further categorised into the following diagnostic groups for analysis (Figure 1):
Definite-TB. At least one sample positive for M. tuberculosis by liquid culture (either sputum or non-sputum, e.g. pleural fluid, pericardial fluid etc.).
Probable-TB. Not meeting the criterion for definite-TB but a clinical-radiological picture highly suggestive of TB. All patients in this group received and showed a good response to anti-TB treatment at two-month follow-up. Patients with smear-positive but culture-negative or contaminated samples were included in this group.
Non-TB. No culture-based evidence of M. tuberculosis and an alternative diagnosis available. No clinical deterioration at two-month follow-up and no TB treatment given. Patients culture-positive for non-tuberculous mycobacteria (NTM) and not receiving anti-TB treatment were assigned to this group.
Unclassifiable TB. Unable to assign to any of the above-mentioned diagnostic groups due to death of unknown cause (without autopsy), on-going but uncharacterised symptoms at follow-up, or loss to follow-up at 2 months.

Early Empiric Treatment Definition
In order to compare the diagnostic performance of urine LAM strip testing with routine clinical practice, early empiric treatment was defined as the commencement of TB treatment within 24 hours of hospital admission based only on clinical and/or radiological findings, and prior to the availability of any smear or culture results. All early empiric treatment decisions, even if initially made by junior staff (medical officers and registrars), were approved by the attending consultant general physician.
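The definition above can be read as a simple rule applied to each patient record. The following is a minimal sketch of that rule, assuming hypothetical timestamp fields for admission, treatment start and first smear or culture result; the function and parameter names are illustrative and are not taken from the study database.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_early_empiric(admitted: datetime,
                     treatment_started: Optional[datetime],
                     first_micro_result: Optional[datetime]) -> bool:
    """True when TB treatment began within 24 h of admission and before
    any smear or culture result was available (hypothetical field names)."""
    if treatment_started is None:
        return False  # never treated
    delay = treatment_started - admitted
    within_24h = timedelta(0) <= delay <= timedelta(hours=24)
    # No microbiology result at all also counts as "prior to availability".
    before_results = (first_micro_result is None
                      or treatment_started < first_micro_result)
    return within_24h and before_results


# Hypothetical example: admitted 08:00, treated 20:00 the same day,
# first smear result reported two days later.
admission = datetime(2010, 1, 1, 8, 0)
print(is_early_empiric(admission,
                       datetime(2010, 1, 1, 20, 0),
                       datetime(2010, 1, 3, 9, 0)))
```

The sketch cannot capture the clinical judgement behind the decision; it only encodes the timing criteria stated in the definition.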

Modelling Clinical Predictors Using Multiple Imputation
A univariate analysis was used to determine basic clinical, laboratory and radiological predictors of definite-TB. A set of multivariate clinical predictors was generated using stepwise logistic regression modelling. Multiple imputation by chained equations (Royston, P & White, I 2011) was used to impute missing data prior to model building. The variables included in the logistic regression modelling were the following (the number of missing data points imputed for each variable is indicated in brackets): sex (2), age (5), previous TB history (0), known TB contact (0), current smoker (0), cough ≥2 weeks (0), productive cough (0), haemoptysis (0), self-reported weight loss (0), appetite loss (0), recent fever (0), night sweats (0), fatigue (0), shortness of breath (0), chest pain (0), abdominal pain (0), nausea/vomiting (0), diarrhoea (0), neurological symptoms (0), measured weight (39), temperature (7), respiratory rate (7), and modified early warning score (MEWS) (144) at enrolment, CXR compatibility with TB (24), and urine dipstick abnormalities (0). Different data appeared to be missing for different patients in a random fashion. The continuous variables (weight and temperature) were dichotomised using receiver operating characteristic (ROC) analysis to identify cut-points that maximised discriminatory utility prior to inclusion in the model. Rounded β-coefficients from the reduced model of significant variables were used to generate scores to quantitate relevant clinical predictors. ROC analysis was performed and three cut-points were selected: one for rule-in value, one at Youden's index (the optimal mathematical balance between sensitivity and specificity [21]) and one for rule-out value. Diagnostic accuracy, including 95% confidence intervals, for each cut-point was assessed using sensitivity, specificity, positive and negative predictive values (PPV, NPV) and positive likelihood ratio (LR+).
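As an illustration of the cut-point selection described above, the following minimal sketch computes sensitivity and specificity for every candidate cut-point of a clinical score and picks the one maximising Youden's index, J = sensitivity + specificity - 1. The function names and the toy scores are hypothetical; they are not the study data, and the original analysis was run in Stata rather than Python.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], has_tb: Sequence[bool],
              cut: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= cut counts as test-positive.
    Assumes at least one TB and one non-TB patient are present."""
    tp = sum(1 for s, d in zip(scores, has_tb) if d and s >= cut)
    fn = sum(1 for s, d in zip(scores, has_tb) if d and s < cut)
    tn = sum(1 for s, d in zip(scores, has_tb) if not d and s < cut)
    fp = sum(1 for s, d in zip(scores, has_tb) if not d and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cut(scores: Sequence[float],
               has_tb: Sequence[bool]) -> Tuple[float, float]:
    """Return (cut-point, J) maximising J = sensitivity + specificity - 1."""
    best_cut, best_j = scores[0], float("-inf")
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(scores, has_tb, cut)
        j = sens + spec - 1.0
        if j > best_j:  # ties keep the lowest cut-point
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical toy data: six TB patients followed by four non-TB patients.
toy_scores = [2.5, 3.0, 1.5, 1.5, 1.0, 0.5, 0.0, 0.5, 0.5, 1.0]
toy_has_tb = [True] * 6 + [False] * 4
cut, j = youden_cut(toy_scores, toy_has_tb)
print(f"Youden cut-point: score >= {cut} (J = {j:.2f})")
```

Rule-in and rule-out cut-points would instead be chosen by prioritising specificity or sensitivity respectively, which is a judgement the sketch does not attempt to automate.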

Urine Sampling and LAM Methodology
All patients gave a spot urine sample (10-30 ml) collected in a sterile container as soon as possible after recruitment. Urine was frozen on the day of collection and stored at −20°C for later batched testing. Urine LAM strip testing (Determine® TB, Alere, USA) was performed on thawed urine according to the manufacturer's instructions by readers blinded to all patient data and reference test results. Urine LAM strip test lot #101102, the same as used for test evaluation in an outpatient ARV-clinic setting [18], was used. Detailed methodology for reading the urine LAM strip tests has been previously described [13]. Analysis was performed using the grade 2 cut-point, which has shown better inter-observer reliability and good rule-in value (LR+ >10) in hospitalised HIV-infected patients [13].

Statistical Analysis
Analyses were restricted to HIV-infected patients only and were performed using the combined definite- and probable-TB groups versus the non-TB group for the primary determination of diagnostic accuracy (unclassifiable patients were excluded). In addition, because treatment response formed part of the diagnostic categorisation, the specificity of empiric treatment could not be accurately evaluated in the primary analysis; alternative analyses were therefore performed (see online supplementary material) using either M. tuberculosis complex culture-positive versus culture-negative groups, or only definite- versus non-TB groups. Diagnostic accuracy, including 95% confidence intervals, for individual tests and early empiric treatment was assessed using sensitivity, specificity and LR+. Given the variations in test specificity and the very high study prevalence of TB, ranges of positive and negative predictive values are presented for individual tests and early empiric treatment at differing rates of in-patient TB prevalence. STATA IC, version 11 (Stata Corp, Texas, USA) was used for all statistical analyses.

Results
Figure 1 outlines the study population. 16% (54/335) of enrolled patients were HIV-uninfected and hence excluded from further analysis. 41% (116/281) of patients had definite-TB, an additional 25% (71/281) of patients had probable-TB, and only 10% (27/281) had non-TB. 24% (67/281) of patients remained unclassifiable due to death or loss to follow-up and were excluded from the primary analysis. Table 1 outlines demographics, basic clinical characteristics of the patient cohort and the sputum/non-sputum diagnostic samples stratified by TB diagnostic category. These same patient characteristics stratified by smear, culture and CD4 count have been previously described [13,22].
In the primary analysis, both early empiric treatment and the urine LAM strip test showed specificities >95%. However, when an alternative analysis is performed that is restricted to patients with a valid M. tuberculosis culture result and compares culture-positive and culture-negative groups (results provided in Table S1), test sensitivities are not significantly lower but the specificities of both early empiric treatment and urine LAM strip testing decrease, with urine LAM offering higher specificity than early empiric treatment [75% (67-82, 95/126) vs. 63% (54-71, 79/126), p = 0.03]. Given this variable test specificity and high overall study TB prevalence, Table 3 presents a range of PPV (95% CI) and NPV values using three specificities for each diagnostic method (lowest, highest and average) at in-patient TB prevalence rates of 35%, 45%, and 55%, which could be expected to occur in the majority of endemic country hospital settings. The lowest specificities used in Table 3 are taken from Table S1, the highest from Table 2, and the third is the average of the highest and lowest. With an in-patient TB prevalence of 45% (M. tuberculosis culture-positive TB prevalence in study = 48%, 116/242), the PPV ranges for early empiric treatment, the urine LAM strip test and a combination thereof were 53-100%, 62-90% and 71-94%, respectively.
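The dependence of predictive values on prevalence can be sketched with Bayes' theorem. The example below is a minimal illustration, not a reproduction of Table 3: the sensitivity of 0.45 is an assumed round figure (close to the 85/187 LAM-positive proportion reported in the abstract), the specificity of 0.75 is the lowest LAM specificity quoted above, and the function name is hypothetical.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # true positives per patient tested
    fn = (1 - sensitivity) * prevalence        # false negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Illustrative inputs only (assumed sensitivity 0.45, specificity 0.75).
for prev in (0.35, 0.45, 0.55):
    ppv, npv = predictive_values(0.45, 0.75, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

Holding sensitivity and specificity fixed, PPV rises and NPV falls as prevalence increases, which is why the predictive values are reported across a range of in-patient prevalence rates.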

Clinical Predictors Compared to the Urine LAM Strip Test
The univariate and multivariate associates of definite-TB are shown in Table S3. Table S4 shows the sensitivity (95% confidence intervals), specificity, and LR+ for ROC-selected cutpoints, selected for their rule-in, rule-out, or best compromise between sensitivity and specificity (assuming equal weighting) for the quantified set of clinical predictors, the urine LAM strip test and early empiric treatment. At equivalent specificity, clinical

Clinical Predictors and Early Empiric Treatment
42% (10/24) of patients 'ruled-in' by clinical predictors ≥2.5 and 38% (23/61) of patients 'ruled-out' by clinical predictors ≤0.5 were given early empiric treatment by attending hospital clinicians. Table S5 provides a further comparison of patient characteristics for patients commencing vs. not commencing early empiric treatment. No differences in basic demographics, symptomatology or diagnostic sampling were noted between groups except that a higher proportion of patients given early empiric treatment had a cough.

Urine LAM Strip Test Positive Patients Missed by Early Empiric Treatment
The Venn diagram in Figure 2 indicates the different but overlapping patient populations detected by the urine LAM strip test and early empiric treatment initiation. 21% (39/187) of definite- and probable-TB cases were urine LAM strip test positive, but missed by early empiric treatment. 64% (25/39) of these patients were either sputum smear-negative or unable to produce sputum. Table 4

Discussion
The point-of-care urine LAM test has potential as a useful adjunct for rapid TB diagnosis in HIV-infected hospitalised patients [11,13]. Its added clinical value, however, remains uncertain given its modest performance characteristics. The key finding of this study is that LAM detected patients that would have otherwise been missed by empiric treatment and this subgroup of patients had more advanced immunosuppression and greater illness severity. The latter represents a group most likely to benefit from the initiation of early treatment as they are at high risk.
Table 2. Sensitivity, specificity and positive likelihood ratio of early empiric treatment, the urine LAM strip test, and CXR for TB diagnosis in hospitalised HIV-infected patients using the definite- and probable-TB groups for sensitivity, and the non-TB groups for specificity analyses.
Traditional and newer TB diagnostics show reduced diagnostic accuracy in hospitalised co-infected patients, particularly with advanced immunosuppression, as patients are often unable to produce sputum for diagnostic testing and/or have disseminated disease [3]. In addition, these patients present to hospitals with late stage disease and severe illness [3]. These factors mean that treatment decisions are commonly made based on clinical and radiological findings alone, the need for urgent treatment initiation, and the high background disease prevalence (pre-test probability). Yet, in this same patient group, clinical and radiological findings are frequently atypical and/or non-specific, and this accounts for the poor rule-in value of the set of clinical predictors that we derived. Indeed, as a 'rule-in' test (cut-point selecting high specificity and PPV) clinical predictors could only correctly classify ∼20% of all patients. This assumes coherent mathematical analysis of available diagnostic variables. However, routine clinical decision-making is rather a dynamic Bayesian process of assimilating an accumulating series of pre-test probabilities [23], and weighting of the overall post-test probability of disease against a threshold probability for initiating treatment. Thus, in reality physician practice varies widely and, in an attempt to reduce mortality, hospital treatment thresholds are lower than expected. Indeed, in this study, ∼50% of definite- and probable-TB patients initiated early empiric TB treatment.
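The Bayesian updating referred to above can be made concrete with the odds form of Bayes' theorem: post-test odds = pre-test odds × likelihood ratio. The sketch below is illustrative only; the pre-test probability of 0.45, the likelihood ratio of 10 (the order of the rule-in LR+ cited for the grade 2 cut-point) and the treatment threshold of 0.80 are assumed example values, and the function names are hypothetical.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def exceeds_treatment_threshold(pre_test_probability: float,
                                likelihood_ratio: float,
                                threshold: float) -> bool:
    """True when the post-test probability reaches the (assumed) threshold."""
    return post_test_probability(pre_test_probability, likelihood_ratio) >= threshold


# Assumed example values, not study estimates.
pre, lr_positive, threshold = 0.45, 10.0, 0.80
post = post_test_probability(pre, lr_positive)
print(f"pre-test {pre:.0%} -> post-test {post:.0%}; "
      f"treat: {exceeds_treatment_threshold(pre, lr_positive, threshold)}")
```

Applying several findings in sequence by multiplying their likelihood ratios would additionally assume that the findings are conditionally independent, which clinical and radiological features often are not.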
Early empiric TB treatment decisions should logically be targeted to the sickest patients, especially amongst hospitalised HIV-infected patients with advanced immunosuppression where, given higher mortality rates, treatment-initiation thresholds should be lowered. However, this was not the case in those empirically treated. In fact, overall, patients not commenced on early empiric treatment appeared, using the MEWS, to have a higher illness severity. No clear demographic, clinical or radiological factor predicted early empiric treatment. By contrast, the urine LAM strip test could pinpoint the most severely ill patients. Thus, our data suggest that early empiric treatment will miss a particularly vulnerable patient group with advanced immunosuppression that would have been detected by LAM strip testing. The rapid identification of these patients could target prompt therapy to those most likely to benefit. Recent studies with both the TB LAM ELISA and the LAM strip test support our findings, demonstrating associations between urine LAM positivity, higher mycobacterial disease burden, more advanced immunosuppression, and increased illness severity and mortality [15][16][17]. These findings support the need to undertake prospective impact studies to assess whether initiation of TB treatment based on urine LAM testing is able to save lives and/or decrease TB-related morbidity.
Table 3. Calculated positive and negative predictive values for early empiric treatment, the urine LAM strip test and a combination thereof, using three data-generated estimates for test specificity and in-patient TB prevalence. (Columns: test sensitivity (%), test specificity (%), in-patient TB prevalence.)
Do our findings have relevance to in-patient settings with a lower TB prevalence? Amongst in-patient settings with a lower TB prevalence, urine LAM is likely to offer superior 'rule-in' utility compared to empiric treatment. This is evidenced by: i) the poor comparative diagnostic utility and inferiority of a set of clinical predictors, which estimates pre-test probability or what could be expected with a frequentist interpretation of simple clinical and radiological predictors, and ii) the lack of clear demographic, clinical or radiological parameters associated with early empiric treatment practice and limited agreement with the derived set of clinical predictors, indicating the lack of predictability and hence standardisation of empiric treatment decision-making. Given the modest performance characteristics of urine LAM strip testing, it is however clear that urine LAM, whether alone or combined with existing empiric treatment practice, is likely to offer clinically useful 'rule-in' utility (PPV >90%) only in hospital settings with high TB prevalence (>35%).
This is the first study to compare the value of urine LAM strip testing against clinical-radiological screening and day-to-day clinical practice (early empiric treatment rates) in hospitalised patients. However, our study has important limitations. Given the well-established misclassification bias that occurs due to the drawbacks of the TB culture technique and the lack of interventional lung sampling (sputum induction and bronchoscopy), a diagnostic categorisation was used to group patients for analysis. This may have underestimated sensitivity and overestimated specificity. The TB prevalence in our study was higher than in many other settings and this limits the generalisability of our findings. However, we have presented predictive values using estimated low, medium and high TB prevalence rates to improve generalisability. The study had a high proportion of unclassifiable-TB patients due to death and loss to follow-up, and these patients had a higher proportion of LAM strip test negative results than the definite- and probable-TB groups. This may have introduced selection bias; however, in our secondary analyses presented in the online supplementary materials we compare M. tuberculosis culture-positive vs. culture-negative groups and include unclassifiable-TB patients with a valid culture result. Key study findings are unaffected. Our study did not evaluate LAM against newer diagnostic standards such as the Xpert MTB/RIF assay. However, this was not accessible to us at the time of the study and this test offers reduced utility in extra-pulmonary and sputum-scarce TB. Urine LAM tests were not performed at the bedside or used to guide treatment initiation; thus, a survival benefit through initiating early treatment in these severely ill patients is unclear but possible [8].
An ideal point-of-care test for rapid, laboratory-free detection of TB remains elusive [24]. However, sampling hurdles and poor performance when using extra-pulmonary samples mean that the need to make empiric treatment decisions is likely to continue. Thus, despite only modest diagnostic accuracy, the low-cost urine LAM strip test offers important added clinical value in hospitalised HIV-infected patients with suspected TB. Not only could the test detect patients missed by clinical and radiological predictors, but it could also potentially enable the rapid treatment of patients with the most advanced immunosuppression and severe illness. Further studies are now required to confirm our study findings and evaluate the impact of urine LAM strip testing to guide early treatment initiation in hospitalised HIV-infected patients.

Figure S1
Venn diagram of all definite-TB patients indicating different but overlapping patient populations detected by the urine LAM strip test and early empiric TB treatment. (TIFF)

Table S1
[...] positive-TB patients for sensitivity, and culture negative patients for specificity analyses. 1 All patients with 1 or more valid M. tuberculosis culture (either sputum or non-sputum) are included in this analysis irrespective of final TB diagnostic categorization (39/281 patients excluded with either no/contaminated culture result). † Any patient commenced on TB treatment within 24 hours of hospital admission based only on clinical and radiological findings, and prior to the availability of any smear or culture results, is included in this group. P-values indicate significant differences between tests (marked with * and number to indicate comparison group) for different diagnostic accuracy measures: *1 p<0.001; *2 p<0.001; *3 p<0.001; *4 p<0.001; *5 p = 0.03; *6 p = 0.03; *7 p<0.001; *8 p = 0.005. (DOCX)

Table S2
Diagnostic accuracy measures of early empiric treatment, the urine LAM strip test and CXR for TB diagnosis in hospitalised HIV-infected patients using definite-TB (M. tuberculosis culture positive) for sensitivity, and non-TB patient groups for specificity analyses. † Any patient commenced on TB treatment within 24 hours of hospital admission based only on clinical and radiological findings, and prior to the availability of any smear or culture results, is included in this group. P-values indicate significant differences between tests (marked with * and number to indicate comparison group) for different diagnostic accuracy measures: *1 p<0.001; *2 p<0.001; *3 p<0.001; *4 p<0.001; *5 p<0.001. (DOCX)

Table S3
Univariate and multivariate analyses for associates of definite-TB in HIV-infected hospitalised patients. † Receiver operating characteristic (ROC) curve-selected cut-point maximising discriminatory utility used to dichotomise the continuous variables weight and temperature. OR: odds ratio; TB: Tuberculosis; CXR: Chest x-ray; LAM: Lipoarabinomannan. (DOCX)

Table S4
Diagnostic accuracy measures for the set of clinical predictors (using three ROC-selected cut-points), the urine LAM strip test and routine early empiric treatment in hospitalised HIV-infected patients using the definite- and probable-TB groups for sensitivity and the non-TB groups for specificity analyses. P-values indicate significant differences between tests and/or cut-points (marked with * and number to indicate comparison group) for different diagnostic accuracy measures: *1 p<0.001; *2 p = 0.03. † Youden's index is defined as the point on the ROC curve that provides the optimal mathematical balance between sensitivity and specificity. (DOCX)

Table S5
Demographic, clinical, sampling and microbiological characteristics of study patients stratified by TB diagnostic group. † Any patient commenced on TB treatment within 24 hours of hospital admission based only on clinical and radiological findings, and prior to the availability of any smear or culture results, is included in this group. Analysis is performed for all patients in this graph and hence includes 27 unclassified patients who were commenced on early empiric treatment but do not form part of the primary analysis presented in the main manuscript. P-values indicate significant differences between patient groups (marked with * and number to indicate comparison group) for different patient characteristics. 1 MEWS: Modified early warning score is an admission triage score based on illness severity; higher scores correlate with poor outcomes and increased mortality [19]. (DOCX)