Diagnostic accuracy of the Xpert MTB/RIF assay for bone and joint tuberculosis: A meta-analysis

Background This study aimed to evaluate the accuracy of the Xpert MTB/RIF assay for the diagnosis of bone and joint tuberculosis. Methods We searched databases from their inception to May 7, 2019 for published articles and reviewed them to assess the accuracy of Xpert MTB/RIF with respect to a composite reference standard (CRS) and mycobacterial culture. Meta-analyses were performed using a bivariate random-effects model, and the sources of heterogeneity were assessed via subgroup analysis and meta-regression. Results Nineteen independent (9 prospective, 5 retrospective, and 5 case-control) studies that compared Xpert MTB/RIF with the CRS and 14 (6 prospective, 7 retrospective, and 1 case-control) studies that compared it with culture were included. The pooled sensitivity and specificity of Xpert MTB/RIF were 81% (95% confidence interval [CI], 77–84) and 99% (95% CI, 97–100) compared to the CRS, respectively, and 96% (95% CI, 90–98) and 85% (95% CI, 57–96) compared to culture, respectively. The pooled sensitivity and specificity using pus samples vs. the CRS were 82% (95% CI, 76–86) and 99% (95% CI, 95–100), respectively. The corresponding values for tissue samples vs. the CRS were 84% (95% CI, 76–90) and 98% (95% CI, 94–99), respectively. There was no significant difference in diagnostic accuracy among the types of specimens. Conclusions Xpert MTB/RIF demonstrates good diagnostic accuracy for bone and joint tuberculosis, and its accuracy does not appear to depend on the type of specimen.


Introduction
Tuberculosis is a major infectious disease globally and poses a serious threat to public health [1]. Extrapulmonary tuberculosis (EPTB) accounts for about 10% of all tuberculosis cases [1]. Bone and joint tuberculosis (BJTB) is a common type of EPTB and accounts for about 10-34% of EPTB cases [2,3]. BJTB can lead to joint destruction, deformity, and even paraplegia, thereby seriously affecting the quality of life. Therefore, early and correct diagnosis and treatment are critical [4]. However, early diagnosis is very difficult due to the atypical symptoms of BJTB, deep lesions, and difficulty in obtaining specimens [5]. Traditional diagnostic methods, such as Mycobacterium tuberculosis culture, are time-consuming and have low sensitivity [6]. Therefore, rapid laboratory diagnosis of BJTB is urgently needed. The Xpert MTB/RIF assay is a rapid, automated molecular test with high accuracy for the detection of pulmonary tuberculosis (PTB) and EPTB [7]. This assay has also been recommended for the diagnosis of lymph node tuberculosis and has shown good diagnostic accuracy [8]. However, the diagnostic accuracy of this assay for BJTB remains controversial. Because independent systematic research on the diagnostic accuracy of the Xpert MTB/RIF assay for BJTB is lacking, the possible influence of the type of specimen (pus and tissue samples) on the results is yet to be clarified. Therefore, we performed a meta-analysis to assess the diagnostic accuracy of the Xpert MTB/RIF assay, compared to the composite reference standard (CRS) and mycobacterial culture, in the detection of BJTB. We assessed the pooled sensitivity and specificity of this assay against each reference standard. Moreover, the diagnostic accuracy of the test was evaluated by subgroup analysis according to sample type, lesion site, sample condition, and patient selection method.

Data sources and search strategy
We searched PubMed, Embase, the Cochrane Library, the Wanfang database, and China National Knowledge Infrastructure for studies evaluating the diagnostic accuracy of Xpert for BJTB on May 7, 2019. The search formula ((Xpert OR Gene Xpert) AND ("Tuberculosis, Osteoarticular"[Mesh] OR "Tuberculosis, Spinal"[Mesh] OR "Extra pulmonary tuberculosis")) was used for PubMed without any limitation. Similar search formulae were used for Embase, the Cochrane Library, China National Knowledge Infrastructure, and Wanfang databases. References cited in the included articles and reviews were further explored for possible candidate studies.

Inclusion criteria
We included full-text original studies that assessed the diagnostic accuracy of Xpert assay for BJTB using bone and joint specimens. Reference standards were well-defined and appropriate to the studies. The articles directly provided true positive (TP), false positive (FP), false negative (FN), and true negative (TN) values for the assay or included the data necessary to calculate these measures. We excluded case reports, articles written in languages other than English and Chinese, studies with < 10 samples, conference reports, and abstracts without full articles.
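As an illustration of how sensitivity and specificity follow from the four extracted counts, a minimal sketch is given below. The class name and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study: index test result cross-tabulated against the reference."""
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive samples detected by the index test.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative samples correctly called negative.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only.
study = TwoByTwo(tp=40, fp=1, fn=10, tn=49)
print(f"sensitivity = {study.sensitivity():.2f}, specificity = {study.specificity():.2f}")
```

With these assumed counts, the sensitivity is 40/50 and the specificity is 49/50.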

Reference standard
A composite reference standard (CRS) or mycobacterial culture was defined as the reference standard in our study. Clinical symptoms, radiographic features, biochemical test results, smears, culture, histopathology, and response to anti-tuberculosis drugs constituted the components of the CRS. Cases with positive results for some or all of these components were considered positive for BJTB. Cases were considered as non-BJTB if all the results were negative. We used the CRS as defined in each original paper.

Literature screening and selection
Two investigators (Guocan Yu and Yanqin Shen) independently assessed the candidate articles by reviewing their titles and abstracts, followed by the full text, for inclusion. Discrepancies between the two investigators were resolved by discussion with a third investigator (Xiaohua Kong).

Data extraction
We extracted the following data: author name; year; country; TP, FP, FN, and TN values for the assay; reference standard; patient selection method; sample processing steps (e.g., homogenization); specimen type; and specimen condition, along with other parameters. The same two investigators independently extracted the necessary information from each of the included articles, and the extracted data were cross-checked. Discrepancies between the two data sets were settled by discussion with a third investigator, as in the literature selection phase. Data from studies assessed against two different reference standards were treated separately.

Assessment of study quality
Based on the two reference standards (CRS and culture), the two investigators independently divided the studies into two groups and used a revised tool for Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) to assess study quality separately [9]. We chose not to carry out formal assessment of publication bias, as the available methods such as funnel plots are not considered valid for diagnostic accuracy reviews [10].

Data synthesis and statistical analysis
We first obtained the values corresponding to TP, FP, FN, and TN in each included study, and calculated the estimated pooled sensitivity and specificity of Xpert MTB/RIF, with 95% confidence intervals (CIs), against the CRS or culture, using bivariate random-effects models. Forest plots for sensitivity and specificity were generated for each study. The areas under the summary receiver operating characteristic (SROC) curves (AUC) were subsequently calculated. I² statistics were used to assess heterogeneity between the studies for each reference standard. A value of 0% indicated no observed heterogeneity, whereas values greater than 50% were considered to imply substantial heterogeneity [11]. We explored sample type, lesion site, patient selection method, decontamination method, sample condition, and homogenization as potential sources of heterogeneity, using subgroup and meta-regression analyses. At least four published studies were required to perform the meta-analysis for predefined variable types. Data from studies against the CRS and culture were analyzed separately. Stata version 14.0 (Stata Corp., College Station, TX, USA) with the midas command package was used to generate forest plots of sensitivity and specificity with 95% CIs for each study and to carry out meta-analyses and meta-regression analyses.
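To make the I² thresholds concrete, the following is a minimal sketch of how an I² value can be derived from Cochran's Q for a set of study-level sensitivities. It is a simplified univariate calculation on the logit scale with inverse-variance weights, not the bivariate random-effects model described above, and it is not claimed to reproduce the output of the Stata midas package. The function names, the 0.5 continuity correction, and the counts are assumptions made for illustration.

```python
import math


def logit_and_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction (an assumption of this sketch) keeps the
    logit finite when a cell count is zero.
    """
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def i_squared(effects: list[float], variances: list[float]) -> float:
    """I² (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # I² is truncated at zero when Q is smaller than its degrees of freedom.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical (TP, TP + FN) pairs for three studies.
counts = [(45, 50), (20, 50), (30, 50)]
pairs = [logit_and_variance(tp, n) for tp, n in counts]
heterogeneous = i_squared([y for y, _ in pairs], [v for _, v in pairs])

# Three identical studies show no observed heterogeneity.
same = [logit_and_variance(40, 50)] * 3
homogeneous = i_squared([y for y, _ in same], [v for _, v in same])

print(f"I² = {heterogeneous:.1f}% (heterogeneous), {homogeneous:.1f}% (identical studies)")
```

Under the 50% threshold used in this study, the first hypothetical set would be classed as substantially heterogeneous and the second as showing no observed heterogeneity.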

Imperfect reference standards
Imperfect reference standards may lead to misclassification of samples in diagnostic validity studies [12,13]. Due to the paucibacillary nature of EPTB, culture is an imperfect reference standard and leads to an underestimation of the true specificity of Xpert MTB/RIF. A CRS is a composite standard that comprises results from several tests; however, a CRS itself may have reduced specificity, thereby leading to apparent FN Xpert MTB/RIF results and an underestimation of the true sensitivity of Xpert MTB/RIF [12,14]. Therefore, a study comparing Xpert MTB/RIF with both culture and CRS might provide a more credible range for sensitivity and specificity.
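The direction of these biases can be illustrated with a short worked calculation. The sketch below computes the expected apparent accuracy of an index test when it is measured against an imperfect reference. It assumes that the errors of the index test and the reference are conditionally independent given true disease status, which is a simplification, and every numerical value is hypothetical rather than an estimate from the included studies.

```python
def apparent_accuracy(
    prevalence: float,
    index_sens: float,
    index_spec: float,
    ref_sens: float,
    ref_spec: float,
) -> tuple[float, float]:
    """Expected (sensitivity, specificity) of the index test as measured
    against an imperfect reference, assuming conditionally independent errors."""
    p, q = prevalence, 1.0 - prevalence
    # Expected proportions of the four index-by-reference cells.
    both_pos = p * index_sens * ref_sens + q * (1 - index_spec) * (1 - ref_spec)
    index_only = p * index_sens * (1 - ref_sens) + q * (1 - index_spec) * ref_spec
    ref_only = p * (1 - index_sens) * ref_sens + q * index_spec * (1 - ref_spec)
    both_neg = p * (1 - index_sens) * (1 - ref_sens) + q * index_spec * ref_spec
    return both_pos / (both_pos + ref_only), both_neg / (both_neg + index_only)


# Hypothetical true accuracy of the index test: sensitivity 0.85, specificity 0.99.
# Culture-like reference: misses 40% of true cases, no false positives.
culture_sens, culture_spec = apparent_accuracy(0.5, 0.85, 0.99, 0.60, 1.00)

# CRS-like reference: detects all true cases, but 10% of non-cases are labelled positive.
crs_sens, crs_spec = apparent_accuracy(0.5, 0.85, 0.99, 1.00, 0.90)

print(f"vs. culture-like reference: specificity appears as {culture_spec:.2f}")
print(f"vs. CRS-like reference: sensitivity appears as {crs_sens:.2f}")
```

In this hypothetical setting, the low-sensitivity reference pulls the apparent specificity well below the assumed true value of 0.99, and the low-specificity reference pulls the apparent sensitivity below the assumed true value of 0.85.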

Identification of studies and study characteristics
Through our search strategy, four hundred candidate articles were identified from relevant databases, and twenty-six qualified articles were included according to the inclusion criteria (Fig 1), including 12 prospective studies, 9 retrospective studies and 5 case-control studies. The kappa index of agreement for the selection and data extraction was 0.877 (95% CI, 0.779-0.975) between the two investigators. Only two studies were conducted in high-income countries [18,35], while the rest were conducted in low- and middle-income countries. Fifteen articles were written in English and eleven in Chinese. A median of 109 specimens was evaluated in each article (range, 13-418). We excluded five other articles that reported on the sensitivity only, without reporting any specificity [41-45]. Two other articles published in languages other than English or Chinese were excluded [46,47]. The specimens used in the studies included pus, tissue, and joint fluid. The commonest site of infection was the spine.
Articles that reported the use of two different reference standards in the same study were considered to include two independent studies. In accordance with this principle, 33 independent studies were included: 19 (9 prospective studies, 5 retrospective studies, and 5 case-control studies) compared Xpert MTB/RIF with CRS, and 14 (6 prospective studies, 7 retrospective studies, and 1 case-control study) compared Xpert MTB/RIF with culture (Table 1).
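For readers unfamiliar with the kappa index of agreement reported above, the following minimal sketch shows how Cohen's kappa is computed for two raters making include/exclude decisions. The function name and the counts are hypothetical and do not correspond to the screening data of this study.

```python
def cohen_kappa(both_yes: int, only_a: int, only_b: int, both_no: int) -> float:
    """Cohen's kappa for two raters and a binary decision."""
    n = both_yes + only_a + only_b + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + only_a) / n  # proportion included by rater A
    b_yes = (both_yes + only_b) / n  # proportion included by rater B
    # Agreement expected by chance if the raters decided independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions on 50 articles.
kappa = cohen_kappa(both_yes=20, only_a=5, only_b=10, both_no=15)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, so a value close to 1 indicates agreement well beyond chance.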

Study quality
The overall methodological quality of the included studies, using a CRS and culture, is summarized in Fig 2. The risk of bias was mainly due to patient selection and the reference standard. The risk of bias from the index test and from flow and timing was judged to be relatively low.

Discussion
The diagnosis of BJTB, just like that of other forms of EPTB, is very challenging, due to its paucibacillary nature [48]. The delayed diagnosis and treatment of BJTB can lead to serious complications, such as joint destruction, contractures in large joints, growth arrest, and neurological impairment in spinal tuberculosis, resulting in long-term morbidity and disability [36]. Invasive examination is a necessary diagnostic step in most cases. Puncture and biopsy of lesions are the most common invasive procedures for EPTB; the same applies to BJTB. Due to the deep location, obstruction by bony structures, limited volume of the puncture specimen, the atypical nature of the puncture site, and low bacterial content of the specimen, it is still difficult to diagnose BJTB by pathological examination and culture of the puncture specimen [39]. As a result, BJTB is often misdiagnosed or missed, leading to inappropriate treatment and adverse prognoses among patients. Therefore, in order to reduce the rates of deformity and disability due to BJTB, rapid and effective diagnostic methods are needed to detect BJTB early.
Nucleic acid tests such as the loop-mediated isothermal amplification assay, as fast and efficient detection methods, have been widely used in the diagnosis of tuberculosis [49] and have demonstrated good diagnostic efficacy in the detection of EPTB [50]. The Xpert MTB/RIF assay, a rapid and automated real-time nucleic acid amplification test, is currently one of the most commonly used tests in the diagnosis of tuberculosis. This test can detect MTB complex DNA within 2 hours and is widely used in the diagnosis of PTB and EPTB. Previous studies have shown that this test has good diagnostic efficacy in the diagnosis of PTB and EPTB [7]. The Xpert MTB/RIF assay has also been recommended by the World Health Organization for the diagnosis of EPTB [51], including lymph node tuberculosis and tuberculous meningitis. This test should also be used in the diagnosis of BJTB. Although many studies have reported the application of this test in BJTB, there has been no consensus on its sensitivity and specificity. In a meta-analysis performed by Wen et al., the pooled sensitivity and specificity of Xpert MTB/RIF for BJTB were 81% (95% CI, 78-83) and 83% (95% CI, 80-86), respectively [52]. However, this result was not differentiated according to different reference standards; the number of studies included was relatively small, and the diagnostic efficacy of different types of specimens and different sites of infection was not differentiated. In order to further evaluate the efficacy of this test in the diagnosis of BJTB, we designed this study to determine the accuracy of Xpert MTB/RIF for BJTB.
Our study included 19 studies with comparisons to the CRS and 14 studies with comparisons to culture. The selection of patients included in most studies was not consecutive. The CRS varied among the articles included, which could be the main source of bias among the studies. Publication bias needed to be considered. For intervention studies, publication bias occurs if studies with significant results are more likely to be published than studies with nonsignificant findings. Regarding diagnostic tests, many studies are conducted without ethical review or study registration, or do not compare tests; hence, it would be problematic to assess publication bias. According to the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, traditional analytical approaches, such as 'funnel plots', may not be appropriate for the assessment of test accuracy [10]. At present, there is no recognized and accepted statistical method for quantifying the potential effect of publication bias in studies of diagnostic accuracy [53]. Therefore, we chose not to carry out a formal assessment of publication bias.
The present study demonstrated that the pooled sensitivity and specificity of Xpert MTB/RIF for BJTB were 81% (95% CI, 77-84) and 99% (95% CI, 97-100) compared to the CRS, and 96% (95% CI, 90-98) and 85% (95% CI, 57-96) compared to culture, respectively. As expected, the use of different reference standards led to different sensitivities and specificities, and their ranges were relatively credible. Regardless of the reference standard, Xpert MTB/RIF showed a very good diagnostic accuracy. In the present study, a substantial level of heterogeneity was also observed among the studies. Subgroup analysis revealed that, compared to the CRS, the pooled sensitivity of Xpert MTB/RIF performed on pus samples was 82% (I² = 68%; 95% CI, 76-86), and the pooled specificity was 99% (I² = 8%; 95% CI, 95-100). The pooled sensitivity and specificity were 84% (I² = 79%; 95% CI, 76-90) and 98% (I² = 57%; 95% CI, 94-99), respectively, compared to the CRS, when using tissue specimens. The level of heterogeneity within subgroups decreased, suggesting that specimen type might be a source of heterogeneity among studies, especially for specificity. The sensitivity and specificity of Xpert MTB/RIF performed using pus specimens were similar to those obtained using tissue specimens compared to the CRS; there was no significant difference. Overall, the diagnostic accuracy of Xpert MTB/RIF for BJTB was found to be similar for pus and tissue specimens. This finding was similar to that made in a previous study of lymph node tuberculosis [8]. For spinal tuberculosis, the pooled sensitivity and specificity of Xpert MTB/RIF were 81% (95% CI, 75-86) and 99% (95% CI, 93-100) compared to the CRS, and 97% (95% CI, 91-99) and 64% (95% CI, 41-83) compared to culture, respectively. However, the heterogeneity across studies (for example, with regard to sample type) was substantial, and these results need to be treated with caution. Owing to the paucibacillary nature of BJTB, culture is an imperfect reference standard.
A CRS with multiple evaluation indicators might be a more applicable reference standard. We chose to evaluate the two reference standards separately because mixing them would confuse interpretation. As predicted, we found an underestimated level of sensitivity and an overestimated level of specificity when using the CRS as a reference standard. The range of sensitivity and specificity obtained with the two reference standards was more plausible. However, the definition of the CRS varied across the included studies. With the exception of four studies [20,31,32,38], the CRS included the results of culture (Lowenstein-Jensen and/or BACTEC MGIT 960 culture). In most studies, the CRS also included the results of histology/cytology, smear microscopy, clinical symptoms, and radiographic features; eight studies included the response to anti-tuberculosis treatment, and none of the studies included other molecular test results. This might be one of the sources of heterogeneity among the studies. The studies also used different culture references. Nine studies used only BACTEC MGIT 960 liquid culture as the reference, one study used only Lowenstein-Jensen solid culture as the reference [34], and four studies used both as references [18,22,24,35]. The performances of BACTEC MGIT 960 liquid and Lowenstein-Jensen solid cultures differ [54], which might also be one of the sources of heterogeneity among the studies. Data for subgroups defined by the different reference standards were limited; thus, further meta-analysis could not be performed.
Meta-regression analysis showed that the patient selection method affected the outcome and was a source of heterogeneity compared to the CRS, probably due to the fact that patients who were not included consecutively were more likely to introduce selection bias. The sample processing of BJTB specimens, such as decontamination, sample condition, and homogenization, varied among studies; however, meta-regression analysis showed that these factors did not affect the outcome and hence were not obvious sources of heterogeneity.
Our meta-analysis had several limitations. First, this meta-analysis was not registered online. Second, despite comprehensive searches, some studies may still have been missed. Third, some studies failed to distinguish specimen types, and some used multiple sample types, which may have led to some bias in our results. Fourth, sample processing of BJTB specimens was highly variable among studies, because the assay was designed for respiratory samples and its processing steps may vary for other specimens. Fifth, the CRS differed among the studies. Finally, there was substantial heterogeneity among the studies, and the pooled estimates need to be interpreted with caution.

Conclusions
We observed that the pooled sensitivity and specificity of Xpert MTB/RIF were 81% and 99%, respectively, when compared to the CRS, and 96% and 85%, respectively, when compared to culture. For spinal tuberculosis, the pooled sensitivity and specificity of Xpert MTB/RIF were 81% and 99%, respectively, when compared with the CRS, and 97% and 64%, respectively, when compared with culture. When performed on pus samples, the pooled sensitivity and specificity were 82% and 99% compared with the CRS. When performed on tissue samples, the pooled sensitivity and specificity were 84% and 98% compared with the CRS. The diagnostic efficiencies for different specimen types (pus and tissue) were similar. Xpert MTB/RIF showed good diagnostic accuracy for BJTB, and its accuracy was not related to the type of specimen.