Commercial Serological Tests for the Diagnosis of Active Pulmonary and Extrapulmonary Tuberculosis: An Updated Systematic Review and Meta-Analysis

An up-to-date systematic review and meta-analysis by Karen Steingart and colleagues confirms that commercially available serological tests do not provide an accurate diagnosis of tuberculosis.


Introduction
Despite impressive advances in tuberculosis (TB) control over the last decade [1], missed diagnoses continue to fuel the global epidemic, leading to more severe illness for patients and enabling further transmission of Mycobacterium tuberculosis [2]. Smear microscopy and chest radiography, the primary tools used in resource-limited countries for identifying TB, often perform poorly, especially in HIV-coinfected patients [3][4][5]. Improved techniques, such as liquid culture for M. tuberculosis and nucleic acid amplification tests, are often too expensive and complex for routine use in resource-limited settings. The Xpert MTB/RIF (Cepheid), a new technology recently endorsed by the World Health Organization (WHO), provides high sensitivity for detection of TB and drug resistance [6]. WHO has issued a blueprint for Xpert's implementation [7]; however, high cost may be a barrier for scaling up this technology in many areas where the epidemic is most severe [2].
Serological tests have a long history and have been used successfully for the rapid diagnosis of many infectious diseases (e.g., HIV, syphilis, and viral hepatitis). In this paper, "serological tests" refers to blood tests that detect the humoral immune (antibody) responses to M. tuberculosis antigens. Serological tests are not to be confused with interferon-gamma release assays that measure the T-cell-based interferon-gamma response to M. tuberculosis antigens. In comparison with microscopy, serological tests appear to offer several advantages: (1) the result from a serological test using the enzyme-linked immunosorbent assay (ELISA) format could be available within hours, and the result using an immunochromatographic assay format, within minutes; (2) a serological test, if developed into a point-of-care test, could potentially replace microscopy or extend testing to lower levels of health services; and (3) in children, for whom sputum is difficult to obtain, and in patients suspected of having extrapulmonary TB, a blood test may be more practical.
Although currently the International Standards for TB Care discourages the use of serological tests in routine practice [8] and no international guideline recommends their use, dozens of commercial serological tests for TB diagnosis are offered for sale in many parts of the world [9], including Afghanistan, Bangladesh, Brazil, Cambodia, China, India, Indonesia, Kenya, Myanmar, Nigeria, Pakistan, Philippines, Russia, South Africa, Thailand, Uganda, and Viet Nam, as was recently found in a survey of 22 high TB burden countries [10]. For example, in India, numerous products with claims of high accuracy in their package inserts are available for purchase (Table S1), and an estimated 1.5 million serological tests are performed every year [10].
We are aware of four systematic reviews and one laboratory-based evaluation on this topic. The first review included only studies with a cohort or case series design and searched the literature through 2003 [11]. Performance of the tests was modest, and sensitivity decreased when only studies meeting at least two design-related criteria were included (seven studies, pooled sensitivity of 34%) [11]. Two subsequent reviews evaluating commercial serological tests for pulmonary TB (68 studies) [12] and extrapulmonary TB (21 studies) [13] found the sensitivity and specificity of these tests to be highly variable. The fourth review, a meta-analysis of in-house serological tests for the diagnosis of pulmonary TB (254 studies including 51 distinct single antigens and 30 distinct multiple-antigen combinations), identified potential candidate antigens for inclusion in an antibody-detection-based TB test in patients with and without HIV infection; however, no single antigen achieved sufficient sensitivity to replace smear microscopy [14]. A laboratory-based evaluation of 19 rapid commercial tests conducted by the WHO Special Programme for Research and Training in Tropical Diseases found that, in comparison with culture plus clinical follow-up, serological tests provided low and variable sensitivity (1% to 60%) and specificity (53% to 99%) [15].
Since the publication of the previous reviews, the evidence base has grown and approaches to meta-analysis of diagnostic tests have evolved. This updated systematic review was commissioned by WHO to guide policy recommendations on serological tests for TB, with a special focus on the relevance of these assays in low- and middle-income countries. The objective of this review is to synthesize new evidence since 2006 in order to address the following question: what is the diagnostic accuracy of commercial serological tests for active TB (pulmonary and extrapulmonary TB) in adults and children, with and without HIV infection? Specifically, we were interested in evaluating the use of a serological assay as a replacement test for, or an additional test after, smear microscopy.

Methods
We followed methods for conducting and reporting systematic reviews and meta-analyses recommended by the Cochrane Collaboration Diagnostic Test Accuracy Working Group and the PRISMA statement (Text S1), including the preparation of a protocol and analysis plan (Text S2) [16][17][18].

Selection Criteria and Definitions
Types of studies. Diagnostic studies (with any study design) were included that evaluated serological tests for active TB (pulmonary and extrapulmonary TB) in patients who provided sera before or within 14 d of starting antituberculous treatment.
Participants. Participants were adults and children, with and without HIV infection, with suspected or confirmed active TB, from all clinical settings (clinic or hospital). The protocol for the current review included studies with at least ten TB cases. Studies could be performed in any country regardless of TB incidence or income status.
Index test. The index test was any commercial serological test for the diagnosis of active TB.
Comparator tests. The comparator was either no test or smear microscopy.
Target conditions. The target conditions were pulmonary and extrapulmonary TB.
Reference standards. Pulmonary TB required positivity on mycobacterial culture. (The previous review accepted positivity on either culture or smear microscopy as the reference standard [12].) Extrapulmonary TB required positivity on at least one of the following tests: culture, smear, or histopathological examination.
Outcomes. The outcomes were sensitivity and specificity. Sensitivity refers to the proportion of patients with a positive serological test result among patients with TB confirmed by the reference standard. Specificity refers to the proportion of participants with a negative serological test result among participants without TB according to the reference standard. To estimate specificity, we selected only one non-TB group if a study had more than one such group. The preferred non-TB participants were those in whom active TB was initially suspected but later ruled out ("other respiratory disease" or "mixed disease" groups), and who were from the same population as TB patients.
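The two outcome definitions above can be written out directly from a 2x2 table. The following is a minimal sketch; the function name and the counts in the usage note are hypothetical and are not data from any included study.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 counts.

    tp: serology positive, TB confirmed by the reference standard
    fn: serology negative, TB confirmed by the reference standard
    fp: serology positive, no TB according to the reference standard
    tn: serology negative, no TB according to the reference standard
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one TB and one non-TB participant")
    sensitivity = tp / (tp + fn)   # positives among those with TB
    specificity = tn / (tn + fp)   # negatives among those without TB
    return sensitivity, specificity
```

For instance, with hypothetical counts of 30 true positives, 20 false negatives, 5 false positives, and 95 true negatives, the function returns a sensitivity of 0.60 and a specificity of 0.95.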
Extrapulmonary TB. Extrapulmonary TB was classified as lymph node, pleural, meningeal and/or central nervous system, bone and/or joint, genitourinary, abdominal, skin, other sites, disseminated, and multiple sites (extrapulmonary TB cases from different sites were combined to obtain at least ten extrapulmonary TB cases).
Country income status. Country income status was classified according to the World Bank List of Economies [19].
Exclusion criteria. The following studies were excluded: (1) studies published before 1990; (2) animal studies; (3) conference abstracts and proceedings; (4) studies on the detection of latent TB infection; (5) studies on nontuberculous mycobacterial infection; (6) studies that used non-immunological methods for detection of antibodies; and (7) basic science literature that focused on detection/cloning of new antigens or their immunological properties (i.e., early pre-clinical studies).

Search Methods
We updated the database searches [...]. In addition to database searches, we also searched reference lists of eligible papers and related reviews, and contacted authors and researchers in the field to identify additional potentially relevant published studies. For lack of time, we did not specifically seek to identify unpublished studies.

Study Selection
Initially, two reviewers (KRS and LLF) independently screened the accumulated citations for relevance and then independently reviewed full-text articles using prespecified eligibility criteria. Disagreements about study selection were resolved by discussion.

Data Extraction
A data extraction form was created and pilot-tested with a subset of eligible studies and then finalized. Two reviewers independently extracted data from included studies with the standardized form on the following characteristics: study design; age group (children <15 y of age); HIV status; case country of residence; sputum smear status (pulmonary TB); site of TB (extrapulmonary TB); assay type (e.g., ELISA, immunochromatographic test); antibody class detected (IgG, IgM, and IgA); serological test name; antigen composition; condition of the specimen (fresh or frozen); and sensitivity and specificity (data were extracted as true positives, false positives, false negatives, and true negatives). In some cases, study investigators evaluated more than one diagnostic test with the same set of participants. In these situations, we extracted data for each test and considered each dataset to be an independent study. For example, Anderson et al. [20] contributed three studies evaluating three serological tests: (1) InBios Active TbDetect IgG ELISA (InBios International); (2) IBL M. tuberculosis IgG ELISA (IBL-Hamburg); and (3) anda-TB IgG (Anda Biologicals). The agreement between reviewers on data extraction for sensitivity and specificity was 100%. Other differences between the reviewers (these differences mainly concerned the methodological quality assessment) were resolved by discussion. When necessary, we contacted authors of papers identified through the updated literature search for additional information.
While extracting data, we looked for studies that considered the added value of serological tests to determine if they contributed to active TB diagnosis beyond that ascertained by conventional tests such as symptoms, sputum smears, and chest radiographs. In particular, we looked for studies comparing microscopy with microscopy plus serology or studies that performed multivariable analysis. Since we did not identify any studies of this type, we considered studies in smear-negative patients to provide indirect evidence of the use of serology as an add-on test to microscopy. In addition, we looked for information on patient-important outcomes. Patient-important outcomes for this review could include an increased number of TB patients detected, decreased time to starting treatment, increased number of patients starting TB treatment, decreased number of false-positive TB patients treated, and decreased number of patients lost because of a reduced number of visits. Finally, we looked for information on the values and preferences of patients associated with these tests.

Assessment of the Methodological Quality of Individual Studies
Two reviewers (KRS and LLF) independently assessed study quality using the core set of 11 items from Quality Assessment of Diagnostic Accuracy Studies (QUADAS), a validated tool to evaluate the presence of bias and variation in diagnostic accuracy studies [21]. As recommended, we scored each item as "yes," "no," or "unclear." We considered representative patient spectrum (i.e., was the spectrum of patients representative of the patients who will receive the test in practice?) to be persons suspected of having active TB who were consecutively or randomly enrolled. For pulmonary TB, a score of "yes" for representative spectrum also required that patients were evaluated in an outpatient setting. For all studies, we scored the following six items as "yes": acceptable reference standard (the reference standard was a criterion for inclusion); acceptable delay between serological test and reference standard; partial verification avoided; incorporation avoided; reference standard results blinded (as culture result was considered to be entirely objective in interpretation); and relevant clinical information. "Differential verification avoided" was scored as "yes" if all participants suspected of having TB were evaluated with the same reference standard or if participants without TB were reported to be asymptomatic and healthy. "Index test (serological test) result blinded" (to reference standard result) was scored as "yes" if this was explicitly stated in the paper. "Uninterpretable results reported" was scored as "yes" if uninterpretable results were described or there was a statement about the absence of uninterpretable results. "Withdrawals explained" was scored as "yes" if a flow diagram or statement was included making it clear what happened to all participants in the study. Conflicts of interest are known to be a concern in diagnostic studies [22]. Therefore, we evaluated the involvement of test manufacturers.
Finally, we grouped studies according to the type of serological test or site of TB (in the case of extrapulmonary TB) and assessed study quality separately for each subgroup.

GRADE Quality of Evidence
We used the GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach, a transparent and systematic process for making judgments about quality of the evidence [23]. GRADE specifies four categories for quality: high, moderate, low, and very low. These categories are then applied to the body of evidence for each outcome, rather than to individual studies. In the GRADE approach, the categories reflect the extent of confidence that an estimate of effect is correct [24]. Quality begins with a consideration of study design (e.g., randomized controlled trials and cross-sectional studies in patients with diagnostic uncertainty are considered high quality) and may be compromised by five factors: limitations (risk of bias assessed by QUADAS, such as absence of consecutive or random selection of participants and lack of blinding of test results), indirectness (lack of generalizability and use of test results as surrogates for patient-important outcomes), inconsistency (unexplained heterogeneity), imprecision (wide confidence intervals for estimates of test accuracy), and risk of publication bias [23].

Data Analysis
Descriptive analyses were performed using SPSS (version 14.0.1.366). For each study, the sensitivity and specificity of the serological test along with the 95% CIs were calculated, and forest plots were generated to display sensitivity and specificity estimates using Review Manager 5.0 (The Nordic Cochrane Center). Heterogeneity was assessed visually using forest plots (Review Manager 5.0).

Selection of Subgroups for Meta-Analysis
We recognized that studies were heterogeneous in many respects, particularly concerning the serological test used, antibody class detected, sputum smear status (pulmonary TB), and site of extrapulmonary TB. Therefore, in order to address heterogeneity and combine study results, subgroups of "comparable" tests and extrapulmonary sites were prespecified. When possible, studies were stratified by smear and HIV status. For meta-analysis, at least four studies were required to be available for inclusion in a subgroup, in order to strengthen results and reduce the possibility of finding a significant result by chance. This classification resulted in seven subgroups for meta-analysis (four pulmonary and three extrapulmonary TB subgroups). As noted above, in some cases, study investigators evaluated more than one diagnostic test with the same participants; therefore, some meta-analyses included the same individuals multiple times.
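The minimum-size rule for subgroups can be illustrated with a short sketch. The record layout, the field names, and the test name "hypothetical test X" are assumptions made for illustration only; the counts of seven smear-positive and four smear-negative anda-TB IgG studies mirror the numbers reported later in this review.

```python
from collections import defaultdict

MIN_STUDIES = 4  # threshold stated in the text for pooling a subgroup


def eligible_subgroups(studies, min_studies=MIN_STUDIES):
    """Group study records by (test, smear status) and keep only the
    groups that contain at least `min_studies` studies."""
    groups = defaultdict(list)
    for study in studies:
        groups[(study["test"], study["smear"])].append(study["id"])
    return {key: ids for key, ids in groups.items() if len(ids) >= min_studies}


# Illustrative records only (not extracted data).
studies = (
    [{"id": f"P{i}", "test": "anda-TB IgG", "smear": "positive"} for i in range(7)]
    + [{"id": f"N{i}", "test": "anda-TB IgG", "smear": "negative"} for i in range(4)]
    + [{"id": f"X{i}", "test": "hypothetical test X", "smear": "positive"} for i in range(2)]
)
pooled = eligible_subgroups(studies)
```

Here `pooled` keeps the two anda-TB IgG strata and drops the two-study group, which would be reported descriptively rather than pooled.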
To summarize test performance within each subgroup, we carried out bivariate meta-analyses that jointly modeled sensitivity and specificity. These models weighted studies according to the sampling variability within studies as well as the unexplained heterogeneity between studies using a random effects approach [25]. Subgroups were considered homogeneous with respect to a number of observed variables. However, within a subgroup, it is likely that the heterogeneity between studies could be explained by measurable but unobserved quantities, e.g., the positivity cutoff, that we could not address. Therefore, for the pooled results of studies in the meta-analysis, we did not attempt to quantify unobserved heterogeneity using statistics such as I² or chi-squared [18]. The model was estimated using a Bayesian approach with nonsubjective prior distributions and implemented using WinBUGS (version 1.4.1) [26]. We used Wilson's method for estimating the credible interval as this method performs well even when the probability or the sample size is small [27]. Finally, a hierarchical summary receiver operating characteristic (HSROC) curve was plotted for selected meta-analyses. The HSROC curve plots sensitivity versus specificity and provides information on the overall performance of a test across different thresholds. The closer the curve is to the upper left-hand corner of the plot (sensitivity and specificity are both 100%), the better the performance of the test [28]. The plots were made using R (version 2.6.1) [29].
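As an illustration of the interval method named above, the following sketch computes the standard Wilson score interval for a single binomial proportion, such as the sensitivity of one study. It is an assumption that this textbook form matches what was applied here; the text does not say how the method was combined with the Bayesian model, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for a 95% interval.
    Returns (lower, upper), clamped to [0, 1].
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)
```

Unlike the simple normal approximation, this interval does not collapse to zero width when the observed proportion is 0% or 100%, which matters for the small studies in this review that reported such values.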

Pulmonary TB
Results of the search. Initially, 4,256 citations were identified (Figure 1). After screening titles and abstracts, 160 potentially relevant full-text papers were retrieved. Thirty-one papers (20 from the original review and 11 from the update) describing 67 studies and involving 5,147 participants (sample size = 8,318) were included in the review [20, ...]. A list of excluded articles with their reasons for exclusion is provided in Text S4.
Included studies. Of 67 total studies, six (9%) were reported in languages other than English: Spanish (2), Turkish (1), Chinese (1), Bosnian (1), and Russian (1). Thirty-two (48%) studies were conducted in low- and middle-income countries. No studies were randomized controlled trials; 55% of studies used a cross-sectional study design, and 45% of studies used a case-control study design. All but one study reported recruiting TB and non-TB patients from the same underlying population [56]. One study involved HIV-infected individuals, and no studies involved children. Thirty-one (46%) studies involved smear-positive patients, 28 (42%) studies involved smear-negative patients, and eight (12%) studies involved patients with unspecified smear status. Fifty-four (81%) studies used ELISA, 12 (18%) studies used an immunochromatographic assay, and one study used a kaolin precipitation test. The majority of studies detected only IgG antibody (44 studies) and used frozen serum (51 studies). The median number of TB patients included in each study was 41 (interquartile range 33 to 54). Eighteen serological tests were included; anda-TB (IgG, IgA, and IgM) was the test most frequently evaluated (16/67 [24%]) (Table 1). The antigen composition for five (28%) of the total 18 tests was considered proprietary information. Of the tests with known antigens, all had unique antigenic compositions except for anda-TB and Hexagon, which both contained antigen A60.
One study directly compared a serological test to sputum microscopy. No studies evaluated the incremental value of adding a serological test after smear microscopy. However, as noted, 28 (42%) studies involved smear-negative patients. These studies were considered a proxy for a diagnostic strategy using serological tests in addition to microscopy. No studies reported on patient-important outcomes or patient values and preferences concerning these tests. Characteristics of included studies are described in Table S2.
Methodological quality, all included studies. As assessed with QUADAS, studies had very serious limitations. Of the total 67 studies, only 19 (28%) were considered to include a representative patient population (we scored this item as "yes" when ambulatory patients suspected of having active TB were randomly or consecutively selected), and 34 (51%) studies reported blinding of the serological test result. The majority (60%) of studies reported industry involvement, mainly the donation of test kits (Figure 2). We downgraded two points for limitations in the GRADE Evidence Profile (Table 3).
Test performance, all studies. As seen from the forest plots in Figure 3, studies displayed considerable heterogeneity, with sensitivity values ranging from 0% to 100% and specificity values, from 31% to 100%. We did not pool accuracy estimates because of the heterogeneity among studies. Similarly, when restricted to studies conducted in low- and middle-income countries, sensitivity (16% to 91%) and specificity (31% to 100%) were highly variable (data not shown).
Methodological quality, studies in smear-negative patients. As assessed with QUADAS, only 14 (50%) of the total 28 studies were considered to include a representative patient population. A majority (75%) of studies reported blinding of the serological test result (Figure 4). We downgraded one point for limitations in the GRADE Evidence Profile (Table 4).
Test performance, studies in smear-negative patients. For individual studies involving smear-negative patients, sensitivity values ranged from 29% to 77%, and specificity values, from 77% to 100% (Figure 5). We did not pool accuracy estimates because of the considerable heterogeneity among studies. As noted above, studies involving smear-negative patients were considered to provide indirect evidence of a diagnostic strategy using microscopy plus serology. Hence, as an add-on test, serological tests provided inconsistent sensitivity and specificity.
Analysis by subgroups. According to our prespecified analysis plan, there was a sufficient number of studies to perform a meta-analysis for only one serological test, anda-TB IgG, with results stratified by smear status (seven studies of smear-positive and four studies of smear-negative patients). In studies of smear-positive patients, one study was conducted in a low-income country [43]. In studies of smear-negative patients, no studies were conducted in a low- or middle-income country.
Methodological quality of studies evaluating anda-TB IgG. In studies of smear-positive patients, no studies were considered to have a representative patient population (participants were known TB cases rather than suspected cases, were inpatients, and/or were enrolled by convenience), and only two studies reported blinding of the serological test result [43,52] (Figure S1). In studies of smear-negative patients, no studies were considered to have a representative patient population (participant selection was by convenience or not reported), and only one study reported blinding of the serological test result [52] (Figure S2).
Meta-analysis. In studies involving smear-positive patients, anda-TB IgG yielded a pooled sensitivity of 76% (95% CI 63-87) and a pooled specificity of 92% (95% CI 74-98). In studies involving smear-negative patients, the pooled sensitivity of anda-TB IgG decreased to 59% (95% CI 10-96); the 95% CI was very wide, reflecting the imprecision of the sensitivity estimate. Pooled specificity was 91% (95% CI 79-96) (Table 2). The HSROC curves show the decreased performance of the test in smear-negative patients compared with smear-positive patients (Figure 6).
[...] (Figure 7). The probability that the pooled sensitivity of ELISA tests exceeds that of immunochromatographic assays was estimated at 0.88.
TB in HIV-infected patients. The only study identified involving HIV-infected patients compared the performance of the [...]
Included studies. Of 25 total studies, ten (40%) studies were conducted in low- and middle-income countries. All papers were written in English. Only one study involved HIV-infected individuals. The vast majority (88%) of studies involved adults. In two studies (reported from one paper), 13 of 35 extrapulmonary TB cases occurred in children; however, data were not provided separately for the children [67]. One study specified 13 y as the minimum age for eligibility; however, the age range for the enrolled participants was not reported [64]. Of 25 total studies, serological tests were evaluated for diagnosis of the following forms of extrapulmonary TB: lymph node, six studies; pleural, five studies; multiple sites, five studies (see Table S3 for a list of sites involved); genitourinary, two studies; disseminated, four studies; and meningeal, one study. In two studies, the site of extrapulmonary involvement was not reported. Six distinct serological tests were evaluated; 17 (68%) of the total 25 studies used anda-TB (IgG, ten studies; IgM, five studies; IgA, one study; IgM plus IgA, one study). ELISA was used in 21 (84%) studies, and immunochromatographic assays, in four studies. The majority (72%) of studies detected IgG antibodies. The condition of the specimen was frozen in six (24%) studies and not reported in 19 studies. The median number of TB patients included in each study was 35 (interquartile range 30 to 56). No studies reported on patient-important outcomes or patient values and preferences concerning these tests. Characteristics of included studies are described in Table S3.
Methodological quality, all included studies. Of the 25 total studies, only one (4%) study was considered to include a representative patient population, and only four (16%) studies reported blinding of the serological test result (Figure 8).
Test performance, all studies. As seen from the forest plots in Figure 9, studies displayed considerable heterogeneity, with sensitivity values ranging from 0% to 100%, and specificity values, from 59% to 100%. We did not pool accuracy estimates because of the heterogeneity among studies.
Analysis by subgroups. There was a sufficient number of studies to perform a meta-analysis for only one serological test, anda-TB IgG (ten studies). Eight studies were conducted in high-income countries, one study was conducted in India [61], and one study, in Turkey [66].
Table 3 notes (partial). There was considerable heterogeneity in study results.
f We did not pool accuracy estimates. The 95% CIs were wide for many individual studies. We did not downgrade as there were a large number of studies and we had already taken off two points for inconsistency.
g Data included in the review did not allow for formal assessment of publication bias using methods such as funnel plots or regression tests. Therefore, publication bias cannot be ruled out. It is prudent to assume some degree of publication bias as studies showing poor performance of serological tests were probably less likely to be published. No points were deducted.
doi:10.1371/journal.pmed.1001062.t003
Methodological quality of studies evaluating anda-TB IgG. As assessed with QUADAS, studies had very serious limitations. No studies were considered to have a representative population (no studies reported selecting participants in a consecutive or random manner), and no studies reported interpreting the serological test result without knowledge of the results of the reference standard.
TB in HIV-infected patients. The only study identified involving HIV-infected patients evaluated the performance of the MycoDot test (Mossman Associates) in a cross-sectional study of patients suspected of having TB in Thailand [68]. In all, 142 HIV-infected (mean CD4 cell count = 188 cells/mm³ [range 7 to 632]) and 144 HIV-uninfected patients with newly diagnosed TB participated in the study, of whom 50 patients (40 HIV-infected and ten HIV-uninfected patients) had a diagnosis of lymph node TB established by culture or histopathological examination. Compared with the sensitivity of MycoDot in HIV-uninfected TB patients (80%, 95% CI 44-98), the sensitivity of the test in HIV-infected TB patients was considerably lower (33%, 95% CI [...]). The specificity in both groups was 97% (95% CI 93-99) (Table S3).

GRADE Evidence Profiles
For the pulmonary TB studies, the quality of the body of evidence supporting TB serology's estimates of sensitivity and specificity was graded as "very low" (Tables 3 and 4). Thus, regardless of the width of the 95% CIs (which reflects the size of studies and the standard deviation of their measured results), we have very low confidence in the estimates obtained from pooling studies in the meta-analysis [23]. For the extrapulmonary TB studies, the final quality grades were also very low (data not shown).
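The point scheme given in the Table 4 notes (no points subtracted: high; one point: moderate; two points: low; more than two points: very low) can be sketched as a simple lookup. The function name is hypothetical, and the sketch assumes a body of evidence that starts at "high". In the usage example, the one-point deduction for limitations and the two-point deduction for inconsistency are taken from the text; the size of the indirectness deduction is not stated in this section and is set to one point purely as an assumption.

```python
def grade_quality(points_subtracted):
    """Map total points subtracted across the five GRADE factors to a
    quality category, assuming the evidence starts at 'high'."""
    total = sum(points_subtracted.values())
    if total == 0:
        return "high"
    if total == 1:
        return "moderate"
    if total == 2:
        return "low"
    return "very low"  # more than two points subtracted


smear_negative_profile = {
    "limitations": 1,       # stated in the text
    "indirectness": 1,      # assumption: size of deduction not stated here
    "inconsistency": 2,     # stated in the table notes
    "imprecision": 0,       # stated: not downgraded
    "publication bias": 0,  # stated: no points deducted
}
rating = grade_quality(smear_negative_profile)
```

Even without the assumed indirectness deduction, three points are subtracted, so the rating is "very low" either way.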

Discussion
This updated systematic review assessing the diagnostic accuracy of commercial serological tests for pulmonary and extrapulmonary TB summarizes the current literature and includes 14 new papers (approximately 30% of the included papers) identified since our previous reviews [12,13]. Unlike the earlier reviews, in the update, we performed a meta-analysis using a bivariate random effects model to account for the variability in test accuracy across studies. Findings from the current review are similar to those of the previous review: studies of current serological tests show that these tests provide inaccurate and imprecise estimates of sensitivity and specificity.
Table 4. GRADE evidence profile: should commercial serological tests be used as an "add on" test to smear microscopy in patients of any age suspected of having pulmonary TB?
This table includes studies conducted in smear-negative patients as a proxy for a diagnostic strategy using serological tests in addition to smear microscopy. Based on sample size = 3,433, sensitivity median = 61% and specificity median = 92%. The quality of evidence was rated as high (no points subtracted), moderate (one point subtracted), low (two points subtracted), or very low (>2 points subtracted) based on five factors: study limitations, indirectness of evidence, inconsistency in results across studies, imprecision in summary estimates, and likelihood of publication bias. For each outcome, the quality of evidence started at high when there were randomized controlled trials or high-quality observational studies (cross-sectional or cohort studies enrolling patients with diagnostic uncertainty) and at moderate when these types of studies were absent. No points were subtracted when there were negligible issues identified; one point was subtracted when there was a serious issue identified; two points were subtracted when there was a very serious issue identified in any of the criteria used to judge the quality of evidence. Points subtracted are in parentheses. Publication bias was rated as "not likely," "likely," or "very likely" [23].
a What do these results mean given 10% disease prevalence among individuals being screened for TB?
b Outcomes were ranked by their relative importance as critical, important, or of limited importance. Ranking helped to focus attention on those outcomes that were considered most important.
c Only 14/28 (50%) studies were considered to include a representative patient population; 75% of studies reported blinding of the serological test result.
d We downgraded for indirectness because these studies were used as a proxy for a diagnostic strategy using serological tests in addition to smear microscopy.
e There was considerable heterogeneity in study results.
f We did not pool accuracy estimates. The 95% CIs were wide for many individual studies. We did not downgrade as there were a large number of studies and we had already taken off two points for inconsistency.
g Data included in the review did not allow for formal assessment of publication bias using methods such as funnel plots or regression tests. Therefore, publication bias cannot be ruled out. It is prudent to assume some degree of publication bias as studies showing poor performance of serological tests were probably less likely to be published. No points were deducted.
doi:10.1371/journal.pmed.1001062.t004
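Table 4 note a asks what the results mean at 10% disease prevalence. A minimal worked sketch follows, using the median sensitivity (61%) and median specificity (92%) reported in the Table 4 notes. The cohort size of 1,000 and the function name are assumptions chosen for illustration; the resulting counts are arithmetic consequences of those inputs, not figures reported in the review.

```python
def expected_counts(sensitivity, specificity, prevalence, n_tested=1000):
    """Expected 2x2 counts when n_tested people are screened."""
    with_tb = n_tested * prevalence
    without_tb = n_tested - with_tb
    tp = sensitivity * with_tb      # TB cases detected
    fn = with_tb - tp               # TB cases missed
    tn = specificity * without_tb   # correctly ruled out
    fp = without_tb - tn            # people without TB labeled positive
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


counts = expected_counts(sensitivity=0.61, specificity=0.92, prevalence=0.10)
# Share of positive results that are true TB cases (positive predictive value).
ppv = counts["TP"] / (counts["TP"] + counts["FP"])
```

Under these inputs, roughly 61 of 100 TB cases would be detected and 39 missed, while about 72 of 900 people without TB would test positive, so fewer than half of all positive results would correspond to true TB.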
In the earlier systematic reviews, we recommended the use of guidelines such as STARD (Standards for the Reporting of Diagnostic Accuracy Studies) [70] and QUADAS [21] to improve methodological study quality. In the current review, within-study quality continues to be a concern. For example, in the pulmonary TB group there were 16 new studies. Six of these studies (three papers) were published subsequent to the previous reviews [20,37,49]. Four of the six studies selected participants by convenience or did not report the manner of selection (selection bias), and no studies reported that the serological test result was interpreted without knowledge of the reference standard. Selection bias and absence of blinding are features of study design that have been associated with exaggerated accuracy estimates [71,72].
A substantial contribution of the current review is the use of the GRADE approach. This framework enabled us to synthesize data on the quality of the body of evidence in a way that was not possible for the previous systematic reviews [12,13] because GRADE was not well developed for diagnostic studies at that time. The very low quality of evidence for the studies evaluating anda-TB IgG in smear-negative patients decreases our confidence in the pooled sensitivity and specificity estimates. In this subgroup, applying the GRADE approach, quality was compromised by three factors: (1) risk of bias: no studies recruited participants in a random or consecutive manner, and only one study reported blinded interpretation of the serological test result; (2) indirectness: no studies were conducted in low- or middle-income countries, limiting generalizability to these settings; and (3) imprecision. If the pooled estimates of test accuracy had been derived from high-quality studies, then the serological test might have been shown to have some clinical utility for contributing to diagnostic algorithms for smear-negative TB, especially since the tests are relatively inexpensive, rapid, and easy to perform. However, the very low quality of the evidence implies that the serological test cannot be recommended.

Strengths and Limitations
Strengths of our review include the use of a standard protocol and comprehensive search strategy, two independent reviewers at all stages of the review process, the assessment of methodological quality of individual studies with the QUADAS tool, and the use of the GRADE approach. Heterogeneity is to be expected in results of diagnostic test accuracy studies [73]. Therefore, we prespecified subgroups to limit heterogeneity and, as noted above, used a bivariate random effects model. Our review also had limitations. Notably, the majority of studies were not considered to include patients with a representative spectrum of disease severity. Differing criteria for patient selection and greater duration and severity of illness of the study populations may have introduced variability in findings among studies. In addition, the majority of studies were not performed in a blinded manner, or blinding was not explicitly stated. Also, the meta-analysis was limited by the small number of studies for a particular serological test. anda-TB IgG was the only test with enough studies for meta-analysis. Clearly, having more studies would have allowed us to examine observed, study-level covariates that could be sources of heterogeneity. An additional limitation was that, in some cases, we assumed that multiple results carried out on the same sample were independent. By doing so, our meta-analysis model may have underestimated heterogeneity and overestimated precision of the pooled sensitivity and specificity estimates by including a larger number of participants. Subgroup analyses in a meta-analysis, like subgroup analyses in a clinical trial, are vulnerable to bias; therefore, the findings of this meta-analysis should be interpreted with caution [74]. Although we tried to address language bias by performing the updated literature search in all languages, the original literature search was limited to studies published in English, and language bias remains a possibility.
Finally, our review did not allow for formal assessment of publication bias using methods such as funnel plots or regression tests because such techniques have not been adequately evaluated for diagnostic data [75]. Therefore, publication bias cannot be ruled out. It is prudent to assume some degree of publication bias, as studies showing poor performance of serological tests may have been less likely to be published, especially because several studies were industry supported.
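The limitation concerning non-independent results can be illustrated with a simple single-proportion example. The sketch below is not the bivariate random effects model used in the review; it uses a Wilson score interval for one sensitivity estimate, with hypothetical counts, to show how counting each sample twice as if the results were independent narrows the confidence interval without adding information.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical data: 60 of 100 confirmed TB patients test positive.
low, high = wilson_interval(60, 100)

# The same samples counted twice, treated as 200 independent results.
low_dup, high_dup = wilson_interval(120, 200)

print(f"n=100: width {high - low:.3f}")
print(f"n=200: width {high_dup - low_dup:.3f}")  # narrower, same point estimate
```

The point estimate is identical in both cases; only the apparent precision changes, which is why the pooled estimates should be read with caution.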
This systematic review focused on test accuracy (i.e., sensitivity and specificity). Although we looked for information on patient-important outcomes (meaning a serological test used in a given situation results in a clinically relevant improvement in patient care and/or outcomes), we did not find this information in the literature reviewed. We did not identify studies with the specific aim of detecting the value of serology over and above conventional tests such as smears. However, the WHO Special Programme for Research and Training in Tropical Diseases report on rapid serological tests for TB mentioned above did evaluate the added value of smear plus serology and reported a gain equivalent to the detection of 57% of the smear-negative, culture-positive TB cases. There was, however, a corresponding unacceptable decrease in specificity to 58% [15].
In conclusion, published data on commercial serological tests produce inconsistent and imprecise estimates of sensitivity and specificity, and the quality of the body of evidence on these tests remains disappointing. This systematic review included evaluations of only commercially available antibody-based detection tests. Considerable research is underway on new approaches to the serological diagnosis of TB. These approaches include the use of newly identified selected purified recombinant antigens and antigen combinations [14]. Recent studies from a number of laboratories have reported several new potential candidate antigens that may be expected to lead to improved antibody detection tests for TB in the future. These conclusions should be reconsidered if, in the future, methodologically adequate research evaluating serological tests becomes available.
The findings from this systematic review were used as the input for a cost-effectiveness study of serological testing for active TB in India [76]. In comparison with sputum microscopy, serological testing resulted in fewer disability-adjusted life years averted and more false-positive diagnoses and secondary infections, while increasing costs to the Indian TB control sector approximately 4-fold. This cost-effectiveness study and the findings from our updated systematic review were considered by a WHO Expert Group on Serodiagnostics, and in July 2011, the WHO published a policy statement on commercial serodiagnostic tests for diagnosis of TB. The policy states that ''Commercial serological tests provide inconsistent and imprecise estimates of sensitivity and specificity. There is no evidence that existing commercial serological assays improve patient-important outcomes, and high proportions of false-positive and false-negative results adversely impact patient safety. Overall data quality was graded as very low, with harms/risks far outweighing any potential benefits (strong recommendation). It is therefore recommended that these tests should not be used in individuals suspected of active pulmonary or extra-pulmonary TB, irrespective of their HIV status.'' The WHO policy strongly encourages targeted further research to identify new/alternative point-of-care tests for TB diagnosis and/or serological tests with improved accuracy [77].
Text S1. PRISMA statement.

Editors' Summary
Background. Every year nearly 10 million people develop tuberculosis (a contagious bacterial infection), and about two million people die from the disease. Mycobacterium tuberculosis, the bacterium that causes tuberculosis, is spread in airborne droplets when people with the disease cough or sneeze. It usually infects the lungs (pulmonary tuberculosis) but can also infect the lymph nodes, bones, and other tissues (extrapulmonary tuberculosis). The characteristic symptoms of tuberculosis are a persistent cough, weight loss, and night sweats. Diagnostic tests for the disease include microscopic examination of sputum (mucus brought up from the lungs by coughing) for M. tuberculosis bacilli, chest radiography, mycobacterial culture (in which bacteriologists try to grow M. tuberculosis from sputum or tissue samples), and nucleic acid amplification tests (which detect the bacterium's genome in patient samples). Tuberculosis can usually be cured by taking several powerful drugs daily or several times a week for at least six months.
Why Was This Study Done? Although efforts to control tuberculosis have advanced over the past decade, missed tuberculosis diagnoses and mismanaged tuberculosis continue to fuel the global epidemic. A missed diagnosis may lead to more severe illness and death, especially for people infected with both tuberculosis and HIV. Also, a missed diagnosis means that an untreated individual with pulmonary tuberculosis may remain infectious for longer, continuing to spread tuberculosis within the community. Missed diagnoses are a particular problem in resource-limited countries where sputum microscopy and chest radiography often perform poorly and other diagnostic tests are too expensive and complex for routine use. Serological tests, which detect antibodies against M. tuberculosis in the blood (antibodies are proteins made by the immune system in response to infections), might provide a way to diagnose tuberculosis in resource-limited countries. Indeed, many serological tests for tuberculosis diagnosis are on sale in developing countries. However, because of doubts about the accuracy of these commercial tests, they are not recommended for use in routine practice. In this systematic review and meta-analysis, the researchers assess the diagnostic accuracy of commercial serological tests for pulmonary and extrapulmonary tuberculosis. A systematic review uses predefined criteria to identify all the research on a given topic; meta-analysis is a statistical method that combines the results of several studies.
What Did the Researchers Do and Find? The researchers searched the literature for studies that evaluated serological tests for active tuberculosis published between 1990 and 2010. They used data from these studies to calculate each test's sensitivity (the proportion of patients with a positive serological test among patients with tuberculosis confirmed by a reference method; a high sensitivity indicates that the test detects most patients with tuberculosis) and specificity (the proportion of patients with a negative serological result among people without tuberculosis; a high specificity means the test gives few false-positive diagnoses). They also assessed the methodological quality of each study and rated the overall quality of the evidence. The researchers found 67 studies (half from low/middle-income countries) that evaluated serological tests for the diagnosis of pulmonary tuberculosis. The sensitivity of these tests varied between studies, ranging from 0% to 100%; their specificities ranged from 31% to 100%. For the anda-TB IgG test, the only test with sufficient studies for a meta-analysis, the pooled sensitivity from the relevant studies was 76% in smear-positive patients and 59% in smear-negative patients. The pooled specificities were 92% and 91%, respectively. The researchers found 25 studies (40% from low/middle-income countries) that evaluated serological tests for the diagnosis of extrapulmonary tuberculosis. Again, sensitivities and specificities for each test varied greatly between studies, ranging from 0% to 100% and 59% to 100%, respectively. Overall, for both pulmonary and extrapulmonary tuberculosis, the quality of evidence from the studies of the serological tests was graded very low.
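The definitions of sensitivity and specificity given above translate directly into two ratios computed from a study's 2x2 table. The following is a minimal sketch with made-up counts for a single hypothetical study; the function names and numbers are illustrative assumptions and do not come from any study in the review.

```python
def sensitivity(true_pos, false_neg):
    """Share of reference-confirmed TB patients with a positive serological test."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of people without TB who have a negative serological test."""
    return true_neg / (true_neg + false_pos)


# Hypothetical study: 50 confirmed TB patients, 200 people without TB.
sens = sensitivity(true_pos=30, false_neg=20)    # 30 of 50 TB patients detected
spec = specificity(true_neg=180, false_pos=20)   # 180 of 200 non-TB test negative

print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

In this made-up example the test would detect 30 of 50 patients with tuberculosis and give a false-positive result for 20 of 200 people without it.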
What Do These Findings Mean? This systematic review, which updates an analysis published in 2007, indicates that commercial serological tests do not provide an accurate diagnosis of tuberculosis. This finding confirms previous systematic reviews of the evidence, despite a recent expansion in the relevant literature. Moreover, the researchers' analysis indicates that the overall quality of the body of evidence on these tests remains poor. Many of the identified studies used unsatisfactory patient selection methods, for example. Clearly, there is a need for continued and improved research on existing serological tests and for research into new approaches to the serological diagnosis of tuberculosis. For now, though, based on these findings, cost-effectiveness data, and expert opinion, the World Health Organization has issued a recommendation against the use of currently available serological tests for the diagnosis of tuberculosis, while stressing the importance of continued research on these and other tests that could provide quick and accurate diagnosis of TB.