Accuracy of serological tests for diagnosis of chronic pulmonary aspergillosis: A systematic review and meta-analysis

Chronic pulmonary aspergillosis (CPA) is a slow and progressive disease that develops in preexisting lung cavities of patients with tuberculosis sequelae, and it is associated with a high mortality rate. Serological tests such as the double agar gel immunodiffusion (DID) test or the counterimmunoelectrophoresis (CIE) test have been routinely used for CPA diagnosis in the absence of positive cultures. However, these tests have been replaced with enzyme-linked immunoassay (ELISA), performed using a variety of methods. This systematic review compares ELISA accuracy to reference test (DID and/or CIE) accuracy in CPA diagnosis. It was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). The study was registered in PROSPERO under the registration number CRD42016046057. We searched the electronic databases MEDLINE (PubMed), EMBASE (Elsevier), LILACS (VHL), Cochrane library, and ISI Web of Science. Gray literature was searched using Google Scholar and conference abstracts. We included articles with patients or serum samples from patients with CPA who underwent two serological tests: ELISA (index test) and DID and/or CIE (reference test). Test accuracy was the outcome. Original articles were considered without restriction of date or language. The pooled sensitivity, specificity, and summary receiver operating characteristic curves were estimated. We included 14 studies in the review, but only four were included in the meta-analysis. The pooled sensitivity and specificity were 0.93 and 0.97 for the ELISA test. These values were 0.64 and 0.99 for the reference test (DID and/or CIE). Analyses of summary receiver operating characteristic curves yielded an area under the curve of 0.99 for ELISA and 0.99 for the reference test (DID and/or CIE). Our meta-analysis suggests that the diagnostic accuracy of ELISA is greater than that of the reference tests (DID and/or CIE) for early CPA detection.

Introduction STARD 2015 [15]. A systematic review protocol was developed and registered in the International Prospective Register of Systematic Reviews-CRD42016046057. We used the Cochrane recommendations to report systematic reviews and meta-analyses of studies on diagnostic accuracy [16].

Eligibility criteria
The inclusion criteria comprised studies in which patients, or serum samples from patients, diagnosed with aspergilloma or CPA were subjected to an immunoenzymatic test (ELISA) and to a DID and/or CIE test. The accuracy of the tests was defined as the primary outcome. Original studies were included without restriction based on language, geographical location, or publication date. We excluded studies with children or animals and in vitro studies. One article in Japanese that had been selected for full-text reading could not be obtained, because it was not available through the international interlibrary document delivery service.

Information sources and search strategies
The following databases were searched for studies: MEDLINE (through PubMed), EMBASE (through Elsevier), LILACS (through VHL), Cochrane library, and ISI Web of Science. Gray literature was searched in Google Scholar and congress abstracts. The search was run up to June 2019.
We used the following search strategy for MEDLINE and adapted it for the other databases: pulmonary aspergillosis AND serologic test (and its synonyms). ("Pulmonary Aspergillosis" [Mesh] or Aspergillosis, Pulmonary or Pulmonary Aspergillosis or Lung Aspergillosis or Aspergillosis, Lung or Aspergillosis, Lung or Bronchopulmonary Aspergillosis or Aspergillosis, Bronchopulmonary or Bronchopulmonary Aspergillosis or Aspergillosis, Bronchopulmonary or Aspergillose, Bronchopulmonary or Bronchopulmonary Aspergillose) AND ("Serologic Tests" [Mesh] or Serological Tests or Serological Tests or Serological Tests, Serological or Tests, Serologic or Serologic Tests or Serologic Tests or Serodiagnoses).

Study selection and data extraction
Titles were imported from EndNote Online, and duplicate studies were removed. The remaining titles were independently reviewed by two authors (TFS and SMVLO), who selected the article abstracts and summarized the complete texts for evaluation. The divergences were resolved by a third expert reviewer (RPM). Two other authors (CEVC and JV) performed independent evaluations of the complete articles and judged the methodological quality of the included studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [17]. The divergences were resolved by consensus among the researchers.
Two reviewers (CEVC, JV) independently extracted the following data from each included study:
• Study characteristics: author, year of publication, country, design, and sample size;
• Population characteristics: according to the inclusion criteria;
• Description of the index test and cut-off points;
• Description of the reference standard and cut-off points;
• QUADAS-2 items; and
• Accuracy results obtained in each study to construct a diagnostic contingency (2 × 2) table.

Assessment of methodological quality
For this review, we used the QUADAS-2 tool to assess the methodological quality of studies [17]. QUADAS-2 consists of four key domains: patient selection, index test, reference standard, and flow and timing. We assessed all domains for risk of bias (ROB) potential and the first three domains for applicability concerns. Risk of bias was judged as "low," "high," or "unclear." Two review authors independently completed QUADAS-2 and resolved disagreements through discussion.

Statistical analysis and data synthesis
We used data reported in the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) format to calculate sensitivity and specificity estimates and 95% confidence intervals (CIs) for individual studies. Summary positive (LR+) and negative (LR-) likelihood ratios and summary diagnostic odds ratios (DOR) were obtained from bivariate analysis. We used the clinical interpretation of likelihood ratios [18] as follows: conclusive evidence (LR+ > 10 and LR- < 0.1), strong diagnostic evidence (LR+ > 5 to 10 and LR- 0.1 to < 0.2), weak diagnostic evidence (LR+ > 2 to 5 and LR- 0.2 to < 0.5), and negligible evidence (LR+ 1-2 and LR- 0.5-1). In studies where it was possible to calculate sensitivity and specificity for the ELISA test and DID and/or CIE, we calculated test accuracy and Youden's J statistic. Youden's index values range from zero to one inclusive, with the expectancy that the test will show a greater proportion of positive results for the diseased group than the control [19].
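The per-study measures described above can be illustrated with a minimal Python sketch. It is not the software used in this review (the analyses were run in Stata and Meta-DiSc); the function names and the example counts are hypothetical, and the handling of LR+ values below 1 or LR- values above 1 (treated here as negligible) is an assumption, since the cited interpretation does not cover those ranges.

```python
def accuracy_measures(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Per-study accuracy measures from a 2 x 2 table (hypothetical helper)."""
    sens = tp / (tp + fn)                 # sensitivity
    spec = tn / (tn + fp)                 # specificity
    fpr = fp / (fp + tn)                  # 1 - specificity
    fnr = fn / (fn + tp)                  # 1 - sensitivity
    lr_pos = sens / fpr if fpr > 0 else float("inf")
    lr_neg = fnr / spec if spec > 0 else float("inf")
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "youden_j": sens + spec - 1,      # Youden's J statistic
    }


def interpret_lr_pos(lr_pos: float) -> str:
    """Clinical reading of LR+ using the thresholds quoted in the text."""
    if lr_pos > 10:
        return "conclusive"
    if lr_pos > 5:
        return "strong"
    if lr_pos > 2:
        return "weak"
    return "negligible"                   # assumption: also covers LR+ < 1


def interpret_lr_neg(lr_neg: float) -> str:
    """Clinical reading of LR- using the thresholds quoted in the text."""
    if lr_neg < 0.1:
        return "conclusive"
    if lr_neg < 0.2:
        return "strong"
    if lr_neg < 0.5:
        return "weak"
    return "negligible"                   # assumption: also covers LR- > 1


# Hypothetical counts, not taken from any included study.
example = accuracy_measures(tp=90, fp=5, tn=95, fn=10)
print(example)
print(interpret_lr_pos(example["lr_pos"]), interpret_lr_neg(example["lr_neg"]))
```

Classifying LR+ and LR- separately mirrors the way the two ratios are read independently in the results; a pooled bivariate estimate would still be needed for summary values.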
Studies were submitted to meta-analysis when three conditions were met: sample size greater than 20; sensitivity and specificity were available for the index and the reference tests; and control group was included in the analysis. We presented individual studies and pooled results graphically by plotting sensitivity and specificity (and their 95% CIs), heterogeneity, and receiver operating characteristic (ROC) space estimates using Stata software. For the subgroup analysis, we presented individual studies and pooled results in forest plots using Meta-DiSc software.

Investigations of heterogeneity
We investigated heterogeneity using subgroup analysis. First, we analyzed a subgroup with three studies that presented only healthy controls, maintaining high heterogeneity. Next, we analyzed a second subgroup with two of the most recent commercial testing studies. Thus, we found the main source of heterogeneity: in-house and commercial tests. In-house tests present many technical differences. We considered an I2 value close to 0% as having no heterogeneity between studies; close to 25%, low heterogeneity; close to 50%, moderate heterogeneity; and close to 75%, high heterogeneity between studies [20].
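For readers unfamiliar with the I2 statistic, the following short Python sketch shows how it is conventionally derived from Cochran's Q, assuming the standard Higgins definition, I2 = 100% × (Q - df)/Q truncated at zero. The function name and the Q values are hypothetical; the review itself obtained I2 from its statistical software.

```python
def i_squared(q: float, k: int) -> float:
    """I2 (%) from Cochran's Q and the number of studies k (hypothetical helper)."""
    if k < 2:
        raise ValueError("at least two studies are needed to assess heterogeneity")
    df = k - 1                            # degrees of freedom
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical Q values for a pool of four studies.
print(i_squared(12.0, 4))
print(i_squared(2.0, 4))
```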

Study inclusion
A total of 2160 articles were identified. Among these, 2096 were found through database searches, and 64 were identified from other sources (manual search). After removing duplicates, 1797 articles remained. After title/abstract exclusion, only 21 articles underwent full-text reading, and 14 of these were included in the systematic review. Only four studies were included in the meta-analysis (see Fig 1).

PLOS ONE
In one article, we could not identify the number of patients with CPA that was evaluated nor was it possible to extract data from the 2 × 2 table for DID and ELISA [28]. In two articles, it was not possible to recover the DID data [24,30]. In another article, data were not obtained from CIE [31]. In yet another article [32], it was not possible to extract ELISA data. In one study [33], 20 sera from 13 patients were used; it was not possible to extract the accurate data per patient, and control group data were not presented for the ELISA test. In three articles, the tests did not include a control group [25,27,29]. In one article, the control group included patient samples showing the presence of DID precipitation lines; we did not consider this to be a control group [24]. Only one study used participants with disease as controls [26].
During the extraction of ELISA antigen concentration data, five studies using in-house tests presented concentrations varying from 0.1 mcg to 250 mcg per well [21,22,30,33,34]. These concentrations were not reported in two other articles [28,32].
For the reference test, all studies had a low risk of bias with respect to correctly classifying the target condition. Bias risk assessment was uncertain or high risk in ten studies [21, 23-25, 27, 28, 30, 31, 33, 34] owing to a lack of clarity regarding whether the reference test was interpreted with or without knowledge of the index test results.
Regarding flow and timing, bias risk assessment was uncertain in nine studies [21, 25, 27-31, 33, 34] for not clearly describing whether there was an appropriate interval between conducting the index test and the reference test. The evaluation was high risk in two studies [24,27]. All patients were submitted to a reference test in eleven studies, which were included in the analysis [21-24, 26, 29-34]; the results showed low risk. Not all patients were submitted to a reference test in two studies [25,27], and this was uncertain in one study [28].
Almost all the articles presented low applicability concern, because they matched the review question of our study.

Diagnostic accuracy
We present all articles included in this systematic review with a description of the index and reference tests, the number of patients and control groups, and the values of sensitivity, specificity, test accuracy, positive likelihood ratio, negative likelihood ratio, and Youden's statistic in Table 1.
The Youden index ranged from 0.50 to 0.98 for the ELISA test and from 0.26 to 1 for the reference test (DID and/or CIE) for the individual studies. Four studies presented a good performance, with a Youden index above 0.90, for the ELISA test [23,26,31,34] and three studies for the reference test [21,32,34]. Two studies used commercial tests [23,26] based on the fluorescent enzyme immunoassay method with the ImmunoCAP system, and the best cut-off value for this test in our study (sensitivity: 100%, specificity: 96%) was 27 mgA/L [26]. The other studies presented a performance below 0.90. The Youden index indicates the trade-off between sensitivity and specificity.

not interpreted the reference test (DID and/or CIE) in the same way because LR- was included as weak diagnostic evidence.
The forest plots in Figs 4 and 5 show the sensitivity, specificity ranges, and heterogeneity for the ELISA test and reference test (DID and/or CIE) in detecting CPA across the included studies.
We also constructed the sROC curves and calculated the area under the ROC (AUROC) for included studies (Fig 6). The overall diagnostic performance of ELISA and that of the reference tests (DID and/or CIE) were comparable (AUROC 0.99 [95% CI 0.97-0.99] and 0.99 [95% CI 0.97-0.99], respectively).

Heterogeneity investigations
When we evaluated the four studies [21-23,26], we found a heterogeneity (I2) of 67.69% (95% CI 33.17-100.00) in the ELISA sensitivity pool, considered as moderate heterogeneity, and 96.50% (95% CI 94.38-98.62) in the DID and/or CIE sensitivity pool, considered to be highly heterogeneous. First, we investigated the subgroup analyses, evaluating only the three studies using healthy controls [21-23]. We found a heterogeneity (I2) of 72.40% in the ELISA sensitivity pool and 88.20% in the DID and/or CIE sensitivity pool, considered as high heterogeneity. These results are presented in Figs 7 and 8.
Next, we investigated the second subgroup analyses, evaluating only the two most recent studies using commercial ELISA tests [23,26]. The heterogeneity (I2) was 0% for sensitivity and specificity. When we studied the reference tests, the heterogeneity (I2) was 97.8% for sensitivity and 0% for specificity.
Studies using in-house ELISA tests show large methodological differences in their performance. High heterogeneity was maintained for sensitivity in both studies using DID and/or CIE tests [23,26], considering that the precipitation tests are all in-house and also present large methodological differences in the studies included in this review.

Discussion
To our knowledge, this is the first systematic review to compare the ELISA test with precipitin tests (DID and/or CIE) for CPA diagnosis. Although current studies suggest that ELISA performs better for CPA diagnosis, precipitation tests are still considered to be the reference test in many countries, especially in Brazil, where this review was performed.
Fourteen articles that met the criteria for the research question were included, and all studies were considered to have an uncertain or high risk of bias in some domains in the quality risk assessment. Important methodological differences were verified, mainly related to the in-house ELISA tests. More recent studies with commercial ELISA tests were included in the review, with the differences described. We also observed this phenomenon in DID and/or CIE tests, because these are all still in-house.
We observed, mainly in the older studies, that population selection was based on stored samples from patients already diagnosed with CPA and submitted to the tests described in the review. In addition, the lack of a reporting checklist in the study descriptions was evident. Many QUADAS-2 items were not clearly reported, interfering with the quality evaluation. As an example, although we were able to extract the data for constructing the 2 × 2 table, we noted that the discussion and conclusion of one study contained a printing error that was not compatible with the objective, methods, and results of the article [21].
The best performances in the ELISA evaluation of individual studies included in the meta-analysis, based on Youden's index, were from the commercial tests [23,26], the ImmunoCAP and Immulite tests, which ranged from 0.94 to 0.96. The best cut-off from the ImmunoCAP system in the individual studies was 27 mgA/L [26].
Our study shows several methods used to identify A. fumigatus-specific IgG. For in-house testing, we observed a variety of concentrations and antigens used. For commercial tests, there are also no standard cut-off values. The CPA category could justify different values, as could the possibility of other etiologies causing fungus ball [35]. Other possible explanations for the different cut-off values observed in our study are the use of healthy or disease controls and ethnic differences.
When we evaluated Youden's J statistic for the precipitation test (DID or CIE) in the studies included in the meta-analysis, only one study presented a performance of 0.96 [21]. The performance for the other studies [22,23,26] ranged between 0.26 and 0.59.
The limitations regarding the use of the precipitin test are based on the requirement for immunodiffusion and electrophoresis migration methods. They do not present antigen standardization, besides requiring additional work and much time to obtain the results, especially in low resource countries [36].
The ELISA test seems to be promising. Even with important methodological differences, it was useful to evaluate the diagnostic data for CPA in each study where it was possible to obtain data for sensitivity and specificity calculation. Two more recent studies were highlighted in this review [23,26], with ELISA sensitivities presenting narrow confidence intervals. These estimates showed a better performance than those, with their confidence intervals, from the reference tests (DID and/or CIE). In addition, the pooled LR+/LR- from the ELISA test presented conclusive evidence, and this was not observed in the reference test results.
Several studies have recently been published with serological data using only commercial ELISA tests for CPA diagnosis in an area with high tuberculosis prevalence [1,12,37].
The limitations of this study depend on the primary studies. There were problems regarding individual reporting in the primary studies, so we could not construct a 2 × 2 table. In some cases, the lack of appropriate reporting made us judge the study as having an unclear [21,28,30,33,34] or high risk of bias [27,31].
The availability of commercial tests demonstrated in recent studies [23,26] may facilitate the incorporation of the ELISA test into clinical practice, allowing standardized use for the diagnosis of CPA and replacing the reference test that still depends on in-house performance.
Since the global CPA burden is substantial, mainly as a complication of pulmonary tuberculosis (PTB) [38] and especially in countries such as Brazil, which is among the 30 countries representing over 80% of tuberculosis cases worldwide in 2015 [39], there is still a need for well-designed studies to obtain evidence and demonstrate the use of the ELISA tests compared to precipitation tests.
Although it is not possible to define the evidence strength, the clinical implications of this study were as follows: precipitin detection is laborious, requiring specialized laboratories and presenting low sensitivity for the diagnosis of CPA; in-house ELISA tests do not present standard concentrations and antigens for comparative studies; commercial ELISA tests show better performance for diagnosing CPA, but additional studies must be conducted to identify the best cut-off value; and the ImmunoCAP and Immulite systems demonstrated the best performances among commercial tests.
In conclusion, our meta-analysis suggests that the ELISA test presented better accuracy than the precipitation tests (DID and/or CIE) for CPA diagnosis. Thus, ELISA can be considered as the test of choice in clinical practice.