Diagnostic Accuracy of Ber-EP4 for Metastatic Adenocarcinoma in Serous Effusions: A Meta-Analysis

Numerous studies have investigated the utility of Ber-EP4 in differentiating metastatic adenocarcinoma (MAC) from malignant epithelial mesothelioma (MM) and/or reactive mesothelial cells (RM) in serous effusions. However, the results remain controversial. The aim of this study was to determine the overall accuracy of Ber-EP4 for MAC in serous effusions through a meta-analysis of published studies. Publications addressing the accuracy of Ber-EP4 in the diagnosis of MAC were selected from PubMed, Embase and the Cochrane Library. Data from the selected studies were pooled to yield summary sensitivity, specificity, positive and negative likelihood ratios (LR), diagnostic odds ratio (DOR), and a summary receiver operating characteristic (SROC) curve. Statistical analysis was performed with Meta-Disc 1.4 and STATA 12.0. Twenty-nine studies, based on 2646 patients, met the inclusion criteria, and the summary estimates for Ber-EP4 in the diagnosis of MAC were: sensitivity 0.80 (95% CI: 0.78–0.82), specificity 0.94 (95% CI: 0.93–0.96), positive likelihood ratio (PLR) 12.72 (95% CI: 8.66–18.7), negative likelihood ratio (NLR) 0.18 (95% CI: 0.12–0.26) and diagnostic odds ratio 95.05 (95% CI: 57.26–157.77). The SROC curve indicated that the maximum joint sensitivity and specificity (Q-value) was 0.91 and the area under the curve was 0.96. Our findings suggest that Ber-EP4 may be a useful adjunctive diagnostic tool for confirming MAC in serous effusions.


Introduction
Distinguishing metastatic adenocarcinoma (MAC) from malignant mesothelioma (MM) and/or reactive mesothelial cells (RM) is important for staging and has significant treatment implications. However, malignant cells can be difficult to differentiate from reactive mesothelial cells, especially when the differential is malignant mesothelioma versus adenocarcinoma [1][2][3][4]. Biopsy offers relatively high sensitivity and has been used as the gold-standard diagnostic method [5,6]; however, it is invasive and operator dependent, may complicate subsequent disease management by seeding tumor cells, and may be infeasible because of the patient's poor condition. Tumor biomarkers are attractive adjuncts because they are noninvasive and relatively inexpensive. Many tumor biomarkers directed against cell-type-specific antigens have been applied to serous effusions to improve diagnostic accuracy, but the results are not always in agreement [7,8]. It remains unclear which marker performs best, and a novel panel of diagnostic markers for the early and accurate detection of MAC is needed to aid conventional tests.
Ber-EP4 is a monoclonal antibody that identifies 34-kD and 39-kD cell surface glycoproteins present on the membrane of human epithelial cells but not on reactive or malignant mesothelial cells [9]. A growing number of studies have supported its use as a marker in the differential diagnosis of MAC from MM/RM [1,. A systematic analysis of these data may help clarify the diagnostic potential of Ber-EP4 as a marker for MAC. We therefore performed this meta-analysis, which to the best of our knowledge has not previously been done, to explore the value of Ber-EP4 in distinguishing MAC from MM/RM.

Search strategy and study selection
A search of the literature was conducted in the electronic databases PubMed, Embase, Cochrane Library, Web of Science, and the Chinese Journals Full-text Database (CNKI), updated to December 31, 2013. The search terms were "Ber-EP4," "body fluids," "effusions," "sensitivity and specificity," and "accuracy." Only full-text papers published in English or Chinese were included. The reference lists of all reviewed articles were also searched for eligible studies. Studies were selected for the meta-analysis according to the following criteria: (1) the study evaluated Ber-EP4 in the differential diagnosis of MAC versus MM/RM in serous effusions; (2) the study included more than ten fluid specimens; and (3) the study provided sufficient data to calculate both sensitivity and specificity. When publications showed evidence of possible patient overlap with other studies, BW and DDL discussed them and retained only the highest-quality study. Two reviewers (BW and DDL) independently judged study eligibility while screening the citations, and disagreements were resolved by consensus.

Data extraction and quality assessment
Two authors (BW and DDL) independently extracted the data and reached a consensus on all items; any discrepancies were resolved by discussion with a third author (YLF). The following data were collected from each study: first author's name, publication year, country, test method, cutoff value, sensitivity, and specificity. The methodological quality of each study was assessed using the STARD (Standards for Reporting of Diagnostic Accuracy; maximum score 25) guidelines [38], which aim to improve the quality of reporting in diagnostic studies, and the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) tool, an evidence-based quality assessment tool for systematic reviews of diagnostic accuracy studies. QUADAS-2 comprises four key domains: patient selection, index test, reference standard, and flow and timing (the flow of patients through the study and the timing of the index test and reference standard) [39].

Statistical analyses
The standard methods recommended for meta-analyses of diagnostic accuracy were used [40]. The following indexes of test accuracy were computed for each study: sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR). The diagnostic threshold identified in each study was used to plot a summary receiver operating characteristic (SROC) curve [41]. To detect a threshold effect, the relationship between sensitivity and specificity was evaluated with the Spearman correlation coefficient. Inter-study heterogeneity was assessed with the chi-square-based Q test and the inconsistency index I². When the Q test was significant (p < 0.05) or I² exceeded 50%, indicating heterogeneity among studies, the random-effects model (DerSimonian-Laird method) was used to calculate the pooled sensitivity, specificity, and other related indexes; otherwise, the fixed-effect model (Mantel-Haenszel method) was chosen. Inverse-variance-weighted meta-regression was performed to investigate sources of heterogeneity among the included studies [42]. Because publication bias is a concern for meta-analyses of diagnostic studies, we tested for its potential presence using Deeks' funnel plots [43]. Analyses were performed using STATA, version 12.0 (Stata Corporation, College Station, TX, USA) and Meta-Disc 1.4 for Windows (XI Cochrane Colloquium, Barcelona, Spain) [44,45]. In every test, a two-sided p-value < 0.05 was considered statistically significant.
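The per-study indexes and the random-effects pooling described above can be sketched in Python. This is a minimal illustration, not the Meta-Disc or STATA implementation: it assumes a 0.5 continuity correction added to every 2×2 cell, uses Woolf's variance for ln(DOR), and pools only ln(DOR) with the DerSimonian-Laird estimator (Meta-Disc also pools sensitivity and specificity separately). The function names and the three 2×2 tables in the usage lines are hypothetical, not data from the included studies.

```python
import math

def study_indices(tp, fp, fn, tn, cc=0.5):
    """Accuracy indexes for one study from its 2x2 table.

    Assumption: a continuity correction `cc` is added to every cell so that
    zero cells do not produce infinite ratios."""
    tp, fp, fn, tn = (v + cc for v in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sens": sens,
        "spec": spec,
        "plr": sens / (1 - spec),                          # positive likelihood ratio
        "nlr": (1 - sens) / spec,                          # negative likelihood ratio
        "ln_dor": math.log((tp * tn) / (fp * fn)),         # log diagnostic odds ratio
        "var_ln_dor": 1 / tp + 1 / fp + 1 / fn + 1 / tn,   # Woolf's variance
    }

def dersimonian_laird(effects, variances):
    """Random-effects pooling of log-scale effects (DerSimonian-Laird).

    Returns (pooled, (ci_low, ci_high), Q, I2 in %, tau2)."""
    k = len(effects)
    w = [1 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), q, i2, tau2

if __name__ == "__main__":
    # Hypothetical (TP, FP, FN, TN) tables, for illustration only
    tables = [(40, 2, 10, 48), (30, 5, 8, 57), (55, 1, 20, 40)]
    rows = [study_indices(*t) for t in tables]
    pooled, ci, q, i2, tau2 = dersimonian_laird(
        [r["ln_dor"] for r in rows], [r["var_ln_dor"] for r in rows])
    print(f"pooled DOR {math.exp(pooled):.1f} "
          f"(95% CI {math.exp(ci[0]):.1f}-{math.exp(ci[1]):.1f}), "
          f"Q = {q:.2f}, I2 = {i2:.1f}%")
```

Applying the decision rule in the text would mean switching to fixed-effect (Mantel-Haenszel) pooling, which this sketch does not implement, when heterogeneity is absent. The Q-test p-value needs a chi-square distribution with k - 1 degrees of freedom (for example `scipy.stats.chi2.sf`), left out here to keep the example dependency-free.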

Quality of reporting and study characteristics
The article selection process used in this study is summarized in Fig. 1. A total of 29 studies published between 1993 and 2013 met the inclusion criteria and were included in the present meta-analysis. The main clinical characteristics of the included studies are presented in Table 1. Overall, the 29 selected studies, which originated from 14 countries, included 2646 individuals; sample sizes ranged from 17 to 232 individuals, with an average of 90. In all included studies, the cytological diagnoses were confirmed by histopathology or clinical data. In 19 studies, all patients received the same reference standard, indicating potential partial verification bias in the remaining studies. In 21 studies, samples were collected from consecutively or randomly selected patients. Six studies made inappropriate exclusions (for example, not including "difficult-to-diagnose" patients). Only 1 study did not report blinded interpretation of the Ber-EP4 assay independent of the reference standard. Most studies adequately described the cut-off value used for the marker. Details of the staining methods of the included studies are presented in Table S1. All 29 studies had STARD scores ≥ 13 (data not shown), indicating high reporting quality. As shown in Table 2, the validity of the included studies was assessed using the QUADAS-2 tool. Based on the methods reported in each study, each of the 14 components according to the QUADAS-2 criteria was graded "yes," "unclear," or "no," meaning "low risk of bias," "uncertain risk of bias," and "high risk of bias," respectively [39].

Diagnostic accuracy
The between-study heterogeneity was assessed with the I² index to choose the appropriate calculation model. The I² values for sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR) and DOR were 88.1% (p < 0.0001), 69.2% (p < 0.0001), 52.2% (p = 0.0006), 92.5% (p < 0.0001), and 44.2% (p = 0.0061), respectively. Therefore, the random-effects model was used to calculate the pooled sensitivity, specificity, PLR, NLR and DOR in the present meta-analysis. Fig. 2 shows the forest plots of the sensitivity and specificity of the 29 studies of Ber-EP4 in the diagnosis of MAC. The pooled sensitivity and specificity were 0.80 (95% CI: 0.78-0.82) and 0.94 (95% CI: 0.93-0.96), respectively. The overall PLR and NLR were 12.72 (95% CI: 8.66-18.7) and 0.18 (95% CI: 0.12-0.26), respectively. The pooled diagnostic odds ratio (DOR) was 95.05 (95% CI: 57.26-157.77). The SROC curve for Ber-EP4, which plots sensitivity versus 1 - specificity for the individual studies, is shown in Fig. 3. As a global measure of the discriminatory power of the test, we used the Q-value: the point at which the SROC curve intersects the diagonal running from the upper-left to the lower-right corner of ROC space, which corresponds to the highest common value of sensitivity and specificity. The SROC curve for Ber-EP4 lies near the desirable upper-left corner, with a Q-value of 0.91 and an area under the curve (AUC) of 0.96, indicating a high level of overall accuracy. To explore possible reasons for the heterogeneity, a meta-regression analysis was performed with test method (cell blocks or smears), sample size (≥100 or <100), lack of blinding, and other methodological quality items from the QUADAS-2 tool and STARD guideline as covariates (data not shown). A statistically significant difference was observed between studies that did and did not enroll a consecutive or random sample of patients (p = 0.0002, data not shown). None of the other covariates included in the meta-regression was a significant source of heterogeneity (all p > 0.05, data not shown).
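For a symmetric SROC curve (the Moses-Littenberg model in which ln DOR does not vary with threshold), both summary statistics depend on the DOR alone: Q* = √DOR / (1 + √DOR) and AUC = DOR / (DOR - 1)² × [(DOR - 1) - ln DOR]. The sketch below assumes that symmetric model; it is an illustration, not how Meta-Disc fits its curve, which may be asymmetric. Plugging in the pooled DOR of 95.05 gives Q* ≈ 0.91 and AUC ≈ 0.96, consistent with the reported values, although the agreement should be read as approximate.

```python
import math

def sroc_tpr(fpr, dor):
    """True-positive rate on a symmetric SROC curve with constant DOR,
    i.e. logit(TPR) = ln(DOR) + logit(FPR)."""
    return dor * fpr / (1 - fpr + dor * fpr)

def q_star(dor):
    """Point where the symmetric SROC curve meets the sensitivity = specificity line."""
    root = math.sqrt(dor)
    return root / (1 + root)

def sroc_auc(dor):
    """Closed-form area under the symmetric SROC curve."""
    if dor == 1:
        return 0.5                      # non-informative test
    return dor / (dor - 1) ** 2 * ((dor - 1) - math.log(dor))

if __name__ == "__main__":
    dor = 95.05                         # pooled DOR reported in this meta-analysis
    print(f"Q* = {q_star(dor):.3f}, AUC = {sroc_auc(dor):.3f}")
    # Sanity check: at FPR = 1 - Q*, the curve returns TPR = Q*
    q = q_star(dor)
    print(f"TPR at FPR = 1 - Q*: {sroc_tpr(1 - q, dor):.3f}")
```

The closed-form AUC is the integral of `sroc_tpr` over FPR from 0 to 1, so it can be cross-checked numerically.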

Publication bias evaluation
Publication bias was explored with Deeks' funnel plot. The funnel plot of the pooled DOR of Ber-EP4 for the diagnosis of malignant effusions showed no obvious asymmetry (Fig. 4), and Deeks' test was not statistically significant (p = 0.81), suggesting no evidence of publication bias.
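Deeks' test regresses each study's ln(DOR) on 1/√ESS, weighting by ESS, where ESS = 4·n1·n2/(n1 + n2) is the effective sample size computed from the numbers of diseased (n1) and non-diseased (n2) patients; a slope significantly different from zero suggests funnel-plot asymmetry. The sketch below is a minimal, dependency-free illustration of that regression; it assumes a 0.5 continuity correction, and the helper names and example tables are hypothetical. It stops at the slope's t statistic; the p-value would come from a t distribution with k - 2 degrees of freedom (for example `scipy.stats.t.sf`).

```python
import math

def weighted_linear_fit(x, y, w):
    """Weighted least squares y = a + b*x; returns (a, b, se_b, df)."""
    k = len(x)
    if k < 3:
        raise ValueError("need at least 3 studies")
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    # Weighted residual variance with k - 2 degrees of freedom
    s2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / (k - 2)
    return a, b, math.sqrt(s2 / sxx), k - 2

def deeks_test(tables, cc=0.5):
    """Deeks' funnel-plot asymmetry regression for (TP, FP, FN, TN) tables.

    Returns (slope, t statistic, degrees of freedom)."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        n1, n2 = tp + fn, fp + tn                    # diseased, non-diseased
        ess = 4 * n1 * n2 / (n1 + n2)                # effective sample size
        tp, fp, fn, tn = (v + cc for v in (tp, fp, fn, tn))
        y.append(math.log((tp * tn) / (fp * fn)))   # ln(DOR)
        x.append(1 / math.sqrt(ess))
        w.append(ess)
    _, slope, se, df = weighted_linear_fit(x, y, w)
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t, df

if __name__ == "__main__":
    # Hypothetical (TP, FP, FN, TN) tables, for illustration only
    tables = [(40, 2, 10, 48), (30, 5, 8, 57), (55, 1, 20, 40), (12, 1, 4, 20)]
    slope, t, df = deeks_test(tables)
    print(f"slope = {slope:.2f}, t = {t:.2f} on {df} df")
```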

Discussion
Effusions in body cavities are a common complication that may result from a variety of clinical conditions, including infections, cardiac failure, and malignancies such as lung, breast, gastrointestinal, and female genital adenocarcinoma as well as malignant mesothelioma [46,47]. Distinguishing malignant epithelial cells from mesothelial cells is critical in the differential diagnosis of body cavity effusions. However, adenocarcinoma metastatic to serous membranes is often associated with prominent mesothelial hyperplasia, which frequently causes diagnostic confusion. This is a major problem in routine cytology, and a reliable method is needed. Immunohistochemistry can greatly aid in resolving such diagnostic dilemmas; unfortunately, the currently available markers have varying sensitivities and specificities for epithelial or mesothelial cells. Ber-EP4 is a monoclonal antibody that identifies 34-kD and 39-kD cell surface glycoproteins present on the membrane of human epithelial cells but not on reactive or malignant mesothelial cells. In recent years, an increasing number of studies have evaluated the diagnostic accuracy of Ber-EP4 for MAC, but the results remain controversial because of several factors, including differences in study design, sample size, and statistical methods [48]. We therefore performed the present meta-analysis to comprehensively assess the diagnostic accuracy of Ber-EP4 for MAC in serous effusions. The SROC curve presents a global summary of test performance and shows the trade-off between sensitivity and specificity.

Abbreviations used in Table 2: LR, low risk; HR, high risk; UR, unclear risk; LC, low concern; HC, high concern; UC, unclear concern (doi:10.1371/journal.pone.0107741.t002).

Ber-EP4 and Metastatic Adenocarcinoma in Serous Effusions PLOS ONE | www.plosone.org
The present meta-analysis has shown that the mean sensitivity of Ber-EP4 was 0.80 and the mean specificity was 0.94, and that the maximum joint sensitivity and specificity (Q-value) was 0.91 and the AUC was 0.96, indicating good, although not perfect, overall accuracy in the diagnosis of MAC. The DOR, the ratio of the odds of positivity in diseased patients to the odds of positivity in non-diseased patients, is a single indicator of diagnostic test performance [49] that combines sensitivity and specificity into one number. The DOR ranges from 0 to infinity, with higher values indicating better discriminatory performance (higher accuracy); a DOR of 1.0 indicates that a test cannot discriminate between patients with and without the disorder. In this meta-analysis, the pooled DOR was 95.05, also suggesting a high level of overall accuracy. However, the SROC curve and the DOR are not easy to interpret and use in clinical practice, whereas likelihood ratios (PLR and NLR) are more clinically meaningful measures of diagnostic accuracy. A PLR of 12.72 means that patients with MAC are about 13 times more likely to be Ber-EP4-positive than those with MM/RM, which is likely high enough to be useful in clinical practice. On the other hand, an NLR of 0.18 means that patients with MAC are 0.18 times as likely as those with MM/RM to be Ber-EP4-negative: a negative result reduces the odds of MAC to about 18% of the pre-test odds. The NLR is not itself the probability that a Ber-EP4-negative patient has MAC, and a negative result alone does not reliably exclude MAC. False-negative results may arise, for instance, when cancer cells are absent or scanty on the cell blocks or smears used for immunostaining, which may have inflated the false-negative rate.
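The difference between a likelihood ratio and a probability can be made concrete with Bayes' theorem in odds form: post-test odds = pre-test odds × LR. The short sketch below applies the pooled PLR (12.72) and NLR (0.18) to an assumed pre-test probability of MAC; the 50% and 20% values are hypothetical, chosen only to show that the post-test probability after a negative result depends on the pre-test probability rather than equaling the NLR.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

if __name__ == "__main__":
    PLR, NLR = 12.72, 0.18          # pooled estimates from this meta-analysis
    for pre in (0.5, 0.2):          # hypothetical pre-test probabilities of MAC
        pos = post_test_probability(pre, PLR)
        neg = post_test_probability(pre, NLR)
        print(f"pre-test {pre:.0%}: positive -> {pos:.1%}, negative -> {neg:.1%}")
```

With a 50% pre-test probability, for example, a positive result raises the probability of MAC to about 93%, and a negative result lowers it to about 15%.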
The I² statistics for the pooled sensitivity, specificity, PLR, NLR and DOR indicated considerable heterogeneity between studies, so we undertook a meta-regression analysis to identify possible reasons. Some papers have reported that cell block sections may be the most suitable preparation for immunostaining effusions because of ease of morphologic interpretation, standardized like-for-like comparison with surgical pathology material, minimal background staining, and expected immunostaining patterns [32][33][34][35]. We therefore first considered whether the test method (cell blocks or smears) might contribute to the heterogeneity; however, meta-regression indicated that it was not a source of heterogeneity (p = 0.9046, data not shown). Another primary cause of heterogeneity in test accuracy studies is the threshold effect, which arises when sensitivities and specificities differ because studies use different cut-offs or thresholds to define a positive or negative test result. We used the Spearman correlation coefficient to analyze the threshold effect and found no significant threshold effect (p > 0.05, data not shown). We then investigated whether the QUADAS results, the STARD scores, lack of blinding, and sample size were responsible for the observed heterogeneity. A statistically significant difference was observed between studies that did and did not enroll a consecutive or random sample of patients (p = 0.0002), indicating that patient selection bias may affect diagnostic accuracy. For the results to be applicable, the study participants must be representative of the patients who will receive the test in practice. Therefore, a study should ideally enroll a consecutive or random sample of eligible patients with suspected disease to prevent potential patient selection bias [39,50].
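The threshold-effect check described above is commonly carried out as a Spearman rank correlation between logit(sensitivity) and logit(1 - specificity) across studies; a strong positive correlation suggests that studies differ mainly in their positivity thresholds. Because the logit is monotonic, the ranks, and hence the coefficient, are the same as for the raw proportions. The dependency-free sketch below assumes sensitivities and specificities strictly between 0 and 1 (for example after a continuity correction); the function names and input values are hypothetical, and the significance test is omitted.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def logit(p):
    return math.log(p / (1 - p))

def threshold_effect_rho(sens, spec):
    """Spearman correlation of logit(TPR) with logit(FPR) across studies."""
    return spearman([logit(s) for s in sens], [logit(1 - c) for c in spec])

if __name__ == "__main__":
    # Hypothetical per-study sensitivities and specificities
    sens = [0.79, 0.78, 0.73, 0.74, 0.90]
    spec = [0.95, 0.92, 0.97, 0.95, 0.88]
    print(f"rho = {threshold_effect_rho(sens, spec):.2f}")
```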
In the present meta-analysis, the results indicate that Ber-EP4 may, to a certain extent, be valuable in the differential diagnosis of MAC in serous effusions. However, no single marker can establish the diagnosis in all cases of body cavity fluid, and combining Ber-EP4 with other epithelial or mesothelial stains is recommended to increase diagnostic accuracy [24]. It has been reported that MOC-31 was 100% sensitive and 100% specific in differentiating MAC from MM and RM, and that the staining combination of MOC-31 positivity with D2-40 or calretinin negativity was 100% specific and 99% sensitive for MAC [24]. Furthermore, the combined use of Ber-EP4, MOC-31, CA19-9, and CEA antibodies might be a suitable panel for discriminating adenocarcinoma cells from reactive mesothelial cells [1]. However, because the same markers show varying diagnostic accuracy across studies, it remains unclear which marker performs best. More immunomarkers should therefore be comprehensively evaluated, and high-quality diagnostic accuracy studies are needed to find the optimum panel of antibodies for the diagnosis of MAC in serous effusions.
Our study has some limitations. First, only published studies were included; the exclusion of unpublished data, ongoing studies, conference abstracts and letters to editors may have led to publication bias. Second, verification bias may have occurred because in some patients adenocarcinoma was diagnosed on the basis of the clinical course alone rather than by histological examination. Such diagnostic inaccuracy can cause nonrandom misclassification, leading to biased results. Furthermore, the 6 studies that made inappropriate exclusions (for example, not including "difficult-to-diagnose" patients) may have overestimated diagnostic accuracy, even though no significant effect was detected in our meta-regression analysis. Third, different cutoff values were used in the included studies, which made it difficult to determine the optimal cutoff value. Fourth, because the required data were not reported in the original publications, we could not analyze the effect of factors such as laboratory infrastructure, expertise with tumor marker assay technology, patient spectrum and setting on the accuracy of Ber-EP4 measurements.
Despite these limitations, our study is, to our knowledge, the first comprehensive meta-analysis to assess the diagnostic accuracy of Ber-EP4 for MAC in serous effusions. The results suggest that Ber-EP4 may be a useful adjunct to conventional diagnostic tools for differentiating MAC from MM/RM, but it should be interpreted in parallel with the gold standard of morphology and clinical findings. Larger-scale, blinded, prospective cohort studies are needed, and they should focus on applying novel panels of diagnostic markers for the early and accurate detection of MAC.

Supporting Information
Table S1 Summary of the studies included in the meta-analysis.