Serum HER2 Is a Potential Surrogate for Tissue HER2 Status in Gastric Cancer: A Systematic Review and Meta-Analysis

Determining the expression level of human epidermal growth factor receptor 2 (HER2) in tumor tissue is of great importance for personalized therapy in gastric cancer. Several studies have investigated whether serum HER2 can serve as a surrogate for tissue HER2 status, but their results have been inconsistent. We therefore performed a meta-analysis of published clinical studies to address this question. PubMed, Embase, Web of Science, the Cochrane Library and Science Direct were searched for eligible studies that provided sufficient data to construct 2 × 2 contingency tables. The quality of the included studies was assessed with the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) criteria. Pooled sensitivity, specificity and diagnostic odds ratio (DOR) were calculated for the eligible studies. A summary receiver operating characteristic (SROC) curve was constructed, and the area under the SROC curve (AUSROC) was used to evaluate overall diagnostic performance. Eight studies comprising a total of 1170 participants were included in our meta-analysis. The pooled sensitivity, specificity and DOR were 0.39 (95% CI: 0.21–0.61), 0.98 (95% CI: 0.87–1.00) and 27 (95% CI: 9–81), respectively. The AUSROC was 0.77 (95% CI: 0.73–0.80), and Deeks' funnel plot asymmetry test showed no evidence of publication bias (p = 0.91). Threshold effect appeared to be the main source of heterogeneity, and multivariable meta-regression identified no other significant source. Assays of serum HER2 levels are highly specific and show moderate diagnostic performance for determining tissue HER2 status in gastric cancer.


Introduction
The past few years have witnessed groundbreaking advances in the application of personalized treatment based on a patient's genetic background. For gastric cancer, the 2010 phase III Trastuzumab for Gastric Cancer (ToGA) clinical trial demonstrated the efficacy and safety of trastuzumab in the management of human epidermal growth factor receptor 2 (HER2)-positive advanced gastric or gastroesophageal junction (GEJ) cancer [1]. HER2 is encoded by the HER2/neu oncogene and belongs to the HER family of tyrosine kinase receptors. These receptors play a central role in the regulation of various cellular processes including growth, survival and migration, and represent candidate molecular therapeutic targets in a range of cancer types [2]. HER2 overexpression is observed in 6% to 35% of gastric cancers [3] and the relationship between this overexpression and patient prognosis remains controversial [4][5][6]. HER2-positive patients with gastric cancer qualify for trastuzumab-based therapy and, therefore, determination of HER2 status in patient tumor tissue is of great importance prior to the administration of anti-HER2 targeted treatment.
In current clinical practice, a patient's HER2 status is primarily evaluated by immunohistochemistry (IHC) or fluorescence in situ hybridization (FISH) of tissue samples obtained by surgery or biopsy [7,8]. However, these invasive procedures cannot practically be repeated to track dynamic changes in HER2 status during treatment or follow-up, which makes them unsuitable for monitoring treatment response. Furthermore, biopsy carries a risk of complications [9]. Conflicting IHC or FISH results can also be problematic; they may arise from differences in specimen processing between laboratories, non-standardized assays and scoring systems, or heterogeneity of HER2 expression within the tumor [10,11]. In view of these drawbacks, investigators have become increasingly interested in developing more convenient, reproducible and less invasive methods for identifying HER2-positive patients.
Serum HER2 is a noninvasive biomarker that could supplement current HER2 testing. Studies have reported a relatively high concordance rate between elevated serum HER2 levels and positive HER2 status in tumor tissue [12,13], and serum HER2 has also been reported as a potential marker for monitoring breast cancer relapse [14]. Compared with biopsy, serum HER2 analysis is easier to repeat and better suited to use as a general screening test for characterizing a cancer patient's HER2 profile, and it could therefore benefit targeted cancer therapy. In gastric cancer, several clinical centers have investigated correlations between serum HER2 and both clinicopathological characteristics and tissue HER2 status [15][16][17]. Changes in serum HER2 levels during chemotherapy have been reported to correlate with response to chemotherapy in patients with HER2-positive tumors [18]. However, these studies differ in many respects, such as cohort size, sampling time, tumor-node-metastasis (TNM) stage and cut-off values, which makes it difficult to draw definitive conclusions.
In this study, we therefore conducted a meta-analysis to investigate the diagnostic accuracy of serum HER2 for determining tumor HER2 status. To the best of our knowledge, no such meta-analysis has been reported previously.

Methods
The meta-analysis was conducted in accordance with the standard guidelines for systematic reviews of diagnostic test accuracy [19] and used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [20] as the template for reporting (S1 PRISMA Checklist).

Search strategy
The PubMed, Embase, Web of Science, Cochrane Library and Science Direct electronic databases were searched for entries recorded from database inception to 17 January 2015. Terms from the following three categories were combined in various ways when querying the databases: ("serum" or "soluble") and ("human epidermal growth factor receptor 2" or "HER2" or "c-erbB-2") and ("gastric cancer" or "stomach neoplasm" or "stomach tumor"). All searches were restricted to English-language publications. Review articles and reference lists were also screened manually for relevant studies.
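The exact query strings submitted to each database are not reported, so the following is only a sketch of one way to express the three term categories as a single Boolean query: OR within a category, AND across categories. The variable and function names are hypothetical, and the generic syntax is an assumption; each database uses its own field tags and operators, which are deliberately omitted here.

```python
# Hypothetical sketch: combine the three term categories from the search
# strategy into one Boolean query (OR within a category, AND across them).
# Database-specific field tags and syntax are deliberately omitted.

SAMPLE_TERMS = ["serum", "soluble"]
MARKER_TERMS = ["human epidermal growth factor receptor 2", "HER2", "c-erbB-2"]
DISEASE_TERMS = ["gastric cancer", "stomach neoplasm", "stomach tumor"]


def or_group(terms):
    """Quote each term and join one category with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*categories):
    """AND together one OR-group per category."""
    return " AND ".join(or_group(category) for category in categories)


query = build_query(SAMPLE_TERMS, MARKER_TERMS, DISEASE_TERMS)
print(query)
```

Narrower variants (the "various combinations" mentioned above) can be produced by passing subsets of each term list to `build_query`.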

Study selection
Two investigators independently reviewed the title and abstract of each publication to identify articles likely to assess the diagnostic value of serum HER2 in gastric cancer. The full texts of these articles were then screened to evaluate study relevance. Eligible studies were selected according to the following inclusion criteria: i) both serum and tissue HER2 expression levels were measured in gastric cancer patients; ii) the diagnosis of gastric cancer was confirmed by histopathological or cytological examination; iii) sufficient data were available to construct a 2 × 2 contingency table. Exclusion criteria were as follows: i) duplicates; ii) conference abstracts, comments, letters and animal studies; iii) insufficient data to calculate sensitivity and specificity, i.e., the numbers of true and false positives and negatives were not available. Any uncertainty regarding eligibility was resolved by discussion between the two investigators and a senior investigator until consensus was reached.

Data extraction and quality assessment
Two investigators independently extracted data from each study, including the surname of the first author, year of publication, country of origin, enrolled participants, TNM stage, major tumor subtype (Lauren's classification), method of serum HER2 detection, cut-off value, and the numbers of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). When a study reported results for more than one cut-off value, the pre-specified value and its associated sensitivity and specificity were extracted [16,21]. The methodological quality of each eligible study was assessed using the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) criteria [22], which cover four key domains: patient selection, index test, reference standard, and flow and timing. Each domain is assessed for risk of bias, and the first three domains are also assessed for concerns regarding applicability. Signaling questions assist these judgments. Each question is answered "yes", "no" or "unclear" and is phrased so that "yes" indicates low risk of bias or concern and "no" indicates high risk of bias or concern.
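The following sketch shows how signaling-question answers might be rolled up into a risk-of-bias call for each QUADAS-2 domain. It assumes a simplified mechanical rule (all "yes" gives low risk, any "no" gives high risk, anything else is unclear). QUADAS-2 itself treats a "no" as flagging potential bias and leaves the final judgment to the reviewers, so this rule is an approximation. The answers in the example are hypothetical, and the applicability judgments are not modeled.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNCLEAR = "unclear"


# The four QUADAS-2 domains; only the first three also carry an
# applicability judgment, which this sketch does not model.
DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")


def domain_risk(answers):
    """Simplified risk-of-bias call for one domain (assumed rule).

    All "yes" -> "low"; any "no" -> "high"; otherwise -> "unclear".
    """
    answers = list(answers)
    if not answers:
        raise ValueError("each domain needs at least one signaling question")
    if any(a is Answer.NO for a in answers):
        return "high"
    if all(a is Answer.YES for a in answers):
        return "low"
    return "unclear"


def summarize(study_answers):
    """Map each domain to its risk-of-bias call for one study."""
    return {domain: domain_risk(study_answers[domain]) for domain in DOMAINS}


# Hypothetical signaling-question answers for one study (not real data).
example_study = {
    "patient selection": [Answer.YES, Answer.NO, Answer.YES],
    "index test": [Answer.YES, Answer.UNCLEAR],
    "reference standard": [Answer.YES, Answer.YES],
    "flow and timing": [Answer.YES, Answer.YES, Answer.YES, Answer.YES],
}

for domain, risk in summarize(example_study).items():
    print(f"{domain}: {risk}")
```

In practice, the per-domain calls from a table like this feed the risk-of-bias summary figure. The reviewer can still override a "high" call when a "no" answer is judged not to introduce bias.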

Statistical analysis
A 2 × 2 table was constructed for each study, and sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR) and diagnostic odds ratio (DOR) were pooled using a bivariate regression model, as recommended by the Cochrane Diagnostic Test Accuracy Working Group [19]. The bivariate model accounts for the potential trade-off between sensitivity and specificity [23]. The PLR is calculated as sensitivity/(1 − specificity) and the NLR as (1 − sensitivity)/specificity. Generally, a PLR > 5.0 and an NLR < 0.2 are considered clinically useful [24,25]. The DOR is a single indicator of test performance that combines sensitivity and specificity and does not depend on prevalence; it is calculated as PLR/NLR, which is equivalent to (TP × TN)/(FP × FN) [26]. A summary receiver operating characteristic (SROC) curve was constructed and the area under the SROC curve (AUSROC) was calculated.
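As an illustration of the formulas above, the following sketch computes the per-study measures from a single 2 × 2 table. It does not reproduce the pooled estimates, which come from fitting the bivariate model across studies. The class and function names are hypothetical. The 0.5 continuity correction applied when any cell is zero is an assumed, commonly used convention that the methods above do not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study's 2 x 2 table (serum test vs. tissue HER2)."""
    tp: int
    fp: int
    fn: int
    tn: int


def accuracy_measures(table, correction=0.5):
    """Per-study sensitivity, specificity, PLR, NLR and DOR.

    If any cell is zero, `correction` is added to every cell (an assumed
    convention) so that the ratios stay finite.
    """
    cells = (table.tp, table.fp, table.fn, table.tn)
    if 0 in cells:
        cells = tuple(c + correction for c in cells)
    tp, fp, fn, tn = cells
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    plr = sens / (1 - spec)
    nlr = (1 - sens) / spec
    dor = plr / nlr  # algebraically equal to (tp * tn) / (fp * fn)
    return {"sensitivity": sens, "specificity": spec,
            "PLR": plr, "NLR": nlr, "DOR": dor}


# Hypothetical study: 50 tissue-positive and 150 tissue-negative patients.
example = accuracy_measures(TwoByTwo(tp=20, fp=4, fn=30, tn=146))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

For this hypothetical table, sensitivity is 0.40 and specificity about 0.97. These values give a PLR of 15, which exceeds the 5.0 benchmark, and an NLR of about 0.62, which does not meet the 0.2 benchmark. Pooled serum HER2 results show the same pattern: high specificity combined with modest sensitivity.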
Between-study heterogeneity was assessed with the I² statistic [27], which describes the percentage of total variation across studies that is attributable to heterogeneity rather than chance; a value exceeding 50% was taken to indicate significant heterogeneity. I² is calculated as 100% × (Q − df)/Q, where Q is Cochran's heterogeneity statistic and df is its degrees of freedom (the number of studies minus one); negative values are set to 0. I² therefore ranges from 0% to 100%, with 0% indicating no observed heterogeneity and larger values indicating increasing heterogeneity. Heterogeneity caused by the threshold effect was evaluated with Spearman's correlation coefficient as described previously [28][29][30]. Univariable and multivariable meta-regression analyses were performed to explore sources of heterogeneity other than the threshold effect. Publication bias was evaluated with Deeks' funnel plot asymmetry test, with statistical significance set at α = 0.05 [31].
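The following sketch shows both heterogeneity checks described above applied to hypothetical 2 × 2 tables. It rests on several assumptions. I² is computed from Cochran's Q using univariate inverse-variance weights on logit sensitivity, with the delta-method variance 1/TP + 1/FN; the exact weighting used in the Stata analysis is not stated here. The threshold check correlates per-study sensitivity with specificity, so a strongly negative Spearman's rho is consistent with a threshold effect. The function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def cochran_q_i2(estimates, variances):
    """Cochran's Q, its df and I² (%) using inverse-variance weights.

    Negative (Q - df) is truncated so that I² stays within 0-100%.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = 0.0 if q <= 0 else max(0.0, (q - df) / q) * 100
    return q, df, i2


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical (TP, FP, FN, TN) tables; no zero cells, so no correction needed.
tables = [(10, 2, 40, 148), (20, 5, 30, 145), (30, 12, 20, 138), (40, 25, 10, 125)]
sens = [tp / (tp + fn) for tp, fp, fn, tn in tables]
spec = [tn / (tn + fp) for tp, fp, fn, tn in tables]

# Logit sensitivity with its delta-method variance 1/TP + 1/FN.
q, df, i2 = cochran_q_i2([logit(s) for s in sens],
                         [1 / tp + 1 / fn for tp, fp, fn, tn in tables])
rho = spearman(sens, spec)
print(f"Q = {q:.2f} on {df} df, I² = {i2:.1f}%, Spearman rho = {rho:.2f}")
```

In these made-up tables, sensitivity rises steadily while specificity falls. The rank correlation is therefore −1, which is the same pattern reported in the Results, and I² on sensitivity is well above the 50% cut-off.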
All analyses were conducted using Stata software (version 12.0, StataCorp, College Station, Texas USA).

Results
After removal of duplicates, the initial search yielded 1504 records, of which 15 articles were considered potentially suitable for inclusion in the meta-analysis. Following review of the full-text articles, a further seven were excluded [32][33][34][35][36][37][38], leaving eight articles for the pooled analysis [15-18, 21, 39-41]. A flowchart summarizing the study selection procedure is shown in Fig 1. A total of 1170 patients with gastric cancer were included in the meta-analysis, with median patient age ranging from 53 to 71 years across studies. Four studies were retrospective, while the remaining four did not specify their design. Seven studies took blood samples before any treatment and one study took samples after treatment. The prevalence of HER2 overexpression in gastric cancer patients ranged from 6.7% to 61.6% across the studies. The cut-off value was pre-specified in five of the studies, which used a serum HER2 concentration of either 15.0 ng/ml (recommended by the Food and Drug Administration for breast cancer) or 15.2 ng/ml (recommended in the manufacturer's instructions for the relevant commercial kit). The remaining three studies used the serum HER2 concentration of healthy control individuals as the cut-off point. Serum HER2 levels were measured by chemiluminescence immunoassay (CLIA) in five studies and by enzyme-linked immunosorbent assay (ELISA) in the remaining three. Individual study characteristics are summarized in Table 1.
Between-study variability (i.e., heterogeneity) was high for both sensitivity (I² = 80.27%) and specificity (I² = 95.67%). We next investigated whether a threshold effect was present, that is, whether variation in sensitivity and specificity reflected differences in the serum HER2 cut-off values used by the included studies. Surprisingly, bivariate model analysis revealed a negative correlation between sensitivity and specificity, and 100% of the heterogeneity was likely to be attributable to threshold effect; Spearman's correlation coefficient was −1. This type of effect often produces a shoulder-like curve when sensitivity is plotted against specificity [23], as shown in Fig 4. We also explored other potential sources of heterogeneity by meta-regression analysis, using TNM stage, patient group size, sampling time, detection method and ethnicity as covariates. Univariable meta-regression revealed that patient group size accounted for heterogeneity in sensitivity, while TNM stage, sampling time and ethnicity accounted for heterogeneity in specificity (Fig 6). However, when all of these covariates were included in a joint model, none was responsible for heterogeneity (p = 0.66, 0.07, 0.68, 0.79 and 0.15 for TNM stage, patient group size, sampling time, detection method and ethnicity, respectively).

Discussion
HER2-targeted therapy necessitates the determination of HER2 status in patients with cancer.
In clinical practice, surgical specimens and biopsies are often used for this purpose. However, surgical specimens are usually unavailable for patients who do not undergo tumor resection, and biopsied tissue may give a false-negative result because HER2 expression within the tumor is heterogeneous [42]. Both surgical specimens and biopsied tissue provide only limited information on a patient's HER2 status because they cannot give a real-time picture of HER2 status during disease progression and treatment. In contrast, serum HER2 levels can be measured easily, repeatedly and less invasively. Serum HER2 testing is also less labor-intensive than tissue-based techniques and can potentially be performed with automated assays, enabling objective analysis. Several studies have demonstrated high concordance rates between serum HER2 concentration and tissue HER2 status [12,43]. In the present meta-analysis, we found that serum HER2 showed high specificity for the detection of tissue HER2 status in gastric cancer.
The AUSROC provides a global estimate of diagnostic performance. According to suggested guidelines for interpreting AUSROC values [44], serum HER2 had moderate (0.7 < AUC < 0.9) diagnostic ability to discriminate HER2-positive from HER2-negative gastric cancer patients. The DOR can take values between zero and infinity, with higher values indicating better discriminatory test performance [24]. Fagan's nomogram is a graphical tool that helps clinicians use test results to estimate a patient's probability of having a disease [45]. In the nomogram, a line drawn from a patient's pre-test probability (usually estimated from local prevalence data and published reports) through the likelihood ratio of the test intersects the post-test probability axis, giving the patient's chance of having the disease once the test result is known.
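Fagan's nomogram is a graphical form of Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. The following sketch reproduces that calculation with the values used for Fig 5: a pre-test probability of 16%, a pooled PLR of 17 and a pooled NLR of 0.62, all taken from the text. The function name is hypothetical.

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    if not 0 < pre_test_prob < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PRE_TEST = 0.16  # median prevalence across the eight studies
after_positive = post_test_probability(PRE_TEST, 17)    # pooled PLR
after_negative = post_test_probability(PRE_TEST, 0.62)  # pooled NLR
print(f"positive test: {after_positive:.0%}, negative test: {after_negative:.0%}")
# → positive test: 76%, negative test: 11%
```

Because the calculation works in odds, the same likelihood ratio shifts probability much less when the pre-test probability is low. Setting `PRE_TEST` to a local prevalence estimate shows how the nomogram reading changes for a different population.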
As shown in Fig 5, with a pooled PLR of 17 and a hypothetical pre-test probability of 16% (the median prevalence across the eight studies), a positive serum HER2 test would raise the post-test probability to 76%. In other words, the probability that a gastric cancer patient has tissue HER2-positive status increases from 16% to 76% after a positive serum HER2 test. Conversely, with a pooled NLR of 0.62, the post-test probability decreases from 16% to 11% after a negative serum HER2 test. These findings indicate that a serum HER2 test result is of moderate diagnostic value in determining a patient's HER2 status.
Pooled estimates of sensitivity and specificity were associated with I² values of 80.27% and 95.67%, respectively, indicating heterogeneity. To our surprise, bivariate model analysis revealed that all of the heterogeneity was likely to be a consequence of threshold effect. A threshold effect occurs when the studies included in a meta-analysis use different numerical cut-off values to define a test result as positive [19]. Cut-off values are a major concern in diagnostic tests because different thresholds produce different sensitivities and specificities. Of the studies included in our meta-analysis, five used cut-off points recommended either by the Food and Drug Administration for breast cancer or by the manufacturer of the commercial kit used in the study. The remaining three studies used serum HER2 levels measured in healthy control individuals as the cut-off point, and these control concentrations varied between studies. Thresholds established for breast cancer may be inappropriate for patients with gastric cancer. Peng et al. identified the optimal serum HER2 cut-off point for gastric cancer patients as 10.65 ng/ml, which is lower than the more generally applied 15.0 ng/ml [21].
At this 10.65 ng/ml threshold, ROC analysis yielded higher sensitivity than at the 15.0 ng/ml threshold (67.4% vs. 40%), while specificity remained unchanged at 78.9%. However, another study [16] found the optimal serum HER2 cut-off value for predicting tissue HER2 status in gastric cancer to be 16.35 ng/ml, which exceeds the 15.0 ng/ml threshold. These results suggest that the optimal serum HER2 cut-off value for patients with gastric cancer may differ from that used for breast cancer patients. Kong et al. have even suggested that cut-off values differ between ethnicities and that it may be beneficial to establish cut-off values for each ethnic population [46]. This area clearly warrants further investigation.
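The following sketch shows how moving the cut-off shifts sensitivity and specificity in opposite directions, which is the mechanism behind the threshold effect. The serum concentrations are invented for illustration and do not come from any included study. Counting a value at or above the cut-off as a positive test is an assumption. The three cut-offs are the values discussed above.

```python
# Hypothetical serum HER2 concentrations (ng/ml); NOT data from any included study.
HYPOTHETICAL_POSITIVE = [8.2, 11.0, 12.5, 14.1, 15.5, 18.3, 22.0, 40.2]  # tissue HER2-positive
HYPOTHETICAL_NEGATIVE = [6.1, 7.4, 8.8, 9.5, 10.2, 10.9, 11.8, 12.3, 13.0, 15.8]  # tissue HER2-negative


def sens_spec_at_cutoff(positive, negative, cutoff):
    """Sensitivity and specificity when serum HER2 >= cutoff counts as positive."""
    true_pos = sum(value >= cutoff for value in positive)
    true_neg = sum(value < cutoff for value in negative)
    return true_pos / len(positive), true_neg / len(negative)


for cutoff in (10.65, 15.0, 16.35):
    sens, spec = sens_spec_at_cutoff(HYPOTHETICAL_POSITIVE, HYPOTHETICAL_NEGATIVE, cutoff)
    print(f"cut-off {cutoff:5.2f} ng/ml: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Lowering the cut-off catches more tissue-positive patients at the cost of more false positives. When each study in a meta-analysis sits at a different point on this trade-off, sensitivity and specificity become negatively correlated across studies.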
To explore other potential sources of heterogeneity, we also performed meta-regression analysis. Multivariable meta-regression revealed that TNM stage, patient group size, sampling time, detection method and ethnicity were not responsible for heterogeneity. This finding is consistent with our earlier result that heterogeneity was likely to be entirely a consequence of threshold effect. However, because only a small number of studies were included, these data should be interpreted cautiously, as the small number of eligible studies may have limited the statistical power of the meta-regression.
It is important to acknowledge the potential limitations of the present meta-analysis. First, despite an in-depth search of several electronic databases, the number of included studies was small. Second, all of the studies were from Asia, which may lead to some inherent bias. Third, we could not determine the ideal cut-off value for the serum HER2 test because the raw data were not provided in the published reports. Large-scale prospective randomized trials are needed to address this problem. Fourth, the quality assessment results shown in Fig 2 indicate that there was a possible risk of bias and concerns regarding applicability in the patient selection procedure in the included studies.
To the best of our knowledge, our study represents the first reported meta-analysis directed at evaluating the clinical utility of serum HER2 in the diagnosis of HER2 tumor status. Our analysis has shown that serum HER2 had high specificity and moderate diagnostic value for distinguishing HER2-positive gastric cancer patients from HER2-negative patients, suggesting its potential as a surrogate biomarker of HER2 status. Prospective studies are now needed to further validate our findings. In particular, the ability to predict patient prognosis and response by monitoring serum HER2 levels during disease progression and treatment warrants additional investigation.