Diagnostic Value of the Combination of Golgi Protein 73 and Alpha-Fetoprotein in Hepatocellular Carcinoma: A Meta-Analysis

Conflicting results have been widely reported on the use of Golgi protein 73 (GP73) as a serum biomarker for diagnosing hepatocellular carcinoma (HCC). This study evaluated the accuracy of GP73, alpha-fetoprotein (AFP), and GP73 + AFP for diagnosing HCC. The meta-analysis was performed on 11 studies that were selected by means of a comprehensive systematic literature review. Summary diagnostic accuracy, meta-regression analysis for heterogeneity and publication bias, and other statistical analyses were performed using Meta-Disc (version 1.4) and Stata (version 12.0). Pooled sensitivity, specificity, and diagnostic odds ratio were 0.77 (95% CI: 0.75–0.79), 0.91 (95% CI: 0.90–0.92), and 12.49 (95% CI: 4.91–31.79) for GP73; 0.62 (95% CI: 0.60–0.64), 0.84 (95% CI: 0.83–0.85), and 11.61 (95% CI: 8.02–16.81) for AFP; and 0.87 (95% CI: 0.85–0.89), 0.85 (95% CI: 0.84–0.86), and 30.63 (95% CI: 18.10–51.84) for GP73 + AFP. The area under the curve values were 0.86, 0.84, and 0.91 for GP73, AFP, and GP73 + AFP, respectively. These results indicate that for HCC diagnosis, the accuracy of GP73 was higher than that of AFP, and that GP73 + AFP exhibited significantly higher diagnostic accuracy than did GP73 or AFP alone.


Introduction
Hepatocellular carcinoma (HCC) is one of the most common malignant cancers and the third leading cause of cancer-related deaths worldwide among men aged between 40 and 59 years [1]. HCC prevalence is high in Asia and in western and central Africa [2]. In the USA, the incidence of HCC increased year on year from 1973 to 2011 [3]. The 5-year recurrence rate of HCC is 48.8%, and the mean survival time is between 54.4 and 70.0 months [4]. A 10-year survey conducted in China indicated that the social cost and burden of HCC was the highest among chronic diseases listed by the WHO [5]. Therefore, early detection and effective treatment are crucial for improving the survival and quality of life of patients with HCC.
Since the 1970s, alpha-fetoprotein (AFP) has been used as a primary diagnostic serum biomarker of HCC. However, serum AFP is not an accurate biomarker of HCC because of its low sensitivity and specificity [6][7][8][9]. Therefore, a novel serum biomarker that exhibits superior diagnostic accuracy is required for diagnosing HCC. Recent studies have identified various new tumor biomarkers such as Golgi protein 73 (GP73, also known as GOLPH2), interleukin-6, and squamous cell carcinoma antigen. GP73, a Golgi type II transmembrane protein of unknown function, is expressed at low levels in biliary epithelial cells in healthy livers and is detected in human serum [10,11], and the expression of GP73 is upregulated in the hepatocytes of patients with viral and non-viral liver diseases [12]. Serum GP73 and AFP have been used as biomarkers of HCC in several studies, but the results of these studies are heterogeneous and conflicting [13][14][15][16]. The present study performed a systematic literature review and meta-analysis to evaluate the accuracy of serum GP73 + AFP for diagnosing HCC.

Identification of studies
We comprehensively and systematically searched the literature for peer-reviewed English-language studies published in PubMed, Embase, or Web of Science before May 1, 2015. The keywords for the search included (1) GP73: GP73, Golgi protein 73, Golgi phosphoprotein 2, Golgi membrane protein 1; and (2) HCC: HCC, hepatocellular carcinoma, liver cancer, liver cell carcinoma, hepatic cell carcinoma. Furthermore, references of selected studies and other relevant published reports were manually searched. If more than one study was based on the same research topic or contained the same data, only the highest-quality study was selected. Conference abstracts and letters to the editor were excluded because these provided limited information.

Selection criteria
The titles and abstracts of selected studies were independently reviewed by 2 reviewers. Disagreements on study inclusion or exclusion were resolved by consensus. Next, full-text articles of potentially eligible studies were retrieved for further assessment. Studies were included in the meta-analysis if they provided both sensitivity and specificity data of serum GP73 and AFP used for diagnosing HCC based on histopathological confirmation. Only English-language full-text articles were reviewed and included in the final analysis.

Data extraction and quality assessment
Two reviewers independently extracted data such as first author name, year of acceptance for publication, country of study, number of patients with HCC, number of controls (healthy subjects or patients with cirrhosis, hepatitis, and other benign liver diseases), test methodology, cut-off value, and raw data for analyzing sensitivity and specificity (number of true-positive, false-positive, false-negative, and true-negative results) from the included studies. The quality of the included studies was assessed using the tool Quality Assessment of studies of Diagnostic Accuracy included in Systematic reviews (QUADAS; Cochrane Collaboration). Fourteen items in the QUADAS checklist were scored "yes", "no", or "unclear" [17].

Data analysis
Heterogeneity between the included studies was tested; a random-effects model was used when heterogeneity was present, and a fixed-effect model was used otherwise. The following analyses were performed: Spearman correlation analysis to detect a threshold effect; diagnostic odds ratio (DOR; used to eliminate a possible threshold effect); overall sensitivity, specificity, positive likelihood ratio (PLR), and negative likelihood ratio (NLR); summary receiver operating characteristic (SROC) curve analysis; area under the curve (AUC) analysis; meta-regression analysis; and publication bias analysis. Publication bias was assessed using Stata (version 12.0), whereas all other analyses were performed using Meta-Disc (version 1.4).
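
The per-study measures named above all derive from the 2 × 2 table extracted from each study. The following is a minimal sketch of those definitions, not the Meta-Disc implementation. The class name `TwoByTwo`, the function name `accuracy_measures`, and the 0.5 continuity correction for zero cells are assumptions made for illustration, and the counts in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one diagnostic accuracy study (hypothetical container)."""
    tp: float  # HCC patients with a positive test
    fp: float  # controls with a positive test
    fn: float  # HCC patients with a negative test
    tn: float  # controls with a negative test


def accuracy_measures(t: TwoByTwo, correction: float = 0.5) -> dict:
    """Return sensitivity, specificity, PLR, NLR and DOR for one study.

    If any cell is zero, `correction` is added to every cell so that the
    ratios stay finite. This convention is an assumption of this sketch.
    """
    tp, fp, fn, tn = t.tp, t.fp, t.fn, t.tn
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": sensitivity / (1 - specificity),   # positive likelihood ratio
        "nlr": (1 - sensitivity) / specificity,   # negative likelihood ratio
        "dor": (tp * tn) / (fp * fn),             # diagnostic odds ratio
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from any included study.
    for name, value in accuracy_measures(TwoByTwo(tp=80, fp=10, fn=20, tn=90)).items():
        print(f"{name}: {value:.3f}")
```

Pooled estimates such as those reported in this study are obtained by combining these per-study values across studies under a fixed- or random-effects model, so they are not simply the measures of a single summed table.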

Study selection and analysis of study quality
The initial search identified 352 relevant studies, of which 109 were duplicates. After reviewing the titles and abstracts of the remaining 243 studies, 211 studies were excluded from the meta-analysis. The remaining 32 studies were considered eligible, and their full-text articles were reviewed. Of these, 21 studies were excluded because they were not diagnostic studies or did not report sufficient data to construct a 2 × 2 table. Finally, 11 studies were included in the meta-analysis. The process of study selection is summarized in Fig 1. The 11 studies included 1764 patients with HCC and 4659 controls. All the patients underwent a single test for determining serum levels of GP73 and AFP [9,18–27]. Four of the 11 studies included 1025 patients with HCC and 3813 controls who also underwent a test to determine the serum levels of GP73 + AFP [9,19,20,24]. Characteristics of the included studies are listed in Table 1. The results of quality assessment of the 11 studies by using QUADAS are shown in Table 2. Summary scores were not calculated because their interpretation could be challenging and potentially misleading [28]. The details of the 14 items in QUADAS (Table 2) are the following:
1. The spectrum of patients in all the selected studies was representative of the patients who received the test in regular clinical practice.
2. All 11 studies clearly defined selection and exclusion criteria.
3. All studies used appropriate reference standards to accurately classify the target condition.
4. In 4 of the included studies, blood samples were collected before intervention, but this was unclear for the other studies.
5. In 8 studies, patients were tested and their disease status was confirmed based on the aforementioned reference standard, but this was unclear for the other studies.
6. In 2 studies, patients received the same reference standard regardless of the index test result.
7. In all 11 studies, the reference standard used was independent of the index test.
8. In 7 studies, the execution of the index test was described in sufficient detail to permit replication of the test.
9. In 7 studies, the execution of the reference standard was described in sufficient detail to permit its replication.
10. In none of the studies were the index test results interpreted without knowledge of the results of the reference standard.
11. In all studies, the reference standard results were interpreted without knowledge of the results of the index test.
12. In 10 studies, the same clinical data were available when test results were interpreted as would be available when the test is used in practice.
The Spearman correlation coefficient for AFP was 0.773 (P = 0.005), which indicated a threshold effect. The threshold effect was considered one of the reasons for heterogeneity, but other reasons for heterogeneity could not be determined. The Spearman correlation coefficients for GP73 and GP73 + AFP were 0.6 (P = 0.051) and 0.4 (P = 0.6), respectively; neither was statistically significant, which indicated no clear threshold effect for these tests. The DOR is a primary outcome measure that is used to eliminate a possible threshold effect and to differentiate between patients with and without cancer [29]. In this study, the DOR was highest for GP73 + AFP and lowest for AFP alone. These results are listed in Table 3.
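
A threshold effect is commonly checked by correlating the logit of sensitivity with the logit of (1 − specificity) across studies; a strong positive Spearman coefficient suggests that studies trade sensitivity against specificity by using different cut-offs. The sketch below shows that calculation in plain Python. The function names and the input values are hypothetical and are not data from the included studies; P values are not computed here.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def threshold_spearman(sensitivities, specificities):
    """Spearman correlation between logit(sens) and logit(1 - spec)."""
    x = [logit(s) for s in sensitivities]
    y = [logit(1 - s) for s in specificities]
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-study values in which sensitivity rises as specificity falls.
    sens = [0.60, 0.70, 0.80, 0.90]
    spec = [0.95, 0.90, 0.85, 0.80]
    print(round(threshold_spearman(sens, spec), 3))
```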

Meta-regression analysis for heterogeneity and publication bias
Significant heterogeneity was observed among the studies included in this meta-analysis. Because of the unsatisfactory quality of the test methodologies used in the included studies, the small number of studies, and incomplete data, only 3 covariates (year of acceptance for publication, country of study, and assay methodology) were included in meta-regression analysis to assess their impact on sensitivity and specificity. DOR is a commonly used accuracy measure because it measures a diagnostic performance that includes both sensitivity and specificity or both PLR and NLR. As a global measure of diagnostic accuracy, DOR is suitable for comparing the summary diagnostic accuracy of distinct tests [30]. We observed that the 3 covariates of GP73 and AFP did not exert a statistically significant effect on DOR (Table 4). The number of studies that provided the sensitivity and specificity data of GP73 + AFP for diagnosing HCC was too small to allow a meta-regression analysis for heterogeneity. However, for publication bias analysis, funnel plots were obtained using the metafunnel command of Stata version 12.0 (Fig 8).
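
Heterogeneity of the kind described in this section is usually quantified with Cochran's Q and the I² statistic. The sketch below applies them to log DOR values with inverse-variance weights. This is an assumption made for illustration: the software used in this study may compute I² differently (for example, separately for sensitivity and specificity), and the function names and inputs are hypothetical.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its approximate variance.

    Assumes all four cells are non-zero; apply a continuity correction first
    if a study has an empty cell.
    """
    log_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, variance


def cochran_q_i2(effects, variances):
    """Return (Q, I2 in percent) for per-study effects and their variances."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2 is the share of total variation attributed to between-study
    # heterogeneity; it is truncated at zero when Q is below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


if __name__ == "__main__":
    # Hypothetical 2 x 2 tables (tp, fp, fn, tn), not data from the included studies.
    tables = [(80, 10, 20, 90), (60, 30, 40, 70), (90, 5, 10, 95)]
    effects, variances = zip(*(log_dor_and_variance(*t) for t in tables))
    q, i2 = cochran_q_i2(effects, variances)
    print(f"Q = {q:.2f}, I2 = {i2:.1f}%")
```

An I² value above 50% computed this way would be read as significant heterogeneity under the criterion used in this study.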

Discussion
HCC is a major public health concern worldwide. Early detection of HCC is critical for accurate treatment and for improving the health and survival of patients. In this meta-analysis involving 11 studies, we evaluated the accuracy of serum GP73, AFP, and GP73 + AFP for diagnosing HCC. However, because of several methodological limitations, the diagnostic accuracy values reported in the 11 studies showed significant heterogeneity. Whereas 4 of the 11 studies reported that GP73 was superior to AFP as a serum biomarker of HCC [18,24,25,27], the remaining 7 studies reported contrasting or ambiguous results [9,19–23,26]. Furthermore, although 4 of the 11 studies reported the accuracy of GP73 + AFP for diagnosing HCC, the conclusions of these studies were inconsistent [9,19,20,24]. Eight of the 11 studies included in our meta-analysis [9,21–27] were also included in a 2012 study by Zhou et al, who compared the accuracy of GP73 with that of AFP for diagnosing HCC [31]. Unlike that study, our meta-analysis also evaluated the accuracy of GP73 + AFP for diagnosing HCC. Moreover, we included 3 additional studies to increase the credibility of the results obtained for the diagnostic accuracy of GP73 + AFP.
GP73, a potential serum biomarker of HCC, is a 73-kDa transmembrane glycoprotein containing 400 amino acids that is normally expressed in epithelial cells of various human tissues [11]. High levels of serum GP73 were first reported by Block et al in 2005 in patients with hepatitis B virus-associated HCC [10]. In the same year, Marrero et al [27] confirmed that levels of serum GP73 in patients with HCC were considerably higher than those in patients with cirrhosis, and additionally reported that the sensitivity of GP73 for the early diagnosis of HCC was superior to that of AFP. Moreover, the systematic review and meta-analysis of Witjes et al [32] further confirmed these results. Since these findings were published, an increasing number of studies have been performed on GP73 as a biomarker of HCC.
Our meta-analysis showed that the pooled sensitivity and specificity of GP73 (0.77 and 0.91) were higher than those of AFP (0.62 and 0.84). The pooled sensitivity of GP73 + AFP (0.87) was higher than that of GP73 or AFP alone, whereas its pooled specificity (0.85) was higher than that of AFP but lower than that of GP73. In this study, the PLR and NLR values of GP73 were lower than those of AFP. By contrast, the PLR value of GP73 + AFP was higher than that of GP73 or AFP alone and its NLR value was lower than that of GP73 or AFP alone. These results indicated that the diagnostic accuracy of GP73 was comparable to that of AFP, but that the diagnostic accuracy of GP73 + AFP was superior to that of GP73 or AFP alone. DOR converts the strengths of sensitivity and specificity into a single index that represents diagnostic accuracy. DOR is defined as the ratio of the odds of positive test results of participants with a disease to the odds of positive test results of participants without that disease [33]. DOR values range from 0 to infinity, with a higher value indicating higher accuracy. In this meta-analysis, the pooled DOR values of GP73, AFP, and GP73 + AFP were 12.49, 11.61, and 30.63, respectively; this suggests that serum levels of GP73 + AFP were more helpful for early diagnosis of HCC than were the serum levels of GP73 or AFP alone, and that the accuracies of GP73 and AFP alone for diagnosing HCC did not differ markedly.
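
The verbal definition of the DOR given above can be written out for a single study as follows, where TP, FP, FN, and TN denote the true-positive, false-positive, false-negative, and true-negative counts. The numerical example uses hypothetical values (sensitivity 0.80, specificity 0.90) chosen only to illustrate the arithmetic.

```latex
\[
\mathrm{DOR}
  = \frac{TP/FN}{FP/TN}
  = \frac{TP \cdot TN}{FP \cdot FN}
  = \frac{\mathrm{sensitivity}/(1-\mathrm{sensitivity})}
         {(1-\mathrm{specificity})/\mathrm{specificity}}
  = \frac{\mathrm{PLR}}{\mathrm{NLR}}
\]
% Hypothetical single study: sensitivity = 0.80, specificity = 0.90
\[
\mathrm{DOR} = \frac{0.80/0.20}{0.10/0.90} = \frac{4}{0.111\ldots} = 36
\]
```

A pooled DOR is estimated across studies, so it need not equal the value obtained by substituting pooled sensitivity and pooled specificity into this formula.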
The SROC curve and AUC are important for assessing diagnostic data in meta-analyses. In SROC curve analysis, the emphasis is on a comprehensive evaluation of a diagnostic method and not on simply the method's sensitivity or specificity [34][35][36]. The AUC is a useful and widely used index of the SROC curve in meta-analyses and it ranges from 1, which indicates a perfect test that correctly classifies all cases and non-cases, to 0, which indicates a test that does not perform an accurate diagnosis. The AUC also shows extremely steady performance in heterogeneity tests. In our meta-analysis, the AUC values of GP73, AFP, and GP73 + AFP were 0.86, 0.84, and 0.91, respectively, which indicates that serum levels of GP73 + AFP showed higher accuracy in HCC diagnosis than did the serum levels of GP73 or AFP alone. Moreover, these AUC values indicated that the diagnostic accuracy of GP73 was superior to that of AFP.
In a meta-analysis, one of the major goals is to analyze the reasons for heterogeneity rather than to compute a single summary measure. An I² value of >50% indicates significant heterogeneity [37]. Here, the I² values of the sensitivity and specificity of GP73, AFP, and GP73 + AFP are presented using forest plots, and these reveal significant heterogeneity. A threshold effect was only one of the reasons for heterogeneity, and the meta-regression analysis performed in this study did not identify any covariate with a statistically significant effect. Therefore, our meta-analysis could not determine all the reasons responsible for the heterogeneity observed among the included studies. This might be because of inconsistencies in the assessment of study quality and the availability of limited data and incomplete information. Asymmetrical funnel plots indicated publication bias, which might exist because of diverse reasons such as poor methodological quality of small studies, inclusion of numerous studies without registration, true heterogeneity, artifactual results, and other causes. Because asymmetry can arise from so many sources, several recent studies have not evaluated publication bias among studies that assessed the diagnostic accuracy of distinct markers [34,38,39]. In this study, we also repeated all of the pooled sensitivity and specificity calculations with each of the 11 studies removed individually, and found that this did not markedly affect the final conclusion. These findings support the credibility and stability of the results of this meta-analysis.
In conclusion, the results of this meta-analysis showed that serum GP73 + AFP exhibited significantly higher diagnostic accuracy for HCC than did serum GP73 or AFP alone. Moreover, the accuracy of GP73 for diagnosing HCC was superior to that of AFP, which was similar to that observed in the Zhou et al study [31]. However, the number of studies that evaluated the serum levels of both GP73 and AFP was too small to allow meta-regression analysis for heterogeneity, and data on early detection of HCC were lacking; thus, further investigation must be conducted to assess the accuracy of serum GP73 + AFP for the early diagnosis of HCC. Moreover, additional biomarkers should be combined with GP73 and AFP for comprehensive diagnosis of HCC.