Diagnostic accuracy of midkine on hepatocellular carcinoma: A meta-analysis

Objective To evaluate the dependability and accuracy of midkine (MK) in the diagnosis of hepatocellular carcinoma (HCC). Methods PubMed, EMBASE, Web of Science, China Biology Medicine disc and grey literature sources were searched from the date of database inception to January 2019. Two authors (B-H.Z. and B.L.) independently extracted the data and evaluated the study quality using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. The sensitivity, specificity, positive likelihood ratio (LR+) and negative likelihood ratio (LR−) were estimated using a bivariate model. Moreover, hierarchical summary receiver operating characteristic curves were generated. The diagnostic odds ratio (DOR) and area under the curve (AUC) were pooled using a univariate model. Results Nine articles (11 studies) with 1941 participants were included. The bivariate analysis revealed that the sensitivity and specificity of MK for HCC diagnosis were 0.85 (95% CI 0.78–0.91) and 0.83 (95% CI 0.76–0.88), respectively. We also found an LR+ of 5.05 (95% CI 3.33–7.40), an LR− of 0.18 (95% CI 0.11–0.28), a DOR of 31.74 (95% CI 13.98–72.09) and an AUC of 0.91 (95% CI 0.84–0.99). Subgroup analyses showed that MK provided the best efficiency for HCC diagnosis when the cutoff value was greater than 0.5 ng/mL. Conclusions MK has an excellent diagnostic value for hepatocellular carcinoma.


Introduction
According to recent EASL HCC guidelines, approximately 854,000 new cases of liver cancer are diagnosed annually, among which hepatocellular carcinoma (HCC) is the most frequent type, accounting for up to 90 percent [1]. It is also the fifth most common cancer and the third most common cause of cancer-related death globally [2,3]. The evolution of HCC is a multistep process from chronic liver disease to liver cirrhosis to primary HCC and eventually to metastatic HCC [4]. Patients who are diagnosed with HCC at an early stage are more likely to be cured and have a 70% chance of living more than 5 years with appropriate therapies such as hepatectomy or liver transplantation. Those who are diagnosed at an advanced stage, in contrast, qualify only for palliative treatments and have unsatisfactory median survival times ranging from 1 to 2 years [5]. These data underscore the importance of early and accurate HCC diagnosis.
Some guidelines have ruled out α-fetoprotein (AFP) and recommend ultrasound (US) as the standard HCC monitoring procedure in cirrhotic patients [6,7]. A recent meta-analysis concluded that US plus AFP may serve as an updated screening strategy for early HCC; however, the sensitivity and specificity are still low (63% and 45%, respectively) [5]. Moreover, many non-invasive screening tools, such as non-coding RNAs, des-γ-carboxyprothrombin and midkine (MK), have been investigated for use in the diagnosis of HCC [8]. As early as 1996, the serum level of MK assessed by enzyme-linked immunoassay (EIA) was found to be undetectable or lower than 0.6 ng/mL in healthy participants, whereas more than fifty percent of HCC patients had an MK value between 0.6 and 8 ng/mL [9]. Using EIA, Ikematsu et al. found that the highest level of normal serum MK does not reach 0.5 ng/mL, whereas the serum MK levels in 25 HCC cases were all greater than 0.5 ng/mL [10]. In addition, because MK is a secreted protein, it is easy to quantify in blood samples. All these characteristics indicate that MK has a promising future as a tool for non-invasive, early and sensitive HCC diagnosis [11]. However, the small number of cases in each study has limited the accuracy of the results, and the diagnostic ability of MK has not yet been fully elucidated. We therefore conducted a systematic review and meta-analysis to determine the diagnostic power of MK for HCC.

Methods
This meta-analysis followed a prespecified protocol registered with PROSPERO in 2018 (https://www.crd.york.ac.uk/PROSPERO/, CRD42018103537) and is reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (S1 Table) [12].

Eligibility criteria
We enrolled studies that evaluated the use of the blood level of MK for the diagnosis of HCC. Studies with insufficient data or those including subjects with other types of liver tumours were excluded. If two studies had an identical cohort, we excluded the less informative one or the one with a smaller population.

Identification and selection of studies
We systematically searched electronic databases including PubMed, EMBASE, Web of Science and China Biology Medicine disc (CBMdisc) from the date of database inception to January 2019, without imposing language restrictions. We used the MeSH terms "liver", "neoplasms", "carcinoma", "midkine", "sensitivity and specificity", "roc curve" and "diagnosis" for literature retrieval. Details of the search strategies for PubMed and EMBASE are presented in S1 Fig. For CBMdisc, a combination of Chinese and English search terms was required. Relevant unpublished work concerning MK and HCC was sought through a grey literature search of meeting proceedings and abstracts from the American Association for Cancer Research and the American Society of Clinical Oncology. Finally, we identified candidate articles from the references of pertinent reviews and original studies.
First, the titles and abstracts of retrieved studies were independently screened and filtered by two investigators (B-H.Z. and B.L.). Second, the eligibility of the full-text articles was determined through separate review by the same two investigators. Duplicate use of an identical cohort was carefully evaluated. Disagreements were resolved through discussion or consultation with a third investigator (J-Y.Y.).

Data extraction and quality assessment
Two investigators (B-H.Z. and B.L.) independently extracted the information below. First, the following main characteristics of the included studies were extracted: first author name, year of publication, country, sample type, number of participants, age, sex distribution, type of controls, detection method and cutoff values. Second, the following data concerning the diagnostic accuracy were collected: true positive (TP) rate, false positive (FP) rate, false negative (FN) rate, true negative (TN) rate, sensitivity and specificity. All data are publicly available in the Open Science Framework (osf.io/gw8em/). The generic Quality Assessment of Diagnostic Accuracy Studies (QUADAS)-2 tool was applied for the quality evaluation of the enrolled studies [13]. Two investigators (B-H.Z. and B.L.) independently rated the four domains for the "Risk of Bias" and "Applicability Concerns". Consensus was reached through deliberation.

Data synthesis
We fitted hierarchical models when there were at least 4 studies available. All calculations were performed with the package 'mada' in R (version 3.6.0). Cells in the contingency table that were zero required a continuity correction, with a recommended value of 0.5, because certain ratios would otherwise be undefined.
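The continuity correction can be illustrated with a minimal Python sketch. This is not the authors' code (they used the R package 'mada'); the function names and the example counts are hypothetical. The sketch assumes the correction is added to every cell of a table that contains a zero; whether the authors corrected only the affected tables or all tables is not stated in the text.

```python
def continuity_correct(tp, fp, fn, tn, correction=0.5):
    """Return the 2x2 cells, adding `correction` to each cell if any cell is zero."""
    cells = (tp, fp, fn, tn)
    if any(c == 0 for c in cells):
        return tuple(c + correction for c in cells)
    return tuple(float(c) for c in cells)


def diagnostic_odds_ratio(tp, fp, fn, tn):
    """DOR = (TP * TN) / (FP * FN); undefined when FP or FN is zero."""
    return (tp * tn) / (fp * fn)


# Hypothetical study with no false positives: the raw DOR would divide by zero.
raw = (30, 0, 5, 40)
corrected = continuity_correct(*raw)      # each cell shifted by 0.5
dor = diagnostic_odds_ratio(*corrected)   # now finite
```

Without the correction, the call `diagnostic_odds_ratio(*raw)` would raise a `ZeroDivisionError`, which is the practical meaning of a ratio that does not exist.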
The sensitivity and specificity with corresponding 95% CIs were recalculated from the TP, FP, FN and TN rates extracted via a 2 × 2 table from each included study. The threshold effect was initially assessed from the correlation between the sensitivity and the false positive rate (1 − specificity) through visual evaluation of coupled forest plots and was further verified by the Spearman correlation coefficient ρ (> 0.6) between the logit of sensitivity and the logit of the false positive rate [14].
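As a rough illustration of the threshold-effect check, the following standard-library Python sketch computes sensitivity and specificity from 2 × 2 cell counts and then the Spearman correlation between logit(sensitivity) and logit(1 − specificity). The helper names and the three example tables are hypothetical and are not data from the included studies; the sketch also assumes that no cell is zero, so that the logits are finite.

```python
import math


def sens_spec(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def logit(p):
    return math.log(p / (1 - p))


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def threshold_rho(tables):
    """Spearman rho between logit(sensitivity) and logit(false positive rate)."""
    logit_sens, logit_fpr = [], []
    for tp, fp, fn, tn in tables:
        se, sp = sens_spec(tp, fp, fn, tn)
        logit_sens.append(logit(se))
        logit_fpr.append(logit(1 - sp))
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(logit_sens), ranks(logit_fpr))


# Hypothetical (TP, FP, FN, TN) tables in which sensitivity and false positive
# rate rise together, the pattern a threshold effect would produce.
tables = [(40, 5, 10, 45), (45, 10, 5, 40), (48, 20, 2, 30)]
rho = threshold_rho(tables)
```

In these made-up tables both quantities increase from study to study, so ρ is strongly positive and exceeds the 0.6 criterion; a negative or weak ρ, as reported for MK in this meta-analysis, does not support a threshold effect.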
The bivariate random effects model by Reitsma et al. [15] for diagnostic meta-analyses was applied to obtain the pooled estimates of the sensitivity, specificity, positive likelihood ratio (LR+) and negative likelihood ratio (LR−). Additionally, the hierarchical summary receiver operating characteristic (HSROC) curves were calculated with both the Rutter & Gatsonis and Rücker & Schumacher approaches [16,17]. We implemented independent evaluations of the diagnostic performance based on the diagnostic odds ratio (DOR) using the DerSimonian and Laird (DSL) model [18] and the area under the curve (AUC) using Holling's model [19]. The heterogeneity of the DOR was determined using the chi-squared test and Higgins' inconsistency index (I²). The statistic for the chi-squared test was Q, and a corresponding p value was calculated for the qualitative assessment of heterogeneity. We set 0.1 as the cutoff significance level [20]; however, with only 9 studies included in our investigation (< 20), the Q test should be interpreted very cautiously [21]. Higgins' I² statistic, computed as I² = 100% × (Q − df)/Q, where df denotes the degrees of freedom, was also used as a measure of between-study heterogeneity [22]. The level of heterogeneity was deemed negligible, moderate, and considerable for I² values of 25%, 50%, and 75%, respectively [22]. We also conducted a series of prespecified subgroup analyses based on sample type, number of participants, country, control type and cutoff values. Two different thresholds (≤0.5 ng/mL and >0.5 ng/mL) were chosen for the exploration of diagnostic accuracy in reference to the existing practice [10]. Deeks' funnel plot was generated to test for publication bias [23].
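The univariate pooling of the DOR and the heterogeneity statistics can be sketched as follows. This is a minimal Python illustration of the textbook DerSimonian and Laird estimator applied to log DORs, not the authors' 'mada' code; the function names are hypothetical. It assumes that df is the number of studies minus one, that no cell is zero (or that zeros have already been corrected), and that I² is truncated at zero, a common convention that the text does not state.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its approximate variance."""
    estimate = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return estimate, variance


def dersimonian_laird(effects, variances):
    """Random-effects pooling with Cochran's Q and Higgins' I^2."""
    w = [1 / v for v in variances]                       # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                        # between-study variance
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"pooled": pooled, "se": se, "q": q, "df": df, "tau2": tau2, "i2": i2}


def pooled_dor(tables):
    """Pool (TP, FP, FN, TN) tables on the log scale and back-transform."""
    effects, variances = zip(*(log_dor(*t) for t in tables))
    fit = dersimonian_laird(list(effects), list(variances))
    low = math.exp(fit["pooled"] - 1.96 * fit["se"])
    high = math.exp(fit["pooled"] + 1.96 * fit["se"])
    return math.exp(fit["pooled"]), (low, high), fit
```

A small hand-checkable case: two effects of 0 and 2, each with variance 1, give Q = 2 on df = 1, so I² = 100% × (2 − 1)/2 = 50%.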

Study selection and characteristics
As seen in the flowchart, a total of 139 articles met the preliminary standards, including 41 from PubMed, 42 from EMBASE, 28 from Web of Science and 28 from CBMdisc (Fig 1). Ninety-three records remained after removing duplicates. Additionally, 55 irrelevant studies and 10 reviews and meta-analyses were excluded based on screening the titles and abstracts. The remaining 28 articles were considered eligible for full-text review. Nineteen additional studies, comprising 16 with insufficient data, 2 with identical cohorts and 1 with a case group composed of patients with cholangiocarcinoma, were excluded. A manual search for grey literature and references found no applicable results. These strict eliminations yielded a group of 9 articles (11 studies) for inclusion in the meta-analysis [24–32], one of which was a poster presentation [24]. The studies were conducted in China, Egypt, Taiwan and Australia.
The primary attributes of the enrolled studies are summarized and listed alphabetically in Table 1 and S2 Table. Six studies also analysed the diagnostic potential of AFP [24,25,28,30–32], and only three studies addressed the combined diagnostic potential of AFP and MK [28–30]. Due to the scarcity of studies, we did not calculate the indexes relating to the combined AFP and MK group. The number of participants in each study ranged from 70 to 833, with a median of 164. In total, the meta-analysis included 1941 individuals, namely, 834 HCC patients and 1107 non-HCC participants. Specifically, the non-HCC participants included 123 with gastrointestinal tumour (GIT), 73 with benign liver tumour (BLT), 453 with liver cirrhosis, 27 with chronic hepatitis C (CHC), 86 with chronic hepatitis B (CHB), 50 with benign gastrointestinal disease (BGID) and 295 healthy people. Enzyme-linked immunosorbent assay (ELISA) served as the uniform testing method [24–28,30–32] for serum MK. Only one study investigated the MK level in whole blood, and it performed the experiment with TaqMan [29].

Quality assessment
The results of the QUADAS-2 assessment regarding the risk of bias and applicability concerns are summarized in S3 Table. We did not assign quality scores because of underlying heterogeneity [33].
The details are presented below. For the "risk of bias", the major concerns were "patient selection" and the "index test". This was mainly due to uncertainty about whether consecutive or random sample collection was used, the case-control design, and the arbitrary choice or absence of a preset cutoff value. In the absence of explicit reference standards, two studies were marked as high risk. In addition, studies that did not report an appropriate interval between the index test and the reference standard were deemed unclear or risky. With regard to the "applicability concerns", most of the included studies showed low risk, and the two unclear risk studies did not describe the reference standard; hence, we could not evaluate the applicability.

Diagnostic accuracy
In general, our analysis revealed that the sensitivity and specificity of MK in the diagnosis of HCC ranged from 0.60 to 1.00 (median, 0.87) and from 0.62 to 1.00 (median, 0.84), respectively (Fig 2A). Neither the visual assessment of the coupled forest plots nor the Spearman correlation coefficient ρ (−0.50, 95% CI −0.85 to 0.14) supported a threshold effect. For AFP, we incorporated common cutoff values (20, 40 and 200 ng/mL) among the various values addressed by one study for further analysis. The sensitivity and specificity ranged from 0.25 to 0.83 (median, 0.52) and from 0.35 to 1.00 (median, 0.84), respectively (Fig 2B). No threshold effect was found on the forest plots or with the Spearman ρ (0.38, 95% CI −0.45 to 0.85).

Discussion
The abnormal expression of MK has been widely investigated in various malignancies [32,34]. In contrast, MK is rarely detectable in non-malignant blood samples, and the encouraging non-invasive diagnostic potential of MK for tumours is worth in-depth investigation. Jing et al. concluded that MK performs well in the diagnosis of malignant diseases such as oesophageal squamous cell carcinoma, paediatric embryonal tumour, colorectal cancer, hepatocellular carcinoma, thyroid cancer, non-small cell lung cancer, mesothelioma and head and neck squamous cell carcinoma. However, tumour heterogeneity confers substantial limitations on the conclusions [35]. Here, we found a "good" AUC for MK, compared with a "reasonable" AUC for AFP according to the criterion proposed by Jones et al. [36]. Likewise, the pooled DOR for MK exceeded that for AFP. The overall sensitivity was greater for MK than for AFP (p < 0.001), yet the overall specificity was approximately equal. In summary, MK is an adequate diagnostic biomarker that is generally more sensitive than AFP for the discrimination of HCC patients from healthy individuals and from cirrhosis, CHC, CHB, GIT, BLT and BGID patients.
To the best of our knowledge, this is the first systematic review and meta-analysis evaluating the diagnostic accuracy of MK in HCC individuals. We conducted the current systematic review according to the PRISMA guidelines [12] and used a preestablished protocol registered in PROSPERO to guarantee the internal validity of our conclusions. A rigorous search of online databases and grey literature sources without language restriction avoided selection bias stemming from the source of the literature. Two authors (B-H.Z. and B.L.) independently extracted data and assessed the quality of the studies using QUADAS-2 [13], a meticulous tool for diagnostic meta-analyses. We used both univariate and bivariate models to synthesize the existing data.
Nine articles including 11 studies were collected and included in the subgroup analyses of MK. We incorporated five covariates: sample type, number of participants, country, control type and cutoff value. As indicated, the pooled sensitivity of the studies with >100 participants [24,26,27,31,32] was lower than that of studies with ≤100 participants [25,28–30]. In addition, the pooled specificity of studies with >100 participants was lower than that of studies with ≤100 participants. We noticed that the entire population of studies with ≤100 participants was still small, with a maximum sample size of only 75 [28]. However, the small-study effect is a typical mechanism well documented in randomized clinical trials, and it seems less marked in diagnostic meta-analyses [37]. Furthermore, the pooled sensitivity and specificity of MK in studies with cutoff values >0.5 ng/mL were manifestly greater than those in studies with cutoff values ≤0.5 ng/mL [27]. We noticed that MK can bind to heparin sulfate on the vascular endothelial surface. This binding could undermine the sensitivity and specificity of the use of serum MK for HCC detection. As reported previously, the intravenous administration of heparin could increase the serum MK level in a dose-dependent manner [26]. A heparin-ELISA is another type of heparin test that increases the sensitivity of MK, or in other words, lowers the cutoff value (0.07 ng/mL). Traditional HSROC parametrization (Rutter & Gatsonis method) revealed the conspicuous superiority of MK over AFP with regard to the diagnosis of HCC. It should also be noted that in this meta-analysis, each enrolled study represented the particular population of a single institution and consequently defined its own optimal cutoff value. The diagnostic efficiency per study could therefore be overestimated, correspondingly inflating the pooled estimates to a certain degree.
In this case, an alternative approach, the conservative Rücker & Schumacher method, was employed to compute the HSROC curve as a supplement to account for this limitation. The resulting curves all verified the better diagnostic accuracy of MK compared with AFP. The rate of AFP-negative (<20 ng/mL) HCC limits the practicability of AFP for HCC surveillance. The secretory ability of hepatic tumours could be dampened by their small size. Even among larger lesions, twenty percent are not correlated with upregulated levels of AFP [31]. Five studies agree that the MK level is independent of the AFP level [28–30,32,38]. Additionally, four studies reported a high positivity rate for MK in AFP-negative HCC [24,28,31,32], suggesting the excellent sensitivity of the combination of MK and AFP. In addition, Vongsuvanh et al. suggested that MK could be used for the pre-clinical diagnosis of HCC. In 2000, Ikematsu and colleagues reported a decreased level of serum MK in 4 out of 5 HCC patients after curative surgery [10]. A later study reported that thirty-six HCC patients experienced a sharp decline in the serum level of MK four weeks after hepatectomy. Meanwhile, the serum levels of MK in patients with documented recurrence (20/36) increased to the preoperative levels [32]. However, Hung et al. concluded that the longitudinal monitoring of serum MK is incapable of detecting HCC recurrence and de novo HCC [26]. Further well-designed studies with larger sample sizes are needed to settle these disputes.
Several limitations should be acknowledged. First, despite an exhaustive search procedure, only 9 eligible articles (11 studies) were obtained. Quality assessment uncovered studies with high or unclear risks of bias, which could be explained by their suboptimal study designs. Second, only three studies reported or had sufficient information to calculate the data regarding the diagnostic accuracy of combined MK and AFP; hence, we could not perform a comparative study of the combined and individual diagnostic accuracies. The lack of AFP studies in the included literature and the selection of different cutoff values for AFP may also undermine the stability of our results. Third, the diversity of the control group weakened the accuracy of the specificity values. Specifically, direct-acting antiviral agents (DAA) and nucleotide analogues (NUC) are safe and effective at eradicating HCV and HBV infection, respectively. Therefore, the possible use of DAA or NUC regimens in patients with CHC and CHB in the control group may impede a robust conclusion. Likewise, the aetiology of liver cirrhosis and the trend for the application of lower AFP thresholds (<20 ng/mL) to monitor HCC recurrence may affect the robustness of the conclusion.
In conclusion, MK has a high diagnostic accuracy for HCC screening. More studies are needed to investigate the differential expression of MK in blood samples from patients with different degrees of liver fibrosis and its value in the diagnosis of cirrhotic and non-cirrhotic liver cancer patients. Whether the combination of MK and AFP provides better performance for HCC detection remains unknown. Further studies with rigorous designs are warranted to complete a full-scale evaluation of combined MK and AFP implementation as a means to accelerate the clinical investigation of individualized screening options.