Preoperative Evaluation of the Histological Grade of Hepatocellular Carcinoma with Diffusion-Weighted Imaging: A Meta-Analysis

Objective
To evaluate the diagnostic performance of diffusion-weighted imaging (DWI) in the preoperative prediction of the histological grade of hepatocellular carcinoma (HCC).

Materials and Methods
A comprehensive literature search was performed in several authoritative databases to identify relevant articles. QUADAS-2 was used to assess the quality of included studies. Data were extracted to calculate the pooled sensitivity, specificity, positive likelihood ratio (PLR) and negative likelihood ratio (NLR). Summary receiver operating characteristic (SROC) curves were derived and areas under the SROC curve (AUC) were computed to indicate the diagnostic accuracy. Heterogeneity testing, meta-regression analysis and sensitivity analysis were performed to identify factors and studies that contributed to the heterogeneity.

Results
A total of 11 studies with 912 HCCs were included in this meta-analysis. The pooled sensitivity, specificity, PLR and NLR with corresponding 95% confidence intervals (CI) were 0.54 (0.47–0.61), 0.90 (0.87–0.93), 4.88 (2.99–7.97) and 0.46 (0.27–0.77) for the prediction of well-differentiated HCC (w-HCC), and 0.84 (0.78–0.89), 0.48 (0.43–0.52), 2.29 (1.43–3.69) and 0.30 (0.22–0.41) for the prediction of poorly-differentiated HCC (p-HCC). The AUCs were 0.9311 and 0.8513 in predicting w-HCC and p-HCC, respectively. Results were further evaluated according to the method of image interpretation. Significant heterogeneity was observed.

Conclusion
DWI had excellent and moderately high diagnostic accuracy for the detection of w-HCC and p-HCC, respectively. Nonetheless, further studies in larger populations and optimized image acquisition and interpretation are required before DWI-derived parameters can be used as a useful imaging biomarker for the prediction of the histological grade of HCC.

hepatoma) OR liver cancer)) AND ((((((diffusion magnetic resonance imaging) OR diffusion MRI) OR diffusion weighted imaging) OR DWI) OR apparent diffusion coefficient) OR ADC)) AND (((((histologic* grade) OR histopathologic* grade) OR tumor grading) OR tumor differentiation) OR neoplasm grading). We did not restrict the search to publications in any particular language.
Inclusion criteria for the meta-analysis were: (a) DWI was performed in patients with HCC before treatment; (b) patients had pathological proof of the histological grade of HCC; (c) the numbers of true-positive (TP), false-positive (FP), false-negative (FN) and true-negative (TN) results were available; and (d) the study included at least 20 HCCs. Non-original research and duplicate publications were excluded.

Data Extraction and Quality Assessment
Two investigators independently extracted the following predefined information: author, year of publication, study design (prospective or retrospective), patient age, gender (male-to-female ratio), etiology of underlying liver disease, liver function (Child-Pugh A/B/C), number of HCCs, mean tumor size, histological differentiation (well/moderately/poorly differentiated), reference standard (liver biopsy and/or surgery), blinding procedure, time interval between reference standard and index test, imaging protocols adopted to perform DWI (magnetic field strength, b values, image interpretation and diagnostic threshold) and the diagnostic results (TP, FP, FN and TN). Disagreements were resolved by consensus.
Quality assessment of studies eligible for meta-analysis was conducted according to QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) [33]. Both data extraction and quality assessment were carried out by two investigators, with discrepancies settled by consensus.

Statistical Analysis and Data Synthesis
The primary outcome was the differentiation of w-HCC from higher grades and/or of p-HCC from lower grades. Results were presented as TP, FP, FN and TN.
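As a minimal sketch of how per-study TP, FP, FN and TN counts translate into the accuracy measures pooled in this meta-analysis, the following Python snippet computes sensitivity, specificity, PLR and NLR for a single study. The class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one study's 2x2 table (DWI result vs. histological grade)."""
    tp: int
    fp: int
    fn: int
    tn: int

    def sensitivity(self) -> float:
        # Proportion of target-grade lesions that DWI called positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of other-grade lesions that DWI called negative.
        return self.tn / (self.tn + self.fp)

    def plr(self) -> float:
        # Positive likelihood ratio: sensitivity / (1 - specificity).
        return self.sensitivity() / (1.0 - self.specificity())

    def nlr(self) -> float:
        # Negative likelihood ratio: (1 - sensitivity) / specificity.
        return (1.0 - self.sensitivity()) / self.specificity()


# Hypothetical counts for illustration only.
study = TwoByTwo(tp=30, fp=10, fn=20, tn=90)
print(f"sensitivity = {study.sensitivity():.2f}, specificity = {study.specificity():.2f}")
print(f"PLR = {study.plr():.2f}, NLR = {study.nlr():.2f}")
```

Note that pooled likelihood ratios are estimated across studies and need not equal the ratios computed from the pooled sensitivity and specificity.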
Heterogeneity among the included results was assessed statistically using the Chi-square-based Q statistic and the inconsistency index (I²). The I² index measures the percentage of the total variation across studies that results from heterogeneity rather than chance. I² > 50% or p < 0.1 for the Chi-square test indicated the presence of significant heterogeneity [34]. If significant heterogeneity was observed, a random-effects coefficient binary regression model was used to summarize the pooled diagnostic performance accordingly [35]. The summary receiver operating characteristic (SROC) curve was constructed and the area under the SROC curve (AUC) was used to gauge the diagnostic performance [36]. The predictions of w-HCC and p-HCC were evaluated separately, and results were further evaluated according to the method of image interpretation.
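The relation between the Q statistic and I² can be illustrated with a short Python sketch. It assumes inverse-variance weights applied to logit-transformed sensitivities with a 0.5 continuity correction; this is one common convention and not necessarily the exact calculation performed by Meta-DiSc. The function names and the example counts are hypothetical.

```python
import math


def logit_sensitivity(tp, fn):
    """Logit of sensitivity and its approximate variance (0.5 continuity correction)."""
    tp_c, fn_c = tp + 0.5, fn + 0.5
    return math.log(tp_c / fn_c), 1.0 / tp_c + 1.0 / fn_c


def cochran_q_i2(effects, variances):
    """Return (Q, I2 in percent) for per-study effects with known variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the fixed-effect pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of Q in excess of its expectation under homogeneity (df).
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical (TP, FN) pairs for four studies; not data from this meta-analysis.
counts = [(5, 30), (20, 10), (18, 4), (12, 12)]
effects, variances = zip(*(logit_sensitivity(tp, fn) for tp, fn in counts))
q, i2 = cochran_q_i2(effects, variances)
print(f"Q = {q:.2f}, I2 = {i2:.1f}%")
```

Under this definition, I² above 50% means that more than half of the observed variation exceeds what chance alone would be expected to produce.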
The threshold effect was assessed visually from the SROC curve and quantitatively with the Spearman correlation coefficient between the logit of sensitivity and the logit of (1 - specificity). A strong positive correlation with p < 0.05 would suggest the existence of a threshold effect [37]. Where necessary, meta-regression analysis was performed on study design, age, male-to-female ratio, number of HCCs, tumor size, reference standard, blinding procedure, and imaging protocols to further explore variables that contributed to the heterogeneity. In addition, sensitivity analysis was performed to assess the robustness of the pooled results and to identify studies that may have caused notable heterogeneity. Data analyses were performed using Meta-DiSc software (version 1.4) [38].
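The quantitative threshold check described above can be sketched in Python as a rank correlation between logit(sensitivity) and logit(1 - specificity). This is an illustrative implementation using only the standard library; the helper names and the example values are hypothetical, and the p-value step (which requires a reference distribution for the coefficient) is deliberately left out.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def threshold_spearman(sensitivities, specificities):
    """Spearman rho between logit(sens) and logit(1 - spec) across studies."""
    x = [logit(s) for s in sensitivities]
    y = [logit(1.0 - s) for s in specificities]
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


# Hypothetical per-study values: sensitivity rises as specificity falls,
# which is the pattern a threshold effect would produce.
rho = threshold_spearman([0.5, 0.6, 0.7, 0.8], [0.9, 0.8, 0.7, 0.6])
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near +1 reflects the classic trade-off produced by varying cut-offs, whereas a value near zero, as reported for the included studies, argues against a threshold effect.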
Potential publication bias was assessed with Deeks' funnel plot and asymmetry test using Stata software (version 12.0). An asymmetrical funnel plot with p < 0.05 would indicate the presence of publication bias [39].
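For readers unfamiliar with the asymmetry test, the regression behind it can be sketched as follows: the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size (ESS), weighted by ESS, and a slope that differs significantly from zero suggests asymmetry. The Python code below is a simplified illustration under these assumptions, with a 0.5 continuity correction added for stability; it is not the Stata routine used in this study, the function names and example tables are hypothetical, and converting the t statistic to a p-value (t distribution with n - 2 degrees of freedom) is left out.

```python
import math


def effective_sample_size(n_pos, n_neg):
    """ESS = 4 * n1 * n2 / (n1 + n2) for n1 target-grade and n2 other lesions."""
    return 4.0 * n_pos * n_neg / (n_pos + n_neg)


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio with a 0.5 continuity correction (an assumption)."""
    tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, standard error of the slope); needs >= 3 points.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_slope = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, intercept, se_slope


def deeks_regression(tables):
    """Regress lnDOR on 1/sqrt(ESS), weighted by ESS, over (TP, FP, FN, TN) tables."""
    ess = [effective_sample_size(tp + fn, fp + tn) for tp, fp, fn, tn in tables]
    x = [1.0 / math.sqrt(e) for e in ess]
    y = [log_dor(*t) for t in tables]
    return weighted_line(x, y, ess)


# Hypothetical (TP, FP, FN, TN) tables; not data from the included studies.
tables = [(20, 5, 10, 40), (15, 10, 5, 60), (30, 8, 12, 90), (8, 3, 7, 25), (25, 20, 6, 110)]
slope, intercept, se_slope = deeks_regression(tables)
print(f"slope = {slope:.3f}, t = {slope / se_slope:.3f}")
```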

Study Characteristics and Quality Assessment
The 11 studies included in the meta-analysis comprised a total of 912 HCCs: 239 w-HCCs, 449 moderately differentiated HCCs (m-HCCs) and 224 p-HCCs. All studies were designed retrospectively. A blinding procedure was reported in 8 studies. In most studies, patients were included consecutively, with or without a history of underlying liver disease (hepatitis B/C infection, fibrosis, cirrhosis, steatohepatitis, etc.). Liver function was reported in only 4 of the 11 studies. Two studies that included only hypervascular HCCs were specifically noted. In studies [14,18] where the histopathological grades were presented as the Edmondson-Steiner grade, tumor differentiation was classified into well, moderate and poor according to Edmondson-Steiner's grading system [40].
Two studies had outcome data for the prediction of w-HCC, 4 studies had outcome data for the prediction of p-HCC, and 5 studies had outcome data for the prediction of both w-HCC and p-HCC. In predicting w-HCC, DWI was interpreted qualitatively by visual assessment (VA) in 4 studies and quantitatively by ADC quantification (ADC-Q) in 2 studies, with one further study reporting outcome data for both qualitative and quantitative interpretation. For the prediction of p-HCC, DWI was interpreted qualitatively and quantitatively in 4 and 5 studies, respectively. Therefore, a total of 7 studies with 8 data sets were available for the prediction of w-HCC, and 9 studies with 9 data sets were available for the prediction of p-HCC.
Principal information about the included studies is summarized in Table 1. The included studies were of high quality. The results of the quality assessment for the 11 studies are presented in Table 2. Fig. 2 shows a graphical display of the QUADAS-2 results regarding the proportion of studies with low, high or unclear risk of bias.

Diagnostic Accuracy for the Prediction of Well-differentiated HCC
The overall pooled sensitivity and specificity with corresponding 95% confidence intervals (95% CI) in predicting w-HCC were 0.54 (0.47-0.61) and 0.90 (0.87-0.93), respectively. Sensitivity of individual studies ranged widely from 12% to 86%, while specificity was concentrated mainly between 83% and 95%. The pooled positive likelihood ratio (PLR) and negative likelihood ratio (NLR) were 4.88 (2.99-7.97) and 0.46 (0.27-0.77), respectively. According to the SROC curve, the AUC was 0.9311. There was significant heterogeneity (I² = 89%) in sensitivity between studies. No threshold effect was detected. Meta-regression analysis revealed no factors that contributed significantly to the heterogeneity. Forest plots of sensitivity, specificity, PLR and NLR are shown in Fig. 3.
Subgroup analysis was performed according to the method of image interpretation. VA yielded a lower sensitivity (40%) and a higher specificity (93%) than ADC-Q (75% and 86%, respectively). Notable heterogeneity existed in the pooled sensitivity for VA (I² = 87.8%), while only marginally significant heterogeneity (I² = 53.5%) existed in the pooled sensitivity for ADC-Q.
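To illustrate what the pooled likelihood ratios for w-HCC imply for an individual lesion, the sketch below converts a pre-test probability into a post-test probability via odds. It assumes, purely for illustration, that the pre-test probability equals the proportion of w-HCCs in the pooled sample (239 of 912); in practice the appropriate pre-test probability depends on the clinical setting. The function name is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability using odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumption: pre-test probability taken as the share of w-HCC among all included HCCs.
prevalence_w_hcc = 239 / 912

# Pooled PLR and NLR for the prediction of w-HCC as reported in the text.
after_positive = post_test_probability(prevalence_w_hcc, 4.88)
after_negative = post_test_probability(prevalence_w_hcc, 0.46)
print(f"after a positive DWI result: {after_positive:.3f}")
print(f"after a negative DWI result: {after_negative:.3f}")
```

Under this assumption, a DWI result suggesting w-HCC raises the probability from about 26% to roughly 63%, whereas a result against w-HCC lowers it only to about 14%, which is consistent with the high specificity and relatively low sensitivity reported above.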
Two studies that included only hypervascular HCCs and showed extremely low sensitivity were excluded in the sensitivity analysis. When the remaining studies were combined, the pooled sensitivity, specificity, PLR and NLR were 0.68 (0.60-0.76), 0.90 (0.87-0.93), 6.35 (4.08-9.88) and 0.36 (0.23-0.57) for overall evaluation, and 0.60 (0.48-0.72), 0.94 (0.89-0.96), 6.96 (2.45-19.76) and 0.46 (0.24-0.85) for VA. Along with a significant decrease in heterogeneity, the pooled sensitivity increased considerably in both groups. The diagnostic results of the subgroup analysis and sensitivity analysis are presented in Table 3.

Diagnostic Accuracy for the Prediction of Poorly-differentiated HCC
Nine studies assessed the performance of DWI in predicting p-HCC. Sensitivity of individual studies was concentrated within the range of 72% to 100%, while specificity ranged widely from 16% to 90%. Combined, the 9 included studies yielded a sensitivity of 0.84 (95% CI, 0.78-0.89) and a relatively low specificity of 0.48 (95% CI, 0.43-0.52). The pooled PLR and NLR were 2.29 (95% CI, 1.43-3.69) and 0.30 (95% CI, 0.22-0.41), respectively. The AUC was 0.8513. Significant heterogeneity was observed in specificity (I² = 96.4%) between the included studies. There was no notable threshold effect in the 9 evaluated studies. Meta-regression analysis did not find any factors that contributed significantly to the heterogeneity. Forest plots of sensitivity, specificity, PLR and NLR are shown in Fig. 4. SROC curves for the prediction of both w-HCC and p-HCC are shown together in Fig. 5.
Subgroup analysis showed that VA had a high sensitivity of 91% but an extremely low specificity of 25%. Significant heterogeneity still existed (I² = 84.5%) in specificity within the VA group. By contrast, ADC-Q had a more balanced sensitivity of 78% and specificity of 82%, with only marginally significant heterogeneity (I² = 52.2%) in specificity.
For the same reason stated above, two studies were excluded in the sensitivity analysis. The results of the sensitivity analysis were similar to those before exclusion for both overall evaluation and VA. The diagnostic results are summarized in Table 3.

Publication Bias
The results of Deeks' funnel plot asymmetry test (p = 0.73 and p = 0.34) indicated the absence of notable publication bias for the prediction of both w-HCC and p-HCC.

Discussion
Low-grade HCC tends to show hyperintensity on unenhanced T1-weighted imaging and hypointensity on T2-weighted imaging [41]. Morphological evaluation has shown that larger tumor size and extrahepatic extension are associated with higher histological grade [17]. However, it is often challenging to determine the differentiation of HCC on MR images, especially in patients with cirrhosis, because of the architectural distortion of the liver parenchyma and the overlapping imaging appearances [42,43]. Fundamentally different from conventional morphology-based imaging techniques, DWI probes the function of tissues. Restricted diffusion of water in a malignant tumor is mainly caused by increased cellular density and decreased extracellular space, and presents as hyperintensity on DWI or decreased ADC [44]. An earlier study concluded that as histological grade rises, the ADC value tends to decrease and lesions are more likely to show hyperintensity on DWI [24].
To our knowledge, this is the first meta-analysis of the diagnostic performance of DWI in predicting the histological grade of HCC. Results showed that for differentiating w-HCC from higher grades, DWI had a relatively low sensitivity (54%), a high specificity (90%), and an excellent diagnostic performance (AUC = 0.9311). When differentiating p-HCC from lower grades, the pooled sensitivity was 84% and the specificity was 48%, with a moderately high diagnostic performance (AUC = 0.8513). We evaluated the diagnostic performance for the prediction of w-HCC and p-HCC separately because they have significantly different prognoses and call for different patient management. For instance, when selecting candidates for liver transplantation, consideration may be given to excluding patients with poorly differentiated HCC [12] and to including patients with tumors larger than 5 cm that have a well-differentiated histology [9,45].
There was significant heterogeneity in the pooled sensitivity for the prediction of w-HCC and in the pooled specificity for the prediction of p-HCC. The Spearman correlation coefficient confirmed the absence of a threshold effect, and meta-regression analysis did not find any factor that contributed statistically to the heterogeneity. Sensitivity analysis found that the inclusion of only hypervascular lesions was the main cause of heterogeneity in the sensitivity for the prediction of w-HCC, but barely affected the diagnostic results for the prediction of p-HCC.
Among the included studies, Nasu et al [25] and Piana et al [26] reported an extremely low sensitivity for predicting w-HCC and an extremely low specificity for predicting p-HCC. We speculate that this could be explained by the inclusion of only hypervascular HCCs (i.e., those with arterial enhancement or typical enhancement patterns). Studies have suggested that the presence of typical enhancement patterns on CT or MRI indicates not only the diagnosis of HCC but also the potential for microvascular invasion (MVI) and a greater likelihood of higher histological grade [15,46].
In 2009, Nasu et al [25] reported that there was no significant correlation between ADC and the histological grade of hypervascular HCC, but that a marginally significant correlation existed between signal intensity on DWI and histological grade. Subsequent studies included HCCs with histopathological proof of the differentiation grade, regardless of the contrast-enhanced appearance. Heo et al [23] investigated the correlation between the histological grade and both ADC and the expression of vascular endothelial growth factor (VEGF). Results showed that tumor differentiation had a significant inverse correlation with the ADC value of HCC (r = -0.51), but there was no significant correlation between histological differentiation and VEGF expression (r = -0.33). Another study found that the minimum-spot ADC was an independent risk factor for early tumor recurrence after HCC resection, suggesting that DWI has a potential role in histological tumor grading and in the prediction of early HCC recurrence [21]. Recently, Chang et al [17] reported that the ADC value and the relative intensity ratio in the arterial phase (RIRa) were the two most promising quantitative MRI parameters for distinguishing histological grade. Moreover, comparison of the two parameters revealed that ADC values were more sensitive than RIRa in differentiating w-HCC from higher grades. In summary, all these reports indicate the potential of DWI in predicting the histological grade of HCC.
In the subgroup analysis, we compared the effect of different methods of image interpretation. Qualitative interpretation was carried out by visually assessing the signal intensity (hypointense, isointense or hyperintense) on DWI images, and hypointensity or isointensity was considered an indicator of w-HCC in this study. An objective VA depends largely on blinded image interpretation and is affected by the conspicuity of the lesion on DWI, especially when lesions are smaller than 1 cm in diameter [47]. Quantitative interpretation, on the other hand, evaluates tumor cellularity through the calculation of ADC. However, ADC quantification (ADC-Q) can be easily affected by DWI sequence parameters, such as the use of parallel imaging techniques and the choice of b values [48,49], and the substantial overlap of ADC between different histological grades [22,23] makes it even more tricky to estimate tumor differentiation. A previous study suggested that in the setting of solid focal liver lesions, VA was less accurate than ADC-Q for detecting HCC [50]. Results of this study indicated that for the prediction of w-HCC, ADC-Q had the higher sensitivity, while VA demonstrated the higher specificity. When differentiating p-HCC, however, VA demonstrated the better sensitivity, while ADC-Q showed the better specificity. Considering both sensitivity and specificity, the results of quantitative interpretation (ADC-Q) seem to be more favorable for the prediction of both w-HCC and p-HCC. There are as yet no standardized imaging protocols or rules of interpretation for predicting the histological grade of HCC using DWI. In addition to the commonly applied signal intensity on DWI and the average ADC values on ADC maps, a few other parameters such as the minimum-spot ADC [21], contrast-to-noise (C/N) ratio [20] and relative contrast ratio (RCR) [27] on DWI have been explored. Despite some limitations, most studies so far have preferred the application of ADC [51].
Recent technical advances have promoted the application of intravoxel incoherent motion (IVIM) diffusion-weighted imaging in predicting the histological grade of HCC. By using the IVIM model and multiple b values, pure diffusion characteristics can be separated from pseudodiffusion caused by perfusion [52]. According to Woo et al [18], IVIM-derived D values (diffusion coefficient, representing pure molecular diffusivity) showed significantly better diagnostic performance than ADC values in differentiating high-grade HCC from low-grade HCC. The technique has also been shown to have good reproducibility [53]. Hopefully, this could improve the performance of quantitative DWI in predicting the histological grade of HCC. However, considering the limited number of studies, further studies with larger study populations are still needed.
Liver function was another concern that may affect the estimation of HCC differentiation. If patients had severely impaired liver parenchyma (fibrosis or cirrhosis), the ADC measurements might show different results [54]. We could not analyze this further because liver function was reported in only a few studies. A previous study indicated a tendency toward decreased detection of HCC on DWI with increasing severity of cirrhosis [42]. Another study reported that the hepatobiliary phase (HBP) of gadoxetic acid-enhanced MRI (EOB-MRI) predicts the histological grade of HCC only in patients with Child-Pugh class A cirrhosis [55], but there has been no similar investigation concerning DWI. However, comparison of the pathological results of the background liver parenchyma by Chang et al [17] revealed no significant differences in the presence of liver fibrosis, fatty change, or iron deposition between Child-Pugh class A and Child-Pugh class B/C patients. This, we speculate, could ease the concern to some degree, but further studies are still needed to verify it.
There are still many challenges in the preoperative evaluation of the histological grade of HCC. Generally, it is hard to distinguish well-differentiated HCCs from benign hepatocellular nodules, especially high-grade dysplastic nodules [24,56]. Moreover, lesions that are small or located in areas vulnerable to cardiac motion-related artifacts, such as the liver dome and the left subphrenic region, cannot be evaluated precisely by visual assessment or by ADC measurement [57,58]. Although many MRI sequences (conventional T1/T2-weighted [41], DCE-MRI [16,17,59], HBP images of EOB-MRI [29,60] and DWI) have been explored for the prediction of HCC differentiation, no consensus has been reached yet. So far, only one study [14] has investigated the utility of combined MRI techniques; it concluded that the combination of DWI and subtraction imaging can help better distinguish the histological grade of HCC. Larger prospective studies should be conducted to further validate the applicability of combined MRI sequences and to determine the optimal diagnostic algorithm.
Some limitations of this meta-analysis should be addressed. First, the sample size was relatively small. Because of the limited number of studies and the limited information available, further analysis was infeasible, and the results of this study should be interpreted with caution. Second, the retrospective study design, with patients scheduled for surgical resection or transplantation in many studies, might have introduced some bias in patient selection, and the blinding procedure was unclear in 3 studies, raising concerns about the quality of the included studies. Third, although meta-regression analysis found that mean tumor size did not contribute statistically to the heterogeneity, differences did exist in the inclusion of HCC lesions: only three studies included lesions smaller than 1 cm in diameter, which most studies considered difficult to visualize and to delineate with a region of interest. Fourth, owing to their retrospective nature, 3 studies took either surgical or biopsy results as the reference standard. However, determining tumor grade on biopsy may be misleading, as tumor grade is often underestimated in core biopsy specimens in comparison with surgical specimens [61,62], and the interrater disagreement for biopsy evaluation was substantial [61]. Therefore, both the reliability and the reproducibility of biopsy-based grading of HCC are open to question.
In conclusion, this meta-analysis showed that DWI had excellent and moderately high diagnostic accuracy for the detection of w-HCC and p-HCC, respectively. Although correctly predicting the histological grade of an individual lesion remains difficult at present, quantitative DWI still holds considerable potential for the non-invasive preoperative prediction of the histological grade of HCC. Further studies in larger populations and optimized image acquisition and interpretation are required before DWI-derived parameters can be used as a useful imaging biomarker for the prediction of the histological grade of HCC.