Spleen Stiffness Is Superior to Liver Stiffness for Predicting Esophageal Varices in Chronic Liver Disease: A Meta-Analysis

Background and Aims: Liver stiffness (LS) and spleen stiffness (SS) are the two most widely accessible non-invasive parameters for predicting esophageal varices (EV), but the reported accuracy of the two predictors has been inconsistent across studies. This meta-analysis aimed to evaluate the diagnostic performance of LS and SS measurement for detecting EV in patients with chronic liver disease (CLD) and to compare their accuracy. Methods: PubMed/Medline, Embase, Cochrane Library and Ovid were searched for all studies assessing SS and LS simultaneously in EV diagnosis. A total of 16 studies including 1892 patients were included in this meta-analysis, and the pooled statistical parameters were calculated using bivariate mixed-effects models. Results: For detection of any EV, LS measurement had a summary sensitivity of 0.83 (95% confidence interval [CI]: 0.78–0.87) and a specificity of 0.66 (95% CI: 0.60–0.72), whereas SS measurement had a pooled sensitivity of 0.88 (95% CI: 0.83–0.92) and a specificity of 0.78 (95% CI: 0.73–0.83). The areas under the summary receiver operating characteristic (SROC) curve for LS and SS were 0.81 (95% CI: 0.77–0.84) and 0.88 (95% CI: 0.85–0.91) respectively, and the difference was statistically significant (P<0.01). The diagnostic odds ratio (DOR) of SS (25.73) was significantly higher than that of LS (9.54), with a relative DOR of 2.48 (95% CI: 1.10–5.60; P<0.05). Conclusions: Under current techniques, SS is significantly superior to LS for identifying the presence of EV in patients with CLD. SS measurement may help to select patients for endoscopic screening.


Introduction
presented in original articles were insufficient, the corresponding author would be contacted by e-mail to provide them. Studies without available relevant data after contacting original authors were excluded.

Search strategy
A systematic search of PubMed/Medline, Embase, Cochrane Library and Ovid was performed to identify all relevant studies assessing SS and LS simultaneously in EV diagnosis. Studies published prior to 1 May 2016 were searched using the following keywords: spleen stiffness, liver stiffness, elastography, varices. A manual search of the reference lists of identified articles was also carried out. The search was limited to articles with an English abstract.

Study selection and data extraction
Two investigators (X.M. and L.W.) independently screened the search results and reviewed relevant full texts to determine eligibility. Discrepancies were resolved in consultation with a senior reviewer (Q.Z.). For each included study, the following data were extracted: author, country, year of publication, study design, number of patients, age, gender, body mass index (BMI), etiology of CLD, proportion of cirrhosis, Child-Pugh score, prevalence of EV or severe EV, definition of severe EV, measuring techniques, invalid measurements, optimum cut-off value according to the ROC curve or Youden index, and sensitivity, specificity and area under the ROC curve for SS and LS respectively. From these data we calculated the numbers of true positive, false positive, false negative and true negative results of SS and LS respectively for the diagnosis of EV or severe EV in all patients who underwent EGD.
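As an illustration of how a 2x2 table can be back-calculated from figures typically reported in a primary study, a minimal sketch follows. The function name and the example study (100 patients, 60% EV prevalence, sensitivity 0.90, specificity 0.75) are hypothetical and are not taken from any included study; the exact procedure used for this meta-analysis is not described beyond the sentence above.

```python
def reconstruct_2x2(n_total, prevalence, sensitivity, specificity):
    """Back-calculate TP, FP, FN, TN from reported summary figures.

    Counts are rounded to whole patients, so the result is an approximation
    of the original table when the published figures were themselves rounded.
    """
    with_ev = round(n_total * prevalence)   # patients with EV on EGD
    without_ev = n_total - with_ev          # patients without EV on EGD
    tp = round(with_ev * sensitivity)       # EV present, stiffness above cut-off
    fn = with_ev - tp                       # EV present, stiffness below cut-off
    tn = round(without_ev * specificity)    # EV absent, stiffness below cut-off
    fp = without_ev - tn                    # EV absent, stiffness above cut-off
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical study, for illustration only.
table = reconstruct_2x2(n_total=100, prevalence=0.60,
                        sensitivity=0.90, specificity=0.75)
print(table)
```

The four cells always sum to the number of patients, which gives a quick consistency check against the sample size reported by each study.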

Quality assessment
Risk of bias was assessed separately by two investigators using the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [20]. This tool comprises 4 domains: patient selection, index test, reference standard, and flow and timing. Each domain is assessed for risk of bias, and the first 3 domains are also assessed for applicability. In this meta-analysis, LS and SS measurements were regarded as the index tests, and EGD was the reference standard.

Data synthesis and analysis
Based on the extracted data, the summary sensitivities, specificities, and diagnostic odds ratios (DOR) with corresponding 95% confidence intervals (CI) were calculated to evaluate the performance of liver and spleen stiffness measurements for the diagnosis of EV and severe EV. The DOR combines sensitivity and specificity and was regarded as a single indicator of diagnostic test accuracy [21]. The summary ROC (SROC) curve was also constructed as an alternative global measure of accuracy to reduce the influence of heterogeneity and of different cut-off values. All summary parameters were calculated using bivariate mixed-effects models. In addition, using a Fagan nomogram, we evaluated the post-test probabilities of EV following a positive or negative test result, assuming a pre-test probability of 57%. To provide a clinically meaningful comparison, we constructed the SROC curves for liver and spleen stiffness measurements simultaneously and compared the areas under the SROC curves using a Z-test [22]. We also calculated the relative DOR (rDOR) of the two parameters with its 95% CI. When the 95% CI does not include unity, the difference in DOR between the tests is statistically significant.
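The relationship between sensitivity, specificity, likelihood ratios, the DOR and the Fagan post-test probability can be sketched as follows. This is only an illustration: it plugs the pooled SS sensitivity and specificity from the abstract (0.88 and 0.78) and the 57% pre-test probability into the standard formulas, and the function names are our own. A DOR computed this way from point estimates need not equal the pooled DOR estimated by the bivariate model.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg


def post_test_probability(pre_test, lr):
    """Bayes' rule on the odds scale, which is what a Fagan nomogram draws."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


sens, spec = 0.88, 0.78        # pooled SS estimates from the abstract
pre_test = 0.57                # assumed pre-test probability of EV

lr_pos, lr_neg = likelihood_ratios(sens, spec)
dor = lr_pos / lr_neg          # DOR = LR+ / LR-

prob_if_positive = post_test_probability(pre_test, lr_pos)
prob_if_negative = post_test_probability(pre_test, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
print(f"Post-test probability after a positive result: {prob_if_positive:.2f}")
print(f"Post-test probability after a negative result: {prob_if_negative:.2f}")
```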
Between-study heterogeneity was assessed by computing Higgins' I² and the chi-square test (P value). An I² value of more than 50% or a P value of less than 0.10 was considered to indicate substantial heterogeneity. In addition, we used meta-regression analyses according to different study characteristics to investigate sources of heterogeneity. Because there are considerable variations across different techniques for stiffness measurement and different stages of CLD, we also performed subgroup analyses to investigate the influence of such variability on diagnostic performance.
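For readers unfamiliar with the statistic, the textbook univariate definition of I² from Cochran's Q can be sketched as below. The effect sizes and variances are hypothetical placeholders, and the statistical software used for this analysis may compute heterogeneity for the bivariate model differently, so this is an illustration of the concept rather than a reproduction of the reported values.

```python
def cochran_q(effects, variances):
    """Cochran's Q around the inverse-variance weighted mean effect."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))


def i_squared(q, n_studies):
    """Higgins' I^2 in percent: share of variation beyond chance, floored at 0."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log diagnostic odds ratios and their variances (not study data).
log_dors = [1.0, 2.0, 3.0]
variances = [0.25, 0.25, 0.25]

q = cochran_q(log_dors, variances)
i2 = i_squared(q, len(log_dors))
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```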
Deeks' funnel plot was used to test for the presence of publication bias. In this test, the diagnostic log odds ratio is regressed against 1/sqrt(effective sample size), weighted by effective sample size, with a P value of less than 0.10 suggesting significant asymmetry [23]. All statistical analyses were performed in STATA 12.0 (College Station, TX) using the MIDAS command. This meta-analysis followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist (S1 PRISMA Checklist).
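The quantities entering the funnel plot asymmetry regression can be sketched as follows. This is a simplified illustration with our own helper names: it computes the effective sample size, the log DOR (with a 0.5 continuity correction when a cell is zero, which is an assumption on our part), and the weighted least-squares slope. The P value for the slope, which requires a t distribution, is left out, and the 2x2 counts below are hypothetical.

```python
import math


def effective_sample_size(n_diseased, n_non_diseased):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4.0 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)


def log_dor(tp, fp, fn, tn):
    """Natural log of the diagnostic odds ratio for one 2x2 table."""
    if 0 in (tp, fp, fn, tn):
        # Continuity correction (assumed convention) to avoid division by zero.
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def weighted_least_squares(x, y, w):
    """Return (slope, intercept) of the weighted regression of y on x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical studies as (TP, FP, FN, TN); not data from the included studies.
studies = [(54, 10, 6, 30), (40, 15, 10, 35), (80, 12, 8, 60), (25, 9, 5, 21)]

ess = [effective_sample_size(tp + fn, fp + tn) for tp, fp, fn, tn in studies]
x = [1.0 / math.sqrt(e) for e in ess]      # predictor: 1/sqrt(ESS)
y = [log_dor(*s) for s in studies]         # outcome: ln(DOR)

slope, intercept = weighted_least_squares(x, y, ess)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A slope far from zero relative to its standard error would indicate asymmetry, that is, a systematic relationship between study size and reported accuracy.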

Search results
A total of 607 studies were identified using the described search strategies. After removing duplicates and irrelevant articles, 240 studies were screened for further review. Of these, 98 studies were excluded because they did not report on both liver and spleen stiffness measurement, and 93 studies could not be included because they were not relevant to EV (n = 47) or lacked EGD (n = 46). A further 33 studies were excluded because they involved children or animal subjects (n = 8) or patients with previous surgery (n = 8), had incomplete data (n = 6) or an inadequate cut-off value (n = 2), or were reviews (n = 9). Ultimately, a total of 16 studies (14 full-text studies and 2 abstracts) including 1892 patients in whom both SS and LS were measured for EV detection were selected for meta-analysis. Of these, 13 reported the diagnostic performance of SS and LS in identifying the presence of EV [8,14,18,24–33], while 5 studies addressed the diagnosis of severe EV through spleen and liver stiffness measurement [27,31,34–36]. The coefficient of agreement between the two investigators was very good.
Minor heterogeneity was observed between studies on LS measurement, with I² = 41.76%, P = 0.09. There was no significant threshold effect between studies, with a Spearman correlation coefficient of -0.17, P = 0.58. According to meta-regression analyses, basic characteristics of patients could explain the source of heterogeneity. Studies with a mean age of less than 55 years showed higher diagnostic accuracy than those performed in older patients. Studies with a higher proportion of male participants (over 70%) also showed better diagnostic performance of LS (P<0.01). The accuracy of LS was not affected by measurement technique, location, quality of study, proportion of cirrhosis, etiology of disease or sample size (P>0.05) (S2 Table). The funnel plot asymmetry test showed no evidence of publication bias between studies (P = 0.68). A separate analysis restricted to the TE technique (n = 9) was conducted to explore the optimal range of cut-off values. In terms of DOR, there was no significant difference between studies with a cut-off value lower than 21 kPa (n = 5, DOR = 7.21) and the others (n = 4, DOR = 11.523), P = 0.09.
There was no significant heterogeneity in the analysis of SS for the prediction of EV (P = 0.13, I² = 27.76%). Threshold effect was not observed for SS analysis (P = 0.02). The funnel plot asymmetry test showed no evidence of publication bias for SS in EV diagnosis (P = 0.92). A separate analysis restricted to the TE technique (n = 9) was conducted to explore the optimal range of cut-off values. For studies with cut-off values lower than 47 kPa, the DOR of SS in predicting EV was 34.92, which was significantly higher than that of the studies with a cut-off value ≥47 kPa, P<0.05.

Spleen stiffness is superior to liver stiffness for the prediction of esophageal varices in patients with chronic liver disease
Our results indicated that SS predicted the presence of EV better than LS, in terms of both sensitivity and specificity. The area under the SROC curve of SS for diagnosis of EV was 0.88 (95% CI: 0.85-0.91), while that of LS was 0.81 (95% CI: 0.77-0.84) (Fig 3). The difference between the two SROC values was significant according to the Z-test (Z = 3.74, P<0.01). The summary DOR of SS (DOR = 25.73) was higher than that of LS (DOR = 9.54), and the difference was statistically significant (rDOR = 2.48, 95% CI: 1.10-5.60, P = 0.03). Because the measurement technique varied between included studies, a single cut-off value could not be determined. To decrease the influence of different diagnostic thresholds, all included studies defined the optimum cut-off value according to the ROC curve or Youden index to maximize sensitivity and specificity. At the corresponding cut-off values, the summary sensitivities of SS and LS for detecting the presence of EV were 0.88 and 0.83 respectively (Z = 1.13, P = 0.26), whereas the specificity of SS was significantly higher than that of LS (0.78 versus 0.66; Z = 2.35, P = 0.02). A Z-test based on the joint model of sensitivity and specificity demonstrated that the diagnostic accuracy of SS and LS differed significantly for prediction of EV (P = 0.03). Table 3 summarizes the pooled accuracy and the comparison of LS and SS measurement.
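The link between the reported rDOR, its 95% CI and the P value can be checked with a short calculation. The sketch below assumes the CI is a Wald interval that is symmetric on the log scale, which is an assumption on our part rather than a statement about how the software derived it; the function name is our own.

```python
import math


def z_and_p_from_ratio_ci(ratio, lower, upper, z_crit=1.96):
    """Recover the SE of ln(ratio) from a 95% CI, then a two-sided Wald P value."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(ratio) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p = 2 * (1 - cdf)
    return se, z, p


# Reported values: rDOR = 2.48, 95% CI 1.10-5.60.
se, z, p = z_and_p_from_ratio_ci(2.48, 1.10, 5.60)
print(f"SE of ln(rDOR) = {se:.3f}, Z = {z:.2f}, P = {p:.3f}")
```

Under this assumption the recovered P value rounds to 0.03, in line with the value reported above, and the lower confidence limit above 1 carries the same information.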
Significant heterogeneity was observed in the analysis of severe EV. Because of the limited number of included studies, meta-regression could not be used to explore the factors inducing heterogeneity. The funnel plot asymmetry test showed no evidence of publication bias for LS or SS in detecting severe EV (P = 0.15 and 0.55 respectively).

Discussion
LS and SS are the two non-invasive parameters that have received the most attention for identifying patients with EV, but the diagnostic value of these two predictors is still controversial. In this meta-analysis, we evaluated the performance of LS and SS simultaneously for detecting EV and severe EV in patients with CLD, and compared their diagnostic accuracy. Our results indicated that SS was superior to LS for predicting the presence of EV in patients with CLD, while the diagnostic accuracy of both LS and SS was limited in predicting severe EV.

Comparison of LS and SS for Esophageal Varices Diagnosis
During the progression of liver cirrhosis and portal hypertension, passive congestion and tissue hyperplasia characterized by a combination of angiogenesis and fibrogenesis frequently occur in the spleen [37]. All these changes result in increased SS, which is closely related to portal hypertension and reflects extra-hepatic hemodynamic changes. As for LS, although it appears to be a reliable surrogate for liver biopsy in identifying mild or advanced fibrosis, the pathophysiological basis for its correlation with portal hypertension remains poorly defined [38]. LS reflects only the increased intra-hepatic vascular resistance, not the hyperdynamic circulation or the opening of portal-systemic shunts [38]. For this reason, SS predicts the formation of EV caused by splanchnic hemodynamic changes better than LS [8], which is consistent with our results.
Combining different non-invasive markers is also an important and valid approach to excluding EV in clinical practice. The combination of LS with other spleen-related parameters is considered to increase diagnostic accuracy [8]. This indicates that combining LS with parameters reflecting extra-hepatic hemodynamics could be a valuable tool with better diagnostic accuracy for the prediction of EV. Studies have shown that combining LS and SS measurements further increased the diagnostic accuracy for EV [30,32]. Hence, it may be possible to construct a combined model based on SS measurement with satisfactory accuracy for predicting EV.
Several techniques were used for liver and spleen stiffness measurement in the included studies. As the most widely used method for organ stiffness assessment, TE is available in many clinical centers, although it requires a dedicated FibroScan device [39]. It should be mentioned that the rate of reliable measurements by TE is quite low in obese patients and in patients with ascites. We observed that such patients tended to be excluded in most original studies included in this meta-analysis. In contrast, ARFI and SWE are two novel, popular ultrasound-based techniques that can be used in the presence of ascites. However, there is limited validation of these two techniques, and their measures of quality are not well defined [19]. In this meta-analysis, we observed no significant heterogeneity between different techniques, and the threshold effect was not obvious. Thus, the diagnostic performance of all these techniques was comparable. Moreover, we excluded studies with a different threshold standard. Each study included in our meta-analysis determined its own cut-off value following the same standard, which minimizes the influence of different techniques and cut-off values and ensures the comparability of the studies.
For severe EV, our results indicated that both liver and spleen stiffness measurement showed limited diagnostic accuracy. Based on current studies, LS is considered not to correlate with the grade of EV [40,41], whereas SS measurement may be able to identify severe EV, although its accuracy is not high [42]. Additional studies are needed to verify the diagnostic performance of LS and SS in predicting severe EV.
Singh et al. summarized the accuracy of SS measurement as a new predictor for the detection of EV [19]. Extending upon previous studies, we compared the diagnostic value of this newly proposed parameter with that of conventional LS measurement in the prediction of EV. We concluded that SS is significantly superior to LS in EV diagnosis, which is helpful in clinical practice. In addition, with the development of elastography techniques, more recent studies (especially from the last two years) were included in this meta-analysis, which keeps our study novel and timely. As a result, only 5 of the studies included in our meta-analysis were also included in the previous publication.
The strengths of our study were the comprehensive and simultaneous assessment of the diagnostic value of LS and SS for the prediction of EV and the direct comparison of the two parameters. All comparative studies included in our meta-analysis provided sufficient data for both LS and SS simultaneously, which decreases the risk of bias from patient spectrum, disease prevalence and inter-observer variability. Furthermore, a Z-test was used to compare the SROC values of LS and SS for predicting the presence of EV, and the rDOR was also calculated to compare diagnostic accuracy based on the DOR, which supports the reliability of our findings.
The limitations of our meta-analysis should be taken into consideration. First, only 5 studies described the performance of SS and LS for severe EV diagnosis, which precluded meta-regression and subgroup analyses to explain the heterogeneity. More research is also needed to validate our summary results of LS and SS in identifying severe EV. Second, minor heterogeneity existed in the analysis of LS for prediction of EV in our meta-analysis. Although the heterogeneity is acceptable and could be explained by the characteristics of the patients involved, it may still have affected the reliability of our results. Third, the range of detection and the units differ completely between the included techniques, which limited their comparison. Because all studies in this analysis had to report the performance of LS and SS simultaneously, the number of included studies using some techniques that are frequently used clinically, such as ARFI and SWE, was too small to be analyzed separately. For this reason, we could not obtain the optimal cut-off range for each technique. In this meta-analysis, only a separate analysis specific to TE was provided. Therefore, our summary conclusion that SS is superior to LS for predicting the presence of EV also needs to be validated for each specific technique in future, based on more original studies.
In conclusion, our meta-analysis demonstrated that SS is superior to LS for predicting the presence of EV in patients with CLD. Although the accuracy of the two parameters in identifying severe EV is not high, they could still be considered as an option for screening for EV in newly diagnosed cirrhosis. Combining LS and SS may improve diagnostic accuracy, and it may also be possible to construct a novel combined model with higher accuracy in predicting EV. Simple, low-cost and more accurate non-invasive models are needed in future as surrogates for endoscopy in EV detection.