The Performance of Enhanced Liver Fibrosis (ELF) Test for the Staging of Liver Fibrosis: A Meta-Analysis

Background The enhanced liver fibrosis (ELF) test has been shown to accurately predict significant liver fibrosis in several liver diseases. Aims To perform a meta-analysis to assess the performance of the ELF test for the assessment of liver fibrosis. Study Electronic and manual searches were performed to identify studies of the ELF test. After methodological quality assessment and data extraction, pooled estimates of the sensitivity, specificity, area under the receiver operating characteristic curve (AUROC), positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR) and summary receiver operating characteristic (sROC) curves were assessed systematically. The extent of heterogeneity and the reasons for it were assessed. Results Nine studies were identified for analysis. The pooled sensitivity, specificity, PLR, NLR, and DOR values of the ELF test for the assessment of significant liver fibrosis were 83% (95% CI = 80–86%), 73% (95% CI = 69–77%), 4.00 (95% CI = 2.50–6.39), 0.24 (95% CI = 0.17–0.34), and 16.10 (95% CI = 8.27–31.34), respectively; for the evaluation of severe liver fibrosis, they were 78% (95% CI = 74–81%), 76% (95% CI = 73–78%), 4.39 (95% CI = 2.76–6.97), 0.27 (95% CI = 0.16–0.46), and 16.01 (95% CI = 7.15–35.82), respectively; and for the estimation of cirrhosis, they were 80% (95% CI = 75–85%), 71% (95% CI = 68–74%), 3.13 (95% CI = 2.01–4.87), 0.29 (95% CI = 0.19–0.44), and 14.09 (95% CI = 5.43–36.59), respectively. Conclusions The ELF test shows good performance and considerable diagnostic value for the prediction of histological fibrosis stage.


Introduction
Liver fibrosis is a consequence of various chronic liver diseases, often caused by viruses, alcohol, and fat deposition, and can result in liver cirrhosis. Cirrhosis is the main cause of morbidity and mortality in chronic liver disease, but is often asymptomatic until the synthetic and filtering functions of the liver are finally compromised or portal hypertension develops. In addition, for chronic viral hepatitis, the degree of liver fibrosis is an important parameter for decisions on antiviral therapy [1], so the early detection of fibrosis progression and the development of cirrhosis are important in the management of patients with chronic liver disease.
Presently, a liver biopsy remains the reference standard for evaluating liver fibrosis. However, it is limited by sampling error and the risk of complications [2,3]. Intra- and interobserver variability may lead to misinterpretation of the fibrosis stage [4][5][6][7].
One reason for the difficulty in correctly assessing fibrosis stages may be simply that a biopsy specimen represents only 1/50,000th of the total liver mass [2]. Even with adequate biopsy samples (≥15 mm in length with five or more portal tracts), cirrhosis can be understaged in 10–30% of cases [8]. Moreover, it is usually difficult to undertake biopsies on a repeated basis because of their invasive nature and complications such as pain and bleeding.
Thus, much attention has been focused on the development of non-invasive methods, including radiological and biochemical tests, to detect liver fibrosis. Transient elastography for assessing liver stiffness has become available for the evaluation of liver fibrosis as a rapid, non-invasive method. However, this technique is cost-intensive and its availability is largely limited to specialist liver centres. Moreover, liver stiffness measurements can be difficult or impossible in obese patients, in those with narrow intercostal space, and in patients with ascites [9], and a failure rate up to 18.9% has been reported [10].
An alternative approach to assessing the degree of liver fibrosis focuses on serum biomarkers. The combined use of three serum biomarkers has recently been proposed for the detection of fibrosis: hyaluronic acid (HA) [11], a component of the extracellular matrix (ECM) that is primarily cleared from the bloodstream by the hepatic sinusoids; tissue inhibitor of metalloproteinases 1 (TIMP-1) [12][13][14][15][16], which inhibits the activities of matrix metalloproteinases (MMPs); and amino-terminal propeptide of procollagen type III (PIIINP) [17][18][19][20], which reflects collagen synthesis at the site of disease. In clinical practice, serum samples are analysed for levels of HA, TIMP-1 and PIIINP, and the results are entered into an established algorithm and expressed as a discriminant score. This simplified version of the panel is called the enhanced liver fibrosis (ELF) score. In other words, a higher concentration of the individual biomarkers leads to a higher ELF score and indicates a greater likelihood of more severe fibrosis. The ELF test has several strengths, such as a high degree of automation, high reproducibility, minimal invasiveness, and proven diagnostic performance in the assessment of the degree of liver fibrosis [21][22][23]. The ELF test received the Conformité Européenne (CE) mark in May 2007 [24].
The aim of this study was to perform a meta-analysis to evaluate the diagnostic accuracy of ELF, with histopathology as a reference standard.

Search Strategy
A computerised search was performed in PubMed/Medline, EMBASE, the Cochrane Library, and Google Scholar to identify relevant articles published from 2003 to 2013. The literature search was performed with the following terms: cirrhosis, liver fibrosis, and enhanced liver fibrosis test or ELF test. The search was limited to articles concerning humans with an abstract in English. The complete search yielded 260 articles from databases.

Study Selection and Quality Assessment
Two reviewers (W-L.W. and Q-S.X.) read the titles and abstracts of original articles that addressed the diagnostic accuracy of ELF for staging liver fibrosis in humans to select potentially relevant articles. All of the selected articles were collected and reviewed independently by the same reviewers to determine their eligibility for detailed analysis. The inclusion criteria were as follows: patients with suspected cirrhosis, ELF scores as the index test, defined optimal cut-off values or a threshold of ELF, histopathology as the reference test, and raw data (i.e., true-positive (TP), false-positive (FP), true-negative (TN), and false-negative (FN) results could be found or calculated). Exclusion criteria were duplicate publication (based on the same primary study) and a sample size of fewer than 20. Disagreements between the two reviewers regarding study inclusion were resolved by consensus after a face-to-face discussion. Investigators in the primary research were approached for additional information as necessary.
The methodological quality of each study was assessed using a checklist based on the Quality Assessment for Studies of Diagnostic Accuracy (QUADAS) tool [25], which enables reviewers to evaluate the quality of studies, especially investigations of diagnostic accuracy [26,27].

Data Extraction
Data were extracted from primary studies by the two reviewers (W-L.W. and Q-S.X.) independently. In cases of discrepancies between the first two reviewers, a senior surgeon (S-S.Z.), with more than 20 years of experience in hepatic disease, was consulted and a consensus was reached. We defined significant fibrosis as a fibrosis stage ≥2 for studies using grading systems with five stages (F0-F4; i.e., the METAVIR, Brunt, Batts-Ludwig systems) or as a fibrosis stage ≥3 for studies using the Ishak scoring system (S0-S6). For grading systems using five stages or the Ishak scoring system, severe fibrosis was defined as a fibrosis stage ≥3 or ≥4, and cirrhosis was defined as a fibrosis stage = 4 or ≥5, respectively [28]. We extracted available data on TPs, FNs, FPs, and TNs for staging liver fibrosis to construct a 2×2 contingency table.
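The stage definitions above can be written down as a small lookup, which makes it easier to see how a biopsy result from either family of scoring systems is dichotomised. The following is a minimal sketch: the thresholds are those given in the text, while the names `MAX_STAGE`, `MIN_STAGE`, and `is_positive` are illustrative assumptions and not part of the original analysis.

```python
# Highest stage in each family of scoring systems (F0-F4 and Ishak S0-S6).
MAX_STAGE = {"five_stage": 4, "ishak": 6}

# Minimum stage counted as positive for each fibrosis category,
# taken from the definitions in the text. Names are illustrative only.
MIN_STAGE = {
    "five_stage": {"significant": 2, "severe": 3, "cirrhosis": 4},
    "ishak": {"significant": 3, "severe": 4, "cirrhosis": 5},
}


def is_positive(stage, system, category):
    """Return True if a histological stage meets the category threshold."""
    if not 0 <= stage <= MAX_STAGE[system]:
        raise ValueError(f"stage {stage} is outside the range of the {system} system")
    return stage >= MIN_STAGE[system][category]


# Stage 2 is significant fibrosis on a five-stage scale but not on the Ishak scale.
print(is_positive(2, "five_stage", "significant"))
print(is_positive(2, "ishak", "significant"))
```

Because stage 4 is the highest five-stage grade, the condition "stage = 4" for cirrhosis is equivalent to "stage ≥ 4" in this representation.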

Data Synthesis and Statistical Analysis
From the extracted data, arranged in 2×2 contingency tables, we computed sensitivity, specificity, and diagnostic odds ratios (DORs) to estimate the diagnostic performance of each test modality to assess each stage of liver fibrosis. All statistics are reported as point values with 95% confidence intervals (CIs). Sensitivity was defined by the TP rate and was calculated as TP/(TP+FN). Specificity was defined by the TN rate and was calculated as TN/(FP+TN). The DOR is a single overall indicator of diagnostic performance and is the ratio of the odds of positivity in diseased subjects relative to the odds of positivity in non-diseased subjects [29]. The DOR was calculated as (TP×TN)/(FP×FN).
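As an illustration of the per-study calculations just described, the sketch below computes sensitivity, specificity, and the DOR from a single 2×2 table. The counts are hypothetical and are not taken from any included study. The confidence interval uses the standard log-odds (Woolf) approximation, which is an assumption here: the text does not state how the CIs were derived.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn, z=1.96):
    """Sensitivity, specificity and DOR from one 2x2 contingency table."""
    if min(tp, fp, fn, tn) <= 0:
        # Zero cells make the DOR undefined; this sketch does not attempt
        # any continuity correction.
        raise ValueError("all four cells must be positive in this sketch")
    sensitivity = tp / (tp + fn)      # TP rate
    specificity = tn / (fp + tn)      # TN rate
    dor = (tp * tn) / (fp * fn)       # diagnostic odds ratio
    # Assumed CI method: normal approximation on the log(DOR) scale.
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - z * se_log_dor)
    upper = math.exp(math.log(dor) + z * se_log_dor)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "dor_ci": (lower, upper),
    }


# Hypothetical counts, for illustration only.
example = diagnostic_metrics(tp=80, fp=30, fn=20, tn=70)
print(example)
```

With these hypothetical counts the sensitivity is 80/100, the specificity is 70/100, and the DOR is (80×70)/(30×20).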
The performance was summarised using a bivariate binomial model [30]. This model assumed a binomial distribution for the number of patients with TP and TN results and allowed the inclusion of covariates and random effects. The inherent association between sensitivity and specificity was modelled in a bivariate normal distribution by assuming random effects [31].
The heterogeneity of all diagnostic test parameters was evaluated initially with a graphic examination of forest plots for each parameter. A statistical assessment was then made of the inconsistency index (I²). The I² statistic is defined as the percentage of variability due to heterogeneity beyond that from chance; values greater than 50% represent the possibility of substantial heterogeneity. The pooled summary statistics for the sensitivities, specificities, likelihood ratios, and diagnostic odds ratios of the individual studies are reported. Summary receiver operating characteristic (sROC) curves were also constructed to express the test parameter results as the diagnostic odds ratios. These curves were also used to assess the presence of a diagnostic threshold (cut-off) bias as a cause of between-study heterogeneity. Analyses were performed using the Meta-DiSc 1.4 statistical software (Unit of Clinical Biostatistics, Ramón y Cajal Hospital, Madrid, Spain).
Figure 1 depicts the flow of our search results. In total, 261 studies were identified using electronic searches. After removal of duplicates, 117 abstracts were assessed. Of these, 19 seemed relevant and the full studies were assessed. Ultimately, nine investigations were identified for inclusion in this study [32][33][34][35][36][37][38][39][40]. All included studies satisfied more than 10 of the 14 QUADAS items describing methodological quality.
Table 1. Characteristics of patients in the included studies.

Study Characteristics
The nine studies evaluated involved 1826 patients from Asian and European medical centres. In four trials, the disease spectrum was restricted to chronic viral hepatitis, in one trial to nonalcoholic fatty liver disease (NAFLD), and in four trials, there was no restriction (Table 1).
Five studies reported quality criteria for liver biopsy specimens [32,34,36,37,38], and three investigations reported a minimum length of 15 mm [33,35,39]. The Ishak histological scoring system was used in three studies [35,38,39], the National Institute of Diabetes and Digestive and Kidney Diseases system was used in one study [33], the modified Brunt system was used in one study [32], the Batts-Ludwig system was used in one study [37], and the METAVIR system was used in four studies [34,35,36,40] (Table 2). Moreover, four studies [34,37,38,40] compared the performance of the ELF test with that of transient elastography (TE) for staging liver fibrosis (Table 3).

Diagnostic Threshold Bias and Meta-Regression Assessment
To assess diagnostic threshold (cut-off) bias as a cause of heterogeneity in test performance, we prepared an ROC plot of sensitivity versus 1 − specificity. The six primary studies providing data for the detection of significant liver fibrosis yielded an area under the receiver operating characteristic curve (AUROC) of 0.8813; the seven primary studies providing data for the assessment of severe liver fibrosis yielded an AUROC of 0.8696; and the six primary studies providing data for the prediction of cirrhosis yielded an AUROC of 0.8770. All three analyses revealed evidence supporting diagnostic threshold (cut-off) bias as a major cause of heterogeneity (Fig. 2).

Discussion
Information on the presence and degree of liver fibrosis is pivotal for making therapeutic decisions and predicting disease outcomes [41]. For example, the ultimate goal of treatment at the stage of significant liver fibrosis is to prevent further progression of liver disease [41,42]. In contrast, given that severe liver fibrosis or cirrhosis carries a risk of progression to portal hypertension and hepatocellular carcinoma (HCC), discrimination of severe liver fibrosis and cirrhosis is important [42]. The increasing awareness of the limitations of liver biopsies [43,44] has stimulated the development and refinement of non-invasive techniques for the assessment of liver fibrosis. Theoretically, non-invasive techniques for the assessment of liver fibrosis should possess the advantages of liver specificity, easy execution, and high diagnostic performance in terms of sensitivity, specificity, DOR, PLR, NLR, and AUROC. The most studied non-invasive detection method for liver fibrosis is transient elastography, but it has shown less accuracy in discriminating lower fibrosis stages [45,46] and is restricted by narrow intercostal spaces and ascites.
Studies have confirmed that the ELF test can accurately determine the degree of liver fibrosis [32,47,48,49], although it discriminates less well between low and moderate fibrosis stages, with a broad overlapping range for those stages [38]. In the three subgroups of this meta-analysis, the pooled sensitivity, pooled specificity, and summary DOR of the ELF test were greater than 80%, 74%, and 17, respectively. That indicates that at least 74% of patients could reasonably avoid a liver biopsy. With summary AUROCs of 0.8813, 0.8696, and 0.8770 for significant liver fibrosis, severe liver fibrosis, and cirrhosis, respectively, the results of this meta-analysis demonstrate that ELF has good diagnostic performance for assessing liver fibrosis.
A diagnostic tool is deemed perfect if the AUROC is 100%, excellent if the AUROC is greater than 90%, and good if the AUROC is greater than 80% [50]. According to these results, coupled with its reproducibility, the ELF test can be used in clinical practice as a good tool for the staging of cirrhosis. The fine performance of the ELF test may result from the fact that serum markers reflect fibrosis in the whole liver rather than 1/50,000th of the organ, as does a biopsy sample, or, alternatively, that the ELF test evaluates the impact of liver fibrosis on liver function as well as the architectural damage associated with histological fibrosis and cirrhosis.
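The AUROC grading rule cited above can be expressed as a short function and applied to the three summary AUROCs reported in this meta-analysis. This is only a sketch of the stated rule: the function name `grade_auroc` and the label returned for values of 0.80 or less are assumptions, since the text does not name a category below "good".

```python
def grade_auroc(auroc):
    """Grade an AUROC (as a proportion) using the thresholds quoted in the text."""
    if not 0.0 <= auroc <= 1.0:
        raise ValueError("AUROC must lie between 0 and 1")
    if auroc == 1.0:
        return "perfect"
    if auroc > 0.90:
        return "excellent"
    if auroc > 0.80:
        return "good"
    # Label assumed for illustration; the text defines no lower category.
    return "below good"


# Summary AUROCs reported for the three fibrosis categories.
summary_aurocs = {
    "significant fibrosis": 0.8813,
    "severe fibrosis": 0.8696,
    "cirrhosis": 0.8770,
}
grades = {name: grade_auroc(value) for name, value in summary_aurocs.items()}
print(grades)
```

All three summary values fall between 0.80 and 0.90, so each is graded "good" under this rule.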
It is worth noting that the ELF test showed a high correlation with aminotransferase levels and a significantly high correlation with inflammation [38]. One study [47] found that the ELF test, reflecting on-going pathophysiological processes and functions that a biopsy cannot capture, had prognostic value. The ELF test, an index of HA, PIIINP, and TIMP-1, exhibited prognostic ability even in the early stages of the disease process (AUROC = 0.737–0.863 at all time points) [48], probably because these markers are expressed during the early stages of collagen deposition in the liver. In a further analysis of ELF test performance in predicting all-cause mortality, the AUROC of the ELF test at 6 years was significantly greater than that of a biopsy [47].
In future, the ELF test may be used to evaluate the impact of treatment directed at the underlying causes, such as viral hepatitis, and in the development of new treatments, such as anti-fibrotic drugs.
Indeed, there was significant heterogeneity in this meta-analysis, which may be due to the following reasons. First, differences in study methodologies are well-recognised causes of heterogeneity in meta-analyses of diagnostic tests. Second, subtle variations in the algorithm of the ELF score and liver biopsy may also contribute to between-study variation. Third, the use of different histological scoring systems may result in discrepancies in the findings of the studies. The studies included in this meta-analysis used five histological scoring systems. Although the histological staging system is complex, it is relevant for the assessment, follow-up, and definition of the rate of fibrosis progression, and is also categorical in nature. The current reliance on histological staging using categorical scores for liver biopsy samples is recognised as suboptimal for assessing efficacy, and this may be a source of heterogeneity [51]. Fourth, the size of liver biopsy tissue cores may impact the accuracy of liver fibrosis staging. Criteria for liver biopsy specimens (≥20 mm in length and/or 11 portal tracts) have been described previously [8]. However, in practice, it is difficult for biopsy samples to achieve these criteria. In this meta-analysis, the mean length of specimens ranged from 18.9 to 25.1 mm, so no study reported liver biopsy samples meeting the criteria, and only two studies [32,36] described liver biopsy specimens with 11 complete portal tracts. Thus, the observed heterogeneity may be secondary to intrinsic errors in liver biopsy measurements, which limit the diagnostic accuracy of non-invasive evaluations [52,53].
Fifth, a diagnostic threshold (or cut-off value) bias was identified as an important cause of heterogeneity in the pooled results for the three patient groups. In this meta-analysis, there was no consistent cut-off value, which would also generate heterogeneity. Finally, publication bias may also have contributed to heterogeneity in this meta-analysis because we excluded studies without full text and studies published in languages other than English.
In summary, the ELF test showed good performance and considerable diagnostic value for the prediction of histological fibrosis stage and can be deemed a 'good' diagnostic tool in clinical practice for the staging of cirrhosis.

Supporting Information
Checklist S1 The PRISMA Checklist. (DOC)