The Effectiveness of Noninvasive Biomarkers to Predict Hepatitis B-Related Significant Fibrosis and Cirrhosis: A Systematic Review and Meta-Analysis of Diagnostic Test Accuracy

Noninvasive biomarkers have been developed to predict hepatitis B virus (HBV)-related fibrosis owing to the significant limitations of liver biopsy. Those biomarkers were initially derived from evaluation of hepatitis C virus (HCV)-related fibrosis, and their accuracy among HBV-infected patients has been under constant debate. A systematic review was conducted on records in the PubMed, EMBASE and Cochrane Library electronic databases up to April 1st, 2013, to assess the effectiveness and accuracy of these biomarkers for predicting HBV-related fibrosis. The quality assessment of diagnostic accuracy studies (QUADAS) questionnaire was used. Out of 115 articles evaluated for eligibility, 79 studies satisfied the pre-determined inclusion criteria for meta-analysis. Eventually, our final data set for the meta-analysis contained 30 studies. The areas under the summary receiver operating characteristic (SROC) curve for APRI, FIB-4, and FibroTest for significant fibrosis were 0.77, 0.75, and 0.84, respectively. For cirrhosis, the areas under the SROC curve for APRI, FIB-4 and FibroTest were 0.75, 0.87, and 0.90, respectively. The heterogeneity of FIB-4 and FibroTest was not statistically significant. The heterogeneity of APRI for detecting significant fibrosis was affected by median age (P = 0.0211), and for cirrhosis was affected by etiology (P = 0.0159). Based on the analysis, we conclude that FibroTest has excellent diagnostic accuracy for identification of HBV-related significant fibrosis and cirrhosis. FIB-4 has modest benefits and may be suitable for wider implementation.


Introduction
Chronic infection with hepatitis B virus (HBV) is an important global health problem. Approximately 350 million people are chronically infected with hepatitis B virus worldwide, especially in developing countries, 25% of whom will die from long-term sequelae, such as cirrhosis, liver failure and hepatocellular carcinoma, resulting in 600,000 to one million deaths annually [1]. Patients with significant hepatic inflammation and fibrosis are at high risk of those complications [2]. Assessment of significant liver fibrosis is critical to establishing effective clinical practice. It helps clinicians determine patients' suitability and the optimal time for antiviral therapy, so as to achieve the best curative effects while preventing excessive medication [3]. In addition, early prediction of cirrhosis is beneficial to reducing complications in patients with chronic viral hepatitis [4].
Liver biopsy, an invasive technique, is the gold standard for the assessment of fibrosis. It has several disadvantages, such as patients' reluctance, pain, hemoperitoneum, and pneumothorax [5]. In addition, its accuracy in assessing fibrosis is questionable because of sampling errors and intra- and interobserver variation [6]. Therefore, the prediction of liver fibrosis by noninvasive biomarkers is receiving increasing attention.
Aspartate aminotransferase-to-platelet ratio index (APRI), the fibrosis index based on the 4 factors (FIB-4) and FibroTest are examples of noninvasive biomarkers that predict liver fibrosis from routinely available clinical parameters [7]. They were initially used in Western populations with hepatitis C virus (HCV) infection or HCV/human immunodeficiency virus (HIV) co-infection [8] and performed well. The area under the receiver operating characteristic (AUROC) curve of FibroTest for detecting significant fibrosis reached 0.85 [9], and the AUROC curves of APRI and FIB-4 reached 0.80 [10] and 0.81 [11], respectively. For detecting cirrhosis, FibroTest also had the best result, with an AUROC curve of 0.90 [12]. The AUROC curves of APRI and FIB-4 were 0.83 [13] and 0.89 [14], respectively. These three markers can be considered "good", or even "better", markers according to the criteria of Deeks [15]. Consequently, researchers began applying those markers to predict significant fibrosis and cirrhosis among HBV-infected patients. APRI was first used to predict significant fibrosis or cirrhosis in patients with HBeAg-negative chronic hepatitis B by Chrysanthos et al. [16], who found that APRI was strongly correlated with fibrosis. Later, FIB-4 and FibroTest were successively used to predict HBV-related fibrosis.
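To make concrete how two of these indices are derived from routine laboratory values, the following minimal sketch computes APRI and FIB-4. The formulas are the commonly published definitions and are not stated in this article; the function names and the default upper limit of normal for AST (40 IU/L) are assumptions for illustration only. FibroTest is omitted because it is a proprietary algorithm.

```python
import math


def apri(ast_iu_l, platelets_10e9_l, ast_uln_iu_l=40.0):
    """APRI = (AST / upper limit of normal) x 100 / platelet count (10^9/L).

    The default ULN of 40 IU/L is an assumption; laboratories differ.
    """
    return (ast_iu_l / ast_uln_iu_l) * 100.0 / platelets_10e9_l


def fib4(age_years, ast_iu_l, alt_iu_l, platelets_10e9_l):
    """FIB-4 = (age x AST) / (platelet count (10^9/L) x sqrt(ALT))."""
    return (age_years * ast_iu_l) / (platelets_10e9_l * math.sqrt(alt_iu_l))


# Hypothetical patient: age 50, AST 80 IU/L, ALT 64 IU/L, platelets 100 x 10^9/L
print(apri(80, 100))          # 2.0
print(fib4(50, 80, 64, 100))  # 5.0
```

The resulting scores are then compared against study-specific cutoff values, which is why each included study contributes one or more thresholds to the analysis.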
However, because those markers were initially derived from evaluation of HCV-related fibrosis, their accuracy for HBV patients has been under constant debate. Some scholars indicated that all of those noninvasive markers were able to predict significant fibrosis or cirrhosis among HBV patients, and could potentially be used to decrease the number of liver biopsies [7]. Others maintained that those markers were not directly applicable to evaluation of HBV-related fibrosis because of the small AUROC curve [17]. Therefore, we conducted this meta-analysis to assess the pooled performance of these biomarkers for prediction of significant fibrosis and cirrhosis among HBV-infected patients, with the aim of providing a basis for future research and clinical application.

Literature Search
The review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [18] (see Checklist S1 for PRISMA checklist). A protocol (see Text S1) was developed and systematic methods were used to identify relevant studies, assess study eligibility for inclusion, and evaluate study quality. An online database search was performed in PubMed, EMBASE and the Cochrane Library (01/2003-04/2013) for terms including the following: aspartate aminotransferase-to-platelet ratio index, APRI, fibrosis index based on the 4 factors, FIB-4, FibroTest, hepatitis B virus, HBV, chronic hepatitis B, CHB, fibrosis and cirrhosis (see Text S2 for full search strategies). Additional studies were identified via a manual search of the referenced studies and review articles. EndNote X5 software was used to manage the references.

Selection Criteria
Studies were included if they met the following inclusion criteria: (a) The study evaluated the performance of the APRI and/or FIB-4 and/or FibroTest for the prediction of fibrosis and/or cirrhosis in HBV-infected patients. Studies on patients with other etiologies of liver disease were also included if data for HBV-infected patients could be independently extracted. In addition, special populations of HBV patients (e.g., HBV/HIV coinfection, HBV/HCV, and HBV/hepatitis D virus [HDV]) were also included. (b) Liver biopsy was used as the gold standard to diagnose liver fibrosis. (c) Data could be extracted to construct at least one 2×2 table of test performance, based on some cutoff points of the APRI, FIB-4, and FibroTest for a fibrosis stage. (d) They assessed the diagnostic accuracy for fibrosis stage F≥2 or F≥4 according to METAVIR or a comparable staging system. (e) The study included at least 40 patients. Studies with smaller sample sizes were excluded because of concerns about their applicability.

Data Extraction and Quality Assessment
Two reviewers (XYX and RXS) screened the downloaded titles and abstracts against the inclusion criteria. Two reviewers (XYX and HK) independently evaluated study eligibility, graded the study quality, and extracted data from the study. Any disagreements between the reviewers were resolved through detailed discussion with a third reviewer (HBL). The parameters extracted from each study included author, year of publication, region, method, patient gender, age, number of patients, underlying chronic liver disease etiology, histological scoring system, average length of liver specimen, time interval between biopsy and laboratory tests, prevalence of the fibrosis stage, as well as cutoff values to identify the fibrosis stage [13]. The quality of included studies was independently appraised by two reviewers (XYX and YHZ) using the quality assessment of diagnostic accuracy studies (QUADAS) questionnaire [19] (see Text S3), which estimates the internal and external validity of diagnostic accuracy studies used in systematic reviews.

Statistical Analysis and Data Synthesis
We extracted and tabulated the data in a series of 2×2 tables, which included sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) at each threshold value. The primary outcome was the identification of significant fibrosis, defined by METAVIR [20], Batts and Ludwig [21], and Scheuer [22] for stages F2 through F4, and Ishak [23] for stages F3 through F6. This gauge was chosen because significant fibrosis is often considered a threshold for the initiation of antiviral therapy [24]. We also assessed cirrhosis (METAVIR, Batts and Ludwig, and Scheuer F4, and Ishak F5-6). In order to provide clinically meaningful results, the metrics of diagnostic test accuracy were examined.
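The accuracy metrics tabulated at each threshold follow directly from the four cells of a 2×2 table. A minimal sketch is shown below; the function name and the example counts are hypothetical and are not taken from any included study.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from one 2x2 table.

    tp/fn: biopsy-positive patients with a positive/negative marker result
    fp/tn: biopsy-negative patients with a positive/negative marker result
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical table: 50 patients with significant fibrosis, 50 without
metrics = accuracy_metrics(tp=30, fp=10, fn=20, tn=40)
print(metrics)
```

Unlike sensitivity and specificity, PPV and NPV depend on the prevalence of the fibrosis stage in each study, which is one reason prevalence was extracted alongside the cutoff values.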
The SROC curve, generated using linear regression, represents the relationship between the true positive rate and false positive rate across these studies, albeit they may have used different test thresholds [25]. In this analysis, the area under SROC curve was examined according to Moses et al. [26], and each study was weighted with its sample size and with adjustment for the number of thresholds within each study [27].
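The regression behind the SROC curve can be sketched as follows, assuming the usual Moses-Littenberg formulation: for each study, D = logit(TPR) − logit(FPR) is regressed on S = logit(TPR) + logit(FPR), weighted here by sample size, and the fitted line is transformed back into a curve whose area is integrated numerically. The function names, the trapezoidal integration, and the example points are assumptions for illustration; the adjustment for multiple thresholds per study described above is not reproduced.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_sroc(points):
    """Weighted least-squares fit of D = a + b*S.

    points: list of (tpr, fpr, weight), with 0 < tpr, fpr < 1.
    """
    d = [logit(t) - logit(f) for t, f, _ in points]
    s = [logit(t) + logit(f) for t, f, _ in points]
    w = [wt for _, _, wt in points]
    w_sum = sum(w)
    s_bar = sum(wi * si for wi, si in zip(w, s)) / w_sum
    d_bar = sum(wi * di for wi, di in zip(w, d)) / w_sum
    sxx = sum(wi * (si - s_bar) ** 2 for wi, si in zip(w, s))
    sxy = sum(wi * (si - s_bar) * (di - d_bar) for wi, si, di in zip(w, s, d))
    b = sxy / sxx if sxx > 0 else 0.0
    a = d_bar - b * s_bar
    return a, b


def sroc_auc(a, b, steps=2000):
    """Trapezoidal area under the fitted SROC curve (assumes |b| < 1)."""
    def tpr(fpr):
        # Solving D = a + b*S for logit(TPR) at a given FPR
        u = (a + (1.0 + b) * logit(fpr)) / (1.0 - b)
        return 1.0 / (1.0 + math.exp(-u))

    curve = [(0.0, 0.0)]
    curve += [(i / steps, tpr(i / steps)) for i in range(1, steps)]
    curve += [(1.0, 1.0)]
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(curve, curve[1:]))


# Hypothetical studies: (sensitivity, 1 - specificity, sample size)
a, b = fit_sroc([(0.75, 0.25, 100), (0.90, 0.50, 80), (0.50, 0.10, 120)])
print(a, b, sroc_auc(a, b))
```

When b is close to zero, the diagnostic odds ratio is roughly constant across thresholds and the curve is symmetric, so exp(a) can be read as a common DOR.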
The diagnostic odds ratio (DOR) describes the odds of a positive test in true disease cases compared with cases of no disease [15]. The summary DOR was calculated using a DerSimonian and Laird random-effects model on a logarithmic scale with a corresponding test of heterogeneity [28]. Because such analyses require a single measure of accuracy for each study and many studies reported multiple test thresholds, we calculated the average DOR among all thresholds for a given study [29]. We also calculated summary sensitivities and specificities using the bivariate meta-analytic approach [30]. Pairs of sensitivity and specificity for diagnostic thresholds were jointly analyzed, with any correlation that might exist between those two measures taken into account using a random-effects approach.
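The pooling of DORs can be illustrated with the following minimal sketch of a DerSimonian and Laird random-effects model on the log scale, together with Cochran's Q and the I² statistic used to quantify heterogeneity. It takes one 2×2 table per study; the function names, the 0.5 continuity correction for zero cells, the example tables, and the I² formula max(0, (Q − df)/Q) × 100% are assumptions for illustration, not a description of the software actually used.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Return (log DOR, variance); adds 0.5 to every cell if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    y = math.log((tp * tn) / (fp * fn))
    v = 1.0 / tp + 1.0 / fp + 1.0 / fn + 1.0 / tn
    return y, v


def i_squared(q, k):
    """I^2 (%) from Cochran's Q and the number of studies k."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0


def pool_dor(tables):
    """DerSimonian-Laird random-effects summary DOR with 95% CI, Q and I^2."""
    ys, vs = zip(*(log_dor(*t) for t in tables))
    k = len(ys)
    w = [1.0 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "dor": math.exp(y_re),
        "ci": (math.exp(y_re - 1.96 * se), math.exp(y_re + 1.96 * se)),
        "q": q,
        "i2": i_squared(q, k),
        "tau2": tau2,
    }


# Hypothetical (tp, fp, fn, tn) tables from three studies
print(pool_dor([(40, 10, 10, 40), (30, 15, 12, 45), (55, 20, 25, 60)]))
```

As a consistency check under this I² definition, a Q of 38.32 across seventeen studies gives an I² of about 58.2%, and a Q of 23.10 across eleven studies gives about 56.7%, in line with the values reported for the APRI in the Results.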
The heterogeneity (or the lack of homogeneity) of the results between studies was assessed statistically using the Cochran Q test and the I² statistic. The I² value describes the percentage of total variation across studies that is attributable to heterogeneity rather than

Search Results
The study selection process is presented with a flow chart in Figure 1. 306 studies were retrieved with the described search strategies, of which 196 were excluded following title and abstract screening. The full texts of 110 potentially eligible reports were obtained for further assessment. Of those, 30 papers were included in the review following full-text screening (

Diagnostic Accuracy for the Prediction of Significant Fibrosis
In the seventeen studies assessing the APRI (N = 3,573), the AUROC curve ranged from 0.61 to 0.86. When combined, the area under the SROC curve was 0.77 (SE = 0.0172) (Figure 2A). The pooled DOR was 5.41 (95% confidence interval [CI] 3.98-7.35) (Figure 2B). The Cochran Q and I² values were 38.32 and 58.2%, respectively, indicating significant heterogeneity across the included studies (P = 0.001) (Figure 2B). The pooled sensitivities and specificities could not be assessed. Instead, the sensitivities and specificities of the APRI at various diagnostic thresholds in the seventeen studies are listed in Table 2. We used meta-regression analysis to explore the heterogeneity of the APRI accuracy for detecting significant fibrosis, which was mainly affected by median age (P = 0.0211, see Text S4 for meta-regression). There was no significant correlation between other covariates and the DOR.
The AUROC curve ranged from 0.69 to 0.90 in the 11 studies assessing the FibroTest (N = 1,640). When combined, the area under the SROC curve was 0.84 (SE = 0.0227) (Figure 4A). The summary DOR was 13.73 (95% CI 8.61-21.90), and the Cochran Q was 22.52, indicating significant heterogeneity across the included studies (P = 0.0127) (Figure 4B). We did not find the cause of the heterogeneity of FibroTest accuracy according to the

Diagnostic Accuracy for the Prediction of Cirrhosis
Eleven studies assessed the APRI for the prediction of cirrhosis (N = 2,083). The AUROC curve of these studies ranged from 0.50 to 0.83. When combined, the area under the SROC curve was 0.75 (SE = 0.0174) (Figure 5A). The summary DOR was 4.4 (95% CI 2.9-6.8). Heterogeneity across the eleven studies assessing the APRI for the prediction of cirrhosis was statistically significant (Q = 23.10, P = 0.01; I² = 56.7%, Figure 5B). However, when we further conducted the meta-analysis at the different thresholds of <1.0, 1.0, and 2.0, we found that the heterogeneity was not statistically significant (Figure S3). The summary sensitivity and specificity of the APRI at different diagnostic thresholds are listed in Table 3.
In the nine studies assessing the FibroTest (N = 1,101), the AUROC curve ranged from 0.68 to 0.92. When combined, the area under the SROC curve was 0.90 (SE = 0.0250) (Figure 7A). The summary DOR was 23.75 (95% CI 11.88-47.48) and the Cochran Q was 20.25 (P = 0.0094) (Figure 7B), indicating statistically significant heterogeneity. The pooled sensitivities and specificities could not be assessed. Instead, the sensitivities and specificities of the FibroTest at various diagnostic thresholds in the nine studies are listed in Table 4.
According to the meta-regression analysis, the heterogeneity of FibroTest accuracy for detecting cirrhosis was mainly affected by sample size (P = 0.0385) and median age (P = 0.0436) (Text S7), whereas the other covariates were not significant.

Publication Bias
Funnel plots of these three markers for assessing possible publication bias are illustrated in Figure 8. Mild asymmetry was noted in the funnel plots of the FIB-4 and FibroTest.

Discussion
Liver fibrosis progression is commonly found in HBV-infected patients. Cirrhosis develops in approximately one third of those cases, usually after an extensive period of time during which liver biochemical indices are found to be predominantly or even persistently abnormal [1]. Patients with significant fibrosis or cirrhosis should be considered for antiviral therapy, which can potentially reverse cirrhosis and reduce complications [60]. Considering the limitations and risks of biopsy, researchers have made persistent efforts to find noninvasive markers that more accurately identify patients with significant fibrosis or cirrhosis. APRI, FIB-4 and FibroTest are such noninvasive markers, and they are gaining increasing acceptance in clinical practice. Those markers may reduce the need for liver biopsy and may help to monitor the efficacy of treatment [47].
In our systematic review, the diagnostic accuracy of the APRI, FIB-4 and FibroTest for HBV-related significant fibrosis and cirrhosis has been comprehensively evaluated and summarized on a large scale, and we confirmed the results of many individual studies. Our meta-analysis also included the description of multiple measures of test performance using established meta-analytic techniques and formal assessment for publication bias and heterogeneity, as well as exploratory analysis. The results should therefore be valid and reasonably reliable.
FibroTest had the best results for both significant fibrosis and cirrhosis. Its area under the SROC curve was the largest and, for cirrhosis, even reached the standard of "better" [15], and the summary sensitivity and specificity reached 84% and 82%, respectively. A meta-analysis of HCV-infected patients reported areas under the SROC curve of 0.81 for significant fibrosis and 0.90 for cirrhosis [12]. Thus, FibroTest appears to perform no worse for HBV-related fibrosis than for HCV-related fibrosis, and it could be considered a better marker for assessing fibrosis and cirrhosis in HBV-infected patients. The FibroTest, however, is calculated from alpha2 macroglobulin, alpha2 globulin (or haptoglobin), gamma globulin, apolipoprotein A1, GGT and total bilirubin [61]. Alpha2 macroglobulin and alpha2 globulin (or haptoglobin) are not routine clinical measurements, and those two indicators are not tested for patients in most hospitals. Furthermore, they cost more than conventional indicators. Those factors may restrict the wider application of the FibroTest in clinical practice.
The calculation method of FIB-4 is simpler than that of FibroTest. The areas under the SROC curve of FIB-4 for predicting HBV-related significant fibrosis and cirrhosis are 0.75 and 0.87, respectively. FIB-4 also performs relatively well in predicting fibrosis [7]. Its test items are easy to obtain in clinical practice, although its predictive results are not as good as those of FibroTest [11,14]. APRI shows lower diagnostic accuracy than FibroTest and FIB-4 in identifying HBV-related significant fibrosis and cirrhosis. It was the earliest of the three to be used for assessing HBV-related fibrosis because it is simple and easy to apply. Presently, APRI is widely utilized in identifying the degree of fibrosis and cirrhosis in patients with hepatitis C and hepatitis B, particularly in regions with limited healthcare resources. Some scholars argue that the calculation method of APRI does not consider spleen size [35], and that if patients were grouped by spleen size, the performance of APRI in predicting HBV-related fibrosis would be improved. Our meta-analysis revealed that the area under the SROC curve of APRI was small and that its accuracy in evaluating HBV-related fibrosis was poor. Our results showed a similar performance of APRI for staging significant fibrosis and cirrhosis [62].
Meta-regression is a convenient and reliable method for screening factors that contribute to heterogeneity. A strength of our study is that meta-regression analysis was used to explore several factors that may be responsible for heterogeneity. Liver biopsy scoring system and percentage of males have been identified, among many relevant factors, as sources of heterogeneity in the summary results of APRI for predicting significant fibrosis [62]. On the other hand, etiology was found to be significantly associated with the heterogeneity of APRI for predicting cirrhosis. However, the heterogeneity of the meta-analysis of the FIB-4 and FibroTest for predicting significant fibrosis and cirrhosis was not statistically significant. FIB-4 and FibroTest therefore showed better consistency in predicting fibrosis, and their summary test results were reasonably reliable.
However, there are several limitations in our systematic review. Firstly, we focused our analysis on patients with HBV-related fibrosis without distinguishing between HBeAg-negative and HBeAg-positive cases or considering the virus replication rate, because of the limited number of publications. Secondly, we included only studies published in English and Chinese, so language bias may have influenced the results to some extent. Lastly, Fibroscan, a widely used noninvasive tool, was not considered in this meta-analysis, because our focus was to compare the serum markers calculated by biochemical examination.
In summary, the FibroTest has excellent diagnostic accuracy for the identification of HBV-related significant fibrosis and cirrhosis, but it is seldom applied in clinical practice because of its high cost. FIB-4, a moderately accurate marker, has reasonable summary diagnostic accuracy and can be measured and calculated relatively easily. APRI shows limited value in identifying hepatitis B-related significant fibrosis and cirrhosis. Each marker has its own advantages and disadvantages. Future studies of novel fibrosis markers are needed to demonstrate improved accuracy and cost-effectiveness compared with those simple, economical, and widely available indices.