Performance of Biomarkers FibroTest, ActiTest, SteatoTest, and NashTest in Patients with Severe Obesity: Meta Analysis of Individual Patient Data

Background. Liver biopsy is considered the gold standard for assessing non-alcoholic fatty liver disease (NAFLD) histologic lesions in patients with severe obesity. The aim of this study was to provide an overview of three studies that assessed the performance of non-invasive markers of fibrosis (FibroTest), steatosis (SteatoTest) and steato-hepatitis (NashTest, ActiTest) in these patients.
Methods. A total of 494 patients with interpretable biopsy and biomarkers, drawn from three prospective cohorts of patients with severe obesity (BMI >35 kg/m2), were included. Histology (NAS score) and biochemical measurements were assessed blinded to all other characteristics. The area under the ROC curves (AUROC), sensitivity, specificity, and positive and negative predictive values were assessed. The weighted AUROC (wAUROC, Obuchowski method) was used to limit multiple testing and the spectrum effect. Two meta-analyses were performed: one used individual patient data, and the other was a classical meta-analysis.
Results. The prevalence of advanced fibrosis (bridging) was 9.9%, of advanced steatosis (>33%) 54.2%, and of steato-hepatitis (NAS score >4) 17.2%. The mean wAUROCs (95% CI; significance) were: FibroTest for advanced fibrosis = 0.85 (0.83–0.87; P<0.0001); SteatoTest for advanced steatosis = 0.80 (0.79–0.83); and ActiTest for steato-hepatitis = 0.84 (0.82–0.86; P<0.0001). Using the classical meta-analysis (random effect model), the mean AUROCs were: FibroTest = 0.72 (0.63–0.79; P<0.0001); SteatoTest = 0.71 (0.66–0.75; P<0.0001); and ActiTest = 0.74 (0.68–0.79; P<0.0001). Despite more metabolic risk factors in one cohort, results were similar according to gender, presence of diabetes and between the three cohorts.
Conclusion. In patients with severe obesity, a significant diagnostic performance of FibroTest, SteatoTest and ActiTest was observed for liver lesions.

Non-invasive biomarkers of liver fibrosis have been extensively validated in chronic viral hepatitis and, more recently, in patients with alcoholic and non-alcoholic fatty liver diseases; the most validated serum fibrosis biomarker is FibroTest®/FibroSure® (FT) [5][6][7][8]. In patients at high risk of NAFLD, FT has been validated in two studies [9,10].
Very few biomarkers have been validated for the diagnosis of steatosis or NASH, including in patients with severe obesity [7,8,10].
The aim of the present study was to better assess, in patients with severe obesity, the performance of four previously published biomarkers: of fibrosis (FT) [9], of steatosis (SteatoTest®, ST) [11], and of necrosis and inflammation (ActiTest®, AT, and NashTest®, NT) [12]. The combination of these four biomarkers is named FibroMax®. As part of the FLIP consortium European project [13], a large integrated database of 494 patients was constructed from three recent validation studies performed independently of the inventor's group, which increased the number of patients with advanced liver injuries.
The specific goal was to estimate the diagnostic performance of these biomarkers versus ALT, the routine liver test, using the most accurate methods already applied in patients with chronic hepatitis C: meta-analysis of individual data [14] and standardized areas under the receiver operating characteristic curves (AUROCs) [15][16][17].

Methods
Informed consent was obtained from all patients, and all clinical investigations were conducted according to the principles expressed in the Declaration of Helsinki. The ethics committee of Groupe Hospitalier Pitié Salpêtrière approved the research.
We identified all clinical studies assessing the diagnostic performance of FT, ST, AT and NT in obese patients. Two meta-analyses were performed. One used the integrated database of these studies, combining individual data provided by the authors; the other was a classical meta-analysis of these studies, but using weighted AUROCs. Finally, the performances of the four biomarkers and ALT were assessed using methods that do not require a gold standard.
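To illustrate the classical (study-level) approach, the pooling of per-cohort AUROCs under a random-effects model can be sketched as below. This is a minimal sketch, not the authors' procedure: it assumes the DerSimonian-Laird estimator applied on the raw AUROC scale, whereas the source only states that a random effect model was used. The function name `dersimonian_laird` and the inputs in the usage note are hypothetical.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Pool study-level estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, its standard error, between-study variance tau^2).
    Assumption: estimates are pooled on their raw scale, without transformation.
    """
    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]            # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    k = len(estimates)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2
```

For example, `dersimonian_laird([0.6, 0.8], [0.05, 0.05])` pools two hypothetical AUROCs; a 95% confidence interval can then be approximated as the pooled estimate plus or minus 1.96 times the returned standard error.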

Patients
To be eligible for the study, patients had to fulfil the following criteria: (1) severe obesity (BMI >35 kg/m2); (2) absence of current excessive drinking, as defined by an average daily alcohol consumption of 20 g/day for women and 30 g/day for men; (3) absence of long-term consumption of hepatotoxic drugs; and (4) negative screening for chronic liver diseases, including negative testing for hepatitis B surface antigen and hepatitis C virus antibodies, and no evidence of genetic hemochromatosis.
The following clinical and biological features were required for the integrated database: weight, height, BMI, blood pressure, alanine aminotransferase (ALT), gamma glutamyl transferase (GGT), serum triglycerides, cholesterolemia, fasting blood glucose and interpretable biomarkers FT, ST, AT and NT. Diabetes, hypercholesterolemia and hypertriglyceridemia were defined as follows: fasting blood glucose >1.26 g/l, cholesterolemia >2.4 g/l and serum triglycerides >1.5 g/l, respectively, or the corresponding specific treatment.
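The BMI criterion and the metabolic definitions above can be written as simple rules. The sketch below is illustrative only: the function names are hypothetical, and the thresholds are taken as strict "greater than" comparisons, which is an assumption about the strictness of the inequalities.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m2."""
    return weight_kg / height_m ** 2


def is_severely_obese(weight_kg, height_m):
    """Inclusion criterion (1): BMI above 35 kg/m2."""
    return bmi(weight_kg, height_m) > 35


def metabolic_flags(glucose_g_l, cholesterol_g_l, triglycerides_g_l, treated=()):
    """Apply the study definitions; `treated` lists conditions under specific treatment.

    Assumption: thresholds are strict (>); units are g/l as in the text.
    """
    return {
        "diabetes": glucose_g_l > 1.26 or "diabetes" in treated,
        "hypercholesterolemia": cholesterol_g_l > 2.4
        or "hypercholesterolemia" in treated,
        "hypertriglyceridemia": triglycerides_g_l > 1.5
        or "hypertriglyceridemia" in treated,
    }
```

A patient on lipid-lowering treatment with normal cholesterol is still flagged, since the definition accepts either the laboratory threshold or the specific treatment.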

Biomarker measurements
FT, ST, AT and NT (Biopredictive, Paris, France; Fibro-SURE® is the brand name for FT in the USA, LabCorp, Burlington, NC, USA) were determined as previously published [6,14]. The published recommended pre-analytical and analytical procedures were used [6,14]. FT includes α2-macroglobulin, apolipoprotein A1, haptoglobin, total bilirubin and GGT, adjusted for age and gender; AT includes the same five components plus the transaminase ALT; ST and NT include the same six components as AT plus serum glucose, triglycerides and cholesterol, adjusted for age, gender and BMI.

Histological analysis
In the three studies, histological features were scored according to the same criteria as those used in the FT/AT [9,17], ST [11] and NT [12] validations in non-alcoholic fatty liver disease (NAFLD), and those used in the NAFLD scoring system (NAS) [20]. Fibrosis was scored using a predetermined scoring system equivalent to the METAVIR scoring system [20][21][22] and used in the first FT validation in NAFLD [9]. Fibrosis was staged on a scale of 0 to 4: F0, no fibrosis; F1, portal fibrosis or perivenular fibrosis without septa; F2, few septa; F3, numerous septa without cirrhosis; and F4, cirrhosis.
NASH was classified using the NAS score [20], defined as the sum of scores for steatosis (0-3), lobular inflammation (0-3) and ballooning (0-2), thus ranging from 0 to 8. Cases with a NAS of 0 to 2 were considered not diagnostic of NASH; cases with scores of 5 or greater were diagnosed as NASH. Cases with activity scores of 3 and 4 were considered borderline, possible NASH [20]. In each population, liver biopsies were classified by a central pathologist blinded to the clinical and biological data. Liver biopsies were performed during the operative procedure, with a Hepafix needle in half of the cases. Patients with more than 6 months between biopsy and serum samples were not included. Biopsies were routinely stained with hematoxylin-eosin and Masson's trichrome.
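The NAS-based classification described above reduces to a sum and two cutoffs, as in the following minimal sketch. The function names and the category labels are illustrative choices, not identifiers from the source.

```python
def nas_score(steatosis, lobular_inflammation, ballooning):
    """Sum the three NAS components after checking their allowed ranges."""
    if not 0 <= steatosis <= 3:
        raise ValueError("steatosis must be between 0 and 3")
    if not 0 <= lobular_inflammation <= 3:
        raise ValueError("lobular inflammation must be between 0 and 3")
    if not 0 <= ballooning <= 2:
        raise ValueError("ballooning must be between 0 and 2")
    return steatosis + lobular_inflammation + ballooning


def classify_nas(nas):
    """Map a NAS total (0 to 8) to the three categories used in the text."""
    if not 0 <= nas <= 8:
        raise ValueError("NAS must be between 0 and 8")
    if nas >= 5:
        return "NASH"
    if nas >= 3:
        return "possible NASH"   # borderline: scores 3 and 4
    return "no NASH"             # scores 0 to 2
```

For instance, steatosis 2, lobular inflammation 1 and ballooning 1 give a NAS of 4, which falls in the borderline category.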

Statistical analysis
Methods are detailed in File S1 and Figure S1. To take the spectrum effect into account and to limit the risk of multiple testing, the primary endpoint for each quantitative biomarker's performance (FT, AT, ST) was the Obuchowski measure [14][15][16]. This measure is a multinomial version of the AUROC. With N categories of the gold standard outcome (i.e., histological fibrosis stage) and AUROC_st denoting the estimated AUROC of the diagnostic test for differentiating between categories s and t, the Obuchowski measure is a weighted average of the N(N−1)/2 different AUROC_st values corresponding to all pairwise comparisons between two of the N categories. Each pairwise comparison between stages was weighted (wAUROC) to take into account the distance between grades or stages. AMSTAR recommendations were followed for the meta-analysis [23]. The secondary outcomes were the AUROC using the standard definition of liver injury and the predictive values using predetermined cutoffs as defined in the validation of biomarkers in NAFLD [6,9,10,11,12,14,17,20]. A sensitivity analysis of biomarker performance was performed in patients with versus without diabetes, according to gender, and according to age (50-year cutoff).
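The idea of a weighted average over all pairwise AUROCs can be sketched as follows. This is a simplified illustration, not the exact computation of File S1: it assumes weights proportional to the distance |t − s| between stages, whereas the published Obuchowski measure may also involve the prevalence of each category and a penalty function. The function names and the demonstration data are hypothetical.

```python
from itertools import combinations


def pairwise_auroc(low, high):
    """Mann-Whitney estimate of P(score in `high` > score in `low`); ties count 1/2."""
    wins = sum((h > l) + 0.5 * (h == l) for l in low for h in high)
    return wins / (len(low) * len(high))


def weighted_auroc(scores_by_stage, weight=lambda s, t: abs(t - s)):
    """Weighted average of the N(N-1)/2 pairwise AUROCs between ordered stages.

    `scores_by_stage` maps each stage (e.g. 0..4) to the biomarker values of
    the patients at that stage. Assumption: the default weight is the distance
    between stages, normalised so that the weights sum to 1.
    """
    stages = sorted(scores_by_stage)
    weighted_sum = 0.0
    total_weight = 0.0
    for s, t in combinations(stages, 2):
        w = weight(s, t)
        weighted_sum += w * pairwise_auroc(scores_by_stage[s], scores_by_stage[t])
        total_weight += w
    return weighted_sum / total_weight


# Hypothetical biomarker values for three stages.
demo = {0: [0.1, 0.2], 1: [0.3, 0.15], 2: [0.5, 0.6]}
print(weighted_auroc(demo))
```

Because comparisons between distant stages carry more weight than comparisons between adjacent ones, a test that only confuses neighbouring stages is penalised less than one that confuses extreme stages.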

Study search
A total of 212 biomarker studies were identified in patients with obesity or NAFLD, including 90 studies of steatosis biomarkers, 54 of fibrosis biomarkers and 51 of steatohepatitis biomarkers.
Among these 212 studies, three [10,24,25] were included because they were specifically conducted in patients with severe obesity (Figure 1 and Figure S2); all three assessed FT, ST, AT and NT. One study is part of an ongoing cohort (Lille cohort) [10,26]; for the other two (Paris and Bethune cohorts) [24,25], the performances of the biomarkers were not detailed in the publications, but the authors shared the individual data. Five other studies investigated these tests in patients with NAFLD, but not specifically in severely obese patients, and were not included: FT [9,13], ST [11,13], NT [12,13].

Patients included (Figure 1)
In the Lille cohort, 288 patients were included [10,26], with 114 in the Paris cohort and 84 in the Bethune cohort. There were few significant differences between the cohorts, mainly fewer metabolic risk factors in the Bethune cohort (Table 1). There was no significant difference between the characteristics of included and non-included patients.

Integrated analysis
The AUROCs are detailed in Figure S2.
The classical AUROC of FT was 0.72 (0.63-0.79; P<0.0001). The FT values according to each stage are given in Figure 2A.
Performance of NashTest and ActiTest for the diagnosis of NASH. The prevalence of NASH was 17.2% and that of possible NASH 25.7% (Table 1). The concordance rate between the histological NAS score and that presumed by NashTest was 33.1% (P<0.0001), but with weak reliability (kappa = 0.18). Among 110 patients presumed No-NASH by NT, 95 (86%) were No-NASH, 10 Possible and 5 NASH at biopsy; among 355 presumed Possible-NASH by NT, 176 were No-NASH, 111 (31%) Possible and 68 NASH at biopsy; among 29 patients presumed NASH by NT, 11 were No-NASH, 6 Possible and 12 (41%) NASH at biopsy.
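The cross-classification of NT against biopsy given above can be laid out as a 3×3 table, from which the kappa statistic and the per-row agreement percentages follow directly. The sketch below assumes an unweighted Cohen's kappa; the source does not state which kappa variant was used, and the variable and function names are illustrative.

```python
# Rows: NT prediction (No-NASH, Possible, NASH).
# Columns: biopsy result (No-NASH, Possible, NASH). Counts are those quoted in the text.
NT_VS_BIOPSY = [
    [95, 10, 5],
    [176, 111, 68],
    [11, 6, 12],
]


def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(len(table))) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


def row_agreement_percent(table):
    """Percentage of each predicted class confirmed by biopsy (diagonal / row total)."""
    return [100 * row[i] / sum(row) for i, row in enumerate(table)]


print(round(cohen_kappa(NT_VS_BIOPSY), 2))
print([round(p) for p in row_agreement_percent(NT_VS_BIOPSY)])
```

With these counts the unweighted kappa rounds to 0.18, and the row percentages round to 86%, 31% and 41%, in line with the figures quoted in the text.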
Sensitivity, specificity, positive (PPV) and negative (NPV) predictive values. Diagnostic values according to predetermined cutoffs are detailed in Table 3. For fibrosis, the PPV was 87.5% for the diagnosis of fibrosis >F0 using the 0.27 cutoff, and the NPV for fibrosis >F1 was 93.8% using the 0.48 cutoff.
For steatosis, the PPV of ST was 92.4% for the diagnosis of steatosis >S0 using the 0.38 cutoff, and the NPV for steatosis >S1 was 59.3% using the 0.69 cutoff.
For steato-hepatitis, the NPV of AT was 96.0% for the diagnosis of NASH (NAS >4) using the 0.29 cutoff, and the PPV for possible NASH or NASH (NAS >2) was 47.5% using the 0.17 cutoff.
Sensitivity analyses of AUROCs (Table S1) and Obuchowski measures (Table S2) according to the presence of diabetes, gender and age showed that the performances of FT, AT and ST always remained highly significant, for the diagnosis of advanced fibrosis by FT and of NASH by AT. For ST, the AUROC was significantly higher in patients with diabetes than in those without.

Discussion
This study is the largest analysis of the performance of liver biomarkers (FT, ST, AT and NT) in patients with severe obesity. This overview confirms the accuracy previously observed for the diagnosis of liver injury in patients with NAFLD [9,11,12] and in general populations [27,28]. The two new studies (Paris and Bethune cohorts) performed in patients with severe obesity confirmed the performances previously observed in the Lille cohort [10].

Advantages of this overview
The main advantage of this overview was an increase of power in comparison with isolated studies. A large number of patients was necessary to correctly assess the performance of these biomarkers, as some classes of liver injury could be too small, such as patients with advanced fibrosis or patients without advanced steatosis. Indeed, the liver injury spectrum in obese patients is dramatically different from that in the liver diseases in which FT and ST were originally constructed. In obese patients we observed a 9.9% prevalence of advanced fibrosis, much lower than the 49.0% observed in patients with chronic hepatitis C [29]; the prevalence of steatosis (>5%) was 85.9%, much higher than the 45.0% of the initial ST training group [11].
Because of this spectrum effect, the use of the Obuchowski measure was also necessary to prevent misleading interpretation of unweighted AUROCs [14][15][16]. With versus without standardization, the AUROC was 0.85 vs 0.72 for FT, 0.80 vs 0.71 for ST and 0.84 vs 0.74 for AT.
The accuracy of the biomarkers was confirmed using several statistical methods: the integrated database analysis and the classical meta-analysis. There was no difference between cohorts or between patients with and without diabetes.

Limitations of the present study
This population from tertiary centers offering bariatric surgery is not representative of the general population of severely obese patients. There was heterogeneity between the three cohorts, with fewer metabolic risk factors in the Bethune cohort. The distribution of the present study sample was used for the Obuchowski measure, as the present study was the largest published in severely obese patients and there was no recognized reference distribution. Because of the limited number of patients with advanced fibrosis, it was not possible in the present study to compare the accuracy between all advanced fibrosis stages. Only 8 (1.6%) patients had cirrhosis. This low prevalence of advanced fibrosis was expected, as these obese patients were selected according to the absence of other recognized risk factors of fibrosis progression: no high alcohol consumption, predominantly young (42 years old) and female (77.3%) [21].
ST has limitations: it is mostly a semi-quantitative test designed to be sensitive for excluding steatosis, and it cannot discriminate severe steatosis (greater than 66%) from marked steatosis (33% to 66%). A more quantitative ST should be developed, as severe steatosis represented 20% of these obese patients versus 34% for marked steatosis (33-66%).
This overview focused on four tests developed by several coauthors of the article, who have an obvious conflict of interest as inventor or employee of the company marketing these tests. However, the other co-authors were totally independent, recruited the patients, performed the assays independently of the company and had full access to all data and analyses.
Another limitation was the absence of direct comparisons with other biomarkers such as ELF, Fibrospect, Fibrometer and Fibroscan for fibrosis, cytokeratin 18 for NASH, and magnetic resonance imaging and spectroscopy for steatosis [7,8,30]. The main goal of this study was to validate the performance of these tests versus chance. At least this overview demonstrated that both ST and AT were significantly more accurate than ALT for the diagnosis of steatosis and NASH in patients with severe obesity. There was no difference in the present study between the FT performance and the ALT performance for the diagnosis of fibrosis. This absence of significant difference should not be interpreted as an absence of difference, given the low power of this comparison. Because of the low prevalence of advanced fibrosis (9.9%) in obese patients, a study comparing FT to other fibrosis biomarkers would need many more patients. As observed in other frequent liver diseases, ALT is specifically associated with necroinflammatory activity grades and therefore must not be used as a marker of fibrosis [17]. Ideally, fibrosis biomarkers must be interpreted together with validated independent biomarkers of activity and steatosis to prevent false positives. Biomarkers such as Fibrometer, which includes transaminases among its components, have a variability related to activity.
Figure 2. [...] showing the relationship between tests and the stage/grade of liver injury. The horizontal line inside each box represents the median, and the width of each box the median ± 1.57 interquartile range/√n (to assess the 95% level of significance between group medians). Failure of the shaded boxes to overlap signifies statistical significance (P<0.05). The horizontal lines above and below each box encompass the interquartile range (from the 25th to 75th percentile), and the vertical lines from the ends of the box encompass the adjacent values (upper: 75th percentile plus 1.5 times interquartile range; lower: 25th percentile minus 1.5 times interquartile range). doi:10.1371/journal.pone.0030325.g002
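The interval used in the Figure 2 box plots, median ± 1.57 × interquartile range/√n, and the overlap rule can be sketched as follows. This is a minimal sketch: the function names are hypothetical, and the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the caption does not say how the percentiles were computed.

```python
import math
import statistics


def notch_interval(values):
    """Return (lower, upper) = median -/+ 1.57 * IQR / sqrt(n) for one group."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def notches_overlap(group_a, group_b):
    """True when the two intervals overlap, i.e. no difference at about the 5% level."""
    low_a, high_a = notch_interval(group_a)
    low_b, high_b = notch_interval(group_b)
    return low_a <= high_b and low_b <= high_a
```

Two groups of biomarker values whose intervals do not overlap would be read, by the rule in the caption, as having significantly different medians (P<0.05).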
This overview confirms the significant accuracy of AT for the diagnosis of overt NASH, as well as for the pairwise comparison between NAS categories, observed by Lassailly et al. [10]. AT was originally designed for the diagnosis of necroinflammatory histological activity in chronic hepatitis C and B. Given the performance observed in obese patients, it will be interesting to check the AT performance in patients with other NAFLD risk factors, as well as to compare or combine it with cytokeratin 18 for the diagnosis of NASH.
Long-term prospective studies must be undertaken in patients with severe obesity and other NAFLD risk factors in order to validate these biomarkers versus biopsy. In patients with chronic hepatitis C [31], chronic hepatitis B [32] and alcoholic liver disease [33], FT had prognostic value similar to that of biopsy.
Finally, a major limitation of liver biomarker validation is the absence of a perfect gold standard [19,22]. Using more appropriate methodology, such as latent class analysis, which looks for the truth in the absence of a gold standard, is probably one scientific way to better estimate the performance of liver biomarkers [34].