Real-Time Shear Wave versus Transient Elastography for Predicting Fibrosis: Applicability, and Impact of Inflammation and Steatosis. A Non-Invasive Comparison

Background and Aims
Real-time shear wave elastography (2D-SWE) is a two-dimensional transient elastography technique and a competitor, as a biomarker of liver fibrosis, to the standard reference, transient elastography with the M probe (TE-M). The aims were to compare several applicability criteria and to assess the impact of inflammation and steatosis on elasticity values, two unmet needs.
Methods
We took FibroTest as the fibrosis reference and ActiTest and SteatoTest as quantitative estimates of inflammation and steatosis. After standardization of the estimates, analyses used curve fitting, the quantitative Lin concordance coefficient (LCC), and multivariate logistic regression.
Results
A total of 2,251 consecutive patients were included. We validated the predetermined cutoff of 0.2 kPa for the minimal elasticity value: below it, 2D-SWE results were not reliable (LCC with FibroTest = 0.0281 [-0.119; 0.175]). Other criteria (elasticity CV, body mass index and depth of measurement) were not sufficiently discriminant. The applicability of 2D-SWE, 89.6% (95% CI 88.2–90.8), was significantly higher than that of TE, 85.6% (84.0–87.0; P<0.0001). In patients with non-advanced fibrosis (METAVIR F0F1F2), elasticity values estimated by 2D-SWE were less affected by inflammation and steatosis than those estimated by TE-M: LCC (95% CI) 0.039 (0.021; 0.058) vs 0.090 (0.068; 0.112; P<0.01) and 0.105 (0.068; 0.141) vs 0.192 (0.153; 0.230; P<0.01), respectively. The three analysis methods gave similar results.
Conclusions
Elasticity results with a very low minimal signal in the region of interest should be considered not reliable. 2D-SWE had a higher applicability than TE, the reference elastography, with less impact of inflammation and steatosis, especially in patients with non-advanced fibrosis, as presumed by blood tests.
Trial Registration
ClinicalTrials.gov NCT01927133


Introduction
Real-time shear wave elastography (2D-SWE) by Aixplorer™ is a two-dimensional transient elastography technique for evaluating liver fibrosis [1], and a competitor to transient elastography with the M probe (TE-M), which is considered the standard [2]. 2D-SWE estimates the speed of a shear wave to provide a quantitative estimate of tissue stiffness. Compared with TE, 2D-SWE has the advantage of imaging liver stiffness in real time, not limited to a single location, and guided by a higher frame-rate B-mode image [1,3]. Two disadvantages, related to applicability and reliability, were also identified and need clarification: "quality criteria not well defined" and the "influence of inflammation" [2]. The first aim was to better define quality criteria [2,3,4,5,6] (S1 Table). The second aim was to better quantify the impact of inflammation and steatosis on elasticity values, independently of fibrosis severity.
These aims could be reached more rapidly, and in larger populations, by using validated blood tests rather than liver biopsies as the reference.

Patients
Consecutive patients undergoing chronic liver disease assessment at the "Groupe Hospitalier Pitié Salpêtrière" Hospital in Paris, France, were recruited (Fig 1). We included patients aged 18 years or older who had undergone simultaneous serum sampling for FibroTest and attempted liver stiffness measurements with 2D-SWE, TE-M and TE-XL (S1 File). Patients with acute liver disease, ALT greater than 622 IU/L, or extra-hepatic cholestasis were not included. Written informed consent was obtained from all patients, and all clinical investigations were conducted according to the principles expressed in the Declaration of Helsinki. The ethics committee of Groupe Hospitalier Pitié Salpêtrière approved the research. This study is a sub-project of the FibroFrance project (NCT01927133). All co-authors had access to the study data and reviewed and approved the final manuscript.

Elasticity measurements
2D-SWE was performed using the Aixplorer™ ultrasound system (Supersonic Imagine S.A., Aix-en-Provence, France). For each patient, the mean and median Qbox elasticity were assessed, as well as the lowest and highest elasticity values. A single Qbox elasticity estimate was performed, as previous biopsy-validated studies showed that fewer measurements are needed for 2D-SWE than for TE: fewer than five, then three [7,8], and finally a single measurement [9]. TE-M and TE-XL were performed with the M and XL probes, respectively, using FibroScan™ (Echosens, Paris, France) according to the instructions and training provided by the manufacturer [2]. Steatosis was also assessed using the controlled attenuation parameter (CAP) of the TE-M measurements [10]. The elasticity measurements were not blinded to one another, as the same operator performed TE-M, TE-XL and 2D-SWE successively, but they were blinded to the blood test results.

Definition of applicability rate
For FibroTest, a measurement was classified as a failure when serum sampling was impossible; it was classified as not-reliable if one component had an extreme value that changed the FibroTest value by more than 0.30 when the usual median was used instead [11]. As FibroTest (FT) was taken as the reference, not-reliable FTs were excluded from the "intention to diagnose population". Several failures or not-reliable results were possible in the same patient, which explains why the total number of failures or not-reliable results was greater than the number of patients excluded from the 2D-SWE reliability population and from the concordance population. For TE-M and TE-XL, signal absence was considered a failure, and the standard reliability definition was an IQR/liver stiffness measurement ratio (IQR/M) <0.30, at least 10 measurements and a success rate of 60% or greater [2,14]. For 2D-SWE, there was no standard [2,3,4,5], and we compared the published definitions (S1 and S2 Tables) using the "strength of concordance" method detailed elsewhere [4,6]. In the absence of a reference, the strength of concordance between two imperfect gold standards can be used to identify factors of variability: any variability factor of one test should affect the strength of association between the two tests, provided that this factor is not also associated with the other test (independent tests).
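As an illustration only, the TE reliability rule above (failure when there is no signal; otherwise reliable when IQR/M <0.30, at least 10 valid measurements and a success rate of 60% or more) can be sketched in Python. This is a minimal sketch, not the FibroScan implementation: it assumes M is the median of the valid measurements, that the success rate is valid measurements divided by attempts, and it uses NumPy's default percentile interpolation for the IQR, which the device may compute differently. The function name `te_reliability` and its arguments are hypothetical.

```python
import numpy as np


def te_reliability(stiffness_kpa, n_attempts):
    """Classify one TE examination as 'failure', 'not-reliable' or 'reliable'.

    stiffness_kpa: valid individual stiffness values (kPa); empty if no signal.
    n_attempts: total number of shots attempted (hypothetical input).
    Assumption: M is the median of the valid values.
    """
    values = np.asarray(stiffness_kpa, dtype=float)
    if values.size == 0:
        return "failure"  # signal absence counts as a failure

    median = np.median(values)
    q1, q3 = np.percentile(values, [25, 75])  # default linear interpolation
    iqr_over_median = (q3 - q1) / median
    success_rate = values.size / n_attempts

    if values.size >= 10 and iqr_over_median < 0.30 and success_rate >= 0.60:
        return "reliable"
    return "not-reliable"


# Ten identical values out of 12 shots: IQR/M = 0, success rate 83%.
print(te_reliability([7.0] * 10, 12))
```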

Impact of inflammation and steatosis on elasticity
The strength of concordance between 2D-SWE and FibroTest was estimated using the Lin concordance correlation coefficient (LCC), stratified according to the presence of significant steatosis and significant inflammation (S2 File and S3 Table).

Statistical analysis
The aim of this study was not to assess the performance of 2D-SWE versus TE-M, the standard of elastography.
First, we attempted to normalize and standardize the expression of elasticity. We assessed the impact of expressing elasticity in kPa with and without transformation to reduce skewness. For the concordance analyses, we transformed the TE-M, TE-XL and 2D-SWE elasticity values, first by logarithmic transformation and second by standardizing the expression from 0 to 1, dividing each value by 74, the range from 1 to 75 kPa (Fig 2). We checked that leaving the data untransformed would indeed have induced a higher coefficient of variation (CV) and lower concordance (S3 Table).
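The standardization step can be sketched as follows. The text gives the target (a 0 to 1 scale, with values limited to 1 to 75 kPa) but not the exact formula combining the log transform with the division. Since log10(75) is about 1.875, dividing log10 values by 74 or 75 would not reach 1, so this sketch assumes one reading: clip to [1, 75] kPa, then apply min-max scaling on the log10 scale, i.e. log10(E)/log10(75). The constants and the function name are illustrative assumptions.

```python
import numpy as np

E_MIN_KPA = 1.0   # assumed lower bound of the elasticity range
E_MAX_KPA = 75.0  # upper bound quoted in the text (TE-M maximum)


def standardize_elasticity(kpa):
    """Map elasticity (kPa) onto a 0-1 scale comparable with FibroTest.

    Assumption: clip to [1, 75] kPa, log10-transform, then min-max scale
    on the log scale. Values above 75 kPa (possible with 2D-SWE) map to 1.
    """
    e = np.clip(np.asarray(kpa, dtype=float), E_MIN_KPA, E_MAX_KPA)
    log_e = np.log10(e)
    log_min, log_max = np.log10(E_MIN_KPA), np.log10(E_MAX_KPA)
    return (log_e - log_min) / (log_max - log_min)


print(standardize_elasticity([1.0, 5.0, 9.5, 75.0, 300.0]))
```

Working on the log scale compresses the long right tail of stiffness values, which is the skewness the text aims to reduce; capping at 75 kPa puts 2D-SWE on the same range as TE-M.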
Second, we described the association between elasticity values and fibrosis severity using curve fitting. We compared, graphically and with the squared correlation coefficient (R2) (S4 Table), the simplest continuous linear model with piecewise models assuming several segments. The rationale was that elasticity values, despite Log-transformation and standardization, still increased disproportionately in patients with stages F3 and F4 compared with patients with non-severe stages (F0F1F2).
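A minimal sketch of this curve-fitting comparison, assuming ordinary least squares and a single-knot "hinge" (broken-stick) model as the segmented alternative. The paper does not specify its piecewise model, and the knot location `breakpoint` is a hypothetical parameter (for instance, a FibroTest value separating F0F1F2 from F3F4). The data in the usage lines are synthetic.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination of fitted values y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_linear_and_hinge(x, y, breakpoint):
    """Compare a straight line with a two-segment (hinge) model by R2.

    x: standardized reference (e.g. FibroTest); y: standardized elasticity.
    breakpoint: hypothetical knot where the slope is allowed to change.
    Returns (R2 of linear model, R2 of hinge model).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X_lin = np.column_stack([np.ones_like(x), x])
    # Extra column is zero below the knot, so only the slope above it changes.
    X_hinge = np.column_stack([X_lin, np.maximum(0.0, x - breakpoint)])
    beta_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
    beta_hinge, *_ = np.linalg.lstsq(X_hinge, y, rcond=None)
    return r_squared(y, X_lin @ beta_lin), r_squared(y, X_hinge @ beta_hinge)


# Synthetic example: slope steepens above x = 0.6.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 500)
y = 0.2 + 0.2 * x + 0.8 * np.maximum(0.0, x - 0.6) + rng.normal(0.0, 0.03, 500)
print(fit_linear_and_hinge(x, y, breakpoint=0.6))
```

Because the hinge model contains the linear one, its R2 can only be equal or higher; a fair comparison should therefore also weigh the extra parameter.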
Third, we used the LCC with two-sided 95% confidence intervals to assess the quantitative strength of concordance between elasticity estimates and FibroTest, taken as the reference [15]. The LCC was also used to compare the impact of inflammation and steatosis on elasticity values. A multivariate logistic regression analysis was performed with the diagnosis of F3F4 as the endpoint. We compared the regression coefficients, as well as the area under the ROC curve (AUROC) of the logistic regression function, including elasticity values, inflammation, steatosis and the cause of liver disease as variability factors. Because of the relatively small numbers of patients, only NAFLD and CHC were entered as causes of liver disease in these models.
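The LCC has a closed form, Lin's coefficient 2·s_xy / (s_x² + s_y² + (mean_x - mean_y)²), which drops below the Pearson correlation whenever the two measures differ in location or scale. The sketch below computes it and attaches a two-sided 95% interval by percentile bootstrap. That interval is a swapped-in technique: Lin's original work derived an asymptotic interval on the Fisher z scale, and the NCSS procedure used in the study may differ. Function names are hypothetical.

```python
import numpy as np


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population-variance form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    sxy = np.mean((x - mx) * (y - my))
    return 2.0 * sxy / (x.var() + y.var() + (mx - my) ** 2)


def lin_ccc_bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and two-sided percentile-bootstrap CI for the LCC.

    Patients are resampled with replacement, keeping (x, y) pairs together.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        stats[b] = lin_ccc(x[idx], y[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lin_ccc(x, y), lo, hi


# Usage on synthetic standardized values (not study data).
ref = np.linspace(0.0, 1.0, 40)
elasticity = ref + 0.1 * np.sin(7 * ref)
print(lin_ccc_bootstrap_ci(ref, elasticity, n_boot=500))
```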
The AUROCs were estimated by the non-parametric method and compared using the paired method of Zhou et al. [16]. For intention-to-diagnose analyses, it has been recommended to use the worst-case scenario for missing data [17,18]. Accordingly, missing elasticity measurements were replaced by [1 - standardized reference measure]. We used NCSS software (Kaysville, Utah, USA) [19] for standard statistics and LCC.
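The non-parametric AUROC equals the Mann-Whitney probability that a randomly chosen F3F4 patient has a higher score than a randomly chosen non-F3F4 patient, counting ties as one half. Combined with the worst-case imputation described above, an intention-to-diagnose AUROC could be sketched as below. Assumptions: scores are standardized elasticity values on the 0 to 1 scale, missing values are coded as NaN, and the F3F4 labels come from FibroTest with a hypothetical threshold; the paired comparison of Zhou et al. is not reproduced here.

```python
import numpy as np


def impute_worst_case(elasticity_std, reference_std):
    """Replace missing (NaN) standardized elasticity by 1 - standardized reference.

    A patient with severe fibrosis by the reference gets a low imputed value
    (a likely false negative), and vice versa, penalizing non-applicability.
    """
    e = np.asarray(elasticity_std, dtype=float).copy()
    r = np.asarray(reference_std, dtype=float)
    missing = np.isnan(e)
    e[missing] = 1.0 - r[missing]
    return e


def auroc_mann_whitney(scores, labels):
    """Non-parametric AUROC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=bool)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Usage with synthetic values; 0.6 is a hypothetical F3F4 threshold.
fibrotest = np.array([0.10, 0.30, 0.50, 0.70, 0.85, 0.95])
swe_std = np.array([0.15, np.nan, 0.40, 0.65, np.nan, 0.90])
labels = fibrotest >= 0.6
print(auroc_mann_whitney(impute_worst_case(swe_std, fibrotest), labels))
```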

Populations included
Between October 2013 and April 2015, 4,151 consecutive patients were assessed for fibrosis. The "intention-to-diagnose population" comprised 2,251 patients, the "SWE reliability population" 1,720 patients, the "concordance population" 1,588 patients, the "concordance population with SteatoTest" 1,270 patients, and the "not-applicable population" 663 patients.
There were no major or unexpected differences between the characteristics of the different populations (S5-S9 Tables). Only 53/1,588 (3.3%) patients had an interval of one to six months between blood tests and elasticity measurements.

Standardization of elasticity values
Standardization of elasticity values (Fig 2, Panel A) by Log10 transformation (Fig 2, Panel B), capping the maximum at 75 kPa and dividing by 75, yielded a range from 0.00 to 1.00, similar to that of FibroTest, and a less skewed distribution (Fig 2, Panel C).
The lowest CV was observed for the "mean of standardized elasticity mean values in Qbox" (S10 Table).

Identification of reliability criteria for 2D-SWE measurements
We retrieved five definitions of failure and ten definitions of non-reliability (S2 Table). We were able to assess four quality criteria.
For the "minimal 2D-SWE value", we confirmed a discriminant cutoff at 0.2 kPa. The LCC between 2D-SWE and FibroTest was non-significant only in patients with minimal elasticity <0.2 kPa (Table 1; S4 Table; S1 and S2 Figs).
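The minimal-value criterion amounts to flagging a 2D-SWE result as not reliable when the lowest elasticity in the Qbox is below 0.2 kPa, then checking concordance with FibroTest separately in each stratum. A sketch, assuming per-patient arrays aligned by index and standardized values for 2D-SWE and FibroTest; function and key names are hypothetical, and the LCC function is repeated so the block runs on its own.

```python
import numpy as np

MIN_VALUE_CUTOFF_KPA = 0.2  # cutoff reported in the text


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population-variance form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


def lcc_by_min_elasticity(swe_std, fibrotest, swe_min_kpa,
                          cutoff=MIN_VALUE_CUTOFF_KPA):
    """LCC between 2D-SWE and FibroTest, split by the minimal Qbox value.

    Patients with minimal elasticity below the cutoff are the ones the
    criterion would classify as not reliable.
    """
    swe_std = np.asarray(swe_std, dtype=float)
    fibrotest = np.asarray(fibrotest, dtype=float)
    low = np.asarray(swe_min_kpa, dtype=float) < cutoff
    return {
        "min<cutoff": lin_ccc(swe_std[low], fibrotest[low]),
        "min>=cutoff": lin_ccc(swe_std[~low], fibrotest[~low]),
    }
```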
For the elasticity CV (S10 Table), the depth of elasticity measurement (S11 Table) and BMI (S12 Table), we did not find cutoffs as discriminant as the "minimal 2D-SWE value". Table 2). In a total of 145 patients, elasticity measurement was applicable using 2D-SWE but not using TE-M (S6 Table). In a total of 53 patients, elasticity measurement was applicable using 2D-SWE despite not being applicable using TE-XL (S7 and S8 Tables). The patients who benefited from 2D-SWE had a higher prevalence of NAFLD.
After stratification on inflammation severity, the LCC remained significantly lower for 2D-SWE than for TE-M and TE-XL only in patients without significant inflammation. After stratification on steatosis severity, the LCC remained significantly lower for 2D-SWE in patients without significant inflammation in comparison with TE-M, and in patients with significant steatosis in comparison with TE-XL.

Curve fitting
After stratification on both inflammation and steatosis, the LCC remained significantly lower for 2D-SWE only in patients without significant inflammation/steatosis. Excluding the patients (3%) with an interval greater than 30 days did not significantly change the LCC results for activity or steatosis. Among the 214 non-reliable SWE results of this table, only 132 were identified in the SWE reliability population (132/1,720 = 7.7%) using the minimal value cutoff of 0.2 kPa. The remaining 82 patients also had non-reliable TE-M or TE-XL results and were excluded from the intention to diagnose population (Fig 1).
Variability of presumed prevalence of severe fibrosis (F3F4) in CHC according to combinations of elasticity results (S17 Table). Using FibroTest as the reference, the prevalence of F3F4 was 35.4% (31.6-39.4). Using the worst elasticity value among the three elasticity values, the presumed prevalence of F3F4 was 27.4% (23.8-31.1), and using only patients with three concordant values, it was only 10.5% (8.2-13.5) (P<0.0001).
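The combinations of the three elasticity results can be sketched as below, using the 9.5 kPa cutoff quoted for the CHC analysis (S17 Table). Two readings are assumptions: "worst" is taken as presuming F3F4 when any of the three values reaches the cutoff, and "three concordant values" as all three reaching it. The text does not state which confidence interval method was used for the prevalences; the Wilson score interval here is a swapped-in choice. Function names are hypothetical.

```python
import math

import numpy as np

CUTOFF_KPA = 9.5  # F3F4 cutoff quoted in the S17 Table caption


def wilson_ci(k, n, z=1.959964):
    """Proportion k/n with a two-sided 95% Wilson score interval."""
    p = k / n
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half, centre + half


def presumed_f3f4(te_m, te_xl, swe, cutoff=CUTOFF_KPA):
    """Per-patient F3F4 calls under two assumed combination rules."""
    values = np.column_stack([te_m, te_xl, swe]).astype(float)
    positive = values >= cutoff
    worst = positive.any(axis=1)       # assumed: most severe of the three
    concordant = positive.all(axis=1)  # assumed: all three agree on F3F4
    return worst, concordant


# Usage with three synthetic patients (kPa).
worst, concordant = presumed_f3f4([10, 5, 12], [4, 5, 11], [6, 5, 10])
print(wilson_ci(int(worst.sum()), worst.size))
print(wilson_ci(int(concordant.sum()), concordant.size))
```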
Consequence of inflammation and steatosis on the prevalence of severe fibrosis, as presumed by elasticity methods in CHC. Depending on the elastography method and on the presence of inflammation and steatosis, the presumed prevalence of F3F4 varied from 9.6% (6.7-13.3) using TE-XL in patients without significant inflammation and without significant steatosis, to 72.0% (50.6-87.9; P<0.0001) using TE-M in patients with significant inflammation and significant steatosis.
Analyses of diagnostic performances in intention to diagnose. In intention to diagnose, when applicability and inflammation were taken into account, 2D-SWE had higher performance than TE-M for the diagnosis of F3F4 (S18 Table).

Discussion
The ideal study would have obtained a large surgical biopsy, a perfect reference, for each patient included. In the absence of such a perfect reference, the standard method was to use biopsies, a non-perfect reference with its own limitations, including sampling error. Even 25-mm biopsies had false-positive/negative rates of 25% for the diagnosis of fibrosis stage, activity grade and steatosis grade [20,21]. The third approach, which we chose, was to use validated blood tests as non-perfect references, with their own limitations and advantages. These approaches are complementary and can help respond more rapidly to unmet needs [4,22].

Limitations of blood tests as references
The first main concern was that the references used to assess fibrosis, inflammation and steatosis were suboptimal. We acknowledge that FibroTest has limitations, but it has been extensively and independently validated, with a low risk of non-reliable results, around 2% [11]. Even though discordance rates versus elasticity measurements or biopsy fibrosis scores were always around 20%, the prognostic performance of FibroTest was similar to or greater than that of biopsy or TE-M for the most frequent chronic liver diseases [2,23,24,25,26]. Furthermore, the natural history of fibrosis progression estimated using FibroTest was similar to that estimated using biopsy [27]. Liver fibrosis progression was assessed using biopsy and FibroTest in 2,472 untreated patients: 770 with CHC, 723 with CHB, 761 with NAFLD, and 218 with ALD. We observed highly significant concordance between FibroTest and biopsy estimates of hazards, with intraclass correlations of 0.961 (95% CI 0.948-0.970) for cirrhosis and 0.899 (95% CI 0.135-0.969) for minimal fibrosis. This concordance persisted across diseases and genders.
ActiTest has also been extensively validated histologically (5,326 patients) and is the only blood test with diagnostic performance greater than that of transaminases for the prediction of necro-inflammatory histological activity [12,26]. Validation studies were not easily identified in PubMed (S3 File). One example of omitted evidence was the largest histological validation of ActiTest, in 1,459 patients with CHC from a prospective trial with biopsies [25].
We acknowledge that far fewer studies were available for SteatoTest (3,253 patients) than for FibroTest and ActiTest [13,26]. Validation studies were not easily identified in PubMed (S4 File). One example of omitted evidence was the largest validation of SteatoTest, in 1,415 patients with CHC from a prospective trial with biopsies [25].
The following additional limitations are detailed in S5 File. We acknowledge that, despite statistical significance, the differences in curve fitting between TE and 2D-SWE were moderate and could be viewed as not clinically relevant. The measurements were not blinded, as the same operator performed TE-M, TE-XL and 2D-SWE successively. In the absence of consensual cutoffs, we used the same cutoff values for 2D-SWE and TE-XL as for TE-M. There was a low prevalence of patients with decompensated cirrhosis, a limitation because one previously observed advantage of 2D-SWE was its higher applicability than TE-M in patients with ascites [4]. We did not use automatic variability assessment, such as that combining CV and temporal variability [5]. A few patients (3%) had an interval of one to six months between blood tests and elasticity measurements, but their exclusion did not significantly change the results (Table 3). SteatoTest data were missing in 318 patients, but these patients were similar to those with available SteatoTest.

Advantages of the present study
Standardization of elasticity measures. The first original result was that not only logarithmic transformation but also standardization according to range had a direct impact on concordance analyses. 2D-SWE has a possible range up to 300 kPa, compared with a maximum value of 75 kPa for TE-M. A method with a larger range of elasticity values will show an artificially decreased quantitative concordance coefficient unless it is standardized like the reference method. These rules should be addressed in specific guidelines.
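The range argument can be checked numerically: the LCC penalizes differences in location and scale, not only lack of correlation. In the toy example below (synthetic values, not study data), a hypothetical method that ranks patients exactly like the reference but on a four-fold wider scale, mirroring 300 versus 75 kPa, keeps a Pearson correlation of 1 yet has a much lower LCC; rescaling it to the reference range restores full concordance.

```python
import numpy as np


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient (population-variance form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


# Synthetic reference on a 0-1 scale (illustrative only).
reference = np.linspace(0.05, 0.95, 50)
# Hypothetical method: identical ranking, four-fold wider range.
wide_method = 4.0 * reference
rescaled_method = wide_method / 4.0

pearson_r = np.corrcoef(reference, wide_method)[0, 1]
lcc_wide = lin_ccc(reference, wide_method)
lcc_rescaled = lin_ccc(reference, rescaled_method)
print(f"Pearson r = {pearson_r:.3f}, LCC wide = {lcc_wide:.3f}, "
      f"LCC rescaled = {lcc_rescaled:.3f}")
```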
Quality criteria for 2D-SWE results. The second original result was the clarification of the relative value of four quality criteria proposed for 2D-SWE. Among these criteria, only the minimal elasticity value in the ROI (0.2 kPa) seemed useful as a cutoff to identify and exclude unreliable results. As this cutoff was predetermined and validated in a previous study, it could be recommended as a simple criterion for clinicians [4].
Applicability of 2D-SWE. The better applicability of 2D-SWE versus TE-M was confirmed [2,4]. The patients who benefited from 2D-SWE had a higher prevalence of NAFLD. These results were consistent with the lower impact of steatosis on elasticity measured by 2D-SWE. These differences confirmed that comparisons between tests must be performed in intention to diagnose [18].

Impact of inflammation and steatosis on 2D-SWE elasticity estimates.
To our knowledge, this was the first time it was possible, in a large number of patients with different liver diseases, to assess the relative impact of inflammation and steatosis on elasticity values independently of fibrosis severity. The influence of inflammation on elasticity measured by TE-M has been well validated, but few studies have been published for 2D-SWE [2,11,12]. The influence of steatosis on elasticity has been a matter of debate, with conflicting results in TE-M studies: some studies suggested that steatosis was associated with an increase in elasticity, whereas others did not [2,28,29]. Here we observed that, for all elastography methods, elasticity was increased by both inflammation and steatosis. The curve fitting, univariate and multivariate analyses clearly demonstrated that inflammation increased elasticity values more than steatosis did. Steatosis as presumed by SteatoTest seemed to be a more sensitive measure than steatosis as presumed by CAP.
Comparisons between elastography methods, in per-protocol and intention to diagnose. As already described, comparisons of blood test and elastography performances should be performed using direct comparisons and intention-to-diagnose analyses [18]. In intention to diagnose, when applicability and inflammation were taken into account, 2D-SWE had higher performance than TE-M for the diagnosis of F3F4, a difference that was not identified in standard per-protocol comparisons (S18 Table). In patients at high risk of inflammation or steatosis, 2D-SWE had the advantage of being more applicable and more specific for staging fibrosis than TE-M, the present standard elastography method. Further studies are needed to compare other new elastography methods, such as ARFI or other real-time elastography techniques [9,30].
In patients with CHC, depending on the severity of inflammation and steatosis and on the elastography method, the presumed prevalence of F3-F4 varied from 9.6% to 72.0%. This spectrum variability should be taken into account when prioritizing reimbursement of direct-acting antivirals (DAA).

Conclusion
2D-SWE elasticity results with a minimal signal <0.2 kPa in the ROI should be considered not reliable. 2D-SWE had a higher applicability than TE-M, the reference elastography, with less impact of inflammation and steatosis, especially in patients with non-advanced fibrosis, as presumed by blood tests.

Table. Characteristics of patients with non-applicable TE-M and applicable 2D-SWE compared with patients of the "concordance population" (n = 1,588). (DOCX)
S9 Table. Characteristics of patients with non-applicable TE-XL and applicable 2D-SWE compared with patients of the "concordance population" (n = 1,588). (DOCX)
S10 Table. Comparison of the SWE coefficient of variation (ratio standard deviation/stiffness mean) cutoffs, according to concordance with the 3 other reliable tests' results, in the "reliability population" (n = 1,720). (DOCX)
S11 Table. Comparison of the SWE depth cutoffs, according to concordance with the 3 other reliable tests' results, in the "reliability population" (n = 1,720). Depth (mm) measured from the probe surface to the top of the region of interest. (DOCX)
S12 Table. Comparison of the WHO BMI cutoffs, according to concordance of SWE with the 3 other reliable tests' results, in the "reliability population" (n = 1,720). (DOCX)
S13 Table. Association between elasticity values and body mass index according to elastography method, adjusted for fibrosis severity. (DOCX)
S14 Table. Multivariate analysis of diagnostic performance of 2D-SWE elasticity for the diagnosis of F3F4 presumed by FibroTest, adjusted for inflammation, steatosis or liver disease. (DOCX)
S15 Table. Multivariate analysis of diagnostic performance of TE-M elasticity for the diagnosis of F3F4 presumed by FibroTest, adjusted for inflammation, steatosis or liver disease. (DOCX)
S16 Table. Multivariate analysis of diagnostic performance of TE-XL elasticity for the diagnosis of F3F4 presumed by FibroTest, adjusted for inflammation, steatosis or liver disease. (DOCX)
S17 Table. Performances of elastography methods in CHC patients (n = 599) for the diagnosis of severe fibrosis (F3F4) presumed by FibroTest, using a standard cutoff of 9.5 kPa. Impact of inflammation and steatosis severities, presumed by ActiTest (n = 599) and SteatoTest (n = 477). (DOCX)
S18 Table. Comparison of diagnostic performances of SWE, TE-M and TE-XL for the diagnosis of severe fibrosis (F3-F4) presumed by FibroTest. Analyses performed in "intention to diagnose" in 2,239 patients, and "per protocol" in 1,588 patients with applicable elasticity data. (DOCX)