Inter-vendor reproducibility of left and right ventricular cardiovascular magnetic resonance myocardial feature-tracking

Aim Since cardiovascular magnetic resonance feature-tracking (CMR-FT) has been demonstrated to be of incremental clinical merit we investigated the interchangeability of global left and right ventricular strain parameters between different CMR-FT software solutions. Material and methods CMR-cine images of 10 patients without significant reduction in LVEF and RVEF and 10 patients with a significantly impaired systolic function were analyzed using two different types of FT-software (TomTec, Germany; QStrain, Netherlands). Global longitudinal strains (LV GLS, RV GLS), global left ventricular circumferential (GCS) and radial strains (GRS) were assessed. Differences in intra- and inter-observer variability within and between software types based on single and up to three repeated and subsequently averaged measurements were evaluated. Results Inter-vendor agreement was highest for GCS followed by LV GLS. GRS and RV GLS showed lower inter-vendor agreement. Variability was consistently higher in healthy volunteers as compared to the patient group. Intra-vendor reproducibility was excellent for GCS, LV GLS and RV GLS, but lower for GRS. The impact of repeated measurements was most pronounced for GRS and RV GLS on an intra-vendor level. Conclusion Cardiac pathology has no influence on CMR-FT reproducibility. LV GLS and GCS qualify as the most robust parameters within and between individual software types. Since both parameters can be interchangeably assessed with different software solutions they may enter the clinical arena for optimized diagnostic and prognostic evaluation of cardiovascular morbidity and mortality in various pathologies.

Introduction
Cardiovascular magnetic resonance feature tracking (CMR-FT) is a technique analogous to speckle tracking echocardiography (STE), a non-Doppler based technique to assess cardiac mechanics [1]. Quantitative wall motion parameters derived from STE demonstrate high value for prognosis and mortality prediction [2] over and above classical parameters such as ejection fraction (EF) [3]. Studies show good agreement between STE and CMR-FT [4][5][6], and recently similar utility for prognosis assessment has been demonstrated for CMR-FT [7][8][9]. Furthermore, reasonable agreement has been demonstrated between CMR-FT and CMR-tagging [10], which is considered the CMR reference standard for quantitative wall motion assessment [11]. In contrast to CMR-tagging, parameters derived from CMR-FT do not require the acquisition of additional sequences and time-consuming post-processing, but allow quantitative deformation parameters to be derived from routinely acquired steady-state free-precession (SSFP) sequences [12]. CMR-FT has proven reliability [13,14] and is receiving increasing interest due to mounting evidence regarding its clinical applicability in a variety of cardiovascular diseases [7,8,12,15-20].
Notwithstanding these considerations, significant numerical differences in strain assessments have been demonstrated between different CMR-FT software types (2D CPA MR, TomTec GmbH, Unterschleissheim, Germany and Tissue Tracking, cvi42, Circle Cardiovascular Imaging Inc., Calgary, Canada) [21]. Recently Medis Medical Imaging Systems (Leiden, Netherlands) have released an alternative tool called QStrain. Although both solutions share a similar basic algorithm [11], they offer different workflows with Medis requiring a higher degree of manual user interaction than TomTec. Consequently, the aim of the present study was to assess the reproducibility and inter-vendor agreement between the established TomTec methodology and the new solution provided by Medis in regard to global left and right ventricular strain values.

Study population
The study cohort consisted of 10 patients with normal left ventricular ejection fraction (LVEF) and 10 patients with significantly impaired systolic function. The research was conducted in accordance with general ethical approval for additional research analyses on clinically acquired data granted by the Ethics committee of the University Medical Centre Goettingen. All patients gave written informed consent and all clinical investigations have been conducted according to the principles expressed in the Declaration of Helsinki.

CMR imaging
CMR imaging was carried out on a SIEMENS Symphony 1.5 Tesla system in the supine position using a five-channel cardiac surface coil. Electrocardiogram (ECG)-gated SSFP cine sequences in long-axis 2- and 4-chamber views and 12 to 14 equidistant short-axis planes completely covering the left ventricle were acquired during brief periods of breath-holding (25 frames/cardiac cycle). Typical CMR parameters were as follows: pixel spacing: 1.6 mm x 1.6 mm; slice thickness: 7 mm; inter-slice distance: 8 mm; TE: 1.4 ms; TR: 46 ms.

Volumetric analysis
CMR based volumetric analysis was performed using the dedicated software solution provided by Medis Medical Imaging Systems (QMass, Version 7.6).

CMR-Feature Tracking (CMR-FT)
CMR-FT was performed using the software provided by TomTec Imaging Systems (2D CPA MR, Cardiac Performance Analysis, Version 4.6.3.9) and Medis Medical Imaging Systems (QStrain, Version 2.1.12.2). The software tools will be referred to as "TomTec" and "QStrain" in the following sections of the paper. Strain was assessed with both software types at the following locations: long-axis 2- and 4-chamber views; short-axis sections at basal, mid-ventricular and apical levels. The slices of the different short-axis levels were identified as follows: basal level: last slice showing the complete left ventricular myocardium throughout the entire cardiac cycle without in-plane appearance of the left ventricular outflow tract (LVOT) at end-systole; mid-ventricular: slice located at the level of both papillary muscles; apical: slice showing a consistent blood-pool cavity throughout the entire cardiac cycle (no obliteration of the lumen at end-systole). RV tracking was performed including the septum.
With TomTec, left ventricular (LV) endocardial and epicardial borders were manually delineated in short- and long-axis views with the initial contour set at end-diastole. Due to the thin myocardial wall, right ventricular (RV) tracking was performed by delineating an endocardial contour only. The QStrain workflow differed in that the software requires cardiac contours to be delineated at both end-diastole and end-systole. In case of insufficient tracking, as defined by apparent deviations of the contours from the endocardial and/or epicardial borders, contours were manually corrected and the algorithm reapplied. All measurements were repeated three times in all sections.
All patients were analyzed by a single observer (RJG) using both types of software. The same observer repeated the analysis on the same data-sets four weeks later to assess intra-observer variability. Inter-observer reproducibility was derived from the tracking results of a second skilled observer (TL). To study the impact of repeated measurements on reproducibility, results based on a single measurement (R1) were compared with the results for these parameters derived from two (R2) and three (R3) repeated and subsequently averaged measurements.
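The R1, R2 and R3 values described above can be illustrated with a minimal sketch. The function name `averaged_strain`, the example values, and the choice to average the first n runs in their original order are assumptions made for illustration; the study itself only states that up to three repeated measurements were averaged.

```python
from statistics import mean


def averaged_strain(repeats, n):
    """Average the first n repeated strain measurements.

    n = 1 corresponds to R1, n = 2 to R2 and n = 3 to R3.
    Assumption: runs are averaged in the order they were acquired.
    """
    if not 1 <= n <= len(repeats):
        raise ValueError("n must be between 1 and the number of repeats")
    return mean(repeats[:n])


# Hypothetical LV GLS values (%) from three tracking runs of one subject.
runs = [-20.0, -22.0, -21.0]
r1, r2, r3 = (averaged_strain(runs, n) for n in (1, 2, 3))
```

Reproducibility statistics would then be computed separately on the R1, R2 and R3 series so that the effect of averaging can be compared.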

Statistical analysis
Microsoft Excel and IBM SPSS Statistics version 23 for Mac were used to conduct statistical analysis. All continuous data are reported as mean ± standard deviation. Statistical parameters to assess inter-vendor agreement and intra- and inter-observer variability were calculated as follows: Bland-Altman analysis [22] (mean difference between measurements with 95% limits of agreement (±1.96 standard deviations)), intra-class correlation coefficients (ICC) using a model of absolute agreement (agreement was considered excellent when ICC > 0.74, good when ICC = 0.60-0.74, fair when ICC = 0.40-0.59, and poor when ICC < 0.4 [23]) and the coefficient of variation (CoV) (defined as the standard deviation of the differences divided by the mean [24]). The Kolmogorov-Smirnov test was applied to test for normal distribution of the data [25]. To compare mass and volumetric parameters between healthy volunteers and the patient group, normally distributed parameters were compared using the t-test, while the Mann-Whitney U test was applied for non-parametric data. Pairwise non-parametric strain parameters assessed with each vendor were compared using the Wilcoxon test. The Mann-Whitney U test was used to analyze whether there was a significant difference between non-parametric strain parameters for healthy volunteers and patients with impaired cardiac function. Significance was defined as p < 0.05.
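The agreement statistics defined above can be sketched in a few lines. This is an illustrative reimplementation, not the Excel/SPSS procedure used in the study. All function names and example values are hypothetical. For the CoV, "the mean" is assumed to be the mean of all measurements from both series, with its absolute value taken because strain values can be negative. The ICC cut-offs follow the text, with values between the stated bands (e.g. 0.595) assigned to the lower band.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (mean difference, lower limit, upper limit) for paired data."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def coefficient_of_variation(a, b):
    """SD of the paired differences divided by the mean, in percent.

    Assumption: the mean is taken over all measurements of both series.
    """
    diffs = [x - y for x, y in zip(a, b)]
    overall_mean = mean(list(a) + list(b))
    return stdev(diffs) / abs(overall_mean) * 100


def icc_category(icc):
    """Map an ICC value to the agreement categories quoted in the text."""
    if icc > 0.74:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "fair"
    return "poor"


# Hypothetical paired GRS values (%) from two analyses of the same subjects.
first = [11.0, 14.0, 17.0]
second = [10.0, 12.0, 14.0]
bias, lower, upper = bland_altman(first, second)
cov = coefficient_of_variation(first, second)
```

The ICC itself is not computed here because the text specifies only "a model of absolute agreement" and leaves the exact ICC form open.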

Participant details
Demographics are displayed in Table 1. Quantitative analyses were performed in all subjects; no subject was excluded. Fig 1 shows representative assessments of LV circumferential strains at basal level with both software types including contours and corresponding strain curves. Inter-vendor agreement was excellent for GCS and LV GLS at both intra- and inter-observer levels. RV GLS and GRS both showed lower inter-vendor agreement.

Inter-vendor agreement
There was no significant difference between vendors regarding the averaged results (R3) for LV GLS (p = 0.079), GCS (p = 0.502) and RV GLS (p = 0.093). GRS measured with QStrain was significantly higher (p < 0.001) than measured with TomTec. These findings were similar for the analysis based on a single repetition (R1) and two averaged repetitions (R2) for LV GLS, GCS and GRS. RV GLS based on a single measurement (R1) only, however, was significantly lower measured with TomTec (p = 0.033).

Table 2. Inter-vendor agreement and intra-vendor reproducibility at intra- and inter-observer levels for global longitudinal, global circumferential and global radial strain based on three averaged measurements (R3).

Reproducibility
All values for mean difference ± standard deviation, ICC and CoV as derived from three averaged measurements (R3) are given in Table 2. In both vendors, intra-vendor reproducibility was best for GCS followed by LV GLS. RV GLS showed excellent intra-vendor reproducibility with both types of software. GRS showed the highest intra-vendor variability amongst all parameters. Whilst TomTec showed higher intra- than inter-observer reproducibility, inter-observer reproducibility with QStrain was slightly better than intra-observer reproducibility. Table 3 displays the results for inter-vendor agreement and intra-observer variability based on one measurement (R1) as compared to two averaged measurements (R2, Table 4) and three averaged measurements (R3, Table 2). Fig 3 shows the impact of repeated measurements on inter-vendor agreement based on CoV and ICC. Repeated measurements had moderate impact on LV GLS and GCS regarding inter-vendor agreement and intra-vendor variability (Tables 2, 3 and 4). The effect was more pronounced for RV GLS and GRS regarding intra-vendor reproducibility but not inter-vendor agreement (Tables 2, 3 and 4). Table 5 reports inter-vendor agreement based on mean differences with limits of agreement for the whole study cohort, healthy volunteers and patients with impaired cardiac function.
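As a rough guide to why averaging can reduce variability, consider the idealised assumption (not tested in this study) that repeated tracking runs carry independent random errors with a common standard deviation $\sigma$. The symbols $\sigma$, $\sigma_{\bar{x}}$ and $n$ are introduced here for illustration only. Under that assumption the standard deviation of the mean of $n$ runs is:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}},
\qquad
n = 3 \;\Rightarrow\; \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{3}} \approx 0.58\,\sigma
```

In this simple model, a systematic offset between two software packages is unaffected by averaging, since only the random component of the error shrinks.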

Fig 2. Reproducibility for CMR-FT derived global strain parameters at intra- and inter-observer levels.
Inter-vendor agreement for global strain parameters for healthy volunteers and patients with impaired cardiac output based on three averaged measurements (R3). Panels a-d: Bland-Altman plots with limits of agreement (95% confidence intervals) showing CMR-FT derived reproducibility at the intra-observer level. Panels e-h: Bland-Altman plots with limits of agreement (95% confidence intervals) showing CMR-FT derived reproducibility at the inter-observer level.
https://doi.org/10.1371/journal.pone.0193746.g002

Table 3. Inter-vendor agreement and intra-vendor reproducibility at intra- and inter-observer levels for global longitudinal, global circumferential and global radial strain based on one measurement (R1).

Fig and S2 Fig). Inter-vendor agreement was higher for all global strain parameters in the patient group as compared to the volunteers (Table 5). This applied to the results on an intra-observer and on an inter-observer level. Consistently, LV GLS and GCS were found to show the best inter-vendor agreement in patients and healthy volunteers, respectively. In all groups agreement between vendors was reasonable for RV GLS and lower for GRS. This was paralleled by similar results on intra-vendor levels both for TomTec and for QStrain (see Table 2, S1 and S2 Tables).

Discussion
To our knowledge this is the first study investigating the performance and potential differences between the recently introduced CMR-FT tool QStrain and the more established TomTec software, which has already been used and validated in a variety of studies [6,21,26]. First, our findings show reasonable inter-vendor agreement between both types of software. GCS and LV GLS qualify as the best parameters with excellent reproducibility and interchangeability based on a single analysis. Second, RV GLS and GRS are less robust with significant inter-vendor variability. However, intra-vendor reproducibility of these parameters can be improved by repeated analysis runs, resulting in sufficient reproducibility when using a single software type. Consequently, it is important to stay within one vendor when calculating these parameters. Third, we could show that reproducibility of CMR-FT within and between software types is not adversely affected by impaired ventricular function.
Our results consistently indicate the high clinical applicability of GCS and LV GLS as the most robust parameters in both types of software, which is in line with previously published literature [12][13][14][16]. Similar to the current study, GCS has been repeatedly shown to have the least variability in previous CMR-FT studies [13], earlier studies comparing STE and CMR-FT [4,5] and tagging and CMR-FT [27]. Buss et al. found CMR-FT derived LV GLS and GCS to serve as predictors of cardiac events, independent of clinical and laboratory markers, LVEF and late gadolinium enhancement in patients with dilated cardiomyopathy [7]. More recently, Orwat et al. showed that in patients with repaired tetralogy of Fallot LV GLS and GCS are significantly associated with outcome [8]. This growing evidence for LV GLS and GCS as prognostic tools in a variety of diseases and their high interchangeability between different vendors based on a single analysis underline their potential prospective incremental clinical merit. RV GLS and GRS have already been reported in previous studies to show lower inter-vendor reproducibility than GCS and LV GLS [21], which is in line with our findings. GRS represents strain throughout the entire myocardial wall from subepicardium to subendocardium and is consequently much more affected by through-plane motion [21] and complex diastolic and systolic twisting motion [28] than LV GLS and GCS, which are predominantly assessed with subendocardial tracking. The two software solutions applied in the present study rely on a similar algorithm, which explains the good agreement for LV GLS, GCS and RV GLS. A higher degree of manual user interaction (e.g. manual definition of both systolic and diastolic contours), as required by QStrain, did not impact reproducibility.
As mentioned above, the lower reproducibility for GRS is, however, inherent to FT algorithms based on optical flow methods, as previously shown [13,14,21,27]. Interestingly, a recently introduced CMR-FT software (Segment CMR by Medviso, Lund, Sweden) using an algorithm which incorporates non-rigid image registration [29,30] seems to overcome this limitation, as suggested by Morais et al. [31]. Instead of tracking endocardial borders only, this new algorithm tracks the entire image content (i.e. blood pool and the entire myocardium) and thus a higher number of myocardial image samples. As mentioned before, GRS represents strain throughout the entire myocardial wall and is therefore the strain parameter that is particularly affected by this different approach. Morais et al. could show that this leads to a significantly better reproducibility for GRS than reported in all previous studies that were carried out with CMR-FT software which did not incorporate non-rigid image registration [7,13,14,16,21,27]. Notwithstanding these considerations, intra-vendor agreement for RV GLS can be improved through repeated runs and was excellent in both types of software based on three averaged measurements. Thus, our study points out two important aspects: first, both software solutions can serve as a reliable tool in the assessment of CMR-FT derived right ventricular strain; second, three averaged runs make results for RV GLS significantly more reliable and might justify a threefold increased analysis time in future studies when assessing right ventricular strain. This is an important finding since the role of strain deterioration in right ventricular pathologies is of growing research interest with potential clinical utility.
For instance recently published studies indicate that CMR-FT derived RV GLS analyses play a promising role in the detection of right ventricular pathologies such as arrhythmogenic right ventricular cardiomyopathy (ARVC) [17] and even allow a prediction of subsequent clinical deterioration in diseases affecting the right ventricle such as pulmonary hypertension [19]. In addition, RV GLS was recently identified as a predictor of outcome in patients with repaired tetralogy of Fallot [8].

TomTec versus QStrain
Our data indicate no significant differences between TomTec and QStrain comparing values for LV GLS, GCS and RV GLS. Earlier vendor comparisons using TomTec and Circle cvi42 (Circle) found interchangeability for GCS between these vendors to be limited, because values for GCS measured with Circle were significantly lower than assessed with TomTec [21]. As a result, interchangeability between TomTec and QStrain is superior to the interchangeability between TomTec and Circle. However, it is important to note that, regardless of the number of repetitions applied, GRS results are not interchangeable as values for this parameter were significantly higher measured with QStrain. This difference between vendors was, on the contrary, not found comparing TomTec and Circle [21]. Taking into account normal values for GRS according to a review by Claus et al. [32] and the range for GRS observed in this study cohort (8% to 61%), a mean difference between vendors of 12.16% (Table 2) appears quite considerable. Thus, serial GRS assessments with both software types, e.g. during patient follow-up at different hospitals, do not seem to represent a valid clinical methodology. Since the intra-vendor reproducibility is adequate, one needs to decide which software to use consistently when analyzing GRS in future studies to avoid poor interchangeability between vendors and to allow GRS to be quantified with accurate performance.
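To put the 12.16% mean GRS difference quoted above in perspective, it can be related to the width of the GRS range observed in this cohort (8% to 61%). The ratio below is an illustrative calculation from those two figures only, not a statistic reported in the study:

```latex
\frac{12.16}{61 - 8} = \frac{12.16}{53} \approx 0.23
```

On this reading, the systematic inter-vendor offset amounts to roughly a quarter of the span of GRS values seen across all subjects.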
In line with the findings from STE studies [33], reproducibility was even higher in patients with heart failure than in healthy volunteers. These findings appear intuitive as strain in healthy volunteers is usually higher than in patients with impaired cardiac function, which was also the case in the current study (Table 1). Higher strain parameters indicate more cardiac motion and are thus much more strongly affected by through-plane motion effects, which are known to lower reproducibility at the regional level [13,34,35]. However, in contrast to our results, the study by Morais et al. has shown that intra-observer, inter-observer and inter-study variability is similar in healthy volunteers and patients with known myocardial pathology [31]. When interpreting these differences, it is important to note that, first, the technique used by Morais et al. is based on a different CMR-FT algorithm and, second, LVEF in the patient group analyzed by Morais et al. was significantly higher than in our patient group. Moreover, dividing study groups based on LVEF might represent a confounding factor for the comparison of reproducibility in health and disease, as there is evidence from echocardiographic studies [36,37] and other CMR-FT studies [17] that strain parameters can be impaired when global function parameters are still normal.
When interpreting the results of the current study, it is important to bear in mind that the main step for the assessment of myocardial strain with CMR-FT is the initial manual delineation of the endo- and epicardial contours by a skilled observer. Although the identification of these contours can be performed easily and quickly, some variation between two different observers is inherent to the process and thus a potential source of variability. Moreover, the process is complicated by the fact that rotational and strain estimates neglect out-of-plane movement of the myocardium throughout the cardiac cycle when using two-dimensional techniques. Ideally, further refinements should aim at the development of three-dimensional techniques with fully automatic analysis solutions to overcome these limitations.

Study limitations
Even though our study showed reasonable inter-vendor agreement for global strain parameters, the results have to be interpreted with caution. First and foremost, the study population was quite small, and patients and healthy volunteers were assigned to their groups solely according to their LVEF, while right ventricular performance and the etiology of the LVEF impairment were not considered. Furthermore, the two groups were not age matched; however, the age difference between the groups was not significant according to the t-test (p = 0.14). In addition, the distribution of sexes was balanced between the two groups but not within each group. However, the study did not aim at any quantitative comparison of strain parameters between health and disease, nor according to the etiology of the impairment of cardiac function, age or sex. Additionally, no true blinding of the observers as to whether a subject belonged to the healthy volunteer or the patient group could be achieved since a marked reduction in systolic function can be easily appreciated from the original images. However, all observers were blinded both to their own previous results and to the results of the second observer. Of note, in the present study GCS and GRS were derived from exactly the same slices by all observers. It is important to note that especially for the mid-ventricular and the apical slices there might be a second or even a third slice that would have met the specified slice selection criteria. Thus, a different slice selection among different observers is a possible source of variability in clinical practice that is not reflected in the results of the current study. Notwithstanding this consideration, this study aimed to quantify the variability inherent to the tracking performance rather than the variability that is introduced by slice selection.
As neither echocardiography nor CMR-tagging was performed in any of the patients, the study did not include any independent reference standard. Nevertheless, TomTec has been validated against myocardial tagging with excellent agreement [10] and against speckle tracking echocardiography with reasonable to good agreement in earlier studies [4][5][6]. Besides, we did not aim at another comparison between different techniques to assess cardiac dynamics but rather at an inter-vendor comparison between the two types of commercially available CMR-FT software to clarify how well they agree with each other and to what degree results can be used interchangeably.

Conclusion
In conclusion, our study shows reasonable inter-vendor agreement between both types of software, with no negative effect on reproducibility when studying either healthy subjects or patients with cardiac pathology. LV GLS and GCS qualify as the most robust parameters and can be used interchangeably based on single measurements only. When analyzing right ventricular strain with either vendor, three repeated runs are highly recommended to improve reproducibility. Independent of the number of repetitions, interchangeability of RV GLS and GRS may be questioned based on our results. Consequently, one should stay within one vendor when assessing these parameters.
If further studies confirm these findings, CMR-FT derived quantitative deformation parameters may be fully implemented within routine clinical MR examinations for optimized diagnostic assessments and risk prediction in various cardiac pathologies.
Supporting information
S1 Fig. Reproducibility for CMR-FT derived global strain parameters at intra- and inter-observer levels for normal subjects.
Inter-vendor agreement for global strain parameters for normal subjects based on three averaged measurements (R3). Panels a-d: Bland-Altman plots with limits of agreement (95% confidence intervals) showing CMR-FT derived reproducibility at the intra-observer level. Panels e-h: Bland-Altman plots with limits of agreement (95% confidence intervals) showing CMR-FT derived reproducibility at the inter-observer level. (TIF)

S2 Fig. Reproducibility for CMR-FT derived global strain parameters at intra- and inter-observer levels for patients with impaired cardiac function.
Inter-vendor agreement for global strain parameters for patients with impaired cardiac function as defined by reduced ejection fraction based on three averaged measurements (R3). Panels a-d: Bland-Altman plots with limits of agreement (95% confidence intervals) showing CMR-FT derived reproducibility at the intra-observer level. Panels e-h: Bland-Altman plots with limits of agreement (95% confidence intervals) showing CMR-FT derived reproducibility at the inter-observer level. (TIF)

S1 Table. Inter-vendor agreement and intra-vendor reproducibility at intra- and inter-observer levels for global longitudinal, global circumferential and global radial strain for normal subjects based on three averaged measurements (R3). SD, standard deviation; Diff., differences; ICC, intra-class correlation coefficient; CoV, coefficient of variation; CI, confi-