Arterial spin labeling perfusion MRI: Inter-vendor reproducibility and clinical applicability

Introduction: A main obstacle that impedes standardized clinical and research applications of arterial spin labeling (ASL) is the substantial difference between the commercial implementations of ASL from major MRI vendors. In this study, we compare a single identical 2D gradient-echo EPI pseudocontinuous ASL (PCASL) sequence implemented on 3T scanners from three vendors (General Electric Healthcare, Philips Healthcare and Siemens Healthcare) within the same center and with the same subjects.
Material and Methods: Fourteen healthy volunteers (50% male, age 26.4 ± 4.7 yrs) were scanned twice on each scanner in an interleaved manner within three hours. Because of differences in gradient and coil specifications, two separate studies were performed with slightly different sequence parameters, with one scanner used across both studies for comparison. Reproducibility was evaluated by means of quantitative cerebral blood flow (CBF) agreement and inter-session variation, both on a region-of-interest (ROI) and voxel level. In addition, a qualitative similarity comparison of the CBF maps was performed by three experienced neuro-radiologists.
Results: There were no CBF differences between vendors in study 1 (p > 0.1), but there were CBF differences of 2-19% between vendors in study 2 (p < 0.01 in most ROIs) and a 10-22% difference in CBF values obtained with the same vendor between studies (p < 0.01 in most ROIs). The inter-vendor inter-session variation was not significantly larger than the intra-vendor variation in all (p > 0.1) but one of the ROIs (p < 0.01).
Conclusion: This study demonstrates the possibility to acquire comparable CBF maps on scanners of different vendors. Small differences in sequence parameters can have a larger effect on the reproducibility of ASL than hardware or software differences between vendors. These results suggest that researchers should strive to employ identical labeling and readout strategies in multi-center ASL studies.


Introduction
Arterial spin labeling (ASL) is an emerging magnetic resonance imaging (MRI) perfusion modality that enables non-invasive cerebral perfusion measurements. Since ASL is virtually harmless, is not hampered by the blood-brain barrier and enables absolute quantification of cerebral blood flow (CBF), it is an attractive tool compared to other perfusion imaging modalities [1,2]. Through several methodological advances, ASL perfusion MRI has matured to the point where it can provide high-quality whole-brain perfusion images in only a few minutes of scanning [3]. Its reproducibility has been established and its CBF maps are comparable with imaging methods based on exogenous tracers [4-7]. ASL is commercially available on all major MRI systems and clinical applications are under rapid development. ASL-based CBF measurements are of clinical value in a number of cerebral pathologies, such as brain tumors, cerebrovascular pathology, epilepsy and neurodegeneration [8,9]. Therefore, the initiation of large-scale multi-center ASL studies is a next step to extend our understanding of the pathophysiology of many common disorders.
However, it is essential to first establish the inter-vendor reproducibility of ASL [10,11]. One main obstacle that impedes multi-center studies is that fundamental differences exist between the ASL implementations of different vendors. Each MRI vendor has implemented a different labeling-readout combination, which may seriously hamper the comparison of multi-vendor ASL data [12]. Since each labeling and readout strategy exhibits specific advantages and disadvantages, a substantial technical heterogeneity is introduced [13]. Therefore, it remains unclear to what degree ASL-based CBF maps from centers with scanners of different vendors are comparable. The aim of the current study is to assess and compare the intra- and inter-vendor reproducibility of pseudo-continuous ASL (pCASL) CBF measurements as currently clinically implemented by two major vendors, GE and Philips.

Subject recruitment and study design
Twenty-two healthy volunteers (9 men, 13 women, mean age 22.6 ± 2.1 (SD) years) were included. In addition to standard MRI exclusion criteria, subjects with a history of brain or psychiatric disease or use of medication (except for oral contraceptives) were excluded. No consumption of vasomotor substances such as alcohol, cigarettes, coffee, licorice and tea was allowed on the scan days. On the day prior to the examination, alcohol and nicotine consumption was restricted to three units and three cigarettes, respectively.
All subjects were scanned twice at two academic medical centers in the Netherlands: Erasmus MC - University Medical Center Rotterdam (center 1) and Academic Medical Center Amsterdam (center 2). The inter-session time interval was kept at 1-4 weeks. MRI experiments were performed on a 3T GE scanner at center 1 (Discovery MR750, GE Healthcare, Milwaukee, WI, US) and on a 3T Philips scanner at center 2 (Intera, Philips Healthcare, Best, the Netherlands), both equipped with an 8-channel head coil (InVivo, Gainesville, FL, US). Foam padding inside the head coil was used to restrict head motion during scanning [10]. Subjects were awake and had their eyes closed during all ASL scans.

Ethics statement
All subjects provided written informed consent and the study was approved by the ethical review boards of both centers.

Acquisition
Each scan session included a pCASL scan and a 1 mm isotropic 3D T1-weighted scan for segmentation and registration purposes. For the acquisition of a single time-point CBF map, pCASL has become the preferred labeling strategy because of its relatively high signal-to-noise ratio (SNR) and wide availability across all platforms [3,14]. On both scanners we employed the clinically implemented pCASL protocols that are currently used in clinical studies [15,16]. Table 1 summarizes the protocol details and Figure 1 shows the timing diagrams for both sequences. The main difference between the GE and Philips implementations was the readout module: multi-shot spiral 3D fast spin-echo vs. single-shot 2D gradient-echo echo-planar imaging, respectively.
Post-processing: quantification
Matlab 7.12.0 (MathWorks, MA, USA) and Statistical Parametric Mapping (SPM) 8 (Wellcome Trust Center for Neuroimaging, University College London, UK) were used for post-processing and statistical analyses. For the Philips data, label and control pCASL images were pair-wise subtracted and averaged to obtain perfusion-weighted images. For the GE data, the perfusion-weighted images as directly provided by the scanner were used. Since the images as provided by GE did not incorporate motion correction, motion correction was not applied to the Philips data either. The perfusion-weighted maps of both vendors were quantified into CBF maps using a single compartment model [3,17]:

$$\mathrm{CBF}\ [\mathrm{mL/100\,g/min}] = \frac{6000 \cdot \Delta M \cdot e^{PLD/T_{1a}}}{2\,\alpha\,\alpha_{inv}\,M_{0a}\,T_{1a}\left(1 - e^{-\tau/T_{1a}}\right)}$$

where $\Delta M$ represents the difference images between control and label and $M_{0a}$ the equilibrium magnetization of arterial blood. In Philips, $\Delta M$ was corrected for the transversal magnetization decay time ($T_2^*$) of arterial blood (48 ms) during the 17 ms echo time (TE) by the factor $e^{TE/T_2^*}$ [18]. PLD is the post-labeling delay (1.525 s), $T_{1a}$ is the longitudinal relaxation time of arterial blood (1.650 s), $\alpha$ is the labeling efficiency (0.8), $\alpha_{inv}$ corrects for the decrease in labeling efficiency due to the 5 and 2 background suppression pulses at GE (0.75) and Philips (0.83) respectively, and $\tau$ represents the labeling duration (1.450 s and 1.650 s for GE and Philips respectively) [19-21]. The increase in label decay in the ascending acquired 2D slices in the Philips data was accounted for. GE has, but Philips has not, implemented a standard $M_0$ acquisition where proton density maps are obtained with a saturation recovery acquisition using readout parameters identical to the ASL readout. These maps were converted to $M_{0a}$ by the following equation:

$$M_{0a} = \frac{M_0}{\lambda_{GM}\left(1 - e^{-t_{sat}/T_{1GM}}\right)}$$

where $M_0$ is the acquired proton density map, $t_{sat}$ is the saturation recovery time (2 s), $T_{1GM}$ is the relaxation time of gray matter (GM) tissue (1.2 s) and $\lambda_{GM}$ is the GM brain-blood water partition coefficient (0.9 mL/g) [15,22,23].
For the Philips data, a single $M_{0a}$ value was used for all subjects. This value was obtained in a previous study with the same center, scanner, head coil, pCASL protocol and a similar population (n = 16, 56% M, age 20-24 years) [24]. In short, cerebrospinal fluid T1 recovery curves were fitted on the control images of multiple time-point pCASL measurements, with the same readout, without background suppression. The acquired $M_0$ was converted to $M_{0a}$ by multiplication with the blood water partition coefficient (0.76) and the density of brain tissue (1.05 g/mL) [23,25]. No difference was made between the quantification of GM and WM CBF.
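The quantification steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' Matlab code: it assumes the standard single-compartment pCASL model of the consensus paper [3], with the factor 6000 converting mL/g/s to mL/100 g/min. The function names and the example voxel values are hypothetical; only the parameter values (PLD, T1a, labeling efficiency, background suppression corrections, labeling durations, TE, T2*, saturation recovery time, T1 of GM and partition coefficient) are taken from the text.

```python
import math

T1A = 1.650    # longitudinal relaxation time of arterial blood (s), from the text
ALPHA = 0.8    # labeling efficiency, from the text


def t2star_correction(te=0.017, t2star=0.048):
    """Factor e^(TE/T2*) applied to the Philips difference signal (TE, T2* in s)."""
    return math.exp(te / t2star)


def m0a_from_proton_density(pd, t_sat=2.0, t1_gm=1.2, lambda_gm=0.9):
    """Convert a saturation-recovery proton density value (GE) to M0a."""
    return pd / (lambda_gm * (1.0 - math.exp(-t_sat / t1_gm)))


def quantify_cbf(delta_m, m0a, pld, tau, alpha_inv, t1a=T1A, alpha=ALPHA):
    """Single-compartment pCASL CBF in mL/100 g/min.

    delta_m and m0a share the same arbitrary scanner units; pld, tau, t1a in s.
    """
    numerator = 6000.0 * delta_m * math.exp(pld / t1a)
    denominator = (2.0 * alpha * alpha_inv * m0a * t1a
                   * (1.0 - math.exp(-tau / t1a)))
    return numerator / denominator


# Hypothetical voxel values in arbitrary units (not data from the study).
ge_cbf = quantify_cbf(delta_m=5.0,
                      m0a=m0a_from_proton_density(1000.0),
                      pld=1.525, tau=1.450, alpha_inv=0.75)
philips_cbf = quantify_cbf(delta_m=5.0 * t2star_correction(),
                           m0a=1500.0,  # single M0a value, hypothetical
                           pld=1.525, tau=1.650, alpha_inv=0.83)
print(f"GE: {ge_cbf:.1f}, Philips: {philips_cbf:.1f} mL/100 g/min")
```

Keeping the vendor-specific inputs (labeling duration, background suppression correction, M0a source) as explicit arguments makes it easy to see which quantification parameters differ between the two implementations.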

Post-processing: spatial normalization
A single 3D T1-weighted anatomical scan from each scanner for each subject (n = 44) was segmented into GM and white matter (WM) tissue probability maps. All CBF maps were transformed into anatomical space by a rigid-body registration on the GM tissue probability maps. The tissue probability maps were spatially normalized using the Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra (DARTEL) algorithm, and the resulting normalization fields were applied to the CBF maps as well [26]. Finally, all normalized images were spatially smoothed using an 8 × 8 × 8 mm full-width-half-maximum Gaussian kernel, to minimize registration and interpolation errors.
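For readers who reproduce the smoothing step outside SPM, the full-width-half-maximum (FWHM) of a Gaussian kernel relates to its standard deviation by sigma = FWHM / (2 sqrt(2 ln 2)). The sketch below only illustrates this conversion; the voxel size is a hypothetical value, since the resolution of the normalized images is not stated in the text.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation (same units)."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


sigma_mm = fwhm_to_sigma(8.0)          # roughly 3.4 mm for the 8 mm kernel
VOXEL_SIZE_MM = 1.5                    # hypothetical voxel size, not from the text
sigma_voxels = sigma_mm / VOXEL_SIZE_MM
print(f"sigma = {sigma_mm:.2f} mm = {sigma_voxels:.2f} voxels")
```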

Data analysis
All intra-vendor reproducibility analyses were based on a comparison of session 1 with session 2 within each vendor (n = 22). All inter-vendor reproducibility analyses were based on a comparison of GE session 1 with Philips session 2, and GE session 2 with Philips session 1 (n = 44). In this way, the temporal physiological variation is expected to have an equal contribution to the intra- and inter-vendor reproducibility. All reproducibility analyses were based on the mean CBF of the two sessions, and on the mean and standard deviation of the paired inter-session CBF difference, denoted as ΔCBF and SD ΔCBF respectively. The within-subject coefficient of variation (wsCV), a normalized parameter of variation, was defined as the ratio of SD ΔCBF to the mean CBF of both sessions:

$$\mathrm{wsCV} = 100\% \cdot \frac{SD_{\Delta CBF}}{\mathrm{mean\ CBF}}$$

Reproducibility was assessed on a total GM and WM level, and on a voxel level.
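The wsCV definition above translates directly into code. The following is a minimal sketch that assumes the wsCV is expressed as a percentage and that the mean CBF is taken over all subjects and both sessions; the function name and the example values are hypothetical.

```python
import statistics


def ws_cv(session1, session2):
    """Within-subject coefficient of variation (%) from paired session CBF values."""
    diffs = [a - b for a, b in zip(session1, session2)]   # paired ΔCBF per subject
    sd_diff = statistics.stdev(diffs)                     # SD ΔCBF (sample SD)
    mean_cbf = statistics.mean(list(session1) + list(session2))
    return 100.0 * sd_diff / mean_cbf


# Hypothetical GM CBF values (mL/100 g/min) for three subjects.
print(round(ws_cv([50.0, 60.0, 70.0], [52.0, 57.0, 71.0]), 2))
```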

Data analysis: total supratentorial GM and WM
Mean CBF values of each session were obtained for the total supratentorial GM and WM. GM and WM masks were obtained by thresholding GM and WM probability maps at 70% and 95% tissue probabilities respectively. GM-WM CBF ratios were calculated individually. The significance of paired inter-session CBF differences (ΔCBF) was tested with a paired two-tailed Student's t-test. Levene's test was used to test the significance of the difference between the GE SD ΔCBF and the Philips SD ΔCBF, as well as between the inter-vendor SD ΔCBF and both intra-vendor SD ΔCBF [27]. Limits of agreement, defining the range in which 95% of future measurements are expected to lie, were defined as ΔCBF ± 1.96 SD ΔCBF [28].
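As an illustration of the agreement statistics named above, the sketch below computes the mean paired difference, its standard deviation, the 95% limits of agreement and the paired t statistic using only the Python standard library. The function name and example values are hypothetical. The p-value would follow from a t-distribution with n - 1 degrees of freedom; in practice one could use `scipy.stats.ttest_rel` for the paired t-test and `scipy.stats.levene` for Levene's test instead.

```python
import math
import statistics


def agreement_stats(session1, session2):
    """Return mean ΔCBF, SD ΔCBF, limits of agreement and paired t statistic."""
    diffs = [a - b for a, b in zip(session1, session2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, sd_diff, limits, t_stat


# Hypothetical CBF values (mL/100 g/min) for four subjects.
mean_diff, sd_diff, limits, t_stat = agreement_stats(
    [50.0, 60.0, 70.0, 55.0], [48.0, 61.0, 66.0, 55.0])
print(mean_diff, sd_diff, limits, t_stat)
```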

Data analysis: voxel-level comparison
To assess spatial inter-vendor differences, CBF and wsCV values were computed for each voxel. For CBF, both sessions and all subjects were averaged. To test significant voxel-wise inter-vendor CBF differences, a Bonferroni-corrected paired two-tailed Student's t-test was performed (using both sessions, n = 44). Individual histograms of CBF (25 bins, range 0-160 mL/100 g/min) were averaged to generate a group-level histogram. A wsCV histogram (25 bins, range 5-45%) was generated from the wsCV maps. Both CBF and wsCV histograms were generated for the total supratentorial GM and WM of each vendor. Statistical significance was set to p < 0.05 for all tests.
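The group-level histogram and the Bonferroni correction can be sketched as follows. This is an illustrative outline only: normalizing each individual histogram to voxel fractions before averaging is an assumption (the text does not state it), and the function names, voxel values and number of tests are hypothetical.

```python
def histogram(values, n_bins=25, lo=0.0, hi=160.0):
    """Fraction of values per bin; values outside [lo, hi] are ignored."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            index = min(int((v - lo) / width), n_bins - 1)  # hi falls in last bin
            counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def group_histogram(per_subject_values, **kwargs):
    """Average the individual histograms bin by bin."""
    hists = [histogram(values, **kwargs) for values in per_subject_values]
    return [sum(column) / len(hists) for column in zip(*hists)]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical voxel CBF values for two subjects and a hypothetical voxel count.
group = group_histogram([[10.0, 55.0, 60.0], [45.0, 58.0, 120.0]])
print(group)
print(bonferroni_threshold(0.05, 100000))
```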

Session timing
The number of days between intra-vendor sessions did not differ between vendors: 18.3 ± 6.5 and 19.7 ± 7.2 for GE and Philips respectively (independent-samples Student's t-test, p = 0.5). However, GE session 1 and session 2 took place earlier in the day than the Philips sessions (15h26 ± 4h00 and 15h55 ± 3h34 compared to 20h16 ± 2h06 and 19h47 ± 2h38 respectively, p < 0.01).

Total GM and WM
The intra- and inter-vendor statistics are summarized in Table 2 and visualized by the Bland-Altman plots in Figure 2. GM CBF did not differ significantly between the two vendors (p = 1.0), but WM CBF did (p < 0.01). Likewise, the intra-vendor GM variances of the paired CBF differences did not differ between the two vendors whereas the WM variances did (p = 0.6 and p = 0.02 respectively). The GM-WM CBF ratios of the two vendors differed significantly, the 2D readout (Philips) GM-WM ratio being approximately twice as large as the ratio of the 3D readout (GE) (p < 0.01). Both the GM and WM intra-vendor wsCVs were similar to the inter-vendor wsCVs (Table 2), which is confirmed by Levene's test. The variance of GM inter-vendor CBF differences did not differ significantly from the variance of intra-vendor differences (p = 0.3 and p = 0.5 for GE and Philips respectively). For the WM, however, the variance of inter-vendor CBF differences did differ significantly from the Philips variance but not from the GE variance (p = 0.02 and p = 0.8 respectively).

Voxel-level comparison
Spatial CBF differences between GE and Philips are illustrated for a single subject and on group level in Figures 3 and 4 respectively. The spatial wsCV distribution is shown in Figure 5. In addition, Figure 6 provides an overview of spatial CBF differences between subjects, sessions and vendors for a single transversal slice. The main visual difference on all these maps was the homogeneity of GE compared to the heterogeneity of Philips, especially in the WM and in the z-direction. More specifically, the contrast between GM and WM was higher on the Philips CBF and wsCV maps. Also within the GM, the CBF was more heterogeneous on the Philips maps compared to the GE maps. A CBF decrease and wsCV increase were observed in the posterior and superior regions on the GE maps and in the anterior-inferior and superior regions on the Philips maps. The GM CBF histograms were comparable between vendors (Figure 4d). The GE WM CBF histogram had a higher mean, but had the same shape as the Philips WM CBF histogram. The wsCV histograms, on the other hand, were less comparable (Figure 5d). The spatial GM wsCV distribution of Philips had a higher mean and was wider compared to GE. This difference in mean and spread was even larger for the WM.

Discussion
The most important result of this study is that, despite several voxel-wise differences between vendors, there were no inter-vendor differences in mean CBF or wsCV on a total GM level. This can be explained by the fact that the variation between the sessions can for a large part be attributed to physiological factors, as was previously noted in single-vendor reproducibility studies [11,29-31]. For clinical studies that focus on the GM in total, it may therefore be more important to minimize and account for physiological variation than to account for inter-vendor differences in ASL implementation.
A different picture arises for smaller GM regions or for the total WM. We observed several spatial differences between vendors, which can mainly be explained by differences in the readout module. The most visually striking inter-vendor difference on all CBF and wsCV maps was in the WM. The GM-WM CBF ratio of the 2D readout (Philips) was twice as large as the ratio of the 3D readout (GE), which is in agreement with a previous readout comparison on a single Siemens scanner [13]. This can be explained by the larger extent of spatial smoothing of a spiral 3D readout (GE) compared to the 2D readout (Philips), which leads to more contamination of the GM signal into the WM and vice versa.
Therefore, a 2D readout seems most suitable when the goal is to acquire uncontaminated GM or WM CBF, although the ability of ASL to measure WM CBF is debatable due to the long transit time of WM [32]. This difference in spatial smoothing may also explain the homogeneous GM appearance of the mean CBF and wsCV maps acquired with GE as compared to the more heterogeneous appearance of those acquired with Philips. In addition, it may explain the significant inter-vendor CBF difference within the subcortical GM, since this area is surrounded by WM and therefore suffers more from smoothing with WM signal in GE (Figure 4c). Another explanation for the smaller spatial variation of GE is its higher SNR compared to Philips. The SNR at GE is most probably higher because of the intrinsically high SNR of a 3D readout and because background suppression is more efficient for a single-volume readout as compared to a multi-slice readout [13]. In addition, parallel imaging was not available in the GE sequence, but was turned on in the Philips sequence. Whether the heterogeneous appearance of the Philips CBF maps has a physiological origin or is rather the result of insufficient SNR cannot be determined from these data.
In regions with long arrival times, i.e. the posterior vascular territory and posterior watershed area, lower CBF and higher wsCV were observed in GE but not in Philips (Figures 4 and 5) [5]. This inter-vendor difference can be explained by differences in the effective post-labeling delay (PLD) between the readouts, even though both acquisitions had the same initial PLD (1525 ms). Whereas the 3D readout obtains all ASL signal for the total 3D volume at a single time-point, i.e. after 1525 ms PLD, the 2D readout obtains signal from each slice sequentially. With this multi-slice acquisition, each slice exhibits a longer effective PLD compared to its previous slice. This inferior-superior PLD increase of the 2D readout (Philips) allows the labeled blood more time to reach the superior slices compared to the homogeneous PLD of the 3D readout (GE). Therefore, the PLD may have been too short for the label to reach the superior slices in 3D (GE), whereas the effective PLD for the superior slices in 2D (Philips) was sufficient. These inter-vendor CBF differences and the higher wsCV for GE in superior regions with long transit times would probably be resolved by selecting a longer PLD for the 3D readout, such as 2000 ms [3].
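The slice-dependent effective PLD of an ascending 2D multi-slice readout can be made concrete with a short sketch. The initial PLD (1.525 s) and T1a (1.650 s) are taken from the text; the per-slice readout time and the number of slices are hypothetical placeholders, because the protocol table is not reproduced here. The same effective PLD also determines the per-slice label decay correction in the quantification.

```python
import math


def effective_pld(slice_index, initial_pld=1.525, slice_time=0.035):
    """Effective PLD (s) of a slice in an ascending 2D readout.

    slice_index 0 is the first acquired (most inferior) slice;
    slice_time is a hypothetical per-slice readout duration in seconds.
    """
    return initial_pld + slice_index * slice_time


def label_decay_correction(slice_index, t1a=1.650, **kwargs):
    """Factor e^(PLD_eff/T1a) compensating label decay for a given slice."""
    return math.exp(effective_pld(slice_index, **kwargs) / t1a)


N_SLICES = 17  # hypothetical number of slices
for index in (0, N_SLICES - 1):
    print(index, round(effective_pld(index), 3),
          round(label_decay_correction(index), 3))
```

For a 3D readout the same functions apply with `slice_time=0`, which gives every slice the initial PLD.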
Other prominent spatial inter-vendor CBF (Figure 4) and wsCV (Figure 5) differences were observed on the brain edges. We observed higher CBF and lower wsCV in anterior and inferior regions in Philips but not in GE. The prominent inferior CBF and wsCV differences (Figure 4c and Figure 5c) are partly due to the fact that these slices were simply not acquired by the 2D readout (Philips). With a 2D sequence, it is common practice to scan cerebral slices only as well as to optimize the PLD, T1 decay and background suppression for the cerebral slices. These issues do not apply to a 3D sequence, whose 3D slab usually has whole-brain coverage. The differences in the other areas can be explained by susceptibility artifacts from bone-air transitions at the paranasal sinuses and mastoid air cells present in the gradient-echo T2*-weighted readout implemented by Philips [33]. In addition, it is expected that the echo-planar imaging readout (Philips) exhibits geometric distortion in these regions [33]. The T2-weighted spin-echo readout employed by GE is much less sensitive to these artifacts than the gradient-echo readout employed by Philips. For these reasons, a 3D readout is superior to a 2D readout in regions such as the orbito-frontal lobe and cerebellum. This especially favors the use of a 3D readout for clinical applications of ASL, since pathologies in these regions could remain undetected on a 2D readout [34-36].
A limitation of the current study is that we did not acquire spatial $M_0$ maps with the same readout in Philips. By employing a voxel-wise normalization of the ASL signal, these maps would have counteracted the $T_2^*$ susceptibility effects, since these will be approximately equally large for the $\Delta M$ and $M_0$ maps. Therefore, Philips spatial $M_{0a}$ maps could have improved quantification in regions of air-tissue transitions, which may have diminished the inter-vendor variation to a certain extent. However, the added value of spatial $M_0$ maps is limited since they cannot improve the lower SNR of the gradient-echo readout (Philips) near the air-tissue transitions. Therefore, the inter-vendor reproducibility in these regions is expected to remain low.
The current study may also be limited by the inter-vendor calibration of quantification parameters. These may remain arbitrary, mostly because they have been derived from simulations rather than measurements. One example is the inter-vendor difference in labeling efficiency due to a different number of background suppression pulses (5 and 2 for GE and Philips respectively) [21]. One way to deal with this is to scale to a phase-contrast MRI sequence of the main feeding arteries [20]. However, this would shift the inter-vendor CBF variation from the ASL sequence towards the phase-contrast MRI measurements.
Inter-Vendor Reproducibility of PCASL. PLOS ONE | www.plosone.org
Inter-vendor CBF and wsCV differences were observed on a voxel level but not on the total GM level. Apparently, the effects of the abovementioned readout differences cancel out when sufficient GM voxels are averaged. There are several explanations for this observation. First, the higher SNR of the 3D module may be important on a voxel level, but if sufficient GM voxels are averaged, physiological variation seems to outweigh the SNR differences between the readout modules. Second, the smoothing of the GE 3D readout averages signal from multiple GM voxels, which increases SNR and subsequently decreases the wsCV within a single voxel. This effect is similar to averaging signal from multiple GM voxels of the 2D readout in post-processing. Therefore, this difference in spatial signal averaging between the two readouts becomes apparent on a voxel level but is negligible when all GM voxels are averaged.
It should be acknowledged that this study evaluated healthy controls only. The abovementioned inter-vendor readout differences could become more or less important in patients, considering the different spatial CBF variation in patients compared to healthy controls. Furthermore, these inter-vendor differences should not be generalized to all MRI vendors. Visual readout differences between GE and Siemens, who both use a 3D approach, may be smaller than the readout differences in the current study [13].
In conclusion, the current study shows that pCASL results do not differ between vendors on a total GM level. Therefore, the reliability of averaged CBF values for the total GM can be expected to be equal in single- and multi-vendor studies. However, the reliability of measurements in GM regions or in the WM is impeded by differences between the readout modules of the two vendors. Therefore, our results strongly encourage the standardization of ASL implementations among vendors, which was also advocated by the recent ASL consensus paper [3].