Intersubject Variability and Reproducibility of Diffusion Tensor Imaging within and between Different Imaging Sessions

The aim of these studies was to provide reference data on intersubject variability and reproducibility of diffusion tensor imaging. Healthy volunteers underwent imaging on two occasions using the same 3T Siemens Verio magnetic resonance scanner. At each session two identical diffusion tensor sequences were obtained along with standard structural imaging. Fractional anisotropy, apparent diffusion coefficient, axial and radial diffusivity maps were created and regions of interest applied in normalised space. The baseline data from all 26 volunteers were used to calculate the intersubject variability, while within session and between session reproducibility were calculated from all the available data. The reproducibility of measurements was used to calculate the overall and within session 95% prediction intervals for zero change. The within and between session reproducibility data were lower than the values for intersubject variability, and varied across the brain. The regional mean (range) coefficient of variation figures for within session reproducibility were 2.1 (0.9–5.5%), 1.2 (0.4–3.9%), 1.2 (0.4–3.8%) and 1.8 (0.4–4.3%) for fractional anisotropy, apparent diffusion coefficient, axial and radial diffusivity, respectively, and were lower than between session reproducibility measurements (2.4 (1.1–5.9%), 1.9 (0.7–5.7%), 1.7 (0.7–4.7%) and 2.4 (0.9–5.8%); p<0.001). The calculated overall and within session 95% prediction intervals for zero change were similar. This study provides additional reference data concerning intersubject variability and reproducibility of diffusion tensor imaging conducted within the same imaging session and in different imaging sessions. These data can be used in interventional studies to quantify change within a single imaging session, or to assess the significance of change in longitudinal studies of brain injury and disease.


Introduction
Diffusion tensor imaging (DTI) has been used to identify neuronal injury and predict outcome in a variety of neurological disorders such as traumatic brain injury [1–3], multiple sclerosis [4–6], Alzheimer's dementia and psychiatric disorders [7–9]. Previous human studies using DTI have provided invaluable reference data regarding normal values within different brain structures, and several groups have reported data comparing DTI measurements between subjects, between scanners in different centres, following service upgrades, and reproducibility within the same centre over time [10–28]. However, there are limited data that compare intersubject variability and reproducibility of DTI measurements [23], and few published studies compare the reproducibility of DTI measurements obtained within the same imaging session (within session reproducibility) with that obtained during repeat imaging sessions on the same or different days (between session reproducibility). This is of particular relevance for group comparisons with healthy volunteers, and for longitudinal and interventional studies where DTI can be used as a non-invasive imaging biomarker of disease progression or response to therapy. The rational design and interpretation of such studies is hampered by a lack of knowledge of how the variability of DTI measurements obtained during the same scanning session differs from that of similar data obtained during a different session or day. In studies where consecutive measurements are performed on each subject under resting and experimental conditions, problems associated with variation between subjects due to individual differences (intersubject variability) can be limited. However, baseline DTI measurements may vary within an individual patient (intrasubject variability) and limit the ability to detect significant changes over time or following a therapeutic intervention.
Where DTI is repeated after several days or weeks in different imaging sessions, the measurements may vary within an individual patient, even in the absence of disease progression, because of a combination of intrasubject and scanner variability [29,30]. Without knowledge of such differences it is difficult to accurately determine the clinical significance of pathophysiological changes as they evolve following various causes of brain injury or disease.
The aim of these studies was to provide reference data on intersubject variability and reproducibility of fractional anisotropy, apparent diffusion coefficient, radial and axial diffusivity measurements in a group of healthy volunteers. These data will inform the design of interventional studies, where repeated measurements are conducted within the same session, and longitudinal studies, where assessments are repeated over time in several different imaging sessions.

Ethics statement
Ethical approval was obtained from the Cambridgeshire 2 Research Ethics Committee (reference number 97/290), and written informed consent was obtained from all volunteers in accordance with the Declaration of Helsinki.

Imaging data acquisition
Twenty-six healthy volunteers without any history of neuropsychiatric disorder or substance abuse underwent imaging using a 3T Siemens Verio MRI scanner (Siemens AG, Erlangen, Germany) within the Wolfson Brain Imaging Centre (WBIC), University of Cambridge. All volunteers (ten males and sixteen females) were right handed, had a mean (range) age of 34 (25–44) years, and were employed by Cambridge University Hospitals NHS Trust. Each subject was asked to attend two imaging sessions and to undergo DTI twice during each session. Twenty-two volunteers attended a second imaging session within a mean of 33 days. Structural sequences included 3D T1-weighted magnetization prepared rapid gradient echo (MPRAGE), fluid attenuated inversion recovery (FLAIR), gradient echo and dual spin echo (proton density/T2-weighted) imaging. The DTI data were acquired using 63 non-collinear directions at b = 1000 s/mm², with one volume acquired without diffusion weighting (b = 0), echo time (TE) 106 ms, repetition time (TR) 11700 ms, 63 slices, field of view 192 mm × 192 mm, 2 mm³ isotropic voxels and an acquisition time of 13 min 50 s. The two DTI sequences were interspersed among the structural sequences at different intervals within each imaging session in order to allow realistic comparison with clinical studies. In one subject the second DTI dataset from the baseline imaging session was not completed because of scanner malfunction, while four volunteers did not attend the second imaging session within six months. Imaging data were checked for patient movement, and datasets degraded by motion artefact were excluded [1].

Image processing
Fractional anisotropy (FA), apparent diffusion coefficient (ADC) and axial diffusivity (AD) maps were created using the FSL Diffusion Toolbox from the Oxford Centre for Functional MRI of the Brain (FMRIB) [31,32], while radial diffusivity (RD) values were calculated as the mean of the second and third eigenvalues. To aid coregistration, the skull and extracranial soft tissue were stripped from the T1-weighted image using the FSL Brain Extraction Tool [33]. The diffusion weighted data were normalized using a two-step approach. First, each volunteer's T1-weighted image was coregistered to the Montreal Neurological Institute 152 (MNI152) template using the vtkCISG normalized mutual information algorithm [34]. Second, using the b = 0 image, the diffusion weighted data were coregistered to the subject's own T1-weighted image obtained during the same session, and the transformation matrix normalizing the MPRAGE was then applied to the diffusion weighted data. Regions of interest (ROIs) from the Harvard Oxford subcortical and MNI structural probabilistic atlases available within FSL were applied in normalised space (figure 1) [35,36]. All normalised images were inspected using FSL View by a single experienced clinical investigator (TV) to confirm that data processing had completed successfully and that the ROIs were aligned and corresponded to the regions specified. The ROI template was eroded by a single voxel using fslmaths to improve spatial localisation and reduce the impact of coregistration, normalisation and partial volume errors. The FA, mean ADC, AD and RD values for the different ROIs were calculated using in-house software written in Matlab (MathWorks, Natick, USA).
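As an illustration of the scalar metrics described above, the following Python sketch (assuming NumPy and SciPy) derives FA, ADC (mean diffusivity), AD and RD from per-voxel tensor eigenvalues sorted in descending order, and averages a metric inside an ROI mask eroded by one voxel. It is not the authors' Matlab software or the FSL implementation; the function names are hypothetical, and `scipy.ndimage.binary_erosion` uses a face-connected structuring element by default, which may differ from the kernel used by fslmaths.

```python
import numpy as np
from scipy import ndimage


def dti_scalar_maps(evals):
    """Compute FA, ADC, AD and RD from tensor eigenvalues.

    evals: array of shape (..., 3) holding eigenvalues sorted in descending
    order (lambda1 >= lambda2 >= lambda3), e.g. in mm^2/s.
    """
    evals = np.asarray(evals, dtype=float)
    l1, l2, l3 = evals[..., 0], evals[..., 1], evals[..., 2]
    adc = (l1 + l2 + l3) / 3.0  # mean diffusivity (ADC)
    ad = l1                     # axial diffusivity: principal eigenvalue
    rd = (l2 + l3) / 2.0        # radial diffusivity: mean of 2nd and 3rd eigenvalues
    num = np.sqrt((l1 - adc) ** 2 + (l2 - adc) ** 2 + (l3 - adc) ** 2)
    den = np.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    # FA = sqrt(3/2) * |lambda - mean| / |lambda|; voxels with no signal get FA = 0
    safe_den = np.where(den > 0, den, 1.0)
    fa = np.where(den > 0, np.sqrt(1.5) * num / safe_den, 0.0)
    return {"FA": fa, "ADC": adc, "AD": ad, "RD": rd}


def eroded_roi_mean(metric_map, roi_mask, iterations=1):
    """Mean of a metric map inside an ROI after eroding the mask by `iterations` voxels."""
    eroded = ndimage.binary_erosion(np.asarray(roi_mask, dtype=bool), iterations=iterations)
    if not eroded.any():
        return float("nan")  # the ROI disappeared entirely after erosion
    return float(np.asarray(metric_map)[eroded].mean())
```

Erosion removes the outermost layer of ROI voxels, so voxels at the boundary with neighbouring tissue (where partial volume and misregistration effects are largest) no longer contribute to the ROI mean.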

Analysis Strategy
The baseline data from all 26 volunteers were used to calculate the intersubject variability, while within session and between session reproducibility were calculated from all available data. The acquisition of two sets of imaging data in each of the two imaging sessions yields four independent sets of DTI data per subject, which can be used to assess the reproducibility of measurements. We used the average SD of all DTI measurements obtained in the 26 volunteers across both sessions to calculate the population 95% prediction interval (PI) for zero change, using a threshold of two SDs. These thresholds are prediction intervals for a repeat DTI measurement under the assumption of no true change, rather than confidence intervals for the variability of the measurement. Although these average data are extremely useful, the calculated SD could vary between sessions and between ROIs within subjects. It would therefore be helpful to have a more specific measure of variability within a session (within session reproducibility), preferably for each ROI. While this is possible, the small number of samples (two readings obtained in each of the two sessions) means that the conventional threshold of a change greater than 2 SD cannot be used to assess the statistical significance of changes in this context. While any estimate of variance based on a t distribution with two degrees of freedom must be treated with caution, statistical theory suggests that an estimate of the 95% prediction interval for zero change may be provided by a threshold of 4.3 SDs, the two-sided 95% critical value of the t distribution with two degrees of freedom. These within session measurements could therefore be used to assess the significance of changes in DTI parameters following a therapeutic intervention within the same imaging session.
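The two thresholds described above can be sketched as a small Python helper (assuming SciPy). It returns the half-width of a 95% prediction interval for zero change, either with the conventional 2 SD multiplier or with the two-sided Student's t critical value for a chosen number of degrees of freedom (about 4.3 for two degrees of freedom). The function names are hypothetical, and how the SD is estimated (population or within session, single measurements or differences) is an assumption left to the user.

```python
from scipy import stats


def zero_change_threshold(sd, df=None, level=0.95):
    """Half-width of a two-sided prediction interval for zero change.

    df=None uses the conventional 2 SD multiplier; otherwise the multiplier
    is the two-sided critical value of Student's t with `df` degrees of freedom.
    """
    if df is None:
        multiplier = 2.0
    else:
        multiplier = stats.t.ppf(0.5 + level / 2.0, df)  # e.g. about 4.30 for df=2
    return multiplier * sd


def change_is_significant(first, second, sd, df=None):
    """True if the absolute difference between repeat measurements exceeds the threshold."""
    return abs(second - first) > zero_change_threshold(sd, df)
```

For example, with a hypothetical within session SD of 0.01 FA units, a change from 0.50 to 0.55 exceeds the 4.3 SD threshold (about 0.043), whereas a change to 0.54 does not.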

Statistical analysis
Statistical analyses were conducted using Statview (Version 5, 1998, SAS Institute Inc., Cary, North Carolina, USA) and SPSS Statistics Version 21 (IBM Corporation, New York, United States). All data are expressed and displayed as mean and standard deviation (SD), unless otherwise stated. To compare the reproducibility of DTI measurements, the SD and coefficient of variation (CoV = SD/mean) of measurements were calculated within each ROI. Data were compared using paired t-tests, factorial analysis of variance (ANOVA) and intraclass correlation (ICC) as appropriate. The residual standard deviation from the ANOVA was used to calculate the 95% prediction interval for zero change of repeat DTI studies. All p values are quoted after Bonferroni correction for multiple comparisons (where appropriate).
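A minimal sketch of the two reproducibility summaries, assuming NumPy and a balanced design with no missing sessions (unlike the study, in which four volunteers lacked a second session). `coefficient_of_variation` computes CoV = SD/mean for repeated measurements, and `residual_sd` computes the residual SD of a subject × region factorial ANOVA with replicate measurements; because the interaction term fits every cell mean, the residual reduces to the pooled spread of repeats around their cell means. The function names and array layout are illustrative assumptions, not the Statview or SPSS procedures used in the study.

```python
import numpy as np


def coefficient_of_variation(values):
    """CoV (%) = SD / mean of repeated measurements, using the sample SD (ddof=1)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()


def residual_sd(data):
    """Residual SD of a subject x region factorial ANOVA with replicates.

    data: array of shape (n_subjects, n_regions, n_repeats).
    """
    data = np.asarray(data, dtype=float)
    n_subj, n_reg, n_rep = data.shape
    cell_means = data.mean(axis=2, keepdims=True)     # one fitted value per subject-region cell
    ss_resid = ((data - cell_means) ** 2).sum()       # variation not explained by the model
    df_resid = n_subj * n_reg * (n_rep - 1)           # degrees of freedom left after fitting cells
    return float(np.sqrt(ss_resid / df_resid))
```

Under the 2 SD convention described in the Analysis Strategy, twice the residual SD would give an approximate population 95% prediction interval for zero change.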

Intersubject variability of diffusion tensor imaging metrics
The intersubject variability of DTI measurements is displayed in Tables 1 and 2 for the predominantly white matter and mixed cortical and deep grey matter regions. The intersubject variability was high for all the calculated parameters with a mean (range) CoV across the ROIs for FA of 7.9 (3.3-31.7%) and 6.8

Within session and between session reproducibility of diffusion tensor imaging
The ROI data for within and between session reproducibility were variable across the different brain regions, but lower than the values for intersubject variability (Tables 3-6). The within session reproducibility measurements were significantly lower than between session reproducibility measurements for all the DTI parameters (p<0.001, paired t-test with Bonferroni correction). As an example, the difference between intersubject variability, within and between session reproducibility is displayed for FA in figure 2.
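The paired comparison reported above can be sketched as follows, assuming SciPy and per-ROI CoV values for the same set of ROIs in both conditions. `scipy.stats.ttest_rel` performs the paired t-test, and the Bonferroni correction multiplies the raw p value by the number of tests, capped at 1. The helper name and the choice of `n_tests` (for example, one test per DTI parameter) are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def compare_reproducibility(within_cov, between_cov, n_tests=1):
    """Paired t-test of per-ROI CoV values with a Bonferroni correction.

    Returns the t statistic and the Bonferroni-adjusted two-sided p value.
    """
    within_cov = np.asarray(within_cov, dtype=float)
    between_cov = np.asarray(between_cov, dtype=float)
    t_stat, p_raw = stats.ttest_rel(within_cov, between_cov)
    p_adj = min(1.0, float(p_raw) * n_tests)  # Bonferroni: scale by the number of tests
    return float(t_stat), p_adj
```

A negative t statistic indicates that the within session CoV values are, on average, lower than the between session values for the same ROIs.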

Calculation of 95% prediction interval for zero change
Using the four DTI measurements obtained from both sessions we used ANOVA to determine the significance of the differences (Table 7). These confirm that there is a significant difference between regions and subjects, and that there is a significant interaction between brain region and subject. The residual variance of the DTI measurements which could not be accounted for by the known independent variables is shown in Table 7, and the calculated SD for FA, ADC, AD and RD were 1.

Discussion
This study provides additional reference data concerning intersubject variability and reproducibility of DTI conducted within the same imaging session (within session) and in different imaging sessions (between session) in a group of healthy volunteers. As reported previously [23], we found that intersubject variability was high, with substantial variability across the brain for all the calculated parameters. While the DTI measurements were stable, with CoV values of <5%, repeated DTI sequences obtained during the same session (within session) had lower CoV values than measurements obtained in a different imaging session separated by up to six months. The calculated 95% prediction intervals for zero change of repeat DTI measurements were similar for the data obtained within the same session and for all the measurements obtained over both imaging sessions. These prediction intervals can be calculated for individual ROIs and used in interventional studies to quantify change within a single imaging session, or to assess the significance of change in longitudinal studies of brain injury and disease.
The factors affecting the reproducibility of DTI parameters include changes within the MR scanner or within individual subjects. Features related to the scanner include B0 field inhomogeneities, scanner drift, gradient coil stability, signal to noise ratio and software upgrades. Such factors may be more significant when imaging is acquired in different imaging sessions than for repeat acquisitions within the same session, where such parameters are more likely to be similar. Regular servicing and daily quality assurance measurements seek to ensure that an MR scanner is operating normally. It is necessary to monitor such changes and, where possible, to take steps to limit their impact. There were no upgrades or changes in MR scanner hardware or software during the period of this study. In addition, daily signal to noise ratio measurements were not significantly different over the six month period of this study (p = 0.08, Friedman test; data not shown). While scanner variability is important, individual subject factors can also induce substantial variability in DTI parameters. These include head movement and positioning within the scanner field of view. We followed standard procedures to limit such variability. All subjects were positioned within the head coil according to standard operating procedure, and their alignment was confirmed prior to commencing imaging. Following standard imaging for localisation, we monitored subject movement, and all data were checked during acquisition and processing for evidence of motion artefact. While no subject was excluded during acquisition or processing in these analyses, DTI had to be repeated in one subject during an imaging session because of subject movement. In addition, we performed all analyses following image coregistration and spatial normalisation to MNI standard space.
Table 5. Individual white matter region of interest measurements for within session reproducibility obtained in the first and second imaging sessions in 26 and 22 subjects, respectively, and the between session reproducibility for those 22 subjects who underwent imaging at both sessions. Data displayed are percentage coefficient of variation of measurements for fractional anisotropy (FA), apparent diffusion coefficient (ADC), axial (AD) and radial (RD) diffusivity. doi:10.1371/journal.pone.0065941.t005
We used a standard ROI template covering the whole brain from the Harvard Oxford subcortical and MNI structural probabilistic atlases available within FSL. While this analysis strategy sought to reduce variability within our comparisons, we also eroded the ROI template by a single voxel within FSL in order to improve spatial localisation and reduce the impact of coregistration, normalisation and partial volume errors. Finally, all ROIs were manually inspected to ensure that they were aligned with the imaging data and corresponded to the regions specified. In summary, we considered possible sources of DTI variability within our centre and attempted to limit their impact and ensure that the data we acquired were comparable within and between the different imaging sessions. While our results for DTI reproducibility are in line with published data [12,19,37], we report data specifically concerning the difference between intersubject variability, within session and between session reproducibility. It is useful to consider the sources of variability in DTI data in the setting where we are trying to address the significance of changes between normal physiology and disease states, or changes that are the consequence of a therapeutic intervention. In the first case, the relevant source of error is the intersubject variability in the patient and volunteer groups.
Our data for healthy volunteers are broadly concordant with results from other groups [23], and show that this variability is high, with mean (range) CoV of 7% (3–32%) for FA, 7% (2–34%) for ADC, 5% (2–27%) for axial diffusivity and 11% (3–63%) for radial diffusivity. To be certain, with 95% confidence, that DTI values derived from an individual patient are significantly lower than normal, these figures suggest that mean ROI FA values (for example) would need to be at least 14% (approximately two CoVs) lower than volunteer means. This estimate, and the secure distinction of a patient group as abnormal, is further confounded by the fact that intersubject CoV in patients with neurological disorders is larger [38,39] and varies across brain regions. These figures underline the difficulty of using DTI in small groups of patients with various causes of neurological disease who have variable pathophysiology. In practice, however, estimated sample sizes in such studies are moderated by the fact that the changes in DTI are often dramatic, and significance is often detected with manageable numbers, despite the large intersubject variability in volunteer and patient groups [38,39].
However, these figures are largely irrelevant to the power and design of clinical studies in which DTI is used to monitor changes within the same subject in the same scanning session (within session reproducibility) or during longitudinal assessments over time in several different imaging sessions (between session reproducibility). In such settings, the subject is his or her own control, and the relevant parameter is intrasubject variability or reproducibility. Our data show that CoV values for reproducibility are much smaller than the intersubject values discussed above. In addition, we provide reference data for FA, ADC, AD and RD in healthy volunteers demonstrating that the CoV for within session reproducibility is lower than that for between session reproducibility (Tables 3-6). These data provide helpful guidance for designing clinical studies, and suggest that it should be possible to detect differences of approximately 5 to 10% with confidence, particularly within single session interventional studies. For example, although the reproducibility of measurements varies between brain regions, we can use these data to calculate sample sizes for interventional and longitudinal clinical studies. Even for the brain region with the highest CoV (ventral midbrain), we should be able to detect a 10% change in DTI with 95% power at a significance level of 1% in a group of 10 subjects within a single interventional or longitudinal study design [40]. Clearly, such estimates only strictly apply to our scanner and institution, but they provide a useful starting point for study design. A number of factors particular to our scanning protocols and institutional setup limit the use of the reproducibility measurements that we provide. These include, but may not be limited to, the scanner, acquisition protocols, data correction and reconstruction, and processing.
Despite these variations, it should be possible for other groups to use the methodology that we describe to derive in-house data for their studies. In addition, although these data provide guidance for designing clinical studies, particular groups of subjects (including those with brain injury) may require sedation and control of ventilation as part of clinical care. While such patient groups may appear complex and difficult to manage within the context of an imaging study, the fact that they remain completely immobile and have stable physiology should result in lower CoV for reproducibility measurements and an increase in the sensitivity of interventional studies [41].
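The 14% figure follows from a simple normal-theory approximation, sketched below under the assumptions that volunteer values are roughly normally distributed, that 2 SD approximates the two-sided 95% threshold, and that uncertainty in the volunteer mean itself is ignored. Dividing the 2 SD threshold by the volunteer mean expresses it as a multiple of the intersubject CoV, here using the FA value of 7%:

```latex
\frac{\bar{x}_{\mathrm{vol}} - x_{\mathrm{pat}}}{\bar{x}_{\mathrm{vol}}}
  > \frac{2\,\mathrm{SD}_{\mathrm{vol}}}{\bar{x}_{\mathrm{vol}}}
  = 2\,\mathrm{CoV}_{\mathrm{vol}}
  \approx 2 \times 7\% = 14\%
```

Regions with higher intersubject CoV, such as the upper end of the reported ranges, would require proportionally larger differences by the same argument.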
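The sample-size reasoning in the discussion above can be explored with a power calculation for a paired (within-subject) design. The Python sketch below (assuming NumPy and SciPy) computes the power of a two-sided paired t-test from the noncentral t distribution. It is not the method of reference [40] and does not reproduce the authors' estimate; it assumes that the two paired measurements are independent with equal SD, so that the SD of the within-subject difference is √2 × CoV, and any CoV value passed in is a hypothetical input rather than a value from Tables 3-6.

```python
import numpy as np
from scipy import stats


def paired_power(rel_change, cov, n, alpha=0.01):
    """Power of a two-sided paired t-test to detect a relative change.

    rel_change: expected fractional change (0.10 for 10%).
    cov: reproducibility CoV of a single measurement, as a fraction.
    Assumes SD of the within-subject difference = sqrt(2) * cov.
    """
    sd_diff = np.sqrt(2.0) * cov
    df = n - 1
    ncp = rel_change / sd_diff * np.sqrt(n)          # noncentrality parameter
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)      # two-sided critical value
    # probability that the test statistic falls beyond either critical value
    return float(stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp))


def smallest_n(rel_change, cov, power=0.95, alpha=0.01, n_max=1000):
    """Smallest number of subjects reaching the target power, or None."""
    for n in range(2, n_max + 1):
        if paired_power(rel_change, cov, n, alpha) >= power:
            return n
    return None
```

For example, `smallest_n(0.10, 0.05)` gives the number of subjects needed to detect a 10% change with 95% power at a 1% significance level for a hypothetical region with 5% CoV; a larger CoV always requires at least as many subjects.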

Methodological limitations
While we were able to obtain multiple DTI datasets on up to two occasions in this group of volunteers, scanner availability and subject tolerance prevented us from acquiring further DTI datasets within the same session or in additional scanning sessions. We found that the within session reproducibility measurements were lower than between session reproducibility measurements obtained over a six month period. The expected change in DTI in healthy volunteers of a similar age over a period of up to six months is small and unlikely to have caused the differences we found [42,43]. The 95% prediction intervals for zero change for the within session DTI measurements were similar to that calculated from the DTI measurements obtained across all sessions. The lack of difference between these measures could be related to the fact that we were only able to obtain two sets of DTI within each session, so that the 95% prediction interval for zero change for within session measurements is based on 4.3 rather than 2 SDs. These overall prediction intervals for zero change are calculated from all the ROI data, but can easily be calculated for individual ROIs using the same technique and used to determine the significance of changes following an intervention or longitudinal change over time.
There were differences in the intersubject variability and reproducibility of DTI across the different brain regions. These differences are demonstrated in Tables 1-6 and figure 2, and are particularly relevant within the corpus callosum, caudate, cingulum and midbrain structures. The increased variability and lower reproducibility in these regions may be related to partial volume errors within these relatively small structures, secondary to variation in the quality of coregistration and spatial normalisation within individual subjects. We tried to limit these errors by eroding the ROI template by a single voxel to improve accuracy. Despite this, errors remain within some ROIs where DTI values differ in closely adjacent brain regions. However, the purpose of this study was to determine the variability of measurements using an ROI template and standard processing pipeline. While variability in the fitting of template ROIs in individual subjects may result in higher intersubject variability for particular brain regions, this is less likely for measurements of reproducibility within the same subject. Here, any differences in ROI template fitting between the sessions are likely to be small. However, these regional differences underline that DTI studies seeking to compare different subject groups or assess interventional or longitudinal change should compare data from within the same brain region using the same data processing technique. While the data we report are specific to our methods, the reproducibility measurements that we report provide a useful starting point for study design.

Conclusions
This study provides additional reference data concerning intersubject variability and reproducibility of DTI conducted in a group of healthy volunteers. The CoV for repeat DTI measurements obtained during the same session was lower than that for measurements obtained in a different imaging session separated by up to six months. These data can be used to calculate the 95% prediction interval for zero change and may inform the design of interventional studies to quantify change within a single imaging session, or to assess the significance of change in longitudinal studies.