University of Birmingham Comparison of inter subject variability and reproducibility of whole brain proton spectroscopy

The aim of these studies was to provide reference data on intersubject variability and reproducibility of metabolite ratios for Choline/Creatine (Cho/Cr), N-acetyl aspartate/Choline (NAA/Cho) and N-acetyl aspartate/Creatine (NAA/Cr), and individual signal-intensity normalised metabolite concentrations of NAA, Cho and Cr. Healthy volunteers underwent imaging on two occasions using the same 3T Siemens Verio magnetic resonance scanner. At each session two identical Metabolic Imaging and Data Acquisition Software (MIDAS) sequences were obtained along with standard structural imaging. Metabolite maps were created and regions of interest applied in normalised space. The baseline data from all 32 volunteers were used to calculate the intersubject variability, while within session and between session reproducibility were calculated from all the available data. The reproducibility of measurements was used to calculate the overall and within session 95% prediction interval for zero change. The within and between session reproducibility data were lower than the values for intersubject variability, and were variable across the different brain regions. The within and between session reproducibility measurements were similar for Cho/Cr, NAA/Cho, Cho and Cr (11.8%, 11.4%, 14.3% and 10.6% vs. 11.9%, 11.4%, 13.5% and 10.5%), but for NAA/Cr and NAA between session reproducibility was lower than within session reproducibility (9.3% and 9.1% vs. 10.1% and 9.9%, p < 0.05). This study provides additional data that can be utilised in interventional studies to quantify change within a single imaging session, or to assess the significance of change in longitudinal studies of brain injury and disease.


Introduction
Proton magnetic resonance spectroscopic imaging (¹HMRS) can be used in the diagnosis, assessment of progression and prediction of outcome in a variety of neurological disorders such as brain tumours [1], traumatic brain injury [2][3][4], multiple sclerosis [5,6], motor neuron disease [7], Alzheimer's dementia [8] and psychiatric disorders [9][10][11]. The metabolites reliably measured with ¹HMRS at medium to long echo times include N-acetyl aspartate (NAA), Creatine (Cr) and Choline (Cho) containing compounds. These provide a measure of neuronal integrity, metabolism and a marker of neuronal breakdown and turnover respectively [12][13][14][15]. While targeted imaging of regions of interest (with single voxel or two dimensional ¹HMRS) allows evaluation of local neuronal loss and glial proliferation, whole brain imaging provides assessment of the global burden of neurological disease even in regions that appear structurally normal. ¹HMRS has been used to non-invasively evaluate normal appearing brain in a variety of neurological disorders including multiple sclerosis and head injury [3,6,16]. Metabolic Imaging and Data Acquisition Software (MIDAS) [17][18][19] provides a fully automated pipeline for processing and interpreting whole brain proton spectroscopy (WB ¹HMRS) data. Previous studies using MIDAS and other ¹HMRS techniques have provided invaluable reference data regarding normal values within different brain regions and reproducibility of such data [20][21][22][23][24]. However, there are limited data comparing intersubject variability and reproducibility of WB ¹HMRS measurements within the same imaging session (within session reproducibility) and those obtained during repeat imaging sessions on different days (between session reproducibility).
This is of relevance for group comparisons with healthy controls, and longitudinal and interventional studies where WB ¹HMRS is used as a biomarker of disease progression or response to therapy. The rational design and interpretation of such studies is hampered by lack of knowledge regarding how the variability of WB ¹HMRS measurements in data obtained during the same scanning session differs when compared with similar data obtained during a different session or day. In studies where consecutive measurements are performed on each subject under resting and experimental conditions, problems associated with variation between subjects due to individual differences (intersubject variability) can be limited. However, baseline MIDAS WB ¹HMRS measurements may vary within an individual patient (intrasubject variability) and limit the ability to detect significant changes over time or following a therapeutic intervention. Where imaging is repeated after several days or weeks in different sessions, the measurements may vary within an individual patient even in the absence of disease progression due to a combination of intrasubject and scanner variability [25]. Without knowledge of such differences it is difficult to accurately determine the clinical significance of pathophysiological changes, as they evolve following various causes of brain injury or disease.
The aim of these studies was to provide reference data on intersubject variability and reproducibility of commonly used metabolite ratios (Cho/Cr, NAA/Cho and NAA/Cr) and individual signal-intensity normalised metabolite concentrations (NAA, Cho and Cr) in a group of healthy volunteers using MIDAS. These data will inform the design of interventional studies, where repeated measurements are conducted within the same session, and longitudinal studies where assessments are repeated over time in several different imaging sessions.

Ethics statement
Ethical approval was obtained from the Cambridgeshire 2 Research Ethics Committee (reference number 97/290), and written informed consent was obtained from all volunteers in accordance with the Declaration of Helsinki.

Imaging data acquisition
Thirty-two healthy volunteers without any history of neuropsychiatric disorder or substance abuse underwent imaging using a 3T Siemens Verio MRI scanner (Siemens AG, Erlangen, Germany) with 12 channel detection within the Wolfson Brain Imaging Centre (WBIC), University of Cambridge. All volunteers were right handed (fourteen males and eighteen females) with mean (range) age of 34 (25-50) years, and were employed by Cambridge University Hospitals NHS Trust. Twenty-two volunteers attended a second imaging session within a mean (range) of 33 (3-181) days. At each imaging session subjects were imaged twice with MIDAS along with standard structural imaging. Structural sequences included 3D T1-weighted magnetisation prepared rapid gradient echo (MPRAGE), fluid attenuated inversion recovery (FLAIR), gradient echo and dual spin echo (proton density/T2-weighted). Whole brain spectroscopy data were acquired using a volumetric spin echo (TR/TE 1710/70 milliseconds, flip angle of 73°, 50 phase encoding steps and a field of view of 280 × 280 × 180 mm³) covering the whole brain with an acquisition time of 26 minutes as described by Maudsley et al [17,18,26]. This sequence also included lipid inversion nulling and an unsuppressed water spectroscopy dataset acquired with a 20° flip angle in an interleaved fashion. The MIDAS and MPRAGE were acquired at an angulation of +15 to +20° to the AC-PC line to improve brain coverage and limit field inhomogeneities from the frontal and sphenoid air sinuses. The MPRAGE (TR/TE 2150/4.4 and flip angle 8°) was acquired within each imaging session with one millimetre isotropic resolution.

Spectroscopic data processing
Parametric maps were created using the automated pipeline of MIDAS and the data for NAA, Cho and Cr were individual signal-intensity normalised to institutional units (iu) based on the tissue water signal derived from the water reference dataset. Metabolite data were reconstructed using MIDAS and resulted in images composed of 64 × 64 × 32 voxels with an individual voxel volume of approximately 1 ml. Voxel data with line width greater than 12 Hz were excluded from further analysis as previously described by Maudsley et al [24]. The WB ¹HMRS parametric maps were spatially normalised using a two-step approach using FSL [20,27]. First, control T1 weighted images were coregistered to water spectroscopic images using FMRIB's Linear Image Registration Tool (FLIRT) [28][29][30]. This was followed by coregistration of control T1 weighted images to the MNI152 template using FMRIB's Non-linear Image Registration Tool (FNIRT) [29][30][31]. Combined transformation matrices were then applied to all parametric images used in the analyses. Representative white matter, deep grey and mixed regions of interest (ROIs) from the Harvard Oxford subcortical and MNI structural probabilistic atlases available within FSL were then applied in normalised space (Fig. 1). All coregistered images were subsequently inspected to ensure that the ROIs were correctly aligned and corresponded to the regions specified. The ROI template was modified by erosion of a single voxel using FSL to improve spatial localisation and reduce the impact of coregistration, normalisation and partial volume errors. The mean values for metabolites for each ROI were calculated using in-house software written in Matlab (Mathworks, Natick, USA).
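The ROI averaging step described above can be illustrated with a minimal sketch. It is not the in-house Matlab software used in the study: the function name `roi_mean`, the flat per-voxel lists and the boolean ROI mask are assumptions made for illustration, and only the 12 Hz line width limit is taken from the text.

```python
from statistics import fmean

LINEWIDTH_LIMIT_HZ = 12.0  # voxels with a line width above this are excluded (from the text)


def roi_mean(values, linewidths, roi_mask, limit_hz=LINEWIDTH_LIMIT_HZ):
    """Mean metabolite value over the ROI voxels that pass the line width filter.

    values, linewidths and roi_mask are flat, equal-length sequences with one
    entry per voxel; roi_mask is True for voxels inside the (eroded) ROI.
    Returns None when no voxel in the ROI survives the filter.
    """
    if not (len(values) == len(linewidths) == len(roi_mask)):
        raise ValueError("inputs must have one entry per voxel")
    kept = [
        value
        for value, linewidth, inside in zip(values, linewidths, roi_mask)
        if inside and linewidth <= limit_hz
    ]
    return fmean(kept) if kept else None


# Hypothetical four-voxel example: the second voxel fails the line width
# filter and the fourth lies outside the ROI.
example = roi_mean([10.0, 20.0, 30.0, 40.0], [8.0, 13.0, 9.0, 7.0],
                   [True, True, True, False])
```

Returning None for an empty ROI, rather than zero, keeps a region with no usable voxels from being mistaken for a region with a low metabolite value.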

Analysis Strategy
Each of the 32 volunteers was invited to attend two separate imaging sessions where two MIDAS sequences were obtained. This resulted in a maximum of four independent sets of WB ¹HMRS data (runs) for each subject, which could be used to assess the reproducibility of measurements. Twenty-two subjects underwent imaging in both sessions. The baseline data from all 32 volunteers were used to calculate intersubject variability. For the repeat MIDAS measurements obtained in the same subject the data were split into that obtained during the same imaging session and that obtained in two different imaging sessions to calculate within session and between session reproducibility respectively. Therefore, the available paired data from each session (run 1 & 2 and 3 & 4 respectively) were used to calculate within session reproducibility, and the available combinations of the four datasets from the different sessions were used to calculate between session reproducibility (runs 1 & 3, 1 & 4, 2 & 3, and 2 & 4). The inclusion of all potential combinations ensures that any variation in the order of the individual sequences obtained within each particular session is accounted for within the calculated average measurement of between session reproducibility and reflects clinical practice.
In order to help design any future interventional study using proton spectroscopy we need to know how much deviation in a repeat measurement we would accept as no or zero change. We used the SD of measurements obtained in this healthy volunteer study to calculate a 'confidence interval' for zero change of a repeat measurement in the same subject. We used the average SD for all measurements obtained in 32 volunteers in both sessions to calculate the population 95% prediction interval (PI) for zero change (using two SD values) [25,32,33]. These calculated thresholds are prediction intervals for assuming no changes from zero with the repeat WB ¹HMRS measurement rather than confidence intervals for variability of the measurement. This estimate for the variation in repeat measurements means that we would accept a positive or negative change in a patient as being indicative of zero change as long as it was less than 2 times the standard deviation of the repeat measurement obtained in our healthy volunteer group. Although these average data are extremely useful, the calculated SD could vary within different sessions and particular ROIs within subjects. It would therefore be helpful to have a more specific measure of variability within a session (within session reproducibility), and preferably for each ROI. While this is possible, the small sample numbers (two readings obtained in each of the two sessions) mean that a conventional threshold of change greater than 2 SD cannot be used to assess the statistical significance of changes in this context. For a t distribution with two degrees of freedom, statistical theory suggests that an estimate of the 95% prediction interval for zero change may be provided by a threshold of 4.3 SDs. These within session measurements could therefore be used to assess the significance of the changes in WB ¹HMRS parameters following a therapeutic intervention within the same imaging session.
We have previously published this analysis strategy for diffusion tensor imaging and ¹⁵O positron emission tomography [25,34].
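The split of runs into within session and between session pairs can be sketched as follows. This is an illustrative sketch only; the run numbering (runs 1 and 2 in the first session, runs 3 and 4 in the second) follows the text, while the function name `split_run_pairs` and the mapping `SESSION_OF_RUN` are hypothetical.

```python
from itertools import combinations

# Runs 1 and 2 belong to the first session, runs 3 and 4 to the second (from the text).
SESSION_OF_RUN = {1: 1, 2: 1, 3: 2, 4: 2}


def split_run_pairs(available_runs):
    """Split all pairs of available runs into within and between session pairs."""
    within, between = [], []
    for first, second in combinations(sorted(available_runs), 2):
        if SESSION_OF_RUN[first] == SESSION_OF_RUN[second]:
            within.append((first, second))
        else:
            between.append((first, second))
    return within, between


# A subject with all four runs contributes two within session pairs and
# four between session pairs.
within_pairs, between_pairs = split_run_pairs({1, 2, 3, 4})
```

A subject who attended only the first session contributes a single within session pair and no between session pairs, which is how the ten volunteers without a second session would enter such a calculation.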
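The 4.3 SD threshold quoted above matches the 97.5th percentile of a t distribution with two degrees of freedom, which has a closed form. The sketch below checks that figure and builds a symmetric interval for zero change. The function names are hypothetical, and the sketch is not the analysis code used in the study.

```python
from math import sqrt


def t_quantile_2df(p):
    """Quantile of Student's t distribution with 2 degrees of freedom.

    Uses the closed form obtained by inverting the CDF
    F(t) = 1/2 + t / (2 * sqrt(t**2 + 2)).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    u = 2.0 * p - 1.0
    return u * sqrt(2.0 / (1.0 - u * u))


def zero_change_interval(sd, multiplier):
    """Symmetric interval around zero: changes inside it are treated as no change."""
    return (-multiplier * sd, multiplier * sd)


# Two-sided 95% interval, so the upper tail probability is 2.5%.
multiplier_2df = t_quantile_2df(0.975)  # approximately 4.30
```

With a hypothetical repeat-measurement SD of 10 iu, the population interval (2 SD) would be ±20 iu, while the within session interval for a single ROI (about 4.3 SD) would be roughly ±43 iu.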

Statistical analysis
Statistical analyses were conducted using Statview (Version 5, 1998, SAS Institute Inc., Cary, North Carolina, USA) and SPSS Statistics Version 21 (IBM Corporation, New York, United States). All data are expressed and displayed as mean and standard deviation (SD), unless otherwise stated. To compare the reproducibility of WB ¹HMRS measurements the SD and coefficient of variation (CoV) (CoV = SD/mean) of measurements were calculated within each ROI. Data were compared using paired t-tests, factorial analysis of variance (ANOVA) and intraclass correlation (ICC) as appropriate. Using ANOVA the residual standard deviation was used to calculate the 95% prediction interval for zero change of repeat WB ¹HMRS studies. All p values are quoted after Bonferroni corrections for multiple comparisons (where appropriate).
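The CoV defined above (SD/mean) can be computed for a set of repeat measurements in one ROI as in this minimal sketch. The text does not say whether the sample or the population SD was used; the sample SD is assumed here, and the function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(measurements):
    """CoV = SD / mean, returned as a percentage.

    Assumes the sample standard deviation (n - 1 denominator); the source
    text does not state which form was used.
    """
    centre = mean(measurements)
    if centre == 0:
        raise ValueError("CoV is undefined when the mean is zero")
    return 100.0 * stdev(measurements) / centre


# Hypothetical pair of within session measurements for one ROI, in iu.
cov_percent = coefficient_of_variation([9.0, 11.0])
```

For the hypothetical pair above the mean is 10 and the sample SD is the square root of 2, giving a CoV of about 14%.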

Within session and between session reproducibility of WB ¹HMRS
The individual ROI data for within and between session reproducibility were variable across the different brain regions, but lower than the values for intersubject variability (Tables 3, 4, 5, 6). The within and between session reproducibility measurements were similar for Cho/Cr, NAA/Cho, Cho and Cr (11.8%, 11.4%, 14.3% and 10.6% vs. 11.9%, 11.4%, 13.5% and 10.5%, and p = 0.44, 0.87, 0.08 and 0.86 respectively, paired 't' tests), but for NAA/Cr and NAA between session reproducibility was lower than within session reproducibility (9.3% and 9.1% vs. 10.1% and 9.9%, p < 0.05 paired 't' test with Bonferroni correction). The difference between intersubject variability, within and between session reproducibility is displayed for a selection of ROIs for the metabolite ratios and concentrations in Figs. 3 and 4 respectively.
The intraclass correlation coefficient (ICC) for within and between session reproducibility within brain regions of mixed cortical and deep grey, and white matter are displayed in Table 7.

Calculation of 95% prediction interval for zero change
Using the four WB ¹HMRS measurements obtained from both sessions we used ANOVA to determine the significance of the differences (Tables 8 and 9). These confirm that there is a significant difference between regions and subjects, and that there is a significant interaction between brain region and subject. The residual variance of the measurements that cannot be accounted for by the known independent variables is shown in Tables (Table 3 & 5) an estimate of the 95% prediction intervals for zero change within individual ROIs should be based on 4.3 SD values. As an example, this results in a 95% prediction interval for zero change for NAA, Cho and Cr within a single imaging session of 3839.5, 844.5 and 2345.2 iu for the left temporal, and 3557.0, 1815.9 and 3081.0 iu for the right anterior thalamic radiation respectively. These prediction intervals can be used to assess the impact of therapeutic interventions within a single session, but also to assess the impact of treatment and disease progression over time within different imaging sessions.

Discussion
This study provides additional reference data concerning intersubject variability and reproducibility of metabolite ratios and individual signal-intensity normalised metabolite concentrations obtained using WB ¹HMRS conducted within the same imaging session (within session) and different imaging sessions (between session) in a group of healthy volunteers. As reported previously, we found that intersubject variability was high [21]. The reproducibility of metabolite ratios and concentrations was lower than intersubject variability (10-15% vs. 15-30%) but there was substantial variability across the brain for all the calculated parameters. The within and between session reproducibility measurements were similar for Cho/Cr, NAA/Cho, Cho and Cr but for NAA/Cr and NAA between session reproducibility was lower than within session reproducibility. The calculated overall population 95% prediction intervals for zero change of repeat  calculated for individual ROIs and utilised in interventional studies where response to therapy can be assessed, or to assess the significance of change from disease progression within longitudinal studies of nervous system disorders. The factors affecting the reproducibility of WB ¹HMRS parameters include changes within the MR scanner or individual subjects. Features related to the scanner include B₀ field inhomogeneities (heating during the long acquisition process), scanner drift, gradient coil stability, signal to noise ratio and software upgrades. Such factors may be more significant when imaging is acquired within different imaging sessions, rather than repeat acquisitions within the same session where such parameters are more likely to be similar. Regular servicing and daily quality assurance measurements seek to ensure that an MR scanner is operating normally. It is necessary to monitor such changes, and where possible, take steps to limit their impact on the spectroscopic data obtained.
Importantly, there were no upgrades or changes in MR scanner hardware or software during the period of this study. While scanner variability is important there are individual subject factors that can induce substantial variability in WB ¹HMRS. These include head movements and positioning within the scanner field of view. In particular, data acquisition within the volume of interest is sensitive to inhomogeneities that can result from proximity to the sphenoid and frontal sinuses. We undertook standard procedures to limit such variability. All subjects were positioned within the head coil according to standard operating procedures within our institution and the alignment confirmed prior to commencing imaging. Following standard imaging for localisation we monitored subject movement, and all data were checked during processing for movement artefact.
Table 3. Within session and between session variability of metabolites for whole brain proton spectroscopy.
No data sets were excluded in these analyses due to subject motion during the scan. In addition, we performed all analyses following image coregistration and spatial normalisation to MNI standard space. We used a standard ROI template covering the whole brain from the Harvard Oxford subcortical and MNI structural probabilistic atlases available within FSL. While the use of this analysis strategy sought to reduce variability within our comparisons, we also eroded the ROI template by a single voxel within FSL in order to improve spatial localisation and reduce the impact of coregistration, normalisation and partial volume errors. Finally, all ROIs were manually inspected to ensure that they were correctly aligned with the imaging data and corresponded to the regions specified. In summary, we considered possible sources of WB ¹HMRS variability within our centre and attempted to limit their impact and ensure that the data we acquired were comparable within and between the different imaging sessions. Whilst our results for WB ¹HMRS reproducibility are in line with published data, we report data specifically concerning the difference between intersubject variability, within session and between session reproducibility. It is useful to consider the sources of variability in WB ¹HMRS data in the setting where we are trying to address the significance of changes between normal physiology and disease states, or changes that are the consequence of a therapeutic intervention. In the first case, the relevant sources of error are the intersubject variability in the patient and volunteer groups. Our data for healthy volunteers are broadly concordant with results from other groups [24], and show that these are high, with mean (range) CoV for Cho/Cr 21 (11-62%), NAA/Cho 17 (11-55%), NAA/Cr 13 (8-37%), NAA 12 (6-23%), Cho 31 (13-69%) and Cr 19 (7-61%).
Table 6. Within session and between session variability of metabolites for whole brain proton spectroscopy (columns: NAA, Cho and Cr, each within session and between session).
To be certain that ¹HMRS values derived from an individual patient are significantly lower, with a confidence of 95%, these figures suggest that we need to have mean ROI NAA values (for example) that are at least 23% lower than volunteer means. This estimate and the secure distinction of a patient group as abnormal is confounded by the fact that intersubject CoV in patients with neurological disorders is likely to be larger than controls, and variable across different brain regions. These figures underline the difficulty of using WB ¹HMRS in small groups of patients with different causes of neurological disease who have variable pathophysiology. In practice, the estimated study sample size is moderated by the dramatic changes in metabolite concentration that occur in patients. For example, following mild traumatic brain injury there is approximately a 20% reduction in NAA and increase in Cho even where structural imaging appears normal, and in severe traumatic brain injury changes of up to 50% can occur [35,36]. Hence the significance of metabolite change is often detected with manageable numbers, despite the large intersubject variability in volunteer and patient groups. However, it is important to point out that these figures are largely irrelevant when considering the power and design of clinical studies, when WB ¹HMRS is being used to monitor changes within the same subject in the same scanning session (within session reproducibility) or during longitudinal assessments over time in several different imaging sessions (between session reproducibility). In such settings, the subject is his or her own control, and the relevant parameter is intrasubject variability or reproducibility. Our data show that these figures for CoV are smaller than those obtained from the discussion in the previous paragraph. In addition, we provide reference data for metabolites in healthy volunteers demonstrating that the CoV for within session reproducibility is broadly comparable to that obtained in different imaging sessions (Tables 3, 4, 5, 6, 7). While the reproducibility of NAA/Cr and NAA was significantly lower for between session compared to within session measurements the absolute differences were small.
This finding is not consistent with the lack of difference for the other metabolites and is unlikely to be clinically relevant. We found no evidence to suggest that within session reproducibility was smaller than between session reproducibility measurements. These data provide helpful guidance for designing clinical studies, and suggest that for NAA or NAA/Cr it should be possible to detect differences of 20% with confidence. For example, although the reproducibility of measurements is variable for the different brain regions we can use these data to calculate sample sizes for interventional and longitudinal clinical studies. For a lobar ROI such as the right frontal region the between session CoV was 8% for NAA and we should be able to detect a 20% change with 95% power at a significance level of 1% within a group of 10 subjects within a single interventional or longitudinal study design [37]. Clearly, such estimates only strictly apply to our scanner and institution, but they provide a useful starting point for any spectroscopic study design. There are a number of factors particular to our scanning protocols and institutional setup that limit the use of the reproducibility measurements that we provide. These include, but might not be limited to, scanner, acquisition protocols, data correction and reconstruction, and processing. Despite these variations, it should be possible for other groups to use the methodology that we describe to derive 'in house' data for their studies. In addition, although these data provide guidance for designing clinical studies, particular groups of subjects (including those with brain injury) may require sedation and control of ventilation as part of clinical care. 
While such patient groups may appear complex and difficult to manage within the context of an imaging study the fact that they remain completely immobile and have stable physiology should result in lower CoV for reproducibility measurements and an increase in the sensitivity of interventional studies [34].
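The sample size reasoning earlier in this section (a between session CoV of 8% for NAA and a 20% change to detect) can be sketched with a simple normal approximation for a paired design. This is not the method of reference [37], which the text does not describe. The two-sided test, the assumption that the SD of a paired difference is √2 times the CoV, and the function name are all assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def paired_sample_size(cov_percent, change_percent, alpha=0.01, power=0.95):
    """Approximate number of subjects needed to detect a within-subject change.

    Normal approximation for a two-sided paired comparison. The SD of the
    difference between two measurements is assumed to be sqrt(2) times the
    reproducibility CoV (independent errors on each measurement).
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    sd_difference = sqrt(2.0) * cov_percent
    return ceil(((z_alpha + z_power) * sd_difference / change_percent) ** 2)


# CoV of 8%, 20% change, 1% significance level and 95% power (figures from the text).
approximate_n = paired_sample_size(8.0, 20.0)
```

Under these assumptions the approximation returns 6 subjects, below the group of 10 quoted in the text. A normal approximation is known to be optimistic at such small group sizes, so it should be read as a rough lower bound rather than a replacement for a full power calculation.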

Methodological limitations
The volunteers included in this study ranged in age from 25 to 50 years, and since metabolite levels are associated with age [23], this may account for some of the variability in the intersubject analysis. While we were able to obtain multiple WB ¹HMRS datasets on up to two occasions in this group of volunteers, scanner availability and subject tolerance (duration and noise) prevented us from acquiring further WB ¹HMRS datasets within the same session and additional scanning sessions. A repeat imaging session was performed within a mean (range) of 33 (3-181) days, and variation in this interval could result in biological differences between the datasets obtained within a few days compared to those obtained after several months. However, any expected change in WB ¹HMRS in healthy volunteers of a similar age over a period of up to six months is small and unlikely to have resulted in the differences we have found [23,38]. In addition, we found no relationship between scan reproducibility and the interval between the two imaging sessions. We found variability in the ¹HMRS measurements and their reproducibility across the different brain regions. In addition, there was more variability in metabolite data involving choline, which probably reflects the lower concentration of choline within the brain [39]. These differences are demonstrated in Tables 1, 2, 3, 4, 5, 6, and Figs. 3 and 4 and were particularly relevant for the corpus callosum, deep grey matter, midbrain, frontal, occipital and some white matter regions. We found no relationship between the ROI volume and intersubject variability and reproducibility of ¹HMRS for any of the metabolites (data not shown).
Despite this, the cause of these differences may in part be related to inhomogeneities in the B₀ field induced by the frontal and sphenoidal air sinuses, partial volume errors within relatively small regions, locally variant metabolite concentrations, and variation in the quality of coregistration and spatial normalisation within individual subjects. We tried to limit these errors through careful review of all the transformed imaging datasets, shimming the scanner before each MIDAS data acquisition, and eroding the ROI template by a single voxel to improve accuracy. Despite this, errors remain within some ROIs where ¹HMRS values differ in closely adjacent brain regions. However, the purpose of this study was to determine the variability of measurements using an ROI template and standard processing pipeline. While variability in the fitting of template ROIs in individual subjects may result in higher intersubject variability for particular brain regions this should be less likely for measurements of reproducibility within the same subject. Here any differences in ROI template fitting between the sessions should be small. These regional differences underline that ¹HMRS studies should compare data within the same brain region using the same data processing technique. Our figures for reproducibility are higher than those reported by Maudsley et al using the same acquisition sequence [20]. This reflects our inclusion of a larger study group and that we utilised a standard processing pipeline and ROI template covering the whole brain within normalised space that we would typically apply to patient studies. While the data we report are specific to our methods, the reproducibility measurements provide a useful starting point for study design.

Conclusions
This study provides additional reference data concerning intersubject variability and reproducibility of WB ¹HMRS conducted in a group of healthy volunteers. The CoV for repeat WB ¹HMRS measurements obtained during the same session was similar to that for measurements obtained in a different imaging session separated by up to six months. These data can be used to calculate the 95% prediction interval for zero change and may inform the design of interventional studies to quantify change within a single imaging session, or to assess the significance of change in longitudinal studies.