28 Jun 2018: Lévy S, Guertin MC, Khatibi A, Mezer A, Martinu K, et al. (2018) Correction: Test-retest reliability of myelin imaging in the human spinal cord: Measurement errors versus region- and aging-induced variations. PLOS ONE 13(6): e0199796. https://doi.org/10.1371/journal.pone.0199796
To implement a statistical framework for assessing the precision of several quantitative MRI metrics sensitive to myelin in the human spinal cord: T1, Magnetization Transfer Ratio (MTR), saturation imposed by an off-resonance pulse (MTsat) and Macromolecular Tissue Volume (MTV).
Thirty-three healthy subjects within two age groups (young, elderly) were scanned at 3T. Among them, 16 underwent the protocol twice to assess repeatability. Statistical reliability indexes such as the Minimal Detectable Change (MDC) were compared across metrics quantified within different cervical levels and white matter (WM) sub-regions. The differences between pathways and age groups were quantified and interpreted in the context of the test-retest repeatability of the measurements.
The MDC was 105.7 ms, 2.77%, 0.37% and 4.08% for T1, MTR, MTsat and MTV, respectively, when quantified over all WM, while the standard deviation across subjects was 70.5 ms, 1.34%, 0.20% and 2.44%. Even though particular WM regions did exhibit significant differences, these differences were on the same order as test-retest errors. No significant difference was found between age groups for any metric.
While T1-based metrics (T1 and MTV) exhibited better reliability than MT-based measurements (MTR and MTsat), the observed differences between subjects or WM regions were comparable to (and often smaller than) the MDC. This makes it difficult to determine if observed changes are due to variations in myelin content, or simply due to measurement error. Measurement error remains a challenge in spinal cord myelin imaging, but this study provides statistical guidelines to standardize the field and make it possible to conduct large-scale multi-center studies.
Citation: Lévy S, Guertin M-C, Khatibi A, Mezer A, Martinu K, Chen J-I, et al. (2018) Test-retest reliability of myelin imaging in the human spinal cord: Measurement errors versus region- and aging-induced variations. PLoS ONE 13(1): e0189944. https://doi.org/10.1371/journal.pone.0189944
Editor: Fernando de Castro, Instituto Cajal-CSIC, SPAIN
Received: June 18, 2017; Accepted: December 5, 2017; Published: January 2, 2018
Copyright: © 2018 Lévy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The current Quebec law on the protection of private data prevents us from sharing all the MRI datasets this work is based on, even though they are de-identified, because they contain potentially sensitive information. This restriction has been imposed by the ethical review board of the Research Center of Montreal University Geriatric Institute (Comité mixte d’éthique de la recherche du RNQ, approval number CMER-RNQ_14-15-010). To contact the research ethics board of the RNQ, please refer to Mrs Karima Bekhiti (phone: +1 514 527-9565 #3223; email: firstname.lastname@example.org). Notwithstanding this limitation, we are willing to share the MRI data from four members of the lab, from which we have obtained explicit approval. Furthermore, we are also sharing the processing scripts used in this work, which will enable other researchers to re-run the analyses on the shared datasets. The data and scripts have now been uploaded to https://osf.io/ezmrj/.
Funding: This study was funded by the Canada Research Chair in Quantitative Magnetic Resonance Imaging (JCA), the Canadian Institutes of Health Research [CIHR FDN-143263] (JCA), the Canadian Institutes of Health Research [CIHR MOP-130341] (JCA and PR), the Fonds de Recherche du Québec—Santé [FRQS-28826] (JCA), the Fonds de Recherche du Québec—Nature et Technologies [2015-PR-182754] (JCA), the Natural Sciences and Engineering Research Council of Canada [NSERC-435897-2013] (JCA), the Natural Sciences and Engineering Research Council of Canada [NSERC 2016-06774] (NS), the Quebec BioImaging Network (JCA) and the Montreal Heart Institute Foundation (NS). Those funds were used for MRI data acquisition, computer and software resources and authors’ funding.
Competing interests: The authors have declared that no competing interests exist.
1.1. Quantitative MRI
Precise techniques are needed to monitor microstructural degeneration of the nervous tissue in clinics, especially for longitudinal follow up of white matter (WM) lesions in neurodegenerative pathologies, such as demyelination in multiple sclerosis. Rather than using MRI as a technique for simply viewing the anatomy, quantitative MRI (qMRI) aims to provide quantitative metrics related to some tissue properties. To date, several qMRI metrics have been proposed to characterize myelin content in the WM.
The longitudinal relaxation time T1 has shown high correlation with the myelin volume quantified by histology [1–3]. However, T1 is also affected by iron concentration, and it is difficult to disentangle the specific contribution of myelin and iron because of their co-localization. The Magnetization Transfer Ratio (MTR) has also shown high correlation with histopathology of myelin in multiple sclerosis patients [2,3]. However, MTR consists of various contributions (T1 and fraction F of exchanging protons bound to macromolecules) [6,7], which in some cases work against each other, reducing its sensitivity to myelin [2,8]. In this perspective, the quantification of the saturation imposed by an off-resonance pulse (MTsat) has been proposed to minimize T1 effects and increase the specificity to myelin.
Proton density (PD) is also a promising metric, as it measures the density of MRI-visible protons–i.e. protons with sufficiently long transversal relaxation time (T2)–which are water (or liquid) protons. In the Central Nervous System (CNS), the complement of PD yields an estimate of the density of non-free protons, which are mostly bound to lipids and other macromolecules. Since myelin consists of 70 to 80% lipids and some macromolecules [9,10], this index can be expected to be a good marker of myelin content. Several PD estimation techniques and studies in the CNS have been published [11–23]. The complement of PD has been recently named Macromolecular Tissue Volume (MTV) [24,25] and its sensitivity and specificity to myelination were tested. MTV showed high accuracy and precision when quantifying the lipid content in phantoms. In addition, the MTV significantly decreased in the WM of multiple sclerosis patients compared to controls, showing independence from fiber geometry, unlike the Fractional Anisotropy (FA) from Diffusion Tensor Imaging (DTI). However, since MTV is defined as the fraction of non-liquid protons, it includes more than the volume occupied by myelin, raising the question of its specificity to myelin.
Myelin Water Imaging (MWI) using multi-echo T2 is another myelin mapping technique that has shown good sensitivity to myelin content in MS patients post-mortem and in vivo. While the earliest implementations of MWI were not clinically feasible, techniques such as Gradient- And Spin-Echo (GRASE [29,30]) were shown to speed up the acquisition. Further investigations are ongoing.
The time constant of the transverse relaxation due to spin-spin interactions and local field inhomogeneities (T2*) has also exhibited sensitivity to myelin [32–34]. However, T2* includes important contributions from other factors, such as iron content [4,35], fiber orientation, blood vessels and blood oxygen level.
Inhomogeneous Magnetization Transfer (ihMT) ratio is another recent metric that is thought to be particularly sensitive and specific to myelin [40,41]. However, the measurement of this metric requires non-product sequences, which are currently not available on clinical scanners.
The above-mentioned metrics have their own advantages and limitations in quantifying myelin content in the CNS. To compare them, the relevant criteria for a myelin biomarker need to be defined properly. Sensitivity and specificity are often the outstanding criteria. Here, sensitivity refers to the ability of the metric to monitor the variations in myelin content, while the specificity describes its exclusivity to myelin variations, i.e. to what extent the variations in the metric values are due to variations in the myelin content only. However, before tackling the sensitivity and specificity of a metric, it is essential to assess its repeatability. Indeed, sensitivity and specificity cannot be determined precisely if the metric values dramatically change between different scan sessions. The repeatability refers to the agreement (measurement precision) between two or more measurements made at different time points under the same conditions (e.g., same protocol, same scanner, same subjects, etc.). The repeatability must not be confused with reproducibility, which refers to the agreement between two or more measurements made at different time points under changing conditions. In both repeatability and reproducibility studies, the reliability is a relevant aspect to assess. The reliability compares the variability of scores due to measurement errors to the variability in the “true”, error-free scores, i.e. to the variability induced by true variations of the measured feature (e.g., true variations in myelin content).
1.3. Review of past studies on qMRI metrics repeatability
The question of repeatability is even more relevant for spinal cord studies, where noise, motion and susceptibility artifacts make it difficult to acquire high quality images. Previous studies investigated the repeatability of quantitative MRI metrics. Taso et al. reported the repeatability of MTR, ihMTR and DTI (Diffusion Tensor Imaging) indexes within 3 healthy subjects at 3 time points by means of coefficients of variation (CV), defined as the ratio of the between-scans standard-deviation over the mean across scans. However, this index does not allow a proper comparison between different metrics, as the means can differ drastically across metrics or even for a single metric across different studies (e.g., MTR), yielding lower CVs for metrics with higher mean values. Smith et al. also reported the test-retest repeatability of DTI and MT metrics within 9 healthy subjects at 2 time points using the normalized Bland-Altman difference (i.e. mean difference between scans divided by the mean across scans), which makes it harder to compare the repeatability between metrics with different means. Grussu et al. reported the test-retest repeatability of NODDI (Neurite Orientation Dispersion and Density Imaging) indexes within 5 healthy subjects. The test-retest reliability was quantified by means of Intra-Class Correlation (ICC) coefficients defined as the ratio of the inter-subject variance over the total variance (i.e. the sum of the within- and between-subjects variances). Smith et al. assessed the repeatability of MTR and F (fraction of exchanging protons bound to macromolecules) from quantitative magnetization transfer (qMT) imaging by means of the 95% confidence interval for the test-retest difference. However, this estimate of the measurement error was properly compared neither between metrics nor in the context of the differences observed between (expected) different myelin contents.
The test-retest repeatability has been studied extensively in research fields other than qMRI, notably in rehabilitation research [49–53], which provides useful statistical indexes to quantify repeatability. First, the existence of a systematic bias between test and retest measurements can be examined by the confidence interval for the test-retest difference (CId), as used in Smith et al. Then, the reliability can be assessed by the intra-class correlation coefficient based on a two-way mixed effects model of analysis of variance. Finally, groups can be compared taking measurement errors into account (which is not done with usual statistical tests) using CId, showing whether the difference between groups is distinguishable from measurement errors or not. In the same vein, one can compute the Minimal Detectable Change (MDC) to quantify the minimum difference between two single metric values that is necessary to report a “true” error-free change, again taking the measurement errors into account. The MDC is particularly appropriate and intuitive for clinicians who would like to assess whether a treatment affects their patient or not.
In this work, we propose a statistical framework to quantify the test-retest reliability of qMRI metrics. We (i) quantify the repeatability of T1, MTR, MTsat and MTV in the spinal cord using a clinically-compatible protocol and (ii) evaluate the sensitivity of these metrics to myelin content across spinal pathways and age groups, in the context of the test-retest measurement errors.
2. Material and methods
2.1. Data acquisition
Thirty-three right-handed healthy subjects including 19 young (aged 24.9 ± 3.9, from 21 to 33 y.o.; 9 women, 10 men) and 14 elderly (aged 67.4 ± 4.0, from 61 to 73 y.o.; 6 women, 8 men) were recruited. A written consent form was obtained from each participant as supervised by the ethical review board of the Research Center of Montreal University Geriatric Institute (Comité mixte d’éthique de la recherche du RNQ, approval number CMER-RNQ_14-15-010).
To assess the repeatability of the metrics, 8 young (aged 24.0 ± 3.9, from 21 to 31 y.o., 2 women, 6 men) and 8 elderly (aged 67 ± 4.5, from 61 to 72 y.o., 2 women, 6 men) subjects from the previously described cohort underwent two scanning sessions: 12 subjects were scanned twice within a 10-month interval, and 4 within the same session (with a 5-minute break out of the scanner between scan and rescan). All data were acquired on a 3T Siemens TIM TRIO scanner with a standard 12-channel head coil and a standard 4-channel neck coil.
The protocol consisted of:
- One sagittal turbo-spin-echo 3D SPACE T2-weighted anatomic image (TR = 1500 ms; TE = 119 ms; flip angle = 120°; BW = 723 Hz/voxel; matrix = 384x384x52; resolution = 1x1x1 mm; FOV = 384x384x52 mm) with a high contrast between cord and cerebrospinal fluid (CSF), so that the curvature of the cord could be taken into account during data processing;
- Four 3D FLASH acquisitions (TR = 35 ms; TE = 5.92 ms; BW = 260 Hz/voxel; matrix = 192x192x22; resolution = 0.9x0.9x5 mm; gap = 1 mm; FOV = 174x174x110 mm; R = 2 acceleration; phase encoding direction = right-left). The four FLASH scans consisted of:
  - One with a prior RF saturation pulse (Gaussian-shaped, duration = 9984 μs, offset frequency = 1.2 kHz) and an excitation flip angle of 10°;
  - Three without a saturation pulse and with flip angles of 4°, 10°, and 20°;
- Two axial 2D segmented spin-echo EPI acquisitions (TR = 3000 ms; TE = 19 ms; BW = 1905 Hz/voxel; matrix = 64x64, 17 slices; resolution = 3.0x3.0x5.5 mm; FOV = 192x192 mm) with flip angles of 60° and 120°, respectively (for B1+ estimation purposes).
All images spanned at least C2 to C5 vertebral bodies. The duration of the protocol was 18 minutes.
2.2. Data processing
Analysis was performed using the Spinal Cord Toolbox (SCT) version 2.2.3. The four datasets were first co-registered, then metrics were calculated. To extract metrics within specific pathways in the white matter (dorsal column, DC; lateral funiculi, LF; ventral funiculi, VF), data were registered to the MNI-Poly-AMU template, which includes an atlas of WM tracts. For the sake of clarity, details about the processing pipeline are included in the supplementary material (see S1 File in section 8. Supporting information).
2.3. Statistical analysis
Statistical analyses were performed using MATLAB R2014a (The MathWorks, Inc., Natick, Massachusetts, USA) and SPSS (IBM SPSS Statistics–Release 126.96.36.199) at the 0.05 significance level unless otherwise stated.
Systematic change between test and retest
The mean of the difference between test and retest across subjects ($\bar{d}$) was computed along with a 95% confidence interval for the true test-retest difference (CId) derived according to:

$$CI_d = \bar{d} \pm t_{n-1} \cdot SE$$

where $SE = SD_d / \sqrt{n}$ is the Standard Error, SDd is the standard-deviation (SD) of the difference between test and retest across the subjects, n is the number of subjects and tn−1 is the t statistic with n − 1 degrees of freedom and type I error of 5%. In our case, tn−1 = 2.131.
If zero is not included in CId, we can consider that a systematic change between test and retest has occurred. In addition to assessing the systematic bias between test and retest, the CId gives the minimum difference between two groups of subjects that is distinguishable from measurement errors.
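The interval described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' MATLAB/SPSS code: it assumes the standard error is SDd/√n, takes the difference as retest minus test (the sign convention is an assumption), and uses hypothetical T1 values. The function names are made up for the example.

```python
import math
import statistics


def ci_of_mean_difference(test, retest, t_crit=2.131):
    """95% confidence interval for the mean test-retest difference (CId).

    t_crit is the two-sided 5% t value with n - 1 degrees of freedom.
    The default 2.131 matches n = 16 as stated in the text and must be
    changed for other sample sizes.
    """
    diffs = [b - a for a, b in zip(test, retest)]  # retest minus test (assumed sign)
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)       # sample SD of the differences (SDd)
    se = sd_d / math.sqrt(n)             # standard error of the mean difference
    half_width = t_crit * se
    return mean_d - half_width, mean_d + half_width


def has_systematic_bias(ci):
    """A systematic change is suspected only if zero lies outside the interval."""
    low, high = ci
    return not (low <= 0.0 <= high)


# Hypothetical T1 values in ms for four subjects (not data from the study).
test_scan = [1000.0, 980.0, 1020.0, 1010.0]
retest_scan = [1005.0, 975.0, 1030.0, 1000.0]
ci = ci_of_mean_difference(test_scan, retest_scan, t_crit=3.182)  # t value for n = 4
print(ci, has_systematic_bias(ci))
```

With these made-up values the interval is symmetric around zero, so no systematic bias would be reported.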
Absolute test-retest difference
The absolute difference between test and retest, termed |d|, and its mean across subjects ($\overline{|d|}$) were computed to give the reader a basic and direct measure of the magnitude of the measurement errors.
The Intra-Class Correlation (ICC) coefficient is an appropriate coefficient to assess the test-retest reliability. It measures the proportion of variance that is attributable to the “true” error-free scores of subjects (inter-subject variance) compared to the total variance (“true” variance + variance due to measurement errors). The ICC is calculated from a 2-way mixed effects model of repeated-measures analysis of variance, which fits any kind of test-retest experimental design particularly well: the total variance is partitioned between within- and between-objects (subjects) variances. A commonly used index to report repeatability is the Pearson’s correlation coefficient. The ICC coefficient value is often close to the Pearson’s correlation value. However, the ICC includes a penalization for a systematic error between measurements (in this case, the ICC would be lower than the Pearson’s) and it can also assess the reliability of a measure based on more than two measurements per subject (thanks to the model of analysis of variance used for computation). Moreover, the Pearson’s coefficient normalizes each measurement by its own mean and SD, whereas the ICC normalizes the variables by the pooled mean and SD of both measurements. So if the variables do not have a common unit and variance, the Pearson’s is more appropriate. But, for test-retest measurements having the same units, the ICC is a better index.
The higher the ICC, the higher the reliability; the threshold above which the ICC would reflect a good reliability remains subjective and depends on the application, but we can still refer to the scale proposed by Shrout and Fleiss, Fleiss and Cicchetti: poor < 0.4 < fair < 0.6 < good < 0.75 < excellent ≤ 1. Chinn suggests that a measure needs to have an ICC coefficient of at least 0.6 to be useful. Contrary to the other repeatability indexes of this section, the ICC coefficient is a dimensionless index.
In this study, the ICC coefficient was computed according to the Matlab implementation of McGraw and Wong (case 3A).
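To make the variance partition concrete, here is a minimal Python sketch of a single-measure ICC computed from the two-way ANOVA mean squares. It assumes the absolute-agreement form, chosen because the text states that the ICC penalizes a systematic error between measurements. The text itself only cites "case 3A", so the exact variant used by the authors is an assumption, and the function name is hypothetical.

```python
def icc_absolute_agreement(scores):
    """Single-measure, absolute-agreement ICC from two-way ANOVA mean squares.

    scores: list of rows, one row per subject, one column per session.
    Assumed variant; the source only cites McGraw and Wong, case 3A.
    """
    n = len(scores)        # number of subjects
    k = len(scores[0])     # number of sessions (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares: between subjects, between sessions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + (k / n) * (ms_cols - ms_error)
    )


# Hypothetical scores: identical sessions, then a constant retest offset of +1.
print(icc_absolute_agreement([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))
print(icc_absolute_agreement([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]))
```

The second call illustrates the penalization mentioned above: the two sessions are perfectly correlated (Pearson's r = 1), yet the constant offset lowers the ICC.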
Minimal Detectable Change
Another useful index is the Minimal Detectable Change (MDC). It estimates the minimal difference between two scores that would reflect a “true” difference (i.e., not completely due to measurement error). It can be derived according to:

$$MDC = 1.96 \cdot \sqrt{2} \cdot SEM$$

where $SEM = SD_{pooled} \cdot \sqrt{1 - ICC}$ is the Standard Error of Measurement and SDpooled is the standard-deviation across all measurements [49,63]. The MDC can also be interpreted as an interval for repeated measures. If x is the score of a subject for a single measurement, there is a 95% chance that the score of a repeated measurement lies within x ± MDC, assuming that the measurement errors are normally distributed. Any difference of ± MDC between two metric values can be considered as usual variation (due to measurement error); such a difference is not exceptional enough to be considered as a real change in the microstructure.
The MDC and the CId are based on the same idea of estimating the magnitude of the difference in metric values that can be only due to measurement errors. However, the MDC applies for two single metric values whereas CId, which takes into account the sign of the difference between test and retest, applies for group comparison where negative measurement errors compensate for positive ones.
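A minimal sketch of the MDC computation, assuming the widely used form SEM = SDpooled · √(1 − ICC) and MDC = 1.96 · √2 · SEM. The function names and the numerical inputs are hypothetical and are not values from the study.

```python
import math


def standard_error_of_measurement(sd_pooled, icc):
    """SEM: the part of the pooled SD attributed to measurement error."""
    return sd_pooled * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd_pooled, icc, z=1.96):
    """MDC at the 95% level for the difference between two single measurements.

    The sqrt(2) factor accounts for the error being present in both measurements.
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd_pooled, icc)


# Hypothetical inputs: pooled SD of 80 ms and an ICC of 0.75.
print(round(minimal_detectable_change(80.0, 0.75), 1))
```

As the ICC approaches 1 the MDC shrinks towards zero, while an ICC near 0 gives an MDC of roughly 2.8 pooled SDs.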
Comparison of indexes with different units across studies
To allow the comparison between techniques having different measuring units, one can express the repeatability indexes as a percentage of the mean across all measures, similar to calculation of the coefficient of variation (CV = 100 ∙ SD/mean). This method works fine when the mean is similar between techniques, otherwise the comparison is biased by the mean. For example, it has been shown that MTR could lead to drastically different mean values when acquired with different offset saturation pulse parameters, e.g. from 9 to 51% in the healthy WM. Hence, normalizing by the mean would yield lower indexes for techniques with higher mean value, whereas these techniques could have the same test-retest repeatability as other techniques with lower mean values. To avoid this while still being able to compare between techniques side by side, we expressed these reliability indexes as a percentage of the SD across subjects of the first MRI session values only (SDsubjects), i.e.:

$$Index_{\%} = 100 \cdot \frac{Index}{SD_{subjects}}$$

where Index represents any reliability index expressed in the metric unit such as the MDC. Indeed, this manipulation enables us to compare metrics side by side while accounting for the property we are looking for. Here, we are looking for a metric that has low test-retest variability relative to the inter-subject variability, i.e. relative to the dispersion of the sample this metric can offer. The SD across subjects is the most basic measure of the sample dispersion. In this way, we would like the normalized index to be as low as possible (i.e., a low measurement error and a high SD across subjects) in order to observe differences between subjects that are higher than measurement errors.
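The two normalizations discussed above can be compared with a short Python sketch. The inputs are the T1-at-C3 figures quoted in the Results of this paper (MDC = 113.2 ms, SDsubjects = 74.3 ms, group mean = 1007.2 ms); the function names are hypothetical.

```python
def percent_of_mean(index, mean):
    """CV-style normalization: index as a percentage of the mean."""
    return 100.0 * index / mean


def percent_of_sd_subjects(index, sd_subjects):
    """Normalization used here: index as a percentage of the inter-subject SD."""
    return 100.0 * index / sd_subjects


# T1 at C3, values quoted in the Results (rounded as printed in the text).
mdc_t1, sd_t1, mean_t1 = 113.2, 74.3, 1007.2
print(round(percent_of_mean(mdc_t1, mean_t1), 1))        # small, because the mean is large
print(round(percent_of_sd_subjects(mdc_t1, sd_t1), 1))   # relative to inter-subject spread
```

The same MDC looks modest relative to the mean but exceeds the inter-subject SD, which is the comparison that matters for distinguishing subjects. The result computed from the rounded inputs differs slightly from the 152.3% quoted in the text, presumably because of rounding.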
2.3.2. Sensitivity to myelin content variations.
To assess the metrics sensitivity to the variations in myelin content across vertebral levels/WM regions relative to the repeatability, differences in group mean (n = 33) between levels/regions were compared along with their measurement error (assessed by the CId).
Moreover, a one-way repeated measures ANOVA between levels/regions was performed independently for each metric (n = 33). The assumptions of normal distribution within each group (i.e., level or WM region) and of sphericity were checked using Lilliefors’s test and Mauchly's test respectively. When the assumption of sphericity was not met, a Greenhouse-Geisser correction was used to compute the ANOVA. When the ANOVA detected a significant difference, a post hoc multiple comparison test using the Tukey's honestly significant difference criterion was performed in order to find which groups were significantly different from each other.
To test the metrics sensitivity to the demyelination with aging reported by histology in the literature [64–66], for each vertebral level/WM region, means across each age group were compared taking the measurement error (assessed by the CId from the previous analysis) into account in order to investigate whether the difference in means could reflect a “true” difference or whether it is indistinguishable from measurement errors.
In addition, to test for significant differences, we performed independently for each metric, on the larger sample (n = 33, nyoung = 19, nelderly = 14), two-way repeated ANOVAs with the age group as between-subjects factor and, as within-subjects factor:
- vertebral levels to determine if this effect was consistent across levels (the metric being quantified in the whole WM);
- ROIs (WM, DC, LF, VF) to determine if this effect was consistent across ROIs (the metric being quantified from C2 to C4).
Finally, to complete this study, a power analysis was performed for two-tailed t-tests between young and elderly subjects based on whole WM values of each metric.
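For readers unfamiliar with this kind of power analysis, the sketch below estimates the number of subjects per group needed to detect a given difference in means. It uses the normal approximation to the two-sample t-test, which is a simplification and not necessarily the procedure used by the authors. The function name, the 80% power default and the example inputs are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-tailed, two-sample comparison.

    Normal approximation to the t-test: delta is the difference in means to
    detect and sd the common between-subject standard deviation.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)


# Hypothetical effect sizes: a difference of one SD, then of half an SD.
print(sample_size_per_group(delta=1.0, sd=1.0))
print(sample_size_per_group(delta=0.5, sd=1.0))
```

Halving the detectable difference roughly quadruples the required group size, which is why small between-group effects demand large cohorts.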
Fig 1 shows test and retest multi-parametric maps by vertebral levels, for one single young and one single elderly subject, as well as for the group average (n = 33). The single-subject data look noisy; however, the average map shows a clear distinction between WM and GM. Moreover, the symmetry that can be observed on the group average maps suggests no apparent differences in myelin content between left and right cord. In all metrics, the heterogeneity of values across WM regions suggests different microstructural compositions. For example, the fasciculus cuneatus shows higher MTV than the fasciculus gracilis, suggesting higher myelin content in agreement with previous histology studies [1,67]. Apart from MTR, all metrics show fairly stable values across vertebral levels.
All these maps are in the template space. Note that the color bar scale has been adjusted to the contrast of the mean maps. On a single-subject basis, one can observe a somewhat poor test-retest repeatability, within and across slices. However, despite this poor repeatability, the average maps (here, n = 33) are more consistent in terms of symmetry and tract-specific variations. For example, we can clearly distinguish higher MTV in the fasciculus cuneatus than in the fasciculus gracilis (dorsal column), which is in agreement with previous histology work [1,67].
A guide for reading (and understanding) figures and tables in the paper.
Fig 2 shows intra- and inter-subject differences for metrics quantified in the WM. Fig 2 is a subset of Table 1, which quantifies the metrics repeatability over all WM at the different cervical levels (Fig 3 and Table 2 are their analogs quantifying the metrics repeatability over all reliable levels within the different WM sub-regions). Let’s take an example to better explain how to use these repeatability indexes. Let’s take the T1 at C3. Regarding only one scan, the mean T1 across the group is 1007.2ms and the SD is 74.3ms. A 95% confidence interval for the mean test-retest difference of [-38.5; 23.1]ms indicates that if we rescan the same group a second time, the mean is likely to lie between 968.7 and 1030.3ms (with 95% probability). Now, if we measure T1 at C3 in a different group (e.g., a group of patients) and the resulting mean lies between 968.7 and 1030.3ms, we will not be able to report whether the difference in T1 between the two groups is due to measurement errors or to a true difference in T1. The MDC (113.2ms in our example case) will be useful for instance in a case where a clinician measures the T1 in a new lesion of his patient at one time point t; say he gets a measure of T1(t) = x ms. If he re-measures it right after, there is 95% probability that T1(t + 30min) lies within x ± 113.2 ms. Now, if he wants to control the evolution of the lesion one year later and he measures T1(t + 1year) still within x ± 113.2 ms, he will not be able to say whether this change between T1(t) and T1(t + 1year) is due to an evolution of the tissue or to measurement errors.
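The reading guide above can be expressed as two small checks, sketched here in Python. The group mean, CId and MDC are the T1-at-C3 values quoted in the text; the function names and the follow-up measurements are hypothetical.

```python
def group_mean_band(group_mean, ci_d):
    """Range in which a rescanned group mean is expected to fall (95% level)."""
    low, high = ci_d
    return group_mean + low, group_mean + high


def exceeds_mdc(first, second, mdc):
    """True if the change between two single measurements is larger than the MDC."""
    return abs(second - first) > mdc


# T1 at C3: group mean 1007.2 ms, CId = [-38.5; 23.1] ms, MDC = 113.2 ms.
band = group_mean_band(1007.2, (-38.5, 23.1))
print(tuple(round(b, 1) for b in band))

# Hypothetical single-patient follow-up measurements in ms.
print(exceeds_mdc(1000.0, 1100.0, 113.2))  # change of 100 ms, within the MDC
print(exceeds_mdc(1000.0, 1120.0, 113.2))  # change of 120 ms, beyond the MDC
```

The first check applies to group means, the second to individual measurements, which is why the MDC threshold is so much wider than the CId band.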
The top and bottom of the orange boxes respectively represent the max and min among test and retest, while the black line in the middle of the box represents the mean. Note that the y-axis does not start from zero for the sake of clarity. The mean absolute difference between test and retest (mean height of orange boxes, $\overline{|d|}$) is displayed in the top left hand corner of each graph. This figure gives a comprehensive view of the repeatability compared to between-subject differences.
The top and bottom of the orange boxes are respectively the max and min among test and retest, while the black line in the middle of the box is the mean. The mean absolute test-retest difference (mean height of orange boxes, $\overline{|d|}$) across subjects is displayed in the top left hand corner of each graph. Due to its tiny size and its border location between GM and CSF, the VF yields the largest test-retest variations.
The ICC and the MDC (expressed in percentage of the SD across subjects) are useful to compare repeatability across metrics (more extensively done in Fig 4). For example, if we compare T1 to MTR at C3, the ICC is much higher for T1 (0.72) than MTR (-0.3); note here that the interpretation of a negative value for the ICC is the same as for a null value (very poor reliability). This is because T1 has a lower test-retest variation ($\overline{|d|}$ = 47.1 ms in Fig 2) compared to the variation between subjects (SDsubjects = 74.3 ms in Table 1), whereas MTR has a high test-retest variation ($\overline{|d|}$ = 1.43% in Fig 2) compared to the variation between subjects (SDsubjects = 1.38% in Table 1). This is also reflected in the MDC. For T1 at C3, MDC = 113.2 ms, which is 152.3% of SDsubjects (Table 1), whereas for MTR at C2, MDC = 3.76%, which is 271.6% of SDsubjects. This result shows that measurement errors in MTR cover almost 3 times the standard variations between subjects, making it difficult to observe true differences in MTR.
Comparison between the repeatability of the four myelin-sensitive metrics when the metric is estimated (A) in the whole WM by vertebral level and (B) from C2 to C4 within WM sub-ROIs. Repeatability indexes from left to right: mean absolute test-retest difference ($\overline{|d|}$), Intra-Class Correlation (ICC) coefficient, Minimal Detectable Change (MDC). $\overline{|d|}$ and MDC are expressed in percentage of inter-subject SD in order to assess the repeatability relative to the differentiation between subjects (i.e., the reliability), despite the different units of the metrics.
The mean absolute test-retest difference ($\overline{|d|}$, displayed in gray at the top left of each graph) is higher at C5 (Fig 2); however, one-way repeated ANOVAs testing the effect of vertebral levels on the absolute test-retest difference did not report significant results (p-values were 0.183, 0.195, 0.389 and 0.579 for T1, MTR, MTsat and MTV respectively). No clear test-retest difference between young and elderly subjects is observed on this graph.
For all metrics and all levels, no significant systematic bias between test and retest is detected (all CId include 0, see Table 1). When compared to other metrics, mean MTsat shows minimal variations across vertebral levels (p-values of the repeated ANOVAs between levels were <<0.0001, <<0.0001, 0.02 and <0.0001 for T1, MTR, MTsat and MTV respectively). The ICC coefficient highlights a poor test-retest reliability, barely exceeding 0.5, especially for MTR and MTsat. This point is supported by the MDC, which is generally around 2 times the SD across subjects.
Fig 3 shows repeatability results within sub-regions of the WM: dorsal column (DC), lateral funiculi (LF) and ventral funiculi (VF). Overall, the VF shows the largest test-retest differences. These observations were confirmed (except for MTsat which shows large test-retest differences in the DC) by one-way repeated ANOVAs performed between ROIs on the absolute test-retest difference (p-values <0.01, 0.01, 0.08, <0.01 for T1, MTR, MTsat, MTV respectively). In addition, similar repeatability is found when the metrics are estimated over all WM or within the DC or the LF.
Fig 3 is a subset of Table 2, which quantifies the metrics repeatability within sub-ROIs of the WM from C2 to C4. Interestingly, MTsat performs really differently according to the ROI, yielding the worst repeatability result in the DC (ICC = 0.1, MDC ≈ 3 inter-subject SDs) and the best one in the LF (ICC = 0.82, MDC ≈ 1.2 inter-subject SDs). Note however that estimating the metric at several levels (here, C2 to C4) is not favorable to MTsat given that its ICC in WM at C4 is half its ICC at C3 (Table 1). Overall, T1 and MTV yield the best results. MTV regularly shows a fair repeatability whatever the ROI is, with a MDC about 1.5 to 2 times the inter-subject SD (which is equivalent to 87–95% of the sample distribution). In the level-wise analysis, MTV performs slightly better than T1. We suspect that these results reflect the clearer delineation between the cord sub-regions and the more homogeneous values in those sub-regions that could be observed in MTV maps when compared to T1 or even MTsat maps (Fig 1). Furthermore, as expected, MTR regularly performs worst, in part because of the low contrast between subjects it exhibits, whatever the ROI is.
Fig 4 compares three main repeatability indexes (absolute test-retest difference, ICC and MDC) between the different metrics. While no particular metric stands out from this comparison, MTR seems to be the least reliable at every level. For most of the vertebral levels, the absolute test-retest difference of MTR is on the same order as the inter-subject SD (which is equivalent to 68% of the population if we assume a normal distribution for the sample), the ICC is below 0.4 at every level and the MDC exceeds 2.5 inter-subject SDs (equivalent to 98.8% of the population) at 2 of the 4 levels. When considering the effect of vertebral level, C5 seems to be the least reliable (ICC < 0.5 for all metrics). Regarding the effect of WM regions (Fig 4B), some differences are observed. For instance, MTsat yields the best ICC score in the LF (0.82) and the worst in the DC (0.1).
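The population percentages quoted in this section (68%, 87–95%, 98.8%) follow from the assumption of a normal distribution. As a minimal sketch, they can be checked with the Python standard library; the helper name `coverage_within` is introduced here for illustration and is not part of the study's code.

```python
from statistics import NormalDist


def coverage_within(k_sd: float) -> float:
    """Fraction of a normal population lying within +/- k_sd SDs of the mean."""
    standard_normal = NormalDist()  # mean 0, SD 1
    return standard_normal.cdf(k_sd) - standard_normal.cdf(-k_sd)


# Multiples of the inter-subject SD mentioned in the text
for k in (1.0, 1.5, 2.0, 2.5):
    print(f"+/- {k} SD covers {100 * coverage_within(k):.1f}% of the population")
```

Read this way, an MDC of 2.5 inter-subject SDs means that a change must exceed the spread of almost the entire healthy sample before it can be separated from measurement error.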
3.2. Sensitivity to myelin content
This section deals with the larger sample (n = 33 subjects).
3.2.1. Effects of vertebral levels and WM regions.
Fig 5 plots the group mean along with the measurement error magnitude (CId) in order to allow the reader to assess whether differences between vertebral levels or WM regions can be distinguished from measurement errors or not. Individual subject data are also plotted to show whether differences between subjects can be detected despite the measurement error. However, for individual comparison, measurement errors are assessed by the MDC, which is much larger than the CId (as negative and positive errors do not compensate for each other). Only T1 and MTV seem to allow the comparison between some healthy subjects.
The red envelope represents the 95% confidence interval for the test-retest difference (CId), which assesses the measurement error magnitude of the group mean (in black). The orange envelope represents the MDC (Minimal Detectable Change), i.e. the difference required to compare individual subjects (faded gray lines). Note that the group mean approaching the edges of the CId (red envelope) reflects an asymmetric confidence interval due to a non-null offset between test and retest (non-null mean test-retest difference). However, no offset was large enough to report a significant systematic bias between test and retest (see section 3.1. Repeatability, Table 1 and Table 2).
The differences that are distinguishable from measurement errors are summarized in Table 3, along with the results of the one-way repeated ANOVAs. One can observe that some cases show significant differences but those differences are too small to be distinguished from measurement errors. This is the case for the MTR, which is significantly different between every vertebral level but only C2 and C5 show a difference large enough to be due to something other than measurement errors. Also, significant differences between WM regions are found with MTR and T1 but none of them are larger than measurement errors.
3.2.2. Effect of age.
Fig 6 compares the differences between young and elderly to the measurement errors assessed by the CId. With all metrics within every spinal cord region (vertebral level or WM region), the difference between young and elderly can always be explained by measurement errors only. Moreover, the repeated ANOVAs did not report any significant effect of age for any metric, either level-wise or ROI-wise. However, we can still notice some general trends: T1, MTR and MTV generally support the demyelination with aging histologically reported in the literature, whereas MTsat consistently shows the reverse trend.
For each case, the corresponding 95% confidence interval for the mean test-retest difference (CId), estimated from the test-retest analysis (see section 3.1. Repeatability) was centered at the mean of each group, in order to assess whether the difference between young and elderly is larger than the test-retest errors or not. With all metrics within every spinal cord region (vertebral level or WM region), the difference in means between young and elderly was indistinguishable from measurement errors.
To complete this study, Table 4 reports the statistical power analysis. From this analysis, one can compare the difference that can be detected given the metrics' test-retest errors (length of the CId, 2nd column) to the minimum difference in the true metric values required to detect a significant difference (1st column) between young and elderly (with a fair test power). For example, given the measurement errors of MTR (1.36%), even if the difference in means were large enough (≥1.27%) to yield a significant result, the imprecision of measurement is too large to detect such a difference. This is not the case with the other metrics. Moreover, the observed differences in means (3rd column) are very low compared to the difference needed to obtain significant results (1st column), yielding very low statistical power for those tests (4th column). Finally, given the large sample size required to obtain a significant difference (5th column), T1 and MTV do not seem sensitive to age groups (based on their mean WM values in this study).
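To make the sample-size reasoning concrete, the sketch below uses the textbook normal-approximation formula for a two-sided, two-sample comparison of means. It is an illustration under stated assumptions and is not claimed to reproduce Table 4: the function name `n_per_group`, the 80% power and the use of the all-WM inter-subject SD of MTR (1.34%) for both age groups are our assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects needed per group to detect a true mean difference `delta`.

    Normal approximation for a two-sided two-sample test, assuming both
    groups share the same standard deviation `sd` (an assumption here).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_beta = z.inv_cdf(power)               # quantile matching the target power
    return ceil(2.0 * ((z_alpha + z_beta) * sd / delta) ** 2)


# MTR: difference of 1.27% (quoted in the text), SD of 1.34% (assumed per group)
print(n_per_group(delta=1.27, sd=1.34))
```

Because the required sample size scales with the inverse square of the difference, halving the true difference between groups roughly quadruples the number of subjects needed.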
This study proposes a statistical framework for comparing clinically feasible myelin imaging techniques (T1, MTR, MTsat and MTV) in the cervical spinal cord.
4.1. Myelin-sensitive metrics values in the spinal cord
The resulting mean values across subjects are in agreement with previous studies. Stikov et al. observed a T1 around 1000ms in the brain, which is comparable to the T1 in the spinal cord WM in vivo at 3T [69,70]. The same holds for our MTV measurements, which are in agreement with reported PD values [12,18–23,69,71]. There is no gold standard for clinically feasible MT-based protocols due to their dependence on pulse sequence parameters. However, the values for MTR and MTsat we observed are also in agreement with the literature [6,45,48,72–75].
Even for the most reliable metrics (T1 and MTV, see Fig 4), the ICC is moderate (around 0.5) and the MDC is on the order of two inter-subject SDs. Given the test-retest variations, the minimal difference between individual healthy subjects that can be detected with these metrics (MDC) is much larger than the usual variations we observed (see Fig 5). Looking at groups of subjects, significant differences between spinal cord regions stand out but still, they are not large enough to be distinguished from measurement errors (quantified by the CId in this case, as shown in Fig 5).
In comparison with the brain, repeatability in the spinal cord is hampered by multiple sources of artifacts (motion, susceptibility) and low SNR. Better repeatability might be achieved with coarser resolution and/or more averaging, though at the cost of longer acquisition times, which could be associated with more subject motion.
Taso et al. reported results for myelin-related metrics in the spinal cord WM: a CV of 5.3% for MTR and 2.9% for ihMT ratio. However, this study reported the repeatability in terms of CVs, which are misleading when comparing metrics with different units and/or dynamic ranges (as mentioned in section 2.3.1. Repeatability). Smith et al. reported a CId of [−3%, +5%] for MTR over all WM from C2 to C5 within 10 young healthy subjects. Even if the repeatability of the metrics reported in our study is not good enough to differentiate between WM regions or age groups, it is still much better (CId of [−0.99%, +0.54%] for MTR). This suggests that some significant differences reported in the literature without accounting for the precision of measurements could be explained by measurement errors alone.
Looking at the metrics individually, T1-based metrics (MTV and T1) generally show the best reliability (Fig 4). Regarding sensitivity to myelin, MTV shows clearer delineation of the GM and smooth variations in the WM (Fig 1), but no difference between WM regions stood out when compared to the measurement error. When looking at individual maps, T1 seems particularly affected by cord movements and compressions occurring during respiratory and cardiac cycles (Fig 1), which produces statistically significant differences (see Table 3), but those differences are not larger than measurement errors. The same applies for MTR, which emerges as the least reliable metric due to its very small variation between subjects (Fig 4). However, MTR is the only metric exhibiting a significant effect that accounts for measurement error (difference between vertebral levels C2 and C5 in Table 3). This decrease in MTR towards lower levels could reflect a true decrease in myelin content, but could also be due to B1+ inhomogeneity. MTR variations due to B1 errors have already been reported in the brain and correcting for them should be further investigated in the spinal cord. MTsat minimizes the T1 contribution included in MTR, and is thereby less variable across vertebral levels.
4.3. Sensitivity and specificity to myelin with MRI
The assessment of the sensitivity of metrics to myelin content remains difficult, due to the lack of a ground truth. A loss of myelinated fibers with aging (mainly the small caliber ones) was observed histologically in the brain and cervical spinal cord [64–66] but it remains unclear if these variations can currently be detected by clinical MRI. Age effects have been reported in the brain with MTR and DTI [79–82]. In the spinal cord, most age effects are reported with DTI [83–85]. One study investigated MTR evolution in the spinal cord during aging, but no significant effect was reported. The same study reported a decrease in ihMT ratio between subjects aged 35 to 50 and subjects aged over 50, although without accounting for measurement errors. Our study did not observe any difference between age groups, with or without accounting for measurement error (Fig 6). This lack of sensitivity to aging could be due to the choice of acquisition parameters, the small effect/sample size, or simply due to a lack of true differences in myelination.
As noted in the introduction, some of the myelin-sensitive techniques are also hampered by confounding factors. For example, T2* is affected by iron content, fiber orientation, blood vessels and blood oxygen level. MTR is affected by T1 and B1 field, and more generally, magnetization transfer and MTV are sensitive to macromolecules (i.e., not only myelin). For each of these techniques, there are ways to mitigate those confounds. For example, quantitative susceptibility maps could inform T2* maps, or T1 and B1+ fields could be acquired to correct MTR maps. All these strategies come at the cost of additional scan time, and possibly larger output variance (due to the introduction of yet other noisy measures).
While DTI has some intrinsic limitations, other techniques also based on diffusion-weighted imaging might offer more sensitivity to myelin. It is important to note, however, that because water protons trapped between myelin sheaths have a short T2 (around 10 ms at 3T, which could be quantified using myelin water fraction techniques) and that protons from bound molecules have an even shorter T2 (order of μs, which could be quantified with ultra-short TE imaging or magnetization transfer techniques), diffusion-weighted protocols typically use a TE (> 60ms) too long to be sensitive to signal coming from the myelin (and from water trapped in it). Some advanced diffusion-weighted techniques include NODDI [47,86], which can notably estimate the intra-cellular volume fraction and CHARMED/AxCaliber [87–89], which can notably estimate the hindered (extra-cellular) and restricted (intra-cellular) water fraction. All these metrics are thus indirectly related to the myelin volume fraction, although additional information would be required to be able to quantify absolute myelin content.
To improve specificity to myelin, combining several metrics, using for example independent component analysis, or acquiring maps of confounding factors for a posteriori corrections, might be advisable. Future work will be undertaken in this direction.
4.4. Perspective of repeatability assessment
Repeatability assessment is crucial for the development of qMRI biomarkers. Our results show that significant differences between groups can be reported with standard statistical tests, yet these differences are comparable to (or even smaller than) test-retest measurement errors. Controlling for both aspects (statistical significance and measurement errors) is necessary for qMRI studies.
The indexes reported in this work (95% confidence interval for the test-retest difference (CId), ICC and MDC) are useful for quantifying repeatability and allowing comparisons across studies. As mentioned before, the coefficient of variation depends on the magnitude of the metric, and should not be the primary index for assessing repeatability, especially if metrics have different means or units. The CId first makes it possible to check for a potential systematic bias between measurements (i.e. scan sessions). In addition, it gives an estimation of the measurement error for group averages. In the same vein, the MDC provides a measure of the minimum difference between two individual measurements needed to report a true difference, taking into account the measurement errors. For example, the CId would be useful for researchers comparing different populations, whereas the MDC would be useful for a clinician needing to assess the evolution of a WM lesion within a single patient. Furthermore, the ICC has the advantage of being dimensionless, and can thus be easily compared to assess reliability across metrics, studies, vendors or sites. Aside from providing a robust quantification of the repeatability with two measurements (test-retest studies), the ICC (and consequently, the MDC) can also be consistently used with more than two measurements. Those reliability indexes have already been extensively used in test-retest studies from other research fields, such as rehabilitation, where the precision of tests is crucial [49–53]. In this work, the absolute test-retest difference (|d|) was reported to provide the reader with a direct and basic measure of measurement errors; however, this index is not sufficient to estimate the repeatability and compare it across studies.
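As an illustration of how these three indexes relate to each other, the following sketch computes them for a two-session dataset. The exact definitions used in the study are given in its Methods and shared scripts, not in this section, so the formulas below are assumptions based on common usage: a Student t interval for the mean test-retest difference, the one-way random-effects ICC(1,1) of Shrout and Fleiss, and MDC = 1.96 × √2 × SEM with SEM = SD × √(1 − ICC). All function names and the T1 values are made up for the example.

```python
from math import sqrt
from statistics import mean, stdev


def ci_mean_difference(test, retest, t_crit):
    """95% CI for the mean test-retest difference (CId-like index).

    t_crit is the two-sided Student t critical value for n - 1 degrees of
    freedom (about 2.776 for n = 5); it is passed in to avoid dependencies.
    """
    diffs = [b - a for a, b in zip(test, retest)]
    half_width = t_crit * stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) - half_width, mean(diffs) + half_width


def icc_oneway(test, retest):
    """One-way random-effects ICC(1,1) for two sessions per subject (assumed form)."""
    n, k = len(test), 2
    subject_means = [(a + b) / 2 for a, b in zip(test, retest)]
    grand_mean = mean(subject_means)
    # Between-subject and within-subject mean squares of the one-way ANOVA
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(test, retest, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def mdc95(test, retest):
    """Minimal detectable change at the 95% level (assumed definition)."""
    pooled_sd = stdev(list(test) + list(retest))
    sem = pooled_sd * sqrt(1.0 - icc_oneway(test, retest))  # standard error of measurement
    return 1.96 * sqrt(2.0) * sem


# Made-up T1 values (ms) for five hypothetical subjects, two sessions each
test_t1 = [950.0, 1010.0, 980.0, 1040.0, 1000.0]
retest_t1 = [960.0, 1000.0, 995.0, 1030.0, 1010.0]

low, high = ci_mean_difference(test_t1, retest_t1, t_crit=2.776)
print(f"CI of mean difference: [{low:.1f}, {high:.1f}] ms")
print(f"ICC: {icc_oneway(test_t1, retest_t1):.2f}")
print(f"MDC: {mdc95(test_t1, retest_t1):.1f} ms")
```

The interval on the mean difference shrinks with the number of subjects, whereas the MDC does not, which is why the former suits group comparisons and the latter suits follow-up of a single patient.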
Finally, the assessment of the repeatability needs to be adapted to the study goals. Indeed, the ICC depends on the sample homogeneity. Therefore, if the goal is to differentiate between the microstructure of healthy subjects, including patients in the sample will artificially increase the between-subjects variability and overestimate the ICC. In this study, we can confidently assert that the ICC is lower (and the MDC is higher) than it would have been for a sample that includes patients and controls. Therefore, if the goal is to distinguish between pathological cases, we recommend including the different types of tissue (healthy and pathological tissues, with different stages of the disease) in the cohort. This way, the MDC and ICC would integrate the associated between-subjects variability.
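The dependence of the ICC on sample homogeneity can be illustrated with its variance-ratio interpretation, in which the ICC is the share of total variance due to true between-subject differences. This is a simplified model assumed for illustration, and the function name `expected_icc` and the numerical values are hypothetical.

```python
def expected_icc(between_subject_sd: float, error_sd: float) -> float:
    """Expected ICC under a simple model: true subject value plus independent error."""
    between_var = between_subject_sd ** 2
    return between_var / (between_var + error_sd ** 2)


# Same measurement error, increasingly heterogeneous samples
for sd_between in (1.0, 2.0, 4.0):
    print(sd_between, round(expected_icc(sd_between, error_sd=1.0), 2))
```

With an unchanged measurement error, widening the between-subject spread (for example by mixing patients with controls) raises the ICC, which is why the index should be estimated on a sample that matches the intended application.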
4.5. Data sharing
Due to IRB restrictions, not all data used here could be publicly shared. However, we obtained specific consent for sharing MRI data from four young volunteers. Three of them were part of the tested and retested group. Along with those datasets, we provide the batch scripts used to produce the myelin-sensitive metric maps and to register them to the spinal cord template and white matter atlas. Also available is a Microsoft Excel spreadsheet gathering all results of the metric estimations within each region of interest for every scan session and every volunteer of the cohort. The 1st tab of the sheet corresponds to the tested and retested cohort only (n = 16), and the 2nd tab corresponds to the whole cohort (n = 33). Finally, also shared are the scripts to extract these metric values, to compute the statistical indices for reliability assessment and to produce the figures presented in this work. All these data and code are available at: https://osf.io/ezmrj/.
In this study, we assessed the repeatability and distribution of myelin-sensitive metrics (T1, MTR, MTsat and MTV) in the spinal cord. T1 and MTV (1 – proton density) showed the best reliability regarding the inter-subject variations, but the measurement error remains too large to detect differences between healthy individuals. T1, MTR and MTV showed trends consistent with the hypothesis of demyelination with aging, but again the differences were not large enough to be distinguishable from measurement errors, or to be significant.
This study used a range of statistical tools to explore the differences between myelin-sensitive metrics. We show that even though statistically significant differences can be reported using standard statistical tests, an important proportion of these differences can be attributed to measurement error. In particular, the coefficient of variation is a misleading index when comparing metrics with different units, and we recommend using the MDC when comparing individual measurements, and the 95% confidence interval of the test-retest difference when comparing groups. The indexes explored in this study allow for a fair comparison of qMRI metrics across studies, MRI vendors and sites, leading toward standardizing the field of myelin imaging and increasing its clinical relevance.
- 1. Bot JCJ, Blezer ELA, Kamphorst W, Nijeholt GJLà, Ader HJ, Castelijns JA, et al. (2004) The Spinal Cord in Multiple Sclerosis: Relationship of High-Spatial-Resolution Quantitative MR Imaging Findings to Histopathologic Results. Radiology 233: 531–540. pmid:15385682
- 2. Mottershead JP, Schmierer K, Clemence M, Thornton JS, Scaravilli F, Barker GJ, et al. (2003) High field MRI correlates of myelin content and axonal density in multiple sclerosis. Journal of Neurology 250: 1293–1301. pmid:14648144
- 3. Schmierer K, Scaravilli F, Altmann DR, Barker GJ, Miller DH (2004) Magnetization transfer ratio and myelin in postmortem multiple sclerosis brain. Annals of Neurology 56: 407–415. pmid:15349868
- 4. Stüber C, Morawski M, Schäfer A, Labadie C, Wähnert M, Leuze C, et al. (2014) Myelin and iron concentration in the human brain: A quantitative study of MRI contrast. NeuroImage 93, Part 1: 95–106.
- 5. Fukunaga M, Li T-Q, van Gelderen P, de Zwart JA, Shmueli K, Yao B, et al. (2010) Layer-specific variation of iron content in cerebral cortex as a source of MRI contrast. Proceedings of the National Academy of Sciences 107: 3834–3839.
- 6. Helms G, Dathe H, Kallenberg K, Dechent P (2008) High-resolution maps of magnetization transfer with inherent correction for RF inhomogeneity and T1 relaxation obtained from 3D FLASH MRI. Magnetic Resonance in Medicine 60: 1396–1407. pmid:19025906
- 7. Stikov N, Keenan KE, Pauly JM, Smith RL, Dougherty RF, Gold GE (2011) Cross-relaxation imaging of human articular cartilage. Magnetic Resonance in Medicine 66: 725–734. pmid:21416504
- 8. Schmierer K, Tozer DJ, Scaravilli F, Altmann DR, Barker GJ, Tofts PS, et al. (2007) Quantitative magnetization transfer imaging in postmortem multiple sclerosis brain. Journal of Magnetic Resonance Imaging 26: 41–51. pmid:17659567
- 9. Norton WT, Autilio LA (1966) The lipid composition of purified bovine brain myelin. Journal of Neurochemistry 13: 213–222. pmid:5937889
- 10. Laule C, Vavasour IM, Kolind SH, Li DKB, Traboulsee TL, Moore GRW, et al. (2007) Magnetic Resonance Imaging of Myelin. Neurotherapeutics 4: 460–484. pmid:17599712
- 11. Neeb H, Zilles K, Shah NJ (2006) A new method for fast quantitative mapping of absolute water content in vivo. NeuroImage 31: 1156–1168. pmid:16650780
- 12. Whittall KP, Mackay AL, Graeb DA, Nugent RA, Li DKB, Paty DW (1997) In vivo measurement of T2 distributions and water contents in normal human brain. Magnetic Resonance in Medicine 37: 34–43. pmid:8978630
- 13. Volz S, Nöth U, Deichmann R (2012) Correction of systematic errors in quantitative proton density mapping. Magnetic Resonance in Medicine 68: 74–85. pmid:22144171
- 14. Volz S, Nöth U, Jurcoane A, Ziemann U, Hattingen E, Deichmann R (2012) Quantitative proton density mapping: correcting the receiver sensitivity bias via pseudo proton densities. NeuroImage 63: 540–552. pmid:22796988
- 15. Abbas Z, Gras V, Möllenhoff K, Keil F, Oros-Peusquens A-M, Shah NJ (2014) Analysis of proton-density bias corrections based on T1 measurement for robust quantification of water content in the brain at 3 Tesla. Magnetic Resonance in Medicine 72: 1735–1745. pmid:24436248
- 16. Abbas Z, Gras V, Möllenhoff K, Oros-Peusquens A-M, Shah NJ (2015) Quantitative water content mapping at clinically relevant field strengths: A comparative study at 1.5 T and 3 T. NeuroImage 106: 404–413. pmid:25463455
- 17. Noterdaeme O, Anderson M, Gleeson F, Brady M (2009) Intensity correction with a pair of spoiled gradient recalled echo images. Physics in Medicine and Biology 54: 3473. pmid:19436101
- 18. Wehrli FW, Breger RK, MacFall JR, Daniels DL, Haughton VM, Charles HC, et al. (1985) Quantification of Contrast in Clinical MR Brain Imaging at High Magnetic Field. Investigative Radiology 20: 360–369. pmid:4044176
- 19. Farace P, Pontalti R, Cristoforetti L, Antolini R, Scarpa M (1997) An automated method for mapping human tissue permittivities by MRI in hyperthermia treatment planning. Physics in Medicine and Biology 42: 2159. pmid:9394404
- 20. Gutteridge S, Ramanathan C, Bowtell R (2002) Mapping the absolute value of M0 using dipolar field effects. Magnetic Resonance in Medicine 47: 871–879. pmid:11979565
- 21. Ernst T, Kreis R, Ross BD (1993) Absolute Quantitation of Water and Metabolites in the Human Brain. I. Compartments and Water. Journal of Magnetic Resonance, Series B 102: 1–8.
- 22. Danielsen ER, Henriksen O (1994) Absolute quantitative proton NMR spectroscopy based on the amplitude of the local water suppression pulse. Quantification of brain water and metabolites. NMR in Biomedicine 7: 311–318. pmid:7718431
- 23. Helms G (2000) A precise and user-independent quantification technique for regional comparison of single volume proton MR spectroscopy of the human brain. NMR in Biomedicine 13: 398–406. pmid:11114063
- 24. Mezer A, Rokem A, Berman S, Hastie T, Wandell BA (2016) Evaluating quantitative proton-density-mapping methods. Human Brain Mapping 37: 3623–3635. pmid:27273015
- 25. Mezer A, Yeatman JD, Stikov N, Kay KN, Cho N-J, Dougherty RF, et al. (2013) Quantifying the local tissue volume and composition in individual brains with magnetic resonance imaging. Nat Med 19: 1667–1672. pmid:24185694
- 26. Mackay A, Whittall K, Adler J, Li D, Paty D, Graeb D (1994) In vivo visualization of myelin water in brain by magnetic resonance. Magnetic Resonance in Medicine 31: 673–677. pmid:8057820
- 27. Laule C, Kozlowski P, Leung E, Li DKB, MacKay AL, Moore GRW (2008) Myelin water imaging of multiple sclerosis at 7 T: Correlations with histopathology. NeuroImage 40: 1575–1580. pmid:18321730
- 28. Vavasour IM, Laule C, Li DKB, Oger J, Moore GRW, Traboulsee A, et al. (2009) Longitudinal changes in myelin water fraction in two MS patients with active disease. Journal of the Neurological Sciences 276: 49–53. pmid:18822435
- 29. Feinberg DA, Oshio K (1991) GRASE (gradient-and spin-echo) MR imaging: a new fast clinical imaging technique. Radiology 181: 597–602. pmid:1924811
- 30. Prasloski T, Rauscher A, MacKay AL, Hodgson M, Vavasour IM, Laule C, et al. (2012) Rapid whole cerebrum myelin water imaging using a 3D GRASE sequence. NeuroImage 63: 533–539. pmid:22776448
- 31. Emil Ljungberg IV, Roger Tam, Youngjin Yoo, Alexander Rauscher, David Li, Anthony Traboulsee, Alex MacKay, Shannon Kolind. Rapid Myelin Water Imaging in Human Cervical Spinal Cord; 2016 Tuesday, May 10, 2016 Singapore, Singapore.
- 32. Pitt D, Boster A, Pei W, et al. (2010) Imaging cortical lesions in multiple sclerosis with ultra–high-field magnetic resonance imaging. Archives of Neurology 67: 812–818. pmid:20625086
- 33. Mainero C, Louapre C, Govindarajan ST, Giannì C, Nielsen AS, Cohen-Adad J, et al. (2015) A gradient in cortical pathology in multiple sclerosis by in vivo quantitative 7 T imaging. Brain 138: 932–945. pmid:25681411
- 34. Cohen-Adad J, Benner T, Greve D, Kinkel RP, Radding A, Fischl B, et al. (2011) In vivo evidence of disseminated subpial T2* signal changes in multiple sclerosis at 7 T: A surface-based analysis. NeuroImage 57: 55–62. pmid:21511042
- 35. Lee J, Shmueli K, Kang B-T, Yao B, Fukunaga M, van Gelderen P, et al. (2012) The contribution of myelin to magnetic susceptibility-weighted contrasts in high-field MRI of the brain. NeuroImage 59: 3967–3975. pmid:22056461
- 36. Cohen-Adad J, Polimeni JR, Helmer KG, Benner T, McNab JA, Wald LL, et al. (2012) T2* mapping and B0 orientation-dependence at 7 T reveal cyto- and myeloarchitecture organization of the human cortex. NeuroImage 60: 1006–1014. pmid:22270354
- 37. Spees WM, Yablonskiy DA, Oswood MC, Ackerman JJH (2001) Water proton MR properties of human blood at 1.5 Tesla: Magnetic susceptibility, T1, T2, T *2, and non-Lorentzian signal behavior. Magnetic Resonance in Medicine 45: 533–542. pmid:11283978
- 38. Li D, Wang Y, Waight DJ (1998) Blood oxygen saturation assessment in vivo using T2 * estimation. Magnetic Resonance in Medicine 39: 685–690. pmid:9581597
- 39. Alsop D, de Bazelaire C, Garcia D, Duhamel G. Inhomogenous magnetization transfer imaging: a potentially specific marker for myelin; 2005; Miami, Florida, USA. pp. 2224.
- 40. Alsop D, Dandamudi R, Bakshi R. Inhomogeneous magnetization transfer imaging of myelin concentration in multiple sclerosis; 2007. pp. 2188.
- 41. Duhamel GLT, A; Prevost, V; Varma, G; Guye, M; Ranjeva, JP; Pelletier, J; Alsop, DC; Girard, OM. Magnetization transfer from inhomogeneously broadened lines (ihMT): application on multiple sclerosis; 2015 2015, June 3rd; Toronto, ON, Canada. pp. 4346.
- 42. Kessler LG, Barnhart HX, Buckler AJ, Choudhury KR, Kondratovich MV, Toledano A, et al. (2015) The emerging science of quantitative imaging biomarkers terminology and definitions for scientific studies and regulatory submissions. Statistical Methods in Medical Research 24: 9–26. pmid:24919826
- 43. Stroman PW, Wheeler-Kingshott C, Bacon M, Schwab JM, Bosma R, Brooks J, et al. (2014) The current state-of-the-art of spinal cord imaging: Methods. NeuroImage 84: 1070–1081. pmid:23685159
- 44. Taso M, Girard OM, Duhamel G, Le Troter A, Feiweier T, Guye M, et al. (2016) Tract-specific and age-related variations of the spinal cord microstructure: a multi-parametric MRI study using diffusion tensor imaging (DTI) and inhomogeneous magnetization transfer (ihMT). NMR in Biomedicine 29: 817–832. pmid:27100385
- 45. Berry I, Barker GJ, Barkhof F, Campi A, Dousset V, Franconi J-M, et al. (1999) A multicenter measurement of magnetization transfer ratio in normal white matter. Journal of Magnetic Resonance Imaging 9: 441–446. pmid:10194715
- 46. Smith SA, Jones CK, Gifford A, Belegu V, Chodkowski B, Farrell JAD, et al. (2010) Reproducibility of tract-specific magnetization transfer and diffusion tensor imaging in the cervical spinal cord at 3 tesla. NMR in Biomedicine 23: 207–217. pmid:19924726
- 47. Grussu F, Schneider T, Zhang H, Alexander DC, Wheeler–Kingshott CAM (2015) Neurite orientation dispersion and density imaging of the healthy cervical spinal cord in vivo. NeuroImage 111: 590–601. pmid:25652391
- 48. Smith AK, Dortch RD, Dethrage LM, Smith SA (2014) Rapid, high-resolution quantitative magnetization transfer MRI of the human spinal cord. NeuroImage 95: 106–116. pmid:24632465
- 49. Carter R, Lubinsky J (2015) Rehabilitation research: principles and applications: Elsevier Health Sciences.
- 50. Lexell JE, Downham DY (2005) How to Assess the Reliability of Measurements in Rehabilitation. American Journal of Physical Medicine & Rehabilitation 84: 719–723.
- 51. Bashardoust Tajali S, MacDermid JC, Grewal R, Young C (2016) Reliability and Validity of Electro-Goniometric Range of Motion Measurements in Patients with Hand and Wrist Limitations. The Open Orthopaedics Journal 10: 190–205. pmid:27398107
- 52. James S, Ziviani J, Ware RS, Boyd RN (2016) Test–retest Reproducibility of the Assessment of Motor and Process Skills in Children with Unilateral Cerebral Palsy. Physical & Occupational Therapy In Pediatrics 36: 144–154.
- 53. Sakzewski L, Lewis M, Ziviani J (2016) Test–retest reproducibility of the Assessment of Motor and Process Skills for school-aged children with acquired brain injuries. Scandinavian Journal of Occupational Therapy: 1–6.
- 54. De Leener B, Lévy S, Dupont SM, Fonov VS, Stikov N, Louis Collins D, et al. (2016) SCT: Spinal Cord Toolbox, an open-source software for processing spinal cord MRI data. NeuroImage.
- 55. Fonov VS, Le Troter A, Taso M, De Leener B, Lévêque G, Benhamou M, et al. (2014) Framework for integrated MRI average of the spinal cord white and gray matter: The MNI–Poly–AMU template. NeuroImage 102, Part 2: 817–827.
- 56. Lévy S, Benhamou M, Naaman C, Rainville P, Callot V, Cohen-Adad J (2015) White matter atlas of the human spinal cord with estimation of partial volume effect. NeuroImage 119: 262–271. pmid:26099457
- 57. Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet 327: 307–310.
- 58. Shrout PE, Fleiss JL (1979) Intraclass correlations: uses in assessing rater reliability. Psychological bulletin 86: 420. pmid:18839484
- 59. McGraw KO, Wong SP (1996) Forming inferences about some intraclass correlation coefficients. Psychological methods 1: 30.
- 60. Fleiss J (1986) Book Reviews. Journal of Applied Statistics 13: 231–231.
- 61. Cicchetti DV (1994) Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment 6: 284–290.
- 62. Chinn S (1991) Statistics in respiratory medicine. 2. Repeatability and method comparison. Thorax 46: 454–456. pmid:1858087
- 63. Stratford PW (2004) Getting more from the literature: estimating the standard error of measurement from reliability studies. Physiotherapy Canada 56: 27–30.
- 64. Nakanishi R, Goto J, Ezure H, Motoura H, Ayabe S-i, Atsumi T (2004) Morphometric Analyses of Axons in the Human Lateral Corticospinal Tract: Cervical/Lumbar Level Comparison and Relation to the Ageing Process. Okajimas Folia Anatomica Japonica 81: 1–4. pmid:15248559
- 65. Ohnishi A, O'Brien PC, Okazaki H, Dyck PJ (1976) Morphometry of myelinated fibers of fasciculus gracilis of man. Journal of the Neurological Sciences 27: 163–172. pmid:1249584
- 66. Terao S-i, Sobue G, Hashizume Y, Shimada N, Mitsuma T (1994) Age-related changes of the myelinated fibers in the human corticospinal tract: a quantitative analysis. Acta Neuropathologica 88: 137–142. pmid:7985494
- 67. Nijeholt GJLà, Bergers E, Kamphorst W, Bot J, Nicolay K, Castelijns JA, et al. (2001) Post-mortem high-resolution MRI of the spinal cord in multiple sclerosis: A correlative study with conventional MRI, histopathology and clinical phenotype. Brain 124: 154–166. pmid:11133795
- 68. Stikov N, Boudreau M, Levesque IR, Tardif CL, Barral JK, Pike GB (2015) On the accuracy of T1 mapping: Searching for common ground. Magnetic Resonance in Medicine 73: 514–522. pmid:24578189
- 69. Duval T, Lévy S, Stikov N, Campbell J, Mezer A, Witzel T, et al. (2017) g-Ratio weighted imaging of the human spinal cord in vivo. NeuroImage 145, Part A: 11–23.
- 70. Smith SA, Edden RAE, Farrell JAD, Barker PB, Van Zijl PCM (2008) Measurement of T1 and T2 in the cervical spinal cord at 3 tesla. Magnetic Resonance in Medicine 60: 213–219. pmid:18581383
- 71. Duval T, Lévy S, Stikov N, Campbell J, Mezer A, Witzel T, et al. g-Ratio weighted imaging of the human spinal cord in vivo. NeuroImage.
- 72. Samson RS, Ciccarelli O, Kachramanoglou C, Brightman L, Lutti A, Thomas DL, et al. (2013) Tissue- and column-specific measurements from multi-parameter mapping of the human cervical spinal cord at 3 T. NMR in Biomedicine 26: 1823–1830. pmid:24105923
- 73. Yiannakas MC, Kearney H, Samson RS, Chard DT, Ciccarelli O, Miller DH, et al. (2012) Feasibility of grey matter and white matter segmentation of the upper cervical cord in vivo: A pilot study with application to magnetisation transfer measurements. NeuroImage 63: 1054–1059. pmid:22850571
- 74. Hickman SJ, Hadjiprocopis A, Coulon O, Miller DH, Barker GJ (2004) Cervical spinal cord MTR histogram analysis in multiple sclerosis using a 3D acquisition and a B-spline active surface segmentation technique. Magnetic Resonance Imaging 22: 891–895. pmid:15234459
- 75. Rovaris M, Judica E, Ceccarelli A, Ghezzi A, Martinelli V, Comi G, et al. (2008) Absence of diffuse cervical cord tissue damage in early, non-disabling relapsing-remitting MS: a preliminary study. Multiple Sclerosis Journal 14: 853–856. pmid:18611991
- 76. Ropele S, Filippi M, Valsasina P, Korteweg T, Barkhof F, Tofts PS, et al. (2005) Assessment and correction of B1-induced errors in magnetization transfer ratio measurements. Magnetic Resonance in Medicine 53: 134–140. pmid:15690512
- 77. Tang Y, Nyengaard JR, Pakkenberg B, Gundersen HJG (1997) Age-Induced White Matter Changes in the Human Brain: A Stereological Investigation. Neurobiology of Aging 18: 609–615. pmid:9461058
- 78. Ge Y, Grossman RI, Babb JS, Rabin ML, Mannon LJ, Kolson DL (2002) Age-Related Total Gray Matter and White Matter Changes in Normal Adult Brain. Part II: Quantitative Magnetization Transfer Ratio Histogram Analysis. American Journal of Neuroradiology 23: 1334–1341. pmid:12223374
- 79. Barrick TR, Charlton RA, Clark CA, Markus HS (2010) White matter structural decline in normal ageing: A prospective longitudinal study using tract-based spatial statistics. NeuroImage 51: 565–577. pmid:20178850
- 80. Likitjaroen Y, Meindl T, Friese U, Wagner M, Buerger K, Hampel H, et al. (2012) Longitudinal changes of fractional anisotropy in Alzheimer’s disease patients treated with galantamine: a 12-month randomized, placebo-controlled, double-blinded study. European Archives of Psychiatry and Clinical Neuroscience 262: 341–350. pmid:21818628
- 81. Teipel SJ, Meindl T, Wagner M, Stieltjes B, Reuter S, Hauenstein K-H, et al. (2009) Longitudinal changes in fiber tract integrity in healthy aging and mild cognitive impairment: a DTI follow-up study. Journal of Alzheimer's Disease 22: 507–522.
- 82. Kochunov P, Thompson PM, Lancaster JL, Bartzokis G, Smith S, Coyle T, et al. (2007) Relationship between white matter fractional anisotropy and other indices of cerebral health in normal aging: Tract-based spatial statistics study of aging. NeuroImage 35: 478–487. pmid:17292629
- 83. Wang K, Song Q, Zhang F, Chen Z, Hou C, Tang Y, et al. (2014) Age-related changes of the diffusion tensor imaging parameters of the normal cervical spinal cord. European Journal of Radiology 83: 2196–2202. pmid:25287960
- 84. Chan T-Y, Li X, Mak K-C, Cheung J-y, Luk K-K, Hu Y (2015) Normal values of cervical spinal cord diffusion tensor in young and middle-aged healthy Chinese. European Spine Journal 24: 2991–2998. pmid:26208941
- 85. Agosta F, Laganà M, Valsasina P, Sala S, Dall'Occhio L, Sormani MP, et al. (2007) Evidence for cervical cord tissue disorganisation with aging by diffusion tensor MRI. NeuroImage 36: 728–735. pmid:17490894
- 86. Zhang H, Schneider T, Wheeler-Kingshott CA, Alexander DC (2012) NODDI: Practical in vivo neurite orientation dispersion and density imaging of the human brain. NeuroImage 61: 1000–1016. pmid:22484410
- 87. Assaf Y, Blumenfeld-Katzir T, Yovel Y, Basser PJ (2008) Axcaliber: A method for measuring axon diameter distribution from diffusion MRI. Magnetic Resonance in Medicine 59: 1347–1354. pmid:18506799
- 88. Assaf Y, Basser PJ (2005) Composite hindered and restricted model of diffusion (CHARMED) MR imaging of the human brain. NeuroImage 27: 48–58. pmid:15979342
- 89. Duval T, McNab JA, Setsompop K, Witzel T, Schneider T, Huang SY, et al. (2015) In vivo mapping of human spinal cord microstructure at 300mT/m. NeuroImage 118: 494–507. pmid:26095093
- 90. Mangeat G, Govindarajan ST, Mainero C, Cohen-Adad J (2015) Multivariate combination of magnetization transfer, T2* and B0 orientation to study the myelo-architecture of the in vivo human cortex. NeuroImage 119: 89–102. pmid:26095090
- 91. Lévy S, Khatibi A, Mangeat G, Chen J-I, Martinu K, Rainville P, et al. (2017) Statistical combinations of T1, MTR, MTsat and Macromolecular Tissue Volume to improve myelin content estimation in the human spinal cord at 3T. April 26, 2017; Honolulu, USA.