Diffusion-weighted MRI distinguishes Parkinson disease from the parkinsonian variant of multiple system atrophy: A systematic review and meta-analysis

Background: Putaminal diffusivity in brain magnetic resonance diffusion-weighted imaging (DWI) is increased in patients with the parkinsonian variant of multiple system atrophy (MSA-P) compared to Parkinson disease (PD) patients.
Purpose: We performed a systematic review and meta-analysis to evaluate the diagnostic accuracy of DWI to distinguish MSA-P from PD.
Methods: Studies on DWI were identified through a systematic PubMed and Clarivate Analytics® Web of Science® Core Collection search. Papers were selected based on stringent inclusion criteria; the minimum requirement was the inclusion of MSA-P and PD patients and either documented true positive, true negative, false positive and false negative rates or overall sample size with reported sensitivity and specificity. Meta-analysis was performed using the hierarchical summary receiver operating characteristics curve approach.
Results: The database search yielded 1678 results, of which 9 studies were deemed relevant. Diagnostic accuracy of putaminal diffusivity measurements was reported in all 9 studies, whereas results for other regions of interest were reported only irregularly. Therefore, a meta-analysis could only be performed for putaminal diffusivity measurements: 127 patients with MSA-P, 262 patients with PD and 70 healthy controls were included in the quantitative synthesis. The meta-analysis showed an overall sensitivity of 90% (95% confidence interval (CI): 76.7%-95.8%) and an overall specificity of 93% (95% CI: 80.0%-97.7%) to distinguish MSA-P from PD based on putaminal diffusivity.
Conclusion: Putaminal diffusivity yields high sensitivity and specificity to distinguish clinically diagnosed patients with MSA-P from PD. The confidence intervals indicate substantial variability. Further multicenter studies with harmonized protocols are warranted, particularly in early disease stages when clinical diagnosis is less certain.


Introduction
Parkinson disease (PD) and multiple system atrophy (MSA) are both progressive, neurodegenerative synucleinopathies. Depending on the predominant motor deficits, MSA is sub-divided into a parkinsonian (MSA-P) and a cerebellar (MSA-C) variant. Because MSA-P and PD share several signs and symptoms, they may be mistaken for one another on clinical examination [1] with diagnostic error rates at the first clinical visit reaching 24%. [2] Thus, an early and reliable diagnostic marker is a major unmet medical need. In recent years, several brain magnetic resonance imaging (MRI) features have been described as specific for MSA and as helpful in the differential diagnosis of parkinsonian syndromes. These include atrophy of the putamen, pons, cerebellum and middle cerebellar peduncle (MCP), a dilated fourth ventricle, and various signal intensity alterations on routine MRI in MSA [3][4][5] whereas conventional MRI is typically normal in PD. Diffusion-weighted imaging (DWI) is of particular interest since it may serve as a quantifiable surrogate marker of neurodegeneration in MSA patients. [6] In fact, increased putaminal diffusivity in DWI is considered a common and diagnostically valuable finding in patients with MSA. [7,8] Here, we present a systematic review and meta-analysis of the diagnostic accuracy of DWI in distinguishing MSA-P from PD.

Patients and methods
Studies on DWI were identified by two raters (SB, FK) through a systematic PubMed and Clarivate Analytics® Web of Science® Core Collection search. The following search term was used: ("multiple system atrophy" OR MSA OR "olivopontocerebellar atrophy" OR OPCA OR "striatonigral degeneration" OR SND OR "Shy-Drager syndrome") AND ("magnetic resonance imaging" OR MRI OR diffusion* OR diffusivity* OR DWI OR DTI) (S1 File, Search strategy). The term diffusivity used in this article includes Trace(D), averaged ADCs and mean diffusivity (MD). Full papers published from March 1986 through June 29, 2017 were considered. For further analysis, papers had to satisfy the following predefined eligibility criteria: (1) Papers were required to be published in English or German. (2) MSA-P and PD patients were included in the study. (3) Studies were required to report either true positive, true negative, false positive and false negative rates or overall sample size together with sensitivity and specificity values. Our meta-analysis complied with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [9] (S1 Checklist, Prisma Checklist).
The risk of bias in individual studies and across studies was assessed with a tool for the quality assessment of studies of diagnostic accuracy (QUADAS) [10] (S1 Table, Quadas). The rating was performed by two independent raters (SB, FK), and discordant ratings were resolved in a discussion between the two initial raters and one additional uninvolved senior investigator. The QUADAS questionnaire includes fourteen items covering the following issues: reference standard, covered patient spectrum, verification bias, disease progression bias, review bias, incorporation bias, clinical review bias, test execution, indeterminate results and study withdrawals. Data extraction was done for each paper by the two independent investigators. For statistical analysis the following data were extracted from each of the studies: (1) number of participants in each group, (2) sensitivity and specificity or, alternatively, true positive, true negative, false positive and false negative rates. Overall sensitivity and specificity were calculated using the hierarchical summary receiver operating characteristics (HSROC) curve approach as described previously [11]. In addition, we provide both a summary estimate with its 95% confidence region and a forecast of the sensitivity and specificity with its 95% prediction region. In this method, the relationship between logit-transformed sensitivity and specificity in each study is quantified by the log diagnostic odds ratio (OR), and the results are used to estimate a summary ROC curve. Between-study heterogeneity was assessed by the I² statistic, a measure of the degree of inconsistency across studies that describes the percentage of total variation attributable to heterogeneity rather than chance. I² values up to 30%-40% are considered as low and up to 50%-60% as moderate heterogeneity. [
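The per-study quantities that feed such an analysis can be illustrated with a minimal Python sketch. It is not the HSROC model itself (which requires fitting a hierarchical model) and not necessarily the computation performed by the software used here; it only shows how sensitivity, specificity and the log diagnostic odds ratio follow from a 2x2 table, and how the generic I² statistic is derived from Cochran's Q. The function names, the 0.5 continuity correction and the example counts are illustrative assumptions, not data from the included studies.

```python
import math


def accuracy_from_counts(tp, fp, fn, tn, cc=0.5):
    """Return (sensitivity, specificity, log diagnostic odds ratio).

    Assumption: a continuity correction `cc` is added to every cell,
    but only when at least one cell is zero, so the logits stay finite.
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)

    def logit(p):
        return math.log(p / (1 - p))

    # log DOR = logit(sensitivity) + logit(specificity)
    return sens, spec, logit(sens) + logit(spec)


def i_squared(estimates, variances):
    """Generic I^2 in percent, from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical study: 18 true positives, 2 false positives,
# 2 false negatives, 28 true negatives.
sens, spec, log_dor = accuracy_from_counts(18, 2, 2, 28)
```

For the hypothetical table above, the diagnostic odds ratio is (18 × 28) / (2 × 2) = 126, so `log_dor` equals ln(126).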

Results
A total of 1678 papers were identified by the initial PubMed and Clarivate Analytics® Web of Science® Core Collection search. After review of the abstracts and removal of 1118 duplicates, 109 publications were selected for further review of the full texts. Only 9 studies satisfied the predefined criteria and were deemed relevant. A detailed flow chart of the review process is shown in Fig 1. The characteristics of the nine studies [13][14][15][16][17][18][19][20][21] included in this meta-analysis are presented in Table 1.
A sufficient number of studies to conduct a meta-analysis was published only for overall putaminal diffusivity measurements. Data from 127 MSA-P patients and 262 PD patients were analysed. Overall sensitivity was 90% (95% confidence interval: 76.7%-95.8%) and overall specificity was 93% (95% confidence interval: 80.0%-97.7%) for discriminating MSA-P from PD patients (Fig 2). Excellent positive and negative likelihood ratios of 12.43 (3.97-38.92) and 0.11 (0.05-0.28), respectively, were observed. There was substantial between-study heterogeneity, as suggested by I² values of 66.13 and 78.82 for sensitivity and specificity, respectively.
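As an illustration of how likelihood ratios relate to sensitivity and specificity, the following Python sketch applies the standard definitions to the rounded summary estimates. The function name is an assumption made for this example. Because it uses rounded inputs, it gives a positive likelihood ratio of about 12.9 rather than the reported 12.43, which was presumably derived from the unrounded model estimates; the negative likelihood ratio of about 0.11 matches the reported value.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)   # LR+ = sens / (1 - spec)
    lr_neg = (1 - sensitivity) / specificity   # LR- = (1 - sens) / spec
    return lr_pos, lr_neg


# Rounded summary estimates from the text: sensitivity 90%, specificity 93%.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.93)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```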
Results of DWI measurements in five additional brain regions were reported in the literature. Nicoletti et al. were able to discriminate MSA-P from PD with a sensitivity and specificity of 100% based on measuring diffusivity in the MCP. [14] Following this approach, Chung et al. were able to replicate the excellent specificity but found a lower sensitivity to differentiate MSA-P from PD (sensitivity and specificity of 92% and 100%, respectively). [13] Analysis of the caudate nucleus yielded a sensitivity of 75% and a specificity of 94% for comparing MSA-P and PD. [14] Diffusivity was also measured in the globus pallidus, where sensitivity and specificity reached 63% and 93% for discriminating MSA-P from PD. [14] Ito et al. performed analyses in the pons to differentiate MSA-P from PD with a sensitivity and specificity of 70% each. [18] Two further studies described measurements in the cerebellum. Sensitivity ranged from 60% to 91% and specificity from 64% to 88%. [18,19] Moreover, magnetic field strength, slice thickness and interslice gap varied between studies. Two studies used a field strength of 3 T, [15,18] six studies used 1.5 T [13,14,16,17,20,21] and one used both. [19] Slice thickness varied from 2 mm to 6 mm and interslice gap varied from 0 mm to 1.5 mm. A detailed overview is provided in Table 1.
All studies used established diagnostic criteria as a reference standard. Five out of nine studies included only probable MSA according to the current Consensus Criteria; the other studies included probable and possible MSA cases. [4] Five out of nine studies (56%) reported the method of patient recruitment and six out of nine (67%) reported the blinding status.

Discussion
This meta-analysis shows that assessment of putaminal diffusivity on high-field DWI is a useful imaging technique to discriminate MSA-P from PD with overall sensitivity of 90% and overall specificity of 93%.
Putaminal diffusivity changes in MSA-P seem to correspond to prominent neuronal loss in the putamen in this disorder. Since diffusivity is based on hydrogen motility, structural damage in the putamen would lead to enhanced diffusivity [22] which can indeed be detected already in early disease stages in MSA-P patients. [16,20,23] Although normal aging may also affect diffusion tensor imaging, [24] none of the studies assessed here was confounded by age differences between study groups.
Our meta-analysis showed substantial between-study heterogeneity. Several factors might contribute to this variability: (1) Slice thickness and interslice gap varied considerably between the studies included in this meta-analysis, and it seems plausible that thinner slices and a smaller interslice gap provide better diffusivity read-outs. (2) Differences in size and placement of the regions of interest (ROIs) may also have influenced results. While some authors determined diffusivity in the whole putamen, others restricted their ROIs to the posterior putamen. A standardized placement of the ROIs could be helpful in harmonizing results among different study sites. (3) Another potential source of variability arises from the ROI placement procedure, i.e. automated atlas-based definition or manual delineation of the ROI. In the present meta-analysis eight out of nine studies used manually placed ROIs, and it remains to be studied which method provides better test-retest reliability. Magnetic field strength varied between the included studies from 1.5 T to 3 T. In total, six studies used 1.5 T [13,14,16,17,20,21], two used 3 T [15,18] and one used both field strengths [19]. However, it is unlikely that this influenced the results of our meta-analysis, since diffusion tensor measures do not depend directly on the magnetic field and can thus be directly compared between high- and low-field acquisitions. In fact, water diffusion in a given space is the same at 1.5, 3.0 and even 7.0 T [25]. It is worth mentioning that none of the patients in any study had a post-mortem confirmed diagnosis. As clinical diagnostic certainty increases with disease progression, most of the studies included patients in advanced disease stages, thus making the clinical diagnosis more reliable.
Other studies also included patients in earlier disease stages and followed them clinically for at least 1 year to optimise diagnostic certainty. [20,21] Nevertheless, we cannot rule out clinical misclassification in some instances, which is an inherent problem in clinical biomarker research in neurodegenerative parkinsonism. However, all studies analysed here used established diagnostic criteria for the diagnosis of MSA-P. [4] Three studies compared the diagnostic value of striatal ADCs or putaminal diffusivity to either dopamine D2 receptor binding IBZM-SPECT ([123-I]-iodobenzamide-single-photon emission computed tomography), [21] cardiac MIBG ([123-I]-meta-iodobenzylguanidine uptake) [17] or 18-Fluorodeoxyglucose positron emission tomography (FDG-PET). [15] Putaminal diffusivity measures were more accurate than IBZM-SPECT, cardiac MIBG and FDG-PET imaging.
In summary, DWI is easy to implement in routine MRI protocols. Based on this meta-analysis, putaminal diffusivity on DWI has excellent sensitivity and specificity in distinguishing MSA-P from PD in clinically established cases. Nevertheless, these results must be interpreted with caution. Standardized MRI protocols, harmonized DWI sequences and ROIs are needed to increase inter-scanner and inter-site comparability. Studies that directly compare different ROI placements and different ROI placement methods are another important area of future research. Finally, all studies included in this meta-analysis analysed patients with an established clinical diagnosis. Hence, multicenter imaging studies in patients with newly diagnosed parkinsonism, with harmonized MR protocols and long-term clinical follow-up, are highly warranted to determine the diagnostic accuracy of DWI in early disease stages, when clinical diagnosis is often inaccurate.