Longitudinal brain structure changes in Parkinson’s disease: A replication study

Context An existing major challenge in Parkinson’s disease (PD) research is the identification of biomarkers of disease progression. While magnetic resonance imaging is a potential source of PD biomarkers, none of the magnetic resonance imaging measures of PD are robust enough to warrant their adoption in clinical research. This study is part of a project that aims to replicate 11 PD studies reviewed in a recent survey (JAMA neurology, 78(10) 2021) to investigate the robustness of PD neuroimaging findings to data and analytical variations. Objective This study attempts to replicate the results in Hanganu et al. (Brain, 137(4) 2014) using data from the Parkinson’s Progression Markers Initiative (PPMI). Methods Using 25 PD subjects and 18 healthy controls, we analyzed the rate of change of cortical thickness and of the volume of subcortical structures, and we measured the relationship between structural changes and cognitive decline. We compared our findings to the results in the original study. Results (1) Similarly to the original study, PD patients with mild cognitive impairment (MCI) exhibited increased cortical thinning over time compared to patients without MCI in the right middle temporal gyrus, insula, and precuneus. (2) The rate of cortical thinning in the left inferior temporal and precentral gyri in PD patients correlated with the change in cognitive performance. (3) There were no group differences in the change of subcortical volumes. (4) We did not find a relationship between the change in subcortical volumes and the change in cognitive performance. Conclusion Despite important differences in the dataset used in this replication study, and despite differences in sample size, we were able to partially replicate the original results. We produced a publicly available reproducible notebook allowing researchers to further investigate the reproducibility of the results in Hanganu et al. (2014) when more data is added to PPMI.


Introduction
Parkinson's disease (PD), one of the most common neurodegenerative diseases, is often characterized by akinesia, bradykinesia, tremor, and is commonly associated with mild cognitive impairment which significantly decrease overall quality of life [1].These symptoms are accompanied by atrophy of the cortical and subcortical brain structures as well as cortical thinning [2].As a result, there has been interest in determining whether magnetic resonance imaging (MRI) measures of atrophy can be used as a biomarker of cognitive decline.Overall, gray matter atrophy and cortical thinning are present in early PD, while frontal atrophy and temporoparietal thinning are associated with cognitive impairment in PD [3].
Magnetic resonance imaging-derived measures of the structural brain changes occurring in PD have emerged as potential diagnostic and prognostic tools to understand the trajectory of PD.Structural imaging, especially regional cortical thickness and loss of gray matter volume, has been considered helpful in determining a PD diagnosis, progression prognosis, and distinguishing PD from other dementias [2].However, the need to further investigate the sensitivity, reliability, effect of confounding factors, and overall generalizability of these progression measures has been highlighted [3] and such rigorous validation is likely a factor preventing the adoption of MRI measures as outcome measures in PD clinical research.
The replication of neuroimaging findings has been challenged in multiple ways in recent years.For example, in the study of Botvinik-Nezer et al. [4], 70 independent teams were asked to analyze the same dataset using the methods of their choice to test nine ex-ante hypotheses.The study investigated the variability of the results.Results obtained across research teams did not concur on five out of the nine hypotheses, reaching agreement levels ranging from 21% to 37%.Furthermore, the identification of regional brain atrophy in PD has been of interest as a possible marker of certain symptoms of PD and of the progression of PD [2].However, studies conducted in non-PD populations have shown that estimates of regional volume [5,6] and of cortical thickness vary depending on the software toolbox [7,8].Overall, a range of factors matter in the replicability of neuroimaging findings, including computational environments [9,10], analysis tools and versions [11,12], statistical models [13], and study populations [14].
This study is a part of a reproducibility evaluation project that aims to replicate 11 structural MRI measures of PD reviewed in Mitchel et al. [2].The goal of the present study is to replicate the work by Hanganu et al. [15] to test whether prior findings regarding structural MRIderived PD biomarkers replicate in a different dataset using similar analytical methods.Hanganu et al. [15] compared the change of gray matter volume and cortical thinning over time between PD patients with mild cognitive impairment (PD-MCI), PD patients without mild cognitive impairment (PD-non-MCI), and healthy older controls (HC); and also tested the relationship between longitudinal structural changes and cognitive decline in the PD patients.They reported four main findings: (Finding 1) an increased rate of cortical thinning in PD patients with mild cognitive impairment compared to PD patients without MCI (mainly affecting the right temporal regions, insula, and inferior frontal gyrus), and compared to healthy controls (mainly in the right temporal regions and supplementary motor area);

Participants
The original study included 15 PD-non-MCI, 17 PD-MCI and 18 HC.In order to reconstruct this cohort, PD patients and HC were selected from PPMI to attempt to match the sample size and demographics of the groups in the original study.The following criteria were used to define the PD cohorts: clinical diagnosis of PD, available T1-weighted images at two research visits, Hoehn and Yahr stage I and II (the stage was stable across the two visits for each patient), testing performed at PD OFF state, available MoCA scores, and the absence of any other neurological condition.Data was collected after approval of the local ethics committees of the PPMI's participating sites.All participants provided written informed consent.This study was conducted in accordance with the Declaration of Helsinki and was exempt from the Concordia University's Research Ethics Unit.
Patients were divided into PD-MCI and PD-non-MCI groups.In the original study, MCI was diagnosed on the basis of the presence of subjective complaints of cognitive impairment, objective impairment on two or more neuropsychological tests in one domain of cognitive function and the absence of dementia.In the PPMI dataset, patients are already classified as having MCI or not using a very similar criteria for classification, and thus the existing classification was used.The diagnosis is available in the PPMI's cogstate variable which was used to determine MCI.Diagnosis of MCI in PPMI is administered at the participating sites based on clinician's experience and test outcomes in six domains: attention, memory, orientation, executive abilities, praxis, and language.
Ten PD-MCI (M age = 67.6;SD = 5.8), 15 PD-non-MCI (M age = 63.4;SD = 9.4), and 18 HC participants were selected (M age = 66.9;SD = 6.1).PD-non-MCI and HC group sample sizes match those of the original study but an insufficient number of PD-MCI patients were identified in the PPMI dataset that met all the original inclusion criteria, thus our sample is smaller than the original sample (n = 10 vs n = 17).Initially, 20 PD-MCI patients who had at least two MRI scans were identified in the PPMI.One participant was excluded for not meeting Hoehn and Yahr criterium, one for lacking MoCA score, and additional eight were not tested during PD OFF state.There were 138 PD-non-MCI patients that had at least two MRI scans.Seven subjects were excluded for not meeting Hoehn and Yahr criterion, nine for lacking MoCA score, additional 53 patients were not tested during PD OFF state.The sample that met all the criteria to match the PD-MCI sample was 69.We matched 15 PD-non-MCI patients to match the PD-MCI group.Finally, there were 67 healthy control subjects out of whom 18 were randomly selected as a main control cohort.From the remaining 49 participants another set of 18 healthy controls were randomly selected as a replication sample.No participant was excluded due to image preprocessing failure.Flow diagram of sample selection is presented in Fig 1 .Descriptive statistics are reported in Table 1.

Image acquisition and preprocessing
Structural MRI images were taken from the PPMI which uses a standardized study protocol and the following parameters: repetition time = 2.3 s, echo time = 2.98 s, inversion time = 0.9 s, slice thickness = 1 mm, number of slices = 192, field of view = 256 mm, and matrix  size = 256 × 256.However, since PPMI is a multisite project there may be slight differences in the sites' setup.Scans were acquired using different 3T scanners (Philips Achieva n = 2; Siemens Prisma fit n = 8; Siemens Prisma n = 4; Siemens Skyra n = 2; Siemens TrioTim n = 64; Siemens Verio n = 6).There were scans with echo time (TE) that diverged from the standardized protocol: one image with TE = 2.52 s, one image with TE = 1.91 s, three images with TE = 2.91 s, one image with TE = 2.93 s, four images with TE = 2.95 s, four images with TE = 2.96 s, two images with TE = 3.06s.Additionally, two images had TE = 2.94 s, TR = 6.49s, and TE = 2.91 s, TR 6.26 s.T1-weighted brain images were processed using FreeSurfer 7.1.1[17].The longitudinal preprocessing stream was used to calculate the change in cortical thinning and subcortical volumes [18].FreeSurfer's recon-all function was used for cortical reconstruction.First, all timepoints were processed cross-sectionally with the default workflow, then an unbiased template from the two timepoints was created for each subject, finally data was processed longitudinally.Specifically an unbiased within-subject template space and image [19] is created using robust, inverse consistent registration [20].Several processing steps, such as skull stripping, Talairach transforms, atlas registration as well as spherical surface maps and parcellations are then initialized with common information from the within-subject template, significantly increasing reliability and statistical power [18].The rate of change of cortical thickness between the two timepoints was calculated for each subject.Cortical thickness was smoothed with a 10 mm FWHM kernel.The original study also reported manual correction of misclassified tissue types, which was not performed in our study since the protocol for it was insufficient to replicate.The preprocessed cortical thickness data was used in the vertex-wise analyses and subcortical volumetric data was used in the subcortical analyses.

Statistical analyses
Structural brain images and Montreal Cognitive Assessment (MoCA) scores from the initial and the follow-up visits were analyzed consistently with the 4 main findings reported in the original study.(Finding 1) We tested vertex-wise differences in the change of cortical thickness between HC, PD-MCI, and PD-non-MCI groups with an ANCOVA model.(Finding 2) We tested the correlation between the change of cortical thickness and the change of MoCA scores in PD-MCI, PD-non-MCI, and PD-all (all PD patients) groups.The time between the two visits was added as a covariate in the general linear models.The rate of change of the cortical thickness was calculated with the formula: (Thickness at Time 1 -Thickness at Time 2) / (Time 2 -Time 1).Subcortical volumes were adjusted for the estimated total intracranial volume as well as the averages of the two time points using regression-based correction, in line with the original study.(Finding 3) We tested the differences in regional volume changes between the three groups using t-tests and (Finding 4) measured the correlations between the change in MoCA scores and change of the subcortical volumes and cortical thickness in each group using Pearson correlation.Whole-brain analyses were corrected using FreeSurfer's mri_glm-sim function.Cluster-wise p-value threshold was used at the p < .05level.Volumetric analyses were corrected for multiple comparisons using Bonferroni correction.Group analysis is significant at p < .007while correlational analysis at p < .003which both correspond to p < .05after the correction.Effect sizes in vertex-wise analyses represented by Pearson correlation using FreeSurfer's mri_segstats function.
was used for image preprocessing and vertex-wise analyses.We used a containerized version of FreeSurfer managed by Boutiques 0.5.25 (doi:10.5281/zenodo.3839009).The containerized FreeSurfer analyses were executed through the Slurm batch manager on the Narval cluster (https://docs.alliancecan.ca/wiki/Narval/en)hosted at Calcul Que ´bec and part of Digital Research Alliance of Canada.
The code and results are publicly available at https://github.com/LivingPark-MRI/hanganu-etal-2014.We developed a Python package (LivingPark utils, available at https:// github.com/LivingPark-MRI/livingpark-utils) to download and manipulate PPMI data directly from the original PPMI database.As a result, our notebook can be re-executed by anyone with a PPMI account.

Vertex-wise results
We found numerous group differences in the rate of change of cortical thickness.The results are reported in Table 2 and Fig 2.
PD-MCI vs. PD-non-MCI.The two patient groups differed in the rate of cortical thinning.The PD-MCI group, compared to PD-non-MCI patients, had an increased rate of cortical thinning in the right middle temporal and precentral gyri as well as right insula.PD-non-MCI group exhibited an increased thinning in the left precuneus compared to the PD-MCI group.
PD-MCI vs. HC.The HC group had increased cortical thinning in the right precentral and supramarginal gyri, left superior frontal gyrus, and bilateral superior parietal lobule, compared to the PD-MCI group.
PD-non-MCI vs. HC.The HC group had increased cortical thinning in the right precuneus as well as precentral and supramarginal gyri compared to the PD-non-MCI group.There was a significant negative correlation between the change in MoCA scores and the rate of change of the right middle frontal gyrus thickness in the PD-MCI group (p < .05).There was a positive correlation between the MoCA and the rate of change of the left inferior temporal and precentral gyri across all PD patients (p < .05).The correlations were not significant in the PD-non-MCI group.The results are reported in Table 3 and Fig

Group and region
Size (mm

Volumetric results
Group comparisons of the volumetric change of subcortical regions revealed no significant differences between the groups (ps > .05).The results are reported in Table 4. Correlation analysis did not show any significant correlation between the change of MoCA scores and the change in volume of subcortical structures.Results are reported in Table 5.

Replication results
There were several group differences in the rate of change of cortical thickness in the replication analysis with a different HC sample.The results are reported in Table 6 and Fig    Group comparisons of the volumetric change of subcortical regions in the replication analysis revealed no significant differences between the groups (ps > .05).The mean change of the subcortical volumes in HC group is close to zero which is a results of random selection of the sample.The results are reported in Table 7.

Discussion
This study attempted to replicate the results of Hanganu et al. [15], which focused on the longitudinal changes in the cortical thickness and subcortical gray matter volume in PD patients.Group differences between PD patients with and without MCI were investigated along with the relationship between the structural changes and cognition.We used the analytic methods described in the paper and applied them to a different dataset of PD patients with and without MCI, and HC.Table 7. Group differences in the volumetric change of the subcortical regions and the overall cortical thickness in the replication analysis.We have replicated the differences in the rate of cortical thinning between the PD-MCI and PD-non-MCI groups (Finding 1 in [15]), which supports the notion that these two patient groups differ in the rate of neurodegeneration.Patients with MCI had increased cortical thinning of the right MTG and insula, which overlaps with the regions reported in the original study.However, we did not replicate the group differences involving the HC group.These differences were not replicated using HC replication sample either.The discrepancies between the results with both HC results further support the notion that this original result is challenging to reproduce.

PD-MCI PD-non-MCI HC PD-MCI vs PDnon-MCI PD-MCI vs HC PD-non-MCI vs HC
We found a relationship between the decrease in cognitive performance and cortical thinning of the left precentral gyrus and ITG in PD patients.The ITG cluster partially overlaps with the area reported in the original study.Although the structural changes were not found in the exact same brain voxels, the data supports the role of the thinning of the temporal regions in PD patients' cognition (Finding 2 in [15]).Additionally, we found a relationship between the increase of the cognitive performance and cortical thinning of the right MFG in the PD-MCI group.This result was not reported in the original study and provides additional insight into potential structural changes in PD populations.This is in contrast to previous studies reporting the link between the MFG atrophy and cognitive functioning [21,22].However, our result is not entirely surprising since the relationship between the presence of a regional gray matter preservation and the severity of symptoms in dementia has been previously reported [23].Overall, we have replicated the original vertex-wise results (Findings 1 and 2 in [15]) to a certain extent.
Volumetric results (Findings 3 and 4 in [15]) were not replicated.The original study reported group differences in the change of overall cortical thickness as well as volumes of amygdala and nucleus accumbens.Hanganu et al. [15] reported a correlation between the change in cognitive performance and the change in gray matter volume in the right thalamus and amygdala in PD patients.We found no significant correlations across all subcortical regions.We did not replicate the results with the HC replication sample either but there is consistency between both HC groups.The lack of replication is not only caused by sample selection of HC group but also affected by the sample size, effect may be too small to be detected by such a small sample.
There are several factors beyond the selection of HC group that may explain the differences between the original results and the results of our replication.Most importantly, the two studies differ in the sample sizes for the PD-MCI group.We were unable to select enough PD patients with MCI that met all the inclusion criteria which reduced the chance to replicate the results.This was troublesome despite the fact that we used PPMI, one of the largest publicly available PD datasets.The PPMI is a relatively new initiative and is lacking much longitudinal data.Our Jupyter notebook remains accessible and can be re-run as new data becomes available in PPMI.Importantly, current neuroscience recommends using larger sample sizes to avoid inflation of effect sizes.Sample sizes used in research are increasing and recent data suggest that brain-wide association studies may require as many as thousands of participants to define reliable brain-behavior relationships [24].
Differences between the samples may also affect replicability.There are no established clinical measures to infer disease severity in the brain, but disease duration, UPDRS score (a measure of symptom severity), and medication use are sometimes used as proxy measures.Average disease duration was very similar in our sample compared to the original study suggesting the patients were roughly matched.However, some patients had not yet started PD medications in our sample whereas all patients in the original study were already taking dopaminergic medications, and the UPDRS score was also a few points lower in our sample compared to the original study's sample, both of which suggest that the replication sample we constructed from the PPMI cohort had slightly milder disease than the sample included in the original study.Our sample is also slightly older than the original cohort.Furthermore, despite using the same inclusion criteria as the original study, it is possible that other differences in sample characteristics may have contributed to the incomplete replication.Conclusions drawn from our data should not be expanded to different clinical populations (e.g., more severe PD patients).
Differences in neuroimaging data acquisition protocols and MRI scanners can also influence replicability.Data used in our study was acquired using different MRI scanners.Although PPMI uses a standardized protocol for data acquisition we cannot rule out the possibility that the reported differences may be related to the variability introduced by using various scanners, even though the vast majority of scans were acquired with a Siemens scanner.Ideally all participants should be scanned with the same machine but the benefits of collecting more data in a multisite project outweighs the advantages of a single-stage acquisition.
Software versions may also have impacted the results.FreeSurfer 5.3.for Centos 4 was used in the original study.This version of FreeSurfer was released 10 years ago and is no longer supported or recommended, hence we used version 7.1.1.for Centos 7 instead.Although the software version shouldn't drastically change the clinical results, it is possible that it introduced variability during preprocessing or during the statistical analysis stage.We used the mri_glmfit-sim method to perform the analysis (including cluster correction) while the original study used the QDEC (Query, Design, Estimate, Contrast) method with mri_surfcluster.Our method is more stringent but also more reliable.It might have established more reliable borders between the gray and white matters.Filip et al. [25] reported structural group differences in data analyzed with FreeSurfer 5.3.which were not replicated with version 7.1.Previous studies indicated that software version may impact structural brain analyses [7,9].We will test the impact of software variability in our future work.
We thoroughly followed the original processing pipeline reported by Hanganu et al. [15].Nevertheless, it is possible that we have missed some steps which could have influenced the results.We did not perform manual correction of misclassified brain tissue as this procedure cannot be objectively replicated.There might be slight differences in the statistical models that we were not aware of.Any of the aforementioned discrepancies from the original study could have impacted the ability to replicate the results.Once the analyses were conducted, we contacted the authors of the original study to obtain their feedback, which importantly contributed to the discussion section.
Finally, there is a negative bias in replication studies, coming from the fact that researchers conducting replications focus on following original methods rather than getting positive results.Therefore, it is expected that more negative results are reported in replications than in original studies.We encourage scientists to follow all the steps and details from the original studies instead of simply aiming to replicate positive results.
We encountered multiple challenges while attempting to replicate the study by Hanganu et al. [15].Analysis details are necessarily limited in the methods section of most papers which left us to infer some analytic steps.We also had difficulty constructing a similar cohort from publicly available PD data, which led to some differences in the patient characteristics between the original and replication samples, and to differences in sample size.We have published a Jupyter notebook that the research community can use to replicate our study.While respecting the PPMI data usage agreement, it clearly defines the criteria to define the study population (using the Pandas library), the preprocessing pipeline and software version (through containerized tools), and the statistical model that was used to obtain the results.Our notebook addresses some of the aforementioned challenges encountered during our study and can be re-run over time to update the study as more data gets added to

Fig 2 .
Fig 2. Vertex-wise group differences in the rate of change of cortical thickness.Cold colors represent increased cortical thinning in the first group compared to the second group, warm colors represent increased cortical thinning in the second group compared to the first group.Results corrected with the cluster-wise threshold (p < .05).HC, healthy controls; PD-MCI, Parkinson's disease with mild cognitive impairment; PD-non-MCI, Parkinson's disease without mild cognitive impairment.https://doi.org/10.1371/journal.pone.0295069.g002 4.

Fig 3 .Table 4 .
Fig 3. Vertex-wise correlation between the rate of change of cortical thickness and the change in Montreal Cognitive Assessment scores in the PD-all and PD-MCI groups.https://doi.org/10.1371/journal.pone.0295069.g003

Fig 4 .
Fig 4. Vertex-wise group differences in the rate of change of cortical thickness in the replication analysis.Cold colors represent increased cortical thinning in the first group compared to the second group, warm colors represent increased cortical thinning in the second group compared to the first group.Results corrected with the cluster-wise threshold (p < .05).HC, healthy controls; PD-MCI, Parkinson's disease with mild cognitive impairment; PD-non-MCI, Parkinson's disease without mild cognitive impairment.https://doi.org/10.1371/journal.pone.0295069.g004

Table 5 . Correlation between the rate of change of the subcortical volumes and the change in Montreal Cognitive Assessment scores.
L, left; PD-all, Parkinson's disease with and without mild cognitive impairment; PD-non-MCI, Parkinson's disease without mild cognitive impairment; PD-MCI, Parkinson's disease with mild cognitive impairment; R, right.The results are significant at the p < .003corresponding to p < .05after the Bonferroni correction.https://doi.org/10.1371/journal.pone.0295069.t005