Automated, High Accuracy Classification of Parkinsonian Disorders: A Pattern Recognition Approach

Progressive supranuclear palsy (PSP), multiple system atrophy (MSA) and idiopathic Parkinson’s disease (IPD) can be clinically indistinguishable, especially in the early stages, despite distinct patterns of molecular pathology. Structural neuroimaging holds promise for providing objective biomarkers for discriminating these diseases at the single subject level but all studies to date have reported incomplete separation of disease groups. In this study, we employed multi-class pattern recognition to assess the value of anatomical patterns derived from a widely available structural neuroimaging sequence for automated classification of these disorders. To achieve this, 17 patients with PSP, 14 with IPD and 19 with MSA were scanned using structural MRI along with 19 healthy controls (HCs). An advanced probabilistic pattern recognition approach was employed to evaluate the diagnostic value of several pre-defined anatomical patterns for discriminating the disorders, including: (i) a subcortical motor network; (ii) each of its component regions and (iii) the whole brain. All disease groups could be discriminated simultaneously with high accuracy using the subcortical motor network. The region providing the most accurate predictions overall was the midbrain/brainstem, which discriminated all disease groups from one another and from HCs. The subcortical network also produced more accurate predictions than the whole brain and all of its constituent regions. PSP was accurately predicted from the midbrain/brainstem, cerebellum and all basal ganglia compartments; MSA from the midbrain/brainstem and cerebellum and IPD from the midbrain/brainstem only. This study demonstrates that automated analysis of structural MRI can accurately predict diagnosis in individual patients with Parkinsonian disorders, and identifies distinct patterns of regional atrophy particularly useful for this process.


Neuroimaging data preprocessing
Structural images were segmented into different tissue types via the "new segment" tool in the SPM8 software (http://www.fil.ion.ucl.ac.uk/spm/, release 4667). [1] Rigidly aligned grey and white matter maps, down-sampled to 1.5 mm isotropic voxel size, were then used to diffeomorphically register all subjects to their common average (i.e. study-specific template), using a matching term that assumed a multinomial distribution. [2] Registration involved estimating a set of initial velocities, from which the deformations were computed by a geodesic shooting procedure. [3] As described in the main text, classification was based on a set of "scalar momentum" image features derived from this registration, which describe anatomical variability amongst subjects. [4] These comprise three components (corresponding to grey matter, white matter and other), but since they sum to zero at each voxel, they can be reduced to only two components (grey and white matter). These images contain all information necessary to reconstruct the original images (in addition to the template) and therefore provide a parsimonious representation of shape difference. The scalar momentum images for grey and white matter were spatially smoothed with an isotropic 10mm Gaussian kernel and masked anatomically to constrain them to either the whole brain, a subcortical motor network (bilateral cerebellum, midbrain/brainstem, caudate, putamen, pallidum and accumbens), or each of these regions, separately. All masks were defined anatomically using the atlas tools in the FSL software (http://www.fmrib.ox.ac.uk/fsl/, version 4.1). This choice of feature construction method and smoothing level was based on previous work, where smoothed scalar momentum images yielded greater accuracy than a range of alternative features (including Jacobian determinants, rigidly aligned grey matter, spatially normalised grey matter and Jacobian scaled spatially normalised grey matter) for predicting subject age and sex in a publicly available dataset (unpublished data). This finding was confirmed for the primary diagnostic classifier in this study (Classifier I) by comparing the accuracy obtained by a classifier based on the scalar momentum images to classifiers based on: (i) the Jacobian determinants and (ii) a concatenation of modulated grey-and white-matter images, which are both commonly used in neuroimaging (Table S1).

Pattern Recognition approach
The multi-class classification approach employed in the present work is described in detail elsewhere. [5] Briefly, class memberships were modelled probabilistically using a multinomial In (2), it can be seen that the class labels are conditionally independent of the data vectors, given the latent functions and ( | , ) denotes the posterior distribution over the latent functions. This posterior is analytically intractable but can be estimated using Markov chain Monte Carlo (MCMC) methods -the reader is referred elsewhere for full details. [5,6] All chains were thoroughly checked for convergence and efficiency as described in prior work. [5] After computation of (2), predictions were calibrated nonparametrically to compensate for the variance inflation characteristic of high-dimensional neuroimaging data. [7] Classifier assessment using cross-validation The main goal of any classification model is to estimate how well the classifier will make predictions for new data samples. As noted in the main text and is well known in the statistical literature, it is essential that this is performed using a dataset that has not been used to construct the model, otherwise performance measures may be overly optimistic. [8] Cross-validation is the standard statistical approach to estimate generalizability with moderately sized datasets and is well known to yield approximately unbiased estimates of the true generalisability. [9] In the present work, leave-one-out cross-validation was employed whereby the dataset was repeatedly repartitioned into disjoint training-and test datasets. At each iteration, a single subject from each group was excluded simultaneously (test dataset), and all model parameters were inferred from the remaining data (training dataset). Class predictions were then derived from the test dataset as described above and compared to their true values. This procedure was repeated excluding each subject once and overall performance measures were computed by averaging over all cross-validation folds. Note that all preprocessing was embedded within the cross-validation loop so a different studyspecific template was constructed for each cross-validation fold.

Significance testing of classifier assessment metrics
A Monte Carlo test was employed to assess significance of each measure of classifier performance (i.e. the sensitivity and predictive value for each class and the accuracy and overall predictive value; Figure 1, main text). This provides a non-parametric alternative to conventional approaches, suitable for multi-class problems. [10] To achieve this, 10,000 randomized confusion matrices were generated, each having identical class distributions to the true sample. Classifier assessment metrics were computed and p-values were derived by computing the proportion of permutations where the randomized statistic exceeded the true statistic.

Visualization of discriminating brain regions
To visualize the relative contribution of different brain regions to the prediction of each class, we present a multi-class generalization of an approach we have employed elsewhere for binary classification. [11][12][13] For this purpose, it is convenient to adopt an alternate but equivalent perspective for GP models known as the "weight space" view (c.f. the "function space" view presented above). [14] From this perspective, and under the assumption of linear covariance, the latent function values can be expressed as = T , where is a -dimensional vector of weights predictive of class . In a corresponding manner to the presentation above, the predictive weights can be collected by: = � 1 T , … , T � T and the predictive distribution for unseen data can be rewritten as: ( * | , , * ) = ∫ ( * | , * ) ( | , ) where ( | , ) is the posterior over the predictive weights. As is typical in neuroimaging applications, for the present application ≫ , so computing the predictive distribution from the weight-space view is computationally much more expensive than the function space view. However, this framework is still of interest because each can be visualized in the voxel space. For this purpose it is convenient to derive the weight-space representation from the function space representation. Under the MCMC approach employed here, the expected value of the predictive weights can be easily computed from the latent function values by a Monte Carlo sum, i.e.: where ( ) denotes the -th sample from the Markov chain, the + symbol denotes the (Moore-Penrose) pseudo-inverse, is the total number of steps in the chain and � is an augmented data matrix, given by block concatenating the data matrix times.

Classification performance: alternative classification features
Relative to the scalar momentum image features, the accuracy of the primary diagnostic classifier (Classifier I) was lower when either the Jacobian determinants or concatenated modulated grey and white matter were used as classification features (Table S1; c.f. Table 2, main text). Therefore these feature construction methods will not be considered further. Classification performance: whole-brain The whole-brain classifiers were not only less accurate overall relative to the classifiers trained using the subcortical motor network (Table S2), but they also had consistently lower accuracy and predictive value (PV) for each class ( Figure S1; c.f. Figure 2, main text).

Predictive weights
The patterns of predictive weights for each disease class derived from the classifier contrasting PSP, IPD, HCs and MSA using the subcortical motor network features (i.e. Classifier II described in the main text) are presented in Figure S2. These broadly show a correspondence with the accuracy derived from the regional classifiers in that regions with high accuracy individually generally show high magnitude weights (c.f. Figure 3 main text).
However, it is important to emphasize that the patterns of predictive weights given by (4) do not permit statistical inference over individual brain regions in a classical sense. Instead, they indicate the relative contribution of different voxels within the discriminative pattern to the classifier decision, which is important, given that both the predictive weights and the regional effects contribute to the prediction (3). Thus, to assist interpretation of the weights it is also desirable to consider classical statistical parametric maps, which are useful to quantify the magnitude and indicate the direction of focal effects in each brain region ( Figure   S3).  The predictive weights and statistical parametric maps for the scalar momentum features used in this work must be interpreted with some care. The scalar momentum coefficients govern the deformation of the images during registration and encode the deviations of each image from the template. [4] Informally, coefficient magnitudes can be considered to quantify the degree of regional dissimilarity between the template and image and the sign of the difference between coefficients in a pair of images is informative about whether a structure has undergone relatively greater expansion or contraction to match the template. In other words, negative coefficients may be considered to reflect atrophy relative to the template (i.e. relative to the average of the study population).
For PSP, relatively greater atrophy (with respect to the template) of the midbrain, caudate and decussations of the superior cerebellar peduncles in relation to HCs ( Figure S3) were important components of the predictive pattern and were assigned high magnitude negative weight ( Figure S2). For MSA, atrophy was widespread throughout the brainstem, cerebellar cortex and middle cerebellar peduncles ( Figure S3) where correspondingly high magnitude negative weights were distributed. The pattern for IPD is presented primarily for completeness, and no interpretation is offered since the classifier discriminating between classes using the motor network only identified IPD at trend level. Correspondingly, regional effects for IPD with respect to HCs were of low magnitude in all regions ( Figure S2).
However, as noted in the main text, the pattern of regional effects in the midbrain/brainstem -although subtle -accurately discriminated IPD from all other classes (Figure 2, main text).