Disease Severity and Progression in Progressive Supranuclear Palsy and Multiple System Atrophy: Validation of the NNIPPS – PARKINSON PLUS SCALE

Background The Natural History and Neuroprotection in Parkinson Plus Syndromes (NNIPPS) study was a large phase III randomized placebo-controlled trial of riluzole in Progressive Supranuclear Palsy (PSP, n = 362) and Multiple System Atrophy (MSA, n = 398). To assess disease severity and progression, we constructed and validated a new clinical rating scale as an ancillary study. Methods and Findings Patients were assessed at entry and 6-monthly for up to 3 years. Evaluation of the scale's psychometric properties included reliability (n = 116), validity (n = 760), and responsiveness (n = 642). Among the 85 items of the initial scale, factor analysis revealed 83 items contributing to 15 clinically relevant dimensions, including Activity of Daily Living/Mobility, Axial bradykinesia, Limb bradykinesia, Rigidity, Oculomotor, Cerebellar, Bulbar/Pseudo-bulbar, Mental, Orthostatic, Urinary, Limb dystonia, Axial dystonia, Pyramidal, Myoclonus and Tremor. All but the Pyramidal dimension demonstrated good internal consistency (Cronbach α≥0.70). Inter-rater reliability was high for the total score (Intra-class coefficient = 0.94) and 9 dimensions (Intra-class coefficient = 0.80-0.93), and moderate (Intra-class coefficient = 0.54-0.77) for 6. Correlations of the total score with other clinical measures of severity were good (rho≥0.70). The total score was significantly and linearly related to survival (p<0.0001). Responsiveness expressed as the Standardized Response Mean was high for the total score slope of change (SRM = 1.10), though higher in PSP (SRM = 1.25) than in MSA (SRM = 1.0), indicating a more rapid progression of PSP. The slope of change was constant with increasing disease severity, demonstrating good linearity of the scale throughout disease stages. Although MSA and PSP differed quantitatively on the total score at entry and on rate of progression, the relative contribution of clinical dimensions to overall severity and progression was similar.
Conclusions The NNIPPS-PPS has suitable validity, is reliable and sensitive, and therefore is appropriate for use in clinical studies with PSP or MSA. Trial Registration ClinicalTrials.gov NCT00211224


Introduction
Progressive Supranuclear Palsy (PSP) and Multiple System Atrophy (MSA), sometimes termed 'parkinson plus' syndromes, account for 10-20% of parkinsonian syndromes [1][2][3], although these figures may be an overestimate, being derived from autopsy studies. Both diseases are associated with severe disability and early death [4][5][6][7]. PSP and MSA most commonly present with an akinetic-rigid syndrome, with additional features such as dysautonomia and cerebellar features in MSA, or oculomotor, bulbar, cognitive and behavioral abnormalities in PSP [2], [8]. However, the expression of these features is variable during the evolution of these syndromes, and although some are regarded as typical of PSP (e.g., supranuclear ophthalmoplegia, dementia) or of MSA (e.g., dysautonomia, cerebellar syndrome), there is considerable overlap between the two disorders [8][9][10][11][12]. In addition, if we are to study these disorders early in their evolution, then a generic 'parkinson plus' scale is required, and such a scale should capture all important aspects of the severity of the clinical syndromes. To date, no scale designed to assess severity and disease progression over the many functional dimensions relevant to parkinson plus syndromes has been developed and fully validated. Although the Unified Parkinson's Disease Rating Scale (UPDRS) [13] has been used in PSP [11], [14] and MSA [15], [16], assessment of its metric qualities has not been completed in this population. While the PSP Rating Scale (PSP-RS) [5], [17] and the Unified Multiple System Atrophy Rating Scale (UMSARS) [18][19][20] were designed specifically for PSP and MSA respectively, neither of these scales was designed to cover the full spectrum of disability in atypical parkinsonian ('parkinson plus') syndromes or to capture functional deficits in early MSA or PSP when the diagnosis remains uncertain.
Indeed, a scale that can with equal validity be applied to either disease in the early stages is as important in the investigation of natural history as it is in clinical trials. As part of the NNIPPS study [8] we therefore developed a clinical scale applicable in large multicentre trials that would allow evaluation of atypical parkinsonian syndromes at all stages, while also providing useful measures of change across the whole course of disease evolution.
Thus our main objectives were to evaluate disease severity and progression in PSP and MSA in relation to treatment; to ascertain that prognostic factors at entry were balanced between treatment groups; and to provide candidate covariates for survival analysis. Critically, the NNIPPS study was designed with stratification according to diagnosis at entry (PSP versus MSA) and required balanced numbers of patients in each stratum. This allows independent assessments of the results for each condition, and unbiased comparisons of symptom severity between diseases. Here we present the symptom severity profile and rate of progression in each disorder as evaluated with the NNIPPS-PPS scale, along with its psychometric properties, including face and content validity, construct validity, inter-rater reliability, and responsiveness.

Ethics approval
The protocol and amendments were reviewed and approved by the Comité de Protection des Personnes of Pitié-Salpêtrière Hospital (France), the UK Multicentre Research Ethics Committee (MREC), (UK), Ethikkommission of the University of Ulm, (Germany), and by local Institutional Review Boards (Ethics Committees) where appropriate (UK, Germany).

Trial design
The NNIPPS study was granted approval by the relevant institutional review boards and all subjects gave fully informed signed consent before enrolment. Patients with an akinetic-rigid syndrome diagnosed as PSP or MSA according to the NNIPPS diagnostic criteria [8] were eligible. Details of the therapeutic trial design and results have been reported previously [8]. In brief, the intent-to-treat population comprised 760 patients (362 PSP and 398 MSA) recruited in 44 centers in the UK, France and Germany. Patients were stratified according to diagnosis and randomized double-blind to riluzole or placebo. The study was powered to demonstrate efficacy within each stratum independently. The primary efficacy measure was survival, and secondary endpoints were rates of change in functional scores. Patients were evaluated 6-monthly for 3 years until death or the administrative cut-off date.

Scale construction
Prior to the start of the trial, items were selected through expert consensus as part of a broad clinical description of both MSA and PSP. The dimensions included (i) functional disability (activities of daily living), (ii) mental function (cognition, mood & behavior), (iii) extra-pyramidal motor disability (rigidity, bradykinesia), (iv) tremor, (v) oculomotor function, (vi) cerebellar signs, (vii) pyramidal signs, (viii) dysautonomia, (ix) bulbar/pseudobulbar symptoms, (x) myoclonus, and (xi) dystonia. Items were selected from the following scales available at that time: the UPDRS (all items from the Mental, ADL and Motor examination sections) [13], the PSP-RS (six items from the mental section) [17], three items from the International Cooperative Ataxia Rating Scale (ICARS) [21], the global ataxia score of the Expanded Disability Status Scale (EDSS) [22], and four items evaluating orthostatic signs and three for urinary signs from the Autonomic Symptom Profile [23], adapted to interview record instead of self-rating. Additional items were included to assess oculomotor signs, dystonia, myoclonus, pyramidal signs, sitting down and strength of cough.
A preliminary version of 109 items was evaluated in a pilot study to check each item and category wording. Redundant or inappropriate items were eliminated to obtain the first version comprising 85 items to be tested. Severity levels of items ranged from 0 ("normal") to a maximum of 6 ("very severe"), with a majority of items (65) scored on a 5-point scale (0-4) (Supporting information S1). Four sections were interview based with patient and/or caregiver (Mental, Activities of Daily Living-ADL, orthostatic and urinary signs); eleven were assessed through examination. Time to complete the scale was 30-45 minutes depending on the clinical state of the patient. Throughout the study, the scale was completed in all centres using an English version.

Psychometric properties
According to the recommendations of the American Psychological Association [24], we evaluated face and content validity, construct validity (Factor analysis, internal consistency, convergent and predictive validity) [25] (Supporting information S2). Total score and dimensional sub scores were obtained from summing item scores overall or within dimensions, respectively.

Reliability
An inter-rater reliability study was conducted with sub-samples of patients recruited from 11 centers (France: n = 3, UK: n = 3, Germany: n = 5). At inclusion, patients were evaluated twice independently on the same day. To assess inter-rater agreement, Cohen's linear weighted kappa (κw), or simple kappa (κ) for binary items, was calculated for each item [33], [34]. For the dimensional sub scores and the total score, Fisher's intra-class coefficients (ICC) were computed using analysis of variance (ANOVA) with a one-way random effect model [35]. Inter-rater reliability coefficients were interpreted according to proposed standards for strength of agreement as: ≤0 = poor, 0.01-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial, and 0.81-1.0 = almost perfect [36]. Individual item strength of agreement was considered acceptable for κ>0.40 (moderate to almost perfect); for dimensional sub scores, the ICC threshold for acceptability was raised to 0.70. Internal consistency of the total and dimensional scores was evaluated through Cronbach α coefficients and considered acceptable for α>0.70.
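The three agreement statistics named above can be illustrated with a minimal pure-Python sketch. This is not the study's SAS code: the function names and the data layout (one list of ratings per patient) are assumptions made for illustration, and the textbook formulas are used (disagreement-weighted kappa, one-way random-effects ICC for a single rating, and Cronbach's α from sample variances).

```python
from statistics import mean, variance


def linear_weighted_kappa(rater1, rater2, n_levels):
    """Cohen's kappa with linear weights for two raters scoring the same
    patients on integer levels 0..n_levels-1 (n_levels=2 gives simple kappa).
    Undefined (division by zero) if both raters always use one same level."""
    n = len(rater1)
    span = n_levels - 1
    # Marginal proportion of each level for each rater.
    p1 = [rater1.count(k) / n for k in range(n_levels)]
    p2 = [rater2.count(k) / n for k in range(n_levels)]
    # Observed and chance-expected mean disagreement, weight |i - j| / span.
    observed = sum(abs(a - b) / span for a, b in zip(rater1, rater2)) / n
    expected = sum(p1[i] * p2[j] * abs(i - j) / span
                   for i in range(n_levels) for j in range(n_levels))
    return 1.0 - observed / expected


def icc_oneway(ratings):
    """One-way random-effects ICC for a single rating.
    ratings: one list per patient, each holding k ratings of the same score."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-patient and within-patient mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def cronbach_alpha(item_scores):
    """Cronbach's alpha. item_scores: one list per patient, one entry per item."""
    k = len(item_scores[0])
    item_variances = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

With these definitions, two raters in complete agreement give a kappa and an ICC of 1, and two raters whose binary ratings agree only at chance level give a kappa of 0.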

Sensitivity to change
For each patient with at least two usable assessments, repeated measurements of the NNIPPS-PPS total score and dimensional sub scores were summarized by the slope of change (annual rate of change in scores), using unweighted least-squares regression estimates [37]. To assess independence of change relative to severity stage, we compared the total score slope of change across the whole range of severity grades defined by the CGI-ds (one-way ANOVA with test of trend). To test scale sensitivity to treatment effects, mean slopes were compared between the treated and placebo groups using two-way ANOVA including treatment, diagnostic strata, and treatment by strata interaction factors.
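The per-patient summary measure described above can be sketched as follows. This is a minimal illustration, assuming visit times expressed in years since entry; the function name and the example values are hypothetical and not taken from the trial data.

```python
def slope_of_change(times_years, scores):
    """Unweighted least-squares slope of score on time, in points per year.
    Requires at least two assessments made at distinct times."""
    n = len(times_years)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired assessments")
    t_bar = sum(times_years) / n
    s_bar = sum(scores) / n
    # Sum of squares of time, and sum of cross-products of time and score.
    sxx = sum((t - t_bar) ** 2 for t in times_years)
    sxy = sum((t - t_bar) * (s - s_bar) for t, s in zip(times_years, scores))
    if sxx == 0:
        raise ValueError("all assessments share the same time point")
    return sxy / sxx


# Hypothetical patient assessed at entry and then 6-monthly for 18 months.
annual_rate = slope_of_change([0.0, 0.5, 1.0, 1.5], [100, 110, 120, 130])
```

Each patient contributes a single slope regardless of the number of visits, so the group comparisons operate on one value per patient.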
Responsiveness was further evaluated using effect size (ES), defined as the ratio of the difference in mean slopes of change between treatment groups to the standard deviation (SD) of the placebo slopes: ES = (mean slope riluzole − mean slope placebo) / SD slopes placebo. To assess change within MSA and PSP strata and overall, we used the standardized response mean (SRM), defined as the ratio of the mean score change to the SD of the score change. The SRM and ES values were interpreted as small (0.20 to 0.49), moderate (0.50 to 0.79) or large (>0.80) [38].
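As a minimal sketch of the two responsiveness indices defined above (function names are hypothetical; the sample SD with n − 1 in the denominator is an assumption, as is treating 0.80 itself as "large" and labelling values under 0.20 as "below small"):

```python
from statistics import mean, stdev


def standardized_response_mean(changes):
    """SRM: mean change divided by the SD of the change."""
    return mean(changes) / stdev(changes)


def effect_size(slopes_treated, slopes_placebo):
    """ES: difference in mean slopes divided by the SD of the placebo slopes."""
    return (mean(slopes_treated) - mean(slopes_placebo)) / stdev(slopes_placebo)


def interpret(value):
    """Map an SRM or ES onto the bands quoted in the text.
    Boundary handling (>= 0.80 as large, < 0.20 as below small) is assumed."""
    magnitude = abs(value)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "below small"
```

An SRM above 1 means that the mean change exceeds its own standard deviation, which is why a scale with a higher SRM needs fewer patients to show a given relative slowing of progression.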
For power calculations and assessment of scale efficiency, sample size estimates were calculated within MSA and PSP strata and overall (α = 0.05, power 1−β = 0.80), using the total NNIPPS-PPS score slopes expressed as annual rate of change, those of the UPDRS motor score and SEADL, and those reported for UMSARS [39] and PSP-RS [5].
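The text does not state which sample size formula was used. The sketch below therefore assumes the standard normal-approximation formula for comparing two group means with a two-sided test, applied to slopes with equal SD in both arms, and a treatment effect expressed as a relative reduction of the mean slope. The function name and the 30% reduction used in the example are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def patients_per_group(srm, relative_reduction, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a given relative
    reduction in mean slope, where srm = mean slope / SD of slopes.
    Assumes equal SDs in both arms and a two-sided normal approximation."""
    if srm <= 0 or not 0 < relative_reduction <= 1:
        raise ValueError("srm must be positive and reduction in (0, 1]")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # Difference between arms in SD units: reduction * mean slope / SD.
    standardized_difference = relative_reduction * srm
    return ceil(2 * (z_alpha + z_beta) ** 2 / standardized_difference ** 2)


# Hypothetical 30% slowing of progression, using the SRM values from the text.
n_overall = patients_per_group(1.10, 0.30)
n_psp = patients_per_group(1.25, 0.30)
n_msa = patients_per_group(1.00, 0.30)
```

Because the required number scales with the inverse square of the SRM, a modest gain in responsiveness translates into a substantial reduction in sample size.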
To explore the dimensional profiles of PSP and MSA, means and SD of scale scores at entry or of score slopes of change were calculated for the overall population and broken down by diagnostic strata (PSP versus MSA). Within diagnostic strata, these were tested for significance with Student's t test comparing means to a theoretical value of 0, and across diagnostic strata, with Student's t test for independent groups. For graphical representation of severity profiles at entry and at follow-up, mean dimensional scores at entry and mean slopes of change were expressed as percent of maximum dimensional scores. To assess the relative contribution of each dimension to overall severity within each disease, mean dimensional scores (at entry) and dimensional slopes of change were also expressed as percent of total score and of total score slope of change, respectively.
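The two normalisations described above answer different questions: percent of maximum shows how severe each dimension is on its own range, while percent of total shows how much each dimension contributes to overall severity. A minimal sketch, using hypothetical dimension names and values rather than study data:

```python
def percent_of_maximum(mean_scores, max_scores):
    """Express each mean dimensional score as a percentage of the maximum
    (most severe) theoretical score for that dimension."""
    return {dim: 100.0 * mean_scores[dim] / max_scores[dim] for dim in mean_scores}


def percent_of_total(mean_scores):
    """Express each mean dimensional score as a percentage of the total score,
    taken here as the sum of the dimensional scores."""
    total = sum(mean_scores.values())
    return {dim: 100.0 * value / total for dim, value in mean_scores.items()}


# Hypothetical two-dimension example (values are illustrative only).
entry_means = {"Oculomotor": 10.0, "Urinary": 30.0}
maxima = {"Oculomotor": 20.0, "Urinary": 40.0}
severity_profile = percent_of_maximum(entry_means, maxima)
contribution_profile = percent_of_total(entry_means)
```

The same functions apply to mean slopes of change in place of mean scores at entry.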
All analyses were conducted on the Intent to Treat population (ITT, or sub-groups of the ITT where appropriate), using SAS (9.1.1) software. Significance level was set at p<0.05 (2-sided), except when comparing dimensional sub scores between groups, where Bonferroni correction for multiple comparisons was applied (p<0.003).

Results
The characteristics of the trial population and main results are reported in detail elsewhere [8]. The NNIPPS diagnostic criteria, validated prospectively against pathology, proved highly sensitive and specific, and the NNIPPS sample was broadly representative of the PSP and MSA patient population. Patients alive at the end of the study had at least 30 months of follow-up, and a total of 342 patients died during the trial (47% of PSP patients, 43% of MSA patients). Disease severity was comparable in both treatment groups at entry. On follow-up, since there was no treatment effect on any primary or secondary efficacy measure, data from placebo and riluzole groups were combined.

Face and content validity
All items of the scale were clearly understood by trial investigators, and considered appropriate to measure severity of PSP and MSA syndromes. The expert neurologists advised that all relevant dimensions for assessment of severity of both diseases were reasonably well represented with the items selected.

Construct validity
Due to a poor rate of completion, the item "erectile dysfunction" was excluded from the scale prior to analysis. For the Principal Component Analysis (PCA), patients with any additional item missing (11% of cases) were excluded. The analysis population included complete records of 675 patients (PSP n = 317; MSA n = 358). The PCA extracted 15 factors, altogether contributing to 62% of total variance (Table S1), with clearly identifiable clinical meaning, and corresponding to the a priori defined clinical dimensions. A single item, "sensory complaints", not correlating with any factor, was further excluded from the scale. The first factor, consisting of two sets of items, 7 interview-based assessing activity of daily living and 7 from motor examination, was split for further analyses into two clinical dimensions, ADL/Mobility and Axial Bradykinesia respectively. Items assessing tremor, correlating with 2 separate factors (Tremor at rest and Postural tremor), were combined into one single dimension (Tremor), as rest tremor symptoms were either absent or mild in these patients. The resulting 83-item scale, summarized into 15 dimensional sub scores and a total score, underwent thorough validation and was used to evaluate disease severity and progression. The internal consistency of the total score was excellent (Cronbach α = 0.92), and acceptable to high for all dimensional sub scores (Cronbach α = 0.68-0.94) except the Pyramidal score (Cronbach α = 0.39) (Table 1). Convergent validity was good, as shown by the high correlation of the total score with global severity scales such as the CGI-ds (r = 0.72), HYS (r = 0.76) and SEADL (r = −0.80). Moderate correlation was found with Quality of Life scales (PDQ-8: r = 0.48; SF-36 physical score: r = −0.58). The ADL/mobility, Axial and Limb bradykinesia, and Bulbar-pseudobulbar sub scores were the most correlated (r = 0.49-0.85) with HYS, SEADL, and the CGI-ds (Table S2).
Correlations of Cerebellar, Pyramidal, Rigidity, Bulbar/pseudo bulbar, Mental, Limb Bradykinesia and Axial Bradykinesia, Orthostatic and Urinary sub-scores with their corresponding VAS were satisfactory (r = 0.52-0.76). Correlations of the Orthostatic and Urinary scores with the CGI dysautonomia were also satisfactory (r = 0.53 and 0.64, respectively). The Mental score correlated moderately with the FAB (r = −0.49) and the MMSE (r = −0.46). No relationship (r<0.30) with age or disease duration was found for any of the NNIPPS-PPS scores. This weak correlation with disease duration could partly be explained by the bivariate distribution, with a substantial proportion of patients with low CGI-ds (1-3) among those with longer disease duration above 5 years (34%, i.e., slow progressors) and high CGI-ds (4-6) among those with short disease duration (<3 years) (37%, i.e., fast progressors). Convergent validity was further supported by the good discrimination between the two extreme groups of CGI-ds scores, with the total score and 11 out of 15 dimensional scores significantly higher (p<0.003 with Bonferroni correction) in the high severity group (Figure 1).
The total score showed PSP patients to be slightly more severe at entry than MSA patients (Table 1). As was inevitable in view of our strata inclusion and exclusion criteria, Oculomotor and Mental scores were higher in PSP, while MSA patients showed higher scores for Tremor, Cerebellar, Orthostatic and Urinary symptoms (Figure 2 Right). When sub scores were expressed as percent of total score, for those scores unrelated to inclusion/exclusion criteria (n = 9), which contributed to approximately 70 percent of the total score, dimensional profiles were identical (Figure 2 Left). Importantly, within each diagnostic stratum, all mean dimensional sub-scores of the NNIPPS-PPS, including those related to strata inclusion and exclusion criteria, were significantly different from zero (p<0.001 by Student's t test), indicating that all clinical dimensions were present in each disorder, although at varying levels ( (Table 2). Multivariate stepwise Cox model analysis with candidate covariates including baseline demographic characteristics (strata, gender, disease duration, age at inclusion, age at onset), global severity scales (HYS, SEADL, CGI-ds, CGI dysautonomia) and NNIPPS-PPS total score showed the latter to be the best predictor of survival (Table S3).

Inter-rater reliability
A total of 116 patients (MSA n = 74, PSP n = 42) were analyzed with a total of 33 evaluators including general neurologists, geriatricians, as well as experts in movement disorders. The characteristics of the 116 patients studied (France (n = 70), UK (n = 18) and Germany (n = 28)) were representative of the overall NNIPPS ITT population [8] (Table 3). The reliability of the total score was excellent (ICC = 0.94). For 14 of the 15 dimensional subscores, ICC values were substantial to almost perfect, and moderate for one (Myoclonus) (Table 1). Item wise, inter-rater agreement was considered acceptable (κw>0.40, moderate to almost perfect) for 79 items (95%), including substantial for 38 items (κw>0.6) and moderate for 41 (κw 0.4 to 0.6); four items had slight to fair reliability (κw<0.4), two in the tremor section and two myoclonus items. On feedback, discrepancies between investigators' scores were accounted for by (i) fluctuations in the severity of clinical symptoms and signs during the day, (ii) differences in interview technique, (iii) scoring of signs such as dystonia or myoclonus requiring expertise to be detected, and (iv) interpretation of items (mainly those of the mental function). Based on this feedback, standard operating procedures were established and implemented in the clinical trial.

Responsiveness
There were 642 patients with at least two usable assessments (PSP n = 305, MSA n = 337) to assess rates of change. In both groups, the rate of change of the total score was highly significant (p<10⁻⁴), with PSP patients showing a higher progression rate as compared to MSA (p<10⁻⁴). In the PSP group, rates of change were highly significant (p<10⁻⁴) for all but three dimensions (Orthostatic, Myoclonia, Tremor) and in the MSA group one only (Orthostatic) was not significant (Table 4). In both groups the rate of change in Orthostatic score paradoxically showed non-significant improvement with time, which upon examination was found related to biased scorings for patients not being able to stand or walk anymore. The same bias was found to significantly affect Cerebellar scores at follow-up. The total score re-calculated without these two sub-scores revealed little alteration of the slope of change (Table 4). While there were clear differences in rates of progression for dimensional sub-scores between PSP and MSA (Figure 4 Left), when dimensional slopes of change within disease were expressed as percent of the total score slope, the profile of contribution of these to overall disease severity progression was remarkably similar even for dimensions related to inclusion criteria such as Mental or Urinary dimensions (Figure 4 Right).
Figure 1. [...] Table 1, far left column). Comparisons (Student's t tests) were made between the two subgroups defined by the extreme values of the Clinician Global Impression of disease severity (CGI-ds) in the overall study population. CGI Borderline/Mild illness (score 1-2) n = 93, dotted line; CGI Severe/extremely severe illness (score 5-6) n = 142, solid line. ns: not significant at p<0.003 with Bonferroni correction. doi:10.1371/journal.pone.0022293.g001
There was no difference in the slope of change of the total score across the different levels of the CGI-ds (21.8 points per year in the lowest severity group versus 22.1 in the highest severity group, p = ns), indicating consistency of the scale across disease stages. Moreover, there was no correlation between the baseline total score and slope of change (Spearman r = 0.04, p = ns).
Consistent with the lack of overall treatment effect on survival or on other functional scales [8], no difference was found between treatment groups for mean slopes of change in total NNIPPS-PPS score (Effect Size = 0.03).
When calculated across all visits, the standardized response mean (SRM) was large for both conditions (1.10 overall) with a higher response for PSP patients (SRM = 1.25) than for MSA patients (SRM = 1.00) thus confirming the more rapid progression in the former.
Compared to UPDRS, SEADL, UMSARS or PSP-RS, sample size estimates to detect a significant treatment difference in slope were substantially lower (30% to 60%) with the NNIPPS-PPS total score, whatever the group of patients considered (Table 5).

Discussion
The NNIPPS-PPS project is unique in attempting to develop and validate prospectively a comprehensive rating scale for both PSP and MSA that can be applied in the early stages of disease when sensitivity and specificity of current consensus diagnostic criteria are poor [2] or as yet untested [2], [40]. The validation of the NNIPPS-PPS scale in a large multicentre clinical trial in PSP and MSA enabled us to prospectively describe and compare symptom severity and progression in a population of well characterised patients in which diagnostic criteria, prospectively tested against pathology, were both highly sensitive and specific [8]. Although the research criteria for inclusion in NNIPPS may differ from criteria for diagnosis in the clinic (e.g., patients with a pure cerebellar or pure autonomic presentation of MSA, and patients with PSP developing supranuclear palsy later in disease evolution, were formally excluded from the trial), our inclusion criteria were quite liberal. For example, we accepted a very mild akinetic-rigidity syndrome (i.e., only one of 14 items rated as mild in the UPDRS motor examination) [8]. On the whole, our sample should be relatively close to the clinical population, presenting a broad spectrum of severity and clinical profiles, thus allowing robust generalisation of the results.
The 15 dimensional sub-scores identified through factor analysis confirmed the hypothesised clinical dimensions, accurately reflecting the complex clinical profile of these two conditions. Overall, the dimensional scores at entry demonstrated a remarkably similar clinical profile in PSP and MSA, with complete overlap in nine dimensions (Figure 2 Left), together contributing to about 70% of the total severity score at entry in each disease. These findings are well supported by the psychometric quality of the scale to measure disease severity, in terms of reliability, construct validity, predictivity and sensitivity to change.
Figure 2. [...] Dimensional sub scores are expressed as percentage of the total score to evaluate relative contribution of each dimension to overall severity score. Comparisons (Student's t tests) were made between the two strata. PSP n = 362, dotted line; MSA n = 398, solid line. Left: sub scores unrelated to strata inclusion/exclusion criteria; three comparisons reached significance level at p<0.003: Limb bradykinesia, Rigidity and Myoclonia, cumulating to 3.4% overall difference in contribution to total score. Right: sub scores related to strata inclusion/exclusion criteria; all differences are significant at p<0.003, with 28.2% overall difference in contribution to total score. Contributions of dimensions related to inclusion criteria amount to 27.6% and 17.3% for PSP and MSA respectively; contributions of dimensions related to exclusion criteria amount to 4.9% and 11.8% in PSP and MSA respectively. doi:10.1371/journal.pone.0022293.g002
Table 4. [...] PSP and MSA columns: NNIPPS-PPS dimensional and total scores slopes of change (mean ± SD points per year) by strata; N: number of patients with at least two usable assessments over the three year follow-up. Maximum (most severe) theoretical scores are indicated in the far left column (brackets). Strata p value column: p value from ANOVA comparing slope of change between strata. Slope test columns: p value from within-group t test comparing slopes of change within strata (PSP, MSA) to 0 (no change). *Total score-2: Cerebellar and Orthostatic scores at follow-up visits were found to be highly biased by interference with walking ability (some items becoming impossible to rate when the patient was unable to stand), and/or motor disability (e.g., rigidity); their respective scorings were removed from this Total score calculation, with minor alteration in the overall PPS slope of change in both groups. doi:10.1371/journal.pone.0022293.t004
Although the data were acquired in the setting of a 'field-type' study involving numerous clinicians, inter-rater reliability of the NNIPPS-PPS was high at both the item and sub-score levels, with 95% and 87% respectively showing acceptable to high agreement. Likewise, the total score and all dimensions except the Pyramidal one showed acceptable to high internal consistency.
For assessment of convergent validity, we chose several generic evaluations to investigate different approaches of severity assessment. The scale demonstrated a good convergence with other clinical measures for the overall score and for dimensions where reference measures could be obtained. Predictive validity of the scale was clearly demonstrated through survival analysis with total score and most dimensional scores highly predictive of survival.
Analysis of the repeated measures over the 3 year follow-up showed that the scale appropriately reflects disease progression (Table 4), except for Myoclonia, which had a very low frequency and low severity in both conditions, and the Cerebellar and Orthostatic dimensions, which could not be reliably assessed at follow-up once patients were unable to stand, or were treated for orthostatic symptoms. On the whole, the slopes of progression of sub scores also demonstrated a remarkably similar profile in MSA and PSP (Figure 4 Right). Nevertheless, as previously reported [8], PSP patients had more severe symptoms and signs at entry, and had a faster rate of progression on follow-up compared to MSA in terms of both functional disability and survival. This difference was clearly detected with the NNIPPS-PPS scale, demonstrating the good psychometric quality of the scale (Figure 4 Left). To confirm the usefulness of the total score as an outcome measure for clinical interventions, we calculated the standardized response mean (SRM), which reflects the ability of the scale to detect change. The NNIPPS-PPS total score was able to detect a smaller effect for disease progression than we originally hypothesized [8]. Compared to the UPDRS, SEADL, PSP-RS or UMSARS, the NNIPPS-PPS scale requires fewer patients to detect a given treatment effect. However, the absence of a treatment effect with riluzole precluded the assessment of responsiveness to treatment [8].
A major concern for the application of any scale is the relation between rate of progression and disease severity (i.e., linearity). Non-linearity contributes to bias as the slope varies with disease severity. We found no correlation between total score slope and the total score at inclusion, or between slope and CGI-ds, as the annual decrease remained constant across the different severity levels, from mild to very severe. This is at variance with the SEADL for which the annual rate of progression decreased with greater disease severity (data not shown), or with the UMSARS [39]. This may be explained by a ceiling effect affecting these measurements, which was not present with the NNIPPS-PPS.
Several dimensions, Dystonia (axial or Limb), Myoclonia, Cerebellar, Orthostatic and Pyramidal provided limited information. Although not frequent and not contributing much to overall disease severity in our analysis, Dystonia and Myoclonia dimensions showed acceptable psychometric properties and should be kept as they may be disabling, of prognostic value when present and diagnostically useful. Cerebellar and Orthostatic dimensions showed acceptable construct validity and reliability but their assessments were biased at follow-up, suggesting the need for revised standard operating procedures. The Pyramidal dimension proved difficult to quantify, had low internal consistency and reliability, hence its contribution to overall disease severity and progression is questionable. However, nearly 50% of patients in both conditions presented with pyramidal signs at inclusion [8]. To assess its real contribution to disease severity and progression the construct of the Pyramidal dimension should be reconsidered. Lastly, the domain exploring sexual symptoms requires further development to complete evaluation of dysautonomia. As it is likely that PSP could be combined with other tauopathies such as corticobasal degeneration syndrome (CBD), further work on the scale may consider adapting the scale for CBD, including elements such as apraxia. These issues are now being addressed in a new ongoing study.
The development of a scale should allow an 'unbiased' assessment of the full range of functional deficits in the disorders in question. This is particularly important in complex multisystem disorders such as PSP and MSA. We chose to design a comprehensive, more extended scale, rather than to limit the dimensions to the most characteristic features of PSP and MSA. In that respect, we have confirmed that the Bulbar syndrome is an independent dimension with important contributions to disease severity, progression and prognosis in both conditions. While cerebellar dysfunction is characteristic of MSA, it also occurs in PSP, as Steele et al. [41] pointed out in their original description of PSP. Likewise, cognitive abnormalities have often been regarded as unimportant in MSA [18], but we have shown that these are relatively common in MSA. In a previous paper [42], we showed that cognitive impairment substantially increased the false diagnosis rate in the MSA group. However, the overall rate of false diagnosis was low (12%) [8] and cognitive impairment predicted only a third of these. Thus, these few misdiagnosed cases cannot account for the decline in mental functioning in patients diagnosed with MSA. Furthermore, 18.2% of the neuropathologically confirmed MSA cases were found to be cognitively impaired, a frequency similar to that in the trial population (i.e., 20%). Although generally less severe than in PSP, the profile of cognitive dysfunction in MSA was similar on the Dementia Rating Scale [42]. Our results confirm that all a priori defined dimensions are present in both disorders, differing only in terms of degree of severity or rate of progression.
Overall, the NNIPPS study has provided new insights on the natural history of PSP and MSA. Our assumptions at the planning stage were that overall diagnostic accuracy would be low, particularly early in the disease course, and that some overlap might therefore be present in the assessments of disease severity. Our results have shown that the NNIPPS diagnostic criteria had good sensitivity and specificity even at the early stage [8], while the dimensional profile of disease severity and progression as analyzed here showed wider overlap than expected. These findings are not contradictory, as the NNIPPS diagnostic criteria, though specific to each condition, represented only a partial aspect of the overall disease severity assessed with the NNIPPS-PPS. On the other hand, our consistent findings of similar patterns of cognitive disability in MSA and PSP [42], and their high contribution to overall disease severity and progression, argue strongly that the current consensus criteria for MSA [40] should be revised [43], [44].
In conclusion, we have developed a clinical scale combining features of MSA and PSP, which in the early stages share common features, making accurate diagnosis difficult. The study has provided evidence, prospectively collected in a large multicentre cohort, that there is consistent overlap between these disorders, which differ in degree of severity and progression rates. Our results show that the NNIPPS-PPS has the psychometric qualities required to measure disease severity and progression in both diseases, is efficient for powering trials, and is strongly predictive of survival. These features make it suitable for capturing the effect of disease-modifying therapy in clinical trials for MSA, PSP or atypical parkinsonian ('parkinson plus') syndromes generically.

Supporting Information
Supporting information S1 PARKINSON PLUS SCALE (NNIPPS-PPS) (83 items). The 83 items of the NNIPPS-PPS scale are presented within their respective dimensions along with scoring definition for each item. (DOC)
Supporting information S2 Details of psychometric validation methods and results. (DOC)
Table S1 Factor analysis of the NNIPPS-Parkinson Plus Scale. Data from 675 patients (317 PSP and 358 MSA) with fully completed scales at inclusion were submitted to Principal Component Analysis (PCA). The analysis included 84 out of the 85 items of the scale as one item, "erectile dysfunction", could not be included due to poor completion rate. 85 patients (11%) had 1 to 10 item scores missing and were therefore excluded from analysis. Fifteen factors were extracted following varimax rotation. Loadings >0.30 of each item with corresponding factor are listed. Only one item, "sensory complaints", did not correlate with any factor. For further analysis, and on clinical grounds, the first factor was split into 2 clinical dimensions: ADL/mobility based on interview items, and Axial bradykinesia based on motor examination; the 2 tremor factors were combined into a single dimension (Tremor).