A Longitudinal Study of Motor, Oculomotor and Cognitive Function in Progressive Supranuclear Palsy

Objective: We studied the annual change in measures of motor, oculomotor and cognitive function in progressive supranuclear palsy. This had twin objectives: to assess the potential for clinical parameters to monitor disease progression in clinical trials, and to illuminate the progression of pathophysiology.
Methods: Twenty-three patients with progressive supranuclear palsy (Richardson’s syndrome) were compared to 22 matched controls at baseline, and 16 of these patients were compared at baseline and one year using: the progressive supranuclear palsy rating scale; the unified Parkinson’s disease rating scale; the revised Addenbrooke’s cognitive examination; the frontal assessment battery; the cubes section of the visual object and space perception battery; the Hayling and Brixton executive tests; and saccadic latencies.
Results: Patients were significantly impaired in all domains at baseline. However, cognitive performance was maintained over a year on the majority of tests. The unified Parkinson’s disease rating scale, saccadic latency and progressive supranuclear palsy rating scale deteriorated over a year, with the latter showing the largest change. Power estimates indicate that using the progressive supranuclear palsy rating scale as an outcome measure in a clinical trial would require 45 patients per arm to identify a 50% reduction in rate of decline with 80% power.
Conclusions: Motor, oculomotor and cognitive domains deteriorate at different rates in progressive supranuclear palsy. This may be due to differential degeneration of their respective cortical-subcortical circuits, and has major implications for the selection of outcome measures in clinical trials because of the wide variation in sensitivity to annual rates of decline.


Introduction
In 1964 Drs Steele, Richardson and Olszewski published their seminal report of 9 patients "who displayed an unusual progressive neurological disorder with ocular, motor and mental features" [1], a condition now known as progressive supranuclear palsy (PSP) or Richardson's syndrome. Subsequent work has sought to understand the natural history of the disease, including increased awareness of cognitive decline. With the advent of clinical trials of potential disease modifying drugs, many based on interfering with the hyperphosphorylation and aggregation of Tau protein, there is renewed interest in identifying reliable clinical markers of disease for early diagnosis and disease progression.
However, tests that are sensitive to the presence of the disease may not be optimal for monitoring progression, and vice versa [2]. Some investigators have proposed brain imaging as a biomarker [3,4]. Others have focussed on the clinical progression of PSP, either as part of a composite test such as the PSP rating scale [4][5][6], or with separate systems as summarised in Table 1. Some domains, such as motor ability, have been studied longitudinally using validated scales [7]. However, few studies have investigated the longitudinal change in other domains such as cognition. PSP tau pathology can be predicted with a high degree of accuracy in the presence of a typical PSP-RS (Richardson's syndrome) phenotype [7], but without data detailing progression of disease in multiple domains, improvements in the rate of decline of patients in trials could be missed.
In this study, our aim was to assess the rate of decline of neuropsychological, motor and oculomotor functions. Our hypothesis was that these functions would all be abnormal at the time of diagnosis, but that they would deteriorate differentially. This comparison is not only relevant to our ability to study disease progression in the context of disease modifying treatments: it would also offer new insights into the pathophysiological progression of PSP.

Ethics Statement
The Cambridgeshire research ethics committee approved this study, including the information sheet, consent documents and all tests to be carried out. All investigations were carried out with the adequate understanding and written consent of the participants involved in the research. The capacity of all patients was assessed by trained medical staff including a consultant neurologist. No patient was recruited to this study if they did not have the capacity to consent. Capacity was assessed and consent was obtained again after any interval in testing greater than six weeks. The patients' family was included in the process at each stage and, although not necessary, their agreement to testing was also obtained.

Participants
Twenty-three patients were recruited prospectively from a specialist neurological clinic for patients with PSP and related disorders, at Addenbrooke's hospital between 2007 and 2009. Contemporary clinical diagnoses for possible or probable PSP were made by an experienced neurologist according to consensus criteria [8]. With subsequent information the diagnoses have been revised to probable or definite PSP as, to date, ten of the patients have undergone post mortem examination: all ten had PSP. The phenotype identified by our inclusion criteria corresponds closely to the PSP-RS 'Richardson's syndrome' rather than other clinical manifestations of PSP pathology such as PSP-P. Baseline assessment was carried out at recruitment with interval testing as close to a year after baseline assessment as practicable.
At baseline, one patient (A) was unable to complete the saccadometry, two patients (B and C) with poor visual acuity undertook a non-visual subset of tests only, and two patients (D and E) failed to complete all of the neuropsychological tests due to fatigue or intercurrent illness. At interval testing, patient A was unable to complete the saccadometry but was able to complete all other tests, patients B, C and D were unable to complete testing due to intercurrent illness or fatigue and patient E died before the planned interval of 1 year. In addition a further 3 patients died before the end of the interval period. Sixteen patients had complete or near complete data sets at interval testing.
Twenty-two age- and education-matched controls were recruited from the panel of volunteers at the MRC Cognition and Brain Sciences Unit (CBU) or from spouses of patients. Controls had normal hearing and corrected vision and did not have significant neurological or psychiatric comorbidity.

Motor and cognitive testing
Motor function was assessed with section III of the Unified Parkinson's Disease Rating Scale (UPDRS) [9] and the PSP rating scale (PSPRS) [5]. The PSPRS also includes sections for bulbar, oculomotor and personality changes. Scores for UPDRS and PSPRS were transformed by simple inversion so that high scores represented better function for ease of comparison across all tests. This was achieved by subtracting participant scores from the maximal test score (108 for UPDRS, 100 for the PSPRS total score and 5 for the PSPRS stage subscore).
Cognitive testing used the Addenbrooke's Cognitive Examination-Revised (ACE-R) [10], the Frontal Assessment Battery (FAB) [11], the cubes subsection of the Visual Object and Space Perception Battery (VOSP) [12], and the Hayling and Brixton tests [13]. There is a timed component in the Hayling test. Given the bradyphrenia and bradykinesia evident in PSP, we used only the number of incorrect responses in the second section of the test, assessing unsuccessful inhibition by participants, without reference to timing of responses. Hayling A errors refer to participants incorrectly choosing a stereotyped ending for a sentence. Hayling B errors are answers which, although semantically related, are not stereotyped. Scores for correct answers were given so that higher scores represented better function.

Saccadometry
Saccadometry was completed at baseline and interval. Saccades were measured using a head mounted binocular infra-red sclerometer, recorded at 1 kHz and low-pass filtered at 250 Hz, with 12-bit resolution [14]. The device presents targets to the participants using low powered lasers mounted on the front and angled at +10°, 0° and -10° azimuth. The saccadometer uses a step task paradigm. After a random initial period of between 0.5 and 1.0 seconds the central target is extinguished and, simultaneously, either the left or right target is presented. The device measures the latency of the resulting saccade (the time between the target moving and the eyes starting to move). The device was automatically calibrated using a short series of presentations of the targets at the beginning of the session. Participants sat at a distance of 1.5 metres from a blank wall with the room darkened. Because the stimuli move exactly with the head, a bite bar is not required, leading to increased subject comfort.
After testing, the data were downloaded to a laptop and preprocessed using Latency Meter v 2.10 [14]. This contains an automated validation program that compares the log likelihood value for the position and velocity traces for each trial to the mean and standard deviation values for all the trials in the session. This is used to reject blinks, saccades in the wrong direction, and grossly abnormal traces using a rejection threshold for either the velocity or position traces. Saccade data were then analysed using SPIC software employing the LATER (Linear Approach to Threshold with Ergodic Rate) model [15][16][17] and reciprobit plots of response latencies. A typical reciprobit plot of a series of latencies is shown in Figure 1. The majority of the saccade population adheres to a normal distribution of inverse latencies, and can be seen to lie along a straight line in the reciprobit plot. A minority of saccades are generated differently, with a distinct normal distribution of inverse latencies with high variance. These have a reduced latency and lie along a different line, the 'early' saccade distribution. Three parameters are sufficient to describe the two inverse latency distributions: mu, the reciprocal of the median latency, and sigma and early sigma, the slopes of the main and early lines respectively. These parameters can be related directly to the physiology of visually evoked saccade generation [17], and are estimated from the observed distributions by minimisation of the Kolmogorov-Smirnov one-sample statistic.
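The reciprobit transformation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the SPIC implementation: the function name, the plotting position (rank - 0.5)/n and the example latencies are all assumptions, and no Kolmogorov-Smirnov fitting of mu, sigma or early sigma is attempted.

```python
from statistics import NormalDist


def reciprobit_points(latencies_s):
    """Return (reciprocal latency, probit of cumulative probability) pairs.

    Assumption: cumulative probability is estimated with the plotting
    position (rank - 0.5) / n, which keeps it strictly inside (0, 1).
    """
    ordered = sorted(latencies_s)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal, used for the probit transform
    points = []
    for rank, latency in enumerate(ordered, start=1):
        cumulative = (rank - 0.5) / n
        points.append((1.0 / latency, std_normal.inv_cdf(cumulative)))
    return points


# Hypothetical latencies in seconds, for illustration only.
example_latencies = [0.18, 0.20, 0.21, 0.23, 0.25, 0.28, 0.33]
points = reciprobit_points(example_latencies)
for reciprocal, probit in points:
    print(f"1/latency = {reciprocal:5.2f} per second, probit = {probit:+.2f}")
```

If reciprocal latencies are normally distributed, these points fall close to a straight line; the point with probit 0 gives the reciprocal of the median latency, which corresponds to mu in the text.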

Statistical analysis
Statistical analysis used SPSS v 15 (SPSS Inc., Chicago, IL). Parametric data for patients and controls were compared with t-tests, one-way ANOVA or repeated measures ANOVA (rm-ANOVA) with post hoc contrasts. Baseline and interval parametric data for patients were compared with paired t-tests.
Non-parametric data were investigated with Mann-Whitney or Kruskal-Wallis tests. χ² tests were used for categorical data.
Where the rates of change for different tests were compared, the annualised normalised score was used. This is the difference in scores for each test divided by the time interval between tests in years, giving the change per year. This score was then divided by the maximum score for the test so that different tests could be compared. For saccadometry, mu was divided by the mean plus two standard deviations for the controls (6.38).
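As a worked illustration of the annualised normalised score, the sketch below applies the inversion and normalisation steps described in the methods. The function names and the patient scores are hypothetical; only the maximum scores (100 for the PSPRS total) and the saccadometry normaliser (6.38) come from the text.

```python
def invert_score(raw_score, max_score):
    """Invert a severity score so that higher values mean better function."""
    return max_score - raw_score


def annualised_normalised_change(baseline, interval, years_between, max_score):
    """Change per year, expressed as a fraction of the test's maximum score."""
    change_per_year = (interval - baseline) / years_between
    return change_per_year / max_score


# Hypothetical patient: raw PSPRS total of 35 at baseline and 48 at 1.2 years.
PSPRS_MAX = 100
baseline = invert_score(35, PSPRS_MAX)  # 65 after inversion
interval = invert_score(48, PSPRS_MAX)  # 52 after inversion
psprs_change = annualised_normalised_change(baseline, interval, 1.2, PSPRS_MAX)

# Hypothetical saccadometry values: mu is normalised by 6.38 (control mean + 2 SD).
mu_change = annualised_normalised_change(4.4, 4.0, 1.0, 6.38)

print(f"PSPRS: {psprs_change:+.3f} of maximum per year")
print(f"mu:    {mu_change:+.3f} of normaliser per year")
```

Negative values indicate worsening, because all scores are oriented so that higher means better function.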
Power calculations used G*Power 3.1.5 [18,19] with an alpha value of 0.05, a beta value of 0.2 (power 80%) and two-sided t tests. Sample sizes were estimated for interventions that reduced the rate of decline by 25% and 50%.
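The logic of such a sample size estimate can be sketched with the standard normal approximation for comparing two group means. This is not the G*Power procedure, which is based on the noncentral t distribution, so it will give slightly smaller numbers and is not expected to reproduce the values in table 7. The function name and the input values (a mean annual decline of 10 points with a standard deviation of 8) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(mean_annual_decline, sd_of_decline, reduction,
                     alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a slowed rate of decline.

    Normal approximation for a two-sided comparison of two group means:
    n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    where d is the treatment difference divided by the standard deviation.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    # A treatment that reduces decline by `reduction` changes the mean by this much.
    difference = mean_annual_decline * reduction
    effect_size = difference / sd_of_decline
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical outcome measure: mean annual decline 10 points, SD 8 points.
n_for_50_percent = patients_per_arm(10.0, 8.0, reduction=0.50)
n_for_25_percent = patients_per_arm(10.0, 8.0, reduction=0.25)
print(n_for_50_percent, n_for_25_percent)
```

Because the required group size scales with the inverse square of the effect size, halving the expected treatment effect roughly quadruples the number of patients needed in each arm.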

Results
The groups were well matched demographically at baseline (see table 2). Figures 2-4 show baseline test scores for controls and all patients. Patients were significantly worse than controls at baseline for all tests (see also table 3). Repeat testing of the patients who were followed up was completed at a mean interval of 1.2 years (SE 0.07). Baseline scores were compared with interval testing after a year using paired t tests. For the majority of the tests, the mean change in score was either zero or very close to zero. The exceptions were the PSP Rating Scale (PSPRS), which showed a mean change over a year of 11.3 points (t(15)=11.2, p<0.001) (see table 4 and Figure 2); the Unified Parkinson's Disease Rating Scale III (UPDRS), which showed a mean change of 8.3 points (t(15)=3.6, p=0.003) (see table 4 and Figure 2); and mu (the inverse median latency for saccades), which showed a mean decrease of 0.4 s⁻¹ (equivalent to an increase in latency of 0.02 seconds) (t(13)=2.5, p=0.01) (see table 6 and Figure 4).
Figure 1. Latency is plotted on the x axis using a reciprocal scale. The reciprocals of the latencies are equally spaced along this scale. Additionally this scale is mirrored, so that short latencies are to the left and long to the right: infinite latencies, whose reciprocals are zero, therefore form the right hand margin. Because their reciprocal latencies are normally distributed, with mean mu and standard deviation sigma, most latencies lie on a straight line (red), the main distribution, whose median and slope correspond to mu and sigma. In addition, under some conditions there may be a sub-population of early saccades (blue) that lie on a line of shallower slope, corresponding to a third parameter, early sigma. doi: 10.1371/journal.pone.0074486.g001
Although the comparisons between baseline and interval only included patients for whom we had both baseline and interval data, we looked for differences between those baseline patients who could not complete interval testing and those who could, in case their drop out had caused a bias. Demographically, there was no difference between these two patient subgroups in age (t(37)=0.4, p=0.7), gender (χ²(1)=0.01, p=1.0), education years (U=183, p=1.0) or disease duration (U=115, p=0.05). The lack of progression of cognition over a year was surprising, so we also looked at cognitive differences at baseline between those patients whom we could and could not test at interval. Allowing for Bonferroni corrections (p<0.01 as significant for 5 comparisons) there were no differences (FAB (t(19)=1.7, p=0.12), Hayling A (t(21)=0.1, p=0.9), Hayling B (t(21)=0.9, p=0.4), ACE-R (t(21)=2.4, p=0.03), Brixton (t(18)=-1.3, p=0.2)).
Figure 5 shows the annualised normalised rate of change of the UPDRS, PSPRS, mu and the ACE-R. Normalising each test to its maximum enables direct comparison across tests. Cognitive function as measured by the ACE-R changes only very slightly over a year, whereas the PSPRS changes markedly. Repeated measures ANOVA for the normalised annualised rates of change for ACE-R, UPDRS, mu and PSPRS shows that there was a significant main effect for rates of change in different domains of the PSP phenotype (F(3,39)=3.15, p=0.036). Post hoc contrasts revealed that rates of change were significantly greater for PSPRS when compared to ACE-R (F(1,13)=17.2, p=0.001) but not for mu compared to PSPRS (F(1,13)=3.4, p=0.09) or for ACE-R compared to UPDRS (F(1,13)=3.9, p=0.7).
Figure 6A shows how each element of the PSPRS changed over the course of a year, ordered from most change to least. Figure 6B shows the subsection scores of the PSPRS ordered by most change.
As can be seen in Figure 6, the gait/midline sections of the scale undergo the most change, followed by changes expressed in the history given by the carer and patient.
We extended the analyses to investigate the potential role of the tests used in this study in therapeutic trials. The change in test scores over an interval of a year was used to calculate an effect size for power calculations. Table 7 shows the estimated group sizes needed to reveal a putative reduction of 25 and 50% in the rate of decline on each of the principal measures, over 12 months.
In addition, we investigated the correlation between baseline test scores and disease duration. Examining the patients for whom we had both baseline and interval data, there was a significant correlation between PSPRS and disease duration (Pearson correlation 0.68, p (2 tailed) = 0.005), with none of the other 11 measures tested having a p value of less than 0.1.
The groups consist of controls, all patients who were tested at baseline (PSP Baseline), baseline scores for those who were tested at both baseline and interval (PSP Interval), and baseline scores for those patients who were tested at baseline but either were unable to complete interval assessments (PSP Unable) or died before the interval assessments were due to be carried out (PSP Deceased).

Discussion
This study examined the longitudinal change in progressive supranuclear palsy, including the cognitive, motor and oculomotor dimensions of this complex disease. Over one year, motor functions, and oculomotor decision time (saccade latency) changed significantly. However, multiple cognitive measures did not change significantly, despite being profoundly affected by the presence of disease at baseline.
The test exhibiting the greatest annual change was the PSP rating scale (see Figure 5) [5]. Other studies have also shown a comparable change in this composite score [4][5][6]. Golbe et al. [...] For 80% power with a 40% reduction in outcome variable over a year, 36 patients were estimated to be needed in each treatment arm using midbrain volume and 45 for total PSPRS score.
Figure caption: Row A shows baseline scores on the test. Blue diamonds are controls, red squares are patients and black triangles mark the baseline score for patients who could not complete the interval testing. Hayling A and Hayling B scores have been transformed so that a higher score represents better function (see methods). Row B shows the difference in score between baseline and interval in those patients who completed both assessments, adjusted to show the change in score over twelve months. Negative values represent worsening of function. In both rows the x axis is nominal: in row A, controls are on the left, patients who completed interval assessments are in the middle and patients who did not complete the interval assessment are on the right; in row B the scores are arranged randomly.
Figure caption: Tests are named at the top of each column. VOSP is the Visual Object and Space Perception battery and mu is the reciprocal of the latency as measured by saccadometry. Row A shows baseline scores on the test. Blue diamonds are controls, red squares are patients and black triangles mark the baseline score for patients who died before interval testing. A lower value on the y axis for the mu graph corresponds to a lengthening of latency between stimulus presentation and saccade initiation. Row B shows the difference in score between baseline and interval in those patients who completed both assessments, adjusted to show the change in score over twelve months irrespective of how far apart the assessments were. Negative values represent worsening of function. In both rows the x axis is nominal: in row A, controls are on the left, patients who completed interval assessments are in the middle and patients who did not complete the interval assessment are on the right; in row B the scores are arranged randomly.
One of the most interesting and novel results is that cognition, despite being very disordered in a significant proportion of patients, does not change appreciably over the course of a year (see Table 5 and Figure 3). This stability was seen in multiple tests, including the ACE-R, Hayling test errors, Frontal Assessment Battery and Brixton test. The only exception was the mental subsection of the PSPRS (see Figure 6). This section of the PSPRS includes a measure of bradyphrenia (a cardinal cognitive feature of PSP), which deteriorated over one year even though it is not objectively measured or operationalised. Cross sectional studies have found no correlation between cognitive function and disease duration [20][21][22], supporting our findings. However, some studies have reported that cognitive symptoms appear to progress throughout the disease [1,4,6,23-27]. Only three of these were prospective longitudinal studies, of which one used the PSPRS [4] and one used, as in the PSPRS, a carer reported questionnaire biased towards apathy, bradyphrenia and depression [6]. The other longitudinal study [24] found only a 12% increase in frontal lobe symptomatology between first and last visit (46% to 58%) compared to larger increases in other features of the disease, suggesting cognitive impairment had developed early and then remained relatively stable. Early cognitive impairment is also suggested by a PET study which found frontal cortical hypometabolism in patients with mild disease [28].
Taken together, these data could imply that frontal cognitive dysfunction occurs before overt presentation of the motor symptoms that usually lead to diagnosis and then remains relatively stable, apart from bradyphrenia. New treatments to avert cognitive impairment in PSP would therefore require a transformation in the awareness, recognition and specialist referral pathways for PSP.
Within the PSPRS, it can be seen (Figure 6) that large changes occur for the gait/midline and history sections. Early falls are a core inclusion criterion for PSP, while the presence of a gait/midline disorder is reflected in the supportive criteria [8]. Although progression often leads to wheelchair dependence, other midline features (neck rigidity for example) appear to continue to progress. The change in gait/midline problems on the PSPRS is mirrored in the UPDRS motor section and has been seen in other studies [4,6].
The change in PSPRS mean score for dysphagia was below the rate of change of the overall score, and that for dysarthria was only just above it. Several studies have shown that the leading cause of death in PSP is respiratory complications arising from aspiration [24,29-31], with dysarthria also deteriorating during the illness [6,24,25,30,32]. We may have seen greater changes if our patients had been at more advanced stages of the illness. However, these findings may also be due to dysarthria and dysphagia being variable during the course of the day, or to non-linearity of the scale, which relies, for example, on the patient coughing once or many times on drinking water [5].
Oculomotor function changed over one year, including the range of vertical gaze in the PSPRS (cf [33]). Horizontal gaze range, in contrast, was stable. However, we were especially interested in the latency immediately prior to a saccade, resulting from a cortical and subcortical supranuclear oculomotor decision network. We identified a small but significant change in latency (mu, reciprocal latency) between baseline and interval, indicating an increase in latency for visually evoked horizontal saccades. Such saccades have been proposed as a biomarker for the diagnosis or progression of PSP and other neurodegenerative disorders [34][35][36][37], although other studies have found variable changes in saccade parameters over time [6,37,38]. We suggest that while the latency of horizontal saccades remains useful to explore the neural systems of decision making in disease, and perhaps long term change in PSP, it is not optimal as a marker of change over 1 year, the typical timescale for pharmaceutical trials.
In our study, the greatest change was seen in the PSPRS, particularly the gait/midline section. We did not assess the Parkinson's plus scale developed within the NNIPPS trial (the Natural history and Neuroprotection in Parkinson Plus Syndromes, NNIPPS-PPS) [6]. This scale was developed by a consensus of experts and includes sections similar to parts of the UPDRS (motor, mental and activities of daily living (ADL) sections) and the PSPRS (mental section). In that trial, as in our study, the greatest annual change was seen in mobility, axial bradykinesia and rigidity. They also found a large annual change in limb bradykinesia, whereas our limb section of the PSPRS was one of the sections that changed least.
Table 7 legend. The group size is estimated for each arm of an intervention (not including attrition) where a therapy reduced progression of that aspect of the disease by 25 or 50%. Alpha is set at 0.05 and beta (β) at 0.2 (i.e. a power of 80%).
By contrasting the range of tests which are abnormal at baseline in PSP (median time from onset to recruitment 3.0 years) with the tests which change over one year's interval, it is clear that motor, oculomotor and cognitive systems evolve over different timescales. From our data, cognition deteriorates early in susceptible individuals and then remains relatively stable. Motor deficits also occur early but continue to progress throughout the whole course of the disease. Oculomotor function, in terms of the range of eye movements, deteriorates before the time of diagnosis but continues to progress slowly during the middle stage of disease. Oculomotor function in terms of saccadic latency may be affected by disease, and may be informative about within-subject differences in cognition or brain function [36], but does not change markedly at the group level.
The differential rates of deterioration may be due to the relatively domain specific cortical-subcortical pathways [39] having different susceptibilities to PSP pathology or different capacities for functional compensation. This differential progression has important implications for patients. However, it is also relevant to disease-modifying drug trials, where it is critical to choose the best marker of disease progression. We have used our data to assess the tests used as markers of disease progression, giving the sample sizes needed depending on the effect size and likely effect of any treatment (table 7). It should be noted that our calculations pertain to studies of comparable patients at similar stages of disease (see tables 2 and 3 for our baseline data), and inclusion of early stage patients may lead to different power calculations. However, because we recruited prospectively from patients who presented to a regional clinic with a diagnosis of PSP during the time of the study, it is likely that our cohort is typical of many other centres. It should also be borne in mind that symptomatic disease duration is not a good proxy for stage. Although in our sample the estimated duration did correlate with baseline PSPRS score, it did not significantly correlate with rate of change of PSPRS or with other baseline motor and cognitive measures. We also note that one of our patients had relatively slow progression, surviving 17 years. This is within diagnostic criteria and within the range of published cohorts, but nonetheless unusual.
Using these power calculations, the group sizes vary widely depending on the different elements of the disease that investigators may want to influence. Assessing improvement in cognition would require over 800 patients in each group using the ACE-R [10] but assessing global function requires a more tractable 45 patients using the PSPRS [5], which accords with the estimate from Whitwell et al. [4]. Payan et al. also assessed the PSPRS and estimated 100 patients in each group would be needed, but only 40 if using the NNIPPS-PPS [6].
There are several potential limitations to our study. Firstly, inclusion criteria relied on clinical rather than pathological diagnosis. However, ten cases have subsequently had a post mortem examination and the diagnosis of PSP was confirmed in all ten. Larger trials have also shown a diagnostic accuracy in excess of 90% [7]. Secondly, our study is relatively small. However, patients had typical clinical phenotypes of PSP (cf Richardson's syndrome, PSP-RS) and completed an intensive evaluation across many functional domains over one year, providing a significant addition to the previous literature on cognitive, motor or oculomotor progression. PSP usually progresses quickly and it is possible that those patients who did not complete interval testing represented those who had a more aggressive phenotype of the disease, leaving behind a subset of patients who were less likely to change over a year. Against this possibility however, is that there was no difference between survivors and non-survivors in terms of demographics, cognition or disease duration to suggest that they represented distinct populations. Furthermore, comparisons between baseline and interval metrics were only carried out between patients who had completed both sets of tests.
Another significant limitation is that patients may have different rates of decline for different functions as the disease progresses (eg. cognitive vs motor). Furthermore, tests may vary in their ability to represent the true function of patients through the course of the illness e.g. due to floor or ceiling effects. The interpretation and replication of our data, including power calculations for interventional trials, should therefore take into account the baseline characteristics of our cohort, including stage or severity of disease.
In conclusion, we suggest that cognition does not change appreciably over a year in the middle stages of PSP (after diagnosis), a novel result that may shed light on the underlying pathological deterioration in PSP and which needs to be replicated in further studies. Of clear significance to the development of new treatments is our finding that patients show significant deterioration over one year using the PSPRS severity measure. Indeed, including patients who were clinically diagnosed with PSP [5], which is broadly equivalent to the operational diagnostic criteria used by a recent trial [7], annual change in the PSP rating scale was matched between our study and the original PSPRS study, at 11.3 points a year [5]. We also concur with Whitwell et al. that, using the PSPRS, approximately 45 patients in each treatment arm would provide reasonable power for future clinical trials of a highly effective treatment (50% slowing of annual decline) [4]. The high rate of pathological confirmation from the PSP clinical phenotype [7] and the properties of the PSPRS support the use of these simple tools in new clinical trials of this devastating disease.