Longitudinal Changes in Clock Drawing Test (CDT) Performance before and after Cognitive Decline

Background: Many scoring systems exist for variants of the clock drawing test (CDT), but none has been shown to be reliable for evaluating longitudinal changes in cognitive function. The purpose of this study was to create a simple yet optimal scoring procedure for evaluating cognitive decline in a clinic-based sample.
Methods: Clock drawings from 121 participants (76 individuals without dementia who did not develop dementia over a mean follow-up of 41.2 months, and 45 individuals without dementia at baseline who developed dementia over a mean follow-up of 42.3 months) were analyzed with t tests to derive a new, simplified CDT scoring system. The new scoring method was then compared with other commonly used systems.
Results: In the converters, only seven items differed significantly between the initial and second visits. We propose a new scoring system comprising these seven critical items: numbers are equally spaced (12–3–6–9) (p = 0.031), the other eight numbers are marked (p = 0.022), numbers are clockwise (p = 0.002), all numbers are correct (p = 0.030), distance between numbers is constant (p = 0.016), clock has two hands (p < 0.001), and arrows are drawn (p = 0.003). Compared with other traditionally used scoring methods, this based change clock drawing test (BCCDT) had one of the most balanced sensitivity/specificity profiles in a clinic-based sample.
Conclusions: The new CDT scoring system provides further evidence in support of a simple and reliable clock-drawing scoring system for evaluating cognitive decline in follow-up studies, which could also be used to assess the efficacy of medication.


Introduction
Neuropsychological evaluations are an integral part of a complete geriatric evaluation used to diagnose dementia. The clock drawing test (CDT), widely acknowledged for its simplicity and ease of administration, is a measure used to detect cognitive decline associated with a variety of neurobehavioral disorders. The CDT draws on several cognitive abilities, including auditory and visual comprehension, concentration, visuospatial abilities, abstract conceptualization, and executive control [1]. Deficits in these areas reflect possible frontal and temporoparietal disturbances that are often exhibited in Alzheimer disease (AD) [2][3][4] and that may not easily be detected by commonly used cognitive screening tests such as the Mini-Mental State Exam (MMSE) [5]. Correlating highly with the MMSE [6] and other measures of global cognitive decline, the CDT serves as a simple and nonthreatening cognitive screen, which has made it a popular tool in both clinical and research practice [5], [7].
In the past 30 years, many variations of the Clock Drawing Test (CDT) have risen to the forefront as a dementia screening measure [6], [8][9][10][11][12][13][14][15][16][17] (see table 1). The scoring system by Shulman et al. in 1986 [18] was one of the oldest methods. Sunderland et al. [9] used a 10-point anchored system based on preset criteria with an arbitrary cut-off at 6 points. They found that interrater reliability was high in clinicians and non-clinicians. However, this scale proved difficult to apply according to the criteria provided since it assumes that the representation of the hands is first and entirely affected, and other errors in the representation of numbers and the clock face occur later. Therefore, some drawings received very low scores for minor errors in the representation of numbers even though the hands were properly placed. In 1989, Wolf-Klein et al. [6] tested patients who were admitted consecutively to a nursing facility without preselection, although the group with AD was older than the normal group. The 10 anchor points pertain only to the spacing of the numbers; time setting is not assessed, therefore their system was less sensitive to problems with executive functioning. Sample anchor points include: 10 'normal'; 7 'very inappropriate spacing'; 4 'counter-clockwise rotation'; and 1 'irrelevant figures'. They reported a sensitivity of 75.2% and a specificity of 97.7% for distinguishing between demented and ''mentally normal elderly.'' The Clock Completion Test of Watson et al. [11] is an objective and simple scoring method. The subject is asked to place all the numbers in the clock, but not to set a time. Consequently, the scoring is only based on the position of the numbers in the clock face. No hands are required or scored and so some sensitivity is lost. The authors report that the number of digits in the 4th quadrant (9-12) had the best agreement with the diagnosis of dementia. 
The Clock Drawing Interpretation Scale (CDIS) by Mendez et al. [10] uses 20 points distributed between general impression, placement of numbers and hands, with a score higher than 18 considered normal. The authors found that the presence of the number 2 and the correct location of the minute hand were the items most frequently absent and were absent in all AD patients. In 1998, Royall et al. [14] developed the CLOX test, a CDT scoring system that they describe as specifically designed to measure executive control functions. The patient is asked to draw a clock on an empty page and later to copy a clock. The authors suggest that the difference between these two tasks can serve as a measure of executive control function.
Recently, a growing number of studies have focused on the utility of the CDT in screening for mild cognitive impairment, with inconsistent results, but little is known about the longitudinal changes in performance before and after cognitive decline. To our knowledge, most previous articles were cross-sectional, and none has used the CDT to evaluate whether individuals without dementia had progressed to mild Alzheimer's disease. We therefore conducted this study to investigate which aspects of clock drawing are important when assessing characteristic changes in performance over time.

Participants
This study was conducted at the Memory Clinic of Shanghai Huashan Hospital, Fudan University. The cohort consisted of participants referred to the clinic between June 2004 and November 2012 who had completed laboratory tests and a cranial CT/MRI scan and were found to have no clinically significant abnormalities in vitamin B12, folic acid, thyroid function (free triiodothyronine, FT3; free tetraiodothyronine, FT4; thyroid stimulating hormone, TSH), rapid plasma reagin (RPR), or treponema pallidum particle agglutination (TPPA). During the initial visits, all patients were assessed by physicians experienced in dementia disorders and underwent thorough physical, psychiatric and neurological examinations, as well as an interview focused on their cognitive symptoms. All of the MCI participants were diagnosed according to the following criteria, which take the Mayo criteria [37] as reference: (1) cognitive complaints verified by an informant; (2) cognitive impairment lasting more than 3 months; (3) Chinese version of the Mini-Mental State Examination (C-MMSE) score ≥ the education-adjusted cut-off: edu ≥ 9 yr, 26; 6 ≤ edu < 9 yr, 22 [19]; (4) preserved basic activities of daily living (ADL) with at most minimal impairment in complex instrumental functions; (5) etiology unknown; (6) normal hearing and sight; (7) not meeting the diagnostic criteria for dementia of the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA).
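The education-adjusted C-MMSE criterion in item (3) can be written as a small rule. The sketch below is illustrative only: it reads the criterion as a cut-off of 26 for 9 or more years of education and 22 for 6 to fewer than 9 years, and the function names are hypothetical. No cut-off is stated here for fewer than 6 years of education, so the sketch returns `None` in that case rather than guessing.

```python
from typing import Optional


def mmse_cutoff(education_years: float) -> Optional[int]:
    """Education-adjusted C-MMSE cut-off as read from criterion (3).

    Returns None when the text states no cut-off (fewer than 6 years).
    """
    if education_years >= 9:
        return 26
    if 6 <= education_years < 9:
        return 22
    return None  # not specified in this section


def meets_mmse_criterion(mmse_score: int, education_years: float) -> Optional[bool]:
    """True if the score is at or above the cut-off; None if no cut-off applies."""
    cutoff = mmse_cutoff(education_years)
    if cutoff is None:
        return None
    return mmse_score >= cutoff
```

For example, `meets_mmse_criterion(27, 12)` is `True`, while `meets_mmse_criterion(21, 6)` is `False`.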
In the present study, 121 participants were included at baseline and followed for about four years after the first visit. Seventy-six participants did not convert to dementia over longitudinal follow-up, with a mean duration of 41.2 months. These participants are termed Non-converters (mean age = 68.8 years, SD = 9.0; mean education = 13.4 years, SD = 2.6). Another group of 45 participants progressively deteriorated and were judged clinically to have developed Alzheimer's disease over longitudinal follow-up. They are termed Converters (mean age = 69.4 years, SD = 6.8; mean education = 12.6 years, SD = 2.3). The mean duration of follow-up for the converters was 42.3 months (SD = 18.8). AD was diagnosed as probable AD according to the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA/NINCDS-AIREN) criteria. According to the scores obtained at both visits on the MMSE (Mini-Mental State Examination) [20], CFT (complex figure test) [21], AVLT (auditory verbal learning test) [22], AFT (animal fluency test) [23], STT (shape trails test) [24], CDR (clinical dementia rating scale) [25] and SCWT (Stroop color word test) [26], the AD was only mild in severity. This study was approved by the ethics committee of Shanghai Huashan Hospital, Fudan University. All participants signed a consent form.
During the clock-drawing test, participants were asked to draw a big circle and put in the numbers of the clock, and then they were asked to indicate the time as "50 after 13." There was no time limit for this test.
Based on previous studies, we chose 18 items and classified them into three major components: (a) drawing planning; (b) numbering; (c) placement and size of the hands. Each component can be further subdivided into several aspects. We scored each clock on the 18 items, rating each item 1 if correct and 0 if wrong.
Moreover, five different scoring systems were used to score each clock, with the rater blinded to the results of the rest of the assessment. We chose them because they were simple, representative and quick for physicians to apply. The three semi-quantitative scoring systems
Table 2. Age, education, and interval between two assessments for individuals classified as Non-converters and Converters.

Data analysis
Initial analyses (t test) examined the relationship between cognitive status (non-converter vs. converter) and age, years of education or interval between the two assessments to determine if these variables should be considered as covariates.
MMSE total, CFT-Copy, CFT-Recall, AVLT-I, AVLT-II, AFT-total, STT-A, STT-B, CDR, CDR-SB were compared between non-converters at the first visit (V1) and non-converters at the second visit (V2), converters V1 and converters V2, as well as non-converters V2 and converters V2 to determine the general cognitive function.
We selected 18 CDT items associated with dementia according to previous studies. Each of the 18 items was converted to a dichotomous variable (0, 1), with "0" indicating no and "1" indicating yes. To determine whether any of the items could predict cognitive status (non-converter vs. converter), an initial t test was conducted between non-converters V1 and converters V1. To identify the longitudinal changes in performance before and after cognitive decline, a second t test was conducted between V1 and V2 in converters. Finally, we conducted a third t test between non-converters V2 and converters V2 to identify differences in the 18 items between patients with dementia and patients without dementia.
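To make the item-level comparison concrete, the sketch below computes a Student t statistic for one dichotomous item between two groups of 0/1 ratings. It is an illustration rather than the study's analysis code: the text does not say whether the V1 versus V2 comparison within converters was paired, so the independent-samples form with pooled variance is assumed here, and the function name and example ratings are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Independent-samples Student t statistic with pooled variance.

    Each group is a list of 0/1 item ratings (1 = correct, 0 = wrong).
    Returns (t, degrees_of_freedom). Raises ZeroDivisionError if both
    groups have zero variance, because the statistic is then undefined.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df


# Hypothetical ratings of one item at the first and second visit.
item_v1 = [1, 1, 1, 0]
item_v2 = [0, 0, 1, 0]
t_value, dof = student_t(item_v1, item_v2)
print(round(t_value, 3), dof)
```

A p value would then be read from the t distribution with `dof` degrees of freedom, which a statistics package would normally handle.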
Once the items that significantly discriminated between converters V1 and converters V2 had been isolated, we proposed a new scoring system named the based change clock drawing test (BCCDT). Then we compared the BCCDT with the CDT by Sunderland
For all of the 242 assessments, the CDT scores obtained from the six scoring methods were correlated with each other to investigate the relationship between the types of scoring method.
Continuous variables were compared with the Student t test, or with the Mann-Whitney U test when the data were not normally distributed. P values and CIs were estimated in a two-tailed fashion. Differences were considered statistically significant at P < 0.05.

Characteristics of the participants
During clinical follow-up, 76 (63%) participants remained nondemented and 45 (37%) participants developed dementia. We divided the participants into two groups, Non-converters and Converters. Initial t tests revealed that the groups did not differ significantly in age (t = 0.340, P = ns), education (t = 1.726, P = ns) or interval between the two assessments (t = 0.349, P = ns) (see table 2).

Cognitive state of the Non-converters and Converters
According to cognitive state at V2, we obtained four groups: Non-converters V1, Non-converters V2, Converters V1 and Converters V2. The first two denote the baseline and second visits of the participants who did not convert to dementia over longitudinal follow-up, and the latter two denote the first and second visits of the participants who developed Alzheimer's disease over longitudinal follow-up. We used t tests to compare MMSE total, CFT-Copy, CFT-Recall, AVLT-I, AVLT-II, AFT-total, STT-A, STT-B, CDR and CDR-SB between pairs of the four groups. Most comparisons showed significant differences, except for CFT-Copy between Non-converters V1 and Converters V1, and STT-B and CDR between Converters V1 and Converters V2 (see table 3).
Table 4 shows the t tests conducted to assess the utility of the 18 items in our sample. First, only one item differed significantly at baseline between converters and non-converters (t = 4.731, p = 0.030), and on that item the converters performed better than the non-converters, meaning that it was difficult to predict dementia. Second, for the items that were poorly completed at baseline in converters (accuracy rate below 50%), namely "12, 3, 6, 9" are first written after the circle, "1, 2, 4, 5, 7, 8, 10, 11" are equally spaced, hour hand is towards correct number, minute hand is towards correct number, and minute hand is longer than hour hand, there was no significant difference between V1 and V2, showing that items poorly completed at baseline were not always the sensitive ones for predicting dementia. Third, in the converters, four items scored higher at V2 than at V1, indicating that those four items did not help to improve predictive value. Therefore the total CDT score should not simply be the sum of all items. Fourth, at the second visit, 15 items differed significantly between non-converters and converters.
Among the converters, however, only seven items differed between V1 and V2, which means that the items sensitive to dementia in the cross-sectional comparison and in the longitudinal comparison were not the same. These seven significant items appeared to be possible markers of progression to dementia in follow-up studies: numbers are equally spaced (12-3-6-9) (p = 0.031), the other eight numbers are marked (p = 0.022), numbers are clockwise (p = 0.002), all numbers are correct (p = 0.030), distance between numbers is constant (p = 0.016), clock has two hands (p < 0.001), and arrows are drawn (p = 0.003). All of these items showed marked differences between baseline and follow-up scores in converters. We conclude that these seven items may constitute a simple clock-drawing scoring system for follow-up studies evaluating whether individuals without dementia have progressed to dementia. We named the new scoring system the based change clock drawing test (BCCDT). The other 11 items did not prove to be major contributors.
Table 5. Correlations between the CDT score and other cognitive measures.
Table 5 presents the correlation coefficients between the CDT scores and other cognitive measures. All correlations between the six CDT scores and the MMSE total score, AVLT-I, AFT and STT-A were significant, with the highest correlations with these measures occurring for the BCCDT. The Sunderland scoring system and the BCCDT correlated with the time of Rey-O CFT-Copy (s), AVLT-II and the correct number on SCWT-C; for the time of Rey-O CFT-Copy (s) and AVLT-II, the higher of the two correlations was obtained with the BCCDT. The Watson and Sunderland scoring systems and the BCCDT correlated with CFT-Recall, and the three correlation coefficients were similar.
The MOCA-CDT, the Shulman scoring system, the Sunderland scoring system and the BCCDT correlated with the time of SCWT-C (s), and the highest correlation was between the BCCDT and the time of SCWT-C (s). In summary, the BCCDT displayed good correlation with other memory clinic measures (see table 5). Table 6 summarizes correlations between the six scoring methods, including the BCCDT. Across the 242 assessments, the six systems were moderately to highly correlated, with the highest correlation occurring between the MOCA-CDT and the Pfizer Inc. and Eisai Inc. scoring method. All correlations between the BCCDT and the other methods were statistically significant at the 0.01 level.
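As an illustration of how two sets of scores can be correlated, the sketch below computes a Pearson correlation coefficient from first principles. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the function name and example scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical CDT totals and MMSE totals for five assessments.
cdt_scores = [7, 6, 5, 3, 2]
mmse_scores = [29, 27, 26, 22, 20]
print(round(pearson_r(cdt_scores, mmse_scores), 3))
```

For ordinal scores with many ties, a rank-based coefficient such as Spearman's may be the more appropriate choice.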

Correlations between the six scoring methods
The utility of BCCDT compared with the other five scoring systems
A t test was conducted to assess the utility of the six scoring systems. In converters, the scores at V1 and V2 differed significantly, and the p value for BCCDT (p < 0.001) was the smallest of the six (see Table 7).

Discrimination of different scoring systems between non-converters and converters
The area under the ROC curve is perhaps a less biased way to gauge the efficiency of a screening test, as it reflects the relationship between sensitivity and specificity. ROC curves were drawn for the six scoring systems to evaluate their respective areas under the curve, sensitivities, and specificities (see Table 8). According to the ROC curves, the two groups were best separated by the BCCDT (area under the curve = 0.713, p = 0.001); at its optimal cut-off score of 5, sensitivity was 78.6% and specificity was 57.1%. The Watson scoring method had the smallest area under the curve (0.571, p = 0.260). The MOCA-CDT and Shulman scoring systems had the highest sensitivities, at 92.9% and 88.1%, respectively. The BCCDT and Sunderland scoring procedures fell in the middle, at 78.6% and 73.8%, respectively. The Watson method had the lowest sensitivity, at 54.8%, performing just above chance level for correctly identifying individuals with dementia. With regard to specificity, the BCCDT scoring procedure had the highest value, at 57.1%, followed by the Sunderland scoring method at 47.6%. The specificity of the Pfizer Inc. and Eisai Inc. method closely trailed that of the Watson scoring system, at 38.1%. The Shulman and MOCA-CDT procedures had the lowest specificities, at 33.3% and 28.6%, respectively.
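The quantities reported above can be sketched in a few lines. The example below is illustrative only: it assumes that lower CDT scores indicate impairment and that a score at or below the cut-off counts as screen positive, neither of which is stated explicitly in the text. The function names and scores are hypothetical, and the area under the curve is computed through its rank interpretation (the probability that a randomly chosen converter scores lower than a randomly chosen non-converter, with ties counted as one half).

```python
def sensitivity_specificity(converter_scores, non_converter_scores, cutoff):
    """Sensitivity and specificity when score <= cutoff is screen positive (assumed)."""
    true_positives = sum(score <= cutoff for score in converter_scores)
    true_negatives = sum(score > cutoff for score in non_converter_scores)
    return (true_positives / len(converter_scores),
            true_negatives / len(non_converter_scores))


def area_under_curve(converter_scores, non_converter_scores):
    """AUC as the share of converter/non-converter pairs ranked correctly."""
    correct = 0.0
    for c in converter_scores:
        for n in non_converter_scores:
            if c < n:
                correct += 1.0   # converter scored lower: correctly ordered pair
            elif c == n:
                correct += 0.5   # tie counts as half
    return correct / (len(converter_scores) * len(non_converter_scores))


# Hypothetical total scores at the second visit.
converters = [3, 4, 5, 6]
non_converters = [5, 6, 7, 7]
print(sensitivity_specificity(converters, non_converters, cutoff=5))
print(area_under_curve(converters, non_converters))
```

Repeating the first function over every candidate cut-off traces out the ROC curve, from which an optimal cut-off can be chosen.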

Discussion
Using the clinical sample, BCCDT was found to be effective in evaluating the longitudinal changes in clock drawing test (CDT) performance before and after cognitive decline. It includes seven critical items (numbers are equally spaced (12-3-6-9), the other eight numbers are marked, numbers are clockwise, all numbers are correct, distance between numbers is constant, clock has two hands, arrows are drawn). Further investigations should examine these seven items in the context of other indicators of dementia such as story recall and the MMSE score.
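A scoring routine for the seven items might look like the sketch below. It assumes one point per correctly completed item, giving a total from 0 to 7; the item keys and the function name are hypothetical labels introduced for illustration, not part of the published scale.

```python
# Hypothetical keys for the seven BCCDT items listed in the text.
BCCDT_ITEMS = (
    "numbers_equally_spaced_12_3_6_9",
    "other_eight_numbers_marked",
    "numbers_clockwise",
    "all_numbers_correct",
    "distance_between_numbers_constant",
    "clock_has_two_hands",
    "arrows_drawn",
)


def bccdt_score(item_ratings):
    """Sum the seven item ratings (1 = correct, 0 = wrong); assumed 0-7 total."""
    total = 0
    for item in BCCDT_ITEMS:
        if item not in item_ratings:
            raise KeyError(f"missing rating for item: {item}")
        rating = item_ratings[item]
        if rating not in (0, 1):
            raise ValueError(f"rating for {item} must be 0 or 1")
        total += rating
    return total


# Hypothetical drawing with every item correct except the arrows.
ratings = {item: 1 for item in BCCDT_ITEMS}
ratings["arrows_drawn"] = 0
print(bccdt_score(ratings))
```

Ratings for any of the other 11 items can be present in the dictionary; they are simply ignored by this total.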
The MMSE is one of the most influential cognitive screening methods and has been widely used in screening for dementia and MCI. In previous studies, the orientation and delayed recall parts of the MMSE were good at predicting AD specifically [27], [28]. In clinical practice, however, researchers have found that the MMSE is not sensitive enough for following up cognitive function.
Recently, a growing number of studies have focused on the CDT, as it reflects several cognitive abilities including auditory and visual comprehension, concentration, visuospatial abilities, abstract conceptualization, and executive control [1]. However, most were cross-sectional, and longitudinal studies were few. Ji et al. [29] described the longitudinal changes in performance and error types on the CDT by dementia severity and subtype. They concluded that longitudinal analysis of errors on the CDT may reflect different characteristics of cognitive deterioration according to dementia subtype and stage. Zhou [30] used the Death scoring system (total score of 4) to assess the efficacy of medication, but its sensitivity and specificity have not been verified. We therefore hope that a suitable CDT scoring system will help to evaluate cognitive function longitudinally. Lennie et al. [31] found that "the clock has two hands, the size difference of the hands is respected, and the hour hand is towards correct number" were three early discriminators for developing dementia. These items may be good indicators of further cognitive decline. Sebastian et al. [32] concluded that the MMSE and the clock drawing test were as accurate as CSF biomarkers in predicting future development of AD in patients with MCI. In our study, however, table 4 shows that only one item differed significantly at baseline between the non-converters and converters, and on that item the converters performed better than the non-converters, which means that it was difficult to predict dementia using any one of the 18 items.
Compared with the MCI-NP group (participants with mild cognitive impairment who did not develop dementia on follow-up visits), participants in the MCI-D group (participants with mild cognitive impairment who became demented after a 48-month follow-up) were more likely to fail the item "size difference of the hands is respected" [31]. However, our results showed that items poorly completed at baseline were not the sensitive ones for predicting dementia.
The majority of studies have focused on the utility of different CDT scoring systems in screening for dementia or MCI [33][34][35]. In this study, we found that the items sensitive to dementia in the cross-sectional comparison and in the longitudinal comparison were not the same. The existing CDT scoring systems are therefore not well suited to follow-up studies or to assessing the efficacy of medication.
According to the scores obtained at both visits on the MMSE (Mini-Mental State Examination), CFT (complex figure test), AVLT (auditory verbal learning test), AFT (animal fluency test), STT (shape trails test), CDR (clinical dementia rating scale) and SCWT (Stroop color word test), the AD in converters was only mild in severity. Patients with moderate to severe dementia were not able to complete all of the tests. Therefore, the BCCDT could be used for earlier recognition of whether patients with MCI have progressed to mild AD. In addition, we found that the total CDT score should not simply be the sum of all items, as several items did not help to improve predictive value.
In comparing the non-converters and converters, the new scoring method with a cut-off score of 5 produced a sensitivity of 78.6% and a specificity of 57.1%. Even though the sensitivity of MOCA-CDT and the Shulman scoring method surpassed the new scoring system's sensitivity, the specificity of the new method was the highest among the six systems. This comparison revealed the new method to be more balanced than others for screening AD.
The correlation of the CDT with other screening tests, including the 'gold standard' MMSE, was good in most studies [15], [36], as well as in our study. We suggest that there may be a rationale for using both the MMSE and the CDT when evaluating longitudinal changes in cognitive function, as the MMSE mostly measures verbal skills and so may not be sensitive enough on its own. However, this would considerably increase the time of administration.
Because this was a longitudinal study, one might suspect that the decline in performance on the seven BCCDT items was due to aging. In the non-converters, however, there was no significant difference between V1 and V2. Our results should therefore not be interpreted as reflecting the effect of aging on CDT performance.
Several limitations of this study need to be considered when examining the results. The sample was not population based but comprised clinic-based participants, and was neither as ethnically diverse nor as representative as might be desired. The utility of our proposed scale should be verified in other populations to avoid the bias of "preselected patients". No correlation analysis was performed between the time of V2 and the time of the dementia diagnosis. Moreover, AD was diagnosed as probable AD according to the NINCDS-ADRDA/NINCDS-AIREN criteria, without distinctive biomarkers such as beta-amyloid or positron emission tomography (PET), so diagnostic error could not be excluded.

Key points
Seven items of the clock drawing test may constitute a simple clock-drawing scoring system for follow-up studies evaluating whether individuals without dementia have progressed to dementia.
It was difficult to predict dementia using the 18 items of the clock drawing test.
Items of the clock drawing test that were poorly completed were not always sensitive predictors of dementia.
Some items of the clock drawing test did not help to improve the predictive value for dementia.
Table 7. Scores of four groups using six scoring systems.