Bringing the Cognitive Estimation Task into the 21st Century: Normative Data on Two New Parallel Forms

The Cognitive Estimation Test (CET) is widely used by clinicians and researchers to assess the ability to produce reasonable cognitive estimates. Although several studies have published normative data for versions of the CET, many of the items are now outdated and parallel forms of the test do not exist to allow cognitive estimation abilities to be assessed on more than one occasion. In the present study, we devised two new 9-item parallel forms of the CET. These versions were administered to 184 healthy male and female participants aged 18–79 years with 9–22 years of education. Increasing age and years of education were found to be associated with successful CET performance as well as gender, intellect, naming, arithmetic and semantic memory abilities. To validate that the parallel forms of the CET were sensitive to frontal lobe damage, both versions were administered to 24 patients with frontal lobe lesions and 48 age-, gender- and education-matched controls. The frontal patients’ error scores were significantly higher than the healthy controls on both versions of the task. This study provides normative data for parallel forms of the CET for adults which are also suitable for assessing frontal lobe dysfunction on more than one occasion without practice effects.


Introduction
In everyday life, cognitive estimation is an important form of problem solving. Because previously learned knowledge cannot be called upon directly and the exact answer is not known, reaching an appropriate answer requires the development of an appropriate strategy and reasoning (e.g., estimating how much your grocery shopping will cost). To produce reasonable cognitive estimates, individuals need to identify and select the appropriate cognitive set, retrieve and manipulate particular details or estimates from that cognitive set, monitor the appropriateness of their response and repeat the procedure if necessary to produce a better estimate.
The Cognitive Estimation Task (CET) was devised by Shallice and Evans [1] in an attempt to assess the ability to provide appropriate cognitive estimates. Shallice and Evans [1] found that patients with damage to the frontal lobes performed poorly on the task, producing bizarre over- or under-estimates. Many of the cognitive abilities thought to be important for producing successful cognitive estimates are executive in nature. Executive functions are thought to be mediated mainly by the frontal lobes [2]. Thus, it is not surprising that frontal lobe damage produces deficits in estimation. Since this original study, a number of researchers have demonstrated deficits in cognitive estimation in patients with frontal lobe lesions compared to patients with temporal or diencephalic lesions and healthy controls [3][4][5][6]. However, it should be noted that Taylor and O'Carroll [7] did not find a significant difference between patients with anterior and posterior lesions in terms of cognitive estimation in a large group of patients with different neurological conditions. Deficits in CET performance have also been reported in Alzheimer's disease [6][7][8][9][10][11][12][13], Korsakoff's disease [11,14,15], frontotemporal dementia [12], subcortical vascular dementia [16], post-encephalitis amnesia [17], major depressive disorder [18], traumatic brain injury [19] and in some cases of non-demented Parkinson's disease [20] (but see [21]).
The CET devised by Shallice and Evans [1] included 15 questions. The possible answers were coded according to the degree of bizarreness using a 4-point system. The CET was administered to a group of 45 frontal patients, a group of 51 posterior patients and a control group of twenty-five patients with extra-cerebral lesions. The results revealed that the percentage of very extreme, extreme, and quite extreme responses produced by the frontal patients was significantly greater than the control group, indicating a frontal lobe deficit associated with poor performance on the task.
The CET became a relatively widely used test of executive functions [22]. However, a number of issues have been raised concerning both the original version of the CET and the versions developed subsequently. For example, in the original version of the CET, there was only a small control group and there was no published normative data [1,10,13,23]. It could also be argued that certain items are only answerable by individuals from the specific country in which the normative data were obtained [13][24][25][26][27]. Axelrod and Millis [24] produced normative data for a revised version of the CET based on a larger group of 164 healthy American volunteers. Around the same time, British normative data for a shortened version of the original CET [23] were collected by O'Carroll, Egan, and MacKenzie [25] from a sample of 150 participants. Furthermore, normative data have been obtained from individuals with limited education ranges. For instance, Axelrod and Millis [24] recruited more highly educated individuals whereas O'Carroll et al. [25] focused more on participants with fewer years of education. Other studies have collected normative data for versions of the CET which include items thought to rely less on general knowledge [28] and items for use with individuals from different cultures [13,26,27] but many of these include items that are now outdated [1,23,25]. For a review of the different CET versions used in healthy and clinical populations see Wagner, MacPherson, Parente and Trentini [29].
Repeated assessments of 'executive' functions are often required to monitor a large number of neurological conditions. Therefore, having multiple versions of an executive task is an undoubted advantage. In the specific case of the CET, the questions asked should be novel if the test is administered on several occasions. This would prevent participants from thinking about possible responses after the test or remembering answers given previously. However, to the best of our knowledge, parallel versions of the CET do not exist. The aim of this study is to devise two parallel standardized versions of the CET which contain more up-to-date landmarks, people and objects that everyone will be familiar with. Normative data on a large number of healthy controls varying in age, gender and years of education will be provided. Finally, the performance of frontal patients on these parallel forms will be assessed.

Experiment 1 Participants
One hundred and eighty-four healthy British volunteers (81 men, 103 women) aged between 18 and 79 years (M = 48.07 years, SD = 17.51 years) were recruited for the study. Their level of education ranged from 9-22 years (M = 14.33 years, SD = 2.92 years). Participants were grouped into different decades according to their ages: 18-29 years, 30-39 years, 40-49 years, 50-59 years, 60-69 years and 70-79 years, as well as in relation to their different levels of education in the UK: 9-11 years (O level or Standard Grade examinations), 12-15 years (A-level or Higher Grades examinations as well as College level higher education) and 16 plus years (University level education). One hundred and sixty-nine participants were right handed. None of them had any previous history of head injury or stroke, major neurological or psychiatric illness, or alcohol abuse as listed in the exclusion criteria for the Wechsler Adult Intelligence Scale-III UK (WAIS-III UK) [30] and the Wechsler Memory Scale-III (WMS-III UK) [31]. The majority of participants were recruited through the Institute of Cognitive Neuroscience, University College London volunteer panel and the Department of Psychology, University of Edinburgh volunteer panel. Others were recruited through an advertisement in a local newspaper, through personal contact with the researchers or word-of-mouth. Participants were reimbursed for any expenses for their participation. All participants spoke English as their first language. The study was approved by the National Hospital for Neurology and Neurosurgery & Institute of Neurology Joint Research Ethics Committee in London and the Philosophy, Psychology and Language Sciences Research Ethics Committee in Edinburgh. Written consent was obtained according to the Declaration of Helsinki.
The demographic information of the participants according to age, education and gender is shown in Table 1.
Table 1. Distribution of the participants' demographic characteristics according to age, education and gender.

Experiment 1 Background Neuropsychological Measures
All participants performed the National Adult Reading Test-Revised (NART) to estimate verbal intelligence [32] and Raven's Advanced Progressive Matrices to assess nonverbal abstract reasoning (APM) [33]. The Graded Naming Test (GNT) [34] was administered to assess naming abilities and the Graded Difficulty Arithmetic (GDA) [35] to assess arithmetical abilities. The Information subtest from the WAIS [30] was administered to assess general knowledge.

Experiment 1 Cognitive Estimation Task (CET)
Fifty-eight estimation questions were devised by the authors who included items relating to length (12), weight (10), area (10), speed (10) and number (16). All items required numerical responses. Participants were told that for most questions there was no exact answer or it was unlikely they would know the answer so they should make a reasonable guess or estimate of what the answer would be. The estimation questions were asked out loud by the experimenter and participants gave their answers orally. Participants could answer the items using their preferred unit of measurement, but when scoring the items, the responses were converted to the same unit of measurement. This was to ensure that participants did not fail to provide an appropriate estimate due to unfamiliarity with the unit of measurement rather than poor estimation abilities. Individuals were given as much time as necessary to produce estimations. For each item, participants were asked if they were sure that the response they had provided was a reasonable estimate and, if not, they were able to change their response.

Experiment 1 Data Analysis
Internal consistency of the CET items was examined using Cronbach's alpha coefficient [36] and Guttman split-half reliability coefficient [37]. Spearman's correlation coefficients were calculated to examine the relationship between the participants' CET performance and the background neuropsychological measures as well as age, gender and education. Linear regression analyses were conducted to determine which descriptive characteristics and neuropsychological measures are significant predictors of cognitive estimation. Finally, principal component analyses (PCA) were conducted on each 9-item CET with orthogonal rotation (varimax) to determine the number of components each CET loads upon.
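As an illustration of the internal consistency measure named above, the following is a minimal sketch of Cronbach's alpha computed from a participants-by-items matrix of error scores. The function name and the toy data are hypothetical and are not taken from the study; the statistical package actually used by the authors is not stated.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of participants, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Assumes every participant has a score for every item and that the total
    scores are not all identical (otherwise the denominator is zero).
    """
    k = len(scores[0])                      # number of items
    item_columns = list(zip(*scores))       # one tuple of scores per item
    item_var_sum = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical toy data: 4 participants x 3 items, each scored 0-3.
toy = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 3]]
print(round(cronbach_alpha(toy), 3))
```

Population variances are used for both the items and the totals; using sample variances throughout would give the same ratio and therefore the same alpha.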

Experiment 2 Participants
Twenty-four patients (14 men and 10 women) with lesions localised within the frontal lobes were recruited from the National Hospital for Neurology and Neurosurgery. Inclusion and exclusion criteria were: (1) the presence of a focal lesion confined to the frontal lobes based on a clinical CT or MRI scan, (2) English as a first language, (3) absence of childhood onset epilepsy (late onset seizures arising from the lesion were allowed), (4) absence of severe aphasia, and (5) absence of other significant neurological and psychiatric disorders. The aetiologies were as follows: glioma = 12, meningioma = 6, subarachnoid haemorrhage (SAH) = 3, space occupying lesion (SOL) = 1, anterior communicating artery aneurysm (ACoAA) = 1 and arteriovenous malformation (AVM) = 1. Twelve of the 18 tumour patients had undergone surgical excisions. Three of the 6 remaining tumour patients had undergone CT stereotaxic biopsies without excision. Frontal lesions were localised by operation site in the case of surgical
Forty-eight healthy volunteers (28 men and 20 women) from Experiment 1 were selected as controls for the 24 frontal patients. Two healthy volunteers were selected for each patient, both matched in terms of gender as well as age and years of full-time education (plus or minus a maximum of 3 years). The mean age of the healthy control group was 47.92 years (SD = 13.11 years, range = 25-75 years) and the mean education was 14.67 years (SD = 3.26 years, range = 10-21 years). An independent samples t-test demonstrated that the frontal patients and the control volunteers did not differ significantly in terms of age (p = .99) or years of education (p = .84). Consent was obtained according to the Declaration of Helsinki and the study was approved by the National Hospital for Neurology and Neurosurgery & Institute of Neurology Joint Research Ethics Committee. Table 2 shows the means and standard deviations for the volunteers performing the background neuropsychological measures.
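The matching rule described above (two controls per patient, same gender, age and education within 3 years) can be sketched as follows. This is only an illustration: the dictionary keys and function names are hypothetical, the sketch assumes the 3-year tolerance applies to both age and education, and it simply takes the first eligible volunteers rather than reproducing whatever selection procedure the authors used.

```python
def is_eligible_match(patient, control, tolerance=3):
    """True if the control has the same gender as the patient and is within
    `tolerance` years on both age and years of education (assumed rule)."""
    return (
        patient["gender"] == control["gender"]
        and abs(patient["age"] - control["age"]) <= tolerance
        and abs(patient["education"] - control["education"]) <= tolerance
    )


def pick_controls(patient, pool, n=2):
    """Return up to n eligible controls from the pool, in pool order."""
    eligible = [c for c in pool if is_eligible_match(patient, c)]
    return eligible[:n]


# Hypothetical example records.
patient = {"gender": "F", "age": 50, "education": 14}
pool = [
    {"gender": "F", "age": 52, "education": 12},   # eligible
    {"gender": "M", "age": 50, "education": 14},   # wrong gender
    {"gender": "F", "age": 54, "education": 14},   # age differs by 4 years
    {"gender": "F", "age": 47, "education": 17},   # eligible (exactly 3 years)
]
print(pick_controls(patient, pool))
```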

Experiment 1 Cognitive Estimation Task (CET)
Firstly, any items that were predominantly British or resulted in ambiguity were removed. The means and standard deviations for the remaining items were then examined. Items where the standard deviation was greater than the mean were removed to reduce the items with the greatest variance in terms of healthy individuals' responses. This resulted in 38 estimation items. The percentiles for each item's actual responses were then examined and outliers that were 1.5 or more times the interquartile range were removed. Responses between the 20th and 80th percentile were considered normal and awarded 0 points. Responses that were equal to or more than the 10th but less than the 20th percentile or more than the 80th percentile but less than or equal to the 90th percentile were considered quite extreme and awarded 1 point. Responses were considered extreme when they were more than or equal to the 5th percentile but less than the 10th percentile or more than the 90th percentile but less than or equal to the 95th percentile and scored 2 points. Finally, responses less than the 5th or more than the 95th percentile were considered very extreme and scored 3 points. Any missing values that were due to the removal of outliers or where participants were unable to provide an estimate were scored as very extreme and awarded 3 points. The descriptive statistics for the 184 healthy participants' actual responses to the 38 CET items when outliers were removed and their percentile ranges are in the supporting information file (Tables S1 and S2).
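The percentile-band scoring rule described above can be expressed compactly in code. The sketch below assumes that the item's percentile cut-offs have already been computed from the normative responses; the function name, the dictionary keys and the example cut-off values are hypothetical and are not the values reported in Tables S1 and S2.

```python
def score_response(value, cut):
    """Map one numerical estimate to an error score of 0-3.

    `cut` holds the item's normative cut-offs under the keys
    p5, p10, p20, p80, p90 and p95. A missing response (None) is
    scored as very extreme, as described in the text.
    """
    if value is None:
        return 3                                   # missing: very extreme
    if cut["p20"] <= value <= cut["p80"]:
        return 0                                   # normal
    if cut["p10"] <= value < cut["p20"] or cut["p80"] < value <= cut["p90"]:
        return 1                                   # quite extreme
    if cut["p5"] <= value < cut["p10"] or cut["p90"] < value <= cut["p95"]:
        return 2                                   # extreme
    return 3                                       # very extreme


# Hypothetical cut-offs for a single item (arbitrary units).
cutoffs = {"p5": 10, "p10": 20, "p20": 30, "p80": 70, "p90": 80, "p95": 90}
print([score_response(v, cutoffs) for v in (50, 25, 15, 5, 75, 85, 95, None)])
```

The 20th and 80th percentile boundaries are treated as inclusive for the normal band, which is the reading that leaves no gap between the bands as they are defined in the text.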
The frequency distributions for the number of times scores of 0, 1, 2 and 3 occurred for each item were then examined. According to the normal distribution, only 5% of scores at each end of the curve should be scored as extreme values (i.e., 3 points). Items are more likely to be useful for detecting abnormality if there are relatively few outliers in the normal population and when normal subjects are more consistent in the responses chosen. The number of times 3 would be allocated for an item would be increased with the number of outliers and the degree to which the more extreme values found were less consistently chosen in the normal population. Therefore, any items where 20 or more individuals achieved a score of 3 were excluded. Participants could then obtain a total score between 0 and 72 for the 24 CET items where the higher the score, the greater the number of responses that deviated from the group. The mean error score for the CET for the entire sample was 14.04 out of a possible 72 (SD = 7.21, range = 1-56).
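The item exclusion rule and the range of the total error score can be illustrated with a short sketch. The threshold of 20 participants and the maximum of 3 points per item come from the text; the function names and the example data are hypothetical.

```python
def keep_item(item_scores, max_very_extreme=20):
    """Retain an item only if fewer than `max_very_extreme` participants
    received a score of 3 (very extreme) on it."""
    n_very_extreme = sum(1 for s in item_scores if s == 3)
    return n_very_extreme < max_very_extreme


def max_total_score(n_items, max_points=3):
    """Highest possible total error score for a form with n_items items."""
    return n_items * max_points


# Hypothetical item: 19 of 184 participants scored 3, so the item is kept.
print(keep_item([3] * 19 + [0] * 165))
# 24 retained items at up to 3 points each give the 0-72 range in the text.
print(max_total_score(24))
```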
The Cronbach's alpha for the 24 items on the CET was .62, which is considered an acceptable level of internal reliability [38,39]. To assess whether the scores for each CET item correlated with the scores for the other CET items, item-total correlations between the individual item score and the total score for the remaining items were also conducted (see Table S3 in the supporting information file for individual items). Those items that were not significant were removed, resulting in 19 CET items; the one remaining item relating to the area dimension was also removed. The remaining 18 CET items had a Cronbach's alpha of .63 and the Guttman split-half reliability coefficient was .59. Spearman's correlational analyses were then conducted to investigate whether performance on the 18-item CET correlates with performance on the background neuropsychological measures. Table 3 shows the correlational analyses for CET performance and the background neuropsychological measures. All measures significantly negatively correlated with CET performance. This suggests that the better the performance on the background measures (i.e., NART IQ, Raven's APM, GNT, GDA and Information subtest), the lower the error score on the CET. The background measures were then entered into a linear regression analysis to investigate their involvement in CET performance. The analysis revealed a statistically significant model that explains 33.9% of the variance on the CET, where performance on the GDA (p < .05) and the Information subtest (p < .0001) significantly influences performance.
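The item-total correlations described above relate each item to the total of the remaining items, so that an item is not correlated with itself. A minimal sketch follows. The text does not state which correlation coefficient was used for this step, so Pearson's r is an assumption here, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.
    Assumes neither sequence is constant (otherwise division by zero)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(scores, item):
    """Correlate one item's scores with the total of all other items."""
    item_scores = [row[item] for row in scores]
    rest_totals = [sum(row) - row[item] for row in scores]
    return pearson(item_scores, rest_totals)


# Hypothetical toy data: 4 participants x 3 items.
toy = [[0, 0, 1], [1, 1, 0], [2, 2, 1], [3, 3, 0]]
print(round(corrected_item_total(toy, 0), 3))
```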
The 18-item CET was then subdivided into two 9-item parallel forms of the CET with a total error score ranging between 0 and 27 (see Tables 4 and 5). When the 18 items were split into two 9-item versions, both the CET A and CET B had relatively low reliability, Cronbach's α = .44 and .51 respectively. The Guttman split-half reliability coefficients were .47 and .59 respectively.
According to the Kaiser-Meyer-Olkin (KMO) measure, the sample for version A of the CET was adequate for PCA, KMO = .57, with individual item KMO values > .50. Bartlett's test of sphericity, χ²(36) = 60.22, p < .01, indicated that correlations between items were sufficiently large for PCA. The analysis revealed that 3 factors could be extracted from the data, each with an eigenvalue greater than 1 and explaining 44.09% of the variance (see Table 4).
For the 9-item CET B, the sample was also suitable for PCA, KMO = .61, with KMO values > .49 for individual items. Bartlett's test of sphericity indicated the correlations between items were large enough for PCA, χ²(36) = 90.307, p < .0001. PCA revealed a four factor solution, each factor with an eigenvalue more than 1, and explaining 57.17% of the variance (see Table 5). If the item, "What is the maximum speed of a cyclist?" was removed from the analysis due to the lower KMO value of .49, the PCA revealed a three factor solution, each factor with an eigenvalue over 1, explaining 49.92% of the variance.
The means and standard deviations for the 184 participants performing versions A and B of the CET respectively according to age group, gender and level of education are shown in the supporting information file (Tables S4 and S5). Spearman's rank order correlations revealed that performance on the two versions of the CET correlated significantly (r = .34, p < .0001).
Spearman's rank order correlations were calculated to examine whether performance on the two versions of the CET correlates with age or education. The performance on the CET was significantly negatively correlated with age (version A: r = -.16
Table 6 provides the percentiles of the distribution of the CET scores adjusted for age, gender and education.

Experiment 2 Background Neuropsychological Measures
Each frontal patient performed the same background neuropsychological assessment as the healthy participants in Experiment 1.

Experiment 2 Cognitive Estimation Task (CET)
The frontal patients were then administered both versions A and B of the final 9-item CET using the same instructions as Experiment 1.

Experiment 2 Data Analysis
The performance of the frontal patients and healthy controls was compared using two-tailed independent samples t-tests when the data were normally distributed and Mann-Whitney U-Tests when the data were not normally distributed. Table 7 shows the means and standard deviations for the frontal patients and healthy control groups performing the background measures. A Mann-Whitney U-Test revealed that the frontal patients correctly answered significantly fewer arithmetical problems than the healthy controls (U = 384.00, z = -2.30, p < .05). However, the frontal group did not significantly differ from the control group on any of the other background measures (p > .10).
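For readers less familiar with the Mann-Whitney U-Test, the following sketch computes the U statistic directly from its pairwise definition, together with a large-sample normal approximation for z. It is an illustration only: the function names are hypothetical, the z approximation omits the correction for ties that statistical packages usually apply, and which group's U a given package reports is not specified in the text.

```python
from math import sqrt


def mann_whitney_u(a, b):
    """Return (U_a, U_b). U_a counts the pairs (x in a, y in b) with x > y,
    and each tied pair contributes 0.5."""
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    return u_a, len(a) * len(b) - u_a


def z_approx(u, n_a, n_b):
    """Normal approximation for U without a tie correction (assumption)."""
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u - mean_u) / sd_u


# Hypothetical toy samples.
print(mann_whitney_u([1, 2], [2, 3]))
# Group sizes of 24 and 48 with U = 384, as reported for the arithmetic test.
print(round(z_approx(384, 24, 48), 2))
```

With group sizes of 24 and 48, U = 384 gives roughly -2.29 under this unadjusted approximation; a tie-corrected calculation would be expected to differ slightly.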

Experiment 2 Cognitive Estimation Task (CET)
The means for the patient groups and healthy controls performing versions A and B of the CET are also in Table 7. Mann-Whitney U-Tests revealed the frontal patients performed significantly more poorly than the healthy controls on both versions of the CET, achieving higher error scores than the control group (version A: U = 788.00, z = 2.54, p < .05 and version B: U = 753.50, z = 2.13, p < .05). Spearman's rank order correlational analysis revealed that the performance of the frontal patients on the two versions of the CET was significantly correlated (r = .45, p < .05). However, unlike our healthy participants, the frontal patients' CET performance did not correlate significantly with age (p > .12).
To determine whether the CET is equally sensitive with both male and female frontal patients, further analysis revealed that the male frontal patients performed significantly more poorly than the healthy controls on both version A (frontal patients:

Discussion
This study provides normative data for two newly devised versions of the CET. Attempts have been made to include up-to-date concepts which are no longer specific to certain countries such as the UK or USA. Age, gender and education were all found to be associated with successful CET performance in healthy individuals. CET performance was also negatively correlated with intellect, naming, arithmetic and semantic memory abilities. Frontal patients were also found to produce significantly higher CET error scores than age-, gender- and education-matched controls. This would suggest that our parallel versions of the CET are suitable for assessing frontal lobe dysfunction in clinical practice and research on more than one occasion in the same individual without practice effects.
The original version of the CET did not explicitly provide participants with the opportunity to change their response if they felt it was inappropriate, although changes to the estimates were accepted. Poor performance on the CET in frontal patients may be due to their impulsive nature where they simply respond with the first answer that comes to mind without monitoring the appropriateness of the response. In the new versions of the CET, participants were encouraged to review their responses in order to examine whether this would result in better estimations. However, our data suggest that this is not the case and even when frontal patients are encouraged to evaluate their responses and change them if necessary, they produce bizarre estimations. Further work is required to determine whether frontal patients are disadvantaged by the inclusion of this extra step on the CET due to a lack of insight into their own estimation abilities.
The negative correlation between age and CET error scores in our normative sample suggests that the older the participant, the better the performance on the task. It may be that our older individuals' better performance on the CET is due to their ability to compensate for poorer reasoning and self-monitoring through intact semantic or factual knowledge. For example, to provide a fitting estimate for the item, "How many strings are there on a harp?", one needs to access semantic knowledge about musical instruments. Indeed, both the current and previous findings in the literature have reported significant associations between CET performance and semantic knowledge [6,11,14,25].
Improved CET performance with age would not be a reason to conclude that the CET is not an appropriate measure of executive dysfunction. For example, there are a large number of studies that have shown that phonemic fluency, another measure of executive dysfunction, does not decline with age or may in fact improve with age [40][41][42][43]. Moreover, while our older adults may have compensated for poorer reasoning and self-monitoring through intact semantic or factual knowledge, this could not explain the poor performance of our frontal group. Indeed, significant correlations between age and CET performance in our frontal patients were not found. Moreover, our frontal patients did not significantly differ from the controls in terms of their NART score or general knowledge performance, and yet they still produced significantly higher CET error scores. A plausible account is that performance on the CET requires both adequate strategic processes and good general knowledge. Therefore, while good general knowledge is necessary to perform well on the CET, without adequate strategic processes it is not sufficient to score within the normal range. In contrast, semantic retrieval on the NART and other neuropsychological tests of general knowledge does not require executive control.
An advantage of the current study was the inclusion of a large number of participants whose levels of education varied greatly. Previous attempts to provide normative data for the CET have tended to include individuals with either high or low education levels [24,25] or at least have not specified what the participants' education levels are [28]. The current findings support earlier studies that have found that the higher the number of years of education, the better the performance on the CET [24,25,27]. Significant correlations were also found between CET performance and intellectual abilities as has been shown with earlier versions of the CET correlating with novel abstract reasoning [11] and general intellectual abilities assessed using the WAIS [23] and the Wechsler Intelligence Scale for Children (WISC-III) [44]. It may be that some of the variance on the CET is due to education or general intellectual abilities rather than estimation abilities.
Few studies in the literature have reported whether gender influences CET performance. However, our CET advantage in male participants has previously been reported by other CET studies in the literature [25,27,45]. Moreover, while typically executive measures do not show gender differences, normative data for the Controlled Oral Word Association Test, one of the most widely used tests of frontal executive dysfunction, has provided evidence of gender effects where women perform better than men on the FAS phonemic fluency task [46,47].
Therefore, while gender effects on executive tasks are not common, they would not speak against the claim that CET performance is a sensitive measure of executive abilities. As the Information subtest of the WAIS [30] and the Graded Difficulty Arithmetic Test (GDA) [35] were significant predictors of CET performance, higher estimation error scores in females might be explained in terms of poorer semantic knowledge and arithmetical abilities. Further analysis revealed that both versions of the CET were sensitive to frontal lesions in male patients but only version A in female frontal patients. However, there was a trend for the female frontal patients to also perform more poorly than healthy controls on version B. It may be that this lack of sensitivity in female patients is due to the small sample size. Further work is required to determine whether this is the case.
Clinicians and researchers generally use the CET to assess executive dysfunction [45]. However, our correlational analyses have shown that cognitive abilities such as abstract reasoning, general knowledge, naming and arithmetical abilities may also underlie the computation and evaluation of reasonable cognitive estimates. The CET appears to be multidimensional in nature with different cognitive functions operating in concert to achieve the overall goal. These findings also suggest that impairments in distinct cognitive abilities might underlie the CET impairments reported in individuals with syndromes such as Alzheimer's disease [6][7][8][9][10][11][12][13], Korsakoff's disease [11,14,15], frontotemporal dementia [12], subcortical vascular dementia [16] and post-encephalitis amnesia [17].
One limitation of this study is the lack of the inclusion of a posterior control group. If the CET is indeed sensitive specifically to frontal lobe lesions, one would predict that patients with nonfrontal lesions should perform similarly to healthy controls when asked to produce cognitive estimates. Indeed, posterior patients produce significantly less extreme responses than frontal patients on the original version of the CET [1,5]. Future work should attempt to demonstrate that patients with non-frontal lesions are able to produce appropriate cognitive estimates on these new versions of the CET, in order to provide evidence on the frontal lobe localisation of CET processes.
In terms of test reliability, the internal consistency of the items when they were split into versions A and B of the CET was low [39], suggesting that the test items within the same version of the CET vary and are not necessarily measuring the same construct. Moreover, PCA revealed that the items in the CET loaded on several different components, which supports the multidimensional nature of the task as well as executive abilities in general [48]. It also should be noted, however, that the value of Cronbach's alpha can be deflated when there are a small number of items and multiple dimensions [49]. Previous versions of the CET have also been shown to have low reliability, as well as correlating with several cognitive domains and loading on more than one component [25,45]. The advantages of the CET as a clinical tool include its ability to be administered quickly, at a bedside and without making any motor demands [45]. The CET also has the advantage of not demonstrating significant declines in performance as a result of healthy adult ageing and therefore might be useful to accurately classify individuals in terms of normal and abnormal ageing. Of course, this is only the initial stage in establishing the usefulness of the new parallel versions of the CET in clinical assessment and further research is required to examine whether performance on the task can be localised to a specific subregion of the frontal lobes and the predictive accuracy of the test in terms of different forms of dementia.

Supporting Information
Table S6 The correction grid with the points to add or subtract from the raw scores to obtain adjusted scores for version A of the CET. For the combinations not reported, the corrections that should be applied to the raw CET scores to achieve adjusted scores are below the table. (DOCX)
Table S7 The correction grid with the points to add or subtract from the raw scores to obtain adjusted scores for version B of the CET. For the combinations not reported, the corrections that should be applied to the raw CET scores to achieve adjusted scores are below the table. (DOCX)

Author Contributions