Performance of humans vs. exploration algorithms on the Tower of London Test.

The Tower of London Test (TOL), used to assess executive functions, was inspired by Artificial Intelligence tasks used to test problem-solving algorithms. In this study, we compare the performance of humans and of exploration algorithms. Instead of absolute execution times, we focus on how the execution time varies with the tasks and/or the number of moves. This approach, used in Algorithmic Complexity, provides a fair comparison between humans and computers, although humans are several orders of magnitude slower. On easy tasks (1 to 5 moves), healthy elderly persons performed like exploration algorithms using bounded memory resources, i.e., the execution time grew exponentially with the number of moves. This result was replicated with a group of healthy young participants. However, for difficult tasks (5 to 8 moves) the execution time of young participants did not increase significantly, whereas for exploration algorithms, the execution time keeps on increasing exponentially. A pre- and post-test control task showed a 25% improvement of visuo-motor skills, but this was insufficient to explain this result. The findings suggest that naive participants used systematic exploration to solve the problem but, under the effect of practice, they developed markedly more efficient strategies using the information acquired during the test.


Introduction
The Tower of London (TOL) [1] was designed to assess deficits of planning in patients with lesions of the frontal lobe. In Shallice's rationale, these lesions damage the Supervisory Attentional System (SAS) responsible for the non-routine selection of action schemes. In the TOL, "the subject must construct a stack of objects from a starting configuration in series of individual moves" ([1], p. 203). Three beads placed on three rods are moved one by one in order to reach a given configuration. The subject performs twelve tasks requiring between 2 and 5 moves. With four moves or more, the SAS is presumably engaged, thus deficits are expected in patients with frontal lesions. Since then, the TOL has been used as a clinical tool, e.g., as part of the CANTAB (Cambridge Neuropsychological Test Automated Battery) computerized tests [2], and for research on executive functions and cognitive skills. For instance, the PubMed database contains 53 articles on the TOL (March 20, 2009; keywords "Tower London" and/or "TOL" in title; irrelevant references removed manually).
In the original TOL, the difficulty was graded by the number of moves. However, there is empirical evidence that the difficulty can vary markedly among the tasks with the same number of moves (see Section Discussion). It is now accepted that what really mediates the difficulty is the search space (also called problem space) [3,4]. The search space is a graph that represents the possible configurations as nodes and the transformations (or moves) as edges. A task is defined by means of two nodes (initial and final configurations). A solution is a path of minimal length between these nodes. The search space makes it possible to determine the number of alternative paths, the configurations to examine, as well as several factors that may affect performance, like the presence of conflictive moves or sub-goals [5].
Facts and figures on the search space of the TOL can be found at the web site that presents support information for this article [6].
The impact of the search space on the performance of problem-solving programs (problem solvers) has long been known in Artificial Intelligence [7]. The search space determines the combinatorial dimension, i.e., the number of possibilities that problem solvers have to examine. The impact depends on the algorithm, i.e., the predetermined sequences of decisions and operations executed by the program. It also depends on the a priori information and on the memory resources. For instance, a program with a priori information and no memory limitations can use a look-up table that contains a predetermined solution for each task. The solution is found in the table and the combinatorial dimension does not affect performance. Conversely, a program that has no memory and no specific exploration method will explore the search space randomly; therefore, the average execution time grows quickly with the combinatorial dimension.
It may seem straightforward to transpose explanations and results from problem solvers to human performance. In fact, Shallice [1] states explicitly that the architecture of the Supervisory Attentional System and the TOL itself were inspired by an earlier problem solver [8]. However, unless the contrary is proven, it would be premature to assume that humans solve combinatorial problems like programs, i.e., by means of a predetermined strategy (in a broad sense, i.e., a way to solve a problem). It would also be premature to assume that humans are systematic, i.e., that they employ the same strategy for all the tasks of a protocol. In fact, there is evidence to the contrary (see Section Discussion). Also, human performance may be affected by contextual and psychological factors (see Section Discussion). In summary, whether the search space affects programs and humans in the same way is unclear.
However, a simple approach borrowed from the field of Algorithmic Complexity allows a useful comparison of human and program performance on a given search space. The objective is to determine the degree of efficiency of humans by placing the human performance curve (execution time as a function of the number of moves) on a discrete scale used to rate the efficiency of algorithms: constant, logarithmic, linear, polynomial, exponential… The numerical execution time of an algorithm is unimportant because it can be improved with faster computers. Conversely, the pattern of variation is irreducible. For instance, algorithms with exponential patterns of variation are unusable for large-scale problems, whatever the computer.
In order to build a scale of efficiency, we select a few algorithms that solve the TOL efficiently in different conditions (a priori information or not, bounded vs. unbounded memory) and we determine their patterns of variation. We then perform correlation analysis. The pattern of variation with the highest correlation coefficient corresponds to the degree of efficiency of humans, whatever the method they employ to solve the problem. We can refine the method by using measures of performance of real algorithms (task by task) instead of patterns of variations (that only consider the number of moves).
In case of success, the approach will provide insights on the efficiency of human strategies and a fair comparison between humans and algorithms. Because the approach is based on correlation analysis, it works in spite of the facts that humans are several orders of magnitude slower, that human strategies and their neuronal realizations are unknown, and that human performance is affected by contextual and psychological factors of difficulty. Inasmuch as the correlations are strong, we may even use the patterns of variations as algorithmic indexes to predict quantitatively the degree of difficulty due to the search space. This would be a valuable outcome for experimental research using the TOL. Admittedly, the approach may fail, for instance if human strategies are not constant during a test, if inter-individual differences are too large and/or if other factors of difficulty have more impact than the search space.
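The claim that correlation analysis is insensitive to absolute speed can be illustrated with a minimal sketch. The data below are hypothetical (a made-up solver that follows an exponential pattern with base 3 and is 1000 times slower, plus a fixed overhead); the helper name pearson is ours and is not taken from the article.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

n_moves = [1, 2, 3, 4, 5]
linear = [float(n) for n in n_moves]          # linear pattern of variation
exponential = [3.0 ** n for n in n_moves]     # exponential pattern, assumed base 3

# Hypothetical slow solver: same exponential pattern, rescaled and shifted.
slow_times = [1000.0 * t + 50.0 for t in exponential]

r_exp = pearson(slow_times, exponential)
r_lin = pearson(slow_times, linear)
print(f"r with exponential pattern: {r_exp:.3f}")
print(f"r with linear pattern:      {r_lin:.3f}")
```

Because Pearson's r is unchanged by a positive rescaling and a constant offset, the slow solver still correlates perfectly with the exponential pattern and less well with the linear one.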
We nonetheless applied the approach to the data of healthy aged participants (from the study presented in [9]). Their performance fitted nicely with the algorithmic index of efficient algorithms using bounded memory resources and no a priori information. This result was promising but it was not considered a sufficient validation, among other reasons because the tasks were limited to 5 moves (like in the original TOL) whereas the search space allows tasks up to 8 moves. We thus conducted an experiment with healthy young volunteers. Because all participants had a high education level, we assumed that they were cognitively skilled and we included difficult tasks in the protocol.

Materials and Methods
In the following, we refer to the original TOL [1]. We present the search space of the TOL and the algorithmic indexes, the experimental protocols for elderly and young participants and finally the data analyses.

Search space of the TOL
The search space of the TOL (Figure 1) contains 36 configurations and 108 licit moves (i.e., 36 nodes and 54 bidirectional edges). The number of licit moves from a given configuration (branching factor) is 2, 3 or 4 (average = 3). The configurations present 6 spatial patterns and for each spatial pattern, there are 6 color patterns (i.e., the order in which the colors are painted on the balls). We use the nomenclature of [3] for the configurations and the patterns. There are 1296 possible tasks, requiring between 0 (trivial tasks) and 8 moves. The number of solutions per task ranges from 1 to 8. More information about the search space is available at [6].
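The figures above can be checked with a short enumeration. The following is a minimal sketch, assuming rods of capacity 3, 2 and 1 balls (as in the instructions given to participants) and that a licit move takes the top ball of one rod to another rod that is not full. The function names and the ball labels are ours, not those of the programs at [6].

```python
from collections import deque
from itertools import permutations

CAPACITY = (3, 2, 1)       # maximum number of balls per rod (assumption from the protocol)
BALLS = ("Y", "R", "B")    # yellow, red, blue

def all_configurations():
    """Enumerate every placement of the three balls that respects the rod capacities."""
    configs = set()
    for order in permutations(BALLS):
        # Cut the ordered balls into three consecutive stacks (bottom to top).
        for i in range(4):
            for j in range(i, 4):
                rods = (order[:i], order[i:j], order[j:])
                if all(len(r) <= c for r, c in zip(rods, CAPACITY)):
                    configs.add(rods)
    return configs

def moves(config):
    """Yield the configurations reachable from config in one licit move."""
    for src in range(3):
        if not config[src]:
            continue
        for dst in range(3):
            if dst != src and len(config[dst]) < CAPACITY[dst]:
                rods = list(config)
                ball = rods[src][-1]
                rods[src] = rods[src][:-1]
                rods[dst] = rods[dst] + (ball,)
                yield tuple(rods)

def distances_from(start, graph):
    """Breadth-first distances (in moves) from start to every configuration."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist

graph = {c: list(moves(c)) for c in all_configurations()}
n_licit_moves = sum(len(v) for v in graph.values())
branching = sorted({len(v) for v in graph.values()})
n_tasks = len(graph) ** 2          # ordered pairs, trivial tasks included
longest = max(max(distances_from(c, graph).values()) for c in graph)
print(len(graph), n_licit_moves, branching, n_tasks, longest)
```

Under these assumptions the enumeration should reproduce the counts quoted in this section (configurations, licit moves, branching factors, tasks and maximal number of moves).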

Exploration algorithms and Algorithmic Indexes
The objective was to build a scale to which the strategies of participants can be compared. Each element of the scale corresponded to a class of algorithms that have the same pattern of variation. We represented this pattern by means of an algorithmic index, i.e., a curve giving the average execution time as a function of the number of moves (recall that the numerical values of the curve are unimportant, given that it will be used only for correlation analysis).
For practical reasons, we limited ourselves to a minimal scale composed of 3 indexes (see below). We assumed that it was sufficient to validate the method, and this limitation entailed no simplistic assumption on actual human strategies (the scale is only used to rate their efficiency; actual strategies may be quite different from the algorithms used to build the scale).
We considered exploration algorithms, capable of finding the shortest path between two configurations of the search space. Exploration algorithms are defined as follows.
The algorithms receive as input the search space of the TOL and the task to solve, i.e., a pair of configurations C_I, C_F at a distance N. They return a sequence of N moves between C_I and C_F, i.e., a shortest path. They have no explicit information on the TOL, i.e., no predetermined data. Otherwise the problem could be solved in one step, with a look-up table. They embed no implicit information, i.e., they are not designed especially for the TOL. In other terms, they work with any (finite) search space. They are optimal given the constraints imposed on each family of algorithms, i.e., within each family, their pattern of variation has the slowest increase.
In other terms, the exploration algorithms are naive (like human participants who have not been exposed to the test) and they solve the general problem of the shortest path. We then computed four algorithmic indexes: U(N), B(N), I(task) and I(N). Their properties are summarized in Table 1. The numerical values of the indexes and the programs used to compute them can be found in the supporting web site [6].
U(N): algorithms with unbounded memory. These algorithms can store all the intermediate results, which speeds up the execution. Although unbounded memory is unrealistic for humans (this would be like using the long-term memory interactively), this family is of interest because it contains the most efficient algorithms for the general problem of the shortest path. The algorithmic index U(N), giving the average execution time T as a function of N, increases as the number of nodes and arcs at distance N or lower. For the search space of the TOL, U(N) is almost linear. This can be attained for instance by means of labeled breadth-first exploration [10].
Here is a sketch of such an algorithm:
  to explore: X = {initial configuration C_I}
  for C in X
    if C = final configuration C_F: return
    label C as explored
    for A in possible moves from C
      if A is not labeled as explored and the target configuration T of A is not labeled as explored
        label A and T as explored
        add T to X
B(N): algorithms with bounded memory. These algorithms cannot store all the examined nodes and arcs because memory overflow may occur. These algorithms are therefore slower, but they are of interest because humans also have a bounded working memory [11,12]. With bounded memory, it is at most possible to store the path under construction (here, 8 nodes or less). Because the paths have nodes and arcs in common, there is a considerable amount of repetition. B(N) increases as the number of paths of length N or lower, which in general grows as b^N, b being the average branching factor [13]. This was verified for the search space of the TOL, i.e., B(N) grows exponentially with exponent close to 3 (Table 1). This can be attained for instance by means of depth-first exploration.
The combinatorial properties (and human performance) may differ markedly from task to task. We thus computed a task-dependent index I(task) that represents task by task the average execution time of efficient algorithms with bounded memory. To do so, we implemented a random breadth-first exploration algorithm with bounded memory that explores the nodes randomly in order of increasing distance from the initial configuration. The memory is used only to store the current path in order to avoid circuits (i.e., moving to a previous position); this requires at most 8 nodes, i.e., the algorithm uses bounded memory. The complete algorithm can be found at [6] [14]. Because the algorithm behaves differently at each repetition, the average represents the performance of a collection of deterministic algorithms.
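The two families of exploration described above can be sketched in Python as follows. This is an illustration under our own assumptions, not the programs available at [6]: the bounded-memory variant is written as a depth-limited search with an increasing limit (so that nodes are reached in order of increasing distance while only the current path is stored), the optional random shuffling of moves stands in for the random exploration, and the toy graph is made up.

```python
import random
from collections import deque

def breadth_first(graph, start, goal):
    """Labeled breadth-first exploration (unbounded memory).

    Returns (shortest path, number of nodes examined)."""
    parent = {start: None}     # labels: every node ever reached is remembered
    queue = deque([start])
    examined = 0
    while queue:
        node = queue.popleft()
        examined += 1
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], examined
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None, examined

def bounded_exploration(graph, start, goal, rng=None):
    """Depth-limited exploration with increasing limit (bounded memory).

    Only the path under construction is stored; it is used to avoid circuits."""
    examined = 0

    def extend(path, limit):
        nonlocal examined
        examined += 1
        if path[-1] == goal:
            return list(path)
        if limit == 0:
            return None
        neighbours = list(graph[path[-1]])
        if rng is not None:
            rng.shuffle(neighbours)      # random order of the possible moves
        for nxt in neighbours:
            if nxt in path:              # never return to a previous position
                continue
            path.append(nxt)
            found = extend(path, limit - 1)
            path.pop()
            if found:
                return found
        return None

    for limit in range(len(graph)):      # paths of increasing length
        found = extend([start], limit)
        if found:
            return found, examined
    return None, examined

# Made-up search space: any finite graph given as an adjacency dictionary works.
toy = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d", "e"],
    "d": ["b", "c", "f"],
    "e": ["c", "f"],
    "f": ["d", "e"],
}
path_u, cost_u = breadth_first(toy, "a", "f")
path_b, cost_b = bounded_exploration(toy, "a", "f", rng=random.Random(0))
print(path_u, cost_u)
print(path_b, cost_b)
```

Both functions return a shortest path, but the bounded-memory version re-examines the same nodes along many different paths. This repetition is what makes its cost grow with the number of paths rather than with the number of nodes.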
For validation, we computed I(N) as the average of I(task) for the tasks of N moves. We verified that I(N) was similar to B(N). This was expected, because the random algorithm uses bounded memory. Note that I(N) increases slightly more slowly than B(N), possibly because the algorithm does not examine the paths that contain loops (see exponents in Table 1, bottom right, and Figure 2). However, this small difference does not justify using I(N) as a separate algorithmic index.

Experiment 1 - elderly participants - protocol
The participants were tested in the context of a study presented in [9]. The group was composed of 35 healthy elderly volunteers randomly selected from a list of beneficiaries of a pension fund (14 males; age 72.4, s = 4.4; education: 8.3 years, s = 1.5). All of them were naive to the TOL and none presented a history of cognitive and/or neurological diseases (exclusion criteria: stroke, Parkinson's disease, severe trauma with loss of consciousness for 48 h or more, depression and chronic alcoholism).
Ethics statement. All the participants gave written informed consent, according to the regulations of the Ethic Advisory Board of Université Bordeaux 2.
The TOL was presented in the form of two identical kits (initial and target configuration), each made of a wooden base (22 × 6 × 2 cm) with 3 rods of 12 cm, 8 cm and 4.5 cm, and 3 balls (yellow, red and blue), 3 cm in diameter.
Two tasks of 2 moves were first executed by the examiner. The following instructions were then given to the participants: a) reproduce the target configuration in a minimum number of moves; b) move only one ball at a time; c) place at most one ball on the shortest peg, and two balls on the middle one; d) each ball can only move from one peg to another (i.e., do not lay the balls on the table or keep them in the hand). The participants were instructed to work out the minimal number of moves to reach the target configuration and to execute the corresponding sequence i) without errors and ii) as fast as possible. There were no time limits. They were asked to tell the examiner when they had finished, or when they abandoned. They performed 15 tasks presented in order of increasing number of moves, from 1 to 5 (Table 2). Each task corresponded to a unique trial. The number of moves was not indicated.
The execution time (precision ±1 s) and the number of moves executed by the participant were measured by the examiner. The result (target configuration attained or not, abandonment) was noted, as well as any rule violations that may have been committed.

Experiment 2 - young participants - protocol
Like in the original test [1], the tasks of Experiment 1 required 5 moves or less. In contrast, Experiment 2 was designed to cover the search space of the TOL more thoroughly, with tasks of 2 to 8 moves. This was presumably possible because the participants were young adults with a high educational level, who were expected to be faster than the elderly participants of Experiment 1. Because the objective was not to compare the performance of young and elderly participants, it was of little interest to include the set of tasks of Experiment 1. The material (wooden kits, yellow, red and blue balls), the way of presentation and the instructions were similar to Experiment 1.
The group was composed of 30 healthy young volunteers (13 males; age 22.9, s = 3.2; education: 15.6 years, s = 2.4). All of them were naive to the TOL. The exclusion criteria were: history of neurological diseases (like in Experiment 1), depression, motor deficits affecting hand movement, uncorrected vision or hearing deficits.
Ethics statement. All participants gave written informed consent according to the regulations of the Ethic Committee of IUGM (ethic certificate No. 20060101).
Participants executed a total of 35 tasks requiring between 2 and 8 moves (Table 3), in order of increasing number of moves, with 5 tasks per number of moves. Like in experiment 1, the number of moves was not indicated to the participant. The tasks of 1 move were not included because they were considered too easy. The tasks were selected pseudo-randomly so that the difficulty was balanced for each number of moves, i.e., the average of I(task) for the 5 tasks of N moves was close to I(N).
During the test, the examiner recorded manually the total execution time (precision 1 s) and also the preparation time, i.e., the time elapsed between the presentation of the task and the first move. The difference represented the movement time. Note that on-line planning may occur during the movement time [15,16]. In order to document the variation of visuo-motor performance (without planning demands), before and after the test, participants executed a sequence of 20 self-determined moves as fast as possible. The time was recorded manually by the examiner (precision 1 s). Note that the possible variation results from the opposite effects of fatigue and motor skill acquisition.

Data analysis
The execution time T was calculated for the valid trials, i.e., final configuration attained without rule violation whatever the number of moves.
The correlation coefficients (Pearson r) were computed on the set of tasks, between the averaged T (across participants) and the indexes U(N), B(N), and I(task). The indexes were then ranked in order of decreasing correlation coefficients.
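The ranking step can be sketched as follows. All numerical values are hypothetical placeholders (they are not the data of the experiments, nor the index values published at [6]), and the function names pearson_r and rank_indexes are ours.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_indexes(mean_time, indexes):
    """Return (index name, r) pairs sorted by decreasing correlation with mean_time."""
    scores = {name: pearson_r(mean_time, values) for name, values in indexes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical protocol: 8 tasks, two per number of moves.
n_moves = [2, 2, 3, 3, 4, 4, 5, 5]
mean_time = [4.1, 5.0, 7.9, 6.5, 14.2, 19.8, 31.0, 44.5]   # seconds, averaged across participants

indexes = {
    "U(N)": [float(n) for n in n_moves],                     # near-linear placeholder
    "B(N)": [3.0 ** n for n in n_moves],                     # exponential placeholder
    "I(task)": [8.0, 10.0, 26.0, 20.0, 70.0, 95.0, 230.0, 300.0],  # made-up task-by-task values
}

ranking = rank_indexes(mean_time, indexes)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

The indexes that depend only on N take the same value for all tasks with the same number of moves, whereas a task-dependent index can follow the differences between such tasks.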
For young participants, the correlations were initially computed for the whole set of tasks, but in the light of preliminary results, we computed them piecewise, i.e., on two subsets of tasks: easy (2 to 5 moves, n = 20) and difficult (5 to 8 moves, n = 20). The easy subset has a level of difficulty (or a number of moves) comparable to [1] and Experiment 1. The difficult subset contains the tasks of higher difficulty (or number of moves). The subsets are not disjoint (both include the tasks of 5 moves) so that both contain 20 tasks. Within each set or subset of tasks, the significance of the differences between the correlation coefficients of the indexes was assessed by means of t-tests, using equation (1) [17]. In order to determine whether inter-individual variability affects the ranking of the indexes, we also computed the correlation coefficients (Pearson r) between T (non-averaged, i.e., variable across participants) and the indexes U(N), B(N) and I(task) on the set of valid trials (Participants × Tasks).
Finally, for the young participants, in order to determine to what extent the variation of movement time may affect the total execution time (and the correlations), we compared the execution time of the pre- and post-session visuo-motor tasks (20 self-determined moves) by means of a t-test.

Elderly participants - performance curve
The execution time T as a function of the task is presented in Figure 3 with the algorithmic indexes. Two preliminary observations are of interest. 1) There were marked differences of average execution time between tasks with the same number of moves. The differences were so large that a task of 3 moves took on average longer than some tasks of 4 moves, and the same occurred with some tasks of 4 and 5 moves. 2) There was a visual resemblance between the curves of T and I(task): both presented peaks (long execution times) for the same tasks of 4 and 5 moves. However, the foregoing observations are qualitative and cannot be generalized. In fact, given the high variability of execution times, we verified that some visually marked differences were not statistically significant.
The correlation coefficients (Pearson r) between the execution time and the algorithmic indexes are presented in Table 4. On the set of tasks (n = 15), I(task) presented the highest correlation coefficient (r = 0.92), followed by U(N) (0.86) and B(N) (0.81). All correlations were significant at p = 0.01. All the differences between correlation coefficients (as computed with Equation 1) were significant at p = 0.05. On the set of valid trials (n = 525) the correlation coefficients were lower because of the inter-individual variability. However, all correlations were significant at p = 0.01. I(task) again presented the highest coefficient (r = 0.47) but the difference with B(N) (r = 0.46) was not significant. U(N) presented the lowest coefficient (r = 0.44).
In order to ensure that there was no better and simpler predictor of performance from the number of moves, we performed curve fitting for logarithmic, linear (like U(N)), polynomial, power law and exponential (like B(N) and I(task)) models. This confirmed that the best fit was for the exponential model (Pearson R of best fit: exponential: .94, power law: .90, 2nd order polynomial: .86, linear: .81, logarithmic: .73). Note that all the models have the same number of free parameters (2) except the polynomial (3). The exponential would therefore remain the best model for measures of quality of fit like the AIC or the Deviance.
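One simple way to compare the two-parameter models is sketched below. It assumes that each model is fitted by linear regression after the usual transformation (logarithm of N, of T, or of both), so that the quality of fit reduces to a Pearson coefficient between the transformed variables. This is our assumption, since the fitting procedure is not detailed here, and the execution times are hypothetical. The second-order polynomial, which has three free parameters, is left out of the sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

n_moves = [1, 2, 3, 4, 5]
mean_time = [3.8, 6.2, 11.9, 20.4, 39.3]     # hypothetical average execution times (s)

log_n = [math.log(n) for n in n_moves]
log_t = [math.log(t) for t in mean_time]

fits = {
    "logarithmic": pearson_r(log_n, mean_time),   # T = a + b ln(N)
    "linear": pearson_r(n_moves, mean_time),      # T = a + b N
    "power law": pearson_r(log_n, log_t),         # ln(T) = a + b ln(N)
    "exponential": pearson_r(n_moves, log_t),     # ln(T) = a + b N
}
best = max(fits, key=fits.get)
for model, r in sorted(fits.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: R = {r:.3f}")
print("best model:", best)
```

With these made-up times, which roughly double at each additional move, the exponential model gives the highest coefficient.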

Young participants - performance curve
The execution time T as a function of the task is presented in Figure 4. A visual examination provides the following preliminary observations. 1) Like for elderly participants, for easy tasks (2 to 5 moves) the execution time increased, but 2) it presented no clear trend for the difficult tasks (5 to 8 moves). 3) The execution times were markedly shorter for young than for elderly participants and the steepness of the curve for easy tasks was markedly lower. 4) Like for elderly participants, there were marked differences of execution time among tasks with the same number of moves. 5) There was a visual resemblance between the curves I(task) and T: both presented peaks (long execution times) for the same tasks of 5 moves. The differences of slopes and execution times are illustrated in Figure 5. However, the foregoing observations are qualitative and require a quantitative validation before any generalization (see below).
Table 4. Elderly participants - correlation coefficients for the set of tasks and the set of valid trials.
The correlation coefficients (Pearson r) between the execution time and the algorithmic indexes are presented in Table 5 separately for easy tasks (2 to 5 moves) (n = 20) and difficult tasks (5 to 8 moves). For easy tasks, I(task) presented the highest correlation coefficient (r = 0.92), followed by U(N) (0.90) and B(N) (0.80). All correlations were significant at p = 0.01. The differences between correlation coefficients (as computed with Equation 1) were not significant between I(task) and U(N), but significant at p = 0.001 between I(task) and B(N). On the set of valid trials (n = 596) the correlation coefficients were lower because of the inter-individual variability. However, all correlations were significant at p = 0.01. I(task) again presented the highest coefficient (r = 0.56) but the difference with U(N) (r = 0.54) was not significant. The difference was significant with B(N) (r = 0.48) at p = 0.001.
As a validation, we performed curve fitting for logarithmic, linear, polynomial, power law and exponential models. This confirmed that the best fit was for the exponential model (Pearson R of best fit: exponential: .92, power law and 2nd order polynomial: .89, linear: .82, logarithmic: .76).
The results changed completely for the difficult tasks. None of the correlations was significant, whether on the set of tasks (n = 20) or the set of valid trials (n = 597). Note that this is a mere consequence of the flatness of the performance curve as depicted by Figure 5. It was thus pointless to compute the significance of the difference between correlation coefficients.

Young participants - preparation and movement time
In this section, we present minimal results on the preparation and movement times of young participants. The only objective is to provide cues to interpret the foregoing results because it has already been mentioned that there is on-line planning during the movement phase. Figure 6 depicts preparation and movement time as a function of the number of moves and the corresponding trend lines computed separately for easy tasks (2 to 5 moves) and difficult tasks (5 to 8 moves). For easy tasks, preparation and movement time increase with the number of moves. For difficult tasks, the preparation time increases but the movement time decreases slightly.
A t-test on the execution time of the pre- and post-test visuo-motor tasks (execute 20 self-determined ball displacements as fast as possible) on the set of participants (n = 30) showed a significant decrease (pre: m = 24.8 s, s = 7.1; post: m = 19.8 s, s = 3.8; t = 5.7, significant at p = 0.01, two-tailed). The amplitude of the decrease is about 5 s, i.e., about 20% of the pre-test time, which corresponds to a 25% gain in movement speed. In order to avoid misinterpretations of Figure 6, this gain has to be contrasted with the fourfold increase (+300%) of the required ball movements (from 2 to 8), which tends to increase the total execution time.
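The percentages involved depend on the baseline that is chosen, so a short check may help. The sketch below only uses the mean times and the numbers of moves quoted in the text; the variable names are ours.

```python
pre, post = 24.8, 19.8            # mean time (s) for 20 self-determined moves, from the text

time_decrease = (pre - post) / pre    # decrease relative to the pre-test time
speed_gain = pre / post - 1           # relative gain in moves per second

whole_test = 8 / 2 - 1                # relative increase of required moves, 2 to 8
second_half = 8 / 5 - 1               # relative increase of required moves, 5 to 8

print(f"time decrease: {time_decrease:.1%}")
print(f"speed gain:    {speed_gain:.1%}")
print(f"moves, whole test:  +{whole_test:.0%}")
print(f"moves, second half: +{second_half:.0%}")
```

The 5 s difference is thus about 20% of the pre-test time, or equivalently a gain of about 25% in movement speed, while going from 2 to 8 required moves multiplies the number of moves by four.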

Discussion
Naive human strategies are as efficient as optimal exploration algorithms
For simple tasks (5 moves or less), humans and exploration algorithms with bounded memory had similar performance curves, i.e., the execution time increased exponentially with the number of moves. This result, initially obtained with healthy elderly persons, was conclusively replicated with young participants, on tasks of 5 moves or less.
The algorithms used as reference are optimally efficient under the same constraints as naive human participants: no a priori information, and bounded memory, similar to human working memory. These algorithms explore the configurations breadth first, i.e., in order of increasing distance. However, this does not mean that humans use the same order. Any systematic exploration in which a node is examined a fixed number of times has a similar performance curve. In summary, the results support the view that naive humans use systematic exploration to solve the TOL.
We can reasonably rule out the possibility that these results are artifacts. For easy tasks, the exponential indexes I(task) and B(N) were significantly more correlated to human execution time than the linear index U(N). Even when correlation coefficients were computed on the set of trials (accounting for the inter-individual variability), the index I(task) presented the highest correlation coefficient. This was verified for young and elderly participants, and on different sets of tasks.

Similar difficulty for naive humans and exploration algorithms
Due to the combinatorial properties of the search space of the TOL, the execution time may vary markedly across tasks with the same number of moves. This is true for humans, as depicted by performance curves (Figures 3 and 4) and for exploration algorithms as shown by the numerical values of the index I(task) that characterizes the average execution time of exploration algorithms on each task [6].
In addition, the results support the view that naive human strategies and exploration algorithms are similarly affected by the combinatorial properties of the search space. In both experiments, the specific index I(task) (r = 0.92) had a significantly higher correlation coefficient than the general indexes B(N) and U(N). Note that I(N), the task-independent version of I(task) is similar to B(N), and its correlation coefficients would have been similar, i.e., significantly lower than those of I(task). This shows conclusively that there is a significant benefit in using a task-dependent algorithmic index.
The results also show conclusively that the number of moves N is a poor predictor of human performance, at least in comparison with I(task). Note that the index U(N) (that corresponds to algorithms with unbounded memory) is almost a linear function of the number of moves. Thus N would have obtained correlation coefficients similar to those of U(N) (about 0.80 for both groups), significantly lower than those of I(task). This suggests that practitioners and researchers working with the TOL could beneficially use I(task) instead of N in order to grade the difficulty of the tasks. This index can be found at [6]. I(task) represents the combinatorial difficulty, i.e., the difficulty due to the configuration of the search space, which is constant whatever the features of the participants, the protocol and the environment. However, it is worth underlining that I(task), like the number of moves, is only a coarse predictor of difficulty. It cannot account for the variety of factors that may affect human performance.
Table 5. Young participants - correlation coefficients for the set of tasks and the set of valid trials.
Some of these factors can be obtained from the search space, like the presence of positive or negative triggers (i.e., initial moves that place a ball immediately in its final position; triggers tend to be intuitive moves for naive participants, but only positive triggers lead to some solution) [9] or the presence of conflictive moves or sub-goals [4,5]. Other factors are related to the protocol, like the physical model, i.e., the nature of objects and actions used to present the task [18], the instructions [16,19], the way of presentation, computer screen vs. real objects [20] or the presence of prior information [19]. Finally, some factors of performance are external to the task and the protocol, e.g., mood [21].

Non-naive human strategies more efficient than exploration algorithms
The surprising result occurred during the second half of the session of young participants: their execution time did not increase although they had to solve tasks of increasing difficulty (as graded by the number of moves). This means that humans became markedly more efficient than the exploration algorithms that best described their naive performance. It is unlikely that this result is an artifact. All the correlations between human execution time and algorithmic indexes that were significant during the first half of the test became non-significant on tasks of 5 moves and more, as an effect of the flatness of the performance curve (Figure 5).
The change of efficiency is in line with the general literature on automaticity [22,23,24] and skills acquisition [25,26]. It is admitted that general intelligence (and/or controlled execution and/or executive functions) is employed to execute a novel task. Conscious control and attention are required, and the execution is slow, sequential and effortful. With practice, the execution requires less attention, less conscious awareness, and becomes more efficient. However, the gain in efficiency may come from a shift towards 'expert' strategies (in line with the Principle of Rationality, [27]) and/or a faster execution of the basic operations while strategies remain unchanged.
In the present case, it is unlikely that the strategies remained identical while basic operations became more efficient (e.g., visual check and mental representation of configurations, mental rehearsal of moves, physical movements). If this were the case, the execution time would have decreased during the sequences of 5 tasks with the same number of moves, and this did not occur (Figure 4). Also, the performance at the pre- and post-test visuo-motor control task only improved by about 25%, which is unlikely to compensate for the increase of the number of ball movements, i.e., 300% on the whole test (from 2 to 8 moves, a fourfold increase) and 60% on the second half (from 5 to 8 moves). Although the number of required movements only indirectly determines the visuo-motor demands, we may expect that such demands increased by more than 25%.
The change in efficiency is more likely due to a change of strategies. This explanation is in line with evidence obtained from changes in brain activation related to cognitive skill acquisition [28]. It is also in line with evidence obtained by the patterns of gaze [29] that suggests that the difference between good and bad performers corresponds to a difference in strategy (although it may also come from a more thorough planning, [30]).
Indirect evidence from the algorithmic indexes provides additional insights about the strategies employed in the second half