Do Changes in the Pace of Events Affect One-Off Judgments of Duration?

Five experiments examined whether changes in the pace of external events influence people’s judgments of duration. In Experiments 1a–1c, participants heard pieces of music whose tempo accelerated, decelerated, or remained constant. In Experiment 2, participants completed a visuo-motor task in which the rate of stimulus presentation accelerated, decelerated, or remained constant. In Experiment 3, participants completed a reading task in which facts appeared on-screen at accelerating, decelerating, or constant rates. In all experiments, the physical duration of the to-be-judged interval was the same across conditions. We found no significant effects of temporal structure on duration judgments in any of the experiments, either when participants knew that a time estimate would be required (prospective judgments) or when they did not (retrospective judgments). These results provide a starting point for the investigation of how temporal structure affects one-off judgments of duration like those typically made in natural settings.


Introduction
The rate or speed of external events influences people's estimates of duration. In particular, intervals that are filled with more events or faster-moving objects are usually judged longer than those with fewer events or slower speeds [1][2][3][4][5]. However, relatively few studies have asked how changes in the pace of external events influence our judgment of time. In an early study, Buffardi [6] found that clustering events near the beginning of an interval led to longer subjective duration than clustering them near the end (see also [7,8]). More recently, Matthews [9] found that shapes moving with constant speed were judged longer than those which decelerate, which in turn seemed longer than those which accelerate. Similarly, a recent series of studies using brief tone sequences found that evenly-spaced sequences were judged longer than those with a change in tempo [10]. Moreover, temporal structure interacted with physical duration: accelerating sequences were judged longer than decelerating ones at short durations but the pattern reversed at longer durations.
These studies suggest that changes in tempo affect judgments of duration, but all were conducted in the psychophysical tradition: the stimuli were meaningless, lasted at most a few seconds, and participants judged many items over the course of the experiment. There is increasing emphasis on duration estimates produced under more ecologically valid circumstances (e.g., [11][12][13]). The current work examines the effects of temporal structure on duration estimates under more natural conditions, where participants made one-off judgments of intervals lasting a few minutes. That is, each participant judged a single time interval defined by a meaningful event sequence. Understanding such judgments is practically important because they more closely resemble the time estimates made in everyday life (people rarely judge the durations of hundreds of tones one after the other but are often asked how long a single experience or task lasted). Moreover, one-off judgments of long intervals may well differ from psychophysical judgments because (a) judgments of intervals longer than about 1 second are often argued to rely on cognitive and neural mechanisms which are different from those subserving judgments of shorter intervals (see [14] for a review), and (b) psychophysical tasks elicit comparisons between stimuli presented in the experimental session, which can exert profound context effects on temporal estimates [10][15][16][17], whereas one-off judgments minimize such effects.
For some participants in our experiments, duration estimates were retrospective (participants did not know that they would be asked for a duration judgment). For others, the judgments were prospective (participants were forewarned that they would be asked for a duration judgment). Previous research suggests that these two types of judgment draw upon different mechanisms and are differentially affected by experimental manipulations (e.g., [18]). Experiments 1a-1c had participants judge pieces of music whose physical duration was always the same but whose tempo either gradually increased (accelerating condition), gradually decreased (decelerating condition), or remained steady (constant-rate condition). Experiment 2 used a visuo-motor task in which participants responded to left and right arrows which appeared at increasing, constant, or decreasing rates, with total duration and number of stimuli held constant. Experiment 3 used a reading task in which participants read ''fun facts'' which appeared at increasing, decreasing, or constant rates, with the total duration and number of items equated across conditions. Competing theories of temporal judgment make different predictions about the effects of temporal structure in these studies.
One class of theory emphasises memory processes and is particularly relevant to retrospective judgments. In particular, it has been argued that remembered duration is positively related to the ''storage size'' [19], degree of segmentation [20,21] or number of contextual changes [22] during the interval. Within these frameworks, accelerating, decelerating, and constant-rate stimuli might be expected to have the same remembered duration because they comprise the same number of events. On the other hand, changes in tempo are presumably an integral part of the overall ''stimulus complexity'' and may necessitate changes in processing style over the course of the interval, in which case accelerating and decelerating items will be judged longer than constant-rate stimuli. An alternative memory-based perspective comes from Boltz [23,24], who has suggested that highly coherent stimuli (where the temporal and non-temporal structure affords high predictability) are more successfully encoded into memory, yielding more accurate duration estimates. One plausible idea is that gradual changes in tempo will reduce coherence by making the hierarchical structure of the event sequence less apparent, thereby reducing accuracy for the accelerating and decelerating items.
A second class of theory concerns "active timing". When people know that a temporal judgment will be required (in a prospective judgment task), they can employ intentional timing strategies during stimulus presentation. Many theorists assume that intentional timing is based on some kind of counting or accumulation process, often involving the flow of pulses from a pacemaker [25][26][27][28][29][30]. If the pacemaker keeps a steady rate then temporal judgment will be unaffected by temporal structure (assuming that other aspects of the process, such as the latency to begin/end the accumulation of pulses, are also unaffected). However, several authors have argued that the pacemaker is not constant but is affected by factors such as stimulus modality and intensity (e.g., [31][32][33]). In particular, presenting a stream of clicks at 5 Hz for a few seconds seems to speed up the pacemaker, lengthening the subjective duration of a subsequent stimulus [32][33][34] (see also [35]). This implies that the pacemaker rate is coupled to the tempo of external stimulation. If so, the nature of this coupling will determine the effect of temporal structure on judged duration. A linear coupling predicts no effect because the average tempo is the same for accelerating, decelerating, and constant-rate conditions. However, some authors have suggested a negatively-accelerated (e.g., logarithmic) relationship between pacemaker rate and the pace of stimulus change [9,10,36]. Under this view, judged duration will be maximal for stimuli with constant rate [4]. Beckman and Young [1] have suggested an alternative, change-based model in which stimulus change combines additively with physical duration (a proxy for degree of environmental change) to determine subjective time. A simple view equates change with number of events or distance travelled, in which case accelerating, decelerating, and constant-rate stimuli which are matched on these dimensions will appear to have equal duration. If second-order change is also taken into account, then accelerating and decelerating sequences will be judged longer than constant-tempo ones. Other authors have emphasized the role of attention in active timing (e.g., [37][38][39]), although the predicted effects of changes in tempo under these accounts are unclear. Finally, although memory-based and active-timing accounts are most obviously applicable to retrospective and prospective judgment tasks respectively, it is worth noting that participants might nonetheless use memory-based strategies for prospective judgments and rely on incidental pacemaker-based timing for retrospective judgment.
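The contrast between linear and negatively accelerated pacemaker coupling can be made concrete with a small numerical sketch. It is illustrative only and is not a model taken from the cited work: the relative tempo range (0.8 to 1.2), the linear-in-time ramp, the function names, and the particular logarithmic form log(1 + tempo) are all assumptions. The sketch shows only that a concave coupling yields the largest pulse count for a constant tempo when average tempo is matched; it does not distinguish accelerating from decelerating sequences.

```python
import math

LOW, HIGH = 0.8, 1.2  # assumed relative tempos, for illustration only

def tempo_profile(kind, n_steps=1000):
    """Relative tempo sampled at evenly spaced moments across the interval."""
    ramp = [LOW + (HIGH - LOW) * (i + 0.5) / n_steps for i in range(n_steps)]
    if kind == "accelerating":
        return ramp
    if kind == "decelerating":
        return ramp[::-1]
    if kind == "constant":
        return [(LOW + HIGH) / 2.0] * n_steps
    raise ValueError(f"unknown temporal structure: {kind}")

def mean_pacemaker_rate(kind, coupling):
    """Average pacemaker rate; for equal durations this is proportional to pulse count."""
    tempos = tempo_profile(kind)
    return math.fsum(coupling(t) for t in tempos) / len(tempos)

def linear_coupling(tempo):
    return tempo

def log_coupling(tempo):
    # Hypothetical negatively accelerated coupling, chosen only for illustration.
    return math.log(1.0 + tempo)

KINDS = ("accelerating", "decelerating", "constant")
predictions = {
    name: {kind: mean_pacemaker_rate(kind, fn) for kind in KINDS}
    for name, fn in (("linear", linear_coupling), ("log", log_coupling))
}

for name, by_kind in predictions.items():
    print(name, {kind: round(rate, 4) for kind, rate in by_kind.items()})
```

Under the linear coupling the three structures give the same mean rate, whereas under the concave coupling the constant-tempo profile gives a slightly higher mean rate than either ramp, which is an instance of Jensen's inequality.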
The five experiments described here examine the effects of temporal structure on one-off judgments of duration using both prospective and retrospective judgment tasks, and aim to constrain the theoretical accounts outlined above.

Experiment 1a
Experiment 1a had participants listen to a piece of music whose tempo either accelerated, decelerated, or remained constant, and then make an unexpected judgment of the music's duration.

Method
All experiments were approved by the Faculty Ethics Committee for the University of Essex Faculty of Science and Engineering. Experiments 2 and 3 obtained written consent. For Experiments 1a-1c, consent was oral, in keeping with the British Psychological Society Code of Human Research Ethics and approved by the University of Essex Faculty of Science and Engineering ethics committee. Oral consent was appropriate because participation took place in diverse public settings, was anonymous, and comprised a task (listening to a brief piece of piano music and judging how long it lasted) that posed no conceivable risk to the participants, who were free to withdraw at any time. No participants withdrew after starting the task, and participants' written responses to the study questions were used to document their consent.
Participants. An opportunity sample of 135 participants (65 females) aged 19-68 years (M = 32.5 years, SD = 11.7 years) was recruited from a variety of locations (e.g., on campus, in their workplace).
Stimuli. The stimuli were three pieces of piano music: Inventio 4 by Johann Sebastian Bach (in 3/8); an excerpt from Sarabande by Georg Friedrich Händel (in 3/2); and an excerpt from Sonata Facile by Wolfgang Amadeus Mozart (in 4/4). The pieces were chosen to be unfamiliar to the majority of the participants and were edited in MuseScore (www.musescore.org), a freely-available piece of music-composition software. MuseScore has an integrated sequencer and synthesizer which plays musical scores and allows the resulting performances to be saved as MIDI files. Pause and repeat signs were removed, and tempos of 78, 96, and 112 beats per minute were chosen for the Bach, Händel, and Mozart pieces, respectively. Using a synthesizer to generate the pieces has the advantage of avoiding the slight changes in tempo, pauses, etc. that occur during human performance.
We used Audacity's "Silence finder" (audacity.sourceforge.net) to apply a uniform criterion for removing silence from the end of the recordings (specifically, silence was defined as intensity less than −50 dB lasting more than 0.1 seconds). We then used Audacity's "Sliding time scale/pitch shift" tool to adjust the tempo without altering the pitch. To create the "accelerating" stimuli, we set the starting tempo at 20% below the original value and the final tempo at 20% above the original value. For the "decelerating" stimuli, the starting and final tempos were set at 20% above and below the original value, respectively. These values were selected so that the changes in tempo were gradual but the initial and terminal tempos were very noticeably different.
The accelerating and decelerating versions of each piece had identical durations, but differed by a few hundred milliseconds from the original, constant-tempo pieces (an artefact of the tempo adjustment procedure). We therefore used the sliding time scale tool to apply a very small constant shift to the constant-pace stimuli so that they had the same durations as the corresponding accelerating and decelerating pieces. An advantage of this approach is that all three versions of each piece of music (accelerating, decelerating, constant-tempo) had been subject to the sliding time scale algorithm.
These manipulations resulted in constant-tempo, accelerating, and decelerating versions of all three pieces (Bach, Händel, Mozart). The constant-tempo, accelerating, and decelerating versions of each piece had the same duration, and the pieces were extremely similar in length (Bach = 61.07 seconds; Händel = 61.05 seconds; Mozart = 61.03 seconds). Finally, we used the Scale Intensity function of Praat (www.fon.hum.uva.nl/praat/) to normalize the average nominal intensities to 65 dB. (Even before normalization, the nominal intensities were very similar for all pieces: 60.3-61.7 dB.) All of the final stimuli were saved as stereo wav files and are available from the authors.
Design and procedure. Participants were recruited in a variety of settings. They read an information sheet telling them that the experiment involved listening to a piece of music and that it should take no more than 5 minutes of their time. They were asked to listen carefully to the piece of music over headphones and told that, once it finished, they should remove the headphones, whereupon they would be given a question sheet. Participants were not told in advance that a duration judgment would be required. The music was played from a portable music player over Sennheiser HD 580 headphones. The testing environments typically involved low-level background noise.
Each of the 9 composer × condition combinations was played to 15 different participants. The first question on the response sheet asked: "How long do you think that the piece of music lasted? Go with your intuitive judgment - do not try to use a watch or clock." There followed space to enter a judgment in minutes and seconds. The next question asked participants to put a small mark on a horizontal line "to indicate how long the piece of music felt to you", in order to probe their subjective impressions (which can be affected by event structure in a way not always detectable with absolute duration estimates [40]). The line was 14 cm long and labelled "Very short" and "Very long" at the ends. (For the first 20 participants, a formatting error meant that the labelling was slightly ambiguous. However, the experimenter was available to provide verbal clarification, and exclusion of these participants made no difference to the results.) The distance of the participant's mark from the "Very short" end of the line was measured and divided by the total line length to get a response scaled between 0 and 1.
Participants were also asked to rate their enjoyment of the piece from 1 to 7 (where 1 = not at all and 7 = very much), to indicate whether they recognized the music and, if they did, to try to name the piece. Finally, they indicated their age and gender.

Results and Discussion
Alpha was set to .05 for all analyses.
Recognition. Preliminary analysis showed that 25 participants (18.5%) purported to recognize the piece they were played, distributed fairly evenly over the decelerating (N = 12), accelerating (N = 8) and constant-tempo (N = 5) conditions, χ²(2, N = 135) = 3.63, p = .16, and between the pieces by Bach (N = 9), Händel (N = 4), and Mozart (N = 12), χ²(2, N = 135) = 4.81, p = .090. Of the people who said they recognized the music, 7 attempted to identify it, mostly by venturing a composer; no-one named the piece correctly but 5 either correctly named the composer or named another piece by the right composer. Thus, it seems that the pieces met the aim of being largely unrecognized, and recognition was unrelated to either composer or temporal structure.
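The two recognition tests can be checked from the reported counts with a short sketch. It assumes 45 participants per condition and per composer (which follows from 9 combinations of 15 participants each) and a Pearson chi-square test of independence on a 3 × 2 table of recognized versus not recognized; the function names are hypothetical, and the closed-form p-value used here is valid only for 2 degrees of freedom.

```python
import math

def chi_square_independence(recognized, group_sizes):
    """Pearson chi-square for a groups x 2 (recognized / not recognized) table."""
    total = sum(group_sizes)
    total_recognized = sum(recognized)
    chi2 = 0.0
    for count, size in zip(recognized, group_sizes):
        cells = (
            (count, total_recognized),                 # recognized column
            (size - count, total - total_recognized),  # not-recognized column
        )
        for observed, column_total in cells:
            expected = size * column_total / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def p_value_df2(chi2):
    """Upper-tail chi-square probability; this closed form holds only for df = 2."""
    return math.exp(-chi2 / 2.0)

GROUP_SIZES = [45, 45, 45]  # assumed: 135 participants split evenly three ways

chi2_condition = chi_square_independence([12, 8, 5], GROUP_SIZES)
chi2_composer = chi_square_independence([9, 4, 12], GROUP_SIZES)

print(round(chi2_condition, 2), round(p_value_df2(chi2_condition), 2))
print(round(chi2_composer, 2), round(p_value_df2(chi2_composer), 3))
```

Under these assumptions the sketch gives values that round to the statistics reported above (3.63 and 4.81).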
Temporal judgments. Here and below, preliminary ANOVAs were conducted which included composer and recognition as between-subject variables. With one exception (noted below), none of the main effects or interactions involving these factors was significant and they were dropped from the analysis. (Their inclusion made no difference to the effects of temporal structure.)
Responses to the duration estimation question were converted into seconds and are shown in Table 1. The judgments from the decelerating, accelerating, and constant-tempo conditions did not significantly differ, F(2,132) = .71, p = .492, ηp² = .01. The duration estimates were significantly greater than the true duration for all three conditions (all ps < .001). Table 1 also shows the subjective impression judgments which, like the duration estimates, were unaffected by temporal structure, F(2,132) = 2.09, p = .128, ηp² = .03. Levene's test indicated that the variability of responses did not differ between conditions for either the duration estimates or the subjective impression judgments, F(2, 132) = 0.10, p = .907 and F(2, 132) = 0.27, p = .766, respectively.
Enjoyment. A supplementary analysis examined the effects of condition on enjoyment. There was no effect, F(2,132) = .44, p = .646, ηp² = .01. (Caution may be needed here because there was some heterogeneity of variance, F(2,132) = 3.21, p = .044.)
Correlations between measures. We calculated the correlations between enjoyment ratings, duration estimates and subjective judgements to assess the size of any relationship between these variables. The results showed a weak, significant relationship between duration estimate and subjective judgement, r = .20, p = .023. The other correlations were very small and not significant, which suggests extremely weak relationships between duration estimates and enjoyment, r = −.11, p = .214, and between subjective judgement and enjoyment, r = −.005, p = .955.
In short, this experiment found no effect of temporal structure on retrospective judgments of music duration.

Experiment 1b
Experiment 1b was very similar to Experiment 1a, except that participants knew in advance that they would be asked how long the music lasted. In addition, the participant sample was rather different.

Method
Participants. One hundred and thirty-five participants (66 female) aged 17-40 years (M = 21.3 years, SD = 3.0 years) took part. One additional participant reported crackling/loss of signal from the headphones during testing and was replaced. (It is possible that other participants experienced similar problems, but none mentioned it.) Participants were primarily students at the University of Essex, recruited during the term and tested at convenient locations around campus such as in the university library and lobbies (cf. Experiment 1a, which used more participants from outside the university and was conducted over the summer vacation). The mean age of the participants is about 10 years younger than in the previous experiment.
Stimuli, design, and procedure. The experiment was very similar to Experiment 1a, except that the initial instruction sheet informed participants that they would be asked to judge how long the music lasted, and told them not to use a watch because "we are interested in your intuitive judgment of time". The response sheet added a question asking participants "how you went about estimating how long the piece of music lasted". A similar question was asked at the end of Experiments 1c and 2, below. These judgment-strategy data are not analyzed here, but are available from the authors.
Temporal judgments. The duration estimates for the decelerating, constant-tempo, and accelerating conditions are shown in Table 1, and did not differ, F(2,132) = 1.06, p = .349, ηp² = .02. The estimates were significantly longer than the true duration in all conditions (all ps < .001). Table 1 also shows the subjective impression judgments, which were likewise unaffected by condition, F(2,132) = 1.52, p = .223, ηp² = .02. Levene's test indicated that the response variability did not differ between conditions for either the duration estimates or the subjective impression judgments, F(2, 132) = 0.56, p = .571 and F(2, 132) = 1.79, p = .171, respectively.
Enjoyment. The preliminary three-way ANOVA with condition, composer, and recognition as between-subjects factors indicated that the only significant factor affecting enjoyment was recognition: participants who reported recognizing the piece also reported greater enjoyment (M = 5.25, SD = 1.12) than those who did not (M = 4.01, SD = 1.43), F(2,118) = 13.19, p < .001, ηp² = .10. No other main effects or interactions were significant (all ps > .14).
Correlations between measures. Duration estimates were only weakly correlated with subjective impressions, and the relationship missed significance: r = .167, p = .053. Neither type of temporal judgment correlated with enjoyment (for duration estimates: r = .04, p = .640; for subjective impressions: r = −.081, p = .349).
In short, Experiment 1b found no effect of temporal structure on one-off prospective judgments of music duration.

Experiment 1c
Neither the retrospective judgments of Experiment 1a nor the prospective judgments of Experiment 1b were affected by changes in tempo. However, inspection of the mean responses in Table 1 suggests two potentially interesting trends. Firstly, duration estimates were much longer (and therefore less accurate) in the retrospective task (overall mean = 117.7 seconds) than in the prospective task (overall mean = 87.6 seconds). Secondly, in the retrospective task the subjective-impression responses were greatest for the accelerating stimuli and smallest for the decelerating stimuli, whereas in the prospective task the order is reversed.
The results of Experiments 1a and 1b are not directly comparable because participants were not randomly assigned to tasks, and the participant groups were quite different in some respects (e.g., age). Experiment 1c therefore combined prospective and retrospective tasks in a single experiment, to test the reliability of the cross-experiment differences suggested by Table 1. Because the previous experiments indicated that the accelerating and decelerating conditions are most different, we dropped the constant-tempo condition to boost the power of the study.

Method
Participants. Two hundred and four participants aged 18-51 (M = 21.7, SD = 4.6) took part; 67 were male. Five additional participants were discarded because they indicated that they had previously taken part in a music duration-judgment experiment. Participants were members of the University of Essex participant pool and were paid £3. Most booked test sessions through an online system; some were recruited by the experimenter in person.
Stimuli, design, and procedure. As before, this experiment used a fully between-subjects design. Participants were tested in a quiet testing cubicle. The accelerating and decelerating versions of the music from each composer were used equally often, half for retrospective judgments and half for prospective judgments. The experimenter cycled through the 12 cells of the design in sequence, with the exception of the last 5 participants who were tested as replacements for earlier participants. Instructions for the retrospective task were as in Experiment 1a; those for the prospective task were the same as for Experiment 1b. The response sheets were similar to before, except that the ''enjoyment'' question was dropped. At the end, all participants were asked to write how they formed their judgment.

Results and Discussion
Recognition. A total of 26 participants (12.7%) indicated that they recognized the music; they were distributed fairly evenly over accelerating (N = 16) and decelerating (N = 10) conditions. Twelve people wrote something in the box asking them to name the piece; three named the correct composer and one additional person named the correct piece. As in the previous experiments, it seems that the choice of stimuli achieved the goal of being largely unrecognized.
Temporal estimates. The duration estimates are shown in Table 2. A 2 × 2 ANOVA indicated no main effect of condition, F(1,200) = 0.03, p = .866, ηp² = .00 and no task × condition interaction, F(1,200) = 0.57, p = .450, ηp² = .00. Although retrospective judgments tended to be longer than prospective judgments, the effect missed significance, F(1,200) = 3.51, p = .063, ηp² = .02. Mean estimates were significantly above the true duration for all conditions (all ps < .001). Subjective impressions were similarly unaffected by condition, task, and their interaction (all Fs < 1, ps > .3). Levene's test showed that neither duration estimates nor subjective impressions showed differences in variability across the cells of the design, F(3, 200) = 1.70, p = .169 and F(3,200) = 1.15, p = .331, respectively. As in the earlier experiments, magnitude estimates and subjective judgments were weakly correlated, r = .21, p = .003. Experiment 1c therefore found no evidence for an overall effect of changes in tempo on judgments of music duration and no modulation by judgment task.
Experiments 1a-1c used music to define the to-be-judged interval. In addition to its emotive qualities and familiarity, music typically has a highly coherent, hierarchical structure [38], and may be judged differently from other types of stimuli/task [41]. The next two experiments generalized the preceding findings to two other types of task, one of which involved active responding (Experiment 2) and one which simply involved processing text (Experiment 3).

Experiment 2
Experiment 2 examined the effects of changes in the pace of events on the judged duration of a visuo-motor task.

Method
Participants. A planned sample of 162 participants was tested (115 female, ages 18-55 years, M = 21.8 years, SD = 5.1 years); two additional participants encountered technical difficulties and had to be replaced. The participants were recruited from the University of Essex participant pool, and were paid £3 each.
Design and Procedure. The experiment employed a 2 × 3 design with judgment task (prospective; retrospective) and condition (constant rate; accelerating; decelerating) manipulated between subjects. Due to a minor coding error, the numbers of participants in the six cells of the design were not equal, but the design was balanced for each factor separately. That is, half of the participants were in the prospective task condition, half were in the retrospective task condition, and one third of the participants were assigned to each temporal structure (constant rate, accelerating, and decelerating).
Participants signed a consent form providing relevant information, including the maximum duration of the study (10 minutes). The main task was computer-based and was performed in a sound-attenuating chamber. Instructions were presented onscreen, telling the participants that they should respond as quickly as possible to arrows that appear on the screen. At this stage, participants in the prospective condition were informed: ''At the end, we will ask you how long you spent doing the task''; participants in the retrospective condition were only told ''At the end, we will ask you some questions about the task''.
Participants were told that they should press the ''S'' key if the arrow pointed left and the ''K'' key if it pointed right, responding as quickly as possible while maintaining accuracy. It was explained that the arrows would keep appearing independently of the participant's responses, and that they should focus on the screen throughout.
After the last page of instructions, the message "Get ready" was displayed centrally on-screen for 2 seconds, followed by a 2 second blank before the first stimulus. The task consisted of 91 black arrows presented sequentially on a white background, pointing either left or right with the direction randomly chosen on each trial. Stimulus timing was measured in frames (screen refreshes); the monitor had a refresh rate of 85 Hz (therefore one frame was one 85th of a second) and a resolution of 1024 × 780 pixels. Stimulus presentation and response collection were controlled using DMDX [42].
The arrows were displayed for 200 ms (15 frames) each, followed by a blank screen. The duration of the blank screen between the arrows depended on the condition. In the ''constant rate'' condition, there was a 2153 ms (183 frames) blank between stimuli. In the ''accelerating'' condition, the blank was 3212 ms (273 frames) on the first trial and decreased by 23.5 ms (2 frames) with each subsequent trial, reaching 1094 ms (93 frames) on the final trial. In the ''decelerating'' condition, the blanks followed the reverse pattern of the ''accelerating'' condition.
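As a check that the three schedules are matched on total duration, the frame counts given above can be summed directly. This is a minimal sketch, not the DMDX script used in the experiment: the function names are hypothetical, the calculation uses the stated frame counts rather than the rounded millisecond values, and the 4-second lead-in is assumed to consist of the 2-second "Get ready" message plus the 2-second blank.

```python
REFRESH_HZ = 85      # monitor refresh rate stated in the text
N_ARROWS = 91        # number of arrows per task
ARROW_FRAMES = 15    # frames for which each arrow was displayed

def blank_frames(condition):
    """Post-stimulus blank duration, in frames, for each of the 91 trials."""
    if condition == "constant":
        return [183] * N_ARROWS
    accelerating = [273 - 2 * trial for trial in range(N_ARROWS)]
    if condition == "accelerating":
        return accelerating
    if condition == "decelerating":
        return accelerating[::-1]
    raise ValueError(f"unknown condition: {condition}")

def task_seconds(condition, lead_in_seconds=4.0):
    """Nominal time from the 'Get ready' message to the end of the last blank."""
    frames = sum(ARROW_FRAMES + blank for blank in blank_frames(condition))
    return lead_in_seconds + frames / REFRESH_HZ

for condition in ("constant", "accelerating", "decelerating"):
    print(condition, sum(blank_frames(condition)), round(task_seconds(condition), 2))
```

Each schedule sums to the same number of blank frames, so the nominal task duration is identical across conditions at roughly 215.98 seconds, assuming no dropped frames.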
Following the final post-stimulus blank, the instructions "End of Task. Wait a moment" were displayed for 1 second (3 participants saw the "End of Task" message for slightly less than the usual 1 second because they pressed a response key when the message was on the screen). The end-of-task message was followed by instructions to estimate task duration (defined as the time from the "Get Ready" message at the start until the "End of Task" message at the end), in minutes and seconds. Participants were told to make intuitive judgements without referring to a watch or clock. A second question asked participants to describe the strategy they had used to arrive at the time estimate. Both responses were made on a paper response sheet.

Results and Discussion
The actual times between the appearance of the "Get Ready" signal and the appearance of the "End of Task" signal were measured by the computers' internal clocks. Across all participants, the mean was 215.98 seconds (SD = 0.01 seconds); there was minor variation because of occasional dropped frames etc., but across all participants the shortest and longest durations differed by only 66 ms. Accuracy on the visuo-motor task was generally high (M = 94.1% correct, SD = 9.3%). One participant scored well below chance (2.1% correct), suggesting that they got the response keys the wrong way round.
The data of primary interest were the duration estimates. Each participant gave an estimate of the task duration, in minutes and seconds, which was converted into seconds for analysis purposes. In the few cases when participants reported approximations such as ''about 2-3 minutes'', the average of the two values was considered in the analysis, resulting in one score per participant. The mean judgments are shown in Table 3.
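The conversion described above is simple arithmetic, sketched here for concreteness. The function names and the representation of an estimate as a (minutes, seconds) pair are assumptions made for illustration, not details of the authors' scoring procedure.

```python
def estimate_to_seconds(minutes, seconds=0.0):
    """Convert a minutes-and-seconds estimate into seconds."""
    return 60.0 * minutes + seconds

def range_to_seconds(low, high):
    """Score a range estimate as the average of its two bounds.

    Each bound is a (minutes, seconds) pair.
    """
    return (estimate_to_seconds(*low) + estimate_to_seconds(*high)) / 2.0

# A response of 3 minutes 30 seconds
print(estimate_to_seconds(3, 30))
# A response such as "about 2-3 minutes"
print(range_to_seconds((2, 0), (3, 0)))
```

With this scoring, a response of "about 2-3 minutes" contributes a single value of 150 seconds.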
A 2 × 3 fully between-subjects ANOVA was performed to see if the duration estimates depended on task and/or temporal structure. The results showed no main effect of task, F(1,156) = 0.48, p = .488, ηp² = .00, no main effect of condition, F(2,156) = 0.60, p = .551, ηp² = .01, and no interaction, F(2,156) = 0.71, p = .492, ηp² = .01. Levene's test indicated that the variance of the time estimates was not significantly different across groups, F(5,156) = 0.97, p = .435. Mean duration judgments were numerically above the true duration in all 6 task × condition combinations (no individual cell differed significantly from the true duration, all ps > .05, but across all participants the overestimation was reliable, t(161) = 3.28, p = .001).
Since the design of the study was not fully balanced (the cell sizes were only equal for each of the factors taken separately), two additional one-way between-subjects ANOVAs were performed, one for each independent variable. There was no difference in judgements between the accelerating, decelerating, and constant-rate conditions, F(2,159) = 0.71, p = .494, ηp² = .01. The nonsignificant effect of task was also replicated, F(1,160) = 0.41, p = .524, ηp² = .00. In short, there was no difference between the three temporal structure conditions. Moreover, time estimates were not significantly different between participants who were informed they would have to make a duration judgement and those who were not, and the effects of temporal structure were not dependent on whether the participants knew they would be required to make such a judgement.

Experiment 3
Experiment 3 examined the effects of temporal structure on the judged duration of a reading task, and focused on retrospective judgments.

Method
Participants. A total of 134 native speakers of English (86 females) participated in this experiment. Their mean age was 21.5 years (SD = 5.7 years), ranging from 18 to 55 years.
Design and Procedure. The stimuli consisted of 16 "facts" taken from www.snopes.com (e.g., "The youngest mother on record was a five-year-old Peruvian girl"), presented in black font on a white background on a CRT monitor (1280 × 1024 pixels refreshing at 85 Hz). The stimulus presentation was controlled by PsychoPy [43]. Participants were tested individually in sound-attenuating cubicles, and were randomly assigned to one of three conditions (constant, accelerating, or decelerating rate). Participants were told that they would be shown a list of "fun facts", one at a time, that they should read each one while it was on the screen, and that they would be asked some questions about what they had read at the end. They were not informed that a duration estimate would be required.
In the constant-rate condition, the facts were presented for 5 seconds each (425 frames); in the decelerating condition, the first fact was presented for 2 seconds (170 frames), and each subsequent fact was presented for 400 ms (34 frames) longer than the previous one, up to 8 seconds (680 frames) for the last one; the accelerating condition was the mirror image of the decelerating condition (i.e., the first fact was presented for 8 seconds, and each subsequent fact was shown for 400 ms less than the previous one). In all conditions, the total duration was the same (M = 81.6 seconds, SD = 0.03, ranging from 81.49 to 81.63 seconds). (Note that the actual time varied very slightly between participants because the duration of one display frame is not always precisely 1/85th of a second.)
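The three presentation schedules can be written out frame by frame to confirm that they contain the same total number of frames. The following is a minimal sketch built only from the frame counts given above; it is not the PsychoPy script used in the experiment, and the function and constant names are hypothetical.

```python
REFRESH_HZ = 85  # nominal refresh rate reported for the monitor
N_FACTS = 16


def schedule(condition):
    """Return the per-fact display durations, in frames, for one condition."""
    if condition == "constant":
        return [425] * N_FACTS
    # Decelerating: 170 frames for the first fact, 34 more for each later one.
    decelerating = [170 + 34 * i for i in range(N_FACTS)]
    if condition == "decelerating":
        return decelerating
    if condition == "accelerating":
        # Mirror image of the decelerating schedule.
        return decelerating[::-1]
    raise ValueError(f"unknown condition: {condition}")


for name in ("constant", "accelerating", "decelerating"):
    frames = schedule(name)
    nominal_seconds = sum(frames) / REFRESH_HZ
    print(name, frames[0], frames[-1], sum(frames), nominal_seconds)
```

Each schedule totals 6,800 frames. The seconds printed here are nominal values that assume a frame lasts exactly 1/85 of a second, so they are not the measured durations reported above.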
The 16 facts were presented in the same order for all participants. Following the presentation of the facts, the participants were asked to judge (without using a watch) the duration (in seconds) between the appearance of the first fact and the disappearance of the last one. They were further told that the answer lay somewhere between 0 and 180 seconds.

Results and Discussion
The response from one participant who reported having misunderstood the task was excluded. The mean time judgment was 83.00 seconds (SD = 40.18) in the constant-rate condition (N = 43), 84.98 seconds (SD = 39.46) in the accelerating condition (N = 45), and 81.97 seconds (SD = 44.19) in the decelerating condition (N = 45). A one-way between-subjects ANOVA revealed no reliable difference in time judgment between the three conditions, F(2,130) = 0.06, p = .940, ηp² = .00, and Levene's test indicated no heterogeneity of response variance, F(2,130) = 0.10, p = .901. Mean duration estimates did not differ from the true event duration (for all three conditions, and collapsing over condition, ps > .5).
A secondary analysis was conducted excluding 12 participants who reported unusually low judgments of less than 20 seconds (suggesting that they may have misunderstood the instructions). Excluding these from the analysis did not change the effect of condition, F < 1.
In short, this experiment again found no effect of changes in the pace of events on retrospective judgments of time.

Conclusions
Across five studies, we found no significant effect of changes in the pace of events on one-off judgments of duration. Table 4 summarizes the effect sizes for the duration estimates in the current experiments (with the values calculated separately for the prospective and retrospective conditions of Experiments 1c and 2). The table shows both eta-squared and omega-squared values, with the latter being regarded as a less biased estimate of the population value [44]. Across all studies, the effect of temporal structure is small. Table 4 also shows the effect sizes for the contrasts between the decelerating, accelerating and (where applicable) constant-rate conditions of each experiment. These estimates were submitted to a random-effects meta-analysis using the metafor package for the R statistical language [45,46]. None of the contrasts showed significant heterogeneity in effect size (all ps > .60) and the overall effect size estimates were −0.104 (95% confidence interval, CI: −0.307, 0.100) for the accelerating-constant rate contrast; 0.089 (95% CI: −0.074, 0.253) for the decelerating-accelerating contrast; and 0.057 (95% CI: −0.146, 0.259) for the decelerating-constant rate contrast. Note that all of the confidence intervals are relatively narrow and span zero.
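To illustrate the kind of standardized mean difference that enters such a meta-analysis, the sketch below computes a bias-corrected difference (Hedges' g) from the Experiment 3 summary statistics reported earlier. It assumes a pooled standard deviation and the common approximate correction factor 1 − 3/(4·df − 1); the metafor package may apply a different (exact) correction, so the output is indicative only and should not be read as a reproduction of the Table 4 entries. The function name and the sign convention (first group minus second group) are illustrative choices.

```python
import math


def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Bias-corrected standardized mean difference (group 1 minus group 2)."""
    df = n_1 + n_2 - 2
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df
    d = (mean_1 - mean_2) / math.sqrt(pooled_var)
    # Approximate small-sample correction; an assumption of this sketch.
    correction = 1 - 3 / (4 * df - 1)
    return d * correction


# Experiment 3 summary statistics from the text: (mean, SD, N) in seconds.
constant = (83.00, 40.18, 43)
accelerating = (84.98, 39.46, 45)
decelerating = (81.97, 44.19, 45)

print("acc - con:", round(hedges_g(*accelerating, *constant), 3))
print("dec - acc:", round(hedges_g(*decelerating, *accelerating), 3))
print("dec - con:", round(hedges_g(*decelerating, *constant), 3))
```

All three differences are small in absolute size, in line with the nonsignificant ANOVA reported for Experiment 3.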
Our samples were large and our studies were highly powered. Previous comparisons of accelerating, decelerating, and constant-rate stimuli found a median effect size of ηp² = .53 for 5 studies examining judgments of moving shapes [9] and ηp² = .21 for 4 studies examining judgments of brief tone sequences [10] (the latter paper also reports additional experiments with similar effect sizes in its Supplementary materials). The power to detect such effects is more than 99.9% for every experiment reported here [47]. Even the smallest of the effects in the aforementioned papers was ηp² = .10, and the power to detect such an effect was 93.9%, 93.9%, 99.7%, 97.2%, and 93.7% for Experiments 1a-3, respectively. We also computed Bayes factors [48][49][50][51] for the duration estimates using a Zellner-Siow g prior [52] proposed as a default for ANOVA-type designs [53,54]. These Bayes factors were computed using a modified version of the code provided by Wetzels and colleagues [53] and are shown in Table 5; in every case, the data favour the null hypothesis by at least 10 to 1.
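Power analyses for ANOVA designs are often parameterized by Cohen's f rather than by eta-squared; the two are related by f = sqrt(η² / (1 − η²)). The sketch below converts the three benchmark effect sizes mentioned above. Whether the power calculations cited here [47] used this parameterization is an assumption, and the function name is illustrative.

```python
import math


def cohens_f(eta_squared):
    """Convert (partial) eta-squared to Cohen's f."""
    if not 0 <= eta_squared < 1:
        raise ValueError("eta-squared must lie in [0, 1)")
    return math.sqrt(eta_squared / (1 - eta_squared))


# Benchmark effect sizes from earlier work, as quoted in the text.
for eta_squared in (0.53, 0.21, 0.10):
    print(f"eta-squared = {eta_squared:.2f} -> f = {cohens_f(eta_squared):.2f}")
```

By Cohen's conventional benchmarks, an f of 0.40 is already considered large, so even the smallest of the earlier effects (η² = .10) is a sizeable target.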
We must be cautious about generalizing these null results. Other stimuli and tasks might show a different pattern. Moreover, although the changes in tempo that we used were clearly noticeable, stronger manipulations might produce a significant effect. Our goal in reporting the current experiments is to provide a first investigation of how changes in the pace of external events influence one-off estimates of duration, and our results will be a useful contribution to subsequent meta-analyses of this issue.
If future work confirms our findings, then the null effect of changes in tempo will help constrain theoretical accounts of duration judgment. There are many accounts of temporal judgment, and it seems premature to engage in a lengthy discussion of all of them at this point. Rather, we briefly outline a few basic ideas. Considering first retrospective judgments, we noted above that many accounts have posited that remembered duration is positively related to the "storage size" [19], degree of segmentation [20,21], or number of contextual changes [22] during the to-be-recalled interval (see also [1] for a change-based account of active timing). The null effects reported here suggest that, if these accounts are correct, then tempo changes do not add complexity or require more changes in processing than constant-rate event sequences. That is, the complexity/degree of segmentation/amount of change that determines a remembered duration seems to be defined in terms of first-order change (the mean rate of events during the interval), not second-order change (changes in the rate of events during the interval). Likewise, we found no indication that changes in tempo disrupt the coherence of the stimulus/event sequence in such a way as to impair duration estimates relative to constant-tempo stimuli.
Turning to prospective judgments, we noted in the Introduction that many theorists assume intentional timing to be based on counting or accumulating the output of a pacemaker whose rate may be linked to the rate of external stimulation. The null effects found here suggest either that the pacemaker rate is unaffected by external tempo, or that any coupling is linear (such that subjective duration depends only on the total number of external events in a given time period). Likewise, under the change-based account of Beckmann and Young [1], the current data argue that it is first-order change, not second-order change, which determines judged duration.
Our experiments found no significant differences between prospective and retrospective judgments. This stands in contrast to the more usual finding of longer and more accurate judgments in the prospective paradigm (e.g., [13,55]). The reason for this difference is not clear. Possibly our forewarning was not a strong enough manipulation; instructions which more forcefully emphasized the need for an accurate temporal judgment in the prospective condition might have produced different results. In any case, our data will again make a useful contribution to meta-analyses of this effect [56].
Although we found no effect of temporal structure, studies using brief, meaningless stimuli which participants judge many times in a within-subject experiment show large effects (e.g., [6,9]). Why the difference? One explanation is that psychophysical studies reduce noise, maximizing experimental effects (although it is perhaps unlikely that the current studies would not even find a consistent trend, especially given the large sample sizes). Low-noise, psychophysical studies help illuminate basic aspects of perception, but the results are much less relevant to one-off judgments of moderate durations of the kind that people typically make outside the laboratory. Furthermore, presenting many similar items for judgment within an experimental session provokes inter-item comparisons and perceptual contrast which can strongly influence judgment [15][16][17][57][58][59]. An alternative (not contradictory) possibility is that timing mechanisms are different for durations of a few seconds and for intervals of a few minutes. In particular, longer durations and more complex, naturalistic stimuli permit a greater diversity of judgment strategies than brief, impoverished items. Exploring these disparate strategies will be a key direction for future research. Relatedly, it will be important to establish whether factors such as modality [32,33], intensity [31], and repetition/familiarity [60][61][62][63], which exert a profound influence on psychophysical studies of perceived duration, have a noticeable impact on one-off judgments of naturalistic stimuli.

Acknowledgments
We thank Steven Southworth for help with data collection for Experiment 2.

Table 4 note. The ηp² and ω² columns show the effect sizes for the effect of temporal structure on duration estimate in each experiment/condition. The effects for the prospective and retrospective judgment data from Experiments 1c and 2 have been analyzed separately, so each analysis is based on a one-way design and the partial eta-squared values shown here are identical to eta-squared. Note also that the calculation of omega-squared assumes a balanced design, but there were slightly unequal cell sizes in Experiment 3. The last 3 columns show the standardized differences between means g* [64] calculated using the metafor package for R [46]. Acc = Accelerating; Dec = Decelerating; Con = Constant-rate. doi:10.1371/journal.pone.0059847.t004

Table 5 note. The Bayes factor is the probability of the observed data under the null hypothesis divided by the probability of the data under the distribution of alternative hypotheses specified by the Zellner-Siow g prior. Values greater than 1 indicate support for the null hypothesis that there is no effect of temporal structure on duration estimates. Values greater than 10 are often labelled "strong" evidence for the null; values greater than 30 are "very strong" evidence [48]. doi:10.1371/journal.pone.0059847.t005