Understanding the structure of cognitive noise

Human cognition is fundamentally noisy. While cognitive noise is routinely regarded as a nuisance in experimental investigation, the few studies investigating its properties have found surprising structure. A first line of research has shown that inter-response-time distributions are heavy-tailed. That is, response times between subsequent trials usually change only a small amount, but with occasional large changes. A second, separate, line of research has found that participants’ estimates and response times both exhibit long-range autocorrelations (i.e., 1/f noise). Thus, each judgment and response time not only depends on its immediate predecessor but also on many previous responses. These two lines of research use different tasks and have distinct theoretical explanations: models that account for heavy-tailed response times do not predict 1/f autocorrelations and vice versa. Here, we find that 1/f noise and heavy-tailed response distributions co-occur in both types of tasks. We also show that a statistical sampling algorithm, developed to deal with patchy environments, generates both heavy-tailed distributions and 1/f noise, suggesting that cognitive noise may be a functional adaptation to dealing with a complex world.


Animal Naming Preprocessing and Descriptives
Submissions of animal names were pre-processed by applying spelling corrections; removing punctuation, extraneous whitespace, capitalization, and pluralization; and finally normalizing entries to unified animal names (for instance, mapping "meal worm" to "worm" or "chimp" to "chimpanzee"). The normalization adhered to the animal taxonomy in [1] whenever possible. After normalization, a total of 411 unique animal names remained. Participants submitted an average of 148.2 unique animal names (SD = 41.8). To assess which categories successively named animals belonged to, we applied the category labels used in [1,2]. Only 79 unique animals in our experiment were not covered by these category labels (139 submissions in total). The first two authors then individually supplied category labels for these remaining animals and devised a categorization scheme consistent with the labels in [1,2]. After applying these labels, 18 unique animals without a category label remained (48 submissions in total).
We then analyzed the proportion of within-class and between-class transitions for each participant. A transition was scored as within-class if at least one of the category labels of the current animal matched a category label of the previous animal. If no category label existed for an animal, the transition was always scored as between-class. On average, 46.45% of transitions were within class. This proportion was significantly larger than chance (p < .001, obtained via 10,000 bootstrapped samples). Transition proportions for individual participants ranged from 34.63% to 56.31% (M = 46.87%, SD = 6.38%), whereas proportions in bootstrapped independent samples were significantly lower (all p < .001, M = 14.82%, SD = 5.2%). Finally, the duration between successive animals was significantly longer (t(18) = 2.18, p = .04) for between-class transitions (M = 6.94s, SD = 14.4s) than for within-class transitions (M = 3.81s, SD = 6.11s).
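The bootstrap comparison above can be sketched as follows. This is a minimal illustration of the logic (observed within-class proportion versus proportions under random reshuffling), using hypothetical toy labels, not the authors' analysis code.

```python
import random

def within_class_proportion(seq, labels):
    """Fraction of transitions whose successive animals share a category label.

    seq: ordered list of animal names; labels: dict mapping name -> set of
    category labels. Transitions involving an unlabeled animal never count
    as within-class, matching the scoring rule in the text."""
    hits = 0
    for prev, nxt in zip(seq, seq[1:]):
        if labels.get(prev) and labels.get(nxt) and labels[prev] & labels[nxt]:
            hits += 1
    return hits / (len(seq) - 1)

def bootstrap_p(seq, labels, n_boot=10_000, seed=0):
    """One-sided bootstrap p: how often a reshuffled order matches or exceeds
    the observed within-class proportion."""
    rng = random.Random(seed)
    observed = within_class_proportion(seq, labels)
    exceed = 0
    for _ in range(n_boot):
        shuffled = seq[:]
        rng.shuffle(shuffled)
        if within_class_proportion(shuffled, labels) >= observed:
            exceed += 1
    return observed, exceed / n_boot

# hypothetical toy data: a clustered sequence of pets, then fish
labels = {"cat": {"pet"}, "dog": {"pet"}, "hamster": {"pet"},
          "salmon": {"fish"}, "trout": {"fish"}, "eel": {"fish"}}
obs, p = bootstrap_p(["cat", "dog", "hamster", "salmon", "trout", "eel"],
                     labels, n_boot=1000)
```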
To assess whether IRIs correlated with distance in a semantic space, we used the semantic embedding from [1], obtained via the BEAGLE model of [3] trained on Wikipedia. We then transformed the cosine similarities between successive animals x and y into Euclidean distances via d(x, y)² = 2 − 2 · sim(x, y). Animals for which no corresponding embeddings were available in [1] were dropped from the analysis (139 out of 4967 submissions, 2.8%). There was a positive correlation between IRIs and semantic distances, r = 0.14, n = 4697, p < .001. For individual participants we obtained correlations between 0.04 and 0.25 (M = 0.15, SD = 0.07, 8 out of 10 p < .05). Participants sometimes repeated previously named animals (min = 0.38%, max = 10.70%, M = 3.63%, SD = 2.95% repetitions).
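The similarity-to-distance transform can be illustrated directly; for unit-norm embeddings, the identity ∥x − y∥² = 2 − 2 · sim(x, y) is exact. A minimal sketch:

```python
from math import sqrt

def cosine_to_euclidean(sim):
    """Map cosine similarity to Euclidean distance via d(x, y)^2 = 2 - 2*sim(x, y).

    Exact for L2-normalized embeddings, since |x - y|^2 = |x|^2 + |y|^2 - 2 x.y
    = 2 - 2 x.y when |x| = |y| = 1."""
    return sqrt(max(0.0, 2.0 - 2.0 * sim))

d_same = cosine_to_euclidean(1.0)  # identical vectors: distance 0
d_orth = cosine_to_euclidean(0.0)  # orthogonal vectors: distance sqrt(2)
```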

Data Fitting Methods
Correlation and tail exponents of the time series from both experiments were fitted following conventional methods [4,5]. For the correlation exponents (α), we conducted a Fourier analysis using a periodogram function in MATLAB 2020b. The correlation exponent was computed as the negative slope of the fitted line in the windowed, log-binned power spectrum for frequencies less than 0.1, following [5].
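The spectral fitting step can be sketched as follows. This Python sketch is a simplified stand-in for the MATLAB periodogram analysis (raw periodogram, log-spaced frequency bins, least-squares slope below 0.1 cycles/sample), not the exact windowed procedure of [5].

```python
import numpy as np

def correlation_exponent(x, f_max=0.1, n_bins=20):
    """Estimate alpha as the negative slope of the log-log power spectrum
    for frequencies below f_max, after averaging power in log-spaced bins."""
    x = np.asarray(x, float)
    x = x - x.mean()
    n = len(x)
    power = np.abs(np.fft.rfft(x)) ** 2 / n      # raw periodogram
    freqs = np.fft.rfftfreq(n)                   # cycles per sample
    keep = (freqs > 0) & (freqs < f_max)
    f, p = freqs[keep], power[keep]
    # average power inside log-spaced frequency bins
    edges = np.logspace(np.log10(f[0]), np.log10(f[-1]), n_bins + 1)
    lf, lp = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (f >= lo) & (f < hi)
        if m.any():
            lf.append(np.log10(f[m].mean()))
            lp.append(np.log10(p[m].mean()))
    slope, _ = np.polyfit(lf, lp, 1)
    return -slope                                # alpha

# white noise has a flat spectrum, so alpha should be near 0
rng = np.random.default_rng(0)
alpha_white = correlation_exponent(rng.standard_normal(2**11))
```

For 1/f noise the same procedure would return alpha in roughly the [0.5, 1.5] range used as the criterion in the text.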
For the tail exponents (µ), we first calculated the distribution of successive changes: inter-response intervals (IRIs) in the animal naming experiment and absolute differences between successive time estimates in the time estimation experiment. These distributions were then fitted by a power-law distribution with the exponent and the range of data as free parameters [4]. The range was taken as the part of the dataset with the lowest Kolmogorov-Smirnov (KS) statistic, found by repeatedly increasing the lower bound and decreasing the upper bound until the KS statistic failed to improve five consecutive times [4]. The power-law exponent was then fitted to this subset of the data using maximum-likelihood estimation [6]. In the animal naming experiment, the median fitted range was [2.00s, 61.64s]. In the time estimation experiment, the median fitted ranges were [0.02s, 1.05s], [0.03s, 2.32s], and [0.01s, 4.10s] for the 1/3s, 1s, and 3s conditions respectively. Using ranges in this way helped exclude resting periods and lapses of attention from the analysis.
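The tail-fitting machinery rests on two ingredients, the continuous power-law MLE and the KS statistic; a minimal sketch of both (without the iterative range search of [4]) is:

```python
import numpy as np

def powerlaw_mle(x, x_min):
    """Continuous power-law MLE for the tail exponent:
    mu = 1 + n / sum(log(x_i / x_min)) over the tail x_i >= x_min."""
    x = np.asarray(x, float)
    tail = x[x >= x_min]
    return 1.0 + len(tail) / np.sum(np.log(tail / x_min))

def ks_statistic(x, x_min, mu):
    """KS distance between the empirical tail CDF and the fitted power-law CDF."""
    x = np.asarray(x, float)
    tail = np.sort(x[x >= x_min])
    emp = np.arange(1, len(tail) + 1) / len(tail)
    theo = 1.0 - (tail / x_min) ** (1.0 - mu)
    return float(np.max(np.abs(emp - theo)))

# recover a known exponent from synthetic power-law data (inverse-CDF sampling)
rng = np.random.default_rng(1)
mu_true, x_min = 2.5, 1.0
u = rng.uniform(size=50_000)
data = x_min * (1.0 - u) ** (-1.0 / (mu_true - 1.0))
mu_hat = powerlaw_mle(data, x_min)
ks = ks_statistic(data, x_min, mu_hat)
```

The range search in the text would wrap these two functions, sweeping candidate bounds and keeping the subset with the smallest KS statistic.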
In a separate analysis, we looked for a signature of rest periods or lapses of attention in the time estimation experiment: changes in the time estimate that were followed by an immediate reversion to the previous time estimate. In these cases, the ratio of two consecutive changes would be close to −1. To check whether the apparent heavy tails were due instead to resting periods or inattention, as a conservative test we excluded changes in time estimates associated with change ratios between −1.2 and −0.8, and reanalyzed the remaining data. Per-participant tail exponents ranged from 0.87 to 2.45 (M = 1.56, SD = 0.56), and 24 of 30 were classified as heavy-tailed.
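The exclusion rule can be sketched as a simple filter; here, under one simple reading of the rule, the change that completes a near-perfect reversion (ratio in [−1.2, −0.8]) is dropped:

```python
def exclude_reversions(changes, lo=-1.2, hi=-0.8):
    """Drop changes whose ratio to the preceding change falls in [lo, hi].

    A ratio near -1 means the estimate immediately reverted to its previous
    value, the signature of a rest period or attention lapse described in
    the text."""
    kept = [changes[0]]
    for prev, cur in zip(changes, changes[1:]):
        if prev != 0 and lo <= cur / prev <= hi:
            continue  # near-perfect reversion: exclude this change
        kept.append(cur)
    return kept

# +0.5 followed by -0.5 (ratio -1.0) is flagged; the other changes survive
filtered = exclude_reversions([0.5, -0.5, 0.1, 0.2])
```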
Whether a distribution follows a power law is often controversial and technically challenging to establish. To strengthen our results further, we also fitted our data with an alternative, strict approach. As before, we estimated power-law statistics by minimizing the KS distance between our observed data and the theoretical MLE distribution, as suggested in [6]. Instead of the iterative procedure for determining lower and upper bounds, we fitted a truncated power-law distribution (a power-law distribution multiplied by an exponential) with an x_min truncation. We then contrasted the optimized power-law distribution with the fit of an exponential distribution and report normalized likelihood ratios, as well as bootstrapped p-values [6], using the implementation in [7]. Using this strict test, for all three tapping targets we obtained µ exponents suggesting power-law distributions (µ1/3s = 1, µ1s = 1.37, µ3s = 1.82). For both the 1s and 3s targets, the likelihood ratio strongly and significantly favored the power-law distribution over an exponential (llr1s = 2.04, p1s = .04; llr3s = 14.87, p3s < .001). However, for the aggregate 1/3s targets the likelihood ratio was not conclusive (llr1/3s = −.04, p1/3s = .97). For the aggregate animal-naming data, we found that the likelihood ratio strongly and significantly favored a power-law distribution (µ = 1.91, llr = 6.91, p < .001).
For the individual-participant animal naming data, estimates ranged from 1 to 2.59 (M = 1.51, SD = 0.53), and all likelihood ratios favored the power law over the exponential, with the majority reaching significance (6/10 p < .05), in contrast to the 1/3s and 1s targets. Overall, using this strict test, we continued to find evidence for power-law tails in trial-by-trial changes, though in a smaller proportion of participants.
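The likelihood-ratio comparison can be sketched without the full bootstrapped test of [6]: fit both candidate tails by maximum likelihood and form the normalized (Vuong-style) log-likelihood ratio. The sketch below compares a pure power law against a shifted exponential; the paper's actual test uses a truncated power law and the implementation in [7].

```python
import math
import numpy as np

def loglik_ratio_pl_vs_exp(x, x_min):
    """Fit the tail above x_min with a power law and a shifted exponential by
    maximum likelihood; return the summed log-likelihood ratio and the
    normalized (Vuong-style) ratio. Positive values favor the power law."""
    x = np.asarray(x, float)
    tail = x[x >= x_min]
    n = len(tail)
    mu = 1.0 + n / np.sum(np.log(tail / x_min))      # power-law MLE
    lam = 1.0 / np.mean(tail - x_min)                # shifted-exponential MLE
    ll_pl = np.log((mu - 1.0) / x_min) - mu * np.log(tail / x_min)
    ll_exp = np.log(lam) - lam * (tail - x_min)
    diff = ll_pl - ll_exp
    z = np.sum(diff) / (np.std(diff, ddof=1) * math.sqrt(n))
    return float(np.sum(diff)), float(z)

# synthetic power-law data should favor the power law over the exponential
rng = np.random.default_rng(2)
u = rng.uniform(size=20_000)
data = (1.0 - u) ** (-1.0 / 1.5)                     # mu = 2.5, x_min = 1
llr, z = loglik_ratio_pl_vs_exp(data, x_min=1.0)
```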

Validating the Data Fitting Methods
The identification of 1/f noise and heavy tails depends on the fitting methods. While the methods we used are established in the literature [5,6,4] and a comprehensive evaluation of these methods is beyond the scope of this paper, it is useful to test them on a simple counterexample. In particular, we drew independent and identically distributed (i.i.d.) samples from a standard Gaussian distribution, whose traces should never exhibit 1/f noise or heavy tails. We simulated 1000 traces of 2^11 i.i.d. Gaussian samples and fitted them using the same methods as used with the data and the simulations of the sampling algorithms. We found that the range of correlation exponents for these simulated traces was [−0.4989, 0.2592], clearly outside the range of 1/f noise [0.5, 1.5], and that the range of tail exponents was [4.0506, 17.7245], again outside the range of heavy tails (1, 3]; thus, none of the 1000 replications met the criterion for 1/f noise or the criterion for heavy tails. This check shows that the fitting methods are robust against a simple counterexample: i.i.d. sampling from a Gaussian.
We also show that the fitting method is robust using randomly reshuffled cognitive time series. For the animal naming task, reshuffled time series were classified as 1/f noise in only 0.06% of 1000 reshuffles. For the time estimation task, none of the reshuffled time series were classified as 1/f noise. These results argue against 1/f noise occurring by chance for values independently drawn from a heavy-tailed distribution.

Text B. Sampling Algorithms
The three sampling algorithms discussed in the main text are forms of Markov Chain Monte Carlo (MCMC). While there are elaborate versions of these algorithms with sophisticated tuning schemes that improve their efficiency, we compared simple commonly-used implementations of each.

Random-Walk Metropolis
Random-Walk Metropolis (RWM; see [8] for an introduction) is a basic and well-known method for sampling from a target distribution π(x). It was run with a normal proposal distribution centered on the current state of the sampler (i.e., proposal increments with zero mean and standard deviation σ).
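A minimal sketch of RWM for a one-dimensional target, assuming a log-density interface (an illustration of the algorithm, not the paper's simulation code):

```python
import math, random

def rwm(log_pi, x0, sigma, iterations, seed=0):
    """Random-Walk Metropolis: Gaussian proposal increments N(0, sigma)
    around the current state; log_pi is the target's log density
    (up to an additive constant)."""
    rng = random.Random(seed)
    x, lp = x0, log_pi(x0)
    chain = [x]
    for _ in range(iterations):
        x_prop = x + rng.gauss(0.0, sigma)
        lp_prop = log_pi(x_prop)
        # accept with probability min(1, pi(x') / pi(x))
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = x_prop, lp_prop
        chain.append(x)
    return chain

# standard Gaussian target, as in the time estimation simulations
chain = rwm(lambda x: -0.5 * x * x, x0=0.0, sigma=1.0, iterations=2**11)
```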

Hamiltonian Monte Carlo
Hamiltonian Monte Carlo (HMC; see [8] for a conceptual introduction) is an MCMC algorithm that uses Hamiltonian dynamics to make better proposals. We ran HMC with the common leapfrog integration scheme and Gaussian momentum proposals. In the pseudo-code of the algorithm, ϵ is the step-size parameter of the integrator, ℒ is the log-likelihood of the variables x, and L is the number of leapfrog integration steps.

Algorithm 1 Random-Walk Metropolis
Given π, σ, iterations
  Initialize x_0
  for i = 1 : iterations do
    x′ ← x_{i−1} + N(0, σ)  ▷ Propose a random-walk step
    if U(0, 1) < min(1, π(x′)/π(x_{i−1})) then
      x_i ← x′  ▷ Accept the proposal
    else
      x_i ← x_{i−1}  ▷ Reject; keep the current state
    end if
  end for
  return x
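A corresponding sketch of the HMC update with leapfrog integration, for a one-dimensional target with a known gradient (a simplified illustration, not the simulation code used in the paper):

```python
import math, random

def hmc(log_pi, grad_log_pi, x0, eps, L, iterations, seed=0):
    """Hamiltonian Monte Carlo: leapfrog integration with step size eps and
    L steps per proposal, standard-Gaussian momenta, Metropolis correction."""
    rng = random.Random(seed)
    x = x0
    chain = [x]
    for _ in range(iterations):
        p = rng.gauss(0.0, 1.0)                    # resample momentum
        x_new, p_new = x, p
        p_new += 0.5 * eps * grad_log_pi(x_new)    # half step for momentum
        for step in range(L):
            x_new += eps * p_new                   # full step for position
            if step < L - 1:
                p_new += eps * grad_log_pi(x_new)  # full step for momentum
        p_new += 0.5 * eps * grad_log_pi(x_new)    # final half step
        h_old = -log_pi(x) + 0.5 * p * p           # Hamiltonian before
        h_new = -log_pi(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new                              # Metropolis accept
        chain.append(x)
    return chain

# standard Gaussian target: log pi(x) = -x^2/2, gradient -x
chain = hmc(lambda x: -0.5 * x * x, lambda x: -x, x0=0.0,
            eps=0.1, L=3, iterations=2**8)
```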

Metropolis-coupled Markov Chain Monte Carlo
Metropolis-coupled Markov Chain Monte Carlo (MC³; see [9] for an introduction) is a more sophisticated sampling algorithm consisting of multiple MCMC chains running at different temperatures. We ran several chains in parallel, with the first chain exploring the untempered distribution and each successive chain exploring an increasingly tempered distribution. While sophisticated temperature schemes have been suggested in the literature, we adopted a linearly spaced temperature scheme.

Algorithm 3 Metropolis-coupled Markov Chain Monte Carlo
Given π, M chains, iterations, Temperature, σ
  T_c ← Temperature(c), for all chains c  ▷ Initialize temperatures
  Initialize x_0^c, for each chain c  ▷ Set the initial state of each chain
  for i = 1 : iterations do
    for c = 1 : M do
      x_i^c ← Step(x_{i−1}^c, π^{1/T_c}, σ)  ▷ Perform one MCMC step for the tempered distribution
    end for
    for s = 1 : M/2 do
      Select two chains, k, l, at random (k ≠ l)
      Swap x_i^k and x_i^l with probability min(1, [π(x_i^l)^{1/T_k} · π(x_i^k)^{1/T_l}] / [π(x_i^k)^{1/T_k} · π(x_i^l)^{1/T_l}])  ▷ Exchange states between chains
    end for
  end for
  return x^1  ▷ Samples from the untempered chain
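A minimal one-dimensional sketch of MC³ with random-walk steps within chains, a linearly spaced temperature ladder, and one random swap proposal per iteration (an illustration following the descriptions in this text, not the paper's simulation code):

```python
import math, random

def mc3(log_pi, x0, sigma, n_chains, iterations, temp_step=5.0, seed=0):
    """Metropolis-coupled MCMC: parallel random-walk chains at temperatures
    1, 1 + temp_step, 1 + 2*temp_step, ...; one random swap proposal per
    iteration. Returns the trace of the cold (untempered) chain."""
    rng = random.Random(seed)
    temps = [1.0 + c * temp_step for c in range(n_chains)]
    xs = [x0] * n_chains
    cold = [x0]
    for _ in range(iterations):
        for c in range(n_chains):              # RWM step on pi^(1/T_c)
            prop = xs[c] + rng.gauss(0.0, sigma)
            dlp = (log_pi(prop) - log_pi(xs[c])) / temps[c]
            if rng.random() < math.exp(min(0.0, dlp)):
                xs[c] = prop
        k, l = rng.sample(range(n_chains), 2)  # pick two distinct chains
        # swap acceptance for exchanging states between T_k and T_l
        dswap = (log_pi(xs[l]) - log_pi(xs[k])) * (1.0 / temps[k] - 1.0 / temps[l])
        if rng.random() < math.exp(min(0.0, dswap)):
            xs[k], xs[l] = xs[l], xs[k]
        cold.append(xs[0])
    return cold

cold = mc3(lambda x: -0.5 * x * x, x0=0.0, sigma=1.0,
           n_chains=4, iterations=2**11)
```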

Text C. Model Comparison Methods
As noted in the main text, we conducted simulation-based model comparisons for the three sampling algorithms using the Approximate Bayesian Computation (ABC) method [10]. Each sampling algorithm was simulated to produce time series that were then compared to the observed cognitive time series. The comparison between simulated and observed time series is made through summary statistics; in our case, the correlation (α) and tail (µ) indices. More formally, model performance was measured by the marginal likelihood, which can be approximated as

p(D | model) ≈ (1/N) Σ_{i=1}^{N} ρ(D, D̂(θ_i)), with θ_i drawn from the prior p(θ),

where θ is the set of free parameters of the algorithm, N is the number of simulated time series, D̂(θ_i) is the simulated time series given model parameters θ_i, and ρ(D, D̂) is the likelihood approximation function that compares the summary statistics of the simulated and observed time series. To avoid strong preferences in the parameter space, all parameters of all three samplers were distributed uniformly across a predefined space (detailed below). For each task, each sampler was simulated 1000 times with randomized parameter values. Both MC³ and RWM were simulated for 2^11 iterations. Since HMC is more sample-efficient, but also computationally more costly, we ran HMC simulations for 2^8 iterations. To make the summary statistics more comparable, each simulated time series also used the same initial value.
The likelihood approximation function ρ(D, D̂) was computed from the differences in correlation and tail indices between simulated and observed data. These differences were smoothed through a similarity kernel to return a probability of the observed data given the model simulation. We assumed Gaussian kernels centered on the participant's index, with standard deviation equal to that of the cross-participant indices from the same experimental condition. Covariances of the Gaussian kernels were set to zero. The calculation of correlation and tail exponents of the simulated time series followed the same methods used for the observed data [6,5], except that no exclusions were applied to the simulated time series.
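Putting the two pieces together, the ABC marginal-likelihood estimate can be sketched as follows; the toy simulator and kernel bandwidths here are hypothetical placeholders, not the fitted values.

```python
import math, random

def abc_log_marginal(simulate, obs_summaries, sds, n_sims=1000, seed=0):
    """Monte Carlo estimate of log p(D | model) ≈ log[(1/N) Σ ρ(D, D̂(θ_i))].

    simulate(rng) draws θ_i from its prior internally and returns the summary
    statistics of one simulated series (here: correlation and tail indices);
    ρ is a product of independent Gaussian kernels with bandwidths sds
    (zero covariance, as in the text)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        sims = simulate(rng)
        rho = 1.0
        for s_obs, s_sim, sd in zip(obs_summaries, sims, sds):
            zscore = (s_obs - s_sim) / sd
            rho *= math.exp(-0.5 * zscore * zscore) / (sd * math.sqrt(2.0 * math.pi))
        total += rho
    return math.log(total / n_sims)

# hypothetical toy model whose summaries land near the observed (alpha, mu)
def toy_simulate(rng):
    return (1.0 + rng.gauss(0.0, 0.1), 2.0 + rng.gauss(0.0, 0.1))

lml = abc_log_marginal(toy_simulate, obs_summaries=(1.0, 2.0),
                       sds=(0.5, 0.5), n_sims=500)
```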

Target Distributions
We used two target distributions to represent the hypothesis spaces of time estimates and animal names. For the time estimation experiment, a standard Gaussian distribution was assumed to be the target. For the animal naming experiment, a more sophisticated target distribution was considered. We first embedded all unique animal names that were reported by participants into a 2-dimensional space using multidimensional scaling [11], trained on the similarities in [1]. To infer a probability distribution over this semantic space, we then fitted a Dirichlet Process Gaussian Mixture Model via variational inference (with a maximum of 10 components), as implemented in [12]. Dropping mixture components with weights less than 0.1, we obtained a mixture of four multivariate Gaussian distributions (see the accompanying figure for an illustration).

Parameter Ranges
Parameters were randomly drawn from sampler-specific priors. For RWM and MC³, we varied the proposal width in relation to the width of the target distribution, σ_prop/σ_tar. We employed a two-step sampling procedure, first sampling with equal probability whether the proposal width is larger or smaller than the target width, and then sampling x uniformly, x ∼ U(1, 10), yielding proposal widths in the range [σ_tar/x, x · σ_tar]. This range includes proposal distributions substantially wider than the target distribution as well as much narrower ones. This was important because the proposal width directly affects the samples' autocorrelation: very wide proposal distributions result in proposals outside the typical set of the target distribution, producing large numbers of rejections and thus, since the current state is rarely updated, large autocorrelations. However, this behavior can differ subtly for multimodal or heavy-tailed target distributions; in these cases, occasional large jumps can occur when a move into the tails or into the other mode is proposed.
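The two-step prior over proposal widths can be sketched directly:

```python
import random

def sample_proposal_width(sigma_target, rng):
    """Two-step prior over the proposal width: first decide (50/50) whether
    the proposal is wider or narrower than the target, then draw
    x ~ U(1, 10) and scale, so sigma_prop lies in [sigma_target/10,
    10 * sigma_target]."""
    x = rng.uniform(1.0, 10.0)
    if rng.random() < 0.5:
        return sigma_target * x    # wider than the target
    return sigma_target / x        # narrower than the target

rng = random.Random(0)
widths = [sample_proposal_width(1.0, rng) for _ in range(10_000)]
```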
In addition to the proposal width, for MC³ we sampled the number of parallel chains, M, uniformly between 2 and 10. We did not vary the temperature function: all simulations used a linearly increasing temperature across chains (increasing the temperature by 5 from each chain to the next).
For HMC, we kept the step size constant at 0.1, as it only affects the integration accuracy. We varied the path length, sampling uniformly L ∼ U(1, 5). All three sampling algorithms were initialized at zero.

Generating IRIs from Samples
For the analysis in the main text, we assumed that each sample took additional time, and that the time between subsequent animal names was related to the number of samples generated before there was a change in the nearest animal name to the state of the sampler in the semantic space. With this method, the time taken to generate a sample was unrelated to the distance it had travelled, and instead followed a Poisson process: the time between each sample was exponentially distributed with rate parameter 0.1. These assumptions about the time required to generate samples match those in the Autocorrelated Bayesian Sampler model [13].
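The IRI-generation assumption can be sketched as follows; the one-dimensional "semantic space" and the nearest-name rule here are illustrative placeholders for the 2-D space used in the paper.

```python
import random

def iris_from_samples(states, nearest_name, rate=0.1, seed=0):
    """Inter-response intervals from a sampler trace: each sample costs an
    Exp(rate)-distributed time (a Poisson process, as in the text), and an
    animal is emitted whenever the nearest name to the sampler's state
    changes. The IRI is the time accumulated since the previous emission."""
    rng = random.Random(seed)
    iris, elapsed = [], 0.0
    current = nearest_name(states[0])
    for state in states[1:]:
        elapsed += rng.expovariate(rate)   # time cost of one sample
        name = nearest_name(state)
        if name != current:                # nearest animal changed: respond
            iris.append(elapsed)
            elapsed, current = 0.0, name
    return iris

# hypothetical 1-D semantic line with two names; the boundary sits at 2.5
iris = iris_from_samples([0.1, 0.4, 4.8, 5.2, 0.3],
                         nearest_name=lambda s: "cat" if s < 2.5 else "dog")
```

Note that the elapsed time depends only on the number of samples drawn, not on the distance travelled, matching the Poisson-process assumption in the text.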

An Alternative Method for Generating IRIs
As a robustness check for the animal naming task, we considered an alternative cognitive mechanism for IRIs, in which the IRIs are related to the Euclidean distance between two successive animal names. That is, we calculated the distance the sampler travelled in the semantic space before the nearest animal name in the space changed. We performed the same approximate Bayesian computation to evaluate model performance and found that MC³ and RWM performed worse than HMC (see SI Table 1). However, compared to the log marginal likelihood results in Table 1, HMC was outperformed by both RWM and MC³ when the IRI was related to the number of samples until a new animal was nearest.