Long-range temporal correlations in neural narrowband time-series arise due to critical dynamics

We show theoretically that the hypothesis of criticality as a theory of long-range fluctuation in the human brain may be distinguished from the theory of passive filtering on the basis of macroscopic neuronal signals such as the electroencephalogram, using novel theory of narrowband amplitude time-series at criticality. Our theory predicts the division of critical activity into meta-universality classes. As a consequence our analysis shows that experimental electroencephalography data favours the hypothesis of criticality in the human brain.


Introduction
In a spatially extended physical system of homogeneous interacting elements, critical powerlaw avalanche dynamics (CD) occur when the size and lifetime of a burst of activity after a perturbation (an 'avalanche') follow power-law distributions [1,2]. Theoretical approaches to CD assume a separation of time-scales [1,3]; the model system under study is allowed to relax to equilibrium completely after a perturbation before the effect of a subsequent perturbation is considered.
However, in experiments involving real-world physical systems, individual avalanches are often not directly observable-the signal is in many cases continuous. Perturbations are generated by the presence of stochasticity intrinsic to the system and may occur at random before the termination of an avalanche caused by a prior perturbation. Thus avalanches 'overlap' in the temporal domain and the degree of overlap depends on the size of the system, assuming that perturbations occur at random in each component element; such overlapping generates a continuous signal.
In neuroscience, several experiments [4][5][6] have provided evidence in vitro that small isolated neuronal networks display CD; however, the networks studied do not typically display a separation between periods of inactivity and activity [5,7]. In such experiments, a period of inactivity is defined by the network returning to a quiescent baseline level of lesser activity set by a threshold [5,7,8]. While this technique may work for small networks isolated in cultures, PLOS  results have been contradictory in vivo [9][10][11]; no consideration has yet been given to the hypothesis that this discrepancy is due to the absence of separation of time-scales in the continuous neuronal activity. When testing the hypothesis of criticality in the human brain, the problem of a lack of separation between avalanches is especially pressing. The use of invasive electrodes is problematic and research is often limited to macroscopic imaging of the entire brain, using, for instance, electroencephalographic (EEG) or magnetoencephalographic (MEG) recordings. The entire human brain never attains a quiescent level of baseline activity; the brain is continuously active rather than being relatively inactive between bursts of activity [12]. Only a few papers have claimed to test the hypothesis of criticality from the large-scale brain signals of awake human subjects [13][14][15]. These analyses are subject to several difficulties; firstly, the threshold set for separation between activity and inactivity is high, thus making the relationship to genuine separation of time-scales unclear. Secondly, as a consequence of this high threshold, the authors find CD only over short time-scales (<1 second). Thus no explanation may be offered on this basis of the long-term variability of human brain activity.
Such difficulties stemming from the lack of separation between activity and inactivity in vivo make unclear whether the results of in vitro experiments confirming criticality in animals [4][5][6] generalize to the intact brain of awake human subjects. Despite these difficulties, the question remains, why does the human brain display the same 1/f γ form in its power-spectra [16] as posited by CD [3]? The most prominent alternative theory to CD proposes that the scale-free form of the power-spectra, for example, of EEG and MEG data, is due not to CD but to passive filtering (PF) of activity through the extracellular media [17,18]; thus the authors show that CD is not necessary to explain the 1/f γ form of the power-spectrum.
In this paper, we show that the PF explanation is insufficient in reproducing certain longrange properties of EEG data which we show theoretically must hold if the CD theory is correct. We demonstrate that if a critical system produces temporally overlapping avalanches, which superimpose linearly at the level of sensors (as is known for electrophysiological recordings [19,20]), then the corresponding stochastic process generated by such activity may be shown to display novel properties, distinct from the form of the power-spectrum, which are provably not explainable by the PF theory. Our theory shows for the first time that critical universality classes naturally fall into subclasses which we refer to as meta-universality classes; the meta-universality classes are defined according to the spectral properties of the narrowband amplitude processes derived non-linearly from the activity produced by the system. The theory allows us to derive a test for CD in macroscopic neural recordings without requiring separation of activity and inactivity. We apply the test to EEG recordings and find that the data strongly favour the CD hypothesis over the PF hypothesis as a theory of neuronal variability over long time-scales. An illustration of our approach is displayed in Fig 1.

Overview of theory
We assume that we record activity of a macroscopic electrophysiological signal such as the EEG or MEG and that the activity is generated by a large neural network. We assume moreover that the activity is composed of the activity of numerous local neural networks such that the total activity measured is a linear superposition of the activity of these local networks; this linear superposition is well known and studied [19]. The activity of these local neural networks is composed of bursts of activity interleaved by periods of quiescence, due to the sparseness of neuronal firing [21,22]. In physical terms, these bursts of activity may be thought of as perturbations from equilibrium caused by the intrinsic stochasticity of the network. We shall refer to these bursts of activity as "avalanches", independently of whether their size and lifetime are power-law distributed. Thus short bursts of activity will be referred to as "short avalanches", despite the everyday connotation of the word "avalanche" as a "tumbling of snow of immense size". To simplify matters we assume that at each time step a fixed number of avalanches of local neural networks are initiated. Thus the recorded signal consists of the linear superposition of all avalanches initiated at prior times which have yet to subside.
According to the CD theory the size (number of neurons firing) and lifetime (length in time) of the avalanches of local neural networks are power-law distributed [1,2]; according to the PF theory, the size and lifetime distributions decay faster than a power-law [17,18].
In the case of the PF theory the long range correlations/ power-law power spectrum observed in the macroscopic neural signals are generated by filtering through the extracellular medium between recording electrode and neuronal activity. The PF hypothesis asserts that this filtering is caused by ionic diffusion generated by recalibration of concentrations through the medium at the source of the neuronal activity after polarization of neuronal membranes [18].
Vital to note from the outset is that our theory predicts properties of the amplitude dynamics of the narrowband activity of macroscopic neural signals. The importance of these The approach taken in this paper. Criticality implies that the distibutions of discrete avalanches are power-law (left hand side). However analysing discrete dynamics is problematic on the basis of continuous EEG/ MEG recordings. Until now it was unclear how to distinguish criticality from alternative explanations of 1/f γ power-spectra on the basis of continuous data (right hand side) such as EEG/ MEG. We show that criticality implies that the narrowbands of the continuous data have specific long-range properties (left hand branch of right hand side). The passive-filtering theory of the origin of the 1/f γ form of EEG/ MEG power-spectra does not predict such longrange narrowband properties (right hand branch of right hand side). Thus we have a criterion to distinguish criticality from passive filtering on the basis of MEG/ EEG recordings. We perform this test on empirical data. https://doi.org/10.1371/journal.pone.0175628.g001 Long-range temporal correlations in neural narrowband time-series arise due to critical dynamics dynamics in the CD/ PF debate is that they are invariant to linear filtering of the underlying signal. This is because the signal may be considered a linear combination of its narrowband (individual frequency) activities-a linear filter has only the effect of reweighting these narrowbands [23]. Thus predictions made on the basis of CD regarding these narrowband dynamics circumvent the counterarguments of the PF theory. Fig 2 illustrates our theory. The upper row (rows delineated by the square brackets and dotted lines) displays sample avalanches which are initiated at each time-step according to both the PF and CD theories; for visualization purposes, each subsequent avalanche is displayed at a separate position in the y-direction (from bottom to top in time, according to when each period began). In the second row, we see the sample activity X(t), produced by the network, composed as the linear superposition (sum) of all of these avalanches. The third row displays the observed data at the measurement device X 0 (t), which is given according to PF by a 1/f γ filtered version of X(t) but according to CD simply as a scalar multiple (dependent on the proximity to the recording electrode) of X(t). The fourth and fifth rows display narrowband timeseries f ω 1 (X 0 (t)), f ω 2 (X 0 (t)) which are obtained from the observed activity X 0 (t) by linear filtering in a narrow pass-band around two distinct frequencies ω 1 and ω 2 . In each row we visualize three cases. The left hand column corresponds to the PF theory, the right two columns correspond to two distinct universality classes in CD, which we denote by parameters α, β; the length of avalanches are distributed as p(L)*L −α and the height of avalanches as p(h)*L β . Informally α determines how likely long avalanches are in comparison to short avalanches and β relates to how tall long avalanches are in comparison to short avalanches (see Section: Materials and Methods: Theoretical Results for details). In Fig 2 we consider fixed β since varying β has simply the effect of altering the results quantitatively (but not qualitatively) and Illustration of the theory. Samples according to PF (left) and CD with the lifetime exponent α = 2.5, 1.5 and height exponent β = 1 (centre, right). Top row: avalanches a i,s (t) composing the continuous network signal X(t) by linear superposition (many avalanches superimposed over one another). Second row: continuous signal X(t). Third row: measured signal, filtered in the case of the PF theory (left) and a scalar multiple of the network signal X(t) in the CD case (centre, right). Fourth and fifth rows: narrowband signals at two frequencies ω 1 and ω 2 . We observe that in all cases the observed signal X 0 (t) fluctuates over a range of time-scales. However the narrowband signals display pronounced fluctuations in their amplitude envelopes only in the CD model for certain exponent values. See Section: Results: Overview of Theory for a detailed description of each panel.
controlling the exact onset of the distinct behaviours (thus we relegate detailed discussion of β to the theory section).
We see that the PF and CD theories make differing predictions and that the predictions of the CD theory depend on the universality class. The predictions relate to the dynamics of narrowbands f ω (X 0 (t)) for all ω. This holds since in both the PF and CD cases, the dynamics of f ω i (X 0 (t)) are unaffected by linear filtering of X(t).
1. The PF theory predicts that X(t) is not an auto-correlated time-series over long-time scales (left, second row). This is because the avalanches do not span long enough time-scales to induce long-range correlations (left, top row). This implies that distant time-points of X(t) are independent. The long-range variability of X 0 (t) (where X 0 (t) is the observed data) is induced by passive filtering through the extra cellular medium (left, third row). This implies that the amplitude envelopes of the narrow-band time-series, f ω 1 (X 0 (t)), f ω 2 (X 0 (t)), are also uncorrelated over long time-scales and across frequencies (left, fourth and fifth row); this is because f ω i (X 0 (t)) preserves the dynamics of f ω i (X(t))-the process of linear filtering has no effect on narrowband dynamics.
2. For α = 2.5 long avalanches are far more likely than in the passive filtering case; the total variance of long avalanches outweighs the variance of small avalanches (centre, top row). This implies that the total activity of the system X(t) and therefore the observed activity X 0 (t) display scale-free fluctuations over all time-scales (centre, second and third rows).
Because the variance of the large avalanches dominate over the small avalanches, the amplitude of the narrowband time-series display scale-free fluctuations on all time-scales; since the fluctuations arise from the swell in variance profile generated by individual avalanches which have broadband frequency content, these scale-free fluctuations are correlated across frequencies (centre, fourth and fifth rows). Informally speaking, the avalanches "protrude" from X 0 (t); such "protruding" causes fluctuations in all f ω i (X 0 (t)).
3. For α = 1.5 the likelihood of long avalanches is larger than the likelihood when α = 2.5. This implies that the number of avalanches gradually accumulates over time (right, top). As for α = 2.5 the presence of long avalanches implies the presence of long-range fluctuations in X (t) and X 0 (t) (right, second and third rows). This is because the frequency content of the signal X(t) is given by the sum of frequency content over all active avalanches-this takes the form of a power-law. Formally, frequency content sums linearly over a superposition of avalanches, which means that whether the individual avalanches are visible or not is immaterial to whether the power-spectrum is of power-law form. However since the number of avalanches active gradually accumulates, the narrowband time-series do not display longrange correlations or correlations across frequencies (right, fourth and fifth rows). Informally, the small exponent α causes avalanches to "stack up" cumulatively. This process causes the profiles of the individual avalanches to become invisible, meaning that scale-free fluctuations disappear from the f ω i (X 0 (t)), although X 0 (t) still displays scale-free fluctuations.
See Fig 3 for an illustration of the difference between values of α in the case of the CD theory. As α decreases (from bottom to top), individual avalanches become visible in the continuous data, but for values α < 2, avalanches are no longer individually discernible. This continuous transition may be explained in terms of moving gradually between the three cases above: the PF theory, the CD theory with α = 2.5 and the CD theory with α = 1.5. For large values of α, the qualitative properties X(t) for the CD theory and the PF theory are identical; large avalanches are improbable. (lower-part of Fig 3, labelled MU4). Here the data has a homogeneous character, with individual bursts of activity not clearly visible. As α decreases, large avalanches become more probable, and these become discernible in the continuous time-series (middle part of Fig 3, labelled MU2-3). However, as α continues to decrease, the relative probability of long avalanches become so great that avalanches overlap to an extent that they are not individually discernible (top part of Fig 3).

Theoretical results
In this section we formalize the observations made in the previous section; the formal statements are proved in Section: Materials and Methods: Theory. As a starting point we use the facts that avalanches occuring in local neural networks are separated by periods of inactivity, and macroscopic brain signals, such as EEG/ MEG, measure the linear superposition X(t) of activity of numerous local neural networks. That avalanches of local neural networks do not overlap in time on the level of small spatially local subnetworks follows from the finding that neural firing is sparse [21,22]; note that this allows that multiple avalanches are active in the entire network (composed of all local subnetworks) at any given time; i.e. in intra-cortical experiments using multi-electrodes ranging over a couple of mm., bursts of activity are separated in time by quiescence, although when looking at the whole brain using, e.g. EEG, we see a continuous signal, which is composed of the superposition of all such spatially local avalanches. Linear superposition is well established in electrophysiology [19] and follows from the principle of superposition of electrodynamics [24] (This linear superposition was also proposed as the mechanism whereby flicker or 1/f noise arises from power-law dynamics in the seminal paper of [1]; although such a linear superposition may not hold for all applications it is well established in neuroelectrophysiology.) This means that the macroscopic network activity X(t) is continuous, since before the termination of any individual avalanche, several other avalanches have been ignited in other spatial areas of the network. Thus: To simplify matters in the following, we assume that q s is constant q s = q. All results may be extended if the q s are considered as independent samples of a single random variable. So far Eq (1) says nothing which distinguishes PF and CD. The distinction between PF and CD lies in how h s,i and L s,i are claimed to be distributed. According to the PF hypothesis their distributions decay faster than power-laws (see left, top row of Fig 2). For simplicity, therefore, we summarize the PF as claiming exponential distributions: power-law because they reflect a filtered version of X(t): F(u) is a linear filter due to the extracellular media which yields a signal with power-law power spectrum from white noise input [17,18]. Important to note is that the filter F leaves the amplitude dynamics of narrowbands of X(t) unaffected. This is because F is a linear filter, so if c ω is the frequency response of F at the frequency ω, and f ω (X(t)) is the narrowband component of X(t) at frequency ω then: Thus the PF theory makes no claim as to the dynamics of the narrowbands, only their relative weightings in the observed signal.
On the other hand, the CD hypothesis states that macroscopic signals X 0 (t) reflect X(t) directly so that X 0 (t) = bX(t), for some scalar b, and that we have power law densities for h s,i and L s,i cut off at a value L c proportional to the system size: Notice that the power-law in h is described in terms of L. This is because CD asserts dependency between the distribution of the two quantities [3]. This dependence implies that the marginal distribution over h is a power-law in h. These power-law densities explain the powerlaw form of neuroelectrophysiological power-spectra [25]. In addition the CD hypothesis states that each a s,i (t) is an independent and identical sample from a single stochastic process, which we call a(t) [26].
As well as considering the raw measured signal X'(t) we also consider the amplitude of a narrowband linear-filtered component of X 0 (t), which we denote g ω (X(t)) (this corresponds to the amplitude envelope of oscillatory activity in the bottom row of Fig 2). See Section: Materials and Methods: Amplitudes Estimated with the Hilbert Transform for details.
An illustration of g ω (X 0 (t)) is given in the lower-panel Fig 4; f ω (X 0 (t)) may be considered as a narrowband oscillatory signal and g ω (X 0 (t)) as the amplitude envelope of these oscillations. g ω (X 0 (t)) has been studied in many applications in neuroscience, including with regards to long range variability [27,28].
Our theory depends on two measures of long-term variation evaluated on g ω (X(t)): the Hurst exponent and the Detrended Cross Correlation Coefficient. The Hurst exponent H, of a stationary process Y(t) may be defined by the large-lag asymptotic scaling of the auto covariance function [29]. For H 2 (0.5, 1), Y(t) is said to be long-range temporally correlated (LRTC) whenever [29]: Informally, the larger H, the more Y(t) fluctuates over long time-scales. For auto covariances decaying faster than s −1 , one defines H = 1/2 and Y(t) is not LRTC [29]; this means that over long time-scales, Y(t) may be treated as uncorrelated. From now on, we distinguish the Hurst exponents of X 0 (t) and g ω (X 0 (t)) as H raw and H o amp . Thus H raw is a measure of how much X 0 (t) fluctuates over long-time scales, whereas H o amp is a measure of how much the amplitude envelopes of narrowbands of X 0 (t) fluctuate over long-time scales; note that there is no a priori reason that these fluctuations should be related-this is because a long-range correlated timeseries (with high H) can be generated by filtering white noise whose narrow subbands have amplitudes with negligible autocorrelations) [30] but equally may be generated as the superposition of narrow subbands with long-range autocorrelations in their ampltiude envelopes.
Typically in neuroscientific applications Hurst exponents are measured with Detrended Fluctuation Analysis (DFA) [31]. Applying DFA allows us to quantify the fluctuations displayed in the bottom three panels of The detrended cross correlation coefficient [32]ρ DCCA (n, Y 1 , Y 2 ) is a measure of correlation between two time-series Y 1 (t) and Y 2 (t) at a time-scale n, which is invariant to non-stationary trends of a fixed polynomial degree. Informally ρ DCCA (n, Y 1 , Y 2 ) measures to what degree fluctuations on the time-scale n co-occur between Y 1 and Y 2 and measures correlation but by ignoring non-stationary trends. When context leaves no room for ambiguity we abbreviate ρ DCCA (n, Y 1 , Y 2 ) to ρ DCCA (n). Applying ρ DCCA (n) to pairs of amplitudes of narrowband activity g ω i (X 0 (t)), g ω j (X 0 (t)) allows us to quantify the correlations between the fluctuations in the amplitude envelopes displayed in the bottom two panels of Fig 2. Given the measures of long-range variablity H o amp and ρ DCCA (n) we are now ready to describe our theoretical results, which formalize the discussion of Section: Results: Overview of Theory.
Our findings are summarized in Table 1 and Fig 5. Most importantly our theory makes rigorously quantified differential predictions between the PF and the CD theories. The differentiation between these models, and between groups of universality classes, which we term metauniversality classes, is made on the basis of the values of H o amp and lim n!1 ρ DCCA (n). For the PF theory we find that H amp = 0.5 and lim n!1 ρ DCCA (n) = 0. Thus according to PF, asymptotically, g ω (X 0 (t)) is not auto-correlated and g ω 1 (X 0 (t)) and g ω 2 (X 0 (t)) are uncorrelated over large time scales for distinct frequencies ω 1 For the CD theory we find that there are four meta-universality classes. In the first metauniversality class (MU1) which is defined by α 2 we find that H amp = 0.5 and lim n!1 ρ DCCA (n) = 0. On the other hand in the second meta-universality class (MU2), defined by α > 2 and β > α − 3, we find that H o amp ¼ b=2 À a=2 þ 2 and lim n!1 ρ DCCA (n) = 1. In the third meta-universality class (MU3), defined by α > 2 and aÀ 3 2 < b a À 3, we find that H amp = 0.5 and lim n!1 ρ DCCA (n) ! 1 as n ! 1. In the fourth meta-universality class (MU4), defined by α > 2 and b aÀ 3 2 , we find that H amp = 0.5 and lim n!1 ρ DCCA (n) = 0. Thus according to CD, for large values of α and small β we predict qualitatively identical long-range properties of g ω (X 0 (t)) as predicted by PF, but for intermediate values of α and large β we see non-trivial long-range auto-correlations and cross-correlations of g ω i (X 0 (t)), g ω j (X 0 (t)). For small values of α CD again predicts qualitatively identical long-range properties of g ω (X 0 (t)) as predicted by PF.
These results allow us to distinguish between the PF theory and CD theory on the basis of H amp and ρ DCCA (n) in the classes MU2 and MU3.

Simulations
For the CD model, we model the activity of local networks by: b(t) is the average avalanche shape, c(t) the shape of the variance profile and (t) a coloured noise with the spectrum known for a critical system P½o $ o À bÀ 1 [3].
We check the predictions of the PF model, by modelling the observed activity X 0 (t) as filtered uncorrelated white noise, to yield a process with a power-law power-spectrum; this is because with exponentially decaying avalanche height and size distributions, as claimed by the   inaccuracies generated by the finite sample size, H o amp ¼ 0:5 and ρ DCCA (n) ! 0 as n ! 1. Top two rows: we generate two processes assuming CD according to Eq (1) generating avalanches as described above. We set q = 5, T = 40000, β = 1 and α = 2.5, 1.5 (top, middle resp.). We confirm that for α = 2.5, H amp > 0.5 and ρ DCCA (n) ! 1 as n ! 1. On the other hand, for α = 1.5 we confirm that to within inaccuracies generated by the finite sample size, H amp = 0.5 and ρ DCCA (n) ! 0 as n ! 1. These quantitative differences in g ω (X 0 (t)) between the three conditions is reflected in the qualitative nature of the data. Whereas for the PF model and CD model with α = 1.5, the data take the form of a homogeneous random walk, for the CD model with α = 2.5, the structure of the time-series is more heterogeneous, with large avalanches visible among the many avalanches triggered.
PF model. The aim of this simulation is to verify that the behaviour confirmed in the preceding example for the PF model is reproducible, satisfying H o amp ¼ 0:5 and lim n! ρ DCCA (n) = 0. We simulate X 0 (t) from the PF model with T = 15000, 100 times. In each case the data are then filtered forward and backwards in two separate frequency bands (between 0.39 and 0.41 of the sampling frequency and 0.29 and 0.31 of the sampling frequency) with Butterworth filters of order m. We then measure DCCA correlations between the Hilbert transforms g ω (X 0 (t)) of these signals and measure their Hurst exponents H o amp with DFA, in both cases using window lengths between 10 3 and 10 4 . We repeat this setup for γ = 0.8, 1.8 and m = 2, 4.
The results are displayed in Fig 7 and show that, although there are small sample effects, on average we obtain H amp = 0.5 and zero cross correlations ρ DCCA (n) = 0 between frequency bands.
Predictions for all meta-universality classes. The aim of this simulation is to confirm the quantitative theoretical predictions for CD displayed in Table 1 across universality classes. For Long-range temporal correlations in neural narrowband time-series arise due to critical dynamics each pair of exponents in the ranges α = 1.5, 2.0, . . ., 6.5 and β = 0.25, 0.5, . . ., 3, we generate a sample path X 0 (t) of length T = 300,000, with a cutoff at L c = 100,000, and number of superpositions q = 5. The first 100,000 time points are discarded, to ensure stationarity. We then design Butterworth filters of order 2 between 0.29 and 0.31 (ω 1 = 0.3) and between 0.39 and 0.41 (ω 2 = 0.3) of the sampling frequency. The data from X 0 (t) are then filtered forwards and backwards and the amplitude envelopes are calculated g ω 1 (X 0 (t)) and g ω 2 (X 0 (t)) (yielding effective filter order 4). We measure the Hurst exponent of g ω 1 (X 0 (t)), using DFA, and the DCCA correlation coefficients between g ω 1 (X 0 (t)) and g ω 2 (X 0 (t)) setting n to log spaced values between 100 and 200000. This setup is repeated 100 times, and the results of the simulations are averaged. The results of this simulation for all critical parameters are displayed in Fig 8. We find good agreement between the theory for each meta-universality classes and the simulation results.
Further simulations. "Further Simulations" of S1 Appendix provides details of two further sets of simulations. The first verifies the accuracy of the theoretical exponent relations described in Table tab:theory ("Exponent Relations" in S1 Appendix). The second investigates the effect of signal-to-noise ratio on accuracy of the theoretical relations derived for data contaminated by noise ("The Effect of Signal-to-Noise Ratio" in S1 Appendix).

Analysis of EEG recordings of the human brain
Finally we tested the PF and CD hypotheses by estimating H o amp , H raw and ρ DCCA (n) on the experimental EEG data of seven human subjects. Section: Materials and Methods: Experiment describes the Experiment and our data preprocessing protocols in detail. We measured the gradient of ρ DCCA (n) against log(n) for the largest 11 timescales considered. We found that in 82% of cases, the gradient is positive, indicating that if the  range of n considered were extended in the presence of more data, the ρ DCCA (n) values would continue to approach 1. We also found that a large proportion of ρ DCCA (n) values are larger than 0.9. These results provide strong evidence in support of the theoretical prediction for CD, lim n!1 ρ DCCA (n) = 1. We found moreover that the ρ DCCA values at the highest scale and H o amp measured from the same neural data were highly correlated (p ( 0.0001) and H o amp values in distinct frequency ranges were highly correlated (p ( 0.0001) (likewise for ρ DCCA (n)). We also found that the ρ DCCA (n) and H o amp values were not significantly correlated with H raw (p > 0.05). Finally we measured the correlation between the average ρ DCCA (n) value over frequencies for a component versus a surrogate measure for the signal noise to ratio, namely the ratio of power in the alpha range 8-13Hz to power in a wider frequency range 5-16Hz. This measure determines to what extent the alpha rhythm stands above the 1/f γ shape of the power-spectrum (see Fig F in S1 Appendix). We found that the average ρ DCCA (n) value was positively correlated with the signal-to-noise measure (Spearman correlation: r = 0.32, p = 0.0026).
These results favour the CD hypothesis over the PF hypothesis. This is because, according to the PF hypothesis, H o amp ¼ 0 and lim n!1 ρ DCCA (n) = 0. On the other hand, in MU2, H amp > 0.5 and lim n!1 ρ DCCA (n) = 1. and in MU3, lim n!1 ρ DCCA (n) ! 1 for low frequencies. It might be argued that these results are explained by the PF hypothesis, since ρ DCCA (n) ! 0 is only true asymptotically, and for finite n we could have ρ DCCA (n)>0. However, this is not a sufficient explanation, since we also find empirically that H o amp > 1=2 as is only possible in MU2, with many values far larger than 1/2; in addition, we find that the gradient of the graphs of ρ DCCA (n) against log(n) are positive in over 80% of cases and in many cases the values of ρ DCCA (n) are indeed close to 1. Thus the data are better explained by CD in terms of the finiteness of n, with the gradient indicative of the limit ρ DCCA (n) ! 1 rather than ρ DCCA (n) ! 0 (see the middle column of Fig 6 for an illustration). Moreover, in MU3, we do not expect that ρ DCCA (n) is close to one, but rather greater than 0, in the finite sample (see Fig 8); the theory asserts that this is only true for the low-frequency, large scale limit.
Another respect in which these data confirm the CD hypothesis is the fact that the theory predicts the same exponent for each frequency band Further data analysis. In "Further Data Analysis" in S1 Appendix we investigate the impact of the spatial filters chosen, confirming that the choice made is not critical, and describe computations of power-spectra for each of the subjects, providing additional insight into the dataset, and provide full DFA plots for one of the subjects analysed, demonstrating the scale-free nature of the data analysed.

Conclusion
Our findings provide evidence which support the hypothesis of criticality in the human brain rather than the shaping of a 1/f γ spectrum by passive filtering; passive filtering of neural signals may generate a power-law spectrum but will not induce LRTC of narrowband amplitudes or DCCA correlations between narrowband amplitudes.
Such conclusions are made possible by our theory of meta-universality classes. Depending on the values of critical exponents, we predict qualitatively differing behaviours, which fall into four classes. These behaviours are manifest in the amplitude envelopes of narrowband processes of the measured data.
In contrast to the methods applied in previous studies on criticality in neuroscience, our framework does not depend on being able to detect single avalanches. Indeed our theory shows that if the network falls into the first meta-universality class (MU1), then single avalanches cannot be discernible in principle. Thus, if no avalanche dynamics are detected from macroscopic recordings, then this does not mean the system is not critical. Thus it is a fallacious argument to claim that no avalanches are discernible from macroscopic recordings means there must be another explanation for 1/f power-spectra. This fact may explain the failure to detect avalanche dynamics in the experiments of [10,17].
Several papers have attempted to circumvent the problems due to the lack of separation of time-scales by thresholding time-series obtained with macroscopic neural recordings. On the basis of our analysis we advise here that caution should be exercised when using such thresholding approaches. If the network is critical close to the transition from MU2 to MU1 then the estimates of the critical exponents α and β obtained in this way are potentially biased since individual avalanches are not discernible. In particular if α < 2 then such analysis will yield an estimate that no critical dynamics are present at all although they are present but not identifiable. Moreover continuity considerations may imply that for α close to 2 estimation could be inaccurate.
Given these difficulties relating to continuous neural data what do existing analyses which use thresholding tell us? Several papers have confirmed predictions of the CD hypothesis over short time-scales and for the extremal activity of macroscopic neuronal time-series such as EEG/ MEG [13][14][15]. Moreover exponents correlate with Hurst exponents of e.g. alpha range amplitude envelopes [15]. Our findings imply that close to the boundary MU1/2, avalanches are impossible to detect individually; however close to the border between MU2/3 avalanches should nevertheless be detectable at the extremes of neuronal time-series, for sufficiently large β, assuming artifactual activity leading to shifts in voltage is not present. However, since avalanches in EEG/MEG are estimated from broad-band data, it is still unclear what effect passive filtering will have on the derived exponent values-this awaits further work, thus leading us to question the size exponent τ = −3/2 reported in several studies. In addition, even if this value is accurate, the theory which claims τ = −3/2 is only valid for a critical branching process or mean-field network [33]; it is not entirely clear whether the brain should be modelled as such. Although Hurst exponents in the alpha-range are seen to correlate with estimated avalanche exponents [15], the mechanism responsible for this correlation is unclear, as these oscillations occur at a fixed frequency and criticality is a scale free phenomena. Finally, our work shows that if the thresholding technique fails to detect avalanche dynamics as reported by [17], this does not mean that criticality is not present-our results present a potentially more illuminating test capable of testing scaling over several orders of magnitude on a sound theoretical basis, although in MU1 our methods will also fail to detect criticality if present.
Another interesting question is, can we in principle derive the exact values of the exponents α and β from the data, without measuring avalanche dynamics but by considering only the Hurst exponents H o amp and H raw ? The answer is, in principle affirmative in MU2, but not otherwise. In MU2 we have bijective relations between H raw ; H o amp and α, β, whereas in the other meta-universality classes we have H o amp ¼ 0:5. Thus in MU2 in principle we can estimate α and β by measuring H raw and H amp and inverting the theoretical relations to α and β which we derived (see Table 1). However, in practice this is problematic, since signal to noise ratio distorts the empirically measured exponents towards 0.5 [34]. This does not affect the qualitative conclusion, that if H o amp > 0:5 then we must have a network in MU2, but it skews the ability to exactly estimate α and β.
Nonetheless, using our framework and despite the problems of signal-to-noise ratio, we are able differentiate between the meta-universality classes, thus providing a range of confidence on the possible universality class-CD predicts that only MU2 has H o amp > 0:5. On this basis we find empirically at least some brain regions must be in MU2. On the other hand at least some values of ρ DCCA (n) and H o amp measured are not consistent with MU2. [5] find a value of α = 1.7±0.2, which means the network lies in MU1-in combination with our own results, this provides evidence that a range of critical exponents may be relevant to understanding brain function.

Amplitudes estimated with the Hilbert transform
Let f ω (Á) be a linear narrowband filter with pass band [ω − Δ, ω + Δ], and HðÁÞ the Hilbert transform then: When we want to make the pass band explicit we write f ω,Δ (X(t)) and g ω,Δ (X(t)). When we estimate g ω (X(t)) in software, we use the MATLAB code abs(hilbert(Á)).

Detrended fluctuation analysis
Detrended Fluctuation Analysis (DFA) [31] is a methodology for the estimation of the Hurst exponent H of a (possibly non-stationary) time-series. Its advantage over covariance analysis or analysis of the power-spectrum are its robustness to trends contaminating the empirical time-series and its desirable convergence properties [35]. The steps involved in DFA are as follows. First one forms the aggregate sum of the empirical time-series X(t): (From now on whenever we use lower case for a time-series, e.g. x(t), we mean the timeseries obtained from the corresponding upper case time-series, e.g. X(t), by way of this operation.) Analysis of the fluctuations in X(t) may then be performed by measuring the variance of x(t) in windows of varying size n after detrending, i.e., x(t) is split into windows of length n, x ð1Þ n ; Á Á Á ; x ðjÞ n ; Á Á Á ; x ðbN=ncÞ n and the average variance after detrending the data of x(t) in these windows is formed; i.e. let P d be the operator which performs least squares detrending of polynomial degree d, then the DFA coefficients or detrended variances of degree d are: 1 bN=nc Á n X j ðx ðjÞ n À P d ðx ðjÞ n ÞÞ > ðx ðjÞ n À P d ðx ðjÞ n ÞÞ ð13Þ Crucially, in the limit of data the slope of logðF 2 DFA ðnÞÞ against log(n) converges to H. Thus X(t) is LRTC if and only if the estimate of H,Ĥ, converges to a number greater than 0.5 in the limit of data. We note here that there are numerous methods for the estimation of the Hurst exponent; these include wavelet estimators [35,36], log-periodogram based methods [37], among others [38]. We use DFA since it is standard in the physics and neuroimaging literature, and yields competitive estimates [39]. See Fig 10 for an illustration of DFA. [40] contains a useful review and explanation of DFA.

Detrended cross-correlation analysis
Podobnik and Stanley propose Detrended Cross-Correlation Analysis (DCCA) [41], an extension of DFA to two time-series, by considering the detrended covariance: DCCA quantifies the behaviour of the covariance between X 1 and X 2 over a range of timescales given by n and generalizes DFA in the sense that if X 1 = X 2 then F 2 DCCA ðnÞ ¼ F 2 DFA ðnÞ. In analogy to the Pearson correlation coefficient, the detrended cross correlation coeffcient is [32]: i ρ DCCA quantifies the correlation between X 1 and X 2 over a range of time-scales. Note that while estimation of ρ DCCA is technically more complex than for Pearson correlation, v both Long-range temporal correlations in neural narrowband time-series arise due to critical dynamics coefficients estimate the same quantity for stationary time-series, not contaminated by trends [42]. Thus ρ DCCA generalizes the Pearson correlation coefficient. Applicability to non-stationary time-series is particularly important for the neural data analysis. Whenever the context allows for no ambiguity we abbreviate ρ DCCA (n, X 1 , X 2 ) to ρ DCCA (n). An illustration of DCCA is provided in Fig 11.

Details for Fig 3
We describe here how to reproduce Fig 3. We sample 5 × 6000 = 30000 instances of a(t) with unit length and unit expected height, with power-spectrum P½o $ o À bÀ 1 . Subsequently for α = 2, 2.125, 2.25, . . ., 5.5, at each time point q = 5 avalanches are initiated according to the model; however, for each universality class considered, the a i,s (t − s) are identical, taken from the 30000 pre-generated a(t), differing only in their length and height.

Theory
Predictions for the PF theory. The results in this section are derived here for the first time unless otherwise stated/ cited.
Our first result is that for the X 0 (t) of the PF hypothesis H o amp ¼ 0:5 and, when ω 1 6 ¼ ω 2 , ρ DCCA (n, g ω 1 (X 0 (t)), g(ω 2 )(X 0 (t))) ! 0 as n ! 1. This implies that g ω (X 0 (t)) and g ω (X(t 0 )) are uncorrelated for large t − t 0 and that for distinct ω 1 6 ¼ ω 2 the narrowband amplitudes g ω 1 (X 0 (t)) and g ω 2 (X 0 (t)) are unrelated (left, bottom two rows of Fig 2). This can be seen by splitting Eq (1) into avalanches which last longer than some value L 0 and avalanches which have duration shorter than or equal to L 0 : XðtÞ ¼ Therefore: Since time-points in this expression spaced more than L 0 points apart are independent we also have that time-points of g ω (X(t)) spaced more than L 0 points apart are independent, so that amplitude envelopes are asymptotically not autocorrelated, H o amp ¼ 1=2. Moreover ρ DCCA (n, g ω 1 (X(t)), g ω 2 (X(t))) ! 0 as T ! 1 since the numerator of the DCCA correlation coefficient is given by the DCCA coefficients F 2 X 1 ;X 2 ðnÞ and the denominator by ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi F 2 X 1 ;X 1 ðnÞF 2 X 2 ;X 2 ðnÞ q . The numerator behaves like the coefficients of white noise for large window sizes [43] and therefore tends to 0 in the limit. The terms in the denominator tend to a non-zero limit [44], so the entire quotient tends to 0. Since the effect of the filter of Eq (4) is simply to reweight f ω (X(t)) [23], then the result carries over to g ω (X 0 (t)). Predictions for the CD theory. We assume that all power-law distributions are cut off at a lifetime L c which is proportional to the size of the network considered and, for simplicity, that a fixed number q s = q of avalanches begin at each time-point.
(MU1) α < 2 For α < 2 the number of avalanches active at time t is: The first line follows by definition and the second line is an asymptotic approximation of the first line for large L c . Hence the number of avalanches active, a s;i tÀ s L s;i unbounded in the system size. I.e., for times s 1 , . . ., s k : where k $ L 2À a c . Applying the Lyaponov Central Limit Theorem condition [45], this implies that X 0 (t) converges to Gaussian for large L c with power-law autocorrelation (see "Central Limit Theorem" in S1 Appendix). Since Gaussian processes are uniquely defined by their second order properties and because a scale-free Gaussian process may be generated by filtering white noise [46], then in analogy to the results for the PF hypothesis H o amp ¼ 1=2 and Eðr DCCA ðn; g o 1 ðX 0 ðtÞÞ; g o 2 ðX 0 ðtÞÞÞÞ ! 0 as n ! 1.
To derive the Hurst exponent H raw of X 0 (t), we follow [25] and approximate avalanches by a box function: The authors show that in this case the autocorrelation function rðtÞ ¼ EðXðsÞXðs þ tÞÞ À EðXðsÞÞ 2 satisfies: Using the fact that if the autocorrelation function scales as r(t)*t −γ , then the Hurst exponent and γ are related as γ = 2 − 2H [47], gives: (MU2) α > 2 and α < β + 3 For α > 2, the probability that an avalanche is active with duration greater than L 0 is: The first line follows by definition and the second line is an asymptotic approximation to the first line for large L c . Thus the probability that long avalanches occur simultaneously is negligible; this implies that large avalanches "protrude" from X 0 (t) as illustrated in Fig 3. Moreover, since all quantities take the form of a power-law at criticality [3], we have, for an exponent β 0 , a scaling relation for large L: Here f ω (a)(t/L) is understood as position t/L of the normalized process a(t) filtered in the narrowband around ω. We now derive an expression for β 0 in terms of the critical exponent β. At criticality is has been shown that we require P½o $ o À bÀ 1 [3]. This implies that the standard deviation of f Lω, LΔ (a(t)) scales according to L −β/2 . This is because by definition of the power-spectrum [23]: And therefore: Therefore, for fixed ω: and: Therefore by linearity of the Hilbert transform: Thus: An illustration of the relationship between β and β 0 is given in Fig 12; if the height of avalanches scales with their length (left panel), then the scaling of the height of their narrowband amplitude envelopes is less sharp (right panel). We validate this relation in "Further Simulations: Exponent Relations" in S1 Appendix.
We now require an additional known relation between critical exponents. Let S be the size of an avalanche, i.e. the total activity occurring over the course of the avalanche: Then, it is known that at criticality [3]: And [2]: The "filtered" avalanches g o ðL b s;i aðt=L s;i ÞÞ also obey power-laws, with different exponents, α 0 , β 0 , τ 0 . We showed above that β 0 = β/2; in addition, α 0 = α since g o ðL b s;i aðt=L s;i ÞÞ and L b s;I aðt=L s;i Þ have the same length.
By Eq (44): The size exponents τ and τ 0 are important for our subsequent analyses because whether the asymptotic properties of the process X 0 (t) are dominated by the large avalanches depends on whether their size distributions have divergent variance or not. See Fig A in S1 Appendix for an illustration of these "filtered avalanches". X 0 (t) may be divided into a sum of long avalanches L s,i > L 0 which do not overlap and short avalanches.
Therefore, since f ω (L β )*L β/2 f ω (a)(t/L): The left hand term dominates whenever its variance is unbounded for large L c . This happens if τ 0 < 3 (since then the variance of p(S) diverges) which translates to α < β + 3 by Eq (45). Therefore the right hand term may be neglected and since the avalanches of the left hand term are separated in time, the envelope operator may be pulled under the sum so that: The same proof as for H raw when α < 2 may then be applied to deriving H o amp . The same proof applies for H raw as before.
Thus we find: And: Moreover, assuming that the integrals R 1 0 g o ðaðtÞÞdt exist, then the separation of large avalanches make it simple to derive that ρ DCCA (n, ω 1 , ω 2 ) ! 1 as n ! 1: for large n, each DCCA window is longer than the largest avalanche. For α < β + 3 the avalanches in the left hand term of Eq (47) may be assumed to be non-overlapping since we assume α > 2 (see derivation for MU1). Because τ 0 < 3, we may neglect small avalanches and assume that a window contains only one avalanche at t 0 with length L. Then: But R L 0 g o 1 ðaðt=LÞÞ $ R L 0 g o 2 ðaðt=LÞÞ up to a constant factor for large avalanches and assuming the integral converges. Thus the correlation tends to 1.
(MU3) α > 2 and α < 2β + 3 For frequencies with 1/ω ) L c and L L c : This is because relative to the time-scale of the filter, a(t/L) may be treated as a delta function, which weights all frequencies equally; y(ω)*c, where y(ω) is the power-spectrum of a(t/L) and c is a constant. Therefore we apply the same argument as for MU2 but with Eq (47) replaced by: For τ > 3, the left hand term dominates, since it has unbounded variance in L c , so we find that for ω 1 , ω 2 ! 0, ρ DCCA (n, ω 1 , ω 2 ) ! 1 as n ! 1. Moreover, applying the results for MU2 we have: (MU4) α > 2β + 3 and α > 2 Since these universality classes have the shortest tails in their critical distributions, we may apply identical methods as applied to the PF hypothesis which show that H o amp ¼ H raw ¼ 1=2 and ρ DCCA (n, ω 1 , ω 2 ) ! 0 as n ! 1 when ω 1 6 ¼ ω 2 .

Simulation details
In all simulations involving CD we model the average avalanche shape of Eq (9) as a quadratic function: b(t) = −4(t − 1/2) 2 + 1 and we set b(t) = c(t). For the noise component we use the implementation of [30]. For the power-law cutoff sampling, we perform a density transformation of the uniform distribution. (In MATLAB x = rand(1,T). Ã (1-L c )+L c ; x = 1./ (x.^(1./(α-1)))) For the PF model since the distribution of bursts of activity according to the PF decays quickly, we take X(t) as a Gaussian white noise process. X 0 (t) is then obtained by filtering using the method of [30] to yield a process with spectrum scaling according to 1/ω γ .

Experiment
Seven subjects participated in the study (1 female); the subjects had an average age of 25 years at the time of the study. Participants gave written informed consent for their participation. The experimental protocol was approved by the Institutional Review Board of the Charité Medical University, Berlin and conformed to the declaration of Helsinki. EEG recordings were obtained at rest with subjects seated comfortably in a chair with their eyes open. Recordings were made of three sessions, each 5 minutes long so that each data set comprises roughly 15 minutes of data. EEG data were recorded with 96 Ag/AgCl electrodes, using BrainAmp amplifiers and BrainVision Recorder software (Brain Products GmbH, Munich, Germany). The signals were recorded in the 0.016-250 Hz frequency range at a 1000Hz sampling frequency and subsequently subsampled to 200Hz.
Outlier channels were rejected after visual inspection for abrupt shifts in voltage and poor signal quality. The data were then re-referenced according to the common average. Spatial filters were computed on the data using Spatio-Spectral Decomposition (SSD) [48], in order to extract components with pronounced alpha oscillations. Spatial filters with poor signal quality or topographies were rejected. We then restricted our analysis to components displaying a peak in the alpha range; this step ensured a high signal quality with low levels of artifactual activity. The fact that the spatial filters yield clear oscillatory signals ensured that the neuronal processes in the adjacent frequency ranges similarly originated from cortical areas relating to neuronal rather than artifactual activity. For DFA and DCCA estimation we set n to log-spaced values between 1000 and 25000.
Important is that we analyse 3 frequency ranges without oscillations (no local maximum in power-spectrum); the aim was to restrict analysis to activity corresponding to the 1/f γ shape of the power-spectra. Given that the data were sampled at 200Hz, and that lower frequencies require far larger window sizes for analysis, we chose 3 frequencies above the beta range, taking care to exclude the 50Hz line noise.
Supporting information S1 Appendix. Supplementary theory, simulations and data analysis. Proof of the central limit theorem discussed in Section: Materials and Methods: Theory: Predictions for the CD Theory; further simulations validating the theoretical exponent relations displayed in Table 1 and the effect of signal-to-noise ratio; further data analysis investigating the effect of the choice of spatial filter, power spectra and DFA log-log plots. (PDF) S1 Software. CD model code. Fast implementation of the CD model of Eq (1). (GZ)