
Conceived and designed the experiments: JCS MTB. Performed the experiments: JCS. Analyzed the data: JCS MBW MTB. Contributed reagents/materials/analysis tools: JCS. Wrote the paper: JCS MBW MTB.

Dr. Bianchi receives funding from the Clinical Investigator Training Program: Harvard/MIT Health Sciences and Technology - Beth Israel Deaconess Medical Center, in collaboration with Pfizer, Inc. and Merck & Co. This funder had no role in the design of this study, acquisition or analysis of the data, or in any part of the analysis, interpretation or publication decisions. This funding source does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.

Despite the common experience that interrupted sleep has a negative impact on waking function, the features of human sleep-wake architecture that best distinguish sleep continuity versus fragmentation remain elusive. In this regard, there is growing interest in characterizing sleep architecture using models of the temporal dynamics of sleep-wake stage transitions. In humans and other mammals, the state transitions defining sleep and wake bout durations have been described with exponential and power law models, respectively. However, sleep-wake stage distributions are often complex, and distinguishing between exponential and power law processes is not always straightforward. Although mono-exponential distributions are distinct from power law distributions, multi-exponential distributions may in fact resemble power laws by appearing linear on a log-log plot.

To characterize the parameters that may allow these distributions to mimic one another, we systematically fitted multi-exponential-generated distributions with a power law model, and power law-generated distributions with multi-exponential models. We used the Kolmogorov-Smirnov method to investigate goodness of fit for the “incorrect” model over a range of parameters. The “zone of mimicry” of parameters that increased the risk of mistakenly accepting power law fitting resembled empiric time constants obtained in human sleep and wake bout distributions.

Recognizing this uncertainty in model distinction impacts interpretation of transition dynamics (self-organizing versus probabilistic), and the generation of predictive models for clinical classification of normal and pathological sleep architecture.

Although it has been understood for decades that sleep is composed of transitions among sub-stages of rapid eye movement (REM) and non-REM (NREM) sleep, whether the temporal dynamics of these transitions are important for restorative functions remains obscure. Interestingly, recent analysis demonstrates that standard metrics used to summarize sleep architecture in clinical studies (sleep efficiency, percentages of each stage) fail to identify differences in fragmentation caused by medically severe sleep apnea.

Improved quantification of sleep architecture holds promise for correlating sleep disruption with daytime symptoms and different pathological causes of fragmentation. Understanding sleep-wake transitions also has implications for modeling sleep architecture dynamics. Power laws and exponentials are apparent in many aspects of biology, from molecular to system levels, but may have distinct mechanistic implications. Power law distributions are thought to arise from multiple interacting components of a complex system, and observations of such systems often follow a similar profile across multiple measurement scales (“scale-free” patterns). Examples of scale-free patterns include stock market fluctuations (similarly jagged over minutes, days, or years), pulmonary branching patterns, activity level fluctuations, and heart rate variability

We therefore ask under what conditions one distribution is likely to be mistaken for the other in terms of fitting, and consider the effects of sample size and of the distribution parameters using simulation methods. We hypothesize that with relatively small sample sizes (such as those typical of 1–14 clinical sleep study nights), statistical testing is likely to yield an acceptable fit for a power law distribution even if the true underlying model is multi-exponential, and vice versa. We further hypothesize that certain combinations of exponential distributions will be particularly susceptible to mistaken acceptance of power law fitting.

We are primarily concerned with two distributions commonly used to describe the lengths of sleep and wake bouts. A power law function has the general form $f(x) = c\,x^{-\alpha}$, where c is a constant and α is the “scaling factor”, that is, the slope of the line seen on a log-log plot is −α. An exponential function has the general form $f(x) = A\,e^{-x/\tau}$, where A is a constant and τ is the decay constant.
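As a brief worked step (a sketch using the general forms $f(x) = c\,x^{-\alpha}$ and $f(x) = A\,e^{-x/\tau}$, where the amplitude label A is assumed here for illustration), taking logarithms shows why a power law appears linear on a log-log plot whereas a single exponential appears linear on a semi-log plot:

```latex
\begin{aligned}
f(x) = c\,x^{-\alpha} \;&\Longrightarrow\; \ln f(x) = \ln c - \alpha \ln x
  && \text{(linear in } \ln x \text{, slope } -\alpha\text{)} \\
f(x) = A\,e^{-x/\tau} \;&\Longrightarrow\; \ln f(x) = \ln A - \frac{x}{\tau}
  && \text{(linear in } x \text{, slope } -1/\tau\text{)}
\end{aligned}
```

A sum of several exponential terms does not reduce to either linear form, which leaves room for it to look approximately linear on log-log axes over a limited range of durations.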

We follow Clauset et al

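The cumulative distribution function (CDF) gives the probability that an observation is less than or equal to a given value. The KS test statistic, d_{o}, is the maximum vertical distance between the CDF of the observed data and the CDF of the fitted model.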

This type of analysis requires a sense of how far from zero d_{o} must be before the fitted model is rejected.

Algorithm summary: (Supplemental

1. Draw a test sample from a known distribution: either a power law random number generator, or an exponential random number generator with one, two, or three exponential components.

2. Fit the test sample to the other (incorrect) distribution and estimate the KS test statistic, d_{o}.

3. Generate 100 reference sample datasets from random number generators with distributions defined by the fitted functions used in step 2.

4. Re-fit each of these 100 reference sample datasets to the type of distribution from which they were drawn in step 3, and estimate the test statistic for simulated data, d_{s}, yielding a distribution of d_{s} values.

5. Compare d_{o} with the distribution of d_{s} values.

The output of each iteration is a
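The steps above can be sketched in Python for the case in which a power law is the fitted model. This is a minimal illustration under stated assumptions, not the authors' implementation: durations are treated as continuous values with a lower bound of 1 epoch, the exponent is fitted by maximum likelihood, and the fraction of d_s values at least as large as d_o is reported as a p-value (the convention described by Clauset et al.). All function names are hypothetical.

```python
import math
import random


def fit_power_law(sample, x_min=1.0):
    """Maximum-likelihood exponent for a continuous power law on x >= x_min."""
    return 1.0 + len(sample) / sum(math.log(x / x_min) for x in sample)


def power_law_cdf(x, alpha, x_min=1.0):
    """P(X <= x) for a continuous power law with exponent alpha > 1."""
    return 1.0 - (x / x_min) ** (1.0 - alpha)


def draw_power_law(n, alpha, rng, x_min=1.0):
    """Inverse-transform sampling; 1 - rng.random() lies in (0, 1]."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def ks_distance(sample, cdf):
    """Maximum vertical distance between the empirical CDF and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d


def ks_bootstrap_p(sample, rng, n_reference=100):
    """Steps 2-5: returns (d_o, fraction of reference d_s values >= d_o)."""
    alpha = fit_power_law(sample)                                      # step 2
    d_o = ks_distance(sample, lambda x: power_law_cdf(x, alpha))
    exceed = 0
    for _ in range(n_reference):
        reference = draw_power_law(len(sample), alpha, rng)            # step 3
        alpha_s = fit_power_law(reference)                             # step 4
        d_s = ks_distance(reference, lambda x: power_law_cdf(x, alpha_s))
        if d_s >= d_o:                                                 # step 5
            exceed += 1
    return d_o, exceed / n_reference


rng = random.Random(0)
test_sample = draw_power_law(200, 3.0, rng)  # here the fitted model is the true one
d_o, p_value = ks_bootstrap_p(test_sample, rng)
print(f"d_o = {d_o:.3f}, p = {p_value:.2f}")
```

A small p-value would argue for rejecting the fitted model; the cut-off is left to the analyst in this sketch. To probe mimicry, `test_sample` would instead be drawn from a multi-exponential generator with all values at or above 1 epoch.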

Test samples of data were randomly drawn from the sum of either two or three exponential distributions. Each test sample was fitted to a power law distribution, and the goodness of fit was evaluated by the KS method. We systematically varied the proportion of observations drawn from each exponential distribution and its decay parameter, τ (the average duration of an observation), in order to determine the goodness of fit of the power law function over a spectrum of parameter values. The number of observations included
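A minimal Python sketch of how such a test sample could be drawn. It assumes that each observation is assigned to one exponential component with the stated proportion, and that values shorter than 1 epoch are redrawn. The function name and arguments are hypothetical, and the redraw rule is an assumption of this sketch rather than a documented detail of the simulations.

```python
import random


def draw_multi_exponential(n, taus, proportions, rng, min_duration=1.0):
    """Draw n bout durations (in epochs) from a mixture of exponentials.

    Each draw picks component i with probability proportional to
    proportions[i], then samples an exponential with mean taus[i].
    Values below min_duration are discarded and redrawn.
    """
    if len(taus) != len(proportions):
        raise ValueError("taus and proportions must have the same length")
    sample = []
    while len(sample) < n:
        tau = rng.choices(taus, weights=proportions, k=1)[0]
        x = rng.expovariate(1.0 / tau)  # expovariate takes the rate, 1 / mean
        if x >= min_duration:
            sample.append(x)
    return sample


rng = random.Random(1)
# Two-exponential example: fast tau = 1 epoch, slow tau = 20 epochs, equal proportions.
bouts = draw_multi_exponential(320, [1.0, 20.0], [0.5, 0.5], rng)
print(len(bouts), min(bouts))
```

One consequence of redrawing short values is that the realized proportion of fast-component observations falls below the nominal proportion, because the fast component produces more values under 1 epoch.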

Note that the methods of Clauset et al

We generated sleep bouts as random values drawn from a power law distribution with a scaling exponent (α) of 3, each of which is referred to below as a “test sample”. We set α = 3 to ensure adequate dispersion of the duration distribution given our binning routine (1 epoch bin width, all fractions rounded down). We then collected these simulated bouts into frequency-duration histograms (see below), in preparation for three separate fitting routines: a single exponential function, the sum of two exponential functions, and the sum of three exponential functions of x. For example, the form of the three-exponential function is $f(x) = A_1 e^{-x/\tau_1} + A_2 e^{-x/\tau_2} + A_3 e^{-x/\tau_3}$, where A_{i} is the relative contribution of the i^{th} exponential term to the distribution, defined by the
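The generation and binning described here can be sketched as follows. This is an illustration only: it assumes a continuous power law with a lower bound of 1 epoch sampled by inverse transform, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def draw_power_law_bouts(n, alpha, rng, x_min=1.0):
    """Inverse-transform draws from a continuous power law on x >= x_min."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def frequency_duration_histogram(durations):
    """Bin durations with 1-epoch width, rounding all fractions down."""
    counts = Counter(math.floor(d) for d in durations)
    return dict(sorted(counts.items()))


def three_exponential(x, amplitudes, taus):
    """f(x) = A1*exp(-x/tau1) + A2*exp(-x/tau2) + A3*exp(-x/tau3)."""
    return sum(a * math.exp(-x / t) for a, t in zip(amplitudes, taus))


rng = random.Random(2)
bouts = draw_power_law_bouts(1000, 3.0, rng)
histogram = frequency_duration_histogram(bouts)
for duration, count in list(histogram.items())[:5]:
    print(duration, count)
```

The histogram (duration in epochs mapped to a count) is the object to which the one-, two-, and three-exponential functions would then be fitted; the fitting step itself is not shown.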

The shortest state defined by convention in human sleep studies is 30 seconds, or one “epoch”, which is the unit of time used in these simulations. In generating random data, we discarded values less than 1 epoch; in other words,

Fits that violated our biologically imposed constraints (positive A values and positive τ values) were discarded. We implemented the fitting routine in two different ways. We first considered 1000 consecutive sample datasets, and the probability of rejecting the exponential fits refers only to the subset of 1000 for which exponential fitting converged within our constraints. Fitting with the sum of three exponentials is more likely to include a component that violates our constraints. If there were a systematic relationship between non-convergence and the distribution of bout lengths in the sample, then the results for the three-exponential function would have more of these samples excluded, confounding comparison between the results for one, two and three exponentials. To address this potential bias, we also analyzed the first 1000 samples for which all three exponential functions converged according to our criteria.

The fitted probability mass functions were normalized in order to represent them as fitted probability density functions (PDFs). These PDFs were then used to generate random values distributed according to the fitted exponential functions with the R implementation of the Unuran universal number generator, from which d_{s} was computed.
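The normalization and sampling step can be illustrated with a short Python sketch. Here a plain inverse-CDF lookup from the standard library stands in for the Unuran generator; the truncation at `max_duration`, the example fitted function, and the function names are all assumptions made for illustration.

```python
import bisect
import itertools
import math
import random


def normalized_pmf(fitted, max_duration):
    """Evaluate a fitted function on 1..max_duration epochs and scale to sum to 1."""
    weights = [fitted(k) for k in range(1, max_duration + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_from_pmf(pmf, n, rng):
    """Draw n durations (in epochs, starting at 1) by inverse-CDF lookup."""
    cdf = list(itertools.accumulate(pmf))
    draws = []
    for _ in range(n):
        index = bisect.bisect_right(cdf, rng.random())
        # Guard against floating-point shortfall in the last cumulative value.
        draws.append(min(index, len(pmf) - 1) + 1)
    return draws


# Hypothetical fitted mono-exponential with tau = 5 epochs.
pmf = normalized_pmf(lambda k: math.exp(-k / 5.0), 60)
rng = random.Random(3)
reference_sample = sample_from_pmf(pmf, 160, rng)
print(reference_sample[:10])
```

Each reference sample produced this way would then be re-fitted and compared with its own fit to obtain one d_s value.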

Although several groups have reported mono-exponential fitting to observed bouts of sleep across species, we have recently demonstrated that the distribution of human sleep sub-stages (REM and NREM) is not captured by a mono-exponential model. Specifically, two (REM) or three (NREM) exponential terms were required to fit these distributions, suggesting multiple distinct stage transition time scales

The rules governing the timing of sleep-wake stage transitions remain unknown. Given the potential clinical importance of fragmentation (mainly attributed to brief transitions to wakefulness that interrupt sleep continuity), characterizing these patterns empirically from hypnogram data is worthwhile. Consider a simplified model of sleep architecture consisting of two states (sleep and wake) with fixed transition probabilities (a first-order Markov process). In this setting, the distribution of sleep (and wake) bout lengths is predicted to be mono-exponential.
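A minimal Python sketch of this two-state model, with hypothetical function and parameter names and arbitrary transition probabilities. With a fixed per-epoch probability of leaving sleep, bout lengths follow a geometric distribution, the discrete-time counterpart of a mono-exponential, with mean 1/p_wake epochs.

```python
import random
from collections import Counter


def simulate_sleep_bouts(n_bouts, p_wake, p_sleep, rng):
    """Two-state first-order Markov chain; returns sleep bout lengths in epochs.

    p_wake  : per-epoch probability of a sleep -> wake transition
    p_sleep : per-epoch probability of a wake -> sleep transition
    """
    bouts = []
    state = "S"
    length = 0
    while len(bouts) < n_bouts:
        if state == "S":
            length += 1
            if rng.random() < p_wake:      # S -> W ends the current sleep bout
                bouts.append(length)
                length = 0
                state = "W"
        elif rng.random() < p_sleep:       # W -> S starts a new sleep bout
            state = "S"
    return bouts


rng = random.Random(4)
bouts = simulate_sleep_bouts(4000, p_wake=0.2, p_sleep=0.5, rng=rng)
histogram = Counter(bouts)
for duration in range(1, 6):
    print(duration, histogram[duration])
```

Successive counts in the printed histogram should fall by a roughly constant factor of about 1 − p_wake, which is what produces a straight line on a semi-log plot.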

Simple first-order Markov transitions between two states (S and W) generate a one-exponential distribution of sleep states (when n = 400 or 4000). The data are plotted as a binned frequency histogram with arbitrary units of bout duration; inset shows the same data on a semi-log plot, in which mono-exponential distributions appear linear.

In contrast to the simple shape of a mono-exponential function, a collection of sleep bouts drawn from a multi-exponential distribution can appear linear on a log-log plot, a feature typically considered characteristic of a power law distribution. This is shown in the log-log frequency histogram of

This frequency histogram appears linear on the log-log plot, which is typical of a power law distribution. The distributions of the three distinct one-exponential generators used to create the distribution shown in panel A include a fast (blue), intermediate (green) and slow (red) exponential decay constant, in arbitrary units of duration. Panel C shows the overlap of panels A and B.

We conducted simulations to answer the following question: over what range of parameters might a two-exponential process be reasonably fit by a power law function? Thus, we generated sample data sets by varying the exponential decay constant (τ) and the relative proportion of two independent exponential generators. We also varied the total number of observations to determine the impact of sample size on model fitting; these values approximate a single night (40) or multiple nights (160, 320, 640) of stage transitions typical of standard human polysomnograms. In each simulation, the fast τ was fixed at 1 epoch, and the value of the second (slow) τ varied from 2 to 60 epochs. The proportion of observations drawn from the τ_{1} generator varied from 0 to 1. Therefore the degenerate cases of a single exponential process are shown at either extreme (when the proportion of τ_{1} events is either 0 or 1).

For _{2} when _{2} is between ∼10–30 epochs. When the number of samples is increased to 320 (

Panels A–D show the probability of failing to reject the incorrect power law model, for two-exponential data of increasing sample sizes (n = 40, 160, 320, 640); τ_{1} = 1 (the fast exponential) and τ_{2} is varied.

Frequency histogram examples of a single trial from the “green” zone of

We repeated power law fitting of the same simulations using ordinary least squares (OLS) regression. The R^{2} from an OLS analysis measures the amount of variation in the sample that is explained by a power law model. The R^{2} cannot therefore be used to accept or reject a hypothesis via threshold or cut-off values. High R^{2} values (red) indicate that the power law model explains most of the sample variation across the entire parameter landscape (the data mimic a power law across the parameter space), despite the data being drawn from a two-exponential distribution. Comparing these results to those from the KS test, it is clear that the R^{2} from the OLS approach is not an appropriate measure of goodness-of-fit under these conditions.
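The OLS approach can be illustrated with a short Python sketch that regresses log(count) on log(duration) for a binned two-exponential sample. The binning (1-epoch bins, rounded down), the natural logarithm, the redraw of values under 1 epoch, and the function names are assumptions of this sketch; the exact OLS procedure used in the simulations is not specified here.

```python
import math
import random
from collections import Counter


def loglog_ols(durations):
    """OLS fit of log(count) on log(duration); returns (slope, r_squared)."""
    counts = Counter(math.floor(d) for d in durations if d >= 1)
    points = [(math.log(k), math.log(c)) for k, c in counts.items()]
    n = len(points)
    if n < 3:
        raise ValueError("need at least three occupied bins")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if syy == 0:
        raise ValueError("all bins have the same count")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def draw_two_exponential(n, tau_fast, tau_slow, p_fast, rng):
    """Mixture of two exponentials; values under 1 epoch are redrawn."""
    sample = []
    while len(sample) < n:
        tau = tau_fast if rng.random() < p_fast else tau_slow
        x = rng.expovariate(1.0 / tau)
        if x >= 1.0:
            sample.append(x)
    return sample


rng = random.Random(0)
sample = draw_two_exponential(640, 1.0, 20.0, 0.5, rng)
slope, r_squared = loglog_ols(sample)
print(f"log-log slope = {slope:.2f}, R^2 = {r_squared:.2f}")
```

Because R^{2} only summarizes explained variation, the printed value carries no accept or reject decision by itself, in contrast to the reference-distribution comparison used in the KS method.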

The coefficient of determination (R^{2}) from the (incorrect) power law model is high for all parameter values when the OLS method is used.

We next evaluated the range of parameters within which a three-exponential process is well-fit by a power law model. In one set of simulations, we held the τ values for the three exponential functions constant (τ_{1} = 1, τ_{2} = 5, τ_{3} = 25), and systematically varied the proportion of observations drawn from each function. In each case, we evaluated the goodness of fit of a power law model with the KS method. The probability of failing to reject the power law model is shown for _{2} exponential function to the number of draws from the τ_{3} (slowest) exponential function; the _{1} (fastest) exponential function to the number of draws from the τ_{2} exponential function. In this manner, all combinations in the parameter space can be visualized. When _{1} was low, especially when τ_{2} was also low. As the number of samples increased, the initially broad range of failure to reject became narrower, with a peak occurring when τ_{1}:τ_{2} was ∼2–16, and τ_{2}:τ_{3} was ∼1–0.25. Like the two-exponential simulations, when

A–D show the probability of failing to reject the (incorrect) power law fit of three-exponential data with increasing sample size (

In a complementary set of simulations, we held the number of draws from each exponential function constant and in equal proportions (1∶1∶1). We varied instead the decay constants for the middle (τ_{2}) and slowest (τ_{3}) decaying exponential functions, while keeping τ_{1} fixed at 1. The probability of rejecting the power law model is shown when the total number of observations was _{2} was between ∼2–10 epochs, and was fairly insensitive to changes in the value of τ_{3}. When n = 159, the zone of failure to reject the power law was concentrated around τ_{2} values of ∼2–5, again fairly insensitive to the values of τ_{3}. For larger sample sizes, there was minimal chance of failing to reject the power law model.

A-D show the probability of failing to reject the (incorrect) power law fit of three-exponential data with increasing sample size (

We next performed simulations to answer the converse question: given a known power law distribution of data, what is the likelihood of incorrectly accepting exponential fitting? To accomplish this, we generated random draws from a power law distribution, and determined the probability of rejecting one-, two-, or three-exponential model fits to the data, using the KS method (see methods). We evaluated total sample sizes of

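| Model | Samples included | First sample size | Second sample size | Third sample size |
| one exponential | converged fits among 1000 consecutive samples (number converged) | 0.900 (448) | 0.127 (914) | 0.000 (1000) |
| one exponential | first 1000 samples for which all three fits converged | 0.882 | 0.141 | 0.000 |
| two exponentials | converged fits among 1000 consecutive samples (number converged) | 1.000 (405) | 0.980 (904) | 0.549 (999) |
| two exponentials | first 1000 samples for which all three fits converged | 1.000 | 0.972 | 0.542 |
| three exponentials | converged fits among 1000 consecutive samples (number converged) | 1.000 (377) | 1.000 (884) | 0.828 (996) |
| three exponentials | first 1000 samples for which all three fits converged | 1.000 | 0.999 | 0.826 |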

Surprisingly, even the mono-exponential model cannot be rejected in a substantial portion of trials if the power law distribution is under-sampled (

Measurements that characterize sleep architecture according to state transition dynamics may capture the elusive concept of sleep continuity (or fragmentation) better than routine clinical statistics such as stage percentages or sleep efficiency. Analysis of sleep-wake state transition probabilities in animal and human studies suggests that the temporal stability of certain stages is approximated by either an exponential or a power law model. Our results emphasize that the distinction between a power law and multi-exponential process is not always straightforward – visually or statistically. By simulating sleep bout lengths based on a variety of known distributions (exponential or power law), we determined goodness of fit by the KS method for the incorrect model: power law for a known exponential distribution, and exponential for a known power law distribution. The parameter landscape under which the incorrect model provided a good fit of the data (that is, “zones of mimicry”) corresponded closely to the τ values for multi-exponential fitting of wake and NREM sleep bouts observed in our analysis of Sleep Heart Health Study subjects

Several practical challenges exist regarding the quantification of sleep architecture dynamics. Our study does not directly address whether the power law or multi-exponential model is appropriate for any given experimental dataset (where the true distribution is not known). Importantly, our results show that the commonly used OLS fitting method

Brief transitions are subject to inter-rater variability in manual scoring and to “rounding” criteria in scoring guidelines. Since brief transitions contribute not only to the steep decay portion of the frequency-duration histograms, but also to the tail portions (by interrupting otherwise long bouts), these fitting methods may be particularly sensitive to accurate determination of brief transitions. Pooling clinically similar subjects may address the sample size challenge, but introduces uncertainty in terms of inter-individual heterogeneity and therefore in the observed statistical distribution of the data. Longitudinal home monitoring of sleep-wake stages is not currently available.

It has been suggested that sleep-wake architecture resembled the dynamics seen in some models of self-organized criticality: in avalanche models, the duration of the avalanche events followed a power law (and was thus likened to wake durations), while the time between avalanches followed an exponential distribution (and was thus likened to sleep durations)

The implication for sleep, and indeed perhaps any setting in which random processes may self-organize, is that asking whether a distribution is

Our current study focused on the question of model fitting, and raised cautionary insights about certain distributions mimicking one another from a fitting standpoint. The related question of model choice is also of interest, but not directly addressed by our analyses: given an empiric set of observations, which model is more likely to explain the data? This question is best undertaken with the guidance of (preferably strong)

Many other processes in the biological sciences have been analyzed in terms of power law distributions and their alterations in disease

In conclusion, we suggest that the two fundamental aims of sleep architecture analysis are 1) to provide a “top-down” approach to mirror the extensive “bottom-up” approaches to sleep-wake mechanisms and physiology, and 2) to provide improved clinical metrics of normal sleep and its disruption in disease states. Applying these methods to sleep-wake dynamics of animals with anatomical lesions or pharmacological manipulations of the critical pathways

Summary of K-S method for goodness-of-fit. A. Sample data set plotted as a frequency-duration histogram. B. Data from panel A re-plotted as a cumulative probability distribution, along with fitted curve (see methods). The maximum vertical distance between the data and the fitted curve is computed (d_{o}). C. A new data set (green) is generated by a random number generator defined by the fitted function. The maximum vertical distance between the new data set and the fitted curve is d_{s}. D. The process in panel C is repeated 100 times to generate a distribution of d_{s} values.


We thank Drs Elizabeth Klerman, Andrew Phillips, and Mark Kramer for valuable comments on an earlier draft of the manuscript. Thanks also to Aaron Clauset, Cosma R. Shalizi and Laurent Dubroca for posting methods for fitting power laws and generating power-law distributed random variates in R.