A Novel, Open Access Method to Assess Sleep Duration Using a Wrist-Worn Accelerometer

Wrist-worn accelerometers are increasingly being used for the assessment of physical activity in population studies, but little is known about their value for sleep assessment. We developed a novel method of assessing sleep duration using data from 4,094 Whitehall II Study (United Kingdom, 2012–2013) participants aged 60–83 who wore the accelerometer for 9 consecutive days, filled in a sleep log and reported sleep duration via questionnaire. Our sleep detection algorithm defined (nocturnal) sleep as a period of sustained inactivity, itself detected as the absence of change in arm angle greater than 5 degrees for 5 minutes or more, during a period recorded as sleep by the participant in their sleep log. The resulting estimate of sleep duration had a moderate (but similar to previous findings) agreement with questionnaire based measures for time in bed, defined as the difference between sleep onset and waking time (kappa = 0.32, 95%CI:0.29,0.34) and total sleep duration (kappa = 0.39, 0.36,0.42). This estimate was lower for time in bed for women, depressed participants, those reporting more insomnia symptoms, and on weekend days. No such group differences were found for total sleep duration. Our algorithm was validated against data from a polysomnography study on 28 persons which found a longer time window and lower angle threshold to have better sensitivity to wakefulness, while the reverse was true for sensitivity to sleep. The novelty of our method is the use of a generic algorithm that will allow comparison between studies rather than a “count” based, device specific method.


Introduction
Large-scale studies have traditionally assessed physical activity and sleep by self-report, but objective measurement tools have become increasingly common in the last decade. Studies of physical activity use hip- or waist-mounted accelerometers, which are not suitable for sleep assessment as they are removed prior to bedtime [1]. In sleep studies, parameters are primarily assessed over several days using a 24 hour wrist-mounted accelerometer, commonly known as actigraphy [2]. In both areas of research, body acceleration is expressed in manufacturer specific 'count' values over a specific time window, called an epoch. This approach has limitations as estimates from different accelerometers are not comparable and the analyst has limited control over signal processing [3].
Technological advances in accelerometers over recent years now allow collection of high resolution data in universal units of gravitational acceleration. This type of data, also referred to as raw accelerometry, increases analytical freedom and is more amenable to methodological consistency between studies [3]. The wrist-worn version of such accelerometers has become popular, especially in large population studies [4][5][6][7], as compliance is equal to or better than that of waist-worn devices [8][9][10] and it allows assessment of both physical activity and sleep. A number of algorithms have been developed to derive physical activity variables from raw accelerometer data [6,11,12]. In contrast, relatively little has been done on the extraction of sleep parameters from these data [13,14].
The primary signal used to characterise sleep is lack of body movement, but this judgement is complicated by the fact that there are minor body movements during sleep [15,16] and absence of body movement is also possible during periods of wakefulness. Consequently, sleep characterised by accelerometry primarily represents a sustained lack of body movement when the participant reports being in bed or asleep and may not necessarily result in the same sleep classification as the one from polysomnography or self-reported sleep [15]. Polysomnography, the gold standard for assessment of sleep, is a multi-parametric test of biophysiological changes that occur during sleep [17] and requires a laboratory setting, making it expensive and infeasible for large scale studies.
Previous work on sleep detection from accelerometer data relied on algorithms that use magnitude of acceleration as their input [13,18,19]. We propose a novel method that uses accelerometer derived arm angle, an easier to interpret description of body kinematics, to detect sleep. We also evaluate the agreement of the sleep duration parameters derived from this method with self-reported sleep duration in a large sample of community dwelling older adults. Finally, we investigate the impact of method configuration and individual factors such as socioeconomic position, reports of sleep disturbances and depression on agreement between self-reported sleep duration and two accelerometer-assessed sleep parameters, time in bed and total sleep duration. In order to validate our sleep detection algorithm we compared it with data from a polysomnography study in a sample of 28 sleep clinic patients.

Study Population
Data are drawn from the Whitehall II Study, established in 1985/88, as previously described [20]. Accelerometer measurement was added to the study at the 2012/13 wave of data collection for participants seen at the central London clinic and for those living in the South-Eastern regions of England who underwent a clinical evaluation at home [6]. We used a second study on sleep clinic patients in order to validate our sleep detection algorithm against polysomnography. These data come from 28 adult patients who were scheduled for a one night polysomnography (PSG) assessment at the Freeman Hospital, Newcastle upon Tyne, UK, as part of their routine clinical assessment and were subsequently invited to participate in the study.

Ethics Statement
In both studies participants were provided with instructions and an information sheet about the study and were given time to ask questions prior to providing written informed consent.
The studies were approved by the University College London ethics committee and the NRES Committee North East Sunderland ethics committee, respectively.

Instrumentation
Participants in the Whitehall II Study had no contraindications (allergies to plastic or metal, travelling abroad the following week) and were asked to wear a tri-axial accelerometer (GENEActiv, Activinsights Ltd, Kimbolton, UK) on their non-dominant wrist for nine (24-h) consecutive days. They were asked to complete a simple sleep log every morning which consisted of two questions: 'what time did you first fall asleep last night?' and 'what time did you wake up today (eyes open, ready to get up)?' The accelerometer was configured to collect data at 85.70 Hz with a ±8g dynamic range. A more elaborate description of the accelerometer protocol can be found in a recent publication [6].
In the second study, polysomnography (Embletta®, Denver) was performed using a standard procedure, including video recording, a sleep electroencephalogram (leads C4-A1 and C3-A2), bilateral eye movements, submental EMG, and bilateral anterior tibialis EMG to record leg movements during sleep. Respiratory movements were detected with chest and abdominal bands measuring inductance, airflow was detected with nasal cannulae measuring pressure, and oxygen saturation of arterial blood was measured. Airflow limitation and changes in respiratory movement were used to detect increased upper-airway resistance. All respiratory events and sleep stages were scored according to standard criteria so that EEG determined total sleep time could be measured [15]. Participants were asked to wear the same brand of accelerometer as in the first study (GENEActiv, Activinsights Ltd, Kimbolton, UK) on their non-dominant wrist throughout the one night polysomnography assessment. Here, the accelerometer was also configured to record at 85.70 Hz.

Data Processing
Sleep detection algorithm. Wrist-worn accelerometer data allow estimation of the arm angle relative to the horizontal plane. By visualizing the arm angle, sleep is characterised as a period marked by a low frequency of changes in arm angle, see thin black line in Fig 1. The accelerometer also had a light sensor (Fig 1) but we decided not to use it for sleep detection as despite its appeal for visual interpretation it can be misleading when the light sensor is covered by a sleeve or bedding.
Previously published methods were used to minimize sensor calibration error [21] and to detect and impute accelerometer non-wear periods [12]. Arm angle was estimated as follows:

$$\text{angle} = \tan^{-1}\left(\frac{a_z}{\sqrt{a_x^2 + a_y^2}}\right) \cdot \frac{180}{\pi}$$

where a_x, a_y and a_z are the median values of the three orthogonally positioned raw acceleration sensors in g-units (1g = 1000 mg) derived based on a rolling five second time window. Here, the z-axis corresponds to the axis positioned perpendicular to the skin surface (dorsal-ventral direction when the wrist is in the anatomical position). Next, estimated arm angles were averaged per 5 second epoch, and used to assess change in arm angle between successive 5 second epochs. Periods of time during which there was no change larger than 5° over at least 5 minutes were classified as bouts of sustained inactivity, or potential sleep periods. To demonstrate how a different parameter selection impacts the results, all analyses were replicated with a 10 minute time window criterion. Please see the discussion section for further details on this method. In the Whitehall II Study summary variables were extracted per 24 hour time window to create a standardised time interval, in this case from noon to noon the following day. However, if the sleep log indicated that the participant was a day sleeper (potential night worker) with sleep onset before noon and a wake up time after noon then a time window from 6pm to 6pm was used. The first and ninth night (24h) were excluded from the analysis, leading to a maximum of seven nights per person in the analysis. Additionally, nights with less than 16 hours of detected wear time were excluded from the analysis. The sleep detection algorithm resulted in one or more sustained inactivity bouts detected per day and for each bout, the duration, the time of sleep onset, and the time of waking were recorded.
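The detection steps described above can be illustrated with a minimal Python sketch. It is not the GGIR implementation. It assumes that the arm angle is the angle of the z-axis relative to the horizontal plane, computed as the arctangent of a_z over the square root of (a_x^2 + a_y^2), and for brevity it takes the median over non-overlapping 5 second blocks rather than a rolling window followed by averaging. The function names and the simplified epoching are our own illustrative choices.

```python
import numpy as np

def epoch_arm_angle(ax, ay, az, fs, epoch_s=5):
    """Arm angle in degrees per epoch, from raw acceleration in g-units.

    Simplification (assumption): median per non-overlapping epoch instead of
    a rolling median followed by per-epoch averaging.
    """
    n = int(round(fs * epoch_s))            # samples per epoch
    k = len(ax) // n                        # number of complete epochs
    mx, my, mz = (
        np.median(np.asarray(a, dtype=float)[: k * n].reshape(k, n), axis=1)
        for a in (ax, ay, az)
    )
    # arctan2 with a non-negative denominator equals atan(mz / sqrt(mx^2 + my^2))
    # and stays defined when the denominator is zero.
    return np.degrees(np.arctan2(mz, np.hypot(mx, my)))

def sustained_inactivity(angle, epoch_s=5, angle_thresh=5.0, min_minutes=5):
    """Boolean mask of epochs that belong to a sustained inactivity bout."""
    angle = np.asarray(angle, dtype=float)
    min_epochs = int(min_minutes * 60 / epoch_s)
    # Index i marks a change larger than the threshold between epoch i and i + 1.
    change = np.flatnonzero(np.abs(np.diff(angle)) > angle_thresh)
    # Segments between successive large changes contain no such change.
    starts = np.concatenate(([0], change + 1))
    ends = np.concatenate((change + 1, [len(angle)]))
    inactive = np.zeros(len(angle), dtype=bool)
    for s, e in zip(starts, ends):
        if e - s >= min_epochs:             # segment lasts at least min_minutes
            inactive[s:e] = True
    return inactive
```

Setting min_minutes=10 in this sketch corresponds to the 10 minute time window criterion used in the replication analyses.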
Upon publication of this paper our method will be made freely available as an expansion of the open-source R package GGIR on CRAN (https://cran.r-project.org) [12,21].
Sleep detection aided by sleep log. In the Whitehall II Study a sleep log was used to distinguish nocturnal sleep from bouts of sustained inactivity during the day, which could be related to physical inactivity or day time naps. Sustained inactivity bouts were considered to be nocturnal sleep if they overlapped with a sleep window defined by the sleep log. The following variables were derived: sleep onset time, defined as the start of the first nocturnal sustained inactivity bout; waking time, defined as the end of the last nocturnal sustained inactivity bout; time in bed, defined here as the difference in time between the sleep onset and waking time; and total sleep duration, the sum of all nocturnal sleep bouts. Participants who did not have data for at least one weekday and one weekend day were excluded from all analyses that relied on weekly averages of sleep estimates. Average weekly accelerometer assessed sleep parameters were derived as [(average value for weekdays × 5 + average value for weekend days × 2) / 7].
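As an illustration of how the sleep log window and the weekly weighting could be applied, the following Python sketch derives the four nocturnal variables from a list of sustained inactivity bouts. Times are expressed in hours on a continuous scale (for example hours since noon), and the function names are hypothetical. Whether bouts that only partly overlap the sleep window are clipped to it is not specified in the text; this sketch keeps them whole, which is an assumption.

```python
def nocturnal_sleep_parameters(bouts, window):
    """Derive sleep variables from bouts overlapping the sleep log window.

    bouts  : list of (start, end) tuples in hours, non-overlapping
    window : (log_sleep_onset, log_waking_time) in the same units
    Returns None when no bout overlaps the window.
    """
    w0, w1 = window
    # A bout is nocturnal if it overlaps the sleep log window at all
    # (assumption: bouts are kept whole, not clipped to the window).
    night = sorted(b for b in bouts if b[0] < w1 and b[1] > w0)
    if not night:
        return None
    onset = night[0][0]                     # start of first nocturnal bout
    wake = night[-1][1]                     # end of last nocturnal bout
    return {
        "sleep_onset": onset,
        "waking_time": wake,
        "time_in_bed": wake - onset,
        "total_sleep_duration": sum(e - s for s, e in night),
    }

def weekly_average(weekday_values, weekend_values):
    """Weighted weekly mean: (weekday mean x 5 + weekend mean x 2) / 7."""
    weekday_mean = sum(weekday_values) / len(weekday_values)
    weekend_mean = sum(weekend_values) / len(weekend_values)
    return (weekday_mean * 5 + weekend_mean * 2) / 7
```

With this definition, time in bed includes wakeful gaps between nocturnal bouts, whereas total sleep duration does not, which is why the two parameters can respond differently to the choice of time window.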
In the polysomnography study the window of sleep classification was not determined by sleep log, but by the more reliable polysomnography recording of total sleep period, which is defined from lights out till waking up.
Processing of sleep log. The (visual) screening of accelerometer (actigraphy) data and sleep log data can be cumbersome in large datasets and it complicates replication [22]. Therefore, it is important to automate the method as much as possible in order to allow replication and validation of findings [22,23]. A one dimensional graph showing the overlap between sleep log entries and accelerometer detected sustained inactivity bouts was generated for all nights where the sleep onset time and/or waking time deviated by more than four hours from the nearest accelerometer detected sustained inactivity bout. This allowed us to visualise outliers and assess whether obvious mistakes had been made in the sleep log entry by the participant. These included: confusion between AM and PM, sleep onset and waking time being swapped, or a combination of the two. Visual inspection of 203 nights identified as outliers (out of 28005) resulted in a correction of 33 individual nights across 21 participants' log entries. Additionally, we found 24 nights across 20 participants with ambiguous log entries, which were subsequently excluded from the analyses. In total 27981 nights from 4094 participants were used in the analysis.
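The four hour screening rule can be sketched in Python as follows. The text does not define how the deviation from the nearest bout was measured; this sketch assumes it is zero when the logged time falls inside a bout and otherwise the distance to the closest bout edge, and it flags nights without any detected bout. Function names are hypothetical and times are in hours on a continuous scale.

```python
def distance_to_nearest_bout(t, bouts):
    """Hours between time t and the nearest bout (0 if t lies inside a bout)."""
    return min(
        0.0 if s <= t <= e else min(abs(t - s), abs(t - e))
        for s, e in bouts
    )

def flag_for_inspection(log_onset, log_wake, bouts, max_dev_h=4.0):
    """True if either sleep log entry deviates by more than max_dev_h hours."""
    if not bouts:
        return True                         # assumption: nothing to compare against
    return (
        distance_to_nearest_bout(log_onset, bouts) > max_dev_h
        or distance_to_nearest_bout(log_wake, bouts) > max_dev_h
    )
```

A logged sleep onset that is shifted by 12 hours through AM and PM confusion would typically exceed this threshold and be flagged for visual inspection.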

Covariates
The following measures, drawn from the health survey questionnaire, were included in the analysis:
• Occupational position at age 50 years assessed using a 3-level variable (high, intermediate and low) representing income and status at work.
• Self-reported sleep duration assessed using the question 'how many hours of sleep do you have on an average week-night?'. This question was asked a few days ahead of the accelerometer and sleep log assessment to avoid bias resulting from knowledge about sleep log entries.
• Insomnia symptoms derived from the Jenkins Sleep Problem Scale [24]. Participants were asked how often in the past month they had experienced the following symptoms: trouble falling asleep, waking up several times per night, trouble staying asleep (including waking far too early), and disturbed or restless sleep. The response scale ranged from 0 ("not at all") to 5 ("22-31 days"). The sum of individual scores was then categorised as follows: 0, 1-11, 12-20 [25].
• Depressive symptoms assessed using the 20-item Center for Epidemiologic Studies Depression (CES-D) scale. Scores range between 0 and 60 with higher scores indicating greater depressive symptoms; scores ≥16 were used to represent cases of CES-D depression [26].
In addition, body mass index (BMI) was calculated as weight (kilograms) divided by height (meters) squared. Weight was measured by a trained nurse in underwear to the nearest 0.1 kg on Soehnle electronic scales with digital readout (Leifheit AS, Nassau, Germany) and height was measured in bare feet to the nearest 1 mm using a stadiometer with the participant standing erect with head in the Frankfurt plane.
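The covariate coding described above can be written compactly, as in the Python sketch below. It assumes four Jenkins items scored 0 to 5 and a CES-D case threshold of 16 or more; the function names and category labels are illustrative only.

```python
def jenkins_category(item_scores):
    """Categorise the summed Jenkins Sleep Problem Scale score (four items, 0-5)."""
    total = sum(item_scores)
    if total == 0:
        return "0"
    return "1-11" if total <= 11 else "12-20"

def cesd_case(score):
    """CES-D depression case: score of 16 or more on the 0-60 scale."""
    return score >= 16

def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2: weight divided by height squared."""
    return weight_kg / height_m ** 2
```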

Statistical analysis
The agreement between weekly average accelerometer assessed sleep duration (time in bed, and total sleep duration − rounded to the nearest integer) and self-reported sleep duration from the health survey questionnaire was assessed using a confusion matrix and a weighted Cohen's Kappa coefficient [27]. In order to see the influence of individual characteristics, we stratified this analysis by the above mentioned covariates: sex, age (< 70y and ≥70y), occupational position, CES-D depression score, Jenkins sleep score, BMI, and the day the accelerometer was worn (weekday versus weekend day).
A P-value of < 0.05 was considered significant. The agreement between accelerometer-based estimates of sleep and polysomnography estimates of sleep was quantified by the accuracy, sensitivity and specificity of the binary classification of sleep (any sleep stage) and wakefulness, and by the bias (mean ± standard deviation between patients) in estimated sleep duration.
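For readers who wish to reproduce these agreement statistics, a minimal Python sketch is given below. The weighting scheme of the Kappa coefficient is not stated in the text; linear weights are assumed here, with quadratic weights available as an alternative. The function names are our own, and sleep is treated as the positive class in the binary classification.

```python
import numpy as np

def weighted_kappa(a, b, categories, weights="linear"):
    """Weighted Cohen's Kappa for two ratings on ordered categories."""
    idx = {c: i for i, c in enumerate(categories)}
    k = len(categories)                     # requires at least two categories
    obs = np.zeros((k, k))
    for x, y in zip(a, b):
        obs[idx[x], idx[y]] += 1            # confusion matrix of counts
    obs /= obs.sum()                        # observed proportions
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # expected under independence
    i, j = np.indices((k, k))
    d = np.abs(i - j) / (k - 1)             # disagreement weights, 0 on the diagonal
    w = d if weights == "linear" else d ** 2
    return 1.0 - (w * obs).sum() / (w * exp).sum()

def binary_agreement(pred_sleep, true_sleep):
    """Accuracy, sensitivity and specificity with sleep as the positive class."""
    p = np.asarray(pred_sleep, dtype=bool)
    t = np.asarray(true_sleep, dtype=bool)
    tp = np.sum(p & t)
    tn = np.sum(~p & ~t)
    fp = np.sum(p & ~t)
    fn = np.sum(~p & t)
    return {
        "accuracy": (tp + tn) / len(t),
        "sensitivity": tp / (tp + fn),      # proportion of true sleep detected
        "specificity": tn / (tn + fp),      # proportion of true wake detected
    }
```

In this formulation the specificity for sleep equals the sensitivity for wakefulness, which is the relationship referred to when comparing time window and angle threshold settings.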

Results
Of the 4879 participants to whom the accelerometer was proposed in the Whitehall II Study, 388 did not consent and 210 had contraindications. Of the remaining 4281 participants who wore the accelerometer, 4204 (98.2%) had valid accelerometer data (a readable data file). Among them, sleep log data were missing for 80 participants and 30 additional participants did not meet criteria for accelerometer wear time (at least one night with >16h of wear time). Therefore, the assessment of discrepancies between the accelerometer and the sleep log was undertaken in 4094 participants (83.9% of those invited).
Owing to missing data in the health survey questionnaire on depression (N = 240) and/or sleep (N = 83), an additional 285 participants were excluded from the analysis comparing accelerometer assessed and self-reported sleep duration. Furthermore, 40 participants were excluded because they did not have at least one weekday and one weekend day of accelerometer data. The resulting 3769 (75% men) participants were on average 69.3 (standard deviation (SD) = 5.7) years old and had a mean BMI of 26.4 (SD = 4.1) kg/m².
Pearson's correlation coefficient between sleep parameters derived from a 5 and a 10 minute window was 0.91 for time in bed and 0.98 for total sleep duration (both P < .01). Use of a 10 compared to a 5 minute window led to shorter time in bed (difference = -0.24h; 95% Confidence Intervals (CI):-0.29, -0.20) and total sleep duration (difference = -0.94h; 95%CI:-0.99, -0.90).
The median delay between responding to the health survey questionnaire and the first day of accelerometer wear was 5 days (interquartile range 1-22 days). Agreement between questionnaire based sleep duration and accelerometer estimated time in bed was similar for both time windows (Kappa = 0.32 for 5 min and 0.36 for 10 min window). However, agreement between questionnaire based sleep duration and accelerometer estimated total sleep duration was higher for the 5 minute (Kappa = 0.39) than the 10 minute window (Kappa = 0.18) (Table 1). Accelerometer assessed total sleep duration tended to be greater than self-reported sleep duration for short sleepers and smaller for long sleepers; total accelerometer assessed sleep duration was smaller in long sleepers (Table 1).
The agreement between questionnaire assessed sleep duration and time in bed was lower in women, in depressed participants, in those reporting more insomnia symptoms, and on weekend days (Table 2). However, no such differences were observed for total sleep duration (Table 2). S1-S3 Figs provide a visual representation of the confusion matrices underlying the Kappa coefficients for sex, sleep quality, and depression.
All 28 patients recruited for the polysomnography study (11 female) had complete data and were aged between 21 and 72 years (mean±sd: 45±15 years). Of them, 19 had a sleep disorder according to the International Classification of Sleep Disorders. The agreement between accelerometer estimates of sleep using our algorithm and polysomnography derived parameters was good, as shown in Table 3. For example, sleep parameters derived with a 5 minute window and a 5 degree angle threshold had on average a 31 minute overestimation of sleep duration and an 83% accuracy (Table 3).

Discussion
This paper presents a novel method to detect sleep duration from raw accelerometer data collected using a wrist-worn device and assesses agreement of these parameters with both self-reported sleep duration and polysomnography. We used an accelerometer with 5 minute and 10 minute criteria to detect bouts of sustained inactivity based on change in arm angle; following the standard protocol in sleep studies we distinguished nocturnal sleep from daytime inactivity using a sleep log [28]. Overall there was moderate agreement between accelerometer and questionnaire assessed sleep duration, except for total sleep duration derived with a 10 minute time window configuration for which the agreement was poor. In subgroup analyses, we found lower agreement for time in bed in women, depressed participants, and those reporting more insomnia symptoms. However, no differences in agreement as a function of these variables were observed for total sleep duration. Previously published methods for sleep detection using accelerometers optimized their parameters to best fit with polysomnography-derived periods of sleep [14,18]. We did not use parameter optimisation, but instead aimed for a method that provided an easy to understand (heuristic) description of wrist kinematics. Two requirements were identified: 1) data should be expressed in universal kinematic units and not in "count" values that are device specific and not easily interpretable, and 2) method parameters should be derived logically and not result from statistical regressions on a particular sample. Consequently, arm angle was considered easier to interpret than magnitude of acceleration. The selection of a 5° angle threshold based on the median value of a moving 5 second window was assumed to capture all static postures and to filter out minor breathing movements and movements of the bed partner.
The 5 and 10 minute window criteria for sustained inactivity were assumed to be plausible time intervals after which sleep becomes more likely. Additional parameter configurations were evaluated in the polysomnography study, confirming that a 5 and a 10 minute window together with a 5 degree angle threshold provide informative estimates of sleep.
Previous findings from the comparison between accelerometer estimates of sleep and a concurrently applied sleep log [15,29,30] should be interpreted with caution because data from the log were used to time nocturnal sleep initiation and waking time, boosting the agreement between the two methods. Only a few studies have compared accelerometer derived estimates of sleep with independently collected self-reported sleep duration [31,32]. One study based on 56 adult women (age range 18-80y) found kappa values ranging from -0.19 to 0.14 for different sleep estimates [32]. Another study based on 225 adolescents (age range: 11-13y) found no significant correlation (r: 0.06; p > .05) between actigraphy and self-reported sleep duration [31]. In the present study, greater agreement was found between accelerometer and questionnaire estimated sleep duration. Better agreement in our data could be explained by the accuracy of our accelerometer method or the age group examined in our study.
For the accelerometer method the time window used to decide that an inactivity bout is sleep (5 vs. 10 minutes) affects the estimated sleep duration, both for time in bed and total sleep duration, with durations being longer with the 5 minute window. The estimated total sleep duration we found for a 5- compared to a 10-minute window was closer to reports from studies using polysomnography in older adults from the general population (range being 6 to 6.7h) [33][34][35][36].
In addition, the agreement between self-reported sleep duration and accelerometer-assessed total sleep duration was better for a 5 minute window, while agreement for time in bed was moderate for both 5 and 10 minute time windows. A 10 minute window could be hypothesised to be better adapted for ignoring inactive periods while awake while a 5 minute window may be more sensitive in capturing all sleep periods. The interpretation of the sleep duration question may also affect the level of agreement between the methods. Some participants may have answered the question with their recall of total hours of sleep, while other participants may have interpreted the question about sleep duration as the time difference in hours between sleep onset and waking time, that is, time in bed. Future studies using direct comparison against polysomnography in a larger sample of the general population are needed to help inform a decision about time window configuration. Our polysomnography study showed that sensitivity to detect sleep is better for shorter time windows and higher angle thresholds while the specificity to detect sleep, and therefore the sensitivity to detect wakefulness, is better for longer time windows and lower angle thresholds. These results are in line with results of comparison with self-reported sleep as discussed above. The optimal method configuration to minimize bias in estimated sleep duration depends on the true duration of sleep, which is unknown [37]. Therefore, a generic configuration for the optimal setting is likely to reflect the data structure of the derivation sample, which may differ from the data structure of the application sample. Our suggestion is to use knowledge (beliefs) of the effect of sleep on body posture, in terms of parameters such as arm angle and time of sustained inactivity, rather than count data produced by conventional methods for which optimization in particular datasets is the only pathway for interpretation.
The lower agreement for time in bed seen in women, depressed participants, those reporting more insomnia symptoms, and on weekend days may be explained by a lower validity of self-reported sleep in these groups or on these days of the week, a lower validity of the accelerometer method, or a combination of the two [38]. For example, disturbed sleep may present challenges for both methods: difficulties in estimating sleep duration by oneself, more inactive wake periods that can be confused with sleep, and possibly more minor movements during sleep that can be confused with wakefulness. However, the absence of a noticeable difference in method agreement by individual factors for total sleep duration seems to indicate that the agreement discrepancies are specific to the time in bed parameter.
The main strength of accelerometry is that it can provide complementary estimates of sleep that cannot be derived from self-report methods, e.g. total sleep duration, sleep efficiency, and sleep fragmentation. A specific strength of raw data accelerometry when used on the wrist is that it can be combined with assessment of physical activity using the same device in order to facilitate research on the relationship between sleep and physical activity. Our study findings need to be interpreted in light of some limitations. The sleep log we used was relatively simple. A more detailed sleep log including specific questions for naps, times into and out of bed, and time lights off, may well allow better assessment of sleep. However, the addition of questions may also increase the burden on the participants and result in missing data or incorrect data entries. Future research is needed to compare the accelerometer method with polysomnography in a larger sample to better understand discrepancies between neurological sleep and sleep defined by wrist kinematics.
In conclusion, agreement between sleep duration estimates by our newly developed method for accelerometer data and sleep questionnaire was moderate. We found that the agreement with self-reported sleep duration is least affected by individual factors when sleep duration is defined as total sleep duration over a night.