Abstract
Purpose
We examined the one-year test-retest reliability and criterion validity of survey-assessed sleep duration collected from two separate questions.
Methods
The Activity Validation Sub Study included 751 participants of the Cancer Prevention Study-3 to further investigate rest/activity cycles. Sleep duration was collected using three methods: survey, Daysimeter device, and sleep diary. Survey-assessed sleep duration was collected using 2 different questions, each with different response options (categorical and continuous). Selected participants (n = 190) were asked to wear a Daysimeter device for seven consecutive days in each of two non-consecutive quarters. Participants were excluded from the current study due to incomplete/implausible survey or device data or reported night shift work. We calculated reliability of pre- and post-survey sleep duration for both survey questions using Spearman correlation. We used the method of triads to estimate the validity coefficient (VC) between each of the three sleep duration measurements in the present study and the “true” latent sleep duration measure, and bootstrapping methods to calculate the 95% confidence intervals (95%CI).
Results
Of 119 participants included in the study (52.10% male), test-retest correlation showed strong and moderate correlations for sleep duration collected continuously and categorically, respectively. The VC for survey-assessed continuous sleep duration was 0.82 (95%CI 0.71, 0.90) for weekday and 0.68 (95%CI 0.46, 0.83) for weekend. Performance of the VC was slightly weaker for survey-assessed categorical sleep duration (weekday VC = 0.57 95%CI 0.42, 0.71; weekend VC = 0.47 95%CI 0.29, 0.62).
Citation: Donzella SM, Masters M, Phipps AI, Patel AV, Zhong C (2024) Validity of self-reported sleep duration in the Cancer Prevention Study–3. PLoS ONE 19(8): e0307409. https://doi.org/10.1371/journal.pone.0307409
Editor: Karla Moreno-Tamayo, Mexican Social Security Institute: Instituto Mexicano del Seguro Social, MEXICO
Received: March 6, 2024; Accepted: July 1, 2024; Published: August 16, 2024
Copyright: © 2024 Donzella et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data underlying the findings of this study are restricted by the Emory University Institutional Review Board, which approved the consent forms. Data contain potentially identifying or sensitive patient information. Data from the Cancer Prevention Study 3 are available from the American Cancer Society by following the ACS Data Access Procedures (https://www.cancer.org/content/dam/cancer-org/research/epidemiology/cancer-prevention-study-data-access-policies.pdf) for researchers who meet the criteria for access to confidential data. Applicants should contact cohort.data@cancer.org for an initial evaluation of their research idea.
Funding: The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study-3. This analysis was also supported through an unrestricted research grant from Sleep Number Corporation. The authors would like to state that the Sleep Number Corporation funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This manuscript has not been previously published, and is not under consideration for publication elsewhere.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The field of sleep epidemiology has grown in the past several decades. Many epidemiologic studies have suggested a relationship between sleep duration and various health outcomes, including mortality [1]. Such epidemiologic studies often rely on self-reported sleep measures, which carry significantly lower research costs and minimize participant burden relative to more objective measures (e.g., actigraphy). Although self-reported sleep measures are widely used in epidemiology research, these measures may be influenced by participant perception of sleep [2], how the question is worded or formatted [3], and/or various preexisting health conditions [4], all of which may potentially lead to systematic or random error in the measurement. Reliable and valid assessment of sleep duration is necessary to accurately capture sleep patterns and evaluate the impact of sleep on health.
Often, the validity of subjective sleep measures is assessed in a two-way comparison using accelerometry [5] or polysomnography (PSG) [6] as the comparator. Accelerometry is a non-invasive technique widely used to objectively measure sleep duration through movement [7]. Although accelerometry provides objective sleep measures, the collection and processing of these data are imperfect [7, 8] and may under- or over-estimate sleep duration compared to PSG [7]. PSG is the gold standard of objective sleep measurement; however, its use in large-scale epidemiologic studies is generally infeasible given that it is resource-, time-, and cost-intensive. Therefore, a triangular comparison (method of triads) between questionnaire, reference, and objective measures may be more appropriate to quantitatively describe the validity of sleep duration collected via questionnaire in large prospective cohort studies.
The objectives of this study were to 1) assess the one-year test-retest reliability of sleep duration collected from two different questionnaire items and 2) examine the criterion validity of self-reported sleep duration collected from the same two questionnaire items using the method of triads, in a validation sample of participants enrolled in a large, nationwide, prospective cohort study.
Methods
Study population and design
The American Cancer Society Cancer Prevention Study-3 (CPS-3) is a prospective cohort study of over 300,000 US adults with the primary goal of investigating cancer incidence and mortality [9]. Participants aged 30 to 65 years with no previous cancer history (except for basal and squamous cell skin cancer) were recruited and enrolled from 2006 to 2013. Of the 303,682 participants enrolled, approximately 254,000 completed the baseline questionnaire and were sent subsequent follow-up questionnaires. The CPS-3 study recruitment and design have been previously described in detail [9].
The CPS-3 Activity Validation Sub Study (AVSS) was conducted in 2015 to further investigate rest/activity cycles among a subset of CPS-3 study participants [10]. A total of 10,000 CPS-3 participants were invited to participate in the AVSS, with sampling stratified by sex, race, and ethnicity. A total of 1,801 participants accepted the invitation and 751 participants who completed the 2015 CPS-3 follow-up survey were enrolled in the AVSS. A subsample of AVSS participants (N = 190) were invited to participate in the collection of light exposure and objective sleep measures. The CPS-3 study and the AVSS were approved by the Emory University Institutional Review Board. All CPS-3 participants provided written consent when they completed the on-site survey at enrollment with trained volunteers serving as the witness. AVSS participants provided consent electronically with no witness present. No minors were enrolled in the CPS-3 or AVSS studies.
The AVSS prospectively collected data from July 27, 2015 to October 26, 2016. At baseline, participants were asked to complete a “pre-survey” questionnaire that collected information on sleep patterns as well as other demographic and lifestyle factors. Participants completed various objective and subjective sleep measurements throughout the duration of the study. At the end of the AVSS study, participants completed the same questionnaire which we refer to as the “post-survey” questionnaire.
Sleep measures
Survey.
Sleep duration was collected using 2 different survey questions. The first question asked participants “During the past year, estimate the hours per day you spent sleeping on typical weekdays and weekends” with possible hours/day response categories being “0, <1, 1–2, 3–4, 5–6, 7–8, 9–10, 11+”. Weekdays were defined as Sunday through Thursday nights and weekends as Friday and Saturday nights. The midpoint of each response category was used as the absolute sleep duration, and 12 was used as the midpoint for the 11+ category. This measure will be referred to as survey-assessed categorical sleep duration.
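The midpoint coding can be illustrated with a short Python sketch (Python is used here for illustration only; the analyses in this paper were run in R). Only the value of 12 for the 11+ category is stated in the text. The values for 5–6 and 7–8 (6 and 8 hours) follow the worked example in the Discussion, and the remaining values, together with the name `categorical_sleep_minutes`, are assumptions made for this example.

```python
# Hours assigned to each categorical response option.
# Stated in the text: "11+" -> 12. Implied by the Discussion: "5-6" -> 6, "7-8" -> 8.
# All other values are ASSUMPTIONS that extend the same two-hour pattern.
CATEGORY_MIDPOINT_HOURS = {
    "0": 0.0,    # assumption
    "<1": 0.5,   # assumption
    "1-2": 2.0,  # assumption
    "3-4": 4.0,  # assumption
    "5-6": 6.0,
    "7-8": 8.0,
    "9-10": 10.0,  # assumption
    "11+": 12.0,
}


def categorical_sleep_minutes(response: str) -> float:
    """Return the assigned sleep duration in minutes for one survey response."""
    # Accept en dashes as printed on the questionnaire as well as plain hyphens.
    key = response.strip().replace("–", "-")
    return CATEGORY_MIDPOINT_HOURS[key] * 60.0
```

Under this coding, adjacent categories always differ by 120 minutes, which is the source of the coarser resolution discussed later.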
The second sleep duration question was as follows: “During the past year, what time do you typically try to go to sleep and wake up?”. Participants were asked to fill in the time they fell asleep and the time they woke up for workdays and non-workdays, separately, in the format of hh:mm, and to mark AM or PM. All responses were converted to 24-hour time and the following assumptions were made: 1) if hour asleep was between 1:00–5:59 and AM/PM was not selected, then time asleep was assumed to be 01:00–05:59; 2) if hour asleep was between 6:00–11:59 and AM/PM was not selected, then time asleep was assumed to be 18:00–23:59; 3) if hour asleep was 12, then time asleep was assumed to be midnight regardless of AM/PM selection. Due to limited data on occupation schedule, workdays and non-workdays were assumed to correspond to weekdays and weekends, respectively. This measure will be referred to as survey-derived continuous sleep duration.
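The three conversion rules above can be sketched in Python as follows. This is an illustrative reading, not the authors' R code: the function names are hypothetical, and computing duration as the wake time minus the fall-asleep time with wrap-around at midnight is an assumption, since the text does not spell out that step.

```python
from typing import Optional

MINUTES_PER_DAY = 24 * 60


def asleep_hour_24h(hour: int, meridiem: Optional[str]) -> int:
    """Convert a reported fall-asleep hour (1-12) to a 24-hour clock hour."""
    if hour == 12:
        return 0  # rule 3: 12 is treated as midnight regardless of AM/PM
    if meridiem is None:
        # rule 1: 1-5 with no AM/PM is early morning; rule 2: 6-11 is evening
        return hour if 1 <= hour <= 5 else hour + 12
    return hour + 12 if meridiem.upper() == "PM" else hour


def sleep_duration_minutes(asleep_h: int, asleep_m: int,
                           wake_h: int, wake_m: int) -> int:
    """Minutes from fall-asleep to wake time, both given in 24-hour time.

    ASSUMPTION: a wake time earlier on the clock than the fall-asleep time
    is taken to fall on the following day.
    """
    start = asleep_h * 60 + asleep_m
    end = wake_h * 60 + wake_m
    return (end - start) % MINUTES_PER_DAY
```

For example, a participant who writes 11:00 with no AM/PM mark and a wake time of 6:30 AM would be coded as falling asleep at 23:00, so `sleep_duration_minutes(asleep_hour_24h(11, None), 0, 6, 30)` gives 450 minutes.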
Accelerometer.
Selected participants (n = 190) were asked to wear a Daysimeter device for seven consecutive days in each of two non-consecutive quarters (Q1/Q3 or Q2/Q4) [11]. Collecting data over seven consecutive days provides the opportunity to observe potential differences in weekday and weekend sleep patterns. Similarly, we asked participants to wear the Daysimeter during two non-consecutive seasons to account for potential seasonal differences in sleep [12, 13]. The Daysimeter has an accelerometer embedded within the device [14] and was worn on the non-dominant wrist during sleeping hours. Raw Daysimeter accelerometry data were processed and assigned an activity index. The sleep algorithm was developed based on the Actiware-Sleep Version 3.4 algorithm [15]. Date of wear was extracted from the raw Daysimeter data and used to determine type of day (weekday or weekend). Device data were considered valid if at least one quarter included 3 weekdays + 1 weekend of wear. Average sleep duration estimates for valid weekdays and weekends were calculated separately.
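A minimal Python sketch of the day-type assignment and separate averaging is given below. It assumes that the device data use the same weekday/weekend definition as the survey (Sunday through Thursday nights versus Friday and Saturday nights) and that the date of wear refers to the evening on which the night begins; neither detail is stated for the Daysimeter, and the function names are hypothetical.

```python
from datetime import date
from statistics import mean


def day_type(night_start: date) -> str:
    """Classify a night as 'weekday' or 'weekend' by the date it begins."""
    # date.weekday(): Monday = 0 ... Sunday = 6, so Friday = 4 and Saturday = 5.
    return "weekend" if night_start.weekday() in (4, 5) else "weekday"


def average_by_day_type(nights):
    """nights: iterable of (date, sleep_minutes). Returns mean minutes per day type."""
    grouped = {"weekday": [], "weekend": []}
    for night_start, minutes in nights:
        grouped[day_type(night_start)].append(minutes)
    # Day types with no recorded nights are omitted rather than averaged.
    return {kind: mean(values) for kind, values in grouped.items() if values}
```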
Sleep diary.
Participants were asked to complete a sleep diary for every day of Daysimeter wear. In the sleep diary, participants were asked “What is the best estimate of how much actual sleep you got last night?” Participants estimated the number of hours and minutes of their actual sleep duration for each night. Corresponding diary data for each day of Daysimeter data were required for inclusion in the final study population. Diary data were averaged for weekdays and weekends separately.
Exclusion criteria
Of the 190 AVSS participants assigned a Daysimeter, 71 were excluded from the current analysis for the following reasons: incomplete AVSS pre- or post-survey data (n = 15), implausible device data (n = 1), reported working night shift (n = 7), insufficient days of device wear or diary entries (n = 40), or reported less than 3 hours or greater than 14 hours of sleep duration for any of the sleep measures (n = 8). Thus, final analyses were conducted on a sample of n = 119 participants.
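The sample-size arithmetic and the plausibility rule can be checked with a few lines of Python. The counts are taken from the paragraph above; treating the exclusion reasons as mutually exclusive is an assumption, although it is the only reading under which the counts add up to the reported final sample. The helper `is_plausible` is a hypothetical name.

```python
ASSIGNED_DAYSIMETER = 190

# Exclusion counts as reported; ASSUMED to be mutually exclusive.
EXCLUSIONS = {
    "incomplete pre- or post-survey data": 15,
    "implausible device data": 1,
    "reported working night shift": 7,
    "insufficient device wear or diary entries": 40,
    "sleep duration < 3 h or > 14 h on any measure": 8,
}


def is_plausible(durations_hours) -> bool:
    """True if every sleep measure for a participant lies within 3 to 14 hours."""
    return all(3.0 <= hours <= 14.0 for hours in durations_hours)


FINAL_N = ASSIGNED_DAYSIMETER - sum(EXCLUSIONS.values())  # 190 - 71 = 119
```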
Statistical analysis
Mean sleep duration was calculated separately based on the two survey questions, the Daysimeter, and the diary. Reliability of survey-reported sleep duration was calculated using Spearman correlation of the pre- and post-survey for each survey question (categorical and continuous) overall.
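As an illustration of the test-retest calculation, the sketch below computes a Spearman correlation in plain Python as the Pearson correlation of average ranks, which handles the many ties produced by categorical responses. It is not the authors' R code, and the function names and example data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(pre, post):
    """Spearman correlation between pre- and post-survey sleep durations."""
    return pearson(average_ranks(pre), average_ranks(post))


# Hypothetical categorical responses (hours) for five participants.
pre_survey = [6, 8, 8, 6, 10]
post_survey = [6, 8, 6, 6, 10]
rho = spearman(pre_survey, post_survey)
```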
We used the method of triads [16] to estimate the validity coefficient (VC) between each of the three sleep duration measurements in the present study and the “true” latent sleep duration measure. The method of triads is a technique in factor and path analysis that uses an observed correlation matrix to fit a theoretical correlation matrix [16]. This approach assumes a positive linear relationship between the observed and true measures and that the random errors in each observed estimate are independent [16]. In practice, the “true” sleep duration measurement is not available. The method of triads assumes that any observed associations between the three measures being compared are due to their relationship with the “true” latent sleep duration [16]. A VC is calculated for each observed measure with the “true” sleep duration using a set of Pearson pairwise correlation coefficients (r) (Fig 1). Correlations were interpreted following standard convention (<0.40: weak, 0.40–0.69: moderate, 0.70–0.89: strong, and ≥0.90: very strong) [17]. We used bootstrapping with 1,000 bootstrap samples to calculate the 95% confidence intervals (CI). All analyses were done in R version 4.2.3.
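For reference, the VCs in the method of triads are usually written as follows, with S, D, and A denoting the survey, diary, and accelerometer measures. The equations are given here in their standard form and are assumed to correspond to the formulation in [16]; they are not reproduced from the paper.

```latex
\begin{align}
VC_S &= \sqrt{\frac{r_{SA}\, r_{DS}}{r_{AD}}}, \\
VC_D &= \sqrt{\frac{r_{DS}\, r_{AD}}{r_{SA}}}, \\
VC_A &= \sqrt{\frac{r_{SA}\, r_{AD}}{r_{DS}}}.
\end{align}
```

Each VC is defined only when all three pairwise correlations are positive, and sampling variation can push an estimate above 1.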
Fig 1. Pairwise correlation coefficients (rSA, rDS, rAD) between the different methods (S: survey; D: diary; A: accelerometer) are used to calculate the VCs.
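The calculation of the VCs and their bootstrap confidence intervals can be sketched in Python as below. This is a minimal illustration rather than the authors' R implementation: the function names are hypothetical, the percentile interval is an assumption (the text does not say which bootstrap interval was used), and skipping resamples with non-positive correlations is one possible convention among several.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def triad_vcs(s, d, a):
    """Validity coefficients for survey (S), diary (D), and accelerometer (A)."""
    r_sa, r_ds, r_ad = pearson(s, a), pearson(d, s), pearson(a, d)
    return {
        "S": sqrt(r_sa * r_ds / r_ad),
        "D": sqrt(r_ds * r_ad / r_sa),
        "A": sqrt(r_sa * r_ad / r_ds),
    }


def bootstrap_ci(s, d, a, key="S", n_boot=1000, seed=0):
    """Percentile 95% CI for one VC (ASSUMED interval type)."""
    rng = random.Random(seed)
    n = len(s)
    estimates = []
    for _ in range(n_boot):
        # Resample participants with replacement, keeping the triplets intact.
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            vcs = triad_vcs([s[i] for i in idx], [d[i] for i in idx], [a[i] for i in idx])
        except (ValueError, ZeroDivisionError):
            continue  # VC undefined for this resample (negative or zero correlation)
        estimates.append(vcs[key])
    if not estimates:
        raise ValueError("no bootstrap resample produced a defined VC")
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int(0.975 * len(estimates)))]
    return lower, upper
```

Resampling whole participants, rather than each measure independently, preserves the pairing between survey, diary, and device values that the pairwise correlations depend on.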
Results
A total of 119 participants were included in the study, among whom 52.10% were male (n = 62) and the mean age was 51.74 years (SD = 9.71, range 32–73 years) (Table 1). Over half of the participants self-reported race as White (61.34%). The majority of participants reported average sleep quality to be “fairly well” or “very well” (88.98%) and reported not taking sleep medicine in the past month (86.21%) (Table 2). Average weekly sleep duration was shortest when measured by the diary (412.09 min, SD = 54.67 min), and the difference between average weekday and weekend sleep duration ranged from approximately 32 to 58 minutes across the various sleep duration measures.
Test-retest correlation for the pre- and post-survey showed moderate and strong correlations for sleep duration collected categorically and continuously, respectively (Table 3). Bivariate Pearson correlations between the survey, Daysimeter, and diary are shown in Table 4. Correlations of sleep duration showed moderate to strong agreement between measures with the exception of categorical weekend sleep duration collected from the survey and the Daysimeter (r = 0.36, 95% CI 0.20–0.51). Generally, survey-derived continuous sleep duration had better correlations with the Daysimeter and diary compared to the survey-assessed categorical sleep duration.
The VC for survey-derived continuous sleep duration and the latent sleep duration was 0.82 (95% CI 0.71, 0.90) for weekday and 0.68 (95% CI 0.46, 0.83) for weekend (Table 5). Performance of the VC was slightly weaker for survey-assessed categorical sleep duration (weekday VC = 0.57, 95% CI 0.42, 0.71; weekend VC = 0.47, 95% CI 0.29, 0.62).
Discussion
This study used data collected from a year-long activity validation study to assess how well two different survey questions measured average sleep duration. Survey-derived continuous sleep duration performed better than survey-assessed categorical sleep duration in both the test-retest Spearman correlation and the VC. Overall, survey-based sleep duration showed moderate to strong agreement with latent sleep duration, suggesting that self-reported sleep duration is a reasonably valid measure of actual sleep duration.
Performance of survey-based sleep duration varied by question design. There was stronger agreement between continuously derived sleep duration and latent sleep duration compared to categorical sleep duration. This may be due to the categorical response options capturing an approximate rather than a precise sleep duration. Categorical response options included in the final analyses each captured a two-hour range in sleep duration, with the exception of the last category, so the midpoints of adjacent categories differ by two hours. For example, if a participant reported 6.5 hours on the pre-survey (categorical option: 5–6 hours) and 7 hours on the post-survey (categorical option: 7–8 hours), the Spearman correlation would compare 6 hours versus 8 hours for the categorical response but 6.5 hours versus 7 hours for the continuous response. This difference in precision is reflected in the Spearman correlations for pre- and post-survey sleep duration, where categorical sleep duration had lower correlations than continuous sleep duration. Twenty-eight and 23 participants selected a different categorical option from the pre-survey to post-survey on weekdays and weekends, respectively, which would have contributed to the reduced agreement in the correlation. Although categorical sleep duration still showed moderate agreement with latent sleep duration, the varied performance between categorically and continuously derived sleep duration supports the notion that the structure of the question and response impacts the reliability of the measure.
We examined sleep duration on the weekday and weekends separately to investigate if the self-reporting of sleep duration changes based on type of day. The measurement of survey-assessed weekday sleep duration generally appeared to be more consistent and in better agreement with latent sleep duration compared to survey-assessed weekend sleep duration, consistent with previous studies [18, 19]. Sleep patterns often vary throughout the week, commonly due to work and social commitments [20]. The accumulation of sleep debt from insufficient sleep on weekdays can lead to longer sleep bouts on the weekends to “catch-up” on sleep [20]. This phenomenon was seen in our study as mean weekend sleep duration was greater than mean weekday sleep duration across all measurement tools. It is possible that the self-reporting of average weekend sleep duration presents more challenges compared to average weekday sleep duration due to less structured and/or inconsistent weekend schedules. In addition, we were limited to just two weekends of observations, which may produce additional variability. Results highlight the importance of measuring sleep duration separately on weekdays and weekends as mean values and reliability of measurements may vary.
Prior studies have used two-way comparisons to assess the reliability of sleep duration measures [5, 6, 21, 22]. Some have compared survey-assessed sleep duration to objective measures such as actigraphy or PSG and reported poor correlations [6, 19], while others have compared survey-assessed sleep duration to sleep diaries and have reported correlations ranging from 0.39 to 0.48 [18]. Although the use of two-way comparisons has been helpful to understand how survey-assessed sleep performs relative to other subjective or objective tools, no single sleep measurement tool is without error, making it difficult to select a reference tool. It is possible that the random errors of the measurements used in a two-way comparison are correlated, which violates the assumptions needed for a traditional validation approach [16]. The method of triads can be helpful when there is no perfect measure to serve as a reference, provided the two model assumptions are met: that there is a linear relationship between the measures and that the errors are independent [16]. With the addition of the third measurement in the method of triads, and the assumption of a positive linear relationship between the observed and true measures, we were able to overcome this limitation. In our study, we could not rule out the possibility of a correlation between the random errors of the survey and diary, but we believe the linear assumption will hold. As a result, the estimated VCs of the two survey-assessed sleep duration measures with latent sleep duration should be viewed as the upper limit of criterion validity [16]. To our knowledge, this is the first study to assess criterion validity of survey-assessed sleep duration using the method of triads.
Differences in structure of survey questions and sleep diaries may impact participant responses, reproducibility, and validity. Sleep diaries require participants to recall sleep duration from only the previous night while survey questions may require recall of average sleep duration over the prior year. Cognitive burden of the sleep diary is lower than that of the survey. Additionally, sleep is dynamic and varies throughout the duration of a year. Asking participants to recall the average sleep duration during the prior year may introduce non-differential measurement error. As previously mentioned, categorical versus continuous response options may lead to varying precision of measurement introducing measurement error. The sleep diary used in this study asked participants to report an estimate of “actual sleep” and the average duration recorded from the sleep diary was lower than that recorded from both survey questions. It is possible that when answering the diary question participants removed time from nocturnal awakening. This would be particularly important to consider when working with populations at higher risk of experiencing insomnia symptoms (e.g., cancer survivors [23]). Careful questionnaire and diary design may reduce random or systematic error in the measurements.
Strengths of this study include the diverse study population, as 18% and 20% of study participants self-reported their race as Black and Hispanic, respectively, and the ability to leverage multiple measures of sleep duration. We were also able to capture seasonality due to the collection of Daysimeter sleep measures in two non-consecutive quarters. Our study is not without limitations. Daysimeter devices were sent to only a small proportion of AVSS participants, leading to a relatively small sample size after applying the exclusion criteria. Accelerometer-based sleep estimates may vary with the device and/or algorithm used [24, 25]. The use of the Daysimeter device as the accelerometer limits the generalizability to other studies, as Actigraph devices are the most commonly used research-grade devices and utilize different data-processing algorithms. However, the Daysimeter has been validated for research [14], and the method of triads seeks to address such limitations [16]. Another limitation of this study is the relatively long test-retest period. Sleep is dynamic and it is possible that sleep patterns can naturally change over the course of a year. Despite the long test-retest period, correlations for pre- and post-survey measures remained moderate to strong. Due to the relatively small sample size, we were unable to stratify by various factors that may impact sleep patterns such as age, sex, chronotype, and use of sleep medicine. Future research should consider the impact of various demographic, behavioral, and health factors on the validation of sleep measures. Lastly, the AVSS study population is representative of the ACS CPS-3 cohort, but not of the more general US population.
Conclusion
Criterion validity was stronger for weekday sleep duration measures compared to weekend sleep duration. The two survey-assessed sleep duration questions used in the AVSS and CPS-3 cohorts displayed acceptable reliability and validity when using the method of triads to assess sleep duration.
Acknowledgments
The authors express sincere appreciation to all Cancer Prevention Study-3 participants, and to each member of the study and biospecimen management group. The authors would like to acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention’s National Program of Cancer Registries and cancer registries supported by the National Cancer Institute’s Surveillance Epidemiology and End Results Program.
Disclaimer: The study protocol was approved by the institutional review boards of Emory University, and those of participating registries as required. The authors assume full responsibility for all analyses and interpretation of results. The views expressed here are those of the authors and do not necessarily represent the American Cancer Society or the American Cancer Society–Cancer Action Network.
References
- 1. Chaput J-P, Dutil C, Featherstone R, Ross R, Giangregorio L, Saunders TJ, et al. Sleep duration and health in adults: an overview of systematic reviews. Applied Physiology, Nutrition, and Metabolism. 2020;45(10):S218–S31. pmid:33054337
- 2. Okifuji A, Hare BD. Nightly analyses of subjective and objective (actigraphy) measures of sleep in fibromyalgia syndrome: what accounts for the discrepancy? The Clinical journal of pain. 2011;27(4):289. pmid:21178589
- 3. Lauderdale D. Survey Questions About Sleep Duration: Does Asking Separately About Weekdays and Weekends Matter? Behavioral Sleep Medicine. 2014;12(2):158. pmid:23570614
- 4. Tsuno N, Besset A, Ritchie K. Sleep and depression. Journal of clinical psychiatry. 2005;66(10):1254–69. pmid:16259539
- 5. Lauderdale DS, Knutson KL, Yan LL, Liu K, Rathouz PJ. Sleep duration: how well do self-reports reflect objective measures? The CARDIA Sleep Study. Epidemiology (Cambridge, Mass). 2008;19(6):838.
- 6. Silva GE, Goodwin JL, Sherrill DL, Arnold JL, Bootzin RR, Smith T, et al. Relationship between reported and measured sleep times: the sleep heart health study (SHHS). Journal of Clinical Sleep Medicine. 2007;3(6):622–30. pmid:17993045
- 7. Sadeh A. The role and validity of actigraphy in sleep medicine: An update. Sleep Medicine Reviews. 2011;15(4):259–67. pmid:21237680
- 8. Tryon WW. Issues of validity in actigraphic sleep assessment. Sleep. 2004;27(1):158–65. pmid:14998254
- 9. Patel AV, Jacobs EJ, Dudas DM, Briggs PJ, Lichtman CJ, Bain EB, et al. The American Cancer Society’s Cancer Prevention Study 3 (CPS‐3): Recruitment, study design, and baseline characteristics. Cancer. 2017;123(11):2014–24. pmid:28171707
- 10. Rees-Punia E, Matthews CE, Evans EM, Keadle SK, Anderson RL, Gay JL, et al. Demographic-specific validity of the cancer prevention study-3 sedentary time survey. Medicine and science in sports and exercise. 2019;51(1):41. pmid:30095743
- 11. Diver WR, Figueiro MG, Rea MS, Hodge JM, Flanders WD, Zhong C, et al. Evaluation of a Novel Ambient Light Survey Question in the Cancer Prevention Study-3. International Journal of Environmental Research and Public Health. 2023;20(4):3658. pmid:36834353
- 12. Suzuki M, Taniguchi T, Furihata R, Yoshita K, Arai Y, Yoshiike N, et al. Seasonal changes in sleep duration and sleep problems: A prospective study in Japanese community residents. PLoS One. 2019;14(4):e0215345. pmid:30998709
- 13. Mattingly SM, Grover T, Martinez GJ, Aledavood T, Robles-Granda P, Nies K, et al. The effects of seasons and weather on sleep patterns measured through longitudinal multimodal sensing. NPJ digital medicine. 2021;4(1):76. pmid:33911176
- 14. Bierman A, Klein TR, Rea MS. The Daysimeter: a device for measuring optical radiation as a stimulus for the human circadian system. Measurement Science and Technology. 2005;16(11):2292.
- 15. Figueiro MG, Steverson B, Heerwagen J, Kampschroer K, Hunter CM, Gonzales K, et al. The impact of daytime light exposures on sleep and mood in office workers. Sleep Health. 2017;3(3):204–15. pmid:28526259
- 16. Kaaks RJ. Biochemical markers as additional measurements in studies of the accuracy of dietary questionnaire measurements: conceptual issues. The American journal of clinical nutrition. 1997;65(4):S1232–S9. pmid:9094927
- 17. Schober P, Boer C, Schwarte LA. Correlation coefficients: appropriate use and interpretation. Anesthesia & analgesia. 2018;126(5):1763–8. pmid:29481436
- 18. Mallinson DC, Kamenetsky ME, Hagen EW, Peppard PE. Subjective sleep measurement: comparing sleep diary to questionnaire. Nat Sci Sleep. 2019;11:197–206. pmid:31686932
- 19. Aili K, Åström-Paulsson S, Stoetzer U, Svartengren M, Hillert L. Reliability of actigraphy and subjective sleep measurements in adults: the design of sleep assessments. Journal of Clinical Sleep Medicine. 2017;13(1):39–47. pmid:27707448
- 20. Roenneberg T, Pilz LK, Zerbini G, Winnebeck EC. Chronotype and social jetlag: a (self-) critical review. Biology. 2019;8(3):54. pmid:31336976
- 21. Patel SR, Ayas NT, Malhotra MR, White DP, Schernhammer ES, Speizer FE, et al. A prospective study of sleep duration and mortality risk in women. Sleep. 2004;27(3):440–4. pmid:15164896
- 22. Jackson CL, Patel SR, Jackson WB, Lutsey PL, Redline S. Agreement between self-reported and objectively measured sleep duration among white, black, Hispanic, and Chinese adults in the United States: Multi-Ethnic Study of Atherosclerosis. Sleep. 2018;41(6):zsy057. pmid:29701831
- 23. Savard J, Morin CM. Insomnia in the context of cancer: a review of a neglected problem. Journal of clinical oncology. 2001;19(3):895–908. pmid:11157043
- 24. Quante M, Kaplan ER, Cailler M, Rueschman M, Wang R, Weng J, et al. Actigraphy-based sleep estimation in adolescents and adults: a comparison with polysomnography using two scoring algorithms. Nature and science of sleep. 2018;10:13. pmid:29403321
- 25. Van De Water AT, Holmes A, Hurley DA. Objective measurements of sleep for non‐laboratory settings as alternatives to polysomnography–a systematic review. Journal of sleep research. 2011;20(1pt2):183–200. pmid:20374444