Automated feature extraction from population wearable device data identified novel loci associated with sleep and circadian rhythms

Wearable devices have been increasingly used in research to provide continuous physical activity monitoring, but how to effectively extract features remains challenging for researchers. To analyze the generated actigraphy data in large-scale population studies, we developed computationally efficient methods to derive sleep and activity features through a Hidden Markov Model-based sleep/wake identification algorithm, and circadian rhythm features through a Penalized Multi-band Learning approach adapted from machine learning. Unsupervised feature extraction is useful when labeled data are unavailable, especially in large-scale population studies. We applied these two methods to the UK Biobank wearable device data and used the derived sleep and circadian features as phenotypes in genome-wide association studies. We identified 53 genetic loci with p<5×10−8 including genes known to be associated with sleep disorders and circadian rhythms as well as novel loci associated with Body Mass Index, mental diseases and neurological disorders, which suggest shared genetic factors of sleep and circadian rhythms with physical and mental health. Further cross-tissue enrichment analysis highlights the important role of the central nervous system and the shared genetic architecture with metabolism-related traits and the metabolic system. Our study demonstrates the effectiveness of our unsupervised methods for wearable device data when additional training data cannot be easily acquired, and our study further expands the application of wearable devices in population studies and genetic studies to provide novel biological insights.


Introduction
Sleep is essential for human health and well-being, and changes in sleeping patterns or habits can negatively affect health, leading to physical and mental disorders [1][2][3][4] . The corresponding sleepwake circadian rhythm is also essential for human health. Circadian rhythms are endogenous biological processes that follow a period of approximately 24 hours and are entrained by environmental stimuli such as the light/dark cycle to adjust the 24-hour cycle 5 . The circadian system is important for sleep regulation, and dysregulated sleep-wake circadian rhythms can cause diseases including sleep disorders, metabolic syndrome, and psychiatric and neurodegenerative diseases [6][7][8] . While it is vital to obtain a thorough understanding of the important roles of sleep and circadian rhythms in human health, they still remain poorly understood.
Actigraphy has been increasingly used in sleep and circadian studies, as it can provide continuous and objective activity monitoring and is low-cost and easy to wear. Actigraphy can address some of the limitations with traditional sleep diaries, including subjectivity, bias, difficulty in completion by young children or patients, and extra manual work for caregivers. While polysomnography (PSG) as the "gold standard" in sleep studies does not have the same issues as sleep logs, it is limited by high costs, in-lab setting, intrusive measures, and difficulty in long-time monitoring. The continuous and objective measures provided by actigraphy can provide reliable information on sleep 9 , and it is especially useful for studying long-duration circadian rhythms. In cases of large-scale epidemiological studies, the availability of actigraphy data provides excellent opportunities for studying population-level and individual-level sleep characteristics and circadian rhythm patterns. However, the analysis of actigraphy data remains a major obstacle for researchers.
One major challenge in the application of actigraphy is to extract sleep and circadian rhythm features without additional information, such as sleep diaries and/or PSG validation that are often unavailable or labor-intensive to collect. To extract sleep features such as sleep start, sleep end, and sleep duration, it is typical to either obtain the gold-standard PSG records or obtain sleep dairies on go-to-sleep time and wake-up time to build sleep identification algorithms [10][11][12][13] . To ease researcher's efforts in collecting additional data and still accurately infer sleep features, there is a need to develop necessary methodology to infer sleep parameters based on actigraphy in the absence of sleep records. The method will be useful in circadian studies where PSG for long-time monitoring cannot be acquired and continuous activity logs requires much manual work. It is also particularly useful in large-scale epidemiological studies where collecting PSG for all participants is unrealistic and recording sleep logs is labor-intensive.
In this paper, we analyzed data from the UK Biobank study, where accelerometer data from over 100,000 participants and genetic data from near 500,000 participants are available. We applied novel data processing methods to the accelerometer data from 90,515 participants after quality control procedures to extract sleep and circadian rhythm features. We then conducted genome-wide association studies (GWAS) and identified 19 and 34 genetic loci associated with sleep traits and circadian rhythm traits at p<5×10 -8 respectively, of which 5 and 13 loci reached the significance level p<5×10 -9 . Further tissue enrichment analysis highlights the important roles of the central nervous system and the metabolic system, thereby providing new insights into the molecular regulation and genetic basis of sleep and circadian rhythms.

Loci Associated with Sleep-Activity Traits and Circadian Traits
The Manhattan plots of GWAS results for HMM inferred sleep and activity traits are shown in Supplementary Figure S1. The heritability estimates from LD score regression 14 for mean activity levels during sleep and during wake are ~5% and ~7%, respectively. For mean activity levels during sleep, four independent regions on chromosomes 2, 5, 6, and 14 contained significant SNPs with p-value < 5×10 -8 , two of which had p-values < 5×10 -9 . The strongest association signal was on chromosome 2 at SNPs within gene MEIS1, known for association with Restless Leg Syndrome and Insomnia (Table 1). [15][16][17][18] The other three genetic loci were rs188904275 in JAKMIP2 (p-value = 3.7×10 -8 ), rs184670665 near IMPG1 (p-value = 2.4×10 -10 ), and rs73586669 near OR4E1 (pvalue = 2.4×10 -8 ), with JAKMIP2 previously found to be associated with Body Mass Index (BMI) measurements and pulmonary diseases 19 . These novel loci were not previously associated with activity levels during sleep. SNP rs7087063 in gene CELF2 on chromosome 10 was found to be associated with the HMM estimated activity variability during wake (p-value = 3.0×10 -8 ), and we note that CELF2 was previously found to be associated with Alzheimer's disease 20 (Table 1) Figure S1. The heritability estimates from LD score regression 14 for sleep start and sleep end are ~8% and ~7%, respectively. SNP rs573901234 near gene JHDM1D-AS1 on chromosome 7 was associated with short sleep duration < 5 hours (p-value = 1.4×10 -8 ). Five SNPs were associated with long sleep duration > 10 hours, among which rs74460673 at gene TMEM39B was previously found to be associated with BMI 21,22 and rs573982927 at MYO3B was associated with obesity traits 23,24 . Twenty seven SNPs were found to be associated with sleep start, including three SNPs at or near gene MEIS1 related to Restless Leg Syndrome and insomnia [15][16][17][18] and nineteen SNPs at gene BTBD9 also related to Restless Leg Syndrome 25 (Supplementary Table S1). Sleep start is also associated with one SNP at gene CYP7B1 related to BMI, two SNPs near gene HSD17B12 related to BMI 21,26,27 , two SNPs near MIR129-2 also related to BMI 21,22 and Alzheimer's Disease 28 , and two SNPs near LOC101928944 related to schizophrenia 29,30 . As for sleep end, association signals were found for five SNPs in the intergenic regions near LINC02260, which is known to be related to red blood cell measures such as cell counts and hemoglobin content 31 , and blood cell information is also known to be associated with sleep deprivation and sleep disorders [32][33][34] . Sleep end is also associated with one SNP at gene NTNG1 related to Restless Leg Syndrome 25 and BMI 21,35 , one SNP near LINC00963, and one SNP near GLRX3.
1/2-day periodicity measures the strength of day-night rhythmicity 39 and the strongest association signals (p-value < 5×10 -9 ) were detected in the intergenic regions near UBE2F-SCLY and FBXO15 and in the intronic regions at FYB1 and CFAP44, in which intergenic regions near FBXO15 were previously found to be associated with insomnia 40 and FYB1 was associated with depression 41,42 . There were also association signals for SNP clusters in the intergenic regions near   gene TNR as well as for SNPs at RASEF, TMEM132D, ERCC2, MPPED1 and near BRINP3,   LINC01287, and MLYCD (p-value < 5×10 -8 ; Supplementary Table S2). 1/3-day periodicity measures the 1/3-day rhythmicity that not only involves activities during the day but also captures activities during sleep 39 . The strongest signals (p-value < 5×10 -9 ) were identified at SNP clusters in the intergenic regions near BRINP3, URB2, GRIA1 and LOC400682 and in the intronic regions at MGAT5, C3orf20, and LINC01861, where BRINP3 was associated with BMI measurement 21,22,27 , depression 43 , and rheumatoid arthritis 44 , and GRIA1 was associated with schizophrenia 29,45,46 and circadian rhythm 47 . A cluster of five SNPs at CDH6 reached significance level at 5×10 -8 , and CDH6 was associated with resting heart rates 48 . Details on other novel SNP associations are listed in Supplementary Table S2. QQ-plots show that the estimated values of the inflation factor λ are under 1.07 and that the population structure was properly controlled for (Supplementary Figure S2).
With respect to the quality control procedures for all GWAS, the inflation factor λ was under 1.12 for all traits considered, suggesting appropriate control of population structure in the analysis (Supplementary Figure S2). LD Score intercepts estimated using LD score regression 14 were in the range of 0.99-1.02, indicating that the genomic inflation was due to polygenic architectures rather than uncorrected population structure. Further, we estimated partitioned heritability 49 using ten broad tissue categories and no tissue was significantly enriched. For all sleep time-related traits, the adrenal/pancreatic tissues were relatively more enriched than the other tissues (Supplementary Figure S3). For activity levels during sleep, the cardiovascular tissues were relatively more enriched compared to the other tissues (Supplementary Figure S3). To further investigate the relationships of sleep and circadian traits with other complex traits,   we estimated genetic correlation with a number of traits, including screen exposure, sleep, mental   health, BMI and diet, alcohol consumption, shift-work, and diseases such as respiratory diseases   and anemia (Supplementary Table S3) and we set the statistical significance threshold at 7.9 × 10 −5 through Bonferroni correction (12 sleep and circadian traits × 53 traits). For sedentary and screen exposure traits, we observed significant negative genetic correlation of time spent watching TV or using computer with the strength of circadian rhythmicity (correlation=-0.300, =2.1 × 10 −12 and correlation=-0.294, =5.7 × 10 −10 , respectively), and we also observed significant negative correlation of time spent using computer with sleep duration, sleep start (go to bed early), and activity levels during the wake status(correlation=-0.238, =2.7 × 10 −7 ; correlation=-0.353, =3.5 × 10 −5 and correlation=-0.200, =1.8 × 10 −22 respectively). There was positive genetic correlation between the time spent watching TV and activity levels during sleep (correlation=0.203, =7.3 × 10 −5 ). For BMI and diet related traits, we observed significant negative genetic correlation between BMI and weight with the strength of circadian rhythm (1-day periodicity), activity levels when awake, and sleep duration (all negative correlations with pvalues < 7.9 × 10 −5 ). We did not observe significant genetic correlation with mental health traits, alcohol consumption, shift-work, and respiratory diseases either. While the genetic correlation with anemia was not significant, there was moderate positive genetic correlation between activity levels during sleep and iron deficiency anemia (correlation=0.404, =6.8 × 10 −3 ), and iron deficiency is known to be associated with poor sleep quality 68 and restless legs syndrome 50,51 .

Genetic Correlation of Sleep and Daily Rhythm with Other Traits
We also observed significant positive genetic correlation between accelerometer-derived and self-reported sleep duration, between accelerometer-derived sleep start/end and self-reported chronotypes, and between accelerometer-derived sleep end and self-reported hypersomnia (all p-values < 7.9 × 10 −5 ). These results suggest agreement between accelerometer-derived measures and self-reported measures for sleep time and are also consistent with previous studies 52 . We did not observe significant genetic correlation of accelerometer-derived measures with sleep disorders, possibly because the ambiguity in the definition of sleep disorders. We did not observe significant genetic correlation between accelerometer-derived activity level measures and self-reported physical activity measures.
For the significant trait-pair of BMI and 1-day periodicity denoting the strength of circadian rhythm, we further conducted two-sample MR analysis using GWAS summary statistics from the GIANT study 27 . The estimated causal effects across different MR estimation methods are negative for both directions but were not statistically significant.

Cross-Tissue Transcriptome-Wide Association Analysis
We applied UTMOST 53 to perform tissue enrichment analysis in 44 tissue types and identified single-tissue and cross-tissue gene-trait associations. For single-tissue tests, 124 gene-trait pairs were identified (Supplementary Table S4). There were 43 unique genes in total and 16 gene-trait pairs were identified in more than one tissue type (p-values < 3.3 × 10 −6 after Bonferroni correction for 15,000 genes 53 ). Among them, the GLTP-sleep duration pair was identified in 38 tissue types, which is a novel association not reported in previous studies, and GLTP is related to Glycolipid Transfer Protein for protein binding and lipid binding 54 . L3MBTL2, Lethal (3) malignant brain tumor-like protein 2 related to protein binding and DNA regulation 55 , was associated with more than one trait, including activity levels during sleep and wake as well as the circadian rhythm, and the three L3MBTL2-trait pairs were all significant in the subcutaneous adipose tissues. Most genes that appeared in more than one gene-trait pair have functions in protein binding, including CEP70, GIPC2, TRAF3, ABCD2, LIMS1, SEPN1. BRSK1 and LRFN4 are related to neurotransmitter activities 55 . TREH is related to digestion and galactose metabolism 55 , and the TREH-sleep start pairs are significant in mucosa esophagus, stomach, and transverse colon, all of which belong to the digestive system. Among all 44 tissue types, subcutaneous adipose, brain anterior cingulate cortex, and skeletal muscles are most enriched with seven genetrait pairs (Supplementary Figure S4). Tissue enrichment analysis through UTMOST identified novel genes associated with sleep and circadian traits and highlighted the relevance of tissues in the central nervous system and the metabolic system in sleep and circadian regulation.  Table S5).
Three genes, CA7, DYNC1LI1, and ELMOD2, were associated with more than one trait. CA7 was associated with activity levels during sleep and activity levels during wake, and in previous studies its expression in brain was associated with neurological disorders 56 . DYNC1LI1 was associated with sleep duration and daily rhythmicity and its functions are RNA binding 57 and protein binding 58 . ELMOD2 was associated with activity variability during wake, sleep duration, sleep end and daily rhythmicity and its function may be related to respiratory diseases 59 . Among genetrait pairs, GRIA1 was also identified in GWAS analysis and was associated with circadian rhythms and sleep traits in previous studies 40,47 , and its functions involve amyloid-beta binding in hippocampal neurons and ionotropic neurotransmitter receptor activities 60 . Gene Ontology (GO) enrichment analysis using DAVID 61 did not identify significant enrichment among different GO terms.

Discussion
Using information related to sleep and activity inferred from the UK Biobank accelerometer data, we identified five genetic loci associated with HMM derived sleep related traits. The association of activities during sleep with restless leg syndrome, obesity, and pulmonary diseases may be attributed to the fact that individuals with these traits may be less likely to sleep well at night and are prone to being restless. Sleep start and end are also strongly associated with Restless Leg Syndrome, as people with this disorder may have difficulty falling asleep or staying asleep, and because symptoms worsen at night only with a short period in early morning that is symptomfree, people with Restless Leg Syndrome tend to have late sleep onset 62 . Sleep duration, sleep start and sleep end are all associated with BMI, consistent with the established association between sleep and obesity. As previous studies 63 discussed how decreased sleep duration can elevate obesity risks and how sleep disorders can increase risks for chronic health conditions, our study also provides evidence of genetic correlations among sleep, metabolism and obesity. For sleep end, it is interesting to identify SNPs near LINC02260, which was previously found to be associated with red blood cell related measures 31 . It is known from previous studies that sleep deprivation and sleep disorders can lead to changes in metabolism with altered red blood cell measurements such as increased red blood cell counts [32][33][34] . Thus, our study also confirmed the important link between sleep and blood cell metabolism, and the complex relationship remains to be studied.
For circadian and daily rhythm analysis, we were able to identify 13 loci associated with periodic traits. The SNPs located at FBXO15 and GRIA1 were associated with circadian rhythms and sleep traits in previous studies 40,47 , and the associations of SNPs in XKR4 and CDH6 with thyroid stimulating hormones and resting heart rates can be explained by the circadian oscillation natures of hormones and heart rates. 64,65 These association results suggest that the Penalized Multi-band Learning approach 39 in extracting clinically meaningful circadian features from objective physical activity measures. Furthermore, we identified novel associations for SNPs at or near FYB1, GRIA1, CDH6 and BRINP3, which are associated with BMI and mental problems, suggesting shared genetics between sleep-wake circadian rhythms and multiple traits.
Our study found that sleep and physical activity are closely related to mental and neurological problems, as the identified SNPs/loci from our GWAS results were also associated with traits including depression, schizophrenia and Alzheimer's disease. Our cross-tissue transcriptome-wide association analysis also implied the important role of the central nervous system in physical activity, sleep, and circadian rhythm. Our results are consistent with previous UK Biobank studies 66,67 and confirmed the shared genetic architectures of sleep and physical activity with mental health and neurological disorders.
Our study shows strong evidence for shared genetics of activity and circadian rhythm with metabolism-related traits and the metabolic system. In addition to GWAS results that suggest shared genetic architecture with BMI, genome-wide genetic correlation estimates also provide strong evidence that a higher BMI is associated with lower physical activity levels and weaker daily rhythmicity. Furthermore, our UTMOST cross-tissue transcriptome-wide association analyses also implicate that the adipose tissues and skeletal muscle tissues, which are mostly related to metabolism and physical activity, may have important roles as they together with the brain cortex are most enriched. Our study suggests the complex interplay among activity, circadian rhythm, metabolic phenotypes and the central nervous system.
We also note that the gene TREH, which was identified in UTMOST tissue enrichment analysis and whose function is related to digestion and galactose metabolism, is associated with sleep start in esophagus, stomach and colon that all belong to the digestive system, suggesting the link between digestion and sleep. From daily experience, a heavy dinner or an empty stomach may affect the ability to fall asleep, and on the other hand, insomnia may also affect the digestive system. It is known from previous studies 68-70 that sleep problems are associated with gastrointestinal problems and digestive disorders such as gastroesophageal reflux disease and irritable bowel syndrome. We are just beginning to dissect the underlying genetic architecture of sleep, and details on the complex relationships of sleep with the central nervous system and the metabolic system remain to be studied.
This study demonstrates the utility of device measures by presenting applications of population-level objective physical activity data in genetic studies, using novel methods to effectively extract sleep, activity and daily rhythm features. Our results show that accelerometerderived sleep duration and sleep start/end correlate well with self-reported sleep duration, chronotypes, and hypersomnia. The discrepancy between accelerometer-derived and self-reported physical activity measures are likely due to self-report bias, as people tend to overestimate their daily activity and underestimate sedentary behaviors. The HMM-based algorithm can classify sleep/wake epochs, and the estimated HMM parameters can further be used as sleep and activity related features directly: mean activity levels and activity variability during sleep or wake can characterize individual sleep-activity patterns effectively 71 . In a similar manner, the Penalized Multi-band Learning approach 39 can identify the population-level dominant periodic information that depicts daily rhythms, and further, individual variations in the strength of periodic signals can be utilized as circadian and daily rhythm features in further analyses. Our study is the first to extract and utilize periodic features from objective physical activity data in genetic studies to examine the genetic architecture of circadian and daily rhythms. In summary, using large-scale population studies like the UK Biobank study with objective physical activity data and genetic data available, we were able to extract meaningful sleep, activity and circadian variables from time-series data using automated algorithms and further conduct genetic analysis to deepen our understanding of the underlying genetic structure. Our study demonstrates the effectiveness of our methods and the utility of device-based activity data in sleep and circadian studies. Our methods can help expand the application of wearable device data in health studies and further provide novel insights into the shared genetic architectures of sleep, activity, and circadian rhythms with metabolic and neurological traits.

Data
Data were collected from the UK Biobank study 72 , a longitudinal population-based study with around 500,000 participants living in the UK. Genetic data from 487,409 participants were available when we accessed the UK Biobank data 72,73 in September, 2017. The dataset includes around 96 million single nucleotide polymorphisms (SNPs), including imputation based on the UK10K haplotype, 1000 Genomes Phase 3, and Haplotype Reference Consortium (HRC) reference panels 73 . We applied filters to exclude SNPs with Minor Allele Frequency < 0.1%, Hardy-Weinberg equilibrium < 1e-10 and imputation quality score (UKBB information score 73 ) < 0.8. After these steps, a total of 11,024,754 SNPs remained in further analyses.
Besides genetic data, a subset of 103,712 UK Biobank participants agreed to have their objective physical activity data 74 collected, and they were asked to wear an Axivity AX3 wrist accelerometer for seven consecutive days from 2013 to 2015. The device has been demonstrated to provide equivalent output to the GENEActiv accelerometer, which has been validated against free-living energy expenditure assessment methods. [75][76][77] This study was covered by the general ethical approval for UK Biobank studies from the National Research Ethics Service by National Health Service on 17th June 2011 (Reference 11/NW/0382). The accelerometer dataset that we acquired in September of 2017 consists of data in the activity count format summarized every fivesecond epoch from 103,706 participants. We applied similar quality control procedures as other accelerometer studies 78 . We excluded individuals with flagged data problems, poor wear time, poor calibration, recorded interrupted periods, or inability to calibrate activity data on the device worn itself requiring the use of other data. We also excluded individuals if the number of data recording errors was greater than 3rd quartile + 1.5×IQR. After pre-processing, 92,631 individuals remained, and 90,515 of them had genotyping data that were available for further genetic analyses.

Identification of Loci Associated with Sleep and Circadian Rhythms
Sleep and circadian rhythm phenotypes were derived from accelerometer data. For sleep, because it is a large-scale population study and no validation sleep logs are available, we need unsupervised algorithms to extract sleep features from accelerometer data. Specifically, we developed an unsupervised sleep-wake identification algorithm 71 based on Hidden Markov Model (HMM) 79,80 to infer the sequence of "hidden states" of sleep or wake for each individual. This HMM can be directly applied to summary activity count data, which are widely used activity data formats that can save storage space, lengthen the duration of use of wearable devices, and increase computational efficiency. HMM assumes that in the sleep state or the wake state, activity counts follow different distributions as activity counts tend to be fewer in the sleep state compared to those in the wake state 71 . We used log transformation of activity counts and assumed that they follow different Gaussian distributions to infer the sequence of "hidden states" 71 .
In our model, we consider the log transformed data: log(count+1) as the observed data, because the large range of observed activity counts from zero to several thousand per epoch poses both statistical and computational challenges in data analysis. Our empirical results suggest that the HMM algorithm works well for the log transformed data. We observe activity count data from time 1 to time : ( ) = { 1 , 2 , … , }. Let ( ) = { 1 , 2 , … , } denote the sequence of the corresponding hidden states across these T time points, where each Xi can be one of the two possible hidden states = { 1 , 2 } in each epoch, with 1 denoting the sleep state and 2 denoting the wake state. We assume that Xi follows a Markov model, that is the hidden state +1 at time +1 solely depends on , and the observation at time solely depends on the hidden state .
denotes the 2 by 2 transition probability matrix, in which represents the transition probability from state to state . The emission probability ( | ) denoted by depends on the state of . If = 1 in the sleep state, we assume that the log transformed count follows zeroinflated truncated Gaussian distribution, which is truncated from 0 to the left. It has a zero component because sleep is associated with rare movements and activity measurements during sleep often involve many zeros. Therefore: where is the probability of extra zeros, 1 is the mean, 1 is the standard deviation, (⋅) is the probability density function of the standard normal distribution, and Φ(⋅) is its cumulative distribution function.
If = 2 in the wake state, we assume that the log transformed count follows the Gaussian distribution: where 2 is the mean and 2 is the standard deviation of the Gaussian distribution.
Therefore, the set of parameters for the emission probability is where is the tuning parameter and controls the balance between the L1 and L2 norms. Note that , 's are nonnegative and thus we do not need to take the absolute value for the L1 norm. By setting large enough, all diagonal elements of Θ, namely all , ′ , are suppressed to zero and no periodicities are selected. As decreases, some , ′ become nonzero and they correspond to the most dominant periodicities that are selected sequentially according to how dominant they are.
To minimize ( ), we take the partial derivative of ( ) with respect to each , : To identify genetic loci associated with each sleep and circadian phenotype, we conducted genome-wide association analysis, using PLINK (version 1.9) 88 . We included age, sex, and the first 20 principal components as covariates when fitting linear models, and we report statistically significant loci using a traditional threshold of 5×10 -8 . We also highlight results from a more stringent threshold of 5×10 -9 to take into account multiple testing using Bonferroni correction, which is the same threshold suggested in previous studies 67,89 .

Genetic Architecture of Sleep and Circadian Rhythms
To estimate heritability 14 , we applied LD score regression 14,49 with the LDSC tool. We also conducted partitioned heritability analysis 49 across tissue categories using the same tool.
Significant enrichments for individual traits were identified using the Bonferroni corrected threshold of p < 5 × 10 −4 (10 traits ×10 tissue types). For significantly correlated trait-pairs, we further investigated causal relationships by conducting bi-directional Mendelian Randomization (MR) analyses. We did not analyze trait-pairs related to sleep and activity, which have been studied elsewhere 52, 67 , but primarily focused on the circadian rhythm. We performed two-sample MR analysis using publicly available GWAS summary data extracted from the MR-Base web platform 91 . We used leave-one-out analysis and single-SNP analysis as sensitivity analyses and considered the inverse-variance weighted (IVW) approach, MR-Egger 92 , weighted median estimation and weighted mode estimation methods to examine whether there are consistent MR results across methods.

Cross-Tissue Transcriptome-Wide Association Analysis
To investigate functional and biological mechanisms underlying sleep-activity and circadian rhythms, we applied UTMOST 53 that utilizes GWAS summary statistics and integrates eQTL information to perform tissue enrichment analysis in 44 tissue types and identify single-tissue and cross-tissue gene-trait associations. The cross-tissue gene-trait association is evaluated via a joint test summarizing single-tissue association statistics and quantifying the overall gene-trait association. The UTMOST 53 p-value threshold after Bonferroni correction for 15,120 genes is 3.3 × 10 −6 . Gene ontology enrichment analysis was further conducted using DAVID 61 .