Changes in the quantity and quality of time use during the COVID-19 lockdowns in the UK: Who is the most affected?

We investigated changes in the quantity and quality of time spent on various activities in response to the COVID-19-induced national lockdowns in the UK. We examined effects both in the first national lockdown (May 2020) and the third national lockdown (March 2021). Using retrospective longitudinal time-use diary data collected from a demographically diverse sample of over 760 UK adults in both lockdowns, we found significant changes in both the quantity and quality of time spent on broad activity categories (employment, housework, leisure). Individuals spent less time on employment-related activities (in addition to a reduction in time spent commuting) and more time on housework. These effects were concentrated on individuals with young children. Individuals also spent more time doing leisure activities (e.g. hobbies) alone and conducting employment-related activities outside normal working hours, changes that were significantly correlated with decreases in overall enjoyment. Changes in quality exacerbated existing inequalities in quantity of time use, with parents of young children being disproportionately affected. These findings indicate that quality of time use is another important consideration for policy design and evaluation.


Study context
To slow the spread of COVID-19, the UK underwent a three-month national lockdown from 26 March to 23 June 2020 ('first national lockdown'). The lockdown measures resulted in drastic changes to most daily routines: all schools and 'non-essential' shops were closed, UK residents were not allowed to leave their home except for a few specific reasons (such as buying necessary supplies), and anyone who was not classified as a 'key worker' (such as NHS staff) was instructed to work from home. Similar restrictions were in place during the UK's third national lockdown, lasting from 6 January to 12 June 2021. Table S1 compares key public health measures implemented during both lockdowns (at the time of our survey), using Oxford's COVID-19 Government Response Tracker data [1]. The key difference between the first and third lockdowns is that stay-at-home requirements were recommended during the first lockdown, but required during the third lockdown.
Note that the UK's second national lockdown, from 5 November to 2 December 2020, was substantially less restrictive, with schools and non-essential shops remaining open, and no stay-at-home or work-from-home mandate. Using COVID-19 Government Response Tracker data [1] to put these differences in perspective, the average stringency of measures during the first and third lockdowns was 77.2 and 84.8 on a 0-100 scale respectively, compared to 67.3 during the second lockdown. We therefore focused our study on the first and third national lockdowns.
Table S1. Comparison of key public health measures during the first and third lockdown, using data and classifications from the COVID-19 Government Response Tracker. Measures correspond to the weeks in which we conducted our survey.

Data collection
We collected data in two waves. All respondents were paid a modest incentive of 5 GBP per hour for their participation.

Wave 1
Wave 1 was conducted on 13-19 May 2020, 7 weeks into the UK's first national lockdown. In this wave, we collected sociodemographic information and time use diaries for the first two timepoints: pre-pandemic (defined as February 2020) and the first national lockdown (26 March to 23 June 2020).
We used the survey platform Prolific to recruit individuals who were over 18, had lived in the UK since December 2019, and were still in the labor market (including unemployed and searching for work) in February 2020. Prolific is a reputable survey company used primarily by researchers for surveys and experiments. Compared to in-person data collection methods or similar platforms such as MTurk, Prolific has been shown to deliver higher or comparable data quality [2] [3].
To ensure we recruited a demographically diverse sample and to improve the generalizability of our results, we requested that our sample match the composition of the UK population in gender, age, and ethnicity (see Prolific for further details of their recruitment process and criteria used: https://researcher-help.prolific.co/hc/en-gb/articles/360019236753-Representative-Samples-on-Prolific).
Prolific contacted a total of 1239 individuals. 1060 of these individuals (85.6%) submitted a complete response to the Wave 1 questionnaire, 42 individuals (3.4%) submitted unusable responses (such as incomplete or missing time use diaries), and 137 individuals (11.1%) did not respond.

Wave 2
We contacted the same respondents on 1-7 March 2021 (Wave 2), 7 weeks into the UK's third national lockdown. We collected time use diaries for this timepoint as well as information about changes in respondents' employment situation.
Of the 1036 respondents from Wave 1, 762 filled in the Wave 2 questionnaire, giving a response rate of 74%. Table S2 compares our longitudinal sample to the full sample and shows that at the 5% level, the observable characteristics of Wave 1 and Wave 2 respondents were similar.
Table S2. Comparison of Wave 1 and Wave 2 respondents. Covariates were defined as binary variables that equal 1 if the respondent satisfied the specified condition. 'White' includes mixed-race respondents. 'Has young child' equals 1 if the respondent lives with at least one child aged 11 or under. 'Tertiary degree' equals 1 if the respondent obtained any post-secondary educational qualification. *** p < 0.01, ** p < 0.05, * p < 0.1.
To conduct a more formal check for attrition bias, we ran a probit regression where the outcome variable equals 1 for respondents that participated in both waves, and 0 otherwise, using the baseline characteristics reported in Table S2 as covariates. Estimates are reported in Table S3. Older respondents were significantly more likely to participate in both waves, but we did not find evidence of systematic variations in participation across gender, ethnicity, education, or household composition. As a robustness check, we repeated our main analysis using inverse probability weights to account for attrition among younger respondents and obtained qualitatively similar results (Section 7).

Table S3. Estimates of correlations between sociodemographic characteristics and participation in both survey waves. A positive coefficient indicates the covariate was associated with an increased probability of participating in both waves. 'White' includes mixed-race respondents. 'Has young child' equals 1 if the respondent lives with at least one child aged 11 or under. 'Tertiary education degree' equals 1 if the respondent obtained any post-secondary educational qualification. Robust standard errors are reported in brackets. *** p < 0.01, ** p < 0.05, * p < 0.1.

Longitudinal sample
Our longitudinal sample consisted of individuals who completed at least one time use diary for each timepoint (N=766). Table S4 shows descriptive statistics comparing characteristics of our respondents with those participating in a nationally representative longitudinal survey (Understanding Society). As our sample only included adults who were in the workforce from January-May 2020 (including those unemployed but looking for a job), for comparability we also present characteristics of the corresponding subset from Understanding Society. Compared to Understanding Society, our sample contained similar proportions of respondents in some age brackets (aged 25-29, 30-34, 40-44, 50-54) and those who identified as white.
Since our sample overrepresented adults with a tertiary degree and older adults, there may be concerns that our results are specific to the group of individuals we recruited. As a robustness check, in Section 8, we repeated our main analysis using calibration weights so that our sample was reweighted to match the weighted Understanding Society in-workforce characteristics reported in Table S4 (gender, age, ethnicity, education, and household composition). We obtained qualitatively similar findings, suggesting that our results are not specific to our particular sample composition.
For each timepoint (pre-pandemic, first lockdown, third lockdown), respondents retrospectively filled in time use diaries for their most recent workday (if applicable) and most recent non-workday. Diaries for the first two timepoints were completed in one sitting (Wave 1), and diaries for the third timepoint were completed in another sitting (Wave 2).
Time use diaries record the chronological sequence of activities that respondents did over a 24-hour period through a series of 'episodes'. The 24-hour period we chose was midnight to 11:59pm. The time use diary format and structure followed that of the 2014/15 UK Time Use Survey (https://www.timeuse.org/node/10833). For each episode within a time use diary, we asked respondents to fill out: (1) the episode start and end time (with a minimum duration of 10 minutes per episode); (2) the main activity of that episode; (3) the secondary activity that the respondent was engaged in simultaneously (if any); (4) whom they did the activity with; (5) where they did the activity; (6) whether they used a device for that episode; (7) how much they enjoyed the activity (on an increasing scale of 1 to 7). Figure S5 shows a screenshot of the format used to record episode-specific information.
Figure S5. Screenshot of one time use 'episode'. Clicking on the arrows leads to dropdown menus with pre-specified options. For the full list of options, see Table S5.
For the main and secondary activity, respondents chose from a pre-defined list of 42 activities (derived from the 2014/15 UK Time Use Survey), each falling under one of 12 broad activity domains. For our main analyses, we further aggregated these activities into 4 categories: Housework, Employment, Leisure, and Subsistence (40 activities in total, excluding travelling and studying). Table S6 shows the mapping between individual activities, activity domains, and categories.
To ensure that respondents recorded information in a comparable way, before filling in their time use diaries, respondents were given written guidelines and examples of how to enter in episode-specific information. Respondents also had to correctly complete three fictional diary episodes based on specific information provided before they could proceed to their time use diaries.

Time use diary cleaning
This section describes specific issues that arose in the raw diary data and how we dealt with them.
Missing activities. Some episodes had missing main activities. If the main activity was missing and the secondary activity was not missing, we recoded the secondary activity as the main activity. If the episode was missing a main activity but not missing the start and end times, we checked if the subsequent or preceding rows had non-empty activities and missing or equivalent start or end times. If so, we replaced the missing activities (and other information for that episode) with the activities in the subsequent or preceding row. For example, if row began at 16.00 and ended at 17.30 and had a missing main activity and missing secondary activity, and if row + 1 had missing times but had cooking and eating recorded as the main and secondary activities respectively, we replaced row with activities from row + 1.
Missing start and end times. There were some episodes that had missing start or end times. If the episode was missing a start time, we replaced the start time with the end time of the preceding episode, or 24.00 if the episode was the last in that diary day. If the episode was missing an end time, we replaced the end time with the start time of the subsequent episode, or 0.00 if the episode was the first in that diary day. We dealt with entries that both started and ended at 0.00 or 24.00 in a later step. If episode had a missing end time (but had a start time), and episode + 1 had a missing start time (but had an end time), we set the duration of the episode to be 10 minutes.
Incorrectly recorded AM and PM times. We provided respondents with a 24-hour clock to record the start and end times of each episode. There were episodes where respondents mixed up the AM or PM nature of the clock. For example, an episode that spanned 14.00-15.15 may be followed by an episode that spanned 3.15-4.00. In these cases, we adjusted the times to be consistent with the sequence of activities reported by the respondent.
Overlapping episodes. Within each diary, there were consecutive episodes that overlapped. This could happen in the following ways:
• Two episodes had the same start and end time. If the secondary activity was blank for both episodes, we combined the episodes into one single episode where the activity from the first entry was the main activity and the activity from the second entry became the secondary activity. If the secondary activity was non-blank for only one episode, we kept the episode with the secondary activity and used the other entry to impute any missing activity characteristics. If the secondary activity was non-blank for all overlapping episodes and the main activity was the same, we adjusted the end time of the first entry and the start time of the second entry so both episodes had equal duration. If the secondary activity was non-blank for all overlapping episodes and the main activities differed across overlapping episodes, we combined the overlapping episodes into one episode, where the main activity from the second episode became the secondary activity for the combined episode.
• Two episodes had the same start times but different end times (e.g. episode spanned 15.00-16.00 while episode + 1 spanned 15.00-16.30). If the overlapping episodes had the same main and secondary activity, we kept the episode that ended later. If the overlapping episodes did not have the same main and secondary activities, we kept these overlapping episodes, and in a later step we adjusted the end time of episode and the start time of episode + 1 by equal amounts so that they did not overlap. In the example above, the times would be adjusted to be 15.00-15.45 for episode and 15.45-16.30 for episode + 1.
• Two episodes had different start times but the same end time (e.g. episode spanned 15.00-16.00 while episode + 1 spanned 15.30-16.00). If the episodes had the same main and secondary activity, we kept the episode that started earlier. If the episodes did not have the same main and secondary activities, we kept these overlapping episodes, and in a later step we adjusted the end time of episode and the start time of episode + 1 by equal amounts so that they did not overlap. In the example above, the times would be adjusted to be 15.00-15.45 for episode and 15.45-16.00 for episode + 1.
• Two episodes had different start and end times but the time intervals of episode and episode + 1 overlapped (e.g. episode spanned 15.00-16.00 while episode + 1 spanned 15.30-16.30). If the overlapping episodes had the same main and secondary activity, we combined both entries into one episode. If the episodes had different activities, then we adjusted the end time of episode and the start time of episode + 1 by equal amounts so that they did not overlap. In the example above, the times would be adjusted to be 15.00-15.45 for episode and 15.45-16.30 for episode + 1.
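The equal-adjustment rule for partially overlapping episodes can be illustrated with a minimal Python sketch. It is not the authors' cleaning code: the function name `split_overlap` and the representation of an episode as a (start, end) pair in minutes since midnight are assumptions made for illustration.

```python
def split_overlap(first, second):
    """Resolve a partial overlap between two consecutive episodes.

    Each episode is a hypothetical (start, end) pair in minutes since
    midnight. The end of `first` and the start of `second` are each moved
    by half of the overlap, so the episodes meet in the middle of it.
    """
    start1, end1 = first
    start2, end2 = second
    overlap = end1 - start2
    if overlap <= 0:
        return first, second  # episodes do not overlap; leave unchanged
    # Integer division keeps times in whole minutes; for an odd overlap the
    # two adjustments differ by one minute.
    boundary = start2 + overlap // 2
    return (start1, boundary), (boundary, end2)


# Example from the text: 15.00-16.00 followed by 15.30-16.30.
adjusted = split_overlap((15 * 60, 16 * 60), (15 * 60 + 30, 16 * 60 + 30))
print(adjusted)  # ((900, 945), (945, 990)), i.e. 15.00-15.45 and 15.45-16.30
```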
Missing interval between 2 episodes. There were some episodes that had a positive time gap between them (e.g. episode spanned 15.00-16.00 and episode + 1 spanned 16.30-18.00). If the missing time interval between the end time of episode and the start time of episode + 1 was less than 60 mins, we adjusted the end time of episode to equal the start time of episode + 1. If the missing time interval was over 60 minutes, then we adjusted the end time of the episode and the start time of episode + 1 to meet halfway.
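As a hedged illustration of the gap rule, the sketch below closes a positive gap between two consecutive episodes. The function name `close_gap` and the minutes-since-midnight representation are hypothetical. The text does not say how a gap of exactly 60 minutes was handled; treating it like the longer case is an assumption of this sketch.

```python
def close_gap(first, second, threshold=60):
    """Close a positive time gap between two consecutive episodes.

    Episodes are hypothetical (start, end) pairs in minutes since midnight.
    Gaps shorter than `threshold` are absorbed by the first episode; longer
    gaps are split so that the two episodes meet halfway.
    """
    start1, end1 = first
    start2, end2 = second
    gap = start2 - end1
    if gap <= 0:
        return first, second  # no gap to close
    if gap < threshold:
        # Short gap: extend the first episode up to the start of the second.
        return (start1, start2), second
    # Long gap (assumption: a gap of exactly `threshold` is treated the same).
    boundary = end1 + gap // 2
    return (start1, boundary), (boundary, end2)


# Example from the text: 15.00-16.00 followed by 16.30-18.00 (30-minute gap).
print(close_gap((900, 960), (990, 1080)))  # ((900, 990), (990, 1080))
```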

Same start and end time. There were some episodes that began and ended at the same time. If an episode was not the first entry in a diary day but had the same start and end time, we adjusted the start time to be 10 minutes earlier and the end time of the preceding episode to be 10 minutes earlier. If an episode was not the last entry in a diary day but had the same start and end time, we adjusted the end time to be 10 minutes later and the start time of the subsequent episode to be 10 minutes later.
Imputing starting or ending sleep episodes. Some diaries did not start at 0.00 or end at 24.00 because respondents did not include their sleep episodes (e.g. the first entry of the day started at 7.00 with the main activity 'eating'). In these cases, we added an entry where the main activity was 'sleeping'. The same adjustment was made by the UKTUS. We also imputed the characteristics of these episodes as follows: secondary activity = none, where = at home, used a device = no, with whom and episode-specific enjoyment = modal answer across all recorded sleep episodes by that respondent for that timepoint. Table S7 reports summary statistics for the time use diaries after cleaning. Only a very small proportion of episodes needed cleaning, and the proportion of cleaned episodes was similar across timepoints.
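The padding step can be sketched as follows. The dictionary field names (`start`, `end`, `activity`, `secondary`, `where`, `device`) are hypothetical, times are in minutes since midnight, and the modal imputation of 'with whom' and enjoyment is omitted for brevity.

```python
DAY_END = 24 * 60  # 24.00 expressed in minutes since midnight


def sleep_episode(start, end):
    """Build an imputed sleep episode with the fixed characteristics."""
    return {"start": start, "end": end, "activity": "sleeping",
            "secondary": None, "where": "at home", "device": False}


def pad_with_sleep(episodes):
    """Return a copy of a diary day that spans the full 0.00-24.00 period.

    `episodes` is a chronologically ordered list of episode dictionaries.
    A sleep episode is added before the first entry and/or after the last
    entry when the diary does not reach the day boundaries.
    """
    padded = list(episodes)
    if padded and padded[0]["start"] > 0:
        padded.insert(0, sleep_episode(0, padded[0]["start"]))
    if padded and padded[-1]["end"] < DAY_END:
        padded.append(sleep_episode(padded[-1]["end"], DAY_END))
    return padded


# Example: a diary whose first entry starts at 7.00 and last entry ends at 23.00.
day = [{"start": 420, "end": 450, "activity": "eating"},
       {"start": 450, "end": 1380, "activity": "paid work"}]
print([(e["start"], e["end"], e["activity"]) for e in pad_with_sleep(day)])
```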

Summary statistics of time use diaries
As outlined in our pre-analysis plan, only individuals with at least one complete time use diary in each timepoint were counted in our final sample. A complete time use diary is defined as having 3 or more entries in one day (after data cleaning).
We obtained a total of 3982 time use diaries: 1417 for the pre-pandemic timepoint, 1247 for the first national lockdown, and 1318 for the third national lockdown. Since the time use diaries were completed retrospectively, and the pre-pandemic information was based on recall from three months earlier (respondents completed information on February 2020 in a May 2020 survey), there may be concerns that recall bias would particularly affect the pre-pandemic data.
To investigate this possibility, we compared the mean pre-pandemic time spent on each activity domain (specified in Table S6) with the mean times obtained from a nationally representative survey (the 2014/15 UK Time Use Survey). This approach followed that of other COVID-19 studies on time use, which used UK data from 2014-2016 as the pre-pandemic baseline [4][5][6]. Table S8 shows the mean time spent (hours per day) in each broad activity category used for our main analysis. Since the UKTUS was designed to be nationally representative, to make valid comparisons we re-weighted our data using calibration weights to match the composition of the Understanding Society in-workforce sample (see Section 8 for details of the methodology). Comparing our weighted data with the UKTUS in-workforce weighted data (columns in bold), the means across all broad activity categories were similar for both workdays and non-workdays ('Non-WD'). Therefore, we could proceed with some confidence in the reliability and external validity of our pre-pandemic baseline data.
Table S8. Comparison of mean time spent (hours per day) on broad activity categories during workdays and non-workdays (Non-WD) in our data and the 2014/15 UK Time Use Survey (UKTUS). 'In workforce' is the subset of UKTUS respondents who were employed or seeking work. Time spent across specific activities was aggregated into broad activity categories according to the classification in Table S6. Our data was re-weighted using calibration weights to match the composition of a nationally representative sample (Understanding Society in-workforce respondents). The UKTUS data was weighted to account for nonresponse, using weights provided by the UKTUS.
For each diary day, we calculated the total time spent in each activity category by adding up time spent across specific activities (main activity only) using the classification in Table S6.
Since a respondent completed up to 2 diary days per timepoint, we then obtained a single value for each timepoint by dividing the total time spent by the total number of applicable diary days: total time spent on housework, leisure, and subsistence were divided by 2 if a respondent completed both a workday diary and non-workday diary; total time spent on employment was not divided by 2 because there is at most one applicable diary day per timepoint. Note that the total time spent across these 4 categories may add up to less than 24 hours because we excluded travelling and studying from our main analysis.
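A minimal sketch of this per-timepoint aggregation is given below. The function name `hours_per_timepoint`, the category labels, and the input layout (one dictionary of category totals per completed diary day) are assumptions for illustration rather than the authors' implementation.

```python
def hours_per_timepoint(diary_days, category):
    """Collapse one respondent's diary days for a timepoint into one value.

    `diary_days` is a hypothetical list with one dictionary per completed
    diary day (workday and/or non-workday), mapping a broad category to the
    hours spent on it as a main activity.
    """
    total = sum(day.get(category, 0.0) for day in diary_days)
    if category == "employment":
        # Not averaged: at most one applicable diary day per timepoint.
        return total
    # Housework, leisure and subsistence: average over completed diary days.
    return total / len(diary_days)


# Example with made-up numbers: one workday diary and one non-workday diary.
days = [{"housework": 1.0, "employment": 8.0},
        {"housework": 3.0, "employment": 0.0}]
print(hours_per_timepoint(days, "housework"))   # 2.0
print(hours_per_timepoint(days, "employment"))  # 8.0
```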

Quality of time use
We used episode-specific information to construct 4 indicators for the quality of time use:
• Multitasking: For each respondent and timepoint, we calculated the total time spent on episodes that contained both a main and secondary activity, where the main and secondary activities belonged to different broad categories (e.g. employment as main activity, housework as secondary activity).
• Leisure time spent alone: For each respondent and timepoint, we calculated the total time spent on episodes where the activity category was 'leisure' and the episode-specific characteristic 'with whom' was 'alone', divided by the total number of episodes (leisure and non-leisure episodes).
• Increase in unusual work hours: Unusual work hours was defined as any employment-related activity conducted outside standard working hours (the time window of 8.30-17.30 on a workday), which included employment-related activities conducted on a non-workday and job searching activities undertaken by unemployed respondents. The time window was determined by taking the median start and end time of employment activities across all respondents' pre-pandemic workday diaries.
• Increase in unusual housework hours: Unusual housework hours was defined as any housework-related activity conducted within standard working hours (8.30-17.30 on a workday).
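To make the 'unusual work hours' definition concrete, the sketch below counts the minutes of a single employment episode that fall outside the 8.30-17.30 window. The function name and the minutes-since-midnight representation are hypothetical; the window itself is taken from the definition above.

```python
WINDOW_START = 8 * 60 + 30   # 8.30 in minutes since midnight
WINDOW_END = 17 * 60 + 30    # 17.30 in minutes since midnight


def unusual_work_minutes(start, end, is_workday=True):
    """Minutes of an employment episode outside standard working hours.

    On a non-workday, the whole episode counts as unusual. On a workday,
    only the part outside the 8.30-17.30 window counts.
    """
    duration = end - start
    if not is_workday:
        return duration
    # Length of the intersection between the episode and the window.
    inside = max(0, min(end, WINDOW_END) - max(start, WINDOW_START))
    return duration - inside


print(unusual_work_minutes(8 * 60, 10 * 60))          # 30 (8.00-8.30)
print(unusual_work_minutes(17 * 60, 19 * 60))         # 90 (17.30-19.00)
print(unusual_work_minutes(10 * 60, 11 * 60, False))  # 60 (non-workday)
```

Under the same assumptions, unusual housework minutes on a workday would be the complement: the `inside` quantity computed for a housework episode.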

Enjoyment
For each timepoint, we calculated a single measure of enjoyment by aggregating episode-specific enjoyment (measured on a 1-7 Likert scale) across all episodes and diary days, weighted by the duration of time spent on each episode. To mitigate issues with interpersonal comparability of levels of enjoyment [7], we instead calculated within-person differences in aggregate ('overall') enjoyment over the timepoints considered.
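One way to read this aggregation is as a duration-weighted mean of episode-level enjoyment, followed by a within-person difference between timepoints. The sketch below follows that reading; the weighted-mean form, the function names, and the example numbers are assumptions, not values from our data.

```python
def overall_enjoyment(episodes):
    """Duration-weighted mean enjoyment for one respondent and timepoint.

    `episodes` is a hypothetical list of (duration_minutes, enjoyment)
    pairs pooled across all episodes and diary days, with enjoyment on the
    1-7 scale.
    """
    total_minutes = sum(duration for duration, _ in episodes)
    weighted_sum = sum(duration * enjoyment for duration, enjoyment in episodes)
    return weighted_sum / total_minutes


def enjoyment_change(pre_pandemic, lockdown):
    """Within-person change in overall enjoyment relative to pre-pandemic."""
    return overall_enjoyment(lockdown) - overall_enjoyment(pre_pandemic)


# Made-up example: longer episodes carry more weight than shorter ones.
pre = [(60, 6), (120, 3)]       # overall enjoyment 4.0
lockdown = [(90, 2), (90, 4)]   # overall enjoyment 3.0
print(enjoyment_change(pre, lockdown))  # -1.0
```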

Covariates
All regressions included the following covariates, which we also obtained via our online survey:
• Female: A binary variable that equals 1 if the respondent identified as female.
• Living with child under 11: A binary variable that equals 1 if the respondent reported living with at least one child aged 11 or under.
• Working from home: A binary variable that equals 1 if the respondent reported a non-zero percentage of time spent working from home during the period considered. We included two work-from-home indicator variables, corresponding to the first national lockdown (26 March 2020 - 23 June 2020) and third national lockdown (6 January 2021 - 12 June 2021) respectively.
• Education: A binary variable that equals 1 if the respondent's highest educational attainment was a postsecondary degree, which included 2-year postsecondary qualifications.
• Age: A set of binary variables indicating the respondent's age in May 2020: 18-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, and 60 or older.
• White: A binary variable that equals 1 if the respondent identified their ethnic group as white (including mixed-race respondents).
• Income: Self-reported monthly personal before-tax income (in GBP) from all sources of employment (main job plus any secondary jobs), excluding income from other sources such as government benefits or investments. Respondents selected from 12 pre-defined categories, specified in intervals of 500 GBP, ranging from '0 GBP' to 'more than 5000 GBP'. We constructed 5 binary variables corresponding to incomes of 1000-2000 GBP, 2000-3000 GBP, 3000-4000 GBP, 4000-5000 GBP and more than 5000 GBP.
In our regression analysis, we used values of these covariates from the pre-pandemic timepoint, with the exception of 'Working from home', which we allowed to vary across timepoints. The estimated coefficients on sociodemographic characteristics should therefore be interpreted as the correlation between the outcome variable and having that characteristic in February 2020.

Pre-analysis plan
Before conducting our analysis, we uploaded a pre-analysis plan to AsPredicted.org (https://aspredicted.org/blind.php?x=3az7we), which describes the key variables, sample inclusion criteria, hypotheses to be tested, and analyses to be conducted. Our analysis followed the procedures outlined in our pre-analysis plan, but with the following extensions:
• We initially planned to use fragmentation (number of times the respondent did the activity in a given day, divided by the total number of activities in that day) as an indicator for the quality of time use. However, since most respondents only repeated a given activity 1-3 times per day (defined according to our broad activity categories), this measure did not have a large enough range to yield meaningful descriptions of changes in the quality of time use. We therefore excluded fragmentation from our analysis and included two additional indicators (unusual work hours and unusual housework hours) that had strong justification in the literature.
• We initially planned to include episode-specific enjoyment (measured on a 1-7 Likert scale) as a control variable in multivariate regression analysis. However, after early presentations of our work and discussions with colleagues, questions were frequently asked about the effects of changes in quality and quantity on overall enjoyment, so we extended our analysis to include regressions with enjoyment as the dependent variable.
6 Additional Results

6.1 Changes in time use using alternative measures of time spent
Our main measure of total time spent was calculated using information about the main activity that the respondent engaged in during each episode. However, respondents could also record a secondary activity for each episode, so our measure does not capture any changes due to multitasking. We therefore considered two alternative measures: time spent as a secondary activity, and time spent as either the main or secondary activity.
Figures S9-S10 show bar charts analogous to Figure 1 in our main manuscript, using the alternative measures described above. Respondents with young children spent more time on housework as a secondary activity pre-pandemic, and significantly increased the time spent on housework as a secondary activity during both lockdowns. When considering time spent on each activity as a main or secondary activity, we obtained qualitatively similar results to those in our main manuscript, but with larger magnitudes. For example, during the first lockdown, respondents with young children spent an average of 2.18 more hours per day on housework as a main or secondary activity, compared to 0.97 more hours per day when counting main activities only, though both of these changes are significant at the 5% level.
Figure S9. Average within-person changes in time spent on 4 broad activity categories, as a secondary activity only. Bars represent changes in hours per day spent on that category as a secondary activity, comparing the pre-pandemic timepoint (February 2020) to the first and third lockdowns (May 2020 and March 2021, respectively). Within-person changes for employment activities were calculated using the subset of individuals who remained employed in both periods of interest. Error bars represent 95% confidence intervals, and average levels for each subgroup are reported underneath the bars. Note that the conditional means were calculated separately (either by gender or household composition), so the four subgroups shown are not mutually exclusive.
Figure S10. Average within-person changes in time spent on 4 broad activity categories, as either a main or secondary activity. Bars represent changes in hours per day spent on that category as either a main or secondary activity, comparing the pre-pandemic timepoint (February 2020) to the first and third lockdowns (May 2020 and March 2021, respectively). Within-person changes for employment activities were calculated using the subset of individuals who remained employed in both periods of interest. Error bars represent 95% confidence intervals, and average levels for each subgroup are reported underneath the bars. Note that the conditional means were calculated separately (either by gender or household composition), so the four subgroups shown are not mutually exclusive.
Table S11 shows the proportion of respondents (out of N=766) who were employed at each timepoint and at any two given timepoints. 86.04%, 62.97%, and 74.02% of respondents were employed during the pre-pandemic period, Lockdown 1, and Lockdown 3, respectively. 62.01% of respondents were employed in the pre-pandemic period and during Lockdown 1. 69.97% of respondents were employed in the pre-pandemic period and during Lockdown 3. 58.22% of respondents were employed during Lockdowns 1 and 3. 57.1% of respondents were employed across all three timepoints.

Measuring inequality in time use
As part of our pre-analysis plan, we intended to use Lorenz curves to visualize changes in the inequality of time use across respondents. Due to space constraints in the main manuscript, we present these results here (Figure S12). Table S13 presents Gini coefficients of the distributions shown in Figure S12. Across broad activity categories, the distribution of time spent on housework as a main activity was the most unequal, and the distribution of time spent on subsistence activities was the most equal. Within timepoints, the larger Gini coefficients for time spent on secondary activities suggest variations in multitasking behavior. Across timepoints, the changes in time spent on employment are clearly seen in Figure S12: a substantial increase in inequality during the first lockdown that was partly reversed during the third lockdown.

[Figure S12: Lorenz curves. Recovered axis labels: 'Total time spent on subsistence as main activity (hours per day)' and 'Cumulative proportion of respondents'.]
Table S13. Gini coefficients of time spent (hours per day) on broad activity categories, by timepoint. Gini coefficients range from 0 (perfect equality) to 1 (complete inequality). 'Pre' refers to pre-pandemic (February 2020), 'LD1' refers to the first national lockdown (May 2020), and 'LD3' refers to the third national lockdown (March 2021). All respondents in our sample were included.
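For readers unfamiliar with the Gini coefficient, the sketch below computes it from a list of hours per day using the standard rank-based formula for sorted data. It is an unweighted illustration with made-up inputs; it is not the code used to produce Table S13, and the function name is hypothetical.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values.

    Uses the rank-based formula on sorted data:
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,  i = 1..n.
    Returns 0.0 when all values are zero (no time spent by anyone).
    """
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    ranked_sum = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2 * ranked_sum / (n * total) - (n + 1) / n


print(gini([2.0, 2.0, 2.0, 2.0]))  # 0.0: everyone spends the same time
print(gini([0.0, 0.0, 0.0, 4.0]))  # 0.75: one respondent accounts for all time
```

With a finite sample of n respondents, the maximum of this estimator is (n - 1) / n rather than exactly 1, which is why the second example yields 0.75.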
6.4 Correlation between time use and sociodemographic characteristics for broad activity categories
Table S14 shows the regression estimates underlying Figure 2 in our main manuscript. In addition to the variables reported, we also controlled for age and pre-pandemic levels in total time spent on the given activity. Robust standard errors are reported in brackets. *** p < 0.01, ** p < 0.05, * p < 0.1.
Table S15 shows the association between time spent on travelling and studying during the two lockdowns and individual characteristics.
Table S16. Estimates of selection equation using Heckman's two-step estimator. 'Working during pre-pandemic period' is a binary indicator that equals 1 if the respondent was working in the pre-pandemic period and zero otherwise. '% time working from home pre-pandemic' is a continuous variable, ranging from 0 to 100, that measures self-reported percentage of time in a typical work week (pre-pandemic) that the respondent worked from home. Time spent was calculated using the main activity only. Robust standard errors are reported in brackets. *** p < 0.01, ** p < 0.05, * p < 0.1.
Table S17. Estimates of time spent (hours per day) on employment activity domains by sociodemographic characteristics, without correcting for selection into employment. Regressions used the subset of individuals who remained employed in both periods of interest. Regressions also controlled for age and pre-pandemic levels in total time spent on employment activities. Time spent was calculated using the main activity only. Robust standard errors are reported in brackets. *** p < 0.01, ** p < 0.05, * p < 0.1.

Correlation between time use and sociodemographic characteristics for specific activity categories
We re-ran the regression specification using our main measure of time use (main activity only), where the outcome variable was change in time spent on activity subcategories. Table S18 shows the results for employment activity subcategories (estimated using the Heckman selection model). Table S19 shows the results for housework activity subcategories. Table S20 shows the results for leisure activity subcategories. Lastly, Table S21 shows the results for subsistence activity subcategories.
Table S21. Estimates of time spent (hours per day) on subsistence activity subcategories, by sociodemographic characteristics. Time spent was calculated using the main activity only. Regressions also controlled for age and pre-pandemic levels in total time spent on the given activity. Robust standard errors are reported in brackets. *** p < 0.01, ** p < 0.05, * p < 0.1.
Table S22 shows the regression estimates underlying Figure 5 in our main manuscript.
Table S22. Estimates of correlations between within-person changes in overall self-reported enjoyment and characteristics of time use. Sociodemographic covariates were used as controls. Reported changes during the first and third lockdown (May 2020 and March 2021, respectively) were relative to the pre-pandemic timepoint (February 2020). Robust standard errors are reported in brackets. A coefficient of 0.1 corresponds to ~0.13 SD in pre-pandemic enjoyment levels. *** p < 0.01, ** p < 0.05, * p < 0.1.

Robustness to attrition
In Section 2.2, we found evidence that older respondents were more likely to complete both survey waves. To account for potential bias arising from this differential attrition, we used inverse probability weights to re-weight our longitudinal sample. Specifically, we used the probit estimates reported in Table S3 to obtain predicted probabilities of each respondent appearing in both survey waves, conditional on sociodemographic characteristics. We then used the inverse of the predicted probability as that respondent's weight.
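The re-weighting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the probit coefficients, the single covariate (an indicator for being aged 55 or above), and the outcome values are all hypothetical, and the helper names are invented for this example. The standard normal CDF is computed from `math.erf`.

```python
import math


def probit_probability(intercept, coefs, covariates):
    """Predicted probability from a probit model: Phi(intercept + coefs . covariates)."""
    z = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_probability_weights(probabilities):
    """Weight each respondent by the inverse of their predicted retention probability."""
    for p in probabilities:
        if not 0.0 < p <= 1.0:
            raise ValueError("predicted probabilities must lie in (0, 1]")
    return [1.0 / p for p in probabilities]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical probit estimates: older respondents are more likely to be retained.
intercept = 0.0
coefs = [0.5]                          # coefficient on "aged 55 or above"
respondents = [[1.0], [1.0], [0.0]]    # two older respondents, one younger
hours = [2.0, 2.0, 5.0]                # hypothetical outcome (hours per day)

probs = [probit_probability(intercept, coefs, x) for x in respondents]
weights = inverse_probability_weights(probs)

print(sum(hours) / len(hours))        # unweighted mean
print(weighted_mean(hours, weights))  # re-weighted mean
```

In this toy example the younger respondent has a lower predicted probability of appearing in both waves, so they receive a larger weight and the re-weighted mean moves towards their value.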
Tables S23-S34 compare the weighted results with the unweighted results presented in Figures 1-4.
Table S28. Weighted and unweighted estimates of time spent (hours per day) on leisure and subsistence activities, by sociodemographic characteristics. Time spent was calculated using main activities only. Regressions also controlled for age and pre-pandemic levels in total time spent on the given activity. Robust standard errors are reported in brackets. *** p < 0.01, ** p < 0.05, * p < 0.1.
Table S30. Weighted and unweighted mean values and within-person differences in leisure time spent alone, by timepoint and demographic subgroup. Inverse probability weights were used to construct weighted averages. 'Pre-pandemic' refers to February 2020, 'LD 1' refers to the first lockdown (May 2020), 'LD 3' refers to the third lockdown (March 2021). 95% confidence intervals are reported in brackets. Note that the conditional means were calculated separately (either by gender or household composition), so the four subgroups shown are not mutually exclusive.
Table S33. Weighted and unweighted estimates of mean levels and within-person differences in overall enjoyment, by timepoint and demographic subgroup. 'Pre' refers to pre-pandemic (February 2020), 'LD 1' refers to the first lockdown (May 2020), 'LD 3' refers to the third lockdown (March 2021). 95% confidence intervals are reported in brackets. Note that the conditional means were calculated separately (either by gender or household composition), so the four subgroups shown are not mutually exclusive.
Table S34. Weighted and unweighted estimates of correlations between within-person changes in overall self-reported enjoyment and characteristics of time use. Sociodemographic covariates were used as controls.

Quality measures: Inverse probability weights
Reported changes during the first and third lockdown (May 2020 and March 2021, respectively) were relative to the pre-pandemic timepoint (February 2020). Robust standard errors are reported in brackets. A coefficient of 0.1 corresponds to ~0.13 SD in pre-pandemic enjoyment levels. *** p < 0.01, ** p < 0.05, * p < 0.1.

Sample representativeness
In Section 2.3, we found evidence that our sample is, on average, more educated and older than a nationally representative sample of the UK workforce. To address potential concerns about sample representativeness, we used calibration weights to reweight our sample to match the composition of Understanding Society's in-workforce sample across gender, age, ethnicity, education, and household composition (all defined as categorical variables).
Specifically, let $x$ denote a vector of binary characteristics, where the $k$th element equals 1 if the respondent satisfies the condition of characteristic $k$. Since the characteristics are binary, there is a finite number of possible combinations of these vectors, denoted $x_1, \dots, x_J$.
Let $w_j$ denote the weight currently assigned to an individual with the characteristic vector $x_j$, which is calculated by dividing the number of individuals with these characteristics by the total sample size (the proportion $p_j$). We chose the calibration weights $w_j^{*}$ to be as 'close as possible' (in the squared-distance sense) to the original weights $w_j$, such that the re-weighted proportions equal those of the nationally representative sample, $\pi_k$.
Table S46. Weighted and unweighted estimates of correlations between within-person changes in overall self-reported enjoyment and characteristics of time use. Sociodemographic covariates were used as controls.
Reported changes during the first and third lockdown (May 2020 and March 2021, respectively) were relative to the pre-pandemic timepoint (February 2020). Robust standard errors are reported in brackets. A coefficient of 0.1 corresponds to ~0.13 SD in pre-pandemic enjoyment levels. *** p < 0.01, ** p < 0.05, * p < 0.1.
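The calibration step described in this section can be written as a constrained least-squares problem. The following is only a sketch in assumed notation: $w_j$ is the original sample weight of characteristic combination $x_j$ (for $j = 1, \dots, J$), $w_j^{*}$ is the calibrated weight, $x_{jk}$ is the $k$th element of $x_j$, and $\pi_k$ is the share of the nationally representative sample satisfying characteristic $k$. The unweighted squared distance and the normalization constraint are assumptions of this sketch.

```latex
\begin{aligned}
\min_{w_1^{*}, \dots, w_J^{*}} \quad & \sum_{j=1}^{J} \left( w_j^{*} - w_j \right)^2 \\
\text{subject to} \quad & \sum_{j=1}^{J} w_j^{*} \, x_{jk} = \pi_k \quad \text{for each characteristic } k, \\
& \sum_{j=1}^{J} w_j^{*} = 1 .
\end{aligned}
```

Because the objective is quadratic and the constraints are linear, a problem of this form has a closed-form solution obtained from its first-order conditions.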
9 Other data quality issues

9.1 Selection on unobservables
We showed that our results are robust to accounting for attrition bias and re-weighting to match the composition of the UK workforce. However, our re-weighted data may still suffer from selection in ways that correlate with outcomes of interest. For example, we conducted our survey online, so we were more likely to reach individuals who spend more time online. We cannot rule out the possibility that these participants systematically differ from non-participants in their lockdown experiences and time use. Still, given the widespread use of broadband and smartphones in the UK, this issue is less of a concern than it would have been a decade ago. In fact, older respondents (aged 55 and above), who are conventionally seen as less likely to be online, are over-represented in our longitudinal sample.

Measurement error in time use diaries
There are two main sources of measurement error arising from our time use diary methodology: recall bias and individual-specific variation in recording activities. Any measurement error in time spent on various activities will create attenuation bias in our estimates, so the true size of the changes may be larger than those we document.
Firstly, the information for the pre-pandemic timepoint (February 2020) was obtained in May 2020, three months after the particular days of interest actually occurred, and so may be less accurate than the information for the other two timepoints, where respondents recalled events of one or two days ago. We tried to mitigate this issue by encouraging respondents to refer to their planners when completing the time use diaries.
To investigate potential differences in accuracy across timepoints, we compared the distribution of episode start and end minutes. Doing so enables detection of rounding (for example, to the nearest hour), which is more likely to occur if respondents could not remember the exact start or end time of an episode. Figure S47 shows that these distributions were very similar across all timepoints, suggesting that the degree of recall bias due to rounding is unlikely to vary across timepoints. Table S7 (Section 3) shows that the proportions of episodes that needed cleaning due to mis-recording were very small (less than 0.1%) and similar across timepoints.
Secondly, since time use diaries were self-completed, there may be individual-specific variation in the recording of activities. The self-completion method has been shown to obtain data of similar quality to the external coding method used by the 2014/15 UKTUS or objective real-time instruments [8,9]. We used the following methods to improve the standardization of responses and ensure our main results are robust to such variation:
• Requiring respondents to complete a 5-10-minute tutorial on how to fill in time use diaries. We provided detailed instructions in PDF format, which respondents could refer to when filling in their own diaries. Before filling in their own diaries, respondents had to correctly fill in three fictional time use diaries according to the instructions provided.
• Focusing on within-person differences, which will 'difference out' this variation (to the extent that such variation is constant across time for each individual).
• Using broad activity categories. At the level of aggregation used in our main analysis, it is highly unlikely that respondents will mis-classify activities, for example, mistaking a subsistence activity for a leisure activity.
There may be some differences in the way that respondents record main and secondary activities, but the supplementary analysis in Section 6.1 shows that we obtain similar results (in sign and statistical significance), albeit of a different magnitude, when considering time spent in the main and secondary activity or the secondary activity only.
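The rounding check described in this section, comparing the distribution of episode start and end minutes across timepoints, can be illustrated with a short sketch. It assumes that episode boundaries are stored as 'HH:MM' strings; the function names and example times are hypothetical, and the total variation distance is used here only as one convenient summary of how far apart two distributions are.

```python
from collections import Counter


def minute_shares(times):
    """Share of episode boundaries falling on each minute of the hour (0-59)."""
    minutes = [int(t.split(":")[1]) for t in times]
    counts = Counter(minutes)
    return {m: counts[m] / len(minutes) for m in sorted(counts)}


def total_variation(p, q):
    """Total variation distance between two minute-of-hour distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(m, 0.0) - q.get(m, 0.0)) for m in keys)


# Hypothetical episode start times at two timepoints.
pre_pandemic = ["09:00", "10:30", "12:00", "13:15"]
lockdown_1 = ["09:00", "10:30", "12:10", "13:15"]

pre_shares = minute_shares(pre_pandemic)
ld1_shares = minute_shares(lockdown_1)

# Share of boundaries reported exactly on the hour, by timepoint.
print(pre_shares.get(0, 0.0), ld1_shares.get(0, 0.0))
# Overall distance between the two distributions (0 means identical).
print(total_variation(pre_shares, ld1_shares))
```

A markedly higher share of on-the-hour boundaries at the retrospective timepoint would point to more rounding there, while similar distributions are consistent with similar recall accuracy.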