
When the outcome is compositional: A method for conducting compositional response linear mixed models for physical activity, sedentary behaviour and sleep research

  • Aaron Miatke,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft

    aaron.miatke@mymail.unisa.edu.au

    Affiliations Alliance for Research in Exercise, Nutrition and Activity, Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia; Centre for Adolescent Health, Murdoch Children’s Research Institute, Melbourne, Australia

  • Ty Stanford,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Alliance for Research in Exercise, Nutrition and Activity, Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia

  • Tim Olds,

    Roles Funding acquisition, Writing – review & editing

    Affiliations Alliance for Research in Exercise, Nutrition and Activity, Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia; Centre for Adolescent Health, Murdoch Children’s Research Institute, Melbourne, Australia

  • Francois Fraysse,

    Roles Writing – review & editing

    Affiliation Alliance for Research in Exercise, Nutrition and Activity, Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia

  • Carol Maher,

    Roles Writing – review & editing

    Affiliation Alliance for Research in Exercise, Nutrition and Activity, Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia

  • Josep Antoni Martin-Fernandez,

    Roles Methodology, Writing – review & editing

    Affiliation Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Girona, Spain

  • Dot Dumuid

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliations Alliance for Research in Exercise, Nutrition and Activity, Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia; Centre for Adolescent Health, Murdoch Children’s Research Institute, Melbourne, Australia

Abstract

Time use is compositional in nature because time spent in sleep, sedentary behaviour and physical activity will always sum to 24 h/day meaning any increase in one behaviour will necessarily displace time spent in another behaviour(s). Given the link between time use and health, and its modifiable nature, public health campaigns often aim to change the way people allocate their time. However, relatively few studies have investigated how movement-behaviour compositions change longitudinally (with repeated measures), due to experimental design elements (e.g., intervention effects), or differences due to participant socio-demographic characteristics (e.g., sex, socio-economic status) within clustered sampling designs. This may be because most mixed-model packages that account for the random effects do not natively support a multivariate outcome such as movement-behaviour composition. In the current paper we provide a practical framework of how to implement a compositional multivariate-response linear mixed model that can be used to model the entire 24h movement-behaviour composition as the dependent variable within a multilevel framework. The method accounts for covariances across and within response variables at the grouping (individual, cluster etc.) and covariance between response variables at the observation level. Results are therefore invariant to the chosen log-ratio basis used to construct the response variables (i.e., mathematically equivalent models). The method outlined is applicable to many designs including longitudinal cohort studies, intervention trials, and clustered cross-sectional designs (e.g., students within schools, patients within clinics). In a worked example we show how this approach can be used to investigate how time is reallocated in children across the school year.

1.0 Introduction

Time spent in daily movement behaviours (sleep, sedentary behaviour (SB), physical activity) has been linked with many health measures, ranging from adiposity to mental health, cognition and mortality risk [1], largely in cross-sectional or observational data. Due to its modifiable nature, intervention and health promotion efforts have long attempted to improve specific aspects of time use, such as increasing time spent in physical activity [2,3], reducing sedentary time [4,5], or improving sleep [6]. Most statistical analyses of the effectiveness of such interventions consider only one activity in isolation, even though the constant-sum constraint of a 24-h day means that other activities will necessarily undergo compensatory changes. As all the activities are important to health outcomes, they should all be considered when planning and executing movement behaviour interventions [7]. Understanding shifts in time use resulting from natural temporal cycles (e.g., circadian and circannual) and life transitions is also an important consideration. It is therefore imperative to be able to include the full 24-h composition of movement behaviours in the same statistical model when assessing the effectiveness of interventions. Including all raw 24-h movement behaviour variables (min/day) in statistical models has been problematic because of their constant-sum constraint and inherent perfect multicollinearity, which may produce spurious results [8]. This has been overcome through the application of compositional data analysis (CoDA), whereby 24-h compositions are expressed as logratios prior to their inclusion in statistical models [9].

While CoDA is now commonly applied when 24-h movement-behaviour compositions are considered as independent or predictor variables, fewer studies have used CoDA to consider movement behaviour compositions as dependent or response variables [10]. There is currently no accepted methodology for analysing movement behaviour compositions in a multi-level framework that could be used to, for example, assess whether mean compositions change over time with repeated measurements on individuals, whether there are intervention or experimental group effects in compositions over time, or whether compositions are associated with sociodemographic or environmental factors while accounting for clustered sampling designs. As such, researchers have generally used other approaches to investigate changes or group differences in compositions. Most have ignored the compositional nature of the data entirely and modelled minutes per day of each behaviour separately [11,12]. Some have tried to respect the compositional nature of their data by expressing the compositions as a set of orthonormal log-ratio (olr) coordinates, which are iteratively used as the dependent variables in multiple univariate models while accounting for the random effects of repeated measures on participants (as outlined further in Section 1.2). Others have investigated how compositions change over time by creating a ‘change composition’ via perturbation. This reduces the data to a single observation per participant and eliminates the need to model random effects. These are all useful approaches; however, they all have drawbacks. The first approach ignores the compositional nature of the data completely; the second relies on interpreting individual olr coordinates, which have limited practical meaning; and the third lacks flexibility in its application. For example, it requires complete data and can only be used with two timepoints.

This paper is structured as follows: in Section 1, we provide background on the rationale for CoDA for 24-h movement behaviours. Section 1.1 introduces the log-ratio methodology and how it is applied within regression modelling, and Section 1.2 describes previous frequentist approaches to analysing compositions as the dependent variables, noting the inability of these methods to include multivariate response variables when the data have a multi-level structure. In Section 2, we present a method for a compositional multivariate-response linear mixed model (CMRLMM). Section 3 provides an example of the application of our CMRLMM to real-world data of Australian school children’s movement behaviour patterns across various time points. We also compare the CMRLMM to both the univariate response compositional linear mixed model and the non-compositional linear mixed model fitted separately for each behaviour in raw minutes/day (i.e., sleep, SB, light physical activity [LPA] and moderate-to-vigorous physical activity [MVPA]). These comparators reflect common practice in the literature and illustrate the consequences of ignoring the correlations between olr coordinates/behaviours and the constant-sum constraint. In Section 4, we provide a concluding discussion on the strengths and weaknesses of the new method, and areas for future development and research.

1.1 Brief overview of CoDA and application to 24-h movement behaviours

CoDA is used in behavioural epidemiology to analyse time spent across 24-hour movement behaviours, including sleep, SB, and physical activity. The log-ratio methodology transforms raw compositional data into orthonormal log-ratio (olr) coordinates (also known as isometric log-ratio coordinates [ilr]) before their inclusion in compositional regression models. However, current methods struggle to address a common issue in epidemiological studies: the multi-level structure of movement behaviour data caused by repeated measurements or clustered sampling designs.

Formally, a composition is a vector in $\mathbb{R}^D$ in which the only relevant information is contained in the ratios between its components; these components are aptly named compositional parts. In the context of time-use epidemiology, the compositional parts typically represent the time spent in sleep, SB, LPA and MVPA. For example, the ratio of time spent in SB to time asleep within a 24-h window is the same whether those times are expressed as proportions of the day, hours or minutes. Compositions also have specific properties due to their sample space. However, it is also worth noting that it is possible to conceptualise a time-use composition in a variety of ways. Instead of creating a time-use composition based on activity intensities, it is also possible to divide the day into time spent in contextual domains such as work, commuting, household activities, leisure etc. [13]. Compositions also exist in other fields within the behavioural and health sciences, such as dietary macronutrient compositions (carbohydrate, fat, protein) [14] and body composition (fat, muscle, bone) [15]. In all these examples, a D-part composition can be expressed as a vector $\mathbf{x} = (x_1, x_2, \dots, x_D)$, where all parts are non-negative (but ideally strictly positive) and sum to a positive constant $\kappa$. Typically, its sample space is mathematically written as

$$\mathcal{S}^D = \left\{ \mathbf{x} = (x_1, x_2, \dots, x_D) : x_k > 0,\ k = 1, \dots, D;\ \sum_{k=1}^{D} x_k = \kappa \right\},$$

where $\mathcal{S}^D$ is known as the D-part simplex or D-simplex, a $(D-1)$-dimensional subset of $\mathbb{R}^D$ [16]. Because compositions only contain relative information, a composition can be closed to any positive constant $\kappa$ such that

$$\mathcal{C}_\kappa(\mathbf{x}) = \left( \frac{\kappa x_1}{\sum_{k=1}^{D} x_k}, \frac{\kappa x_2}{\sum_{k=1}^{D} x_k}, \dots, \frac{\kappa x_D}{\sum_{k=1}^{D} x_k} \right), \tag{1}$$

where $\mathcal{C}_\kappa(\cdot)$ is the closure operation to the constant $\kappa$ (written simply $\mathcal{C}(\cdot)$ when $\kappa = 1$). For example, if the parts of the vector $\mathbf{x}$ represent time spent in sleep, SB, LPA and MVPA, the composition could equivalently be closed to $\kappa = 1440$ (min/day), $\kappa = 24$ (h/day) or $\kappa = 1$ (proportion of the day). When considering differences, or changes, in compositions, the ordinary addition and subtraction operations of Euclidean space are not suitable, as they are neither scale invariant nor subcompositionally coherent [17,18]. Instead, in the simplex, a composition $\mathbf{x}$ can be perturbed by another composition $\mathbf{y}$ such that $\mathbf{x} \oplus \mathbf{y} = \mathcal{C}(x_1 y_1, x_2 y_2, \dots, x_D y_D)$, which is analogous to addition in real space. In addition, powering $\mathbf{x}$ by a real constant $\alpha$, $\alpha \odot \mathbf{x} = \mathcal{C}(x_1^{\alpha}, x_2^{\alpha}, \dots, x_D^{\alpha})$, plays the typical role of the product of a vector by a scalar. Using these operations, a linear regression model with a compositional response $\mathbf{y}$ can be formulated as

$$\mathbf{y} = \mathbf{b}_0 \oplus (t \odot \mathbf{b}_1) \oplus \boldsymbol{\varepsilon}, \tag{2}$$

where the coefficients $\mathbf{b}_0, \mathbf{b}_1 \in \mathcal{S}^D$, $\boldsymbol{\varepsilon}$ is a compositional error term and $t$ is a real covariate. Further, the neutral element $\mathbf{n} = \mathcal{C}(1, 1, \dots, 1)$ has the expected property $\mathbf{x} \oplus \mathbf{n} = \mathbf{x}$. Finally, the inverse element of $\mathbf{x}$ is defined as $\mathcal{C}(x_1^{-1}, x_2^{-1}, \dots, x_D^{-1})$ and denoted $\ominus \mathbf{x}$. It can be shown that when a composition is perturbed by its inverse, the result is the neutral element, such that $\mathbf{x} \oplus (\ominus \mathbf{x}) = \mathbf{n}$. While ‘stay in the simplex’ algebra like that outlined above can be used to summarise or interpret CoDa, most multivariate statistical techniques are not suitable for raw CoDa and require expressing the raw data in terms of log-ratios of the parts. As such, most compositional techniques used in time-use epidemiology favour the log-ratio approach, specifically the olr transformation [9]. While other log-ratio approaches have been proposed, such as the additive log-ratio (alr) or centred log-ratio (clr), these have drawbacks that limit their use within statistical models such as mixed models. Namely, alr coordinates are asymmetric and not isometric, meaning that distances in the simplex are not preserved and results depend on the compositional part chosen as the denominator when constructing the coordinates. Unlike alr coordinates, clr coordinates are isometric, meaning distances are preserved. However, clr coordinates are restrictive and can produce spurious results when used within statistical models because of their zero-sum constraint [9,19,20]. Neither of these issues exists when using the olr transformation, which is generally preferred in most modelling applications, including within time-use epidemiology [9]. The olr transformation involves expressing a D-part compositional vector of movement behaviour data in the simplex, $\mathbf{x} \in \mathcal{S}^D$, as $D-1$ olr-coordinates [21,22] in real space, $\mathbb{R}^{D-1}$. That is, the vector $\mathbf{z} = \mathrm{olr}(\mathbf{x}) = (z_1, z_2, \dots, z_{D-1})$ contains the coordinates of composition $\mathbf{x}$ in an olr-basis. Importantly, these olr-coordinates can be used in standard multivariate statistical models [23]. For example, the linear regression model in Eq. (2) is expressed in olr-coordinates as $\mathrm{olr}(\mathbf{y}) = \mathrm{olr}(\mathbf{b}_0) + t\,\mathrm{olr}(\mathbf{b}_1) + \mathrm{olr}(\boldsymbol{\varepsilon})$.
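The closure, perturbation, powering and inverse operations defined above can be sketched in a few lines of Python. This is a minimal illustration only, assuming strictly positive parts; the function names and the example days (in min/day of sleep, SB, LPA and MVPA) are hypothetical and are not taken from the study data.

```python
import math

def closure(x, kappa=1.0):
    """C_kappa(x): rescale strictly positive parts so that they sum to kappa."""
    total = sum(x)
    return [kappa * xi / total for xi in x]

def perturb(x, y):
    """Perturbation x (+) y: part-wise product, then closure."""
    return closure([xi * yi for xi, yi in zip(x, y)])

def power(alpha, x):
    """Powering alpha (.) x: raise every part to alpha, then closure."""
    return closure([xi ** alpha for xi in x])

def inverse(x):
    """Inverse element (-)x: closure of the reciprocal parts."""
    return closure([1.0 / xi for xi in x])

# Hypothetical day (min/day): sleep, SB, LPA, MVPA
day = [600.0, 540.0, 240.0, 60.0]

# Closing to 24 h or to 1 keeps every ratio between parts unchanged
as_hours = closure(day, 24.0)
as_props = closure(day, 1.0)
print(math.isclose(as_hours[1] / as_hours[0], day[1] / day[0]))  # True

# Perturbing a composition by its inverse gives the neutral element C(1, ..., 1)
neutral = perturb(day, inverse(day))
print([round(v, 6) for v in neutral])  # [0.25, 0.25, 0.25, 0.25]

# A 'change composition' from one day to another: later (+) (-)day
later = [570.0, 600.0, 220.0, 50.0]
change = perturb(later, inverse(day))

# 0.5 (.) day: the composition pulled halfway towards the neutral element
shrunk = power(0.5, day)
```

Perturbing `day` by `change` recovers `later` (as proportions), which is the simplex analogue of "x + (y - x) = y".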

An olr-basis can be created using a data-driven method such as Principal Balances [24] or R-mode cluster analysis [25]. Alternatively, the knowledge of the researcher can be used to improve the interpretation of the models when creating the olr-basis by a sequential binary partition (SBP) process, which is generally preferred in time-use research [15]. An SBP process uses a $(D-1) \times D$ sign matrix to iteratively divide the compositional parts until all groups consist of a single component [26].
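As an illustration only, the sign matrix below encodes one possible SBP for a four-part time-use composition (sleep, SB, LPA, MVPA). This particular partition is an assumption, chosen to mirror the passive vs. active contrast discussed in this section, and is not necessarily the partition used in the worked example. Each row splits a group of parts into a positive (numerator) and a negative (denominator) subgroup, with 0 marking parts not involved in that row.

```latex
\begin{array}{c|cccc}
      & \text{Sleep} & \text{SB} & \text{LPA} & \text{MVPA} \\ \hline
z_1   & +1 & +1 & -1 & -1 \\  % passive (sleep, SB) vs. active (LPA, MVPA)
z_2   & +1 & -1 &  0 &  0 \\  % sleep vs. SB
z_3   &  0 &  0 & +1 & -1     % LPA vs. MVPA
\end{array}
```

Each part appears in exactly one of the two subgroups of the first row, and every later row only splits a subgroup created by an earlier row, which is what makes the resulting coordinates orthonormal.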

In general form, the olr-coordinates can be defined as

$$z_j = \sqrt{\frac{r_j s_j}{r_j + s_j}}\, \ln \frac{\left( \prod_{k \in G_j^{+}} x_k \right)^{1/r_j}}{\left( \prod_{l \in G_j^{-}} x_l \right)^{1/s_j}}, \quad j = 1, \dots, D-1, \tag{3}$$

where $r_j$ and $s_j$ are the numbers of parts in the j-th row of the sign matrix that are coded positive and negative, respectively, $G_j^{+}$ are the indexes of the parts in the numerator and $G_j^{-}$ are the indexes of the parts in the denominator of each row of the sign matrix [26]. Note that the olr-coordinate in Equation (3) is a “balance” between the geometric means of two sets of parts. Balance coordinates may be of use when a researcher is interested in distinct groups of behaviours [27]. In the context of 24-hour movement behaviours, this means that each olr-coordinate represents a contrast between groups of behaviours, such as sleep vs. waking activities, or active vs. passive behaviours. Together, these coordinates fully describe how time is distributed across the day while preserving the relative nature of the data. If a researcher has a particular interest in one behaviour, they may use pivot coordinates of the general form

$$z_j = \sqrt{\frac{D-j}{D-j+1}}\, \ln \frac{x_j}{\sqrt[D-j]{\prod_{k=j+1}^{D} x_k}}, \quad j = 1, \dots, D-1. \tag{4}$$

Here, the first olr-coordinate ($z_1$) reflects the dominance of one behaviour ($x_1$) relative to the geometric mean of the remaining behaviours. Note that $z_1 = \sqrt{D/(D-1)}\,\mathrm{clr}_1(\mathbf{x})$, where $\mathrm{clr}_1(\mathbf{x}) = \ln\left( x_1 / \sqrt[D]{\prod_{k=1}^{D} x_k} \right)$; that is, the first olr-coordinate is proportional to the first clr-coordinate. The remaining olr-coordinates, $z_2, \dots, z_{D-1}$, are then created in a similar manner, with the denominator of the previous coordinate split (the next part moving into the numerator) until no parts remain [28].
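A minimal Python sketch of Equations (3) and (4) follows. It assumes strictly positive parts and uses an illustrative passive/active SBP (an assumption, not necessarily the study's basis). It checks two properties: the first pivot coordinate equals $\sqrt{D/(D-1)}$ times the first clr-coordinate, and, because both bases are orthonormal, the two coordinate vectors have the same Euclidean length even though their individual elements differ.

```python
import math

def balance_coords(x, signs):
    """Eq. (3): balance coordinates from a (D-1) x D sign matrix (+1, -1, 0)."""
    z = []
    for row in signs:
        num = [math.log(x[k]) for k, sign in enumerate(row) if sign > 0]
        den = [math.log(x[k]) for k, sign in enumerate(row) if sign < 0]
        r, s = len(num), len(den)
        # log-ratio of the geometric means of the two groups, scaled to unit norm
        z.append(math.sqrt(r * s / (r + s)) * (sum(num) / r - sum(den) / s))
    return z

def pivot_coords(x):
    """Eq. (4): part j against the geometric mean of parts j+1, ..., D."""
    D = len(x)
    z = []
    for j in range(D - 1):          # j = 0 gives z_1
        k = D - j - 1               # number of parts in the denominator
        rest = sum(math.log(v) for v in x[j + 1:]) / k
        z.append(math.sqrt(k / (k + 1)) * (math.log(x[j]) - rest))
    return z

def clr(x):
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    logs = [math.log(v) for v in x]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

# Hypothetical composition (min/day): sleep, SB, LPA, MVPA
x = [600.0, 540.0, 240.0, 60.0]
D = len(x)

# Illustrative SBP: (sleep, SB) vs (LPA, MVPA); sleep vs SB; LPA vs MVPA
sbp = [[+1, +1, -1, -1],
       [+1, -1,  0,  0],
       [ 0,  0, +1, -1]]

z_bal = balance_coords(x, sbp)
z_piv = pivot_coords(x)

print(math.isclose(z_piv[0], math.sqrt(D / (D - 1)) * clr(x)[0]))  # True
print(math.isclose(norm(z_bal), norm(z_piv)))                        # True
```

The pivot coordinates are themselves one particular SBP (each row isolates the next part), so both functions produce valid olr-coordinates of the same composition.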

While hypothesis-driven construction of olr-coordinates may provide one way of interpreting findings from compositional models, there are limitations to how meaningful these contrasts are to everyday life. For example, the first pivot coordinate ($z_1$) must be interpreted as the relative increase of one activity while the geometric mean of the remaining activities is reduced. In real life, changes in the geometric mean of a group of activities may be difficult to conceptualise; it is often easier to interpret findings within the simplex space [29]. Importantly, while the interpretation of individual olr-coordinates will change depending on the basis chosen when creating them, collectively they are equivalent and retain all relative information about the movement behaviour composition no matter how they are constructed: the composition is always recovered as $\mathbf{x} = \mathrm{olr}^{-1}(\mathbf{z})$, and the coordinates in any two olr-bases are related by an orthogonal transformation, $\mathbf{z}^{*} = \mathbf{P}\mathbf{z}$. This means that, in order to make inferences about the movement behaviour composition as a whole, the vector of olr-coordinates must be considered collectively. One option is to use the statistical model to compute point estimates for scenarios of interest, relevant to the research question. For example, if the logratios are considered as predictors of a health outcome in a linear regression model, the model parameters can be used to estimate the value of the health outcome for a selection of different movement behaviour compositions that emulate the reallocation of time between activities [30]. If the logratios are considered as the dependent variables in a multivariate response linear regression model, the model parameters can be used to estimate what the logratios (and the corresponding 24-h movement-behaviour compositions) would be at different levels/values of the predictor or independent variable.
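The following sketch illustrates this basis invariance, assuming two hypothetical days and two illustrative SBPs (a passive/active partition and the sleep-first pivot partition). It builds an orthonormal contrast matrix $\mathbf{V}$ from each sign matrix, computes $\mathbf{z} = \mathbf{V}^{\top}\ln\mathbf{x}$, averages in coordinate space, and back-transforms with $\mathrm{olr}^{-1}(\mathbf{z}) = \mathcal{C}(\exp(\mathbf{V}\mathbf{z}))$. The helper names are hypothetical. The coordinates differ between the two bases, but the back-transformed estimate (here, the compositional mean in min/day) is the same.

```python
import math

def contrast_matrix(signs):
    """D x (D-1) matrix V whose orthonormal columns encode an SBP sign matrix."""
    D = len(signs[0])
    V = [[0.0] * len(signs) for _ in range(D)]
    for j, row in enumerate(signs):
        n_pos = sum(1 for sign in row if sign > 0)
        n_neg = sum(1 for sign in row if sign < 0)
        a = math.sqrt(n_pos * n_neg / (n_pos + n_neg))
        for k, sign in enumerate(row):
            if sign > 0:
                V[k][j] = a / n_pos
            elif sign < 0:
                V[k][j] = -a / n_neg
    return V

def olr(x, V):
    """z = V^T ln(x)."""
    logs = [math.log(v) for v in x]
    return [sum(V[k][j] * logs[k] for k in range(len(x))) for j in range(len(V[0]))]

def olr_inv(z, V, kappa=1.0):
    """x = C_kappa(exp(V z))."""
    parts = [math.exp(sum(V[k][j] * z[j] for j in range(len(z)))) for k in range(len(V))]
    total = sum(parts)
    return [kappa * p / total for p in parts]

sbp = [[+1, +1, -1, -1], [+1, -1, 0, 0], [0, 0, +1, -1]]
pivot = [[+1, -1, -1, -1], [0, +1, -1, -1], [0, 0, +1, -1]]

# Two hypothetical days (min/day): sleep, SB, LPA, MVPA
a = [600.0, 540.0, 240.0, 60.0]
b = [560.0, 600.0, 220.0, 60.0]

estimates = {}
for name, signs in [("sbp", sbp), ("pivot", pivot)]:
    V = contrast_matrix(signs)
    za, zb = olr(a, V), olr(b, V)
    z_mean = [(p + q) / 2 for p, q in zip(za, zb)]      # mean in olr space
    estimates[name] = olr_inv(z_mean, V, kappa=1440.0)  # back to min/day

print(all(math.isclose(p, q) for p, q in zip(estimates["sbp"], estimates["pivot"])))  # True
```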
However, when considering compositional responses, standard multivariate response linear regression models are often unsuitable in epidemiological studies because of the non-independence of observations resulting from repeated measurements on sampling units (e.g., participants) or from clustered sampling designs (e.g., participants within health centres). In these instances, a linear mixed-effects model (LMM), which extends the general linear model to include both fixed effects and random effects that account for correlated observations, is a popular and well-understood method. Importantly, commonly used statistical software packages for LMMs within a frequentist framework (STATA, SPSS, SAS, R) do not natively support multivariate outcomes (without some data manipulation and model re-specification), such as a movement behaviour composition expressed as logratios [31–33]. This difficulty explains why a common previous approach to investigating longitudinal changes in movement-behaviour composition, described in Section 1.2, has been to fit multiple univariate-response LMMs, each with a different olr-coordinate as the outcome [34–38].

1.2 Compositional univariate-response linear mixed model

Previous studies investigating changes in movement-behaviour compositions with repeated measurements on the same individuals ($i = 1, 2, \dots, N$) over T timepoints have used univariate compositional LMMs in which measurement occasions (level 1) are nested within individuals (level 2), of the general form

$$z_{ij} = \beta_0 + u_{0i} + \beta_1\, \mathrm{timepoint}_{ij} + \varepsilon_{ij}, \qquad u_{0i} \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2), \tag{5}$$

where $z_{ij}$ is the response for an individual olr-coordinate for the i-th subject at the j-th timepoint, $\beta_0$ is the mean value of $z_{ij}$ when all predictors are equal to zero, $u_{0i}$ is the random intercept for the i-th subject, $\beta_1$ is the slope coefficient representing the mean linear change over time, $\mathrm{timepoint}_{ij}$ is the j-th time for subject i, and $\varepsilon_{ij}$ is the residual error. In the context of 24-h movement-behaviour research, this type of model has typically been used to assess whether time spent in specific behaviours, or balances of behaviours, changes across repeated measurement occasions (e.g., baseline, 6 months and 12 months), or differs between intervention and control groups. For example, a researcher may use Equation (5) to test whether the relative balance between active and sedentary time changes following a lifestyle intervention, or to investigate average trajectories in time-use composition in a longitudinal cohort study. Previously, researchers have generally investigated compositional outcomes within a multilevel framework using multiple models of the general form of Equation (5) in one of three ways. Approach 1) has involved iteratively constructing D sets of pivot coordinates (Equation (4)), with $z_1$ reflecting the dominance of a different behaviour in each instance, and using this coordinate as the dependent variable in D different LMM analyses. For example, this approach has been used previously to investigate intervention effects on 24-h movement behaviours by fitting a separate model where $z_1$ corresponds to the pivot coordinate for sleep, then for SB, LPA and MVPA, respectively [34,35]. Alternatively, approach 2) similarly involves a ‘multiple models’ approach for each component in the vector of olr-coordinates constructed using a single basis [36–38], that is, a separate model for each element of the vector $\mathbf{z} = (z_1, \dots, z_{D-1})$ (Equation (3)), where each coordinate represents a particular balance of behaviours, for example, active vs passive behaviours.
Approach 3) involves ignoring the compositional nature of the data and fitting separate models of a similar form to Equation (5) that treat each behaviour as an independent outcome; that is, fitting Equation (5) with raw minutes/day of each behaviour as the dependent variable. While this third approach is strictly not a compositional model, it is still probably the approach most used by applied time-use researchers, so we include it here for completeness.
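To make the dependence between the outcomes of approach 1 concrete, the sketch below computes the first pivot coordinate with each behaviour rotated into the first position, for one hypothetical day. Because each such coordinate equals $\sqrt{D/(D-1)}$ times a clr-coordinate, the D outcomes sum to zero for every observation, so the D separate LMMs are fitted to linearly dependent responses. The values and the helper name are illustrative only.

```python
import math

def first_pivot(x, k):
    """z_1 of Eq. (4) after moving part k to the first position."""
    D = len(x)
    others = sum(math.log(v) for i, v in enumerate(x) if i != k) / (D - 1)
    return math.sqrt((D - 1) / D) * (math.log(x[k]) - others)

behaviours = ["sleep", "SB", "LPA", "MVPA"]
x = [600.0, 540.0, 240.0, 60.0]  # hypothetical min/day

# One outcome per behaviour, as in approach 1 (one LMM per outcome)
outcomes = {name: first_pivot(x, k) for k, name in enumerate(behaviours)}

# The D outcomes are linearly dependent: they always sum to zero
print(math.isclose(sum(outcomes.values()), 0.0, abs_tol=1e-12))  # True
```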

These methods can be useful approaches; however, they also have limitations. Firstly, all three approaches may have an inflated type 1 error rate associated with running repeated analyses on the same data [39]. Moreover, while approach 1 may be useful if a researcher has a particular interest in a single behaviour, as described earlier, the coefficients for a single pivot coordinate are often less meaningful to interpret in practice [40]. Similar difficulties exist when interpreting results for individual olr coordinates using approach 2, particularly for coordinates $z_2, \dots, z_{D-1}$, where only some behaviours are involved in their construction. Approach 2 may allow results to be transformed back into the simplex for more meaningful interpretation via model-based estimates. However, this requires estimates from multiple, independent models to be pieced together. While approach 3 is widely used, it disregards the structural linear dependence among compositional behaviours (e.g., a guaranteed, structural, spurious non-zero correlation [41]). Because each behaviour is treated as independent and regressed separately in a univariate manner, this approach cannot ensure that predicted time across behaviours sums to 24 hours (see Supplementary file for examples). Importantly, while approach 2 does model each element of the complete vector of olr coordinates, it ignores any correlation structure among the olr-variables. By using individual models of the form of Equation (5), we are assuming that the random effects and errors come from separate, and unrelated, normal distributions. However, this is unlikely to be true for CoDa, where the olr-coordinates are intrinsically multivariate in nature and are usually correlated. From an interpretation standpoint, this means that the unrelated models provide no information on how behavioural trade-offs may occur to fit into the 24-h day, for example, whether people who sleep more also accumulate relatively more physical activity within their waking day.
Importantly, using the multiple models approach also means that, once transformed back to the simplex, model-based estimates and residuals may differ depending on the sign matrix and associated olr-basis used to construct the coordinates (see examples in supplementary material 1). This is a problem when working with compositional data, where invariance under a change of olr-basis is one of the fundamental principles [23]. Indeed, the use of multiple, independent models as outlined in approach 2 above implicitly assumes no relationships between olr coordinates. This is equivalent to specifying diagonal random-effect and residual covariance matrices (heterogeneous variances with zero correlation between elements) in the multivariate sense. A change of olr-basis is an orthogonal transformation of the coordinates, $\mathbf{z}^{*} = \mathbf{P}\mathbf{z}$, under which a covariance matrix $\boldsymbol{\Sigma}$ becomes $\mathbf{P}\boldsymbol{\Sigma}\mathbf{P}^{\top}$, which is generally no longer diagonal. However, if, after the change of basis, the model is solved again using the multiple-univariate model procedure outlined in approach 2, the matrices are again implicitly assumed to be diagonal, and therefore the result obtained before the change of basis cannot be reproduced. Simply put, the only way to obtain equivalent results before and after the change of basis is to use a multivariate procedure that accounts for the relationships between olr coordinates (see supplementary file for further details).
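This argument can be checked numerically. The sketch below assumes a hypothetical diagonal covariance matrix in one olr-basis (an illustrative passive/active SBP) and maps it to the sleep-first pivot basis through the orthogonal transition matrix $\mathbf{P} = \mathbf{V}_{\text{pivot}}^{\top}\mathbf{V}_{\text{SBP}}$, where each $\mathbf{V}$ is the orthonormal contrast matrix of its sign matrix. The transformed matrix $\mathbf{P}\boldsymbol{\Sigma}\mathbf{P}^{\top}$ has a non-zero off-diagonal entry, so a set of univariate models refitted in the new basis (which forces that entry to zero) describes a different model. All numbers and helper names are illustrative.

```python
import math

def contrast_matrix(signs):
    """D x (D-1) matrix whose orthonormal columns encode an SBP sign matrix."""
    D = len(signs[0])
    V = [[0.0] * len(signs) for _ in range(D)]
    for j, row in enumerate(signs):
        n_pos = sum(1 for sign in row if sign > 0)
        n_neg = sum(1 for sign in row if sign < 0)
        a = math.sqrt(n_pos * n_neg / (n_pos + n_neg))
        for k, sign in enumerate(row):
            if sign > 0:
                V[k][j] = a / n_pos
            elif sign < 0:
                V[k][j] = -a / n_neg
    return V

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

sbp = [[+1, +1, -1, -1], [+1, -1, 0, 0], [0, 0, +1, -1]]
pivot = [[+1, -1, -1, -1], [0, +1, -1, -1], [0, 0, +1, -1]]

V_sbp, V_piv = contrast_matrix(sbp), contrast_matrix(pivot)
# z_pivot = P z_sbp, with P = V_piv^T V_sbp (an orthogonal matrix)
P = matmul(transpose(V_piv), V_sbp)

# Hypothetical diagonal covariance in the SBP basis (what separate models assume)
S_sbp = [[0.04, 0.0, 0.0], [0.0, 0.09, 0.0], [0.0, 0.0, 0.25]]
S_piv = matmul(matmul(P, S_sbp), transpose(P))

for row in S_piv:
    print([round(v, 4) for v in row])
```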

Moreover, while it is possible to test the effects on individual olr-coordinates using the multiple-models approach, there is difficulty in performing a single test of the joint effects on the complete vector of olr-coordinates [42], for example, to explore whether the composition of 24-h movement behaviours changed differently between intervention and control groups. In order to overcome these limitations a CMRLMM is required, as presented in Section 2.

2. Compositional multivariate-response linear mixed model

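We propose extending the univariate mixed model in Equation (5) to the multivariate case, where response vectors are not modelled with implied independence. By taking the univariate equation above and adding the subscript r to indicate the specific olr-response in question (i.e., $z_{rij}$, $r = 1, \dots, D-1$), the dependence between response vectors can be more easily specified. Our multivariate formula now becomes:

$$z_{rij} = \beta_{0r} + u_{0ri} + \beta_{1r}\, \mathrm{timepoint}_{ij} + \varepsilon_{rij}, \quad r = 1, \dots, D-1, \tag{6}$$

where $z_{rij}$ is response $r$ for the $i$th participant at the $j$th timepoint; $\beta_{0r}$ is the fixed intercept specific to response olr-coordinate $r$; and $\beta_{1r}$ is the fixed ‘slope’ (the contrast between consecutive time-points) specific to response olr-coordinate $r$. Note that the general formula outlined here assumes a linear change over timepoints and is used for simplicity of notation. Where this is not true, $\beta_{1r}\, \mathrm{timepoint}_{ij}$ can be replaced by $\sum_{t=2}^{T} \beta_{tr}\, \mathbb{1}(\mathrm{timepoint}_{ij} = t)$ to estimate the olr-responses over the T timepoints, where $\mathbb{1}(\cdot)$ is an indicator function that equals 1 when its argument is true and 0 otherwise. So far, this appears similar to the previous formula. However, unlike Equation (5), in Equation (6) the random effects $\mathbf{u}_{0i} = (u_{01i}, \dots, u_{0(D-1)i})^{\top}$ and errors $\boldsymbol{\varepsilon}_{ij} = (\varepsilon_{1ij}, \dots, \varepsilon_{(D-1)ij})^{\top}$ are vectors of length $D-1$ and are now assumed to come from a single multivariate normal distribution, rather than from multiple unrelated univariate distributions, as follows:

$$\mathbf{u}_{0i} \sim \mathcal{N}_{D-1}(\mathbf{0}, \boldsymbol{\Sigma}_u),$$

where $u_{0ri}$ are potentially differently varying and correlated random intercepts for each response olr-coordinate $r$, specific to person $i$; and

$$\boldsymbol{\varepsilon}_{ij} \sim \mathcal{N}_{D-1}(\mathbf{0}, \boldsymbol{\Sigma}_\varepsilon),$$

where $\varepsilon_{rij}$ are potentially differently varying and correlated random errors for each response olr-coordinate $r$, specific to person $i$ at time-point $j$; with group-level and residual-level $(D-1) \times (D-1)$ variance-covariance matrices

$$\boldsymbol{\Sigma}_u = \begin{pmatrix} \sigma^2_{u,1} & \sigma_{u,12} & \cdots & \sigma_{u,1(D-1)} \\ \sigma_{u,21} & \sigma^2_{u,2} & \cdots & \sigma_{u,2(D-1)} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{u,(D-1)1} & \sigma_{u,(D-1)2} & \cdots & \sigma^2_{u,D-1} \end{pmatrix} \tag{7}$$

and

$$\boldsymbol{\Sigma}_\varepsilon = \begin{pmatrix} \sigma^2_{\varepsilon,1} & \sigma_{\varepsilon,12} & \cdots & \sigma_{\varepsilon,1(D-1)} \\ \sigma_{\varepsilon,21} & \sigma^2_{\varepsilon,2} & \cdots & \sigma_{\varepsilon,2(D-1)} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{\varepsilon,(D-1)1} & \sigma_{\varepsilon,(D-1)2} & \cdots & \sigma^2_{\varepsilon,D-1} \end{pmatrix}, \tag{8}$$

respectively. Or, written more fully in expanded form,

$$\boldsymbol{\Sigma}_u = \begin{pmatrix} \sigma^2_{u,1} & \rho^{u}_{12}\sigma_{u,1}\sigma_{u,2} & \cdots & \rho^{u}_{1(D-1)}\sigma_{u,1}\sigma_{u,D-1} \\ \rho^{u}_{21}\sigma_{u,2}\sigma_{u,1} & \sigma^2_{u,2} & \cdots & \rho^{u}_{2(D-1)}\sigma_{u,2}\sigma_{u,D-1} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{u}_{(D-1)1}\sigma_{u,D-1}\sigma_{u,1} & \rho^{u}_{(D-1)2}\sigma_{u,D-1}\sigma_{u,2} & \cdots & \sigma^2_{u,D-1} \end{pmatrix} \tag{9}$$

and

$$\boldsymbol{\Sigma}_\varepsilon = \begin{pmatrix} \sigma^2_{\varepsilon,1} & \rho^{\varepsilon}_{12}\sigma_{\varepsilon,1}\sigma_{\varepsilon,2} & \cdots & \rho^{\varepsilon}_{1(D-1)}\sigma_{\varepsilon,1}\sigma_{\varepsilon,D-1} \\ \rho^{\varepsilon}_{21}\sigma_{\varepsilon,2}\sigma_{\varepsilon,1} & \sigma^2_{\varepsilon,2} & \cdots & \rho^{\varepsilon}_{2(D-1)}\sigma_{\varepsilon,2}\sigma_{\varepsilon,D-1} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{\varepsilon}_{(D-1)1}\sigma_{\varepsilon,D-1}\sigma_{\varepsilon,1} & \rho^{\varepsilon}_{(D-1)2}\sigma_{\varepsilon,D-1}\sigma_{\varepsilon,2} & \cdots & \sigma^2_{\varepsilon,D-1} \end{pmatrix}, \tag{10}$$

where correlations at the differing levels of the model are differentiated with the $u$ and $\varepsilon$ superscripts, so that $\sigma_{u,rs} = \rho^{u}_{rs}\sigma_{u,r}\sigma_{u,s}$ and $\sigma_{\varepsilon,rs} = \rho^{\varepsilon}_{rs}\sigma_{\varepsilon,r}\sigma_{\varepsilon,s}$ for $r \neq s$. Note that both matrices are symmetric, that is, $\rho^{u}_{rs} = \rho^{u}_{sr}$ and $\rho^{\varepsilon}_{rs} = \rho^{\varepsilon}_{sr}$ for any $(r, s)$ entry.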
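The link between Equations (7) and (9) can be sketched as follows: each covariance is a correlation multiplied by two standard deviations, and a valid matrix must also be positive definite, which a Cholesky factorisation checks. The standard deviations and correlations below are hypothetical values for a four-part composition (three olr-coordinates), not estimates from the worked example. The Cholesky factor is also used to draw one vector of correlated random intercepts $\mathbf{u}_{0i} \sim \mathcal{N}_{3}(\mathbf{0}, \boldsymbol{\Sigma}_u)$.

```python
import math
import random

def cov_from_sd_corr(sd, corr):
    """Eq. (9)-style entries: Sigma[r][s] = rho_rs * sd_r * sd_s."""
    n = len(sd)
    return [[corr[r][s] * sd[r] * sd[s] for s in range(n)] for r in range(n)]

def cholesky(A):
    """Lower-triangular L with A = L L^T; ValueError if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - acc
                if d <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - acc) / L[j][j]
    return L

# Hypothetical group-level SDs and correlations for three olr-coordinates (D = 4)
sd_u = [0.30, 0.20, 0.40]
rho_u = [[1.0, 0.5, -0.2],
         [0.5, 1.0, 0.1],
         [-0.2, 0.1, 1.0]]

Sigma_u = cov_from_sd_corr(sd_u, rho_u)
L = cholesky(Sigma_u)

# One draw of correlated random intercepts u_0i ~ N(0, Sigma_u)
random.seed(2024)
g = [random.gauss(0.0, 1.0) for _ in range(3)]
u_0i = [sum(L[r][k] * g[k] for k in range(r + 1)) for r in range(3)]
```

Not every set of pairwise correlations yields a positive definite matrix, so the Cholesky check is a cheap sanity test when specifying or simulating from these covariance structures.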

Unlike Equation (5), this model accounts for the correlation between olr-coordinates by estimating covariances between random effect and residual error terms, shown as the off-diagonal elements above, which are constrained to zero when using the multiple models approach outlined earlier. Thus, the model outlined in Equation (6) allows olr-responses to vary (and covary) flexibly at both the group (participant) level and the residual level. Estimates can then be made for olr-coordinates while respecting the multivariate nature of the data. Just like in the model outlined in Equation (5), the estimates at all levels of the model, including the residuals, will be specific to the basis chosen when constructing the olr-coordinates. However, when (uniquely) transformed back into the simplex space using the inverse transformation $\mathrm{olr}^{-1}(\cdot)$, the compositional coefficients, and any time-use estimates made using the model, are invariant to the basis used to construct the original olr-coordinates. Currently, in a frequentist framework, the multivariate response linear mixed model of Equation (6) cannot be fitted directly with the most popular linear mixed model packages in R, nlme [43] and lme4 [44]. However, there are two potential solutions to this problem: (a) use a Bayesian framework to fit the multivariate response linear mixed model, for example, using brms::brm(); or (b) re-define the data structure to fit an equivalently specified (frequentist) univariate response linear mixed model using nlme::lme() and lme4::lmer(). Each paradigm has advantages and limitations. Bayesian methods are highly flexible, can naturally accommodate multivariate outcomes, and provide full posterior distributions of parameters for richer uncertainty quantification. However, they are typically more computationally demanding and require appropriate prior specification. Frequentist approaches, by contrast, enable formal hypothesis testing and are computationally efficient in some scenarios.
However, problems can sometimes arise when fitting models with complex random-effects structures, where convergence or boundary fit issues may occur. Specifically, in the case of (near) zero variance components that cause boundary fit issues, Bayesian estimation can offer greater numerical stability through proper priors on the variance parameters, where sampling at 0 poses no issues. In practice, the frequentist framework (knowingly or unknowingly) is also currently the most familiar statistical approach used by time-use and other CoDA researchers. The popularity of frequentist methods has motivated us to formalise this approach, in the hope of creating a template for future compositional response linear multilevel models for repeated observations or clustered data (although choosing one approach need not exclude the other). For the interested reader or CoDA practitioner, Bayesian CMRLMMs can be fitted natively on “wide format” (discussed below) multivariate response data using the R package `multilevelcoda`, which also contains a discussion of frequentist vs Bayesian approaches [45].

In order to fit Equation (6) using the frequentist framework, the data have to be restructured so that all olr-coordinate responses are in a single column to be used as the dependent variable. We term this re-defined data structure, used to fit an equivalently specified univariate response linear mixed model, the “stacked response” linear mixed model approach. The terminology is borrowed from the limited online resources [46,47] and the even fewer published descriptions [48,49], and reflects how the multivariate response vectors are stacked to make a univariate response vector. Additional dummy variables are then needed to specify the olr-coordinate response in question, that is, a dummy variable $d_r$ for each response $r = 1, 2, \dots, D-1$. Our original dataset, which had one row per individual per timepoint, now has $D-1$ times as many rows, with one row per olr-coordinate, per participant, per timepoint. We can then specify a single model that allows for individual changes in each olr-response, while accounting for the structure of the random effects and error terms as specified in Equations (9) and (10). An example of the stacking process is outlined below for the $i$th individual in Tables 1 and 2. In the restructured dataset, the vector of olr-coordinates is contained in a single column, along with the dummy variables $d_1, \dots, d_{D-1}$.

Table 1. The original data in general form in wide format for the ith individual.

https://doi.org/10.1371/journal.pone.0340373.t001

Table 2. The restructured dataset in general form in long format for the ith individual after ‘stacking’ the multivariate olr-response variable.

https://doi.org/10.1371/journal.pone.0340373.t002
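The stacking step in Tables 1 and 2 can be sketched in Python as below. The sketch turns hypothetical wide-format rows (one per person and timepoint, with three olr-coordinate columns) into long-format rows, with one 0/1 indicator column per olr-response. The column names (`id`, `time`, `olr1` to `olr3`, `response`, `y`, `d1` to `d3`) and the values are illustrative assumptions. They are not the layout used in the authors' supplementary code.

```python
from typing import Dict, List


def stack_olr(wide_rows: List[Dict], n_olr: int) -> List[Dict]:
    """Stack n_olr (= D-1) olr-coordinate columns into one response column.

    Each wide row (person i at timepoint j) becomes n_olr long rows, one per
    olr-response r, with indicator (dummy) columns d1..d{n_olr}.
    """
    long_rows = []
    for row in wide_rows:
        for r in range(1, n_olr + 1):
            new_row = {"id": row["id"], "time": row["time"],
                       "response": r, "y": row[f"olr{r}"]}
            # Dummy variables: d_k = 1 only for the olr-response held in this row.
            for k in range(1, n_olr + 1):
                new_row[f"d{k}"] = 1 if k == r else 0
            long_rows.append(new_row)
    return long_rows


# Hypothetical wide-format data: person 1 at two timepoints, D = 4 parts -> 3 olr columns.
wide = [
    {"id": 1, "time": 1, "olr1": 0.52, "olr2": 0.31, "olr3": 1.20},
    {"id": 1, "time": 2, "olr1": 0.55, "olr2": 0.35, "olr3": 1.31},
]
stacked = stack_olr(wide, n_olr=3)
for row in stacked:
    print(row)
```

Each wide row becomes D-1 long rows, so the dataset grows from ij to ijr rows, as described above.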

The dummy variables are binary variables indicating which olr-response is being considered. By expanding Equation (5) we can now fit Equation (8), where the response is the notionally ‘univariate’ response associated with the stacked vectors of responses:

(11)

Or more succinctly expressed as

(12)

Here, the olr (or response index) indicator variables are the dummy variables associated with each olr-response, as specified in Table 2. The indicator for response r takes the value 1 in the rows of data that contain the r-th olr-coordinate as the dependent variable, and 0 otherwise. Thus, the dummy variables act as an additional level in the model that simply defines the response structure. Of note, compared with the univariate mixed model in Equation (5), the multivariate model in this form no longer has an overall intercept. Instead, the intercept for each olr-response is estimated separately through the dummy coding. Equation (11) can now be used to estimate the vector of responses simultaneously while respecting the multivariate nature of the data. Expanding the formula to include additional demographic predictors, such as socio-economic status or body-mass index, is then relatively simple. These can be included in the model call as interaction terms between the dummy variables and the variables of interest, so that each fixed effect has a separate estimate for each olr-response.
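To make the no-intercept, response-specific parameterisation concrete, the Python sketch below builds a fixed-effect design row from one stacked row. The row has one column per olr indicator (the response-specific intercepts), followed by one column per indicator × covariate product (the response-specific slopes). The covariate names (`zbmi`, `ses`) and their values are hypothetical.

```python
from typing import Dict, List


def design_row(row: Dict, n_olr: int, covariates: List[str]) -> List[float]:
    """Fixed-effect design row with no overall intercept.

    Columns: d_1..d_R (response-specific intercepts), then, for each covariate x,
    d_1*x..d_R*x (response-specific slopes for that covariate).
    """
    dummies = [float(row[f"d{k}"]) for k in range(1, n_olr + 1)]
    cols = list(dummies)
    for cov in covariates:
        cols.extend(d * row[cov] for d in dummies)
    return cols


# One stacked row for olr-response 2 of a child with a (hypothetical) zBMI of 0.8.
row = {"d1": 0, "d2": 1, "d3": 0, "zbmi": 0.8}
print(design_row(row, n_olr=3, covariates=["zbmi"]))  # -> [0.0, 1.0, 0.0, 0.0, 0.8, 0.0]
```

Multiplying this row by the coefficient vector picks out only the intercept and zBMI slope belonging to olr-response 2. In an R model formula, a similar structure could be written along the lines of `y ~ 0 + d1 + d2 + d3 + (d1 + d2 + d3):zbmi`, with the random-effects and residual structures specified separately. This is an illustrative formula, not the authors' exact specification.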

Multivariate test on fixed effects

Another key advantage of the CMRLMM is the ability to perform a multivariate test on the fixed effects, such as a multivariate F test or Wald chi-square test. Such a test can assess whether composition changes across timepoints, or differs between groups or across other variables of interest. When using the multiple models approach outlined in Section 1.2, it is only possible to perform univariate tests on individual olr-coordinates. Because the CMRLMM includes the complete vector of olr-coordinates, tests that are multivariate in nature can also be conducted. Moreover, because the covariances between the olr-coordinates are accounted for at both levels of the model, the compositional representation of the model is equivalent regardless of the basis chosen when constructing the olr-coordinates, meaning results will be consistent.
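As a point of reference, one standard form of such a joint test is the general Wald statistic below. It is written for a hypothetical q × p contrast matrix $L$ that selects the fixed effects of interest for every olr-response at once (for example, all 3 × 4 = 12 timepoint-by-olr contrasts). This is a generic textbook form rather than the exact computation performed by any particular R package. Here $\hat{\boldsymbol{\beta}}$ and $\widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}})$ denote the estimated fixed effects and their estimated covariance matrix.

```latex
W = \left(L\hat{\boldsymbol{\beta}}\right)^{\top}
    \left(L\,\widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}})\,L^{\top}\right)^{-1}
    \left(L\hat{\boldsymbol{\beta}}\right)
    \;\overset{a}{\sim}\; \chi^{2}_{q},
\qquad
F = \frac{W}{q}
```

Under an orthonormal change of olr basis, the tested parameters are transformed by an invertible linear map, which leaves $W$ unchanged. This holds provided the random-effect and residual covariance matrices are unstructured, so that the fitted model itself is unchanged. The F form refers $W/q$ to an F distribution whose denominator degrees of freedom must be approximated.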

3. Example

In this section we use data from the Life on Holidays (LoH) study, for which a full protocol describing data collection methods has been published previously [50]. LoH was a longitudinal cohort study based in Adelaide, Australia. It aimed to track changes in the 24-h activity composition, diet and weight status of primary school-aged children (n = 241) during the school year and the summer school holiday period. Ethical approval for the original Life on Holidays study was obtained from The University of South Australia Human Research Ethics Committee (200980), the South Australian Department of Education and Child Development (2008–0055) and the Adelaide Catholic Education Centre (201820). Time use was measured using wrist-worn GENEActiv accelerometers at five timepoints across two school years between February 1st 2019 and November 30th 2021:

- Timepoint 1, at the start of Grade 4 (February–March);
- Timepoint 2, at the end of Grade 4 (October–November);
- Timepoint 3, during the summer holiday period;
- Timepoints 4 and 5, at the start and end of the Grade 5 school year, respectively.

Time-use composition was conceptualised as a 4-part composition (D = 4) consisting of time spent in sleep, SB, LPA and MVPA. Each minute of the day was classified as SB, LPA or MVPA from the accelerometer recordings using validated cutpoints [51], with sleep distinguished from waking time using a validated algorithm [52]. An unconditional CMRLMM was initially fitted without any covariates. Baseline categorical socio-economic status, as determined by parental income, and continuous BMI z-score (zBMI) were then included in the full model as examples of how to include time-invariant covariates.

Children’s movement-behaviour compositions at each study time-point (index j currently ignored for clarity) were expressed as olr-coordinates using the SBP and sign matrix shown below. The four parts of the compositional vector reflected time spent in sleep, SB, LPA and MVPA, respectively. Conceptually, the SBP represents a (divisive) clustering dendrogram: it starts with the entire set of behaviours and recursively partitions the (sub)sets of behaviours until each leaf contains a single behaviour. In the accompanying sign matrix, each row represents an olr-coordinate. A +1 identifies behaviours in the numerator group of the log-ratio, –1 identifies those in the denominator group, and 0 indicates that the behaviour is not part of that olr-coordinate. Note: when considering the raw compositional coefficients, time-use estimates and interaction effects for the complete vector of coordinates, the choice of SBP is arbitrary.
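Based on the partition described in the next paragraph (sleep versus all waking behaviours, then SB versus LPA and MVPA, then LPA versus MVPA), the sign matrix takes the following form. It is reconstructed from that description, with columns ordered sleep, SB, LPA, MVPA:

```latex
\begin{array}{c|cccc}
                   & \text{Sleep} & \text{SB} & \text{LPA} & \text{MVPA} \\ \hline
\text{olr}_1 & +1 & -1 & -1 & -1 \\
\text{olr}_2 & 0  & +1 & -1 & -1 \\
\text{olr}_3 & 0  & 0  & +1 & -1
\end{array}
```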

According to the sign matrix above, the resultant olr-coordinates were then calculated as follows (ignoring the person i and timepoint j indices for simplicity):

- the first coordinate represents the ratio of sleep to the geometric mean of all waking behaviours combined;
- the second coordinate represents the ratio of SB to the geometric mean of the active behaviours (LPA and MVPA);
- the third coordinate contrasts LPA with MVPA.
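The pivot (sequential binary partition) coordinates implied by this partition can be sketched in Python as below. The sketch assumes the standard orthonormal balance formula $z_k = \sqrt{(D-k)/(D-k+1)}\,\ln\{x_k / g(x_{k+1},\dots,x_D)\}$, where $g$ is the geometric mean. The function name `pivot_olr` and the example day are hypothetical; they are not taken from the LoH data or the authors' supplementary code.

```python
import math
from typing import List, Sequence


def pivot_olr(x: Sequence[float]) -> List[float]:
    """Pivot olr-coordinates for a D-part composition (assumed standard formula).

    z_k = sqrt(m / (m + 1)) * ln( x_k / geometric_mean(x_{k+1}, ..., x_D) ),
    where m = D - k is the number of parts after the pivot part k.
    With parts ordered (sleep, SB, LPA, MVPA):
    z1 = sleep vs waking, z2 = SB vs LPA & MVPA, z3 = LPA vs MVPA.
    """
    if any(v <= 0 for v in x):
        raise ValueError("all parts must be strictly positive (log of zero is undefined)")
    D = len(x)
    z = []
    for k in range(D - 1):                 # 0-based index of the pivot part
        m = D - k - 1                      # number of parts in the denominator group
        log_gmean_rest = sum(math.log(v) for v in x[k + 1:]) / m
        z.append(math.sqrt(m / (m + 1)) * (math.log(x[k]) - log_gmean_rest))
    return z


# Hypothetical day (min/day): sleep, SB, LPA, MVPA (sums to 1440).
day = [600.0, 480.0, 300.0, 60.0]
print([round(v, 4) for v in pivot_olr(day)])
```

Because only log-ratios enter the formula, rescaling the whole day (for example, from minutes to hours) leaves the coordinates unchanged.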

For person $i$ and time-point $j$, the corresponding three response ($r = 1, 2, 3$) olr-coordinates $z_{ijr}$ were then modelled as

$$z_{ijr} = \beta_{0r} + \sum_{t=2}^{5} \beta_{tr}\, I(j = t) + u_{ir} + \varepsilon_{ijr}, \quad r = 1, 2, 3 \tag{12}$$

where:

- $\beta_{0r}$ is the fixed intercept specific to response olr-coordinate $r$;
- $I(\cdot)$ is the indicator function, equal to 1 when its argument is true and 0 otherwise;
- $\beta_{tr}$ is the fixed ‘slope’/contrast between time-point $t$ and time-point 1, specific to response olr-coordinate $r$;
- $u_{ir}$ are potentially differently varying and correlated random intercepts for each response olr-coordinate $r$, specific to person $i$;
- $\varepsilon_{ijr}$ are potentially differently varying and correlated random errors for each response olr-coordinate $r$, specific to person $i$ at time-point $j$.

The group- and residual-side variance and covariance matrices are defined according to Equations (9) and (10), respectively.

In the LoH study, a linear relationship between the follow-up timepoints and each olr-response cannot be assumed, because time use was hypothesised to differ during the school holiday period compared with the in-school timepoints. Therefore, categorical timepoint indicators, contrasted with the first study timepoint, were used. Consider instead the more general case of repeated measures over time (possibly with a different number of measures per person) that may not be equally spaced but have a linear relationship with each olr-response. There, one would replace the somewhat notationally clumsy indicator-function timepoint terms with a single slope for each response r, multiplied by a potentially continuous time value unique to the j-th repeated measure of person i. This principle applies equally when time is not the predictor of interest. For example, in cross-sectional clustered designs, adiposity could be treated as continuous (zBMI) or categorical (overweight/obese status). The choice between categorical and continuous predictors does not alter the core structure of the model.

An additional level of nesting, to account for participants being nested within schools, was also tested. However, the variance components for this level were very low, suggesting little school-to-school variation, so the school-level random effects were dropped in the interest of model parsimony. To fit Equation (12), the three response variables were then stacked into a single column to be used as the dependent variable, as described earlier. Models were fitted in R using nlme::lme() [43]. R code, along with an alternative model specification using lme4::lmer() [44], is provided in supplementary material 1.

Table 3 presents parameter estimates and standard errors for the fixed effects of Equation (12). p values are reported using the ‘inner-outer’ approximation of degrees of freedom, which is the default in nlme::lme() [43]. The accurate estimation of degrees of freedom and p values in multilevel models is a frequently discussed issue [53]. Other degrees of freedom approximations have been proposed, including the Satterthwaite and Kenward–Roger approximations [54], along with other methods of quantifying uncertainty (e.g., bootstrapping). However, a detailed comparison of these methods is beyond the scope of this article. Results suggest higher values for all three olr-coordinates over time, particularly at timepoint 3. zBMI was positively associated with the third olr-coordinate, representing the balance of LPA to MVPA, suggesting that the ratio of LPA to MVPA increased as zBMI increased.

The estimated level-2 (between-individual) random-effects variance/covariance matrix, as described in Equation (9), is shown below:

The estimated level-1 residual (within-individual and time-point) error variance/covariance matrix, as described in Equation (10), is:

Positive correlations among all three olr-coordinates were observed in both the within- and between-person covariance matrices. Importantly, these correlations are overlooked when using the multiple univariate response modelling approach outlined earlier, and this affects our results in several ways.

Firstly, the covariance components offer additional insight into how people allocate their time. Depending on the sign matrix used to construct the coordinates, they can support behaviourally meaningful inferences. For example, the positive correlation between olr1 and olr2 in the between-person G matrix suggests the following pattern: individuals who on average spend more time asleep relative to awake (olr1) also tend to spend more of their waking day accumulating SB than active time (olr2). These insights are lost when fitting multiple unrelated models to either individual olr-coordinates or raw min/day of each behaviour. Likewise, the correlations between the fixed-effect parameters allow for valid joint uncertainty for the full olr vector (supplementary material).

Additionally, model fit and statistical power can improve compared with the multiple models approach used previously. To demonstrate this, we fitted the equivalently specified multivariate response model with the random-effect and residual covariance matrices constrained to be diagonal (covariances set to zero). We refer to this as the multivariate unrelated outcomes model. It provides the same estimates at all levels of the model as the multiple univariate models. However, because the multivariate unrelated outcomes model is nested within the fully multivariate model outlined above, the two can be compared via a likelihood ratio test. Results of the likelihood ratio test suggest that adding the covariances between olr-coordinates at both levels of the model improves model fit (Table 4).

Table 4. Likelihood ratio test comparing the CMRLMM with the multivariate unrelated outcomes model with assumed diagonal covariance matrices.

https://doi.org/10.1371/journal.pone.0340373.t004
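The likelihood ratio test in Table 4 compares twice the difference in log-likelihoods with a chi-square distribution. Its degrees of freedom equal the number of covariance parameters removed. Assuming unstructured 3 × 3 matrices at both levels, that is 3 between-person plus 3 residual covariances, giving 6 df. The Python sketch below uses hypothetical log-likelihoods (not the values in Table 4). It also uses a closed-form upper-tail probability that holds only for even degrees of freedom. In R, the anova() method for lme fits performs this comparison directly. If REML fits are compared, both models must have identical fixed effects, which is the case here.

```python
import math


def chi2_sf_even_df(stat: float, df: int) -> float:
    """Upper-tail chi-square probability P(X > stat), valid only for EVEN df.

    Closed form: exp(-s/2) * sum_{k=0}^{df/2 - 1} (s/2)^k / k!
    (avoids needing an external statistics library).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = stat / 2.0
    term, total = 1.0, 1.0                  # k = 0 term
    for k in range(1, df // 2):
        term *= half / k                    # (s/2)^k / k!
        total += term
    return math.exp(-half) * total


def lrt(loglik_full: float, loglik_reduced: float, df: int):
    """Likelihood ratio statistic 2 * (ll_full - ll_reduced) and its p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods (NOT the Table 4 values); 3 + 3 covariances -> 6 df.
stat, p = lrt(loglik_full=-1200.0, loglik_reduced=-1215.0, df=6)
print(round(stat, 2), p)
```

Because covariances can take either sign, zero lies inside their parameter space, so no boundary correction to the chi-square reference is needed. Testing variances against zero would require one.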

Another important difference is that the results of the CMRLMM are invariant to the basis used to construct the olr-coordinates. This is not the case when the covariance matrices are constrained to be diagonal, as happens implicitly when fitting independent models to each olr-coordinate. To demonstrate this, Fig 1 shows the fitted values for 10 randomly sampled level-2 units, transformed back to their compositional representation. The fitted values come from the CMRLMM and from multiple independent models, each fitted with olr-coordinates constructed using two different orthonormal bases. The compositional representation is equivalent for the two CMRLMMs. However, results from the independent models approach differ depending on the basis used, and both also differ from those provided by the CMRLMM. Further model comparisons are provided in S1, S2 and S3 Files.

Fig 1. Fitted values for 10 randomly selected level-2 units with coordinates constructed using two different olr bases.

Abbreviations: mrlmm = multivariate response linear mixed model; urlmm = univariate response linear mixed model.

https://doi.org/10.1371/journal.pone.0340373.g001
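A small covariance calculation shows why the independent-models results depend on the basis. Changing from one orthonormal olr basis to another rotates the coordinates, $z_B = H z_A$ with $H$ orthogonal, so their covariance becomes $H \Sigma_A H^\top$. The Python sketch below uses a hypothetical two-coordinate (three-part) example and a 45-degree rotation chosen purely for illustration. It shows a diagonal covariance in basis A acquiring non-zero off-diagonal terms in basis B.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


# Hypothetical diagonal ("independent models") covariance of two olr-coordinates in basis A.
sigma_a = [[1.0, 0.0],
           [0.0, 4.0]]

# Another orthonormal basis B: z_B = H z_A with H orthogonal (45-degree rotation, illustrative).
theta = math.pi / 4
H = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

# Covariance of the same coordinates expressed in basis B.
sigma_b = matmul(matmul(H, sigma_a), transpose(H))
print([[round(v, 3) for v in row] for row in sigma_b])
```

A zero-covariance assumption that holds in one basis therefore generally fails in another. An unstructured covariance matrix can represent both cases, which is consistent with the basis invariance of the CMRLMM seen in Fig 1.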

Compared with fitting four separate and independent non-compositional models on raw minutes/day, the CMRLMM offers further benefits. Because the non-compositional models do not explicitly model covariances, they provide no information on the relationships between behaviours, as described above. They also do not respect the constant-sum constraint of the data (S1, S2 and S3 Files).

Unlike the multiple models approach used previously, the multivariate mixed model also allows a single test of the joint effect of a predictor on the vector of olr-coordinates, which is invariant to the basis chosen when constructing the olr-coordinates. The results of the multivariate F-test are presented in Table 5. Denominator degrees of freedom were obtained using the ‘inner-outer’ approximation [55]. The F statistic for the interaction between the vector of olr-coordinates and timepoints (F(12, 2475) = 9.832, p < 0.001) indicates that movement-behaviour composition differed significantly across timepoints. Results also suggest a significant interaction between zBMI and movement-behaviour composition (F(3, 2475) = 4.44, p = 0.004). In contrast, parental income and movement-behaviour composition were not significantly associated (F(6, 2475) = 1.86, p = 0.08). The ability to perform multivariate tests such as these is limited when creating separate models for individual olr-coordinates. Three-way interactions between olr-coordinates, timepoint and covariates were also tested but were not significant. This suggests that children followed a similar pattern of change across all timepoints regardless of zBMI and parental income.

The multivariate F-test suggests that movement-behaviour compositions differ across timepoints, but it does not indicate which timepoints differ, on which components, or in what direction. To investigate this, the estimates presented in Table 3, which are specific to the basis chosen when constructing the olr-coordinates, need to be back-transformed to their compositional representation using the inverse olr transformation. For example, back-transforming the vector of intercept estimates gives the estimated mean movement-behaviour composition at baseline (assuming a zBMI of zero and the reference SES group), as below:

(13)
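Back-transformation can be sketched in Python as follows, again assuming the pivot coordinates described earlier. Each coordinate is mapped back to centred log-ratios through its orthonormal basis vector, then exponentiated and closed to 1440 min/day. The function names are hypothetical. The forward function is repeated so that the sketch runs on its own. Applying the inverse to a vector of intercept estimates would give a baseline composition analogous to Equation (13).

```python
import math
from typing import List, Sequence


def pivot_olr(x: Sequence[float]) -> List[float]:
    """Forward pivot olr: z_k = sqrt(m/(m+1)) * ln(x_k / gmean(parts after k)),
    where m is the number of parts after part k."""
    D = len(x)
    z = []
    for k in range(D - 1):
        m = D - k - 1
        log_gmean_rest = sum(math.log(v) for v in x[k + 1:]) / m
        z.append(math.sqrt(m / (m + 1)) * (math.log(x[k]) - log_gmean_rest))
    return z


def pivot_olr_inverse(z: Sequence[float], total: float = 1440.0) -> List[float]:
    """Back-transform pivot olr-coordinates to a composition closed to `total`."""
    D = len(z) + 1
    clr = [0.0] * D
    for k in range(D - 1):
        m = D - k - 1                       # parts in the denominator group
        coef = math.sqrt(m / (m + 1))
        # Orthonormal basis vector (clr space) for coordinate k:
        # +coef on part k, -coef/m on each later part, 0 on earlier parts.
        clr[k] += z[k] * coef
        for j in range(k + 1, D):
            clr[j] -= z[k] * coef / m
    expd = [math.exp(c) for c in clr]       # proportional to the composition
    s = sum(expd)
    return [total * v / s for v in expd]    # closure to `total` minutes


# All-zero coordinates correspond to an equal split of the day.
print(pivot_olr_inverse([0.0, 0.0, 0.0]))  # -> [360.0, 360.0, 360.0, 360.0]

# Round trip on a hypothetical day (min/day): sleep, SB, LPA, MVPA.
day = [600.0, 480.0, 300.0, 60.0]
print([round(v, 6) for v in pivot_olr_inverse(pivot_olr(day))])
```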

Once transformed into the simplex, the compositional coefficient estimates are invariant to the basis chosen when constructing the olr-coordinates, for both the fixed and random effects. After back-transforming the estimates for timepoints T2 to T5, each compositional coefficient can be interpreted as the perturbation vector for that timepoint. Applying it to the baseline composition (closed to 1440 min/day in Table 6) gives the estimated composition at that timepoint. Substantively, the perturbation vector can be read as relative reallocations for each behaviour compared with the neutral perturbation vector, subject to closure, as outlined in Section 1.1. For a four-part composition, as in our demonstration, the neutral perturbation vector [1/D, …, 1/D] is [0.25, 0.25, 0.25, 0.25]. Comparing the compositional coefficients for timepoints 2–5 with this neutral vector, time appears to be reallocated away from MVPA (values < 0.25) and towards SB (values > 0.25) as children age across the timepoints. The largest changes occur at T3 (the school holiday period).
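The perturbation-and-closure step can be illustrated with the short Python sketch below. The baseline composition and the compositional coefficient are hypothetical values chosen for illustration, not the estimates in Table 6.

```python
from typing import List, Sequence


def closure(x: Sequence[float], total: float = 1.0) -> List[float]:
    """Rescale positive parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]


def perturb(x: Sequence[float], p: Sequence[float], total: float = 1440.0) -> List[float]:
    """Perturbation x (+) p: part-wise product followed by closure to `total`."""
    return closure([a * b for a, b in zip(x, p)], total)


# Hypothetical baseline composition (min/day): sleep, SB, LPA, MVPA.
baseline = [600.0, 480.0, 300.0, 60.0]

# Neutral element for D = 4: perturbing by it leaves the composition unchanged.
neutral = [0.25, 0.25, 0.25, 0.25]

# Hypothetical compositional coefficient: SB above 0.25, MVPA below 0.25.
coef = closure([0.25, 0.27, 0.25, 0.23])
print([round(v, 1) for v in perturb(baseline, coef)])
```

In this hypothetical example, sleep has a coefficient of exactly 0.25 yet still loses minutes after closure. This is because the coefficients describe relative reallocations rather than absolute changes.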

Table 6. Compositional coefficient estimates for fixed effects.

https://doi.org/10.1371/journal.pone.0340373.t006

While compositional coefficients are meaningful, it may be preferable to make model-based point estimates of the compositions, and of their differences, across timepoints using the fixed effects from the CMRLMM. Fig 2 shows estimated movement-behaviour compositions at the five timepoints (for a participant with mean zBMI and income category), obtained from predicted olr-values back-transformed into the simplex. 95% percentile intervals for the compositions were estimated using a non-parametric ‘cases’ bootstrap with 1000 replicates, which resamples level-2 units (participants) [56,57].

Fig 2. Estimated movement-behaviour compositions for an average participant across the five timepoints.

Sleep: range 581-586 min/day. SB: range 481-520 min/day. LPA: range 280-298 min/day. MVPA: range 55-77 min/day.

https://doi.org/10.1371/journal.pone.0340373.g002
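The ‘cases’ bootstrap resamples whole participants, keeping all of their repeated rows together, and takes percentiles of the re-estimated quantity. In the sketch below, the model refit is replaced by a deliberately simple stand-in statistic: the closed geometric mean of the resampled rows. Refitting a CMRLMM is outside the scope of a short example. The data, the function names and the reduced number of replicates are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of already-sorted values, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def cases_bootstrap(data: Dict[int, List[List[float]]],
                    statistic: Callable[[List[List[float]]], List[float]],
                    n_boot: int = 1000, level: float = 0.95, seed: int = 1):
    """Resample level-2 units (participants) with replacement, keeping each
    sampled participant's rows together; return percentile intervals."""
    rng = random.Random(seed)
    ids = list(data)
    draws = []
    for _ in range(n_boot):
        sample_ids = [rng.choice(ids) for _ in ids]
        rows = [row for pid in sample_ids for row in data[pid]]
        draws.append(statistic(rows))      # in practice: refit the model here
    alpha = (1 - level) / 2
    intervals = []
    for k in range(len(draws[0])):
        vals = sorted(d[k] for d in draws)
        intervals.append((percentile(vals, alpha), percentile(vals, 1 - alpha)))
    return intervals


def closed_geometric_mean(rows: List[List[float]], total: float = 1440.0) -> List[float]:
    """Stand-in statistic: compositional centre of the rows, closed to `total`."""
    D = len(rows[0])
    g = [math.exp(sum(math.log(r[k]) for r in rows) / len(rows)) for k in range(D)]
    s = sum(g)
    return [total * v / s for v in g]


# Hypothetical data: 3 participants x 2 timepoints of (sleep, SB, LPA, MVPA) min/day.
data = {
    1: [[600, 480, 300, 60], [590, 500, 290, 60]],
    2: [[580, 520, 280, 60], [585, 505, 295, 55]],
    3: [[610, 470, 300, 60], [600, 490, 290, 60]],
}
ci = cases_bootstrap(data, closed_geometric_mean, n_boot=200)
print([(round(a), round(b)) for a, b in ci])
```

In practice, the `statistic` callable would refit the stacked model on the resampled data and return the back-transformed predicted composition. The independent replicates are also what make parallel processing effective here.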

Timepoint 3, during the school-holiday period, appears to show the largest differences, particularly in time spent in SB and MVPA. Timepoints 3, 4 and 5 are significantly and positively associated with the first olr-coordinate (Table 3), which might suggest more sleep across these timepoints. Notably, however, the proportion of sleep is relatively stable across all timepoints, and is in fact estimated to be lower at T3 and T5 than at baseline. These seemingly contradictory results arise from the way the first olr-coordinate must be interpreted. As mentioned in Section 1.1, the first pivot coordinate represents the dominance of sleep relative to the geometric mean of SB, LPA and MVPA. Although sleep remains relatively stable, this coordinate changes significantly across timepoints because the sub-composition of the remaining behaviours changes (an increased contribution of SB and decreased MVPA). This demonstrates why the CMRLMM is preferred to the previously used method of drawing inferences about time spent in each behaviour from pivot coordinates in univariate models [34,36]. To estimate differences between timepoints, we can use the approach suggested by Martín-Fernández et al. [58] for determining group differences. Here, we use the fixed effects of the model to calculate the log-ratio difference in the predicted movement-behaviour compositions between timepoint 3 and the other timepoints, with bootstrapped percentile intervals, as shown in Fig 3.

Fig 3. Estimated log-ratio difference in compositional parts for each timepoint when compared to timepoint 3 (school holidays).

Note: values above the horizontal line indicate relatively higher proportions of that component during the in-school timepoint compared to the school-holiday period; values below the horizontal line indicate relatively lower proportions of that component during the in-school timepoint compared to the school-holiday period.

https://doi.org/10.1371/journal.pone.0340373.g003
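One way to express a part-by-part log-ratio difference between two predicted compositions is the centred log-ratio of their perturbation difference, sketched below. This is offered as an illustration only. The exact quantity plotted in Fig 3 follows [58] and may be defined differently, and the two compositions are hypothetical.

```python
import math
from typing import List, Sequence


def clr_difference(x_t: Sequence[float], x_ref: Sequence[float]) -> List[float]:
    """Centred log-ratio of the perturbation difference x_t (-) x_ref.

    Element k is ln(x_t[k] / x_ref[k]) minus the mean of these log-ratios over
    all parts; positive values mean part k is relatively larger in x_t.
    """
    lr = [math.log(a / b) for a, b in zip(x_t, x_ref)]
    mean_lr = sum(lr) / len(lr)
    return [v - mean_lr for v in lr]


# Hypothetical predicted compositions (min/day): sleep, SB, LPA, MVPA.
in_school = [585.0, 481.0, 298.0, 76.0]
holidays = [582.0, 520.0, 283.0, 55.0]
print([round(v, 3) for v in clr_difference(in_school, holidays)])
```

The values sum to zero across parts and do not depend on the units or closure constant. Applying the function to each bootstrap replicate would give percentile intervals of the kind shown in Fig 3.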

These findings suggest that children’s movement-behaviour compositions change as they age, with distinct differences during the school-holiday period characterised by lower contributions of MVPA and higher contributions of SB. Unfavourable health associations have been reported for reallocating time from MVPA to SB [1]. The school holiday period may therefore be a key intervention point for future public health initiatives.

4. Discussion

A key strength of the CMRLMM outlined in this paper is its ability to include all daily time-use components as dependent variables in a single analytical model within a multilevel framework. A multilevel framework is relevant for many applications in the epidemiological and behavioural sciences where outcomes are compositional, in both observational and experimental study designs. Examples include observational cohort studies that investigate how movement-behaviour compositions change over time with repeated measurements on participants (as in our example). They also include experimental studies with repeated measurements, such as randomised controlled trials that aim to evaluate the compositional effects of a targeted behaviour-change program. Researchers may also be interested in how activity compositions differ among groups of participants in clustered cross-sectional designs, where participants are sampled within higher-level units such as schools, health-care providers or worksites. For example, one could study how activity compositions differ among occupational groups when workers are sampled within worksites [59]. In each case, the model’s multilevel structure allows random effects to be defined at the relevant grouping levels (e.g., participant, school, or site). This accounts for intra-cluster correlation while preserving the compositional dependency between behaviours, and so supports consistent, compositionally valid inference across a wide variety of research designs.

A key benefit of the CMRLMM is that it provides consistent results regardless of the orthonormal basis (i.e., how the time-use behaviours are ordered and partitioned) used to construct the olr-coordinates. The CoDA log-ratio transformation overcomes the issue of perfect multicollinearity between the time-use components. The variable stacking procedure outlined in this paper then enables all the log-ratios to be considered as dependent variables simultaneously. Another strength of the analytical pipeline presented here is that the model’s log-ratio coefficients can be interpreted in the original compositional units (minutes/day) via back-transformation. As we have demonstrated, specific research questions about meaningful differences in time use (e.g., comparing children’s time use during school terms with the holidays) can be explored through post-hoc comparisons, with hypotheses assessed using bootstrapped intervals.

There are limitations to the CMRLMM that should be considered. First, the olr transformation cannot be applied if any of the time-use components contain zero values, as the logarithm of zero is undefined. However, this is true of all log-ratio-based CoDA techniques, and published methods for dealing with zero values in compositional variables exist, although they are beyond the scope of this paper. Second, when implementing the CMRLMM in R, these models can have long run times, and convergence issues can arise. These issues can be addressed by following recommended troubleshooting procedures (e.g., rescaling and centring continuous variables, trying different optimisers, increasing the maximum number of iterations) [53]. High-performance computers may also reduce runtime when fitting models, particularly when modelling compositions with many parts, and parallel processing can drastically reduce runtime when conducting bootstrap inference. Importantly, interpreting the CMRLMM through compositional coefficients is no more complicated than interpreting the univariate response models used previously. We therefore consider its benefits to justify the greater computational cost.

The use of CoDA in time-use epidemiology has grown rapidly in recent years, leading to a paradigm shift in the way that people view the relationship between time use and health. For example, the acceptance that all time-use behaviours are interrelated and jointly contribute to health has led various countries to adopt integrated 24-hour movement behaviour guidelines that promote an optimal mix of these behaviours for different age groups [60–62]. Similarly, multi-component intervention studies now accept the co-dependent nature of time use and aim to change time spent in multiple behaviours simultaneously [7]. Our proposed CMRLMM gives researchers the tools to better understand how time use changes longitudinally, whether due to natural interventions (such as in the example given in this paper) or in experimental designs. The CMRLMM approach will also support analyses that explore how different personal and sociodemographic factors may be related to different time reallocation patterns. This knowledge will allow more targeted public health initiatives, which may aim to block unfavourable reallocations (such as from MVPA to SB, as in the current example) or, alternatively, nudge people towards favourable reallocations.

Supporting information

References

1. Miatke A, Olds T, Maher C, Fraysse F, Mellow ML, Smith AE, et al. The association between reallocations of time and health using compositional data analysis: a systematic scoping review with an interactive data exploration interface. Int J Behav Nutr Phys Act. 2023;20(1):127. pmid:37858243
2. Warburton DER, Bredin SSD. Health benefits of physical activity: a systematic review of current systematic reviews. Curr Opin Cardiol. 2017;32(5):541–56. pmid:28708630
3. Müller-Riemenschneider F, Reinhold T, Nocon M, Willich SN. Long-term effectiveness of interventions promoting physical activity: a systematic review. Prev Med. 2008;47(4):354–68. pmid:18675845
4. Prince SA, Saunders TJ, Gresty K, Reid RD. A comparison of the effectiveness of physical activity and sedentary behaviour interventions in reducing sedentary time in adults: a systematic review and meta-analysis of controlled trials. Obes Rev. 2014;15(11):905–19. pmid:25112481
5. Saunders TJ, McIsaac T, Douillette K, Gaulton N, Hunter S, Rhodes RE, et al. Sedentary behaviour and health in adults: an overview of systematic reviews. Appl Physiol Nutr Metab. 2020;45(10 (Suppl. 2)):S197–217. pmid:33054341
6. Albakri U, Drotos E, Meertens R. Sleep Health Promotion Interventions and Their Effectiveness: An Umbrella Review. Int J Environ Res Public Health. 2021;18(11):5533. pmid:34064108
7. Lewis BA, Napolitano MA, Buman MP, Williams DM, Nigg CR. Future directions in physical activity intervention research: expanding our focus to sedentary behaviors, technology, and dissemination. J Behav Med. 2017;40(1):112–26. pmid:27722907
8. Pedišić Ž. Measurement issues and poor adjustments for physical activity and sleep undermine sedentary behaviour research—the focus should shift to the balance between sleep, sedentary behaviour, standing and activity. Kinesiology. 2014;46(1):135–46.
9. Dumuid D, Stanford TE, Martin-Fernández J-A, Pedišić Ž, Maher CA, Lewis LK, et al. Compositional data analysis for physical activity, sedentary time and sleep research. Stat Methods Med Res. 2018;27(12):3726–38. pmid:28555522
10. von Rosen P. Analysing time-use composition as dependent variables in physical activity and sedentary behaviour research: different compositional data analysis approaches. J Act Sedentary Sleep Behav. 2023;2(1):23. pmid:40217358
11. Ferguson T, Curtis R, Fraysse F, Olds T, Dumuid D, Brown W, et al. How do 24-h movement behaviours change during and after vacation? A cohort study. Int J Behav Nutr Phys Act. 2023;20(1):24. pmid:36859292
12. Ryan DJ, Ross MH, Simmich J, Ng N, Burton NW, Gilson N, et al. TRACK & ACT: a pragmatic randomised controlled trial exploring the comparative effectiveness of pedometers and activity trackers for changing physical activity and sedentary behaviour in inactive individuals. J Act Sedentary Sleep Behav. 2023;2(1):12. pmid:40217539
13. Olds T, Burton NW, Sprod J, Maher C, Ferrar K, Brown WJ, et al. One day you’ll wake up and won’t have to go to work: The impact of changes in time use on mental health following retirement. PLoS One. 2018;13(6):e0199605. pmid:29953472
14. Leite MLC. Applying compositional data methodology to nutritional epidemiology. Stat Methods Med Res. 2016;25(6):3057–65. pmid:25411321
15. Dumuid D, Martín-Fernández JA, Ellul S, Kenett RS, Wake M, Simm P, et al. Analysing body composition as compositional data: An exploration of the relationship between body composition, body mass and bone strength. Stat Methods Med Res. 2021;30(1):331–46. pmid:32940148
16. Aitchison J. The Statistical Analysis of Compositional Data. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1982;44(2):139–60.
17. Aitchison J. A concise guide to compositional data analysis. Compositional Data Analysis Workshop; 2005.
18. Egozcue JJ, Pawlowsky-Glahn V. Basic concepts and procedures. Compositional data analysis: Theory and applications. 2011. p. 12–28.
19. Mateu-Figueras G, Pawlowsky-Glahn V, Egozcue JJ. The principle of working on coordinates. Compositional data analysis: Theory and applications. 2011. p. 29–42.
20. Tolosana-Delgado R, Van den Boogaart K. Linear models with compositions in R. Compositional data analysis: theory and applications. 2011.
21. Egozcue JJ, Pawlowsky-Glahn V. Compositional data: the sample space and its structure. TEST. 2019;28(3):599–638.
22. Martín-Fernández JA. Comments on: Compositional data: the sample space and its structure. TEST. 2019;28(3):653–7.
23. Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barceló-Vidal C. Isometric Logratio Transformations for Compositional Data Analysis. Mathematical Geology. 2003;35(3):279–300.
24. Martín-Fernández JA, Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R. Advances in Principal Balances for Compositional Data. Math Geosci. 2017;50(3):273–98.
25. Martín-Fernández JA, Donato VD, Pawlowsky-Glahn V, Egozcue JJ. Insights in Hierarchical Clustering of Variables for Compositional Data. Math Geosci. 2023;56(3):415–35.
26. Egozcue JJ, Pawlowsky-Glahn V. Groups of Parts and Their Balances in Compositional Data Analysis. Math Geol. 2005;37(7):795–828.
27. McGregor DE, Dall PM, Palarea-Albaladejo J, Chastin SF. Compositional Data Analysis in Physical Activity and Health Research. Looking for the Right Balance. Advances in Compositional Data Analysis: Festschrift in Honour of Vera Pawlowsky-Glahn: Springer; 2021. p. 363–82.
28. Fišerová E, Hron K. On the Interpretation of Orthonormal Coordinates for Compositional Data. Math Geosci. 2011;43(4):455–68.
29. Van den Boogaart KG, Tolosana-Delgado R. Analyzing compositional data with R. Springer. 2013.
30. Dumuid D, Pedišić Ž, Stanford TE, Martín-Fernández J-A, Hron K, Maher CA, et al. The compositional isotemporal substitution model: A method for estimating changes in a health outcome for reallocation of time between sleep, physical activity and sedentary behaviour. Stat Methods Med Res. 2019;28(3):846–57. pmid:29157152
31. Bolker B, Piaskowski J, Tanaka E, Alday P, Viechtbauer W. CRAN Task View: Mixed, Multilevel, and Hierarchical Models in R. 2024. Available from: https://cran.r-project.org/web/views/MixedModels.html
  32. Schabenberger O. Introducing the GLIMMIX procedure for generalized linear mixed models. In: SUGI 30 Proceedings, 2005. 1–20.
  33. Achana F, Gallacher D, Oppong R, Kim S, Petrou S, Mason J, et al. Multivariate Generalized Linear Mixed-Effects Models for the Analysis of Clinical Trial-Based Cost-Effectiveness Data. Med Decis Making. 2021;41(6):667–84. pmid:33813933
  34. Larisch L-M, Bojsen-Møller E, Nooijen CFJ, Blom V, Ekblom M, Ekblom Ö, et al. Effects of Two Randomized and Controlled Multi-Component Interventions Focusing On 24-Hour Movement Behavior among Office Workers: A Compositional Data Analysis. Int J Environ Res Public Health. 2021;18(8):4191. pmid:33920971
  35. Larsson K, Von Rosen P, Rossen J, Johansson U-B, Hagströmer M. Relative time in physical activity and sedentary behaviour across a 2-year pedometer-based intervention in people with prediabetes or type 2 diabetes: a secondary analysis of a randomised controlled trial. J Act Sedentary Sleep Behav. 2023;2(1):10. pmid:40217375
  36. Pasanen J, Leskinen T, Suorsa K, Pulakka A, Virta J, Auranen K, et al. Effects of physical activity intervention on 24-h movement behaviors: a compositional data analysis. Sci Rep. 2022;12(1):8712. pmid:35610297
  37. Suorsa K, Leskinen T, Pasanen J, Pulakka A, Myllyntausta S, Pentti J, et al. Changes in the 24-h movement behaviors during the transition to retirement: compositional data analysis. Int J Behav Nutr Phys Act. 2022;19(1):121. pmid:36109809
  38. Vansweevelt N, van Uffelen J, Boen F, Suorsa K, Seghers J. Socio-economic position and changes in 24-h movement behaviors during the retirement transition. J Act Sedentary Sleep Behav. 2025;4(1):17. pmid:41102869
  39. Farcomeni A. A review of modern multiple hypothesis testing, with particular attention to the false discovery proportion. Stat Methods Med Res. 2008;17(4):347–88. pmid:17698936
  40. Greenacre M. Towards a pragmatic approach to compositional data analysis. 2017.
  41. Kynčlová P, Hron K, Filzmoser P. Correlation Between Compositional Parts Based on Symmetric Balances. Math Geosci. 2017;49(6):777–96.
  42. Hox J, Moerbeek M, Van de Schoot R. Multilevel analysis: Techniques and applications. Routledge. 2017.
  43. Pinheiro JC, Bates DM. Linear mixed-effects models: basic concepts and examples. In: Mixed-effects models in S and S-Plus. 2000. p. 3–56.
  44. Bates D, Maechler M, Bolker B, Walker S, Christensen RHB, Singmann H, et al. Package ‘lme4’. convergence. 2015;12(1):2.
  45. Le F, Dumuid D, Stanford TE, Wiley JF. Bayesian Multilevel Compositional Data Analysis with the R Package multilevelcoda. Multivariate Behav Res. 2025:1–19. pmid:41249128
  46. Curran PJ, McGinley JS, Serrano D, Burfeind C. A multivariate growth curve model for three-level data. 2012.
  47. Twisk JW. Applied multilevel analysis: a practical guide for medical researchers. Cambridge University Press. 2006.
  48. Baldwin SA, Imel ZE, Braithwaite SR, Atkins DC. Analyzing multiple outcomes in clinical research using multivariate multilevel models. J Consult Clin Psychol. 2014;82(5):920–30. pmid:24491071
  49. Snijders TA, Bosker R. Multilevel analysis: An introduction to basic and advanced multilevel modeling. 2011.
  50. Watson A, Maher C, Tomkinson GR, Golley R, Fraysse F, Dumuid D, et al. Life on holidays: study protocol for a 3-year longitudinal study tracking changes in children’s fitness and fatness during the in-school versus summer holiday period. BMC Public Health. 2019;19(1):1353. pmid:31646994
  51. Phillips LRS, Parfitt G, Rowlands AV. Calibration of the GENEA accelerometer for assessment of physical activity intensity in children. J Sci Med Sport. 2013;16(2):124–8. pmid:22770768
  52. van Hees VT, Sabia S, Anderson KN, Denton SJ, Oliver J, Catt M, et al. A Novel, Open Access Method to Assess Sleep Duration Using a Wrist-Worn Accelerometer. PLoS One. 2015;10(11):e0142533. pmid:26569414
  53. Bolker B. GLMM FAQ. 2024 [Available from: bbolker.github.io/mixedmodels-misc/glmmFAQ]
  54. Luke SG. Evaluating significance in linear mixed-effects models in R. Behav Res Methods. 2017;49(4):1494–502. pmid:27620283
  55. Pinheiro J, Bates D. Mixed-effects models in S and S-PLUS. Springer Science & Business Media. 2006.
  56. Carpenter JR, Goldstein H, Rasbash J. A Novel Bootstrap Procedure for Assessing the Relationship between Class Size and Achievement. Journal of the Royal Statistical Society Series C: Applied Statistics. 2003;52(4):431–43.
  57. van der Leeden R, Meijer E, Busing FM. Resampling multilevel models. In: Handbook of multilevel analysis. Springer. 2008. p. 401–33.
  58. Martín Fernández JA, Daunis-i-Estadella P, Mateu i Figueras G. On the interpretation of differences between groups for compositional data. SORT: Statistics and Operations Research Transactions. 2015;39(2):231–52.
  59. Gupta N, Mathiassen SE, Mateu-Figueras G, Heiden M, Hallman DM, Jørgensen MB, et al. A comparison of standard and compositional data analysis in studies addressing group differences in sedentary behavior and physical activity. Int J Behav Nutr Phys Act. 2018;15(1):53. pmid:29903009
  60. Ross R, Chaput J-P, Giangregorio LM, Janssen I, Saunders TJ, Kho ME, et al. Canadian 24-Hour Movement Guidelines for Adults aged 18-64 years and Adults aged 65 years or older: an integration of physical activity, sedentary behaviour, and sleep. Appl Physiol Nutr Metab. 2020;45(10 (Suppl. 2)):S57–102. pmid:33054332
  61. Draper CE. The South African 24-hour movement guidelines for birth to 5 years. S Afr J CH. 2021;15(2):58.
  62. Okely AD, Ghersi D, Hesketh KD, Santos R, Loughran SP, Cliff DP, et al. A collaborative approach to adopting/adapting guidelines - The Australian 24-Hour Movement Guidelines for the early years (Birth to 5 years): an integration of physical activity, sedentary behavior, and sleep. BMC Public Health. 2017;17(Suppl 5):869. pmid:29219094