
A novel scaling methodology to reduce the biases associated with missing data from commercial activity monitors

  • R. O’Driscoll ,

    Contributed equally to this work with: R. O’Driscoll, J. Turicchi, C. Duarte, J. Michalowska, S. C. Larsen, A. L. Palmeira, B. L. Heitmann, G. W. Horgan, R. J. Stubbs

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Appetite Control and Energy Balance Group, School of Psychology, University of Leeds, Leeds, United Kingdom

  • J. Turicchi ,

    Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Appetite Control and Energy Balance Group, School of Psychology, University of Leeds, Leeds, United Kingdom

  • C. Duarte ,

    Roles Investigation, Supervision, Writing – review & editing

    Affiliation Appetite Control and Energy Balance Group, School of Psychology, University of Leeds, Leeds, United Kingdom

  • J. Michalowska ,

    Roles Conceptualization, Investigation, Resources, Writing – review & editing

    Affiliation Department of Treatment of Obesity, Metabolic Disorders and Clinical Dietetics, Medical Faculty, Poznan University of Medical Sciences, Poznan, Poland

  • S. C. Larsen ,

    Roles Investigation, Project administration, Writing – review & editing

    Affiliation Research Unit for Dietary Studies, The Parker Institute, Bispebjerg and Frederiksberg Hospital, The Capital Region, Denmark

  • A. L. Palmeira ,

    Roles Funding acquisition, Investigation, Methodology, Project administration

    Affiliations Faculdade de Motricidade Humana, Universidade de Lisboa, Lisbon, Portugal, Universidade Lusófona, Lisbon, Portugal

  • B. L. Heitmann ,

    Roles Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing

    Affiliations Research Unit for Dietary Studies, The Parker Institute, Bispebjerg and Frederiksberg Hospital, The Capital Region, Denmark, Department of Public Health, Section for General Medicine, Copenhagen University, Copenhagen, Denmark, Charles Perkins Centre, The Boden Institute, University of Sydney, Sydney, Australia

  • G. W. Horgan ,

    Roles Conceptualization, Investigation, Methodology, Software, Writing – review & editing

    Affiliation Biomathematics & Statistics Scotland, Aberdeen, United Kingdom

  • R. J. Stubbs

    Roles Funding acquisition, Methodology, Project administration, Validation, Writing – original draft, Writing – review & editing

    Affiliation Appetite Control and Energy Balance Group, School of Psychology, University of Leeds, Leeds, United Kingdom


3 Sep 2020: O’Driscoll R, Turicchi J, Duarte C, Michalowska J, Larsen SC, et al. (2020) Correction: A novel scaling methodology to reduce the biases associated with missing data from commercial activity monitors. PLOS ONE 15(9): e0238965.



Commercial physical activity monitors have wide utility in the assessment of physical activity in research and clinical settings; however, device removal results in missing data, which has the potential to bias study conclusions. This study aimed to evaluate methods to address missingness in data collected from commercial activity monitors.


This study used 1526 days of near-complete data from 109 adults participating in a European weight loss maintenance study (NoHoW). We conducted simulation experiments to test a novel scaling methodology (NoHoW method) and alternative imputation strategies (overall/individual mean imputation, overall/individual multiple imputation, Kalman imputation and random forest imputation). Methods were compared for hourly, daily and 14-day physical activity estimates for steps, total daily energy expenditure (TDEE) and time in physical activity categories. In a second simulation study, individual multiple imputation, Kalman imputation and the NoHoW method were tested at different positions and quantities of missingness. Equivalence testing and root mean squared error (RMSE) were used to evaluate the performance of each strategy relative to the true data.


The NoHoW method, Kalman imputation and multiple imputation methods remained statistically equivalent (p<0.05) for all physical activity metrics at the 14-day level. In the second simulation study, RMSE tended to increase with increased missingness. Multiple imputation showed the smallest RMSE for steps and TDEE at lower levels of missingness (<19%), and the Kalman and NoHoW methods were generally superior for imputing time in physical activity categories.


Individual-centred imputation approaches (NoHoW method, Kalman imputation and individual multiple imputation) offer an effective means to reduce the biases associated with missing data from activity monitors and to maximise data retention.


Participation in physical activity and limiting sedentary behaviours are associated with increased total energy expenditure and potentially beneficial homeostatic matching of energy intake to energy expenditure [1]. As such, more active lifestyles are associated with a reduced risk of obesity [2] and with weight loss and the prevention of weight regain following weight loss [3–5], as evidence suggests that weight maintenance is more readily achieved at higher degrees of energy flux [6]. Thus, the accurate and precise quantification of physical activity behaviours is critical to the study of overweight, obesity and associated comorbidities.

Accelerometry-based measures of physical activity have been available for a number of years [7]. Their objective nature offers a significant advantage over questionnaire-based assessments, which are biased by misreporting [8]. In current activity monitors, tri-axial piezoelectric sensors detect acceleration in the anteroposterior, mediolateral and vertical axes and are used to objectively quantify human movement [9]. Advances in size and data aggregation/storage capabilities, together with the associated fall in cost, facilitate the use of tri-axial accelerometers in most new devices [9], in contrast to the uni-axial [10] and bi-axial [11] accelerometers and the burdensome battery packs required for earlier devices [12]. Taken together, these advances mean that it is increasingly feasible to objectively and continuously monitor the intra-day physical activity patterns of large groups of participants.

Missing data are a well-recognised phenomenon in accelerometer research [13], attributable to behavioural (e.g. removal for aesthetic reasons) and non-behavioural (e.g. device technical failures, charging) reasons. Non-wear time in accelerometers has previously been detected by identifying periods in which the acceleration signal in each axis remains below a threshold for a predefined duration, often between 10 and 120 minutes [14,15]. Researchers then permit a maximum amount of non-wear time per day, which may be up to 14 hours [16]. The aim of defining such a limit is to determine the amount of missing data that minimally influences the inferences of the study [17]. It is also common to define a minimum number of valid days within a measurement period; if these criteria are met, an average or total value for physical activity metrics can be estimated [18,19].
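As an illustration of the threshold-based approach described above, the following minimal Python sketch flags non-wear minutes in tri-axial count data. The function name, the zero-count threshold and the 60-minute run length are hypothetical choices for illustration (the cited studies use runs of 10–120 minutes); published wear-time algorithms often also tolerate short interruptions within a still period, which this sketch does not.

```python
from typing import List, Optional, Sequence, Tuple

Axes = Tuple[float, float, float]  # per-minute counts for the three axes


def detect_non_wear(
    counts: Sequence[Axes],
    threshold: float = 0.0,   # hypothetical: "still" means every axis <= threshold
    min_minutes: int = 60,    # hypothetical run length within the cited 10-120 min range
) -> List[bool]:
    """Flag a minute as non-wear if it lies in a run of at least `min_minutes`
    consecutive minutes in which every axis is at or below `threshold`."""
    flags = [False] * len(counts)
    run_start: Optional[int] = None
    # Append a sentinel so that a still run reaching the end is also closed.
    for i, axes in enumerate(list(counts) + [None]):
        still = axes is not None and all(a <= threshold for a in axes)
        if still and run_start is None:
            run_start = i
        elif not still and run_start is not None:
            if i - run_start >= min_minutes:
                flags[run_start:i] = [True] * (i - run_start)
            run_start = None
    return flags
```

Still runs shorter than `min_minutes` are treated as wear, which is the trade-off the choice of run length controls.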

Missing accelerometer data may detrimentally influence the conclusions of a study in a number of ways. If physical activity summaries are calculated from incomplete data, true physical activity may be underestimated (depending on the assumptions made about the missing data). If missing periods occur in individuals who differ behaviourally or demographically from those with more complete data, then the generalisability of the study’s conclusions may be compromised [20]. A range of strategies have been developed to limit the bias introduced by missing accelerometer data [21]. These methods use the observed (non-missing) data to account for missing data points and have included mean imputation [22], combined multivariate strategies [23,24] and normalisation by the amount of wear time [25,26].

Commercial activity monitors are increasingly prevalent in research environments and may be used in large cohorts and over long durations for the assessment of physical activity. Because they are cloud-connected, commercial activity monitors facilitate the assessment of physical activity over longer periods than research-grade equivalents (e.g. ActiGraph GT3X), which typically measure physical activity for a single week at most [27]. Commercial activity monitors are also increasingly equipped with heart rate sensors [28], which can facilitate the estimation of the relative intensity of physical activity or energy expenditure through heart rate reserve (HRR) or flex methodologies [29–32]. However, heart rate sensing also creates different patterns of missingness. For example, missing data may be identified through loss of contact with the wrist (and therefore no measured heart rate), implying that the device has most likely been removed. This results in the detection of shorter periods of removal than the longer periods used when the accelerometer signal determines missingness [14,15]. These differences highlight an important need to develop methods to limit the bias associated with missing data from these devices. There has been no attempt to develop or apply imputation methodologies to commercially available multisensor activity monitors (e.g. Fitbit Charge 2; FC2).

The purpose of the present study is to propose and evaluate a methodology designed to minimise the bias introduced by missing data collected from a commercial activity monitor (FC2). First, we conducted a series of intraclass correlation analyses to investigate the minimum data required to achieve an approximately unbiased aggregation of physical activity data collected by an FC2. Next, we present autocorrelation analyses, which provide the rationale for a method that scales temporally proximate data to produce summaries over a given measurement period. Lastly, in a series of simulation experiments using real datasets with simulated missingness, we compared the performance of the proposed methodology with alternative imputation strategies.

Materials and methods


Data were collected as part of the NoHoW trial (ISRCTN88405328), an 18-month randomised 2 × 2 controlled trial testing the efficacy of an ICT-based toolkit for weight loss maintenance across three European centres: United Kingdom (Leeds), Denmark (Copenhagen) and Portugal (Lisbon). The NoHoW study received funding from the European Union’s Horizon 2020 research and innovation programme (grant agreement number: 643309). The study was conducted in accordance with the Declaration of Helsinki; ethical approval was granted by the local institutional ethics committees at the Universities of Leeds (17–0082; 27-Feb-2017) and Lisbon (17/2016; 20-Feb-2017) and the Capital Region of Denmark (H-16030495; 8-Mar-2017), and all participants provided informed consent to have their data used for research purposes by this research team. Full details of the trial protocol have been published previously [33]. The NoHoW trial recruited 1,627 participants; where observational analyses in this study used the entire NoHoW sample, this is specified in the manuscript.

For the simulation experiments conducted in this study, FC2 data from 109 participants, each wearing an FC2 for 14 days (minutes = 2,197,440; hours = 36,624; days = 1526), were used. This sample was selected based on the quantity of non-wear time (<2.5% of data missing within the first 14 days). Using a sample with minimal missingness allows ‘true’, near-complete data to be held back for comparison with the imputation methods.

Fitbit Charge 2 (FC2)

All participants enrolled in the NoHoW trial were provided with an FC2 (Fitbit Inc, San Francisco, CA, USA). The FC2 is a wrist-worn activity monitor that derives estimates of energy expenditure and physical activity from data obtained by its incorporated sensors, using proprietary algorithms. FC2 estimates of heart rate are obtained through a patented technology called ‘PurePulse’, which uses light-emitting diodes to monitor blood volume [28]. Data are aggregated to the minute level and synced via the Fitbit mobile application to Fitbit servers through an application programming interface. In the present study, non-wear time is defined by the absence of a heart rate measure; all devices were set to ‘auto’ mode by default, which ensured that no heart rate reading was transmitted when the device was not on the wrist.
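Under this definition, non-wear periods can be recovered directly from the minute-level heart rate stream. The following Python sketch, a hypothetical helper rather than part of any Fitbit tool, groups consecutive minutes without a heart rate value (represented as None) into missing periods and reports each period's start minute and length; lengths of this kind underlie the distribution of missing-period durations used to design the simulations.

```python
from typing import List, Optional, Sequence, Tuple


def missing_periods(heart_rate: Sequence[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start_minute, length_in_minutes) for each run of minutes with
    no heart rate reading, i.e. each non-wear period."""
    periods: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, hr in enumerate(heart_rate):
        if hr is None and start is None:
            start = i                          # a non-wear period begins
        elif hr is not None and start is not None:
            periods.append((start, i - start))  # the period ends at minute i
            start = None
    if start is not None:                       # period still open at the end
        periods.append((start, len(heart_rate) - start))
    return periods
```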

Autocorrelation analyses

The algorithm proposed in this study was initially based on a series of autocorrelation analyses, which are presented below. In autocorrelation analyses, the correlation between values in a time series is computed as a function of the time lag between them, defined here in minutes. We calculated the autocorrelation for all time lags of up to 7 days (10,080 minutes) for each participant individually, thereby identifying the time points within a week with the highest correlation. Fig 1 illustrates the autocorrelation for steps and heart rate for lags of up to 90 minutes and up to 10,080 minutes.
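For a single participant's complete minute-level series, the sample autocorrelation at lag k can be computed as in the following Python sketch, which uses the standard estimator (deviations from the series mean, normalised by the lag-0 sum of squares, as in R's acf). It assumes a series without missing values; how the small number of missing minutes was handled in the original analyses is not stated here, so that aspect is left out.

```python
from typing import List, Sequence


def acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation r_k for lags k = 0..max_lag (complete series only)."""
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(x) - 1")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # lag-0 sum of squares; zero for a constant series
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]
```

For example, `acf(steps, 10_080)` on a hypothetical 14-day `steps` list would return the 10,081 values from lag 0 to 7 days. This pure-Python version has a cost proportional to the series length times the number of lags, so an FFT-based implementation would be preferable at that scale.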

Fig 1. Autocorrelation (ACF) values for steps with time lags of 90 minutes (A), 10,080 minutes (B) and heart rate with time lags of 90 minutes (C) and 10,080 minutes (D).

Average ACF values are shown in red and the blue ribbon represents ± 1 standard deviation.

The average autocorrelation (ACF) values for steps within 60 minutes were: 15 mins, ACF = 0.31; 30 mins, ACF = 0.21; 45 mins, ACF = 0.15; 60 mins, ACF = 0.12. Heart rate values were comparatively higher: 15 mins, ACF = 0.62; 30 mins, ACF = 0.52; 45 mins, ACF = 0.46; 60 mins, ACF = 0.41. Although there is evidence of periodic patterns on subsequent days, the ACF does not exceed 0.09 for steps (observed at a lag of 1441 minutes) or 0.25 for heart rate (observed at 1440 minutes); the difference between these values is likely attributable to the more stochastic nature of steps compared with heart rate. Notably, the value at 10081 mins (7 days) is ACF = 0.05 for steps and ACF = 0.13 for heart rate. Thus, the greatest autocorrelation values are observed locally (at short lags) for both steps and heart rate.

Wear time requirements

In order to investigate the minimum amount of wear time required for a valid hour, day or 14-day period, intraclass correlation (ICC) analyses were conducted, as the ICC is a widely used and accepted means of determining measurement agreement [34]. In each of these experiments, data were deleted incrementally and at random, and the ICC was calculated between the partially deleted data and the ‘true’ steps at each increment. An ICC threshold of 0.9 was used as the selection criterion to represent agreement within approximately 10% of the true values [18]. We first investigated the minimum time required within a single hour with adjustment for wear time; that is, the sum of the remaining data was divided by the proportion of wear time in the hour, and this adjusted value was used for the ICC analyses. In the daily and 14-day analyses, no adjustment for wear time was made. For all analyses, two-way mixed-effects agreement models were used [34], implemented with the ‘icc’ function from the ‘rel’ package in R. Fig 2A shows that if 5 minutes of data are present and scaled to 60 minutes, the ICC threshold of 0.9 is reached. In the daily analysis, the ICC threshold was met at 18–19 hours per day (Fig 2B). Note that our daily ICC comparisons used non-scaled data, whereas our algorithm (outlined below) scales daily totals by wear time; with scaling, fewer hours per day would be required. We nevertheless require 18 hours to ensure that true data are available from different parts of the day (i.e. morning, afternoon, evening); this is a conservative requirement in line with previous research [35]. For the 14-day period, the ICC threshold was met at 3 days (Fig 2C). For the final algorithm, we required 4 days, including at least one weekend day, as the minimum criterion for inclusion, owing to the potential for differential patterns of physical activity between weekdays and weekend days [36].
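The agreement statistic can be sketched in Python as the single-measure, absolute-agreement ICC computed from a two-way ANOVA decomposition (ICC(A,1) in McGraw and Wong's notation), which takes the same form under two-way random and mixed models. This minimal sketch compares 'true' values with partially deleted (or scaled) values for a set of participants; the function name is hypothetical, and it has not been checked against the output of the 'icc' function in the 'rel' package.

```python
from typing import Sequence


def icc_agreement(true: Sequence[float], estimate: Sequence[float]) -> float:
    """ICC(A,1): two-way, single-measure, absolute-agreement ICC for two 'raters'
    (true values and estimates) measured on the same n subjects."""
    n, k = len(true), 2
    if n < 2 or len(estimate) != n:
        raise ValueError("need two equal-length sequences with at least 2 subjects")
    rows = list(zip(true, estimate))
    grand = (sum(true) + sum(estimate)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(true) / n, sum(estimate) / n]
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # true vs estimate
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

A systematic offset lowers this ICC even when the two series are perfectly correlated (for example, estimates that are all one unit too high), which is why an agreement rather than a consistency model suits a comparison against true values.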

Fig 2. Intraclass correlations (ICC) for incrementally deleted data and ‘true’ data.

Data are presented for scaled minutes per hour (A), for hours per day (B) and for number of days per 14 days (C).

NoHoW algorithm

Based on these analyses, we propose a scaling algorithm, hereafter referred to as the ‘NoHoW algorithm’, as follows:

  1. If non-missing minutes per hour < 5 then remove hour from dataset else sum available minutes to provide hourly total
  2. Divide the number of available minutes per hour by 60 to give the proportion of wear time per hour
  3. Divide hourly total by the proportion of wear time per hour to provide a scaled hourly total
  4. If available hours per day < 18 then remove day from dataset else sum all available hours to give daily total
  5. Divide the number of available hours by 24 to give proportion of wear time per day
  6. Divide daily total by the proportion of wear time per day to provide a scaled daily total
  7. If available days per 14 days < 4 or < 1 weekend day then remove 14-day period from dataset else average all valid days
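The seven steps can be expressed as a minimal Python sketch, assuming minute-level data stored as one list of 1440 values per calendar day, with None marking non-wear minutes, and a known date for the first day of the 14-day period. The function and constant names are hypothetical; the thresholds are those listed above.

```python
from datetime import date, timedelta
from typing import List, Optional, Sequence

MIN_MINUTES_PER_HOUR = 5   # step 1
MIN_HOURS_PER_DAY = 18     # step 4
MIN_VALID_DAYS = 4         # step 7
MIN_WEEKEND_DAYS = 1       # step 7

Minutes = Sequence[Optional[float]]  # None marks a non-wear (missing) minute


def scaled_hourly_total(minutes: Minutes) -> Optional[float]:
    """Steps 1-3: sum the worn minutes and scale by the hour's wear proportion."""
    worn = [m for m in minutes if m is not None]
    if len(worn) < MIN_MINUTES_PER_HOUR:
        return None                       # hour removed from the dataset
    wear_proportion = len(worn) / 60      # step 2
    return sum(worn) / wear_proportion    # step 3


def scaled_daily_total(day: Minutes) -> Optional[float]:
    """Steps 4-6 for one calendar day of 1440 minute values."""
    if len(day) != 1440:
        raise ValueError("expected 1440 minutes per day")
    hours = [scaled_hourly_total(day[h * 60:(h + 1) * 60]) for h in range(24)]
    valid = [h for h in hours if h is not None]
    if len(valid) < MIN_HOURS_PER_DAY:
        return None                       # day removed from the dataset
    wear_proportion = len(valid) / 24     # step 5
    return sum(valid) / wear_proportion   # step 6


def period_average(days: Sequence[Minutes], first_day: date) -> Optional[float]:
    """Step 7: average the valid days of a (14-day) period."""
    totals: List[float] = []
    weekend_days = 0
    for offset, day in enumerate(days):
        total = scaled_daily_total(day)
        if total is None:
            continue
        totals.append(total)
        if (first_day + timedelta(days=offset)).weekday() >= 5:  # Sat=5, Sun=6
            weekend_days += 1
    if len(totals) < MIN_VALID_DAYS or weekend_days < MIN_WEEKEND_DAYS:
        return None                       # period removed from the dataset
    return sum(totals) / len(totals)
```

For time in an intensity category, each minute can be coded as 1.0 (in the category) or 0.0, so that the hourly 'total' is a count of minutes; `period_average` then returns the mean scaled value per valid day. Because the 14-day criterion requires at least one weekend day, the date of the first day has to be supplied.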

Simulation experiments

In order to test the algorithm, we performed two simulation experiments. In the first experiment, we tested traditional imputation methods as well as the proposed algorithm. This was achieved by creating datasets with simulated missingness from each included participant’s true data and holding back the true data for comparison with the imputed datasets. The time point at which data were removed was random, and the length of each deleted period was uniformly sampled between 1 and 120 minutes. The decision to insert missing data at random positions was informed by the proportion of missing FC2 data in each hour of the first 14 days of the NoHoW study, which was broadly similar across the day: on average, 22.83% was missing, ranging from 21.1% at 13:00–13:59 to 25.96% at 23:00–23:59 (S1 Fig). To determine the length of missing periods, we quantified the length of each missing period in the first 14 days of the NoHoW study that was shorter than an entire day (1440 minutes). Of the 146,165 missing periods, 139,213 (95.24%) were shorter than 60 minutes and 3882 (2.7%) were longer than 120 minutes (S2 Fig); we therefore set 120 minutes as the upper limit for the length of insertions. The final parameter in the missing data algorithm was the number of missing periods, which was set to 40. This resulted in an average of 13.7% missing data per day (11.76% inserted), ranging up to 44.4% (36.81% inserted) in simulation study 1.
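The insertion procedure can be sketched in Python as below. We read the 40 missing periods as applying to each participant's full 14-day series (20,160 minutes): with a mean period length of (1 + 120) / 2 = 60.5 minutes, 40 periods remove about 2,420 minutes, or roughly 12% of the data, which is consistent with the reported 11.76% once overlapping periods are allowed for. That reading, the uniform choice of start minute and the truncation of periods at the end of the series are assumptions of this sketch, and the function name is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def insert_missingness(
    minutes: Sequence[Optional[float]],
    n_periods: int = 40,
    max_length: int = 120,
    rng: Optional[random.Random] = None,
) -> List[Optional[float]]:
    """Return a copy of `minutes` with `n_periods` randomly placed gaps (None),
    each with a length drawn uniformly from 1..max_length minutes."""
    rng = rng or random.Random()
    out = list(minutes)
    for _ in range(n_periods):
        length = rng.randint(1, max_length)   # inclusive on both ends
        start = rng.randrange(len(out))       # any minute of the series
        # Periods may overlap and are truncated at the end of the series.
        for i in range(start, min(start + length, len(out))):
            out[i] = None
    return out
```

The true series is left untouched, so each simulated copy can be imputed and then compared with it.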

Utilising the same simulated missing datasets, our first simulation study tested the methodologies below for dealing with missing data.


To demonstrate the effect of applying no imputation or adjustment strategy (reported as ‘removal’ in the results), physical activity summaries were simply calculated from the simulated missing datasets.

Mean imputation.

Missing data were imputed with i) the mean of all remaining data and ii) the mean of each individual’s remaining data. This was conducted with the Hmisc package in R.
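The two variants differ only in which observed values are averaged. The following Python sketch, a plain-Python illustration rather than the Hmisc implementation, fills each participant's missing minutes with either the pooled mean of all observed values or that participant's own mean; the dictionary layout keyed by participant ID is an assumption.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence

Series = Sequence[Optional[float]]  # None marks a missing minute


def mean_impute(data: Dict[str, Series], individual: bool) -> Dict[str, List[float]]:
    """Fill missing values with the pooled mean (individual=False) or with each
    participant's own mean (individual=True)."""
    if individual:
        fill = {pid: mean(v for v in s if v is not None) for pid, s in data.items()}
    else:
        pooled = mean(v for s in data.values() for v in s if v is not None)
        fill = {pid: pooled for pid in data}
    return {
        pid: [v if v is not None else fill[pid] for v in s]
        for pid, s in data.items()
    }
```

Both variants ignore time of day, so a missing minute at night receives the same value as one during the day.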

Random forest imputation.

We performed random forest imputation using the ‘missForest’ package in R. This is a non-parametric imputation method that implements the original random forest algorithm [37]. We used random forest imputation to predict the missing values for steps, heart rate and calories in each participant’s data, using weekday and hour as observed, non-missing variables. Hyperparameters were selected with consideration of computational feasibility: we used 100 trees in each forest, the number of randomly sampled variables at each split was set to the square root of the number of variables, and the maximum number of iterations was set to 5.

Multiple imputation

We tested multiple imputation with bootstrapping and predictive mean matching using i) the entire sample and ii) individual-level data. In the overall model, we used age, gender and day of the week as covariates, as these have previously been shown to be associated with differential patterns of physical activity [18,38]. In the individual models, hour of the day was used as an additional covariate. An advantage of multiple imputation is that the imputation process is repeated, which helps to address the uncertainty associated with a single imputation. We used 5 imputations in the overall model and 7 imputations in the individual-level model. Multiple imputation was implemented with the Hmisc package in R.
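To illustrate the mechanics, the following Python sketch implements a simplified bootstrap predictive mean matching (PMM) procedure with a single covariate: a linear model is refitted on a bootstrap resample of the observed cases, and each missing value is replaced by the observed value of a randomly chosen donor whose predicted mean is among the closest. This is a simplification for illustration only, not the algorithm used by Hmisc's aregImpute (which fits flexible additive models and has its own donor-selection rules); the single covariate, the three donors and the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def _ols(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return my, 0.0  # no spread in x (possible in a bootstrap draw): use the mean
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope


def pmm_impute_once(
    y: Sequence[Optional[float]], x: Sequence[float], rng: random.Random, n_donors: int = 3
) -> List[float]:
    """One predictive-mean-matching imputation of y from a single covariate x."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    # Refit on a bootstrap resample so each imputation reflects parameter uncertainty.
    boot = [rng.choice(observed) for _ in observed]
    intercept, slope = _ols(boot)
    predicted_obs = [(intercept + slope * xi, yi) for xi, yi in observed]
    completed: List[float] = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = intercept + slope * xi
        # Donors: observed cases whose predicted means are closest to the target.
        donors = sorted(predicted_obs, key=lambda p: abs(p[0] - target))[:n_donors]
        completed.append(rng.choice(donors)[1])  # impute a real observed value
    return completed


def multiple_impute(
    y: Sequence[Optional[float]], x: Sequence[float], m: int = 5, seed: int = 0
) -> List[List[float]]:
    """Return m completed copies of y."""
    rng = random.Random(seed)
    return [pmm_impute_once(y, x, rng) for _ in range(m)]
```

Each of the m completed series is then summarised separately (for example, into a daily step total), and the point estimate is the mean of the m summaries; Rubin's rules additionally combine within- and between-imputation variance when standard errors are needed.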

Kalman imputation

Lastly, we tested Kalman smoothing imputation using a structural time series model. Kalman imputation was implemented with the imputeTS package in R to impute caloric expenditure, steps and heart rate.
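Kalman smoothing can be illustrated with the simplest structural time series model, a local level (random walk plus noise) model. The Python sketch below runs a Kalman filter that skips the update step at missing minutes, followed by a Rauch–Tung–Striebel backward pass, and replaces each missing value with its smoothed level. As we understand it, imputeTS's na_kalman by default fits a structural model via R's StructTS and estimates the variances by maximum likelihood; here the two variances are fixed, hypothetical inputs, so this is a sketch of the technique rather than a reproduction of imputeTS.

```python
from typing import List, Optional, Sequence


def kalman_impute_local_level(
    y: Sequence[Optional[float]],
    level_var: float = 1.0,   # hypothetical q: variance of the random-walk level
    obs_var: float = 1.0,     # hypothetical r: measurement noise variance
    init_var: float = 1e7,    # near-diffuse prior variance for the first level
) -> List[float]:
    """Fill None values with the Kalman-smoothed level of a local level model."""
    n = len(y)
    a_pred = [0.0] * n; p_pred = [0.0] * n   # one-step-ahead predictions
    a_filt = [0.0] * n; p_filt = [0.0] * n   # filtered estimates
    for t in range(n):
        if t == 0:
            a_pred[t], p_pred[t] = 0.0, init_var
        else:
            a_pred[t], p_pred[t] = a_filt[t - 1], p_filt[t - 1] + level_var
        if y[t] is None:                       # missing: no measurement update
            a_filt[t], p_filt[t] = a_pred[t], p_pred[t]
        else:
            gain = p_pred[t] / (p_pred[t] + obs_var)
            a_filt[t] = a_pred[t] + gain * (y[t] - a_pred[t])
            p_filt[t] = (1 - gain) * p_pred[t]
    # Rauch-Tung-Striebel backward pass (transition coefficient is 1).
    a_smooth = a_filt[:]
    for t in range(n - 2, -1, -1):
        c = p_filt[t] / p_pred[t + 1]
        a_smooth[t] = a_filt[t] + c * (a_smooth[t + 1] - a_pred[t + 1])
    return [a_smooth[t] if v is None else v for t, v in enumerate(y)]
```

With the defaults, the gap in the series [1, 2, None, 4, 5] is filled with a value very close to 3, because the smoother uses observations on both sides of the gap.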

Simulation study 2

In simulation study 2, we investigated how the bias introduced by the NoHoW algorithm, Kalman imputation and individual-level multiple imputation varies with the quantity and position of missing data. We included these individual-centred approaches because they were the only individualised approaches that were statistically equivalent to the true data across all activity types in simulation study 1. As in the first simulation study, we used 14 days (20,160 minutes) of data for each participant. We simulated missingness randomly throughout the day, and in all iterations the maximum length of each insertion was set to 120 minutes. The simulations were split into 10 windows of missingness, in which the number of missing periods inserted for each participant increased incrementally with each window. In the first window, the number of missing periods per participant was sampled from a uniform distribution between 0 and 10, in the second between 10 and 20, and so on up to the tenth window, which inserted 90–100 missing periods in each iteration. Within each window, 20 simulations were conducted per participant, for a total of 21,800 iterations of each algorithm overall (109 participants × 10 windows × 20 simulations).

Physical activity metrics

Each of the imputation methods tested in both simulation studies was used to address a number of distinct physical activity metrics, including total steps, total daily energy expenditure (TDEE) and minutes of sedentary, light, moderate and vigorous physical activity. Both steps and TDEE for a given interval are extracted from the FC2, and time in each of the sedentary, light, moderate and vigorous categories is defined by the heart rate reserve (HRR) method, which is computed for each minute in the dataset. To apply this method, we estimated maximum heart rate for each participant using the Tanaka equation (208 − 0.7 × age) [39]. To define resting heart rate, we first determined sleeping heart rate, defined as the lowest mean heart rate over 20 consecutive minutes observed between 00:00 and 08:00 when steps/min were < 5. Once sleeping heart rate was defined, it was increased by 8% to approximate resting heart rate, as this represents a typical difference between resting and sleeping heart rate [40]. The relative intensity of each minute was then calculated as:

$$\%\mathrm{HRR} = \frac{\mathrm{HR}_{\mathrm{minute}} - \mathrm{HR}_{\mathrm{rest}}}{\mathrm{HR}_{\mathrm{max}} - \mathrm{HR}_{\mathrm{rest}}} \times 100 \qquad (1)$$

The following cut points were applied: sedentary (<20% HRR), light (20–40% HRR), moderate (40–60% HRR) and vigorous (≥60% HRR) [32]. For each missing minute in the dataset, each of the imputation methods described above was used to impute or scale steps, caloric expenditure and heart rate to produce hourly, daily and average physical activity estimates.
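The heart rate reserve steps above can be sketched in Python as follows. The sketch assumes a night's minute-level heart rate and step lists indexed from 00:00, reads 'steps/min < 5' as applying to every minute of the 20-minute window, and treats each cut point's lower bound as inclusive (for example, exactly 20% HRR counts as light); these readings and all names are assumptions.

```python
from typing import Optional, Sequence

SLEEP_WINDOW = 20        # consecutive minutes
NIGHT_MINUTES = 8 * 60   # 00:00-07:59, assuming index 0 is midnight
STILL_STEPS = 5          # steps/min threshold within the sleep window


def max_heart_rate(age: float) -> float:
    """Tanaka equation: 208 - 0.7 x age."""
    return 208 - 0.7 * age


def sleeping_heart_rate(
    hr: Sequence[Optional[float]], steps: Sequence[Optional[float]]
) -> Optional[float]:
    """Lowest mean HR over 20 consecutive night minutes, each with < 5 steps."""
    best: Optional[float] = None
    for start in range(NIGHT_MINUTES - SLEEP_WINDOW + 1):
        window = range(start, start + SLEEP_WINDOW)
        if any(hr[i] is None or steps[i] is None or steps[i] >= STILL_STEPS for i in window):
            continue  # window contains non-wear or movement
        mean_hr = sum(hr[i] for i in window) / SLEEP_WINDOW
        if best is None or mean_hr < best:
            best = mean_hr
    return best


def resting_heart_rate(sleeping_hr: float) -> float:
    return 1.08 * sleeping_hr   # sleeping heart rate increased by 8%


def intensity_category(hr: float, hr_rest: float, hr_max: float) -> str:
    pct_hrr = 100 * (hr - hr_rest) / (hr_max - hr_rest)  # equation (1)
    if pct_hrr < 20:
        return "sedentary"
    if pct_hrr < 40:
        return "light"
    if pct_hrr < 60:
        return "moderate"
    return "vigorous"
```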

Statistical analysis

All data are presented as means and standard deviations unless otherwise stated, and a flowchart detailing both simulation studies is available in S2 Fig. To evaluate the performance of each method, root mean squared error (RMSE) was calculated for all physical activity metrics for hourly, daily and 14-day averages, relative to the true (observed) data. RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \qquad (2)$$

where $\hat{y}_i$ refers to the predicted values, $y_i$ refers to the true values and $n$ refers to the number of observations. Equivalence tests were performed to investigate whether the models were statistically equivalent to the true data: to be considered equivalent, the 90% confidence interval of the estimate had to fall within ± 10% of the criterion mean. Simulation study 1 was conducted on an Intel i7-8750H with 32 GB RAM and 12 logical processors. Simulation study 2 was undertaken on ARC3, part of the High-Performance Computing cluster at the University of Leeds, UK. Statistical analyses were conducted with R version 3.6.3, using p < 0.05 to determine statistical significance.
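Equation (2) and the equivalence criterion can be sketched in Python as below. A 90% confidence interval lying inside the bounds corresponds to two one-sided tests (TOST) at the 5% level. The check assumes paired differences between imputed and true values and uses a normal approximation (z ≈ 1.645) for the interval, which should be close to a t-based interval at n = 109; whether the original tests were paired, and which interval was used, are assumptions, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def rmse(predicted: Sequence[float], true: Sequence[float]) -> float:
    """Equation (2): square root of the mean squared prediction error."""
    if len(predicted) != len(true) or not true:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true))


def equivalent_within_10_percent(
    predicted: Sequence[float], true: Sequence[float], alpha: float = 0.05
) -> Tuple[bool, Tuple[float, float]]:
    """Check whether the 90% CI of the mean paired difference lies within
    +/-10% of the criterion (true) mean, using a normal approximation."""
    diffs = [p - t for p, t in zip(predicted, true)]
    se = stdev(diffs) / sqrt(len(diffs))      # needs at least two pairs
    z = NormalDist().inv_cdf(1 - alpha)       # about 1.645 for a 90% interval
    centre = mean(diffs)
    ci = (centre - z * se, centre + z * se)
    bound = 0.10 * abs(mean(true))
    return (-bound < ci[0] and ci[1] < bound), ci
```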


The participants meeting the minimum criteria were predominantly female (female n = 93, male n = 16) and were primarily from the Danish centre (Denmark n = 69, UK n = 23, Portugal n = 17). Table 1 presents the demographic and physical activity results for the included sample.

Table 1. Demographic data and physical activity averages for the included sample (n = 109).

Total daily energy expenditure (TDEE) is presented in kcal/day; sedentary, light, moderate and vigorous activity are presented in minutes/day.

The computation times for each of the included algorithms in the first simulation were as follows: overall mean imputation, 18.23 minutes; individual mean imputation, 1.27 minutes; overall multiple imputation, 17.61 hours; individual multiple imputation, 17.04 minutes; random forest imputation, 4.36 hours; Kalman imputation, 2.16 minutes; NoHoW method, 2.12 seconds.

Table 2 presents the results of the first simulation study for the 14-day, daily and hourly comparisons, and Table 3 presents the results of the equivalence tests for each method. For TDEE, individual multiple imputation had the smallest RMSE for the 14-day comparison (36.32 kcal), followed by the NoHoW method (39.51 kcal); Kalman imputation was superior for the hourly comparison (14.11 kcal), and the NoHoW method had the smallest RMSE in the daily comparison (115.86 kcal). All methods except removal (mean difference: -343.44 kcal) were statistically equivalent to the true data, with the smallest mean difference observed for individual multiple imputation. For steps, the lowest RMSE was observed for the NoHoW method for the 14-day (397.83 steps) and daily (1366.92 steps) comparisons and for Kalman imputation for the hourly comparison (173.78 steps). All methods except removal (mean difference: -1320.74 steps, p-value > 0.05) were statistically equivalent to the true data. In the HRR analysis, the multiple imputation methods, Kalman imputation and the NoHoW algorithm were statistically equivalent to the true data for all sedentary, light, moderate and vigorous comparisons.

Table 2. Mean ± standard deviation estimates and Root Mean Squared Error (RMSE) for each of the imputation methods tested in simulation study 1.

Total daily energy expenditure (TDEE) is presented in kcal; sedentary, light, moderate and vigorous activity are presented in minutes.

Table 3. Mean ± standard deviation estimates and equivalence test results for each of the imputation methods tested in simulation study 1.

Total daily energy expenditure (TDEE) is presented in kcal; sedentary, light, moderate and vigorous activity are presented in minutes. Bounds refers to the equivalence boundaries, and p-value upper and lower refer to the equivalence tests at the upper and lower equivalence bounds.

In the second simulation study (boxplots in Fig 3), the aggregated RMSE for each tested approach tended to increase with the proportion of missing data. For TDEE (Fig 3A), the first window (1% missingness added) resulted in a mean RMSE of 31.14 kcal/day for the NoHoW method (range 28.82–33.12 kcal/day), compared with 21.30 kcal/day (range 19.20–23.11 kcal/day) for multiple imputation and 37.44 kcal/day (range 35.49–39.90 kcal/day) for Kalman imputation. By comparison, in the 10th window (~28% missingness added), a maximum RMSE of 68.89 kcal/day, 68.05 kcal/day and 72.55 kcal/day was observed for NoHoW, multiple imputation and Kalman imputation, respectively. For steps (Fig 3B), multiple imputation performed slightly better at lower levels of missingness (<19%); however, mean RMSE values for the methods remained similar and did not differ by more than 86 steps/day. In the HRR analysis, differences were greatest in the sedentary comparison (Fig 3C), with the NoHoW and Kalman methods having a lower mean RMSE than multiple imputation in every window. The largest difference was observed at 28% missingness, where the mean RMSE values were 24.87 mins/day (range 23.15–26.39 mins/day) for the NoHoW method, 55.56 mins/day (range 53.69–57.76 mins/day) for multiple imputation and 23.73 mins/day (range 21.46–26.89 mins/day) for Kalman imputation. For light (Fig 3D) and moderate (Fig 3E) activity, the NoHoW method showed the lowest mean RMSE values above 13% missingness; its largest mean RMSE values, 15.19 mins/day (range 12.81–17.42 mins/day) for light activity and 5.38 mins/day (range 4.72–6.26 mins/day) for moderate activity, were observed at 28% missingness. Lastly, in the vigorous activity simulation (Fig 3F), multiple imputation had the lowest mean RMSE with <7% added missingness, but the Kalman and NoHoW methods were superior at higher levels of missingness. In the 28% missingness window, NoHoW reached a mean RMSE of 2.25 mins/day (range 1.84–3.03 mins/day) and Kalman reached 2.28 mins/day (range 1.85–2.95 mins/day). Results of the second simulation study are available in S1 Table.

Fig 3. Boxplots detailing Root Mean Squared Error (RMSE) values from simulation study 2 for each window of missingness.

Data are presented for TDEE (A), Steps (B), Sedentary (C), Light (D), Moderate (E), Vigorous (F). Mean missing data refers to the additional missingness inserted in the simulations.


The use of commercial activity monitors in research environments is proliferating and is creating new research opportunities; however, it is critical to take steps to ensure that the integrity of these data is not compromised by missing data. The purpose of the present study was to develop and test a methodology to account for missingness in physical activity data collected with a commercial activity monitor in a free-living environment. In our initial experiments, we used ICC analyses to show that if data are scaled within an hour, the data requirements to meet an ICC threshold of 0.9 are minimal (~5 minutes). This reflects the similarity between ‘local’ data points, as confirmed by our autocorrelation analyses. We also show that if data are not scaled by wear time, the requirement for a day equates to approximately 18 hours. This contrasts with a previous study, which showed that, relative to a 14 hours/day criterion, at least 13 hours/day of accelerometer data are required [41]. This slight discrepancy in the proportion of the day required may relate to the inclusion of night hours in our sample. Given that night is likely to be a highly sedentary period, missing data at night are likely to be less influential on daily totals.

In simulation study 1, we used each of the tested methods to impute metrics that are likely to be of importance depending on the specific research aims. Our results suggest that outcomes differ depending on the metric selected. For instance, random forest imputation, overall mean and individual mean methods did not regularly impute vigorous or moderate minutes, as reflected in the non-significant equivalence test results (indicating that these methods were not statistically equivalent to the true values). This is likely due to the low proportion of the day in which these activities are performed. In the first simulation study, we also observed a slight tendency for the NoHoW method to overestimate minutes of moderate and vigorous activity. This may relate to the position of the missing data in simulation 1; for example, if missing data occur in a sedentary period after an exercise bout, activity in this period will be overestimated. As exercise is infrequent in non-athlete populations, this is unlikely to result in a large error in mean differences. Indeed, the estimates for moderate and vigorous activity differed by <2 minutes/day in the 14-day comparison. Researchers should consider imputation strategies based on observed activity data from their sample, or should select methodologies that are statistically equivalent for the specific activities of interest.
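The overestimation mechanism can be illustrated with a toy hour. This sketch assumes, as a simplification, that hour-level scaling multiplies each intensity category's observed minutes by 60 divided by the number of worn minutes; the intensity labels, minute counts and function name are hypothetical.

```python
from collections import Counter


def scaled_category_minutes(labels):
    """Scale per-intensity minute counts within one hour by 60 / worn minutes.

    `labels` holds one intensity label per minute, or None for non-wear.
    """
    worn = [lab for lab in labels if lab is not None]
    if not worn:
        return {}
    return {lab: n * len(labels) / len(worn) for lab, n in Counter(worn).items()}


# Hypothetical hour: a 20-minute vigorous bout, 10 minutes of non-wear straight
# after it (truly sedentary), then 30 observed sedentary minutes.
observed = ["vigorous"] * 20 + [None] * 10 + ["sedentary"] * 30
true = ["vigorous"] * 20 + ["sedentary"] * 40

scaled = scaled_category_minutes(observed)
print(scaled["vigorous"], Counter(true)["vigorous"])  # prints: 24.0 20
```

The 10 non-wear minutes were truly sedentary, but scaling redistributes them in proportion to the observed minutes, so 4 spurious vigorous minutes appear. Because such bouts are infrequent in non-athlete populations, the effect on daily means is expected to be small.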

We have also shown that, for all comparisons, every tested method resulted in a lower RMSE than no imputation (i.e. removal). Making no attempt to adjust for missingness effectively assumes that activity during non-wear was zero, and our results demonstrate the potential implications of this assumption. In our first study, ~14% of the day was missing on average, with ~12% inserted, equating to a wear time of 20–22 hours, which falls within the acceptable levels of missingness for most accelerometer research [14,15]; this highlights the importance of using one of these methods even when the quantity of missing data is relatively small. Of the imputation methods tested, individual-centred methods showed an advantage, specifically Kalman imputation, individual multiple imputation and the NoHoW algorithm. Indeed, in our second simulation study, in which the maximal missingness approached double that of our first simulation study, the RMSE for TDEE was lower than the values observed for removal, overall mean and random forest imputation in simulation study 1, indicating the efficacy of these methods.
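The cost of treating non-wear as zero can be shown by comparing a removal (zero-fill) total with an hour-scaled total on a synthetic day. The activity profile, the missing blocks and the simplified scaling rule (observed hourly total multiplied by 60 / worn minutes) are illustrative assumptions, not the study's data or exact method; the hypothetical `wear_hours` helper simply reproduces the arithmetic linking a missing proportion to wear time.

```python
def daily_total_removal(day):
    """Sum worn minutes only, which implicitly treats non-wear (None) as zero."""
    return sum(v for v in day if v is not None)


def daily_total_scaled(day):
    """Sum 24 hourly totals, each scaled by 60 / worn minutes (simplified rule)."""
    total = 0.0
    for h in range(24):
        hour = day[h * 60:(h + 1) * 60]
        worn = [v for v in hour if v is not None]
        if worn:  # an hour with no wear at all contributes nothing in this sketch
            total += sum(worn) * len(hour) / len(worn)
    return total


def wear_hours(missing_fraction):
    """Hours of wear implied by a given fraction of the 24-hour day being missing."""
    return 24 * (1 - missing_fraction)


# Hypothetical day: 10 steps/min from 07:00 to 22:00 and zero otherwise.
true_day = [10.0 if 7 * 60 <= m < 22 * 60 else 0.0 for m in range(1440)]

# Hypothetical non-wear: the first 30 minutes of each hour from 08:00 to 12:59.
day = list(true_day)
for h in range(8, 13):
    for m in range(h * 60, h * 60 + 30):
        day[m] = None

print(sum(true_day), daily_total_removal(day), daily_total_scaled(day))
print(round(wear_hours(0.14), 1), round(wear_hours(0.12), 1))
```

Here removal loses the 1,500 steps that fell in the non-wear blocks, whereas scaling recovers the true total of 9,000 steps because the synthetic activity is uniform within each hour; real free-living data are rarely this convenient. The wear-time helper gives about 20.6 and 21.1 hours for 14% and 12% missingness, in line with the 20–22 hours stated above.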

We simulated missingness evenly throughout the entire 24-hour period, in line with the observed patterns of missingness in the NoHoW trial. This contrasts with a previous study which observed that missing data more frequently occur at the beginning and end of the day [42]. Notably, we used wrist-worn devices, whereas the aforementioned study used hip-worn accelerometers. Unlike wrist-worn monitors, hip-worn accelerometers are generally removed when changing clothes. Wrist placement may therefore encourage compliance [43] and contribute to a more uniform distribution of missingness throughout the day.
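Inserting missingness evenly across the 24-hour period can be sketched by drawing gap start minutes uniformly over the day. The number of gaps, the gap length, the fixed seed, the wrap-around at midnight and the function names are illustrative assumptions; the study's actual simulation procedures are summarised in S1 Data.

```python
import random
from typing import List, Optional


def insert_missing_blocks(day: List[float], n_blocks: int, block_len: int,
                          seed: int = 0) -> List[Optional[float]]:
    """Return a copy of a minute-level day with `n_blocks` gaps set to None.

    Start minutes are drawn uniformly over the day, and gaps wrap around
    midnight, so every minute is equally likely to be covered. Gaps may overlap.
    """
    rng = random.Random(seed)
    n = len(day)
    out: List[Optional[float]] = list(day)
    for _ in range(n_blocks):
        start = rng.randrange(n)
        for k in range(block_len):
            out[(start + k) % n] = None  # wrap past midnight to the start of the day
    return out


def hourly_missing_share(day: List[Optional[float]]) -> List[float]:
    """Fraction of missing minutes in each hour of a 1440-minute day."""
    return [sum(v is None for v in day[h * 60:(h + 1) * 60]) / 60 for h in range(24)]


day = [1.0] * 1440
masked = insert_missing_blocks(day, n_blocks=4, block_len=30, seed=42)
print(sum(v is None for v in masked) / len(masked))
print(hourly_missing_share(masked))
```

Because every start minute is equally likely, the expected share of missing data is the same for each hour; reproducing the beginning- and end-of-day pattern reported for hip-worn devices [42] would instead require weighting the start-time draw by time of day.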

We consider the relative computational simplicity of the NoHoW method to be a significant advantage. Accelerometer data of this kind can be extremely high-volume, and researchers must select their imputation strategy with consideration of both error reduction and computational feasibility. It may be possible to use advanced machine learning techniques to impute missing data, but these methods are computationally expensive and may be technically inaccessible to many researchers. In addition, more information (e.g. physiological, psychological or behavioural factors) may allow for more accurate multivariate imputation, but in large-scale free-living settings this information is likely to be limited; our method is therefore likely to be widely applicable. A further strength of the present study is the testing of numerous activity metrics in addition to steps. Steps are a highly interpretable and relatable metric produced by wearable devices, and some evidence suggests that step estimates from Fitbit devices are more valid and reliable than other derived variables, e.g. TDEE [44–46], although machine learning techniques may facilitate the refinement of energy expenditure estimates [47]. Nevertheless, the metric of interest will vary depending on the aims and hypotheses of a study, and we demonstrate that the NoHoW method, Kalman imputation and individual-level multiple imputation perform particularly well across a variety of physical activity metrics.

The present study has several limitations. First, we included only participants with a high proportion of wear time (>97.5%). Whilst highly adherent participants were required in order to have a near-complete dataset to validate against, we cannot rule out the possibility that the included participants differ behaviourally from participants who remove the FC2 more frequently. Second, we inserted missing data at random positions, and it remains uncertain how representative this is of free-living data in other studies. Participants may remove devices for comfort or aesthetic reasons, for charging, or under conditions in which they would not wish to have measurements made (e.g. extreme sedentariness); thus, missingness may not be completely at random [48] and may differ between populations and research studies. Unfortunately, no definitive method exists to test whether data are missing at random [49], and many imputation strategies have limited capacity to overcome this. However, our second simulation study simulated a wide variety of missingness patterns in an attempt to identify such biases and worst-case scenarios in the selected methods.
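Why missingness that is not at random matters can be seen in a single hypothetical hour. Under the simplified hour-scaling rule assumed here (observed total multiplied by 60 / worn minutes), removing the device only during sedentary minutes, as in the extreme-sedentariness example above, inflates the estimate, whereas removal spread evenly across the hour has a much smaller effect. All values and names are invented for illustration.

```python
def scale_hour(values):
    """Scale an hour's observed total by 60 / worn minutes (None = non-wear)."""
    worn = [v for v in values if v is not None]
    if not worn:
        raise ValueError("no worn minutes in this hour")
    return sum(worn) * len(values) / len(worn)


# Hypothetical hour: 20 active minutes at 80 steps/min, then 40 sedentary minutes.
true_hour = [80.0] * 20 + [0.0] * 40

# Not at random: the device is removed for 20 of the sedentary minutes only.
sedentary_removal = [80.0] * 20 + [None] * 20 + [0.0] * 20

# For comparison: 20 minutes removed evenly across the hour (every third minute),
# regardless of what the wearer was doing.
even_removal = [None if m % 3 == 0 else v for m, v in enumerate(true_hour)]

print(sum(true_hour), scale_hour(sedentary_removal), scale_hour(even_removal))
```

The true total is 1,600 steps; the sedentary-only removal scales the estimate up to 2,400 steps, while the evenly spread removal gives 1,560 steps.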

Incorporating activity monitoring devices is a necessary step in improving physical activity and energy balance tracking in research and clinical settings. We have proposed a simple and accessible methodology which effectively reduces the bias introduced into physical activity estimates by non-wear time and may improve the validity of research conclusions. Other imputation strategies (namely multiple imputation and Kalman imputation) also performed well and, importantly, all the methods tested in this study were superior to data removal. Researchers and clinicians using commercial activity monitors to track physical activity longitudinally should account for missingness in their datasets, and the algorithm presented in this study offers one approach to doing so.

Supporting information

S1 Table. Aggregated results for each window of missingness for each physical activity metric in simulation study 2.

Difference refers to the difference in means between the imputed and true values. Abbreviations: SD, standard deviation; RMSE, root mean squared error.


S1 Fig. The percentage of missing data for each hour of the day in the NoHoW trial.


S2 Fig. A density plot detailing the lengths of missing-data periods (<1440 minutes in length) in the NoHoW trial.

The mean is represented by the red dashed line and the median is represented by the blue dashed line.


S1 Data. A flowchart detailing the simulation procedures conducted in this study.



1. Beaulieu K, Hopkins M, Blundell J, Finlayson G. Impact of physical activity level and dietary fat content on passive overconsumption of energy in non-obese adults. Int J Behav Nutr Phys Act. 2017;14: 14. pmid:28166797
2. Swift DL, McGee JE, Earnest CP, Carlisle E, Nygard M, Johannsen NM. The Effects of Exercise and Physical Activity on Weight Loss and Maintenance. Prog Cardiovasc Dis. 2018;61: 206–213. pmid:30003901
3. Kerns JC, Guo J, Fothergill E, Howard L, Knuth ND, Brychta R, et al. Increased Physical Activity Associated with Less Weight Regain Six Years After “The Biggest Loser” Competition. Obesity. 2017;25: 1838–1843. pmid:29086499
4. Wadden TA, Neiberg RH, Wing RR, Clark JM, Delahanty LM, Hill JO, et al. Four-year weight losses in the look AHEAD study: Factors associated with long-term success. Obesity. 2011;19: 1987–1998. pmid:21779086
5. Schoeller DA, Shay K, Kushner RF. How much physical activity is needed to minimize weight gain in previously obese women? Am J Clin Nutr. 1997;66: 551–556. pmid:9280172
6. Drenowatz C, Greier K. The Role of Energy Flux in Weight Management. Exerc Med. 2017;1: 4.
7. Sirard JR, Melanson EL, Li L, Freedson PS. Field evaluation of the Computer Science and Applications, Inc. physical activity monitor. Med Sci Sports Exerc. 2000;32: 695–700. pmid:10731015
8. Helmerhorst HJF, Brage S, Warren J, Besson H, Ekelund U. A systematic review of reliability and objective criterion-related validity of physical activity questionnaires. Int J Behav Nutr Phys Act. 2012;9: 103. pmid:22938557
9. Hills AP, Mokhtar N, Byrne NM. Assessment of Physical Activity and Energy Expenditure: An Overview of Objective Measures. Front Nutr. 2014;1: 1–16. pmid:25988106
10. Swartz AM, Strath SJ, Bassett DR Jr, O’Brien WL, King GA, Ainsworth BE. Estimation of energy expenditure using CSA accelerometers at hip and wrist sites. Med Sci Sports Exerc. 2000;32: S450–6. pmid:10993414
11. Whybrow S, Ritz P, Horgan GW, Stubbs RJ. An evaluation of the IDEEA™ activity monitor for estimating energy expenditure. Br J Nutr. 2013;109: 173–183. pmid:22464547
12. Plasqui G, Bonomi AG, Westerterp KR. Daily physical activity assessment with accelerometers: New insights and validation studies. Obes Rev. 2013;14: 451–462. pmid:23398786
13. Troiano RP, Berrigan D, Dodd KW, Mâsse LC, Tilert T, McDowell M. Physical activity in the United States measured by accelerometer. Med Sci Sports Exerc. 2008;40: 181–8. pmid:18091006
14. Choi L, Liu Z, Matthews CE, Buchowski MS. Validation of accelerometer wear and nonwear time classification algorithm. Med Sci Sports Exerc. 2011;43: 357–364. pmid:20581716
15. Ridgers ND, Fairclough S. Assessing free-living physical activity using accelerometry: Practical issues for researchers and practitioners. Eur J Sport Sci. 2011;11: 205–213.
16. Tudor-Locke C, Camhi SM, Troiano RP. A catalog of rules, variables, and definitions applied to accelerometer data in the national health and nutrition examination Survey, 2003–2006. Prev Chronic Dis. 2012;9: E113. pmid:22698174
17. Liu B, Yu M, Graubard BI, Troiano RP, Schenker N. Multiple imputation of completely missing repeated measures data within person from a complex sample: application to accelerometer data in the National Health and Nutrition Examination Survey. Stat Med. 2016;35: 5170–5188. pmid:27488606
18. Doherty A, Jackson D, Hammerla N, Plötz T, Olivier P, Granat MH, et al. Large scale population assessment of physical activity using wrist worn accelerometers: The UK biobank study. Buchowski M, editor. PLoS One. 2017;12: e0169649. pmid:28146576
19. Kapteyn A, Banks J, Hamer M, Smith JP, Steptoe A, Van Soest A, et al. What they say and what they do: Comparing physical activity across the USA, England and the Netherlands. J Epidemiol Community Health. 2018;72: 471–476. pmid:29643112
20. Loprinzi PD, Cardinal BJ, Crespo CJ, Brodowicz GR, Andersen RE, Smit E. Differences in demographic, behavioral, and biological variables between those with valid and invalid accelerometry data: Implications for generalizability. J Phys Act Heal. 2013;10: 79–84.
21. Stephens S, Beyene J, Tremblay MS, Faulkner G, Pullnayegum E, Feldman BM. Strategies for Dealing with Missing Accelerometer Data. Rheum Dis Clin North Am. 2018;44: 317–326. pmid:29622298
22. Meng Y, Speier W, Shufelt C, Joung S, E Van Eyk J, Bairey Merz CN, et al. A Machine Learning Approach to Classifying Self-Reported Health Status in a Cohort of Patients with Heart Disease Using Activity Tracker Data. IEEE J Biomed Heal Informatics. 2020;24: 878–884. pmid:31199276
23. Lee PH. Data imputation for accelerometer-measured physical activity: The combined approach. Am J Clin Nutr. 2013;97: 965–971. pmid:23553165
24. Staudenmayer J, Zhu W, Catellier DJ. Statistical considerations in the analysis of accelerometry-based activity monitor data. Med Sci Sports Exerc. 2012;44: S61–S67. pmid:22157776
25. Katapally TR, Muhajarine N. Towards uniform accelerometry analysis: a standardization methodology to minimize measurement bias due to systematic accelerometer wear-time variation. J Sports Sci Med. 2014;13: 379–86. pmid:24790493
26. Chen C, Jerome GJ, Laferriere D, Young DR, Vollmer WM. Procedures used to standardize data collected by RT3 triaxial accelerometers in a large-scale weight-loss trial. J Phys Act Health. 2009;6: 354–9. pmid:19564665
27. Thraen-Borowski KM, Gennuso KP, Cadmus-Bertram L. Accelerometer-derived physical activity and sedentary time by cancer type in the United States. PLoS One. 2017;12. pmid:28806753
28. Benedetto S, Caldato C, Bazzan E, Greenwood DC, Pensabene V, Actis P. Assessment of the fitbit charge 2 for monitoring heart rate. PLoS One. 2018;13: e0192691. pmid:29489850
29. Rennie KL, Hennings SJ, Mitchell J, Wareham NJ. Estimating energy expenditure by heart-rate monitoring without individual calibration. Med Sci Sports Exerc. 2001;33: 939–945. pmid:11404659
30. Silva AM, Santos DA, Matias CN, Júdice PB, Magalhães JP, Ekelund U, et al. Accuracy of a combined heart rate and motion sensor for assessing energy expenditure in free-living adults during a double-blind crossover caffeine trial using doubly labeled water as the reference method. Buchowski M, editor. Eur J Clin Nutr. 2015;69: 20–27. pmid:24690589
31. Bassett DR, Rowlands A, Trost SG. Calibration and validation of wearable monitors. Med Sci Sports Exerc. 2012;44: S32–S38. pmid:22157772
32. Schrack JA, Leroux A, Fleg JL, Zipunnikov V, Simonsick EM, Studenski SA, et al. Using Heart Rate and Accelerometry to Define Quantity and Intensity of Physical Activity in Older Adults. J Gerontol A Biol Sci Med Sci. 2018;73: 668–675. pmid:29509832
33. Scott SE, Duarte C, Encantado J, Evans EH, Harjumaa M, Heitmann BL, et al. The NoHoW protocol: A multicentre 2×2 factorial randomised controlled trial investigating an evidence-based digital toolkit for weight loss maintenance in European adults. BMJ Open. 2019;9: e029425. pmid:31575569
34. Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016;15: 155–63. pmid:27330520
35. Shook RP, Hand GA, O’Connor DP, Thomas DM, Hurley TG, Hébert JR, et al. Energy Intake Derived from an Energy Balance Equation, Validated Activity Monitors, and Dual X-Ray Absorptiometry Can Provide Acceptable Caloric Intake Data among Young Adults. J Nutr. 2018;148: 490–496. pmid:29546294
36. Shiroma EJ, Lee IM, Schepps MA, Kamada M, Harris TB. Physical Activity Patterns and Mortality: The Weekend Warrior and Activity Bouts. Med Sci Sports Exerc. 2019;51: 35–40. pmid:30138219
37. Breiman L. Random Forests. Mach Learn. 2001;45: 5–32.
38. Berkemeyer K, Wijndaele K, White T, Cooper AJM, Luben R, Westgate K, et al. The descriptive epidemiology of accelerometer-measured physical activity in older adults. Int J Behav Nutr Phys Act. 2016;13: 1–10. pmid:26733186
39. Tanaka H, Monahan KD, Seals DR. Age-predicted maximal heart rate revisited. J Am Coll Cardiol. 2001;37: 153–156. pmid:11153730
40. Kräuchi K, Wirz-Justice A. Circadian Clues to Sleep Onset Mechanisms. Neuropsychopharmacology. 2001;25: S92–S96. pmid:11682282
41. Herrmann SD, Barreira TV, Kang M, Ainsworth BE. How many hours are enough? Accelerometer wear time may provide bias in daily activity estimates. J Phys Act Heal. 2013.
42. Yue Xu S, Nelson S, Kerr J, Godbole S, Patterson R, Merchant G, et al. Statistical approaches to account for missing values in accelerometer data: Applications to modeling physical activity. Stat Methods Med Res. 2018;27: 1168–1186. pmid:27405327
43. Diaz KM, Krupka DJ, Chang MJ, Shaffer JA, Ma Y, Goldsmith J, et al. Validation of the Fitbit One® for physical activity measurement at an upper torso attachment site. BMC Res Notes. 2016;9: 213. pmid:27068022
44. Feehan LM, Geldman J, Sayre EC, Park C, Ezzat AM, Young Yoo J, et al. Accuracy of fitbit devices: Systematic review and narrative syntheses of quantitative data. JMIR mHealth uHealth. 2018;6: e10527. pmid:30093371
45. O’Driscoll R, Turicchi J, Beaulieu K, Scott S, Matu J, Deighton K, et al. How well do activity monitors estimate energy expenditure? A systematic review and meta-analysis of the validity of current technologies. Br J Sports Med. 2020;54: 332–340. pmid:30194221
46. O’Driscoll R, Turicchi J, Hopkins M, Gibbons C, Larsen SC, Palmeira AL, et al. The validity of two widely used commercial and research-grade activity monitors, during resting, household and activity behaviours. Health Technol (Berl). 2020;10: 637–648.
47. O’Driscoll R, Turicchi J, Hopkins M, Horgan GW, Finlayson G, Stubbs JR. Improving energy expenditure estimates from wearable devices: A machine learning approach. J Sports Sci. 2020;00: 1–10. pmid:32252598
48. Sterne JAC, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, et al. Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. BMJ. 2009;339: 157–160. pmid:19564179
49. Ae Lee J, Gill J. Missing value imputation for physical activity data measured by accelerometer. Stat Methods Med Res. 2018;27: 490–506. pmid:26994215