
Modeling energy balance while correcting for measurement error via free knot splines

  • Daniel Ries,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Statistical Sciences Department, Sandia National Laboratories, Albuquerque, NM, United States of America, Department of Statistics, Iowa State University, Ames, IA, United States of America

  • Alicia Carriquiry,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Writing – review & editing

    Affiliation Department of Statistics, Iowa State University, Ames, IA, United States of America

  • Robin Shook

    Roles Conceptualization, Writing – review & editing

    Affiliation Center for Children’s Healthy Lifestyles & Nutrition, Children’s Mercy, Kansas City, MO, United States of America



Measurements of energy balance components (energy intake, energy expenditure, changes in energy stores) are often plagued with measurement error. Doubly-labeled water can measure energy intake (EI) with negligible error, but is expensive and cumbersome. An alternative approach that is gaining popularity is to use the energy balance principle: measure energy expenditure (EE) and change in energy stores (ES), and then back-calculate EI. Gold standard methods for EE and ES exist and are known to give accurate measurements, albeit at a high cost. We propose a joint statistical model to assess the measurement error in cheaper, non-intrusive measures of EE and ES. We treat the unknown true EE and ES for individuals as latent variables and model them using a bivariate distribution. We try both a bivariate normal distribution and a Dirichlet Process Mixture Model, and compare the results via simulation. Our approach is the first to account for the dependencies that exist in individuals’ daily EE and ES. We employ semiparametric regression with free knot splines for measurements with error, and linear components for error-free covariates. We adopt a Bayesian approach to estimation and inference and use Reversible Jump Markov Chain Monte Carlo to generate draws from the posterior distribution. Based on the semiparametric regression, we develop a calibration equation that adjusts a cheaper, less reliable estimate closer to the true value. Along with this calibrated value, our method also gives credible intervals to assess uncertainty. A simulation study shows that our calibration helps produce a more accurate estimate. Our approach compares favorably in terms of prediction to other commonly used models.


Obesity is perhaps the most serious public health problem of the 21st century, given its prevalence, global reach, and widespread health, economic, and social consequences. While weight gain and loss are most certainly a complex interplay of a large number of factors across a variety of domains [1], ultimately a chronic energy surplus or deficit (energy intake versus energy expenditure) determines body weight change [2–6]. However, accurately measuring energy balance in free-living individuals is challenging, even in small studies. Yet to design effective public health policies and interventions, it would be valuable to be able to assess energy balance in nationwide surveys such as the National Health and Nutrition Examination Survey (NHANES). Instruments such as doubly labeled water (DLW) and dual-energy X-ray absorptiometry (DXA) are too costly and burdensome to administer in large groups. Alternatively, consumer devices designed to measure physical activity and body composition are generally affordable, easy to use, and popular (an estimated 45 million will be sold in 2017) [7], but have varying levels of validity and reliability [8–10].

Within the past decade, mathematical models have been formulated based on the first law of thermodynamics (rate of energy storage = rate of energy intake − rate of energy expenditure) [11]. Using multiple datasets containing gold-standard measures of energy expenditure, energy intake, and changes in energy storage (e.g., body composition using a two-compartment model of fat mass and fat-free mass) during periods of overfeeding [12] or caloric restriction [13], researchers have developed and refined a model based on the energy balance principle [14–16]. The result is a simple, easy-to-use equation that offers great promise for estimating energy intake from objectively measured quantities. We have recently used these energy balance equations to compare estimates of energy intake obtained through gold-standard methods (DLW) and arm-based activity monitors (Sensewear Armband, BodyMedia Inc., Pittsburgh, PA) [17]. We observed very low group-level error in the estimates of energy expenditure and equation-derived energy intake using both DLW and the armband, indicating equivalence between the measures at the group level. However, the individual-level error for equation-derived energy intake and expenditure was quite large, likely due to large individual measurement error.

Therefore, a question of interest is whether measurements of energy balance obtained from self-report instruments, or even from objective measuring tools such as the Sensewear Armband or other consumer devices, which are much less costly to apply, can be calibrated to correct for measurement error. We explore the association between measurements obtained from accurate instruments and those obtained from noisy instruments that can be administered to large groups. We are interested in formulating a model for energy balance using energy expenditure (EE) and changes in energy stores (ΔES) while accounting for both the dependence between the two and measurement error. Widely accepted gold standard measurements exist for both EE (DLW) [12, 15, 16, 18–21] and ΔES (DXA) [12, 16, 20]. Table 1 lists abbreviations used in this article. Unfortunately, these instruments are expensive and burdensome. There are alternative approaches [17] to quantify both EE and ΔES that, while less expensive and easier to administer, are subject to bias and other errors. Our goal is to model energy balance using both gold standard and less precise instruments, with the aim of evaluating the error present in the measurements and ultimately calibrating the less precise instruments, so that in future studies researchers can calibrate their measurements of EE and/or ΔES when they are not using a gold standard.

Measurement error modeling is a well-developed field in statistics. Fuller popularized linear measurement error models through his book, the first comprehensive exposition of measurement error [22]. Nonlinear models have since become more popular and widely used; an overview of these models is given in [23]. Berry et al. [24] proposed Bayesian measurement error models that used P-splines to model the relationship between the latent variable and noisy measurements. This was one of the first Bayesian approaches to such a problem, appearing at the onset of the Markov Chain Monte Carlo revolution that made Bayesian modeling practical. These models were then extended by [25] and [26], who allowed a more flexible distribution of the latent variables than a Gaussian and used B-splines instead of P-splines. They used Dirichlet Process Mixture Models to allow for more flexibility in the structure of the latent variables, and through simulation and real data analysis showed that this flexibility could have a major effect if the true underlying distribution was not Gaussian. Additionally, they allowed for non-constant variances in the error terms for noisy measurements and gold standard measurements. There is a large body of measurement error research applied to the field of nutrition. Nusser et al. [27] developed a semiparametric approach to estimating intake distributions using noisy 24-hour recalls of nutrient intakes. Sinha et al. [28] developed Bayesian methods for the analysis of nutritional data that used B-splines and Dirichlet Process Mixtures to allow for flexibility, which were later extended by [25] and [26]. The analysis of semicontinuous data with measurement error, otherwise known as the “NCI method”, was explored in [29] and later extended in [30] and [31]. This substantial body of measurement error research in nutrition can serve as a starting point for measurement error modeling in the physical activity realm.
Reversible Jump MCMC was designed as a means of model selection [32]. In the context of B-splines, model selection means determining the number and locations of the knots. An early and practical approach to regression using splines and Reversible Jump MCMC was given in [33], which introduced the idea of Bayesian free-knot splines. Although the method used Reversible Jump MCMC, it was not a “fully Bayesian” approach, as it did not place priors on the spline regression coefficients; rather, it used OLS to update the regression coefficients at each step of the algorithm. A more fully Bayesian approach, which places priors on the regression coefficients, was given by [34]. For complex regression problems, such as curves with discontinuities, the method of [34] performed better, but for smooth functions that appear to have continuous second derivatives, the simpler-to-implement method of [33] performed comparably. In these papers, the explanatory variable for which the knot locations are chosen was assumed to be fixed and known. In this paper, those values are treated as latent variables, which adds a layer of complexity to the algorithm.

In this article we adopt a Bayesian semi-parametric approach. We make distributional assumptions about error terms, but we try to be flexible when modeling the true relationship between less precise measurements and the truth. We propose using free knot splines to model the relationship between the less precise measurements and the truth and we build a Reversible Jump MCMC algorithm to do so. The remainder of this article is organized as follows: in the Methodology section we describe the data structure and assumptions about their dependencies; we also briefly review two commonly used models and introduce a bivariate, Bayesian semi-parametric model that allows for dependence between EE and ΔES. In the Simulated Data and Simulation Study sections, we describe how we simulate complex data and how we constructed the simulation study to assess the performance of the three models. The Results section summarizes our findings in the simulation study. In the Calibration section, we show how calibration could be performed using the proposed model given new data when no gold standard measurements are available.


In this section, we present a new way to analyze the relationship between gold standard and less expensive measurements that accounts for the dependence between EE and ΔES. First, we give a more precise definition of ΔES and a practical way to calculate it. We then list independence assumptions, with justifications, that simplify the model construction. Two simpler models are described before the proposed method: a naïve model that assumes there is no measurement error in gold standard measurements, and a linear measurement error model that assumes a linear relationship between less expensive measurements and the true, latent values of EE and ΔES. Finally, we describe in detail the proposed model, which uses free knot splines to model the relationship between less expensive measurements and the true, latent values of EE and ΔES.

Calculation of ΔES

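In the energy balance equation,

$$\Delta ES = EI - EE, \qquad (1)$$

ΔES is expressed in kcal and can be positive or negative. To convert DXA measurements of fat mass (FM) and fat-free mass (FFM) to kcal, we use Eq (2). Because we assume that energy stores consist only of fat mass or fat-free mass, this equation provides an exact answer if we know the energy densities $C_{FM}$ and $C_{FFM}$. We let $C_{FM} = 9500$ kcal/kg and $C_{FFM} = 1100$ kcal/kg as in [20], recognizing that a single value does not account for biological variation. We divide the changes in FM and FFM by the length of the measurement period (14 ± 3 days) and multiply by $C_{FM}$ and $C_{FFM}$, respectively, to get ΔES in kcal per day. For each individual, we compute

$$\Delta ES_{ij} = \frac{C_{FM}\,\Delta FM_{ij} + C_{FFM}\,\Delta FFM_{ij}}{\Delta t_{ij}}, \qquad (2)$$

where $\Delta FM_{ij}$ and $\Delta FFM_{ij}$ are the changes in fat mass and fat-free mass (kg) for subject i over time period j, and $\Delta t_{ij}$ is the length of that period in days.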
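As a concrete illustration of Eq (2), the following Python sketch converts changes in fat mass and fat-free mass into an average daily ΔES and then back-calculates energy intake through the energy balance relation in Eq (1). It assumes DXA changes are expressed in kilograms and uses the constants 9500 and 1100 given in the text; the function names `delta_es` and `energy_intake` are hypothetical and are not part of the authors' software.

```python
# Sketch of Eq (2) and Eq (1): DXA body-composition changes -> daily Delta ES -> EI.
# Assumption: changes in fat mass (FM) and fat-free mass (FFM) are in kg.

C_FM = 9500.0   # kcal per kg of fat mass (value used in the text)
C_FFM = 1100.0  # kcal per kg of fat-free mass (value used in the text)


def delta_es(delta_fm_kg: float, delta_ffm_kg: float, days: float) -> float:
    """Average daily change in energy stores (kcal/day), Eq (2)."""
    if days <= 0:
        raise ValueError("measurement period must be positive")
    return (C_FM * delta_fm_kg + C_FFM * delta_ffm_kg) / days


def energy_intake(ee_kcal_per_day: float, des_kcal_per_day: float) -> float:
    """Back-calculate EI from Eq (1): Delta ES = EI - EE, hence EI = EE + Delta ES."""
    return ee_kcal_per_day + des_kcal_per_day


if __name__ == "__main__":
    des = delta_es(0.5, 0.2, 14)          # (4750 + 220) / 14
    print(des)                            # 355.0
    print(energy_intake(2400.0, des))     # 2755.0
```

For example, gaining 0.5 kg of fat mass and 0.2 kg of fat-free mass over 14 days corresponds to (9500 × 0.5 + 1100 × 0.2)/14 = 355 kcal/day of stored energy, so a person expending 2400 kcal/day would have an implied intake of about 2755 kcal/day.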


We denote the observed average daily EE measured via DLW for subject i over time period j by $W_{EE,ij}$, and the observed average daily ΔES measured via DXA for subject i over time period j by $W_{\Delta ES,ij}$. A positive value for ΔES indicates that more calories were taken in than expended. We compute daily values of EE for a person by averaging the total EE obtained by DLW over the measurement period, because DLW gives an estimate of EE over a period of time, in this instance approximately 14 days.

When collecting data on a large population, it is feasible to administer less expensive instruments to most of the subjects; however, they yield less accurate measurements. Although there are several less precise ways to measure EE and ΔES, we keep the notation general since in any given situation we refer only to one specific instrument. We denote the observed average daily EE obtained with a less precise instrument for subject i over time period j by $Y_{EE,ij}$, and the observed average daily change in energy stores measured by a less precise instrument for subject i over time period j by $Y_{\Delta ES,ij}$.

Lastly, the values that we cannot observe are the usual EE and ΔES for subject i. We define usual as the long-run average (expected value) of the true EE and ΔES. Let $X_{EE,i}$ represent the usual daily EE for subject i and $X_{\Delta ES,i}$ represent the usual daily ΔES for subject i. Note that even if we could observe daily EE and daily ΔES for each participant with no error, there would still be within-person variability in these two variables because people change their caloric intake and physical activity from day to day.

(In)Dependence assumptions

The observed data vector for subject i at time j is $(W_{EE,ij}, W_{\Delta ES,ij}, Y_{EE,ij}, Y_{\Delta ES,ij}, Z_i)$, where $Z_i$ is a vector of covariates measured with no error for subject i. We start by assuming independence between individuals.

Several of the variables in the model are conditionally independent. Given $X_{EE,i}$ (usual daily EE for subject i), $X_{\Delta ES,i}$ (usual daily ΔES for subject i), and covariates $Z_i$, we assume that:

  1. $Y_{EE,ij}$ and $Y_{\Delta ES,ij}$ are independent of each other,
  2. $Y_{EE,ij}$ and $Y_{\Delta ES,ij}$ are each independent of both $W_{EE,ij}$ and $W_{\Delta ES,ij}$,
  3. $W_{EE,ij}$ and $W_{\Delta ES,ij}$ are independent of each other.

Assumption 1 follows because, given the true values X and covariates Z, knowing a less precise measurement of one quantity gives no additional information about the less precise measurement of the other, so long as the instrument is not self-administered. To justify Assumption 2, we note that once we know the truth X, having an unbiased measurement of X provides no additional information about the less precise, biased measurement of X. Assumption 3 follows from reasoning similar to that for Assumption 1.

Naïve model

The first model we consider is what we call the naïve model. This model assumes no measurement error in the gold standard instruments; thus DLW and DXA give error-free measurements of $X_{EE,i}$ and $X_{\Delta ES,i}$, respectively. We also assume that the less precise measurements Y are linearly related to the usual values and to error-free covariates. Based on empirical evidence, gender, BMI, and age all had some effect on the less precise measurement of EE. The naïve model is:

$$Y_{EE,ij} = \beta_{0,ee} + \beta_{1,ee} W_{EE,ij} + Z_i\gamma_{ee} + \epsilon_{EE,ij}, \qquad \epsilon_{EE,ij} \sim N(0, \sigma^2_{y,ee}), \qquad (3)$$

$$Y_{\Delta ES,ij} = \beta_{0,es} + \beta_{1,es} W_{\Delta ES,ij} + Z_i\gamma_{es} + \epsilon_{\Delta ES,ij}, \qquad \epsilon_{\Delta ES,ij} \sim N(0, \sigma^2_{y,es}), \qquad (4)$$

where the β1,⋅ terms represent the relationship between the less precise measurements and the usual EE and ΔES, and the β0,⋅ terms represent systematic biases. We let γ⋅ = (γ1,⋅, γ2,⋅, γ3,⋅), where γ1,⋅ is the coefficient for gender, γ2,⋅ is the coefficient for BMI, and γ3,⋅ is the coefficient for age. We take the standard approach and assume that the errors are normally distributed.

We choose independent priors for all model parameters for all models going forward. Where appropriate, we select priors that are conjugate or conditionally conjugate for ease of implementation but also to permit incorporating weak information through the prior. Prior distributions for all models are listed in the S4 Appendix.

Linear measurement error model (LMEM)

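The Linear Measurement Error Model (LMEM) recognizes that $W_{EE}$ and $W_{\Delta ES}$ are contaminated with additive measurement error, and are unbiased measurements of the truth rather than equal to it. The model therefore becomes hierarchical: it does not directly model the relationship between Y and W, but rather between Y and X, under the assumption that W is an unbiased measurement of X. The relationship between Y and X is assumed to be linear and, as in the naïve model, the model also accounts linearly for error-free covariates Z. We assume that the measurement errors are normally distributed. To allow dependence between EE and ΔES, we model $(X_{EE,i}, X_{\Delta ES,i})$ with a bivariate normal distribution. More formally, the model is given by:

$$Y_{EE,ij} = \beta_{0,ee} + \beta_{1,ee} X_{EE,i} + Z_i\gamma_{ee} + \epsilon_{EE,ij}, \qquad \epsilon_{EE,ij} \sim N(0, \sigma^2_{y,ee}), \qquad (5)$$

$$Y_{\Delta ES,ij} = \beta_{0,es} + \beta_{1,es} X_{\Delta ES,i} + Z_i\gamma_{es} + \epsilon_{\Delta ES,ij}, \qquad \epsilon_{\Delta ES,ij} \sim N(0, \sigma^2_{y,es}), \qquad (6)$$

$$W_{EE,ij} = X_{EE,i} + \nu_{EE,ij}, \qquad \nu_{EE,ij} \sim N(0, \sigma^2_{w,ee}), \qquad (7)$$

$$W_{\Delta ES,ij} = X_{\Delta ES,i} + \nu_{\Delta ES,ij}, \qquad \nu_{\Delta ES,ij} \sim N(0, \sigma^2_{w,es}), \qquad (8)$$

$$(X_{EE,i}, X_{\Delta ES,i})^\top \sim N_2(\mu, \Sigma). \qquad (9)$$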

The full likelihoods for this model and for the one in the next section are given in S3 Appendix.
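A small simulation helps show why the naïve model is problematic under LMEM-style assumptions. In the classical additive-error setting, regressing Y on the average of J replicate gold standard measurements attenuates the slope by the reliability ratio λ = σ²_X / (σ²_X + σ²_ν/J). The Python sketch below uses hypothetical parameter values (means, standard deviations, and slope are illustrative, not estimates from this study) and a linear EE relationship with no covariates; the function name is likewise hypothetical.

```python
import numpy as np


def naive_vs_attenuated_slope(n=300, reps=2, beta0=100.0, beta1=0.9,
                              mu_x=2400.0, sd_x=300.0, sd_nu=200.0,
                              sd_eps=250.0, seed=0):
    """Return (naive OLS slope of Y-bar on W-bar, beta1 * reliability ratio).

    All parameter values are hypothetical; the model has no covariates.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_x, sd_x, size=n)                           # latent usual EE
    w = x[:, None] + rng.normal(0.0, sd_nu, size=(n, reps))      # Eq (7)-style replicates
    y = beta0 + beta1 * x[:, None] + rng.normal(0.0, sd_eps, size=(n, reps))  # Eq (5)-style
    # Naive model: treat the averaged gold standard W-bar as if it were X.
    naive_slope = np.polyfit(w.mean(axis=1), y.mean(axis=1), 1)[0]
    reliability = sd_x**2 / (sd_x**2 + sd_nu**2 / reps)          # lambda
    return float(naive_slope), beta1 * reliability


if __name__ == "__main__":
    naive, predicted = naive_vs_attenuated_slope(n=20_000, seed=1)
    # The naive slope should sit close to the predicted attenuated value, below 0.9.
    print(round(naive, 3), round(predicted, 3))
```

With these values, λ = 90000/(90000 + 40000/2) ≈ 0.82, so the naive slope lands near 0.74 rather than the true 0.9; more replicates push λ toward 1. Treating X as latent, as the LMEM does, is what removes this bias.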

Spline measurement error model (SMEM)

We extend the LMEM for EE and ΔES in the previous section to include both non-linear and non-parametric components. We follow the same construction of the LMEM to model the gold standard measurements as unbiased for usual attributes and subject to normally distributed measurement errors as in (7) and (8).

We wish to understand both the biases, as functions of the usual value and demographic covariates, and the measurement error in the instruments themselves. We propose modeling the less precise measurements in a semiparametric regression framework. Specifically, we model the functions m(⋅) using free knot cubic B-splines, and model demographic covariates with a linear component. We require monotone functions so that we can take inverses for calibration later; this only requires the spline coefficients to be non-decreasing, i.e., β1 ≤ β2 ≤ … ≤ βk [35], as used in similar applications [28, 36, 37]. Our approach has three benefits. First, the spline is flexible and can pick up an unknown relationship between X⋅ and the less precise measurement of the same quantity, which is important because we never observe the truth and it is therefore difficult to justify a particular functional form for the relationship. Second, the use of free knot splines eliminates the need to specify the number and position of the knots. Previous methods using splines in measurement error models choose a “moderately large” number of knots, typically at least 15 [24, 26, 28]. We use Reversible Jump MCMC (RJMCMC) to determine the number and position of knots, which means that we treat the number of knots in each regression equation and their locations as random variables. Third, the linear component for the covariates allows easy interpretation of the parameters and thus of the biases in the instrument. We make a working assumption of constant variance for all measurement errors. Based on the above, the model specification is:

$$Y_{EE,ij} = m_{ee}(X_{EE,i}) + Z_i\gamma_{ee} + \epsilon_{EE,ij}, \qquad m_{ee}(X_{EE}) = B_{ee}(X_{EE})\,\beta_{ee}, \qquad (10)$$

$$Y_{\Delta ES,ij} = m_{es}(X_{\Delta ES,i}) + Z_i\gamma_{es} + \epsilon_{\Delta ES,ij}, \qquad m_{es}(X_{\Delta ES}) = B_{es}(X_{\Delta ES})\,\beta_{es}, \qquad (11)$$

$$\epsilon_{EE,ij} \sim N(0, \sigma^2_{y,ee}), \qquad (12)$$

$$\epsilon_{\Delta ES,ij} \sim N(0, \sigma^2_{y,es}), \qquad (13)$$

where $B_{ee}(\cdot)$ and $B_{es}(\cdot)$ are n × (k_ee + 4) and n × (k_es + 4) B-spline basis matrices that can be constructed using the recursion specified in [38]. We let k_ee and k_es denote the number of knots for the EE and ΔES splines, respectively.
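The basis matrices can be built with the Cox-de Boor recursion. The Python sketch below is one way to do so for cubic B-splines with k interior knots and repeated boundary knots, which yields the n × (k + 4) shape described above; it also checks numerically that non-decreasing coefficients give a non-decreasing fitted function. The helper name `bspline_basis`, the knot positions, and the coefficient values are hypothetical choices for illustration, not the authors' R/C++ implementation.

```python
import numpy as np


def bspline_basis(x, interior_knots, lower, upper, degree=3):
    """Evaluate a B-spline basis at points x via the Cox-de Boor recursion.

    Returns an array of shape (len(x), len(interior_knots) + degree + 1),
    i.e. n x (k + 4) for cubic splines. x must lie in [lower, upper].
    """
    x = np.asarray(x, dtype=float)
    t = np.concatenate([
        np.full(degree + 1, float(lower)),               # repeated boundary knots
        np.sort(np.asarray(interior_knots, dtype=float)),
        np.full(degree + 1, float(upper)),
    ])
    # Degree 0: indicator that x falls in the half-open span [t_i, t_{i+1}).
    B = np.zeros((x.size, t.size - 1))
    for i in range(t.size - 1):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Assign x == upper to the last non-empty span so the basis is defined there.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == upper, last] = 1.0
    # Raise the degree one step at a time; terms with a zero denominator are 0.
    for p in range(1, degree + 1):
        B_next = np.zeros((x.size, t.size - 1 - p))
        for i in range(t.size - 1 - p):
            left_den = t[i + p] - t[i]
            right_den = t[i + p + 1] - t[i + 1]
            if left_den > 0:
                B_next[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                B_next[:, i] += (t[i + p + 1] - x) / right_den * B[:, i + 1]
        B = B_next
    return B


if __name__ == "__main__":
    # Hypothetical grid of usual EE values (kcal/day) with two interior knots.
    x = np.linspace(1500.0, 3500.0, 201)
    B = bspline_basis(x, [2200.0, 2800.0], 1500.0, 3500.0)
    print(B.shape)                              # (201, 6): k = 2 knots -> k + 4 columns
    # Non-decreasing coefficients give a non-decreasing (hence monotone) m(x).
    beta = np.array([1500.0, 1700.0, 2300.0, 2400.0, 2900.0, 3400.0])
    m = B @ beta
    print(bool(np.all(np.diff(m) >= -1e-9)))    # True
```

Each row of the basis sums to one, so the spline already contains an intercept; this is why the covariate component $Z_i\gamma$ does not need its own constant term.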

There are many types of splines; we chose B-splines because, in similar problems [25, 26, 28], they have been shown to be numerically more stable than alternatives such as P-splines, and the comparison in [25] showed that this choice can have a major effect on the results.

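We allow more flexibility in the distribution of the latent variables $X_i = (X_{EE,i}, X_{\Delta ES,i})^\top$ by specifying a Dirichlet process mixture prior for them. This allows the data to “speak for themselves”, which is ideal when the model includes latent variables. The density of $X_i$ can then be modeled as an infinite mixture of normals, truncated in practice at H components:

$$X_i \mid \zeta_i = h \sim N_2(\mu_h, \Sigma_h), \qquad (14)$$

$$\zeta_i \sim \mathrm{Cat}(H, \pi), \qquad (15)$$

$$\pi_h = V_h \prod_{l<h} (1 - V_l), \qquad (16)$$

$$V_h \sim \mathrm{Beta}(1, \alpha), \qquad (17)$$

$$\mu_h \sim N_2(M_\mu, C_\mu), \qquad \Sigma_h \sim \mathrm{IW}(\psi, d), \qquad (18)$$

where α helps control how many components of the infinite mixture are used. We set α = 1. The parameter ζ_i indicates the mixture component from which observation i arises, and Cat(H, π) is a categorical random variable such that P(ζ_i = h) = π_h, h = 1, …, H. In any given problem, we can select H large enough that the mixture weight remaining beyond the first H components is smaller than some ϵ > 0 [39], pg. 552.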
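The truncated Dirichlet process prior can be made concrete with the stick-breaking construction. The Python sketch below draws weights π_1, …, π_H with V_h ~ Beta(1, α), lets the final stick absorb all remaining mass so the weights sum to one, and picks H by one simple rule: the expected weight left over after H breaks, (α/(1 + α))^H, must fall below ϵ. This truncation rule and the function names are assumptions for illustration; the exact criterion taken from [39] may differ in detail.

```python
import numpy as np


def stick_breaking_weights(alpha, H, rng):
    """Draw truncated stick-breaking weights pi_1, ..., pi_H."""
    v = rng.beta(1.0, alpha, size=H)
    v[-1] = 1.0  # truncation: the last component takes whatever mass is left
    # Mass remaining before breaking stick h: prod_{l<h} (1 - v_l).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def expected_leftover(alpha, H):
    """E[prod_{h<=H} (1 - V_h)] = (alpha / (1 + alpha))**H for V_h ~ Beta(1, alpha)."""
    return (alpha / (1.0 + alpha)) ** H


def choose_truncation(alpha, eps):
    """Smallest H whose expected leftover mass is below eps (assumed rule)."""
    H = 1
    while expected_leftover(alpha, H) >= eps:
        H += 1
    return H


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    H = choose_truncation(alpha=1.0, eps=1e-6)    # alpha = 1 as in the text -> H = 20
    pi = stick_breaking_weights(1.0, H, rng)
    zeta = rng.choice(H, size=300, p=pi)          # component labels zeta_i (0-based)
    print(H, round(float(pi.sum()), 12), np.bincount(zeta, minlength=H)[:5])
```

With α = 1, each break leaves on average half of the remaining stick, so about 20 components already reduce the expected leftover weight below 10⁻⁶; larger α spreads mass over more components and requires a larger H.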

Although we do not know the true form of the association between the noisy measurements and the usual values, we do not anticipate it to be highly complex, so we would like to use as few knots as necessary. We use r_ee and r_es to denote the knot locations. Our discrete uniform prior on these means that knots can only occur at the latent values $X_{EE,i}$ and $X_{\Delta ES,i}$, respectively. This was done largely for computational convenience; we could have assigned a continuous prior for the knot locations, but we do not believe this choice adversely affects estimation because the latent values are updated at every MCMC iteration. Notice that we have not placed priors on the spline regression coefficients β_ee and β_es or on the linear regression coefficients γ_ee and γ_es; this is because we update them using ordinary least squares (OLS). More details can be found in S2 Appendix.
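Because the spline and covariate coefficients are updated by OLS rather than sampled from priors, one such update can be sketched as below, together with a birth-type proposal that draws a candidate knot uniformly from the current latent values, mirroring the discrete uniform prior. This is a hypothetical Python sketch: the function names are invented, it averages replicates so there is one response per individual, and it omits the RJMCMC acceptance step described in S2 Appendix. The design matrix has no separate intercept column because a cubic B-spline basis already sums to one.

```python
import numpy as np


def cubic_basis(x, knots, lower, upper):
    """Cubic B-spline basis via Cox-de Boor; shape (len(x), len(knots) + 4)."""
    x = np.asarray(x, dtype=float)
    t = np.r_[[lower] * 4, np.sort(knots), [upper] * 4].astype(float)
    B = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(t.size - 1)], float).T
    B[x == upper, np.nonzero(t[:-1] < t[1:])[0].max()] = 1.0  # include right endpoint
    for p in range(1, 4):
        nxt = np.zeros((x.size, t.size - 1 - p))
        for i in range(t.size - 1 - p):
            if t[i + p] > t[i]:
                nxt[:, i] += (x - t[i]) / (t[i + p] - t[i]) * B[:, i]
            if t[i + p + 1] > t[i + 1]:
                nxt[:, i] += (t[i + p + 1] - x) / (t[i + p + 1] - t[i + 1]) * B[:, i + 1]
        B = nxt
    return B


def propose_birth(x_latent, knots, rng):
    """Hypothetical birth move: add a knot drawn uniformly from interior latent values."""
    lo, hi = x_latent.min(), x_latent.max()
    candidates = np.setdiff1d(x_latent[(x_latent > lo) & (x_latent < hi)], knots)
    return np.sort(np.append(knots, rng.choice(candidates)))


def ols_update(y, x_latent, Z, knots):
    """OLS estimates of spline coefficients beta and covariate coefficients gamma."""
    B = cubic_basis(x_latent, knots, x_latent.min(), x_latent.max())
    D = np.hstack([B, Z])                      # no intercept column: rows of B sum to 1
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    return coef[:B.shape[1]], coef[B.shape[1]:], resid


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n = 300
    x = rng.uniform(1500, 3500, n)                             # current latent EE draws
    Z = np.column_stack([rng.binomial(1, 0.5, n), rng.normal(27, 5, n),
                         rng.uniform(20, 40, n)])              # gender, BMI, age
    y = 200 + 0.9 * x + Z @ np.array([300.0, 14.0, -7.0]) + rng.normal(0, 50, n)
    knots = propose_birth(x, np.array([]), rng)                # 0 -> 1 interior knot
    beta, gamma, resid = ols_update(y, x, Z, knots)
    print(knots.size, beta.size, np.round(gamma, 1))
```

In a full sampler, the proposed knot set would be accepted or rejected with an RJMCMC acceptance probability, and the latent x values would change between iterations, so the candidate knot positions move along with them.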

Simulated data

In this section we describe how we simulate data that mimic “real” observations in order to perform a simulation study. Our simulated data need to be sufficiently complex and incorporate dependence in order to faithfully represent the distributions of true EE and EI, as well as of gold standard and less precise measurements. We need to simulate data for all the components in the model as well as the latent variables $(X_{EE,i}, X_{\Delta ES,i})$. We explore estimation with measurement errors for the gold standard and less precise measurements under three different scenarios: normal errors, skewed errors, and bimodal errors.

For this simulation, we used three covariates: gender, age, and BMI. Using a total sample size of 300, we drew 300 Bernoulli(0.5) variables to determine gender. Age was simulated from Uniform(20, 40), and BMI from Normal(27, 5). Let Z be the 300 × 3 matrix that links covariates to individuals.

We simulate $(X_{EE,i}, X_{EI,i})$ from a mixture of five bivariate t-distributions, drawing sixty observation pairs from each of the five components. The mean and standard deviation of the two-dimensional vector differ across the five t-distributions. The scale matrix is the same for each of the five t-distributions, and the degrees of freedom equal five.

We let the correlation between EE and EI be 0.4376, as calculated from previous studies’ data. We set γ_ee = (300, 14, −7) and γ_es = (−200, 8, −5) for gender, BMI, and age, respectively. We compute $X_{\Delta ES,i} = X_{EI,i} - X_{EE,i}$ using the energy balance equation in (1). Fig 1 shows histograms for the latent variables in one simulated data set.
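The data-generating steps above can be sketched in Python as follows. The component means and standard deviations are not reported in the text, so the values below are hypothetical placeholders; we also assume that Normal(27, 5) has standard deviation 5 and read the common scale matrix as a common correlation of 0.4376 within each component, with component-specific standard deviations. Multivariate t draws use the standard normal/chi-square scale-mixture representation.

```python
import numpy as np

N_COMPONENTS, N_PER_COMPONENT, DF = 5, 60, 5       # 5 x 60 = 300 individuals
RHO = 0.4376                                       # EE-EI correlation from the text
GAMMA_EE = np.array([300.0, 14.0, -7.0])           # gender, BMI, age
GAMMA_ES = np.array([-200.0, 8.0, -5.0])
# HYPOTHETICAL component means and SDs for (EE, EI) in kcal/day.
MEANS = np.array([[2000, 2050], [2300, 2280], [2600, 2620],
                  [2900, 2870], [3200, 3230]], dtype=float)
SDS = np.array([[120, 140], [130, 150], [140, 160],
                [150, 170], [160, 180]], dtype=float)


def simulate_covariates(n, rng):
    """Columns: gender ~ Bernoulli(0.5), BMI ~ Normal(27, 5), age ~ Uniform(20, 40)."""
    gender = rng.binomial(1, 0.5, size=n)
    bmi = rng.normal(27.0, 5.0, size=n)
    age = rng.uniform(20.0, 40.0, size=n)
    return np.column_stack([gender, bmi, age])     # column order matches GAMMA_*


def rmvt(mean, scale, df, size, rng):
    """Multivariate t draws: mean + z / sqrt(w / df), z ~ N(0, scale), w ~ chi2(df)."""
    z = rng.multivariate_normal(np.zeros(mean.size), scale, size=size)
    w = rng.chisquare(df, size=size)
    return mean + z / np.sqrt(w / df)[:, None]


def simulate_latent(rng):
    """Usual (EE, Delta ES) pairs from a five-component bivariate t mixture."""
    blocks = []
    for mu, sd in zip(MEANS, SDS):
        cov = RHO * sd[0] * sd[1]
        scale = np.array([[sd[0] ** 2, cov], [cov, sd[1] ** 2]])
        blocks.append(rmvt(mu, scale, DF, N_PER_COMPONENT, rng))
    ee, ei = np.vstack(blocks).T
    return ee, ei - ee                             # Delta ES = EI - EE, Eq (1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    Z = simulate_covariates(N_COMPONENTS * N_PER_COMPONENT, rng)
    x_ee, x_des = simulate_latent(rng)
    shift_ee, shift_es = Z @ GAMMA_EE, Z @ GAMMA_ES   # later added to the Y's
    print(Z.shape, x_ee.shape, round(float(x_des.mean()), 1))
```

Because the correlation is imposed within each component, the pooled EE-EI correlation of the full mixture will generally differ from 0.4376; if the target is the pooled correlation, the component parameters would need to be tuned accordingly.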

Fig 1. Simulated data distribution.

Distribution of simulated latent variables X from one simulated data set.

For the gold standard measurements, let

$$\nu_{EE,ij} = \delta_{EE,ij} + u_{EE,ij}, \qquad \nu_{\Delta ES,ij} = \delta_{\Delta ES,ij} + u_{\Delta ES,ij}, \qquad (19)$$

where u_EE represents the measurement error in DLW and u_ΔES represents the measurement error in DXA. Above, δ_EE,ij represents the within-person deviation in EE for person i during time period j from the person’s true mean, and similarly δ_ΔES,ij represents the within-person deviation in ΔES for person i during time period j from the person’s true mean. For the less precise measurements the setup is slightly different. The within-person variability is added to each individual’s usual values of EE and ΔES and is thus affected by the functions m(⋅). Therefore we add these within-person variation terms δ to the simulated usual X values to get

$$X^{*}_{EE,ij} = X_{EE,i} + \delta_{EE,ij}, \qquad X^{*}_{\Delta ES,ij} = X_{\Delta ES,i} + \delta_{\Delta ES,ij}, \qquad (20)$$

and the functions m(⋅) depend on $X^{*}_{EE,ij}$ and $X^{*}_{\Delta ES,ij}$.

The pairs $(\delta_{EE,ij}, \delta_{\Delta ES,ij})$ are simulated jointly but independently across time and individuals. We simulate these within-person variability terms from a bivariate normal distribution.

We assume that DLW and DXA are unbiased measurements of EE and ΔES, respectively. These measurements are simulated according to (7) and (8), where we further break down ν as in (19). The u term represents the measurement error component that we still need to specify, and δ represents the within-person component of the error, which we have already discussed. We assume that the u terms are independent within and across individuals, as well as independent of all δ and X.

From these simulated values, we then obtain simulated gold standard data $W_{EE,ij}$ and $W_{\Delta ES,ij}$. We generate measurement errors for the gold standard measurements (and for the less precise measurements) from three different distributions: normal, skew-normal, and a bimodal mixture of two normals centered around 0. Parameters were chosen such that the means of all error distributions are 0 and the variances of the three distributions are equal among the EE errors and among the ΔES errors.
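The three error scenarios can be generated so that every distribution has mean zero and a common standard deviation. In the Python sketch below, the skew-normal draws use Azzalini's representation b|U₀| + √(1 − b²)U₁ with b = a/√(1 + a²) for shape a, and are then centered and rescaled; the bimodal errors are an equal-weight mixture of N(−m, s²) and N(m, s²) with m² + s² equal to the target variance. The shape and separation values, and the function names, are hypothetical; the text does not report them.

```python
import numpy as np


def normal_errors(n, sd, rng):
    """Mean-zero normal errors with standard deviation sd."""
    return rng.normal(0.0, sd, size=n)


def skew_normal_errors(n, sd, shape, rng):
    """Skew-normal draws rescaled to mean 0 and standard deviation sd."""
    b = shape / np.sqrt(1.0 + shape**2)
    u0 = np.abs(rng.standard_normal(n))
    u1 = rng.standard_normal(n)
    z = b * u0 + np.sqrt(1.0 - b**2) * u1          # standard skew-normal(shape)
    mean = b * np.sqrt(2.0 / np.pi)                # E[z]
    var = 1.0 - 2.0 * b**2 / np.pi                 # Var[z]
    return sd * (z - mean) / np.sqrt(var)


def bimodal_errors(n, sd, separation, rng):
    """Equal mixture of N(-m, s^2) and N(m, s^2) with m = separation * sd, m^2 + s^2 = sd^2."""
    if not 0.0 <= separation < 1.0:
        raise ValueError("separation must be in [0, 1)")
    m = separation * sd
    s = np.sqrt(sd**2 - m**2)
    signs = rng.choice([-1.0, 1.0], size=n)        # pick a mode with probability 1/2
    return signs * m + rng.normal(0.0, s, size=n)


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    sd = 100.0   # hypothetical common SD (kcal/day)
    for name, e in [("normal", normal_errors(200_000, sd, rng)),
                    ("skewed", skew_normal_errors(200_000, sd, 5.0, rng)),
                    ("bimodal", bimodal_errors(200_000, sd, 0.9, rng))]:
        print(name, round(float(e.mean()), 1), round(float(e.std()), 1))
```

Using a larger `sd` for the less precise instrument than for DLW or DXA reproduces the ordering of variances described in the text. With separation 0.9, the two modes at ±0.9·sd are far apart relative to their common standard deviation of about 0.44·sd, so the mixture density is clearly bimodal.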

We generate the less precise measurements in a similar fashion. We assume that their errors are independent within and across subjects, as well as mutually independent of all δ, X, and Z terms. We draw these errors from the same families of densities as those used for the gold standard measurements, but with larger variances.

In contrast to the gold standard measurements which we assume are unbiased, we now add bias to the less precise measurements. The bias is introduced via the functions mee and mes. For these simulated data, we let: (21) (22)

Fig 2 shows mee(⋅) on the left and mes(⋅) on the right both against a y = x line for comparison.

Fig 2. Nonlinear functions.

Plot of the nonlinear functions mee(⋅) (left) and mes(⋅) (right), with the line Y = X shown in black as a reference for unbiased measurement.

We then add $Z_i\gamma_{ee}$ and $Z_i\gamma_{es}$ to the simulated less precise measurements of EE and ΔES, respectively.


We adopt a Bayesian approach to estimation in this problem; therefore, our goal is to estimate the joint posterior distribution of all parameters and latent variables in the model, $p(\theta, X_{EE}, X_{\Delta ES} \mid W_{EE}, W_{\Delta ES}, Y_{EE}, Y_{\Delta ES}, Z)$. We use Markov Chain Monte Carlo (MCMC) methods to approximate this posterior distribution. For the naïve and LMEM models, we used Just Another Gibbs Sampler (JAGS) to simulate draws from the posterior distribution; this was simple to implement and relatively quick to sample. To fit free knot splines, which allow the dimension of the parameter space to change, we must use Reversible Jump MCMC, which requires a more complex sampler. We implemented the RJMCMC sampler in R and C++. Because the algorithms are technical and not the main objective of this paper, we provide the algorithm for the Gibbs sampler in S1 Appendix and the reversible jump algorithm in S2 Appendix.

Simulation study

In this section we describe a simulation study that we carried out to check the performance of the proposed models. We are interested in the predictive performance of the models because our main goal is to develop a calibration tool. We are also interested in evaluating the robustness of the models to departures of the errors from the standard normality assumption, which is why we simulate errors from two alternative error distributions. We present performance measures such as the prediction mean squared error (PMSE) for the regression function in question, as well as posterior means and posterior standard deviations for parameters of interest.


We simulated 200 data sets each for normal, skewed, and bimodal errors, for both 2 and 4 replicate measurements per individual. The number of individuals is 300 in all cases. Preliminary analysis suggests that the number of replicates per individual has a stronger impact on performance than the number of individuals.

Although we would like to be as flexible as possible with our distributional assumptions on the bivariate latent variables, we also want a model that produces estimates with low PMSE given the data constraints of our application. In practice, it is difficult to obtain more than two replicate measurements on an individual, at least when using the gold standard measurements. During the simulation study, we found that the Dirichlet Process prior on the latent variables produced unstable parameter estimates and low acceptance rates for proposals in the random walk Metropolis-Hastings algorithm when only two replicate observations per person were available. Results were stable, however, when four replicates per person were available. Because of this issue, we fit a fourth model that uses a bivariate normal prior for the latent variables instead of the Dirichlet Process prior while still using splines for the regression functions. We refer to this model as SMEMN. It requires only a minor change to the Gibbs step of the MCMC: steps (a)-(c) are eliminated and step (d) no longer depends on the grouping h.

We set the values of the hyperparameters as follows: , , , , , , ayee = ayes = awee = awes = byee = byes = bwee = bwes = 0.1, ψ = I2×2, d = 3, Mμ = (2400, 0), Cμ = diag(100000, 100000), λee = λes = 1. We ran the MCMC for 3 chains of 12,000 iterations each, discarding the first 2000 as burn-in; convergence for all models was fast, as indicated by trace plots and Gelman-Rubin diagnostics below 1.04.
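For reference, the Gelman-Rubin diagnostic compares between-chain and within-chain variability for each scalar parameter. A minimal Python sketch of the classic potential scale reduction factor (without the degrees-of-freedom correction or the split-chain refinement) is given below; it assumes the post-burn-in draws are stored as an array with one row per chain, here 3 chains of 10,000 retained draws each. The function name is hypothetical; in R, the coda package's `gelman.diag` provides a comparable diagnostic.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    chains: array of shape (m, n) with m chains of n post-burn-in draws.
    """
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # B: between-chain variance
    pooled = (n - 1) / n * within + between / n       # pooled posterior variance estimate
    return float(np.sqrt(pooled / within))


if __name__ == "__main__":
    rng = np.random.default_rng(11)
    mixed = rng.normal(size=(3, 10_000))              # three well-mixed chains
    stuck = mixed + np.array([[0.0], [0.0], [3.0]])   # one chain stuck in another mode
    print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 2))
```

Values close to 1 indicate that the chains agree; a chain stuck in a different region inflates the between-chain term and pushes R-hat well above a threshold such as 1.04.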


Tables 2, 3, and 4 show results averaged over 200 Monte Carlo samples for normal, skewed, and bimodal errors, respectively. An asterisk next to the true value of the measurement error for the less precise measurements indicates that this value is a Monte Carlo approximation to the truth. Recall that we included within-person variation inside the functions m(⋅), but in our model we use the working assumption that the additive error term accounts for both within-person variability and measurement error. Because we cannot directly extract this value from the function, we approximate it by generating 10,000 data sets, removing the mean function from the less precise observations, and calculating the standard deviation of the residuals. We then averaged those standard deviation estimates to obtain the value reported in the table.

Table 2. Summary of simulation under normal errors for naïve, LMEM, SMEMN, SMEM models, respectively.

Table 3. Summary of simulation under skewed errors for naïve, LMEM, SMEMN, SMEM models, respectively.

Table 4. Summary of simulation under bimodal errors for naïve, LMEM, SMEMN, SMEM models, respectively.

Across all models and error types, the linear coefficients are estimated largely without bias. This is not surprising, since these covariates are measured without error, and it suggests that the regression coefficient estimates are not affected by the distribution of the errors. Additionally, the regression coefficients can be interpreted as biases inherent to the device. For example, γ1,ee can be thought of as the additional number of calories a device will report for a male compared to a female, all else equal. These results could be informative and useful as a secondary study goal. The biases and standard errors are, however, slightly smaller for the SMEMN and SMEM models. All three measurement error models perform about the same when assessing the measurement error in the gold standard instruments. When errors are generated from a bimodal distribution, estimated error variances are biased toward zero; this is true for the measurement error in the less precise measurements as well. The SMEMN and SMEM models produce similar estimates of the measurement error variance of the less precise measurements. Estimates are good for EE and ΔES when errors are normal, but biased low for ΔES under both skewed and bimodal errors. Both the naïve model and the linear measurement error model produce estimated measurement error standard deviations for the less precise measurement of EE that are too large under normal and skewed errors. When the departure from normality is substantial (bimodal errors), estimating the measurement error variance without bias can be challenging.

Fig 3 shows boxplots, across simulated data sets, of the log mean PMSE for each model under each type of error distribution for EE with 2 and 4 replicates, and Fig 4 shows the same for ΔES. PMSE decreases consistently as the models go from the simplest to the most complex. First, the naïve model does much worse than the same model fitted with a correction for measurement error. Both the naïve model and the linear measurement error model perform much worse in terms of PMSE than the models with free knot splines. This holds even though the true relationship is non-linear while the relationship in the noisy data does not appear to be strongly non-linear. This suggests that the methods using free knot splines can detect relationships that are difficult to see in the noisy data alone. The difference in PMSE between the SMEMN and SMEM models is small, although SMEM generally does better. SMEM has more parameters to help explain the scientific mechanism of the problem, but more parameters do not necessarily imply better prediction. The question is whether this small improvement is worth the increase in model complexity. We think the answer is no, for two reasons: (i) our main focus with this model is calibrating the less precise measurements, not conducting inference at the latent variable level, and (ii) the DP approach is reliable only when four replicates per person are available, which is unrealistic in practice for gold standard measurements. Because the main focus is to calibrate less precise measurements, the simulation results are promising.
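The quantities behind Figs 3 and 4 can be sketched as follows. The sketch assumes that the PMSE for one simulated data set is the average squared difference between a model's predictions and the corresponding true values; the exact definition used for the figures may differ. `rank_models` is a hypothetical helper that orders models by their median log PMSE across data sets, which is one way to read the boxplots.

```python
import math
import statistics


def log_mean_pmse(predicted, truth):
    """Log of the mean squared prediction error for one simulated data set."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    mse = statistics.mean([(p - t) ** 2 for p, t in zip(predicted, truth)])
    return math.log(mse)


def rank_models(log_pmse_by_model):
    """Order model names from smallest to largest median log PMSE across data sets."""
    return sorted(log_pmse_by_model,
                  key=lambda name: statistics.median(log_pmse_by_model[name]))
```

Working on the log scale keeps the boxplots readable when the naïve model's errors are much larger than those of the spline models.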

Fig 3. PMSE for EE.

Log PMSE for EE Regression faceted by measurement error distribution and number of replicates.

Fig 4. PMSE for ΔES.

Log PMSE for ΔES Regression faceted by measurement error distribution and number of replicates.

To show the structure of the nonlinear model, with the fitted spline overlaid on the simulated data, we provide plots from one of the 200 simulated data sets, chosen from those with skewed errors and two replicates per person. Fig 5 shows the fitted splines relating the true values of EE and ΔES to the measurements obtained with the less precise instruments. Each point corresponds to one simulated individual, with the y value equal to the mean of the two replicates. The bold (red) line is the posterior mean of the spline function; 500 randomly selected MCMC draws of the spline are plotted behind it. Fig 6 gives the distribution of the number of knots for both the EE and ΔES splines. The splines are not overly complex and typically use four or fewer knots.
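The posterior mean curve in Fig 5 and the knot-count distribution in Fig 6 are both simple summaries over MCMC draws. For illustration only, the sketch assumes that each draw stores a piecewise-linear spline in truncated power form, s(x) = b0 + b1·x + Σ cj (x − κj)+, with its own number and locations of knots; the paper's spline basis may differ (for example, cubic B-splines), and the dictionary keys are hypothetical.

```python
from collections import Counter


def eval_spline(x, draw):
    """Evaluate one posterior draw of a (hypothetical) linear truncated-power spline."""
    value = draw["intercept"] + draw["slope"] * x
    for knot, coef in zip(draw["knots"], draw["knot_coefs"]):
        value += coef * max(x - knot, 0.0)  # (x - knot)_+ term
    return value


def posterior_mean_curve(grid, draws):
    """Pointwise average of the spline draws over a grid of x values."""
    return [sum(eval_spline(x, d) for d in draws) / len(draws) for x in grid]


def knot_count_distribution(draws):
    """Relative frequency of the number of knots across draws."""
    counts = Counter(len(d["knots"]) for d in draws)
    return {k: counts[k] / len(draws) for k in sorted(counts)}
```

Averaging pointwise across draws that have different numbers and locations of knots is why the posterior mean curve is smoother than any single draw.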

Fig 5. Fitted spline.

Spline function for Model SMEMN with Skewed Errors.

Fig 6. Distribution of kee and kes.

Distribution of Number of Knots for Model SMEMN with Skewed Errors.


The main goal of this work is to develop a calibration approach to “correct” the measurements of EE and ΔES obtained with less precise, noisy instruments. That is, given a measurement of EE or ΔES from a less precise instrument and some demographic information, we can return a better estimate of the true value together with a credible interval that expresses the uncertainty in the estimate. Calibration for our models amounts to inverting the fitted model: instead of expressing Y as a function of X and Z, we solve for X as a function of Y and Z. For a given observed value of Y and Z, and an estimate of γ, the calibration for X is:

$$\hat{X} = s^{-1}\left(Y - Z^{\top}\gamma\right) \qquad (23)$$

where s(⋅) is the fitted spline regression function.

The inverse in (23) cannot be found in closed form, so we compute it numerically. To do so, we use the optimize function in R to minimize |s(x) − y*| over x, where s(⋅) is the regression function and y* is the observed less precise measurement minus the product of the coefficient vector γ and the individual's covariate values Z. The calibration algorithm for individual i is as follows:

For r = 1, …, R, where R is the number of MCMC draws:

  1. Calculate $y_i^{*(r)} = y_i - Z_i^{\top}\gamma^{(r)}$, where Zi are the covariate values for individual i.
  2. Use optimize to find the value of x that minimizes $|s_i^{(r)}(x) - y_i^{*(r)}|$, and call this value $x_{i,\text{calibrated}}^{(r)}$. Here, $s_i^{(r)}(x)$ is the value of the regression function for yi at x, computed from the spline coefficients β(r), the latent variables (XEE(r), XΔES(r)), and the knot locations of the EE and ΔES splines from the rth draw of the chain.
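The two-step algorithm above can be sketched in Python. R's `optimize` combines golden-section search with successive parabolic interpolation; this sketch substitutes plain golden-section search, which suffices when |s(x) − y*| is unimodal on the search interval (as it is when s(⋅) is monotone). Each posterior draw is assumed to be a dictionary holding a coefficient vector `gamma` and a callable `spline`; these names, and the search bounds, are hypothetical.

```python
import math


def golden_section_minimize(f, lower, upper, tol=1e-6):
    """Minimize a unimodal function f on [lower, upper] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lower, upper
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    return (a + b) / 2.0


def calibrate_once(y_obs, z, gamma, spline, lower, upper):
    """Steps 1 and 2 of the algorithm for a single MCMC draw."""
    y_star = y_obs - sum(zj * gj for zj, gj in zip(z, gamma))  # y* = y - Z'gamma
    return golden_section_minimize(lambda x: abs(spline(x) - y_star), lower, upper)


def calibrated_draws(y_obs, z, posterior_draws, lower, upper):
    """One calibrated value of x per draw r = 1, ..., R."""
    return [calibrate_once(y_obs, z, d["gamma"], d["spline"], lower, upper)
            for d in posterior_draws]
```

If s(⋅) is not monotone, |s(x) − y*| can have several local minima and the search returns only one of them, so the bounds should cover a plausible range for the true value.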

Since our interest lies in correcting less expensive measurements for potentially non-linear biases and measurement error, as determined jointly in the model through the use of gold standard measurements, this calibration step is of most interest to practitioners. Although parameter estimates from the model may be interesting, obesity, nutrition, and physical activity researchers often need reliable data on EI and EE to understand the effects of treatments in controlled experiments or relationships found in exploratory analyses of observational data. The calibration method above, together with the estimated posterior distribution of the model, gives practitioners a powerful way to adjust their measurements of EI and/or EE for measurement error.

As an example, suppose that we wish to calibrate three noisy measurements, each from a different individual, using Model SMEMN. We randomly select three individuals from the same data set used earlier to illustrate Model SMEMN. Individual 1 is male, with a BMI of 28.6 and age 20.5; individual 2 is female, with a BMI of 21.5 and age 30.1; and individual 3 is male, with a BMI of 38.6 and age 22.8. The observed less precise measurements for these individuals, their true values, and 95% credible intervals for their mean calibrated truth under skewed normal errors are given in Table 5. Fig 7 shows histograms of 1000 calibrated draws for each individual for EE and ΔES measurements under skewed errors. The table and figure show that the calibration pulls the less precise measurement closer to the truth; in all cases, the calibration improved on the estimate obtained from the less precise measurement. A simple point estimate correction may be used, and an analysis could proceed with these corrected measurements taken as truth; a more comprehensive approach would be to use the point estimates of EE and ΔES together with the uncertainty given by the posterior distribution. This would allow an analysis that fully accounts for the biases and measurement error uncertainty present in the data, so as to avoid erroneous conclusions based on bad data. Repeating this exercise for many other simulated individuals gave similar results.
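Turning the calibrated draws for one individual into the kind of summaries reported in Table 5 and Fig 7 requires only a posterior mean and two quantiles. The sketch below uses equal-tailed intervals computed with linear-interpolation quantiles (the same rule as R's default `quantile`, type 7); whether Table 5 uses equal-tailed or highest posterior density intervals is an assumption here, and `summarize_calibration` is a hypothetical name.

```python
import math


def quantile(values, p):
    """Empirical quantile with linear interpolation (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def summarize_calibration(draws, observed, level=0.95):
    """Posterior mean, equal-tailed credible interval, and shift from the observed value."""
    tail = (1.0 - level) / 2.0
    mean = sum(draws) / len(draws)
    return {
        "mean": mean,
        "lower": quantile(draws, tail),
        "upper": quantile(draws, 1.0 - tail),
        "shift": mean - observed,  # how far calibration moved the observed value
    }
```

In a simulation, comparing |mean − truth| with |observed − truth| is a direct check of whether calibration moved the measurement closer to the truth.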

Table 5. 95% credible interval for calibration estimate for less precise measurements for skewed errors.

Fig 7. Calibration.

Posteriors of calibrated observations. Solid vertical line shows observed value from less precise measurement and dashed vertical line shows truth.


In this paper we presented a semi-parametric approach to modeling energy balance via its components EE and ΔES. We assume that unbiased gold standard measurements are available for both quantities, as well as less precise instruments that produce biased measurements of the truth. We propose a model in which the form of the association between the unbiased and the biased measurements of EE (or of ΔES) is left unspecified and is estimated with splines. This allows a flexible relationship between a less precise measurement and its unobserved truth. We assumed that the gold standard measurements and less precise measurements are conditionally independent given the latent vector (XEE, XΔES). We modeled the latent vector (XEE, XΔES) using both a bivariate normal distribution and a Dirichlet process. Although the Dirichlet process is more flexible and relies on weaker assumptions, it required more replicate observations (mainly of gold standard measurements) than is feasible in practice in order to give stable results. The normality assumption proved robust and produced stable and surprisingly reasonable results given the true structure of the latent variables. Because this model produced accurate estimates even with only two replicates of gold standard measurements per person, we believe that it is a plausibly useful model for this application unless more than two replicates per person are available. The resulting estimates and PMSE show that the approach we propose outperforms a simpler linear measurement error model and a naïve model that does not take measurement error into account.
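The model structure summarized above, with a bivariate normal latent vector and conditionally independent gold standard and less precise measurements, can be made concrete with a small data-generating sketch. Every numerical value, both spline-like functions `s_ee` and `s_des`, and the covariate coding (male indicator, BMI, age) are hypothetical; the sketch also omits the within-person variation that the paper's simulation places inside m(⋅), and the Dirichlet process alternative is not shown.

```python
import math
import random

# All parameter values below are hypothetical placeholders.
MU = (2500.0, 0.0)         # latent means of EE and Delta ES (kcal/day)
SD = (400.0, 150.0)        # latent standard deviations
RHO = 0.3                  # latent correlation between EE and Delta ES
SD_GOLD = (100.0, 60.0)    # error SDs of the unbiased gold standard instruments
SD_CHEAP = (250.0, 120.0)  # error SDs of the less precise instruments
GAMMA = {"ee": (60.0, 3.0, -2.0), "des": (10.0, 1.0, 0.5)}  # male, BMI, age


def s_ee(x):
    return 400.0 + 0.75 * x + 50.0 * math.tanh((x - 2500.0) / 300.0)


def s_des(x):
    return 0.8 * x + 40.0 * math.tanh(x / 100.0)


def simulate_person(rng, z, n_rep=2):
    """Draw latent truth, gold standard replicates, and less precise replicates."""
    # Bivariate normal latent vector via its Cholesky factor.
    u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x_ee = MU[0] + SD[0] * u1
    x_des = MU[1] + SD[1] * (RHO * u1 + math.sqrt(1.0 - RHO ** 2) * u2)
    lin_ee = sum(zj * gj for zj, gj in zip(z, GAMMA["ee"]))
    lin_des = sum(zj * gj for zj, gj in zip(z, GAMMA["des"]))
    # Given (x_ee, x_des), every error is drawn independently: this encodes
    # the conditional independence assumption.
    gold_ee = [x_ee + rng.gauss(0.0, SD_GOLD[0]) for _ in range(n_rep)]
    gold_des = [x_des + rng.gauss(0.0, SD_GOLD[1]) for _ in range(n_rep)]
    cheap_ee = [s_ee(x_ee) + lin_ee + rng.gauss(0.0, SD_CHEAP[0]) for _ in range(n_rep)]
    cheap_des = [s_des(x_des) + lin_des + rng.gauss(0.0, SD_CHEAP[1]) for _ in range(n_rep)]
    return {"x": (x_ee, x_des), "gold": (gold_ee, gold_des),
            "cheap": (cheap_ee, cheap_des)}


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)
```

The latent correlation `RHO` is the kind of dependence between daily EE and ΔES that the joint model is designed to capture.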

The intended use of the model presented in this paper is device calibration. Meaningful research in the fields of physical activity, nutrition, and health requires accurate, reliable data. The issue of obesity was highlighted in the introduction; understanding energy consumed versus energy expended is crucial to understanding the obesity crisis, but collecting data on these quantities is difficult. Because measurements of EE and ΔES from less expensive devices can include considerable error and bias, these data can lead to erroneous results later in a study. Although gold standard measurements exist for EE and ΔES, they are expensive, and it is unreasonable in a large study to administer them to every participant. The method presented in this paper provides a statistical approach that allows for flexibility in the relationship between a less expensive measurement and the truth in order to calibrate less expensive measurements. Large studies can therefore administer both gold standard and less expensive measurements to a small subsample, and use the methods presented in this paper to calibrate the less expensive measurements for participants who did not receive gold standard measurements. This can save researchers time and money without compromising the integrity of the data. One use would be to obtain a corrected estimate of EI by obtaining corrected estimates of EE and ΔES and then applying the energy balance equation. Although only a simulation study is presented here, parameter estimates from a study with the same data structure could be used for future device calibration.
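Because EI = EE + ΔES under the energy balance principle, a corrected EI can be obtained draw by draw from calibrated EE and ΔES. The sketch assumes that the two lists come from the same posterior draws (the r-th EE value pairs with the r-th ΔES value), which preserves their posterior dependence; adding separately computed point estimates and intervals would ignore it. The function names are hypothetical.

```python
import statistics


def ei_draws(ee_draws, des_draws):
    """Energy intake draws from the energy balance identity EI = EE + Delta ES."""
    if len(ee_draws) != len(des_draws):
        raise ValueError("need one EE and one Delta ES value per posterior draw")
    return [ee + des for ee, des in zip(ee_draws, des_draws)]


def summarize_ei(ee_draws, des_draws):
    """Posterior mean and standard deviation of the derived EI draws."""
    draws = ei_draws(ee_draws, des_draws)
    return statistics.mean(draws), statistics.stdev(draws)
```

The resulting EI draws can then be summarized with the same kind of credible interval used for the calibrated EE and ΔES values.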

The main motivation for constructing this model was to account for the error and bias in easy-to-administer measurements in order to calibrate less precise observations. We presented a simple way to do this calibration given a less precise measurement of EE and ΔES and values of gender, BMI, and age. With a Bayesian approach we can easily obtain a posterior distribution for the mean calibrated estimate, which also provides a measure of uncertainty. Our example shows that the calibrated estimate is often an improvement over the observed less precise measurement.


  1. Foresight Programme. Tackling Obesities: Future Choices. Project Report. 2007.
  2. Bray GA, Smith SR, de Jonge L, Xie H, Rood J, Martin CK, et al. Effect of dietary protein content on weight gain, energy expenditure, and body composition during overeating: a randomized controlled trial. JAMA: the journal of the American Medical Association. 2012;307(1):47–55. Epub 2012/01/05. pmid:22215165
  3. Bouchard C, Tremblay A, Despres JP, Nadeau A, Lupien PJ, Moorjani S, et al. Overfeeding in identical twins: 5-year postoverfeeding results. Metabolism: clinical and experimental. 1996;45(8):1042–50. Epub 1996/08/01.
  4. Bouchard C, Tremblay A, Despres JP, Nadeau A, Lupien PJ, Theriault G, et al. The response to long-term overfeeding in identical twins. N Engl J Med. 1990;322(21):1477–82. Epub 1990/05/24. pmid:2336074
  5. Bouchard C, Tremblay A, Despres JP, Theriault G, Nadeau A, Lupien PJ, et al. The response to exercise with constant energy intake in identical twins. Obes Res. 1994;2(5):400–10. Epub 1994/09/01. pmid:16358397
  6. Heilbronn LK, de Jonge L, Frisard MI, DeLany JP, Larson-Meyer DE, Rood J, et al. Effect of 6-month calorie restriction on biomarkers of longevity, metabolic adaptation, and oxidative stress in overweight individuals: a randomized controlled trial. JAMA: the journal of the American Medical Association. 2006;295(13):1539–48. Epub 2006/04/06. pmid:16595757
  7. Alger K. Wearable technology is revolutionizing fitness. 2014. Available from:
  8. El-Amrawy F, Nounou MI. Are Currently Available Wearable Devices for Activity Tracking and Heart Rate Monitoring Accurate, Precise, and Medically Beneficial? Healthcare informatics research. 2015;21(4):315–20. Epub 2015/12/01. pmid:26618039
  9. Lee J-M, Kim Y, Welk GJ. Validity of consumer-based physical activity monitors. Med Sci Sports Exerc. 2014;46(9):1840–8. pmid:24777201
  10. Murakami H, Kawakami R, Nakae S, Nakata Y, Ishikawa-Takata K, Tanaka S, et al. Accuracy of Wearable Devices for Estimating Total Energy Expenditure: Comparison With Metabolic Chamber and Doubly Labeled Water Method. JAMA internal medicine. 2016;176(5):702–3. Epub 2016/03/22. pmid:26999758
  11. von Helmholtz H. Über die Erhaltung der Kraft, eine physikalische Abhandlung, vorgetragen in der Sitzung der physikalischen Gesellschaft zu Berlin am 23sten Juli 1847. Berlin: Druck und Verlag von G. Reimer; 1847.
  12. Gilmore LA, Ravussin E, Bray GA, Han H, Redman LM. An objective estimate of energy intake during weight gain using the intake-balance method. Am J Clin Nutr. 2014;100(3):806–12. Epub 2014/07/25. pmid:25057153
  13. de Jonge L, DeLany JP, Nguyen T, Howard J, Hadley EC, Redman LM, et al. Validation study of energy expenditure and intake during calorie restriction using doubly labeled water and changes in body composition. Am J Clin Nutr. 2007;85(1):73–9. Epub 2007/01/09. pmid:17209180
  14. Thomas DM, Martin CK, Redman LM, Heymsfield SB, Lettieri S, Levine JA, et al. Effect of dietary adherence on the body weight plateau: a mathematical model incorporating intermittent compliance with energy intake prescription. Am J Clin Nutr. 2014;100(3):787–95. Epub 2014/08/01. pmid:25080458
  15. Hall KD, Chow CC. Estimating changes in free-living energy intake and its confidence interval. American Journal of Clinical Nutrition. 2011;94:66–74. pmid:21562087
  16. Sanghvi A, Redman LM, Martin CK, Ravussin E, Hall KD. Validation of an inexpensive and accurate mathematical method to measure long-term changes in free-living energy intake. American Journal of Clinical Nutrition. 2015;102:353–8. pmid:26040640
  17. Shook RP, Hand GA, O’Connor DP, Thomas DM, Hurley TG, Hebert JR, et al. Energy Intake Derived from an Energy Balance Equation, Validated Activity Monitors, and Dual X-Ray Absorptiometry Can Provide Acceptable Caloric Intake Data among Young Adults. The Journal of nutrition. 2018;148(3):490–6. Epub 2018/03/17. pmid:29546294
  18. Lagerros YT, Lagiou P. Assessment of physical activity and energy expenditure in epidemiological research of chronic diseases. European Journal of Epidemiology. 2007;205:353–62.
  19. Bouten C, Verboeket-van de Venne W, Westerterp KR, Verduin M, Janssen JD. Daily physical activity assessment: comparison between movement registration and doubly labeled water. Journal of Applied Physiology. 1996;81:1019–26. pmid:8872675
  20. Thomas DM, Martin CK, Heymsfield S, Redman LM, Schoeller DA, Levine JA. A simple model predicting individual weight change in humans. Journal of Biological Dynamics. 2011;5:579–99. pmid:24707319
  21. Racette SB, Das SK, Bhapkar M, Hadley EC, Roberts SB, Ravussin E, et al. Approaches for quantifying energy intake and %calorie restriction during calorie restriction interventions in humans: the multicenter CALERIE study. Am J Physiol Endocrinol Metab. 2011;302:441–8.
  22. Fuller W. Measurement Error Models. New York: John Wiley and Sons; 1987.
  23. Carroll R, Ruppert D, Stefanski L, Crainiceanu C. Measurement Error in Nonlinear Models. Chapman and Hall/CRC.
  24. Berry SM, Carroll RJ, Ruppert D. Bayesian Smoothing and Regression Splines for Measurement Error Problems. Journal of the American Statistical Association. 2002;97:160–69.
  25. Sarkar A, Mallick BK, Carroll RJ. Bayesian Semiparametric Regression in the Presence of Conditionally Heteroscedastic Measurement and Regression Errors. Biometrics. 2014;70:823–34. pmid:24965117
  26. Sarkar A, Mallick BK, Staudenmayer J, Pati D, Carroll RJ. Bayesian Semiparametric Density Deconvolution in the Presence of Conditionally Heteroscedastic Measurement Errors. Journal of Computational and Graphical Statistics. 2014;23:1101–25. pmid:25378893
  27. Nusser S, Carriquiry A, Dodd K, Fuller W. A semiparametric transformation approach to estimating usual daily intake distributions. Journal of the American Statistical Association. 1996;91:1440–9.
  28. Sinha S, Mallick B, Kipnis V, Carroll R. Semiparametric Bayesian Analysis of Nutritional Epidemiology Data in the Presence of Measurement Error. Biometrics. 2010;66:444–54. pmid:19673858
  29. Tooze J, Grunwald G, Jones R. Analysis of repeated measures data with clumping at zero. Statistical Methods in Medical Research. 2002;11:341–55. pmid:12197301
  30. Kipnis V, Midthune D, Buckman D, Dodd K, Guenther P, Krebs-Smith S, Subar A, Tooze J, Carroll R, Freedman L. Modeling data with excess zeros and measurement error: Application to evaluating relationships between episodically consumed foods and health outcomes. Biometrics. 2009;65:1003–10. pmid:19302405
  31. Kipnis V, Freedman L, Carroll R, Midthune D. A bivariate measurement error model for semicontinuous and continuous variables: Application to nutritional epidemiology. Biometrics. 2016;72:106–15. pmid:26332011
  32. Green PJ. Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Biometrika. 1995;82:711–32.
  33. Denison DGT, Mallick BK, Smith AFM. Automatic Bayesian curve fitting. Journal of the Royal Statistical Society. 1998;60:333–50.
  34. DiMatteo I, Genovese CR, Kass RE. Bayesian Curve-Fitting with Free-Knot Splines. Biometrika. 2001;88:1055–71.
  35. de Boor C. A Practical Guide to Splines. New York: Springer; 1978.
  36. Leitenstorfer F, Tutz G. Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics. 2007;8:654–73. pmid:17062722
  37. Wang W, Small DS. Monotone B-spline smoothing for a generalized linear model response. The American Statistician. 2015;69:28–33.
  38. Takezawa K. Introduction to Nonparametric Regression. Wiley Series in Probability and Statistics; 2006.
  39. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. Chapman and Hall/CRC; 2014.