Modeling energy balance while correcting for measurement error via free knot splines

Measurements of energy balance components (energy intake, energy expenditure, changes in energy stores) are often plagued with measurement error. Doubly-labeled water can measure energy intake (EI) with negligible error, but is expensive and cumbersome. An alternative approach that is gaining popularity is to use the energy balance principle, by measuring energy expenditure (EE) and change in energy stores (ES) and then back-calculate EI. Gold standard methods for EE and ES exist and are known to give accurate measurements, albeit at a high cost. We propose a joint statistical model to assess the measurement error in cheaper, non-intrusive measures of EE and ES. We let the unknown true EE and ES for individuals be latent variables, and model them using a bivariate distribution. We try both a bivariate Normal as well as a Dirichlet Process Mixture Model, and compare the results via simulation. Our approach, is the first to account for the dependencies that exist in individuals’ daily EE and ES. We employ semiparametric regression with free knot splines for measurements with error, and linear components for error free covariates. We adopt a Bayesian approach to estimation and inference and use Reversible Jump Markov Chain Monte Carlo to generate draws from the posterior distribution. Based on the semiparameteric regression, we develop a calibration equation that adjusts a cheaper, less reliable estimate, closer to the true value. Along with this calibrated value, our method also gives credible intervals to assess uncertainty. A simulation study shows our calibration helps produce a more accurate estimate. Our approach compares favorably in terms of prediction to other commonly used models.


S2 Appendix Reversible Jump MCMC.
[3] introduced RJMCMC as a means to jump between parameter spaces that have different dimensions within an MCMC algorithm. There have been a number of different approaches to Bayesian estimation of free knot splines using RJMCMC [4], [5], [6], [7]. For the most part, we adopt the approach of Denison et al. This approach performs well relative to the fully Bayesian approach of DiMatteo et al. for smooth and not highly complex functions when an appropriate adjustment is made (that was pointed out in DiMatteo et al.). We do not expect the mean function in our model to be highly complex, so the approach of Denison et al. is likely to perform well in our problem.
Consider the situation where we wish to estimate a regression spline in the following setup where the y i , i = 1, ...n are the response variable and x i , i = 1, ...n is the observed covariate. We do not want to specify the number or location of the knots, but rather estimate them from the data. Under the specifications of Denison et al., in RJMCMC for free knot splines there are three possible transitions: 1. Birth of a knot 2. Death of a knot 3. Movement of a knot, where in either of the first two transitions the dimension of the parameter space changes. The transition that is proposed depends upon the prior for the number of knots k. Using the notation of Denison et al., let the prior probability for k knots be p(k). Then the probability of attempting a birth step, death step or move step are b k , d k , η k , respectively, where where 0 ≤ c ≤ 1/2 to ensure b k + d k ≤ 1. If the birth step transition is chosen, a new knot is proposed from the set of available new knots. Denison et al. puts a discrete uniform prior on knot locations so that only observed locations can be knot locations. A proposed new knot is chosen at random from the set A={x i , i = 1, ...n : x i is not currently a knot or within + 1 locations of a current knot}. Here, is the order of the splines, so in our case =3. If the death step transition is chosen, an existing knot is picked at random and removed. If the movement transition step is chosen, a current knot is picked at random and moved to a new location at random from the set A. Once new knots are chosen, we construct a spline basis matrix using the observed locations {x i , i = 1, ...n} and the positions of the knots. We then use OLS to estimate the spline parameters β. Although we could place priors on the spline regression parameters, this adds significant computational burden and results have been similar when comparing estimation with OLS versus fully Bayesian estimation as long as functions are smooth and not highly complex [7], [4]. The acceptance probability for each step is of the form given by [3]: Denison et al. give simplified acceptance probabilities for each of the three transitions under a Poisson prior for number of knots and a discrete uniform prior for knot locations. In the regression setup in Denison et al., they then sample variance components using a Gibbs step.
The model that was fitted in [7] is simpler than our model (1) in three respects: (i) our regression model is part of a larger hierarchical model, (ii) we have an additional linear component in the mean function, (iii) the covariate value x i is a latent variable . The first issue does not present much of a problem thanks to conditional independence assumptions. For issue (ii) we propose to update the linear coefficient parameters at the same time as we update the spline coefficient parameters, using OLS. We understand that this is not a fully Bayesian approach, but we anticipate results to be similar with much less computational burden.
Issue (iii) is not as simple. In the free knot spline setup in Denison et al., they have a regression model of the form (1) where y is the observed response and x is the observed covariate. In our case we do not observe x; rather, it is a latent variable that we draw from its full conditional distribution using the Gibbs algorithm described in Section 2.5.1. Because we sample the x's via the Gibbs algorithm, the basis matrix has to be adjusted in every iteration. We calculate the basis matrix using the current value of the latent variables and the proposed knots. If we accept the proposal, then the knots, spline coefficients, and consequently the estimate of s · (X ·(k) , β (k) · ) are all updated. The challenge is what to do when we reject the candidate draw. It seems reasonable to keep the current knot locations and the current spline regression parameters, but keeping the current predictions of s · (X ·(k) , β (k) · ) does not make sense because they are based on X ·(k) , which is updated in every iteration. We propose the following protocol: if we reject the RJ candidate, we keep the current knots and spline coefficients, calculate a new basis matrix of X based on the current values of X ·(k) and the current knots, and then compute the predicted values by multiplying the basis matrix by the current spline coefficients. An additional complication is that X has to do with the birth step and the set of available knots A from which to choose . We want to pick randomly from the set of X values which we drew in the current iteration of the MCMC, but we do not want a knot to be bigger than or smaller than the maximum or minimum value of X. To avoid this problem, we exclude the three smallest and the three largest values of X from A. We do not expect this to be a major constraint because we are less concerned with flexibility of the function at its extremes and therefore do not want to allow lots of knots at the extremes.
This algorithm is run independently for EE and ∆ES mean regression functions because of the conditional independence assumptions. Because this algorithm is the same for EE and ∆ES, we omit subscripts and superscripts below. The reversible jump step that goes into our overall Gibbs algorithm is then: 1. Calculate b k and d k according to (2).
2. Select birth, death, or move step with probabilities b k , d k , 1 − b k − d k respectively.

Knot Changes
If birth step: (a) Select a new knot location at random from the set A and join with current knots r (k−1) to create the proposed knot locations r * . If death step: (b) Sample one knot location from r (k−1) at random and remove it to create the proposed knot locations r * . If move step: (c) Sample one knot location from r (k−1) at random, and change it to a new knot location at random from the set A to create the proposed knot locations r * .
4. Calculate the spline basis matrix B * (X (k) ) using X (k) and proposed knot locations r * .