Predictive validation and forecasts of short-term changes in healthcare expenditure associated with changes in smoking behavior in the United States

Objectives: Out-of-sample forecasts are used to evaluate the predictive adequacy of a previously published national model of the relationship between smoking behavior and real per capita health care expenditure using state level aggregate data. In the previously published analysis, the elasticities of healthcare expenditure with respect to state adult current smoking prevalence and mean cigarette consumption per adult current smoker were 0.118 and 0.108, respectively. This new analysis provides evidence that the model forecasts well out-of-sample. Methods: Out-of-sample predictive performance was used to find the best specification of trend variables and the best model to bridge a break in survey data used in the analysis. Monte-Carlo simulation was used to calculate forecast intervals for the effect of changes in smoking behavior on expected real per capita healthcare expenditures. Results: The model specification produced good out-of-sample forecasts and stable recursive regression parameter estimates spanning the break in survey methodology. In 2014, a 1% relative reduction in adult current smoking prevalence and in mean cigarette consumption per adult current smoker decreased real per capita healthcare expenditure by 0.104% and 0.113% the following year, respectively (elasticity). A permanent relative reduction of 5% in both measures reduces expected real per capita healthcare expenditure by $99 (95% CI $44, $154) in the next year, or $31.5 billion for the entire US (in 2014 dollars), holding other factors constant. The reductions accumulate linearly for at least five years following annual permanent decreases of 5% each year. Given the limitations of time series modelling in a relatively short time series, the effect of changes in smoking behavior may occur over several years, even though the model contains only one lag for the explanatory variables. Conclusion: Reductions in smoking produce substantial savings in real per capita healthcare expenditure in the short to medium term.
A 5% relative drop in smoking prevalence (about a 0.87% reduction in absolute prevalence) combined with a 5% drop in consumption per remaining smoker (about 16 packs/year) would be followed by a $31.5 billion reduction in healthcare expenditure (in 2014 dollars).
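The headline figures above imply a few baseline quantities that can be checked with simple arithmetic. The sketch below is only a back-of-envelope consistency check: the inputs are the numbers quoted in the text, while the derived baselines (prevalence, packs per smoker, population) are implied values and the function name is hypothetical, not something reported by the authors.

```python
def implied_baselines(relative_cut, abs_prevalence_drop_pct, packs_drop,
                      saving_per_capita, saving_national):
    """Back out the baselines implied by the quoted headline figures."""
    # A 5% relative cut equal to 0.87 percentage points implies the baseline prevalence.
    prevalence_pct = abs_prevalence_drop_pct / relative_cut
    # A 5% relative cut equal to 16 packs/year implies baseline packs per smoker.
    packs_per_smoker = packs_drop / relative_cut
    # National saving divided by per capita saving implies the population used.
    population = saving_national / saving_per_capita
    return prevalence_pct, packs_per_smoker, population


if __name__ == "__main__":
    prevalence, packs, population = implied_baselines(
        relative_cut=0.05,
        abs_prevalence_drop_pct=0.87,
        packs_drop=16,
        saving_per_capita=99.0,
        saving_national=31.5e9,
    )
    print(f"implied baseline prevalence: {prevalence:.1f}%")
    print(f"implied packs per smoker per year: {packs:.0f}")
    print(f"implied population: {population / 1e6:.1f} million")
```

Under these inputs the implied baselines are roughly 17.4% adult smoking prevalence, 320 packs per smoker per year, and a population of about 318 million, so the per capita and national figures appear mutually consistent.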

where the term with subscripts i ∈ r and t − 1 is the real cigarette tax in state i in region r in year t, in 2010 dollars per pack, and zero for a state i that is not in region r; the remaining symbols denote the regression slope parameters, the regression error term, and the independently and identically distributed error terms for equation j, with j = 1 (main equation) and j = 2 (regional cigarette tax adjustment equation for mean consumption measurement error).

Specification of the Model for Main Results in Previous Research
See previously published research [1] for the derivation of the model equations.

Root Mean Square Error (RMSE) and Root Mean Square Forecast Error (RMSFE)
where $\ln(h_{i,t})$ is the logarithm of observed real per capita healthcare expenditure in state $i$ and year $t$, and $\ln(\hat{h}_{i,t})$ is the in-sample prediction for RMSE or the one-step-ahead out-of-sample forecast for RMSFE; $I$ is the number of cross-sectional units (fifty states and DC), and $T$ is the number of years in the estimation period (RMSE).
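The formula that these definitions accompany does not appear above. A form consistent with them, assuming the usual pooled root mean square over states and years and writing $T$ for the number of years (both are assumptions, not quoted from the source), would be:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{I\,T}\sum_{i=1}^{I}\sum_{t=1}^{T}
  \left[\ln(h_{i,t}) - \ln(\hat{h}_{i,t})\right]^{2}}
```

On this reading, RMSFE would use the same expression with $\ln(\hat{h}_{i,t})$ taken as the one-step-ahead out-of-sample forecast and the inner sum running over the forecast years rather than the estimation years.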

STATA ROUTINES USED FOR ESTIMATION
Several Stata add-ons were used, in addition to standard Stata commands, to estimate the results and the sensitivity analysis. The Stata add-on 'xtivreg2' was used for the 2SLS and GMM instrumental variables estimates; it includes features to examine the first stage estimates and detailed diagnostic tests of instrument validity. The Stata routine 'cointreg' was used for the Dynamic Ordinary Least Squares (DOLS) estimates.

MODEL SPECIFICATION, LAG ORDER, MEASUREMENT ERROR AND CAUSAL MODEL
There is strong evidence that per capita health care expenditure is nonstationary and contains an autoregressive unit root [2]. The regression residuals are stationary, which indicates that there is a cointegrating relationship between the dependent and explanatory variables [2]. A full analysis of the cointegrating regression and short run adjustments to long run disequilibrium was not possible given the number of variables that are potentially involved and the relatively short time series for the panels, in addition to the complicating factor of measurement error in one of the explanatory variables. Therefore, we chose to estimate an autoregression (a reduced form regression with lagged explanatory variables) rather than an error correction model or the Engle-Granger two-step estimation method. The estimated autoregression combines long run and short run effects, so both stationary and nonstationary measurement error are of concern [3].
In a regression with stationary variables, very long lag lengths for smoking behavior would be necessary for unbiased estimation of a regression that is consistent with a causal model of the effect of past smoking behavior on current health care expenditure, but including all of those lags would be infeasible with aggregate data in such a short time series. However, in a cointegrated system the long run relationship is represented by a static regression [3].
With cointegration, a model with a relatively short lag order can represent both a very long run process and a short run process [3]. The fact that conventional lag order selection criteria choose a model with one lag suggests that there is a rapid adjustment process following disequilibrium in the long run cointegrating relationship (that is, a large error term in the lagged cointegrating relationship), and that the separate short-run adjustment process takes place over a short time horizon. Standard methods for determining the order of the autoregression indicate that a lag of one period is sufficient. A sensitivity analysis done for the 2016 publication gave evidence that two lags may be necessary for a few BEA regions; however, including them in the regression does not substantially change the results [2].
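The claim that a static regression captures the long run relationship in a cointegrated system can be illustrated with a small simulation. The sketch below uses purely synthetic data (a random walk and a series tied to it by a hypothetical slope of 2), not the paper's data or estimator; all function names are illustrative.

```python
import random


def ols_slope_and_residuals(x, y):
    """Static OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    residuals = [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]
    return slope, residuals


def ar1_coefficient(series):
    """Slope from regressing series[t] on series[t-1] without an intercept."""
    numerator = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    denominator = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return numerator / denominator


def simulate_cointegration(n=2000, beta=2.0, seed=0):
    rng = random.Random(seed)
    x, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0, 1)          # unit root: shocks never die out
        x.append(level)
    # y shares the stochastic trend of x, plus stationary noise.
    y = [beta * value + rng.gauss(0, 1) for value in x]
    slope, residuals = ols_slope_and_residuals(x, y)
    return slope, ar1_coefficient(x), ar1_coefficient(residuals)


if __name__ == "__main__":
    slope, rho_x, rho_resid = simulate_cointegration()
    print(f"static regression slope: {slope:.3f}")
    print(f"AR(1) coefficient of x: {rho_x:.3f}")
    print(f"AR(1) coefficient of residuals: {rho_resid:.3f}")
```

In this setup the explanatory series has an autoregressive coefficient close to one, the static regression recovers the long run slope closely without any lags, and the residuals show little persistence, which is the pattern the text describes as evidence of cointegration.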
Measurement error arising from untaxed consumption due to tax differentials between states has long been recognized as a problem in the analysis of cigarette consumption [4]. The reason for the instrumental variables estimates was to account for measurement error in state cigarette consumption per current smoker due to untaxed interstate consumption, not simultaneous equation endogeneity between real per capita health care expenditure and cigarette consumption per smoker, or omitted variables. We are unaware of any theory or empirical evidence that per capita health care expenditure has any significant feedback effect on cigarette consumption per current smoker.
There is some evidence that real cigarette taxes and interstate tax differentials are nonstationary over the sample period: they rise rapidly at irregular intervals and persist at that higher level or decay very slowly. For mean consumption, Fisher's panel unit root tests failed to reject the null hypothesis of a unit root at the 5 percent significance level for all panels. Endogeneity bias due to measurement error comes from an interaction of the observed variable (cigarette consumption per smoker, measured with error) and the measurement error itself.
Cigarette consumption per current smoker is nonstationary. Measurement error is not a serious problem for testing for unit roots. In the presence of measurement error, the size of standard unit root tests is too large when standard procedures are used to select the lag length, but this can be solved by using the Bayesian Information Criterion (BIC) to determine the appropriate lag length [5]. Also, the results of the unit root tests of the level or first difference of cigarette consumption per current smoker are not affected by the choice of lag length.
The cointegrating regression itself is robust to stationary and near-unit root 'mildly' nonstationary measurement error, and the standard estimators, including ordinary least squares, can be used in the presence of these sorts of measurement error [6,7]. It is impossible to distinguish unit root from near unit root time series in the relatively short time series used in this analysis, but we believe that cigarette taxes are probably near unit root based on borderline non-rejection of the null of a unit root using standard panel unit root tests, and visual examination of the cumulative periodograms. These sorts of measurement error do produce bias in the stationary short run adjustment process, but this problem can be handled by a wide variety of estimators, including 2SLS and GMM instrumental variables estimators, that produce unbiased estimates of the short run processes [6,7]. But regardless of whether the measurement error process is stationary or nonstationary, this measurement error will produce biased coefficient estimates for the stationary short run processes. Therefore, the lag order for an adequate specification of the autoregression can be relatively short for the reasons given above. Also, untaxed inter-state consumption responds within a few years of the appearance of a tax differential, so the lag order for valid instruments for mean consumption is relatively short. We conducted a sensitivity analysis using different sets of instruments, reported in the following section.

SENSITIVITY ANALYSIS ON CHOICE OF INSTRUMENTAL VARIABLES
For the estimates reported in the main text, the instruments were: (1) prevalence of smoking lagged two and three periods, and (2) mean consumption lagged three periods. The standard test results for these instruments indicate that the regression equation is identified, the instruments are not 'weak' and produce little weak instrumental variable bias, and the joint null of the over-identifying restrictions needed for valid instruments (Hansen J test) is not rejected, with p-values varying from 0.340 to 0.710 for sample periods from 1992-2006 up to 1992-2010.
A sensitivity analysis was conducted of 2SLS estimates using different instruments with different lag orders. The analysis used the sample period 1992 to 2010 to avoid any artifacts possibly introduced by modelling the break in BRFSS methodology in 2011. The sensitivity analysis also included two additional estimators for comparison. One was the fixed effects panel regression with no instruments. The second was an estimator developed for cointegrating regression that is robust to either stationary or mildly nonstationary measurement error in the explanatory variables and is suitable for samples with a relatively short time dimension: Dynamic Ordinary Least Squares (DOLS) [6-8]. The DOLS estimator adds several leads and lags of the first difference of the dependent and explanatory variables to the regression. The DOLS estimator is a single equation estimator originally designed for a single time series, but it can be adapted to estimating the slope coefficients of a fixed effects model by taking the within transformation of the explanatory and dependent variables.
The results of the sensitivity analysis (Table S2) support use of 2SLS with instrumental variables as the best estimator, with appropriately lagged instruments that can be used over the whole sample, before and after the break in BRFSS methodology. The fixed effects panel estimates with no instruments produced the smallest estimates. The DOLS estimates are closer to the 2SLS estimates than they are to the fixed effects panel estimates with no instruments. Results on the effectiveness of the DOLS estimator are asymptotic and assume a potentially large number of leads and lags for adequate adjustment. Only 1 to 3 leads and lags are feasible for a sample with only 19 observations along the time dimension from 1992 to 2010, and 3 lags reduces the sample size by one third. So, DOLS may provide an imperfect adjustment for measurement error in a finite sample with a relatively short time dimension.
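The cost of leads and lags in a short panel can be made concrete with a simple count. The sketch below assumes one particular accounting, which is not stated in the source: taking first differences costs one year, each lag drops one year at the start of the sample, and each lead drops one year at the end. The function name is hypothetical.

```python
def dols_usable_years(n_years, n_leads, n_lags):
    """Years left per panel after differencing and adding leads and lags.

    Assumed accounting: the first difference loses one year, each lag of the
    differenced variables loses one more at the start, each lead one at the end.
    """
    usable = n_years - 1 - n_leads - n_lags
    return max(usable, 0)


if __name__ == "__main__":
    total_years = 19  # 1992 to 2010, as stated in the text
    for order in (1, 2, 3):
        usable = dols_usable_years(total_years, order, order)
        lost_share = (total_years - usable) / total_years
        print(f"{order} leads and lags: {usable} usable years, "
              f"{lost_share:.0%} of the sample lost")
```

Under this accounting, three leads and three lags leave 12 of the 19 years, a loss of roughly one third, which is in line with the figure quoted in the text.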
The 2SLS estimates using 2 lags of mean consumption per smoker as an instrument are a little closer to those published in the main text. The null of the Hansen J test for over-identifying restrictions is rejected at the 5 percent significance level for the instruments that included mean consumption at 2 lags, so there is reason to believe it is not a valid instrument.
Measurement error biases the regression coefficient of the variable measured with error towards zero, at least in the two variable regression model [9]. While the analysis is more complex in a multivariable regression model, in this case the two variable regression result seems to hold. We believe that three lags for the instruments for cigarette consumption per current smoker in the 2SLS estimates are enough to avoid noticeable bias in this case.
Including cigarette consumption lagged three periods as an instrument seemed to produce the most efficient estimates with valid instruments, so that specification was chosen for the results in the main text.
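The attenuation effect and the role of a lagged instrument can be illustrated with synthetic data. The sketch below is a two variable simulation under stated assumptions that are not taken from the paper: the true regressor follows an AR(1) process with coefficient 0.8, the measurement error is serially uncorrelated, and the true slope is 1. It is not the authors' 2SLS specification, and all names are hypothetical.

```python
import random


def covariance(a, b):
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n


def simulate_attenuation(n=20000, beta=1.0, rho=0.8, lag=3, seed=1):
    rng = random.Random(seed)
    true_x, level = [], 0.0
    for _ in range(n):
        level = rho * level + rng.gauss(0, 1)   # persistent true regressor
        true_x.append(level)
    # Observed regressor = truth + serially uncorrelated measurement error.
    observed_x = [value + rng.gauss(0, 1) for value in true_x]
    y = [beta * value + rng.gauss(0, 1) for value in true_x]

    # OLS on the mismeasured regressor: biased towards zero.
    ols = covariance(observed_x, y) / covariance(observed_x, observed_x)

    # IV using the observed regressor lagged `lag` periods as the instrument.
    instrument = observed_x[:-lag]
    x_now = observed_x[lag:]
    y_now = y[lag:]
    iv = covariance(instrument, y_now) / covariance(instrument, x_now)
    return ols, iv


if __name__ == "__main__":
    ols, iv = simulate_attenuation()
    print(f"OLS slope with measurement error: {ols:.3f}")
    print(f"IV slope with lagged instrument:  {iv:.3f}")
```

In this setup the OLS slope falls clearly below the true value of 1, while the lagged-instrument estimate is close to it. The instrument works here only because the simulated measurement error is uncorrelated across periods; if the error were persistent, a short lag would no longer be a valid instrument.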

SENSITIVITY ANALYSIS ON CHOICE OF ESTIMATOR
The estimates presented in the main text used 2SLS estimation rather than the Generalized Method of Moments (GMM) because GMM estimation failed to consistently produce a reliable variance-covariance matrix of full rank after adjustment for the break in BRFSS methodology in 2011 with cluster robust estimates. This failure was probably due to the increase in the number of parameters needed to introduce state specific intercepts to model the break in BRFSS methodology in 2011. We wanted to use the same estimator before and after the break in BRFSS methodology in 2011.
A sensitivity analysis comparing 2SLS and GMM instrumental variables estimates is presented in Table S2. This sensitivity analysis was done using the sample from 1992 to 2010, again to avoid artifacts from modelling the break in BRFSS methodology in 2011. GMM estimators with variance estimates that are robust to autocorrelation for up to three lags were included in the sensitivity analysis. We chose 3 lags as the maximum lag length using a generous interpretation of the common practice of using T^(1/3) or T^(1/4) for the maximum lag length, where T is the number of annual observations in the sample [3,10].
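The lag length rule of thumb can be evaluated directly. The sketch below takes T = 19 from the 1992 to 2010 sample described earlier; rounding up to the next integer is one way to read the "generous interpretation" mentioned in the text and is an assumption here, as is the function name.

```python
import math


def max_lag_candidates(n_obs):
    """Return T^(1/3) and T^(1/4) for a sample of n_obs annual observations."""
    cube_root = n_obs ** (1 / 3)
    fourth_root = n_obs ** 0.25
    return cube_root, fourth_root


if __name__ == "__main__":
    cube_root, fourth_root = max_lag_candidates(19)
    print(f"T^(1/3) = {cube_root:.2f}, rounded up: {math.ceil(cube_root)}")
    print(f"T^(1/4) = {fourth_root:.2f}, rounded up: {math.ceil(fourth_root)}")
```

For T = 19 the two rules give about 2.67 and 2.09, so rounding either one up yields a maximum of 3 lags.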
The GMM estimates are very similar to the 2SLS estimates, with and without robust variance estimates adjusted for autocorrelation of up to 3 lags in the residuals. The GMM estimates are not statistically different from those presented in the manuscript. The estimates adjusted for autocorrelation and spatial correlation simultaneously are somewhat lower, but still not statistically different from the results in the paper. Table S2 also includes sensitivity analysis regression estimates from the previous publication that adjust for autocorrelation and cross sectional dependence simultaneously [1]. These estimates used the 'xtscc' Stata add-on, which does not permit instrumental variables estimation, and we believe these estimates are lower due to measurement error.
The coefficient estimates in Table S2 below are not different from those in the manuscript at the 5% significance level (p-values range from 0.16 to 0.67).

Figure S4. The effect of an annual 5% reduction in prevalence of smoking and 5% reduction in mean cigarette consumption. Solid line is the point forecast of the mean effect; dashed lines are the 95% confidence interval for the mean effect. Vertical axis: savings ($ per person per year); horizontal axis: year after change.