Heterogeneous wealth effects of minimum unit price on purchases of alcohol: Evidence using scanner data

One of the key arguments against "sin taxes" is that they are regressive, placing a disproportionately higher cost on the poor and thereby reducing their net wealth. The behavioural response to this tax-induced reduction in net wealth can be significant: for example, an increase in alcohol purchases by heavy drinkers could reinforce or even offset the direct price and substitution effects of these taxes in reducing alcohol consumption. Comparatively little is known empirically about the net wealth effect associated with changes in alcohol tax policy, and this study aims to help fill this gap in the literature. We estimate how the wealth effects of introducing a minimum unit price (MUP) of A$2.00 per standard drink vary over the distribution (quantiles) of alcohol consumers. The data used in this study are a longitudinal panel of 1,395 households' daily alcohol purchases (scanner data) recorded over a full year. Our analysis involves (i) quantile regression to estimate income elasticity over the distribution of consumption, and (ii) using these elasticities to estimate the potential wealth effects of a hypothetical change in alcohol prices from introducing an MUP policy. We control for consumer demographic characteristics, alcohol product prices and prices of close substitutes, and quarterly seasonal effects. We find that the estimated wealth effect of increasing the price of alcohol under an MUP policy is not significant at any point over the distribution of alcohol consumers. The policy increases the per capita tax impact by less than A$5.00 per week for light/moderate consumers (50th to 80th quantiles) and decreases their daily per capita alcohol consumption by less than 0.02 standard drinks. Wealth effects attributable to an MUP policy are therefore likely to be negligible, and substitution effects of the policy dominate wealth effects in generating key health-related outcomes such as reductions in alcohol consumption.

where $k$ denotes products, $p_{kct}$ is the unit price of product $k$ in postcode $c$ in quarter $t$, and $p_{k0}$ and $q_{k0}$ are the sample median prices and quantities for $k$. This approach aims to ensure that the derived price index does not vary with systematic differences in unobserved household characteristics, which may either affect preferences for quality or influence local prices. In the demand function estimate, we also control for the prices of the closest substitutes, including regular soft drinks, diet drinks, fruit juice, and bottled water. We construct price indices for these other beverage categories following the same procedure we use for alcohol, and assume that these price indices remain constant in the counterfactual scenario.
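The fixed-base construction, with sample-median prices and quantities as weights, can be sketched numerically. The snippet below is an illustrative sketch only: the Laspeyres-type form is our reading of the description above, and all beverage categories and values are invented.

```python
def price_index(p_ct, p0, q0):
    """Fixed-base (Laspeyres-type) index for one postcode-quarter cell:
    current prices p_ct weighted by sample-median base quantities q0,
    relative to the cost of the same basket at base median prices p0."""
    numerator = sum(p_ct[k] * q0[k] for k in q0)
    denominator = sum(p0[k] * q0[k] for k in q0)
    return numerator / denominator

# hypothetical median prices (A$) and quantities for two beverage categories
p0 = {"beer": 4.50, "wine": 12.00}
q0 = {"beer": 6.0, "wine": 1.0}

assert price_index(p0, p0, q0) == 1.0                 # base period equals one
p_ct = {"beer": 4.95, "wine": 13.20}                  # +10% across the board
assert abs(price_index(p_ct, p0, q0) - 1.10) < 1e-9   # index rises by 10%
```

Because the base quantities are common across cells, variation in the index reflects price variation only, not compositional differences in what households buy.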

B Quantile Regression
We use quantile regression to identify heterogeneity in ex-ante wealth effects across the distribution of alcohol purchasing. To begin, consider a standard linear regression model that defines the conditional mean of the dependent variable $y$ as a linear function of a vector of explanatory variables $x$:
$$y_i = x_i'\beta + \varepsilon_i, \qquad E(y_i \mid x_i) = x_i'\beta, \tag{2}$$
where $\varepsilon_i$ is an error term. While this standard linear regression is useful for estimating the average (mean) effect of $x$ on $y$, it provides only a partial view of the relationship.
Quantile regression provides a more complete picture by estimating the relationship between $y$ and $x$ at different points (i.e. quantiles) of the conditional distribution of $y$. That is, it allows the effects of the independent variables to differ over the quantiles (of alcohol purchases in our case), which is of particular interest to us given the uneven distribution of alcohol purchases in the sample population. The starting point for quantile regression is the conditional quantile function (CQF). The CQF at quantile $\theta$ for a continuously distributed variable $y$, given a vector of regressors $x$, can be defined as
$$Q_\theta(y_i \mid x_i) = F_{y_i}^{-1}(\theta \mid x_i), \tag{3}$$
where $Q_\theta(y_i \mid x_i)$ denotes the $\theta$th conditional quantile of $y_i$. So, for example, to describe the median (mid-point of the distribution), we take $\theta = 0.5$. In the context of our study, where large volumes of alcohol are purchased by a relatively small share of households, the median is probably more informative than the mean. The quantile regression model of the form first introduced by [3] can be written as
$$Q_\theta(y_i \mid x_i) = x_i'\beta_\theta, \tag{4}$$
where $\beta_\theta$ is the vector of parameters and the $\theta$th conditional quantile of the error term is zero. The quantile regression estimator of $\beta_\theta$ is found by solving the problem
$$\hat{\beta}_\theta = \arg\min_\beta \sum_i \rho_\theta(y_i - x_i'\beta), \tag{5}$$
where $\rho_\theta(\lambda) = (\theta - I(\lambda < 0))\lambda$ is the check function, and $I(\cdot)$ is the usual indicator function. The special case where $\theta = 0.5$ is called the median regression estimator, or the least absolute deviations (LAD) estimator. The minimisation problem in Eq. (5) can be solved by linear programming for different quantiles of the dependent variable (see [3]), which makes estimation relatively fast [4]. Additionally, the quantile regression estimator has several important equivariance properties that are preserved under monotone transformations, which help facilitate the computation procedure. For example, if we transform a set of positive observations by taking logs, the median of the logged data is the log of the median of the untransformed data.
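To make the check function concrete, the following self-contained sketch (simulated right-skewed data, not the study's) verifies numerically that minimising the check-function loss over a constant recovers the empirical $\theta$-quantile:

```python
import numpy as np

def check_fn(lam, theta):
    """Koenker-Bassett check function: rho_theta(lambda) = (theta - 1{lambda < 0}) * lambda."""
    return (theta - (lam < 0)) * lam

rng = np.random.default_rng(42)
y = rng.exponential(scale=2.0, size=1000)   # right-skewed, like alcohol purchases

theta = 0.5
grid = np.linspace(y.min(), y.max(), 4000)
losses = np.array([check_fn(y - c, theta).sum() for c in grid])
c_star = grid[np.argmin(losses)]

# the minimiser of the check-function loss is the empirical theta-quantile
assert abs(c_star - np.quantile(y, theta)) < 0.05
```

Replacing the constant with a linear index $x_i'\beta$ gives exactly the minimisation problem in Eq. (5); with $\theta = 0.5$ the loss reduces to the absolute deviations of LAD regression.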
In our sample there is left-side zero censoring because a large share of households did not purchase alcohol in some quarters. Therefore, quantile regression of the form described in Eq. (3) is not applicable, and the specification needs to be corrected for zero censoring. One alternative is the tobit model, also referred to as the censored regression model, which can be written as
$$y_i = \max\{0, y_i^*\}, \qquad y_i^* = x_i'\beta + \varepsilon_i. \tag{6}$$
In other words, the tobit model is a standard regression model in which all values of the dependent variable that are equal to, or less than, zero take the value zero. The tobit model describes both the probability that $y_i \mid x_i = 0$ and the distribution of $y_i \mid y_i > 0$. However, despite its popularity, [5] shows that if the errors are not normally distributed and homoscedastic, then the estimated coefficients of the tobit model are inconsistent. [6] proposes an alternative to maximum likelihood estimation of the parameters of the tobit censored regression model that is not based on strict parametric assumptions. His proposed censored least absolute deviations (LAD) estimator $\hat{\beta}_n$ is a generalisation of LAD estimation for the standard linear model and, unlike estimation methods based on the assumption of normally distributed error terms, it is consistent and asymptotically normal for a wide class of error distributions, and is also robust to heteroscedasticity. The value of the estimator $\hat{\beta}_n$ can be found by solving
$$\hat{\beta}_n = \arg\min_\beta \sum_i \left| y_i - \max\{0, x_i'\beta\} \right|. \tag{7}$$
[6] later extended this median (LAD) regression, recognising that in situations where the dependent variable is heavily censored (i.e. where $y = 0$ for a large share of the observations), the censored LAD estimator $\hat{\beta}_n$ may be very imprecise, since the median of $y_i$ would be uninformative about $\beta_0$ for much of the sample. In such a situation, Powell suggests it may be preferable to centre the distribution of $y_i$ at a higher quantile than the median, because a higher quantile would more often be positive, and thus more often informative about $\beta_0$.
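Powell's objective can be illustrated on simulated censored data. In the sketch below, the data-generating process is our own invention, and scipy's general-purpose derivative-free optimiser stands in for the specialised algorithms used in practice; it is a toy illustration, not the paper's estimator.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 1.0])
# non-normal, heteroscedastic errors with conditional median zero:
# tobit MLE would be inconsistent here, while censored LAD is not
eps = rng.laplace(scale=0.5 + 0.3 * np.abs(x))
y = np.maximum(0.0, X @ beta_true + eps)   # left-censoring at zero

def clad_loss(beta):
    """Censored LAD objective: sum_i |y_i - max(0, x_i'beta)|."""
    return np.abs(y - np.maximum(0.0, X @ beta)).sum()

beta_hat = minimize(clad_loss, x0=np.zeros(2), method="Powell").x
```

The recovered `beta_hat` should lie close to `beta_true` despite roughly a third of the observations being censored at zero.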
[6] further shows that, under certain regularity conditions, the estimators generated from a censored quantile regression (CQR) model are consistent independently of the distribution of the error term, asymptotically normally distributed, and robust to outliers in the dependent variable. Below we describe the CQR model in detail.

B.1 Censored Quantile Regression
When the conditional quantile of the error term is zero, a CQR model of alcohol purchases censored at zero can be expressed as
$$Q_\theta(y_i \mid x_i) = \max\{0, x_i'\beta_\theta\}.$$
The CQR estimator of $\beta_\theta$ proposed by [6] is found by solving
$$\hat{\beta}_\theta = \arg\min_\beta \sum_i \rho_\theta\big(y_i - \max\{0, x_i'\beta\}\big), \tag{8}$$
where $\rho_\theta$ is the check function defined above, within which $I(\cdot)$ is an indicator function taking the value of unity when the expression holds, and zero otherwise. For observations where $x_i'\beta_\theta$ is equal to or less than zero (zero being the censoring point), $\max\{0, x_i'\beta_\theta\} = 0$, and Eq. (8) is minimised by using only the observations for which $x_i'\beta_\theta$ is greater than zero. While the linear specification in Eq. (4) can be estimated by linear programming, as noted above, the expression $\max\{0, x_i'\beta_\theta\}$ in Eq. (8) is not linear and has no linear programming representation. Therefore, to solve Eq. (8), we use the three-step algorithm proposed by [7] for known censoring points, which is simple, easily computable (comparable to linear least squares), well behaved, robust, and performs well near the censoring point. [7]'s estimator $\hat{\beta}_\theta$ is obtained in the following three steps. First, the censoring probabilities are estimated with a parametric classification (probability) model $\delta_i = p(X_i'\gamma) + \varepsilon_i$, where $\delta_i$ is the indicator of no censoring. Then, for each quantile regression, a sample of observations with sufficiently low censoring probabilities relative to the quantile of interest is selected (i.e. the households that did purchase alcohol), defined as $J_0 = \{i : p(X_i'\hat{\gamma}) > 1 - \theta + c\}$, where $\theta$ is the quantile and $c$ is a trimming constant between 0 and 1, set to 0.1 in our case. Following [7], we allow for misspecification of the model by excluding observations that could theoretically be used but have censoring probabilities in the highest quantiles.
Second, we obtain the initial (consistent but inefficient) estimator $\hat{\beta}_\theta^{\,0}$ using standard linear quantile regression, as in Eq. (4), on the sample $J_0$. This initial estimator is used to define a new subsample of observations, $J_1 = \{i : x_i'\hat{\beta}_\theta^{\,0} > 0\}$, consisting of all observations for which the estimated conditional quantile is above the censoring point. We exclude observations in the lowest quantiles of the distribution of the residuals.
Third, we use standard linear quantile regression, as per Eq. (4), on the sample $J_1$ defined in step two. As shown by [7], this results in a consistent and efficient estimator $\hat{\beta}_\theta$. The standard errors of the parameter estimates are obtained with the censored quantile regression bootstrapping procedure described by [8]. For all estimates we use the "cqiv" command for Stata Version 14 written by [9].

B.2 Counterfactual Analysis
To simulate a counterfactual distribution we use the modelling and inference tools developed by [10]. A complete description of the statistical properties of the counterfactual estimation methods we implement can be found in [10]. The key element of this approach is the counterfactual unconditional distribution $F_{Y\langle j|k\rangle}$, where $Y$ is the outcome of interest (alcohol purchases), and $j$ and $k$ index the reference (i.e. observed) and counterfactual populations, respectively. $F_{Y\langle j|k\rangle}$ is the distribution of alcohol purchases in population $k$ if it had the same behavioural response (i.e. alcohol purchases) to a change in exogenous characteristics (i.e. per capita household income) as population $j$. The behavioural response is modelled through the conditional distribution $F_{Y_j|X_j}$, where the index $j$ indicates that this distribution is estimated on the reference population $j$, and $X$ is a set of covariates. Let $F_{X_k}$ be the unconditional distribution of the covariates in the counterfactual population $k$; then we have
$$F_{Y\langle j|k\rangle}(y) = \int F_{Y_j|X_j}(y \mid x)\, dF_{X_k}(x). \tag{9}$$
The counterfactual distribution $F_{Y\langle j|k\rangle}(y)$ is estimated in two steps. First, the conditional quantile function is estimated using the censored quantile regression model described earlier, with $\hat{Q}_{Y_j|X_j}(\theta \mid X = x) = x'\hat{\beta}_j(\theta)$ at various quantiles $\theta \in [0, 1]$, using a sample of population $j$. Next, the conditional distribution is obtained by the relationship
$$\hat{F}_{Y_j|X_j}(y \mid x) = \int_0^1 1\{\hat{Q}_{Y_j|X_j}(\theta \mid X = x) \le y\}\, d\theta,$$
where $1\{\cdot\}$ is the indicator function.
In the second step, the counterfactual unconditional distribution is obtained by a simple plug-in rule:
$$\hat{F}_{Y\langle j|k\rangle}(y) = \frac{1}{n} \sum_{i=1}^{n} \hat{F}_{Y_j|X_j}(y \mid X_{ki}),$$
where $n$ is the size of population $k$ and $i$ indexes observations in $k$. This empirical average is the empirical counterpart of the theoretical formula in Eq. (9). Confidence intervals (CIs) can be computed by bootstrap resampling over populations $j$ and $k$ [10]. We further cluster the standard errors at the household level.
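The two-step construction can be sketched as follows. This is a toy Gaussian example in which the quantile coefficients are known in closed form rather than estimated by CQR; all values are illustrative.

```python
import numpy as np
from scipy.stats import norm

# pretend these came from step one: Q(theta | x) = beta0(theta) + beta1 * x,
# here implied by the known model  Y | x ~ N(1 + 2x, 0.5^2)
thetas = np.linspace(0.01, 0.99, 99)
beta0 = 1.0 + 0.5 * norm.ppf(thetas)   # intercept absorbs the error quantile
beta1 = 2.0

def F_counterfactual(y, x_pop):
    """Plug-in rule: average over the counterfactual covariate distribution
    of the conditional CDF, itself the integral of 1{Q(theta|x) <= y} dtheta."""
    Q = beta0[None, :] + beta1 * x_pop[:, None]   # (n, n_theta) quantile grid
    return (Q <= y).mean()

x_k = np.zeros(5)   # degenerate counterfactual population at x = 0
# with x = 0, Y ~ N(1, 0.5^2), so the CDF evaluated at y = 1 is about one half
assert abs(F_counterfactual(1.0, x_k) - 0.5) < 0.02
assert F_counterfactual(10.0, x_k) == 1.0
```

Averaging the indicator over the quantile grid approximates the integral over $\theta$, and averaging over the counterfactual covariate draws implements Eq. (9).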
Using counterfactual distribution methods we estimate several outcomes of interest. If we partition the vector of covariates $X$ into $(inc, Z)$, then our main counterfactual of interest is the 'after price policy' distribution. The model is estimated for the observed covariates $X_0 = (inc_0, Z_0)$, where $inc_0$ is the pre-policy per capita income of households and $Z$ is a vector of household characteristics, price indices for alcoholic and non-alcoholic beverages, and quarter dummies, as described earlier. The counterfactual is obtained for $X_1 = (inc_1, Z_0)$, where $inc_1$ is the after-policy per capita household income. The before/after unconditional quantile treatment effect (UQTE) can be computed at each quantile $\theta$ as
$$\widehat{UQTE}(\theta) = \hat{Q}_{Y\langle j|1\rangle}(\theta) - \hat{Q}_{Y\langle j|0\rangle}(\theta),$$
where $\hat{Q}_{Y\langle j|k\rangle}(\theta) = \min\{y : \hat{F}_{Y\langle j|k\rangle}(y) \ge \theta\}$. In practice, the after-policy population is obtained by sampling observations in the current (pre-policy) population, calculating the after-policy tax burden, and setting the observed income $inc_0$ to the value it would take when the tax burden from the MUP policy is imposed, as described below. Following this, we can also estimate other policy outcomes of interest, including the predicted change in the volume of alcohol (standard drinks) purchased per capita, per quarter (or per day).
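Given two estimated unconditional distributions, the UQTE is obtained by inverting each CDF on a grid. A minimal sketch with made-up pre- and after-policy CDFs (a pure location shift, so the true quantile treatment effect is known to be 0.5 at every quantile):

```python
import numpy as np

def quantile_from_cdf(y_grid, F_vals, theta):
    """Q(theta) = min{ y : F(y) >= theta }, evaluated on a grid."""
    return y_grid[np.searchsorted(F_vals, theta, side="left")]

y_grid = np.linspace(0.0, 20.0, 2001)
F0 = 1.0 - np.exp(-y_grid / 2.0)                           # pre-policy CDF
F1 = 1.0 - np.exp(-np.maximum(y_grid - 0.5, 0.0) / 2.0)    # shifted right by 0.5

theta = 0.5
uqte = quantile_from_cdf(y_grid, F1, theta) - quantile_from_cdf(y_grid, F0, theta)
assert abs(uqte - 0.5) < 0.02   # recovers the known location shift
```

In the study the two CDFs are the plug-in estimates under $X_0$ and $X_1$, and the bootstrap over households delivers confidence intervals for the UQTE at each $\theta$.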
For all estimates we use the "counterfactual" command available for Stata Version 14 written by [9]. We slightly modified this command to draw bootstrap samples of households, since we work with a panel, and to cluster the standard errors at the household level.

B.3 Calculating tax burden and adjusted incomes
For Eq. (14), we calculate the value of adjusted per capita household income $inc_1$ at the after-policy stage by adapting aspects of the mathematical framework of the Sheffield Alcohol Policy Model (SAPM) version 2.0 (see [11]), including the SAPM method of calculating the expected changes to product prices under a MUP policy and the subsequent change in alcohol purchase costs (i.e. tax burden) for each household. The first step is to calculate the current product price (A$) per standard drink (12.67 mL of alcohol), as well as the new price under a MUP policy. Using the detailed information in our dataset, we can accurately calculate pre-policy values of product prices and household spending from the individual details of each product $u$ purchased at each household's separate shopping transactions $l$, including the alcoholic beverage category $k$ (9 categories), container size in litres $s$, quantity $q$ of containers purchased in the transaction, the alcohol by volume (ABV) content of each product $a_u$ (expressed as a proportion), and the price paid at the time of purchase $p_l$. With this information, we calculate the current price paid (A$) per standard drink $d$ for each individual product transaction, denoted $price_{0uld}$ and expressed as
$$price_{0uld} = \frac{p_l}{q \cdot s \cdot a_u \cdot 1000 / 12.67}. \tag{15}$$
To calculate $price_{1uld}$, the after-policy price per standard drink that each product will take under the simulated A$2.00 MUP policy, we inflate the value of $price_{0uld}$ up to A$2.00 if it is less than A$2.00; otherwise the price is left unchanged. We derive price indices from $price_{1uld}$ (ex post MUP scenario) for inclusion in our counterfactual analysis model, as described above. The second step is to calculate the additional alcohol purchase costs per capita (i.e. tax burden) for each household at the new reduced quantity, post A$2.00 MUP.
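The per-standard-drink calculation and the A$2.00 floor can be written out directly. The product values below are hypothetical, and ABV is entered as a proportion rather than a percentage:

```python
STD_DRINK_ML = 12.67   # mL of pure alcohol per Australian standard drink
MUP = 2.00             # A$ floor per standard drink

def price_per_std_drink(price_paid, n_containers, size_litres, abv):
    """Pre-policy price_0uld: A$ paid per standard drink in one transaction.
    abv is the alcohol-by-volume content expressed as a proportion."""
    std_drinks = n_containers * size_litres * 1000.0 * abv / STD_DRINK_ML
    return price_paid / std_drinks

def mup_price(p0):
    """After-policy price_1uld: inflated to the floor if below it."""
    return max(p0, MUP)

# e.g. a six-pack of 375 mL beer at 4.8% ABV bought for A$15
p0 = price_per_std_drink(15.0, 6, 0.375, 0.048)
assert round(p0, 2) == 1.76     # below the A$2.00 floor...
assert mup_price(p0) == 2.00    # ...so it is inflated to A$2.00
assert mup_price(2.50) == 2.50  # prices above the floor are unchanged
```

Only products sold below A$2.00 per standard drink are affected, which is why the policy bites hardest on cheap, high-alcohol products.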
We denote $exp_{0pli}$ and $exp_{1pli}$ to be the pre-policy and after-policy purchase costs, respectively, which represent the summed pre-policy $price_{0uld}$ (as defined in Eq. (15)) and the summed after-policy $price_{1uld}$ over all shopping transactions $l$, for ex post consumption by each individual $i$ aged greater than eleven years within each household $j$, of alcoholic beverage category $k$. To calculate the additional annual per capita tax burden $T_j$ under a MUP policy for each household $j$, we subtract, at a given quantity, the pre-policy purchase costs from the after-policy purchase costs:
$$T_j = \sum \big( exp_{1pli} - exp_{0pli} \big),$$
where the sum runs over beverage categories, transactions, and individuals in household $j$, expressed per capita. The third and final step is to calculate adjusted incomes $inc_1$ for households at the 'after price policy' stage. For this, we simply subtract the annual per capita tax burden $T_j$ from each household's pre-policy per capita income $inc_0$.
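The last two steps reduce to simple per-household arithmetic. In this sketch the annual spending totals are hypothetical, and the per-capita burden is formed by dividing the household's change in purchase costs by its members aged twelve and over, an equivalent formulation of the per-capita accounting described above:

```python
def tax_burden_per_capita(exp_after, exp_before, n_members_12plus):
    """Annual per capita MUP tax burden T_j: the change in alcohol purchase
    costs divided over household members aged twelve and over."""
    return (exp_after - exp_before) / n_members_12plus

def adjusted_income(inc_pre, t_j):
    """After-policy per capita income: pre-policy income less the burden."""
    return inc_pre - t_j

# a household of three (aged 12+) whose annual alcohol spend would rise
# from A$1,100 to A$1,250 under the A$2.00 MUP
t_j = tax_burden_per_capita(1250.0, 1100.0, 3)
assert t_j == 50.0
assert adjusted_income(30000.0, t_j) == 29950.0
```

The adjusted incomes $inc_1$ produced this way are what enter the counterfactual covariate vector $X_1 = (inc_1, Z_0)$.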