The distributional impact of a green payment policy for organic fruit

Consumer spending on organic food products has grown rapidly. Some claim that organics have ecological, equity, and health advantages over conventional food and therefore should be subsidized. Here we explore the distributive impacts of an organic fruit subsidy that reduces the retail price of organic fruit in the US by 10 percent. We estimate the impact of the subsidy on organic fruit demand in a representative poor, middle income, and rich US household using three analytical methods: two econometric and one machine learning. We do not find strong evidence of regressive redistribution due to our simulated organic fruit subsidy; the poor household's relative reaction to the subsidy is not much different from the reactions of the other two households. However, the infra-marginal savings from the subsidy tend to be larger in richer households.


Introduction
One could argue that organic food production is a Clean Technology (CT) that deserves subsidization. CT produces a good with fewer environmental and human health impacts than a similar good produced by a more mature but 'dirtier' technology [1]. However, the goods produced by a CT have higher average production costs than the very similar goods produced by the 'dirtier' technology [1]. Despite incentivizing industry to incur higher-than-necessary production costs, subsidization of CT can make society better off if its use incidentally generates benefits that are greater than additional costs (e.g., [2]). Not only can subsidization of a CT generate immediate social welfare if the total subsidy is less than the external benefit created by use of the CT, but subsidization of the CT's use accelerates the decline in the clean technology's average cost of production via learning-by-doing (e.g., [3][4]) and the creation of economies of scale (e.g., [5]), promising even more welfare in the future.
Like other CT, organic agriculture produces goods similar to goods produced by 'dirtier' technology (conventional or "industrial" agriculture), but operates at a higher average cost ([6][7][8][9], S1 Supporting information) and, in many cases, generates greater uncertainty in producer returns [7][10][11][12][13]. However, like the impact of adopting other CT, organic systems can generate several incidental social, human health, and environmental benefits relative to production and consumption with the conventional technology. For example, studies have found that landscapes with more organic farms also have more equitable and vibrant local economies [14]. Organic farms in the US tend to be smaller and less capitalized, and therefore are a countervailing force against the overall trend of farm consolidation across the rural US. In 2014, 12,595 farms that covered 3,642,933 acres were certified organic, meaning an average farm size of 289 acres. In contrast, there were 2,085,000 farms of all types in the US in 2014, with an average size of 438 acres [15]. Further, the limited use of pesticides in organic farming has been found to significantly reduce illnesses and injuries among agricultural workers relative to conventional farming practices [16].
Some of the other human health benefits attributed to organic consumption and organic production may or may not exist. For example, the organic process is perceived by many to produce food that is more nutritious and less likely to expose consumers to higher levels of dangerous chemicals relative to conventionally produced analogs. Whether these claims are true or not is hotly contested [17][18][19][20][21][22][23][24][25][26]. Regardless of the actual gains to private health from consuming organic foods, the perception of such gains means that organic food persists as a stated preference for many households [16][27][28]. However, organic food price premiums mean these investments in (perceived) private health are less available to households with limited food budgets [29]. Therefore, subsidizing the production or consumption of organic food would, theoretically, give poorer households greater capacity to access (perceived) health benefits.
Likewise, a positive environmental impact from organic production may be more perceived than real. On the positive side of the environmental impact ledger, organic production tends to have lower irrigation requirements, leads to less soil carbon loss [30], uses less energy [31][32], and supports more biodiversity [33][34] than conventional production. On the negative side of the environmental impact ledger, organic production's reliance on manure for nitrogen instead of synthetic fertilizers means lower yields [32] and lower plant uptake of nitrogen relative to conventional farming (synthetic fertilizers are designed to make their nitrogen available when needed by the plant but manure is not; see [35]). Therefore, as demand for organics increases, landscapes dominated by organic production experience 1) greater rates of land conversion, and therefore greater losses in the ecosystem services produced by less-intensively used land, to compensate for lower organic yields and 2) greater eutrophication and acidification potential relative to landscapes dominated by conventional production [32]. A recent review of the organic production literature found that organic production's indirect land use impact is so severe that it turns a process that is less polluting than conventional farming on a per unit of land basis into a more polluting process on a per unit of output basis [36]. Therefore, while popular opinion holds that organic production is better for the environment than conventional production [37][38], whether the adoption of organic farming generates net environmental benefit is an open question.
The real and perceived external benefits created by organic food production and consumption have led the European Union to subsidize organic food production. Dimitri and Oberholtzer [39] stress the economic efficiency gains that European policy-makers believe the policy induces: "European governments support organic agriculture through green payments. . .for converting to and continuing organic farming. The economic rationale for these subsidies is that organic production provides benefits that accrue to society and that farmers lack incentives to consider social benefits when making production decisions. In such cases, payments can more closely align each farmer's private costs and benefits with societal costs and benefits." While previous US farm bills have provided some one-time subsidies to US farmers switching from conventional to organic farming [11], this external support has been small relative to European support for organic production. Therefore, the American Public Health Association, among others, has advocated adding a European-style green payment policy to the US farm bill [40] to better align US farmers' private costs and benefits with societal costs and (perceived) benefits.
While the drive for an organic subsidy in the US seems to be largely motivated by economic efficiency concerns, in this paper we consider how the subsidy's consumer benefits would be distributed across the US household income spectrum, about which there is little current evidence. CT subsidization policies tend to regressively re-distribute consumer welfare [41][42][43][44][45]. Therefore, assessing whether organic production subsidization would regressively redistribute US consumer welfare is highly relevant to organic food policy discussions, especially those that concern subsidization. Richer households could benefit more from organic food subsidization if poorer households have less access to organic produce [46], if poorer households have relatively weaker demand for organic food [47], if poorer households feel they cannot risk their limited food budget on perishable goods despite a price reduction [48], if poorer households are less aware of shrinking organic price premiums caused by subsidization [29], or if the subsidy is paid for by taxes that disproportionally affect the poor.
Here we explore the distributive impacts of an organic food subsidy (whether a production or a consumption subsidy) that reduces the retail prices of organic fruit in the US by 10 percent from their 2011-2013 levels. First, we estimate the demand for organic fruit across US households stratified by annual income. Second, we use these estimated models to predict the impact of the subsidy on representative households' 1) inframarginal gains from organic fruit consumption and 2) change in organic fruit consumption. If the relatively affluent experience greater inframarginal gains and react more strongly to the lower organic food prices created by subsidization than less well-off households, then the subsidy program would redistribute even more welfare to the affluent. We focus on organic fruit because 1) it was widely available in all American markets in 2011-2013, our analytical timeframe [49], and 2) organic produce, especially fruit, is the most popular organic food type [50].
A priori we suspected a broad subsidy of US organic fruit would primarily re-distribute surplus to wealthier US households. Previous research has found that urban households with more educated, older and married heads of house with at least one child at home, and higher incomes are more likely to buy organic produce, all else equal [27][28][51][52][53][54][55][56]. Therefore, if broad organic fruit subsidization were to reinforce these trends then subsidy-related costs could be incurred by many taxpayers to primarily improve the welfare of more educated and wealthier US households. Alternative impacts include organic fruit subsidies generating an equal bump in organic fruit consumption across US household types or, if the marginal utility from organic food consumption declines quickly, generating a relatively larger bump in organic fruit consumption for those with more modest means.

Data
Our study is based on a sample of US household organic and conventional fruit purchases from 2011 through 2013. The data comes from the Nielsen Corporation's Consumer Panel Data. Each year Nielsen recruits approximately 60,000 US households to record each purchase, food-related or not, they make over the course of the year. Sampled household purchase data, as well as the related household demographic information, can be projected to market, regional, and national levels using projection factors assigned to each household [57][58].
Using this dataset we generate the following data for each sampled household by month (household-month) from 2011 through 2013: 1) the ounces of type × variety fruit i that household k bought in month m, given by \(o_{ikm}\), where type is either organic or conventional and variety refers to apple, orange, strawberry, etc.; 2) household k's expenditure on type × variety fruit i in month m, given by \(e_{ikm}\); 3) the average price of type × variety fruit i that household k faced in month m, given by \(p_{ikm}\) (in some cases we had to impute price; see S2 Supporting information); 4) several vectors of household k's characteristics in month m, given by \(X_{km}\) and \(C_{km}\); and 5) household k's status as a low (130% or less of their federal poverty line (FPL)), middle (130% to 500% of their FPL), or high (greater than 500% of their FPL) income household in month m (S3 Supporting information). The FPL varies by year and family size. For example, in 2011 the FPL was $22,350 for a family of four and $26,170 for a family of five. However, in 2013 the FPL was $23,550 for a family of four and $27,570 for a family of five. We coded households at 130% of their FPL or lower as low income because they are eligible for the Supplemental Nutrition Assistance Program (https://www.cbpp.org/research/a-quick-guideto-snap-eligibility-and-benefits). We coded households at 500% or greater of their FPL as high income because many federal and nonprofit assistance programs (e.g., AIDS Drug Assistance Program, the Leukemia and Lymphoma Society's financial assistance programs) are not available to households in this category while they are available to all other households.
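The income-class coding described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `income_class` and the lookup table are hypothetical, the table holds only the four FPL values quoted in the text, and a household at exactly 500% of its FPL is assigned to the middle class here (the text describes the upper cutoff both as "greater than 500%" and as "500% or greater").

```python
# Federal poverty line (FPL) values quoted in the text, keyed by
# (year, household size). Other years and sizes would have to be added
# from official tables; they are not given in the paper excerpt.
FPL = {
    (2011, 4): 22_350,
    (2011, 5): 26_170,
    (2013, 4): 23_550,
    (2013, 5): 27_570,
}


def income_class(annual_income, year, household_size):
    """Return 'low', 'middle', or 'high' based on income relative to the FPL."""
    ratio = annual_income / FPL[(year, household_size)]
    if ratio <= 1.30:   # 130% of the FPL or less
        return "low"
    if ratio <= 5.00:   # assumption: exactly 500% is treated as middle income
        return "middle"
    return "high"       # more than 500% of the FPL


if __name__ == "__main__":
    print(income_class(28_000, 2011, 4))    # roughly 125% of the 2011 FPL
    print(income_class(60_000, 2013, 4))    # roughly 255% of the 2013 FPL
    print(income_class(140_000, 2013, 5))   # roughly 508% of the 2013 FPL
```

Because the FPL changes by year and family size, the same nominal income can map to different classes across household-months.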
The vector \(X_{km}\) contains the household variables that previous research has flagged as affecting propensity to buy and overall consumption of organic produce, including the household's monthly real income (December, 2013 $), household size, whether or not the household contains one or more children under 18, whether at least one head of household has a college degree, whether the household is headed by a married couple, and the race of the head of household. The vector \(C_{km}\) indicates m's season × year (e.g., winter 2011, spring 2011, etc.), whether or not household k lives in a metro or non-metro county in month m, and which Nielsen Scantrack market household k lives in during month m. For estimation purposes, we also define \(c_{km} \subset C_{km}\) where \(c_{km}\) only includes the season × year interaction dummy variables. All values in \(X_{km}\) and \(C_{km}\) stay fixed within a calendar year. See Table 1 for a summary of some of these data across sampled households.
We include the market and metro classification variables in \(c_{km}\) and \(C_{km}\) to control for coarse differences in organic fruit availability across the US landscape [59]. We do not have the data to control for organic fruit availability at a fine scale. For example, grocery stores in higher income areas have been shown to offer healthier items [47] and more organic options [46] than those in poorer areas. Obviously, these fine-grain patterns of organic fruit availability could affect demand for organic fruit. However, some researchers have downplayed the importance of healthy food supply in a household's neighborhood in determining demand for produce. For example, Allcott et al. [47] found that the entrance of a supermarket with healthy food options into a "food desert" does little to affect the food choices of neighborhood residents. The implication is that households will travel where necessary within a larger market area to satisfy their preferences.
We only used fruit purchases, organic and conventional, recorded in the 2011-2013 Consumer Panel datasets that involved the scanning of a Universal Product Code (UPC) at a store. A fruit purchase was coded as organic if its UPC or item description indicated the item had the US Department of Agriculture (USDA) organic label. Therefore, if the description of a purchased product included a claim of organic but the item did not have the USDA organic label, it was coded as conventional. In some cases, fruit purchases made with a UPC were recorded on a per item basis rather than a weight basis (e.g., two apples bought for $2 instead of 30 ounces bought for $2). Because our subsidy simulations manipulate organic fruit prices on a per weight basis (e.g., dollars per ounce), we converted all item-based expenditures and prices to ounces purchased and per ounce prices (see S1 Table for assumed fruit weights).
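The item-to-weight conversion described above amounts to multiplying item counts by an assumed weight and dividing expenditure by the resulting ounces. The sketch below is illustrative only: the per-item weights are placeholders invented for the example (the weights actually used are in S1 Table), and `to_weight_basis` is a hypothetical helper name.

```python
# Placeholder per-item weights in ounces. These numbers are assumptions for
# illustration only; the paper's assumed fruit weights are listed in S1 Table.
ASSUMED_OUNCES_PER_ITEM = {"apple": 6.0, "orange": 5.0}


def to_weight_basis(variety, items, dollars):
    """Convert an item-based purchase to (ounces, dollars per ounce)."""
    ounces = items * ASSUMED_OUNCES_PER_ITEM[variety]
    return ounces, dollars / ounces


if __name__ == "__main__":
    ounces, price_per_ounce = to_weight_basis("apple", items=2, dollars=2.00)
    print(f"{ounces:.1f} oz at ${price_per_ounce:.3f} per oz")
```

Once every purchase is on a per-ounce basis, a percentage price reduction can be applied uniformly across item-based and weight-based records.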
As of 2014, approximately 92 percent of organic food was sold via stores [60] and 40 percent of all fresh produce bought at US stores was purchased with a UPC [47]. Therefore, given that UPC purchases represent a significant portion of all fruit purchases, our analysis of expected changes in UPC-coded fruit purchases due to a subsidy can plausibly be extended to cover all purchased fruit.
Finally, our dataset only includes 10 type × variety fruit combinations, including two 'other' categories. Specifically, i indexes organic varieties of apples, blueberries, oranges, strawberries, and "all other fruit" and conventional varieties of apples, blueberries, oranges, and strawberries, and "all other fruit" (S4 Supporting information). We focus on organic apples, blueberries, oranges, and strawberries because, based on expenditures, they are 4 of the 6 most popular organic fruit varieties in the US (S2 Table).

Trends in the organic and conventional fruit expenditure and price data during 2011-2013
Using each household's projection factors we found that total US household expenditures on organic fruits with a UPC increased from $144.82 million in 2011 to $211.53 million in 2013 (December, 2013 $), or 46.1% (all statistics and trends discussed in this section only refer to fruit, both organic and conventional, purchased with UPCs). Over the same time period, total US household expenditures on conventional fruits increased by 6.9% (Table 2). Overall, poor US households increased their expenditures on organic fruits between 2011 and 2013 by 68.7%, compared to the middle and high-income classes' 34.5% and 51.6% respective increases. However, on a per household basis, households in the high-income bracket not only bought more organic fruit than typical low income and middle-class households, they also experienced the greatest growth in organic fruit purchases between 2011 and 2013 (Table 2). The 2011 to 2013 growth in US household-level organic fruit purchases occurred at both the extensive and intensive margins. Growth on the extensive margin occurred across all three income brackets (Table 3). Among all US households, the number of households that only bought organic fruit in a given year and the number of households that bought some organic fruit in a given year increased by 80.0% and 35.7%, respectively, between 2011 and 2013. Conversely, over this same time period, the number of US households that only purchased conventional fruit fell 4.5%. The growth in "organic fruit-only" or "both varieties of fruit" households between 2011 and 2013 was greatest in the middle-income bracket (Table 3). As to the intensive margin, among the US households that were represented in both the 2011 and the 2013 Consumer Panels, total expenditures on organic fruit were 47.1% higher in 2013 than in 2011.
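As a rough illustration of how projected totals and growth rates like those above can be computed, the sketch below weights each sampled household's expenditure by its projection factor and then computes percentage growth. The helper names and the three toy households are hypothetical; only the two national totals ($144.82 million and $211.53 million) come from the text.

```python
def projected_total(purchases):
    """Sum household expenditures weighted by projection factors.

    purchases: iterable of (household_expenditure, projection_factor) pairs.
    """
    return sum(expenditure * factor for expenditure, factor in purchases)


def pct_growth(start, end):
    """Percentage change from start to end."""
    return 100.0 * (end - start) / start


if __name__ == "__main__":
    # Toy sample: three households with made-up expenditures and factors.
    sample = [(10.0, 2000), (0.0, 1500), (4.0, 500)]
    print(projected_total(sample))

    # National organic fruit totals reported in the text (millions of dollars).
    print(round(pct_growth(144.82, 211.53), 1))
```

The second call reproduces the 46.1% growth figure reported for 2011 to 2013.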
The spatial concentration of organic fruit purchases made between 2011 and 2013 was more intense than that of conventional fruit purchases. Using household projection factors, we found that consumers in the six (eleven) Nielsen Scantrack markets that spent the most on organic fruit, out of 52 markets, were responsible for a third (a half) of all organic fruit purchases (Fig 1). By comparison, households in the top eight (fourteen) markets for conventional fruit purchases made during the 2011 to 2013 period were responsible for a third (a half) of all conventional fruit purchases.
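The concentration statistic used above (how many top-spending markets are needed to account for a given share of purchases) can be sketched as follows. The function name and the four-market toy data are assumptions for illustration; the paper's figures are based on 52 Scantrack markets and projected expenditures.

```python
def markets_needed(market_spending, share):
    """Smallest number of top-spending markets whose combined spending
    reaches the given share (between 0 and 1) of total spending."""
    total = sum(market_spending.values())
    running = 0.0
    ranked = sorted(market_spending.values(), reverse=True)
    for count, spend in enumerate(ranked, start=1):
        running += spend
        if running >= share * total:
            return count
    return len(market_spending)


if __name__ == "__main__":
    # Toy spending by market (arbitrary units), not data from the paper.
    toy = {"A": 50, "B": 30, "C": 10, "D": 10}
    print(markets_needed(toy, 1 / 3))
    print(markets_needed(toy, 0.75))
```

A smaller count for organic than for conventional fruit at the same share indicates that organic purchases are more spatially concentrated.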
The pattern of 2011-2013 organic fruit purchases across households ordered by income within Scantrack markets was also often skewed. In Fig 2 and S1 Fig we display the cumulative proportion of a market's expenditures on organic fruit from 2011-2013 against cumulative household income in each market. The figures also indicate the approximate break point between the middle-class and rich income categories along a market's household income spectrum. We conduct these Lorenz curve analyses for 2-person households (the most frequent household size type) (Fig 2) and for 3-person households (the most frequent household size that typically includes a child) (S1 Fig). In both 2-person and 3-person households, disproportionate gains in organic expenditures within a market, if they took place at all, generally took place somewhere in the middle or rich portion of the market's income distribution. (In comparison, in almost every market, the poor and middle-class spend disproportionally more of their income on conventional fruit relative to richer households; see S2 and S3 Figs.) There are some exceptions to this general rule. For example, poorer two-person households in Detroit and Nashville and poorer three-person households in Boston, Los Angeles, and Seattle purchased more than their fair share of organic fruit. Further, some markets had remarkably even distributions of organic fruit consumption across their income spectrums. This trend was especially evident in two-person households. Examples of markets with this trend include Boston, Columbus, Denver, Houston, Seattle, Orlando, and Phoenix.
Finally, we present two summaries of 2011-2013 organic fruit prices. At the national level, the annual average price series for each organic fruit is either monotonically increasing or does not display a trend across the years 2011 to 2013 (S3 Table). The trend in regional organic strawberry prices is uniform: in all regions the average annual organic strawberry price fell between 2011 and 2012 but then increased between 2012 and 2013. The other three organic fruits do not display such uniformity in regional price trends.
In Fig 3 we give the national-level 5th percentile, mean, and 95th percentile organic to conventional fruit price ratios by month during the 2011 to 2013 period for apples, blueberries, oranges, and strawberries. The apple, orange, and strawberry price ratio trends are in line with Hallam's [61] earlier finding that organic price premiums are 20 to 30% in OECD countries. USDA-ERS [62] also analyzed prices for 18 fruits with 2005 data and found that the organic premium was less than 30% for most items. In our data blueberry price ratios were the anomaly, as the blueberry price premium exceeded 100% in the summer months. This figure also highlights the strong seasonal trend in fruit prices, a cycle that is hidden when we average fruit prices across months (S3 Table).
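The Lorenz-style curves discussed earlier in this section (Fig 2 and S1 Fig) plot cumulative organic fruit expenditure share against cumulative household income share within a market. A minimal sketch of that construction, assuming households are ordered from poorest to richest and ignoring projection factors, is given below; the function name and toy data are hypothetical.

```python
def lorenz_points(households):
    """Return (cumulative income share, cumulative expenditure share) points.

    households: list of (income, organic_expenditure) pairs for one market.
    Households are ordered from lowest to highest income before cumulating.
    """
    ordered = sorted(households, key=lambda h: h[0])
    total_income = sum(income for income, _ in ordered)
    total_spend = sum(spend for _, spend in ordered)
    points = [(0.0, 0.0)]
    cum_income = 0.0
    cum_spend = 0.0
    for income, spend in ordered:
        cum_income += income
        cum_spend += spend
        points.append((cum_income / total_income, cum_spend / total_spend))
    return points


if __name__ == "__main__":
    # Toy market with three households: (income, organic fruit expenditure).
    for x, y in lorenz_points([(50, 2), (20, 1), (30, 1)]):
        print(f"{x:.2f} {y:.2f}")
```

Points lying on the 45-degree line indicate that organic fruit spending is distributed in proportion to income; points below the line at low income shares indicate that richer households account for a disproportionate share of spending.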

Estimating household demand for organic fruit by household income class
To simulate the distributional impacts of an organic fruit subsidy we first had to estimate monthly household demand for organic fruit by household income class. We estimate these demand functions with three different methods. The methods complement one another because each has unique statistical properties or assumes unique behavioral constraints. Therefore, the range in expected household monthly organic fruit consumption that we derive for each income class is robust to various assumptions. Similar to Allcott et al. [47], we do not weight households by their projection factors when we estimate models of monthly household organic fruit consumption.
First estimation method: Separate equations. In our first estimation method we econometrically parameterize monthly conditional demand (positive) and unconditional demand (positive or zero), measured in ounces, for each fruit type × variety i and household income class z combination separately (z indexes the three household income classes we model, low, middle, and high). Therefore, in this first estimation method we do not assume that fruit consumption choices are made jointly nor do we impose any behavioral restrictions on household purchasing behavior.
The first estimation method's unconditional expectation for ounces of fruit type × variety i bought by household-month \(km \in z\) is given by
\[ E[o_{ikm}] = P(e_{ikm} > 0) \times E[o_{ikm} \mid e_{ikm} > 0], \tag{1} \]
where \(e_{ikm} > 0\) means household-month \(km \in z\) purchased a positive amount of i. The conditional demand term \(E[o_{ikm} \mid e_{ikm} > 0]\), representing expected ounces of i purchased by household-month \(km \in z\) given they buy a positive amount, is represented by the equation
\[ E[o_{ikm} \mid e_{ikm} > 0] = X_{km}\beta_i + c_{km}\sigma_i + P_{km}\theta_i + \phi_i \lambda_{ikm}(X_{km}\delta_i + C_{km}\mu_i + P_{km}\omega_i), \tag{2} \]
where \(P_{km}\) is the set of 10 organic and conventional fruit prices household k faced in month m, \(\lambda_{ikm}(X_{km}\delta_i + C_{km}\mu_i + P_{km}\omega_i)\) is a Heckman-style correction term, and all other Greek letters represent model coefficients (the variables \(X_{km}\), \(C_{km}\), and \(c_{km}\) are explained at the beginning of the "Data" section). We include a Heckman-style correction in (2) because this subset of purchasing households may be a selected sample. The Heckman-style correction controls for the propensity of \(km \in z\) to purchase a positive amount of i. Assuming the errors in the regression model designed to parameterize (1) are normally distributed, then Eq (1) is estimated with
\[ E[o_{ikm}] = \underbrace{\Phi(X_{km}\delta_i + C_{km}\mu_i + P_{km}\omega_i)}_{P(e_{ikm} > 0)} \times \underbrace{\left(X_{km}\beta_i + c_{km}\sigma_i + P_{km}\theta_i + \phi_i \lambda_{ikm}\right)}_{E[o_{ikm} \mid e_{ikm} > 0]}, \tag{3} \]
\[ \lambda_{ikm} = \frac{\varphi(X_{km}\delta_i + C_{km}\mu_i + P_{km}\omega_i)}{\Phi(X_{km}\delta_i + C_{km}\mu_i + P_{km}\omega_i)}, \tag{4} \]
where \(\Phi\) and \(\varphi\) are the standard normal cumulative distribution and density functions. We estimate (3)-(4) in two stages. First, we use a probit over all household-months \(km \in z\) to parameterize \(P(e_{ikm} > 0)\).
Then we use ordinary least squares (OLS) over household-months \(km \in z\) where \(e_{ikm} > 0\) to parameterize \(E[o_{ikm} \mid e_{ikm} > 0]\). The predicted unconditional monthly ounces of i consumed by the representative \(km \in z\), given by \(\hat{o}^{u1}_{iz}(X_{km}, c_{km}, C_{km}, P_{km})\), is found by evaluating the estimated
\[ \hat{o}^{u1}_{iz} = \underbrace{\Phi(X_{km}\hat{\delta}_i + C_{km}\hat{\mu}_i + P_{km}\hat{\omega}_i)}_{\hat{P}(e_{ikm} > 0)} \times \underbrace{\left(X_{km}\hat{\beta}_i + c_{km}\hat{\sigma}_i + P_{km}\hat{\theta}_i + \hat{\phi}_i \hat{\lambda}_{ikm}\right)}_{\hat{E}[o_{ikm} \mid e_{ikm} > 0]} \tag{5} \]
at mean \(X_{km}\), \(c_{km}\), \(C_{km}\), and \(P_{km}\) across all household-months \(km \in z\) (S4 Table). The '1' in \(\hat{o}^{u1}_{iz}\) indicates the first estimation method. The predicted conditional monthly ounces of i consumed by the representative \(km \in z\) where \(e_{ikm} > 0\), given by \(\hat{o}^{c1}_{iz}(X_{km}, c_{km}, C_{km}, P_{km}, \hat{\lambda}_{ikm})\), is found by evaluating the estimated \(\hat{E}[o_{ikm} \mid e_{ikm} > 0]\) at mean \(X_{km}\), \(c_{km}\), \(C_{km}\), and \(P_{km}\) for household-months \(km \in z\) where \(e_{ikm} > 0\) (S5 Table). To see the estimated forms of \(\hat{o}^{c1}_{iz}\) and \(\hat{o}^{u1}_{iz}\), including estimated \(\delta_i\), \(\mu_i\), \(\omega_i\), \(\beta_i\), \(\sigma_i\) and \(\phi_i\), refer to the instructions for running the relevant Stata .do files in S5 Supporting information.
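To make the mechanics of the first method's prediction step concrete, the sketch below evaluates a two-part (probit times corrected linear demand) prediction for already-estimated coefficients. It is not the authors' Stata code: all coefficient values are invented, the function names are hypothetical, and the inverse Mills ratio form of the correction term is assumed, consistent with the probit first stage described above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def inverse_mills(index):
    """Inverse Mills ratio: normal density over normal CDF at the probit index."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


def dot(values, coefs):
    """Inner product of a regressor vector and a coefficient vector."""
    return sum(v * c for v, c in zip(values, coefs))


def predicted_demand(probit_index, linear_part, correction_coef):
    """Return (conditional, unconditional) predicted ounces.

    probit_index:    first-stage linear index (household, market, price terms)
    linear_part:     second-stage linear index without the correction term
    correction_coef: coefficient on the Heckman-style correction term
    """
    prob_buy = STD_NORMAL.cdf(probit_index)
    conditional = linear_part + correction_coef * inverse_mills(probit_index)
    return conditional, prob_buy * conditional


if __name__ == "__main__":
    # Hypothetical mean regressors and coefficients, for illustration only.
    mean_regressors = [1.0, 2.5, 0.30]          # constant, household size, price
    probit_coefs = [-0.8, 0.1, -0.5]
    ols_coefs = [12.0, 1.5, -6.0]
    index = dot(mean_regressors, probit_coefs)
    linear = dot(mean_regressors, ols_coefs)
    cond, uncond = predicted_demand(index, linear, correction_coef=2.0)
    print(round(cond, 3), round(uncond, 3))
```

Evaluating the same functions at the baseline price and at a price reduced by 10 percent would give one simple way to approximate the predicted change in ounces for a representative household.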
Second estimation method: LinQuad. In the first econometric approach we assumed that expenditures on fruit type × variety i were exogenous to other fruit purchases and the household budget. However, this may not be the case. Instead, consumers may allocate a portion of their income over a joint purchase of several fruit types and varieties. If this latter narrative better represents actual consumer behavior then we should estimate fruit purchases with a demand system where all fruit expenditures are determined jointly. In this case we adopted the LinQuad demand system [63][64][65][66][67][68]. This system models demand with a well-defined expenditure function and imposes several consumer theory restrictions, including homogeneity in prices and Slutsky substitution matrix symmetry. We selected LinQuad over other demand system estimation methods because LinQuad does not require the modeler to have household expenditure shares across all categories of consumables [69][70].
Under this method the unconditional and conditional expectations for ounces of i bought by household-month \(km \in z\) are given by Eqs (6) and (7), respectively, where \(\hat{\lambda}_{ikm}\), \(\varphi(X_{km}\hat{\delta}_i + C_{km}\hat{\mu}_i + P_{km}\hat{\omega}_i)\), and \(\Phi(X_{km}\hat{\delta}_i + C_{km}\hat{\mu}_i + P_{km}\hat{\omega}_i)\) are the same as the estimated functions found in (5). However, the function \(f(X_{km}\beta_i, c_{km}\sigma_i, P_{km}\theta_i, s_{km})\) in Eqs (6) and (7) [71] is not found in the first estimation method. This function, defined in Eq (8), explains a household's monthly consumption of ounces of i as a function of the household's characteristics, the fruit prices it faces, and the household's budget constraint, where \(s_{km}\) refers to household k's income in month m (\(s_{km}\) is a component of \(X_{km}\) but is highlighted with its own symbol due to its importance), and the bracketed term of Eq (8) is km's budget constraint. The constraint in (8) limits expenditure on i by \(km \in z\) to be less than or equal to its monthly income \(s_{km}\). Following the LinQuad method, we use a seemingly unrelated regression (SUR) approach to estimate \(f(X_{km}\beta_i, c_{km}\sigma_i, P_{km}\theta_i)\)'s parameters \(\beta\), \(\sigma\), \(\alpha\), \(\theta\), and \(\psi\) across all i jointly (less organic-other and conventional-other) using all household-months \(km \in z\) with imposed symmetry in the price coefficients and homogeneity in prices and income (we allow for correlation in the regression errors across equations). Notice the other difference between the LinQuad method and the separate equation method described above: with the LinQuad method \(E[o_{ikm} \mid e_{ikm} > 0]\) is estimated with all \(km \in z\), not just those where \(e_{ikm} > 0\). The predicted unconditional monthly ounces of i consumed by a representative \(km \in z\), given by \(\hat{o}^{u2}_{iz}(X_{km}, c_{km}, C_{km}, P_{km}, s_{km})\), is found by evaluating the estimated (6) at mean \(X_{km}\), \(c_{km}\), \(C_{km}\), and \(P_{km}\) across all household-months \(km \in z\) (S4 Table). The '2' in \(\hat{o}^{u2}_{iz}\) indicates the second estimation method.
The predicted conditional monthly ounces of i consumed by a representative \(km \in z\), given by \(\hat{o}^{c2}_{iz}(X_{km}, c_{km}, P_{km})\), is found by evaluating the estimated (8) at mean \(X_{km}\), \(c_{km}\), \(C_{km}\), and \(P_{km}\) for \(km \in z\) where \(e_{ikm} > 0\) (S5 Table). To see the estimated forms of \(\hat{o}^{u2}_{iz}\) and \(\hat{o}^{c2}_{iz}\), including estimated \(\beta\), \(\sigma\), \(\alpha\), \(\theta\), and \(\psi\), refer to the instructions for running the relevant Stata .do files in S5 Supporting information.
Third estimation method: LASSO. For the third demand estimation method we use the machine learning (ML) technique known as LASSO (least absolute shrinkage and selection operator). Unlike the econometric techniques, LASSO is not based on a theory of the data generating process. Instead, the data determine the final set of predictors (the universe of possible explanatory variables is the only place for economic intuition in these methods).
Economists have rarely used ML techniques like LASSO because they generate 'agnostic' predictive models rather than models of inference that allow for the testing of economic theory. For example, economists assume prices explain demand for a product and therefore include prices in the product's demand models. In contrast, the LASSO technique may find that some (or even all) prices are not particularly predictive of observed demand and therefore drop these prices from the final predictive heuristic in order to minimize mean squared prediction error (the objective of LASSO estimation). However, recall the point of this research is to predict as accurately as possible how an organic food subsidy would play out across household income brackets in the US. Therefore, using a predictive algorithm that might ignore basic economic intuition is consistent with our research motives (see [72][73] for other examples of counterfactual analysis with ML techniques). Further, if the two econometric estimation methods described above mis-specify the data generating process or behavioral constraints across consumers of organic fruit then the LASSO demand estimates could be the most accurate depiction of household demand for organic fruit.
Like the first estimation method, we use LASSO to separately predict unconditional and conditional demand for each fruit type × variety i in income group z. Also like the first estimation method, we estimate the LASSO-conditional demand function only using observations where \(e_{ikm} > 0\) (the second estimation method is a system where both conditional and unconditional demand are estimated over all observations). However, unlike the first two estimation methods, we did not predict representative household monthly demand for i at the mean regressor values with our LASSO model. Instead, with the LASSO method we predicted representative unconditional (conditional) household-month demand for i by predicting quantity demanded for each \(km \in z\) (each \(km \in z\) where \(e_{ikm} > 0\)) and then taking the mean of the vector of predicted quantities (the mean of the predictions rather than the prediction over means).
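The distinction between "the mean of the predictions" and "the prediction over means" matters whenever the predictor is nonlinear in its inputs, as it is when a purchase probability enters the prediction. The toy example below illustrates the gap; the demand rule `predict` and the regressor values are invented for illustration and are not the paper's estimated model.

```python
from statistics import NormalDist, mean


def predict(x):
    """Hypothetical nonlinear demand rule: purchase probability times quantity."""
    return NormalDist().cdf(x - 1.0) * (4.0 + 2.0 * x)


def prediction_at_means(xs):
    """Evaluate the rule once, at the mean regressor value."""
    return predict(mean(xs))


def mean_of_predictions(xs):
    """Evaluate the rule for every observation, then average the predictions."""
    return mean(predict(x) for x in xs)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 5.0]  # toy regressor values for four household-months
    print(round(prediction_at_means(xs), 3))
    print(round(mean_of_predictions(xs), 3))
```

For a purely linear predictor the two quantities coincide, so the choice only affects results when nonlinear terms such as a predicted purchase probability are involved.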
In the first step of LASSO estimation we find the model coefficients that best explain whether or not a household-month \(km \in z\) purchased i. This is done by maximizing the log likelihood function of the linear logistic model, Eq (9), where \(N_z\) is the set of all \(km \in z\), \(I(e_{ikm} = 0)\) indicates a \(km \in z\) observation where i was not purchased, \(I(e_{ikm} > 0)\) indicates a \(km \in z\) observation where i was purchased, \(W_{km} = [X'_{km}\; C'_{km}\; P_{km}]\) is a vector of v standardized candidate predictors for the binary consumption of i, \(\rho(W_{km}) = \frac{1}{1 + \exp(-W_{km}\pi_i)}\) is the probability that i is not purchased by \(km \in z\), the shrinkage penalty \(\sum_{j=1}^{v} |\pi_{ij}|\) is the sum of the absolute values of the independent variable coefficients for purchases of i by \(km \in z\), and \(\mu_i\) is the tuning parameter that adjusts the severity of the shrinkage penalty. The shrinkage penalty in Eq (9) is a kinked function of \(\pi_i\). Therefore, the LASSO tends to set some model coefficients to zero.
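Why a kinked absolute-value penalty produces exact zeros is easiest to see in a simplified setting. The sketch below uses the squared-error case with a single standardized predictor, where the penalized solution is the soft-thresholded unpenalized coefficient. This is an illustration of the mechanism only, not the penalized logistic objective in Eq (9); the function names and numbers are hypothetical.

```python
def soft_threshold(z, penalty):
    """Minimizer of 0.5 * (z - b)**2 + penalty * abs(b) over b."""
    if z > penalty:
        return z - penalty
    if z < -penalty:
        return z + penalty
    return 0.0  # the kink at zero absorbs small coefficients entirely


def shrink_all(coefs, penalty):
    """Apply soft-thresholding to each unpenalized coefficient."""
    return [soft_threshold(c, penalty) for c in coefs]


if __name__ == "__main__":
    unpenalized = [1.5, 0.3, -2.0, -0.1]  # toy coefficients
    print(shrink_all(unpenalized, penalty=0.5))
```

Raising the tuning parameter widens the band of coefficients set to zero, which is how the penalty performs variable selection.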
The primes on $X_{km}$ and $C_{km}$ in Eq (9)'s vector $W_{km}$ indicate that these household-month variable vectors are different from the $X_{km}$ and $C_{km}$ variable vectors used with estimation methods 1 and 2 in two ways. First, the $X_{km}$ and $C_{km}$ vectors used in estimation methods 1 and 2 are composed of a set of author-selected household-month and market variables found in the Consumer Panel dataset. In contrast, $X'_{km}$ and $C'_{km}$ include all of the household-month variables available from the Consumer Panel dataset. Second, the household-month variables in $X_{km}$ and $C_{km}$ are simplified representations of the more complex raw data found in the Consumer Panel dataset. In $X'_{km}$ and $C'_{km}$ we use the more complex raw data. For example, in the Consumer Panel dataset a household-month is placed into one of nine categories regarding the number and mix of children in the household. In $X_{km}$ this variable was reduced to a binary variable that indicated whether household k had one or more children in month m or not. In $X'_{km}$ all nine children categories are potential predictors. Further, in the $C_{km}$ vector used in estimation methods 1 and 2 the rural-urban continuum code (RUCC) categories are reduced to a dummy representation [59]. In $C'_{km}$ all seven RUCC categories are used.
The second step in predicting monthly household demand for fruit type × variety i in income group z with the LASSO model involves minimizing the penalized loss function in Eq (10), where $T_{iz}$ is the set of $km \in z$ where $e_{ikm} > 0$, $w_{ikm} = [X'_{km}\; C'_{km}\; P_{km};\, \rho_{ikm}]$ is a matrix of q standardized candidate predictors for the consumption of i, vector $\vartheta_i$ contains linear model coefficients for i, and $\mu_i$ is the tuning parameter, as described above.
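A second-stage objective in this notation could take the following form. This is a sketch that assumes a squared-error loss, in line with the earlier statement that LASSO estimation minimizes mean squared prediction error; $y_{ikm}$ is a placeholder symbol for the observed quantity of i purchased by km, since the dependent variable is not named in this passage.

```latex
\min_{\vartheta_i} \; \sum_{km \in T_{iz}} \big( y_{ikm} - w_{ikm}\vartheta_i \big)^2
  \; + \; \mu_i \sum_{j=1}^{q} |\vartheta_{ij}|
```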
Note that $w_{ikm}$ includes $km \in z$'s probability of purchasing i in a given month. Because model (10) is agnostic regarding the distributional shape of a model's errors, we cannot use the inverse Mills ratio to correct for self-selection bias when estimating and predicting demand for fruit i as we did with the econometric methods (the inverse Mills ratio assumes normally distributed errors). Instead we use $\rho_{ikm}$ itself as a propensity score in the LASSO conditional demand function to control for self-selection (the coefficient on this variable is separately identified in the linear second stage model because the first stage is estimated using the non-linear logistic function).
To estimate (9)-(10) for i we first solved (9) over all $km \in z$. Then we calculated $\hat{\rho}_{ikm} = \frac{1}{1 + \exp(-W_{km}\hat{\pi}_i)}$ for each $km \in z$. Next, we solved Eq (10) for i, i.e., generated $\hat{\vartheta}_{iz}$, using the data $w_{ikm} = [X'_{km}\; C'_{km}\; P_{km};\, \hat{\rho}_{ikm}]$ from the set of $km \in z$ where $e_{ikm} > 0$. The LASSO-predicted unconditional monthly household demand for i was found by generating $\hat{o}^{u3}_{ikm} = \hat{\rho}_{ikm}(w_{ikm}\hat{\vartheta}_{iz})$ across each $km \in z$ and then taking the average of $\hat{o}^{u3}_{ikm}$. The LASSO-predicted conditional demand was found by generating $\hat{o}^{c3}_{ikm} = w_{ikm}\hat{\vartheta}_{iz}$ across the set of $km \in z$ where $e_{ikm} > 0$ and then taking the average of $\hat{o}^{c3}_{ikm}$. The standard errors for estimates of $\pi_{ij}$ and $\vartheta_{ij}$ are bootstrapped by replicating the joint solution to Eqs (9) and (10) 100 times. In each replicate, the sample is randomly drawn with replacement, so while the sample size is the same each time, the households represented in the dataset are different in each replication. See S6 Table for the means over a select number of variables found in $w_{ikm}$. To see the estimated forms of $\hat{o}^{u3}_{iz}$ and $\hat{o}^{c3}_{iz}$, including estimated $\hat{\rho}_{ikm}$ and $\hat{\vartheta}_{iz}$, refer to the instructions for running the relevant R scripts in S5 Supporting information.
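The aggregation step described above can be sketched in a few lines. The sketch takes the fitted coefficients as given rather than estimating them; every number, and the names `pi_hat`, `theta_hat`, and `household_months`, are hypothetical and chosen only for illustration (the authors' own implementation is in the R scripts they reference).

```python
import math

def purchase_probability(W, pi_hat):
    """First-stage logistic probability, 1 / (1 + exp(-W . pi_hat))."""
    index = sum(w * p for w, p in zip(W, pi_hat))
    return 1.0 / (1.0 + math.exp(-index))

def conditional_prediction(W, rho_hat, theta_hat):
    """Second-stage linear prediction over the predictors plus rho_hat."""
    w = list(W) + [rho_hat]
    return sum(a * b for a, b in zip(w, theta_hat))

# Hypothetical fitted coefficients (not taken from the paper).
pi_hat = [0.8, -1.2]          # first stage; a LASSO fit may set some to 0
theta_hat = [0.5, -0.2, 6.0]  # second stage; last entry multiplies rho_hat

# Hypothetical standardized predictors and purchase indicators (e_ikm > 0).
household_months = [
    ([0.5, -0.3], True),
    ([-1.0, 0.4], False),
    ([1.2, 0.1], True),
    ([0.0, 1.5], False),
]

unconditional, conditional = [], []
for W, purchased in household_months:
    rho_hat = purchase_probability(W, pi_hat)
    q_hat = conditional_prediction(W, rho_hat, theta_hat)
    unconditional.append(rho_hat * q_hat)  # every household-month
    if purchased:
        conditional.append(q_hat)          # purchasers only

mean_unconditional = sum(unconditional) / len(unconditional)
mean_conditional = sum(conditional) / len(conditional)
print(mean_unconditional, mean_conditional)
```

Bootstrapped standard errors would come from repeating both estimation stages on samples drawn with replacement and recomputing these means each time.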
Simulating the total impact of organic food subsidization on representative households. First, as described above, we predicted monthly unconditional and conditional household quantity demanded for a representative household in each income class. Second, we reduced the organic fruit prices a representative household in income class z faced by 10% (given by $\tilde{P}_z$). In the econometric-based methods this meant reducing z's mean prices by 10%. With the LASSO this meant reducing all z households' observed prices by 10%. Third, we predicted the amount of money the household would save with the subsidy if it continued to purchase organic fruit at predicted pre-subsidy levels (infra-marginal gain due to the subsidy). Let the unconditional and conditional infra-marginal gains due to the subsidy at representative household z using estimation method j be given by $\hat{o}^{uj}_{iz}(\cdot \mid P_z)(P_{iz} - \tilde{P}_z)$ and $\hat{o}^{cj}_{iz}(\cdot \mid P_z)(P_{iz} - \tilde{P}_z)$, respectively. Fourth, we predicted the representative household's change in monthly unconditional and conditional quantity demanded. Let the change in unconditional and conditional quantity demanded of i at representative household z using estimation method j be given by $\Delta\hat{o}^{uj}_{iz} = \hat{o}^{uj}_{iz}(\cdot \mid \tilde{P}_z) - \hat{o}^{uj}_{iz}(\cdot \mid P_z)$ and $\Delta\hat{o}^{cj}_{iz} = \hat{o}^{cj}_{iz}(\cdot \mid \tilde{P}_z) - \hat{o}^{cj}_{iz}(\cdot \mid P_z)$, respectively. The representative household's change in purchasing behavior and infra-marginal gain together indicate the total impact of the subsidy on the household.
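The two simulated quantities, the infra-marginal gain and the change in quantity demanded, can be illustrated with a toy calculation. The linear `demand` function, its parameters, and the price below are hypothetical stand-ins for an estimated demand equation, not values from the study.

```python
def demand(price, intercept=20.0, slope=-30.0):
    """Hypothetical monthly demand (ounces) as a function of price ($/oz)."""
    return max(0.0, intercept + slope * price)

pre_price = 0.40               # hypothetical pre-subsidy price, $ per ounce
post_price = pre_price * 0.90  # subsidy lowers the retail price by 10%

q_pre = demand(pre_price)      # predicted pre-subsidy quantity
q_post = demand(post_price)    # predicted post-subsidy quantity

# Infra-marginal gain: money saved on the quantity already bought pre-subsidy.
inframarginal_gain = q_pre * (pre_price - post_price)

# Behavioral response: change in quantity demanded due to the lower price.
delta_q = q_post - q_pre

print(f"infra-marginal gain: ${inframarginal_gain:.2f} per month")
print(f"change in quantity:  {delta_q:.2f} ounces per month")
```

The infra-marginal gain depends only on pre-subsidy purchases, so a household that buys more before the subsidy saves more even if its behavior does not change.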
Finally, we estimated each z's unconditional and conditional purchase and expenditure elasticities across each i and each estimation method j. A purchase elasticity for organic fruit i measures the percentage change in ounces of i bought per month for each 1% increase in the prices of all organic fruit. An expenditure elasticity for organic fruit i measures the percentage change in dollars spent per month on i for each 1% increase in the prices of all organic fruit. Therefore, in this case, a negative (positive) elasticity means predicted purchases or expenditures increased (decreased) with the organic fruit price subsidy. See Tables 4 and 5 for all predicted values of $\hat{o}^{uj}_{iz}(\cdot \mid P_z)(P_{iz} - \tilde{P}_z)$, $\hat{o}^{cj}_{iz}(\cdot \mid P_z)(P_{iz} - \tilde{P}_z)$, $\Delta\hat{o}^{uj}_{iz}$, $\Delta\hat{o}^{cj}_{iz}$, and unconditional and conditional purchase and expenditure elasticities.
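A short worked example may help with the sign convention. The sketch below computes both elasticities as simple ratios of percentage changes, which is one common way to do so and is assumed here rather than taken from the paper; the quantities and prices are hypothetical.

```python
def pct_change(new, old):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old

def elasticities(q_pre, q_post, p_pre, p_post):
    """Return (purchase, expenditure) elasticities for a price change."""
    dp = pct_change(p_post, p_pre)
    purchase = pct_change(q_post, q_pre) / dp
    expenditure = pct_change(p_post * q_post, p_pre * q_pre) / dp
    return purchase, expenditure

# Hypothetical household: a 10% price cut raises purchases by 5%.
purchase, expenditure = elasticities(q_pre=10.0, q_post=10.5,
                                     p_pre=1.0, p_post=0.9)

# Hypothetical household that does not change its purchases at all.
flat_purchase, flat_expenditure = elasticities(q_pre=10.0, q_post=10.0,
                                               p_pre=1.0, p_post=0.9)

print(purchase, expenditure)
print(flat_purchase, flat_expenditure)
```

In the first case the purchase elasticity is about -0.5 (purchases rise as prices fall) while the expenditure elasticity is about 0.55 (spending falls because the price cut outweighs the extra quantity). In the second case the purchase elasticity is zero and the expenditure elasticity is one, since spending falls one-for-one with price.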
Please note that we do not specify the subsidy that would lead to a 10% reduction in the retail prices of all organic fruit. The source of the subsidy is outside the scope of our study. For example, prices that consumers face can be reduced by a point of purchase subsidy or a production subsidy that generates lower production costs. In the end, whatever form it would take, we assume that this subsidy reduces the prices of all organic fruit by 10% from 2011-2013 levels.

Impact of independent variables on the propensity to consume and amount of consumption
Below we do not discuss the impact of independent variables on the propensity of a household to consume organic fruit or the estimated impacts of small changes in independent variable values on organic fruit consumption. Instead the interested reader can generate all marginal effects by running the computer code and datasets provided in the SI. However, prior to discussing the subsidy's expected impact on households' inframarginal savings and consumption and expenditure behavior, we briefly note the impact independent variables had on the propensity of a household to consume organic fruit and the amount of organic fruit consumed in the estimated LASSO model. The number of times LASSO bootstrap replicates generate nonzero LASSO coefficients offers some information on the importance of the various independent variables in explaining organic fruit consumption behavior (S7 and S8 Tables).
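Counting how often a coefficient is nonzero across bootstrap replicates is straightforward to compute. In the minimal sketch below, the variable names and coefficient values are invented for illustration, and each replicate is assumed to report a coefficient for every candidate predictor.

```python
def nonzero_share(replicates):
    """Share of bootstrap replicates in which each coefficient is nonzero."""
    counts = {}
    for coefs in replicates:
        for name, value in coefs.items():
            counts[name] = counts.get(name, 0) + (1 if value != 0.0 else 0)
    return {name: count / len(replicates) for name, count in counts.items()}

# Hypothetical LASSO coefficients from four bootstrap replicates.
replicates = [
    {"own_price": -0.42, "income": 0.10, "rucc_3": 0.00},
    {"own_price": -0.35, "income": 0.00, "rucc_3": 0.00},
    {"own_price": -0.51, "income": 0.08, "rucc_3": 0.02},
    {"own_price": -0.40, "income": 0.12, "rucc_3": 0.00},
]

shares = nonzero_share(replicates)
print(shares)
```

A predictor that survives the shrinkage penalty in most replicates is one the LASSO consistently finds useful for prediction.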
Fruit prices, household income, number of children in the household, household race indicators, and the rural-urban continuum classification categories consistently generated nonzero LASSO coefficients across replicates in most organic fruit selection models. At the quantity stage of consumption, LASSO replicates consistently generated nonzero price variable, income variable, and race variable coefficients over middle and rich household demand models. In contrast, prices and household-level variables did not consistently predict fruit quantity consumed in poor households. Overall, the LASSO replicates indicate that the variables typically used to predict or explain household consumption of organic food are least relevant among the poorest US households. This suggests that either 1) the poor had difficulty accessing organic fruit from 2011-2013 and therefore prices and household characteristics are irrelevant to their choices, 2) the decision to purchase organic fruit among poorer households is a function of omitted variables, or 3) some combination of both of these issues.

Impact of subsidy on conditional purchases and expenditures
For households that already purchase some amount of organic fruit i in a given month (i.e., conditional demand), an organic fruit subsidy would compel them to, on average, consume more organic apples, blueberries, and strawberries each month than they had previously, all else equal (see the monthly purchase columns in Table 4). Between these three fruits, conditional organic apple consumption would increase the most (in a relative sense) (Table 4, S4 Fig). Conversely, the conditional consumption of organic oranges would, on average, fall or stay flat, all else equal. Of the four fruits we focus on, organic apples are the only fruit type that habitual buyers of organics, regardless of income class, would, on average, spend more on after the subsidy than before (i.e., only organic apple conditional expenditure elasticities tend to be negative; Table 4, S5 Fig).
The conditional inframarginal gains from the subsidy are similar across household and fruit types ( Table 4). Regardless of estimation technique or fruit type, the lower income class household often saves just as much or more on the inframargin due to the subsidy as the middle or upper-class household. These results indicate that among households that already purchase some amount of organic fruit i in a given month, pre-subsidy purchase patterns of i were not correlated with household income.
Further, there is no clear pattern in the subsidy's impact on conditional purchases across income classes (see the purchase elasticities in Table 4 and S4 Fig). Across all three estimation techniques, the middle-income class household would, in a relative sense, increase their conditional consumption of organic apples as much or more than the other two household types, all else equal. Further, relative conditional purchases of blueberries would generally (but not universally) increase the most, on average, at the high-income class household. Conversely, we found that the representative lower income household would, on average, be as responsive if not more responsive to the subsidy than the representative high-income class household in terms of conditional organic orange and strawberry purchases ( Table 4, S4 and S5 Figs). In conclusion, relative reaction to the subsidy is not consistently different across income types in households that already buy organic fruit (conditional demand).
Therefore, given similar pre-subsidy consumption patterns and no clear differences in purchase elasticities across household types that already buy organic fruit, the organic fruit subsidy would not be re-distributive at the consumption stage among the cohort of American households that already buy organic fruit. Of course, this assumes that no household class is particularly responsible for funding of the subsidy. If, for example, the subsidy was largely funded by the richer habitual buyers of organic fruit then the subsidy would re-distribute some welfare from richer to poorer households.

Impact of the subsidy on unconditional purchases and expenditures
Considering all households, not just those that already buy organic fruit i in a given month (i.e., unconditional demand), an organic fruit subsidy would compel them, on average, to consume more organic blueberries and strawberries each month than they had previously, all else equal (Table 5, S4 Fig). In fact, several predictions of the relative change in unconditional quantity demanded of organic strawberries are the largest relative changes we observe across all subsidy counterfactuals. Further, across all the unconditional fruit and estimation method combinations we model, only the econometrically derived predictions of change in organic strawberry quantity demanded indicate that typical US households, regardless of income class, would unconditionally spend more on organic fruit after the subsidy than before (see the unconditional monthly expenditure elasticity graphs in S4 Fig). Almost all other unconditional expenditure elasticity predictions indicate that a 10% drop in organic fruit prices would mean that a representative US household would (unconditionally) spend less on organic fruit i than before despite buying more of i in reaction to the subsidy. This pattern holds across all three income types (recall these results only hold for organic fruit purchased via a UPC).
Except for organic apples, unconditional purchase elasticities are larger, in absolute terms, than their conditional analogs. Further, again except for organic apples, unconditional expenditure elasticities are closer to zero (or in some cases, more negative) than their conditional analogs. This indicates that the subsidy would have a relatively larger impact on the purchasing patterns of typical US households than the subset of US households that already buy organic fruit (conditional demand). In other words, our analysis suggests that the subsidy would change the organic fruit buying behavior of the casual consumer more, in a relative sense, than it would the habitual consumer of organic fruit.
We found that a 10% subsidy would affect unconditional inframarginal savings across income classes differently. Specifically, upper income households typically would save more on their pre-subsidy organic fruit purchases given the subsidy than their lower and middle-income peers. This outcome simply reflects that when considering all households, not just habitual buyers of organic fruit, a typical rich household already buys more organic fruit than the typical household from the other two classes (Table 2). The gap between one rich household's inframarginal gains due to the subsidy and those of its representative peers is very small (less than one cent per month across all fruits), but if we sum this difference across all US households the aggregate differential across income classes is significant. For example, using the data on the number of US households in each income class from Table 3 and the inframarginal gain predictions for strawberries from Table 5 we found that rich American households would gain approximately $2.65 million more in inframarginal savings per year than lower income American households. This is just considering strawberries.
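The aggregation behind such a figure is simple arithmetic: a per-household monthly gain, times twelve months, times the number of households in the class. The sketch below uses placeholder inputs only; the per-household gains and household counts are hypothetical, are not the values in Tables 3 and 5, and are not intended to reproduce the $2.65 million figure.

```python
def annual_aggregate_gain(monthly_gain_per_household, n_households):
    """Aggregate annual infra-marginal savings for one income class ($)."""
    return monthly_gain_per_household * 12 * n_households

# Placeholder inputs (hypothetical, for illustration only).
rich_monthly_gain = 0.012    # $ per household per month
poor_monthly_gain = 0.005    # $ per household per month
rich_households = 30_000_000
poor_households = 30_000_000

rich_total = annual_aggregate_gain(rich_monthly_gain, rich_households)
poor_total = annual_aggregate_gain(poor_monthly_gain, poor_households)
differential = rich_total - poor_total

print(f"annual differential: ${differential:,.0f}")
```

Even a per-household gap of well under one cent per month becomes a multi-million dollar annual differential once it is scaled by tens of millions of households.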
Unlike the conditional response to the subsidy, unconditional demand for organic fruit is such that a 10% organic fruit price subsidy would, on average, generate relatively different reactions across income classes. Our analysis suggests that the subsidy would generally cause monthly organic fruit consumption at the representative poor or middle-income class household to increase relatively more than at the rich household, all else equal (Table 5, S4 and S5 Figs). This is unequivocally the case for organic apples, blueberries, and strawberries where, across all estimation methods, the lower or middle-income representative household always has a larger (more negative) expenditure elasticity for these three fruits than the rich representative household.
In conclusion, considering all households, not just those that already buy organic fruit, their overall reaction to the subsidy is differentiated by income class. First, given that they already tend to buy more organic fruit than other household types, the bulk of unconditional inframarginal savings would accrue to upper income class households. However, rich households would not react as strongly to the subsidy as the other two household types; across almost all fruit and demand estimation technique combinations we found the largest (most negative) unconditional monthly purchase elasticity at the representative poor and/or middle-income class household. Whether the subsidy re-distributes unconditional surplus would ultimately depend on the tax incidence of the funds used to support the subsidy. A broadly applied tax or one that mostly relied on contributions from wealthier households would mean a re-distribution of surplus to poorer and middle-class households.

Comparing estimation methods
A comparison of subsidy impacts across the three estimation techniques reveals three discernible patterns. First, the conditional inframarginal savings generated with the LinQuad system are always a bit higher than the conditional inframarginal savings generated with the Heckman-like and LASSO single equations, typically by about 10% to 25%. In other words, conditional organic fruit demand at pre-subsidy prices is always a bit higher according to the system-estimated demand equations than we predict with the other two methods. Second, the LASSO equations almost always find larger unconditional inframarginal savings than the other two methods (the exception to this trend is unconditional organic orange demand), but the difference is typically less than a penny for each fruit category, on average. Third, the LASSO-generated elasticity predictions almost always have standard errors that are as small as or smaller than the standard errors on elasticity predictions generated by the other two methods. However, we do not find that purchase and expenditure elasticities generated with one method are systematically smaller or larger than the purchase and expenditure elasticities generated with the other two methods.
As we mentioned above, empirical economists have rarely used ML techniques like LASSO because they generate 'agnostic' predictive models rather than models of inference. However, because our aim was to predict household reaction to a subsidy we were less interested in inference and therefore open to models primarily used for prediction. We were concerned that the LASSO-generated demand equations would include many fruit price variable coefficients equal to 0. If this had happened we would either have to conclude that fruit prices largely do not play a role in organic fruit consumption (highly doubtful) or that the LASSO method was not appropriate given our research aims. For the most part, however, the LASSO-estimated demand equations did include non-zero price coefficients, particularly in the apple, blueberry, and strawberry selection stages (Eq (9)). Own organic and conventional prices, as opposed to other fruit prices, were particularly likely to have non-zero price coefficients (S7 and S8 Tables). In the few LASSO-estimated demand equations where prices were not found to be very predictive of purchasing behavior (this was the case for several low-income household demand equations and for several organic orange demand equations regardless of income class) we, not surprisingly, found LASSO-estimated purchase elasticities of zero or near zero and expenditure elasticities of one or near one.

Conclusions
We predicted the impact of an organic fruit subsidy on patterns of organic fruit consumption and expenditures across US households of different income classes. We were particularly interested to determine if the subsidy would, on average, favor households from a certain income class. If we estimated significantly larger inframarginal savings and a particularly elastic response to the subsidy in one class it would suggest that the subsidization would re-distribute some social welfare to that class, especially if the revenue for the subsidy came from the general taxpayer.
In contrast to other CT subsidization, which has been found to be strongly regressive at the consumption stage, such as roof-top solar in Australia [42] or electric cars in the US [45,74], we find evidence that organic fruit subsidization in the US would, if anything, change the relative consumptive behavior of poor and middle-class households more than that of richer households. Of course, because richer households in the US tend to buy more organic fruit than poorer households already, the inframarginal savings from the subsidy would tend to be larger in richer households. However, this inframarginal savings advantage enjoyed by richer households in general disappears among the smaller subset of households that habitually buy organic fruit (conditional demand): conditional inframarginal savings generated by fruit subsidies are relatively equal across all three income classes and fruit types.
While we find that most households would buy a bit more organic fruit with a price subsidy, whether the subsidy would generate net welfare across households is an open question. Ideally, we would evaluate a representative household's welfare gains from the subsidy using an explicit household utility function. However, the Heckman-like and LASSO single equation estimation methods do not specify a utility function and estimating welfare changes with the LinQuad system proved problematic due to the large number of zero demanders. Of course, consumer surplus is not the only benefit the subsidy would create. The subsidy could also generate some positive farm worker and consumer health, environmental, and rural equity externalities via the promotion of a CT. For example, if organic food is healthier and more nutritious to eat than conventional fruit or other foods, then a subsidy would also promote better health among the wider US population. On the negative side, raising revenue for the subsidy would generate deadweight loss in the US economy that would need to be set against the potential welfare gain of any positive externalities. It is our hope that future research addresses questions over the net welfare impact of organic food subsidization.
Our approach to estimating the impact of organic fruit subsidization across different household income classes is limited in several ways. First, data on price paid, weight or number of fruit items bought, or the organic / conventional status of the fruit for purchases not completed with a UPC can be missing or unreliable. Therefore, we did not include non-UPC fruit purchases in our dataset. Not only does this mean that our dataset misses all fruit purchases made at grocery stores that did not involve the use of a UPC, but fruit purchases made at delis, farmers' markets, road-side stalls, restaurants, etc., are also largely missing from our dataset. (This latter issue is less concerning given, as we mentioned above, 92 percent of organic food is purchased through grocery stores and natural food stores [75].) It is not clear how different our simulated policy impacts would be if we included fruit purchased without a UPC in our fruit dataset. For example, while approximately 40% of all fresh produce is bought with a UPC [47], if 20% of organic fruit is bought with a UPC but 50% of conventional fruit is bought with a UPC then our simulated results could be biased.
Second, we estimated the average monthly fruit prices each household faced by using observed prices from the household's home market. When no prices were observed, relevant prices were imputed (S2 Supporting information). This process introduces additional unmeasured error into our estimates. However, given that the LASSO model usually assigned nonzero coefficients to the price variables and most price elasticities had expected signs, it appears that this additional error did not eliminate the role that prices should and did have in explaining consumer behavior over organic fruit. We must note, however, that LASSO estimates over data with measurement error are prone to attenuation bias (estimates are closer to 0 than they really are [76]).
Third, when estimating organic fruit consumption we do not weight households by their projection factors. Therefore, it is not clear how well the estimated demand functions and policy simulations capture the universe of US households rather than just the set of sampled households. If we assume the estimated demand functions are representative of the universe of US households we can measure the error in not using the projection factors by comparing estimated representative household demand for fruit i with and without projection factors. When we average predicted monthly consumption of i across all $km \in z$ with and without projection factor weights the predictions are generally within five to ten percent of one another (S9 and S10 Tables). The only exceptions to this pattern are found in some of the unconditional demand estimates across poor households. This finding adds weight to our earlier conclusion that our models do less well in describing demand for organic fruit among the poorer US households. Otherwise, assuming the estimated demand functions are representative of the universe of US households, our policy simulations are a reasonable guide to the impact of fruit subsidization across the US, not just across the sampled households.
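The comparison of weighted and unweighted averages can be sketched as follows. The predicted quantities and projection factors are hypothetical values chosen for illustration; they are not drawn from the Consumer Panel data or from S9 and S10 Tables.

```python
def mean(values):
    """Simple (unweighted) average."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Average of values weighted by projection factors."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical predicted monthly consumption (ounces) for four household-months.
predictions = [4.0, 6.0, 5.0, 7.0]
# Hypothetical projection factors (number of US households each represents).
projection_factors = [1200.0, 800.0, 1500.0, 500.0]

unweighted = mean(predictions)
weighted = weighted_mean(predictions, projection_factors)
relative_gap = abs(weighted - unweighted) / unweighted

print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
print(f"relative gap: {relative_gap:.1%}")
```

A small relative gap suggests that ignoring the projection factors changes the representative household prediction very little, while a large gap would signal that the sample over- or under-represents some household types.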
Finally, we assume that monthly price fluctuations in fruit over the 2011-2013 period are determined by exogenous supply fluctuations. To the extent that prices are endogenous, our coefficients will be biased. One might consider using an IV method to instrument for prices, but we lack data on variables that could serve as clean instruments: those that affect only supply and are unaffected by demand residuals.
year t projection factor to arrive at market totals. This figure only includes fruit purchased with a Universal Product Code (UPC). In almost every market the poor and middle-class spend more than their share on fruit compared to richer households. It is likely that some of these uneven expenditure patterns across income spectrums are explained by the well-to-do's tendency to buy more food at restaurants, delis, and farmers' markets and less on groceries from traditional retail outlets relative to other household types (The JPMorgan Chase Institute 2016). In other words, households from the upper percentiles had less opportunity to buy fruit from stores, typically the place where fruit is sold via UPCs.
S9 Table. The average expected household monthly consumption of organic fruit i across all household-months $km \in z$ where $e_{ikm} > 0$ (i.e., conditional demand) when each $km \in z$'s expectation is not weighted and is weighted with each $km \in z$'s projection factor. (DOCX)
S10 Table. The average expected household monthly consumption of organic fruit i across all household-months $km \in z$ (i.e., unconditional demand) when each $km \in z$'s expectation is not weighted and is weighted with each $km \in z$'s projection factor. (DOCX)
S1 Supporting information. Net return to organic farming versus conventional farming.