Taking dietary habits into account: A computational method for modeling food choices that goes beyond price

Computational models have gained popularity as a predictive tool for assessing proposed policy changes affecting dietary choice. Specifically, they have been used for modeling dietary changes in response to economic interventions, such as price and income changes. Herein, we present a novel addition to this type of model by incorporating habitual behaviors that drive individuals to maintain or conform to prior eating patterns. We examine our method in a simulated case study of food choice behaviors of low-income adults in the US. We use data from several national datasets, including the National Health and Nutrition Examination Survey (NHANES), the US Bureau of Labor Statistics and the USDA, to parameterize our model and develop predictive capabilities in 1) quantifying the influence of prior diet preferences when food budgets are increased and 2) simulating the income elasticities of demand for four food categories. Food budgets can increase because of greater affordability (due to food aid and other nutritional assistance programs), or because of higher income. Our model predictions indicate that low-income adults consume unhealthy diets when they have highly constrained budgets, but that even after budget constraints are relaxed, these unhealthy eating behaviors are maintained. Specifically, diets in this population, before and after changes in food budgets, are characterized by relatively low consumption of fruits and vegetables and high consumption of fat. The model results for income elasticities also show almost no change in consumption of fruit and fat in response to changes in income, which is in agreement with data from the World Bank’s International Comparison Program (ICP). Hence, the proposed method can be used in assessing the influences of habitual dietary patterns on the effectiveness of food policies.


Introduction
Computational models can serve as powerful tools that complement existing evidence-gathering techniques to develop human health policies [1]. These models can simulate multiple scenarios of policy change at low risk and cost. An important application for such models is in the area of food policy, where we can study the dietary effects of interventions such as food aid or nutritional assistance programs [2][3][4]. These policy-driven interventions are generally considered as our major tools to improve health outcomes such as obesity and malnutrition [5].
One technique that has been widely used to study dietary choices is linear programming (LP) [6]. Specifically, LP has been used to generate simulated diets through algorithms that minimize the deviation from average population dietary patterns [7] while satisfying budgetary constraints [8]. LP has been frequently used to assess the economic feasibility of recommended diet guidelines [9,10] and explore approaches to improve the nutritional impact of food aid policies [11][12][13][14].
LP models are appealing, particularly in regard to their mathematical optimization capabilities: they always converge to a unique solution, which is the diet with the minimum deviation from a target diet [15]. Hence, different initial diets that reflect variations in prior eating habits would lead to the same result for the final diet under the LP algorithm. While solution uniqueness is a strength in many respects, in some circumstances it can be a limitation, particularly when history-dependence is of interest. Studies have shown that food preferences develop throughout life in response to food experiences and attitudes [16], starting from neurological and physiological development during childhood [17,18] or even before birth [19]. In other words, taste preferences and eating habits are formed early in life and persist into adulthood, resulting in dietary habits that are resistant to change. In addition, there has been an increasing realization of the role of the brain reward system in affecting both liking (palatability) and wanting (appetite) of food [20]. These changes affect various hedonic, cognitive and homeostatic aspects of food choice [21]. Thus, habitual dietary choices have important influences that may be highly relevant when modeling responses to policy changes.
In this work, we demonstrate how an agent-based model (ABM) can be used for the analysis of dietary choices under budgetary constraints. An ABM is a software tool wherein a collection of simulated agents, representing individuals, are endowed with characteristics and behaviors that reflect the demographics and behaviors of a population of interest. While ABMs have only recently been introduced in the field of public health nutrition, they are gaining acceptance as a tool that provides insights that complement other existing tools such as LP [22][23][24]. In the authors' previous work on agent-based modeling [25], dietary behaviors of individuals from low-income families were examined. The goal of that work was to evaluate the effects of three price metrics (prices in relation to calories, servings and weight) on food choices. It was found that when simulated individuals in the ABM made food decisions based on price per calorie, the results most closely matched the average observed intake of low-income individuals.
In the present paper, we extend this ABM to examine the effects of changes in food budget constraints on dietary choices. We consider two scenarios. In the first, food budgets of low-income individuals are increased to the budget corresponding to the cost of the mean diet for all income groups, as determined from the National Health and Nutrition Examination Survey (NHANES). The purpose of this analysis is to examine the extent to which prior dietary habits of low-income individuals would affect dietary behavior when food budget constraints are relaxed. In the second scenario, we analyze the income elasticity of demand. The income elasticity of demand for each of the main food categories is defined as the percent change in the amount of consumed food in response to a one-percent increase in income. We compute the income elasticities of demand for four food groups (milk, grains, fruit and fats), which are chosen so that they can be compared with the available income elasticities reported by the USDA. Here, the changes in the quantity of the consumed food are considered separately for each food category. Both scenarios examine the effects of food budget changes; the primary difference is that an incremental change is examined in the second scenario.

Materials and methods
We present our method through a case study based on dietary behaviors of low-income adults in the US. Each agent is assigned an age, gender and BMI, which are randomly generated with means and variances that match the population with the lowest 41% of income in the NHANES data set, as described in more detail in the next section. In our model, each simulated individual (agent) adjusts her/his diet while trying to keep it as close as possible to the mean diet. By mean diet, we refer to the average consumption pattern of our population, given by the average calorie intake for the main food groups (as defined by the USDA). In previous studies of the behavioral influences of food prices, the implicit assumption is that individuals are budget optimizers who consider price-calorie tradeoffs to determine their diets [7][8][9][10][11][26][27][28]. This assumption is based on observations that show that people change their sources of calories, rather than their total caloric intake, when faced with changes in food budgets or prices [7]. It has also been reported that low-income individuals maintain their identity and self-respect by retaining familiar dietary patterns instead of purchasing the least expensive source of nutrients to achieve a healthy diet [7,29].
Based on these observations, we follow the process shown in Fig 1A in our ABM to simulate an individual agent's diet choices under a food budget constraint. Here, an agent's initial diet is set equal to the mean diet, and is then iteratively and incrementally altered until its cost is less than the agent's food budget. The iteration begins by randomly choosing an increasing food category (C-I) and a decreasing food category (C-D).

Fig 1. (B) Reversed scenario: agents begin with the diet derived from part (A) and adjust food intake when the food budget is increased to be equal to the cost of the mean diet. The difference between the two scenarios is shown using dotted circles, and boxes similar to the Normal Scenario are shown in gray.

The probability $p_i$ of food category $i$ being chosen as the increasing food category C-I ($i = 1, \dots, N$, where $N$ is the number of food categories) is calculated by:
$$t_i = m_i e_i, \qquad p_i = \frac{t_i}{\sum_{k=1}^{N} t_k},$$
where $m_i$ refers to the average percentage of calorie intake from food category $i$. The term $e_i$ is the absolute value of the own price elasticity of demand for food category $i$, defined as the percentage change in the demand for food category $i$ in response to a one-percent rise in the price of that food category. (The absolute value is used because the own price elasticity of demand is negative.) Variable $t_i$, which is the product of $m_i$ and $e_i$, is divided by its sum to obtain the final set of probabilities $p_i$. The probability $p'_j$ of food category $j$ being chosen as the decreasing food category C-D proceeds with a similar set of expressions:
$$m'_j = \frac{1 - m_j}{N - 1}, \qquad t'_j = m'_j e_j, \qquad p'_j = \frac{t'_j}{\sum_{k=1}^{N} t'_k}.$$
It can be seen that $m'_j$ acts in a complementary manner in relation to $m_j$, in that an increase in $m_j$ results in a decrease in $m'_j$. It is noted that since $\sum_{i=1}^{N} m_i = 1$, we will also have $\sum_{j=1}^{N} m'_j = 1$. A higher $m'_j$ corresponds to a greater likelihood of being chosen as the decreasing category; as before, this likelihood is adjusted by multiplying by the price elasticity $e_j$.
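The selection probabilities described above can be illustrated with a short Python sketch. It is not the authors' NetLogo implementation. The elasticity values are hypothetical placeholders (the paper takes its values from [39] and [40]), and the complementary share is assumed to be (1 - m_j)/(N - 1), a form chosen here because it falls as m_j rises and sums to one whenever the mean diet shares sum to one.

```python
# Mean diet (% of energy intake) for the nine food categories, as listed
# in the text: milk, meat, eggs, legumes, grains, fruit, vegetables,
# fats, sugars.
MEAN_DIET_PCT = [10.7, 18.6, 1.9, 3.1, 33.4, 4.8, 7.8, 3.0, 16.6]

# HYPOTHETICAL absolute own-price elasticities, for illustration only.
# These are placeholders, not the values used in the paper.
ELASTICITY = [0.6, 0.7, 0.3, 0.7, 0.6, 0.7, 0.6, 0.5, 0.8]


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def increase_probabilities(m, e):
    """p_i proportional to t_i = m_i * e_i (increasing category C-I)."""
    return normalize([mi * ei for mi, ei in zip(m, e)])


def decrease_probabilities(m, e):
    """p'_j proportional to t'_j = m'_j * e_j (decreasing category C-D).

    ASSUMPTION: m'_j = (1 - m_j) / (N - 1), so that a larger mean share
    gives a smaller chance of being picked for a decrease.
    """
    n = len(m)
    m_comp = [(1.0 - mj) / (n - 1) for mj in m]
    return normalize([mc * ej for mc, ej in zip(m_comp, e)])


m = normalize(MEAN_DIET_PCT)  # fractions of energy intake, summing to 1
p_inc = increase_probabilities(m, ELASTICITY)
p_dec = decrease_probabilities(m, ELASTICITY)
```

With these placeholder elasticities, grains (the largest share of the mean diet) receive the highest probability of being selected as C-I.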
Once the two food categories are chosen, a candidate for an iterative change in diet is determined by increasing the amount of energy received from food category C-I by a small amount (ε%), while simultaneously decreasing food category C-D by the same amount. This candidate change in diet is accepted only if the price (per calorie) of the first food category is lower than that of the second; otherwise the food category choice algorithm is repeated. In this manner, the total net caloric intake remains unchanged, while the net cost of the diet decreases because of the differential pricing of the two food categories.
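The iteration described above can be sketched in Python as follows. This is an illustrative reading of the procedure rather than the published NetLogo code. The three-category prices, the diet, the budget and the selection weights are hypothetical, ε is treated as a fixed fraction of total energy intake, and the percentile bounds on each category are omitted for brevity.

```python
import random


def diet_cost(shares, price_per_100kcal, energy_kcal):
    """Daily cost of a diet given as fractions of total energy intake."""
    return sum(s * energy_kcal / 100.0 * p
               for s, p in zip(shares, price_per_100kcal))


def adjust_to_budget(shares, price, energy_kcal, budget, w_inc, w_dec,
                     eps=0.001, max_draws=1_000_000, seed=0):
    """Shift energy between categories until the diet fits the budget."""
    rng = random.Random(seed)
    shares = list(shares)
    idx = range(len(shares))
    for _ in range(max_draws):
        if diet_cost(shares, price, energy_kcal) <= budget:
            break
        i = rng.choices(idx, weights=w_inc)[0]  # candidate C-I
        j = rng.choices(idx, weights=w_dec)[0]  # candidate C-D
        # Accept only if C-I is cheaper per calorie than C-D and C-D has
        # enough energy left to give up; otherwise draw again.
        if price[i] < price[j] and shares[j] >= eps:
            shares[i] += eps
            shares[j] -= eps
    return shares


# Toy example with HYPOTHETICAL numbers (three categories only).
price = [0.10, 0.30, 0.50]      # $/100 kcal
start = [0.40, 0.30, 0.30]      # starting diet, fractions of energy
final = adjust_to_budget(start, price, 2000, budget=5.0,
                         w_inc=[1, 1, 1], w_dec=[1, 1, 1])
```

Because every accepted change moves the same amount of energy from a more expensive to a cheaper category, total energy intake is preserved while the cost falls monotonically.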
This algorithm uses the mean diet as one measure of the preferences while incorporating the influences of the price elasticity of demand, which reflects the willingness of an individual to change food purchases when faced with a price increase. Individuals are more resistant to changing consumption of foods with lower price elasticities. This means that for a food category i, a higher m i (average percentage of calorie intake) would result in a higher probability of being chosen as the C-I category, but this probability is adjusted by multiplying by e i (absolute value of the own price elasticity of demand) to account for the resistance to change the intake of this food category. It is noted that the use of the mean diet to determine the probability of choosing food categories implicitly considers other factors that affect food choices such as taste, convenience and cultural considerations, since all of these factors have already contributed to the diets that were used to calculate the mean diet.

Study population, design, and data sources
To assess the performance of our proposed method, we have simulated the food consumption patterns of the adult (>20 years) US population in the year 2001 using our ABM. The size of our agents' population was set to 201 million, equal to the number of adults in the US in April 2000 [30]. Age and gender of the simulated individuals are assigned based on the US Census data [30], while income and food expenditure data are assigned based on the US Bureau of Labor Statistics (BLS) dataset for 2001 [31]. The non-food portion of food expenditures (such as tips, labor and restaurant taxes) was excluded from the BLS food expenditure values, based on US Department of Agriculture (USDA) datasets on the expenditure of food away from home (FAFH) [32,33]. This adjustment was necessary because our data on food price (described below) only relates to the price of food itself. In this model, since our focus is on the changes in dietary patterns associated with budgetary factors, we assume that all food is prepared at home.
Each agent is assigned a value for average daily energy intake (EI). Based on the age and gender of each agent, its EI was drawn from the EI distributions as calculated from the NHANES 2001-2002 dataset. These EI values were sampled from 6 normal distributions as reported by Ford and Dietz [34] for the following subpopulations: 2 genders × 3 age groups (i.e., 20-39 yr, 40-59 yr and 60-74 yr). Diets are represented as a list of numbers containing the percentages of EI from the nine major food categories. These nine categories, as indicated by the USDA [35], are: 1) milk and milk products; 2) meat, poultry, and fish; 3) eggs; 4) dry beans, legumes, nuts and seeds; 5) grain products; 6) fruit; 7) vegetables; 8) fats, oils, and salad dressings; and 9) sugars, sweets, and beverages. The mean diet list is denoted as [m_1, m_2, ..., m_9] in this paper. To calculate the mean diet, the average proportion of total EI from each of the nine major food categories was derived from the NHANES 2001-02 data [36]. We used the food code variable in the dietary food recall dataset of the same NHANES survey [35] to identify food categories. For adults, the calculated mean diet was: [10.7, 18.6, 1.9, 3.1, 33.4, 4.8, 7.8, 3.0, 16.6]. To avoid generating uncommon and unrealistic diet patterns in our model, the maximum possible value for each category of food is set to the 85th percentile of the NHANES population. Similarly, the minimum value is set to the 15th percentile.
The mean price per calorie ($/100 kcal) of each of the nine food categories is obtained from the work by Drewnowski, et al. [37]. This data was originally calculated using the USDA Food and Nutrient Database for Dietary Studies 1.0 (FNDDS 1.0) [36] and the Center for Nutrition Policy and Promotion (CNPP) food prices database [38], all for the year 2001. Additionally, food price elasticities for all food categories were obtained from Andreyeva et al. [39], except for the beans category, which was obtained from [40].

Experiments
We begin with a set of validation simulations in which we compare the results of our model with dietary patterns observed in the NHANES datasets. The purpose of this comparison is to ensure that our model is able to correctly simulate the food choices of individuals. We compare the food consumption patterns of our simulated individuals who had the lowest 41% of income against similar individuals in the NHANES 2001-2002 dataset. To identify the individuals with the lowest 41% of income from the NHANES dataset, we used the total family income variable in NHANES. Agents use the process shown in Fig 1A to adjust their diets based on their food budget. It should be noted that the NHANES dataset does not include food expenditure data; hence, we used the BLS dataset [31] for assigning both incomes and food expenditures of our simulated individuals.
The next set of simulations is designed to evaluate the ability of our method to account for the behavioral influence of prior diets. We begin by running our model until the diets of the agents are adjusted according to their food budget (which is essentially the same as our validation simulations described above). We then increase the food budget of the individuals in the lowest 41% income bracket so that the food budgets of these agents are set equal to the cost of the mean diet, and then continue running the model until the cost of their diets reaches the new (increased) food budgets. When the food budgets are increased, the diet adjustment follows a similar logic as the process that was presented in Fig 1A. Details of this process are shown in Fig 1B. Instead of trying to decrease the cost of her/his diet, an agent changes her/his diet to increase its overall cost. The values for the increasing (C-I) and decreasing (C-D) food categories are updated only if the price of C-I is greater than that of C-D; otherwise the food category choice algorithm is repeated. After the simulation is complete, we check our results to see if the resulting final diets after the increase in food budget are affected by prior diets. If our model is insensitive to prior diets, then the final diet would be similar to the mean diet; otherwise, the final diet would be significantly different.
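The reversed adjustment can be sketched in Python in the same spirit. The code below is a hedged illustration, not the published model: the prices, diets and uniform selection weights are hypothetical, ε is treated as a fixed fraction of total energy intake, and the percentile bounds on each category are left out.

```python
import random


def cost(shares, price, energy_kcal):
    """Daily diet cost; prices are in $/100 kcal."""
    return sum(s * p for s, p in zip(shares, price)) * energy_kcal / 100.0


def swap_is_accepted(price_inc, price_dec, budget_raised):
    """Normal scenario: C-I must be cheaper than C-D.
    Reversed scenario (budget raised): C-I must be more expensive."""
    if budget_raised:
        return price_inc > price_dec
    return price_inc < price_dec


def raise_to_budget(shares, price, energy_kcal, new_budget, w_inc, w_dec,
                    eps=0.001, max_draws=1_000_000, seed=1):
    """Shift energy toward pricier categories until the new budget is met."""
    rng = random.Random(seed)
    shares = list(shares)
    idx = range(len(shares))
    for _ in range(max_draws):
        if cost(shares, price, energy_kcal) >= new_budget:
            break
        i = rng.choices(idx, weights=w_inc)[0]  # candidate C-I
        j = rng.choices(idx, weights=w_dec)[0]  # candidate C-D
        if swap_is_accepted(price[i], price[j], True) and shares[j] >= eps:
            shares[i] += eps
            shares[j] -= eps
    return shares


# Toy example with HYPOTHETICAL numbers (three categories only).
price = [0.10, 0.30, 0.50]
mean_diet = [0.40, 0.30, 0.30]
constrained = [0.70, 0.20, 0.10]      # diet left by the constrained phase
target = cost(mean_diet, price, 2000)  # new budget = cost of the mean diet
relaxed = raise_to_budget(constrained, price, 2000, target,
                          w_inc=[1, 1, 1], w_dec=[1, 1, 1])
```

The relaxed diet matches the mean diet in cost, but nothing in the procedure forces its composition back to the mean diet, which is the path dependence the experiment is designed to expose.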
We conclude with a set of income elasticity experiments, in which we simulate the income elasticity of demand by using our ABM, and then compare the results to US data. Income elasticity of demand (not to be confused with the price elasticity of demand) refers to the percent change in the demand of a certain good in response to a 1% increase in income. We use our model to calculate the income elasticity of demand for four food categories: milk, grains, fruit and fats, which were compared with the available income elasticities in the World Bank's International Comparison Program (ICP), as reported by the USDA [41]. These four categories were selected because these were the only four food categories in our analysis that had corresponding matches in the ICP dataset. We also developed a linear programming (LP) method to compare techniques for simulating the income elasticity of demand. This LP model uses the same set of inputs as our ABM, and is based on the minimization of the distance of the agents' diets from the mean diet. Other constraints have also been added to this method, including minimum and maximum allowable values for the consumption of each category of food. More details about the LP model are provided in the S1 Text.
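The definition of income elasticity used above translates into a short calculation. The following Python sketch uses hypothetical function names and made-up numbers. It only illustrates the arithmetic and the normal/inferior classification, not how the ABM produces the consumption values.

```python
def income_elasticity(q_before, q_after, income_before, income_after):
    """Percent change in quantity divided by percent change in income."""
    pct_quantity = (q_after - q_before) / q_before
    pct_income = (income_after - income_before) / income_before
    return pct_quantity / pct_income


def classify_good(elasticity):
    """Normal goods rise with income; inferior goods fall."""
    if elasticity > 0:
        return "normal"
    if elasticity < 0:
        return "inferior"
    return "neutral"


# HYPOTHETICAL example: income rises by 1% and consumption of a food
# category rises from 50.0 to 50.2 units, i.e. by 0.4%.
example = income_elasticity(50.0, 50.2, 100.0, 101.0)
label = classify_good(example)
```

In this example the elasticity is approximately 0.4, so the category would be labeled a normal good.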
The model was developed in the NetLogo environment [42]; the model and its source code are publicly available [43]. Because of the stochastic nature of ABMs, we run our models 100 times, and report the average and confidence intervals of the obtained results. Additional details about our ABM are provided in S2 Text. We formally test the null hypothesis that the simulated diets are identical to the NHANES mean diet, with the significance level set to 0.05.

Results
We focused primarily on the individuals with the lowest 41% of income. We report our simulation results for other income percentiles in S3 Text. Fig 2 shows the average percentage of energy intake from the nine major categories of foods for the simulated individuals who have the lowest 41% of income. In four of nine cases, the confidence intervals of the simulated results overlap with the average intake of individuals of this income bracket that is recorded in NHANES. We performed a series of t-tests on the simulated results, and at the 0.05 significance level, no statistically significant difference was found between the NHANES data and our simulated results for the Dairy, Grains, Fruits and Sugars food groups.
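Averaging over repeated stochastic runs and checking whether an observed value falls inside the resulting interval can be sketched as follows. The paper does not state how its confidence intervals were computed, so the normal-approximation interval (mean ± 1.96 standard errors) is an assumption of this sketch, and the run outputs are synthetic.

```python
import random
import statistics


def mean_and_ci95(values):
    """Mean and 95% confidence interval across independent runs.

    ASSUMPTION: normal approximation, mean +/- 1.96 * standard error.
    """
    n = len(values)
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / n ** 0.5
    return mean, (mean - half_width, mean + half_width)


def interval_contains(ci, reference):
    """True if a reference value lies inside the interval."""
    low, high = ci
    return low <= reference <= high


# SYNTHETIC stand-in for 100 model runs: simulated share of energy (%)
# from one food category. Real values would come from the ABM.
rng = random.Random(42)
runs = [rng.gauss(33.0, 1.5) for _ in range(100)]
run_mean, run_ci = mean_and_ci95(runs)
```

A category would be counted as matching the survey data in this sketch when `interval_contains(run_ci, observed_value)` is true for the observed NHANES value.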

Dietary response to increasing budgets
In this series of simulations, the budgets of agents in the lowest 41% income bracket were increased, and set equal to the cost of the mean diet. This simulates the effects of removing budgetary constraints on low-income individuals. The final diets generated by our method are compared with the NHANES mean diet in Fig 3. None of the nine simulated categories had 95% confidence intervals that overlapped with mean diet values, indicating significantly different results between the ABM-generated diet and the mean diet. In fact, after performing t-tests comparing the NHANES data with our model's results, statistically significant differences were observed in all 9 food groups (p<0.001 for Dairy, Meat, Eggs, Beans, Grains, Fats and Sugars and p<0.05 for Fruits and Vegetables). S4 Text shows the results obtained from a similar experiment using the LP method.

In the income elasticity experiments, the ABM results are close to the actual data, with overlapping confidence intervals. More importantly, the ABM results correctly identified the normal goods (i.e., those goods that are consumed at higher levels when the consumer experiences an increase in income, which in this case are milk and fruit), as well as the inferior goods (i.e., those goods that are consumed at lower levels, which in this case are grains and fats). In general, changes in income resulted in exaggerated dietary changes in the LP model, leading to overestimates in the magnitudes of the income elasticities of demand.

Discussion
Previous work has shown that dietary behaviors are resistant to change because of habits that have been built up over a lifetime [44]. Experiences affecting food choices start from pregnancy, and as children grow, parental modeling and familiarity play an important role in their developing food preferences [45]. Studies on older adults found that their food choices are strongly influenced by past experiences and beliefs [46]. Life course factors, which refer to personal roles and the social, cultural, and physical environments to which a person has previously been exposed, are suggested to be the major component of the food choice process [47].
For these reasons, we have developed an ABM for simulating dietary choices that is sensitive to prior dietary habits. We did observe slight increases in the consumption of Fruits and Vegetables after the increase of food budgets, which is in line with other studies on the outcomes of food assistance programs like the Supplemental Nutrition Assistance Program (SNAP) [48]. Nevertheless, our simulated results show that when the income of low-income individuals was increased so that they could afford the cost of the mean US diet, their diets did not approach the proportions found in the mean diet. Instead, the diets retained some of the unhealthy characteristics of the former diets under more constrained budgets. In particular, the consumption of Fruits and Vegetables was lower and the proportion of Fats was higher than what is found in the mean diet calculated from NHANES. In a relevant study, it was found that conditional cash transfer programs (a type of welfare assistance program) increased household fruit and vegetable consumption, but also led to excess energy consumption in poor communities in Mexico. An application of our method could be in determining those food categories that require special attention in food policies, as we have demonstrated variable resistances of low-income individuals in adopting healthier choices in different food categories. Our method can also be used as a virtual laboratory to test the potential outcomes of a range of candidate interventions, so that we can determine the interventions that maximize desired health outcomes, such as higher consumption of fruits and vegetables.
We have also observed that the income elasticity values of demand that were simulated by our model are closer to the World Bank ICP values as compared with LP results. One important reason for the difference between ABM and LP results is that the LP algorithm always leads to the same diet, regardless of the initial diet used in the procedure. The influence of initial diet in the LP analysis is only on the computation time required to arrive at the final result. It is noted that the uniqueness of the LP result for each level of the income constraint is mathematically appealing, and is an attribute that is suited for many studies of dietary behaviors [3,8]. For instance, LP models have been proven to be useful because they can isolate the effects of economic and other constraints and examine their impact on food selection [6]. ABMs, which have been used by researchers to study other areas of nutrition [22,23,49,50], can complement LP methods when complex effects of individual and environmental characteristics are of interest. These complex effects can include prior diets, which is the focus of the present study, or other behaviors, such as food dependence on high-fat and high-sugar foods, which have been implicated as a driver of obesity [51]. Hence, policy makers interested in analyzing the potential effects of candidate interventions and policies would benefit from the complementary information they can gain from both LP and ABM models. When compared with other commonly used methods, the only extra data requirement for the proposed method is the price elasticity of demand. These data are easily accessible in reports such as [39] and [40].
The present study has some limitations. The values for the price elasticity of different food categories were determined in terms of consumption patterns of an entire population (including both low- and high-income individuals); however, low-income individuals tend to be more price sensitive, with slightly higher price elasticities. Additionally, our model was designed with a set of simple rules for the dietary choices. While we believe that the current design suffices for the purpose of this work (i.e., assessing the effects of habitual dietary tendencies), in future investigations, more complex models will be needed if additional behaviors are of interest.