Testing hypothetical bias in a choice experiment: An application to the value of the carbon footprint of mandarin oranges

This study investigates “hypothetical bias,” defined as the difference in the willingness to pay for a product attribute between hypothetical and non-hypothetical conditions in a choice experiment, for the carbon footprint of mandarin oranges in Japan. We conducted the following four treatments: a non-hypothetical lab economic experiment, a hypothetical lab survey, a hypothetical online survey, and a hypothetical online survey with cheap-talk. Each treatment asked participants to choose one of three oranges based on price and carbon emissions level. Next, participants were asked to answer questions on demographics and the following three kinds of environmental factors: environmental consciousness, purchasing behavior for goods with eco-labels, and daily environmental behavior. Using the random parameter logit model, the willingness to pay per 1g of carbon emission reduction was 0.53 JPY, 0.52 JPY, 0.54 JPY, and 0.58 JPY in the non-hypothetical lab economic experiment, hypothetical lab survey, hypothetical online survey, and hypothetical online survey with cheap-talk, respectively. The complete combinatorial test of the willingness to pay for carbon emission reductions indicates no hypothetical bias between any pair of treatments. Our findings reveal that environmental attributes for food are less likely to show hypothetical bias than other goods. The results of the main effect with an interaction term show that environmental consciousness reduces the coefficients of carbon emissions in all treatments. Therefore, a psychological scale is useful for showing whether hypothetical bias emerges with treatment or participants’ personal backgrounds.


Introduction
The choice experiment (CE) is a type of stated preference method [1] that elicits consumer preferences based on various scenarios, unlike the revealed preference method. In a CE, participants choose one of several alternatives based on different levels of attributes. To measure consumers' willingness to pay (WTP), the CE is a useful method for examining how consumers choose products and services, such as food and beverages, transportation, environment, and health [1,2]. The CE is often employed as part of a survey in which participants do not buy goods or services; this method is therefore called a "hypothetical CE." Conversely, a "non-hypothetical CE" includes participants with monetary incentives applied through the experimental economics method developed by Vernon Smith based on the principles of the induced value theory [3], in which participants are awarded cash for their performance in an experiment and cannot be deceived. Thus, in a non-hypothetical CE, the experimenter provides real goods, including the attributes they offer, and participants receive money equal to the endowment minus the price of the goods they chose in the CE [4,5].
Comparing hypothetical and non-hypothetical CE methods reveals hypothetical bias (HB) [6]. Here, HB means that WTP for an attribute in a non-hypothetical CE is lower than that in a hypothetical CE. This concept was originally discussed in relation to the contingent valuation method (CVM) [7][8][9]: the inclusion of cheap-talk scripts in the CVM [10] can make participants more cautious and prevent HB. Therefore, this CE study considers HB to be present if the WTP for an attribute in the hypothetical CE with cheap-talk scripts is lower than that in the hypothetical CE. Table 1 summarizes previous studies on HB in the context of food by employing CE. Notably, Lusk and Schroeder were the first to study HB. In the case of beef steak, they found HB in total WTP, but not in marginal WTP [6]. Carlsson, Frykblom, and Lagerkvist found HB in WTP for genetically modified organisms (GMO) in chicken and beef [11]. Chang, Lusk, and Norwood observed HB toward organic beef and wheat; however, they forecasted market share and did not estimate WTP [12]. Similarly, Yue and Tong showed HB toward organic and local tomatoes [13]. Also notable is that Aoki, Shen, and Saijo revealed HB toward sodium nitrite in ham sandwiches [4]. Additionally, De-Magistris, Gracia, and Nayga Jr. found HB toward organic food and food miles in the case of almonds, but this HB disappeared when cheap-talk scripts and their original statements were included [14]. Grebitus, Lusk, and Nayga found HB toward apples and wine as well as food miles with regard to total WTP, conditional on participants' personal traits; however, they did not show marginal WTP [15]. Moser, Raffaelli, and Notaro found no HB toward the reduced impact of the climate on apples [16]. Alemu and Olsen, employing customized cheap-talk scripts and an augmented opt-out option, found HB toward cricket powder in bread [17]. Liebe et al. found HB toward organic and fair trade in tea in online environments [18]. 
Last, Wuepper, Clemm, and Wree found HB toward water footprint labels in coffee with regard to the marginal effect in online shops [19].
In addition to these food studies, several CE studies have examined HB in environmental projects. Carlsson and Martinsson, the pioneers of testing HB in CE, did not find HB in this context [20]. Meanwhile, Araña and León investigated the dynamics of HB in environmental projects over several months and found that while HB was observed in the first month, it disappeared in the second month [21]. Johansson-Stenman and Svedsäter found HB for WWF donations but not for local restaurant vouchers [22]. Hensher observed no HB for travel route choices [23]. Haghani et al. summarized the studies on HB in other fields [24,25].
Governments often employ online surveys to investigate how food labeling and packaging relate to consumer perceptions of value. While online surveys are useful and convenient, they are often criticized as involving HB. Furthermore, laboratory economic experiments have become increasingly popular but are also criticized for using small samples that do not reflect real markets and only recruit participants from the locality surrounding the laboratory, unlike online surveys, which can have a national scope. Therefore, experimental results have been criticized for their poor applicability to policy implications. Comparing laboratory economic experiments with online surveys reveals a tradeoff between a small number of samples with monetary incentives and a large number of samples without monetary incentives. However, based on Table 1, no study using CE has yet compared whether laboratory economic experiments or online surveys are more effective for testing new food policies. This background prompted us to employ CE to investigate the existence of HB by comparing laboratory economic experiments with online surveys. We conducted four treatments: a non-hypothetical lab economic experiment (NHLEE), a hypothetical lab survey (HLS), a hypothetical online survey (HOS), and a hypothetical online survey with cheap-talk (HOSCT).
We used Japanese mandarin oranges (Citrus unshiu Marc.) as the good, which are very similar to the mandarin oranges popular in Europe and America (Citrus reticulata) and one of the most popular fruits in Japan. We selected the oranges' carbon footprint as the attribute. The Paris Climate Conference (COP21) recently spotlighted Climate Action, one of the 17 United Nations' sustainable development goals (SDGs), which promotes reducing carbon footprints while simultaneously strengthening food security, including food production [26]. Governments are encouraged to evaluate the carbon footprints of their products and invest in products that can mitigate climate change [27]. In Japan, transport emissions account for approximately 80% of all domestic emissions; thus, the Japanese government must reduce transportation-related carbon emissions [28]. The fact that large metropolitan areas must Caputo, Nayga Jr, and Scarpa found that consumers valued carbon footprint labels more than food miles labels for fresh tomatoes [37]. Additionally, Grebitus, Steiner, and Veeman investigated the value of the carbon footprint compared to the water footprint for ground beef by comparing the impact of consumers' human value systems on food choices [38], and showed that a personal propensity to purchase influences the value of the carbon footprint [39]. Van Loo, Caputo, Nayga Jr, and Verbeke found that high-income earners valued carbon footprint labels more than other sustainable labels, such as organic, animal welfare, and free-range labels on chicken breasts [40]. The results of Thøgersen and Nielsen implied that people prefer the low carbon footprint and that the carbon footprint increases people's environmental consciousness [41]. 
Apostolidis and McLeay reported that sustainability-oriented consumers prefer low carbon footprint labels to other sustainability labels for mincemeat [ Since these studies found that other eco-labels and environmental awareness affected preference for CFP, this study also analyzed whether personal attributes, environmental psychology, and environmental consideration scales affected preference for CFP to examine the social factors that promote CFP.
The paper is organized as follows. Sections 2 and 3 explain the material and methods, and model, respectively. Section 4 describes the results. Section 5 discusses the results and Section 6 concludes the paper.

Experimental design
This study compares four treatments, as shown in Table 2. First, the NHLEE employed the experimental economics method and was conducted in the university laboratory. Participants received an endowment to buy real, fresh oranges. They chose one of three oranges packed in a clear plastic bag. Second, the HLS employed a survey questionnaire conducted in the university laboratory. Participants were asked to imagine they had an endowment to buy real, fresh oranges. They chose one of the three oranges photographed in the NHLEE.
Third, the HOS employed an online survey questionnaire conducted by Rakuten Insight Global, which has the largest single panel in Japan. Participants were asked to imagine they had an endowment to buy real, fresh oranges. They chose one of the three oranges photographed in the NHLEE. Fourth, the HOSCT was the same as the HOS except that it included a cheap-talk script inspired by Carlsson et al. [11]. The cheap-talk script is provided in the Supplementary Materials.
For the online survey, the company recruited the respondents, and managed the participant identities. The authors did not know any personal identification for any respondents. The data collection complied with the Law on the Protection of Personal Information in Japan. The institutions and universities to which the authors belong do not require ethical approval for science research, except in instances that could be deemed life-threatening or harmful to human subjects. Each respondent participated in only one session.
For this study, we conducted the HOS and HOSCT treatments and used the data from the NHLEE and the HLS conducted in Aoki and Akai [34] and Aoki and Akai [33], respectively. The NHLEE and HLS were conducted in 2012, and the HOS and HOSCT were conducted in 2016. During the last eight years, CFP was not popularized, and no laws were made about it. The Consumer Price Index (CPI) [47] for 2012 and 2016 was 97.2 and 99.9, respectively, with 2015 being 100. Prices in Japan were sluggish from 2012 to 2016, as the CPI remained below 100, and the situation was similar for the previous eight years. In addition, the inflation rate announced by the IMF [48] was -0.06% in 2012 and -0.12% in 2016, so both sets of treatments were implemented during a period of declining prices. Thus, the treatments were implemented under similar economic conditions in Japan.

Products and carbon footprints
As noted in the Introduction, we used the Japanese orange, satsuma mandarin (Citrus unshiu Marc.), as our product. The Japanese orange is one of the most popular fruits in Japan: it is produced in 36 of the 47 prefectures [49] and is ranked third in terms of share in total fruit consumption [50], with almost 100% production self-sufficiency in Japan since 1965 [51]. All products were selected from three of the top five origins, namely, the Wakayama, Ehime, and Kumamoto prefectures, located 100km, 380km, and 800km, respectively, from the university laboratory used in the NHLEE.
The carbon footprint was calculated based on the four stages of the life cycle assessment: production [52,53], sorting and packing [54], transportation [55], and packaging [56]. In the production stage, the level of carbon emissions mainly depends on whether the product is cultivated in a greenhouse or outdoors. This study used oranges grown outdoors and transported by truck because it was conducted while they were in season. Among the four stages, the largest proportion of carbon emissions is generated during transportation.
Since the orange-growing process is the same across all three prefectures, the main difference in the carbon footprint of the oranges arises from the distance factor: the carbon emissions of each orange produced in Wakayama, Ehime, and Kumamoto are 20g, 30g, and 40g, respectively. We used these emission levels to measure the oranges' carbon footprint attribute. Participants were informed that the differences in the magnitude of the carbon footprint were primarily due to the distance from the origin. However, unlike in the case of food miles, the carbon footprint is calculated in terms of CO2; therefore, participants did not know the origin of each orange.
We selected oranges of size S, which are, on average, 7cm in diameter and 100g in weight. These oranges are relatively small, and smaller oranges are generally known to be sweeter than larger ones. Participants were informed about the size of the oranges; in fact, they could see the real oranges in the NHLEE and, in the other treatments, the photographs included their sizes. Fig 1 shows the three oranges used in the NHLEE treatment.

Choice set design
We used three Japanese orange alternatives, A, B, and C, and two attributes, price and carbon emissions (as a measure of the carbon footprint). The price had three levels per orange: 25 JPY, 35 JPY, and 45 JPY. These levels are based on two sources: the average retail price of oranges in the three largest supermarkets near the university laboratory and the average national price in all other supermarkets and shops in Japan [57]. The price levels across these sources are not significantly different from each other. The carbon emissions had three levels per orange, as described in Section 2.2: 20g, 30g, and 40g per orange.
To create choice sets based on these price and carbon emission levels, we employed a D-optimal fractional factorial design, which yields a more practical number of choice sets than a full factorial design. We used Design Expert 7.0 (Stat-Ease) with D-optimal as the design type and the coordinate-exchange algorithm (CEA) as the search algorithm. CEA is known to work very well under multinomial logit discrete choice models [58]. Ultimately, we created 24 choice sets and divided them into two blocks of 12 for the following reason. In stores near our campus, size S oranges are usually sold in packs of 6. Given the 12 choice sets, two packs of oranges were used, totaling 12 oranges with a weight of approximately 1.2kg. The actual monthly consumption of oranges when they are in season (between November and March) is approximately 3kg per person. Thus, the weight of the 12 oranges in the study, 1.2kg, is equivalent to 40% of actual monthly consumption. Because asking participants to buy too many oranges could lead them to decide heuristically, we kept the total amount within a range that permits reasonable decisions. Fig 2 illustrates an example of the designed choice sets.
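To see why a fractional design is needed, the size of the full factorial can be enumerated directly. The following is a minimal Python sketch, not the Design Expert procedure used in the study; the function names are hypothetical, and the code only counts the candidate choice sets and re-derives the 40% consumption share from the figures given above.

```python
from itertools import product

PRICES_JPY = (25, 35, 45)   # price levels per orange
CARBON_G = (20, 30, 40)     # carbon emission levels per orange


def full_factorial(n_alternatives=3):
    """Enumerate every possible choice set of price-carbon profiles."""
    profiles = list(product(PRICES_JPY, CARBON_G))  # 3 x 3 = 9 profiles
    return list(product(profiles, repeat=n_alternatives))


def share_of_monthly_consumption(n_oranges=12, grams_per_orange=100, monthly_kg=3.0):
    """Weight of the purchased oranges relative to in-season monthly consumption."""
    return (n_oranges * grams_per_orange) / (monthly_kg * 1000)


all_sets = full_factorial()
print(len(all_sets))                              # number of candidate choice sets
print(round(share_of_monthly_consumption(), 2))   # share of monthly consumption
```

With three alternatives, each described by one of nine price-carbon profiles, the full factorial contains 9^3 = 729 candidate choice sets, far more than the 24 retained by the fractional design.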
Finally, we did not provide an opt-out option. While previous studies indicate that an opt-out option increases the realism of choice by preventing forced choices [1,23], Lusk and Schroeder and Alemu and Olsen found that the non-hypothetical condition induced a higher rate of choosing the opt-out option than the hypothetical condition did [6,17]. Notably, Carlsson, Frykblom, and Lagerkvist found that a CE with an opt-out option has greater unobserved heterogeneity than one without it [59]. We assumed that the induced value theory in experimental economics does not work in non-hypothetical experiments when an opt-out option is introduced. If we had introduced an opt-out option in the experiment, some participants might have pursued a purpose different from the one intended by the experimenter: their objective could change from choosing food, the purpose of the experiment, to simply making money. Because the main premise of experimental economics rests on the induced value theory, an experiment with an opt-out option would not have provided a valid basis for this study.

Questionnaire
After the CE, we conducted a questionnaire consisting of five parts, as shown in Table 3. The first part comprised demographics such as gender, age, household, education, and income. The second part comprised the ECCB scale, whose items are listed in Table 4. The responses were rated on a 5-point Likert-type scale, ranging from 1 ("always agree") to 5 ("never agree"). The third part examined the frequency of eating oranges per week. The fourth part comprised the purchasing behavior in relation to goods with an eco-label, developed specifically for this study, to test whether consumers prefer goods with an eco-label. The fifth part examined the daily environmental behavior, developed specifically for this study. This part consisted of six questions to test whether participants practice environmentally beneficial behavior in their daily lives. We employed these questions in the estimation model to further investigate the preferences regarding price and carbon footprint.

NHLEE.
We conducted the NHLEE treatment in the laboratory at Osaka University, Japan. The NHLEE treatment consisted of the following seven steps:
Step 1: Each participant sat behind a desk, separated by 60cm × 80cm partitions in front of and beside the desk, in the university laboratory, as shown in Fig 3.
Step 2: Each participant submitted a signed consent form before the experiment began.
Step 3: Each participant received an instructional leaflet explaining the experiment's procedure; the experimenter also read the instructions aloud once.
Step 4: Each participant received a choice set card and a clear plastic-hinged box containing three oranges, each with a different carbon emission level, as shown in Figs 1 and 2.
Step 5: Each participant selected one of the three oranges to buy with their endowment of 120 JPY. Each participant then wrote down the chosen orange on the choice set card. Next, the experimenters gathered the cards and the box. These processes constituted one round. We conducted 12 rounds.
Step 6: Each participant filled in the questionnaire after completing all rounds.
Step 7: Each participant received the rewards and the oranges selected in all rounds. The rewards were calculated as the show-up fee plus the sum, over all rounds, of the change from the endowment (the endowment minus the price of the selected orange). After a 10% income tax was deducted from the rewards, the remaining money was handed to the participants.
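The reward rule in Step 7 can be sketched as a short Python function. The show-up fee is not stated in this section, so `show_up_fee_jpy` is a hypothetical parameter and the value used in the example is purely illustrative; the function name is likewise an assumption.

```python
def nhlee_reward(chosen_prices_jpy, show_up_fee_jpy, endowment_jpy=120, tax_rate=0.10):
    """Net cash reward: show-up fee plus the change from each round, minus income tax."""
    change = sum(endowment_jpy - price for price in chosen_prices_jpy)
    gross = show_up_fee_jpy + change
    return gross * (1.0 - tax_rate)


# Illustrative only: 12 rounds at 35 JPY and a HYPOTHETICAL 500 JPY show-up fee.
example_reward = nhlee_reward([35] * 12, show_up_fee_jpy=500)
print(round(example_reward))
```

In this illustration, the change is 12 × (120 − 35) = 1,020 JPY, so the gross reward is 1,520 JPY and the net reward after the 10% tax is 1,368 JPY.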
The entire process lasted for approximately 60 minutes. The instructions and questionnaires are presented in the Supplementary Materials.

HLS.
We conducted the HLS treatment in the laboratory at Osaka University, Japan. The HLS treatment followed the same steps as the NHLEE, except for Steps 4, 5, and 7. In Step 4, each participant received a choice set card and a photograph of the oranges taken in the NHLEE, as mentioned in Section 2.1. In Step 5, the experimenters gathered only the cards. In Step 7, each participant received a fixed reward of 1,000 JPY. The process lasted for approximately 45 minutes.

HOS.
The HOS treatment involved an online survey with the following five steps:
Step 1: Each participant read an online instruction that explained the experiment's procedure.
Step 2: Each participant viewed an online choice set card and photographs of the oranges taken in the NHLEE.
Step 3: Each participant imagined that they had an endowment of 120 JPY to buy an orange. Then, they selected one of the three oranges to buy. These processes constituted one round. We conducted 12 rounds.
Step 4: Each participant filled in the questionnaire after completing all rounds.
Step 5: Each participant received a fixed reward. In the online treatments, the research company paid 1 point per question, and the points could be used to buy goods on an e-commerce (EC) site.
The process lasted for approximately 15 minutes.

HOSCT.
The HOSCT treatment had the same steps as the HOS treatment except for Step 1. In Step 1, each participant read an online instruction as well as cheap-talk, presented on different screens. We fixed the time of the cheap-talk screen to 5 seconds to prevent participants from skipping the explanation. The process lasted for approximately 15 minutes.

The random parameter logit (RPL) model
The RPL model [61,62] is a popular estimation model employed in CE literature. The RPL model relaxes the assumption of the independence of irrelevant alternatives and assumes heterogeneous preferences across participants. This model enhances the accuracy and reliability of the estimated results. In addition, it is based on random utility theory, which is central to CEs. The basic assumption underlying random utility theory is that decision makers maximize utility; thus, the theory assumes that decision makers would select the alternative that maximizes their utility. Although the utility of an alternative for an individual (U) cannot be observed, it can be assumed to consist of a deterministic (observable) component (V) and a random error (unobservable) component (ε). Formally, an individual n's utility for alternative i in each of the t choice sets can be expressed as $U_{int} = V_{int} + \varepsilon_{int} = \beta_n' X_{int} + \varepsilon_{int}$. The density of $\beta_n$ is denoted by $f(\beta|\theta)$, where $\theta$ is a vector of the true parameters of the taste distribution. $X_{int}$ denotes the explanatory variables of $V_{int}$ for alternative i, individual n, and choice set t. The random error component $\varepsilon_{int}$ is assumed to follow a Type I extreme value distribution and be independently and identically distributed.
The conditional probability of alternative i for individual n in choice set t is expressed as follows:

$$L_{int}(\beta_n) = \frac{\exp(\beta_n' X_{int})}{\sum_{j} \exp(\beta_n' X_{jnt})}$$

The probability of the observed sequence of choices conditional on knowing $\beta_n$ is expressed as follows:

$$S_n(\beta_n) = \prod_{t} L_{i(n,t)nt}(\beta_n)$$

where i(n,t) represents the alternative selected by individual n from choice set t. The unconditional probability of the observed sequence of choices for individual n is the integral of the conditional probability over all possible values of $\beta$, and can be expressed as follows:

$$P_n(\theta) = \int S_n(\beta) f(\beta|\theta)\, d\beta$$

In most applications, the density $f(\beta|\theta)$ is specified to be normal or log-normal, namely, $\beta \sim N(b,W)$ or $\ln\beta \sim N(b,W)$, where the mean, b, and covariance, W, are estimated. In this study, we use normal density.
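Because the integral over the taste distribution has no closed form, it is usually approximated by simulation. The following Python sketch illustrates the idea under simplifying assumptions that are ours rather than the study's: the random coefficients are independent normals (a diagonal W), pseudo-random draws are used instead of Halton draws, and all coefficient values and function names are hypothetical.

```python
import math
import random


def logit_prob(beta, alternatives, chosen):
    """Conditional logit probability of the chosen alternative for a given beta."""
    utilities = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
    top = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - top) for u in utilities]
    return exp_u[chosen] / sum(exp_u)


def simulated_sequence_prob(mean, sd, choice_sets, choices, n_draws=500, seed=0):
    """Average, over draws of beta, of the product of logit probabilities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # ASSUMPTION: independent normal coefficients (diagonal covariance).
        beta = [rng.gauss(m, s) for m, s in zip(mean, sd)]
        seq = 1.0
        for alts, chosen in zip(choice_sets, choices):
            seq *= logit_prob(beta, alts, chosen)
        total += seq
    return total / n_draws


# Each alternative is (price in JPY, carbon in g); coefficients are HYPOTHETICAL.
sets = [[(25, 40), (35, 30), (45, 20)], [(45, 40), (25, 30), (35, 20)]]
p = simulated_sequence_prob((-0.10, -0.05), (0.05, 0.02), sets, choices=[1, 2])
print(0.0 < p < 1.0)
```

Estimation would then search for the means and standard deviations that maximize the sum of the logarithms of these simulated probabilities across individuals.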
Hence, the main effect in Model 1 and the main effect with the interaction term in Model 2 are estimated using the RPL model with socioeconomic characteristics. Thus, the two indirect utility functions are expressed as follows:

$$V_{int} = \beta_{1n} \mathit{Price}_{int} + \beta_{2n} \mathit{Carbon}_{int} \quad \text{(Model 1)}$$

$$V_{int} = \beta_{1n} \mathit{Price}_{int} + \beta_{2n} \mathit{Carbon}_{int} + \sum_{a} \beta_{3na} \left(\mathit{Attribute}_{int} \times \mathit{Question}_{na}\right) \quad \text{(Model 2)}$$

where $\mathit{Price}_{int}$ and $\mathit{Carbon}_{int}$ are the price and carbon emissions, respectively, of alternative i, individual n, and choice set t. $\mathit{Attribute}_{int}$ consists of the price and carbon emission variables. $\mathit{Question}_{na}$ consists of demographics, the ECCB scale, the frequency of eating oranges, buying goods with an eco-label, and daily environmental behavior. $\mathit{Attribute}_{int} \times \mathit{Question}_{na}$ is the interaction term; including it in the model further refines the estimation on individual-specific characteristics regarding the heterogeneity in the mean of the random parameter [2]. The interaction term shows the effect of demographics on indirect utility by interacting with the attributes and enhances the accuracy and reliability of estimates of $V_{int}$. $\beta_{1n}$, $\beta_{2n}$, and $\beta_{3na}$ are the parameters estimated for the respective explanatory variables.
Generally, the RPL model consists of two types of variables-namely, fixed and random variables, which are assumed to be homogenous and heterogeneous preferences, respectively. In our model, Price int and Carbon int are the random variables.
In the RPL models shown in Table 1, prices are set as fixed parameters [6,12,14-16]. However, a fixed cost coefficient is criticized because the assumption that all respondents have the same marginal price utility is unrealistic. Therefore, we used the random parameter model for the price. However, it is still an open question which distribution should be set as the random parameters in the RPL model [63]. In Table 1, only one study used the preference space with normal distribution for the price [11].
However, it has been shown that the moments of the WTP distribution become undefined when a random monetary attribute such as price is assumed to be normally distributed [64]. To avoid this, it is desirable to set the distribution of the random monetary attribute to the lognormal distribution, or to use the WTP space instead of preference space [63]. In Table 1, two studies used the WTP space with negative log-normal distribution for the price [17,18].
Only one HB study, by Carlsson et al., has employed the preference space with the normal distribution [11]. To provide additional evidence on the normal distribution in this field, this study employed the preference space with the normal distribution, following the basic idea suggested by Train [62].

The marginal WTP
We calculated the marginal WTP for lower carbon emissions using the coefficients in the main effect. Since our model does not include the opt-out option in the choice sets, the calculated WTP is defined as the marginal WTP, rather than the total WTP, which indicates the total benefit. The marginal WTP shows the welfare measures represented by the marginal rate of substitution between the coefficients of the attributes and price. Thus, the calculated WTP is defined as:

$$WTP_j = -\frac{\beta_j}{\alpha}$$

where $\alpha$ is the estimated coefficient of Price, and $\beta_j$ is the estimated coefficient of attribute j other than Price. Standard deviations, standard errors, and the 95% confidence-interval bounds are derived using the Krinsky and Robb method. Since the estimated Price is normalized by the marginal WTP, we directly compare the estimation results of Carbon in each treatment.
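The Krinsky and Robb method can be illustrated with a short simulation: coefficient vectors are drawn from a normal distribution centered on the estimates with their estimated covariance, and the ratio is computed for each draw. The Python sketch below assumes the common sign convention (the negative of the attribute coefficient divided by the price coefficient, so the value of a one-unit reduction is the negative of the result); the coefficient values, variances, and function name are hypothetical and are not taken from the tables.

```python
import math
import random


def krinsky_robb_ratio(b_price, b_attr, var_price, var_attr, cov, n_draws=10000, seed=0):
    """Simulate -b_attr / b_price; return mean and a 95% percentile interval."""
    rng = random.Random(seed)
    # Cholesky factor of the 2 x 2 covariance matrix of (b_price, b_attr).
    l11 = math.sqrt(var_price)
    l21 = cov / l11
    l22 = math.sqrt(var_attr - l21 ** 2)
    draws = []
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        price = b_price + l11 * z1
        attr = b_attr + l21 * z1 + l22 * z2
        draws.append(-attr / price)
    draws.sort()
    mean = sum(draws) / n_draws
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return mean, lower, upper


# HYPOTHETICAL estimates: price coefficient -0.10, carbon coefficient -0.053.
mean_wtp, lower_wtp, upper_wtp = krinsky_robb_ratio(-0.10, -0.053, 1e-4, 2.5e-5, 0.0)
print(lower_wtp < mean_wtp < upper_wtp)
```

The sorted draws can also be retained and passed to a pairwise comparison of treatments instead of being reduced to an interval.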

Hypotheses
According to the complete combinatorial (CC) test [65], we set the null hypothesis of no difference in the estimated mean WTPs between two treatments as follows. This type of comparison has been applied by several studies [5,6,11,14,17,66].
$H_0^6$: $WTP_{NHLEE} - WTP_{HOSCT} < 0$
This hypothesis checks whether the non-hypothetical laboratory setting without cheap-talk induces HB compared to the hypothetical online survey setting with cheap-talk. If $H_0^6$ is rejected, it implies that a laboratory setting with a small sample is more realistic than an online survey setting with a large sample, even though the survey involves cheap-talk. This is also one of this study's main contributions.
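The logic of a complete combinatorial comparison can be sketched in a few lines of Python: every simulated WTP value from one treatment is paired with every value from the other, and the share of non-positive differences is reported. This is a simplified illustration under our own assumptions; the function name is hypothetical, and the inputs would be simulated WTP draws such as those produced by the Krinsky and Robb method.

```python
from bisect import bisect_left


def cc_share_nonpositive(wtp_a, wtp_b):
    """Share of all pairwise differences (a - b) that are <= 0."""
    sorted_b = sorted(wtp_b)
    n_b = len(sorted_b)
    # For each a, count the b values that are >= a (so that a - b <= 0).
    count = sum(n_b - bisect_left(sorted_b, a) for a in wtp_a)
    return count / (len(wtp_a) * n_b)


# Tiny illustrative vectors, not estimates from the study.
share = cc_share_nonpositive([1, 2, 3], [2, 2, 5])
print(share)
```

A share close to 0 or 1 suggests that the two WTP distributions differ, whereas a share near the middle of the range gives no evidence of a difference.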

Samples
In the NHLEE and HLS treatments, we recruited both students from Osaka University and residents living in the area surrounding the university from a variety of demographic backgrounds. To recruit students, we distributed leaflets across campus. To recruit residents, we distributed flyers attached to the four most popular Japanese newspapers to 15,700 households. We conducted 15 sessions with 104 participants (41 students and 63 residents) in the NHLEE. The participants earned an average of 1,407 JPY. Only two men did not take the oranges home. In the HLS, we conducted 19 sessions with 212 participants (96 students and 116 residents).
In the HOS and HOSCT treatments, we controlled the samples in each treatment so that they were representative of the Japanese population with respect to age and gender. In addition, we required participants to be older than 18 years to ensure the same age composition as in the NHLEE and HLS treatments. Rakuten Insight Global distributed the survey to 15,379 people in its panels and recruited 500 people for each treatment. Table 3 summarizes the demographics of each treatment. In terms of gender, the percentage of women was higher in the lab than online. As for age, most of the lab participants were under 25 years old, while most of the online participants were over 40 years old. Regarding family structure, most lab participants lived in households of four or more members, while most online participants lived in households of two or three. For education, college graduates were the most common in both groups. In terms of income, most lab participants had less than 2.5 million JPY or more than 7 million JPY, while most online participants had more than 7 million JPY, followed by 2.5-4 million JPY. In terms of frequency, lab participants ate oranges more often per week. Regarding daily environmental behavior, the use of public transportation was more than twice as common among lab participants as among online participants (50% in the lab and 20% online); not leaving the tap running was more common among lab participants (more than 80% in the lab but around 70% online); the air conditioning item was more common among lab participants (more than 70% in the lab but around 50% online); and walking and separating garbage were reported by over 75% in the lab and about 70% online. Overall, lab participants tended to show higher environmental awareness and more environmentally caring behavior than online participants.
Although the treatments differ, in the whole sample the ratio of men to women was 47% to 53%, with slightly more women than men. By age, 44% were under 40 years old and 53% were over 40 years old, so respondents over 40 accounted for a slight majority. In terms of family structure, households of two to three members were the most common. In terms of education, college graduates were the most common. In terms of income, the largest group of respondents had an income of over 7 million JPY, indicating that people with high annual incomes participated in the survey.

The main effect and the WTP: Testing for HB
We employed the panel RPL estimation model using LIMDEP 11.0 and NLOGIT 6.0. First, we tested the hypothesis of equal utility parameters among treatments using the likelihood ratio (LR) test [67]. The LR test employed the log likelihood values in Model 1, estimated using Halton draws with 50 replications, following Carlsson et al. [8], under the grid search procedure [67]. The LR test rejected the hypothesis at the 1% level (LR = -2(-12760.8-(-1020.2-2110.4-4827.0-4760.4)) = 85.6). Thus, we estimated each treatment separately.
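The LR statistic can be checked with a few lines of Python using the log likelihood values reported above. The degrees of freedom are not stated in the text; the sketch assumes 12 (four parameters in Model 1 times three additional treatments), so the critical value used here is an assumption of ours, and the names are hypothetical.

```python
def lr_statistic(ll_pooled, ll_by_treatment):
    """LR = -2 * (pooled log likelihood - sum of separate log likelihoods)."""
    return -2.0 * (ll_pooled - sum(ll_by_treatment))


LL_POOLED = -12760.8
LL_TREATMENTS = [-1020.2, -2110.4, -4827.0, -4760.4]

# ASSUMPTION: 12 degrees of freedom; 26.217 is the chi-square 1% critical value for df = 12.
CHI2_CRIT_1PCT_DF12 = 26.217

lr = lr_statistic(LL_POOLED, LL_TREATMENTS)
print(round(lr, 1), lr > CHI2_CRIT_1PCT_DF12)
```

The separate log likelihoods sum to -12718.0, so the statistic is -2 × (-42.8) = 85.6, which exceeds the assumed critical value by a wide margin.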
Next, we analyzed the main effects of Price and Carbon in Model 1, as shown in Table 5. This model was estimated using Halton draws with 500 replications [68,69] to improve the validity. The variables Price and Carbon are random parameters in the model and specified to be normally distributed [61,68]. We considered that preference for price in this study is not as clear as in other studies [5]. This is because the higher price can be a signal of better quality. The variables Price and Carbon indicate significantly negative signs at the 1% level in all treatments. These results imply that people prefer to purchase cheaper oranges and oranges with lower carbon emissions, regardless of the setting. The standard deviations of these variables were significant in all treatments, implying that individual preferences for price and carbon footprint were heterogeneous in all treatments.
Finally, we calculated the marginal WTP in each treatment using the Krinsky and Robb method. The marginal WTP per 1g of carbon emission reduction was 0.53 JPY, 0.52 JPY, 0.54 JPY, and 0.58 JPY in the NHLEE, the HLS, the HOS, and the HOSCT, respectively. Additionally, we plotted the kernel density distributions of mean marginal WTP for carbon emissions in each treatment, as shown in Fig 4. To test the hypotheses on HB mentioned in Section 3.3, we employed the CC test. The CC test did not reject any of the hypotheses at the 5% significance level; thus, the results do not indicate the presence of HB.

The main effect with the interaction term in investigating carbon footprint
We analyzed the main effects with interaction terms in Model 2, as shown in Table 6. First, we describe the independent variables in Model 2 other than the attributes. Interaction terms with the attributes were created for demographics, the ECCB scale, the frequency of eating oranges, buying goods with an eco-label, and daily environmental behavior, which are detailed in Tables 3 and 4. The demographic dummy variables were Female and University. The ordered categorical variables were Age, Household, and Income. Regarding the ECCB scale, the response order was reversed before estimation so that a higher score indicates stronger agreement. We then defined HighECCB as a dummy variable that takes the value of 1 if a participant's total score was higher than the average total score in that participant's treatment and 0 otherwise. Frequency was the number of oranges the participant ate per week. Label was a dummy variable indicating participants who buy goods with an eco-label. Regarding daily environmental behavior, six variables (Not using shopping bags, Using public transportation, Not tap running, AirConditioning, Walking, and Separating garbage) were used as dummy variables. Next, we explain only the significant results for the interaction terms. The estimation results for all variables in Model 2 are in the Supplementary Materials.
Table 5. The random parameter logit regression results in the main effect (Model 1).
For Carbon×Female, the HOSCT had a significantly negative sign, implying that females prefer oranges with low carbon emissions in the hypothetical condition with cheap-talk. For Carbon×Age, the NHLEE and the HLS had significantly positive signs, implying that students prefer oranges with low carbon emissions in the controlled environment monitored by the experimenter. For Carbon×HighECCB, all treatments had significantly negative signs, implying that people with high environmental consciousness prefer oranges with lower carbon emissions, regardless of the setting. For Carbon×Frequency, the HOSCT had a significantly positive sign, implying that people who eat oranges only sometimes prefer oranges with low carbon emissions in the hypothetical condition with cheap-talk. For Carbon×Label, the HOS had a significantly negative sign, implying that people who buy goods with an eco-label prefer oranges with low carbon emissions in the hypothetical condition. For Carbon×Not using shopping bags, the HOSCT had a significantly negative sign, which implies that people who do not use shopping bags prefer oranges with low carbon emissions in the hypothetical condition with cheap-talk. For Carbon×Public Transportation, the HOSCT had a significantly negative sign, which implies that people who often use public transportation prefer oranges with low carbon emissions in the hypothetical condition with cheap-talk. For Carbon×Garbage, the HLS and the HOS had significantly negative signs, which implies that people who frequently separate their garbage prefer oranges with low carbon emissions in the hypothetical condition.
Table 6. The RPL estimation results in the main effect with interaction (Model 2) (Only significant variables).
For Price×Female, the NHLEE, the HOS, and the HOSCT had significantly positive signs, implying that males prefer cheaply priced oranges in the non-hypothetical laboratory and hypothetical online settings. For Price×Age, the NHLEE, the HOS, and the HOSCT had significantly positive signs, implying that younger people prefer cheaply priced oranges in the non-hypothetical laboratory and hypothetical online settings. For Price×University, the HOSCT had a significantly negative sign, which implies that university graduates prefer cheaply priced oranges in the hypothetical condition with cheap-talk. For Price×Income, the HOS had a significantly positive sign, which implies that low-income earners prefer cheaply priced oranges in the hypothetical online setting. For Price×HighECCB, the HOS and the HOSCT had significantly positive signs, implying that people with low environmental consciousness prefer cheaply priced oranges in the hypothetical online settings. For Price×Label, the HOS had a significantly negative sign, implying that people who buy goods with an eco-label prefer cheaply priced oranges in the hypothetical online setting. For Price×Not using shopping bags, the HLS, the HOS, and the HOSCT had significantly negative signs, which implies that people who do not use shopping bags prefer cheaply priced oranges in the hypothetical conditions. For Price×Public Transportation, the NHLEE and the HOSCT had significantly negative signs, which implies that people who use public transportation prefer cheaply priced oranges in the non-hypothetical laboratory setting and in the hypothetical online setting with cheap-talk. For Price×Not tap running, the HOS had a significantly negative sign, which implies that people who frequently avoid leaving the tap running and turn off the light after exiting a room prefer cheaply priced oranges in the hypothetical online setting. For Price×Walking, the HOSCT had a significantly positive sign, which implies that people who walk less in their daily life prefer cheaply priced oranges in the hypothetical condition with cheap-talk. In summary, only Carbon×HighECCB showed the same result in all treatments; we discuss what this implies for HB in the following section.
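The construction of the HighECCB dummy and its role in the Carbon×HighECCB interaction can be sketched as follows. This is a minimal illustration: the scores and coefficient values are hypothetical, and the function names are not from the paper. It assumes the ECCB items have already been reverse-coded so that higher totals mean stronger agreement.

```python
from statistics import mean

def high_eccb_dummies(total_scores):
    """Return 1 where a total ECCB score exceeds the treatment mean, else 0."""
    threshold = mean(total_scores)
    return [1 if score > threshold else 0 for score in total_scores]

def carbon_marginal_utility(beta_carbon, gamma_high_eccb, high_eccb):
    """Carbon coefficient for a participant: main effect plus interaction shift."""
    return beta_carbon + gamma_high_eccb * high_eccb

# HYPOTHETICAL total ECCB scores, grouped by treatment (not the study's data).
scores_by_treatment = {
    "NHLEE": [62, 75, 81, 58],
    "HOS": [70, 66, 90, 54, 77],
}
# The threshold is computed within each treatment, as described in the text.
dummies = {t: high_eccb_dummies(s) for t, s in scores_by_treatment.items()}

# HYPOTHETICAL coefficients: a negative interaction makes the carbon
# coefficient more negative for participants with HighECCB = 1.
low = carbon_marginal_utility(-0.010, -0.004, 0)
high = carbon_marginal_utility(-0.010, -0.004, 1)
```

With a negative interaction coefficient, the disutility of carbon emissions is larger for the HighECCB group, which is the pattern reported for all four treatments.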

Discussions
This study set Price and Carbon emissions as the attributes of oranges and examined HB by comparing four treatments. The WTP analysis did not show HB in any treatment comparison. Meanwhile, the results for the interaction terms showed that people with high ECCB scores prefer oranges with low carbon footprints.

Comparison of studies employing common environments for non-hypothetical and hypothetical conditions
In this study, we compared treatments within laboratory settings, within online settings, and between laboratory and online settings. Here, we discuss the effect on HB, focusing on the research environment. The environments for comparing non-hypothetical and hypothetical conditions in Table 1 are classified into two types: common environments and different environments. Common environments comprise two kinds of settings: the laboratory setting and the non-laboratory setting.

Laboratory setting.
Two previous studies were conducted in the laboratory setting. Lusk and Schroeder compared the non-hypothetical and hypothetical conditions in a university laboratory setting [6]. Their results did not show HB for the following attributes of beef: Generic, GT, Natural, Choice, or CAB. Meanwhile, De-Magistris et al. compared the non-hypothetical and hypothetical conditions for almonds in a town laboratory setting [14]. Their results showed HB for the attributes of Organic and Food mileage labeling. However, in the comparison of the hypothetical and hypothetical with cheap-talk conditions, Food mileage no longer showed HB. In this study, the comparison between the laboratory settings (H₀¹) showed no HB, supporting Lusk and Schroeder [6]. Additionally, our results for the hypothetical and hypothetical with cheap-talk conditions were consistent with De-Magistris et al. [14]. However, our results for the non-hypothetical and hypothetical conditions did not support their findings. Apart from food studies, Carlsson and Martinsson compared HB in a university laboratory [20]. Their results did not show HB for the attribute of Donation in relation to environmental projects of the World Wide Fund for Nature (WWF). Johansson-Stenman and Svedsäter compared HB in college laboratories [22]. Their results did not show HB for the attribute of Campaign in relation to donations to the WWF for the protection of elephants in Asia. Therefore, neither of these studies showed any HB for environmental or climate change attributes. Similar to previous studies in laboratory settings, this study had difficulty observing HB for environmental and climate change attributes, perhaps because such attributes have characteristics very different from taste; that is, if they are less likely to affect consumer preferences in the first place, HB is unlikely to be observed. Future research would do well to address this challenge.

Non-laboratory setting.
Previous studies in which HB comparisons were conducted in non-laboratory settings are divided into two types: non-face-to-face and face-to-face. We begin with the non-face-to-face type. Carlsson et al. used postal surveys to compare the hypothetical and hypothetical with cheap-talk conditions [11]. While the GMO attribute for beef showed HB, other attributes, such as Improved labelling, Out all year, and Mobile, did not. Liebe et al. conducted non-hypothetical and hypothetical online choice experiments without cheap-talk through a professional survey organization and found HB for the attributes of Organic and Fair trade in tea [18].
In contrast, we used online surveys. Our test of H₀², which compared the hypothetical and hypothetical with cheap-talk conditions, did not show HB. Because both Carlsson et al.'s study [11] and this element of our own used non-face-to-face methods, the effect of cheap-talk may be weak in such environments, making HB difficult to detect. The results of Liebe et al. suggest that HB for the carbon footprint might appear if a non-hypothetical online experiment were conducted [18]. Next, we describe the face-to-face type. Moser et al. conducted interviews at a supermarket with several treatments to compare the hypothetical and hypothetical with cheap-talk conditions [16]. They report that HB was present in some appearance-related attributes for apples; however, other attributes for apples, such as Method of production, Origin, and Reduced impact on climate, did not show HB. They also compared the non-hypothetical and hypothetical conditions in the same environment and found that some levels of Cultivation methods, Appearance, and Origin began to show HB. However, Reduced impact on climate still did not show HB. Alemu and Olsen conducted interviews with town residents, comparing the hypothetical condition with a cheap-talk condition and with the non-hypothetical condition [17]. In both comparisons, the attributes Amount of cricket flour and Whether some portion of the wheat flour is fortified or not showed HB. These face-to-face studies suggest that HB is hard to find for environmental and climate change attributes, as in laboratory settings. However, the effect of cheap-talk scripts cannot be generalized.

Comparison of studies employing different environments for non-hypothetical and hypothetical conditions
Next, we discuss previous studies that examined HB by comparing non-hypothetical and hypothetical conditions under different environments. In a previous study comparing laboratory setting experiments with face-to-face street surveys, Aoki

Attributes and psychological factors
The previous sections note that environmental and climate change attributes seem unlikely to exhibit HB, regardless of the setting; similarly, in this study, HB was not observed for the environmental attribute. This section discusses our results from the ECCB scale. Our analysis using the interaction term revealed that people with high ECCB scores prefer a low carbon footprint more strongly than people with low ECCB scores in all treatments; this suggests that the effect of environmental awareness on carbon footprint preferences is not affected by treatment type. This may explain why no HB was found in relation to the carbon footprint regardless of treatment. Like us, Grebitus, Lusk, et al. examined HB using psychological measures, testing for HB in the food miles attribute for apples and wine by considering the Big Six personality traits of Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism, and Agency [15]. They found that, in the hypothetical condition for apples, participants with Openness, Neuroticism, Extraversion, and Conscientiousness showed a lower probability of choosing the opt-out option, which was an option that does not include Food miles. However, in the non-hypothetical condition for apples, only Agency had a similar effect. In the results for wine, participants with Neuroticism, Extraversion, and Conscientiousness in the hypothetical condition showed a lower probability of choosing the opt-out option, and no psychological measure showed a similar effect in the non-hypothetical condition. Ultimately, the psychological impact on food preferences attributable to food miles was stronger in the hypothetical condition than in the non-hypothetical condition, which suggests that food miles affected the HB. Notably, this result differs from our own; the reasons for this difference remain unclear.
Scholars believe that HB emerges for two reasons: either differences in treatments influence participant psychology, or the psychologies of participants in each treatment differ in the first place. Identifying which reason causes HB in a particular case is a challenge for future research. In future studies examining HB with between-subject designs, screening participants in advance on psychological measures may be important.
Regarding other factors, the interactions common across treatments were Carbon×Age in the laboratory treatments, and Price×Female, Price×Age, Price×HighECCB, and Price×Not using shopping bags in the online samples. Compared to the laboratory treatments, the online samples have a more balanced demographic ratio, so the laboratory treatments might produce the same results if they had the same demographic ratio as the online surveys. Apart from these, explaining why the interactions differed between treatments is an issue for future work.
The reason "not leaving the tap running" and "walking", which were significant in the Price interactions, were not significant in the Carbon interactions is thought to be that these two behaviors strongly reflect an awareness of saving money rather than CO2 reduction or environmentally friendly behavior. On the other hand, "separating garbage" was significant in the Carbon interactions but not in the Price interactions. This is thought to be because garbage separation affects the amount of CO2 emitted through waste incineration but does not lead to saving behavior.
Additionally, low-income earners prefer cheaply priced oranges only in the HOS, which is the treatment most prone to hypothetical bias compared to the other treatments. Only one previous study, by Van Loo et al., found a positive and significant interaction between price and high income, in a hypothetical survey with cheap-talk [40]. Since income generally has a strong influence on purchasing power, one possibility is that income strongly influences the price effect in situations where the experimenter places no constraints on participants and says nothing about it. However, the question of how treatment affects attributes is still open because, as mentioned earlier, there are no other studies comparing the two.

Limitations
This study has three limitations. The first is sample selection bias between treatments. To compare the laboratory treatments (NHLEE, HLS) and the online treatments (HOS, HOSCT), the responses had to be collected in different ways. In laboratory experiments, it is very difficult to recruit hundreds of subjects according to demographic ratios. Therefore, the demographic ratio and sample size differ between the online surveys and the laboratory experiment and survey. For this reason, we tried to reduce the bias as much as possible by examining the interaction effects with personal attributes in the analysis. Future tests of differences in HB should ensure that the subject samples do not differ between treatments.
Second, we did not conduct two possible treatments: a laboratory survey with cheap-talk (CT) and a non-hypothetical online experiment. Adding these two would cover all six treatments: two implementation environments (laboratory and online) × three conditions (non-hypothetical, hypothetical, and hypothetical with CT) = 6 treatments, given that CT applies only to hypothetical conditions. It is relatively easy to communicate CT in the laboratory. Conducting a non-hypothetical online experiment is a challenge, as research companies often pay participants in points instead of cash, which conflicts with the induced value theory proposed by Vernon Smith.
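The count of six treatments can be checked by enumerating the design cells. The sketch below assumes that cheap-talk is combined only with hypothetical conditions, so that each environment has three conditions; the labels for the cells are descriptive strings chosen for this example.

```python
from itertools import product

ENVIRONMENTS = ("laboratory", "online")
# ASSUMPTION: cheap-talk is only meaningful in a hypothetical condition.
CONDITIONS = ("non-hypothetical", "hypothetical", "hypothetical with cheap-talk")

all_treatments = set(product(ENVIRONMENTS, CONDITIONS))

# The four treatments conducted in this study, keyed by design cell.
conducted = {
    ("laboratory", "non-hypothetical"): "NHLEE",
    ("laboratory", "hypothetical"): "HLS",
    ("online", "hypothetical"): "HOS",
    ("online", "hypothetical with cheap-talk"): "HOSCT",
}

missing = sorted(all_treatments - set(conducted))
print(len(all_treatments), missing)
```

The two cells left over are the laboratory survey with cheap-talk and the non-hypothetical online experiment, which match the two treatments named as not conducted.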
Third, we calculated WTP in preference space. Since we needed to compare WTP across treatments to test for HB, it would have been preferable to estimate WTP directly in WTP-space, which accounts for scale effects between treatments. We attempted to estimate the model in WTP-space with these data using the generalized mixed logit model of NLOGIT, but the estimation was not possible. An appropriate design for estimation in WTP-space is an issue to be explored in future research.
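The difference between the two parameterizations can be written out as follows. This is a sketch in assumed notation (the symbols are not from the paper): n indexes participants, j alternatives, and t choice tasks, and the price coefficient is taken to be negative.

```latex
% Preference space: WTP is derived after estimation as a ratio of coefficients.
U_{njt} = \alpha_n \,\mathrm{Price}_{jt} + \beta_n \,\mathrm{Carbon}_{jt} + \varepsilon_{njt},
\qquad \alpha_n < 0

% Define the scale-related term and the WTP per 1 g of carbon emission reduction.
\lambda_n = -\alpha_n > 0, \qquad w_n = \frac{\beta_n}{\alpha_n}

% WTP space: the same utility, with w_n estimated directly as a parameter.
U_{njt} = -\lambda_n \left( \mathrm{Price}_{jt} + w_n \,\mathrm{Carbon}_{jt} \right) + \varepsilon_{njt}
```

Substituting the definitions back gives the preference-space utility, so the two forms describe the same model; in WTP space, the distribution of w_n is specified directly, and differences in scale between treatments are absorbed by the scale-related term rather than by the WTP parameter.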

Conclusions
This study was conducted in laboratory and online settings, and no treatment comparison revealed HB in relation to the carbon footprint of oranges. Hence, we conclude that environmental and climate change attributes in food are probably less prone to HB. In addition, the analysis using the environmental psychology scale showed that high environmental consciousness reduced the coefficients of carbon emissions in all treatments, suggesting that the effect of consumers' environmental awareness was not affected by treatment. Accordingly, when testing for HB going forward, it is important to consider not only treatment differences but also the effects of psychological scales related to the target attribute. In sum, this study suggests that policymakers would do well to consider psychology as well as attributes, rather than thinking only about whether to use an economic experiment or a survey.

Supporting information
S1 Table. The random parameter logit regression results in the main effect (Model 1) and that with interaction (Model 2) (All).