Abstract
Strategies designed to address specific problems often give rise to unintended, negative consequences that, while foreseeable, are overlooked during strategy formulation and evaluation. We propose that this oversight is not due to a lack of knowledge but rather a cognitive bias rooted in focalism—the tendency to focus narrowly on the primary objective, ignoring other relevant factors, such as potential consequences. We introduce the concept of consequence neglect, where problem solvers fail to generate or consider downstream effects of their solutions because these consequences are not central to the proximal goal. Across four studies, we provide evidence supporting this phenomenon. Specifically, we find that individuals rate strategies more negatively after being prompted to generate both positive and negative consequences, suggesting that negative outcomes are not naturally weighted unless attention is explicitly drawn to them. We conclude by discussing the broader implications of consequence neglect for policymaking, business, and more general problem solving, and offer directions for future research.
Citation: Rodriguez C, Oppenheimer DM (2025) Side effects may include: Consequence neglect in generating solutions. PLoS ONE 20(4): e0322149. https://doi.org/10.1371/journal.pone.0322149
Editor: Tobias Otterbring, University of Agder, NORWAY
Received: November 16, 2024; Accepted: March 17, 2025; Published: April 30, 2025
Copyright: © 2025 Rodriguez, Oppenheimer. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data within our manuscript can be found in the files tab of the following OSF link: https://osf.io/8jrew/
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
“There was an old lady who swallowed a fly; I don’t know why she swallowed a fly - Perhaps she’ll die!
There was an old lady who swallowed a spider, that wriggled and jiggled and tickled inside her! She swallowed the spider to catch the fly; I don’t know why she swallowed a fly - Perhaps she’ll die!
There was an old lady who swallowed a bird; How absurd to swallow a bird!...”
- Rose Bonne
In a classic children’s song, an old lady swallows progressively larger animals, each intended to solve the issue of having swallowed the previous one, but ultimately leading to bigger problems down the road. The rhyme concludes: “There was an old lady who swallowed a horse; she died of course!” While silly and absurd, the song contains an important moral for children: the importance of considering the future consequences of their actions. In every verse it was clear that swallowing a larger animal would have negative outcomes, but the old lady repeatedly failed to anticipate them. Of course, the old lady of the song is fictional, but her consequence neglect mirrors a common pattern in real-life decision making where obvious future consequences of current actions are often ignored.
For example, during the British occupation of India, to combat the growth of the cobra population, the British created a bounty for cobra skins. While the goal was to encourage people to hunt cobras, thus reducing the cobra population, the policy overlooked an obvious consequence: breeding cobras (in order to sell their skins) developed into a lucrative business. The British were then stuck paying for many cobra skins without meaningfully reducing the cobra population. To resolve this, the British called off the bounty. Unfortunately, this resulted in another foreseeable side effect: when cobra farming stopped being profitable, cobra farmers simply released their livestock into the wild. Suddenly, the British found themselves with a cobra problem more severe than before [1].
This fiasco could have been avoided if only the British had thought through what the likely consequences of various policies (above and beyond those intended by the policies) might be. Such oversights are common. This paper aims to address why. We suggest that when there is a problem that needs to be solved, evaluators tend to have “tunnel vision” that focuses on how a given solution will solve that problem (achieve the proximal goal) to the exclusion of other consequences that the solution might engender. That is to say, the problem-solving process leads people to weigh too heavily the question “how well does this solution solve the problem?” at the expense of the question “how good is this solution holistically?” Consideration of potential downstream consequences is crucial for evaluating a solution holistically. We suggest this matters in formal policy-making contexts (e.g., imposing tariffs) as well as in everyday problem-solving domains that all people may face (e.g., motivating a teen to study). Thus, we say that someone is exhibiting consequence neglect when they fail to consider consequences that materially affect their evaluation of a policy (or of a proposed solution more generally), despite being capable of predicting those consequences.
We are not the only scholars who have observed obvious consequences going overlooked. For instance, Bazerman [2] discusses climate change as a “predictable surprise,” which he defines as “an event or set of events that catch an organization off-guard, despite leaders’ prior awareness of all of the information necessary to anticipate the events and their consequences.” He argues that scientists have long been telling us about the dire effects of climate change, and thus, disasters driven by the effects of climate change are predictable surprises. In earlier work, Watkins & Bazerman [3] outline a framework that can be used to describe predictable surprises and when they may occur. Specifically, their framework classifies something as a predictable surprise if it falls into one of three categories: 1) the emerging problem was not recognized when it should have been, 2) the emerging problem was recognized but not prioritized when it should have been, or 3) the emerging problem was recognized and prioritized but a response was not mobilized when it should have been. In terms of this framework, the phenomenon of consequence neglect lives within category 1: the emerging problem was not recognized but should have been. The British would not have thought the cobra bounty an effective policy had they considered the predictable surprise (or, in our terms, the non-focal consequence) of people starting cobra farms.
Other scholarly work related to consequence neglect lies in the phenomenon of opportunity cost neglect [4], defined as when consumers fail to consider alternative uses for their money when making a consumption decision. The authors show that prompting consideration of opportunity costs leads to a reduction in purchase rates. One could construe opportunity cost neglect as a form of consequence neglect. When considering a purchase, consumers focus on whether they like the product, while neglecting the consequence that if they spend their money on the product under consideration, they will not have that money in the future to spend on other, possibly more desirable products. More recent literature has extended this finding from consumer behavior to public policy, showing that participants taking on the role of policy makers were less likely to invest in specific policies when reminded about the opportunity costs of doing so [5].
Why would people neglect consequences when doing so can lead to suboptimal decisions? Watkins & Bazerman [3] highlight self-serving biases, along with a number of organizational and political reasons why predictable surprises may occur. We do not dispute the importance of such factors in creating predictable surprises, but here we work at a different level of analysis, showing how basic cognitive mechanisms can lead people to neglect consequences. It is well known that people have limited cognitive capacity [6], and it has been shown that people take cognitive shortcuts to reduce the mental burden of difficult problems [7]. One common way in which people simplify decisions is by reducing the amount of evidence that they consider (for a review see [8]). Previous research has documented considerable evidence that when people are focused on particular goals, cues, or features, they often neglect other relevant information that would increase the quality of their decision making: a phenomenon known as focalism. For instance, Wilson et al. [9] asked football fans how their happiness would be influenced by the outcome of a game played by a team they supported. The question focused participants on a specific source of happiness (the happiness caused by their favored team winning or losing), leading participants to overestimate the extent to which their overall happiness would be impacted by the game and to neglect how other events in their lives (e.g., upcoming holidays, romantic interactions, work obligations, etc.) would affect their happiness. However, when participants’ attention was drawn to the fact that those other future events would also impact their happiness, participants ceased to overweight the effect of the game. Schkade and Kahneman [10] document a similar focusing illusion in the effect of weather on predictions of life satisfaction. Focalism even influences how academics design studies and interpret research results [11].
We argue that even when people are capable of generating unintended consequences of a solution to a given problem, they may not attempt to do so because these consequences are not naturally focal to the development of the solution. In problem solving contexts, what is most focal to the evaluation is the solution’s effectiveness at dealing with the primary problem at hand. However, many consequences are not directly related to the initial problem, even though they are crucial to a proper evaluation of the solution as a whole; we call such consequences material consequences. For instance, the old woman in the opening example was focused on how to catch a fly in her stomach, and the British were focused on how to get people to kill cobras; these are focal consequences. However, the fact that the old woman would now have a spider in her stomach, and that the British had incentivized breeding cobras, are material but non-focal consequences of their solutions; they are direct results of their solutions, but are unrelated to the initial problem to be solved. Due to focalism, material but non-focal consequences of a solution may never be called to mind, leading to the adoption of less effective (or even harmful) solutions. Thus, asking a consequence-neglectful person to explicitly consider consequences should lead to better policy outcomes because it forces them to confront material consequences that were not focal to the policy’s initial goal.
Other scholars have documented domains in which changing what is focal in our decision context can lead to shifting preferences [12,13]. Thus, providing a decision-making framework that calls for explicit consideration of consequences could serve as a relatively minor nudge [14] with a potentially meaningful impact on how policies are selected and implemented.
In order to explore consequence neglect, we use an experimental paradigm stemming from the illusion of explanatory depth (IOED) literature [15]. We use this paradigm not because we think that the theoretical construct of consequence neglect is related to the illusion of explanatory depth, but because it allows a convenient within-subjects method that can be used to detect evidence for consequence neglect. In classic studies, participants evaluated their understanding of the functional mechanisms of a variety of objects (e.g., toilets or helicopters) both before and after attempting to give a detailed explanation of the mechanism. This gave the authors a within-subject method of measuring how much people overestimated their explanatory prowess. Notably, when evaluating their knowledge of how objects work, participants tend to focus on the function or purpose of the object (which they do understand), rather than on the mechanisms that allow the objects to achieve their function/purpose (which they do not understand) [16]. This leads people to overestimate their understanding unless their focus is drawn to the steps of the mechanistic process.
Our studies utilize a similar structure. In each of our studies, participants were presented with predicaments. Participants then either read a proposed solution or generated their own solution (depending on the study) and evaluated how favorable they found the proposed solution. Subsequently, participants generated (in a free response format) two positive and two negative consequences of the proposed solution. Finally, participants again evaluated how favorable they believed the proposed solution to be.
If participants had considered consequences in their initial evaluation, then highlighting the consequences should have little impact on their subsequent re-evaluation; those consequences would already have been incorporated into their judgments of favorability. However, if participants initially neglected consequences, despite being capable of generating consequences material to a policy’s evaluation, then making consequences focal (by explicitly asking participants to consider them) should lead participants to update their assessments of the policy’s quality.
To foreshadow the results, across multiple policy contexts, participants reliably evaluate policies to be of lower quality after having been prompted to generate both positive and negative consequences for those policies. This pattern of results is consistent with the idea that future benefits (solving the proximal problem) had already been incorporated into evaluations, but future costs (unintended negative consequences) had not. It is notable that in several of these studies participants generated these consequences themselves; thus, they could have been taken into consideration during their initial evaluation (i.e., these aren’t consequences that were unforeseeable). Below, we summarize each study and its key findings.
Study overview
Study 1: Establishing the existence of consequence neglect.
In our first experiment, participants were presented with six pre-selected solutions for addressing different societal and organizational problems. They rated the effectiveness of these policies before and after being prompted to generate both positive and negative consequences. We observed a consistent downward shift in policy ratings after the consequence generation task, suggesting that negative consequences were not initially considered. Additionally, we explored whether individual differences (e.g., cognitive reflection, demographic variables) predicted consequence neglect but found no significant associations.
Study 2a: Consequence neglect in self-generated solutions.
Building on Study 1, Study 2a examined whether consequence neglect extends to self-generated solutions. As in Study 1, a consequence generation task led to significant downward revisions in ratings, confirming that individuals neglect negative consequences even in solutions they personally devise. This finding suggests that consequence neglect occurs not only in policy evaluation but also during the problem-solving process itself.
Study 2b: Comparing self-generated vs. other-generated policies.
Study 2b tested whether consequence neglect differs between solution generators and solution evaluators. To allow direct comparison, we took solutions generated in Study 2a and presented them to a new group of participants for evaluation. The consequence generation task once again led to significant downward revisions in policy ratings. Notably, the magnitude of consequence neglect was statistically similar for self-generated and other-generated solutions, suggesting that the failure to consider negative consequences is equally likely to occur during both policy creation and evaluation.
Study 3: Ruling out alternative explanations.
To ensure that our results were driven by consequence neglect rather than unrelated cognitive processes, such as fatigue or mere re-evaluation of the policy, Study 3 introduced two control conditions: a thought generation condition (where participants reflected on policy implementation rather than consequences) and a re-evaluation only condition (where participants simply rated policies twice without any intermediate task). Only the consequence generation condition led to significant downward rating shifts, ruling out the possibility that mere re-evaluation or general cognitive engagement accounted for the effects observed in Studies 1 and 2.
Study 1: Consequence neglect proof of concept
In our initial study, we set out to determine whether consequence neglect was, in fact, an observable phenomenon. In addition, we wanted to explore possible individual differences that might be associated with consequence neglect. As such, we developed a scale meant to capture the degree to which someone has a propensity for consequence neglect (a measure of individual differences in consequence neglectfulness). In addition, we measured people’s scores on Frederick’s [17] Cognitive Reflection Task (CRT), a measure of proclivity for deliberative thought. If intuitive thought leads to evaluating a policy based upon how well it solves the focal problem, and deliberative thought is necessary to evaluate policies more holistically (i.e., considering consequences), then we would expect lower scores on the CRT to be associated with higher rates of consequence neglect.
Method
Participants. One hundred participants were recruited from Amazon Mechanical Turk for a flat rate payment of $3.90 based on the estimated time to complete the survey. We took two steps to ensure data quality: 1. Upfront attention checks were used to screen out potential bots or inattentive subjects. 2. Before exiting the survey, participants were given an open-ended free response question that sought to differentiate humans from bots [18]. After accounting for these screens, our final sample size for analysis consisted of 95 participants (36% female, mean age = 40 years). The Carnegie Mellon University Institutional Review Board approved all aspects of this study as well as all subsequent studies prior to data collection. Upon entry into the study, informed consent was obtained from participants via survey response.
Procedure. After completing the aforementioned screening procedures and agreeing to participate in the study, participants were presented with a problem and a policy aimed at solving that problem. The participants evaluated the likely effectiveness of the policy on 5-point Likert scales (1=Not effective at all, 5= Extremely effective). This process was repeated for six different problems/policies that included: developing an incentive scheme to increase learning in schools, motivating a salesforce, improving quality of life in a low-income neighborhood, preventing negative outcomes of cosmetic surgery, raising the minimum wage, and responding as a CEO to a company controversy. For example:
“Imagine that you are a policy maker considering how to improve the learning in our education system. The currently proposed policy is to offer incentive bonus pay to teachers for higher student test scores. How effective do you think this policy would be?”
The complete wording for all of the stimuli can be found in our Web Appendix (which is in the files tab of the following OSF link: https://osf.io/8jrew/).
Participants were also presented with another problem concerning the prevention of office supply theft. However, for this problem they were asked to generate their own solution (rather than being provided with an experimenter generated solution). This prompt read:
“Imagine that someone in your office has been stealing supplies (i.e., pens and paper). Your boss has put you in charge of finding a solution to prevent this theft from occurring. Your boss likes to provide these supplies because she thinks it boosts productivity, so ceasing to offer supplies is not an option. What solution would you propose to your boss to solve this problem?”
After generating their solution, they rated the solution’s effectiveness on the same scale as the previous ratings. Next, participants completed a demographics questionnaire in which they self-reported age, education level, race, gender, income, and political affiliation. Participants also completed the CRT [17] as well as a novel individual difference scale meant to measure one’s consequence neglectfulness. The novel scale consisted of 11 items to which participants responded on 5-point agreement scales (1=strongly disagree, 5=strongly agree). Example items from the scale include the following (the complete scale can be found in our Web Appendix, in the files tab of the following OSF link: https://osf.io/8jrew/):
“I often ask myself ‘how did I not see this coming?’”
“When brainstorming, once I find a solution that seems to solve the problem, I choose it and don’t worry about what the long-term consequences may be.”
Participants were next presented with the consequence generation task. For each of the six policies that they had previously seen, as well as their own solution to the office supplies problem, participants were asked to generate two positive and two negative consequences. The order in which positive and negative consequences were generated was counterbalanced to control for possible order effects.
After participants completed the consequence generation task for all the policies, participants were again presented with all seven of the policies and asked to rate each of them using the same scale as on their initial ratings. After this final rating elicitation, participants completed the outro bot-check and were debriefed. A visual representation of the experimental paradigm can be found in Fig 1.
The above figure shows the general outline of the paradigm used throughout the studies in this paper. In general, participants began by rating policy solutions (the source of these policy solutions varied by study). Subsequently, participants were asked to generate two positive and two negative consequences for those same policy solutions. Finally, participants again rated the policy solutions.
Consequence neglect. Our design allows us to test for evidence supporting consequence neglect within subject. Prior to the first rating, participants may or may not have called particular consequences of a given policy to mind. If these consequences were thought of at the time of policy evaluation, then they would be reflected in the initial round of policy ratings. However, if participants did not initially call these consequences to mind during the first evaluation, then the initial ratings would not reflect the consideration of these consequences. During the second rating, we know explicitly that participants had called consequences to mind, since we required them to do so. Thus, if we find a difference between the first and second rating, we can attribute it to engagement in the consequence generation task. It is notable that since participants are responsible for generating the consequences themselves, we know that they were capable of coming up with these consequences. Thus, a systematic downward shift in ratings would be consistent with the presence of negative consequence neglect: people were perfectly capable of generating consequences substantive to policy evaluation, yet did not do so unprompted.
Results
Consequence neglect. Across all contexts, the average rating shift was 0.2 points downward. A paired t-test reveals this shift to be significant (t(664)=6.52, p <.01, dRM = -0.244). Results of rating shifts for the different policy contexts are shown in Table 1. Notably, every context tested was consistent with the pattern that consequence neglect would predict. Paired t-tests showed that this downward shift was statistically significant for three of the six policy evaluation domains and also for the self-generated policy. Overall, these results provide evidence that a consequence generation task leads to a subsequent downgrade in ratings.
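The within-subject analysis above can be illustrated with a minimal sketch of a paired t-test and a repeated-measures effect size (here computed simply as the mean difference divided by the standard deviation of the differences, which may differ from the exact dRM convention the authors used). The ratings below are hypothetical, not the study data:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_and_d(before, after):
    """Paired t-test on pre/post ratings.

    Returns (t statistic, mean-diff / SD-of-diffs effect size,
    degrees of freedom). Illustrative sketch only.
    """
    diffs = [a - b for a, b in zip(after, before)]  # post minus pre
    m, sd = mean(diffs), stdev(diffs)               # sample SD of differences
    n = len(diffs)
    t = m / (sd / sqrt(n))                          # t = mean diff / SE of diffs
    return t, m / sd, n - 1

# Hypothetical 5-point ratings from six participants, before and after
# the consequence generation task (not the actual study data).
t, d, df = paired_t_and_d([4, 5, 3, 4, 5, 4], [4, 4, 3, 3, 5, 4])
```

A negative t and effect size here would correspond to the downward rating shift the paradigm is designed to detect.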
Of course, it was important to determine whether the order in which consequences were generated (positive vs. negative consequences first) had any bearing on the observation of this downward shift. If order effects were observed, then the results could be an artifact of priming (i.e., the most recently generated consideration having the most weight), rather than evidence of consequence neglect. However, across each of the seven contexts, we found no significant difference between participants who generated positive consequences first and those who generated negative consequences first (all ps >.2). This provides evidence that the order in which consequences were generated did not influence the subsequent rating of a policy.
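This order-effect check compares two between-subjects groups (positives-first vs. negatives-first). As a minimal sketch, a Welch-style two-sample t statistic (which does not assume equal group variances) could be computed as follows, again on hypothetical ratings rather than the study data:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch two-sample t statistic for independent groups.

    Illustrative sketch; degrees of freedom (Welch-Satterthwaite)
    are omitted for brevity.
    """
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)          # sample variances
    se = sqrt(vx / len(x) + vy / len(y))       # standard error of the difference
    return (mx - my) / se

# Hypothetical final ratings for two counterbalancing orders.
positives_first = [1, 2, 3]
negatives_first = [2, 3, 4]
t = welch_t(positives_first, negatives_first)
```

A t statistic near zero across contexts would be consistent with the reported absence of order effects.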
Despite the fact that people were fully capable of coming up with negative consequences, these downward shifts (four of which were significant) suggest that they may not have been considered in the first round of ratings. Examining some of the generated consequences sheds light on how easy some of these are to call to mind. For example, for the teacher incentive prompt, some commonly generated negative consequences were “Teachers teach to the test,” “teachers change test scores,” and “Teachers may feel stressed if students don’t perform well” (A complete list of consequences as well as full datasets for all of our studies can be found in our Web Appendix which is in the files tab of the following OSF link: https://osf.io/8jrew/). It is not a matter of whether people are capable of generating consequences, but rather a matter of whether they attempt to do so.
Individual differences. We began our scale analysis by appropriately reverse coding and summing across our scale items to create a composite score such that higher scores were associated with greater consequence neglectfulness. Our scale yielded fairly reliable measurements (Cronbach’s α =.785). However, while the measure was reliable, it was not meaningfully associated with post-consequence-generation rating shifts as we had hypothesized. In fact, none of the individual differences that we measured were meaningfully related to observed levels of consequence neglect, including participants’ CRT scores. Summary statistics describing these null results are shown in Table 2. This suggests that there is no general “consequence neglectfulness” trait that differs between individuals (or, to the extent that there is, such a trait is not associated with demographics, measures of reflective thought, or measures of introspective awareness of one’s tendency to ignore consequences).
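The reverse coding and reliability steps described above can be sketched as follows. This is a minimal illustration of Cronbach’s α on hypothetical responses, not the authors’ analysis code:

```python
from statistics import variance

def reverse_code(item, scale_min=1, scale_max=5):
    """Flip responses on a Likert item (e.g., 1 -> 5, 5 -> 1 on a 1-5 scale)."""
    return [scale_max + scale_min - x for x in item]

def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.

    `items` is a list of k items, each a list with one response per
    participant (same participant order across items).
    """
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]       # per-participant sum score
    sum_item_var = sum(variance(it) for it in items)   # sum of item variances
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Hypothetical responses: two items from four participants.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 2, 4, 4]])
```

Items worded in the non-neglectful direction would be passed through `reverse_code` before being entered into the composite.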
Discussion
Study 1 illustrated that a consequence generation task leads to a subsequent downward revision in one’s evaluation of a given policy. We showed that the same trends occurred across all seven domains tested, and that the results were not an artifact of order effects. However, we did not find any evidence of associations between these downgrades and any individual difference measures, including measures of cognitive reflection, educational attainment, or self-reported tendency to think about consequences.
One intriguing observation was that people showed larger consequence neglect for self-generated solutions to problems than when evaluating solutions generated by others. However, there are (at least) two important caveats to this finding. First, there was only one self-generated solution domain (office supply theft), and it is possible that the specific domain tested happened to evoke stronger consequence neglect than the other domains tested (i.e., stimulus sampling issues [19]). Second, the other-generated solutions were all generated by the research team. Given that the research team was not blind to the hypotheses when generating stimuli, it is possible that the experimenter-generated stimuli could artificially inflate rates of negative consequence neglect. Although the data patterns in Study 1 suggest that, if anything, the opposite is true, this could be an artifact of flawed stimulus sampling. While we explored seven different policy contexts, for the six vignettes with other-generated solutions only a single policy solution was proposed and evaluated. The results would be more compelling if a wider range of policy solutions were proposed, and if those proposed policy solutions were generated by individuals who were blind to the goals of the study.
Another potential issue with Study 1 is the dependent measure. Participants were asked to evaluate the likely effectiveness of a solution. This question may not have fully captured the concept we were hoping to measure: a solution can be extremely effective at resolving the problem at hand while still having problematic externalities. Participants might have neglected consequences because they thought that consequences weren’t germane to the specific question being asked. As such, in Study 2a we asked participants to evaluate the overall favorability of the solution, which is less likely to be interpreted as solely about how well the solution deals with the proximal problem. Study 2a was designed to resolve these issues.
Study 2a: Consequence neglect for self-generated solutions
Study 2a differs from Study 1 in several ways. First, participants in Study 2a generated their own solutions rather than being provided with experimenter-generated solutions. It may be that in the process of generating solutions, people think more deeply, which might lead consequences to come to mind. Alternatively, the goal-oriented nature of developing a policy to solve a problem may actually exacerbate consequence neglect by motivating people to focus on the problem at hand rather than the unintended results of their solutions.
Second, the participants were asked to evaluate the favorability of a policy, rather than the effectiveness of a policy. Third, we used several new scenarios to further generalize the results to a more diverse set of policy domains. Finally, as our measure of individual differences in consequence neglect had proven to be uninformative in Study 1, it was removed from Study 2a to shorten the experiment and save time.
Method
Participants. Subjects were recruited from Amazon Mechanical Turk for a flat rate payment of $3.00 based on the estimated time to complete the survey. We intended to recruit 200 participants. As with Study 1, we took two steps to ensure data quality: 1. Upfront attention checks were used to screen out potential bots or inattentive subjects. 2. Before exiting the survey, participants were given an open-ended free response question that sought to differentiate humans from bots. After accounting for these screens, our final sample size for analysis consisted of 181 participants (48% female, mean age = 39 years).
Procedure. Similar to Study 1, after participants completed these procedures, they were faced with five hypothetical problems. The domains of these problems were: motivating a high schooler who has stopped doing homework, raising money for new equipment at a gym, responding as a CEO to company controversy, preventing office supply theft, and alcohol regulation on a college campus. However, unlike most of the problems in Study 1, participants were not provided with a solution to evaluate.
Instead, problems were presented one at a time in random order, and participants generated a solution for each of them. After completing this solution generation task, each participant was shown the solutions that s/he had proposed and asked to evaluate how favorable each proposed policy was on a 5-point Likert scale (1=Not at all favorable, 5=Extremely favorable). From this point onwards, this experiment parallels the procedure of Study 1. Participants completed demographic information (which no longer included our developed scale or the CRT, given the lack of association between these scales and consequence neglect rates in Study 1). Then they completed a consequence generation task and second round evaluation exactly as in Study 1.
Results
Across all contexts, there was an average downward revision of 0.19. A paired t-test reveals this shift to be significant (t(904) = 6.82, p < .01, dRM = −0.225). Results across all domains are shown in Table 3. Notably, paired t-tests showed significant shifts in four of the five policy evaluation domains; the fifth showed a non-significant shift in the opposite direction. Despite developing these solutions themselves, participants exhibited the same general downshift in ratings in this study as in Study 1.
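The paired analysis can be sketched as follows. The ratings below are simulated, not the study data, and the effect size is computed as the difference-score standardized mean, which is one common convention for dRM; the paper does not state its exact formula.

```python
import math
import random

random.seed(0)
# Hypothetical first- and second-round ratings on a 1-5 scale (not the study data):
# the second rating occasionally drops by one point after consequence generation.
first = [random.randint(1, 5) for _ in range(905)]
second = [max(1, r - random.choice([0, 0, 1])) for r in first]

diffs = [b - a for a, b in zip(first, second)]  # second minus first rating
n = len(diffs)
mean_d = sum(diffs) / n
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))

t = mean_d / (sd_d / math.sqrt(n))  # paired t statistic, df = n - 1
d_z = mean_d / sd_d                 # difference-score effect size (one dRM convention)
```

With real data, the same quantities fall out of `scipy.stats.ttest_rel`; the stdlib version above is shown only to make the arithmetic behind a "mean revision of 0.19" and its paired test explicit.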
As in Study 1, we tested for order effects. We found no statistically significant evidence for order effects, although the gym context showed a marginally significant effect in which participants gave a slightly higher final rating when they listed negatives first and positives last, the opposite of the pattern we would expect if the results were driven by priming (t(175) = −1.87, p = .063, d = 0.28; all other ps > .2). Given the lack of significance across contexts, we conclude that the order in which consequences were generated did not influence the subsequent rating of a policy.
Discussion
Similar to Study 1, Study 2a illustrated that a consequence generation task leads to a subsequent downward revision in the evaluation of a given solution. These results extend the findings of Study 1 by demonstrating consequence neglect across several new scenarios. More importantly, Study 2a shows that the phenomenon persists even for self-generated policies.
Although consequence neglect seems to occur when evaluating both self-generated and other-generated policies, a lingering question is the relative magnitude of the effect for generators versus pure evaluators. The results of Study 1 and Study 2a are not directly comparable because different stimuli were used in the two studies. Although in Study 1 the effect was larger for the self-generated solution, it is hard to draw strong inferences from this result because 1) only one self-generated solution was tested (and perhaps that scenario was particularly susceptible to consequence neglect), and 2) the other-generated solutions were created by experimenters who were not blind to the experimental hypotheses and thus could have inadvertently introduced bias or demand into the stimuli.
Fortunately, Study 2a yielded a rich set of diverse proposed policy solutions. These stimuli can be used to speak to cross-stimulus generalizability [19]. Thus, to answer the question of the relative prevalence of consequence neglect for self- vs. other-generated policies, in Study 2b we used the policies generated by the participants in Study 2a as stimuli. This allows for an apples-to-apples comparison between rating shifts for policy-generators and policy-evaluators.
Study 2b: Consequence neglect for other-generated solutions
Study 2b was designed for two purposes: (1) to test whether consequence neglect tends to be larger for generators or for evaluators, and (2) to replicate the findings of Study 1 with a more diverse and unbiased set of stimuli. Specifically, given that our initial policies were developed with obvious negative externalities in mind, this form of stimulus sampling ensures that a consequence generation task can shift people’s policy evaluations without our unconsciously putting our thumbs on the scale. Study 2b achieved these goals by using solutions generated by participants in Study 2a as stimuli for participants in Study 2b.
Method
Stimuli selection. Participants were presented with the same five problems as in Study 2a. However, rather than generating their own solutions, they evaluated solutions that were proposed by the participants in Study 2a. In some cases, this required editing the wording to make the solution flow (without changing the meaning). For example, many solutions in Study 2a were written in the first person (e.g., “I would make an inventory sheet to sign out supplies when needed”). We edited these to remove reference to the self (e.g., “make an inventory sheet to sign out supplies when needed”). Three research assistants independently evaluated whether each prompt should be selected for inclusion in Study 2b. Unclear, confusing, irrelevant, and ambiguous solutions were removed (e.g., “400”). In addition, multi-barreled solutions (e.g., “Bribe him, or take away things he enjoys until he does it.”) were removed because it would be unclear whether participant evaluations reflected the first barrel, the second barrel, or both. Table 4 displays the number of proposed solutions from Study 2a that were included in Study 2b for each of the five domains.
Participants. Subjects were recruited from Amazon Mechanical Turk for a flat-rate payment of $2.25 based on the estimated time to complete the survey. We intended to recruit 124 participants. As in the prior two studies, we took two steps to ensure data quality: 1) upfront attention checks were used to screen out potential bots or inattentive subjects, and 2) before exiting the survey, participants were given an open-ended free-response question designed to differentiate humans from bots. After accounting for these screens, our final sample for analysis consisted of 113 participants (35% female, mean age = 36 years).
Procedure. After completing the screening procedures, participants faced the same five hypothetical problems as in Study 2a. However, rather than generating their own solutions to these problems, participants read and evaluated solutions written by participants in Study 2a. Each participant saw only one solution per context; however, across all participants approximately 500 different solutions were evaluated. Aside from the use of participant-generated stimuli, the procedure was identical to that of Study 2a.
Results
Consequence neglect. As in the prior studies, we found a significant mean downward revision of 0.28 (t(564) = 8.56, p < .01, dRM = −0.353). Rating shifts for each domain are shown in Table 5. In this study, paired t-tests showed significant shifts in all five policy evaluation domains.
As with the prior studies, for four of the contexts we found no significant order effects (ps > .6). The fifth context (CEO corporate controversy) did show a significant order effect (t(111) = 2.03, p = .045, d = 0.38), in which participants gave a significantly higher final rating when they listed negatives first and positives last (the opposite of the trend we would expect if there were a priming confound). However, because this order effect was not significant for this scenario in the prior studies, and no other scenario showed order effects, we believe this result is likely a false positive.
Comparison to Study 2a. By using stimuli that were generated by participants during Study 2a, we are able to make an apples-to-apples comparison between self-generated and other-generated solutions with regard to the prevalence of consequence neglect. However, because each domain had a different number of stimuli, some stimuli were displayed to more than one participant. For instance, given that there were 124 stimuli in the office supply context but only 84 in the alcohol regulation context, there were at least 40 duplicates in the alcohol regulation context. In practice, the number of duplicates in each context exceeded this theoretical minimum due to the randomization of stimulus presentation. Additionally, because some participants were screened out of the survey for failing a bot check, the stimuli they would have seen were not included in the final sample, so some stimuli may not have been seen at all. Duplicates were identified and removed at random to avoid introducing any potential bias (i.e., for each stimulus that had been seen by two participants, one of the two evaluations was randomly removed from the sample before final analysis). This allowed a one-to-one comparison for each unique solution: one rating from a participant in Study 2a (self-generated) and one from a participant in Study 2b (other-generated). We re-analyzed Study 2a considering only the subset of responses that were used in Study 2b. The results aligned with the original analysis: there was a significant mean downward revision of .20 (t(433) = 4.93, p < .01, dRM = −0.24). Similarly, the re-analysis of Study 2b evaluations with duplicates removed showed a significant mean downward revision of .28 (t(433) = 7.04, p < .01, dRM = −0.34). Thus, the findings are robust when considering only the subset of policies included in both studies.
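The random duplicate-removal step described above can be sketched as follows. The stimulus ids and rating shifts are invented for illustration; the actual records presumably also carried participant ids and both rounds of ratings.

```python
import random
from collections import defaultdict

random.seed(1)
# Hypothetical (stimulus_id, rating_shift) records from Study 2b evaluators;
# "s3" was shown to three participants, so two of its records must be dropped.
records = [("s1", -1), ("s1", 0), ("s2", -1), ("s3", 0), ("s3", -2), ("s3", -1)]

by_stimulus = defaultdict(list)
for stim, shift in records:
    by_stimulus[stim].append(shift)

# For each stimulus seen by more than one participant, randomly keep a single
# evaluation, yielding exactly one Study 2b rating shift per unique solution.
deduplicated = {stim: random.choice(shifts) for stim, shifts in by_stimulus.items()}
```

After this step, each key in `deduplicated` can be matched one-to-one with the Study 2a participant who generated that solution, enabling the paired self- versus other-generated comparison.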
Because participants saw effectively identical prompts, we could use a paired t-test to explore whether rating shifts were larger for self-generated or other-generated solutions. On aggregate, we found no significant difference in rating shifts between the generators from Study 2a and the evaluators from Study 2b (t(433) = 1.28, p = .20, d = 0.12). There was no consistent trend across scenarios: for two scenarios the rating shifts were higher in Study 2a, for two scenarios they were higher in Study 2b, and for the final scenario they were the same. This suggests that the consequence generation task had similar effects on both self- and other-generated solutions. However, unsurprisingly, people did rate their own solutions higher than others’ solutions. All ratings for the solutions in this comparison set are shown in Table 6.
Discussion
Consistent with the prior two studies, Study 2b illustrated that a consequence generation task leads to a subsequent downward revision in the evaluation of a solution. Given the varied stimuli in this study (literally hundreds of participant-generated solutions), we are more confident in the cross-stimulus generalizability of the results.
Considered together, Study 2a and Study 2b allow us to make an apples-to-apples comparison between rating shifts for self-generated vs. other-generated policy solutions, and we find that consequence neglect is robust across this dimension. This suggests that consequences are equally neglected during the generation process and the evaluation process, unless attention is explicitly called to externalities. This aligns with the focalism account of consequence neglect; consequence neglect could happen during any part of the policy-making process for which consequences are not focal to solving the problem at hand.
While this study provides yet another piece of evidence consistent with the presence of consequence neglect, there are two alternative explanations for the pattern of results that are worth exploring. Rating shifts may not be due to consequence generation itself. Instead, the shift could be attributable to 1) merely re-rating the policy or 2) merely thinking more about the solution (content unrelated to consequences). The first account suggests that doing the same task twice leads people to become more negative, such that evaluations tend to go down irrespective of any task happening between the two evaluations. The second account suggests that it is not thinking about consequences per se that leads to downward shifts in evaluations, but that more time, fatigue, or greater depth of processing generally leads people to identify flaws in a policy and lower their evaluations. These possibilities are addressed in Study 3.
Study 3: Ruling out confounds
Method
Participants. Subjects were recruited from Amazon Mechanical Turk for a flat-rate payment of $2.25 based on the estimated time to complete the survey. We intended to recruit 300 participants. As in the prior studies, we took two steps to ensure data quality: 1) upfront attention checks were used to screen out potential bots or inattentive subjects, and 2) before exiting the survey, participants were given an open-ended free-response question designed to differentiate humans from bots. After accounting for these screens, our final sample for analysis consisted of 270 participants (45% female, mean age = 41 years).
Procedure. Study 3 shared a similar structure to the prior studies, using the same materials as Study 2b. Participants were randomly assigned to one of three conditions: consequence generation (a replication), thought generation, or re-evaluation only. Participants in each condition began the survey in an identical manner by evaluating each of the five proposed policies. The consequence generation condition was identical to Study 2b and serves as a replication and comparison standard for the two new conditions.
In the thought generation condition, participants followed a procedure similar to the consequence generation condition. However, instead of being asked to generate consequences for each policy, they were asked to “Please take a moment to reflect on how this plan could be implemented and suggest initial steps you would take to put this plan into motion.” We chose implementation as the topic of reflection because it is not inherently positively or negatively valenced. If merely thinking further about the policy (without making consequences focal) yields patterns of results similar to the earlier studies, then the findings may have little to do with consequences and instead reflect negative shifts after deliberation more generally.
In the re-evaluation-only condition, after participants submitted their first round of ratings, they completed the demographics survey and then immediately submitted their second round of ratings. If mere re-evaluation leads to downward shifts in ratings, then this condition should yield the same effects as the previous studies, despite not compelling participants to attend to consequences. Notably, these conditions also allow us to rule out fatigue as an alternative explanation. If fatigue were driving downshifts in policy evaluations, then we would expect to see significant downshifts in the consequence and thought generation conditions (which are identical in length) relative to the control condition (which is shorter, since participants do not spend time brainstorming and writing about the policy).
Participants in all three conditions ended the survey in an identical manner by re-evaluating each of the five proposed policies. A summary of this experiment’s design is shown in Fig 2.
The above figure shows the outline of the paradigm used in Study 3. Upon entry into the survey, all participants were asked to read and rate solutions in the five different policy domains from Study 2b. After submitting this initial rating, participants were randomly assigned to one of three conditions. After participating in the task relevant to their condition, participants submitted a final rating for all policy solutions.
Results
Similar to the prior studies, our consequence generation condition yielded a downshift in ratings. The downshift in the consequence generation condition was −0.21, which is significantly different from the downshift of −0.01 in the thought generation condition (t(868) = 3.664, p < .01, d = 0.25). The consequence generation condition’s downshift of −0.21 was also significantly different from the downshift of −0.04 in the re-evaluation-only condition (t(766) = 3.662, p < .01, d = 0.25). The downshifts in the thought generation and re-evaluation-only conditions were not significantly different from one another (t(755) = 0.63, p = .26, d = 0.05; for a complete breakdown of rating shifts by condition and context, see Table 7). Taken together, these results suggest that the process of generating consequences leads to an increased downshift relative to thinking about how to implement the policy or merely re-evaluating it.
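A between-condition contrast of this kind can be sketched with a two-sample t statistic. The data, group sizes, and the Welch variant below are illustrative assumptions; the paper does not specify the exact test variant used, and the simulated shifts only loosely match the reported condition means of −0.21 and −0.01.

```python
import math
import random
import statistics

random.seed(2)
# Hypothetical per-observation rating shifts in two conditions (invented values,
# drawn around the reported condition-level downshifts).
consequence = [random.gauss(-0.21, 1.0) for _ in range(2000)]
thought = [random.gauss(-0.01, 1.0) for _ in range(2000)]

def welch_t(a, b):
    """Welch's two-sample t statistic for groups with possibly unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))

t = welch_t(consequence, thought)
```

A negative t here indicates that the consequence generation condition shifted further downward than the thought generation condition, mirroring the direction of the reported contrast.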
General discussion
Across four studies we find evidence that, in general, a consequence generation task leads people to evaluate policies as significantly worse than they did prior to the task, even though participants generated both positive and negative consequences. This downward shift suggests that negative consequences are typically not considered unless attention is explicitly drawn to them. This downshift was present for both self-generated and other-generated policies, for approximately 500 different policy solutions, across 10 different policy contexts, and for two different dependent measures, and it was not present when participants were merely re-evaluating policies or thinking about other aspects of the policy, such as implementation plans. This suggests that individuals often fail to consider consequences even when those consequences are both 1) relevant to the policy evaluation at hand and 2) knowable to the individuals doing the evaluation.
It is also worth noting that in these studies participants were required to list both negative and positive consequences of various policies. Thus, the findings cannot be due to experimenter demand effects (i.e., participants guessing the hypothesis and being more negative in an attempt to respond to social cues), or to differential priming of negative vs. positive features biasing evaluations. Indeed, we found no consistent evidence of order effects – regardless of whether people listed positive or negative consequences first, they downgraded their evaluation of a policy after thinking about the consequences.
Given the importance of accurately assessing solutions to problems, there is value in interventions designed to mitigate consequence neglect. Research on “consider the opposite” strategies suggests that explicitly prompting individuals to think about alternative scenarios can reduce bias [20]. Analogously, “consider the consequences” strategies could attenuate the consequence neglect bias. Broadly speaking, application of behavioral public policy approaches and nudge-based interventions could help decision-makers become more aware of neglected consequences [14,21].
Limitations and future directions
These studies were not incentive compatible. Given that these were hypothetical issues with no real stakes, participants may not have been fully engaged with the initial evaluations. In the real world, when people consider important policies with material impact rather than hypothetical consequences, they may be more motivated and think more deeply when evaluating policies. Under such conditions, consequence neglect may be diminished. Thus, while the present studies were motivated by real-world examples consistent with consequence neglect, it will be important for future studies to investigate the phenomenon in the wild and under incentive-compatible conditions.
These studies were conducted on a WEIRD population [22]. Cultural differences in how policies are evaluated, or in what is considered focal, could impact the extent to which consequence neglect is observed across different populations. In Study 1, we found no evidence that individual differences or demographic differences (including education, race, gender, and political orientation) moderate the effect of consequence neglect. However, there could be other moderators for future research to examine. For example, a policy’s personal relevance, a decision maker’s domain expertise, or a decision maker’s profession may affect results. In our studies, we did not have any a priori inclusion criteria, such as holding political office or pursuing a degree in political science or law. Demonstrating that consequence neglect persists in such populations would be a useful extension of this research and would strengthen the case that the findings have real-world policy impact.
It is possible that in the present studies both positive and negative consequences get overlooked, and the pattern of results arises because the negative consequences exert a bigger influence on the re-evaluation than the positive consequences do. However, this possibility is not mutually exclusive with our consequence neglect account. Specifically, we articulate that consequences that are non-proximal to the goal will not be focal, and thus, will be underweighted in the policy evaluation process. In general, positive consequences are more likely to be proximal to the goal of the policy, (after all, solving a problem was the reason that the policy was created in the first place) and as a result, they are more likely to be considered during the evaluation of a policy. However, one could certainly conceive of situations in which there exist positive consequences that are non-focal, and thus will go underweighted without explicit generation.
Much research on effort reduction strategies in the heuristics and biases literature has discussed the effects of cognitive load and time pressure on effortful processing (for a review, see [23]). In the present studies, participants were not under load, nor under particularly stringent time constraints; we would expect such pressures to only exacerbate consequence neglect, as they would reduce participants’ cognitive capacity to consider non-focal information in problem solving. Of course, in the real world, policy makers, business managers, and everyday people faced with problems operate under conditions of stress, high load, and limited time. Thus, to understand the magnitude of consequence neglect in more naturalistic settings, it would be worthwhile to explore the construct under load and time pressure.
Of course, not every single person will be consequence neglectful in every single problem-solving context. In some contexts, for some people, consequences may be more or less proximal to the goal at hand. In circumstances where consequences are focal to the goal, they would be more likely to be spontaneously considered during policy evaluation. However, in circumstances where consequences are less naturally focal to the goal, these consequences may be neglected. Identifying what makes an element of a solution “focal”, and why, is an important future question for research on consequence neglect, and focalism more generally. Until those questions are answered, we are forced to rely on our intuitions of what counts as focal, which is imprecise and makes it difficult to predict to which contexts these results will generalize.
While we have focused on focalism as a mechanistic explanation for consequence neglect, we acknowledge that there are many reasons across many levels of analysis that would lead to consequences being neglected in real world policy domains. For example, organizational structure or political/ideological constraints could lead to policies that ignore negative consequences [3]. Our current work serves as an existence proof that consequences are neglected even in the absence of such systemic and structural barriers to attending to consequences. However, those factors are important, and future work should explore how individual level mechanisms may interact with systems level mechanisms in preventing optimal problem solving.
Fluency. Processing fluency refers to the metacognitive ease people experience when processing information. This metacognitive ease has been shown to influence a wide variety of domains [24]. In a seminal work, Schwarz et al. [25] asked participants to recall examples of themselves acting assertively. Under one condition, participants were asked to generate six such examples (an easier task), and in the other, participants were asked to generate twelve examples (a more difficult task). Subsequently, they rated how assertive they believed themselves to be. One might think that people would rate themselves higher in response to generating more examples. However, participants rated their assertiveness based on the difficulty they had generating those examples. Ultimately, it was the ease of example generation that drove the ratings, with the participants who generated six examples generally rating themselves as more assertive than those participants who generated twelve examples.
Such a fluency effect may be able to account for part of the experimental results that we observed. Such an account would hinge upon the subjective ease with which people can generate positive and negative consequences. In each of our studies, we asked participants to generate two positive and two negative consequences. In general, the first positive consequence was obvious: solving the problem at hand. However, the second positive consequence is likely more difficult to generate. Analogous to Schwarz et al. [25], by requiring more positive consequences to be generated, the task becomes more metacognitively taxing, and as a result, the policy is given lower evaluations. Meanwhile, generating two negative consequences may be relatively easier, as most policies have some associated costs. While this is a possible alternative explanation for our results, it seems unlikely to be the driving mechanism. Considering that earlier studies demonstrating fluency effects asked for six examples in the fluent condition (and twelve in the disfluent condition; e.g., Schwarz et al. [25]) the idea that asking for a single non-focal positive consequence would be enough to yield the negative evaluations observed in the present manuscript seems somewhat implausible. Nevertheless, future experiments could attempt to rule out this explanation by manipulating the number and type of consequences that participants are asked to generate.
Implications for policy
While it is an empirical question as to how much consequence generation influences policy outcomes, we argue that it will likely, on average, improve them. This is because of the implications of consequence neglect on cost-benefit analysis, which is widely considered the gold standard for policy evaluation [26]. Sunstein argues for cost-benefit analysis not only from an economic perspective but from a cognitive perspective as well, noting that it helps overcome a variety of predictable problems in individual and social cognition, including overreliance on heuristics and biases. While there is little doubt that people consider the benefits of a policy, if people do not consider the unintended consequences, then they will tend to underestimate the costs. This would undermine the value of cost-benefit analysis. In many of our studies, participants named consequences that were clearly crucial to successful evaluation of a policy. Indeed, as the cobra effect exemplifies, there exist many historical policies for which predictable consequences led to the failure of well-intentioned interventions.
Of course, there may be cases when calling explicit attention to consequences results in costs being overweighted to the point of delaying or outright cancelling deployment of an otherwise desirable policy. If the focal problem was particularly urgent or dangerous this could be undesirable; often the benefits of a policy legitimately outweigh the costs of negative externalities. The irony of the need to draw attention to the possible negative consequences of interventions to overcome consequence neglect is not lost on us. However, especially for issues that are not particularly time sensitive or have multiple, highly variant solutions, explicit consideration of negative consequences could overcome consequence neglect and lead to better long-term outcomes.
Conclusion
Our studies provide robust evidence that individuals systematically underweight or ignore non-focal consequences when evaluating solutions. The consistency of this effect across multiple contexts, including self-generated and pre-designed solutions, suggests that consequence neglect is a general cognitive bias with implications for decision-making and public policy. It may be discouraging to consider that consequences may often go neglected. However, our results also speak to the idea that explicitly articulating the consequences makes them focal to the policy evaluation. Thus, consequence neglect is a pervasive but correctable bias.
The implications of this work extend beyond the laboratory. In policymaking, business strategy, and everyday decision-making, structured approaches that encourage consequence consideration—such as cost-benefit analysis, pre-mortem exercises, and scenario planning—could significantly improve the quality of decisions. Our results suggest that decision-makers do not fail to recognize consequences because they are incapable of doing so, but rather because these consequences are not naturally focal in the policy evaluation process. By embedding structured consequence evaluation into institutional processes, organizations and policymakers can create more resilient, forward-thinking solutions that anticipate and mitigate unintended side effects before they arise.
Future research should explore what types of problems are most likely to be associated with consequence neglect as well as what sort of individual differences may predict consequence neglect. This could further inform scalable, real-world interventions to help individuals and policymakers proactively account for consequences in their decision-making. Whether through policy design frameworks, behavioral nudges, or training programs, there is an opportunity to improve decision-making at both individual and societal levels. By recognizing and addressing consequence neglect, we can move toward a world where policies are not only well-intentioned but also well-executed, creating better, more sustainable outcomes for all.
The stakes are high: for policymakers tasked with solving today’s challenges, the consequences of ignoring consequences are like swallowing the proverbial horse, with well-meaning policies that are dead, of course.
References
- 1. Siebert H. Cobra effect: where the solution is worse than the problem. 2001.
- 2. Bazerman MH. Climate change as a predictable surprise. Climatic Change. 2006;77(1–2):179–93.
- 3. Watkins MD, Bazerman MH. Predictable surprises: the disasters you should have seen coming. Harv Bus Rev. 2003;81(3):72–80, 140. pmid:12632806
- 4. Frederick S, Novemsky N, Wang J, Dhar R, Nowlis S. Opportunity cost neglect. J Consum Res. 2009;36(4):553–61.
- 5. Persson E, Tinghög G. Opportunity cost neglect in public policy. J Econ Behav Organ. 2020;170:301–12.
- 6. Simon HA. Bounded rationality and organizational learning. Organ Sci. 1991;2(1):125–34.
- 7. Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science. 1974;185(4157):1124–31. pmid:17835457
- 8. Shah AK, Oppenheimer DM. Heuristics made easy: an effort-reduction framework. Psychol Bull. 2008;134(2):207–22. pmid:18298269
- 9. Wilson TD, Wheatley T, Meyers JM, Gilbert DT, Axsom D. Focalism: a source of durability bias in affective forecasting. J Pers Soc Psychol. 2000;78(5):821–36. pmid:10821192
- 10. Schkade DA, Kahneman D. Does living in California make people happy? A focusing illusion in judgments of life satisfaction. Psychol Sci. 1998;9(5):340–6.
- 11. Gandhi L, Manning BS, Duckworth AL. Effect size magnification: no variable is as important as the one you’re thinking about—While you’re thinking about it. Curr Dir Psychol Sci. 2024;33(6):347–54.
- 12. Hsee CK, Loewenstein GF, Blount S, Bazerman MH. Preference reversals between joint and separate evaluations of options: a review and theoretical analysis. Psychol Bull. 1999;125(5):576–90.
- 13. Read D, Loewenstein G, Rabin M. Choice bracketing. J Risk Uncertain. 1999;17(2):171–202.
- 14. Thaler RH, Sunstein CR. Nudge: improving decisions about health, wealth, and happiness. Penguin; 2009.
- 15. Rozenblit L, Keil FC. The misunderstood limits of folk science: an illusion of explanatory depth. Cogn Sci. 2002;26(5):521–62. pmid:21442007
- 16. Alter AL, Oppenheimer DM, Zemla JC. Missing the trees for the forest: a construal level account of the illusion of explanatory depth. J Pers Soc Psychol. 2010;99(3):436–51. pmid:20658836
- 17. Frederick S. Cognitive reflection and decision making. J Econ Perspect. 2005;19(4):25–42.
- 18. Rodriguez C, Oppenheimer DM. Creating a bot-tleneck for malicious AI: psychological methods for bot detection. Behav Res Methods. 2024;56(6):6258–75. pmid:38561551
- 19. Monin B, Oppenheimer DM. The limits of direct replications and the virtues of stimulus sampling. Soc Psychol. 2014;45:299–300.
- 20. Lord CG, Lepper MR, Preston E. Considering the opposite: a corrective strategy for social judgment. J Pers Soc Psychol. 1984;47(6):1231–43. pmid:6527215
- 21. Milkman KL, Chugh D, Bazerman MH. How can decision making be improved?. Perspect Psychol Sci. 2009;4(4):379–83. pmid:26158985
- 22. Henrich J, Heine SJ, Norenzayan A. The weirdest people in the world?. Behav Brain Sci. 2010;33(2–3):61–83; discussion 83-135. pmid:20550733
- 23. Thomson KS, Oppenheimer DM. The “Effort elephant” in the room: what is effort, anyway?. Perspect Psychol Sci. 2022;17(6):1633–52. pmid:35767344
- 24. Alter AL, Oppenheimer DM. Uniting the tribes of fluency to form a metacognitive nation. Pers Soc Psychol Rev. 2009;13(3):219–35. pmid:19638628
- 25. Schwarz N, Bless H, Strack F, Klumpp G, Rittenauer-Schatka H, Simons A. Ease of retrieval as information: another look at the availability heuristic. J Pers Soc Psychol. 1991;61(2):195–202.
- 26. Sunstein CR. Cognition and cost‐benefit analysis. J Legal Stud. 2000;29(S2):1059–103.