The authors have declared that no competing interests exist.
When many events contributed to an outcome, people consistently judge some more causal than others, based in part on the prior probabilities of those events. For instance, when a tree bursts into flames, people judge the lightning strike more of a cause than the presence of oxygen in the air—in part because oxygen is so common, and lightning strikes are so rare. These effects, which play a major role in several prominent theories of token causation, have largely been studied through qualitative manipulations of the prior probabilities. Yet, there is good reason to think that people’s causal judgments are on a continuum—and relatively little is known about how these judgments vary quantitatively as the prior probabilities change. In this paper, we measure people’s causal judgment across parametric manipulations of the prior probabilities of antecedent events. Our experiments replicate previous qualitative findings, and also reveal several novel patterns that are not well-described by existing theories.
When something happens, people commonly ask: What caused that? Making causal judgments is often deceptively easy. We naturally conclude that the lightning strike caused the forest fire; the last-minute goal caused the sports team’s victory; or the scandal caused the political candidate’s defeat. These kinds of causal judgments are also very important, structuring how we understand and interact with our environments.
Yet, there are certain things that people are reluctant to label causes of an event, even though the event clearly depended on them. For instance, it is less natural to conclude that the forest fire was caused by the presence of oxygen, or by the lack of rain, or by the arsonist’s birth, even though all of these events were necessary for the fire to happen.
This phenomenon is widespread: When an outcome occurs, people consistently judge certain seemingly-relevant events more causal than others [
When attempting to understand why some causes stand out while others recede, a key fact is that this distinction is not dichotomous, but rather
Despite its importance, however, our empirical understanding of gradation in token causal judgment is limited. Many studies have investigated how qualitative shifts in the parameters of causal systems (e.g. the prior probability of the antecedent events) produce qualitative shifts in causal judgment (see [
Several qualitative manipulations are known to affect which events people consider causal. We do not attempt an exhaustive review; rather, we follow Icard et al. [
We first consider conjunctive causal systems, where multiple antecedent events were counterfactually necessary for an outcome—e.g. there was a fire which required both oxygen and a lit match. In these systems, classic normative analyses of causation would, roughly, label the necessary events as equally causal [
This effect has been demonstrated in two ways. First, when asked to select which of several necessary events was the cause of an outcome within some naturally occurring series of events, people tend to pick the rarer event. For instance, people say the match, not the oxygen, caused the fire. They do so in part because oxygen is always present, and a lit match is rarer [
Second, researchers have directly manipulated the prior probability of antecedent events, and found that making an event rarer causes people to assign it more causality. For instance, Icard et al. gave people vignettes like the following [
Professor Smith works at a large university. At this university, in order to get new computers from the university, faculty like Prof. Smith must send an application to two administrative committees, the IT committee and the department budget committee. Prof. Smith will be able to get her new computers if the IT committee AND the department budget committee approve her application.
Icard et al. manipulated the prior probability of one of the causes by telling participants that the budget committee either “almost always” or “almost never” approves applications. Then, participants were told that both of Prof. Smith’s applications were approved, and she received a computer. Participants were asked the extent to which they agreed that the focal event—the budget committee’s approval—caused Prof. Smith to receive the computer. Icard et al. found that people agreed substantially more when the budget committee almost never approves applications. This result, along with many similar results, demonstrates that rarer events are often considered more causal. Following others, we refer to this effect as “abnormal inflation”.
A less thoroughly explored effect is that, in conjunctive systems, people also judge an antecedent event
In disjunctive causal systems, where multiple events are each sufficient to produce an outcome, these effects change. Here, people appear to judge events more causal if they are
In sum, we’ve highlighted four causal selection effects in token causal judgment: abnormal inflation and supersession in conjunctive systems, and abnormal deflation and no supersession in disjunctive systems [
Second, these causal selection effects have typically been studied with qualitative manipulations of the rarity of the antecedent events. Relatively little is known about how token causal judgments vary quantitatively with the probability of antecedent events. This is our point of departure.
To examine the quantitative nature of token causal judgment, we consider a deterministic causal system in which two binary variables combine, either conjunctively (Experiment 1) or disjunctively (Experiment 2), to produce a binary outcome (
(A) Causal structure with two variables that are individually necessary and jointly sufficient to produce an outcome. The outcome equals 1 if and only if both
As discussed above, previous research has established that the prior probabilities of the antecedent variables matter for causal judgment. However, prior work has largely restricted itself to qualitative manipulations—e.g. comparing cases where an antecedent event is rarely present to cases where it is almost always present. Here, we map the prior probability of each antecedent variable (
To preview our results: We largely replicate the first three qualitative findings described above (abnormal inflation and supersession in conjunctive structures, abnormal deflation in disjunctive structures), but we find a “reverse” form of supersession in disjunctive structures. Moreover, our experiments reveal nonlinear effects which, to our knowledge, are not well captured by existing theories.
The data and analysis code for both experiments can be found at
In Experiment 1, we presented participants with the following vignette and an associated image (
A person, Joe, is playing a casino game where he reaches his hand into two boxes and blindly draws a ball from each box. He wins a dollar if and only if he gets a green ball from the left box and a blue ball from the right box.
Joe closes his eyes, reaches in, and chooses a green ball from the first box and a blue ball from the second box. So Joe wins a dollar.
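The win rule in this vignette can be written as a small deterministic function. The sketch below is illustrative only: the function names (`wins_conjunctive`, `is_necessary`, `win_probability`) are hypothetical, and the closed-form win probability assumes that the two draws are independent, which the vignette suggests but does not state formally.

```python
def wins_conjunctive(green: bool, blue: bool) -> bool:
    """Joe wins if and only if he draws green from the left box AND blue from the right box."""
    return green and blue


def is_necessary(outcome_fn, green: bool, blue: bool) -> bool:
    """Counterfactual test: does the outcome change if only the green draw is flipped?"""
    return outcome_fn(green, blue) != outcome_fn(not green, blue)


def win_probability(p_green: float, p_blue: float) -> float:
    """Prior probability of winning, assuming the two draws are independent (an assumption)."""
    return p_green * p_blue
```

In the scenario described, where Joe draws both green and blue, `is_necessary(wins_conjunctive, True, True)` holds: had the green draw gone the other way, he would not have won.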
The images presented the two boxes, illustrating the percentage of green balls in the left box and blue balls in the right box. By manipulating these images, we manipulated the prior probability that Joe draws a green ball or a blue ball. Since we ask people to rate the causality of drawing the green ball, we will refer to green as the “focal” variable (labeled
For each observation of the system, we randomly selected a prior probability for
All participants were recruited through Amazon Mechanical Turk. They gave informed consent, and the study was approved by Harvard’s Committee on the Use of Human Subjects. The experiment took about 3 minutes to complete, and participants were paid $0.35 each.
We employ a within-between subjects design. Each participant was given the vignette five times, each time with a randomly chosen probability setting. (We ensured that a participant did not see the same probability setting twice.) We ran
In addition to the qualitative predictions described above (abnormal inflation and supersession in the conjunctive structure), several contemporary theories of token causation make more precise quantitative predictions in our task. In our treatment of these models, we sometimes make additional assumptions to adapt them to the present task. Our intent is not to critically evaluate the models in their exact original setting, but rather to examine the extent to which plausible instantiations of them can capture quantitative variation in the situations we study. Also, we focus primarily on the
One influential theory comes from Halpern & Hitchcock [
Halpern & Hitchcock argue that this observation can explain the dependence of causal judgment on the prior probabilities of the variables. To determine whether a variable
Halpern & Hitchcock’s model is not committed to a particular normality ordering over possible witness worlds (for a detailed discussion, see [
(A-D) Predictions from the five models we analyze. The x-axis indicates the probability of drawing a green ball; line color indicates the probability of drawing a blue ball; and the y-axis indicates
Formally, then, the degree to which
Halpern & Hitchcock’s model is ordinal [
Another theory we consider comes from Icard, Kominsky, & Knobe [
According to Icard et al., while the probability of sampling a counterfactual world with
In the structure of Experiment 1,
The predictions of this simplified version of Icard et al.’s model are shown in
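To make the shape of such predictions concrete, here is a minimal sketch of one reading of a necessity-sufficiency model. It assumes that counterfactual worlds in which the focal event is absent are sampled with probability 1 - P(focal) and test necessity, while worlds in which it is present are sampled with probability P(focal) and test sufficiency. This weighting, the independence of the two draws, and the name `icard_strength` are assumptions made for illustration, not a reproduction of the paper's own formula.

```python
def icard_strength(p_focal: float, p_alt: float, structure: str) -> float:
    """Assumed necessity-sufficiency mix, weighted by the focal event's prior probability."""
    if structure == "conjunctive":      # outcome = focal AND alternate
        necessity = 1.0                 # without the focal event, the outcome never occurs
        sufficiency = p_alt             # with it, the outcome occurs whenever the alternate does
    elif structure == "disjunctive":    # outcome = focal OR alternate
        necessity = 1.0 - p_alt         # without the focal event, the outcome fails only if the alternate is absent
        sufficiency = 1.0               # with it, the outcome always occurs
    else:
        raise ValueError(f"unknown structure: {structure!r}")
    return p_focal * sufficiency + (1.0 - p_focal) * necessity
```

Under these assumptions the conjunctive score falls as the focal event becomes more likely and rises as the alternate becomes more likely, which is the direction of abnormal inflation and supersession.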
Finally, we consider three classic “causal strength” measures, some of which were designed primarily to capture other flavors of causal judgments (e.g. type causation). Nonetheless, these measures make plausible quantitative predictions in the causal systems we test, and so we include them in our analysis.
The first, denoted SP, assigns causality to a variable to the extent to which observing the variable raised the probability of the outcome:
The second is similar, but requires causes to raise the probabilities of their outcomes, relative to a state where the cause was absent. This model, denoted Delta-P, takes the form:
The third measure builds on the second, but normalizes the rating by the probability that the event is absent when the cause is absent. This model, called Power-PC, takes the form:
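The three measures can be sketched in code using their standard textbook forms: SP as P(E|C) - P(E), Delta-P as P(E|C) - P(E|not C), and Power-PC as Delta-P divided by 1 - P(E|not C). These forms are assumed to correspond to the verbal definitions above. The function names and the helper `outcome_probs` are hypothetical, and the focal cause C and the alternate A are assumed to be independent.

```python
import math


def outcome_probs(p_focal: float, p_alt: float, structure: str):
    """Return (P(E), P(E | C), P(E | not C)) for a deterministic two-cause system."""
    if structure == "conjunctive":      # E = C and A
        return p_focal * p_alt, p_alt, 0.0
    if structure == "disjunctive":      # E = C or A
        return 1.0 - (1.0 - p_focal) * (1.0 - p_alt), 1.0, p_alt
    raise ValueError(f"unknown structure: {structure!r}")


def sp(p_focal: float, p_alt: float, structure: str) -> float:
    """SP: how much observing the cause raises the probability of the outcome."""
    p_e, p_e_given_c, _ = outcome_probs(p_focal, p_alt, structure)
    return p_e_given_c - p_e


def delta_p(p_focal: float, p_alt: float, structure: str) -> float:
    """Delta-P: probability of the outcome with the cause minus without it."""
    _, p_e_given_c, p_e_given_not_c = outcome_probs(p_focal, p_alt, structure)
    return p_e_given_c - p_e_given_not_c


def power_pc(p_focal: float, p_alt: float, structure: str) -> float:
    """Power-PC: Delta-P normalized by 1 - P(E | not C); NaN when that is zero."""
    _, p_e_given_c, p_e_given_not_c = outcome_probs(p_focal, p_alt, structure)
    denominator = 1.0 - p_e_given_not_c
    if denominator == 0.0:
        return math.nan  # undefined when the outcome is certain even without the cause
    return (p_e_given_c - p_e_given_not_c) / denominator
```

In a disjunctive structure this form of Power-PC equals 1 wherever it is defined, so it does not vary with either prior probability.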
There are other notable models of token causal judgment that use Bayesian reasoning to infer underlying causal structures from sparse data [
People’s average ratings are shown in
We replicated the qualitative findings from previous work. People exhibited abnormal inflation; on average, they rated drawing green (the focal variable) as more causal when drawing green was less likely. They also exhibited supersession; on average, they rated drawing green as less causal when drawing blue (the alternate variable) was less likely.
To demonstrate both of these effects statistically, we estimated a linear mixed effects model, regressing the causal ratings of green on the prior probabilities of drawing green and blue, with random intercepts and slopes for each subject. On average, people indeed rated green as more causal when its probability was lower (
(A) People show a larger abnormal inflation effect from
To rule out that these basic patterns were driven by order effects in our repeated-measures design, we analyzed how they changed from the beginning to the end of the experiment. The results did not differ much between the first and second half of trials (see
More interesting, however, are the nonlinear patterns. We highlight four such patterns, all revolving around what happens when drawing one of the balls is certain. First, people exhibit a large jump in abnormal inflation from
Second, people exhibit a large jump in supersession from
Third, people do not exhibit abnormal inflation when
Fourth, there is an analogous effect for supersession; people do not exhibit supersession when
To compare the shape of people’s response patterns to those predicted by the models, we computed the correlation between the empirical ratings and each model’s predictions, across settings of
Each point represents one joint setting of the two probabilities (e.g. one point might capture the case where
The Icard model and SP both scored well by predicting the linear trends of abnormal inflation and supersession. However, neither correctly characterized the overall nonlinear patterns in people’s ratings. In particular, none of the models captured the effects of certainty described above.
We characterized people’s quantitative causal ratings across the free parameters of a conjunctive causal system. As expected, people exhibit both abnormal inflation, where their causal rating of the focal variable (green) increased as it became rarer, and supersession, where their causal rating of the focal variable decreased as the alternate (blue) became rarer.
Moreover, people showed clear nonlinear patterns in judgment when the candidate causes were certain to occur. Specifically, they showed an unexpectedly strong drop in their causal rating of the focal variable when it was certain to occur, and an unexpectedly strong increase when the alternate was certain to occur. In addition, the effect of abnormal inflation disappeared when the alternate was certain to occur, and the effect of supersession disappeared when the focal variable was certain to occur. These patterns are not well-described by extant models.
Experiment 2 was identical to Experiment 1, except the causal structure was disjunctive instead of conjunctive (
On each trial, participants were told that Joe drew both a green and blue ball, and won the dollar. As before, we parametrically manipulated the prior probabilities of drawing green (
All participants were recruited through Amazon Mechanical Turk, and gave informed consent. We ran
In the disjunctive structure, the qualitative predictions from past research were (a) that participants would show abnormal
Recall that in Halpern & Hitchcock’s model [
To determine the precise predictions, there are three cases to consider. First, consider the case where both
Similarly, if
The third and hardest case is when one of the two variables is likely, and the other is rare—that is, when
But it is plausible that, in our experiment, people would rank the normality of a world according to the
This leaves us with two potential response profiles, HH1 and HH2:
Below, we consider both variations of the model. Their predictions are depicted in
(A-F) Predictions from the models we analyze. The x-axis indicates the probability of drawing a green ball; line color indicates the probability of drawing a blue ball; and the y-axis indicates
Recall that, according to the model of [
In the structure of Experiment 2,
The Icard model’s predictions are shown in
In the disjunctive structure of Experiment 2, the three causal strength measures considered above reduce to (
People’s average ratings are shown in
We dissect these effects further in
(A) People exhibit a linear effect of abnormal deflation. (B) People show a larger reverse supersession effect from
There were two other nonlinear patterns that mirrored the patterns in Experiment 1. Recall that, in Experiment 1, people stopped exhibiting abnormal inflation when drawing blue was certain, and stopped exhibiting supersession when drawing green was certain (with certain exceptions discussed above). In Experiment 2, we find an analogous pattern: People stop exhibiting abnormal deflation when drawing blue is certain (
These visual observations were confirmed statistically. When
The correlations between the model predictions and people’s ratings across conditions are shown in
Brackets show 95% confidence intervals for Pearson correlation coefficients. The correlation for Power-PC cannot be calculated because it predicts a constant response profile.
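As a rough illustration of this kind of comparison, the sketch below computes a Pearson correlation with an approximate 95% confidence interval. The Fisher z-transform used for the interval is an assumption, since the text does not say how the intervals were obtained, and the function name `pearson_with_ci` is hypothetical.

```python
import math


def pearson_with_ci(xs, ys, z_crit=1.959964, eps=1e-12):
    """Pearson r with an approximate 95% CI from the Fisher z-transform (assumed method).

    Returns (r, lower, upper), or None when r is undefined: mismatched lengths,
    fewer than four points, or a constant input such as a flat model prediction.
    """
    n = len(xs)
    if n != len(ys) or n < 4:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx <= eps or syy <= eps:
        return None  # zero variance: the correlation cannot be calculated
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))
    if abs(r) == 1.0:
        return r, r, r  # atanh diverges at +/-1, so the interval collapses to a point
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return r, math.tanh(z - half_width), math.tanh(z + half_width)
```

A model whose predictions are identical in every condition yields `None` here, which is the same situation as a constant response profile.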
We characterized people’s quantitative causal ratings across the free parameters of a disjunctive causal system. As predicted by the Icard model [
Moreover, people’s nonlinear patterns mirrored those in Experiment 1: They stopped exhibiting abnormal deflation when the alternate variable was certain to occur, and stopped exhibiting reverse supersession when the focal variable was certain to occur. These nonlinearities are not well-described by extant models.
We presented people with two basic causal systems, and elicited people’s causal judgments across parametric manipulations of two key parameters in those systems: the prior probability of the focal event (whose causal status was being rated), and the prior probability of an alternate event (that was either necessary or sufficient for the outcome). These experiments replicated three previously established qualitative effects (abnormal inflation and supersession in the conjunctive structure, abnormal deflation in the disjunctive structure) and revealed a novel effect of reverse supersession in the disjunctive structure. Moreover, people exhibited a constellation of nonlinear patterns that were strikingly consistent across the two experiments. We discuss these results in turn (see
Table of effects

Condition | Conjunctive structure (Experiment 1) | Disjunctive structure (Experiment 2) |
---|---|---|
Focal event becomes more likely | People rate focal as less causal | People rate focal as more causal |
Alternate event becomes less likely | People rate focal as less causal | People rate focal as more causal |
Focal event is certain | People show stronger abnormal inflation, and stop showing supersession. | People stop showing reverse supersession. |
Alternate event is certain | People show stronger supersession, and stop showing abnormal inflation. | People show stronger reverse supersession, and stop showing abnormal deflation. |
The original supersession effect is that, in conjunctive structures, a focal event is rated less causal when an alternate event becomes rarer; the alternate event “supersedes” the focal event [
In contrast to those papers, we do find a kind of supersession effect in the disjunctive structure of Experiment 2. The effect is reversed: The focal event is rated more causal when the alternate event becomes rarer (or, equivalently, the focal event is rated less causal when the alternate event becomes more common). Given the logic of supersession, this reversal is unsurprising. In disjunctive structures, events become causally preferred when they are more common (the “abnormal deflation” effect). It makes sense, then, that if the alternate event were to supersede the focal event, it would do so when it was common, not rare.
What
Across most of the parameter space, the effects of prior probabilities on causal judgment appear largely linear and additive. That is, the prior probabilities of the focal and alternate events each have a mostly-linear influence on causal judgment, and do not interact much. The main exception is when the probability of either event approaches 1. When one of the events is certain to occur, people exhibit a constellation of departures from the linear, additive baseline.
First, three of the four primary effects—abnormal inflation and supersession in Experiment 1, reverse supersession in Experiment 2—become very strong when the relevant events become certain. For instance, in Experiment 1, although the focal variable (drawing a green ball) is considered less and less causal as it becomes more common, the drop in causal rating from
Discontinuities in causal judgment when an event is certain have been discussed before. Perhaps most prominently, Cheng & Novick [
Another possibility is that these nonlinearities result from a domain-general feature of people’s probabilistic reasoning: People, in general, treat the jump from almost-certain to certain as more important than other equivalently sized jumps in probability [
Both of these perspectives, however, have trouble accounting for the second set of departures from the linear, additive baseline: People stop exhibiting all the effects described here (abnormal inflation/deflation, supersession) when the
The nonlinearities observed in our experiments are primarily about what happens when one of the events becomes certain to occur. It is also possible that there are similar nonlinearities when one of the events becomes certain
Out of the models analyzed here, the response profile predicted by Icard et al. was consistently the most similar to people’s ratings [
People’s causal judgments are influenced by the prior probabilities of the candidate causes. Here, we find that people show systematic patterns of judgment across parametric manipulations of those probabilities, in a way that is not fully characterized by existing theories. We hope this spurs further research into the quantitative nature of token causal judgment.
(A-B) The effects in Experiment 1; (C-D) the effects in Experiment 2. People show roughly similar patterns across the two halves, suggesting that the influence of
We thank Joshua Knobe, Thomas Icard, and the Moral Psychology Research Lab for their advice and assistance.