
An accurate and efficient measure of welfare tradeoff ratios

  • Wenhao Qi ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    wqi@ucsd.edu

    Affiliation Department of Psychology, University of California San Diego, La Jolla, CA, United States of America

  • Edward Vul,

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft

    Affiliation Department of Psychology, University of California San Diego, La Jolla, CA, United States of America

  • Lindsey J. Powell

    Roles Conceptualization, Methodology, Project administration, Supervision, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Psychology, University of California San Diego, La Jolla, CA, United States of America

Abstract

People’s decisions are affected by their interest in others’ welfare. They can be motivated both to help and to harm others. The direction and magnitude of these motivations can be quantified relative to a person’s self-interest as a welfare tradeoff ratio (WTR). This construct is valuable for testing quantitative theories of social motivation. However, most existing measures of WTRs, and the similar construct of social value orientation (SVO), are based on multiple choices between discrete sets of payoffs, which forces a tradeoff between the accuracy and efficiency of the measures. Here we introduce the Lambda Slider, a WTR measure that is simultaneously accurate and efficient. A participant uses a linear slider to choose from a continuous range of payoff allocations for herself and her social partner. The underlying payoff functions for self and other create a one-to-one correspondence between the participant’s potential WTR values and the slider positions that she could choose, which enables accurate measurements of WTR from a single response. Across three experiments, we show that a single response on the Lambda Slider has high reliability, high convergent validity with other measures of social motivation, and high external validity for an altruistic decision with real-world consequences. The Lambda Slider is easy to implement and can be applied in a wide variety of studies on the forces that shape social motivation.

Introduction

People’s lives are full of choices that affect both their own welfare and others’ welfare. For example, the decision to give your coat to another person on a cold winter night decreases your own welfare but increases that person’s welfare. People’s decisions in such interdependent situations are driven by a variety of social motivations, including an interest in social norms and reputation along with direct concern for others’ well-being [1,2].

The sum of these social forces results in an overall motivation to increase or decrease another person’s welfare; i.e., to benefit or harm that person. The direction and magnitude of this motivation can be captured as a welfare tradeoff ratio (WTR): the amount of personal welfare one is willing to give up in order to increase or decrease another person’s welfare by a specified amount [3]. Formally, if Alice has a relationship with Bob, then we can express Alice’s utility for a given decision as

U = w_s + λ·w_t (1)

where w_s (“s” stands for “self”) is Alice’s resulting welfare (her actual or expected payoff from the decision), w_t (“t” stands for “target”; we use “target” instead of the usual “other” because the letter o is easily confused with the number 0 as a subscript, and because t happens to be the alphabetical successor of s) is Bob’s welfare (his actual or expected payoff), and λ is Alice’s welfare tradeoff ratio toward Bob. For conciseness, we will use λ to represent welfare tradeoff ratios throughout the paper. A higher λ indicates stronger altruism or friendliness on the part of Alice toward Bob, as it means that Alice will favor actions or situations that are good for Bob, even at the expense of some of her own welfare. In contrast, a lower λ indicates stronger selfishness or dislike. A negative λ would mean that Alice could perceive utility in sacrificing some of her own welfare in order to harm Bob.
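In code, Eq (1) is a one-liner; the sketch below is purely illustrative, with variable names of our own choosing:

```python
def utility(w_self: float, w_target: float, lam: float) -> float:
    """Eq (1): Alice's utility U = w_s + lambda * w_t for one outcome."""
    return w_self + lam * w_target

# With lambda = 0.5, keeping $5 and giving Bob $10 are exactly tied:
assert utility(5, 0, 0.5) == utility(0, 10, 0.5) == 5.0
# With a negative lambda, an outcome that benefits Bob carries negative utility:
assert utility(0, 10, -0.5) == -5.0
```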

An important goal for social psychology and behavioral economics has been to understand the factors that impact people’s concern for others’ welfare [4–8]. WTRs are one of several dependent variables researchers have developed for measuring this concern. Others include decisions in specific economic games (e.g., the dictator game) and composite constructs such as “social value orientation” (SVO). The continuous version of the SVO construct is similar, formally and conceptually, to λ [7]. One advantage of both λ and SVO is that they measure generalizable values that can be used to predict people’s choices across many decision contexts.

An ideal tool for measuring social motivation would be both accurate and efficient. (For simplicity, in most of this paper, “accuracy”, “accurate” or “accurately” entails the technical concepts of both accuracy (or unbiasedness) and precision (or reliability); i.e., a measure needs to be both accurate and precise in order to be called “accurate”. For our purposes, higher efficiency means fewer responses from a participant for one measurement.) This would make it feasible for researchers to measure meaningful differences or changes in concern for others across many people, partners, or situations. For example, an accurate and efficient measure of λ could allow researchers to study how λ changes as social partners build a history of reciprocation, by quickly and repeatedly sampling across interactions [9]. Or it could allow researchers to study how λ reflects positions and connections among many people in a social network [10].

Existing measures of both λ and SVO have generally faced a tradeoff between accuracy and efficiency. As described below, they achieve accurate estimates of one participant’s concern for another person by asking the participant to make a large number of decisions about how they would allocate payoffs with that person. Here we propose a new measure of λ, the Lambda Slider, that largely avoids this tradeoff and achieves an accurate measurement of λ from only a single decision. We first explain why the accuracy–efficiency tradeoff arises in existing measures of λ and SVO and how our new measure eliminates it. We then present three experiments testing the psychometric properties of the Lambda Slider.

Binary allocation tasks

How can we measure λ? The simplest way is through a “binary allocation task” [11]. If we want to measure Alice’s λ toward Bob, we can give Alice two allocation options to choose from (Fig 1A). Option A results in $5 for Alice and $0 for Bob (w_s = 5, w_t = 0), while Option B results in $0 for Alice and $10 for Bob (w_s = 0, w_t = 10). (Here we assume a linear relationship between monetary payoffs and welfare, and that the same increase in payoff leads to the same increase in welfare for both oneself and the target. In practice this may not be exactly true [12], but in our experiments we try to minimize sources of nonlinearity.) If λ > 0.5, Alice will choose Option B, since it leads to a higher overall utility for her than Option A. If λ < 0.5, then the overall utility of Option A is higher, and Alice will be more likely to choose that option instead. Therefore, Alice’s decision on this allocation task tells us whether her λ toward Bob is above or below the threshold λ_c = 0.5. The threshold tested by the task can be adjusted by changing the payoff values of the two allocation options.
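The threshold tested by a binary allocation task follows from equating the two options’ utilities; a small helper (the function name is ours) makes this concrete:

```python
def lambda_threshold(option_a, option_b):
    """The lambda at which two options (w_self, w_target) tie in utility:
    w_sA + lam * w_tA = w_sB + lam * w_tB  =>  lam = (w_sA - w_sB) / (w_tB - w_tA)."""
    (sa, ta), (sb, tb) = option_a, option_b
    return (sa - sb) / (tb - ta)

# The task in the text: A = ($5, $0) vs. B = ($0, $10) tests the threshold 0.5.
assert lambda_threshold((5, 0), (0, 10)) == 0.5
```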

Fig 1. From binary allocation tasks to the Lambda Slider.

(A) A binary allocation task, where the threshold of λ for switching between Options A and B is λ_c = 0.5. The arrows point in the direction of the gradient of the utility function (Eq (1)) for a given λ. (B) Adding a third option C to create a triple-dominance task, which is equivalent to two binary allocation tasks with λ_c = 0 and λ_c = 1 but requires only one response. The shaded areas are all the locations where Option C can be placed in order for the task to be triple-dominance. C′ is a hypothetical third option that does not form a triple-dominance task with A and B, because the three options do not fall along a strictly concave function. (C) An illustration of a hypothetical “septuple-dominance task” in which each option would be preferred for some range of λ. (D) A possible option space for a Lambda Slider, where a participant can choose any point on the curve (via a slider; Fig 2A). Each point on the curve corresponds to a unique λ whose utility gradient is perpendicular to the tangent of the curve at that point.

https://doi.org/10.1371/journal.pone.0322410.g001

However, a single binary allocation task has very low sensitivity, defined as the inverse of the smallest change that can be detected by the measure. A binary allocation task with λ_c = 0.5 cannot distinguish among different λs above, or below, 0.5. By analogy, refusing to give your coat to another person could reflect any λ ranging from valuing your own warmth just a little more than theirs to actively wishing for them to be cold. The accuracy of a measure is upper-bounded by its sensitivity. To gain higher sensitivity in our overall measurement of λ, we can give Alice multiple binary allocation tasks with different λ_c’s. For instance, if we assume λ falls between –2 and 3 and want a measure that gets within 0.5 of the correct value, we need 9 tasks with λ_c = –1.5, –1, …, 2.5. If we aim to get within 0.1 of the correct value, then the number of tasks goes up to 49. This illustrates the inevitable tradeoff between sensitivity and efficiency when we measure λ with binary allocation tasks.
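The task counts quoted above are interval arithmetic; a small helper (ours, not part of any published measure) reproduces them:

```python
import math

def n_binary_tasks(lo: float, hi: float, resolution: float) -> int:
    """Number of evenly spaced thresholds needed to localize lambda in [lo, hi]
    to within `resolution`: one fewer than the number of bins of that width."""
    return math.ceil((hi - lo) / resolution) - 1

assert n_binary_tasks(-2, 3, 0.5) == 9    # the 9-task example in the text
assert n_binary_tasks(-2, 3, 0.1) == 49   # the 49-task example
```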

Most existing measures of λ [13–15], or of related constructs such as social value orientation (SVO; [6,11,16–19]), share the logic of narrowing down λ with multiple binary allocation tasks, and thus share the tradeoff between sensitivity and efficiency. (We believe that λ and SVO are fundamentally the same thing. Their difference is mostly historical: SVO was traditionally treated as categorical and as describing a person’s stable disposition toward an unidentified other, while λ is usually treated as continuous and specific to each decision, influenced by a variety of factors. Recent work has shown a convergence between these two concepts (e.g., [7,9]). For a comprehensive review of the measures in the SVO literature, see [7].) This can result in study designs in which many participants must be recruited to study the effects of only a few factors on social motivation (e.g., [20]).

Triple-dominance tasks

Can we achieve the sensitivity of many binary allocation tasks with only a few responses, or even one response, from the participant? We can draw inspiration from the Triple-Dominance Measure [6]. Although a triple-dominance task is equivalent to two binary allocation tasks, a participant only needs to make one response on the measure. We can create a triple-dominance task by adding a third option to the binary allocation task in Fig 1A: Option C results in $5 for Alice and $5 for Bob (Fig 1B). With these three options, Alice will choose A if λ < 0, B if λ > 1, and C if λ is between 0 and 1. Therefore, this triple-dominance task is equivalent to two binary allocation tasks, but Alice only needs to make one decision by choosing the best option among the three.

Allocation options in a triple-dominance task must be selected such that for any given λ, one option dominates (i.e., results in a higher utility than) the other two, and for each option there exists some λ such that the given option is dominant. In order to maintain these features, the three options need to fall along a strictly concave function in the (w_s, w_t) plane. This means that Options A and B constrain the possible payoffs offered in Option C, as illustrated by the shaded areas in Fig 1B. As a counterexample, consider Option C′ in Fig 1B, which lies below the concave curve through A and B. Given Options A, B and C′, Alice will choose A if λ < 0.5 and B if λ > 0.5, but she will never choose C′, so the task does not satisfy the criterion that for each option there exists some λ such that the given option is dominant.
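These dominance criteria are easy to verify numerically. The sketch below (the payoff values for the dominated option are our own hypothetical stand-in for C′) checks which options are ever chosen over a grid of λs:

```python
def best_option(options, lam):
    """Index of the option (w_self, w_target) maximizing Eq (1)'s utility."""
    return max(range(len(options)), key=lambda i: options[i][0] + lam * options[i][1])

def ever_chosen(options, lams):
    """The set of option indices that are utility-maximizing for some lambda."""
    return {best_option(options, lam) for lam in lams}

lam_grid = [i / 100 - 2 for i in range(501)]   # lambdas from -2 to 3
A, B, C = (5, 0), (0, 10), (5, 5)
assert ever_chosen([A, B, C], lam_grid) == {0, 1, 2}       # triple-dominance holds
dominated = (2, 4)   # hypothetical option below the concave curve through A and B
assert ever_chosen([A, B, dominated], lam_grid) == {0, 1}  # never chosen, like C'
```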

Lambda Slider

By the same logic, we can add more allocation options to a single-choice task to gain higher sensitivity in the measurement of λ. Fig 1C is a hypothetical example of a “septuple-dominance task” with 7 options, in which a single response locates λ within one of 7 intervals. The options still need to fall along a strictly concave function to ensure that each option corresponds to the best choice given some λ.

If we keep adding options to the task, we can create a smooth, continuous curve in the (w_s, w_t) space (Fig 1D), with each point on the curve corresponding to a single λ. This one-to-one correspondence (bijection) between potential λs and points on the curve results in a (theoretically) infinite sensitivity of the measurement, which makes it possible to accurately measure a participant’s λ toward a particular social partner from a single choice. Another way to understand this is to notice that each point on the curve corresponds to one particular exchange rate between w_s and w_t, and the participant can vary the exchange rate continuously until she finds the preferred one according to her λ.

We can present such a continuous set of allocations to the participant with a slider (Fig 2A), and we call it the Lambda Slider (see S1 Appendix for a formal definition and S2 Appendix for a comparison with a related measure, the Circle Test [19]). The payoffs allocated to the participant and the target, w_s and w_t, are both continuous functions of the slider position x, and we call w_s(x) and w_t(x) the payoff functions of the slider.

Fig 2. The (quadratic) Lambda Slider.

(A) The interface of the slider. The payoff to oneself (red bar) and the payoff to the target (blue bar) change continuously as the participant moves the slider. (B) The payoff functions of the quadratic Lambda Slider used in Experiment 1 (a = 11.25). If we plot w_s and w_t against each other, we get a parabolic curve similar to Fig 1D. (C) Examples of the participant’s utility function for different λs. The slider position that maximizes each utility function is marked; it is an identity function of the participant’s λ.

https://doi.org/10.1371/journal.pone.0322410.g002

We can choose the payoff functions such that the slider position x that a utility-maximizing participant chooses (denoted x*) is an identity function of her λ toward the social partner in question. Consequently, the slider position is a direct measure of λ and no additional calculation is required. One class of such payoff functions (and arguably the simplest class; see S1 Appendix) is

w_s(x) = −(a/2)·x² + b_s (2)

w_t(x) = a·x + b_t (3)

x_min ≤ x ≤ x_max (4)

where a > 0 is an arbitrary scale parameter that expands or shrinks the range of payoff values, b_s and b_t are arbitrary shift parameters that can offset the participant’s and target’s payoff ranges from one another, and x_min and x_max are the boundaries of the slider (Fig 2B). When we apply the utility definition of Eq (1), we get

U(x) = w_s(x) + λ·w_t(x) (5)

= −(a/2)·x² + b_s + λ·(a·x + b_t) (6)

= −(a/2)·(x − λ)² + (a/2)·λ² + λ·b_t + b_s (7)

which is a concave parabola with a peak at x = λ (Fig 2C), so it satisfies the criterion

x* = λ, for λ ∈ [x_min, x_max] (8)

In other words, the participant will choose the slider position that is equal to her λ (as long as it falls between x_min and x_max) in order to maximize her utility, and this single response on the Lambda Slider gives a measurement of the participant’s λ with theoretically infinite sensitivity, though of course there will be some limits imposed by the implementation of the task.

We call a Lambda Slider with payoff functions given by Eqs (2) and (3) the quadratic Lambda Slider. When plotted on the (w_s, w_t) plane, the quadratic Lambda Slider is still a parabola, which is a strictly concave curve, similar to Fig 1D. The (quadratic) Lambda Slider shares its logic with mechanism design [21]; i.e., we design the payoff structure such that the player’s rational action directly reveals her hidden preference (λ in our case). In all the experiments below, the slider position x on the Lambda Slider directly maps to λ.
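A quick numerical check of Eqs (2)–(8): with the payoff functions below (the scale a matches Experiment 1, while the shift and boundary parameters are illustrative choices of ours, not the paper’s), a utility maximizer’s grid-searched slider position matches her λ:

```python
a, b_s, b_t = 11.25, 50.0, 50.0      # b_s, b_t are illustrative values
x_min, x_max = -2.0, 3.0             # slider boundaries (illustrative)

def w_self(x):
    return -a * x * x / 2 + b_s      # Eq (2): quadratic payoff to self

def w_target(x):
    return a * x + b_t               # Eq (3): linear payoff to the target

def chosen_position(lam, steps=50_000):
    """Grid-search the slider position maximizing U(x) = w_s(x) + lam * w_t(x)."""
    grid = [x_min + i * (x_max - x_min) / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: w_self(x) + lam * w_target(x))

# Eq (8): the chosen position equals lambda (up to grid resolution).
for lam in (-1.5, 0.0, 0.7, 2.25):
    assert abs(chosen_position(lam) - lam) < 1e-3
```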

SVO Slider Measure

One apparent difference between the Lambda Slider and the measures based on binary allocation tasks is that the set of possible responses is continuous for the former but discrete for the latter. There is an existing measure, the SVO Slider Measure, that employs continuous sliders to assess SVO or λ [7,18]. This measure consists of 6 sliders, each involving linear payoff functions for the participant, w_s(x), and another person, w_t(x). Each slider connects two points on a circular arc centered on (50, 50) (Fig 3A). The points represent the choices most aligned with four categorical social value orientations: competitive, selfish, prosocial, and altruistic. After calculating the average chosen payoffs for self and target across the 6 sliders (Ā_s and Ā_t), a summary output is calculated as:

Fig 3. The SVO Slider Measure [18].

(A) The payoff functions of the 6 primary items of the measure (black lines). Each segment represents the linear relationship between w_s and w_t on one of the items, and they are labeled in the same order as in [18]. The red arc and point provide an intuitive explanation (but not a formal justification) for the calculation of SVO°. (B) The theoretical step function (green curve) relating the output of the measure (SVO°) to λ. The labeled vertical segments correspond to the λ_c’s (the thresholds of λ at which a utility-maximizing participant switches from one end of a slider to the other) of the 6 items. The theoretical response on the circular Lambda Slider (arctan λ; Eq (15) in S2 Appendix) is also plotted for comparison.

https://doi.org/10.1371/journal.pone.0322410.g003

SVO° = arctan( (Ā_t − 50) / (Ā_s − 50) ) (9)

This “angle” SVO° can be interpreted as the angle of the point a participant would choose among payoff values aligned along the arc in Fig 3A, with larger angles corresponding to higher values of λ. (In fact, this arc can be used to create a “circular” Lambda Slider; see S2 Appendix.)
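Eq (9) takes one line in code; the sketch below follows the convention of Murphy et al. [18], reporting the angle in degrees relative to the (50, 50) center:

```python
import math

def svo_degrees(mean_self: float, mean_target: float) -> float:
    """Eq (9): SVO angle (degrees) of the mean allocations relative to (50, 50)."""
    return math.degrees(math.atan((mean_target - 50) / (mean_self - 50)))

assert svo_degrees(100, 50) == 0.0               # purely selfish mean allocation
assert round(svo_degrees(85, 85), 6) == 45.0     # equal concern for self and other
```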

The SVO Slider Measure is relatively efficient, requiring 6 responses for one measurement, which is fewer than previous measures such as the 9-item Triple-Dominance Measure [6] and the Ring Measure [16,17]. However, the linear nature of the slider payoff functions effectively results in binary allocation tasks, which create the familiar tradeoff between sensitivity and efficiency. For instance, the first slider has payoff functions

w_s(x) = 85 (10)

w_t(x) = 85 − 70·x (11)

where x ∈ [0, 1] is the slider position. Then the utility function is

U(x) = 85 + λ·(85 − 70·x) (12)

= (85 + 85·λ) − 70·λ·x (13)

A utility-maximizing participant would choose x = 0 if λ > 0, choose x = 1 if λ < 0, and be indifferent if λ = 0. Therefore, this slider is equivalent to a binary allocation task with λ_c = 0. Similarly, the λ_c’s for the remaining 5 sliders are −3/7, 7/3, 7/17, 1, and 3/7. The measure has no way of distinguishing among different λs between two adjacent λ_c’s (e.g., between 0 from slider 1 and 7/17 from slider 4). For any given λ, we can derive the output of the measure, SVO°, from the choices that the participant would make on the 6 sliders, which is plotted in Fig 3B. The relationship between λ and SVO° is not one-to-one, but many-to-many (i.e., different λs between two adjacent λ_c’s lead to the same responses, and for a given λ that is equal to one of the λ_c’s, all positions on one of the sliders are equally preferable). Technically speaking, the SVO Slider Measure has high resolution but low sensitivity. The Lambda Slider has the potential to provide both higher sensitivity and higher efficiency, though when implemented as a single-item measure it may have somewhat lower reliability.
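For any such linear slider, the utility (Eqs (12)–(13)) is linear in x, so a maximizer always picks an endpoint, and the slider’s λ_c is just a ratio of payoff slopes. A sketch using the published endpoints of two of the items in [18]:

```python
def slider_lambda_c(end0, end1):
    """Threshold at which a linear slider's utility slope flips sign:
    lam_c = -(w_s1 - w_s0) / (w_t1 - w_t0)."""
    (s0, t0), (s1, t1) = end0, end1
    return -(s1 - s0) / (t1 - t0)

# Item 1 holds self at 85 while the target's payoff runs from 85 down to 15:
assert slider_lambda_c((85, 85), (85, 15)) == 0.0
# The item joining the selfish (100, 50) and altruistic (50, 100) anchors:
assert slider_lambda_c((100, 50), (50, 100)) == 1.0
```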

Current research

In Experiment 1, we compare the Lambda Slider to the SVO Slider Measure in terms of test–retest reliability and convergent validity, because (a) the SVO Slider Measure performs relatively well in practice and is regarded as the state-of-the-art measure of λ, and (b) it can share an interface with the Lambda Slider (Fig 2A), allowing us to easily mix them in a single experiment. (Future work can compare the Lambda Slider with other popular measures of λ, such as the Welfare Trade-Off Task [14,15], which requires a different interface.) In Experiment 2, we rule out an alternative hypothesis that participants use a heuristic to make decisions on the Lambda Slider. In Experiment 3, we test the external validity of the Lambda Slider using a social decision with real-world consequences, and explore the effects of inequity aversion on measurements of λ. All data and analysis code in the experiments can be found at https://doi.org/10.5281/zenodo.14563524, with instructions for reproducing the results.

Experiment 1

We have formally shown above that the one-shot Lambda Slider has infinite sensitivity. However, the degree to which this theoretical sensitivity translates into empirical accuracy is limited by the degree to which participants actually maximize a utility function of the form of Eq (1).

In Experiment 1, we evaluate the reliability and validity of the quadratic Lambda Slider and compare it with the SVO Slider Measure. (In all three experiments, we report all measures, manipulations, and exclusions.) To evaluate the psychometric properties of the Lambda Slider, we need to elicit as wide a range of λs as possible from each participant. It has been shown that a person’s λ toward another person decreases as the social distance between them increases [13]. Therefore, we asked participants to each generate a list of 10 known people (subsequently called “targets”) occupying a range of social distances from themselves. We then had participants make hypothetical allocation decisions between themselves and each of those 10 targets. This manipulation not only helps elicit a wide range of λs, but also tests the measure’s convergent validity with social distance, based on an expected negative correlation between a participant’s measured λs toward the targets and her reported social distances from the targets.

Methods

Participants.

40 participants were recruited on Prolific and completed the experiment online between May 7 and 10, 2022. (The sample sizes in all experiments were determined before any data analysis, although this is not strictly necessary because all data analyses are fully Bayesian. The sample sizes of Experiments 1 and 2 were determined heuristically, while the sample size of Experiment 3 was determined based on a frequentist power analysis as preregistered.) The participants were drawn from the “standard sample”, were located in the USA, were fluent in English, had an approval rate of at least 95%, and had at least 10 previous submissions on the platform. The participants gave informed consent to participate in the experiment by clicking a button on the web page displaying the consent form at the start of the experiment. The experiment was approved by the UCSD institutional review board (Protocol #800709). Each participant received US$2 for completing the experiment. 30 participants (7 female, 23 male) passed at least 8 out of the 9 attention checks (see below) and only these participants are included in the analyses below.

Design.

The experiment is implemented as a web page and can be viewed at https://experiments.evullab.org/qi-games-2/. There are three stages in the experiment: List, Rank, and Slide.

In the List stage, participants are asked to list the first names of 10 people they know, 2 in each of 5 categories: family+, friends, neighbors and colleagues, acquaintances, and adversaries. These categories are designed to maximize the range of social distances between a participant and the targets and, presumably, of the participant’s s toward the targets.

In the Rank stage, participants are asked to rank the 10 names they input in the List stage “based on how close you are to them (in terms of relationship, not physical distance)” by dragging the 10 names in a vertical list. The order of the names is initially randomized. The final order of the names is recorded.

In the Slide stage, each participant completes 72 allocation trials using an interface similar to Fig 2A. In each trial, participants drag the horizontal slider, and the payoffs to the participant (w_s) and to the target (w_t), depicted both numerically and as horizontal bars, change continuously according to the underlying payoff functions, which are bounded at 0 and 100 in an arbitrary unit. The bars are labeled “You receive:” and “[Target] receives:”, where “[Target]” is replaced by the name of the target in the current trial. Participants are told that the payoffs are hypothetical and are asked to move the slider until the settings look best to them. The initial position of the slider is randomized in each trial.

In order to evaluate the test–retest reliability of the Lambda Slider and the SVO Slider Measure, we need two measurements for each target for each measure, which amounts to 2 quadratic Lambda Slider trials and 12 SVO Slider Measure trials (twice for each of the 6 primary items) per target. If we measured each participant’s λs toward all the targets on both measures, there would be 6 times as many SVO Slider Measure trials as Lambda Slider trials and too many trials in total. Therefore, we measure each participant’s λs toward all 10 targets on the Lambda Slider (20 trials in total), but only toward targets whose social distance rankings are 1, 4, 7, or 10 on the SVO Slider Measure (48 trials in total).

A participant’s response on each quadratic Lambda Slider trial is directly used as the measured λ, by virtue of Eq (8). A participant’s responses on the 6 different SVO Slider Measure items are aggregated into an SVO° according to Eq (9). The first occurrence of each item is treated as part of the first measurement of SVO°, and the remaining items compose the second measurement.

We also include 4 “catch trials” as attention checks, in which “[Target]” is replaced by “Left” or “Right”. Participants are instructed that on these trials they should move the slider to the far left (right) regardless of the payoffs. A participant is considered to pass a catch trial if the slider position she chooses is sufficiently close to the left (right) end of the slider when the target is “Left” (“Right”). These 72 trials are presented in randomized order. Immediately after Trials 2, 6, 14, 30, and 62 (called “memory trials”), participants are asked to type the name of the target (or “Left” or “Right”) they just saw, as attention checks. Participants are considered to pass a memory trial if the name they type is the same as the target’s name they just saw, after transforming both names to lowercase and removing whitespace. The catch and memory trials together constitute the 9 attention checks.
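The name normalization used for the memory trials can be sketched as follows (the function name is ours):

```python
def passes_memory_trial(typed: str, shown: str) -> bool:
    """Pass if the typed name matches the shown name after lowercasing
    and removing all whitespace, as described above."""
    def norm(s: str) -> str:
        return "".join(s.lower().split())
    return norm(typed) == norm(shown)

assert passes_memory_trial("  Mary Ann ", "maryann")
assert not passes_memory_trial("Marie", "Mary")
```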

The quadratic Lambda Slider trials have payoff functions defined by Eqs (2)–(4) with a = 11.25; the shift parameters and slider boundaries (Fig 2B) are chosen such that the payoffs stay well inside the 0–100 range over the range of λ that can be accurately measured. We keep the payoffs away from the extreme values because (a) allowing the payoffs to reach extreme values creates salient points that may bias participants’ responses [22], and (b) the welfare participants perceive for themselves and the targets with respect to the raw payoffs is likely to be more nonlinear when the payoffs are close to 0 [12]. The catch trials depict payoff functions in the same manner as the Lambda Slider trials. The SVO Slider Measure trials have the same payoff functions as in [18], as shown in Fig 3A.

Results

Test–retest reliability.

We evaluate the test–retest reliability of the quadratic Lambda Slider by estimating the correlation between the two measurements of λ for each participant–target combination, and compare it to the correlation between the two measurements of SVO°. We do not expect the correlation of λs to be as high as the correlation of SVO°s, because (a) one measurement of SVO° is an aggregation of 6 responses, which almost certainly has less noise than 1 response on the Lambda Slider, and (b) the Lambda Slider has a nonlinear payoff structure, which might be harder to understand than the linear payoff structures of the SVO Slider Measure. However, researchers using the Lambda Slider have the flexibility to select the number of repeated measurements to achieve the desired tradeoff between precision and efficiency. (This is different from the tradeoff between sensitivity and efficiency involved in the binary allocation tasks, mentioned in the Introduction. For the binary allocation tasks, the tradeoff arises from a theoretical limitation that applies even to a noiseless decision maker, while the current tradeoff is due only to noise in the decisions.) We will first compare the test–retest reliability of the 1-response λ with that of the 6-response SVO°, and then estimate the reliability of the multiple-response λ.

Figs 4A and 4B plot the relationship between the two measurements of each participant–target combination. We fit a bivariate normal distribution to the Lambda Slider data and to the SVO Slider Measure data (see S4 Appendix for details; all data analyses are fully Bayesian, and we use uninformative or weakly informative priors based on null hypotheses for the main parameters). (When the uncertainty in the estimation of a parameter is not important, we report a single number, which is the posterior median of the parameter; otherwise, we also report the 95% equal-tailed credible interval. We do not report the probability of direction: in all cases it is 100% when calculated using the “direct” method [23], in which case the true probability of direction is expected to be at least 99.975%, because we take at least 4000 posterior samples in our models.) The two measurements on the quadratic Lambda Slider have a high correlation (Fig 4A), indicating that a single measurement on the Lambda Slider has high test–retest reliability. As predicted, the two measurements of SVO° have an even higher correlation (Fig 4B).

Fig 4. Test–retest reliability of (A) the quadratic Lambda Slider and (B) the SVO Slider Measure, and (C) convergent validity between the two measures.

In (A) and (B), each data point represents one participant–target combination. In (C), for each participant–target combination, there are two data points representing the first λ paired with the first SVO°, and the second λ paired with the second SVO°. The green line is the theoretical relationship between λ and SVO°, same as Fig 3B. Data points on the boundaries, which are treated as censored data, are represented as crosses (same for all figures below). The ellipses indicate the 1-σ and 2-σ iso-density loci of the fitted bivariate normal distributions with parameters set to their posterior medians.

https://doi.org/10.1371/journal.pone.0322410.g004

To assess how many administrations of the Lambda Slider would be required to make the reliability scores of the two measures comparable, we estimate the reliability score of the average of multiple measurements on the Lambda Slider. According to classical test theory [24], the test–retest correlation is equal to the reliability score, and

r₁ = σ_T² / (σ_T² + σ_E²)    (14)

where σ_T² is the variance of the true score and σ_E² is the variance of the error (of one measurement) on the Lambda Slider. Let r_n be the reliability of the average of n measurements on the Lambda Slider. Averaging n independent measurements reduces the error variance to σ_E²/n, so we have

r_n = σ_T² / (σ_T² + σ_E²/n)    (15)

and therefore

r_n = n·r₁ / (1 + (n − 1)·r₁)    (16)

The same result can be obtained by first assuming a multivariate normal distribution over the 2n measurements, and then deriving the correlation between the mean of the first n variables and the mean of the other n variables.

Taking the observed test–retest correlation of the Lambda Slider as the baseline r₁ and increasing the number of measurements to n = 3 yields a reliability indicating that the average of 3 measurements on the Lambda Slider is expected to be comparable in reliability to the SVO Slider Measure, which requires 6 measurements.
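Eq (16) is the Spearman–Brown prophecy formula. As a quick sketch (with an illustrative reliability value, not the measured correlations), it can be checked numerically against the equicorrelated multivariate-normal derivation mentioned below:

```python
import numpy as np

def avg_reliability(r1: float, n: int) -> float:
    """Reliability of the average of n parallel measurements (Eq 16)."""
    return n * r1 / (1 + (n - 1) * r1)

# Cross-check against the multivariate-normal derivation: with 2n
# equicorrelated unit-variance variables (pairwise correlation r1),
# corr(mean of first n, mean of last n) equals Eq (16).
def corr_of_means(r1: float, n: int) -> float:
    cov = np.full((2 * n, 2 * n), r1)
    np.fill_diagonal(cov, 1.0)
    a = np.r_[np.ones(n), np.zeros(n)] / n   # weights for mean of first n
    b = np.r_[np.zeros(n), np.ones(n)] / n   # weights for mean of last n
    return (a @ cov @ b) / np.sqrt((a @ cov @ a) * (b @ cov @ b))

r1 = 0.8  # illustrative single-measurement reliability, not a measured value
print(round(avg_reliability(r1, 3), 4))  # 0.9231
print(round(corr_of_means(r1, 3), 4))    # 0.9231
```

The two computations agree for any r₁ and n, confirming that averaging n measurements boosts reliability exactly as Eq (16) predicts.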

Convergent validity: Lambda Slider vs. SVO Slider Measure.

Fig 4C plots the relationship between λ as measured by the quadratic Lambda Slider and SVO° as measured by the SVO Slider Measure. We fit a 4-variate normal distribution (2 measurements × 2 measures for each participant–target combination) to the data (see S4 Appendix for details). The two measures are highly correlated, indicating that the Lambda Slider has high convergent validity with the SVO Slider Measure.

In Fig 4C, there seem to be more SVO° responses in some ranges than in others. This does not indicate that the SVO Slider Measure has a higher sensitivity for measuring λ in those ranges than the Lambda Slider, because (a) it is inconsistent with theoretical predictions, and (b) it can be explained away by assuming that a participant probabilistically chooses between self-gain maximization and perfect inequity aversion on each decision, which we do not explicate here but which future work can investigate.

Convergent validity: λ vs. social distance.

Fig 5 shows participants' measured λs from the Lambda Slider and their SVO° measurements toward targets at different social distances from the participants. We fit a Bayesian mixed-effects model to the data with the social distance ranking as a monotonic predictor and λ or SVO° as the dependent variable (see S4 Appendix for details). As predicted, λ as measured by the quadratic Lambda Slider decreases as the target's social distance ranking increases (the mean slope corresponds to the average decrease in λ per unit increase in social distance ranking). The output of the SVO Slider Measure (SVO°) also decreases as the target's social distance ranking increases.

It is worth noting that participants' λs spanned a wide range (Fig 5A). The mean λ toward the socially closest person is 0.86, which means that participants value the target's welfare almost as much as their own. The mean λ toward the socially most distant person is −0.86, which means that participants are almost willing to give up $1 to take $1 away from the target. The SVO Slider Measure has very low sensitivity over a large part of this range (Fig 3B), and thus cannot measure a large subset of plausible λs accurately.

Experiment 2

Experiment 1 provided evidence that the Lambda Slider is a valid and reliable measure of λ. However, it is possible that instead of making decisions by incorporating the relevant λs into a utility function like the one in Eq (1) (we call this hypothesis H1), participants use the slider position as a qualitative representation of kindness/spitefulness and make decisions based on this representation (we call this hypothesis H2). For instance, after getting an intuitive idea of how the two payoffs change as a function of the raw slider position x, a participant might treat x = 0, 0.25, 0.5, 0.75, and 1 as "very mean", "somewhat mean", "neutral", "somewhat nice", and "very nice", respectively. (Note that x = 0 (x = 1) always corresponds to the left (right) end of the slider.) Then she may choose to be "very nice" to Alice, "somewhat mean" to Bob, etc., and choose slider positions accordingly.

H1 and H2 make different predictions when we alter the relationship between λ and the raw slider position x. For example, suppose that in an initial trial Alice chooses a position on the quadratic Lambda Slider corresponding to a λ of 1 for a given target. If we then have Alice make a decision for the same target on a different quadratic Lambda Slider, in which the same raw position corresponds to a different λ, H1 predicts that she will shift her raw position so as to keep her λ at 1, while H2 predicts that she will choose the same raw position as before.

In general, suppose we have two quadratic Lambda Sliders, Slider A and Slider B, whose payoff functions differ. Let the raw slider position that the participant chooses be x_A on Slider A and x_B on Slider B. For simplicity, suppose neither is at the boundaries of the slider. Let λ_A (λ_B) be the λ derived from x_A (x_B). We have

(17)(18)

Given H1, since the λ derived on the two sliders should be the same, we have λ_A = λ_B, and therefore

(19)

Given H2, we have x_A = x_B, and therefore

(20)

To adjudicate between H1 and H2, in Experiment 2 we let participants make decisions for each target on three different quadratic Lambda Sliders with different ranges of x, and see which hypothesis best predicts the responses.

Methods

Participants.

20 participants were recruited on Prolific and completed the experiment online on June 28 and August 5, 2022. The participant consent, experiment approval, prescreening, payments, and attention check criteria were the same as Experiment 1. 16 participants (4 female, 12 male) passed at least 8 out of the 9 attention checks and are included in the analyses below.

Design.

Similar to Experiment 1, Experiment 2 is implemented as a web page and can be viewed at https://experiments.evullab.org/qi-games-4/. It also has three stages: List, Rank and Slide, and the List and Rank stages are identical to Experiment 1.

In the Slide stage, there are three quadratic Lambda Sliders: a "base" slider with the same range of x as in Experiment 1; a "positive-shift" slider with an upward-shifted range; and a "negative-shift" slider with a downward-shifted range (Fig 6; how we selected these ranges and the payoff functions is detailed in S3 Appendix). For each participant, each target is measured twice on each of the three sliders, with a total of 60 Lambda Slider trials. There are 4 "Left"/"Right" catch trials, similar to Experiment 1, whose payoff functions are the same as the base slider. These 64 trials are presented in randomized order. The memory trials are at the same locations as in Experiment 1, so there are still 9 attention checks altogether.

We will compare the responses on the three sliders. From Eqs (19) and (20), we see that the utility-maximization hypothesis H1 predicts

(21)

while the qualitative-representation hypothesis H2 predicts

(22)

Results

Fig 7A shows the empirical distributions of raw slider positions on the three sliders. The means of the three distributions are significantly different and closer to the predictions of H1. The deviation from the predictions of H1 could be partly due to (a) the limited ranges of the sliders (predicted responses falling outside a slider's range would be clipped to its boundary), and (b) inequity-averse responses falling halfway between the predictions of the two hypotheses (see S3 Appendix). For a more fine-grained analysis, Fig 7B and C plot the responses on the base slider versus the positive-shift or negative-shift slider, and compare them to the predictions of the two hypotheses. For either hypothesis, we fit a 6-variate normal distribution (2 measurements × 3 sliders for each participant–target combination) to the data, with the constraint that the means satisfy either Eq (21) or Eq (22), depending on the hypothesis (see S4 Appendix for details). The logarithm of the Bayes factor between H1 and H2 is 101.6, indicating decisive evidence in favor of H1. This confirms that participants likely made decisions based on λ and utility maximization rather than on a qualitative representation of kindness.

As further evidence for the test–retest reliability of the Lambda Slider under different configurations, the within-slider correlations are high for all three sliders, and the between-slider correlations are also high, indicating that measurements of λ are relatively robust to different range and shift parameters. We also examine the relationship between λ and the social distance ranking with a model similar to that of Experiment 1 (see S4 Appendix), and again find that λ decreases as the social distance ranking increases.

Experiment 3

So far, the decisions participants made in the experiments were all hypothetical. However, the utility of the Lambda Slider in practice also depends on its external validity (also called predictive validity by some); i.e., whether hypothetical decisions on the Lambda Slider predict real-world altruistic behavior. Despite theoretical concerns about whether decisions with hypothetical payoffs can predict decisions with real payoffs [12], experiments using matched designs have generally found good alignment between the two settings [26–30]. However, most decisions people make in the lab, such as making monetary tradeoffs in an economic game, are so different from real-life decisions that it is unclear whether behavior in these tasks generalizes to real-life situations. Therefore, the best test of the external validity of a measure uses real-life decisions with real payoffs. A previous study [31] examined the external validity of the SVO Slider Measure using a standard dictator game. Using anonymous targets in all measures, it found a correlation of 0.42 between the SVO Slider Measure (in terms of SVO°) and the amount given in the dictator game. Likewise, in Experiment 3, we let participants make a real-life decision about how much money to donate, with the underlying structure of a dictator game, and examine its relationship with hypothetical decisions on the Lambda Slider. We also examine the robustness of the Lambda Slider under different configurations and the effects of inequity aversion on the measurements. This experiment was preregistered at https://osf.io/zbw8f.

Inequity aversion

A basic assumption of any study on λ is that a person's utility function is a linear combination of the two payoffs, at least within the range of payoffs in that study. In other words, the only motivations under consideration are the motivations to increase or decrease one's own and the other person's welfare. However, another relevant social motivation is inequity aversion, which is the desire to decrease the absolute difference between the two payoffs [1]. We can see inequity aversion at play in the previous experiments. In Figs 4A, C and 5A, instead of forming a smooth distribution, many Lambda Slider responses were concentrated at the equal-payoff point of the slider, given the parameters in Experiment 1. Likewise, in Figs 4B, C and 5B, instead of forming a smooth distribution between the SVO° value corresponding to a λ slightly greater than 0 and the maximum possible SVO°, many SVO Slider Measure responses were concentrated at the value consistent with the responses of a perfectly inequity-averse decision maker. Again, in Fig 7, many responses concentrate at the equal-payoff points of the three sliders, which are halfway between the predictions of the two models (see S3 Appendix).

Formally, we can add an inequity-aversion term to the utility function of Eq (1):

U(x) = π_s(x) + λ·π_o(x) − α·|π_s(x) − π_o(x)|    (23)

where α captures the strength of inequity aversion. (It can be shown algebraically that this utility function is equivalent to a utility function with separate advantageous- and disadvantageous-inequality terms but no λ term, as in [1]. In fact, the utility function in [1] can always be rewritten in the form of Eq (23), but not vice versa.)

The measurement of λ may be biased and may lose sensitivity around the equal-payoff point if the participant has a nonzero α. This problem is shared by all the existing measures of λ, including our Lambda Slider. To counter this problem, [18] describes a set of secondary linear-payoff sliders that are used to distinguish between inequality aversion and "joint gain maximization". Participants' responses to these secondary items can be used to calculate an "inequality aversion index" ranging from 0 (pure inequality aversion) to 1 (pure joint gain maximization). However, this approach can only be used for participants whose responses on the "primary items" (Fig 3A) are consistent with a "prosocial" orientation, as the index assesses the degree to which a participant's responses are closer to those of a perfectly consistent inequality-averse or joint-gain-maximizing decision maker.

Using a range of payoff configurations, the Lambda Slider can simultaneously measure λ and α with no additional restriction on the value of λ. To see how, we can consider the cases π_s(x) > π_o(x) and π_s(x) < π_o(x) separately and substitute Eqs (2) and (3) into Eq (23):

(24)(25)(26)

For any λ, we can make the difference between the shift parameters of the payoff functions positive enough that π_s(x*) > π_o(x*); we can also make it negative enough that π_s(x*) < π_o(x*) (proof omitted). Assuming that the participant maximizes their utility perfectly, these two different values of x* allow us to solve for λ and α independently. In Experiment 3 we define a likelihood function based on Eq (26) and perform Bayesian inference on λ and α. Estimating α also allows us to examine its external validity, analogous to that of λ, by looking at the relationship between the estimated α and participants' decisions in the dictator game with real payoffs.
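To build intuition for how inequity aversion pulls responses toward the equal-payoff point, here is a minimal sketch of utility maximization under Eq (23). The quadratic payoff functions below are hypothetical, chosen only so that the payoffs are equal at x = 0; they are not the parameters used in the experiments (those are in S3 Appendix):

```python
import numpy as np

# Hypothetical quadratic payoffs (illustrative only, NOT the experiments'
# parameters). pi_s falls and pi_o rises as x increases; they are equal
# at x = 0, the equal-payoff point.
def pi_s(x):
    return 10 - 0.5 * (x + 1) ** 2  # payoff to self

def pi_o(x):
    return 10 - 0.5 * (x - 1) ** 2  # payoff to the target

def utility(x, lam, alpha):
    # Eq (23): self payoff + lambda * other payoff - alpha * |inequity|
    return pi_s(x) + lam * pi_o(x) - alpha * np.abs(pi_s(x) - pi_o(x))

def best_x(lam, alpha):
    # Grid search for the utility-maximizing slider position
    grid = np.linspace(-1.25, 1.25, 2001)
    return grid[np.argmax(utility(grid, lam, alpha))]

# With alpha = 0 the chosen position reflects lambda alone; with strong
# inequity aversion the response is pulled toward the equal-payoff point.
print(best_x(lam=0.5, alpha=0.0))
print(best_x(lam=0.5, alpha=0.9))
```

With these payoffs the optimum for λ = 0.5 and α = 0 is x = (λ − 1)/(λ + 1) ≈ −0.33, while a strongly inequity-averse participant with the same λ chooses x = 0 exactly: the bias toward equal payoffs that joint estimation of λ and α corrects.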

Methods

Participants.

90 participants were recruited on Prolific and completed the experiment online on November 20, 2023. The participant consent, experiment approval, prescreening, payments, and attention check criteria were the same as Experiment 1. 76 participants (39 female, 36 male, 1 unknown) passed at least 4 out of the 5 attention checks and only these participants are included in the analyses below.

Design.

Similar to Experiments 1 and 2, Experiment 3 is implemented as a web page and can be viewed at https://experiments.evullab.org/qi-games-7/. It has three stages: List, Slide, and Bonus & Donation.

The List stage is the same as Experiment 1, except that participants list only one target in each of the five categories. We use the categories as proxies for the social distance rankings and do not ask the participants to rank the targets. After participants list these targets, we introduce an additional target described as a victim in the wildfires of Maui, Hawaii in 2023. The name of the target was extracted from a non-paywalled news article on the wildfires, and we provide a link to the article as well as a description of the victim’s circumstances.

In the Slide stage, there are three quadratic Lambda Sliders: a "balanced" slider, where π_s can be greater than, less than, or equal to π_o depending on the slider position; a "self-more" slider, where π_s > π_o holds regardless of the slider position (within the allowed range); and a "target-more" slider, where the inverse holds (Fig 8; see S3 Appendix for the exact parameters). The range of λ is the same on all three sliders. These three sliders allow us to estimate the inequity aversion parameter α as described above.

Fig 5. Relationship between λ and social distance, for the quadratic Lambda Slider (A) and the SVO Slider Measure (B).

Each raw data point is one of the two measurements of a participant–target combination. Black points and ranges represent the means and standard errors of data in each group. Blue lines and ranges represent the conditional effects (also called marginal effects; [25]) of the social distance ranking as a monotonic predictor, with 95% credible intervals.

https://doi.org/10.1371/journal.pone.0322410.g005

Fig 6. Payoff functions of the three quadratic Lambda Sliders in Experiment 2.

The ranges on the x axes reflect the ranges of the sliders.

https://doi.org/10.1371/journal.pone.0322410.g006

Fig 7. Results of Experiment 2.

(A) Empirical distributions of raw slider positions (x) for the base, positive-shift and negative-shift sliders. The black rhombuses represent means and standard errors, and the gray lines represent standard deviations. For reference, the red and blue bars mark the predictions of the two hypotheses (Eqs (21) and (22)), respectively, given the mean x on the base slider. (B and C) Responses compared to the predictions of the two hypotheses. For each participant–target combination, there are two raw data points in each panel representing the two measurements on either slider. The diagonal lines indicate the predictions of the two hypotheses without noise. The ellipses indicate the bivariate normal distributions representing the two fitted models (see S4 Appendix).

https://doi.org/10.1371/journal.pone.0322410.g007

Fig 8. Payoff functions of the three quadratic Lambda Sliders in Experiment 3.

The ranges on the x axes reflect the ranges of the sliders.

https://doi.org/10.1371/journal.pone.0322410.g008

For each participant, a slider allocation to each of the 6 targets is measured twice on each of the 3 sliders, with a total of 36 Lambda Slider trials, which are randomized in order. A “Left” catch trial and a “Right” catch trial (as in Experiment 1) are added, which become Trials 5 and 20, respectively. Trials 2, 11 and 32 are memory trials, so there are 5 attention checks altogether.

In the Bonus & Donation stage, participants are asked to use a slider to split US$2 between a monetary bonus to themselves and a donation to the Maui Strong Fund, a fund created by the Hawaii Community Foundation to support recovery from the Maui wildfires. Participants essentially play a dictator game between themselves and the fund. The slider has a precision of $0.01. Participants are assured that there is no deception involved and that we will actually donate the amount they specify to the Maui Strong Fund. We also tell participants that after we have collected all the data, we will send them a spreadsheet documenting the donation from each participant and a receipt of the total donation. We tell them that in the spreadsheet the participants will only be identified by the last 5 characters of their Prolific IDs, to prevent them from taking into account others’ perception of them.

After a participant completed the experiment, we sent them the monetary bonus they specified in the Bonus & Donation stage through Prolific. Donations from the participants totaled $65.01, and we donated this amount to the Maui Strong Fund and sent the participants a message through Prolific with links to the spreadsheet and receipt as we promised them.

Results

Robustness.

We first fit a 6-variate normal distribution (2 measurements × 3 sliders for each participant–target combination) to the data from the Slide stage to examine the within-slider and between-slider correlations (see S4 Appendix for details). The within-slider correlations ("b" for balanced, "s" for self-more, "t" for target-more) are all high, confirming that the Lambda Slider has high test–retest reliability for a variety of configurations, even though its scale is smaller in this experiment than in the previous ones (a = 7 vs. a = 11.25; Figs 2B, 6 and 8). The Bayes factor between the full model and an alternative model in which the three within-slider correlations are equal is roughly 1.6, indicating inconclusive evidence about whether the test–retest reliabilities of the three sliders are meaningfully different. The between-slider correlations are also high, indicating that measurements of λ are relatively robust to different shift parameters.

We examined the relationship between λ and the social distance ranking (excluding the Maui wildfire victim) with the same model as in Experiment 2 (see S4 Appendix); the mean slope is −0.60. (Since there are only 5 targets here, this slope is roughly comparable to those of the previous experiments after being divided by 2.) Given the larger sample size compared to Experiments 1 and 2, we also conducted an exploratory analysis of the effect of sex on λ and of the interaction between sex and social distance ranking, and found no conclusive evidence for either the existence or nonexistence of these effects, meaning that the sample size is still not large enough to reach a conclusion (see S4 Appendix).

External validity.

To examine the relationship between measurements on the Lambda Slider and real-world altruistic behavior, we fit a 3-variate normal distribution to the participants' measured λs toward the Maui wildfires victim and their actual donations to the Maui Strong Fund (see S4 Appendix). We fit the model separately for the three sliders. (The fitted standard deviation of the donations can exceed the maximum possible standard deviation of data points bounded between 0 and 2, because many (58%) donation amounts are exactly 0 or 2 and are treated as censored data.) For the balanced slider, the measured λ and the donation are positively correlated, and the Bayes factor between the full model and a null model with zero correlation indicates extreme evidence for this correlation, showing that the Lambda Slider has good external validity. This correlation is close to the correlation of 0.42 between the SVO Slider Measure (in terms of SVO°) and a standard dictator game [31], despite the Lambda Slider depending on only 1 response instead of 6. We obtained analogous estimates for the self-more and target-more sliders.

Inequity aversion.

If participants are inequity-averse, i.e., they have a nonzero α, the slider position they choose should on average be highest on the self-more slider, lowest on the target-more slider, and in between on the balanced slider. The means of the fitted 6-variate normal distribution described above follow exactly this ordering, suggesting that participants are indeed inequity-averse to some extent.

We fit a hierarchical model to jointly estimate λ and α for each participant–target combination. We assume that each participant has a fixed α, but their λ varies across targets. We restrict the upper bound of α because the model becomes unstable when α gets too close to 1 (see S4 Appendix for details and other assumptions).

Fig 9A plots the estimates of α for each participant, which span a wide range. Figs 9B–D plot the raw responses of three participants with high, medium and low estimates of α. We see that the higher α is, the more slider positions are influenced by the relative offsets of the sliders. The vertical distance between a cross (the predicted response given the inferred λ and α) and a horizontal line (the inferred λ) reflects how much a "naïve" measurement of λ based on the slider position alone is biased relative to a more sophisticated measurement that takes inequity aversion into account.

Fig 9. Inequity aversion in Experiment 3.

(A) Posterior distributions of α for each participant, sorted by posterior median. The dots indicate the posterior medians and the lines indicate the 95% credible intervals. Three participants are highlighted, whose raw responses are plotted in (B)–(D). Targets 1–5 are the targets listed by the participants in the List stage, in increasing order of social distance. The target "M" is the Maui wildfires victim. For each participant–target combination, the horizontal line represents the posterior median of λ. For each participant–target–slider combination, the cross represents the predicted utility-maximizing response given the posterior medians of λ and α, while the two dots are the actual responses.

https://doi.org/10.1371/journal.pone.0322410.g009

To examine the external validity of α, we look at the relationship between a participant's estimated α and how far the participant's donation d is from the equal-payoff point, |d − 1|. The two variables are negatively correlated, indicating that participants with a higher α are more likely to choose equal payoffs between themselves and another person in real-world decisions.

These data suggest that there is considerable variation among participants in the degree of inequity aversion and that, although a single response on the Lambda Slider is highly correlated with a participant's true λ, it may be biased toward the equal-payoff point, especially for participants with high degrees of inequity aversion. In many research programs such biases do not affect the validity of the conclusions, but if and when they are a concern, we recommend that researchers jointly estimate λ and α using multiple Lambda Sliders. We also recommend fitting a complete model, as we did, for the benefits of uncertainty estimates and easy integration of prior and global information. But a quick point estimate of λ and α is also possible: take one measurement x₁ on a slider where π_s > π_o and another measurement x₂ on a slider where π_s < π_o (two sliders with different relative offsets), and then solve

(27)(28)

for λ and α by virtue of Eq (26):

(29)(30)

There is a solution for α as long as x₁ and x₂ satisfy the required conditions. Otherwise, we cannot obtain an estimate of α in this way, but we can use the average of x₁ and x₂ as a point estimate of λ.
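The two-response point-estimate idea can be sketched in code. This sketch assumes a simplified slider construction, not the paper's exact Eqs (29)–(30): a utility maximizer with parameters (λ, α) is assumed to choose x₁ = (λ + α)/(1 − α) on the slider where π_s > π_o and x₂ = (λ − α)/(1 + α) on the slider where π_s < π_o. Under that assumption the two first-order conditions invert in closed form:

```python
def responses(lam, alpha):
    """Forward model under the simplifying assumption stated above."""
    x1 = (lam + alpha) / (1 - alpha)  # response where pi_s > pi_o
    x2 = (lam - alpha) / (1 + alpha)  # response where pi_s < pi_o
    return x1, x2

def point_estimate(x1, x2):
    """Invert the forward model to recover (lam, alpha)."""
    if x1 <= x2:
        # Inconsistent with nonnegative inequity aversion (e.g., noise);
        # fall back to averaging the two responses as an estimate of lam.
        return (x1 + x2) / 2, None
    # From x1 * (1 - alpha) = lam + alpha and x2 * (1 + alpha) = lam - alpha:
    alpha = (x1 - x2) / (2 + x1 + x2)
    lam = (x1 + x2 - alpha * (x1 - x2)) / 2
    return lam, alpha

# Round trip: simulate a participant with lam = 0.4, alpha = 0.2,
# then recover both parameters from the two responses.
x1, x2 = responses(0.4, 0.2)
lam_hat, alpha_hat = point_estimate(x1, x2)
print(round(lam_hat, 6), round(alpha_hat, 6))
```

The round trip recovers (0.4, 0.2) exactly, mirroring the logic of Eqs (27)–(30): two responses on sliders with opposite payoff orderings pin down λ and α simultaneously.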

Now we can refine our understanding of the tradeoff between accuracy and efficiency discussed in the Introduction. Accuracy entails both unbiasedness and reliability. There is a straightforward tradeoff between reliability and efficiency for any measure: the more administrations of a measure are averaged to obtain a single measurement, the more reliable that measurement will be. Biases, on the other hand, are trickier to deal with, and none of the correlation metrics we reported in the experiments addresses them. In the context of measuring λ, biases are prominently introduced in two ways: (a) through discreteness in the underlying measure, as in the measures based on binary allocation tasks; and (b) through failure to account for inequity aversion. The Lambda Slider, unlike most other measures of λ, is free of the first kind of bias. The second kind can be mitigated by administering multiple Lambda Sliders with different relative offsets of the payoff functions for each participant–target combination and jointly estimating λ and α, assuming that they are stable across the multiple measurements. Of course, one has to sacrifice some efficiency for this joint estimation. In general, the more prior information one has about λ and/or α, the less efficiency one has to sacrifice to achieve the same level of accuracy.

Discussion

We have developed the Lambda Slider, an accurate, efficient, and theoretically rigorous measure of λ. We have shown that the Lambda Slider has high reliability, convergent validity, and external validity for real-world decisions. We have also demonstrated how multiple Lambda Sliders can be used to correct the biases in measurements of λ caused by inequity aversion.

The Lambda Slider can be straightforwardly implemented using any dynamic graphical user interface. To make it easier for other researchers to use the Lambda Slider, we have created a standalone version of the quadratic Lambda Slider with the same payoff functions as in Experiment 1 (https://experiments.evullab.org/lambda-slider/). It can be directly embedded into web-based survey platforms such as Qualtrics; instructions can be found in the README file at https://doi.org/10.5281/zenodo.14563524.

We did not record how exactly participants moved the slider in each trial. Future studies can record this, which might provide insight into participants’ mental processes when making a choice on the slider. For instance, if participants move the slider from its initial position directly toward its final position without moving back and forth, it suggests that they are following the gradient of the utility function and finding the global maximum in an efficient way. Participants could also restrict their search to a small range after being familiarized with the Lambda Slider, which would be reflected by a jump from the initial position of the slider followed by small local movements. Such movement data could also be used to assess the degree to which participants explored the full range of payoffs to make decisions, and whether additional instructions to explore the slider affect its psychometric properties.

Although the Lambda Slider is efficient and has good psychometric properties, it may not be the best measure to use under some conditions. The nonlinear payoff structures may be difficult for people to quickly familiarize themselves with. It is also inapplicable to projects relying on paper-based measures. Under these circumstances, it may be preferable to use another measure such as the SVO Slider Measure [18] or the Welfare Trade-Off Task [14,15]. The SVO Slider Measure may also better align with personality scale measures designed to assess the same four social strategies that serve as endpoints for the SVO items.

One potential future direction is to use the Lambda Slider to study social perception. People not only make social decisions based on their λs toward other people, but can also represent, infer, and predict others' λs toward themselves or someone else and react accordingly (e.g., [9,32–37]). Because of the drawbacks of the existing measures of λ based on binary allocation tasks, the processes of (a) conveying another person's λ to the participant, and (b) measuring the participant's prediction of another person's λ, have had a relatively low ceiling on the product of accuracy and efficiency, limiting the study of the dynamics of such inference and prediction over time or space. The Lambda Slider can potentially make these processes more accurate and/or efficient. Using the Lambda Slider to measure participants' predictions of another person's λ seems straightforward—Alice could imagine that she adopts Bob's λ and make decisions on the Lambda Slider in the same way she makes her own decisions. Participants should also be able to infer others' λs from observations of Lambda Slider choices, so long as the observing participants have a good understanding of the underlying payoff functions. This understanding could potentially be achieved by allowing the participant to manipulate the slider, or by depicting the relationship between the payoff functions as a 2D curve, as in Fig 1D. The validity and reliability of the (1D or 2D) Lambda Slider for either of these purposes need to be established by further research.

Supporting information

S1 Appendix. Formal derivation of the Lambda Slider.

https://doi.org/10.1371/journal.pone.0322410.s001

(PDF)

S2 Appendix. Circle Test and circular Lambda Slider.

https://doi.org/10.1371/journal.pone.0322410.s002

(PDF)

S3 Appendix. Payoff functions in Experiments 2 and 3.

https://doi.org/10.1371/journal.pone.0322410.s003

(PDF)

References

  1. Fehr E, Schmidt KM. A theory of fairness, competition, and cooperation. Q J Econ. 1999;114(3):817–68.
  2. Camerer CF. Behavioral game theory: experiments in strategic interaction. Princeton University Press; 2011.
  3. Tooby J, Cosmides L. The evolutionary psychology of the emotions and their relationship to internal regulatory variables. In: Handbook of emotions. 3rd ed. New York (NY): The Guilford Press; 2008. p. 114–37.
  4. Henrich J, Boyd R, Bowles S, Camerer C, Fehr E, Gintis H, et al. In search of homo economicus: behavioral experiments in 15 small-scale societies. Am Econ Rev. 2001;91(2):73–8.
  5. Almlund M, Duckworth AL, Heckman J, Kautz T. Personality psychology and economics. In: Hanushek EA, Machin S, Woessmann L, editors. Handbook of the economics of education. vol. 4. Elsevier; 2011. p. 1–181.
  6. Van Lange PAM, De Bruin EMN, Otten W, Joireman JA. Development of prosocial, individualistic, and competitive orientations: theory and preliminary evidence. J Pers Soc Psychol. 1997;73(4):733–46.
  7. Murphy RO, Ackermann KA. Social value orientation: theoretical and measurement issues in the study of social preferences. Pers Soc Psychol Rev. 2014;18(1):13–41. pmid:24065346
  8. Thielmann I, Spadaro G, Balliet D. Personality and prosocial behavior: a theoretical framework and meta-analysis. Psychol Bull. 2020;146(1):30–90. pmid:31841013
  9. Ackermann KA, Fleiß J, Murphy RO. Reciprocity as an individual difference. J Conflict Resolution. 2016;60(2):340–67.
  10. Leider S, Möbius MM, Rosenblat T, Do QA. Directed altruism and enforced reciprocity in social networks. Q J Econ. 2009;124(4):1815–51.
  11. Messick DM, McClintock CG. Motivational bases of choice in experimental games. J Exp Soc Psychol. 1968;4(1):1–25.
  12. Kahneman D, Tversky A. Prospect theory: an analysis of decision under risk. Econometrica. 1979;47(2):263–92.
  13. Jones B, Rachlin H. Social discounting. Psychol Sci. 2006;17(4):283–6. pmid:16623683
  14. Kirkpatrick M, Delton AW, Robertson TE, de Wit H. Prosocial effects of MDMA: a measure of generosity. J Psychopharmacol. 2015;29(6):661–8. pmid:25735993
  15. Delton AW, Jaeggi AV, Lim J, Sznycer D, Gurven M, Robertson TE, et al. Cognitive foundations for helping and harming others: making welfare tradeoffs in industrialized and small-scale societies. Evol Hum Behav. 2023;44(5):485–501.
  16. Liebrand WBG. The effect of social motives, communication and group size on behaviour in an N-person multi-stage mixed-motive game. Eur J Soc Psychol. 1984;14(3):239–64.
  17. Liebrand WBG, McClintock CG. The ring measure of social values: a computerized procedure for assessing individual differences in information processing and social value orientation. Eur J Pers. 1988;2(3):217–30.
  18. Murphy RO, Ackermann KA, Handgraaf MJJ. Measuring social value orientation. Judgm Decis Mak. 2011;6(8):771–81.
  19. Sonnemans J, van Dijk F, van Winden F. On the dynamics of social ties structures in groups. J Econ Psychol. 2006;27(2):187–204.
  20. Hall J, Kahn DT, Skoog E, Öberg M. War exposure, altruism and the recalibration of welfare tradeoffs towards threatening social categories. J Exp Soc Psychol. 2021;94:104101.
  21. Hurwicz L, Reiter S. Designing economic mechanisms. Cambridge University Press; 2006.
  22. Thomas M, Kyung EJ. Slider scale or text box: how response format shapes responses. J Consum Res. 2019;45(6):1274–93.
  23. Makowski D, Ben-Shachar MS, Lüdecke D. bayestestR: describing effects and their uncertainty, existence and significance within the Bayesian framework. J Open Source Softw. 2019;4(40):1541.
  24. Lord FM, Novick MR. Statistical theories of mental test scores. Addison-Wesley Publishing Company; 1968.
  25. Bürkner PC. brms: an R package for Bayesian multilevel models using Stan. J Stat Softw. 2017;80(1):1–28.
  26. Wiseman DB, Levin IP. Comparing risky decision making under conditions of real and hypothetical consequences. Organ Behav Hum Decis Process. 1996;66(3):241–50.
  27. Johnson MW, Bickel WK. Within-subject comparison of real and hypothetical money rewards in delay discounting. J Exp Anal Behav. 2002;77(2):129–46. pmid:11936247
  28. 28. Locey ML, Jones BA, Rachlin H. Real and hypothetical rewards in self-control and social discounting. Judgm Decis Mak. 2011;6(6):552–64. pmid:22582110
  29. 29. FeldmanHall O, Mobbs D, Evans D, Hiscox L, Navrady L, Dalgleish T. What we say and what we do: the relationship between real and hypothetical moral choices. Cognition. 2012;123(3):434–41. pmid:22405924
  30. 30. Bostyn DH, Sevenhant S, Roets A. Of mice, men, and trolleys: hypothetical judgment versus real-life behavior in trolley-style moral dilemmas. Psychol Sci. 2018;29(7):1084–93. pmid:29741993
  31. 31. Höglinger M, Wehrli S. Measuring social preferences on Amazon Mechanical Turk. In: Social dilemmas, institutions, and the evolution of cooperation; 2017.
  32. 32. Lim J. Welfare tradeoff ratios and emotions: psychological foundations of human reciprocity. Santa Barbara: University of California; 2012.
  33. 33. Delton AW, Robertson TE. The social cognition of social foraging: partner selection by underlying valuation. Evol Hum Behav. 2012;33(6):715–25. pmid:23162372
  34. 34. Krasnow MM, Delton AW, Cosmides L, Tooby J. Looking under the hood of third-party punishment reveals design for personal benefit. Psychol Sci. 2016;27(3):405–18. pmid:26851057
  35. 35. Sell A, Sznycer D, Al-Shawaf L, Lim J, Krauss A, Feldman A, et al. The grammar of anger: mapping the computational architecture of a recalibrational emotion. Cognition. 2017;168:110–28. pmid:28668649
  36. 36. Qi W, Vul E. The evolution of theory of mind on welfare tradeoff ratios. Evol Hum Behav. 2022;43(5):381–93.
  37. 37. Quillien T, Tooby J, Cosmides L. Rational inferences about social valuation. Cognition. 2023;239:105566. pmid:37499313