
## Abstract

Information sampling is often biased towards seeking evidence that confirms one’s prior beliefs. Despite such biases being a pervasive feature of human behavior, their underlying causes remain unclear. Many accounts of these biases appeal to limitations of human hypothesis testing and cognition, de facto evoking notions of bounded rationality, but neglect more basic aspects of behavioral control. Here, we investigated a potential role for Pavlovian approach in biasing which information humans will choose to sample. We collected a large novel dataset from 32,445 human subjects, making over 3 million decisions, who played a gambling task designed to measure the latent causes and extent of information-sampling biases. We identified three novel approach-related biases, formalized by comparing subject behavior to a dynamic programming model of optimal information gathering. These biases reflected the amount of information sampled (“positive evidence approach”), the selection of which information to sample (“sampling the favorite”), and the interaction between information sampling and subsequent choices (“rejecting unsampled options”). The prevalence of all three biases was related to a Pavlovian approach-avoid parameter quantified within an entirely independent economic decision task. Our large dataset also revealed that individual differences in the amount of information gathered are a stable trait across multiple gameplays and can be related to demographic measures, including age and educational attainment. As well as revealing limitations in cognitive processing, our findings suggest information sampling biases reflect the expression of primitive, yet potentially ecologically adaptive, behavioral repertoires. One such behavior is sampling from options that will eventually be chosen, even when other sources of information are more pertinent for guiding future action.

## Author Summary

Human decision-making often appears irrational. A major challenge is to explain why apparently irrational behavior occurs and what potential benefits it might have conferred for our evolutionary ancestors. A well-studied behavior in experimental psychology is “confirmation bias,” where we sample information that simply confirms what we already believe. In this study, we show that one factor giving rise to such information sampling biases is Pavlovian approach: our natural tendency to approach items that are associated with reward. We demonstrate three novel information sampling biases in a large-scale smartphone experiment with >30,000 human subjects. We examine how these three biases are related to Pavlovian approach, as quantified via an entirely independent economic choice task. We also show that, within our population, information sampling is a stable trait of an individual that is related to demographic variables such as age and education. Although irrational in the context of our task, we postulate that approach-induced biases in information sampling may have been adaptive over evolutionary history. They would drive organisms towards gathering information about locations that they will eventually engage with to obtain reward.

**Citation: **Hunt LT, Rutledge RB, Malalasekera WMN, Kennerley SW, Dolan RJ (2016) Approach-Induced Biases in Human Information Sampling. PLoS Biol 14(11): e2000638. https://doi.org/10.1371/journal.pbio.2000638

**Academic Editor: **Michael J. Frank, Brown University, UNITED STATES

**Received: **July 25, 2016; **Accepted: **October 7, 2016; **Published: ** November 10, 2016

**Copyright: ** © 2016 Hunt et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability: **All data and MATLAB code are available for download from the Dryad repository (DOI 10.5061/dryad.nb41c) at http://dx.doi.org/10.5061/dryad.nb41c

**Funding: **Astor Foundation received by WMNM. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Wellcome Trust www.wellcome.ac.uk (grant number 098830/Z/12/Z, 096689/Z/11/Z, 098362/Z/12/Z, 091593/Z/10/Z, 101252/Z/13/Z) received by LTH, SWK and RJD. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Max Planck Society https://www.mpg.de/en received by RBR and RJD. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Rosetrees’ Trust http://www.rosetreestrust.co.uk/ received by WMNM. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests: ** The authors have declared that no competing interests exist.

## Introduction

Many spheres of human behavior depend upon gathering and understanding evidence appropriately to inform decision-making. Yet finding the best way to sample information is a nontrivial problem: it requires deciding where to sample information [1,2], when to cease information gathering [3,4], and how such evidence should guide behavior [5,6]. Normative approaches can help address these questions [7], but their computational complexity renders them unlikely candidates for controlling behavior. Instead, these approaches are better used as a basis for understanding limitations in cognitive processes and why biases emerge in human behavior [8,9].

A particularly well-studied bias is that of confirming one’s prior beliefs [10]. Inspired by classic rule discovery and falsification studies of Wason [11,12], explanations of confirmation bias frequently appeal to limits in hypothesis testing as their latent cause. Several alternative accounts have been proposed. The “positive test account” [13] posits that humans form beliefs about a particular hypothesis and subsequently selectively seek and interpret evidence in support of this rule rather than against it. Yet it has been pointed out that this strategy may be normative in situations where possible competing hypotheses to explain the data are sparse [14]. Other accounts suggest that humans are simply limited in the number of hypotheses they can consider at any given time [15].

It is widely acknowledged that humans are also subject to more primitive influences on behavioral control. Whilst these have been overlooked as a potential source of confirmation bias, they are known to impact upon information seeking in other domains. For instance, a primitive behavior present in several species is the observing response [16,17]. Here, animals select actions to yield information (reduce uncertainty) about the probability of receiving future reward, even when these actions have no bearing upon reward receipt. This can also be related to human preferences for revealing advance information about rewards when that information is immaterial to the task at hand [18]. Critical here is the notion that, in nature, advance information is typically valuable in guiding future action (unlike in the experimental tasks used to demonstrate these behaviors). A preference for early temporal resolution of uncertainty [19] is thus conserved across humans and other species and persists in influencing behavior even when rendered instrumentally irrelevant.

These considerations led us to consider how other primitive behaviors might bias information sampling. A notable characteristic of reward-guided behavior in many species is that of Pavlovian approach. Animals show greater efficacy in learning approach, as opposed to avoidance, actions that will lead to the delivery of reward [20,21]. Humans are also subject to similar approach biases [22]. Pavlovian approach effects also spill over into the domain of attentional control, as stimuli previously ascribed a high value capture attention even when they are contextually irrelevant [23]. As the locus of attention is intimately linked to information sampling during choice [24], this raises the possibility that Pavlovian approach may similarly influence information search.

To test this idea, we examined gameplay data from a large-scale smartphone app [25] in which we manipulated several factors of interest whilst probing subjects’ information sampling behavior. In brief, subjects played a card game in which they paid to sample information from different locations prior to deciding which option was most likely to yield reward. A framing manipulation meant that in half of all gameplays, approaching (choosing) the “biggest” option would be rewarded, but in the other half, approaching the “smallest” option would be rewarded. Crucially, the information structure of the task was identical across these matched conditions, such that any effects on information sampling could be ascribed to our manipulation as to the option subjects were instructed to approach.

We compared observed behavior to predictions derived from a normative dynamic programming model that computes the expected value associated with a perfect model of the task, treated as a Markov decision process (see Materials and Methods and [26]). This enabled us to isolate three distinct biases in subjects’ information search that respectively influenced where information was sought, when information collection terminated, and how information was used to guide eventual choices. Each of these three biases can be considered as a form of “approach” behavior towards locations that are more likely to yield reward. Also relevant here is our recent parameterization of human Pavlovian approach behavior in an approach-avoidance decision model on a separate economic decision task [27]. We demonstrate that the prevalence of all three biases is related to the key parameter from this model.

## Results

### Information Seeking Task Design

Subjects played a binary choice game that involved paying escalating costs for information (by turning over playing cards) while gambling on which option was best based upon card values that were revealed (Fig 1A). There were six possible conditions that subjects might play (Fig 1B). Across three of these conditions, subjects’ objective was to identify the pair (row) of cards with the largest product (“MULTIPLY BIGGEST”), largest sum (“ADD BIGGEST”), or largest single card (“FIND THE BIGGEST”). Across the remaining three conditions, the objective was inverted, such that they now sought the row with the smallest product, sum, or single card.

**(A)** Subjects aim to select the “winning row” (in this example, the row with the largest product). After the first card is revealed (here, the 7 of diamonds), subjects enter Task Stage 1. Here, they choose between the two yellow options, either sampling another card (costing 10 points) or making a guess about which is the winning row (no cost). Greyed-out cards cannot yet be sampled. If choosing to sample, then Task Stage 2 is entered, in which either remaining card may be sampled (costing 15 points) or the subject may again guess. In Task Stage 3, sampling the last remaining card costs 20 points. At any Task Stage, making a guess means that subjects enter the Choice Stage. Here, after choosing, all cards are revealed and the subject either wins 60 points if correct, or loses 50 points if incorrect. **(B)** The six task conditions in a 3-by-2 design. Subjects either select the row with biggest (or smallest) sum, the biggest (or smallest) product, or the biggest (or smallest) single card.

At the beginning of each trial, all cards start face down. Subjects then touch the first card (randomly located) to turn it over at no cost. This enters Task Stage 1 (Fig 1A). One of the three remaining cards is made available to be sampled at a cost of 10 points, but subjects can alternatively make a guess (gamble on which option will be rewarded) at no cost. If they choose to sample, the value of the second card is revealed and they enter Task Stage 2. Either of the two remaining cards can then be sampled at a cost of 15 points, or subjects can again choose to make a guess at no cost. If they choose to sample again, they enter Task Stage 3. The last remaining card can be sampled at a cost of 20 points, or they may again guess at no cost. At any Task Stage, making a guess means that subjects enter the Choice Stage. Here, subjects choose which row they think will be rewarded, and all remaining cards are then turned face up. The subject wins 60 points if the gamble is correct and loses 50 points if incorrect, minus the points paid for information sampling. Card values ranged, with a uniform distribution (sampled with replacement), from 1 to 10, with “picture cards” removed from the deck.
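The payoff structure described above can be summarized in a short sketch. The function name `trial_payoff` and its arguments are illustrative and are not taken from the authors' MATLAB code; only the point values come from the task description.

```python
# Sampling costs by task stage, as stated in the task description.
SAMPLE_COSTS = [10, 15, 20]  # Task Stage 1, 2, 3
WIN_POINTS = 60
LOSS_POINTS = -50


def trial_payoff(n_samples: int, correct: bool) -> int:
    """Net points for one trial after paying to reveal n_samples extra cards."""
    if not 0 <= n_samples <= len(SAMPLE_COSTS):
        raise ValueError("n_samples must be between 0 and 3")
    cost = sum(SAMPLE_COSTS[:n_samples])  # escalating cost of information
    outcome = WIN_POINTS if correct else LOSS_POINTS
    return outcome - cost


# Net payoff for a correct and an incorrect guess after 0..3 paid samples.
for n in range(4):
    print(n, trial_payoff(n, True), trial_payoff(n, False))
```

Under this reading of the rules, paying for all three remaining cards leaves at most 15 points from a correct choice, so complete information comes at a substantial price relative to the 60-point reward.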

On each gameplay, subjects were randomly assigned two of the six possible conditions, each played as a short block of 11 trials. The symmetry between the “approach big” and “approach small” conditions is crucial to our experimental design. Revealing a card of a particular value yields the same information content in both versions of the task (with the exception of the FIND THE BIGGEST and FIND THE SMALLEST conditions). This means that subjects’ information gathering behavior should, normatively, be matched across these conditions. The only behavior that should change is the final gamble made by the subject, which should reverse. By comparing across ADD BIG and ADD SMALL conditions, and across MULTIPLY BIG and MULTIPLY SMALL conditions, we could probe the influence of the approach direction (i.e., big/small) on information sampling behavior and vice versa.

### “Positive Evidence Approach” Bias

The first question we asked pertained to Task Stage 1 (Fig 1A). Here, subjects decided whether to sample or guess based upon two variables: the information seen, i.e., the card value, and also the location where information was made available for sampling. We label the first row sampled as “row A.” In some trials, subjects were constrained to sample the next card from row A (“AA trials”), whilst in other trials they were constrained to sample from row B (“AB trials”).

As can be seen from the optimal dynamic programming model (Fig 2A), the card value and (to a lesser extent) the trial type influence the relative expected value of choosing to guess versus choosing to sample. The U-shaped function of the graph reflects an intuition that high- or low-valued cards are informative about the correct option to approach, making it more valuable to guess early. Mid-valued cards, by contrast, provide less information and make it more valuable to sample more information. The differential influence of AA versus AB trials arises because the potential reduction in uncertainty depends upon the information that has already been revealed. Intuitively, on a MULTIPLY trial where a 1 has been revealed, sampling from row A again yields little information relative to row B, as it is already known that row A will have a low value (between 1 and 10). On a MULTIPLY trial where a 10 has been revealed, sampling from row A yields more information than row B, as it reduces the range of possible row A values from between 10 and 100 to an exact value.

At Task Stage 1, subjects decide whether to make a guess or pay 10 points to sample. The available card to sample may be on the same row (“AA trials”) or the opposite row (“AB trials”) as the first card. **(A)** Model predictions. The relative expected value (in points) of guessing versus sampling from the dynamic programming model in the MULTIPLY conditions. Mid-valued cards make it more valuable to sample, whereas extreme-valued cards make it more valuable to guess. There is a weaker influence of the location of available information (compare “AA trials” versus “AB trials”). Crucially, optimal behavior is identical for both MULTIPLY BIG and MULTIPLY SMALL conditions. **(B)** Subject behavior. The probability of guessing in both conditions shows a broad similarity to the predictions of the dynamic programming model, but behavior in MULTIPLY BIG and MULTIPLY SMALL shows systematic differences. (See S1 Fig for AA and AB trials plotted together, rather than MULTIPLY BIG and MULTIPLY SMALL.) **(C)** Positive evidence approach bias is revealed by subtracting the MULTIPLY SMALL condition from the MULTIPLY BIG condition. Subjects are more likely to guess early if they have seen evidence that supports them approaching row A rather than avoiding it. This effect is strengthened in AB trials, in which subjects only have the opportunity to sample further information about row B. See also S2 Fig and S3 Fig for other conditions. Data for reproducing all analyses is freely available for download from Dryad.

As expected, the dynamic programming model predicts identical behavior irrespective of the subject’s approach goal. As an example, consider revealing a 2 in the MULTIPLY BIG condition on an AA trial. This yields a probability of 0.764 that row B will be rewarded, and the expected value of guessing is therefore 0.764 * 60 + (1–0.764) * (-50) = 34. The expected value of sampling again from the A row, calculated using dynamic programming, is 28.8, and so the relative expected value of guessing is 5.2 (Fig 2A). Now consider seeing a 2 in the MULTIPLY SMALL condition. This now yields the exact same probability of 0.764 that row A will be rewarded. Hence the expected value of guessing remains 34. The expected value of sampling further information remains 28.8, and so the relative expected value of guessing remains 5.2.
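The expected value of guessing in this worked example can be checked by brute-force enumeration of the three unseen cards. The sketch below is illustrative rather than the authors' implementation: the function names are hypothetical, and it assumes that trials in which the two rows tie are excluded from the probability, an assumption adopted here because it yields the 0.764 figure quoted above. It does not attempt to reproduce the value of sampling (28.8), which requires the full dynamic programming model.

```python
from itertools import product

CARDS = range(1, 11)  # card values 1..10, uniform, sampled with replacement
WIN_POINTS, LOSS_POINTS = 60, -50


def p_row_b_larger(first_card: int) -> float:
    """P(row B product > row A product) given the first card on row A.

    Assumption: outcomes where the two products are equal are not counted.
    """
    b_wins = a_wins = 0
    for a2, b1, b2 in product(CARDS, repeat=3):
        prod_a, prod_b = first_card * a2, b1 * b2
        if prod_b > prod_a:
            b_wins += 1
        elif prod_a > prod_b:
            a_wins += 1
    return b_wins / (b_wins + a_wins)


def ev_guess(p_correct: float) -> float:
    """Expected points from guessing the more likely row."""
    return p_correct * WIN_POINTS + (1 - p_correct) * LOSS_POINTS


p = p_row_b_larger(2)
print(round(p, 3), round(ev_guess(p), 1))
```

Because the MULTIPLY SMALL condition simply swaps which row is rewarded, the same enumeration gives the probability that row A wins after a 2 is revealed in that condition, so the expected value of guessing is unchanged.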

In contrast with these model predictions, subjects’ actual behavior showed a systematic difference between MULTIPLY BIG and MULTIPLY SMALL conditions (compare circles and plus signs in Fig 2B, see also S1 Fig). Subjects became more likely to guess if they had seen evidence that supported them approaching row A rather than avoiding it. In MULTIPLY BIG, a high-valued card (6 or above) carries evidence for choosing row A. Subjects become more likely to guess than when seeing the same card in MULTIPLY SMALL. However, a low-valued card in MULTIPLY BIG (5 or below) carries evidence for avoiding row A. Subjects now become more likely to sample than when seeing the same card in MULTIPLY SMALL. This framing effect is seen most clearly when subtracting behavior in MULTIPLY SMALL from MULTIPLY BIG (Fig 2C).

The observed bias is one of approaching an option if positive evidence has been provided in support of that option; consequently, we term this “positive evidence approach.” We observed that positive evidence approach was more pronounced in AB trials than AA trials (Fig 2C). This is again consistent with our hypothesis, as subjects are less inclined to sample further information if it is available on a row they wish to avoid than if it is on a row they wish to approach.

To quantify positive evidence approach across our population, we used the following summary statistic:
where *pGuess*_{big,i} and *pGuess*_{small,i} denote the average probability of guessing in MULTIPLY BIG and MULTIPLY SMALL, respectively, having revealed card value *i*. As there should be no difference in the probability of guessing across the two conditions, the expected value of this statistic from the normative model is 0. By contrast, the value of this statistic across our population was 0.42 in AA trials and 0.70 in AB trials. To estimate our confidence in this summary statistic, we recomputed it on 1,000 bootstrapped samples of 10,000 gameplays from our population. This yielded 95% confidence intervals of [0.30, 0.54] in AA trials and [0.62, 0.79] in AB trials. (Throughout the paper, we focus on the reporting of effect sizes and 95% confidence intervals rather than *p*-values, as our large sample size renders *p*-values less informative [28].)
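A statistic of this kind, together with a bootstrap confidence interval, might be computed along the following lines. This is a sketch under stated assumptions, not the authors' analysis: it assumes the statistic sums the big-minus-small difference in guessing probability over high cards (6 to 10) and the small-minus-big difference over low cards (1 to 5), it resamples individual trial records rather than gameplays, and the record layout and function names are hypothetical. The demonstration data are synthetic.

```python
import random


def p_guess_by_card(records):
    """Return {(condition, card): mean guess rate} from trial records."""
    counts = {}
    for r in records:
        key = (r["condition"], r["card"])
        n, s = counts.get(key, (0, 0))
        counts[key] = (n + 1, s + r["guessed"])
    return {key: s / n for key, (n, s) in counts.items()}


def pea_statistic(records):
    """Assumed form: signed sum of big-vs-small guessing differences."""
    rates = p_guess_by_card(records)
    total = 0.0
    for card in range(1, 11):
        big, small = rates.get(("big", card)), rates.get(("small", card))
        if big is None or small is None:
            continue  # no data for this card in one of the conditions
        diff = big - small
        total += diff if card >= 6 else -diff
    return total


def bootstrap_ci(records, n_boot=1000, seed=0):
    """Percentile 95% interval from resampling records with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        pea_statistic(rng.choices(records, k=len(records)))
        for _ in range(n_boot)
    )
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Synthetic demonstration data with a small built-in approach bias.
rng = random.Random(1)
records = []
for _ in range(2000):
    card = rng.randint(1, 10)
    condition = rng.choice(["big", "small"])
    favors_row_a = (card >= 6) == (condition == "big")
    p = 0.5 + (0.04 if favors_row_a else -0.04)
    records.append({"card": card, "condition": condition,
                    "guessed": rng.random() < p})

print(pea_statistic(records), bootstrap_ci(records, n_boot=200))
```

If guessing rates are identical in the two conditions, every term cancels and the statistic is 0, matching the normative prediction described in the text.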

Similar results are seen by comparing the ADD BIG and ADD SMALL conditions (S2 Fig; AA trials: mean 0.52, 95% CIs [0.40, 0.64]; AB trials: mean 0.79, 95% CIs [0.70, 0.89]). See also S3 Fig for FIND THE BIGGEST/FIND THE SMALLEST, which are not directly matched for information content.

It is also notable that, overall, subjects’ behavioral choices to sample information were similar to predictions arising from the optimal model (Fig 2B), although not identical (S1 Fig). This alone does not imply that subjects are implementing the optimal model. Instead, it may simply reflect the fact that relatively simple behavioral strategies will often recapitulate many features of more sophisticated strategies [29]. For example, one straightforward strategy would be to compare the value of the presented card to the average value, estimate the current degree of uncertainty in making a choice, and then use these values with a softmax transformation [30] to calculate a probability for selecting row A, selecting row B, or sampling further information. We consider this question further in a later section of the paper and show that this can approximate the average behavior of subjects in the task without recourse to an optimal model.

### “Rejecting Unsampled Options” Bias

We next asked how decisions to sample or reject information might influence subsequent choices. If subjects elected to guess at the first stage, they entered the Choice Stage, in which they gambled on which option would be rewarded (Fig 1A). In ADD BIG, the relative expected value of choosing row A over row B increases with the value of the first card (Fig 3A, blue line), while in ADD SMALL, it decreases with the value of the first card (Fig 3A, purple line). This was reflected in subjects’ choices in both sets of trials (Fig 3B; see S4 Fig and S5 Fig for other conditions). However, this decision arises on two different types of trial. The subject will either have just declined the opportunity of sampling information from the A row (on AA trials), or the B row (on AB trials). Our hypothesis was that information sampling depends upon the underlying approach value of an item. A corollary is that declining to sample an item reflects an underlying preference for not approaching it.

If subjects choose to guess at Task Stage 1, they then select between row A and row B. **(A)** Model predictions. The relative expected value (in points) of choosing row A versus row B in the ADD BIG (cyan) and ADD SMALL (purple) conditions. Crucially, the decision is identical for AA and AB trials; the only difference between these conditions is which row subjects have previously declined to sample. **(B)** Subject behavior. The probability of choosing row A shows a softmax relationship to the relative values shown in Fig 3A. Note that the green line (AB trials) is higher than the blue line (AA trials) at nearly all card values, particularly near the point of subjective equivalence. **(C)** Rejecting unsampled option bias is revealed by subtracting AB trials from AA trials in both ADD BIG and ADD SMALL conditions. Subjects are more likely to choose row A if they have declined to sample row B. See also S4 Fig and S5 Fig for other conditions. Data for reproducing all analyses is freely available for download from Dryad.

When we compared choice preferences on AA and AB trials for identical card values on the same condition, we observed that, across all six conditions, subjects showed a systematic shift towards being less likely to choose the option that had just been left unsampled. Hence, subjects presented with the same card value on an AA trial were more likely to choose option B than on an equivalent AB trial (Fig 3B). This effect was most pronounced near the point of subjective equivalence in subjects’ choices and is revealed most clearly by subtracting subjects’ choice behavior in AB from AA trials (Fig 3C).

We term this a “rejecting unsampled options” bias. To quantify rejecting unsampled options across our population, we used the following summary statistic:
where *p(Choice = A)*_{i,AB} denotes the probability of choosing row A having observed card *i* on an AB trial, and *p(Choice = A)*_{i,AA} denotes the same probability on an equivalent AA trial. There is no difference in the choice that is presented to the subject between AB and AA trials, and the expected value of this statistic is therefore 0. The mean value of this statistic across the population was 0.21 in ADD BIG (95% CIs [0.10, 0.32]) and 0.30 in ADD SMALL (95% CIs [0.18, 0.41]).
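One form of this statistic that is consistent with the definitions above is sketched here. It assumes a simple sum over the ten card values with AB trials entering positively, so that a positive value indicates a shift towards row A after declining to sample row B. The label RUO, the summation, and the sign convention are assumptions rather than the authors' stated formula.

```latex
\mathrm{RUO} = \sum_{i=1}^{10} \left[ p(\mathrm{Choice}=A)_{i,AB} - p(\mathrm{Choice}=A)_{i,AA} \right]
```

Under this form, identical choice behavior on AA and AB trials makes every term vanish, which agrees with the normative expectation of 0.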

We also found the rejecting unsampled options bias to be present across all the other conditions: MULTIPLY BIG (mean 0.24, 95% CIs [0.12, 0.35]; S4 Fig), MULTIPLY SMALL (mean 0.27, 95% CIs [0.15, 0.39]; S4 Fig), FIND THE BIGGEST (mean 0.22, 95% CIs [0.11, 0.35]; S5 Fig), and FIND THE SMALLEST (mean 0.21, 95% CIs [0.09, 0.34]; S5 Fig).

### “Sampling the Favorite” Bias

Our design enabled us to also investigate where subjects chose to sample information. At Task Stage 2 on AB trials, we could determine subjects’ relative preference for sampling from row A versus row B (Fig 1A). Here, different conditions have different predictions for which row is more advantageous to sample. For example, in both the ADD BIG and ADD SMALL conditions, sampling from either row yields exactly the same amount of information about which row might be rewarded. The optimal dynamic programming model predicts no relative advantage for sampling from row A versus row B (S6 Fig).

In both MULTIPLY BIG and MULTIPLY SMALL conditions, however, dynamic programming predicts that sampling from the row that currently has the higher-valued card will be more informative. The intuition behind this is that the range of possible outcomes on the row with the higher-valued card is greater, and so sampling further information on this row leads to a greater reduction in uncertainty than sampling the row with the lower-valued card. This is borne out in a heat map of model predictions, showing the difference in relative value from sampling from row A versus row B (Fig 4A). Importantly, these predictions are identical for both MULTIPLY BIG and MULTIPLY SMALL conditions. Somewhat counterintuitively, it is therefore more advantageous to sample from the row with the largest card even in MULTIPLY SMALL. (Note that this is different from the relative expected value of guessing versus sampling, which is shown in S8 Fig.)

If subjects choose to sample at Task Stage 1, they enter Task Stage 2. If this is on an AB trial, they then may sample again from row A, or from row B, or make a guess. **(A)** Model predictions. The relative expected value (in points) of sampling from row A versus sampling from row B in the MULTIPLY conditions. It is generally more valuable to sample from the row that currently has the higher-valued card (e.g., the 7, in the example shown). Crucially, this prediction is the same in both MULTIPLY BIG and MULTIPLY SMALL conditions. **(B)** Subject behavior. Subjects show a propensity to sample from the row containing the high-valued card in MULTIPLY BIG, but this trend is reversed in MULTIPLY SMALL. Subjects are therefore inclined to sample the option that is currently most likely to be approached. **(C)** Sampling the favorite bias is revealed by subtracting MULTIPLY SMALL trials from MULTIPLY BIG trials. The normative model predicts this heat map to show no difference between conditions, yet there is a clear tendency to sample the currently favored row. See also S6 Fig and S7 Fig for other conditions, and S8 Fig for relative value of guessing versus sampling across all six conditions. Data for reproducing all analyses is freely available for download from Dryad.

In contrast to model predictions, we found that subjects preferred to sample from the option that currently had the higher value in the MULTIPLY BIG condition alone (Fig 4B, left). In the MULTIPLY SMALL condition, they preferred to sample from the option that currently had the lower value (Fig 4B, right). The influence of this bias in subjects’ information sampling is revealed more clearly by subtracting behavior in MULTIPLY SMALL from that of MULTIPLY BIG (Fig 4C). Whereas the optimal model shows no difference between these two conditions (i.e., the entire heat map should equal 0), subjects reliably sampled information from the row that they currently sought to approach rather than avoid.

We term this bias “sampling the favorite.” To quantify sampling the favorite, we derived two statistics for trials in which subjects decided to sample a third piece of information. We calculated one “strong evidence” statistic for trials in which the “favorite” (the item that would eventually be approached) was clear. We defined this as trials where the difference in card values was 4 or greater in magnitude:
where *P(Sample = A)*_{i,j,big} refers to the relative probability of choosing to sample from row A over row B, when card *i* is presented on row A and card *j* presented on row B, on MULTIPLY BIG trials. The top row of the equation denotes trials where row A has a higher-valued card than row B, favoring approaching A in MULTIPLY BIG but approaching B in MULTIPLY SMALL. The converse is true for the bottom row.

We calculated a second “weak evidence” statistic for trials in which the “favorite” was less clear. We defined this as trials in which the difference in card values was between 1 and 3 in magnitude: Crucially, because the optimal model predicts identical values for sampling from row A versus row B on MULTIPLY BIG and MULTIPLY SMALL, the expected value for both statistics is always 0. In contrast, the value of the “strong evidence” statistic across our population was 12.35 (95% CIs [10.55, 14.01]), whilst the value of the “weak evidence” statistic was 7.21 (95% CIs [5.85, 8.48]).
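The two statistics might take a form like the one sketched below. It assumes each is a sum over card pairs (i on row A, j on row B) of the difference between conditions, with the sign flipped when row B holds the higher card. The labels, the summation, and the absence of any scaling factor are assumptions, so the magnitudes reported above cannot be checked against this sketch.

```latex
\mathrm{SF}_{\mathrm{strong}} =
  \sum_{i - j \ge 4} \left[ P(\mathrm{Sample}=A)_{i,j,\mathrm{big}} - P(\mathrm{Sample}=A)_{i,j,\mathrm{small}} \right]
  + \sum_{j - i \ge 4} \left[ P(\mathrm{Sample}=A)_{i,j,\mathrm{small}} - P(\mathrm{Sample}=A)_{i,j,\mathrm{big}} \right]

\mathrm{SF}_{\mathrm{weak}} =
  \sum_{1 \le i - j \le 3} \left[ P(\mathrm{Sample}=A)_{i,j,\mathrm{big}} - P(\mathrm{Sample}=A)_{i,j,\mathrm{small}} \right]
  + \sum_{1 \le j - i \le 3} \left[ P(\mathrm{Sample}=A)_{i,j,\mathrm{small}} - P(\mathrm{Sample}=A)_{i,j,\mathrm{big}} \right]
```

In either case, a model that samples identically in MULTIPLY BIG and MULTIPLY SMALL gives a value of 0, while a tendency to sample the row that will be approached gives a positive value.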

Note this bias was also observed in the ADD BIG/ADD SMALL condition (S6 Fig), where the value of the “strong evidence” statistic was 11.93 (95% CIs [10.39, 13.51]), whilst the value of the “weak evidence” statistic was 6.86 (95% CIs [5.61, 8.14]). See S7 Fig for FIND THE BIGGEST/FIND THE SMALLEST, which are not directly matched for information content.

### A Parametric Model of Subject Behavior

We consider that subjects are unlikely to be implementing dynamic programming when they perform the task, yet their overall behavior shows a surprising resemblance to model predictions (e.g., Fig 2B, Fig 3B). We therefore constructed a simpler model that describes subjects’ performance without recourse to dynamic programming.

In this model, subjects first compute the value of choosing each option by comparing the presented value to the average value of all possible cards. In ADD BIG trials, at stage 1, for example, this would be the following:
where <1stCardValue> is the expected value of the first card (5.5) and β_{1} is a free parameter. In ADD SMALL trials, we simply inverted the value of each card, such that 10 became 1, 9 became 2, and so on.

We also considered an AB trial (e.g., Fig 4B), where after turning the second card, the values of option A and B become the following:
At both stages, we compute the degree of uncertainty, ω, in choosing either option:
This is then used to derive the value of sampling information from option A or option B:
The probability of each choice *C* being option *o* is finally generated using a softmax choice rule:
where *i* indexes the entire set of available options at a given stage of the task. Hence, at stage 1 of an AA trial, it indexes the set {choose A, choose B, sample A}; at stage 1 of an AB trial, it indexes the set {choose A, choose B, sample B}; at stage 2 of an AB trial, it indexes the set {choose A, choose B, sample A, sample B}. Note that the model therefore assumes a three- or four-way decision at each stage of the game. Although in the structure of the task this was made as two sequential binary decisions (guess or sample, then select a row), it reflects the intuition that when subjects decide to guess, they already have internally committed to choosing a particular row.
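The final step, a softmax over the three or four options available at a given stage, can be sketched as follows. This is a generic softmax and not the authors' fitted model: it assumes τ acts as a temperature that divides the option values, and the option values in the example are hypothetical placeholders rather than outputs of the value equations.

```python
import math


def softmax_choice(values: dict, tau: float) -> dict:
    """Map option values to choice probabilities with temperature tau."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    m = max(values.values())  # subtract the max for numerical stability
    exps = {option: math.exp((v - m) / tau) for option, v in values.items()}
    z = sum(exps.values())
    return {option: e / z for option, e in exps.items()}


# Hypothetical values for stage 1 of an AA trial (not fitted parameters).
stage1_aa = {"choose A": 0.8, "choose B": -0.8, "sample A": 0.3}
probs = softmax_choice(stage1_aa, tau=0.5)
print(probs)
```

Treating the stage as a single three- or four-way choice, as in the model described above, means that the probability of sampling and the probabilities of choosing each row are normalized together rather than in two sequential binary steps.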

We fit parameters β_{1}, β_{2}, β_{3}, and τ, using maximum likelihood estimation (*fminsearch* in MATLAB) separately at stage 1 and stage 2. Fitting was repeated at 50 random seed locations to avoid local minima, and the model was fit to the average behavior of all subjects across each condition using a sum of squared errors cost function. We performed fitting separately for both ADD and MULTIPLY conditions; we do not consider FIND THE BIGGEST/SMALLEST conditions here, as these are not matched for information content. Note that this model does not explicitly feature terms for the costs associated with sampling information; instead, these are implicitly factored into the constant term β_{2}.

This model captures the main features of the behavioral data (S9 Fig). At stage 1, it displays a U-shaped effect of card value on information sampling (as in Fig 2B) caused by the effects of choice uncertainty on the value of sampling information. It also displays a softmax choice curve (as in Fig 3B) between options A and B, matching subjects’ real choice probabilities between these two options. At stage 2, it displays choice probabilities between options A and B that again closely match subjects’ behavior.

However, because this model makes symmetric predictions for BIG and SMALL trials, it fails to capture the three biases described above (S9 Fig). We therefore adapted the model with three additional parameters to capture these biases (Fig 5). At stage 1, before entering values into the softmax choice equation, we captured the “rejecting unsampled options” bias (Fig 5D–5F) by adding an “approach bonus” (β_{4}) to the value of the item that could not be sampled. This makes subjects more likely to choose this option if they do not sample information.
A natural consequence of this value modulation by β_{4} is that it also induces a change in the form of the uncertainty ω, which determines the value of sampling information (see above). Whereas previously this would have been symmetric around the average value of the first card, it now becomes asymmetric, but in opposite directions on “add big” versus on “add small” trials. On AB trials, this feature of the model is thus sufficient to also capture the “positive evidence approach” bias (Fig 5B and 5C).

**(A)** Predicted probability of guessing at stage 1 for AA trials (compare to Fig 2Bi) and **(B)** AB trials (compare to Fig 2Bii) in “add” condition. **(C)** Predicted “positive evidence approach” bias. Compare to Fig 2C. **(D)** Predicted probability of choosing row A, having chosen to guess, at stage 1, for “add big” (compare to Fig 3Bi) and **(E)** “add small” (compare to Fig 3Bii) conditions. **(F)** Predicted “rejecting unsampled options” bias. Compare to Fig 3C. **(G)** Predicted probability of sampling row A in “big” (compare to Fig 4Bi) and in **(H)** “small” (compare to Fig 4Bii) conditions. **(I)** Predicted “sampling the favorite” bias. Compare to Fig 4C. See also S9 Fig. Data for reproducing all analyses is freely available for download from Dryad.

On AA trials, however, the “approach bonus” alone predicts the opposite pattern of “positive evidence approach” to that observed in the data. We instead found that we could capture the “positive evidence approach” bias (Fig 5A–5C) by modulating the value of sampling option A:
Notably, we found increasing first card value had a negative influence on the value of sampling A, reflected by the negative sign in front of parameter β_{5}. We infer that on AA trials (where option A is available to be sampled), subjects are more inclined to choose option A when it is of high value than to sample again from it. This parameter is not needed on AB trials, where “positive evidence approach” is already captured by the β_{4} parameter alone. It should also be noted that it captures another general feature of the data, which is that subjects are more likely overall to guess on AA trials than on AB trials (see S1 Fig). This is because β_{5} reduces the value of sampling on AA trials selectively.

Finally, at stage 2, we found that we could capture the “sampling the favorite” bias (Fig 5G–5I) by introducing a parameter, β_{6}, that affected subjects’ propensity to sample from higher-valued cards:

Parameter fits for stage 1 and stage 2 for both ADD and MULTIPLY trials are given in S1 Table. To confirm that the additional parameters improved model fits without overfitting, we used 10-fold nested cross-validation. Parameters were fit using 90% of the data (training set), and then the cost function was calculated for the remaining 10% of the data (test set, not used to train the model). This process was iterated ten times using different portions of the data as the test set each time. At both stage 1 and stage 2, for both ADD and MULTIPLY conditions, the model with additional parameters provided consistently better fits to the test set than the reduced model (S2 Table).
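The held-out comparison described above can be illustrated with a short Python sketch. It is a plain (non-nested) 10-fold cross-validation on synthetic data, and the two models (a constant versus a straight line) are hypothetical stand-ins for the reduced and full behavioral models; only the train/test logic and the sum of squared errors cost are meant to mirror the text.

```python
import random

def sse(predicted, observed):
    """Sum of squared errors between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))

def fit_constant(xs, ys):
    """Stand-in 'reduced' model: predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Stand-in 'full' model: least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def k_fold_sse(xs, ys, fit, k=10, seed=0):
    """Return the held-out SSE for each of k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(sse([model(xs[i]) for i in test_idx],
                          [ys[i] for i in test_idx]))
    return errors

# Synthetic data with a genuine linear trend (made up for illustration).
rng = random.Random(1)
xs = [rng.uniform(1, 10) for _ in range(200)]
ys = [0.5 * x + rng.gauss(0, 0.5) for x in xs]
reduced_errors = k_fold_sse(xs, ys, fit_constant)
full_errors = k_fold_sse(xs, ys, fit_line)
```

If the extra parameters were merely fitting noise, the held-out error of the larger model would not be reliably lower than that of the reduced model.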

The close fit between model predictions and subject behavior reveals that a far simpler framework (comparing a card value to the average expected value) can approximate an optimal dynamic programming model. Moreover, subjects’ approach-induced biases in information sampling can be readily parameterized within this framework. We anticipate that further, more refined models will be subsequently tested by downloading the raw behavioral data from Dryad [31].

One potential drawback of the proposed model is that each of the three biases is captured by a separate parameter rather than a single unifying mechanism driven by Pavlovian approach. We therefore tested whether these parameters are related to each other across the entire population of subjects—that is, whether they might show positive covariance with each other. To this end, we adopted an alternative model fitting strategy using a mixed effects analysis to describe population behavior. A mixed effects analysis contains population-level hyperparameters to constrain individual subject model fits (see reference [32] for details). An added benefit is that one can examine the covariance structure of these hyperparameters to explore how β_{4}, β_{5}, and β_{6} are related to each other. Importantly, we found that, after model fitting, the covariance between all three parameters was positive. We normalized by the variance of each parameter to yield correlation coefficients between β_{4}, β_{5}, and β_{6}; this yielded a positive correlation between β_{4} and β_{5} (r = 0.21), between β_{4} and β_{6} (r = 0.06), and between β_{5} and β_{6} (r = 0.12; *p* < 0.0001 for all comparisons). This analysis provides evidence that subjects who showed stronger expression of one of the biases also tended to show greater expression of the remaining two biases.
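The normalization step mentioned above can be sketched as follows, assuming the standard conversion in which each covariance is divided by the product of the two corresponding standard deviations. The covariance matrix in the example is hypothetical and is not the fitted hyperparameter covariance.

```python
import math

def covariance_to_correlation(cov):
    """Convert a square covariance matrix (list of lists) to correlations.

    r_ij = cov_ij / (sd_i * sd_j), where sd_i = sqrt(cov_ii).
    """
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]

# Hypothetical 3 x 3 covariance matrix for three parameters.
example_cov = [
    [4.0, 1.0, 0.0],
    [1.0, 1.0, 0.5],
    [0.0, 0.5, 4.0],
]
example_corr = covariance_to_correlation(example_cov)
```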

### Relationship of Biases to Pavlovian Approach-Avoid Parameter

An advantage of large-scale data collection via a smartphone app is that it allows data to be gathered on a range of cognitive tasks across a large cohort of subjects. Recently, we reported learning and choice behavior on another gambling task contained within the same app platform [27,33]. In this simpler gambling task, subjects make binary choices between safe and risky options in three types of trials: gain trials (a certain gain versus a larger gain/zero gain gamble), mixed trials (certain zero gain versus a mixed gain/loss gamble), and loss trials (a certain loss versus a larger loss/zero loss gamble). Notably, subject behavior in this task was best characterized within a Pavlovian approach-avoidance decision model when compared to a range of models that also included a standard Prospect Theory model [27]. This decision model captures the influence on risk-taking behavior of both economic preferences and Pavlovian influences. It describes subjects’ value-independent propensity to approach or avoid gain gambles with a single parameter, *β*_{gain}, and their value-independent propensity to approach or avoid loss gambles with a second parameter, *β*_{loss}. Full details of modeling are provided in [27] and Materials and Methods.

For each subject who played both games within the app (*n* = 21,866 users), we estimated *β*_{gain} and *β*_{loss} and computed the difference between these two parameters. We performed a median split on these values to derive two subpopulations of subjects, one exhibiting a larger bias to approach potential rewards rather than avoid potential losses and one exhibiting a weaker bias. Next, we calculated the average behavior in our task of the subjects within these two subpopulations. We then fit the model described in the previous section to subjects’ aggregate behavior and compared the fits of *β*_{4} (rejecting unsampled options), *β*_{5} (positive evidence approach), and *β*_{6} (sampling the favorite) statistics across the different subpopulations. To estimate our confidence in these statistics, we performed 100 bootstraps using 10,000 samples drawn from each subpopulation.
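The median split and bootstrap procedure can be sketched in Python as below. This is only an illustration on synthetic data: the `behavior` measure and the group mean are hypothetical stand-ins for fitting the parametric model to aggregate behavior, and placing subjects exactly at the median in the low group is an arbitrary choice.

```python
import random
import statistics

def median_split(diff_by_subject):
    """Split subject IDs at the median of beta_gain - beta_loss."""
    med = statistics.median(diff_by_subject.values())
    low = [s for s, d in diff_by_subject.items() if d <= med]
    high = [s for s, d in diff_by_subject.items() if d > med]
    return low, high

def bootstrap(group, statistic, n_boot=100, n_samples=10_000, seed=0):
    """Mean and standard deviation of a statistic across resamples
    drawn with replacement from one subpopulation."""
    rng = random.Random(seed)
    estimates = [statistic(rng.choices(group, k=n_samples))
                 for _ in range(n_boot)]
    return statistics.mean(estimates), statistics.stdev(estimates)

# Synthetic subjects: every number below is made up for illustration.
rng = random.Random(42)
diff = {s: rng.gauss(0.1, 0.3) for s in range(2000)}  # beta_gain - beta_loss
behavior = {s: 2.0 * diff[s] + rng.gauss(0.0, 0.5) for s in diff}

def mean_behavior(subject_ids):
    """Stand-in statistic: average behavioral measure of a resample."""
    return statistics.mean(behavior[s] for s in subject_ids)

low_group, high_group = median_split(diff)
low_mean, low_sd = bootstrap(low_group, mean_behavior, n_samples=1000)
high_mean, high_sd = bootstrap(high_group, mean_behavior, n_samples=1000)
```

The spread of the bootstrap estimates provides the error bars for each subpopulation, so the two groups can be compared without fitting individual subjects.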

All three of our information sampling biases were differentially present in the high approach-avoid versus low approach-avoid groups. Positive evidence approach was greater in the high approach-avoid group in both add and multiply trials (Fig 6A). Rejecting unsampled options was also greater in the high approach-avoid group in the add condition, although this difference was slightly reversed in the multiply condition (Fig 6B). Sampling the favorite showed a subtler pattern of expression but was greater in the high approach-avoid group in both add and multiply trials (Fig 6C). All of the different observed biases are linked by the tendency to sample information from locations that will eventually be approached. The present results show that this is also reflected in the expression of these biases in groups exhibiting differential levels of Pavlovian approach influence on their behavior.

Blue bars denote subjects with below-median values for β_{gain}–β_{loss}; red bars denote subjects with above-median values. **(A)** The positive evidence approach bias is quantified using the β_{5} parameter in the parametric model of subject behavior; in both conditions, subjects with high approach-avoid parameter differences show more positive evidence approach than subjects with low parameter differences. **(B)** The rejecting unsampled options bias is quantified using the β_{4} parameter in the parametric model of subject behavior; in the add conditions, there is considerably greater expression of rejecting unsampled options in subjects with high approach-avoid parameter differences; there is a weaker trend in the opposite direction in the multiply condition. **(C)** The sampling the favorite bias is quantified using the β_{6} parameter in the parametric model of subject behavior; in both conditions, subjects with high approach-avoid parameter differences show more sampling the favorite bias than subjects with low parameter differences. Bars/error bars reflect mean/standard deviation across 1,000 bootstrapped samples of 10,000 gameplays. Data for reproducing all analyses is freely available for download from Dryad.

### Variability in Information Sampling across Age and Education Groups

An additional advantage of acquiring data via smartphone is that it enables examination of variation in information sampling across a much wider range of subjects than is typically examined in laboratory studies. In an initial exploration of this, we examined variation in a simple measure of information seeking, namely the average number of cards turned relative to the optimal model.

Subjects reliably sampled less information than predicted from the optimal model, but there was substantial variation across the population (Fig 7A). It is important to remember, however, that the model is only “optimal” in the sense of maximizing expected points per gameplay. It does not, for example, include additional factors such as the subjective cost of sampling information. Indeed, we found that adding a “subjective sampling cost” of 5 points per turn to the optimal model shifted the distribution in Fig 7A so that it was now centered around zero (S10 Fig). Nonetheless, variability in the extent to which individual subjects sampled information was highly reproducible across repeated gameplays (Fig 7B and S10 Fig), and we also found it to be stable irrespective of which set or ordering of conditions subjects played (S11 Fig). This suggests that it provides a measure that might be related to performance on other cognitive tasks or demographic information about participants. An example of the latter is our finding that the number of cards gathered was positively related to both the highest level of attained education and age group of our participants (Fig 7C, top panels). Importantly, this measure was decoupled from general performance on the task, which was positively related to educational attainment but negatively related to age (Fig 7C, bottom panels). There was a very slight tendency for subjects within the “high approach-avoid group” to gather more evidence versus subjects in the “low approach-avoid group,” but this difference was negligible relative to the overall variance in information sampling across the population (mean of 0.0066 more cards sampled in high approach-avoid group, 95%CIs [0.0037, 0.0094]).

**(A)** Histogram of the mean number of cards sampled in each trial relative to how many would be sampled by the optimal model. Subjects show a propensity to guess early, but there is considerable individual variation. **(B)** Individual variation in information sampling reproduces across subsequent gameplays. Each dot is a subject; subjects who were inclined to seek little information in gameplay 1 also sought little information in gameplay 2. **(C)** Variation in information seeking (top) and subject performance (bottom) as a function of educational attainment (left panels) and age (right panels). General Certificate of Secondary Education (GCSE) is equivalent to 10th grade in United States; A Level (ALev) is equivalent to 12th grade. Datapoints denote mean +/- standard error of the mean. Data for reproducing all analyses is freely available for download from Dryad.

## Discussion

Information seeking comprises interlinked decisions that include how much to sample, where to sample from, and, finally, which option to choose based upon sampled information. Whilst the complexity of our task allowed these different features to be indexed simultaneously within a single scenario, the task was sufficiently constrained such that it can be treated as a Markov decision process. As such, an optimal model of the task can be derived using dynamic programming [26]. Dynamic programming has rarely been considered as a normative basis for analyzing human information search strategies [7]. Although computationally expensive, a distinct advantage for our purposes is that it straightforwardly derives a common currency for the expected value of sampling in different locations against the value of choosing a particular option.

Subjects rapidly learnt the task, with their performance in terms of points gained becoming relatively stable within ~4 trials; moreover, basic features of subject behavior (e.g., Figs 2B and 3B) matched with the overall pattern of predictions from the normative model. This confirms our previous observations concerning the validity of behavioral data acquired via smartphone [25]. We make the large behavioral dataset freely available for download [31], providing an empirical testing ground for models of human information seeking.

Crucially, three features of subject behavior at different Task Stages showed demonstrable biases in information seeking. Two of these biases, positive evidence approach and sampling the favorite, were elicited as a consequence of our manipulation of which item subjects approached across different conditions. A third bias, rejecting unsampled options, was demonstrated as an effect of rejecting an option on the preference of a subject for choosing that option. All three biases were a consequence of the item that subjects currently sought to approach. Although manifesting as suboptimal biases in our experiment, we contend that these behaviors are present because they are likely to be, and have been, adaptive ecologically [34]. In nature, foraging decisions (such as whether to stay or depart from a patch, or whether to engage with or reject an item of prey) are more common than those made between binary mutually exclusive options [35]. In such contexts, we hypothesize that an adaptive strategy is to engage with the most valuable alternative first and then accept or reject this alternative having acquired more information about its value. It would be intriguing to test whether approach-induced information sampling can produce optimal information sampling in more naturalistic foraging paradigms.

Further evidence supporting the claim that our biases are related to Pavlovian approach comes from their differential expression in two groups who varied in the degree of a Pavlovian approach-avoid parameter derived from a separate decision task. This provides a tentative suggestion of an underlying dopaminergic mechanism for control of Pavlovian approach on information seeking behaviors, given our recent demonstration that Pavlovian approach is boosted in subjects treated with L-DOPA [27]. We also note that polymorphisms in genes controlling dopamine function have recently been linked to individual differences in confirmation bias [36]. Moreover, recordings from midbrain dopaminergic neurons reveal that they signal information in a manner consistent with the animal’s preference for advance information in the same manner that they encode information about reward [17]. Future studies could easily exploit possibilities of data collection via smartphone to test this and related hypotheses via combined collection of genetic and behavioral data across large populations. It might also be possible to design future versions of our task with a larger number of trials/conditions per subject so as to elicit each of the three observed biases within-subject rather than depending upon examining amalgamated data across a population.

It should be noted, however, that this effect was relatively small. This may in part be due to the limited number of trials completed on both tasks, which provides significant challenges for characterization of individual subject behavior. This is particularly the case when multiple conditions/trial types need to be completed to obtain an effect. In the present study, there were only 22 trials per subject, and this is because we explicitly aimed to ensure that the average time to complete each game was less than 5 minutes, as shorter games yield the highest number of gameplays [25].

It is possible that subjects had miscalibrated beliefs about task structure. For example, they may not have realized that there was a card value 1, which is normally replaced by an “ace” in a regular deck of playing cards; or they may have believed that the average card value is 5, rather than 5.5. Such beliefs can straightforwardly be factored into the dynamic programming model, as can misunderstandings about the cost structure of the task, or additional opportunity costs for sampling further evidence. We found that such manipulations did indeed influence the relative preference of the model for guessing or sampling at different card values (S10 Fig). Crucially, however, none of these belief-based manipulations predict any of the three biases observed. “Positive evidence approach” and “sampling the favorite” depend upon comparisons of SMALL and BIG conditions: any normative model predicts that subjects’ information sampling should be identical between these conditions, and that they should simply flip their final choice. Similarly, “rejecting unsampled options” depends upon a comparison of final choice behavior in AA and AB trials in situations in which the subject has received identical information in both trial types.

It would also be possible to explore alternative versions of the current experiment that might examine the generality of our approach-avoidance account of information seeking biases. For instance, it would be intriguing to manipulate the affective valence of “points” such that they became aversive rather than rewarding. In such an experiment, we would predict that the approach-induced biases in information sampling would reverse. It would also be interesting to parametrically manipulate the costs involved in sampling different cards, as this would allow the experimenter to directly quantify the value of sampling information from different locations. It is also important to bear in mind that, even when information sampling is biased, posterior beliefs can remain unbiased if belief updating is performed normatively [37,38]. It would be informative in future experiments to formally dissociate subjects’ apparent biases in information sampling from their biases, if any, in their belief updating.

Our findings are closely linked to other evidence from recent studies that relates the value of stimuli to deployment of attention [23,39,40]. Both these studies and our own suggest that valuable items capture attention and, hence, cause more information to be sampled from the associated location. In contrast with these previous studies, however, we show that the influence of value on information sampling occurs rapidly, can be reshaped depending upon current task goals, and can manifest as several distinct behavioral biases that affect multiple stages of information sampling. Combined, this evidence argues that choice models in which attention and information sampling are determined purely stochastically [6] require revision. Whereas these models convincingly demonstrate an important role for the locus of attention on valuation, the present data imply that the converse is also true. In simple terms, the value subjects ascribe to a location influences how likely they are to sample from it.

## Materials and Methods

### Ethics Statement

Ethical approval for this study was obtained from University College London research ethics committee, application number 4354/001.

### Smartphone-Based Data Acquisition

Researchers at the Wellcome Trust Centre for Neuroimaging at University College London worked with White Bat Games to develop The Great Brain Experiment [25], available as a free download on iOS and Android systems (see http://thegreatbrainexperiment.com). On downloading the app, participants filled out a short demographic questionnaire and provided informed consent before proceeding to the games. Each time a participant started a game, a counter recording the number of plays was incremented. At completion of a game, if internet connectivity was available, a dataset was submitted to the server containing fields defining the game's content and the responses given. The first time a participant completed any game the server assigned that device a unique ID number (UID). All further data submissions from that device, as well as the demographic information from the questionnaire, were linked to the UID. No personal identification of users was possible at any time. Users who responded that their age was less than 18 years during the demographic questionnaire (i.e., minors) were excluded from the study: the app allowed these users to play the games, but no data were submitted to the server (hence the minimum age category on Fig 7C is 18–25 years).

The information-seeking game was available by clicking on “Am I a risk-taker?”, which launched the game. On each gameplay, subjects were randomly assigned to play short blocks (11 trials each, as outlined in Fig 1A and main text) of two different conditions randomly selected from six possibilities (Fig 1B). In two of these, subjects had to select the row that they believed contained the largest sum (“ADD BIGGEST”) or largest product (“MULTIPLY BIGGEST”). In a further two conditions, subjects had to reverse their eventual choice and select the row containing the smallest sum (“ADD SMALLEST”) or product (“MULTIPLY SMALLEST”). The remaining two conditions required participants to select the row with the largest or smallest individual card (“FIND THE BIGGEST” and “FIND THE SMALLEST,” respectively). Full instructions for the task can be seen in S1 Text and S1 Movie.

The economic gambling task was available by clicking on “What makes me happy?” Subjects started the game with 500 points and made 30 choices in each play. In each trial, subjects chose between a certain option and a gamble. Chosen gambles, represented as spinners, were resolved after a brief delay. Subjects were presented with the question, “How happy are you at this moment?” after every two to three trials.

### Dynamic Programming Model

The probabilistic structure of the task means that it is straightforward to derive a normative solution of task performance that maximizes the expected average number of points to be obtained from a given set of moves. This is achieved by applying dynamic programming to the task [30]. At each step, dynamic programming calculates the expected value of every possible action (seeking more information in a particular location, or making a guess). To do so, it takes into account the full probability distribution of currently hidden cards and the possible gain in information that can be obtained from sampling further.

Each combination of presented cards is defined as a state *s*. The best possible action *a* that a subject can take in a given state is defined as:
In a given state, the action value *Q* of making a particular guess in a particular state *s* can be calculated as:
where *p(win)* is the current probability of winning by making that guess, *p(lose)* is the probability of losing, and *totalcost* is the incurred costs for sampling information thus far.

By contrast, sampling further information has a fixed probability (0.1) of transitioning into one of 10 possible subsequent states (10 different card values may be revealed). The value of sampling can then be calculated as the best action value in the subsequent state multiplied by the probability of transitioning:
where s_{i} is the state that the subject would transition into if card value *i* is revealed. To calculate the value of sampling, one works backwards from the terminal state (all four cards revealed, where *Q*_{s,guess} = 15 (= 60 − 10 − 15 − 20)) to calculate *Q* in all previous states.
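The backward recursion described above can be sketched in Python for a simplified version of the ADD BIG condition. Several details are assumptions made for illustration and are not stated in this section: the loss payoff (`LOSS`), the treatment of ties as wins, the first card being free, and the freedom to sample from either row at every stage. The win payoff of 60 and the sampling costs of 10, 15, and 20 are taken from the terminal-state example in the text.

```python
from functools import lru_cache
from itertools import product

CARDS = tuple(range(1, 11))   # ten card values, each revealed with p = 0.1
WIN = 60                      # from the terminal-state example in the text
LOSS = -50                    # ASSUMPTION: loss payoff is not given here
SAMPLE_COSTS = (10, 15, 20)   # costs of the 2nd, 3rd, and 4th cards

def total_cost(row_a, row_b):
    """Cost incurred so far, assuming the first card is free."""
    n_sampled = max(0, len(row_a) + len(row_b) - 1)
    return sum(SAMPLE_COSTS[:n_sampled])

def p_win(chosen, other):
    """Probability that the chosen row's sum is at least the other row's
    sum, averaging over hidden cards (ties count as wins: ASSUMPTION)."""
    hidden_c, hidden_o = 2 - len(chosen), 2 - len(other)
    wins = total = 0
    for extra in product(CARDS, repeat=hidden_c + hidden_o):
        c = sum(chosen) + sum(extra[:hidden_c])
        o = sum(other) + sum(extra[hidden_c:])
        wins += c >= o
        total += 1
    return wins / total

@lru_cache(maxsize=None)
def action_values(row_a, row_b):
    """Action values Q for a state given as two tuples of revealed cards."""
    cost = total_cost(row_a, row_b)
    q = {}
    for name, chosen, other in (("guess A", row_a, row_b),
                                ("guess B", row_b, row_a)):
        p = p_win(chosen, other)
        q[name] = p * WIN + (1 - p) * LOSS - cost
    # Value of sampling: expected best action value over the 10 next states.
    if len(row_a) < 2:
        q["sample A"] = sum(0.1 * max(action_values(row_a + (c,), row_b).values())
                            for c in CARDS)
    if len(row_b) < 2:
        q["sample B"] = sum(0.1 * max(action_values(row_a, row_b + (c,)).values())
                            for c in CARDS)
    return q

def best_action(row_a, row_b):
    """Action with the highest value in the given state."""
    q = action_values(row_a, row_b)
    return max(q, key=q.get)

terminal_q = action_values((3, 4), (5, 6))   # all four cards revealed
stage1_q = action_values((10,), ())          # one card revealed in row A
```

Memoizing `action_values` means each state is evaluated once, which is what makes working backwards from the terminal states tractable for a task of this size.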

### Pavlovian Approach-Avoidance Model

Full details of the approach-avoidance decision model are given in reference [27]. In brief, subjects’ expected utilities for choosing the safe option (*U*_{certain}) and risky option (*U*_{gamble}) were fitted using an established parametric decision model based on Prospect Theory [41]. The probability of choosing to gamble was then modelled by modifying the softmax function:
This causes a value-independent change in the probability of gambling, with choice probabilities bounded within (*β*, 1) if *β* is greater than zero, and within (0, 1 + *β*) if *β* is less than zero. *β* is fit separately for gain trials and loss trials, yielding two parameters, *β*_{gain} and *β*_{loss}.
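One way to realize such a bounded choice rule is sketched below. The functional form is an assumption chosen to be consistent with the bounds described here, not a transcription of the model in [27]: a standard logistic softmax is rescaled into (β, 1) for positive β and, so that the upper bound remains a valid probability, into (0, 1 + β) for negative β. The inverse temperature `mu` is a hypothetical parameter name.

```python
import math

def p_gamble(u_gamble, u_certain, beta, mu=1.0):
    """Probability of gambling under an approach-avoidance modified softmax.

    Assumed form: a logistic function of the utility difference, compressed
    so that positive beta raises the floor and negative beta lowers the
    ceiling, independently of the option values.
    """
    if not -1.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between -1 and 1")
    p_softmax = 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))
    if beta >= 0:
        return beta + (1.0 - beta) * p_softmax
    return (1.0 + beta) * p_softmax

# With equal utilities, the standard softmax is indifferent (0.5);
# beta then shifts the probability of gambling up or down.
p_neutral = p_gamble(0.0, 0.0, beta=0.0)
p_approach = p_gamble(0.0, 0.0, beta=0.2)
p_avoid = p_gamble(0.0, 0.0, beta=-0.2)
```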

### Data Deposited in the Dryad Repository

Raw data, along with MATLAB scripts for reproducing all figures shown in the paper and code for the dynamic programming model, are freely available for download from http://dx.doi.org/10.5061/dryad.nb41c [31].

## Supporting Information

### S1 Fig. Probability of guess in MULTIPLY condition for MULTIPLY BIG (left) and MULTIPLY SMALL (right) conditions.

Data are the same as in main Fig 2B, but are replotted with AA and AB trials on top of each other to facilitate comparison with dynamic programming model predictions.

https://doi.org/10.1371/journal.pbio.2000638.s001

(PDF)

### S2 Fig. Figure layout as per main Fig 2, but for ADD conditions.

Note that there is now no difference between normative model predictions for AA vs. AB trials.

https://doi.org/10.1371/journal.pbio.2000638.s002

(PDF)

### S3 Fig. Figure layout as per main Fig 2, but for single card trials.

Note that in single card trials, the same card value carries different amounts of information between the two conditions (hence part A is split into two plots). As such, there is no direct equivalent for the positive evidence approach bias.

https://doi.org/10.1371/journal.pbio.2000638.s003

(PDF)

### S4 Fig. Figure layout as per main Fig 3, but for multiply trials.

https://doi.org/10.1371/journal.pbio.2000638.s004

(PDF)

### S5 Fig. Figure layout as per main Fig 3, but for single card trials.

https://doi.org/10.1371/journal.pbio.2000638.s005

(PDF)

### S6 Fig. Figure layout as per main Fig 4, but for add trials.

Note that there is no relative advantage for sampling row A over row B in the normative model (part A).

https://doi.org/10.1371/journal.pbio.2000638.s006

(PDF)

### S7 Fig. Figure layout as per main Fig 4, but for single card trials.

Note that in single card trials, the same card value carries different amounts of information between the two conditions (hence part A is split into two plots). As such, there is no direct equivalent for the sampling the favorite bias.

https://doi.org/10.1371/journal.pbio.2000638.s007

(PDF)

### S8 Fig. The relative expected value of guessing (and choosing the best option) minus the expected value of sampling further information, at Task Stage 2 on AB trials, derived using dynamic programming.

(A) ADD conditions, where predictions are identical for ADD BIG and ADD SMALL. (B) MULTIPLY conditions, where predictions are identical for MULTIPLY BIG and MULTIPLY SMALL. (C) SINGLE CARD conditions. Top row = FIND THE BIGGEST condition, bottom row = FIND THE SMALLEST condition.

https://doi.org/10.1371/journal.pbio.2000638.s008

(PDF)

### S9 Fig. Behavioral predictions from the reduced (4-parameter) model of subject behavior, with best-fit parameters.

Data is plotted as in main Fig 5. (A) Predicted probability of guessing at stage 1 for AA trials and (B) AB trials in ‘add’ condition. (C) Predicted ‘positive evidence approach’ bias. (D) Predicted probability of choosing row A, having chosen to guess, at stage 1, for ‘add big’ and (E) ‘add small’ conditions. (F) Predicted ‘rejecting unsampled options’ bias. (G) Predicted probability of sampling row A in ‘big’ and in (H) ‘small’ conditions. (I) Predicted ‘sampling the favorite’ bias.

https://doi.org/10.1371/journal.pbio.2000638.s009

(PDF)

### S10 Fig. Data plotted as in main Fig 7A and 7B, but with an additional ‘subjective sampling cost’ of 5 points/turn added to the normative model.

The mean of the distribution of the number of cards sampled relative to the model (left panel) now lies close to 0.

https://doi.org/10.1371/journal.pbio.2000638.s010

(PDF)

### S11 Fig. Information seeking is a stable trait irrespective of condition ordering.

Along the bottom of the matrix is the condition experienced in the first 11 trials of gameplay 1 (1 = FIND BIGGEST, 2 = FIND SMALLEST, 3 = ADD BIG, 4 = ADD SMALL, 5 = MULTIPLY BIG, 6 = MULTIPLY SMALL), whilst along the left of the matrix is the condition experienced in the first 11 trials of gameplay 2. The color of the heatmap reflects the correlation coefficient between information sampling (relative to the optimal model) across the two gameplays (as in main Fig 7B).

https://doi.org/10.1371/journal.pbio.2000638.s011

(PDF)

### S1 Table. Maximum likelihood estimates of parametric model of subject behavior.

https://doi.org/10.1371/journal.pbio.2000638.s012

(DOCX)

### S2 Table. Sum of square errors for model fitting on left-out data during 10-fold nested cross-validation (mean +/- S.E.M.).

https://doi.org/10.1371/journal.pbio.2000638.s013

(DOCX)

### S1 Text. Instructions provided to subjects when performing the task.

https://doi.org/10.1371/journal.pbio.2000638.s014

(DOCX)

### S1 Movie. Example movie of subject performing several trials of the task, starting from the home screen.

(Readers can also download the app at http://www.thegreatbrainexperiment.com).

https://doi.org/10.1371/journal.pbio.2000638.s015

(MP4)

## Acknowledgments

We thank all members of The Great Brain Experiment team for support and interaction throughout the project, in particular Neil Millstone (White Bat Games) for designing and coding the app and Peter Zeidman (Wellcome Trust Centre for Neuroimaging) for implementing server-side data collection and curation.

## References

- 1. Payne JW (1976) Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance 16: 366–387.
- 2. Wilson RC, Geana A, White JM, Ludvig EA, Cohen JD (2014) Humans use directed and random exploration to solve the explore–exploit dilemma. Journal of Experimental Psychology: General 143: 2074–2081.
- 3. Wald A (1945) Sequential Tests of Statistical Hypotheses. The Annals of Mathematical Statistics 16: 117–186.
- 4. Ditterich J (2010) A Comparison between Mechanisms of Multi-Alternative Perceptual Decision Making: Ability to Explain Human Behavior, Predictions for Neurophysiology, and Relationship with Decision Theory. Frontiers in Neuroscience 4:184.
- 5. Kahneman D, Tversky A (1984) Choices, values, and frames. American Psychologist 39: 341–350.
- 6. Krajbich I, Armel C, Rangel A (2010) Visual fixations and the computation and comparison of value in simple choice. Nature Neuroscience 13: 1292–1298. pmid:20835253
- 7. Nelson JD (2005) Finding Useful Questions: On Bayesian Diagnosticity, Probability, Impact, and Information Gain. Psychological Review 112: 979–999. pmid:16262476
- 8. Markant DB, Settles B, Gureckis TM (2015) Self-Directed Learning Favors Local, Rather Than Global, Uncertainty. Cognitive Science.
- 9. Sanborn AN, Griffiths TL, Navarro DJ (2010) Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review 117: 1144–1167. pmid:21038975
- 10. Nickerson RS (1998) Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology 2: 175–220.
- 11. Wason PC (1960) On the failure to eliminate hypotheses in a conceptual task. Quarterly Journal of Experimental Psychology 12: 129–140.
- 12. Wason PC (1968) Reasoning about a rule. Quarterly Journal of Experimental Psychology 20: 273–281. pmid:5683766
- 13. Klayman J, Ha Y-w (1987) Confirmation, disconfirmation, and information in hypothesis testing. Psychological Review 94: 211–228.
- 14. Hendrickson AT, Navarro DJ, Perfors A (2016) Sensitivity to hypothesis size during information search. Decision 3: 62–80.
- 15. Tweney RD, Doherty ME, Worner WJ, Pliske DB, Mynatt CR, et al. (1980) Strategies of rule discovery in an inference task. Quarterly Journal of Experimental Psychology 32: 109–123.
- 16. Wyckoff LB Jr. (1952) The role of observing responses in discrimination learning. Part I. Psychological Review 59: 431–442.
- 17. Bromberg-Martin ES, Hikosaka O (2009) Midbrain Dopamine Neurons Signal Preference for Advance Information about Upcoming Rewards. Neuron 63: 119–126. pmid:19607797
- 18. Eliaz K, Schotter A (2007) Experimental Testing of Intrinsic Preferences for NonInstrumental Information. American Economic Review 97: 166–169.
- 19. Kreps DM, Porteus EL (1978) Temporal Resolution of Uncertainty and Dynamic Choice Theory. Econometrica 46: 185–220.
- 20. Williams DR, Williams H (1969) Auto-maintenance in the pigeon: sustained pecking despite contingent non-reinforcement1. Journal of the Experimental Analysis of Behavior 12: 511–520. pmid:16811370
- 21. Hershberger WA (1986) An approach through the looking-glass. Animal Learning & Behavior 14: 443–451.
- 22. Guitart-Masip M, Huys QJM, Fuentemilla L, Dayan P, Duzel E, et al. (2012) Go and no-go learning in reward and punishment: Interactions between affect and effect. NeuroImage 62: 154–166. pmid:22548809
- 23. Anderson BA, Laurent PA, Yantis S (2011) Value-driven attentional capture. Proceedings of the National Academy of Sciences 108: 10367–10371.
- 24. Gottlieb J, Hayhoe M, Hikosaka O, Rangel A (2014) Attention, Reward, and Information Seeking. Journal of Neuroscience 34: 15497–15504. pmid:25392517
- 25. Brown HR, Zeidman P, Smittenaar P, Adams RA, McNab F, et al. (2014) Crowdsourcing for Cognitive Science–The Utility of Smartphones. PLoS ONE 9: e100662. pmid:25025865
- 26.
Sutton R, Barto A (1998) Reinforcement Learning: MIT Press.
- 27. Rutledge RB, Skandali N, Dayan P, Dolan RJ (2015) Dopaminergic Modulation of Decision Making and Subjective Well-Being. Journal of Neuroscience 35: 9811–9822. pmid:26156984
- 28.
Kline RB (2013) Beyond Significance Testing: Statistics Reform in the Behavioral Sciences: American Psychological Association.
- 29. Nelson JD (2009) Naïve optimality: Subjects' heuristics can be better motivated than experimenters' optimal models. Behavioral and Brain Sciences 32: 94–95.
- 30.
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. Cambridge, MA: MIT Press.
- 31. Hunt LT, Rutledge RB, Malalasekera WMN, Kennerley SW, Dolan RJ (2016) Data from: Approach-induced biases in human information sampling. Dryad Digital Repository. Openly available via: http://dx.doi.org/10.5061/dryad.nb41c.
- 32. Huys QJ, Cools R, Golzer M, Friedel E, Heinz A, et al. (2011) Disentangling the roles of approach, activation and valence in instrumental and pavlovian responding. PLoS Comput Biol 7: e1002028. pmid:21556131
- 33. Rutledge RB, Skandali N, Dayan P, Dolan RJ (2014) A computational and neural model of momentary subjective well-being. Proceedings of the National Academy of Sciences 111: 12252–12257.
- 34. Pirolli P, Card S (1999) Information foraging. Psychological Review 106: 643–675.
- 35. Freidin E, Kacelnik A (2011) Rational Choice, Context Dependence, and the Value of Information in European Starlings (Sturnus vulgaris). Science 334: 1000–1002. pmid:22096203
- 36. Doll BB, Hutchison KE, Frank MJ (2011) Dopaminergic Genes Predict Individual Differences in Susceptibility to Confirmation Bias. Journal of Neuroscience 31: 6188–6198. pmid:21508242
- 37.
Nelson JD (2009) Confirmation Bias. In: Kattan MW, editor. Encyclopaedia of medical decision making. Los Angeles: Sage. pp. 167–171.
- 38. Klayman J (1995) Varieties of Confirmation Bias. 32: 385–418.
- 39. Hikosaka O, Kim HF, Yasuda M, Yamamoto S (2014) Basal Ganglia Circuits for Reward Value–Guided Behavior. Annual Review of Neuroscience 37: 289–306. pmid:25032497
- 40. Hunt LT, Dolan RJ, Behrens TE (2014) Hierarchical competitions subserving multi-attribute choice. Nat Neurosci 17: 1613–1622. pmid:25306549
- 41. Sokol-Hessner P, Hsu M, Curley NG, Delgado MR, Camerer CF, et al. (2009) Thinking like a trader selectively reduces individuals' loss aversion. Proceedings of the National Academy of Sciences 106: 5035–5040.