Approximate planning in spatial search

doi:10.1371/journal.pcbi.1012582

Fig 1.

The decision problem facing an agent.

The agent’s goal is to take actions that maximize long-run rewards. The agent can solve this problem by planning a path through a decision tree where nodes represent possible future states of the world, and edges represent actions the agent could take. The agent constructs the tree by iteratively considering possible future states, and plans by choosing the sequence of actions that minimizes costs, while taking into account the probabilities of success. Our task captures this process in a spatial setting, by mapping costs to steps taken to make an observation, and probabilities of success to the relative size of an observed area.
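
The trade-off in this caption can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the exit is assumed equally likely to lie under any hidden tile, cost is counted only as steps walked until the exit's room is revealed, and the names `expected_steps`, `best_order`, `steps_between`, and `sizes` are hypothetical.

```python
from itertools import permutations


def expected_steps(order, steps_between, sizes):
    """Expected steps walked before the exit is revealed, for one visiting order.

    steps_between[(a, b)]: hypothetical step cost of moving from a to b ("S" is the start).
    sizes[r]: number of hidden tiles in room r; the exit is assumed equally
    likely to be under any hidden tile, so P(exit in r) = sizes[r] / total.
    """
    total_tiles = sum(sizes.values())
    expected, walked, here = 0.0, 0, "S"
    for room in order:
        walked += steps_between[(here, room)]              # cost paid to make this observation
        expected += (sizes[room] / total_tiles) * walked   # exit found here with this probability
        here = room
    return expected


def best_order(rooms, steps_between, sizes):
    """Brute-force search over all orders; only feasible for a handful of rooms."""
    return min(permutations(rooms), key=lambda o: expected_steps(o, steps_between, sizes))


# Toy maze (made-up numbers): a small nearby room A and a large distant room B.
sizes = {"A": 2, "B": 6}
steps_between = {("S", "A"): 1, ("S", "B"): 3, ("A", "B"): 3, ("B", "A"): 3}
print(expected_steps(("A", "B"), steps_between, sizes))  # 3.25
print(expected_steps(("B", "A"), steps_between, sizes))  # 3.75
print(best_order(["A", "B"], steps_between, sizes))      # ('A', 'B')
```

In this toy case the cheap detour to the small room pays off, even though the large room is more likely to hide the exit.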

Fig 2.

An example path in the Maze Search Task (MST).

Black tiles are not-yet-observed areas, which hide an exit (red square). This maze has six ‘rooms’, groups of black tiles that are revealed all at once. Revealing tiles can be done in any order, but players are incentivized to plan their path so as to reach a hidden exit in fewer steps.

Fig 3.

Decision tree for a maze with four rooms (hidden tiles that are revealed together).

The tree abstracts away from specific moves like ‘up’ and ‘left’ and considers more general actions like which area to uncover next. The root of the decision tree corresponds to the player’s starting location. The four nodes accessible from the root indicate the possible observations that can be made next, followed by the observations that can follow each of those, and so on.
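
The structure described above can be sketched as a nested dictionary. This is a simplified illustration, assuming every not-yet-uncovered room can be chosen at every node (a real maze layout may restrict which rooms are reachable); `build_tree`, `count_nodes`, and `count_leaves` are hypothetical helper names.

```python
def build_tree(remaining):
    """Nested dict: each key is a room to uncover next, each value the subtree of later choices."""
    return {
        room: build_tree(tuple(r for r in remaining if r != room))
        for room in remaining
    }


def count_nodes(tree):
    """Number of nodes below the root (the root itself is the starting location)."""
    return sum(1 + count_nodes(subtree) for subtree in tree.values())


def count_leaves(tree):
    """Number of complete orderings, i.e. root-to-leaf paths."""
    return sum(count_leaves(subtree) if subtree else 1 for subtree in tree.values())


tree = build_tree(("room1", "room2", "room3", "room4"))
print(len(tree))           # 4 observations accessible from the root
print(count_leaves(tree))  # 24 complete orderings (4!)
print(count_nodes(tree))   # 64 nodes below the root (4 + 12 + 24 + 24)
```

Under this assumption the tree grows factorially with the number of rooms, which is one reason exhaustive planning becomes costly in larger mazes.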

Fig 4.

Examples of maze designs illustrating differences between models’ predictions.

Here “S” indicates the starting position (the root of the decision tree). The initial observations accessible from the starting location are numbered 1 and 2. Hatching indicates cells that will be revealed by traveling in each direction. A. At the initial decision, the heuristics cannot distinguish between the available choices, as both nodes can be reached in 2 steps and each reveals 6 cells. B. The Expected Utility model is indifferent between the two directions, as it trades the probability of success in each direction against the distance cost. In contrast, the Discounted Utility model and the Steps heuristic can both predict the human preference for visiting the closer room. C. In this example, all models except the Discounted Utility model are indifferent between the two directions. The Discounted Utility model can predict the human preference for going right (node 2).
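
The caption does not give the model equations, so the following is only an illustrative sketch of how discounting can break a tie like the one in panel B. It assumes a generic form in which the cost of each later observation is weighted by the probability that search is still ongoing and by a per-observation discount factor `gamma`; this form, the function `plan_value`, and all numbers are assumptions rather than the paper's definitions.

```python
def plan_value(order, steps_between, p_exit, gamma=1.0):
    """Negative expected step cost of uncovering rooms in `order`.

    p_exit[r] is the assumed prior probability that room r hides the exit.
    gamma = 1.0 gives plain expected cost; gamma < 1.0 down-weights the cost
    of later observations (one factor of gamma per observation already made).
    """
    value, here = 0.0, "S"
    p_still_searching, discount = 1.0, 1.0
    for room in order:
        value -= discount * p_still_searching * steps_between[(here, room)]
        p_still_searching -= p_exit[room]  # exit found in this room with prob. p_exit[room]
        discount *= gamma
        here = room
    return value


# Made-up two-room maze: room 1 is close but small, room 2 is far but large.
p_exit = {1: 0.25, 2: 0.75}
steps_between = {("S", 1): 2, ("S", 2): 4, (1, 2): 4, (2, 1): 4}

for gamma in (1.0, 0.5):
    close_first = plan_value((1, 2), steps_between, p_exit, gamma)
    far_first = plan_value((2, 1), steps_between, p_exit, gamma)
    print(gamma, close_first, far_first)
# gamma = 1.0: both orders are worth -5.0 (indifference)
# gamma = 0.5: closer room first is -3.5, farther room first is -4.5
```

With no discounting the two plans tie, while any `gamma` below 1 in this sketch favors visiting the closer room first, which is the qualitative pattern the caption attributes to the Discounted Utility model.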

Fig 5.

Experiment 1 results.

A. Model performances, measured as the total log likelihood of each model across all five folds. Shorter bars indicate better fit to human behavior. B. Bootstrapped correlations of choice probabilities aggregated across participants with each model’s predictions. Error bars indicate 95% confidence intervals.
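
For readers unfamiliar with the two measures in this caption, here is a minimal sketch of how such quantities are commonly computed. It is not the authors' analysis code: the percentile bootstrap, the resample count, and the names `total_log_likelihood`, `pearson`, and `bootstrap_ci` are assumptions, and the input numbers are made up.

```python
import math
import random


def total_log_likelihood(fold_probs):
    """Sum of log P(observed choice) over all held-out choices in all folds."""
    return sum(math.log(p) for fold in fold_probs for p in fold)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(xs, ys, n_boot=2000, seed=0):
    """Percentile 95% interval for the correlation, resampling items with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    rs = []
    while len(rs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if len(set(bx)) > 1 and len(set(by)) > 1:  # skip resamples with zero variance
            rs.append(pearson(bx, by))
    rs.sort()
    return rs[int(0.025 * n_boot)], rs[int(0.975 * n_boot) - 1]


# Made-up probabilities a model assigned to the choices people actually made, per fold.
ll = total_log_likelihood([[0.5, 0.5], [0.25]])

# Made-up aggregated human choice proportions and model predictions per decision.
human = [0.1, 0.4, 0.5, 0.8, 0.9]
model = [0.2, 0.3, 0.6, 0.7, 0.95]
lower, upper = bootstrap_ci(human, model)
```

A total log likelihood closer to zero means the model assigned higher probability to the observed choices, which is why shorter bars indicate a better fit.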

Fig 6.

Experiment 2 results.

A. Model performances, measured as the total log likelihood of each model across all five folds. Shorter bars indicate better fit to human behavior. B. Bootstrapped correlations of choices aggregated across participants with each model’s predictions. Error bars indicate 95% confidence intervals.
