Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model

doi:10.1371/journal.pcbi.1004523

Fig 1.

The transitive inference procedure, as implemented for rhesus macaques responding using eye tracking.

(Top) Each session used a novel seven-item list, like the one depicted here. However, subjects were never presented with the entire list. (Middle) Each trial began with a fixation point. Following fixation, two stimuli appeared, and subjects received feedback upon a saccade to either stimulus. If the stimulus appearing earlier in the list was selected, a reward was delivered; if the other stimulus was selected, the animal was subjected to a timeout. Either outcome constituted the completion of a trial. A trial in which the subject did not respond (e.g. fixating but failing to saccade to a stimulus) was deemed incomplete and did not count toward the set of trials completed. All dashed lines and arrows represent eye movements and fixation areas, and did not appear on the screen. (Bottom) Subjects were initially trained only on the six adjacent pairs. Following adjacent pair training, subjects were then tested on all twenty-one pairs. These varied in their ordinal distance (with the pair AG being the largest). Additionally, six pairs were considered the critical transfer pairs (shaded in gray) because they were neither adjacent nor did they include the terminal items. Consequently, these are the pairs that provide the strongest test of transitive inference and symbolic distance effects.
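The pair counts in this caption can be checked with a short enumeration. The sketch below assumes the seven items are labeled A through G in list order; the function name `classify_pairs` is hypothetical and used only for illustration.

```python
from itertools import combinations

ITEMS = "ABCDEFG"  # assumed labels for one seven-item list, earliest item first


def classify_pairs(items=ITEMS):
    """Return (all pairs, adjacent pairs, critical transfer pairs)."""
    rank = {s: i for i, s in enumerate(items)}
    terminal = {items[0], items[-1]}
    all_pairs = [a + b for a, b in combinations(items, 2)]
    # Adjacent pairs have an ordinal distance of exactly one.
    adjacent = [p for p in all_pairs if rank[p[1]] - rank[p[0]] == 1]
    # Critical pairs are non-adjacent and contain no terminal item.
    critical = [
        p for p in all_pairs
        if rank[p[1]] - rank[p[0]] > 1 and not terminal & set(p)
    ]
    return all_pairs, adjacent, critical


all_pairs, adjacent, critical = classify_pairs()
print(len(all_pairs), len(adjacent), len(critical))  # 21 6 6
print(critical)  # ['BD', 'BE', 'BF', 'CE', 'CF', 'DF']
```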

Fig 2.

Outline of the betasort algorithm over the course of one trial.

The algorithm’s logic is presented in both a schematic (left) and detailed (right) outline. Rectangles refer to operations, diamonds to logical branches, and octagons to loops that iterate over sets of items. Four phases are depicted: the choice policy (red), the relaxation of the contents of memory (green), the processing of explicit feedback (blue), and implicit inference (yellow).

Fig 3.

Visualization of Betasort’s adjustment of the beta distributions during a single trial in which an incorrect response is given.

For this example, the trial stimuli are the pair CE. The initial conditions show the beta distributions of a well-learned list, with means marked by a vertical line. During the choice phase, a value is drawn randomly from the beta distributions of each trial stimulus, and the stimulus with the larger random value is chosen. In this example, the algorithm incorrectly selects stimulus E, an unlikely but possible event. Immediately following the choice, but before feedback is taken into account, the positions of all stimuli are relaxed (using ξ = 0.6 for this example). This has the effect of making all density functions slightly more uniform, and reduces the influence of older trials in favor of more recent ones. During explicit feedback, the algorithm increases L_E by one, while also increasing U_C by one. This increases the odds of future selections of stimulus C, while decreasing the odds of future selections of stimulus E. Next, the algorithm makes implicit inferences about the positions of all known stimuli that did not appear during the trial. Because stimulus D falls between C and E, its count of successes and failures is consolidated and its position does not change. Stimuli A and B are positioned above the trial stimuli, and so are shifted upward. Stimuli F and G are shifted downward.
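The four phases described in this caption can be sketched in a few lines of Python. This is a simplified illustration, not the published implementation. Several details are assumptions that the caption does not state: each stimulus is drawn from Beta(U + 1, L + 1), a stimulus's position is taken to be U / (U + L), relaxation multiplies both counts by ξ, consolidation adds the current position to U and its complement to L, and a correct response consolidates both trial stimuli. The noise parameter τ is omitted, and the names `position` and `betasort_trial` are hypothetical.

```python
import random


def position(U, L, s):
    """Assumed position estimate for stimulus s: U / (U + L)."""
    total = U[s] + L[s]
    return U[s] / total if total > 0 else 0.5


def betasort_trial(U, L, pair, xi=0.6, forced_choice=None, rng=random):
    """Update U and L in place for one trial; pair = (correct, incorrect)."""
    correct, incorrect = pair
    # Choice: one random draw per trial stimulus; the larger draw wins.
    if forced_choice is None:
        draws = {s: rng.betavariate(U[s] + 1, L[s] + 1) for s in pair}
        choice = max(pair, key=draws.get)
    else:
        choice = forced_choice
    # Relaxation (assumed form): shrinking the counts flattens every density.
    for s in U:
        U[s] *= xi
        L[s] *= xi
    # Explicit feedback.
    if choice == correct:
        # Assumption: a correct response consolidates both trial stimuli.
        for s in pair:
            v = position(U, L, s)
            U[s] += v
            L[s] += 1 - v
    else:
        L[choice] += 1   # the wrongly chosen stimulus moves down
        U[correct] += 1  # the stimulus that should have been chosen moves up
    # Implicit inference for every stimulus that was not shown.
    hi = max(position(U, L, s) for s in pair)
    lo = min(position(U, L, s) for s in pair)
    for s in U:
        if s in pair:
            continue
        v = position(U, L, s)
        if v > hi:
            U[s] += 1        # above both trial stimuli: shift upward
        elif v < lo:
            L[s] += 1        # below both trial stimuli: shift downward
        else:
            U[s] += v        # in between: consolidate, position unchanged
            L[s] += 1 - v
    return choice


# The caption's example: pair CE, with the incorrect choice E forced.
items = "ABCDEFG"
U = {s: float(7 - i) for i, s in enumerate(items)}  # illustrative counts
L = {s: float(i + 1) for i, s in enumerate(items)}
betasort_trial(U, L, ("C", "E"), xi=0.6, forced_choice="E")
print({s: round(position(U, L, s), 3) for s in items})
```

With these illustrative starting counts, D keeps its position while A and B rise and F and G fall, which matches the qualitative pattern the caption describes.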

Fig 4.

Monkey and human performance on non-terminal stimulus pairs.

Trial zero is the point of transfer from adjacent-pair to all-pair trials. (A) Smoothed response accuracy for three rhesus macaques, divided into pairs with ordinal distance one (BC, CD, DE, and EF; red), two (BD, CE, and DF; orange), three (BE and CF; green), and four (BF; blue). Subjects show an immediate distance effect from the first transfer trial. (B) Simulated performance using betasort, using each monkey’s maximum-likelihood model parameters for each session. Hypothetical performance is plotted for all distances at all times, to show how the algorithm would respond had it been presented with trials of each type. Like the monkeys, the algorithm displays an immediate distance effect. (C) Simulated performance using betaQ, with maximum-likelihood parameters. Although a small distance effect is observed, performance remains close to chance throughout training. (D) Simulated performance using Q/softmax. Performance remains strictly at chance throughout adjacent-pair training, and only begins to display a distance effect after the onset of the all-pairs trials. (E) Performance of human participants given 36 trials of adjacent-pair training, followed by 90 trials of non-adjacent pairs only, and finally 42 trials of all pairs. Unlike the monkeys, participants rapidly acquire the adjacent pairs, and show only a mild distance effect at transfer. (F-H) Simulations based on human performance using the three algorithms, analogous to panels B through D. As in the monkey case, Q/softmax displays no distance effect at all until non-adjacent pairs are presented.

Fig 5.

Estimated response accuracy on the first transfer trial for each of the 21 possible pairs.

Estimates compare performance by subjects (blue lines) to those generated by simulations using each algorithm. Those pairs with a gray backdrop are the critical transfer pairs that are not expected to be subject to the terminal item effect. Shaded regions around each point/line represent the 95% confidence interval for the mean, determined using bootstrapping. (A) Monkey performance (green) compared to the performance of the betasort algorithm (red), given each session’s maximum likelihood parameter estimates. An overall distance effect is reliably observed from the simulation. (B) Monkey performance (green) compared to the betaQ algorithm (blue), given maximum likelihood parameters. Although a distance effect is evident among critical pairs, betaQ fails to reach appropriate levels of accuracy. (C) Monkey performance (green) compared to the Q/softmax algorithm (brown), given maximum likelihood parameters. Apart from a robust terminal item effect, the algorithm’s responding is strictly at chance, including all critical transfer pairs. (D-F) Human performance compared to the three algorithms, analogous to panels A through C. Although none of the algorithms precisely resembles the participants, the betasort algorithm comes closest, yielding a distance effect on critical transfer pairs.
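A bootstrapped 95% confidence interval for a mean can be computed as in the sketch below. The caption does not say which bootstrap variant was used, so the percentile method here is an assumption, as are the function name `bootstrap_ci_mean`, the number of resamples, and the example data.

```python
import random


def bootstrap_ci_mean(values, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    tail = (1 - level) / 2
    lo = means[int(tail * n_boot)]
    hi = means[int((1 - tail) * n_boot) - 1]
    return lo, hi


# Illustrative data: 1 = correct, 0 = incorrect on a first transfer trial.
outcomes = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 2
print(bootstrap_ci_mean(outcomes))
```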

Fig 6.

Simulated response accuracy for all stimulus pairs over the course of 200 trials of adjacent-pair training.

Performance was modeled using betasort (red), betaQ (blue), and Q/softmax (brown). Critical transfer pairs are indicated with a gray shaded background. Both betasort and betaQ used the same parameters (τ = 0.05, ξ = 0.95), while Q/softmax used the parameters (α = 0.03, β = 10). Betasort shows more pronounced transfer in the critical pairs, whereas betaQ shows a more pronounced terminal item effect. Q/softmax rapidly acquires the terminal items, but remains strictly at chance for all non-terminal pairs.
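For readers unfamiliar with the Q/softmax parameters, the sketch below shows a standard delta-rule value update with learning rate α and a two-option softmax choice rule with inverse temperature β. It is a generic textbook formulation and may differ in detail from the one used in the simulations; the function names are hypothetical.

```python
import math


def softmax_choice_prob(q_a, q_b, beta=10.0):
    """Probability of choosing option a over option b under a softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))


def q_update(q, reward, alpha=0.03):
    """Delta-rule update of the chosen stimulus's value toward the reward."""
    return q + alpha * (reward - q)


# Two stimuli with equal values are chosen with equal probability.
print(softmax_choice_prob(0.5, 0.5))  # 0.5
# One rewarded choice nudges a value of 0.5 slightly upward.
print(q_update(0.5, reward=1.0))
```

Because only the chosen stimulus is updated in this formulation, a non-terminal item that is rewarded in one adjacent pair and not in the other has no consistent pressure on its value, which is in line with the chance-level performance the caption reports for non-terminal pairs.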

Fig 7.

Visualization of the contents of memory for the three algorithms under simulated conditions.

Three phases were included for each algorithm: 200 trials of adjacent pairs only, followed by 200 trials of all pairs, and then followed by 200 massed trials of only the pair FG. (A) Expected value for each stimulus under the betasort algorithm, given parameters of τ = 0.05 and ξ = 0.95. Not only is learning during adjacent pair training faster, but massed trials of FG do not disrupt the algorithm’s representation of the order, because occasional erroneous selection of stimulus G increases the value of all stimuli, not just stimulus F. (B) Expected value for each stimulus under the betaQ algorithm, given parameters of τ = 0.05 and ξ = 0.95. Although the algorithm derives an ordered inference by the time the procedure switches to all pairs, that order is not preserved during the massed trials of FG, as a result of the lack of inferential updating. (C) Expected value Q for each stimulus under the Q/softmax algorithm, given parameters of α = 0.03 and β = 10. Values for non-terminal items remain fixed at 50% throughout adjacent pair training, and only begin to diverge when all pairs are presented in a uniformly intermixed fashion. Subsequent massed training on the pair FG disrupts the ordered representation because rewards drive the value of stimulus F up (and the value of stimulus G down) while the other stimuli remain static.
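The three-phase trial sequence described here could be generated as in the sketch below. Sampling pairs uniformly at random within the first two phases is an assumption, since the caption specifies only the phase lengths and the pair sets; the function name and seed are hypothetical.

```python
import random
from itertools import combinations


def three_phase_schedule(items="ABCDEFG", n_per_phase=200, seed=0):
    """Return a list of (earlier, later) stimulus pairs, one per trial."""
    rng = random.Random(seed)
    all_pairs = list(combinations(items, 2))
    adjacent = list(zip(items, items[1:]))
    massed = (items[-2], items[-1])  # the pair FG for a seven-item list
    schedule = [rng.choice(adjacent) for _ in range(n_per_phase)]
    schedule += [rng.choice(all_pairs) for _ in range(n_per_phase)]
    schedule += [massed] * n_per_phase
    return schedule


schedule = three_phase_schedule()
print(len(schedule), schedule[0], schedule[-1])
```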

Table 1.

Median runtime of 1000 trials of betasort, estimated from 1000 simulations.
