Adaptive algorithms for shaping behavior

doi:10.1371/journal.pcbi.1013454

Fig 3

A: An overview of the POMCP teacher, which alternates between two steps: inferring the student’s q values, innate bias and learning rate from the transcript, and planning with a Monte Carlo tree search.

B: The adaptive heuristic (ADP), which employs a simple decision rule to stay, increment or decrement the current difficulty based on the estimated success rate (computed using an exponential moving average over past transcripts).

C: POMCP and ADP are comparable and significantly outperform other algorithms [39] when the task is non-trivial (low ε), including when INC fails (). Here N = 10. Note that planning using POMCP is intractable when . Bar plot means are estimated from 10 repeats.

D,E: POMCP and ADP adaptively alternate between difficulty levels, thereby preventing catastrophic extinction. Note the drop in difficulty levels after significant extinction in both cases. Here .
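The stay/increment/decrement rule described in panel B can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `AdaptiveTeacher`, the smoothing factor `alpha`, the thresholds `upper` and `lower`, the initial estimate, and the number of difficulty levels are all hypothetical values chosen for illustration, since the caption does not specify them.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveTeacher:
    """Hypothetical stay/increment/decrement rule driven by an EMA of success."""

    n_levels: int = 10         # number of difficulty levels (assumed, not from the source)
    alpha: float = 0.2         # EMA smoothing factor (assumed)
    upper: float = 0.7         # raise difficulty above this estimate (assumed)
    lower: float = 0.3         # lower difficulty below this estimate (assumed)
    level: int = 1             # current difficulty, 1 is the easiest
    success_rate: float = 0.5  # initial success-rate estimate (assumed)

    def update(self, success: bool) -> int:
        """Fold one trial outcome into the estimate and return the next level."""
        # Exponential moving average of the binary trial outcome.
        self.success_rate += self.alpha * (float(success) - self.success_rate)
        if self.success_rate > self.upper and self.level < self.n_levels:
            self.level += 1  # student is doing well: increment difficulty
        elif self.success_rate < self.lower and self.level > 1:
            self.level -= 1  # student is struggling: decrement difficulty
        return self.level    # otherwise stay at the current level


if __name__ == "__main__":
    teacher = AdaptiveTeacher()
    outcomes = [True] * 3 + [False] * 5
    print([teacher.update(o) for o in outcomes])
```

With these assumed settings, a run of successes pushes the estimate above `upper` and raises the difficulty, while a run of failures eventually pulls it below `lower` and lowers the difficulty again, which mirrors the drop in difficulty after extinction noted for panels D and E.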

doi: https://doi.org/10.1371/journal.pcbi.1013454.g003