Google Research

Tracking what matters: a decision-variable account of human behavior in bandit tasks

Annual Meeting of the Cognitive Science Society (CogSci 2021), 2021 (to appear)


We study human learning and decision-making in tasks with probabilistic rewards. Recent studies of a two-armed bandit task find that a modification of classical Q-learning algorithms with context-dependent learning rates explains behavior better than a model with constant learning rates. We propose a simple alternative: humans directly track the decision variable underlying choice in the task. Under this reframing, the asymmetric learning rates can be reinterpreted as moving towards certainty in choice. We describe how our model incorporates partial feedback (outcomes on the chosen arm only) and complete feedback (outcomes on both the chosen and unchosen arms), and show that our model significantly outperforms previously proposed models on a range of datasets. Our reframing of the computational models adds nuance to previous findings of perseverative behavior in bandit tasks: we show evidence of context-dependent choice perseveration, i.e., that humans persevere in their choices unless contradictory evidence is presented.
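The abstract contrasts value learning with separate learning rates against directly tracking a decision variable. Below is a minimal Python sketch under assumptions that are not taken from the paper. First, the "context-dependent learning rates" model is assumed to be Q-learning with a confirmatory rate (used for a positive prediction error on the chosen arm or a negative one on the unchosen arm) and a disconfirmatory rate otherwise. Second, the decision variable is assumed to be the value difference d = Q1 - Q0 passed through a softmax. The function names `p_choose_arm1`, `q_update` and `dv_update` are hypothetical, the trial data are invented, and none of this is the authors' model. It demonstrates one narrow point: with equal learning rates and complete feedback, a learner that tracks d alone produces exactly the same choice probabilities as one that tracks both Q-values.

```python
import math
from typing import List, Optional, Sequence, Tuple


def p_choose_arm1(q0: float, q1: float, beta: float) -> float:
    """Softmax probability of choosing arm 1; it depends only on q1 - q0."""
    return 1.0 / (1.0 + math.exp(-beta * (q1 - q0)))


def q_update(q: Sequence[float], chosen: int, reward: float,
             alpha_conf: float, alpha_disc: float,
             unchosen_reward: Optional[float] = None) -> List[float]:
    """One trial of Q-learning with asymmetric learning rates (assumed convention).

    A positive prediction error on the chosen arm, or a negative one on the
    unchosen arm, "confirms" the choice and uses alpha_conf; otherwise
    alpha_disc is used. unchosen_reward=None models partial feedback.
    """
    q = list(q)
    pe = reward - q[chosen]
    q[chosen] += (alpha_conf if pe > 0 else alpha_disc) * pe
    if unchosen_reward is not None:  # complete feedback: unchosen outcome is shown
        other = 1 - chosen
        pe_u = unchosen_reward - q[other]
        q[other] += (alpha_conf if pe_u < 0 else alpha_disc) * pe_u
    return q


def dv_update(d: float, r0: float, r1: float, alpha: float) -> float:
    """Hypothetical direct tracker of d = Q1 - Q0 under complete feedback."""
    return d + alpha * ((r1 - r0) - d)


if __name__ == "__main__":
    # Invented toy trials: (chosen arm, reward on arm 0, reward on arm 1).
    trials: List[Tuple[int, float, float]] = [
        (1, 0.0, 1.0), (0, 1.0, 0.0), (1, 0.0, 0.0), (1, 1.0, 1.0),
    ]
    q, d, alpha = [0.5, 0.5], 0.0, 0.3
    for chosen, r0, r1 in trials:
        rewards = (r0, r1)
        q = q_update(q, chosen, rewards[chosen], alpha, alpha, rewards[1 - chosen])
        d = dv_update(d, r0, r1, alpha)
        # With equal rates, the directly tracked d matches the Q-value difference.
        assert math.isclose(q[1] - q[0], d, abs_tol=1e-12)
    print(p_choose_arm1(q[0], q[1], beta=5.0))
```

The equivalence holds because, with a shared rate alpha, both Q-values shrink by the same factor (1 - alpha), so their difference obeys d' = (1 - alpha) d + alpha (r1 - r0). Once alpha_conf and alpha_disc differ, or feedback is partial, the two learners diverge. That regime is the one the paper's comparison concerns, and this sketch does not attempt to reproduce it.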
