- Vishwajeet Agrawal
- Pradeep Shenoy
We study human learning and decision-making in tasks with probabilistic rewards. Recent studies of a two-armed bandit task find that a modification of classical Q-learning algorithms, with context-dependent learning rates, explains behavior better than constant learning rates. We propose a simple alternative: humans directly track the decision variable underlying choice in the task. Under this reframing, the asymmetric learning rates can be reinterpreted as movement toward certainty in choice. We describe how our model incorporates partial feedback (outcome on the chosen arm only) and complete feedback (outcomes on both chosen and unchosen arms), and show that our model significantly outperforms previously proposed models on a range of datasets. Our reframing of the computational models adds nuance to previous findings of perseverative behavior in bandit tasks: we show evidence of context-dependent choice perseveration, i.e., that humans persevere in their choices unless contradictory evidence is presented.
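To make the baseline concrete, here is a minimal sketch of a Q-learning update with context-dependent (asymmetric) learning rates for a two-armed bandit, of the kind the abstract contrasts with constant-rate Q-learning. The function name, parameter names, and the specific asymmetry rule (a larger rate for positive prediction errors on the chosen arm) are illustrative assumptions, not the paper's exact model:

```python
def update_q(q, choice, reward, alpha_plus=0.3, alpha_minus=0.1):
    """One asymmetric Q-learning update on a 2-armed bandit.

    q           : list of two Q-values, one per arm (illustrative state)
    choice      : index of the chosen arm (0 or 1)
    reward      : observed outcome on the chosen arm (partial feedback)
    alpha_plus  : learning rate when the outcome exceeds expectation
    alpha_minus : learning rate when the outcome falls short
    """
    # prediction error on the chosen arm
    delta = reward - q[choice]
    # context-dependent learning rate: the update is stronger when the
    # outcome is better than expected (one common form of asymmetry;
    # this particular rule is an assumption for illustration)
    alpha = alpha_plus if delta > 0 else alpha_minus
    q[choice] += alpha * delta
    return q

# example: starting from equal Q-values, a reward of 1.0 on arm 0
q = update_q([0.5, 0.5], choice=0, reward=1.0)
```

Under complete feedback, the same update would additionally be applied to the unchosen arm using its observed outcome; the abstract's proposal replaces this per-arm value tracking with direct tracking of the decision variable underlying choice.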