
An anatomical substrate of credit assignment in reinforcement learning

Jorgen Kornfeld
Michale S. Fee
Philipp Schubert
Winfried Denk
bioRxiv (2020)

Abstract

How is experience used to improve performance? In both biological and artificial systems, the optimization of parameters that affect behavior requires a process that determines whether a parameter affects the outcome and then modifies the parameter accordingly. Central to the recent bloom of artificial intelligence has been the error-backpropagation algorithm (Rumelhart, Hinton, and Williams 1986), which computationally retraces the signal from the output to each synapse (weight) and allows a large number of parameters to be optimized in parallel at high learning rates. Biological systems, however, lack an obvious mechanism to retrace the signal path. Here we show, by combining high-throughput volume electron microscopy (Denk and Horstmann 2004) and automated connectomic analysis (Januszewski et al. 2018; Dorkenwald et al. 2017; Schubert et al. 2019), that the synaptic architecture of songbird basal ganglia supports a form of local credit assessment proposed in a model of songbird reinforcement learning (Fee and Goldberg 2011). We show that three of this model’s major predictions hold true. First, cortical axons that encode exploratory motor variability terminate predominantly on dendritic shafts of spiny neurons. Second, cortical axons that encode timing seek out spines, which enable calcium-based coincidence detection (Yuste and Denk 1995) and appear to be capable of creating and storing eligibility traces (Yagishita et al. 2014). Third, synapse pairs that share a cortical timing axon presynaptically and a medium spiny dendrite postsynaptically are substantially more similar in size than expected, indicating a history of Hebbian plasticity (Bartol et al. 2015; Kasthuri et al. 2015). Combined with numerical simulations, these data provide strong evidence for a model of basal ganglia learning with a biologically plausible credit assignment mechanism.
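To make the idea of local credit assignment concrete, the following is a minimal sketch of a reward-modulated learning rule with eligibility traces. It is not the numerical simulation mentioned in the abstract. The function names, the scalar reward, the running-average baseline, and all parameter values are assumptions chosen for illustration. Each timing synapse stores the coincidence of its own activity with the exploratory variability signal, and a single global reward then converts that stored trace into a weight change, so no signal has to be retraced through the network.

```python
import random


def train(targets, n_trials=2000, noise_sd=0.2, lr=0.3, seed=0):
    """Toy reward-modulated rule (illustrative assumption, not the paper's model).

    One hypothetical timing axon is active per time step; weights[t] is the
    strength of its synapse and sets the mean motor output at step t.
    """
    rng = random.Random(seed)
    n_steps = len(targets)
    weights = [0.0] * n_steps
    baseline = None  # running average of reward (assumed critic-like signal)

    for _ in range(n_trials):
        # Exploratory variability injected at each time step.
        noise = [rng.gauss(0.0, noise_sd) for _ in range(n_steps)]
        output = [w + n for w, n in zip(weights, noise)]

        # Scalar reward: higher when the output is closer to the target.
        reward = -sum((y - g) ** 2 for y, g in zip(output, targets))
        if baseline is None:
            baseline = reward

        # Eligibility trace: coincidence of timing input (active only at its
        # own step, so the factor is 1) with the variability signal there.
        eligibility = noise

        # A better-than-expected reward reinforces the eligible synapses.
        advantage = reward - baseline
        weights = [w + lr * advantage * e for w, e in zip(weights, eligibility)]
        baseline += 0.1 * (reward - baseline)

    return weights


def squared_error(weights, targets):
    """Summed squared distance between the mean output and the target."""
    return sum((w - g) ** 2 for w, g in zip(weights, targets))


if __name__ == "__main__":
    targets = [0.5, -0.3, 0.8, 0.1, -0.6]
    learned = train(targets)
    print("error before:", squared_error([0.0] * len(targets), targets))
    print("error after: ", squared_error(learned, targets))
```

In this sketch every synapse updates using only quantities available locally (its own trace) plus one broadcast reward value, which is the property that distinguishes this family of rules from error backpropagation.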
