Graphical models for bandit problems
Abstract
We introduce a rich class of graphical models
for multi-armed bandit problems that permit
both the state or context space and the action
space to be very large, yet succinctly specify
the payoffs for any context-action pair. Our
main result is an algorithm for such models
whose regret is bounded by the number of
parameters and whose running time depends
only on the treewidth of the graph substructure induced by the action space.