AutoML for Contextual Bandits

Praneet Dutta; Man Kit (Joe) Cheuk; Jonathan Kim; Massimo Mascaro

REVEAL Workshop @ ACM RecSys 2019 Conference, Copenhagen (2019) (to appear)

Abstract

Contextual bandits are a widely used technique in applications such as personalization, recommendation systems, mobile health, and causal marketing. As a dynamic approach, they can be more efficient than standard A/B testing at minimizing regret. We propose an end-to-end meta-learning pipeline to approximate the optimal Q function for contextual bandit problems. Our model performs much better than random exploration: it is more regret-efficient and converges with a limited number of samples, while remaining general and easy to use thanks to the meta-learning approach. We used a linearly annealed ε-greedy exploration policy to define the exploration vs. exploitation schedule. We tested the system on a synthetic environment to characterize it fully, and we evaluated it on several open-source datasets. Our model outperforms or performs comparably to other models while requiring neither tuning nor feature engineering.
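
The linearly annealed ε-greedy schedule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the start and end values of ε, and the use of a plain list of Q-value estimates are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def linear_epsilon(step, total_steps, eps_start=1.0, eps_end=0.05):
    """Linearly anneal epsilon from eps_start to eps_end over total_steps.

    eps_start and eps_end are illustrative defaults, not values from the paper.
    """
    if total_steps <= 0:
        return eps_end
    # Fraction of the annealing period completed, clamped to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over estimated Q-values for one context.

    With probability epsilon pick a uniformly random action (explore);
    otherwise pick the action with the highest estimate (exploit).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # Hypothetical Q-value estimates for three actions in a single context.
    q_estimates = [0.1, 0.9, 0.3]
    for step in (0, 500, 1000):
        eps = linear_epsilon(step, total_steps=1000)
        action = select_action(q_estimates, eps)
        print(f"step={step} epsilon={eps:.3f} action={action}")
```

Early in the run ε is close to its starting value, so actions are chosen mostly at random; as ε decays, the policy increasingly follows the learned Q-value estimates.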

Research Areas

Machine intelligence
