Optimistic Whittle Index Policy: Online Learning for Restless Bandits

Kai Wang*

Lily Xu*

Aparna Taneja

Milind Shashikant Tambe

AAAI 2023 (to appear)

Download Google Scholar

Abstract

Restless multi-armed bandits (RMABs) are an extension of multi-armed bandits (MABs) with state information associated with arms, where the states evolve restlessly with different transition probabilities depending on whether the arms are pulled. The additional state information in RMABs captures broader applications with state dependency, including digital marketing and healthcare recommendation. However, solving RMABs requires information on transition dynamics, which is often not available upfront. This paper considers learning the transition probabilities in an RMAB setting while maintaining small regret. We use the confidence bounds of transition probabilities to define an optimistic Whittle index policy to solve the RMAB problem while maintaining sub-linear regret compared to the benchmark. Our algorithm, UCWhittle, leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than other online learning baselines without structural information. We evaluate UCWhittle on real-world healthcare data to help reduce maternal mortality.

Research Areas

Machine Intelligence

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Optimistic Whittle Index Policy: Online Learning for Restless Bandits

Abstract

Research Areas

Learn more about how we conduct our research

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Optimistic Whittle Index Policy: Online Learning for Restless Bandits

Abstract

Research Areas

Learn more about how we conduct our research

AI/ML Foundations  & Capabilities