- Abheek Ghosh
- Dheeraj Nagaraj
- Manish Jain
- Milind Tambe
We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions. This is a popular model for multi-agent systems with applications like multi-channel communication, monitoring and machine maintenance tasks, and healthcare. Whittle index policies, which are based on Lagrangian relaxations, are widely used in these settings due to their simplicity and near-optimality under certain conditions. In this work, we first show that Whittle index policies can fail in simple and practically relevant RMAB settings, even when the RMABs are indexable. We further discuss why the Whittle index policies can provably fail in these settings, despite indexability and how even asymptotic optimality does not translate well to practically relevant planning horizons.
We then propose an alternate planning algorithm based on the mean-field method, which borrows ideas from existing research with some improvements. This algorithm can provably and efficiently obtain near-optimal policies when the number of arms, $N$, is large without the stringent structural assumptions required by Whittle index policies. Our approach is hyper-parameter free, and we provide an improved non-asymptotic analysis which has a) a better dependence on problem dependent parameters b) high probability upper bounds which show that the reward of the policy is reliable c) matching lower bounds for this algorithm, thus demonstrating the tightness of our bounds. Our extensive experimental analysis shows that the mean-field approach matches or outperforms other baselines.
Learn more about how we do research
We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work