Google Research

CaQL: Continuous Action Q-Learning

Proceedings of the Eighth International Conference on Learning Representations (ICLR-20), Addis Ababa, Ethiopia (2020)


In this work, we propose CaQL, a value-based reinforcement learning (RL) algorithm for continuous actions whose Q-function is modeled by a generic feed-forward neural network. We show that computing the Bellman residual can be posed as a mixed-integer linear programming (MILP) problem. Furthermore, to reduce the cost of computing the Bellman residual, we propose three techniques that speed up the computation of max-Q values: (i) dynamic tolerance, (ii) dual filter, and (iii) clustering. Finally, to illustrate the efficiency of CaQL, we compare it with state-of-the-art RL algorithms on benchmark continuous control problems with various action constraints, and show that CaQL significantly outperforms policy-based methods in heavily constrained environments.
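To give a sense of how a max over a neural-network Q-function can become a MILP, the sketch below checks a standard big-M encoding of a single ReLU unit. The abstract does not state which encoding CaQL uses, so the ReLU activation, the pre-activation bounds `lower` and `upper`, and both function names are illustrative assumptions. With a binary variable `delta`, the four linear constraints should admit only `z = max(0, y)`; stacking such constraints layer by layer is one common way to turn a maximization over network inputs into a MILP.

```python
def relu_big_m_feasible(y, z, delta, lower, upper, tol=1e-9):
    """Check the big-M constraints linking z to ReLU(y) for a binary delta.

    Assumes lower <= y <= upper with lower <= 0 <= upper (hypothetical bounds).
    """
    return (
        z >= y - tol                               # z is at least the pre-activation
        and z >= -tol                              # z is non-negative
        and z <= y - lower * (1 - delta) + tol     # tight when delta == 1 (unit active)
        and z <= upper * delta + tol               # forces z == 0 when delta == 0
    )


def feasible_outputs(y, lower, upper, candidates):
    """Return every candidate z that satisfies the constraints for some delta in {0, 1}."""
    return sorted(
        {
            z
            for z in candidates
            for delta in (0, 1)
            if relu_big_m_feasible(y, z, delta, lower, upper)
        }
    )


if __name__ == "__main__":
    grid = [i * 0.5 for i in range(-4, 7)]  # candidate z values from -2.0 to 3.0
    for y in (-2.0, -0.5, 0.0, 1.5, 3.0):
        print(y, feasible_outputs(y, lower=-2.0, upper=3.0, candidates=grid))
```

In a real MILP, `y` would itself be a linear expression in the action and the previous layer's outputs, and a solver would choose `delta` rather than enumerating it; the brute-force check here only illustrates that the linear constraints reproduce the ReLU exactly.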

