IMO^3: Interactive Multi-Objective Off-Policy Optimization

Nan Wang
Hongning Wang
Branislav Kveton
Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI-22), Vienna (2022), pp. 3523-3529 (to appear)


Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. However, we consider a more practical but challenging setting of unknown objective functions. In industry, optimization under this setting is mostly approached with online A/B testing, which is often costly and inefficient. As an alternative, we propose Interactive Multi-Objective Off-policy Optimization (IMO^3). The key idea of IMO^3 is to interact with a system designer using policies evaluated in an off-policy fashion to uncover which policy maximizes her unknown utility function. We theoretically show that IMO^3 identifies a near-optimal policy with high probability, depending on the amount of designer feedback and training data for off-policy estimation. We demonstrate its effectiveness empirically on several multi-objective optimization problems.
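
As a rough illustration of what "policies evaluated in an off-policy fashion" can mean, the sketch below estimates a vector of objective values for a candidate policy from logged data using inverse propensity scoring (IPS), then ranks candidates under a linear utility. This is not the paper's algorithm: the choice of IPS, the linear utility, the toy logged data, and all names (`ips_value`, `scalarize`, `always_0`, `always_1`) are assumptions made for the example, and the utility weights stand in for the designer preferences that would in practice be learned from her feedback.

```python
from typing import Callable, List, Sequence, Tuple

# One logged interaction (hypothetical format):
# (context, action, propensity under the logging policy, reward per objective)
LoggedSample = Tuple[int, int, float, Sequence[float]]


def ips_value(
    logs: List[LoggedSample],
    target_policy: Callable[[int, int], float],
    num_objectives: int,
) -> List[float]:
    """IPS estimate of each objective's expected reward under target_policy."""
    totals = [0.0] * num_objectives
    for context, action, propensity, rewards in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(context, action) / propensity
        for k in range(num_objectives):
            totals[k] += weight * rewards[k]
    return [t / len(logs) for t in totals]


def scalarize(values: Sequence[float], weights: Sequence[float]) -> float:
    """Linear utility: weighted sum of objective values (an assumed form)."""
    return sum(w * v for w, v in zip(weights, values))


# Toy logs: a uniform logging policy over two actions (propensity 0.5).
# Action 0 only pays off on objective 0, action 1 only on objective 1.
logs: List[LoggedSample] = [
    (0, 0, 0.5, (1.0, 0.0)),
    (0, 0, 0.5, (1.0, 0.0)),
    (0, 1, 0.5, (0.0, 1.0)),
    (0, 1, 0.5, (0.0, 1.0)),
]


def always_0(context: int, action: int) -> float:
    return 1.0 if action == 0 else 0.0


def always_1(context: int, action: int) -> float:
    return 1.0 if action == 1 else 0.0


candidates = {"always_0": always_0, "always_1": always_1}
estimates = {name: ips_value(logs, pi, 2) for name, pi in candidates.items()}

# Assumed designer preference: objective 1 matters more than objective 0.
utility_weights = (0.3, 0.7)
best = max(estimates, key=lambda name: scalarize(estimates[name], utility_weights))
```

With these toy logs the two candidates are evaluated without deploying either one; only the final ranking depends on the utility weights, which is the part an interactive procedure would need to elicit from the designer.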
