Jump to Content

Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management

Dhawal Gupta
Mohammad Ghavamzadeh
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS-23), New Orleans (2023)
Google Scholar


Reinforcement learning (RL) has offered great promise for developing dialogue management (DM) agents that avoid being short-sighted, conduct rich conversations, and maximize overall user satisfaction. Despite recent developments in deep RL and language models (LMs), using RL to power conversational chatbots remain a formidable challenge. This is because deep RL algorithms require online exploration to learn effectively, but collecting fresh human-bot interactions can be expensive and unsafe. This issue is exacerbated by the combinatorial action space that these algorithms need to handle, as most LM agents generate responses at the word-level. Leveraging the recent advances of Mixture-of-Expert Language Models (MoE-LMs) that capture diverse semantics, generate utterances of different intents, and are amenable for multi-turn DM, we develop a gamut of offline RL algorithms that excel in dialogue planning. Through exploiting the MoE-LM structure, our methods significantly reduce the action space and improve the efficacy of RL DM. We compare that with SOTA methods on open-domain dialogues to demonstrate their effectiveness both in the diversity of generated utterances and the overall DM performance.

Research Areas