DynaMITE-RL: A Dynamic Model for Improved Temporal Meta Reinforcement Learning
Abstract
We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach for approximate inference in environments where the latent information evolves slowly between subtrajectories, called sessions.
We identify three key modifications to contemporary meta-RL methods: consistency of latent information during sessions, session masking, and prior latent conditioning.
We demonstrate the necessity of these modifications on a range of downstream applications, from discrete Gridworld environments to continuous control and simulated assistive robotics tasks, and find that our approach significantly outperforms contemporary baselines.