Google Research

DQN Replay Dataset

Description

The DQN Replay Dataset is generated using DQN agents trained on 60 Atari 2600 games for 200 million frames each, while using sticky actions (with 25% probability that the agent’s previous action is executed instead of the current action) to make the problem more challenging. For each of the 60 games, we train 5 DQN agents with different random initializations, and store all of the (state, action, reward, next state) tuples encountered during training into 5 replay datasets per game, resulting in a total of 300 datasets.

The DQN Replay Dataset can be used for training offline RL agents, without any interaction with the environment during training. Each game replay dataset is approximately 3.5 times larger than ImageNet and includes samples from all of the intermediate policies seen during the optimization of online DQN.

This logged DQN data can be found in the public GCP bucket gs://atari-replay-datasets, which can be downloaded using gsutil. To install gsutil, follow the official gsutil installation instructions.

After installing gsutil, run the following command to copy the entire dataset to the current directory:

gsutil -m cp -R gs://atari-replay-datasets/dqn ./

To download the data for only a specific Atari 2600 game, replace [GAME_NAME] with the name of the game (e.g., Pong) and run the command:

gsutil -m cp -R gs://atari-replay-datasets/dqn/[GAME_NAME] ./

Note that each game replay dataset consists of approximately 50 million tuples rather than 200 million because of the frame skip of 4 (i.e., each selected action is repeated for 4 consecutive frames), so 200 million frames correspond to roughly 50 million agent steps. The stickiness parameter is set to 0.25, i.e., at every time step there is a 25% chance that the environment executes the agent's previous action again instead of the agent's new action.
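
These settings match Dopamine's default Atari preprocessing. For evaluating an offline agent under the same conditions, a minimal sketch of constructing a matching environment (assuming the Dopamine library is installed; the game name and usage below are illustrative) is:

# Sketch: build an Atari environment with the same settings as the dataset
# (sticky actions with probability 0.25; the frame skip of 4 is applied
# inside Dopamine's Atari preprocessing wrapper).
from dopamine.discrete_domains import atari_lib

# sticky_actions=True selects the ALE variant that repeats the previous
# action with 25% probability at every environment step.
env = atari_lib.create_atari_environment(game_name='Pong', sticky_actions=True)

observation = env.reset()
observation, reward, done, _ = env.step(env.action_space.sample())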

Since the (state, action, reward, next state) tuples in the DQN Replay Dataset are stored in the order in which they were experienced by online DQN during training, various data-collection strategies for benchmarking offline RL can be induced by subsampling the replay dataset containing 200 million frames. For example, the first k million frames emulate exploration data with suboptimal returns, while the last k million frames are analogous to near-expert data with stochasticity. Another option is to randomly subsample the entire dataset to create smaller offline datasets, as sketched below. Given the popularity of Atari 2600 games and the ease of experimenting with them, the DQN Replay Dataset is well suited for benchmarking offline RL.
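
As a concrete illustration, one way to realize these strategies is to choose which replay checkpoints to load. The sketch below assumes each run's data is sharded into roughly 50 numbered checkpoints of about one million transitions each; verify the exact count against the downloaded files.

import random

NUM_CHECKPOINTS = 50  # assumed number of replay shards per run (~1M transitions each)

def checkpoint_suffixes(strategy, k=10):
    """Pick replay checkpoint indices that emulate a data-collection strategy."""
    if strategy == 'exploration':
        # Earliest checkpoints: exploratory, suboptimal data from early training.
        return list(range(k))
    if strategy == 'near_expert':
        # Latest checkpoints: near-expert (but still stochastic) data from late training.
        return list(range(NUM_CHECKPOINTS - k, NUM_CHECKPOINTS))
    if strategy == 'uniform':
        # Random subsample across the whole training run.
        return sorted(random.sample(range(NUM_CHECKPOINTS), k))
    raise ValueError('Unknown strategy: {}'.format(strategy))

print(checkpoint_suffixes('exploration'))  # [0, 1, ..., 9]
print(checkpoint_suffixes('near_expert'))  # [40, 41, ..., 49]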

Refer to the open-source code for loading the dataset with the Dopamine library. More details can be found at offline-rl.github.io.
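
For instance, a minimal sketch of loading a single replay checkpoint with Dopamine's replay buffer and sampling a batch might look as follows; the directory layout (GAME/RUN/replay_logs), the checkpoint suffix, and the per-shard capacity are assumptions to adjust to the files you actually downloaded, and element names can vary across Dopamine versions.

from dopamine.replay_memory import circular_replay_buffer

# Sketch: load one replay checkpoint into Dopamine's out-of-graph replay
# buffer. The path and suffix below are illustrative.
replay_buffer = circular_replay_buffer.OutOfGraphReplayBuffer(
    observation_shape=(84, 84),  # Atari frames are stored as 84x84 grayscale images
    stack_size=4,                # standard 4-frame stacking used by DQN
    replay_capacity=1000000,     # capacity of a single checkpoint shard (assumed)
    batch_size=32,
    update_horizon=1,
    gamma=0.99)

# Reads the gzipped replay arrays for checkpoint suffix 5 from the given directory.
replay_buffer.load('./dqn/Pong/1/replay_logs', suffix=5)

# Sample a training batch. Build a dict from get_transition_elements() so the
# code does not depend on the exact element order of the installed version.
elements = replay_buffer.get_transition_elements(batch_size=32)
batch = dict(zip([element.name for element in elements],
                 replay_buffer.sample_transition_batch(batch_size=32)))
print(batch['state'].shape, batch['action'].shape, batch['reward'].shape)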

Additional information: AI Blog: An Optimistic Perspective on Offline Reinforcement Learning