ReLab v1.0.0-b
Reinforcement Learning Benchmarks
ReLab is a versatile and powerful library for training, evaluating, and analyzing reinforcement learning agents. This tutorial will walk you through its core features, including creating environments, defining agents, and training your first model using ReLab's Python API. Additionally, you'll learn how to run complete experiments using ReLab's command-line interface.
When running ReLab scripts, the library organizes all generated files into a `data` directory. This structured directory ensures that your experiment outputs are logically grouped, making it easy to access and analyze the results. Here's what each folder of the `data` directory contains:
- `demos/`: demonstration GIFs; for example, `demo_500000.gif` shows the agent's behavior after 500,000 training iterations.
- `graphs/`: graphs (e.g., `mean_episodic_reward.pdf`) are stored for each environment and summarize the performance of one or more agents. TSV files (e.g., `mean_episodic_reward.tsv`) are also stored here for individual agents, containing the mean and standard deviation of the specified metric at each training step.
- `runs/`: TensorBoard event files (e.g., `events.out.tfevents...`) that allow you to track the agent's progress during training.
- `saves/`: model checkpoints (e.g., `model_500000.pt`), allowing you to reload and evaluate the agent at different stages of training. A single replay buffer file (`buffer.pt`) saves the replay buffer associated with the last checkpoint iteration, ensuring training can resume seamlessly from where it was left off. For example, if the directory contains `model_500.pt` and `model_1000.pt`, then `buffer.pt` corresponds to the replay buffer at iteration 1000.

By organizing experiment outputs in this way, ReLab ensures that your data is easy to locate and manage, enabling you to efficiently analyze results, compare agents, and showcase their learned behaviors.
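As a quick illustration of this layout, the sketch below locates the most recent checkpoint in a `saves/` folder; the `data/saves` path is an assumption and may need to be adjusted to your setup:

```python
from pathlib import Path

# Path to the saves/ folder; the exact location inside the data directory
# is an assumption and may differ depending on the agent and environment.
saves = Path("data") / "saves"

# Checkpoints are named model_<iteration>.pt, so sort them by iteration number.
checkpoints = sorted(saves.glob("model_*.pt"),
                     key=lambda p: int(p.stem.split("_")[1]))

if checkpoints:
    print("Latest checkpoint:", checkpoints[-1].name)
    # buffer.pt always corresponds to the last checkpoint iteration.
    print("Replay buffer:", saves / "buffer.pt")
```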
ReLab's configuration allows you to customize key aspects of training and logging. Here are the most relevant entries:
- `max_n_steps`: Maximum number of training iterations (default: 50,000,000).
- `checkpoint_frequency`: Number of training iterations between model checkpoints (default: 500,000).
- `tensorboard_log_interval`: Number of training iterations between TensorBoard log updates (default: 5,000).
- `save_all_replay_buffers`: Determines whether all replay buffers are saved (default: `False`). When `False`, only the replay buffer associated with the most recent checkpoint is saved.

Example Usage
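As a simple illustration of what these defaults imply (plain Python arithmetic only, no ReLab-specific API assumed):

```python
# Default configuration values taken from the list above.
max_n_steps = 50_000_000          # maximum number of training iterations
checkpoint_frequency = 500_000    # iterations between model checkpoints
tensorboard_log_interval = 5_000  # iterations between TensorBoard log updates

# With these defaults, a full training run produces:
print(max_n_steps // checkpoint_frequency)       # 100 model checkpoints
print(max_n_steps // tensorboard_log_interval)   # 10,000 TensorBoard updates
```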
Before doing anything with ReLab, the `relab.initialize()` function must be called. It is the first step in setting up the library, ensuring that all paths are properly configured. This function performs several key steps:
- It sets environment variables (such as `CHECKPOINT_DIRECTORY` and `TENSORBOARD_DIRECTORY`) to define where specific files are stored, ensuring consistency across scripts.
The `relab.agents.make()` function is a factory method that simplifies the creation of reinforcement learning agents in ReLab. By passing the name of the desired agent and optional keyword arguments, you can create and configure agents with ease. It accepts the following arguments:
- `agent_name`: The name of the agent to instantiate. Must be one of the supported agents (listed below). If an unsupported name is provided, the function raises an error.
- `kwargs`: Keyword arguments forwarded to the agent's constructor, allowing you to customize the agent's behavior.

Example Usage
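A minimal sketch, assuming the agent can be created with default settings; the commented-out keyword argument is purely illustrative, not a documented parameter:

```python
import relab

relab.initialize()

# Create a DQN agent (one of the supported names listed in the table below).
# It is assumed here that relab.agents is available after importing relab.
agent = relab.agents.make("DQN")

# Keyword arguments are forwarded to the agent's constructor; the parameter
# name below is an illustrative assumption, not a documented option.
# agent = relab.agents.make("DQN", learning_rate=1e-4)
```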
Here's a table summarizing the supported agents in ReLab. It includes their abbreviations, full names, and key characteristics such as whether they are value-based, distributional, take random actions, or learn a world model.
Abbreviation | Full Name | Value-Based | Distributional | Random Actions | World Model
---|---|---|---|---|---
DQN | Deep Q-Network | ✅ | ❌ | ❌ | ❌
DDQN | Double Deep Q-Network | ✅ | ❌ | ❌ | ❌
CDQN | Categorical Deep Q-Network | ✅ | ✅ | ❌ | ❌
MDQN | Multi-step Deep Q-Network | ✅ | ❌ | ❌ | ❌
QRDQN | Quantile Regression Deep Q-Network | ✅ | ✅ | ❌ | ❌
NoisyDQN | Noisy Deep Q-Network | ✅ | ❌ | ❌ (noisy layers for exploration) | ❌
NoisyDDQN | Noisy Double Deep Q-Network | ✅ | ❌ | ❌ (noisy layers for exploration) | ❌
NoisyCDQN | Noisy Categorical Deep Q-Network | ✅ | ✅ | ❌ (noisy layers for exploration) | ❌
DuelingDQN | Dueling Deep Q-Network | ✅ | ❌ | ❌ | ❌
DuelingDDQN | Dueling Double Deep Q-Network | ✅ | ❌ | ❌ | ❌
PrioritizedDQN | Prioritized Experience Replay DQN | ✅ | ❌ | ❌ | ❌
PrioritizedDDQN | Prioritized Experience Replay DDQN | ✅ | ❌ | ❌ | ❌
PrioritizedMDQN | Prioritized Multi-step DQN | ✅ | ❌ | ❌ | ❌
RainbowDQN | Rainbow Deep Q-Network | ✅ | ✅ | ❌ | ❌
RainbowIQN | Rainbow with Implicit Quantile Network | ✅ | ✅ | ❌ | ❌
IQN | Implicit Quantile Network | ✅ | ✅ | ❌ | ❌
Random | Random Agent | ❌ | ❌ | ✅ | ❌
VAE | Variational Autoencoder | ❌ | ❌ | ✅ | ✅
BetaVAE | Beta Variational Autoencoder | ❌ | ❌ | ✅ | ✅
HMM | Hidden Markov Model | ❌ | ❌ | ✅ | ✅
BetaHMM | Beta Hidden Markov Model | ❌ | ❌ | ✅ | ✅
CHMM | Critical Hidden Markov Model | ✅ | ❌ | ❌ | ✅
Notes:
The `relab.environments.make()` function is a factory that provides an easy and customizable way to set up Gym environments for training reinforcement learning agents. It accepts the following arguments:
- `env_name`: The name of the environment to instantiate.
- `kwargs`: Keyword arguments forwarded to the environment's constructor, allowing you to customize the environment.

The function applies several preprocessing steps:
- The environment is created with `gym.make`; by default, the entire action space is used (18 actions for all Atari games).
- Observation frames are resized (as specified by `screen_size`).
- Each action is repeated for several frames, as specified by `frame_skip`.
- The last `stack_size` observations are stacked to provide temporal context for agents.

Example Usage
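A minimal sketch, assuming an ALE-style environment name; the commented-out keyword arguments mirror the preprocessing options above but are assumptions rather than documented parameters:

```python
import relab

relab.initialize()

# Create an Atari environment; the name format accepted by ReLab may differ
# from the Gymnasium/ALE convention used here.
env = relab.environments.make("ALE/Pong-v5")

# Keyword arguments are forwarded to the environment's constructor; the names
# below mirror the preprocessing options described above but are assumptions.
# env = relab.environments.make("ALE/Pong-v5", screen_size=84, frame_skip=4, stack_size=4)
```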
At times, you might want to evaluate your agents on a specific subset of Atari games. ReLab provides three predefined Atari benchmarks to simplify this process:

- `small_benchmark_atari_games()`: a small selection of Atari games.
- `benchmark_atari_games()`: the games of `small_benchmark_atari_games()` plus additional titles like Asteroids, Seaquest, and Montezuma's Revenge.
- `all_atari_games()`: every Atari game supported by ReLab.

Example Usage:
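A minimal sketch of iterating over a benchmark; only the three function names are given above, so the `relab.environments` import path assumed below may differ:

```python
import relab

relab.initialize()

# Create every environment in the small benchmark. The benchmark helpers are
# assumed here to be accessible under relab.environments.
for env_name in relab.environments.small_benchmark_atari_games():
    env = relab.environments.make(env_name)
    print("Created environment:", env_name)
```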
By now, you've learned about ReLab's features, how to configure the library, create agents and environments, and manage saved data and benchmarks. Let's bring it all together with a complete training script to demonstrate how these components work in practice:
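A minimal sketch of such a script; only `relab.initialize()`, `relab.agents.make()`, and `relab.environments.make()` appear above, so the environment name and the `agent.train(env)` call are placeholder assumptions for the actual training entry point:

```python
import relab


def main():
    # Configure paths and environment variables.
    relab.initialize()

    # Create the environment and the agent using the factory functions above.
    env = relab.environments.make("ALE/Pong-v5")  # environment name is an assumption
    agent = relab.agents.make("DQN")

    # Train the agent; the method name and signature are assumptions, since
    # the training entry point is not spelled out in this tutorial section.
    agent.train(env)


if __name__ == "__main__":
    main()
```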
While you could use Poetry to train individual agents and demonstrate their policies one at a time, ReLab also enables you to run full-scale experiments. An experiment automates training, evaluation, and result visualization across multiple agents, environments, and random seeds. Here's a breakdown of what the script does:
Example Usage:
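Conceptually, an experiment amounts to a loop over agents, environments, and seeds, as in the sketch below; the training call, the seed handling, and the benchmark helper's import path are assumptions, and in practice ReLab's experiment script (with optional Slurm submission, see the note below) handles this orchestration for you:

```python
import relab

agent_names = ["DQN", "RainbowIQN"]
env_names = relab.environments.small_benchmark_atari_games()  # assumed import path
seeds = [0, 1, 2]

for agent_name in agent_names:
    for env_name in env_names:
        for seed in seeds:
            # How ReLab applies the seed is not shown here; in this sketch it
            # is only used to label the run.
            print(f"Training {agent_name} on {env_name} (seed {seed})")
            relab.initialize()
            env = relab.environments.make(env_name)
            agent = relab.agents.make(agent_name)
            agent.train(env)  # training entry point is an assumption
```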
- Pass the `--no-local` flag to run experiments using Slurm; omitting it defaults to running locally.

This script ensures a streamlined workflow for conducting experiments, from training to visualization, with minimal manual intervention!
For more details, you can explore the official documentation, which provides an in-depth explanation of all of ReLab's classes. Additionally, the Python scripts in the `scripts` directory offer practical examples to help you understand how ReLab works. These resources are great starting points for deepening your understanding and making the most out of ReLab!