
Stable Baselines3 DevKit

A flexible, modular framework based on Stable Baselines3 for training reinforcement learning (RL) and imitation learning (IL) agents across diverse robotic simulation environments and demonstration datasets.

Overview

This framework provides a unified interface for training policies using either simulation environments or demonstration datasets. The architecture ensures complete independence between data sources (environments vs datasets) and policy implementations, allowing seamless switching between training paradigms while maintaining consistent data processing pipelines.

Who Is This For?

Researchers working on:

  • Robot learning algorithms
  • Multi-task policy learning
  • Sim-to-real transfer
  • Comparative studies across environments/datasets

Practitioners needing:

  • Rapid prototyping of robot policies
  • Flexible experimentation with different architectures
  • Unified training pipeline across multiple simulators

Key Features

Unified Data Interface

  • Environment-agnostic: Works with Isaac Lab, ManiSkill, Aloha, and other custom Gym environments
  • Dataset-agnostic: Compatible with LeRobot datasets
  • A shared data format ensures policies work seamlessly across sources (see the sketch below)
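
For illustration, a standardized sample might look as follows (the keys and shapes are assumptions for a generic robot task, not the devkit's actual schema):

import numpy as np

# Hypothetical standardized sample: dict observations, flat continuous actions
obs = {
    "state": np.zeros(48, dtype=np.float32),         # proprioceptive vector
    "rgb": np.zeros((224, 224, 3), dtype=np.uint8),  # optional camera image
}
action = np.zeros(12, dtype=np.float32)              # continuous joint targets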

Flexible Training Algorithms

  • Supervised Learning: Behavior cloning from demonstrations
  • On-policy RL: PPO, Recurrent PPO, Transformer PPO (with an efficient rollout buffer implementation)
  • Off-policy RL: SAC

Out-of-the-box Features

  • Distributed training with Accelerate (see the example below)
  • Mixed precision training (FP16/BF16)
  • Gradient accumulation
  • Comprehensive logging (TensorBoard, Weights & Biases)
  • Automatic checkpointing and resumption
  • Extensive test suite
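
For instance, a multi-GPU run can be launched through Accelerate's CLI (a sketch, assuming the training scripts are compatible with the launcher; check the repo's configs for the exact flags):

accelerate launch train_off.py \
  --task SL \
  --agent Lerobot/StackCube/lerobot_sl_lstm_cfg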

Source Code Architecture

The framework follows a layered architecture that separates concerns:

Architecture Diagram

Component Hierarchy

  1. Data Sources

    • Environments: Isaac Lab, ManiSkill, Aloha (via SB3 Wrapper), or contribute your own implementation!
    • Datasets: LeRobot demonstrations (via DS Wrapper)
    • Both produce standardized observation/action dictionaries
  2. Preprocessors

    • Transform source-specific data formats to policy inputs
    • Handle normalization, image processing, sequence formatting
    • Examples: Gym_2_Mlp, Gym_2_Lstm, Gym_2_Sac, Aloha_2_Lstm
  3. Agents

    • Manage training loops (rollout collection, batch optimization)
    • Interface with policies through preprocessors
    • Handle loss computation and gradient updates
    • Examples: PPO, RecurrentPPO, TransformerPPO, SAC, SL
  4. Policies

    • Neural network architectures (actor-critic or standalone)
    • Independent of data source or training algorithm
    • Examples: MlpPolicy, LSTMPolicy, TransformerPolicy, SACPolicy, TCNPolicy
  5. Entry Points

    • train.py: Online RL from simulation
    • train_off.py: Offline IL from demonstrations
    • predict.py: Policy evaluation and deployment

Data Flow

Environment/Dataset → Wrapper → Preprocessor → Policy → Agent → Optimization
                         ↓            ↓           ↓        ↓
                   Standardized   Normalized    Actions   Loss
                      Format        Inputs
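
In code, one interaction step follows the same chain (variable names are illustrative, mirroring the Preprocessor API described below):

# One environment step through the pipeline (hypothetical variable names)
obs = wrapped_env.reset()                    # standardized observation dict
policy_input = preprocessor.preprocess(obs)  # normalized policy inputs
action, value, log_prob = policy(policy_input)
obs, reward, done, info = wrapped_env.step(preprocessor.postprocess(action))
# The agent stores the transition and later computes the optimization loss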

Installation

Prerequisites

  • Python 3.11+
  • CUDA 11.8+ (for GPU training)

Core Installation

# Clone repository
git clone https://github.com/johnMinelli/stable-baselines3-devkit
cd stable-baselines3-devkit

# Install dependencies
pip install -r requirements.txt

Quick Start

Online RL Training (Simulation)

Train a PPO agent with an MLP policy on Isaac Lab:

python train.py \
  --task Isaac-Lift-Cube-Franka-v0 \
  --envsim isaaclab \
  --agent custom_ppo_mlp \
  --num_envs 4096 \
  --device cuda \
  --headless

Train a Recurrent PPO agent with an LSTM policy:

python train.py \
  --task Isaac-Velocity-Flat-Anymal-D-v0 \
  --envsim isaaclab \
  --agent custom_ppo_lstm \
  --num_envs 2048 \
  --device cuda \
  --headless

Offline IL Training (requires demonstrations, e.g. ManiSkill_StackCube-v1)

Train an LSTM policy via behavior cloning:

python train_off.py \
  --task SL \
  --agent Lerobot/StackCube/lerobot_sl_lstm_cfg \
  --device cuda \
  --n_epochs 200 \
  --batch_size 64

Policy Evaluation

Evaluate a trained policy:

python predict.py \
  --task Isaac-Velocity-Flat-Anymal-D-v0 \
  --envsim isaaclab \
  --agent custom_ppo_mlp \
  --num_envs 1 \
  --val_episodes 100 \
  --device cuda \
  --resume

# Optional: evaluate a specific checkpoint
#   --checkpoint path/to/best_model.zip

Usage Guide

Creating a New Policy

Policies must inherit from BasePolicy and implement required methods:

import torch as th
from torch import nn
from torch.distributions import Normal
from stable_baselines3.common.policies import BasePolicy

class CustomPolicy(BasePolicy):
    def __init__(self, observation_space, action_space, lr_schedule, **kwargs):
        super().__init__(observation_space, action_space, **kwargs)
        # Define networks: a simple actor-critic pair for flat observations
        obs_dim, act_dim = observation_space.shape[0], action_space.shape[0]
        self.actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
        self.critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
        self.log_std = nn.Parameter(th.zeros(act_dim))

    def forward(self, obs):
        # Compute actions, values, log_probs
        dist = Normal(self.actor(obs), self.log_std.exp())
        actions = dist.sample()
        return actions, self.critic(obs), dist.log_prob(actions).sum(dim=-1)

    def _predict(self, observation, deterministic=False):
        # Used by .predict(): mean action when deterministic, else a sample
        dist = Normal(self.actor(observation), self.log_std.exp())
        return dist.mean if deterministic else dist.sample()

    def predict_values(self, obs):
        # Compute state values
        return self.critic(obs)

Register it in the agent's policy_aliases:

class PPO(OnPolicyAlgorithm):
    policy_aliases = {
        "MlpPolicy": MlpPolicy,
        "CustomPolicy": CustomPolicy,
    }
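
Once registered, the alias can be used anywhere a policy name is expected. A minimal usage sketch (the environment id is a placeholder):

import gymnasium as gym

# Hypothetical example: train the custom policy through the registered alias
model = PPO("CustomPolicy", gym.make("Pendulum-v1"), verbose=1)
model.learn(total_timesteps=100_000)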

Creating a Preprocessor

Preprocessors bridge data sources to policies:

from common.preprocessor import Preprocessor

class CustomPreprocessor(Preprocessor):
    def preprocess(self, obs):
        # Transform observations for policy
        processed = self.normalize_observations(obs)
        # Additional transformations
        return processed

    def postprocess(self, actions):
        # Transform policy outputs for environment
        return self.unnormalize_actions(actions)
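
As a concrete illustration, a preprocessor with fixed normalization bounds might look like this (a sketch: the class name, constructor arguments, and the "state" key are assumptions, not devkit API):

import numpy as np
from common.preprocessor import Preprocessor

class MinMax_2_Mlp(Preprocessor):
    # Hypothetical example: scale a flat "state" vector to [0, 1] and map
    # actions from the policy's [-1, 1] range back to the environment's bounds
    def __init__(self, obs_low, obs_high, act_low, act_high):
        self.obs_low, self.obs_high = obs_low, obs_high
        self.act_low, self.act_high = act_low, act_high

    def preprocess(self, obs):
        state = np.asarray(obs["state"], dtype=np.float32)
        return (state - self.obs_low) / (self.obs_high - self.obs_low)

    def postprocess(self, actions):
        return self.act_low + (actions + 1.0) * 0.5 * (self.act_high - self.act_low)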

Adding a New Environment

  1. Create an environment-specific wrapper:

import gymnasium as gym

class NewEnvWrapper(gym.Wrapper):
    def __init__(self, env):
        super().__init__(env)
        # Set up standardized observation/action spaces

    def reset(self):
        # Return standardized observations
        ...

    def step(self, action):
        # Return standardized obs, reward, done, info
        ...

  2. Create a preprocessor for the environment:

class NewEnv_2_Policy(Preprocessor):
    # Implement preprocessing logic
    ...

  3. Add a YAML configuration file in configs/agents/NewEnv/.

  4. Update train.py imports:

if args_cli.envsim == "newenv":
    import newenv_package
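
With those pieces in place, the new environment is selectable from the CLI (the task name below is a placeholder):

python train.py \
  --task NewEnv-Task-v0 \
  --envsim newenv \
  --agent custom_ppo_mlp \
  --device cuda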

Contributing

Yes, you can! :)

Citation

If you use this framework in your research, please cite:

@misc{stable-baselines3-devkit,
  title = {Stable Baselines3 DevKit},
  author = {Giovanni Minelli},
  year = {2026},
  url = {https://github.com/johnMinelli/stable-baselines3-devkit}
}

Acknowledgments

This framework builds upon Stable-Baselines3.
