SkyRL + OpenEnv: Training an RL Agent in OpenEnv

In this example, we walk through how to train a reinforcement learning agent using SkyRL with OpenEnv environments. OpenEnv provides isolated execution environments for agentic RL training with Gymnasium-style APIs.

How does it work?

SkyRL integrates easily with any Gymnasium API-based environment through SkyRL-Gym, which provides a simple interface for text-based environments called BaseTextEnv. We integrate OpenEnv environments through a custom environment wrapper, OpenEnv, that implements the BaseTextEnv interface. This wrapper allows SkyRL to interact with a variety of OpenEnv environments, including:

  • Echo Environment: Simple echo environment for testing
  • Coding Environment: Python code execution in a sandboxed environment
  • OpenSpiel Environment: Game environments using OpenSpiel
  • Atari Environment: Classic Atari game environments
  • SUMO-RL Environment: Traffic simulation environments
  • FinRL Environment: Financial trading environments

The integration works by:

  1. Environment Registration: The OpenEnv environment is registered dynamically in the entrypoint using register() from skyrl_gym.envs
  2. Environment Initialization: SkyRL creates an OpenEnv client using from_docker_image() to connect to the appropriate Docker container
  3. Action Parsing: LLM responses are parsed into environment-specific actions (e.g., EchoAction, CodeAction)
  4. Step Execution: Actions are executed in the isolated environment and observations/rewards are returned
  5. Episode Management: The environment tracks conversation history and manages episode termination

At a high level, the integration looks as follows:

# The OpenEnv wrapper class (simplified)
from typing import Any, Dict

from omegaconf import DictConfig
from skyrl_gym.envs.base_text_env import BaseTextEnv, BaseTextEnvStepOutput


class OpenEnv(BaseTextEnv):
    def __init__(self, env_config: DictConfig, extras: Dict[str, Any] = {}):
        super().__init__()
        self.env_name = extras["env_name"]
        # Map the environment name to its OpenEnv client class
        self.env_type = self._get_env_class(self.env_name)
        # Pull/launch the environment's Docker container and connect to it
        self.env = self.env_type.from_docker_image(self.env_name + ":latest")
        self.initial_step_result = self.env.reset()

    def step(self, action: str) -> BaseTextEnvStepOutput:
        # Parse the raw LLM response into an environment-specific action
        action = self._get_openenv_action(self.env_name, action)
        result = self.env.step(action)
        # Process result and return observations, reward, done
        ...

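The _get_openenv_action helper called in step converts the raw LLM response into an environment-specific OpenEnv action. Below is a minimal sketch of the idea, assuming <action>...</action>-tagged generations and EchoAction(message=...) as the echo environment's action type; the import path and field names are assumptions, and the actual parsing logic lives in integrations/openenv/env.py:

import re

from envs.echo_env import EchoAction  # import path per OpenEnv's package layout


# Sketch of a parser method on the OpenEnv wrapper; the action classes and
# their fields are assumptions, not the actual implementation.
def _get_openenv_action(self, env_name: str, response: str):
    # Take the content of the first <action>...</action> block, falling
    # back to the raw response if no tags are present.
    match = re.search(r"<action>(.*?)</action>", response, re.DOTALL)
    content = match.group(1).strip() if match else response.strip()
    if env_name == "echo_env":
        return EchoAction(message=content)
    raise ValueError(f"No action parser implemented for: {env_name}")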
Finally, we also register the new environment in the entrypoint script:

# In integrations/openenv/entrypoints/main_openenv.py
from skyrl_gym.envs import register

register(
    id="openenv",
    entry_point="integrations.openenv.env:OpenEnv",
)

Environment Setup

Prerequisites: Ensure that you have Docker installed and running.

First, we need to install the OpenEnv environments:

# Execute from skyrl-train directory
cd SkyRL/skyrl-train
uv run integrations/openenv/install_environment.py echo-env
# Or install all environments:
# uv run integrations/openenv/install_environment.py

This will pull the necessary Docker images for the OpenEnv environments.

Dataset Preparation

For training, we use simple example datasets generated by the prepare_dummy_dataset.py script:

# Execute from skyrl-train directory
cd SkyRL/skyrl-train
uv run integrations/openenv/prepare_dummy_dataset.py --output_dir ~/data/openenv --env_name echo_env

This creates training and validation datasets with example prompts for the specified environment. We provide dummy training examples for echo_env and coding_env.
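
While the exact schema is defined by prepare_dummy_dataset.py, each row conceptually carries a chat prompt, the registered environment id, and the OpenEnv environment name consumed by the wrapper. A rough illustration, assuming SkyRL's standard dataset fields (treat the exact field names as assumptions):

# Illustrative dataset row; see prepare_dummy_dataset.py for the real schema.
example_row = {
    "prompt": [{"role": "user", "content": "Echo back the word 'hello'."}],
    "env_class": "openenv",                  # id registered in the entrypoint
    "extra_info": {"env_name": "echo_env"},  # surfaced to the wrapper as extras["env_name"]
}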

Training

We provide an example training script for Qwen2.5-1.5B-Instruct on OpenEnv environments:

# Execute from skyrl-train directory
cd SkyRL/skyrl-train
bash integrations/openenv/run_openenv.sh

You can customize the training by setting environment variables:

ENV_NAME=echo_env NUM_GPUS=2 bash integrations/openenv/run_openenv.sh

Supported environments are: echo_env, coding_env, openspiel-env, atari-env, sumo-rl-env, finrl-env.

Example Reward Curve

Here's what the reward curve for the above example script looks like after a few steps:

[Figure: OpenEnv reward curve]

Tips

  • Docker Resources: Ensure sufficient Docker resources are available, especially for computationally intensive environments like Atari or OpenSpiel.
  • Generation Format: For dummy testing, the generation is expected to contain a single action wrapped in <action>...</action> tags (see the example after this list). Change _get_openenv_action in integrations/openenv/env.py for custom parsing logic.
  • Multi-Turn Interaction: Pass MAX_TURNS=xx to enable multi-turn interaction.
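
For example, with the default parser a generation for echo_env is expected to look like the following (illustrative response text):

# An LLM generation in the expected format: a single action in <action> tags.
response = "Echoing the requested text. <action>hello world</action>"
# The default parser extracts "hello world" and wraps it in the
# environment-specific action (an EchoAction for echo_env).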
