# SkyRL + OpenEnv: Training an RL Agent in OpenEnv
In this example, we walk through how to train a reinforcement learning agent using SkyRL with OpenEnv environments. OpenEnv provides isolated execution environments for agentic RL training with Gymnasium-style APIs.
## How does it work?
SkyRL integrates easily with any Gymnasium-style environment through SkyRL-Gym, which provides a simple interface for text-based environments called `BaseTextEnv`. We integrate OpenEnv environments through a custom environment wrapper, `OpenEnv`, that implements the `BaseTextEnv` interface. This wrapper allows SkyRL to interact with various OpenEnv environments (see the mapping sketch after this list), including:
- Echo Environment: Simple echo environment for testing
- Coding Environment: Python code execution in a sandboxed environment
- OpenSpiel Environment: Game environments using OpenSpiel
- Atari Environment: Classic Atari game environments
- SUMO-RL Environment: Traffic simulation environments
- FinRL Environment: Financial trading environments
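Concretely, `_get_env_class` (used in the wrapper below) can be as simple as a lookup table from the configured environment name to the corresponding OpenEnv client class. A hypothetical sketch; the import paths and class names are assumptions based on OpenEnv's per-environment clients:

```python
# Hypothetical name -> OpenEnv client mapping behind _get_env_class.
# Import paths and class names are assumptions based on OpenEnv's layout.
from envs.echo_env import EchoEnv
from envs.coding_env import CodingEnv

OPENENV_CLIENTS = {
    "echo_env": EchoEnv,
    "coding_env": CodingEnv,
    # ... plus the openspiel-env, atari-env, sumo-rl-env, finrl-env clients
}

def _get_env_class(env_name: str):
    """Resolve the configured environment name to an OpenEnv client class."""
    return OPENENV_CLIENTS[env_name]
```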
The integration works by:
- Environment Registration: The OpenEnv environment is registered dynamically in the entrypoint using `register()` from `skyrl_gym.envs`
- Environment Initialization: SkyRL creates an OpenEnv client using `from_docker_image()` to connect to the appropriate Docker container
- Action Parsing: LLM responses are parsed into environment-specific actions (e.g., `EchoAction`, `CodeAction`); see the sketch after this list
- Step Execution: Actions are executed in the isolated environment and observations/rewards are returned
- Episode Management: The environment tracks conversation history and manages episode termination
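To make the action-parsing step concrete, here is a minimal sketch of what `_get_openenv_action` can reduce to for the echo and coding environments. The `<action>...</action>` convention comes from the Tips section below; the `message`/`code` field names and import paths are assumptions based on OpenEnv's examples:

```python
import re

# Action dataclasses ship with the OpenEnv clients; the import paths and
# field names below follow OpenEnv's examples and are assumptions.
from envs.echo_env import EchoAction
from envs.coding_env import CodeAction

def parse_action(env_name: str, response: str):
    """Map a raw LLM response onto an environment-specific OpenEnv action."""
    # Pull out the text between <action>...</action>; fall back to the raw response.
    match = re.search(r"<action>(.*?)</action>", response, re.DOTALL)
    content = match.group(1).strip() if match else response.strip()
    if env_name == "echo_env":
        return EchoAction(message=content)
    if env_name == "coding_env":
        return CodeAction(code=content)
    raise ValueError(f"No action parser for environment: {env_name}")
```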
At a high level, the integration looks as follows:
```python
from typing import Any, Dict

from omegaconf import DictConfig
from skyrl_gym.envs.base_text_env import BaseTextEnv, BaseTextEnvStepOutput


# The OpenEnv wrapper class
class OpenEnv(BaseTextEnv):
    def __init__(self, env_config: DictConfig, extras: Dict[str, Any] = {}):
        super().__init__()
        self.env_name = extras["env_name"]
        self.env_type = self._get_env_class(self.env_name)
        self.env = self.env_type.from_docker_image(self.env_name + ":latest")
        self.initial_step_result = self.env.reset()

    def step(self, action: str) -> BaseTextEnvStepOutput:
        action = self._get_openenv_action(self.env_name, action)
        result = self.env.step(action)
        # Process result and return observations, reward, done
        ...
```

Finally, we also register the new environment in the entrypoint script:
```python
# In integrations/openenv/entrypoints/main_openenv.py
from skyrl_gym.envs import register

register(
    id="openenv",
    entry_point="integrations.openenv.env:OpenEnv",
)
```

## Environment Setup
Prerequisites: Ensure that you have Docker installed
First, we need to install the OpenEnv environments:
```bash
# Execute from skyrl-train directory
cd SkyRL/skyrl-train
uv run integrations/openenv/install_environment.py echo-env

# Or install all environments:
# uv run integrations/openenv/install_environment.py
```

This will pull the necessary Docker images for the OpenEnv environments.
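Once the image is pulled, you can optionally sanity-check it outside of training by talking to the environment with the OpenEnv client directly. A rough sketch against the echo environment; the `message` field and the `close()` call follow OpenEnv's echo example and should be treated as assumptions:

```python
# Optional smoke test: round-trip one action through the echo environment.
from envs.echo_env import EchoEnv, EchoAction

env = EchoEnv.from_docker_image("echo-env:latest")
result = env.reset()
print(result.observation)

result = env.step(EchoAction(message="hello, world"))
print(result.observation, result.reward, result.done)

env.close()  # stop the container when done
```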
## Dataset Preparation
For training, we use simple example datasets generated by the `prepare_dummy_dataset.py` script:
```bash
# Execute from skyrl-train directory
cd SkyRL/skyrl-train
uv run integrations/openenv/prepare_dummy_dataset.py --output_dir ~/data/openenv --env_name echo_env
```

This creates training and validation datasets with example prompts for the specified environment. We provide dummy train set examples for `echo_env` and `coding_env`.
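If you later want to substitute your own data, it helps to know roughly what `prepare_dummy_dataset.py` writes. Below is a hedged sketch that builds a similar Parquet file by hand; the column names (`prompt`, `env_class`, `extra_info`) are assumptions based on SkyRL's standard dataset format, so treat the script itself as the authoritative reference:

```python
# Hypothetical recreation of the dummy dataset layout; column names are
# assumptions and may differ from what prepare_dummy_dataset.py emits.
import os
import pandas as pd

rows = [{
    "data_source": "openenv/echo_env",
    # Chat-formatted prompt consumed by the SkyRL generator
    "prompt": [{"role": "user", "content": "Echo the word 'hello' back to me."}],
    # Must match the id passed to register() in the entrypoint
    "env_class": "openenv",
    # Per-sample extras forwarded to the OpenEnv wrapper
    "extra_info": {"env_name": "echo_env"},
}]

out_dir = os.path.expanduser("~/data/openenv")
os.makedirs(out_dir, exist_ok=True)
pd.DataFrame(rows).to_parquet(os.path.join(out_dir, "train.parquet"))
```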
## Training
We provide an example training script for Qwen2.5-1.5B-Instruct on OpenEnv environments:
```bash
# Execute from skyrl-train directory
cd SkyRL/skyrl-train
bash integrations/openenv/run_openenv.sh
```

You can customize the training run by setting environment variables:

```bash
ENV_NAME=echo_env NUM_GPUS=2 bash integrations/openenv/run_openenv.sh
```

Supported environments are: `echo_env`, `coding_env`, `openspiel-env`, `atari-env`, `sumo-rl-env`, `finrl-env`.
## Example Reward Curve
Here's what the reward curve for the above example script looks like after a few steps:

## Tips
- Docker Resources: Ensure sufficient Docker resources are available, especially for computationally intensive environments like Atari or OpenSpiel.
- Generation Format: The generation format is expected to be a single action wrapped in `<action>...</action>` tags for dummy testing. Change `_get_openenv_action` in `integrations/openenv/env.py` for custom parsing logic.
- Multi-Turn Interaction: Pass `MAX_TURNS=xx` to enable multi-turn interaction (see the sketch below).
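To make the multi-turn tip concrete, here is a minimal sketch of the termination logic the wrapper's `step` can implement once a turn budget is set. The `self.turns`/`self.max_turns` bookkeeping mirrors `BaseTextEnv`, and the `BaseTextEnvStepOutput` fields shown are assumptions:

```python
def step(self, action: str) -> BaseTextEnvStepOutput:
    self.turns += 1
    result = self.env.step(self._get_openenv_action(self.env_name, action))
    # Terminate when the environment is done or the turn budget is exhausted.
    done = result.done or self.turns >= self.max_turns
    return BaseTextEnvStepOutput(
        # Feed the environment's observation back to the model as a user turn.
        observations=[{"role": "user", "content": str(result.observation)}],
        reward=result.reward or 0.0,
        done=done,
        metadata={},
    )
```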