Environment
Environment API — Env, BaseTextEnv, step outputs.
Core Classes
class Env
Bases: Generic[ObsType, ActType]
The main SkyRL Gym class for implementing reinforcement learning agent environments.
The main API methods that users of this class need to know are:
- `step`: Perform actions (e.g. tool calls) in the environment. Returns the observations, the reward for taking that action, and a boolean value `done`.
- `init`: Initializes the environment to an initial state; required before calling `step`. Returns the first observations for a turn and auxiliary information, e.g. metrics and debug info.
- `close`: Closes the environment. Important when external software is used, e.g. pygame for rendering, or databases.
Functions:
| Name | Description |
|---|---|
| step | Parse and run one step of action in the environment. |
| init | Initialize the environment, returning initial observation and optional metadata. |
| close | After the user has finished using the environment, close contains the code necessary to "clean up" the environment. |
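The three methods above can be sketched in a small subclass. This is an illustration only: `Env` below is a local stand-in that mirrors the interface documented on this page so the snippet runs without SkyRL Gym installed, and `CountdownEnv` with its reward values is hypothetical.

```python
from typing import Any, Dict, Tuple


class Env:
    """Local stand-in for the documented Env interface (assumption, not the real class)."""

    def step(self, action):
        raise NotImplementedError

    def init(self, *kwargs):
        raise NotImplementedError

    def close(self):
        pass

    def __enter__(self):
        return self

    def __exit__(self, *args: Any):
        # Close on exit and let any exception propagate.
        self.close()
        return False


class CountdownEnv(Env):
    """Hypothetical environment: the episode ends after `n` steps."""

    def __init__(self, n: int = 2):
        self.remaining = n
        self.closed = False

    def init(self, *kwargs) -> Tuple[str, Dict[str, Any]]:
        return f"{self.remaining} steps left", {}

    def step(self, action: str) -> Dict[str, Any]:
        self.remaining -= 1
        done = self.remaining <= 0
        return {
            "observations": f"{self.remaining} steps left",
            "reward": 1.0 if done else 0.0,
            "done": done,
            "metadata": {"last_action": action},
        }

    def close(self):
        self.closed = True


# The with-statement calls close() automatically on exit.
with CountdownEnv(n=2) as env:
    obs, info = env.init()
    first = env.step("a")
    second = env.step("b")
```

Using the environment as a context manager means `close` runs even if a `step` call raises.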
Source code in skyrl-gym/skyrl_gym/core.py:19-97
```python
class Env(Generic[ObsType, ActType]):
    """
    The main SkyRL Gym class for implementing Reinforcement Learning Agents environments.

    The main API methods that users of this class need to know are:

    - `step` - Perform actions (e.g. tool calls) in the environment.
        Return the observations, the reward for taking that actions, and a boolean value `done`.
    - `init` - Initializes the environment to an initial state, required before calling step.
        Returns the first observations for a turn and information, i.e. metrics, debug info.
    - `close` - Closes the environment.
        Important when external software is used, i.e. pygame for rendering, databases
    """

    def step(self, action: ActType) -> EnvStepOutput:
        """
        Parse and run one step of action in the environment.

        Args:
            action (ActType): An action provided to the environment.
                For example, in our case, the action can be a [str] response generated by an LLM,
                which must be parsed and executed accordingly.

        Returns:
            observations (ObsType): The resulting observations after executing the action.
                For example, this could involve executing a SQL query derived from the LLM response
                and observing {'role': 'user', 'content': 'str(observations)'} output or any error messages from database.
            reward (SupportsFloat): The reward obtained by taking the action.
            done (bool): A boolean value for if the episode has ended, in which case further `step` calls will
                return undefined results.
            info (Dict): Contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
                This might, for instance, contain: metrics that describe the performance state, variables that are
                hidden from observations, or individual reward terms that are combined to produce the total reward.
        """
        raise NotImplementedError

    def init(self, *kwargs) -> Tuple[ObsType, Dict[str, Any]]:
        """
        Initialize the environment, returning initial observation and optional metadata.

        Returns:
            observations (ObsType): Observations of the initial state. This is analogous to the observations returned by `step`.
            info (Dict): This dictionary contains auxiliary information complementing ``observation``. It should be analogous to
                the ``info`` returned by `step`.
        """
        raise NotImplementedError

    def close(self):
        """
        After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
        This is critical for closing rendering windows, database or HTTP connections.
        Calling ``close`` on an already closed environment has no effect and won't raise an error.
        """
        pass

    def __str__(self):
        """
        Returns a string of the environment.

        Returns:
            A string identifying the environment
        """
        return f"Env({type(self).__name__})"

    def __enter__(self):
        """Support with-statement for the environment."""
        return self

    def __exit__(self, *args: Any):
        """Support with-statement for the environment and closes the environment."""
        self.close()
        # propagate exception
        return False
```
method step
step(action: ActType) -> EnvStepOutput
Parse and run one step of action in the environment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| action | ActType | An action provided to the environment. For example, in our case, the action can be a [str] response generated by an LLM, which must be parsed and executed accordingly. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| observations | ObsType | The resulting observations after executing the action. For example, this could involve executing a SQL query derived from the LLM response and observing the output as {'role': 'user', 'content': 'str(observations)'}, or any error messages from the database. |
| reward | SupportsFloat | The reward obtained by taking the action. |
| done | bool | Whether the episode has ended, in which case further step calls return undefined results. |
| info | Dict | Contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain metrics that describe the performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward. |
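Because results are undefined once `done` is true, a driver loop should stop calling `step` at that point. The following is a minimal sketch of such a loop, assuming the step output is a dict with `observations`, `reward`, `done` and `metadata` keys; `EchoEnv`, `rollout` and the policy callable are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Dict, List


class EchoEnv:
    """Hypothetical environment that finishes when the action is "stop"."""

    def init(self):
        return "say something", {}

    def step(self, action: str) -> Dict[str, Any]:
        done = action == "stop"
        return {
            "observations": f"you said: {action}",
            "reward": 1.0 if done else 0.0,
            "done": done,
            "metadata": {},
        }


def rollout(env, policy: Callable[[str], str], max_steps: int = 10) -> List[float]:
    """Call step until done is True, so step is never called after the episode ends."""
    obs, _info = env.init()
    rewards = []
    for _ in range(max_steps):
        out = env.step(policy(obs))
        rewards.append(float(out["reward"]))
        obs = out["observations"]
        if out["done"]:
            break
    return rewards


# A scripted policy: the last action is never consumed because the loop stops at "stop".
actions = iter(["hello", "again", "stop", "never reached"])
rewards = rollout(EchoEnv(), lambda obs: next(actions))
```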
Source code in skyrl-gym/skyrl_gym/core.py:35-58
```python
def step(self, action: ActType) -> EnvStepOutput:
    """
    Parse and run one step of action in the environment.

    Args:
        action (ActType): An action provided to the environment.
            For example, in our case, the action can be a [str] response generated by an LLM,
            which must be parsed and executed accordingly.

    Returns:
        observations (ObsType): The resulting observations after executing the action.
            For example, this could involve executing a SQL query derived from the LLM response
            and observing {'role': 'user', 'content': 'str(observations)'} output or any error messages from database.
        reward (SupportsFloat): The reward obtained by taking the action.
        done (bool): A boolean value for if the episode has ended, in which case further `step` calls will
            return undefined results.
        info (Dict): Contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
            This might, for instance, contain: metrics that describe the performance state, variables that are
            hidden from observations, or individual reward terms that are combined to produce the total reward.
    """
    raise NotImplementedError
```
method init
init(*kwargs) -> Tuple[ObsType, Dict[str, Any]]
Initialize the environment, returning initial observation and optional metadata.
Returns:
| Name | Type | Description |
|---|---|---|
| observations | ObsType | Observations of the initial state. This is analogous to the observations returned by step. |
| info | Dict | This dictionary contains auxiliary information complementing observation. It should be analogous to the info returned by step. |
Source code in skyrl-gym/skyrl_gym/core.py:60-69
```python
def init(self, *kwargs) -> Tuple[ObsType, Dict[str, Any]]:
    """
    Initialize the environment, returning initial observation and optional metadata.

    Returns:
        observations (ObsType): Observations of the initial state. This is analogous to the observations returned by `step`.
        info (Dict): This dictionary contains auxiliary information complementing ``observation``. It should be analogous to
            the ``info`` returned by `step`.
    """
    raise NotImplementedError
```
method close
close()
After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
This is critical for closing rendering windows, database or HTTP connections.
Calling close on an already closed environment has no effect and won't raise an error.
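The base implementation of `close` does nothing, so a subclass that owns a resource has to provide the "no effect on a second call" behavior itself. One possible way to do that is a simple guard flag, sketched below; `FakeConnection` and `ConnectionEnv` are hypothetical and stand in for a real database or HTTP client.

```python
class FakeConnection:
    """Hypothetical stand-in for a database or HTTP connection."""

    def __init__(self):
        self.close_calls = 0

    def close(self):
        self.close_calls += 1


class ConnectionEnv:
    """Hypothetical environment that owns an external resource."""

    def __init__(self):
        self.conn = FakeConnection()
        self._closed = False

    def close(self):
        # Guard so a second close() is a no-op rather than an error.
        if self._closed:
            return
        self.conn.close()
        self._closed = True


env = ConnectionEnv()
env.close()
env.close()  # safe: the underlying connection is only closed once
```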
Source code in skyrl-gym/skyrl_gym/core.py:71-78
```python
def close(self):
    """
    After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
    This is critical for closing rendering windows, database or HTTP connections.
    Calling ``close`` on an already closed environment has no effect and won't raise an error.
    """
    pass
```
class EnvStepOutput
Bases: TypedDict
Attributes:
| Name | Type | Description |
|---|---|---|
| observations | ObsType | |
| reward | SupportsFloat | |
| done | bool | |
| metadata | Optional[Dict[str, Any]] | |
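At runtime a `TypedDict` value is an ordinary `dict`, so a step output can be built and read with normal dictionary syntax. The sketch below uses a local `StepOutput` type that mirrors the attribute names in the table above; it is not the library's class, and the `str` observation type is an assumption made for illustration.

```python
from typing import Any, Dict, Optional, SupportsFloat, TypedDict


class StepOutput(TypedDict):
    """Local mirror of the EnvStepOutput keys (assumption: str observations)."""

    observations: str
    reward: SupportsFloat
    done: bool
    metadata: Optional[Dict[str, Any]]


# Construct with keyword arguments; the result is a plain dict.
out = StepOutput(observations="ok", reward=0.5, done=False, metadata=None)
reward = float(out["reward"])
```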
Source code in skyrl-gym/skyrl_gym/core.py:12-16
```python
class EnvStepOutput(TypedDict):
    observations: ObsType
    reward: SupportsFloat
    done: bool
    metadata: Optional[Dict[str, Any]] = None
```
attr observations
observations: ObsType
attr reward
reward: SupportsFloat
attr done
done: bool
attr metadata
metadata: Optional[Dict[str, Any]] = None
Text Environment
class BaseTextEnv
BaseTextEnv()
Bases: Env[ConversationType, str]
Base environment class for all text-in / text-out environments. Supports tool-calling and multi-turn trajectories.
Exposes only step, init and close.
Input Types:
- ObsType: ConversationType (tool output, LLM input)
- ActType: str (LLM output)
Functions:
| Name | Description |
|---|---|
| init_tool_groups | Initialize the tool groups for the environment. |
| step | Runs one environment step. |
| init | Return the first prompt to be given to the model and optional metadata. |
| close | Closes the environment, override if needed by subclasses. |
| get_metrics | Return environment-specific metrics for the episode. |
| aggregate_metrics | Static method to aggregate metrics across many episodes of this env class. |
Attributes:
| Name | Type | Description |
|---|---|---|
| turns | | |
| max_turns | | |
| tool_groups | | |
| tool_to_toolgroup | | |
Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:17-100
```python
class BaseTextEnv(Env[ConversationType, str]):
    """
    Base environment class for all text-in / text-out environments.
    Supports tool-calling and multi-turn trajectories.

    Exposes only `step`, `init` and `close`.

    Input Types:
        - ObsType: ConversationType (tool output, LLM input)
        - ActType: str (LLM output)
    """

    def __init__(self):
        super().__init__()

        # Metadata
        self.turns = 0
        self.max_turns = 1

        # Tool groups
        self.tool_groups = []
        self.tool_to_toolgroup = {}

    def init_tool_groups(self, tool_groups: List = []) -> None:
        """
        Initialize the tool groups for the environment.
        """
        # Find ToolGroup for a given tool
        self.tool_groups = tool_groups
        self.tool_to_toolgroup = {}
        for tool_group in self.tool_groups:
            self.tool_to_toolgroup.update(tool_group.get_tool_to_group_mapping())

    def _execute_tool(self, tool_group_name: str, tool_name: str, tool_input: Any) -> str:
        """
        Find the right ToolGroup and Tool and execute it.
        """
        for group in self.tool_groups:
            if group.name == tool_group_name:
                return group.execute_tool(tool_name, *tool_input)  # tool_input must be tuple or list

        raise ValueError(f"ToolGroup '{tool_group_name}' not found.")

    def step(self, action: str) -> BaseTextEnvStepOutput:
        """
        Runs one environment step.

        Return:
            - observations: [{"role": "user", "content": observation}]
            - reward: float
            - done: bool
            - postprocessed_action: Optional[str]
            - metadata: Dict[str, Any] any metadata
        """
        pass

    def init(self, prompt: ConversationType) -> Tuple[ConversationType, Dict[str, Any]]:
        """
        Return the first prompt to be given to the model and optional metadata.
        """
        return prompt, {}

    def close(self):
        """
        Closes the environment, override if needed by subclasses.
        """
        pass

    def get_metrics(self) -> Dict[str, Any]:
        """
        Return environment-specific metrics for the episode.
        Default is empty dict (no metrics).
        """
        return {}

    @staticmethod
    def aggregate_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, Any]:
        """
        Static method to aggregate metrics across many episodes of this env class.
        Default behavior: average the numerics, drop the non-numerics.
        """
        from skyrl_gym.metrics import default_aggregate_metrics

        return default_aggregate_metrics(metrics)
```
attr turns
turns = 0
attr max_turns
max_turns = 1
attr tool_groups
tool_groups = []
attr tool_to_toolgroup
tool_to_toolgroup = {}
method init_tool_groups
init_tool_groups(tool_groups: List = []) -> None
Initialize the tool groups for the environment.
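The BaseTextEnv source listing above only requires three things of a tool group: a `name` attribute, `get_tool_to_group_mapping()` and `execute_tool(tool_name, *args)`. The sketch below shows how registration and dispatch fit together under those requirements. `MathToolGroup` and `ToolEnv` are hypothetical stand-ins, not SkyRL Gym classes, and the assumption that the mapping goes from tool name to group name is mine; the real mapping format may differ.

```python
from typing import Any, Dict, List


class MathToolGroup:
    """Hypothetical tool group exposing the three members used by BaseTextEnv."""

    name = "math"

    def get_tool_to_group_mapping(self) -> Dict[str, str]:
        # Assumption: maps each tool name to the name of its group.
        return {"add": self.name, "mul": self.name}

    def execute_tool(self, tool_name: str, *args: Any) -> str:
        a, b = args
        return str(a + b if tool_name == "add" else a * b)


class ToolEnv:
    """Stand-in that copies the tool-group logic from the source listing."""

    def __init__(self):
        self.tool_groups: List[Any] = []
        self.tool_to_toolgroup: Dict[str, str] = {}

    def init_tool_groups(self, tool_groups: List[Any]) -> None:
        self.tool_groups = tool_groups
        self.tool_to_toolgroup = {}
        for group in self.tool_groups:
            self.tool_to_toolgroup.update(group.get_tool_to_group_mapping())

    def execute_tool(self, tool_group_name: str, tool_name: str, tool_input: Any) -> str:
        for group in self.tool_groups:
            if group.name == tool_group_name:
                # tool_input is unpacked, so it must be a tuple or list.
                return group.execute_tool(tool_name, *tool_input)
        raise ValueError(f"ToolGroup '{tool_group_name}' not found.")


env = ToolEnv()
env.init_tool_groups([MathToolGroup()])
group_name = env.tool_to_toolgroup["add"]
result = env.execute_tool(group_name, "add", (2, 3))
```

Passing the list explicitly, as done here, also avoids sharing the mutable `[]` default between environments.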
Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:40-48
```python
def init_tool_groups(self, tool_groups: List = []) -> None:
    """
    Initialize the tool groups for the environment.
    """
    # Find ToolGroup for a given tool
    self.tool_groups = tool_groups
    self.tool_to_toolgroup = {}
    for tool_group in self.tool_groups:
        self.tool_to_toolgroup.update(tool_group.get_tool_to_group_mapping())
```
method step
step(action: str) -> BaseTextEnvStepOutput
Runs one environment step.
Return:
- observations: [{"role": "user", "content": observation}]
- reward: float
- done: bool
- postprocessed_action: Optional[str]
- metadata: Dict[str, Any] any metadata
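The base `step` is a placeholder, so each text environment supplies its own. A minimal sketch of what a multi-turn implementation could return is shown below, using the five keys listed above and the `turns` / `max_turns` counters. `GuessEnv` is hypothetical and deliberately does not inherit from BaseTextEnv so that the snippet runs on its own; the feedback message and reward values are illustrative choices.

```python
from typing import Any, Dict, List

Conversation = List[Dict[str, str]]  # assumption: OpenAI-style message dicts


class GuessEnv:
    """Hypothetical multi-turn text environment; not a real SkyRL Gym class."""

    def __init__(self, answer: str, max_turns: int = 3):
        self.answer = answer
        self.turns = 0
        self.max_turns = max_turns

    def step(self, action: str) -> Dict[str, Any]:
        self.turns += 1
        guess = action.strip()
        correct = guess == self.answer
        # The episode ends on a correct guess or when the turn budget is spent.
        done = correct or self.turns >= self.max_turns
        # No follow-up message once the episode is over.
        observations: Conversation = (
            [] if done else [{"role": "user", "content": "Wrong, try again."}]
        )
        return {
            "observations": observations,
            "reward": 1.0 if correct else 0.0,
            "done": done,
            "metadata": {"turns": self.turns},
            "postprocessed_action": guess,
        }


env = GuessEnv(answer="42")
first = env.step(" 7 ")
second = env.step("42\n")
```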
Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:60-71
```python
def step(self, action: str) -> BaseTextEnvStepOutput:
    """
    Runs one environment step.

    Return:
        - observations: [{"role": "user", "content": observation}]
        - reward: float
        - done: bool
        - postprocessed_action: Optional[str]
        - metadata: Dict[str, Any] any metadata
    """
    pass
```
method init
init(prompt: ConversationType) -> Tuple[ConversationType, Dict[str, Any]]
Return the first prompt to be given to the model and optional metadata.
Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:73-77
```python
def init(self, prompt: ConversationType) -> Tuple[ConversationType, Dict[str, Any]]:
    """
    Return the first prompt to be given to the model and optional metadata.
    """
    return prompt, {}
```
method close
close()
Closes the environment, override if needed by subclasses.
Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:79-83
```python
def close(self):
    """
    Closes the environment, override if needed by subclasses.
    """
    pass
```
method get_metrics
get_metrics() -> Dict[str, Any]
Return environment-specific metrics for the episode. Default is empty dict (no metrics).
Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:85-90
```python
def get_metrics(self) -> Dict[str, Any]:
    """
    Return environment-specific metrics for the episode.
    Default is empty dict (no metrics).
    """
    return {}
```
method staticmethod aggregate_metrics
aggregate_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, Any]
Static method to aggregate metrics across many episodes of this env class. Default behavior: average the numerics, drop the non-numerics.
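To make "average the numerics, drop the non-numerics" concrete, here is one way such an aggregation could be written. This is not the library's `default_aggregate_metrics`, whose implementation is not shown on this page; `average_numeric_metrics` is a hypothetical function, and treating booleans as non-numeric is a choice made for this sketch.

```python
from typing import Any, Dict, List


def average_numeric_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, float]:
    """Average int/float values per key across episodes; skip everything else."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for episode in metrics:
        for key, value in episode.items():
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            sums[key] = sums.get(key, 0.0) + float(value)
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


episodes = [
    {"turns": 2, "score": 1.0, "tool": "search"},
    {"turns": 4, "score": 0.0, "tool": "sql"},
]
aggregated = average_numeric_metrics(episodes)
```

A subclass that needs different behavior, for example a maximum instead of a mean, can override `aggregate_metrics` with its own static method.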
Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:92-100
```python
@staticmethod
def aggregate_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, Any]:
    """
    Static method to aggregate metrics across many episodes of this env class.
    Default behavior: average the numerics, drop the non-numerics.
    """
    from skyrl_gym.metrics import default_aggregate_metrics

    return default_aggregate_metrics(metrics)
```
class BaseTextEnvStepOutput
Bases: TypedDict
Attributes:
| Name | Type | Description |
|---|---|---|
| observations | ConversationType | |
| reward | float | |
| done | bool | |
| metadata | Dict[str, Any] | |
| postprocessed_action | Optional[str] | |
Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:9-14
```python
class BaseTextEnvStepOutput(TypedDict):
    observations: ConversationType  # OpenAI API Messages Format
    reward: float
    done: bool
    metadata: Dict[str, Any]
    postprocessed_action: Optional[str] = None
```
attr observations
observations: ConversationType
attr reward
reward: float
attr done
done: bool
attr metadata
metadata: Dict[str, Any]
attr postprocessed_action
postprocessed_action: Optional[str] = None