Environment

Core Classes

class `Env`

The main SkyRL Gym class for implementing reinforcement learning agent environments.

The main API methods that users of this class need to know are:

step - Performs an action (e.g. a tool call) in the environment. Returns the observations, the reward for taking that action, and a boolean value done.
init - Initializes the environment to an initial state; required before calling step. Returns the first observations for a turn and auxiliary information, e.g. metrics or debug info.
close - Closes the environment. Important when external software is used, e.g. pygame for rendering, or databases.

Functions:

Name	Description
`step`	Parse and run one step of action in the environment.
`init`	Initialize the environment, returning initial observation and optional metadata.
`close`	After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
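
The init, step, close lifecycle described above can be sketched with a toy environment. This is a minimal illustration, not SkyRL code: `CountdownEnv` and `run_episode` are hypothetical names, and the class is a plain Python class so the sketch runs without SkyRL installed. In real code it would presumably subclass `Env`.

```python
from typing import Any, Dict, Iterable, Tuple


class CountdownEnv:
    """Toy environment following the init/step/close contract (hypothetical)."""

    def __init__(self, start: int = 3) -> None:
        self.start = start
        self.remaining = start
        self.closed = False

    def init(self) -> Tuple[str, Dict[str, Any]]:
        # Reset to the initial state and return (observations, info).
        self.remaining = self.start
        return f"remaining={self.remaining}", {"start": self.start}

    def step(self, action: str) -> Dict[str, Any]:
        # Any action other than "tick" is ignored and earns no reward.
        ticked = action.strip() == "tick"
        if ticked:
            self.remaining -= 1
        return {
            "observations": f"remaining={self.remaining}",
            "reward": 1.0 if ticked else 0.0,
            "done": self.remaining <= 0,
            "metadata": {"ticked": ticked},
        }

    def close(self) -> None:
        # Safe to call more than once.
        self.closed = True


def run_episode(env: CountdownEnv, actions: Iterable[str]) -> float:
    """Drive one episode: init once, step until done, always close."""
    total = 0.0
    try:
        env.init()
        for action in actions:
            out = env.step(action)
            total += float(out["reward"])
            if out["done"]:
                break  # results of further step calls are undefined
    finally:
        env.close()
    return total
```

Stopping the loop as soon as `done` is true matters because the contract leaves later `step` calls undefined.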

Source code in skyrl-gym/skyrl_gym/core.py:19-97

class Env(Generic[ObsType, ActType]):
    """
    The main SkyRL Gym class for implementing Reinforcement Learning Agents environments.

    The main API methods that users of this class need to know are:

    - `step` - Perform actions (e.g. tool calls) in the environment.
        Return the observations, the reward for taking that actions, and a boolean value `done`.

    - `init` - Initializes the environment to an initial state, required before calling step.
        Returns the first observations for a turn and information, i.e. metrics, debug info.

    - `close` - Closes the environment.
        Important when external software is used, i.e. pygame for rendering, databases
    """

    def step(self, action: ActType) -> EnvStepOutput:
        """
        Parse and run one step of action in the environment.

        Args:
            action (ActType): An action provided to the environment.
                For example, in our case, the action can be a [str] response generated by an LLM,
                which must be parsed and executed accordingly.

        Returns:
            observations (ObsType): The resulting observations after executing the action.
                For example, this could involve executing a SQL query derived from the LLM response
                and observing {'role': 'user', 'content': 'str(observations)'} output or any error messages from database.

            reward (SupportsFloat): The reward obtained by taking the action.

            done (bool): A boolean value for if the episode has ended, in which case further `step` calls will
                return undefined results.

            info (Dict): Contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
                This might, for instance, contain: metrics that describe the performance state, variables that are
                hidden from observations, or individual reward terms that are combined to produce the total reward.
        """
        raise NotImplementedError

    def init(self, *kwargs) -> Tuple[ObsType, Dict[str, Any]]:
        """
        Initialize the environment, returning initial observation and optional metadata.

        Returns:
            observations (ObsType): Observations of the initial state. This is analogous to the observations returned by `step`.
            info (Dict): This dictionary contains auxiliary information complementing ``observation``. It should be analogous to
                the ``info`` returned by `step`.
        """
        raise NotImplementedError

    def close(self):
        """
        After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

        This is critical for closing rendering windows, database or HTTP connections.
        Calling ``close`` on an already closed environment has no effect and won't raise an error.
        """
        pass

    def __str__(self):
        """
        Returns a string of the environment.

        Returns:
            A string identifying the environment
        """
        return f"Env({type(self).__name__})"

    def __enter__(self):
        """Support with-statement for the environment."""
        return self

    def __exit__(self, *args: Any):
        """Support with-statement for the environment and closes the environment."""
        self.close()
        # propagate exception
        return False

method `step`

step(action: ActType) -> EnvStepOutput

Parse and run one step of action in the environment.

Parameters:

Name	Type	Description	Default
`action`	ActType	An action provided to the environment. For example, the action can be a `str` response generated by an LLM, which must be parsed and executed accordingly.	required

Returns:

Name	Type	Description
`observations`	ObsType	The resulting observations after executing the action. For example, this could involve executing a SQL query derived from the LLM response and returning the query output, or any error messages from the database, as {'role': 'user', 'content': 'str(observations)'}.
`reward`	SupportsFloat	The reward obtained by taking the action.
`done`	bool	Whether the episode has ended, in which case further `step` calls will return undefined results.
`info`	Dict	Auxiliary diagnostic information (helpful for debugging, learning, and logging). This might contain, for instance, metrics that describe the performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward. Note that `EnvStepOutput` as listed on this page names its auxiliary field `metadata`, not `info`.
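
As a rough sketch of the "parse, execute, package" flow that `step` describes, the example below extracts an answer from raw LLM text and builds a dictionary with the same keys as `EnvStepOutput`. The `<answer>` tag convention, the `StepOutput` stand-in type, the helper names, and the reward values are all assumptions made for illustration; none of them come from SkyRL.

```python
import re
from typing import Any, Dict, Optional, SupportsFloat, TypedDict


class StepOutput(TypedDict):
    """Stand-in with the same keys as EnvStepOutput (observations are str here)."""

    observations: str
    reward: SupportsFloat
    done: bool
    metadata: Optional[Dict[str, Any]]


# Hypothetical action format: the model wraps its answer in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_answer(action: str) -> Optional[str]:
    """Return the text inside the first <answer>...</answer> pair, if any."""
    match = ANSWER_RE.search(action)
    return match.group(1).strip() if match else None


def step(action: str, expected: str) -> StepOutput:
    """Parse the raw LLM text, score it, and package the result."""
    answer = parse_answer(action)
    if answer is None:
        # Malformed action: report the problem as an observation and continue.
        return StepOutput(
            observations="No <answer> tag found.",
            reward=0.0,
            done=False,
            metadata={"parse_error": True},
        )
    correct = answer == expected
    return StepOutput(
        observations="correct" if correct else "incorrect",
        reward=1.0 if correct else 0.0,
        done=True,
        metadata={"parse_error": False, "answer": answer},
    )
```

Returning the parse failure as an observation, rather than raising, lets a multi-turn caller feed the message back to the model for another attempt. Whether that is the right policy depends on the environment.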

Source code in skyrl-gym/skyrl_gym/core.py:35-58

    def step(self, action: ActType) -> EnvStepOutput:
        """
        Parse and run one step of action in the environment.

        Args:
            action (ActType): An action provided to the environment.
                For example, in our case, the action can be a [str] response generated by an LLM,
                which must be parsed and executed accordingly.

        Returns:
            observations (ObsType): The resulting observations after executing the action.
                For example, this could involve executing a SQL query derived from the LLM response
                and observing {'role': 'user', 'content': 'str(observations)'} output or any error messages from database.

            reward (SupportsFloat): The reward obtained by taking the action.

            done (bool): A boolean value for if the episode has ended, in which case further `step` calls will
                return undefined results.

            info (Dict): Contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
                This might, for instance, contain: metrics that describe the performance state, variables that are
                hidden from observations, or individual reward terms that are combined to produce the total reward.
        """
        raise NotImplementedError

method `init`

init(*kwargs) -> Tuple[ObsType, Dict[str, Any]]

Initialize the environment, returning initial observation and optional metadata.

Returns:

Name	Type	Description
`observations`	ObsType	Observations of the initial state. This is analogous to the observations returned by `step`.
`info`	Dict	This dictionary contains auxiliary information complementing `observations`. It should be analogous to the `info` returned by `step`.

Source code in skyrl-gym/skyrl_gym/core.py:60-69

    def init(self, *kwargs) -> Tuple[ObsType, Dict[str, Any]]:
        """
        Initialize the environment, returning initial observation and optional metadata.

        Returns:
            observations (ObsType): Observations of the initial state. This is analogous to the observations returned by `step`.
            info (Dict): This dictionary contains auxiliary information complementing ``observation``. It should be analogous to
                the ``info`` returned by `step`.
        """
        raise NotImplementedError

method `close`

close()

After the user has finished using the environment, `close` contains the code necessary to "clean up" the environment.

This is critical for closing rendering windows and database or HTTP connections. Calling `close` on an already closed environment has no effect and does not raise an error.

Source code in skyrl-gym/skyrl_gym/core.py:71-78

    def close(self):
        """
        After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

        This is critical for closing rendering windows, database or HTTP connections.
        Calling ``close`` on an already closed environment has no effect and won't raise an error.
        """
        pass
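
The idempotency requirement on `close`, together with the `__enter__`/`__exit__` methods shown in the `Env` listing, can be illustrated as follows. `ResourceEnv` is a hypothetical stand-in that copies the context-manager pattern, so the sketch runs without SkyRL. The `log` list exists only to make the call order observable.

```python
from typing import Any, List


class ResourceEnv:
    """Stand-in that mirrors the with-statement support of Env (hypothetical)."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self._open = True

    def close(self) -> None:
        # The guard makes repeated calls a no-op, as the close contract asks.
        if not self._open:
            return
        self._open = False
        self.log.append("closed")

    def __enter__(self) -> "ResourceEnv":
        return self

    def __exit__(self, *args: Any) -> bool:
        self.close()
        return False  # do not swallow exceptions raised in the with-body


def use_env_with_failure() -> List[str]:
    """Show that close() runs even when the with-body raises."""
    env = ResourceEnv()
    try:
        with env:
            raise RuntimeError("tool call failed")
    except RuntimeError:
        env.log.append("error propagated")
    env.close()  # second call has no effect
    return env.log
```

Because `__exit__` returns `False`, the exception still reaches the caller after cleanup has run.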

class `EnvStepOutput`

Bases: TypedDict

Attributes:

Name	Type	Description
`observations`	ObsType
`reward`	SupportsFloat
`done`	bool
`metadata`	Optional[Dict[str, Any]]

Source code in skyrl-gym/skyrl_gym/core.py:12-16

class EnvStepOutput(TypedDict):
    observations: ObsType
    reward: SupportsFloat
    done: bool
    metadata: Optional[Dict[str, Any]] = None

attr `observations`

observations: ObsType

attr `reward`

reward: SupportsFloat

attr `done`

done: bool

attr `metadata`

metadata: Optional[Dict[str, Any]] = None

Text Environment

class `BaseTextEnv`

BaseTextEnv()

Bases: Env[ConversationType, str]

Base environment class for all text-in / text-out environments. Supports tool-calling and multi-turn trajectories.

Exposes only step, init and close.

Input Types:

ObsType: ConversationType (tool output, LLM input)
ActType: str (LLM output)

Functions:

Name	Description
`init_tool_groups`	Initialize the tool groups for the environment.
`step`	Runs one environment step.
`init`	Return the first prompt to be given to the model and optional metadata.
`close`	Closes the environment, override if needed by subclasses.
`get_metrics`	Return environment-specific metrics for the episode.
`aggregate_metrics`	Static method to aggregate metrics across many episodes of this env class.

Attributes:

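Name	Type	Description
`turns`	int	Initialized to `0`.
`max_turns`	int	Initialized to `1`.
`tool_groups`	list	Initialized to `[]`.
`tool_to_toolgroup`	dict	Initialized to `{}`.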

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:17-100

class BaseTextEnv(Env[ConversationType, str]):
    """
    Base environment class for all text-in / text-out environments.
    Supports tool-calling and multi-turn trajectories.

    Exposes only `step`, `init` and `close`.

    Input Types:
        - ObsType: ConversationType (tool output, LLM input)
        - ActType: str (LLM output)
    """

    def __init__(self):
        super().__init__()

        # Metadata
        self.turns = 0
        self.max_turns = 1

        # Tool groups
        self.tool_groups = []
        self.tool_to_toolgroup = {}

    def init_tool_groups(self, tool_groups: List = []) -> None:
        """
        Initialize the tool groups for the environment.
        """
        # Find ToolGroup for a given tool
        self.tool_groups = tool_groups
        self.tool_to_toolgroup = {}
        for tool_group in self.tool_groups:
            self.tool_to_toolgroup.update(tool_group.get_tool_to_group_mapping())

    def _execute_tool(self, tool_group_name: str, tool_name: str, tool_input: Any) -> str:
        """
        Find the right ToolGroup and Tool and execute it.
        """
        for group in self.tool_groups:
            if group.name == tool_group_name:
                return group.execute_tool(tool_name, *tool_input)  # tool_input must be tuple or list

        raise ValueError(f"ToolGroup '{tool_group_name}' not found.")

    def step(self, action: str) -> BaseTextEnvStepOutput:
        """
        Runs one environment step.

        Return:
        - observations: [{"role": "user", "content": observation}]
        - reward: float
        - done: bool
        - postprocessed_action: Optional[str]
        - metadata: Dict[str, Any] any metadata
        """
        pass

    def init(self, prompt: ConversationType) -> Tuple[ConversationType, Dict[str, Any]]:
        """
        Return the first prompt to be given to the model and optional metadata.
        """
        return prompt, {}

    def close(self):
        """
        Closes the environment, override if needed by subclasses.
        """
        pass

    def get_metrics(self) -> Dict[str, Any]:
        """
        Return environment-specific metrics for the episode.
        Default is empty dict (no metrics).
        """
        return {}

    @staticmethod
    def aggregate_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, Any]:
        """
        Static method to aggregate metrics across many episodes of this env class.
        Default behavior: average the numerics, drop the non-numerics.
        """
        from skyrl_gym.metrics import default_aggregate_metrics

        return default_aggregate_metrics(metrics)

attr `turns`

turns = 0

attr `max_turns`

max_turns = 1

attr `tool_groups`

tool_groups = []

attr `tool_to_toolgroup`

tool_to_toolgroup = {}

method `init_tool_groups`

init_tool_groups(tool_groups: List = []) -> None

Initialize the tool groups for the environment.

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:40-48

    def init_tool_groups(self, tool_groups: List = []) -> None:
        """
        Initialize the tool groups for the environment.
        """
        # Find ToolGroup for a given tool
        self.tool_groups = tool_groups
        self.tool_to_toolgroup = {}
        for tool_group in self.tool_groups:
            self.tool_to_toolgroup.update(tool_group.get_tool_to_group_mapping())
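
To make the mapping step concrete, the sketch below builds a tool index and dispatches a call using the same loops as `init_tool_groups` and `_execute_tool`. `SimpleToolGroup` is a hypothetical stand-in that exposes only the three members those listings rely on (`name`, `get_tool_to_group_mapping`, `execute_tool`). That the mapping goes from tool name to group name is an assumption here, not something the listing confirms.

```python
from typing import Any, Callable, Dict, List, Sequence


class SimpleToolGroup:
    """Hypothetical stand-in for a tool group."""

    def __init__(self, name: str, tools: Dict[str, Callable[..., str]]) -> None:
        self.name = name
        self._tools = tools

    def get_tool_to_group_mapping(self) -> Dict[str, str]:
        # Assumed shape: each tool name maps to the name of its owning group.
        return {tool_name: self.name for tool_name in self._tools}

    def execute_tool(self, tool_name: str, *args: Any) -> str:
        return self._tools[tool_name](*args)


def build_tool_index(tool_groups: List[SimpleToolGroup]) -> Dict[str, str]:
    """Same merge loop as init_tool_groups, returning the mapping."""
    index: Dict[str, str] = {}
    for group in tool_groups:
        index.update(group.get_tool_to_group_mapping())
    return index


def execute(
    tool_groups: List[SimpleToolGroup],
    group_name: str,
    tool_name: str,
    tool_input: Sequence[Any],
) -> str:
    """Same lookup as _execute_tool; tool_input must be a tuple or list."""
    for group in tool_groups:
        if group.name == group_name:
            return group.execute_tool(tool_name, *tool_input)
    raise ValueError(f"ToolGroup '{group_name}' not found.")


groups = [
    SimpleToolGroup("math", {"add": lambda a, b: str(a + b)}),
    SimpleToolGroup("text", {"upper": lambda s: s.upper()}),
]
index = build_tool_index(groups)
result = execute(groups, index["add"], "add", (2, 3))
```

Because the signature in the listing uses a mutable default (`tool_groups: List = []`), environments that call `init_tool_groups()` without an argument end up sharing that one list object. Passing an explicit list avoids this.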

method `step`

step(action: str) -> BaseTextEnvStepOutput

Runs one environment step.

Returns:

Name	Type	Description
`observations`	ConversationType	[{"role": "user", "content": observation}]
`reward`	float
`done`	bool
`postprocessed_action`	Optional[str]
`metadata`	Dict[str, Any]	Any metadata.

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:60-71

    def step(self, action: str) -> BaseTextEnvStepOutput:
        """
        Runs one environment step.

        Return:
        - observations: [{"role": "user", "content": observation}]
        - reward: float
        - done: bool
        - postprocessed_action: Optional[str]
        - metadata: Dict[str, Any] any metadata
        """
        pass
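
Since the base `step` is a stub, subclasses supply the turn logic. The sketch below shows one plausible way to use `turns` and `max_turns` to end a multi-turn episode and to fill every key of `BaseTextEnvStepOutput`. `EchoQuizEnv`, `TextStepOutput`, and `Conversation` are hypothetical stand-ins defined here so the example runs on its own. Real code would presumably subclass `BaseTextEnv` instead.

```python
from typing import Any, Dict, List, Optional, Tuple, TypedDict

Conversation = List[Dict[str, str]]  # stand-in for ConversationType


class TextStepOutput(TypedDict):
    """Stand-in with the same keys as BaseTextEnvStepOutput."""

    observations: Conversation
    reward: float
    done: bool
    metadata: Dict[str, Any]
    postprocessed_action: Optional[str]


class EchoQuizEnv:
    """Hypothetical multi-turn text environment."""

    def __init__(self, answer: str, max_turns: int = 3) -> None:
        self.answer = answer
        self.turns = 0
        self.max_turns = max_turns

    def init(self, prompt: Conversation) -> Tuple[Conversation, Dict[str, Any]]:
        # Reset the turn counter and pass the prompt through unchanged.
        self.turns = 0
        return prompt, {}

    def step(self, action: str) -> TextStepOutput:
        self.turns += 1
        cleaned = action.strip()
        correct = cleaned == self.answer
        # The episode ends on a correct answer or when the turn budget runs out.
        done = correct or self.turns >= self.max_turns
        # Once the episode is over there is nothing further to show the model.
        observations: Conversation = (
            [] if done else [{"role": "user", "content": "Incorrect, try again."}]
        )
        return TextStepOutput(
            observations=observations,
            reward=1.0 if correct else 0.0,
            done=done,
            metadata={"turns": self.turns},
            postprocessed_action=cleaned,
        )
```

Returning an empty observation list on the final turn is a design choice of this sketch, not a requirement stated in the listing.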

method `init`

init(prompt: ConversationType) -> Tuple[ConversationType, Dict[str, Any]]

Return the first prompt to be given to the model and optional metadata.

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:73-77

    def init(self, prompt: ConversationType) -> Tuple[ConversationType, Dict[str, Any]]:
        """
        Return the first prompt to be given to the model and optional metadata.
        """
        return prompt, {}

method `close`

close()

Closes the environment, override if needed by subclasses.

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:79-83

    def close(self):
        """
        Closes the environment, override if needed by subclasses.
        """
        pass

method `get_metrics`

get_metrics() -> Dict[str, Any]

Return environment-specific metrics for the episode. Default is empty dict (no metrics).

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:85-90

    def get_metrics(self) -> Dict[str, Any]:
        """
        Return environment-specific metrics for the episode.
        Default is empty dict (no metrics).
        """
        return {}

method staticmethod `aggregate_metrics`

aggregate_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, Any]

Static method to aggregate metrics across many episodes of this env class. Default behavior: average the numerics, drop the non-numerics.

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:92-100

    @staticmethod
    def aggregate_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, Any]:
        """
        Static method to aggregate metrics across many episodes of this env class.
        Default behavior: average the numerics, drop the non-numerics.
        """
        from skyrl_gym.metrics import default_aggregate_metrics

        return default_aggregate_metrics(metrics)
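
The documented default, "average the numerics, drop the non-numerics", can be approximated as below. This is an independent sketch, not the implementation of `default_aggregate_metrics`. The function name is hypothetical, and treating `bool` values as non-numeric is an assumption that may differ from the library.

```python
from collections import defaultdict
from typing import Any, Dict, List


def average_numeric_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, float]:
    """Average int/float values per key across episodes; skip everything else.

    Assumption: bools are skipped even though bool is a subclass of int.
    """
    values: Dict[str, List[float]] = defaultdict(list)
    for episode in metrics:
        for key, value in episode.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                values[key].append(float(value))
    # Each key is averaged over the episodes that actually reported it.
    return {key: sum(vals) / len(vals) for key, vals in values.items()}
```

A subclass that needs different behavior, for example summing a counter instead of averaging it, could override `aggregate_metrics` with its own static method.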

class `BaseTextEnvStepOutput`

Bases: TypedDict

Attributes:

Name	Type	Description
`observations`	ConversationType
`reward`	float
`done`	bool
`metadata`	Dict[str, Any]
`postprocessed_action`	Optional[str]

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:9-14

class BaseTextEnvStepOutput(TypedDict):
    observations: ConversationType  # OpenAI API Messages Format
    reward: float
    done: bool
    metadata: Dict[str, Any]
    postprocessed_action: Optional[str] = None

attr `observations`

observations: ConversationType

attr `reward`

reward: float

attr `done`

done: bool

attr `metadata`

metadata: Dict[str, Any]

attr `postprocessed_action`

postprocessed_action: Optional[str] = None
