Environment

Environment API — Env, BaseTextEnv, step outputs.

Core Classes

class Env

Bases: Generic[ObsType, ActType]

The main SkyRL Gym class for implementing reinforcement learning agent environments.

The main API methods that users of this class need to know are:

  • step - Perform actions (e.g. tool calls) in the environment. Returns the observations, the reward for taking those actions, and a boolean value done.

  • init - Initializes the environment to an initial state; required before calling step. Returns the first observations for a turn and auxiliary information, e.g. metrics or debug info.

  • close - Closes the environment. Important when external resources are used, e.g. pygame for rendering, or databases.

Functions:

| Name | Description |
| --- | --- |
| step | Parse and run one step of action in the environment. |
| init | Initialize the environment, returning initial observation and optional metadata. |
| close | After the user has finished using the environment, close contains the code necessary to "clean up" the environment. |

Source code in skyrl-gym/skyrl_gym/core.py:19-97
class Env(Generic[ObsType, ActType]):
    """
    The main SkyRL Gym class for implementing reinforcement learning agent environments.

    The main API methods that users of this class need to know are:

    - `step` - Perform actions (e.g. tool calls) in the environment.
        Returns the observations, the reward for taking those actions, and a boolean value `done`.

    - `init` - Initializes the environment to an initial state; required before calling step.
        Returns the first observations for a turn and auxiliary information, e.g. metrics or debug info.

    - `close` - Closes the environment.
        Important when external resources are used, e.g. pygame for rendering, or databases.
    """

    def step(self, action: ActType) -> EnvStepOutput:
        """
        Parse and run one step of action in the environment.

        Args:
            action (ActType): An action provided to the environment.
                For example, the action can be a `str` response generated by an LLM,
                which must be parsed and executed accordingly.

        Returns:
            observations (ObsType): The resulting observations after executing the action.
                For example, this could involve executing a SQL query derived from the LLM response
                and observing the output as {'role': 'user', 'content': str(observations)}, or any error messages from the database.

            reward (SupportsFloat): The reward obtained by taking the action.

            done (bool): Whether the episode has ended, in which case further `step` calls will
                return undefined results.

            info (Dict): Contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
                This might, for instance, contain: metrics that describe the performance state, variables that are
                hidden from observations, or individual reward terms that are combined to produce the total reward.
        """
        raise NotImplementedError

    def init(self, *kwargs) -> Tuple[ObsType, Dict[str, Any]]:
        """
        Initialize the environment, returning initial observation and optional metadata.

        Returns:
            observations (ObsType): Observations of the initial state. This is analogous to the observations returned by `step`.
            info (Dict): This dictionary contains auxiliary information complementing ``observation``. It should be analogous to
                the ``info`` returned by `step`.
        """
        raise NotImplementedError

    def close(self):
        """
        After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

        This is critical for closing rendering windows, database or HTTP connections.
        Calling ``close`` on an already closed environment has no effect and won't raise an error.
        """
        pass

    def __str__(self):
        """
        Returns a string of the environment.

        Returns:
            A string identifying the environment
        """
        return f"Env({type(self).__name__})"

    def __enter__(self):
        """Support with-statement for the environment."""
        return self

    def __exit__(self, *args: Any):
        """Support with-statement for the environment and closes the environment."""
        self.close()
        # propagate exception
        return False
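
The lifecycle described in the class docstring (init, then step until done, then close) can be sketched as follows. This is only an illustration under stated assumptions: the `Env` and `EnvStepOutput` definitions below are trimmed stand-ins that mirror the listing so the snippet runs without skyrl_gym installed, and `EchoEnv` is a hypothetical environment invented for the example.

```python
from typing import Any, Dict, Generic, Optional, SupportsFloat, Tuple, TypedDict, TypeVar

ObsType = TypeVar("ObsType")
ActType = TypeVar("ActType")


class EnvStepOutput(TypedDict):
    """Stand-in with the same keys as the EnvStepOutput listing."""
    observations: Any
    reward: SupportsFloat
    done: bool
    metadata: Optional[Dict[str, Any]]


class Env(Generic[ObsType, ActType]):
    """Trimmed stand-in for the Env listing (assumption: same method names)."""

    def step(self, action: ActType) -> EnvStepOutput:
        raise NotImplementedError

    def init(self, *args: Any) -> Tuple[ObsType, Dict[str, Any]]:
        raise NotImplementedError

    def close(self) -> None:
        pass

    def __enter__(self):
        return self

    def __exit__(self, *args: Any) -> bool:
        self.close()
        return False  # do not swallow exceptions


class EchoEnv(Env[str, str]):
    """Hypothetical environment: the action "stop" earns a reward and ends the episode."""

    def __init__(self) -> None:
        self.closed = False

    def init(self, prompt: str) -> Tuple[str, Dict[str, Any]]:
        return prompt, {"note": "initial state"}

    def step(self, action: str) -> EnvStepOutput:
        done = action.strip() == "stop"
        return EnvStepOutput(
            observations=f"you said: {action}",
            reward=1.0 if done else 0.0,
            done=done,
            metadata={"action_length": len(action)},
        )

    def close(self) -> None:
        self.closed = True


# init -> step until done -> close (close is triggered by the with-statement)
with EchoEnv() as env:
    obs, info = env.init("say stop to finish")
    rewards = []
    for action in ["hello", "stop", "never reached"]:
        out = env.step(action)
        rewards.append(out["reward"])
        if out["done"]:
            break  # further step calls after done are undefined
```

Using the environment as a context manager means `close` runs even if a step raises, which matters most when the environment holds external resources.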

method step

step(action: ActType) -> EnvStepOutput

Parse and run one step of action in the environment.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| action | ActType | An action provided to the environment. For example, the action can be a str response generated by an LLM, which must be parsed and executed accordingly. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| observations | ObsType | The resulting observations after executing the action. For example, this could involve executing a SQL query derived from the LLM response and observing the output as {'role': 'user', 'content': str(observations)}, or any error messages from the database. |
| reward | SupportsFloat | The reward obtained by taking the action. |
| done | bool | Whether the episode has ended, in which case further step calls will return undefined results. |
| info | Dict | Contains auxiliary diagnostic information (helpful for debugging, learning, and logging). This might, for instance, contain metrics that describe the performance state, variables that are hidden from observations, or individual reward terms that are combined to produce the total reward. |

Source code in skyrl-gym/skyrl_gym/core.py:35-58
    def step(self, action: ActType) -> EnvStepOutput:
        """
        Parse and run one step of action in the environment.

        Args:
            action (ActType): An action provided to the environment.
                For example, the action can be a `str` response generated by an LLM,
                which must be parsed and executed accordingly.

        Returns:
            observations (ObsType): The resulting observations after executing the action.
                For example, this could involve executing a SQL query derived from the LLM response
                and observing the output as {'role': 'user', 'content': str(observations)}, or any error messages from the database.

            reward (SupportsFloat): The reward obtained by taking the action.

            done (bool): Whether the episode has ended, in which case further `step` calls will
                return undefined results.

            info (Dict): Contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
                This might, for instance, contain: metrics that describe the performance state, variables that are
                hidden from observations, or individual reward terms that are combined to produce the total reward.
        """
        raise NotImplementedError
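
The SQL scenario mentioned in the docstring could look roughly like the sketch below. It is not taken from SkyRL: the `<sql>...</sql>` tag convention, the `sql_step` helper, and the exact-match reward are all assumptions made for illustration, and the standard-library `sqlite3` module stands in for a real database.

```python
import re
import sqlite3
from typing import Any, Dict

# Hypothetical convention: the LLM wraps its query in <sql> tags.
SQL_PATTERN = re.compile(r"<sql>(.*?)</sql>", re.DOTALL)


def sql_step(conn: sqlite3.Connection, action: str, expected_rows: list) -> Dict[str, Any]:
    """Parse a SQL query out of an LLM response, run it, and build a step output."""
    match = SQL_PATTERN.search(action)
    if match is None:
        content, reward = "error: no <sql>...</sql> block found", 0.0
    else:
        try:
            rows = conn.execute(match.group(1).strip()).fetchall()
            content = str(rows)
            reward = 1.0 if rows == expected_rows else 0.0
        except sqlite3.Error as exc:
            # Surface database errors to the model as the observation
            content, reward = f"error: {exc}", 0.0
    return {
        "observations": {"role": "user", "content": content},
        "reward": reward,
        "done": True,  # single-turn sketch: the episode ends after one query
        "metadata": {"parsed": match is not None},
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('ada')")

good = sql_step(conn, "Let me check. <sql>SELECT name FROM users</sql>", [("ada",)])
bad = sql_step(conn, "<sql>SELECT nope FROM users</sql>", [("ada",)])
conn.close()
```

Returning the error text as an observation, rather than raising, lets a multi-turn variant give the model a chance to correct its query.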

method init

init(*kwargs) -> Tuple[ObsType, Dict[str, Any]]

Initialize the environment, returning initial observation and optional metadata.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| observations | ObsType | Observations of the initial state. This is analogous to the observations returned by step. |
| info | Dict | Auxiliary information complementing the observations. It should be analogous to the info returned by step. |

Source code in skyrl-gym/skyrl_gym/core.py:60-69
    def init(self, *kwargs) -> Tuple[ObsType, Dict[str, Any]]:
        """
        Initialize the environment, returning initial observation and optional metadata.

        Returns:
            observations (ObsType): Observations of the initial state. This is analogous to the observations returned by `step`.
            info (Dict): This dictionary contains auxiliary information complementing ``observation``. It should be analogous to
                the ``info`` returned by `step`.
        """
        raise NotImplementedError

method close

close()

After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

This is critical for closing rendering windows, database or HTTP connections. Calling close on an already closed environment has no effect and won't raise an error.

Source code in skyrl-gym/skyrl_gym/core.py:71-78
    def close(self):
        """
        After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

        This is critical for closing rendering windows, database or HTTP connections.
        Calling ``close`` on an already closed environment has no effect and won't raise an error.
        """
        pass
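
One way to satisfy the "closing twice has no effect" contract is to guard the cleanup with a sentinel. The sketch below assumes a hypothetical `DbBackedEnv` that owns a `sqlite3` connection; it also shows that an `__exit__` returning `False`, as in the `Env` listing, lets exceptions from the with-block propagate after cleanup.

```python
import sqlite3
from typing import Any, Optional


class DbBackedEnv:
    """Hypothetical environment that owns a database connection."""

    def __init__(self) -> None:
        self._conn: Optional[sqlite3.Connection] = sqlite3.connect(":memory:")
        self.close_calls = 0

    def close(self) -> None:
        self.close_calls += 1
        if self._conn is None:
            return  # already closed: no effect, no error
        self._conn.close()
        self._conn = None

    def __enter__(self) -> "DbBackedEnv":
        return self

    def __exit__(self, *args: Any) -> bool:
        self.close()
        return False  # propagate any exception raised inside the with-block


env = DbBackedEnv()
env.close()
env.close()  # second call is a no-op

propagated = False
try:
    with DbBackedEnv() as env2:
        raise RuntimeError("step failed")
except RuntimeError:
    propagated = True  # __exit__ returned False, so the error reached the caller
```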

class EnvStepOutput

Bases: TypedDict

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| observations | ObsType | |
| reward | SupportsFloat | |
| done | bool | |
| metadata | Optional[Dict[str, Any]] | |

Source code in skyrl-gym/skyrl_gym/core.py:12-16
class EnvStepOutput(TypedDict):
    observations: ObsType
    reward: SupportsFloat
    done: bool
    metadata: Optional[Dict[str, Any]] = None

attr observations

observations: ObsType

attr reward

reward: SupportsFloat

attr done

done: bool

attr metadata

metadata: Optional[Dict[str, Any]] = None
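
Because the step output is a `TypedDict`, it behaves as an ordinary dictionary at runtime. The sketch below uses a stand-in `StepOutput` with the same four keys (with `ObsType` replaced by `Any`, and without the `= None` default) to show how a caller might read it; the defensive `.get` is a suggestion, not something the library requires.

```python
from typing import Any, Dict, Optional, SupportsFloat, TypedDict


class StepOutput(TypedDict):
    """Stand-in mirroring the fields of EnvStepOutput."""
    observations: Any
    reward: SupportsFloat
    done: bool
    metadata: Optional[Dict[str, Any]]


out = StepOutput(observations="42 rows", reward=1, done=True, metadata=None)

# A TypedDict instance is a plain dict at runtime: fields are keys, not attributes.
is_plain_dict = type(out) is dict
reward = float(out["reward"])  # SupportsFloat only promises that float() works

# Python does not validate or fill in keys at runtime, so optional fields
# are best read defensively in case a producer omitted them.
partial = {"observations": "x", "reward": 0.0, "done": False}
metadata = partial.get("metadata") or {}
```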

Text Environment

class BaseTextEnv

BaseTextEnv()

Bases: Env[ConversationType, str]

Base environment class for all text-in / text-out environments. Supports tool-calling and multi-turn trajectories.

Exposes only step, init and close.

Input Types:

  • ObsType: ConversationType (tool output, LLM input)
  • ActType: str (LLM output)

Functions:

| Name | Description |
| --- | --- |
| init_tool_groups | Initialize the tool groups for the environment. |
| step | Runs one environment step. |
| init | Return the first prompt to be given to the model and optional metadata. |
| close | Closes the environment, override if needed by subclasses. |
| get_metrics | Return environment-specific metrics for the episode. |
| aggregate_metrics | Static method to aggregate metrics across many episodes of this env class. |

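Attributes:

| Name | Initial value |
| --- | --- |
| turns | 0 |
| max_turns | 1 |
| tool_groups | [] |
| tool_to_toolgroup | {} |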

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:17-100
class BaseTextEnv(Env[ConversationType, str]):
    """
    Base environment class for all text-in / text-out environments.
    Supports tool-calling and multi-turn trajectories.

    Exposes only `step`, `init` and `close`.

    Input Types:
        - ObsType: ConversationType (tool output, LLM input)
        - ActType: str (LLM output)
    """

    def __init__(self):
        super().__init__()

        # Metadata
        self.turns = 0
        self.max_turns = 1

        # Tool groups
        self.tool_groups = []
        self.tool_to_toolgroup = {}

    def init_tool_groups(self, tool_groups: List = []) -> None:
        """
        Initialize the tool groups for the environment.
        """
        # Find ToolGroup for a given tool
        self.tool_groups = tool_groups
        self.tool_to_toolgroup = {}
        for tool_group in self.tool_groups:
            self.tool_to_toolgroup.update(tool_group.get_tool_to_group_mapping())

    def _execute_tool(self, tool_group_name: str, tool_name: str, tool_input: Any) -> str:
        """
        Find the right ToolGroup and Tool and execute it.
        """
        for group in self.tool_groups:
            if group.name == tool_group_name:
                return group.execute_tool(tool_name, *tool_input)  # tool_input must be tuple or list

        raise ValueError(f"ToolGroup '{tool_group_name}' not found.")

    def step(self, action: str) -> BaseTextEnvStepOutput:
        """
        Runs one environment step.

        Return:
        - observations: [{"role": "user", "content": observation}]
        - reward: float
        - done: bool
        - postprocessed_action: Optional[str]
        - metadata: Dict[str, Any] any metadata
        """
        pass

    def init(self, prompt: ConversationType) -> Tuple[ConversationType, Dict[str, Any]]:
        """
        Return the first prompt to be given to the model and optional metadata.
        """
        return prompt, {}

    def close(self):
        """
        Closes the environment, override if needed by subclasses.
        """
        pass

    def get_metrics(self) -> Dict[str, Any]:
        """
        Return environment-specific metrics for the episode.
        Default is empty dict (no metrics).
        """
        return {}

    @staticmethod
    def aggregate_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, Any]:
        """
        Static method to aggregate metrics across many episodes of this env class.
        Default behavior: average the numerics, drop the non-numerics.
        """
        from skyrl_gym.metrics import default_aggregate_metrics

        return default_aggregate_metrics(metrics)
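
A subclass typically overrides `step` and uses `turns` and `max_turns` to decide when the episode ends. The following sketch assumes a trimmed stand-in for `BaseTextEnv`, so that it runs without skyrl_gym, and a hypothetical `GuessNumberEnv`; the hint wording and reward values are invented for illustration.

```python
from typing import Any, Dict, List, Tuple

ConversationType = List[Dict[str, str]]  # assumption: OpenAI-style message list


class BaseTextEnv:
    """Trimmed stand-in for the BaseTextEnv listing."""

    def __init__(self) -> None:
        self.turns = 0
        self.max_turns = 1

    def init(self, prompt: ConversationType) -> Tuple[ConversationType, Dict[str, Any]]:
        return prompt, {}


class GuessNumberEnv(BaseTextEnv):
    """Hypothetical multi-turn environment: the model guesses a secret integer."""

    def __init__(self, secret: int, max_turns: int = 3) -> None:
        super().__init__()
        self.secret = secret
        self.max_turns = max_turns

    def step(self, action: str) -> Dict[str, Any]:
        self.turns += 1
        try:
            guess = int(action.strip())
        except ValueError:
            guess = None
        correct = guess == self.secret
        done = correct or self.turns >= self.max_turns
        if correct:
            hint = "correct"
        elif guess is None:
            hint = "please answer with an integer"
        else:
            hint = "higher" if guess < self.secret else "lower"
        return {
            # No follow-up message is needed once the episode is over
            "observations": [] if done else [{"role": "user", "content": hint}],
            "reward": 1.0 if correct else 0.0,
            "done": done,
            "metadata": {"turns": self.turns},
        }

    def get_metrics(self) -> Dict[str, Any]:
        return {"turns_used": self.turns}


env = GuessNumberEnv(secret=7)
prompt, _ = env.init([{"role": "user", "content": "Guess a number from 1 to 10."}])
first = env.step("5")
second = env.step("7")
metrics = env.get_metrics()
```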

attr turns

turns = 0

attr max_turns

max_turns = 1

attr tool_groups

tool_groups = []

attr tool_to_toolgroup

tool_to_toolgroup = {}

method init_tool_groups

init_tool_groups(tool_groups: List = []) -> None

Initialize the tool groups for the environment.

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:40-48
    def init_tool_groups(self, tool_groups: List = []) -> None:
        """
        Initialize the tool groups for the environment.
        """
        # Find ToolGroup for a given tool
        self.tool_groups = tool_groups
        self.tool_to_toolgroup = {}
        for tool_group in self.tool_groups:
            self.tool_to_toolgroup.update(tool_group.get_tool_to_group_mapping())
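
The listing relies on three members of each tool group: `name`, `get_tool_to_group_mapping()`, and `execute_tool(tool_name, *args)`. The sketch below uses a hypothetical `SimpleToolGroup` that provides just those members; it is not SkyRL's own tool group class, and it assumes the mapping goes from tool name to group name. The two free functions mirror the logic of `init_tool_groups` and `_execute_tool`.

```python
from typing import Any, Callable, Dict, List, Sequence


class SimpleToolGroup:
    """Hypothetical tool group exposing the members the listing relies on."""

    def __init__(self, name: str, tools: Dict[str, Callable[..., str]]) -> None:
        self.name = name
        self._tools = tools

    def get_tool_to_group_mapping(self) -> Dict[str, str]:
        # Assumption: the mapping goes from tool name to group name
        return {tool_name: self.name for tool_name in self._tools}

    def execute_tool(self, tool_name: str, *args: Any) -> str:
        return self._tools[tool_name](*args)


def build_tool_index(tool_groups: List[SimpleToolGroup]) -> Dict[str, str]:
    """Same merge as init_tool_groups: later groups win on duplicate tool names."""
    index: Dict[str, str] = {}
    for group in tool_groups:
        index.update(group.get_tool_to_group_mapping())
    return index


def execute_tool(
    tool_groups: List[SimpleToolGroup],
    tool_group_name: str,
    tool_name: str,
    tool_input: Sequence[Any],
) -> str:
    """Mirror of _execute_tool: tool_input is unpacked, so pass a tuple or list."""
    for group in tool_groups:
        if group.name == tool_group_name:
            return group.execute_tool(tool_name, *tool_input)
    raise ValueError(f"ToolGroup '{tool_group_name}' not found.")


groups = [
    SimpleToolGroup("math", {"add": lambda a, b: str(a + b)}),
    SimpleToolGroup("text", {"upper": lambda s: s.upper()}),
]
index = build_tool_index(groups)
result = execute_tool(groups, index["add"], "add", (2, 3))
shouted = execute_tool(groups, index["upper"], "upper", ("hello",))
```

Because the input is unpacked with `*`, a single string argument should be wrapped as `("hello",)`; passing `"hello"` directly would spread it into five one-character arguments.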

method step

step(action: str) -> BaseTextEnvStepOutput

Runs one environment step.

Returns:

  • observations: [{"role": "user", "content": observation}]
  • reward: float
  • done: bool
  • postprocessed_action: Optional[str]
  • metadata: Dict[str, Any] any metadata
Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:60-71
    def step(self, action: str) -> BaseTextEnvStepOutput:
        """
        Runs one environment step.

        Return:
        - observations: [{"role": "user", "content": observation}]
        - reward: float
        - done: bool
        - postprocessed_action: Optional[str]
        - metadata: Dict[str, Any] any metadata
        """
        pass

method init

init(prompt: ConversationType) -> Tuple[ConversationType, Dict[str, Any]]

Return the first prompt to be given to the model and optional metadata.

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:73-77
    def init(self, prompt: ConversationType) -> Tuple[ConversationType, Dict[str, Any]]:
        """
        Return the first prompt to be given to the model and optional metadata.
        """
        return prompt, {}

method close

close()

Closes the environment, override if needed by subclasses.

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:79-83
    def close(self):
        """
        Closes the environment, override if needed by subclasses.
        """
        pass

method get_metrics

get_metrics() -> Dict[str, Any]

Return environment-specific metrics for the episode. Default is empty dict (no metrics).

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:85-90
    def get_metrics(self) -> Dict[str, Any]:
        """
        Return environment-specific metrics for the episode.
        Default is empty dict (no metrics).
        """
        return {}

method staticmethod aggregate_metrics

aggregate_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, Any]

Static method to aggregate metrics across many episodes of this env class. Default behavior: average the numerics, drop the non-numerics.

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:92-100
    @staticmethod
    def aggregate_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, Any]:
        """
        Static method to aggregate metrics across many episodes of this env class.
        Default behavior: average the numerics, drop the non-numerics.
        """
        from skyrl_gym.metrics import default_aggregate_metrics

        return default_aggregate_metrics(metrics)
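
The described default ("average the numerics, drop the non-numerics") can be approximated as below. This is a sketch of the stated behavior, not the library's `default_aggregate_metrics`; the function name `average_numeric_metrics` and the decision to treat booleans as non-numeric are assumptions.

```python
from typing import Any, Dict, List


def average_numeric_metrics(metrics: List[Dict[str, Any]]) -> Dict[str, float]:
    """Average numeric values per key across episodes; skip non-numeric values."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for episode in metrics:
        for key, value in episode.items():
            # Assumption: bools are skipped, although Python treats bool as an int subclass
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            totals[key] = totals.get(key, 0.0) + float(value)
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


episodes = [
    {"turns_used": 2, "accuracy": 1.0, "task": "sql"},
    {"turns_used": 4, "accuracy": 0.0, "task": "math"},
]
summary = average_numeric_metrics(episodes)
```

An environment whose metrics should not be averaged (for example, counts that should be summed) can override `aggregate_metrics` in its subclass with its own reduction.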

class BaseTextEnvStepOutput

Bases: TypedDict

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| observations | ConversationType | |
| reward | float | |
| done | bool | |
| metadata | Dict[str, Any] | |
| postprocessed_action | Optional[str] | |

Source code in skyrl-gym/skyrl_gym/envs/base_text_env.py:9-14
class BaseTextEnvStepOutput(TypedDict):
    observations: ConversationType  # OpenAI API Messages Format
    reward: float
    done: bool
    metadata: Dict[str, Any]
    postprocessed_action: Optional[str] = None

attr observations

observations: ConversationType

attr reward

reward: float

attr done

done: bool

attr metadata

metadata: Dict[str, Any]

attr postprocessed_action

postprocessed_action: Optional[str] = None
