SkyAgent Overview

SkyAgent is a generic agent layer for training and evaluating agents. It is a flexible frontend for building your own agents.

SkyAgent is designed primarily for researchers to have a unified interface around implementing agentic tasks. A modular design allows researchers to

Bring in their own tasks
Use any training backend or simply run evaluation
Modify runtime implementation for a given task (docker, etc)
Improve dispatching logic for a batch of trajectories easily
And more ...

SkyAgent is still under active development. We welcome any early feedback and contributions.

Examples

We have a few examples in the examples folder:

Evaluation: OpenAI: This example shows how to run evaluation with an OpenAI compatible endpoint.
Training: SkyAgent and SkyRL-train : Training a model on the SWEBench task with SkyRL-train.
Training: SkyAgent and VeRL : Training a model on the SWEBench task with VeRL.

Core components

SkyAgent consists of the following components:

AgentRunner : The main entrypoint for Skyagent is the AgentRunner class - it's responsible for generating trajectories for the given batch of prompts
Trajectory : the trajectory class handles generating a single trajectory for the given instance from the batch.
Agent : This is simply an LLM with the ability to call tools.
Task : The task class contains the task specification such as initial instruction, how the agent's runtime should be setup, how to evaluate results, etc.
Dispatcher : the dispatcher is responsible for efficiently handling trajectory execution for a batch of prompts.
Backend : Backend is the LLM backend for generating responses. For example, vLLM for inference or SkyRL-train's inference engines for training.

Trajectory

The trajectory class handles generating a single trajectory for the given instance from the batch. It has three methods:

initialize_trajectory: Initialize the trajectory eg: setup any runtime environment needed for the agent to run.
generate_trajectory: Generate the trajectory i.e. run the agent loop and get the final conversation and task results.
evaluate_trajectory: Evaluate the trajectory i.e. parse the final result and evaluate it for the given task.

The results of both generate_trajectory and evaluate_trajectory are stored in a .result attribute of the trajectory. Each trajectory instance will initialize an Agent instance to generate responses.

Here's a high-level diagram of the components involved in generating a trajectory:

Generate Trajectory

Agent

The agent class is a simple wrapper around an LLM with the ability to call tools. It mainly has a step method that generates an assistant response to the current history. The agent class manages history and response parsing. The actual LLM call is handled by the backend.

Backend

The backend is the LLM backend for generating responses. For example, this can be an OpenAI-compatible webserver for inference or SkyRL-train for training.

Dispatcher

The dispatcher handles the actual execution of a batch of trajectories efficiently. It takes in a batch of trajectories and executes initialize_trajectory, generate_trajectory, and evaluate_trajectory for each trajectory in certain concurrency.

For example, we provide a pipelined dispatcher that can run multiple trajectories in parallel with a maximum concurrency per stage (initialize, generate, evaluate) of max_parallel_agents.

SkyAgent Dispatcher

Overview of the pipelined dispatcher with max_parallel_agents=3

Task

The task class has the following methods:

initialize_runtime: Initialize the runtime for the task in an asyncio-compatible way
get_instruction: Get the initial instruction for the agent in the OpenAI messages format
complete_runtime: Complete or finalize the runtime for the task. For example, this can involve extracting the git patch from the runtime for SWEBench.
evaluate_result: Evaluate model result for the task in an asyncio-compatible way

We currently provide two tasks:

SWEBenchTask : Implements the SWEBench task leveraging OpenHands .
GeneralReactTask : A general task implementation for many basic reasoning tasks like math, science, simple code generation, etc.