Switching Training Backends

This page covers the FSDP, FSDP2, and Megatron backends.

In SkyRL, you can switch between different training backends with minimal changes to your training script.

Currently, we support the following training backends:

FSDP
FSDP2
Megatron

To switch to a different backend, set the trainer.strategy parameter to the desired backend. The default is fsdp2.
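As a minimal sketch of the switch described above, the helper below validates a backend name and prints the corresponding override. The function name strategy_override is hypothetical and not part of SkyRL, and the accepted values (fsdp, fsdp2, megatron) are assumed to be the lowercase forms of the backends listed on this page.

```shell
# Hypothetical helper (not part of SkyRL): validate a backend name and
# print the matching trainer.strategy override.
# Assumption: the accepted values are fsdp, fsdp2, and megatron.
strategy_override() {
    case "$1" in
        fsdp|fsdp2|megatron)
            printf 'trainer.strategy=%s\n' "$1"
            ;;
        *)
            printf 'unknown backend: %s\n' "$1" >&2
            return 1
            ;;
    esac
}
```

The printed override can then be appended to a training command, for example `bash examples/train/gsm8k/run_gsm8k.sh "$(strategy_override fsdp)"`, assuming the script forwards extra arguments to the entrypoint.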

Prerequisites

First, make sure you are familiar with the standard setup process for running GRPO training. See the Quick Start Guide for more details.

We provide baseline examples for GRPO training on GSM8K for each of these backends starting from the basic quickstart example. The quickstart script is available at examples/train/gsm8k/run_gsm8k.sh.

uv run --isolated --extra fsdp -m skyrl.train.entrypoints.main_base \
    trainer.algorithm.advantage_estimator="grpo" \
    data.train_data="['$HOME/data/gsm8k/train.parquet']" \
    data.val_data="['$HOME/data/gsm8k/validation.parquet']" \
    trainer.policy.model.path="Qwen/Qwen2.5-1.5B-Instruct" \
    ... # Other parameters (see `examples/train/gsm8k/run_gsm8k.sh` for more)

FSDP and FSDP2

To switch to FSDP or FSDP2, set the trainer.strategy parameter to fsdp or fsdp2 respectively.

# bash examples/training_backends/fsdp/run_fsdp2.sh (or just)
bash examples/train/gsm8k/run_gsm8k.sh trainer.strategy=fsdp2

Additionally, you can tune FSDP-specific configurations as shown below:

# enable offloading of model parameters to CPU during the forward pass for the ref model
trainer.ref.fsdp_config.cpu_offload=true \

Note that cpu_offload is distinct from worker state offloading with model colocation. For details on this distinction and the full set of FSDP options, see the FSDP configurations section (fsdp-configurations).
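To illustrate how the two settings above combine, here is a small sketch that assembles a single launch command from the quickstart script path, the strategy override, and the ref-model offload option. It assumes the quickstart script forwards extra arguments to the entrypoint. The command is printed rather than executed, so it can be inspected first.

```shell
# Quickstart script path as given in the Prerequisites section.
script="examples/train/gsm8k/run_gsm8k.sh"

# Assumption: the script passes extra key=value arguments through
# to the training entrypoint.
cmd="bash $script trainer.strategy=fsdp2 trainer.ref.fsdp_config.cpu_offload=true"

# Print the assembled command instead of running it.
echo "$cmd"
```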

With FSDP1 (trainer.strategy=fsdp), cpu_offload cannot be enabled for the policy or critic model, because CPU offloading does not support gradient accumulation outside no_sync mode. See the limitations section in the FSDP docs for more details.
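The restriction above can be caught before launching a run. The following is a hedged sketch of a pre-flight check, not something SkyRL ships: the function name check_cpu_offload is hypothetical, and the policy and critic keys are assumed by analogy with trainer.ref.fsdp_config.cpu_offload.

```shell
# Hypothetical pre-flight check (not part of SkyRL): scan key=value
# overrides and reject CPU offload for the policy or critic model when
# the strategy is fsdp (FSDP1).
# Assumption: the policy/critic keys mirror trainer.ref.fsdp_config.cpu_offload.
check_cpu_offload() {
    strategy="fsdp2"   # documented default backend
    offload_role=""
    for arg in "$@"; do
        case "$arg" in
            trainer.strategy=*)
                strategy="${arg#trainer.strategy=}"
                ;;
            trainer.policy.fsdp_config.cpu_offload=true)
                offload_role="policy"
                ;;
            trainer.critic.fsdp_config.cpu_offload=true)
                offload_role="critic"
                ;;
        esac
    done
    if [ "$strategy" = "fsdp" ] && [ -n "$offload_role" ]; then
        echo "cpu_offload is not supported for the $offload_role model with FSDP1" >&2
        return 1
    fi
    return 0
}
```

Calling the function with the same overrides you intend to pass to the training script returns a non-zero status for the unsupported combination and zero otherwise, including for ref-model offloading under fsdp.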

Megatron

Switching to the megatron backend is more involved and requires additional dependencies and configuration. For more details, see the Megatron installation docs (megatron-installation).
