
Supported Models

SkyRL supports end-to-end training with the FSDP and Megatron training backends (both using vLLM for inference), as well as a JAX backend (for both training and inference) via the Tinker API server. The following models are supported by each training backend:

FSDP

Any model supported by HuggingFace transformers + vLLM can be trained with the FSDP backend. The FSDP backend also supports Ulysses-style sequence parallelism, making it a viable option for training dense models with up to 32B parameters at 32K/64K context lengths.
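To make the benefit concrete, here is a minimal sketch (not SkyRL's API; function names are illustrative) of the arithmetic behind Ulysses-style sequence parallelism: the sequence dimension is sharded across a sequence-parallel group, so each GPU materializes only its slice of the activations outside of attention's all-to-all exchange.

```python
# Illustrative sketch, assuming a sequence-parallel group of size sp_size.
# These helpers are hypothetical and exist only to show the arithmetic.

def per_gpu_seq_len(seq_len: int, sp_size: int) -> int:
    """Each rank in the sequence-parallel group holds seq_len / sp_size tokens."""
    assert seq_len % sp_size == 0, "sequence length must divide evenly across ranks"
    return seq_len // sp_size

def activation_tokens_per_gpu(batch_size: int, seq_len: int, sp_size: int) -> int:
    """Tokens whose activations a single GPU materializes per micro-batch."""
    return batch_size * per_gpu_seq_len(seq_len, sp_size)

if __name__ == "__main__":
    # A 32K-token context sharded over 4 GPUs leaves 8K tokens per rank,
    # cutting per-GPU activation memory roughly 4x for those layers.
    print(per_gpu_seq_len(32_768, 4))            # 8192
    print(activation_tokens_per_gpu(1, 65_536, 8))  # 8192
```

This is why long-context training becomes tractable: activation memory scales with the per-rank sequence length, not the full context length.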

Megatron

The Megatron backend comes with support for 5D parallelism (DP/TP/CP/PP/EP) and allows for efficient scaling of large MoE models (100B+). Support for mapping from HuggingFace model definitions to Megatron GPTModels is provided by NVIDIA's Megatron-Bridge library.
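The five parallelism dimensions must jointly tile the available GPUs. The sketch below (illustrative only; these are not SkyRL or Megatron config names) shows the divisibility arithmetic, assuming the common Megatron-style layout where tensor, context, and pipeline parallelism partition the model and the remaining ranks form the data-parallel dimension, with expert parallelism resharding MoE layers within the data-parallel/context group.

```python
# Hedged sketch of 5D-parallelism sizing arithmetic. Function names are
# hypothetical; consult the Megatron documentation for the real constraints.

def data_parallel_size(world_size: int, tp: int, cp: int, pp: int) -> int:
    """DP is whatever remains after tensor/context/pipeline parallelism."""
    model_parallel = tp * cp * pp
    assert world_size % model_parallel == 0, "TP*CP*PP must divide the GPU count"
    return world_size // model_parallel

def expert_parallel_groups(dp: int, cp: int, ep: int) -> int:
    """For MoE layers, EP typically reshards the DP*CP ranks (assumption)."""
    assert (dp * cp) % ep == 0, "EP must divide DP*CP"
    return (dp * cp) // ep

if __name__ == "__main__":
    # Example: 64 GPUs with TP=4, CP=2, PP=2 leaves DP=4.
    dp = data_parallel_size(64, tp=4, cp=2, pp=2)
    print(dp)                               # 4
    print(expert_parallel_groups(dp, 2, 8))  # 1
```

The practical upshot: scaling a 100B+ MoE model is mostly an exercise in choosing a (TP, CP, PP, EP) tiling that satisfies these divisibility constraints while balancing memory and communication.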

The following models are fully supported in SkyRL's Megatron backend via Megatron-Bridge + vLLM:

Other models supported via Megatron-Bridge and vLLM should run on SkyRL as well. Contributions and bug fixes for additional model families are welcome!

JAX

The JAX backend supports the following models:

  • Qwen-3.5 Dense
  • Deepseek-V3
  • Llama-3
  • Qwen3 Dense/MoE
