Supported Models
SkyRL supports end-to-end training with the FSDP and Megatron training backends (both using vLLM for inference), as well as a JAX backend (covering both training and inference) via the Tinker API server. The following models are supported by each training backend:
FSDP
Any model supported by HuggingFace transformers and vLLM works with the FSDP backend. The FSDP backend also includes Ulysses-style sequence parallelism, making it a viable option for training dense models of up to 32B parameters at 32K–64K context lengths.
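To make the communication pattern behind Ulysses-style sequence parallelism concrete, here is a toy, single-process NumPy sketch (shapes and the 4-way group size are illustrative; SkyRL's actual implementation uses distributed all-to-all collectives, not these array slices). The key idea: ranks hold sequence shards outside attention, then an all-to-all trades sequence sharding for head sharding so each rank can run attention over the full sequence for its subset of heads.

```python
import numpy as np

P = 4                           # sequence-parallel group size (toy)
seq, heads, dim = 8, 4, 2       # heads must be divisible by P
hp = heads // P                 # heads per rank after the all-to-all

x = np.arange(seq * heads * dim, dtype=np.float64).reshape(seq, heads, dim)

# Before attention: "rank" r holds a contiguous sequence slice, all heads.
seq_shards = np.split(x, P, axis=0)

# All-to-all: each rank trades its sequence shards for head shards,
# ending up with the FULL sequence for heads/P attention heads, so
# attention runs locally with no cross-rank communication.
head_shards = [
    np.concatenate([s[:, r * hp:(r + 1) * hp] for s in seq_shards], axis=0)
    for r in range(P)
]

# (per-head attention would run here on each shard independently)

# Reverse all-to-all: return to sequence sharding for the MLP layers.
restored = np.concatenate(head_shards, axis=1)
assert np.array_equal(restored, x)
```

Each head shard has shape `(seq, heads // P, dim)`, which is why attention-time memory per rank stays flat as the sequence grows with `P`.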
Megatron
The Megatron backend supports 5D parallelism (DP/TP/CP/PP/EP), enabling efficient scaling to large MoE models (100B+ parameters). Mapping from HuggingFace model definitions to Megatron GPTModel instances is provided by NVIDIA's Megatron-Bridge library.
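A quick sanity check on how these degrees compose (the helper and the example layout below are illustrative, not a recommended SkyRL configuration): data, tensor, context, and pipeline parallelism multiply into the required GPU count, while expert parallelism in Megatron-style setups partitions MoE experts across ranks drawn from the other dimensions rather than adding a factor of its own.

```python
def required_gpus(dp: int, tp: int, cp: int, pp: int) -> int:
    """Toy GPU-count check for a 5D-parallel layout.

    DP/TP/CP/PP compose multiplicatively; expert parallelism (EP)
    reuses existing ranks for MoE expert layers, so it is not a
    separate multiplicative factor here.
    """
    return dp * tp * cp * pp

# e.g. a 128-GPU job: 4-way DP x 8-way TP x 2-way CP x 2-way PP
assert required_gpus(dp=4, tp=8, cp=2, pp=2) == 128
```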
The following models are fully supported in SkyRL's Megatron backend via Megatron-Bridge + vLLM:
- Qwen-3.5 (Dense: 0.8B/2B/4B/9B/27B, MoE: 35B-A3B/122B-A10B/397B-A17B)
- Nemotron-3 (Dense: Nano-4B-BF16, MoE: Nano-30B-A3B-BF16)
- GLM-4.7-Flash/GLM-4.7
- Qwen3 (Dense: 0.6B/1.7B/4B/8B/32B, MoE: 30B-A3B/235B-A22B)
- Moonlight-16B-A3B
Other models supported by both Megatron-Bridge and vLLM should run on SkyRL as well. Contributions and bug fixes for additional model families are welcome!
JAX
The JAX backend supports the following models:
- Qwen-3.5 Dense
- Deepseek-V3
- Llama-3
- Qwen3 Dense/MoE