Supported Models
SkyRL supports end-to-end training with the FSDP and Megatron training backends (both using vLLM for inference), as well as a JAX backend (covering both training and inference) via the Tinker API server. The following models are supported by each training backend:
FSDP
Any model supported by HuggingFace transformers and vLLM works with the FSDP backend. The FSDP backend also includes Ulysses-style sequence parallelism, making it a viable option for training dense models of up to 32B parameters at 32K–64K context lengths.
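To make the communication pattern behind Ulysses-style sequence parallelism concrete, here is a toy, single-process NumPy sketch (shapes and the 4-way group size are illustrative; SkyRL's actual implementation uses distributed all-to-all collectives, not these array slices). The key idea: ranks hold sequence shards outside attention, then an all-to-all trades sequence sharding for head sharding so each rank can run attention over the full sequence for its subset of heads.

```python
import numpy as np

P = 4                           # sequence-parallel group size (toy)
seq, heads, dim = 8, 4, 2       # heads must be divisible by P
hp = heads // P                 # heads per rank after the all-to-all

x = np.arange(seq * heads * dim, dtype=np.float64).reshape(seq, heads, dim)

# Before attention: "rank" r holds a contiguous sequence slice, all heads.
seq_shards = np.split(x, P, axis=0)

# All-to-all: each rank trades its sequence shards for head shards,
# ending up with the FULL sequence for heads/P attention heads, so
# attention runs locally with no cross-rank communication.
head_shards = [
    np.concatenate([s[:, r * hp:(r + 1) * hp] for s in seq_shards], axis=0)
    for r in range(P)
]

# (per-head attention would run here on each shard independently)

# Reverse all-to-all: return to sequence sharding for the MLP layers.
restored = np.concatenate(head_shards, axis=1)
assert np.array_equal(restored, x)
```

Each head shard has shape `(seq, heads // P, dim)`, which is why attention-time memory per rank stays flat as the sequence grows with `P`.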
Megatron
The Megatron backend supports 5D parallelism (DP/TP/CP/PP/EP), enabling efficient scaling to large MoE models (100B+ parameters). Mapping from HuggingFace model definitions to Megatron GPTModel instances is provided by NVIDIA's Megatron-Bridge library.
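A quick sanity check on how these degrees compose (the helper and the example layout below are illustrative, not a recommended SkyRL configuration): data, tensor, context, and pipeline parallelism multiply into the required GPU count, while expert parallelism in Megatron-style setups partitions MoE experts across ranks drawn from the other dimensions rather than adding a factor of its own.

```python
def required_gpus(dp: int, tp: int, cp: int, pp: int) -> int:
    """Toy GPU-count check for a 5D-parallel layout.

    DP/TP/CP/PP compose multiplicatively; expert parallelism (EP)
    reuses existing ranks for MoE expert layers, so it is not a
    separate multiplicative factor here.
    """
    return dp * tp * cp * pp

# e.g. a 128-GPU job: 4-way DP x 8-way TP x 2-way CP x 2-way PP
assert required_gpus(dp=4, tp=8, cp=2, pp=2) == 128
```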
The following models are fully supported in SkyRL's Megatron backend via Megatron-Bridge + vLLM:
- Qwen-3.5 (Dense: 0.8B/2B/4B/9B/27B, MoE: 35B-A3B/122B-A10B/397B-A17B)
- Nemotron-3 (Dense: Nano-4B-BF16, MoE: Nano-30B-A3B-BF16)
- GLM-4.7-Flash/GLM-4.7
- Qwen3 (Dense: 0.6B/1.7B/4B/8B/32B, MoE: 30B-A3B/235B-A22B)
- Moonlight-16B-A3B
Other models supported by both Megatron-Bridge and vLLM should run on SkyRL as well. Contributions and bug fixes for additional model families are welcome!
JAX
The JAX backend supports the following models:
- Qwen-3.5 Dense
- Deepseek-V3
- Llama-3
- Qwen3 Dense/MoE