# Configuration
The backend configuration is being migrated to a simpler, reorganized layout; the keys and structure described below may change in a future release.
This page describes how to configure the SkyRL Tinker backend, including GPU allocation, training parameters, and inference settings.
When spinning up the Tinker server, the `--backend-config` flag accepts a JSON dictionary of dot-notation overrides that are applied to the underlying SkyRL-Train configuration. For example:
```bash
uv run --extra tinker --extra fsdp -m skyrl.tinker.api \
  --base-model "Qwen/Qwen3-0.6B" --backend fsdp \
  --backend-config '{"trainer.placement.policy_num_gpus_per_node": 4, "generator.inference_engine.num_engines": 4}'
```

Any field in the SkyRL-Train config can be overridden this way (see the default config YAML for all available keys and defaults). The most commonly used options are listed below.
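Conceptually, each dotted key is split on `.` and walked down the nested config before the leaf value is replaced. A minimal sketch of that idea, purely illustrative and not SkyRL-Train's actual implementation:

```python
import json

def apply_overrides(config: dict, overrides: dict) -> dict:
    """Apply dot-notation overrides to a nested config dict, in place."""
    for dotted_key, value in overrides.items():
        *path, leaf = dotted_key.split(".")
        node = config
        for part in path:
            node = node.setdefault(part, {})  # descend, creating levels as needed
        node[leaf] = value
    return config

# Defaults as they might appear in the config YAML
config = {"trainer": {"placement": {"policy_num_gpus_per_node": 1}},
          "generator": {"inference_engine": {"num_engines": 1}}}

# The JSON dict passed to --backend-config
overrides = json.loads('{"trainer.placement.policy_num_gpus_per_node": 4, '
                       '"generator.inference_engine.num_engines": 4}')
apply_overrides(config, overrides)
```

After the call, `config["trainer"]["placement"]["policy_num_gpus_per_node"]` is 4, matching what the server would see after the override.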
## GPU and Parallelism
| Key | Default | Description |
|---|---|---|
| `trainer.placement.policy_num_gpus_per_node` | 1 | Number of GPUs per node for training |
| `trainer.placement.policy_num_nodes` | 1 | Number of nodes for training |
| `generator.inference_engine.num_engines` | 1 | Number of vLLM inference engines for sampling |
| `generator.inference_engine.tensor_parallel_size` | 1 | Tensor parallel size per inference engine |
| `trainer.micro_forward_batch_size_per_gpu` | 1 | Micro-batch size per GPU for the forward pass |
| `trainer.micro_train_batch_size_per_gpu` | 1 | Micro-batch size per GPU for gradient accumulation |
| `generator.inference_engine.gpu_memory_utilization` | 0.8 | Fraction of GPU memory reserved for the vLLM KV cache |
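Two pieces of arithmetic tie these knobs together: the inference engines must fit in the GPU budget (`num_engines * tensor_parallel_size` cannot exceed the available GPUs), and the effective train batch is accumulated from per-GPU micro-batches. A hedged sketch of both checks; the helper names are mine, not SkyRL's:

```python
def check_gpu_budget(num_gpus: int, num_engines: int, tensor_parallel_size: int) -> bool:
    """Each vLLM engine occupies tensor_parallel_size GPUs; together they must fit."""
    return num_engines * tensor_parallel_size <= num_gpus

def grad_accum_steps(train_batch_size: int,
                     micro_train_batch_size_per_gpu: int,
                     num_gpus: int) -> int:
    """Micro-steps accumulated per optimizer step for a given global batch size."""
    per_step = micro_train_batch_size_per_gpu * num_gpus
    assert train_batch_size % per_step == 0, "global batch must divide evenly"
    return train_batch_size // per_step
```

For example, `check_gpu_budget(4, 2, 2)` holds (two TP=2 engines on a 4-GPU node), while `check_gpu_budget(4, 4, 2)` does not; and a global batch of 32 with `micro_train_batch_size_per_gpu=1` on 4 GPUs accumulates over 8 micro-steps.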
When running a small model on multiple GPUs, you typically want to set `trainer.placement.policy_num_gpus_per_node` and `generator.inference_engine.num_engines` to the same value. For example, on a 4-GPU node:
```bash
--backend-config '{"trainer.placement.policy_num_gpus_per_node": 4, "generator.inference_engine.num_engines": 4}'
```

For large models that don't fit on a single GPU for inference, increase `generator.inference_engine.tensor_parallel_size` and decrease `generator.inference_engine.num_engines` accordingly. For example, on 4 GPUs with TP=2:
```bash
--backend-config '{"trainer.placement.policy_num_gpus_per_node": 4, "generator.inference_engine.num_engines": 2, "generator.inference_engine.tensor_parallel_size": 2}'
```

## LoRA
LoRA is configured from the client side, not the server. When creating a model via the Tinker SDK, pass a `lora_config` with the desired rank. For example, in tinker-cookbook recipes:
```bash
# LoRA training (default in most recipes)
python -m tinker_cookbook.recipes.sl_loop ... lora_rank=32

# Full-parameter fine-tuning
python -m tinker_cookbook.recipes.sl_loop ... lora_rank=0
```

No server-side configuration is needed to switch between single-tenant LoRA and full-parameter fine-tuning.
### Multi-tenant LoRA
Hosting multiple LoRA tenants concurrently against one server does require server-side configuration on the Megatron backend. At minimum:
```
{
  "trainer.placement.colocate_all": false,
  "trainer.policy.megatron_config.lora_config.merge_lora": false,
  "trainer.policy.model.lora.max_loras": <max concurrent adapters in a single batch>,
  "trainer.policy.model.lora.max_cpu_loras": <total adapter capacity>
}
```

`merge_lora: false` is required so vLLM serves each tenant's adapter by name (with `merge_lora: true`, vLLM only sees the merged base model, and per-tenant sampling returns the wrong weights). `max_cpu_loras` must be sized to the peak number of concurrent tenants: there is no on-demand reload, and if vLLM evicts an adapter, the next `sample()` against it returns a 404. All adapters on one server must share the same (rank, alpha, target_modules) signature; mismatched signatures are hard-rejected at `create_model`.
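The signature rule amounts to a tuple-equality check at model-creation time. A hypothetical sketch of that rejection logic, with names of my own invention rather than SkyRL's actual code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LoraSignature:
    rank: int
    alpha: int
    target_modules: tuple  # e.g. ("q_proj", "v_proj")

class SignatureMismatch(Exception):
    """Raised when a new adapter's signature differs from the server's."""

def admit_adapter(server_sig: Optional[LoraSignature],
                  new_sig: LoraSignature) -> LoraSignature:
    """All adapters on one server must share one (rank, alpha, target_modules)."""
    if server_sig is not None and new_sig != server_sig:
        raise SignatureMismatch("adapter signature does not match the server's")
    return new_sig  # the first adapter's signature becomes the server's
```

The first `create_model` pins the server signature; any later tenant whose `(rank, alpha, target_modules)` differs is rejected up front rather than silently served wrong weights.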
See Multi-tenancy for the full operator contract and SFT/RL quickstarts.
## Full Config Reference
For the complete list of configuration options, see the SkyRL-Train configuration docs and the default config YAML.