Configuration

The backend configuration is being migrated to a simpler, reorganized layout. The keys and structure described below may change in a future release.

This page describes how to configure the SkyRL Tinker backend, including GPU allocation, training parameters, and inference settings.

When spinning up the Tinker server, the --backend-config flag accepts a JSON dictionary of dot-notation overrides that are applied to the underlying SkyRL-Train configuration. For example:

uv run --extra tinker --extra fsdp -m skyrl.tinker.api \
    --base-model "Qwen/Qwen3-0.6B" --backend fsdp \
    --backend-config '{"trainer.placement.policy_num_gpus_per_node": 4, "generator.inference_engine.num_engines": 4}'

Any field in the SkyRL-Train config can be overridden this way (see SkyRLTrainConfig in skyrl/train/config/config.py for all available keys and defaults). The most commonly used options are listed below.
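To make the dot-notation semantics concrete, here is a minimal Python sketch of how such overrides could be folded into a nested configuration. It is an illustration only, not SkyRL's actual implementation: the function apply_dot_overrides and the toy defaults dictionary are hypothetical, and the real code operates on the SkyRLTrainConfig dataclass and may reject unknown keys rather than create them.

```python
import copy


def apply_dot_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of `config` with dot-notation overrides applied.

    Hypothetical helper for illustration; not part of SkyRL.
    """
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Assumption: missing intermediate sections are created on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# Toy stand-in for a small slice of the config, using defaults listed on this page.
defaults = {
    "trainer": {"placement": {"policy_num_gpus_per_node": 1, "policy_num_nodes": 1}},
    "generator": {"inference_engine": {"num_engines": 1, "tensor_parallel_size": 1}},
}
overrides = {
    "trainer.placement.policy_num_gpus_per_node": 4,
    "generator.inference_engine.num_engines": 4,
}

merged = apply_dot_overrides(defaults, overrides)
print(merged["trainer"]["placement"])
print(merged["generator"]["inference_engine"])
```

Keys that are not overridden keep their defaults, and the input dictionary is left unmodified because the helper works on a deep copy.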

GPU and Parallelism

| Key | Default | Description |
| --- | --- | --- |
| trainer.placement.policy_num_gpus_per_node | 1 | Number of GPUs for training |
| trainer.placement.policy_num_nodes | 1 | Number of nodes for training |
| generator.inference_engine.num_engines | 1 | Number of vLLM inference engines for sampling |
| generator.inference_engine.tensor_parallel_size | 1 | Tensor parallel size per inference engine |
| trainer.micro_forward_batch_size_per_gpu | 1 | Micro-batch size per GPU (for forward pass) |
| trainer.micro_train_batch_size_per_gpu | 1 | Micro-batch size per GPU (for gradient accumulation) |
| generator.inference_engine.gpu_memory_utilization | 0.8 | Fraction of GPU memory for vLLM KV cache |

When running a small model on multiple GPUs, you typically want to set trainer.placement.policy_num_gpus_per_node and generator.inference_engine.num_engines to the same value. For example, on a 4-GPU node:

--backend-config '{"trainer.placement.policy_num_gpus_per_node": 4, "generator.inference_engine.num_engines": 4}'

For large models that don't fit on a single GPU for inference, increase generator.inference_engine.tensor_parallel_size and decrease generator.inference_engine.num_engines accordingly. For example, on 4 GPUs with TP=2:

--backend-config '{"trainer.placement.policy_num_gpus_per_node": 4, "generator.inference_engine.num_engines": 2, "generator.inference_engine.tensor_parallel_size": 2}'
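The arithmetic behind these two examples can be sketched in Python. The helper below assumes that the number of engines times the tensor parallel size should equal the number of GPUs on the node, which matches both examples above but is not stated as a hard rule by SkyRL. The function names inference_layout and backend_config_flag are hypothetical.

```python
import json
import shlex


def inference_layout(gpus_per_node: int, tensor_parallel_size: int = 1) -> dict:
    """Derive overrides so that num_engines * tensor_parallel_size == gpus_per_node.

    Hypothetical helper; the equality is an assumption drawn from the examples.
    """
    if gpus_per_node < 1 or tensor_parallel_size < 1:
        raise ValueError("GPU count and tensor parallel size must be positive")
    if gpus_per_node % tensor_parallel_size != 0:
        raise ValueError("tensor_parallel_size must divide gpus_per_node evenly")
    return {
        "trainer.placement.policy_num_gpus_per_node": gpus_per_node,
        "generator.inference_engine.num_engines": gpus_per_node // tensor_parallel_size,
        "generator.inference_engine.tensor_parallel_size": tensor_parallel_size,
    }


def backend_config_flag(overrides: dict) -> str:
    """Render the overrides as a shell-safe --backend-config argument."""
    return "--backend-config " + shlex.quote(json.dumps(overrides))


# 4 GPUs with TP=2 yields 2 engines, as in the example above.
flag = backend_config_flag(inference_layout(4, tensor_parallel_size=2))
print(flag)
```

Generating the JSON with json.dumps and quoting it with shlex.quote avoids hand-escaping the nested double quotes when the command is assembled in a launch script.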

LoRA

LoRA is configured from the client side, not the server. When creating a model via the Tinker SDK, pass a lora_config with the desired rank. For example, in tinker-cookbook recipes:

# LoRA training (default in most recipes)
python -m tinker_cookbook.recipes.sl_loop ... lora_rank=32

# Full-parameter fine-tuning
python -m tinker_cookbook.recipes.sl_loop ... lora_rank=0

No server-side configuration is needed to switch between single-tenant LoRA and full-parameter fine-tuning.

Multi-tenant LoRA

Hosting multiple LoRA tenants concurrently against one server does require server-side configuration on the Megatron backend. At minimum:

{
    "trainer.placement.colocate_all": false,
    "trainer.policy.megatron_config.lora_config.merge_lora": false,
    "trainer.policy.model.lora.max_loras": <max concurrent adapters in a single batch>,
    "trainer.policy.model.lora.max_cpu_loras": <total adapter capacity>
}

Setting merge_lora to false is required so that vLLM serves each tenant's adapter by name; with merge_lora: true, vLLM only sees the merged base weights, and per-tenant sampling returns the wrong weights. Size max_cpu_loras to the peak number of concurrent tenants: there is no on-demand reload, so if vLLM evicts an adapter, the next sample() call against it fails with a 404. All adapters on one server must share the same (rank, alpha, target_modules) signature; create_model rejects mismatched signatures outright.
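As a rough sketch, the minimum multi-tenant overrides can be generated from two numbers: the largest number of adapters expected in a single batch and the peak number of concurrent tenants. The helper multi_tenant_overrides is hypothetical, the values 4 and 16 are placeholders, and the check that the total capacity is at least the per-batch limit is an assumed sanity check rather than a documented SkyRL rule.

```python
import json


def multi_tenant_overrides(max_batch_adapters: int, peak_tenants: int) -> dict:
    """Build the minimum multi-tenant LoRA overrides described above.

    Hypothetical helper; argument names are illustrative.
    """
    if max_batch_adapters < 1:
        raise ValueError("max_batch_adapters must be at least 1")
    if peak_tenants < max_batch_adapters:
        # Assumed sanity check: total capacity should cover a full batch.
        raise ValueError("peak_tenants must be >= max_batch_adapters")
    return {
        "trainer.placement.colocate_all": False,
        "trainer.policy.megatron_config.lora_config.merge_lora": False,
        "trainer.policy.model.lora.max_loras": max_batch_adapters,
        # Sized to peak concurrency because evicted adapters are not reloaded.
        "trainer.policy.model.lora.max_cpu_loras": peak_tenants,
    }


# Placeholder sizing: up to 4 adapters per batch, 16 tenants at peak.
config = multi_tenant_overrides(max_batch_adapters=4, peak_tenants=16)
print(json.dumps(config, indent=4))
```

The printed JSON can be passed as the value of --backend-config; json.dumps renders the Python False values as JSON false.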

See Multi-tenancy for the full operator contract and SFT/RL quickstarts.

Full Config Reference

For the complete list of configuration options, see the SkyRL-Train configuration docs and the SkyRLTrainConfig dataclass.
