
Configuration

The backend configuration is being migrated to a simpler, reorganized structure. The keys and structure described below may change in a future release.

This page describes how to configure the SkyRL Tinker backend, including GPU allocation, training parameters, and inference settings.

When spinning up the Tinker server, the --backend-config flag accepts a JSON dictionary of dot-notation overrides that are applied to the underlying SkyRL-Train configuration. For example:

uv run --extra tinker --extra fsdp -m skyrl.tinker.api \
    --base-model "Qwen/Qwen3-0.6B" --backend fsdp \
    --backend-config '{"trainer.placement.policy_num_gpus_per_node": 4, "generator.inference_engine.num_engines": 4}'

Any field in the SkyRL-Train config can be overridden this way (see the default config YAML for all available keys and defaults). The most commonly used options are listed below.
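Conceptually, each dot-notation key names a path into the nested SkyRL-Train config. The following sketch illustrates the idea with a hypothetical helper (apply_overrides is not part of SkyRL; it only mimics how a dotted key maps onto a nested dictionary):

```python
import json

def apply_overrides(config: dict, overrides: dict) -> dict:
    """Apply {"a.b.c": value} style overrides to a nested config dict."""
    for dotted_key, value in overrides.items():
        node = config
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate sections as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config

# The JSON string has the same shape as the --backend-config argument.
config = {"trainer": {"placement": {"policy_num_gpus_per_node": 1}}}
overrides = json.loads(
    '{"trainer.placement.policy_num_gpus_per_node": 4,'
    ' "generator.inference_engine.num_engines": 4}'
)
apply_overrides(config, overrides)
print(config["trainer"]["placement"]["policy_num_gpus_per_node"])  # 4
print(config["generator"]["inference_engine"]["num_engines"])      # 4
```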

GPU and Parallelism

Key                                                  Default  Description
trainer.placement.policy_num_gpus_per_node           1        Number of GPUs for training
trainer.placement.policy_num_nodes                   1        Number of nodes for training
generator.inference_engine.num_engines               1        Number of vLLM inference engines for sampling
generator.inference_engine.tensor_parallel_size      1        Tensor parallel size per inference engine
trainer.micro_forward_batch_size_per_gpu             1        Micro-batch size per GPU (for forward pass)
trainer.micro_train_batch_size_per_gpu               1        Micro-batch size per GPU (for gradient accumulation)
generator.inference_engine.gpu_memory_utilization    0.8      Fraction of GPU memory for vLLM KV cache

When running a small model on multiple GPUs, you typically want to set trainer.placement.policy_num_gpus_per_node and generator.inference_engine.num_engines to the same value. For example, on a 4-GPU node:

--backend-config '{"trainer.placement.policy_num_gpus_per_node": 4, "generator.inference_engine.num_engines": 4}'

For large models that don't fit on a single GPU for inference, increase generator.inference_engine.tensor_parallel_size and decrease generator.inference_engine.num_engines accordingly. For example, on 4 GPUs with TP=2:

--backend-config '{"trainer.placement.policy_num_gpus_per_node": 4, "generator.inference_engine.num_engines": 2, "generator.inference_engine.tensor_parallel_size": 2}'
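Both examples follow the same accounting rule: the number of engines times the tensor-parallel size should equal the GPUs allocated to the node (assuming engines do not share GPUs). A small sanity check, using a hypothetical helper:

```python
def check_gpu_layout(num_gpus: int, num_engines: int, tensor_parallel_size: int) -> None:
    """Raise if the inference engines would not exactly tile the available GPUs."""
    used = num_engines * tensor_parallel_size
    if used != num_gpus:
        raise ValueError(
            f"{num_engines} engines x TP={tensor_parallel_size} uses {used} GPUs, "
            f"but {num_gpus} are allocated"
        )

check_gpu_layout(num_gpus=4, num_engines=4, tensor_parallel_size=1)  # small model
check_gpu_layout(num_gpus=4, num_engines=2, tensor_parallel_size=2)  # large model, TP=2
```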

LoRA

LoRA is configured from the client side, not the server. When creating a model via the Tinker SDK, pass a lora_config with the desired rank. For example, in tinker-cookbook recipes:

# LoRA training (default in most recipes)
python -m tinker_cookbook.recipes.sl_loop ... lora_rank=32

# Full-parameter fine-tuning
python -m tinker_cookbook.recipes.sl_loop ... lora_rank=0

No server-side configuration is needed to switch between single-tenant LoRA and full-parameter fine-tuning.

Multi-tenant LoRA

Hosting multiple LoRA tenants concurrently against one server does require server-side configuration on the Megatron backend. At minimum:

{
    "trainer.placement.colocate_all": false,
    "trainer.policy.megatron_config.lora_config.merge_lora": false,
    "trainer.policy.model.lora.max_loras": <max concurrent adapters in a single batch>,
    "trainer.policy.model.lora.max_cpu_loras": <total adapter capacity>
}

merge_lora: false is required so that vLLM serves each tenant's adapter by name; with merge_lora: true, vLLM only sees the merged base model and per-tenant sampling returns the wrong weights. max_cpu_loras must be sized to the peak number of concurrent tenants: there is no on-demand reload, and if vLLM evicts an adapter, the next sample() against it returns a 404. All adapters on one server must share the same (rank, alpha, target_modules) signature; mismatched signatures are hard-rejected at create_model.
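As a concrete illustration (the numbers here are hypothetical, not recommendations), a server expecting up to 8 resident adapters with at most 4 active in any single batch could be configured as:

```json
{
    "trainer.placement.colocate_all": false,
    "trainer.policy.megatron_config.lora_config.merge_lora": false,
    "trainer.policy.model.lora.max_loras": 4,
    "trainer.policy.model.lora.max_cpu_loras": 8
}
```

This dictionary is passed via --backend-config, exactly as in the GPU examples above.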

See Multi-tenancy for the full operator contract and SFT/RL quickstarts.

Full Config Reference

For the complete list of configuration options, see the SkyRL-Train configuration docs and the default config YAML.
