
Limitations & Roadmap

The Tinker integration is under active development. This page documents current limitations.

Current Limitations

Multi-tenant LoRA: Megatron only

Multi-tenant LoRA training and sampling are supported only on the Megatron backend, where vLLM serves each tenant's adapter by name. See Multi-tenancy for the operator contract and the SL/RL quickstarts. FSDP support is pending. Full-parameter fine-tuning remains single-tenant on both backends: calling create_model with lora_rank=0 while another model exists returns an error.

All adapters registered against one server must share the same (rank, alpha, target_modules) signature; create_model hard-rejects any adapter whose signature differs.
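The two admission rules above can be checked on the client before calling create_model. The following is a minimal sketch, not the server's implementation: AdapterSignature and check_can_create are hypothetical names introduced for illustration, and the sketch assumes that signatures are compared by exact equality and that target_modules is order-insensitive.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class AdapterSignature:
    """Hypothetical container for the fields the server is described as comparing."""
    rank: int
    alpha: float
    target_modules: FrozenSet[str]  # assumption: module order does not matter


def check_can_create(existing: List[AdapterSignature], new: AdapterSignature) -> None:
    """Raise ValueError if `new` would violate either rule described above."""
    # Rule 1: full-parameter fine-tuning (lora_rank=0) is single-tenant.
    if new.rank == 0 and existing:
        raise ValueError("lora_rank=0 requested while another model exists")
    # Rule 2: every adapter on one server shares the same signature.
    for sig in existing:
        if sig != new:
            raise ValueError(f"signature mismatch: {new} vs registered {sig}")
```

Running a check like this before create_model surfaces a mismatch with a local, readable message; the server remains the source of truth.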

No Prompt Logprobs

The sample() API does not yet return prompt logprobs, even when they are requested. A warning is logged, but no error is raised. Scripts that rely on prompt logprobs, for example to compute a KL penalty, may therefore be affected without an obvious failure.
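Because the request succeeds with only a warning, a script that needs prompt logprobs may prefer to fail loudly. The helper below is a sketch under assumptions: it assumes the sample response exposes a prompt_logprobs attribute that is None or absent when the values are unavailable, and get_prompt_logprobs_or_fail is a hypothetical name, not part of the Tinker API.

```python
from typing import Any, List, Optional


def get_prompt_logprobs_or_fail(response: Any) -> List[Optional[float]]:
    """Return prompt logprobs from a sample response, or raise if they are missing.

    Assumption: the response carries a `prompt_logprobs` attribute that is
    None (or absent) when the backend did not return the values.
    """
    logprobs = getattr(response, "prompt_logprobs", None)
    if logprobs is None:
        raise RuntimeError(
            "prompt logprobs were requested but not returned by the backend"
        )
    return list(logprobs)
```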

KL Penalty

KL penalty (kl_penalty_coef > 0) is not yet supported. Supporting it requires two things: prompt logprobs from vLLM, which are not wired yet, and a way to serve logprobs from the frozen base model after weight sync. Cookbook recipes disable the KL penalty by default, so this is not a blocker for most workflows.
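To see why both sets of logprobs are needed, consider one common way a KL penalty enters the reward. This is an illustrative sketch of a simple per-token estimator, not necessarily the formula the cookbook recipes use; kl_penalized_reward is a hypothetical name, and only kl_penalty_coef comes from the text above.

```python
from typing import Sequence


def kl_penalized_reward(
    reward: float,
    policy_logprobs: Sequence[float],
    base_logprobs: Sequence[float],
    kl_penalty_coef: float,
) -> float:
    """Subtract a KL penalty from a scalar reward.

    The KL term is estimated as the mean per-token difference between the
    current policy's logprobs and the frozen base model's logprobs on the
    same tokens, so both sequences must be available and aligned.
    """
    if len(policy_logprobs) != len(base_logprobs):
        raise ValueError("policy and base logprobs must cover the same tokens")
    if not policy_logprobs:
        return reward
    diffs = [p - b for p, b in zip(policy_logprobs, base_logprobs)]
    kl_estimate = sum(diffs) / len(diffs)
    return reward - kl_penalty_coef * kl_estimate
```

With kl_penalty_coef set to 0, the reward passes through unchanged, which matches the default described above.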

RL Loss Functions

Only cross_entropy and importance_sampling are currently wired through the Tinker data conversion path. SkyRL's PolicyLossRegistry contains implementations of other losses, including regular PPO and cispo, but these are not yet validated through the Tinker API.
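A training script can guard against selecting an unvalidated loss before submitting any work. The check below is a minimal client-side sketch: SUPPORTED_LOSS_FNS and validate_loss_fn are hypothetical names, and the set simply mirrors the two losses listed above, so it would need updating as more losses are wired through.

```python
# Mirrors the losses this page lists as wired through the Tinker path.
SUPPORTED_LOSS_FNS = frozenset({"cross_entropy", "importance_sampling"})


def validate_loss_fn(name: str) -> str:
    """Return `name` unchanged if it is a supported loss, else raise ValueError."""
    if name not in SUPPORTED_LOSS_FNS:
        supported = ", ".join(sorted(SUPPORTED_LOSS_FNS))
        raise ValueError(
            f"loss function {name!r} is not validated through the Tinker API; "
            f"supported: {supported}"
        )
    return name
```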
