
Cookbook Scripts

The tinker-cookbook is a library of ready-to-run training recipes provided by Thinking Machines. This page shows how to run a few example recipes on SkyRL and includes example curves from our experiments.

Setup

Follow the Quickstart to install SkyRL, then clone the cookbook:

git clone https://github.com/thinking-machines-lab/tinker-cookbook.git
cd tinker-cookbook

Start the Server

Before launching a training script, start the Tinker server from the SkyRL/ directory:

uv run --extra tinker --extra fsdp -m skyrl.tinker.api \
    --base-model "Qwen/Qwen3-0.6B" --backend fsdp

The same server command can be reused across all recipes below. For more detail on configuration options for the training backend, see Configuration.
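Training scripts will fail if the server is not yet accepting connections on localhost:8000 (the base_url used in the recipes below). A minimal readiness check you could run before launching a recipe, assuming only that the server listens on that address:

```python
import socket
import time


def wait_for_server(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError until the server is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(1.0)
    return False


# Usage: wait_for_server("localhost", 8000) before starting a recipe.
```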

Recipes

All of the cookbook recipes default to LoRA training (e.g., with lora_rank=32), but full-parameter fine-tuning (FFT) is supported on SkyRL by setting lora_rank=0. Per Thinking Machines' learning rate guide, use a ~10x lower learning rate when switching to FFT.
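The ~10x rule is just a scaling heuristic; as a sketch (a hypothetical helper, not part of the cookbook):

```python
def fft_learning_rate(lora_lr: float, scale: float = 10.0) -> float:
    """Heuristic from the learning-rate guide: divide the LoRA LR by ~10 for FFT."""
    return lora_lr / scale


# For example, if a LoRA run used a learning rate of 1e-4,
# a reasonable starting point for FFT would be around 1e-5.
```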

Supervised Learning Loop (sl_loop)

Fine-tunes a model on the No Robots dataset using cross-entropy loss with a linear learning rate decay.
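A linear learning-rate decay schedule can be sketched as follows (a generic illustration, not the cookbook's exact implementation):

```python
def linear_decay_lr(
    step: int, total_steps: int, base_lr: float, final_lr: float = 0.0
) -> float:
    """Linearly interpolate from base_lr at step 0 to final_lr at total_steps."""
    frac = min(step, total_steps) / total_steps
    return base_lr + (final_lr - base_lr) * frac
```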

TINKER_API_KEY=tml-dummy uv run --with tinker --with datasets \
    python -m tinker_cookbook.recipes.sl_loop \
    base_url=http://localhost:8000 \
    model_name="Qwen/Qwen3-0.6B" \
    train_on_what=LAST_ASSISTANT_MESSAGE
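The train_on_what=LAST_ASSISTANT_MESSAGE setting restricts the cross-entropy loss to the final assistant turn. A rough sketch of that masking, assuming per-token role labels and a 0/1 loss-weight array (the names here are illustrative, not the cookbook's internals):

```python
def last_assistant_mask(roles: list[str]) -> list[int]:
    """Per-token loss weights: 1 only for tokens in the last assistant span."""
    last = max((i for i, r in enumerate(roles) if r == "assistant"), default=None)
    if last is None:
        return [0] * len(roles)
    # Walk back to the start of that contiguous assistant span.
    start = last
    while start > 0 and roles[start - 1] == "assistant":
        start -= 1
    return [1 if start <= i <= last else 0 for i in range(len(roles))]
```

Earlier assistant turns still appear in the context but contribute no gradient.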

[Figure: SL NLL over steps]

For full-parameter fine-tuning (no LoRA), set lora_rank=0 and lower the learning rate:

TINKER_API_KEY=tml-dummy uv run --with tinker --with datasets \
    python -m tinker_cookbook.recipes.sl_loop \
    base_url=http://localhost:8000 \
    model_name="Qwen/Qwen3-0.6B" \
    train_on_what=LAST_ASSISTANT_MESSAGE \
    lora_rank=0 learning_rate=1e-5

RL Training Loop (rl_loop)

Trains a model on GSM8K math problems using GRPO-style reward centering with importance sampling.

TINKER_API_KEY=tml-dummy uv run --with tinker --with datasets --with torch \
    python -m tinker_cookbook.recipes.rl_loop \
    base_url=http://localhost:8000 \
    model_name="Qwen/Qwen3-0.6B"

[Figure: RL reward over steps]

For full-parameter fine-tuning (no LoRA), set lora_rank=0 and lower the learning rate:

TINKER_API_KEY=tml-dummy uv run --with tinker --with datasets --with torch \
    python -m tinker_cookbook.recipes.rl_loop \
    base_url=http://localhost:8000 \
    model_name="Qwen/Qwen3-0.6B" \
    lora_rank=0 learning_rate=4e-6

Note: rl_loop uses ephemeral weight sync by default, syncing weights to the inference engine without writing to disk. See Weight Sync for details on ephemeral vs persistent modes.

Math RL (math_rl)

RL training specifically for mathematical reasoning.

TINKER_API_KEY=tml-dummy uv run --with tinker --with datasets --with torch \
    python -m tinker_cookbook.recipes.math_rl.train \
    base_url=http://localhost:8000 \
    model_name="Qwen/Qwen3-0.6B"

[Figure: Math RL correct over steps]

Code RL (code_rl)

RL training for code generation tasks. Uses the same importance_sampling loss with code execution-based rewards.
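An execution-based reward in this style typically runs the generated program and scores it against expected output. A simplified, sandbox-free sketch (the recipe's actual grading is more involved, and real training should execute untrusted code in a sandbox):

```python
import subprocess
import sys


def execution_reward(code: str, expected_stdout: str, timeout: float = 5.0) -> float:
    """Return 1.0 if the program runs cleanly and prints the expected output, else 0.0."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    if result.returncode != 0:
        return 0.0
    return 1.0 if result.stdout.strip() == expected_stdout.strip() else 0.0
```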

TINKER_API_KEY=tml-dummy uv run --with tinker --with datasets --with torch \
    python -m tinker_cookbook.recipes.code_rl.train \
    base_url=http://localhost:8000 \
    model_name="Qwen/Qwen3-0.6B" \
    lora_rank=0 learning_rate=1e-6

Vision Language (vlm_classifier)

Trains an image classifier on top of Qwen3-VL using the vlm_classifier cookbook recipe. For background on SkyRL's VLM path — dataset record shape, interaction patterns, and broader VLM support — see the Vision-Language RL tutorial.

VLM training requires a newer vLLM than the version pinned in the repo. Clone vLLM from GitHub and point [tool.uv.sources] in the root pyproject.toml at your local checkout before running. See the vLLM source override callout on the tutorial page for the exact steps.

This recipe uses a different server command from the common one at the top of this page; in particular, the use_sample_packing override is VLM-specific:

uv run --isolated --extra tinker --extra fsdp \
    -m skyrl.tinker.api \
    --base-model "Qwen/Qwen3-VL-8B-Instruct" --backend fsdp \
    --backend-config '{"trainer.use_sample_packing": false}'
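The --backend-config value is a JSON object of dotted-path config overrides. The merge it performs can be sketched roughly like this (an illustration of the pattern, not SkyRL's actual code; the key name comes from the command above):

```python
import json


def apply_overrides(config: dict, overrides_json: str) -> dict:
    """Apply dotted-path overrides such as {"trainer.use_sample_packing": false}
    onto a nested config dict, creating intermediate dicts as needed."""
    for dotted, value in json.loads(overrides_json).items():
        node = config
        *parents, leaf = dotted.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return config


cfg = {"trainer": {"use_sample_packing": True, "micro_batch_size": 4}}
apply_overrides(cfg, '{"trainer.use_sample_packing": false}')
```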

Then launch the training recipe:

TINKER_API_KEY=tml-dummy uv run --with tinker --with datasets --with torch \
    python -m tinker_cookbook.recipes.vlm_classifier.train \
    base_url=http://localhost:8000 \
    model_name="Qwen/Qwen3-VL-8B-Instruct"

[Figure: VLM classifier NLL over steps]
