
Add tutorials: 6 progressive Jupyter notebooks from first API call to custom RL pipelines #541

Open

YujiaBao wants to merge 11 commits into thinking-machines-lab:main from YujiaBao:tutorials-notebooks

Conversation

@YujiaBao
Member

@YujiaBao YujiaBao commented Mar 25, 2026

Summary

Add a tutorials/ directory with 6 progressive Jupyter notebooks that guide users from first API call to building custom RL training pipelines. All notebooks include pre-executed outputs so readers can follow along without API access.

| # | Notebook | What it teaches |
|---|----------|-----------------|
| 01 | Hello Tinker | Architecture overview (local vs remote), client hierarchy, sampling |
| 02 | First SFT | Renderers, datum construction, training loop, "Tinker Tinker" persona, Kimi K2.5 scaling demo |
| 03 | Efficient Sampling | Concurrent futures vs sequential (6.2x speedup), num_samples, batch evaluation throughput |
| 04 | First RL | Raw GRPO loop on GSM8K: reward functions, group-relative advantages, degenerate groups |
| 05 | Cookbook RL Abstractions | Env, EnvGroupBuilder, RLDataset, ProblemEnv — how the raw loop maps to reusable types |
| 06 | Custom RL Environment | Build a custom ProblemEnv subclass and RLDataset for a new task (format compliance) |
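The group-relative advantage step at the heart of the raw GRPO loop in tutorial 04 can be sketched in a few lines. This is an illustrative stand-in, not the notebook's actual code; the binary reward setup is an assumption about a typical GSM8K correctness reward:

```python
# Illustrative sketch of GRPO's group-relative advantages: each rollout's
# reward is centered on the mean reward of its group (the GRPO baseline).
from statistics import mean

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Subtract the group-mean baseline from each reward.

    A "degenerate" group -- all rewards identical -- produces all-zero
    advantages and therefore contributes no gradient signal.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]

# One group of 4 rollouts for the same problem; reward 1.0 if the sampled
# answer matched the reference, else 0.0 (an assumed reward scheme).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))   # [0.5, -0.5, -0.5, 0.5]
print(group_relative_advantages([1.0] * 4)) # [0.0, 0.0, 0.0, 0.0]  degenerate
```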

Key highlights:

  • Tutorial 02 trains both Qwen3.5-4B and Kimi K2.5 with the same code, demonstrating Tinker's core value: same code, any model scale, no GPU setup
  • Tutorial 03 shows concrete speedups from concurrent sampling (6.2x) and num_samples (3.4x), setting up the pattern used in RL
  • Tutorials 05-06 bridge the gap between raw GRPO loops and the cookbook's standard abstractions (ProblemEnv, ProblemGroupBuilder, RLDataset)
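The sequential-vs-concurrent pattern from tutorial 03 can be sketched with the standard library alone. Here `sample_one` is a stand-in for a remote sampling call (the real Tinker client issues requests through futures); the sleep simulates network latency:

```python
# Sketch of sequential vs concurrent sampling. `sample_one` simulates a
# remote completion request with a fixed latency.
import time
from concurrent.futures import ThreadPoolExecutor

def sample_one(prompt: str) -> str:
    time.sleep(0.05)  # stand-in for round-trip latency of one sampling call
    return f"completion for {prompt!r}"

prompts = [f"prompt {i}" for i in range(8)]

# Sequential: total time is roughly 8 * latency.
t0 = time.perf_counter()
sequential = [sample_one(p) for p in prompts]
seq_time = time.perf_counter() - t0

# Concurrent: requests overlap, so total time is roughly 1 * latency.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    concurrent_results = list(pool.map(sample_one, prompts))
conc_time = time.perf_counter() - t0

assert sequential == concurrent_results
print(f"sequential {seq_time:.2f}s vs concurrent {conc_time:.2f}s")
```

The same overlap is why concurrent rollout sampling matters for GRPO in tutorial 04: each group needs many completions, and issuing them concurrently keeps wall-clock time close to a single request's latency.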

Motivation

User feedback indicated that the cookbook has strong reference docs and production recipes, but lacks a guided learning path bridging the two. Users frequently rebuilt what existing recipes already provide (e.g., GRPO on DeepCoder) because they couldn't find them or didn't know where to start. The tutorials fill this gap with a progressive path from basics to custom tasks.

Test plan

  • All 6 notebooks executed end-to-end against live Tinker API
  • Pre-commit hooks pass (ruff check, ruff format, end-of-file-fixer)
  • No warnings in notebook outputs (tqdm, train_on_what suppressed)
  • Tutorials use Qwen3.5-4B (02, 03, 05, 06), Llama-3.1-8B (04), and Kimi K2.5 (02 scaling demo)

🤖 Generated with Claude Code

YujiaBao and others added 3 commits March 24, 2026 18:48
Guided introduction to Tinker from first API call to custom RL pipelines:
01 Hello Tinker, 02 First SFT, 03 Async Patterns, 04 First RL, 05 Custom Task.
All notebooks verified to run end-to-end against live Tinker API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run all 5 notebooks against live Tinker API so readers can see
expected outputs (loss curves, reward metrics, sample completions)
without needing API access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace Pig Latin task with "Tinker Tinker" persona (assistant that knows
  about the Tinker platform) -- more engaging and on-brand
- Switch tutorials 02 and 03 from Qwen3-30B-A3B to Qwen/Qwen3.5-4B
- Fix response parsing to use get_text_content() for clean output
- Add 8 training examples covering Tinker concepts (Datum, renderer, GRPO, etc.)
- Re-execute all modified notebooks with fresh outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@YujiaBao YujiaBao marked this pull request as draft March 25, 2026 02:09
- Tutorial 05: rewritten to teach Env, EnvGroupBuilder, RLDataset, ProblemEnv
  and how the raw GRPO loop from tutorial 04 maps to these abstractions
- Tutorial 06 (new): build a custom FormatEnv with ProblemEnv, FormatDataset
  with RLDataset, and train using the standard cookbook pipeline
- README updated with 6-tutorial table
- All notebooks executed with outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@YujiaBao YujiaBao changed the title Add tutorials directory with 5 progressive Jupyter notebook guides Add tutorials directory with 6 progressive Jupyter notebook guides Mar 25, 2026
YujiaBao and others added 6 commits March 24, 2026 20:46
Link to the 6 tutorial notebooks with a summary table so new users
can find the guided learning path from the main README.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Suppress tqdm IProgress warning across all notebooks
- Fix train_on_what warning in tutorials 02/03 (use LAST_ASSISTANT_MESSAGE)
- Fix RL environment doc links to use tinker-docs.thinkingmachines.ai
- Add reward + datums plot to tutorial 06
- Hardcode LR in tutorial 06 (get_lr doesn't support Qwen3.5 config yet)
- Re-execute all 6 notebooks with clean outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
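The tqdm warning suppression mentioned in the commit above boils down to a `warnings` filter. The exact message text is an assumption about tqdm's "IProgress not found" warning, which appears when ipywidgets is missing in a Jupyter environment:

```python
# Sketch of suppressing a specific warning by message while letting
# unrelated warnings through.
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything by default...
    warnings.filterwarnings(         # ...except the tqdm progress-bar noise
        "ignore", message="IProgress not found.*"
    )
    warnings.warn("IProgress not found. Please update jupyter and ipywidgets.")
    warnings.warn("some other warning")  # unrelated warnings still surface

print(len(caught))  # 1 -- only the unrelated warning was recorded
```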
- Add time.time() to training steps so readers see per-step latency
- Add loss curve plot after training
- Add "Scaling up" section that fine-tunes Kimi K2.5 on the same data
- Add side-by-side loss comparison plot (4B vs frontier model)
- Sample from fine-tuned Kimi K2.5 to show quality difference
- Demonstrates core Tinker value: same code, any model scale

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch to kimi_k25_disable_thinking renderer for clean outputs
- Increase from 5 to 10 training steps with 5e-4 LR
- Loss now drops to 0.0002, responses are clean without </think> artifacts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the old async SFT training tutorial (which was misleading since
forward_backward and optim_step are inherently sequential) with a new
tutorial focused on concurrent sampling — the real async payoff in Tinker.

Shows three techniques with timing comparisons:
- Sequential vs concurrent futures (6.2x speedup)
- num_samples for grouped completions (3.4x speedup)
- Combined batch evaluation (32 completions in 7.7s)

This naturally leads into tutorial 04 (RL) where concurrent sampling
is essential for GRPO rollouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@YujiaBao YujiaBao requested a review from joschu March 25, 2026 04:55
@YujiaBao YujiaBao changed the title Add tutorials directory with 6 progressive Jupyter notebook guides Add tutorials: 6 progressive Jupyter notebooks from first API call to custom RL pipelines Mar 25, 2026
- Tutorial 02: move "Next steps" to after Kimi K2.5 section
- Tutorial 02: suppress Kimi tokenizer debug output via %%capture
- Re-execute tutorial 02 with clean outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@YujiaBao YujiaBao requested review from cjtb-42 and dphuang2 March 25, 2026 05:18
@YujiaBao YujiaBao marked this pull request as ready for review March 25, 2026 05:18
@cjtb-42

cjtb-42 commented Mar 26, 2026

Jupyter isn't listed in the project deps; since the PR adds tutorial notebooks, it probably should include it, at least as an optional dependency:

```toml
[project.optional-dependencies]
tutorials = ["jupyter>=1.0.0"]
```

@cjtb-42

cjtb-42 commented Mar 26, 2026

Recommend that we update "Install the Tinker SDK and set your API key." to "Install the Tinker SDK and set your API key as an environment variable, TINKER_API_KEY"

@cjtb-42

cjtb-42 commented Mar 27, 2026

@YujiaBao I will very happily approve once the cosmetic/explanatory changes for Tutorial 1 (whichever ones you choose to include from the doc) are sorted and we add the jupyter notebook optional dependency
