
Add tutorials: 6 progressive Jupyter notebooks from first API call to custom RL pipelines #541

Open

YujiaBao wants to merge 11 commits into thinking-machines-lab:main from YujiaBao:tutorials-notebooks

Conversation

@YujiaBao
Member

@YujiaBao YujiaBao commented Mar 25, 2026

Summary

Add a tutorials/ directory with 6 progressive Jupyter notebooks that guide users from first API call to building custom RL training pipelines. All notebooks include pre-executed outputs so readers can follow along without API access.

| # | Notebook | What it teaches |
|---|----------|-----------------|
| 01 | Hello Tinker | Architecture overview (local vs remote), client hierarchy, sampling |
| 02 | First SFT | Renderers, datum construction, training loop, "Tinker Tinker" persona, Kimi K2.5 scaling demo |
| 03 | Efficient Sampling | Concurrent futures vs sequential (6.2x speedup), num_samples, batch evaluation throughput |
| 04 | First RL | Raw GRPO loop on GSM8K: reward functions, group-relative advantages, degenerate groups |
| 05 | Cookbook RL Abstractions | Env, EnvGroupBuilder, RLDataset, ProblemEnv — how the raw loop maps to reusable types |
| 06 | Custom RL Environment | Build a custom ProblemEnv subclass and RLDataset for a new task (format compliance) |
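The group-relative advantage step at the heart of the raw GRPO loop in tutorial 04 can be sketched in a few lines. This is an illustrative stand-in, not the notebook's actual code; the binary reward setup is an assumption about a typical GSM8K correctness reward:

```python
# Illustrative sketch of GRPO's group-relative advantages: each rollout's
# reward is centered on the mean reward of its group (the GRPO baseline).
from statistics import mean

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Subtract the group-mean baseline from each reward.

    A "degenerate" group -- all rewards identical -- produces all-zero
    advantages and therefore contributes no gradient signal.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]

# One group of 4 rollouts for the same problem; reward 1.0 if the sampled
# answer matched the reference, else 0.0 (an assumed reward scheme).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))   # [0.5, -0.5, -0.5, 0.5]
print(group_relative_advantages([1.0] * 4)) # [0.0, 0.0, 0.0, 0.0]  degenerate
```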

Key highlights:

  • Tutorial 02 trains both Qwen3.5-4B and Kimi K2.5 with the same code, demonstrating Tinker's core value: same code, any model scale, no GPU setup
  • Tutorial 03 shows concrete speedups from concurrent sampling (6.2x) and num_samples (3.4x), setting up the pattern used in RL
  • Tutorials 05-06 bridge the gap between raw GRPO loops and the cookbook's standard abstractions (ProblemEnv, ProblemGroupBuilder, RLDataset)
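The sequential-vs-concurrent pattern from tutorial 03 can be sketched with the standard library alone. Here `sample_one` is a stand-in for a remote sampling call (the real Tinker client issues requests through futures); the sleep simulates network latency:

```python
# Sketch of sequential vs concurrent sampling. `sample_one` simulates a
# remote completion request with a fixed latency.
import time
from concurrent.futures import ThreadPoolExecutor

def sample_one(prompt: str) -> str:
    time.sleep(0.05)  # stand-in for round-trip latency of one sampling call
    return f"completion for {prompt!r}"

prompts = [f"prompt {i}" for i in range(8)]

# Sequential: total time is roughly 8 * latency.
t0 = time.perf_counter()
sequential = [sample_one(p) for p in prompts]
seq_time = time.perf_counter() - t0

# Concurrent: requests overlap, so total time is roughly 1 * latency.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    concurrent_results = list(pool.map(sample_one, prompts))
conc_time = time.perf_counter() - t0

assert sequential == concurrent_results
print(f"sequential {seq_time:.2f}s vs concurrent {conc_time:.2f}s")
```

The same overlap is why concurrent rollout sampling matters for GRPO in tutorial 04: each group needs many completions, and issuing them concurrently keeps wall-clock time close to a single request's latency.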

Motivation

User feedback indicated that the cookbook has strong reference docs and production recipes, but lacks a guided learning path bridging the two. Users frequently rebuilt what existing recipes already provide (e.g., GRPO on DeepCoder) because they couldn't find them or didn't know where to start. The tutorials fill this gap with a progressive path from basics to custom tasks.

Test plan

  • All 6 notebooks executed end-to-end against live Tinker API
  • Pre-commit hooks pass (ruff check, ruff format, end-of-file-fixer)
  • No warnings in notebook outputs (tqdm, train_on_what suppressed)
  • Tutorials use Qwen3.5-4B (02, 03, 05, 06), Llama-3.1-8B (04), and Kimi K2.5 (02 scaling demo)

🤖 Generated with Claude Code

YujiaBao and others added 3 commits March 24, 2026 18:48
Guided introduction to Tinker from first API call to custom RL pipelines:
01 Hello Tinker, 02 First SFT, 03 Async Patterns, 04 First RL, 05 Custom Task.
All notebooks verified to run end-to-end against live Tinker API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run all 5 notebooks against live Tinker API so readers can see
expected outputs (loss curves, reward metrics, sample completions)
without needing API access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace Pig Latin task with "Tinker Tinker" persona (assistant that knows
  about the Tinker platform) -- more engaging and on-brand
- Switch tutorials 02 and 03 from Qwen3-30B-A3B to Qwen/Qwen3.5-4B
- Fix response parsing to use get_text_content() for clean output
- Add 8 training examples covering Tinker concepts (Datum, renderer, GRPO, etc.)
- Re-execute all modified notebooks with fresh outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@YujiaBao YujiaBao marked this pull request as draft March 25, 2026 02:09
- Tutorial 05: rewritten to teach Env, EnvGroupBuilder, RLDataset, ProblemEnv
  and how the raw GRPO loop from tutorial 04 maps to these abstractions
- Tutorial 06 (new): build a custom FormatEnv with ProblemEnv, FormatDataset
  with RLDataset, and train using the standard cookbook pipeline
- README updated with 6-tutorial table
- All notebooks executed with outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@YujiaBao YujiaBao changed the title Add tutorials directory with 5 progressive Jupyter notebook guides Add tutorials directory with 6 progressive Jupyter notebook guides Mar 25, 2026
YujiaBao and others added 6 commits March 24, 2026 20:46
Link to the 6 tutorial notebooks with a summary table so new users
can find the guided learning path from the main README.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Suppress tqdm IProgress warning across all notebooks
- Fix train_on_what warning in tutorials 02/03 (use LAST_ASSISTANT_MESSAGE)
- Fix RL environment doc links to use tinker-docs.thinkingmachines.ai
- Add reward + datums plot to tutorial 06
- Hardcode LR in tutorial 06 (get_lr doesn't support Qwen3.5 config yet)
- Re-execute all 6 notebooks with clean outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
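The tqdm warning suppression mentioned in the commit above boils down to a `warnings` filter. The exact message text is an assumption about tqdm's "IProgress not found" warning, which appears when ipywidgets is missing in a Jupyter environment:

```python
# Sketch of suppressing a specific warning by message while letting
# unrelated warnings through.
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything by default...
    warnings.filterwarnings(         # ...except the tqdm progress-bar noise
        "ignore", message="IProgress not found.*"
    )
    warnings.warn("IProgress not found. Please update jupyter and ipywidgets.")
    warnings.warn("some other warning")  # unrelated warnings still surface

print(len(caught))  # 1 -- only the unrelated warning was recorded
```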
- Add time.time() to training steps so readers see per-step latency
- Add loss curve plot after training
- Add "Scaling up" section that fine-tunes Kimi K2.5 on the same data
- Add side-by-side loss comparison plot (4B vs frontier model)
- Sample from fine-tuned Kimi K2.5 to show quality difference
- Demonstrates core Tinker value: same code, any model scale

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch to kimi_k25_disable_thinking renderer for clean outputs
- Increase from 5 to 10 training steps with 5e-4 LR
- Loss now drops to 0.0002, responses are clean without </think> artifacts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the old async SFT training tutorial (which was misleading since
forward_backward and optim_step are inherently sequential) with a new
tutorial focused on concurrent sampling — the real async payoff in Tinker.

Shows three techniques with timing comparisons:
- Sequential vs concurrent futures (6.2x speedup)
- num_samples for grouped completions (3.4x speedup)
- Combined batch evaluation (32 completions in 7.7s)

This naturally leads into tutorial 04 (RL) where concurrent sampling
is essential for GRPO rollouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@YujiaBao YujiaBao requested a review from joschu March 25, 2026 04:55
@YujiaBao YujiaBao changed the title Add tutorials directory with 6 progressive Jupyter notebook guides Add tutorials: 6 progressive Jupyter notebooks from first API call to custom RL pipelines Mar 25, 2026
- Tutorial 02: move "Next steps" to after Kimi K2.5 section
- Tutorial 02: suppress Kimi tokenizer debug output via %%capture
- Re-execute tutorial 02 with clean outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@YujiaBao YujiaBao requested review from cjtb-42 and dphuang2 March 25, 2026 05:18
@YujiaBao YujiaBao marked this pull request as ready for review March 25, 2026 05:18
@cjtb-42

cjtb-42 commented Mar 26, 2026

Jupyter isn't listed in the project deps; since the PR adds tutorial notebooks, it probably should include it, at least as an optional dependency:

```toml
[project.optional-dependencies]
tutorials = ["jupyter>=1.0.0"]
```

@cjtb-42

cjtb-42 commented Mar 26, 2026

Recommend that we update "Install the Tinker SDK and set your API key." to "Install the Tinker SDK and set your API key as an environment variable, TINKER_API_KEY"

@cjtb-42

cjtb-42 commented Mar 27, 2026

@YujiaBao I will very happily approve once the cosmetic/explanatory changes for Tutorial 1 (whichever ones you choose to include from the doc) are sorted and we add the jupyter notebook optional dependency
