Add tutorials: 6 progressive Jupyter notebooks from first API call to custom RL pipelines #541
Open
YujiaBao wants to merge 11 commits into thinking-machines-lab:main from
Conversation
Guided introduction to Tinker from first API call to custom RL pipelines: 01 Hello Tinker, 02 First SFT, 03 Async Patterns, 04 First RL, 05 Custom Task. All notebooks verified to run end-to-end against live Tinker API. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run all 5 notebooks against live Tinker API so readers can see expected outputs (loss curves, reward metrics, sample completions) without needing API access. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace Pig Latin task with "Tinker Tinker" persona (assistant that knows about the Tinker platform) -- more engaging and on-brand
- Switch tutorials 02 and 03 from Qwen3-30B-A3B to Qwen/Qwen3.5-4B
- Fix response parsing to use get_text_content() for clean output
- Add 8 training examples covering Tinker concepts (Datum, renderer, GRPO, etc.)
- Re-execute all modified notebooks with fresh outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Tutorial 05: rewritten to teach Env, EnvGroupBuilder, RLDataset, ProblemEnv and how the raw GRPO loop from tutorial 04 maps to these abstractions
- Tutorial 06 (new): build a custom FormatEnv with ProblemEnv, FormatDataset with RLDataset, and train using the standard cookbook pipeline
- README updated with 6-tutorial table
- All notebooks executed with outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
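The GRPO loop these tutorials reference relies on group-relative advantages. A generic sketch of that computation (an illustration of the general technique, not the cookbook's exact implementation):

```python
def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: center each completion's reward on its
    group mean and scale by the group standard deviation."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # guard against uniform-reward groups
    return [(r - mean) / std for r in rewards]

print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Because advantages are computed per group of completions for the same prompt, sampling many completions concurrently (tutorial 03's topic) is what makes the rollout phase practical.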
Link to the 6 tutorial notebooks with a summary table so new users can find the guided learning path from the main README. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Suppress tqdm IProgress warning across all notebooks
- Fix train_on_what warning in tutorials 02/03 (use LAST_ASSISTANT_MESSAGE)
- Fix RL environment doc links to use tinker-docs.thinkingmachines.ai
- Add reward + datums plot to tutorial 06
- Hardcode LR in tutorial 06 (get_lr doesn't support Qwen3.5 config yet)
- Re-execute all 6 notebooks with clean outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add time.time() to training steps so readers see per-step latency
- Add loss curve plot after training
- Add "Scaling up" section that fine-tunes Kimi K2.5 on the same data
- Add side-by-side loss comparison plot (4B vs frontier model)
- Sample from fine-tuned Kimi K2.5 to show quality difference
- Demonstrates core Tinker value: same code, any model scale

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
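The per-step timing pattern described above can be sketched as follows; the loss values are dummies standing in for the real training calls:

```python
import time

# Collect per-step latency and loss so they can be printed or plotted later.
step_times, losses = [], []
for step in range(3):
    start = time.time()
    loss = 1.0 / (step + 1)  # placeholder for forward_backward + optim_step
    step_times.append(time.time() - start)
    losses.append(loss)
    print(f"step {step}: loss={loss:.4f} ({step_times[-1]:.2f}s)")
```

The collected `losses` list is then what feeds the loss-curve plot.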
- Switch to kimi_k25_disable_thinking renderer for clean outputs
- Increase from 5 to 10 training steps with 5e-4 LR
- Loss now drops to 0.0002, responses are clean without </think> artifacts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the old async SFT training tutorial (which was misleading, since forward_backward and optim_step are inherently sequential) with a new tutorial focused on concurrent sampling, the real async payoff in Tinker. Shows three techniques with timing comparisons:
- Sequential vs concurrent futures (6.2x speedup)
- num_samples for grouped completions (3.4x speedup)
- Combined batch evaluation (32 completions in 7.7s)

This naturally leads into tutorial 04 (RL), where concurrent sampling is essential for GRPO rollouts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
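The sequential-vs-concurrent comparison can be illustrated generically with `concurrent.futures`; the `sample` function here is a stand-in for a Tinker sampling request, and the exact speedup depends on the real API's latency:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def sample(prompt: str) -> str:
    """Stand-in for a sampling request; sleeps to mimic network latency."""
    time.sleep(0.1)
    return f"completion for {prompt!r}"

prompts = [f"prompt {i}" for i in range(8)]

# Sequential: total latency is roughly the sum of per-request latencies.
t0 = time.time()
seq = [sample(p) for p in prompts]
seq_s = time.time() - t0

# Concurrent: requests overlap, so total latency approaches the slowest one.
t0 = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    conc = list(pool.map(sample, prompts))
conc_s = time.time() - t0

print(f"sequential: {seq_s:.2f}s, concurrent: {conc_s:.2f}s")
```

The same overlap is why GRPO rollouts in tutorial 04 benefit so much from concurrent sampling: each prompt needs a whole group of completions.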
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Tutorial 02: move "Next steps" to after Kimi K2.5 section
- Tutorial 02: suppress Kimi tokenizer debug output via %%capture
- Re-execute tutorial 02 with clean outputs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Jupyter isn't listed in the project deps; since the PR adds tutorial notebooks, it probably should include it, at least as an optional dependency:
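One way to express that is an optional-dependency group in pyproject.toml (the group name and package list here are assumptions; the project's actual packaging config may differ):

```toml
[project.optional-dependencies]
notebooks = ["jupyter"]
```

Users would then install it with something like `pip install "tinker-cookbook[notebooks]"` (package name assumed from the repo).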
Recommend that we update "Install the Tinker SDK and set your API key." to "Install the Tinker SDK and set your API key as an environment variable, TINKER_API_KEY"
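For example, at the top of a notebook (the key value is a placeholder; in a shell you would run `export TINKER_API_KEY=...` instead):

```python
import os

# Placeholder value; replace with your actual Tinker API key.
os.environ["TINKER_API_KEY"] = "your-key-here"
```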
@YujiaBao I will very happily approve once the cosmetic/explanatory changes for Tutorial 1 (whichever ones you choose to include from the doc) are sorted + we add the Jupyter notebook optional dependency
Summary
Add a tutorials/ directory with 6 progressive Jupyter notebooks that guide users from first API call to building custom RL training pipelines. All notebooks include pre-executed outputs so readers can follow along without API access.

- 03 Async Patterns: concurrent sampling, num_samples, batch evaluation throughput
- 05: Env, EnvGroupBuilder, RLDataset, ProblemEnv, and how the raw loop maps to reusable types
- 06: a ProblemEnv subclass and RLDataset for a new task (format compliance)

Key highlights:

- num_samples (3.4x), setting up the pattern used in RL
- RL abstractions (ProblemEnv, ProblemGroupBuilder, RLDataset)

Motivation
User feedback indicated that the cookbook has strong reference docs and production recipes, but lacks a guided learning path bridging the two. Users frequently rebuilt what existing recipes already provide (e.g., GRPO on DeepCoder) because they couldn't find them or didn't know where to start. The tutorials fill this gap with a progressive path from basics to custom tasks.
Test plan
🤖 Generated with Claude Code