Skill for searching quantization recipes #1593
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
📝 Walkthrough: Adds documentation for a quantization recipe search skill, covering skill metadata, a mapping of responsibilities to related skills, a staged core iteration loop, practical defaults, and two supporting reference documents (a recipe iteration guide and a Qwen3.6 case study).
Changes: Quant Recipe Search Skill Documentation
🎯 Estimated code review effort: 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 6 passed
Branch updated from ec55ea0 to eda36e9.
🧹 Nitpick comments (1)
.claude/skills/quant-recipe-search/references/qwen36_case_study.md (1)
Line 22 (💤 low value): Consider expanding "LCB" on first use.
"LCB" appears twice (lines 22 and 28) without being spelled out. While readers familiar with the quantization benchmark suite will recognize "LiveCodeBench," first-time readers of this case study may need the expansion for clarity.
📝 Suggested expansion:
-- Self-attention NVFP4 hurt LCB for modest active-byte savings.
+- Self-attention NVFP4 hurt LiveCodeBench (LCB) for modest active-byte savings.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.claude/skills/quant-recipe-search/references/qwen36_case_study.md at line 22, expand the acronym "LCB" on first use by replacing its first occurrence with "LiveCodeBench (LCB)" (or "LCB (LiveCodeBench)" if preferred) so readers unfamiliar with the benchmark understand the term. Leave subsequent mentions as "LCB", or make sure any later standalone occurrences are consistent with that expansion (notably the second occurrence, currently at line 28).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0c1047e1-7a25-42e1-bfac-6096be290a4f
📒 Files selected for processing (3):
- .claude/skills/quant-recipe-search/SKILL.md
- .claude/skills/quant-recipe-search/references/qwen36_case_study.md
- .claude/skills/quant-recipe-search/references/recipe_iteration.md
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Coverage diff:

| | main | #1593 | +/- |
|---|---|---|---|
| Coverage | 73.23% | 73.09% | -0.15% |
| Files | 479 | 480 | +1 |
| Lines | 52435 | 53744 | +1309 |
| Hits | 38401 | 39284 | +883 |
| Misses | 14034 | 14460 | +426 |
shengliangxu left a comment:
Approve to unblock. We definitely will keep iterating.
Edwardf0t1 left a comment:
Thanks @meenchen for adding this skill. I think we need to address the overlap with bigpareto. Our design doc places recipe search in bigpareto (the “strategy/search” layer, with its own /sweep-* skills). Since this skill also does recipe search, the two need an explicit “when to use which” guide. Here’s my take:
- bigpareto = automated breadth sweep, fan-out N configs, Pareto frontier — best for well-supported models you can blind-sweep.
- quant-recipe-search = interactive depth loop, one candidate at a time with sensitivity interpretation + debugging — best for novel models where each
Does it align with your thoughts?
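The "Pareto frontier" step of a breadth sweep can be illustrated with a minimal sketch. It assumes each swept config reduces to one size number and one accuracy number; the `SweepResult` type, its fields, and `pareto_frontier` are hypothetical and are not bigpareto's actual schema or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SweepResult:
    """One swept config (hypothetical schema, not bigpareto's real format)."""
    name: str
    active_bytes: float  # memory footprint; lower is better
    accuracy: float      # e.g. mean benchmark score; higher is better


def pareto_frontier(results: list[SweepResult]) -> list[SweepResult]:
    """Return the configs that no other config dominates.

    A config is dominated if another config is no larger and no less
    accurate, and strictly better on at least one of the two axes.
    """
    # Smallest first; for equal size, the most accurate comes first.
    ordered = sorted(results, key=lambda r: (r.active_bytes, -r.accuracy))
    frontier: list[SweepResult] = []
    best_acc = float("-inf")
    for r in ordered:
        # Every config seen so far is no larger than r, so r survives only
        # if it is strictly more accurate than all of them.
        if r.accuracy > best_acc:
            frontier.append(r)
            best_acc = r.accuracy
    return frontier
```

Sorting by size first means a single pass suffices; exact duplicates are kept only once.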
hychiang-git left a comment:
Approve to unblock. Do we have a comparison between the hand-crafted recipe and the recipe tuned by the agents?
@Edwardf0t1 I think these two should be orthogonal. Bigpareto validates how recipes perform for a given model, while quant-recipe-search generates good recipes. I feel we should keep recipe development in ModelOpt and iterate on the skill over time.
What does this PR do?
Type of change: New skill
Adds a quant-recipe-search skill for guiding iterative quantization recipe development. The skill focuses on choosing compression objectives, starting from existing ModelOpt recipe baselines, planning AutoQuant/manual recipe candidates, interpreting sensitivity results, and deciding the next experiments while delegating PTQ, deployment, eval launching, monitoring, and result comparison to the existing dedicated skills.
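One way the "interpret sensitivity results, decide the next experiment" step could work is a greedy, budgeted selection: quantize aggressively the layer groups that save the most bytes per point of accuracy lost, until an accuracy budget is spent. This is a hypothetical sketch, not the skill's documented procedure; `GroupSensitivity`, `propose_candidate`, and the assumption that per-group accuracy drops add up are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupSensitivity:
    """Hypothetical per-layer-group result from a sensitivity pass."""
    group: str          # e.g. a layer-name pattern
    drop_pp: float      # accuracy drop (pp) when only this group is quantized aggressively
    bytes_saved: float  # bytes saved by quantizing this group aggressively


def propose_candidate(groups: list[GroupSensitivity], budget_pp: float) -> list[str]:
    """Greedily choose groups to quantize aggressively within an accuracy budget.

    Assumes per-group drops are roughly additive, which is only a first-order
    approximation of how the full recipe will behave.
    """
    def efficiency(g: GroupSensitivity) -> float:
        # Bytes saved per pp lost; groups with no measured drop rank first.
        return g.bytes_saved / g.drop_pp if g.drop_pp > 0 else float("inf")

    chosen: list[str] = []
    spent = 0.0
    for g in sorted(groups, key=efficiency, reverse=True):
        cost = max(g.drop_pp, 0.0)  # treat measured improvements as free
        if spent + cost <= budget_pp:
            chosen.append(g.group)
            spent += cost
    return chosen
```

Because drops rarely add exactly, the result is only a starting candidate; it still goes through PTQ and evaluation via the dedicated skills.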
Default search goal: find the best-performing recipe for the chosen objective while keeping each benchmark’s accuracy loss under 1pp versus the matching baseline, with reruns for noisy or near-threshold regressions.
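The acceptance rule above can be sketched as a per-benchmark gate. This assumes scores are accuracies in percentage points keyed by benchmark name; `check_benchmark`, `check_recipe`, and `noise_pp` are hypothetical, with `noise_pp` standing in for a margin (ideally taken from observed run-to-run variance) that sends near-threshold results to a rerun instead of a verdict.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    RERUN = "rerun"


def check_benchmark(baseline: float, candidate: float,
                    max_loss_pp: float = 1.0, noise_pp: float = 0.3) -> Verdict:
    """Classify one benchmark; scores are accuracies in percentage points."""
    loss = baseline - candidate  # positive means the candidate is worse
    # Too close to the threshold to trust a single run: ask for a rerun.
    # noise_pp is a hypothetical margin, ideally set from run-to-run variance.
    if abs(loss - max_loss_pp) <= noise_pp:
        return Verdict.RERUN
    return Verdict.PASS if loss < max_loss_pp else Verdict.FAIL


def check_recipe(baseline: dict[str, float], candidate: dict[str, float],
                 max_loss_pp: float = 1.0, noise_pp: float = 0.3) -> Verdict:
    """Gate a recipe on every benchmark in its matching baseline."""
    verdicts = [
        check_benchmark(baseline[name], candidate[name], max_loss_pp, noise_pp)
        for name in baseline  # KeyError if the candidate skipped a benchmark
    ]
    if Verdict.FAIL in verdicts:
        return Verdict.FAIL  # a clear regression anywhere rejects the recipe
    if Verdict.RERUN in verdicts:
        return Verdict.RERUN
    return Verdict.PASS
```

A clear failure on any benchmark rejects the recipe outright, since rerunning a different, borderline benchmark cannot rescue it.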
Usage
User can ask: "Find the best quantization recipe and generate a PTQ checkpoint for this model."
The skill will:
ptq skill.
Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: ✅ / ❌ / N/A

Additional Information
Summary by CodeRabbit