feat: add supermodel skill command #126
Walkthrough: This PR introduces a new …
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/find/zip_test.go (1)
174-175: Update the test comment to match the new setup. Line 175 says the failure is due to an invalid `TMPDIR`, but the test now also relies on `TMP` and `TEMP` (which `os.TempDir` honors on Windows). A small wording tweak will keep the intent clear.
Suggested comment tweak:
-// os.CreateTemp fails due to an invalid TMPDIR.
+// os.CreateTemp fails due to invalid temp environment directories (TMPDIR/TMP/TEMP).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/find/zip_test.go` around lines 174 - 175, Update the test comment for TestCreateZip_CreateTempError to reflect the new environment setup: mention that the failure is caused by invalid temporary directory environment variables (TMPDIR, TMP, and TEMP) rather than only TMPDIR; locate the comment above the TestCreateZip_CreateTempError test in internal/find/zip_test.go and change the wording to something like "createZip returns an error when os.CreateTemp fails due to invalid temporary directory environment variables (TMPDIR, TMP, TEMP)" so the comment matches the test's current dependencies.
cmd/skill.go (2)
9-19: Consider a single source of truth for the skill prompt text. Right now this content is duplicated in `benchmark/CLAUDE.skill.md`, so the two copies can drift over time. Consider centralizing it or adding a test that checks the two artifacts are equal.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/skill.go` around lines 9 - 19, The skillPrompt constant content is duplicated in benchmark/CLAUDE.skill.md which can drift; centralize the canonical text or add a test to assert equality. Either move the string out of cmd/skill.go into a single shared resource (e.g., a new package-level file or const in a shared package) and import/use it from cmd/skill.go and benchmark code, or add a unit/integration test that reads benchmark/CLAUDE.skill.md and compares it to the skillPrompt string (or vice versa) to fail the build on drift; update references to the symbol skillPrompt and the benchmark/CLAUDE.skill.md path accordingly.
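For the second option the reviewer mentions (an equality test), a minimal sketch might look like the following. It assumes the real test would load `benchmark/CLAUDE.skill.md` with `os.ReadFile` and compare it to `skillPrompt`; the helper names `promptDrift` and `normalize`, and the choice to ignore CRLF line endings and trailing newlines, are illustrative assumptions rather than anything in this PR. Stand-in strings replace the file so the sketch runs on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize converts CRLF to LF and drops trailing newlines so that
// line-ending differences between editors are not reported as drift.
func normalize(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return strings.TrimRight(s, "\n")
}

// promptDrift compares the canonical prompt with a copied artifact and
// describes the first differing line, or returns "" when they match.
func promptDrift(canonical, artifact string) string {
	want := strings.Split(normalize(canonical), "\n")
	got := strings.Split(normalize(artifact), "\n")
	for i := 0; i < len(want) || i < len(got); i++ {
		switch {
		case i >= len(got):
			return fmt.Sprintf("line %d: artifact is missing %q", i+1, want[i])
		case i >= len(want):
			return fmt.Sprintf("line %d: artifact has extra %q", i+1, got[i])
		case want[i] != got[i]:
			return fmt.Sprintf("line %d: want %q, got %q", i+1, want[i], got[i])
		}
	}
	return ""
}

func main() {
	// Hypothetical stand-ins for skillPrompt and the contents of
	// benchmark/CLAUDE.skill.md.
	canonical := "first line\nsecond line\n"
	artifact := "first line\r\nsecond line\r\n"
	if d := promptDrift(canonical, artifact); d != "" {
		fmt.Println("drift:", d)
	} else {
		fmt.Println("in sync")
	}
}
```

Embedding the Markdown file with `//go:embed` would be the other route, but embed patterns may not contain `..` path elements, so the file would first have to move into (or below) the `cmd` package directory.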
31-33: Use Cobra's output writer instead of `fmt.Println`. Right now, `fmt.Println` writes directly to stdout, bypassing Cobra's output plumbing. This makes it impossible to redirect the output in tests (via `cmd.SetOut()`) or in integrations that want to capture the command's output elsewhere.
The fix: use `fmt.Fprintln(cmd.OutOrStdout(), skillPrompt)`. It writes to whatever writer was set with `cmd.SetOut()` and falls back to stdout otherwise. Avoid `cmd.Println(skillPrompt)` here: Cobra's `Print*` helpers fall back to stderr when no output writer is set, so redirecting or piping the stdout of `supermodel skill` would capture nothing.
♻️ Suggested patch
 Args: cobra.NoArgs,
 Run: func(cmd *cobra.Command, args []string) {
-	fmt.Println(skillPrompt)
+	fmt.Fprintln(cmd.OutOrStdout(), skillPrompt)
 },
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/skill.go` around lines 31 - 33, Replace the direct stdout write in the Run func (the call to fmt.Println(skillPrompt)) with Cobra-aware output so output redirection via cmd.SetOut()/tests works; update the Run closure in the command where skillPrompt is printed to use fmt.Fprintln(cmd.OutOrStdout(), skillPrompt) instead of fmt.Println (not cmd.Println, which falls back to stderr when no output writer is set).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benchmark/results/blog-post-draft.md`:
- Around lines 32-43: The downstream prose still contains old benchmark numbers
that conflict with the updated table; update any occurrences of the outdated
stats (for example the strings "13 turns, $0.22", "7 turns, $0.13", and "Net
result: 40% cheaper") so they match the table values (Naked Claude $0.30/20
turns/122s; + Supermodel (crafted) $0.12/9 turns/29s; + Supermodel (auto)
$0.15/11 turns/42s; Three-file shards $0.25/16 turns/73s) and replace the
summary line with the new aggregate claim "60% cheaper. 4× faster. 55% fewer
turns."; search the file for any other numeric mentions of cost/turns/duration
and reconcile them to these table values and recalculated percentages so all
prose is consistent with the table.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 483b0074-426f-405a-81f9-b0bc1f449d35
⛔ Files ignored due to path filters (1)
benchmark/results/benchmark_results.zip is excluded by !**/*.zip
📒 Files selected for processing (7)
- benchmark/CLAUDE.skill.md
- benchmark/results/blog-post-draft.md
- benchmark/results/skill-v2.txt
- benchmark/results/summary.md
- cmd/skill.go
- cmd/skill_test.go
- internal/find/zip_test.go
| | Naked Claude | + Supermodel (crafted) | + Supermodel (auto) | Three-file shards |
|---|---|---|---|---|
| **Cost** | $0.30 | $0.12 | $0.15 | $0.25 |
| **Turns** | 20 | 9 | 11 | 16 |
| **Duration** | 122s | 29s | 42s | 73s |
| **Tests passed** | ✓ YES | ✓ YES | ✓ YES | ✓ YES |

-**40% cheaper. 6 fewer turns. 72 seconds faster.**
+**60% cheaper. 4× faster. 55% fewer turns.**

-Both got the right answer. The only difference was how much digging each one had to do first.
+All four got the right answer. The only difference was how much digging each one had to do first.

"Crafted" is a hand-written CLAUDE.md with Django-specific hints. "Auto" is what `supermodel skill` generates: a generic prompt that works on any repo. The auto prompt captured 83% of the crafted prompt's savings with zero manual effort.
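The headline claims can be rechecked directly from the table, which is also a quick way to reconcile the downstream prose the review flags. A small sketch; the `run` type and `pctLower` helper are illustrative and not from the repository:

```go
package main

import "fmt"

// run is one column of the benchmark table.
type run struct {
	cost     float64 // USD
	turns    float64
	duration float64 // seconds
}

// pctLower returns how much lower v is than base, as a percentage of base.
func pctLower(base, v float64) float64 {
	return (base - v) / base * 100
}

func main() {
	naked := run{cost: 0.30, turns: 20, duration: 122}
	crafted := run{cost: 0.12, turns: 9, duration: 29}
	auto := run{cost: 0.15, turns: 11, duration: 42}

	fmt.Printf("crafted vs naked: %.0f%% cheaper, %.1fx faster, %.0f%% fewer turns\n",
		pctLower(naked.cost, crafted.cost),
		naked.duration/crafted.duration,
		pctLower(naked.turns, crafted.turns))

	// Fraction of the crafted prompt's dollar savings that the auto prompt keeps.
	share := (naked.cost - auto.cost) / (naked.cost - crafted.cost) * 100
	fmt.Printf("auto keeps %.0f%% of the crafted savings\n", share)
}
```

With these inputs the crafted prompt comes out 60% cheaper, about 4.2× faster (rounded down to "4×" in the headline) and 55% fewer turns, and the auto prompt keeps about 83% of the crafted prompt's dollar savings ($0.15 of $0.18), matching the figures in the draft.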
Benchmark numbers are now internally inconsistent with later narrative sections.
After updating this table, the body still reports older values (for example, line 49 shows "13 turns, $0.22", line 70 shows "7 turns, $0.13", and line 103 says "Net result: 40% cheaper"). This makes the post read as contradictory and weakens its credibility.
Please align the downstream prose with the updated table before publishing.
Revised the generic skill prompt based on benchmark trace analysis. Three changes:
- teach the .graph naming convention so agents construct paths directly
- bold the read-order directive
- tell agents to check graph files before grepping for structure

Skill v2: $0.11, 31s, 7 turns (was $0.15, 42s, 11 turns).
Matches Grey's hand-crafted Django prompt: $0.12, 29s, 9 turns.
Locks in the six key elements that drove benchmark results: graph extension, three section names, naming convention example, and read-order directive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
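A presence check like the one this commit describes could be built on a small helper that a table-driven `go test` then asserts on. In this sketch `missingElements` is hypothetical, and apart from `.graph` the required strings are placeholders, since the actual section names, naming-convention example and read-order wording are not shown on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// missingElements returns every required substring that does not occur in prompt.
func missingElements(prompt string, required []string) []string {
	var missing []string
	for _, r := range required {
		if !strings.Contains(prompt, r) {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	// Placeholder checks standing in for the six real elements: the graph
	// extension, three section names, a naming-convention example and the
	// read-order directive.
	required := []string{
		".graph",
		"SECTION_1", "SECTION_2", "SECTION_3",
		"NAMING_EXAMPLE",
		"READ_ORDER_DIRECTIVE",
	}
	prompt := "placeholder prompt mentioning .graph files and SECTION_1"
	for _, m := range missingElements(prompt, required) {
		fmt.Println("missing:", m)
	}
}
```

Inside `cmd/skill_test.go` each missing element would more naturally be reported with `t.Errorf`, so a single run lists every regression at once rather than stopping at the first.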
Force-pushed the branch from de04ed2 to bd084de.
Section headers and cost percentages updated to match the results table:
- Naked: 20 turns, $0.30
- Auto prompt: 11 turns, $0.15 (50% cheaper)
- Crafted: 9 turns, $0.12 (60% cheaper)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- `supermodel skill` command that emits an optimized skill prompt for Claude Code
- `.graph` shard files for smarter navigation and context

CI fixes (rebased from main)
- `TMP`/`TEMP` alongside `TMPDIR` in `find/zip_test.go`

Test plan
- `supermodel skill` outputs a valid skill prompt

Originally authored by @jonathanpopham; rebased and CI-fixed for merge.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
- `skill` command to display guidance for using code relationship data files.
Documentation
Tests