
feat: add supermodel skill command #126

Merged
greynewell merged 4 commits into main from feat/skill-command
Apr 13, 2026


@greynewell
Contributor

@greynewell greynewell commented Apr 13, 2026

Summary

  • Adds a `supermodel skill` command that emits an optimized skill prompt for Claude Code
  • The skill prompt instructs Claude Code to use `.graph` shard files for smarter navigation and context
  • Includes regression tests for the skill prompt content

CI fixes (rebased from main)

  • Windows: added TMP/TEMP alongside TMPDIR in internal/find/zip_test.go
  • Windows: filepath separator fix already present via rebase on main

Test plan

  • All three CI platforms pass (ubuntu, macos, windows)
  • supermodel skill outputs a valid skill prompt
  • Regression tests cover key phrases in the prompt

Originally authored by @jonathanpopham — rebased and CI-fixed for merge.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a new skill command to display guidance for using code relationship data files.
  • Documentation

    • Updated benchmark results with expanded performance comparisons (60% cheaper, 4× faster).
    • Added documentation for code relationship graph file conventions and usage patterns.
  • Tests

    • Enhanced test coverage for error handling scenarios.

@coderabbitai

coderabbitai Bot commented Apr 13, 2026

Caution

Review failed

The head commit changed during the review from bd084de to f3f4ebd.

Walkthrough

This PR introduces a new skill CLI command that outputs instructions to AI agents on how to use .graph.* sidecar files for understanding code relationships, accompanied by test coverage, documentation, and updated benchmark results reflecting a new 4-way comparison setup.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Skill Command & Tests | cmd/skill.go, cmd/skill_test.go | New skill Cobra subcommand that prints an instruction prompt about .graph.* file conventions, sections, and usage patterns. Tests verify the prompt contains required keywords and has substantial content. |
| Benchmark Documentation | benchmark/CLAUDE.skill.md, benchmark/results/blog-post-draft.md, benchmark/results/summary.md | New guide for graph file usage; updated benchmark narrative and results table expanding from 2-way to 4-way configuration comparison with revised metrics (60% cheaper, 4× faster, 55% fewer turns). |
| Error Handling Test | internal/find/zip_test.go | Enhanced temp directory error test to set additional environment variables (TMP, TEMP) alongside TMPDIR for broader failure condition coverage. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • jonathanpopham

Poem

🧠 A skill to share with minds of code,
.graph.* files light the road,
Dependencies, calls, and impact shown,
Now AI knows what must be known. ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description covers the main changes but doesn't follow the repository's template structure with explicit 'What', 'Why', and 'Test plan' sections. | Restructure the description using the template format: create explicit 'What' and 'Why' sections, and expand the 'Test plan' section to clarify implementation details. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: add supermodel skill command' clearly and concisely describes the main change: a new CLI command being added. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
internal/find/zip_test.go (1)

174-175: Update the test comment to match the new setup.

Line 175 says the failure is due to an invalid TMPDIR, but the test now also relies on TMP and TEMP. A small wording tweak will keep the intent clear.

Suggested comment tweak
-// os.CreateTemp fails due to an invalid TMPDIR.
+// os.CreateTemp fails due to invalid temp environment directories (TMPDIR/TMP/TEMP).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/find/zip_test.go` around lines 174 - 175, Update the test comment
for TestCreateZip_CreateTempError to reflect the new environment setup: mention
that the failure is caused by invalid temporary directory environment variables
(TMPDIR, TMP, and TEMP) rather than only TMPDIR; locate the comment above the
TestCreateZip_CreateTempError test in internal/find/zip_test.go and change the
wording to something like "createZip returns an error when os.CreateTemp fails
due to invalid temporary directory environment variables (TMPDIR, TMP, TEMP)" so
the comment matches the test's current dependencies.
cmd/skill.go (2)

9-19: Consider a single source of truth for the skill prompt text.

Right now this content is duplicated in benchmark/CLAUDE.skill.md, so the two copies can drift apart over time. It's worth centralizing the text, or adding a test that checks the two artifacts for equality.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/skill.go` around lines 9 - 19, The skillPrompt constant content is
duplicated in benchmark/CLAUDE.skill.md which can drift; centralize the
canonical text or add a test to assert equality. Either move the string out of
cmd/skill.go into a single shared resource (e.g., a new package-level file or
const in a shared package) and import/use it from cmd/skill.go and benchmark
code, or add a unit/integration test that reads benchmark/CLAUDE.skill.md and
compares it to the skillPrompt string (or vice versa) to fail the build on
drift; update references to the symbol skillPrompt and the
benchmark/CLAUDE.skill.md path accordingly.

31-33: Use Cobra's output redirection instead of fmt.Println.

Right now, fmt.Println writes directly to stdout, completely bypassing Cobra's output plumbing. This makes it impossible to redirect output in tests (via cmd.SetOut()) or in integrations where you might want to capture the command's output elsewhere.

The fix: use fmt.Fprintln(cmd.OutOrStdout(), skillPrompt), which honors cmd.SetOut() and otherwise writes to stdout. Avoid the shorter cmd.Println(skillPrompt): it also honors cmd.SetOut(), but when no writer is set, Cobra's Print/Println fall back to stderr (OutOrStderr), so `supermodel skill > file` would capture nothing.

♻️ Suggested patch
 		Args: cobra.NoArgs,
 		Run: func(cmd *cobra.Command, args []string) {
-			fmt.Println(skillPrompt)
+			fmt.Fprintln(cmd.OutOrStdout(), skillPrompt)
 		},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/skill.go` around lines 31 - 33, Replace the direct stdout write in the
Run func (the call to fmt.Println(skillPrompt)) with Cobra-aware output so
output redirection via cmd.SetOut()/tests works; update the Run closure in the
command where skillPrompt is printed to use fmt.Fprintln(cmd.OutOrStdout(),
skillPrompt) instead of fmt.Println. Avoid cmd.Println here, because it falls
back to stderr when no output writer is set.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 483b0074-426f-405a-81f9-b0bc1f449d35

📥 Commits

Reviewing files that changed from the base of the PR and between 999ac58 and de04ed2.

⛔ Files ignored due to path filters (1)
  • benchmark/results/benchmark_results.zip is excluded by !**/*.zip
📒 Files selected for processing (7)
  • benchmark/CLAUDE.skill.md
  • benchmark/results/blog-post-draft.md
  • benchmark/results/skill-v2.txt
  • benchmark/results/summary.md
  • cmd/skill.go
  • cmd/skill_test.go
  • internal/find/zip_test.go

Comment on lines +32 to +43
| | Naked Claude | + Supermodel (crafted) | + Supermodel (auto) | Three-file shards |
|---------------------|-------------|------------------------|---------------------|-------------------|
| **Cost** | $0.30 | $0.12 | $0.15 | $0.25 |
| **Turns** | 20 | 9 | 11 | 16 |
| **Duration** | 122s | 29s | 42s | 73s |
| **Tests passed** | ✓ YES | ✓ YES | ✓ YES | ✓ YES |

-**40% cheaper. 6 fewer turns. 72 seconds faster.**
+**60% cheaper. 4× faster. 55% fewer turns.**

-Both got the right answer. The only difference was how much digging each one had to do first.
+All four got the right answer. The only difference was how much digging each one had to do first.

"Crafted" is a hand-written CLAUDE.md with Django-specific hints. "Auto" is what `supermodel skill` generates — a generic prompt that works on any repo. The auto prompt captured 83% of the crafted prompt's savings with zero manual effort.

⚠️ Potential issue | 🟠 Major

Benchmark numbers are now internally inconsistent with later narrative sections.

After updating this table, the body still reports older values: for example, Line 49 shows "13 turns, $0.22", Line 70 shows "7 turns, $0.13", and Line 103 says "Net result: 40% cheaper". This makes the post read as contradictory and weakens its credibility.

Please align the downstream prose with this updated table before publish.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark/results/blog-post-draft.md` around lines 32 - 43, The downstream
prose still contains old benchmark numbers that conflict with the updated table;
update any occurrences of the outdated stats (for example the strings "13 turns,
$0.22", "7 turns, $0.13", and "Net result: 40% cheaper") so they match the table
values (Naked Claude $0.30/20 turns/122s; + Supermodel (crafted) $0.12/9
turns/29s; + Supermodel (auto) $0.15/11 turns/42s; Three-file shards $0.25/16
turns/73s) and replace the summary line with the new aggregate claim "60%
cheaper. 4× faster. 55% fewer turns."; search the file for any other numeric
mentions of cost/turns/duration and reconcile them to these table values and
recalculated percentages so all prose is consistent with the table.

jonathanpopham and others added 3 commits April 13, 2026 17:44
Revised the generic skill prompt based on benchmark trace analysis.
Three changes: teach the .graph naming convention so agents construct
paths directly, bold the read-order directive, and tell agents to
check graph files before grepping for structure.

Skill v2: $0.11, 31s, 7 turns (was $0.15, 42s, 11 turns)
Matches Grey's hand-crafted Django prompt: $0.12, 29s, 9 turns
Locks in the six key elements that drove benchmark results:
graph extension, three section names, naming convention example,
and read-order directive.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Section headers and cost percentages updated to match the results table:
- Naked: 20 turns, $0.30
- Auto prompt: 11 turns, $0.15 (50% cheaper)
- Crafted: 9 turns, $0.12 (60% cheaper)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@greynewell greynewell merged commit fd65e1c into main Apr 13, 2026
6 checks passed

2 participants