Add tool-call-pattern hotspot detector#224
Conversation
New `burn hotspots --patterns=tool-replacement-eligible` detector flags vanilla call sequences in session logs that map to a known relaywash (https://github.com/AgentWorkforce/wash) replacement tool, surfacing a concrete "you'd save N tokens by installing relaywash" finding. Detected categories (all from `TurnRecord.toolCalls`, no content sidecar): - search-sequence: Glob → Grep → Read in one turn (≥3× per session) → relaywash__Search - edit-cluster: ≥3 single-edit calls to the same file within 5 consecutive turns → relaywash__Edit (batched) - bash-git-state: git status / diff / log → relaywash__GitState - bash-test-run: pnpm/npm/yarn/bun test, pytest, jest, vitest, cargo test, go test → relaywash__TestRun - bash-gh-pr: gh pr <verb> and gh api → relaywash__GhPR Each finding carries an estimated tokens-saved figure (conservative flat-rate per occurrence pending the `_meta.replaces` annotation reader in #219), priced at the session's dominant model input rate. Findings flow through the existing WasteFinding envelope into `--findings` and `--json` output and the per-detector text table. Closes #220.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 56ff5c2ee6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!parsed.subcommand) return false; | ||
| // TWO_PART_SUBCOMMANDS folds `pr <verb>` into a single subcommand, so | ||
| // `gh pr view` parses to subcommand="pr view". `gh api` is single-part. | ||
| return parsed.subcommand === 'api' || parsed.subcommand.startsWith('pr'); |
There was a problem hiding this comment.
Restrict gh replacement detection to actual PR commands
matchesGhPr currently uses parsed.subcommand.startsWith('pr'), which also matches non-PR gh commands like gh project ... because their subcommand is project. In sessions that use GitHub Projects, this will emit bash-gh-pr findings with inflated token/USD savings even though no PR workflow happened. Match pr/pr ... explicitly (plus api) to avoid these false positives.
Useful? React with 👍 / 👎.
`startsWith('pr')` false-positives on `gh project ...` and other
pr-prefixed subcommands. Match `pr` exactly or with a trailing space
so `gh project list` no longer inflates `bash-gh-pr` findings.
Renames `tool-replacement-eligible` to `tool-call-pattern` and strips all relaywash-specific output (replacementTool field, repo URL, install-X action, wash-named titles/details). Burn now reports the pattern + tokens- of-overhead estimate; downstream tools map categories to whatever consolidation they offer. - packages/analyze: rename detector + types, drop replacementTool field - packages/cli: rename PATTERN_KIND, JSON key (toolCallPatterns), table renderer; drop "replacement" column - README + CHANGELOGs: drop wash references
Brings the SDK package into this branch (added on main as 12a7ede) and extends `hotspots({ patterns })` to also surface findings from `tool-output-bloat`, `ghost-surface`, and `tool-call-pattern`. Previously only the core `detectPatterns` set reached the SDK. - packages/cli: export `buildGhostSurfaceInputs` for SDK reuse - packages/sdk: load Claude settings, query tool-result events, and walk the user-installed surface lazily based on requested patterns; widened HotspotsOptions JSDoc to enumerate supported kinds - CHANGELOGs: note the SDK surface expansion Resolved CHANGELOG conflicts in root, analyze, and cli — kept Unreleased section ahead of the 1.6.2 / 1.5.0 release blocks from main.
`attributeHotspots` and `detectPatterns` take `(turns, opts)` positionally. Calling them with a single object made `opts` undefined and threw on the `pricing` destructure, leaving every path through `hotspots()` unreachable including the new side-channel detectors. Load `pricing` once up front, bucket the flat `userTurns` array into `userTurnsBySession`, and pass both detectors the proper positional args.
The window comparison used `>` so a span of 5 (six turn indexes 0..5) still fit under EDIT_CLUSTER_TURN_WINDOW = 5. Switch to `>=` so the runtime matches the documented "within 5 consecutive turns" semantics, and add a boundary test that pins the cap.
Summary
Closes #220.
Adds a
tool-call-patterndetector toburn hotspots --patternsthat surfaces vanilla tool-call sequences with consolidatable overhead. Vendor-neutral output: each finding describes the pattern and its tokens-of-overhead estimate, with no recommendation about which tool would replace it. Downstream tools (or in-house playbooks) map categories to specific consolidations.What it detects
Five categories, all derived from
TurnRecord.toolCalls(no content sidecar required, onlyhasToolCallscoverage):search-sequence:GlobthenGrepthenReadedit-cluster: ≥3Editcalls to the same filebash-git-state:git status/git diff/git logparseBashCommandsub-verb matchbash-test-run:pnpm test/npm test/pytest/jest/vitest/cargo test/go testparseBashCommandsub-verb matchbash-gh-pr:gh pr <verb>/gh apiparseBashCommandsub-verb matchCross-harness via
normalizeToolName: OpenCode lowercase (glob/grep/read/edit) and Codex (apply_patch) variants are folded automatically. Bash detection covers ClaudeBash, OpenCodebash, Codexexec_command/shell.The two highest-confidence patterns (Glob+Grep+Read, edit clusters) ship with the strict thresholds the issue suggested. Single-edit retry loops and
Read-without-range are deferred — the existingretriesdetector already covers the former, and the latter needs content-sidecar joins.The
edit-clusterdetector is complementary to the existingedit-heavydetector:edit-heavyscores the session-wide edit/read ratio, whileedit-clusterlocalizes the burst to a single file inside a tight turn window.Output
burn hotspots --patterns=tool-call-patternruns the detector and prints the per-detector table (session, category, occurrence count, tokens of overhead, USD overhead).--findingsincludestool-call-patternrows in the unified severity-ranked findings list, with the standardInspect this sessionaction.--jsonadds a top-leveltoolCallPatternsarray to the existing schema.occurrenceCountso downstream consumers can re-price with their own per-tool numbers.Vendor-neutral split
This PR keeps the detection logic in burn. A separate consumer in
AgentWorkforce/washwill maptool-call-patterncategories to specificrelaywash__*recommendations.Test plan
pnpm run buildpnpm run test— 907 tests pass, including 14 tests inpackages/analyze/src/tool-call-patterns.test.tscovering all five categories, the cross-harness aliases, threshold behavior, and the WasteFinding adapter.resolvePatternSelectiontest updated to expect the 12th detector.Generated by Claude Code