Skip to content

Add tool-call-pattern hotspot detector#224

Merged
willwashburn merged 6 commits into
mainfrom
claude/fix-issue-220-IUo5i
May 2, 2026
Merged

Add tool-call-pattern hotspot detector#224
willwashburn merged 6 commits into
mainfrom
claude/fix-issue-220-IUo5i

Conversation

@willwashburn

@willwashburn willwashburn commented May 1, 2026

Copy link
Copy Markdown
Member

Summary

Closes #220.

Adds a tool-call-pattern detector to burn hotspots --patterns that surfaces vanilla tool-call sequences with consolidatable overhead. Vendor-neutral output: each finding describes the pattern and its tokens-of-overhead estimate, with no recommendation about which tool would replace it. Downstream tools (or in-house playbooks) map categories to specific consolidations.

What it detects

Five categories, all derived from TurnRecord.toolCalls (no content sidecar required, only hasToolCalls coverage):

Pattern Heuristic
search-sequence: Glob then Grep then Read In-order match within one turn; flagged at ≥3 sequences per session
edit-cluster: ≥3 Edit calls to the same file Sliding 5-turn window per file
bash-git-state: git status / git diff / git log parseBashCommand sub-verb match
bash-test-run: pnpm test / npm test / pytest / jest / vitest / cargo test / go test parseBashCommand sub-verb match
bash-gh-pr: gh pr <verb> / gh api parseBashCommand sub-verb match

Cross-harness via normalizeToolName: OpenCode lowercase (glob/grep/read/edit) and Codex (apply_patch) variants are folded automatically. Bash detection covers Claude Bash, OpenCode bash, Codex exec_command / shell.

The two highest-confidence patterns (Glob+Grep+Read, edit clusters) ship with the strict thresholds the issue suggested. Single-edit retry loops and Read-without-range are deferred — the existing retries detector already covers the former, and the latter needs content-sidecar joins.

The edit-cluster detector is complementary to the existing edit-heavy detector: edit-heavy scores the session-wide edit/read ratio, while edit-cluster localizes the burst to a single file inside a tight turn window.

Output

  • burn hotspots --patterns=tool-call-pattern runs the detector and prints the per-detector table (session, category, occurrence count, tokens of overhead, USD overhead).
  • --findings includes tool-call-pattern rows in the unified severity-ranked findings list, with the standard Inspect this session action.
  • --json adds a top-level toolCallPatterns array to the existing schema.
  • Token-overhead estimates are conservative per-occurrence flat rates priced at the session's dominant model input rate. Each finding carries occurrenceCount so downstream consumers can re-price with their own per-tool numbers.

Vendor-neutral split

This PR keeps the detection logic in burn. A separate consumer in AgentWorkforce/wash will map tool-call-pattern categories to specific relaywash__* recommendations.

Test plan

  • pnpm run build
  • pnpm run test — 907 tests pass, including 14 tests in packages/analyze/src/tool-call-patterns.test.ts covering all five categories, the cross-harness aliases, threshold behavior, and the WasteFinding adapter.
  • resolvePatternSelection test updated to expect the 12th detector.
  • README and root + per-package CHANGELOG entries added.

Generated by Claude Code

New `burn hotspots --patterns=tool-replacement-eligible` detector flags
vanilla call sequences in session logs that map to a known relaywash
(https://github.com/AgentWorkforce/wash) replacement tool, surfacing a
concrete "you'd save N tokens by installing relaywash" finding.

Detected categories (all from `TurnRecord.toolCalls`, no content sidecar):

- search-sequence: Glob → Grep → Read in one turn (≥3× per session) →
  relaywash__Search
- edit-cluster: ≥3 single-edit calls to the same file within 5
  consecutive turns → relaywash__Edit (batched)
- bash-git-state: git status / diff / log → relaywash__GitState
- bash-test-run: pnpm/npm/yarn/bun test, pytest, jest, vitest, cargo
  test, go test → relaywash__TestRun
- bash-gh-pr: gh pr <verb> and gh api → relaywash__GhPR

Each finding carries an estimated tokens-saved figure (conservative
flat-rate per occurrence pending the `_meta.replaces` annotation reader
in #219), priced at the session's dominant model input rate. Findings
flow through the existing WasteFinding envelope into `--findings` and
`--json` output and the per-detector text table.

Closes #220.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 56ff5c2ee6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if (!parsed.subcommand) return false;
// TWO_PART_SUBCOMMANDS folds `pr <verb>` into a single subcommand, so
// `gh pr view` parses to subcommand="pr view". `gh api` is single-part.
return parsed.subcommand === 'api' || parsed.subcommand.startsWith('pr');

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict gh replacement detection to actual PR commands

matchesGhPr currently uses parsed.subcommand.startsWith('pr'), which also matches non-PR gh commands like gh project ... because their subcommand is project. In sessions that use GitHub Projects, this will emit bash-gh-pr findings with inflated token/USD savings even though no PR workflow happened. Match pr/pr ... explicitly (plus api) to avoid these false positives.

Useful? React with 👍 / 👎.

claude and others added 2 commits May 1, 2026 21:22
`startsWith('pr')` false-positives on `gh project ...` and other
pr-prefixed subcommands. Match `pr` exactly or with a trailing space
so `gh project list` no longer inflates `bash-gh-pr` findings.
Renames `tool-replacement-eligible` to `tool-call-pattern` and strips all
relaywash-specific output (replacementTool field, repo URL, install-X
action, wash-named titles/details). Burn now reports the pattern + tokens-
of-overhead estimate; downstream tools map categories to whatever
consolidation they offer.

- packages/analyze: rename detector + types, drop replacementTool field
- packages/cli: rename PATTERN_KIND, JSON key (toolCallPatterns), table
  renderer; drop "replacement" column
- README + CHANGELOGs: drop wash references
@willwashburn willwashburn changed the title Add tool-replacement-eligible hotspot pattern detector Add tool-call-pattern hotspot detector May 2, 2026
Brings the SDK package into this branch (added on main as 12a7ede) and
extends `hotspots({ patterns })` to also surface findings from
`tool-output-bloat`, `ghost-surface`, and `tool-call-pattern`. Previously
only the core `detectPatterns` set reached the SDK.

- packages/cli: export `buildGhostSurfaceInputs` for SDK reuse
- packages/sdk: load Claude settings, query tool-result events, and walk
  the user-installed surface lazily based on requested patterns; widened
  HotspotsOptions JSDoc to enumerate supported kinds
- CHANGELOGs: note the SDK surface expansion

Resolved CHANGELOG conflicts in root, analyze, and cli — kept Unreleased
section ahead of the 1.6.2 / 1.5.0 release blocks from main.
devin-ai-integration[bot]

This comment was marked as resolved.

`attributeHotspots` and `detectPatterns` take `(turns, opts)` positionally.
Calling them with a single object made `opts` undefined and threw on the
`pricing` destructure, leaving every path through `hotspots()` unreachable
including the new side-channel detectors.

Load `pricing` once up front, bucket the flat `userTurns` array into
`userTurnsBySession`, and pass both detectors the proper positional args.
devin-ai-integration[bot]

This comment was marked as resolved.

The window comparison used `>` so a span of 5 (six turn indexes 0..5)
still fit under EDIT_CLUSTER_TURN_WINDOW = 5. Switch to `>=` so the
runtime matches the documented "within 5 consecutive turns" semantics,
and add a boundary test that pins the cap.
@willwashburn willwashburn merged commit b48ebd9 into main May 2, 2026
2 checks passed
@willwashburn willwashburn deleted the claude/fix-issue-220-IUo5i branch May 2, 2026 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add tool-replacement-eligible hotspot pattern detector

2 participants