
relayburn-analyze: port ghost-surface detector#287

Merged
willwashburn merged 2 commits into main from claude/close-issue-273-webhooks-CO8WQ
May 5, 2026

Conversation

@willwashburn

Member

Closes #273.

Summary

Port packages/analyze/src/ghost-surface.ts and ghost-surface-inputs.ts to Rust under relayburn_analyze::ghost_surface and ghost_surface_inputs. Mirrors the TS adapter shape: per-harness GhostSurfaceAdapter impls for Claude / Codex / OpenCode, an optional observed_names hook for slash-command mining, and a per-source-scoped orchestrator that tolerates missing surfaces and folds findings into a single sorted list.

Public surface (matches issue acceptance list)

  • Detector: detect_ghost_surface, detect_ghost_surface_with_adapters, default_ghost_adapters(), ClaudeGhostAdapter, CodexGhostAdapter, OpenCodeGhostAdapter
  • Slash-command miners: mine_claude_command_names, mine_codex_slash_invocations
  • Finding adapters: ghost_surface_to_finding, ghost_findings_to_waste_findings
  • Inputs: build_observed_names_by_source, build_session_count_by_source, pick_representative_cache_read_rate, build_ghost_surface_inputs
  • Types: GhostFindingKind, GhostSurfaceFinding, GhostCandidate, GhostSurfaceInputs, GhostSurfaceAdapter, DetectGhostSurfaceOptions, GhostSurfaceFindingOptions

Notes

  • GhostSurfaceAdapter is a Rust trait (source() + enumerate() + optional observed_names()); the TS literal-shape const adapter: GhostSurfaceAdapter = {...} becomes a unit struct + impl block per harness.
  • build_ghost_surface_inputs operates on &[TurnRecord] rather than EnrichedTurn[] and takes user_turn_text_by_session as a parameter, keeping relayburn-analyze I/O-free; the production caller (relayburn-cli) loads the content sidecar.
  • Findings sort deterministically by (cost desc, size_tokens desc, path). OpenCode catalog-bloat candidates emit with cost: 0 (and counted_by_catalog_bloat: Some(true)) to dedup against the SystemPromptTax detector — same dollar accounting as TS.
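The trait shape described in the first note can be sketched roughly as below. This is a minimal illustration, not the merged code: the `GhostCandidate` fields, the method signatures, the `agents` subdirectory layout, and the single-adapter `default_ghost_adapters()` are all assumptions made to keep the example self-contained.

```rust
use std::collections::BTreeSet;
use std::path::Path;

/// Hypothetical candidate shape; the real `GhostCandidate` likely has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct GhostCandidate {
    pub name: String,
    pub path: String,
    pub size_tokens: u64,
}

/// Sketch of the adapter contract: two required methods plus one optional hook.
pub trait GhostSurfaceAdapter {
    /// Harness identifier, e.g. "claude".
    fn source(&self) -> &'static str;

    /// List candidate surfaces under `root`; a missing surface yields an empty list.
    fn enumerate(&self, root: &Path) -> Vec<GhostCandidate>;

    /// Optional hook: names mined from user-turn text. The default mines nothing.
    fn observed_names(&self, _user_turn_texts: &[String]) -> BTreeSet<String> {
        BTreeSet::new()
    }
}

/// Unit struct standing in for the TS object literal.
pub struct ClaudeGhostAdapter;

impl GhostSurfaceAdapter for ClaudeGhostAdapter {
    fn source(&self) -> &'static str {
        "claude"
    }

    fn enumerate(&self, root: &Path) -> Vec<GhostCandidate> {
        // Assumed layout: markdown agent files directly under `<root>/agents`.
        let Ok(entries) = std::fs::read_dir(root.join("agents")) else {
            return Vec::new(); // tolerate a missing surface
        };
        let mut out: Vec<GhostCandidate> = entries
            .filter_map(Result::ok)
            .map(|entry| entry.path())
            .filter(|p| p.extension().is_some_and(|ext| ext == "md"))
            .map(|p| GhostCandidate {
                name: p
                    .file_stem()
                    .map(|s| s.to_string_lossy().into_owned())
                    .unwrap_or_default(),
                path: p.to_string_lossy().into_owned(),
                size_tokens: 0, // token sizing is elided in this sketch
            })
            .collect();
        out.sort_by(|a, b| a.path.cmp(&b.path));
        out
    }
}

/// Only one adapter here; the PR ships Claude, Codex, and OpenCode adapters.
pub fn default_ghost_adapters() -> Vec<Box<dyn GhostSurfaceAdapter>> {
    vec![Box::new(ClaudeGhostAdapter)]
}
```

A default method body is one natural way to express an "optional" hook in a trait: harnesses without slash-command mining simply do not override `observed_names`.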

Test plan

  • cargo test -p relayburn-analyze — 115 passed (27 new ghost-surface + ghost-surface-inputs tests)
  • cargo build --workspace --all-targets clean
  • cargo clippy -p relayburn-analyze --all-targets clean
  • cargo fmt -p relayburn-analyze --check clean
  • Tests cover the conformance gate from ghost-surface.test.ts against the shared fixture corpus under tests/fixtures/ghost-surface/, including the cross-source contamination regression (Claude <command-name> markers mustn't leak into Codex's slash miner and vice versa).
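To illustrate the cross-source contamination regression mentioned in the last test-plan item, here is a hedged sketch of two miners that each recognise only their own harness's syntax. The function names come from the PR, but the bodies, the exact marker format (`<command-name>/name</command-name>`), the "slash at start of line" rule for Codex, and the lowercasing are assumptions for illustration.

```rust
use std::collections::BTreeSet;

/// Sketch: collect names wrapped in Claude-style `<command-name>` markers.
pub fn mine_claude_command_names(text: &str) -> BTreeSet<String> {
    const OPEN: &str = "<command-name>";
    const CLOSE: &str = "</command-name>";
    let mut names = BTreeSet::new();
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        let after = &rest[start + OPEN.len()..];
        let Some(end) = after.find(CLOSE) else { break };
        // Drop the leading slash and normalise case (assumed matching rule).
        let name = after[..end].trim().trim_start_matches('/').to_lowercase();
        if !name.is_empty() {
            names.insert(name);
        }
        rest = &after[end + CLOSE.len()..];
    }
    names
}

/// Sketch: collect `/name` invocations that begin a line of Codex user text.
pub fn mine_codex_slash_invocations(text: &str) -> BTreeSet<String> {
    text.lines()
        .filter_map(|line| line.trim_start().strip_prefix('/'))
        .filter_map(|tail| {
            let name: String = tail
                .chars()
                .take_while(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | ':'))
                .collect();
            (!name.is_empty()).then(|| name.to_lowercase())
        })
        .collect()
}
```

Because the Codex sketch only looks at text that starts a line with `/`, a Claude marker line (which starts with `<`) contributes nothing to it, and plain `/name` lines contain no marker for the Claude miner to find. A real miner would also need to avoid treating absolute paths such as `/usr/bin` as commands, which this sketch does not attempt.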

https://claude.ai/code/session_01341aYeGaGXobKLYPQ92YH8


Generated by Claude Code

claude and others added 2 commits May 5, 2026 01:51
Port `packages/analyze/src/ghost-surface.ts` and
`ghost-surface-inputs.ts` to Rust under `relayburn_analyze::ghost_surface`
and `ghost_surface_inputs`. Mirrors the TS adapter shape: per-harness
`GhostSurfaceAdapter` impls for Claude / Codex / OpenCode, optional
`observed_names` hook for slash-command mining, and a per-source-scoped
orchestrator that tolerates missing surfaces and folds findings into a
single sorted list.

Public surface matches the issue's acceptance list: `detect_ghost_surface`,
the three default adapters and `default_ghost_adapters()`,
`mine_claude_command_names` / `mine_codex_slash_invocations`,
`ghost_surface_to_finding` / `ghost_findings_to_waste_findings`, and the
input builders (`build_observed_names_by_source`,
`build_session_count_by_source`, `pick_representative_cache_read_rate`,
`build_ghost_surface_inputs`). Findings sort deterministically by
`(cost desc, sizeTokens desc, path)` and OpenCode catalog-bloat candidates
emit with `cost: 0` to dedup against the SystemPromptTax detector.

Tests cover the conformance gate from `ghost-surface.test.ts` against the
shared fixture corpus under `tests/fixtures/ghost-surface/`, including
the cross-source contamination regression (Claude `<command-name>` markers
mustn't leak into Codex's slash miner and vice versa).

https://claude.ai/code/session_01341aYeGaGXobKLYPQ92YH8
# Conflicts:
#	CHANGELOG.md
#	crates/relayburn-analyze/src/lib.rs
@coderabbitai

coderabbitai Bot commented May 5, 2026

Walkthrough

This PR ports a ghost-surface detector from TypeScript to Rust, identifying unused AI agents/skills/commands/prompts/rules/memories across harness sources. It adds per-harness adapters (Claude, Codex, OpenCode), slash-command text mining, cost-based sorting, catalog-bloat deduplication, and conversion to WasteFinding records for reporting.

Changes

Ghost-Surface Detector Implementation

| Layer | File(s) | Summary |
|---|---|---|
| Data Models & Traits | crates/relayburn-analyze/src/ghost_surface.rs (lines 36–154) | GhostFindingKind, GhostSurfaceFinding, GhostCandidate, GhostSurfaceInputs structs and GhostSurfaceAdapter trait define the ghost detection surface, options, and adapter contract. |
| Core Detection & Adapters | crates/relayburn-analyze/src/ghost_surface.rs (lines 155–626) | File enumeration helpers, markdown/plaintext surface filtering, ClaudeGhostAdapter (markdown agents/skills/commands), CodexGhostAdapter (plaintext prompts/skills/rules/memory), and OpenCodeGhostAdapter (JSON catalog dedup) with slash-command mining (mine_claude_command_names, mine_codex_slash_invocations). |
| Detection Orchestration | crates/relayburn-analyze/src/ghost_surface.rs (lines 628–736) | detect_ghost_surface and detect_ghost_surface_with_adapters merge observed names per source, filter uninvoked candidates, compute per-session/cumulative USD costs (zeroing catalog-bloat entries), and sort deterministically by cost, size, and path. |
| Output Transformation | crates/relayburn-analyze/src/ghost_surface.rs (lines 742–920) | ghost_surface_to_finding and ghost_findings_to_waste_findings convert findings to WasteFinding records with severity thresholds, archive/removal actions (including POSIX shell quoting and JSON-pointer path handling for OpenCode), and human-readable titles/details. |
| Input Aggregation | crates/relayburn-analyze/src/ghost_surface_inputs.rs (lines 16–116) | build_observed_names_by_source, build_session_count_by_source, pick_representative_cache_read_rate, and build_ghost_surface_inputs assemble ghost-surface inputs from TurnRecord slices with per-source deduplication and fallback behavior. |
| Public API & Documentation | crates/relayburn-analyze/src/lib.rs, CHANGELOG.md | New pub mod ghost_surface and pub mod ghost_surface_inputs with comprehensive re-exports; CHANGELOG.md documents feature, deterministic sorting, and catalog-bloat deduplication. |
| Tests | crates/relayburn-analyze/src/ghost_surface.rs (lines 926–1629), crates/relayburn-analyze/src/ghost_surface_inputs.rs (lines 118–294) | Extensive unit tests for directory enumeration, v1 fallback, slash-command de-ghosting, scoping isolation, catalog-bloat zero-cost behavior, sorting, case-insensitive matching, and input aggregation (name unioning, session dedup, cache-rate selection). |
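The deterministic ordering described under Detection Orchestration, (cost desc, size_tokens desc, path), could look roughly like the comparator below. This is a sketch under assumptions: the struct is trimmed to the three sort keys, `sort_findings` is a hypothetical helper name, and `f64::total_cmp` is one reasonable way to get a total order over float costs rather than a confirmed detail of the port.

```rust
/// Trimmed-down finding with only the fields that participate in ordering.
#[derive(Debug, Clone, PartialEq)]
pub struct GhostSurfaceFinding {
    pub path: String,
    pub size_tokens: u64,
    pub cost: f64,
}

/// Hypothetical helper: sort by cost descending, then size descending, then path ascending.
pub fn sort_findings(findings: &mut [GhostSurfaceFinding]) {
    findings.sort_by(|a, b| {
        b.cost
            .total_cmp(&a.cost) // higher cost first
            .then_with(|| b.size_tokens.cmp(&a.size_tokens)) // larger surface first
            .then_with(|| a.path.cmp(&b.path)) // stable tie-break on path
    });
}
```

The path tie-break matters most for zero-cost entries (such as the OpenCode catalog-bloat candidates), where many findings share the same cost and would otherwise appear in enumeration order.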

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A ghost-surface sleuth hops through the brush,
Finding skills unused, without any rush.
With adapters and miners, it sorts them with care,
Converting lost agents to waste-findings fair.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.95% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: porting a TypeScript ghost-surface detector to Rust in the relayburn-analyze crate. |
| Description check | ✅ Passed | The description comprehensively details the ported functionality, public API surface, implementation notes, and test coverage, all directly relevant to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Line 8: Remove the package-local note from the root CHANGELOG.md and instead
add it to the relayburn-analyze package CHANGELOG: delete the
`relayburn-analyze` entry containing the description of `ghost_surface`,
`ghost_surface_inputs`, the Claude/Codex/OpenCode adapters, slash-command
miners, the per-source-scoped orchestrator, the `WasteFinding` envelope adapter,
the deterministic sorting by `(cost desc, sizeTokens desc, path)`, and the dedup
mention `countedByCatalogBloat` from the root Unreleased block, and create an
equivalent unreleased entry in the relayburn-analyze/CHANGELOG.md so the change
is recorded at the package level only (leave the root Unreleased block for
cross-package/top-level summaries).

In `@crates/relayburn-analyze/src/ghost_surface.rs`:
- Around line 839-846: The generated shell command in the WasteAction::Command
for archiving uses mkdir and mv with quoted paths but doesn't stop option
parsing; modify the command text to insert "--" after both "mkdir -p" and "mv"
so flags are not interpreted if archive_dir_str, ghost.path or
archive_with_slash start with "-". Update the formatted string inside the
WasteAction::Command (the text built from archive_dir_str, ghost.path,
archive_with_slash) to include those "--" operand separators.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d7c0eaae-892b-4339-978d-c78894808dbe

📥 Commits

Reviewing files that changed from the base of the PR and between 1012270 and 244bf8b.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • crates/relayburn-analyze/src/ghost_surface.rs
  • crates/relayburn-analyze/src/ghost_surface_inputs.rs
  • crates/relayburn-analyze/src/lib.rs

Comment thread CHANGELOG.md
## [Unreleased]

- `relayburn-ingest` (Rust): port the standalone primitives — `pending_stamps` (binary-compatible with the TS `@relayburn/ingest` wire format), `walk` (`walk_jsonl` / `walk_opencode_sessions`), `watch_loop` (`tokio::time::interval`-driven `WatchController` with graceful stop), and the typed `cursors` module layered on the SQLite ledger's cursor blob. Public verb surface (`ingest_all`, per-harness verbs, `reingest_missing_content`) is wired; per-harness orchestration follow-ups deferred to dedicated sub-issues. (#245)
- `relayburn-analyze` (Rust): port the ghost-surface detector — `ghost_surface` and `ghost_surface_inputs` modules with Claude / Codex / OpenCode adapters, slash-command miners, the per-source-scoped orchestrator, and the `WasteFinding` envelope adapter. Findings sort deterministically by `(cost desc, sizeTokens desc, path)` and dedup against the OpenCode catalog-bloat detector via `countedByCatalogBloat`. (#273)

@coderabbitai coderabbitai Bot May 5, 2026


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Move this package-local note out of the root changelog.

This PR only changes relayburn-analyze, so this unreleased note should live in that package's changelog unless you explicitly want to elevate it to a top-level cross-package summary.

As per coding guidelines, "Update the root CHANGELOG.md [Unreleased] block only when work spans multiple packages or warrants a top-level summary; single-package work belongs only in that package's CHANGELOG".

🧰 Tools
🪛 LanguageTool

[grammar] ~8-~8: Ensure spelling is correct
Context: ...(cost desc, sizeTokens desc, path)` and dedup against the OpenCode catalog-bloat dete...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


Member Author


The Rust crates don't have per-crate CHANGELOG.md files yet — only the eight TS packages do. Recent Rust port PRs (#272 subagent-tree/claude-md, #269 compare aggregator) record progress in the root CHANGELOG.md, and I'm following that precedent here. Once the Rust crates start publishing on their own, they'll grow their own CHANGELOGs and this entry will move.


@willwashburn, that makes sense — no per-crate CHANGELOG.md exists for the Rust packages yet, so the root file is the right place to track this progress, consistent with the precedent in #272 and #269. I'll stand down on this comment.


✏️ Learnings added
Learnt from: willwashburn
Repo: AgentWorkforce/burn PR: 287
File: CHANGELOG.md:8-8
Timestamp: 2026-05-05T11:23:05.101Z
Learning: In the relayburn repository, Rust crates (e.g. relayburn-analyze, relayburn-ingest) do not have per-crate CHANGELOG.md files. Only the eight TypeScript packages have their own changelogs. Rust port progress is recorded in the root CHANGELOG.md [Unreleased] block until the crates begin publishing independently and grow their own CHANGELOGs. Do not flag root CHANGELOG.md entries for Rust crates as misplaced.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +839 to +846
            WasteAction::Command {
                label: format!("Archive ghost {kind_label}"),
                text: format!(
                    "mkdir -p {} && mv {} {}",
                    shell_quote(&archive_dir_str),
                    shell_quote(&ghost.path),
                    shell_quote(&archive_with_slash),
                ),

@coderabbitai coderabbitai Bot May 5, 2026


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stop option parsing before the generated archive paths.

Shell quoting is not enough here: the shell strips the quotes before mkdir/mv see their arguments, so a ghost file or archive directory whose name starts with - will still be parsed as a flag. Insert -- before the path operands of both commands so the suggested remediation always treats these values as operands.

Suggested patch
             WasteAction::Command {
                 label: format!("Archive ghost {kind_label}"),
                 text: format!(
-                    "mkdir -p {} && mv {} {}",
+                    "mkdir -p -- {} && mv -- {} {}",
                     shell_quote(&archive_dir_str),
                     shell_quote(&ghost.path),
                     shell_quote(&archive_with_slash),
                 ),
             }
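As a self-contained illustration of the point above, the sketch below builds the archive command with and without the `--` separator. It is not the crate's code: `shell_quote` here is an assumed POSIX single-quote implementation (the real helper may differ), and `archive_command` with its `harden` flag is a hypothetical function invented for the comparison.

```rust
/// Assumed POSIX single-quote escaping: wrap in '...' and rewrite each ' as '\''.
pub fn shell_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Hypothetical helper: build the archive command, optionally with `--` separators.
pub fn archive_command(archive_dir: &str, ghost_path: &str, harden: bool) -> String {
    // `--` ends option parsing, so later arguments are always treated as operands.
    let sep = if harden { "-- " } else { "" };
    format!(
        "mkdir -p {sep}{} && mv {sep}{} {}",
        shell_quote(archive_dir),
        shell_quote(ghost_path),
        shell_quote(&format!("{archive_dir}/")),
    )
}
```

With `harden` set to `false` and a ghost path of `-x.md`, the output contains `mv '-x.md' ...`; after the shell removes the quotes, `mv` receives `-x.md` as its first argument and tries to parse it as options. With `harden` set to `true` the same path follows `--` and is treated as a file name.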

Member Author


Skipping. The Rust port intentionally mirrors packages/analyze/src/ghost-surface.ts byte-for-byte — the conformance gate runs against the shared TS fixture corpus. The TS source emits mkdir -p ${shellQuote(archiveDir)} && mv ${shellQuote(ghost.path)} ${shellQuote(archiveDir + '/')} with no -- separator (see packages/analyze/src/ghost-surface.ts:754), so adding -- here would diverge from TS without fixing the same risk upstream.

The risk is also vanishingly small in practice: archive_dir defaults to ~/.relayburn/... and ghost.path is enumerated from well-known dirs (~/.claude/agents, ~/.claude/skills, opencode.json, etc.). A filename starting with - in any of those would be unusual.

Worth tracking as a separate hardening pass that updates both trees in lockstep.


@willwashburn, understood — the parity constraint with the TS conformance corpus is a good reason to keep both trees in sync. The risk profile you describe (paths rooted under well-known home subdirectories) also supports deferring.

Would you like me to open a follow-up GitHub issue to track the joint hardening pass (mkdir -p -- / mv -- in both packages/analyze/src/ghost-surface.ts and the Rust port)? I can link this PR and comment for traceability.

@willwashburn willwashburn merged commit b917d48 into main May 5, 2026
3 checks passed
@willwashburn willwashburn deleted the claude/close-issue-273-webhooks-CO8WQ branch May 5, 2026 11:29

Development

Successfully merging this pull request may close these issues.

[Rust port] relayburn-analyze: ghost-surface detector

2 participants