relayburn-analyze: port ghost-surface detector #287
Port `packages/analyze/src/ghost-surface.ts` and `ghost-surface-inputs.ts` to Rust under `relayburn_analyze::ghost_surface` and `ghost_surface_inputs`. Mirrors the TS adapter shape: per-harness `GhostSurfaceAdapter` impls for Claude / Codex / OpenCode, optional `observed_names` hook for slash-command mining, and a per-source-scoped orchestrator that tolerates missing surfaces and folds findings into a single sorted list. Public surface matches the issue's acceptance list: `detect_ghost_surface`, the three default adapters and `default_ghost_adapters()`, `mine_claude_command_names` / `mine_codex_slash_invocations`, `ghost_surface_to_finding` / `ghost_findings_to_waste_findings`, and the input builders (`build_observed_names_by_source`, `build_session_count_by_source`, `pick_representative_cache_read_rate`, `build_ghost_surface_inputs`). Findings sort deterministically by `(cost desc, sizeTokens desc, path)` and OpenCode catalog-bloat candidates emit with `cost: 0` to dedup against the SystemPromptTax detector. Tests cover the conformance gate from `ghost-surface.test.ts` against the shared fixture corpus under `tests/fixtures/ghost-surface/`, including the cross-source contamination regression (Claude `<command-name>` markers mustn't leak into Codex's slash miner and vice versa). https://claude.ai/code/session_01341aYeGaGXobKLYPQ92YH8
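The adapter-plus-orchestrator shape described above can be sketched roughly as follows. This is a simplified, hypothetical model, not the crate's code. The `Source` enum, the field sets, the `Option` return from `enumerate()` to signal a missing surface, and the flat `cost_per_token` price are all assumptions. Only the trait and method names come from the PR. The sketch shows the four behaviours in one place: observed names scoped per source, missing surfaces skipped, catalog-bloat candidates emitted at zero cost, and the `(cost desc, size_tokens desc, path)` sort.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified model types; the real crate's structs carry more fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Source {
    Claude,
    Codex,
    OpenCode,
}

#[derive(Debug, Clone)]
pub struct GhostCandidate {
    pub path: String,
    pub name: String,
    pub size_tokens: u64,
    /// Already billed by a catalog-bloat style detector, so emitted at zero cost.
    pub counted_by_catalog_bloat: bool,
}

#[derive(Debug, Clone)]
pub struct Finding {
    pub source: Source,
    pub path: String,
    pub size_tokens: u64,
    pub cost: f64,
}

/// Method names follow the PR; these signatures are assumptions.
pub trait GhostSurfaceAdapter {
    fn source(&self) -> Source;
    /// `None` means the harness surface is absent, which the orchestrator tolerates.
    fn enumerate(&self) -> Option<Vec<GhostCandidate>>;
    /// Optional mining hook; by default it contributes no extra names.
    fn observed_names(&self, _user_text: &[String]) -> Vec<String> {
        Vec::new()
    }
}

/// Hypothetical orchestrator sketch, not the crate's `detect_ghost_surface`.
pub fn detect_ghost_surface_sketch(
    adapters: &[Box<dyn GhostSurfaceAdapter>],
    used_by_source: &HashMap<Source, HashSet<String>>,
    user_text_by_source: &HashMap<Source, Vec<String>>,
    cost_per_token: f64,
) -> Vec<Finding> {
    let mut findings = Vec::new();
    for adapter in adapters {
        let source = adapter.source();
        // Skip harnesses whose surface is missing instead of failing the whole run.
        let Some(candidates) = adapter.enumerate() else {
            continue;
        };
        // Observed names are scoped to this source, so one harness's usage
        // (or mined slash commands) never hides another harness's ghosts.
        let mut observed = used_by_source.get(&source).cloned().unwrap_or_default();
        if let Some(text) = user_text_by_source.get(&source) {
            observed.extend(adapter.observed_names(text));
        }
        for c in candidates {
            if observed.contains(&c.name) {
                continue;
            }
            // Placeholder pricing: a flat per-token rate stands in for the real cost model.
            let cost = if c.counted_by_catalog_bloat {
                0.0
            } else {
                c.size_tokens as f64 * cost_per_token
            };
            findings.push(Finding { source, path: c.path, size_tokens: c.size_tokens, cost });
        }
    }
    // Deterministic order: cost desc, then size_tokens desc, then path asc.
    findings.sort_by(|a, b| {
        b.cost
            .total_cmp(&a.cost)
            .then_with(|| b.size_tokens.cmp(&a.size_tokens))
            .then_with(|| a.path.cmp(&b.path))
    });
    findings
}
```

Returning `Option` from `enumerate()` is one way to make "surface missing" distinct from "surface present but empty"; the real port may model this differently.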
Conflicts: `CHANGELOG.md`, `crates/relayburn-analyze/src/lib.rs`
📝 Walkthrough
This PR ports a ghost-surface detector from TypeScript to Rust, identifying unused AI agents/skills/commands/prompts/rules/memories across harness sources. It adds per-harness adapters (Claude, Codex, OpenCode), slash-command text mining, cost-based sorting, catalog-bloat deduplication, and conversion to …
Changes: Ghost-Surface Detector Implementation
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CHANGELOG.md`:
- Line 8: Remove the package-local note from the root CHANGELOG.md and instead
add it to the relayburn-analyze package CHANGELOG: delete the
`relayburn-analyze` entry containing the description of `ghost_surface`,
`ghost_surface_inputs`, the Claude/Codex/OpenCode adapters, slash-command
miners, the per-source-scoped orchestrator, the `WasteFinding` envelope adapter,
the deterministic sorting by `(cost desc, sizeTokens desc, path)`, and the dedup
mention `countedByCatalogBloat` from the root Unreleased block, and create an
equivalent unreleased entry in the relayburn-analyze/CHANGELOG.md so the change
is recorded at the package level only (leave the root Unreleased block for
cross-package/top-level summaries).
In `@crates/relayburn-analyze/src/ghost_surface.rs`:
- Around lines 839-846: The generated shell command in the WasteAction::Command
for archiving uses mkdir and mv with quoted paths but doesn't stop option
parsing; modify the command text to insert "--" after both "mkdir -p" and "mv"
so flags are not interpreted if archive_dir_str, ghost.path or
archive_with_slash start with "-". Update the formatted string inside the
WasteAction::Command (the text built from archive_dir_str, ghost.path,
archive_with_slash) to include those "--" operand separators.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: d7c0eaae-892b-4339-978d-c78894808dbe
📒 Files selected for processing (4)
- CHANGELOG.md
- crates/relayburn-analyze/src/ghost_surface.rs
- crates/relayburn-analyze/src/ghost_surface_inputs.rs
- crates/relayburn-analyze/src/lib.rs
## [Unreleased]

- `relayburn-ingest` (Rust): port the standalone primitives — `pending_stamps` (binary-compatible with the TS `@relayburn/ingest` wire format), `walk` (`walk_jsonl` / `walk_opencode_sessions`), `watch_loop` (`tokio::time::interval`-driven `WatchController` with graceful stop), and the typed `cursors` module layered on the SQLite ledger's cursor blob. Public verb surface (`ingest_all`, per-harness verbs, `reingest_missing_content`) is wired; per-harness orchestration follow-ups deferred to dedicated sub-issues. (#245)
- `relayburn-analyze` (Rust): port the ghost-surface detector — `ghost_surface` and `ghost_surface_inputs` modules with Claude / Codex / OpenCode adapters, slash-command miners, the per-source-scoped orchestrator, and the `WasteFinding` envelope adapter. Findings sort deterministically by `(cost desc, sizeTokens desc, path)` and dedup against the OpenCode catalog-bloat detector via `countedByCatalogBloat`. (#273)
Move this package-local note out of the root changelog.
This PR only changes relayburn-analyze, so this unreleased note should live in that package's changelog unless you explicitly want to elevate it to a top-level cross-package summary.
As per coding guidelines, "Update the root CHANGELOG.md [Unreleased] block only when work spans multiple packages or warrants a top-level summary; single-package work belongs only in that package's CHANGELOG".
🧰 Tools
🪛 LanguageTool
[grammar] ~8-~8: Ensure spelling is correct
Context: ...(cost desc, sizeTokens desc, path)` and dedup against the OpenCode catalog-bloat dete...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
The Rust crates don't have per-crate CHANGELOG.md files yet — only the eight TS packages do. Recent Rust port PRs (#272 subagent-tree/claude-md, #269 compare aggregator) record progress in the root CHANGELOG.md, and I'm following that precedent here. Once the Rust crates start publishing on their own, they'll grow their own CHANGELOGs and this entry will move.
@willwashburn, that makes sense — no per-crate CHANGELOG.md exists for the Rust packages yet, so the root file is the right place to track this progress, consistent with the precedent in #272 and #269. I'll stand down on this comment.
✏️ Learnings added
Learnt from: willwashburn
Repo: AgentWorkforce/burn PR: 287
File: CHANGELOG.md:8-8
Timestamp: 2026-05-05T11:23:05.101Z
Learning: In the relayburn repository, Rust crates (e.g. relayburn-analyze, relayburn-ingest) do not have per-crate CHANGELOG.md files. Only the eight TypeScript packages have their own changelogs. Rust port progress is recorded in the root CHANGELOG.md [Unreleased] block until the crates begin publishing independently and grow their own CHANGELOGs. Do not flag root CHANGELOG.md entries for Rust crates as misplaced.
WasteAction::Command {
    label: format!("Archive ghost {kind_label}"),
    text: format!(
        "mkdir -p {} && mv {} {}",
        shell_quote(&archive_dir_str),
        shell_quote(&ghost.path),
        shell_quote(&archive_with_slash),
    ),
Stop option parsing before the generated archive paths.
Shell quoting is not enough here: if a ghost file or archive directory starts with `-`, `mkdir`/`mv` will still treat it as a flag, because the shell strips the quotes before the arguments reach the command. Insert `--` after `mkdir -p` and after `mv` so the suggested remediation always treats these values as operands.
Suggested patch
WasteAction::Command {
label: format!("Archive ghost {kind_label}"),
text: format!(
- "mkdir -p {} && mv {} {}",
+ "mkdir -p -- {} && mv -- {} {}",
shell_quote(&archive_dir_str),
shell_quote(&ghost.path),
shell_quote(&archive_with_slash),
),
}
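For reference, the hardened form in the suggested patch could look like this minimal sketch. The crate's `shell_quote` is not shown in this thread, so the POSIX single-quote escaper below is an assumption, and `archive_command` is a hypothetical helper rather than code from either tree. The `--` after `mkdir -p` and after `mv` ends option parsing, so an operand such as `-rf.md` is treated as a path even after the shell has removed the quotes.

```rust
/// Hypothetical POSIX single-quote escaper standing in for the crate's
/// `shell_quote`, which this thread does not show.
pub fn shell_quote(s: &str) -> String {
    // Close the quote, emit an escaped quote, reopen: ' becomes '\''
    format!("'{}'", s.replace('\'', "'\\''"))
}

/// Hypothetical helper: builds the archive remediation with `--` so that
/// operands beginning with `-` are never parsed as options.
pub fn archive_command(archive_dir: &str, ghost_path: &str) -> String {
    // Trailing slash, as in the TS `archiveDir + '/'`, marks the destination as a directory.
    let archive_with_slash = format!("{archive_dir}/");
    format!(
        "mkdir -p -- {} && mv -- {} {}",
        shell_quote(archive_dir),
        shell_quote(ghost_path),
        shell_quote(&archive_with_slash),
    )
}
```

With these helpers, `archive_command("/tmp/archive", "-rf.md")` yields `mkdir -p -- '/tmp/archive' && mv -- '-rf.md' '/tmp/archive/'`.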
Skipping. The Rust port intentionally mirrors `packages/analyze/src/ghost-surface.ts` byte-for-byte; the conformance gate runs against the shared TS fixture corpus. The TS source emits `mkdir -p ${shellQuote(archiveDir)} && mv ${shellQuote(ghost.path)} ${shellQuote(archiveDir + '/')}` with no `--` separator (see `packages/analyze/src/ghost-surface.ts:754`), so adding `--` here would diverge from TS without fixing the same risk upstream.
The risk is also vanishingly small in practice: `archive_dir` defaults to `~/.relayburn/...` and `ghost.path` is enumerated from well-known dirs (`~/.claude/agents`, `~/.claude/skills`, `opencode.json`, etc.). A filename with a leading `-` in any of those would be unusual.
Worth tracking as a separate hardening pass that updates both trees in lockstep.
@willwashburn, understood — the parity constraint with the TS conformance corpus is a good reason to keep both trees in sync. The risk profile you describe (paths rooted under well-known home subdirectories) also supports deferring.
Would you like me to open a follow-up GitHub issue to track the joint hardening pass (`mkdir -p --` / `mv --` in both `packages/analyze/src/ghost-surface.ts` and the Rust port)? I can link this PR and comment for traceability.
Closes #273.
Summary
Port `packages/analyze/src/ghost-surface.ts` and `ghost-surface-inputs.ts` to Rust under `relayburn_analyze::ghost_surface` and `ghost_surface_inputs`. Mirrors the TS adapter shape: per-harness `GhostSurfaceAdapter` impls for Claude / Codex / OpenCode, an optional `observed_names` hook for slash-command mining, and a per-source-scoped orchestrator that tolerates missing surfaces and folds findings into a single sorted list.
Public surface (matches issue acceptance list)
- `detect_ghost_surface`, `detect_ghost_surface_with_adapters`, `default_ghost_adapters()`, `ClaudeGhostAdapter`, `CodexGhostAdapter`, `OpenCodeGhostAdapter`
- `mine_claude_command_names`, `mine_codex_slash_invocations`
- `ghost_surface_to_finding`, `ghost_findings_to_waste_findings`
- `build_observed_names_by_source`, `build_session_count_by_source`, `pick_representative_cache_read_rate`, `build_ghost_surface_inputs`
- `GhostFindingKind`, `GhostSurfaceFinding`, `GhostCandidate`, `GhostSurfaceInputs`, `GhostSurfaceAdapter`, `DetectGhostSurfaceOptions`, `GhostSurfaceFindingOptions`
Notes
- `GhostSurfaceAdapter` is a Rust trait (`source()` + `enumerate()` + optional `observed_names()`); the TS literal-shape `const adapter: GhostSurfaceAdapter = {...}` becomes a unit struct + impl block per harness.
- `build_ghost_surface_inputs` operates on `&[TurnRecord]` rather than `EnrichedTurn[]` and takes `user_turn_text_by_session` as a parameter, keeping `relayburn-analyze` I/O-free; the production caller (`relayburn-cli`) loads the content sidecar.
- Findings sort deterministically by `(cost desc, size_tokens desc, path)`. OpenCode catalog-bloat candidates emit with `cost: 0` (and `counted_by_catalog_bloat: Some(true)`) to dedup against the `SystemPromptTax` detector, matching the TS dollar accounting.
Test plan
- `cargo test -p relayburn-analyze`: 115 passed (27 new ghost-surface + ghost-surface-inputs tests)
- `cargo build --workspace --all-targets` clean
- `cargo clippy -p relayburn-analyze --all-targets` clean
- `cargo fmt -p relayburn-analyze --check` clean
- Tests cover the conformance gate from `ghost-surface.test.ts` against the shared fixture corpus under `tests/fixtures/ghost-surface/`, including the cross-source contamination regression (Claude `<command-name>` markers mustn't leak into Codex's slash miner and vice versa).

https://claude.ai/code/session_01341aYeGaGXobKLYPQ92YH8
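The cross-source contamination regression can be illustrated with two deliberately narrow miners. The marker and invocation formats here are assumptions (Claude: `<command-name>/name</command-name>`; Codex: a line whose first token is `/name`), and the `_sketch` functions are hypothetical stand-ins for `mine_claude_command_names` / `mine_codex_slash_invocations`, not their actual implementations. Because each miner recognizes only its own syntax, Claude markers yield nothing from the Codex miner and vice versa.

```rust
/// Hypothetical Claude miner: collects names from `<command-name>` markers.
pub fn mine_claude_command_names_sketch(text: &str) -> Vec<String> {
    const OPEN: &str = "<command-name>";
    const CLOSE: &str = "</command-name>";
    let mut out = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        let after = &rest[start + OPEN.len()..];
        // An unterminated marker ends the scan.
        let Some(end) = after.find(CLOSE) else { break };
        let name = after[..end].trim().trim_start_matches('/');
        if !name.is_empty() {
            out.push(name.to_string());
        }
        rest = &after[end + CLOSE.len()..];
    }
    out
}

/// Hypothetical Codex miner: a line whose first token is `/name` counts as an invocation.
pub fn mine_codex_slash_invocations_sketch(text: &str) -> Vec<String> {
    text.lines()
        .filter_map(|line| line.split_whitespace().next())
        .filter_map(|tok| tok.strip_prefix('/'))
        // Reject things like `/usr/bin/x` or marker-wrapped names.
        .filter(|name| {
            !name.is_empty()
                && name
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_' || c == ':')
        })
        .map(str::to_string)
        .collect()
}
```

In the real port, the per-source scoping in the orchestrator is what guarantees each miner only ever sees its own harness's user-turn text; narrow syntax in the miners is a second line of defence.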
Generated by Claude Code