refactor(sdk/analyze): consolidate duplication + split oversized modules#486
Conversation
The analyze submodules each carried private copies of the same small helpers. Consolidate them so the shared behavior has one definition: - fmt_usd → analyze::util (removed 4 identical copies) - severity_from_usd + SEVERITY_*_USD thresholds → reuse the existing pub(crate) findings::severity_from_usd (removed 2 copies + 4 consts) - PER_MILLION → cost::PER_MILLION, now pub(crate) (removed 2 copies) No behavior change: every removed copy was byte-for-byte equivalent. 992/992 workspace tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The bytes-per-token (~4, ceil) heuristic was reimplemented six times under four different names/constants across analyze submodules, with the UTF-16 vs raw-byte distinction left implicit. Centralize it as named primitives: - util::tokens_from_bytes — ceil(bytes / 4) - util::tokens_from_utf16_len — ceil(utf16_units / 4); TS string.length parity - util::bytes_from_tokens — inverse (tokens * 4) Routed hotspots (utf16), ghost_surface, claude_md, and tool_output_bloat through them. context_delta intentionally keeps floor division for its own approx_tokens field and is left untouched (noted in util). No behavior change; 992/992 workspace tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Grouping turns into per-session buckets in first-seen order (the TS Map<sessionId, TurnRecord[]> iteration contract that fixtures depend on) was hand-rolled six different ways: inline IndexMaps, HashMap + a parallel order-vec, and a Vec + index-map. Three of those reimplemented from scratch what IndexMap already does. Add util::group_turns_by_session, generic over IntoIterator<Item=&TurnRecord> so both &[TurnRecord] and &[&TurnRecord] (claude_md) callers share it, and route hotspots, subagent_tree, claude_md, patterns, quality, and tool_call_patterns through it. Per-session turn_index sorts are unchanged. No behavior change; 992/992 workspace tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Home-directory resolution was hand-rolled five times with three different behaviors. summary, flow, and tool_output_bloat used the canonical "HOME, then USERPROFILE (Node os.homedir parity), then '.'" form; ghost_surface and ingest resolved HOME only and silently fell back to "." on Windows where HOME is usually unset — a latent bug. Add crate::util::home_dir() with the canonical semantics and route all five through it, so the Windows USERPROFILE fallback now applies uniformly. ledger::ledger_home (RELAYBURN_HOME data root, HOME-only by design) is left separate and noted in the helper's docs. Common path (HOME set) is unchanged everywhere; the only deltas are the now-uniform USERPROFILE fallback and harmless relative-path prefixes on the both-env-unset edge. 992/992 workspace tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
patterns.rs carried a ~250-line self-contained shell-command parser (segment splitting, quote-aware tokenization, redirect/operand detection) that only the edit-heavy / codex-read detectors use through one entry point. Move it to a patterns/shell.rs submodule exposing just shell_command_has_file_read; the codex read-command vocabulary (is_codex_shell_read_command) moves with it since nothing else references it. patterns.rs drops from 1851 to 1578 lines. Pure code move (byte-identical apart from the one pub(super) on the entry fn); 992/992 workspace tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The total/max/truncated output-byte fold was copy-pasted verbatim in four hotspots rollups (file, bash, bash-verb, subagent). Extract it into accumulate_output_bytes so the saturating-add / running-max / truncation-count logic has one definition and the four rollups can't drift. McpServer is untouched (it carries no byte fields). Pure extraction (logic byte-identical); 992/992 workspace tests pass. Also rustfmt-normalizes the home_dir chain in user_claude_settings_path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ghost_surface.rs bundled the per-harness GhostSurfaceAdapter implementations (Claude/Codex/OpenCode) and their filesystem-enumeration helpers in with the public types, orchestrator, and finding adapter. Move the adapter cluster — the three adapters, the DirEntry/list_dir_files directory walker, the is_markdown/is_plain_text_surface predicates, the OpenCode catalog reader, and the default_ghost_adapters registry — into ghost_surface/adapters.rs, exposed to the parent as pub(super). The trait, public types, slash-command miners, orchestrator, finding adapter, and tests stay in the parent. ghost_surface.rs drops from 1571 to 1246 lines. Pure move (no logic change); 992/992 workspace tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hotspots and patterns each carried a copy of stringifyToolResult. The TS originals (recovered from a573de5~1) are byte-identical, but patterns' Rust port diverged at the array catch-all: it JSON-stringified bare scalar blocks (number/bool/null) that TS — and the hotspots port — skip, because such a block is neither `typeof === 'object'` nor `typeof === 'string'`. Move the TS-faithful version (hotspots') into analyze::util::stringify_tool_result and call it from both, deleting the two copies. This realigns patterns with the TS source and removes the drift permanently. No observable change on real input: Claude/Codex tool_result.content is a bare string or an array of typed content-block objects, never bare scalars, so the corrected arm is unreachable in practice and no test exercises it. 992/992 workspace tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 19 minutes and 43 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughShared utility helpers ( ChangesUtility Centralization and Adapter Extraction
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request refactors the relayburn-sdk analysis modules by centralizing shared utilities (such as turn grouping, money formatting, token-byte heuristics, and tool-result stringification) into analyze/util.rs, moving ghost surface adapters and shell-parsing helpers into dedicated submodules, and unifying home directory resolution. Feedback on these changes suggests optimizing group_turns_by_session to reduce string allocations, simplifying redundant error handling in list_dir_files, removing redundant state variables in shell-parsing helpers, and refactoring is_signed_integer to use more idiomatic Rust.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| let mut by_session: IndexMap<String, Vec<&'a TurnRecord>> = IndexMap::new(); | ||
| for t in turns { | ||
| by_session.entry(t.session_id.clone()).or_default().push(t); | ||
| } |
There was a problem hiding this comment.
To avoid cloning session_id on every single turn, we can first check if the session already exists in the map using get_mut. This reduces the number of string allocations from
let mut by_session: IndexMap<String, Vec<&'a TurnRecord>> = IndexMap::new();
for t in turns {
if let Some(vec) = by_session.get_mut(&t.session_id) {
vec.push(t);
} else {
by_session.insert(t.session_id.clone(), vec![t]);
}
}| Err(err) => { | ||
| // ENOENT / ENOTDIR: surface no entries. | ||
| if err.kind() == std::io::ErrorKind::NotFound | ||
| || err.kind() == std::io::ErrorKind::NotADirectory | ||
| { | ||
| return Vec::new(); | ||
| } | ||
| return Vec::new(); | ||
| } |
| fn is_redirect_open(token: &str) -> bool { | ||
| // matches `^\d*>` (zero or more digits followed by '>') | ||
| let mut chars = token.chars(); | ||
| let mut saw_any = false; | ||
| let mut found_gt = false; | ||
| let mut leading_digits = 0_usize; | ||
| for c in chars.by_ref() { | ||
| if c.is_ascii_digit() && !found_gt { | ||
| leading_digits += 1; | ||
| continue; | ||
| } | ||
| if c == '>' { | ||
| found_gt = true; | ||
| saw_any = true; | ||
| break; | ||
| } | ||
| break; | ||
| } | ||
| let _ = leading_digits; | ||
| if found_gt { | ||
| return saw_any; | ||
| } | ||
| token.starts_with('>') | ||
| } |
There was a problem hiding this comment.
This function can be significantly simplified. The leading_digits variable is unused (and currently suppressed with let _ = leading_digits;), and the state tracking variables saw_any and found_gt are redundant. We can return true directly when > is encountered after zero or more digits.
fn is_redirect_open(token: &str) -> bool {
// matches `^\d*>` (zero or more digits followed by '>')
let mut chars = token.chars();
for c in chars.by_ref() {
if c.is_ascii_digit() {
continue;
}
if c == '>'; {
return true;
}
break;
}
token.starts_with('>')
}| fn is_pure_redirect(token: &str) -> bool { | ||
| // matches `/^\d*>+$/` or `/^>+$/` | ||
| let mut i = 0_usize; | ||
| let bytes = token.as_bytes(); | ||
| while i < bytes.len() && bytes[i].is_ascii_digit() { | ||
| i += 1; | ||
| } | ||
| if i == bytes.len() { | ||
| return false; | ||
| } | ||
| let mut saw_gt = false; | ||
| while i < bytes.len() { | ||
| if bytes[i] != b'>' { | ||
| return false; | ||
| } | ||
| saw_gt = true; | ||
| i += 1; | ||
| } | ||
| saw_gt | ||
| } |
There was a problem hiding this comment.
The saw_gt variable is redundant. Since i < bytes.len() is guaranteed to be true if the function doesn't return early at line 229, the loop will run at least once and only complete if all remaining characters are b'>'. Thus, we can simply return true at the end of the loop.
fn is_pure_redirect(token: &str) -> bool {
// matches `/^\d*>+$/` or `/^>+$/`
let mut i = 0_usize;
let bytes = token.as_bytes();
while i < bytes.len() && bytes[i].is_ascii_digit() {
i += 1;
}
if i == bytes.len() {
return false;
}
while i < bytes.len() {
if bytes[i] != b'>' {
return false;
}
i += 1;
}
true
}| fn is_signed_integer(token: &str) -> bool { | ||
| // matches `/^[+-]?\d+$/` | ||
| let bytes = token.as_bytes(); | ||
| if bytes.is_empty() { | ||
| return false; | ||
| } | ||
| let mut i = 0_usize; | ||
| if bytes[0] == b'+' || bytes[0] == b'-' { | ||
| i = 1; | ||
| } | ||
| if i == bytes.len() { | ||
| return false; | ||
| } | ||
| while i < bytes.len() { | ||
| if !bytes[i].is_ascii_digit() { | ||
| return false; | ||
| } | ||
| i += 1; | ||
| } | ||
| true | ||
| } |
There was a problem hiding this comment.
We can simplify this function and make it much more idiomatic Rust by leveraging starts_with and chars().all instead of manual index tracking and byte slicing.
fn is_signed_integer(token: &str) -> bool {
// matches `/^[+-]?\d+$/`
let mut s = token;
if s.starts_with('+') || s.starts_with('-') {
s = &s[1..];
}
!s.is_empty() && s.chars().all(|c| c.is_ascii_digit())
}There was a problem hiding this comment.
🧹 Nitpick comments (2)
crates/relayburn-sdk/src/analyze/ghost_surface/adapters.rs (1)
39-50: 💤 Low valueOptional: Simplify redundant error handling.
Both branches of the conditional return
Vec::new(), making theifstatement redundant. If the intent is to silently handle all errors (which the comment at lines 248-251 inenumerate_opencode_projectsuggests), consider simplifying:♻️ Suggested simplification
let entries = match fs::read_dir(dir) { Ok(e) => e, - Err(err) => { - // ENOENT / ENOTDIR: surface no entries. - if err.kind() == std::io::ErrorKind::NotFound - || err.kind() == std::io::ErrorKind::NotADirectory - { - return Vec::new(); - } + Err(_) => { + // ENOENT / ENOTDIR / permission errors: surface no entries. return Vec::new(); } };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/relayburn-sdk/src/analyze/ghost_surface/adapters.rs` around lines 39 - 50, The error handling in the Err branch of the match statement contains redundant logic where both the if condition (when error kind is NotFound or NotADirectory) and the implicit else path return Vec::new(). Since all error cases return the same value, remove the if statement entirely and replace the entire Err(err) block body with a single return Vec::new() statement, making the code more concise while maintaining the same behavior.crates/relayburn-sdk/src/analyze/patterns/shell.rs (1)
197-220: 💤 Low valueUnused variable and redundant logic.
leading_digitsis computed but never used (line 215 just suppresses the warning). Additionally,saw_anywill always equalfound_gtin this function, making the return on line 217 redundant.♻️ Simplified implementation
fn is_redirect_open(token: &str) -> bool { // matches `^\d*>` (zero or more digits followed by '>') - let mut chars = token.chars(); - let mut saw_any = false; - let mut found_gt = false; - let mut leading_digits = 0_usize; - for c in chars.by_ref() { - if c.is_ascii_digit() && !found_gt { - leading_digits += 1; - continue; - } - if c == '>' { - found_gt = true; - saw_any = true; - break; - } - break; - } - let _ = leading_digits; - if found_gt { - return saw_any; - } - token.starts_with('>') + let mut chars = token.chars().peekable(); + while chars.peek().map_or(false, |c| c.is_ascii_digit()) { + chars.next(); + } + chars.next() == Some('>') }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/relayburn-sdk/src/analyze/patterns/shell.rs` around lines 197 - 220, The is_redirect_open function has unused and redundant logic. Remove the unused leading_digits variable completely (eliminating the counter increment and the `let _ = leading_digits;` line that suppresses the warning). Also remove the saw_any variable since it will always have the same value as found_gt (saw_any is only set to true when found_gt is set to true). Simplify the final return statement to directly return found_gt instead of saw_any, keeping only the fallback check for token.starts_with('>').
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/relayburn-sdk/src/analyze/ghost_surface/adapters.rs`:
- Around line 39-50: The error handling in the Err branch of the match statement
contains redundant logic where both the if condition (when error kind is
NotFound or NotADirectory) and the implicit else path return Vec::new(). Since
all error cases return the same value, remove the if statement entirely and
replace the entire Err(err) block body with a single return Vec::new()
statement, making the code more concise while maintaining the same behavior.
In `@crates/relayburn-sdk/src/analyze/patterns/shell.rs`:
- Around line 197-220: The is_redirect_open function has unused and redundant
logic. Remove the unused leading_digits variable completely (eliminating the
counter increment and the `let _ = leading_digits;` line that suppresses the
warning). Also remove the saw_any variable since it will always have the same
value as found_gt (saw_any is only set to true when found_gt is set to true).
Simplify the final return statement to directly return found_gt instead of
saw_any, keeping only the fallback check for token.starts_with('>').
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 331a5ea1-9cac-4e6d-a00a-507ace48c69b
📒 Files selected for processing (17)
crates/relayburn-sdk/src/analyze/claude_md.rscrates/relayburn-sdk/src/analyze/cost.rscrates/relayburn-sdk/src/analyze/findings.rscrates/relayburn-sdk/src/analyze/ghost_surface.rscrates/relayburn-sdk/src/analyze/ghost_surface/adapters.rscrates/relayburn-sdk/src/analyze/hotspots.rscrates/relayburn-sdk/src/analyze/patterns.rscrates/relayburn-sdk/src/analyze/patterns/shell.rscrates/relayburn-sdk/src/analyze/quality.rscrates/relayburn-sdk/src/analyze/subagent_tree.rscrates/relayburn-sdk/src/analyze/tool_call_patterns.rscrates/relayburn-sdk/src/analyze/tool_output_bloat.rscrates/relayburn-sdk/src/analyze/util.rscrates/relayburn-sdk/src/ingest/ingest.rscrates/relayburn-sdk/src/query_verbs/flow.rscrates/relayburn-sdk/src/query_verbs/summary.rscrates/relayburn-sdk/src/util.rs
Nitpicks from the PR bots (Gemini, CodeRabbit), all behavior-preserving: - group_turns_by_session: clone the session id once per session via get_mut instead of on every turn via entry() — restores O(S) key clones for the HashMap-based callers this consolidation had bumped to O(N). - list_dir_files: collapse the dead `if` whose branches both returned Vec::new() into a single Err(_) arm. - is_redirect_open / is_pure_redirect: drop redundant state (leading_digits, saw_any, saw_gt) that the control flow made unconditional. - is_signed_integer: rewrite with strip_prefix + bytes().all instead of manual index tracking. 992/992 workspace tests pass; build/clippy/fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks for the reviews. Devin reported no issues; the Gemini + CodeRabbit suggestions were all behavior-preserving nitpicks, and I've applied all five in e044d4a:
Each is equivalent on all inputs (the redirect/integer rewrites were traced case-by-case), and the full workspace suite still passes 992/992 with |
Survey-driven cleanup of
relayburn-sdk'sanalyzemodule: collapse the duplicated helpers the port had scattered across submodules, split two oversized files along cohesive seams, and fix one latent TS-parity bug surfaced along the way.Each commit is self-contained and was individually verified:
cargo build/clippy/fmtclean and the full workspace test suite held at its 992-test baseline at every step, with the real CLI (summary/hotspots/overhead/ingest) live-tested per change.Commits
Deduplication (behavior-preserving):
consolidate duplicated cost/severity/usd helpers—fmt_usd/severity_from_usd/PER_MILLIONhad 4/2/2 copies → single shared definitions.consolidate approx-token heuristic— the~4 bytes/tokenheuristic was reimplemented 6× under 4 names →util::tokens_from_bytes/tokens_from_utf16_len/bytes_from_tokens, preserving the UTF-16-vs-bytes distinction (context_delta's intentional floor division left alone).single group_turns_by_session helper— 6 hand-rolled session groupings (inline IndexMaps, HashMap+order-vec, Vec+index-map) → one generic helper.single crate::util::home_dir()— home resolution hand-rolled 5× with 3 behaviors; unify on HOME→USERPROFILE→., fixing a latent Windows bug (ghost_surface + ingest were missing the USERPROFILE fallback).share output-byte accumulation across rollups— the total/max/truncated fold was copy-pasted in 4 hotspots rollups →accumulate_output_bytes.Cohesion splits (pure code moves):
extract shell tokenizer to patterns/shell.rs—patterns.rs1851 → 1578.extract ghost_surface adapters to submodule—ghost_surface.rs1571 → 1246; the 3 harness adapters + their filesystem helpers move toghost_surface/adapters.rs.Bug fix:
fix: unify tool-result stringify on the TS-faithful version—hotspotsandpatternseach carried a copy ofstringifyToolResult; the TS originals (recovered froma573de5~1) are byte-identical, but patterns' port diverged at the array catch-all, JSON-stringifying bare scalar blocks that TS skips. Unified on the faithful version inutil::stringify_tool_result. No observable change on real input — Claude/Codextool_result.contentis never an array of bare scalars, so the corrected arm is unreachable in practice and no test exercises it.Verification
cargo build --workspace,cargo clippy --workspace,cargo fmt --all --check: all clean.cargo test --workspace: 992 passed (== pre-refactor baseline) on every commit.Notes for reviewers
price_tokensconsolidation (only 2 of 5 candidate sites are cleanly foldable; the rest are entangled with model-counting or compute a per-token rate) and ahotspots.rstypes/attribution/aggregation split.🤖 Generated with Claude Code