Skip to content

perf(ingest): no-op fast path gated on a source fingerprint (#468)#471

Merged
willwashburn merged 4 commits into
mainfrom
fix-ingest-noop-fast-path
Jun 3, 2026
Merged

perf(ingest): no-op fast path gated on a source fingerprint (#468)#471
willwashburn merged 4 commits into
mainfrom
fix-ingest-noop-fast-path

Conversation

@willwashburn

@willwashburn willwashburn commented Jun 3, 2026

Copy link
Copy Markdown
Member

User description

Fixes #468.

Problem

ingest() cost ~0.7s on every call even when nothing upstream had changed. Profiling a 94k-turn / ~125 MB ledger showed a no-op sweep spends its time on work that produces zero new turns:

phase cost
load_cursors (parse the cursor blob) ~58 ms
clone + save-compare of the cursor map ~33 ms
per-file cursor deserialize (get_typed) ~73 ms
source walk + fs::metadata per file ~170 ms+
per-file parse only the rare changed files

fingerprint() already detects "nothing changed" in ~47 ms, but it reflects the post-ingest turns table, so it can't gate ingest itself.

Fix

Add a stat-only source fingerprintcount:totalBytes:mtimeSum over the Claude/Codex/OpenCode session files (folding in OpenCode's per-session message-dir mtime, the signal that adapter watches for new messages). ingest_all computes it up front and, when it matches the value the previous sweep recorded, returns an empty report without loading cursors, cloning, deserializing per-file cursors, or parsing.

  • New archive_state.source_fingerprint column (schema v6, in-place ALTER TABLE ADD COLUMN, auto-migrated; legacy '' never matches so the first post-upgrade ingest walks normally and records it).
  • The fingerprint captured at the start of a sweep is stored after it completes — the conservative choice: any file that moves mid-sweep yields a different fingerprint next call, forcing a cheap re-scan that per-file cursors then skip. Storing the end value could miss bytes written after the walk passed a file.
  • Blanked by state reset and rebuild_derivable, so a wiped/rebuilt ledger always re-walks.

Detects every relevant change: file add/remove (count + bytes), in-place append (bytes + mtime — which directory mtimes miss), truncate (bytes), and touch (mtime sum, sensitive in either direction so a backwards clock still registers).

Impact

A no-op burn ingest on a live ledger dropped from ~0.74s → ~0.19s. The residual is the unavoidable stat walk over the source trees — on this machine an atypically large OpenCode tree (138k message files); on Claude/Codex-dominant ledgers (the polling/refresh case the issue targets) it returns in tens of ms.

Tests

  • source_fingerprint_is_stable_and_moves_on_change — stable across calls; moves on file add and on append.
  • ingest_all_short_circuits_when_source_unchanged — second sweep with unchanged source reports scanned_sessions == 0 (proves the loops never ran, not just that nothing was appended).
  • ingest_all_rescans_after_source_file_appended — a new source file defeats the gate and its turns are ingested + queryable.
  • legacy_v1_ledger_migrates_to_v2_on_open… extended to assert the v6 source_fingerprint column is added on migration.

Full cargo test --workspace green.

🤖 Generated with Claude Code


CodeAnt-AI Description

Skip unchanged ingest sweeps, while still rescanning on real file events

What Changed

  • ingest() now returns immediately when the source files match the last recorded fingerprint, so a no-op sweep no longer loads cursors or walks every session file
  • Watch-driven ingest ticks still force a full rescan, so file updates from live edits are not skipped
  • The saved fingerprint is cleared when state is reset or rebuilt, and it now survives closing and reopening the ledger
  • Upgrading older ledgers adds the new fingerprint field automatically

Impact

✅ Faster no-op ingest
✅ Fewer unnecessary session scans
✅ Safer live file updates during watch mode

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

`ingest()` cost ~0.7s on every call even when nothing upstream changed —
it always loaded the cursor blob, cloned it, walked every session file,
and deserialized a cursor per file. Profiling a 94k-turn ledger showed the
no-op cost is dominated by the cursor load/clone + per-file deserialize
plus the source walk, none of which produce any new turns.

Add a stat-only source fingerprint (`count:totalBytes:mtimeSum` over the
Claude/Codex/OpenCode session files, folding in OpenCode's per-session
message-dir mtime). `ingest_all` computes it up front and, when it matches
the value the last sweep recorded in the new `archive_state.source_fingerprint`
column, returns an empty report without loading cursors or parsing anything.
The fingerprint is stored after every full sweep; it is blanked by
`state reset` and `rebuild_derivable` so a wiped/rebuilt ledger always
re-walks.

Storing the fingerprint captured at the *start* of the sweep is the safe
choice: a file that changes mid-sweep yields a different fingerprint next
call, forcing a (cheap) re-scan that per-file cursors then skip.

Schema bumped to v6 via in-place `ALTER TABLE ADD COLUMN` (auto-migrated).

On a live ledger a no-op `burn ingest` dropped from ~0.74s to ~0.19s
(bounded by the unavoidable stat walk over the source trees); on
Claude/Codex-dominant ledgers without a large OpenCode tree it returns in
tens of ms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@codeant-ai

codeant-ai Bot commented Jun 3, 2026

Copy link
Copy Markdown

CodeAnt AI is reviewing your PR.

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR adds a stat-only source fingerprint used by ingest_all to detect unchanged upstream session stores and short-circuit ingest. Schema bumped to v6; ledger read/write API persists the fingerprint; ingest computes and compares the fingerprint, skips scanning when unchanged (unless forced), and avoids persisting fingerprints if files were skipped. Watch/CLI and tests updated accordingly.

Changes

Source fingerprint fast-path for ingest

Layer / File(s) Summary
Schema definition and migration
crates/relayburn-sdk/src/ledger/schema.rs, crates/relayburn-sdk/src/ledger/db.rs, crates/relayburn-sdk/src/ledger/tests.rs, crates/relayburn-sdk/src/query_verbs.rs
Adds schema v6 and archive_state.source_fingerprint TEXT NOT NULL DEFAULT '' to DDL, makes migration idempotent for v<6, updates bootstrap to v6, and adjusts tests to assert schema_version == 6 and column presence.
Ledger fingerprint API and state management
crates/relayburn-sdk/src/ledger.rs
Adds read_source_fingerprint() and write_source_fingerprint() and ensures rebuild_derivable and reset clear source_fingerprint.
Fingerprinting and fast-path implementation
crates/relayburn-sdk/src/ingest/ingest.rs
Adds force_scan to IngestOptions; implements source_fingerprint(roots) (counts, sizes, mtimes, OpenCode dir entry counts), computes current_fp at ingest start, compares to stored fingerprint, early-returns empty IngestReport when unchanged and not forced, and persists fingerprint at end when no skips occurred.
Skip-safety propagation
crates/relayburn-sdk/src/ingest/ingest.rs
Introduces had_skips flag propagated into harness helpers and set when metadata/read/parse fails; write_source_fingerprint is conditional on !had_skips.
Watch loop and CLI force-scan wiring
crates/relayburn-sdk/src/ingest/watch_loop.rs, crates/relayburn-cli/src/commands/ingest.rs, crates/relayburn-cli/src/harnesses/pending_stamp.rs, crates/relayburn-sdk/src/ingest_verb.rs
Updates IngestFn to accept force: bool, threads force into RawIngestOptions.force_scan for FS-event-driven ticks, adjusts pending-stamp adapter to accept/ignore the flag, and ensures public ingest verb sets force_scan = false.
Fingerprint unit tests
crates/relayburn-sdk/src/ingest/ingest.rs
Adds unit tests validating fingerprint stability across unchanged roots and movement on file add/append, including OpenCode message-dir child-count detection.
Fast-path integration tests
crates/relayburn-sdk/src/ingest/orchestration_tests.rs
Adds orchestration tests verifying short-circuit (no scan, no appended turns), cleanup during fast-path, force_scan bypass, rescan-after-append, skip-safety (Unix unreadable file case), persistence across ledger reopen, reset behavior blanking fingerprint, and claude_distinct_session helper.
Watch loop tests updates
crates/relayburn-sdk/src/ingest/watch_loop_tests.rs
Updates watch-loop tests to accept force parameter and asserts burst-driven forced ticks occur when expected.
User-facing documentation
CHANGELOG.md, packages/sdk-node/CHANGELOG.md
Documents ingest() early-return when upstream unchanged and schema v6 addition; minor line-wrapping adjustments.

Sequence Diagram(s)

sequenceDiagram
  participant ingest_all
  participant source_fingerprint
  participant Ledger
  participant ScanOrchestration
  ingest_all->>source_fingerprint: compute current_fp (stat-only)
  ingest_all->>Ledger: read_source_fingerprint()
  alt stored == current_fp && !force_scan
    Ledger->>ingest_all: return match
    ingest_all->>ingest_all: return empty IngestReport
  else
    ingest_all->>ScanOrchestration: perform full scan/parse
    ScanOrchestration->>ingest_all: parsed turns + had_skips flag
    alt !had_skips
      ingest_all->>Ledger: write_source_fingerprint(current_fp)
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • AgentWorkforce/burn#419: Both PRs modify ingest_all behavior and related ingest plumbing; #419 refactors ingest control flow while this PR adds fingerprint gating and force_scan handling.

Poem

🐰 I counted files with gentle paws,

a fingerprint to skip false claws.
When nothing’s new the ledger sighs,
and ingest naps — a faster prize.
Tiny hops, big time saved — hoorays!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: adding a fast-path optimization to ingest() gated on a source fingerprint, directly addressing the core problem of unnecessary full scans.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, clearly explaining the problem, fix, impact, and test coverage for the source fingerprint optimization.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-ingest-noop-fast-path

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codeant-ai codeant-ai Bot added the size:L This PR changes 100-499 lines, ignoring generated files label Jun 3, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec53de3473

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +322 to +324
ledger
.write_source_fingerprint(&current_fp)
.map_err(|e| anyhow::anyhow!(e))?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid caching fingerprints after skipped source files

When any harness sees a source file but cannot read or parse it, the per-file loop logs the error and continues, yet this still persists the fingerprint that already counted that file. If the failure is transient without changing size/mtime (for example a PermissionDenied read fixed with chmod, which changes ctime but not the fingerprint inputs), the next ingest_all matches this stored fingerprint and returns before retrying, leaving that session un-ingested indefinitely. Only record the fingerprint when every counted source was successfully processed, or include failed files in a way that forces a retry.

Useful? React with 👍 / 👎.

Comment on lines +561 to +563
let sid_b = "cccccccc-cccc-cccc-cccc-cccccccccccc";
let file_b = project_dir.join(format!("{sid_b}.jsonl"));
fs::write(&file_b, claude_distinct_session(sid_b)).unwrap();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: This test claims to verify that appending to an existing session file reopens the fast-path gate, but it actually creates a brand-new file. That only validates add-file detection (count change), so a regression where in-place appends are missed could still pass. Update the test to append new JSONL lines to the existing file and assert those new turns are ingested. [incomplete implementation]

Severity Level: Major ⚠️
- ⚠️ Append-only session updates may regress without test failure.
- ⚠️ Fast-path source fingerprint gate untested for in-place appends.
- ⚠️ Future refactors risk missing append detection semantics.
Steps of Reproduction ✅
1. Open `crates/relayburn-sdk/src/ingest/orchestration_tests.rs` and locate the test
`ingest_all_rescans_after_source_file_appended` starting around line 529, with its doc
comment at lines 527–528 describing "appending to an existing session file" changing size
and mtime.

2. Within that test, observe the initial setup at lines 522–524 where `sid_a` and `file_a`
are created and `fs::write(&file_a, claude_minimal_session(sid_a)).unwrap();` writes the
first session file, followed by `ingest_all(&mut ledger, &opts).unwrap();` at line 551
that ingests it and arms the fast-path gate.

3. After confirming the fast path is armed via `assert_eq!(ingest_all(&mut ledger,
&opts).unwrap().scanned_sessions, 0);` at line 555, examine the subsequent code at lines
561–563: it defines `sid_b`, computes `file_b`, and calls `fs::write(&file_b,
claude_distinct_session(sid_b)).unwrap();`, which creates an entirely new JSONL file
rather than appending to the existing `file_a`.

4. Run `cargo test ingest_all_rescans_after_source_file_appended` in the
`crates/relayburn-sdk` crate; the test will pass if adding a new file correctly reopens
the fast-path gate (checked via `r3.scanned_sessions >= 1` and `r3.appended_turns >= 1` at
lines 47–53) even if a regression were introduced where in-place appends to an existing
session file do not update the fingerprint, because no code path in this test ever appends
to `file_a` as implied by the doc comment.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** crates/relayburn-sdk/src/ingest/orchestration_tests.rs
**Line:** 561:563
**Comment:**
	*Incomplete Implementation: This test claims to verify that appending to an existing session file reopens the fast-path gate, but it actually creates a brand-new file. That only validates add-file detection (count change), so a regression where in-place appends are missed could still pass. Update the test to append new JSONL lines to the existing file and assert those new turns are ingested.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

@codeant-ai

codeant-ai Bot commented Jun 3, 2026

Copy link
Copy Markdown

CodeAnt AI finished reviewing your PR.

@agent-relay-code

Copy link
Copy Markdown
Contributor

pr-reviewer applied fixes — committed and pushed 51d3632 to this PR. The notes below describe what changed.

Reviewed and fixed the PR locally.

Changes made:

  • Strengthened source_fingerprint from count:totalBytes:mtimeSum to count:totalBytes:hash, folding in source kind, path, byte length, and mtime. This avoids mtime-sum aliasing and detects renames/path changes.
  • Moved stale pending-stamp cleanup before the ingest fast path so ingest_all still honors its cleanup behavior even when it short-circuits.
  • Added a regression test for fast-path pending-stamp cleanup.
  • Updated schema comments and test expectations for the new fingerprint format.

Validation:

  • git diff --check passed.
  • corepack pnpm --filter @relayburn/sdk test passed, with expected native-binding skips.
  • corepack pnpm --filter @relayburn/mcp test passed.
  • Rust tests could not be run because cargo/Rust toolchain is not installed in this environment.
  • Top-level corepack pnpm test could not run because the package script invokes pnpm directly and no pnpm shim is installed on PATH.

@agent-relay-code

Copy link
Copy Markdown
Contributor

Reviewed the PR diff and traced the ingest/schema/status impact.

Fixed:

  • Updated schema docs to reflect that source_fingerprint is cleared on both state rebuild and state reset.
  • Updated stale migration-test comments from schema v5 wording to schema v6.
  • Ran Prettier on the changed changelogs and verified they pass formatting.

Verification:

  • npx --yes prettier --check CHANGELOG.md packages/sdk-node/CHANGELOG.md passed.
  • Could not run Rust tests because cargo is not installed in this runner.
  • Could not run workspace Node tests because pnpm is not installed.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
CHANGELOG.md (1)

9-9: ⚡ Quick win

Simplify to impact-first wording per coding guidelines.

The entry includes implementation backstory ("stat-only source fingerprint is compared...", "skips the cursor load and per-file deserialize") and a PR reference that the guidelines ask you to drop. Focus on the practical effect: ingest() is now much faster on unchanged sources.

As per coding guidelines, changelog entries should be concise and impact-first, dropping implementation details unless they clearly explain the shipped impact.

Suggested condensed wording
-- `ingest()` short-circuits when nothing upstream changed: a stat-only source fingerprint is compared against the last sweep's value, so a no-op ingest skips the cursor load and per-file deserialize and returns `{ ingested: 0 }` in roughly the cost of walking the source dirs instead of ~0.7s. Adds `archive_state.source_fingerprint` (schema v6, auto-migrated). (`#468`)
+- `ingest()` skips re-scanning when upstream sessions haven't changed, reducing no-op calls from ~0.7s to ~0.2s. Adds `archive_state.source_fingerprint` (schema v6, auto-migrated).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` at line 9, Rewrite the changelog line to be impact-first and
concise: state that ingest() is now much faster when sources are unchanged and
omit the implementation backstory and PR reference; keep a brief note that this
adds archive_state.source_fingerprint (schema v6, auto-migrated) and that
unchanged runs return { ingested: 0 } to reflect the observable behavior; target
the existing text mentioning ingest() and archive_state.source_fingerprint for
replacement.
packages/sdk-node/CHANGELOG.md (1)

7-7: ⚡ Quick win

Simplify to impact-first wording per coding guidelines.

The entry includes a PR reference and implementation details ("Gated on a stat-only source fingerprint...", "cursor-load + per-file scan") that the guidelines ask you to drop. For the Node SDK facade changelog, focus on the observable API behavior.

As per coding guidelines, changelog entries should be concise and impact-first, naming the API touched and the practical effect.

Suggested condensed wording
-- `ingest()` returns in roughly source-walk time (`{ ingested: 0 }`) when nothing upstream has changed, instead of paying the full ~0.7s cursor-load + per-file scan every call. Gated on a stat-only source fingerprint persisted in the ledger. (`#468`)
+- `ingest()` returns immediately with `{ ingested: 0 }` when upstream sessions haven't changed, avoiding ~0.7s cursor load.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/sdk-node/CHANGELOG.md` at line 7, Changelog entry for the Node SDK
should be rewritten to an impact-first, user-facing sentence: replace the
current line that mentions implementation details and PR number with a concise
statement naming the API (ingest()) and the observable effect (returns
immediately with no work when nothing upstream changed, e.g., returns {
ingested: 0 } faster). Remove phrases like "Gated on a stat-only source
fingerprint", "cursor-load + per-file scan", and the PR number; keep only the
API name and the practical behavior change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/relayburn-sdk/src/ledger/schema.rs`:
- Around line 60-66: The doc comment for archive_state.source_fingerprint
incorrectly states the stored shape as `count:totalBytes:hash`; update the
documentation to match the implementation by describing it as a stat-only
aggregate over mtimes (for example `count:totalBytes:mtimes` or
`count:totalBytes:mtime-aggregate`) and mention that it is derived from file
mtimes rather than a per-file content hash; modify the comment block that
references `archive_state.source_fingerprint` (also the similar text later in
the file) to use this corrected wording.

---

Nitpick comments:
In `@CHANGELOG.md`:
- Line 9: Rewrite the changelog line to be impact-first and concise: state that
ingest() is now much faster when sources are unchanged and omit the
implementation backstory and PR reference; keep a brief note that this adds
archive_state.source_fingerprint (schema v6, auto-migrated) and that unchanged
runs return { ingested: 0 } to reflect the observable behavior; target the
existing text mentioning ingest() and archive_state.source_fingerprint for
replacement.

In `@packages/sdk-node/CHANGELOG.md`:
- Line 7: Changelog entry for the Node SDK should be rewritten to an
impact-first, user-facing sentence: replace the current line that mentions
implementation details and PR number with a concise statement naming the API
(ingest()) and the observable effect (returns immediately with no work when
nothing upstream changed, e.g., returns { ingested: 0 } faster). Remove phrases
like "Gated on a stat-only source fingerprint", "cursor-load + per-file scan",
and the PR number; keep only the API name and the practical behavior change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: edd99d3d-e11b-42a4-8015-46ce058762c7

📥 Commits

Reviewing files that changed from the base of the PR and between bb37bd7 and 51d3632.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • crates/relayburn-sdk/src/ingest/ingest.rs
  • crates/relayburn-sdk/src/ingest/orchestration_tests.rs
  • crates/relayburn-sdk/src/ledger.rs
  • crates/relayburn-sdk/src/ledger/db.rs
  • crates/relayburn-sdk/src/ledger/schema.rs
  • crates/relayburn-sdk/src/ledger/tests.rs
  • crates/relayburn-sdk/src/query_verbs.rs
  • packages/sdk-node/CHANGELOG.md

Comment on lines +60 to +66
/// - `6`: adds `archive_state.source_fingerprint TEXT` — a cheap aggregate
/// (`count:totalBytes:hash`) over the source session files ingest scans.
/// `ingest_all` records it after each sweep and short-circuits to an
/// empty report when the live source fingerprint is unchanged, so a
/// no-op ingest costs a stat-only walk instead of a full cursor load +
/// per-file deserialize. Blanked by `state reset` so the next ingest
/// re-walks from scratch. (#468)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align the documented fingerprint shape with the implementation.

These comments say the stored value is count:totalBytes:hash, but this PR’s fingerprint contract is a stat-only aggregate over mtimes. Leaving hash here will misdescribe the on-disk value in archive_state.source_fingerprint.

Suggested wording
-///   (`count:totalBytes:hash`) over the source session files ingest scans.
+///   (`count:totalBytes:mtimeSum`) over the source session files ingest scans.
...
-    -- Cheap source-side change gate: `count:totalBytes:hash` over the
+    -- Cheap source-side change gate: `count:totalBytes:mtimeSum` over the

Also applies to: 232-233

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/relayburn-sdk/src/ledger/schema.rs` around lines 60 - 66, The doc
comment for archive_state.source_fingerprint incorrectly states the stored shape
as `count:totalBytes:hash`; update the documentation to match the implementation
by describing it as a stat-only aggregate over mtimes (for example
`count:totalBytes:mtimes` or `count:totalBytes:mtime-aggregate`) and mention
that it is derived from file mtimes rather than a per-file content hash; modify
the comment block that references `archive_state.source_fingerprint` (also the
similar text later in the file) to use this corrected wording.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 6 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/sdk-node/CHANGELOG.md">

<violation number="1" location="packages/sdk-node/CHANGELOG.md:91">
P3: Inconsistent indentation on continuation line inside bullet point: the surrounding continuation lines use 2-space indent but this line starts at column 0.</violation>
</file>

<file name="CHANGELOG.md">

<violation number="1" location="CHANGELOG.md:262">
P3: Changelog text accidentally turned `+ free-function form` into a nested `-` list item, which breaks sentence rendering and changes the meaning of the release note.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

- Umbrella facade now coerces napi-rs `BigInt` return values to `Number`
for safe-range integers (`[Number.MIN_SAFE_INTEGER,
Number.MAX_SAFE_INTEGER]`), matching the TS 1.x runtime shape; values
Number.MAX_SAFE_INTEGER]`), matching the TS 1.x runtime shape; values

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Inconsistent indentation on continuation line inside bullet point: the surrounding continuation lines use 2-space indent but this line starts at column 0.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/sdk-node/CHANGELOG.md, line 91:

<comment>Inconsistent indentation on continuation line inside bullet point: the surrounding continuation lines use 2-space indent but this line starts at column 0.</comment>

<file context>
@@ -88,7 +88,7 @@
 - Umbrella facade now coerces napi-rs `BigInt` return values to `Number`
   for safe-range integers (`[Number.MIN_SAFE_INTEGER,
-  Number.MAX_SAFE_INTEGER]`), matching the TS 1.x runtime shape; values
+Number.MAX_SAFE_INTEGER]`), matching the TS 1.x runtime shape; values
   outside that range stay `BigInt` to avoid silent precision loss.
 - Conformance suite is now wired into CI: `napi build` writes its outputs
</file context>
Suggested change
Number.MAX_SAFE_INTEGER]`), matching the TS 1.x runtime shape; values
Number.MAX_SAFE_INTEGER]`), matching the TS 1.x runtime shape; values

Comment thread CHANGELOG.md
Comment on lines +262 to +264
- free-function form) returning `SessionsListResult { sessions, limit,
truncated }`. Derived from the `turns` table so older ledgers with an
empty `sessions` table still enumerate correctly.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Changelog text accidentally turned + free-function form into a nested - list item, which breaks sentence rendering and changes the meaning of the release note.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At CHANGELOG.md, line 262:

<comment>Changelog text accidentally turned `+ free-function form` into a nested `-` list item, which breaks sentence rendering and changes the meaning of the release note.</comment>

<file context>
@@ -259,9 +259,9 @@ Cross-package release notes for relayburn. Package changelogs contain package-le
-  + free-function form) returning `SessionsListResult { sessions, limit,
-  truncated }`. Derived from the `turns` table so older ledgers with an
-  empty `sessions` table still enumerate correctly.
+  - free-function form) returning `SessionsListResult { sessions, limit,
+truncated }`. Derived from the `turns` table so older ledgers with an
+    empty `sessions` table still enumerate correctly.
</file context>
Suggested change
- free-function form) returning `SessionsListResult { sessions, limit,
truncated }`. Derived from the `turns` table so older ledgers with an
empty `sessions` table still enumerate correctly.
+ free-function form) returning `SessionsListResult { sessions, limit,
truncated }`. Derived from the `turns` table so older ledgers with an
empty `sessions` table still enumerate correctly.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 9 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/sdk-node/CHANGELOG.md">

<violation number="1" location="packages/sdk-node/CHANGELOG.md:91">
P3: Inconsistent indentation on continuation line inside bullet point: the surrounding continuation lines use 2-space indent but this line starts at column 0.</violation>
</file>

<file name="CHANGELOG.md">

<violation number="1" location="CHANGELOG.md:262">
P3: Changelog text accidentally turned `+ free-function form` into a nested `-` list item, which breaks sentence rendering and changes the meaning of the release note.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread crates/relayburn-sdk/src/ingest/orchestration_tests.rs
Resolve reviewer findings on #471 and reconcile the path-aware fingerprint
with the watch-loop force-scan + OpenCode child-count work:

- Don't cache the source fingerprint when a statted source file fails to
  read/parse (Codex P2). A transient failure (e.g. PermissionDenied later
  fixed with chmod, moving ctime but not size/mtime) would otherwise match
  the stored fingerprint and short-circuit before retrying, stranding the
  session. Threaded a `had_skips` flag through the per-harness loops and
  skip the fingerprint write when set. Regression test added.
- Rewrite `ingest_all_rescans_after_source_file_appended` to append JSONL
  in place instead of creating a new file (CodeAnt + Cubic), so the test
  actually covers append-triggered fast-path invalidation.
- Keep the path-aware `count:totalBytes:hash` fingerprint AND fold in the
  OpenCode message-dir child count so a new message file re-opens the gate
  even when the dir mtime granularity is too coarse to move.
- Changelog entries trimmed to impact-first wording per CLAUDE.md.

cargo build --workspace, full relayburn-sdk + relayburn-cli suites, and
clippy --workspace all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codeant-ai

codeant-ai Bot commented Jun 3, 2026

Copy link
Copy Markdown

CodeAnt AI is running Incremental review

@codeant-ai codeant-ai Bot added size:XL This PR changes 500-999 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Jun 3, 2026
@codeant-ai

codeant-ai Bot commented Jun 3, 2026

Copy link
Copy Markdown

CodeAnt AI Incremental review completed.

@willwashburn willwashburn merged commit 9d25022 into main Jun 3, 2026
11 of 12 checks passed
@willwashburn willwashburn deleted the fix-ingest-noop-fast-path branch June 3, 2026 04:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL This PR changes 500-999 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ingest() costs ~0.7-1.9s even when nothing changed — add a no-op fast path (fingerprint-gated)

1 participant