
Reader infrastructure: incremental cursors and git-canonical project keys #4

Description

@willwashburn

Context

Two related reader upgrades, plus a ledger idempotency change (Part C), bundled because they all land across packages/reader + packages/ledger and share test infrastructure.

Part A: Incremental file cursor

Today parseClaudeSession in packages/reader/src/claude.ts re-parses every session JSONL file in full on every run. For a user with months of sessions this is the hot path for burn summary, and it puts unnecessary duplicate-append pressure on the ledger.

Plan (pattern cribbed from TokenTracker rollout.js:74-183):

  • Persist cursor state in $RELAYBURN_HOME/cursors.json:
    { "files": { "/abs/path/session.jsonl": { "inode": 12345, "offsetBytes": 98765, "mtimeMs": 1700000000000 } } }
  • On next run: fstat each session file. If inode unchanged and mtime ≥ stored mtime, seek to offsetBytes and parse only the tail. Otherwise treat as a new file and parse from zero (log rotation / file replacement).
  • Tail-safety: only advance offsetBytes past the last complete newline. Never record a position mid-line.
  • Concurrency: file-lock cursors.json during write. Session files are append-only, so reading them is safe without a lock.
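
The resume and tail-safety rules above could be sketched roughly as follows. This is a minimal illustration, not the TokenTracker code or the eventual reader implementation: readTail, FileCursor and TailResult are hypothetical names, the whole tail is read into memory for simplicity, and the size check is an extra guard against in-place truncation that the plan does not mention. Locking and persisting cursors.json is left out.

```typescript
import * as fs from "node:fs";

// One entry of cursors.json, matching the shape shown in the plan.
interface FileCursor {
  inode: number;
  offsetBytes: number;
  mtimeMs: number;
}

interface TailResult {
  lines: string[];    // complete lines found after the cursor
  cursor: FileCursor; // cursor to persist for the next run
}

// Hypothetical helper: return only the complete lines appended since `prev`.
function readTail(path: string, prev?: FileCursor): TailResult {
  const fd = fs.openSync(path, "r");
  try {
    const st = fs.fstatSync(fd);

    // Resume only when it is the same inode and mtime has not gone backwards.
    // Assumption: also require size >= offset, so a truncated file restarts at zero.
    let start = 0;
    if (
      prev !== undefined &&
      prev.inode === st.ino &&
      st.mtimeMs >= prev.mtimeMs &&
      st.size >= prev.offsetBytes
    ) {
      start = prev.offsetBytes;
    }

    const buf = Buffer.alloc(st.size - start);
    const bytesRead = buf.length > 0 ? fs.readSync(fd, buf, 0, buf.length, start) : 0;
    const chunk = buf.subarray(0, bytesRead);

    // Tail-safety: consume only up to and including the last newline byte,
    // so a half-written final line is left for the next run.
    const consumed = chunk.lastIndexOf(0x0a) + 1; // 0 when no complete line exists
    const lines = chunk
      .subarray(0, consumed)
      .toString("utf8")
      .split("\n")
      .filter((line) => line.length > 0);

    return {
      lines,
      cursor: { inode: st.ino, offsetBytes: start + consumed, mtimeMs: st.mtimeMs },
    };
  } finally {
    fs.closeSync(fd);
  }
}
```

A caller would pass the stored cursor for the path (if any), feed the returned lines to the existing per-line parser, and write the returned cursor back under the file lock.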

Part B: Git-canonical project key

Today TurnRecord.project is set to cwd at claude.ts:141. That means /Users/will/Projects/burn and /Users/will/burn-worktree-2 — the same repo — roll up separately, and nothing rolls up across machines.

Plan (pattern from TokenTracker rollout.js:1608-1630):

  • Add a small helper: resolveProject(cwd): { project: string, projectKey?: string }.
  • Walk up from cwd looking for .git/config. Parse [remote "origin"] url. Canonicalize to host/owner/repo (strip .git, strip git@host:, normalize https://host/ to host/).
  • Keep project: cwd for backward compatibility; add projectKey as the rollup key.
  • Queries group by projectKey when present, fall back to project.
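
The parse and canonicalize steps might look something like the sketch below. parseOriginUrl and canonicalizeRemote are hypothetical names, not the TokenTracker functions, and lowercasing the host is an added assumption. The directory walk is omitted; note that in a linked git worktree .git is a file pointing at the real git directory rather than a directory containing config, so the walk would need to follow that pointer for worktrees to resolve.

```typescript
// Hypothetical helper: pull the origin URL out of the text of a .git/config file.
function parseOriginUrl(configText: string): string | undefined {
  let inOrigin = false;
  for (const raw of configText.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.startsWith("[")) {
      // Track whether the current section is [remote "origin"].
      inOrigin = /^\[remote\s+"origin"\]$/.test(line);
      continue;
    }
    if (!inOrigin) continue;
    const m = /^url\s*=\s*(.+)$/.exec(line);
    if (m) return m[1].trim();
  }
  return undefined;
}

// Hypothetical helper: reduce a remote URL to "host/owner/repo".
function canonicalizeRemote(url: string): string | undefined {
  const s = url.trim();
  let host: string;
  let path: string;

  // scp-like syntax, e.g. git@host:owner/repo.git
  const scp = /^[^@\/\s]+@([^:\/\s]+):(.+)$/.exec(s);
  if (scp) {
    host = scp[1];
    path = scp[2];
  } else {
    // URL syntax, e.g. https://host/owner/repo or ssh://git@host/owner/repo
    const m = /^[a-z][a-z0-9+.-]*:\/\/(?:[^@\/]+@)?([^\/]+)\/(.+)$/i.exec(s);
    if (!m) return undefined;
    host = m[1];
    path = m[2];
  }

  // Strip leading/trailing slashes and a trailing ".git" from the path.
  path = path.replace(/^\/+/, "").replace(/\/+$/, "").replace(/\.git$/, "");
  if (path === "") return undefined;

  // Assumption: hostnames are case-insensitive, so only the host is lowercased.
  return `${host.toLowerCase()}/${path}`;
}
```

With this shape, resolveProject(cwd) would return { project: cwd } when no origin URL is found, and add projectKey only when canonicalizeRemote yields a value.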

Part C: Ledger idempotency

While we're in the ledger: dedup by (source, sessionId, messageId) hash on append. Prevents double-counting when a session is re-parsed (which will still happen anytime a file is rewritten — e.g. Claude Code's session save cadence).

  • Maintain a secondary sidecar index at $RELAYBURN_HOME/ledger.idx: a simple newline-delimited list of hashes, or a Bloom filter if memory becomes a concern.
  • On appendTurns, skip any turn whose hash is already indexed.
  • Expose a ledger.rebuildIndex() for recovery.
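
As a rough sketch of the dedup step, assuming the index has already been loaded from ledger.idx into an in-memory Set: turnHash, filterNewTurns and TurnIdentity are hypothetical names, SHA-256 is one possible choice of hash (the plan does not name one), and writing new hashes back to the sidecar file is not shown.

```typescript
import { createHash } from "node:crypto";

// The three fields the plan proposes as the dedup key.
interface TurnIdentity {
  source: string;
  sessionId: string;
  messageId: string;
}

// Hypothetical helper: stable hash of (source, sessionId, messageId).
function turnHash(t: TurnIdentity): string {
  // JSON-encode the tuple so field boundaries stay unambiguous.
  const key = JSON.stringify([t.source, t.sessionId, t.messageId]);
  return createHash("sha256").update(key).digest("hex");
}

// Hypothetical helper: drop turns whose hash is already indexed and
// record the hashes of the turns that are kept.
function filterNewTurns<T extends TurnIdentity>(turns: T[], index: Set<string>): T[] {
  const fresh: T[] = [];
  for (const turn of turns) {
    const h = turnHash(turn);
    if (index.has(h)) continue; // already in the ledger (or earlier in this batch)
    index.add(h);
    fresh.push(turn);
  }
  return fresh;
}
```

appendTurns would then append only the result of filterNewTurns, and rebuildIndex would recompute the same hashes by scanning the ledger from the start.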

Acceptance

  • Second burn summary run over the same data is ≥ 10× faster than the first (measured on a fixture with ≥ 100k turns).
  • A session from /Users/will/Projects/burn and a copy at /tmp/burn-worktree-2 roll up under the same projectKey in burn summary.
  • Repeated parse of the same session file produces zero duplicate ledger entries. Verified by a test that parses the same fixture twice and asserts ledger byte-length is unchanged on the second pass.
  • Log rotation (inode change) correctly triggers full re-parse of the new file.

Depends on

Nothing. Can land in parallel with #1.
