
feat(stats): port savings.jsonl telemetry from semble#11

Merged: amondnet merged 2 commits into main from feat/unit-13-stats on May 28, 2026


@amondnet amondnet commented May 28, 2026

Contributor

Summary

Port of `src/semble/stats.py` → `src/stats.ts`. Adds the token-savings telemetry that records each `search` / `find_related` call to `~/.csp/savings.jsonl` (note: `.csp/`, not `.semble/`) and formats the ASCII bar-chart report shown by `csp savings`.

Unit 13 of the parallel port effort (MinishLab/semble → @pleaseai/csp).

What's exported

  • `BucketStats` class — `calls`, `snippetChars`, `fileChars`, `savedChars`; `add(snippet, file)` updates fields with `savedChars += max(0, file - snippet)`.
  • `SavingsSummary` interface — `{ buckets, callTypeCounts }`.
  • `saveSearchStats(results, callType, fileSizes)` — appends a JSONL line (snake_case fields per Python: `ts`, `call`, `results`, `snippet_chars`, `file_chars`). Wrapped in try/catch — never throws on I/O failure.
  • `buildSavingsSummary(path?)` — reads JSONL, skips malformed lines, fills `Today` / `Last 7 days` / `All time` buckets using UTC date math.
  • `formatSavingsReport({ path?, verbose? })` — ASCII bar chart with header `" Csp Token Savings"` (was `" Semble Token Savings"` upstream). Bar width 16, ratio = `saved / file`, token formatting `~123` / `~1.5k` / `~2.3M`. `verbose: true` adds the `Usage Breakdown` section.
  • `setStatsFile(path)` / `getStatsFile()` / `resetStatsFile()` — test seam for overriding `~/.csp/savings.jsonl`.
  • Minimal local types `CallType`, `StatsChunk`, `StatsSearchResult` to avoid a cross-unit dependency before `src/types.ts` lands.
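
A minimal sketch of the `BucketStats` accumulator as described in the list above. The field names and the `savedChars` clamp come from this description; incrementing `calls` by one per `add` is an assumption, and the actual `src/stats.ts` may differ in detail.

```typescript
// Sketch only: mirrors the fields listed above, not the actual src/stats.ts.
class BucketStats {
  calls = 0;
  snippetChars = 0;
  fileChars = 0;
  savedChars = 0;

  add(snippet: number, file: number): void {
    this.calls += 1; // assumption: one add() per recorded call
    this.snippetChars += snippet;
    this.fileChars += file;
    // Clamp per call so a snippet longer than its file never reduces savings.
    this.savedChars += Math.max(0, file - snippet);
  }
}

const bucket = new BucketStats();
bucket.add(100, 400);
bucket.add(100, 400);
// bucket.savedChars is now 600 and savedChars / fileChars is 0.75
```

Because the clamp is applied per call rather than on the running totals, one oversized snippet cannot cancel savings recorded by earlier calls.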

Notable porting choices

  • `Date.now() / 1000` mirrors Python's `datetime.now(timezone.utc).timestamp()` (float seconds).
  • UTC date string `YYYY-MM-DD` is used for the today / last-7-days comparison — matches Python's `dt.date() == today` / `dt.date() > seven_days_ago` (exclusive lower bound).
  • Bucket-iteration order is preserved via object-literal insertion order (`Today`, `Last 7 days`, `All time`).
  • Malformed JSONL lines are skipped silently (semble logs a warning; we omit the log to keep stats imports side-effect-free — same observable behavior).
  • File-chars dedupe by unique `filePath` set matches the Python set comprehension exactly.
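
The UTC date comparison above can be sketched as follows. `bucketsFor` is a hypothetical helper introduced only for illustration, and this `ymdUtc` uses `toISOString`, which is an assumption about the implementation rather than what `src/stats.ts` necessarily does.

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Format a float-seconds timestamp as a UTC YYYY-MM-DD string.
// Assumes a finite timestamp; toISOString throws on an invalid Date.
function ymdUtc(tsSeconds: number): string {
  return new Date(tsSeconds * 1000).toISOString().slice(0, 10);
}

// Hypothetical helper: names of the buckets a record falls into.
function bucketsFor(tsSeconds: number, nowSeconds: number): string[] {
  const day = ymdUtc(tsSeconds);
  const today = ymdUtc(nowSeconds);
  const sevenDaysAgo = ymdUtc(nowSeconds - 7 * SECONDS_PER_DAY);
  const names: string[] = [];
  if (day === today) names.push('Today');
  // Zero-padded ISO dates compare correctly as strings.
  // The bound is exclusive: a record dated exactly seven days ago is left out.
  if (day > sevenDaysAgo) names.push('Last 7 days');
  names.push('All time');
  return names;
}

const now = Date.UTC(2026, 4, 28, 12, 0, 0) / 1000; // 2026-05-28T12:00:00Z
const sameDay = bucketsFor(now, now);
const sixDaysBack = bucketsFor(now - 6 * SECONDS_PER_DAY, now);
const sevenDaysBack = bucketsFor(now - 7 * SECONDS_PER_DAY, now);
```

Here `sameDay` lands in all three buckets, `sixDaysBack` in `Last 7 days` and `All time`, and `sevenDaysBack` only in `All time`.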

Tests (17 pass)

`bun test src/stats.test.ts` — temp-dir isolation via `mkdtempSync` + `setStatsFile`. Covers:

  • `BucketStats.add` accumulation and `>= 0` clamping
  • `saveSearchStats`: one line per call, two calls → two lines, path dedupe, missing-file-size handling, no-throw on I/O failure
  • `buildSavingsSummary`: parses valid lines, skips malformed, bucket math (`snippet=100, file=400` × 2 → `savedChars=600`, ratio `0.75`), older entries outside Today / Last 7
  • `formatSavingsReport`: `"Csp Token Savings"` header (not Semble), missing file → `"No stats yet. Run a search first."`, `~1.5k` and `~1.0M` formatting, verbose Usage Breakdown sorted alphabetically, 75% bar rendering
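
For reference, a rough sketch of the token formatting and bar rendering that these tests exercise. The 1,000 and 1,000,000 thresholds, the rounding mode, and the `#` / `.` glyphs are assumptions chosen to reproduce the `~123` / `~1.5k` / `~1.0M` examples and a 75% bar at width 16; the real report may use different characters or rounding.

```typescript
// Assumed thresholds: below 1000 print as-is, then one decimal with k / M.
function formatTokens(tokens: number): string {
  if (tokens >= 1e6) return `~${(tokens / 1e6).toFixed(1)}M`;
  if (tokens >= 1e3) return `~${(tokens / 1e3).toFixed(1)}k`;
  return `~${tokens}`;
}

// ratio = saved / file, drawn on a fixed-width bar (glyphs are placeholders).
function renderBar(savedChars: number, fileChars: number, width = 16): string {
  const ratio = fileChars > 0 ? savedChars / fileChars : 0;
  const clampedRatio = Math.min(1, Math.max(0, ratio));
  const filled = Math.round(clampedRatio * width);
  return '#'.repeat(filled) + '.'.repeat(width - filled);
}

// savedChars = 600 of fileChars = 800 gives ratio 0.75, so 12 of 16 cells.
const bar = renderBar(600, 800);
// Token approximation: roughly one token per four characters.
const label = formatTokens(Math.floor(600 / 4));
```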

Out of scope

  • `src/types.ts` (separate unit) — once it lands, `CallType` here should be replaced by a re-export.
  • CLI wiring (`csp savings`) — separate unit.

Summary by cubic

Ports token-savings telemetry from Semble to TypeScript to track search and find_related calls and generate a console report. Writes JSONL records to ~/.csp/savings.jsonl and formats an ASCII report that mirrors Semble’s output.

  • New Features

    • saveSearchStats(results, callType, fileSizes): writes JSONL lines (ts, call, results, snippet_chars, file_chars), dedupes by filePath, and never throws on I/O.
    • buildSavingsSummary(path?): aggregates Today / Last 7 days / All time using UTC; skips malformed lines.
    • formatSavingsReport({ path?, verbose? }): prints "Csp Token Savings" bar chart (width 16), shows saved tokens (~N, ~Nk, ~NM), optional usage breakdown.
    • BucketStats with add(snippet, file) and savedChars = max(0, file - snippet).
    • setStatsFile / getStatsFile / resetStatsFile to override ~/.csp/savings.jsonl.
    • Minimal local types: CallType, StatsSearchResult.
  • Bug Fixes

    • Reject NaN numeric fields in JSONL records to prevent corrupted date formatting and bucket math.
    • Initialize callTypeCounts with Object.create(null) to avoid collisions when call types match built-in properties (e.g., toString, __proto__).

Written for commit 1c244f1. Summary will update on new commits.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request ports token-savings tracking from Python to TypeScript, introducing src/stats.ts and its corresponding test suite in src/stats.test.ts. The reviewer provided several constructive suggestions to improve robustness and prevent runtime exceptions, including using optional chaining for nested properties, adding a guard for fileSizes, enhancing the isStatsRecord type guard to check for NaN, and using Object.create(null) to avoid prototype pollution.

Comment thread src/stats.ts
Comment thread src/stats.ts
Comment thread src/stats.ts
Comment thread src/stats.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment


No issues found across 2 files

Architecture diagram
sequenceDiagram
    participant CLI as CLI (csp savings)
    participant Stats as stats.ts (Stats Module)
    participant FS as File System (~/.csp/savings.jsonl)
    participant Search as Search Engine (search/find_related)

    Note over Search,Stats: NEW: Telemetry Recording Flow

    Search->>Stats: saveSearchStats(results, callType, fileSizes)
    Stats->>Stats: Calculate snippet_chars (sum of result content lengths)
    Stats->>Stats: Deduplicate filePaths for file_chars
    alt fileSizes has path for unique path
        Stats->>Stats: Sum fileSizes[p] for file_chars
    else path missing from fileSizes
        Stats->>Stats: Skip path (0 contribution to file_chars)
    end
    Stats->>Stats: Build JSON record (ts, call, results, snippet_chars, file_chars)
    Stats->>FS: mkdirSync(dir, { recursive: true }) - ensure directory exists
    alt I/O success
        Stats->>FS: appendFileSync(statsFile, JSON record + newline)
    else I/O error (ENOENT, EACCES, etc.)
        Stats->>Stats: catch block silently swallows error (no throw)
    end

    Note over CLI,Stats: NEW: Report Generation Flow

    CLI->>Stats: formatSavingsReport({ path?, verbose? })
    alt Stats file does not exist
        Stats-->>CLI: Return "No stats yet. Run a search first."
    else Stats file exists
        Stats->>FS: readFileSync(target, 'utf8')
        FS-->>Stats: raw JSONL content
        loop For each line in file
            alt Line is empty
                Stats->>Stats: Skip line
            else Line is malformed JSON
                Stats->>Stats: JSON.parse throws → silently skip (equivalent to upstream warning)
            else Valid JSON but not StatsRecord
                Stats->>Stats: Skip (isStatsRecord returns false)
            else Valid StatsRecord
                Stats->>Stats: ymdUtc(record.ts) - get UTC date string
                Stats->>Stats: callTypeCounts[callType]++ (per-call-type count)
                alt Today bucket (day === today UTC)
                    Stats->>Stats: buckets['Today'].add(snippet_chars, file_chars)
                end
                alt Last 7 days bucket (day > sevenDaysAgo UTC)
                    Stats->>Stats: buckets['Last 7 days'].add(snippet_chars, file_chars)
                end
                alt All time bucket
                    Stats->>Stats: buckets['All time'].add(snippet_chars, file_chars)
                end
            end
        end
        Stats->>Stats: Build ASCII report with bar chart (barWidth=16)
        Note over Stats: Header: "Csp Token Savings" (not "Semble Token Savings")
        alt verbose === true and callTypeEntries exist
            Stats->>Stats: Append Usage Breakdown section (sorted alphabetically)
        end
        Stats-->>CLI: Return formatted report (multi-line string)
    end

    Note over Stats: BucketStats.add() Logic
    Note over Stats: savedChars = max(0, fileChars - snippetChars) [no negative]
    Note over Stats: Token approximation: savedTokens = floor(savedChars / 4)

- Reject NaN in isStatsRecord type guard. typeof NaN === 'number' would
  otherwise let malformed lines through and propagate NaN into date
  formatting ('NaN-NaN-NaN') and bucket arithmetic.
- Initialize callTypeCounts with Object.create(null) so JSONL call values
  matching built-in object properties (toString, __proto__) don't collide
  with prototype methods.
- Add tests covering NaN rejection and call-type/prototype collision.
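
A sketch of the two fixes in this commit, under stated assumptions: the `StatsRecord` shape is inferred from the JSONL field names, `isFiniteishNumber` and `countCalls` are hypothetical helpers, and the check on `results` is an assumption (the commit only mentions `ts`, `snippet_chars`, and `file_chars`).

```typescript
interface StatsRecord {
  ts: number;
  call: string;
  results: number;
  snippet_chars: number;
  file_chars: number;
}

// typeof NaN === 'number', so the typeof check alone is not enough.
function isFiniteishNumber(v: unknown): v is number {
  return typeof v === 'number' && !Number.isNaN(v);
}

function isStatsRecord(v: unknown): v is StatsRecord {
  if (typeof v !== 'object' || v === null) return false;
  const r = v as Record<string, unknown>;
  return (
    isFiniteishNumber(r['ts']) &&
    typeof r['call'] === 'string' &&
    typeof r['results'] === 'number' && // assumption: not named in the commit
    isFiniteishNumber(r['snippet_chars']) &&
    isFiniteishNumber(r['file_chars'])
  );
}

// Hypothetical helper: per-call-type counts on a prototype-free object, so
// keys such as "toString" or "__proto__" behave like ordinary own properties.
function countCalls(records: StatsRecord[]): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (const rec of records) {
    counts[rec.call] = (counts[rec.call] ?? 0) + 1;
  }
  return counts;
}
```

With a plain `{}` instead, `counts['toString']` would initially resolve to the inherited method rather than `undefined`, and the increment would produce a string instead of a count.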

@amondnet amondnet left a comment

Contributor Author


Applied 2 of 4 gemini-code-assist suggestions, deferred 2.

Applied (commit 1c244f1):

  • isStatsRecord rejects NaN for ts / snippet_chars / file_chars — without this, malformed lines with NaN would pass the typeof === 'number' check and propagate NaN into ymdUtc ("NaN-NaN-NaN") and bucket math.
  • callTypeCounts now uses Object.create(null) to prevent collisions with built-in object properties when a JSONL call value happens to match e.g. toString or __proto__.
  • Added two regression tests covering both behaviors.

Deferred (parity / type-contract preservation):

  • Optional chaining on r.chunk.content / r.chunk.filePath: StatsChunk declares both fields as required, and the outer try/catch already guarantees stats writes never throw (covered by the 'never throws on I/O error' test). Adding ?. would imply a runtime nullability that contradicts the public type contract, and upstream stats.py doesn't guard these accesses either.
  • Explicit guard for fileSizes being null/undefined — same reasoning: typed as required Record<string, number>, and try/catch already protects the write.
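
To illustrate the reasoning behind the deferral, here is a sketch of how an outer try/catch covers both I/O failures and unexpectedly malformed inputs without optional chaining. The type shapes, `buildRecord`, and `saveWith` (which takes an injected `append` callback in place of the real file write) are hypothetical stand-ins, not the code in `src/stats.ts`.

```typescript
// Assumed shapes for the local types named in the PR description.
interface StatsChunk { content: string; filePath: string; }
interface StatsSearchResult { chunk: StatsChunk; }
type CallType = 'search' | 'find_related';

// Hypothetical pure helper: builds the snake_case JSONL record.
function buildRecord(
  results: StatsSearchResult[],
  callType: CallType,
  fileSizes: Record<string, number>,
  nowMs: number,
) {
  const snippetChars = results.reduce((sum, r) => sum + r.chunk.content.length, 0);
  // Count each file once, however many chunks came from it.
  const uniquePaths = new Set(results.map((r) => r.chunk.filePath));
  let fileChars = 0;
  uniquePaths.forEach((p) => {
    fileChars += fileSizes[p] ?? 0; // a path missing from fileSizes contributes 0
  });
  return {
    ts: nowMs / 1000,
    call: callType,
    results: results.length, // assumption: number of results returned
    snippet_chars: snippetChars,
    file_chars: fileChars,
  };
}

// Hypothetical wrapper: `append` stands in for the real append-to-file call.
function saveWith(
  append: (line: string) => void,
  results: StatsSearchResult[],
  callType: CallType,
  fileSizes: Record<string, number>,
): void {
  try {
    append(JSON.stringify(buildRecord(results, callType, fileSizes, Date.now())) + '\n');
  } catch {
    // Telemetry must never break a search: swallow both I/O and shape errors.
  }
}
```

Since record construction happens inside the `try`, a result whose `chunk` is missing at runtime is dropped the same way a failed write is, which is why the extra guards add no observable protection here.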

@amondnet amondnet self-assigned this May 28, 2026
@amondnet amondnet merged commit 2585b6b into main May 28, 2026
1 check passed
@amondnet amondnet deleted the feat/unit-13-stats branch May 28, 2026 16:06
This was referenced Jun 18, 2026