feat(stats): port savings.jsonl telemetry from semble#11
Merged
Conversation
Code Review
This pull request ports token-savings tracking from Python to TypeScript, introducing src/stats.ts and its corresponding test suite in src/stats.test.ts. The reviewer provided several constructive suggestions to improve robustness and prevent runtime exceptions, including using optional chaining for nested properties, adding a guard for fileSizes, enhancing the isStatsRecord type guard to check for NaN, and using Object.create(null) to avoid prototype pollution.
No issues found across 2 files
Architecture diagram
sequenceDiagram
participant CLI as CLI (csp savings)
participant Stats as stats.ts (Stats Module)
participant FS as File System (~/.csp/savings.jsonl)
participant Search as Search Engine (search/find_related)
Note over Search,Stats: NEW: Telemetry Recording Flow
Search->>Stats: saveSearchStats(results, callType, fileSizes)
Stats->>Stats: Calculate snippet_chars (sum of result content lengths)
Stats->>Stats: Deduplicate filePaths for file_chars
alt fileSizes has path for unique path
Stats->>Stats: Sum fileSizes[p] for file_chars
else path missing from fileSizes
Stats->>Stats: Skip path (0 contribution to file_chars)
end
Stats->>Stats: Build JSON record (ts, call, results, snippet_chars, file_chars)
Stats->>FS: mkdirSync(dir, { recursive: true }) - ensure directory exists
alt I/O success
Stats->>FS: appendFileSync(statsFile, JSON record + newline)
else I/O error (ENOENT, EACCES, etc.)
Stats->>Stats: catch block silently swallows error (no throw)
end
Note over CLI,Stats: NEW: Report Generation Flow
CLI->>Stats: formatSavingsReport({ path?, verbose? })
alt Stats file does not exist
Stats-->>CLI: Return "No stats yet. Run a search first."
else Stats file exists
Stats->>FS: readFileSync(target, 'utf8')
FS-->>Stats: raw JSONL content
loop For each line in file
alt Line is empty
Stats->>Stats: Skip line
else Line is malformed JSON
Stats->>Stats: JSON.parse throws → silently skip (equivalent to upstream warning)
else Valid JSON but not StatsRecord
Stats->>Stats: Skip (isStatsRecord returns false)
else Valid StatsRecord
Stats->>Stats: ymdUtc(record.ts) - get UTC date string
Stats->>Stats: callTypeCounts[callType]++ (per-call-type count)
alt Today bucket (day === today UTC)
Stats->>Stats: buckets['Today'].add(snippet_chars, file_chars)
end
alt Last 7 days bucket (day > sevenDaysAgo UTC)
Stats->>Stats: buckets['Last 7 days'].add(snippet_chars, file_chars)
end
alt All time bucket
Stats->>Stats: buckets['All time'].add(snippet_chars, file_chars)
end
end
end
Stats->>Stats: Build ASCII report with bar chart (barWidth=16)
Note over Stats: Header: "Csp Token Savings" (not "Semble Token Savings")
alt verbose === true and callTypeEntries exist
Stats->>Stats: Append Usage Breakdown section (sorted alphabetically)
end
Stats-->>CLI: Return formatted report (multi-line string)
end
Note over Stats: BucketStats.add() Logic
Note over Stats: savedChars = max(0, fileChars - snippetChars) [no negative]
Note over Stats: Token approximation: savedTokens = floor(savedChars / 4)
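The `BucketStats.add()` notes that close the diagram can be sketched roughly as follows. This is a minimal illustration, not the class from `src/stats.ts`: the field names are hypothetical, and it assumes the clamp is applied to the running totals rather than per record, which the diagram does not specify.

```typescript
// Hypothetical sketch of the bucket arithmetic; the real class in src/stats.ts may differ.
class BucketStats {
  snippetChars = 0;
  fileChars = 0;

  // Accumulate one record's character counts.
  add(snippetChars: number, fileChars: number): void {
    this.snippetChars += snippetChars;
    this.fileChars += fileChars;
  }

  // Assumption: savings are clamped on the running totals, so they never go negative.
  get savedChars(): number {
    return Math.max(0, this.fileChars - this.snippetChars);
  }

  // Rough approximation from the diagram notes: about 4 characters per token.
  get savedTokens(): number {
    return Math.floor(this.savedChars / 4);
  }
}

const bucket = new BucketStats();
bucket.add(200, 1000);
bucket.add(50, 10); // a record whose snippets exceed the recorded file size
console.log(bucket.savedChars, bucket.savedTokens); // 760 190
```

With the clamp in place, a bucket whose snippets outweigh its files reports zero savings instead of a negative number.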
- Reject NaN in isStatsRecord type guard. typeof NaN === 'number' would
otherwise let malformed lines through and propagate NaN into date
formatting ('NaN-NaN-NaN') and bucket arithmetic.
- Initialize callTypeCounts with Object.create(null) so JSONL call values
matching built-in object properties (toString, __proto__) don't collide
with prototype methods.
- Add tests covering NaN rejection and call-type/prototype collision.
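A minimal sketch of the NaN-rejecting guard described in this commit message. The field names come from the JSONL record in this PR; the field types, the `isRealNumber` helper, and the assumption that `ts` is Unix seconds are illustrative guesses rather than the actual code.

```typescript
// Field names follow the JSONL record in this PR; the exact types are assumed.
interface StatsRecord {
  ts: number;
  call: string;
  results: number;
  snippet_chars: number;
  file_chars: number;
}

// typeof NaN === "number", so the NaN check has to be explicit.
// (isRealNumber is a hypothetical helper name.)
function isRealNumber(value: unknown): value is number {
  return typeof value === "number" && !Number.isNaN(value);
}

function isStatsRecord(value: unknown): value is StatsRecord {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    isRealNumber(r["ts"]) &&
    typeof r["call"] === "string" &&
    typeof r["results"] === "number" &&
    isRealNumber(r["snippet_chars"]) &&
    isRealNumber(r["file_chars"])
  );
}

// Assumption: ts is Unix time in seconds, as Python's time.time() would produce.
function ymdUtc(ts: number): string {
  const d = new Date(ts * 1000);
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())}`;
}

const bad = { ts: NaN, call: "search", results: 1, snippet_chars: 10, file_chars: 100 };
console.log(isStatsRecord(bad)); // false
console.log(ymdUtc(NaN)); // "NaN-NaN-NaN", which is what the guard keeps out of the report
```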
amondnet
commented
May 28, 2026
Applied 2 of 4 gemini-code-assist suggestions, deferred 2.
Applied (commit 1c244f1):
- `isStatsRecord` rejects NaN for `ts` / `snippet_chars` / `file_chars`. Without this, malformed lines with NaN would pass the `typeof === 'number'` check and propagate NaN into `ymdUtc` ("NaN-NaN-NaN") and bucket math.
- `callTypeCounts` now uses `Object.create(null)` to prevent collisions with built-in object properties when a JSONL `call` value happens to match e.g. `toString` or `__proto__`.
- Added two regression tests covering both behaviors.
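To illustrate the collision that `Object.create(null)` avoids, here is a small standalone comparison. The function names are hypothetical and the tally loop is only an approximation of how `callTypeCounts` might be updated.

```typescript
// Tally call types into a plain object literal, which inherits from Object.prototype.
function tallyWithLiteral(calls: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const call of calls) counts[call] = (counts[call] ?? 0) + 1;
  return counts;
}

// Same tally on a prototype-less object, as the applied change describes.
function tallyWithNullProto(calls: string[]): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (const call of calls) counts[call] = (counts[call] ?? 0) + 1;
  return counts;
}

// Look up by a string-typed key so the index signature is used, not Object's own members.
function countOf(counts: Record<string, number>, key: string): number | undefined {
  return counts[key];
}

const calls = ["search", "toString", "toString", "__proto__"];
const unsafe = tallyWithLiteral(calls);
const safe = tallyWithNullProto(calls);

// With the literal, the inherited toString function is picked up and concatenated with 1.
console.log(typeof countOf(unsafe, "toString")); // "string"
// With a null prototype, both names are ordinary own keys.
console.log(countOf(safe, "toString"), countOf(safe, "__proto__")); // 2 1
```

In the literal version the `__proto__` entry is also lost entirely, because assigning a non-object value to the inherited `__proto__` setter is silently ignored.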
Deferred (parity / type-contract preservation):
- Optional chaining on `r.chunk.content` / `r.chunk.filePath`: `StatsChunk` declares both fields as required and the outer `try/catch` already guarantees stats writes never throw (covered by the 'never throws on I/O error' test). Adding `?.` would imply a runtime nullability that contradicts the public type contract, and upstream `stats.py` doesn't guard these accesses either.
- Explicit guard for `fileSizes` being null/undefined: same reasoning. It is typed as required `Record<string, number>`, and try/catch already protects the write.
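The reasoning behind the deferral can be sketched as below, assuming the outer `try/catch` wraps the character counting as well as the file I/O. `saveSearchStatsSketch` is a hypothetical variant that takes the target path as an explicit fourth argument so it runs standalone. The type shapes, the seconds-based `ts`, and `results` being a count are all assumptions.

```typescript
import { appendFileSync, mkdirSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Shapes assumed from the names in this thread; the real types may carry more fields.
interface StatsChunk { content: string; filePath: string }
interface StatsSearchResult { chunk: StatsChunk }

// Hypothetical variant that takes the target path explicitly.
function saveSearchStatsSketch(
  results: StatsSearchResult[],
  callType: string,
  fileSizes: Record<string, number>,
  statsFile: string,
): void {
  try {
    const snippetChars = results.reduce((sum, r) => sum + r.chunk.content.length, 0);
    const uniquePaths = new Set(results.map((r) => r.chunk.filePath));
    let fileChars = 0;
    for (const p of uniquePaths) {
      const size = fileSizes[p];
      // Paths missing from fileSizes contribute nothing.
      if (typeof size === "number") fileChars += size;
    }
    const record = {
      ts: Date.now() / 1000, // assumption: Unix seconds, matching Python's time.time()
      call: callType,
      results: results.length, // assumption: a result count
      snippet_chars: snippetChars,
      file_chars: fileChars,
    };
    mkdirSync(dirname(statsFile), { recursive: true });
    appendFileSync(statsFile, JSON.stringify(record) + "\n");
  } catch {
    // Telemetry is best-effort: a malformed result or an I/O error must not break the search.
  }
}

const dir = mkdtempSync(join(tmpdir(), "csp-stats-"));
const statsFile = join(dir, "nested", "savings.jsonl");
saveSearchStatsSketch(
  [
    { chunk: { content: "abc", filePath: "a.ts" } },
    { chunk: { content: "de", filePath: "a.ts" } },
    { chunk: { content: "f", filePath: "missing.ts" } },
  ],
  "search",
  { "a.ts": 100 },
  statsFile,
);
const written = JSON.parse(readFileSync(statsFile, "utf8").trim());
console.log(written.snippet_chars, written.file_chars); // 6 100
```

Under that assumption, a result that violates the type contract at runtime ends up in the same `catch` as an I/O failure. That is the trade-off the deferral accepts: the record is dropped rather than partially written.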
This was referenced Jun 18, 2026
Summary
Port of `src/semble/stats.py` → `src/stats.ts`. Adds the token-savings telemetry that records each `search` / `find_related` call to `~/.csp/savings.jsonl` (note: `.csp/`, not `.semble/`) and formats the ASCII bar-chart report shown by `csp savings`.
Unit 13 of the parallel port effort (MinishLab/semble → @pleaseai/csp).
What's exported
Notable porting choices
Tests (17 pass)
`bun test src/stats.test.ts` — temp-dir isolation via `mkdtempSync` + `setStatsFile`. Covers:
Out of scope
Summary by cubic
Ports token-savings telemetry from Semble to TypeScript to track `search` and `find_related` calls and generate a console report. Writes JSONL records to `~/.csp/savings.jsonl` and formats an ASCII report that mirrors Semble’s output.
New Features
- `saveSearchStats(results, callType, fileSizes)`: writes JSONL lines (`ts`, `call`, `results`, `snippet_chars`, `file_chars`), dedupes by `filePath`, and never throws on I/O.
- `buildSavingsSummary(path?)`: aggregates Today / Last 7 days / All time using UTC; skips malformed lines.
- `formatSavingsReport({ path?, verbose? })`: prints "Csp Token Savings" bar chart (width 16), shows saved tokens (`~N`, `~Nk`, `~NM`), optional usage breakdown.
- `BucketStats` with `add(snippet, file)` and `savedChars = max(0, file - snippet)`.
- `setStatsFile` / `getStatsFile` / `resetStatsFile` to override `~/.csp/savings.jsonl`.
- `CallType`, `StatsSearchResult`.
Bug Fixes
- Initializes `callTypeCounts` with `Object.create(null)` to avoid collisions when call types match built-in properties (e.g., `toString`, `__proto__`).
Written for commit 1c244f1. Summary will update on new commits.
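A rough sketch of the aggregation step summarized above: skip blank, malformed, and non-record lines, then bucket by UTC day. It is not the code from `src/stats.ts`. `summarize` is a hypothetical stand-in for `buildSavingsSummary` that takes the raw JSONL text and a clock value instead of a path. It assumes `ts` is Unix seconds and that `sevenDaysAgo` is the UTC date seven days before now.

```typescript
interface Rec { ts: number; snippet_chars: number; file_chars: number }
interface Totals { snippetChars: number; fileChars: number }

// Minimal record check for this sketch only (numbers present and not NaN).
function isRec(v: unknown): v is Rec {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return [r["ts"], r["snippet_chars"], r["file_chars"]].every(
    (n) => typeof n === "number" && !Number.isNaN(n),
  );
}

// Assumption: ts is Unix seconds; format as a UTC YYYY-MM-DD string.
function ymdUtc(ts: number): string {
  const d = new Date(ts * 1000);
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `${d.getUTCFullYear()}-${pad(d.getUTCMonth() + 1)}-${pad(d.getUTCDate())}`;
}

function summarize(jsonl: string, nowSeconds: number) {
  const today: Totals = { snippetChars: 0, fileChars: 0 };
  const last7Days: Totals = { snippetChars: 0, fileChars: 0 };
  const allTime: Totals = { snippetChars: 0, fileChars: 0 };
  const add = (t: Totals, r: Rec): void => {
    t.snippetChars += r.snippet_chars;
    t.fileChars += r.file_chars;
  };
  const todayYmd = ymdUtc(nowSeconds);
  const sevenDaysAgo = ymdUtc(nowSeconds - 7 * 86400);

  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // empty line
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // malformed JSON
    }
    if (!isRec(parsed)) continue; // valid JSON, wrong shape
    const day = ymdUtc(parsed.ts);
    if (day === todayYmd) add(today, parsed);
    if (day > sevenDaysAgo) add(last7Days, parsed); // YYYY-MM-DD strings sort chronologically
    add(allTime, parsed);
  }
  return { today, last7Days, allTime };
}

const now = 1_700_000_000; // 2023-11-14 UTC
const sample = [
  JSON.stringify({ ts: now, snippet_chars: 10, file_chars: 100 }),
  "",
  "{not json",
  JSON.stringify({ foo: 1 }),
  JSON.stringify({ ts: now - 3 * 86400, snippet_chars: 20, file_chars: 200 }),
  JSON.stringify({ ts: now - 7 * 86400, snippet_chars: 30, file_chars: 300 }),
  JSON.stringify({ ts: now - 30 * 86400, snippet_chars: 40, file_chars: 400 }),
].join("\n");
const summary = summarize(sample, now);
console.log(summary.today.fileChars, summary.last7Days.fileChars, summary.allTime.fileChars); // 100 300 1000
```

Because the comparison is a strict `day > sevenDaysAgo`, the record dated exactly seven days back counts only toward All time in this sketch.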