
feat: [ENG-1889] Implement FileQueryLogStore with Zod validation#354

Merged
bao-byterover merged 2 commits into proj/query-log from feat/ENG-1889
Apr 10, 2026

Conversation

@bao-byterover
Collaborator

No description provided.

@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Claude finished @bao-byterover's task in 3m 49s


PR Review: feat: [ENG-1889] Implement FileQueryLogStore with Zod validation

  • Gather context and read changed files
  • Review src/server/constants.ts
  • Review src/server/infra/storage/file-query-log-store.ts
  • Review test/integration/infra/storage/file-query-log-store.test.ts
  • Post inline comments
  • Post summary with overall score

Summary

Overall score: 4 / 5

The implementation is solid. Atomic writes (tmp → rename), Zod-validated reads, silent-null on corruption, and path-traversal prevention via ID_PATTERN are all done correctly. Test coverage is comprehensive — happy paths, Zod failure branches, age/count pruning, stale recovery, and atomic write cleanliness are all exercised.

Issues requiring attention

| Severity | Location | Topic |
| --- | --- | --- |
| issue | file-query-log-store.ts:28 | QueryLogTierSchema hardcodes tier values instead of deriving from QUERY_LOG_TIERS — schema will drift silently |
| issue | file-query-log-store.ts:258 | resolveStale calls this.save(), which cascades into pruneOldest on every stale read — this compounds badly under bulk stale reads |
| issue | file-query-log-store.ts:146 | Filtered list() always reads every file on disk — O(n) even for limit: 1 with a status filter |

Suggestions / nitpicks

  • file-query-log-store.ts:115 — getNextId monotonicity is instance-scoped; worth documenting the "single-instance" assumption
  • file-query-log-store.ts:198 — Prune fires after every save; a simple in-flight guard would prevent redundant concurrent prune passes under write bursts
  • file-query-log-store.test.ts:62 — 50 ms settle in afterEach can be flaky under CI load
  • file-query-log-store.test.ts:349 — Age-pruning test bypasses getNextId and encodes the filename format assumption implicitly; a brief comment would help future readers
  • file-query-log-store.test.ts:364 — Count pruning test data has startedAt inverted relative to ID order; a comment prevents accidental "fix" of the assertions

```typescript
  durationMs: z.number(),
})

const QueryLogTierSchema = z.union([z.literal(0), z.literal(1), z.literal(2), z.literal(3), z.literal(4)])
```

issue (correctness): QueryLogTierSchema duplicates the domain QUERY_LOG_TIERS array and will drift

QUERY_LOG_TIERS in query-log-entry.ts is declared as the single source of truth. This hardcoded union will silently become stale when a tier is added or removed.

Derive the schema from the imported const instead:

Suggested change:

```diff
-const QueryLogTierSchema = z.union([z.literal(0), z.literal(1), z.literal(2), z.literal(3), z.literal(4)])
+const QueryLogTierSchema = z.union(
+  QUERY_LOG_TIERS.map((t) => z.literal(t)) as [
+    z.ZodLiteral<0>,
+    z.ZodLiteral<1>,
+    z.ZodLiteral<2>,
+    z.ZodLiteral<3>,
+    z.ZodLiteral<4>,
+  ],
+)
```

You'd also need to add QUERY_LOG_TIERS to the import from the entity.


```typescript
  return []
}

const hasFilters = Boolean(status?.length || tier?.length || after !== undefined || before !== undefined)
```

issue (correctness): Limit is not applied when filters are present, causing full scan even for limit: 1

When any filter is active, filesToRead = files (all files on disk). A query like list({ status: ['processing'], limit: 1 }) still reads every entry. For a long-running daemon with thousands of log entries this becomes very slow.

A partial improvement: if limit and only time-range filters (after/before) are present, you can still pre-slice by filename timestamp before I/O. For status/tier filters a full scan is unavoidable, but capping reads at Math.min(files.length, MAX_SAFE_SCAN) would bound the worst case.

At minimum, the behaviour should be documented in the interface so callers know that filtered calls are always O(n).
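The bounded, early-terminating scan suggested above could look roughly like this. Everything here is a sketch: `Entry`, `loadEntry`, and `MAX_SAFE_SCAN` are illustrative names, and file I/O is shown synchronously for brevity:

```typescript
// Sketch of a capped, early-terminating filtered list.
type Entry = { id: string; status: string };

const MAX_SAFE_SCAN = 1000; // bound the worst case when filters are selective

function listFiltered(
  files: string[],                           // directory listing, sorted newest-first
  loadEntry: (file: string) => Entry | null, // real store does file I/O; null on corrupt JSON
  predicate: (e: Entry) => boolean,          // status/tier/time-range filter
  limit: number,
): Entry[] {
  const out: Entry[] = [];
  const cap = Math.min(files.length, MAX_SAFE_SCAN);
  // Read one file at a time and stop as soon as `limit` matches are found,
  // instead of reading every file and filtering afterwards.
  for (let i = 0; i < cap && out.length < limit; i++) {
    const entry = loadEntry(files[i]);
    if (entry && predicate(entry)) out.push(entry);
  }
  return out;
}
```

With this shape, `list({ status: ['processing'], limit: 1 })` stops after the first match near the head of the listing rather than scanning the whole directory.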

```typescript
  throw error
}

this.pruneOldest().catch(() => {})
```

suggestion: prune fire-and-forget could be debounced / guarded

pruneOldest is a costly readdir + stat of every log file. Calling it after every save means concurrent writes each spawn a separate prune pass. Under high write rates (e.g. agentic query burst) this creates redundant I/O.

Consider: guard with an in-flight flag, or only prune when the count crosses a threshold (e.g. files.length > maxEntries * 1.2) to avoid prune-on-every-write once the store is stable.
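An in-flight guard of the kind suggested here could be sketched as follows. `PruneGuard` and `doPrune` are hypothetical names for illustration, not the PR's code:

```typescript
// Sketch of an in-flight guard that collapses concurrent prune requests
// into a single pass. `doPrune` stands in for the real readdir + stat + unlink.
class PruneGuard {
  private inFlight: Promise<void> | null = null;

  constructor(private doPrune: () => Promise<void>) {}

  prune(): Promise<void> {
    // If a pass is already running, return it instead of spawning another.
    if (!this.inFlight) {
      this.inFlight = this.doPrune().finally(() => {
        this.inFlight = null; // allow the next prune once this pass settles
      });
    }
    return this.inFlight;
  }
}
```

Under a write burst, N concurrent `save()` calls then trigger at most one prune pass instead of N redundant ones.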

```typescript
  status: 'error',
}

this.save(recovered).catch(() => {})
```

issue (correctness): resolveStale calling this.save() cascades into pruneOldest on every stale read

resolveStale → this.save(recovered) → pruneOldest() (fire-and-forget). If a consumer calls list() and 20 entries are stale, that triggers 20 concurrent prune passes. The prune itself already fires from save; the stale-recovery path doesn't need the prune overhead.

Write the recovered entry directly (or extract a saveRaw helper that skips the prune) to keep the side-effects of a read minimal:

```typescript
// instead of this.save(recovered).catch(() => {})
writeFile(this.entryPath(recovered.id), JSON.stringify(recovered, null, 2), 'utf8').catch(() => {})
```


```typescript
 * Generate the next monotonic log entry ID in the format `qry-{timestamp_ms}`.
 * Guaranteed to increase even if called multiple times in the same millisecond.
 */
async getNextId(): Promise<string> {
```

nitpick: getNextId monotonicity is instance-scoped

lastTimestamp lives on the instance, so two FileQueryLogStore instances pointing at the same baseDir could race and emit the same ID in the same millisecond. In practice this is unlikely (single daemon, single store), but the class's doc comment should note this assumption so it isn't misused in tests or refactors.
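The instance-scoped scheme described here can be sketched as below; the class name `IdGenerator` is illustrative, but the `qry-{timestamp_ms}` format and the same-millisecond bump follow the doc comment quoted above:

```typescript
// Sketch of an instance-scoped monotonic ID generator.
// If two calls land in the same millisecond, the timestamp is bumped by 1 ms
// so IDs still sort strictly increasing -- but only within ONE instance.
// Two instances sharing a baseDir can still collide, which is the nitpick.
class IdGenerator {
  private lastTimestamp = 0;

  next(): string {
    let ts = Date.now();
    if (ts <= this.lastTimestamp) ts = this.lastTimestamp + 1;
    this.lastTimestamp = ts;
    return `qry-${ts}`;
  }
}
```

Because `lastTimestamp` is per-instance state, nothing prevents a second store instance from observing the same wall-clock millisecond and emitting a duplicate ID, hence the request to document the single-instance assumption.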

```typescript
// Use IDs with old timestamps to match age-based pruning (which checks filename timestamp)
const oldTs = Date.now() - 10 * 86_400_000
const recentTs = Date.now() - 1 * 86_400_000
const idOld = `qry-${oldTs}`
```

thought: directly constructing IDs couples the test to the filename encoding

The age-pruning test sidesteps getNextId and builds IDs from raw timestamps (qry-${oldTs}). This works because pruning parses the timestamp back out of the filename. If the ID format ever changes (e.g. a prefix change or format bump) this test would silently stop exercising age pruning while still passing.

A @internal comment explaining why direct construction is intentional here (to control filename timestamps for age-testing) would help future readers distinguish this from accidental bypass of getNextId.


```typescript
afterEach(async () => {
  // Allow async fire-and-forget writes (resolveStale, prune) to settle
  await new Promise((resolve) => {
```

nitpick: 50 ms settle delay is fragile under CI load

The setTimeout(resolve, 50) waits for fire-and-forget async writes (stale recovery, prune) to complete before rm. Under high CPU contention on CI this can be too short, causing flaky ENOENT errors on cleanup or stale files being deleted before assertions run in the next test.

Consider 100 ms here, or — better — inject a clock/prune hook in tests that need to observe side-effects deterministically (see the stale processing recovery suite which does its own 50 ms wait inside the test).
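The deterministic alternative mentioned here could take the shape of a tracker the store routes its fire-and-forget promises through, so tests await real completion instead of a fixed delay. `PendingTracker`, `track`, and `settle` are hypothetical helpers, not part of the PR:

```typescript
// Sketch of a deterministic replacement for a fixed settle delay:
// the store registers every fire-and-forget promise, and tests await
// `settle()` in afterEach instead of sleeping 50 ms.
class PendingTracker {
  private pending = new Set<Promise<unknown>>();

  track<T>(p: Promise<T>): Promise<T> {
    this.pending.add(p);
    // Remove the promise once it settles; swallow errors like the store does.
    p.catch(() => {}).finally(() => this.pending.delete(p));
    return p;
  }

  // Await every in-flight fire-and-forget task, including ones added mid-wait.
  async settle(): Promise<void> {
    while (this.pending.size > 0) {
      await Promise.allSettled([...this.pending]);
    }
  }
}
```

With this, cleanup waits exactly as long as the outstanding work takes, so CI load can no longer turn the settle window into a flake.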

```typescript
})

// Test 19
it('should apply count limit after age-based pruning', async () => {
```

issue (correctness): pruning test assertion order is inverted relative to startedAt

saveEntries with (i) => ({startedAt: now - i * 1000}) makes ids[0] the newest by startedAt (now) but the oldest by ID/filename (now - 0 → smallest ID). Pruning removes by filename order (oldest first), so ids[0] and ids[1] are pruned — which is what the test asserts.

However, this is counterintuitive: the entry with the highest startedAt value gets deleted. A comment explaining that count-based pruning is filename-order (not startedAt-order) would prevent future maintainers from accidentally "fixing" the assertion order.

@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Claude finished @bao-byterover's task in 2m 32s


PR Review: feat: [ENG-1889] Implement FileQueryLogStore with Zod validation

  • Gather context
  • Read changed files
  • Review src/server/constants.ts
  • Review src/server/infra/storage/file-query-log-store.ts
  • Review test/integration/infra/storage/file-query-log-store.test.ts
  • Post inline comments
  • Post summary

Summary

Overall score: 5 / 5

All three issues and all five suggestions from the first review have been addressed cleanly in this commit. The implementation is production-ready.

What was fixed

| Previous finding | Resolution |
| --- | --- |
| QueryLogTierSchema hardcoded tiers (would drift) | ✅ Now uses QUERY_LOG_TIER_SET derived from QUERY_LOG_TIERS at runtime |
| resolveStale → save → pruneOldest cascade on stale reads | ✅ Now calls writeAtomic directly; a comment explains the decision |
| list() always O(n) even for limit: 1 with a status filter | ✅ Sequential reads with early termination once limit matches are found |
| getNextId instance-scoped monotonicity undocumented | ✅ Full JSDoc added explaining the single-instance assumption and why it holds |
| No concurrency guard on pruneOldest | ✅ pruneInFlight flag deduplicates concurrent prune calls |
| 50 ms settle delay in afterEach was flaky | ✅ Increased to 100 ms |
| Age-pruning test bypassed getNextId silently | ✅ Detailed @internal comment explains why direct ID construction is intentional |
| Count-pruning test had inverted startedAt without explanation | ✅ Comment documents that pruning is filename-order, not startedAt-order |

Remaining nitpicks (non-blocking)

  • file-query-log-store.ts:35 — z.custom<QueryLogTier> works correctly but uses a type assertion rather than a structurally-inferred type; z.union + z.literal is the idiomatic Zod alternative (see inline comment)
  • file-query-log-store.ts:138 — JSDoc overstates "O(matches) rather than O(total entries)"; worst case is still O(n) when filters are selective (see inline comment with suggested fix)

```typescript
const QueryLogTierSchema = z.custom<QueryLogTier>(
  (val) => QUERY_LOG_TIER_SET.has(val),
  {message: 'Invalid query log tier'},
)
```

suggestion: z.custom loses Zod's built-in enum features

The z.custom<QueryLogTier> approach is functionally correct and solves the drift problem, but the explicit <QueryLogTier> type parameter is an assertion (a claim), not a proof — Zod can't verify the return type of the custom predicate matches QueryLogTier. If, say, QUERY_LOG_TIERS were widened to include a string tier in the future, the schema would silently fail to narrow correctly.

The previous suggestion's shape — z.union(QUERY_LOG_TIERS.map(t => z.literal(t))) — is structurally safer because the return type is inferred rather than asserted, and Zod can enumerate valid values in error messages. The cast dance (as [z.ZodLiteral<0>, ...]) is the awkward part; an alternative that avoids it:

```typescript
const QueryLogTierSchema = z.union(
  QUERY_LOG_TIERS.map((t) => z.literal(t)) as [
    ReturnType<typeof z.literal<(typeof QUERY_LOG_TIERS)[number]>>,
    ...ReturnType<typeof z.literal<(typeof QUERY_LOG_TIERS)[number]>>[]
  ]
)
```

Or simply use z.enum with a string cast if tiers were strings. For numeric literals z.union + z.literal is the idiomatic Zod pattern. That said, the current approach works correctly for this codebase — just noting the trade-off.

```typescript
 * List entries sorted newest-first (by timestamp embedded in filename).
 * Filters (status, tier, after, before) are applied before limit.
 * Reads stop early once `limit` matches are found, so filtered queries with small limits
 * are O(matches) rather than O(total entries). Skips corrupt entries silently.
```

nitpick: docstring overstates the O(matches) claim

Reads stop early once limit matches are found, so filtered queries with small limits are O(matches) rather than O(total entries).

This is only true when there are enough matching entries near the head of the file list. In the worst case — list({status: ['error'], limit: 1}) with zero error entries — every file is still read. The actual bound is O(files scanned until limit matching entries are found), which degrades to O(n) when filters are selective.

Suggested rewrite:

Suggested change:

```diff
-* are O(matches) rather than O(total entries). Skips corrupt entries silently.
+* Reads stop as soon as `limit` matching entries are found, so filtered queries terminate
+* early when matches appear near the newest files. Worst-case is still O(total entries)
+* when filters are selective or matches are sparse. Skips corrupt entries silently.
```

@bao-byterover bao-byterover merged commit a2c800e into proj/query-log Apr 10, 2026
6 checks passed
@bao-byterover bao-byterover deleted the feat/ENG-1889 branch April 11, 2026 16:34
bao-byterover added a commit that referenced this pull request Apr 16, 2026
* feat: [ENG-1897] Create brv query-log view oclif command (#347)

* feat: [ENG-1897] Create brv query-log view oclif command

* feat: [ENG-1897] fix review

* feat: [ENG-1896] Implement QueryLogUseCase with list and detail views (#349)

* feat: [ENG-1896] Implement QueryLogUseCase with list and detail views

* feat: [ENG-1896] Fix review

* feat: [ENG-1888] Define IQueryLogStore interface (#351)

* feat: [ENG-1887] Define QueryLogEntry entity with discriminated union (#353)

* feat: [ENG-1889] Implement FileQueryLogStore with Zod validation (#354)

* feat: [ENG-1889] Implement FileQueryLogStore with Zod validation

* feat: [ENG-1889] fix review

* feat: [ENG-1899] Create brv query-log summary oclif command (#357)

* feat: [ENG-1899] Create brv query-log summary oclif command

* feat: [ENG-1899] fix review

* feat: [ENG-1898] Implement QueryLogSummaryUseCase (#368)

* feat: [ENG-1892] Add QueryExecutorResult type with tier and timing metadata (#369)

* feat: [ENG-1893] Wire QueryLogHandler into daemon lifecycle (#370)

* feat: [ENG-1894] Wire QueryLogHandler into daemon lifecycle (#371)

* feat: [ENG-1894] enhance message output

* feat: [proj/query-log] fix review

* feat: [ENG-2123] brv curate view / brv query-log view truncate long c… (#424)

* feat: [ENG-2123] brv curate view / brv query-log view truncate long content

* feat: [ENG-2123] fix review

* feat: [ENG-2177] Increase Storage Limit for Curate Log and Query Log (#430)

* feat: [ENG-2177] Increase Storage Limit for Curate Log and Query Log

* feat: [ENG-2177] fix review

* feat: [ENG-2177] fix slow test

---------

Co-authored-by: Hoang Pham <lehoangpham1092@gmail.com>
