feat(F3): LanceAuditSink with temporal timestamps + full schema round-trip#302
Conversation
…writer (PR-F3) Implements LanceAuditSink behind the `audit-log` feature flag (now pulls lance + arrow + futures + tokio). Entries are buffered via append(), flushed to a Lance dataset in append mode, and read back via scan_back(n). Schema: tenant_id (Utf8), actor_id (Utf8), statement_hash (UInt64), timestamp (Int64), action (Utf8). Tests: flush-10 roundtrip, flush-1000 count, multi-flush accumulation, empty-flush noop. All 75 tests pass (4 new Lance tests + 71 existing). https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
…lance limit-pushdown
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 928c5d7f8e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let entries: Vec<AuditEntry> = { | ||
| let mut guard = self.buffer.lock().unwrap_or_else(|e| e.into_inner()); | ||
| std::mem::take(&mut *guard) | ||
| }; |
There was a problem hiding this comment.
Preserve buffered entries until flush write succeeds
flush() drains the in-memory buffer with std::mem::take before any I/O, so if Dataset::write(...) later fails (e.g., transient object-store/network error), those audit records are dropped permanently instead of being retried. This creates silent data loss in exactly the failure scenarios an audit sink must tolerate.
Useful? React with 👍 / 👎.
| let ds = Dataset::open(&self.dataset_path) | ||
| .await | ||
| .map_err(|e| format!("Lance open error: {e}"))?; |
There was a problem hiding this comment.
Handle missing dataset as empty in scan_back
scan_back() immediately errors if the dataset path has not been created yet, which happens before the first successful flush(). In that common startup path, callers cannot read history and must special-case errors, even though there are simply zero persisted entries; returning an empty vector for not-found would match the method contract better and avoid spurious failures.
Useful? React with 👍 / 👎.
Sprint C agent (PR #311) flagged five staleness items in the vision doc that were out of its §7-only scope. Closing the debt now: Header DRAFT - pending review (2026-04-28) -> Status: F1 parity shipped 2026-04-30. F1 latency benchmark not yet started. F2 is a posture, not a delivery. §2 anchor as of 2026-04-28 -> as of 2026-04-30 (post-F1 parity ship) §2 latency cell Designed to match; F1 numbers (forward tense) -> Designed to match; benchmark pending §2 caveat F1 publishes the first numbers (forward tense) -> F1 parity has shipped (correctness); the separately-scoped F1 latency benchmark has not been started. Distinguishes the two sub-deliverables explicitly. §3 F1 We stand up a Foundry instance... (forward) -> Shipped 2026-04-30. Cross-link to §7's as-shipped architecture. §3 F2 gated upstream by lance-graph PR-1 / PR-2 -> lance-graph PR #278 + #280 + #284 (RLS) and PR #278 + #302 (audit). Status today: lance-graph in production; medcare-rs adopter not yet open. Posture, not delivery. §3 F3 gated upstream by lance-graph PR-4 -> lance-graph PR #278 + #280 (parser + hardening). Status today: parser stub on lance-graph main; medcare-rs adopter is future round-2 work. §4 benchmark harness lands as part of F1 F1 numbers are published (both forward tense) -> F1 parity (correctness) shipped; F1 latency benchmarking has not been started. The two are separately-scoped F1 sub-deliverables. What this PR does NOT touch: - F4, F5, §5 (risks), §6 (NOT promising), §7 (next deliverable just landed in PR #311 - clean already). - The vision doc's tone rule. Every change cites a concrete PR number or file path; no marketing language introduced. - Performance numbers. None claimed; the §4 'do not quote unbenchmarked numbers' rule is preserved verbatim. Diff: +41 / -26 across 1 file. Markdown renders cleanly. Cross-link: PR #311 (the §7 fix that motivated this cleanup).
Summary
AuditSinktrait — writesAuditEntryrows to a Lance dataset via Arrow RecordBatch appendDataType::Timestamp(Millisecond, Some("UTC"))— DataFusion temporal predicates work (WHERE ts > now() - 7d)tenant_id(Utf8),actor_id(Utf8),statement_hash(UInt64),timestamp(Timestamp/ms/UTC),action(Utf8),rls_predicates_added(UInt16),rewritten_plan(Utf8 nullable) — no silent column dropsscan_back(n)usesscanner.limit(Some(n), Some(skip))instead of full-scan-then-slice. O(1) not O(N).audit-log; tokio/arrow/lance/futures deps only activate under the flagReview notes
Scanner::limit(Option<i64>, Option<i64>)verified at source line 1344.Test plan
test_timestamp_column_is_temporal_type— schema assertion (failing-first proven)test_round_trip_preserves_rls_predicates_added_and_rewritten_plan— non-default values survive (failing-first proven)flush_10_then_scan_back_roundtrip— enriched with non-default rls_predicates_addedhttps://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
Generated by Claude Code