Skip to content

feat(F3): LanceAuditSink with temporal timestamps + full schema round-trip#302

Merged
AdaWorldAPI merged 2 commits into
mainfrom
claude/pr-f3-lance-audit-writer
Apr 30, 2026
Merged

feat(F3): LanceAuditSink with temporal timestamps + full schema round-trip#302
AdaWorldAPI merged 2 commits into
mainfrom
claude/pr-f3-lance-audit-writer

Conversation

@AdaWorldAPI
Copy link
Copy Markdown
Owner

Summary

  • LanceAuditSink implementing AuditSink trait — writes AuditEntry rows to a Lance dataset via Arrow RecordBatch append
  • Temporal timestamp: DataType::Timestamp(Millisecond, Some("UTC")) — DataFusion temporal predicates work (WHERE ts > now() - 7d)
  • Full schema: tenant_id (Utf8), actor_id (Utf8), statement_hash (UInt64), timestamp (Timestamp/ms/UTC), action (Utf8), rls_predicates_added (UInt16), rewritten_plan (Utf8 nullable) — no silent column drops
  • Limit pushdown: scan_back(n) uses scanner.limit(Some(n), Some(skip)) instead of full-scan-then-slice. O(1) not O(N).
  • Feature-gated behind audit-log; tokio/arrow/lance/futures deps only activate under the flag

Review notes

  • Initial implementation had Int64 timestamp, missing columns (rls_predicates_added, rewritten_plan silently zeroed), and full-scan footgun. All three fixed.
  • Lance v4 Scanner::limit(Option<i64>, Option<i64>) verified at source line 1344.

Test plan

  • test_timestamp_column_is_temporal_type — schema assertion (failing-first proven)
  • test_round_trip_preserves_rls_predicates_added_and_rewritten_plan — non-default values survive (failing-first proven)
  • flush_10_then_scan_back_roundtrip — enriched with non-default rls_predicates_added
  • 14 audit tests pass, 77 callcenter tests pass

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj


Generated by Claude Code

claude added 2 commits April 29, 2026 20:10
…writer (PR-F3)

Implements LanceAuditSink behind the `audit-log` feature flag (now pulls
lance + arrow + futures + tokio). Entries are buffered via append(), flushed
to a Lance dataset in append mode, and read back via scan_back(n).

Schema: tenant_id (Utf8), actor_id (Utf8), statement_hash (UInt64),
timestamp (Int64), action (Utf8).

Tests: flush-10 roundtrip, flush-1000 count, multi-flush accumulation,
empty-flush noop. All 75 tests pass (4 new Lance tests + 71 existing).

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 928c5d7f8e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +189 to +192
let entries: Vec<AuditEntry> = {
let mut guard = self.buffer.lock().unwrap_or_else(|e| e.into_inner());
std::mem::take(&mut *guard)
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve buffered entries until flush write succeeds

flush() drains the in-memory buffer with std::mem::take before any I/O, so if Dataset::write(...) later fails (e.g., transient object-store/network error), those audit records are dropped permanently instead of being retried. This creates silent data loss in exactly the failure scenarios an audit sink must tolerate.

Useful? React with 👍 / 👎.

Comment on lines +279 to +281
let ds = Dataset::open(&self.dataset_path)
.await
.map_err(|e| format!("Lance open error: {e}"))?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle missing dataset as empty in scan_back

scan_back() immediately errors if the dataset path has not been created yet, which happens before the first successful flush(). In that common startup path, callers cannot read history and must special-case errors, even though there are simply zero persisted entries; returning an empty vector for not-found would match the method contract better and avoid spurious failures.

Useful? React with 👍 / 👎.

@AdaWorldAPI AdaWorldAPI merged commit 830c266 into main Apr 30, 2026
1 of 5 checks passed
AdaWorldAPI added a commit that referenced this pull request Apr 30, 2026
Sprint C agent (PR #311) flagged five staleness items in the vision
doc that were out of its §7-only scope. Closing the debt now:

  Header           DRAFT - pending review (2026-04-28)
                   -> Status: F1 parity shipped 2026-04-30. F1
                      latency benchmark not yet started. F2 is a
                      posture, not a delivery.

  §2 anchor        as of 2026-04-28
                   -> as of 2026-04-30 (post-F1 parity ship)

  §2 latency cell  Designed to match; F1 numbers (forward tense)
                   -> Designed to match; benchmark pending

  §2 caveat        F1 publishes the first numbers (forward tense)
                   -> F1 parity has shipped (correctness); the
                      separately-scoped F1 latency benchmark has
                      not been started. Distinguishes the two
                      sub-deliverables explicitly.

  §3 F1            We stand up a Foundry instance... (forward)
                   -> Shipped 2026-04-30. Cross-link to §7's
                      as-shipped architecture.

  §3 F2            gated upstream by lance-graph PR-1 / PR-2
                   -> lance-graph PR #278 + #280 + #284 (RLS) and
                      PR #278 + #302 (audit). Status today:
                      lance-graph in production; medcare-rs
                      adopter not yet open. Posture, not
                      delivery.

  §3 F3            gated upstream by lance-graph PR-4
                   -> lance-graph PR #278 + #280 (parser +
                      hardening). Status today: parser stub on
                      lance-graph main; medcare-rs adopter is
                      future round-2 work.

  §4               benchmark harness lands as part of F1
                   F1 numbers are published (both forward tense)
                   -> F1 parity (correctness) shipped; F1 latency
                      benchmarking has not been started. The two
                      are separately-scoped F1 sub-deliverables.

What this PR does NOT touch:
  - F4, F5, §5 (risks), §6 (NOT promising), §7 (next deliverable
    just landed in PR #311 - clean already).
  - The vision doc's tone rule. Every change cites a concrete PR
    number or file path; no marketing language introduced.
  - Performance numbers. None claimed; the §4 'do not quote
    unbenchmarked numbers' rule is preserved verbatim.

Diff: +41 / -26 across 1 file. Markdown renders cleanly.

Cross-link: PR #311 (the §7 fix that motivated this cleanup).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants