feat: LF-2 VSA 16K resize + LF-3 JWT/RLS auth module #264
Resize the VSA catalogue from 10,000 to 16,384 dimensions (LF-2), matching the existing Binary16K / Vsa16kF32 carrier widths.
VSA_WORDS: 157 → 256 (256 × 64 = 16,384 bits)
VSA_DIMS: 10,000 → 16,384
8 SMB role keys in [10000..14096), 512 dims each:
KUNDE_KEY [10000..10512) customer
SCHULDNER_KEY [10512..11024) debtor
MAHNUNG_KEY [11024..11536) dunning
RECHNUNG_KEY [11536..12048) invoice
DOKUMENT_KEY [12048..12560) document
BANK_KEY [12560..13072) banking
FIBU_KEY [13072..13584) financial accounting
STEUER_KEY [13584..14096) tax
Headroom [14096..16384) reserved for future SMB keys (LIEFERANT/MITARBEITER/ZAHLUNG/LIEFERSCHEIN).
Labels use FNV-64-seeded LCG generator: "smb.kunde" etc.
All existing role keys unchanged. Disjointness test updated to include all 55 keys (47 existing + 8 new).
233 contract tests pass. Full workspace cargo check clean.
Per SMB session spec doc at smb-office-rs `3a25ce2`.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
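The SMB key layout described above can be sketched as follows. This is a minimal illustration rather than the code in this PR: `SMB_BASE`, `SMB_KEY_DIMS`, `smb_key_range` and `all_disjoint` are hypothetical names, and the real disjointness test covers all 55 keys, whereas this sketch only models the 8 new SMB ranges.

```rust
use std::ops::Range;

/// Total VSA width in bits after the LF-2 resize.
pub const VSA_DIMS: usize = 16_384;
/// Number of 64-bit words backing one vector (16,384 / 64 = 256).
pub const VSA_WORDS: usize = VSA_DIMS / 64;

// Hypothetical helper constants: start of the SMB block and width per key.
const SMB_BASE: usize = 10_000;
const SMB_KEY_DIMS: usize = 512;

/// SMB role keys in layout order, as listed in the commit message.
pub const SMB_KEYS: [&str; 8] = [
    "KUNDE_KEY", "SCHULDNER_KEY", "MAHNUNG_KEY", "RECHNUNG_KEY",
    "DOKUMENT_KEY", "BANK_KEY", "FIBU_KEY", "STEUER_KEY",
];

/// Half-open dimension range of the SMB key at `index` (0-based).
pub fn smb_key_range(index: usize) -> Range<usize> {
    let start = SMB_BASE + index * SMB_KEY_DIMS;
    start..start + SMB_KEY_DIMS
}

/// True when no two half-open ranges share a dimension.
pub fn all_disjoint(ranges: &[Range<usize>]) -> bool {
    let mut sorted: Vec<&Range<usize>> = ranges.iter().collect();
    sorted.sort_by_key(|r| r.start);
    // After sorting by start, overlap can only occur between neighbours.
    sorted.windows(2).all(|w| w[0].end <= w[1].start)
}
```

With this layout, `smb_key_range(7)` ends at 14,096, which leaves 2,288 dimensions of headroom below `VSA_DIMS`.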
LF-3 / DM-7 — callcenter [auth] feature, Phase 1:
ActorContext (contract/auth.rs):
ActorContext { actor_id: String, tenant_id: TenantId, roles }
AuthError enum for extraction failures.
Zero-dep, in contract crate for cross-consumer use.
JwtMiddleware (callcenter/auth.rs, feature = "auth"):
extract_actor(token) — base64-decode JWT payload, parse JSON,
extract sub/tenant_id/roles into ActorContext.
Phase 1: no signature verification (deployment-specific).
Minimal base64url decoder (~30 lines, no external dep).
RlsRewriter (callcenter/rls.rs, feature = "query"):
DataFusion OptimizerRule that injects tenant_id + actor_id
predicates on TableScan nodes in the LogicalPlan.
Admin role skips actor_id filter.
Recursive plan tree walking.
Scope boundaries per SMB REQUEST at bf7c05e:
- IN: JWT → ActorContext → LogicalPlan RLS rewrite
- OUT: connectors, sharding, per-property marking
All tests pass. Workspace cargo check clean.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
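For illustration, the Phase 1 extraction step (split the token, base64url-decode the payload, no signature check) might look roughly like the sketch below. It is not the decoder from this PR: `sextet`, `base64url_decode` and `jwt_payload_bytes` are hypothetical names, and JSON parsing of the decoded bytes into `ActorContext` is left out.

```rust
/// Map one base64url character to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'-' => Some(62),
        b'_' => Some(63),
        _ => None,
    }
}

/// Decode base64url with or without `=` padding; `None` on malformed input.
pub fn base64url_decode(input: &str) -> Option<Vec<u8>> {
    let bytes = input.trim_end_matches('=').as_bytes();
    // A single leftover character cannot encode a whole byte.
    if bytes.len() % 4 == 1 {
        return None;
    }
    let mut out = Vec::with_capacity(bytes.len() * 3 / 4);
    for chunk in bytes.chunks(4) {
        let mut acc: u32 = 0;
        for &c in chunk {
            acc = (acc << 6) | sextet(c)?;
        }
        // Left-align a short final chunk so byte extraction is uniform.
        acc <<= 6 * (4 - chunk.len()) as u32;
        let n_bytes = chunk.len() * 3 / 4; // 4 chars -> 3 bytes, 3 -> 2, 2 -> 1
        for i in 0..n_bytes {
            out.push((acc >> (16 - 8 * i)) as u8);
        }
    }
    Some(out)
}

/// Decoded payload (middle segment) of a compact JWT.
/// ASSUMPTION: mirrors Phase 1, so the signature segment is ignored.
pub fn jwt_payload_bytes(token: &str) -> Option<Vec<u8>> {
    let mut parts = token.split('.');
    let (_header, payload, _signature) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None; // more than three segments is not a compact JWT
    }
    base64url_decode(payload)
}
```

Because nothing here verifies the signature, the decoded claims are only as trustworthy as whatever sits in front of the service; the commit message leaves that to the deployment.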
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7310ec36f
let predicate = self.build_predicate();
let filtered = LogicalPlanBuilder::new(plan)
    .filter(predicate)?
    .build()?;
Ok(Transformed::yes(filtered))
Make RLS rewrite converge across optimizer passes
RlsRewriter::rewrite always wraps every TableScan in a new Filter and returns Transformed::yes, but DataFusion invokes optimizer rules repeatedly until a fixed point. With this implementation, the inner TableScan is rewritten again on each pass, so plans accumulate nested identical RLS filters (Filter(Filter(...TableScan))) until max_passes is hit, which bloats plans and can degrade optimization/execution for all authenticated queries. Add an idempotence guard (e.g., detect an existing injected RLS filter or rewrite in a way that does not keep changing the tree).
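The idempotence guard suggested here can be illustrated on a toy plan tree. The sketch below deliberately does not use DataFusion types: `Plan`, `inject_rls` and the string predicate are hypothetical stand-ins, and a real rule would compare `Expr` values or otherwise mark the injected filter.

```rust
/// Toy logical plan; a stand-in for DataFusion's LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    Filter { predicate: String, input: Box<Plan> },
    Limit { n: usize, input: Box<Plan> },
}

/// Wrap every scan in an RLS filter exactly once.
/// Returns the rewritten plan and whether anything changed in this pass.
fn inject_rls(plan: Plan, rls: &str) -> (Plan, bool) {
    match plan {
        // Guard: a scan already wrapped by our predicate is left alone,
        // so a second optimizer pass reports "no change".
        Plan::Filter { predicate, input }
            if predicate == rls && matches!(*input, Plan::Scan { .. }) =>
        {
            (Plan::Filter { predicate, input }, false)
        }
        // Any other node: recurse into the child and propagate the flag.
        Plan::Filter { predicate, input } => {
            let (inner, changed) = inject_rls(*input, rls);
            (Plan::Filter { predicate, input: Box::new(inner) }, changed)
        }
        Plan::Limit { n, input } => {
            let (inner, changed) = inject_rls(*input, rls);
            (Plan::Limit { n, input: Box::new(inner) }, changed)
        }
        // Bare scan: inject the filter and report a change.
        scan @ Plan::Scan { .. } => (
            Plan::Filter { predicate: rls.to_string(), input: Box::new(scan) },
            true,
        ),
    }
}
```

Running `inject_rls` twice on the same tree yields an identical plan and `false` on the second pass, which is the fixed-point behaviour the review asks for. The guard only recognises a filter sitting directly above the scan, so a rule that moves or merges filters in between passes would need a more robust marker.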
    .strip_prefix("Bearer ")
    .or_else(|| header_value.strip_prefix("bearer "))
    .unwrap_or(header_value);
Accept any Bearer scheme casing
JwtMiddleware::extract_from_header only strips "Bearer " and "bearer ", but HTTP auth schemes are case-insensitive. Headers like "BEARER <token>" or mixed-case variants are therefore treated as raw tokens and fail JWT parsing, causing avoidable auth failures with clients/proxies that normalize casing differently. Parse the scheme token and compare with eq_ignore_ascii_case before extracting the credential.
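One way to implement the suggestion is sketched below. `bearer_token` is a hypothetical helper, not the PR's `extract_from_header`; unlike the original code it returns `None` when no scheme is present instead of treating the whole header as a raw token, so a caller that wants the old fallback would have to add it back.

```rust
/// Extract the credential from an `Authorization` header value,
/// accepting the `Bearer` scheme in any ASCII casing.
pub fn bearer_token(header_value: &str) -> Option<&str> {
    // Split "<scheme> <credential>" at the first space.
    let (scheme, rest) = header_value.trim().split_once(' ')?;
    if !scheme.eq_ignore_ascii_case("Bearer") {
        return None;
    }
    // Tolerate extra spaces between scheme and credential.
    let token = rest.trim_start();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}
```

For example, `bearer_token("BEARER abc.def.ghi")` yields `Some("abc.def.ghi")`, while `bearer_token("Basic abc")` yields `None`.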
Summary
`3a25ce2`.
`[auth]` feature: `ActorContext` type in contract, `JwtMiddleware` for JWT extraction (Phase 1, no sig verification), `RlsRewriter` as DataFusion `OptimizerRule` injecting tenant_id + actor_id predicates on TableScan nodes. Per SMB REQUEST at `bf7c05e`, UNKNOWN-3 (DataFusion LogicalPlan) and UNKNOWN-4 (actor_id = String) confirmed by user.
Scope boundaries honored: DM-7 stays minimal. External connectors (PostgreSQL, MongoDB, SAP, SIEM, LLM APIs, O365, Google Drive) are future scope on the outer-membrane unified DTO layer (LF-10..19).
7 files changed, +884 / −41. Clippy clean on all new files. 233 contract + 16 callcenter tests pass.
Test plan
- `cargo clippy -p lance-graph-contract -p lance-graph-callcenter`: no warnings in new files
- `cargo test -p lance-graph-contract --lib`: 233 passed (includes disjointness for all 55 keys)
- `cargo test -p lance-graph-callcenter`: 16 passed
- `cargo check`: full workspace clean
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Generated by Claude Code