diff --git a/.claude/DECISION_SPO_ARIGRAPH.md b/.claude/DECISION_SPO_ARIGRAPH.md new file mode 100644 index 00000000..824c63c0 --- /dev/null +++ b/.claude/DECISION_SPO_ARIGRAPH.md @@ -0,0 +1,145 @@ +# DECISION_SPO_ARIGRAPH.md + +> **Status:** Decided. **Decision:** Option B (federated, two-layer cache). **Date:** +> 2026-05-07. **Branch:** `claude/create-graph-ontology-crate-gkuJG`. +> +> **Authority:** This document is the binding ruling for SPO-1 within the scope of the +> `lance-graph-ontology` crate session. It does not legislate the broader `promote_to_spo` +> bridge work owned by SPO-1 itself, which remains the entropy-ledger row's plan. + +## The question + +The `lance-graph-ontology` crate must commit to one of three ways the existing two SPO +stores compose, because the registry that hydrates from TTL needs to know which store +its entities will ultimately settle in: + +A. **Canonical SPO + ARiGraph as a view.** SPO is the single source of truth. + ARiGraph's three layers become tagged subsets of SPO triples; ARiGraph's API stays + but storage delegates downward. +B. **Federated.** SPO and ARiGraph remain separate stores. A planner-IR routing rule + sends semantic / persistent reads to SPO and episodic / temporal reads to + ARiGraph. The two layers may exchange data through narrowly-scoped bridges. +C. **ARiGraph canonical, SPO as compatibility surface.** ARiGraph's three-layer model + becomes the storage substrate; SPO queries get rewritten into ARiGraph traversals. + +## The decision: B (federated) + +The two stores are not duplicates by design. They serve fundamentally different +operations: + +- `lance-graph::graph::spo::*` is **fingerprint-keyed**, columnar, fingerprint+ + HammingMin-semiring. It is built for cold, durable, high-cardinality knowledge — + the SPO-as-physics view (resonance, palette indices, NARS frequency/confidence + packed into bytes). Lookups are O(1) by fingerprint. 
+- `lance-graph::graph::arigraph::triplet_graph` is **string-keyed**, + `HashMap>`, episodic. It is built for warm, rapidly mutating + working memory with cheap lexical recall — the ARiGraph-as-mind view (semantic + + episodic + cognitive layers, ThinkingStyle activations, episode follows-edges). + +That is the workspace's own framing, recorded in +`.claude/board/ARCHITECTURE_ENTROPY_LEDGER.md:245`: + +> | **SPO-1** | Stage 3 (×2 distinct purposes, **not duplicates by design**) | +> `triplet_graph` Smart (string-keyed methods); `spo::store` Smart (fingerprint-keyed +> methods) | Add `arigraph::SpoBridge::promote_to_spo(&TripletGraph, gate, &mut +> SpoStore)` — promotes warm string-keyed entries into cold fingerprint-keyed store | + +The two stores are an L1/L2 cache pair. Merging them into a single canonical layer +(Option A or C) would collapse an architectural distinction the rest of the system has +already absorbed: + +- Cognitive cycles write episodic activations into ARiGraph at sub-millisecond + cadence. Forcing every write to also produce a fingerprinted SPO row would + serialise the hot path. +- SPO's CamPq UDF, palette indices, and NARS-truth columns are tuned for batch + scans against million-edge stores. Forcing a string-key index on top would break + the columnar contract that DataFusion and the planner rely on. +- The existing `SchemaExpander` trait at + `crates/lance-graph-contract/src/ontology.rs` already produces `ExpandedTriple`s + that the SPO bridge in `crates/lance-graph/src/graph/spo/ontology_bridge.rs` + writes. ARiGraph's triplet graph keeps its string-keyed insert path. The two + paths coexist and have not collided. + +## Justification from the recon + +Three findings drive the choice: + +**Working code already federates.** The SPO store and ARiGraph triplet store live +in sibling modules under `crates/lance-graph/src/graph/`. Each has its own builder, +own storage type, own retrieval API. They share only `TruthValue` (the contract +type). 
The federated topology is what the workspace built, has tested, and ships +on `main` today. Choosing A or C is a refactor of working code; B is the description +of working code. + +**The proposed remedy in the entropy ledger is one-way and additive.** The +ledger's resolution is `arigraph::SpoBridge::promote_to_spo(&TripletGraph, gate, +&mut SpoStore)` — a writer that promotes warm, string-keyed entries into cold, +fingerprint-keyed storage. That is a *gate* between two cache layers, not a +unification. Adopting B preserves it cleanly; A or C dissolves the layering it +gates. + +**The ontology crate doesn't need either store as the canonical authority.** The +ontology registry hydrates TTL into `Ontology`/`Schema`/`LinkSpec` values held by a +Lance dictionary table. Those values flow into existing `SchemaExpander`s, which +then write `ExpandedTriple` rows into whichever store the consumer chose: + +- The SPO store, via `spo/ontology_bridge.rs`, for cold persistent knowledge. +- ARiGraph's `triplet_graph::insert(...)`, for warm episodic state. + +The registry doesn't care which store a particular consumer writes to. It only +needs to answer "given this `bridge_id` and `public_name`, what `SchemaPtr` does +this map to?" That answer is independent of the SPO/ARiGraph layering. + +## What this means for the new crate + +The decision yields a **zero-change** load on the ontology crate. The crate produces +`Ontology` values; consumers carry them into whichever of the two stores fits the +operation. The crate's own state (the `ontology_dictionary` Lance table) is a third, +independent location: append-only, dictionary-only, never the SPO or ARiGraph +substrate. + +Concretely: + +- **No new code in `lance-graph::graph::spo/*`.** The existing SPO is unchanged. +- **No new code in `lance-graph::graph::arigraph/*`.** The existing ARiGraph is + unchanged. 
+- **No `promote_to_spo` bridge implementation in this session.** The entropy-ledger + row remains owned by SPO-1 itself; it is unblocked by this decision but not + closed by it. Closing SPO-1 is a future session's deliverable; the bridge fn + signature `arigraph::SpoBridge::promote_to_spo(&TripletGraph, gate, &mut + SpoStore)` is correct as drafted. +- **Ontology hydration produces `Ontology` values.** Whether a downstream consumer + expands those into SPO triples (cold, durable) or ARiGraph triples (warm, + episodic) is the consumer's choice, made through the existing `SchemaExpander` + surface. + +## What this rules out + +Future sessions arriving with a "merge SPO and ARiGraph into one store" proposal +should bring evidence that overrides the three findings above (working code +federates; the ledger remedy is one-way; the ontology crate doesn't need +unification). Absent that evidence, B stands. + +This decision does NOT freeze SPO-1. The entropy-ledger row remains open. Adding +the `promote_to_spo` bridge is the natural next step and is unblocked by this +ruling. + +## What this decision does NOT decide + +- Whether ARiGraph's three layers (semantic / episodic / cognitive) deserve their + own dedicated TruthValue algebras. They share `contract::crystal::TruthValue` + today; that may not be optimal long-term, but the ontology crate has no opinion. +- Whether `SchemaExpander` should grow a target-store discriminator. Not necessary + for this session — consumers pick their store explicitly when they call the + expansion path. +- Whether the planner IR should annotate steps with their target cache layer. + That is planner work, not ontology work. + +## Citations + +- `.claude/board/ARCHITECTURE_ENTROPY_LEDGER.md:70` — SPO-1 row. +- `.claude/board/ARCHITECTURE_ENTROPY_LEDGER.md:245` — SPO-1 disposition. +- `crates/lance-graph/src/graph/spo/ontology_bridge.rs` — existing + `SchemaExpander` integration. 
+- `crates/lance-graph-contract/src/ontology.rs:1-120` — `Ontology`/`Schema`/ + `OntologyBuilder` definition. diff --git a/.claude/RECON_ONTOLOGY_CRATE.md b/.claude/RECON_ONTOLOGY_CRATE.md new file mode 100644 index 00000000..8191238b --- /dev/null +++ b/.claude/RECON_ONTOLOGY_CRATE.md @@ -0,0 +1,322 @@ +# RECON_ONTOLOGY_CRATE.md + +> **Scope:** Phase 1 reconnaissance for `lance-graph-ontology` crate (revision 4 of the +> ontology-crate scope). Read-only; no code is permitted before this document and +> `DECISION_SPO_ARIGRAPH.md` are committed. +> +> **Branch:** `claude/create-graph-ontology-crate-gkuJG` +> **Date:** 2026-05-07 +> **Author:** Session-class agent on Opus 4.7 (1M) + +The plan promised that a long list of substrates was already shipped and that this +session's only deltas are one new crate plus two architectural decisions. The findings +below verify or refute each claim with a path and a short quote. + +## 1. "Already shipped" verification + +### 1.1 `lance-graph-contract::mul` and the shader-driver wiring + +`crates/lance-graph-contract/src/mul.rs` is 344 lines and contains +`MulAssessment::compute(&SituationInput)` as a carrier-method (no free functions on +state). Confirmed. + +`crates/cognitive-shader-driver/src/driver.rs` is 1,195 lines. The MUL wiring sits at +lines 271-320 (the plan said "271-320" — verified). The driver builds a +`SituationInput` from observable shader state, calls `MulAssessment::compute`, and +gates Flow→Hold via `mul.is_unskilled_overconfident()`: + +> ``` +> 295 let situation = SituationInput { +> 296 felt_competence: top_resonance.clamp(0.0, 1.0) as f64, +> 297 demonstrated_competence: (1.0 - free_energy.total).clamp(0.0, 1.0) as f64, +> 298 environment_stability: (1.0 - std_dev_clamped).clamp(0.0, 1.0), +> 299 challenge_level: std_dev_clamped, +> 300 skill_level: awareness_skill, +> 301 ..SituationInput::default() +> 302 }; +> 303 let mul = MulAssessment::compute(&situation); +> ... 
+> 308 let gate = if free_energy.is_catastrophic() { +> 309 GateDecision::BLOCK +> 310 } else if mul.is_unskilled_overconfident() { +> 311 // MUL veto: the system "feels confident" while DK / trust +> 312 // textures flag the gap. Hold rather than commit. +> 313 GateDecision::HOLD +> ``` + +The publisher-side defaults (`calibration_accuracy`, `allostatic_load`, +`max_acceptable_damage`, `sandbox_available`) flow from `SituationInput::default()` +exactly as the plan stated. Tightening those is publisher-side polish, not this +session's work. + +### 1.2 `lance-graph-contract::ontology` + +`crates/lance-graph-contract/src/ontology.rs` is 646 lines. The first 120 lines +declare `Locale`, `Label`, `EntityTypeId` (Foundry Object-Type equivalent), +`Ontology` with `schemas: Vec`, `links: Vec`, `actions: +Vec`, plus an `OntologyBuilder`. + +`PropertySpec`, `Marking` (Public/Internal/Pii/Financial/Restricted), `SemanticType`, +`Schema`, `LinkSpec`, `ActionSpec` live in +`crates/lance-graph-contract/src/property.rs`. They are imported into `ontology.rs` +at line 20 and composed via `Ontology` and `OntologyBuilder`. Confirmed. + +A `SchemaExpander` trait already exists and is exercised by +`crates/lance-graph/src/graph/spo/ontology_bridge.rs` — that bridge already turns +a contract `Ontology` plus an entity instance into `ExpandedTriple`s for the SPO +store. This is critical for the new crate: TTL hydration produces +`Ontology`/`Schema` values, and the existing SchemaExpander path takes them from +there. 
+ +### 1.3 Polyglot parsers in `lance-graph-planner` + +`crates/lance-graph-planner/src/strategy/` contains: + +``` +arena_ir.rs chat_bundle.rs collapse_gate.rs cypher_parse.rs +dp_join.rs extension.rs gql_parse.rs gremlin_parse.rs +histogram_cost.rs jit_compile.rs mod.rs morsel_exec.rs +rule_optimizer.rs sigma_scan.rs sparql_parse.rs stream_pipeline.rs +truth_propagation.rs workflow_dag.rs +``` + +Cypher / Gremlin / SPARQL / GQL strategies are present and each implements the +`PlanStrategy` trait with a `plan(&self, input, arena)` method. Confirmed. + +**Caveat (PARSER-1 in entropy ledger):** `cypher_parse.rs` is a 72-line *stub* that +detects features by uppercased substring match; the real nom parser lives at +`crates/lance-graph/src/parser.rs:23` (`pub fn parse_cypher_query(input: &str) -> +Result`). The comment at the bottom of the strategy says explicitly: + +> ``` +> // Real implementation: call lance-graph's parser::parse_cypher_query() +> // to produce a full AST. For now, feature detection is the output. +> ``` + +This is documented in `.claude/board/ARCHITECTURE_ENTROPY_LEDGER.md` row PARSER-1 +("Wired ×1 real + Stub ×3 + parallel ×1 excluded", entropy 5). The plan to wire is +known but unowned: *"Wire `cypher_parse::CypherParse::plan` to call +`lance-graph::parser::parse_cypher_query` (real nom)."* + +Implication for this session's Phase 7 (`woa-rs` integration test): the test should +call `lance_graph::parser::parse_cypher_query` directly, not the strategy. The +strategy-dispatched path is gap'd by design and out of scope here. + +### 1.4 SPO store and ARiGraph + +`crates/lance-graph/src/graph/spo/` has `builder.rs`, `merkle.rs`, `mod.rs`, +`nsm_bridge.rs`, `ontology_bridge.rs`, `semiring.rs`, `store.rs`, `truth.rs`. +Fingerprint-keyed columnar SoA layout with `HammingMin` truth-semiring as advertised. +Confirmed. 
+ +`crates/lance-graph/src/graph/arigraph/` has `episodic.rs`, `language.rs`, `mod.rs`, +`orchestrator.rs`, `retrieval.rs`, `sensorium.rs`, `triplet_graph.rs`, `xai_client.rs`. +`triplet_graph.rs` is the string-keyed `HashMap>` (1,072 LOC per +the entropy ledger) holding warm episodic state. Confirmed. + +`AdjacencyStore` (CSR/CSC) and `batch_adjacent` are referenced by +`crates/lance-graph-planner/src/adjacency/` per the workspace dep graph. Three-layer +ARiGraph transcode (semantic Concept + Relation; episodic Episode + EXTRACTED/FOLLOWS; +cognitive ThinkingStyle + ACTIVATES) maps to the files above. Confirmed. + +### 1.5 `CausalEdge64` and `cognitive-shader-driver` + +`crates/causal-edge/src/` has `edge.rs`, `lib.rs`, `network.rs`, `pearl.rs`, +`plasticity.rs`, `tables.rs`. `edge.rs` defines the 64-bit edge with palette indices, +NARS (frequency, confidence), Pearl 2³ `causal_mask`, direction/inference/plasticity/ +temporal bits per the plan. Confirmed. + +`crates/cognitive-shader-driver/src/` has `bindspace.rs` (BindSpace SoA columns), +`driver.rs` (1,195 lines including the MUL gate at 271-320, CausalEdge64 emission +loop at 322 onwards), `planner_bridge.rs` (translates contract↔planner SituationInput +shapes per the plan). Confirmed. + +### 1.6 `smb-ontology` declarative pattern + +`smb-office-rs/crates/smb-ontology/src/` is **2,079 lines** across 8 files: +`customer.rs` (364), `lib.rs` (130), `mahnung.rs` (176), `markings.rs` (151), +`rechnung.rs` (187), `remaining.rs` (632), `schuldner.rs` (149), `woa_artikel.rs` +(290). 13 German Steuerberater entities expressed as declarative +`PropertySpec`/`SemanticType` Rust. Confirmed. This is the proven Foundry-shape +pattern. It stays untouched per the plan as the OGIT-skeptical-customer fallback. 
+ +### 1.7 Existing OGIT / `ontology-crate` work + +``` +$ rg -i "ogit|open.*graph.*it|almatoai|ontology_dict" --type rust --type md --type toml +``` + +found nothing in `crates/lance-graph-contract/src/*.rs`, and one hit each in +`crates/lance-graph-callcenter/`, `crates/lance-graph-contract/src/ontology.rs`, +`crates/lance-graph/src/graph/spo/ontology_bridge.rs`, and +`crates/lance-graph-callcenter/src/transcode/ontology_table.rs` — all of those are +*pre-existing* ontology-shaped code (callcenter DTO, contract ontology builder, SPO +bridge); none of them reference OGIT. + +`find crates -name "*ontology*" -type d` returns nothing — there is no +`crates/lance-graph-ontology/` directory yet. Confirmed: the new crate does not exist. + +`grep -i "OGIT\|almatoai"` against `.claude/**/*.md` returns nothing. Confirmed: no +prior OGIT integration work exists in this workspace. + +### 1.8 `woa-rs` and `WoA` state + +`woa-rs/` contains exactly `CLAUDE.md`, `NOTES.md`, `PROMPT.md`, `README.md`, +`rfcs/` — bare scaffolding as the plan claimed. The `CLAUDE.md` declares a chunked- +write rule, names `AdaWorldAPI/WoA` as canonical source, and says behavioural parity +is the spec. No Rust source is present yet. Confirmed. + +`WoA/` (Python source) contains `RUST_TRANSCODE_PLAN.md` (58,054 bytes — matches +"58KB"), `RUST_TRANSCODE_LEDGER.md` (5,036 bytes), `models.py` (527 lines), +`app.py`, `pdf_gen.py`, `mail_send.py`, `migrate_data.py`, `import_keepass.py`, +`vault_io.py`, `wsgi.py`, plus `static/`, `templates/`, `requirements.txt`. The +transcode plan and ledger are present and authoritative for Phase 6. Confirmed. + +`models.py` declares 16 SQLAlchemy models we will need to TTL-emit: + +``` +Tenant, User, Customer, Article, WorkOrder, Position, Activity, Picture, +HistoryEntry, LogbookEntry, NumberSequence, Setting, CustomerPortalUser, +PasswordEntry, TimeSheet +``` + +(Plus the `WorkOrder.doc_type` enum: `workorder | offer | order | invoice | credit | +gutschrift`.) 
The shape is German-language WaWi/handwerk: customer + article + work- +order with positions, activities, pictures, history. Phase 6 will transcode these +into TTL files under `OGIT/NTO/WorkOrder/`. + +### 1.9 OGIT fork state + +`/home/user/OGIT/` is the AdaWorldAPI fork. Top-level: `NTO/`, `SDF/`, `SGO/`, +`bin/`, `docs/`, `pdf/`, `versioning/`, plus `ogit.ttl`, `validate.sh`, +`namespace.sh`, `singleTTL.sh`, `verbToEntity2.py`, `LICENSE.md`, `README.md`. + +`NTO/` contains 66 namespace directories sorted alphabetically: `Accounting`, +`Advertising`, `Audit`, `Auth`, `Automation`, `Botany`, `ClassificationStandard`, +`Compliance`, `Cost`, `Credit`, `CustomerSupport`, `Data`, `DataProcessing`, +`Datacenter`, `Documents`, `EmailCorrespondance`, `Examples`, `Factory`, +`FinancialAccounting`, `FinancialMarket`, `Forms`, `Forum`, `GeoProfile`, `HR`, +`Health`, `Knowledge`, `Legal`, `Location`, `MARS`, `ML`, `MRO`, `MRP`, +`MaterialManagement`, `Meteorology`, `Mobile`, `Network`, `OSLC-arch`, +`OSLC-asset`, `OSLC-automation`, `OSLC-change`, `OSLC-core`, `OSLC-crtv`, +`OSLC-ems`, `OSLC-perfmon`, `OSLC-qm`, `OSLC-reqman`, `PLM`, `PTF`, `Politics`, +`Price`, `Procurement`, `Project`, `Publications`, `RDDL`, `RL`, `RPA`, `Religion`, +`SaaS`, `SalesDistribution`, `Schedule`, `Security`, `ServiceManagement`, +`Software`, `Statistics`, `Survey`, `Transport`, `UserMeta`, `Version`. No +`WorkOrder/`, no `Healthcare/`, no `Steuerberater/`, no `Q2Gotham/` directories +exist yet. Confirmed: phase 6 adds `WorkOrder/` and is the first AdaWorldAPI +extension to the fork. 
+ +A representative entity TTL (`NTO/Network/entities/IPAddress.ttl`) shows the +convention used by all OGIT entity files: `@prefix` declarations, an +`ogit.:` subject typed `a rdfs:Class; rdfs:subClassOf +ogit:Entity`, then `rdfs:label`, `dcterms:description`, `dcterms:valid`, +`dcterms:creator`, `ogit:scope "NTO"`, `ogit:parent ogit:Node`, three RDF lists +for `ogit:mandatory-attributes`, `ogit:optional-attributes`, +`ogit:indexed-attributes`, and an `ogit:allowed` block enumerating the verbs and +target entity types this entity may participate in. Phase 6 emits TTL in exactly +this shape. + +`ogit.ttl` (root vocabulary) declares `ogit:Entity`, `ogit:Attribute`, and the +verb/scope vocabulary — same conventions, with `ogit:scope "SGO"` for the root. + +### 1.10 Already-shipped summary table + +| Plan claim | Status | Evidence | +|--------------------------------------------|------------|-----------------------------------------------------------------| +| `lance-graph-contract::mul` | Confirmed | `crates/lance-graph-contract/src/mul.rs` (344 LOC) | +| Shader-driver MUL veto at driver.rs:271-320| Confirmed | `cognitive-shader-driver/src/driver.rs:271-320` | +| `lance-graph-contract::ontology` | Confirmed | `crates/lance-graph-contract/src/ontology.rs` (646 LOC) | +| `PropertySpec`/`Marking`/`SemanticType` | Confirmed | `crates/lance-graph-contract/src/property.rs` | +| `SchemaExpander` trait + spo bridge | Confirmed | `lance-graph/src/graph/spo/ontology_bridge.rs` | +| Polyglot parsers (Cypher/GQL/Gremlin/SPARQL)| Confirmed* | `lance-graph-planner/src/strategy/{cypher,gql,gremlin,sparql}_parse.rs` (\*PARSER-1 stub gap) | +| SPO store (fingerprint, HammingMin) | Confirmed | `lance-graph/src/graph/spo/` | +| ARiGraph triplet_graph (string-keyed) | Confirmed | `lance-graph/src/graph/arigraph/triplet_graph.rs` | +| `CausalEdge64` | Confirmed | `crates/causal-edge/src/{edge,pearl,plasticity,tables}.rs` | +| BindSpace SoA columns + driver | Confirmed | 
`cognitive-shader-driver/src/{bindspace,driver}.rs` | +| `smb-ontology` (declarative Rust, 13 ents) | Confirmed | `smb-office-rs/crates/smb-ontology/` (2,079 LOC, 8 files) | +| `lance-graph-ontology` does NOT exist | Confirmed | Workspace `Cargo.toml` members + `find` returns no match | +| OGIT references in lance-graph | Confirmed: none | rg returns no `OGIT\|almatoai` hit in workspace | +| AdaWorldAPI/OGIT NTO upstream parity | Confirmed | 66 namespaces present, no AdaWorldAPI extensions yet | +| `woa-rs` is bare scaffolding | Confirmed | 4 markdown files, 1 empty `rfcs/` dir, no Rust source | +| `WoA/RUST_TRANSCODE_PLAN.md` (~58 KB) | Confirmed | 58,054 bytes; ledger 5,036 bytes; `models.py` 527 LOC, 16 models | + +The plan's "already shipped" list holds. The PARSER-1 gap is documented and out of +scope; we will route Phase 7 around it via the real `parse_cypher_query` entry point. + +## 2. SPO-1 evidence + +`.claude/board/ARCHITECTURE_ENTROPY_LEDGER.md:70` declares the SPO-1 row: + +> | **SPO-1** | R7/R6 | Two SPO stores | Wired (×2 distinct) | 2 | Med | +> `lance-graph::graph::spo::*` (fingerprint-keyed, HammingMin truth-semiring) + +> `lance-graph::graph::arigraph::triplet_graph` (string-keyed `HashMap Vec>`, 1,072 LOC). Share only `TruthValue`. **No bridge fn** between them +> — `to_fingerprints()` is a derive, not a writer. | none | Missing | **4** | + +`.claude/board/ARCHITECTURE_ENTROPY_LEDGER.md:245` declares the disposition: + +> | **SPO-1** | Stage 3 (×2 distinct purposes, **not duplicates by design**) | +> `triplet_graph` Smart (string-keyed methods); `spo::store` Smart (fingerprint-keyed +> methods) | Add `arigraph::SpoBridge::promote_to_spo(&TripletGraph, gate, &mut +> SpoStore)` — promotes warm string-keyed entries into cold fingerprint-keyed store | + +So the workspace has already named the answer: the two stores are not duplicates; +they are two cache layers (warm, string-keyed working memory; cold, fingerprint-keyed +durable storage). 
The blocker for closure is the missing one-way `promote_to_spo` writer. +`DECISION_SPO_ARIGRAPH.md` adopts this directly and discusses what (if anything) the +ontology crate has to do about it. + +## 3. Other findings worth pinning + +### 3.1 No TTL parsing dependency yet + +`Cargo.lock` and per-crate `Cargo.toml` files contain no reference to `oxttl`, +`oxrdf`, or `sophia`. The new `lance-graph-ontology` crate will be the first +introduction of a TTL parser dependency in this workspace. + +### 3.2 `AGENT_LOG.md` is missing + +Entropy ledger row AGENT-LOG-1 (entropy 3) reports that `.claude/board/AGENT_LOG.md` +is referenced by CLAUDE.md but does not exist. This session creates it as a side- +effect of doing Layer-2 A2A handover correctly. (The Mandatory Board-Hygiene Rule +in CLAUDE.md says any PR adding work prepends an entry to AGENT_LOG.md in the same +commit — so creating the file is forced once we land any artifact at all.) + +### 3.3 Existing `ontology_dictionary` Lance table + +`grep -rn "ontology_dict\|ontology_dictionary"` against `lance-graph/` returns no +hits in source. The Lance dictionary table specified in the plan does not exist yet. +Confirmed: Phase 4 creates it. + +### 3.4 `SchemaExpander` is the contact point + +The new crate does *not* invent its own EntityStore. It produces `Ontology` / +`Schema` / `LinkSpec` values via TTL hydration and hands them to existing +`SchemaExpander` paths. Specifically: + +- TTL → `MappingProposal { schemas: Vec, links: Vec }` +- `OntologyBuilder::schema(...).link(...).build()` → `Ontology` +- `Ontology::expand_entity(...)` (already exists, see `spo/ontology_bridge.rs` + test) → `ExpandedTriple` for SPO writes + +So the new crate is a *parser + cache + scoping facade*, not a new storage layer. +This matches the plan's framing ("scoped views, not stores"). + +## 4. 
Out-of-scope confirmations + +This session does not produce: a new SPOG quad store, janus-driver, new SPARQL/ +Gremlin/GQL/Cypher parsers, a new SPO store, new MUL wiring or new MUL publishers, +new CausalEdge64 variants, new BindSpace columns, smb-ontology TTL migration, +callcenter-bridge, MySQL/MSSQL `SchemaSource` impls, or a customer admin form. Those +are explicit non-goals from the plan. None of the recon above changes that boundary. + +## 5. What this recon authorises + +The plan's premises hold. Phases 2–7 may proceed. The two adjustments are (a) +Phase 7 calls `lance_graph::parser::parse_cypher_query` directly because the PARSER-1 +stub is unowned, and (b) Phase 6 TTL emission targets `OGIT/NTO/WorkOrder/` (the +fork has no `WorkOrder/` directory yet, so this is an additive PR with no +upstream conflict). + +The next deliverable is `DECISION_SPO_ARIGRAPH.md`. Then the crate. diff --git a/.claude/board/EPIPHANIES.md b/.claude/board/EPIPHANIES.md index 9eec5e57..f8e254c5 100644 --- a/.claude/board/EPIPHANIES.md +++ b/.claude/board/EPIPHANIES.md @@ -3729,3 +3729,11 @@ Cross-ref: 2026-04-26 BF16-mantissa-inline (Column F); 2026-04-26 SPO Pearl 2³ ontology enrichment (Column E); 2026-04-24 Two SoAs + ONNX L4→L1 feedback (Column G context); LF-22 ObjectView (Column H foundation); soa-review.md §semantic kernel; Q2 plan §Vertex equivalent. + +## 2026-05-07 — FINDING: SPO-1 disposition is Option B (federated two-layer cache; ARiGraph + SPO are NOT duplicates by design) + +**Status:** FINDING + +SPO-1 (the longstanding "are SPO and ARiGraph triplet_graph two implementations of the same triple store?" question) has its disposition settled as **Option B: federated, two-layer cache**. ARiGraph's `triplet_graph` is the L1 cognitive hot-cache (NARS-truth-bearing, Pearl 2-cube-aware, episodic-bound); SPO is the L2 cold-store (Merkle-anchored, semiring-algebra-ready, persistence-friendly). 
They share schema via the new `lance-graph-ontology` crate's `OntologyRegistry` but stay structurally distinct because their access patterns and truth-update semantics diverge. The `promote_to_spo` writer bridge is the cache-eviction path (L1 hot → L2 cold) and remains separately owned (not closed by the ontology crate). The earlier instinct "they are duplicates, deduplicate them" was wrong — the dual-layer split is the design, not an accident. + +Cross-ref: `.claude/DECISION_SPO_ARIGRAPH.md` (full decision text, commit `edef321`); `ARCHITECTURE_ENTROPY_LEDGER.md` lines 70 (SPO-1 row) + 245 (SPO-1 disposition); both stores retain "Wired" status, and the federated-cache framing reconciles the apparent overlap. The `lance-graph-ontology` crate (commit `4cf9a26`) is the agnostic schema/bridge spine; consumers route through `SchemaExpander`. SPO-1 itself does NOT close — only its disposition does; `promote_to_spo` remains queued. diff --git a/.claude/board/INTEGRATION_PLANS.md b/.claude/board/INTEGRATION_PLANS.md index 470de201..38f2a57c 100644 --- a/.claude/board/INTEGRATION_PLANS.md +++ b/.claude/board/INTEGRATION_PLANS.md @@ -35,6 +35,36 @@ - **Confidence** — **mutable**: Working / Partial / Broken — see PR #N +--- + +## ogit-cascade-supabase-callcenter-v1 — OGIT SPO-G + Supabase realtime + Zone 1/2/3 (authored 2026-05-07) + +- **Plan:** `.claude/plans/ogit-cascade-supabase-callcenter-v1.md` +- **Author + date:** main thread (Opus 4.7 1M), 2026-05-07. +- **Status:** Active. +- **Scope:** 15 deliverables across `lance-graph-callcenter`, `lance-graph-ontology`, AdaWorldAPI/OGIT (extension fork), and a future `lance-graph-rdf` consumer. Pillar 0 (the holy-grail click): `OntologyRegistry` IS the SoA; per-domain schema (Healthcare, WorkOrder, SMB, CallCenter, Medical) IS the DTO + name→row index. Codec cascade per row: identity Vsa16kF32 → CAM-PQ 6 B → Base17 34 B → palette key 4 B → Scent 1 B → qualia/meta/edge columns. Every step O(1). 
Pillar 1: OGIT as universal SPO-G lingua franca with `ontology_context_id: u32` per named graph. Pillar 2: Zone 1 (BindSpace, no Serialize) / Zone 2 (Arrow scalar membrane, BBB invariant) / Zone 3 (Supabase RPC, REST, transcode — the only emission point). Pillar 3: smb-bridge + medcare-bridge collapse to 2-line projections over `OntologyRegistry::enumerate(ns)`. Pillar 4: BioPortal arsenal — 10 namespace stubs under `OGIT/NTO/Medical/{ICD10CM,RxNorm,LOINC,FMA,RadLex,SNOMED,MONDO,HPO,DRON,CHEBI}/` carrying provenance + license + size, with full ingestion gated on `lance-graph-rdf-fma-snomed-v1`. +- **Originating context:** main-thread question 2026-05-07: *"should the lance-graph-ontology be the SoA and the schema the DTO + index?"* — answered YES, with the codec cascade chain making it content-addressable through every encoding tier (the holy grail). User-supplied references: `MedCare-rs/.MYSQL/Struktur.sql` (104 tables, 5 dominant prefixes) and `MedCare-rs/releases/tag/bioportal-ontologies-2026-05-05` (25 bundles, ~2.4 GB). +- **Resolves ledger rows:** none directly. **Hardens** v5's D-9 (`MulThresholdProfile` becomes `ontology_context_id`-aware, so medical thresholds are stricter than callcenter thresholds). **Locks down** the BBB membrane doctrine from `callcenter-membrane-v1.md` § 10.9 with a `cert-officer` static check (D-CASCADE-V1-1). +- **Branch:** `claude/create-graph-ontology-crate-gkuJG` (continues the v4/v5 thread). PR target: `AdaWorldAPI/lance-graph` base=`main`. OGIT-fork PRs land under the same branch on the OGIT-fork side. +- **Confidence (2026-05-07):** Pre-execution. Pillar 0 is the only architectural commitment that admits no rollback — and it is right per the existing `LazyLock<&OntologyRegistry>` pattern in `lance-graph-ontology/src/bridges/`. Top-3 ranked: D-CASCADE-V1-1, D-CASCADE-V1-2, D-CASCADE-V1-3 (no upstream blockers). 
+- **Cross-plan deps:** v5 D-9 (`MulThresholdProfile`), `lance-graph-rdf-fma-snomed-v1` (`SemanticQuad`), `supabase-subscriber-v1` (DM-4 watcher / DM-6 drain), `callcenter-membrane-v1` § 10.9 (BBB iron rule). +- **Out of v1 scope (deferrals):** full SNOMED CT import (license-gated; BioPortal release ships only 666 KB partial), full DRON / CHEBI import (size unclear-payoff; revisit after D-CASCADE-V1-11 measures cascade), n8n-rs / crewai-rust consumption of new SoA columns (separate plan), bgz-tensor attention layer integration (orthogonal). + +--- + +## lance-graph-ontology-v5 — post-merge follow-ons (authored 2026-05-07) + +- **Plan:** `.claude/plans/lance-graph-ontology-v5.md` +- **Author + date:** integration-lead (Opus 4.7 1M), 2026-05-07 +- **Status:** Active +- **Scope:** Picks up where v4 (`claude/create-graph-ontology-crate-gkuJG`, OGIT#1 merged) left off. 15 deliverables ranked by leverage / cost: D-ONTO-V5-1 (dcterms:source provenance, closes TTL-PROBE-5), D-ONTO-V5-2 (`arigraph::SpoBridge::promote_to_spo`, closes SPO-1), D-ONTO-V5-3 (Healthcare TTL transcode), D-ONTO-V5-4 (smb-ontology export-only, NOT migration — brutal-honest reversal, ratified by main thread 2026-05-07), D-ONTO-V5-5 (q2 TTL transcode), D-ONTO-V5-6/7 (MySQL/MSSQL `SchemaSource` impls), D-ONTO-V5-8 (customer admin form, owned by woa-rs surface), D-ONTO-V5-9 (ontology-aware MUL trust thresholds — registry as namespace-keyed lookup), D-ONTO-V5-10 (callcenter-bridge, deferred until SUBJECT-DTO-1 lands), D-ONTO-V5-11 (woa-rs 80/20 binary cut), D-ONTO-V5-12 (MUL publishers — Brier/damage/sandbox), D-ONTO-V5-13 (hydration parallelism), D-ONTO-V5-14 (Lance dictionary load probe), D-ONTO-V5-15 (in-memory → Lance-backed cutover). +- **Originating context:** v4 OGIT#1 merge (15 entities + 12 verbs in `NTO/WorkOrder/`, master); 36 ontology tests pass; cognitive-shader-driver wired (read-only registry attachment). +- **Resolves ledger rows:** TTL-PROBE-5 (D-ONTO-V5-1), SPO-1 (D-ONTO-V5-2 70+245). 
Partial leverage on MUL-ASSESS-1 (registry as namespace-keyed threshold table). No leverage on TRUST-1 / FLOW-1 / COMPASS-1 / PARSER-1 (out of scope; the ontology crate has no influence on enum consolidation or the cypher cold/hot split). +- **Branch:** `claude/onto-v5-` per deliverable; OGIT-fork PRs per namespace transcode. Upstream `almatoai/OGIT` is never PR'd (ratified 2026-05-07). +- **Confidence (2026-05-07):** Pre-execution. Plan reviews v4's outputs as FINDING-grade and v5's deferrals as honestly-deferred (not punted). Next-3 ranked: D-ONTO-V5-1, D-ONTO-V5-9, D-ONTO-V5-2. +- **Cross-ref:** `.claude/RECON_ONTOLOGY_CRATE.md`, `.claude/DECISION_SPO_ARIGRAPH.md`, `.claude/knowledge/ontology-registry.md`, `sql-spo-ontology-bridge-v1.md` (partially superseded), `foundry-roadmap-unified-smb-medcare-v1.md` (adjacent). +- **Ratifications (main-thread, 2026-05-07):** Q1 smb-ontology export-only — RATIFIED (consistent with v4 "preserved as native fallback"; not a contradiction). Q2 D-9 above D-2 ordering — RATIFIED (registry has zero behavioral consumer until V5-9 lands; SPO L1/L2 cache works without the bridge fn today). Q3 `MulThresholdProfile` location — RATIFIED in `lance-graph-contract` (zero-dep canonical home; co-located with `MulAssessment`). Q4 OGIT-fork upstream PR rule — RATIFIED (AdaWorldAPI/OGIT extension fork only; never PR back to almatoai/OGIT). + --- ## splat-osint-ingestion-v1 — Splat contract + EWA OSINT bridge (authored 2026-05-06) @@ -244,3 +274,8 @@ Phases 2–4 queued. **Scope:** Map the shared Foundry parity surface consumed by both smb-office-rs and medcare-rs. Resolve 5 callcenter UNKNOWNs (consumer-validated). Document the DataFusion/SQL groundtruth pattern. Identify shared build priorities (DM-8 PostgREST is P-0 for both). Ontology unification: one contract shape, two domain-specific instances. 
**Path:** `.claude/plans/foundry-consumer-parity-v1.md` **Cross-ref:** `smb-office-rs/docs/foundry-parity-checklist.md` (45 LF chunks); `medcare-rs` callcenter-as-owner architecture; `q2-foundry-integration-v1.md`; `lf-integration-mapping-v1.md`; `callcenter-membrane-v1.md` (UNKNOWNs resolved) + +## 2026-05-07 — Status annotation: `sql-spo-ontology-bridge-v1` partially superseded + +**Status:** Active (partially superseded by `lance-graph-ontology` crate, 2026-05-07) +**Note:** The `SchemaExpander` proposed in `sql-spo-ontology-bridge-v1` already shipped in earlier work, and the new `lance-graph-ontology` crate (commit `4cf9a26`, branch `claude/create-graph-ontology-crate-gkuJG`) consumes it as its sole bridge surface. The plan's Phase 4 (NARS cold sink) and `promote_to_spo` writer bridge remain owned by the original plan. Recon + decision for the new crate: `.claude/RECON_ONTOLOGY_CRATE.md` + `.claude/DECISION_SPO_ARIGRAPH.md` (prior commit `edef321`). Federated two-layer cache (Option B): SPO + ARiGraph triplet_graph are not duplicates by design; entropy-ledger rows 70 + 245 cite the L1/L2 cache pair. APPEND-ONLY annotation; original plan entry not edited. diff --git a/.claude/board/LATEST_STATE.md b/.claude/board/LATEST_STATE.md index ecc5bec6..927cd04b 100644 --- a/.claude/board/LATEST_STATE.md +++ b/.claude/board/LATEST_STATE.md @@ -218,3 +218,19 @@ the contract. This file exists to prevent that. 
| **#270** | 2026-04-26 | ci: remove typos spell-check job (too many false positives) | Removed crate-ci/typos from style.yml; cargo fmt --check remains | | **#269** | 2026-04-26 | feat: Distance trait + SIMD Hamming/cosine wiring + PaletteDistanceTable + Dockerfile docs | Distance trait; SIMD Hamming/cosine wiring; PaletteDistanceTable 128KB; Dockerfile.md | + +--- + +## 2026-05-07 — Append: lance-graph-ontology shipped (commit 4cf9a26, branch claude/create-graph-ontology-crate-gkuJG) + +(Per APPEND-ONLY rule: this dated annotation augments the "Recently Shipped PRs" table and "Current Contract Inventory" snapshot above. Treat the row below as the new top-of-table entry; treat the inventory paragraph below as a new top-of-inventory entry.) + +### Recently Shipped PRs — new top row + +| PR | Merged | Title | What it added | +|---|---|---|---| +| **(open / pending merge)** | *(open)* | feat(lance-graph-ontology): scaffold OGIT-canonical ontology spine | New workspace member `crates/lance-graph-ontology/` (~3000 LOC, 28 tests = 16 inline + 12 integration). Phases 3-5 of the v4 plan: scaffold + TTL hydration + tenant bridges. Public surface: `OntologyRegistry`, `NamespaceBridge` trait, `NamespaceId`, `OgitUri`, `SchemaPtr`, `SchemaKind`, `MappingProposal`, `MappingProposalKind`, `MappingRow`, `MappingHandle`, `HydrationReport`, `HydrationFailure`, `BridgeError`, `Error`, `SchemaSource` trait, `EntityRef`, `EdgeRef`, `OntologyAssembler`, `SemanticTypeMap`, `TtlSource`. Default tenant bridges: `bridges::WoaBridge`, `bridges::MedcareBridge`, `bridges::OgitBridge`. Feature-gated `lance_cache::LanceWriter` (under `lance-cache` feature, gated to keep zero-protoc compile path). Builds on prior commit `edef321` (recon + SPO-1 decision: federated two-layer cache, Option B). | + +### Current Contract Inventory — new entry + +**`lance-graph-ontology`** (new crate, 2026-05-07): consolidates per-tenant bridge multiplication into one ontology spine. 
OGIT becomes the canonical TTL ontology source; Lance is the (feature-gated) runtime dictionary cache; tenant bridges become thin scoped views over the shared registry. Public types: `OntologyRegistry`, `NamespaceBridge` trait, `NamespaceId`, `OgitUri`, `SchemaPtr`, `SchemaKind`, `MappingProposal`, `MappingProposalKind`, `MappingRow`, `MappingHandle`, `HydrationReport`, `HydrationFailure`, `BridgeError`, `Error`, `SchemaSource` trait, `EntityRef`, `EdgeRef`, `OntologyAssembler`, `SemanticTypeMap`, `TtlSource`. Default tenant bridges: `bridges::WoaBridge`, `bridges::MedcareBridge`, `bridges::OgitBridge`. 28 tests passing (16 inline + 12 integration). Feature-gated Lance persistence under `lance-cache` (kept off by default so the crate compiles without `protoc`, which `lance-encoding`'s build-script requires). Branch `claude/create-graph-ontology-crate-gkuJG`; commit `4cf9a26`; prior recon + decision in `edef321` (`.claude/RECON_ONTOLOGY_CRATE.md`, `.claude/DECISION_SPO_ARIGRAPH.md`). diff --git a/.claude/board/PR_ARC_INVENTORY.md b/.claude/board/PR_ARC_INVENTORY.md index a05fc81d..82be9a54 100644 --- a/.claude/board/PR_ARC_INVENTORY.md +++ b/.claude/board/PR_ARC_INVENTORY.md @@ -1151,3 +1151,41 @@ Removes `crate-ci/typos` spell-check job from `style.yml`; `cargo fmt --check` r **Deferred:** — **Docs:** `Dockerfile.md`, `.claude/board/EPIPHANIES.md`, `.claude/board/TECH_DEBT.md` + +--- + +## (open / pending merge) — feat(lance-graph-ontology): scaffold OGIT-canonical ontology spine (2026-05-07) + +(Per APPEND-ONLY rule: PR sections are reverse-chronological; this dated entry is the new top-of-arc entry. Reverse-chronologically newest, even though it sits at the file end under tee-a governance.) + +**Confidence (2026-05-07):** High. 28 tests passing (16 inline + 12 integration). Builds without `protoc` because Lance persistence is feature-gated. 
+ +**Branch:** `claude/create-graph-ontology-crate-gkuJG` +**Commit:** `4cf9a26` (prior recon + SPO-1 decision: `edef321`) + +**Added:** +- New workspace member `crates/lance-graph-ontology/` (~3000 LOC). Cargo.toml with feature-gated `lance-cache` so the crate compiles without `protoc` (lance-encoding's build-script otherwise requires it). +- `src/lib.rs` public surface; modules `error`, `namespace`, `proposal`, `semantic_types`, `ttl_parse`, `foundry_map`, `registry`, `bridge`, `schema_source`. +- Public types: `OntologyRegistry`, `NamespaceBridge` (trait), `NamespaceId`, `OgitUri`, `SchemaPtr`, `SchemaKind`, `MappingProposal`, `MappingProposalKind`, `MappingRow`, `MappingHandle`, `HydrationReport`, `HydrationFailure`, `BridgeError`, `Error`, `SchemaSource` (trait), `EntityRef`, `EdgeRef`, `OntologyAssembler`, `SemanticTypeMap`, `TtlSource`. +- Default tenant bridges `bridges::WoaBridge`, `bridges::MedcareBridge`, `bridges::OgitBridge` (thin scoped views over the shared registry, ~20 LOC each per the v4 plan). +- `src/semantic_types.toml`: declarative OGIT-attribute → SemanticType map (the only TOML in the crate; ontology data itself is TTL). +- `src/lance_cache.rs` (feature-gated `lance-cache`): `LanceWriter` for runtime dictionary persistence. +- Phase 3 (scaffold), Phase 4 (TTL hydration), Phase 5 (tenant bridges) of the v4 plan. + +**Locked:** +- **OGIT TTL is the canonical ontology source.** Lance is the runtime dictionary cache, not the source of truth. +- **Tenant bridges are thin scoped views** over the shared `OntologyRegistry`, not independent ontology multiplication. +- **Lance persistence is feature-gated** under `lance-cache`; the default compile path requires no `protoc`. +- **Federated two-layer cache (Option B) for SPO + ARiGraph**, per `.claude/DECISION_SPO_ARIGRAPH.md` (entropy-ledger rows 70 + 245: SPO + ARiGraph triplet_graph are not duplicates by design — they are an L1/L2 cache pair). 
The ontology crate is agnostic; it produces `Ontology` values; consumers route via `SchemaExpander`. Does NOT close SPO-1 — `promote_to_spo` bridge work remains separately owned. +- **`SchemaExpander` consumer point** (already shipped in earlier work) is the one bridge surface the ontology crate writes through; the prior `sql-spo-ontology-bridge-v1` plan's `SchemaExpander` proposal is therefore partially superseded (the expander shipped, the bridge plan's surface is now produced). + +**Deferred:** +- Lance feature-gated compile path requires `protoc` to actually exercise the `lance-cache` feature; default compile path stays clean. Activating `lance-cache` in CI is deferred pending a `protoc` install step or a vendored protobuf descriptor. +- SPO-1 closure (`promote_to_spo` writer bridge between `arigraph::triplet_graph` and `spo::store`) — owned separately, not by this crate. +- Phases 6-7 of the v4 plan (canonical TTL emission for WoA / Healthcare into `AdaWorldAPI/OGIT/NTO/`; Cypher integration test routing around PARSER-1 stub via `lance_graph::parser::parse_cypher_query`). +- Tenant rosters beyond WoA / MedCare / OGIT. + +**Docs:** +- `.claude/RECON_ONTOLOGY_CRATE.md` (Phase 1 recon, commit `edef321`). +- `.claude/DECISION_SPO_ARIGRAPH.md` (SPO-1 decision, commit `edef321`). +- This board update (LATEST_STATE.md table + Inventory; INTEGRATION_PLANS.md status annotation on `sql-spo-ontology-bridge-v1`; EPIPHANIES.md SPO-1 disposition entry; AGENT_LOG.md run entry). diff --git a/.claude/board/TECH_DEBT.md b/.claude/board/TECH_DEBT.md index 934cbdb0..a4d46899 100644 --- a/.claude/board/TECH_DEBT.md +++ b/.claude/board/TECH_DEBT.md @@ -1483,3 +1483,10 @@ documents the correct direction so future work doesn't re-derive it. **Introduced by:** PR #329 (audit surfaced) **Author's words:** "Most 'drift' in the standalones is intentional author style (single-line `if`s, visually-aligned struct literals, two-space-comment alignment). 
No CI gate exists to lock the canonical style. Two viable follow-up paths: Path A (per-crate rustfmt.toml overrides) or Path B (mass-rewrite + CI gate for every crate)." + +## 2026-05-07 — TTL-PROBE-5: dcterms:source dropped during TTL hydration +**Status:** Open +**Priority:** P2 +**Scope:** @truth-architect lance-graph-ontology +**Description:** When a TTL declares `dcterms:source ` for an entity, the parser at `crates/lance-graph-ontology/src/ttl_parse.rs` ignores it and writes `source_uri = "file:"` to the dictionary instead. The probe `dcterms_source_is_currently_dropped` in `tests/round_trip_ttl.rs` locks this current-but-undesired behaviour. Real OGIT TTLs do carry `dcterms:source` provenance; losing it cripples upstream-pull / round-trip-export workflows. +**Followup:** Extend `parse_into_proposals` to look for `` triples per subject; if present, prefer that IRI over the local file path. Flip the assertion in the probe so it asserts the dcterms IRI is preserved. Close this row. diff --git a/.claude/knowledge/ontology-registry.md b/.claude/knowledge/ontology-registry.md new file mode 100644 index 00000000..d8fb177b --- /dev/null +++ b/.claude/knowledge/ontology-registry.md @@ -0,0 +1,187 @@ +# KNOWLEDGE: Ontology Registry — `lance-graph-ontology` Crate Map + +## READ BY: `workspace-primer`, `host-glove-designer`, `bus-compiler`, +## `tenant-bridge-author` (future). MANDATORY before any work +## touching tenant bridges, OGIT TTL ingest, MappingProposal +## producers, or schema-source wiring. + +## Status: FINDING / SHIPPED. Crate scaffolded and tested in PR commits +## `4cf9a26` (initial structure + bridge trait) and `edef321` (TTL hydrator, +## scope-lock tests, three default bridges). 28 tests pass across +## `tests/bridge_scope_lock.rs`, `tests/hydrate_real_ogit.rs`, and +## `tests/round_trip_ttl.rs`. 
Phases 1–5 of the unified-ontology plan are +## done; Phases 6–7 (WorkOrder TTL emission, woa-rs/MedCare-rs/q2 binary +## wiring, BindSpace consumer) are queued for follow-up sessions. + +--- + +## Thesis + +`lance-graph-ontology` is the OGIT-canonical ontology spine for every +lance-graph tenant. It hydrates TTL files (today, the AdaWorldAPI/OGIT +fork; tomorrow, MySQL/MSSQL schema scanners and a customer admin form) +into `MappingProposal` DTOs, accumulates them into a single +`OntologyRegistry` keyed by `(bridge_id, public_name)` and by raw OGIT +URI, optionally persists rows append-only to a Lance dictionary table, +and exposes the registry to consumers as thin scoped views called +*namespace bridges*. The crate invents no new storage layer — it is a +parser plus a cache plus a scoping facade over the existing +`lance-graph-contract::ontology` (`Ontology` / `Schema` / `LinkSpec` / +`SchemaExpander`) surface. Tenant bridges are ~15-20 lines each, lock +every operation to one OGIT namespace at construction time, and route +resolution through the shared registry. The registry never decides which +SPO substrate a triple ends up in; that remains the consumer's choice +through the existing `SchemaExpander` paths. + +## Producer → Consumer map + +The crate is the narrow waist between four producers (one shipped, three +future) and three current consumers (plus tenant binaries downstream). +Everything funnels through `MappingProposal` on the way in and through +the `NamespaceBridge` trait on the way out. 
+ +| Layer | Component | File / Surface | Status | +|---|---|---|---| +| Producer | OGIT TTL hydrator | `src/ttl_parse.rs`, `src/schema_source.rs` walking `AdaWorldAPI/OGIT/NTO//` | SHIPPED | +| Producer | MySQL `SchemaSource` | `src/schema_source.rs` (trait); impl pending | FUTURE | +| Producer | MSSQL `SchemaSource` | `src/schema_source.rs` (trait); impl pending | FUTURE | +| Producer | Customer admin form | future UX layer emitting `MappingProposal` | FUTURE | +| Waist | `MappingProposal` + `MappingRow` | `src/proposal.rs` | SHIPPED | +| Spine | `OntologyRegistry` | `src/registry.rs` (in-memory dictionary) | SHIPPED | +| Spine | Lance dictionary cache | `src/lance_cache.rs` (feature-gated `lance-cache`) | SHIPPED | +| Spine | `NamespaceBridge` trait + `BridgeFromRegistry` | `src/bridge.rs` | SHIPPED | +| Default bridge | `OgitBridge` (raw-URI pass-through, per-namespace) | `src/bridges/ogit_bridge.rs` | SHIPPED | +| Default bridge | `WoaBridge` (`WorkOrder` namespace) | `src/bridges/woa_bridge.rs` | SHIPPED | +| Default bridge | `MedcareBridge` (`Healthcare` namespace) | `src/bridges/medcare_bridge.rs` | SHIPPED | +| Consumer | `lance-graph-callcenter::ontology_dto` | callcenter DTO surface (existing) | INTEGRATES NEXT | +| Consumer | `lance-graph::graph::spo::ontology_bridge` | existing `SchemaExpander` writer into SPO | UNCHANGED | +| Consumer | `cognitive-shader-driver::BindSpace` (Phase 7) | future MetaWord / MetaColumn emission | QUEUED | +| Consumer | Tenant binaries (`woa-rs`, `MedCare-rs`, `q2`) | construct one bridge per tenant | QUEUED | + +The waist is intentional: every producer becomes a `SchemaSource` impl +and emits `MappingProposal { kind, schemas, links, rows }`; every +consumer holds an `Arc` and constructs its bridge once +at startup. Adding a new producer or a new tenant never requires +touching the spine. 
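The narrow-waist shape described above can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the crate's code: `Proposal`, `Source`, `Registry`, `ScopedBridge`, and `LookupError` are hypothetical stand-ins for `MappingProposal`, `SchemaSource`, `OntologyRegistry`, a `NamespaceBridge` implementor, and `BridgeError`. The `example:` URIs, the `(namespace, public_name)` key, and the first-writer-wins collision rule are invented for the example.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for `MappingProposal`: one entity a producer wants registered.
#[derive(Debug, Clone, PartialEq)]
struct Proposal {
    namespace: String,
    public_name: String,
    uri: String, // illustrative URI, not the real OGIT form
}

/// Convenience constructor used only by this sketch.
fn proposal(namespace: &str, public_name: &str) -> Proposal {
    Proposal {
        namespace: namespace.to_string(),
        public_name: public_name.to_string(),
        uri: format!("example:{namespace}/{public_name}"),
    }
}

/// Hypothetical stand-in for `SchemaSource`: anything that can emit proposals.
trait Source {
    fn proposals(&self) -> Vec<Proposal>;
}

/// A producer backed by a fixed list (a TTL walker or DB scanner would implement the same trait).
struct StaticSource(Vec<Proposal>);

impl Source for StaticSource {
    fn proposals(&self) -> Vec<Proposal> {
        self.0.clone()
    }
}

/// Hypothetical stand-in for `OntologyRegistry`: the shared spine every producer feeds.
#[derive(Default)]
struct Registry {
    by_name: HashMap<(String, String), Proposal>,
}

impl Registry {
    /// Ingests a source and returns how many proposals collided with existing rows.
    /// Assumption for this sketch: the first writer wins and collisions are only counted.
    fn hydrate(&mut self, source: &dyn Source) -> usize {
        let mut collisions = 0;
        for p in source.proposals() {
            let key = (p.namespace.clone(), p.public_name.clone());
            if self.by_name.contains_key(&key) {
                collisions += 1;
            } else {
                self.by_name.insert(key, p);
            }
        }
        collisions
    }
}

#[derive(Debug, PartialEq)]
enum LookupError {
    NotInScope,
}

/// A consumer-side view locked to one namespace at construction time.
struct ScopedBridge {
    registry: Arc<Registry>,
    namespace: String,
}

impl ScopedBridge {
    fn new(registry: Arc<Registry>, namespace: &str) -> Self {
        Self { registry, namespace: namespace.to_string() }
    }

    /// Resolves a name inside the bridge's own namespace only.
    fn entity(&self, public_name: &str) -> Result<&Proposal, LookupError> {
        self.registry
            .by_name
            .get(&(self.namespace.clone(), public_name.to_string()))
            .ok_or(LookupError::NotInScope)
    }
}

/// Two independent producers feed one registry; neither requires a registry change.
fn demo_registry() -> Arc<Registry> {
    let ttl_like = StaticSource(vec![proposal("WorkOrder", "Customer")]);
    let scanner_like = StaticSource(vec![proposal("Healthcare", "Patient")]);
    let mut registry = Registry::default();
    registry.hydrate(&ttl_like);
    registry.hydrate(&scanner_like);
    Arc::new(registry)
}

fn main() {
    let registry = demo_registry();
    let woa = ScopedBridge::new(Arc::clone(&registry), "WorkOrder");
    println!("{:?}", woa.entity("Customer"));
    println!("{:?}", woa.entity("Patient"));
}
```

In this sketch a second producer is just another `Source` implementor, and a second tenant is just another `ScopedBridge` over the same `Arc<Registry>`. That mirrors the claim that neither addition touches the spine, although the real trait and error shapes may differ.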
+ +## The five-step "I want to add a tenant bridge" recipe + +A new tenant bridge (say a `qualicare` bridge over `Healthcare` or +`q2` over `WorkOrder`) is mechanical. The default methods on +`NamespaceBridge` carry resolution and scope-lock; the new struct +supplies four constants and a constructor. + +1. **Define the struct** holding `Arc` and a cached + `NamespaceId` (the `g_lock`). Mirror `bridges/medcare_bridge.rs` — + it is the smallest, ~45 LOC end-to-end including imports. +2. **Implement `NamespaceBridge`** — supply `bridge_id() -> &'static + str`, `registry() -> &OntologyRegistry`, `g_lock() -> NamespaceId`. + The trait's default `entity()`, `edge()`, `entity_by_uri()`, `row()` + handle resolution and the cross-namespace leak check. +3. **Implement `BridgeFromRegistry`** so callers can use the generic + `bridge::make_bridge::(registry)?` constructor. One line: + delegate to `Self::new(registry)`. +4. **Write a scope-lock test** in `tests/bridge_scope_lock.rs` (extend, + do not duplicate the file) verifying that `entity("ForeignName")` + returns `BridgeError::NotInScope` or `BridgeError::CrossNamespaceLeak`. + That single test is what ratifies the bridge as scoped. +5. **Re-export from `bridges/mod.rs`** and ship. The tenant binary + constructs one instance at startup and uses `bridge.entity("...")` + throughout — no other plumbing required. + +LOC budget for steps 1–3: ~15-20 lines. The recipe is the same whether +the namespace already exists in TTL (`WorkOrder`, `Healthcare`, +`Network`, etc.) or is reserved for future hydration; an unknown +namespace fails at construction with `Error::UnknownNamespace`, which +is the right time to fail. + +## The three-step "I want to extend the ontology" recipe + +Three producer pathways exist; each ships a `MappingProposal` and the +spine integrates the rest. + +1. 
**TTL fork PR.** Add or edit a TTL file under + `AdaWorldAPI/OGIT/NTO//entities/.ttl` + following the convention in `NTO/Network/entities/IPAddress.ttl` + (prefixes, `rdfs:Class`, `rdfs:subClassOf ogit:Entity`, + mandatory/optional/indexed attribute lists, `ogit:allowed` block). + The hydrator picks it up next time the registry is rebuilt; no Rust + code changes. +2. **Schema scanner.** Implement `SchemaSource` against a database + driver (MySQL, MSSQL, Postgres). Map the source schema's tables / + columns / FKs to `MappingProposal::Schemas` rows. Future work; the + trait shape exists today and the OGIT hydrator is the reference impl. +3. **Customer admin form.** A future UX layer where a customer's + administrator paints entities and edges; the form emits + `MappingProposal` directly. Same append path as the other two; no + spine change required. + +The three producers are not in tension. They can run side-by-side +against the same registry; collisions on `(bridge_id, public_name)` or +on raw OGIT URI surface as `HydrationReport` warnings. + +## SPO-1 disposition + +This crate adopts **Option B (federated)** from +`.claude/DECISION_SPO_ARIGRAPH.md`. The two SPO stores +(`lance-graph::graph::spo::*`, fingerprint-keyed cold; and +`lance-graph::graph::arigraph::triplet_graph`, string-keyed warm) are +not duplicates by design — they are an L1/L2 cache pair serving +fundamentally different operations. The ontology registry has zero +opinion about which one a downstream consumer chooses. Hydration +produces `Ontology` values; the existing `SchemaExpander` path produces +`ExpandedTriple`s; whichever store the consumer holds is where the +triples land. Closing SPO-1 (the `arigraph::SpoBridge::promote_to_spo` +writer) is unblocked by this decision but remains owned by SPO-1's own +entropy-ledger row; it is explicitly out of scope here. + +## What's NOT here (non-goals) + +The crate is deliberately small. 
None of the following lives in +`lance-graph-ontology` and proposals to add any of them should be +redirected to their owners. + +- No new SPO store, no new SPOG quad store, no janus-driver — the + existing `lance-graph::graph::spo::*` and `arigraph::triplet_graph` + remain canonical. +- No new SPARQL / Gremlin / GQL / Cypher parsers — those live in + `lance-graph-planner::strategy::*`. Phase 7's woa-rs Cypher + integration test routes through `lance_graph::parser:: + parse_cypher_query` directly (PARSER-1 stub gap is documented). +- No MUL changes, no new `CausalEdge64` variants, no new `BindSpace` + columns. The shader-driver MUL veto at `driver.rs:271-320` is + unchanged. +- No `smb-bridge` or `callcenter-bridge` ship in this session. SMB + stays on its native `smb-ontology` declarative Rust fallback; + callcenter has auth + per-customer scoping concerns that warrant + their own design pass. +- No `promote_to_spo` writer — that closes SPO-1, not this crate's row. + +## Pointers + +- `.claude/RECON_ONTOLOGY_CRATE.md` — Phase 1 reconnaissance: + verified-shipped table, OGIT NTO/ inventory, woa-rs / WoA state, + PARSER-1 caveat. +- `.claude/DECISION_SPO_ARIGRAPH.md` — binding ruling for SPO-1 within + this crate's scope (Option B, federated). +- `crates/lance-graph-ontology/src/lib.rs` — module surface, + re-exports, "What this crate is NOT" doc-comment. +- `crates/lance-graph-ontology/src/bridge.rs` — `NamespaceBridge` + + `BridgeFromRegistry` + `BridgeError` definitions. +- `crates/lance-graph-ontology/src/registry.rs` — the + `OntologyRegistry` itself. +- `crates/lance-graph-ontology/src/bridges/{medcare,ogit,woa}_bridge.rs` + — the three default tenant bridges; copy `medcare_bridge.rs` as a + template. +- `crates/lance-graph-ontology/tests/bridge_scope_lock.rs` — the + scope-lock test pattern every new bridge must extend. +- `crates/lance-graph-ontology/tests/hydrate_real_ogit.rs` — end-to-end + TTL → registry hydration smoke test against the real OGIT fork. 
+- `crates/lance-graph-ontology/tests/round_trip_ttl.rs` — proposal → + registry → row → resolved-URI round trip. +- `crates/lance-graph-contract/src/ontology.rs` and `property.rs` — + upstream `Ontology` / `Schema` / `PropertySpec` / `Marking` / + `SemanticType` / `SchemaExpander` definitions the crate is a facade + over. +- `crates/lance-graph/src/graph/spo/ontology_bridge.rs` — existing + `SchemaExpander` integration that consumes the crate's output. diff --git a/.claude/plans/lance-graph-ontology-v5.md b/.claude/plans/lance-graph-ontology-v5.md new file mode 100644 index 00000000..92c27aab --- /dev/null +++ b/.claude/plans/lance-graph-ontology-v5.md @@ -0,0 +1,177 @@ +# Plan: lance-graph-ontology v5 — post-merge follow-ons + +> **Status:** Drafted (2026-05-07). Picks up where v4 (`claude/create-graph-ontology-crate-gkuJG`) +> left off after `AdaWorldAPI/OGIT#1` merged. Doctrine: brutally honest review + +> super helpful solutions. Per CLAUDE.md "Documentation prose, not lists" — +> body sections are prose, lists are reserved for genuine enumerations. +> +> **Author:** integration-lead (Opus 4.7 1M). +> **Branches in flight:** `lance-graph: claude/create-graph-ontology-crate-gkuJG` +> (last commit `34939e8`); `woa-rs: claude/create-graph-ontology-crate-gkuJG` +> (last commit `c881b1c`); `OGIT: master` (PR #1 merged). +> **Reads:** `.claude/board/AGENT_LOG.md`, `.claude/RECON_ONTOLOGY_CRATE.md`, +> `.claude/DECISION_SPO_ARIGRAPH.md`, `.claude/knowledge/ontology-registry.md`, +> `.claude/board/ARCHITECTURE_ENTROPY_LEDGER.md`, `.claude/board/TECH_DEBT.md` +> (TTL-PROBE-5), `.claude/board/INTEGRATION_PLANS.md` (top format), +> `.claude/plans/sql-spo-ontology-bridge-v1.md` (partially superseded), +> `.claude/plans/foundry-roadmap-unified-smb-medcare-v1.md`, +> `crates/lance-graph-ontology/src/lib.rs`, `WoA/RUST_TRANSCODE_PLAN.md`. 
+ +--- + +## §1 Brutally honest stocktake + +### What v4 actually shipped + +The v4 session shipped a working OGIT-canonical ontology spine and the first end-to-end proof that a TTL fork could hydrate into a tenant binary's runtime. Concretely, the new crate at `/home/user/lance-graph/crates/lance-graph-ontology/` is 12 source files plus 3 integration tests. Of those, `registry.rs` (18 KB) and `ttl_parse.rs` (21.6 KB) are the load-bearing pair: the TTL parser builds `MappingProposal` DTOs via `oxttl`, and the registry indexes them by `(bridge_id, public_name)` and by raw OGIT URI. The `lance_cache.rs` module (15 KB, feature-gated) provides the append-only Lance dictionary persistence path. Three default bridges (`woa_bridge.rs`, `medcare_bridge.rs`, `ogit_bridge.rs`) demonstrate the ~45-LOC scoped-view pattern, each with a scope-lock test in `tests/bridge_scope_lock.rs`. The `cognitive-shader-driver` consumer wiring is a single `Option>` field on `BindSpace` plus a setter and getter at `bindspace.rs:198/239/244` — not a usage, just an attachment point with a doc-comment flagging future MUL-trust integration. **FINDING (verified end-to-end).** + +The cross-repo deliverables are: `AdaWorldAPI/OGIT#1` (commit `3871d37` on the OGIT fork; 27 TTL files at `NTO/WorkOrder/` covering 15 entities and 12 verbs) which **merged to master** and is now baseline for any future OGIT clone; and the woa-rs binary scaffolding at `/home/user/woa-rs/` which compiles against the registry but cannot run end-to-end locally because of a missing `protoc` build dependency for the lance-encoding transitive build script. The `cargo test -p lance-graph-ontology` suite reports 36 passing tests across 4 files (18 lib + 6 bridge_scope_lock + 2 hydrate_real_ogit + 9 round_trip_ttl + 1 from the workorder hydrate added during integration-lead review) and the OGIT fork is verifiably parsed by `oxttl 0.5.8` end-to-end. 
**FINDING.** + +The board-hygiene was honored: `LATEST_STATE.md`, `PR_ARC_INVENTORY.md`, `EPIPHANIES.md`, `INTEGRATION_PLANS.md`, and `TECH_DEBT.md` all received append-only updates totaling 74 insertions / 0 deletions. `AGENT_LOG.md` was created from scratch and now carries the 12-agent + 3-corrections trace. The Phase-7 invariant (cognitive-shader-driver MUL gate at `driver.rs:271-320` untouched) holds — the v4 session deliberately did not touch the trust-threshold machinery and confined itself to the registry attachment point. **FINDING.** + +### What v4 punted on + +The `smb-ontology` crate (2,079 LOC of declarative Rust at `smb-office-rs/crates/smb-ontology/`, 13 Steuerberater entities) was kept as the OGIT-skeptical-customer fallback rather than being TTL-migrated. Reason: foot-in-the-door deployments per `WoA/RUST_TRANSCODE_PLAN.md` need a self-contained native ontology that survives without an OGIT clone on disk, and the migration would have ballooned the v4 scope past the merge window. **CONJECTURE / status: deliberate deferral, see D-ONTO-V5-4 for the brutal-honest review of whether to actually convert.** + +The `callcenter-bridge` was deferred because callcenter has separate auth (JWT middleware, RLS rewriter per actor context — see `foundry-roadmap-unified-smb-medcare-v1.md` §2 LF-3/DM-7) and per-customer scoping concerns. Adding a fourth default bridge that also has to coordinate with `Subject` (currently SUBJECT-DTO-1 in the entropy ledger; Stage 0 / Aspirational) would have coupled two architectural questions; v4 picked the right axis to defer. **FINDING.** + +The MySQL and MSSQL `SchemaSource` impls were deferred. The trait shape exists at `src/schema_source.rs` but the only producer today is the OGIT TTL directory walker. Reason: getting one end-to-end TTL path working before a second producer was correct sequencing. 
**FINDING.** + +The customer admin form was deferred because it is React/HTML, not Rust, and the Rust boundary (a `MappingProposal` emitter) is the entire ontology-side concern; the UX layer is woa-rs's territory. **FINDING.** + +The ontology-aware MUL trust thresholds (Compliance → Plateau-only; Healthcare → stricter calibration) were deferred to keep Phase-7 to a read-only attachment. The `BindSpace.ontology` slot is reachable but no MUL gate logic reads from it yet. **FINDING.** Brier-history, damage-budget, and sandbox-availability MUL publishers stayed at `SituationInput::default()` for the same reason — those are publisher-side concerns owned by the cognitive-shader-driver crate, not the ontology crate. **FINDING.** + +`TTL-PROBE-5` (the dcterms:source provenance drop at parse) was logged as a regression test instead of fixed because the fix path (extend `parse_into_proposals` to look for `` triples) is well-scoped, the test locks current behavior so future fixes are detectable, and shipping the fix would have required rerunning the OGIT fork validation. **FINDING / proper deferral.** + +`SPO-1` (the `arigraph::SpoBridge::promote_to_spo` writer bridge) is unblocked but unowned. The v4 session chose Option B (federated, two-layer cache) per `DECISION_SPO_ARIGRAPH.md` which is the correct architectural reading; it does not close the entropy-ledger row. **FINDING.** + +`PARSER-1` (the planner cypher_parse stub) was correctly framed by the main-thread correction at `AGENT_LOG.md` line 187+: cold-path vs hot-path is by design, "consolidation later, not now," not "the planner has a bug to fix." woa-rs uses the cold path (`lance_graph::parser::parse_cypher_query`) for diagnostic clarity. **FINDING.** + +## §2 Open ledger rows that v4 did NOT close + +`SPO-1` (Stage 3, ×2 distinct purposes / not duplicates by design) is the biggest row v4 had explicit influence over. 
The L1/L2 cache pair framing (warm string-keyed `triplet_graph` + cold fingerprint-keyed `spo::store`) is now binding doctrine for the ontology crate per `DECISION_SPO_ARIGRAPH.md`. The smallest-cost path to actually closing the row is to ship the one-way writer: `arigraph::SpoBridge::promote_to_spo(&TripletGraph, gate, &mut SpoStore)` as a ~150 LOC addition to `crates/lance-graph/src/graph/arigraph/`, gated by a `gate: PromoteGate` parameter (probably `truth_floor: TruthValue` + `min_episodes: u32`). The ontology crate does not need to ship this (it is `lance-graph` core work), but the v5 plan can name it as D-ONTO-V5-2 since the scope-lock + bridge-trait pattern in `lance-graph-ontology` is the natural template.
+
+`PARSER-1` (Stage 3 lance-graph::parser real + Stage 1 planner stubs) is correctly NOT touched by ontology work. The planner stubs are intentionally lighter than the cold-path parser; consolidating them is a separate track. The ontology crate has zero leverage here and v5 should not pretend otherwise. **No D-id; reference for future readers only.**
+
+`TTL-PROBE-5` (TECH_DEBT row at line 1487, dcterms:source dropped at parse) is the one row the ontology crate fully owns. The fix is local: `crates/lance-graph-ontology/src/ttl_parse.rs` `parse_into_proposals`, plus a flip of the regression test in `tests/round_trip_ttl.rs` from "asserts dropped" to "asserts preserved." This is D-ONTO-V5-1.
+
+`MUL-ASSESS-1` (Stage 2 ×4) is partially leverageable from v5. The ontology registry is the natural lookup surface for namespace-keyed trust thresholds (Compliance → Plateau-only; Healthcare → stricter calibration). The fix is to grow `OntologyRegistry` with `mul_threshold(namespace: NamespaceId) -> MulThreshold` and have `cognitive-shader-driver` read it before constructing `SituationInput`. This is D-ONTO-V5-9.
**The row is not closed by D-ONTO-V5-9** (the row is about consolidating the four `MulAssessment` copies into one canonical type), but the registry does become a natural canonical source for the per-namespace threshold table.
+
+`TRUST-1` (Stage 2 ×3 incompatible variant sets), `FLOW-1` (Stage 2 ×3), and `COMPASS-1` (Stage 2 ×3 incompatible) are all consolidation rows in the Thinking cluster. The ontology crate has **zero leverage** on the consolidation step (collapsing the duplicate enums); the registry might at most carry a per-namespace thinking-style preference table, but that is not what the rows ask for. v5 should explicitly NOT pretend to address these.
+
+`CONTRACT-INV-1` (Stage n/a, board hygiene) is partially closed by ontology work: `OntologyRegistry`, `WoaBridge`, `MedcareBridge`, `OgitBridge`, `HydrationReport`, `OgitUri`, and `NamespaceBridge` have been added to the `LATEST_STATE.md` Contract Inventory per the v4 governance pass. Continued vigilance is needed during v5 deliveries: each new contract type must update the inventory in the same commit. **No D-id; ongoing discipline.**
+
+## §3 Deliverables — D-IDs (rank-ordered by leverage / cost)
+
+D-ids are strictly ordered by leverage-over-cost ratio. The first three are the next-3 (ship within 1-2 sessions); ranks 4-6 are six months out; ranks 7-15 are past the quarter.
+
+### Next-3 (ship now, leverage > cost by ≥ 3×)
+
+**D-ONTO-V5-1 — dcterms:source provenance.** Closes TTL-PROBE-5 properly. Scope: extend `parse_into_proposals` in `crates/lance-graph-ontology/src/ttl_parse.rs` to look for `` triples per subject and prefer that IRI over the local file path when present. Files touched: `src/ttl_parse.rs` (~80 LOC), `tests/round_trip_ttl.rs` (flip the `dcterms_source_is_currently_dropped` probe assertion and rename it), `src/proposal.rs` (`MappingProposal::source_uri` already exists, so no API change).
Exit criteria: the renamed probe asserts that the dcterms IRI from a TTL like `WoA/models.py:Customer` is preserved verbatim through to `MappingRow::source_uri` for all 27 WorkOrder TTLs in the merged OGIT fork; closes TTL-PROBE-5 in `TECH_DEBT.md` with a "resolved" annotation. Dependencies: none. Risk: low (local, well-tested). + +**D-ONTO-V5-9 — Ontology-aware MUL trust thresholds.** The deferred Phase-7 work; the highest-leverage cognitive integration v5 can ship without touching MUL math. Scope: grow `OntologyRegistry` with a small `mul_threshold(namespace: NamespaceId) -> Option` returning a per-namespace override (e.g., `Healthcare → MulThresholdProfile::stricter()`, `Compliance → MulThresholdProfile::plateau_only()`); have `cognitive-shader-driver::driver.rs:271-320` consult `bindspace.ontology()` and apply the namespace-keyed override before computing `SituationInput`. Files touched: `crates/lance-graph-ontology/src/registry.rs` (~60 LOC for the lookup table + builder), `crates/cognitive-shader-driver/src/driver.rs:271-320` (~40 LOC to wire the override path; **no MUL math change**, just SituationInput field tightening), one new integration test at `crates/cognitive-shader-driver/tests/ontology_aware_mul_threshold.rs` (~50 LOC). Exit criteria: a Healthcare-namespace cycle goes to HOLD on a SituationInput that a Compliance-namespace cycle would CONTINUE on, with all other state held constant. Dependencies: D-ONTO-V5-1 not strict (parallel-shippable). Risk: medium — the `MulThresholdProfile` shape is new to `lance-graph-contract` and needs a Marking-style enum, not just struct fields, to preserve canonical-source semantics. + +**D-ONTO-V5-2 — SPO bridge fn (`promote_to_spo`).** Closes SPO-1. Scope: implement the one-way writer bridge `arigraph::SpoBridge::promote_to_spo(&TripletGraph, gate: PromoteGate, &mut SpoStore)` that promotes warm string-keyed entries into cold fingerprint-keyed storage. 
Architectural call: the bridge **lives in `lance-graph::graph::arigraph::spo_bridge.rs`, not in a new crate.** Brutal honest take — a separate `crates/spo-bridge` would re-create the inter-store hop (SPO-1's whole point is the L1/L2 cache pair lives in `lance-graph::graph`); pulling it out of that crate breaks the encapsulation that `DECISION_SPO_ARIGRAPH.md` ratifies. The bridge is internal lance-graph plumbing, not a public consumer surface. Files touched: `crates/lance-graph/src/graph/arigraph/spo_bridge.rs` (NEW, ~150 LOC), `crates/lance-graph/src/graph/arigraph/mod.rs` (add `pub mod spo_bridge`), `crates/lance-graph/tests/arigraph_promote_to_spo.rs` (NEW, ~80 LOC). Exit criteria: 50 string-keyed `TripletGraph` entries with `confidence > gate.truth_floor` and `episode_count > gate.min_episodes` round-trip to fingerprint-keyed SPO rows; the entropy-ledger row 70+245 closes with a dated entry. Dependencies: none. Risk: medium — `gate: PromoteGate` shape is new and needs to align with `contract::collapse_gate::GateDecision` semantics without colliding with `mul::GateDecision` (GATE-1 in the ledger). + +### Six-months-out (D4-D6, leverage > cost by ≥ 1.5×) + +**D-ONTO-V5-3 — Healthcare namespace transcode.** Mirrors the WoA pattern with the same 6-agent ensemble (container-architect / family-codec-smith / bus-compiler for entities; ripple-architect for verbs; truth-architect for hydration probes; certification-officer for the OGIT fork PR). Scope: identify Medcare's domain entities from `MedCare-rs/crates/medcare-core/` (or upstream Python source if available — the workspace currently has only the Rust scaffolding, no `models.py` equivalent), transcode to OGIT-shaped TTL under `OGIT/NTO/Healthcare/`, open the OGIT fork PR. Estimate: 12-15 entities + 8-10 verbs (smaller than WoA because medcare's clinical taxonomy is narrower); ~20 TTL files; ~3 sessions of agent work. 
Files touched: `OGIT/NTO/Healthcare/{entities,verbs}/*.ttl` (NEW), `crates/lance-graph-ontology/src/bridges/medcare_bridge.rs` (already exists — verify scope-lock against the new namespace), `crates/lance-graph-ontology/tests/hydrate_real_ogit.rs` (extend with `hydrate_healthcare_namespace_from_real_ogit`). Exit criteria: 12-15 Healthcare entity TTLs parse via `pyoxigraph` validation; `MedcareBridge` resolves `Patient`, `Diagnose`, `Laborwert`, `Medikament` URIs; OGIT-fork PR opens. Dependencies: D-ONTO-V5-1 (cleaner provenance for Healthcare TTLs). Risk: low (mechanical mirror of the WoA pattern). + +**D-ONTO-V5-6 — SchemaSource for MySQL.** Concrete impl of the trait shape. Scope: pick one tenant DB schema as the proving ground (the WoA `models.py` MySQL schema is the natural choice — same shape as the TTLs, cross-validates the hydration). Implement `MySqlSchemaSource` via `sqlx::MySql` against an information_schema query, map `tables / columns / FK constraints` to `MappingProposal { Schemas, links, rows }`. Files touched: `crates/lance-graph-ontology/src/schema_sources/mysql.rs` (NEW, ~250 LOC), `crates/lance-graph-ontology/Cargo.toml` (add `sqlx` behind feature `mysql-source`), `crates/lance-graph-ontology/tests/mysql_to_proposals.rs` (NEW, ~80 LOC, tempdb-fed). Exit criteria: the MySQL-derived proposals overlay the TTL-derived proposals for the same WorkOrder schema with zero collisions; `HydrationReport` warns on any drift. Dependencies: D-ONTO-V5-1. Risk: medium — `sqlx` is a new transitive dep with build-time gravity (similar to lance-encoding's protoc issue that bit woa-rs). + +**D-ONTO-V5-13 — Hydration parallelism.** Scope: profile the TTL hydrate path against the full OGIT fork (66 namespaces, ~3000+ TTL files including upstream + AdaWorldAPI extensions). If wallclock > 5s on a cold cache, parallelize via `rayon::par_iter` over namespace directories with a final merge-into-registry step. 
Files touched: `crates/lance-graph-ontology/src/registry.rs` `hydrate_once_sync` (~60 LOC delta), `crates/lance-graph-ontology/Cargo.toml` (add `rayon` behind feature `parallel-hydrate`), `crates/lance-graph-ontology/benches/hydrate_full_fork.rs` (NEW, ~40 LOC). Exit criteria: hydrate of full fork wallclock measured + reported; if > 5s, parallel impl ships and the bench passes < 2s; if ≤ 5s, the deliverable closes as "no work needed" with the bench shipping for future regression detection. Dependencies: none. Risk: low (additive feature flag). + +### Out-past-the-quarter (D7-D15, defer or partial leverage) + +**D-ONTO-V5-4 — smb-ontology TTL migration. Brutal honest take: do NOT convert.** The 2,079-LOC declarative Rust at `smb-office-rs/crates/smb-ontology/` is the foot-in-the-door deployment per `WoA/RUST_TRANSCODE_PLAN.md`'s "backend = local" mode. Customers running smb on a single machine without an OGIT clone need a self-contained native ontology. Converting it to TTL forces an OGIT-fork dependency on every smb deployment, which inverts the foot-in-the-door promise and adds a 600-MB clone to the install footprint. The right move is to **keep smb-ontology as native Rust** AND to add an OGIT-shaped *export* path so an smb deployment can publish its ontology to a fork on demand. Files touched: `smb-office-rs/crates/smb-ontology/src/export_ogit.rs` (NEW, ~120 LOC translating `Schema`/`LinkSpec` into TTL strings using the same shape as `OGIT/NTO/Network/entities/IPAddress.ttl`). Exit criteria: `smb_ontology::export_ogit_ttl()` produces 13 TTL files that parse via `oxttl` and that the `OntologyRegistry` hydrates into a working `SmbBridge`. Dependencies: D-ONTO-V5-1 (consistent provenance). Risk: low. **The "convert smb-ontology to TTL" framing in the v4 deferral list is wrong; do not adopt it.** + +**D-ONTO-V5-5 — q2 namespace transcode.** Mirrors WoA + Healthcare. 
Scope: identify q2's foundry-shape entities (Quarto / Neo4j / Gotham equivalents — see `q2-foundry-integration-v1.md` Q2-1.1..Q2-1.7 for the shape) and transcode to TTL under `OGIT/NTO/Q2/`. Estimate: ~10-12 entities (q2 is more user-interface than data-plane). Files touched: `OGIT/NTO/Q2/{entities,verbs}/*.ttl` (NEW), `crates/lance-graph-ontology/src/bridges/q2_bridge.rs` (NEW, ~45 LOC mirroring `medcare_bridge.rs`), `tests/bridge_scope_lock.rs` (extend with `q2_bridge_scope_lock`). Exit criteria: q2 binary holds an `Arc` and resolves `Workshop`, `Vertex`, `Doctemplate` via the `Q2Bridge`. Dependencies: D-ONTO-V5-3 (use the Healthcare transcode as a fresher template than WoA). Risk: low (mechanical). + +**D-ONTO-V5-7 — SchemaSource for MSSQL.** LOC delta vs MySQL impl: ~30 LOC (different driver string, slightly different information_schema column names; tibero / MSSQL share enough shape with MySQL that the impl is ~80% identical). Files touched: `crates/lance-graph-ontology/src/schema_sources/mssql.rs` (NEW), `Cargo.toml` add `tiberius` behind feature `mssql-source`. Exit criteria: same as D-ONTO-V5-6 for an MSSQL test schema. Dependencies: D-ONTO-V5-6 (do MySQL first; MSSQL is the second tenant). Risk: medium (tiberius is heavier than sqlx-mysql). + +**D-ONTO-V5-8 — Customer admin form.** This is React/HTML, not Rust. The Rust boundary is a `MappingProposal` POST endpoint exposed on woa-rs's existing axum server. Brutal honest take: **woa-rs is the first tenant binary, not the platform — the form belongs in woa-rs's surface, not in `lance-graph-ontology`.** Files touched: `woa-rs/src/admin/ontology_form.rs` (NEW, ~150 LOC axum handler accepting a JSON `MappingProposal`), `woa-rs/templates/admin/ontology.askama` (NEW, the actual form), `crates/lance-graph-ontology/src/proposal.rs` (add `serde::{Serialize, Deserialize}` derive behind feature `serde-proposals`, ~10 LOC). 
Exit criteria: a customer admin can paint a one-entity ontology extension via the form and see it appear in `OntologyRegistry` after a registry rebuild. Dependencies: D-ONTO-V5-1, D-ONTO-V5-9 (admin should be aware of namespace-keyed thresholds). Risk: medium (UX work that doesn't fit the agent ensemble cleanly). + +**D-ONTO-V5-10 — callcenter-bridge.** Architectural question: callcenter has separate auth (JWT middleware + RLS rewriter per `foundry-roadmap-unified-smb-medcare-v1.md` §2 LF-3/DM-7) and per-customer scoping. The RBAC × ontology cross is the open architectural question — does a `CallcenterBridge` carry a `Subject` parameter (from SUBJECT-DTO-1; currently Aspirational) on every resolution, or does the registry itself grow a subject-aware lookup? Cite POLICY-1 entropy-ledger row (entropy 4): the RBAC↔BBB bridge is missing — `impl MembraneGate for Arc`. Brutal honest take: **defer until SUBJECT-DTO-1 lands.** The bridge can be drafted but cannot ship until the auth surface is stable. Files touched: `crates/lance-graph-ontology/src/bridges/callcenter_bridge.rs` (DRAFT only, no commit until SUBJECT-DTO-1). Risk: high (depends on entropy-ledger row that is itself Aspirational). + +**D-ONTO-V5-11 — woa-rs binary minimum.** The 80/20 cut from `WoA/RUST_TRANSCODE_PLAN.md` (58 KB plan, ~215 dev-hours total scope). Brutal honest take: a useful Rust HTTP service this calendar quarter is ~50 hours of work; pick the chunks that exercise the ontology spine + at least one CRUD path, ship the rest as Python. Concretely: WT-2X chunks (entity round-trip via `EntityStore`) + WT-3X chunk for Customer + WT-4X chunk for the gRPC pair scaffolding (h2c only, defer the 3-mode TLS to next quarter). Files touched: `woa-rs/src/{routes,handlers,models}/customer.rs` (NEW, ~400 LOC), `woa-rs/src/grpc/` (NEW, ~250 LOC). Exit criteria: a `customer-woa-bin` Rust binary serves Customer CRUD over gRPC + axum and round-trips through `OntologyRegistry`. 
Dependencies: D-ONTO-V5-3 (parallel — Healthcare gives the second tenant template). Risk: high (cross-repo coordination tax per `RUST_TRANSCODE_PLAN.md`). + +**D-ONTO-V5-12 — cognitive-shader-driver MUL publishers.** The `SituationInput::default()` fields that v4 left as defaults: `calibration_accuracy`, `allostatic_load`, `max_acceptable_damage`, `sandbox_available`, plus a Brier-history publisher and a damage-budget publisher. These are publisher-side concerns owned by the cognitive-shader-driver crate, NOT the ontology crate. Per-publisher tickets: D-ONTO-V5-12a (Brier history), D-ONTO-V5-12b (damage budget), D-ONTO-V5-12c (sandbox availability). Each is ~80 LOC + 1 integration test in `crates/cognitive-shader-driver/`. Files touched: `crates/cognitive-shader-driver/src/publishers/{brier_history,damage_budget,sandbox_availability}.rs` (NEW). Exit criteria: each publisher emits a `SituationInput` field that varies measurably across cycles; gate decisions reflect the variance. Dependencies: D-ONTO-V5-9 (the namespace-keyed override needs the publishers' outputs to vary). Risk: low individually, sequential composition. + +**D-ONTO-V5-14 — Lance dictionary persistence under load.** The append-only contract is correct but unmeasured. Scope: 100K-row hydrate + resolve probe via the Lance-cache feature, measure wallclock + memory + dataset growth across 10 hydrate-resolve-restart cycles. Files touched: `crates/lance-graph-ontology/benches/lance_cache_load.rs` (NEW, ~120 LOC). Exit criteria: 100K hydrates complete < 10s; resolve latency < 100us p50; dataset growth linear in row count. Dependencies: D-ONTO-V5-13 (parallel hydrate makes the benchmark relevant at scale). Risk: low (additive bench). + +**D-ONTO-V5-15 — In-memory → Lance-backed registry cutover.** Currently `OntologyRegistry::new_in_memory()` is used everywhere; the Lance variant exists at `src/lance_cache.rs` but has no consumer. 
Scope: identify the cutover ticket — what's the surface between "in-memory only" and "Lance-backed with in-memory hot cache"? Probably a new constructor `OntologyRegistry::with_lance_cache(path: &Path) -> Result` that loads existing rows from Lance into the in-memory dictionary at startup, then writes new proposals to both. Files touched: `crates/lance-graph-ontology/src/registry.rs` (~80 LOC to wire the dual-write), `crates/lance-graph-ontology/tests/lance_backed_round_trip.rs` (NEW, ~60 LOC). Exit criteria: a registry constructed with Lance cache survives process restart and serves queries against the warm cache. Dependencies: D-ONTO-V5-14 (load probe must pass before this becomes default). Risk: medium (concurrency story for parallel writers — out of scope for v5 single-writer pattern). + +## §4 Test plan + +Each deliverable lands with concrete `cargo test` invocations that gate the merge. + +D-ONTO-V5-1: `cargo test -p lance-graph-ontology --no-default-features --test round_trip_ttl dcterms_source_is_preserved` (renamed from `_is_currently_dropped`); plus `OGIT_FORK_PATH=/home/user/OGIT cargo test -p lance-graph-ontology --no-default-features --test hydrate_real_ogit` to verify all 27 WorkOrder TTLs. + +D-ONTO-V5-9: `cargo test -p cognitive-shader-driver --test ontology_aware_mul_threshold healthcare_holds_compliance_continues`. + +D-ONTO-V5-2: `cargo test -p lance-graph --test arigraph_promote_to_spo round_trip_50_warm_to_cold`. + +D-ONTO-V5-3: `OGIT_FORK_PATH=/home/user/OGIT cargo test -p lance-graph-ontology --no-default-features --test hydrate_real_ogit hydrate_healthcare_namespace_from_real_ogit`. + +D-ONTO-V5-4: `cargo test -p smb-ontology --test export_ogit ttl_roundtrips_through_oxttl`. + +D-ONTO-V5-6 / D-ONTO-V5-7: `cargo test -p lance-graph-ontology --features mysql-source --test mysql_to_proposals workorder_schema_round_trip` (and analogous for MSSQL). 
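The "zero collisions" overlay that the D-ONTO-V5-6 / D-ONTO-V5-7 test is meant to gate could be checked roughly as follows. This is a minimal sketch under stated assumptions: `Proposals`, `Drift`, `overlay_drift`, and `proposals` are hypothetical names invented for illustration, and a proposal is reduced to an entity name plus a set of field names rather than the real `MappingProposal` shape.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a proposal set: entity name -> field names.
/// The real `MappingProposal` carries more (schemas, links, rows).
type Proposals = BTreeMap<String, BTreeSet<String>>;

/// One disagreement between the TTL-derived and SQL-derived views.
#[derive(Debug, PartialEq, Eq)]
enum Drift {
    OnlyInTtl(String),
    OnlyInSql(String),
    FieldMismatch { entity: String, fields: Vec<String> },
}

/// Compare both proposal sets. An empty result means "zero collisions".
fn overlay_drift(ttl: &Proposals, sql: &Proposals) -> Vec<Drift> {
    let mut out = Vec::new();
    for (entity, ttl_fields) in ttl {
        match sql.get(entity) {
            None => out.push(Drift::OnlyInTtl(entity.clone())),
            Some(sql_fields) => {
                // Fields that appear on exactly one side, in sorted order.
                let fields: Vec<String> = ttl_fields
                    .symmetric_difference(sql_fields)
                    .cloned()
                    .collect();
                if !fields.is_empty() {
                    out.push(Drift::FieldMismatch { entity: entity.clone(), fields });
                }
            }
        }
    }
    // Entities the SQL schema declares that no TTL file mentions.
    for entity in sql.keys() {
        if !ttl.contains_key(entity) {
            out.push(Drift::OnlyInSql(entity.clone()));
        }
    }
    out
}

/// Illustration-only helper that builds a proposal set from string data.
fn proposals(items: &[(&str, Vec<&str>)]) -> Proposals {
    items
        .iter()
        .map(|(name, fields)| {
            (name.to_string(), fields.iter().map(|f| f.to_string()).collect())
        })
        .collect()
}
```

A test in this style would assert that `overlay_drift` returns an empty vector for the WorkOrder schema, and a `HydrationReport` warning path could iterate the returned `Drift` values instead of failing outright.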
+ +D-ONTO-V5-13: `cargo bench -p lance-graph-ontology --bench hydrate_full_fork --features parallel-hydrate` plus a wallclock assertion < 2s in CI. + +D-ONTO-V5-14 / D-ONTO-V5-15: `cargo bench -p lance-graph-ontology --bench lance_cache_load --features lance-cache` plus a `cargo test --test lance_backed_round_trip restart_survives`. + +The full v5 regression suite is `cargo test -p lance-graph-ontology --all-features` plus the cross-crate integration tests above. Every D milestone adds at least one test; no deliverable merges with a green-on-skip pattern. + +## §5 Risk + rollback + +The Lance schema migration risk is concentrated in D-ONTO-V5-14 and D-ONTO-V5-15. The current `ontology_dictionary` schema is append-only by design, which makes additive column changes safe (Lance's evolution rules tolerate adding nullable columns). But a non-additive change — say, switching `source_uri: Utf8` to a struct with `{source_uri: Utf8, dcterms_source: Utf8}` — would require a migration script and a backwards-compat read path. The mitigation is: D-ONTO-V5-1 (the dcterms:source fix) ships **into the existing `source_uri` column** as the dcterms IRI when present, falling back to file path when absent. No schema change, no migration. If a future deliverable needs to distinguish provenance from file path, it gets its own additive column. If a non-additive change is ever genuinely needed, the rollback is to read the old dataset, write a new dataset with the new schema, swap pointers — Lance's commit-log makes this cheap, but it is a deliberate operation not an accidental one. + +The tenant binary risk is concentrated in D-ONTO-V5-3, D-ONTO-V5-5, and D-ONTO-V5-11. Each tenant binary depends on a registry handle that **can fail to hydrate** — the OGIT fork might not be checked out, the TTL files might be malformed (HydrationFailure), the Lance dataset might be corrupt. 
The current failure mode is hard panic: `OntologyRegistry::hydrate_once_sync` returns `Err`, the binary's startup fails, the operator gets a stack trace. The rollback path to native ontology is: each tenant binary should accept a `--fallback-to-native` flag that, on hydration failure, constructs an in-memory registry seeded by a hand-rolled `Schema` set (mirroring the smb-ontology pattern). For woa-rs this means committing a `woa-rs/src/native_fallback.rs` that constructs a 15-entity in-memory registry from `models.py`-equivalent Rust data; for medcare-rs and q2 the equivalent. The fallback is intentionally incomplete (no MUL trust thresholds, no admin form, no admin-extended entities) — it is a degraded-mode kept so that a single broken TTL doesn't take down a tenant's entire HTTP surface. Risk class: medium for woa-rs (manual mirror work), low for the others (smaller entity counts). + +## §6 Branch + PR strategy + +This is multi-repo work spanning `lance-graph`, `woa-rs`, `MedCare-rs` (post-D-ONTO-V5-3), `q2` (post-D-ONTO-V5-5), `OGIT` (every TTL transcode), and `smb-office-rs` (D-ONTO-V5-4). The default branch naming pattern is per-D `claude/onto-v5-` (e.g., `claude/onto-v5-1-dcterms-source`), with the exception that closely-coupled deliverables in the same repo can ride one branch (e.g., D-ONTO-V5-1 + D-ONTO-V5-2 are both `lance-graph` repo work but they touch different crates and should land as two separate PRs even on a shared branch). + +The repo split is: `lance-graph` gets a PR per Rust deliverable (so D-ONTO-V5-1, D-ONTO-V5-2, D-ONTO-V5-9, D-ONTO-V5-13, D-ONTO-V5-14, D-ONTO-V5-15 are six separate PRs). `OGIT` (the AdaWorldAPI fork) gets a PR per namespace transcode (D-ONTO-V5-3 = one PR for `NTO/Healthcare/`, D-ONTO-V5-5 = one PR for `NTO/Q2/`). `woa-rs`, `MedCare-rs`, `q2` stay on branches until the corresponding ontology deliverable is in master, then each opens its own PR consuming the shipped registry. `smb-office-rs` gets one PR for D-ONTO-V5-4. 
The OGIT-fork PR cadence is: open the PR when the TTL files validate via `pyoxigraph`, then merge after one round of integration-lead review. The upstream `almatoai/OGIT` repo is intentionally never PR'd: AdaWorldAPI runs an extension fork by design (per `RECON_ONTOLOGY_CRATE.md` §1.9: 66 upstream namespaces unchanged, AdaWorldAPI extensions live under additive directories like `WorkOrder/`, `Healthcare/`, `Q2/`). + +Each PR commit message ends with `https://claude.ai/code/` per the workspace's git policy. No force-pushes to main/master. No `--no-verify`. Branch lifetime: a per-D branch lives until the PR merges, then is deleted from origin. The session-class agent's parallel branch coordination pattern (one main thread + N specialists each writing to disjoint files + a final integration-lead review) is the recommended ensemble shape for D-ONTO-V5-3 and D-ONTO-V5-5; D-ONTO-V5-1 / D-ONTO-V5-2 / D-ONTO-V5-9 are single-agent grindwork. + +## §7 Concrete next-session prompt + +> You are a session-class agent on Opus 4.7 (1M). The lance-graph-ontology v5 plan ships +> three deliverables in priority order: D-ONTO-V5-1 (dcterms:source provenance fix, +> closes TTL-PROBE-5; ~80 LOC + 1 test in `crates/lance-graph-ontology/src/ttl_parse.rs` +> + `tests/round_trip_ttl.rs`), D-ONTO-V5-9 (ontology-aware MUL trust thresholds; grow +> `OntologyRegistry` with `mul_threshold(NamespaceId) -> Option<MulThresholdProfile>`, +> wire `cognitive-shader-driver/src/driver.rs:271-320` to consult the namespace-keyed +> override; ~150 LOC + integration test), and D-ONTO-V5-2 (the +> `arigraph::SpoBridge::promote_to_spo` writer, closes SPO-1; ~150 LOC at +> `crates/lance-graph/src/graph/arigraph/spo_bridge.rs`). Mandatory reads: `.claude/plans/lance-graph-ontology-v5.md` +> + `.claude/board/AGENT_LOG.md` + `.claude/DECISION_SPO_ARIGRAPH.md`. Branch: +> `claude/onto-v5-1-dcterms-source` (or per-D as work splits).
Do NOT touch the MUL +> gate math at `cognitive-shader-driver/driver.rs:271-320` — only add the override +> path. Do NOT modify the OGIT fork (TTL transcodes are D-ONTO-V5-3 / -5, separate +> sessions). Append AGENT_LOG.md after each deliverable per Layer-2 A2A discipline. + +## §8 Append-only commit — INTEGRATION_PLANS.md index entry + +Return as text for main-thread to apply per the v4 governance pattern (do not edit `INTEGRATION_PLANS.md` from this session). Prepend the following block at the top of the file, between the file's preamble and the `splat-osint-ingestion-v1` entry: + +```markdown +## lance-graph-ontology-v5 — post-merge follow-ons (authored 2026-05-07) + +- **Plan:** `.claude/plans/lance-graph-ontology-v5.md` +- **Author + date:** integration-lead (Opus 4.7 1M), 2026-05-07 +- **Status:** Active +- **Scope:** Picks up where v4 (`claude/create-graph-ontology-crate-gkuJG`, OGIT#1 merged) left off. 15 deliverables ranked by leverage / cost: D-ONTO-V5-1 (dcterms:source provenance, closes TTL-PROBE-5), D-ONTO-V5-2 (`arigraph::SpoBridge::promote_to_spo`, closes SPO-1), D-ONTO-V5-3 (Healthcare TTL transcode), D-ONTO-V5-4 (smb-ontology export-only, NOT migration — brutal-honest reversal), D-ONTO-V5-5 (q2 TTL transcode), D-ONTO-V5-6/7 (MySQL/MSSQL `SchemaSource` impls), D-ONTO-V5-8 (customer admin form, owned by woa-rs surface), D-ONTO-V5-9 (ontology-aware MUL trust thresholds — registry as namespace-keyed lookup), D-ONTO-V5-10 (callcenter-bridge, deferred until SUBJECT-DTO-1 lands), D-ONTO-V5-11 (woa-rs 80/20 binary cut), D-ONTO-V5-12 (MUL publishers — Brier/damage/sandbox), D-ONTO-V5-13 (hydration parallelism), D-ONTO-V5-14 (Lance dictionary load probe), D-ONTO-V5-15 (in-memory → Lance-backed cutover). +- **Originating context:** v4 OGIT#1 merge (15 entities + 12 verbs in `NTO/WorkOrder/`, master); 36 ontology tests pass; cognitive-shader-driver wired (read-only registry attachment). 
+- **Resolves ledger rows:** TTL-PROBE-5 (D-ONTO-V5-1), SPO-1 (D-ONTO-V5-2 70+245). Partial leverage on MUL-ASSESS-1 (registry as namespace-keyed threshold table). No leverage on TRUST-1 / FLOW-1 / COMPASS-1 / PARSER-1 (out of scope; the ontology crate has no influence on enum consolidation or the cypher cold/hot split). +- **Branch:** `claude/onto-v5-` per deliverable; OGIT-fork PRs per namespace transcode. +- **Confidence (2026-05-07):** Pre-execution. Plan reviews v4's outputs as FINDING-grade and v5's deferrals as honestly-deferred (not punted). Next-3 ranked: D-ONTO-V5-1, D-ONTO-V5-9, D-ONTO-V5-2. +- **Cross-ref:** `.claude/RECON_ONTOLOGY_CRATE.md`, `.claude/DECISION_SPO_ARIGRAPH.md`, `.claude/knowledge/ontology-registry.md`, `sql-spo-ontology-bridge-v1.md` (partially superseded), `foundry-roadmap-unified-smb-medcare-v1.md` (adjacent). +``` + +--- + +**End of plan.** diff --git a/.claude/plans/ogit-cascade-supabase-callcenter-v1.md b/.claude/plans/ogit-cascade-supabase-callcenter-v1.md new file mode 100644 index 00000000..64ec273d --- /dev/null +++ b/.claude/plans/ogit-cascade-supabase-callcenter-v1.md @@ -0,0 +1,212 @@ +# OGIT-Cascade · Supabase Realtime · Callcenter Membrane — v1 + +> **Status:** plan, not implementation. +> **Authored:** 2026-05-07 (immediately after lance-graph-ontology v5 PR #352). +> **Owner crates:** `lance-graph-callcenter`, `lance-graph-ontology`, `lance-graph-rdf` (planned), AdaWorldAPI/OGIT (extension fork). +> **Depends on:** lance-graph-ontology-v5 (D-9 thresholds, D-2 SpoBridge), lance-graph-rdf-fma-snomed-v1 (SemanticQuad importer), supabase-subscriber-v1 (DM-4 watcher, DM-6 drain), callcenter-membrane-v1 (parent membrane doctrine). +> **Carry-over:** prior plans are NOT superseded. This plan references their D-ids and defines net-new ones (D-CASCADE-V1-*). + +## Pillar 0 — The holy-grail click (answers main-thread question, 2026-05-07) + +**`lance-graph-ontology::OntologyRegistry` IS the SoA. 
Schema is the DTO + index.** + +Restated: every per-domain schema (Healthcare, WorkOrder, SMB, CallCenter, Medical-BioPortal) is a thin **name→row map + column projection** over a single canonical struct-of-arrays held by `OntologyRegistry`. Bridges already hold `LazyLock<&OntologyRegistry>` — they do not own columns; they own a scoped view. This is the right pattern; v1 makes it consequential. + +**The codec cascade per row** (one entry per ontology concept): + +| Column | Type | Source | Cost | +|---|---|---|---| +| `identity_fp` | `Vsa16kF32` | hashed IRI + role-key bind | 64 KB | +| `cam_pq_code` | `[u8; 6]` | quantized projection of `identity_fp` against the 4096-centroid codebook | 6 B | +| `base17` | `[u8; 34]` | bgz17 encoder over `identity_fp` | 34 B | +| `palette_key` | `u32` | PaletteSemiring keyed by `base17` archetype | 4 B | +| `scent` | `u8` | final cascade tier (per `docs/CODEC_COMPRESSION_ATLAS.md`) | 1 B | +| `qualia` | `[f32; 18]` | NARS truth + DK + flow + compass | 72 B | +| `meta` | `MetaWord` | dispatch bits + dcterms:source pointer | 8 B | +| `edge` | `CausalEdge64` | predicate-relations to other rows | 8 B | + +Every step in `name → row → fingerprint → CAM-PQ → palette key → Scent` is **O(1)**. A schema (TTL, SQL DDL, JSON Schema, Cypher node label) is *literally* a list of (column-projection + name-resolution) declarations. **The "encode literally everything with indices" outcome is content-addressable memory**: same row addressable through name (HashMap), through similarity (CAM-PQ), through composition (PaletteSemiring), through cosine (Vsa16kF32). + +**Consequences for v1**: + +- The OGIT TTL files in `AdaWorldAPI/OGIT/NTO//` are **the seed**. They populate the SoA on hydrate; they are not the runtime store. +- `MedCare-rs/.MYSQL/Struktur.sql` (104 tables) and the BioPortal ontology bundles (25 ontologies, ~2.4 GB) are **DTO declarations**: each becomes a column-projection + provenance pointer over the SoA, not a parallel copy. 
+- The smb-bridge / medcare-bridge / callcenter-bridge collapse is **mechanical**: they're already `LazyLock<&OntologyRegistry>` views; v1 just gives the registry the columns they need to project. + +This pillar is non-negotiable. Every subsequent deliverable serves it. + +## Pillar 1 — OGIT as the universal SPO-G lingua franca + +OGIT (the AdaWorldAPI fork of `almatoai/OGIT`) is the **single source of truth for ontology namespaces**. v4 shipped `NTO/WorkOrder/`. v1 expands the fork (never PR'd back to upstream — ratified 2026-05-07) to host: + +- `NTO/Medical/` — BioPortal-derived medical extensions (the user's "Medical arsenal"). +- `NTO/SMB/` — small-business namespace (already export-only per v5 ratification). +- `NTO/CallCenter/` — callcenter wire shapes (FacultyDescriptor, CommitFilter, CognitiveEventRow column families). +- `NTO/Healthcare/` — **delegated** to `lance-graph-rdf-fma-snomed-v1` (FMA + RadLex + SNOMED CT named-graph importer); v1 does not duplicate its work. + +The SPO-G shape: `(subject, predicate, object, ontology_context_id)` per `lance-graph-rdf-fma-snomed-v1` §Core types. v1 extends `OntologyRegistry::SchemaPtr` to carry `ontology_context_id: u32` so the same row in the SoA can resolve in multiple named-graph contexts without semantic mud. + +## Pillar 2 — Zone 1 / Zone 2 / Zone 3 (BBB membrane refinement) + +The user's "outbound serialization only in Zone 3 outside callcenter" is a **tightening** of the existing BBB membrane doctrine (`callcenter-membrane-v1.md` § 10.9). 
The concrete map: + +| Zone | Substrate | What may exist | What may NOT exit | +|---|---|---|---| +| **Zone 1 — Inner BindSpace** | `cognitive-shader-driver` SoA, `BindSpace` columns | Vsa16kF32, palette codes, MetaWord, CausalEdge64 | anything across the BBB; nothing here is `serde::Serialize` | +| **Zone 2 — Membrane** | `lance-graph-callcenter::lance_membrane`, `ExternalMembrane` trait | Arrow `RecordBatch`, scalar-only columns (`bbb_scalar_only_compile_check`) | VSA / palette / NARS truth — already enforced | +| **Zone 3 — Outbound transcode** | `lance-graph-callcenter::transcode`, `phoenix` (planned), `postgrest`, `drain` | Supabase realtime payloads, REST JSON, gRPC, JSON Schema responses | direct reads from `BindSpace` (must traverse Zones 1→2→3) | + +**The hard rule**: `serde::Serialize` may only be derived on types that live under `crates/lance-graph-callcenter/src/transcode/` or downstream of it. v1 ships a `cert-officer` static check that fails CI if a Zone 1 / Zone 2 type acquires `Serialize`. + +## Pillar 3 — Bridge collapse (smb / medcare → 20-LOC scoped views) + +Per the user: *"when callcenter speaks OGIT i understand we can DTO all existing smb-bridge and medcare-bridge and just expand OGIT in our fork to include everything"*. RATIFIED. The mechanism: + +- **Today**: `lance-graph-callcenter::ontology_dto` exports `medcare_ontology()` + `smb_ontology()` factory fns (lib.rs:51). Each is a hand-rolled DTO mirror. +- **v1**: both factories become **2-line projections** over `OntologyRegistry::enumerate(namespace)` filtered through `MedcareBridge::filter` / `SmbBridge::filter`. The bridges stay 15–20 LOC; the heavy lifting moves into the registry's column projection. 
+- **Schema vs OGIT comparison**: `MedCare-rs/.MYSQL/Struktur.sql` (104 tables grouped: combo_* 24, pf_* 30, praxis_* 14, pat_* 4, glob_* 7, file_* 4, misc 21) becomes a per-table **DTO declaration** under `OGIT/NTO/Medical/sql_mirror/` — one TTL per logical table, with `dcterms:source = "MedCare-rs/.MYSQL/Struktur.sql:"` and column-mapping triples. The MySQL schema does not move; OGIT carries its **shape**. + +## Pillar 4 — BioPortal arsenal under `OGIT/NTO/Medical/` + +The release `bioportal-ontologies-2026-05-05` (private mirror, ~2.4 GB) bundles 25 ontologies. v1 does **not** load all of them — that's the remit of `lance-graph-rdf-fma-snomed-v1`. v1 emits **OGIT TTL stubs** (one per ontology) under `AdaWorldAPI/OGIT/NTO/Medical//` declaring: + +```turtle +ogit.Medical:ICD10CM + a ogit:Namespace ; + rdfs:label "ICD-10 Clinical Modification" ; + ogit:contextIri ; + ogit:contextId 10 ; + dcterms:source "bioportal-ontologies-2026-05-05/ICD10CM.ttl" ; + dcterms:license "UMLS-Metathesaurus" ; + ogit:fileSize "51.6 MB" ; + ogit:tripleCount "~1.8M (estimate)" ; + ogit:loaderCrate "lance-graph-rdf" . +``` + +These stubs make the registry **aware** of every BioPortal ontology without loading it. The actual triple ingestion is gated on the importer in `lance-graph-rdf-fma-snomed-v1`. Priority for stubbing: `ICD10CM` (51.6 MB), `RxNorm` (218.8 MB), `LOINC` (739.2 MB), `FMA` (266.2 MB), `RadLex` (64.9 MB), `SNOMED-stub` (666 KB partial), `MONDO` (215.5 MB), `HPO` (10.7 MB), `DRON` (701.7 MB), `CHEBI` (259.6 MB) — top 10 by clinical leverage. 
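Because the ten priority stubs share one shape, they could be generated rather than hand-written. The sketch below is an assumption-laden illustration, not the plan's tooling: `NamespaceStub`, `render_stub`, and `escape_literal` are hypothetical names, prefix declarations are not emitted, and the `ogit:contextIri`, `ogit:fileSize`, and `ogit:tripleCount` lines are left out for brevity.

```rust
/// Hypothetical description of one BioPortal namespace stub.
struct NamespaceStub<'a> {
    name: &'a str,         // local name, e.g. "ICD10CM"
    label: &'a str,        // human-readable rdfs:label
    context_id: u32,       // dense ontology_context_id
    source: &'a str,       // dcterms:source value
    license: &'a str,      // dcterms:license value
    loader_crate: &'a str, // crate expected to ingest the triples
}

/// Escape backslashes and double quotes for a quoted Turtle string literal.
fn escape_literal(raw: &str) -> String {
    raw.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Render the stub as Turtle text in the same layout as the example above.
fn render_stub(s: &NamespaceStub) -> String {
    let lines = [
        format!("ogit.Medical:{}", s.name),
        "    a ogit:Namespace ;".to_string(),
        format!("    rdfs:label \"{}\" ;", escape_literal(s.label)),
        format!("    ogit:contextId {} ;", s.context_id),
        format!("    dcterms:source \"{}\" ;", escape_literal(s.source)),
        format!("    dcterms:license \"{}\" ;", escape_literal(s.license)),
        format!("    ogit:loaderCrate \"{}\" .", escape_literal(s.loader_crate)),
    ];
    // One statement per stub, terminated by a trailing newline.
    lines.join("\n") + "\n"
}
```

Driving `render_stub` from a single table of the ten priority ontologies would keep the `dcterms:source` and `dcterms:license` fields uniform across stubs, which is what the acceptance criteria ask for.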
+ +## The cascade (end-to-end) + +``` +INBOUND OUTBOUND + (Zone 3 only) +Supabase Postgres CDC ▲ + │ realtime change-feed (websocket) │ + ▼ │ +Zone 3: drain.rs ingest Zone 3: transcode emit + │ parses change row │ Cypher → SPARQL CONSTRUCT + ▼ │ → JSON-LD → Supabase RPC payload +oxigraph store (oxttl/oxrdf parser) │ + │ triples land in named graph OR + ▼ │ CommitFilter → Expr → DataFusion +Zone 2: ExternalMembrane.ingest │ → Arrow RecordBatch → Phoenix WS push + │ Arrow scalar projection ▲ + ▼ │ +Zone 2: LanceMembrane.project │ + │ versioned Lance write │ + ▼ │ +Zone 1: BindSpace SoA append │ + │ CollapseGate bundles │ + ▼ │ +Cognitive cycle (CognitiveShader) │ + │ resolve(F < 0.2) → commit │ + ▼ │ +TripletGraph (AriGraph) write │ + │ promote to SPO via SpoBridge (v5 D-2)│ + ▼ │ +LanceVersionWatcher.bump │ + │ watch::Sender │ + ▼ │ +Zone 2 → Zone 3 fan-out ──────────────────┘ +``` + +The path is symmetric across the membrane. **No row crosses Zone 1 → Zone 3 without a Zone 2 RecordBatch projection.** + +## Deliverables (15 total, ranked by leverage / cost) + +| Rank | D-id | Scope | LOC | Owner crate | +|---|---|---|---|---| +| 1 | **D-CASCADE-V1-1** | `cert-officer` static check: deny `serde::Serialize` on Zone 1 / Zone 2 types | ~120 | `lance-graph-callcenter` (build script) | +| 2 | **D-CASCADE-V1-2** | Extend `OntologyRegistry::SchemaPtr` to carry `ontology_context_id: u32` (per `lance-graph-rdf` §Core) | ~60 | `lance-graph-ontology` | +| 3 | **D-CASCADE-V1-3** | Collapse `medcare_ontology()` + `smb_ontology()` to 2-line projections over `OntologyRegistry::enumerate(ns)` | ~40 (delete >100) | `lance-graph-callcenter::ontology_dto` | +| 4 | **D-CASCADE-V1-4** | Emit BioPortal namespace stubs under `AdaWorldAPI/OGIT/NTO/Medical/{ICD10CM,RxNorm,LOINC,FMA,RadLex,SNOMED,MONDO,HPO,DRON,CHEBI}/` | ~10 TTL files × ~20 lines | OGIT fork | +| 5 | **D-CASCADE-V1-5** | Transcode `MedCare-rs/.MYSQL/Struktur.sql` → 104 TTL files under `OGIT/NTO/Medical/sql_mirror/` (one per 
table) | ~104 × ~25 lines (mostly mechanical) | OGIT fork | +| 6 | **D-CASCADE-V1-6** | `OGIT/NTO/CallCenter/` namespace: 6 entities (FacultyDescriptor, CommitFilter, CognitiveEventRow, ExternalIntent, DnPath, ActorContext) | ~6 × ~30 lines | OGIT fork | +| 7 | **D-CASCADE-V1-7** | Add codec-cascade columns to `OntologyRegistry` SoA (`cam_pq_code`, `base17`, `palette_key`, `scent`, `qualia`, `meta`, `edge`) | ~250 | `lance-graph-ontology` | +| 8 | **D-CASCADE-V1-8** | Wire `lance-graph-rdf::SemanticQuad` consumer into `OntologyRegistry::ingest_quads(quads, context_id)` | ~150 | `lance-graph-ontology` (depends on v1 of `lance-graph-rdf`) | +| 9 | **D-CASCADE-V1-9** | Supabase realtime inbound: `drain.rs` change-feed parser → SemanticQuad → `OntologyRegistry::ingest_quads` | ~200 | `lance-graph-callcenter::drain` (extends DM-6 from supabase-subscriber-v1) | +| 10 | **D-CASCADE-V1-10** | Supabase realtime outbound: `transcode/supabase.rs` Cypher → SPARQL CONSTRUCT → JSON-LD → Supabase RPC | ~250 | `lance-graph-callcenter::transcode` | +| 11 | **D-CASCADE-V1-11** | O(1) probe: measure `name → cam_pq_code` lookup p99 latency vs raw oxigraph SPARQL p99; target ≥ 100× speedup | ~80 (bench harness) | `lance-graph-ontology-benches` (new) | +| 12 | **D-CASCADE-V1-12** | `MulThresholdProfile` (v5 D-9) consults `ontology_context_id` so `medical/clinical` thresholds are stricter than `callcenter/conversational` | ~80 | `lance-graph-contract::mul` (extends v5) | +| 13 | **D-CASCADE-V1-13** | End-to-end integration test: Supabase webhook → OGIT triple → cognitive cycle → outbound RPC, asserting Zone 3 is the only emission point | ~300 | `lance-graph-callcenter/tests` | +| 14 | **D-CASCADE-V1-14** | `OGIT/NTO/Medical/sql_mirror/` round-trip: emit MySQL DDL from TTL projection, diff against `Struktur.sql` (must round-trip identity for column names + types) | ~150 | `lance-graph-callcenter::transcode` | +| 15 | **D-CASCADE-V1-15** | BioPortal ICD-10 actual import (smallest of the 
BIG ontologies at 51.6 MB), populate `OntologyRegistry` with codec-cascade columns | ~200 | `lance-graph-rdf::importers::icd10cm` (new) | + +## Acceptance criteria + +- [ ] `cargo test -p lance-graph-callcenter --features full` passes; `bbb_scalar_only_compile_check` still compiles. +- [ ] D-CASCADE-V1-1 fails the build if a Zone 1 type acquires `Serialize` (test with a deliberate poison-pill type in `tests/`). +- [ ] D-CASCADE-V1-3: `git diff` on `ontology_dto.rs` is net negative LOC (collapse, not addition). +- [ ] D-CASCADE-V1-5: at least 90/104 tables round-trip through D-CASCADE-V1-14 (10/104 may legitimately differ due to MySQL-specific types like `MEDIUMTEXT`). +- [ ] D-CASCADE-V1-11: O(1) probe shows ≥ 100× p99 speedup for `name → cam_pq_code` over raw SPARQL `SELECT ?o WHERE { :name ogit:hasCamPqCode ?o }`. Lower bound on the holy grail. +- [ ] D-CASCADE-V1-13: webhook → cycle → RPC round-trip passes; the test asserts (via `cert-officer`) that no Zone 1 / Zone 2 type is reachable through `Serialize`. +- [ ] All BioPortal stubs (D-CASCADE-V1-4) carry `dcterms:source` (per v5 D-1) and `dcterms:license`. +- [ ] No upstream PRs to `almatoai/OGIT` (per v5 ratification Q4). + +## Out of v1 scope (explicit deferrals, not punts) + +- **Full SNOMED CT** import: license-gated per `lance-graph-rdf-fma-snomed-v1` §SNOMED, also the BioPortal release ships only a 666 KB partial. v1 stubs the namespace; full import waits on affiliate attestation. +- **DRON / CHEBI** import: 700 MB + 260 MB respectively, large; benefit is unclear before D-CASCADE-V1-11 measures the cascade payoff. Stub now, import in v2 if probe motivates it. +- **bgz-tensor attention layer integration** with the codec-cascade columns: orthogonal to this plan; the AttentionSemiring composes over PaletteSemiring already. +- **n8n-rs / crewai-rust consumption** of the new SoA columns: those repos consume `lance-graph-contract`; v1 does not change the contract surface beyond D-CASCADE-V1-12. 
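Several items above hinge on the D-CASCADE-V1-11 probe, so it may help to pin down what "≥ 100× p99 speedup" could mean operationally. The following is a minimal sketch, assuming latency samples have already been collected as `Duration` values and that a nearest-rank p99 is acceptable. The names `p99`, `p99_speedup`, and `meets_speedup_target` are hypothetical and are not part of `lance-graph-ontology-benches`.

```rust
use std::time::Duration;

/// Nearest-rank 99th percentile of a latency sample set (hypothetical helper).
/// Returns `None` when there are no samples.
fn p99(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // 1-based nearest rank: ceil(0.99 * n), computed with integer arithmetic.
    let rank = (sorted.len() * 99 + 99) / 100;
    Some(sorted[rank - 1])
}

/// Ratio of baseline p99 to candidate p99 (hypothetical helper).
/// Returns `None` if either sample set is empty or the candidate p99 is zero.
fn p99_speedup(baseline: &[Duration], candidate: &[Duration]) -> Option<f64> {
    let base = p99(baseline)?;
    let cand = p99(candidate)?;
    if cand.is_zero() {
        return None;
    }
    Some(base.as_secs_f64() / cand.as_secs_f64())
}

/// True when the candidate is at least `target` times faster at p99.
/// An unmeasurable ratio counts as a failed probe, not a pass.
fn meets_speedup_target(baseline: &[Duration], candidate: &[Duration], target: f64) -> bool {
    p99_speedup(baseline, candidate).map_or(false, |ratio| ratio >= target)
}
```

In this sketch the baseline would hold the raw oxigraph SPARQL timings and the candidate the `name → cam_pq_code` lookup timings, with `target` set to `100.0`. Treating an empty or zero-latency candidate as a failure is a design choice of the sketch, made so that a broken harness cannot accidentally satisfy the acceptance criterion.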
+ +## Open questions + +1. **Codec-cascade column population trigger**: synchronous on `ingest_quads()` or lazy (compute on first read)? Lazy adds complexity; sync makes hydrate slower. **Recommend sync** — hydrate is once-per-process, the cascade computation is bounded. +2. **`ontology_context_id` allocation policy**: dense (FMA=1, RadLex=2, SNOMED=3, ...) per `lance-graph-rdf-fma-snomed-v1` §Core, or sparse hash-derived? Dense is simpler and traceable; sparse avoids cross-repo coordination. **Recommend dense + a `NamespaceRegistry` allocation table sidecar**. +3. **Supabase realtime auth**: JWT verification at Zone 3 entry (per `auth.rs`) or row-level via `rls.rs`? **Recommend both** — JWT is the gate, RLS is the post-gate filter, neither replaces the other. +4. **Schema-as-DTO inheritance pattern**: derive macro (`#[derive(OntologyDto)]`), declarative builder, or hand-rolled per bridge? **Recommend declarative builder** for the first three bridges, escalate to a derive macro only if the pattern repeats six+ times. +5. **OGIT verb expansion for medical**: BioPortal ontologies use OWL object properties (e.g. `regional_part_of` from FMA). Do we map these to `ogit:Verb` shape (one TTL per verb) or carry them through as named-graph triples without verb registration? **Recommend named-graph triples** — we don't ontologize the ontology's predicates, that's infinite recursion. + +## Self-bootstrapping prompt for next session + +``` +Read .claude/plans/ogit-cascade-supabase-callcenter-v1.md cover-to-cover before +proposing any change. The Pillar 0 click — OntologyRegistry IS the SoA, schema +IS the DTO + index — is the architectural anchor; if you find yourself proposing +a parallel store, a copy of columns, or a non-projection bridge, stop and re-read +Pillar 0. + +Top-3 deliverables to start: D-CASCADE-V1-1 (cert-officer static check), +D-CASCADE-V1-2 (SchemaPtr.ontology_context_id), D-CASCADE-V1-3 (collapse +medcare_ontology + smb_ontology to 2-line projections). 
All three are bounded, +testable, and serve Pillar 0 directly. + +Do NOT start D-CASCADE-V1-{4,5,15} (the BioPortal / SQL transcode work) until +D-CASCADE-V1-{2,7,8} are merged — those are the registry surfaces those +deliverables write into. + +Cross-plan deps: lance-graph-ontology v5 D-9 (MulThresholdProfile lands in +lance-graph-contract::mul), lance-graph-rdf-fma-snomed-v1 (SemanticQuad type ++ NamedGraphRegistry), supabase-subscriber-v1 (DM-4 watcher, DM-6 drain). +Confirm those are merged before merging this plan's PRs that depend on them. + +Branch: claude/create-graph-ontology-crate-gkuJG (per workspace policy). +PR target: AdaWorldAPI/lance-graph base=main. +``` + +## Cross-references + +- `.claude/plans/lance-graph-ontology-v5.md` — D-9 (MulThresholdProfile), D-2 (SpoBridge::promote_to_spo). +- `.claude/plans/lance-graph-rdf-fma-snomed-v1.md` — SemanticQuad row type, NamedGraphRegistry, OntologyContextId. +- `.claude/plans/supabase-subscriber-v1.md` — DM-4 LanceVersionWatcher, DM-6 DrainTask scaffold. +- `.claude/plans/callcenter-membrane-v1.md` — § 10.9 Membrane Role Place Translation iron rule (parent doctrine). +- `docs/CODEC_COMPRESSION_ATLAS.md` — full cascade chain (Vsa16kF32 → ZeckBF17 → Base17 → CAM-PQ → Scent). +- `docs/ORCHESTRATION_IS_GRAPH.md` — orchestration-as-graph capstone (Zone 3 routing maps onto graph traversal). +- `MedCare-rs/.MYSQL/Struktur.sql` — 104-table source. +- `MedCare-rs/releases/tag/bioportal-ontologies-2026-05-05` — 25 ontology bundles, ~2.4 GB. +- `AdaWorldAPI/OGIT` (extension fork) — never PR'd to upstream. + +## Confidence (2026-05-07) + +Pre-execution. Pillar 0 is the only architectural commitment that admits no rollback — if it's wrong, the entire plan is wrong. It is right (per PR #223 AGI-as-SoA invariant + the existing `LazyLock<&OntologyRegistry>` pattern in the bridges). Pillars 1-4 are mechanical consequences. 
The 15 deliverables are bounded; D-CASCADE-V1-1 / 2 / 3 land first because they have no upstream blockers. diff --git a/.claude/settings.json b/.claude/settings.json index 5f823142..aa0700e2 100644 --- a/.claude/settings.json +++ b/.claude/settings.json @@ -5,9 +5,17 @@ "Edit(**/*.md)", "Edit(**/*.rs)", "Edit(**/*.toml)", + "Edit(**/*.ttl)", + "Edit(**/*.json)", "Write(**/*.md)", "Write(**/*.rs)", "Write(**/*.toml)", + "Write(**/*.ttl)", + "Write(**/*.json)", + "MultiEdit(**/*.md)", + "MultiEdit(**/*.rs)", + "MultiEdit(**/*.toml)", + "MultiEdit(**/*.ttl)", "Bash(tee -a:*)", "Bash(tee -a .claude/board/:*)", "Bash(tee -a .claude/knowledge/:*)", @@ -44,7 +52,43 @@ "Bash(wc:*)", "Bash(grep:*)", "Bash(find:*)", - "mcp__github__create_pull_request" + "Bash(mkdir:*)", + "Bash(cat:*)", + "Bash(diff:*)", + "Bash(rg:*)", + "Bash(head:*)", + "Bash(tail:*)", + "Bash(sort:*)", + "Bash(uniq:*)", + "Bash(awk:*)", + "Bash(sed:*)", + "Bash(test:*)", + "Bash(echo:*)", + "Bash(true:*)", + "Bash(false:*)", + "mcp__github__create_pull_request", + "mcp__github__create_or_update_file", + "mcp__github__push_files", + "mcp__github__get_file_contents", + "mcp__github__list_branches", + "mcp__github__create_branch", + "mcp__github__update_pull_request", + "mcp__github__pull_request_read", + "mcp__github__list_commits", + "mcp__github__get_commit", + "mcp__github__list_pull_requests", + "mcp__github__add_issue_comment", + "mcp__github__add_reply_to_pull_request_comment", + "mcp__github__pull_request_review_write", + "mcp__github__add_comment_to_pending_review", + "mcp__github__update_pull_request_branch", + "mcp__github__subscribe_pr_activity", + "mcp__github__unsubscribe_pr_activity", + "mcp__github__get_me", + "mcp__github__search_code", + "mcp__github__search_issues", + "mcp__github__search_pull_requests", + "mcp__github__search_repositories" ], "ask": [], "deny": [ diff --git a/Cargo.lock b/Cargo.lock index 43e52660..9bf43595 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -2944,6 +2944,13 @@ 
dependencies = [ "percent-encoding", ] +[[package]] +name = "fractal" +version = "0.1.0" +dependencies = [ + "libm", +] + [[package]] name = "fs4" version = "0.8.4" @@ -4373,6 +4380,26 @@ dependencies = [ name = "lance-graph-contract" version = "0.1.0" +[[package]] +name = "lance-graph-ontology" +version = "0.1.0" +dependencies = [ + "arrow", + "arrow-array", + "arrow-schema", + "futures", + "lance", + "lance-graph-contract", + "once_cell", + "oxrdf", + "oxttl", + "sha2 0.10.9", + "tempfile", + "thiserror 2.0.18", + "tokio", + "toml", +] + [[package]] name = "lance-graph-planner" version = "0.1.0" @@ -5120,8 +5147,6 @@ dependencies = [ "num-complex", "num-integer", "num-traits", - "p64", - "phyllotactic-manifold", "portable-atomic", "portable-atomic-util", "rawpointer", @@ -5472,11 +5497,51 @@ dependencies = [ "stable_deref_trait", ] +[[package]] +name = "oxilangtag" +version = "0.1.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "23f3f87617a86af77fa3691e6350483e7154c2ead9f1261b75130e21ca0f8acb" +dependencies = [ + "serde", +] + +[[package]] +name = "oxiri" +version = "0.2.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "54b4ed3a7192fa19f5f48f99871f2755047fabefd7f222f12a1df1773796a102" + +[[package]] +name = "oxrdf" +version = "0.2.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a04761319ef84de1f59782f189d072cbfc3a9a40c4e8bded8667202fbd35b02a" +dependencies = [ + "oxilangtag", + "oxiri", + "rand 0.8.6", + "thiserror 2.0.18", +] + +[[package]] +name = "oxttl" +version = "0.1.8" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0d385f1776d7cace455ef6b7c54407838eff902ca897303d06eb12a26f4cf8a0" +dependencies = [ + "memchr", + "oxilangtag", + "oxiri", + "oxrdf", + "thiserror 2.0.18", +] + [[package]] name = "p64" version = "0.1.0" dependencies = [ - "phyllotactic-manifold", + "fractal", ] [[package]] @@ -5648,10 +5713,6 @@ dependencies = [ 
"siphasher", ] -[[package]] -name = "phyllotactic-manifold" -version = "0.1.0" - [[package]] name = "pin-project" version = "1.1.11" @@ -5811,7 +5872,7 @@ version = "3.5.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e67ba7e9b2b56446f1d419b1d807906278ffa1a658a8a5d8a39dcb1f5a78614f" dependencies = [ - "toml_edit", + "toml_edit 0.25.11+spec-1.1.0", ] [[package]] @@ -6701,6 +6762,15 @@ dependencies = [ "syn 2.0.117", ] +[[package]] +name = "serde_spanned" +version = "0.6.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bf41e0cfaf7226dca15e8197172c295a782857fcb97fad1808a166870dee75a3" +dependencies = [ + "serde", +] + [[package]] name = "serde_urlencoded" version = "0.7.1" @@ -7546,6 +7616,27 @@ dependencies = [ "tokio", ] +[[package]] +name = "toml" +version = "0.8.23" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "dc1beb996b9d83529a9e75c17a1686767d148d70663143c7854d8b4a09ced362" +dependencies = [ + "serde", + "serde_spanned", + "toml_datetime 0.6.11", + "toml_edit 0.22.27", +] + +[[package]] +name = "toml_datetime" +version = "0.6.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "22cddaf88f4fbc13c51aebbf5f8eceb5c7c5a9da2ac40a13519eb5b0a0e8f11c" +dependencies = [ + "serde", +] + [[package]] name = "toml_datetime" version = "1.1.1+spec-1.1.0" @@ -7555,6 +7646,19 @@ dependencies = [ "serde_core", ] +[[package]] +name = "toml_edit" +version = "0.22.27" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41fe8c660ae4257887cf66394862d21dbca4a6ddd26f04a3560410406a2f819a" +dependencies = [ + "indexmap 2.14.0", + "serde", + "serde_spanned", + "toml_datetime 0.6.11", + "winnow 0.7.15", +] + [[package]] name = "toml_edit" version = "0.25.11+spec-1.1.0" @@ -7562,9 +7666,9 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0b59c4d22ed448339746c59b905d24568fcbb3ab65a500494f7b8c3e97739f2b" 
dependencies = [ "indexmap 2.14.0", - "toml_datetime", + "toml_datetime 1.1.1+spec-1.1.0", "toml_parser", - "winnow", + "winnow 1.0.2", ] [[package]] @@ -7573,7 +7677,7 @@ version = "1.1.2+spec-1.1.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "a2abe9b86193656635d2411dc43050282ca48aa31c2451210f4202550afb7526" dependencies = [ - "winnow", + "winnow 1.0.2", ] [[package]] @@ -8360,6 +8464,15 @@ version = "0.53.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "d6bbff5f0aada427a1e5a6da5f1f98158182f26556f345ac9e04d36d0ebed650" +[[package]] +name = "winnow" +version = "0.7.15" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "df79d97927682d2fd8adb29682d1140b343be4ac0f08fd68b7765d9c059d3945" +dependencies = [ + "memchr", +] + [[package]] name = "winnow" version = "1.0.2" diff --git a/Cargo.toml b/Cargo.toml index 2dd7c7f2..cc27dccc 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -9,6 +9,7 @@ members = [ "crates/lance-graph-callcenter", "crates/lance-graph-archetype", "crates/lance-graph-rbac", + "crates/lance-graph-ontology", "crates/bgz-tensor", ] exclude = [ diff --git a/crates/cognitive-shader-driver/Cargo.lock b/crates/cognitive-shader-driver/Cargo.lock index 133a3197..6684da9c 100644 --- a/crates/cognitive-shader-driver/Cargo.lock +++ b/crates/cognitive-shader-driver/Cargo.lock @@ -289,7 +289,16 @@ dependencies = [ "cc", "cfg-if", "constant_time_eq", - "cpufeatures", + "cpufeatures 0.3.0", +] + +[[package]] +name = "block-buffer" +version = "0.10.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3078c7629b62d3f0439517fa394996acacc5cbc91c5a20d8c658e77abd503a71" +dependencies = [ + "generic-array", ] [[package]] @@ -366,6 +375,7 @@ dependencies = [ "causal-edge", "deepnsm", "lance-graph-contract", + "lance-graph-ontology", "lance-graph-planner", "ndarray", "p64-bridge", @@ -410,6 +420,15 @@ version = "0.8.7" source = 
"registry+https://github.com/rust-lang/crates.io-index" checksum = "773648b94d0e5d620f64f280777445740e61fe701025087ec8b57f45c791888b" +[[package]] +name = "cpufeatures" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "59ed5838eebb26a2bb2e58f6d5b5316989ae9d08bab10e0e6d103e656d1b0280" +dependencies = [ + "libc", +] + [[package]] name = "cpufeatures" version = "0.3.0" @@ -425,13 +444,34 @@ version = "0.2.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "460fbee9c2c2f33933d720630a6a0bac33ba7053db5344fac858d4b8952d77d5" +[[package]] +name = "crypto-common" +version = "0.1.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "78c8292055d1c1df0cce5d180393dc8cce0abec0a7102adb6c7b1eef6016d60a" +dependencies = [ + "generic-array", + "typenum", +] + [[package]] name = "deepnsm" version = "0.1.0" dependencies = [ + "lance-graph-contract", "ndarray", ] +[[package]] +name = "digest" +version = "0.10.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9ed9a281f7bc9b7576e61468ba615a66a5c8cfdff42420a70aa82701a3b1e292" +dependencies = [ + "block-buffer", + "crypto-common", +] + [[package]] name = "either" version = "1.15.0" @@ -487,6 +527,13 @@ dependencies = [ "percent-encoding", ] +[[package]] +name = "fractal" +version = "0.1.0" +dependencies = [ + "libm", +] + [[package]] name = "futures" version = "0.3.32" @@ -575,6 +622,16 @@ dependencies = [ "slab", ] +[[package]] +name = "generic-array" +version = "0.14.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "85649ca51fd72272d7821adaf274ad91c288277713d9c18820d8499a7ff69e9a" +dependencies = [ + "typenum", + "version_check", +] + [[package]] name = "getrandom" version = "0.2.17" @@ -844,6 +901,19 @@ dependencies = [ name = "lance-graph-contract" version = "0.1.0" +[[package]] +name = "lance-graph-ontology" +version = "0.1.0" +dependencies = [ + 
"lance-graph-contract", + "once_cell", + "oxrdf", + "oxttl", + "sha2", + "thiserror", + "toml", +] + [[package]] name = "lance-graph-planner" version = "0.1.0" @@ -956,8 +1026,6 @@ dependencies = [ "num-complex", "num-integer", "num-traits", - "p64", - "phyllotactic-manifold", "portable-atomic", "portable-atomic-util", "rawpointer", @@ -1007,11 +1075,51 @@ version = "1.21.4" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "9f7c3e4beb33f85d45ae3e3a1792185706c8e16d043238c593331cc7cd313b50" +[[package]] +name = "oxilangtag" +version = "0.1.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "23f3f87617a86af77fa3691e6350483e7154c2ead9f1261b75130e21ca0f8acb" +dependencies = [ + "serde", +] + +[[package]] +name = "oxiri" +version = "0.2.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "54b4ed3a7192fa19f5f48f99871f2755047fabefd7f222f12a1df1773796a102" + +[[package]] +name = "oxrdf" +version = "0.2.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a04761319ef84de1f59782f189d072cbfc3a9a40c4e8bded8667202fbd35b02a" +dependencies = [ + "oxilangtag", + "oxiri", + "rand", + "thiserror", +] + +[[package]] +name = "oxttl" +version = "0.1.8" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0d385f1776d7cace455ef6b7c54407838eff902ca897303d06eb12a26f4cf8a0" +dependencies = [ + "memchr", + "oxilangtag", + "oxiri", + "oxrdf", + "thiserror", +] + [[package]] name = "p64" version = "0.1.0" dependencies = [ - "phyllotactic-manifold", + "fractal", ] [[package]] @@ -1038,10 +1146,6 @@ dependencies = [ "indexmap 2.14.0", ] -[[package]] -name = "phyllotactic-manifold" -version = "0.1.0" - [[package]] name = "pin-project" version = "1.1.11" @@ -1322,6 +1426,15 @@ dependencies = [ "serde_core", ] +[[package]] +name = "serde_spanned" +version = "0.6.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"bf41e0cfaf7226dca15e8197172c295a782857fcb97fad1808a166870dee75a3" +dependencies = [ + "serde", +] + [[package]] name = "serde_urlencoded" version = "0.7.1" @@ -1349,6 +1462,17 @@ dependencies = [ "version_check", ] +[[package]] +name = "sha2" +version = "0.10.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a7507d819769d01a365ab707794a4084392c824f54a7a6a7862f8c3d0892b283" +dependencies = [ + "cfg-if", + "cpufeatures 0.2.17", + "digest", +] + [[package]] name = "shlex" version = "1.3.0" @@ -1507,6 +1631,40 @@ dependencies = [ "tokio", ] +[[package]] +name = "toml" +version = "0.8.23" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "dc1beb996b9d83529a9e75c17a1686767d148d70663143c7854d8b4a09ced362" +dependencies = [ + "serde", + "serde_spanned", + "toml_datetime", + "toml_edit", +] + +[[package]] +name = "toml_datetime" +version = "0.6.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "22cddaf88f4fbc13c51aebbf5f8eceb5c7c5a9da2ac40a13519eb5b0a0e8f11c" +dependencies = [ + "serde", +] + +[[package]] +name = "toml_edit" +version = "0.22.27" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41fe8c660ae4257887cf66394862d21dbca4a6ddd26f04a3560410406a2f819a" +dependencies = [ + "indexmap 2.14.0", + "serde", + "serde_spanned", + "toml_datetime", + "winnow", +] + [[package]] name = "tonic" version = "0.12.3" @@ -1637,6 +1795,12 @@ version = "0.2.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e421abadd41a4225275504ea4d6566923418b7f05506fbc9c0fe86ba7396114b" +[[package]] +name = "typenum" +version = "1.20.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "40ce102ab67701b8526c123c1bab5cbe42d7040ccfd0f64af1a385808d2f43de" + [[package]] name = "unicode-ident" version = "1.0.24" @@ -1859,6 +2023,15 @@ version = "0.52.6" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = 
"589f6da84c646204747d1270a2a5661ea66ed1cced2631d546fdfb155959f9ec" +[[package]] +name = "winnow" +version = "0.7.15" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "df79d97927682d2fd8adb29682d1140b343be4ac0f08fd68b7765d9c059d3945" +dependencies = [ + "memchr", +] + [[package]] name = "wit-bindgen" version = "0.57.1" diff --git a/crates/cognitive-shader-driver/Cargo.toml b/crates/cognitive-shader-driver/Cargo.toml index 0e0c4e68..591651eb 100644 --- a/crates/cognitive-shader-driver/Cargo.toml +++ b/crates/cognitive-shader-driver/Cargo.toml @@ -34,6 +34,10 @@ required-features = ["lab"] [dependencies] lance-graph-contract = { path = "../lance-graph-contract" } +# Read-only ontology registry (Phase 7, v4 plan). default-features = false so the +# `lance-cache` feature (which pulls protoc/lance-encoding) stays gated; this +# crate consumes the in-memory registry path only. +lance-graph-ontology = { path = "../lance-graph-ontology", default-features = false } p64-bridge = { path = "../p64-bridge" } bgz17 = { path = "../bgz17" } causal-edge = { path = "../causal-edge" } diff --git a/crates/cognitive-shader-driver/src/bindspace.rs b/crates/cognitive-shader-driver/src/bindspace.rs index 0aba54bd..451e3749 100644 --- a/crates/cognitive-shader-driver/src/bindspace.rs +++ b/crates/cognitive-shader-driver/src/bindspace.rs @@ -10,7 +10,10 @@ //! The `cycle` column uses `Vsa16kF32` carrier (16,384 × f32 = 64 KB per row) //! for algebraic operations; other planes remain `u64 × 256`. +use std::sync::Arc; + use lance_graph_contract::cognitive_shader::{ColumnWindow, MetaFilter, MetaWord}; +use lance_graph_ontology::OntologyRegistry; pub const WORDS_PER_FP: usize = 256; pub const WIDTH_BITS: usize = WORDS_PER_FP * 64; @@ -163,7 +166,10 @@ impl MetaColumn { /// u64 temporal + u16 expert. All separate column buffers. /// /// Mutations go through CollapseGate (lance-graph-contract::collapse_gate). 
-#[derive(Debug)] +/// +/// `Debug` is implemented manually because `OntologyRegistry` does not derive +/// `Debug` (it holds interior mutability and large hydrated tables); the +/// registry slot is rendered as a presence flag only. pub struct BindSpace { pub len: usize, pub fingerprints: FingerprintColumns, @@ -175,6 +181,37 @@ pub struct BindSpace { /// Column H: per-row entity type binding (Foundry Object Type equivalent). /// 0 = untyped. Non-zero = 1-based index into `Ontology.schemas`. pub entity_type: Box<[u16]>, + /// Optional handle to the ontology registry (Phase 7, v4 plan). + /// + /// READ-ONLY access only. The driver consults this registry to resolve + /// `entity_type` indices into named OGIT schemas, semantic types, and + /// namespace bridges. The shader never mutates the registry — mutation + /// flows through `OntologyRegistry::append_mapping` on a separately-owned + /// `Arc`, never through `BindSpace`. + /// + /// FUTURE WORK (NOT this session's deliverable): downstream calibration + /// improvements will let the MUL gate pick ontology-aware trust + /// thresholds — e.g. Compliance edges → Plateau-only commit, Healthcare + /// edges → stricter trust calibration. The MUL gate logic in + /// `driver.rs:271-320` and the CausalEdge64 emission path are unchanged + /// in this PR; only the registry handle is wired. 
+ pub ontology: Option<Arc<OntologyRegistry>>, +} + +impl std::fmt::Debug for BindSpace { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.debug_struct("BindSpace") + .field("len", &self.len) + .field("fingerprints", &self.fingerprints) + .field("edges", &self.edges) + .field("qualia", &self.qualia) + .field("meta", &self.meta) + .field("temporal", &self.temporal) + .field("expert", &self.expert) + .field("entity_type", &self.entity_type) + .field("ontology", &self.ontology.as_ref().map(|_| "")) + .finish() + } } impl BindSpace { @@ -189,9 +226,25 @@ impl BindSpace { temporal: vec![0u64; len].into_boxed_slice(), expert: vec![0u16; len].into_boxed_slice(), entity_type: vec![0u16; len].into_boxed_slice(), + ontology: None, } } + /// Attach a read-only ontology registry handle. Phase 7 (v4 plan). + /// + /// The registry is shared via `Arc` — multiple `BindSpace` instances and + /// the orchestration bridge can hold the same registry. Mutation of the + /// registry (TTL hydration, mapping proposals) happens on the original + /// `OntologyRegistry` owner; this handle is read-only by convention. + pub fn set_ontology(&mut self, registry: Arc<OntologyRegistry>) { + self.ontology = Some(registry); + } + + /// Read-only view of the attached ontology registry, if any. + pub fn ontology(&self) -> Option<&Arc<OntologyRegistry>> { + self.ontology.as_ref() + } + /// Total byte footprint (sum across all columns). pub fn byte_footprint(&self) -> usize { let content_topic_angle = 3 * self.len * WORDS_PER_FP * 8; @@ -439,6 +492,19 @@ mod tests { assert_eq!(bs.entity_type[1], 0, "push should default to 0"); } + #[test] + fn ontology_handle_attaches_and_reads_back() { + // Phase 7 (v4 plan): BindSpace holds an Option<Arc<OntologyRegistry>> + // for read-only registry access. This test asserts the wiring; it + // does NOT test ontology semantics (those live in the + // lance-graph-ontology crate's own tests).
+ let mut bs = BindSpace::zeros(4); + assert!(bs.ontology().is_none(), "default must be None"); + let reg = Arc::new(OntologyRegistry::new_in_memory()); + bs.set_ontology(reg); + assert!(bs.ontology().is_some(), "after set_ontology must be Some"); + } + #[test] fn set_cycle_direct_f32() { let mut bs = BindSpace::zeros(2); diff --git a/crates/lance-graph-ontology/Cargo.toml b/crates/lance-graph-ontology/Cargo.toml new file mode 100644 index 00000000..c7da1a2d --- /dev/null +++ b/crates/lance-graph-ontology/Cargo.toml @@ -0,0 +1,50 @@ +[package] +name = "lance-graph-ontology" +version = "0.1.0" +edition = "2021" +description = "OGIT-canonical ontology spine: TTL hydration, runtime dictionary cache, and tenant namespace bridges over the lance-graph contract surface." +license = "Apache-2.0" +keywords = ["ontology", "ogit", "ttl", "rdf", "lance"] + +[dependencies] +# Contract surface: PropertySpec / Marking / SemanticType / Schema / Ontology +# / SchemaExpander. Zero-dep crate by design — we depend on it, not the reverse. +lance-graph-contract = { path = "../lance-graph-contract" } + +# TTL parser. oxttl is the smallest streaming Turtle parser in the workspace's +# dependency graph and matches the shape of OGIT's per-entity .ttl files. +oxttl = "0.1" +oxrdf = "0.2" + +# Errors. thiserror keeps the public Error enum readable without serde. +thiserror = "2" + +# Lazy globals for the SemanticType lookup table built from semantic_types.toml. +once_cell = "1" + +# TOML config (semantic_types.toml). Used as a value-shape parser only — no +# serde derives on registry types per CLAUDE.md "no JSON serialization in types". +toml = { version = "0.8", default-features = false, features = ["parse"] } + +# Hashing for idempotent TTL hydration (root checksum + per-fragment checksum). +sha2 = "0.10" + +# Lance-backed dictionary cache is feature-gated so the crate compiles without +# protoc (lance-encoding's build-time dep). 
The default in-memory registry is +# the canonical surface for tests and consumers that don't need persistence. +lance = { version = "=4.0.0", optional = true } +arrow = { version = "57", optional = true } +arrow-array = { version = "57", optional = true } +arrow-schema = { version = "57", optional = true } +tokio = { version = "1", default-features = false, features = ["rt", "macros", "fs"], optional = true } +futures = { version = "0.3", optional = true } + +[features] +default = [] +# Lance dataset persistence for the ontology_dictionary table. Requires protoc +# at build time (lance-encoding pulls prost-build). Off by default so the +# in-memory registry path is reachable without the system dependency. +lance-cache = ["dep:lance", "dep:arrow", "dep:arrow-array", "dep:arrow-schema", "dep:tokio", "dep:futures"] + +[dev-dependencies] +tempfile = "3" diff --git a/crates/lance-graph-ontology/src/bridge.rs b/crates/lance-graph-ontology/src/bridge.rs new file mode 100644 index 00000000..56873516 --- /dev/null +++ b/crates/lance-graph-ontology/src/bridge.rs @@ -0,0 +1,162 @@ +//! `NamespaceBridge` trait and the canonical `BridgeError` type. +//! +//! A bridge is a thin scoped view over the shared `OntologyRegistry`. It +//! does two things only: +//! +//! 1. Locks every operation to one G partition (cross-namespace access +//! requires explicit unlock — the bridge does not provide that). +//! 2. Translates the bridge's public-facing entity / edge / attribute names +//! to OGIT URIs via the registry. +//! +//! The defaults here do all the work; a tenant bridge typically supplies +//! `bridge_id()` + `g_lock()` + a constructor and is otherwise ~5 lines. 
+ +use crate::error::Result; +use crate::namespace::{NamespaceId, OgitUri, SchemaPtr}; +use crate::proposal::MappingRow; +use crate::registry::OntologyRegistry; +use std::sync::Arc; + +#[derive(Debug, thiserror::Error)] +pub enum BridgeError { + #[error("bridge `{bridge_id}`: namespace `{namespace}` is not registered")] + NamespaceMissing { + bridge_id: &'static str, + namespace: &'static str, + }, + + #[error("bridge `{bridge_id}`: public name `{public_name}` is not registered")] + NotInScope { + bridge_id: &'static str, + public_name: String, + }, + + #[error("bridge `{bridge_id}`: cross-namespace leak — resolved to namespace {resolved_id:?} but locked to {locked_id:?}")] + CrossNamespaceLeak { + bridge_id: &'static str, + resolved_id: NamespaceId, + locked_id: NamespaceId, + }, +} + +/// A scoped view of the shared registry. Implementations supply the +/// constants; the defaults handle resolution + scope-lock enforcement. +pub trait NamespaceBridge: Send + Sync { + fn bridge_id(&self) -> &'static str; + fn registry(&self) -> &OntologyRegistry; + fn g_lock(&self) -> NamespaceId; + + fn entity(&self, public_name: &str) -> std::result::Result<EntityRef, BridgeError> { + let ptr = self.registry().resolve(self.bridge_id(), public_name).ok_or( + BridgeError::NotInScope { + bridge_id: self.bridge_id_static(), + public_name: public_name.to_string(), + }, + )?; + if ptr.namespace_id() != self.g_lock() { + return Err(BridgeError::CrossNamespaceLeak { + bridge_id: self.bridge_id_static(), + resolved_id: ptr.namespace_id(), + locked_id: self.g_lock(), + }); + } + Ok(EntityRef { schema_ptr: ptr }) + } + + fn edge(&self, public_name: &str) -> std::result::Result<EdgeRef, BridgeError> { + let ptr = self.registry().resolve(self.bridge_id(), public_name).ok_or( + BridgeError::NotInScope { + bridge_id: self.bridge_id_static(), + public_name: public_name.to_string(), + }, + )?; + if ptr.namespace_id() != self.g_lock() { + return Err(BridgeError::CrossNamespaceLeak { + bridge_id: self.bridge_id_static(), + resolved_id:
ptr.namespace_id(), + locked_id: self.g_lock(), + }); + } + Ok(EdgeRef { schema_ptr: ptr }) + } + + /// Resolve by raw OGIT URI. Useful for the `ogit` bridge that does + /// not maintain a public-name dictionary; tenants generally prefer + /// `entity()` / `edge()`. + fn entity_by_uri(&self, uri: &OgitUri) -> std::result::Result<EntityRef, BridgeError> { + let ptr = self.registry().resolve_uri(uri.as_str()).ok_or( + BridgeError::NotInScope { + bridge_id: self.bridge_id_static(), + public_name: uri.as_str().to_string(), + }, + )?; + if ptr.namespace_id() != self.g_lock() { + return Err(BridgeError::CrossNamespaceLeak { + bridge_id: self.bridge_id_static(), + resolved_id: ptr.namespace_id(), + locked_id: self.g_lock(), + }); + } + Ok(EntityRef { schema_ptr: ptr }) + } + + /// Returns the underlying dictionary row (full audit detail). + fn row(&self, public_name: &str) -> std::result::Result<MappingRow, BridgeError> { + let entity = self.entity(public_name)?; + let _ptr = entity.schema_ptr; + // Re-look-up via URI to get the row — `resolve` returns just the + // pointer so we go through the registry's row_for_uri interface. + // First we find the URI by enumerating the namespace; that is + // O(rows in namespace) but acceptable for this audit-only path. + let registry = self.registry(); + let ns_name = registry + .namespace_names() + .into_iter() + .find(|n| registry.namespace_id(n) == Some(self.g_lock())) + .ok_or(BridgeError::NamespaceMissing { + bridge_id: self.bridge_id_static(), + namespace: "", + })?; + let rows = registry.enumerate(&ns_name); + rows.into_iter() + .find(|r| r.public_name == public_name && r.bridge_id == self.bridge_id()) + .ok_or(BridgeError::NotInScope { + bridge_id: self.bridge_id_static(), + public_name: public_name.to_string(), + }) + } + + /// `bridge_id` as `&'static str`. Default just returns the same value + /// as `bridge_id()`; bridges with non-static identifiers can override.
+ fn bridge_id_static(&self) -> &'static str { + self.bridge_id() + } +} + +/// Pointer to an entity in the dictionary. The hot-path consumer uses +/// `schema_ptr.entity_type_id()` as a dense index into per-namespace data. +#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)] +pub struct EntityRef { + pub schema_ptr: SchemaPtr, +} + +#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)] +pub struct EdgeRef { + pub schema_ptr: SchemaPtr, +} + +/// Convenience wrapper: converts a generic registry handle into a typed +/// bridge of the requested implementation. +pub fn make_bridge<B>(registry: Arc<OntologyRegistry>) -> Result<B> +where + B: BridgeFromRegistry, +{ + B::from_registry(registry) +} + +/// Implemented by every bridge struct that has a single-arg constructor +/// (registry → Self). All three default tenant bridges in +/// `crate::bridges` implement it. +pub trait BridgeFromRegistry: Sized { + fn from_registry(registry: Arc<OntologyRegistry>) -> Result<Self>; +} diff --git a/crates/lance-graph-ontology/src/bridges/medcare_bridge.rs b/crates/lance-graph-ontology/src/bridges/medcare_bridge.rs new file mode 100644 index 00000000..4a2ee6be --- /dev/null +++ b/crates/lance-graph-ontology/src/bridges/medcare_bridge.rs @@ -0,0 +1,44 @@ +//! MedCare (healthcare) tenant bridge — locks to the `Healthcare` +//! namespace. The Healthcare namespace itself is reserved and will be +//! populated by a future session (the FMA / SNOMED / RadLex import is the +//! remit of `lance-graph-rdf` in `lance-graph-rdf-fma-snomed-v1`). 
+ +use crate::bridge::{BridgeFromRegistry, NamespaceBridge}; +use crate::error::{Error, Result}; +use crate::namespace::NamespaceId; +use crate::registry::OntologyRegistry; +use std::sync::Arc; + +pub const NAMESPACE: &str = "Healthcare"; + +pub struct MedcareBridge { + registry: Arc<OntologyRegistry>, + g_lock: NamespaceId, +} + +impl MedcareBridge { + pub fn new(registry: Arc<OntologyRegistry>) -> Result<Self> { + let g_lock = registry + .namespace_id(NAMESPACE) + .ok_or_else(|| Error::UnknownNamespace(NAMESPACE.to_string()))?; + Ok(Self { registry, g_lock }) + } +} + +impl NamespaceBridge for MedcareBridge { + fn bridge_id(&self) -> &'static str { + "medcare" + } + fn registry(&self) -> &OntologyRegistry { + &self.registry + } + fn g_lock(&self) -> NamespaceId { + self.g_lock + } +} + +impl BridgeFromRegistry for MedcareBridge { + fn from_registry(registry: Arc<OntologyRegistry>) -> Result<Self> { + Self::new(registry) + } +} diff --git a/crates/lance-graph-ontology/src/bridges/mod.rs b/crates/lance-graph-ontology/src/bridges/mod.rs new file mode 100644 index 00000000..7fe113e7 --- /dev/null +++ b/crates/lance-graph-ontology/src/bridges/mod.rs @@ -0,0 +1,23 @@ +//! Default tenant bridge implementations. +//! +//! Three bridges ship in this session: +//! +//! - [`OgitBridge`]: pass-through bridge for tools that already speak raw +//! OGIT URIs. `bridge_id = "ogit"`. Locks to whatever namespace its +//! constructor is called with (typically the namespace of the caller). +//! - [`WoaBridge`]: locks to the `WorkOrder` namespace. Public names like +//! `Customer`, `WorkOrder`, `Position` are translated via the registry +//! to the corresponding `ogit.WorkOrder:*` URIs. +//! - [`MedcareBridge`]: locks to the `Healthcare` namespace. +//! +//! The `smb-bridge` and `callcenter-bridge` are NOT created in this +//! session: smb stays on its native ontology fallback, callcenter has its +//! own auth + per-customer scoping concerns that need a separate design pass. 
+ +mod medcare_bridge; +mod ogit_bridge; +mod woa_bridge; + +pub use medcare_bridge::MedcareBridge; +pub use ogit_bridge::OgitBridge; +pub use woa_bridge::WoaBridge; diff --git a/crates/lance-graph-ontology/src/bridges/ogit_bridge.rs b/crates/lance-graph-ontology/src/bridges/ogit_bridge.rs new file mode 100644 index 00000000..f0fcdc56 --- /dev/null +++ b/crates/lance-graph-ontology/src/bridges/ogit_bridge.rs @@ -0,0 +1,61 @@ +//! OGIT pass-through bridge. +//! +//! Tools that already speak raw OGIT URIs do not need a public-name +//! dictionary — they hand the URI directly. The `OgitBridge` provides a +//! consistent `NamespaceBridge` surface for those callers, locking to a +//! single namespace at construction time. +//! +//! A common pattern is to spin one `OgitBridge` per OGIT namespace that +//! the consumer cares about, e.g. one for `Network`, another for `Auth`. + +use crate::bridge::{BridgeFromRegistry, NamespaceBridge}; +use crate::error::{Error, Result}; +use crate::namespace::NamespaceId; +use crate::registry::OntologyRegistry; +use std::sync::Arc; + +pub struct OgitBridge { + registry: Arc<OntologyRegistry>, + namespace_name: String, + g_lock: NamespaceId, +} + +impl OgitBridge { + /// Construct an OGIT bridge locked to the given namespace. 
+ pub fn for_namespace(registry: Arc<OntologyRegistry>, namespace: &str) -> Result<Self> { + let g_lock = registry + .namespace_id(namespace) + .ok_or_else(|| Error::UnknownNamespace(namespace.to_string()))?; + Ok(Self { + registry, + namespace_name: namespace.to_string(), + g_lock, + }) + } + + pub fn namespace_name(&self) -> &str { + &self.namespace_name + } +} + +impl NamespaceBridge for OgitBridge { + fn bridge_id(&self) -> &'static str { + "ogit" + } + fn registry(&self) -> &OntologyRegistry { + &self.registry + } + fn g_lock(&self) -> NamespaceId { + self.g_lock + } +} + +impl BridgeFromRegistry for OgitBridge { + /// Default constructor: locks to the `Network` namespace, which is the + /// most heavily-populated OGIT namespace and a reasonable smoke-test + /// default. Most consumers should call `OgitBridge::for_namespace` + /// directly with their own namespace name. + fn from_registry(registry: Arc<OntologyRegistry>) -> Result<Self> { + Self::for_namespace(registry, "Network") + } +} diff --git a/crates/lance-graph-ontology/src/bridges/woa_bridge.rs b/crates/lance-graph-ontology/src/bridges/woa_bridge.rs new file mode 100644 index 00000000..6f85184c --- /dev/null +++ b/crates/lance-graph-ontology/src/bridges/woa_bridge.rs @@ -0,0 +1,50 @@ +//! WoA (Work Order Application) tenant bridge — locks to the `WorkOrder` +//! namespace. Phase 6 of this session emits the corresponding TTL into +//! `AdaWorldAPI/OGIT/NTO/WorkOrder/`. 
+ +use crate::bridge::{BridgeError, BridgeFromRegistry, NamespaceBridge}; +use crate::error::{Error, Result}; +use crate::namespace::NamespaceId; +use crate::registry::OntologyRegistry; +use std::sync::Arc; + +pub const NAMESPACE: &str = "WorkOrder"; + +pub struct WoaBridge { + registry: Arc<OntologyRegistry>, + g_lock: NamespaceId, +} + +impl WoaBridge { + pub fn new(registry: Arc<OntologyRegistry>) -> Result<Self> { + let g_lock = registry + .namespace_id(NAMESPACE) + .ok_or_else(|| Error::UnknownNamespace(NAMESPACE.to_string()))?; + Ok(Self { registry, g_lock }) + } +} + +impl NamespaceBridge for WoaBridge { + fn bridge_id(&self) -> &'static str { + "woa" + } + fn registry(&self) -> &OntologyRegistry { + &self.registry + } + fn g_lock(&self) -> NamespaceId { + self.g_lock + } +} + +impl BridgeFromRegistry for WoaBridge { + fn from_registry(registry: Arc<OntologyRegistry>) -> Result<Self> { + Self::new(registry) + } +} + +// Compile-only check that BridgeError is reachable from this crate. +#[allow(dead_code)] +fn _compile_check(b: &WoaBridge) -> std::result::Result<(), BridgeError> { + let _ = b.entity("WorkOrder")?; + Ok(()) +} diff --git a/crates/lance-graph-ontology/src/error.rs b/crates/lance-graph-ontology/src/error.rs new file mode 100644 index 00000000..7dd716ca --- /dev/null +++ b/crates/lance-graph-ontology/src/error.rs @@ -0,0 +1,64 @@ +//! Error type for the ontology crate. +//! +//! Public error enum with `thiserror` derives. The variants name the failure +//! site (TTL, Lance, namespace, bridge) so that consumers can pattern-match +//! and recover where appropriate. No serde, per the workspace "no JSON +//! serialization in types" rule. 
+ +use std::path::PathBuf; + +#[derive(Debug, thiserror::Error)] +pub enum Error { + #[error("I/O error reading {path}: {source}")] + Io { + path: PathBuf, + #[source] + source: std::io::Error, + }, + + #[error("TTL parse error in {path}: {message}")] + TtlParse { path: PathBuf, message: String }, + + #[error("namespace `{0}` is not registered")] + UnknownNamespace(String), + + #[error("bridge `{bridge_id}`: public name `{public_name}` is not in scope (namespace lock)")] + OutOfScope { + bridge_id: String, + public_name: String, + }, + + #[error("OGIT URI `{0}` does not match the expected `ogit.<namespace>:<name>` shape")] + InvalidOgitUri(String), + + #[error("ontology registry has no entry for `{0}`")] + NotFound(String), + + #[error("toml decode error in semantic types: {0}")] + TomlDecode(String), + + #[error("checksum mismatch for `{0}` — TTL fragment changed but registry says it is idempotent")] + ChecksumMismatch(String), + + #[error("hydration produced 0 mappings from {0:?} — refusing to commit an empty registry")] + EmptyHydration(PathBuf), + + #[cfg(feature = "lance-cache")] + #[error("lance dataset error: {0}")] + Lance(String), + + #[cfg(feature = "lance-cache")] + #[error("arrow record-batch error: {0}")] + Arrow(String), + + #[error("internal: {0}")] + Other(String), +} + +impl Error { + pub fn other(msg: impl Into<String>) -> Self { + Self::Other(msg.into()) + } +} + +pub type Result<T> = std::result::Result<T, Error>; diff --git a/crates/lance-graph-ontology/src/foundry_map.rs b/crates/lance-graph-ontology/src/foundry_map.rs new file mode 100644 index 00000000..1d621501 --- /dev/null +++ b/crates/lance-graph-ontology/src/foundry_map.rs @@ -0,0 +1,77 @@ +//! `MappingProposal` → contract `Ontology` adapter. +//! +//! The TTL hydrator emits a flat sequence of `MappingProposal`s. Consumers +//! that want to talk to existing `lance-graph-contract::ontology` surfaces +//! (`SchemaExpander`, `SpoBridge`, callcenter `ontology_dto`) want a single +//! 
`Ontology` value with `schemas: Vec`, `links: Vec`, +//! `actions: Vec`. This module is the adapter. +//! +//! Carrier-method doctrine: methods on `OntologyAssembler` and +//! `MappingProposal` itself, not free functions on slices. + +use crate::proposal::{MappingProposal, MappingProposalKind}; +use lance_graph_contract::ontology::{Ontology, OntologyBuilder}; + +/// Assembles a contract `Ontology` from a slice of `MappingProposal`s. +/// Multiple proposals may target the same entity (e.g. one TTL file +/// declares the entity, a sibling file adds a verb between two of them); +/// the assembler merges them by name. +pub struct OntologyAssembler { + name: &'static str, +} + +impl OntologyAssembler { + pub fn new(name: &'static str) -> Self { + Self { name } + } + + pub fn assemble(&self, proposals: &[MappingProposal]) -> Ontology { + let mut builder: OntologyBuilder = Ontology::builder(self.name); + for proposal in proposals { + match &proposal.kind { + MappingProposalKind::Entity { schema } => { + builder = builder.schema(schema.clone()); + } + MappingProposalKind::Edge { link } => { + builder = builder.link(link.clone()); + } + // Standalone attribute proposals are SemanticType + // annotations on the dictionary; they don't add to the + // contract `Ontology` directly. The dictionary still + // carries them, and consumers that want SemanticType for + // a predicate look them up via `OntologyRegistry::resolve_uri`. + MappingProposalKind::Attribute { .. 
} => {} + } + } + builder.build() + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::namespace::OgitUri; + use lance_graph_contract::property::{Marking, Schema}; + + #[test] + fn assemble_merges_entities_and_links() { + let entity = MappingProposal { + public_name: "ogit.Test:Widget".to_string(), + bridge_id: "ogit".to_string(), + ogit_uri: OgitUri::parse("ogit.Test:Widget").unwrap(), + namespace: "Test".to_string(), + kind: MappingProposalKind::Entity { + schema: Schema::builder("Widget").required("id").build(), + }, + marking: Marking::Internal, + confidence: 1.0, + source_uri: "test://1".to_string(), + checksum: "abc".to_string(), + created_by: "test".to_string(), + }; + let assembler = OntologyAssembler::new("Test"); + let ontology = assembler.assemble(std::slice::from_ref(&entity)); + assert_eq!(ontology.schemas.len(), 1); + assert_eq!(ontology.schemas[0].name, "Widget"); + } +} diff --git a/crates/lance-graph-ontology/src/lance_cache.rs b/crates/lance-graph-ontology/src/lance_cache.rs new file mode 100644 index 00000000..a3cdb22d --- /dev/null +++ b/crates/lance-graph-ontology/src/lance_cache.rs @@ -0,0 +1,365 @@ +//! Lance dataset persistence for the `ontology_dictionary` table. +//! +//! Feature-gated behind `lance-cache` so the crate compiles without +//! `protoc` (Lance's `lance-encoding` build-script requires `protoc` via +//! `prost-build`). When enabled, this module owns the Arrow schema for +//! the dictionary table and translates between `MappingRow` and +//! `arrow::record_batch::RecordBatch`. +//! +//! ## Tables +//! +//! - `ontology_dictionary` — append-only rows, never UPDATE / DELETE. +//! Soft-deletes go through the `active: Boolean` column. +//! - `ontology_meta` — single row updated, holds `ttl_root_checksum` for +//! idempotent re-hydration. +//! +//! Every method here is async because Lance's native I/O is async. 
+ +use crate::error::{Error, Result}; +use crate::namespace::{NamespaceId, OgitUri, SchemaKind, SchemaPtr}; +use crate::proposal::MappingRow; +use arrow::array::{ + ArrayRef, BooleanArray, Float32Array, RecordBatch, StringArray, TimestampMicrosecondArray, + UInt32Array, UInt8Array, +}; +use arrow_schema::{DataType, Field, Schema as ArrowSchema, TimeUnit}; +use lance::dataset::{Dataset, WriteMode, WriteParams}; +use lance_graph_contract::property::{Marking, SemanticType}; +use std::path::{Path, PathBuf}; +use std::sync::Arc; + +const DICTIONARY_NAME: &str = "ontology_dictionary"; +const META_NAME: &str = "ontology_meta"; + +pub struct LanceWriter { + base: PathBuf, +} + +impl LanceWriter { + pub async fn open_or_create(path: &Path) -> Result<Self> { + std::fs::create_dir_all(path).map_err(|source| Error::Io { + path: path.to_path_buf(), + source, + })?; + Ok(Self { + base: path.to_path_buf(), + }) + } + + pub fn dictionary_path(&self) -> PathBuf { + self.base.join(DICTIONARY_NAME) + } + + pub fn meta_path(&self) -> PathBuf { + self.base.join(META_NAME) + } + + pub async fn flush(&self, rows: &[MappingRow]) -> Result<()> { + if rows.is_empty() { + return Ok(()); + } + let batch = rows_to_record_batch(rows)?; + let schema = batch.schema(); + let path = self.dictionary_path(); + let path_str = path.to_string_lossy().to_string(); + let write_params = WriteParams { + mode: WriteMode::Append, + ..Default::default() + }; + let reader = + arrow::record_batch::RecordBatchIterator::new(vec![Ok(batch)].into_iter(), schema); + Dataset::write(reader, &path_str, Some(write_params)) + .await + .map_err(|e| Error::Lance(format!("write {}: {e}", path_str)))?; + Ok(()) + } + + pub async fn replay(&self) -> Result<Vec<MappingRow>> { + let path = self.dictionary_path(); + if !path.exists() { + return Ok(Vec::new()); + } + let path_str = path.to_string_lossy().to_string(); + let dataset = Dataset::open(&path_str) + .await + .map_err(|e| 
Error::Lance(format!("open {}: {e}", path_str)))?; + let scanner = dataset + .scan() + .try_into_stream() + .await + .map_err(|e| Error::Lance(format!("scan: {e}")))?; + use futures::StreamExt; + let mut rows = Vec::new(); + let mut stream = scanner; + while let Some(batch) = stream.next().await { + let batch = batch.map_err(|e| Error::Lance(format!("batch: {e}")))?; + rows.append(&mut record_batch_to_rows(&batch)?); + } + Ok(rows) + } + + pub async fn last_root_checksum(&self) -> Result<Option<String>> { + let path = self.meta_path(); + if !path.exists() { + return Ok(None); + } + let path_str = path.to_string_lossy().to_string(); + let dataset = Dataset::open(&path_str) + .await + .map_err(|e| Error::Lance(format!("open meta: {e}")))?; + let mut stream = dataset + .scan() + .try_into_stream() + .await + .map_err(|e| Error::Lance(format!("scan meta: {e}")))?; + use futures::StreamExt; + if let Some(batch) = stream.next().await { + let batch = batch.map_err(|e| Error::Lance(format!("meta batch: {e}")))?; + let col = batch + .column_by_name("ttl_root_checksum") + .ok_or_else(|| Error::Lance("missing ttl_root_checksum".to_string()))?; + let arr = col + .as_any() + .downcast_ref::<StringArray>() + .ok_or_else(|| Error::Lance("ttl_root_checksum not String".to_string()))?; + if arr.len() > 0 { + return Ok(Some(arr.value(0).to_string())); + } + } + Ok(None) + } + + pub async fn set_last_root_checksum(&self, checksum: &str) -> Result<()> { + let schema = Arc::new(ArrowSchema::new(vec![ + Field::new("ttl_root_checksum", DataType::Utf8, false), + Field::new( + "last_hydrated_at", + DataType::Timestamp(TimeUnit::Microsecond, None), + false, + ), + Field::new("crate_version", DataType::Utf8, false), + ])); + let now = chrono_micros(); + let cols: Vec<ArrayRef> = vec![ + Arc::new(StringArray::from(vec![checksum])), + Arc::new(TimestampMicrosecondArray::from(vec![now])), + Arc::new(StringArray::from(vec![env!("CARGO_PKG_VERSION")])), + ]; + let batch = RecordBatch::try_new(schema.clone(), cols) + .map_err(|e| 
Error::Arrow(format!("meta batch: {e}")))?; + let path = self.meta_path(); + let path_str = path.to_string_lossy().to_string(); + // Meta is a single-row table — overwrite. + let reader = arrow::record_batch::RecordBatchIterator::new( + vec![Ok(batch)].into_iter(), + schema, + ); + let write_params = WriteParams { + mode: WriteMode::Overwrite, + ..Default::default() + }; + Dataset::write(reader, &path_str, Some(write_params)) + .await + .map_err(|e| Error::Lance(format!("write meta: {e}")))?; + Ok(()) + } +} + +fn dictionary_schema() -> Arc<ArrowSchema> { + Arc::new(ArrowSchema::new(vec![ + Field::new("bridge_id", DataType::Utf8, false), + Field::new("public_name", DataType::Utf8, false), + Field::new("ogit_uri", DataType::Utf8, false), + Field::new("namespace_id", DataType::UInt8, false), + Field::new("schema_ptr", DataType::UInt32, false), + Field::new("kind", DataType::Utf8, false), + Field::new("semantic_type", DataType::Utf8, false), + Field::new("marking", DataType::Utf8, false), + Field::new("confidence", DataType::Float32, false), + Field::new( + "created_at", + DataType::Timestamp(TimeUnit::Microsecond, None), + false, + ), + Field::new("created_by", DataType::Utf8, false), + Field::new("source_uri", DataType::Utf8, false), + Field::new("active", DataType::Boolean, false), + Field::new("checksum", DataType::Utf8, false), + ])) +} + +fn rows_to_record_batch(rows: &[MappingRow]) -> Result<RecordBatch> { + let bridge_id: Vec<&str> = rows.iter().map(|r| r.bridge_id.as_str()).collect(); + let public_name: Vec<&str> = rows.iter().map(|r| r.public_name.as_str()).collect(); + let ogit_uri: Vec<&str> = rows.iter().map(|r| r.ogit_uri.as_str()).collect(); + let namespace_id: Vec<u8> = rows.iter().map(|r| r.namespace_id.raw()).collect(); + let schema_ptr: Vec<u32> = rows.iter().map(|r| r.schema_ptr.raw()).collect(); + let kind: Vec<&str> = rows.iter().map(|r| r.kind.as_str()).collect(); + let semantic_type: Vec<String> = + rows.iter().map(|r| 
semantic_type_label(&r.semantic_type)).collect(); + let marking: Vec<&str> = rows.iter().map(|r| marking_label(r.marking)).collect(); + let confidence: Vec<f32> = rows.iter().map(|r| r.confidence).collect(); + let created_at: Vec<i64> = rows.iter().map(|r| r.created_at_us).collect(); + let created_by: Vec<&str> = rows.iter().map(|r| r.created_by.as_str()).collect(); + let source_uri: Vec<&str> = rows.iter().map(|r| r.source_uri.as_str()).collect(); + let active: Vec<bool> = rows.iter().map(|r| r.active).collect(); + let checksum: Vec<&str> = rows.iter().map(|r| r.checksum.as_str()).collect(); + + let cols: Vec<ArrayRef> = vec![ + Arc::new(StringArray::from(bridge_id)), + Arc::new(StringArray::from(public_name)), + Arc::new(StringArray::from(ogit_uri)), + Arc::new(UInt8Array::from(namespace_id)), + Arc::new(UInt32Array::from(schema_ptr)), + Arc::new(StringArray::from(kind)), + Arc::new(StringArray::from(semantic_type)), + Arc::new(StringArray::from(marking)), + Arc::new(Float32Array::from(confidence)), + Arc::new(TimestampMicrosecondArray::from(created_at)), + Arc::new(StringArray::from(created_by)), + Arc::new(StringArray::from(source_uri)), + Arc::new(BooleanArray::from(active)), + Arc::new(StringArray::from(checksum)), + ]; + RecordBatch::try_new(dictionary_schema(), cols).map_err(|e| Error::Arrow(format!("{e}"))) +} + +fn record_batch_to_rows(batch: &RecordBatch) -> Result<Vec<MappingRow>> { + let bridge_id = string_col(batch, "bridge_id")?; + let public_name = string_col(batch, "public_name")?; + let ogit_uri = string_col(batch, "ogit_uri")?; + let namespace_id = u8_col(batch, "namespace_id")?; + let schema_ptr = u32_col(batch, "schema_ptr")?; + let kind = string_col(batch, "kind")?; + let semantic_type = string_col(batch, "semantic_type")?; + let marking = string_col(batch, "marking")?; + let confidence = f32_col(batch, "confidence")?; + let created_at = ts_col(batch, "created_at")?; + let created_by = string_col(batch, "created_by")?; + let source_uri = string_col(batch, "source_uri")?; + let active 
= bool_col(batch, "active")?; + let checksum = string_col(batch, "checksum")?; + + let mut rows = Vec::with_capacity(bridge_id.len()); + for i in 0..bridge_id.len() { + rows.push(MappingRow { + bridge_id: bridge_id.value(i).to_string(), + public_name: public_name.value(i).to_string(), + ogit_uri: OgitUri::from_string_unchecked(ogit_uri.value(i)), + namespace_id: NamespaceId(namespace_id.value(i)), + schema_ptr: SchemaPtr::from_raw(schema_ptr.value(i)), + kind: SchemaKind::parse(kind.value(i)).unwrap_or(SchemaKind::Entity), + semantic_type: parse_semantic_type_label(semantic_type.value(i)), + marking: parse_marking_label(marking.value(i)), + confidence: confidence.value(i), + created_at_us: created_at.value(i), + created_by: created_by.value(i).to_string(), + source_uri: source_uri.value(i).to_string(), + active: active.value(i), + checksum: checksum.value(i).to_string(), + }); + } + Ok(rows) +} + +fn string_col<'a>(batch: &'a RecordBatch, name: &str) -> Result<&'a StringArray> { + batch + .column_by_name(name) + .and_then(|c| c.as_any().downcast_ref::<StringArray>()) + .ok_or_else(|| Error::Arrow(format!("missing or non-Utf8 column `{name}`"))) +} +fn u8_col<'a>(batch: &'a RecordBatch, name: &str) -> Result<&'a UInt8Array> { + batch + .column_by_name(name) + .and_then(|c| c.as_any().downcast_ref::<UInt8Array>()) + .ok_or_else(|| Error::Arrow(format!("missing or non-U8 column `{name}`"))) +} +fn u32_col<'a>(batch: &'a RecordBatch, name: &str) -> Result<&'a UInt32Array> { + batch + .column_by_name(name) + .and_then(|c| c.as_any().downcast_ref::<UInt32Array>()) + .ok_or_else(|| Error::Arrow(format!("missing or non-U32 column `{name}`"))) +} +fn f32_col<'a>(batch: &'a RecordBatch, name: &str) -> Result<&'a Float32Array> { + batch + .column_by_name(name) + .and_then(|c| c.as_any().downcast_ref::<Float32Array>()) + .ok_or_else(|| Error::Arrow(format!("missing or non-F32 column `{name}`"))) +} +fn ts_col<'a>(batch: &'a RecordBatch, name: &str) -> Result<&'a TimestampMicrosecondArray> { + batch + .column_by_name(name) + 
.and_then(|c| c.as_any().downcast_ref::<TimestampMicrosecondArray>()) + .ok_or_else(|| Error::Arrow(format!("missing or non-Timestamp column `{name}`"))) +} +fn bool_col<'a>(batch: &'a RecordBatch, name: &str) -> Result<&'a BooleanArray> { + batch + .column_by_name(name) + .and_then(|c| c.as_any().downcast_ref::<BooleanArray>()) + .ok_or_else(|| Error::Arrow(format!("missing or non-Bool column `{name}`"))) +} + +fn marking_label(m: Marking) -> &'static str { + match m { + Marking::Public => "Public", + Marking::Internal => "Internal", + Marking::Pii => "Pii", + Marking::Financial => "Financial", + Marking::Restricted => "Restricted", + } +} +fn parse_marking_label(s: &str) -> Marking { + match s { + "Public" => Marking::Public, + "Pii" => Marking::Pii, + "Financial" => Marking::Financial, + "Restricted" => Marking::Restricted, + _ => Marking::Internal, + } +} + +fn semantic_type_label(t: &SemanticType) -> String { + match t { + SemanticType::PlainText => "PlainText".to_string(), + SemanticType::Iban => "Iban".to_string(), + SemanticType::Email => "Email".to_string(), + SemanticType::Phone => "Phone".to_string(), + SemanticType::Address => "Address".to_string(), + SemanticType::Url => "Url".to_string(), + SemanticType::TaxId => "TaxId".to_string(), + SemanticType::CustomerId => "CustomerId".to_string(), + SemanticType::InvoiceNumber => "InvoiceNumber".to_string(), + SemanticType::Image => "Image".to_string(), + SemanticType::Currency(code) => format!("Currency({code})"), + SemanticType::File(mime) => format!("File({mime})"), + SemanticType::Date(p) => format!("Date({p:?})"), + SemanticType::Geo(g) => format!("Geo({g:?})"), + } +} +fn parse_semantic_type_label(s: &str) -> SemanticType { + match s { + "PlainText" => SemanticType::PlainText, + "Iban" => SemanticType::Iban, + "Email" => SemanticType::Email, + "Phone" => SemanticType::Phone, + "Address" => SemanticType::Address, + "Url" => SemanticType::Url, + "TaxId" => SemanticType::TaxId, + "CustomerId" => SemanticType::CustomerId, + "InvoiceNumber" => 
SemanticType::InvoiceNumber, + "Image" => SemanticType::Image, + // Parametric variants need a static-string parameter we cannot + // recover from the dictionary; fall through to PlainText. + _ => SemanticType::PlainText, + } +} + +fn chrono_micros() -> i64 { + use std::time::{SystemTime, UNIX_EPOCH}; + SystemTime::now() + .duration_since(UNIX_EPOCH) + .map(|d| d.as_micros() as i64) + .unwrap_or(0) +} diff --git a/crates/lance-graph-ontology/src/lib.rs b/crates/lance-graph-ontology/src/lib.rs new file mode 100644 index 00000000..a33bbc62 --- /dev/null +++ b/crates/lance-graph-ontology/src/lib.rs @@ -0,0 +1,59 @@ +//! `lance-graph-ontology` — the OGIT-canonical ontology spine for lance-graph +//! tenants. +//! +//! This crate consolidates per-tenant bridge multiplication into one shared +//! registry. OGIT becomes the canonical TTL ontology source; Lance becomes +//! the runtime dictionary cache; tenant bridges (woa, medcare, ogit) become +//! thin scoped views over the shared registry. TTL is the only ontology +//! exchange format. +//! +//! ## Surface +//! +//! - [`OntologyRegistry`] is the single registry. It hydrates from a TTL root +//! directory (typically the AdaWorldAPI/OGIT fork checked out next to the +//! workspace), holds an in-memory dictionary keyed by `(bridge_id, +//! public_name)` and by OGIT URI, and (under the `lance-cache` feature) +//! persists rows append-only to a Lance dataset. +//! - [`NamespaceBridge`] is the trait every tenant bridge implements. Default +//! methods do the heavy lifting: a typical tenant bridge is ~15-20 lines +//! that lock to one namespace and route resolution through the shared +//! registry. See [`bridges::WoaBridge`], [`bridges::MedcareBridge`], +//! [`bridges::OgitBridge`]. +//! - [`MappingProposal`] is the producer-side DTO. TTL hydration emits +//! proposals; schema scanners (MySQL/MSSQL, future) and customer admin +//! forms emit proposals; everything funnels through one append path. +//! 
- [`SchemaSource`] is the abstract producer trait. Implementations: TTL +//! directory walker (in this crate), MySQL/MSSQL scanners (future), +//! customer admin forms (future UX layer). +//! +//! ## What this crate is NOT +//! +//! It is not a new SPO store. It is not a quad store. It does not parse +//! Cypher / Gremlin / SPARQL / GQL — those parsers already exist in +//! `lance-graph-planner::strategy::*`. It does not introduce new +//! `CausalEdge64` variants or new `BindSpace` columns. It does not modify +//! the MUL gate logic. It is a parser + cache + scoping facade over the +//! existing `lance-graph-contract::ontology` surface. + +pub mod bridge; +pub mod bridges; +pub mod error; +pub mod foundry_map; +pub mod namespace; +pub mod proposal; +pub mod registry; +pub mod schema_source; +pub mod semantic_types; +pub mod ttl_parse; + +#[cfg(feature = "lance-cache")] +pub mod lance_cache; + +pub use bridge::{BridgeError, NamespaceBridge}; +pub use error::Error; +pub use namespace::{NamespaceId, OgitUri, SchemaPtr}; +pub use proposal::{ + HydrationReport, MappingHandle, MappingProposal, MappingProposalKind, MappingRow, +}; +pub use registry::OntologyRegistry; +pub use schema_source::SchemaSource; diff --git a/crates/lance-graph-ontology/src/namespace.rs b/crates/lance-graph-ontology/src/namespace.rs new file mode 100644 index 00000000..9abbb5b9 --- /dev/null +++ b/crates/lance-graph-ontology/src/namespace.rs @@ -0,0 +1,211 @@ +//! Namespace + URI + SchemaPtr identity types. +//! +//! `NamespaceId` is the lazy-lock register switch G — one byte per OGIT +//! namespace. `OgitUri` is the fully-qualified canonical name +//! (`ogit.Network:IPAddress`). `SchemaPtr` is a packed pointer of +//! `(namespace_id, entity_type_id, kind_disc)` that the hot-path resolver +//! returns. The packed layout matches the plan's bit-packing convention: +//! +//! ```text +//! SchemaPtr (u32): +//! bits 31..24 : namespace_id (u8) +//! 
bits 23..8 : entity_type_id (u16, dense within the namespace) +//! bits 7..0 : kind discriminant (u8) — Entity / Edge / Attribute +//! ``` +//! +//! Carrier-method doctrine: methods live on these types, not free functions. + +use crate::error::{Error, Result}; + +/// G: the lazy-lock register switch. 0 is reserved for "unknown / unbound"; +/// 1..=255 are valid namespace ordinals assigned by the registry as TTL +/// hydrates each namespace for the first time. +#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)] +pub struct NamespaceId(pub u8); + +impl NamespaceId { + pub const UNKNOWN: NamespaceId = NamespaceId(0); + + pub const fn raw(self) -> u8 { + self.0 + } + + pub const fn is_known(self) -> bool { + self.0 != 0 + } +} + +/// The fully-qualified OGIT URI for an entity, edge, or attribute. +/// Form: `ogit.<namespace>:<name>`. We store as `String` because the +/// dictionary table is dynamic — namespaces can be added at runtime. +#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)] +pub struct OgitUri(String); + +impl OgitUri { + /// Construct an OgitUri without validation. Prefer [`OgitUri::parse`]. + pub fn from_string_unchecked(s: impl Into<String>) -> Self { + Self(s.into()) + } + + /// Parse and validate an OgitUri. The shape `ogit.<namespace>:<name>` is + /// enforced; bare strings or empty namespace/name are rejected. + pub fn parse(s: &str) -> Result<Self> { + let ns = Self::namespace_part(s).filter(|p| !p.is_empty()); + let name = Self::name_part(s).filter(|p| !p.is_empty()); + if ns.is_some() && name.is_some() { + Ok(Self(s.to_string())) + } else { + Err(Error::InvalidOgitUri(s.to_string())) + } + } + + pub fn as_str(&self) -> &str { + &self.0 + } + + pub fn into_string(self) -> String { + self.0 + } + + /// Returns `Some("Network")` for `ogit.Network:IPAddress`. + pub fn namespace(&self) -> Option<&str> { + Self::namespace_part(&self.0) + } + + /// Returns `Some("IPAddress")` for `ogit.Network:IPAddress`. 
+ pub fn name(&self) -> Option<&str> { + Self::name_part(&self.0) + } + + fn namespace_part(s: &str) -> Option<&str> { + let after_prefix = s.strip_prefix("ogit.")?; + let colon = after_prefix.find(':')?; + Some(&after_prefix[..colon]) + } + + fn name_part(s: &str) -> Option<&str> { + let colon = s.find(':')?; + let after = &s[colon + 1..]; + if after.is_empty() { + None + } else { + Some(after) + } + } +} + +impl std::fmt::Display for OgitUri { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.write_str(&self.0) + } +} + +/// Packed schema pointer. Returned from +/// [`crate::OntologyRegistry::resolve`]. The hot path consumer pattern is +/// to compare the `namespace_id()` against the bridge's lock and then use +/// the `entity_type_id()` as the dense local index. +#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)] +pub struct SchemaPtr(u32); + +impl SchemaPtr { + pub const fn new(namespace_id: NamespaceId, entity_type_id: u16, kind: SchemaKind) -> Self { + let packed = ((namespace_id.0 as u32) << 24) + | ((entity_type_id as u32) << 8) + | (kind as u32 & 0xFF); + Self(packed) + } + + pub const fn raw(self) -> u32 { + self.0 + } + + /// Reconstruct a SchemaPtr from its packed `u32`. Used by the Lance + /// cache when replaying the dictionary on startup. 
+ pub const fn from_raw(raw: u32) -> Self { + Self(raw) + } + + pub const fn namespace_id(self) -> NamespaceId { + NamespaceId(((self.0 >> 24) & 0xFF) as u8) + } + + pub const fn entity_type_id(self) -> u16 { + ((self.0 >> 8) & 0xFFFF) as u16 + } + + pub const fn kind(self) -> SchemaKind { + match self.0 & 0xFF { + 0 => SchemaKind::Entity, + 1 => SchemaKind::Edge, + 2 => SchemaKind::Attribute, + _ => SchemaKind::Entity, + } + } +} + +#[repr(u8)] +#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)] +pub enum SchemaKind { + Entity = 0, + Edge = 1, + Attribute = 2, +} + +impl SchemaKind { + pub const fn as_str(self) -> &'static str { + match self { + Self::Entity => "entity", + Self::Edge => "edge", + Self::Attribute => "attribute", + } + } + + pub fn parse(s: &str) -> Option<Self> { + match s { + "entity" => Some(Self::Entity), + "edge" => Some(Self::Edge), + "attribute" => Some(Self::Attribute), + _ => None, + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn ogit_uri_parses_namespace_and_name() { + let uri = OgitUri::parse("ogit.Network:IPAddress").unwrap(); + assert_eq!(uri.namespace(), Some("Network")); + assert_eq!(uri.name(), Some("IPAddress")); + } + + #[test] + fn ogit_uri_rejects_malformed() { + assert!(OgitUri::parse("ogit.Network").is_err()); + assert!(OgitUri::parse("Network:IPAddress").is_err()); + assert!(OgitUri::parse("ogit.:Empty").is_err()); + assert!(OgitUri::parse("ogit.Network:").is_err()); + } + + #[test] + fn schema_ptr_round_trips() { + let ptr = SchemaPtr::new(NamespaceId(7), 42, SchemaKind::Entity); + assert_eq!(ptr.namespace_id(), NamespaceId(7)); + assert_eq!(ptr.entity_type_id(), 42); + assert_eq!(ptr.kind(), SchemaKind::Entity); + } + + #[test] + fn schema_ptr_kinds() { + let entity = SchemaPtr::new(NamespaceId(1), 1, SchemaKind::Entity); + let edge = SchemaPtr::new(NamespaceId(1), 1, SchemaKind::Edge); + let attr = SchemaPtr::new(NamespaceId(1), 1, SchemaKind::Attribute); + assert_ne!(entity.raw(), edge.raw()); +
assert_ne!(entity.raw(), attr.raw()); + assert_eq!(entity.kind(), SchemaKind::Entity); + assert_eq!(edge.kind(), SchemaKind::Edge); + assert_eq!(attr.kind(), SchemaKind::Attribute); + } +} diff --git a/crates/lance-graph-ontology/src/proposal.rs b/crates/lance-graph-ontology/src/proposal.rs new file mode 100644 index 00000000..b5c08f43 --- /dev/null +++ b/crates/lance-graph-ontology/src/proposal.rs @@ -0,0 +1,137 @@ +//! Producer-side DTO and dictionary row types. +//! +//! `MappingProposal` is what TTL hydration and (future) MySQL/MSSQL scanners +//! emit. `MappingRow` is what the registry stores. `HydrationReport` is the +//! summary returned from a hydration run. `MappingHandle` is an opaque +//! receipt for an appended proposal. +//! +//! Carrier-method doctrine: methods on these types describe what they do. + +use crate::namespace::{NamespaceId, OgitUri, SchemaKind, SchemaPtr}; +use lance_graph_contract::property::{LinkSpec, Marking, Schema, SemanticType}; + +/// A single producer-side proposal. One TTL file → typically one proposal +/// (an entity TTL). Schema scanners may emit one proposal per discovered +/// table; customer admin forms may emit one per row. +#[derive(Clone, Debug)] +pub struct MappingProposal { + /// Producer-facing public name. For OGIT-direct: the OGIT URI itself + /// (e.g. `ogit.Network:IPAddress`). For tenant bridges: the bridge's + /// public name (e.g. `Customer`, `WorkOrder`). + pub public_name: String, + /// Bridge id this proposal is registered under. `"ogit"` for raw OGIT. + /// `"woa"`, `"medcare"`, etc. for tenant bridges. The same OGIT URI may + /// appear under multiple bridge ids with different public names. + pub bridge_id: String, + /// Canonical OGIT URI. Required: every mapping must resolve to a URI. + pub ogit_uri: OgitUri, + /// Namespace of the OGIT URI (e.g. "Network", "WorkOrder"). The + /// registry uses this to assign / look up the `NamespaceId` (G). + pub namespace: String, + /// What kind of mapping this is. 
+ pub kind: MappingProposalKind, + /// Default marking. PII / Financial / Restricted overrides come from + /// the TTL annotation or the schema scanner. + pub marking: Marking, + /// Confidence — 1.0 for canonical TTL hydration; <1.0 for scanner- + /// suggested mappings awaiting review; 0.0 for guesses. + pub confidence: f32, + /// Where this proposal came from. Free text, intended for audit. + pub source_uri: String, + /// SHA256 of the source fragment (TTL file body, scanner output, etc.). + /// Used for idempotent re-hydration. + pub checksum: String, + /// Who/what produced this proposal. `"ogit_hydrator_v1"`, + /// `"mysql_scanner_v1"`, `"admin:user@..."`. + pub created_by: String, +} + +/// What kind of mapping this proposal carries. Entity mappings carry a +/// `Schema`; edge mappings carry a `LinkSpec`; attribute mappings carry a +/// single `SemanticType` annotation. +#[derive(Clone, Debug)] +pub enum MappingProposalKind { + Entity { + schema: Schema, + }, + Edge { + link: LinkSpec, + }, + Attribute { + predicate: String, + semantic_type: SemanticType, + }, +} + +impl MappingProposal { + pub fn schema_kind(&self) -> SchemaKind { + match self.kind { + MappingProposalKind::Entity { .. } => SchemaKind::Entity, + MappingProposalKind::Edge { .. } => SchemaKind::Edge, + MappingProposalKind::Attribute { .. } => SchemaKind::Attribute, + } + } +} + +/// What the registry stores. `MappingRow` mirrors the +/// `ontology_dictionary` Lance table schema column-for-column. Adding a +/// new column means adding a field here AND extending the Lance writer +/// (under `lance-cache`) AND bumping the registry's append path. 
+#[derive(Clone, Debug)] +pub struct MappingRow { + pub bridge_id: String, + pub public_name: String, + pub ogit_uri: OgitUri, + pub namespace_id: NamespaceId, + pub schema_ptr: SchemaPtr, + pub kind: SchemaKind, + pub semantic_type: SemanticType, + pub marking: Marking, + pub confidence: f32, + pub created_at_us: i64, + pub created_by: String, + pub source_uri: String, + pub active: bool, + pub checksum: String, +} + +impl MappingRow { + pub fn schema_ptr(&self) -> SchemaPtr { + self.schema_ptr + } +} + +/// Opaque receipt for an appended proposal. Carries the assigned +/// `SchemaPtr` and the dictionary index where the row landed. +#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)] +pub struct MappingHandle { + pub schema_ptr: SchemaPtr, + pub row_index: u32, +} + +/// Summary of a `hydrate_once` run. +#[derive(Clone, Debug, Default)] +pub struct HydrationReport { + pub registered: u32, + pub skipped_idempotent: u32, + pub failed: u32, + pub failures: Vec<HydrationFailure>, + pub namespaces_seen: Vec<String>, + pub from_cache: bool, +} + +#[derive(Clone, Debug)] +pub struct HydrationFailure { + pub source: String, + pub reason: String, +} + +impl HydrationReport { + pub fn total(&self) -> u32 { + self.registered + self.skipped_idempotent + self.failed + } + + pub fn is_clean(&self) -> bool { + self.failed == 0 + } +} diff --git a/crates/lance-graph-ontology/src/registry.rs b/crates/lance-graph-ontology/src/registry.rs new file mode 100644 index 00000000..d3918376 --- /dev/null +++ b/crates/lance-graph-ontology/src/registry.rs @@ -0,0 +1,462 @@ +//! `OntologyRegistry` — the in-memory dictionary + (optional) Lance cache. +//! +//! The registry is the single canonical surface for the new crate. It does +//! three things: +//! +//! 1. Hydrates from a TTL root via [`OntologyRegistry::hydrate_once`] (or +//! its sync sibling [`OntologyRegistry::hydrate_once_sync`]). Idempotent +//! via SHA256 of the TTL root. +//! 2. Resolves `(bridge_id, public_name)` → `SchemaPtr` and OGIT URI → +//!
`SchemaPtr` for the hot path. +//! 3. Persists rows to a Lance dataset (under the `lance-cache` feature). +//! Without that feature the registry is in-memory only — sufficient for +//! tests and for consumers that re-hydrate from TTL on every start. +//! +//! Carrier-method doctrine throughout: methods on the registry, not free +//! functions on registry state. + +use crate::error::{Error, Result}; +use crate::namespace::{NamespaceId, SchemaPtr}; +use crate::proposal::{ + HydrationFailure, HydrationReport, MappingHandle, MappingProposal, MappingProposalKind, + MappingRow, +}; +use crate::semantic_types::SemanticTypeMap; +use crate::ttl_parse::{parse_ttl_directory, ttl_root_checksum}; +use lance_graph_contract::property::{Marking, SemanticType}; +use std::collections::HashMap; +use std::path::{Path, PathBuf}; +use std::sync::RwLock; +use std::time::{SystemTime, UNIX_EPOCH}; + +/// The single ontology registry. +pub struct OntologyRegistry { + inner: RwLock<RegistryState>, + sem_map: SemanticTypeMap, + #[cfg_attr(not(feature = "lance-cache"), allow(dead_code))] + lance_path: Option<PathBuf>, +} + +#[derive(Default)] +struct RegistryState { + rows: Vec<MappingRow>, + by_bridge_name: HashMap<(String, String), u32>, + by_uri: HashMap<String, u32>, + by_namespace: HashMap<String, NamespaceId>, + namespace_order: Vec<String>, + last_root_checksum: Option<String>, +} + +impl OntologyRegistry { + /// In-memory registry. Lance persistence is disabled. + pub fn new_in_memory() -> Self { + Self { + inner: RwLock::new(RegistryState::default()), + sem_map: SemanticTypeMap::defaults().clone(), + lance_path: None, + } + } + + /// In-memory registry with a custom semantic-type map. + pub fn with_semantic_types(sem: SemanticTypeMap) -> Self { + Self { + inner: RwLock::new(RegistryState::default()), + sem_map: sem, + lance_path: None, + } + } + + /// Lance-backed registry. Opens the dataset at `lance_path` (creating + /// it if missing) and replays the dictionary into memory. Async because + /// Lance's I/O surface is async.
+ #[cfg(feature = "lance-cache")] + pub async fn open(lance_path: &Path) -> Result<Self> { + use crate::lance_cache::LanceWriter; + let writer = LanceWriter::open_or_create(lance_path).await?; + let mut state = RegistryState::default(); + for row in writer.replay().await? { + state.absorb_row(row); + } + state.last_root_checksum = writer.last_root_checksum().await?; + Ok(Self { + inner: RwLock::new(state), + sem_map: SemanticTypeMap::defaults().clone(), + lance_path: Some(lance_path.to_path_buf()), + }) + } + + /// Sync hydration. Reads TTL files under `ttl_root`, parses each, and + /// appends every produced `MappingProposal` to the in-memory dictionary. + /// Idempotent: skips parsing if the TTL root checksum matches the last + /// hydration. Lance persistence is NOT touched here — see + /// [`OntologyRegistry::hydrate_once`] for the persisted variant. + pub fn hydrate_once_sync( + &self, + ttl_root: &Path, + namespaces: &[&str], + ) -> Result<HydrationReport> { + let root_checksum = ttl_root_checksum(ttl_root)?; + if let Some(prev) = &self.inner.read().unwrap().last_root_checksum { + if prev == &root_checksum { + return Ok(HydrationReport { + from_cache: true, + ..Default::default() + }); + } + } + + let (proposals, failures) = + parse_ttl_directory(ttl_root, "ogit", &self.sem_map, namespaces)?; + if proposals.is_empty() && failures.is_empty() { + return Err(Error::EmptyHydration(ttl_root.to_path_buf())); + } + + let mut report = HydrationReport { + failures: failures.clone(), + failed: failures.len() as u32, + ..Default::default() + }; + let mut state = self.inner.write().unwrap(); + let mut seen_namespaces: std::collections::BTreeSet<String> = Default::default(); + for proposal in proposals { + seen_namespaces.insert(proposal.namespace.clone()); + match state.append(proposal, &self.sem_map) { + AppendOutcome::Inserted(_) => report.registered += 1, + AppendOutcome::Idempotent => report.skipped_idempotent += 1, + AppendOutcome::Failed(reason) => { + report.failed += 1; +
report.failures.push(HydrationFailure { + source: ttl_root.display().to_string(), + reason, + }); + } + } + } + state.last_root_checksum = Some(root_checksum); + report.namespaces_seen = seen_namespaces.into_iter().collect(); + Ok(report) + } + + /// Async hydration. Same as `hydrate_once_sync` but additionally + /// persists newly-registered rows to the Lance dataset (when the + /// crate is built with `lance-cache`). Without `lance-cache` it is + /// equivalent to `hydrate_once_sync` and is provided for API parity. + #[cfg(feature = "lance-cache")] + pub async fn hydrate_once( + &self, + ttl_root: &Path, + namespaces: &[&str], + ) -> Result<HydrationReport> { + let report = self.hydrate_once_sync(ttl_root, namespaces)?; + if report.from_cache { + return Ok(report); + } + if let Some(lance_path) = &self.lance_path { + use crate::lance_cache::LanceWriter; + let writer = LanceWriter::open_or_create(lance_path).await?; + let rows: Vec<MappingRow> = self.inner.read().unwrap().rows.clone(); + writer.flush(&rows).await?; + if let Some(cs) = &self.inner.read().unwrap().last_root_checksum { + writer.set_last_root_checksum(cs).await?; + } + } + Ok(report) + } + + /// Append a single proposal directly. Used by schema scanners and + /// customer admin forms. + pub fn append_mapping(&self, proposal: MappingProposal) -> Result<MappingHandle> { + // Capture the lookup key up front: `proposal` is moved into `append`. + let key = (proposal.bridge_id.clone(), proposal.public_name.clone()); + let mut state = self.inner.write().unwrap(); + match state.append(proposal, &self.sem_map) { + AppendOutcome::Inserted(handle) => Ok(handle), + AppendOutcome::Idempotent => { + // Resolve back the existing handle by the proposal's own key, + // not by the last row, which may belong to another mapping.
+ let idx = *state + .by_bridge_name + .get(&key) + .ok_or_else(|| Error::other("idempotent append produced no row index"))?; + let row = &state.rows[idx as usize]; + Ok(MappingHandle { + schema_ptr: row.schema_ptr, + row_index: idx, + }) + } + AppendOutcome::Failed(reason) => Err(Error::other(reason)), + } + } + + /// Resolve `(bridge_id, public_name)` → `SchemaPtr`. + pub fn resolve(&self, bridge_id: &str, public_name: &str) -> Option<SchemaPtr> { + let state = self.inner.read().unwrap(); + let key = (bridge_id.to_string(), public_name.to_string()); + state + .by_bridge_name + .get(&key) + .map(|idx| state.rows[*idx as usize].schema_ptr) + } + + /// Resolve raw OGIT URI → `SchemaPtr`. + pub fn resolve_uri(&self, ogit_uri: &str) -> Option<SchemaPtr> { + let state = self.inner.read().unwrap(); + state + .by_uri + .get(ogit_uri) + .map(|idx| state.rows[*idx as usize].schema_ptr) + } + + /// Get the full row for a given OGIT URI. + pub fn row_for_uri(&self, ogit_uri: &str) -> Option<MappingRow> { + let state = self.inner.read().unwrap(); + state + .by_uri + .get(ogit_uri) + .map(|idx| state.rows[*idx as usize].clone()) + } + + /// Look up a namespace's `NamespaceId` (G). + pub fn namespace_id(&self, name: &str) -> Option<NamespaceId> { + self.inner.read().unwrap().by_namespace.get(name).copied() + } + + /// Names of all known namespaces, in registration order. Index `i` of + /// the returned `Vec` holds the namespace with `NamespaceId(i + 1)`; + /// `NamespaceId(0)` is reserved for unknown. + pub fn namespace_names(&self) -> Vec<String> { + self.inner.read().unwrap().namespace_order.clone() + } + + /// Enumerate all rows under a given namespace.
+ pub fn enumerate(&self, namespace: &str) -> Vec<MappingRow> { + let state = self.inner.read().unwrap(); + let id = match state.by_namespace.get(namespace) { + Some(id) => *id, + None => return Vec::new(), + }; + state + .rows + .iter() + .filter(|r| r.namespace_id == id) + .cloned() + .collect() + } + + /// Count the rows in the dictionary. + pub fn len(&self) -> usize { + self.inner.read().unwrap().rows.len() + } + + pub fn is_empty(&self) -> bool { + self.inner.read().unwrap().rows.is_empty() + } + + /// Export the registry to an OGIT-shaped TTL fragment for the named + /// namespace. Used by the Lance ↔ OGIT round-trip and for fork PRs + /// that promote schema-scanner suggestions back into the canonical + /// vocabulary. + pub fn export_ttl(&self, namespace: &str, out: &Path) -> Result<()> { + let rows = self.enumerate(namespace); + let mut buf = String::new(); + buf.push_str("@prefix ogit: .\n"); + buf.push_str("@prefix rdfs: .\n"); + buf.push_str("@prefix dcterms: .\n"); + buf.push_str(&format!( + "@prefix ogit.{ns}: .\n\n", + ns = namespace + )); + for row in rows { + let name = row.ogit_uri.name().unwrap_or("Unknown"); + let kind = row.kind.as_str(); + buf.push_str(&format!("# kind: {kind}; bridge: {}\n", row.bridge_id)); + buf.push_str(&format!("ogit.{}:{}\n", namespace, name)); + buf.push_str("\ta rdfs:Class ;\n"); + buf.push_str("\trdfs:subClassOf ogit:Entity ;\n"); + buf.push_str(&format!("\trdfs:label \"{}\" ;\n", name)); + buf.push_str(&format!( + "\tdcterms:source \"{}\" .\n\n", + row.source_uri.replace('"', "'") + )); + } + std::fs::write(out, buf).map_err(|source| Error::Io { + path: out.to_path_buf(), + source, + })?; + Ok(()) + } +} + +enum AppendOutcome { + Inserted(MappingHandle), + Idempotent, + Failed(String), +} + +impl RegistryState { + fn append( + &mut self, + proposal: MappingProposal, + sem: &SemanticTypeMap, + ) -> AppendOutcome { + let key = (proposal.bridge_id.clone(), proposal.public_name.clone()); + if let Some(existing) =
self.by_bridge_name.get(&key) { + let row = &self.rows[*existing as usize]; + if row.checksum == proposal.checksum { + return AppendOutcome::Idempotent; + } + } + // Allocate or look up the namespace id. + let namespace_id = if let Some(id) = self.by_namespace.get(&proposal.namespace) { + *id + } else { + let next = (self.namespace_order.len() + 1) as u8; + if next == 0 { + return AppendOutcome::Failed("namespace overflow".to_string()); + } + let id = NamespaceId(next); + self.by_namespace.insert(proposal.namespace.clone(), id); + self.namespace_order.push(proposal.namespace.clone()); + id + }; + + let kind = proposal.schema_kind(); + let entity_type_id = (self.rows.len() + 1) as u16; + let schema_ptr = SchemaPtr::new(namespace_id, entity_type_id, kind); + + let semantic_type = match &proposal.kind { + MappingProposalKind::Attribute { semantic_type, .. } => semantic_type.clone(), + _ => sem.lookup(proposal.ogit_uri.as_str()), + }; + let row = MappingRow { + bridge_id: proposal.bridge_id.clone(), + public_name: proposal.public_name.clone(), + ogit_uri: proposal.ogit_uri.clone(), + namespace_id, + schema_ptr, + kind, + semantic_type, + marking: proposal.marking, + confidence: proposal.confidence, + created_at_us: now_micros(), + created_by: proposal.created_by.clone(), + source_uri: proposal.source_uri.clone(), + active: true, + checksum: proposal.checksum.clone(), + }; + let idx = self.rows.len() as u32; + self.rows.push(row); + self.by_bridge_name.insert(key, idx); + self.by_uri + .insert(proposal.ogit_uri.as_str().to_string(), idx); + AppendOutcome::Inserted(MappingHandle { + schema_ptr, + row_index: idx, + }) + } + + // Used by `lance_cache::LanceWriter::replay()` when reconstituting the + // in-memory state from a Lance dataset on `OntologyRegistry::open`. + // The reader only compiles under the `lance-cache` feature; suppress + // the dead-code lint when the feature is off. 
+ #[cfg_attr(not(feature = "lance-cache"), allow(dead_code))] + fn absorb_row(&mut self, row: MappingRow) { + let key = (row.bridge_id.clone(), row.public_name.clone()); + if !self.by_namespace.contains_key(row.ogit_uri.namespace().unwrap_or("")) { + let ns = row.ogit_uri.namespace().unwrap_or("").to_string(); + if !ns.is_empty() { + self.by_namespace.insert(ns.clone(), row.namespace_id); + if !self.namespace_order.contains(&ns) { + self.namespace_order.push(ns); + } + } + } + let idx = self.rows.len() as u32; + self.by_bridge_name.insert(key, idx); + self.by_uri.insert(row.ogit_uri.as_str().to_string(), idx); + self.rows.push(row); + } +} + +fn now_micros() -> i64 { + SystemTime::now() + .duration_since(UNIX_EPOCH) + .map(|d| d.as_micros() as i64) + .unwrap_or(0) +} + +// Suppress unused-warning for `Marking` / `SemanticType` re-exports we +// surface via MappingRow but don't otherwise reference here. +#[allow(dead_code)] +const _MARKER: (Marking, SemanticType) = (Marking::Internal, SemanticType::PlainText); + +#[cfg(test)] +mod tests { + use super::*; + use crate::namespace::{OgitUri, SchemaKind}; + use crate::proposal::MappingProposalKind; + use lance_graph_contract::property::Schema; + + fn proposal(uri: &str) -> MappingProposal { + let parsed = OgitUri::parse(uri).unwrap(); + let ns = parsed.namespace().unwrap().to_string(); + let name = parsed.name().unwrap().to_string(); + MappingProposal { + public_name: uri.to_string(), + bridge_id: "ogit".to_string(), + ogit_uri: parsed, + namespace: ns, + kind: MappingProposalKind::Entity { + schema: Schema::builder(Box::leak(name.into_boxed_str())) + .required("id") + .build(), + }, + marking: Marking::Internal, + confidence: 1.0, + source_uri: format!("test://{uri}"), + checksum: format!("checksum-{uri}"), + created_by: "test".to_string(), + } + } + + #[test] + fn append_and_resolve() { + let reg = OntologyRegistry::new_in_memory(); + let h = reg.append_mapping(proposal("ogit.Network:IPAddress")).unwrap(); + 
assert_eq!(reg.len(), 1); + let resolved = reg.resolve("ogit", "ogit.Network:IPAddress").unwrap(); + assert_eq!(resolved, h.schema_ptr); + assert_eq!(resolved.kind(), SchemaKind::Entity); + assert!(resolved.namespace_id().is_known()); + } + + #[test] + fn idempotent_double_append() { + let reg = OntologyRegistry::new_in_memory(); + reg.append_mapping(proposal("ogit.Network:IPAddress")).unwrap(); + let h = reg.append_mapping(proposal("ogit.Network:IPAddress")).unwrap(); + // Same checksum → idempotent: reuses the existing row. + assert_eq!(reg.len(), 1); + assert_eq!(reg.resolve("ogit", "ogit.Network:IPAddress").unwrap(), h.schema_ptr); + } + + #[test] + fn enumerate_groups_by_namespace() { + let reg = OntologyRegistry::new_in_memory(); + reg.append_mapping(proposal("ogit.Network:IPAddress")).unwrap(); + reg.append_mapping(proposal("ogit.Network:MACAddress")).unwrap(); + reg.append_mapping(proposal("ogit.Auth:Account")).unwrap(); + assert_eq!(reg.enumerate("Network").len(), 2); + assert_eq!(reg.enumerate("Auth").len(), 1); + assert_eq!(reg.enumerate("Missing").len(), 0); + } + + #[test] + fn namespace_ids_are_dense_and_unique() { + let reg = OntologyRegistry::new_in_memory(); + reg.append_mapping(proposal("ogit.A:X")).unwrap(); + reg.append_mapping(proposal("ogit.B:Y")).unwrap(); + assert_ne!(reg.namespace_id("A"), reg.namespace_id("B")); + assert_eq!(reg.namespace_id("A").unwrap().raw(), 1); + assert_eq!(reg.namespace_id("B").unwrap().raw(), 2); + } +} diff --git a/crates/lance-graph-ontology/src/schema_source.rs b/crates/lance-graph-ontology/src/schema_source.rs new file mode 100644 index 00000000..b6d7456f --- /dev/null +++ b/crates/lance-graph-ontology/src/schema_source.rs @@ -0,0 +1,31 @@ +//! `SchemaSource` trait — the abstract producer of `MappingProposal`s. +//! +//! Implementations: +//! +//! - [`crate::ttl_parse::TtlSource`] (this session) — parses OGIT-shaped +//! TTL files. +//! - MySQL / MSSQL scanners (future session) — introspect a relational +//! 
schema and emit one proposal per discovered table / column. +//! - Customer admin form (future UX layer) — emits one proposal per row +//! when a customer extends their tenant ontology at runtime. +//! +//! Every implementation funnels through the same registry append path so +//! the audit story is uniform: every dictionary row carries a +//! `created_by` + `source_uri` + `confidence` and every change is +//! immortalised in the Lance time-travel history. + +use crate::error::Result; +use crate::proposal::MappingProposal; +use crate::semantic_types::SemanticTypeMap; + +/// A source of `MappingProposal`s. +pub trait SchemaSource { + /// Produce all proposals this source has to offer. Called eagerly; the + /// returned `Vec` is appended to the registry as a batch. + fn proposals(&self, sem: &SemanticTypeMap) -> Result<Vec<MappingProposal>>; + + /// Stable identifier for this source. Used in the dictionary's + /// `created_by` column for audit. Examples: `"ogit_hydrator_v1"`, + /// `"mysql_scanner_v1"`, `"admin:user@example.com"`. + fn created_by(&self) -> String; +} diff --git a/crates/lance-graph-ontology/src/semantic_types.rs b/crates/lance-graph-ontology/src/semantic_types.rs new file mode 100644 index 00000000..a9041794 --- /dev/null +++ b/crates/lance-graph-ontology/src/semantic_types.rs @@ -0,0 +1,268 @@ +//! `semantic_types.toml` loader. +//! +//! Maps OGIT URIs (or attribute paths) to `lance_graph_contract::SemanticType` +//! enum values. The TOML file is the only declarative config in this crate; +//! customer-facing ontology data goes through TTL. +//! +//! Embedded at compile time via `include_str!`. Consumers can override +//! mappings by passing a custom TOML string to [`SemanticTypeMap::from_toml`]. +//! +//! Only the variants that already exist in `lance-graph-contract::property` +//! are recognised. Adding a new variant is a contract change and must be +//! tracked separately in `LATEST_STATE.md`.
+ +use crate::error::{Error, Result}; +use lance_graph_contract::property::{DatePrecision, GeoFormat, SemanticType}; +use std::collections::HashMap; +use std::sync::OnceLock; + +const DEFAULT_TOML: &str = include_str!("semantic_types.toml"); + +/// Lookup table from attribute URI to SemanticType. +#[derive(Clone, Debug)] +pub struct SemanticTypeMap { + by_uri: HashMap<String, SemanticType>, + default: SemanticType, +} + +impl SemanticTypeMap { + pub fn from_toml(toml_str: &str) -> Result<Self> { + let value: toml::Value = toml_str + .parse() + .map_err(|e| Error::TomlDecode(format!("{e}")))?; + + let mut by_uri = HashMap::new(); + if let Some(mappings) = value.get("mappings").and_then(|v| v.as_table()) { + for (key, val) in mappings { + let s = val.as_str().ok_or_else(|| { + Error::TomlDecode(format!( + "mappings.{key}: expected string SemanticType name, got {val:?}" + )) + })?; + let st = parse_semantic_type(s).ok_or_else(|| { + Error::TomlDecode(format!( + "mappings.{key}: `{s}` is not a recognised SemanticType variant" + )) + })?; + by_uri.insert(key.clone(), st); + } + } + + let default = value + .get("default") + .and_then(|v| v.get("unmapped")) + .and_then(|v| v.as_str()) + .and_then(parse_semantic_type) + .unwrap_or(SemanticType::PlainText); + + Ok(Self { by_uri, default }) + } + + pub fn defaults() -> &'static Self { + static MAP: OnceLock<SemanticTypeMap> = OnceLock::new(); + MAP.get_or_init(|| { + SemanticTypeMap::from_toml(DEFAULT_TOML) + .expect("bundled semantic_types.toml must parse") + }) + } + + pub fn lookup(&self, attr_uri: &str) -> SemanticType { + self.by_uri + .get(attr_uri) + .cloned() + .unwrap_or_else(|| self.default.clone()) + } + + pub fn default_type(&self) -> &SemanticType { + &self.default + } + + pub fn len(&self) -> usize { + self.by_uri.len() + } + + pub fn is_empty(&self) -> bool { + self.by_uri.is_empty() + } +} + +/// Parse a semantic-type name from the TOML config.
The set of accepted +/// names mirrors the variants currently in +/// `lance_graph_contract::property::SemanticType`. Names with parameters +/// pick conservative defaults (Date → Day, Geo → LatLon). +fn parse_semantic_type(name: &str) -> Option<SemanticType> { + Some(match name { + "PlainText" => SemanticType::PlainText, + "Iban" => SemanticType::Iban, + "Email" => SemanticType::Email, + "Phone" => SemanticType::Phone, + "Address" => SemanticType::Address, + "Url" => SemanticType::Url, + "TaxId" => SemanticType::TaxId, + "CustomerId" => SemanticType::CustomerId, + "InvoiceNumber" => SemanticType::InvoiceNumber, + "Image" => SemanticType::Image, + "Date" => SemanticType::Date(DatePrecision::Day), + "DateMonth" => SemanticType::Date(DatePrecision::Month), + "DateYear" => SemanticType::Date(DatePrecision::Year), + "DateTime" => SemanticType::Date(DatePrecision::DateTime), + "GeoLatLon" => SemanticType::Geo(GeoFormat::LatLon), + "GeoWgs84" => SemanticType::Geo(GeoFormat::Wgs84), + "GeoPlusCode" => SemanticType::Geo(GeoFormat::PlusCode), + // Currency / File variants take a `&'static str` parameter we + // cannot construct from TOML; they require explicit Rust call + // sites. Skip them in the TOML loader.
+ _ => return None, + }) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn defaults_load() { + let map = SemanticTypeMap::defaults(); + assert!(!map.is_empty()); + assert_eq!(map.default_type().clone(), SemanticType::PlainText); + } + + #[test] + fn lookup_returns_default_for_unmapped() { + let map = SemanticTypeMap::defaults(); + let st = map.lookup("ogit:ZzzNonexistent"); + assert_eq!(st, SemanticType::PlainText); + } + + #[test] + fn from_toml_handles_overrides() { + let toml_str = r#" +[mappings] +"ogit.Test:Foo.bar" = "Email" + +[default] +unmapped = "PlainText" +"#; + let map = SemanticTypeMap::from_toml(toml_str).unwrap(); + assert_eq!(map.lookup("ogit.Test:Foo.bar"), SemanticType::Email); + assert_eq!(map.lookup("anything-else"), SemanticType::PlainText); + } + + #[test] + fn from_toml_rejects_bad_variant() { + let toml_str = r#" +[mappings] +"ogit.Bogus:X" = "NotARealVariant" +"#; + assert!(SemanticTypeMap::from_toml(toml_str).is_err()); + } + + #[test] + fn parametric_variants_picked() { + let toml_str = r#" +[mappings] +"a" = "Date" +"b" = "DateTime" +"c" = "GeoLatLon" +"#; + let map = SemanticTypeMap::from_toml(toml_str).unwrap(); + assert_eq!(map.lookup("a"), SemanticType::Date(DatePrecision::Day)); + assert_eq!(map.lookup("b"), SemanticType::Date(DatePrecision::DateTime)); + assert_eq!(map.lookup("c"), SemanticType::Geo(GeoFormat::LatLon)); + } + + /// WorkOrder namespace mappings cover the WoA-domain attributes emitted + /// in `OGIT/NTO/WorkOrder/entities/*.ttl`. Each canonical SemanticType + /// (Email/Phone/Iban/TaxId/CustomerId/InvoiceNumber/Date/DateTime/Image) + /// must round-trip through the bundled TOML. 
+ #[test] + fn workorder_namespace_lookups() { + let map = SemanticTypeMap::defaults(); + // Customer + assert_eq!( + map.lookup("ogit.WorkOrder:Customer.email"), + SemanticType::Email + ); + assert_eq!( + map.lookup("ogit.WorkOrder:Customer.telefon"), + SemanticType::Phone + ); + assert_eq!( + map.lookup("ogit.WorkOrder:Customer.iban"), + SemanticType::Iban + ); + assert_eq!( + map.lookup("ogit.WorkOrder:Customer.taxId"), + SemanticType::TaxId + ); + assert_eq!( + map.lookup("ogit.WorkOrder:Customer.kdnr"), + SemanticType::CustomerId + ); + // Order + assert_eq!( + map.lookup("ogit.WorkOrder:Order.orderId"), + SemanticType::InvoiceNumber + ); + assert_eq!( + map.lookup("ogit.WorkOrder:Order.datum"), + SemanticType::Date(DatePrecision::Day) + ); + assert_eq!( + map.lookup("ogit.WorkOrder:Order.bezahlt"), + SemanticType::Date(DatePrecision::Day) + ); + // LogbookEntry / User + assert_eq!( + map.lookup("ogit.WorkOrder:LogbookEntry.datum"), + SemanticType::Date(DatePrecision::Day) + ); + assert_eq!( + map.lookup("ogit.WorkOrder:LogbookEntry.createdAt"), + SemanticType::Date(DatePrecision::DateTime) + ); + assert_eq!( + map.lookup("ogit.WorkOrder:User.email"), + SemanticType::Email + ); + assert_eq!( + map.lookup("ogit.WorkOrder:User.phone"), + SemanticType::Phone + ); + // Picture / PasswordEntry + assert_eq!( + map.lookup("ogit.WorkOrder:Picture.dateiname"), + SemanticType::Image + ); + assert_eq!( + map.lookup("ogit.WorkOrder:PasswordEntry.url"), + SemanticType::Url + ); + } + + /// WorkOrder attributes that are not given a dedicated semantic type + /// fall through to `PlainText` (the default `unmapped`). And opaque + /// PlainText labels in the TOML still resolve to PlainText. + #[test] + fn workorder_plaintext_and_default_fallback() { + let map = SemanticTypeMap::defaults(); + // Explicit PlainText mapping (route / firma / artikelnr). 
+ assert_eq!( + map.lookup("ogit.WorkOrder:LogbookEntry.route"), + SemanticType::PlainText + ); + assert_eq!( + map.lookup("ogit.WorkOrder:Customer.firma"), + SemanticType::PlainText + ); + assert_eq!( + map.lookup("ogit.WorkOrder:Article.artikelnr"), + SemanticType::PlainText + ); + // Unmapped WorkOrder attribute → default PlainText. + assert_eq!( + map.lookup("ogit.WorkOrder:Order.bogusFieldThatDoesNotExist"), + SemanticType::PlainText + ); + } +} diff --git a/crates/lance-graph-ontology/src/semantic_types.toml b/crates/lance-graph-ontology/src/semantic_types.toml new file mode 100644 index 00000000..70b9dc76 --- /dev/null +++ b/crates/lance-graph-ontology/src/semantic_types.toml @@ -0,0 +1,167 @@ +# semantic_types.toml — declarative OGIT-attribute → SemanticType mapping. +# +# This is the only TOML in the lance-graph-ontology crate. It is a +# developer-facing configuration file. Customer-facing ontology data goes +# through TTL (which this crate parses via oxttl). +# +# Keys are OGIT-shape attribute URIs `ogit.<namespace>:<entity>.<attribute>` (or +# bare `ogit.<namespace>:<attribute>` for namespace-level attributes). +# Values are the *names* of variants in +# `lance_graph_contract::property::SemanticType`. Currently supported variant +# names: PlainText | Iban | Email | Phone | Address | Url | TaxId | +# CustomerId | InvoiceNumber | Image | Date | DateMonth | DateYear | +# DateTime | GeoLatLon | GeoWgs84 | GeoPlusCode. +# +# Currency(...) and File(...) take static-string parameters that cannot be +# expressed in TOML; they must be set via explicit Rust call sites (e.g. +# from a tenant bridge that knows its own currency code). +# +# `default.unmapped` controls the fallback when a predicate does not appear +# in `[mappings]`. Default = PlainText (safe).
+ +[mappings] +# ── Network ──────────────────────────────────────────────────────────────── +"ogit.Network:IPAddress.id" = "Url" +"ogit.Network:MACAddress.id" = "PlainText" +"ogit.Network:NetworkInterface.name" = "PlainText" + +# ── Auth ─────────────────────────────────────────────────────────────────── +"ogit.Auth:Account.email" = "Email" +"ogit.Auth:Account.username" = "PlainText" +"ogit.Auth:Account.password" = "PlainText" +"ogit.Auth:Person.email" = "Email" +"ogit.Auth:Person.phone" = "Phone" +"ogit.Auth:Person.address" = "Address" + +# ── Compliance / Legal ──────────────────────────────────────────────────── +"ogit.Compliance:Person.taxId" = "TaxId" +"ogit.Compliance:Organization.taxId" = "TaxId" + +# ── Procurement / SalesDistribution ─────────────────────────────────────── +"ogit.SalesDistribution:Order.orderId" = "InvoiceNumber" +"ogit.SalesDistribution:Customer.customerId" = "CustomerId" +"ogit.SalesDistribution:Customer.taxId" = "TaxId" +"ogit.SalesDistribution:Customer.iban" = "Iban" +"ogit.SalesDistribution:Customer.email" = "Email" +"ogit.SalesDistribution:Customer.phone" = "Phone" + +# ── MRP / MaterialManagement ────────────────────────────────────────────── +"ogit.MRP:Material.partNumber" = "PlainText" + +# ── WorkOrder (this session adds these in OGIT/NTO/WorkOrder/) ──────────── +# Customer (Kunde) +"ogit.WorkOrder:Customer.email" = "Email" +"ogit.WorkOrder:Customer.telefon" = "Phone" +"ogit.WorkOrder:Customer.iban" = "Iban" +"ogit.WorkOrder:Customer.taxId" = "TaxId" +"ogit.WorkOrder:Customer.kdnr" = "CustomerId" +"ogit.WorkOrder:Customer.firma" = "PlainText" +"ogit.WorkOrder:Customer.vorname" = "PlainText" +"ogit.WorkOrder:Customer.nachname" = "PlainText" +"ogit.WorkOrder:Customer.strasse" = "Address" +"ogit.WorkOrder:Customer.plz" = "PlainText" +"ogit.WorkOrder:Customer.ort" = "Address" +"ogit.WorkOrder:Customer.zahlungsziel" = "PlainText" +"ogit.WorkOrder:Customer.stundensatz" = "PlainText" + +# Order (Auftrag/Rechnung) 
+"ogit.WorkOrder:Order.orderId" = "InvoiceNumber" +"ogit.WorkOrder:Order.datum" = "Date" +"ogit.WorkOrder:Order.bezahlt" = "Date" +"ogit.WorkOrder:Order.docType" = "PlainText" +"ogit.WorkOrder:Order.betreff" = "PlainText" +"ogit.WorkOrder:Order.nettoSumme" = "PlainText" +"ogit.WorkOrder:Order.bruttoSumme" = "PlainText" +"ogit.WorkOrder:Order.mwstBetrag" = "PlainText" + +# LogbookEntry (Fahrtenbuch) +"ogit.WorkOrder:LogbookEntry.datum" = "Date" +"ogit.WorkOrder:LogbookEntry.createdAt" = "DateTime" +"ogit.WorkOrder:LogbookEntry.abfahrt" = "DateTime" +"ogit.WorkOrder:LogbookEntry.ankunft" = "DateTime" +"ogit.WorkOrder:LogbookEntry.rueckfahrt" = "DateTime" +"ogit.WorkOrder:LogbookEntry.zurueck" = "DateTime" +"ogit.WorkOrder:LogbookEntry.route" = "PlainText" +"ogit.WorkOrder:LogbookEntry.zweck" = "PlainText" +"ogit.WorkOrder:LogbookEntry.fahrzeug" = "PlainText" +"ogit.WorkOrder:LogbookEntry.startKm" = "PlainText" +"ogit.WorkOrder:LogbookEntry.endeKm" = "PlainText" +"ogit.WorkOrder:LogbookEntry.privatAnteil" = "PlainText" + +# Picture (Bild) +"ogit.WorkOrder:Picture.dateiname" = "Image" +"ogit.WorkOrder:Picture.beschreibung" = "PlainText" + +# User (Mitarbeiter) +"ogit.WorkOrder:User.email" = "Email" +"ogit.WorkOrder:User.phone" = "Phone" +"ogit.WorkOrder:User.username" = "PlainText" +"ogit.WorkOrder:User.firstname" = "PlainText" +"ogit.WorkOrder:User.lastname" = "PlainText" +"ogit.WorkOrder:User.passwordHash" = "PlainText" +"ogit.WorkOrder:User.createdAt" = "DateTime" + +# CustomerPortalUser +"ogit.WorkOrder:CustomerPortalUser.username" = "PlainText" +"ogit.WorkOrder:CustomerPortalUser.passwordHash" = "PlainText" +"ogit.WorkOrder:CustomerPortalUser.createdAt" = "DateTime" +"ogit.WorkOrder:CustomerPortalUser.lastLogin" = "DateTime" + +# Tenant +"ogit.WorkOrder:Tenant.slug" = "PlainText" +"ogit.WorkOrder:Tenant.logoPath" = "PlainText" +"ogit.WorkOrder:Tenant.createdAt" = "DateTime" + +# TimeSheet (Stundenzettel) +"ogit.WorkOrder:TimeSheet.datum" = "Date" 
+"ogit.WorkOrder:TimeSheet.createdAt" = "DateTime" +"ogit.WorkOrder:TimeSheet.updatedAt" = "DateTime" +"ogit.WorkOrder:TimeSheet.timerStart" = "DateTime" +"ogit.WorkOrder:TimeSheet.beschreibung" = "PlainText" +"ogit.WorkOrder:TimeSheet.minuten" = "PlainText" + +# PasswordEntry (Passwort-Tresor) +"ogit.WorkOrder:PasswordEntry.url" = "Url" +"ogit.WorkOrder:PasswordEntry.benutzername" = "PlainText" +"ogit.WorkOrder:PasswordEntry.passwortEnc" = "PlainText" +"ogit.WorkOrder:PasswordEntry.notizenEnc" = "PlainText" +"ogit.WorkOrder:PasswordEntry.titel" = "PlainText" +"ogit.WorkOrder:PasswordEntry.gruppe" = "PlainText" +"ogit.WorkOrder:PasswordEntry.icon" = "PlainText" +"ogit.WorkOrder:PasswordEntry.createdAt" = "DateTime" +"ogit.WorkOrder:PasswordEntry.updatedAt" = "DateTime" + +# Article (Artikelstamm) +"ogit.WorkOrder:Article.artikelnr" = "PlainText" +"ogit.WorkOrder:Article.beschreibung" = "PlainText" +"ogit.WorkOrder:Article.lieferant" = "PlainText" +"ogit.WorkOrder:Article.preisNetto" = "PlainText" +"ogit.WorkOrder:Article.ekPreis" = "PlainText" +"ogit.WorkOrder:Article.mwstSatz" = "PlainText" + +# Position (Auftragsposition) +"ogit.WorkOrder:Position.beschreibung" = "PlainText" +"ogit.WorkOrder:Position.einzelpreis" = "PlainText" +"ogit.WorkOrder:Position.menge" = "PlainText" +"ogit.WorkOrder:Position.einheit" = "PlainText" +"ogit.WorkOrder:Position.posTyp" = "PlainText" +"ogit.WorkOrder:Position.mwstSatz" = "PlainText" + +# Activity / HistoryEntry / Setting / NumberSequence +"ogit.WorkOrder:Activity.beschreibung" = "PlainText" +"ogit.WorkOrder:Activity.geraet" = "PlainText" +"ogit.WorkOrder:HistoryEntry.aktion" = "PlainText" +"ogit.WorkOrder:HistoryEntry.details" = "PlainText" +"ogit.WorkOrder:Setting.settingKey" = "PlainText" +"ogit.WorkOrder:Setting.settingValue" = "PlainText" +"ogit.WorkOrder:Setting.label" = "PlainText" +"ogit.WorkOrder:NumberSequence.prefix" = "PlainText" + +# ── GeoProfile ───────────────────────────────────────────────────────────── 
+"ogit.GeoProfile:Place.latlon" = "GeoLatLon" + +# ── Documents ────────────────────────────────────────────────────────────── +"ogit.Documents:Document.url" = "Url" + +[default] +unmapped = "PlainText" diff --git a/crates/lance-graph-ontology/src/ttl_parse.rs b/crates/lance-graph-ontology/src/ttl_parse.rs new file mode 100644 index 00000000..366738a4 --- /dev/null +++ b/crates/lance-graph-ontology/src/ttl_parse.rs @@ -0,0 +1,595 @@ +//! TTL → MappingProposal pipeline. +//! +//! Walks an OGIT-shaped TTL directory, parses each file via `oxttl`, and +//! emits `MappingProposal`s for every `ogit:Entity` subclass it finds. Verbs +//! (subclasses of `ogit:Verb`) become edge proposals; standalone attributes +//! (`owl:DatatypeProperty`) become attribute proposals. +//! +//! Lists declared via `( a b c )` syntax are stored by oxttl as RDF lists +//! (`rdf:first` / `rdf:rest` / `rdf:nil`). We re-assemble those after +//! collecting triples. +//! +//! ## What this parser handles today +//! +//! - Entities: ` a rdfs:Class; rdfs:subClassOf ogit:Entity` → entity +//! proposal with `Schema` derived from `ogit:mandatory-attributes` (→ +//! `Schema::required(...)`) and `ogit:optional-attributes` (→ +//! `Schema::optional(...)`). +//! - Verbs: ` a rdfs:Class; rdfs:subClassOf ogit:Verb` → edge proposal. +//! The `LinkSpec` is built with placeholder subject/object types (the +//! `ogit:from-to` / `ogit:allowed` constraints are not yet expanded; they +//! round-trip via the dictionary `source_uri` for now). +//! - Attributes: ` a owl:DatatypeProperty` → attribute proposal with the +//! `SemanticType` looked up from `semantic_types.toml`. +//! +//! ## What it does NOT yet handle +//! +//! - Bilingual labels (`@de`/`@en` annotations). +//! - The detailed `ogit:allowed` constraint blocks for verbs (those become +//! edge proposals with `LinkSpec::one_to_many` placeholders). +//! - Anything past Turtle (RDF/XML, N-Quads, etc.) — that is the remit of +//! 
the future `lance-graph-rdf` crate. +//! +//! Carrier-method doctrine: `TtlSource` carries the parsing logic; the +//! free `parse_ttl_directory` is a thin convenience wrapper. + +use crate::error::{Error, Result}; +use crate::namespace::OgitUri; +use crate::proposal::{HydrationFailure, MappingProposal, MappingProposalKind}; +use crate::semantic_types::SemanticTypeMap; +use lance_graph_contract::cam::CodecRoute; +use lance_graph_contract::property::{Cardinality, LinkSpec, Marking, PropertySpec, Schema}; +use sha2::{Digest, Sha256}; +use std::collections::HashMap; +use std::fs; +use std::path::{Path, PathBuf}; + +const OGIT_BASE: &str = "http://www.purl.org/ogit/"; +const RDF_TYPE: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"; +const RDF_FIRST: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#first"; +const RDF_REST: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"; +const RDF_NIL: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"; +const RDFS_CLASS: &str = "http://www.w3.org/2000/01/rdf-schema#Class"; +const RDFS_SUBCLASS_OF: &str = "http://www.w3.org/2000/01/rdf-schema#subClassOf"; +const RDFS_LABEL: &str = "http://www.w3.org/2000/01/rdf-schema#label"; +const OWL_DATATYPE_PROPERTY: &str = "http://www.w3.org/2002/07/owl#DatatypeProperty"; +const OGIT_ENTITY: &str = "http://www.purl.org/ogit/Entity"; +const OGIT_VERB: &str = "http://www.purl.org/ogit/Verb"; +const OGIT_NODE: &str = "http://www.purl.org/ogit/Node"; +const OGIT_SCOPE: &str = "http://www.purl.org/ogit/scope"; +const OGIT_MANDATORY: &str = "http://www.purl.org/ogit/mandatory-attributes"; +const OGIT_OPTIONAL: &str = "http://www.purl.org/ogit/optional-attributes"; +const OGIT_INDEXED: &str = "http://www.purl.org/ogit/indexed-attributes"; + +/// One TTL source — typically a single `.ttl` file in the OGIT NTO tree. 
+pub struct TtlSource { + path: PathBuf, + bytes: Vec<u8>, +} + +impl TtlSource { + pub fn from_path(path: impl AsRef<Path>) -> Result<Self> { + let path = path.as_ref().to_path_buf(); + let bytes = fs::read(&path).map_err(|source| Error::Io { + path: path.clone(), + source, + })?; + Ok(Self { path, bytes }) + } + + pub fn from_bytes(path: impl AsRef<Path>, bytes: Vec<u8>) -> Self { + Self { + path: path.as_ref().to_path_buf(), + bytes, + } + } + + /// SHA256 of the source bytes; drives idempotent re-hydration. + pub fn checksum(&self) -> String { + let mut h = Sha256::new(); + h.update(&self.bytes); + format!("{:x}", h.finalize()) + } + + pub fn path(&self) -> &Path { + &self.path + } + + /// Walk the bytes via oxttl, group by subject, and emit + /// `MappingProposal`s. The `bridge_id` is the value the proposals will + /// be filed under in the registry (typically `"ogit"` for raw OGIT + /// hydration; tenant bridges add their own proposals separately via + /// `MappingProposal::with_bridge_id`). + pub fn parse_into_proposals( + &self, + bridge_id: &str, + sem: &SemanticTypeMap, + ) -> std::result::Result<Vec<MappingProposal>, HydrationFailure> { + // Pass 1: read all triples into memory. OGIT TTL files are small + // (median ~50 lines, max <1 KB) so this is cheap. + use oxttl::TurtleParser; + + let parser = TurtleParser::new() + .with_base_iri("http://www.purl.org/ogit/") + .map_err(|e| HydrationFailure { + source: format!("{}", self.path.display()), + reason: format!("base IRI: {e}"), + })? + .for_slice(&self.bytes); + + let mut triples: Vec<RawTriple> = Vec::new(); + for item in parser { + match item { + Ok(t) => { + let s = subject_to_string(&t.subject); + let p = t.predicate.as_str().to_string(); + let o = term_to_value(&t.object); + triples.push(RawTriple { + subject: s, + predicate: p, + object: o, + }); + } + Err(e) => { + return Err(HydrationFailure { + source: format!("{}", self.path.display()), + reason: format!("oxttl: {e}"), + }); + } + } + } + + // Pass 2: index triples by subject; pre-resolve RDF lists.
+ let by_subject: HashMap<String, Vec<(String, RdfValue)>> = + triples.into_iter().fold(HashMap::new(), |mut acc, t| { + acc.entry(t.subject) + .or_default() + .push((t.predicate, t.object)); + acc + }); + + let mut proposals = Vec::new(); + + let checksum = self.checksum(); + let source_uri = format!("file:{}", self.path.display()); + + for (subject_uri, props) in &by_subject { + // Skip non-OGIT subjects (blank nodes, anonymous list cells). + if !subject_uri.starts_with(OGIT_BASE) { + continue; + } + let canonical = canonical_ogit_uri(subject_uri); + let ogit_uri = match OgitUri::parse(&canonical) { + Ok(u) => u, + Err(_) => continue, // root vocabulary terms etc. + }; + let namespace = ogit_uri.namespace().unwrap_or("").to_string(); + + let kind_class = classify(props); + match kind_class { + SubjectKind::Entity => { + let schema = build_entity_schema(&ogit_uri, props, &by_subject, sem); + proposals.push(MappingProposal { + public_name: canonical.clone(), + bridge_id: bridge_id.to_string(), + ogit_uri: ogit_uri.clone(), + namespace: namespace.clone(), + kind: MappingProposalKind::Entity { schema }, + marking: default_marking_for_namespace(&namespace), + confidence: 1.0, + source_uri: source_uri.clone(), + checksum: checksum.clone(), + created_by: "ogit_hydrator_v1".to_string(), + }); + } + SubjectKind::Verb => { + // Edge proposal: the from-to constraints become a + // generic Node->Node placeholder; consumers that need + // typed link constraints can re-derive them from the + // raw TTL via `source_uri`.
+ let predicate = ogit_uri.name().unwrap_or("relates"); + let link = LinkSpec { + subject_type: "ogit.Node", + predicate: leak_static(predicate), + object_type: "ogit.Node", + cardinality: Cardinality::ManyToMany, + codec_route: CodecRoute::Passthrough, + }; + proposals.push(MappingProposal { + public_name: canonical.clone(), + bridge_id: bridge_id.to_string(), + ogit_uri: ogit_uri.clone(), + namespace: namespace.clone(), + kind: MappingProposalKind::Edge { link }, + marking: default_marking_for_namespace(&namespace), + confidence: 1.0, + source_uri: source_uri.clone(), + checksum: checksum.clone(), + created_by: "ogit_hydrator_v1".to_string(), + }); + } + SubjectKind::Attribute => { + let semantic_type = sem.lookup(&canonical); + proposals.push(MappingProposal { + public_name: canonical.clone(), + bridge_id: bridge_id.to_string(), + ogit_uri: ogit_uri.clone(), + namespace: namespace.clone(), + kind: MappingProposalKind::Attribute { + predicate: ogit_uri.name().unwrap_or("").to_string(), + semantic_type, + }, + marking: default_marking_for_namespace(&namespace), + confidence: 1.0, + source_uri: source_uri.clone(), + checksum: checksum.clone(), + created_by: "ogit_hydrator_v1".to_string(), + }); + } + SubjectKind::Other => {} + } + } + + Ok(proposals) + } +} + +/// Walk a directory tree, parse every `*.ttl` file, return all proposals. +/// Failed files return a `HydrationFailure` rather than aborting. +pub fn parse_ttl_directory( + root: &Path, + bridge_id: &str, + sem: &SemanticTypeMap, + namespace_filter: &[&str], +) -> Result<(Vec<MappingProposal>, Vec<HydrationFailure>)> { + let mut proposals = Vec::new(); + let mut failures = Vec::new(); + + walk_ttl_files(root, &mut |path| { + // Apply namespace filter: the directory under root is the namespace.
+ if !namespace_filter.is_empty() { + let rel = path.strip_prefix(root).unwrap_or(path); + let ns = rel + .components() + .next() + .and_then(|c| c.as_os_str().to_str()) + .unwrap_or(""); + if !namespace_filter.iter().any(|f| *f == ns) { + return Ok(()); + } + } + match TtlSource::from_path(path) { + Ok(src) => match src.parse_into_proposals(bridge_id, sem) { + Ok(mut p) => proposals.append(&mut p), + Err(f) => failures.push(f), + }, + Err(e) => failures.push(HydrationFailure { + source: format!("{}", path.display()), + reason: format!("io: {e}"), + }), + } + Ok(()) + })?; + + Ok((proposals, failures)) +} + +/// Compute the SHA256 over the path and contents of every TTL file under +/// `root`, visited in sorted path order. Used to short-circuit hydration +/// when nothing has changed. +pub fn ttl_root_checksum(root: &Path) -> Result<String> { + let mut paths: Vec<PathBuf> = Vec::new(); + walk_ttl_files(root, &mut |p| { + paths.push(p.to_path_buf()); + Ok(()) + })?; + paths.sort(); + + let mut h = Sha256::new(); + for p in paths { + let bytes = fs::read(&p).map_err(|source| Error::Io { + path: p.clone(), + source, + })?; + h.update(p.to_string_lossy().as_bytes()); + h.update(&[0u8]); + h.update(&bytes); + h.update(&[0u8]); + } + Ok(format!("{:x}", h.finalize())) +} + +fn walk_ttl_files( + root: &Path, + visit: &mut dyn FnMut(&Path) -> Result<()>, +) -> Result<()> { + if !root.exists() { + return Err(Error::Io { + path: root.to_path_buf(), + source: std::io::Error::new( + std::io::ErrorKind::NotFound, + "TTL root does not exist", + ), + }); + } + let mut stack = vec![root.to_path_buf()]; + while let Some(dir) = stack.pop() { + let read = fs::read_dir(&dir).map_err(|source| Error::Io { + path: dir.clone(), + source, + })?; + for entry in read { + let entry = entry.map_err(|source| Error::Io { + path: dir.clone(), + source, + })?; + let p = entry.path(); + if p.is_dir() { + // Skip hidden / version-control directories.
+ let name = p.file_name().and_then(|s| s.to_str()).unwrap_or(""); + if name.starts_with('.') { + continue; + } + stack.push(p); + } else if p.extension().and_then(|s| s.to_str()) == Some("ttl") { + visit(&p)?; + } + } + } + Ok(()) +} + +// ---------- Triple shape helpers ---------- + +#[derive(Clone, Debug)] +struct RawTriple { + subject: String, + predicate: String, + object: RdfValue, +} + +#[derive(Clone, Debug)] +enum RdfValue { + Iri(String), + Blank(String), + // `Literal(String)`'s payload is captured for completeness and round-trip; + // the current entity-classifier doesn't read it. TTL-PROBE-5 (TECH_DEBT) + // tracks the follow-up that wires `dcterms:source` literals through to + // `MappingProposal::source_uri`. Don't strip the field — its presence is + // load-bearing for the future fix. + #[allow(dead_code)] + Literal(String), +} + +fn subject_to_string(s: &oxrdf::Subject) -> String { + match s { + oxrdf::Subject::NamedNode(n) => n.as_str().to_string(), + oxrdf::Subject::BlankNode(b) => format!("_:{}", b.as_str()), + } +} + +fn term_to_value(t: &oxrdf::Term) -> RdfValue { + match t { + oxrdf::Term::NamedNode(n) => RdfValue::Iri(n.as_str().to_string()), + oxrdf::Term::BlankNode(b) => RdfValue::Blank(format!("_:{}", b.as_str())), + oxrdf::Term::Literal(l) => RdfValue::Literal(l.value().to_string()), + } +} + +fn canonical_ogit_uri(raw: &str) -> String { + // Normalise: ogit IRIs in the TTL come back as + // `http://www.purl.org/ogit/Network/IPAddress` (slash) or `:` form. + // OGIT URIs throughout the registry use the `:` form. Convert here. 
+ if let Some(rest) = raw.strip_prefix(OGIT_BASE) { + if let Some((ns, name)) = rest.rsplit_once('/') { + return format!("ogit.{}:{}", ns.replace('/', "."), name); + } + return format!("ogit:{rest}"); + } + raw.to_string() +} + +#[derive(Clone, Copy, Debug, PartialEq, Eq)] +enum SubjectKind { + Entity, + Verb, + Attribute, + Other, +} + +fn classify(props: &[(String, RdfValue)]) -> SubjectKind { + let mut is_class = false; + let mut is_attribute_class = false; + let mut subclass_of_entity = false; + let mut subclass_of_verb = false; + for (p, o) in props { + if p == RDF_TYPE { + if let RdfValue::Iri(ref iri) = o { + if iri == RDFS_CLASS { + is_class = true; + } + if iri == OWL_DATATYPE_PROPERTY { + is_attribute_class = true; + } + } + } + if p == RDFS_SUBCLASS_OF { + if let RdfValue::Iri(ref iri) = o { + if iri == OGIT_ENTITY { + subclass_of_entity = true; + } + if iri == OGIT_VERB { + subclass_of_verb = true; + } + } + } + } + match (is_class, subclass_of_entity, subclass_of_verb, is_attribute_class) { + (true, true, _, _) => SubjectKind::Entity, + (true, _, true, _) => SubjectKind::Verb, + (_, _, _, true) => SubjectKind::Attribute, + _ => SubjectKind::Other, + } +} + +fn build_entity_schema( + uri: &OgitUri, + props: &[(String, RdfValue)], + by_subject: &HashMap<String, Vec<(String, RdfValue)>>, + _sem: &SemanticTypeMap, +) -> Schema { + // Static name leak: we need `&'static str` for SchemaBuilder. + let entity_name = leak_static(uri.name().unwrap_or("Unknown")); + let mut builder = Schema::builder(entity_name); + + for (p, o) in props { + let attrs = match p.as_str() { + OGIT_MANDATORY => walk_rdf_list(o, by_subject), + OGIT_OPTIONAL => walk_rdf_list(o, by_subject), + OGIT_INDEXED => walk_rdf_list(o, by_subject), + _ => continue, + }; + + let is_required = p == OGIT_MANDATORY; + + for attr in attrs { + // Strip the namespace prefix to get the predicate's local name.
+ let local = attr_local_name(&attr); + let leaked: &'static str = leak_static(local); + if is_required { + builder = builder.property(PropertySpec::required(leaked)); + } else { + builder = builder.property(PropertySpec::optional( + leaked, + CodecRoute::Passthrough, + )); + } + } + } + + builder.build() +} + +fn walk_rdf_list( + head: &RdfValue, + by_subject: &HashMap<String, Vec<(String, RdfValue)>>, +) -> Vec<String> { + let mut out = Vec::new(); + let mut current: String = match head { + RdfValue::Blank(b) => b.clone(), + RdfValue::Iri(iri) if iri == RDF_NIL => return out, + _ => return out, + }; + + // Bound the walk to keep malformed / cyclic lists from blocking forever. + for _ in 0..1024 { + let triples = match by_subject.get(&current) { + Some(t) => t, + None => break, + }; + let mut first: Option<String> = None; + let mut next: Option<String> = None; + for (p, o) in triples { + if p == RDF_FIRST { + if let RdfValue::Iri(iri) = o { + first = Some(iri.clone()); + } + } + if p == RDF_REST { + match o { + RdfValue::Iri(iri) if iri == RDF_NIL => {} // list end; keep scanning this cell for rdf:first + RdfValue::Blank(b) => next = Some(b.clone()), + _ => {} + } + } + } + if let Some(f) = first { + out.push(f); + } + match next { + Some(n) => current = n, + None => break, + } + } + out +} + +fn attr_local_name(uri: &str) -> &str { + // Take the segment after the last `/` or `:`, whichever comes later. + // List members arrive as full IRIs, so splitting on `:` first would + // cut `http://www.purl.org/ogit/id` at the scheme colon instead of + // yielding `id`. + uri.rsplit(|c: char| c == ':' || c == '/') + .next() + .unwrap_or(uri) +} + +fn default_marking_for_namespace(namespace: &str) -> Marking { + match namespace { + "Auth" | "Compliance" | "Person" => Marking::Pii, + "FinancialAccounting" | "FinancialMarket" | "Cost" | "Credit" | "Price" => { + Marking::Financial + } + _ => Marking::Internal, + } +} + +/// Leak a string into a `&'static str`. Necessary because contract types +/// like `Schema` and `PropertySpec` hold `&'static str` references. The +/// leaked memory is bounded: TTL sources are small and re-hydration +/// short-circuits via the root checksum.
+fn leak_static(s: &str) -> &'static str { + Box::leak(s.to_string().into_boxed_str()) +} + +/// Suppress unused-warning for the constants we reserve for future +/// `ogit:allowed` constraint expansion. +#[allow(dead_code)] +const _RESERVED: &[&str] = &[OGIT_NODE, OGIT_SCOPE, RDFS_LABEL]; + +#[cfg(test)] +mod tests { + use super::*; + + const TINY_TTL: &str = r#" +@prefix ogit: <http://www.purl.org/ogit/> . +@prefix ogit.Test: <http://www.purl.org/ogit/Test/> . +@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . +@prefix dcterms: <http://purl.org/dc/terms/> . + +ogit.Test:Widget + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Widget"; + dcterms:description "A round thing." ; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( + ogit:id + ); + ogit:optional-attributes ( + ogit:name + ); +. +"#; + + #[test] + fn parse_tiny_ttl_yields_one_entity() { + let src = TtlSource::from_bytes(PathBuf::from("tiny.ttl"), TINY_TTL.as_bytes().to_vec()); + let sem = SemanticTypeMap::defaults(); + let proposals = src.parse_into_proposals("ogit", sem).unwrap(); + let entity = proposals + .iter() + .find(|p| matches!(p.kind, MappingProposalKind::Entity { .. })) + .expect("expected one entity proposal"); + assert_eq!(entity.namespace, "Test"); + assert_eq!(entity.public_name, "ogit.Test:Widget"); + } + + #[test] + fn checksum_changes_on_content_change() { + let a = TtlSource::from_bytes(PathBuf::from("a.ttl"), b"ogit:foo a rdfs:Class .".to_vec()); + let b = TtlSource::from_bytes(PathBuf::from("a.ttl"), b"ogit:bar a rdfs:Class .".to_vec()); + assert_ne!(a.checksum(), b.checksum()); + } +} diff --git a/crates/lance-graph-ontology/tests/bridge_scope_lock.rs b/crates/lance-graph-ontology/tests/bridge_scope_lock.rs new file mode 100644 index 00000000..1aefa69c --- /dev/null +++ b/crates/lance-graph-ontology/tests/bridge_scope_lock.rs @@ -0,0 +1,144 @@ +//! Bridge scope-lock test. +//! +//! Verifies that a `WoaBridge` cannot resolve a `Healthcare` entity, and +//! vice versa. The error returned must be `BridgeError::CrossNamespaceLeak` +//!
or `BridgeError::NotInScope` (the latter when the namespace itself is +//! present but the public name was filed under a different bridge id). + +use lance_graph_ontology::bridges::{MedcareBridge, OgitBridge, WoaBridge}; +use lance_graph_ontology::{NamespaceBridge, OgitUri, OntologyRegistry}; +use std::fs; +use std::sync::Arc; + +const TTL: &str = r#" +@prefix ogit: <http://www.purl.org/ogit/> . +@prefix ogit.WorkOrder: <http://www.purl.org/ogit/WorkOrder/> . +@prefix ogit.Healthcare: <http://www.purl.org/ogit/Healthcare/> . +@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . + +ogit.WorkOrder:Order + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Order"; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( + ogit:id + ); + ogit:optional-attributes ( ) ; +. + +ogit.Healthcare:Patient + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Patient"; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( + ogit:id + ); + ogit:optional-attributes ( ) ; +. +"#; + +fn make_registry() -> Arc<OntologyRegistry> { + let tmp = tempfile::tempdir().unwrap(); + fs::create_dir_all(tmp.path().join("WorkOrder")).unwrap(); + fs::create_dir_all(tmp.path().join("Healthcare")).unwrap(); + fs::write(tmp.path().join("WorkOrder").join("ents.ttl"), TTL).unwrap(); + fs::write(tmp.path().join("Healthcare").join("ents.ttl"), TTL).unwrap(); + let registry = Arc::new(OntologyRegistry::new_in_memory()); + registry + .hydrate_once_sync(tmp.path(), &["WorkOrder", "Healthcare"]) + .unwrap(); + // Keep the tempdir alive for the duration of the test by leaking; the + // test process exits shortly after.
+ std::mem::forget(tmp); + registry +} + +#[test] +fn woa_bridge_resolves_workorder_entity_by_uri() { + let registry = make_registry(); + let bridge = WoaBridge::new(registry).unwrap(); + let uri = OgitUri::parse("ogit.WorkOrder:Order").unwrap(); + let entity = bridge.entity_by_uri(&uri).expect("scoped URI resolution"); + assert_eq!(entity.schema_ptr.namespace_id(), bridge.g_lock()); +} + +#[test] +fn woa_bridge_rejects_healthcare_entity_by_uri() { + let registry = make_registry(); + let bridge = WoaBridge::new(registry).unwrap(); + let uri = OgitUri::parse("ogit.Healthcare:Patient").unwrap(); + let result = bridge.entity_by_uri(&uri); + assert!( + result.is_err(), + "expected scope lock to refuse cross-namespace, got {result:?}", + ); + let err = result.unwrap_err(); + let msg = format!("{err:?}"); + assert!( + msg.contains("CrossNamespaceLeak") || msg.contains("NotInScope"), + "expected CrossNamespaceLeak or NotInScope, got {msg}", + ); +} + +#[test] +fn medcare_bridge_resolves_healthcare_entity_by_uri() { + let registry = make_registry(); + let bridge = MedcareBridge::new(registry).unwrap(); + let uri = OgitUri::parse("ogit.Healthcare:Patient").unwrap(); + let entity = bridge.entity_by_uri(&uri).expect("scoped URI resolution"); + assert_eq!(entity.schema_ptr.namespace_id(), bridge.g_lock()); +} + +#[test] +fn medcare_bridge_rejects_workorder_entity_by_uri() { + let registry = make_registry(); + let bridge = MedcareBridge::new(registry).unwrap(); + let uri = OgitUri::parse("ogit.WorkOrder:Order").unwrap(); + let result = bridge.entity_by_uri(&uri); + assert!(result.is_err(), "expected scope lock to refuse, got {result:?}"); +} + +#[test] +fn woa_bridge_public_name_aliases_via_append() { + use lance_graph_ontology::{MappingProposal, MappingProposalKind}; + use lance_graph_contract::property::{Marking, Schema}; + let registry = make_registry(); + + // A tenant adds a public-name alias for its locked namespace's + // canonical URI by appending one mapping under its 
own bridge_id. + let _ = registry.append_mapping(MappingProposal { + public_name: "WorkOrder".to_string(), + bridge_id: "woa".to_string(), + ogit_uri: OgitUri::parse("ogit.WorkOrder:Order").unwrap(), + namespace: "WorkOrder".to_string(), + kind: MappingProposalKind::Entity { + schema: Schema::builder("Order").required("id").build(), + }, + marking: Marking::Internal, + confidence: 1.0, + source_uri: "test://woa-alias".to_string(), + checksum: "alias-checksum".to_string(), + created_by: "test".to_string(), + }); + + let bridge = WoaBridge::new(registry).unwrap(); + let entity = bridge.entity("WorkOrder").expect("public name resolves"); + assert_eq!(entity.schema_ptr.namespace_id(), bridge.g_lock()); +} + +#[test] +fn ogit_bridge_per_namespace_works() { + let registry = make_registry(); + let work_order_bridge = OgitBridge::for_namespace(registry.clone(), "WorkOrder").unwrap(); + let healthcare_bridge = OgitBridge::for_namespace(registry, "Healthcare").unwrap(); + assert_ne!(work_order_bridge.g_lock(), healthcare_bridge.g_lock()); + let _ = work_order_bridge + .entity_by_uri( + &lance_graph_ontology::OgitUri::parse("ogit.WorkOrder:Order").unwrap(), + ) + .expect("URI-based resolution within the same namespace"); +} diff --git a/crates/lance-graph-ontology/tests/hydrate_real_ogit.rs b/crates/lance-graph-ontology/tests/hydrate_real_ogit.rs new file mode 100644 index 00000000..8e859a74 --- /dev/null +++ b/crates/lance-graph-ontology/tests/hydrate_real_ogit.rs @@ -0,0 +1,154 @@ +//! Hydrate against the actual `AdaWorldAPI/OGIT` fork. +//! +//! Runs only when `OGIT_FORK_PATH` is set to a directory containing the +//! cloned fork. The CI matrix sets this to `/home/user/OGIT` (or the +//! workspace-relative `../OGIT`). +//! +//! Verifies that hydration of the `Network` namespace produces the +//! expected canonical entities (`IPAddress`, `MACAddress`, `VLAN`, +//! `Switch`, etc.), that resolution is fast, and that idempotent +//! re-hydration short-circuits. 
+ +use lance_graph_ontology::{NamespaceBridge, OntologyRegistry}; +use std::path::Path; +use std::sync::Arc; + +fn ogit_root() -> Option<std::path::PathBuf> { + if let Ok(p) = std::env::var("OGIT_FORK_PATH") { + let p = std::path::PathBuf::from(p); + if p.exists() { + return Some(p.join("NTO")); + } + } + // Convention: AdaWorldAPI/OGIT is checked out next to lance-graph. + let candidate = Path::new("/home/user/OGIT/NTO"); + if candidate.exists() { + return Some(candidate.to_path_buf()); + } + let from_workspace = Path::new(env!("CARGO_MANIFEST_DIR")).join("../../../OGIT/NTO"); + if from_workspace.exists() { + return Some(from_workspace); + } + None +} + +#[test] +fn hydrate_network_namespace_from_real_ogit() { + let Some(root) = ogit_root() else { + eprintln!("SKIP: OGIT fork not found (set OGIT_FORK_PATH)"); + return; + }; + let registry = Arc::new(OntologyRegistry::new_in_memory()); + let report = registry + .hydrate_once_sync(&root, &["Network"]) + .expect("hydration"); + assert!( + report.registered > 0, + "expected at least one Network entity, report = {report:?}" + ); + let ip = registry.resolve_uri("ogit.Network:IPAddress"); + assert!(ip.is_some(), "ogit.Network:IPAddress should be present"); + + // Ogit-bridge for the Network namespace should resolve URIs. + let bridge = lance_graph_ontology::bridges::OgitBridge::for_namespace( + registry.clone(), + "Network", + ) + .unwrap(); + let entity = bridge + .entity_by_uri(&lance_graph_ontology::OgitUri::parse("ogit.Network:IPAddress").unwrap()) + .expect("network bridge resolves IPAddress"); + assert_eq!(entity.schema_ptr.namespace_id(), bridge.g_lock()); +} + +/// Hydrate the WorkOrder namespace from the OGIT fork (the namespace +/// added in this session via `AdaWorldAPI/WoA` transcode).
Closes the +/// "Phase 7 has no executed proof" gap surfaced by the integration-lead +/// review: this test exercises the full TTL → registry path against the +/// 15 entities + 12 verbs that landed in commit `3871d37` on the OGIT +/// fork branch `claude/create-graph-ontology-crate-gkuJG`. +#[test] +fn hydrate_workorder_namespace_from_real_ogit() { + let Some(root) = ogit_root() else { + eprintln!("SKIP: OGIT fork not found (set OGIT_FORK_PATH)"); + return; + }; + if !root.join("WorkOrder").exists() { + eprintln!( + "SKIP: OGIT fork has no WorkOrder/ namespace at {}", + root.display() + ); + return; + } + let registry = Arc::new(OntologyRegistry::new_in_memory()); + let report = registry + .hydrate_once_sync(&root, &["WorkOrder"]) + .expect("hydration"); + assert!( + report.is_clean(), + "WorkOrder hydration must report no failures: {:?}", + report.failures + ); + assert!( + report.registered >= 15, + "expected at least 15 WorkOrder entities + 12 verbs, got {report:?}" + ); + + // Spot-check the canonical entity URIs that map to WoA models.py. + for entity in [ + "ogit.WorkOrder:Tenant", + "ogit.WorkOrder:Customer", + "ogit.WorkOrder:Order", + "ogit.WorkOrder:Article", + "ogit.WorkOrder:Position", + "ogit.WorkOrder:Activity", + "ogit.WorkOrder:Picture", + "ogit.WorkOrder:HistoryEntry", + "ogit.WorkOrder:User", + "ogit.WorkOrder:LogbookEntry", + "ogit.WorkOrder:NumberSequence", + "ogit.WorkOrder:Setting", + "ogit.WorkOrder:CustomerPortalUser", + "ogit.WorkOrder:PasswordEntry", + "ogit.WorkOrder:TimeSheet", + ] { + let ptr = registry + .resolve_uri(entity) + .unwrap_or_else(|| panic!("{entity} must resolve in WorkOrder namespace")); + assert!( + ptr.namespace_id().is_known(), + "{entity} must carry an assigned NamespaceId" + ); + } + + // The WoA bridge must lock all 15 to its single G partition. 
+ let bridge = lance_graph_ontology::bridges::WoaBridge::new(registry.clone()) + .expect("WoaBridge constructs once WorkOrder namespace is hydrated"); + let order_ptr = bridge + .entity_by_uri(&lance_graph_ontology::OgitUri::parse("ogit.WorkOrder:Order").unwrap()) + .expect("Order resolves through the bridge scope-lock"); + assert_eq!(order_ptr.schema_ptr.namespace_id(), bridge.g_lock()); +} + +#[test] +fn idempotent_re_hydration_is_fast() { + let Some(root) = ogit_root() else { + eprintln!("SKIP: OGIT fork not found (set OGIT_FORK_PATH)"); + return; + }; + let registry = Arc::new(OntologyRegistry::new_in_memory()); + let r1 = registry + .hydrate_once_sync(&root, &["Network"]) + .expect("first hydration"); + let t = std::time::Instant::now(); + let r2 = registry + .hydrate_once_sync(&root, &["Network"]) + .expect("second hydration"); + let elapsed = t.elapsed(); + assert!(r1.registered > 0); + assert!(r2.from_cache, "second hydration must short-circuit"); + assert!( + elapsed.as_millis() < 250, + "idempotent re-hydration should be fast; got {elapsed:?}" + ); +} diff --git a/crates/lance-graph-ontology/tests/round_trip_ttl.rs b/crates/lance-graph-ontology/tests/round_trip_ttl.rs new file mode 100644 index 00000000..24bfe354 --- /dev/null +++ b/crates/lance-graph-ontology/tests/round_trip_ttl.rs @@ -0,0 +1,382 @@ +//! End-to-end TTL hydration test. +//! +//! Builds a tiny TTL fixture, writes it to a tempdir, hydrates the +//! registry, and asserts that resolution by `(bridge_id, public_name)` and +//! by OGIT URI both work. + +use lance_graph_ontology::{NamespaceBridge, OntologyRegistry}; +use std::fs; +use std::sync::Arc; + +const FIXTURE: &str = r#" +@prefix ogit: . +@prefix ogit.RoundTrip: . +@prefix rdfs: . +@prefix dcterms: . + +ogit.RoundTrip:Widget + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Widget"; + dcterms:description "A test entity." 
; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( + ogit:id + ); + ogit:optional-attributes ( + ogit:name + ); +. + +ogit.RoundTrip:Sprocket + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Sprocket"; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( + ogit:id + ); + ogit:optional-attributes ( ) ; +. +"#; + +#[test] +fn ttl_round_trip_in_memory() { + let tmp = tempfile::tempdir().expect("tempdir"); + let ns_dir = tmp.path().join("RoundTrip").join("entities"); + fs::create_dir_all(&ns_dir).unwrap(); + fs::write(ns_dir.join("Widget.ttl"), FIXTURE).unwrap(); + + let registry = Arc::new(OntologyRegistry::new_in_memory()); + let report = registry + .hydrate_once_sync(tmp.path(), &["RoundTrip"]) + .expect("hydration"); + assert!(report.is_clean(), "failures: {:?}", report.failures); + assert!(report.registered >= 2, "report: {report:?}"); + + let widget = registry + .resolve("ogit", "ogit.RoundTrip:Widget") + .expect("widget by bridge_id+public_name"); + let sprocket = registry + .resolve_uri("ogit.RoundTrip:Sprocket") + .expect("sprocket by URI"); + assert_ne!(widget, sprocket); + assert_eq!(widget.namespace_id(), sprocket.namespace_id()); +} + +#[test] +fn ttl_round_trip_idempotent() { + let tmp = tempfile::tempdir().expect("tempdir"); + let ns_dir = tmp.path().join("RoundTrip").join("entities"); + fs::create_dir_all(&ns_dir).unwrap(); + fs::write(ns_dir.join("Widget.ttl"), FIXTURE).unwrap(); + + let registry = Arc::new(OntologyRegistry::new_in_memory()); + let r1 = registry + .hydrate_once_sync(tmp.path(), &["RoundTrip"]) + .expect("first hydration"); + let r2 = registry + .hydrate_once_sync(tmp.path(), &["RoundTrip"]) + .expect("second hydration"); + assert!(r1.registered >= 2); + assert!(r2.from_cache, "second hydration must short-circuit"); +} + +#[test] +fn export_ttl_writes_file() { + let tmp = tempfile::tempdir().expect("tempdir"); + let ns_dir = tmp.path().join("RoundTrip").join("entities"); + 
fs::create_dir_all(&ns_dir).unwrap(); + fs::write(ns_dir.join("Widget.ttl"), FIXTURE).unwrap(); + + let registry = Arc::new(OntologyRegistry::new_in_memory()); + registry + .hydrate_once_sync(tmp.path(), &["RoundTrip"]) + .unwrap(); + + let out = tmp.path().join("export.ttl"); + registry.export_ttl("RoundTrip", &out).unwrap(); + let body = fs::read_to_string(&out).unwrap(); + assert!(body.contains("ogit.RoundTrip:Widget")); + assert!(body.contains("ogit.RoundTrip:Sprocket")); +} + +#[test] +fn ogit_bridge_for_namespace_locks() { + use lance_graph_ontology::bridges::OgitBridge; + let tmp = tempfile::tempdir().expect("tempdir"); + let ns_dir = tmp.path().join("RoundTrip").join("entities"); + fs::create_dir_all(&ns_dir).unwrap(); + fs::write(ns_dir.join("Widget.ttl"), FIXTURE).unwrap(); + + let registry = Arc::new(OntologyRegistry::new_in_memory()); + registry + .hydrate_once_sync(tmp.path(), &["RoundTrip"]) + .unwrap(); + + let bridge = OgitBridge::for_namespace(registry.clone(), "RoundTrip").unwrap(); + let entity = bridge.entity("ogit.RoundTrip:Widget").unwrap(); + assert_eq!(entity.schema_ptr.namespace_id(), bridge.g_lock()); +} + +// ------------------------------------------------------------------------- +// Probe TTL-PROBE-1: Malformed TTL must produce HydrationFailure (no panic). +// ------------------------------------------------------------------------- +// +// The OGIT NTO tree is human-edited; truncated edits and missing prefixes are +// realistic failure modes. The contract from `parse_into_proposals` is that +// these surface as `HydrationFailure { source, reason }` on the +// `HydrationReport`, never as a panic that aborts the whole crawl. +const MALFORMED_TTL: &str = r#" +ogit.Bad:Thing + a rdfs:Class ; + rdfs:subClassOf ogit:Entity ; + ogit:mandatory-attributes ( ogit:id +. 
+"#; + +#[test] +fn malformed_ttl_yields_hydration_failure() { + let tmp = tempfile::tempdir().expect("tempdir"); + let ns_dir = tmp.path().join("Bad").join("entities"); + fs::create_dir_all(&ns_dir).unwrap(); + fs::write(ns_dir.join("Thing.ttl"), MALFORMED_TTL).unwrap(); + + let registry = Arc::new(OntologyRegistry::new_in_memory()); + let report = registry + .hydrate_once_sync(tmp.path(), &["Bad"]) + .expect("hydrate_once_sync must not return Err for parser-level failures"); + assert!( + !report.is_clean(), + "expected at least one HydrationFailure, got clean report" + ); + assert!( + report.failures.iter().any(|f| f.source.contains("Thing.ttl")), + "malformed file must appear in failures: {:?}", + report.failures + ); +} + +// ------------------------------------------------------------------------- +// Probe TTL-PROBE-2: Entity with empty mandatory-attributes ( ) registers. +// ------------------------------------------------------------------------- +// +// An OGIT entity may legitimately declare no mandatory attributes (e.g. an +// abstract base class, or a stub awaiting refinement). Empty `()` lists must +// not block registration; the entity should land in the registry with an +// empty Schema. +const EMPTY_MANDATORY_TTL: &str = r#" +@prefix ogit: . +@prefix ogit.Empty: . +@prefix rdfs: . + +ogit.Empty:Stub + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Stub"; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( ) ; + ogit:optional-attributes ( ) ; +. 
+"#; + +#[test] +fn entity_with_empty_attribute_lists_registers() { + let tmp = tempfile::tempdir().expect("tempdir"); + let ns_dir = tmp.path().join("Empty").join("entities"); + fs::create_dir_all(&ns_dir).unwrap(); + fs::write(ns_dir.join("Stub.ttl"), EMPTY_MANDATORY_TTL).unwrap(); + + let registry = Arc::new(OntologyRegistry::new_in_memory()); + let report = registry + .hydrate_once_sync(tmp.path(), &["Empty"]) + .expect("hydration"); + assert!(report.is_clean(), "failures: {:?}", report.failures); + assert!( + report.registered >= 1, + "empty-attr entity must register: {report:?}" + ); + let stub = registry + .resolve_uri("ogit.Empty:Stub") + .expect("stub resolves by URI"); + assert!(stub.namespace_id().is_known(), "namespace G must be assigned"); + assert_eq!( + registry.namespace_id("Empty"), + Some(stub.namespace_id()), + "registry must map 'Empty' to the same NamespaceId" + ); +} + +// ------------------------------------------------------------------------- +// Probe TTL-PROBE-3: Multiple entities in one TTL file. +// ------------------------------------------------------------------------- +// +// Real OGIT (e.g. ogit.ttl at the root of the NTO repo) declares many +// classes in a single file. Verify our parser indexes by subject and emits +// one proposal per entity, not one per file. +const MULTI_ENTITY_TTL: &str = r#" +@prefix ogit: . +@prefix ogit.Multi: . +@prefix rdfs: . + +ogit.Multi:Alpha + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Alpha"; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( ogit:id ) ; + ogit:optional-attributes ( ) ; +. + +ogit.Multi:Beta + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Beta"; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( ogit:id ) ; + ogit:optional-attributes ( ogit:name ) ; +. 
+ +ogit.Multi:Gamma + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Gamma"; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( ogit:id ) ; + ogit:optional-attributes ( ) ; +. +"#; + +#[test] +fn multi_entity_ttl_emits_one_proposal_per_subject() { + let tmp = tempfile::tempdir().expect("tempdir"); + let ns_dir = tmp.path().join("Multi").join("entities"); + fs::create_dir_all(&ns_dir).unwrap(); + fs::write(ns_dir.join("all.ttl"), MULTI_ENTITY_TTL).unwrap(); + + let registry = Arc::new(OntologyRegistry::new_in_memory()); + let report = registry + .hydrate_once_sync(tmp.path(), &["Multi"]) + .expect("hydration"); + assert!(report.is_clean(), "failures: {:?}", report.failures); + assert!( + report.registered >= 3, + "all three entities from one TTL file must register: {report:?}" + ); + for name in ["ogit.Multi:Alpha", "ogit.Multi:Beta", "ogit.Multi:Gamma"] { + registry + .resolve_uri(name) + .unwrap_or_else(|| panic!("{name} must resolve")); + } +} + +// ------------------------------------------------------------------------- +// Probe TTL-PROBE-4: TTL with @base declaration parses correctly. +// ------------------------------------------------------------------------- +// +// The Turtle spec allows `@base ` to set the base for relative IRIs. +// Our parser hard-codes a base of `http://www.purl.org/ogit/`; verify a +// document that declares an explicit @base matching the OGIT root still +// produces correct proposals. +const BASE_DECL_TTL: &str = r#" +@base . +@prefix ogit: . +@prefix ogit.BaseDecl: . +@prefix rdfs: . + +ogit.BaseDecl:Item + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Item"; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( ogit:id ) ; + ogit:optional-attributes ( ) ; +. 
+"#; + +#[test] +fn base_declaration_does_not_break_parser() { + let tmp = tempfile::tempdir().expect("tempdir"); + let ns_dir = tmp.path().join("BaseDecl").join("entities"); + fs::create_dir_all(&ns_dir).unwrap(); + fs::write(ns_dir.join("Item.ttl"), BASE_DECL_TTL).unwrap(); + + let registry = Arc::new(OntologyRegistry::new_in_memory()); + let report = registry + .hydrate_once_sync(tmp.path(), &["BaseDecl"]) + .expect("hydration"); + assert!(report.is_clean(), "failures: {:?}", report.failures); + assert!(report.registered >= 1, "report: {report:?}"); + let item = registry + .resolve_uri("ogit.BaseDecl:Item") + .expect("item resolves by URI"); + assert!(item.namespace_id().is_known(), "namespace G must be assigned"); + assert_eq!( + registry.namespace_id("BaseDecl"), + Some(item.namespace_id()), + "registry must map 'BaseDecl' to the same NamespaceId" + ); +} + +// ------------------------------------------------------------------------- +// Probe TTL-PROBE-5: dcterms:source annotation should round-trip. +// ------------------------------------------------------------------------- +// +// The TTL spec carries an optional `dcterms:source` per entity (provenance +// pointer to the upstream definition). Today the dictionary `source_uri` +// column is set to the local `file://...` path. If a TTL declares its own +// `dcterms:source`, that value is currently dropped — see TECH_DEBT entry. +// This probe documents the gap by asserting that it is in fact dropped. +const DCTERMS_SOURCE_TTL: &str = r#" +@prefix ogit: . +@prefix ogit.Provenance: . +@prefix rdfs: . +@prefix dcterms: . + +ogit.Provenance:Tracked + a rdfs:Class; + rdfs:subClassOf ogit:Entity; + rdfs:label "Tracked"; + dcterms:source ; + ogit:scope "NTO"; + ogit:parent ogit:Node; + ogit:mandatory-attributes ( ogit:id ) ; + ogit:optional-attributes ( ) ; +. 
+"#; + +#[test] +fn dcterms_source_is_currently_dropped() { + use lance_graph_ontology::ttl_parse::TtlSource; + use lance_graph_ontology::semantic_types::SemanticTypeMap; + + let path = std::path::PathBuf::from("dcterms_probe.ttl"); + let src = TtlSource::from_bytes(path.clone(), DCTERMS_SOURCE_TTL.as_bytes().to_vec()); + let sem = SemanticTypeMap::defaults(); + let proposals = src + .parse_into_proposals("ogit", sem) + .expect("dcterms TTL must parse"); + let entity = proposals + .iter() + .find(|p| p.public_name == "ogit.Provenance:Tracked") + .expect("entity must register"); + // CURRENT behaviour: source_uri is the local file path, not the + // dcterms:source IRI from the TTL. This assertion locks the gap so a + // future fix flips this test (and TTL-PROBE-5 in TECH_DEBT.md closes). + assert!( + entity.source_uri.starts_with("file:"), + "expected file:-prefixed source_uri, got {:?}", + entity.source_uri + ); + assert!( + !entity.source_uri.contains("github.com"), + "BUG: dcterms:source already round-trips, update test and close TTL-PROBE-5" + ); +}
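Reviewer note on the idempotency contract: `idempotent_re_hydration_is_fast` and `ttl_round_trip_idempotent` above both assert that a second `hydrate_once_sync` call over the same namespace reports `from_cache`. The sketch below illustrates one way such a short-circuit could behave, using only the standard library. `SketchRegistry`, `SketchReport`, and the `entities_per_namespace` parameter are hypothetical names invented for this illustration; they are not the `lance-graph-ontology` API, and the real crate may track hydration state quite differently.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical report type: models only the two fields the tests read.
#[derive(Debug, Default, PartialEq)]
pub struct SketchReport {
    pub registered: usize,
    pub from_cache: bool,
}

/// Hypothetical registry that remembers which namespaces it has hydrated.
#[derive(Default)]
pub struct SketchRegistry {
    hydrated: Mutex<HashSet<String>>,
}

impl SketchRegistry {
    /// Hydrate each requested namespace at most once.
    ///
    /// `entities_per_namespace` stands in for the real TTL crawl, which this
    /// sketch does not attempt to reproduce.
    pub fn hydrate_once(&self, namespaces: &[&str], entities_per_namespace: usize) -> SketchReport {
        let mut seen = self.hydrated.lock().expect("registry lock poisoned");
        let mut report = SketchReport::default();
        let mut newly_hydrated = 0usize;
        for ns in namespaces {
            // `HashSet::insert` returns false if the value was already present.
            if seen.insert((*ns).to_string()) {
                newly_hydrated += 1;
                report.registered += entities_per_namespace;
            }
        }
        // Served from cache only when every requested namespace was already known.
        report.from_cache = !namespaces.is_empty() && newly_hydrated == 0;
        report
    }
}
```

Under these assumptions, calling `hydrate_once(&["RoundTrip"], 2)` twice on one `SketchRegistry` gives a first report with `registered == 2` and `from_cache == false`, and a second report with `registered == 0` and `from_cache == true`, which is the shape the two idempotency tests check for.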