feat(ontology): seed NamespaceRegistry with bO-* upstream vocabs (PR #407 follow-up) #408

Merged
AdaWorldAPI merged 1 commit into main from claude/hydrate-dolce-dul-owl-Ce9Oa
May 21, 2026

Conversation


@AdaWorldAPI AdaWorldAPI commented May 21, 2026

Summary

Follow-up to merged PR #407. Adds the 13 new external vocabularies as
seeded entries in NamespaceRegistry::seed_defaults() — the canonical
IRI ↔ context_id matching table backed by lance_cache.rs's Lance
dataset for O(1) lookup. Closes the gap between "the hydrator exists"
(PR #407) and "consumers can look up the namespace by shortname".

Why this lives in lance-graph-ontology, not in OGIT

User direction 2026-05-21:

"if OWL files are public, preferably don't mess with it otherwise
their use is tainted"

"expand always but drift is probably bad"

"deinterlace them locally and keep that matching table in a lance
table for O(1) and check what lance-graph-ontology has in regards"

Per those constraints:

  • Public OWL/RDF/XSD/Schematron source files in data/ontologies/
    stay pristine (DOLCE+DUL, FIBO, OWL-Time, PROV-O, QUDT, schema.org,
    SKOS, ZUGFeRD CII). Never modified — only parsed at hydrate time.
  • The OGIT repo is authoritative for namespace registrations, but
    adding new TTL files there with hand-picked contextIds would
    drift from OGIT's existing dense Medical 10-19 allocation pattern.
  • The matching table is a CLIENT concern → lives in lance-graph-ontology,
    keyed by namespace shortname, persisted via the existing lance_cache
    layer (already O(1)).
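
A minimal sketch of what such a client-side matching table could look like, assuming a plain in-memory pair of hash maps. The type `NamespaceTable`, its method names, and the `u16` id width are hypothetical and chosen for illustration; the actual registry persists through lance_cache, which is not modeled here.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the shortname <-> context id table.
/// The real table is backed by a Lance dataset; this sketch only shows the
/// constant-time lookup in both directions.
#[derive(Default)]
struct NamespaceTable {
    by_name: HashMap<String, u16>,
    by_id: HashMap<u16, String>,
}

impl NamespaceTable {
    /// Record one shortname <-> context id pair in both directions.
    fn insert(&mut self, name: &str, id: u16) {
        self.by_name.insert(name.to_string(), id);
        self.by_id.insert(id, name.to_string());
    }

    /// Shortname -> context id (average O(1) hash lookup).
    fn context_id(&self, name: &str) -> Option<u16> {
        self.by_name.get(name).copied()
    }

    /// Context id -> shortname (average O(1) hash lookup).
    fn shortname(&self, id: u16) -> Option<&str> {
        self.by_id.get(&id).map(String::as_str)
    }
}

fn main() {
    let mut table = NamespaceTable::default();
    // A few rows copied from the allocation table in this PR.
    table.insert("WorkOrder", 1);
    table.insert("Foundation/DOLCE-DUL", 20);
    table.insert("FinancialAccounting/FIBO-FND", 30);

    assert_eq!(table.context_id("Foundation/DOLCE-DUL"), Some(20));
    assert_eq!(table.shortname(30), Some("FinancialAccounting/FIBO-FND"));
    assert_eq!(table.context_id("Unknown"), None);
}
```

Keeping both directions in the client means consumers never need to touch the upstream OWL files or the OGIT repo to resolve a namespace.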

Allocation

Slots picked to extend the existing dense-within-family pattern
(Medical/ 10-19) without colliding:

ID     Namespace                           bO-PR         Hydrator
──────────────────────────────────────────────────────────────────────────────
0      SMB                                 pre-existing  (export-only)
1      WorkOrder                           pre-existing  woa-rs
2      Healthcare                          pre-existing  medcare-rs
3      Network                             pre-existing
4      EmailCorrespondance                 pre-existing
5      SharePoint                          pre-existing
10-19  Medical/                            pre-existing  bioportal hydrators
20     Foundation/DOLCE-DUL                bO-1          hydrate_dolce
21     Foundation/OWL-Time                 bO-2          hydrate_owltime
22     Foundation/PROV-O                   bO-3          hydrate_provo
23     Foundation/QUDT                     bO-4          hydrate_qudt
24     Foundation/schema-org               bO-8          hydrate_schemaorg
25     Foundation/SKOS                     bO-5          hydrate_skos
30     FinancialAccounting/FIBO-FND        bO-6          hydrate_fibo_fnd
31     FinancialAccounting/FIBO-BE         bO-7          hydrate_fibo_be
32     FinancialAccounting/ZUGFeRD         bO-16         hydrate_zugferd
33     FinancialAccounting/ZUGFeRD-Rules   bO-15         hydrate_zugferd_rules
34     FinancialAccounting/SKR03           bO-13         hydrate_skr03
35     FinancialAccounting/SKR04           bO-13         hydrate_skr04
36     FinancialAccounting/SKR03-Bau       bO-13         hydrate_skr03_bau

allocate() continues to fill gaps 6..=9 and 26..=29 first, then
37+ — matching the existing Medical-as-dense-family pattern.
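
The gap-filling order described above can be illustrated with a small sketch, assuming allocate() simply hands out the lowest id not yet in use. The free functions `seeded_ids` and `allocate` below are hypothetical stand-ins and not the crate's actual signatures.

```rust
use std::collections::BTreeSet;

/// Ids occupied after seeding, per the allocation table:
/// 0..=5, Medical 10..=19, Foundation 20..=25, FinancialAccounting 30..=36.
fn seeded_ids() -> BTreeSet<u16> {
    (0..=5).chain(10..=19).chain(20..=25).chain(30..=36).collect()
}

/// Assumed policy: return the lowest id that is not yet in use and mark it used.
fn allocate(used: &mut BTreeSet<u16>) -> u16 {
    let id = (0..=u16::MAX)
        .find(|candidate| !used.contains(candidate))
        .expect("id space exhausted");
    used.insert(id);
    id
}

fn main() {
    let mut used = seeded_ids();
    assert_eq!(used.len(), 29); // 6 + 10 + 6 + 7 seeded entries

    // Nine dynamic allocations: gap 6..=9, then gap 26..=29, then 37.
    let ids: Vec<u16> = (0..9).map(|_| allocate(&mut used)).collect();
    assert_eq!(ids, vec![6, 7, 8, 9, 26, 27, 28, 29, 37]);
}
```

Under this assumed policy the first dynamic id is 6, which is consistent with the corrected next_free_id doc-comment mentioned in the diff scope.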

Diff scope

  • crates/lance-graph-ontology/src/namespace_registry.rs:
    • seed_defaults adds 13 entries
    • next_free_id doc-comment updated (was claiming "20" as first
      dynamic id; actual was 6 even before this PR)
    • Two unit tests updated (_has_twenty_nine_entries,
      _assigns_canonical_ids), plus the allocate_skips_to_first_unused_id
      len assertion (16 → 29)
  • crates/lance-graph-ontology/tests/context_id_test.rs:
    • _seed_defaults_assigns_canonical_v1_ids adds spot-checks at the
      new IDs (20, 25, 30, 36)
    • _allocate_is_idempotent_and_dense len assertion 18 → 31

Net: +62 / -11 lines, 2 files. No behavior change for existing
consumers (the 16 pre-existing entries keep their canonical IDs).
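
The length arithmetic behind the updated assertions (29 seeded entries, 31 after two new allocations) can be checked with a sketch like the following. The `Registry` type, the placeholder `seed-N` names, and the name-keyed `allocate` are assumptions made for illustration; only the id layout is taken from the allocation table.

```rust
use std::collections::HashMap;

/// Hypothetical registry: namespace shortname -> context id.
struct Registry {
    ids: HashMap<String, u16>,
}

impl Registry {
    /// Stand-in seed: placeholder names on the 29 ids from the allocation
    /// table (0..=5, 10..=25, 30..=36).
    fn seeded() -> Self {
        let ids = (0u16..=5)
            .chain(10..=25)
            .chain(30..=36)
            .map(|id| (format!("seed-{id}"), id))
            .collect();
        Registry { ids }
    }

    /// Idempotent: a known name keeps its id; a new name gets the lowest free id.
    fn allocate(&mut self, name: &str) -> u16 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = (0..=u16::MAX)
            .find(|candidate| !self.ids.values().any(|used| used == candidate))
            .expect("id space exhausted");
        self.ids.insert(name.to_string(), id);
        id
    }
}

fn main() {
    let mut reg = Registry::seeded();
    assert_eq!(reg.ids.len(), 29);

    let a = reg.allocate("Example/A");
    let b = reg.allocate("Example/B");
    assert_eq!((a, b), (6, 7));

    // Re-allocating an existing name returns the same id and adds no entry.
    assert_eq!(reg.allocate("Example/A"), a);
    assert_eq!(reg.ids.len(), 31);
}
```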

Test plan

  • All 116 lance-graph-ontology tests pass (was 116 before; same
    count — 2 integration tests updated in-place rather than added)
  • cargo clippy clean (5 pre-existing oxrdf deprecation warnings,
    no new ones)
  • Downstream consumers build clean: lance-graph-callcenter,
    lance-graph-consumer-conformance, cognitive-shader-driver
  • OGIT repo NOT modified — no TTL drift
  • data/ontologies/ source files NOT modified — pristine upstream

Cross-references


Generated by Claude Code

Summary by CodeRabbit

  • Chores
    • Extended namespace registry seed allocations to support Foundation and FinancialAccounting systems.
    • Updated test suite to reflect expanded namespace allocation expectations.

… vocabs

Companion to PR #407 (merged). Expands `NamespaceRegistry::seed_defaults`
from 16 to 29 entries, registering the 13 external vocabularies that
PR #407 added hydrators for. This is the O(1) IRI ↔ context_id matching
table backed by `lance_cache.rs`'s Lance dataset; consumers like
smb-office-rs and woa-rs look up by namespace shortname instead of
hand-rolling slot constants.

Why this lives in lance-graph-ontology, not in OGIT:
- Public OWL/RDF source files stay pristine in data/ontologies/
  (DOLCE+DUL, FIBO-FND/BE, OWL-Time, PROV-O, QUDT, schema.org, SKOS,
  ZUGFeRD CII XSDs + Schematron). Modifying them taints downstream use.
- The OGIT repo is authoritative for namespace registrations but adding
  new TTL files there with hand-picked contextIds would be drift.
- The matching table belongs in the CLIENT (lance-graph-ontology), keyed
  by namespace shortname, persisted via the existing lance_cache layer.
- Per user direction 2026-05-21: "expand always but drift is probably bad"
  + "deinterlace them locally and keep that matching table in a lance
  table for O(1) and check what lance-graph-ontology has in regards"
  → expansion lives here, OGIT untouched.

Allocation:

  ID    Namespace                            PR / Hydrator
  ─────────────────────────────────────────────────────────
   0    SMB                                  (pre-existing)
   1    WorkOrder                            (pre-existing)
   2    Healthcare                           (pre-existing)
   3    Network                              (pre-existing)
   4    EmailCorrespondance                  (pre-existing)
   5    SharePoint                           (pre-existing)
  10-19 Medical/<sub>                        (pre-existing, dense)
  20    Foundation/DOLCE-DUL                 bO-1   hydrate_dolce
  21    Foundation/OWL-Time                  bO-2   hydrate_owltime
  22    Foundation/PROV-O                    bO-3   hydrate_provo
  23    Foundation/QUDT                      bO-4   hydrate_qudt
  24    Foundation/schema-org                bO-8   hydrate_schemaorg
  25    Foundation/SKOS                      bO-5   hydrate_skos
  30    FinancialAccounting/FIBO-FND         bO-6   hydrate_fibo_fnd
  31    FinancialAccounting/FIBO-BE          bO-7   hydrate_fibo_be
  32    FinancialAccounting/ZUGFeRD          bO-16  hydrate_zugferd
  33    FinancialAccounting/ZUGFeRD-Rules    bO-15  hydrate_zugferd_rules
  34    FinancialAccounting/SKR03            bO-13  hydrate_skr03
  35    FinancialAccounting/SKR04            bO-13  hydrate_skr04
  36    FinancialAccounting/SKR03-Bau        bO-13  hydrate_skr03_bau

Allocation policy matches the existing Medical/<sub> pattern: dense
within family-range, gaps between ranges left as expansion room.
`allocate()` continues to fill gaps 6..=9 and 26..=29 first, then 37+.

Notes:
- `next_free_id` doc-comment updated to reflect the new seed layout.
  First dynamic id is now 6 (was already 6 in practice; the prior
  comment said "20" which was off by 14).
- Three regression tests updated:
  * `seed_defaults_has_sixteen_entries` → `_has_twenty_nine_entries`
  * `seed_defaults_assigns_canonical_ids` adds spot-checks at 20/25/30/34/35/36
  * `allocate_skips_to_first_unused_id` len assertion 16 → 29
- One integration test (`tests/context_id_test.rs`) updated to match.

All 116 lance-graph-ontology tests pass; clippy clean (5 pre-existing
oxrdf deprecation warnings, no new); downstream consumers
(callcenter, consumer-conformance, cognitive-shader-driver) build clean.

coderabbitai Bot commented May 21, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

The pull request expands the namespace registry's pre-seeded namespace allocations from 16 to 29 entries by seeding Foundation/* namespaces at context ids 20–25 and FinancialAccounting/* namespaces at context ids 30–36. The implementation, unit tests, and integration tests are all updated consistently to reflect this expansion.

Changes

Expanded Namespace Registry Seeding

  • Expanded seed defaults allocation (crates/lance-graph-ontology/src/namespace_registry.rs): The seed_defaults method increases HashMap capacity and inserts new seeded namespaces for Foundation/* (ids 20–25) and FinancialAccounting/* (ids 30–36), with the next_free_id documentation updated to state the actual first dynamic id.
  • Seed defaults test assertions (crates/lance-graph-ontology/src/namespace_registry.rs): Unit tests assert that seed_defaults now contains 29 entries (instead of 16) and verify canonical context ids for the newly seeded Foundation and FinancialAccounting namespaces, including allocate behavior expectations.
  • Integration tests for expanded allocations (crates/lance-graph-ontology/tests/context_id_test.rs): Test expectations are updated to reflect the new 29-entry seed baseline: assertions verify the expanded Foundation and FinancialAccounting namespace mappings, and the expected registry length after two allocations is adjusted from 18 to 31.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A rabbit hops through namespaces wide,
Foundation and Finance now unified side by side,
Twenty-nine seeds in the registry sown,
Dynamic allocations claim what's yet unknown!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly describes the main change: seeding the NamespaceRegistry with upstream vocabularies as a follow-up to PR #407, matching the documented objectives.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

@AdaWorldAPI AdaWorldAPI merged commit f8c6236 into main May 21, 2026
4 of 5 checks passed
AdaWorldAPI pushed a commit that referenced this pull request May 21, 2026
…O(1) inheritance from family buckets

Two user clarifications, folded into §4:

1. The OWL/DOLCE cross-walk table is not "interop crutches" — it is
   the SOURCE MATERIAL from which lance-graph-ontology constructs
   the OGIT/OWL/DOLCE mapping. The hydrators are the construction
   tool; the cross-walk standards (DOLCE+DUL, OWL-Time, PROV-O,
   QUDT, schema.org, FIBO, SKOS, Schematron, XSD, SKR DATEV,
   ZUGFeRD) are the bricks; OGIT canonical surface (per-family
   codebook + inherits-from DAG + edge whitelists at OGIT::*_V1
   slots) is the synthesis.

2. The hydrators + inherits_from + per-family codebook + family-
   bucket dense array TOGETHER form the spine that makes schema /
   label / codebook inheritance O(1) cheap at lookup time.

§4.1 rewritten as "the source material for the OGIT mapping" —
direction-of-build diagram added showing how each external
standard flows through its hydrator into its OGIT::*_V1 G-slot.
The MetaAnchors fields (foundry_object_type, owl_upper_class,
dolce_marker, wikidata_qid) are reframed as the runtime READ
SURFACE over the constructed mapping, not as the populated
target. DolceMarker enum naming open question (Endurant/
Perdurant vs Object/Event per canonical DUL rename) called out
explicitly as a decision needed before D-UB-3.

§4.3 lead-in reframed from "the cross-walk surface is now
concrete" to "the construction tool that builds the OGIT
mapping is now concrete in lance_graph_ontology::hydrators::*".
PR #408 reference added (NamespaceRegistry::seed_defaults wires
the corresponding G-slots at boot).

§4.4 NEW — locks the O(1) inheritance property:
- Schema inheritance: inherits_from chain flattened into
  FamilyEntry.axiom_blob at hydration; query-time cost is one
  masked u16 + one array index = O(1). Zero chain-walks at
  query time.
- Label inheritance: rdfs:label per-locale collapsed during the
  subClassOf walk at hydration; FamilyEntry.label_* reads are
  O(1) array index. Zero parent lookup at query time.
- Codebook inheritance: per-family centroid references parent
  codebook by u8 offset when content distributions overlap (with
  Jirak-aware Berry-Esseen bound per I-NOISE-FLOOR-JIRAK). One
  indirection max.
- Why family buckets vs flat dict: ~5ns L1-cache-resident two
  array indices vs ~50-100ns hash + collision + cache miss = 10-20×
  cost gap. Co-design between construction tool (hydrators) and
  runtime substrate (family buckets): neither earns the
  property alone.
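
A minimal sketch of the flatten-at-hydration, index-at-query idea described in §4.4. The field types of `FamilyEntry`, the `RawClass` type, the single-parent simplification of the inherits-from DAG, and the high-byte/low-byte split of the `u16` id are all assumptions made for illustration, not the crate's actual layout.

```rust
/// Hypothetical flattened entry; field names follow the commit message,
/// field types are assumptions.
#[derive(Clone, Debug, Default, PartialEq)]
struct FamilyEntry {
    label_en: String,
    axiom_blob: Vec<String>,
}

/// Hypothetical parsed class before flattening. A single optional parent is
/// a simplification of the inherits-from DAG and is assumed to be acyclic.
struct RawClass {
    label_en: Option<String>,
    axioms: Vec<String>,
    inherits_from: Option<usize>,
}

/// Hydration-time step: walk the inherits_from chain once and bake the
/// inherited axioms and the nearest available label into one entry.
fn flatten(classes: &[RawClass], idx: usize) -> FamilyEntry {
    let mut entry = FamilyEntry::default();
    let mut label: Option<String> = None;
    let mut cursor = Some(idx);
    while let Some(i) = cursor {
        let class = &classes[i];
        if label.is_none() {
            label = class.label_en.clone();
        }
        entry.axiom_blob.extend(class.axioms.iter().cloned());
        cursor = class.inherits_from;
    }
    entry.label_en = label.unwrap_or_default();
    entry
}

/// Assumed id layout: high byte selects the family, low byte the slot.
const SLOT_MASK: u16 = 0x00FF;

/// Query-time step: one shift, one mask, two array indices, no chain walk.
fn lookup(buckets: &[Vec<FamilyEntry>], id: u16) -> Option<&FamilyEntry> {
    let family = (id >> 8) as usize;
    let slot = (id & SLOT_MASK) as usize;
    buckets.get(family)?.get(slot)
}

fn main() {
    let classes = vec![
        RawClass { label_en: Some("Entity".into()), axioms: vec!["a0".into()], inherits_from: None },
        RawClass { label_en: None, axioms: vec!["a1".into()], inherits_from: Some(0) },
    ];
    // One family bucket holding both flattened classes.
    let buckets = vec![vec![flatten(&classes, 0), flatten(&classes, 1)]];

    let child = lookup(&buckets, 0x0001).expect("slot 1 in family 0");
    assert_eq!(child.label_en, "Entity"); // label inherited from the parent
    assert_eq!(child.axiom_blob, vec!["a1".to_string(), "a0".to_string()]);
    assert!(lookup(&buckets, 0x0100).is_none()); // family 1 does not exist
}
```

The point of the sketch is the split of work: the loop in `flatten` runs once per class at hydration, while `lookup` contains no loop at all.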

Concrete consumer-side payoffs spelled out for woa-bridge,
medcare-bridge, smb-bridge: pre-baked schema / label / codebook
inheritance means route handlers read one FamilyEntry per row
identity; no OWL reasoning, no rdfs:label walk, no Schematron
re-parse at query time.