diff --git a/.claude/BOOT.md b/.claude/BOOT.md new file mode 100644 index 00000000..373ef2da --- /dev/null +++ b/.claude/BOOT.md @@ -0,0 +1,215 @@ +# BOOT — Session Entry Point for lance-graph + +> **If you are a new session on this workspace, read this file first.** +> It is the one page that tells you everything you need to bootstrap +> before proposing any work. Target cold-start cost: 3–5 turns, not 30. + +--- + +## Read Order (MANDATORY) + +Read these in order before proposing anything: + +1. **`.claude/knowledge/LATEST_STATE.md`** — current contract + inventory, recently shipped PRs, active branches, queued work, + explicit deferrals. Tells you **what exists**. +2. **`.claude/knowledge/PR_ARC_INVENTORY.md`** — per-PR Added / + Locked / Deferred / Docs / Confidence, reverse-chronological. + APPEND-ONLY (only Confidence is mutable; corrections append as + new dated lines; reversals get their own PR entry). Tells you + **why it exists**. +3. **`.claude/agents/BOOT.md`** — the 19-specialist + 5-meta-agent + ensemble, the Knowledge Activation trigger table (domain → + agent → required knowledge docs), the Handover Protocol spec. + Tells you **how to coordinate**. + +These three files give you ~90 % of the context you need to avoid +re-proposing what's shipped or violating a locked convention. + +Two companion dashboards (consult when deliverable status or plan +version matters — typically mid-session, not at cold start): + +- **`.claude/knowledge/STATUS_BOARD.md`** — deliverable-level + dashboard. All D-ids across every active plan with Status + (Shipped / In PR / In progress / Queued / Backlog / Deferred / + Abandoned). Plus infrastructure status, research threads, and + the 102-file prior-art audit. +- **`.claude/knowledge/INTEGRATION_PLANS.md`** — versioned plan + index, APPEND-ONLY. New plan versions prepend; prior versions + stay with Status annotation. Active plan lives at + `.claude/plans/-v.md`. + +--- + +## The Governance Rules (never violate) + +1. 
**Append-only on bookkeeping files.** Eight files carry the + workspace's historical record; rows / sections inside them are + immutable, with a short list of mutable fields per file. + + **Method hierarchy (preferred → discouraged):** + + 1. **APPEND** (the method of choice) — add a NEW dated row to + an existing section. Either via Edit (prepend inside a `---` + bounded section) or Bash `cat >> file << EOF`. No prompt. + The old row stays untouched. This is the double-bookkeeping + pattern — new state enters as a new entry, not by mutating + the old. + 2. **Edit field with prior Read** — flip a mutable field + (Status / Confidence / Resolution / Payoff) on an existing + row after reading the file. No prompt. Use for clarifications, + status transitions, and minor updates. Prior Read is the + workspace discipline (already in `CLAUDE.md`). + 3. **Write (full overwrite)** — prompts for approval on every + bookkeeping file via `.claude/settings.json::permissions.ask`. + Discouraged; only for wholesale replacement of a file that's + been through explicit review. Never the default answer. + + Rule: **when in doubt, append**. The double-bookkeeping habit + (new row rather than edited old row) preserves the full arc and + keeps the audit trail legible. Edit-a-field is acceptable when + the change is clearly a status transition on an existing entry; + Write is the escape hatch that costs a confirmation. + + | File | Immutable | Mutable fields | + |---|---|---| + | `PR_ARC_INVENTORY.md` | PR rows (Added / Locked / Deferred / Docs) | Confidence line per entry. Corrections APPEND as dated lines. 
| + | `LATEST_STATE.md` | Recently-shipped PR table rows | Snapshot sections (Current Inventory / Active Branches / Queued / Deferred) are updated by replacement, not history | + | `STATUS_BOARD.md` | Deliverable rows (D-id / title / plan-version / scope) | Status column, PR / Evidence column per row | + | `INTEGRATION_PLANS.md` | Plan entries (scope / path / deliverables) | Status and Confidence lines per entry | + | `EPIPHANIES.md` | Dated entry bodies | Status line (FINDING / CONJECTURE / SUPERSEDED) | + | `ISSUES.md` | Dated entry bodies | Status line (Open / Resolved / Wontfix / Superseded) + Resolution line (append on close) | + | `IDEAS.md` | Dated entry bodies | Status line (Open / Implemented / Rejected / Deferred / Reactivated) + Rationale (append) | + | `TECH_DEBT.md` | Dated entry bodies | Status line (Open / Paid / Moot) + Payoff line (append) + Priority / Scope (filled at creation, stable) | + + Core invariant: **rows are history; specific fields are state; + never delete a row.** Supersedure is a new row that cites the + old; the old row's Status updates to "Superseded by ". + + **Kanban discipline** — every entry in `ISSUES.md`, `IDEAS.md`, + `TECH_DEBT.md` carries `**Priority:**` (P0-P3) and + `**Scope:**` (@agent D domain:) fields. Agents filter + the board by their own `@`-mention or domain. Status moves + through Open → In Progress → Resolved/Implemented/Paid. Nothing + falls through the cracks because every ticket has an owner + mentioned by `@`. +2. **Model policy.** Main thread on Opus with deep thinking. + Subagents: Sonnet for grindwork (single-source mechanical), + Opus for accumulation (multi-source synthesis). **Never Haiku**, + regardless of task. Full spec in CLAUDE.md §Model Policy. +3. **GitHub access.** Zipball to `/tmp/sources/` + local grep for + 3+ reads per external repo. MCP github only for writes (PR, + comments, reviews) and single-path reads. Full spec in + CLAUDE.md §GitHub Access Policy. +4. 
**Contract zero-dep invariant.** `lance-graph-contract` has no + external crate deps. Do not add any. +5. **Read before Write.** Always Read a file before overwriting. + Write-over-self without Read is the documented failure mode; + see CLAUDE.md §In-Session Orchestration Discipline. +6. **No JSON serialization in types.** Serde stays debug-only; wire + formats are explicit. +7. **Pumpkin framing** for externally-visible examples (clinical / + game-AI disguise for the AGI primitives). + +--- + +## A2A Orchestration — Two Layers + +Orchestration on this workspace runs at two distinct layers: + +- **Layer 1 — Runtime A2A:** `lance_graph_contract::a2a_blackboard` + `Blackboard` + `BlackboardEntry` (experts write, later rounds + read), `OrchestrationBridge` + `StepDomain` + `UnifiedStep`, + `orchestration_mode` for composition modes. For + cognitive-cycle composition inside the running system. +- **Layer 2 — Session A2A:** The mandatory-read files above are + the shared blackboard across subagents. `.claude/knowledge/*.md` + are extended blackboard entries. `.claude/handovers/*.md` + (created per agent chain) carry per-handover state. Parallel + `Agent` spawns in one main-thread turn are the cheapest + coordination pattern. + +Full description: CLAUDE.md §Agent-to-Agent (A2A) Orchestration. +Detailed agent ensemble + trigger table + handover protocol: +`.claude/agents/BOOT.md`. + +--- + +## The 30-Turn Rediscovery Tax (why this file exists) + +Without this BOOT.md, a new session would typically: + +1. Not know the current contract — propose duplicate types. +2. Not know recently locked conventions — violate them, get + reverted, propose again. +3. Not know queued work — suggest things already in the plan. +4. Not know the agent ensemble — re-invent subagent coordination. +5. Re-read the same large knowledge docs multiple times because + nothing tells it which docs matter for which work. 
+ +Cumulative cost: 20–30 turns of main-thread context before the +session is productive. That's ~$20–40 in Opus charges per fresh +session, most of which is wasted rediscovery. + +Reading the three mandatory files above reduces this to ~3–5 turns. +The entire BOOT spec (this file) is ~13 KB (215 lines). The three mandatory +reads total ~15 KB. Compared to 20–30 wasted turns, the bootload +is a 5–10× cost reduction on every session start. + +--- + +## Existing content — don't duplicate, link + +This workspace already has substantial curated content. New work +should reference these, not recreate them: + +- **`.claude/prompts/`** — 41 scoped prompts (sessions, + certifications, probes, handovers, research surfaces). Each is a + self-contained task brief. See `.claude/prompts/SCOPED_PROMPTS.md` + as the natural index. +- **`.claude/plans/`** — versioned integration plans. Index at + `.claude/knowledge/INTEGRATION_PLANS.md` (APPEND-ONLY — new + versions prepend; prior plans stay with Status annotation). + Active: `.claude/plans/elegant-herding-rocket-v1.md`. +- **`.claude/*.md`** (top-level, 61 docs) — calibration reports, + handover logs, epiphanies compressed, integration-plan snapshots, + cross-repo audits, invariant matrices. Browse before writing new + reference docs. Examples: `EPIPHANIES_COMPRESSED.md` in prompts/, + `SESSION_CAPSTONE.md`, `INTEGRATIONSPLAN_2026_04_01.md`, + `INTEGRATION_SESSIONS.md`, `INVENTORY_MAP.md`. +- **`.claude/knowledge/*.md`** (newer, structured) — the knowledge + base proper with `READ BY:` headers + Knowledge Activation + triggers. See `.claude/agents/BOOT.md` § Knowledge Activation for + the trigger table. +- **`.claude/agents/*.md`** (19 specialists + 5 meta-agents) — + ensemble cards. See `.claude/agents/README.md` for the function + inventory or `BOOT.md` (sibling) for the orchestration spec. +- **`.claude/hooks/*.sh`** — SessionStart and PostCompact hooks + wired via `.claude/settings.json`. 
+- **`.claude/skills/cca2a/`** — the A2A pattern explanation skill. + +Before creating a new `.claude/*.md` file, grep the existing 61 +docs and 41 prompts for the topic. Most architectural concerns have +prior art. + +## Fallback — CLAUDE.md is the source of truth for everything else + +If this file doesn't answer your question, CLAUDE.md does. Tables +of contents: + +- CLAUDE.md §Session Start — mandatory reads (same three as above) +- CLAUDE.md §A2A Orchestration — both layers in detail +- CLAUDE.md §Model Policy — grindwork vs accumulation, never Haiku +- CLAUDE.md §GitHub Access Policy — zipball for reads +- CLAUDE.md §Workspace Structure — 11-crate layout +- CLAUDE.md §Knowledge Base — all `.claude/knowledge/` files +- CLAUDE.md §Knowledge Activation (MANDATORY) — agent protocol +- CLAUDE.md §In-Session Orchestration Discipline — Read before Write + +--- + +## One sentence + +**Read `LATEST_STATE.md`, `PR_ARC_INVENTORY.md`, and +`.claude/agents/BOOT.md` before proposing anything, then the +trigger-table domain doc, then start work.** diff --git a/.claude/agents/BOOT.md b/.claude/agents/BOOT.md new file mode 100644 index 00000000..62754ffd --- /dev/null +++ b/.claude/agents/BOOT.md @@ -0,0 +1,249 @@ +# Agent Ensemble + +This folder contains focused agent cards for `lance-graph`. + +The goal is not to multiply personalities for decoration. +The goal is to keep unlike architectural concerns from being flattened into one fuzzy voice. + +## Existing agents + +### `container-architect` +Protects the word-level container layout, field mapping, and container-vs-plane boundary. +Use for container schemas, reserved ranges, and read/write invariants. + +### `ripple-architect` +Protects ontology-level distinctions across anatomy, field, sweep, bus, and thought object. +Use for top-level architecture and anti-flattening decisions. + +### `resonance-cartographer` +Protects superpositional field state before collapse. 
+Use for `ResonanceDto`, searchable field design, HHTL, CLAM, and sweep semantics. + +### `bus-compiler` +Protects explicit structured execution. +Use for `BusDto`, `CausalEdge64`, p64 style mapping, and CognitiveShader compilation. + +### `contradiction-cartographer` +Protects contradiction as first-class structure. +Use for BRANCH, signed residuals, contradiction masks, and conflict-aware revision. + +### `thought-struct-scribe` +Protects durable thought objects and host-facing structured guidance. +Use for `ThoughtStruct`, provenance, revision, and host-model glove semantics. + +### `family-codec-smith` +Protects family-level representation choices. +Use for HEEL/HIP/BRANCH/TWIG/LEAF encodings, codebooks, palettes, and residual strategy. + +### `host-glove-designer` +Protects the interface from explicit thought objects into the host model. +Use for prompt-side, side-channel, adapter-like, or hybrid glove designs. + +### `savant-research` +ZeckBF17 compression, golden-step traversal, octave encoding, distance +metric design. Carries the codec-research domain knowledge and Pareto frontier. + +### `truth-architect` +Guards measurement-before-synthesis discipline and architectural ground truth. +Detects synthesis spirals (elaborate proposals with 0 measurements). Carries +the hard-won BF16-HHTL correction chain (5 iterations, 4 corrections, 0 probes). +**Mandatory reviewer** for any proposal that adds layers, touches HHTL cascade +architecture, or claims γ+φ carries information. See Knowledge Activation below. + +## Knowledge Activation Protocol + +Agents do not operate in a vacuum. When woken, an agent MUST read the knowledge +documents listed in its header (`READ BY:` line) BEFORE producing output. 
+ +**Mandatory knowledge activation triggers:** + +| Trigger | Agent(s) woken | Knowledge loaded first | +|---|---|---| +| Any HHTL cascade proposal | truth-architect + family-codec-smith | bf16-hhtl-terrain.md | +| γ+φ regime discussion | truth-architect | bf16-hhtl-terrain.md | +| φ-spiral / Zeckendorf math | savant-research + truth-architect | zeckendorf-spiral-proof.md, phi-spiral-reconstruction.md | +| Bucketing vs resolution claim | truth-architect | bf16-hhtl-terrain.md | +| Slot D / Slot V layout | container-architect + truth-architect | bf16-hhtl-terrain.md | +| New unification proposal | truth-architect (mandatory review) | bf16-hhtl-terrain.md probe queue | +| Fibonacci mod p traversal | savant-research | savant-research.md (§ golden-step) | +| Which encoding to use? | truth-architect + family-codec-smith | two-basin-routing.md | +| BGZ scope / claims | truth-architect | two-basin-routing.md (§ BGZ must win) | +| Pairwise vs centroid | truth-architect | two-basin-routing.md (§ pairwise rule) | +| Semantic vs distribution | truth-architect | two-basin-routing.md (§ two-basin doctrine) | +| Attribution / blame | truth-architect | two-basin-routing.md (§ attribution discipline) | +| Audio test / carrier+residual | savant-research + truth-architect | two-basin-routing.md (§ audio sanity test) | +| Composing subsystems / integration | truth-architect | frankenstein-checklist.md | +| New abstraction / new struct | truth-architect | frankenstein-checklist.md (§ redundant abstractions) | +| Performance budget question | truth-architect | frankenstein-checklist.md (§ correctness-first) | + +**The insight update cycle:** + +``` +1. Agent proposes a claim +2. truth-architect checks: is there a probe for this? Has it run? +3. If NOT RUN → probe is the next deliverable, not more synthesis +4. If RUN + PASS → update .claude/knowledge/ with FINDING (not CONJECTURE) +5. If RUN + FAIL → update .claude/knowledge/ with correction +6. 
Commit knowledge update: docs(knowledge): probe [ID] — [result] +``` + +This cycle is MANDATORY. Knowledge docs are living documents that track +the frontier between conjecture and measurement. An insight that hasn't +been probed is labeled CONJECTURE. An insight that has been probed and +passed is labeled FINDING. The distinction must be visible in the doc. + +## Meta-Agents (scopes and when to use) + +Meta-agents coordinate across specialists. They do NOT replace +specialist judgment — they route, prime, and compose. + +| Meta-agent | Scope | +|---|---| +| **`workspace-primer`** | Reads `LATEST_STATE.md` + `PR_ARC_INVENTORY.md`, reports current contract inventory, queued work, active branch. **First thing to wake on any new session.** | +| **`integration-lead`** | Composes specialist outputs into a coherent shipping plan. Use when ≥ 3 specialists have weighed in and their outputs need reconciliation before a PR. | +| **`adk-coordinator`** | Coordinates ADK tooling across crates. Use when orchestration crosses crate boundaries. | +| **`adk-behavior-monitor`** | Watches for drift between spec and implementation on multi-crate PRs. | +| **`truth-architect`** (elevated meta) | Mandatory reviewer on any proposal touching HHTL cascade, γ+φ, or claims-without-probes. | + +Meta-agents run on Opus. Specialists run on Sonnet for single-source +grindwork (draft-file-from-spec); Opus for multi-source accumulation. +Never Haiku. See `../../CLAUDE.md § Model Policy`. + +## Session-Start Knowledge Bootload (per agent) + +Every subagent spawned on this workspace loads these in order: + +1. **Tier-0 (MANDATORY, all agents):** + - `.claude/knowledge/LATEST_STATE.md` — current contract inventory, + what's shipped, what's queued, what's explicitly deferred. + - `.claude/knowledge/PR_ARC_INVENTORY.md` — per-PR Added / Locked / + Deferred / Docs / Confidence. APPEND-ONLY. + +2. **Tier-1 (domain-triggered):** load matching docs per the + Knowledge Activation table below. + +3. 
**Tier-2 (on-demand):** `docs/*.md` loaded when referenced. + +Bootload cost: ~3 Read calls per agent. Fixed-cost insurance against +hallucinating structure that already exists. + +## Knowledge Activation Protocol (updated 2026-04-19) + +Agents do not operate in a vacuum. When woken, an agent MUST read the knowledge +documents listed in its trigger row BEFORE producing output. + +| Trigger | Agent(s) woken | Knowledge loaded first | +|---|---|---| +| Any HHTL cascade proposal | truth-architect + family-codec-smith | bf16-hhtl-terrain.md | +| γ+φ regime discussion | truth-architect | bf16-hhtl-terrain.md | +| φ-spiral / Zeckendorf math | savant-research + truth-architect | zeckendorf-spiral-proof.md, phi-spiral-reconstruction.md | +| Bucketing vs resolution claim | truth-architect | bf16-hhtl-terrain.md | +| Slot D / Slot V layout | container-architect + truth-architect | bf16-hhtl-terrain.md | +| New unification proposal | truth-architect (mandatory review) | bf16-hhtl-terrain.md probe queue | +| Fibonacci mod p traversal | savant-research | savant-research.md (§ golden-step) | +| Which encoding to use? 
| truth-architect + family-codec-smith | two-basin-routing.md | +| BGZ scope / claims | truth-architect | two-basin-routing.md (§ BGZ must win) | +| Pairwise vs centroid | truth-architect | two-basin-routing.md (§ pairwise rule) | +| Semantic vs distribution | truth-architect | two-basin-routing.md (§ two-basin doctrine) | +| Attribution / blame | truth-architect | two-basin-routing.md (§ attribution discipline) | +| Audio test / carrier+residual | savant-research + truth-architect | two-basin-routing.md (§ audio sanity test) | +| Composing subsystems / integration | truth-architect | frankenstein-checklist.md | +| New abstraction / new struct | truth-architect | frankenstein-checklist.md (§ redundant abstractions) | +| Performance budget question | truth-architect | frankenstein-checklist.md (§ correctness-first) | +| **Grammar / DeepNSM / TEKAMOLO** | integration-lead + truth-architect | grammar-landscape.md, grammar-tiered-routing.md, linguistic-epiphanies-2026-04-19.md | +| **Crystal / CrystalFingerprint / sandwich** | container-architect + family-codec-smith | crystal-quantum-blueprints.md, cross-repo-harvest-2026-04-19.md | +| **NARS inference / thinking styles** | truth-architect + bus-compiler | linguistic-epiphanies-2026-04-19.md (Chomsky isomorphism), integration-plan-grammar-crystal-arigraph.md | +| **Coreference / Markov ±5** | truth-architect + resonance-cartographer | grammar-landscape.md §5 §6, integration-plan-grammar-crystal-arigraph.md | +| **Cross-linguistic bundling** | integration-lead | grammar-landscape.md §4 case inventories, linguistic-epiphanies-2026-04-19.md E22 Rosetta | +| **AriGraph / episodic memory** | ripple-architect + integration-lead | LATEST_STATE.md (§AriGraph Inventory), PR_ARC_INVENTORY.md #208 | +| **Story arc / ONNX emergence** | ripple-architect + contradiction-cartographer | crystal-quantum-blueprints.md (§Quantum mode), endgame-holographic-agi.md | +| **Argmax / codec / PolarQuant** | truth-architect + family-codec-smith 
| fractal-codec-argmax-regime.md (ORTHOGONAL to grammar work) | +| **Cross-repo harvest** | savant-research | cross-repo-harvest-2026-04-19.md, linguistic-epiphanies-2026-04-19.md | + +**The insight update cycle:** + +``` +1. Agent proposes a claim +2. truth-architect checks: is there a probe for this? Has it run? +3. If NOT RUN → probe is the next deliverable, not more synthesis +4. If RUN + PASS → update .claude/knowledge/ with FINDING (not CONJECTURE) +5. If RUN + FAIL → update .claude/knowledge/ with correction +6. Commit knowledge update: docs(knowledge): probe [ID] — [result] +``` + +This cycle is MANDATORY. + +## Handover Protocol (agent A → agent B) + +When an agent completes work that feeds another agent's scope, the +handover goes through a **handover blackboard entry** — not verbal +chat, not main-thread summary. + +### The handover entry (file under `.claude/handovers/`) + +Path: `.claude/handovers/YYYY-MM-DD-HHMM--to-.md` + +Required sections: + +```markdown +# Handover: +**Date:** YYYY-MM-DD HH:MM +**Scope:** one sentence — what's being handed off + +## What I did +Short factual list. Files touched, tests run, decisions made. +Include commit SHAs where applicable. + +## What I know (FINDING) +Claims I hold as measured / verified. Include the probe or test +that verifies each. + +## What I suspect (CONJECTURE) +Claims I hold as plausible-but-unverified. The receiving agent +should probe or escalate to truth-architect before building on these. + +## What I couldn't do +Blockers, out-of-scope items, ambiguities I punted on. Explicit. + +## What you need to know +Minimum context transfer. Point at knowledge docs + source files, +do not repeat their content. + +## Open questions +Questions the receiving agent must answer before proceeding. Each +bullet has an expected answer shape (yes/no, number, choice of N). +``` + +### Handover rules + +1. **Every multi-agent workflow writes at least one handover.** + Two-agent chain = 1 handover. 
N agents = N−1 handovers. +2. **Handovers are APPEND-ONLY.** Same governance as + `PR_ARC_INVENTORY.md` — don't rewrite a previous agent's + handover; append a follow-up if correction is needed. +3. **Receiving agent's FIRST action is to read the handover.** + Then load the knowledge docs it cites. Then start work. +4. **Handovers name their successor.** If agent B needs agent C + next, B's handover to C is its final deliverable. +5. **Main thread is the router, not a handover surface.** Main + orchestrates which agents run; agents write handovers that other + agents consume. Keeps main-thread context small. +6. **`truth-architect` review** is a valid handover target from any + agent before a PR lands. Default final link on HHTL / codec / + claims-without-probes work. + +## Orchestration + +Primary orchestration prompt: +- `.claude/agent2agent-orchestrator-prompt.md` + +Core knowledge (workspace-wide): +- `.claude/knowledge/LATEST_STATE.md` (Tier-0, mandatory) +- `.claude/knowledge/PR_ARC_INVENTORY.md` (Tier-0, mandatory) +- `.claude/knowledge/*.md` (Tier-1, trigger-activated — see table above) +- `docs/integrated-architecture-map.md` + +## Rule of use + +Only wake the agents whose objects are actually being touched. +A crowded room is not automatically a wiser room. diff --git a/.claude/agents/README.md b/.claude/agents/README.md index 7a7d0939..83f34b34 100644 --- a/.claude/agents/README.md +++ b/.claude/agents/README.md @@ -1,108 +1,214 @@ -# Agent Ensemble +# Agent Ensemble — Function Inventory + +> **Reference catalog.** Every agent on this workspace, grouped by +> concern, with one-paragraph function + scope + when-to-use. +> +> **Session-start spec** lives in `BOOT.md` (mandatory reads, +> Knowledge Activation triggers, Handover Protocol). Read BOOT.md +> when starting a session; read this file when deciding which +> specialist to wake for a specific task. + +Ensemble size: **19 specialists + 5 meta-agents**. Every card is at +`.claude/agents/.md`. 
Each card declares its own +`tools`, `model`, and `READ BY:` knowledge prerequisites. + +--- + +## Meta-Agents (orchestration, priming, review) + +Agents that coordinate other agents. All run on Opus. + +### `workspace-primer` +Onboarding agent for any new session that touches lance-graph, +ndarray, or related AdaWorldAPI crates. Distills ~20 canonical rules +and corrections from prior sessions so the next session starts +oriented instead of re-deriving them. +**Wake first** on any new session before proposing architecture. + +### `adk-coordinator` +Agent Development Kit style ensemble coordinator. Frames a problem +that needs multiple specialists, selects the **minimal** agent set, +requires object-level outputs, detects flattening between layers, +produces a consolidated verdict that names productive disagreements. +**Use when** a problem needs 3+ specialists and you need to avoid +noise / overlapping scopes / synthesis loops. + +### `adk-behavior-monitor` +Watches for behavioral anti-patterns during R&D sessions. Fires on: +premature commitment to untested projections, centroid-residual +framing on near-orthogonal data, "225/225 feels like success" +confirmation bias, new codec built when existing one hasn't been +measured, Python inference in a Rust-native pipeline, chained-score +multiplication without chain-collapse validation. **Flags and +redirects, does not block.** + +### `integration-lead` +Cross-session orchestration. Knows what's done, what's pending, what's +outdated across lance-graph and ndarray. Tracks dependencies and +phase gates. **Use when** planning work order, checking +prerequisites, or deciding which session to execute next. -This folder contains focused agent cards for `lance-graph`. +### `truth-architect` +Guards measurement-before-synthesis discipline and architectural +ground truth. Detects synthesis-to-measurement ratios worse than +1:0. Enforces probe-first protocol. Carries the BF16-HHTL correction +chain terrain. 
**Mandatory reviewer** when any proposal adds layers +without numbers, γ+φ placement is discussed, HHTL cascade is +touched, or any unification is proposed without a falsifying probe. + +--- -The goal is not to multiply personalities for decoration. -The goal is to keep unlike architectural concerns from being flattened into one fuzzy voice. +## Codec / Compression -## Existing agents +### `palette-engineer` +bgz17 crate: palette compression, Base17 encoding, 256×256 distance +matrices, compose tables, PaletteSemiring, PaletteMatrix, PaletteCsr, +SIMD batch distance (AVX-512 / AVX2 / scalar). +**Use for** work in `crates/bgz17/`, palette optimizations, HHTL +layer-1 operations. + +### `family-codec-smith` +Shapes HEEL / HIP / BRANCH / TWIG / LEAF representation choices, +role-aware codebooks, palettes, residuals, family purity benchmarks. +**Use when** deciding how NOT to blur unlike functions into one +codec too early. + +### `savant-research` +ZeckBF17 compression, golden-step traversal, octave encoding, +distance metric design for the codec-research crate. Covers +zeckbf17.rs, accumulator crystallization, Diamond Markov invariant, +HHTL integration, cross-crate alignment with production neighborhood. +**Use for** codec-research work and compression fidelity / rank +correlation claims. + +--- + +## Architecture / Layout ### `container-architect` -Protects the word-level container layout, field mapping, and container-vs-plane boundary. -Use for container schemas, reserved ranges, and read/write invariants. +256-word metadata container layout, word-by-word field mapping, +bgz17 annex at W112–125, local palette CSR at W96–111, +scent/palette neighbor indices at W176–191 (WIDE), cascade stride-16 +sampling, checksum coverage. Guards the boundary between container +(structured) and planes (flat). +**Use for** any work touching container fields, Lance schemas +(`columnar.rs`, `storage.rs`), or container read/write paths. 
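The word ranges quoted in the `container-architect` card can be pinned down as a small lookup. This is only a sketch, not code from `lance-graph`: `Region`, `CONTAINER_WORDS`, and `region_of` are hypothetical names, and it assumes the W-numbers are zero-based, inclusive word indices into the 256-word container.

```rust
/// Regions of the 256-word metadata container named in the card.
/// `Region` and `region_of` are hypothetical names used for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Region {
    LocalPaletteCsr,     // W96..=W111
    Bgz17Annex,          // W112..=W125
    NeighborIndicesWide, // W176..=W191 (WIDE)
    Unlisted,            // any other word inside the container
}

/// Total container size in words, as stated in the card.
const CONTAINER_WORDS: usize = 256;

/// Map a word index to its region.
/// Returns `None` when the index falls outside the container.
/// Assumption: W-numbers are zero-based and the ranges are inclusive.
fn region_of(word: usize) -> Option<Region> {
    if word >= CONTAINER_WORDS {
        return None;
    }
    Some(match word {
        96..=111 => Region::LocalPaletteCsr,
        112..=125 => Region::Bgz17Annex,
        176..=191 => Region::NeighborIndicesWide,
        _ => Region::Unlisted,
    })
}
```

A guard of this shape would turn an off-by-one between the palette CSR block and the bgz17 annex (W111 versus W112) into a failing assertion rather than a silent overlap.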
### `ripple-architect` -Protects ontology-level distinctions across anatomy, field, sweep, bus, and thought object. -Use for top-level architecture and anti-flattening decisions. +Ontology guardian for the ripple architecture. Keeps anatomy, field, +sweep, bus, and thought object distinct. **Use when** shaping +system-wide architecture, naming layers, proposing DTOs, or +preventing premature flattening of unlike functions. + +### `certification-officer` +Runs numerical certification of a derived format (lab BF16, Base17, +bgz-hhtl-d palette, compressed codebook) against a ground-truth +source file, reporting Pearson r, Spearman ρ, and Cronbach α to 4 +decimal places. Always reads real source bytes via `mmap`; samples +deterministically via SplitMix64 seed `0x9E3779B97F4A7C15`; scans for +NaN at every pipeline stage. **Refuses synthetic test inputs.** +**Use when** the task is "prove format X preserves properties of +format Y within target T." + +--- + +## Cognitive Structure ### `resonance-cartographer` -Protects superpositional field state before collapse. -Use for `ResonanceDto`, searchable field design, HHTL, CLAM, and sweep semantics. +Maps superpositional field state, candidate families, pressure, +contradiction, drift, and style mix **before** collapse. +**Use for** work on `ResonanceDto`, searchable fields, CLAM / HHTL +substrate, and sweep logic. ### `bus-compiler` -Protects explicit structured execution. -Use for `BusDto`, `CausalEdge64`, p64 style mapping, and CognitiveShader compilation. +Compiles explicit thought into accountable bus packets for `p64` +and CognitiveShader. **Use when** defining `BusDto`, mapping style +into bus knobs, or proving structure-first execution beats text- +first routing. ### `contradiction-cartographer` -Protects contradiction as first-class structure. -Use for BRANCH, signed residuals, contradiction masks, and conflict-aware revision. 
+Protects contradiction as first-class structure across family, +branch, bus, and thought-object layers. **Use when** designing +signed residuals, branch encodings, support-vs-contra pressure, or +conflict-aware revision. ### `thought-struct-scribe` -Protects durable thought objects and host-facing structured guidance. -Use for `ThoughtStruct`, provenance, revision, and host-model glove semantics. - -### `family-codec-smith` -Protects family-level representation choices. -Use for HEEL/HIP/BRANCH/TWIG/LEAF encodings, codebooks, palettes, and residual strategy. +Shapes durable thought objects, blackboard semantics, revision +rules, and host-model glove interfaces. **Use when** defining +`ThoughtStruct`, blackboard persistence, provenance, or +metacognitive observation. -### `host-glove-designer` -Protects the interface from explicit thought objects into the host model. -Use for prompt-side, side-channel, adapter-like, or hybrid glove designs. +--- -### `savant-research` -ZeckBF17 compression, golden-step traversal, octave encoding, distance -metric design. Carries the codec-research domain knowledge and Pareto frontier. +## Perspective / Interaction -### `truth-architect` -Guards measurement-before-synthesis discipline and architectural ground truth. -Detects synthesis spirals (elaborate proposals with 0 measurements). Carries -the hard-won BF16-HHTL correction chain (5 iterations, 4 corrections, 0 probes). -**Mandatory reviewer** for any proposal that adds layers, touches HHTL cascade -architecture, or claims γ+φ carries information. See Knowledge Activation below. - -## Knowledge Activation Protocol - -Agents do not operate in a vacuum. When woken, an agent MUST read the knowledge -documents listed in its header (`READ BY:` line) BEFORE producing output. 
- -**Mandatory knowledge activation triggers:** - -| Trigger | Agent(s) woken | Knowledge loaded first | -|---|---|---| -| Any HHTL cascade proposal | truth-architect + family-codec-smith | bf16-hhtl-terrain.md | -| γ+φ regime discussion | truth-architect | bf16-hhtl-terrain.md | -| φ-spiral / Zeckendorf math | savant-research + truth-architect | zeckendorf-spiral-proof.md, phi-spiral-reconstruction.md | -| Bucketing vs resolution claim | truth-architect | bf16-hhtl-terrain.md | -| Slot D / Slot V layout | container-architect + truth-architect | bf16-hhtl-terrain.md | -| New unification proposal | truth-architect (mandatory review) | bf16-hhtl-terrain.md probe queue | -| Fibonacci mod p traversal | savant-research | savant-research.md (§ golden-step) | -| Which encoding to use? | truth-architect + family-codec-smith | two-basin-routing.md | -| BGZ scope / claims | truth-architect | two-basin-routing.md (§ BGZ must win) | -| Pairwise vs centroid | truth-architect | two-basin-routing.md (§ pairwise rule) | -| Semantic vs distribution | truth-architect | two-basin-routing.md (§ two-basin doctrine) | -| Attribution / blame | truth-architect | two-basin-routing.md (§ attribution discipline) | -| Audio test / carrier+residual | savant-research + truth-architect | two-basin-routing.md (§ audio sanity test) | -| Composing subsystems / integration | truth-architect | frankenstein-checklist.md | -| New abstraction / new struct | truth-architect | frankenstein-checklist.md (§ redundant abstractions) | -| Performance budget question | truth-architect | frankenstein-checklist.md (§ correctness-first) | - -**The insight update cycle:** - -``` -1. Agent proposes a claim -2. truth-architect checks: is there a probe for this? Has it run? -3. If NOT RUN → probe is the next deliverable, not more synthesis -4. If RUN + PASS → update .claude/knowledge/ with FINDING (not CONJECTURE) -5. If RUN + FAIL → update .claude/knowledge/ with correction -6. 
Commit knowledge update: docs(knowledge): probe [ID] — [result] -``` - -This cycle is MANDATORY. Knowledge docs are living documents that track -the frontier between conjecture and measurement. An insight that hasn't -been probed is labeled CONJECTURE. An insight that has been probed and -passed is labeled FINDING. The distinction must be visible in the doc. - -## Orchestration - -Primary orchestration prompt: -- `.claude/agent2agent-orchestrator-prompt.md` - -Core knowledge: -- `.claude/knowledge.md` -- `.claude/ripple-project-readme.mde` -- `docs/integrated-architecture-map.md` - -## Rule of use - -Only wake the agents whose objects are actually being touched. -A crowded room is not automatically a wiser room. +### `host-glove-designer` +Designs the structured interface from explicit thought objects into +the host model. **Use for** prompt-side guidance, side-channel +payloads, adapter-like conditioning, and minimal glove experiments. + +### `perspective-weaver` +Models user, agent, topic, and angle as intertwined perspective +objects. **Use for** persona modeling, shared gestalt, angle masks, +perspective mismatch, and synthesis across multiple viewpoints. + +### `mirror-kernel-synthesist` +Explores semantic-kernel mediation, mirror-style perspective +modeling, hypothesis loops, and synthesis policies across +user-agent-topic frames. **Use when** designing compact mediation +layers between resonance, traversal, and response formation. + +--- + +## Memory / Trajectory + +### `trajectory-cartographer` +Models episodic memory, causal arcs, graph revision chains, and +spine trajectories over time. **Use for** AriGraph / episodic +integration, causal path persistence, stabilization, and +revision-aware memory design. + +--- + +## How to pick the right agent + +Decision flow: + +1. **Session just started?** → `workspace-primer` first. +2. **Problem needs 3+ specialists?** → `adk-coordinator` to frame. +3. 
**Proposal adds a layer or makes claims without measurement?** → + `truth-architect` is the mandatory reviewer; loop it in early. +4. **Otherwise** pick the specialist by what the work touches: + - Container words, Lance schemas → `container-architect`. + - HHTL / Base17 / palette distance → `palette-engineer`. + - Cascade family choice → `family-codec-smith`. + - Ontology drift / premature flattening → `ripple-architect`. + - Numerical certification → `certification-officer`. + - AriGraph / episodic → `trajectory-cartographer`. + - ResonanceDto / CLAM / sweep → `resonance-cartographer`. + - BusDto / p64 / CognitiveShader → `bus-compiler`. + - Contradiction handling → `contradiction-cartographer`. + - ThoughtStruct / blackboard persistence → `thought-struct-scribe`. + - Host-model glove / prompt-side → `host-glove-designer`. + - Persona / user / topic / angle → `perspective-weaver`. + - Kernel mediation / hypothesis loops → `mirror-kernel-synthesist`. + - codec-research / ZeckBF17 / golden-step → `savant-research`. + - Drift / anti-pattern check → `adk-behavior-monitor`. +5. **Before PR merge** on any HHTL / codec / claims work → + `truth-architect` review is the final link. + +The minimal agent set principle: **a crowded room is not +automatically a wiser room.** Only wake the agents whose objects +are actually being touched. + +## Cross-reference + +- **`BOOT.md`** (sibling) — session-start spec, Knowledge Activation + trigger table, Handover Protocol. +- **`../BOOT.md`** (top-level) — the one-page session entry point. +- **`../../CLAUDE.md` § Agent-to-Agent (A2A) Orchestration** — the + two-layer model (runtime A2A via `contract::a2a_blackboard` vs + session A2A via knowledge docs + handovers). +- **`../knowledge/LATEST_STATE.md`** — current contract inventory + (what every specialist's domain looks like right now). +- **`../knowledge/PR_ARC_INVENTORY.md`** — decision history per PR. 
diff --git a/.claude/hooks/post-compact.sh b/.claude/hooks/post-compact.sh new file mode 100755 index 00000000..02a601bc --- /dev/null +++ b/.claude/hooks/post-compact.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash +# PostCompact hook — re-inject critical state after context compaction +# drops the workspace bootload. Without this, the post-compact model +# turn rediscovers what existed before compaction at full cost. +# +# Wired from: .claude/settings.json +# Documented by: .claude/skills/cca2a/SKILL.md +set -euo pipefail + +cat <<'JSON' +{ + "hookSpecificOutput": { + "hookEventName": "PostCompact", + "additionalContext": "POST-COMPACT RE-INJECT (lance-graph, via cca2a pattern):\n\nCompaction just dropped prior context. Re-read mandatory state BEFORE answering the next turn:\n\n1. .claude/knowledge/LATEST_STATE.md — current contract inventory.\n2. .claude/knowledge/PR_ARC_INVENTORY.md — top 3 PR entries (reverse chronological).\n3. .claude/agents/BOOT.md — if this session was doing agent orchestration.\n4. .claude/handovers/ — if a chain was in progress, read the most recent handover file.\n\nDo NOT propose new types without re-grepping LATEST_STATE — compaction may have dropped your awareness that the type already exists.\n\nActive branch and current work-in-progress state should be discoverable via git status + the handover (if any). If no handover exists but work was clearly in progress, ask the user to clarify before resuming." + } +} +JSON diff --git a/.claude/hooks/session-start.sh b/.claude/hooks/session-start.sh new file mode 100755 index 00000000..f962f6d5 --- /dev/null +++ b/.claude/hooks/session-start.sh @@ -0,0 +1,17 @@ +#!/usr/bin/env bash +# SessionStart hook — inject workspace bootload context at turn 0. +# Emits JSON on stdout; Claude consumes hookSpecificOutput.additionalContext +# as a system reminder in the first model turn. 
+# +# Wired from: .claude/settings.json +# Documented by: .claude/skills/cca2a/SKILL.md +set -euo pipefail + +cat <<'JSON' +{ + "hookSpecificOutput": { + "hookEventName": "SessionStart", + "additionalContext": "WORKSPACE BOOTLOAD (lance-graph, via cca2a pattern):\n\n## Mandatory reads this session (in order)\n\n1. .claude/knowledge/LATEST_STATE.md — current contract inventory, recently shipped PRs, queued work, explicit deferrals. What exists.\n2. .claude/knowledge/PR_ARC_INVENTORY.md — APPEND-ONLY decision history per PR (Added / Locked / Deferred / Docs + mutable Confidence line). Why it exists.\n3. .claude/agents/BOOT.md — 19 specialist + 5 meta-agent ensemble, Knowledge Activation trigger table, Handover Protocol. How to coordinate.\n\nDo NOT propose any new type, module, or convention without grepping LATEST_STATE first.\n\n## Governance (never violate)\n\n- APPEND-ONLY: LATEST_STATE + PR_ARC_INVENTORY Edit prompts for approval; Write to append stays unprompted. Only the Confidence line per PR entry is updatable; corrections append as dated lines; reversals get their own PR entry.\n- Model policy: main thread Opus + deep thinking; subagent grindwork (single-source mechanical) → Sonnet; accumulation (multi-source synthesis) → Opus; NEVER Haiku.\n- GitHub reads: zipball to /tmp/sources/ + local grep for 3+ reads per external repo. MCP only for writes (PR, comments, reviews) and single-path reads.\n- Contract zero-dep invariant: lance-graph-contract has no external crate deps.\n- Read before Write: always Read a file before overwriting.\n- No JSON serialization in runtime types (serde is debug-only).\n\n## A2A two layers\n\n- Layer 1 (runtime): contract::a2a_blackboard. Cognitive-cycle bus.\n- Layer 2 (session): knowledge docs + .claude/handovers/*.md. 
Parallel subagent spawns in one main-thread turn is cheapest.\n\n## Pattern explained\n\n.claude/skills/cca2a/SKILL.md — read once to grok the pattern, skip re-deriving.\n\n## Full spec\n\nCLAUDE.md at workspace root." + } +} +JSON diff --git a/.claude/knowledge/EPIPHANIES.md b/.claude/knowledge/EPIPHANIES.md new file mode 100644 index 00000000..baf0e92d --- /dev/null +++ b/.claude/knowledge/EPIPHANIES.md @@ -0,0 +1,209 @@ +# Epiphanies — Append-Only Log (date-prefixed) + +> **APPEND-ONLY.** Every epiphany, realization, correction, or +> "aha" moment gets a dated entry here so nothing gets lost between +> sessions. Reverse chronological (newest first). Never delete an +> entry; correct via a new entry that cites the old one. +> +> **Format invariant:** every entry begins with a `## YYYY-MM-DD —` +> header. A CONJECTURE / FINDING / CORRECTION-OF label is optional +> but encouraged. Body is short: one paragraph + optional +> cross-reference. Long material goes in a dedicated knowledge +> doc; the epiphany here is the **pointer + one-line claim**. +> +> Mutable field: `**Status:**` line (FINDING / CONJECTURE / +> SUPERSEDED) is the only thing in an entry that can be updated. +> Everything else is immutable. + +--- + +## How to use + +**When a new insight surfaces** — stop, prepend an entry with today's +date at the top of the "Entries" section below. One paragraph. If +the full idea needs more room, create a dedicated knowledge doc +and reference it from the epiphany entry. + +**When an old epiphany is wrong** — prepend a new entry labeled +`CORRECTION-OF YYYY-MM-DD ` and update the old entry's +`**Status:**` line to `SUPERSEDED by <new-entry>`. Never edit the +old body. + +**When reading the log** — top N entries are the recent thinking; +deeper entries are the accumulated substrate. Everything is there. + +--- + +## Prior art (pre-existing epiphany collections — do not duplicate) + +These files already hold numbered epiphany sets from earlier work. 
+New epiphanies go in **this file** with date prefix; the files below +stay as historical references. + +| File | Contents | +|---|---| +| `linguistic-epiphanies-2026-04-19.md` | E13–E27 (Chomsky hierarchy, Σ10 Rubicon, sigma_rosetta, Markov living frame, resonanzsiebe, method grammar, 4D hashtag glyph, membrane, verbs as productions) | +| `cross-repo-harvest-2026-04-19.md` | H1–H14 (Born rule, phase-tag threshold, interference truth, Grammar Triangle ≡ ContextCrystal(w=1), NSM ≡ SPO axes, FP_WORDS=160, Mexican-hat, Int4State, Glyph5B, Crystal4K, teleport F=1, 144-verb, Three Mountains) | +| `integration-plan-grammar-crystal-arigraph.md` | E1–E12 (grammar-tiered, morphology-easier, FailureTicket, cross-lingual superposition, Markov ±5, NARS-about-grammar, crystal hierarchy, sandwich, 5D quorum, episodic unbundle, AriGraph substrate, demo matrix) | +| `session-capstone-2026-04-18.md` | 8 epiphanies from 2026-04-18 session (four-pillar inheritance, CMYK/RGB qualia, vocabulary IS semantics, WorldMapRenderer, Σ hierarchy maps to crate boundaries, proprioception as ontological self-recognition, BindSpace+cycle_fingerprint as latent episodic, two-frame DTO) | +| `crystal-quantum-blueprints.md` | Crystal mode vs Quantum mode split (bundled Markov SPO chain vs holographic residual) | +| `endgame-holographic-agi.md` | 5-layer stack, 12-step holographic memory loop, three-demo matrix | +| `fractal-codec-argmax-regime.md` | Orthogonal research thread — MFDFA on Hadamard-rotated coefficients as fractal-descriptor leaf | + +## Governance + +- **APPEND-ONLY.** Immutable body per entry. +- **Mutable:** `**Status:**` line only (FINDING / CONJECTURE / + SUPERSEDED by <date-title>). +- **Corrections APPEND as new dated entries.** The old entry's + Status changes to SUPERSEDED. +- **`permissions.ask` on Edit** (same rule as `PR_ARC_INVENTORY.md` + / `LATEST_STATE.md` — rewriting history prompts for approval; + Write for append stays unprompted). 
+ +--- + +## Entries (reverse chronological) + +## 2026-04-19 — Mandatory epiphanies log (this file) + +**Status:** FINDING + +Every epiphany from prior sessions lived in a separate doc (E1–E12 +here, H1–H14 there, E13–E27 somewhere else). No single place to +append a new one. This file is the unified target going forward. +Old files stay as historical substrate; new insights land here with +date prefix. Cross-reference: `BOOT.md`, `CLAUDE.md`, `cca2a/ +concepts.md` — all four bookkeeping files now plus this one. + +## 2026-04-19 — Cold-start tax is solvable with three mandatory reads + +**Status:** FINDING + +A new session on a non-trivial workspace burns 20–30 turns rediscovering +what's shipped. Three files (`LATEST_STATE.md`, `PR_ARC_INVENTORY.md`, +`.claude/agents/BOOT.md`) + SessionStart hook close the gap to +3–5 turns. Proven by PR #211. Savings per cold-start: ~$15–35 of +Opus. See `.claude/skills/cca2a/SKILL.md` for the full pattern. + +## 2026-04-19 — 10,000-D f32 VSA is lossless under linear sum + +**Status:** FINDING + +Earlier framing of "Vsa10kF32 is wire-only passthrough" was wrong. +10K × 32 = 320 K bits of capacity ≫ any single signal; orthogonal +role keys give exact unbundle. **10K f32 is native storage**, not +passthrough. lancedb famously supports 10K-D VSA natively. Cross-ref: +PR #209 refactor. + +## 2026-04-19 — Signed 5^5 bipolar is lossless; unsigned / bitpacked is lossy + +**Status:** FINDING + +Negative cancellation on bipolar cells is VSA-native; opposing cells +at the same sandwich dim cancel on bundling. Unsigned 5^5 saturates +under accumulation (lossy). Binary bitpacked commits to 0/1 via +majority vote (lossy). CAM-PQ projection is distance-preserving +(lossless cross-form). Cross-ref: PR #209 sandwich layout.
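The cancellation claim in the signed-bipolar entry above can be illustrated with a minimal sketch. It assumes bundling is an element-wise sum. The helper names (`bundle_signed`, `unbundle_signed`, `bundle_unsigned_saturating`) and the saturation cap are hypothetical and are not taken from the lance-graph crates.

```rust
/// Bundle two signed cell vectors by element-wise sum (assumed bundling rule).
fn bundle_signed(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter().zip(b.iter()).map(|(x, y)| x + y).collect()
}

/// Recover the other component by subtracting a known one from the bundle.
fn unbundle_signed(bundle: &[i32], known: &[i32]) -> Vec<i32> {
    bundle.iter().zip(known.iter()).map(|(s, k)| s - k).collect()
}

/// Unsigned variant: cells saturate at `cap` (a hypothetical ceiling),
/// so anything accumulated beyond the cap is lost.
fn bundle_unsigned_saturating(a: &[u8], b: &[u8], cap: u8) -> Vec<u8> {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| x.saturating_add(*y).min(cap))
        .collect()
}
```

In this sketch, opposing cells (`+1` and `-1`) sum to `0`, and subtracting a known component from the signed bundle returns the other component exactly. With the saturating variant, different input pairs can collapse to the same capped output, so the inputs can no longer be told apart.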
+ +## 2026-04-19 — VSA convention is `[start:stop]` contiguous slices, not scattered bits + +**Status:** FINDING + +Role keys own disjoint contiguous slices of the 10K VSA space — +SUBJECT=[0..2000), PREDICATE=[2000..4000), etc. Binding into one +slice does not contaminate another. Scattered-bit role encoding +(early draft) was the wrong pattern. Cross-ref: PR #210 D6 +role_keys.rs. + +## 2026-04-19 — Finnish object marking is Nominative/Genitive/Partitive, NOT Accusative + +**Status:** FINDING (CORRECTION-OF an earlier Latinate transplant) + +Prior draft wrote Finnish "Accusative `-n/-t` → Object" which is +a Latinate transplant. Finnish object marking actually uses: +Nominative (plural), Genitive `-n` (total singular), Partitive +`-a/-ä` (partial / negated). True Accusative is only for personal +pronouns (`minut`, `sinut`, `hänet`, `meidät`, `teidät`, `heidät`). +Each language gets its native case terminology. +Cross-ref: `grammar-landscape.md` §4.1. + +## 2026-04-19 — Morphology-rich languages are easier, not harder + +**Status:** FINDING + +Finnish 15 cases → 98%+ local coverage. English (word order only) → +85% (WORST case). Case endings directly encode TEKAMOLO slots; +morphology commits grammatical role at the morpheme level, +eliminating the inference English needs. Cross-ref: +`grammar-tiered-routing.md` §Morphology Coverage Table. + +## 2026-04-19 — Markov ±5 is the context upgrade to NARS+SPO 2³+TEKAMOLO + +**Status:** FINDING + +Pre-Markov reasoning unit = sentence. Post-Markov = trajectory. +NARS doesn't reason about "this sentence"; it reasons about "this +sentence in this flow." The context dimension is the whole point. +Cross-ref: `integration-plan-grammar-crystal-arigraph.md` E5. + +## 2026-04-19 — Grammar Triangle IS ContextCrystal at window=1 + +**Status:** FINDING + +Two parallel architectures turn out to be the same thing at +different window sizes. 
Triangle emits `Structured5x5` with S/O +collapsed + only t=2 populated; ContextCrystal populates all 5 +axes. Unification. Cross-ref: +`cross-repo-harvest-2026-04-19.md` H4, +`ladybug-rs/docs/GRAMMAR_VS_CRYSTAL.md`. + +## 2026-04-19 — NSM primes map directly to SPO + Qualia + Temporal axes + +**Status:** FINDING + +The 65 Wierzbicka primes aren't orthogonal to SPO — they ARE an +SPO encoding. I/YOU/SOMEONE → Subject; THINK/WANT/FEEL → +Predicate; SOMETHING/BODY → Object; GOOD/BAD → Qualia.valence; +BEFORE/AFTER → Temporal; BECAUSE/IF → Causality via Markov flow. +DeepNSM + Structured5x5 already speak NSM's vocabulary. +Cross-ref: `cross-repo-harvest-2026-04-19.md` H5. + +## 2026-04-19 — Chomsky hierarchy isomorphism with Pearl rungs and Σ tiers + +**Status:** FINDING + +Type-3 Regular = Pearl rung 1 = Σ1–Σ2 = DeepNSM FSM (LLM token +prediction lives here). Type-2 CF = rung 2 = Σ3–Σ5 = SPO 2³. Type-1 +CS = rung 3–4 = Σ6–Σ8 = Markov ±5 + coref + counterfactual. Type-0 +TM = rung 5 = Σ9–Σ10 = LLM escalation only. The 90–99% local / +1–10% LLM split is the Chomsky-hierarchy boundary between +context-sensitive-decidable and Turing-complete-undecidable. The +split is mathematically principled, not arbitrary. +Cross-ref: `linguistic-epiphanies-2026-04-19.md` E13, E26. + +## 2026-04-19 — Grindwork vs accumulation is the subagent model split + +**Status:** FINDING + +Grindwork (single-source mechanical: write-file-from-spec, grep, +list paths) → Sonnet. Accumulation (multi-source synthesis: +harvest across repos, combine N docs, trace architecture) → Opus. +Cheaper tiers produce shallow outputs under accumulation; quality +drop is visible. Never Haiku. +Cross-ref: `CLAUDE.md §Model Policy`. + +## 2026-04-19 — Zipball-for-reads is ~20× cheaper than MCP-per-file + +**Status:** FINDING + +`mcp__github__get_file_contents` drops the full file into context +and recharges on every subsequent turn. 
Zipball to `/tmp/sources/` ++ local grep lands only the grep output (typically 2–10 KB) vs +50 KB per file per turn. 95% savings on cross-repo harvest turns. +MCP stays for writes (PR creation, comments). +Cross-ref: `CLAUDE.md §GitHub Access Policy`. + +--- + +(append new epiphanies above this marker; format: `## YYYY-MM-DD — <title>`) diff --git a/.claude/knowledge/IDEAS.md b/.claude/knowledge/IDEAS.md new file mode 100644 index 00000000..2774ee81 --- /dev/null +++ b/.claude/knowledge/IDEAS.md @@ -0,0 +1,173 @@ +# Ideas Log — Open + Implemented + Integration (triple-entry, append-only) + +> **Append-only ledger** for every architectural idea, speculative +> design, "what if we tried X" moment. Ideas accumulate here +> whether or not they're ready to ship. When one gets implemented, +> it moves from Open → Implemented → Integration (a row linking +> the idea to the plan entry that scheduled it + the PR that +> shipped it). +> +> **Purpose:** a speculation has nowhere else to live until it's +> scoped into a plan. This file is the speculation surface. Ideas +> die or graduate here; nothing is lost. + +--- + +## Triple-entry discipline + +Every idea moves through three ledger sections in this file: + +1. **Open Ideas** — speculative; captured when proposed. +2. **Implemented Ideas** — idea became real; row appended with PR + anchor + integration-plan D-id reference. +3. **Integration Plan Update Log** — the paired "what the plan + changed when this idea landed" row, citing the specific + `INTEGRATION_PLANS.md` version bump or `STATUS_BOARD.md` row + flip triggered by the idea. + +The row in Open is NEVER moved; its Status flips. The Implemented +row is a NEW append that cites the Open anchor. The Integration +row is a THIRD append that cites both. + +This is **triple-entry bookkeeping** — three sections, same idea, +cross-linked. The cost is a bit more writing; the benefit is that +every shipped idea has an audit trail from speculation → code → +plan consequence. 
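The triple-entry discipline above can be sketched as an in-memory ledger. This is an illustration under stated assumptions, not workspace code: the type names (`IdeasLedger`, `OpenRow`, `ImplementedRow`, `IntegrationRow`) are hypothetical, and a `Vec` index stands in for the dated-title anchor that the Markdown file uses.

```rust
/// Status of an Open row; the only field that changes after capture.
#[derive(Debug, Clone, PartialEq)]
enum OpenStatus {
    Open,
    Implemented { date: String },
}

/// Row in the Open section. Never moved or deleted.
#[derive(Debug)]
struct OpenRow {
    date: String,
    title: String,
    status: OpenStatus,
}

/// Row in the Implemented section; cites the Open row by anchor.
#[derive(Debug)]
struct ImplementedRow {
    open_anchor: usize,
    pr: u32,
}

/// Row in the Integration Plan Update Log; cites both earlier rows.
#[derive(Debug)]
struct IntegrationRow {
    open_anchor: usize,
    implemented_anchor: usize,
    plan_change: String,
}

#[derive(Debug, Default)]
struct IdeasLedger {
    open: Vec<OpenRow>,
    implemented: Vec<ImplementedRow>,
    integration: Vec<IntegrationRow>,
}

impl IdeasLedger {
    /// Capture a new idea and return its anchor in the Open section.
    fn propose(&mut self, date: &str, title: &str) -> usize {
        self.open.push(OpenRow {
            date: date.to_string(),
            title: title.to_string(),
            status: OpenStatus::Open,
        });
        self.open.len() - 1
    }

    /// Ship an idea: flip the Open status in place, then append the two
    /// paired rows. Returns `None` if the anchor is unknown or not Open.
    fn ship(
        &mut self,
        open_anchor: usize,
        date: &str,
        pr: u32,
        plan_change: &str,
    ) -> Option<(usize, usize)> {
        let row = self.open.get_mut(open_anchor)?;
        if row.status != OpenStatus::Open {
            return None;
        }
        row.status = OpenStatus::Implemented { date: date.to_string() };
        self.implemented.push(ImplementedRow { open_anchor, pr });
        let implemented_anchor = self.implemented.len() - 1;
        self.integration.push(IntegrationRow {
            open_anchor,
            implemented_anchor,
            plan_change: plan_change.to_string(),
        });
        Some((implemented_anchor, self.integration.len() - 1))
    }
}
```

The design point the sketch tries to capture is that `ship` never removes or relocates the Open row. It changes one field and appends two new rows that point back at it, which is what keeps the trail from speculation to plan consequence intact.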
+ +--- + +## Rejected / Deferred + +Ideas that don't graduate go into a fourth section: + +4. **Rejected / Deferred Ideas** — with `**Rationale:**` and cross- + ref to the original Open entry. The Open row's Status flips to + `Rejected YYYY-MM-DD` or `Deferred to <when>`. + +Deferred ideas can later reactivate — append a new Open entry +citing the Deferred one; Deferred row's Status flips to `Reactivated +YYYY-MM-DD <new-entry>`. + +--- + +## Governance + +- **Append-only.** Never delete a row. +- **Mutable fields:** `**Status:**` line, `**Rationale:**` line (if + added later with more context). +- **`permissions.ask` on Edit** (same rule as other bookkeeping + files). Write for appends stays unprompted. + +## Cross-references + +- `EPIPHANIES.md` — if an idea came from an epiphany, both entries + cross-reference each other. +- `INTEGRATION_PLANS.md` — the plan version that incorporated the + idea. +- `STATUS_BOARD.md` — the D-id status row that reflects the idea's + shipping status. +- `PR_ARC_INVENTORY.md` — the PR that landed the code. +- `ISSUES.md` — if implementing an idea surfaced a bug, both rows + link. + +--- + +## Kanban Format (priority + scope on every entry) + +Every idea carries: +- **Priority** — `P0` must-ship-this-phase / `P1` next-phase / `P2` + eventual / `P3` speculative. +- **Scope** — which agent / deliverable / domain: `@<agent-name>`, + `D<N>` (plan D-id), `domain:<grammar|codec|arigraph|infra|...>`. + +Ticket tag on each entry: `[P2 @family-codec-smith D7 domain:grammar]`. +Agents filter by `@`-mention or domain to see what's theirs. + +## Open Ideas + +(Prepend new ideas here with today's date. 
Format:) + +``` +## YYYY-MM-DD — <short title> +**Status:** Open +**Priority:** P0 | P1 | P2 | P3 +**Scope:** @<agent> D<N> domain:<tag> + +<one paragraph: what the idea is, rough scope, why it matters> + +Cross-ref: <epiphany entry / plan D-id / related knowledge doc> +``` + +--- + +## Implemented Ideas + +(When an Open idea ships, APPEND here with same title + PR anchor.) + +``` +## YYYY-MM-DD — <same title as Open entry> (from YYYY-MM-DD) +**Status:** Implemented YYYY-MM-DD via PR #NNN +**Shipped as:** D<N> in integration plan v<K> +**PR:** #NNN (commit SHA) + +<verbatim original Open paragraph> + +Cross-ref: <same + PR link + plan D-id> +``` + +The original Open entry's Status flips to `Implemented YYYY-MM-DD`. + +--- + +## Integration Plan Update Log + +(When an idea triggers a plan change — version bump, D-id status +move, new deliverable — APPEND here. This is the third-entry row.) + +``` +## YYYY-MM-DD — Plan consequence of <idea title> (from YYYY-MM-DD) +**Trigger idea:** <idea title> (YYYY-MM-DD) +**Plan change:** <version bump / D-id flip / deliverable added> +**Plan entry:** `INTEGRATION_PLANS.md` v<K> entry or new v<K+1> entry +**Status board update:** <D-id> → <new Status> + +<one paragraph: what the plan documented differently after this idea> +``` + +--- + +## Rejected / Deferred Ideas + +(Ideas that don't graduate go here.) + +``` +## YYYY-MM-DD — <same title as Open entry> (from YYYY-MM-DD) +**Status:** Rejected YYYY-MM-DD | Deferred to <when / trigger> +**Rationale:** <short explanation> + +<original Open paragraph> + +Cross-ref: <original + any related> +``` + +--- + +## How to use this file + +**When a new architectural idea surfaces** — prepend to **Open +Ideas** with today's date. One paragraph. If it needs more, create +a knowledge doc and link. + +**When an Open idea ships** — APPEND to **Implemented Ideas**; flip +Open Status to `Implemented YYYY-MM-DD`. Then APPEND to +**Integration Plan Update Log** with the plan consequence. 
+ +**When an Open idea is rejected** — APPEND to **Rejected / +Deferred Ideas** with Rationale; flip Open Status. + +**When a deferred idea reactivates** — prepend a NEW Open entry +citing the deferred one; flip the deferred entry's Status to +`Reactivated YYYY-MM-DD <new-entry>`. + +Nothing is lost. Every idea has a trail from speculation to +disposition. diff --git a/.claude/knowledge/INTEGRATION_PLANS.md b/.claude/knowledge/INTEGRATION_PLANS.md new file mode 100644 index 00000000..14d56300 --- /dev/null +++ b/.claude/knowledge/INTEGRATION_PLANS.md @@ -0,0 +1,85 @@ +# Integration Plans — Versioned Index + +> **APPEND-ONLY.** Every integration plan ever authored for this +> workspace, versioned, with status. New plans append to the top +> as new entries. Superseded plans stay — they are the design arc. +> +> Governance: same rule as `PR_ARC_INVENTORY.md`. The **Status** +> field is the only mutable field per entry. Supersedure is marked +> by adding a new top entry that references the prior version; the +> prior entry is NOT deleted. + +--- + +## APPEND-ONLY RULE + +1. **New plans PREPEND** a new section at the top. +2. **Old plan entries are IMMUTABLE** except the **Status** line. +3. **Supersedure:** if plan vN is replaced by vN+1, prepend a new + entry for vN+1 that cites vN; update vN's **Status** to + "Superseded by vN+1". +4. **Corrections** to plan scope during its lifetime: append a + `**Correction (YYYY-MM-DD):**` line to the entry; do not edit + the original scope line. +5. **Retire but never delete.** When a plan is complete or + abandoned, update Status and move on. The entry stays. 
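The prepend-and-supersede rules above can be sketched with a double-ended queue whose front is the newest plan. The names (`PlanIndex`, `PlanEntry`, `PlanStatus`) are hypothetical, and only two of the listed Status values are modeled, so treat this as an illustration of the rule rather than tooling that exists in the workspace.

```rust
use std::collections::VecDeque;

/// Subset of the Status values; enough to show supersedure.
#[derive(Debug, Clone, PartialEq)]
enum PlanStatus {
    Active,
    SupersededBy(u32),
}

#[derive(Debug)]
struct PlanEntry {
    version: u32,
    /// Immutable after creation.
    scope: String,
    /// The one field that may change later.
    status: PlanStatus,
    /// Version this entry cites as its predecessor, if any.
    supersedes: Option<u32>,
}

#[derive(Debug, Default)]
struct PlanIndex {
    /// Front of the queue is the newest entry.
    entries: VecDeque<PlanEntry>,
}

impl PlanIndex {
    /// Prepend a new plan version. If the current top entry is Active,
    /// flip only its status; the entry itself stays in the index.
    fn prepend(&mut self, scope: &str) -> u32 {
        let prior = self.entries.front().map(|e| e.version);
        let version = prior.map_or(1, |v| v + 1);
        if let Some(top) = self.entries.front_mut() {
            if top.status == PlanStatus::Active {
                top.status = PlanStatus::SupersededBy(version);
            }
        }
        self.entries.push_front(PlanEntry {
            version,
            scope: scope.to_string(),
            status: PlanStatus::Active,
            supersedes: prior,
        });
        version
    }
}
```

Nothing in `prepend` removes an entry or touches a prior entry's scope, which mirrors the "retire but never delete" rule.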
+ +**Per-entry format:** + +- **Plan name + version** +- **Author + date** +- **Scope** — one-sentence goal (immutable) +- **Path** — workspace file location +- **Deliverables** — D-id list (immutable) +- **Status** — **mutable**: Active / Shipped / Superseded / Deferred / Abandoned +- **Confidence** — **mutable**: Working / Partial / Broken — see PR #N + +--- + +## v1 — Elegant Herding Rocket (authored 2026-04-19) + +**Author:** main-thread session 2026-04-19 +**Scope:** DeepNSM as full parser via Grammar Triangle wiring + Markov ±5 SPO+TEKAMOLO bundling + NARS-tested grammar thinking styles + coreference resolution + story-context bridge + ONNX arc emergence. +**Path:** `.claude/plans/elegant-herding-rocket-v1.md` (2,085 lines) +**Deliverables:** D0 landscape doc, D2 FailureTicket emission, D3 Triangle bridge, D4 ContextChain reasoning, D5 Markov ±5 bundler, D6 role keys, D7 grammar thinking styles, D8 story context + contradictions, D9 ONNX arc export, D10 Animal Farm validation harness, D11 bundle-perturb emergence. + +**Status (2026-04-19):** Active. Phase 1 (D0 + D4 + D6) shipped in PR #210. + +**Confidence (2026-04-19):** Phase 1 working (125 tests passing). +Phases 2–4 queued. + +**Phases:** + +- **Phase 1 — SHIPPED** (PR #210, merged): D0 landscape doc + D4 + ContextChain reasoning ops + D6 role keys. 125 tests passing. +- **Phase 2 — QUEUED:** D2 FailureTicket emission + D3 Triangle + bridge + D5 Markov bundler + D7 grammar thinking styles. + Estimate ~930 LOC, one PR. +- **Phase 3 — QUEUED:** D8 story-context/contradictions + D10 + Animal Farm validation harness. +- **Phase 4 — FUTURE:** D9 ONNX arc export + D11 bundle-perturb + emergence interface. + +--- + +## How to use this file + +1. **Starting a new session:** check top entry. If Status is Active, + that's the current plan. Read it at + `.claude/plans/<plan-file>.md`. +2. **Proposing a new plan:** prepend a new v entry; move prior + plan's Status to Superseded. +3. 
**Tracking deliverable progress:** use + `.claude/knowledge/STATUS_BOARD.md` for the cross-deliverable + view (which D-ids are in which phase / PR). +4. **User requests / open threads** that aren't yet a plan: capture + in `.claude/knowledge/OPEN_PROMPTS.md`. + +## Cross-references + +- **`STATUS_BOARD.md`** — deliverable-level status (D0 / D2 / D3 / … + across all plans). +- **`OPEN_PROMPTS.md`** — outstanding user questions / threads that + aren't yet scoped into a plan. +- **`PR_ARC_INVENTORY.md`** — shipped-PR decision history. +- **`LATEST_STATE.md`** — current-state snapshot. diff --git a/.claude/knowledge/ISSUES.md b/.claude/knowledge/ISSUES.md new file mode 100644 index 00000000..86d2c405 --- /dev/null +++ b/.claude/knowledge/ISSUES.md @@ -0,0 +1,125 @@ +# Issues Log — Open + Resolved (double-entry, append-only) + +> **Append-only ledger.** Every issue (bug, regression, invariant +> violation, blocker) gets a dated entry here. Entries move from +> Open → Resolved by status-flip; they are NEVER deleted. +> +> **Format invariant:** every entry starts with `## YYYY-MM-DD — ` +> followed by a short title. Body is short — one paragraph of +> problem + cross-references. Full repro / fix / test details go +> in the PR or in a dedicated doc and are LINKED, not duplicated. +> +> **Mutable field:** `**Status:**` line only (Open / Resolved / +> Wontfix / Superseded). Resolved entries keep a `**Resolution:**` +> line pointing at the PR + commit SHA that fixed them. + +--- + +## Double-entry discipline + +Every issue has TWO corresponding rows, both in this file: +1. **Open section** — issue captured when first seen. +2. **Resolved section** — same entry, appended when closed, with + `**Resolution:**` line pointing at fix. + +The resolved entry cites the open entry's date as anchor. Old +"Open" entry's **Status:** flips to `Resolved YYYY-MM-DD` — it +stays in the Open section (never moved) so chronology is +preserved. 
The Resolved section accumulates fixes for discovery. + +This is **bookkeeping discipline**, not a storage optimization: +- Open section = what broke and when. +- Resolved section = how and when it was fixed. +- Both sections keep the same row forever; the view depends on + which section you're reading. + +--- + +## Governance + +- **Append-only.** Never delete a row from either section. +- **Mutable:** `**Status:**` and `**Resolution:**` fields only. +- **`permissions.ask` on Edit** (same rule as PR_ARC_INVENTORY). + Write for appends stays unprompted. +- **Supersedure:** if an issue turns out to be a duplicate of an + older one, Status → `Superseded by YYYY-MM-DD <title>`; old entry + stays. + +## Cross-references + +- `PR_ARC_INVENTORY.md` — which PR shipped the fix. +- `STATUS_BOARD.md` — deliverable-level view (an issue may block + one or more D-ids). +- `EPIPHANIES.md` — if debugging surfaced an architectural + insight, that lands in Epiphanies; this file tracks the concrete + fix. +- `TECH_DEBT.md` — if an issue is knowingly deferred rather than + fixed, it moves (via cross-ref) into technical debt. + +--- + +## Kanban Format (priority + scope on every entry) + +Every issue carries: +- **Priority** — `P0` blocker / `P1` high / `P2` medium / `P3` low. +- **Scope** — which agent / deliverable / domain owns it. One or + more of: `@<agent-name>`, `D<N>` (plan D-id), + `domain:<grammar|codec|infra|arigraph|...>`. + +Together they form the ticket tag: `[P1 @truth-architect D5 domain:grammar]`. +Agents filter by their own `@`-mention or their domain; nothing +gets buried. + +## Open Issues + +(No tracked open issues at initial commit. New issues PREPEND here +in reverse chronological order. Format below.) 
+ +``` +## YYYY-MM-DD — <short title> +**Status:** Open +**Priority:** P0 | P1 | P2 | P3 +**Scope:** @<agent> D<N> domain:<tag> + +<one paragraph: what's broken, where it surfaces, rough impact> + +Cross-ref: <file:line or PR # or knowledge doc> +``` + +--- + +## Resolved Issues + +(No resolved issues at initial commit. When an Open issue is fixed, +APPEND a copy here with the same date anchor + `**Resolution:**` +line. Old Open entry's Status flips to `Resolved YYYY-MM-DD`. Old +entry stays in the Open section for chronology.) + +``` +## YYYY-MM-DD — <same title as Open entry> +**Status:** Resolved YYYY-MM-DD +**Resolution:** PR #NNN (commit SHA) — <one-line description> + +<original problem paragraph, verbatim> + +Cross-ref: <same as Open entry> +``` + +--- + +## How to use this file + +**When an issue is found** — prepend to **Open Issues** section with +today's date + `**Status:** Open` + one-paragraph description. + +**When an issue is fixed** — append to **Resolved Issues** section +with the same title and date anchor + `**Status:** Resolved +YYYY-MM-DD` + `**Resolution:** PR #NNN`. Don't edit the Open entry +body; just flip its Status to `Resolved YYYY-MM-DD`. + +**When an issue is a duplicate** — append a new entry in Resolved +section noting `**Resolution:** duplicate of YYYY-MM-DD <title>`; +flip Open entry to Superseded. + +**When an issue is deferred knowingly** — leave it Open here but +also append a row to `TECH_DEBT.md` with cross-ref back. diff --git a/.claude/knowledge/LATEST_STATE.md b/.claude/knowledge/LATEST_STATE.md new file mode 100644 index 00000000..5baac10d --- /dev/null +++ b/.claude/knowledge/LATEST_STATE.md @@ -0,0 +1,100 @@ +# LATEST_STATE — What Just Shipped (read this FIRST) + +> **Auto-injected at session start via SessionStart hook.** +> Updated after every merged PR. +> **Last updated:** 2026-04-19 post PR #210. 
+> +> Purpose: prevent new sessions from hallucinating structure that +> already exists or proposing features already shipped. Read this +> BEFORE proposing any grammar/crystal/contract changes. + +--- + +## Recently Shipped PRs (reverse chronological) + +| PR | Merged | Title | What it added | +|---|---|---|---| +| **#210** | 2026-04-19 | Phase 1 grammar + knowledge docs | ContextChain reasoning ops, role_keys slice catalogue, 3 knowledge docs (grammar-landscape, linguistic-epiphanies E13-E27, fractal-codec) | +| **#209** | 2026-04-19 | sandwich layout + bipolar cells | Crystal fingerprint sandwich, VSA_permute reference, lossless bundling corrections | +| **#208** | 2026-04-19 | grammar + crystal + AriGraph unbundle | Contract grammar/ + crystal/ modules, AriGraph episodic unbundle hooks with SIMD dispatch | +| **#206** | 2026-04-18 | state classification pillars | qualia.rs (17D), proprioception.rs (7 anchors), world_map.rs, sigma_rosetta 64 glyphs + 144 verbs, Pumpkin NPC example | +| **#205** | 2026-04-18 | engine bridge + CMYK/RGB qualia | engine_bridge.rs, 12-style unified mapping, 17D vs 18D qualia distinction | +| **#204** | 2026-04-18 | cognitive-shader-driver | New crate, ShaderDispatch/Resonance/Bus/Crystal DTOs, BindSpace struct-of-arrays, full ladybug-rs import | + +## Current Contract Inventory (lance-graph-contract) + +Types that EXIST — do NOT re-propose them: + +**`grammar/`**: `FailureTicket`, `PartialParse`, `CausalAmbiguity`, `TekamoloSlots`, `TekamoloSlot`, `WechselAmbiguity`, `WechselRole`, `FinnishCase`, `finnish_case_for_suffix`, `NarsInference`, `inference_to_style_cluster`, `ContextChain` (with coherence_at / total_coherence / replay_with_alternative / disambiguate / DisambiguationResult / WeightingKernel), `RoleKey` + 47 `LazyLock<RoleKey>` instances + `Tense` enum + `finnish_case_key / tense_key / nars_inference_key` lookups. 
+ +**`crystal/`**: `Crystal` trait, `CrystalKind`, `TruthValue`, `UNBUNDLE_HARDNESS_THRESHOLD = 0.8`, `CrystalFingerprint` (Binary16K / Structured5x5 / Vsa10kI8 / Vsa10kF32), `Structured5x5`, `Quorum5D`, `SentenceCrystal`, `ContextCrystal`, `DocumentCrystal`, `CycleCrystal`, `SessionCrystal`, sandwich layout constants. + +**`cognitive_shader`**: `ShaderDispatch`, `ShaderResonance`, `ShaderBus`, `ShaderCrystal`, `MetaWord` (u32 packed), `MetaFilter`, `ColumnWindow`, `StyleSelector`, `RungLevel`, `EmitMode`, `ShaderSink` trait, `CognitiveShaderDriver` trait. + +**`proprioception`**: 7 `StateAnchor` (Intake/Focused/Rest/Flow/Observer/Balanced/Baseline), 11-D `ProprioceptionAxes`, `StateClassifier` trait, `DefaultClassifier`, `hydrate()` softmax-weighted blend. + +**`qualia`**: 17-D `QualiaVector`, `qualia_to_state` projection (17→11). + +**`world_map`**: `WorldMapDto`, `WorldMapRenderer` trait, `DefaultRenderer`. + +**`world_model`**: `WorldModelDto` with `qualia`, `axes`, `proprioception`, `cycle_fingerprint`, `timestamp`, `cycle_index`, `is_self_recognised()` / `is_liminal()`. + +**`container`**: `Container = [u64; 256]` (16Kbit = 2KB), `CogRecord`. + +**`a2a_blackboard`**, **`collapse_gate`**, **`exploration`**, **`literal_graph`**, **`orchestration_mode`**, **`jit`**, **`nars`**, **`plan`**, **`orchestration`**, **`thinking`** (36 styles, 6 clusters), **`mul`**, **`sensorium`**, **`cam`**, **`high_heel`**. 
+ +## Current AriGraph Inventory (lance-graph/src/graph/arigraph/) + +4696 LOC shipped, 7 modules: +- `episodic.rs` (210 LOC + unbundle hooks from #208) — `Episode`, `EpisodicMemory`, `unbundle_hardened`, `unbundle_targeted`, `rebundle_cold`, `UnbundleReport`, `RebundleReport`, `UNBUNDLE_HARDNESS_THRESHOLD = 0.8` +- `triplet_graph.rs` (1064 LOC) — SPO graph, NARS truth, BFS, spatial paths +- `retrieval.rs` (447 LOC) — fingerprint retrieval policies +- `sensorium.rs` (539 LOC) — observation → triplets +- `orchestrator.rs` (1562 LOC) — AriGraph coordinator +- `xai_client.rs` (521 LOC) — xAI enrichment +- `language.rs` (339 LOC) — LM bridge + +## Workspace Conventions (locked in CLAUDE.md) + +1. **Model policy:** main thread Opus + deep thinking; subagent grindwork → Sonnet; accumulation → Opus; NEVER Haiku. +2. **GitHub reads:** zipball to `/tmp/sources/` + local grep for 3+ reads per repo. MCP only for writes (PR, comments) and single-path reads. +3. **Contract zero-dep invariant:** `lance-graph-contract` has no external crate deps. Do not add any. +4. **Read before Write:** always Read a file before overwriting. Write-over-self without Read is the documented failure mode. +5. **No JSON serialization in types.** Serde stays out of types (debug-only). Wire formats are explicit. +6. **Pumpkin framing** for externally-visible examples (clinical / game-AI disguise for the AGI primitives). + +## Active Branches (local at /home/user/lance-graph) + +- `claude/deepnsm-grammar-phase1` — Phase 1 PR #210, merged into main. +- `main` — up-to-date post #210. + +## Immediate Next Work (Phase 2 — queued, NOT yet shipped) + +Per `/root/.claude/plans/elegant-herding-rocket.md`: + +- **D2** DeepNSM emits `FailureTicket` on low coverage (~150 LOC). +- **D3** Grammar Triangle wired into DeepNSM via `triangle_bridge.rs` (~220 LOC). +- **D5** Markov ±5 bundler with role-indexed VSA (~300 LOC). +- **D7** NARS-tested grammar thinking styles as meta-inference policies (~260 LOC). 
+ +Total ~930 LOC, one PR when it ships. + +## Deferred (do NOT propose these — they're explicitly parked) + +- CausalityFlow TEKAMOLO extension (modal/local/instrument + beneficiary/goal/source, 9 total) — struct change deferred until after Phase 2. +- D8 story-context bridge, D9 ONNX arc export, D10 Animal Farm validation, D11 bundle-perturb emergence — Phase 3/4. +- Named Entity pre-pass (NER) — biggest OSINT blocker, separate PR. +- FP_WORDS = 160 migration (currently 157) — coordinated ndarray change. +- Crystal4K 41:1 persistence compression. +- 200-500 YAML TEKAMOLO templates per language — future training pipeline. +- Python/TypeScript grammar-stack convergence. + +## If You're Tempted to Propose Something + +Check this file first. Then check the KNOWLEDGE_INDEX.md for which +docs cover your domain. Then load only those docs. If you're still +uncertain whether something exists, grep the actual source before +proposing a new type. + +The fastest way to waste 30 turns is to re-invent what's already in +the contract. This file exists to prevent that. diff --git a/.claude/knowledge/PR_ARC_INVENTORY.md b/.claude/knowledge/PR_ARC_INVENTORY.md new file mode 100644 index 00000000..0284f48d --- /dev/null +++ b/.claude/knowledge/PR_ARC_INVENTORY.md @@ -0,0 +1,220 @@ +# PR Arc — Architectural Decision History + +> **Auto-loaded at session start.** Every merged PR, its meta, and +> the decisions it locked in. Read BEFORE proposing anything — a new +> proposal that contradicts a decision in this arc is a 30-turn +> rediscovery tax waiting to happen. +> +> ## APPEND-ONLY RULE (MANDATORY) +> +> 1. **New PRs PREPEND** a new section at the top (most-recent first). +> 2. **Old PR sections are IMMUTABLE HISTORY.** Never rewrite or +> delete a past PR's Added / Locked / Deferred / Docs entries. +> 3. **The ONE exception: Confidence annotations.** Each PR section +> may have a `**Confidence (YYYY-MM-DD):**` line that IS updatable. 
+> Use it to record: "working", "partial", "superseded by PR #N", +> "broken — see PR #N for fix". This is the only mutable field. +> 4. **Corrections append.** If a Locked claim turns out wrong, +> append a `**Correction (YYYY-MM-DD from PR #N):**` line to the +> same entry — do not edit the original Locked line. Both stay. +> 5. **Reversals are their own PR entry.** If a later PR explicitly +> undoes a decision, the later entry documents the reversal; the +> earlier entry's Confidence line references it. Both remain in +> the arc. +> +> The arc is the historical record. Rewriting it destroys the +> "why was this decided that way" context that prevents future +> rediscovery. Every entry stays. +> +> **Format:** reverse chronological. Each PR carries: +> - **Added** — new types / modules / LOC (immutable) +> - **Locked** — conventions / invariants / patterns (immutable) +> - **Deferred** — explicit parks (immutable) +> - **Docs** — knowledge files produced (immutable) +> - **Confidence (YYYY-MM-DD):** — the ONLY mutable field + +--- + +## #210 — Phase 1 grammar + knowledge docs (merged 2026-04-19) + +**Confidence (2026-04-19):** Working. All tests green at merge time. + +**Added:** +- `contract::grammar::context_chain` — coherence_at / total_coherence / replay_with_alternative / disambiguate / DisambiguationResult / WeightingKernel {Uniform, MexicanHat, Gaussian} (+396 LOC, 8 tests). +- `contract::grammar::role_keys` — 47 canonical role keys addressed as contiguous `[start:stop]` slices over 10,000 VSA dims. FNV-64 + per-dim LCG deterministic generation. `Tense` enum (12 variants). `finnish_case_key / tense_key / nars_inference_key` lookups (+404 LOC, 7 tests). + +**Locked:** +- **Role-key VSA addressing uses contiguous slices**, not scattered bits. Subject=[0..2000), Predicate=[2000..4000), Object=[4000..6000), Modifier=[6000..7500), Context=[7500..9000), TEKAMOLO slots=[9000..9900), Finnish cases=[9840..9910), tenses=[9910..9970), NARS inferences=[9970..10000). 
+- **All role-key slices are disjoint**; binding into one slice does not contaminate another. +- **ContextChain coherence is Hamming-based** on the Binary16K variant, graceful zero-score on other variants (zero-dep constraint). +- **Mexican-hat weight:** `(1 - 2x^2) · exp(-2x^2)` where `x = d / MARKOV_RADIUS`. Monotone on d=0..5. +- **DISAMBIGUATION_MARGIN_THRESHOLD = 0.1** — below this the `escalate_to_llm` flag fires. + +**Deferred:** +- CausalityFlow 3→9 slot extension (modal/local/instrument + beneficiary/goal/source). +- Phase 2 work: D2 FailureTicket emission, D3 Triangle bridge, D5 Markov bundler, D7 grammar thinking styles. +- All of Phase 3/4. + +**Docs:** +- `grammar-landscape.md` (429 lines) +- `linguistic-epiphanies-2026-04-19.md` (466 lines, E13-E27) +- `fractal-codec-argmax-regime.md` (256 lines, orthogonal thread) + +**Decisions for future PRs to respect:** +- Finnish object marking uses Nominative/Genitive/Partitive, NOT Latinate Accusative (except personal pronouns). +- Russian 6 cases include Instrumental (not omitted). +- Each language gets its native case terminology. +- Never spawn Haiku subagents. +- Explore subagents → Sonnet, `general-purpose` grindwork → Sonnet, accumulation → Opus. + +--- + +## #209 — sandwich layout + bipolar cells (merged 2026-04-19) + +**Confidence (2026-04-19):** Working. All tests green at merge time. + +**Added:** +- `CrystalFingerprint::Structured5x5` uses sandwich layout: 3,125 cells in middle (dims 3437..6562), 5 quorum floats (6562..6567), quorum sentinel (6567), plus leading/trailing role-binding space. +- Bipolar cell encoding: `u8 0..=255 → f32 [-1, 1]` via `v/127.5 - 1.0`. +- Lossless bundle/unbundle between Structured5x5 ↔ Vsa10kF32 sandwich. +- Codex-review fixes: Binary16K aliasing, i8 /128 clamp, `quorum: None` sentinel. + +**Locked:** +- **VSA operations stay in `ndarray::hpc::vsa`** (bind, unbind, bundle, permute, similarity, hamming, sequence, clean). DO NOT duplicate in contract. 
+- **10K f32 Vsa10kF32 (40 KB) is lossless under linear sum**, not a wire-only format; lancedb natively handles 10K VSA. +- **Signed 5^5 bipolar is lossless**; unsigned / bitpacked binary is lossy via saturation. +- **CAM-PQ projection is distance-preserving** (lossless across form transitions). +- **VSA convention is `[start:stop]` contiguous slices**, not scattered bits. +- `Structured5x5` is the native rich form; `Vsa10kF32` is native storage (not passthrough). + +**Deferred:** +- PhaseTag types (ladybug-rs owns them). +- Crystal4K 41:1 compression persistence (ladybug-rs owns it). +- ladybug-rs quantum 9-op set port. + +**Docs:** +- `crystal-quantum-blueprints.md` (existing, cross-referenced) +- Cross-repo-harvest H1-H14 (Born rule, phase tag, interference, Grammar Triangle ≡ ContextCrystal(w=1), NSM ≡ SPO axes, FP_WORDS=160, Mexican-hat, Int4State, Glyph5B, Crystal4K, teleport F=1, 144-verb, Three Mountains). + +--- + +## #208 — grammar + crystal + AriGraph unbundle (merged 2026-04-19) + +**Confidence (2026-04-19):** Working. All tests green at merge time. + +**Added:** +- `contract::grammar/` module (6 files): FailureTicket, PartialParse, CausalAmbiguity, TekamoloSlots/TekamoloSlot, WechselAmbiguity/WechselRole, FinnishCase, NarsInference (7 variants), ContextChain (ring buffer), LOCAL_COVERAGE_THRESHOLD = 0.9, MARKOV_RADIUS = 5. +- `contract::crystal/` module (7 files): Crystal trait, CrystalKind, TruthValue, CrystalFingerprint (Binary16K / Structured5x5 / Vsa10kI8 / Vsa10kF32), SentenceCrystal / ContextCrystal / DocumentCrystal / CycleCrystal / SessionCrystal. +- `lance-graph::graph::arigraph::episodic`: unbundle_hardened / unbundle_targeted / rebundle_cold with ndarray::hpc::bitwise::hamming_batch_raw SIMD dispatch under `ndarray-hpc` feature. +- `UNBUNDLE_HARDNESS_THRESHOLD = 0.8` synchronized in contract + arigraph. + +**Locked:** +- **AriGraph lives in-tree** at `lance-graph/src/graph/arigraph/` (not a standalone crate). 
4696 LOC transcoded from Python AdaWorldAPI/AriGraph. +- **Crystals unbundle when hardness ≥ 0.8.** Rebundle for cold entries. +- **FailureTicket carries SPO × 2³ × TEKAMOLO × Wechsel decomposition** plus coverage + attempted_inference + recommended_next. +- **Finnish 15 cases, Russian 6 cases, Turkish 6 cases** + agglutinative chain, German 4 cases, Japanese particles — each in native terminology. + +**Deferred:** +- DeepNSM emission of FailureTicket (D2, Phase 2). +- Grammar Triangle bridge into DeepNSM (D3, Phase 2). + +**Docs:** +- `integration-plan-grammar-crystal-arigraph.md` (E1-E12 epiphanies). +- `crystal-quantum-blueprints.md` (Crystal vs Quantum modes). +- `endgame-holographic-agi.md` (5-layer stack). + +--- + +## #207 — session capstone + Wikidata plan (merged 2026-04-18) + +**Confidence (2026-04-19):** Working. All tests green at merge time. + +**Added:** +- `session-capstone-2026-04-18.md` — 8 epiphanies (E1-E8), Sleeping Beauties (SB1-7), Missing Bridges (MB1-5), Known Brittle (KB1-5), priority map. +- `wikidata-spo-nars-at-scale.md` — 1.2B triples → 14.4 GB scale demo plan. + +**Locked:** +- **§7 addendum correction:** AriGraph is SHIPPED, not deferred. Invalidates capstone's DD2 and MB3. +- **4-pillar inheritance** (NARS + thinking + qualia + proprioception) is compile-time contract. +- **CMYK (17D experienced) vs RGB (18D observed) qualia** — dim 17 = classification_distance. +- **Vocabulary IS semantics** — glyph names use pretraining-derived associations on purpose. + +--- + +## #206 — state classification pillars (merged 2026-04-18) + +**Confidence (2026-04-19):** Working. All tests green at merge time. + +**Added:** +- `contract::qualia` — 17-D QualiaVector + 17→11 projection. +- `contract::proprioception` — 7 StateAnchor (Intake/Focused/Rest/Flow/Observer/Balanced/Baseline), 11-D state vector, ProprioceptionAxes, StateClassifier trait, DefaultClassifier, softmax `hydrate()`. 
+- `contract::world_map` — WorldMapDto + WorldMapRenderer trait + DefaultRenderer. +- `world_model.rs` extended: `qualia`, `axes`, `proprioception`, `cycle_fingerprint`, `timestamp`, `cycle_index`, `is_self_recognised()`, `is_liminal()`. +- `cognitive-shader-driver::sigma_rosetta` — 64 glyphs (Σ11) × 144 verbs (Σ12 = 12 roots × 12 tenses), SigmaState, TriangleGestalt, QuadTriangleGestalt, InteractionKinematic (Σ13 dome), QPL ↔ sigma 17D bridge. +- REST (`serve` feature) + gRPC (`grpc` feature) endpoints. +- `examples/villager_ai.rs` — Pumpkin NPC AI reference. + +**Locked:** +- **Proprioception in the contract** = ontological self-recognition by compilation. +- **WorldMapRenderer pattern** — drop-in framing trait keeps contract vocabulary neutral. +- **Pumpkin / villager AI framing** for externally-visible examples (corporate disguise). +- **`proprioception::axes` named fields** (drive / arousal / dominance / …) — public API. +- **No code comments or PR descriptions** reveal the disguise scrubbing. Everything personal goes to ada-rs. + +--- + +## #205 — engine_bridge + CMYK/RGB qualia (merged 2026-04-18) + +**Confidence (2026-04-19):** Working. All tests green at merge time. + +**Added:** +- `cognitive-shader-driver::engine_bridge` — ingest_codebook_indices, dispatch_from_top_k, EngineBusBridge, persist_cycle, UNIFIED_STYLES[12] 3-way mapping (ThinkingStyle ↔ StyleSelector ↔ p64 StyleParams). +- CMYK vs RGB qualia decomposition — 17D experienced vs 18D observed, classification_distance as dim 17. + +**Locked:** +- **12 UNIFIED_STYLES** are the canonical style inventory (3-way mapping must stay aligned). +- **Named emotion archetypes** (fear/anger/sadness/joy/surprise/disgust) live in engine_bridge as classification references. + +--- + +## #204 — cognitive-shader-driver crate + Shader DTOs (merged 2026-04-18) + +**Confidence (2026-04-19):** Working. All tests green at merge time. + +**Added:** +- New crate `cognitive-shader-driver` (24 tests). 
+- `contract::cognitive_shader` — ShaderDispatch / ShaderResonance / ShaderBus / ShaderCrystal, MetaWord (u32 packed: thinking 6 + awareness 4 + nars_f 8 + nars_c 8 + free_e 6), CognitiveShaderDriver trait, ShaderSink commit-adapter. +- `auto_style` — 18D qualia → style ordinal. +- 630K LOC ladybug-rs import into `lance-graph-cognitive` (grammar, spo, learning, world, search, fabric, spectroscopy, container_bs, core_full). +- `crates/holograph` imported from RedisGraph, 10K→16K migration. +- `contract::container` — Container (16K fingerprint) + CogRecord (4KB = meta + content). +- `contract::collapse_gate` — GateDecision, MergeMode. + +**Locked:** +- **Shader IS the driver** (role reversal from thinking-engine-first). +- **MetaWord packing layout** — thinking(6) + awareness(4) + nars_f(8) + nars_c(8) + free_e(6). +- **BindSpace struct-of-arrays** — FingerprintColumns (4 planes × 256 u64), EdgeColumn, QualiaColumn (18 f32), MetaColumn (u32). +- **`ShaderBus::cycle_fingerprint: [u64; 256]`** IS `Container` IS `CrystalFingerprint::Binary16K` (same 2 KB backing). +- **No serde in types** (debug-only); wire formats explicit. + +**Docs:** +- `cognitive-shader-architecture.md` (canonical architecture reference). + +--- + +## How to Use This File + +1. **Opening a session on this workspace:** read the top 3 PRs + (most recent). That covers ~90 % of what you need to know about + current state. +2. **Before proposing a new type:** grep this file for the type + name. If it's listed under Added, stop and read the source. +3. **Before proposing a convention:** grep for the topic. If it's + listed under Locked, your proposal needs explicit justification + to overturn it. +4. **When a PR merges:** prepend a new section at the top of this + file. Old PRs stay — they are the arc. + +This file is the fastest bootstrap available for a new session on +this workspace. Load it, then load 1-2 knowledge docs as the domain +triggers, then start working. Target: 3-5 turn cold start, not 30. 
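Three of the numeric conventions recorded as Locked in the arc above can be checked in a few lines: the Mexican-hat weight from #210, the bipolar cell decode from #209, and the 32-bit MetaWord layout from #204. The following is a minimal sketch, not the contract's code. The function and struct names are hypothetical, and the bit order (fields packed from the least-significant bit in the listed order) is an assumption; only the formulas and field widths come from the entries above.

```rust
// Sketch only: names and bit order are assumptions, not the contract's API.

/// Markov radius as recorded for PR #208.
const MARKOV_RADIUS: f32 = 5.0;

/// Mexican-hat weight from #210: (1 - 2x^2) * exp(-2x^2), x = d / MARKOV_RADIUS.
fn mexican_hat(d: u32) -> f32 {
    let x = d as f32 / MARKOV_RADIUS;
    let u = 2.0 * x * x;
    (1.0 - u) * (-u).exp()
}

/// Bipolar cell decode from #209: u8 0..=255 maps to f32 in [-1, 1].
fn bipolar_decode(v: u8) -> f32 {
    v as f32 / 127.5 - 1.0
}

/// Hypothetical unpacked view of a MetaWord; widths 6 + 4 + 8 + 8 + 6 = 32 bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MetaFields {
    thinking: u8,  // 6 bits
    awareness: u8, // 4 bits
    nars_f: u8,    // 8 bits
    nars_c: u8,    // 8 bits
    free_e: u8,    // 6 bits
}

/// Pack the five fields into one u32, lowest field first (assumed order).
fn pack_meta(m: MetaFields) -> u32 {
    ((m.thinking as u32) & 0x3F)
        | (((m.awareness as u32) & 0x0F) << 6)
        | ((m.nars_f as u32) << 10)
        | ((m.nars_c as u32) << 18)
        | (((m.free_e as u32) & 0x3F) << 26)
}

/// Inverse of `pack_meta` under the same assumed bit order.
fn unpack_meta(w: u32) -> MetaFields {
    MetaFields {
        thinking: (w & 0x3F) as u8,
        awareness: ((w >> 6) & 0x0F) as u8,
        nars_f: ((w >> 10) & 0xFF) as u8,
        nars_c: ((w >> 18) & 0xFF) as u8,
        free_e: ((w >> 26) & 0x3F) as u8,
    }
}
```

With `MARKOV_RADIUS = 5`, the weight starts at 1 for `d = 0`, crosses zero between `d = 3` and `d = 4`, and reaches `-exp(-2)` at `d = 5`, which is consistent with the "Monotone on d=0..5" claim.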
diff --git a/.claude/knowledge/STATUS_BOARD.md b/.claude/knowledge/STATUS_BOARD.md new file mode 100644 index 00000000..defa64c7 --- /dev/null +++ b/.claude/knowledge/STATUS_BOARD.md @@ -0,0 +1,186 @@ +# Status Board — Cross-Deliverable View + +> Deliverable-level status across all active integration plans. +> **Status** and **PR / Evidence** columns are the only mutable +> fields — title, plan-version, and scope are immutable. +> +> For plan-level status see `INTEGRATION_PLANS.md`. +> For per-PR decision history see `PR_ARC_INVENTORY.md`. +> For current contract inventory see `LATEST_STATE.md`. + +--- + +## Status Legend + +| Status | Meaning | +|---|---| +| **Shipped** | Merged to main. PR column cites the merge commit. | +| **In PR** | PR open, under review. Not yet merged. | +| **In progress** | Active branch, code in flight, not yet PR. | +| **Queued** | Next up; spec is clear; work not started. | +| **Backlog** | Future; still in scope but not yet queued for a phase. | +| **Deferred** | Explicitly parked. Rationale recorded. Will be revisited. | +| **Abandoned** | Removed from scope. Rationale recorded. Will not be revisited. | + +Rules: +- New rows APPEND (at the bottom of the relevant section). +- Status field is the ONLY field that gets edited in place. +- When a deliverable ships, record the PR number — never delete the + row. +- When a deliverable is superseded by a different design, keep the + row with Status = Abandoned and cite the replacement. + +--- + +## elegant-herding-rocket-v1 — Phase-structured + +Active integration plan, 11 deliverables D0 + D2–D11 (D1 dropped +early — CausalityFlow extension deferred). Plan path: +`.claude/plans/elegant-herding-rocket-v1.md`. 
+ +### Phase 1 — Shipped (PR #210, merged 2026-04-19) + +| D-id | Title | Status | PR / Evidence | +|---|---|---|---| +| D0 | grammar-landscape.md + linguistic-epiphanies + fractal-codec knowledge docs | **Shipped** | #210 — 3 docs, 1151 LOC | +| D4 | ContextChain reasoning ops (coherence / replay / disambiguate / WeightingKernel) | **Shipped** | #210 — 396 LOC, 8 tests | +| D6 | Role-key catalogue with contiguous `[start:stop]` slice addressing | **Shipped** | #210 — 404 LOC, 7 tests | + +### Phase 2 — Queued + +| D-id | Title | Status | PR / Evidence | +|---|---|---|---| +| D2 | DeepNSM emits `FailureTicket` on low coverage | **Queued** | — | +| D3 | Grammar Triangle wired into DeepNSM via `triangle_bridge.rs` | **Queued** | — | +| D5 | Markov ±5 SPO+TEKAMOLO bundler with role-indexed VSA | **Queued** | — | +| D7 | NARS-tested grammar thinking styles (meta-inference policies) | **Queued** | — | + +### Phase 3 — Queued + +| D-id | Title | Status | PR / Evidence | +|---|---|---|---| +| D8 | Story-context bridge (AriGraph episodic + triplet-graph + orthogonal global-context) | **Queued** | — | +| D10 | Forward-validation harness (Animal Farm benchmark) | **Queued** | — | + +### Phase 4 — Backlog + +| D-id | Title | Status | PR / Evidence | +|---|---|---|---| +| D9 | ONNX story-arc export + ArcPressure / ArcDerivative awareness hook | **Backlog** | — | +| D11 | Bundle-perturb emergence interface (transformer-free generative stack) | **Backlog** | — | + +### Dropped / Deferred from the plan itself + +| D-id | Title | Status | Notes | +|---|---|---|---| +| D1 | CausalityFlow 3→9 slot extension (modal/local/instrument + beneficiary/goal/source) | **Deferred** | User decision; follow-up PR after Phase 2 | + +--- + +## Infrastructure / governance (not in elegant-herding-rocket) + +Workspace-level bootstrap work. Tracked here rather than PR_ARC +because it's process, not architecture. 
+ +| Item | Status | PR / Evidence | +|---|---|---| +| CLAUDE.md §Session Start — three mandatory reads | **Shipped** | #211 | +| CLAUDE.md §A2A Orchestration — two layers (runtime + session) | **Shipped** | #211 | +| CLAUDE.md §Model Policy — grindwork vs accumulation + never Haiku | **Shipped** | #211 | +| CLAUDE.md §GitHub Access Policy — zipball-for-reads | **Shipped** | #211 | +| `.claude/BOOT.md` session entry + prior-art links | **Shipped** | #211 | +| `.claude/agents/BOOT.md` orchestration spec (renamed from README) | **Shipped** | #211 | +| `.claude/agents/README.md` function inventory | **Shipped** | #211 | +| `.claude/knowledge/LATEST_STATE.md` current-state snapshot | **Shipped** | #211 | +| `.claude/knowledge/PR_ARC_INVENTORY.md` append-only decision arc | **Shipped** | #211 | +| `.claude/knowledge/INTEGRATION_PLANS.md` versioned plan index | **Shipped** | #211 | +| `.claude/knowledge/STATUS_BOARD.md` this file | **Shipped** | #211 | +| `.claude/settings.json` team-shared governance (ask/deny + hooks) | **Shipped** | #211 | +| `.claude/hooks/session-start.sh` + `post-compact.sh` | **Shipped** | #211 | +| `.claude/skills/cca2a/` pattern-explanation skill | **Shipped** | #211 | +| `.claude/plans/elegant-herding-rocket-v1.md` plan in workspace | **Shipped** | #211 | + +## Infrastructure — queued + +| Item | Status | Notes | +|---|---|---| +| `.claude/rules/` with `paths:` frontmatter | **Backlog** | Audit rec 2; replace / complement `READ BY:` headers with path-scoped loading | +| Skill `context: fork` + `agent:` field | **Backlog** | Audit rec 4; read-only isolation for search-only skill variants | +| Auto memory (`~/.claude/projects/<proj>/memory/`) | **Backlog** | Audit rec; unstructured addition to curated LATEST_STATE | + +--- + +## Cross-cutting research threads (orthogonal to grammar work) + +Separate research thread — not entangled with grammar/crystal/A2A. +Tracked here so it doesn't get lost. 
+ +| Item | Status | Notes | +|---|---|---| +| Named-Entity pre-pass (NER) — biggest OSINT blocker | **Deferred** | Dedicated PR after Phase 2 | +| FP_WORDS = 160 migration (currently 157) | **Deferred** | Needs coordinated ndarray change | +| Crystal4K 41:1 persistence compression | **Deferred** | ladybug-rs owns it; would port later | +| 200–500 YAML TEKAMOLO templates per language | **Deferred** | Training pipeline; future | +| Cross-linguistic active parsers (EN+FI+RU+TR) | **Deferred** | Role keys exist; parsers later | +| Fractal-descriptor leaf codec (MFDFA on Hadamard) | **Research** | `.claude/knowledge/fractal-codec-argmax-regime.md`. 30-min probe first. | +| UK Biobank cardiac MRI benchmark | **Research** | Downstream of fractal-codec probe | +| Chess vertical (ruci + lichess-bot integration) | **Deferred** | Capstone Tier 0, parallel stream | +| Wikidata ingest (1.2 B triples → 14.4 GB) | **Deferred** | `.claude/knowledge/wikidata-spo-nars-at-scale.md` | +| OSINT pipeline (spider + reader-lm + DeepNSM) | **Deferred** | `.claude/knowledge/osint-pipeline-openclaw.md` | +| Python/TypeScript grammar-stack convergence | **Deferred** | `.claude/knowledge/grammar-landscape.md` §7 | + +--- + +## Prior-art audit (61 + 41 = 102 existing docs) + +Before this session, the workspace accumulated 61 `.claude/*.md` +top-level docs + 41 `.claude/prompts/*.md` files across prior +sessions. They are indexed in `.claude/BOOT.md §Existing content` +and `CLAUDE.md §Prior art`, but their individual **status** (still +active / superseded / archival) has not been audited. + +Status rows per bucket, not per file (102 rows would drown the +board — use filesystem + INTEGRATION_PLANS + PR_ARC for per-file +history): + +| Bucket | Count | Status | Notes | +|---|---|---|---| +| `.claude/*.md` top-level calibration reports / handovers / audits / snapshots | 61 | **Indexed** | Pointed at from BOOT.md + CLAUDE.md. Per-file active/superseded status: **Backlog** (needs one-pass audit). 
| +| `.claude/prompts/*.md` scoped session / probe / handover prompts | 41 | **Indexed** | Pointed at from BOOT.md via `SCOPED_PROMPTS.md` index. Per-file status: **Backlog**. | +| `.claude/knowledge/*.md` structured knowledge | 12 | **Active** | Current; each has `READ BY:` header; used by Knowledge Activation triggers. | +| `.claude/agents/*.md` specialist + meta-agent cards | 24 | **Active** | Current; used by spawning + Knowledge Activation. | +| `.claude/hooks/*.sh` | 2 | **Active** | Wired via settings.json. | +| `.claude/skills/cca2a/*.md` | 3 | **Active** | Current. | +| `.claude/plans/*.md` integration plans | 1 (v1) | **Active** | Elegant herding rocket v1, Phase 1 shipped. | + +**Backlog item — prior-art audit.** One-pass sweep across the +61+41 files. Per file: label as active / superseded / archival +with a one-line note. Deliverable = an `ARCHIVE_INDEX.md` that +splits the 102 into current vs historical, plus rename/move of +superseded files into an `archive/` subdirectory. Estimate ~200 +LOC of meta work, ~2 hours of reading. **Not urgent**; useful +before the next major planning session. + +## Update protocol + +When a deliverable ships: +1. Edit this file's Status column in place for the row → **Shipped**. +2. Fill in PR / Evidence column with the merge commit or PR #. +3. Append a new section to `PR_ARC_INVENTORY.md` (Added / Locked / + Deferred / Docs / Confidence). +4. Update `LATEST_STATE.md` (Recently Shipped PRs + Current Inventory + if types change). + +When a deliverable moves phase (e.g. Queued → In progress → In PR): +1. Edit Status column in place. Don't reorder rows. +2. If the move reflects scope correction, also update + `INTEGRATION_PLANS.md` Status line for the parent plan. + +When a new deliverable is added to a plan: +1. Append a new row at the bottom of the plan's section. +2. D-id is sequential in the plan (D12, D13, etc.). +3. Original scope becomes immutable once committed. + +When a deliverable is abandoned: +1. 
Edit Status → **Abandoned**. Don't remove the row. +2. Cite the replacement in Notes. diff --git a/.claude/knowledge/TECH_DEBT.md b/.claude/knowledge/TECH_DEBT.md new file mode 100644 index 00000000..edb35a45 --- /dev/null +++ b/.claude/knowledge/TECH_DEBT.md @@ -0,0 +1,188 @@ +# Technical Debt Log — Open + Paid (double-entry, append-only) + +> **Append-only ledger** for knowingly-deferred work: TODOs, shortcuts, +> workarounds, unsafe assumptions, missing probes, hardcoded +> thresholds, stubs, and anything else we shipped with intentional +> debt. Debt moves Open → Paid by status-flip; rows are NEVER +> deleted. +> +> **Purpose:** separate from `ISSUES.md` (bugs) and `IDEAS.md` +> (speculation). Tech debt is **code that works but we know +> something better is owed**. This file is where we admit it. + +--- + +## Double-entry discipline + +Same pattern as `ISSUES.md`: + +1. **Open Debt** — known shortcut or deferral, captured at shipping + time. +2. **Paid Debt** — when the shortcut is replaced with the proper + implementation, append here with PR anchor + Status flip on the + original Open entry to `Paid YYYY-MM-DD`. + +The Open entry stays in its section forever (chronology). The Paid +section accumulates retirements for audit. + +--- + +## Governance + +- **Append-only.** Never delete a row. +- **Mutable fields:** `**Status:**` and `**Payoff:**` lines only. +- **`permissions.ask` on Edit** (same rule as other bookkeeping + files). + +## Cross-references + +- `ISSUES.md` — an issue may become tech debt when knowingly + deferred rather than fixed. +- `IDEAS.md` — a rejected idea that was "the better way" often + leaves behind a debt entry documenting the compromise shipped + instead. +- `PR_ARC_INVENTORY.md` — which PR introduced the debt + which PR + paid it. +- `STATUS_BOARD.md` — debt items that block deliverables are + cross-referenced from the D-id row. 
+- `EPIPHANIES.md` — an epiphany often retroactively turns something + into tech debt (the old approach is now known to be suboptimal). + +--- + +## Kanban Format (priority + scope on every entry) + +Every debt item carries: +- **Priority** — `P0` must-pay-before-next-phase / `P1` pay-soon / + `P2` eventual / `P3` keep-tracked-but-low. +- **Scope** — which agent / deliverable / domain owes it: + `@<agent-name>`, `D<N>`, `domain:<tag>`. + +Ticket tag: `[P2 @truth-architect D10 domain:grammar]`. Same +filter discipline — agents pull their own debt by `@`-mention. + +## Open Debt + +(Seeded with known deferrals from recent PRs. New items PREPEND +with today's date.) + +``` +## YYYY-MM-DD — <short title> +**Status:** Open +**Priority:** P0 | P1 | P2 | P3 +**Scope:** @<agent> D<N> domain:<tag> +**Introduced by:** PR #NNN +**Payoff estimate:** <rough LOC / time> + +<one paragraph: what shortcut was taken, why, what the proper fix +looks like, any blocking dependencies> + +Cross-ref: <file:line / deliverable D-id / epiphany entry> +``` + +### Seeded from PRs #204–#211 + +## 2026-04-19 — Contract `ContextChain::coherence_at` returns 0 for non-Binary16K variants +**Status:** Open +**Priority:** P2 +**Scope:** @resonance-cartographer @container-architect D4 domain:grammar +**Introduced by:** PR #210 +**Payoff estimate:** ~80 LOC + tests + +D4 shipped with Hamming-based coherence on the `Binary16K` variant +only; other `CrystalFingerprint` variants (`Structured5x5`, +`Vsa10kI8`, `Vsa10kF32`) return 0 as a zero-dep fallback. Cosine +coherence on the f32 variants would unlock richer disambiguation +but requires adding a minimal linear-algebra shim without breaking +the zero-dep invariant of the contract. + +Cross-ref: `crates/lance-graph-contract/src/grammar/context_chain.rs`. 
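The debt entry above describes the current behaviour: a Hamming-based score on `Binary16K` and a zero fallback everywhere else. A minimal sketch of that shape follows, using only the standard library so that it respects the zero-dep constraint. The `Fingerprint` enum is a hypothetical stand-in for `CrystalFingerprint` with payloads elided, and the normalisation (1.0 for identical bits, 0.0 for all 16,384 bits different) is an assumption rather than the shipped formula.

```rust
/// Hypothetical stand-in for the contract's `CrystalFingerprint`.
/// Only the variant names come from the source; payloads are elided.
enum Fingerprint {
    Binary16K([u64; 256]),
    Structured5x5,
    Vsa10kI8,
    Vsa10kF32,
}

/// Hamming-based coherence between two fingerprints.
/// Assumed normalisation: 1.0 means identical, 0.0 means every bit differs.
/// Any pair that is not Binary16K/Binary16K degrades to 0.0.
fn coherence(a: &Fingerprint, b: &Fingerprint) -> f32 {
    match (a, b) {
        (Fingerprint::Binary16K(x), Fingerprint::Binary16K(y)) => {
            // XOR marks differing bits; count_ones is the per-word popcount.
            let differing: u32 = x
                .iter()
                .zip(y.iter())
                .map(|(p, q)| (p ^ q).count_ones())
                .sum();
            1.0 - differing as f32 / 16_384.0
        }
        // Zero-score fallback for the variants the debt entry lists.
        _ => 0.0,
    }
}
```

Paying this debt would presumably mean replacing the wildcard arm with a cosine arm for the f32 variants, which is where the linear-algebra shim mentioned above would be needed.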
+ +## 2026-04-19 — CausalityFlow has 3/9 TEKAMOLO slots; modal/local/instrument + beneficiary/goal/source deferred +**Status:** Open +**Priority:** P1 +**Scope:** @integration-lead @truth-architect D1 domain:grammar +**Introduced by:** PR #208 + #210 (deliberate deferral) +**Payoff estimate:** 6 new `Option<String>` fields in +`lance-graph-cognitive/src/grammar/causality.rs` + tests + +Full thematic-role inventory needs 6 more slots on `CausalityFlow`. +Deferred per user decision; D2 ticket emission and D3 triangle +bridge map only the 3 existing slots for now. Phase 2 work is +consistent with 3/9; Phase 3 may benefit from the extension. + +Cross-ref: `grammar-landscape.md` §3, `STATUS_BOARD.md` D1 row. + +## 2026-04-19 — Named-Entity pre-pass (NER) is the biggest OSINT blocker; stubbed out +**Status:** Open +**Introduced by:** architectural choice (all PRs) +**Payoff estimate:** dedicated PR, ~800 LOC new crate / subsystem + +COCA 4096 has zero coverage of proper nouns (Altman, Anthropic, +Riyadh). Every unknown entity falls through to hash-bucket +collisions in the SPO graph. Grammar work proceeds without NER; +OSINT pipeline is blocked on this. + +Cross-ref: `grammar-tiered-routing.md` §C8, +`STATUS_BOARD.md` Research threads section. + +## 2026-04-19 — FP_WORDS = 157 (not 160); SIMD remainder loops remain +**Status:** Open +**Introduced by:** architectural choice (ndarray::hpc::vsa) +**Payoff estimate:** coordinated ndarray + lance-graph change, +~30 LOC in ndarray + 0 in lance-graph if field naming is stable + +160 u64 = 10,240 bits (SIMD-clean for AVX-512 / AVX2 / NEON), zero +remainder loop in every SIMD pass. Current 157 u64 has 5-word +scalar tail. Performance delta is measurable but not critical yet. + +Cross-ref: `cross-repo-harvest-2026-04-19.md` H6. 
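The arithmetic behind the FP_WORDS entry above can be reproduced directly. This is a small sketch with hypothetical helper names. The lane counts are the standard number of `u64` values per register (8 for 512-bit, 4 for 256-bit, 2 for 128-bit); reading the "5-word scalar tail" as the 512-bit case is an inference from `157 mod 8 = 5`, not something the entry states.

```rust
/// u64 lanes per SIMD register width named in the debt entry.
const LANES: [(&str, usize); 3] = [("AVX-512", 8), ("AVX2", 4), ("NEON", 2)];

/// Total bits in a fingerprint of `words` u64 values.
fn total_bits(words: usize) -> usize {
    words * 64
}

/// Words left over for a scalar remainder loop after full SIMD passes.
fn scalar_tail(words: usize, lanes: usize) -> usize {
    words % lanes
}

/// Tail length per instruction set for a given word count.
fn tails(words: usize) -> Vec<(&'static str, usize)> {
    LANES
        .iter()
        .map(|&(name, lanes)| (name, scalar_tail(words, lanes)))
        .collect()
}
```

For 160 words every tail is zero, which is the "SIMD-clean" property the entry describes; for 157 words every listed width leaves a remainder.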
+ +## 2026-04-19 — Abduction-threshold for unbundle-to-graph is hand-picked (0.88) +**Status:** Open +**Introduced by:** #208 (inherits PR design) +**Payoff estimate:** empirical calibration run on real corpus + +~30 LOC threshold parameterization + +NARS Abduction confidence threshold for promoting facts into the +triplet graph is hand-picked at 0.88. Miscalibration on a specific +corpus (e.g. Animal Farm) would compound errors. Calibration is +pending D10 validation harness. + +Cross-ref: `integration-plan-grammar-crystal-arigraph.md` E8, +`STATUS_BOARD.md` D10 row. + +--- + +## Paid Debt + +(No debt paid at initial commit. When an Open entry is retired, +APPEND here with same title + PR anchor.) + +``` +## YYYY-MM-DD — <same title as Open entry> (from YYYY-MM-DD) +**Status:** Paid YYYY-MM-DD +**Payoff:** PR #NNN (commit SHA) — <one-line description> + +<verbatim original Open paragraph> + +Cross-ref: <same + PR link> +``` + +--- + +## How to use this file + +**When shipping with a known shortcut** — prepend to **Open Debt** +with `**Status:** Open` + `**Introduced by:** PR #NNN` + +`**Payoff estimate:**`. One paragraph describing what's owed. + +**When paying debt** — append to **Paid Debt** with the same title ++ date anchor + `**Status:** Paid YYYY-MM-DD` + `**Payoff:** PR +#NNN`. Flip the Open entry's Status to `Paid YYYY-MM-DD`. + +**When debt becomes irrelevant** (e.g. the feature it blocked got +abandoned) — flip Open Status to `Moot YYYY-MM-DD`. Keep the row. + +Nothing is lost. Every shortcut has a trail from introduction to +payoff (or abandonment). diff --git a/.claude/plans/elegant-herding-rocket-v1.md b/.claude/plans/elegant-herding-rocket-v1.md new file mode 100644 index 00000000..d55b77de --- /dev/null +++ b/.claude/plans/elegant-herding-rocket-v1.md @@ -0,0 +1,2085 @@ +# Plan — DeepNSM as Full Parser via Markov ±5 Context Upgrade + +## Context + +**Priority (user):** Making DeepNSM the language parser without LLM is +huge; it has priority. 
Markov ±5 SPO+TEKAMOLO bundling is **the +context upgrade to NARS + SPO 2³ + TEKAMOLO** — trajectory as +reasoning unit, not sentence. Crystal types are reused, not expanded. + +**Already documented** (do not duplicate, read as prerequisites): + +- `.claude/knowledge/grammar-tiered-routing.md` — the full 5-criterion + coverage detector, SPO×2³×TEKAMOLO×Wechsel failure decomposition, + morphology coverage table (Finnish 98 % > English 85 %), self- + improving loop, OSINT language priorities. +- `.claude/knowledge/integration-plan-grammar-crystal-arigraph.md` — + 12 epiphanies E1–E12 (grammar-tiered, morphology-easier, FailureTicket, + cross-lingual superposition, Markov ±5, NARS-about-grammar, crystal + hierarchy, CrystalFingerprint, 5D quorum, episodic unbundle, AriGraph + substrate, demo matrix). +- `.claude/knowledge/crystal-quantum-blueprints.md` — Crystal mode + (bundled Markov SPO chain, Structured5x5 middle cells, bipolar) vs + Quantum mode (holographic residual, Vsa10kF32 + PhaseTag, sandwich + wings for phase keys). +- `.claude/knowledge/cross-repo-harvest-2026-04-19.md` — H1 Born rule, + H2 phase-tag threshold, H3 interference truth, H4 Triangle ≡ + ContextCrystal(w=1), H5 NSM ≡ SPO axes, H6 FP_WORDS=160, H7 Mexican- + hat, H8 Int4State, H9 Glyph5B, H10 Crystal4K 41:1, H11 teleport F=1, + H12 144-verb taxonomy, H13 Three Mountains, H14 Triangle+Context + hybrid. +- `.claude/knowledge/session-capstone-2026-04-18.md` — SB1–SB7, MB1– + MB5, E1–E8, §7 AriGraph-already-shipped addendum. +- `.claude/knowledge/endgame-holographic-agi.md` — 5-layer stack, 12- + step holographic memory loop, P0/P1/P2/P3 priorities. + +## Shipped Today (merged to main) + +- PR #208 — grammar/ + crystal/ contract modules, AriGraph + unbundle hooks. +- PR #209 — sandwich layout, bipolar cells, Codex fixes. +- 112 contract tests + 11 episodic tests passing. + +## Today's Lossiness Epiphanies (not yet in a knowledge doc) + +| Form | Size | Lossy? 
| Why | +|---|---|---|---| +| 10 000-D f32 (Vsa10kF32) | 40 KB | **LOSSLESS** | 320 K bits of capacity ≫ any bundled signal; orthogonal role keys give exact unbundle | +| Signed 5^5 bipolar | 3 KB | **LOSSLESS** | Negative cancellation = VSA superposition; opposites cancel natively | +| Unsigned 5^5 | 3 KB | **LOSSY** | No cancellation; accumulation saturates | +| Binary bitpacked (10K or 16K) | 1.25 / 2 KB | **LOSSY** | Majority-vote bundle commits to 0/1, loses count | +| CAM-PQ projection (Binary16K → 10K) | 10 / 40 KB | **LOSSLESS** | Distance-preserving by construction | + +**VSA convention:** role keys use contiguous `[start:stop]` slices +(e.g. SUBJECT → [0..156], OBJECT → [156..312]), not scattered bits. +This is the `vsa_permute`-friendly layout. + +## First Concrete Target — Pronoun / Coreference Resolution + +Before any abstract architecture claim, there's a bounded problem that +exercises the entire meta-inference stack: **resolving grammatical +references** — "it", "he", "she", "they", "the company", "that" — +back to their antecedents. Every downstream pipeline (OSINT, chess +commentary, Wikidata triples) depends on it. LLM-based extractors +solve it via attention over tokens; we solve it via **logical +reasoning over the ±5 Markov chain**. + +**The problem:** + +``` +S_{-2} "Pentagon contracted OpenAI in December." +S_{-1} "Thiel was at the meeting." +S_{0} "It announced the deal on Tuesday." ← "it" = ? +S_{+1} "The stock jumped 4%." +``` + +"It" could refer to Pentagon, OpenAI, or the meeting. English leaves +it ambiguous. LLMs use attention heads. We use the context chain as +the candidate enumerator and meta-inference as the decision rule. + +**Grammatical roles ARE bundling — the VSA-native shortcut.** + +Each grammatical role owns a contiguous `[start:stop]` slice of the +10K VSA space (D6). 
At parse time, content is XOR-bound into its +role's slice and accumulated: + +``` +for token in sentence: + role_key = role_keys[token.grammatical_role] // per-slice key + bound = vsa_bind(content_fp(token), role_key) + trajectory = vsa_bundle(trajectory, bound) // accumulates +``` + +Because binding uses contiguous slices, bundling preserves +role-indexed content throughout. At retrieval: + +``` +subjects_bundle = vsa_unbind(trajectory, SUBJECT_KEY) // just subjects +objects_bundle = vsa_unbind(trajectory, OBJECT_KEY) +locatives_bundle = vsa_unbind(trajectory, LOKAL_KEY) +``` + +**The bundle IS the candidate store.** Candidate enumeration for +coreference becomes an O(1) slice unbind, not a search: + +- "She announced…" → `vsa_unbind(±5_trajectory, SUBJECT_KEY)` recovers + every subject played in the window. +- Then apply `vsa_clean` against a gender-feminine codebook to filter + to feminine-animate survivors. +- Typically 1-2 candidates remain; Deduction commits. + +**Why this matters:** without role-indexed bundling, candidate +enumeration scans the context chain and rebuilds the candidate set +every time. With role-indexed bundling, the bundle itself IS the +structured index — retrieval is a single XOR + clean. + +**Cross-lingual bundling is literally role-slice addition.** Finnish +parse of the same entity lands Finnish content into the same +SUBJECT_KEY slice as English content. When the two trajectories are +bundled together, the SUBJECT slice carries both parses' +role-committed content. Unbinding recovers the aggregate — and +Finnish case morphology has already committed slots the English +parse left ambiguous. + +**This is what D5 and D6 actually produce together.** D5's +`MarkovBundler` does the role-indexed bind-and-bundle. D6's role keys +are the slice addresses. Coreference resolution = D4 +`ContextChain::disambiguate` running counterfactual substitution over +the role-unbind candidate set. 
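
The bind / bundle / unbind cycle described above can be sketched on plain f32 vectors. This is a toy illustration under stated assumptions, not the D5/D6 implementation: `RoleKey`, `bind_into`, and `unbind` are hypothetical names, the key is a bipolar (+1/-1) sign pattern used as the f32 analogue of XOR binding, and the key generator is a simple LCG rather than the FNV-64 scheme planned for D6.

```rust
use std::ops::Range;

pub const DIM: usize = 10_000;

/// A role owns a contiguous slice of the vector plus a bipolar key.
pub struct RoleKey {
    pub slice: Range<usize>,
    /// One entry per slice element; every entry is +1.0 or -1.0.
    pub key: Vec<f32>,
}

impl RoleKey {
    /// Deterministic toy key from an LCG seeded per role (assumption).
    pub fn new(slice: Range<usize>, seed: u64) -> Self {
        let mut state = seed;
        let key = (0..slice.len())
            .map(|_| {
                state = state
                    .wrapping_mul(6364136223846793005)
                    .wrapping_add(1442695040888963407);
                if (state >> 63) == 0 { 1.0 } else { -1.0 }
            })
            .collect();
        RoleKey { slice, key }
    }
}

/// Bind content with the role key and accumulate it into the role's slice.
pub fn bind_into(trajectory: &mut [f32], role: &RoleKey, content: &[f32]) {
    let n = role.slice.len().min(content.len());
    for i in 0..n {
        trajectory[role.slice.start + i] += content[i] * role.key[i];
    }
}

/// Unbind: read the role's slice and multiply by the key again.
/// A bipolar key is its own inverse, so this recovers the bundled content.
pub fn unbind(trajectory: &[f32], role: &RoleKey) -> Vec<f32> {
    trajectory[role.slice.clone()]
        .iter()
        .zip(&role.key)
        .map(|(v, k)| v * k)
        .collect()
}
```

Because the slices are disjoint, content bound under one role never leaks into another role's unbind, which is the property the coreference shortcut relies on.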
+ +**How the stack resolves it — grounded in what we're shipping:** + +1. **Enumerate candidates from ±5 Markov chain** — extract every noun + in `ContextChain.preceding + focal + following` as a potential + antecedent. Weight by recency × grammatical salience (subjects > + objects > obliques) × Mexican-hat kernel. + +2. **Hydrate each candidate's SPO role** — from D3's Grammar Triangle + bridge, every prior sentence's SPO triple is available with Pearl + 2³ mask committed. Pentagon was Subject of "contracted"; OpenAI + was Object; meeting was Locative modifier. + +3. **Counterfactual axis test** — for each candidate, apply the + substitution: "Pentagon announced the deal" / "OpenAI announced + the deal" / "The meeting announced the deal". Score the SPO + coherence (does the verb fit the subject? "meetings don't + announce" → Pearl mask inconsistent → low score). + +4. **Markov axis test** — bind each candidate as the Subject of S_{0}, + replay the chain via `ContextChain::replay_with_alternative`, + measure `total_coherence`. The binding that maximises coherence + with ±5 context wins. + +5. **Joint reading (D7 meta-inference duality)** — counterfactual and + Markov axes must agree. If both say "Pentagon" → Deduction commits. + If they split (CF says Pentagon by subject-continuity; Markov says + OpenAI because stock jumped = company signal) → the disagreement + is diagnostic: + - One of them is measuring a surface bias (subject continuity + default) that the other correctly overrides. + - The style's permanent dispatch picks Abduction to weigh which + axis has grip on this signal profile. + +6. **NARS revision on outcome** (D7 empirical layer) — when the + ensemble agrees with a downstream confirmation (e.g. the LLM + fallback would have picked the same answer, or a named-entity + link confirms the binding), the `GrammarStyleAwareness` revises + its prior over "how often the subject-continuity default holds + on this content distribution." 
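
Steps 3 to 5 above reduce to taking the best candidate on each axis and comparing the two winners. The sketch below is illustrative only: `AxisScores`, `JointReading`, and `joint_reading` are hypothetical names, and the scores are assumed to be precomputed by the counterfactual and Markov tests.

```rust
/// Per-candidate scores from the two axes (both assumed precomputed).
#[derive(Debug, Clone, PartialEq)]
pub struct AxisScores {
    pub name: &'static str,
    pub counterfactual: f32,
    pub markov: f32,
}

#[derive(Debug, PartialEq)]
pub enum JointReading {
    /// Both axes pick the same antecedent: Deduction can commit.
    Agree(&'static str),
    /// Axes disagree: hand both winners to Abduction.
    Split { counterfactual: &'static str, markov: &'static str },
    NoCandidates,
}

/// Highest-scoring candidate under `key`, or None for an empty slice.
fn best_by<'a>(
    cands: &'a [AxisScores],
    key: fn(&AxisScores) -> f32,
) -> Option<&'a AxisScores> {
    cands.iter().max_by(|a, b| key(a).total_cmp(&key(b)))
}

pub fn joint_reading(cands: &[AxisScores]) -> JointReading {
    let cf = best_by(cands, |c| c.counterfactual);
    let mk = best_by(cands, |c| c.markov);
    match (cf, mk) {
        (Some(a), Some(b)) if a.name == b.name => JointReading::Agree(a.name),
        (Some(a), Some(b)) => JointReading::Split {
            counterfactual: a.name,
            markov: b.name,
        },
        _ => JointReading::NoCandidates,
    }
}
```

For the "It announced the deal" example, a counterfactual axis favouring Pentagon and a Markov axis favouring OpenAI yields `Split`, which is the diagnostic disagreement described in step 5.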
+ +**Two-tier resolution by inherent factual content.** + +Pronouns split into two classes with different resolution costs: + +**Fixed Pronomen — carry inherent factual commitments:** + +| Pronoun | Inherent commits (axiomatic, permanent, language-specific) | +|---|---| +| I / we / me / us | Speaker-deictic (SPEAKER role) | +| you / thou | Addressee-deictic (ADDRESSEE role) | +| he / him / his | singular + animate + masculine | +| she / her | singular + animate + feminine | +| they (personal) | plural + animate | +| Proper names | rigid designators — committed entity | + +For fixed pronouns, the **inherent-feature filter over ±5 candidates +IS the resolution**. "She announced…" with candidate set +`{Pentagon(neuter), OpenAI(neuter), Marietje(fem)}` — the filter +eliminates 2/3 axiomatically. No counterfactual replay needed. This +is Deduction on inherent facts. Cheap, permanent. + +**Wechsel Pronomen — zero inherent commitment:** + +| Pronoun | Ambiguous on | +|---|---| +| it | referent (anything non-animate, or expletive) | +| that | demonstrative / relative / complementizer / conjunction | +| this | demonstrative / intensifier ("this big") | +| they (singular / expletive) | "they say…", singular they | +| one | indefinite / ordinal / pro-form | +| which | relative / interrogative | + +These need the **full meta-inference stack** (CF × Markov axes) from +the previous section — the expensive path. They're already on the +`WechselAmbiguity` list from D2. 
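
The fixed-pronoun path can be sketched as a plain feature filter. The types and function names below are hypothetical, and only the gendered third-person singular forms are treated as fixed; "they" is left out because the tables above list it on both sides.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Gender {
    Masculine,
    Feminine,
    Neuter,
}

/// A candidate antecedent with its committed features.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Antecedent {
    pub name: &'static str,
    pub gender: Gender,
    pub animate: bool,
}

/// Inherent gender commitment of a fixed pronoun.
/// None means the pronoun is Wechsel (or unknown) and needs the full stack.
pub fn fixed_gender(pronoun: &str) -> Option<Gender> {
    match pronoun.to_ascii_lowercase().as_str() {
        "he" | "him" | "his" => Some(Gender::Masculine),
        "she" | "her" | "hers" => Some(Gender::Feminine),
        _ => None,
    }
}

/// Keep only animate candidates whose gender matches the pronoun.
pub fn feature_filter<'a>(
    pronoun: &str,
    candidates: &'a [Antecedent],
) -> Option<Vec<&'a Antecedent>> {
    let gender = fixed_gender(pronoun)?;
    Some(
        candidates
            .iter()
            .filter(|c| c.animate && c.gender == gender)
            .collect(),
    )
}
```

On the `{Pentagon, OpenAI, Marietje}` candidate set from above, "she" leaves a single survivor, while "it" returns `None` and falls through to the Wechsel path.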
+ +**Cross-linguistic commitment profiles vary per pronoun class — +another reversal from the morphology story:** + +| Language | Morphology (slot-filling) | Pronoun feature commitment | +|---|---|---| +| English | weak (word order) | moderate (he/she/it on 3sg) | +| German | moderate (4 cases) | **strong** (er/sie/es + case: ihn/ihm/seiner) | +| Russian | heavy (6 cases) | **strong** (он/она/оно + full case paradigm) | +| Finnish | **very heavy** (15 cases) | **weak** (single `hän` for he/she — gender-neutral) | +| Japanese | agglutinative particles | minimal (usually dropped) | +| Turkish | agglutinative | weak (single `o` for he/she/it) | + +**Finnish is the easiest language on morphological slot-filling but +NOT on pronoun feature resolution** (because `hän` is gender-neutral). +German and Russian are richest on pronoun features. + +A cross-lingual bundle (EN + DE + RU + FI) maximises both axes +simultaneously: each language contributes where its commitment is +strongest. Wechsel-heavy English + feature-weak Finnish + feature- +strong German/Russian = the complementary quartet for OSINT +resolution. + +**Resolution dispatch:** + +``` +pronoun = lex_classify(token) + │ + ├── FIXED → feature_filter(±5 candidates, pronoun.inherent_features) + │ → 1-2 survivors → Deduction commits + │ → no Markov / CF needed (usually) + │ + └── WECHSEL → full D7 meta-inference + (CF axis × Markov axis × cross-lingual bundle) + → WechselAmbiguity on the FailureTicket if both weak +``` + +**Categories of reference resolution this unlocks** (all same +mechanism, different candidate generators): + +| Reference type | Example | Candidate set | +|---|---|---| +| Anaphoric pronoun | "Pentagon … **it** announced …" | All preceding nouns, subject-weighted | +| Cataphoric pronoun | "If **he** calls, tell John …" | Next-mentioned proper noun | +| Dropped subject (Russian, Japanese, Italian) | "Пошёл домой." 
("[he/she] went home") | Preceding subject, verb agreement narrows gender/number | +| Definite NP | "the company" | All prior company-mentioned entities | +| Demonstrative | "that", "this" | Most salient noun or fact by Markov coherence | +| Possessive | "his", "her", "its" | Possessor candidates filtered by animacy + agreement | +| Relative pronoun | "the man who …" | Immediate antecedent in syntactic scope | + +**Cross-lingual bonus** — dropped subjects are cheap in +morphologically-rich languages because verb agreement narrows the +candidate set before the Markov axis fires. Russian +`Пошла домой` — feminine verb ending → candidate restricted to +feminine antecedents in ±5. Japanese verb form + particle commits +discourse role. English has no such morphological narrowing, which is +why English coreference is *the* hard problem and everyone else is +easier. + +**What this means for shipping order.** + +The D2/D3/D4/D5/D6/D7 deliverables already cover everything needed. +The coreference problem is **the first verification target** — the +end-to-end test that proves the stack works: + +- D4 `ContextChain::disambiguate` runs the replay. +- D5 Markov bundle produces the trajectory fingerprints to score against. +- D6 role keys bind each candidate as the Subject in the counterfactual substitution. +- D7's meta-inference duality combines CF + Markov scores. +- D2 FailureTicket fires only when both axes are weak. + +**Integration test corpus: Orwell, *Animal Farm*.** + +Animal Farm is the canonical benchmark because: + +- Orwell is famous for deliberately-clean textbook English grammar — + plain, precise, unambiguous. Used in English-language teaching + precisely because every sentence parses cleanly. If we cannot hit + 85 %+ local on Animal Farm, the tiered-routing claim fails. +- Heavy named-entity density: Napoleon, Snowball, Squealer, Boxer, + Clover, Mollie, Benjamin, Old Major, Mr. Jones, Pilkington, + Frederick, Whymper. 
Each has committed animacy / gender / species / + social-role features — classic fixed-pronoun resolution targets. +- Gendered pronouns across species: "Mollie tossed her mane" / + "Napoleon issued his decree" / "the horses lost their strength" — + feature-filter resolution is unambiguous on 90 %+ of cases. +- Subject-continuity chains through chapters — Markov ±5 coherence is + measurable across whole narrative sequences. +- Clear causal chains (Squealer explains why X; Napoleon decrees + because Y) — SPO 2³ causal mask commits are testable. +- Public-domain (UK copyright expired 2020); ~40 K words; small + enough to process end-to-end in a single run; large enough to give + statistically meaningful coref accuracy. + +**Benchmark targets:** + +- Local (no LLM) coverage ≥ 85 % on full Animal Farm text. +- Fixed-pronoun coreference accuracy ≥ 95 % (he/she/his/her/their + against gendered named animals). +- Wechsel-pronoun ("it", "that") accuracy ≥ 75 % via CF+Markov meta- + inference. +- Cross-lingual bundle improvement: EN + DE (the German translation + of Animal Farm is widely available, textbook-grade) shows measurable + Wechsel-pronoun disambiguation improvement ≥ 5 %. +- End-to-end latency: full book parsed locally in ≤ 5 minutes + (< 10 µs/sentence × 40 K sentences = 400 ms, where 40 K is a loose + upper bound on the sentence count of a ~40 K-word text; the rest is + ±5 buffering and bundle construction). + +--- + +## The Synthesis (what Markov ±5 actually buys) + +Pre-Markov reasoning unit = **sentence**. Thinking style applies NARS +inference to one SPO triple + TEKAMOLO slots. Isolated. Fragile on +Wechsel. + +Post-Markov reasoning unit = **trajectory**. Thinking style applies +NARS inference to a trajectory fingerprint that carries ±5 sentences +of context (position-permuted, Mexican-hat weighted). NARS doesn't +reason about "this sentence"; it reasons about "this sentence in +this flow." This is the context dimension upgrade to NARS+SPO2³+TEKAMOLO.
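
The Mexican-hat weighting over the ±5 window can be sketched with the unnormalised Ricker form `(1 - t²/σ²) · exp(-t²/(2σ²))`. The function names and the choice `σ = 3` are assumptions for illustration; the kernel parameters actually used by `WeightingKernel::MexicanHat` are not stated in this plan.

```rust
/// Unnormalised Mexican-hat (Ricker) kernel at offset `t`.
pub fn mexican_hat(t: f32, sigma: f32) -> f32 {
    let r2 = (t * t) / (sigma * sigma);
    (1.0 - r2) * (-0.5 * r2).exp()
}

/// Weights for sentence offsets -radius..=radius, focal sentence in the middle.
pub fn mexican_hat_weights(radius: usize, sigma: f32) -> Vec<f32> {
    let r = radius as i32;
    (-r..=r).map(|offset| mexican_hat(offset as f32, sigma)).collect()
}
```

With `radius = 5` and `sigma = 3.0` the weights fall monotonically from the focal sentence outward (the kernel's minimum sits at `σ·√3 ≈ 5.2`, just outside the window), cross zero at ±3, and turn negative at ±4 and ±5, so the outermost sentences act as a mild inhibitory surround rather than as positive evidence.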
+ +Cross-linguistic corollary (AGI-grade per user): XOR-bundle a trajectory +from EN and FI of the same entity. Finnish case morphology (adessive +`-llä`, elative `-sta`) commits Wechsel-ambiguous English roles into +specific TEKAMOLO slots. The superposed trajectory carries +disambiguation for free. + +## To Build (one combined PR, ~1,555 LOC) + +### D0 — Knowledge doc + +**New:** `/home/user/lance-graph/.claude/knowledge/grammar-landscape.md` +(~320 lines). Three grammar stacks (Rust / Python / TypeScript) with +paths + LOCs; Triangle = NSM × Causality × Qualia; TEKAMOLO templates +with 3-slot gap flagged; YAML 200–500/language training pipeline as +future target; Markov ±5 as the context upgrade to NARS+SPO2³+TEKAMOLO. +Cross-refs to the five existing knowledge docs. + +**Includes a §Case Inventories Per Language** section with native- +terminology tables for at least: Finnish (15 — with the Accusative +correction), Russian (6 — full Nom/Gen/Dat/Acc/**Ins**/Prep), +German (4 — Nom/Gen/Dat/Akk), Turkish (6 + agglutinative chain), +Japanese (particles: が/を/に/で/へ/と/から/まで). Each case entry +maps to its permanent TEKAMOLO / SPO role and notes when a Latinate +label (e.g. "Accusative") would be misleading for that language. + +**Also updates (edit):** `grammar-tiered-routing.md` — patch the +Finnish case table with the correct Nominative/Genitive/Partitive +object marking and a note that true Accusative is personal-pronoun +only. Patch is ≤ 30 lines. + +**CausalityFlow extension scope (deferred from this PR, documented +in D0).** The original "3 new fields" (modal / local / instrument) +is insufficient. Full thematic-role inventory adds 3 more: + +- **Beneficiary** — "for whom" (dative of benefit). German + `für ihn`, Finnish Allative `-lle` in benefactive reading, Russian + Dative with `для` / `+dat`. +- **Goal / Direction** — "to where" (directional). Finnish Illative + `-Vn/-hVn/-seen`, Russian Accusative with `в/на`, German `nach/ + zu` + Dat. 
+- **Source** — "from where" (ablative origin). Finnish Elative + `-sta/-stä` or Ablative `-lta/-ltä`, Russian Genitive with `из/с`, + German `aus/von` + Dat. + +Optional later additions (language-specific): + +- **Path** — "through where" (prolative). Finnish Prolative + `-tse/-itse`, Turkish Instrumental-path construction. +- **Purpose / Finale** — "for what purpose". German `zum + Inf`, + Finnish Translative `-ksi` in purposive reading. +- **Result** — "leading to what". German `sodass`, Finnish + consequence constructions. + +Full extended CausalityFlow (deferred follow-up PR, not this one): + +```rust +pub struct CausalityFlow { + // existing (3/9 slots): + pub agent: Option<String>, + pub action: Option<String>, + pub patient: Option<String>, + pub reason: Option<String>, // Kausal + pub temporality: f32, // Temporal as float + pub agency: f32, + pub dependency_type: DependencyType, + + // TEKAMOLO completion (3 slots, the original "modal/local/instrument"): + pub modal: Option<String>, + pub local: Option<String>, + pub instrument: Option<String>, + + // Thematic-role completion (3 more slots, classical theory): + pub beneficiary: Option<String>, + pub goal: Option<String>, + pub source: Option<String>, +} +``` + +6 new `Option<String>` fields in total (not 3) to reach the full +thematic-role inventory. Still trivial struct change; still deferred +from the current PR's scope (user decision). + +### D2 — DeepNSM emits FailureTicket + +**Edit:** `crates/deepnsm/src/parser.rs` (+30 LOC) — end-of-parse +coverage branch. **New:** `crates/deepnsm/src/ticket_emit.rs` (+120). +**Feature gate:** `contract-ticket` on deepnsm; default stays zero-dep. + +Ticket carries the SPO × 2³ × TEKAMOLO × Wechsel decomposition from +grammar-tiered-routing.md §"Combined failure ticket". Failure reason +itself is the routing signal (COCA miss → Wikidata; FSM incomplete → +LLM parse; slots unfillable → disambiguate; low primes → LLM; high +classification_distance → novel domain). 
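
The "failure reason is the routing signal" idea maps naturally onto an exhaustive match. This is a sketch with hypothetical enum and variant names; the real `FailureTicket` shape lives in the contract crate and may differ.

```rust
/// Why the local parse fell short (hypothetical variant names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureReason {
    CocaMiss,
    FsmIncomplete,
    SlotsUnfillable,
    LowPrimes,
    HighClassificationDistance,
}

/// Where the ticket is sent next (hypothetical variant names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    WikidataLookup,
    LlmParse,
    Disambiguate,
    LlmFallback,
    NovelDomain,
}

/// Total mapping from failure reason to route, mirroring the list above.
pub fn route(reason: FailureReason) -> Route {
    match reason {
        FailureReason::CocaMiss => Route::WikidataLookup,
        FailureReason::FsmIncomplete => Route::LlmParse,
        FailureReason::SlotsUnfillable => Route::Disambiguate,
        FailureReason::LowPrimes => Route::LlmFallback,
        FailureReason::HighClassificationDistance => Route::NovelDomain,
    }
}
```

Keeping the match exhaustive means that adding a new failure reason forces a routing decision at compile time.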
+ +### D3 — Grammar Triangle wired into DeepNSM + +**New:** `crates/deepnsm/src/triangle_bridge.rs` (+220 LOC). + +Thin adapter: DeepNSM FSM produces `SentenceStructure`; call +`lance_graph_cognitive::grammar::GrammarTriangle::analyze(text)` in +parallel to produce `(NSMField, CausalityFlow, Qualia18D)`; merge +into `SpoWithGrammar { triples, causality, nsm_field, +qualia_signature, classification_distance }`. + +**Feature gate:** `grammar-triangle` (additive). + +**Explicit non-scope:** CausalityFlow extension (modal/local/ +instrument plus beneficiary/goal/source) is **deferred** per user. +Current 3/9 slots mapped. + +### D4 — ContextChain reasoning ops (coherence / replay / disambiguate) + +**Edit:** `crates/lance-graph-contract/src/grammar/context_chain.rs` +(+140 LOC). Upgrades ring-buffer to reasoning substrate: + +```rust +impl ContextChain { + pub fn coherence_at(&self, i: usize) -> f32; + pub fn total_coherence(&self) -> f32; + pub fn replay_with_alternative(&self, i: usize, + alt: CrystalFingerprint) -> (Self, f32); + pub fn disambiguate<I>(&self, i: usize, candidates: I) + -> DisambiguationResult; +} +pub enum WeightingKernel { Uniform, MexicanHat, Gaussian } +``` + +Zero-dep preserved (internal Hamming on `Box<[u64; 256]>`; cosine for +VSA variants). Mexican-hat = harvest H7 landing. + +### D5 — Markov ±5 SPO+TEKAMOLO bundler (role-indexed) + +**New:** `crates/deepnsm/src/markov_bundle.rs` (+220 LOC). +**New:** `crates/deepnsm/src/trajectory.rs` (+80 LOC). + +`MarkovBundler` — **role-indexed bundling** (the VSA-native shortcut +for coreference): + +- Ring buffer of 11 `SentenceStructure`s. +- For each token, XOR-bind `content_fp(token)` with the appropriate + role key from D6: SUBJECT_KEY, PREDICATE_KEY, OBJECT_KEY, + MODIFIER_KEY, CONTEXT_KEY, TEMPORAL_KEY, KAUSAL_KEY, MODAL_KEY, + LOKAL_KEY, INSTRUMENT_KEY, or a language-specific case key + (Finnish Inessive, Russian Instrumental, etc.) when morphology + commits the role.
+- Because role keys own disjoint `[start:stop]` slices of the 10K + vector, the bind lands content into its role's slice; no cross- + role contamination. +- `vsa_permute(v, position_offset)` per sentence in the window + (harvest H5 bit-chain). +- `vsa_bundle` with Mexican-hat weights (focal > ±1 > … > ±5). +- Produce `Trajectory { vsa_vector, context_chain, structured_5x5 }`. + Trajectory's fingerprint IS `SentenceCrystal::fingerprint` — no new + crystal type introduced. + +**Companion retrieval API on `Trajectory`:** + +```rust +impl Trajectory { + /// O(1) unbind of a role slice — recovers the role-indexed + /// content bundle without scanning the context chain. + pub fn role_bundle(&self, role: GrammaticalRole) -> Box<[f32; 10_000]>; + + /// Filter the role bundle through a codebook (e.g. feminine- + /// animate nouns for "she") to produce survivor candidates. + pub fn role_candidates( + &self, + role: GrammaticalRole, + filter_codebook: &[&[f32; 10_000]], + ) -> Vec<Candidate>; +} +``` + +This is what makes coreference O(1) per query: candidate +enumeration = slice unbind + codebook clean, not a context-chain scan. + +Target bundled form: **lossless** — `Vsa10kF32` sandwich (f32 wings +accumulate residuals; middle cells hold Structured5x5). Derived +`Binary16K` via sign-binarize for hot-path sweep. + +### D6 — Role-key catalogue + +**New:** `crates/lance-graph-contract/src/grammar/role_keys.rs` +(+160 LOC). Deterministic `LazyLock` `VsaVector`-shaped keys via FNV-64, +addressed as contiguous `[start:stop]` slices: + +- 5 SPO-role keys: SUBJECT [0..2000], PREDICATE [2000..4000], + OBJECT [4000..6000], MODIFIER [6000..8000], CONTEXT [8000..10000]. +- 5 TEKAMOLO-slot keys (TEMPORAL, KAUSAL, MODAL, LOKAL, INSTRUMENT; + last three future-ready). +- 15 Finnish case keys from `FinnishCase`. +- 12 tense keys (aligned with sigma_rosetta 144 = 12 × 12). +- 7 NARS inference keys. 
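
As a sketch of how deterministic keys and non-overlapping slices might fit together: the hash below is standard FNV-1a (64-bit), while `key_sign`, `slices_disjoint`, and the `SPO_SLICES` table are hypothetical illustrations built from the slice boundaries listed above. Whether D6 uses FNV-1 or FNV-1a, and how it derives key entries from the hash, is not specified here.

```rust
use std::ops::Range;

/// FNV-1a, 64-bit: XOR each byte in, then multiply by the FNV prime.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// The five SPO-role slices as listed in D6.
pub const SPO_SLICES: [(&str, Range<usize>); 5] = [
    ("SUBJECT", 0..2000),
    ("PREDICATE", 2000..4000),
    ("OBJECT", 4000..6000),
    ("MODIFIER", 6000..8000),
    ("CONTEXT", 8000..10000),
];

/// Deterministic +1/-1 key entry for (role label, element index).
/// Using bit 32 of the hash is an arbitrary illustrative choice.
pub fn key_sign(label: &str, index: usize) -> f32 {
    let mut buf = label.as_bytes().to_vec();
    buf.extend_from_slice(&(index as u64).to_le_bytes());
    if (fnv1a_64(&buf) >> 32) & 1 == 0 { 1.0 } else { -1.0 }
}

/// True when no two slices share an index.
pub fn slices_disjoint(slices: &[(&str, Range<usize>)]) -> bool {
    for (i, (_, a)) in slices.iter().enumerate() {
        for (_, b) in &slices[i + 1..] {
            if a.start < b.end && b.start < a.end {
                return false;
            }
        }
    }
    true
}
```

A check like `slices_disjoint` is cheap enough to run as a unit test whenever the catalogue gains a new key family.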
+ +(Exact slice boundaries set so role-domains don't overlap; Hamming +similarity within a role-domain stays meaningful.) + +### D8 — Story Context = Episodic Memory + Triplet Graph (already shipped) + +**The story IS a graph.** AriGraph's `triplet_graph.rs` (1,064 LOC, +shipped) stores every SPO triple with NARS truth and temporal +metadata. `episodic.rs` (210 LOC + unbundle hooks from PR #208) stores +each sentence as an Episode. Together they already represent the +narrative: entities are nodes, verbs are edges, temporal ordering +makes a Markov chain through the graph. No new `StoryContext` struct +needed — just query methods that traverse what's there. + +**Four-tier Markov chain via existing storage:** + +| Tier | Where it already lives | Query method | +|---|---|---| +| Working memory (±5) | `MarkovBundler::Trajectory` (D5, new) | `Trajectory::role_bundle` | +| Paragraph (±50) | `EpisodicMemory` Hamming retrieval | existing `retrieve_similar(fp, k)` | +| Document (±500) | `TripletGraph` BFS association + spatial paths | existing `triplet_graph.rs` | +| Permanent (rigid designator) | AriGraph unbundled facts (PR #208) | `EpisodicMemory::unbundle_hardened` | + +Each tier uses the **same role-indexed VSA bundling mechanism** and +the **same SPO store** — the scale is the query radius, not a new +data structure. + +**On the grammar side (already covered in D5):** + +`MarkovBundler` emits a `Trajectory` at each focal sentence, which +feeds into AriGraph via the existing `EpisodicMemory::add()`. + +**On the AriGraph side — two thin additions, no new module:** + +**(a) Graph direct lookup — objects as nodes, SPO as edges.** + +The existing `TripletGraph` (1,064 LOC shipped) already stores +objects as nodes addressed by fingerprint, with SPO triples as +edges carrying NARS truth + Pearl 2³ causal mask. For coref, this +is the right primitive: direct node lookup by feature index is +lossless and O(k) where k = matching nodes, vs a bundled vector's +√N-bounded recall.
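
A feature index of the kind implied here can be sketched as an inverted index from feature name to node set. This is an assumption-laden illustration, not the `TripletGraph` implementation: `FeatureIndex`, `commit`, and the string-keyed features are hypothetical, and the real graph addresses nodes by fingerprint rather than by integer id.

```rust
use std::collections::{BTreeSet, HashMap};

pub type NodeId = u32;

/// Inverted index: feature name -> ids of nodes carrying that feature.
#[derive(Default)]
pub struct FeatureIndex {
    postings: HashMap<&'static str, BTreeSet<NodeId>>,
}

impl FeatureIndex {
    /// Record a node's committed features (called when a fact is committed).
    pub fn commit(&mut self, node: NodeId, features: &[&'static str]) {
        for f in features {
            self.postings.entry(*f).or_default().insert(node);
        }
    }

    /// Nodes carrying every required feature, in ascending id order.
    /// Returns an empty list if `required` is empty or any feature is unknown.
    pub fn nodes_matching(&self, required: &[&'static str]) -> Vec<NodeId> {
        let mut sets: Vec<&BTreeSet<NodeId>> = Vec::new();
        for f in required {
            match self.postings.get(f) {
                Some(set) => sets.push(set),
                None => return Vec::new(),
            }
        }
        // Walk the smallest posting set and probe the others.
        sets.sort_by_key(|s| s.len());
        match sets.split_first() {
            Some((first, rest)) => first
                .iter()
                .copied()
                .filter(|id| rest.iter().all(|s| s.contains(id)))
                .collect(),
            None => Vec::new(),
        }
    }
}
```

Lookup cost is driven by the smallest posting set rather than by the total node count, which is the sense in which a feature-indexed lookup stays proportional to the number of matches.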
+ +Three retrieval tiers, chosen by query type: + +**Tier 1 — direct graph lookup by feature (primary for coref).** + +```rust +impl TripletGraph { + /// Enumerate nodes whose feature bundle matches the filter + /// (e.g. masculine + animate for "he", feminine + animate for + /// "she"). Reads the feature indices maintained during commit. + pub fn nodes_matching( + &self, + features: &FeatureFilter, + ) -> Vec<NodeRef>; + + /// Rank nodes by Markov proximity to a focal sentence — + /// recency-weighted via Mexican-hat over episode_index delta. + pub fn rank_by_proximity( + &self, + candidates: Vec<NodeRef>, + focal_episode: u64, + ) -> Vec<RankedCandidate>; +} +``` + +For coref resolution: + +``` +pronoun "he" at focal episode 9 + │ + ▼ +graph.nodes_matching(masculine + animate) + │ → [Napoleon, Boxer, Benjamin, Mr. Jones] (k small, lossless) + ▼ +graph.rank_by_proximity(..., focal=9) + │ → [Napoleon (score 0.8), Boxer (0.3), Benjamin (0.1), Jones (0.05)] + ▼ +commit top candidate if margin > threshold else FailureTicket +``` + +This is O(k) not O(√N). No bundle saturation. No tree walk. +Every committed entity remains individually addressable forever. + +**Tier 1.5 — orthogonal global-context superposition of known facts.** + +Every fact that crosses the hardness threshold (PR #208) or the +Abduction-confidence threshold (this PR) is **also** bundled into a +single 10 K orthogonal global-context vector — not per-role, just +the committed-facts field. Each fact is `vsa_permute`d by its +episode index before bundling so facts sit in orthogonal subspaces +and don't destructively interfere. + +```rust +impl EpisodicMemory { + /// Orthogonal superposition of all known (hardness- or + /// Abduction-committed) facts. Each fact permuted by its + /// episode index, then bundled. Staying within √N capacity + /// because only committed facts contribute — Animal Farm's + /// committed set is ~500, well below saturation. 
+ pub fn global_context(&self) -> &[f32; 10_000]; + + /// Called on every unbundle (hardness or Abduction gate) — + /// O(1) incremental update: permute, XOR-accumulate. + fn integrate_into_global(&mut self, fact: &Triplet, ep_idx: u64); +} +``` + +**Coref query becomes a two-stage refinement:** + +``` +Stage A — ambient pre-filter (cheap, one superposition + unbind) + combined = vsa_bundle(trajectory, global_context) + ambient_subj = vsa_unbind(combined, SUBJECT_KEY) + ambient_survivors = vsa_clean(ambient_subj, feature_codebook) + → ~5-10 candidates from ±5 + whole-story committed facts + +Stage B — lossless refinement via graph direct lookup (Tier 1) + node_candidates = graph.nodes_matching(FeatureFilter { … }) + refined = intersect(ambient_survivors, node_candidates) + ranked = graph.rank_by_proximity(refined, focal_episode) + +Commit ranked[0] if margin > threshold else FailureTicket. +``` + +The global context is the **attention mask** over the graph: it +pre-filters the candidate pool before O(k) node enumeration. Graph +stays lossless; superposition keeps the common path cheap. + +**Why gate on hardness / Abduction threshold only:** uncommitted +content stays ±5-local. Only facts the system actually believes +contribute to global context. The global vector encodes "what we +know for sure about this story so far" — exactly the information a +reader accumulates about Animal Farm as they read it. + +**Tier 3 — CLAM tree for 5 B-sentence scale.** + +Only needed for Wikidata / chess-database regime where the graph has +billions of nodes and feature-index enumeration gets expensive. +Already shipped in `ndarray/src/hpc/clam.rs`. Escape hatch, not +primary path. + +**Why this ordering matters.** Coref is an **individual-entity** +query — "which specific noun does this pronoun refer to?" That's a +node lookup, not a bundle reconstruction. The graph IS the natural +data structure. 
Tier 1.5's bundle superposition is for a different +question (ambient texture), and Tier 3 is for a different scale +(billion-node corpus). Getting the tiering right keeps each +mechanism doing what it's best at. + +**(b) Abduction-threshold unbundle + epiphany/error-correction split.** + +Extend PR #208's `unbundle_hardened` with TWO complementary gates: + +```rust +/// Abduction-confidence threshold for direct fact promotion. +/// Parallel to UNBUNDLE_HARDNESS_THRESHOLD. +pub const UNBUNDLE_ABDUCTION_THRESHOLD: f32 = 0.88; + +/// Threshold above which a counterfactual loser is "epiphany +/// material" — retained alongside the winner rather than silently +/// discarded. Below this, losers are routine error corrections. +pub const EPIPHANY_SUPPORT_THRESHOLD: f32 = 0.55; + +impl EpisodicMemory { + /// Promote Abduction outcomes to the triplet graph as committed + /// facts when NARS confidence exceeds UNBUNDLE_ABDUCTION_THRESHOLD. + pub fn unbundle_abductive( + &mut self, + graph: &mut TripletGraph, + ) -> UnbundleReport; + + /// Emit a counterfactual outcome from meta-inference. Classifies + /// into epiphany or error-correction based on the loser's + /// independent Markov + graph support. + pub fn record_counterfactual( + &mut self, + graph: &mut TripletGraph, + winner: &Triplet, + loser: &Triplet, + winner_confidence: f32, + loser_independent_support: f32, + ) -> CounterfactualDisposition; +} + +pub enum CounterfactualDisposition { + /// Loser had no independent support — routine ambiguity + /// resolution. Apply truth decrement to the losing interpretation, + /// commit only the winner. Silent. + ErrorCorrection { winner_truth: TruthValue, + loser_decrement: f32 }, + + /// Loser has independent Markov / graph support — the + /// ambiguity IS the meaning (propaganda, sarcasm, unreliable + /// narrator). BOTH readings commit as separate triples with + /// their own NARS truth. The episode gets tagged as carrying + /// an "epiphany" marker for later surfacing.
+ Epiphany { winner_truth: TruthValue, + loser_truth: TruthValue, + reason_tag: EpiphanyTag }, +} + +pub enum EpiphanyTag { + /// Both readings coherent with ±5 Markov (legitimate ambiguity). + DualCoherence, + /// Graph carries hardened facts supporting the loser (narrative + /// unreliability / deliberate revision). + GraphConflictsWithLocal, + /// Cross-lingual bundle disagrees with English default + /// (colloquialism / idiom / translation-sensitive reading). + CrossLingualDivergence, + /// Thinking style self-flagged: this style's track record on + /// similar signal profile is low; flag the decision. + StyleLowConfidence, +} +``` + +**Routing via the two gates:** + +``` +Counterfactual outcome at meta-inference dispatch + │ + ├── winner_confidence > 0.88 AND loser_independent_support < 0.55 + │ → ErrorCorrection (routine; apply decrement silently) + │ + ├── winner_confidence > 0.88 AND loser_independent_support ≥ 0.55 + │ → Epiphany (preserve BOTH; tag the episode) + │ + └── winner_confidence ≤ 0.88 + → FailureTicket (escalate; don't commit either) +``` + +**Animal Farm examples:** + +- *Routine error-correction:* "Boxer worked hard all day." CF test: + is "Boxer" a horse, a breed, a boxing verb? Graph carries Boxer as + hardened rigid-designator (masculine, horse, carthorse, strong). + Loser readings have zero independent support → silent error + correction, commit only the horse reading. + +- *Epiphany:* "The windmill fell." CF test: Snowball's sabotage (per + Napoleon's narrative in ch. 6) vs the storm (original cause, ch. 6 + before Napoleon's retelling). Graph carries BOTH as committed + facts with separate NARS truths. Loser's independent support + ≥ 0.55 → Epiphany with `GraphConflictsWithLocal` tag. Both + triples commit; the episode is flagged as carrying propaganda + dissonance. This IS the literary meaning of Animal Farm. + +- *Cross-lingual epiphany:* "All animals are equal."
CF test: + straightforward reading ("every animal has equal status") vs + constitutional-axiom reading ("this is a founding rule"). Russian + translation uses different lexical forms ("равны" vs "равноправны") + that distinguish the readings. Cross-lingual bundle disagrees + with English default → Epiphany with `CrossLingualDivergence`. + +**Why this matters for AGI.** A system that can only error-correct +treats all ambiguity as noise. A system that can distinguish +epiphany from error recognizes when ambiguity carries meaning — +which is what sarcasm, allegory, propaganda, and poetry are made of. +Animal Farm is the canonical test because the novel's mechanism is +exactly this — narratorial unreliability made visible through +committed-vs-retold facts diverging. + +**Naming pattern:** Abduction-threshold-unbundle uses the same NARS +revision rule as hardness-unbundle; epiphany preservation uses the +same triplet-graph storage as any committed fact. Zero new storage +primitives — all from PR #208's pattern plus a disposition tag. + +**Edit:** `crates/lance-graph/src/graph/arigraph/episodic.rs` (+80 LOC) +— `unbundle_abductive`, `role_candidates_wider`, `lookup_rigid`. + +**Edit:** `crates/lance-graph/src/graph/arigraph/triplet_graph.rs` (+40 LOC) +— `role_bundle()` accessor + incremental update when new triples land. + +**(c) Contradictions as phase + magnitude (Staunen markers).** + +Each committed triple gets an optional **complex amplitude** beyond +its NARS `TruthValue`. This is a targeted application of Path 2's +phase-tag machinery — not to the whole memory field, just to +contradictions: + +```rust +pub struct Contradiction { + /// Angular distance from consensus position, in normalized + /// units: 0 = fully agrees, π (= PhaseTag::pi()) = fully opposite. + pub phase: PhaseTag, // 128 bits, from ladybug-rs hologram types + /// Strength of the dissonance: 0 = no contradiction worth + /// tracking, 1 = strong contradiction that commands attention.
+ pub magnitude: f32, +} + +pub struct Triplet { + // ... existing fields unchanged ... + /// Optional contradiction record. Present when this triple + /// conflicts with prior commitments; None when clean. + pub contradiction: Option<Contradiction>, +} + +impl TripletGraph { + /// When a new triple conflicts with existing content, compute + /// the phase angle from the consensus (via XOR of fingerprints) + /// and magnitude from NARS-truth divergence. Attach as + /// Contradiction to the new triple; fires Staunen if past + /// threshold. + pub fn commit_with_contradiction_check( + &mut self, + triple: Triplet, + ) -> CommitResult; +} + +pub enum CommitResult { + Clean, + Contradicts { staunen_fires: bool, magnitude: f32 }, +} +``` + +**Why this matters.** A flat NARS `TruthValue` represents belief +strength but can't carry **direction of disagreement**. Two triples +that disagree can have the same confidence but point opposite ways +in fact-space. Phase captures the direction; magnitude captures how +seriously we should take the disagreement. + +**Linkage to sigma_rosetta — Staunen → phase, Wisdom → magnitude.** + +The 64-glyph qualia vocabulary already contains the coordinates we +need. No new glyphs; just map: + +| Sigma family | Maps to | Interpretation | +|---|---|---| +| **Staunen family** (wonder, astonishment, unexpected — includes Staunen / Emberglow / Thornrose) | **phase** | Direction of the surprise. Which way does this contradiction point relative to consensus? Staunen at angle φ means "surprised *in this direction*." | +| **Wisdom family** (clarity, insight, understood, metacognition — typically in Cognitive glyphs 32-47) | **magnitude** | Depth of the insight. How much does this contradiction reveal? Wisdom near 1 = profound reveal; near 0 = shallow surprise. | + +Concretely: the per-cycle qualia vector's Staunen-family projection +becomes the `Contradiction.phase` angle, and its Wisdom-family +projection becomes the `magnitude`.
Contradiction detection is no +longer a separate computation — it's a projection onto two axes of +the qualia vocabulary we already ship. + +```rust +impl Contradiction { + /// Build a Contradiction from the current cycle's QualiaVector + /// by projecting onto Staunen (phase) and Wisdom (magnitude) + /// subspaces of the sigma_rosetta 64-glyph vocabulary. + pub fn from_qualia(q: &QualiaVector) -> Self { + Self { + phase: PhaseTag::from_angle(q.staunen_projection()), + magnitude: q.wisdom_projection().clamp(0.0, 1.0), + } + } +} +``` + +**Animal Farm example revisited:** "Some are more equal" fires +Staunen-family glyphs hard (the surprise direction is clear: toward +hierarchy) AND Wisdom-family glyphs hard (the reader understands +something profound about power). Phase = Staunen-axis angle (≈ π +from the prior claim); magnitude = Wisdom-axis depth (≈ 0.9). Both +high → Staunen marker fires on the WorldModelDto, epiphany committed +with the phase direction preserved. + +Routine contradictions fire Staunen-family low (no clear surprise +direction) and Wisdom-family low (no deep insight). Phase noisy, +magnitude near 0 → silent error correction, no Staunen attention. + +**This is the epiphany-vs-error-correction distinction at the +storage layer**, with the sigma_rosetta qualia vocabulary as the +natural coordinate system. We already compute these qualia per +cycle; contradiction detection becomes free reuse of the existing +machinery. + +**Animal Farm example:** "All animals are equal" (ch. 2) gets +committed with magnitude 0. "All animals are equal, but some are +more equal than others" (ch. 10) commits with phase ≈ π from the +prior fact (opposite direction in equality-space), magnitude ≈ 0.9. +Staunen fires. The system has detected the book's central +contradiction as a first-class fact, not as a rewrite of the +earlier triple. 
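
A minimal sketch of how `commit_with_contradiction_check` might derive the two numbers, assuming a 256-bit fingerprint and taking the Staunen threshold as a parameter because the plan names `STAUNEN_THRESHOLD` without giving a value. `Fingerprint`, `phase_from_fingerprints`, `magnitude_from_truth`, and `staunen_fires` are hypothetical names. The magnitude formula in particular is an assumption, since the plan only says "NARS-truth divergence".

```rust
use std::f32::consts::PI;

/// Hypothetical stand-in for the real fingerprint type (256 bits here).
pub type Fingerprint = [u64; 4];

/// Phase in [0, π]: the fraction of differing bits (XOR + popcount,
/// i.e. normalized Hamming distance) scaled onto a half-turn.
pub fn phase_from_fingerprints(new: &Fingerprint, consensus: &Fingerprint) -> f32 {
    let differing: u32 = new
        .iter()
        .zip(consensus.iter())
        .map(|(a, b)| (a ^ b).count_ones())
        .sum();
    let total_bits = (new.len() * 64) as f32;
    PI * (differing as f32) / total_bits
}

/// Magnitude in [0, 1] from two (frequency, confidence) pairs.
/// ASSUMPTION: the frequency gap weighted by the weaker confidence;
/// the plan does not define the divergence measure.
pub fn magnitude_from_truth(new: (f32, f32), prior: (f32, f32)) -> f32 {
    let (f_new, c_new) = new;
    let (f_old, c_old) = prior;
    ((f_new - f_old).abs() * c_new.min(c_old)).clamp(0.0, 1.0)
}

/// Staunen fires when the magnitude passes the (caller-supplied) threshold.
pub fn staunen_fires(magnitude: f32, threshold: f32) -> bool {
    magnitude > threshold
}
```

Under these assumptions, a fingerprint that is the bitwise complement of the consensus lands at phase π, and two confident but opposite truths give a magnitude close to the weaker confidence, which is the shape of the "some are more equal" case.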
+ +**Files (folded into existing D8 scope):** +- `crates/lance-graph/src/graph/arigraph/triplet_graph.rs` — + +40 LOC for `Contradiction` struct, `commit_with_contradiction_check`, + phase computation via XOR + Hamming. +- `crates/lance-graph-contract/src/grammar/mod.rs` — add + `STAUNEN_THRESHOLD` constant. +- No new files; ~40 LOC addition, ~30 LOC tests. + +**Edit:** `crates/lance-graph/src/graph/arigraph/mod.rs` — re-export. + +**Coref escalation path — the bridge that makes Animal Farm work:** + +``` +grammar: "He decreed it." ← "he"? + │ + ▼ +D4 ContextChain.disambiguate over ±5 (D5 Trajectory) + │ + ├── fixed pronoun? apply feature filter via D5 + │ Trajectory::role_candidates with masculine-animate codebook. + │ └── 1+ survivors? → commit. + │ + └── no candidate in ±5 → escalate via D8 + │ + ▼ + EpisodicMemory::role_candidates_wider(Subject, + masculine_codebook, StoryScale::Paragraph) + │ + ├── candidate at paragraph scope? → commit. + ├── else escalate to Chapter, then Document. + └── still nothing → lookup_rigid for hardened facts + (is "Napoleon" a committed masculine-animate boar + in the already-unbundled rigid-designator store?) + │ + ├── yes → commit with high-confidence NARS truth. + └── no → FailureTicket, LLM fallback. +``` + +**Why this ships with the grammar work:** without the bridge, coref +on Animal Farm fails on cross-chapter references. Napoleon is introduced +in chapter 2 and is referred to as "he" in chapter 9 — ±5-sentence +Markov can't reach chapter 2. The bridge resolves it because +Napoleon's masculine-animate commitments hardened early via PR #208's +`unbundle_hardened` and live as rigid-designator facts in the +existing triplet graph. + +**LOC estimate:** ~90 LOC on the AriGraph side (episodic.rs + +triplet_graph.rs edits + mod.rs + tests). No new module. Pure reuse +of shipped infrastructure.
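
The escalation ladder in the diagram above can be sketched as a plain function over three lookups. This is an illustration only: `resolve_coref`, `CorefOutcome`, and the closure-based lookups are hypothetical stand-ins for `Trajectory::role_candidates`, `EpisodicMemory::role_candidates_wider`, and `lookup_rigid`, whose real signatures are not shown in this plan.

```rust
/// Scopes named in the escalation diagram, tried narrowest first.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StoryScale {
    Paragraph,
    Chapter,
    Document,
}

/// Hypothetical result type: where the antecedent was found, if anywhere.
#[derive(Debug, PartialEq, Eq)]
pub enum CorefOutcome {
    /// Resolved inside the ±5 window.
    Local(String),
    /// Resolved after widening to the given scope.
    Wider(StoryScale, String),
    /// Resolved from the hardened rigid-designator store.
    Rigid(String),
    /// Nothing matched: hand off (the plan routes this to the LLM fallback).
    FailureTicket,
}

/// Walk the ladder: ±5 window, then Paragraph, Chapter, Document,
/// then rigid designators, stopping at the first hit.
pub fn resolve_coref<L, W, R>(local: L, wider: W, rigid: R) -> CorefOutcome
where
    L: Fn() -> Option<String>,
    W: Fn(StoryScale) -> Option<String>,
    R: Fn() -> Option<String>,
{
    if let Some(candidate) = local() {
        return CorefOutcome::Local(candidate);
    }
    for scale in [StoryScale::Paragraph, StoryScale::Chapter, StoryScale::Document] {
        if let Some(candidate) = wider(scale) {
            return CorefOutcome::Wider(scale, candidate);
        }
    }
    match rigid() {
        Some(candidate) => CorefOutcome::Rigid(candidate),
        None => CorefOutcome::FailureTicket,
    }
}
```

Passing the lookups as closures keeps the sketch independent of the graph types. The real implementation would presumably call the methods directly and attach NARS truth to each outcome.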
+ +### D9 — Story-Arc as an ONNX Learning Graph (export interface only; training deferred) + +**The idea.** The story is not just a factual graph (D8) and a ±5 +Markov trajectory (D5) — it also has an **arc**, a learned pattern +of how state transitions unfold. Animal Farm has a recognisable +corruption arc (equality → ambiguity → dominance → totalitarianism); +fairy tales have Propp-function arcs; news stories have +inverted-pyramid arcs. The arc is what predicts "what typically +happens next" given the committed state. + +**ONNX is the right format** because: +- It's already in our stack (CLAUDE.md Model Registry: + `jina-v5-onnx/` with `model.onnx` 2.3 GB). +- Rust ONNX runtimes exist (`ort`, `tract`) and are production-ready. +- ONNX graphs are portable — Netron visualizes them, any framework + consumes them, trained arcs can be shared across sessions and users. +- It's a **computational graph** at heart: nodes = states, edges = + transitions with weights. Exactly what a narrative arc is. + +**Three time scales of story representation (combined picture):** + +| Layer | Representation | Timescale | Purpose | +|---|---|---|---| +| Markov ±5 (D5) | VSA trajectory | Sentence / paragraph | Local coref / disambiguation | +| Episodic + Triplet graph (D8) | Node–edge graph | Session → Persistent | Committed facts lookup | +| Story arc (D9) | ONNX computational graph | Learned pattern | Predict next state given current | + +Each layer feeds the next. Committed facts populate the arc's +training signal; the arc predicts state transitions that the Markov +trajectory verifies. + +**Shipping scope for this PR (minimal):** + +1. **Export interface.** Add `graph.to_onnx()` on `TripletGraph` that + emits the committed-facts subgraph as a valid `.onnx` file. Nodes + = entity fingerprints; edges = SPO triples with NARS truth as + edge attributes. ONNX is graph-structured, not tensor-structured, + for this use — we use the GraphProto container with custom ops + for triples. +2.
**Integration hook.** Reserve a `story_arc: Option<ArcPredictor>` + field on the meta-inference context; default `None`. When set, the + predictor is consulted as a third axis alongside Counterfactual + and Markov during coref escalation. +3. **Arc-axis trait definition.** `ArcPredictor` trait with + `predict_next_state(current: &TrajectoryFingerprint) -> + StatePrediction`. No default implementation in this PR — + downstream crates or a future PR train the ONNX model and ship it + as a loadable asset. + +**Deferred to follow-up PR:** + +- Actual ONNX model training on committed-fact sequences. +- Pre-trained story-arc templates for common narrative patterns. +- Cross-story transfer (train on Animal Farm, apply to The Trial). +- Arc-axis integration into the meta-inference dispatch. + +**Arc pressure → awareness (architectural hook shipped now).** + +The arc carries two derivative signals that matter *now*, before +ONNX training lands: + +| Derivative | Interpretation | Awareness effect | +|---|---|---| +| `d(phase) / dt` | Direction of the arc is shifting | Plot twist / topic change / character reversal — widen attention | +| `d(magnitude) / dt` | Tension is accelerating | Climax / crisis / reveal — focus attention | +| Both low | Stable narrative flow | Routine dispatch, no special signal | +| Both high | Major arc event | Fire Staunen awareness | + +This is exactly how narrative theory defines "story beats" — the +moments the arc pivots. Computational detection is threshold +crossing on the derivative. The architectural hook ships now, the +ONNX training fills in the prediction later. + +```rust +/// Instantaneous arc pressure — computed per cycle from the +/// accumulated Markov trajectory + graph commitments. +pub struct ArcPressure { + /// Phase: direction of the arc (Staunen-family projection). + pub phase: f32, + /// Magnitude: tension / depth (Wisdom-family projection). + pub magnitude: f32, +} + +/// Rate of change between consecutive cycles. 
This is what the +/// cognitive shader reads as an awareness signal. +pub struct ArcDerivative { + pub dphase_dt: f32, + pub dmagnitude_dt: f32, +} + +impl ArcDerivative { + /// True when either derivative crosses the awareness threshold + /// — the system should surface this cycle as a story beat. + pub fn is_arc_shift(&self, threshold: f32) -> bool { + self.dphase_dt.abs() > threshold + || self.dmagnitude_dt.abs() > threshold + } +} +``` + +**Integration with the existing cognitive shader (shipped in +PR #204).** + +`MetaWord` in the contract already has 4 reserved awareness bits +(per PR #204 — `awareness: u8` packed into the u32). Use those bits +for arc-shift detection: + +``` +awareness 4 bits: + bit 0 — d(phase)/dt crossed threshold (arc direction shifting) + bit 1 — d(magnitude)/dt crossed threshold (tension accelerating) + bit 2 — Staunen marker fired (from D8 contradiction detection) + bit 3 — StateAnchor changed since last cycle (proprioception) +``` + +The shader's `MetaFilter.awareness_min` (shipped) already gates +dispatch on awareness. Arc shifts now contribute to this gate +without touching the shader code — they just set different bits. + +**Thinking-style dispatch reads arc pressure as a signal profile +axis** (extending D7's meta-inference duality to three axes: +Counterfactual / Markov / Arc). Styles can condition on it: + +- **Focused** style when tension accelerating + direction stable → + converge on the climax event. +- **Exploratory** style when direction shifting → widen candidate + enumeration, the narrative just changed frame. +- **Reflective** style when tension falling + shift complete → + integrate what just happened (NARS revision on recent commits). +- **Deliberate** style when both derivatives high → major arc pivot, + run the full inference pipeline, commit carefully. + +**ONNX training signal (deferred to follow-up).** Every cycle emits +a (state, arc_pressure, arc_derivative) tuple. 
These are the +training pairs for the ONNX narrative-arc model — the model learns +to predict the next arc_pressure given the current state. Once +trained, it fills the third axis of the meta-inference dispatch. + +**Awareness = noticing the arc change.** Computationally modest +(two subtractions + threshold check per cycle) but architecturally +meaningful: the system observes shifts in its own narrative +reasoning, not just the content. That's what "becoming aware of +shifts in the story arc as actual awareness" means in code — +derivative crossings feed the shader's awareness gate, which +changes which style dispatches, which changes what the system +notices about itself. + +**Files (folded into D9 scope):** +- `crates/lance-graph-contract/src/grammar/arc_predictor.rs` — + add `ArcPressure`, `ArcDerivative`, `is_arc_shift`. Keep zero-dep. + (~40 LOC added to the existing +60.) +- `crates/cognitive-shader-driver/src/engine_bridge.rs` — wire arc + derivative into the awareness bits of MetaWord. (~30 LOC edit.) +- No new crates; same deliverable, expanded scope. + +**Files:** + +- `crates/lance-graph/src/graph/arigraph/triplet_graph.rs` — + `to_onnx()` method (+80 LOC). Uses `prost` or raw `.onnx` protobuf + emission; no runtime dep on `ort` or `tract` in this PR. +- `crates/lance-graph-contract/src/grammar/arc_predictor.rs` — NEW + (+60 LOC). `ArcPredictor` trait, `StatePrediction` struct, + `StatePrediction::margin_above(threshold)` helper. Zero-dep; just + the interface. +- `crates/lance-graph-contract/src/grammar/mod.rs` — re-export. + +**LOC:** ~140 added to the total. Total still one PR. + +**Why ship the interface now even without the model:** the +meta-inference in D7 is defined with 2 axes (CF + Markov). Pushing +the arc predictor to a later PR means the meta-inference dispatch +has to be refactored then. Shipping the trait now lets D7 reserve +the slot, and the follow-up PR just lands the implementation. 
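
A minimal sketch of what reserving the slot could look like, assuming the trait is stored as a boxed trait object (the plan writes `Option<ArcPredictor>`, which would need `Box<dyn …>` or a generic parameter for a trait). `TrajectoryFingerprint` and `StatePrediction` are reduced to hypothetical one-field stand-ins, and `MetaInferenceContext`, `active_axes`, `arc_vote`, and `ConstantArc` are illustrative names, not part of the contract.

```rust
/// Hypothetical stand-in for the D5 trajectory fingerprint.
pub struct TrajectoryFingerprint(pub Vec<f32>);

/// ASSUMPTION: a single confidence score; the real struct may carry more.
pub struct StatePrediction {
    pub confidence: f32,
}

impl StatePrediction {
    /// True when the prediction clears the caller's threshold.
    pub fn margin_above(&self, threshold: f32) -> bool {
        self.confidence > threshold
    }
}

/// The arc axis: interface only, no default implementation.
pub trait ArcPredictor {
    fn predict_next_state(&self, current: &TrajectoryFingerprint) -> StatePrediction;
}

/// Hypothetical meta-inference context with the reserved third slot.
pub struct MetaInferenceContext {
    pub story_arc: Option<Box<dyn ArcPredictor>>,
}

impl MetaInferenceContext {
    /// Default: no arc predictor, so dispatch runs on CF + Markov only.
    pub fn new() -> Self {
        Self { story_arc: None }
    }

    /// Two axes until a predictor is plugged in, three afterwards.
    pub fn active_axes(&self) -> usize {
        if self.story_arc.is_some() { 3 } else { 2 }
    }

    /// `None` when no predictor is set; otherwise its thresholded vote.
    pub fn arc_vote(&self, current: &TrajectoryFingerprint, threshold: f32) -> Option<bool> {
        self.story_arc
            .as_ref()
            .map(|p| p.predict_next_state(current).margin_above(threshold))
    }
}

/// Illustrative test double that always predicts with a fixed confidence.
pub struct ConstantArc(pub f32);

impl ArcPredictor for ConstantArc {
    fn predict_next_state(&self, _current: &TrajectoryFingerprint) -> StatePrediction {
        StatePrediction { confidence: self.0 }
    }
}
```

Because callers already handle the `None` case, a follow-up PR can set `story_arc` to a trained predictor without changing the dispatch signature.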
+ +### D7 — Grammar Thinking Styles as Meta-Inference Policies + +**The frame.** The 7 NARS inferences are **permanent logical operators** +(Deduction / Induction / Abduction / Revision / Synthesis / +Extrapolation / Counterfactual). They don't change. What a thinking +style is, is a **meta-inference policy**: given the evidence signals +from an attempted parse, which logical operator fits? + +Meta-inference = inference *about* inference. Reasoning about the +reasoning. The style doesn't just "apply Deduction"; it evaluates +whether Deduction was the appropriate operator given the signal +profile, and if not, meta-reasons which operator to escalate to. + +**Permanent logical core.** The signal-profile → inference dispatch +rules are **axiomatic**, not empirical: + +| Signal from the grammar attempt | Logically implies | +|---|---| +| Morphology unambiguous (e.g. Finnish Inessive `-ssa` "in") | **Deduction** — rule directly applicable | +| ≥ 3 of 8 Pearl SPO 2³ masks plausible | **Abduction** — need best-explanation, deduction can't decide | +| All TEKAMOLO slots fill coherently | **Deduction** closed the parse | +| Any TEKAMOLO slot ambiguous | **Counterfactual Synthesis** — test alternative fillings | +| Markov coherence < threshold at focal | **Abduction** — re-explain this sentence given the flow | +| Novel surface form, no matching template | **Extrapolation** — extend nearest known pattern | +| Conflicting evidence between parses of same claim | **Revision** — merge truths | +| Multiple independent signals agree | **Synthesis** — bind into combined inference | + +These rules are permanent. They don't drift with data. A grammar style +that routes morphology-unambiguous-signal to Abduction is *logically +wrong*, regardless of how many parses it has revised. + +**The two orthogonal reasoning axes (the meta-inference duality).** + +A parse attempt is tested from two independent angles. Their agreement +or disagreement IS the awareness signal. 
+ +``` + PARSE ATTEMPT + │ + ┌─────────────┴─────────────┐ + ▼ ▼ + COUNTERFACTUAL AXIS MARKOV CHAIN AXIS + (within-sentence) (cross-sentence) + │ │ + morphology + TEKAMOLO ±5 context chain + + SPO 2³ causal mask + Mexican-hat kernel + │ │ + enumerates the local scores this parse + alternative space against the discourse + (finite counterfactuals) flow (continuous coherence) + │ │ + ▼ ▼ + "Is this parse the best "Does this parse fit + among its plausible the surrounding flow?" + alternatives?" + │ │ + └──────────┬────────────────┘ + ▼ + JOINT READING: + · both strong → HIGH confidence (Synthesis) + · CF strong, Markov weak → topic shift / novelty at focal + · CF weak, Markov strong → locally ambiguous but contextually committed + → try cross-lingual bundle (Finnish etc.) + · both weak → genuine novelty → escalate (FailureTicket) +``` + +The counterfactual axis is where **heavy-grammar morphology earns its +keep**. Each morphological commitment *eliminates* counterfactual +branches: + +| Language | Surface form | Commits | Counterfactual space | +|---|---|---|---| +| English | "The book on the table" | nothing | 8 SPO 2³ × {Lokal, Modal, Topical} TEKAMOLO = 24 branches | +| Russian | "Книга **на столе**" (Prepositional `-е`) | Lokal slot | 8 × 1 = 8 branches | +| Finnish | "Kirja **pöydällä**" (Adessive `-llä`) | Lokal at-surface | 8 × 1 = 8 branches | +| Russian | "Резать **ножом**" (Instrumental `-ом`) | Modal / means | 8 × 1 = 8 branches | +| German | "mit dem Messer" (Dativ after `mit`) | Modal / means | 8 × 1 = 8 branches | +| Turkish | "masa-**da**" (Locative `-da`) | Lokal | 8 × 1 = 8 branches | + +A morphology-rich language hands you a counterfactual space that's 1/N +the size of English's. The counterfactual axis has less work to do, so +the Markov axis is what decides the remaining ambiguity — and the +Markov axis is where **Counterfactual Synthesis** (NARS inference) +tests alternatives against ±5 coherence. 
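
The joint reading and the branch arithmetic from the table can be written down directly. The following is a sketch under stated assumptions: each axis is reduced to a single score compared against one shared `strong_at` cutoff, and `JointReading`, `joint_reading`, and `counterfactual_branches` are hypothetical names.

```rust
/// The four quadrants of the joint-reading diagram.
#[derive(Debug, PartialEq, Eq)]
pub enum JointReading {
    /// Both axes strong: high confidence (Synthesis).
    HighConfidence,
    /// Counterfactual strong, Markov weak: topic shift / novelty at focal.
    TopicShiftAtFocal,
    /// Counterfactual weak, Markov strong: try the cross-lingual bundle.
    TryCrossLingualBundle,
    /// Both weak: genuine novelty, escalate via FailureTicket.
    Escalate,
}

/// Classify a parse from its two axis scores.
/// ASSUMPTION: one shared cutoff decides "strong" on both axes.
pub fn joint_reading(cf_score: f32, markov_score: f32, strong_at: f32) -> JointReading {
    match (cf_score >= strong_at, markov_score >= strong_at) {
        (true, true) => JointReading::HighConfidence,
        (true, false) => JointReading::TopicShiftAtFocal,
        (false, true) => JointReading::TryCrossLingualBundle,
        (false, false) => JointReading::Escalate,
    }
}

/// Size of the counterfactual space: 8 SPO 2³ masks times the number of
/// TEKAMOLO readings the morphology left open.
pub fn counterfactual_branches(tekamolo_candidates: usize) -> usize {
    const SPO_MASKS: usize = 8; // 2³ Pearl causal masks
    SPO_MASKS * tekamolo_candidates
}
```

With three open TEKAMOLO readings (the English row) this gives 24 branches. With one committed slot (the Russian, Finnish, German, and Turkish rows) it gives 8.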
+ +**Hydration, not extraction.** The SPO triple isn't just "S, P, O" — +it's **hydrated** with the Pearl 2³ causal mask from the morphology. +The Instrumental `-ом` in "Убил **ножом**" hydrates the SPO +(Killer, Killed, Knife) with mask bit 1 (enabling) committed — the +knife didn't CAUSE directly, it enabled. This hydration happens at +extraction time, no separate causal-inference pass needed. + +``` +Surface: "X убил Y ножом" + morphology: Acc(Y) + Ins(nóž) + ↓ +Extracted SPO: (X, убить, Y) + ↓ +Hydrated: (X, убить, Y) + + Pearl.direct = 1 (X performed the action) + + Pearl.enabling = 1 (knife enabled) + + Pearl.confounding = 0 + → mask = 0b011 = 0x03 + ↓ +Counterfactual space collapsed from 8 → 1 branch +because Instrumental morphology committed the enabling role. +``` + +In English "X killed Y with a knife" — the `with` preposition is a +Wechsel (could be Modal / Instrumental / Comitative). Counterfactual +space stays at 3 branches until Markov coherence or cross-lingual +bundle collapses it. + +**Markov vs Counterfactual: when each is primary.** + +| Situation | Primary axis | Reason | +|---|---|---| +| Heavy morphology, clear discourse | Neither — Deduction closes | Both axes agree trivially | +| Heavy morphology, weird discourse | **Markov primary** | Counterfactual already collapsed; surprise is cross-sentence | +| Light morphology (English), clear discourse | **Counterfactual primary** | ±5 context can't help if sentence itself is ambiguous; must enumerate SPO 2³ × TEKAMOLO branches | +| Light morphology, weird discourse | **Both weak → FailureTicket** | Escalate — neither axis has grip | +| Cross-lingual bundle available | **Bundle collapses CF** | Finnish / Russian morphology from bundled parse commits what English left ambiguous | + +This is what "meta-inference between permanent logical reasoning" +means: the style doesn't just pick a NARS operator — it evaluates +*which axis has grip* on this parse, then picks the operator for that +axis. 
The permanence is in the axes themselves and their combination +rules; the empirical layer is the prior over which axis wins on the +style's content distribution. + +**Linguistic-precision correction (fix from yesterday's draft).** The +Finnish case table in `grammar-tiered-routing.md` wrote "Accusative +`-n/-t` → Object" — this is a Latinate transplant. Finnish object +marking is actually: + +- Total object: **Nominative** (plural) or **Genitive `-n`** (singular) +- Partial / negated object: **Partitive `-a/-ä`** +- True **Accusative**: only for personal pronouns (`minut`, `sinut`, + `hänet`, `meidät`, `teidät`, `heidät`) + +The grammar-landscape doc (D0) corrects this. Each language's case +table uses its native case inventory, not a forced Latinate mapping. + +**Russian 6 cases — full inventory** (needed for the Russian priority +language per OSINT): + +| Case | Suffix pattern (sg masc / fem / neut) | Role | +|---|---|---| +| Nominative | -ø / -а, -я / -о, -е | Subject (S) | +| Genitive | -а, -я / -ы, -и / -а, -я | Possessor / negated object / partitive | +| Dative | -у, -ю / -е, -и / -у, -ю | Recipient — often TEKAMOLO Kausal indirect ("to X") | +| Accusative | = Nom (inanimate) / = Gen (animate) / -у, -ю / -о, -е | Direct object (O) | +| **Instrumental** | -ом, -ем / -ой, -ей / -ом, -ем | Means / agent in passive — **TEKAMOLO Modal** ("by means of X") | +| Prepositional (Locative) | -е / -е, -и / -е | Governed by `в`/`на`/`о` prepositions — TEKAMOLO Lokal or Temporal | + +Russian Instrumental is exactly the Finnish Adessive `-lla/-llä` +(means/instrument) plus the Finnish Essive `-na/-nä` (role/state) +folded together — it commits to TEKAMOLO Modal by morphology alone. +This is why morphologically-rich languages are easier: one case +ending carries a slot assignment that English would need surrounding +prepositions + word-order to infer. 
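
The Role column of the Russian table can be encoded as a lookup from case to the roles it leaves open. This is a sketch only: `RussianCase`, `RoleHint`, `role_hints`, and `commits_slot` are hypothetical names, and the mapping copies the table above rather than any shipped case table.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RussianCase {
    Nominative,
    Genitive,
    Dative,
    Accusative,
    Instrumental,
    Prepositional,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoleHint {
    Subject,
    Possessor,
    NegatedObject,
    Partitive,
    Recipient,
    DirectObject,
    TekamoloModal,
    TekamoloLokal,
    TekamoloTemporal,
}

/// Roles each case leaves open, following the Role column of the table.
pub fn role_hints(case: RussianCase) -> &'static [RoleHint] {
    match case {
        RussianCase::Nominative => &[RoleHint::Subject],
        RussianCase::Genitive => &[
            RoleHint::Possessor,
            RoleHint::NegatedObject,
            RoleHint::Partitive,
        ],
        RussianCase::Dative => &[RoleHint::Recipient],
        RussianCase::Accusative => &[RoleHint::DirectObject],
        RussianCase::Instrumental => &[RoleHint::TekamoloModal],
        RussianCase::Prepositional => &[RoleHint::TekamoloLokal, RoleHint::TekamoloTemporal],
    }
}

/// True when the case ending alone commits a single role.
pub fn commits_slot(case: RussianCase) -> bool {
    role_hints(case).len() == 1
}
```

Returning a slice rather than a single role keeps the ambiguous cases (Genitive, Prepositional) visible to the counterfactual axis instead of hiding them behind a forced choice.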
+ +Analogous precision is needed for every priority language on the OSINT +list: German 4 cases (Nom/Gen/Dat/Akk), Arabic 3 + triliteral root +system, Turkish 6 + agglutinative suffix chain, Japanese particles +(が / を / に / で / へ / と / から / まで), Hebrew no-cases with +root-pattern morphology. Each gets its native table. + +**What drifts (and where awareness lives).** Styles differ in their +**priors over signal-profile frequency** — how often each profile +occurs in the style's observed content distribution. Analytical style +expects clean morphology signals; Exploratory expects ambiguous ones. +These priors revise via NARS based on actual parse outcomes, but the +signal→inference dispatch rules underneath stay axiomatic. + +**So awareness has two layers:** + +1. **Permanent layer** (the logical core): signal-profile → + NARS-inference dispatch table. Shared across all styles. Not + revised by data. Encoded as a pure function, not as configuration. + +2. **Empirical layer** (the priors): each style's expected distribution + of signal profiles on its content. NARS-revised per style based on + parse outcomes. This IS the style's grammar awareness. + +The style's runtime behaviour = permanent dispatch rules × style's +empirical prior × current signal profile. + +Structural parallel: `agi-chat/src/grammar/grammar-awareness.ts` +(237 LOC) implements exactly this separation — the dispatch is +"permanent" via pure TypeScript rules; the `awareness` state is what +the style has learned about its content. + +**New:** `crates/lance-graph-contract/src/grammar/thinking_styles.rs` +(+260 LOC) — three types: + +```rust +/// Static prior loaded from YAML.
+pub struct GrammarStyleConfig { + pub style: ThinkingStyle, + pub nars: NarsPriorityChain, // primary + fallback + pub morphology: MorphologyPolicy, // tables, agglutinative + pub tekamolo: TekamoloPolicy, // slot priority, require_fillable + pub markov: MarkovPolicy, // radius, kernel, replay + pub spo_causal: SpoCausalPolicy, // pearl_mask, tolerance + pub coverage: CoveragePolicy, // local_threshold, escalate_below +} + +/// NARS truth values per parameter slot. Mutates at runtime. +/// This is the "awareness" — the style's track record. +pub struct GrammarStyleAwareness { + pub style: ThinkingStyle, + /// Truth per (parameter_axis, value) pair. Updates on each parse. + pub param_truths: HashMap<ParamKey, TruthValue>, + /// Aggregate success rate of this style over recent window. + pub recent_success: TruthValue, + /// Count of parses this style has driven. + pub parse_count: u64, +} + +pub enum ParamKey { + NarsPrimary(NarsInference), // did this inference work? + MorphologyTable(MorphologyTableId), // did this table help? + TekamoloSlot(TekamoloSlot), // did this slot fill right? + MarkovKernel(WeightingKernel), // did this kernel resolve? + SpoCausalMask(u8), // did this mask fit? +} + +impl GrammarStyleAwareness { + /// Revise a single parameter's truth after a parse outcome. + pub fn revise(&mut self, key: ParamKey, outcome: ParseOutcome); + + /// Derive a runtime config from prior + awareness. + /// Parameters with low accumulated truth get down-weighted. + pub fn effective_config(&self, prior: &GrammarStyleConfig) + -> GrammarStyleConfig; +} + +pub enum ParseOutcome { + LocalSuccess, // → truth +ε + LocalSuccessConfirmedByLLM, // → truth +2ε + EscalatedButLLMAgreed, // → truth +ε/2 + EscalatedAndLLMDisagreed, // → truth −ε + LocalFailureLLMSucceeded, // → truth −2ε +} +``` + +YAML parser: zero-dep mini-parser reading the subset used by +`grammar_styles/*.yaml` (plain key/value + simple lists, no nesting +beyond one level). 
Alternatively expose a pure `from_str` taking +pre-parsed `&[(key, value)]` so the contract stays zero-dep and +deepnsm does the YAML read. + +**New:** `crates/deepnsm/assets/grammar_styles/*.yaml` — the starter +catalogue (12 files). Each YAML config grounds a style in the 4 +dimensions: + +```yaml +# analytical.yaml — strict rule-apply, English SVO, case deductive +style: analytical +nars: + primary: Deduction + fallback: Abduction +morphology: + tables: [english_svo, finnish_case_table] # try SVO first + agglutinative_mode: false +tekamolo: + priority: [temporal, lokal, kausal, modal] # TE+LO > KA > MO + require_fillable: true # fail if slots ambiguous +markov: + radius: 5 + kernel: uniform # no anticipation + replay: forward # no backward replay +spo_causal: + pearl_mask: 0x01 # commit to direct causation only + ambiguity_tolerance: 0.1 +coverage: + local_threshold: 0.90 # strict + escalate_below: 0.85 +``` + +```yaml +# exploratory.yaml — counterfactual Wechsel resolution +style: exploratory +nars: + primary: CounterfactualSynthesis + fallback: Abduction +morphology: + tables: [english_svo, finnish_case_table, russian_case_table] + agglutinative_mode: true # peel suffixes R→L +tekamolo: + priority: [modal, kausal, lokal, temporal] # explore MO first + require_fillable: false # tolerate gaps +markov: + radius: 5 + kernel: mexican_hat # anticipation on + replay: both_and_compare # test both directions +spo_causal: + pearl_mask: 0xFF # all 8 causal configs plausible + ambiguity_tolerance: 0.4 +coverage: + local_threshold: 0.70 # permissive, try hard before escalating + escalate_below: 0.50 +``` + +12 starter configs: analytical / convergent / systematic / creative / +divergent / exploratory / focused / diffuse / peripheral / intuitive / +deliberate / metacognitive. Each is ≤ 40 lines YAML. 
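
A minimal sketch of the zero-dep mini-parser option, assuming the subset described above: top-level `key: value` lines, one level of indented keys under a `section:` header, inline `[a, b]` lists, and `#` comments. `parse_style_yaml` and `parse_list` are hypothetical names, and the flat `section.key` output is one possible shape for the pre-parsed key/value input, not the contract's actual API.

```rust
/// Parse the restricted YAML subset into flat ("section.key", "value") pairs.
/// ASSUMPTION: values never contain '#' or a second ':' that matters, and
/// nesting never goes deeper than one indented level.
pub fn parse_style_yaml(src: &str) -> Vec<(String, String)> {
    let mut out = Vec::new();
    let mut section = String::new();
    for raw in src.lines() {
        // Drop trailing comments, then skip blank lines.
        let line = match raw.find('#') {
            Some(idx) => &raw[..idx],
            None => raw,
        };
        if line.trim().is_empty() {
            continue;
        }
        let indented = line.starts_with(' ');
        let (key, value) = match line.trim().split_once(':') {
            Some((k, v)) => (k.trim(), v.trim()),
            None => continue, // not a key/value line; ignored in this sketch
        };
        if !indented && value.is_empty() {
            // "nars:" opens a section for the indented keys that follow.
            section = key.to_string();
        } else if indented {
            out.push((format!("{}.{}", section, key), value.to_string()));
        } else {
            // Top-level scalar such as "style: analytical".
            section.clear();
            out.push((key.to_string(), value.to_string()));
        }
    }
    out
}

/// Split an inline list like "[temporal, lokal]" into its items.
pub fn parse_list(value: &str) -> Vec<String> {
    value
        .trim()
        .trim_start_matches('[')
        .trim_end_matches(']')
        .split(',')
        .map(|item| item.trim().to_string())
        .filter(|item| !item.is_empty())
        .collect()
}
```

Keeping values as strings leaves typed interpretation (for example reading `0x01` as a mask or `0.90` as a threshold) to the caller, which is one way to keep the parser itself free of schema knowledge.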
+ +**Edit:** `crates/deepnsm/src/ticket_emit.rs` (+60 LOC) — load style's +`effective_config` (prior ⊕ awareness) at dispatch; populate +`FailureTicket::attempted_inference` from the style's NARS-revised +top-ranked inference, not the static YAML primary. On ticket +resolution (LLM returns, or local success confirmed), call +`awareness.revise(ParamKey::NarsPrimary(...), outcome)`. + +**Edit:** `crates/deepnsm/src/markov_bundle.rs` (+50 LOC) — `MarkovBundler` +reads kernel shape + radius from the active style's effective config; +on bundle success/failure, revise +`ParamKey::MarkovKernel(kernel)`. + +**Edit:** `crates/lance-graph-contract/src/grammar/mod.rs` — re-export +`GrammarStyleConfig`, `GrammarStyleAwareness`, `ParamKey`, +`ParseOutcome`. + +**Awareness lifecycle:** + +1. **Bootstrap:** load YAML prior → `GrammarStyleConfig`. +2. **Initialize:** construct `GrammarStyleAwareness` with `param_truths` + at neutral `TruthValue { f: 0.5, c: 0.01 }` (no evidence yet). +3. **Per-parse:** derive `effective_config = prior ⊕ awareness`; run + parse; emit outcome; call `awareness.revise(...)` for each + parameter that fired. +4. **NARS revision rule** (already in contract): + `f_new = (f_old × c_old + f_observed × c_observed) / (c_old + c_observed)` + `c_new = (c_old + c_observed) / (c_old + c_observed + 1)` +5. **Promotion:** when `recent_success.c > 0.8 && recent_success.f > 0.75` + the style is "earned its confidence" and its effective config + diverges more aggressively from the prior. +6. **Persistence** (out of scope for this PR, flagged): serialize + `GrammarStyleAwareness` to `BindSpace` row or sled KV so awareness + survives session boundaries. Next PR. 
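
The revision and promotion steps can be checked numerically. The sketch below implements the rule from step 4 literally as `revise_literal` and, as an assumption about what the contract may do, a second variant `revise_weighted` that first converts each confidence to an evidence weight w = c / (1 − c), as in standard NARS revision. The local `TruthValue` and all method names are hypothetical. The distinction matters for step 5: read literally with confidences in [0, 1], `c_old + c_observed` is at most 2, so `c_new` cannot exceed 2/3 and the `c > 0.8` promotion gate is only reachable under the weighted reading.

```rust
/// Local stand-in for the contract's NARS truth value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TruthValue {
    pub f: f32,
    pub c: f32,
}

/// Evidence weight for a confidence, w = c / (1 - c), clamped so that
/// c = 1.0 does not divide by zero.
fn evidence_weight(c: f32) -> f32 {
    let c = c.clamp(0.0, 0.9999);
    c / (1.0 - c)
}

impl TruthValue {
    /// Neutral starting point from the lifecycle (no evidence yet).
    pub const NEUTRAL: TruthValue = TruthValue { f: 0.5, c: 0.01 };

    /// The rule exactly as written in step 4, using confidences as weights.
    pub fn revise_literal(self, observed: TruthValue) -> TruthValue {
        let total = self.c + observed.c;
        if total == 0.0 {
            return self;
        }
        TruthValue {
            f: (self.f * self.c + observed.f * observed.c) / total,
            c: total / (total + 1.0),
        }
    }

    /// ASSUMPTION: the same rule applied to evidence weights instead.
    pub fn revise_weighted(self, observed: TruthValue) -> TruthValue {
        let (w_old, w_obs) = (evidence_weight(self.c), evidence_weight(observed.c));
        let total = w_old + w_obs;
        if total == 0.0 {
            return self;
        }
        TruthValue {
            f: (self.f * w_old + observed.f * w_obs) / total,
            c: total / (total + 1.0),
        }
    }

    /// Promotion gate from step 5.
    pub fn is_promoted(self) -> bool {
        self.c > 0.8 && self.f > 0.75
    }
}
```

Under the weighted reading, an observation with `c = 0.5` counts as one unit of evidence, so a style starting from the neutral value crosses the promotion gate after a handful of consistent successes rather than never.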
+ +**Grounding summary** (the 4 axes per style): + +| Axis | What the style picks | +|---|---| +| **SPO 2³** | Which Pearl causal-mask bits to commit to (0x01 = direct only; 0xFF = all plausible); ambiguity tolerance | +| **Morphology** | Which case tables to consult, in what order; agglutinative suffix-peeling on/off | +| **TEKAMOLO** | Slot priority order; whether to require all slots fillable or tolerate gaps | +| **Markov bundling** | Radius (default 5); kernel shape (uniform / mexican_hat / gaussian); replay direction | + +The style IS the grammar-reasoning policy. The YAML makes it editable +without touching Rust. + +## Critical Files + +| File | Change | LOC | +|---|---|---| +| `.claude/knowledge/grammar-landscape.md` | NEW | +300 | +| `crates/deepnsm/Cargo.toml` | features | +10 | +| `crates/deepnsm/src/parser.rs` | coverage branch | +30 | +| `crates/deepnsm/src/ticket_emit.rs` | NEW + style-aware emission + revise on outcome | +180 | +| `crates/deepnsm/src/triangle_bridge.rs` | NEW | +220 | +| `crates/deepnsm/src/markov_bundle.rs` | NEW + style-driven kernel + revise | +270 | +| `crates/deepnsm/src/trajectory.rs` | NEW | +80 | +| `crates/deepnsm/src/lib.rs` | re-exports | +20 | +| `crates/deepnsm/assets/grammar_styles/*.yaml` | 12 YAML configs | +480 | +| `crates/lance-graph-contract/src/grammar/context_chain.rs` | reasoning ops | +140 | +| `crates/lance-graph-contract/src/grammar/role_keys.rs` | NEW | +160 | +| `crates/lance-graph-contract/src/grammar/thinking_styles.rs` | NEW — config + awareness + revise | +260 | +| `crates/lance-graph-contract/src/grammar/mod.rs` | re-export | +10 | +| `crates/deepnsm/tests/integration.rs` | NEW | +110 | +| `crates/deepnsm/tests/ticket.rs` | NEW | +40 | +| `crates/deepnsm/tests/triangle.rs` | NEW | +60 | +| `crates/deepnsm/tests/styles.rs` | NEW — YAML load + awareness revision + effective_config drift | +140 | +| `crates/deepnsm/benches/parse.rs` | NEW | +40 | + +**Total:** ~2,490 LOC, 18 files + 12 YAML configs, one 
PR. + +## Reused (not rebuilt) + +- Contract: `FailureTicket`, `PartialParse`, `CausalAmbiguity`, + `TekamoloSlots`, `WechselAmbiguity`, `FinnishCase`, `NarsInference`, + `ContextChain`, `SentenceCrystal`, `ContextCrystal`, + `CrystalFingerprint` (sandwich), `Structured5x5` bipolar. +- `lance-graph-cognitive::grammar::{GrammarTriangle, NSMField, + CausalityFlow, Qualia18D}` (1,929 LOC shipped). +- `ndarray::hpc::vsa::{vsa_bind, vsa_bundle, vsa_permute, + vsa_similarity, vsa_hamming}`. +- `ndarray::hpc::bitwise::hamming_batch_raw`. + +## The NARS + Morphology + TEKAMOLO Triad (unified mechanism) + +The three load-bearing concepts compose into a single inference +machine. Each layer supplies what the others need: + +``` + ┌──────────────── 144 VERB-ROLE TAXONOMY ────────────────┐ + │ 12 semantic families × 12 tense/aspect variants. │ + │ Each verb carries a prior over which TEKAMOLO slots │ + │ it expects filled (e.g. TRANSFER needs Kausal + Lokal; │ + │ STATE needs Modal; MOTION needs Lokal + Modal). │ + └────────────┬────────────────────────────────────────────┘ + │ verb-identification (DeepNSM FSM + COCA) + ▼ + ┌──────────── MORPHOLOGY (per-language cases) ────────────┐ + │ Russian Inst -ом → Modal (means/instrument) │ + │ Finnish Ade -lla → Modal (at/by) │ + │ Finnish Ine -ssa → Lokal (in/inside) │ + │ German Dat + mit → Modal │ + │ English «with» → Wechsel(Modal | Comitative | Topical) │ + └────────────┬────────────────────────────────────────────┘ + │ case-table lookup per token + ▼ + ┌──────────── TEKAMOLO SLOT CANDIDATES ───────────────────┐ + │ Each token's morphology proposes a slot + truth. │ + │ Unambiguous morphology → single slot with high truth. │ + │ Ambiguous (English prep, case syncretism) → multiple. │ + └────────────┬────────────────────────────────────────────┘ + │ propose + ▼ + ┌──────────── NARS INFERENCE OVER SLOTS ──────────────────┐ + │ Deduction : verb's slot-prior + morphology-committed │ + │ slot agree → commit with high truth. 
│ + │ Induction : N past sentences had same (verb, morphology │ + │ → slot) → reinforce the pattern's prior. │ + │ Abduction : ambiguous morphology → pick slot that best │ + │ explains the verb's prior + Markov context. │ + │ Revision : another parse of same sentence (cross- │ + │ lingual or re-read) → merge truths on slots.│ + │ Synthesis : multiple independent signals (morphology + │ + │ word-order + context) → bind into single │ + │ slot assignment. │ + │ Counterfactual : test alternative slot fillings against │ + │ ±5 Markov coherence + graph story context. │ + │ Extrapolation : novel morphology on known verb family → │ + │ extend by analogy. │ + └────────────┬────────────────────────────────────────────┘ + │ + ▼ + committed slot assignment with NARS truth; + role-indexed VSA bundling (D5) lands content in slot's slice. +``` + +**The reason 144 works.** The verb's slot-prior is what turns +TEKAMOLO from a 4-slot schema into a structured expectation: + +| Verb family (12) | Expected TEKAMOLO profile | +|---|---| +| BECOMES | Temporal + Modal | +| CAUSES | Subject + Object + Kausal | +| SUPPORTS | Object + Modal | +| CONTRADICTS | Object + Modal (adversative) | +| REFINES | Object + Modal | +| GROUNDS | Object + Lokal | +| ABSTRACTS | Object | +| ENABLES | Object + Kausal | +| PREVENTS | Object + Kausal (negated) | +| TRANSFORMS| Object + Temporal + Modal | +| MIRRORS | Object + Modal | +| DISSOLVES | Object + Temporal | + +× 12 tense/aspect/mood variants (present / past / future / perfect / +continuous / pluperfect / future-perfect / habitual / potential / +imperative / subjunctive / gerund) → 144 verb-role cells, each with +its own TEKAMOLO slot prior. Parsing a verb looks up its row; +morphology fills its columns; NARS inference reconciles them. 
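
The "look up the row, let morphology fill the columns" step can be sketched as plain indexing. This is an assumption-laden illustration: the bit-flag encoding, `cell_index`, `expected_slots` and `unfilled_slots` are hypothetical names; Subject and Object from the table are left out because they are not TEKAMOLO slots; and the prior is keyed by family only, whereas the text gives each of the 144 cells its own prior.

```rust
/// The 12 verb families from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VerbFamily {
    Becomes, Causes, Supports, Contradicts, Refines, Grounds,
    Abstracts, Enables, Prevents, Transforms, Mirrors, Dissolves,
}

/// The 12 tense/aspect/mood variants listed in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tense {
    Present, Past, Future, Perfect, Continuous, Pluperfect,
    FuturePerfect, Habitual, Potential, Imperative, Subjunctive, Gerund,
}

// TEKAMOLO slots as bit flags (hypothetical encoding).
pub const TEMPORAL: u8 = 0b0001;
pub const KAUSAL: u8 = 0b0010;
pub const MODAL: u8 = 0b0100;
pub const LOKAL: u8 = 0b1000;

/// Flat index of a (family, tense) cell in a 12 x 12 table, range 0..144.
pub fn cell_index(family: VerbFamily, tense: Tense) -> usize {
    family as usize * 12 + tense as usize
}

/// Adverbial slots each family expects, transcribed from the table.
pub fn expected_slots(family: VerbFamily) -> u8 {
    use VerbFamily::*;
    match family {
        Becomes | Transforms => TEMPORAL | MODAL,
        Causes | Enables | Prevents => KAUSAL,
        Supports | Contradicts | Refines | Mirrors => MODAL,
        Grounds => LOKAL,
        Dissolves => TEMPORAL,
        Abstracts => 0,
    }
}

/// Slots the verb expects that morphology has not proposed yet.
pub fn unfilled_slots(family: VerbFamily, proposed_by_morphology: u8) -> u8 {
    expected_slots(family) & !proposed_by_morphology
}
```

A non-zero result from `unfilled_slots` is the situation where the diagram's Abduction step would have to pick a slot from context; a zero result corresponds to the Deduction case where prior and morphology agree.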
+ +**This is what turns parse into table lookup + truth aggregation.** +No search, no inference-time rule-walk — the (verb × tense) cell +IS the slot-filling policy, the morphology IS the slot-filler, and +NARS is the truth-merge operator. The 3,125 Structured5x5 cells +(5^5) are large enough to index the 144 verb-role cells, though not a +full 144 × 10 × 10 = 14,400 verb × slot × value product. + +### D11 — Bundle + Perturb Cognitive Stack (generative counterpart, interface only) + +**The core claim.** A generative cognitive stack doesn't need +QKV attention + up/down MLP + learned query. The same effect — +asking "what does this state want to emerge into?" — is a canonical +VSA operation when you already have role-indexed bundling + an +ONNX arc model: + +| Transformer primitive | Our substitute | Why it works | +|---|---|---| +| **Query vector (Q)** — what am I looking for? | The epiphany crystal itself (phase + magnitude + committed facts bundle) | The epiphany IS the query; no separate learned weights needed | +| **Key matrix (K)** — what's available? | Role-indexed slices of the triplet graph's story_vector + ±5 trajectory | Already present in D8 tier-1.5; slices ARE the keys | +| **Value matrix (V)** — what content comes back? 
| Role-unbound content from the slice | Unbind IS the retrieval | +| **Up-projection MLP** — expand to higher dim | `vsa_permute` by epiphany's phase-tag | Permutation widens the subspace without a learned projection | +| **Nonlinearity** — break linear combinatorics | **ONNX arc model perturbation** — apply the bundle as delta to the trained arc graph | The arc model IS the nonlinearity; input delta = perturbation | +| **Down-projection** — back to hidden-dim | Unbind via role key | XOR unbind recovers the role-specific content at the original dim | +| **Query-response (asking)** | **Bundle-perturb-unbind** | End-to-end: bundle epiphany with state → perturb ONNX arc → unbind target role → emergent content | + +**The stack for "what does this epiphany want to emerge?":** + +``` +epiphany_crystal (from D8: phase + magnitude + committed-facts bundle) + │ + ▼ vsa_bundle(epiphany, current_trajectory, story_vector) +state_bundle + │ + ▼ vsa_permute(state_bundle, epiphany.phase) — up-project equivalent +perturbed_input + │ + ▼ ONNX arc model forward pass with perturbed_input + │ (the learned arc is the "MLP nonlinearity") +emergent_state + │ + ▼ vsa_unbind(emergent_state, RoleKey::for_target(task)) +emerged_direction — what the epiphany "wants" to unfold +``` + +**Zero learned projections on our side.** The role keys (D6) are +canonical fingerprints; the bundle operations are algebraic; only +the ONNX arc model is learned — and it's orders of magnitude smaller +than a full transformer because it only has to predict *arc phase +and magnitude transitions*, not token probabilities. + +**Interpretability bonus.** Every step of the generative stack is +inspectable: +- The query is a crystal with explicit phase + magnitude + fact refs. +- The bundle is composable (you can add / remove contributors and + see the effect). +- The perturbation is a small arc-model input delta. +- The emerged direction is role-tagged content (not raw token soup). 
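
The K/V rows of the table (role-bound slices as keys, unbinding as retrieval) can be illustrated with a toy binary VSA. This is a sketch under stated assumptions, not the `ndarray::hpc::vsa` API: the `Hv` type, `random_hv`, `bind`, `bundle3` and `hamming` are written from scratch here, the permute and ONNX steps are omitted, and the vectors are pseudo-random placeholders rather than real crystals.

```rust
/// Toy binary hypervector: 157 words = 10,048 bits, close to the 10K space in the text.
pub const WORDS: usize = 157;
pub type Hv = [u64; WORDS];

/// SplitMix64 step, used only to get reproducible pseudo-random vectors.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Deterministic pseudo-random hypervector for a given seed.
pub fn random_hv(seed: u64) -> Hv {
    let mut state = seed;
    let mut v = [0u64; WORDS];
    for w in v.iter_mut() {
        *w = splitmix64(&mut state);
    }
    v
}

/// XOR binding. XOR is its own inverse, so the same call also unbinds.
pub fn bind(a: &Hv, b: &Hv) -> Hv {
    let mut out = [0u64; WORDS];
    for i in 0..WORDS {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Bitwise majority vote over three vectors (a simple bundling rule).
pub fn bundle3(a: &Hv, b: &Hv, c: &Hv) -> Hv {
    let mut out = [0u64; WORDS];
    for i in 0..WORDS {
        out[i] = (a[i] & b[i]) | (a[i] & c[i]) | (b[i] & c[i]);
    }
    out
}

/// Number of differing bits between two vectors.
pub fn hamming(a: &Hv, b: &Hv) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

In this toy setting, bundling three role-bound vectors and then binding the result with one role key gives a vector that agrees with that role's content on roughly three quarters of its bits, while an unrelated vector agrees on about half. That gap is what makes "unbind is the retrieval" workable without learned projections.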
+ +At every stage you can ask "why this next state?" and trace back +through bundle contributors + arc transitions. The stack is legible +in a way transformer internals are not. + +**"What it wants to emerge" is an operation, not a metaphor.** The +epiphany's phase direction under ONNX-arc perturbation picks out the +narrative trajectory that most consistently continues from the +committed contradiction. It's not generation guided by gradients on +token probabilities — it's arc continuation guided by phase +consistency on an already-learned narrative graph. + +**Shipping scope for this PR (interface only; generation deferred).** + +1. **Interface in the contract:** + ```rust + /// Ask what an epiphany wants to emerge into, given current state. + pub trait EpiphanyEmergence { + fn emerge( + &self, + epiphany: &Contradiction, + trajectory: &Trajectory, + graph: &TripletGraph, + target_role: GrammaticalRole, + ) -> EmergedDirection; + } + + pub struct EmergedDirection { + /// Role-unbound content bundle at target role. + pub content: Box<[f32; 10_000]>, + /// Predicted phase / magnitude of the emerged state. + pub arc_continuation: ArcPressure, + /// NARS truth that the emergence is coherent with history. + pub truth: TruthValue, + } + ``` + +2. **Reference implementation (stub)** — `DefaultEmergence` that + does the bundle + permute + (identity ONNX, since we don't have + a trained arc yet) + unbind. Returns a deterministic + `EmergedDirection` that's inspectable but not predictive until + the ONNX model lands. + +3. **Example test** on Animal Farm epiphany from D10: + - Input: ch. 3 epiphany "Napoleon takes puppies in secret," + current trajectory = ch. 1-3. + - Target role: FUTURE_EVENT (new role key). + - Emergence should produce a direction whose role-unbound + content overlaps (Hamming < 0.45) with ch. 5's actual facts + once the ONNX arc is trained in a follow-up PR. 
+ - Until then, the test asserts the interface produces + deterministic output, not predictive correctness. + +**Files (folded into D9's scope since it shares the ONNX runtime):** +- `crates/lance-graph-contract/src/grammar/emergence.rs` — NEW + (~100 LOC). `EpiphanyEmergence` trait + `EmergedDirection` struct. +- `crates/lance-graph-contract/src/grammar/mod.rs` — re-export. +- `crates/deepnsm/src/emergence_impl.rs` — NEW (~80 LOC). + `DefaultEmergence` stub implementation. +- `crates/deepnsm/tests/emergence.rs` — NEW (~40 LOC). Interface + determinism test on Animal Farm ch. 3 epiphany. + +**LOC:** ~220 LOC added. Still one PR. + +**What this closes.** The stack becomes symmetric — D2-D8 extract +from text, D9-D11 generate back into text. Both sides use the same +VSA primitives; the only asymmetry is that extraction doesn't need +the ONNX arc model while generation does. Awareness (D9 derivatives) +drives dispatch; epiphanies (D8 + D10) drive emergence; validation +(D10) holds both accountable. No QKV, no MLP, no separate query +vectors — canonical operations all the way down. + +### D10 — Forward-Validation Harness (epiphanies as predictions, NARS-tested against future arc) + +**The proof-test.** Every epiphany the system records is implicitly a +prediction — "this contradiction carries meaning, and therefore the +arc will continue to deviate in this direction." The rest of the +text is ground truth. 
Forward-validation closes the loop: + +``` +Chapter 1-5 prefix + ↓ full extraction pipeline + ↓ commit facts to graph + ↓ fire N epiphanies (Staunen markers with phase + magnitude) + ↓ emit arc_derivatives at M cycles + ↓ + ┌───────────────────┘ + ▼ + For each epiphany and each arc_shift: + record the prediction: + - direction (phase) + - magnitude + - which facts / entities it implicates + +Chapter 6-10 suffix + ↓ full extraction pipeline + ↓ commit more facts + ↓ for each recorded epiphany / arc_shift: + ↓ check whether the committed suffix facts + ↓ confirm or refute the prediction + ↓ NARS revision on the recorded prediction: + f = confirmed ? raise : lower + c = c + c_new (always rises — new evidence seen) +``` + +**The measurement is standard NARS.** An epiphany's belief was +`TruthValue { f₀, c₀ }` when it fired. Future evidence arrives with +`TruthValue { f_obs, c_obs }` where `f_obs = 1` if confirmed, `0` if +refuted. Standard revision rule: + + f_new = (f₀·c₀ + f_obs·c_obs) / (c₀ + c_obs) + c_new = (c₀ + c_obs) / (c₀ + c_obs + 1) + +Applied retroactively, this revises the epiphany's truth against its +own future. The system LEARNS its awareness accuracy from the story +itself — no external labels needed. + +**Ground-truth labels on Animal Farm (hand-authored, < 200 lines).** + +| Chapter | Ground-truth epiphany | Expected direction | +|---|---|---| +| 2 | Animalism's 7 Commandments drafted | Baseline, arc start | +| 3 | Napoleon takes puppies in secret | **Prediction:** Napoleon will use dogs for coercion | +| 4 | Battle of Cowshed, Snowball shines | **Prediction:** Snowball's rise threatens Napoleon | +| 5 | Snowball expelled by dogs | Confirms ch. 3 prediction; arc shifts | +| 6 | Squealer revises the "Snowball hero" story | **Prediction:** narrative unreliability escalates | +| 7 | Commandments silently amended (sleeping-in-beds) | Confirms ch. 
6; wisdom-magnitude spike | +| 8 | Boxer works himself to collapse | **Prediction:** Boxer will be betrayed | +| 9 | Boxer sold to knacker | Confirms ch. 8 ("Sundering of the loyal") | +| 10 | "All animals are equal BUT some more than others" | Max-magnitude contradiction with ch. 2; arc completes | + +**Metrics produced by the harness:** + +| Metric | What it measures | +|---|---| +| **Epiphany precision** | Of N epiphanies fired, how many were confirmed by future text? | +| **Epiphany recall** | Of M ground-truth beats, how many did the system flag as epiphanies? | +| **Arc-shift detection F1** | Did `d(phase)/dt` / `d(magnitude)/dt` spikes align with actual story pivots? | +| **Prediction direction accuracy** | Of confirmed epiphanies, was the predicted phase direction aligned with the suffix's factual direction? | +| **Retroactive-revision monotonicity** | NARS confidence should rise on every observation regardless of direction; truth (f) should converge to the ground-truth polarity | + +**Files:** + +- `crates/deepnsm/tests/animal_farm.rs` (NEW, ~220 LOC) — load + public-domain Animal Farm text, run the pipeline end-to-end on + 10 breakpoints, assert the metrics above. +- `crates/deepnsm/assets/animal_farm/ground_truth.yaml` (NEW, + ~200 lines) — hand-labelled beat list per chapter. +- `crates/lance-graph/src/graph/arigraph/episodic.rs` — add + `retroactive_revise(prediction_id, observation)` to expose NARS + revision on stored epiphanies (~40 LOC). + +**Benchmark targets:** + +- Epiphany precision ≥ 0.80 (most fired epiphanies should be confirmed). +- Epiphany recall ≥ 0.60 (catch most major beats). +- Arc-shift F1 ≥ 0.70 on ground-truth-labelled pivots. +- Prediction direction accuracy ≥ 0.85 on confirmed epiphanies. +- End-to-end metric-emitting run ≤ 10 minutes on a single core + (40 K words × <10 µs local + occasional LLM tail). + +**Why this ships with the PR, not later.** Without forward-validation +the awareness/epiphany claims are architectural prose. 
With it, the +PR description reports measured numbers on a canonical literary text, +and every follow-up PR can regression-test against the same benchmark. +Staunen / arc-shift / epiphany stop being metaphors and become +Spearman-ρ-grade measurements. **This is also what turns the arc- +awareness into genuine self-testing cognition** — the system's own +predictions are held accountable by the story that made them. + +## Challenges to the Universal-Grammar Claims (honest pushback) + +Before shipping, it's worth stress-testing the theoretical scaffolding. +Several load-bearing claims have real weaknesses. + +### C1 — "Morphology-heavy languages are easier" assumes textbook grammar + +The 98% Finnish vs 85% English coverage number works on textbook +Finnish. **Real-world Finnish** has: +- **Colloquial clipping** (`-ssa` drops to `-s` in speech and casual + writing): *kaupassa* → *kaupas*. +- **Clitic stacking** (`-kin`, `-kaan`, `-han`): *kirjakinpa* layers + three clitics onto one stem. +- **Dialect variation**: Savonian Finnish rewrites morphology + systematically; Western dialects preserve archaisms. +- **Vowel harmony interacting with gradation**: the table must handle + `-ssa` vs `-ssä` vs `-sta` vs `-stä` correctly; mis-segmenting + drops accuracy hard. + +Turkish agglutination is worse: *evlerimizdeydiler* = "they were at +our houses" = 6 stacked morphemes. Suffix segmentation is its **own** +ambiguity problem. Our D5 case table doesn't handle suffix order +disambiguation; the 98% number assumes clean segmentation. + +**What breaks:** the coverage gain from heavy morphology shrinks on +real text. Expect 95% on Finnish literary prose, 85-90% on +conversational Finnish, <80% on Turkish agglutination without a +dedicated morpheme-peeler. + +### C2 — Russian case syncretism: the "case" alone doesn't commit + +Russian masculine inanimate Accusative = Nominative. Feminine +Genitive singular = Nominative plural for many nouns. 
The ending +doesn't commit the case — you need gender + animacy + number + verb +context to disambiguate. The "one case = one TEKAMOLO slot" mapping +in D5 is a simplification that will produce wrong slot assignments +on ~15-20% of Russian NPs. + +**What breaks:** Russian coverage is closer to 88% than 92%. The +table needs case-syncretism resolution rules (gender × animacy × +number) to hit the higher number. + +### C3 — Cross-lingual bundling assumes semantically-parallel sources + +The "XOR-bundle EN+FI of the same entity, get disambiguation for +free" claim requires **translations**, not just mentions. For Animal +Farm we have authentic translations. For OSINT we'd have to rely on +machine-translated or independently-written sources — which don't +semantically align at the clause level. Bundling non-parallel texts +about the same entity averages noise, not meaning. + +**What breaks:** the cross-lingual shortcut works for canon literature +(Animal Farm, Wikidata entries) but NOT for news OSINT where no +parallel translation exists. Falls back to monolingual parsing plus +LLM disambiguation for the 10% tail. + +### C4 — ±5 window is arbitrary; literary text violates it + +Why ±5? No empirical justification was presented. Animal Farm has +chapter-spanning references ("as we agreed at the meeting in the +barn" — ch. 6 referring to ch. 1). ±5 has zero chance of reaching +chapter 1 from chapter 6. + +The D8 story-context bridge handles it, but that's a fallback +mechanism for what might actually be the *common* case in long-form +text — not the 5-10% exception. + +**What breaks:** the ±5 default is wrong for literary corpora. Needs +to be adaptive (widen when local coherence is low) or much larger +by default. Our 85% local coverage target may already assume the +bridge fires frequently. 
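
The "widen when local coherence is low" idea from C4 could be as simple as a doubling search. A minimal sketch, assuming a caller-supplied coherence score in the range 0 to 1; `adaptive_radius`, its parameters, and the doubling schedule are all hypothetical choices made for the example, not part of the plan.

```rust
/// Start at `base` and double the context radius until the coherence score
/// reaches `min_coherence` or the radius hits `max`.
///
/// `coherence_at(r)` is assumed to return a score in [0, 1] for a window of
/// +/- r sentences around the focal sentence.
pub fn adaptive_radius<F>(base: usize, max: usize, min_coherence: f64, coherence_at: F) -> usize
where
    F: Fn(usize) -> f64,
{
    // Guard against a zero radius, which would never grow by doubling.
    let mut radius = base.clamp(1, max.max(1));
    while radius < max && coherence_at(radius) < min_coherence {
        radius = (radius * 2).min(max);
    }
    radius
}
```

A call such as `adaptive_radius(5, 64, 0.5, score_fn)` keeps the ±5 default when local context is already coherent and only pays for a wider window when it is not. Whether doubling is the right schedule for literary text is exactly the kind of question the Animal Farm run would have to answer.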
+ +### C5 — Counterfactual threshold (0.55) is a continuum, not a split + +Epiphany vs error-correction is a useful conceptual distinction but +the real distribution of `loser_independent_support` is continuous. +A 0.55 cutoff turns a gray zone into two discrete classes. Many real +counterfactuals land at 0.45-0.65 — mis-classified in either +direction. + +Also **circularity**: "independent support" for the loser means +"previously committed facts that agree with the loser." If we +already committed the loser's reading earlier (via Napoleon's +propaganda), the graph naturally supports the loser, and we flag it +as epiphany — but it might actually be error compounding across +episodes. + +**What breaks:** the binary split needs to become a continuous +disposition score with three regions: high-confidence-winner (commit +only), high-support-loser (epiphany preserve), middle band (escalate +to LLM). The middle band might be 30-40% of all counterfactuals. + +### C6 — 90-99% coverage is a projection, not a measurement + +The "90-99% local, 1-10% LLM" tiering comes from: +- COCA 4096 = 98.4% of running **tokens** (not sentences). +- FSM handles 85% of English SVO **sentences** (different metric). +- Heavy-morphology languages: 95-98% (claimed, not measured). + +No harness has been run on any corpus to confirm that the joint +DeepNSM + grammar triangle + Markov + failure-ticket pipeline +actually hits 90%. The number is a stitched-together estimate. + +**What breaks:** until Animal Farm actually runs through and we +measure, the tiering claim is marketing. The PR should ship with +the measured number, whatever it is. 
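
C6 turns on the difference between token coverage and sentence coverage, which a few lines make concrete. This is an illustrative sketch with hypothetical function names and a made-up three-sentence corpus; it treats a sentence as locally parseable only when every token is in the vocabulary, which is a simplification of the real pipeline.

```rust
use std::collections::HashSet;

/// Share of running tokens that are in the vocabulary.
pub fn token_coverage(sentences: &[Vec<&str>], vocab: &HashSet<&str>) -> f64 {
    let total: usize = sentences.iter().map(|s| s.len()).sum();
    if total == 0 {
        return 0.0;
    }
    let known = sentences
        .iter()
        .flatten()
        .filter(|tok| vocab.contains(**tok))
        .count();
    known as f64 / total as f64
}

/// Share of sentences whose tokens are all in the vocabulary.
pub fn sentence_coverage(sentences: &[Vec<&str>], vocab: &HashSet<&str>) -> f64 {
    if sentences.is_empty() {
        return 0.0;
    }
    let full = sentences
        .iter()
        .filter(|s| s.iter().all(|tok| vocab.contains(*tok)))
        .count();
    full as f64 / sentences.len() as f64
}
```

For a toy corpus of three sentences and ten tokens in which a single proper noun is out of vocabulary, token coverage is 9/10 while sentence coverage is 2/3. The gap is why a token-level figure like 98.4% cannot be read as a sentence-level coverage estimate.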
+ +### C6b — Research standing: what's established vs novel + +**Established in the literature (validates our direction):** + +| Paper | Contribution | +|---|---| +| [arxiv 2003.05171](https://arxiv.org/abs/2003.05171) — "VSA for Context-Free Grammars" (2020) | Chomsky-normal-form CFGs fully encodable in VSA via role-filler binding; recursive mapping of phrase-structure trees to Fock-space vectors; representation theorem proved. **This is the theoretical foundation for our approach.** | +| [arxiv 2111.06077](https://arxiv.org/abs/2111.06077) / [2112.15424](https://arxiv.org/abs/2112.15424) — HDC/VSA Surveys Part I+II | Canonical role-binding (XOR / circular convolution / elementwise multiply), superposition, permutation are the three standard operations. Exactly our bind / bundle / permute. | +| [arxiv 2512.14709](https://arxiv.org/html/2512.14709) — "Attention as Binding" | Transformer attention reinterpreted through VSA role-filler binding. Parallels our bgz-tensor attention-as-lookup. | +| [arxiv 2509.25045](https://arxiv.org/html/2509.25045) — "Hyperdimensional Probe" | LLM residual streams projected into interpretable VSA concepts. Parallel mechanism to our grammar-awareness. | +| [arxiv 2408.10734](https://arxiv.org/html/2408.10734) — "Vector Symbolic OSINT Discovery" | VSA applied directly to OSINT. Validates the OSINT vertical. | + +**What's novel in our combination (not published):** + +- **NSM 65 primes + VSA** — Wierzbicka's semantic-primes framework + encoded as role-filler binding in 10K VSA space. No published + paper combines these. +- **TEKAMOLO slot filling via case-morphology binding** — German + grammar-pedagogy template applied as VSA role-keys with Finnish / + Russian case endings as the filler commits. Novel. +- **NARS truth on VSA role-filler bindings** — truth revision at the + binding level, not just at the entity level. Novel combination. 
+- **Crystal-mode vs Quantum-mode duality** for memory consolidation + — structured (Markov SPO) vs holographic (phase-tagged residual) + on the same 10K substrate. Not in the surveys. +- **Cross-linguistic superposition as disambiguation mechanism** — + bundling parses from EN + FI + RU + DE via VSA to let heavy- + morphology languages disambiguate what English leaves ambiguous. + Novel pitch. +- **Grammar-tiered routing (local VSA 90-99 % + LLM 1-10 %)** with + structured FailureTicket (SPO 2³ × TEKAMOLO × Wechsel) as a + meta-inference reason trace. Novel architecture. +- **Counterfactual outcomes split into epiphany vs error-correction + classes** preserving narrative dissonance — novel distinction for + allegorical / propagandistic text (Animal Farm). + +**Practical upshot:** we're standing on a theoretically-proven +foundation (VSA-CFG encoding works) and adding several novel +combinations on top. The foundation paper is from 2020 and cites the +open question of *practical at-scale grammar extraction from VSA*; +our stack is one concrete answer. + +### C7 — NSM and TEKAMOLO are contested in linguistics + +- **NSM 65 primes** — Wierzbicka's framework is cited by cognitive + science and contested in mainstream linguistics. Universality + claims don't survive empirical testing on polysynthetic languages + (Mohawk, Inuit). Some "primes" are culturally loaded (the English + word for what's meant to be a universal). +- **TEKAMOLO** is a German grammar-pedagogy mnemonic, not a + cross-linguistic universal. Arabic has "hal" (state / accompaniment + adverbial) that doesn't fit the 4-slot schema. Mandarin "bǎ" + construction rearranges transitivity in ways TEKAMOLO can't + describe. +- **Chomskyan UG** — Tomasello, Evans & Levinson argue UG is + empirically weak. Building on it is philosophically risky. + +**What breaks:** the universal-grammar ideas are scaffolding for a +practical extraction pipeline, not a claim about linguistic +universals. 
The D0 doc should state this explicitly — TEKAMOLO is +a *useful slot template*, not a *linguistic universal*. Same for the +144-verb taxonomy (numerologically chosen, not empirically derived). + +### C8 — Named-Entity gap is the actual blocker + +Already flagged in `grammar-tiered-routing.md` as the biggest miss +(~90% of OSINT is proper nouns). COCA 4096 has zero coverage of +Altman, Anthropic, Riyadh. Every fingerprint collision there +fragments the graph. + +**What breaks:** Without a NER pre-pass + Wikidata/Wikipedia entity +linking, the 90% coverage claim fails on any realistic OSINT input. +This is out-of-scope for the D2-D8 PR but blocks the OSINT demo. +Needs a follow-up NER PR before the OSINT vertical ships. + +### C9 — Abduction-threshold unbundle may compound errors + +If Abduction confidence is miscalibrated (i.e., the threshold 0.88 +doesn't reliably predict ground-truth correctness), promoting +Abductive conclusions to the graph as committed facts **encodes +systematic errors**. Each subsequent coref escalation reads those +errors as ground truth, reinforcing them. + +**What breaks:** the confidence calibration of NARS-on-grammar hasn't +been measured. The 0.88 threshold is inherited from PR #208 without +empirical tuning for this domain. First Animal Farm run may need +threshold re-calibration or an introspection step that audits +recently-unbundled facts. 
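
Measuring whether the 0.88 Abduction threshold is calibrated (C9) needs little more than a reliability table: bucket conclusions by stated confidence and compare with how often they turned out correct. A minimal sketch, assuming each sample is a `(confidence, was_correct)` pair; `Bin`, `reliability_bins` and `accuracy_at_or_above` are hypothetical helpers, not existing test utilities.

```rust
/// One confidence bucket: how many conclusions fell in it, and how many were right.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Bin {
    pub count: usize,
    pub correct: usize,
}

/// Bucket `(confidence, was_correct)` samples into `n_bins` equal-width bins over [0, 1].
pub fn reliability_bins(samples: &[(f64, bool)], n_bins: usize) -> Vec<Bin> {
    let n = n_bins.max(1);
    let mut bins = vec![Bin::default(); n];
    for &(conf, ok) in samples {
        // A confidence of exactly 1.0 lands in the last bin.
        let idx = ((conf.clamp(0.0, 1.0) * n as f64) as usize).min(n - 1);
        bins[idx].count += 1;
        if ok {
            bins[idx].correct += 1;
        }
    }
    bins
}

/// Observed accuracy of everything at or above `threshold`, or `None` if nothing qualifies.
pub fn accuracy_at_or_above(samples: &[(f64, bool)], threshold: f64) -> Option<f64> {
    let kept: Vec<bool> = samples
        .iter()
        .filter(|s| s.0 >= threshold)
        .map(|s| s.1)
        .collect();
    if kept.is_empty() {
        return None;
    }
    let correct = kept.iter().filter(|ok| **ok).count();
    Some(correct as f64 / kept.len() as f64)
}
```

If `accuracy_at_or_above(samples, 0.88)` comes out well below 0.88 on the first Animal Farm run, that would be direct evidence the inherited threshold promotes conclusions more readily than their track record supports.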
+ +### Summary of mitigations + +| Challenge | Mitigation in this PR | Deferred | +|---|---|---| +| C1 morphology colloquial | Start with literary-grade text (Animal Farm); flag as known-textbook baseline | Dialect / agglutination suffix-peeler — future PR | +| C2 case syncretism | D6 role keys include gender-gated variants for Russian | Full syncretism rules — follow-up | +| C3 non-parallel texts | Use Animal Farm authentic translation; flag news as needing LLM tail | OSINT bundle quality — future work | +| C4 ±5 inadequate | D8 story context bridge; make radius a style parameter in D7 | Adaptive-radius selection — future | +| C5 continuous disposition | Ship with 3-region (commit / epiphany / escalate) not 2-region | Calibrated thresholds per corpus — follow-up | +| C6 unmeasured coverage | Animal Farm benchmark IS the measurement | - | +| C7 contested linguistics | D0 doc explicitly states "useful templates, not universals" | - | +| C8 NER gap | Flagged as blocker; D2 FailureTicket emits COCA-miss as routing signal | Dedicated NER pre-pass PR | +| C9 Abduction calibration | Emit Abduction confidence distribution in test output for first run | Per-domain threshold tuning — follow-up | + +None of these kill the architecture. Several downgrade the +quantitative claims; all sharpen what the PR actually demonstrates. + +## Out of Scope + +- Path 2 (holographic residue at leaf). +- CausalityFlow TEKAMOLO extension (modal/local/instrument) — deferred. +- Phase tags, Bell-S>2, ladybug quantum 9-op set. +- Active multi-lingual parsers (EN+FI+RU+TR). Keys exist; parsers later. +- 200–500 YAML TEKAMOLO templates per language (future training). +- Named Entity gap (biggest OSINT blocker — pre-parser NER layer, separate PR). +- Cockpit Cypher, chess vertical. +- FP_WORDS=160 migration (H6), Crystal4K persistence (H10), Int4State + upper-nibble (H8), Glyph5B wide-container (H9). + +## Verification + +1. `cargo test -p deepnsm` — base zero-dep passes. +2. 
`cargo test -p deepnsm --features contract-ticket` — ticket tests pass. +3. `cargo test -p deepnsm --features grammar-triangle` — triangle bridge tests pass. +4. `cargo test -p lance-graph-contract` — 112 + 10 new → 122 (6 context_chain + role_keys + 4 thinking_styles). +5. `cargo bench -p deepnsm parse` — median ≤ 10 µs (FSM path). +6. `cargo bench -p deepnsm parse --features grammar-triangle` — ≤ 50 µs. +7. Property: same SPO, different temporals → Hamming ≥ 0.15 × 10 000. +8. Property: Wechsel-ambiguous + ±5 context → `disambiguate` margin > 0.1, no LLM. +9. Property: Mexican-hat weighting monotone by distance from focal. +10. `cargo check -p deepnsm --no-default-features` — zero-dep preserved. +11. **NARS-awareness property:** load analytical-style prior; feed 50 simulated parse outcomes favouring `NarsInference::Abduction`; verify `effective_config.nars.primary == Abduction` (awareness drifted from YAML-defined Deduction). +12. **NARS-awareness property:** 50 opposing outcomes → `recent_success.c` drops below 0.3 (low-confidence style, ready for escalation). +13. **NARS-awareness property:** revision is monotone under NARS rule — repeated positive outcomes raise `f` and `c`; repeated negative raise `c` but lower `f`. +14. **Awareness persistence** test skipped in this PR (persistence is follow-up) — but ensure `GrammarStyleAwareness` is `Clone + Debug` so snapshot/restore is trivial to add. +15. **Coreference resolution (the integration target):** + - `cargo test -p deepnsm --features grammar-triangle test_coref` + - 10-sentence English paragraph with 5 pronouns ("it" / "he" / "she" / "they" / "that"): each resolves to correct antecedent via joint CF + Markov scoring, no LLM call. + - Russian dropped-subject paragraph with 5 verb-only sentences: each resolves via morphological agreement + Markov axis; 95%+ accuracy. 
+ - Ambiguous case where CF axis and Markov axis disagree → `FailureTicket` emitted with `recommended_next = Abduction` and the candidate disagreement surfaced as `WechselAmbiguity` (since pronouns ARE dual-role tokens). + - Cross-lingual bundle: same paragraph in EN + FI. Finnish morphology commits some pronouns that English leaves ambiguous; XOR-bundle improves English resolution accuracy by ≥ 5 %. + +## Branch / PR + +- New branch: `claude/deepnsm-markov-context-bundling` off main. +- Single PR, description cites the 5 prerequisite knowledge docs and + today's lossiness-epiphany table. +- Calls out: LLM cost avoided, morphology-rich languages easier not + harder, 200–500 YAML templates as the next milestone. + +## Post-Ship State + +- DeepNSM parses ≤ 10 µs / ≤ 50 µs (no triangle / with triangle) on + 90–99 % of traffic. +- Trajectory fingerprints carry ±5 context; NARS reasons about flows, + not isolated sentences. +- FailureTicket cleanly emitted with SPO×2³×TEKAMOLO×Wechsel + decomposition for the 1–10 % tail. +- Canonical role keys with slice addressing available. +- Grammar landscape documented. + +Path 2 (holographic residue), CausalityFlow extension, YAML template +training, and Named Entity NER layer land on top in subsequent PRs. 
diff --git a/.claude/settings.json b/.claude/settings.json new file mode 100644 index 00000000..e2c72d7c --- /dev/null +++ b/.claude/settings.json @@ -0,0 +1,54 @@ +{ + "$schema": "https://json.schemastore.org/claude-code-settings.json", + "permissions": { + "ask": [ + "Write(.claude/knowledge/PR_ARC_INVENTORY.md)", + "Write(.claude/knowledge/LATEST_STATE.md)", + "Write(.claude/knowledge/STATUS_BOARD.md)", + "Write(.claude/knowledge/INTEGRATION_PLANS.md)", + "Write(.claude/knowledge/EPIPHANIES.md)", + "Write(.claude/knowledge/ISSUES.md)", + "Write(.claude/knowledge/IDEAS.md)", + "Write(.claude/knowledge/TECH_DEBT.md)" + ], + "deny": [ + "Bash(git push --force:*)", + "Bash(git push -f:*)", + "Bash(git push --force-with-lease:*)", + "Bash(git branch -D:*)", + "Bash(git branch --delete:*)", + "Bash(git reset --hard:*)", + "Bash(rm -rf:*)", + "Bash(rm -fr:*)", + "mcp__github__merge_pull_request", + "mcp__github__delete_file", + "mcp__github__enable_pr_auto_merge", + "mcp__github__fork_repository", + "mcp__github__create_repository" + ] + }, + "hooks": { + "SessionStart": [ + { + "matcher": "startup", + "hooks": [ + { + "type": "command", + "command": "bash .claude/hooks/session-start.sh" + } + ] + } + ], + "PostCompact": [ + { + "matcher": "", + "hooks": [ + { + "type": "command", + "command": "bash .claude/hooks/post-compact.sh" + } + ] + } + ] + } +} diff --git a/.claude/skills/cca2a/SKILL.md b/.claude/skills/cca2a/SKILL.md new file mode 100644 index 00000000..2aca8ed2 --- /dev/null +++ b/.claude/skills/cca2a/SKILL.md @@ -0,0 +1,80 @@ +--- +name: cca2a +description: > + Explains the Claude Code Agent-to-Agent (CCA2A) pattern used in + this workspace: BOOT.md session entry, agent ensemble with + meta-agents, append-only knowledge state (LATEST_STATE + + PR_ARC_INVENTORY), handover protocol, governance permissions. + Read this once to understand the pattern; then stop re-explaining + it across sessions. 
Canonical files live at their normal + workspace paths, not in this skill. +--- + +# CCA2A — Claude Code Agent-to-Agent + +Explanation-only skill. Documents the pattern so a new session can +grok it once instead of the user re-deriving it for 30 turns. + +**Canonical files** (read these, not template copies): + +**Entry points:** +- `.claude/BOOT.md` — session entry point (one page). +- `CLAUDE.md` — workspace spec, links to all of the below. + +**Eight bookkeeping files** (rows are history; specific fields are +state; never delete a row; method hierarchy: APPEND preferred → +Edit-field after prior Read → Write prompts for confirmation): + +_History / dashboards:_ +- `.claude/knowledge/LATEST_STATE.md` — current contract inventory. +- `.claude/knowledge/PR_ARC_INVENTORY.md` — APPEND-ONLY per-PR arc. +- `.claude/knowledge/STATUS_BOARD.md` — deliverable-level dashboard. +- `.claude/knowledge/INTEGRATION_PLANS.md` — versioned plan index. + +_Kanban / audit (priority + scope tags on every entry):_ +- `.claude/knowledge/EPIPHANIES.md` — date-prefixed insight log. + Status: FINDING / CONJECTURE / SUPERSEDED. +- `.claude/knowledge/ISSUES.md` — Open + Resolved bugs / regressions. + Double-entry: issue captured on discovery; Resolution appends on + close. Priority P0-P3, Scope @agent D<N> domain:<tag>. +- `.claude/knowledge/IDEAS.md` — Open + Implemented + Rejected + speculation. Triple-entry: idea captured, shipped appended, + plan-update logged. Priority + Scope. +- `.claude/knowledge/TECH_DEBT.md` — Open + Paid knowingly-deferred + work. Priority + Scope + Introduced-by-PR + Payoff-estimate. + +**Agent ensemble + orchestration:** +- `.claude/agents/BOOT.md` — orchestration spec + Knowledge + Activation + Handover Protocol. +- `.claude/agents/README.md` — function inventory of all agents. 
+ +**Governance + hooks:** +- `.claude/settings.json` — team-shared governance (ask-on-Write for + the eight bookkeeping files, deny on destructive ops, + SessionStart + PostCompact hooks). +- `.claude/hooks/session-start.sh` / `post-compact.sh` — inject + bootload context at turn 0 / after compaction. + +**Active plans:** +- `.claude/plans/<name>-v<N>.md` — indexed via + `INTEGRATION_PLANS.md`. Old versions stay with Status annotation; + new versions prepend. + +## What to read when + +- **New session on this workspace:** `.claude/BOOT.md` first, then + the three mandatory reads it lists. +- **Understanding the A2A two-layer model:** + [concepts.md](concepts.md). +- **Comparing to official Claude Code conventions:** + [divergence.md](divergence.md). + +## The idea in one paragraph + +Every new session pays a 30-turn rediscovery tax rebuilding context +about what types exist, which conventions are locked, what work is +queued. CCA2A is the scaffold that cuts that to 3-5 turns: two +mandatory knowledge files that state current inventory and decision +history, one agent ensemble file that routes subagents, and hooks +that inject critical state at startup and after compaction. Append-only +governance on the history files keeps drift from eroding the arc. diff --git a/.claude/skills/cca2a/concepts.md b/.claude/skills/cca2a/concepts.md new file mode 100644 index 00000000..223b56f7 --- /dev/null +++ b/.claude/skills/cca2a/concepts.md @@ -0,0 +1,148 @@ +# CCA2A — Concepts + +## The Problem + +A new Claude Code session on a non-trivial workspace burns 20-30 +turns rediscovering: +- What types / modules already exist (re-proposes duplicates). +- What conventions have been locked (violates them, gets reverted). +- What work is already queued (proposes things in the plan). +- How to coordinate subagents (re-invents the wheel). + +At ~$15-$75 per 1M tokens of Opus, that's **$20-$40 of waste per +cold start.** CCA2A is the scaffolding that reduces this to 3-5 +turns.
+ +## Two A2A Layers + +### Layer 1 — Runtime A2A (code-level inside the system) + +If the workspace is building a cognitive system, its runtime +orchestration needs a blackboard where experts write entries and +later rounds read them. Typical structure: + +```rust +struct Blackboard { + entries: Vec<BlackboardEntry>, + round: u32, +} + +struct BlackboardEntry { + expert_id: ExpertId, + capability: ExpertCapability, + result: u16, + confidence: f32, + support: [u16; 4], + dissonance: f32, + cost_us: u32, +} +``` + +This is the A2A bus for multi-expert inference inside the system. +Name it consistently; reference from BOOT.md. + +### Layer 2 — Session A2A (Claude Code subagent coordination) + +For subagent coordination *during a Claude session*: +- The two mandatory knowledge files (LATEST_STATE + + PR_ARC_INVENTORY) are the shared blackboard. +- Domain-specific knowledge docs are extended blackboard entries + (load-on-trigger via Knowledge Activation table). +- Handover files (`.claude/handovers/*.md`) carry per-chain state + transfer. +- Parallel subagent spawns in one main-thread turn is the cheapest + coordination pattern. + +Layers 1 and 2 don't conflict — they operate at different time +scales. Don't entangle them. + +## Governance Rules + +### Append-only history (eight bookkeeping files, unified rule) + +The workspace carries eight bookkeeping files. Rows / entries +inside them are immutable historical record; specific fields per +file are mutable state. Never delete a row. Supersedure is a new +row that cites the old; the old row's Status updates to +"Superseded by <new>". 
+ +| File | Role | Immutable (rows) | Mutable (state fields) | +|---|---|---|---| +| `PR_ARC_INVENTORY.md` | Per-PR decision arc | PR rows (Added / Locked / Deferred / Docs) | Confidence + Corrections APPEND | +| `LATEST_STATE.md` | Current-state snapshot | Recently-shipped PR table | Snapshot sections updated by replacement | +| `STATUS_BOARD.md` | Deliverable-level dashboard | Rows (D-id / title / plan / scope) | Status, PR/Evidence per row | +| `INTEGRATION_PLANS.md` | Versioned plan index | Entries (scope / path / deliverables) | Status + Confidence per entry | +| `EPIPHANIES.md` | Dated insight log | Entry bodies | Status line (FINDING/CONJECTURE/SUPERSEDED) | +| `ISSUES.md` | Open + Resolved bugs | Entry bodies | Status + Resolution line (append on close) | +| `IDEAS.md` | Open + Implemented + Rejected speculation | Entry bodies | Status + Rationale line (append) | +| `TECH_DEBT.md` | Open + Paid debt | Entry bodies | Status + Payoff line (append) | + +**Method hierarchy (preferred → discouraged):** + +1. **APPEND** (the method of choice) — add a NEW dated row. Either + Edit-to-prepend inside a bounded section, or `Bash cat >> + file << EOF`. No prompt. Old rows stay untouched. This is the + double-bookkeeping pattern: new state = new row, not mutation. +2. **Edit field with prior Read** — flip a mutable field + (Status / Confidence / Resolution / Payoff) on an existing row + after reading the file. No prompt. Use for clarifications and + status transitions on already-captured entries. +3. **Write (full overwrite)** — prompts via + `.claude/settings.json::permissions.ask`. Discouraged; only for + wholesale replacement after explicit review. + +**Rule:** when in doubt, APPEND. Double-bookkeeping keeps the +arc intact and makes the audit trail legible. Edit-a-field is the +lighter touch for clear status transitions. Write is the escape +hatch that costs a confirmation because it can destroy history. 
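The APPEND method above can be sketched in Rust as the equivalent of `cat >> file`. This is a minimal illustration, not code from this workspace: the function name `append_dated_row` and the `- YYYY-MM-DD: text` row layout are assumptions made for the example.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one dated row to a bookkeeping file, leaving existing rows untouched.
///
/// Assumption: rows look like `- YYYY-MM-DD: text`. The real files may use a
/// different row layout; only the append-only mechanics are the point here.
pub fn append_dated_row(path: &Path, date: &str, text: &str) -> io::Result<()> {
    // `append(true)` positions every write at end-of-file, so earlier rows
    // cannot be overwritten through this handle. `create` is deliberately not
    // set: a missing bookkeeping file is reported as an error, not created.
    let mut file = OpenOptions::new().append(true).open(path)?;
    writeln!(file, "- {}: {}", date, text)
}
```

Opening the file in append mode makes the "old rows stay untouched" property mechanical rather than a matter of discipline. A caller would pass the file path, the current date, and the new row text.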
+ +**Kanban discipline** — three files track work items that must +not get buried: `ISSUES.md`, `IDEAS.md`, `TECH_DEBT.md`. Every +entry carries: + +- **Priority** — P0 blocker / P1 high / P2 medium / P3 low. +- **Scope** — `@<agent-name>`, `D<N>` (plan D-id), + `domain:<grammar|codec|arigraph|infra|...>`. + +Agents filter by their own `@`-mention or their domain. Status +moves through Open → In Progress → Resolved / Implemented / Paid. +Nothing falls through the cracks because every ticket has an owner +named by `@`. + +Core invariant: **the arc is the record; rewriting it destroys the +"why was this decided that way" context that prevents future +rediscovery.** + +### Model policy + +- Main thread: Opus, deep thinking, full `effortLevel: high`. +- Subagent grindwork (single-source mechanical): Sonnet. +- Subagent accumulation (multi-source synthesis): Opus. +- NEVER Haiku — quality floor for this pattern is Sonnet. + +Concrete test before spawning a subagent: +> "Does this agent have to read N sources and produce something that +> only makes sense when those sources are held in mind together?" +> +> Yes → Opus. No → Sonnet. + +### GitHub access + +- Reads (3+ files from same repo): zipball to `/tmp/sources/` + + local grep. Each `mcp__github__get_file_contents` call drops the + full file into session context and re-bills on every subsequent + turn. Zipball-then-grep is ~20x cheaper. +- Reads (single targeted file, known path): MCP is fine. +- Writes (PR creation, comments, reviews, review threads): MCP. + +### Destructive ops denied globally + +`.claude/settings.json::permissions.deny` blocks: `git push --force`, +`git push -f`, `git branch -D`, `git reset --hard`, `rm -rf`, +MCP `merge_pull_request`, `delete_file`, `fork_repository`, +`create_repository`, `enable_pr_auto_merge`. + +## Why This Is Different From Official Claude Code Docs + +See [divergence.md](divergence.md) for what's official, what's +invented, and what's recommended for adoption. 
diff --git a/.claude/skills/cca2a/divergence.md b/.claude/skills/cca2a/divergence.md new file mode 100644 index 00000000..19967f1f --- /dev/null +++ b/.claude/skills/cca2a/divergence.md @@ -0,0 +1,132 @@ +# CCA2A — Divergence from Official Claude Code + +Audit of what CCA2A does vs what official Anthropic Claude Code +conventions prescribe. Per the 2026-04-19 audit of +`docs.claude.com/en/docs/claude-code/`. + +## Aligned with official + +| Area | Convention | +|---|---| +| Agent cards location | `.claude/agents/<name>.md` with YAML frontmatter | +| Frontmatter fields | `name`, `description`, `tools`, `model` | +| Settings files | `.claude/settings.json` (project, committed) + `.claude/settings.local.json` (local, gitignored) | +| Model values | `sonnet`, `opus` (CCA2A excludes `haiku` by policy) | +| CLAUDE.md location | Project root or `.claude/CLAUDE.md` | +| Permissions arrays | `permissions.ask`, `permissions.allow`, `permissions.deny` | +| Hook events | `SessionStart`, `PostCompact`, etc. 
all official | + +## Extensions (invented by CCA2A, not in official docs) + +| Feature | What it adds | Why | +|---|---|---| +| `.claude/BOOT.md` | Single page entry point referenced from CLAUDE.md | Explicit bootstrap contract; official doesn't have this concept | +| `.claude/knowledge/LATEST_STATE.md` | Current-state snapshot, updated after every PR | Prevents re-proposing shipped work | +| `.claude/knowledge/PR_ARC_INVENTORY.md` | APPEND-ONLY decision history per PR | Locks architectural history against rewrite-drift | +| `.claude/handovers/*.md` | Agent-to-agent state transfer files | Official has subagents but no multi-step chain handover protocol | +| Knowledge Activation trigger table | Domain → agent → knowledge docs mapping | Extends `READ BY:` headers into an orchestration table | +| Grindwork vs accumulation model split | Per-task model selection rule | Official allows `model: sonnet` but doesn't codify the grindwork/accumulation split | +| Zipball-for-reads policy | Use curl zipball + local grep for 3+ cross-repo reads | Official doesn't cover cross-repo read cost optimization | + +These extensions do not conflict with official conventions. They +layer on top as workspace-specific conventions. A Claude Code +session without CCA2A installed still works with official +conventions; CCA2A just accelerates cold-start. + +## Recommended adoptions from official (future work) + +### 1. `.claude/rules/` with `paths:` frontmatter + +Official path-scoped knowledge loading. CCA2A's READ BY headers +pattern could be replaced or complemented by: + +```markdown +--- +paths: ["src/billing/**"] +--- +Load this rule when Claude works in src/billing/. +``` + +Deterministic (directory match) vs pattern-matched intent. Consider +migrating domain-specific knowledge docs into `.claude/rules/` +with path scopes where the mapping is literal. + +### 2. `SessionStart` hook with `matcher: startup` + +CCA2A already includes this. 
Official doc recommends it for +session-start context injection. Covered by the install. + +### 3. `PostCompact` hook + +CCA2A includes this. Critical — compaction otherwise drops the +workspace bootload, and the next turn rediscovers. Official doc +recommends this for compaction-resilient sessions. + +### 4. `Skill` fork + `agent:` field + +Skills can fork into named subagents via `agent: Explore` or similar +for read-only isolation. CCA2A doesn't use this yet; could be added +for pure-search skill variants (e.g. a read-only CCA2A audit skill +that reports divergence without installing). + +### 5. Auto memory + +`~/.claude/projects/<project>/memory/MEMORY.md` — Claude writes +automatically. CCA2A's LATEST_STATE.md is the curated alternative; +auto memory could be an unstructured addition for personal notes. + +## Drift to avoid + +- **Invented frontmatter fields.** Stick to the official schema for + `.claude/agents/*.md` frontmatter. Don't add workspace-specific + fields; put conventions in the prose body instead. +- **Custom hook JSON shapes.** Return only the fields official docs + support (`hookSpecificOutput.additionalContext`, + `hookSpecificOutput.hookEventName`, etc.). +- **Skill max frontmatter size.** 1024 chars. Keep `description` + short; offload detail to subpages referenced by the skill. + +## Audit note 5 — `permissions.ask` is intentional, not a typo + +The official settings schema supports three permission rule arrays: +`permissions.allow`, `permissions.deny`, and `permissions.ask`. +CCA2A uses `permissions.ask` deliberately on the eight bookkeeping +files (first two entries shown): + +```json +"ask": [ + "Write(.claude/knowledge/PR_ARC_INVENTORY.md)", + "Write(.claude/knowledge/LATEST_STATE.md)" +] +``` + +The semantics are: +- **`allow`**: tool runs without prompting. +- **`deny`**: tool is blocked completely. +- **`ask`**: tool runs only after an explicit user-approval prompt, + even if it would otherwise be auto-allowed (e.g.
via a broader + `Write` allow rule in personal `settings.local.json`). + +The goal for append-only files is to **surface the overwrite +attempt**, not block it. `deny` would make a reviewed wholesale +replacement impossible. `allow` (or absence from the lists) +would silently permit history rewrites. `ask` is the correct middle +ground — rewrites surface as a prompt the human can approve or +refuse. + +`Edit` on the same paths is not in the `ask` list, so appending a +dated row or flipping a mutable field doesn't trigger the prompt. +Append or field flip = Edit; full overwrite = Write. This split is +why only `Write` rules appear in the committed `ask` list. + +## Compatibility + +CCA2A installs work alongside any existing `.claude/` setup: +- Doesn't replace existing agent cards. +- Doesn't clobber existing `settings.json` rules (merges). +- Doesn't overwrite existing CLAUDE.md (prepends a new section if + absent, skips if a Session Start section already exists). + +A workspace can adopt CCA2A partially: install only the knowledge +docs + governance rules without the handover protocol, or add +handovers later without reinstalling anything.
diff --git a/.gitignore b/.gitignore index 10a3d6f1..f415446f 100644 --- a/.gitignore +++ b/.gitignore @@ -58,3 +58,6 @@ knowledge_graph_data/ # Large model data (use GitHub Releases) crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/token_embd_centroids_4096x5120.f32 crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/token_embd_4096x4096.u8 + +# Claude Code personal overrides (per-project) +.claude/settings.local.json diff --git a/CLAUDE.md b/CLAUDE.md index 97dac6d6..d323bb72 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,11 +1,264 @@ # CLAUDE.md — lance-graph -> **Updated**: 2026-03-28 +> **Updated**: 2026-04-19 (post PR #210) > **Role**: The obligatory spine — query engine, codec stack, semantic transformer, and orchestration contract > **Status**: 11 crates, 5 in workspace, 4 excluded (standalone), Phases 1-2 DONE, Phase 3 IN PROGRESS --- +## Session Start — MANDATORY READS (in this order) + +**Start here:** `.claude/BOOT.md` — the one-page session entry point. +It names the three files below as mandatory reads, so load them +whether you went through BOOT.md or landed here directly. + +1. **`.claude/knowledge/LATEST_STATE.md`** — current contract + inventory, recently shipped PRs, active branches, queued work, + explicit deferrals. **What exists.** +2. **`.claude/knowledge/PR_ARC_INVENTORY.md`** — per-PR Added / + Locked / Deferred / Docs / Confidence, reverse chronological. + **APPEND-ONLY**; only the Confidence line is updatable; + corrections append as new dated lines; reversals get their own + PR entry. **Why it exists.** +3. **`.claude/agents/BOOT.md`** — the 19 specialist + 5 meta-agent + ensemble (`workspace-primer`, `integration-lead`, + `adk-coordinator`, `adk-behavior-monitor`, `truth-architect`), + the Knowledge Activation trigger table (domain → agent → docs), + and the Handover Protocol spec. 
**This is the A2A orchestration + specification for this workspace.** **How to coordinate.** + +After these three, load domain-specific knowledge docs only as +triggered by the user's request. + +**Companion dashboards (mid-session references, not cold-start +mandatory):** + +- **`.claude/knowledge/STATUS_BOARD.md`** — deliverable-level + dashboard. All D-ids across every active plan with Status. + Consult when asking "where is D5" or "is this shipped yet." +- **`.claude/knowledge/INTEGRATION_PLANS.md`** — versioned plan + index (APPEND-ONLY). Active plan lives at + `.claude/plans/<name>-v<N>.md`. Consult before proposing a new + plan. + +**If you want the pattern explained rather than the specifics:** +see **`.claude/skills/cca2a/SKILL.md`** — explanation-only skill +covering the A2A two-layer model, governance rules, and how this +workspace's conventions diverge from official Claude Code docs. +Read once to grok; then stop re-deriving across sessions. + +### Prior art — reference before writing new docs + +This workspace accumulated substantial curated content across +prior sessions. Before drafting a new `.claude/*.md` or proposing +a new architectural direction, grep these: + +- **`.claude/prompts/`** (41 files) — scoped session / probe / + handover / research prompts. See `SCOPED_PROMPTS.md` as index. +- **`.claude/plans/`** + **`.claude/knowledge/INTEGRATION_PLANS.md`** + — versioned integration plans (APPEND-ONLY index; prior versions + retained with Status annotation). +- **`.claude/*.md`** (61 top-level docs) — calibration reports, + session handovers, epiphanies, integration-plan snapshots, + audits, inventory maps, invariant matrices. Examples: + `SESSION_CAPSTONE.md`, `INTEGRATIONSPLAN_2026_04_01.md`, + `INTEGRATION_SESSIONS.md`, `INVENTORY_MAP.md`, + `HANDOVER_NEXT_SESSION.md`, `KNOWLEDGE_SYNC_SIGNED_SESSION.md`. +- **`.claude/knowledge/*.md`** (newer, structured) — with + `READ BY:` headers + Knowledge Activation triggers. 
+- **`.claude/agents/*.md`** (19 specialists + 5 meta-agents) — see + `README.md` for function inventory, `BOOT.md` for orchestration. +- **`.claude/hooks/*.sh`** — SessionStart + PostCompact context + injectors. +- **`.claude/skills/cca2a/`** — the pattern explanation skill. + +**Rule:** grep the existing ~100 files before writing a new one. +Most architectural concerns have prior art in this workspace. + +--- + +## Agent-to-Agent (A2A) Orchestration — Two Layers + +Orchestration in this workspace runs at two distinct layers. Each +layer uses a different "blackboard" substrate; both must be respected +when spawning subagents or composing cognitive cycles. + +### Layer 1 — Runtime A2A (code-level, in the contract) + +For cognitive-cycle orchestration *inside* the running system: + +- **`lance_graph_contract::a2a_blackboard`** — `Blackboard` with + `entries: Vec<BlackboardEntry>` + `round: u32`. Each entry carries + `expert_id`, `capability`, `result`, `confidence`, `support [u16; 4]`, + `dissonance`, `cost_us`. Experts write; later rounds read prior + entries and build on them. This IS the A2A bus for multi-expert + inference. +- **`lance_graph_contract::orchestration::OrchestrationBridge`** — + trait bridging `StepDomain` (Codec / Thinking / Query / Semantic / + Persistence / Inference / Learning) into `UnifiedStep`. Each domain + contributes a `BridgeSlot`; orchestration routes steps across + domains without duplicating state. +- **`lance_graph_contract::orchestration_mode`** — explicit modes for + how a cycle composes (linear, parallel, blackboard-broadcast, etc.). +- **`ExpertCapability`** enum — the capability taxonomy experts + declare. Do NOT invent new capabilities ad-hoc; grep + `a2a_blackboard.rs` first. +- **Reference doc:** `docs/ORCHESTRATION_IS_GRAPH.md` — capstone + treating orchestration AS graph traversal (the runtime A2A is a + directed blackboard-graph). 
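As a rough illustration of the blackboard bus described in the list above, the sketch below models a `Blackboard` that experts write to and later rounds read from. The field names and types follow the struct shown in `.claude/skills/cca2a/concepts.md`. Everything else is an assumption for the example: `ExpertId` and `ExpertCapability` are opaque stand-ins (the real capability taxonomy lives in `a2a_blackboard.rs`), and the `post`, `next_round`, and `best_for` helpers are hypothetical, not the contract crate's API.

```rust
/// Stand-in for the real `ExpertId` type (assumption for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExpertId(pub u16);

/// Opaque stand-in for the real `ExpertCapability` enum. No variants are
/// invented here; consult `a2a_blackboard.rs` for the actual taxonomy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExpertCapability(pub u16);

#[derive(Clone, Debug)]
pub struct BlackboardEntry {
    pub expert_id: ExpertId,
    pub capability: ExpertCapability,
    pub result: u16,
    pub confidence: f32,
    pub support: [u16; 4],
    pub dissonance: f32,
    pub cost_us: u32,
}

#[derive(Default)]
pub struct Blackboard {
    pub entries: Vec<BlackboardEntry>,
    pub round: u32,
}

impl Blackboard {
    /// Hypothetical helper: an expert writes its entry to the board.
    pub fn post(&mut self, entry: BlackboardEntry) {
        self.entries.push(entry);
    }

    /// Hypothetical helper: close the current round; entries stay readable.
    pub fn next_round(&mut self) {
        self.round += 1;
    }

    /// Hypothetical helper: the highest-confidence prior entry for a
    /// capability, which a later-round expert could build on.
    pub fn best_for(&self, capability: ExpertCapability) -> Option<&BlackboardEntry> {
        self.entries
            .iter()
            .filter(|e| e.capability == capability)
            .max_by(|a, b| a.confidence.total_cmp(&b.confidence))
    }
}
```

In this sketch entries are never removed, so a later round sees everything earlier experts posted. How the real contract scopes entries to rounds is not specified in this document.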
+ +Use Layer 1 when the question is "how do two cognitive experts +compose their outputs at runtime." + +### Layer 2 — Session A2A (Claude-code-level, between subagents) + +For subagent coordination *during* this session: + +- **The mandatory-read files above (`LATEST_STATE.md` + + `PR_ARC_INVENTORY.md`) are the shared blackboard.** Every subagent + I spawn reads them to know the current state, same as a Layer-1 + expert reads prior blackboard entries to know current round state. +- **Knowledge docs in `.claude/knowledge/`** are the extended + blackboard — cross-session persistent entries. Each doc has a + `READ BY:` header declaring which subagent types load it (the + equivalent of `ExpertCapability` matchers). +- **`/root/.claude/plans/*.md`** — plan files authored via `Plan` + agents; session-scoped blackboard for multi-turn work. Other + agents reference the active plan for context. +- **Parallel subagent spawns** in one main-thread turn are the + cheapest Layer-2 A2A pattern. Independent work fans out; results + aggregate back to the main thread, which does the cross-source + synthesis (accumulation → main thread on Opus per Model Policy below). + +Use Layer 2 when the question is "how do I coordinate N subagents +without burning main-thread turns re-reading the same state." + +### Agent ensemble, knowledge bootload, handover protocol + +See **`.claude/agents/BOOT.md`** — it's the meta-card for the 19 +specialist agents + 5 meta-agents in this workspace. It covers: + +- **Meta-agent scopes** (`workspace-primer`, `integration-lead`, + `adk-coordinator`, `adk-behavior-monitor`, `truth-architect`). +- **Session-start knowledge bootload** — every subagent loads Tier-0 + (`LATEST_STATE.md` + `PR_ARC_INVENTORY.md`) unconditionally, then + Tier-1 knowledge docs by trigger table. +- **Knowledge Activation Protocol** — the trigger → agent → doc + mapping (updated 2026-04-19 with grammar / crystal / NARS / + coreference / AriGraph / codec rows).
+- **Handover Protocol** — `.claude/handovers/YYYY-MM-DD-HHMM-<from>-to-<to>.md` + files carry What-I-did / FINDING / CONJECTURE / + Blockers / Open-questions. APPEND-ONLY. + +A new session doesn't reinvent the ensemble — it reads `.claude/agents/BOOT.md`, +picks the agents whose objects are being touched, and lets them +bootload their own Tier-0 + Tier-1 docs. + +### Rule of thumb + +| Question | Layer | Primary substrate | +|---|---|---| +| "What expert capabilities compose this cycle?" | 1 | `contract::a2a_blackboard::Blackboard` | +| "How do cross-domain steps compose?" | 1 | `OrchestrationBridge` + `UnifiedStep` | +| "What does subagent A need to know before drafting?" | 2 | Mandatory reads + domain knowledge docs | +| "How do I not re-read the same context on every turn?" | 2 | Knowledge doc with `READ BY:` header | +| "How do I coordinate 3 subagents in parallel?" | 2 | Single main-thread turn with parallel `Agent` spawns | + +Layer 1 and Layer 2 do NOT conflict — they operate at different time +scales. A runtime A2A cycle inside a running shader does not involve +Claude subagents; a Claude-session subagent spawn does not write to +the runtime Blackboard. Keep them architecturally distinct. + +--- + +## Model Policy (P0 — never violate) + +**The split that matters: grindwork vs accumulation.** + +- **Grindwork** (single-task mechanical): write-this-file-from-spec, + grep-this-pattern, list-these-paths, run-these-tests, + draft-this-section — **Sonnet**. Bounded input, known output shape, no + synthesis across sources. +- **Accumulation** (multi-source synthesis): harvest-across-repos, + combine-N-docs-into-insights, trace-architecture-across-files, + cross-reference and integrate, judgment calls that depend on + seeing several inputs at once — **Opus**. Cheaper tiers produce + shallow outputs when asked to accumulate; quality drop is visible + and costly. + +**By subagent type:** +- **Main thread:** `claude-opus-4-7[1m]` (or current Opus) with deep + thinking.
Synthesis, architecture, review, decisions — all + accumulation. +- **`Plan` subagent:** Opus. Planning is accumulation by definition. +- **Code-review subagent:** Opus. Review is multi-file judgment. +- **`general-purpose` subagent:** depends on task. Pure grindwork → + Sonnet. Anything accumulating → Opus. When in doubt: Opus. +- **`Explore` subagent:** Sonnet default. Search is pattern matching. + If the explore requires synthesis across many files (mapping an + architecture), escalate to Opus. +- **NEVER `haiku` for any subagent in this workspace.** Quality floor + is Sonnet regardless of task simplicity. + +**Concrete test before spawning a subagent:** +> "Does this agent have to read N sources and produce something that +> only makes sense when those sources are held in mind together?" +> +> **Yes → Opus.** No (one source in, one shape out) → Sonnet. + +**Settings baseline** (`.claude/settings.local.json`, gitignored): +`alwaysThinkingEnabled: true`, `effortLevel: high`, +`fastModePerSessionOptIn: true`. Main thread stays at full depth. + +--- + +## GitHub Access Policy — Zipball for Reads, MCP for Writes + +Every `mcp__github__get_file_contents` call drops the full file into +session context and recharges on every subsequent turn. A 50 KB doc +read three times in a session = 150 KB of replayed context, not 50 KB. +This is the second-biggest cost lever after model choice. + +**Preferred pattern for cross-repo reads:** + +```bash +# One zipball per repo per session, stored under /tmp/sources/: +curl -H "Authorization: Bearer $GITHUB_TOKEN" \ + -L https://api.github.com/repos/AdaWorldAPI/<repo>/zipball/HEAD \ + -o /tmp/<repo>.zip +unzip -q /tmp/<repo>.zip -d /tmp/sources/ +# Then use local Read, Grep, Glob — zero context cost until output. +``` + +Or via `pygithub`'s `repo.get_archive_link('zipball')` when Python +is already the path. + +**Use MCP github ONLY for:** + +- `create_pull_request` — can't be done via zipball. 
+- `add_issue_comment` / `add_reply_to_pull_request_comment` / review + thread writes — writes to GitHub state. +- `pull_request_read` on our own PRs to check reviews / comments — + tiny JSON, not worth zipball overhead. +- `get_me`, `subscribe_pr_activity` — session setup. + +**Do NOT use MCP github for:** + +- Cross-repo file exploration — zipball + local grep is 95 % cheaper. +- Surveying many files from the same repo — same reason. +- Directory listings — extracted tree is free to traverse locally. + +**Edge case:** single targeted read when the exact path is known and +you'll read once → `mcp__github__get_file_contents` is fine. Zipball +overhead (download + extract) only pays off at 3+ reads per repo. + +**This maps to the grindwork/accumulation split too:** +zipball-then-grep-then-synthesize is accumulation, and the synthesis +step (main thread or Opus subagent) reads just the relevant greps, +not whole files. + +--- + ## What This Is Graph query engine AND cognitive codec stack for the Ada architecture. lance-graph is no longer @@ -305,7 +558,7 @@ their synergies, and their FINDING/CONJECTURE status. Never guess architecture. Every `.claude/knowledge/` document has a `READ BY:` header listing which agents MUST load it before producing output in that domain. When a knowledge trigger fires -(see `.claude/agents/README.md § Knowledge Activation Protocol`), the relevant +(see `.claude/agents/BOOT.md § Knowledge Activation Protocol`), the relevant knowledge docs are loaded BEFORE the agent responds. **Critical process rule:** `.claude/knowledge/bf16-hhtl-terrain.md` contains a