docs(knowledge): lab = API+Planner+JIT, thinking harvest, I11 measurability #224
… I11 measurability

The prior "positive purpose" framing was too narrow (codec iteration velocity). The actual architecture the lab surface buys is three-part:

- REST/gRPC API: no rebuild per codec candidate
- Planner: real dispatch path under test (not a toy bench)
- JIT: swap kernels at runtime without relinking

Two loads share this stack; neither is secondary:

1. Codec certification. Reconstruction ICC on real safetensors is necessary but not sufficient; the cert gate is token agreement vs Passthrough on full decode. PR #219's 0.9998 was synthetic / overfit-on-training; PR #220's 0.195 was real-weight but still reconstruction-only. The next load-bearing measurement is the token-level comparison, which is only tractable on this stack. At 8-17 min/rebuild × ~200 codec invariants to tune, iteration without the API is infeasible.
2. Thinking harvest (the AGI magic bullet). The same API + Planner + JIT externalises the planner's 36-style / 13-verb / NARS trace. POST a Cypher query, get {rows, thinking_trace} back. The trace is log / replay / NARS-revise-able, which is the architectural shape of a system that learns its own meta-inference. This is the REST/Cypher injection path we can revive at near-zero cost now that PR #221 landed the REST/gRPC scaffolding.

I11 (new invariant): Measurable stack, not a black box. Every layer (L0 ndarray → L4 planner) emits a harvest-ready trace through the lab surface. Proposed changes that shrink trace for perf/simplicity are rejected: the trace contract is what makes the feedback loop mechanisable.

Also refined: Decision Procedure item 3 (codec research is a legitimate positive use, not a grudging exception); rule-of-thumb measurement order (reconstruction error → reconstruction ICC → token agreement) with token agreement as the cert gate.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
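To make the rebuild-cost claim concrete, here is a back-of-envelope sketch in Rust. It assumes one full, serial rebuild per invariant tweak and ignores measurement time entirely; `rebuild_hours` is a hypothetical helper, not part of the codebase.

```rust
/// Wall-clock hours spent only on rebuilding, assuming every candidate
/// needs one full rebuild and rebuilds run one after another.
/// Hypothetical model for illustration, not measured data.
fn rebuild_hours(candidates: u32, minutes_per_rebuild: f64) -> f64 {
    f64::from(candidates) * minutes_per_rebuild / 60.0
}
```

For ~200 invariants this comes to roughly 27 hours at 8 min/rebuild and roughly 57 hours at 17 min/rebuild, for a single pass with no measurement at all.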
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b569c967d9
| #219 | Claimed ICC 0.9998 at 6 B/row on 128 training rows; lab-gated CAM-PQ candidates behind `--features lab` | Research iteration inside the lab surface; no production wiring yet |
| #220 | D5 full-size validation via `cam_pq_calibrate` CLI on real safetensors; mean ICC **0.195**, 0/234 tensors ≥ 0.99; EPIPHANIES marks prior entry SUPERSEDED | Same lab surface exercised on full tensors — negative result surfaced **before** the codec linked into any canonical consumer |
```
POST /v1/planner/query { cypher: "MATCH (n:Topic) …" }
```
Replace nonexistent planner endpoint in guidance
This section documents `POST /v1/planner/query` returning `rows` and `thinking_trace`, but that API does not exist in the current lab server: `crates/cognitive-shader-driver/src/serve.rs` exposes the planner via `/v1/shader/plan` (and `/v1/shader/route`), and `crates/cognitive-shader-driver/src/wire.rs` defines `WirePlanResponse` without `rows`/`thinking_trace`. Because this document is positioned as an anti-hallucination architecture guard, pointing contributors to a nonexistent endpoint will cause broken client code and incorrect follow-on docs.
| Layer | Trace emitted | Harvested via |
|---|---|---|
| L0 ndarray SIMD | pattern counters, instruction mix | `/v1/shader/probe` (perf counters on hot kernels) |
| L1 BindSpace | column hit masks, survivor counts after each monotone-narrowing sweep | `/v1/shader/dispatch` response |
Align I11 trace contract with actual dispatch payload
The new I11 table states that `/v1/shader/dispatch` responses expose "column hit masks" and "survivor counts," but the real dispatch payload (`WireCrystal` in `crates/cognitive-shader-driver/src/wire.rs`) does not contain those fields. Declaring this as a hard invariant in this doc creates a false API contract that downstream agents/reviewers may enforce, leading to incorrect assumptions about available telemetry.
Operationalises PR #220's "What's Needed to Fix" list (wider codebook, residual PQ, Hadamard pre-rotation, OPQ) as a parameter sweep through the lab endpoint: every codec difference is a JIT kernel, not a cargo rebuild. Phase 0 hardens the Wire surface once; Phases 1-4 run unlimited candidates without further rebuilds; Phase 5 graduates winners to the canonical OrchestrationBridge surface.

Structure:

- Phase 0: API hardening (one rebuild, then frozen)
  - D0.1 CodecParams in WireCalibrate
  - D0.2 WireTokenAgreement endpoint (I11 cert gate)
  - D0.3 WireSweep streaming + Lance append
  - D0.4 surface freeze
- Phase 1: JIT codec kernels (rebuild-free)
  - D1.1 CodecKernelCache via JitCompiler (Cranelift)
  - D1.2 Rotation primitives (Identity / Hadamard / OPQ)
  - D1.3 Residual PQ via JIT composition
- Phase 2: Token-agreement harness (the I11 cert gate)
  - D2.1 Reference-model loader (ndarray safetensors)
  - D2.2 Decode-and-compare loop (top-k, per-layer MSE)
  - D2.3 Handler wiring
- Phase 3: Sweep driver + Lance logger
- Phase 4: DataFusion frontier analysis
- Phase 5: Graduation to OrchestrationBridge (per winner only)

~1,920 LOC total; 1 upfront rebuild; unlimited candidates afterwards. Compare to the naive path (4 fixes × 8-17 min × N tweaks = hundreds of hours). All work behind --features lab until graduation.

INTEGRATION_PLANS.md prepended per APPEND-ONLY rule, citing PR #224 dependency for the architectural framing.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
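One way to picture "validation before JIT" is a builder whose `build()` rejects bad parameter sets before any kernel is compiled. The sketch below is hypothetical: only the type names `CodecParams`, `CodecParamsBuilder`, `CodecParamsError`, the `CalibrationEqualsMeasurement` rejection, and the Identity / Hadamard / OPQ rotations come from this PR arc. The field names, the power-of-two codebook rule, and treating *any* overlap between calibration and measurement rows as `CalibrationEqualsMeasurement` are assumptions, not the shipped types.

```rust
use std::collections::HashSet;

/// Rotation primitives named in D1.2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rotation {
    Identity,
    Hadamard,
    Opq,
}

/// Hypothetical parameter set; the shipped `CodecParams` fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct CodecParams {
    pub rotation: Rotation,
    pub codebook_size: usize,
    pub residual_stages: usize,
    pub calibration_rows: Vec<usize>,
    pub measurement_rows: Vec<usize>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CodecParamsError {
    /// Illustrative rule: codebook sizes must be powers of two.
    CodebookNotPowerOfTwo(usize),
    /// Measuring on calibration rows reproduces the #219 overfit.
    CalibrationEqualsMeasurement,
}

#[derive(Debug, Default)]
pub struct CodecParamsBuilder {
    rotation: Option<Rotation>,
    codebook_size: usize,
    residual_stages: usize,
    calibration_rows: Vec<usize>,
    measurement_rows: Vec<usize>,
}

impl CodecParamsBuilder {
    pub fn rotation(mut self, r: Rotation) -> Self { self.rotation = Some(r); self }
    pub fn codebook_size(mut self, n: usize) -> Self { self.codebook_size = n; self }
    pub fn residual_stages(mut self, n: usize) -> Self { self.residual_stages = n; self }
    pub fn calibration_rows(mut self, rows: Vec<usize>) -> Self { self.calibration_rows = rows; self }
    pub fn measurement_rows(mut self, rows: Vec<usize>) -> Self { self.measurement_rows = rows; self }

    /// All checks run here, before anything would be handed to the JIT.
    pub fn build(self) -> Result<CodecParams, CodecParamsError> {
        if !self.codebook_size.is_power_of_two() {
            return Err(CodecParamsError::CodebookNotPowerOfTwo(self.codebook_size));
        }
        // Reject candidates that would be scored on rows they were fitted on.
        let calibration: HashSet<usize> = self.calibration_rows.iter().copied().collect();
        if self.measurement_rows.iter().any(|r| calibration.contains(r)) {
            return Err(CodecParamsError::CalibrationEqualsMeasurement);
        }
        Ok(CodecParams {
            rotation: self.rotation.unwrap_or(Rotation::Identity),
            codebook_size: self.codebook_size,
            residual_stages: self.residual_stages,
            calibration_rows: self.calibration_rows,
            measurement_rows: self.measurement_rows,
        })
    }
}
```

Because the builder is the only way to obtain a `CodecParams`, a sweep driver can never hand the JIT a candidate that skipped these checks.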
Retroactive hygiene for the recent PR arc + prospective enforcement so the gap never recurs. User directive: "should have happened to begin with."

LATEST_STATE.md:
- Header: "Last updated 2026-04-20 post PR #224 (PR #225 open)"
- Recently Shipped table: prepended rows for #225 (open), #224, and #223 with full shipped-content summaries
- Contract Inventory: expanded cam:: entry with all new codec-sweep types (LaneWidth / Distance / Rotation / ResidualSpec / CodecParams / CodecParamsBuilder / CodecParamsError), including the precision-ladder-fires-before-JIT invariant
- Active Branches: recorded claude/teleport-session-setup-wMZfb and its three merged PRs
- Active Integration Plans: added codec-sweep-via-lab-infra-v1 alongside elegant-herding-rocket-v1
- Immediate Next Work: codec-sweep Phase 0 remainder (D0.1/0.2/0.3/0.5) + the elegant-herding Phase 2 block

PR_ARC_INVENTORY.md (APPEND-ONLY, PREPEND only):
- #225 entry: plan + CodecParams/Builder/precision validation + rules A-F locked + decisions for future PRs
- #224 entry: three-part lab stack + thinking harvest + I11 measurability locked
- #223 entry: LAB-ONLY firewall + AGI-as-SoA + I1-I10 invariants locked (the cross-cutting architectural ruleset this workspace now enforces)

STATUS_BOARD.md:
- New section: codec-sweep-via-lab-infra-v1 with 18 D-ids across 5 phases (D0.6/D0.7 marked Shipped-in-#225; remainder Queued)

EPIPHANIES.md (APPEND-ONLY, PREPEND only; 6 new dated entries):
- Board hygiene is the driving seat, not cleanup (this session's self-reflection turned into a rule)
- Codec cert is token agreement, not synthetic ICC (#219 → #220 arc; #225 CalibrationEqualsMeasurement typed rejection)
- Lab REST surface is three-part (API + Planner + JIT), not just scaffolding
- Thinking harvest via REST/Cypher = the AGI magic bullet
- SoA never scalarises without ndarray (iron rule Rule C)
- AGI is the glove, not the oracle: four-axis SoA is what you wear

CLAUDE.md: new top-level § "The Stance — Driving Seat + AGI-as-Glove (P0, read first)":
- Explicit driving-seat posture: the session STEERS the stack, doesn't observe it
- AGI-as-glove doctrine made concrete: topic → FingerprintColumns, angle → QualiaColumn, thinking → MetaColumn, planner → EdgeColumn. New capability lands as a new column, not a layer.
- MANDATORY Board-Hygiene Rule as a table: every PR that adds a type / plan / D-id / epiphany / tech-debt / issue MUST update the corresponding board file IN THE SAME COMMIT. Retroactive hygiene (merge PR → later cleanup) is now an anti-pattern the rule forbids.
- "Consult, don't guess": agent/knowledge-first discipline (specialist-agent card → knowledge doc → board inventory → only then grep). Subagent spawn with curated docs beats main-thread grep.

147/147 contract suite still passing. Doc-only PR otherwise (Cargo.toml / `src/*` unchanged; the orphan serde_yaml/base64 deps from the timed-out bus-compiler subagent were reverted and will land with D0.1/D0.3 when the Wire code lands).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
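The Board-Hygiene Rule lends itself to a mechanical check. Here is a minimal sketch, assuming a CI step that knows which kinds of change a commit introduces and which files it touched. The `ChangeKind` enum and the one-file-per-kind mapping are hypothetical. Tech-debt and issue entries are left out because their board files are not named here.

```rust
/// Kinds of change that the rule says must be mirrored on a board file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeKind {
    NewType,
    NewPlan,
    NewDId,
    NewEpiphany,
}

/// Hypothetical mapping from change kind to the board file it must touch.
pub fn board_file(kind: ChangeKind) -> &'static str {
    match kind {
        ChangeKind::NewType => "LATEST_STATE.md",
        ChangeKind::NewPlan => "INTEGRATION_PLANS.md",
        ChangeKind::NewDId => "STATUS_BOARD.md",
        ChangeKind::NewEpiphany => "EPIPHANIES.md",
    }
}

/// Returns the board files the commit should have updated but did not,
/// without duplicates, in the order the change kinds were listed.
pub fn missing_board_updates(kinds: &[ChangeKind], touched: &[&str]) -> Vec<&'static str> {
    let mut missing: Vec<&'static str> = Vec::new();
    for &kind in kinds {
        let file = board_file(kind);
        // Compare on the final path component so `docs/STATUS_BOARD.md` counts.
        let updated = touched.iter().any(|path| path.rsplit('/').next() == Some(file));
        if !updated && !missing.contains(&file) {
            missing.push(file);
        }
    }
    missing
}
```

A non-empty result would fail the commit, which is what "in the same commit" asks for.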
Summary
Refines `lab-vs-canonical-surface.md` with three load-bearing additions grounded in the #219 → #220 calibration arc and the revived REST/Cypher injection idea.
1. The lab surface is three-part: API + Planner + JIT
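The prior "positive purpose" framing was too narrow (codec
iteration velocity). The actual architecture the lab surface buys:

- REST/gRPC API: no rebuild per codec candidate
- Planner: real dispatch path under test (not a toy bench)
- JIT: swap kernels at runtime without relinking

All three together = one running binary that measures N
candidates against real inputs in seconds per call.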
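The JIT leg is what lets one binary try N candidates: kernels live in a runtime table, so replacing one is a data update rather than a relink. The sketch below uses boxed closures as a stand-in for Cranelift-compiled kernels; `KernelTable`, `install`, and `run` are hypothetical names, not the `JitCompiler` API.

```rust
use std::collections::HashMap;

/// A codec kernel: encodes one row of weights into bytes. In the real stack
/// this would be JIT-compiled machine code; a boxed closure stands in here.
pub type Kernel = Box<dyn Fn(&[f32]) -> Vec<u8>>;

/// Hypothetical runtime kernel table. Installing under an existing name
/// replaces the old kernel; the host binary is never rebuilt or relinked.
#[derive(Default)]
pub struct KernelTable {
    kernels: HashMap<String, Kernel>,
}

impl KernelTable {
    /// Registers (or hot-swaps) the kernel for a codec candidate.
    pub fn install(&mut self, name: &str, kernel: Kernel) {
        self.kernels.insert(name.to_string(), kernel);
    }

    /// Runs the named kernel on one row; `None` if no such candidate exists.
    pub fn run(&self, name: &str, row: &[f32]) -> Option<Vec<u8>> {
        self.kernels.get(name).map(|k| k(row))
    }
}
```

A sweep request then names a candidate, and the handler calls `run` on the current entry, so swapping kernels between requests needs no cargo rebuild.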
2. Codec cert is token agreement, not synthetic ICC
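Honest narration of what #219 / #220 actually measured:

- #219: ICC 0.9998 at 6 B/row on 128 training rows; synthetic / overfit-on-training.
- #220: mean ICC 0.195 on real safetensors (0/234 tensors ≥ 0.99); real-weight but still reconstruction-only.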
Rule of thumb for any codec candidate: reconstruction error →
reconstruction ICC (held-out rows) → token agreement. The cert
gate is token agreement, not synthetic ICC.
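The measurement order can be written as a short-circuiting gate, so the expensive token-agreement decode only runs for candidates that clear the cheaper stages. This is a sketch under stated assumptions: `certify`, `Thresholds`, and any threshold values are hypothetical, and token agreement is taken here to mean the fraction of decode positions where the candidate emits the same token as Passthrough.

```rust
/// Outcome of the measurement ladder; cheaper stages are checked first.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    FailedReconstructionError,
    FailedHeldOutIcc,
    FailedTokenAgreement,
    Certified,
}

/// Hypothetical acceptance thresholds for each stage.
pub struct Thresholds {
    pub max_recon_error: f64,
    pub min_icc: f64,
    pub min_agreement: f64,
}

/// Fraction of decode positions where the candidate matches Passthrough.
/// An empty decode counts as zero agreement (nothing was certified).
pub fn token_agreement(passthrough: &[u32], candidate: &[u32]) -> f64 {
    assert_eq!(passthrough.len(), candidate.len(), "decodes must be the same length");
    if passthrough.is_empty() {
        return 0.0;
    }
    let same = passthrough.iter().zip(candidate).filter(|(a, b)| a == b).count();
    same as f64 / passthrough.len() as f64
}

/// Later stages are closures so they only run if earlier stages pass.
pub fn certify(
    recon_error: f64,
    held_out_icc: impl FnOnce() -> f64,
    agreement: impl FnOnce() -> f64,
    t: &Thresholds,
) -> Verdict {
    if recon_error > t.max_recon_error {
        return Verdict::FailedReconstructionError;
    }
    if held_out_icc() < t.min_icc {
        return Verdict::FailedHeldOutIcc;
    }
    if agreement() < t.min_agreement {
        return Verdict::FailedTokenAgreement;
    }
    Verdict::Certified
}
```

With an ICC bar of 0.99, a candidate scoring 0.195 like #220 stops at the second stage without ever decoding, while a high ICC alone can never reach `Certified`.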
3. Thinking harvest — the AGI magic bullet
The same API + Planner + JIT stack externalises the planner's
36-style / 13-verb / NARS trace.
`POST /v1/planner/query { cypher: … }` returns `{ rows, thinking_trace: { active_styles, modulation, beliefs, tensions, entropy, verb_trail } }` (a proposed endpoint: the current lab server exposes the planner via `/v1/shader/plan` and `/v1/shader/route`, and `WirePlanResponse` carries neither `rows` nor `thinking_trace`). The trace is log / replay / NARS-revise-able, which is the architectural shape of a system that learns its own meta-inference. This is the REST/Cypher injection path we can revive at near-zero cost now that PR #221 landed the REST/gRPC scaffolding.
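To show what "NARS-revise-able" could mean in practice, the sketch below models the proposed `thinking_trace` as plain Rust structs and replays a log of traces, merging beliefs about the same statement with the standard NARS revision rule (evidence horizon k = 1). Only the field names come from the response shape above; the field types, the `Belief` shape, and `replay` are assumptions.

```rust
/// Hypothetical belief record: a statement with a NARS truth value.
#[derive(Debug, Clone, PartialEq)]
pub struct Belief {
    pub statement: String,
    pub frequency: f64,
    pub confidence: f64,
}

/// Hypothetical shape of the proposed `thinking_trace` payload.
#[derive(Debug, Clone)]
pub struct ThinkingTrace {
    pub active_styles: Vec<String>,
    pub modulation: f64,
    pub beliefs: Vec<Belief>,
    pub tensions: Vec<String>,
    pub entropy: f64,
    pub verb_trail: Vec<String>,
}

/// NARS revision with k = 1: convert confidence to evidence weight
/// w = c / (1 - c), pool the evidence, and convert back.
pub fn revise(a: &Belief, b: &Belief) -> Belief {
    assert!(a.confidence > 0.0 && a.confidence < 1.0, "confidence must lie in (0, 1)");
    assert!(b.confidence > 0.0 && b.confidence < 1.0, "confidence must lie in (0, 1)");
    let w1 = a.confidence / (1.0 - a.confidence);
    let w2 = b.confidence / (1.0 - b.confidence);
    let w = w1 + w2;
    Belief {
        statement: a.statement.clone(),
        frequency: (w1 * a.frequency + w2 * b.frequency) / w,
        confidence: w / (w + 1.0),
    }
}

/// Replays logged traces in order, revising beliefs that share a statement.
pub fn replay(log: &[ThinkingTrace]) -> Vec<Belief> {
    let mut merged: Vec<Belief> = Vec::new();
    for trace in log {
        for b in &trace.beliefs {
            match merged.iter_mut().find(|m| m.statement == b.statement) {
                Some(m) => *m = revise(m, b),
                None => merged.push(b.clone()),
            }
        }
    }
    merged
}
```

Two equally confident but contradictory observations revise to frequency 0.5 with higher confidence than either alone, which is the property that makes a logged trace worth replaying.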
I11 — Measurable stack, not a black box
New cross-cutting invariant: every layer (L0 ndarray → L4 planner)
emits harvest-ready trace through the lab surface:
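| Layer | Trace emitted | Harvested via |
|---|---|---|
| L0 ndarray SIMD | pattern counters, instruction mix | `/v1/shader/probe` |
| L1 BindSpace | column hit masks, survivor counts after each monotone-narrowing sweep | `/v1/shader/dispatch` response |
| L2 | `MetaWord` + `ShaderHit` stream | `ShaderSink` → REST/gRPC stream |
| L3 | `GateDecision` + delta fingerprint | `/v1/shader/plan` response |
| L4 planner | `VerbTrace` per step | `/v1/planner/query` `thinking_trace` field |

Rule: proposed changes that hide state from this trace contract ("for perf" / "to simplify") are rejected. The lab surface is the observation port; shrinking it to bool-out turns the stack back into a black box and kills the feedback loop.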
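The I11 rule can be enforced as a review-time check that compares a proposed payload's trace fields with the current ones and rejects any drop. A minimal sketch, with `dropped_trace_fields` as a hypothetical helper; in practice the field lists would come from the Wire DTO definitions rather than hand-written string slices.

```rust
use std::collections::BTreeSet;

/// Returns the trace fields that a proposed payload would drop, in the
/// order they appear in `current`. An empty result means the change keeps
/// the I11 trace contract; anything else is grounds for rejection.
pub fn dropped_trace_fields(current: &[&str], proposed: &[&str]) -> Vec<String> {
    let kept: BTreeSet<&str> = proposed.iter().copied().collect();
    current
        .iter()
        .filter(|field| !kept.contains(*field))
        .map(|field| field.to_string())
        .collect()
}
```

Shrinking a response to a bare boolean corresponds to `proposed` being empty, which reports every field as dropped.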
Three purposes held together (none dominates)
log/replay/revision (the AGI magic bullet)
`UnifiedStep` via `OrchestrationBridge`; never sees `Wire*` per-op DTOs

Test Plan
positive use of the lab surface, with graduation rule
accurately without overselling either ICC number
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh