
docs(knowledge): lab = API+Planner+JIT, thinking harvest, I11 measurability #224

Merged
AdaWorldAPI merged 1 commit into main from claude/teleport-session-setup-wMZfb
Apr 20, 2026

Conversation

@AdaWorldAPI
Owner

Summary

Refines lab-vs-canonical-surface.md with three load-bearing
additions grounded in the #219/#220 calibration arc and the
revived REST/Cypher injection idea.

1. The lab surface is three-part: API + Planner + JIT

The prior "positive purpose" framing was too narrow (codec
iteration velocity). The actual architecture the lab surface buys:

| Part | Role |
|---|---|
| REST/gRPC API | Curl-friendly entry points; no rebuild per candidate |
| Planner | Real dispatch path under test, not a toy bench |
| JIT | Swap codec kernels / NARS rules / modulation constants at runtime without relinking |

All three together = one running binary that measures N
candidates against real inputs in seconds per call.
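
As a rough illustration of how the three parts fit together, the sketch below models one long-running process in which candidates are registered at runtime and measured through a shared dispatch path. Everything in it is hypothetical: `LabServer`, the `Kernel` signature, and the two toy kernels are stand-ins, not the lab server's actual types, and mean squared error is used only as a placeholder score.

```python
from typing import Callable, Dict, List

Kernel = Callable[[List[float]], List[float]]  # hypothetical codec kernel signature


class LabServer:
    """Toy stand-in for one long-running lab binary (all names are hypothetical)."""

    def __init__(self) -> None:
        self._kernels: Dict[str, Kernel] = {}

    def register(self, name: str, kernel: Kernel) -> None:
        # "JIT" role: install or replace a kernel while the process keeps running.
        self._kernels[name] = kernel

    def dispatch(self, name: str, row: List[float]) -> List[float]:
        # "Planner" role: every candidate goes through the same dispatch path.
        return self._kernels[name](row)

    def measure(self, name: str, rows: List[List[float]]) -> float:
        # "API" role: one call returns a score (mean squared reconstruction error).
        total, count = 0.0, 0
        for row in rows:
            out = self.dispatch(name, row)
            total += sum((a - b) ** 2 for a, b in zip(row, out))
            count += len(row)
        return total / count


server = LabServer()
rows = [[0.25, 0.5, 0.75], [1.0, 1.5, 2.0]]  # made-up input rows
server.register("passthrough", lambda row: list(row))
server.register("round_half", lambda row: [round(x * 2) / 2 for x in row])
scores = {name: server.measure(name, rows) for name in ("passthrough", "round_half")}
```

Registering a further candidate is one more `register` call against the same running object, which is the property the table attributes to the JIT part.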

2. Codec cert is token agreement, not synthetic ICC

Honest narration of what #219 / #220 actually measured:

| PR | Measurement | What it was |
|---|---|---|
| #219 | ICC 0.9998 on 128 rows, trained and measured on the same 128 rows | Synthetic / overfit: not real weights, not tokens |
| #220 | Reconstruction ICC 0.195 mean, 0/234 ≥ 0.99 on Qwen3-TTS-0.6B | Real weights, reconstruction only |
| Next | Token agreement vs Passthrough on full decode | The actual cert gate, only tractable on the three-part stack |

Rule of thumb for any codec candidate: reconstruction error →
reconstruction ICC (held-out rows) → token agreement. The cert
gate is token agreement, not synthetic ICC.
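
The description does not say how token agreement is computed. One plausible reading, sketched here as an assumption, is the fraction of decode positions at which the codec path emits the same token id as the Passthrough path. The function names, the made-up token ids, and the 0.99 threshold (borrowed from the ICC bar in the table) are all illustrative.

```python
from typing import Sequence


def token_agreement(reference: Sequence[int], candidate: Sequence[int]) -> float:
    """Fraction of decode positions where the candidate emits the reference token."""
    if len(reference) != len(candidate):
        raise ValueError("decodes must have the same length")
    if not reference:
        raise ValueError("empty decode")
    matches = sum(1 for r, c in zip(reference, candidate) if r == c)
    return matches / len(reference)


def passes_cert_gate(reference: Sequence[int], candidate: Sequence[int],
                     threshold: float = 0.99) -> bool:
    # The threshold is an assumption; the source does not state a pass mark.
    return token_agreement(reference, candidate) >= threshold


passthrough_tokens = [12, 7, 7, 301, 44, 9, 9, 2]  # made-up token ids
codec_tokens = [12, 7, 7, 301, 45, 9, 9, 2]        # one position differs
agreement = token_agreement(passthrough_tokens, codec_tokens)
```

A single differing position in eight already drops agreement to 7/8, which is why a codec with a low reconstruction error on individual tensors can still fail at this stage.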

3. Thinking harvest — the AGI magic bullet

The same API + Planner + JIT stack externalises the planner's
36-style / 13-verb / NARS trace. `POST /v1/planner/query { cypher: … }` returns `{ rows, thinking_trace: { active_styles, modulation, beliefs, tensions, entropy, verb_trail } }`. The trace
is log / replay / NARS-revise-able — that's the architectural shape
of a system that learns its own meta-inference. This is the
REST/Cypher injection path we can revive at near-zero cost now that
PR #221 landed the REST/gRPC scaffolding.
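
To make the proposed payload concrete, here is a minimal sketch of the request and response shapes named above. The field names are taken from the description; the value types, the sample values, and the helper names are guesses. Note that the automated review further down reports that this endpoint is not present in the current lab server, so this shows the proposed shape only and makes no network call.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ThinkingTrace:
    # Field names come from the description above; the value types are guesses.
    active_styles: List[str]
    modulation: Dict[str, float]
    beliefs: List[Any]
    tensions: List[Any]
    entropy: float
    verb_trail: List[str]


def build_request(cypher: str) -> str:
    """Serialise the proposed request body for POST /v1/planner/query."""
    return json.dumps({"cypher": cypher})


def parse_response(body: str) -> tuple:
    """Split a response into (rows, ThinkingTrace); raises if a trace field is missing."""
    payload = json.loads(body)
    return payload["rows"], ThinkingTrace(**payload["thinking_trace"])


# Hand-written sample response with placeholder values, not output from a real server.
sample = json.dumps({
    "rows": [{"n": "example-row"}],
    "thinking_trace": {
        "active_styles": ["style_a"],
        "modulation": {"gain": 0.5},
        "beliefs": [],
        "tensions": [],
        "entropy": 0.42,
        "verb_trail": ["verb_a", "verb_b"],
    },
})
rows, trace = parse_response(sample)
trace_log = [trace]  # append-only log that a later pass could replay or revise
```

Keeping the trace as a typed record rather than free-form JSON means a response that silently drops a field fails at parse time instead of producing a thinner log.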

I11 — Measurable stack, not a black box

New cross-cutting invariant: every layer (L0 ndarray → L4 planner)
emits harvest-ready trace through the lab surface:

| Layer | Trace emitted | Harvested via |
|---|---|---|
| L0 ndarray | SIMD pattern counters, instruction mix | `/v1/shader/probe` |
| L1 BindSpace | column hit masks, survivor counts | `/v1/shader/dispatch` response |
| L2 CognitiveShader | MetaWord + ShaderHit stream | ShaderSink → REST/gRPC stream |
| L3 CollapseGate | GateDecision + delta fingerprint | `/v1/shader/plan` response |
| L4 Planner | VerbTrace per step | `/v1/planner/query` `thinking_trace` field |

Rule: proposed changes that hide state from this trace contract
— "for perf" / "to simplify" — are rejected. The lab surface is the
observation port; shrinking it to bool-out turns the stack back
into a black box and kills the feedback loop.
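
One way such a rule could be checked mechanically is to compare what a change emits against a per-layer list of required trace fields. The sketch below assumes that approach; the snake_case keys are invented labels for the table entries above, not the actual wire types, and `missing_trace_fields` is a hypothetical helper.

```python
from typing import Dict, List, Mapping

# Layer -> trace fields named in the table above; the snake_case keys are hypothetical.
REQUIRED_TRACE: Dict[str, List[str]] = {
    "L0": ["simd_pattern_counters", "instruction_mix"],
    "L1": ["column_hit_masks", "survivor_counts"],
    "L2": ["meta_words", "shader_hits"],
    "L3": ["gate_decision", "delta_fingerprint"],
    "L4": ["verb_trace"],
}


def missing_trace_fields(emitted: Mapping[str, Mapping[str, object]]) -> List[str]:
    """Return 'layer.field' for every required trace field that a change would hide."""
    missing = []
    for layer, fields in REQUIRED_TRACE.items():
        layer_trace = emitted.get(layer, {})
        for field in fields:
            if field not in layer_trace:
                missing.append(f"{layer}.{field}")
    return missing


# A trace that carries every required field (values omitted for brevity).
full = {layer: {f: None for f in fields} for layer, fields in REQUIRED_TRACE.items()}

# A "simplified" L3 that only reports a boolean: the bool-out case the rule rejects.
shrunk = dict(full)
shrunk["L3"] = {"ok": True}

full_missing = missing_trace_fields(full)
shrunk_missing = missing_trace_fields(shrunk)
```

Here `full_missing` is empty, while `shrunk_missing` names the two L3 fields that the boolean-only response no longer exposes.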

Three purposes held together (none dominates)

  1. Iteration velocity — codec cert loop without rebuilds
  2. Thinking harvest — externalise planner reasoning for
    log/replay/revision (the AGI magic bullet)
  3. Canonical firewall — production still walks UnifiedStep
    via OrchestrationBridge; never sees Wire* per-op DTOs

Test Plan

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

… I11 measurability

The prior "positive purpose" framing was too narrow (codec iteration
velocity). The actual architecture the lab surface buys is three-part:

  REST/gRPC API  — no rebuild per codec candidate
  Planner        — real dispatch path under test (not a toy bench)
  JIT            — swap kernels at runtime without relinking

Two loads share this stack; neither is secondary:

1. Codec certification. Reconstruction ICC on real safetensors is
   necessary but not sufficient — the cert gate is token agreement
   vs Passthrough on full decode. PR #219's 0.9998 was synthetic /
   overfit-on-training; PR #220's 0.195 was real-weight but still
   reconstruction-only. The next load-bearing measurement is the
   token-level comparison, which is only tractable on this stack.
   At 8-17 min/rebuild × ~200 codec invariants to tune, iteration
   without the API is infeasible.

2. Thinking harvest (the AGI magic bullet). The same API + Planner +
   JIT externalises the planner's 36-style / 13-verb / NARS trace.
   POST a Cypher query, get {rows, thinking_trace} back. The trace
   is log / replay / NARS-revise-able — which is the architectural
   shape of a system that learns its own meta-inference. This is
   the REST/Cypher injection path we can revive at near-zero cost
   now that PR #221 landed the REST/gRPC scaffolding.
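
The rebuild-cost figure in item 1 can be worked through directly. The sketch assumes one rebuild per invariant for a single tuning pass, which is the simplest reading of "8-17 min/rebuild × ~200 codec invariants"; `rebuild_hours` is a hypothetical helper.

```python
def rebuild_hours(candidates: int, minutes_per_rebuild: float) -> float:
    """Wall-clock hours spent rebuilding if each candidate costs one rebuild."""
    return candidates * minutes_per_rebuild / 60.0


low = rebuild_hours(200, 8)    # 200 × 8 min  = 1600 min
high = rebuild_hours(200, 17)  # 200 × 17 min = 3400 min
```

Under that assumption a single pass costs roughly 27 to 57 hours of rebuild time alone, before any repeated tweaks per invariant.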

I11 (new invariant): Measurable stack, not a black box. Every layer
(L0 ndarray → L4 planner) emits a harvest-ready trace through the
lab surface. Proposed changes that shrink trace for perf/simplicity
are rejected — the trace contract is what makes the feedback loop
mechanisable.

Also refined: Decision Procedure item 3 (codec research is a
legitimate positive use, not a grudging exception); rule-of-thumb
measurement order (reconstruction error → reconstruction ICC →
token agreement) with token agreement as the cert gate.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
@AdaWorldAPI AdaWorldAPI merged commit e0c4305 into main Apr 20, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b569c967d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

| #219 | Claimed ICC 0.9998 at 6 B/row on 128 training rows; lab-gated CAM-PQ candidates behind `--features lab` | Research iteration inside the lab surface; no production wiring yet |
| #220 | D5 full-size validation via `cam_pq_calibrate` CLI on real safetensors; mean ICC **0.195**, 0/234 tensors ≥ 0.99; EPIPHANIES marks prior entry SUPERSEDED | Same lab surface exercised on full tensors — negative result surfaced **before** the codec linked into any canonical consumer |
```
POST /v1/planner/query { cypher: "MATCH (n:Topic) …" }
```

P1: Replace nonexistent planner endpoint in guidance

This section documents `POST /v1/planner/query` returning `rows` and `thinking_trace`, but that API does not exist in the current lab server: `crates/cognitive-shader-driver/src/serve.rs` exposes planner via `/v1/shader/plan` (and `/v1/shader/route`), and `crates/cognitive-shader-driver/src/wire.rs` defines `WirePlanResponse` without `rows`/`thinking_trace`. Because this document is positioned as an anti-hallucination architecture guard, pointing contributors to a nonexistent endpoint will cause broken client code and incorrect follow-on docs.


| Layer | Trace emitted | Harvested via |
|---|---|---|
| L0 ndarray SIMD | pattern counters, instruction mix | `/v1/shader/probe` (perf counters on hot kernels) |
| L1 BindSpace | column hit masks, survivor counts after each monotone-narrowing sweep | `/v1/shader/dispatch` response |

P2: Align I11 trace contract with actual dispatch payload

The new I11 table states that `/v1/shader/dispatch` responses expose "column hit masks" and "survivor counts," but the real dispatch payload (`WireCrystal` in `crates/cognitive-shader-driver/src/wire.rs`) does not contain those fields. Declaring this as a hard invariant in this doc creates a false API contract that downstream agents/reviewers may enforce, leading to incorrect assumptions about available telemetry.


AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
Operationalises PR #220's "What's Needed to Fix" list (wider codebook,
residual PQ, Hadamard pre-rotation, OPQ) as a parameter sweep through
the lab endpoint — every codec difference is a JIT kernel, not a cargo
rebuild. Phase 0 hardens the Wire surface once; Phases 1-4 run
unlimited candidates without further rebuilds; Phase 5 graduates
winners to the canonical OrchestrationBridge surface.

Structure:

  Phase 0 — API hardening (one rebuild, then frozen):
    D0.1 CodecParams in WireCalibrate
    D0.2 WireTokenAgreement endpoint (I11 cert gate)
    D0.3 WireSweep streaming + Lance append
    D0.4 surface freeze

  Phase 1 — JIT codec kernels (rebuild-free):
    D1.1 CodecKernelCache via JitCompiler (Cranelift)
    D1.2 Rotation primitives (Identity / Hadamard / OPQ)
    D1.3 Residual PQ via JIT composition

  Phase 2 — Token-agreement harness (the I11 cert gate):
    D2.1 Reference-model loader (ndarray safetensors)
    D2.2 Decode-and-compare loop (top-k, per-layer MSE)
    D2.3 Handler wiring

  Phase 3 — Sweep driver + Lance logger
  Phase 4 — DataFusion frontier analysis
  Phase 5 — Graduation to OrchestrationBridge (per winner only)

~1,920 LOC total; 1 upfront rebuild; unlimited candidates afterwards.
Compare to naive path (4 fixes × 8-17 min × N tweaks = hundreds of
hours). All work behind --features lab until graduation.

INTEGRATION_PLANS.md prepended per APPEND-ONLY rule, citing PR #224
dependency for the architectural framing.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
Retroactive hygiene for the recent PR arc + prospective enforcement
so the gap never recurs. User directive: "should have happened to
begin with."

LATEST_STATE.md:
  - Header: "Last updated 2026-04-20 post PR #224 (PR #225 open)"
  - Recently Shipped table: prepended rows for #225 (open), #224,
    and #223 with full shipped-content summaries
  - Contract Inventory: expanded cam:: entry with all new codec-
    sweep types (LaneWidth / Distance / Rotation / ResidualSpec /
    CodecParams / CodecParamsBuilder / CodecParamsError) including
    the precision-ladder-fires-before-JIT invariant
  - Active Branches: recorded claude/teleport-session-setup-wMZfb
    and its three merged PRs
  - Active Integration Plans: added codec-sweep-via-lab-infra-v1
    alongside elegant-herding-rocket-v1
  - Immediate Next Work: codec-sweep Phase 0 remainder (D0.1/0.2/0.3/
    0.5) + the elegant-herding Phase 2 block

PR_ARC_INVENTORY.md (APPEND-ONLY — PREPEND only):
  - #225 entry: plan + CodecParams/Builder/precision validation +
    rules A-F locked + decisions for future PRs
  - #224 entry: three-part lab stack + thinking harvest + I11
    measurability locked
  - #223 entry: LAB-ONLY firewall + AGI-as-SoA + I1-I10 invariants
    locked (the cross-cutting architectural ruleset this workspace
    now enforces)

STATUS_BOARD.md:
  - New section: codec-sweep-via-lab-infra-v1 with 18 D-ids across
    5 phases (D0.6/D0.7 marked Shipped-in-#225; remainder Queued)

EPIPHANIES.md (APPEND-ONLY — PREPEND only, 6 new dated entries):
  - Board hygiene is the driving seat, not cleanup (this session's
    self-reflection turned into a rule)
  - Codec cert is token agreement, not synthetic ICC (#219/#220
    arc; #225 CalibrationEqualsMeasurement typed rejection)
  - Lab REST surface is three-part (API + Planner + JIT), not just
    scaffolding
  - Thinking harvest via REST/Cypher = the AGI magic bullet
  - SoA never scalarises without ndarray (iron rule Rule C)
  - AGI is the glove, not the oracle — four-axis SoA is what you
    wear

CLAUDE.md — new top-level § "The Stance — Driving Seat +
AGI-as-Glove (P0, read first)":

  - Explicit driving-seat posture: the session STEERS the stack,
    doesn't observe it
  - AGI-as-glove doctrine concrete: topic → FingerprintColumns,
    angle → QualiaColumn, thinking → MetaColumn, planner →
    EdgeColumn. New capability lands as a new column, not a layer.
  - MANDATORY Board-Hygiene Rule as a table: every PR that adds a
    type / plan / D-id / epiphany / tech-debt / issue MUST update
    the corresponding board file IN THE SAME COMMIT. Retroactive
    hygiene (merge PR → later cleanup) is now an anti-pattern the
    rule forbids.
  - "Consult, don't guess" — agent/knowledge-first discipline:
    specialist-agent card → knowledge doc → board inventory →
    only then grep. Subagent spawn with curated docs beats main-
    thread grep.

147/147 contract suite still passing. Doc-only PR otherwise
(Cargo.toml / src/* unchanged; the orphan serde_yaml/base64 deps
from the timed-out bus-compiler subagent were reverted — they'll
land with D0.1/D0.3 when the Wire code lands).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI added a commit that referenced this pull request Apr 20, 2026
…p-wMZfb

board hygiene + CLAUDE.md driving-seat tightening (post #223/#224/#225)