
D1+D2+D5: CAM-PQ calibration pipeline — honest negative result #220

Merged
AdaWorldAPI merged 2 commits into main from claude/cam-pq-production-d1-d2
Apr 20, 2026

Conversation

@AdaWorldAPI
Owner

Summary

  • D1 — route_tensor classifier in lance-graph-contract::cam: routes tensors to CamPq / Passthrough / Skip per invariant I1. 10 tests, 133/133 contract suite passes.
  • D2 — cam_pq_calibrate CLI (--features calibrate): reads safetensors, trains per-tensor CamCodebook, encodes fingerprints, serializes codebooks + fingerprints + manifest.json with SHA256 + ICC + reconstruction error.
  • D5 — Full-size validation on Qwen3-TTS-0.6B: FAILS the ≥0.99 ICC gate.
  • Diagnostic probe (cam_pq_row_count_probe) demonstrates the root cause.
  • EPIPHANIES.md updated: prior "CAM-PQ solves argmax" entry marked SUPERSEDED.

Negative Result

PR #218's bench reported ICC 0.9998, but the codebook was trained on 128 rows and measured on those same 128 rows. With 256 centroids per subspace, 128 rows fit trivially: every row gets its own centroid. The result does not generalize to rows outside the training set.
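
The trivial fit can be illustrated with a toy nearest-centroid quantizer. This is only a sketch and not the CamCodebook trainer: it assumes that when the centroid budget is at least the row count, training degenerates to memorizing one centroid per row. The names `memorize`, `quantize_error`, and `toy_rows` are hypothetical.

```rust
/// Squared Euclidean distance between two equal-length rows.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Mean squared distance from each row to its nearest centroid.
fn quantize_error(rows: &[Vec<f32>], centroids: &[Vec<f32>]) -> f32 {
    let total: f32 = rows
        .iter()
        .map(|r| {
            centroids
                .iter()
                .map(|c| dist2(r, c))
                .fold(f32::INFINITY, f32::min)
        })
        .sum();
    total / rows.len() as f32
}

/// Degenerate "training" when k >= n: every training row becomes its own centroid.
/// (Assumption: this stands in for what a k-means fit converges to in that regime.)
fn memorize(train: &[Vec<f32>]) -> Vec<Vec<f32>> {
    train.to_vec()
}

/// Deterministic toy data: row i is [i, (i * i) mod 7].
fn toy_rows(range: std::ops::Range<u32>) -> Vec<Vec<f32>> {
    range.map(|i| vec![i as f32, ((i * i) % 7) as f32]).collect()
}
```

Measuring on `toy_rows(0..128)` with a codebook memorized from the same rows gives zero error, while `toy_rows(128..256)` against that codebook gives a nonzero error. That is the same train-equals-test effect described above.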

Full-size validation (234 argmax-regime tensors, Qwen3-TTS-0.6B):

| Metric | Value |
|---|---|
| Mean ICC | 0.195 |
| Max ICC | 0.957 |
| Tensors ≥ 0.99 ICC | 0 / 234 |
| Relative L2 error | 0.70–0.90 |

Diagnostic on one gate_proj [3072, 1024]:

| n_train | icc_train | icc_all_rows |
|---|---|---|
| 128 | 1.000 | −0.304 |
| 256 | 1.000 | −0.130 |
| 512 | 0.531 | 0.015 |
| 3072 | −0.079 | −0.079 |

Root cause: 6×256 PQ is centroid-starved for production tensors (1024–3072 rows). The "128× compression" claim was extrapolated from a trivial in-training fit.

What's Sound

Infrastructure works correctly: the CLI, route classifier, codebook serialization format, ICC/reconstruction measurement harness. The negative result is in the codec's capacity, not the tooling.

What's Needed to Fix

  • (a) Wider codebook: 1024+ centroids per subspace (10 bits per subspace × 6 subspaces = 60 bits = 7.5 B/row)
  • (b) Residual PQ: encode residuals after first pass
  • (c) Hadamard pre-rotation to decorrelate subspaces
  • (d) OPQ (optimized product quantization) rotation
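
The byte cost in option (a) follows from the index width per subspace. A minimal sketch of that arithmetic, assuming tightly bit-packed indices (the helper names are hypothetical and not part of the CLI):

```rust
/// Bits needed to index one of `centroids` entries, i.e. ceil(log2(centroids)).
/// Requires centroids >= 1.
fn index_bits(centroids: u32) -> u32 {
    u32::BITS - (centroids - 1).leading_zeros()
}

/// Fingerprint size in bytes per row for a product quantizer,
/// assuming indices are bit-packed with no padding.
fn bytes_per_row(subspaces: u32, centroids: u32) -> f64 {
    (subspaces * index_bits(centroids)) as f64 / 8.0
}
```

With 6 subspaces, 256 centroids cost 6 bytes per row (the current fingerprint size) and 1024 centroids cost 7.5 bytes per row.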

Test plan

  • cargo test -p lance-graph-contract --lib route_tests — 10/10 pass
  • cargo test -p lance-graph-contract — 133/133 pass
  • cam_pq_calibrate builds and runs on Qwen3-TTS-0.6B safetensors
  • cam_pq_row_count_probe reproduces the 128-row artifact

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

claude added 2 commits April 20, 2026 07:29
Enforces invariant I1: index-regime tensors (embed_tokens, lm_head,
token_embd, wte/wpe) MUST stay Passthrough — identity lookup can't
survive any codec. Argmax-regime (attention Q/K/V/O, MLP gate/up/down)
routes to CamPq. Norms/conv/small → Skip.

Order of rules matters: index-regime match comes BEFORE the
ambiguous-large-2D fallback so lm_head (2D, 151936×hidden) isn't
misrouted. Covered by lm_head_not_misrouted_as_campq test.
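
A rough sketch of why the rule order matters. This is not the code in lance-graph-contract::cam: the function name, the substring patterns, and the size threshold in the fallback are all assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    CamPq,
    Passthrough,
    Skip,
}

/// Hypothetical classifier: the first matching rule wins.
fn route_sketch(name: &str, dims: &[u64]) -> Route {
    // Assumed name fragments; the real matcher may differ.
    const INDEX: [&str; 5] = ["embed_tokens", "lm_head", "token_embd", "wte", "wpe"];
    const ARGMAX: [&str; 7] = [
        "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",
    ];

    // Rule 1: index-regime tensors are checked first so they can never
    // fall through to the shape-based fallback below.
    if INDEX.iter().any(|p| name.contains(*p)) {
        return Route::Passthrough;
    }
    // Rule 2: norms and conv layers are skipped.
    if name.contains("norm") || name.contains("conv") {
        return Route::Skip;
    }
    // Rule 3: attention and MLP projections go to the codec.
    if ARGMAX.iter().any(|p| name.contains(*p)) {
        return Route::CamPq;
    }
    // Rule 4: ambiguous large 2D tensors (threshold is an assumption).
    if dims.len() == 2 && dims[0] >= 1024 && dims[1] >= 1024 {
        Route::CamPq
    } else {
        Route::Skip
    }
}
```

If rule 4 ran before rule 1, an `lm_head` of shape 151936×1024 would satisfy the large-2D test and be routed to CamPq.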

8 tests covering Qwen/Llama/GPT-2/GGUF naming conventions. 133/133
contract tests pass. Zero deps preserved.

First deliverable (D1) of the CAM-PQ production wiring plan merged
in PR #219.
…sult

D2 — cam_pq_calibrate binary: reads safetensors, classifies tensors via
route_tensor (D1), trains a CamCodebook per argmax-regime tensor, encodes
all rows to 6-byte fingerprints, measures ICC_3_1 and relative L2 error,
writes codebooks + fingerprints + manifest.json.
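
For reference, ICC(3,1) in the Shrout and Fleiss sense is (MS_R − MS_E) / (MS_R + (k − 1)·MS_E), where MS_R is the between-targets mean square and MS_E the residual mean square of a two-way table with k raters. The sketch below computes it for an n×k table. How the calibration harness builds that table (for example, original versus reconstructed values as two raters) is an assumption here, and `icc_3_1` is a hypothetical name.

```rust
/// ICC(3,1): two-way mixed, consistency, single measurement.
/// `table` holds n targets (rows) by k raters (columns); needs n >= 2, k >= 2.
fn icc_3_1(table: &[Vec<f64>]) -> f64 {
    let n = table.len() as f64;
    let k = table[0].len() as f64;
    let grand = table.iter().flatten().sum::<f64>() / (n * k);

    // Total, between-target (row) and between-rater (column) sums of squares.
    let ss_total: f64 = table.iter().flatten().map(|x| (x - grand).powi(2)).sum();
    let ss_rows: f64 = table
        .iter()
        .map(|r| {
            let m = r.iter().sum::<f64>() / k;
            k * (m - grand).powi(2)
        })
        .sum();
    let ss_cols: f64 = (0..table[0].len())
        .map(|j| {
            let m = table.iter().map(|r| r[j]).sum::<f64>() / n;
            n * (m - grand).powi(2)
        })
        .sum();

    let ms_rows = ss_rows / (n - 1.0);
    let ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1.0) * (k - 1.0));
    (ms_rows - ms_err) / (ms_rows + (k - 1.0) * ms_err)
}
```

Two identical columns give 1, a constant offset between columns still gives 1 (consistency, not absolute agreement), and perfectly reversed columns give −1. This is why negative values such as those in the row-count probe are possible.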

D5 — full-size validation on Qwen3-TTS-0.6B: FAILS. 234 argmax-regime
tensors measured. Mean ICC = 0.195, zero tensors meet the ≥0.99 gate.
Relative L2 error 0.70–0.90.

Root cause: PR #218 bench measured ICC 0.9998 on 128 rows trained and
measured on those same 128 rows — a trivially-correct fit (128 ≤ 256
centroids → every row gets its own centroid). At production tensor
sizes (1024–3072 rows), the 6×256 codebook is centroid-starved.

cam_pq_row_count_probe.rs demonstrates the collapse:
  n=128 → icc_train=1.000, icc_all=-0.304
  n=3072 → icc_train=-0.079

Also broadens route_tensor embedding match to catch codec_embedding,
adding 2 new test cases (10 total, 133/133 contract tests pass).

Infrastructure (CLI, serialization, measurement) is sound. The
negative result is in the codec's capacity vs tensor row counts, not
the tooling. Plan needs revision before D6/D7 effort.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
@AdaWorldAPI AdaWorldAPI merged commit 3d9239d into main Apr 20, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a78131fa76


Comment on lines +146 to +150
let (row_dim, n_rows) = match row_layout(&dims_u64) {
    Some(v) => v,
    None => {
        eprintln!(" [skip: not a 2D matrix]");
        continue;
P1: Handle non-2D CamPq routes without dropping tensors

When a tensor is classified as CamPq but is not 2D (for example, packed expert weights in MoE checkpoints), this branch logs a skip and continues before writing any artifact or manifest row. Because route_tensor matches attention/MLP names without checking rank (crates/lance-graph-contract/src/cam.rs), these tensors can be silently omitted, producing incomplete calibration outputs that downstream tooling cannot fully reconstruct.


Comment on lines +406 to +407
let bytes: Vec<u8> = data.iter().flat_map(|f| f.to_le_bytes()).collect();
w.write_all(&bytes)?;

P2: Stream passthrough tensor writes instead of materializing bytes

This code converts the entire &[f32] tensor into a separate Vec<u8> before writing, which roughly doubles peak memory for each passthrough tensor. For large embedding/lm_head tensors this can push calibration runs into OOM or severe memory pressure even though the data could be written incrementally in chunks.
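
One possible shape for the chunked write, as a sketch only. The function name and the `chunk_elems` parameter are hypothetical, and the actual fix in the CLI may look different.

```rust
use std::io::{self, Write};

/// Write `data` as little-endian f32 bytes through a reusable scratch buffer,
/// so peak extra memory is about `chunk_elems * 4` bytes rather than a full copy.
/// `chunk_elems` must be greater than zero.
fn write_f32_le_chunked<W: Write>(
    w: &mut W,
    data: &[f32],
    chunk_elems: usize,
) -> io::Result<()> {
    let mut buf = Vec::with_capacity(chunk_elems * 4);
    for chunk in data.chunks(chunk_elems) {
        buf.clear();
        for f in chunk {
            buf.extend_from_slice(&f.to_le_bytes());
        }
        w.write_all(&buf)?;
    }
    Ok(())
}
```

The output bytes are identical to the `flat_map` version quoted above; only the peak allocation changes.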


AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
The LAB-ONLY surface isn't just quarantine scaffolding — it's the
codec-research iteration testbed. Its reason for existing is the cost
of the alternative: every codec candidate re-measured through a
cargo build cycle burns minutes per iteration.

With the lab REST/gRPC + wire DTOs, a single binary serves dozens of
candidates against the same safetensors in seconds per call. PR #220
falsified PR #219's ICC-0.9998 claim via exactly this path: the
calibration CLI + /v1/shader/calibrate endpoint surfaced mean ICC
0.195 / 0/234 pass rate on full Qwen3-TTS-0.6B tensors before any
production consumer linked the codec.

Two purposes now named explicitly in the doc:

1. Iteration velocity (positive) — lab surface = curl-friendly
   research loop, no rebuild per candidate.
2. Canonical firewall (guard) — consumers still walk UnifiedStep via
   OrchestrationBridge; they never see Wire* per-op DTOs.

Changes:
- New subsection "Why the Lab Surface Exists (positive purpose — not
  just quarantine)" with the #219 → #220 worked-example table.
- Decision Procedure item 3 reframed: research ops and curl-friendly
  debug shortcuts are a legitimate use of the lab surface, with a
  graduation rule (full-size validation → new StepDomain variant; lab
  endpoint stays for continued iteration, production moves to bridge).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
… I11 measurability

The prior "positive purpose" framing was too narrow (codec iteration
velocity). The actual architecture the lab surface buys is three-part:

  REST/gRPC API  — no rebuild per codec candidate
  Planner        — real dispatch path under test (not a toy bench)
  JIT            — swap kernels at runtime without relinking

Two loads share this stack; neither is secondary:

1. Codec certification. Reconstruction ICC on real safetensors is
   necessary but not sufficient — the cert gate is token agreement
   vs Passthrough on full decode. PR #219's 0.9998 was synthetic /
   overfit-on-training; PR #220's 0.195 was real-weight but still
   reconstruction-only. The next load-bearing measurement is the
   token-level comparison, which is only tractable on this stack.
   At 8-17 min/rebuild × ~200 codec invariants to tune, iteration
   without the API is infeasible.

2. Thinking harvest (the AGI magic bullet). The same API + Planner +
   JIT externalises the planner's 36-style / 13-verb / NARS trace.
   POST a Cypher query, get {rows, thinking_trace} back. The trace
   is log / replay / NARS-revise-able — which is the architectural
   shape of a system that learns its own meta-inference. This is
   the REST/Cypher injection path we can revive at near-zero cost
   now that PR #221 landed the REST/gRPC scaffolding.

I11 (new invariant): Measurable stack, not a black box. Every layer
(L0 ndarray → L4 planner) emits a harvest-ready trace through the
lab surface. Proposed changes that shrink trace for perf/simplicity
are rejected — the trace contract is what makes the feedback loop
mechanisable.

Also refined: Decision Procedure item 3 (codec research is a
legitimate positive use, not a grudging exception); rule-of-thumb
measurement order (reconstruction error → reconstruction ICC →
token agreement) with token agreement as the cert gate.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
Operationalises PR #220's "What's Needed to Fix" list (wider codebook,
residual PQ, Hadamard pre-rotation, OPQ) as a parameter sweep through
the lab endpoint — every codec difference is a JIT kernel, not a cargo
rebuild. Phase 0 hardens the Wire surface once; Phases 1-4 run
unlimited candidates without further rebuilds; Phase 5 graduates
winners to the canonical OrchestrationBridge surface.

Structure:

  Phase 0 — API hardening (one rebuild, then frozen):
    D0.1 CodecParams in WireCalibrate
    D0.2 WireTokenAgreement endpoint (I11 cert gate)
    D0.3 WireSweep streaming + Lance append
    D0.4 surface freeze

  Phase 1 — JIT codec kernels (rebuild-free):
    D1.1 CodecKernelCache via JitCompiler (Cranelift)
    D1.2 Rotation primitives (Identity / Hadamard / OPQ)
    D1.3 Residual PQ via JIT composition

  Phase 2 — Token-agreement harness (the I11 cert gate):
    D2.1 Reference-model loader (ndarray safetensors)
    D2.2 Decode-and-compare loop (top-k, per-layer MSE)
    D2.3 Handler wiring

  Phase 3 — Sweep driver + Lance logger
  Phase 4 — DataFusion frontier analysis
  Phase 5 — Graduation to OrchestrationBridge (per winner only)

~1,920 LOC total; 1 upfront rebuild; unlimited candidates afterwards.
Compare to naive path (4 fixes × 8-17 min × N tweaks = hundreds of
hours). All work behind --features lab until graduation.

INTEGRATION_PLANS.md prepended per APPEND-ONLY rule, citing PR #224
dependency for the architectural framing.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
…hanges

Corrections after hand-grep vs curated knowledge (encoding-ecosystem.md,
codec-findings-2026-04-20.md, rotation_vs_error_correction.md) and
user directives "its all there, dont touch, just be aware how to use
crate::simd", "wire accordingly into the lab infra", "via struct of
arrays":

  - slice::array_windows::<N>() IS real — stdlib, stable Rust 1.77,
    const-generic. I conflated it with a missing ndarray::array_window
    (singular); corrected.

  - AMX in ndarray is INTEL (Sapphire Rapids TDPBUSD/TDPBF16PS via
    stable inline asm on Rust 1.94, per src/simd_amx.rs header),
    NOT Apple. rust-lang #126622 keeps AMX intrinsics nightly;
    inline asm at src/hpc/amx_matmul.rs is the stable consumer path.
    Verified on kernel 6.18.5 with XCR0 bits 17+18 set.

  - Real primitive names (no hallucinated matmul_tiled /
    hadamard_butterfly): tile_dpbusd, tile_dpbf16ps, vnni_pack_bf16
    for tier-1 AMX; vnni_matvec / matvec_dispatch for tier-2 VNNI;
    F32x16 / U8x64 / Fingerprint<N> for tier-3 AVX-512 baseline.

  - Polyfill hierarchy per user directive
    (simd_amx > simd_avx512 > simd_avx2 fallback):
      Tier 1: Intel AMX tile (256 MACs/instr)
      Tier 2: AVX-512 VNNI (64 MACs/instr)
      Tier 3: AVX-512 baseline F32x16 (16 MACs/instr, mandatory
              default per ndarray's .cargo/config.toml
              target-cpu=x86-64-v4)
      Tier 4: AVX-2 F32x8 fallback
      Tier 5: scalar reference

  - Rule A wires SoA: the &[u8] slice array_windows iterates comes
    from a BindSpace column (FingerprintColumns / QualiaColumn /
    MetaColumn / EdgeColumn) per the AGI-as-SoA identity. No new
    data structures — the SoA column IS the input surface.

  - Dropped all "Phase 0 ndarray prerequisite" language. Everything
    the sweep needs exists in ndarray today; this plan wires the
    existing surface into cognitive-shader-driver (REST handlers +
    CodecKernelCache + CodecResearchBridge). Zero ndarray changes.

  - Added reality-check against codec-findings-2026-04-20.md so the
    sweep does NOT re-derive measured winners: Had-Q5×D-R already
    ICC ≈ 0.99 with shared codebook; I8-Hadamard leads for per-row-
    only at ICC ≈ 0.9; zipper serves bundling axis, not argmax;
    fractal leaf descriptors are DEAD (sign-flip invariant). The
    sweep focuses on #220's four unmeasured candidates (wider
    codebook / residual PQ / Hadamard pre-rotation / OPQ) and on
    the missing axis — token agreement, not reconstruction ICC.
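
The polyfill hierarchy listed above can be read as a simple priority order. The sketch below only illustrates that ordering. `CpuCaps`, `SimdTier`, and `pick_tier` are hypothetical names, and how ndarray actually detects and dispatches on CPU features is not shown in this commit.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimdTier {
    Amx,        // Tier 1: Intel AMX tile
    Avx512Vnni, // Tier 2: AVX-512 VNNI
    Avx512,     // Tier 3: AVX-512 baseline F32x16
    Avx2,       // Tier 4: AVX-2 F32x8 fallback
    Scalar,     // Tier 5: scalar reference
}

/// Capability flags, assumed to be filled in by some detection step.
#[derive(Debug, Clone, Copy, Default)]
struct CpuCaps {
    amx: bool,
    avx512_vnni: bool,
    avx512f: bool,
    avx2: bool,
}

/// Pick the highest tier the CPU supports, in the order given in the list.
fn pick_tier(caps: CpuCaps) -> SimdTier {
    if caps.amx {
        SimdTier::Amx
    } else if caps.avx512_vnni {
        SimdTier::Avx512Vnni
    } else if caps.avx512f {
        SimdTier::Avx512
    } else if caps.avx2 {
        SimdTier::Avx2
    } else {
        SimdTier::Scalar
    }
}
```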

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
Nine concrete YAML configs for configs/codec/*.yaml that Phase 0
will consume:

  00_baseline_passthrough         — regression anchor (top1=1.000 exactly)
  01_pr220_baseline               — negative control, reproduces #220 ICC 0.195
  02_pr219_overfit_reproducer     — negative control, split-test must FAIL
  10_fix_a_wider_codebook         — #220 (a) 1024 centroids
  11_fix_b_residual_pq            — #220 (b) residual depth=1
  12_fix_c_hadamard_rotation      — #220 (c) Hadamard pre-rotation
  13_fix_d_opq_rotation           — #220 (d) OPQ learned rotation
  20_composite_a_plus_b           — composition probe for combinatorial lift
  30_cross_product_sweep          — SweepGrid for D3.1 initial sweep

Each YAML:

  - Names lane_width explicitly (Rule E) so the JIT compiles the
    right SIMD tier. BF16x32 for OPQ (AMX bf16 tile path) — others
    default to F32x16.
  - Carries a notes: block stating the expected measurement
    outcome, so Phase 0's regression detection has ground truth
    to check against (e.g., baseline reproducer must produce
    ICC ≈ 0.195, overfit reproducer must FAIL the split-test).
  - Separates calibration_rows from measurement_rows where
    relevant (pr219_overfit_reproducer sets them equal so the
    pipeline refuses to report the ICC, demonstrating the guard
    that prevents PR #219's overfit-on-training artefact from
    recurring).

30_cross_product_sweep specifies the initial 54-candidate grid
(1 subspace × 3 centroids × 3 residuals × 3 rotations × 1 distance
× 2 lane widths). Expected JIT compile budget: ~800 ms one-time;
everything after is cache hits per Rule A/B.
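
The 54 comes from multiplying the axis sizes: 1 × 3 × 3 × 3 × 1 × 2. A sketch of the enumeration follows. The concrete axis values (centroid counts, residual depths, rotation and lane-width labels) are assumptions chosen only to match the stated axis sizes, and `GridSketch` is a hypothetical type, not the SweepGrid in the YAML.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    subspaces: u32,
    centroids: u32,
    residual_depth: u32,
    rotation: &'static str,
    distance: &'static str,
    lane_width: &'static str,
}

struct GridSketch {
    subspaces: Vec<u32>,
    centroids: Vec<u32>,
    residual_depths: Vec<u32>,
    rotations: Vec<&'static str>,
    distances: Vec<&'static str>,
    lane_widths: Vec<&'static str>,
}

impl GridSketch {
    /// Full cross product of all axes.
    fn candidates(&self) -> Vec<Candidate> {
        let mut out = Vec::new();
        for &subspaces in &self.subspaces {
            for &centroids in &self.centroids {
                for &residual_depth in &self.residual_depths {
                    for &rotation in &self.rotations {
                        for &distance in &self.distances {
                            for &lane_width in &self.lane_widths {
                                out.push(Candidate {
                                    subspaces,
                                    centroids,
                                    residual_depth,
                                    rotation,
                                    distance,
                                    lane_width,
                                });
                            }
                        }
                    }
                }
            }
        }
        out
    }
}

/// Assumed axis values; only the axis sizes (1, 3, 3, 3, 1, 2) come from the text.
fn example_grid() -> GridSketch {
    GridSketch {
        subspaces: vec![6],
        centroids: vec![256, 512, 1024],
        residual_depths: vec![0, 1, 2],
        rotations: vec!["identity", "hadamard", "opq"],
        distances: vec!["adc_u8"],
        lane_widths: vec!["f32x16", "bf16x32"],
    }
}
```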

Operating principle reiterated at the end: adding a candidate is
authoring a YAML; changing params is editing YAML; Rust reads
YAML once at ingress (Rule F) and never re-serialises. Sweep
logger appends result rows to Lance — the only egress beyond the
REST response.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
…dation

First Phase 0 code deliverable from codec-sweep-via-lab-infra-v1 plan.
Zero-dep contract-side types the lab API (cognitive-shader-driver)
will carry into JIT compilation.

Adds to crates/lance-graph-contract/src/cam.rs (~290 LOC):

  Enums (Rule E — Wire surface IS the SIMD surface, object-oriented):
    LaneWidth { F32x16, U8x64, F64x8, BF16x32 }  — mirrors ndarray::simd::*
    Distance  { AdcU8, AdcI8 }                    — CODING_PRACTICES gap 5
                                                    (sign-handling /
                                                    bipolar cancellation)
    Rotation  { Identity, Hadamard{dim}, Opq{blob,dim} }

  Structs:
    ResidualSpec  { depth, centroids }
    CodecParams   { subspaces, centroids, residual, lane_width,
                    pre_rotation, distance, calibration_rows,
                    measurement_rows, seed }

  Builder (CODING_PRACTICES gap 3 — fluent API, not raw-struct):
    CodecParamsBuilder::new()
      .subspaces(u32).centroids(u32).residual(ResidualSpec)
      .lane_width(LaneWidth).rotation(Rotation).distance(Distance)
      .calibration_rows(u32).measurement_rows(u32).seed(u64)
      .build() -> Result<CodecParams, CodecParamsError>

  Validation fires BEFORE any JIT compile (D0.7 precision ladder):
    - ZeroDimension          — subspaces == 0 or centroids == 0
    - OpqRequiresBf16        — OPQ routes through tile_dpbf16ps;
                               only LaneWidth::BF16x32 is valid
    - HadamardDimNotPow2     — Sylvester construction needs dim = 2^k
    - CalibrationEqualsMeasurement — overfit guard: refuses to emit
                               ICC when calibration_rows ==
                               measurement_rows (reproduces PR #219's
                               128-row trained-and-tested artifact)
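
A reduced sketch of the four validation rules listed above. It is not the code added to cam.rs: the struct carries only the fields the rules need, `Opq` omits its blob field, the order of the checks is an assumption, and the example row counts are made up.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LaneWidth { F32x16, U8x64, F64x8, BF16x32 }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Rotation {
    Identity,
    Hadamard { dim: u32 },
    Opq { dim: u32 }, // the real variant also carries a blob
}

#[derive(Debug, PartialEq, Eq)]
enum ParamsError {
    ZeroDimension,
    OpqRequiresBf16,
    HadamardDimNotPow2,
    CalibrationEqualsMeasurement,
}

#[derive(Debug, Clone, Copy)]
struct ParamsSketch {
    subspaces: u32,
    centroids: u32,
    lane_width: LaneWidth,
    rotation: Rotation,
    calibration_rows: u32,
    measurement_rows: u32,
}

/// Run every check before any kernel would be compiled.
fn validate_sketch(p: &ParamsSketch) -> Result<(), ParamsError> {
    if p.subspaces == 0 || p.centroids == 0 {
        return Err(ParamsError::ZeroDimension);
    }
    match p.rotation {
        Rotation::Opq { .. } if p.lane_width != LaneWidth::BF16x32 => {
            return Err(ParamsError::OpqRequiresBf16);
        }
        Rotation::Hadamard { dim } if !dim.is_power_of_two() => {
            return Err(ParamsError::HadamardDimNotPow2);
        }
        _ => {}
    }
    // Overfit guard: same row count for calibration and measurement is refused.
    if p.calibration_rows == p.measurement_rows {
        return Err(ParamsError::CalibrationEqualsMeasurement);
    }
    Ok(())
}

/// Hypothetical baseline values for demonstration only.
fn example_params() -> ParamsSketch {
    ParamsSketch {
        subspaces: 6,
        centroids: 256,
        lane_width: LaneWidth::F32x16,
        rotation: Rotation::Identity,
        calibration_rows: 1024,
        measurement_rows: 3072,
    }
}
```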

  Methods on CodecParams:
    kernel_signature() -> u64   — JIT cache key (Rule E); excludes
                                  seed so calibration-sample changes
                                  don't invalidate cached kernels
    is_matmul_heavy() -> bool   — true for OPQ or centroids > 512;
                                  drives Tier-1 AMX dispatch decision
                                  (Rule C polyfill hierarchy)
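
  The seed exclusion can be sketched with the stdlib hasher the commit mentions. Which fields the real kernel_signature hashes is not stated beyond "excludes seed", so the field set below is an assumption and `SigSketch` is a hypothetical type.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy)]
struct SigSketch {
    subspaces: u32,
    centroids: u32,
    rotation_kind: u8, // e.g. 0 = identity, 1 = hadamard, 2 = opq (assumed encoding)
    lane_width: u8,    // assumed encoding of the lane-width enum
    seed: u64,
}

impl SigSketch {
    /// Cache key over the fields that change the compiled kernel.
    /// `seed` is deliberately left out so reseeding keeps the cache hot.
    fn kernel_signature(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.subspaces.hash(&mut h);
        self.centroids.hash(&mut h);
        self.rotation_kind.hash(&mut h);
        self.lane_width.hash(&mut h);
        h.finish()
    }
}
```

  Note that `DefaultHasher`'s algorithm is not guaranteed to stay the same across Rust releases, so a key like this suits an in-process cache rather than a persisted one.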

  Rotation::is_matmul() -> bool  — Identity and Hadamard are false
                                  (butterfly stays on Tier-3 F32x16);
                                  only Opq returns true
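
  For context on why Hadamard needs no matmul: the Sylvester-ordered transform can be applied with log2(n) passes of add/subtract butterflies. The scalar sketch below shows the technique only. It is not an ndarray primitive, and `fwht_sketch` is a hypothetical name.

```rust
/// In-place, unnormalized fast Walsh-Hadamard transform (Sylvester ordering).
/// Returns an error unless the length is a power of two.
fn fwht_sketch(data: &mut [f32]) -> Result<(), &'static str> {
    let n = data.len();
    if !n.is_power_of_two() {
        return Err("length must be a power of two");
    }
    let mut h = 1;
    while h < n {
        // Each block of 2h elements is split into halves and combined pairwise.
        for block in data.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (x, y) = (*a, *b);
                *a = x + y;
                *b = x - y;
            }
        }
        h *= 2;
    }
    Ok(())
}
```

  Applying the transform twice returns the input scaled by n. Dividing by sqrt(n) after each pass would make it an orthonormal rotation.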

14 new tests covering:
  - builder default matches PR #220 baseline shape
  - each validation variant fires correctly
  - OPQ + BF16x32 accepted; OPQ + F32x16 rejected with typed error
  - Hadamard + non-pow2 dim rejected
  - overfit guard fires on calibration == measurement
  - kernel_signature stable across identical builds
  - kernel_signature excludes seed (cache stays hot)
  - kernel_signature changes with centroids / rotation kind
  - is_matmul_heavy detects OPQ AND wide codebook (≥512 centroids)

Zero-dep preserved (stdlib only: std::collections::hash_map::
DefaultHasher for kernel_signature, core::fmt + core::error for
error types). No serde in the contract — YAML/JSON deserialisation
belongs to the consumer crate, which will produce CodecParams via
serde at the REST handler (Rule F — serialisation at edge only).

Tests: 147/147 contract suite passing (133 prior + 14 new).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
Retroactive hygiene for the recent PR arc + prospective enforcement
so the gap never recurs. User directive: "should have happened to
begin with."

LATEST_STATE.md:
  - Header: "Last updated 2026-04-20 post PR #224 (PR #225 open)"
  - Recently Shipped table: prepended rows for #225 (open), #224,
    and #223 with full shipped-content summaries
  - Contract Inventory: expanded cam:: entry with all new codec-
    sweep types (LaneWidth / Distance / Rotation / ResidualSpec /
    CodecParams / CodecParamsBuilder / CodecParamsError) including
    the precision-ladder-fires-before-JIT invariant
  - Active Branches: recorded claude/teleport-session-setup-wMZfb
    and its three merged PRs
  - Active Integration Plans: added codec-sweep-via-lab-infra-v1
    alongside elegant-herding-rocket-v1
  - Immediate Next Work: codec-sweep Phase 0 remainder (D0.1/0.2/0.3/
    0.5) + the elegant-herding Phase 2 block

PR_ARC_INVENTORY.md (APPEND-ONLY — PREPEND only):
  - #225 entry: plan + CodecParams/Builder/precision validation +
    rules A-F locked + decisions for future PRs
  - #224 entry: three-part lab stack + thinking harvest + I11
    measurability locked
  - #223 entry: LAB-ONLY firewall + AGI-as-SoA + I1-I10 invariants
    locked (the cross-cutting architectural ruleset this workspace
    now enforces)

STATUS_BOARD.md:
  - New section: codec-sweep-via-lab-infra-v1 with 18 D-ids across
    5 phases (D0.6/D0.7 marked Shipped-in-#225; remainder Queued)

EPIPHANIES.md (APPEND-ONLY — PREPEND only, 6 new dated entries):
  - Board hygiene is the driving seat, not cleanup (this session's
    self-reflection turned into a rule)
  - Codec cert is token agreement, not synthetic ICC (#219 → #220
    arc; #225 CalibrationEqualsMeasurement typed rejection)
  - Lab REST surface is three-part (API + Planner + JIT), not just
    scaffolding
  - Thinking harvest via REST/Cypher = the AGI magic bullet
  - SoA never scalarises without ndarray (iron rule Rule C)
  - AGI is the glove, not the oracle — four-axis SoA is what you
    wear

CLAUDE.md — new top-level § "The Stance — Driving Seat +
AGI-as-Glove (P0, read first)":

  - Explicit driving-seat posture: the session STEERS the stack,
    doesn't observe it
  - AGI-as-glove doctrine concrete: topic → FingerprintColumns,
    angle → QualiaColumn, thinking → MetaColumn, planner →
    EdgeColumn. New capability lands as a new column, not a layer.
  - MANDATORY Board-Hygiene Rule as a table: every PR that adds a
    type / plan / D-id / epiphany / tech-debt / issue MUST update
    the corresponding board file IN THE SAME COMMIT. Retroactive
    hygiene (merge PR → later cleanup) is now an anti-pattern the
    rule forbids.
  - "Consult, don't guess" — agent/knowledge-first discipline:
    specialist-agent card → knowledge doc → board inventory →
    only then grep. Subagent spawn with curated docs beats main-
    thread grep.

147/147 contract suite still passing. Doc-only PR otherwise
(Cargo.toml / src/* unchanged; the orphan serde_yaml/base64 deps
from the timed-out bus-compiler subagent were reverted — they'll
land with D0.1/D0.3 when the Wire code lands).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
Two more Phase 0 deliverables from codec-sweep-via-lab-infra-v1.
66/66 cognitive-shader-driver tests pass under --features serve (+11 new).

D0.5 — auto_detect.rs (~300 LOC, CODING_PRACTICES gap 1):
  Reads <model_path>/config.json (HuggingFace layout) and returns
  ModelFingerprint { architecture, hidden_size, n_layers,
  tokenizer_class, vocab_size, default_lane_width, default_distance }.

  Architecture routing:
    llama / qwen / qwen2 / qwen3 / mistral / mixtral → BF16x32 (AMX)
    bert / modernbert / xlm-roberta / generic → F32x16 (AVX-512)
  torch_dtype override wins over architecture heuristic.

  Typed errors: ConfigMissing / Io / Parse / MissingField {path, field}.
  Best-effort tokenizer_class from tokenizer_config.json.

  8 tests: llama / qwen3-with-tokenizer / bert / modernbert / xlm-roberta
  (d_model alias) / generic fallback / missing-config / missing-field.
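
The routing rule above can be sketched in a few lines of Rust. This is an illustrative sketch, not the contents of auto_detect.rs: the `LaneWidth` variant spellings, the function names, and in particular the dtype-to-lane mapping in `lane_from_torch_dtype` are assumptions. The commit only states that the torch_dtype override wins over the architecture heuristic.

```rust
/// SIMD lane width picked for a model. Variant spellings are assumed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LaneWidth {
    Bf16x32, // AMX path
    F32x16,  // AVX-512 path
}

/// Architecture heuristic as listed in the commit message.
pub fn lane_from_architecture(architecture: &str) -> LaneWidth {
    match architecture.to_ascii_lowercase().as_str() {
        "llama" | "qwen" | "qwen2" | "qwen3" | "mistral" | "mixtral" => LaneWidth::Bf16x32,
        // bert / modernbert / xlm-roberta and the generic fallback
        _ => LaneWidth::F32x16,
    }
}

/// ASSUMPTION: which dtype strings map to which lane is not given in the
/// source; this mapping is hypothetical and only illustrates the override.
pub fn lane_from_torch_dtype(dtype: &str) -> Option<LaneWidth> {
    match dtype {
        "bfloat16" => Some(LaneWidth::Bf16x32),
        "float32" => Some(LaneWidth::F32x16),
        _ => None,
    }
}

/// A recognised torch_dtype wins; otherwise fall back to the architecture.
pub fn default_lane_width(architecture: &str, torch_dtype: Option<&str>) -> LaneWidth {
    torch_dtype
        .and_then(lane_from_torch_dtype)
        .unwrap_or_else(|| lane_from_architecture(architecture))
}
```

Keeping the override and the heuristic in separate functions lets each be tested alone, which matches the per-architecture test list above.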

D0.2 — WireTokenAgreement stub (~100 LOC, the I11 cert gate):
  DTOs:
    WireBaseline { Passthrough } — default, extensible
    WireTokenAgreement { model_path, reference, candidate (WireCodecParams),
                          prompt_set_blob_id, n_tokens }
    WireTokenAgreementResult { top1_rate, top5_rate,
                                divergence_positions, per_layer_mse,
                                candidate_latency_us, reference_latency_us,
                                stub, backend }

  Phase 0 handler stub (not shipped yet): returns stub:true /
  backend:"stub" deterministic result. Phase 2 D2.1-D2.3 land the
  real decode-and-compare loop (reference model load + top-k
  comparison + per-layer MSE).

  Pass gates (for when the harness lands):
    top1_rate ≥ 0.99 + top5_rate ≥ 0.999 vs Passthrough baseline.
    This is the ACTUAL codec cert gate — reconstruction ICC is
    necessary-but-not-sufficient (per #219/#220 lesson).

  3 round-trip serde tests: full payload + stub-backend default +
  baseline default.
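
To illustrate why the result carries both `stub` and `backend`, here is a minimal sketch of a client-side guard that refuses to certify stub output. The struct is a trimmed, hypothetical mirror of WireTokenAgreementResult (field types are assumed, serde derives omitted), and `CertDecision` / `certify` are names invented for this example.

```rust
/// Trimmed, hypothetical mirror of the result DTO; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct TokenAgreementResult {
    pub top1_rate: f32,
    pub top5_rate: f32,
    pub stub: bool,
    pub backend: String,
}

/// Outcome of applying the cert gate (name invented for this sketch).
#[derive(Debug, PartialEq)]
pub enum CertDecision {
    Pass,
    Fail,
    /// Stub output is never treated as a measurement.
    RejectedStub,
}

/// Check the stub markers first, then apply the gate from the commit
/// message: top1_rate >= 0.99 and top5_rate >= 0.999.
pub fn certify(result: &TokenAgreementResult) -> CertDecision {
    if result.stub || result.backend == "stub" {
        return CertDecision::RejectedStub;
    }
    if result.top1_rate >= 0.99 && result.top5_rate >= 0.999 {
        CertDecision::Pass
    } else {
        CertDecision::Fail
    }
}
```

Checking the stub markers before the thresholds means a stub result can never pass, whatever rates it reports.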

Board hygiene (CLAUDE.md Mandatory rule):
  STATUS_BOARD.md updated:
    D0.1 Queued → Shipped (PR #227 — was stale)
    D0.2 Queued → In PR (this branch)
    D0.5 Queued → In PR (this branch)

Phase 0 state after this commit:
  ✅ D0.1 WireCalibrate + WireTensorView (PR #227)
  ✅ D0.6 CodecParamsBuilder (PR #225)
  ✅ D0.7 precision-ladder validation (PR #225)
  ✅ D0.5 auto_detect (this PR)
  ✅ D0.2 WireTokenAgreement stub (this PR)
  ⏳ D0.3 WireSweep streaming endpoint (next PR)
  ⏳ D0.4 surface freeze (gates after D0.3)

Rules honored:
  Rule D — JSON/YAML/REST only, CodecParams carried through via WireCodecParams
  Rule E — Wire surface IS the SIMD surface (lane_width on candidate)
  Rule F — serde mirrors at ingress only; TryFrom → CodecParams at handler

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 21, 2026
First Phase 2 deliverable — scaffold of the I11 cert gate harness.
The PR #219/#220 lesson landed as a typed-rejection wall: the
stub result carries stub:true + backend:"stub" so no client can
confuse Phase 0 stub output for a real measurement.

crates/cognitive-shader-driver/src/token_agreement.rs (~320 LOC):

  ReferenceModel { path, path_hash, stub_token_count }
    ::load(&Path) -> Result<Self, TokenAgreementError>
      D2.1 stub: validates path exists, hashes display; does NOT
      parse safetensors yet. D2.2 replaces with real loader driven
      by auto_detect::detect() → ModelFingerprint.
    ::stub(tag, n_tokens) — builds stub model without touching fs

  TokenAgreementError:
    ModelPathMissing { path }
    EmptyPromptSet
    TokenCountMismatch { reference, candidate }
    NotImplementedYet { what }  ← measure_full() until D2.2

  TopKAgreement { top1_matches, top5_matches, total_positions,
                  divergence_positions: Vec<u32> }
    ::compare(ref: &[Vec<u32>], cand: &[Vec<u32>]) -> Result<Self>
      Position-by-position: top1 = r[0] == c[0]; top5 = r[0] in c[..5].
      Records divergence positions for failure-mode analysis
      (late-sequence drift vs random errors).
    ::top1_rate() / top5_rate() -> f32
    ::meets_cert_gate() -> bool  (top1 ≥ 0.99 AND top5 ≥ 0.999)
    ::aggregate(per_prompt) — sums counters; concatenates
      divergence with per-prompt offset so failures stay localised

  TokenAgreementHarness:
    ::new(reference, baseline, candidate, n_tokens)
    ::measure_stub() -> WireTokenAgreementResult { stub:true, .. }
    ::measure_full() -> NotImplementedYet (D2.2 scope)
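
The position-by-position comparison described above can be sketched as follows. This is not the shipped token_agreement.rs. The parameter is named `reference` because `ref` is a Rust keyword, and two behaviours are assumptions made for the sketch: a divergence is recorded on a top-1 miss, and a position with an empty reference ranking counts as divergent.

```rust
/// Typed error for mismatched stream lengths (variant name from the commit).
#[derive(Debug, PartialEq)]
pub enum TokenAgreementError {
    TokenCountMismatch { reference: usize, candidate: usize },
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct TopKAgreement {
    pub top1_matches: u32,
    pub top5_matches: u32,
    pub total_positions: u32,
    pub divergence_positions: Vec<u32>,
}

impl TopKAgreement {
    /// Each inner Vec holds one position's ranked token ids, best first.
    pub fn compare(
        reference: &[Vec<u32>],
        candidate: &[Vec<u32>],
    ) -> Result<Self, TokenAgreementError> {
        if reference.len() != candidate.len() {
            return Err(TokenAgreementError::TokenCountMismatch {
                reference: reference.len(),
                candidate: candidate.len(),
            });
        }
        let mut out = Self::default();
        for (pos, (r, c)) in reference.iter().zip(candidate.iter()).enumerate() {
            out.total_positions += 1;
            // ASSUMPTION: an empty reference ranking counts as divergent.
            let (top1, top5) = match r.first() {
                Some(ref_top1) => (
                    c.first() == Some(ref_top1),              // top1 = r[0] == c[0]
                    c.iter().take(5).any(|t| t == ref_top1),  // top5 = r[0] in c[..5]
                ),
                None => (false, false),
            };
            if top1 {
                out.top1_matches += 1;
            }
            if top5 {
                out.top5_matches += 1;
            }
            // ASSUMPTION: "divergence" means a top-1 miss.
            if !top1 {
                out.divergence_positions.push(pos as u32);
            }
        }
        Ok(out)
    }

    pub fn top1_rate(&self) -> f32 {
        self.rate(self.top1_matches)
    }

    pub fn top5_rate(&self) -> f32 {
        self.rate(self.top5_matches)
    }

    /// Cert gate from the commit: top1 >= 0.99 AND top5 >= 0.999.
    pub fn meets_cert_gate(&self) -> bool {
        self.top1_rate() >= 0.99 && self.top5_rate() >= 0.999
    }

    fn rate(&self, matches: u32) -> f32 {
        if self.total_positions == 0 {
            0.0 // ASSUMPTION: an empty comparison reports a zero rate
        } else {
            matches as f32 / self.total_positions as f32
        }
    }
}
```

Using `take(5)` instead of slicing `c[..5]` avoids a panic when a candidate ranking holds fewer than five entries.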

Tests (13 new):
  - reference_model_stub_builds_without_filesystem
  - reference_model_load_missing_path_yields_typed_error
  - topk_compare_identical_streams_is_perfect (full cert gate pass)
  - topk_compare_all_different_fails_cert_gate
  - topk_top5_matches_when_top1_misses_but_in_top5
    (ref top-1 = 7; cand has 7 at position 3 in top-5 → top5 counts)
  - topk_mismatched_stream_lengths_yield_typed_error
  - topk_aggregate_sums_counters_and_offsets_divergence
    (prompt 2's divergence at pos 4 → aggregate pos 14 after prompt 1's 10)
  - cert_gate_passes_at_exact_thresholds
    (990/1000 = 0.99, 999/1000 = 0.999 — both boundaries hit)
  - cert_gate_fails_when_top1_below_threshold_even_if_top5_passes
  - cert_gate_fails_when_top5_below_threshold_even_if_top1_passes
  - harness_measure_stub_returns_machine_checkable_stub_flag
    (stub:true enforced; backend="stub"; all rates 0.0; zero latencies)
  - harness_measure_full_returns_not_implemented_pointing_at_d22
  - harness_measure_stub_rejects_zero_n_tokens
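
The aggregation test above (prompt 2's divergence at position 4 landing at aggregate position 14) implies that each prompt's divergence positions are shifted by the number of positions already accumulated. A minimal sketch of that bookkeeping, written as a free function with an assumed signature rather than the real `TopKAgreement::aggregate`:

```rust
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TopKAgreement {
    pub top1_matches: u32,
    pub top5_matches: u32,
    pub total_positions: u32,
    pub divergence_positions: Vec<u32>,
}

/// Sum the counters and concatenate divergence positions, shifting each
/// prompt's positions by the total length of the prompts before it.
pub fn aggregate(per_prompt: &[TopKAgreement]) -> TopKAgreement {
    let mut out = TopKAgreement::default();
    for prompt in per_prompt {
        // Positions seen so far form the offset for this prompt.
        let offset = out.total_positions;
        out.divergence_positions
            .extend(prompt.divergence_positions.iter().map(|&pos| pos + offset));
        out.top1_matches += prompt.top1_matches;
        out.top5_matches += prompt.top5_matches;
        out.total_positions += prompt.total_positions;
    }
    out
}
```

Because the offset is taken before the counters are updated, a divergence at local position 4 in the second of two 10-token prompts maps to 14, matching the test description.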

Board hygiene (CLAUDE.md Mandatory rule):
  STATUS_BOARD.md D2.1 Queued → In PR

Phase state:
  Phase 0 ✅ complete (D0.1-D0.7 all shipped)
  Phase 1 scaffold ✅ (D1.1, D1.2, D1.3 shipped; D1.1b queued)
  Phase 2 ⏳ D2.1 (this PR), D2.2 + D2.3 queued

Rules honored:
  Rule D — Measurement set comes from Wire DTOs (D0.2 WireTokenAgreement)
  Rule E — TopKAgreement exposes object-methods (top1_rate, meets_cert_gate)
  Rule F — No serialization between stages; per-prompt Vec<Vec<u32>>
           token streams are plain Rust owned; the serde happens at
           D2.3 handler entry / exit only

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 21, 2026
…ft guard)

Final Phase 3 scaffold deliverable — curl-driven lab iteration against
the shipped /v1/shader/sweep endpoint.

Files:

  configs/codec/README.md — inventory + DoS-ceiling note +
                            anti-#219 stub:true flag explanation

  configs/codec/00_pr220_baseline.yaml
    - PR #220 baseline regression: 6 subspaces × 256 centroids ×
      identity rotation. Expected ICC ≈ 0.195 mean when D2.2 lands
      real decode-and-compare.

  configs/codec/10_wider_codebook.yaml
    - PR #220 fix (a): centroids ∈ {256, 512, 1024}. Cardinality 3,
      three distinct kernel signatures → warm cache after one pass.

  configs/codec/12_hadamard_pre_rotation.yaml
    - PR #220 fix (c): Hadamard × centroids cross-product (2×2 = 4).
      Hadamard stays Tier-3 F32x16 per Rule C.

  scripts/codec_sweep.sh
    - yq YAML → JSON conversion
    - POST to ${SHADER_LAB_URL}/v1/shader/sweep (default localhost:3001)
    - jq-pretty request + response
    - Stub honesty check: prints results[0].stub flag
      → verifies Phase 0 returns true (machine-checkable anti-#219)
    - Requires: yq (mikefarah/yq ≥ v4), curl, jq

  wire.rs +1 test: sweep_request_yaml_shape_deserializes_via_serde_json
    - Inline JSON fixture mirroring the canonical YAML → JSON shape
    - If this test breaks, the YAML configs are stale relative to
      the Rust DTOs → scripts/codec_sweep.sh would fail at runtime
    - Caught a real drift during development: PascalCase "Identity"
      vs the DTO's rename_all="lowercase" (YAMLs correctly use
      lowercase; test fixture had the typo)

Phase state:
  Phase 0 ✅ complete
  Phase 1 scaffold ✅ (D1.1 / D1.2 / D1.3 shipped; D1.1b queued)
  Phase 2 scaffold ✅ (D2.1 harness + D2.3 handler; D2.2 queued)
  Phase 3 scaffold ✅ — D3.1 batch handler + D3.2 client driver shipped
                   ⏳ D3.1b real Lance append writer queued

DoS-ceiling note: sweep handler rejects grids with cardinality
> 10_000 before enumeration (PR #238 P1 fix). README documents the
ceiling so config authors can budget axis lengths.
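
A cardinality ceiling of this kind can be checked from the axis lengths alone, before any grid point is materialised. The sketch below assumes the grid is a plain cross-product of axes; `MAX_GRID_CARDINALITY`, `SweepError`, and `grid_cardinality` are names invented for the example and are not taken from the handler.

```rust
/// Ceiling quoted in the commit message: grids above this are rejected.
pub const MAX_GRID_CARDINALITY: u64 = 10_000;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum SweepError {
    GridTooLarge,
}

/// Multiply the axis lengths with overflow checking and bail out as soon
/// as the running product exceeds the ceiling, without enumerating.
pub fn grid_cardinality(axis_lengths: &[usize]) -> Result<u64, SweepError> {
    // ASSUMPTION: an empty axis makes the whole cross-product empty.
    if axis_lengths.iter().any(|&len| len == 0) {
        return Ok(0);
    }
    let mut total: u64 = 1;
    for &len in axis_lengths {
        total = total
            .checked_mul(len as u64)
            .ok_or(SweepError::GridTooLarge)?;
        if total > MAX_GRID_CARDINALITY {
            return Err(SweepError::GridTooLarge);
        }
    }
    Ok(total)
}
```

For the configs listed above, the wider-codebook sweep has cardinality 3 and the Hadamard cross-product 4, both far below the ceiling; a 100 × 101 grid would be rejected.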

Rule D honored: adding a new codec candidate = authoring a new
YAML file in configs/codec/. Zero Rust changes. Zero rebuilds.

Rule F honored at the client boundary: YAML → JSON → HTTP ingress.
Single deserialisation at the shader-lab's handler; everything
after is in-memory Rust (WireSweepRequest → CodecParams → grid
enumerate() → per-candidate WireSweepResult).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh