codec research: CAM-PQ solves argmax blind spot (ICC 0.9998 at 6 B/row) + production plan by AdaWorldAPI · Pull Request #219 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-20T07:23:33Z

Summary

Full codec research arc ending at a measured result and a production wiring plan.

Headline measurement: ndarray::hpc::cam_pq (production codec, already shipped)
hits ICC 0.9998 / top-5 recall 1.0 on Qwen3-8B q_proj / k_proj / gate_proj
at 6 bytes per row once registered in codec_rnd_bench.rs. Had-Q5×D-R's
0-B/row claim was a bench placeholder bug (storage is actually ~2 KB/row).

Honest compression: ~128× per Qwen3-8B model, not 1300× — codebook is
~4 MB per tensor, not 24 KB. Still a big win, corrected in the plan.

What this PR ships (docs + lab-gated research)

Findings doc: .claude/knowledge/codec-findings-2026-04-20.md — 5 measured
invariants (sign-flip, per-row norm, maximally-irrational stride, aperiodic
φ-sampling, sign-bit vs i8), codec hierarchy table, decision tree, dead-end
list, 5 unmeasured probes queued.
Production wiring plan: .claude/plans/cam-pq-production-wiring-v1.md —
D1 classifier, D2 calibration, D3 Lance schema, D4 runtime decode, D5 full-
size validation, D6 end-to-end token-agreement benchmark, D7 fallback.
~8 person-days.
Lab-gated codec candidates (behind lab feature — do NOT link into
production): Fractal (DEAD — sign-flip invariance), Zipper-Full / 5^5 /
7^7 / I8-μ-law, CamPqRaw / CamPqPhase. All measured through
codec_rnd_bench.rs ICC framework.
EPIPHANIES ledger entries for every pivot — including the
Had-Q5×D-R correction.

What this PR does NOT ship

Production wiring. The codec exists in ndarray::hpc::cam_pq (620+ LOC,
15+ tests) but no production consumer routes through it yet. That's the
follow-up PR (D1–D7 in the plan).

Test plan

cargo check -p bgz-tensor --no-default-features — main builds don't link lab.
cargo check -p bgz-tensor --features lab — lab candidates compile.
cargo run --release --features lab --example codec_rnd_bench reproduces
ICC 0.9998 for CamPqRaw/CamPqPhase at 6 B/row.
D5/D6 full-size validation before any production wiring lands — ICC ≥ 0.99 on full 4096-row tensors; ≤1 % top-1 token agreement loss vs Passthrough baseline. Gated in the plan.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Earlier claim "Had-Q5×D-R ICC 0.989 at 0 B/row → argmax wall cracked" was wrong. ParametricCodec::bytes_per_row() returns hardcoded 0 as an instrumentation placeholder; actual storage is 4 bits × n_cols full- dim Hadamard-quantized = ~2 KB/row for q_proj. Corrected compact hierarchy: no codec ≤ 100 B/row in this bench reaches ICC > 0.3. Zipper-Full at 64 B (ICC 0.204) remains the honest compact Pareto leader. Real compact argmax codec (codebook-only, shared state) would need CAM-PQ wiring — already production in ndarray::hpc::cam_pq but not registered as CodecCandidate in this bench. That's the true probe to settle "can we get ICC > 0.5 at ~9 B/row?" https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…ly probe Per user: reality-check through endpoint on whether a codebook-only codec actually hits ICC > 0.5 at compact bytes. Wires ndarray::hpc::cam_pq (already production) as two lab-gated CodecCandidates in codec_rnd_bench.rs: CAM-PQ-Raw(6B) — baseline: train_geometric on raw rows, 6 subspaces × 256 centroids, 6 B fingerprint. CAM-PQ-Phase(6B) — repurposed: train codebook on Hadamard-rotated rows so subspaces sample frequency bands, not coordinates. Encodes via WHT → quantize → reconstruct → inverse WHT → cosine. Per-population calibration (train codebook on the 128-row sample). Shared codebook ~24 KB for 1024-d (6 × 256 × 170 B subvectors). Per-row: honest 6 B (the fingerprint indices). If CAM-PQ-Phase hits ICC > 0.5 on q_proj, it confirms: 1. The argmax blind spot IS solvable at compact bytes with a population-calibrated codebook. 2. Hadamard pre-rotation fixes the near-orthogonality failure (I2) that plagued CLAM/centroid+residual codecs. If CAM-PQ-Phase fails (ICC near 0), it confirms that I2 is deeper than the basis — argmax-regime requires per-row identity preservation beyond any shared-codebook approach. Either way: this is the reality-check the findings doc asked for. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Wired ndarray::hpc::cam_pq as CodecCandidate. Measured result: | CAM-PQ-Raw | 6 B/row | ICC 0.9998 | top-5 recall 1.0 | | CAM-PQ-Phase | 6 B/row | ICC 0.9998 | top-5 recall 1.0 | Across all three populations (k_proj, gate_proj, q_proj). Per-row storage 6 B + ~24 KB shared codebook per tensor (amortized to zero as N_rows grows). Compression: Qwen3-8B q_proj 4096×4096 f32 (64 MB) → 48 KB total at ICC 0.9999. 1300× compression, near-Passthrough fidelity. Hadamard pre-rotation made no difference — k-means captures the discriminative structure in either basis. The "argmax needs JL/PolarQuant" intuition was wrong; plain PQ with subspace k-means suffices. The entire fractal → zipper research arc was solving a solved problem. CAM-PQ has been production in ndarray::hpc::cam_pq since Phase 1. All 10 zipper candidates are superseded on argmax ICC. Wiring next: expose CAM-PQ through CamCodecContract to consumers currently defaulting to Passthrough on argmax tensors. 1300× storage win. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Plan to wire ndarray::hpc::cam_pq as default argmax-regime codec. Measured: ICC 0.9999 at 6 B/row. Honest compression ~128× per model. D1 classifier, D2 calibration, D3 Lance storage, D4 runtime decode, D5 full-size validation, D6 E2E bench, D7 fallback. Registered in INTEGRATION_PLANS.md. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Enforces invariant I1: index-regime tensors (embed_tokens, lm_head, token_embd, wte/wpe) MUST stay Passthrough — identity lookup can't survive any codec. Argmax-regime (attention Q/K/V/O, MLP gate/up/down) routes to CamPq. Norms/conv/small → Skip. Order of rules matters: index-regime match comes BEFORE the ambiguous-large-2D fallback so lm_head (2D, 151936×hidden) isn't misrouted. Covered by lm_head_not_misrouted_as_campq test. 8 tests covering Qwen/Llama/GPT-2/GGUF naming conventions. 133/133 contract tests pass. Zero deps preserved. First deliverable (D1) of the CAM-PQ production wiring plan merged in PR #219.

The LAB-ONLY surface isn't just quarantine scaffolding — it's the codec-research iteration testbed. Its reason for existing is the cost of the alternative: every codec candidate re-measured through a cargo build cycle burns minutes per iteration. With the lab REST/gRPC + wire DTOs, a single binary serves dozens of candidates against the same safetensors in seconds per call. PR #220 falsified PR #219's ICC-0.9998 claim via exactly this path: the calibration CLI + /v1/shader/calibrate endpoint surfaced mean ICC 0.195 / 0/234 pass rate on full Qwen3-TTS-0.6B tensors before any production consumer linked the codec. Two purposes now named explicitly in the doc: 1. Iteration velocity (positive) — lab surface = curl-friendly research loop, no rebuild per candidate. 2. Canonical firewall (guard) — consumers still walk UnifiedStep via OrchestrationBridge; they never see Wire* per-op DTOs. Changes: - New subsection "Why the Lab Surface Exists (positive purpose — not just quarantine)" with the #219 → #220 worked-example table. - Decision Procedure item 3 reframed: research ops and curl-friendly debug shortcuts are a legitimate use of the lab surface, with a graduation rule (full-size validation → new StepDomain variant; lab endpoint stays for continued iteration, production moves to bridge). https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

… I11 measurability The prior "positive purpose" framing was too narrow (codec iteration velocity). The actual architecture the lab surface buys is three-part: REST/gRPC API — no rebuild per codec candidate Planner — real dispatch path under test (not a toy bench) JIT — swap kernels at runtime without relinking Two loads share this stack; neither is secondary: 1. Codec certification. Reconstruction ICC on real safetensors is necessary but not sufficient — the cert gate is token agreement vs Passthrough on full decode. PR #219's 0.9998 was synthetic / overfit-on-training; PR #220's 0.195 was real-weight but still reconstruction-only. The next load-bearing measurement is the token-level comparison, which is only tractable on this stack. At 8-17 min/rebuild × ~200 codec invariants to tune, iteration without the API is infeasible. 2. Thinking harvest (the AGI magic bullet). The same API + Planner + JIT externalises the planner's 36-style / 13-verb / NARS trace. POST a Cypher query, get {rows, thinking_trace} back. The trace is log / replay / NARS-revise-able — which is the architectural shape of a system that learns its own meta-inference. This is the REST/Cypher injection path we can revive at near-zero cost now that PR #221 landed the REST/gRPC scaffolding. I11 (new invariant): Measurable stack, not a black box. Every layer (L0 ndarray → L4 planner) emits a harvest-ready trace through the lab surface. Proposed changes that shrink trace for perf/simplicity are rejected — the trace contract is what makes the feedback loop mechanisable. Also refined: Decision Procedure item 3 (codec research is a legitimate positive use, not a grudging exception); rule-of-thumb measurement order (reconstruction error → reconstruction ICC → token agreement) with token agreement as the cert gate. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Eight concrete YAML configs for configs/codec/*.yaml that Phase 0 will consume: 00_baseline_passthrough — regression anchor (top1=1.000 exactly) 01_pr220_baseline — negative control, reproduces #220 ICC 0.195 02_pr219_overfit_reproducer — negative control, split-test must FAIL 10_fix_a_wider_codebook — #220 (a) 1024 centroids 11_fix_b_residual_pq — #220 (b) residual depth=1 12_fix_c_hadamard_rotation — #220 (c) Hadamard pre-rotation 13_fix_d_opq_rotation — #220 (d) OPQ learned rotation 20_composite_a_plus_b — composition probe for combinatorial lift 30_cross_product_sweep — SweepGrid for D3.1 initial sweep Each YAML: - Names lane_width explicitly (Rule E) so the JIT compiles the right SIMD tier. BF16x32 for OPQ (AMX bf16 tile path) — others default to F32x16. - Carries a notes: block stating the expected measurement outcome, so Phase 0's regression detection has ground truth to check against (e.g., baseline reproducer must produce ICC ≈ 0.195, overfit reproducer must FAIL the split-test). - Separates calibration_rows from measurement_rows where relevant (pr219_overfit_reproducer sets them equal so the pipeline refuses to report the ICC, demonstrating the guard that prevents PR #219's overfit-on-training artefact from recurring). 30_cross_product_sweep specifies the initial 54-candidate grid (1 subspace × 3 centroids × 3 residuals × 3 rotations × 1 distance × 2 lane widths). Expected JIT compile budget: ~800 ms one-time; everything after is cache hits per Rule A/B. Operating principle reiterated at the end: adding a candidate is authoring a YAML; changing params is editing YAML; Rust reads YAML once at ingress (Rule F) and never re-serialises. Sweep logger appends result rows to Lance — the only egress beyond the REST response. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…dation First Phase 0 code deliverable from codec-sweep-via-lab-infra-v1 plan. Zero-dep contract-side types the lab API (cognitive-shader-driver) will carry into JIT compilation. Adds to crates/lance-graph-contract/src/cam.rs (~290 LOC): Enums (Rule E — Wire surface IS the SIMD surface, object-oriented): LaneWidth { F32x16, U8x64, F64x8, BF16x32 } — mirrors ndarray::simd::* Distance { AdcU8, AdcI8 } — CODING_PRACTICES gap 5 (sign-handling / bipolar cancellation) Rotation { Identity, Hadamard{dim}, Opq{blob,dim} } Structs: ResidualSpec { depth, centroids } CodecParams { subspaces, centroids, residual, lane_width, pre_rotation, distance, calibration_rows, measurement_rows, seed } Builder (CODING_PRACTICES gap 3 — fluent API, not raw-struct): CodecParamsBuilder::new() .subspaces(u32).centroids(u32).residual(ResidualSpec) .lane_width(LaneWidth).rotation(Rotation).distance(Distance) .calibration_rows(u32).measurement_rows(u32).seed(u64) .build() -> Result<CodecParams, CodecParamsError> Validation fires BEFORE any JIT compile (D0.7 precision ladder): - ZeroDimension — subspaces == 0 or centroids == 0 - OpqRequiresBf16 — OPQ routes through tile_dpbf16ps; only LaneWidth::BF16x32 is valid - HadamardDimNotPow2 — Sylvester construction needs dim = 2^k - CalibrationEqualsMeasurement — overfit guard: refuses to emit ICC when calibration_rows == measurement_rows (reproduces PR #219's 128-row trained-and-tested artifact) Methods on CodecParams: kernel_signature() -> u64 — JIT cache key (Rule E); excludes seed so calibration-sample changes don't invalidate cached kernels is_matmul_heavy() -> bool — true for OPQ or centroids > 512; drives Tier-1 AMX dispatch decision (Rule C polyfill hierarchy) Rotation::is_matmul() -> bool — Identity and Hadamard are false (butterfly stays on Tier-3 F32x16); only Opq returns true 14 new tests covering: - builder default matches PR #220 baseline shape - each validation variant fires correctly - OPQ + BF16x32 accepted; OPQ + F32x16 rejected with typed error - Hadamard + non-pow2 dim rejected - overfit guard fires on calibration == measurement - kernel_signature stable across identical builds - kernel_signature excludes seed (cache stays hot) - kernel_signature changes with centroids / rotation kind - is_matmul_heavy detects OPQ AND wide codebook (≥512 centroids) Zero-dep preserved (stdlib only: std::collections::hash_map:: DefaultHasher for kernel_signature, core::fmt + core::error for error types). No serde in the contract — YAML/JSON deserialisation belongs to the consumer crate, which will produce CodecParams via serde at the REST handler (Rule F — serialisation at edge only). Tests: 147/147 contract suite passing (133 prior + 14 new). https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Eight concrete YAML configs for configs/codec/*.yaml that Phase 0 will consume: 00_baseline_passthrough — regression anchor (top1=1.000 exactly) 01_pr220_baseline — negative control, reproduces #220 ICC 0.195 02_pr219_overfit_reproducer — negative control, split-test must FAIL 10_fix_a_wider_codebook — #220 (a) 1024 centroids 11_fix_b_residual_pq — #220 (b) residual depth=1 12_fix_c_hadamard_rotation — #220 (c) Hadamard pre-rotation 13_fix_d_opq_rotation — #220 (d) OPQ learned rotation 20_composite_a_plus_b — composition probe for combinatorial lift 30_cross_product_sweep — SweepGrid for D3.1 initial sweep Each YAML: - Names lane_width explicitly (Rule E) so the JIT compiles the right SIMD tier. BF16x32 for OPQ (AMX bf16 tile path) — others default to F32x16. - Carries a notes: block stating the expected measurement outcome, so Phase 0's regression detection has ground truth to check against (e.g., baseline reproducer must produce ICC ≈ 0.195, overfit reproducer must FAIL the split-test). - Separates calibration_rows from measurement_rows where relevant (pr219_overfit_reproducer sets them equal so the pipeline refuses to report the ICC, demonstrating the guard that prevents PR #219's overfit-on-training artefact from recurring). 30_cross_product_sweep specifies the initial 54-candidate grid (1 subspace × 3 centroids × 3 residuals × 3 rotations × 1 distance × 2 lane widths). Expected JIT compile budget: ~800 ms one-time; everything after is cache hits per Rule A/B. Operating principle reiterated at the end: adding a candidate is authoring a YAML; changing params is editing YAML; Rust reads YAML once at ingress (Rule F) and never re-serialises. Sweep logger appends result rows to Lance — the only egress beyond the REST response. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…dation First Phase 0 code deliverable from codec-sweep-via-lab-infra-v1 plan. Zero-dep contract-side types the lab API (cognitive-shader-driver) will carry into JIT compilation. Adds to crates/lance-graph-contract/src/cam.rs (~290 LOC): Enums (Rule E — Wire surface IS the SIMD surface, object-oriented): LaneWidth { F32x16, U8x64, F64x8, BF16x32 } — mirrors ndarray::simd::* Distance { AdcU8, AdcI8 } — CODING_PRACTICES gap 5 (sign-handling / bipolar cancellation) Rotation { Identity, Hadamard{dim}, Opq{blob,dim} } Structs: ResidualSpec { depth, centroids } CodecParams { subspaces, centroids, residual, lane_width, pre_rotation, distance, calibration_rows, measurement_rows, seed } Builder (CODING_PRACTICES gap 3 — fluent API, not raw-struct): CodecParamsBuilder::new() .subspaces(u32).centroids(u32).residual(ResidualSpec) .lane_width(LaneWidth).rotation(Rotation).distance(Distance) .calibration_rows(u32).measurement_rows(u32).seed(u64) .build() -> Result<CodecParams, CodecParamsError> Validation fires BEFORE any JIT compile (D0.7 precision ladder): - ZeroDimension — subspaces == 0 or centroids == 0 - OpqRequiresBf16 — OPQ routes through tile_dpbf16ps; only LaneWidth::BF16x32 is valid - HadamardDimNotPow2 — Sylvester construction needs dim = 2^k - CalibrationEqualsMeasurement — overfit guard: refuses to emit ICC when calibration_rows == measurement_rows (reproduces PR #219's 128-row trained-and-tested artifact) Methods on CodecParams: kernel_signature() -> u64 — JIT cache key (Rule E); excludes seed so calibration-sample changes don't invalidate cached kernels is_matmul_heavy() -> bool — true for OPQ or centroids > 512; drives Tier-1 AMX dispatch decision (Rule C polyfill hierarchy) Rotation::is_matmul() -> bool — Identity and Hadamard are false (butterfly stays on Tier-3 F32x16); only Opq returns true 14 new tests covering: - builder default matches PR #220 baseline shape - each validation variant fires correctly - OPQ + BF16x32 accepted; OPQ + F32x16 rejected with typed error - Hadamard + non-pow2 dim rejected - overfit guard fires on calibration == measurement - kernel_signature stable across identical builds - kernel_signature excludes seed (cache stays hot) - kernel_signature changes with centroids / rotation kind - is_matmul_heavy detects OPQ AND wide codebook (≥512 centroids) Zero-dep preserved (stdlib only: std::collections::hash_map:: DefaultHasher for kernel_signature, core::fmt + core::error for error types). No serde in the contract — YAML/JSON deserialisation belongs to the consumer crate, which will produce CodecParams via serde at the REST handler (Rule F — serialisation at edge only). Tests: 147/147 contract suite passing (133 prior + 14 new). https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Retroactive hygiene for the recent PR arc + prospective enforcement so the gap never recurs. User directive: "should have happened to begin with." LATEST_STATE.md: - Header: "Last updated 2026-04-20 post PR #224 (PR #225 open)" - Recently Shipped table: prepended rows for #225 (open), #224, and #223 with full shipped-content summaries - Contract Inventory: expanded cam:: entry with all new codec- sweep types (LaneWidth / Distance / Rotation / ResidualSpec / CodecParams / CodecParamsBuilder / CodecParamsError) including the precision-ladder-fires-before-JIT invariant - Active Branches: recorded claude/teleport-session-setup-wMZfb and its three merged PRs - Active Integration Plans: added codec-sweep-via-lab-infra-v1 alongside elegant-herding-rocket-v1 - Immediate Next Work: codec-sweep Phase 0 remainder (D0.1/0.2/0.3/ 0.5) + the elegant-herding Phase 2 block PR_ARC_INVENTORY.md (APPEND-ONLY — PREPEND only): - #225 entry: plan + CodecParams/Builder/precision validation + rules A-F locked + decisions for future PRs - #224 entry: three-part lab stack + thinking harvest + I11 measurability locked - #223 entry: LAB-ONLY firewall + AGI-as-SoA + I1-I10 invariants locked (the cross-cutting architectural ruleset this workspace now enforces) STATUS_BOARD.md: - New section: codec-sweep-via-lab-infra-v1 with 18 D-ids across 5 phases (D0.6/D0.7 marked Shipped-in-#225; remainder Queued) EPIPHANIES.md (APPEND-ONLY — PREPEND only, 6 new dated entries): - Board hygiene is the driving seat, not cleanup (this session's self-reflection turned into a rule) - Codec cert is token agreement, not synthetic ICC (#219 → #220 arc; #225 CalibrationEqualsMeasurement typed rejection) - Lab REST surface is three-part (API + Planner + JIT), not just scaffolding - Thinking harvest via REST/Cypher = the AGI magic bullet - SoA never scalarises without ndarray (iron rule Rule C) - AGI is the glove, not the oracle — four-axis SoA is what you wear CLAUDE.md — new top-level § "The Stance — Driving Seat + AGI-as-Glove (P0, read first)": - Explicit driving-seat posture: the session STEERS the stack, doesn't observe it - AGI-as-glove doctrine concrete: topic → FingerprintColumns, angle → QualiaColumn, thinking → MetaColumn, planner → EdgeColumn. New capability lands as a new column, not a layer. - MANDATORY Board-Hygiene Rule as a table: every PR that adds a type / plan / D-id / epiphany / tech-debt / issue MUST update the corresponding board file IN THE SAME COMMIT. Retroactive hygiene (merge PR → later cleanup) is now an anti-pattern the rule forbids. - "Consult, don't guess" — agent/knowledge-first discipline: specialist-agent card → knowledge doc → board inventory → only then grep. Subagent spawn with curated docs beats main- thread grep. 147/147 contract suite still passing. Doc-only PR otherwise (Cargo.toml / src/* unchanged; the orphan serde_yaml/base64 deps from the timed-out bus-compiler subagent were reverted — they'll land with D0.1/D0.3 when the Wire code lands). https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

First code deliverable of codec-sweep-via-lab-infra Phase 0 on the consumer side. Extends the lab Wire surface to carry the CodecParams shape from PR #225 with an object-oriented tensor DTO that decodes once at ingress into a 64-byte-aligned buffer consumable directly by F32x16::from_slice via slice::array_windows::<64>. crates/cognitive-shader-driver/src/wire.rs: Serde mirrors for contract::cam types (zero serde in the contract per CLAUDE.md rule 5): - WireLaneWidth {F32x16, U8x64, F64x8, BF16x32} - WireDistance {AdcU8, AdcI8} - WireRotation {Identity, Hadamard{dim}, Opq{matrix_blob_id, dim}} - WireResidualSpec {depth, centroids} - WireCodecParams — mirrors CodecParams one-for-one - From/TryFrom conversions; TryFrom<WireCodecParams> for CodecParams runs the precision-ladder validation (OPQ↔BF16x32, Hadamard pow2, overfit guard rejecting calibration_rows == measurement_rows) BEFORE any JIT compile would fire. WireTensorView + AlignedBytes (Rule A + Rule E + Rule F): - shape [u32; 2] + lane_width + bytes_base64 on the wire - decode() base64-decodes ONCE at ingress into AlignedBytes (heap, 64-byte aligned via Layout::from_size_align, Drop deallocates with matching layout, Send + Sync) - row() / subspace() / row_count() / col_count() / row_bytes() / element_bytes() — object-oriented methods per Rule E, mirror the SoA+SIMD ops the JIT kernel will perform - is_aligned_64() for the kernel_contract_test gate - WireTensorViewError {Base64, SizeMismatch, ZeroShape} WireCalibrateRequest extended additively: - New: params: Option<WireCodecParams> + tensor_view: Option<WireTensorView> (the new path) - Legacy: num_subspaces / num_centroids / kmeans_iterations / max_rows / icc_samples preserved for back-compat WireCalibrateResponse extended additively: - New: kernel_hash (= CodecParams::kernel_signature()), compile_time_us, backend ("amx" | "vnni" | "avx512" | "avx2" | "legacy") - Never "scalar" on a SoA path — the iron rule enforced at the response contract crates/cognitive-shader-driver/Cargo.toml: - serve feature now pulls base64 (v0.22) + bytemuck (v1) optional deps. No new features; these belong under the existing lab umbrella. crates/cognitive-shader-driver/src/codec_research.rs: - Legacy calibrate_tensor path fills the new response fields with zeros + backend = "legacy". D1.1 (JIT kernel) populates them meaningfully when it lands. Tests (8 new, all passing under --features serve): - wire_codec_params_round_trip_to_contract — OPQ + BF16x32 + wide codebook → builds cleanly, is_matmul_heavy true - wire_codec_params_rejects_opq_with_f32x16 — precision-ladder guard typed-rejects the wrong lane at ingress - wire_codec_params_rejects_calibration_equals_measurement — overfit guard typed-rejects the PR #219 pattern at ingress - wire_codec_params_deserializes_from_minimal_json — serde defaults correct (lane_width=F32x16, distance=AdcU8, rotation=Identity, calibration_rows=2048, seed=42) - wire_tensor_view_decode_lands_in_64byte_aligned_buffer — explicit Rule A proof: decoded AlignedBytes.is_aligned_64(), and slice::array_windows::<64>() yields exactly one window per 16-col F32 row - wire_tensor_view_rejects_size_mismatch — typed error for base64 payload not matching declared shape - wire_tensor_view_subspace_slicing — subspace(row, k, sub_bytes) returns the expected offset+len - wire_calibrate_request_{accepts_new_params_field, back_compat_legacy_fields} — both the new (params-carrying) and legacy payload shapes deserialise correctly Board hygiene in the SAME commit (per CLAUDE.md Mandatory Board-Hygiene Rule from PR #226): - STATUS_BOARD.md D0.1 row: Queued → In PR - LATEST_STATE.md cognitive-shader-driver section: new subsection listing the Wire surface types landed Test summary: 55/55 cognitive-shader-driver tests pass under --features serve; 147/147 lance-graph-contract tests pass. Rules honored: Rule A — stdlib slice::array_windows::<N>() + ndarray::simd::*, proven by the test that calls array_windows::<64>() on the decoded row Rule B — no std::arch, no hpc::simd_avxNNN reach; ndarray::simd::* imports only Rule C — n/a for DTO code (JIT tier selection lands D1.1) Rule D — JSON/YAML/REST only, no in-Rust CodecParams construction on the Wire side Rule E — Wire surface IS the SIMD surface; LaneWidth explicit, methods not scalar bags, 64-byte-aligned decode proven Rule F — decode ONCE at ingress via WireTensorView::decode; WireCalibrateRequest::params: Option<WireCodecParams> carries the Rust object through the rest of the pipeline https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Two more Phase 0 deliverables from codec-sweep-via-lab-infra-v1. 66/66 cognitive-shader-driver tests pass under --features serve (+11 new). D0.5 — auto_detect.rs (~300 LOC, CODING_PRACTICES gap 1): Reads <model_path>/config.json (HuggingFace layout) and returns ModelFingerprint { architecture, hidden_size, n_layers, tokenizer_class, vocab_size, default_lane_width, default_distance }. Architecture routing: llama / qwen / qwen2 / qwen3 / mistral / mixtral → BF16x32 (AMX) bert / modernbert / xlm-roberta / generic → F32x16 (AVX-512) torch_dtype override wins over architecture heuristic. Typed errors: ConfigMissing / Io / Parse / MissingField {path, field}. Best-effort tokenizer_class from tokenizer_config.json. 8 tests: llama / qwen3-with-tokenizer / bert / modernbert / xlm-roberta (d_model alias) / generic fallback / missing-config / missing-field. D0.2 — WireTokenAgreement stub (~100 LOC, the I11 cert gate): DTOs: WireBaseline { Passthrough } — default, extensible WireTokenAgreement { model_path, reference, candidate (WireCodecParams), prompt_set_blob_id, n_tokens } WireTokenAgreementResult { top1_rate, top5_rate, divergence_positions, per_layer_mse, candidate_latency_us, reference_latency_us, stub, backend } Phase 0 handler stub (not shipped yet): returns stub:true / backend:"stub" deterministic result. Phase 2 D2.1-D2.3 land the real decode-and-compare loop (reference model load + top-k comparison + per-layer MSE). Pass gates (for when the harness lands): top1_rate ≥ 0.99 + top5_rate ≥ 0.999 vs Passthrough baseline. This is the ACTUAL codec cert gate — reconstruction ICC is necessary-but-not-sufficient (per #219/#220 lesson). 3 round-trip serde tests: full payload + stub-backend default + baseline default. Board hygiene (CLAUDE.md Mandatory rule): STATUS_BOARD.md updated: D0.1 Queued → Shipped (PR #227 — was stale) D0.2 Queued → In PR (this branch) D0.5 Queued → In PR (this branch) Phase 0 state after this commit: ✅ D0.1 WireCalibrate + WireTensorView (PR #227) ✅ D0.6 CodecParamsBuilder (PR #225) ✅ D0.7 precision-ladder validation (PR #225) ✅ D0.5 auto_detect (this PR) ✅ D0.2 WireTokenAgreement stub (this PR) ⏳ D0.3 WireSweep streaming endpoint (next PR) ⏳ D0.4 surface freeze (gates after D0.3) Rules honored: Rule D — JSON/YAML/REST only, CodecParams carried through via WireCodecParams Rule E — Wire surface IS the SIMD surface (lane_width on candidate) Rule F — serde mirrors at ingress only; TryFrom → CodecParams at handler https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Two findings from the D0.2 + D0.5 implementation, landed per user directive "including the epiphanies board": 1. D0.2 stub flag is anti-#219 defense at the type level. WireTokenAgreementResult.stub:bool + backend:"stub" default make the "synthetic-rows-mistaken-for-real" failure machine-checkable, not just documented. Generalises: every Phase-N surface DTO that lands before its Phase-N+k harness should carry an explicit stub flag. 2. D0.5 auto_detect is the concrete Python↔Rust handshake mechanism. Same architecture→lane-width table in Rosetta v2 Python and Rust auto_detect.rs. E-MEMB-11 handshake moves from conceptual to implemented; the slice-layout reconciliation doc (E-MEMB-1 fix) can use the same pattern (architecture → layout version → canonical slice table). Both entries prepended per APPEND-ONLY rule. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

First Phase 2 deliverable — scaffold of the I11 cert gate harness. The PR #219 → #220 lesson landed as a typed-rejection wall: the stub result carries stub:true + backend:"stub" so no client can confuse Phase 0 stub output for a real measurement. crates/cognitive-shader-driver/src/token_agreement.rs (~320 LOC): ReferenceModel { path, path_hash, stub_token_count } ::load(&Path) -> Result<Self, TokenAgreementError> D2.1 stub: validates path exists, hashes display; does NOT parse safetensors yet. D2.2 replaces with real loader driven by auto_detect::detect() → ModelFingerprint. ::stub(tag, n_tokens) — builds stub model without touching fs TokenAgreementError: ModelPathMissing { path } EmptyPromptSet TokenCountMismatch { reference, candidate } NotImplementedYet { what } ← measure_full() until D2.2 TopKAgreement { top1_matches, top5_matches, total_positions, divergence_positions: Vec<u32> } ::compare(ref: &[Vec<u32>], cand: &[Vec<u32>]) -> Result<Self> Position-by-position: top1 = r[0] == c[0]; top5 = r[0] in c[..5]. Records divergence positions for failure-mode analysis (late-sequence drift vs random errors). ::top1_rate() / top5_rate() -> f32 ::meets_cert_gate() -> bool (top1 ≥ 0.99 AND top5 ≥ 0.999) ::aggregate(per_prompt) — sums counters; concatenates divergence with per-prompt offset so failures stay localised TokenAgreementHarness: ::new(reference, baseline, candidate, n_tokens) ::measure_stub() -> WireTokenAgreementResult { stub:true, .. } ::measure_full() -> NotImplementedYet (D2.2 scope) Tests (13 new): - reference_model_stub_builds_without_filesystem - reference_model_load_missing_path_yields_typed_error - topk_compare_identical_streams_is_perfect (full cert gate pass) - topk_compare_all_different_fails_cert_gate - topk_top5_matches_when_top1_misses_but_in_top5 (ref top-1 = 7; cand has 7 at position 3 in top-5 → top5 counts) - topk_mismatched_stream_lengths_yield_typed_error - topk_aggregate_sums_counters_and_offsets_divergence (prompt 2's divergence at pos 4 → aggregate pos 14 after prompt 1's 10) - cert_gate_passes_at_exact_thresholds (990/1000 = 0.99, 999/1000 = 0.999 — both boundaries hit) - cert_gate_fails_when_top1_below_threshold_even_if_top5_passes - cert_gate_fails_when_top5_below_threshold_even_if_top1_passes - harness_measure_stub_returns_machine_checkable_stub_flag (stub:true enforced; backend="stub"; all rates 0.0; zero latencies) - harness_measure_full_returns_not_implemented_pointing_at_d22 - harness_measure_stub_rejects_zero_n_tokens Board hygiene (CLAUDE.md Mandatory rule): STATUS_BOARD.md D2.1 Queued → In PR Phase state: Phase 0 ✅ complete (D0.1-D0.7 all shipped) Phase 1 scaffold ✅ (D1.1, D1.2, D1.3 shipped; D1.1b queued) Phase 2 ⏳ D2.1 (this PR), D2.2 + D2.3 queued Rules honored: Rule D — Measurement set comes from Wire DTOs (D0.2 WireTokenAgreement) Rule E — TopKAgreement exposes object-methods (top1_rate, meets_cert_gate) Rule F — No serialization between stages; per-prompt Vec<Vec<u32>> token streams are plain Rust owned; the serde happens at D2.3 handler entry / exit only https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…ete) Final Phase 2 surface deliverable — routes WireTokenAgreement requests through the D2.1 TokenAgreementHarness stub. End-to-end flow: WireTokenAgreement (JSON at ingress) ↓ serde_json deserialize (Rule F: edge only) Handler validates candidate: WireCodecParams → CodecParams ↓ precision-ladder + overfit guard fire here (ingress) NOT deeper ReferenceModel::load(&req.model_path) when path exists OR ReferenceModel::stub(hash(model_path), 0) fallback ↓ deterministic stub keyed on model_path string for regression tests TokenAgreementHarness::new(ref, baseline, candidate, n_tokens) ↓ .measure_stub() → WireTokenAgreementResult { stub:true, backend:"stub" } ↓ serde_json serialize (Rule F: edge only) HTTP 200 Json response crates/cognitive-shader-driver/src/serve.rs — ~70 LOC: - token_agreement_handler async fn - new imports: ReferenceModel, TokenAgreementHarness, WireTokenAgreement, WireTokenAgreementResult, CodecParams, StdPath (aliased to avoid collision with axum::extract::Path) - new route: POST /v1/shader/token-agreement Errors typed at handler boundary: - BAD_REQUEST + "invalid CodecParams: <CodecParamsError display>" for precision-ladder / overfit guard failures - BAD_REQUEST + "model load: ModelPathMissing { path }" when real path is specified but does not exist - BAD_REQUEST + stringified TokenAgreementError for harness errors (EmptyPromptSet when n_tokens = 0) Stub-fallback behavior (when model_path does not exist on fs): Deterministic hash of the path string keys the ReferenceModel stub. Same model_path → same stub fingerprint → test harnesses get repeatable results without needing a real safetensors file. D2.2 replaces with strict path validation once the real loader lands. Why the `stub:true` wall matters end-to-end: Client sends WireTokenAgreement over HTTP → handler returns 200 OK with WireTokenAgreementResult. Without the `stub` flag, a client pipeline could silently treat 0.0 rates as real measurements. With the flag, any `assert!(!result.stub)` fails the pipeline loudly — the Phase 0/D2.1 anti-#219 discipline extends through the HTTP surface. Phase state: Phase 0 ✅ complete Phase 1 scaffold ✅ (D1.1 / D1.2 / D1.3 shipped; D1.1b queued) Phase 2 scaffold ✅ — D2.1 harness + D2.3 handler (this PR) ⏳ D2.2 real decode-and-compare loop queued Tests: 117/117 cognitive-shader-driver --features serve pass (unchanged; handler's logic is a thin pass-through and the harness it delegates to is already covered by 13 D2.1 tests). Board hygiene: STATUS_BOARD.md D2.3 Queued → In PR Rules honored: Rule F — serde_json::from at ingress (Json<WireTokenAgreement>), serde_json::to at egress (Json<WireTokenAgreementResult>); in between: CodecParams + ReferenceModel + Harness are all in-memory Rust objects, zero re-serialisation https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Two fixes from Codex review on the D3.1 sweep_handler: P1 — Reject oversized sweep grids before enumerate(). A small JSON payload with moderately-sized axes can explode into an enormous Cartesian product and exhaust CPU/memory. Added MAX_GRID_CARDINALITY = 10,000 check before materialization; returns 400 with explicit error message if exceeded. P2 — Stop echoing req.log_to_lance into lance_fragment_path. The D3.1 stub does NOT actually append rows to Lance; echoing the requested path back in the response falsely claims persistence that never occurred. Clients treating lance_fragment_path as evidence of successful logging would silently skip retries and lose experiment results. Set to None until the real Lance append writer lands (D3.1b). Same anti-#219 pattern: don't claim a measurement/persistence happened when it didn't. The stub:true flag on per-point results covers measurement-honesty; this fix covers persistence-honesty. 117/117 tests pass. Clippy clean under --features serve. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…ft guard) Final Phase 3 scaffold deliverable — curl-driven lab iteration against the shipped /v1/shader/sweep endpoint. Files: configs/codec/README.md — inventory + DoS-ceiling note + anti-#219 stub:true flag explanation configs/codec/00_pr220_baseline.yaml - PR #220 baseline regression: 6 subspaces × 256 centroids × identity rotation. Expected ICC ≈ 0.195 mean when D2.2 lands real decode-and-compare. configs/codec/10_wider_codebook.yaml - PR #220 fix (a): centroids ∈ {256, 512, 1024}. Cardinality 3, three distinct kernel signatures → warm cache after one pass. configs/codec/12_hadamard_pre_rotation.yaml - PR #220 fix (c): Hadamard × centroids cross-product (2×2 = 4). Hadamard stays Tier-3 F32x16 per Rule C. scripts/codec_sweep.sh - yq YAML → JSON conversion - POST to ${SHADER_LAB_URL}/v1/shader/sweep (default localhost:3001) - jq-pretty request + response - Stub honesty check: prints results[0].stub flag → verifies Phase 0 returns true (machine-checkable anti-#219) - Requires: yq (mikefarah/yq ≥ v4), curl, jq wire.rs +1 test: sweep_request_yaml_shape_deserializes_via_serde_json - Inline JSON fixture mirroring the canonical YAML → JSON shape - If this test breaks, the YAML configs are stale relative to the Rust DTOs → scripts/codec_sweep.sh would fail at runtime - Caught a real drift during development: PascalCase "Identity" vs the DTO's rename_all="lowercase" (YAMLs correctly use lowercase; test fixture had the typo) Phase state: Phase 0 ✅ complete Phase 1 scaffold ✅ (D1.1 / D1.2 / D1.3 shipped; D1.1b queued) Phase 2 scaffold ✅ (D2.1 harness + D2.3 handler; D2.2 queued) Phase 3 scaffold ✅ — D3.1 batch handler + D3.2 client driver shipped ⏳ D3.1b real Lance append writer queued DoS-ceiling note: sweep handler rejects grids with cardinality > 10_000 before enumeration (PR #238 P1 fix). README documents the ceiling so config authors can budget axis lengths. Rule D honored: adding a new codec candidate = authoring a new YAML file in configs/codec/. Zero Rust changes. Zero rebuilds. Rules F honored at the client boundary: YAML → JSON → HTTP ingress. Single deserialisation at the shader-lab's handler; everything after is in-memory Rust (WireSweepRequest → CodecParams → grid enumerate() → per-candidate WireSweepResult). https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

… log Codex P2 review catch — the "Stub honesty check" block was echoing results[0].stub but always exiting 0, which undermined the anti-#219 safeguard the script claims to enforce. In Phase 0/2 runs, a non-stub or malformed response could look like a successful sweep in automation. Fix: after extracting the flag, case on it: - true → OK, echo success (Phase 0 stub honored) - false → FAIL exit 3, diagnostic message naming two possible causes (server running non-scaffold code OR wrong endpoint hit) - * → FAIL exit 3, points at the malformed response section The fix embodies the session's own principle ("the object does the work" / "stub flag is machine-checkable anti-#219"): a script that claims to check a flag but doesn't exit on failure is exactly the pattern we warn against in CODING_PRACTICES.md. The script now actually enforces what it documents. Cross-ref: PR #239 D3.2; EPIPHANIES.md 2026-04-20 "D0.2 stub flag is anti-#219 defense at the type level"; Codex P2 review 2026-04-21. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

… construction) Applies the CODING_PRACTICES "builder is the only path" principle at the type-system level. Ten key Wire DTOs now forbid raw struct-literal construction from external crates: WireCalibrateRequest WireSweepGrid WireCalibrateResponse WireSweepRequest WireCodecParams WireSweepResult WireTensorView WireSweepResponse WireTokenAgreement WireTokenAgreementResult External callers (hypothetical downstream consumer crates) are now forced through one of these paths to construct these types: 1. serde deserialize — JSON/YAML at REST/gRPC ingress (the intended path per Rule F "serialise once at edges only") 2. TryFrom<WireCodecParams> for CodecParams — the shipped validated conversion that runs precision-ladder + overfit guard 3. Future Builder::build() if one is added for a specific DTO Internal construction (tests in wire.rs, grpc.rs conversion code, serve.rs handler body) is UNAFFECTED — #[non_exhaustive] only gates external-crate callers. Same-crate code retains struct-literal access. Why this matters in practice: - `WireCodecParams { subspaces: 6, centroids: 0, ... }` compiled today (skipping the ZeroDimension guard). External users can now only get a CodecParams through the TryFrom conversion, which runs the full validation chain at ingress. - Future new fields on these DTOs won't break external callers (they were already non-exhaustive-forbidden from raw struct literals), so downstream rebuilds don't need coordinated updates. - The "object does the work" principle now has teeth — the type system enforces what the docs say. Test Plan: - cargo test --features lab --lib: 118/118 pass (unchanged) - cargo clippy --features lab -- -D warnings: CLEAN - No internal construction path broken (tests + grpc.rs + serve.rs handler all inside the crate) Also in this commit (cherry-picked from orphan branch state that was not in the #239 merge): - scripts/codec_sweep.sh: Codex P2 fix — stub honesty check now exits 3 on missing/false flag instead of just logging. Script actually enforces the anti-#219 safeguard it documents. - CODING_PRACTICES.md: polyfill chain diagram + mandatory cargo clippy + feature-matrix discipline section (~131 LOC). Cross-ref: CODING_PRACTICES.md anti-pattern #10 "Raw struct literals bypassing builders"; session epiphany "the object does the work"; Codex P2 review 2026-04-21 on the stub honesty script. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

claude added 5 commits April 20, 2026 05:29

fix(lab): CamPqPhase dim-mismatch — CAM-PQ truncates to multiple of 6

f1498bc

AdaWorldAPI merged commit 57d4db2 into main Apr 20, 2026

AdaWorldAPI mentioned this pull request Apr 20, 2026

docs(knowledge): lab = API+Planner+JIT, thinking harvest, I11 measurability #224

Merged

4 tasks

AdaWorldAPI mentioned this pull request Apr 20, 2026

plan(codec-sweep) + D0.6/D0.7: lab-infra sweep plan + CodecParams types #225

Merged

5 tasks

AdaWorldAPI mentioned this pull request Apr 20, 2026

D0.5 auto_detect + D0.2 WireTokenAgreement stub (Phase 0, 66/66 tests) #231

Merged

5 tasks

AdaWorldAPI mentioned this pull request Apr 21, 2026

D2.1 token-agreement harness scaffold (I11 cert gate infra, 117/117 tests) #236

Merged

AdaWorldAPI mentioned this pull request Apr 21, 2026

D2.3 /v1/shader/token-agreement handler (Phase 2 scaffold complete) #237

Merged

AdaWorldAPI mentioned this pull request Apr 21, 2026

D3.2 client driver + starter YAML configs (Phase 3 scaffold complete, 118/118 tests) #239

Merged

AdaWorldAPI mentioned this pull request Apr 21, 2026

Seal Wire DTOs with #[non_exhaustive] + recover Codex P2 + polyfill chain docs #240

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

codec research: CAM-PQ solves argmax blind spot (ICC 0.9998 at 6 B/row) + production plan#219

codec research: CAM-PQ solves argmax blind spot (ICC 0.9998 at 6 B/row) + production plan#219
AdaWorldAPI merged 5 commits into
mainfrom
claude/quick-wins-2026-04-19

AdaWorldAPI commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AdaWorldAPI commented Apr 20, 2026

Summary

What this PR ships (docs + lab-gated research)

What this PR does NOT ship

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants