D1+D2+D5: CAM-PQ calibration pipeline — honest negative result by AdaWorldAPI · Pull Request #220 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-20T09:30:52Z

Summary

D1 — route_tensor classifier in lance-graph-contract::cam: routes tensors to CamPq / Passthrough / Skip per invariant I1. 10 tests, 133/133 contract suite passes.
D2 — cam_pq_calibrate CLI (--features calibrate): reads safetensors, trains per-tensor CamCodebook, encodes fingerprints, serializes codebooks + fingerprints + manifest.json with SHA256 + ICC + reconstruction error.
D5 — Full-size validation on Qwen3-TTS-0.6B: FAILS the ≥0.99 ICC gate.
Diagnostic probe (cam_pq_row_count_probe) demonstrates the root cause.
EPIPHANIES.md updated: prior "CAM-PQ solves argmax" entry marked SUPERSEDED.

Negative Result

PR #218's bench measured ICC 0.9998 on 128 rows trained and measured on those same 128 rows. With 256 centroids per subspace, 128 rows trivially fit — every row gets its own centroid. This does not generalize.

Full-size validation (234 argmax-regime tensors, Qwen3-TTS-0.6B):

Metric	Value
Mean ICC	0.195
Max ICC	0.957
Tensors ≥ 0.99 ICC	0 / 234
Relative L2 error	0.70–0.90

Diagnostic on one gate_proj [3072, 1024]:

n_train	icc_train	icc_all_rows
128	1.000	−0.304
256	1.000	−0.130
512	0.531	0.015
3072	−0.079	−0.079

Root cause: 6×256 PQ is centroid-starved for production tensors (1024–3072 rows). The "128× compression" claim was extrapolated from a trivial in-training fit.
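
The degenerate fit described above can be reproduced in a few lines. This is a minimal sketch, not the CamCodebook trainer: it collapses PQ to a single subspace, and `memorizing_codebook`, `quantization_error`, and `mean_error` are hypothetical names introduced only for illustration. When the number of training rows does not exceed the centroid budget, storing each row as its own centroid gives zero training error, while rows that were not in the training set still pay a quantization cost.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Degenerate "training": if the rows fit in the centroid budget, every row
/// simply becomes its own centroid. Returns None when they do not fit.
fn memorizing_codebook(train: &[Vec<f32>], max_centroids: usize) -> Option<Vec<Vec<f32>>> {
    if train.len() <= max_centroids {
        Some(train.to_vec())
    } else {
        None
    }
}

/// Squared error after snapping `row` to its nearest centroid.
fn quantization_error(row: &[f32], centroids: &[Vec<f32>]) -> f32 {
    centroids
        .iter()
        .map(|c| sq_dist(row, c))
        .fold(f32::INFINITY, f32::min)
}

/// Mean squared quantization error over a non-empty set of rows.
fn mean_error(rows: &[Vec<f32>], centroids: &[Vec<f32>]) -> f32 {
    assert!(!rows.is_empty(), "need at least one row");
    let total: f32 = rows.iter().map(|r| quantization_error(r, centroids)).sum();
    total / rows.len() as f32
}
```

Measuring on the training rows alone reports a perfect fit regardless of how well the codebook covers the rest of the tensor, which is why the probe table separates icc_train from icc_all_rows.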

What's Sound

Infrastructure works correctly: the CLI, route classifier, codebook serialization format, ICC/reconstruction measurement harness. The negative result is in the codec's capacity, not the tooling.

What's Needed to Fix

(a) Wider codebook: 1024+ centroids per subspace (10 bits = 7.5 B/row)
(b) Residual PQ: encode residuals after first pass
(c) Hadamard pre-rotation to decorrelate subspaces
(d) OPQ (optimized product quantization) rotation
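
The "10 bits = 7.5 B/row" figure in (a) follows from packing one centroid index per subspace. A small sketch of that arithmetic, assuming 6 subspaces as in the current 6×256 layout; `index_bits` and `bytes_per_row` are hypothetical helper names, not part of the crate:

```rust
/// Bits needed to index one of `centroids` codebook entries (ceil(log2)).
fn index_bits(centroids: u32) -> u32 {
    assert!(centroids > 0, "a codebook needs at least one centroid");
    if centroids == 1 {
        0
    } else {
        32 - (centroids - 1).leading_zeros()
    }
}

/// Bit-packed fingerprint size in bytes per row: one index per subspace.
fn bytes_per_row(subspaces: u32, centroids: u32) -> f64 {
    f64::from(subspaces * index_bits(centroids)) / 8.0
}
```

With these definitions, `bytes_per_row(6, 256)` is 6.0 (the existing 6-byte fingerprint) and `bytes_per_row(6, 1024)` is 7.5.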

Test plan

cargo test -p lance-graph-contract --lib route_tests — 10/10 pass
cargo test -p lance-graph-contract — 133/133 pass
cam_pq_calibrate builds and runs on Qwen3-TTS-0.6B safetensors
cam_pq_row_count_probe reproduces the 128-row artifact

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Enforces invariant I1: index-regime tensors (embed_tokens, lm_head, token_embd, wte/wpe) MUST stay Passthrough, because an identity lookup can't survive any codec. Argmax-regime tensors (attention Q/K/V/O, MLP gate/up/down) route to CamPq. Norms, conv, and small tensors route to Skip.

Order of rules matters: the index-regime match comes BEFORE the ambiguous-large-2D fallback so that lm_head (2D, 151936×hidden) isn't misrouted. Covered by the lm_head_not_misrouted_as_campq test.

8 tests covering Qwen/Llama/GPT-2/GGUF naming conventions. 133/133 contract tests pass. Zero deps preserved.

First deliverable (D1) of the CAM-PQ production wiring plan merged in PR #219.
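
The rule ordering can be illustrated with a small classifier. This is a sketch under stated assumptions, not the code in lance-graph-contract::cam: the substring lists use common Qwen/Llama projection names, and `MIN_FALLBACK_ELEMS` is an invented threshold for the "large 2D" fallback.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Route {
    CamPq,
    Passthrough,
    Skip,
}

/// Hypothetical size threshold for the ambiguous-large-2D fallback.
const MIN_FALLBACK_ELEMS: u64 = 1 << 16;

fn route_tensor(name: &str, dims: &[u64]) -> Route {
    const INDEX: [&str; 5] = ["embed_tokens", "lm_head", "token_embd", "wte", "wpe"];
    const ARGMAX: [&str; 7] = [
        "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",
    ];

    // Rule 1 (must come first): index-regime tensors are never compressed.
    if INDEX.iter().any(|p| name.contains(p)) {
        return Route::Passthrough;
    }
    // Rule 2: norms and conv weights are skipped.
    if name.contains("norm") || name.contains("conv") {
        return Route::Skip;
    }
    // Rule 3: attention / MLP projections go to the codec.
    if ARGMAX.iter().any(|p| name.contains(p)) {
        return Route::CamPq;
    }
    // Rule 4 (fallback): any other large 2D matrix is treated as argmax-regime.
    // Without rule 1 above, lm_head would land here and be misrouted.
    if dims.len() == 2 && dims.iter().product::<u64>() >= MIN_FALLBACK_ELEMS {
        return Route::CamPq;
    }
    Route::Skip
}
```

Moving rule 4 above rule 1 would send a 151936×hidden lm_head to CamPq, which is the failure the ordering test guards against.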

…sult

D2: cam_pq_calibrate binary. Reads safetensors, classifies tensors via route_tensor (D1), trains a CamCodebook per argmax-regime tensor, encodes all rows to 6-byte fingerprints, measures ICC_3_1 and relative L2 error, and writes codebooks + fingerprints + manifest.json.

D5: full-size validation on Qwen3-TTS-0.6B FAILS. 234 argmax-regime tensors measured. Mean ICC = 0.195; zero tensors meet the ≥0.99 gate. Relative L2 error 0.70–0.90.

Root cause: the PR #218 bench measured ICC 0.9998 on 128 rows trained and measured on those same 128 rows, a trivially correct fit (128 ≤ 256 centroids → every row gets its own centroid). At production tensor sizes (1024–3072 rows), the 6×256 codebook is centroid-starved.

cam_pq_row_count_probe.rs demonstrates the collapse:

n=128 → icc_train=1.000, icc_all=-0.304
n=3072 → icc_train=-0.079

Also broadens the route_tensor embedding match to catch codec_embedding, adding 2 new test cases (10 total, 133/133 contract tests pass).

Infrastructure (CLI, serialization, measurement) is sound. The negative result is in the codec's capacity vs tensor row counts, not the tooling. Plan needs revision before D6/D7 effort.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
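
For reference, ICC(3,1) is the two-way mixed, single-measure, consistency form of the intraclass correlation: (MSR − MSE) / (MSR + (k − 1)·MSE). The sketch below computes it for k = 2 "raters", taking the original and reconstructed values as the two ratings of each element. How the harness actually pairs values is not stated in this PR, so that pairing is an assumption, as is the function name `icc_3_1`.

```rust
/// ICC(3,1) for two paired measurement vectors (k = 2 raters, n subjects).
/// Returns None for mismatched lengths, n < 2, or zero total variance.
fn icc_3_1(a: &[f64], b: &[f64]) -> Option<f64> {
    let n = a.len();
    if n < 2 || b.len() != n {
        return None;
    }
    let nf = n as f64;
    let mean_a = a.iter().sum::<f64>() / nf;
    let mean_b = b.iter().sum::<f64>() / nf;
    let grand = (mean_a + mean_b) / 2.0;

    // Between-subject sum of squares: k * sum((row_mean - grand)^2), k = 2.
    let ss_rows: f64 = a
        .iter()
        .zip(b)
        .map(|(x, y)| 2.0 * ((x + y) / 2.0 - grand).powi(2))
        .sum();
    // Between-rater sum of squares: n * sum((col_mean - grand)^2).
    let ss_cols = nf * ((mean_a - grand).powi(2) + (mean_b - grand).powi(2));
    let ss_total: f64 = a.iter().chain(b).map(|x| (x - grand).powi(2)).sum();
    let ss_err = ss_total - ss_rows - ss_cols;

    let ms_rows = ss_rows / (nf - 1.0);
    let ms_err = ss_err / (nf - 1.0); // (n - 1)(k - 1) with k = 2
    let denom = ms_rows + ms_err; // MSR + (k - 1) * MSE
    if denom == 0.0 {
        None
    } else {
        Some((ms_rows - ms_err) / denom)
    }
}
```

The statistic can go negative when the residual variance exceeds the between-element variance, which is consistent with the negative icc_all values in the probe output.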

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a78131fa76

chatgpt-codex-connector · 2026-04-20T09:35:35Z

+                let (row_dim, n_rows) = match row_layout(&dims_u64) {
+                    Some(v) => v,
+                    None => {
+                        eprintln!("  [skip: not a 2D matrix]");
+                        continue;


Handle non-2D CamPq routes without dropping tensors

When a tensor is classified as CamPq but is not 2D (for example, packed expert weights in MoE checkpoints), this branch logs a skip and continues before writing any artifact or manifest row. Because route_tensor matches attention/MLP names without checking rank (crates/lance-graph-contract/src/cam.rs), these tensors can be silently omitted, producing incomplete calibration outputs that downstream tooling cannot fully reconstruct.
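
One way to address this, sketched under assumptions: record every CamPq-routed tensor in the manifest, including the ones that cannot be encoded. `Outcome` and `plan_tensor` are hypothetical names; `row_layout` mirrors the call in the diff, and its (row_dim, n_rows) return order is inferred from the destructuring there.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Encoded { n_rows: u64, row_dim: u64 },
    Skipped { reason: &'static str },
}

/// Interprets `[n_rows, row_dim]` as a row-major matrix; None for any other rank.
fn row_layout(dims: &[u64]) -> Option<(u64, u64)> {
    match dims {
        [n_rows, row_dim] => Some((*row_dim, *n_rows)),
        _ => None,
    }
}

/// Always appends a manifest row, so a non-2D tensor is visible downstream
/// instead of being dropped by a bare `continue`.
fn plan_tensor(name: &str, dims: &[u64], manifest: &mut Vec<(String, Outcome)>) {
    let outcome = match row_layout(dims) {
        Some((row_dim, n_rows)) => Outcome::Encoded { n_rows, row_dim },
        None => Outcome::Skipped { reason: "not a 2D matrix" },
    };
    manifest.push((name.to_string(), outcome));
}
```

Whether a skipped tensor should then fall back to Passthrough is a separate decision this sketch does not make.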

chatgpt-codex-connector · 2026-04-20T09:35:35Z

+    let bytes: Vec<u8> = data.iter().flat_map(|f| f.to_le_bytes()).collect();
+    w.write_all(&bytes)?;


Stream passthrough tensor writes instead of materializing bytes

This code converts the entire &[f32] tensor into a separate Vec<u8> before writing, which roughly doubles peak memory for each passthrough tensor. For large embedding/lm_head tensors this can push calibration runs into OOM or severe memory pressure even though the data could be written incrementally in chunks.
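
A chunked writer along the lines the review suggests might look like this. It is a sketch: `write_f32_le_chunked` and the `chunk_elems` parameter are hypothetical, and the output bytes are identical to the flat_map version in the diff.

```rust
use std::io::{self, Write};

/// Writes `data` as little-endian f32 bytes, reusing one bounded buffer
/// instead of materializing the whole tensor as a second Vec<u8>.
fn write_f32_le_chunked<W: Write>(
    w: &mut W,
    data: &[f32],
    chunk_elems: usize,
) -> io::Result<()> {
    assert!(chunk_elems > 0, "chunk size must be non-zero");
    let mut buf = Vec::with_capacity(chunk_elems * 4);
    for chunk in data.chunks(chunk_elems) {
        buf.clear();
        for f in chunk {
            buf.extend_from_slice(&f.to_le_bytes());
        }
        w.write_all(&buf)?;
    }
    Ok(())
}
```

Peak extra memory is then `chunk_elems * 4` bytes regardless of tensor size; wrapping the destination in a `std::io::BufWriter` is an alternative with a similar effect.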

The LAB-ONLY surface isn't just quarantine scaffolding; it's the codec-research iteration testbed. Its reason for existing is the cost of the alternative: every codec candidate re-measured through a cargo build cycle burns minutes per iteration. With the lab REST/gRPC + wire DTOs, a single binary serves dozens of candidates against the same safetensors in seconds per call.

PR #220 falsified PR #219's ICC-0.9998 claim via exactly this path: the calibration CLI + /v1/shader/calibrate endpoint surfaced mean ICC 0.195 and a 0/234 pass rate on full Qwen3-TTS-0.6B tensors before any production consumer linked the codec.

Two purposes now named explicitly in the doc:

1. Iteration velocity (positive): lab surface = curl-friendly research loop, no rebuild per candidate.
2. Canonical firewall (guard): consumers still walk UnifiedStep via OrchestrationBridge; they never see Wire* per-op DTOs.

Changes:

- New subsection "Why the Lab Surface Exists (positive purpose — not just quarantine)" with the #219 → #220 worked-example table.
- Decision Procedure item 3 reframed: research ops and curl-friendly debug shortcuts are a legitimate use of the lab surface, with a graduation rule (full-size validation → new StepDomain variant; lab endpoint stays for continued iteration, production moves to bridge).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

… I11 measurability

The prior "positive purpose" framing was too narrow (codec iteration velocity). The actual architecture the lab surface buys is three-part:

- REST/gRPC API: no rebuild per codec candidate
- Planner: real dispatch path under test (not a toy bench)
- JIT: swap kernels at runtime without relinking

Two loads share this stack; neither is secondary:

1. Codec certification. Reconstruction ICC on real safetensors is necessary but not sufficient: the cert gate is token agreement vs Passthrough on full decode. PR #219's 0.9998 was synthetic / overfit-on-training; PR #220's 0.195 was real-weight but still reconstruction-only. The next load-bearing measurement is the token-level comparison, which is only tractable on this stack. At 8-17 min/rebuild × ~200 codec invariants to tune, iteration without the API is infeasible.
2. Thinking harvest (the AGI magic bullet). The same API + Planner + JIT externalises the planner's 36-style / 13-verb / NARS trace. POST a Cypher query, get {rows, thinking_trace} back. The trace is log / replay / NARS-revise-able, which is the architectural shape of a system that learns its own meta-inference. This is the REST/Cypher injection path we can revive at near-zero cost now that PR #221 landed the REST/gRPC scaffolding.

I11 (new invariant): Measurable stack, not a black box. Every layer (L0 ndarray → L4 planner) emits a harvest-ready trace through the lab surface. Proposed changes that shrink the trace for perf/simplicity are rejected: the trace contract is what makes the feedback loop mechanisable.

Also refined: Decision Procedure item 3 (codec research is a legitimate positive use, not a grudging exception); rule-of-thumb measurement order (reconstruction error → reconstruction ICC → token agreement) with token agreement as the cert gate.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Operationalises PR #220's "What's Needed to Fix" list (wider codebook, residual PQ, Hadamard pre-rotation, OPQ) as a parameter sweep through the lab endpoint: every codec difference is a JIT kernel, not a cargo rebuild. Phase 0 hardens the Wire surface once; Phases 1-4 run unlimited candidates without further rebuilds; Phase 5 graduates winners to the canonical OrchestrationBridge surface.

Structure:

Phase 0: API hardening (one rebuild, then frozen)
- D0.1 CodecParams in WireCalibrate
- D0.2 WireTokenAgreement endpoint (I11 cert gate)
- D0.3 WireSweep streaming + Lance append
- D0.4 surface freeze

Phase 1: JIT codec kernels (rebuild-free)
- D1.1 CodecKernelCache via JitCompiler (Cranelift)
- D1.2 Rotation primitives (Identity / Hadamard / OPQ)
- D1.3 Residual PQ via JIT composition

Phase 2: Token-agreement harness (the I11 cert gate)
- D2.1 Reference-model loader (ndarray safetensors)
- D2.2 Decode-and-compare loop (top-k, per-layer MSE)
- D2.3 Handler wiring

Phase 3: Sweep driver + Lance logger
Phase 4: DataFusion frontier analysis
Phase 5: Graduation to OrchestrationBridge (per winner only)

~1,920 LOC total; 1 upfront rebuild; unlimited candidates afterwards. Compare to the naive path (4 fixes × 8-17 min × N tweaks = hundreds of hours). All work behind --features lab until graduation.

INTEGRATION_PLANS.md prepended per APPEND-ONLY rule, citing PR #224 dependency for the architectural framing.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
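
As background for the Hadamard rotation primitive in D1.2, here is a scalar reference sketch of an orthonormal fast Walsh-Hadamard transform. It is not the planned JIT kernel and does not use ndarray; `hadamard_in_place` is a hypothetical name. The Sylvester construction only exists for power-of-two sizes, so other lengths are rejected.

```rust
/// In-place orthonormal Walsh-Hadamard transform (Sylvester ordering).
/// Errors if the length is zero or not a power of two.
fn hadamard_in_place(x: &mut [f32]) -> Result<(), &'static str> {
    let n = x.len();
    if n == 0 || !n.is_power_of_two() {
        return Err("Hadamard dim must be a non-zero power of two");
    }
    let mut h = 1;
    while h < n {
        // Butterfly: combine each block's lower and upper halves.
        for block in x.chunks_mut(2 * h) {
            let (lo, hi) = block.split_at_mut(h);
            for (a, b) in lo.iter_mut().zip(hi.iter_mut()) {
                let (sum, diff) = (*a + *b, *a - *b);
                *a = sum;
                *b = diff;
            }
        }
        h *= 2;
    }
    // Scale by 1/sqrt(n) so the transform preserves vector norms.
    let scale = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= scale;
    }
    Ok(())
}
```

Because the scaled transform is its own inverse, applying it twice recovers the input, which makes a convenient self-check for any faster kernel.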

…hanges

Corrections after hand-grep vs curated knowledge (encoding-ecosystem.md, codec-findings-2026-04-20.md, rotation_vs_error_correction.md) and user directives "its all there, dont touch, just be aware how to use crate::simd", "wire accordingly into the lab infra", "via struct of arrays":

- slice::array_windows::<N>() IS real: stdlib, stable Rust 1.77, const-generic. I conflated it with a missing ndarray::array_window (singular); corrected.
- AMX in ndarray is INTEL (Sapphire Rapids TDPBUSD/TDPBF16PS via stable inline asm on Rust 1.94, per src/simd_amx.rs header), NOT Apple. rust-lang #126622 keeps AMX intrinsics nightly; inline asm at src/hpc/amx_matmul.rs is the stable consumer path. Verified on kernel 6.18.5 with XCR0 bits 17+18 set.
- Real primitive names (no hallucinated matmul_tiled / hadamard_butterfly): tile_dpbusd, tile_dpbf16ps, vnni_pack_bf16 for tier-1 AMX; vnni_matvec / matvec_dispatch for tier-2 VNNI; F32x16 / U8x64 / Fingerprint<N> for tier-3 AVX-512 baseline.
- Polyfill hierarchy per user directive (simd_amx > simd_avx512 > simd_avx2 fallback):
  Tier 1: Intel AMX tile (256 MACs/instr)
  Tier 2: AVX-512 VNNI (64 MACs/instr)
  Tier 3: AVX-512 baseline F32x16 (16 MACs/instr, mandatory default per ndarray's .cargo/config.toml target-cpu=x86-64-v4)
  Tier 4: AVX-2 F32x8 fallback
  Tier 5: scalar reference
- Rule A wires SoA: the &[u8] slice array_windows iterates comes from a BindSpace column (FingerprintColumns / QualiaColumn / MetaColumn / EdgeColumn) per the AGI-as-SoA identity. No new data structures: the SoA column IS the input surface.
- Dropped all "Phase 0 ndarray prerequisite" language. Everything the sweep needs exists in ndarray today; this plan wires the existing surface into cognitive-shader-driver (REST handlers + CodecKernelCache + CodecResearchBridge). Zero ndarray changes.
- Added reality-check against codec-findings-2026-04-20.md so the sweep does NOT re-derive measured winners: Had-Q5×D-R already ICC ≈ 0.99 with shared codebook; I8-Hadamard leads for per-row-only at ICC ≈ 0.9; zipper serves bundling axis, not argmax; fractal leaf descriptors are DEAD (sign-flip invariant). The sweep focuses on #220's four unmeasured candidates (wider codebook / residual PQ / Hadamard pre-rotation / OPQ) and on the missing axis: token agreement, not reconstruction ICC.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
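
The tier ordering above amounts to a first-match selection over CPU capabilities. A minimal sketch, with the capability flags passed in explicitly rather than detected: `Tier`, `CpuCaps`, `pick_tier`, and `macs_per_instr` are hypothetical names and do not correspond to ndarray's dispatch code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Amx,
    Avx512Vnni,
    Avx512,
    Avx2,
    Scalar,
}

/// Capability flags as the caller observed them (detection is out of scope).
#[derive(Debug, Clone, Copy, Default)]
struct CpuCaps {
    amx: bool,
    avx512_vnni: bool,
    avx512f: bool,
    avx2: bool,
}

/// Picks the highest tier the CPU supports, falling through to scalar.
fn pick_tier(caps: CpuCaps) -> Tier {
    if caps.amx {
        Tier::Amx
    } else if caps.avx512_vnni {
        Tier::Avx512Vnni
    } else if caps.avx512f {
        Tier::Avx512
    } else if caps.avx2 {
        Tier::Avx2
    } else {
        Tier::Scalar
    }
}

/// MACs per instruction as quoted in the commit message; the message gives
/// no figure for the AVX-2 and scalar tiers, so those return None.
fn macs_per_instr(tier: Tier) -> Option<u32> {
    match tier {
        Tier::Amx => Some(256),
        Tier::Avx512Vnni => Some(64),
        Tier::Avx512 => Some(16),
        Tier::Avx2 | Tier::Scalar => None,
    }
}
```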

Nine concrete YAML configs for configs/codec/*.yaml that Phase 0 will consume:

- 00_baseline_passthrough: regression anchor (top1=1.000 exactly)
- 01_pr220_baseline: negative control, reproduces #220 ICC 0.195
- 02_pr219_overfit_reproducer: negative control, split-test must FAIL
- 10_fix_a_wider_codebook: #220 (a) 1024 centroids
- 11_fix_b_residual_pq: #220 (b) residual depth=1
- 12_fix_c_hadamard_rotation: #220 (c) Hadamard pre-rotation
- 13_fix_d_opq_rotation: #220 (d) OPQ learned rotation
- 20_composite_a_plus_b: composition probe for combinatorial lift
- 30_cross_product_sweep: SweepGrid for D3.1 initial sweep

Each YAML:

- Names lane_width explicitly (Rule E) so the JIT compiles the right SIMD tier. BF16x32 for OPQ (AMX bf16 tile path); others default to F32x16.
- Carries a notes: block stating the expected measurement outcome, so Phase 0's regression detection has ground truth to check against (e.g., baseline reproducer must produce ICC ≈ 0.195, overfit reproducer must FAIL the split-test).
- Separates calibration_rows from measurement_rows where relevant (pr219_overfit_reproducer sets them equal so the pipeline refuses to report the ICC, demonstrating the guard that prevents PR #219's overfit-on-training artefact from recurring).

30_cross_product_sweep specifies the initial 54-candidate grid (1 subspace × 3 centroids × 3 residuals × 3 rotations × 1 distance × 2 lane widths). Expected JIT compile budget: ~800 ms one-time; everything after is cache hits per Rule A/B.

Operating principle reiterated at the end: adding a candidate is authoring a YAML; changing params is editing YAML; Rust reads YAML once at ingress (Rule F) and never re-serialises. The sweep logger appends result rows to Lance, the only egress beyond the REST response.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…dation

First Phase 0 code deliverable from the codec-sweep-via-lab-infra-v1 plan. Zero-dep contract-side types the lab API (cognitive-shader-driver) will carry into JIT compilation.

Adds to crates/lance-graph-contract/src/cam.rs (~290 LOC):

Enums (Rule E: Wire surface IS the SIMD surface, object-oriented):
- LaneWidth { F32x16, U8x64, F64x8, BF16x32 }, mirrors ndarray::simd::*
- Distance { AdcU8, AdcI8 }, CODING_PRACTICES gap 5 (sign-handling / bipolar cancellation)
- Rotation { Identity, Hadamard{dim}, Opq{blob,dim} }

Structs:
- ResidualSpec { depth, centroids }
- CodecParams { subspaces, centroids, residual, lane_width, pre_rotation, distance, calibration_rows, measurement_rows, seed }

Builder (CODING_PRACTICES gap 3: fluent API, not raw-struct):
  CodecParamsBuilder::new()
    .subspaces(u32).centroids(u32).residual(ResidualSpec)
    .lane_width(LaneWidth).rotation(Rotation).distance(Distance)
    .calibration_rows(u32).measurement_rows(u32).seed(u64)
    .build() -> Result<CodecParams, CodecParamsError>

Validation fires BEFORE any JIT compile (D0.7 precision ladder):
- ZeroDimension: subspaces == 0 or centroids == 0
- OpqRequiresBf16: OPQ routes through tile_dpbf16ps; only LaneWidth::BF16x32 is valid
- HadamardDimNotPow2: Sylvester construction needs dim = 2^k
- CalibrationEqualsMeasurement: overfit guard; refuses to emit ICC when calibration_rows == measurement_rows (reproduces PR #219's 128-row trained-and-tested artifact)

Methods on CodecParams:
- kernel_signature() -> u64: JIT cache key (Rule E); excludes seed so calibration-sample changes don't invalidate cached kernels
- is_matmul_heavy() -> bool: true for OPQ or centroids > 512; drives Tier-1 AMX dispatch decision (Rule C polyfill hierarchy)
- Rotation::is_matmul() -> bool: Identity and Hadamard are false (butterfly stays on Tier-3 F32x16); only Opq returns true

14 new tests covering:
- builder default matches PR #220 baseline shape
- each validation variant fires correctly
- OPQ + BF16x32 accepted; OPQ + F32x16 rejected with typed error
- Hadamard + non-pow2 dim rejected
- overfit guard fires on calibration == measurement
- kernel_signature stable across identical builds
- kernel_signature excludes seed (cache stays hot)
- kernel_signature changes with centroids / rotation kind
- is_matmul_heavy detects OPQ AND wide codebook (≥512 centroids)

Zero-dep preserved (stdlib only: std::collections::hash_map::DefaultHasher for kernel_signature, core::fmt + core::error for error types). No serde in the contract: YAML/JSON deserialisation belongs to the consumer crate, which will produce CodecParams via serde at the REST handler (Rule F: serialisation at edge only).

Tests: 147/147 contract suite passing (133 prior + 14 new).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
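
A trimmed-down sketch of the validate-then-sign pattern described above. It is an illustration under assumptions, not the contents of cam.rs: the residual and distance fields are omitted, the Opq variant drops its blob, the default row counts and seed are placeholders, and a single `rows` setter stands in for the two row-count setters.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum LaneWidth { F32x16, BF16x32 }

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Rotation { Identity, Hadamard { dim: u32 }, Opq { dim: u32 } }

#[derive(Debug, PartialEq, Eq)]
enum CodecParamsError {
    ZeroDimension,
    OpqRequiresBf16,
    HadamardDimNotPow2,
    CalibrationEqualsMeasurement,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CodecParams {
    subspaces: u32,
    centroids: u32,
    lane_width: LaneWidth,
    rotation: Rotation,
    calibration_rows: u32,
    measurement_rows: u32,
    seed: u64,
}

impl CodecParams {
    /// Cache key: hashes everything that shapes the kernel, but not `seed`.
    fn kernel_signature(&self) -> u64 {
        let mut h = DefaultHasher::new();
        (self.subspaces, self.centroids, self.lane_width, self.rotation).hash(&mut h);
        h.finish()
    }
}

struct CodecParamsBuilder(CodecParams);

impl CodecParamsBuilder {
    /// 6 x 256 identity baseline; row counts and seed are placeholder values.
    fn new() -> Self {
        Self(CodecParams {
            subspaces: 6,
            centroids: 256,
            lane_width: LaneWidth::F32x16,
            rotation: Rotation::Identity,
            calibration_rows: 1024,
            measurement_rows: 3072,
            seed: 0,
        })
    }
    fn centroids(mut self, n: u32) -> Self { self.0.centroids = n; self }
    fn lane_width(mut self, l: LaneWidth) -> Self { self.0.lane_width = l; self }
    fn rotation(mut self, r: Rotation) -> Self { self.0.rotation = r; self }
    fn rows(mut self, cal: u32, meas: u32) -> Self {
        self.0.calibration_rows = cal;
        self.0.measurement_rows = meas;
        self
    }
    fn seed(mut self, s: u64) -> Self { self.0.seed = s; self }

    /// All checks run here, before any kernel would be compiled.
    fn build(self) -> Result<CodecParams, CodecParamsError> {
        let p = self.0;
        if p.subspaces == 0 || p.centroids == 0 {
            return Err(CodecParamsError::ZeroDimension);
        }
        match p.rotation {
            Rotation::Opq { .. } if p.lane_width != LaneWidth::BF16x32 => {
                return Err(CodecParamsError::OpqRequiresBf16)
            }
            Rotation::Hadamard { dim } if !dim.is_power_of_two() => {
                return Err(CodecParamsError::HadamardDimNotPow2)
            }
            _ => {}
        }
        if p.calibration_rows == p.measurement_rows {
            return Err(CodecParamsError::CalibrationEqualsMeasurement);
        }
        Ok(p)
    }
}
```

Leaving `seed` out of the hashed tuple is what keeps two runs that differ only in calibration sampling on the same cached kernel.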

Retroactive hygiene for the recent PR arc + prospective enforcement so the gap never recurs. User directive: "should have happened to begin with."

LATEST_STATE.md:
- Header: "Last updated 2026-04-20 post PR #224 (PR #225 open)"
- Recently Shipped table: prepended rows for #225 (open), #224, and #223 with full shipped-content summaries
- Contract Inventory: expanded cam:: entry with all new codec-sweep types (LaneWidth / Distance / Rotation / ResidualSpec / CodecParams / CodecParamsBuilder / CodecParamsError) including the precision-ladder-fires-before-JIT invariant
- Active Branches: recorded claude/teleport-session-setup-wMZfb and its three merged PRs
- Active Integration Plans: added codec-sweep-via-lab-infra-v1 alongside elegant-herding-rocket-v1
- Immediate Next Work: codec-sweep Phase 0 remainder (D0.1/0.2/0.3/0.5) + the elegant-herding Phase 2 block

PR_ARC_INVENTORY.md (APPEND-ONLY, PREPEND only):
- #225 entry: plan + CodecParams/Builder/precision validation + rules A-F locked + decisions for future PRs
- #224 entry: three-part lab stack + thinking harvest + I11 measurability locked
- #223 entry: LAB-ONLY firewall + AGI-as-SoA + I1-I10 invariants locked (the cross-cutting architectural ruleset this workspace now enforces)

STATUS_BOARD.md:
- New section: codec-sweep-via-lab-infra-v1 with 18 D-ids across 5 phases (D0.6/D0.7 marked Shipped-in-#225; remainder Queued)

EPIPHANIES.md (APPEND-ONLY, PREPEND only; 6 new dated entries):
- Board hygiene is the driving seat, not cleanup (this session's self-reflection turned into a rule)
- Codec cert is token agreement, not synthetic ICC (#219 → #220 arc; #225 CalibrationEqualsMeasurement typed rejection)
- Lab REST surface is three-part (API + Planner + JIT), not just scaffolding
- Thinking harvest via REST/Cypher = the AGI magic bullet
- SoA never scalarises without ndarray (iron rule Rule C)
- AGI is the glove, not the oracle: four-axis SoA is what you wear

CLAUDE.md, new top-level § "The Stance — Driving Seat + AGI-as-Glove (P0, read first)":
- Explicit driving-seat posture: the session STEERS the stack, doesn't observe it
- AGI-as-glove doctrine made concrete: topic → FingerprintColumns, angle → QualiaColumn, thinking → MetaColumn, planner → EdgeColumn. New capability lands as a new column, not a layer.
- MANDATORY Board-Hygiene Rule as a table: every PR that adds a type / plan / D-id / epiphany / tech-debt / issue MUST update the corresponding board file IN THE SAME COMMIT. Retroactive hygiene (merge PR → later cleanup) is now an anti-pattern the rule forbids.
- "Consult, don't guess" (agent/knowledge-first discipline): specialist-agent card → knowledge doc → board inventory → only then grep. Subagent spawn with curated docs beats main-thread grep.

147/147 contract suite still passing. Doc-only PR otherwise (Cargo.toml / src/* unchanged; the orphan serde_yaml/base64 deps from the timed-out bus-compiler subagent were reverted and will land with D0.1/D0.3 when the Wire code lands).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Two more Phase 0 deliverables from codec-sweep-via-lab-infra-v1. 66/66 cognitive-shader-driver tests pass under --features serve (+11 new).

D0.5: auto_detect.rs (~300 LOC, CODING_PRACTICES gap 1)

Reads <model_path>/config.json (HuggingFace layout) and returns ModelFingerprint { architecture, hidden_size, n_layers, tokenizer_class, vocab_size, default_lane_width, default_distance }.

Architecture routing:
- llama / qwen / qwen2 / qwen3 / mistral / mixtral → BF16x32 (AMX)
- bert / modernbert / xlm-roberta / generic → F32x16 (AVX-512)
- torch_dtype override wins over the architecture heuristic.

Typed errors: ConfigMissing / Io / Parse / MissingField {path, field}. Best-effort tokenizer_class from tokenizer_config.json.

8 tests: llama / qwen3-with-tokenizer / bert / modernbert / xlm-roberta (d_model alias) / generic fallback / missing-config / missing-field.

D0.2: WireTokenAgreement stub (~100 LOC, the I11 cert gate)

DTOs:
- WireBaseline { Passthrough } (default, extensible)
- WireTokenAgreement { model_path, reference, candidate (WireCodecParams), prompt_set_blob_id, n_tokens }
- WireTokenAgreementResult { top1_rate, top5_rate, divergence_positions, per_layer_mse, candidate_latency_us, reference_latency_us, stub, backend }

Phase 0 handler stub (not shipped yet): returns a deterministic stub:true / backend:"stub" result. Phase 2 D2.1-D2.3 land the real decode-and-compare loop (reference model load + top-k comparison + per-layer MSE).

Pass gates (for when the harness lands): top1_rate ≥ 0.99 + top5_rate ≥ 0.999 vs Passthrough baseline. This is the ACTUAL codec cert gate; reconstruction ICC is necessary-but-not-sufficient (per the #219/#220 lesson).

3 round-trip serde tests: full payload + stub-backend default + baseline default.

Board hygiene (CLAUDE.md Mandatory rule), STATUS_BOARD.md updated:
- D0.1 Queued → Shipped (PR #227; was stale)
- D0.2 Queued → In PR (this branch)
- D0.5 Queued → In PR (this branch)

Phase 0 state after this commit:
✅ D0.1 WireCalibrate + WireTensorView (PR #227)
✅ D0.6 CodecParamsBuilder (PR #225)
✅ D0.7 precision-ladder validation (PR #225)
✅ D0.5 auto_detect (this PR)
✅ D0.2 WireTokenAgreement stub (this PR)
⏳ D0.3 WireSweep streaming endpoint (next PR)
⏳ D0.4 surface freeze (gates after D0.3)

Rules honored:
- Rule D: JSON/YAML/REST only, CodecParams carried through via WireCodecParams
- Rule E: Wire surface IS the SIMD surface (lane_width on candidate)
- Rule F: serde mirrors at ingress only; TryFrom → CodecParams at handler

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
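
The architecture routing in D0.5 reduces to a small lookup with an override. The sketch below assumes the `model_type` string from config.json is passed in directly and that the override maps the HuggingFace dtype strings "bfloat16" and "float32" onto the two lane widths; the commit message does not spell out that mapping, and `default_lane_width` is a hypothetical function name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LaneWidth {
    F32x16,
    BF16x32,
}

/// Picks a default lane width. An explicit torch_dtype wins; otherwise the
/// architecture family decides, with F32x16 as the generic fallback.
fn default_lane_width(architecture: &str, torch_dtype: Option<&str>) -> LaneWidth {
    // Assumed override mapping; unrecognised dtypes fall through to the heuristic.
    match torch_dtype {
        Some("bfloat16") => return LaneWidth::BF16x32,
        Some("float32") => return LaneWidth::F32x16,
        _ => {}
    }
    match architecture.to_ascii_lowercase().as_str() {
        "llama" | "qwen" | "qwen2" | "qwen3" | "mistral" | "mixtral" => LaneWidth::BF16x32,
        _ => LaneWidth::F32x16,
    }
}
```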

First Phase 2 deliverable: scaffold of the I11 cert gate harness. The PR #219 → #220 lesson landed as a typed-rejection wall: the stub result carries stub:true + backend:"stub" so no client can confuse Phase 0 stub output for a real measurement.

crates/cognitive-shader-driver/src/token_agreement.rs (~320 LOC):

  ReferenceModel { path, path_hash, stub_token_count }
    ::load(&Path) -> Result<Self, TokenAgreementError>
      D2.1 stub: validates path exists, hashes display; does NOT parse safetensors yet.
      D2.2 replaces with real loader driven by auto_detect::detect() → ModelFingerprint.
    ::stub(tag, n_tokens): builds stub model without touching fs

  TokenAgreementError:
    ModelPathMissing { path }
    EmptyPromptSet
    TokenCountMismatch { reference, candidate }
    NotImplementedYet { what } ← measure_full() until D2.2

  TopKAgreement { top1_matches, top5_matches, total_positions, divergence_positions: Vec<u32> }
    ::compare(ref: &[Vec<u32>], cand: &[Vec<u32>]) -> Result<Self>
      Position-by-position: top1 = r[0] == c[0]; top5 = r[0] in c[..5].
      Records divergence positions for failure-mode analysis (late-sequence drift vs random errors).
    ::top1_rate() / top5_rate() -> f32
    ::meets_cert_gate() -> bool (top1 ≥ 0.99 AND top5 ≥ 0.999)
    ::aggregate(per_prompt): sums counters; concatenates divergence with per-prompt offset so failures stay localised

  TokenAgreementHarness:
    ::new(reference, baseline, candidate, n_tokens)
    ::measure_stub() -> WireTokenAgreementResult { stub:true, .. }
    ::measure_full() -> NotImplementedYet (D2.2 scope)

Tests (13 new):
  - reference_model_stub_builds_without_filesystem
  - reference_model_load_missing_path_yields_typed_error
  - topk_compare_identical_streams_is_perfect (full cert gate pass)
  - topk_compare_all_different_fails_cert_gate
  - topk_top5_matches_when_top1_misses_but_in_top5 (ref top-1 = 7; cand has 7 at position 3 in top-5 → top5 counts)
  - topk_mismatched_stream_lengths_yield_typed_error
  - topk_aggregate_sums_counters_and_offsets_divergence (prompt 2's divergence at pos 4 → aggregate pos 14 after prompt 1's 10)
  - cert_gate_passes_at_exact_thresholds (990/1000 = 0.99, 999/1000 = 0.999; both boundaries hit)
  - cert_gate_fails_when_top1_below_threshold_even_if_top5_passes
  - cert_gate_fails_when_top5_below_threshold_even_if_top1_passes
  - harness_measure_stub_returns_machine_checkable_stub_flag (stub:true enforced; backend="stub"; all rates 0.0; zero latencies)
  - harness_measure_full_returns_not_implemented_pointing_at_d22
  - harness_measure_stub_rejects_zero_n_tokens

Board hygiene (CLAUDE.md Mandatory rule):
  STATUS_BOARD.md D2.1 Queued → In PR

Phase state:
  Phase 0 ✅ complete (D0.1-D0.7 all shipped)
  Phase 1 scaffold ✅ (D1.1, D1.2, D1.3 shipped; D1.1b queued)
  Phase 2 ⏳ D2.1 (this PR), D2.2 + D2.3 queued

Rules honored:
  Rule D: Measurement set comes from Wire DTOs (D0.2 WireTokenAgreement)
  Rule E: TopKAgreement exposes object-methods (top1_rate, meets_cert_gate)
  Rule F: No serialization between stages; per-prompt Vec<Vec<u32>> token streams are plain Rust owned; the serde happens at D2.3 handler entry / exit only

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
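The comparison, cert gate, and aggregation rules described above can be sketched as follows. This is a minimal illustration reconstructed from the commit message, not the code in token_agreement.rs: the struct layout follows the listed fields, but the choice to record a divergence whenever top-1 misses and the treatment of empty candidate lists are assumptions.

```rust
/// Sketch of the counters listed in the commit message.
#[derive(Debug, Default, PartialEq)]
struct TopKAgreement {
    top1_matches: u32,
    top5_matches: u32,
    total_positions: u32,
    divergence_positions: Vec<u32>,
}

#[derive(Debug, PartialEq)]
enum TokenAgreementError {
    TokenCountMismatch { reference: usize, candidate: usize },
}

impl TopKAgreement {
    /// Each inner Vec holds one position's ranked token ids, best first.
    fn compare(
        reference: &[Vec<u32>],
        candidate: &[Vec<u32>],
    ) -> Result<Self, TokenAgreementError> {
        if reference.len() != candidate.len() {
            return Err(TokenAgreementError::TokenCountMismatch {
                reference: reference.len(),
                candidate: candidate.len(),
            });
        }
        let mut out = TopKAgreement::default();
        for (pos, (r, c)) in reference.iter().zip(candidate).enumerate() {
            out.total_positions += 1;
            // top1: the best tokens agree; top5: reference best is in candidate's first five.
            let top1 = matches!((r.first(), c.first()), (Some(a), Some(b)) if a == b);
            let top5 = r.first().map_or(false, |r0| c.iter().take(5).any(|t| t == r0));
            if top1 {
                out.top1_matches += 1;
            } else {
                // Assumption: a divergence is any position where top-1 misses.
                out.divergence_positions.push(pos as u32);
            }
            if top5 {
                out.top5_matches += 1;
            }
        }
        Ok(out)
    }

    fn top1_rate(&self) -> f32 {
        if self.total_positions == 0 { 0.0 } else { self.top1_matches as f32 / self.total_positions as f32 }
    }

    fn top5_rate(&self) -> f32 {
        if self.total_positions == 0 { 0.0 } else { self.top5_matches as f32 / self.total_positions as f32 }
    }

    /// Gate from the commit message: top1 >= 0.99 AND top5 >= 0.999.
    fn meets_cert_gate(&self) -> bool {
        self.top1_rate() >= 0.99 && self.top5_rate() >= 0.999
    }

    /// Sums counters; shifts each prompt's divergences by the positions seen so far.
    fn aggregate(per_prompt: &[TopKAgreement]) -> TopKAgreement {
        let mut out = TopKAgreement::default();
        for p in per_prompt {
            let offset = out.total_positions;
            out.divergence_positions
                .extend(p.divergence_positions.iter().map(|d| d + offset));
            out.top1_matches += p.top1_matches;
            out.top5_matches += p.top5_matches;
            out.total_positions += p.total_positions;
        }
        out
    }
}
```

Offsetting divergences during aggregation keeps each failure attributable to a prompt and an in-prompt position, which is what lets late-sequence drift be told apart from scattered errors.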

…ft guard)

Final Phase 3 scaffold deliverable: curl-driven lab iteration against the shipped /v1/shader/sweep endpoint.

Files:

  configs/codec/README.md
    - inventory + DoS-ceiling note + anti-#219 stub:true flag explanation

  configs/codec/00_pr220_baseline.yaml
    - PR #220 baseline regression: 6 subspaces × 256 centroids × identity rotation.
      Expected ICC ≈ 0.195 mean when D2.2 lands real decode-and-compare.

  configs/codec/10_wider_codebook.yaml
    - PR #220 fix (a): centroids ∈ {256, 512, 1024}. Cardinality 3, three distinct kernel signatures → warm cache after one pass.

  configs/codec/12_hadamard_pre_rotation.yaml
    - PR #220 fix (c): Hadamard × centroids cross-product (2×2 = 4). Hadamard stays Tier-3 F32x16 per Rule C.

  scripts/codec_sweep.sh
    - yq YAML → JSON conversion
    - POST to ${SHADER_LAB_URL}/v1/shader/sweep (default localhost:3001)
    - jq-pretty request + response
    - Stub honesty check: prints results[0].stub flag → verifies Phase 0 returns true (machine-checkable anti-#219)
    - Requires: yq (mikefarah/yq ≥ v4), curl, jq

  wire.rs +1 test: sweep_request_yaml_shape_deserializes_via_serde_json
    - Inline JSON fixture mirroring the canonical YAML → JSON shape
    - If this test breaks, the YAML configs are stale relative to the Rust DTOs → scripts/codec_sweep.sh would fail at runtime
    - Caught a real drift during development: PascalCase "Identity" vs the DTO's rename_all="lowercase" (YAMLs correctly use lowercase; test fixture had the typo)

Phase state:
  Phase 0 ✅ complete
  Phase 1 scaffold ✅ (D1.1 / D1.2 / D1.3 shipped; D1.1b queued)
  Phase 2 scaffold ✅ (D2.1 harness + D2.3 handler; D2.2 queued)
  Phase 3 scaffold ✅ (D3.1 batch handler + D3.2 client driver shipped)
  ⏳ D3.1b real Lance append writer queued

DoS-ceiling note: the sweep handler rejects grids with cardinality > 10_000 before enumeration (PR #238 P1 fix). The README documents the ceiling so config authors can budget axis lengths.

Rule D honored: adding a new codec candidate = authoring a new YAML file in configs/codec/. Zero Rust changes. Zero rebuilds.

Rule F honored at the client boundary: YAML → JSON → HTTP ingress. Single deserialisation at the shader-lab's handler; everything after is in-memory Rust (WireSweepRequest → CodecParams → grid enumerate() → per-candidate WireSweepResult).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
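The DoS ceiling described above amounts to computing the grid's cardinality from its axis lengths and rejecting it before any candidate is enumerated. The sketch below is an assumption about how such a check could look; `MAX_GRID_CARDINALITY`, `SweepError`, and `check_grid` are hypothetical names, and only the 10_000 limit comes from the commit message.

```rust
/// Ceiling quoted in the commit message; the constant name is hypothetical.
const MAX_GRID_CARDINALITY: u64 = 10_000;

#[derive(Debug, PartialEq)]
enum SweepError {
    /// `cardinality` is None when the product overflowed u64.
    GridTooLarge { cardinality: Option<u64>, limit: u64 },
}

/// Product of all axis lengths, or None on overflow. An empty grid has one point.
fn grid_cardinality(axis_lens: &[usize]) -> Option<u64> {
    axis_lens
        .iter()
        .try_fold(1u64, |acc, &n| acc.checked_mul(n as u64))
}

/// Validates the grid size up front so an oversized sweep is never enumerated.
fn check_grid(axis_lens: &[usize]) -> Result<u64, SweepError> {
    match grid_cardinality(axis_lens) {
        Some(n) if n <= MAX_GRID_CARDINALITY => Ok(n),
        other => Err(SweepError::GridTooLarge {
            cardinality: other,
            limit: MAX_GRID_CARDINALITY,
        }),
    }
}
```

With this shape, the wider-codebook config (one axis of three centroid counts) has cardinality 3 and the Hadamard cross-product has 2 × 2 = 4, both far below the ceiling. Using `checked_mul` means a pathological request cannot wrap around to a small product and slip through.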

claude added 2 commits April 20, 2026 07:29

AdaWorldAPI merged commit 3d9239d into main Apr 20, 2026

chatgpt-codex-connector Bot reviewed Apr 20, 2026


AdaWorldAPI mentioned this pull request Apr 20, 2026
  Unified API: codec research + runbook + planner-via-DTO + OrchestrationBridge #221 (Merged, 3 tasks)

AdaWorldAPI mentioned this pull request Apr 20, 2026
  docs(knowledge): lab = API+Planner+JIT, thinking harvest, I11 measurability #224 (Merged, 4 tasks)

AdaWorldAPI mentioned this pull request Apr 20, 2026
  plan(codec-sweep) + D0.6/D0.7: lab-infra sweep plan + CodecParams types #225 (Merged, 5 tasks)

AdaWorldAPI mentioned this pull request Apr 20, 2026
  D0.5 auto_detect + D0.2 WireTokenAgreement stub (Phase 0, 66/66 tests) #231 (Merged, 5 tasks)

AdaWorldAPI mentioned this pull request Apr 21, 2026
  D2.1 token-agreement harness scaffold (I11 cert gate infra, 117/117 tests) #236 (Merged)

AdaWorldAPI mentioned this pull request Apr 21, 2026
  D3.2 client driver + starter YAML configs (Phase 3 scaffold complete, 118/118 tests) #239 (Merged)

AdaWorldAPI merged 2 commits into main from claude/cam-pq-production-d1-d2


2 participants

		let bytes: Vec<u8> = data.iter().flat_map(|f| f.to_le_bytes()).collect();
		w.write_all(&bytes)?;
