plan(codec-sweep) + D0.6/D0.7: lab-infra sweep plan + CodecParams types by AdaWorldAPI · Pull Request #225 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-20T17:42:42Z

Summary

Codec-sweep-via-lab-infra plan (9 docs commits shaping the design,
starting from the PR #220 honest-negative arc) + first Phase 0 code
deliverable on the contract side (CodecParams types + builder +
precision-ladder validation).

The plan — six rules binding every JIT-emitted kernel

Authored through successive corrections in one session:

Rule A — Tensor access via stdlib slice::array_windows::<N>()
(stable since Rust 1.77) + ndarray::simd::* lane loaders. Zero
hand-rolled slicing.
Rule B — SIMD exclusively via ndarray::simd::* and its AMX
sibling modules (ndarray::simd_amx::*,
ndarray::hpc::amx_matmul::*, ndarray::hpc::simd_caps::*).
Everything already exists in ndarray; zero ndarray changes.
Rule C — Polyfill hierarchy (Intel AMX → AVX-512 VNNI →
AVX-512 baseline → AVX-2). No consumer-visible scalar tier; SoA
never scalarises without ndarray.
Rule D — Configuration JSON / YAML / REST only. New candidate
= new YAML; zero Rust changes, zero rebuilds.
Rule E — Wire surface IS the SIMD surface (object-oriented).
LaneWidth enum mirrors lane types; DTOs expose methods
(row(), lanes_f32x16(), kernel_signature()) not scalar bags.
Rule F — Serialisation at the edge only; never inside. One
decode at REST ingress; one encode at response / Lance egress.
No internal serde between layers.
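
Rule C's polyfill order can be pictured as a pure selection function. The sketch below is illustrative only: Caps, Tier and select_tier are hypothetical names, the capability flags are passed in by hand instead of being read from ndarray, and the choice to reserve the AMX tier for matmul-heavy candidates is an assumption drawn from the plan's description of is_matmul_heavy().

```rust
/// Hypothetical capability snapshot. The real plan reads capabilities from
/// ndarray; this sketch takes them as plain booleans.
#[derive(Clone, Copy, Debug, Default)]
pub struct Caps {
    pub amx: bool,
    pub avx512_vnni: bool,
    pub avx512f: bool,
}

/// The four consumer-visible tiers of Rule C, highest first.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tier {
    Amx,
    Avx512Vnni,
    Avx512Baseline,
    Avx2,
}

impl Tier {
    /// Short backend label for traces (label strings are illustrative).
    pub fn backend(self) -> &'static str {
        match self {
            Tier::Amx => "amx",
            Tier::Avx512Vnni => "vnni",
            Tier::Avx512Baseline => "avx512",
            Tier::Avx2 => "avx2",
        }
    }
}

/// Walk the hierarchy top-down. There is deliberately no scalar arm:
/// the lowest tier this function can return is AVX-2.
pub fn select_tier(caps: Caps, matmul_heavy: bool) -> Tier {
    if matmul_heavy && caps.amx {
        Tier::Amx // assumption: AMX tiles only pay off for matmul-heavy kernels
    } else if caps.avx512_vnni {
        Tier::Avx512Vnni
    } else if caps.avx512f {
        Tier::Avx512Baseline
    } else {
        Tier::Avx2
    }
}
```

Keeping the selection free of side effects makes the "never scalar" property checkable by enumerating the inputs.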

The four-PR staircase it unlocks

Laid out in the plan:

PR A (Phase 0): hardens Wire surface + builder + auto_detect
+ precision validation. One upfront rebuild.
PR B (Phases 1+2): JIT kernels + token-agreement cert gate.
PR C (Phases 3+4): sweep driver + Lance logger + frontier
analysis.
PR D (Phase 5): per-winner graduation to OrchestrationBridge.

The sweep runs unlimited candidates after the one upfront rebuild
because every candidate is a JIT kernel keyed on
CodecParams::kernel_signature, not a new binary.
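
A minimal sketch of why repeated candidates cost nothing after the first compile, assuming a cache keyed on the u64 signature. KernelCache, KernelHandle and get_or_compile are hypothetical names; the "compile" step here only allocates a placeholder handle and stands in for real JIT work.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Placeholder for a compiled kernel; a real handle would own emitted code.
#[derive(Debug)]
pub struct KernelHandle {
    pub signature: u64,
}

/// Signature-keyed cache with interior mutability, so lookups take `&self`.
#[derive(Default)]
pub struct KernelCache {
    kernels: RwLock<HashMap<u64, Arc<KernelHandle>>>,
    compiles: AtomicUsize,
}

impl KernelCache {
    /// Return the cached handle for `signature`, compiling at most once.
    pub fn get_or_compile(&self, signature: u64) -> Arc<KernelHandle> {
        // Fast path: shared read lock, released at the end of this statement.
        let cached = self.kernels.read().unwrap().get(&signature).cloned();
        if let Some(hit) = cached {
            return hit;
        }
        // Slow path: exclusive lock; `entry` re-checks in case another
        // thread compiled the same signature in the meantime.
        let mut map = self.kernels.write().unwrap();
        let handle = map.entry(signature).or_insert_with(|| {
            self.compiles.fetch_add(1, Ordering::Relaxed);
            Arc::new(KernelHandle { signature }) // stand-in for JIT compilation
        });
        Arc::clone(handle)
    }

    /// Number of compilations performed so far.
    pub fn compile_count(&self) -> usize {
        self.compiles.load(Ordering::Relaxed)
    }
}
```

Two candidates that share a signature receive the same Arc, so the hot path clones a pointer and never recompiles.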

Audit vs `.claude/CODING_PRACTICES.md` (EmbedAnything patterns)

Three gaps found, remediated as Phase 0 deliverables:

Gap 1 — auto-detect, not hardcode → D0.5 auto_detect.rs
reads config.json next to safetensors (mirrors EmbedAnything's
pattern).
Gap 3 — builder, not raw struct → D0.6 CodecParamsBuilder
landed in this PR (fluent API + 14 tests).
Gap 5 — u8 vs i8 tables → Distance::{AdcU8, AdcI8} split
for sign-handling / bipolar cancellation.

All five anti-patterns dodged (lib.rs stays declarations-only; hot
path is zero-copy + Arc'd KernelHandle; Rust-first; codebook
lookup only; precision ladder BF16 calibration → u8/i8 runtime →
f32 accumulator).

Starter YAML configs (Appendix A — 9 configs)

Concrete Phase 0 inputs live in configs/codec/*.yaml:

00_baseline_passthrough — regression anchor (top1 = 1.000 exactly)
01_pr220_baseline — reproduces #220 ICC ≈ 0.195 (pipeline
sanity check)
02_pr219_overfit_reproducer — calibration_rows = measurement_rows
→ pipeline's overfit guard must FAIL it
10_fix_a_wider_codebook — 1024 centroids
11_fix_b_residual_pq — residual depth 1
12_fix_c_hadamard_rotation — Sylvester butterfly, stays on
Tier-3 F32x16
13_fix_d_opq_rotation — learned rotation + BF16x32 lane
(matches tile_dpbf16ps)
20_composite_a_plus_b — combinatorial-lift probe
30_cross_product_sweep — 54-candidate initial grid

Code delivered in this PR (D0.6 + D0.7)

crates/lance-graph-contract/src/cam.rs, ~383 LOC, zero-dep:

LaneWidth, Distance, Rotation, ResidualSpec, CodecParams
CodecParamsBuilder with fluent API
CodecParamsError typed errors
CodecParams::kernel_signature() (JIT cache key; excludes seed)
CodecParams::is_matmul_heavy() (drives Tier-1 AMX dispatch)
14 tests — all passing. Full suite: 147/147.
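
To illustrate what "excludes seed" means for the cache key, here is a reduced sketch using only the standard library's DefaultHasher, which the PR names as the hashing primitive. The Params struct is a hypothetical cut-down stand-in for CodecParams, and the set of hashed fields is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, Hash, PartialEq, Eq)]
pub enum LaneWidth {
    F32x16,
    U8x64,
    F64x8,
    BF16x32,
}

/// Hypothetical reduced parameter set (the real CodecParams has more fields).
#[derive(Clone, Debug)]
pub struct Params {
    pub subspaces: u32,
    pub centroids: u32,
    pub lane_width: LaneWidth,
    pub seed: u64,
}

impl Params {
    /// Hash only the fields assumed to shape the emitted kernel.
    pub fn kernel_signature(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.subspaces.hash(&mut hasher);
        self.centroids.hash(&mut hasher);
        self.lane_width.hash(&mut hasher);
        // `seed` is intentionally not hashed: changing the calibration
        // sample must not invalidate an already compiled kernel.
        hasher.finish()
    }
}
```

DefaultHasher::new() starts from fixed keys, so the value is repeatable within one build; the standard library does not promise the algorithm stays the same across Rust releases, which matters if signatures are ever persisted.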

Precision-ladder validation fires before JIT compile:

OpqRequiresBf16 — OPQ routes through tile_dpbf16ps; only
BF16x32 lane accepted
HadamardDimNotPow2 — Sylvester construction needs dim = 2^k
CalibrationEqualsMeasurement — typed rejection of PR #219's
trained-and-tested-on-same-rows pattern
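
The three checks above can be sketched as one function returning a typed error. This is not the crate's code: validate and ParamsError are hypothetical names, Opq drops the blob field for brevity, and the order in which the checks run is an assumption.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LaneWidth {
    F32x16,
    U8x64,
    F64x8,
    BF16x32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Rotation {
    Identity,
    Hadamard { dim: u32 },
    Opq { dim: u32 }, // the blob reference is omitted in this sketch
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParamsError {
    OpqRequiresBf16,
    HadamardDimNotPow2,
    CalibrationEqualsMeasurement,
}

/// Reject impossible shapes before any kernel would be compiled.
pub fn validate(
    lane_width: LaneWidth,
    rotation: Rotation,
    calibration_rows: u32,
    measurement_rows: u32,
) -> Result<(), ParamsError> {
    match rotation {
        // OPQ is only accepted on the BF16x32 lane.
        Rotation::Opq { .. } if lane_width != LaneWidth::BF16x32 => {
            return Err(ParamsError::OpqRequiresBf16);
        }
        // Sylvester construction needs dim = 2^k (is_power_of_two rejects 0).
        Rotation::Hadamard { dim } if !dim.is_power_of_two() => {
            return Err(ParamsError::HadamardDimNotPow2);
        }
        _ => {}
    }
    // Overfit guard: calibrating and measuring on equal row counts is refused.
    if calibration_rows == measurement_rows {
        return Err(ParamsError::CalibrationEqualsMeasurement);
    }
    Ok(())
}
```

Because each failure is its own enum variant, a caller can match on the exact reason without parsing strings.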

Test Plan

cargo test -p lance-graph-contract --lib — 147/147 pass
cargo test -p lance-graph-contract --lib codec_params_tests —
14/14 new tests pass
Zero-dep preserved: stdlib only (DefaultHasher, core::fmt,
core::error)
No serde in the contract — YAML/JSON deserialisation belongs
to the consumer crate at the REST handler (Rule F)
Plan + INTEGRATION_PLANS.md append-only entry committed

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Operationalises PR #220's "What's Needed to Fix" list (wider codebook, residual PQ, Hadamard pre-rotation, OPQ) as a parameter sweep through the lab endpoint — every codec difference is a JIT kernel, not a cargo rebuild. Phase 0 hardens the Wire surface once; Phases 1-4 run unlimited candidates without further rebuilds; Phase 5 graduates winners to the canonical OrchestrationBridge surface.

Structure:

Phase 0 — API hardening (one rebuild, then frozen):
  D0.1 CodecParams in WireCalibrate
  D0.2 WireTokenAgreement endpoint (I11 cert gate)
  D0.3 WireSweep streaming + Lance append
  D0.4 surface freeze
Phase 1 — JIT codec kernels (rebuild-free):
  D1.1 CodecKernelCache via JitCompiler (Cranelift)
  D1.2 Rotation primitives (Identity / Hadamard / OPQ)
  D1.3 Residual PQ via JIT composition
Phase 2 — Token-agreement harness (the I11 cert gate):
  D2.1 Reference-model loader (ndarray safetensors)
  D2.2 Decode-and-compare loop (top-k, per-layer MSE)
  D2.3 Handler wiring
Phase 3 — Sweep driver + Lance logger
Phase 4 — DataFusion frontier analysis
Phase 5 — Graduation to OrchestrationBridge (per winner only)

~1,920 LOC total; 1 upfront rebuild; unlimited candidates afterwards. Compare to naive path (4 fixes × 8-17 min × N tweaks = hundreds of hours). All work behind --features lab until graduation.

INTEGRATION_PLANS.md prepended per APPEND-ONLY rule, citing PR #224 dependency for the architectural framing.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…::* + AMX + YAML/JSON Binds four non-negotiable rules on every JIT-emitted kernel in Phases 1-3: Rule A: Tensor access via array_window only. No manual index math, no raw pointer reach, no custom slice offset recompute. ndarray::simd::array_window handles stride / alignment / bounds / lane padding. Rule B: SIMD exclusively via ndarray::simd::*. No std::arch::*, no ndarray::hpc::*, no hand-rolled intrinsics. Missing primitive → add to ndarray first, never bypass the canonical surface from the JIT. Rule C: Backend dispatch via simd_caps() (AMX-ready). JIT emits generic IR calling ndarray::simd primitives. Those resolve to AMX tiles on aarch64-apple-darwin with AMX capability, AVX-512 on x86_64, NEON on aarch64, and scalar fallback otherwise. Rotation and distance-table kernels benefit most from AMX (matmul-heavy paths). JIT never emits AMX intrinsics directly — it calls matmul_tiled / hadamard_butterfly / etc., which dispatch internally. Rule D: Configuration is JSON / YAML / REST only. No codec candidate defined in Rust. One schema (CodecParams) serialised three ways: - YAML under configs/codec/*.yaml (human-authored) - JSON payload (curl / REST) - REST endpoint body at /v1/shader/calibrate New candidate = new YAML/JSON file. Zero Rust changes. Zero rebuilds. Enforcement: Phase 0 ships two new test gates — - kernel_contract_test scans emitted IR for banned symbols (std::arch, ndarray::hpc) and required symbols (array_window). - amx_dispatch_test (aarch64-apple-darwin-only) verifies simd_caps().has_amx() and trace records backend = "amx" for rotation kernels on M-series. D1.1-D1.3 body sketches updated to show the contract in practice: every decode / rotation / composition stage reads via array_window and calls ndarray::simd primitives (adc_distances_simd, hadamard_butterfly, matmul_tiled, sub_tiled, add_tiled), never raw intrinsics. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…hanges Corrections after hand-grep vs curated knowledge (encoding-ecosystem.md, codec-findings-2026-04-20.md, rotation_vs_error_correction.md) and user directives "its all there, dont touch, just be aware how to use crate::simd", "wire accordingly into the lab infra", "via struct of arrays": - slice::array_windows::<N>() IS real — stdlib, stable Rust 1.77, const-generic. I conflated it with a missing ndarray::array_window (singular); corrected. - AMX in ndarray is INTEL (Sapphire Rapids TDPBUSD/TDPBF16PS via stable inline asm on Rust 1.94, per src/simd_amx.rs header), NOT Apple. rust-lang #126622 keeps AMX intrinsics nightly; inline asm at src/hpc/amx_matmul.rs is the stable consumer path. Verified on kernel 6.18.5 with XCR0 bits 17+18 set. - Real primitive names (no hallucinated matmul_tiled / hadamard_butterfly): tile_dpbusd, tile_dpbf16ps, vnni_pack_bf16 for tier-1 AMX; vnni_matvec / matvec_dispatch for tier-2 VNNI; F32x16 / U8x64 / Fingerprint<N> for tier-3 AVX-512 baseline. - Polyfill hierarchy per user directive (simd_amx > simd_avx512 > simd_avx2 fallback): Tier 1: Intel AMX tile (256 MACs/instr) Tier 2: AVX-512 VNNI (64 MACs/instr) Tier 3: AVX-512 baseline F32x16 (16 MACs/instr, mandatory default per ndarray's .cargo/config.toml target-cpu=x86-64-v4) Tier 4: AVX-2 F32x8 fallback Tier 5: scalar reference - Rule A wires SoA: the &[u8] slice array_windows iterates comes from a BindSpace column (FingerprintColumns / QualiaColumn / MetaColumn / EdgeColumn) per the AGI-as-SoA identity. No new data structures — the SoA column IS the input surface. - Dropped all "Phase 0 ndarray prerequisite" language. Everything the sweep needs exists in ndarray today; this plan wires the existing surface into cognitive-shader-driver (REST handlers + CodecKernelCache + CodecResearchBridge). Zero ndarray changes. 
- Added reality-check against codec-findings-2026-04-20.md so the sweep does NOT re-derive measured winners: Had-Q5×D-R already ICC ≈ 0.99 with shared codebook; I8-Hadamard leads for per-row- only at ICC ≈ 0.9; zipper serves bundling axis, not argmax; fractal leaf descriptors are DEAD (sign-flip invariant). The sweep focuses on #220's four unmeasured candidates (wider codebook / residual PQ / Hadamard pre-rotation / OPQ) and on the missing axis — token agreement, not reconstruction ICC. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

User directive: "i should never have to remind you to use simd because the struct of arrays never ever does scalar without ndarray." Corrections: - Removed consumer-visible "Tier 5 scalar" row from the polyfill table. Scalar fallback (when it exists at all for exotic targets) lives INSIDE ndarray::simd::* — the consumer never hand-rolls a scalar loop on a SoA path. - Added iron rule before the tier table: every tier in the chain calls ndarray::simd::* / ndarray::simd_amx::* / ndarray::hpc::amx_matmul::* — if a kernel runs scalar on the SoA path, the SoA invariant is broken. - Dispatch pseudo-code cleaned: the else branch lands on ndarray::simd::F32x16 (Tier 3 mandatory floor via target-cpu= x86-64-v4). No "else scalar loop" short-circuit exists. If ndarray::simd were unavailable, SoA wouldn't be the right path. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…oriented) User directive: "the api for lab needs to be simd object oriented surface." Rule E binds the lab Wire DTOs to the SIMD shapes they feed. The Wire types are NOT convenience scalar bags that get reassembled into SIMD structures internally — they ARE the SIMD surface, serialised. Four consequences: (i) Lane-shaped aggregates. LaneWidth enum mirrors ndarray::simd::* lane types (F32x16, U8x64, F64x8, BF16x32). Every tensor-carrying DTO names its lane_width explicitly. (ii) Methods, not bags. WireTensorView exposes row() / row_count() / lanes_f32x16() / subspace(); CodecParams exposes kernel_signature() / lane_width() / is_matmul_heavy(). Consumers never reassemble a tensor from a Vec<f32>. (iii) Kernel signature keying. CodecParams::kernel_signature() returns a stable hash only over fields that shape the emitted IR. JIT cache keys on this object-computed signature; adding an unrelated config field does not invalidate entries. (iv) Serialisation preserves alignment. Decoded WireTensorView bytes land in a 64-byte-aligned buffer; consumers call slice::array_windows::<64>() + F32x16::from_slice directly, no adapter, no copy, no re-align. Plus three cleanups from prior corrections: - kernel_contract_test now scans IR for the real symbols: ndarray::simd::*, ndarray::simd_amx::*, ndarray::hpc::amx_matmul::* (allowed) and std::arch / simd_avxNNN reach (banned). - amx_dispatch_test corrected: x86_64-gated (not aarch64-apple-darwin), calls ndarray::simd_amx::amx_available(). When true on Sapphire Rapids+ runners, asserts backend = "amx" trace for matmul-heavy candidates; when false, verifies Tier-2 VNNI or Tier-3 F32x16 selection — NEVER scalar. - New wire_object_surface_test round-trips WireCalibrate + WireTensorView through JSON/gRPC and proves the decoded bytes are consumable with zero adapter code via array_windows + F32x16. 
- D1.1 body sketch cleaned: dropped fictional array_window (singular); imports simd_caps from ndarray::hpc::simd_caps (real path); cache uses RwLock for interior mutability per ndarray data-flow rule ("no &mut self during computation"); kernel_signature comes from CodecParams method (Rule E), not a free-function hash. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

User directive: "Serialisation only once when touching as rest, no Serialisation EVER inside." Rule F binds serialisation to the two legitimate edges: Ingress (once per request): REST/gRPC handler decodes JSON/protobuf → Rust objects WireTensorView.bytes_base64 → 64-byte-aligned [u8] buffer YAML config file → parsed CodecParams at load time Egress (once per response / per candidate): REST/gRPC response encodes Rust result → JSON/protobuf Lance append writes candidate row → Arrow columnar Everything between ingress and egress is in-memory Rust objects or zero-copy &[u8] SoA slices. No JSON, no YAML, no protobuf, no bincode, no re-encode for "debug dumps." Traces flow as Rust objects through ShaderSink; only the final sink at the egress boundary may serialise. Hard prohibitions inside the pipeline: - serde_json::to_string between layers - bincode::serialize for L1↔L2↔L3 handoffs - prost::Message::encode inside the JIT loop - re-parsing YAML per candidate (parse once at load, cache object) - debug-JSON dumps inside hot paths Why load-bearing: 1. Alignment survives — decoded tensor bytes land once in a 64-byte-aligned buffer; no intermediate re-pack. 2. JIT cache keys stay stable — kernel_signature hashes the Rust object directly, no "same config, different whitespace → different hash → cache miss" trap. 3. Token-agreement comparisons stay honest — both Passthrough and candidate paths consume the same decoded buffer; any internal re-encode would introduce precision drift that mimics or masks codec error. 4. Sweep throughput — decode at 2-10 GB/s is fine once; repeated re-serialisation would turn a JIT-fast sweep into serde-bound. Enforcement: new test gate no_internal_serialisation_test in Phase 0 scans codec_research.rs / codec_bridge.rs / token_agreement.rs / markov_bundle.rs for forbidden symbols (serde_json::*, bincode::*, prost encode/decode outside handlers). 
Fails the build if any such call appears outside src/serve.rs / src/grpc.rs ingress/egress handlers or the Lance append writer. https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
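
A gate like no_internal_serialisation_test could be approximated by a plain text scan. The sketch below is an assumption about how such a check might look, not the test described in the commit: forbidden_hits is a hypothetical helper, the symbol list is abbreviated, and comment handling is limited to skipping lines that begin with //.

```rust
/// Substrings treated as serialisation calls (abbreviated, illustrative list).
pub const FORBIDDEN: [&str; 3] = ["serde_json::", "bincode::", "prost::Message::encode"];

/// Files where serialisation is legitimate (the ingress/egress edges).
pub const ALLOWED_EDGES: [&str; 2] = ["src/serve.rs", "src/grpc.rs"];

/// Return (1-based line number, symbol) for every forbidden symbol found
/// in `source`, unless `path` is one of the allowed edge files.
pub fn forbidden_hits(path: &str, source: &str) -> Vec<(usize, &'static str)> {
    if ALLOWED_EDGES.iter().any(|edge| path.ends_with(edge)) {
        return Vec::new();
    }
    let mut hits = Vec::new();
    for (index, line) in source.lines().enumerate() {
        // Skip whole-line comments so prose about the rule does not trip it.
        if line.trim_start().starts_with("//") {
            continue;
        }
        for symbol in FORBIDDEN {
            if line.contains(symbol) {
                hits.push((index + 1, symbol));
            }
        }
    }
    hits
}
```

A test built on this would read each pipeline source file and assert that the returned list is empty; a substring scan can still be fooled by aliases or re-exports, so it is a tripwire and not a proof.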

Nine concrete YAML configs for configs/codec/*.yaml that Phase 0 will consume:

  00_baseline_passthrough — regression anchor (top1=1.000 exactly)
  01_pr220_baseline — negative control, reproduces #220 ICC 0.195
  02_pr219_overfit_reproducer — negative control, split-test must FAIL
  10_fix_a_wider_codebook — #220 (a) 1024 centroids
  11_fix_b_residual_pq — #220 (b) residual depth=1
  12_fix_c_hadamard_rotation — #220 (c) Hadamard pre-rotation
  13_fix_d_opq_rotation — #220 (d) OPQ learned rotation
  20_composite_a_plus_b — composition probe for combinatorial lift
  30_cross_product_sweep — SweepGrid for D3.1 initial sweep

Each YAML:
- Names lane_width explicitly (Rule E) so the JIT compiles the right SIMD tier. BF16x32 for OPQ (AMX bf16 tile path); the others default to F32x16.
- Carries a notes: block stating the expected measurement outcome, so Phase 0's regression detection has ground truth to check against (e.g., baseline reproducer must produce ICC ≈ 0.195, overfit reproducer must FAIL the split-test).
- Separates calibration_rows from measurement_rows where relevant (pr219_overfit_reproducer sets them equal so the pipeline refuses to report the ICC, demonstrating the guard that prevents PR #219's overfit-on-training artefact from recurring).

30_cross_product_sweep specifies the initial 54-candidate grid (1 subspace × 3 centroids × 3 residuals × 3 rotations × 1 distance × 2 lane widths). Expected JIT compile budget: ~800 ms one-time; everything after is cache hits per Rule A/B.

Operating principle reiterated at the end: adding a candidate is authoring a YAML; changing params is editing YAML; Rust reads YAML once at ingress (Rule F) and never re-serialises. Sweep logger appends result rows to Lance — the only egress beyond the REST response.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Three gaps found in the 8-item checklist; remediations folded into Phase 0 as new deliverables so they ship from day one, not as follow-up:

Gap 1 — auto-detect, don't hardcode. Current plan expects caller to supply lane_width + tensor shape. Patch: D0.5 new auto_detect.rs (~140 LOC) reads config.json next to the safetensors and returns ModelFingerprint { architecture, hidden_size, lane_width default, tokenizer_class, … }. Consumed by WireTokenAgreement when tensor_view.lane_width is omitted. Mirrors EmbedAnything auto_detect.rs (6 tests).

Gap 3 — builder, not raw struct assembly. Current plan shows CodecParams assembled directly. Patch: D0.6 CodecParamsBuilder fluent API in lance-graph-contract::cam. Used by sweep driver / tests / frontier analysis; YAML ingress still produces CodecParams via serde. Mirrors EmbedAnything builder.rs (7 tests).

Gap 5 — u8 vs i8 distance tables. Current plan treats "adc" as one distance variant. Patch: split distance into adc_u8 / adc_i8 at the YAML + Rust enum level. Sign-handling affects bipolar cancellation per codec-findings-2026-04-20.md §I1 sign-flip.

Five remain clean:
- Item 2 (sink pattern) — ShaderSink trait + Lance append are sinks.
- Item 4 (feature gates) — --features lab / serve / grpc declared.
- Item 6 (per-role scales) — one role per YAML preserves z-scale.
- Item 7 (calibration↔runtime boundary) — calibration_rows vs measurement_rows already split; 02_pr219_overfit_reproducer is the explicit test that enforces the boundary.
- Item 8 (no forward pass) — codebook/tile lookup only, per I6.

All 5 anti-patterns dodged: lib.rs stays declarations-only; hot path is zero-copy &[u8] into SoA + Arc'd KernelHandle (no clones); Rust-first API; codebook/tile lookup (no matmul inner loop); precision ladder BF16 calibration → u8/i8 runtime → f32 accumulator (enforced by Rule E's LaneWidth on the Wire DTO matching the JIT kernel input format).

New D0.7 — precision-ladder contract. CodecParams validation refuses impossible shapes at ingress (e.g., { lane_width: F32x16, rotation: Opq(…) } — OPQ must use BF16x32 to match tile_dpbf16ps). Validation fires before any JIT compile.

Phase 0 LOC bumps: ~480 → ~700. Still one upfront rebuild.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…dation

First Phase 0 code deliverable from codec-sweep-via-lab-infra-v1 plan. Zero-dep contract-side types the lab API (cognitive-shader-driver) will carry into JIT compilation.

Adds to crates/lance-graph-contract/src/cam.rs (~290 LOC):

Enums (Rule E — Wire surface IS the SIMD surface, object-oriented):
  LaneWidth { F32x16, U8x64, F64x8, BF16x32 } — mirrors ndarray::simd::*
  Distance { AdcU8, AdcI8 } — CODING_PRACTICES gap 5 (sign-handling / bipolar cancellation)
  Rotation { Identity, Hadamard{dim}, Opq{blob,dim} }

Structs:
  ResidualSpec { depth, centroids }
  CodecParams { subspaces, centroids, residual, lane_width, pre_rotation, distance, calibration_rows, measurement_rows, seed }

Builder (CODING_PRACTICES gap 3 — fluent API, not raw-struct):
  CodecParamsBuilder::new()
    .subspaces(u32).centroids(u32).residual(ResidualSpec)
    .lane_width(LaneWidth).rotation(Rotation).distance(Distance)
    .calibration_rows(u32).measurement_rows(u32).seed(u64)
    .build() -> Result<CodecParams, CodecParamsError>

Validation fires BEFORE any JIT compile (D0.7 precision ladder):
- ZeroDimension — subspaces == 0 or centroids == 0
- OpqRequiresBf16 — OPQ routes through tile_dpbf16ps; only LaneWidth::BF16x32 is valid
- HadamardDimNotPow2 — Sylvester construction needs dim = 2^k
- CalibrationEqualsMeasurement — overfit guard: refuses to emit ICC when calibration_rows == measurement_rows (reproduces PR #219's 128-row trained-and-tested artifact)

Methods on CodecParams:
  kernel_signature() -> u64 — JIT cache key (Rule E); excludes seed so calibration-sample changes don't invalidate cached kernels
  is_matmul_heavy() -> bool — true for OPQ or centroids > 512; drives Tier-1 AMX dispatch decision (Rule C polyfill hierarchy)
  Rotation::is_matmul() -> bool — Identity and Hadamard are false (butterfly stays on Tier-3 F32x16); only Opq returns true

14 new tests covering:
- builder default matches PR #220 baseline shape
- each validation variant fires correctly
- OPQ + BF16x32 accepted; OPQ + F32x16 rejected with typed error
- Hadamard + non-pow2 dim rejected
- overfit guard fires on calibration == measurement
- kernel_signature stable across identical builds
- kernel_signature excludes seed (cache stays hot)
- kernel_signature changes with centroids / rotation kind
- is_matmul_heavy detects OPQ AND wide codebook (≥512 centroids)

Zero-dep preserved (stdlib only: std::collections::hash_map::DefaultHasher for kernel_signature, core::fmt + core::error for error types). No serde in the contract — YAML/JSON deserialisation belongs to the consumer crate, which will produce CodecParams via serde at the REST handler (Rule F — serialisation at edge only).

Tests: 147/147 contract suite passing (133 prior + 14 new).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
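
The builder and the matmul-heavy predicate from this commit can be sketched in reduced form. Everything here is illustrative: ParamsBuilder and Params are hypothetical stand-ins, the default values are placeholders and not the crate's defaults, and the sketch follows the "centroids > 512" wording even though the test list says "≥512".

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Rotation {
    Identity,
    Hadamard { dim: u32 },
    Opq { dim: u32 },
}

impl Rotation {
    /// Only OPQ counts as a matmul; Identity and Hadamard do not.
    pub fn is_matmul(self) -> bool {
        matches!(self, Rotation::Opq { .. })
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum BuildError {
    ZeroDimension,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Params {
    pub subspaces: u32,
    pub centroids: u32,
    pub rotation: Rotation,
}

impl Params {
    /// Assumed threshold: strictly more than 512 centroids, or an OPQ rotation.
    pub fn is_matmul_heavy(&self) -> bool {
        self.rotation.is_matmul() || self.centroids > 512
    }
}

#[derive(Clone, Debug)]
pub struct ParamsBuilder {
    subspaces: u32,
    centroids: u32,
    rotation: Rotation,
}

impl ParamsBuilder {
    /// Placeholder defaults chosen for the sketch only.
    pub fn new() -> Self {
        Self { subspaces: 6, centroids: 256, rotation: Rotation::Identity }
    }
    pub fn subspaces(mut self, n: u32) -> Self { self.subspaces = n; self }
    pub fn centroids(mut self, n: u32) -> Self { self.centroids = n; self }
    pub fn rotation(mut self, r: Rotation) -> Self { self.rotation = r; self }

    /// Validation runs here, so an invalid Params value is never constructed.
    pub fn build(self) -> Result<Params, BuildError> {
        if self.subspaces == 0 || self.centroids == 0 {
            return Err(BuildError::ZeroDimension);
        }
        Ok(Params { subspaces: self.subspaces, centroids: self.centroids, rotation: self.rotation })
    }
}
```

Each setter takes and returns the builder by value, which is what allows the chained call style shown in the commit message.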

Retroactive hygiene for the recent PR arc + prospective enforcement so the gap never recurs. User directive: "should have happened to begin with." LATEST_STATE.md: - Header: "Last updated 2026-04-20 post PR #224 (PR #225 open)" - Recently Shipped table: prepended rows for #225 (open), #224, and #223 with full shipped-content summaries - Contract Inventory: expanded cam:: entry with all new codec- sweep types (LaneWidth / Distance / Rotation / ResidualSpec / CodecParams / CodecParamsBuilder / CodecParamsError) including the precision-ladder-fires-before-JIT invariant - Active Branches: recorded claude/teleport-session-setup-wMZfb and its three merged PRs - Active Integration Plans: added codec-sweep-via-lab-infra-v1 alongside elegant-herding-rocket-v1 - Immediate Next Work: codec-sweep Phase 0 remainder (D0.1/0.2/0.3/ 0.5) + the elegant-herding Phase 2 block PR_ARC_INVENTORY.md (APPEND-ONLY — PREPEND only): - #225 entry: plan + CodecParams/Builder/precision validation + rules A-F locked + decisions for future PRs - #224 entry: three-part lab stack + thinking harvest + I11 measurability locked - #223 entry: LAB-ONLY firewall + AGI-as-SoA + I1-I10 invariants locked (the cross-cutting architectural ruleset this workspace now enforces) STATUS_BOARD.md: - New section: codec-sweep-via-lab-infra-v1 with 18 D-ids across 5 phases (D0.6/D0.7 marked Shipped-in-#225; remainder Queued) EPIPHANIES.md (APPEND-ONLY — PREPEND only, 6 new dated entries): - Board hygiene is the driving seat, not cleanup (this session's self-reflection turned into a rule) - Codec cert is token agreement, not synthetic ICC (#219 → #220 arc; #225 CalibrationEqualsMeasurement typed rejection) - Lab REST surface is three-part (API + Planner + JIT), not just scaffolding - Thinking harvest via REST/Cypher = the AGI magic bullet - SoA never scalarises without ndarray (iron rule Rule C) - AGI is the glove, not the oracle — four-axis SoA is what you wear CLAUDE.md — new top-level § "The Stance — Driving Seat + 
AGI-as-Glove (P0, read first)": - Explicit driving-seat posture: the session STEERS the stack, doesn't observe it - AGI-as-glove doctrine concrete: topic → FingerprintColumns, angle → QualiaColumn, thinking → MetaColumn, planner → EdgeColumn. New capability lands as a new column, not a layer. - MANDATORY Board-Hygiene Rule as a table: every PR that adds a type / plan / D-id / epiphany / tech-debt / issue MUST update the corresponding board file IN THE SAME COMMIT. Retroactive hygiene (merge PR → later cleanup) is now an anti-pattern the rule forbids. - "Consult, don't guess" — agent/knowledge-first discipline: specialist-agent card → knowledge doc → board inventory → only then grep. Subagent spawn with curated docs beats main- thread grep. 147/147 contract suite still passing. Doc-only PR otherwise (Cargo.toml / src/* unchanged; the orphan serde_yaml/base64 deps from the timed-out bus-compiler subagent were reverted — they'll land with D0.1/D0.3 when the Wire code lands). https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…p-wMZfb board hygiene + CLAUDE.md driving-seat tightening (post #223/#224/#225)

First code deliverable of codec-sweep-via-lab-infra Phase 0 on the consumer side. Extends the lab Wire surface to carry the CodecParams shape from PR #225 with an object-oriented tensor DTO that decodes once at ingress into a 64-byte-aligned buffer consumable directly by F32x16::from_slice via slice::array_windows::<64>. crates/cognitive-shader-driver/src/wire.rs: Serde mirrors for contract::cam types (zero serde in the contract per CLAUDE.md rule 5): - WireLaneWidth {F32x16, U8x64, F64x8, BF16x32} - WireDistance {AdcU8, AdcI8} - WireRotation {Identity, Hadamard{dim}, Opq{matrix_blob_id, dim}} - WireResidualSpec {depth, centroids} - WireCodecParams — mirrors CodecParams one-for-one - From/TryFrom conversions; TryFrom<WireCodecParams> for CodecParams runs the precision-ladder validation (OPQ↔BF16x32, Hadamard pow2, overfit guard rejecting calibration_rows == measurement_rows) BEFORE any JIT compile would fire. WireTensorView + AlignedBytes (Rule A + Rule E + Rule F): - shape [u32; 2] + lane_width + bytes_base64 on the wire - decode() base64-decodes ONCE at ingress into AlignedBytes (heap, 64-byte aligned via Layout::from_size_align, Drop deallocates with matching layout, Send + Sync) - row() / subspace() / row_count() / col_count() / row_bytes() / element_bytes() — object-oriented methods per Rule E, mirror the SoA+SIMD ops the JIT kernel will perform - is_aligned_64() for the kernel_contract_test gate - WireTensorViewError {Base64, SizeMismatch, ZeroShape} WireCalibrateRequest extended additively: - New: params: Option<WireCodecParams> + tensor_view: Option<WireTensorView> (the new path) - Legacy: num_subspaces / num_centroids / kmeans_iterations / max_rows / icc_samples preserved for back-compat WireCalibrateResponse extended additively: - New: kernel_hash (= CodecParams::kernel_signature()), compile_time_us, backend ("amx" | "vnni" | "avx512" | "avx2" | "legacy") - Never "scalar" on a SoA path — the iron rule enforced at the response contract 
crates/cognitive-shader-driver/Cargo.toml: - serve feature now pulls base64 (v0.22) + bytemuck (v1) optional deps. No new features; these belong under the existing lab umbrella. crates/cognitive-shader-driver/src/codec_research.rs: - Legacy calibrate_tensor path fills the new response fields with zeros + backend = "legacy". D1.1 (JIT kernel) populates them meaningfully when it lands. Tests (8 new, all passing under --features serve): - wire_codec_params_round_trip_to_contract — OPQ + BF16x32 + wide codebook → builds cleanly, is_matmul_heavy true - wire_codec_params_rejects_opq_with_f32x16 — precision-ladder guard typed-rejects the wrong lane at ingress - wire_codec_params_rejects_calibration_equals_measurement — overfit guard typed-rejects the PR #219 pattern at ingress - wire_codec_params_deserializes_from_minimal_json — serde defaults correct (lane_width=F32x16, distance=AdcU8, rotation=Identity, calibration_rows=2048, seed=42) - wire_tensor_view_decode_lands_in_64byte_aligned_buffer — explicit Rule A proof: decoded AlignedBytes.is_aligned_64(), and slice::array_windows::<64>() yields exactly one window per 16-col F32 row - wire_tensor_view_rejects_size_mismatch — typed error for base64 payload not matching declared shape - wire_tensor_view_subspace_slicing — subspace(row, k, sub_bytes) returns the expected offset+len - wire_calibrate_request_{accepts_new_params_field, back_compat_legacy_fields} — both the new (params-carrying) and legacy payload shapes deserialise correctly Board hygiene in the SAME commit (per CLAUDE.md Mandatory Board-Hygiene Rule from PR #226): - STATUS_BOARD.md D0.1 row: Queued → In PR - LATEST_STATE.md cognitive-shader-driver section: new subsection listing the Wire surface types landed Test summary: 55/55 cognitive-shader-driver tests pass under --features serve; 147/147 lance-graph-contract tests pass. 
Rules honored: Rule A — stdlib slice::array_windows::<N>() + ndarray::simd::*, proven by the test that calls array_windows::<64>() on the decoded row Rule B — no std::arch, no hpc::simd_avxNNN reach; ndarray::simd::* imports only Rule C — n/a for DTO code (JIT tier selection lands D1.1) Rule D — JSON/YAML/REST only, no in-Rust CodecParams construction on the Wire side Rule E — Wire surface IS the SIMD surface; LaneWidth explicit, methods not scalar bags, 64-byte-aligned decode proven Rule F — decode ONCE at ingress via WireTensorView::decode; WireCalibrateRequest::params: Option<WireCodecParams> carries the Rust object through the rest of the pipeline https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
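
The 64-byte-aligned landing buffer described for AlignedBytes can be sketched with std::alloc alone. This is a hypothetical reconstruction, not the crate's type: it starts from already decoded bytes (the base64 step is left out), from_slice and row are assumed method shapes, and zero-length input is padded to one 64-byte block to keep the allocation size non-zero.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Heap buffer whose start address is a multiple of 64 bytes.
pub struct AlignedBytes {
    ptr: NonNull<u8>,
    len: usize,
    layout: Layout,
}

// SAFETY: the buffer is uniquely owned and only handed out as `&[u8]`.
unsafe impl Send for AlignedBytes {}
unsafe impl Sync for AlignedBytes {}

impl AlignedBytes {
    /// Copy `src` once into a fresh 64-byte-aligned allocation.
    pub fn from_slice(src: &[u8]) -> Self {
        // Round up to a whole number of 64-byte blocks, at least one.
        let size = (src.len().max(1) + 63) / 64 * 64;
        let layout = Layout::from_size_align(size, 64).expect("valid layout");
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = match NonNull::new(raw) {
            Some(p) => p,
            None => handle_alloc_error(layout),
        };
        // SAFETY: destination holds `size >= src.len()` bytes and is fresh,
        // so the two regions cannot overlap.
        unsafe { std::ptr::copy_nonoverlapping(src.as_ptr(), ptr.as_ptr(), src.len()) };
        Self { ptr, len: src.len(), layout }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` is valid for `len` initialised bytes while `self` lives.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    pub fn is_aligned_64(&self) -> bool {
        self.ptr.as_ptr() as usize % 64 == 0
    }

    /// Borrow row `index` of `row_bytes` bytes; None if it is out of range.
    pub fn row(&self, index: usize, row_bytes: usize) -> Option<&[u8]> {
        let start = index.checked_mul(row_bytes)?;
        self.as_slice().get(start..start.checked_add(row_bytes)?)
    }
}

impl Drop for AlignedBytes {
    fn drop(&mut self) {
        // SAFETY: same pointer and layout that were used to allocate.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

With a row width of 64 bytes (sixteen f32 values), every row returned by row() also starts on a 64-byte boundary, which is the property the commit's alignment test checks.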

Two more Phase 0 deliverables from codec-sweep-via-lab-infra-v1. 66/66 cognitive-shader-driver tests pass under --features serve (+11 new).

D0.5 — auto_detect.rs (~300 LOC, CODING_PRACTICES gap 1):
  Reads <model_path>/config.json (HuggingFace layout) and returns ModelFingerprint { architecture, hidden_size, n_layers, tokenizer_class, vocab_size, default_lane_width, default_distance }.
  Architecture routing:
    llama / qwen / qwen2 / qwen3 / mistral / mixtral → BF16x32 (AMX)
    bert / modernbert / xlm-roberta / generic → F32x16 (AVX-512)
  torch_dtype override wins over architecture heuristic.
  Typed errors: ConfigMissing / Io / Parse / MissingField {path, field}.
  Best-effort tokenizer_class from tokenizer_config.json.
  8 tests: llama / qwen3-with-tokenizer / bert / modernbert / xlm-roberta (d_model alias) / generic fallback / missing-config / missing-field.

D0.2 — WireTokenAgreement stub (~100 LOC, the I11 cert gate):
  DTOs:
    WireBaseline { Passthrough } — default, extensible
    WireTokenAgreement { model_path, reference, candidate (WireCodecParams), prompt_set_blob_id, n_tokens }
    WireTokenAgreementResult { top1_rate, top5_rate, divergence_positions, per_layer_mse, candidate_latency_us, reference_latency_us, stub, backend }
  Phase 0 handler stub (not shipped yet): returns stub:true / backend:"stub" deterministic result. Phase 2 D2.1-D2.3 land the real decode-and-compare loop (reference model load + top-k comparison + per-layer MSE).
  Pass gates (for when the harness lands): top1_rate ≥ 0.99 + top5_rate ≥ 0.999 vs Passthrough baseline. This is the ACTUAL codec cert gate — reconstruction ICC is necessary-but-not-sufficient (per #219/#220 lesson).
  3 round-trip serde tests: full payload + stub-backend default + baseline default.

Board hygiene (CLAUDE.md Mandatory rule):
  STATUS_BOARD.md updated:
    D0.1 Queued → Shipped (PR #227 — was stale)
    D0.2 Queued → In PR (this branch)
    D0.5 Queued → In PR (this branch)

Phase 0 state after this commit:
  ✅ D0.1 WireCalibrate + WireTensorView (PR #227)
  ✅ D0.6 CodecParamsBuilder (PR #225)
  ✅ D0.7 precision-ladder validation (PR #225)
  ✅ D0.5 auto_detect (this PR)
  ✅ D0.2 WireTokenAgreement stub (this PR)
  ⏳ D0.3 WireSweep streaming endpoint (next PR)
  ⏳ D0.4 surface freeze (gates after D0.3)

Rules honored:
  Rule D — JSON/YAML/REST only, CodecParams carried through via WireCodecParams
  Rule E — Wire surface IS the SIMD surface (lane_width on candidate)
  Rule F — serde mirrors at ingress only; TryFrom → CodecParams at handler

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
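
Two rules from this commit reduce to small pure functions: the architecture-to-lane routing and the token-agreement pass gate. The sketch below uses hypothetical function names, and the torch_dtype strings it recognises ("bfloat16", "float32") are an assumption, since the commit only says that the override wins over the architecture heuristic.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LaneWidth {
    F32x16,
    BF16x32,
}

/// Pick a default lane width from a model's architecture string, letting
/// an explicit dtype override the heuristic.
pub fn default_lane_width(architecture: &str, torch_dtype: Option<&str>) -> LaneWidth {
    // Assumed dtype spellings; unknown values fall through to the heuristic.
    match torch_dtype {
        Some("bfloat16") => return LaneWidth::BF16x32,
        Some("float32") => return LaneWidth::F32x16,
        _ => {}
    }
    match architecture {
        "llama" | "qwen" | "qwen2" | "qwen3" | "mistral" | "mixtral" => LaneWidth::BF16x32,
        // bert / modernbert / xlm-roberta and anything unrecognised.
        _ => LaneWidth::F32x16,
    }
}

/// Pass gate against the Passthrough baseline: both thresholds must hold.
pub fn passes_cert_gate(top1_rate: f64, top5_rate: f64) -> bool {
    top1_rate >= 0.99 && top5_rate >= 0.999
}
```

Treating an unknown architecture as the generic case mirrors the "generic fallback" test named in the commit message.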

Last Phase 0 Wire-surface deliverable from codec-sweep-via-lab-infra-v1.
71/71 cognitive-shader-driver tests pass under --features serve (+5 new D0.3 tests).

DTOs (~250 LOC in wire.rs):

- WireMeasure enum: ReconstructionErrorHeldOut / ReconstructionIccHeldOut / TokenAgreementTop1 / TokenAgreementTop5 / PerLayerMse (serde: lowercase snake_case)
- WireSweepGrid: subspaces / centroids / residual_depths / rotations / distances / lane_widths, each a Vec<T> with sensible defaults (defaults produce cardinality 1 for minimal payloads), plus residual_centroids / calibration_rows / measurement_rows / seed. Methods:
  - cardinality() -> usize: product of axis lengths
  - enumerate() -> Vec<WireCodecParams>: full Cartesian product
- WireSweepRequest: tensor_path / grid / measure (default: ICC + top-1) / log_to_lance (optional Lance fragment path) / label
- WireSweepResult (one per grid point): grid_index / candidate / kernel_hash (CodecParams::kernel_signature) / calibrate (Option<WireCalibrateResponse>) / token_agreement (Option<WireTokenAgreementResult>) / stub flag (mirrors WireTokenAgreementResult.stub)
- WireSweepResponse (for non-streaming batch clients): label / cardinality / results / elapsed_ms / lance_fragment_path

Streaming handler (SSE) + Lance writer deferred to Phase 3 D3.1. Phase 0 ships the SURFACE; Phase 3 lands the execution.
Tests (5 new):
- sweep_grid_cardinality_is_product_of_axes (1×3×3×2×1×2 = 36)
- sweep_grid_enumerate_produces_all_unique_signatures (4 distinct kernel signatures from 4 distinct IR-shaping tuples)
- sweep_grid_defaults_produce_single_candidate (empty JSON {} → cardinality 1, single default WireCodecParams)
- sweep_request_round_trips_json (full payload with all fields)
- sweep_measure_serializes_snake_case (serde enum format)

Board hygiene (CLAUDE.md Mandatory rule):

STATUS_BOARD.md:
- D0.3 Queued → In PR
- D0.4 Queued → Ready (surface freeze fires on merge)

EPIPHANIES.md PREPEND: "D0.3 sweep grid IS the JIT cache warmer". The grid and the cache signature are the same object viewed from two sides. Each unique (subspaces, centroids, residual_depth, rotation_kind, distance, lane_width) tuple maps to exactly one kernel_signature(). The first traversal compiles N kernels; every subsequent sweep with overlapping tuples hits the cache at ~0 ms. 54-candidate Appendix A §30 sweep: ~800 ms one-time compile, free after.

Phase 0 state after this PR (all 7 D0.x deliverables):
- ✅ D0.1 WireCalibrate + WireTensorView (PR #227)
- ✅ D0.2 WireTokenAgreement stub (PR #231)
- ✅ D0.3 WireSweep DTOs + grid (this PR)
- ⏳ D0.4 surface freeze (fires on merge)
- ✅ D0.5 auto_detect (PR #231)
- ✅ D0.6 CodecParamsBuilder (PR #225)
- ✅ D0.7 precision-ladder validation (PR #225)

Rules honored (every Wire DTO in this PR):
- Rule D: JSON/YAML/REST only, never in-Rust construction at ingress
- Rule E: Wire surface IS SIMD surface (lane_widths axis explicit, kernel_hash returned per result)
- Rule F: serde mirrors at ingress only; enumerate() returns plain Rust objects that never re-serialize until egress

After this PR merges: D0.4 surface freeze → Phase 1 (JIT kernels) begins.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
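To make the cardinality() / enumerate() contract of the sweep grid concrete, here is a stdlib-only sketch. It is not the WireSweepGrid from wire.rs: the `SweepGrid` and `Candidate` types, the axis element types, the default values, and the rotation / distance names are all hypothetical stand-ins; only the six axis names, the product rule, and the "defaults give cardinality 1" behaviour come from the commit message.

```rust
/// Hypothetical stand-in for one grid point (WireCodecParams in the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Candidate {
    pub subspaces: u32,
    pub centroids: u32,
    pub residual_depth: u32,
    pub rotation: &'static str,
    pub distance: &'static str,
    pub lane_width: &'static str,
}

/// Hypothetical stand-in for WireSweepGrid: one Vec per axis.
#[derive(Debug, Clone)]
pub struct SweepGrid {
    pub subspaces: Vec<u32>,
    pub centroids: Vec<u32>,
    pub residual_depths: Vec<u32>,
    pub rotations: Vec<&'static str>,
    pub distances: Vec<&'static str>,
    pub lane_widths: Vec<&'static str>,
}

impl Default for SweepGrid {
    /// One value per axis, so the default grid has cardinality 1.
    /// The concrete values are placeholders, not the PR's defaults.
    fn default() -> Self {
        SweepGrid {
            subspaces: vec![8],
            centroids: vec![256],
            residual_depths: vec![0],
            rotations: vec!["identity"],
            distances: vec!["l2"],
            lane_widths: vec!["F32x16"],
        }
    }
}

impl SweepGrid {
    /// Product of the six axis lengths.
    pub fn cardinality(&self) -> usize {
        [
            self.subspaces.len(),
            self.centroids.len(),
            self.residual_depths.len(),
            self.rotations.len(),
            self.distances.len(),
            self.lane_widths.len(),
        ]
        .iter()
        .product()
    }

    /// Full Cartesian product, last axis varying fastest.
    pub fn enumerate(&self) -> Vec<Candidate> {
        let mut out = Vec::with_capacity(self.cardinality());
        for &subspaces in &self.subspaces {
            for &centroids in &self.centroids {
                for &residual_depth in &self.residual_depths {
                    for &rotation in &self.rotations {
                        for &distance in &self.distances {
                            for &lane_width in &self.lane_widths {
                                out.push(Candidate {
                                    subspaces,
                                    centroids,
                                    residual_depth,
                                    rotation,
                                    distance,
                                    lane_width,
                                });
                            }
                        }
                    }
                }
            }
        }
        out
    }
}
```

With axis lengths 1, 3, 3, 2, 1 and 2, this sketch yields 36 candidates, matching the 1×3×3×2×1×2 case in the test list; as long as no axis repeats a value, every enumerated tuple is distinct, which is the property that lets the grid double as a cache key set.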

…ace merge + adhering-agent review checklist

Four additions to .claude/CODING_PRACTICES.md. They extend the existing EmbedAnything-patterns content with the SoA / object-does-the-work / substrate-level patterns that crystallized during the codec-sweep lab-infra session (PRs #225–#239).

1. SoA + Object-Does-The-Work Patterns (~100 lines)
   - Checklist for new DTOs / kernels / caches: sealed builders, stable signatures excluding drift, typed errors, Cache<H> generic-over-handle, stub flag for Phase-N-before-Phase-N+k, feature matrix tested, serialisation at edges, DoS ceilings at construction not enumeration
   - Five additional anti-patterns (6-10) surfaced by session corrections: stateless-shader vs stateful-engine misframed, hallucinating ndarray surface, feature-matrix blindness, epiphany-dumping orientation-as-discovery, raw struct literals bypassing builders
   - 10 shipped-pattern reference entries citing the actual files + test counts
   - 8 principles: object does the work, SoA over AoS, same-substrate-different-view, Stream/Resonance/Bus lifecycle, weights are seeds, scaffold-before-codegen, feature matrix is part of contract, pin your toolchain
   - Read order for new sessions

2. MANDATORY: `ndarray::simd::*` canonical import (new section)
   - Correct/wrong examples per Rule B + invariant I2
   - AMX sibling module + tile primitives + simd_caps canonical paths
   - Polyfill hierarchy (Tier 1 AMX → Tier 4 AVX-2, no consumer scalar tier)
   - Reviewer trigger for `std::arch::*` or `ndarray::hpc::simd_avxNNN::*` reach

3. 3-Way BindSpace Mutation Scheme (new section)
   - Table: Xor (single-writer reversible) / Bundle (multi-writer saturating, E-SUBSTRATE-1 guaranteed associative in expectation) / Superposition (preserve ambiguity)
   - When to use each + explicit DON'T-INTERCHANGE rule
   - Iron rule citation (CLAUDE.md I-SUBSTRATE-MARKOV): Xor on multi-writer path breaks the Markov guarantee
   - Reviewer trigger for Xor on concurrent-writer paths

4. Adhering-Agent Review Checklist (new section)
   - Per-agent table mapping 18 specialist agents + 5 meta-agents to the specific checklist sections they own
   - Spawn pattern: hand PR scope to 1-2 matching agents with pointer to this doc; they walk their rubric and return PASS/FAIL with specific line citations
   - Agents READ this doc as their rubric, not their personality

The doc is now both the author-side pattern guide AND the reviewer-side checklist. Specialist agents adhere to it; PRs are reviewed against it; new sessions load it as part of the mandatory pre-read set.

Cross-ref: CLAUDE.md I-SUBSTRATE-MARKOV + I-NOISE-FLOOR-JIRAK; lab-vs-canonical-surface.md invariants I1-I11 + six rules A-F; cognitive-shader-architecture.md 7-layer stack + SoA column types; ripple-dto-contracts.md Stream/Resonance/Bus/ThoughtStruct lifecycle; this session's PRs #225-#239 shipping the SoA patterns in practice.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

claude added 9 commits April 20, 2026 17:41

AdaWorldAPI merged commit 9db3a47 into main Apr 20, 2026

AdaWorldAPI mentioned this pull request Apr 20, 2026

board hygiene + CLAUDE.md driving-seat tightening (post #223/#224/#225) #226

Merged

5 tasks

AdaWorldAPI added a commit that referenced this pull request Apr 20, 2026

Merge pull request #226 from AdaWorldAPI/claude/teleport-session-setu…

3c9ee75

…p-wMZfb board hygiene + CLAUDE.md driving-seat tightening (post #223/#224/#225)

AdaWorldAPI mentioned this pull request Apr 20, 2026

D0.5 auto_detect + D0.2 WireTokenAgreement stub (Phase 0, 66/66 tests) #231

Merged

5 tasks

AdaWorldAPI mentioned this pull request Apr 20, 2026

D0.3 WireSweep DTOs + grid enumerator (Phase 0 surface complete, 71/71 tests) #232

Merged

5 tasks

AdaWorldAPI merged 9 commits into main from claude/teleport-session-setup-wMZfb


Audit vs `.claude/CODING_PRACTICES.md` (EmbedAnything patterns)