plan(codec-sweep) + D0.6/D0.7: lab-infra sweep plan + CodecParams types#225
Merged
Operationalises PR #220's "What's Needed to Fix" list (wider codebook, residual PQ, Hadamard pre-rotation, OPQ) as a parameter sweep through the lab endpoint: every codec difference is a JIT kernel, not a cargo rebuild. Phase 0 hardens the Wire surface once; Phases 1-4 run unlimited candidates without further rebuilds; Phase 5 graduates winners to the canonical OrchestrationBridge surface.
Structure:
Phase 0: API hardening (one rebuild, then frozen):
  D0.1 CodecParams in WireCalibrate
  D0.2 WireTokenAgreement endpoint (I11 cert gate)
  D0.3 WireSweep streaming + Lance append
  D0.4 surface freeze
Phase 1: JIT codec kernels (rebuild-free):
  D1.1 CodecKernelCache via JitCompiler (Cranelift)
  D1.2 Rotation primitives (Identity / Hadamard / OPQ)
  D1.3 Residual PQ via JIT composition
Phase 2: Token-agreement harness (the I11 cert gate):
  D2.1 Reference-model loader (ndarray safetensors)
  D2.2 Decode-and-compare loop (top-k, per-layer MSE)
  D2.3 Handler wiring
Phase 3: Sweep driver + Lance logger
Phase 4: DataFusion frontier analysis
Phase 5: Graduation to OrchestrationBridge (per winner only)
~1,920 LOC total; 1 upfront rebuild; unlimited candidates afterwards. Compare to the naive path (4 fixes × 8-17 min × N tweaks = hundreds of hours).
All work stays behind --features lab until graduation.
INTEGRATION_PLANS.md prepended per APPEND-ONLY rule, citing PR #224 dependency for the architectural framing.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…::* + AMX + YAML/JSON
Binds four non-negotiable rules on every JIT-emitted kernel in
Phases 1-3:
Rule A: Tensor access via array_window only.
No manual index math, no raw pointer reach, no custom
slice offset recompute. ndarray::simd::array_window
handles stride / alignment / bounds / lane padding.
Rule B: SIMD exclusively via ndarray::simd::*.
No std::arch::*, no ndarray::hpc::*, no hand-rolled
intrinsics. Missing primitive → add to ndarray first,
never bypass the canonical surface from the JIT.
Rule C: Backend dispatch via simd_caps() (AMX-ready).
JIT emits generic IR calling ndarray::simd primitives.
Those resolve to AMX tiles on aarch64-apple-darwin with
AMX capability, AVX-512 on x86_64, NEON on aarch64, and
scalar fallback otherwise. Rotation and distance-table
kernels benefit most from AMX (matmul-heavy paths).
JIT never emits AMX intrinsics directly — it calls
matmul_tiled / hadamard_butterfly / etc., which dispatch
internally.
Rule D: Configuration is JSON / YAML / REST only.
No codec candidate defined in Rust. One schema
(CodecParams) serialised three ways:
- YAML under configs/codec/*.yaml (human-authored)
- JSON payload (curl / REST)
- REST endpoint body at /v1/shader/calibrate
New candidate = new YAML/JSON file. Zero Rust changes.
Zero rebuilds.
Enforcement: Phase 0 ships two new test gates —
- kernel_contract_test scans emitted IR for banned symbols
(std::arch, ndarray::hpc) and required symbols (array_window).
- amx_dispatch_test (aarch64-apple-darwin-only) verifies
simd_caps().has_amx() and trace records backend = "amx" for
rotation kernels on M-series.
D1.1-D1.3 body sketches updated to show the contract in practice:
every decode / rotation / composition stage reads via array_window
and calls ndarray::simd primitives (adc_distances_simd,
hadamard_butterfly, matmul_tiled, sub_tiled, add_tiled), never
raw intrinsics.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…hanges
Corrections after hand-grep vs curated knowledge (encoding-ecosystem.md,
codec-findings-2026-04-20.md, rotation_vs_error_correction.md) and
user directives "its all there, dont touch, just be aware how to use
crate::simd", "wire accordingly into the lab infra", "via struct of
arrays":
- slice::array_windows::<N>() IS real: stdlib, const-generic, stable
since Rust 1.94. I conflated it with a missing ndarray::array_window
(singular); corrected.
- AMX in ndarray is INTEL (Sapphire Rapids TDPBUSD/TDPBF16PS via
stable inline asm on Rust 1.94, per src/simd_amx.rs header),
NOT Apple. rust-lang #126622 keeps AMX intrinsics nightly;
inline asm at src/hpc/amx_matmul.rs is the stable consumer path.
Verified on kernel 6.18.5 with XCR0 bits 17+18 set.
- Real primitive names (no hallucinated matmul_tiled /
hadamard_butterfly): tile_dpbusd, tile_dpbf16ps, vnni_pack_bf16
for tier-1 AMX; vnni_matvec / matvec_dispatch for tier-2 VNNI;
F32x16 / U8x64 / Fingerprint<N> for tier-3 AVX-512 baseline.
- Polyfill hierarchy per user directive
(simd_amx > simd_avx512 > simd_avx2 fallback):
Tier 1: Intel AMX tile (256 MACs/instr)
Tier 2: AVX-512 VNNI (64 MACs/instr)
Tier 3: AVX-512 baseline F32x16 (16 MACs/instr, mandatory
default per ndarray's .cargo/config.toml
target-cpu=x86-64-v4)
Tier 4: AVX-2 F32x8 fallback
Tier 5: scalar reference
- Rule A wires SoA: the &[u8] slice array_windows iterates comes
from a BindSpace column (FingerprintColumns / QualiaColumn /
MetaColumn / EdgeColumn) per the AGI-as-SoA identity. No new
data structures — the SoA column IS the input surface.
- Dropped all "Phase 0 ndarray prerequisite" language. Everything
the sweep needs exists in ndarray today; this plan wires the
existing surface into cognitive-shader-driver (REST handlers +
CodecKernelCache + CodecResearchBridge). Zero ndarray changes.
- Added reality-check against codec-findings-2026-04-20.md so the
sweep does NOT re-derive measured winners: Had-Q5×D-R already
ICC ≈ 0.99 with shared codebook; I8-Hadamard leads for per-row-
only at ICC ≈ 0.9; zipper serves bundling axis, not argmax;
fractal leaf descriptors are DEAD (sign-flip invariant). The
sweep focuses on #220's four unmeasured candidates (wider
codebook / residual PQ / Hadamard pre-rotation / OPQ) and on
the missing axis — token agreement, not reconstruction ICC.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
User directive: "i should never have to remind you to use simd
because the struct of arrays never ever does scalar without ndarray."
Corrections:
- Removed consumer-visible "Tier 5 scalar" row from the polyfill
table. Scalar fallback (when it exists at all for exotic targets)
lives INSIDE ndarray::simd::* — the consumer never hand-rolls
a scalar loop on a SoA path.
- Added iron rule before the tier table: every tier in the chain
calls ndarray::simd::* / ndarray::simd_amx::* /
ndarray::hpc::amx_matmul::* — if a kernel runs scalar on the
SoA path, the SoA invariant is broken.
- Dispatch pseudo-code cleaned: the else branch lands on
ndarray::simd::F32x16 (Tier 3 mandatory floor via target-cpu=
x86-64-v4). No "else scalar loop" short-circuit exists. If
ndarray::simd were unavailable, SoA wouldn't be the right path.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
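The "no else-scalar branch" dispatch described in this commit can be sketched as a pure function. This is an illustration only: `Caps`, `Tier`, and `select_tier` are hypothetical names, the real capability flags come from ndarray's `simd_caps()` (whose fields are not shown in this PR), and the rule that only matmul-heavy candidates take the AMX tier is an assumption drawn from the plan's wording. Tier 4 (AVX-2) is not modelled because the pseudo-code's else branch lands on Tier 3.

```rust
/// Hypothetical capability flags. The plan reads these from ndarray's
/// `simd_caps()`; the field names here are assumptions for illustration.
#[derive(Clone, Copy, Debug, Default)]
struct Caps {
    amx: bool,
    avx512_vnni: bool,
}

/// Tiers a kernel can be dispatched to in this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Tier {
    Amx,    // Tier 1: Intel AMX tiles
    Vnni,   // Tier 2: AVX-512 VNNI
    Avx512, // Tier 3: AVX-512 baseline, the mandatory floor
}

impl Tier {
    /// Backend label as it would appear in a trace or response.
    fn backend(self) -> &'static str {
        match self {
            Tier::Amx => "amx",
            Tier::Vnni => "vnni",
            Tier::Avx512 => "avx512",
        }
    }
}

/// Pick the highest available tier. The final `else` is Tier 3, never a
/// scalar loop, which mirrors the iron rule stated in the commit message.
fn select_tier(caps: Caps, matmul_heavy: bool) -> Tier {
    if matmul_heavy && caps.amx {
        Tier::Amx
    } else if caps.avx512_vnni {
        Tier::Vnni
    } else {
        Tier::Avx512
    }
}
```

Because the return type has no scalar variant, a scalar fallback cannot be selected by accident; the type itself carries the invariant.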
…oriented)
User directive: "the api for lab needs to be simd object oriented
surface."
Rule E binds the lab Wire DTOs to the SIMD shapes they feed. The Wire
types are NOT convenience scalar bags that get reassembled into SIMD
structures internally — they ARE the SIMD surface, serialised.
Four consequences:
(i) Lane-shaped aggregates. LaneWidth enum mirrors ndarray::simd::*
lane types (F32x16, U8x64, F64x8, BF16x32). Every tensor-carrying
DTO names its lane_width explicitly.
(ii) Methods, not bags. WireTensorView exposes row() / row_count() /
lanes_f32x16() / subspace(); CodecParams exposes
kernel_signature() / lane_width() / is_matmul_heavy(). Consumers
never reassemble a tensor from a Vec<f32>.
(iii) Kernel signature keying. CodecParams::kernel_signature() returns
a stable hash only over fields that shape the emitted IR. JIT
cache keys on this object-computed signature; adding an unrelated
config field does not invalidate entries.
(iv) Serialisation preserves alignment. Decoded WireTensorView bytes
land in a 64-byte-aligned buffer; consumers call
slice::array_windows::<64>() + F32x16::from_slice directly, no
adapter, no copy, no re-align.
Plus four follow-ups from prior corrections:
- kernel_contract_test now scans IR for the real symbols:
ndarray::simd::*, ndarray::simd_amx::*, ndarray::hpc::amx_matmul::*
(allowed) and std::arch / simd_avxNNN reach (banned).
- amx_dispatch_test corrected: x86_64-gated (not aarch64-apple-darwin),
calls ndarray::simd_amx::amx_available(). When true on Sapphire
Rapids+ runners, asserts backend = "amx" trace for matmul-heavy
candidates; when false, verifies Tier-2 VNNI or Tier-3 F32x16
selection — NEVER scalar.
- New wire_object_surface_test round-trips WireCalibrate +
WireTensorView through JSON/gRPC and proves the decoded bytes are
consumable with zero adapter code via array_windows + F32x16.
- D1.1 body sketch cleaned: dropped fictional array_window (singular);
imports simd_caps from ndarray::hpc::simd_caps (real path); cache
uses RwLock for interior mutability per ndarray data-flow rule
("no &mut self during computation"); kernel_signature comes from
CodecParams method (Rule E), not a free-function hash.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
User directive: "Serialisation only once when touching as rest, no
Serialisation EVER inside."
Rule F binds serialisation to the two legitimate edges:
Ingress (once per request):
REST/gRPC handler decodes JSON/protobuf → Rust objects
WireTensorView.bytes_base64 → 64-byte-aligned [u8] buffer
YAML config file → parsed CodecParams at load time
Egress (once per response / per candidate):
REST/gRPC response encodes Rust result → JSON/protobuf
Lance append writes candidate row → Arrow columnar
Everything between ingress and egress is in-memory Rust objects or
zero-copy &[u8] SoA slices. No JSON, no YAML, no protobuf, no
bincode, no re-encode for "debug dumps." Traces flow as Rust
objects through ShaderSink; only the final sink at the egress
boundary may serialise.
Hard prohibitions inside the pipeline:
- serde_json::to_string between layers
- bincode::serialize for L1↔L2↔L3 handoffs
- prost::Message::encode inside the JIT loop
- re-parsing YAML per candidate (parse once at load, cache object)
- debug-JSON dumps inside hot paths
Why load-bearing:
1. Alignment survives — decoded tensor bytes land once in a
64-byte-aligned buffer; no intermediate re-pack.
2. JIT cache keys stay stable — kernel_signature hashes the Rust
object directly, no "same config, different whitespace →
different hash → cache miss" trap.
3. Token-agreement comparisons stay honest — both Passthrough and
candidate paths consume the same decoded buffer; any internal
re-encode would introduce precision drift that mimics or masks
codec error.
4. Sweep throughput — decode at 2-10 GB/s is fine once; repeated
re-serialisation would turn a JIT-fast sweep into serde-bound.
Enforcement: new test gate no_internal_serialisation_test in
Phase 0 scans codec_research.rs / codec_bridge.rs / token_agreement.rs
/ markov_bundle.rs for forbidden symbols (serde_json::*, bincode::*,
prost encode/decode outside handlers). Fails the build if any such
call appears outside src/serve.rs / src/grpc.rs ingress/egress
handlers or the Lance append writer.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
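A gate like no_internal_serialisation_test could be approximated by a plain text scan. The sketch below is not the shipped test: `FORBIDDEN`, `Violation`, and `scan_source` are hypothetical names, the symbol list is a subset chosen for illustration, and the comment stripping is deliberately naive (it cuts at the first `//`, so a `//` inside a string literal would hide code after it).

```rust
/// Symbols assumed to be banned between ingress and egress (Rule F).
/// The real gate's list is not shown in the PR; this is a sample.
const FORBIDDEN: &[&str] = &["serde_json::", "bincode::", "prost::Message::encode"];

/// One finding: 1-based line number plus the offending symbol.
#[derive(Debug, PartialEq, Eq)]
struct Violation {
    line: usize,
    symbol: &'static str,
}

/// Scan source text line by line and report every forbidden symbol that
/// appears outside a trailing `//` comment.
fn scan_source(src: &str) -> Vec<Violation> {
    let mut out = Vec::new();
    for (idx, line) in src.lines().enumerate() {
        // Keep only the part before a line comment (naive on purpose).
        let code = line.split("//").next().unwrap_or("");
        for &symbol in FORBIDDEN {
            if code.contains(symbol) {
                out.push(Violation { line: idx + 1, symbol });
            }
        }
    }
    out
}
```

A test harness would read each pipeline file, call `scan_source`, and fail when the returned vector is non-empty; handler files at the ingress and egress edges would simply not be scanned.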
Nine concrete YAML configs for configs/codec/*.yaml that Phase 0 will consume:
00_baseline_passthrough: regression anchor (top1=1.000 exactly)
01_pr220_baseline: negative control, reproduces #220 ICC 0.195
02_pr219_overfit_reproducer: negative control, split-test must FAIL
10_fix_a_wider_codebook: #220 (a) 1024 centroids
11_fix_b_residual_pq: #220 (b) residual depth=1
12_fix_c_hadamard_rotation: #220 (c) Hadamard pre-rotation
13_fix_d_opq_rotation: #220 (d) OPQ learned rotation
20_composite_a_plus_b: composition probe for combinatorial lift
30_cross_product_sweep: SweepGrid for D3.1 initial sweep
Each YAML:
- Names lane_width explicitly (Rule E) so the JIT compiles the right SIMD tier. BF16x32 for OPQ (AMX bf16 tile path); others default to F32x16.
- Carries a notes: block stating the expected measurement outcome, so Phase 0's regression detection has ground truth to check against (e.g., baseline reproducer must produce ICC ≈ 0.195, overfit reproducer must FAIL the split-test).
- Separates calibration_rows from measurement_rows where relevant (pr219_overfit_reproducer sets them equal so the pipeline refuses to report the ICC, demonstrating the guard that prevents PR #219's overfit-on-training artefact from recurring).
30_cross_product_sweep specifies the initial 54-candidate grid (1 subspace × 3 centroids × 3 residuals × 3 rotations × 1 distance × 2 lane widths). Expected JIT compile budget: ~800 ms one-time; everything after is cache hits per Rule A/B.
Operating principle reiterated at the end: adding a candidate is authoring a YAML; changing params is editing YAML; Rust reads YAML once at ingress (Rule F) and never re-serialises. Sweep logger appends result rows to Lance, the only egress beyond the REST response.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Three gaps found in the 8-item checklist; remediations folded into
Phase 0 as new deliverables so they ship from day one, not as
follow-up:
Gap 1 — auto-detect, don't hardcode.
Current plan expects caller to supply lane_width + tensor shape.
Patch: D0.5 new auto_detect.rs (~140 LOC) reads config.json
next to the safetensors and returns ModelFingerprint {
architecture, hidden_size, lane_width default, tokenizer_class,
… }. Consumed by WireTokenAgreement when tensor_view.lane_width
is omitted. Mirrors EmbedAnything auto_detect.rs (6 tests).
Gap 3 — builder, not raw struct assembly.
Current plan shows CodecParams assembled directly.
Patch: D0.6 CodecParamsBuilder fluent API in
lance-graph-contract::cam. Used by sweep driver / tests /
frontier analysis; YAML ingress still produces CodecParams via
serde. Mirrors EmbedAnything builder.rs (7 tests).
Gap 5 — u8 vs i8 distance tables.
Current plan treats "adc" as one distance variant.
Patch: split distance into adc_u8 / adc_i8 at the YAML + Rust
enum level. Sign-handling affects bipolar cancellation per
codec-findings-2026-04-20.md §I1 sign-flip.
Five remain clean:
Item 2 (sink pattern) — ShaderSink trait + Lance append are sinks.
Item 4 (feature gates) — --features lab / serve / grpc declared.
Item 6 (per-role scales) — one role per YAML preserves z-scale.
Item 7 (calibration↔runtime boundary) — calibration_rows vs
measurement_rows already split; 02_pr219_overfit_reproducer is
the explicit test that enforces the boundary.
Item 8 (no forward pass) — codebook/tile lookup only, per I6.
All 5 anti-patterns dodged: lib.rs stays declarations-only; hot
path is zero-copy &[u8] into SoA + Arc'd KernelHandle (no clones);
Rust-first API; codebook/tile lookup (no matmul inner loop);
precision ladder BF16 calibration → u8/i8 runtime → f32 accumulator
(enforced by Rule E's LaneWidth on the Wire DTO matching the JIT
kernel input format).
New D0.7 — precision-ladder contract. CodecParams validation
refuses impossible shapes at ingress (e.g., { lane_width: F32x16,
rotation: Opq(…) } — OPQ must use BF16x32 to match tile_dpbf16ps).
Validation fires before any JIT compile.
Phase 0 LOC bumps: ~480 → ~700. Still one upfront rebuild.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…dation
First Phase 0 code deliverable from codec-sweep-via-lab-infra-v1 plan.
Zero-dep contract-side types the lab API (cognitive-shader-driver)
will carry into JIT compilation.
Adds to crates/lance-graph-contract/src/cam.rs (~290 LOC):
Enums (Rule E — Wire surface IS the SIMD surface, object-oriented):
LaneWidth { F32x16, U8x64, F64x8, BF16x32 } — mirrors ndarray::simd::*
Distance { AdcU8, AdcI8 } — CODING_PRACTICES gap 5
(sign-handling /
bipolar cancellation)
Rotation { Identity, Hadamard{dim}, Opq{blob,dim} }
Structs:
ResidualSpec { depth, centroids }
CodecParams { subspaces, centroids, residual, lane_width,
pre_rotation, distance, calibration_rows,
measurement_rows, seed }
Builder (CODING_PRACTICES gap 3 — fluent API, not raw-struct):
CodecParamsBuilder::new()
.subspaces(u32).centroids(u32).residual(ResidualSpec)
.lane_width(LaneWidth).rotation(Rotation).distance(Distance)
.calibration_rows(u32).measurement_rows(u32).seed(u64)
.build() -> Result<CodecParams, CodecParamsError>
Validation fires BEFORE any JIT compile (D0.7 precision ladder):
- ZeroDimension — subspaces == 0 or centroids == 0
- OpqRequiresBf16 — OPQ routes through tile_dpbf16ps;
only LaneWidth::BF16x32 is valid
- HadamardDimNotPow2 — Sylvester construction needs dim = 2^k
- CalibrationEqualsMeasurement — overfit guard: refuses to emit
ICC when calibration_rows ==
measurement_rows (reproduces PR #219's
128-row trained-and-tested artifact)
Methods on CodecParams:
kernel_signature() -> u64 — JIT cache key (Rule E); excludes
seed so calibration-sample changes
don't invalidate cached kernels
is_matmul_heavy() -> bool — true for OPQ or centroids > 512;
drives Tier-1 AMX dispatch decision
(Rule C polyfill hierarchy)
Rotation::is_matmul() -> bool — Identity and Hadamard are false
(butterfly stays on Tier-3 F32x16);
only Opq returns true
14 new tests covering:
- builder default matches PR #220 baseline shape
- each validation variant fires correctly
- OPQ + BF16x32 accepted; OPQ + F32x16 rejected with typed error
- Hadamard + non-pow2 dim rejected
- overfit guard fires on calibration == measurement
- kernel_signature stable across identical builds
- kernel_signature excludes seed (cache stays hot)
- kernel_signature changes with centroids / rotation kind
- is_matmul_heavy detects OPQ AND wide codebook (≥512 centroids)
Zero-dep preserved (stdlib only: std::collections::hash_map::
DefaultHasher for kernel_signature, core::fmt + core::error for
error types). No serde in the contract — YAML/JSON deserialisation
belongs to the consumer crate, which will produce CodecParams via
serde at the REST handler (Rule F — serialisation at edge only).
Tests: 147/147 contract suite passing (133 prior + 14 new).
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
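The builder, the validation order, and the seed-excluding signature described above can be sketched with the standard library alone. This is a reduced illustration, not the contents of cam.rs: the default values are placeholders (not the PR #220 baseline shape), ResidualSpec and Distance are omitted, and the set of fields hashed by `kernel_signature` is an assumption about what "shapes the emitted IR".

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum LaneWidth { F32x16, U8x64, F64x8, BF16x32 }

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Rotation { Identity, Hadamard { dim: u32 }, Opq { blob: String, dim: u32 } }

#[derive(Debug, PartialEq, Eq)]
enum CodecParamsError { ZeroDimension, OpqRequiresBf16, HadamardDimNotPow2, CalibrationEqualsMeasurement }

#[derive(Clone, Debug)]
struct CodecParams {
    subspaces: u32,
    centroids: u32,
    lane_width: LaneWidth,
    pre_rotation: Rotation,
    calibration_rows: u32,
    measurement_rows: u32,
    seed: u64,
}

impl CodecParams {
    /// Hash only the fields assumed to shape the kernel; seed and row
    /// counts are left out so they cannot invalidate a cached kernel.
    fn kernel_signature(&self) -> u64 {
        let mut h = DefaultHasher::new();
        (self.subspaces, self.centroids, self.lane_width, &self.pre_rotation).hash(&mut h);
        h.finish()
    }
}

struct CodecParamsBuilder { p: CodecParams }

impl CodecParamsBuilder {
    /// Placeholder defaults for illustration only.
    fn new() -> Self {
        Self { p: CodecParams {
            subspaces: 8, centroids: 256, lane_width: LaneWidth::F32x16,
            pre_rotation: Rotation::Identity,
            calibration_rows: 2048, measurement_rows: 512, seed: 42,
        } }
    }
    fn centroids(mut self, n: u32) -> Self { self.p.centroids = n; self }
    fn lane_width(mut self, l: LaneWidth) -> Self { self.p.lane_width = l; self }
    fn rotation(mut self, r: Rotation) -> Self { self.p.pre_rotation = r; self }
    fn measurement_rows(mut self, n: u32) -> Self { self.p.measurement_rows = n; self }
    fn seed(mut self, s: u64) -> Self { self.p.seed = s; self }

    /// All checks run here, before any kernel would be compiled.
    fn build(self) -> Result<CodecParams, CodecParamsError> {
        let p = self.p;
        if p.subspaces == 0 || p.centroids == 0 {
            return Err(CodecParamsError::ZeroDimension);
        }
        match &p.pre_rotation {
            Rotation::Opq { .. } if p.lane_width != LaneWidth::BF16x32 =>
                return Err(CodecParamsError::OpqRequiresBf16),
            Rotation::Hadamard { dim } if !dim.is_power_of_two() =>
                return Err(CodecParamsError::HadamardDimNotPow2),
            _ => {}
        }
        if p.calibration_rows == p.measurement_rows {
            return Err(CodecParamsError::CalibrationEqualsMeasurement);
        }
        Ok(p)
    }
}
```

One caveat on the design: `DefaultHasher::new()` is deterministic within a given toolchain, but the standard library does not promise the same algorithm across Rust releases, so a signature persisted to disk would need a pinned toolchain or an explicitly specified hash.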
AdaWorldAPI pushed a commit that referenced this pull request on Apr 20, 2026
Retroactive hygiene for the recent PR arc + prospective enforcement so the gap never recurs. User directive: "should have happened to begin with."
LATEST_STATE.md:
- Header: "Last updated 2026-04-20 post PR #224 (PR #225 open)"
- Recently Shipped table: prepended rows for #225 (open), #224, and #223 with full shipped-content summaries
- Contract Inventory: expanded cam:: entry with all new codec-sweep types (LaneWidth / Distance / Rotation / ResidualSpec / CodecParams / CodecParamsBuilder / CodecParamsError) including the precision-ladder-fires-before-JIT invariant
- Active Branches: recorded claude/teleport-session-setup-wMZfb and its three merged PRs
- Active Integration Plans: added codec-sweep-via-lab-infra-v1 alongside elegant-herding-rocket-v1
- Immediate Next Work: codec-sweep Phase 0 remainder (D0.1/0.2/0.3/0.5) + the elegant-herding Phase 2 block
PR_ARC_INVENTORY.md (APPEND-ONLY, PREPEND only):
- #225 entry: plan + CodecParams/Builder/precision validation + rules A-F locked + decisions for future PRs
- #224 entry: three-part lab stack + thinking harvest + I11 measurability locked
- #223 entry: LAB-ONLY firewall + AGI-as-SoA + I1-I10 invariants locked (the cross-cutting architectural ruleset this workspace now enforces)
STATUS_BOARD.md:
- New section: codec-sweep-via-lab-infra-v1 with 18 D-ids across 5 phases (D0.6/D0.7 marked Shipped-in-#225; remainder Queued)
EPIPHANIES.md (APPEND-ONLY, PREPEND only, 6 new dated entries):
- Board hygiene is the driving seat, not cleanup (this session's self-reflection turned into a rule)
- Codec cert is token agreement, not synthetic ICC (#219 → #220 arc; #225 CalibrationEqualsMeasurement typed rejection)
- Lab REST surface is three-part (API + Planner + JIT), not just scaffolding
- Thinking harvest via REST/Cypher = the AGI magic bullet
- SoA never scalarises without ndarray (iron rule Rule C)
- AGI is the glove, not the oracle: four-axis SoA is what you wear
CLAUDE.md, new top-level § "The Stance — Driving Seat + AGI-as-Glove (P0, read first)":
- Explicit driving-seat posture: the session STEERS the stack, doesn't observe it
- AGI-as-glove doctrine concrete: topic → FingerprintColumns, angle → QualiaColumn, thinking → MetaColumn, planner → EdgeColumn. New capability lands as a new column, not a layer.
- MANDATORY Board-Hygiene Rule as a table: every PR that adds a type / plan / D-id / epiphany / tech-debt / issue MUST update the corresponding board file IN THE SAME COMMIT. Retroactive hygiene (merge PR → later cleanup) is now an anti-pattern the rule forbids.
- "Consult, don't guess" (agent/knowledge-first discipline): specialist-agent card → knowledge doc → board inventory → only then grep. Subagent spawn with curated docs beats main-thread grep.
147/147 contract suite still passing. Doc-only PR otherwise (Cargo.toml / src/* unchanged; the orphan serde_yaml/base64 deps from the timed-out bus-compiler subagent were reverted; they'll land with D0.1/D0.3 when the Wire code lands).
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request on Apr 20, 2026
First code deliverable of codec-sweep-via-lab-infra Phase 0 on the consumer side. Extends the lab Wire surface to carry the CodecParams shape from PR #225 with an object-oriented tensor DTO that decodes once at ingress into a 64-byte-aligned buffer consumable directly by F32x16::from_slice via slice::array_windows::<64>.
crates/cognitive-shader-driver/src/wire.rs:
Serde mirrors for contract::cam types (zero serde in the contract per CLAUDE.md rule 5):
- WireLaneWidth {F32x16, U8x64, F64x8, BF16x32}
- WireDistance {AdcU8, AdcI8}
- WireRotation {Identity, Hadamard{dim}, Opq{matrix_blob_id, dim}}
- WireResidualSpec {depth, centroids}
- WireCodecParams: mirrors CodecParams one-for-one
- From/TryFrom conversions; TryFrom<WireCodecParams> for CodecParams runs the precision-ladder validation (OPQ↔BF16x32, Hadamard pow2, overfit guard rejecting calibration_rows == measurement_rows) BEFORE any JIT compile would fire.
WireTensorView + AlignedBytes (Rule A + Rule E + Rule F):
- shape [u32; 2] + lane_width + bytes_base64 on the wire
- decode() base64-decodes ONCE at ingress into AlignedBytes (heap, 64-byte aligned via Layout::from_size_align, Drop deallocates with matching layout, Send + Sync)
- row() / subspace() / row_count() / col_count() / row_bytes() / element_bytes(): object-oriented methods per Rule E, mirror the SoA+SIMD ops the JIT kernel will perform
- is_aligned_64() for the kernel_contract_test gate
- WireTensorViewError {Base64, SizeMismatch, ZeroShape}
WireCalibrateRequest extended additively:
- New: params: Option<WireCodecParams> + tensor_view: Option<WireTensorView> (the new path)
- Legacy: num_subspaces / num_centroids / kmeans_iterations / max_rows / icc_samples preserved for back-compat
WireCalibrateResponse extended additively:
- New: kernel_hash (= CodecParams::kernel_signature()), compile_time_us, backend ("amx" | "vnni" | "avx512" | "avx2" | "legacy")
- Never "scalar" on a SoA path: the iron rule enforced at the response contract
crates/cognitive-shader-driver/Cargo.toml:
- serve feature now pulls base64 (v0.22) + bytemuck (v1) optional deps. No new features; these belong under the existing lab umbrella.
crates/cognitive-shader-driver/src/codec_research.rs:
- Legacy calibrate_tensor path fills the new response fields with zeros + backend = "legacy". D1.1 (JIT kernel) populates them meaningfully when it lands.
Tests (8 new, all passing under --features serve):
- wire_codec_params_round_trip_to_contract: OPQ + BF16x32 + wide codebook → builds cleanly, is_matmul_heavy true
- wire_codec_params_rejects_opq_with_f32x16: precision-ladder guard typed-rejects the wrong lane at ingress
- wire_codec_params_rejects_calibration_equals_measurement: overfit guard typed-rejects the PR #219 pattern at ingress
- wire_codec_params_deserializes_from_minimal_json: serde defaults correct (lane_width=F32x16, distance=AdcU8, rotation=Identity, calibration_rows=2048, seed=42)
- wire_tensor_view_decode_lands_in_64byte_aligned_buffer: explicit Rule A proof: decoded AlignedBytes.is_aligned_64(), and slice::array_windows::<64>() yields exactly one window per 16-col F32 row
- wire_tensor_view_rejects_size_mismatch: typed error for base64 payload not matching declared shape
- wire_tensor_view_subspace_slicing: subspace(row, k, sub_bytes) returns the expected offset+len
- wire_calibrate_request_{accepts_new_params_field, back_compat_legacy_fields}: both the new (params-carrying) and legacy payload shapes deserialise correctly
Board hygiene in the SAME commit (per CLAUDE.md Mandatory Board-Hygiene Rule from PR #226):
- STATUS_BOARD.md D0.1 row: Queued → In PR
- LATEST_STATE.md cognitive-shader-driver section: new subsection listing the Wire surface types landed
Test summary: 55/55 cognitive-shader-driver tests pass under --features serve; 147/147 lance-graph-contract tests pass.
Rules honored:
Rule A: stdlib slice::array_windows::<N>() + ndarray::simd::*, proven by the test that calls array_windows::<64>() on the decoded row
Rule B: no std::arch, no hpc::simd_avxNNN reach; ndarray::simd::* imports only
Rule C: n/a for DTO code (JIT tier selection lands D1.1)
Rule D: JSON/YAML/REST only, no in-Rust CodecParams construction on the Wire side
Rule E: Wire surface IS the SIMD surface; LaneWidth explicit, methods not scalar bags, 64-byte-aligned decode proven
Rule F: decode ONCE at ingress via WireTensorView::decode; WireCalibrateRequest::params: Option<WireCodecParams> carries the Rust object through the rest of the pipeline
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
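The AlignedBytes behaviour listed in this commit (heap allocation, 64-byte alignment via Layout::from_size_align, Drop with the matching layout) can be sketched with std::alloc. This is an independent illustration rather than the code in wire.rs: the `from_slice` and `row` signatures are assumptions, and base64 decoding is left out so the block needs no external crate.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Owned byte buffer whose start address is a multiple of 64.
struct AlignedBytes {
    ptr: NonNull<u8>,
    len: usize,
    layout: Layout,
}

impl AlignedBytes {
    const ALIGN: usize = 64;

    /// Copy `src` once into a fresh 64-byte-aligned allocation.
    fn from_slice(src: &[u8]) -> Self {
        // The global allocator forbids zero-size layouts, so reserve >= 1 byte.
        let layout = Layout::from_size_align(src.len().max(1), Self::ALIGN)
            .expect("buffer too large for a 64-byte-aligned layout");
        // SAFETY: `layout` has non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        // SAFETY: `ptr` is valid for `layout.size() >= src.len()` bytes and
        // cannot overlap `src`, which lives in a different allocation.
        unsafe { std::ptr::copy_nonoverlapping(src.as_ptr(), ptr.as_ptr(), src.len()) };
        Self { ptr, len: src.len(), layout }
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points at `len` initialised bytes owned by `self`.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    fn is_aligned_64(&self) -> bool {
        self.ptr.as_ptr() as usize % Self::ALIGN == 0
    }

    /// Borrow row `i`, where every row is `row_bytes` long; None if out of range.
    fn row(&self, i: usize, row_bytes: usize) -> Option<&[u8]> {
        let start = i.checked_mul(row_bytes)?;
        let end = start.checked_add(row_bytes)?;
        self.as_slice().get(start..end)
    }
}

impl Drop for AlignedBytes {
    fn drop(&mut self) {
        // SAFETY: same pointer and layout that were used to allocate.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) };
    }
}

// SAFETY: the buffer is exclusively owned and only exposes shared reads.
unsafe impl Send for AlignedBytes {}
unsafe impl Sync for AlignedBytes {}
```

Note that row `i` starts on a 64-byte boundary only when `row_bytes` is itself a multiple of 64; a 16-column f32 row (64 bytes) satisfies that, other shapes may need padding.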
AdaWorldAPI pushed a commit that referenced this pull request on Apr 20, 2026
Two more Phase 0 deliverables from codec-sweep-via-lab-infra-v1.
66/66 cognitive-shader-driver tests pass under --features serve (+11 new).
D0.5 — auto_detect.rs (~300 LOC, CODING_PRACTICES gap 1):
Reads <model_path>/config.json (HuggingFace layout) and returns
ModelFingerprint { architecture, hidden_size, n_layers,
tokenizer_class, vocab_size, default_lane_width, default_distance }.
Architecture routing:
llama / qwen / qwen2 / qwen3 / mistral / mixtral → BF16x32 (AMX)
bert / modernbert / xlm-roberta / generic → F32x16 (AVX-512)
torch_dtype override wins over architecture heuristic.
Typed errors: ConfigMissing / Io / Parse / MissingField {path, field}.
Best-effort tokenizer_class from tokenizer_config.json.
8 tests: llama / qwen3-with-tokenizer / bert / modernbert / xlm-roberta
(d_model alias) / generic fallback / missing-config / missing-field.
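The routing table above reduces to a small pure function. The sketch below is an assumption-laden illustration, not auto_detect.rs: `default_lane_width` is a hypothetical name, the architecture string is assumed to be already normalised from config.json, and the mapping of the `torch_dtype` values "bfloat16" and "float32" to lane widths is a guess at how the override works.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LaneWidth {
    F32x16,
    BF16x32,
}

/// Choose a default lane width. An explicit dtype wins over the
/// architecture heuristic; unknown architectures fall back to F32x16.
fn default_lane_width(architecture: &str, torch_dtype: Option<&str>) -> LaneWidth {
    // Assumed override rule: only these two dtype strings are recognised.
    match torch_dtype {
        Some("bfloat16") => return LaneWidth::BF16x32,
        Some("float32") => return LaneWidth::F32x16,
        _ => {}
    }
    match architecture.to_ascii_lowercase().as_str() {
        "llama" | "qwen" | "qwen2" | "qwen3" | "mistral" | "mixtral" => LaneWidth::BF16x32,
        // bert / modernbert / xlm-roberta / anything unrecognised
        _ => LaneWidth::F32x16,
    }
}
```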
D0.2 — WireTokenAgreement stub (~100 LOC, the I11 cert gate):
DTOs:
WireBaseline { Passthrough } — default, extensible
WireTokenAgreement { model_path, reference, candidate (WireCodecParams),
prompt_set_blob_id, n_tokens }
WireTokenAgreementResult { top1_rate, top5_rate,
divergence_positions, per_layer_mse,
candidate_latency_us, reference_latency_us,
stub, backend }
Phase 0 handler stub (not shipped yet): returns stub:true /
backend:"stub" deterministic result. Phase 2 D2.1-D2.3 land the
real decode-and-compare loop (reference model load + top-k
comparison + per-layer MSE).
Pass gates (for when the harness lands):
top1_rate ≥ 0.99 + top5_rate ≥ 0.999 vs Passthrough baseline.
This is the ACTUAL codec cert gate — reconstruction ICC is
necessary-but-not-sufficient (per #219/#220 lesson).
3 round-trip serde tests: full payload + stub-backend default +
baseline default.
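For orientation, the top-1 part of the pass gate can be written down directly. This is a sketch under assumptions: tokens are modelled as u32 ids, the function names are hypothetical, and the top-5 rate is taken as an input because computing it needs per-position candidate lists that the stub does not yet produce.

```rust
/// Fraction of positions where the candidate's argmax token equals the
/// reference's. None if the sequences are empty or differ in length.
fn top1_rate(reference: &[u32], candidate: &[u32]) -> Option<f64> {
    if reference.is_empty() || reference.len() != candidate.len() {
        return None;
    }
    let hits = reference.iter().zip(candidate).filter(|(r, c)| r == c).count();
    Some(hits as f64 / reference.len() as f64)
}

/// Positions where the two sequences disagree (compared pairwise up to
/// the shorter length).
fn divergence_positions(reference: &[u32], candidate: &[u32]) -> Vec<usize> {
    reference
        .iter()
        .zip(candidate)
        .enumerate()
        .filter(|(_, (r, c))| r != c)
        .map(|(i, _)| i)
        .collect()
}

/// Gate thresholds as stated in the commit message.
fn passes_gate(top1: f64, top5: f64) -> bool {
    top1 >= 0.99 && top5 >= 0.999
}
```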
Board hygiene (CLAUDE.md Mandatory rule):
STATUS_BOARD.md updated:
D0.1 Queued → Shipped (PR #227 — was stale)
D0.2 Queued → In PR (this branch)
D0.5 Queued → In PR (this branch)
Phase 0 state after this commit:
✅ D0.1 WireCalibrate + WireTensorView (PR #227)
✅ D0.6 CodecParamsBuilder (PR #225)
✅ D0.7 precision-ladder validation (PR #225)
✅ D0.5 auto_detect (this PR)
✅ D0.2 WireTokenAgreement stub (this PR)
⏳ D0.3 WireSweep streaming endpoint (next PR)
⏳ D0.4 surface freeze (gates after D0.3)
Rules honored:
Rule D — JSON/YAML/REST only, CodecParams carried through via WireCodecParams
Rule E — Wire surface IS the SIMD surface (lane_width on candidate)
Rule F — serde mirrors at ingress only; TryFrom → CodecParams at handler
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request on Apr 20, 2026
Last Phase 0 Wire-surface deliverable from codec-sweep-via-lab-infra-v1.
71/71 cognitive-shader-driver tests pass under --features serve
(+5 new D0.3 tests).
DTOs (~250 LOC in wire.rs):
WireMeasure enum:
ReconstructionErrorHeldOut / ReconstructionIccHeldOut /
TokenAgreementTop1 / TokenAgreementTop5 / PerLayerMse
(serde: lowercase snake_case)
WireSweepGrid:
subspaces / centroids / residual_depths / rotations /
distances / lane_widths — each a Vec<T> with sensible defaults
(defaults produce cardinality 1 for minimal payloads)
+ residual_centroids / calibration_rows / measurement_rows / seed
Methods:
- cardinality() -> usize — product of axis lengths
- enumerate() -> Vec<WireCodecParams> — full Cartesian product
WireSweepRequest:
tensor_path / grid / measure (default: ICC + top-1) /
log_to_lance (optional Lance fragment path) / label
WireSweepResult (one per grid point):
grid_index / candidate / kernel_hash (CodecParams::kernel_signature) /
calibrate (Option<WireCalibrateResponse>) /
token_agreement (Option<WireTokenAgreementResult>) /
stub flag (mirrors WireTokenAgreementResult.stub)
WireSweepResponse (for non-streaming batch clients):
label / cardinality / results / elapsed_ms / lance_fragment_path
Streaming handler (SSE) + Lance writer deferred to Phase 3 D3.1.
Phase 0 ships the SURFACE; Phase 3 lands the execution.
Tests (5 new):
- sweep_grid_cardinality_is_product_of_axes (1×3×3×2×1×2 = 36)
- sweep_grid_enumerate_produces_all_unique_signatures
(4 distinct kernel signatures from 4 distinct IR-shaping tuples)
- sweep_grid_defaults_produce_single_candidate
(empty JSON {} → cardinality 1, single default WireCodecParams)
- sweep_request_round_trips_json (full payload with all fields)
- sweep_measure_serializes_snake_case (serde enum format)
Board hygiene (CLAUDE.md Mandatory rule):
STATUS_BOARD.md:
D0.3 Queued → In PR
D0.4 Queued → Ready (surface freeze fires on merge)
EPIPHANIES.md PREPEND:
"D0.3 sweep grid IS the JIT cache warmer" —
the grid and the cache signature are the same object viewed
from two sides. Each unique (subspaces, centroids,
residual_depth, rotation_kind, distance, lane_width) tuple maps
to exactly one kernel_signature(). First traversal compiles N
kernels; every subsequent sweep with overlapping tuples hits
cache at ~0 ms. 54-candidate Appendix A §30 sweep: ~800 ms
one-time compile, free after.
Phase 0 state after this PR (all 7 D0.x deliverables):
✅ D0.1 WireCalibrate + WireTensorView (PR #227)
✅ D0.2 WireTokenAgreement stub (PR #231)
✅ D0.3 WireSweep DTOs + grid (this PR)
⏳ D0.4 surface freeze (fires on merge)
✅ D0.5 auto_detect (PR #231)
✅ D0.6 CodecParamsBuilder (PR #225)
✅ D0.7 precision-ladder validation (PR #225)
Rules honored (every Wire DTO in this PR):
Rule D — JSON/YAML/REST only, never in-Rust construction at ingress
Rule E — Wire surface IS SIMD surface (lane_widths axis explicit,
kernel_hash returned per result)
Rule F — serde mirrors at ingress only; enumerate() returns plain
Rust objects that never re-serialize until egress
After this PR merges: D0.4 surface freeze → Phase 1 (JIT kernels) begins.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
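The cardinality() and enumerate() methods described for WireSweepGrid amount to a Cartesian product over the axes. The sketch below shows the idea with simplified types: `SweepGrid` and `Candidate` are hypothetical stand-ins, and rotations, distances, and lane widths are plain string labels rather than the Wire enums.

```rust
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Candidate {
    subspaces: u32,
    centroids: u32,
    residual_depth: u32,
    rotation: &'static str,
    distance: &'static str,
    lane_width: &'static str,
}

struct SweepGrid {
    subspaces: Vec<u32>,
    centroids: Vec<u32>,
    residual_depths: Vec<u32>,
    rotations: Vec<&'static str>,
    distances: Vec<&'static str>,
    lane_widths: Vec<&'static str>,
}

impl SweepGrid {
    /// Product of the axis lengths; None if the product overflows usize.
    fn cardinality(&self) -> Option<usize> {
        [
            self.subspaces.len(),
            self.centroids.len(),
            self.residual_depths.len(),
            self.rotations.len(),
            self.distances.len(),
            self.lane_widths.len(),
        ]
        .iter()
        .try_fold(1usize, |acc, &n| acc.checked_mul(n))
    }

    /// Full Cartesian product, last axis varying fastest.
    fn enumerate(&self) -> Vec<Candidate> {
        let mut out = Vec::with_capacity(self.cardinality().unwrap_or(0));
        for &subspaces in &self.subspaces {
            for &centroids in &self.centroids {
                for &residual_depth in &self.residual_depths {
                    for &rotation in &self.rotations {
                        for &distance in &self.distances {
                            for &lane_width in &self.lane_widths {
                                out.push(Candidate {
                                    subspaces, centroids, residual_depth,
                                    rotation, distance, lane_width,
                                });
                            }
                        }
                    }
                }
            }
        }
        out
    }
}
```

With axes of lengths 1, 3, 3, 3, 1, and 2 this yields the 54 candidates quoted for the initial sweep; an empty axis makes the whole grid empty, which is why the described defaults give every axis exactly one entry.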
AdaWorldAPI pushed a commit that referenced this pull request on Apr 21, 2026
…ace merge + adhering-agent review checklist
Four additions to .claude/CODING_PRACTICES.md: extends the existing EmbedAnything-patterns content with the SoA / object-does-the-work / substrate-level patterns that crystallized during the codec-sweep lab-infra session (PRs #225–#239).
1. SoA + Object-Does-The-Work Patterns (~100 lines)
- Checklist for new DTOs / kernels / caches: sealed builders, stable signatures excluding drift, typed errors, Cache<H> generic-over-handle, stub flag for Phase-N-before-Phase-N+k, feature matrix tested, serialisation at edges, DoS ceilings at construction not enumeration
- Five additional anti-patterns (6-10) surfaced by session corrections: stateless-shader vs stateful-engine misframed, hallucinating ndarray surface, feature-matrix blindness, epiphany-dumping orientation-as-discovery, raw struct literals bypassing builders
- 10 shipped-pattern reference entries citing the actual files + test counts
- 8 principles: object does the work, SoA over AoS, same-substrate-different-view, Stream/Resonance/Bus lifecycle, weights are seeds, scaffold-before-codegen, feature matrix is part of contract, pin your toolchain
- Read order for new sessions
2. MANDATORY: `ndarray::simd::*` canonical import (new section)
- Correct/wrong examples per Rule B + invariant I2
- AMX sibling module + tile primitives + simd_caps canonical paths
- Polyfill hierarchy (Tier 1 AMX → Tier 4 AVX-2, no consumer scalar tier)
- Reviewer trigger for `std::arch::*` or `ndarray::hpc::simd_avxNNN::*` reach
3. 3-Way BindSpace Mutation Scheme (new section)
- Table: Xor (single-writer reversible) / Bundle (multi-writer saturating, E-SUBSTRATE-1 guaranteed associative in expectation) / Superposition (preserve ambiguity)
- When to use each + explicit DON'T-INTERCHANGE rule
- Iron rule citation (CLAUDE.md I-SUBSTRATE-MARKOV): Xor on multi-writer path breaks the Markov guarantee
- Reviewer trigger for Xor on concurrent-writer paths
4. Adhering-Agent Review Checklist (new section)
- Per-agent table mapping 18 specialist agents + 5 meta-agents to the specific checklist sections they own
- Spawn pattern: hand PR scope to 1-2 matching agents with pointer to this doc; they walk their rubric and return PASS/FAIL with specific line citations
- Agents READ this doc as their rubric, not their personality
The doc is now both the author-side pattern guide AND the reviewer-side checklist. Specialist agents adhere to it; PRs are reviewed against it; new sessions load it as part of the mandatory pre-read set.
Cross-ref: CLAUDE.md I-SUBSTRATE-MARKOV + I-NOISE-FLOOR-JIRAK; lab-vs-canonical-surface.md invariants I1-I11 + six rules A-F; cognitive-shader-architecture.md 7-layer stack + SoA column types; ripple-dto-contracts.md Stream/Resonance/Bus/ThoughtStruct lifecycle; this session's PRs #225-#239 shipping the SoA patterns in practice.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Summary
Codec-sweep-via-lab-infra plan (9 docs commits shaping the design,
starting from the PR #220 honest-negative arc) + first Phase 0 code
deliverable on the contract side (CodecParams types + builder +
precision-ladder validation).
The plan — six rules binding every JIT-emitted kernel
Authored through successive corrections in one session:
- slice::array_windows::<N>() (stable since Rust 1.77) +
  ndarray::simd::* lane loaders. Zero hand-rolled slicing.
- ndarray::simd::* and its AMX sibling modules
  (ndarray::simd_amx::*, ndarray::hpc::amx_matmul::*,
  ndarray::hpc::simd_caps::*). Everything already exists in ndarray;
  zero ndarray changes.
- … AVX-512 baseline → AVX-2). No consumer-visible scalar tier; SoA
  never scalarises without ndarray.
- … = new YAML; zero Rust changes, zero rebuilds.
- LaneWidth enum mirrors lane types; DTOs expose methods
  (row(), lanes_f32x16(), kernel_signature()) not scalar bags.
- … decode at REST ingress; one encode at response / Lance egress.
  No internal serde between layers.
The four-PR staircase it unlocks
Laid out in the plan:
- … analysis.
- … OrchestrationBridge.
The sweep runs unlimited candidates after the one upfront rebuild
because every candidate is a JIT kernel keyed on
CodecParams::kernel_signature, not a new binary.
Audit vs .claude/CODING_PRACTICES.md (EmbedAnything patterns)
Three gaps found, remediated as Phase 0 deliverables:
- auto_detect.rs reads config.json next to safetensors (mirrors
  EmbedAnything's pattern).
- CodecParamsBuilder landed in this PR (fluent API + 14 tests).
- Distance::{AdcU8, AdcI8} split for sign-handling / bipolar
  cancellation.
All five anti-patterns dodged (lib.rs stays declarations-only; hot
path is zero-copy + Arc'd KernelHandle; Rust-first; codebook
lookup only; precision ladder BF16 calibration → u8/i8 runtime →
f32 accumulator).
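The "Arc'd KernelHandle" hot path, together with the statement that every candidate is a JIT kernel keyed on CodecParams::kernel_signature, suggests a signature-keyed cache. The following is a minimal sketch under stated assumptions, not the PR's implementation: `KernelHandle` here is a placeholder struct, `get_or_compile` and `compiles` are hypothetical names, and the `compile` closure stands in for whatever the JIT compiler actually does.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Placeholder for a compiled kernel (assumption: the real handle would
/// wrap JIT-emitted code rather than just remembering its signature).
pub struct KernelHandle {
    pub signature: u64,
}

/// Sketch of a cache that compiles each distinct signature at most once.
pub struct CodecKernelCache {
    kernels: HashMap<u64, Arc<KernelHandle>>,
    compiles: usize,
}

impl CodecKernelCache {
    pub fn new() -> Self {
        Self { kernels: HashMap::new(), compiles: 0 }
    }

    /// Returns the cached handle for `signature`, or runs `compile` once
    /// and stores the result. Callers receive a cheap `Arc` clone.
    pub fn get_or_compile<F>(&mut self, signature: u64, compile: F) -> Arc<KernelHandle>
    where
        F: FnOnce(u64) -> KernelHandle,
    {
        if let Some(existing) = self.kernels.get(&signature) {
            return Arc::clone(existing);
        }
        let handle = Arc::new(compile(signature));
        self.compiles += 1;
        self.kernels.insert(signature, Arc::clone(&handle));
        handle
    }

    /// Number of times the compile closure actually ran.
    pub fn compiles(&self) -> usize {
        self.compiles
    }
}
```

Under this sketch, two sweep candidates that share a signature share one compiled kernel, which is why adding candidates does not imply adding rebuilds.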
Starter YAML configs (Appendix A — 9 configs)
Concrete Phase 0 inputs live in configs/codec/*.yaml:
- 00_baseline_passthrough: regression anchor (top1 = 1.000 exactly)
- 01_pr220_baseline: reproduces the ICC ≈ 0.195 of PR #220 (D1+D2+D5:
  CAM-PQ calibration pipeline, honest negative result) as a pipeline
  sanity check
- 02_pr219_overfit_reproducer: calibration_rows = measurement_rows
  → pipeline's overfit guard must FAIL it
- 10_fix_a_wider_codebook: 1024 centroids
- 11_fix_b_residual_pq: residual depth 1
- 12_fix_c_hadamard_rotation: Sylvester butterfly, stays on
  Tier-3 F32x16
- 13_fix_d_opq_rotation: learned rotation + BF16x32 lane
  (matches tile_dpbf16ps)
- 20_composite_a_plus_b: combinatorial-lift probe
- 30_cross_product_sweep: 54-candidate initial grid

Code delivered in this PR (D0.6 + D0.7)
crates/lance-graph-contract/src/cam.rs, ~383 LOC, zero-dep:
- LaneWidth, Distance, Rotation, ResidualSpec, CodecParams
- CodecParamsBuilder with fluent API
- CodecParamsError typed errors
- CodecParams::kernel_signature() (JIT cache key; excludes seed)
- CodecParams::is_matmul_heavy() (drives Tier-1 AMX dispatch)
Precision-ladder validation fires before JIT compile:
- OpqRequiresBf16: OPQ routes through tile_dpbf16ps; only the
  BF16x32 lane is accepted
- HadamardDimNotPow2: Sylvester construction needs dim = 2^k
- CalibrationEqualsMeasurement: typed rejection of the
  trained-and-tested-on-same-rows pattern from PR #219 (codec
  research: CAM-PQ solves argmax blind spot (ICC 0.9998 at 6 B/row)
  + production plan)
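To make the builder, the three validation errors, and the seed-excluding cache key concrete, here is a hedged sketch. It is not the contents of cam.rs: the field set, the defaults in `new`, the `rows` method, and the use of `Range<usize>` for row sets are all assumptions made for illustration. Only the type names, the three error names, and the rule that `kernel_signature()` excludes the seed come from the list above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::ops::Range;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum LaneWidth { F32x16, BF16x32 }

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Rotation { Identity, Hadamard, Opq }

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum CodecParamsError {
    OpqRequiresBf16,
    HadamardDimNotPow2 { dim: usize },
    CalibrationEqualsMeasurement,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CodecParams {
    dim: usize,
    centroids: u32,
    lane: LaneWidth,
    rotation: Rotation,
    seed: u64,
}

impl CodecParams {
    /// Cache key: hashes the fields that shape the kernel, never the seed.
    pub fn kernel_signature(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        (self.dim, self.centroids, self.lane, self.rotation).hash(&mut hasher);
        hasher.finish()
    }

    pub fn seed(&self) -> u64 { self.seed }
}

#[derive(Clone, Debug)]
pub struct CodecParamsBuilder {
    dim: usize,
    centroids: u32,
    lane: LaneWidth,
    rotation: Rotation,
    seed: u64,
    calibration_rows: Range<usize>,
    measurement_rows: Range<usize>,
}

impl CodecParamsBuilder {
    /// Defaults are hypothetical; they only need to pass validation.
    pub fn new(dim: usize) -> Self {
        Self {
            dim,
            centroids: 256,
            lane: LaneWidth::F32x16,
            rotation: Rotation::Identity,
            seed: 0,
            calibration_rows: 0..1024,
            measurement_rows: 1024..2048,
        }
    }
    pub fn centroids(mut self, n: u32) -> Self { self.centroids = n; self }
    pub fn lane(mut self, lane: LaneWidth) -> Self { self.lane = lane; self }
    pub fn rotation(mut self, rotation: Rotation) -> Self { self.rotation = rotation; self }
    pub fn seed(mut self, seed: u64) -> Self { self.seed = seed; self }
    pub fn rows(mut self, calibration: Range<usize>, measurement: Range<usize>) -> Self {
        self.calibration_rows = calibration;
        self.measurement_rows = measurement;
        self
    }

    /// Validation runs here, before anything could reach a JIT compiler.
    pub fn build(self) -> Result<CodecParams, CodecParamsError> {
        if self.rotation == Rotation::Opq && self.lane != LaneWidth::BF16x32 {
            return Err(CodecParamsError::OpqRequiresBf16);
        }
        if self.rotation == Rotation::Hadamard && !self.dim.is_power_of_two() {
            return Err(CodecParamsError::HadamardDimNotPow2 { dim: self.dim });
        }
        if self.calibration_rows == self.measurement_rows {
            return Err(CodecParamsError::CalibrationEqualsMeasurement);
        }
        Ok(CodecParams {
            dim: self.dim,
            centroids: self.centroids,
            lane: self.lane,
            rotation: self.rotation,
            seed: self.seed,
        })
    }
}
```

In this sketch, two parameter sets that differ only in seed produce the same signature, so they would map to the same cached kernel, while a Hadamard rotation on a dimension such as 48 is rejected at `build()` with a typed error instead of failing later during compilation.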
Test Plan
- cargo test -p lance-graph-contract --lib (147/147 pass)
- cargo test -p lance-graph-contract --lib codec_params_tests
  (14/14 new tests pass)
- … DefaultHasher, core::fmt, core::error)
- … to the consumer crate at the REST handler (Rule F)
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh