plan(codec-sweep) + D0.6/D0.7: lab-infra sweep plan + CodecParams types#225

Merged
AdaWorldAPI merged 9 commits into main from claude/teleport-session-setup-wMZfb
Apr 20, 2026

Conversation

@AdaWorldAPI
Owner

Summary

Codec-sweep-via-lab-infra plan (9 docs commits shaping the design,
starting from the PR #220 honest-negative arc) + first Phase 0 code
deliverable on the contract side (CodecParams types + builder +
precision-ladder validation).

The plan — six rules binding every JIT-emitted kernel

Authored through successive corrections in one session:

  • Rule A — Tensor access via stdlib slice::array_windows::<N>()
    (stable since Rust 1.77) + ndarray::simd::* lane loaders. Zero
    hand-rolled slicing.
  • Rule B — SIMD exclusively via ndarray::simd::* and its AMX
    sibling modules (ndarray::simd_amx::*,
    ndarray::hpc::amx_matmul::*, ndarray::hpc::simd_caps::*).
    Everything already exists in ndarray; zero ndarray changes.
  • Rule C — Polyfill hierarchy (Intel AMX → AVX-512 VNNI →
    AVX-512 baseline → AVX-2). No consumer-visible scalar tier; SoA
    never scalarises without ndarray.
  • Rule D — Configuration JSON / YAML / REST only. New candidate
    = new YAML; zero Rust changes, zero rebuilds.
  • Rule E — Wire surface IS the SIMD surface (object-oriented).
    LaneWidth enum mirrors lane types; DTOs expose methods
    (row(), lanes_f32x16(), kernel_signature()) not scalar bags.
  • Rule F — Serialisation at the edge only; never inside. One
    decode at REST ingress; one encode at response / Lance egress.
    No internal serde between layers.

The four-PR staircase it unlocks

Laid out in the plan:

  • PR A (Phase 0): hardens Wire surface + builder + auto_detect
    + precision validation. One upfront rebuild.
  • PR B (Phases 1+2): JIT kernels + token-agreement cert gate.
  • PR C (Phases 3+4): sweep driver + Lance logger + frontier
    analysis.
  • PR D (Phase 5): per-winner graduation to OrchestrationBridge.

The sweep runs unlimited candidates after the one upfront rebuild
because every candidate is a JIT kernel keyed on
CodecParams::kernel_signature, not a new binary.

Audit vs .claude/CODING_PRACTICES.md (EmbedAnything patterns)

Three gaps found, remediated as Phase 0 deliverables:

  • Gap 1 — auto-detect, not hardcode → D0.5 auto_detect.rs
    reads config.json next to safetensors (mirrors EmbedAnything's
    pattern).
  • Gap 3 — builder, not raw struct → D0.6 CodecParamsBuilder
    landed in this PR (fluent API + 14 tests).
  • Gap 5 — u8 vs i8 tables → Distance::{AdcU8, AdcI8} split
    for sign-handling / bipolar cancellation.

All five anti-patterns dodged (lib.rs stays declarations-only; hot
path is zero-copy + Arc'd KernelHandle; Rust-first; codebook
lookup only; precision ladder BF16 calibration → u8/i8 runtime →
f32 accumulator).

Starter YAML configs (Appendix A — 9 configs)

Concrete Phase 0 inputs live in configs/codec/*.yaml:

  • 00_baseline_passthrough — regression anchor (top1 = 1.000 exactly)
  • 01_pr220_baseline — reproduces #220 ICC ≈ 0.195 (pipeline
    sanity check)
  • 02_pr219_overfit_reproducer — calibration_rows = measurement_rows
    → pipeline's overfit guard must FAIL it
  • 10_fix_a_wider_codebook — 1024 centroids
  • 11_fix_b_residual_pq — residual depth 1
  • 12_fix_c_hadamard_rotation — Sylvester butterfly, stays on
    Tier-3 F32x16
  • 13_fix_d_opq_rotation — learned rotation + BF16x32 lane
    (matches tile_dpbf16ps)
  • 20_composite_a_plus_b — combinatorial-lift probe
  • 30_cross_product_sweep — 54-candidate initial grid

Code delivered in this PR (D0.6 + D0.7)

crates/lance-graph-contract/src/cam.rs, ~383 LOC, zero-dep:

  • LaneWidth, Distance, Rotation, ResidualSpec, CodecParams
  • CodecParamsBuilder with fluent API
  • CodecParamsError typed errors
  • CodecParams::kernel_signature() (JIT cache key; excludes seed)
  • CodecParams::is_matmul_heavy() (drives Tier-1 AMX dispatch)
  • 14 tests — all passing. Full suite: 147/147.

Precision-ladder validation fires before JIT compile:
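
A minimal sketch of that ordering, assuming a trimmed-down field set and hypothetical defaults (the real cam.rs is not reproduced on this page). The variant names come from the D0.6/D0.7 commit message; the `String` type for the OPQ blob, the default values in `new()`, and the subset of builder setters are illustrative assumptions. The point it tries to show is that `build()` returns a typed error before anything downstream could reach a JIT compile.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LaneWidth { F32x16, U8x64, F64x8, BF16x32 }

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Rotation {
    Identity,
    Hadamard { dim: u32 },
    Opq { blob: String, dim: u32 }, // `String` for `blob` is an assumption
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodecParamsError {
    ZeroDimension,
    OpqRequiresBf16,
    HadamardDimNotPow2,
    CalibrationEqualsMeasurement,
}

impl fmt::Display for CodecParamsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Self::ZeroDimension => "subspaces and centroids must be non-zero",
            Self::OpqRequiresBf16 => "OPQ rotation requires LaneWidth::BF16x32",
            Self::HadamardDimNotPow2 => "Hadamard dim must be a power of two",
            Self::CalibrationEqualsMeasurement => "calibration_rows == measurement_rows",
        })
    }
}

impl std::error::Error for CodecParamsError {}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CodecParams {
    pub subspaces: u32,
    pub centroids: u32,
    pub lane_width: LaneWidth,
    pub pre_rotation: Rotation,
    pub calibration_rows: u32,
    pub measurement_rows: u32,
}

#[derive(Debug, Clone)]
pub struct CodecParamsBuilder(CodecParams);

impl CodecParamsBuilder {
    /// Defaults here are hypothetical placeholders, not the PR #220 baseline.
    pub fn new() -> Self {
        Self(CodecParams {
            subspaces: 8,
            centroids: 256,
            lane_width: LaneWidth::F32x16,
            pre_rotation: Rotation::Identity,
            calibration_rows: 2048,
            measurement_rows: 512,
        })
    }
    pub fn centroids(mut self, n: u32) -> Self { self.0.centroids = n; self }
    pub fn lane_width(mut self, w: LaneWidth) -> Self { self.0.lane_width = w; self }
    pub fn rotation(mut self, r: Rotation) -> Self { self.0.pre_rotation = r; self }
    pub fn calibration_rows(mut self, n: u32) -> Self { self.0.calibration_rows = n; self }
    pub fn measurement_rows(mut self, n: u32) -> Self { self.0.measurement_rows = n; self }

    /// All checks run here, so an invalid shape never reaches a kernel cache.
    pub fn build(self) -> Result<CodecParams, CodecParamsError> {
        let p = self.0;
        if p.subspaces == 0 || p.centroids == 0 {
            return Err(CodecParamsError::ZeroDimension);
        }
        match &p.pre_rotation {
            Rotation::Opq { .. } if p.lane_width != LaneWidth::BF16x32 => {
                return Err(CodecParamsError::OpqRequiresBf16);
            }
            Rotation::Hadamard { dim } if !dim.is_power_of_two() => {
                return Err(CodecParamsError::HadamardDimNotPow2);
            }
            _ => {}
        }
        if p.calibration_rows == p.measurement_rows {
            return Err(CodecParamsError::CalibrationEqualsMeasurement);
        }
        Ok(p)
    }
}
```

Under these assumptions, `CodecParamsBuilder::new().rotation(Rotation::Opq { .. }).build()` fails with `OpqRequiresBf16` unless `.lane_width(LaneWidth::BF16x32)` is also set.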

Test Plan

  • cargo test -p lance-graph-contract --lib — 147/147 pass
  • cargo test -p lance-graph-contract --lib codec_params_tests
    14/14 new tests pass
  • Zero-dep preserved: stdlib only (DefaultHasher, core::fmt,
    core::error)
  • No serde in the contract — YAML/JSON deserialisation belongs
    to the consumer crate at the REST handler (Rule F)
  • Plan + INTEGRATION_PLANS.md append-only entry committed

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

claude added 9 commits April 20, 2026 17:41
Operationalises PR #220's "What's Needed to Fix" list (wider codebook,
residual PQ, Hadamard pre-rotation, OPQ) as a parameter sweep through
the lab endpoint — every codec difference is a JIT kernel, not a cargo
rebuild. Phase 0 hardens the Wire surface once; Phases 1-4 run
unlimited candidates without further rebuilds; Phase 5 graduates
winners to the canonical OrchestrationBridge surface.

Structure:

  Phase 0 — API hardening (one rebuild, then frozen):
    D0.1 CodecParams in WireCalibrate
    D0.2 WireTokenAgreement endpoint (I11 cert gate)
    D0.3 WireSweep streaming + Lance append
    D0.4 surface freeze

  Phase 1 — JIT codec kernels (rebuild-free):
    D1.1 CodecKernelCache via JitCompiler (Cranelift)
    D1.2 Rotation primitives (Identity / Hadamard / OPQ)
    D1.3 Residual PQ via JIT composition

  Phase 2 — Token-agreement harness (the I11 cert gate):
    D2.1 Reference-model loader (ndarray safetensors)
    D2.2 Decode-and-compare loop (top-k, per-layer MSE)
    D2.3 Handler wiring

  Phase 3 — Sweep driver + Lance logger
  Phase 4 — DataFusion frontier analysis
  Phase 5 — Graduation to OrchestrationBridge (per winner only)

~1,920 LOC total; 1 upfront rebuild; unlimited candidates afterwards.
Compare to naive path (4 fixes × 8-17 min × N tweaks = hundreds of
hours). All work behind --features lab until graduation.

INTEGRATION_PLANS.md prepended per APPEND-ONLY rule, citing PR #224
dependency for the architectural framing.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…::* + AMX + YAML/JSON

Binds four non-negotiable rules on every JIT-emitted kernel in
Phases 1-3:

  Rule A: Tensor access via array_window only.
          No manual index math, no raw pointer reach, no custom
          slice offset recompute. ndarray::simd::array_window
          handles stride / alignment / bounds / lane padding.

  Rule B: SIMD exclusively via ndarray::simd::*.
          No std::arch::*, no ndarray::hpc::*, no hand-rolled
          intrinsics. Missing primitive → add to ndarray first,
          never bypass the canonical surface from the JIT.

  Rule C: Backend dispatch via simd_caps() (AMX-ready).
          JIT emits generic IR calling ndarray::simd primitives.
          Those resolve to AMX tiles on aarch64-apple-darwin with
          AMX capability, AVX-512 on x86_64, NEON on aarch64, and
          scalar fallback otherwise. Rotation and distance-table
          kernels benefit most from AMX (matmul-heavy paths).
          JIT never emits AMX intrinsics directly — it calls
          matmul_tiled / hadamard_butterfly / etc., which dispatch
          internally.

  Rule D: Configuration is JSON / YAML / REST only.
          No codec candidate defined in Rust. One schema
          (CodecParams) serialised three ways:
            - YAML under configs/codec/*.yaml (human-authored)
            - JSON payload (curl / REST)
            - REST endpoint body at /v1/shader/calibrate
          New candidate = new YAML/JSON file. Zero Rust changes.
          Zero rebuilds.

Enforcement: Phase 0 ships two new test gates —
  - kernel_contract_test scans emitted IR for banned symbols
    (std::arch, ndarray::hpc) and required symbols (array_window).
  - amx_dispatch_test (aarch64-apple-darwin-only) verifies
    simd_caps().has_amx() and trace records backend = "amx" for
    rotation kernels on M-series.

D1.1-D1.3 body sketches updated to show the contract in practice:
every decode / rotation / composition stage reads via array_window
and calls ndarray::simd primitives (adc_distances_simd,
hadamard_butterfly, matmul_tiled, sub_tiled, add_tiled), never
raw intrinsics.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…hanges

Corrections after hand-grep vs curated knowledge (encoding-ecosystem.md,
codec-findings-2026-04-20.md, rotation_vs_error_correction.md) and
user directives "its all there, dont touch, just be aware how to use
crate::simd", "wire accordingly into the lab infra", "via struct of
arrays":

  - slice::array_windows::<N>() IS real — stdlib, stable Rust 1.77,
    const-generic. I conflated it with a missing ndarray::array_window
    (singular); corrected.

  - AMX in ndarray is INTEL (Sapphire Rapids TDPBUSD/TDPBF16PS via
    stable inline asm on Rust 1.94, per src/simd_amx.rs header),
    NOT Apple. rust-lang #126622 keeps AMX intrinsics nightly;
    inline asm at src/hpc/amx_matmul.rs is the stable consumer path.
    Verified on kernel 6.18.5 with XCR0 bits 17+18 set.

  - Real primitive names (no hallucinated matmul_tiled /
    hadamard_butterfly): tile_dpbusd, tile_dpbf16ps, vnni_pack_bf16
    for tier-1 AMX; vnni_matvec / matvec_dispatch for tier-2 VNNI;
    F32x16 / U8x64 / Fingerprint<N> for tier-3 AVX-512 baseline.

  - Polyfill hierarchy per user directive
    (simd_amx > simd_avx512 > simd_avx2 fallback):
      Tier 1: Intel AMX tile (256 MACs/instr)
      Tier 2: AVX-512 VNNI (64 MACs/instr)
      Tier 3: AVX-512 baseline F32x16 (16 MACs/instr, mandatory
              default per ndarray's .cargo/config.toml
              target-cpu=x86-64-v4)
      Tier 4: AVX-2 F32x8 fallback
      Tier 5: scalar reference

  - Rule A wires SoA: the &[u8] slice array_windows iterates comes
    from a BindSpace column (FingerprintColumns / QualiaColumn /
    MetaColumn / EdgeColumn) per the AGI-as-SoA identity. No new
    data structures — the SoA column IS the input surface.

  - Dropped all "Phase 0 ndarray prerequisite" language. Everything
    the sweep needs exists in ndarray today; this plan wires the
    existing surface into cognitive-shader-driver (REST handlers +
    CodecKernelCache + CodecResearchBridge). Zero ndarray changes.

  - Added reality-check against codec-findings-2026-04-20.md so the
    sweep does NOT re-derive measured winners: Had-Q5×D-R already
    ICC ≈ 0.99 with shared codebook; I8-Hadamard leads for per-row-
    only at ICC ≈ 0.9; zipper serves bundling axis, not argmax;
    fractal leaf descriptors are DEAD (sign-flip invariant). The
    sweep focuses on #220's four unmeasured candidates (wider
    codebook / residual PQ / Hadamard pre-rotation / OPQ) and on
    the missing axis — token agreement, not reconstruction ICC.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
User directive: "i should never have to remind you to use simd
because the struct of arrays never ever does scalar without ndarray."

Corrections:

  - Removed consumer-visible "Tier 5 scalar" row from the polyfill
    table. Scalar fallback (when it exists at all for exotic targets)
    lives INSIDE ndarray::simd::* — the consumer never hand-rolls
    a scalar loop on a SoA path.

  - Added iron rule before the tier table: every tier in the chain
    calls ndarray::simd::* / ndarray::simd_amx::* /
    ndarray::hpc::amx_matmul::* — if a kernel runs scalar on the
    SoA path, the SoA invariant is broken.

  - Dispatch pseudo-code cleaned: the else branch lands on
    ndarray::simd::F32x16 (Tier 3 mandatory floor via target-cpu=
    x86-64-v4). No "else scalar loop" short-circuit exists. If
    ndarray::simd were unavailable, SoA wouldn't be the right path.
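
The shape of that dispatch can be sketched as below. This is an illustration only: `Caps` and `select_backend` are hypothetical names, the real capability query lives in ndarray, and sending only matmul-heavy kernels to the VNNI tier is this sketch's guess. What it tries to capture from the text above is that the final branch is the AVX-512 F32x16 floor and that no scalar variant exists for a caller to land on.

```rust
/// Backends a SoA kernel may report. There is deliberately no `Scalar` variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Amx,
    Vnni,
    Avx512,
}

impl Backend {
    /// Strings follow the `backend` values named for the calibrate response.
    pub fn as_str(self) -> &'static str {
        match self {
            Backend::Amx => "amx",
            Backend::Vnni => "vnni",
            Backend::Avx512 => "avx512",
        }
    }
}

/// Hypothetical capability snapshot; stands in for whatever ndarray reports.
#[derive(Debug, Clone, Copy)]
pub struct Caps {
    pub amx: bool,
    pub avx512_vnni: bool,
}

/// Walks the tiers top-down. The else branch is Tier 3, never a scalar loop.
pub fn select_backend(caps: &Caps, matmul_heavy: bool) -> Backend {
    if matmul_heavy && caps.amx {
        Backend::Amx
    } else if matmul_heavy && caps.avx512_vnni {
        Backend::Vnni
    } else {
        Backend::Avx512
    }
}
```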

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…oriented)

User directive: "the api for lab needs to be simd object oriented
surface."

Rule E binds the lab Wire DTOs to the SIMD shapes they feed. The Wire
types are NOT convenience scalar bags that get reassembled into SIMD
structures internally — they ARE the SIMD surface, serialised.

Four consequences:

(i)  Lane-shaped aggregates. LaneWidth enum mirrors ndarray::simd::*
     lane types (F32x16, U8x64, F64x8, BF16x32). Every tensor-carrying
     DTO names its lane_width explicitly.

(ii) Methods, not bags. WireTensorView exposes row() / row_count() /
     lanes_f32x16() / subspace(); CodecParams exposes
     kernel_signature() / lane_width() / is_matmul_heavy(). Consumers
     never reassemble a tensor from a Vec<f32>.

(iii) Kernel signature keying. CodecParams::kernel_signature() returns
      a stable hash only over fields that shape the emitted IR. JIT
      cache keys on this object-computed signature; adding an unrelated
      config field does not invalidate entries.

(iv) Serialisation preserves alignment. Decoded WireTensorView bytes
     land in a 64-byte-aligned buffer; consumers call
     slice::array_windows::<64>() + F32x16::from_slice directly, no
     adapter, no copy, no re-align.

Plus three cleanups from prior corrections:

- kernel_contract_test now scans IR for the real symbols:
  ndarray::simd::*, ndarray::simd_amx::*, ndarray::hpc::amx_matmul::*
  (allowed) and std::arch / simd_avxNNN reach (banned).

- amx_dispatch_test corrected: x86_64-gated (not aarch64-apple-darwin),
  calls ndarray::simd_amx::amx_available(). When true on Sapphire
  Rapids+ runners, asserts backend = "amx" trace for matmul-heavy
  candidates; when false, verifies Tier-2 VNNI or Tier-3 F32x16
  selection — NEVER scalar.

- New wire_object_surface_test round-trips WireCalibrate +
  WireTensorView through JSON/gRPC and proves the decoded bytes are
  consumable with zero adapter code via array_windows + F32x16.

- D1.1 body sketch cleaned: dropped fictional array_window (singular);
  imports simd_caps from ndarray::hpc::simd_caps (real path); cache
  uses RwLock for interior mutability per ndarray data-flow rule
  ("no &mut self during computation"); kernel_signature comes from
  CodecParams method (Rule E), not a free-function hash.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
User directive: "Serialisation only once when touching as rest, no
Serialisation EVER inside."

Rule F binds serialisation to the two legitimate edges:

  Ingress (once per request):
    REST/gRPC handler decodes JSON/protobuf → Rust objects
    WireTensorView.bytes_base64 → 64-byte-aligned [u8] buffer
    YAML config file → parsed CodecParams at load time

  Egress (once per response / per candidate):
    REST/gRPC response encodes Rust result → JSON/protobuf
    Lance append writes candidate row → Arrow columnar

Everything between ingress and egress is in-memory Rust objects or
zero-copy &[u8] SoA slices. No JSON, no YAML, no protobuf, no
bincode, no re-encode for "debug dumps." Traces flow as Rust
objects through ShaderSink; only the final sink at the egress
boundary may serialise.

Hard prohibitions inside the pipeline:
- serde_json::to_string between layers
- bincode::serialize for L1↔L2↔L3 handoffs
- prost::Message::encode inside the JIT loop
- re-parsing YAML per candidate (parse once at load, cache object)
- debug-JSON dumps inside hot paths

Why load-bearing:
1. Alignment survives — decoded tensor bytes land once in a
   64-byte-aligned buffer; no intermediate re-pack.
2. JIT cache keys stay stable — kernel_signature hashes the Rust
   object directly, no "same config, different whitespace →
   different hash → cache miss" trap.
3. Token-agreement comparisons stay honest — both Passthrough and
   candidate paths consume the same decoded buffer; any internal
   re-encode would introduce precision drift that mimics or masks
   codec error.
4. Sweep throughput — decode at 2-10 GB/s is fine once; repeated
   re-serialisation would turn a JIT-fast sweep into serde-bound.

Enforcement: new test gate no_internal_serialisation_test in
Phase 0 scans codec_research.rs / codec_bridge.rs / token_agreement.rs
/ markov_bundle.rs for forbidden symbols (serde_json::*, bincode::*,
prost encode/decode outside handlers). Fails the build if any such
call appears outside src/serve.rs / src/grpc.rs ingress/egress
handlers or the Lance append writer.
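
One way such a gate could work is a plain text scan over the pipeline sources. The sketch below is an assumption about the mechanism, not the actual test: `find_forbidden` and the `FORBIDDEN` list are hypothetical, and the real gate may well inspect the code differently (for example, by parsing rather than substring matching).

```rust
/// Symbols the pipeline files must not mention (illustrative subset).
pub const FORBIDDEN: [&str; 3] = ["serde_json::", "bincode::", "prost::Message::encode"];

/// Returns (1-based line number, symbol) for every forbidden symbol found
/// on a line that is not a `//` comment.
pub fn find_forbidden(source: &str) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let code = line.trim_start();
        if code.starts_with("//") {
            continue; // mentions inside line comments are not calls
        }
        for sym in FORBIDDEN {
            if code.contains(sym) {
                hits.push((idx + 1, sym));
            }
        }
    }
    hits
}
```

A test built on this would read each listed file with `std::fs::read_to_string` and assert that the returned vector is empty.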

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Nine concrete YAML configs for configs/codec/*.yaml that Phase 0
will consume:

  00_baseline_passthrough         — regression anchor (top1=1.000 exactly)
  01_pr220_baseline               — negative control, reproduces #220 ICC 0.195
  02_pr219_overfit_reproducer     — negative control, split-test must FAIL
  10_fix_a_wider_codebook         — #220 (a) 1024 centroids
  11_fix_b_residual_pq            — #220 (b) residual depth=1
  12_fix_c_hadamard_rotation      — #220 (c) Hadamard pre-rotation
  13_fix_d_opq_rotation           — #220 (d) OPQ learned rotation
  20_composite_a_plus_b           — composition probe for combinatorial lift
  30_cross_product_sweep          — SweepGrid for D3.1 initial sweep

Each YAML:

  - Names lane_width explicitly (Rule E) so the JIT compiles the
    right SIMD tier. BF16x32 for OPQ (AMX bf16 tile path) — others
    default to F32x16.
  - Carries a notes: block stating the expected measurement
    outcome, so Phase 0's regression detection has ground truth
    to check against (e.g., baseline reproducer must produce
    ICC ≈ 0.195, overfit reproducer must FAIL the split-test).
  - Separates calibration_rows from measurement_rows where
    relevant (pr219_overfit_reproducer sets them equal so the
    pipeline refuses to report the ICC, demonstrating the guard
    that prevents PR #219's overfit-on-training artefact from
    recurring).

30_cross_product_sweep specifies the initial 54-candidate grid
(1 subspace × 3 centroids × 3 residuals × 3 rotations × 1 distance
× 2 lane widths). Expected JIT compile budget: ~800 ms one-time;
everything after is cache hits per Rule A/B.
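
The candidate count follows from the axis sizes: 1 × 3 × 3 × 3 × 1 × 2 = 54. A small sketch of the expansion, where only the axis sizes come from the text above; the concrete axis values, `SweepGrid`, and `Candidate` are hypothetical stand-ins for whatever the YAML actually lists.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub subspaces: u32,
    pub centroids: u32,
    pub residual_depth: u32,
    pub rotation: &'static str,
    pub distance: &'static str,
    pub lane_width: &'static str,
}

pub struct SweepGrid {
    pub subspaces: Vec<u32>,
    pub centroids: Vec<u32>,
    pub residual_depths: Vec<u32>,
    pub rotations: Vec<&'static str>,
    pub distances: Vec<&'static str>,
    pub lane_widths: Vec<&'static str>,
}

impl SweepGrid {
    /// Full cross product of all six axes.
    pub fn expand(&self) -> Vec<Candidate> {
        let mut out = Vec::new();
        for &subspaces in &self.subspaces {
            for &centroids in &self.centroids {
                for &residual_depth in &self.residual_depths {
                    for &rotation in &self.rotations {
                        for &distance in &self.distances {
                            for &lane_width in &self.lane_widths {
                                out.push(Candidate {
                                    subspaces, centroids, residual_depth,
                                    rotation, distance, lane_width,
                                });
                            }
                        }
                    }
                }
            }
        }
        out
    }
}

/// Axis values are placeholders; only the axis sizes (1, 3, 3, 3, 1, 2) are sourced.
pub fn initial_grid() -> SweepGrid {
    SweepGrid {
        subspaces: vec![8],
        centroids: vec![256, 512, 1024],
        residual_depths: vec![0, 1, 2],
        rotations: vec!["identity", "hadamard", "opq"],
        distances: vec!["adc_u8"],
        lane_widths: vec!["F32x16", "BF16x32"],
    }
}
```

Note that a raw cross product like this includes pairs the precision-ladder validation would refuse (OPQ with F32x16, for instance); how the real grid treats those is not stated here.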

Operating principle reiterated at the end: adding a candidate is
authoring a YAML; changing params is editing YAML; Rust reads
YAML once at ingress (Rule F) and never re-serialises. Sweep
logger appends result rows to Lance — the only egress beyond the
REST response.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Three gaps found in the 8-item checklist; remediations folded into
Phase 0 as new deliverables so they ship from day one, not as
follow-up:

  Gap 1 — auto-detect, don't hardcode.
    Current plan expects caller to supply lane_width + tensor shape.
    Patch: D0.5 new auto_detect.rs (~140 LOC) reads config.json
    next to the safetensors and returns ModelFingerprint {
    architecture, hidden_size, lane_width default, tokenizer_class,
    … }. Consumed by WireTokenAgreement when tensor_view.lane_width
    is omitted. Mirrors EmbedAnything auto_detect.rs (6 tests).

  Gap 3 — builder, not raw struct assembly.
    Current plan shows CodecParams assembled directly.
    Patch: D0.6 CodecParamsBuilder fluent API in
    lance-graph-contract::cam. Used by sweep driver / tests /
    frontier analysis; YAML ingress still produces CodecParams via
    serde. Mirrors EmbedAnything builder.rs (7 tests).

  Gap 5 — u8 vs i8 distance tables.
    Current plan treats "adc" as one distance variant.
    Patch: split distance into adc_u8 / adc_i8 at the YAML + Rust
    enum level. Sign-handling affects bipolar cancellation per
    codec-findings-2026-04-20.md §I1 sign-flip.

Five remain clean:

  Item 2 (sink pattern) — ShaderSink trait + Lance append are sinks.
  Item 4 (feature gates) — --features lab / serve / grpc declared.
  Item 6 (per-role scales) — one role per YAML preserves z-scale.
  Item 7 (calibration↔runtime boundary) — calibration_rows vs
  measurement_rows already split; 02_pr219_overfit_reproducer is
  the explicit test that enforces the boundary.
  Item 8 (no forward pass) — codebook/tile lookup only, per I6.

All 5 anti-patterns dodged: lib.rs stays declarations-only; hot
path is zero-copy &[u8] into SoA + Arc'd KernelHandle (no clones);
Rust-first API; codebook/tile lookup (no matmul inner loop);
precision ladder BF16 calibration → u8/i8 runtime → f32 accumulator
(enforced by Rule E's LaneWidth on the Wire DTO matching the JIT
kernel input format).

New D0.7 — precision-ladder contract. CodecParams validation
refuses impossible shapes at ingress (e.g., { lane_width: F32x16,
rotation: Opq(…) } — OPQ must use BF16x32 to match tile_dpbf16ps).
Validation fires before any JIT compile.

Phase 0 LOC bumps: ~480 → ~700. Still one upfront rebuild.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…dation

First Phase 0 code deliverable from codec-sweep-via-lab-infra-v1 plan.
Zero-dep contract-side types the lab API (cognitive-shader-driver)
will carry into JIT compilation.

Adds to crates/lance-graph-contract/src/cam.rs (~290 LOC):

  Enums (Rule E — Wire surface IS the SIMD surface, object-oriented):
    LaneWidth { F32x16, U8x64, F64x8, BF16x32 }  — mirrors ndarray::simd::*
    Distance  { AdcU8, AdcI8 }                    — CODING_PRACTICES gap 5
                                                    (sign-handling /
                                                    bipolar cancellation)
    Rotation  { Identity, Hadamard{dim}, Opq{blob,dim} }

  Structs:
    ResidualSpec  { depth, centroids }
    CodecParams   { subspaces, centroids, residual, lane_width,
                    pre_rotation, distance, calibration_rows,
                    measurement_rows, seed }

  Builder (CODING_PRACTICES gap 3 — fluent API, not raw-struct):
    CodecParamsBuilder::new()
      .subspaces(u32).centroids(u32).residual(ResidualSpec)
      .lane_width(LaneWidth).rotation(Rotation).distance(Distance)
      .calibration_rows(u32).measurement_rows(u32).seed(u64)
      .build() -> Result<CodecParams, CodecParamsError>

  Validation fires BEFORE any JIT compile (D0.7 precision ladder):
    - ZeroDimension          — subspaces == 0 or centroids == 0
    - OpqRequiresBf16        — OPQ routes through tile_dpbf16ps;
                               only LaneWidth::BF16x32 is valid
    - HadamardDimNotPow2     — Sylvester construction needs dim = 2^k
    - CalibrationEqualsMeasurement — overfit guard: refuses to emit
                               ICC when calibration_rows ==
                               measurement_rows (reproduces PR #219's
                               128-row trained-and-tested artifact)

  Methods on CodecParams:
    kernel_signature() -> u64   — JIT cache key (Rule E); excludes
                                  seed so calibration-sample changes
                                  don't invalidate cached kernels
    is_matmul_heavy() -> bool   — true for OPQ or centroids > 512;
                                  drives Tier-1 AMX dispatch decision
                                  (Rule C polyfill hierarchy)

  Rotation::is_matmul() -> bool  — Identity and Hadamard are false
                                  (butterfly stays on Tier-3 F32x16);
                                  only Opq returns true
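
A sketch of how these three methods might look, using only std. Field types (`u32` depths, a `String` blob id) are assumptions. The hash leaves out `seed` and nothing else, since that is the only exclusion the text names. The `> 512` threshold follows the method description above; the test list below words it as ≥512, so treat the boundary as unconfirmed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum LaneWidth { F32x16, U8x64, F64x8, BF16x32 }

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Distance { AdcU8, AdcI8 }

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Rotation {
    Identity,
    Hadamard { dim: u32 },
    Opq { blob: String, dim: u32 }, // blob type is an assumption
}

impl Rotation {
    /// Only the learned OPQ rotation is a dense matmul.
    pub fn is_matmul(&self) -> bool {
        matches!(self, Rotation::Opq { .. })
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ResidualSpec { pub depth: u32, pub centroids: u32 }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CodecParams {
    pub subspaces: u32,
    pub centroids: u32,
    pub residual: ResidualSpec,
    pub lane_width: LaneWidth,
    pub pre_rotation: Rotation,
    pub distance: Distance,
    pub calibration_rows: u32,
    pub measurement_rows: u32,
    pub seed: u64,
}

impl CodecParams {
    /// Hashes every field except `seed`, so reseeding keeps the cache key.
    pub fn kernel_signature(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.subspaces.hash(&mut h);
        self.centroids.hash(&mut h);
        self.residual.hash(&mut h);
        self.lane_width.hash(&mut h);
        self.pre_rotation.hash(&mut h);
        self.distance.hash(&mut h);
        self.calibration_rows.hash(&mut h);
        self.measurement_rows.hash(&mut h);
        h.finish()
    }

    /// Threshold taken as `> 512`; see the caveat in the lead-in.
    pub fn is_matmul_heavy(&self) -> bool {
        self.pre_rotation.is_matmul() || self.centroids > 512
    }
}
```

`DefaultHasher::new()` always starts from the same state, so the signature is repeatable within one build. The standard library does not promise the algorithm stays the same across Rust releases, which matters only if signatures are ever persisted.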

14 new tests covering:
  - builder default matches PR #220 baseline shape
  - each validation variant fires correctly
  - OPQ + BF16x32 accepted; OPQ + F32x16 rejected with typed error
  - Hadamard + non-pow2 dim rejected
  - overfit guard fires on calibration == measurement
  - kernel_signature stable across identical builds
  - kernel_signature excludes seed (cache stays hot)
  - kernel_signature changes with centroids / rotation kind
  - is_matmul_heavy detects OPQ AND wide codebook (≥512 centroids)

Zero-dep preserved (stdlib only: std::collections::hash_map::
DefaultHasher for kernel_signature, core::fmt + core::error for
error types). No serde in the contract — YAML/JSON deserialisation
belongs to the consumer crate, which will produce CodecParams via
serde at the REST handler (Rule F — serialisation at edge only).

Tests: 147/147 contract suite passing (133 prior + 14 new).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI merged commit 9db3a47 into main Apr 20, 2026
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
Retroactive hygiene for the recent PR arc + prospective enforcement
so the gap never recurs. User directive: "should have happened to
begin with."

LATEST_STATE.md:
  - Header: "Last updated 2026-04-20 post PR #224 (PR #225 open)"
  - Recently Shipped table: prepended rows for #225 (open), #224,
    and #223 with full shipped-content summaries
  - Contract Inventory: expanded cam:: entry with all new codec-
    sweep types (LaneWidth / Distance / Rotation / ResidualSpec /
    CodecParams / CodecParamsBuilder / CodecParamsError) including
    the precision-ladder-fires-before-JIT invariant
  - Active Branches: recorded claude/teleport-session-setup-wMZfb
    and its three merged PRs
  - Active Integration Plans: added codec-sweep-via-lab-infra-v1
    alongside elegant-herding-rocket-v1
  - Immediate Next Work: codec-sweep Phase 0 remainder (D0.1/0.2/0.3/
    0.5) + the elegant-herding Phase 2 block

PR_ARC_INVENTORY.md (APPEND-ONLY — PREPEND only):
  - #225 entry: plan + CodecParams/Builder/precision validation +
    rules A-F locked + decisions for future PRs
  - #224 entry: three-part lab stack + thinking harvest + I11
    measurability locked
  - #223 entry: LAB-ONLY firewall + AGI-as-SoA + I1-I10 invariants
    locked (the cross-cutting architectural ruleset this workspace
    now enforces)

STATUS_BOARD.md:
  - New section: codec-sweep-via-lab-infra-v1 with 18 D-ids across
    5 phases (D0.6/D0.7 marked Shipped-in-#225; remainder Queued)

EPIPHANIES.md (APPEND-ONLY — PREPEND only, 6 new dated entries):
  - Board hygiene is the driving seat, not cleanup (this session's
    self-reflection turned into a rule)
  - Codec cert is token agreement, not synthetic ICC (#219 → #220
    arc; #225 CalibrationEqualsMeasurement typed rejection)
  - Lab REST surface is three-part (API + Planner + JIT), not just
    scaffolding
  - Thinking harvest via REST/Cypher = the AGI magic bullet
  - SoA never scalarises without ndarray (iron rule Rule C)
  - AGI is the glove, not the oracle — four-axis SoA is what you
    wear

CLAUDE.md — new top-level § "The Stance — Driving Seat +
AGI-as-Glove (P0, read first)":

  - Explicit driving-seat posture: the session STEERS the stack,
    doesn't observe it
  - AGI-as-glove doctrine concrete: topic → FingerprintColumns,
    angle → QualiaColumn, thinking → MetaColumn, planner →
    EdgeColumn. New capability lands as a new column, not a layer.
  - MANDATORY Board-Hygiene Rule as a table: every PR that adds a
    type / plan / D-id / epiphany / tech-debt / issue MUST update
    the corresponding board file IN THE SAME COMMIT. Retroactive
    hygiene (merge PR → later cleanup) is now an anti-pattern the
    rule forbids.
  - "Consult, don't guess" — agent/knowledge-first discipline:
    specialist-agent card → knowledge doc → board inventory →
    only then grep. Subagent spawn with curated docs beats main-
    thread grep.

147/147 contract suite still passing. Doc-only PR otherwise
(Cargo.toml / src/* unchanged; the orphan serde_yaml/base64 deps
from the timed-out bus-compiler subagent were reverted — they'll
land with D0.1/D0.3 when the Wire code lands).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI added a commit that referenced this pull request Apr 20, 2026
…p-wMZfb

board hygiene + CLAUDE.md driving-seat tightening (post #223/#224/#225)
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
First code deliverable of codec-sweep-via-lab-infra Phase 0 on the
consumer side. Extends the lab Wire surface to carry the CodecParams
shape from PR #225 with an object-oriented tensor DTO that decodes
once at ingress into a 64-byte-aligned buffer consumable directly by
F32x16::from_slice via slice::array_windows::<64>.

crates/cognitive-shader-driver/src/wire.rs:

  Serde mirrors for contract::cam types (zero serde in the contract
  per CLAUDE.md rule 5):
    - WireLaneWidth {F32x16, U8x64, F64x8, BF16x32}
    - WireDistance {AdcU8, AdcI8}
    - WireRotation {Identity, Hadamard{dim}, Opq{matrix_blob_id, dim}}
    - WireResidualSpec {depth, centroids}
    - WireCodecParams — mirrors CodecParams one-for-one
    - From/TryFrom conversions; TryFrom<WireCodecParams> for
      CodecParams runs the precision-ladder validation (OPQ↔BF16x32,
      Hadamard pow2, overfit guard rejecting calibration_rows ==
      measurement_rows) BEFORE any JIT compile would fire.

  WireTensorView + AlignedBytes (Rule A + Rule E + Rule F):
    - shape [u32; 2] + lane_width + bytes_base64 on the wire
    - decode() base64-decodes ONCE at ingress into AlignedBytes
      (heap, 64-byte aligned via Layout::from_size_align, Drop
      deallocates with matching layout, Send + Sync)
    - row() / subspace() / row_count() / col_count() / row_bytes() /
      element_bytes() — object-oriented methods per Rule E, mirror
      the SoA+SIMD ops the JIT kernel will perform
    - is_aligned_64() for the kernel_contract_test gate
    - WireTensorViewError {Base64, SizeMismatch, ZeroShape}
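
A rough sketch of the aligned-buffer half of this, using only `std::alloc`. It assumes the base64 step has already produced a byte slice (that step needs an external crate and is left out), and the method signatures are illustrative rather than copied from wire.rs.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Heap buffer whose start address is a multiple of 64 bytes.
pub struct AlignedBytes {
    ptr: NonNull<u8>,
    len: usize,
    layout: Layout,
}

impl AlignedBytes {
    /// Copies `src` once into a fresh 64-byte-aligned allocation.
    /// Returns `None` for empty input (zero-size allocation is not allowed).
    pub fn from_slice(src: &[u8]) -> Option<Self> {
        if src.is_empty() {
            return None;
        }
        let layout = Layout::from_size_align(src.len(), 64).ok()?;
        // SAFETY: `layout` has non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = match NonNull::new(raw) {
            Some(p) => p,
            None => handle_alloc_error(layout),
        };
        // SAFETY: `ptr` is valid for `src.len()` bytes and cannot overlap `src`.
        unsafe { std::ptr::copy_nonoverlapping(src.as_ptr(), ptr.as_ptr(), src.len()) };
        Some(Self { ptr, len: src.len(), layout })
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to `len` initialised bytes owned by `self`.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }

    pub fn is_aligned_64(&self) -> bool {
        (self.ptr.as_ptr() as usize) % 64 == 0
    }

    /// Zero-copy view of row `index`, given the row size in bytes.
    pub fn row(&self, index: usize, row_bytes: usize) -> Option<&[u8]> {
        let start = index.checked_mul(row_bytes)?;
        let end = start.checked_add(row_bytes)?;
        self.as_slice().get(start..end)
    }
}

impl Drop for AlignedBytes {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) };
    }
}

// SAFETY: the buffer is uniquely owned and only handed out as `&[u8]`.
unsafe impl Send for AlignedBytes {}
unsafe impl Sync for AlignedBytes {}
```

With a 16-column F32 row (`row_bytes = 64`), every row returned by `row()` starts on a 64-byte boundary as long as the buffer itself does.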

  WireCalibrateRequest extended additively:
    - New: params: Option<WireCodecParams> + tensor_view:
      Option<WireTensorView> (the new path)
    - Legacy: num_subspaces / num_centroids / kmeans_iterations /
      max_rows / icc_samples preserved for back-compat

  WireCalibrateResponse extended additively:
    - New: kernel_hash (= CodecParams::kernel_signature()),
      compile_time_us, backend ("amx" | "vnni" | "avx512" |
      "avx2" | "legacy")
    - Never "scalar" on a SoA path — the iron rule enforced at
      the response contract

crates/cognitive-shader-driver/Cargo.toml:
  - serve feature now pulls base64 (v0.22) + bytemuck (v1)
    optional deps. No new features; these belong under the
    existing lab umbrella.

crates/cognitive-shader-driver/src/codec_research.rs:
  - Legacy calibrate_tensor path fills the new response fields
    with zeros + backend = "legacy". D1.1 (JIT kernel) populates
    them meaningfully when it lands.

Tests (8 new, all passing under --features serve):
  - wire_codec_params_round_trip_to_contract
    — OPQ + BF16x32 + wide codebook → builds cleanly,
      is_matmul_heavy true
  - wire_codec_params_rejects_opq_with_f32x16
    — precision-ladder guard typed-rejects the wrong lane at ingress
  - wire_codec_params_rejects_calibration_equals_measurement
    — overfit guard typed-rejects the PR #219 pattern at ingress
  - wire_codec_params_deserializes_from_minimal_json
    — serde defaults correct (lane_width=F32x16, distance=AdcU8,
      rotation=Identity, calibration_rows=2048, seed=42)
  - wire_tensor_view_decode_lands_in_64byte_aligned_buffer
    — explicit Rule A proof: decoded AlignedBytes.is_aligned_64(),
      and slice::array_windows::<64>() yields exactly one window
      per 16-col F32 row
  - wire_tensor_view_rejects_size_mismatch
    — typed error for base64 payload not matching declared shape
  - wire_tensor_view_subspace_slicing
    — subspace(row, k, sub_bytes) returns the expected offset+len
  - wire_calibrate_request_{accepts_new_params_field,
    back_compat_legacy_fields}
    — both the new (params-carrying) and legacy payload shapes
      deserialise correctly

Board hygiene in the SAME commit (per CLAUDE.md Mandatory
Board-Hygiene Rule from PR #226):
  - STATUS_BOARD.md D0.1 row: Queued → In PR
  - LATEST_STATE.md cognitive-shader-driver section: new
    subsection listing the Wire surface types landed

Test summary: 55/55 cognitive-shader-driver tests pass under
--features serve; 147/147 lance-graph-contract tests pass.

Rules honored:
  Rule A — stdlib slice::array_windows::<N>() + ndarray::simd::*,
           proven by the test that calls array_windows::<64>() on
           the decoded row
  Rule B — no std::arch, no hpc::simd_avxNNN reach; ndarray::simd::*
           imports only
  Rule C — n/a for DTO code (JIT tier selection lands D1.1)
  Rule D — JSON/YAML/REST only, no in-Rust CodecParams construction
           on the Wire side
  Rule E — Wire surface IS the SIMD surface; LaneWidth explicit,
           methods not scalar bags, 64-byte-aligned decode proven
  Rule F — decode ONCE at ingress via WireTensorView::decode;
           WireCalibrateRequest::params: Option<WireCodecParams>
           carries the Rust object through the rest of the pipeline
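
The Rule A / Rule E alignment claim can be illustrated with std only. In this sketch `AlignedRow` is a hypothetical stand-in for the PR's `AlignedBytes`, fixed at one 16-column F32 row (64 bytes), and it uses `windows(64)` on the byte slice instead of `array_windows::<64>()`; the window count is the same.

```rust
/// Hypothetical stand-in for `AlignedBytes`: one 64-byte row,
/// forced onto a 64-byte boundary by the repr attribute.
#[repr(C, align(64))]
pub struct AlignedRow {
    bytes: [u8; 64],
}

impl AlignedRow {
    /// Pack 16 f32 columns into the row as little-endian bytes.
    pub fn from_f32(cols: [f32; 16]) -> Self {
        let mut bytes = [0u8; 64];
        for (i, v) in cols.iter().enumerate() {
            bytes[i * 4..i * 4 + 4].copy_from_slice(&v.to_le_bytes());
        }
        AlignedRow { bytes }
    }

    /// True when the first byte sits on a 64-byte boundary.
    pub fn is_aligned_64(&self) -> bool {
        (self.bytes.as_ptr() as usize) % 64 == 0
    }

    /// Number of 64-byte windows over the row.
    pub fn window_count(&self) -> usize {
        self.bytes.windows(64).count()
    }

    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```

A 64-byte slice has exactly one window of width 64, which is the "one window per 16-col F32 row" property the test checks.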

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
Two more Phase 0 deliverables from codec-sweep-via-lab-infra-v1.
66/66 cognitive-shader-driver tests pass under --features serve (+11 new).

D0.5 — auto_detect.rs (~300 LOC, CODING_PRACTICES gap 1):
  Reads <model_path>/config.json (HuggingFace layout) and returns
  ModelFingerprint { architecture, hidden_size, n_layers,
  tokenizer_class, vocab_size, default_lane_width, default_distance }.

  Architecture routing:
    llama / qwen / qwen2 / qwen3 / mistral / mixtral → BF16x32 (AMX)
    bert / modernbert / xlm-roberta / generic → F32x16 (AVX-512)
  torch_dtype override wins over architecture heuristic.

  Typed errors: ConfigMissing / Io / Parse / MissingField {path, field}.
  Best-effort tokenizer_class from tokenizer_config.json.

  8 tests: llama / qwen3-with-tokenizer / bert / modernbert / xlm-roberta
  (d_model alias) / generic fallback / missing-config / missing-field.
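
A minimal sketch of the routing rule described above, assuming the decision is driven by the `model_type` string from config.json. The function name and the exact `torch_dtype` strings that trigger the override ("bfloat16", "float32") are assumptions for illustration; the commit only states that the override wins.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LaneWidth {
    F32x16,  // AVX-512 path
    BF16x32, // AMX path
}

/// Pick a default lane width (hypothetical helper).
/// An explicit dtype takes precedence over the architecture heuristic.
pub fn default_lane_width(model_type: &str, torch_dtype: Option<&str>) -> LaneWidth {
    // Assumed override values; any other dtype falls through.
    match torch_dtype {
        Some("bfloat16") => return LaneWidth::BF16x32,
        Some("float32") => return LaneWidth::F32x16,
        _ => {}
    }
    match model_type {
        "llama" | "qwen" | "qwen2" | "qwen3" | "mistral" | "mixtral" => LaneWidth::BF16x32,
        // bert / modernbert / xlm-roberta and the generic fallback
        _ => LaneWidth::F32x16,
    }
}
```

With this shape, `default_lane_width("bert", Some("bfloat16"))` selects BF16x32 even though the architecture heuristic alone would pick F32x16.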

D0.2 — WireTokenAgreement stub (~100 LOC, the I11 cert gate):
  DTOs:
    WireBaseline { Passthrough } — default, extensible
    WireTokenAgreement { model_path, reference, candidate (WireCodecParams),
                          prompt_set_blob_id, n_tokens }
    WireTokenAgreementResult { top1_rate, top5_rate,
                                divergence_positions, per_layer_mse,
                                candidate_latency_us, reference_latency_us,
                                stub, backend }

  Phase 0 handler stub (not shipped yet): returns stub:true /
  backend:"stub" deterministic result. Phase 2 D2.1-D2.3 land the
  real decode-and-compare loop (reference model load + top-k
  comparison + per-layer MSE).

  Pass gates (for when the harness lands):
    top1_rate ≥ 0.99 and top5_rate ≥ 0.999 vs the Passthrough baseline.
    This is the ACTUAL codec cert gate — reconstruction ICC is
    necessary-but-not-sufficient (per #219/#220 lesson).

  3 round-trip serde tests: full payload + stub-backend default +
  baseline default.
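
The pass gates above reduce to a small predicate. The sketch below is illustrative: `AgreementRates`, `top1_rate` and `passes_cert_gate` are hypothetical names, and refusing to certify a result whose `stub` flag is set is an assumption, not something the commit states.

```rust
/// Subset of the result fields needed for the gate (hypothetical type).
pub struct AgreementRates {
    pub top1_rate: f64,
    pub top5_rate: f64,
    pub stub: bool,
}

pub const TOP1_GATE: f64 = 0.99;
pub const TOP5_GATE: f64 = 0.999;

/// Fraction of positions where the candidate's top-1 token equals
/// the reference's. Returns 0.0 for empty input.
pub fn top1_rate(reference: &[u32], candidate: &[u32]) -> f64 {
    let n = reference.len().min(candidate.len());
    if n == 0 {
        return 0.0;
    }
    let hits = reference.iter().zip(candidate).filter(|(r, c)| r == c).count();
    hits as f64 / n as f64
}

/// Both thresholds must hold; stub results never pass (assumption).
pub fn passes_cert_gate(r: &AgreementRates) -> bool {
    !r.stub && r.top1_rate >= TOP1_GATE && r.top5_rate >= TOP5_GATE
}
```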

Board hygiene (CLAUDE.md Mandatory rule):
  STATUS_BOARD.md updated:
    D0.1 Queued → Shipped (PR #227 — was stale)
    D0.2 Queued → In PR (this branch)
    D0.5 Queued → In PR (this branch)

Phase 0 state after this commit:
  ✅ D0.1 WireCalibrate + WireTensorView (PR #227)
  ✅ D0.6 CodecParamsBuilder (PR #225)
  ✅ D0.7 precision-ladder validation (PR #225)
  ✅ D0.5 auto_detect (this PR)
  ✅ D0.2 WireTokenAgreement stub (this PR)
  ⏳ D0.3 WireSweep streaming endpoint (next PR)
  ⏳ D0.4 surface freeze (gates after D0.3)

Rules honored:
  Rule D — JSON/YAML/REST only, CodecParams carried through via WireCodecParams
  Rule E — Wire surface IS the SIMD surface (lane_width on candidate)
  Rule F — serde mirrors at ingress only; TryFrom → CodecParams at handler

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
Last Phase 0 Wire-surface deliverable from codec-sweep-via-lab-infra-v1.
71/71 cognitive-shader-driver tests pass under --features serve
(+5 new D0.3 tests).

DTOs (~250 LOC in wire.rs):
  WireMeasure enum:
    ReconstructionErrorHeldOut / ReconstructionIccHeldOut /
    TokenAgreementTop1 / TokenAgreementTop5 / PerLayerMse
    (serde: lowercase snake_case)

  WireSweepGrid:
    subspaces / centroids / residual_depths / rotations /
    distances / lane_widths — each a Vec<T> with sensible defaults
    (defaults produce cardinality 1 for minimal payloads)
    + residual_centroids / calibration_rows / measurement_rows / seed
    Methods:
      - cardinality() -> usize — product of axis lengths
      - enumerate() -> Vec<WireCodecParams> — full Cartesian product

  WireSweepRequest:
    tensor_path / grid / measure (default: ICC + top-1) /
    log_to_lance (optional Lance fragment path) / label

  WireSweepResult (one per grid point):
    grid_index / candidate / kernel_hash (CodecParams::kernel_signature) /
    calibrate (Option<WireCalibrateResponse>) /
    token_agreement (Option<WireTokenAgreementResult>) /
    stub flag (mirrors WireTokenAgreementResult.stub)

  WireSweepResponse (for non-streaming batch clients):
    label / cardinality / results / elapsed_ms / lance_fragment_path

Streaming handler (SSE) + Lance writer deferred to Phase 3 D3.1.
Phase 0 ships the SURFACE; Phase 3 lands the execution.

Tests (5 new):
  - sweep_grid_cardinality_is_product_of_axes (1×3×3×2×1×2 = 36)
  - sweep_grid_enumerate_produces_all_unique_signatures
    (4 distinct kernel signatures from 4 distinct IR-shaping tuples)
  - sweep_grid_defaults_produce_single_candidate
    (empty JSON {} → cardinality 1, single default WireCodecParams)
  - sweep_request_round_trips_json (full payload with all fields)
  - sweep_measure_serializes_snake_case (serde enum format)
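
The `cardinality()` / `enumerate()` pair can be sketched as a plain Cartesian product over the six axes. `Grid` and `Candidate` here are simplified stand-ins for `WireSweepGrid` and `WireCodecParams`, with string labels in place of the real enums; the sketch performs no precision-ladder validation.

```rust
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Candidate {
    pub subspaces: u32,
    pub centroids: u32,
    pub residual_depth: u8,
    pub rotation: &'static str,
    pub distance: &'static str,
    pub lane_width: &'static str,
}

/// One Vec per axis (simplified stand-in for the wire grid).
pub struct Grid {
    pub subspaces: Vec<u32>,
    pub centroids: Vec<u32>,
    pub residual_depths: Vec<u8>,
    pub rotations: Vec<&'static str>,
    pub distances: Vec<&'static str>,
    pub lane_widths: Vec<&'static str>,
}

impl Grid {
    /// Product of the axis lengths.
    pub fn cardinality(&self) -> usize {
        self.subspaces.len()
            * self.centroids.len()
            * self.residual_depths.len()
            * self.rotations.len()
            * self.distances.len()
            * self.lane_widths.len()
    }

    /// Full Cartesian product, one candidate per grid point.
    pub fn enumerate(&self) -> Vec<Candidate> {
        let mut out = Vec::with_capacity(self.cardinality());
        for &subspaces in &self.subspaces {
            for &centroids in &self.centroids {
                for &residual_depth in &self.residual_depths {
                    for &rotation in &self.rotations {
                        for &distance in &self.distances {
                            for &lane_width in &self.lane_widths {
                                out.push(Candidate {
                                    subspaces,
                                    centroids,
                                    residual_depth,
                                    rotation,
                                    distance,
                                    lane_width,
                                });
                            }
                        }
                    }
                }
            }
        }
        out
    }
}
```

Axis lengths of 1, 3, 3, 2, 1 and 2 give 36 grid points, matching the product quoted in the cardinality test; which axis carries which length is an assumption of this example.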

Board hygiene (CLAUDE.md Mandatory rule):
  STATUS_BOARD.md:
    D0.3 Queued → In PR
    D0.4 Queued → Ready (surface freeze fires on merge)

  EPIPHANIES.md PREPEND:
    "D0.3 sweep grid IS the JIT cache warmer" —
    the grid and the cache signature are the same object viewed
    from two sides. Each unique (subspaces, centroids,
    residual_depth, rotation_kind, distance, lane_width) tuple maps
    to exactly one kernel_signature(). First traversal compiles N
    kernels; every subsequent sweep with overlapping tuples hits
    cache at ~0 ms. 54-candidate Appendix A §30 sweep: ~800 ms
    one-time compile, free after.
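
The cache-warmer observation can be illustrated with a map keyed on a hash of the IR-shaping tuple. This is a sketch under stated assumptions: `KernelKey` and `KernelCache` are hypothetical, the "compiled kernel" is just a label string, and `DefaultHasher` stands in for whatever `CodecParams::kernel_signature` actually computes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// The six fields that shape the kernel. Seed and row counts are
/// deliberately absent, so they cannot change the signature.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct KernelKey {
    pub subspaces: u32,
    pub centroids: u32,
    pub residual_depth: u8,
    pub rotation_kind: String,
    pub distance: String,
    pub lane_width: String,
}

impl KernelKey {
    /// Stand-in signature: a hash over the six fields.
    pub fn kernel_signature(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.hash(&mut h);
        h.finish()
    }
}

pub struct KernelCache {
    compiled: HashMap<u64, String>,
    pub compiles: usize,
    pub hits: usize,
}

impl KernelCache {
    pub fn new() -> Self {
        KernelCache { compiled: HashMap::new(), compiles: 0, hits: 0 }
    }

    /// Compile on first sight of a signature, reuse afterwards.
    pub fn get_or_compile(&mut self, key: &KernelKey) -> &str {
        let sig = key.kernel_signature();
        if self.compiled.contains_key(&sig) {
            self.hits += 1;
        } else {
            self.compiles += 1;
            // Placeholder for the real JIT step.
            self.compiled.insert(sig, format!("kernel_{:016x}", sig));
        }
        &self.compiled[&sig]
    }
}
```

Walking a grid through `get_or_compile` once leaves one entry per unique tuple; a later sweep that overlaps only increments `hits`.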

Phase 0 state after this PR (all 7 D0.x deliverables):
  ✅ D0.1 WireCalibrate + WireTensorView (PR #227)
  ✅ D0.2 WireTokenAgreement stub (PR #231)
  ✅ D0.3 WireSweep DTOs + grid (this PR)
  ⏳ D0.4 surface freeze (fires on merge)
  ✅ D0.5 auto_detect (PR #231)
  ✅ D0.6 CodecParamsBuilder (PR #225)
  ✅ D0.7 precision-ladder validation (PR #225)

Rules honored (every Wire DTO in this PR):
  Rule D — JSON/YAML/REST only, never in-Rust construction at ingress
  Rule E — Wire surface IS SIMD surface (lane_widths axis explicit,
           kernel_hash returned per result)
  Rule F — serde mirrors at ingress only; enumerate() returns plain
           Rust objects that never re-serialize until egress

After this PR merges: D0.4 surface freeze → Phase 1 (JIT kernels) begins.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request Apr 21, 2026
…ace merge + adhering-agent review checklist

Four additions to .claude/CODING_PRACTICES.md — extends the existing
EmbedAnything-patterns content with the SoA / object-does-the-work /
substrate-level patterns that crystallized during the codec-sweep
lab-infra session (PRs #225-#239).

1. SoA + Object-Does-The-Work Patterns (~100 lines)
   - Checklist for new DTOs / kernels / caches: sealed builders,
     stable signatures excluding drift, typed errors, Cache<H>
     generic-over-handle, stub flag for Phase-N-before-Phase-N+k,
     feature matrix tested, serialisation at edges, DoS ceilings
     at construction not enumeration
   - Five additional anti-patterns (6-10) surfaced by session
     corrections: stateless-shader vs stateful-engine misframed,
     hallucinating ndarray surface, feature-matrix blindness,
     epiphany-dumping orientation-as-discovery, raw struct literals
     bypassing builders
   - 10 shipped-pattern reference entries citing the actual files
     + test counts
   - 8 principles: object does the work, SoA over AoS, same-
     substrate-different-view, Stream/Resonance/Bus lifecycle,
     weights are seeds, scaffold-before-codegen, feature matrix
     is part of contract, pin your toolchain
   - Read order for new sessions

2. MANDATORY: `ndarray::simd::*` canonical import (new section)
   - Correct/wrong examples per Rule B + invariant I2
   - AMX sibling module + tile primitives + simd_caps canonical
     paths
   - Polyfill hierarchy (Tier 1 AMX → Tier 4 AVX-2, no consumer
     scalar tier)
   - Reviewer trigger for `std::arch::*` or
     `ndarray::hpc::simd_avxNNN::*` reach

3. 3-Way BindSpace Mutation Scheme (new section)
   - Table: Xor (single-writer reversible) / Bundle (multi-writer
     saturating, E-SUBSTRATE-1 guaranteed associative in
     expectation) / Superposition (preserve ambiguity)
   - When to use each + explicit DON'T-INTERCHANGE rule
   - Iron rule citation (CLAUDE.md I-SUBSTRATE-MARKOV): Xor on
     multi-writer path breaks the Markov guarantee
   - Reviewer trigger for Xor on concurrent-writer paths

4. Adhering-Agent Review Checklist (new section)
   - Per-agent table mapping 18 specialist agents + 5 meta-agents
     to the specific checklist sections they own
   - Spawn pattern: hand PR scope to 1-2 matching agents with
     pointer to this doc; they walk their rubric and return
     PASS/FAIL with specific line citations
   - Agents READ this doc as their rubric, not their personality
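
Why Xor and Bundle (item 3 above) are not interchangeable can be shown on a toy bit vector. Everything in this sketch is an assumption made for illustration: the real BindSpace types are not shown in this PR, `xor_write` and `Bundle` are hypothetical, and Bundle is modelled as per-bit saturating counters with a majority readout. Superposition is left out.

```rust
/// Xor write: applying the same pattern twice restores the state,
/// which is what makes it reversible for a single writer.
pub fn xor_write(state: &mut [u64], pattern: &[u64]) {
    for (s, p) in state.iter_mut().zip(pattern) {
        *s ^= *p;
    }
}

/// Toy bundle: one saturating counter per bit.
pub struct Bundle {
    counts: Vec<i8>,
}

impl Bundle {
    pub fn new(bits: usize) -> Self {
        Bundle { counts: vec![0; bits] }
    }

    /// Each writer votes +1 for a set bit and -1 for a clear bit.
    pub fn add(&mut self, pattern: &[u64]) {
        assert!(pattern.len() * 64 >= self.counts.len());
        for (i, c) in self.counts.iter_mut().enumerate() {
            let bit = (pattern[i / 64] >> (i % 64)) & 1;
            *c = if bit == 1 { c.saturating_add(1) } else { c.saturating_sub(1) };
        }
    }

    /// Majority readout: a bit is set when its counter is positive.
    pub fn to_bits(&self) -> Vec<u64> {
        let mut words = vec![0u64; (self.counts.len() + 63) / 64];
        for (i, &c) in self.counts.iter().enumerate() {
            if c > 0 {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        words
    }
}
```

If two writers push the same pattern through `xor_write`, the second write cancels the first and the state returns to zero; through `Bundle::add` the two votes reinforce each other and the pattern survives the readout.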

The doc is now both the author-side pattern guide AND the
reviewer-side checklist. Specialist agents adhere to it; PRs are
reviewed against it; new sessions load it as part of the mandatory
pre-read set.

Cross-ref: CLAUDE.md I-SUBSTRATE-MARKOV + I-NOISE-FLOOR-JIRAK;
lab-vs-canonical-surface.md invariants I1-I11 + six rules A-F;
cognitive-shader-architecture.md 7-layer stack + SoA column types;
ripple-dto-contracts.md Stream/Resonance/Bus/ThoughtStruct lifecycle;
this session's PRs #225-#239 shipping the SoA patterns in practice.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh