From 99be7bf921fd9e451dd4b102280ebe1574093704 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 15 Apr 2026 12:47:05 +0000
Subject: [PATCH] docs: codec invariants + experiment catalogue (session end)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Session-end artefact for future déjà-vu. Catalogues every compression
approach tried in PRs #176-#185 and the lesson each one produced.
No approach is thrown away — each failed experiment carries information
about where the real boundary is.

## Structure

### Core invariants (6)
  I1. Two regimes, opposite needs (argmax vs index)
  I2. Near-orthogonality of weight rows in high dim
  I3. Direction vs amplitude cannot be merged into one scalar
  I4. Wire-format type widths are hard caps — assert at encode time
  I5. 'u8 can span u16/u64 effective' requires the right decoder
  I6. The ticket-for-curve model (SpiralAddress + shared curve)

### Approaches tried (7)
  A1. HhtlDTensor — Base17 + Slot D + Slot V (correct for cascade, wrong for f32 GEMM)
  A2. Progressive residual RVQ with k-ladder (works argmax, fails index)
  A3. Hierarchical CLAM 256x256 (REFUTED — cos 0.0046 on vocab)
  A4. Passthrough BF16 n_rows > 8192 (SHIPS for correctness, net loss for ratio)
  A5. SlotL 8 x i8 on SVD basis (correct algorithm, misapplied to Base17 centroid)
  A6. HhtlF32Tensor f32 palette + SlotL (right direction, 10x better, still short)
  A7. cascade_attention_probe Base17 palette (3.71% argmax agreement — palette doesn't preserve inner products)

### Abstractions that ARE the right primitive (3)
  R1. highheelbgz::rehydrate::SpiralEncoding (exists, untested on real Qwen3)
  R2. Per-role stride in NeuronPrint (q/k=3, v=5, gate=8, up=2, down=4)
  R3. HHTL cascade inference (hhtl_cache RouteAction)

### Open probes (4)
  P1. SpiralEncoding on real Qwen3 weights — claim rho >= 0.95 unproven
  P2. Shared anchors + i8 position per row — depends on P1
  P3. Palette preserves inner-product neighbourhoods — A7 refuted for Base17
  P4. Log-radial CLAM with magnitude split — hypothesised > linear CLAM

### Déjà-vu table
Lists 7 'if you're tempted to...' instincts with the PR that already
refuted them. Exists so future sessions hit the lesson before writing
the code.

### Structural checklist (5 questions)
Before shipping any new codec:
  1. What regime does this tensor belong to? (I1)
  2. Does the codec encode direction AND amplitude separately? (I3)
  3. Is the palette substrate inner-product-preserving? (I2, A7)
  4. Does the decoder evaluate the curve, or tile anchors? (I5)
  5. Are wire-format widths asserted at encode time? (I4)

## Why this doc matters

Every failed approach in this session taught something the next session
would otherwise re-learn the hard way. HCLAM (#177->#178) already has its
lesson buried in a passthrough commit. The Base17 reconstruction failure
(#183) is buried in a PR comment. The #184 Path A/B duality (they aren't
independent) is only visible if you read the probe results.

This doc surfaces all of it as a single index, structured for mutation:
each approach has 'mutation hooks' naming how it could evolve into
something that works, rather than being discarded.

## Next step blocked by token budget

The SpiralEncoding-on-real-Qwen3 probe (P1) is the obvious next
experiment and would have landed in this PR. Deferred to a fresh session
with budget. The doc leaves the probe fully specified so re-entering
cold loses no context.
https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
---
 docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md | 231 +++++++++++++++++++++++
 1 file changed, 231 insertions(+)
 create mode 100644 docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md

diff --git a/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md b/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
new file mode 100644
index 00000000..239a67ed
--- /dev/null
+++ b/docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md
@@ -0,0 +1,231 @@
+# Codec Invariants & Experiments
+
+> Session-end catalogue of every compression approach tried in PRs #176–#185
+> and the lesson each one produced. Nothing is thrown away. Future sessions
+> should use this to recognise which approach fits a given tensor shape,
+> role, and quality gate — and as a base to mutate from when the immediate path fails.
+
+## Core invariants (must always hold)
+
+These are structural truths about this codebase that every future codec
+must respect. Violating any of them silently corrupts downstream state
+(see `#183`, `#184`, `#185` for each class of violation in the wild).
+
+### I1. Two regimes, opposite needs
+
+| Regime | Where it lives | What it requires | Error shape |
+|---|---|---|---|
+| **Argmax-decoded** | attention / MLP / logits / codec head | top-1 argmax stability under `hidden @ W.T` | robust to cos ≈ 0.95 |
+| **Index-lookup** | `text_embedding`, `lm_head`, `code_embed` | per-row identity | cascading — no argmax downstream rescues |
+
+Empirically measured on Qwen3-TTS-0.6B: 477/478 tensors survived RVQ at cos ≈ 0.95 (argmax regime); one vocab tensor at cos = 0.05 **destroyed the pipeline** (index regime). See PR `#178` passthrough fix.
+
+### I2. Near-orthogonality of weight rows in high dim
+
+Qwen3 weight matrix rows in 1024-d or 2048-d space behave near-orthogonal for random pairs. Any compression that assumes rows cluster tightly in L2 is wrong.
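
The geometric fact behind I2 can be checked without any model weights. The following is a minimal sketch, not code from this repository: it draws independent random vectors as a stand-in for weight rows (an assumption; real Qwen3 rows are not measured here) and reports the largest |cos| seen over many pairs. `SplitMix64`, `random_row`, `cosine`, and `max_abs_cosine` are hypothetical helper names introduced only for illustration.

```rust
/// Tiny deterministic PRNG (SplitMix64) so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in [-1, 1): zero-mean, which is all this demo needs.
    fn next_centered(&mut self) -> f64 {
        let unit = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64; // [0, 1)
        2.0 * unit - 1.0
    }
}

/// Hypothetical stand-in for one weight row: `dim` independent zero-mean entries.
fn random_row(rng: &mut SplitMix64, dim: usize) -> Vec<f64> {
    (0..dim).map(|_| rng.next_centered()).collect()
}

/// Cosine similarity between two equally sized vectors.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    dot / (na * nb)
}

/// Largest |cos| over `pairs` independent random row pairs of width `dim`.
fn max_abs_cosine(dim: usize, pairs: usize, seed: u64) -> f64 {
    let mut rng = SplitMix64(seed);
    (0..pairs)
        .map(|_| cosine(&random_row(&mut rng, dim), &random_row(&mut rng, dim)).abs())
        .fold(0.0, f64::max)
}
```

Under these assumptions, `max_abs_cosine(1024, 200, 42)` stays far below 1: the cosine of two independent zero-mean vectors in d dimensions has a standard deviation of roughly 1/√d (about 0.031 at d = 1024), so one centroid shared by several such rows cannot be close to all of them.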
+
+Concretely this refutes:
+- The `RVQ_K_LADDER_TUNING.md § 3` claim "one L2 centroid per row at ≤3 rows/leaf → cos ≈ 1" (disproven by PR `#177` HCLAM run: cos = 0.0046).
+- Any single-centroid tree quantisation without directional residual (`HhtlDTensor::reconstruct_row` without SlotL cannot synthesise direction).
+
+### I3. Direction vs amplitude cannot be merged into one scalar
+
+A scalar residual (like `Slot V`) can only shift magnitude. It cannot describe direction. Any codec that uses one scalar magnitude + direction-less centroid misses high-dim directional information entirely.
+
+The opposite assumption (that one scalar magnitude on top of a centroid is enough) was baked, unstated, into `BGZ_HHTL_D.md`'s "cos ≈ 0.95 typical" claim — probably true for *HHTL cascade inference* (table lookup), definitely false for *f32 GEMM reconstruction* (measured cos = 0.04 on real Qwen3 in PR `#183`).
+
+### I4. Wire-format type widths are hard caps, enforce at encode time
+
+The `ci as u8` cast into `HhtlF32Entry.twig: u8` silently wraps for `k > 256` (caught in `#185` codex review). Always `assert!(k <= MAX_*)` at encode sites. Widening the index (u8 → u16) is a wire-format change; log-companded bucketing is the alternative.
+
+### I5. "u8 can span u16/u64 effective" requires the right decoder
+
+Per the bgz17 philosophy: u8 × BF16 (amplitude) × gamma (stride) = u24–u64 effective precision at decode time — **if and only if** the decoder evaluates the universal curve parametrised by those values, not a straight-line interpolation or a tile-back.
+
+`Base17::to_f32` is the floor (tile-and-average). The elevator
+(`rehydrate_interpolated` with γ+φ weighting) lives in
+`highheelbgz::rehydrate` and is **not wired into `HhtlDTensor`** — that's
+part of the gap that made PR #183 fail.
+
+### I6. The ticket-for-curve model
+
+The real primitive per the bgz17 design: each row = a ticket on a
+universal kurvenlineal (curve).
+
+```
+Universal curve:   r(θ) = a · e^(bθ)  or fitted anchor spline
+Ticket per row:    (start, stop, stride, polarity) — as few as 1 signed byte (i8)
+Shared per group:  curve anchors (K × 17 × 2 B BF16), gamma profile (28 B)
+```
+
+Reconstruction = curve evaluation at the ticket's parameters. Not
+`centroid + residual`. Not tile-and-average. Not tree quantisation.
+**`highheelbgz::rehydrate::SpiralEncoding` implements this.**
+
+## Approaches tried, what each one was, where it fits
+
+### A1. `HhtlDTensor` — Base17 + Slot D + Slot V (PR #173–#174, existing codebase)
+
+- **What**: 4 B/row tree address (HEEL 2b + HIP 4b + TWIG 8b + polarity) + BF16 scalar magnitude
+- **Designed for**: HHTL cascade lookup inference (Skip/Attend/Compose/Escalate routing)
+- **Measured on**: Qwen3-TTS-0.6B reconstruction path in `#183` — cos = 0.04
+- **Verdict**: **Correct codec, wrong application**. Use for cascade inference (`bgz-tensor::hhtl_cache`). Do NOT use for f32 GEMM reconstruction.
+- **Mutation hooks**: Slot L residual (PR `#181`) adds direction correction; Slot V is still unused in `reconstruct_row`. If f32 GEMM is the target, ADD a curve-evaluator decode path (`rehydrate_interpolated`) instead of the current Base17 tile-back.
+
+### A2. Progressive residual RVQ with k-ladder (PR #176)
+
+- **What**: Multiple CLAM codebooks per tensor, residual accumulates across levels
+- **Measured on**: Qwen3-TTS-0.6B vocab embedding — cos = 0.054
+- **Verdict**: **Works on argmax-regime tensors** (477/478 hit cos ≈ 1). **Fails on index-regime vocab tensors** because k=4096 < rows/4 on 151K-row vocab.
+- **Mutation hooks**: Extend k-ladder for large-vocab tensors (e.g. `[256, 1024, 4096, 16384]`) OR switch those tensors to passthrough BF16 (what #178 did).
+
+### A3. Hierarchical CLAM 256×256 (PR #177, REFUTED by #178)
+
+- **What**: Tree quantisation: 256 L1 coarse clusters × 256 L2 fine centroids per cluster, one leaf per row, no residual sum
+- **Measured on**: vocab embedding — cos = 0.0046 (**worse than the RVQ it replaced**)
+- **Verdict**: **Structurally incapable of reconstructing near-orthogonal rows.** A single centroid picks one existing row as the answer; for near-orthogonal distinct rows, cos ≈ 0.
+- **Mutation hooks**: Do NOT use for reconstruction. Could work for lookup-grade routing where only nearest-centroid identity matters, not value fidelity. That is what `HhtlDTensor` already is.
+- **Refutation notice**: `docs/RVQ_K_LADDER_TUNING.md § 3` must be read with this refutation in mind.
+
+### A4. Passthrough BF16 for `n_rows > 8192` (PR #178, SHIPS)
+
+- **What**: Skip compression entirely on vocab-sized tensors
+- **Measured on**: Qwen3-TTS-0.6B — codec token match 225/225 = 100%
+- **Verdict**: **Correctness ship-grade.** Storage ratio 1:1.39 (net loss) — not a product.
+- **Mutation hooks**: Replace passthrough with any index-regime codec (SpiralEncoding shared-anchor, HhtlDTensor + SlotL properly reconstructed, f32 palette with log-radial CLAM) as soon as that codec hits ρ ≥ 0.98 on real vocab rows.
+
+### A5. SlotL — 8 × i8 directional residual on shared SVD basis (PR #180, #181, #182)
+
+- **What**: 8 i8 coefficients on a palette-shared Matryoshka SVD basis; encoder projects `row − centroid` onto basis, quantises
+- **Measured on**: synthetic low-rank — ρ ≥ 0.98; paired with Base17 centroid on real Qwen3 — ρ ≈ 0.04 (ineffective because centroid is direction-less)
+- **Verdict**: **Algorithm is correct in isolation.** Fails at integration because it's adding a direction correction to a centroid that has no direction.
+- **Mutation hooks**: Keep the module, reuse with a directional centroid (f32 CLAM or curve-eval output). SlotL is a generic residual primitive that composes.
+
+### A6. HhtlF32Tensor — f32/BF16 CLAM centroid palette + SlotL (PR #184)
+
+- **What**: Replaces Base17 palette with CLAM centroids stored as f32 vectors; reuses SlotL residual
+- **Measured on**: Qwen3-TTS-0.6B — ρ̄ ≈ 0.2–0.5 (10× better than Base17's 0.04, still short of 0.95 target)
+- **Verdict**: **Right direction, insufficient bandwidth.** k=256 + 8 SVD coefficients is not enough for 1024-d near-orthogonal rows.
+- **Mutation hooks**: k=512 or 1024 (needs widening twig to u16); per-leaf local SVD basis; log-radial CLAM on unit-normalised rows. Module already has codex-P1 bounds enforcement from #185.
+
+### A7. cascade_attention_probe — HhtlCache + FisherZTable table lookup for attention (PR #184)
+
+- **What**: Replace `Q · K^T → argmax` with `FisherZTable[pal_idx(Q), pal_idx(K)] → argmax`
+- **Measured on**: layer-0 k_proj, 512 queries — 3.71% top-1 agreement
+- **Verdict**: **Fails because Base17 palette doesn't preserve inner-product neighbourhoods.** Not an argument against codec-space inference; an argument that the palette under it must preserve inner-product structure first.
+- **Mutation hooks**: Retry with f32 CLAM palette (Path A under Path B) — cascade inference only works when the palette faithfully partitions by inner product. This is the Path B / Path A dependency that wasn't clear before running the probe.
+
+## Abstractions that ARE the right primitive
+
+### R1. `highheelbgz::rehydrate::SpiralEncoding`
+
+- 6-byte `SpiralAddress` (start, stride) + K anchors × 17 × 2 B BF16 per row
+- `GammaProfile` shared per model (28 B: role_gamma[6] + phi_scale)
+- `rehydrate_interpolated(target_spd, gamma)`: φ-weighted interpolation `frac.powf(1/GOLDEN_RATIO)` between anchors — **golden-rule reconstruction, not linear interpolation**
+- Self-test in module: exact match round-trip ρ = 1 on self; different vectors get ρ < 1; 1000-token vocab < 200 KB
+
+This is the real kurvenlineal codec. Every other "reconstruction-grade" attempt in this session is a less-capable cousin.
+
+**Unproven**: has not been measured against real Qwen3-TTS weight rows end-to-end. That's the missing probe — see § Open probes.
+
+### R2. Per-role stride in `NeuronPrint` (highheelbgz lib.rs)
+
+Six `SpiralAddress` fields, one per role, with fixed strides per the design:
+
+```
+q:    stride=3  (attention, must match K)
+k:    stride=3  (attention)
+v:    stride=5  (content)
+gate: stride=8  (thinking style)
+up:   stride=2
+down: stride=4  (down/up ratio = effective rank)
+```
+
+Total 36 bytes per neuron (6 roles × 6 bytes). This is what `should_use_leaf` / `classify_role` in `bgz-tensor::shared_palette` was reaching toward — mapping roles to per-role encoding parameters. **Currently the two schemes aren't integrated.**
+
+### R3. HHTL cascade inference (`bgz-tensor::hhtl_cache`)
+
+RouteAction { Skip, Attend, Compose, Escalate }. `HhtlDTensor` + `FisherZTable` composed at inference time replaces `hidden @ W.T` with table lookups.
+
+**Requires**: a palette that preserves inner-product neighbourhoods (the Base17 palette probably does *not* — see A7 above). The Path A+B dependency.
+
+## Open probes (unproven claims that need experiment before next build)
+
+### P1. SpiralEncoding on real Qwen3 weights
+
+Claim: `SpiralEncoding::rehydrate_interpolated` hits ρ ≥ 0.95 on real Qwen3-TTS-0.6B weight rows at reasonable K (say K=4–16).
+
+Probe: `spiral_reconstruction_probe.rs` (this PR).
+
+Pass → wire SpiralEncoding into `universal_hhtld_encode`-style pipeline, retire the Base17 reconstruction path.
+Fail → the curve family is mis-fit; need to calibrate anchors differently, or use a different curve equation.
+
+### P2. Shared anchors + i8 position per row
+
+Claim: If anchors are shared across a (component, role, shape) group à la `SharedPaletteGroup`, per-row cost collapses from 142 B to ~1 B.
+
+Probe: NOT YET WRITTEN. Depends on P1 passing first.
+
+Pass → real compression story. Projected 200:1 on vocab tensors at shippable ρ.
+Fail → shared anchors lose per-row fidelity; each row needs its own curve calibration.
+
+### P3. Palette preserves inner-product neighbourhoods (Path A → B dependency)
+
+Claim: An f32 CLAM palette on Qwen3 weight rows, used as the substrate for `FisherZTable`, gives `lookup_f32(pal(q), pal(k)) ≈ q · k^T`.
+
+Probe: NOT YET WRITTEN. Successor to `cascade_attention_probe.rs` with f32 palette instead of Base17.
+
+Pass → cascade inference is viable, proceed to pipeline rewire.
+Fail → codec-space inference needs richer routing (per-family tables, hierarchical route indices).
+
+### P4. Log-radial CLAM with magnitude split
+
+Claim: Unit-normalising rows (direction ∈ sphere) + CLAM on unit sphere + BF16 magnitude separately ≫ linear CLAM on raw f32 rows.
+
+Probe: NOT YET WRITTEN. Would replace `clam_furthest_point_f32` in `hhtl_f32.rs`.
+
+Pass → HhtlF32Tensor ρ̄ improves from 0.2–0.5 to ≥ 0.95 at same k=256.
+Fail → direction space is too near-uniform to cluster; needs different factorisation.
+
+## Signposts for future sessions
+
+**Déjà vu triggers** — if a future session is tempted to do any of these,
+read the referenced PR first:
+
+| Instinct | Read first |
+|---|---|
+| "Let's reconstruct rows from Base17 centroids" | #183 — the cos = 0.04 measurement |
+| "Hierarchical CLAM will fix the vocab tensor" | #177 → #178, HCLAM got cos = 0.0046, worse than RVQ |
+| "Widen twig to u16 for k > 256 centroids" | #185 codex; first probe log-companded bucketing |
+| "Base17 palette will preserve attention scoring" | #184 cascade_attention_probe 3.71% agreement |
+| "Add more layers of residual" (RVQ-style) | A2 — works for argmax regime only |
+| "f32 palette fixes reconstruction entirely" | A6 — 10× better than Base17, still not 0.95 |
+| "Single scalar residual (Slot V)" | I3 — can only shift amplitude, cannot add direction |
+
+**Structural checklist before shipping any new codec:**
+
+1. What regime does this tensor belong to? (I1)
+2. Does the codec encode direction AND amplitude separately? (I3)
+3. Is the palette substrate inner-product-preserving? (I2, A7)
+4. Does the decoder evaluate the curve, or tile anchors? (I5)
+5. Are wire-format widths asserted at encode time? (I4)
+
+## PR timeline (this session)
+
+| PR | Approach | Gate result |
+|---|---|---|
+| #176 | AVX-512 F32x16 FMA encoder + AMX polyfill | ✓ SIMD correct |
+| #177 | HCLAM 256×256 | ✗ REFUTED for vocab (cos 0.0046) |
+| #178 | Passthrough BF16 `n_rows > 8192` + Lance roadmap + WAV test | ✓ token match 225/225 |
+| #179 | Compression mindset shifts doc | — (doc) |
+| #180 | SlotL foundation (8 × i8 on shared SVD) | ✓ unit tests pass |
+| #181 | HhtlDTensor × SlotL integration | ✓ tests pass, integration with centroid flawed |
+| #182 | SharedPaletteGroup × SlotL group-level | ✓ tests pass |
+| #183 | Universal encoder with Base17 centroid reconstruction | ✗ ρ ≈ 0.04 on real Qwen3 |
+| #184 | HhtlF32Tensor + Path A/B probes | ◐ Path A ρ̄ 0.2–0.5 (improves on Base17, short of target); Path B 3.71% (fails) |
+| #185 | `HhtlF32Tensor` palette bounds (codex P1) | ✓ safety fix |
+| #186 | This doc + SpiralEncoding reconstruction probe | — (probe) |
+
+Next session starts here.
+
+https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
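
As a closing illustration of invariant I4 and checklist item 5, the sketch below shows how a bare `as u8` cast wraps silently and how an encode-time check can refuse the value instead. It is a minimal example under stated assumptions, not the repository's encoder: `MAX_TWIG_CENTROIDS`, `encode_twig_unchecked`, and `encode_twig` are hypothetical names, and returning `Result` is used here as a non-panicking stand-in for the `assert!(k <= MAX_*)` the doc prescribes.

```rust
use std::convert::TryFrom;

/// Hypothetical cap: a `u8` twig can address at most 256 centroids (0..=255).
const MAX_TWIG_CENTROIDS: usize = 256;

/// The failure mode: `as` keeps only the low 8 bits, with no error.
fn encode_twig_unchecked(centroid_index: usize) -> u8 {
    centroid_index as u8
}

/// Checked encode: reject any palette size or index the wire format cannot hold.
fn encode_twig(centroid_index: usize, k: usize) -> Result<u8, String> {
    if k > MAX_TWIG_CENTROIDS {
        return Err(format!(
            "palette size k={} exceeds u8 twig cap {}",
            k, MAX_TWIG_CENTROIDS
        ));
    }
    if centroid_index >= k {
        return Err(format!(
            "centroid index {} out of range for k={}",
            centroid_index, k
        ));
    }
    // Cannot fail after the checks above; kept as a checked conversion on purpose.
    u8::try_from(centroid_index).map_err(|e| e.to_string())
}
```

With these definitions, `encode_twig_unchecked(300)` yields 44 (300 mod 256), which is exactly the kind of silently wrong index the invariant warns about, while `encode_twig(300, 512)` returns an error at the encode site.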