forked from lance-format/lance-graph
-
Notifications
You must be signed in to change notification settings - Fork 0
docs: codec invariants + experiment catalogue (session-end déjà-vu) #186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,231 @@ | ||
| # Codec Invariants & Experiments | ||
|
|
||
| > Session-end catalogue of every compression approach tried in PRs #176–#185 | ||
| > and the lesson each one produced. Nothing is thrown away. Future sessions | ||
| > should use this to recognise which approach fits a given tensor shape, | ||
| > role, and quality gate — and to mutate from when the immediate path fails. | ||
|
|
||
| ## Core invariants (must always hold) | ||
|
|
||
| These are structural truths about this codebase that every future codec | ||
| must respect. Violating any of them silently corrupts downstream state | ||
| (see `#183`, `#184`, #185` for each class of violation in the wild). | ||
|
|
||
| ### I1. Two regimes, opposite needs | ||
|
|
||
| | Regime | Where it lives | What it requires | Error shape | | ||
| |---|---|---|---| | ||
| | **Argmax-decoded** | attention / MLP / logits / codec head | top-1 argmax stability under `hidden @ W.T` | robust to cos ≈ 0.95 | | ||
| | **Index-lookup** | `text_embedding`, `lm_head`, `code_embed` | per-row identity | cascading — no argmax downstream rescues | | ||
|
|
||
| Empirically measured on Qwen3-TTS-0.6B: 477/478 tensors survived RVQ at cos ≈ 0.95 (argmax regime); one vocab tensor at cos = 0.05 **destroyed the pipeline** (index regime). See PR `#178` passthrough fix. | ||
|
|
||
| ### I2. Near-orthogonality of weight rows in high dim | ||
|
|
||
| Qwen3 weight matrix rows in 1024-d or 2048-d space behave near-orthogonal for random pairs. Any compression that assumes rows cluster tightly in L2 is wrong. | ||
|
|
||
| Concretely this refutes: | ||
| - `RVQ_K_LADDER_TUNING.md § 3` claim "one L2 centroid per row at ≤3 rows/leaf → cos ≈ 1" (disproven by PR `#177` HCLAM run: cos = 0.0046). | ||
| - Any single-centroid tree quantisation without directional residual (`HhtlDTensor::reconstruct_row` without SlotL cannot synthesise direction). | ||
|
|
||
| ### I3. Direction vs amplitude cannot be merged into one scalar | ||
|
|
||
| A scalar residual (like `Slot V`) can only shift magnitude. It cannot describe direction. Any codec that uses one scalar magnitude + direction-less centroid misses high-dim directional information entirely. | ||
|
|
||
| This was the unstated assumption baking into `BGZ_HHTL_D.md`'s "cos ≈ 0.95 typical" claim — probably true for *HHTL cascade inference* (table lookup), definitely false for *f32 GEMM reconstruction* (measured cos = 0.04 on real Qwen3 in PR `#183`). | ||
|
|
||
| ### I4. Wire-format type widths are hard caps, enforce at encode time | ||
|
|
||
| `HhtlF32Entry.twig: u8` silently wraps `ci as u8` for `k > 256` (caught in `#185` codex review). Always `assert!(k <= MAX_*)` at encode sites. Widening the index (u8 → u16) is a wire-format change; log-companded bucketing is the alternative. | ||
|
|
||
| ### I5. "u8 can span u16/u64 effective" requires the right decoder | ||
|
|
||
| Per the bgz17 philosophy: u8 × BF16 (amplitude) × gamma (stride) = u24–u64 effective precision at decode time — **if and only if** the decoder evaluates the universal curve parametrised by those values, not a straight-line interpolation or a tile-back. | ||
|
|
||
| `Base17::to_f32` is the floor (tile-and-average). The elevator | ||
| (`rehydrate_interpolated` with γ+φ weighting) lives in | ||
| `highheelbgz::rehydrate` and is **not wired into `HhtlDTensor`** — that's | ||
| part of the gap that made PR #183 fail. | ||
|
|
||
| ### I6. The ticket-for-curve model | ||
|
|
||
| The real primitive per the bgz17 design: each row = a ticket on a | ||
| universal kurvenlineal (curve). | ||
|
|
||
| ``` | ||
| Universal curve: r(θ) = a · e^(bθ) or fitted anchor spline | ||
| Ticket per row: (start, stop, stride, polarity) — as few as 1 signed byte (i8) | ||
| Shared per group: curve anchors (K × 17 × 2 B BF16), gamma profile (28 B) | ||
| ``` | ||
|
|
||
| Reconstruction = curve evaluation at the ticket's parameters. Not | ||
| `centroid + residual`. Not tile-and-average. Not tree quantisation. | ||
| **`highheelbgz::rehydrate::SpiralEncoding` implements this.** | ||
|
|
||
| ## Approaches tried, what each one was, where it fits | ||
|
|
||
| ### A1. `HhtlDTensor` — Base17 + Slot D + Slot V (PR #173–#174, codebase existing) | ||
|
|
||
| - **What**: 4 B/row tree address (HEEL 2b + HIP 4b + TWIG 8b + polarity) + BF16 scalar magnitude | ||
| - **Designed for**: HHTL cascade lookup inference (Skip/Attend/Compose/Escalate routing) | ||
| - **Measured on**: Qwen3-TTS-0.6B reconstruction path in `#183` — cos = 0.04 | ||
| - **Verdict**: **Correct codec, wrong application**. Use for cascade inference (`bgz-tensor::hhtl_cache`). Do NOT use for f32 GEMM reconstruction. | ||
| - **Mutation hooks**: Slot L residual (PR `#181`) adds direction correction; Slot V is still unused in `reconstruct_row`. If f32 GEMM is the target, ADD a curve-evaluator decode path (`rehydrate_interpolated`) instead of the current Base17 tile-back. | ||
|
|
||
| ### A2. Progressive residual RVQ with k-ladder (PR #176) | ||
|
|
||
| - **What**: Multiple CLAM codebooks per tensor, residual accumulates across levels | ||
| - **Measured on**: Qwen3-TTS-0.6B vocab embedding — cos = 0.054 | ||
| - **Verdict**: **Works on argmax-regime tensors** (477/478 hit cos ≈ 1). **Fails on index-regime vocab tensors** because k=4096 < rows/4 on 151K-row vocab. | ||
| - **Mutation hooks**: Extend k-ladder for large-vocab tensors (e.g. `[256, 1024, 4096, 16384]`) OR switch those tensors to passthrough BF16 (what #178 did). | ||
|
|
||
| ### A3. Hierarchical CLAM 256×256 (PR #177, REFUTED by #178) | ||
|
|
||
| - **What**: Tree quantisation: 256 L1 coarse clusters × 256 L2 fine centroids per cluster, one leaf per row, no residual sum | ||
| - **Measured on**: vocab embedding — cos = 0.0046 (**worse than RVQ it replaced**) | ||
| - **Verdict**: **Structurally incapable of reconstructing near-orthogonal rows.** Single-centroid picks one existing row as the answer; for near-orthogonal distinct rows, cos ≈ 0. | ||
| - **Mutation hooks**: Do NOT use for reconstruction. Could work for lookup-grade routing where only nearest-centroid identity matters, not value fidelity. That is what `HhtlDTensor` already is. | ||
| - **Refutation notice**: `docs/RVQ_K_LADDER_TUNING.md § 3` must be read with this refutation in mind. | ||
|
|
||
| ### A4. Passthrough BF16 for `n_rows > 8192` (PR #178, SHIPS) | ||
|
|
||
| - **What**: Skip compression entirely on vocab-sized tensors | ||
| - **Measured on**: Qwen3-TTS-0.6B — codec token match 225/225 = 100% | ||
| - **Verdict**: **Correctness ship-grade.** Storage ratio 1:1.39 (net loss) — not a product. | ||
| - **Mutation hooks**: Replace passthrough with any index-regime codec (SpiralEncoding shared-anchor, HhtlDTensor + SlotL properly reconstructed, f32 palette with log-radial CLAM) as soon as that codec hits ρ ≥ 0.98 on real vocab rows. | ||
|
|
||
| ### A5. SlotL — 8 × i8 directional residual on shared SVD basis (PR #180, #181, #182) | ||
|
|
||
| - **What**: 8 i8 coefficients on a palette-shared Matryoshka SVD basis; encoder projects `row − centroid` onto basis, quantises | ||
| - **Measured on**: synthetic low-rank — ρ ≥ 0.98; paired with Base17 centroid on real Qwen3 — ρ ≈ 0.04 (ineffective because centroid is direction-less) | ||
| - **Verdict**: **Algorithm is correct in isolation.** Fails at integration because it's adding a direction correction to a centroid that has no direction. | ||
| - **Mutation hooks**: Keep the module, reuse with a directional centroid (f32 CLAM or curve-eval output). SlotL is a generic residual primitive that composes. | ||
|
|
||
| ### A6. HhtlF32Tensor — f32/BF16 CLAM centroid palette + SlotL (PR #184) | ||
|
|
||
| - **What**: Replaces Base17 palette with CLAM centroids stored as f32 vectors; reuses SlotL residual | ||
| - **Measured on**: Qwen3-TTS-0.6B — ρ̄ ≈ 0.2–0.5 (10× better than Base17's 0.04, still short of 0.95 target) | ||
| - **Verdict**: **Right direction, insufficient bandwidth.** k=256 + 8 SVD coefficients is not enough for 1024-d near-orthogonal rows. | ||
| - **Mutation hooks**: k=512 or 1024 (needs widening twig to u16); per-leaf local SVD basis; log-radial CLAM on unit-normalised rows. Module already has codex-P1 bounds enforcement from #185. | ||
|
|
||
| ### A7. cascade_attention_probe — HhtlCache + FisherZTable table lookup for attention (PR #184) | ||
|
|
||
| - **What**: Replace `Q · K^T → argmax` with `FisherZTable[pal_idx(Q), pal_idx(K)] → argmax` | ||
| - **Measured on**: layer-0 k_proj, 512 queries — 3.71% top-1 agreement | ||
| - **Verdict**: **Fails because Base17 palette doesn't preserve inner-product neighbourhoods.** Not an argument against codec-space inference; an argument that the palette under it must preserve inner-product structure first. | ||
| - **Mutation hooks**: Retry with f32 CLAM palette (Path A under Path B) — cascade inference only works when the palette faithfully partitions by inner product. This is the Path B / Path A dependency that wasn't clear before running the probe. | ||
|
|
||
| ## Abstractions that ARE the right primitive | ||
|
|
||
| ### R1. `highheelbgz::rehydrate::SpiralEncoding` | ||
|
|
||
| - 6-byte `SpiralAddress` (start, stride) + K anchors × 17 × 2 B BF16 per row | ||
| - `GammaProfile` shared per model (28 B: role_gamma[6] + phi_scale) | ||
| - `rehydrate_interpolated(target_spd, gamma)`: φ-weighted interpolation `frac.powf(1/GOLDEN_RATIO)` between anchors — **golden-rule reconstruction, not linear interpolation** | ||
| - Self-test in module: exact match round-trip ρ = 1 on self; different vectors get ρ < 1; 1000-token vocab < 200 KB | ||
|
|
||
| This is the real kurvenlineal codec. Every other "reconstruction-grade" attempt in this session is a less-capable cousin. | ||
|
|
||
| **Unproven**: has not been measured against real Qwen3-TTS weight rows end-to-end. That's the missing probe — see § Open probes. | ||
|
|
||
| ### R2. Per-role stride in `NeuronPrint` (highheelbgz lib.rs) | ||
|
|
||
| Six `SpiralAddress` fields, one per role, with fixed strides per the design: | ||
|
|
||
| ``` | ||
| q: stride=3 (attention, must match K) | ||
| k: stride=3 (attention) | ||
| v: stride=5 (content) | ||
| gate: stride=8 (thinking style) | ||
| up: stride=2 | ||
| down: stride=4 (down/up ratio = effective rank) | ||
| ``` | ||
|
|
||
| Total 36 bytes per neuron (6 roles × 6 bytes). This is what `should_use_leaf` / `classify_role` in `bgz-tensor::shared_palette` was reaching toward — mapping roles to per-role encoding parameters. **Currently the two schemes aren't integrated.** | ||
|
|
||
| ### R3. HHTL cascade inference (`bgz-tensor::hhtl_cache`) | ||
|
|
||
| RouteAction { Skip, Attend, Compose, Escalate }. `HhtlDTensor` + `FisherZTable` composed at inference time replaces `hidden @ W.T` with table lookups. | ||
|
|
||
| **Requires**: a palette that preserves inner-product neighbourhoods (the Base17 palette probably does *not* — see A7 above). The Path A+B dependency. | ||
|
|
||
| ## Open probes (unproven claims that need experiment before next build) | ||
|
|
||
| ### P1. SpiralEncoding on real Qwen3 weights | ||
|
|
||
| Claim: `SpiralEncoding::rehydrate_interpolated` hits ρ ≥ 0.95 on real Qwen3-TTS-0.6B weight rows at reasonable K (say K=4–16). | ||
|
|
||
| Probe: `spiral_reconstruction_probe.rs` (this PR). | ||
|
|
||
| Pass → wire SpiralEncoding into `universal_hhtld_encode`-style pipeline, retire the Base17 reconstruction path. | ||
| Fail → the curve family is mis-fit; need to calibrate anchors differently, or a different curve equation. | ||
|
|
||
| ### P2. Shared anchors + i8 position per row | ||
|
|
||
| Claim: If anchors are shared across a (component, role, shape) group à la `SharedPaletteGroup`, per-row cost collapses from 142 B to ~1 B. | ||
|
|
||
| Probe: NOT YET WRITTEN. Depends on P1 passing first. | ||
|
|
||
| Pass → real compression story. Projected 200:1 on vocab tensors at shippable ρ. | ||
| Fail → shared anchors lose per-row fidelity; each row needs its own curve calibration. | ||
|
|
||
| ### P3. Palette preserves inner-product neighbourhoods (Path A → B dependency) | ||
|
|
||
| Claim: An f32 CLAM palette on Qwen3 weight rows, used as the substrate for `FisherZTable`, gives `lookup_f32(pal(q), pal(k)) ≈ q · k^T`. | ||
|
|
||
| Probe: NOT YET WRITTEN. Successor to `cascade_attention_probe.rs` with f32 palette instead of Base17. | ||
|
|
||
| Pass → cascade inference is viable, proceed to pipeline rewire. | ||
| Fail → codec-space inference needs richer routing (per-family tables, hierarchical route indices). | ||
|
|
||
| ### P4. Log-radial CLAM with magnitude split | ||
|
|
||
| Claim: Unit-normalising rows (direction ∈ sphere) + CLAM on unit sphere + BF16 magnitude separately ≫ linear CLAM on raw f32 rows. | ||
|
|
||
| Probe: NOT YET WRITTEN. Would replace `clam_furthest_point_f32` in `hhtl_f32.rs`. | ||
|
|
||
| Pass → HhtlF32Tensor ρ̄ improves from 0.2–0.5 to ≥ 0.95 at same k=256. | ||
| Fail → direction space is too near-uniform to cluster; needs different factorisation. | ||
|
|
||
| ## Signposts for future sessions | ||
|
|
||
| **Déjà vu triggers** — if a future session is tempted to do any of these, | ||
| read the referenced PR first: | ||
|
|
||
| | Instinct | Read first | | ||
| |---|---| | ||
| | "Let's reconstruct rows from Base17 centroids" | #183 — the cos = 0.04 measurement | | ||
| | "Hierarchical CLAM will fix the vocab tensor" | #177 → #178, HCLAM got cos = 0.0046, worse than RVQ | | ||
| | "Widen twig to u16 for k > 256 centroids" | #185 codex; first probe log-companded bucketing | | ||
| | "Base17 palette will preserve attention scoring" | #184 cascade_attention_probe 3.71% agreement | | ||
| | "Add more layers of residual" (RVQ-style) | A2 — works for argmax regime only | | ||
| | "f32 palette fixes reconstruction entirely" | A6 — 10× better than Base17, still not 0.95 | | ||
| | "Single scalar residual (Slot V)" | I3 — can only shift amplitude, cannot add direction | | ||
|
|
||
| **Structural checklist before shipping any new codec:** | ||
|
|
||
| 1. What regime does this tensor belong to? (I1) | ||
| 2. Does the codec encode direction AND amplitude separately? (I3) | ||
| 3. Is the palette substrate inner-product-preserving? (I2, A7) | ||
| 4. Does the decoder evaluate the curve, or tile anchors? (I5) | ||
| 5. Are wire-format widths asserted at encode time? (I4) | ||
|
|
||
| ## PR timeline (this session) | ||
|
|
||
| | PR | Approach | Gate result | | ||
| |---|---|---| | ||
| | #176 | AVX-512 F32x16 FMA encoder + AMX polyfill | ✓ SIMD correct | | ||
| | #177 | HCLAM 256×256 | ✗ REFUTED for vocab (cos 0.0046) | | ||
| | #178 | Passthrough BF16 `n_rows > 8192` + Lance roadmap + WAV test | ✓ token match 225/225 | | ||
| | #179 | Compression mindset shifts doc | — (doc) | | ||
| | #180 | SlotL foundation (8 × i8 on shared SVD) | ✓ unit tests pass | | ||
| | #181 | HhtlDTensor × SlotL integration | ✓ tests pass, integration with centroid flawed | | ||
| | #182 | SharedPaletteGroup × SlotL group-level | ✓ tests pass | | ||
| | #183 | Universal encoder with Base17 centroid reconstruction | ✗ ρ ≈ 0.04 on real Qwen3 | | ||
| | #184 | HhtlF32Tensor + Path A/B probes | ◐ Path A ρ̄ 0.2–0.5 (improves on Base17, short of target); Path B 3.71% (fails) | | ||
| | #185 | `HhtlF32Tensor` palette bounds (codex P1) | ✓ safety fix | | ||
| | #186 | This doc + SpiralEncoding reconstruction probe | — (probe) | | ||
|
|
||
| Next session starts here. | ||
|
|
||
| https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj | ||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line says the P1 experiment probe is
spiral_reconstruction_probe.rsin this PR, but this commit only adds documentation and the repository has no file by that name, so the documented next step is not runnable as written. Because this document is intended as the handoff for the next session, the missing artifact can cause immediate confusion and duplicate effort; either add the probe file or change this entry toNOT YET WRITTEN(consistent with P2–P4).Useful? React with 👍 / 👎.