feat(bgz-tensor): SharedPaletteGroup × SlotL group-level integration #182

Merged: AdaWorldAPI merged 2 commits into main from claude/shared-palette-slot-l on Apr 15, 2026

@AdaWorldAPI
Owner

Summary

Step 3 of the universal_hhtld_encode plan. SVD basis for Slot L now lives at group granularity — one basis amortised across all same-role same-shape tensors.

Single file: crates/bgz-tensor/src/shared_palette.rs (+260 LOC, all additive).

Changes (additive, backwards-compat)

pub struct SharedPaletteGroup {
    // ... existing fields preserved ...
    pub tensor_slot_l: Vec<(String, Vec<SlotL>, f32)>,  // per-tensor leaves
    pub svd_basis:     Option<SvdBasis>,                 // shared basis
}

pub fn should_use_leaf(role: &str) -> bool;
pub fn build_group_with_leaf(key, names, rows_f32, k) -> SharedPaletteGroup;

impl SharedPaletteGroup {
    pub fn slot_l_byte_size(&self) -> usize;
    pub fn svd_basis_byte_size(&self) -> usize;
    pub fn slot_l_for(&self, tensor_name: &str) -> Option<(&[SlotL], f32)>;
}

The existing build_group_with_fisher_z constructor defaults the new fields to empty / None. No caller needs to change.

Dispatch table (mirrors the mindset-shift regime split)

| role | should_use_leaf | Wire cost | Quality target |
|---|---|---|---|
| embed / lm_head | true (index regime) | 12 B/row (Slot D + Slot V + Slot L) | ρ ≳ 0.98 per row |
| qko, v, gate, up, down, projection, other | false (argmax regime) | 4 B/row | ρ ≳ 0.95 per row is sufficient |

build_group_with_leaf does this dispatch internally. Argmax-regime calls delegate to the existing build_group_with_fisher_z path unchanged. Index-regime calls build one SvdBasis from the first tensor's rows (capped at 4096 sample rows for speed on 151K-row vocab tensors) and feed every tensor through encode_with_leaf with that shared basis.
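The dispatch rule in the table can be sketched as below. This is a minimal illustration, not the code in shared_palette.rs: it assumes roles are matched as exact strings, and `wire_bytes_per_row` is a hypothetical helper added here only to tie the rule to the per-row costs in the table.

```rust
/// Sketch of the regime split: only "embed" and "lm_head" get a Slot L leaf.
/// Assumption: roles are compared as exact strings.
pub fn should_use_leaf(role: &str) -> bool {
    matches!(role, "embed" | "lm_head")
}

/// Hypothetical helper: per-row wire cost in bytes implied by the table.
pub fn wire_bytes_per_row(role: &str) -> usize {
    const BASE: usize = 4; // Slot D + Slot V
    const SLOT_L: usize = 8; // 8 × i8 leaf coefficients
    if should_use_leaf(role) {
        BASE + SLOT_L
    } else {
        BASE
    }
}
```

Any role outside the two index-regime names, including unknown ones, falls through to the 4 B/row argmax path, which matches the "other" entry in the table.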

Tests (all new pass)

| Test | Verifies |
|---|---|
| should_use_leaf_classification | 2 true (embed, lm_head) / 7 false |
| build_group_with_leaf_falls_back_for_argmax_regime | role="qko" → no basis, no Slot L, same 4 B/row as before |
| build_group_with_leaf_populates_slot_l_for_index_regime | role="embed", 2 tensors × 64 rows × 128 cols → Slot L populated at 8 B/row, basis shared |
| svd_basis_shared_across_group_not_per_tensor | Basis size constant regardless of tensor count; amortisation confirmed |

Full bgz-tensor suite: 154 passing (150 + 4 new). Pre-existing failures on main (gamma_calibration, hhtl_d_entry_roundtrip, matryoshka, hhtl_cache) unchanged.

Storage amortisation

For Qwen3-TTS-0.6B's text embedding as a one-tensor group:

  • Before: each tensor would own a full copy of its SVD basis
  • After: one SvdBasis shared across all tensors in the (talker, embed, [151936, 2048]) group

For multi-tensor groups (e.g. 15 lm_heads under (talker, lm_head, [2048, 1024])):

  • Basis cost flat (≈ 32 KB for 8 × 2048 × 2 B)
  • Entries cost scales linearly (15 × n_rows × 8 B)
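The arithmetic behind these two bullets can be written out as a short sketch. The function names are hypothetical; the constants (2 B per basis value, 8 B per Slot L row) come from the figures quoted above, and the rank of 8 is taken from the "8 × 2048 × 2 B" expression.

```rust
/// Bytes for one shared SVD basis: `rank` basis vectors of `n_cols` 2-byte values.
pub fn basis_bytes(rank: usize, n_cols: usize) -> usize {
    rank * n_cols * 2
}

/// Bytes for all Slot L leaves in a group, at 8 B per row.
pub fn slot_l_bytes(n_tensors: usize, n_rows: usize) -> usize {
    n_tensors * n_rows * 8
}

/// Bytes saved by sharing one basis instead of storing one copy per tensor.
pub fn basis_bytes_saved(n_tensors: usize, rank: usize, n_cols: usize) -> usize {
    n_tensors.saturating_sub(1) * basis_bytes(rank, n_cols)
}
```

With rank 8 and 2048 columns the basis is 32,768 B, which is the ≈ 32 KB figure above. For a 15-tensor group, sharing the basis saves 14 × 32,768 = 458,752 B, while a one-tensor group saves nothing.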

Session PR stack (three sibling PRs land together cleanly)

| PR | Status | What |
|---|---|---|
| #178 | open | Passthrough fix + Lance roadmap + WAV test (from tts_rvq_e2e.rs path) |
| #180 | merged | SlotL foundation module |
| #181 | merged | HhtlDTensor × SlotL per-tensor integration |
| #182 | this PR | SharedPaletteGroup × SlotL group-level integration |

Follow-ups (next PRs in the chain)

  1. universal_hhtld_encode.rs example — iterate over tensors in a safetensors model, bucket by (classify_component, classify_role, effective_shape), feed each bucket to build_group_with_leaf (which internally dispatches on role).
  2. .hhtld container format — single-file pack with magic byte header, palette + SVD basis + per-tensor entries + Slot L.
  3. Inference wiring — swap tts_full_inference's custom RVQ codebook sum for HhtlDTensor::reconstruct_row on index-regime tensors; argmax-parity validation against raw inference.
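The bucketing step in follow-up 1 could look roughly like this. It is a sketch under assumptions: `TensorMeta` and `GroupKey` are hypothetical types, and the component, role, and shape are taken as already classified, since the signatures of classify_component, classify_role, and effective_shape are not shown in this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical group key: (component, role, shape).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct GroupKey {
    pub component: String,
    pub role: String,
    pub shape: (usize, usize),
}

/// Hypothetical pre-classified tensor description.
pub struct TensorMeta {
    pub name: String,
    pub component: String,
    pub role: String,
    pub shape: (usize, usize),
}

/// Collect tensor names into buckets that share component, role, and shape.
pub fn bucket_tensors(tensors: &[TensorMeta]) -> BTreeMap<GroupKey, Vec<String>> {
    let mut buckets: BTreeMap<GroupKey, Vec<String>> = BTreeMap::new();
    for t in tensors {
        let key = GroupKey {
            component: t.component.clone(),
            role: t.role.clone(),
            shape: t.shape,
        };
        buckets.entry(key).or_default().push(t.name.clone());
    }
    buckets
}
```

Each bucket would then be handed to build_group_with_leaf, which decides the regime from the role in the key. A BTreeMap is used here only so that bucket order is deterministic between runs.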

Test plan

  • cargo build --manifest-path crates/bgz-tensor/Cargo.toml — clean
  • cargo test shared_palette — 8/8 pass (4 new)
  • Full cargo test bgz-tensor — 154 pass, 5 pre-existing failures (not mine)
  • universal_hhtld_encode.rs example (next PR)
  • .hhtld container format (next PR)
  • Inference wiring (next PR)

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Step 3 of the universal_hhtld_encode plan (#179 mindset doc, #180 SlotL
foundation, #181 HhtlDTensor integration). The SVD basis for Slot L now
lives at GROUP granularity — one basis amortised across all same-role
same-shape tensors in the group.

## Changes (additive, backwards-compat)

SharedPaletteGroup gains:
  - pub tensor_slot_l: Vec<(String, Vec<SlotL>, f32)>   // per-tensor leaves
  - pub svd_basis: Option<SvdBasis>                      // shared basis

Existing `build_group_with_fisher_z` constructor defaults both to empty /
None. No caller needs to change.

## New dispatch primitive

  pub fn should_use_leaf(role: &str) -> bool
    true  -> "embed" | "lm_head"   (index-regime, per-row identity needed)
    false -> everything else       (argmax-regime, 4 B/row is enough)

Maps directly to the two-regime split named in
docs/COMPRESSION_MINDSET_SHIFTS.md § "The insight that reframes the rest".

## New entry point

  pub fn build_group_with_leaf(key, names, rows_f32, k) -> SharedPaletteGroup

Dispatches on key.role:
  - argmax-regime  -> delegates to build_group_with_fisher_z (unchanged)
  - index-regime   -> builds ONE SvdBasis from first tensor's rows
                      (capped at 4096 sample rows for speed on 151K-vocab),
                      then encodes each tensor via encode_with_leaf so the
                      basis is shared across the whole group

Wire cost per row for index-regime groups: 4 B (Slot D + Slot V) + 8 B
(Slot L) = 12 B/row. Basis cost is amortised: one SvdBasis per group,
regardless of tensor count.

## Convenience methods on SharedPaletteGroup

  slot_l_byte_size()        -> bytes across all per-tensor Slot L entries
  svd_basis_byte_size()     -> bytes for the shared SVD basis (0 if None)
  slot_l_for(tensor_name)   -> Option<(&[SlotL], f32)> for lookup in
                                reconstruction paths
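A minimal sketch of how the two Slot L helpers could behave on the `Vec<(String, Vec<SlotL>, f32)>` layout. Assumptions: `SlotL` is modelled as eight `i8` coefficients (per the #180 description), the `f32` is read as a per-tensor scale, and `GroupLeaves` is a hypothetical stand-in for the relevant part of SharedPaletteGroup.

```rust
/// Assumed leaf layout: 8 × i8 coefficients on the shared SVD basis.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SlotL(pub [i8; 8]);

/// Hypothetical stand-in holding only the per-tensor leaves.
pub struct GroupLeaves {
    /// (tensor name, one leaf per row, assumed per-tensor scale)
    pub tensor_slot_l: Vec<(String, Vec<SlotL>, f32)>,
}

impl GroupLeaves {
    /// Total bytes across all per-tensor Slot L entries.
    pub fn slot_l_byte_size(&self) -> usize {
        self.tensor_slot_l
            .iter()
            .map(|(_, leaves, _)| leaves.len() * std::mem::size_of::<SlotL>())
            .sum()
    }

    /// Linear lookup by tensor name; returns the leaves and their scale.
    pub fn slot_l_for(&self, tensor_name: &str) -> Option<(&[SlotL], f32)> {
        self.tensor_slot_l
            .iter()
            .find(|(name, _, _)| name.as_str() == tensor_name)
            .map(|(_, leaves, scale)| (leaves.as_slice(), *scale))
    }
}
```

A linear scan is adequate for groups with a handful of tensors; a map keyed by name would be the obvious alternative if groups grow large.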

## Tests (all new pass)

  should_use_leaf_classification ................................. ok
  build_group_with_leaf_falls_back_for_argmax_regime ............. ok
    role="qko" -> no SVD basis, no Slot L (4 B/row preserved)
  build_group_with_leaf_populates_slot_l_for_index_regime ........ ok
    role="embed", 2 tensors × 64 rows × 128 cols -> Slot L populated
    at 8 B/row, basis shared (single SvdBasis)
  svd_basis_shared_across_group_not_per_tensor ................... ok
    Confirms amortisation: basis_size is constant; entries_size scales
    linearly with tensor count.

Full bgz-tensor suite: 150 passing, 4 new = 154. Pre-existing failures
on main (gamma_calibration, hhtl_d_entry_roundtrip, matryoshka,
hhtl_cache) are unchanged — not introduced by this work.

## Relation to prior PRs in session

  #180 (merged) - SlotL module (8 × i8 on shared SVD basis)
  #181 (merged) - HhtlDTensor × SlotL per-tensor integration
  #182 (this)   - SharedPaletteGroup × SlotL group-level amortisation

## Follow-ups

- `universal_hhtld_encode.rs` example: iterate over tensors, bucket by
  (classify_component, classify_role, effective_shape), feed each bucket
  to build_group_with_leaf (which internally dispatches on role)
- .hhtld container format: single-file pack with magic byte header,
  palette + basis + per-tensor entries + Slot L
- Inference wiring: swap tts_full_inference's RVQ codebook sum for
  HhtlDTensor::reconstruct_row on the index-regime tensors

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f9dced855


Comment on lines +261 to +263
if tensor_rows_f32.is_empty() {
    return build_group_with_fisher_z(key, tensor_names, tensor_rows_f32, k);
}

P2: Return early for empty group input

The empty-input guard currently still panics: when tensor_rows_f32.is_empty() this branch delegates to build_group_with_fisher_z, which immediately indexes tensor_rows_f32[0] and aborts. Any caller that passes an empty bucket (for example, after filtering/grouping) will crash instead of getting a safe fallback, so this helper does not actually handle the case it checks for.


AdaWorldAPI pushed a commit that referenced this pull request Apr 15, 2026
… with SlotL dispatch

Implements the `universal_hhtld_encode` proposal from
docs/COMPRESSION_MINDSET_SHIFTS.md (#179), built on the #180, #181, #182
foundation stack.

## What it does

Consumes any BPE-vocab safetensors model, buckets tensors by
(component, role, shape), routes each bucket through
bgz_tensor::shared_palette::build_group_with_leaf which auto-dispatches
on role:

  argmax regime  (qko/v/gate/up/down/projection)  → 4 B/row Slot D only
  index  regime  (embed/lm_head)                   → 12 B/row Slot D + Slot L
  passthrough    (norms, biases, < is_encodable)   → BF16 unchanged

## Validation gates

This ships gates 1 + 3 of the 4-gate plan:

  GATE 1: per-row ρ histogram, split by regime
          - argmax: target median ≥ 0.95, p5 ≥ 0.90
          - index:  target median ≥ 0.98, p5 ≥ 0.95
  GATE 3: storage ratio vs BF16 original
          - target ≥ 2:1

Sample-based (first 64 rows per tensor) to keep wall time bounded on
151K-row vocab tensors.
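A gate-1 check could be sketched as follows. Two things here are assumptions rather than facts from the example: ρ is taken to be the Pearson correlation between an original row and its reconstruction, and percentiles use a simple rounded-index rule on the sorted values. All function names are hypothetical.

```rust
/// Pearson correlation between two rows (assumed meaning of per-row ρ).
pub fn pearson(a: &[f32], b: &[f32]) -> f64 {
    let n = a.len().min(b.len());
    if n == 0 {
        return 0.0;
    }
    let mean_a = a[..n].iter().map(|&x| x as f64).sum::<f64>() / n as f64;
    let mean_b = b[..n].iter().map(|&x| x as f64).sum::<f64>() / n as f64;
    let (mut cov, mut var_a, mut var_b) = (0.0, 0.0, 0.0);
    for i in 0..n {
        let (da, db) = (a[i] as f64 - mean_a, b[i] as f64 - mean_b);
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    let denom = (var_a * var_b).sqrt();
    if denom == 0.0 { 0.0 } else { cov / denom }
}

/// Percentile by rounded index into the sorted values; NaN for empty input.
pub fn percentile(values: &[f64], p: f64) -> f64 {
    if values.is_empty() {
        return f64::NAN;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|x, y| x.total_cmp(y));
    let idx = ((p / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    sorted[idx]
}

/// Gate 1: both the median and the 5th percentile must clear their targets.
pub fn passes_gate1(rhos: &[f64], median_min: f64, p5_min: f64) -> bool {
    !rhos.is_empty()
        && percentile(rhos, 50.0) >= median_min
        && percentile(rhos, 5.0) >= p5_min
}
```

Under this sketch the argmax regime would be checked with `passes_gate1(&rhos, 0.95, 0.90)` and the index regime with `passes_gate1(&rhos, 0.98, 0.95)`. Checking p5 alongside the median is what catches a small tail of badly reconstructed rows that a healthy median would hide.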

Gates 2 (argmax-parity on held-out prompt) and 4 (WAV envelope match
vs raw) require integration with tts_full_inference.rs and land in a
follow-up PR.

## Usage

  cargo run --release --example universal_hhtld_encode \
      --manifest-path crates/thinking-engine/Cargo.toml \
      -- /path/to/model.safetensors

## Design notes

- reconstruct_row_from_group rebuilds a transient HhtlDTensor from the
  SharedPaletteGroup's (cache, entries, slot_l, svd_basis) so it can
  call HhtlDTensor::reconstruct_row. Cleaner would be a method on
  SharedPaletteGroup; deferred to keep the PR focused on the example.
- Sample cap at 64 rows per tensor: full validation pass on Qwen3-TTS-0.6B
  is O(bucket_count × tensor_count × 64 × n_cols) which bounds wall time.
  For gate 2 (argmax parity) the full row set matters — handled in the
  follow-up integration PR.

## Session PR stack

  #180 (merged) - SlotL foundation
  #181 (merged) - HhtlDTensor × SlotL per-tensor integration
  #182 (this PR's dep) - SharedPaletteGroup × SlotL group-level integration
  #183 (this PR) - universal_hhtld_encode example + gates 1 + 3

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
…#182)

Per codex P2 comment on #182: build_group_with_leaf's empty-input
guard dispatched to build_group_with_fisher_z, which then indexed
tensor_rows_f32[0] unconditionally. Result: panic on empty slice
instead of graceful handling.

Fix: move the empty-input guard into build_group_with_fisher_z itself.
Now both entry points are safe:
  - build_group_with_fisher_z([]) -> empty SharedPaletteGroup
  - build_group_with_leaf([])     -> empty SharedPaletteGroup (via fallback)

The empty group has:
  - empty WeightPalette via WeightPalette::build(&[], k)
  - empty HhtlCache from that palette
  - empty hip_families / tensor_entries / tensor_slot_l
  - None for fisher_z / svd_basis

This matches the shape a successful build would return for a trivial
input — callers iterating over `tensor_entries` or `tensor_slot_l`
get zero iterations instead of catching an unwind.
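The shape of the fix can be illustrated with a simplified sketch. `Group`, `build_base`, and `build_with_leaf` are hypothetical stand-ins for SharedPaletteGroup, build_group_with_fisher_z, and build_group_with_leaf; the point is only that the guard sits in the base builder, before anything indexes the first tensor.

```rust
/// Hypothetical, heavily simplified group.
#[derive(Debug, Default, PartialEq)]
pub struct Group {
    pub n_cols: usize,
    pub tensor_entries: Vec<(String, usize)>, // (tensor name, row count)
    pub has_basis: bool,
}

/// Base builder: the empty-input guard runs before the first tensor is touched.
pub fn build_base(names: &[&str], tensors: &[Vec<Vec<f32>>]) -> Group {
    let Some(first) = tensors.first() else {
        return Group::default(); // empty group instead of a panic
    };
    let n_cols = first.first().map_or(0, |row| row.len());
    let tensor_entries = names
        .iter()
        .zip(tensors)
        .map(|(name, rows)| (name.to_string(), rows.len()))
        .collect();
    Group { n_cols, tensor_entries, has_basis: false }
}

/// Leaf builder: empty or argmax-regime input falls back to the base builder.
pub fn build_with_leaf(role: &str, names: &[&str], tensors: &[Vec<Vec<f32>>]) -> Group {
    if tensors.is_empty() || !matches!(role, "embed" | "lm_head") {
        return build_base(names, tensors);
    }
    let mut group = build_base(names, tensors);
    group.has_basis = true; // stands in for building the shared SvdBasis
    group
}
```

Because the guard lives in the base builder, every caller is protected, not only those that go through the leaf entry point.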

Regression test empty_input_returns_safe_empty_group_not_panic covers
three entry paths:
  1. build_group_with_leaf with index-regime key (embed)
  2. build_group_with_fisher_z directly
  3. build_group_with_leaf with argmax-regime key (qko) — falls through
     to build_group_with_fisher_z

All 9 shared_palette tests pass (8 existing + 1 new).

Refs: PR #182 codex comment (P2 badge, "Avoid panic when leaf builder
receives no tensors")

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
AdaWorldAPI merged commit 05d42be into main on Apr 15, 2026
AdaWorldAPI pushed a commit that referenced this pull request Apr 15, 2026
Post-#183 finding: Base17 palette substrate can't reconstruct rows for
f32 GEMM (per-row ρ ≈ 0.04 on real Qwen3). This lands both paths the
other session ranked as viable forward directions.

## Path A — HhtlF32Tensor (reconstruction-grade)

New module crates/bgz-tensor/src/hhtl_f32.rs (5 tests passing):

  pub struct HhtlF32Entry { pub twig: u8 }             // 1 byte/row
  pub struct HhtlF32Tensor {
      palette_f32: Vec<Vec<f32>>,    // CLAM centroids in f32
      entries:     Vec<HhtlF32Entry>,
      slot_l:      Option<Vec<SlotL>>,
      slot_l_scale: Option<f32>,
      svd_basis:   Option<SvdBasis>,
      ...
  }

  impl HhtlF32Tensor {
      fn encode(role, rows, k) -> Self;           // 1 B/row, argmax regime
      fn encode_with_leaf(role, rows, k, basis);  // 9 B/row, index regime
      fn reconstruct_row(idx, n_cols) -> Vec<f32>;
      fn reconstruct_rows(n_cols) -> Vec<Vec<f32>>;
  }

Pipeline:
  row     →   CLAM furthest-point  →  twig idx (1 byte)
  residual →  SvdBasis::project    →  SlotL (8 × i8)
  decode:   palette_f32[twig] + SvdBasis::reconstruct(slot_l * scale)
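The residual step of this pipeline can be sketched in a few lines. This is an illustration under assumptions, not the hhtl_f32.rs code: the basis is modelled as up to eight orthonormal row vectors, the scale is a positive per-tensor value, and the function names and the ±127 clamp are hypothetical.

```rust
/// Project the residual (row - centroid) onto each basis vector and quantise to i8.
/// Assumes an orthonormal basis and scale > 0.
pub fn quantise_residual(row: &[f32], centroid: &[f32], basis: &[Vec<f32>], scale: f32) -> [i8; 8] {
    let mut leaf = [0i8; 8];
    for (slot, vec) in leaf.iter_mut().zip(basis) {
        let dot: f32 = row
            .iter()
            .zip(centroid)
            .zip(vec)
            .map(|((r, c), b)| (r - c) * b)
            .sum();
        *slot = (dot / scale).round().clamp(-127.0, 127.0) as i8;
    }
    leaf
}

/// Decode: centroid plus the de-quantised combination of basis vectors.
pub fn reconstruct_row(centroid: &[f32], basis: &[Vec<f32>], leaf: &[i8; 8], scale: f32) -> Vec<f32> {
    let mut out = centroid.to_vec();
    for (coef, vec) in leaf.iter().zip(basis) {
        let c = *coef as f32 * scale;
        for (o, b) in out.iter_mut().zip(vec) {
            *o += c * b;
        }
    }
    out
}
```

With a centroid of `[1, 1, 1, 1]`, a basis of the first two unit vectors, and scale 0.5, the row `[1.5, 0, 1, 1]` encodes to coefficients `[1, -2, 0, ...]` and decodes back exactly. Real rows only round-trip to within the quantisation step and whatever energy lies outside the basis.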

Per-tensor footprint for [n_rows, n_cols]:
  palette BF16: 256 × n_cols × 2
  SVD basis:    8 × n_cols × 2
  entries:      n_rows × 1
  slot_l:       n_rows × 8 (if index regime)
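The footprint list can be turned into a small calculator. The sketch assumes the SVD basis is only stored when Slot L is present (the list above marks only slot_l as conditional), and both function names are hypothetical.

```rust
/// Encoded bytes for one [n_rows, n_cols] tensor under the footprint above.
/// Assumption: the SVD basis is stored only in the index regime.
pub fn hhtl_f32_bytes(n_rows: usize, n_cols: usize, index_regime: bool) -> usize {
    let palette = 256 * n_cols * 2; // 256 centroids, 2 B per value
    let entries = n_rows; // 1 B twig index per row
    let (basis, slot_l) = if index_regime {
        (8 * n_cols * 2, n_rows * 8)
    } else {
        (0, 0)
    };
    palette + basis + entries + slot_l
}

/// Storage ratio against the BF16 original (2 B per weight).
pub fn ratio_vs_bf16(n_rows: usize, n_cols: usize, index_regime: bool) -> f64 {
    (n_rows * n_cols * 2) as f64 / hhtl_f32_bytes(n_rows, n_cols, index_regime) as f64
}
```

The palette is a fixed cost of 256 rows, so under this accounting a per-tensor palette only pays off when the tensor has many more than 256 rows: a [256, 2048] tensor comes out slightly larger than its BF16 original, while a [151936, 2048] tensor shrinks by a factor of a few hundred.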

Tests (5 new, all passing):
  encode_without_leaf_picks_real_rows_as_centroids
  reconstruct_without_leaf_returns_nearest_centroid
  encode_with_leaf_beats_without_leaf_on_real_rows  ← ρ ≥ 0.95 on low-rank
  entry_byte_size_is_one
  storage_accounting_is_additive

Example: universal_hhtl_f32_encode.rs — same gates as #183 universal
encoder, but uses HhtlF32Tensor. Running on Qwen3-TTS-0.6B in background.

## Path B — cascade_attention_probe (codec-space inference)

New example: cascade_attention_probe.rs. Measures argmax agreement
between:
  Raw:    argmax_i  q · K[i]^T                          (f32 dot)
  Codec:  argmax_i  FisherZTable[pal_idx(q), pal_idx(K[i])]

on 512 perturbed queries against a real attention K matrix (talker
layer 0 self_attn.k_proj, shape [1024, 2048]).

Pass criteria (subjective):
  ≥ 90% top-1 agreement → Path B viable for pipeline-wide swap
  ≥ 70% partial         → Path B needs Q-side escalation layer
  <  70% fail           → not competitive with f32 GEMM
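The agreement measurement and the three-way verdict can be sketched as below. The score matrices are taken as given (one row of scores per query, one column per key); how the codec scores are produced from the FisherZTable is outside this sketch, and `Verdict`, `top1_agreement`, and `classify` are hypothetical names.

```rust
/// Index of the largest score; ties resolve to the last maximum.
pub fn argmax(scores: &[f32]) -> Option<usize> {
    scores
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

/// Fraction of queries whose top-1 key matches between the two score sets.
pub fn top1_agreement(raw: &[Vec<f32>], codec: &[Vec<f32>]) -> f64 {
    let n = raw.len().min(codec.len());
    if n == 0 {
        return 0.0;
    }
    let hits = raw
        .iter()
        .zip(codec)
        .filter(|(r, c)| {
            let (a, b) = (argmax(r), argmax(c));
            a.is_some() && a == b
        })
        .count();
    hits as f64 / n as f64
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Viable,          // ≥ 90% top-1 agreement
    NeedsEscalation, // ≥ 70%
    NotCompetitive,  // < 70%
}

/// Map an agreement fraction in [0, 1] to the pass criteria above.
pub fn classify(agreement: f64) -> Verdict {
    if agreement >= 0.90 {
        Verdict::Viable
    } else if agreement >= 0.70 {
        Verdict::NeedsEscalation
    } else {
        Verdict::NotCompetitive
    }
}
```

Top-1 agreement is a strict metric: a codec that ranks the true winner second counts as a miss, so a top-k variant may be worth reporting alongside it.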

Both runs launched; results will be posted as PR comments when they
complete.

## Session PR stack

  #180 merged   SlotL foundation
  #181 merged   HhtlDTensor × SlotL
  #182 merged   SharedPaletteGroup × SlotL
  #183 merged   universal_hhtld_encode (Base17 — reconstruction failure documented)
  #184 this PR  HhtlF32Tensor codec + Path A/B examples

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
AdaWorldAPI pushed a commit that referenced this pull request Apr 15, 2026
Per codex P1 comment on #184: HhtlF32Entry.twig is u8, so valid
centroid IDs are 0..=255. Before this fix, encode() accepted any k
and assign_nearest_f32 silently wrapped ci as u8: with k=300 (say),
centroid 299 would be stored as twig 43 and reconstruct the wrong
row. This was actively dangerous because the next-session plan (PR
#184 thread) explicitly proposed k=1024 or 2048 centroids as the
quality fallback.

Fix:
  - New `pub const MAX_PALETTE_K: usize = 256` with clear docstring
  - Both `encode` and `encode_with_leaf` now assert:
      k > 0
      k <= MAX_PALETTE_K
    with explicit panic messages naming the u8 twig limit

Larger palettes need a codec with a wider twig-index (u16 would lift
the cap to 65536, but changes the wire format). That's a separate PR
if/when the quality probe shows k=512+ earns its keep.
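The wrap and the guard can be demonstrated in isolation. In Rust, `usize as u8` truncates to the low 8 bits, so index 300 becomes 44 with no error. The sketch below returns `Result`/`Option` instead of asserting, which is a deliberate substitution for illustration (the fix described above panics); `validate_k` and `checked_twig` are hypothetical names.

```rust
/// Largest palette size addressable by a u8 twig index.
pub const MAX_PALETTE_K: usize = 256;

/// Result-returning variant of the k check (the commit uses assertions instead).
pub fn validate_k(k: usize) -> Result<(), String> {
    if k == 0 {
        return Err("k > 0 required".to_string());
    }
    if k > MAX_PALETTE_K {
        return Err(format!("k = {k} exceeds the u8 twig limit of {MAX_PALETTE_K}"));
    }
    Ok(())
}

/// Lossless conversion of a centroid index to a twig; None if it would wrap.
pub fn checked_twig(centroid_index: usize) -> Option<u8> {
    u8::try_from(centroid_index).ok()
}
```

Using `u8::try_from` at the assignment site gives a second line of defence: even if a caller bypasses the k check, an out-of-range index surfaces as `None` rather than as a silently wrong twig.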

Tests (4 new, all pass + 5 existing):
  encode_rejects_zero_k            (#[should_panic = "k > 0"])
  encode_rejects_k_above_256       (#[should_panic = "u8 twig limit"])
  encode_with_leaf_rejects_k_above_256  (same)
  encode_accepts_k_at_max_palette  (k=256 must still succeed)

Refs:
  - PR #184 codex P1 comment ("Reject palette sizes that exceed 255 centroids")
  - Follow-up to merged PRs #180/#181/#182/#183/#184

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj