feat(bgz-tensor): HhtlF32Tensor codec + Path A encoder + Path B argmax-parity probe #184

Merged
AdaWorldAPI merged 1 commit into main from claude/hhtl-f32-codec on Apr 15, 2026

Conversation

@AdaWorldAPI
Owner

Summary

Two follow-up paths from PR #183's reconstruction failure, both tested on real Qwen3-TTS-0.6B.

| Path | What | Status |
|---|---|---|
| A | HhtlF32Tensor: f32 CLAM centroids replace Base17 palette; keeps SlotL on top | Running; partial results ρ̄ ≈ 0.2–0.5 on attention/embed tensors |
| B | cascade_attention_probe: argmax-parity probe on single attention head using existing Base17 palette + FisherZTable | FAIL: 3.71% top-1 agreement (19/512) |

Path B finding (architectural)

Path B reused the existing Base17 palette as the indexing substrate. Argmax parity vs raw f32 Q · K^T collapsed to 3.71%. Root cause is the same class of error as #183: Base17 fold discards direction information, so two rows that are near-neighbours by f32 dot product land in completely different palette cells.

Paths A and B are NOT separate tracks. Path B's FisherZTable approach only works if the palette faithfully partitions the row space by inner-product similarity. Base17 palette doesn't. f32 CLAM palette might. Path B depends on Path A's f32 palette substrate before it can work. This is a session insight that wasn't clear before running the probe.

Path A preliminary numbers (run still live)

[1/26] ARGMAX code_predictor/down      [1024×3072] × 5   ρ̄=0.27
[2/26] INDEX  code_predictor/embed     [2048×1024] × 15  ρ̄=0.47
[3/26] ARGMAX code_predictor/gate      [3072×1024] × 5   ρ̄=0.21
[4/26] INDEX  code_predictor/lm_head   [2048×1024] × 15  ρ̄=0.41
[5/26] ARGMAX code_predictor/qko       [1024×1024] × 5   ρ̄=0.49

~10× better than HhtlDTensor's ρ ≈ 0.04 on the same tensors. This confirms the Base17 reconstruction pathology and shows that f32 centroids are moving in the right direction, but the results are still short of the 0.95 argmax / 0.98 index targets.

Why Path A isn't hitting targets

On 1024-dim near-orthogonal rows:

  • k=256 centroids: nearest centroid to any row is at L2 distance ≈ √2 × row_norm (near-orthogonal to target)
  • 8 SVD coefficients recover < 1% of signal direction for rank-1024 rows
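
Both figures follow from a short calculation. The sketch below assumes that a row r and its nearest centroid c have similar norms and are nearly orthogonal, and that the 8 shared basis directions are no better aligned with a given row than random directions would be; P_8 (the orthogonal projector onto the 8-dimensional basis) is notation introduced here, not taken from the PR.

```latex
\|r - c\|^2 = \|r\|^2 + \|c\|^2 - 2\, r \cdot c \;\approx\; 2\,\|r\|^2
\quad\Longrightarrow\quad
\|r - c\| \approx \sqrt{2}\,\|r\|

\frac{\mathbb{E}\,\|P_8\, r\|^2}{\|r\|^2} = \frac{8}{1024} \approx 0.78\%
```

Under those assumptions the centroid carries almost no directional information about the row, and 8 coefficients capture under 1% of its energy, which matches the two bullets above.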

Two clean fallbacks the other session's review named (E4, E5):

  • k=1024 or 2048 centroids — larger palette, denser coverage
  • Per-leaf local SVD basis — 16–32 components per centroid cluster instead of 8 shared

Full run numbers + decision will follow in a comment when Path A completes.

What this PR contains

| File | Purpose |
|---|---|
| crates/bgz-tensor/src/hhtl_f32.rs (+316 L, 5 tests) | New reconstruction-grade codec: f32 palette + SlotL |
| crates/bgz-tensor/src/lib.rs (+1 L) | pub mod hhtl_f32; |
| crates/thinking-engine/examples/universal_hhtl_f32_encode.rs (+264 L) | Path A encoder with regime dispatch, gates 1 + 3 |
| crates/thinking-engine/examples/cascade_attention_probe.rs (+192 L) | Path B argmax-parity probe (single attention head) |

Tests passing

test hhtl_f32::tests::entry_byte_size_is_one ........................... ok
test hhtl_f32::tests::encode_without_leaf_picks_real_rows_as_centroids . ok
test hhtl_f32::tests::reconstruct_without_leaf_returns_nearest_centroid  ok
test hhtl_f32::tests::storage_accounting_is_additive ................... ok
test hhtl_f32::tests::encode_with_leaf_beats_without_leaf_on_real_rows . ok
5 passed; 0 failed

Recommendation

Merge the module + tests even if Path A gates fail on full run — HhtlF32Tensor is a correct primitive. The example can be either:

  1. Merged as-is with a note documenting the measured ρ̄ ≈ 0.3 result, OR
  2. Closed, reworked in a follow-up PR with k=1024 + per-leaf local SVD

Depends on full run result — will update this description when it lands.

Session PR stack

| PR | Status |
|---|---|
| #180 | merged: SlotL foundation |
| #181 | merged: HhtlDTensor × SlotL |
| #182 | merged: SharedPaletteGroup × SlotL |
| #183 | merged: universal_hhtld_encode (Base17 path, reconstruction failure documented) |
| #184 | this PR: HhtlF32Tensor + Path A/B probes |

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Post-#183 finding: Base17 palette substrate can't reconstruct rows for
f32 GEMM (per-row ρ ≈ 0.04 on real Qwen3). This lands both paths the
other session ranked as viable forward directions.

## Path A — HhtlF32Tensor (reconstruction-grade)

New module crates/bgz-tensor/src/hhtl_f32.rs (5 tests passing):

  pub struct HhtlF32Entry { pub twig: u8 }             // 1 byte/row
  pub struct HhtlF32Tensor {
      palette_f32: Vec<Vec<f32>>,    // CLAM centroids in f32
      entries:     Vec<HhtlF32Entry>,
      slot_l:      Option<Vec<SlotL>>,
      slot_l_scale: Option<f32>,
      svd_basis:   Option<SvdBasis>,
      ...
  }

  impl HhtlF32Tensor {
      fn encode(role, rows, k) -> Self;           // 1 B/row, argmax regime
      fn encode_with_leaf(role, rows, k, basis);  // 9 B/row, index regime
      fn reconstruct_row(idx, n_cols) -> Vec<f32>;
      fn reconstruct_rows(n_cols) -> Vec<Vec<f32>>;
  }

Pipeline:
  row     →   CLAM furthest-point  →  twig idx (1 byte)
  residual →  SvdBasis::project    →  SlotL (8 × i8)
  decode:   palette_f32[twig] + SvdBasis::reconstruct(slot_l * scale)
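
The decode line above can be sketched as follows. This is a minimal illustration, not the module's code: `Basis` is a hypothetical stand-in for `SvdBasis`, and it assumes each SlotL coefficient is dequantised as `i8 × scale` before being applied to its basis component.

```rust
/// Hypothetical stand-in for the PR's `SvdBasis`:
/// `components[j]` is one basis vector of length n_cols.
struct Basis {
    components: Vec<Vec<f32>>,
}

/// decode: palette[twig] + sum_j (slot_l[j] * scale) * components[j]
/// `slot_l` is None in the argmax regime (centroid only).
fn reconstruct_row(
    palette: &[Vec<f32>],
    twig: u8,
    slot_l: Option<(&[i8], f32)>,
    basis: &Basis,
) -> Vec<f32> {
    // Start from the centroid selected by the 1-byte twig index.
    let mut row = palette[twig as usize].clone();
    if let Some((coeffs, scale)) = slot_l {
        // Add the dequantised residual, one basis component at a time.
        for (coeff, component) in coeffs.iter().zip(&basis.components) {
            let w = *coeff as f32 * scale;
            for (out, b) in row.iter_mut().zip(component) {
                *out += w * b;
            }
        }
    }
    row
}
```

With `slot_l` set to `None` the function returns the nearest centroid unchanged, which is the behaviour the `reconstruct_without_leaf_returns_nearest_centroid` test name describes.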

Per-tensor footprint for [n_rows, n_cols]:
  palette BF16: 256 × n_cols × 2
  SVD basis:    8 × n_cols × 2
  entries:      n_rows × 1
  slot_l:       n_rows × 8 (if index regime)
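
As a worked instance of the footprint list above, the sketch below totals the four terms for a given shape. The function name `footprint_bytes` and the constants are illustrative and not taken from the PR; the SVD basis is charged unconditionally because the list states it that way, which may not match how the codec stores the argmax regime.

```rust
const PALETTE_K: usize = 256; // centroids per tensor
const SVD_RANK: usize = 8;    // shared basis components, also i8 coefficients per row

/// Total bytes for one [n_rows, n_cols] tensor, term by term as listed above.
fn footprint_bytes(n_rows: usize, n_cols: usize, index_regime: bool) -> usize {
    let palette = PALETTE_K * n_cols * 2; // BF16 centroids
    let basis = SVD_RANK * n_cols * 2;    // BF16 SVD basis
    let entries = n_rows;                 // 1-byte twig per row
    let slot_l = if index_regime { n_rows * SVD_RANK } else { 0 };
    palette + basis + entries + slot_l
}

/// Raw BF16 size of the same tensor, for comparison.
fn raw_bf16_bytes(n_rows: usize, n_cols: usize) -> usize {
    n_rows * n_cols * 2
}
```

For the [2048×1024] embed tensor in the index regime this gives 559,104 bytes against 4,194,304 bytes of raw BF16, about 7.5×. The palette alone accounts for 524,288 of those bytes, so at this row count the palette dominates the footprint.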

Tests (5 new, all passing):
  encode_without_leaf_picks_real_rows_as_centroids
  reconstruct_without_leaf_returns_nearest_centroid
  encode_with_leaf_beats_without_leaf_on_real_rows  ← ρ ≥ 0.95 on low-rank
  entry_byte_size_is_one
  storage_accounting_is_additive

Example: universal_hhtl_f32_encode.rs — same gates as #183 universal
encoder, but uses HhtlF32Tensor. Running on Qwen3-TTS-0.6B in background.

## Path B — cascade_attention_probe (codec-space inference)

New example: cascade_attention_probe.rs. Measures argmax agreement
between:
  Raw:    argmax_i  q · K[i]^T                          (f32 dot)
  Codec:  argmax_i  FisherZTable[pal_idx(q), pal_idx(K[i])]

on 512 perturbed queries against a real attention K matrix (talker
layer 0 self_attn.k_proj, shape [1024, 2048]).

Pass criteria (subjective):
  ≥ 90% top-1 agreement → Path B viable for pipeline-wide swap
  ≥ 70% partial         → Path B needs Q-side escalation layer
  <  70% fail           → not competitive with f32 GEMM
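
The agreement measurement and the pass criteria above can be sketched as follows. This is an illustration under assumptions, not the example's code: `table` plays the role of FisherZTable as a plain 2-D lookup, `q_idx` / `k_idx` are precomputed palette indices, and all function names are hypothetical.

```rust
/// Index of the largest score; ties resolve to the first maximum.
fn argmax(scores: &[f32]) -> usize {
    let mut best = 0;
    for (i, s) in scores.iter().enumerate() {
        if *s > scores[best] {
            best = i;
        }
    }
    best
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Fraction of queries whose codec-space argmax equals the raw f32 argmax.
fn top1_agreement(
    queries: &[Vec<f32>],
    keys: &[Vec<f32>],
    q_idx: &[usize],
    k_idx: &[usize],
    table: &[Vec<f32>],
) -> f32 {
    let mut hits = 0usize;
    for (q, &qi) in queries.iter().zip(q_idx) {
        // Raw: f32 dot product against every key row.
        let raw: Vec<f32> = keys.iter().map(|k| dot(q, k)).collect();
        // Codec: table lookup on palette indices only.
        let codec: Vec<f32> = k_idx.iter().map(|&ki| table[qi][ki]).collect();
        if argmax(&raw) == argmax(&codec) {
            hits += 1;
        }
    }
    hits as f32 / queries.len() as f32
}

/// Maps an agreement fraction onto the three bands listed above.
fn verdict(agreement: f32) -> &'static str {
    if agreement >= 0.90 {
        "viable"
    } else if agreement >= 0.70 {
        "partial"
    } else {
        "fail"
    }
}
```

Because the codec side sees only palette indices, two keys that share a palette cell always receive identical scores, so agreement depends entirely on how well the palette separates rows by inner product.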

Both runs launched; results will be posted as PR comments when they
complete.

## Session PR stack

  #180 merged   SlotL foundation
  #181 merged   HhtlDTensor × SlotL
  #182 merged   SharedPaletteGroup × SlotL
  #183 merged   universal_hhtld_encode (Base17 — reconstruction failure documented)
  #184 this PR  HhtlF32Tensor codec + Path A/B examples

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16ad36001c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

let mut best_d = f32::MAX;
for (ci, c) in centroids.iter().enumerate() {
    let d = l2_dist_sq(row, c);
    if d < best_d { best_d = d; best = ci as u8; }

P1: Reject palette sizes that exceed 255 centroids

HhtlF32Entry.twig is a u8, but encode accepts any k and assign_nearest_f32 stores centroid IDs via ci as u8. When k > 256, centroid indices above 255 wrap (e.g., 300→44), so rows are silently assigned and reconstructed from the wrong centroid, corrupting cosine/argmax quality measurements instead of failing fast. Please clamp or reject k beyond the representable range (and validate k > 0) before writing twig indices.
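
The wrap described above is ordinary Rust `as` cast behaviour and can be reproduced in isolation. The helper names below are hypothetical; the second function shows one checked alternative from the standard library, not necessarily the fix the codec should adopt.

```rust
use std::convert::TryFrom;

/// Lossy cast, as in `ci as u8`: indices above 255 wrap modulo 256.
fn twig_wrapping(ci: usize) -> u8 {
    ci as u8
}

/// Checked conversion: returns None instead of silently wrapping.
fn twig_checked(ci: usize) -> Option<u8> {
    u8::try_from(ci).ok()
}
```

Here `twig_wrapping(300)` yields 44, the value quoted in the comment, while `twig_checked(300)` yields `None`.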

@AdaWorldAPI AdaWorldAPI merged commit e59042e into main Apr 15, 2026
AdaWorldAPI pushed a commit that referenced this pull request Apr 15, 2026
Per codex P1 comment on #184: HhtlF32Entry.twig is u8, so valid
centroid IDs are 0..=255. Before this fix, encode() accepted any k
and assign_nearest_f32 silently wrapped ci as u8 — passing k=300
(say) would assign centroid-300 as twig-44 and reconstruct the wrong
row. This was actively dangerous because the next-session plan (PR
#184 thread) explicitly proposed k=1024 or 2048 centroids as the
quality fallback.

Fix:
  - New `pub const MAX_PALETTE_K: usize = 256` with clear docstring
  - Both `encode` and `encode_with_leaf` now assert:
      k > 0
      k <= MAX_PALETTE_K
    with explicit panic messages naming the u8 twig limit
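
A minimal sketch of the guard described above. The commit places the asserts inside `encode` and `encode_with_leaf`; the standalone helper `assert_valid_k` and the exact message wording are assumptions made so the check can be shown on its own.

```rust
/// Largest palette a u8 twig can address (indices 0..=255).
pub const MAX_PALETTE_K: usize = 256;

/// Hypothetical helper: panics unless 0 < k <= MAX_PALETTE_K.
fn assert_valid_k(k: usize) {
    assert!(k > 0, "k > 0 required: palette needs at least one centroid");
    assert!(
        k <= MAX_PALETTE_K,
        "k = {} exceeds u8 twig limit ({})",
        k,
        MAX_PALETTE_K
    );
}
```

Note that k = 256 passes: a palette of 256 centroids uses indices 0 through 255, all of which fit in a u8.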

Larger palettes need a codec with a wider twig-index (u16 would lift
the cap to 65536, but changes the wire format). That's a separate PR
if/when the quality probe shows k=512+ earns its keep.

Tests (4 new, all pass + 5 existing):
  encode_rejects_zero_k            (#[should_panic = "k > 0"])
  encode_rejects_k_above_256       (#[should_panic = "u8 twig limit"])
  encode_with_leaf_rejects_k_above_256  (same)
  encode_accepts_k_at_max_palette  (k=256 must still succeed)

Refs:
  - PR #184 codex P1 comment ("Reject palette sizes that exceed 255 centroids")
  - Follow-up to merged PRs #180/#181/#182/#183/#184

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
AdaWorldAPI pushed a commit that referenced this pull request Apr 15, 2026
Session-end artefact for future déjà-vu. Catalogues every compression
approach tried in PRs #176-#185 and the lesson each one produced. No
approach is thrown away — each failed experiment carries information
about where the real boundary is.

## Structure

### Core invariants (6)
  I1. Two regimes, opposite needs (argmax vs index)
  I2. Near-orthogonality of weight rows in high dim
  I3. Direction vs amplitude cannot be merged into one scalar
  I4. Wire-format type widths are hard caps — assert at encode time
  I5. 'u8 can span u16/u64 effective' requires the right decoder
  I6. The ticket-for-curve model (SpiralAddress + shared curve)

### Approaches tried (7)
  A1. HhtlDTensor — Base17 + Slot D + Slot V (correct for cascade, wrong for f32 GEMM)
  A2. Progressive residual RVQ with k-ladder (works argmax, fails index)
  A3. Hierarchical CLAM 256x256 (REFUTED — cos 0.0046 on vocab)
  A4. Passthrough BF16 n_rows > 8192 (SHIPS for correctness, net loss for ratio)
  A5. SlotL 8 x i8 on SVD basis (correct algorithm, misapplied to Base17 centroid)
  A6. HhtlF32Tensor f32 palette + SlotL (right direction, 10x better, still short)
  A7. cascade_attention_probe Base17 palette (3.71% argmax agreement — palette doesn't preserve inner products)

### Abstractions that ARE the right primitive (3)
  R1. highheelbgz::rehydrate::SpiralEncoding (exists, untested on real Qwen3)
  R2. Per-role stride in NeuronPrint (q/k=3, v=5, gate=8, up=2, down=4)
  R3. HHTL cascade inference (hhtl_cache RouteAction)

### Open probes (4)
  P1. SpiralEncoding on real Qwen3 weights — claim rho >= 0.95 unproven
  P2. Shared anchors + i8 position per row — depends on P1
  P3. Palette preserves inner-product neighbourhoods — A7 refuted for Base17
  P4. Log-radial CLAM with magnitude split — hypothesised > linear CLAM

### Déjà-vu table

Lists 7 'if you're tempted to...' instincts with the PR that already
refuted them. Exists so future sessions hit the lesson before writing
the code.

### Structural checklist (5 questions)

Before shipping any new codec:
  1. What regime does this tensor belong to? (I1)
  2. Does the codec encode direction AND amplitude separately? (I3)
  3. Is the palette substrate inner-product-preserving? (I2, A7)
  4. Does the decoder evaluate the curve, or tile anchors? (I5)
  5. Are wire-format widths asserted at encode time? (I4)

## Why this doc matters

Every failed approach in this session taught something the next session
would otherwise re-learn the hard way. HCLAM (#177->#178) already has
its lesson buried in a passthrough commit. The Base17 reconstruction
failure (#183) is buried in a PR comment. The #184 Path A/B duality
(they aren't independent) is only visible if you read the probe results.

This doc surfaces all of it as a single index, structured for mutation:
each approach has 'mutation hooks' naming how it could evolve into
something that works, rather than being discarded.

## Next step blocked by token budget

The SpiralEncoding-on-real-Qwen3 probe (P1) is the obvious next
experiment and would have landed in this PR. Deferred to a fresh
session with budget. The doc leaves the probe fully specified so
re-entering cold loses no context.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
AdaWorldAPI pushed a commit that referenced this pull request Apr 17, 2026
Runs `highheelbgz::rehydrate::SpiralEncoding` signature codec against
256 stride-sampled rows of Qwen3-TTS-0.6B k_proj.weight at K ∈ {4, 8, 16}
with spiral stride=3 (matching NeuronPrint's k_proj role design).

Clarification discovered during probe design: SpiralEncoding is a
SIGNATURE codec (17 Base17 dims × K anchor samples), not a dense row
reconstructor. The probe measures neighborhood preservation:
  G1 self-cosine = 1.0 identity check
  G2 top-1 NN match to raw-cosine argmax
  G3 pairwise rank agreement on random pairs
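
The three gates can be sketched as below. The commit does not spell out how G3 is computed, so `rank_agreement` is one plausible reading (compare the ordering of two pair similarities in each space); all function names are hypothetical, and `sig` stands for whatever vectors the signature codec yields per row.

```rust
/// Cosine similarity between two equal-length vectors (G1 checks cosine(r, r) = 1).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Index of the row most similar to row `i`, excluding `i` itself.
fn nearest(rows: &[Vec<f32>], i: usize) -> usize {
    let mut best = i;
    let mut best_sim = f32::NEG_INFINITY;
    for j in 0..rows.len() {
        if j == i {
            continue;
        }
        let s = cosine(&rows[i], &rows[j]);
        if s > best_sim {
            best_sim = s;
            best = j;
        }
    }
    best
}

/// G2: fraction of rows whose nearest neighbour agrees between raw and signature space.
fn top1_nn_match(raw: &[Vec<f32>], sig: &[Vec<f32>]) -> f32 {
    let hits = (0..raw.len())
        .filter(|&i| nearest(raw, i) == nearest(sig, i))
        .count();
    hits as f32 / raw.len() as f32
}

/// G3 (assumed form): fraction of pair comparisons ordered the same way in both spaces.
fn rank_agreement(
    raw: &[Vec<f32>],
    sig: &[Vec<f32>],
    pairs: &[((usize, usize), (usize, usize))],
) -> f32 {
    let hits = pairs
        .iter()
        .filter(|&&((a, b), (c, d))| {
            let raw_order = cosine(&raw[a], &raw[b]) > cosine(&raw[c], &raw[d]);
            let sig_order = cosine(&sig[a], &sig[b]) > cosine(&sig[c], &sig[d]);
            raw_order == sig_order
        })
        .count();
    hits as f32 / pairs.len() as f32
}
```

Under this reading a rank agreement of 0.5 is chance level, which is worth keeping in mind when interpreting G3 scores.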

## Results

  K=4   top-1=18.4%  top-5=39.8%  rank-agree=0.663  142 B/row
  K=8   top-1=31.6%  top-5=59.8%  rank-agree=0.747  278 B/row
  K=16  top-1=44.9%  top-5=78.9%  rank-agree=0.803  550 B/row

Self-cos = 1.000000 at all K (identity holds).

## Status: P1 PARTIAL

At K=16, ~12× better than the Base17 palette (#184 cascade_attention_probe:
3.71% top-1 at similar cost). Top-1 rises monotonically with K but misses
the 90% top-1 / 0.85 rank-agree threshold even at K=16 (550 B/row, roughly
140× Slot D's 4 B).

## Forward menu (updated in invariants doc)

1. Hybrid: SpiralEncoding at small K + compact BF16/i8 residual.
2. Per-role stride sweep (tested stride=3; other roles 2/4/5/8).
3. Wire SpiralEncoding into #184 cascade_attention_probe directly —
   cascade routing (Skip/Attend/Compose/Escalate) may converge
   even when raw argmax-parity does not.

docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md updated: P1 status moved
from 'unproven' to 'PARTIAL with measurements', mutation hooks added,
session finding summary documents the K-bound + forward menu.

Refs:
  - #184 Path B cascade_attention_probe — Base17 palette baseline
  - knowledge/phi-spiral-reconstruction.md — "spiral as constraint,
    not guess" design principle

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
AdaWorldAPI added a commit that referenced this pull request Apr 17, 2026
fix(bgz-tensor): reject HhtlF32Tensor palette sizes >= 256 (codex #184 P1)
AdaWorldAPI pushed a commit that referenced this pull request Apr 17, 2026
Codifies 7 anti-patterns (AP1-AP7) learned from PRs #176-#188 into
an agent card that fires flags when the session repeats them:

  AP1: "225/225 feels like success" without gate 2 (#178)
  AP2: Projecting quality from docs instead of measuring (#177)
  AP3: Building new codec before benching existing ones (#184)
  AP4: Centroid-residual framing on near-orthogonal data (#177/#183)
  AP5: Python in the inference hot path
  AP6: Chained score multiplication without chain-collapse check (P5)
  AP7: Modifying ndarray without explicit permission (#176)

Invoked by adk-coordinator when pattern repetition is suspected, or
by human directly. Output: list of fired flags, max 7 lines.

Also audited all 29 agent cards across both repos:
  - All pin model: opus or model: sonnet (no hardcoded versions)
  - opus → Opus 4.7 automatically, sonnet → Sonnet 4.6
  - 3 ndarray agents on sonnet (l3-strategist, migration-tracker,
    product-engineer) — intentional for speed-over-depth roles
  - adk-coordinator missing Bash tool (by design — delegates)
  - sentinel-qa missing Edit/Write (by design — audit-only)

No agent changes needed for Opus 4.7 compatibility — model: opus
resolves correctly.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

2 participants