feat(bgz-tensor): HhtlF32Tensor codec + Path A encoder + Path B argmax-parity probe by AdaWorldAPI · Pull Request #184 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-15T12:05:42Z

Summary

Two follow-up paths from PR #183's reconstruction failure, both tested on real Qwen3-TTS-0.6B.

Path	What	Status
A	`HhtlF32Tensor` — f32 CLAM centroids replace Base17 palette; keeps SlotL on top	Running — partial results ρ̄ ≈ 0.2–0.5 on attention/embed tensors
B	`cascade_attention_probe` — argmax-parity probe on single attention head using existing Base17 palette + FisherZTable	FAIL — 3.71% top-1 agreement (19/512)

Path B finding (architectural)

Path B reused the existing Base17 palette as the indexing substrate. Argmax parity vs raw f32 Q · K^T collapsed to 3.71%. Root cause is the same class of error as #183: Base17 fold discards direction information, so two rows that are near-neighbours by f32 dot product land in completely different palette cells.

Paths A and B are NOT separate tracks. Path B's FisherZTable approach only works if the palette faithfully partitions the row space by inner-product similarity. Base17 palette doesn't. f32 CLAM palette might. Path B depends on Path A's f32 palette substrate before it can work. This is a session insight that wasn't clear before running the probe.
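The agreement figure above (19/512 ≈ 3.71%) is a top-1 parity count. A minimal sketch of how such a count could be computed follows. It is not the code in `cascade_attention_probe.rs`: the names `top1_agreement`, `argmax`, `dot`, and the `codec_score` callback are hypothetical, and the callback stands in for whatever palette/table lookup the codec uses.

```rust
/// Plain f32 dot product over the shared prefix of two slices.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Index of the largest score; ties resolve to the first maximum.
/// Assumes the scores contain no NaN values.
fn argmax(scores: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &s) in scores.iter().enumerate() {
        match best {
            Some((_, b)) if s <= b => {}
            _ => best = Some((i, s)),
        }
    }
    best.map(|(i, _)| i)
}

/// Fraction of queries whose top-1 key under `codec_score` matches the
/// top-1 key under the raw f32 dot product.
/// `codec_score` is a hypothetical stand-in for a codec-space similarity.
fn top1_agreement<F>(queries: &[Vec<f32>], keys: &[Vec<f32>], codec_score: F) -> f32
where
    F: Fn(&[f32], &[f32]) -> f32,
{
    if queries.is_empty() || keys.is_empty() {
        return 0.0;
    }
    let mut hits = 0usize;
    for q in queries {
        let raw: Vec<f32> = keys.iter().map(|k| dot(q, k)).collect();
        let coded: Vec<f32> = keys.iter().map(|k| codec_score(q, k)).collect();
        if argmax(&raw) == argmax(&coded) {
            hits += 1;
        }
    }
    hits as f32 / queries.len() as f32
}
```

Passing `dot` itself as `codec_score` gives an agreement of 1.0 by construction, which is a useful sanity check before plugging in a lossy scorer.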

Path A preliminary numbers (run still live)

[1/26] ARGMAX code_predictor/down      [1024×3072] × 5   ρ̄=0.27
[2/26] INDEX  code_predictor/embed     [2048×1024] × 15  ρ̄=0.47
[3/26] ARGMAX code_predictor/gate      [3072×1024] × 5   ρ̄=0.21
[4/26] INDEX  code_predictor/lm_head   [2048×1024] × 15  ρ̄=0.41
[5/26] ARGMAX code_predictor/qko       [1024×1024] × 5   ρ̄=0.49

Roughly 10× better than HhtlDTensor's ρ ≈ 0.04 on the same tensors (5× to 12× across these five tensors). This confirms the Base17 reconstruction pathology and shows that f32 centroids are moving in the right direction, but the results are still short of the 0.95 argmax / 0.98 index targets.
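The description does not define ρ. The sketch below assumes it is the Pearson correlation between an original row and its reconstruction, with ρ̄ the mean over rows; if the encoder actually reports cosine similarity, the centering step would be dropped. The function names `pearson` and `mean_row_rho` are illustrative, not taken from the crate.

```rust
/// Pearson correlation between two equal-length slices.
/// Returns None when the inputs are empty, differ in length, or are constant.
fn pearson(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let n = a.len() as f64;
    let mean_a = a.iter().map(|&x| x as f64).sum::<f64>() / n;
    let mean_b = b.iter().map(|&x| x as f64).sum::<f64>() / n;
    let (mut cov, mut var_a, mut var_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (dx, dy) = (x as f64 - mean_a, y as f64 - mean_b);
        cov += dx * dy;
        var_a += dx * dx;
        var_b += dy * dy;
    }
    let denom = (var_a * var_b).sqrt();
    if denom == 0.0 {
        None
    } else {
        Some(cov / denom)
    }
}

/// Mean per-row correlation between original and reconstructed rows.
/// Rows whose correlation is undefined are skipped.
fn mean_row_rho(original: &[Vec<f32>], reconstructed: &[Vec<f32>]) -> Option<f64> {
    let rhos: Vec<f64> = original
        .iter()
        .zip(reconstructed)
        .filter_map(|(o, r)| pearson(o, r))
        .collect();
    if rhos.is_empty() {
        None
    } else {
        Some(rhos.iter().sum::<f64>() / rhos.len() as f64)
    }
}
```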

Why Path A isn't hitting targets

On 1024-dim near-orthogonal rows:

k=256 centroids: nearest centroid to any row is at L2 distance ≈ √2 × row_norm (near-orthogonal to target)
8 SVD coefficients recover < 1% of signal direction for rank-1024 rows
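
Both figures can be motivated with a short calculation. This is a sketch under idealized assumptions that the description does not state explicitly: the nearest centroid c has roughly the same norm as the row x, the two are nearly orthogonal (x · c ≈ 0), and the residual's energy is spread about evenly over all 1024 directions.

```latex
\|x - c\|^2 = \|x\|^2 + \|c\|^2 - 2\,x \cdot c \approx 2\,\|x\|^2
\quad\Longrightarrow\quad
\|x - c\| \approx \sqrt{2}\,\|x\|

\frac{\text{energy captured by 8 components}}{\text{total residual energy}}
\approx \frac{8}{1024} \approx 0.78\% < 1\%
```

Under the same assumptions, raising the component count to 16 or 32 per cluster only helps if each cluster's residuals are concentrated in far fewer than 1024 directions, which is what a per-leaf local basis would be testing.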

Two clean fallbacks the other session's review named (E4, E5):

k=1024 or 2048 centroids — larger palette, denser coverage
Per-leaf local SVD basis — 16–32 components per centroid cluster instead of 8 shared

Full run numbers + decision will follow in a comment when Path A completes.

What this PR contains

File	Purpose
`crates/bgz-tensor/src/hhtl_f32.rs` (+316 L, 5 tests)	New reconstruction-grade codec: f32 palette + SlotL
`crates/bgz-tensor/src/lib.rs` (+1 L)	`pub mod hhtl_f32;`
`crates/thinking-engine/examples/universal_hhtl_f32_encode.rs` (+264 L)	Path A encoder with regime dispatch, gates 1 + 3
`crates/thinking-engine/examples/cascade_attention_probe.rs` (+192 L)	Path B argmax-parity probe (single attention head)

Tests passing

test hhtl_f32::tests::entry_byte_size_is_one ........................... ok
test hhtl_f32::tests::encode_without_leaf_picks_real_rows_as_centroids . ok
test hhtl_f32::tests::reconstruct_without_leaf_returns_nearest_centroid  ok
test hhtl_f32::tests::storage_accounting_is_additive ................... ok
test hhtl_f32::tests::encode_with_leaf_beats_without_leaf_on_real_rows . ok
5 passed; 0 failed

Recommendation

Merge the module + tests even if Path A gates fail on full run — HhtlF32Tensor is a correct primitive. The example can be either:

Merged as-is with a note documenting the measured ρ̄ ≈ 0.3 result, OR
Closed, reworked in a follow-up PR with k=1024 + per-leaf local SVD

Depends on full run result — will update this description when it lands.

Session PR stack

PR	Status
`#180`	merged — SlotL foundation
`#181`	merged — HhtlDTensor × SlotL
`#182`	merged — SharedPaletteGroup × SlotL
`#183`	merged — universal_hhtld_encode (Base17 path, reconstruction failure documented)
`#184`	this PR — HhtlF32Tensor + Path A/B probes

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Post-#183 finding: Base17 palette substrate can't reconstruct rows for f32 GEMM (per-row ρ ≈ 0.04 on real Qwen3). This lands both paths the other session ranked as viable forward directions.

## Path A: HhtlF32Tensor (reconstruction-grade)

New module crates/bgz-tensor/src/hhtl_f32.rs (5 tests passing):

    pub struct HhtlF32Entry { pub twig: u8 }   // 1 byte/row
    pub struct HhtlF32Tensor {
        palette_f32: Vec<Vec<f32>>,            // CLAM centroids in f32
        entries: Vec<HhtlF32Entry>,
        slot_l: Option<Vec<SlotL>>,
        slot_l_scale: Option<f32>,
        svd_basis: Option<SvdBasis>,
        ...
    }
    impl HhtlF32Tensor {
        fn encode(role, rows, k) -> Self;             // 1 B/row, argmax regime
        fn encode_with_leaf(role, rows, k, basis);    // 9 B/row, index regime
        fn reconstruct_row(idx, n_cols) -> Vec<f32>;
        fn reconstruct_rows(n_cols) -> Vec<Vec<f32>>;
    }

Pipeline:

    row      → CLAM furthest-point → twig idx (1 byte)
    residual → SvdBasis::project   → SlotL (8 × i8)
    decode:    palette_f32[twig] + SvdBasis::reconstruct(slot_l * scale)

Per-tensor footprint for [n_rows, n_cols]:

    palette BF16: 256 × n_cols × 2
    SVD basis:    8 × n_cols × 2
    entries:      n_rows × 1
    slot_l:       n_rows × 8 (if index regime)

Tests (5 new, all passing):

    encode_without_leaf_picks_real_rows_as_centroids
    reconstruct_without_leaf_returns_nearest_centroid
    encode_with_leaf_beats_without_leaf_on_real_rows   ← ρ ≥ 0.95 on low-rank
    entry_byte_size_is_one
    storage_accounting_is_additive

Example: universal_hhtl_f32_encode.rs uses the same gates as the #183 universal encoder, but with HhtlF32Tensor. Running on Qwen3-TTS-0.6B in background.

## Path B: cascade_attention_probe (codec-space inference)

New example: cascade_attention_probe.rs. Measures argmax agreement between:

    Raw:   argmax_i q · K[i]^T (f32 dot)
    Codec: argmax_i FisherZTable[pal_idx(q), pal_idx(K[i])]

on 512 perturbed queries against a real attention K matrix (talker layer 0 self_attn.k_proj, shape [1024, 2048]).

Pass criteria (subjective):

    ≥ 90% top-1 agreement → Path B viable for pipeline-wide swap
    ≥ 70% partial         → Path B needs Q-side escalation layer
    < 70% fail            → not competitive with f32 GEMM

Both runs launched; results will be posted as PR comments when they complete.

## Session PR stack

    #180 merged    SlotL foundation
    #181 merged    HhtlDTensor × SlotL
    #182 merged    SharedPaletteGroup × SlotL
    #183 merged    universal_hhtld_encode (Base17, reconstruction failure documented)
    #184 this PR   HhtlF32Tensor codec + Path A/B examples

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
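
The per-tensor footprint listed in the commit message can be written out as a small accounting function. This is a sketch rather than the crate's own storage accounting: `Footprint` and `footprint` are hypothetical names, and counting the SVD basis only when SlotL is present is an assumption (the listing does not say whether the basis is stored in the argmax regime).

```rust
/// Byte counts per component, following the footprint listing in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Footprint {
    palette: usize,
    svd_basis: usize,
    entries: usize,
    slot_l: usize,
}

impl Footprint {
    /// Total bytes: the simple sum of all components.
    fn total(&self) -> usize {
        self.palette + self.svd_basis + self.entries + self.slot_l
    }
}

/// Footprint for an [n_rows, n_cols] tensor with k = 256 centroids.
/// ASSUMPTION: the SVD basis is only stored in the index regime.
fn footprint(n_rows: usize, n_cols: usize, index_regime: bool) -> Footprint {
    const K: usize = 256; // palette size
    const SVD_COMPONENTS: usize = 8; // shared basis vectors
    const BF16_BYTES: usize = 2; // palette and basis stored as BF16
    const SLOT_L_BYTES: usize = 8; // 8 × i8 per row
    Footprint {
        palette: K * n_cols * BF16_BYTES,
        svd_basis: if index_regime { SVD_COMPONENTS * n_cols * BF16_BYTES } else { 0 },
        entries: n_rows, // 1-byte twig index per row
        slot_l: if index_regime { n_rows * SLOT_L_BYTES } else { 0 },
    }
}
```

For a [2048×1024] tensor in the index regime this gives 559,104 bytes, of which 524,288 are the palette, so at these row counts the palette dominates the footprint.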

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16ad36001c


chatgpt-codex-connector · 2026-04-15T12:09:18Z

+        let mut best_d = f32::MAX;
+        for (ci, c) in centroids.iter().enumerate() {
+            let d = l2_dist_sq(row, c);
+            if d < best_d { best_d = d; best = ci as u8; }


Reject palette sizes that exceed 255 centroids

HhtlF32Entry.twig is a u8, but encode accepts any k and assign_nearest_f32 stores centroid IDs via ci as u8. When k > 256, centroid indices above 255 wrap (e.g., 300→44), so rows are silently assigned and reconstructed from the wrong centroid, corrupting cosine/argmax quality measurements instead of failing fast. Please clamp or reject k beyond the representable range (and validate k > 0) before writing twig indices.
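
The wrap-around described in this comment is ordinary Rust `as` truncation. The short sketch below contrasts it with a checked conversion; the helper names `truncating_twig` and `checked_twig` are hypothetical and only illustrate the two behaviours.

```rust
use std::convert::TryFrom;

/// What `ci as u8` does: keeps the low 8 bits and silently drops the rest.
fn truncating_twig(ci: usize) -> u8 {
    ci as u8
}

/// Fail-fast alternative: returns None when the index does not fit in a u8.
fn checked_twig(ci: usize) -> Option<u8> {
    u8::try_from(ci).ok()
}
```

With these helpers, `truncating_twig(300)` yields 44 while `checked_twig(300)` yields `None`, so the out-of-range index is detected instead of being mapped to a different centroid.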


Per codex P1 comment on #184: HhtlF32Entry.twig is u8, so valid centroid IDs are 0..=255. Before this fix, encode() accepted any k and assign_nearest_f32 silently wrapped ci as u8: with k > 300 (say), centroid 300 would be stored as twig 44 and the wrong row reconstructed. This was actively dangerous because the next-session plan (PR #184 thread) explicitly proposed k=1024 or 2048 centroids as the quality fallback.

Fix:
- New `pub const MAX_PALETTE_K: usize = 256` with clear docstring
- Both `encode` and `encode_with_leaf` now assert k > 0 and k <= MAX_PALETTE_K, with explicit panic messages naming the u8 twig limit

Larger palettes need a codec with a wider twig index (u16 would lift the cap to 65536, but changes the wire format). That's a separate PR if/when the quality probe shows k=512+ earns its keep.

Tests (4 new, all pass + 5 existing):

    encode_rejects_zero_k                  (#[should_panic = "k > 0"])
    encode_rejects_k_above_256             (#[should_panic = "u8 twig limit"])
    encode_with_leaf_rejects_k_above_256   (same)
    encode_accepts_k_at_max_palette        (k=256 must still succeed)

Refs:
- PR #184 codex P1 comment ("Reject palette sizes that exceed 255 centroids")
- Follow-up to merged PRs #180/#181/#182/#183/#184

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
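
A minimal sketch of the guard this commit message describes, assuming the two assertions are factored into one helper. `MAX_PALETTE_K` and the panic substrings ("k > 0", "u8 twig limit") come from the message above; the helper name `validate_palette_k` and the exact message wording are hypothetical.

```rust
/// Largest palette a u8 twig index can address (centroid IDs 0..=255).
pub const MAX_PALETTE_K: usize = 256;

/// Hypothetical helper that `encode` and `encode_with_leaf` could call
/// before any twig index is written. Panics on an unrepresentable k.
fn validate_palette_k(k: usize) {
    assert!(k > 0, "k > 0 required: the palette needs at least one centroid");
    assert!(
        k <= MAX_PALETTE_K,
        "k = {} exceeds the u8 twig limit of {} centroids",
        k,
        MAX_PALETTE_K
    );
}
```

Asserting at encode time means a request such as k=1024 fails immediately with a message naming the limit, rather than producing a tensor whose quality numbers are silently wrong.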

Session-end artefact for future déjà-vu. Catalogues every compression approach tried in PRs #176-#185 and the lesson each one produced. No approach is thrown away: each failed experiment carries information about where the real boundary is.

## Structure

### Core invariants (6)

I1. Two regimes, opposite needs (argmax vs index)
I2. Near-orthogonality of weight rows in high dim
I3. Direction vs amplitude cannot be merged into one scalar
I4. Wire-format type widths are hard caps; assert at encode time
I5. 'u8 can span u16/u64 effective' requires the right decoder
I6. The ticket-for-curve model (SpiralAddress + shared curve)

### Approaches tried (7)

A1. HhtlDTensor: Base17 + Slot D + Slot V (correct for cascade, wrong for f32 GEMM)
A2. Progressive residual RVQ with k-ladder (works argmax, fails index)
A3. Hierarchical CLAM 256x256 (REFUTED: cos 0.0046 on vocab)
A4. Passthrough BF16 n_rows > 8192 (SHIPS for correctness, net loss for ratio)
A5. SlotL 8 x i8 on SVD basis (correct algorithm, misapplied to Base17 centroid)
A6. HhtlF32Tensor f32 palette + SlotL (right direction, 10x better, still short)
A7. cascade_attention_probe Base17 palette (3.71% argmax agreement; palette doesn't preserve inner products)

### Abstractions that ARE the right primitive (3)

R1. highheelbgz::rehydrate::SpiralEncoding (exists, untested on real Qwen3)
R2. Per-role stride in NeuronPrint (q/k=3, v=5, gate=8, up=2, down=4)
R3. HHTL cascade inference (hhtl_cache RouteAction)

### Open probes (4)

P1. SpiralEncoding on real Qwen3 weights: claim rho >= 0.95 unproven
P2. Shared anchors + i8 position per row: depends on P1
P3. Palette preserves inner-product neighbourhoods: A7 refuted for Base17
P4. Log-radial CLAM with magnitude split: hypothesised > linear CLAM

### Déjà-vu table

Lists 7 'if you're tempted to...' instincts with the PR that already refuted them. Exists so future sessions hit the lesson before writing the code.

### Structural checklist (5 questions)

Before shipping any new codec:

1. What regime does this tensor belong to? (I1)
2. Does the codec encode direction AND amplitude separately? (I3)
3. Is the palette substrate inner-product-preserving? (I2, A7)
4. Does the decoder evaluate the curve, or tile anchors? (I5)
5. Are wire-format widths asserted at encode time? (I4)

## Why this doc matters

Every failed approach in this session taught something the next session would otherwise re-learn the hard way. HCLAM (#177->#178) already has its lesson buried in a passthrough commit. The Base17 reconstruction failure (#183) is buried in a PR comment. The #184 Path A/B duality (they aren't independent) is only visible if you read the probe results.

This doc surfaces all of it as a single index, structured for mutation: each approach has 'mutation hooks' naming how it could evolve into something that works, rather than being discarded.

## Next step blocked by token budget

The SpiralEncoding-on-real-Qwen3 probe (P1) is the obvious next experiment and would have landed in this PR. Deferred to a fresh session with budget. The doc leaves the probe fully specified so re-entering cold loses no context.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Runs `highheelbgz::rehydrate::SpiralEncoding` signature codec against 256 stride-sampled rows of Qwen3-TTS-0.6B k_proj.weight at K ∈ {4, 8, 16} with spiral stride=3 (matching NeuronPrint's k_proj role design).

Clarification discovered during probe design: SpiralEncoding is a SIGNATURE codec (17 Base17 dims × K anchor samples), not a dense row reconstructor. The probe measures neighborhood preservation:

    G1  self-cosine = 1.0 identity check
    G2  top-1 NN match to raw-cosine argmax
    G3  pairwise rank agreement on random pairs

## Results

    K=4    top-1=18.4%   top-5=39.8%   rank-agree=0.663   142 B/row
    K=8    top-1=31.6%   top-5=59.8%   rank-agree=0.747   278 B/row
    K=16   top-1=44.9%   top-5=78.9%   rank-agree=0.803   550 B/row

Self-cos = 1.000000 at all K (identity holds).

## Status: P1 PARTIAL

~12× better than Base17 palette (#184 cascade_attention_probe: 3.71% top-1 at similar cost). Monotonic with K but misses the 90% top-1 / 0.85 rank-agree threshold even at K=16 (550 B/row, 100× more than Slot D's 4 B).

## Forward menu (updated in invariants doc)

1. Hybrid: SpiralEncoding at small K + compact BF16/i8 residual.
2. Per-role stride sweep (tested stride=3; other roles 2/4/5/8).
3. Wire SpiralEncoding into #184 cascade_attention_probe directly. Cascade routing (Skip/Attend/Compose/Escalate) may converge even when raw argmax-parity does not.

docs/CODEC_INVARIANTS_AND_EXPERIMENTS.md updated: P1 status moved from 'unproven' to 'PARTIAL with measurements', mutation hooks added, session finding summary documents the K-bound + forward menu.

Refs:
- #184 Path B cascade_attention_probe: Base17 palette baseline
- knowledge/phi-spiral-reconstruction.md: "spiral as constraint, not guess" design principle

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
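
The commit message does not spell out how G3 is scored. One plausible reading, sketched here as an assumption, is the fraction of pair-of-pairs that both similarity measures order the same way, where 0.5 corresponds to chance. The name `rank_agreement` and its input layout are hypothetical.

```rust
/// Fraction of pair-of-pairs ordered the same way by both similarity measures.
/// `raw[i]` and `coded[i]` hold the similarity of the same sampled row pair
/// under raw cosine and under the signature codec respectively.
/// ASSUMPTION: this concordance definition is one reading of "pairwise rank
/// agreement"; the probe's actual formula is not given in the source.
fn rank_agreement(raw: &[f32], coded: &[f32]) -> Option<f64> {
    if raw.len() != coded.len() {
        return None;
    }
    let (mut agree, mut total) = (0usize, 0usize);
    for i in 0..raw.len() {
        for j in (i + 1)..raw.len() {
            let dr = raw[i] - raw[j];
            let dc = coded[i] - coded[j];
            if dr == 0.0 || dc == 0.0 {
                continue; // ties carry no ordering information
            }
            total += 1;
            if (dr > 0.0) == (dc > 0.0) {
                agree += 1;
            }
        }
    }
    if total == 0 {
        None
    } else {
        Some(agree as f64 / total as f64)
    }
}
```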

fix(bgz-tensor): reject HhtlF32Tensor palette sizes >= 256 (codex #184 P1)

Codifies 7 anti-patterns (AP1-AP7) learned from PRs #176-#188 into an agent card that fires flags when the session repeats them:

    AP1: "225/225 feels like success" without gate 2 (#178)
    AP2: Projecting quality from docs instead of measuring (#177)
    AP3: Building new codec before benching existing ones (#184)
    AP4: Centroid-residual framing on near-orthogonal data (#177/#183)
    AP5: Python in the inference hot path
    AP6: Chained score multiplication without chain-collapse check (P5)
    AP7: Modifying ndarray without explicit permission (#176)

Invoked by adk-coordinator when pattern repetition is suspected, or by human directly. Output: list of fired flags, max 7 lines.

Also audited all 29 agent cards across both repos:
- All pin model: opus or model: sonnet (no hardcoded versions)
- opus → Opus 4.7 automatically, sonnet → Sonnet 4.6
- 3 ndarray agents on sonnet (l3-strategist, migration-tracker, product-engineer), intentional for speed-over-depth roles
- adk-coordinator missing Bash tool (by design, delegates)
- sentinel-qa missing Edit/Write (by design, audit-only)

No agent changes needed for Opus 4.7 compatibility: model: opus resolves correctly.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

chatgpt-codex-connector Bot reviewed Apr 15, 2026


AdaWorldAPI merged commit e59042e into main Apr 15, 2026

AdaWorldAPI mentioned this pull request Apr 15, 2026

docs: codec invariants + experiment catalogue (session-end déjà-vu) #186

Merged

AdaWorldAPI added a commit that referenced this pull request Apr 17, 2026

Merge pull request #185 from AdaWorldAPI/claude/hhtl-f32-palette-bounds

f0aa58b

fix(bgz-tensor): reject HhtlF32Tensor palette sizes >= 256 (codex #184 P1)

AdaWorldAPI merged 1 commit into main from claude/hhtl-f32-codec
