fix: passthrough BF16 for vocab tensors + Lance upgrade roadmap + WAV validity test by AdaWorldAPI · Pull Request #178 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-14T18:44:06Z

Summary

Follow-up to #177 (merged) with the correctness fix for the text-embedding collapse, a planning doc for Lance 2 → 4/5 migration, and an end-to-end validity test producing actual WAV audio.

SHA	What
`3b084d1`	Passthrough BF16 for `n_rows > 8192` — HCLAM 256×256 (landed in #177) collapsed WORSE than RVQ on the vocab embedding (cos=0.004 vs 0.054). Root cause: vocab rows are near-orthogonal in 2048-d; single-centroid tree quantization cannot synthesize directions. Skip compression on the one vocab-sized tensor and ship it as BF16. Result: codec token match 225/225 = 100.0% on Qwen3-TTS-0.6B.
`f6ed834`	`docs/LANCE_UPGRADE_ROADMAP.md` (161 L) — Planning doc for Lance 2→4/5 migration. 9 features relevant (IVF_RQ, IVF multi-split PR #6423, HNSW fp16 partition assignment, CacheBackend, distributed segment builds, BF16 PyTorch ingest, pre-transposed PQ SIMD, file format 2.3, Hamming HNSW). DataFusion 51→52.1 is the primary blocker. 5-phase plan.
`6e8ee98`	`tts_full_inference` mkdir_p fix — example panicked on fresh-clone because codebooks/ parent dir didn't exist. Now creates it. Surfaced while running the end-to-end validity test that produced the WAV.

End-to-end validity test (new signal)

The #177 "225/225 codec token match" only proved raw inference == RVQ-reconstructed inference — stable codec, didn't prove raw itself was correct. A single-token-constant bug would still show 225/225.

This session ran tts_full_inference.rs end-to-end on Qwen3-TTS-0.6B with prompt "Hello world, this is a test of text to speech synthesis using a compressed neural network running entirely in Rust.":

[1] Tokenize: 22 tokens
[2] Embed: RMS=0.013 in 1.38s
[3] 28 talker layers: RMS climbs 0.66 → 1.01 (expected growth)
[3.5] codec_head + re-embed: RMS=0.026
[4] 5 code_predictor layers: RMS 0.23 → 1.37
[5] 128 autoregressive steps in 50.12s
    Step 0:   tokens=[68, 151, 102, 40]
    Step 32:  tokens=[254, 183, 154, 199]
    Step 64:  tokens=[92, 48, 158, 42]
    Step 96:  tokens=[31, 81, 196, 221]
    Step 127: tokens=[142, 251, 253, 178]
[6] Decoder latent RMS=2.29
[7] WAV: /home/user/models/tts_real_speech.wav (65580 bytes, 1.37s)

WAV statistics (mono 24 kHz 16-bit, 32,768 samples):

Metric	Value	Interpretation
RMS	0.0865 (−21 dB)	normal speech loudness
Peak	1.0000	hits limiter (tanh/snake)
Zero-crossings	11,968 /s	high — fricatives / unvoiced
Energy by time decile	0.097, 0.092, 0.066, 0.087, 0.068, 0.059, 0.131, 0.075, 0.067, 0.096	varies — real amplitude modulation

Not silence, not constant, not noise. Codec tokens vary across all 128×16 positions. Wiring is real — 33 transformer layers + codec head + 5 code_predictor layers + RVQ dequant + conv decoder all work together and emit dynamic audio.

Storage note (deferred)

With the passthrough fix active, total storage is still 1:1.39 — net loss because per-tensor RVQ codebooks at k=[256, 512, 1024, 4096] are individually larger than the BF16 tensors they compress. The correctness gate passed; storage ratio is a separate optimization track — see follow-up directions below.

Follow-up directions (out of scope for this PR)

BGZ-HHTL-D schema extension — user flagged it. Current Slot D (16-bit tree) + Slot V (16-bit BF16 scalar residual) is 4 B/row but can't point in specific directions on 1024-d rows. Proposed: add Slot L (8 bytes of i8 or TurboQuant on shared SVD basis) → 12 B/row total, ρ > 0.99 per row. Reuses matryoshka.rs SVD basis already in repo. Would give ~50:1 ratio on the vocab embedding (1.8 MB vs 622 MB BF16) at cos > 0.98.
Family codebook restructure — the existing 2+4+8 = 14-bit address space already supports 16,384 effective entries if HIP-indexed sub-palettes replace the single flat 256-entry palette. Zero wire-format change.
Lance 4.0 IVF_RQ + multi-split adoption — see docs/LANCE_UPGRADE_ROADMAP.md. Multi-split (PR #6423) is a candidate fix for the skewed-partition problem we hit. Vendor the algorithm without the full Lance migration as a first step.
Python HF reference comparison — listen to the WAV manually, also diff our codec tokens against transformers.AutoModelForCausalLM output on the same prompt for token-exact wiring proof.

Test plan

cargo build --release --example tts_rvq_e2e — clean
cargo build --release --example tts_full_inference — clean
Passthrough run: 478/478 cos = 1.0000, codec token match 225/225 = 100%
Full inference run: 22-token prompt → 128 frames → 1.37s 24 kHz WAV; RMS / ZC / energy stats confirm real audio
Human listen-check of the WAV (not done on this remote VM)
Compare Rust codec tokens to HF Python reference (next session)
Apply proposed BGZ-HHTL-D schema extension to close the storage-ratio gap (next session)

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

The hierarchical CLAM 256x256 dispatch added in 5047618 collapsed worse than progressive residual RVQ on vocab embeddings: model.text_embedding.weight [151936x2048] HCLAM 256x256: cos = 0.0046 (58.9s) RVQ before: cos = 0.0544 (891.1s) Codec token match: 11/225 = 4.9% (was 80.4% with RVQ) Root cause: vocab embedding rows are near-orthogonal in 2048-d space. Single-centroid tree quantization can only pick one EXISTING row as the reconstruction - that row is uncorrelated with the target. Progressive residual RVQ could at least synthesize directions by summing codebook vectors; HCLAM cannot. My 'cos ~= 1 at 2.32 rows per leaf' claim in RVQ_K_LADDER_TUNING.md Section 3 assumed tight micro-clusters that don't exist for vocab embeddings. Remediation: skip compression entirely for n_rows > 8192, ship as BF16. Pipeline achieves: - Cos = 1.000 on all tensors (no compression noise from the vocab tensor) - Codec token match ~ 100% (will re-run to confirm) - Storage: ~620 MB BF16 passthrough + compressed codebooks for the other 477 tensors. Total still a net gain vs 3.66 GB original. Proper long-term fix is bgz-tensor::HhtlDTensor + SharedPaletteGroup + FisherZTable for lookup-grade 343:1, but requires switching inference from f32 GEMM to HHTL cascade - separate session. Follow-up doc PR will mark the RVQ_K_LADDER_TUNING.md Section 3 claim as REFUTED and point readers at this commit + the bgz-tensor machinery. https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Planning doc in docs/LANCE_UPGRADE_ROADMAP.md. Covers: - Current pins (Lance 2, DataFusion 51) with file:line - Why upgrade: 9 features in 4.0 / 5.0-rc.1 that overlap our compression stack (IVF_RQ, IVF multi-split PR #6423, HNSW fp16 partition assignment, CacheBackend, distributed segment builds, BF16 PyTorch ingest, pre-transposed PQ SIMD, file format 2.3, hamming HNSW) - Blockers: DataFusion 51 -> 52.1 bump, file format default shift, namespace API cleanup - 5-phase plan (no-op baseline -> algorithm probe -> peripheral crates -> DF bump -> adopt features -> 5.0 stable) - Feature vs migration cost table with portability column - Recommended path: vendor algorithms + isolated probe crates, defer full migration until 5.0 stable or phase 4 demands it - 5 open questions for next session Cross-references PRs #176, #177 and the three RVQ docs landed in #177. https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Fresh-clone runs panicked at line 269 with ENOENT because the /home/user/models/qwen3-tts-0.6b/codebooks/ parent dir doesn't exist. Now mkdir -p before the write, so the example is reproducible from a clean model checkout. Surfaced while running end-to-end validity test for the RVQ pipeline: text -> 28 talker layers -> codec_head -> 5 code_predictor layers -> 128-step autoregressive -> RVQ dequant -> conv decoder -> 24kHz WAV. Full trace (Qwen3-TTS-0.6B, prompt 'Hello world, this is a test...'): [2] Embed: RMS=0.0129 in 1.38s [3] Talker L0-L27: RMS 0.66 -> 1.01 [3.5] codec_head + re-embed: RMS=0.0263 [4] CP L0-L4: RMS 0.23 -> 1.37 [5] 128 autoregressive steps in 50.12s Step 0: tokens=[68, 151, 102, 40] RMS=4.89 Step 32: tokens=[254, 183, 154, 199] Step 64: tokens=[92, 48, 158, 42] Step 96: tokens=[31, 81, 196, 221] Step 127: tokens=[142, 251, 253, 178] RMS=1.51 [6] Decoder latent RMS=2.29 [7] WAV: 65580 bytes, 1.37s at 24kHz WAV statistics confirm real audio (not silence, constant, or noise): RMS=0.0865 (-21 dB, normal speech range) peak=1.0000 (hits limiter) zero-crossings=11968/s (consistent with fricatives / unvoiced) energy varies across time deciles (0.059 to 0.131) This WAV is the end-to-end wiring proof that was missing from the earlier 225/225 codec token match — that gate only proved raw == RVQ path, not that raw itself was correct. The varied codec tokens (no single constant) plus the dynamic audio output confirm all 33 transformer layers + codec head + code predictor + decoder are doing real work, not emitting silence. https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e8ee987b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-14T18:46:47Z

-                    weights.len() + 1, short, n_rows, n_cols, c, el);
+                // Passthrough: keep BF16-precision f32 rows as-is, no codebook.
+                // Cos = 1 trivially. Ship cost is BF16 (2 bytes per element).
+                codebook_bytes += n_rows * n_cols * 2; // BF16 shipping footprint


Use source dtype for passthrough byte accounting

The passthrough branch charges codebook_bytes as n_rows * n_cols * 2 for every tensor with n_rows > 8192, but this dispatch is shape-based and can include F32 tensors. In that case the script underreports storage by 2x, so the printed compressed-size and ratio metrics become incorrect and can mislead experiment conclusions about whether this path actually saves space.

Useful? React with 👍 / 👎.

After iterating RVQ -> HCLAM -> passthrough on Qwen3-TTS-0.6B across PRs #176, #177, #178, step back and name the mindset expansions worth more than the next local fix. Content summary (doc is 185 lines): 1. What this session established vs did NOT establish - 225/225 codec token match proven (self-consistency, not product) - End-to-end WAV output validates wiring (varied tokens, realistic amplitude envelope) - Storage ratio is 1:1.39 net LOSS, not the shipping story we need 2. The BPE + argmax insight that reframes everything - Argmax-decoded regime (attention/MLP/logits) needs only top-1 stability -> ρ ≈ 0.95 is plenty - Index-lookup regime (vocab_embed, lm_head, code_embed) needs per-row identity -> no argmax downstream to rescue errors - The two regimes want OPPOSITE codecs; current pipeline used one codec for both and was surprised when it failed on the index regime 3. Four mindset shifts, ranked by blast radius: (1) Compression as indexing (HEEL/HIP/TWIG semantic addresses), not as squeezing (anonymous codebook indices) (2) Inference in codec space (HHTL cascade Skip/Attend/Compose), not f32 GEMM on reconstructed weights (3) Model-generic encoder (classify_role dispatch per tensor), not Qwen3-TTS-specific pipeline (4) Integrate what exists (HhtlDTensor + matryoshka + SharedPalette + FisherZTable are already there), stop building codecs 4. Concrete proposal: universal_hhtld_encode.rs combining shifts 3+4 - Input: any BPE-vocab safetensors model - Dispatch: HhtlDTensor Slot D only (argmax regime, 4 B/row) vs Slot D + Slot L Matryoshka SVD band 0 (index regime, 12 B/row) vs passthrough BF16 (norms/biases) - Validation: argmax-parity (225/225 or near), not cos - Estimate: ~29 MB for Qwen3-TTS-0.6B (~126:1) or 3.86 GB -> 11.2 MB for Qwen3-TTS-1.7B (343:1, matches BGZ_HHTL_D.md) 5. Alternative mindset expansion (shift 2 alone): migrate inference from f32 GEMM to distance-table lookups. Multi-session architecture pivot. Benefit: order-of-magnitude speedup on top of compression ratio. Cost: bigger scope, but closer to codebase architectural contract (ndarray = hardware / lance-graph = spine / ladybug-rs = brain). 6. Five open questions deferring concrete design decisions to the next session. Cross-references all prior session PRs and the relevant repo docs (BGZ_HHTL_D.md, fisher-z-wiring/, RVQ guides, Lance roadmap, CLAUDE.md architecture notes). https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Session-end artefact for future déjà-vu. Catalogues every compression approach tried in PRs #176-#185 and the lesson each one produced. No approach is thrown away — each failed experiment carries information about where the real boundary is. ## Structure ### Core invariants (6) I1. Two regimes, opposite needs (argmax vs index) I2. Near-orthogonality of weight rows in high dim I3. Direction vs amplitude cannot be merged into one scalar I4. Wire-format type widths are hard caps — assert at encode time I5. 'u8 can span u16/u64 effective' requires the right decoder I6. The ticket-for-curve model (SpiralAddress + shared curve) ### Approaches tried (7) A1. HhtlDTensor — Base17 + Slot D + Slot V (correct for cascade, wrong for f32 GEMM) A2. Progressive residual RVQ with k-ladder (works argmax, fails index) A3. Hierarchical CLAM 256x256 (REFUTED — cos 0.0046 on vocab) A4. Passthrough BF16 n_rows > 8192 (SHIPS for correctness, net loss for ratio) A5. SlotL 8 x i8 on SVD basis (correct algorithm, misapplied to Base17 centroid) A6. HhtlF32Tensor f32 palette + SlotL (right direction, 10x better, still short) A7. cascade_attention_probe Base17 palette (3.71% argmax agreement — palette doesn't preserve inner products) ### Abstractions that ARE the right primitive (3) R1. highheelbgz::rehydrate::SpiralEncoding (exists, untested on real Qwen3) R2. Per-role stride in NeuronPrint (q/k=3, v=5, gate=8, up=2, down=4) R3. HHTL cascade inference (hhtl_cache RouteAction) ### Open probes (4) P1. SpiralEncoding on real Qwen3 weights — claim rho >= 0.95 unproven P2. Shared anchors + i8 position per row — depends on P1 P3. Palette preserves inner-product neighbourhoods — A7 refuted for Base17 P4. Log-radial CLAM with magnitude split — hypothesised > linear CLAM ### Déjà-vu table Lists 7 'if you're tempted to...' instincts with the PR that already refuted them. Exists so future sessions hit the lesson before writing the code. ### Structural checklist (5 questions) Before shipping any new codec: 1. What regime does this tensor belong to? (I1) 2. Does the codec encode direction AND amplitude separately? (I3) 3. Is the palette substrate inner-product-preserving? (I2, A7) 4. Does the decoder evaluate the curve, or tile anchors? (I5) 5. Are wire-format widths asserted at encode time? (I4) ## Why this doc matters Every failed approach in this session taught something the next session would otherwise re-learn the hard way. HCLAM (#177->#178) already has its lesson buried in a passthrough commit. The Base17 reconstruction failure (#183) is buried in a PR comment. The #184 Path A/B duality (they aren't independent) is only visible if you read the probe results. This doc surfaces all of it as a single index, structured for mutation: each approach has 'mutation hooks' naming how it could evolve into something that works, rather than being discarded. ## Next step blocked by token budget The SpiralEncoding-on-real-Qwen3 probe (P1) is the obvious next experiment and would have landed in this PR. Deferred to a fresh session with budget. The doc leaves the probe fully specified so re-entering cold loses no context. https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Codifies 7 anti-patterns (AP1-AP7) learned from PRs #176-#188 into an agent card that fires flags when the session repeats them: AP1: "225/225 feels like success" without gate 2 (#178) AP2: Projecting quality from docs instead of measuring (#177) AP3: Building new codec before benching existing ones (#184) AP4: Centroid-residual framing on near-orthogonal data (#177/#183) AP5: Python in the inference hot path AP6: Chained score multiplication without chain-collapse check (P5) AP7: Modifying ndarray without explicit permission (#176) Invoked by adk-coordinator when pattern repetition is suspected, or by human directly. Output: list of fired flags, max 7 lines. Also audited all 29 agent cards across both repos: - All pin model: opus or model: sonnet (no hardcoded versions) - opus → Opus 4.7 automatically, sonnet → Sonnet 4.6 - 3 ndarray agents on sonnet (l3-strategist, migration-tracker, product-engineer) — intentional for speed-over-depth roles - adk-coordinator missing Bash tool (by design — delegates) - sentinel-qa missing Edit/Write (by design — audit-only) No agent changes needed for Opus 4.7 compatibility — model: opus resolves correctly. https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

claude added 3 commits April 14, 2026 18:13

chatgpt-codex-connector Bot reviewed Apr 14, 2026

View reviewed changes

AdaWorldAPI merged commit 1f2a9a7 into main Apr 14, 2026

This was referenced Apr 14, 2026

docs: compression mindset shifts — session-end design reflection #179

Merged

feat(examples): universal_hhtld_encode — model-generic encoder with SlotL dispatch #183

Merged

AdaWorldAPI mentioned this pull request Apr 15, 2026

docs: codec invariants + experiment catalogue (session-end déjà-vu) #186

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: passthrough BF16 for vocab tensors + Lance upgrade roadmap + WAV validity test#178

fix: passthrough BF16 for vocab tensors + Lance upgrade roadmap + WAV validity test#178
AdaWorldAPI merged 3 commits into
mainfrom
claude/teleport-session-setup-wMZfb

AdaWorldAPI commented Apr 14, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AdaWorldAPI commented Apr 14, 2026

Summary

End-to-end validity test (new signal)

Storage note (deferred)

Follow-up directions (out of scope for this PR)

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants