feat(lab): zipper codec family + fractal-leaf research arc + findings doc by AdaWorldAPI · Pull Request #218 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-20T05:25:15Z

Summary

Full research arc from fractal-leaf proposal to measured zipper codec family, with empirical answers to "does the zipper fix the argmax blind spot?" (no) and "what does it fix?" (no-codebook / progressive-decode / bundling). All lab-gated behind --features lab; main builds unaffected.

What's shipped

Lab-gated code (cargo --features lab):

bgz-tensor::fractal_descriptor — MFDFA magnitude-only (empirically dead)
bgz-tensor::zipper — sign-bit zipper (ZipperDescriptor), I8-μ-law zipper (ZipperI8Descriptor), 5-level signed bipolar (Zipper5LevelDescriptor), 7-level signed bipolar (Zipper7LevelDescriptor), plus bundle() for VSA negative-cancellation superposition
bgz-tensor/examples/fractal_probe.rs — HF-streaming CoV(w) probe
thinking-engine/examples/codec_rnd_bench.rs — 10 new lab-gated codec candidates registered alongside the 67 production candidates

Comprehensive findings documentation:

.claude/knowledge/codec-findings-2026-04-20.md — 242-line reference. Measured ICC hierarchy, 5 invariants, decision tree, dead ends to avoid, 5 unmeasured probe queue items, recipe for adding new candidates. Deliberately prevents future sessions from re-deriving these measurements.
EPIPHANIES.md — 5 dated findings covering the full arc
IDEAS.md — zipper architecture, fractal round-trip, round-trip codec
TECH_DEBT.md — existing entries preserved

Measured results (q_proj L0, Qwen3-8B, N=128 rows, ICC_3_1)

| Codec | Bytes | ICC | Verdict |
| --- | --- | --- | --- |
| Passthrough | row×4 | 1.000 | baseline |
| `Had-Q5×D-R` (already shipping) | 0/row | 0.989 | argmax leader, shared codebook |
| I8-Hadamard (est) | 9 | ~0.9 | argmax, per-row-only |
| Zipper-Full (sign+mag) | 64 | 0.204 | new Pareto point |
| Zipper-7^7×7 | 18 | 0.144 | new compact Pareto |
| Zipper-Phase | 8 | 0.097 | beats Base17 at 1/4 bytes |
| Base17 | 34 | 0.024 | dominated |
| Fractal-Desc (magnitude) | 7 | −0.996 | DEAD |
| Fractal-Phase (flip density) | 5 | −0.997 | DEAD |
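For readers unfamiliar with the metric, here is a minimal sketch of the textbook Shrout-Fleiss ICC(3,1) (two-way mixed, consistency, single rater). It is not the repo's `bgz_tensor::quality::icc_3_1`, whose signature is not shown in this PR; the input layout (`ratings[target][rater]`) is an assumption. In the bench setting one would presumably treat each row pair as a target and use two raters: ground-truth cosine and codec cosine.

```rust
/// ICC(3,1) for `k` raters over `n` targets; `ratings[i][j]` is the score
/// that rater `j` gave to target `i`. Returns None for degenerate input.
/// Standalone sketch, not the repo's implementation.
fn icc_3_1(ratings: &[Vec<f64>]) -> Option<f64> {
    let n = ratings.len();
    let k = ratings.first()?.len();
    if n < 2 || k < 2 || ratings.iter().any(|r| r.len() != k) {
        return None;
    }
    let (nf, kf) = (n as f64, k as f64);
    let grand = ratings.iter().flatten().sum::<f64>() / (nf * kf);

    // Between-target sum of squares.
    let ss_rows: f64 = ratings
        .iter()
        .map(|r| {
            let m = r.iter().sum::<f64>() / kf;
            kf * (m - grand).powi(2)
        })
        .sum();
    // Between-rater sum of squares (removed from the error term, which is
    // what makes this a consistency rather than an agreement measure).
    let ss_cols: f64 = (0..k)
        .map(|j| {
            let m = ratings.iter().map(|r| r[j]).sum::<f64>() / nf;
            nf * (m - grand).powi(2)
        })
        .sum();
    let ss_total: f64 = ratings
        .iter()
        .flatten()
        .map(|v| (v - grand).powi(2))
        .sum();
    let ss_err = ss_total - ss_rows - ss_cols;

    let bms = ss_rows / (nf - 1.0); // between-target mean square
    let ems = ss_err / ((nf - 1.0) * (kf - 1.0)); // residual mean square
    let denom = bms + (kf - 1.0) * ems;
    if denom == 0.0 {
        None
    } else {
        Some((bms - ems) / denom)
    }
}
```

Two raters that agree exactly give ICC = 1; a rater that reports the exact negation of the other gives ICC = −1, which is the regime the two DEAD rows sit in.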

Key diagnoses

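Sign-flip invariance bug (your diagnosis, confirmed empirically): the WHT is linear, so negating a row negates every coefficient, but MFDFA variance and flip density are unchanged by that negation. The fractal descriptors therefore score cos(x, −x) = 1 where the ground truth is −1, which inverts the ranking and drives ICC → −1. Fix: sample actual sign bits, not derived statistics.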
Per-row i8 normalization destroys inter-row magnitude info. Fix: global-scale (population median) quantization, as used in 5^5/7^7 variants.
Quintenzirkel stride loses to φ-stride on argmax — harmonic-proximity ordering is the wrong metric here; maximal-irrationality (φ) dominates. Quintenzirkel may still win on other tasks.
Argmax blind spot was already fixed by Had-Q5×D-R and I8-Hadamard in the existing sweep. The zipper family does NOT close that gap and is not intended to — it fills different constraint cells (no-codebook, progressive, bundling).
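The sign-flip invariance in the first diagnosis can be reproduced in a few lines. This is a minimal sketch with hypothetical helper names (`flip_density`, `sign_agreement`); it operates on an already-rotated coefficient slice and does not reproduce the repo's descriptors.

```rust
/// Fraction of adjacent coefficient pairs whose signs differ.
/// Negating every coefficient leaves this value unchanged.
fn flip_density(c: &[f32]) -> f32 {
    if c.len() < 2 {
        return 0.0;
    }
    let flips = c
        .windows(2)
        .filter(|w| (w[0] < 0.0) != (w[1] < 0.0))
        .count();
    flips as f32 / (c.len() - 1) as f32
}

/// Bipolar agreement of actual sign bits, in [-1, 1]:
/// +1 when every sign matches, -1 when every sign differs.
fn sign_agreement(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    if n == 0 {
        return 0.0;
    }
    let same = a
        .iter()
        .zip(b)
        .filter(|(x, y)| (**x < 0.0) == (**y < 0.0))
        .count();
    2.0 * same as f32 / n as f32 - 1.0
}
```

For a row with no exact zeros, `flip_density(row)` equals `flip_density(-row)`, so any score built on it cannot tell a row from its negation, while `sign_agreement(row, -row)` is −1 as the ground-truth cosine requires.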

Test plan

6/6 zipper unit tests pass (sign-bit, I8, 5-level, 7-level encode/pack/self-similarity/sign-flip/scaling/bundle)
Fractal descriptor 6/6 tests pass
cargo check --features lab — lab gate works both ways (on/off)
End-to-end bench runs against Qwen3-8B shard 1, 1400 s, produces all codec rows across 3 populations
Main-build surface unchanged (no exports leak into main binary without --features lab)

Unmeasured follow-ups (filed in findings doc)

MRI-style differential phase (N rotations, inter-view phase delta)
Fibonacci-weighted bundling (256-signal capacity via Zeckendorf decode)
Audiophile multi-band precision (non-uniform bit allocation per position)
JL multi-view phase cleaning (√N SNR improvement)
Gamma-calibrated global scale via ICC optimization

Each ~50-100 LOC + one bench run.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

…scales

Per EPIPHANIES 2026-04-19 CORRECTION: magnitude-only fractal leaf measured the envelope (D, w, σ, H), which is near-constant across rows. The per-row variation lives in the SIGN PATTERN of Hadamard-rotated coefficients; that is the phase.

New primitive in bgz_tensor::fractal_descriptor (lab-gated):

    PhaseDescriptor {
        flip_density: [f32; 5]   // scales s ∈ {4, 8, 16, 32, 64}
    }

    PhaseDescriptor::from_row(row) -> PhaseDescriptor
      1. wht_f32(row): orthogonal projection
      2. sign sequence s_i = sign(c_i)
      3. count sign flips per window at 5 scales, normalize → density
      4. 5-D signature per row

    PhaseDescriptor::cosine(other) -> f32
      normalized dot product between two 5-D phase signatures

Two new CodecCandidates in codec_rnd_bench.rs (lab-gated):

    FractalPhaseOnly          5 B   fractal phase signature alone
    FractalPhasePlusBase17   39 B   0.75*Base17 + 0.25*phase blend

Re-runs through the same endpoint psychometric suite (bgz_tensor::quality::icc_3_1 + cronbach_alpha + spearman + 7 others). Direct comparison to the magnitude-only variant that measured ICC_3_1 = -0.9955 on Qwen3-8B q_proj.

Gates unchanged: all behind --features lab. Main builds untouched.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
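The four `from_row` steps in this commit message can be sketched as follows. This is an illustration under assumptions, not the repo's code: the transform here is a plain in-place Walsh-Hadamard butterfly normalized by 1/√n, flips are counted inside non-overlapping windows, and `phase_signature` and `cosine` are hypothetical free functions standing in for the `PhaseDescriptor` methods.

```rust
/// In-place fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so that it is
/// orthogonal. Panics if the length is not a power of two.
fn wht_f32(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (data[i], data[i + h]);
                data[i] = a + b;
                data[i + h] = a - b;
            }
        }
        h *= 2;
    }
    let norm = 1.0 / (n as f32).sqrt();
    data.iter_mut().for_each(|v| *v *= norm);
}

const SCALES: [usize; 5] = [4, 8, 16, 32, 64];

/// Steps 1-4: rotate, take signs, count flips per window at each scale,
/// normalize to a density in [0, 1].
fn phase_signature(row: &[f32]) -> [f32; 5] {
    let mut c = row.to_vec();
    wht_f32(&mut c);
    let mut sig = [0.0f32; 5];
    for (k, &s) in SCALES.iter().enumerate() {
        let (mut flips, mut pairs) = (0usize, 0usize);
        for w in c.chunks_exact(s) {
            flips += w.windows(2).filter(|p| (p[0] < 0.0) != (p[1] < 0.0)).count();
            pairs += s - 1;
        }
        if pairs > 0 {
            sig[k] = flips as f32 / pairs as f32;
        }
    }
    sig
}

/// Normalized dot product between two 5-D signatures (0 if either is all-zero).
fn cosine(a: &[f32; 5], b: &[f32; 5]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}
```

A unit impulse at index 1 transforms to a strictly alternating sign pattern, so its signature is 1.0 at every scale; an impulse at index 0 transforms to a constant, so its signature is all zeros.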

… dead

Ran codec_rnd_bench.rs with both fractal variants. Qwen3-8B q_proj L0, N=128 rows, pairwise cosine ground truth.

| Fractal-Desc (magnitude, 7 B) | ICC_3_1 = -0.9955 |
| Fractal-Phase (phase, 5 B)    | ICC_3_1 = -0.9972 |
| Fractal + Base17              | ICC_3_1 = -0.4879 |
| Phase + Base17                | ICC_3_1 = -0.4982 |

BOTH orthogonal axes of row-level fractal statistics are flat across rows after Hadamard. Per I2 (near-orthogonality), any row-level summary statistic looks identical once rows are Gaussian-ish post-rotation. Discrimination requires full sign/magnitude coordinate pattern (~512 B/row).

Fractal-leaf line of research closed for row-level compression. Three probes completed, all negative. Only still-open variant: fractal-interpolation-between-Base17-anchors for round-trip codec (unmeasured, unbuilt).

I8-Hadamard ~9 B remains the argmax-regime leader. Don't pursue row-level-statistic fractal compression further.

Wall time: 22.5 min, 4 new candidates on 60-codec sweep, 128 rows.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

… bgz17 container

Per user + existing phi-spiral-reconstruction.md "family zipper" concept: bgz17 halo isn't waste, it's magnitude storage at a different φ-stride.

Supersedes the triple-channel matryoshka proposal (3 separate containers) with a single-container zipper:

    phase stride = round(N / φ)    → ~48-64 bits (existing bgz17)
    mag stride   = round(N / φ²)   → ~48-64 bits (halo positions)
    halo-rem                       → ~16,200 bits (ECC / future)

Both strides maximally-irrational → both anti-moiré ("X-Trans") → coincidences at φ-ratios → hidden moiré preserved for both streams in the same container.

Zeckendorf property: unique non-adjacent Fibonacci decomposition → non-colliding strides are mathematical, not hand-tuned.

Matryoshka truncation preserved: read phase alone = coarse, read phase + mag = fine, read halo ECC = corrected. Single stride-aware reader, not 3 parallel ones.

Halo utilization: 0.3% → 0.6% signal density. Advantage over triple-channel: 1 container vs 3, matches existing bgz17 design intent.

Next: implement bgz17::zipper_{encode,decode}, add ZipperCodec as lab-gated candidate in codec_rnd_bench.rs, measure ICC_3_1.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
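The Zeckendorf property cited in this commit message (every positive integer has a unique representation as a sum of non-consecutive Fibonacci numbers) can be illustrated with the standard greedy decomposition. This is a general-purpose sketch; how bgz17 maps the decomposition onto strides is not shown in this PR, and `zeckendorf` is a hypothetical helper name.

```rust
/// Greedy Zeckendorf decomposition: the Fibonacci numbers (largest first,
/// from the sequence 1, 2, 3, 5, 8, ...) that sum to `n`. The greedy choice
/// guarantees that no two consecutive Fibonacci numbers are used.
fn zeckendorf(mut n: u64) -> Vec<u64> {
    // Build the Fibonacci numbers that do not exceed n (overflow-safe).
    let mut fibs = vec![1u64, 2];
    loop {
        let len = fibs.len();
        match fibs[len - 1].checked_add(fibs[len - 2]) {
            Some(next) if next <= n => fibs.push(next),
            _ => break,
        }
    }
    // Repeatedly take the largest Fibonacci number that still fits.
    let mut parts = Vec::new();
    for &f in fibs.iter().rev() {
        if f <= n {
            parts.push(f);
            n -= f;
        }
    }
    parts
}
```

For example, 100 decomposes as 89 + 8 + 3 and 64 as 55 + 8 + 1; in both cases no two terms are neighbours in the Fibonacci sequence.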

…6-64 active

Implements the φ-zipper architecture from IDEAS.md 2026-04-19. Single container carries two φ-stride-multiplexed streams:

    PHASE_ACTIVE_BITS  = 64   (explicit constant)
    MAG_ACTIVE_SAMPLES = 56   (explicit constant)
    ZIPPER_BYTES       = 64   (8 B phase + 56 B i8 magnitude)

Both streams share one row, at different φ-strides:

    phase stride = round(N / φ):   Base17-style aperiodic sampling
    mag stride   = round(N / φ²):  Zeckendorf-non-adjacent stride

Zeckendorf property: non-adjacent Fibonacci indices → strides mathematically non-colliding. No hand-tuning.

Both streams maximally-irrational vs the Hadamard butterfly → both anti-moiré ("X-Trans sensor" principle). Coincidences at φ-ratios = "hidden moiré", dispersed below visibility.

Matryoshka truncation via single descriptor:

    cosine_phase_only      (8 B)    coarse decode
    cosine_magnitude_only  (56 B)   magnitude alone (diagnostic)
    cosine_zipper_full     (64 B)   full decode: 0.5 phase + 0.5 mag

6/6 unit tests pass:

    constants_are_explicit       (locks 64 / 56 / 64)
    encode_pack_roundtrip
    self_similarity_unity        (cos(d, d) = 1.0)
    different_rows_lower         (random rows don't falsely agree)
    sign_flip_inverts_both       (Hadamard linearity: -row → -cos)
    positive_scaling_preserves   (k·row → cos = 1 for k > 0)

Wired as lab-gated candidates in codec_rnd_bench.rs:

    ZipperPhaseOnly  (8 B)
    ZipperFull       (64 B)

Next: run bench → measure ICC_3_1 vs Base17 (0.024) and fractal candidates (-0.9955 / -0.9972). Hypothesis: zipper beats both because magnitude-stream carries independent signal not captured by row-level fractal statistics.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
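A minimal sketch of the phase stream described in this commit message, under stated assumptions: sample position `i` is taken to be `(i * stride) mod N` (the commit gives only the stride formulas, not the indexing rule), the input is an already Hadamard-rotated coefficient slice, and the function names other than `cosine_phase_only` are hypothetical.

```rust
const PHI: f64 = 1.618033988749895;
const PHASE_ACTIVE_BITS: usize = 64;

/// (phase stride, magnitude stride) = (round(N / φ), round(N / φ²)).
fn phi_strides(n: usize) -> (usize, usize) {
    let nf = n as f64;
    (
        (nf / PHI).round() as usize,
        (nf / (PHI * PHI)).round() as usize,
    )
}

/// Sample 64 sign bits at positions (i * stride) mod N (assumed indexing).
/// Bit i is 1 when the sampled coefficient is negative.
fn sample_sign_bits(coefs: &[f32], stride: usize) -> u64 {
    let n = coefs.len();
    if n == 0 {
        return 0;
    }
    (0..PHASE_ACTIVE_BITS).fold(0u64, |acc, i| {
        let pos = (i * stride) % n;
        acc | (((coefs[pos] < 0.0) as u64) << i)
    })
}

/// Cosine between two 64-bit bipolar sign vectors via Hamming distance:
/// +1 for identical patterns, -1 for fully inverted patterns.
fn cosine_phase_only(a: u64, b: u64) -> f32 {
    1.0 - 2.0 * (a ^ b).count_ones() as f32 / 64.0
}
```

With this construction the `self_similarity_unity` and `sign_flip_inverts_both` properties listed in the commit hold by design: a descriptor compared with itself has Hamming distance 0, and negating the coefficients flips every sampled bit.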

…ctal bug

Three-population ICC measurement confirms both the diagnosis and fix:

Sign-flip invariance of fractal descriptors (MFDFA variance + flip density both unchanged under WHT linearity of negation) → codec sees cos(x, -x) = 1.0 while ground truth sees -1.0 → perfect ranking inversion → ICC = -0.999. Not "no signal", but "collapsed opposites".

Zipper fix: sample sign BITS at positions, not derived statistics. Invariance broken, anti-correlation vanishes, POSITIVE ICC restored.

| Codec        | Bytes | k_proj | gate_proj | q_proj |
| Base17       | 34    | 0.007  | 0.012     | 0.024  |
| Fractal-X    |       | -0.999 | -0.999    | -0.996 |
| Zipper-Phase | 8     | 0.050  | 0.049     | 0.097  | (beats Base17 @ 1/4 bytes)
| Zipper-Full  | 64    | 0.129  | 0.107     | 0.203  | (top-5 recall 0.6)

Still behind I8-Hadamard leader (ICC ~0.9 at 9 B), but FIRST fractal-family codec with positive ICC. Anti-moiré φ-stride + explicit sign preservation is the working recipe.

Next probes: wider phase stream, φ-permute morph, different bases, blend weight tuning.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Three improvements to the zipper codec:

1. ZipperI8Descriptor: i8 samples (sign+magnitude) instead of sign-only bits. 8× info density per byte. Same budget as Zipper-Phase, vastly denser signal.

2. Quintenzirkel stride: log₂(3/2) ≈ 0.585 irrational rotation. Circle of Fifths ordering: adjacent samples are harmonically related (consonant). Natural truncation: 7 samples = diatonic, 12 = chromatic, 24+ = overtone. Tests harmonic-proximity ordering vs φ's maximal-irrationality.

3. μ-law companding (MU_LAW=255): gamma-corrected i8 quantization. sign(x) * log(1 + μ|x|) / log(1 + μ) → concentrates precision near zero where argmax decisions happen, coarsens at extremes. Inverse: mu_law_decode for reconstruction.

Constants made explicit:

    QUINT_STRIDE = 0.584962500721156   (log₂(3/2))
    MU_LAW       = 255.0               (telephony-standard)

Four new lab-gated CodecCandidates in codec_rnd_bench.rs:

    Zipper-I8-φ(8B)     8 i8,  φ-stride, μ-law
    Zipper-I8-Q5(8B)    8 i8,  Quintenzirkel-stride, μ-law
    Zipper-I8-φ(64B)    64 i8, φ-stride, μ-law
    Zipper-I8-Q5(64B)   64 i8, Quintenzirkel-stride, μ-law

All behind --features lab. Main builds untouched.

17D qualia parallel: the 17D qualia vector (CMYK/RGB transform in lance-graph-cognitive::grammar::qualia) already encodes cognitive features in a harmonic-frequency domain. Quintenzirkel stride in the codec mirrors this: harmonic structure as the natural ordering for both perceptual (qualia) and computational (codec) representations.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
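The μ-law formula in item 3 can be sketched as an encode/decode pair. This is an illustration only: it assumes the input has already been normalized to [−1, 1] and that the compressed value is mapped to i8 by scaling with 127, neither of which is stated in the commit message; `mu_law_encode` is a hypothetical name, and `mu_law_decode` here is a stand-in for the repo function of the same name.

```rust
const MU_LAW: f32 = 255.0;

/// Compress x ∈ [-1, 1] with sign(x) * ln(1 + μ|x|) / ln(1 + μ),
/// then quantize to i8 in [-127, 127] (assumed mapping).
fn mu_law_encode(x: f32) -> i8 {
    let x = x.clamp(-1.0, 1.0);
    let y = x.signum() * (1.0 + MU_LAW * x.abs()).ln() / (1.0 + MU_LAW).ln();
    (y * 127.0).round() as i8
}

/// Inverse companding: sign(y) * ((1 + μ)^|y| - 1) / μ.
fn mu_law_decode(q: i8) -> f32 {
    let y = q as f32 / 127.0;
    y.signum() * ((1.0 + MU_LAW).powf(y.abs()) - 1.0) / MU_LAW
}
```

The effect described in the commit is visible on small inputs: x = 0.01 would land on code 1 under linear i8 quantization, whereas μ-law spreads the same neighbourhood over a few dozen codes, at the cost of coarser steps near ±1.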

Per user: negative-canceling bipolar with 5^5 (3125 states) and 7^7 (823,543 states) structure. Key fix from prior negative result: GLOBAL (population-wide) scale instead of per-row max-abs.

Zipper5LevelDescriptor:
  - Values ∈ {-2, -1, 0, +1, +2}
  - 5 samples = 5^5 states, packs to ~2 B
  - 25 samples = 5 × 5^5, packs to ~10 B
  - bundle() saturates at ±2; negative values cancel (VSA semantics)
  - compute_global_scale() returns median |coef| across population

Zipper7LevelDescriptor:
  - Values ∈ {-3, -2, -1, 0, +1, +2, +3}
  - 7 samples = 7^7 states, packs to ~3 B
  - 49 samples = 7 × 7^7, packs to ~18 B
  - bundle() saturates at ±3

Thresholds at half-integer multiples of global_scale:

    5-level: {-1.5, -0.5, +0.5, +1.5} × scale
    7-level: {-2.5, -1.5, -0.5, +0.5, +1.5, +2.5} × scale

This unifies the Structured5x5 ethos from PR #209 with the φ-stride zipper sampling. Negative cancellation on bundling means noise cancels, signal accumulates: useful for VSA query superposition (not directly measured by the pair-cosine bench, but a property the descriptor holds).

4 new lab-gated CodecCandidates:

    Zipper-5^5(2B)       5 samples,  5-level
    Zipper-5^5×5(10B)    25 samples, 5-level
    Zipper-7^7(3B)       7 samples,  7-level
    Zipper-7^7×7(18B)    49 samples, 7-level

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
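A minimal sketch of the 5-level variant described in this commit message: thresholding at half-integer multiples of a global scale, saturating bundling, and base-5 packing of five samples into two bytes. All function names are hypothetical, and the packing order (first sample as the lowest base-5 digit) is an assumption; the commit states only the level set, the thresholds, the saturation bound, and the approximate packed size.

```rust
/// Quantize to {-2, -1, 0, +1, +2} with thresholds at ±0.5·scale and ±1.5·scale.
/// `scale` is the population-wide (global) scale and must be positive.
fn quantize_5(x: f32, scale: f32) -> i8 {
    let t = x / scale;
    if t < -1.5 {
        -2
    } else if t < -0.5 {
        -1
    } else if t < 0.5 {
        0
    } else if t < 1.5 {
        1
    } else {
        2
    }
}

/// Element-wise superposition that saturates at ±2. Opposite signs cancel,
/// matching signs accumulate.
fn bundle_5(a: &[i8], b: &[i8]) -> Vec<i8> {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| x.saturating_add(y).clamp(-2, 2))
        .collect()
}

/// Pack five levels into one base-5 code in 0..3125, which fits in a u16.
fn pack_5x5(levels: &[i8; 5]) -> u16 {
    levels
        .iter()
        .rev()
        .fold(0u16, |acc, &v| acc * 5 + (v + 2) as u16)
}

/// Inverse of `pack_5x5`.
fn unpack_5x5(mut code: u16) -> [i8; 5] {
    let mut out = [0i8; 5];
    for slot in out.iter_mut() {
        *slot = (code % 5) as i8 - 2;
        code /= 5;
    }
    out
}
```

Bundling `[2, 1, -1, 0, -2]` with `[2, -1, -1, 0, 2]` gives `[2, 0, -2, 0, 0]`: the first position saturates, the second and last cancel, and the third accumulates. The 7-level variant would follow the same pattern with thresholds up to ±2.5·scale and saturation at ±3.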

…sweep

Three populations, 12 codecs, 1400s wall.

Best zipper: 7^7×7 at 18 B → ICC 0.144 on q_proj (Pareto point between Base17 0.024 and Zipper-Full 0.20).

Existing sweep has Had-Q5×D-R at ICC 0.989 / 0-B-per-row (shared codebook, TurboQuant-class). This is the argmax leader; nothing in zipper family competes on pure ICC.

Quintenzirkel empirically loses to φ across all size tiers.

Per-row μ-law normalization destroys inter-row magnitude info. Global-scale 5^5/7^7 recovers some (7^7×7 at 18 B > I8 μ-law at 64 B).

Pragmatic: use Had-Q5×D-R for production, zipper only when bundling/progressive-decode/anti-moiré properties matter.

Unmeasured: MRI differential phase, Fibonacci bundling, audiophile multi-band precision.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Comprehensive findings from the fractal → zipper research arc (2026-04-19/20). Captures measured ICC, decision tree, invariants, and dead ends so future sessions don't re-derive them.

Does the zipper fix the argmax blind spot? NO. Already fixed by Had-Q5×D-R (ICC 0.989) and I8-Hadamard (ICC ~0.9). Zipper hits 0.20 and fixes DIFFERENT blind spots: no-codebook calibration, progressive decode, bundling with negative cancellation, Fibonacci-weighted 256-signal capacity.

5 invariants established by measurement:

    I1: Sign-flip invariance kills argmax ICC
    I2: Per-row normalization destroys inter-row magnitude info
    I3: Maximally-irrational strides beat harmonic for argmax (φ > Q5)
    I4: Aperiodic φ-stride beats linear dyadic on butterfly signals
    I5: Sign bits carry less info but avoid i8 pitfalls

Decision tree + measured hierarchy + 5 dead ends + 5 unmeasured probes + recipe for adding new CodecCandidates.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c4c0e7f7f


chatgpt-codex-connector · 2026-04-20T05:31:04Z

+    fn bytes_per_row(&self) -> usize { 5 }
+    fn pairwise_scores(&self, rows: &[Vec<f32>]) -> Vec<f64> {
+        use bgz_tensor::fractal_descriptor::PhaseDescriptor;


Quantize phase features before reporting 5-byte size

Fractal-Phase(5B) reports a 5-byte payload, but its scoring path uses PhaseDescriptor values as raw f32 features (5 floats) with no packing/quantization step. This means the measured quality for this candidate (and Phase+Base17(39B), which reuses the same phase descriptor) is computed with substantially more precision than the advertised byte budget, skewing the benchmark’s quality-vs-size comparison. Either quantize to an actual 5-byte representation before scoring or report the true payload size.
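One way to act on this suggestion is to round-trip the phase features through a real 5-byte representation before scoring. The sketch below assumes each flip density lies in [0, 1] and spends one byte per scale; `quantize_phase` and `dequantize_phase` are hypothetical helpers, not functions from the PR.

```rust
/// Quantize five flip densities in [0, 1] to one byte each (5 B total).
fn quantize_phase(density: &[f32; 5]) -> [u8; 5] {
    (*density).map(|d| (d.clamp(0.0, 1.0) * 255.0).round() as u8)
}

/// Recover approximate densities from the 5-byte payload.
fn dequantize_phase(q: &[u8; 5]) -> [f32; 5] {
    (*q).map(|b| b as f32 / 255.0)
}
```

Scoring on `dequantize_phase(&quantize_phase(&d))` instead of the raw `f32` values would make the measured quality correspond to the advertised 5-byte budget; the worst-case rounding error per feature is half a step, about 0.002.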


chatgpt-codex-connector · 2026-04-20T05:31:04Z

+    /// Compute population-global scale: median of per-row max-abs,
+    /// scaled so ~70% of coefficients land in the middle 3 levels.
+    pub fn compute_global_scale(rows: &[Vec<f32>]) -> f32 {
+        let mut all_abs: Vec<f32> = Vec::with_capacity(rows.len() * rows[0].len());


Guard global-scale computation against empty input

compute_global_scale indexes rows[0] unconditionally to size all_abs, so calling it with an empty slice panics immediately. Since this is a public helper, it should fail explicitly (e.g., assert with a clear message or return Option/Result) instead of triggering an index-out-of-bounds panic on empty populations.
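A hedged sketch of the `Option`-returning variant the review asks for. The statistic is an assumption: this version takes the median of all finite |coef| values across the population (upper median for even counts), following the commit message; the doc comment in the diff describes a slightly different statistic, and the PR's actual body is not shown here.

```rust
/// Population-global scale: median of |coef| over every row.
/// Returns None when there are no finite coefficients, instead of panicking
/// on `rows[0]` for an empty population.
fn compute_global_scale(rows: &[Vec<f32>]) -> Option<f32> {
    let mut all_abs: Vec<f32> = rows
        .iter()
        .flatten()
        .map(|v| v.abs())
        .filter(|v| v.is_finite())
        .collect();
    if all_abs.is_empty() {
        return None;
    }
    // O(n) selection of the median element; no full sort needed.
    let mid = all_abs.len() / 2;
    let (_, median, _) = all_abs.select_nth_unstable_by(mid, |a, b| a.total_cmp(b));
    Some(*median)
}
```

Callers then have to decide explicitly what an empty population means, for example by skipping the tensor, rather than discovering the case through an index-out-of-bounds panic.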


…sult

D2: cam_pq_calibrate binary. Reads safetensors, classifies tensors via route_tensor (D1), trains a CamCodebook per argmax-regime tensor, encodes all rows to 6-byte fingerprints, measures ICC_3_1 and relative L2 error, writes codebooks + fingerprints + manifest.json.

D5: full-size validation on Qwen3-TTS-0.6B: FAILS. 234 argmax-regime tensors measured. Mean ICC = 0.195, zero tensors meet the ≥0.99 gate. Relative L2 error 0.70–0.90.

Root cause: PR #218 bench measured ICC 0.9998 on 128 rows trained and measured on those same 128 rows, a trivially-correct fit (128 ≤ 256 centroids → every row gets its own centroid). At production tensor sizes (1024–3072 rows), the 6×256 codebook is centroid-starved. cam_pq_row_count_probe.rs demonstrates the collapse:

    n=128  → icc_train=1.000, icc_all=-0.304
    n=3072 → icc_train=-0.079

Also broadens route_tensor embedding match to catch codec_embedding, adding 2 new test cases (10 total, 133/133 contract tests pass).

Infrastructure (CLI, serialization, measurement) is sound. The negative result is in the codec's capacity vs tensor row counts, not the tooling. Plan needs revision before D6/D7 effort.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
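The "trivially-correct fit" in the root cause can be shown with a toy example. This is not CAM-PQ: it uses a single flat codebook seeded with the first `k` rows and no training at all, purely to illustrate that when the row count does not exceed the centroid count, each row can sit on its own centroid and the measured error is zero. All names are hypothetical.

```rust
/// Squared distance from `row` to its nearest codebook entry.
fn nearest_sq_dist(row: &[f32], codebook: &[Vec<f32>]) -> f32 {
    codebook
        .iter()
        .map(|c| row.iter().zip(c).map(|(a, b)| (a - b).powi(2)).sum::<f32>())
        .fold(f32::INFINITY, f32::min)
}

/// Mean quantization error when the codebook is simply the first `k` rows.
fn mean_quant_error(rows: &[Vec<f32>], k: usize) -> f32 {
    let codebook: Vec<Vec<f32>> = rows.iter().take(k).cloned().collect();
    rows.iter()
        .map(|r| nearest_sq_dist(r, &codebook))
        .sum::<f32>()
        / rows.len() as f32
}
```

With 8 distinct rows, `k = 8` gives an error of exactly 0 while `k = 4` does not, which is why a score measured on the training rows says little about capacity at 1024–3072 rows.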

claude added 9 commits April 19, 2026 22:10

AdaWorldAPI merged commit 357b0d2 into main Apr 20, 2026

chatgpt-codex-connector Bot reviewed Apr 20, 2026


AdaWorldAPI mentioned this pull request Apr 20, 2026

D1+D2+D5: CAM-PQ calibration pipeline — honest negative result #220

Merged

4 tasks

AdaWorldAPI merged 9 commits into main from claude/quick-wins-2026-04-19
