feat(lab): zipper codec family + fractal-leaf research arc + findings doc #218

Merged
AdaWorldAPI merged 9 commits into main from claude/quick-wins-2026-04-19 on Apr 20, 2026
@AdaWorldAPI
Owner

Summary

Full research arc from fractal-leaf proposal to measured zipper codec family, with empirical answers to "does the zipper fix the argmax blind spot?" (no) and "what does it fix?" (no-codebook / progressive-decode / bundling). All lab-gated behind --features lab; main builds unaffected.

What's shipped

Lab-gated code (cargo --features lab):

  • bgz-tensor::fractal_descriptor — MFDFA magnitude-only (empirically dead)
  • bgz-tensor::zipper — sign-bit zipper (ZipperDescriptor), I8-μ-law zipper (ZipperI8Descriptor), 5-level signed bipolar (Zipper5LevelDescriptor), 7-level signed bipolar (Zipper7LevelDescriptor), plus bundle() for VSA negative-cancellation superposition
  • bgz-tensor/examples/fractal_probe.rs — HF-streaming CoV(w) probe
  • thinking-engine/examples/codec_rnd_bench.rs — 10 new lab-gated codec candidates registered alongside the 67 production candidates

Comprehensive findings documentation:

  • .claude/knowledge/codec-findings-2026-04-20.md — 242-line reference. Measured ICC hierarchy, 5 invariants, decision tree, dead ends to avoid, 5 unmeasured probe queue items, recipe for adding new candidates. Deliberately prevents future sessions from re-deriving these measurements.
  • EPIPHANIES.md — 5 dated findings covering the full arc
  • IDEAS.md — zipper architecture, fractal round-trip, round-trip codec
  • TECH_DEBT.md — existing entries preserved

Measured results (q_proj L0, Qwen3-8B, N=128 rows, ICC_3_1)

| Codec | Bytes | ICC | Verdict |
|---|---|---|---|
| Passthrough | row×4 | 1.000 | baseline |
| Had-Q5×D-R (already shipping) | 0/row | 0.989 | argmax leader, shared codebook |
| I8-Hadamard (est) | 9 | ~0.9 | argmax, per-row-only |
| Zipper-Full (sign+mag) | 64 | 0.204 | new Pareto point |
| Zipper-7^7×7 | 18 | 0.144 | new compact Pareto |
| Zipper-Phase | 8 | 0.097 | beats Base17 at 1/4 bytes |
| Base17 | 34 | 0.024 | dominated |
| Fractal-Desc (magnitude) | 7 | −0.996 | DEAD |
| Fractal-Phase (flip density) | 5 | −0.997 | DEAD |
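
For readers interpreting the ICC column, here is a minimal reference sketch of ICC(3,1) in the standard Shrout and Fleiss form, (BMS − EMS) / (BMS + (k − 1)·EMS). It is an assumption that `bgz_tensor::quality::icc_3_1` follows this form; the function below is illustrative and is not the code in the repo. With two "raters" (ground truth and codec), scores that are exact opposites give ICC = −1, which is the regime the two DEAD rows sit in.

```rust
/// ICC(3,1) for an n-by-k table of scores (n targets, k raters):
/// two-way mixed model, single measure, consistency form.
/// Returns None for degenerate input (fewer than 2 rows or columns,
/// ragged rows, or zero total variance).
pub fn icc_3_1(scores: &[Vec<f64>]) -> Option<f64> {
    let n = scores.len();
    let k = scores.first()?.len();
    if n < 2 || k < 2 || scores.iter().any(|r| r.len() != k) {
        return None;
    }
    let (nf, kf) = (n as f64, k as f64);
    let grand: f64 = scores.iter().flatten().sum::<f64>() / (nf * kf);

    // Per-target (row) and per-rater (column) means.
    let row_means: Vec<f64> = scores.iter().map(|r| r.iter().sum::<f64>() / kf).collect();
    let col_means: Vec<f64> = (0..k)
        .map(|j| scores.iter().map(|r| r[j]).sum::<f64>() / nf)
        .collect();

    // Two-way ANOVA sums of squares.
    let ss_rows: f64 = kf * row_means.iter().map(|m| (m - grand).powi(2)).sum::<f64>();
    let ss_cols: f64 = nf * col_means.iter().map(|m| (m - grand).powi(2)).sum::<f64>();
    let ss_total: f64 = scores.iter().flatten().map(|x| (x - grand).powi(2)).sum();
    let ss_err = ss_total - ss_rows - ss_cols;

    let bms = ss_rows / (nf - 1.0); // between-targets mean square
    let ems = ss_err / ((nf - 1.0) * (kf - 1.0)); // residual mean square
    let denom = bms + (kf - 1.0) * ems;
    if denom == 0.0 {
        None
    } else {
        Some((bms - ems) / denom)
    }
}
```

Feeding it rows such as `[x, x]` yields 1, and rows such as `[x, -x]` yield −1, matching the "collapsed opposites" reading of the fractal results.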

Key diagnoses

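  1. Sign-flip invariance bug (your diagnosis, confirmed empirically): fractal descriptors built on MFDFA variance or flip density are unchanged when a row is negated, because the WHT is linear and negation flips every coefficient sign at once. The codec therefore scores cos(x, −x) = 1 where ground truth is −1, and ICC → −1. Fix: sample actual sign bits, not derived statistics.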
  2. Per-row i8 normalization destroys inter-row magnitude info. Fix: global-scale (population median) quantization, as used in 5^5/7^7 variants.
  3. Quintenzirkel (circle-of-fifths) stride loses to φ-stride on argmax: harmonic-proximity ordering is the wrong metric here; maximal irrationality (φ) dominates. Quintenzirkel may still win on other tasks.
  4. Argmax blind spot was already fixed by Had-Q5×D-R and I8-Hadamard in the existing sweep. The zipper family does NOT close that gap and is not intended to — it fills different constraint cells (no-codebook, progressive, bundling).

Test plan

  • 6/6 zipper unit tests pass (sign-bit, I8, 5-level, 7-level encode/pack/self-similarity/sign-flip/scaling/bundle)
  • Fractal descriptor 6/6 tests pass
  • cargo check --features lab — lab gate works both ways (on/off)
  • End-to-end bench runs against Qwen3-8B shard 1, 1400 s, produces all codec rows across 3 populations
  • Main-build surface unchanged (no exports leak into main binary without --features lab)

Unmeasured follow-ups (filed in findings doc)

  • MRI-style differential phase (N rotations, inter-view phase delta)
  • Fibonacci-weighted bundling (256-signal capacity via Zeckendorf decode)
  • Audiophile multi-band precision (non-uniform bit allocation per position)
  • JL multi-view phase cleaning (√N SNR improvement)
  • Gamma-calibrated global scale via ICC optimization

Each ~50-100 LOC + one bench run.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

claude added 9 commits April 19, 2026 22:10
…scales

Per EPIPHANIES 2026-04-19 CORRECTION: magnitude-only fractal leaf
measured the envelope (D, w, σ, H) which is near-constant across rows.
The per-row variation lives in the SIGN PATTERN of Hadamard-rotated
coefficients — that is the phase.

New primitive in bgz_tensor::fractal_descriptor (lab-gated):

  PhaseDescriptor {
    flip_density: [f32; 5]  // scales s ∈ {4, 8, 16, 32, 64}
  }

  PhaseDescriptor::from_row(row) -> PhaseDescriptor
    1. wht_f32(row)                     — orthogonal projection
    2. sign sequence s_i = sign(c_i)
    3. count sign flips per window at 5 scales, normalize → density
    4. 5-D signature per row

  PhaseDescriptor::cosine(other) -> f32
    normalized dot product between two 5-D phase signatures
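
The four steps above can be sketched as follows. This is a minimal illustration, not the `bgz_tensor::fractal_descriptor` code: the helper names (`wht`, `flip_density`, `phase_signature`), the non-overlapping windowing, and the product-based flip test are all assumptions. Note that the resulting signature is identical for a row and its negation.

```rust
/// In-place fast Walsh-Hadamard transform (unnormalized).
/// The length must be a power of two.
pub fn wht(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (data[i], data[i + h]);
                data[i] = a + b;
                data[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

/// Fraction of adjacent coefficient pairs with opposite signs, counted
/// inside non-overlapping windows of `scale` coefficients (assumed layout).
pub fn flip_density(coefs: &[f32], scale: usize) -> f32 {
    assert!(scale >= 2, "a window needs at least two coefficients");
    let (mut flips, mut pairs) = (0usize, 0usize);
    for window in coefs.chunks(scale) {
        for pair in window.windows(2) {
            pairs += 1;
            // A negative product means the two signs differ; zeros never count.
            if pair[0] * pair[1] < 0.0 {
                flips += 1;
            }
        }
    }
    if pairs == 0 { 0.0 } else { flips as f32 / pairs as f32 }
}

/// 5-D phase signature: flip density at scales 4, 8, 16, 32, 64.
pub fn phase_signature(row: &[f32]) -> [f32; 5] {
    let mut coefs = row.to_vec();
    wht(&mut coefs);
    let scales = [4usize, 8, 16, 32, 64];
    let mut out = [0.0f32; 5];
    for (o, &s) in out.iter_mut().zip(scales.iter()) {
        *o = flip_density(&coefs, s);
    }
    out
}
```

Because negating a row negates every WHT coefficient, each adjacent product keeps its sign, so `phase_signature(row)` and `phase_signature(-row)` are equal.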

Two new CodecCandidates in codec_rnd_bench.rs (lab-gated):

  FractalPhaseOnly          5 B    fractal phase signature alone
  FractalPhasePlusBase17    39 B   0.75*Base17 + 0.25*phase blend

Re-runs through the same endpoint psychometric suite
(bgz_tensor::quality::icc_3_1 + cronbach_alpha + spearman + 7 others).
Direct comparison to the magnitude-only variant that measured ICC_3_1
= -0.9955 on Qwen3-8B q_proj.

Gates unchanged: all behind --features lab. Main builds untouched.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
… dead

Ran codec_rnd_bench.rs with both fractal variants. Qwen3-8B q_proj L0,
N=128 rows, pairwise cosine ground truth.

| Fractal-Desc (magnitude, 7 B)  | ICC_3_1 = -0.9955 |
| Fractal-Phase (phase, 5 B)     | ICC_3_1 = -0.9972 |
| Fractal + Base17               | ICC_3_1 = -0.4879 |
| Phase + Base17                 | ICC_3_1 = -0.4982 |

BOTH orthogonal axes of row-level fractal statistics are flat across
rows after Hadamard. Per I2 (near-orthogonality), any row-level
summary statistic looks identical once rows are Gaussian-ish
post-rotation. Discrimination requires full sign/magnitude
coordinate pattern (~512 B/row).

Fractal-leaf line of research closed for row-level compression. Three
probes completed, all negative. Only still-open variant:
fractal-interpolation-between-Base17-anchors for round-trip codec
(unmeasured, unbuilt).

I8-Hadamard ~9 B remains the argmax-regime leader. Don't pursue
row-level-statistic fractal compression further.

Wall time: 22.5 min, 4 new candidates on 60-codec sweep, 128 rows.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
… bgz17 container

Per user + existing phi-spiral-reconstruction.md "family zipper"
concept: bgz17 halo isn't waste, it's magnitude storage at a
different φ-stride.

Supersedes the triple-channel matryoshka proposal (3 separate
containers) with a single-container zipper:

  phase    stride = round(N / φ)   → ~48-64 bits (existing bgz17)
  mag      stride = round(N / φ²)  → ~48-64 bits (halo positions)
  halo-rem                         → ~16,200 bits (ECC / future)

Both strides maximally-irrational → both anti-moiré ("X-Trans") →
coincidences at φ-ratios → hidden moiré preserved for both streams
in the same container.

Zeckendorf property: unique non-adjacent Fibonacci decomposition →
non-colliding strides are mathematical, not hand-tuned.
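
As a reminder of what the Zeckendorf property states, here is a small sketch of the greedy decomposition: every positive integer is a unique sum of Fibonacci numbers with no two consecutive ones. The function name is hypothetical, and how the container maps this property onto stride positions is not shown in this commit.

```rust
/// Greedy Zeckendorf decomposition of `n` into Fibonacci numbers
/// (from the sequence 1, 2, 3, 5, 8, ...), largest first.
/// The greedy choice never picks two consecutive Fibonacci numbers.
pub fn zeckendorf(mut n: u64) -> Vec<u64> {
    // Build the Fibonacci numbers that do not exceed n.
    let mut fibs = vec![1u64, 2];
    loop {
        let len = fibs.len();
        match fibs[len - 1].checked_add(fibs[len - 2]) {
            Some(next) if next <= n => fibs.push(next),
            _ => break,
        }
    }
    // Take the largest Fibonacci number that still fits, then repeat.
    let mut parts = Vec::new();
    for &f in fibs.iter().rev() {
        if f <= n {
            parts.push(f);
            n -= f;
        }
    }
    parts
}
```

For example, 100 decomposes as 89 + 8 + 3, and no other choice of non-consecutive Fibonacci numbers sums to 100.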

Matryoshka truncation preserved: read phase alone = coarse, read
phase + mag = fine, read halo ECC = corrected. Single stride-aware
reader, not 3 parallel ones.

Halo utilization: 0.3% → 0.6% signal density. Advantage over triple-
channel: 1 container vs 3, matches existing bgz17 design intent.

Next: implement bgz17::zipper_{encode,decode}, add ZipperCodec as
lab-gated candidate in codec_rnd_bench.rs, measure ICC_3_1.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…6-64 active

Implements the φ-zipper architecture from IDEAS.md 2026-04-19. Single
container carries two φ-stride-multiplexed streams:

  PHASE_ACTIVE_BITS    = 64   (explicit constant)
  MAG_ACTIVE_SAMPLES   = 56   (explicit constant)
  ZIPPER_BYTES         = 64   (8 B phase + 56 B i8 magnitude)

Both streams share one row, at different φ-strides:

  phase stride = round(N / φ)     — Base17-style aperiodic sampling
  mag stride   = round(N / φ²)    — Zeckendorf-non-adjacent stride

Zeckendorf property: non-adjacent Fibonacci indices → strides
mathematically non-colliding. No hand-tuning.
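
The two stride formulas can be sketched as below. The rounding follows the formulas above; the sampling rule `(k * stride) mod N` and the names `phi_strides` and `stride_positions` are assumptions for illustration, not the `bgz-tensor::zipper` code.

```rust
pub const PHI: f64 = 1.618_033_988_749_895;

/// (phase stride, magnitude stride) = (round(N / phi), round(N / phi^2)).
pub fn phi_strides(n: usize) -> (usize, usize) {
    let phase = (n as f64 / PHI).round() as usize;
    let mag = (n as f64 / (PHI * PHI)).round() as usize;
    (phase, mag)
}

/// Hypothetical sampling rule: the k-th sample sits at (k * stride) mod n.
pub fn stride_positions(n: usize, stride: usize, count: usize) -> Vec<usize> {
    assert!(n > 0, "row length must be positive");
    (0..count).map(|k| (k * stride) % n).collect()
}
```

For N = 4096 this gives strides 2531 and 1565. Both are odd, so under the assumed rule each stream visits distinct positions for any count up to N when N is a power of two.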

Both streams maximally-irrational vs the Hadamard butterfly → both
anti-moiré ("X-Trans sensor" principle). Coincidences at φ-ratios =
"hidden moiré" — dispersed below visibility.

Matryoshka truncation via single descriptor:
  cosine_phase_only       (8 B)   coarse decode
  cosine_magnitude_only   (56 B)  magnitude alone (diagnostic)
  cosine_zipper_full      (64 B)  full decode — 0.5 phase + 0.5 mag

6/6 unit tests pass:
  constants_are_explicit       (locks 64 / 56 / 64)
  encode_pack_roundtrip
  self_similarity_unity        (cos(d, d) = 1.0)
  different_rows_lower         (random rows don't falsely agree)
  sign_flip_inverts_both       (Hadamard linearity: -row → -cos)
  positive_scaling_preserves   (k·row → cos = 1 for k > 0)

Wired as lab-gated candidates in codec_rnd_bench.rs:
  ZipperPhaseOnly (8 B)
  ZipperFull (64 B)

Next: run bench → measure ICC_3_1 vs Base17 (0.024) and fractal
candidates (-0.9955 / -0.9972). Hypothesis: zipper beats both because
magnitude-stream carries independent signal not captured by row-level
fractal statistics.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…ctal bug

Three-population ICC measurement confirms both the diagnosis and fix:

Sign-flip invariance of fractal descriptors (MFDFA variance + flip
density both unchanged under WHT linearity of negation) → codec sees
cos(x, -x) = 1.0 while ground truth sees -1.0 → perfect ranking
inversion → ICC = -0.999. Not "no signal", but "collapsed opposites".

Zipper fix: sample sign BITS at positions, not derived statistics.
Invariance broken, anti-correlation vanishes, POSITIVE ICC restored.
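
A minimal sketch of why sampling sign bits breaks the invariance: packing the signs at fixed positions and comparing them as a bipolar vector gives cos(x, −x) = −1 instead of 1. The names `sign_bits` and `sign_cosine` are hypothetical, the bit convention (set = negative) is an assumption, and the sketch assumes no sampled coefficient is exactly zero.

```rust
/// Pack the sign bits of the coefficients at `positions` into a u64
/// (bit set = negative coefficient). At most 64 positions.
pub fn sign_bits(coefs: &[f32], positions: &[usize]) -> u64 {
    assert!(positions.len() <= 64, "at most 64 sign bits fit in a u64");
    positions.iter().enumerate().fold(0u64, |acc, (bit, &p)| {
        if coefs[p] < 0.0 { acc | (1u64 << bit) } else { acc }
    })
}

/// Cosine between two bipolar sign patterns of `n_bits` bits:
/// (agreements - disagreements) / n_bits.
pub fn sign_cosine(a: u64, b: u64, n_bits: u32) -> f32 {
    assert!(n_bits >= 1 && n_bits <= 64);
    let mask = if n_bits == 64 { u64::MAX } else { (1u64 << n_bits) - 1 };
    let disagree = ((a ^ b) & mask).count_ones();
    1.0 - 2.0 * disagree as f32 / n_bits as f32
}
```

Negating the input flips every sampled bit, so the two patterns disagree everywhere and the cosine is −1, in line with the ground truth.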

| Codec        | Bytes | k_proj | gate_proj | q_proj | Note                     |
| Base17       | 34    |  0.007 |  0.012    |  0.024 |                          |
| Fractal-X    |       | -0.999 | -0.999    | -0.996 |                          |
| Zipper-Phase | 8     |  0.050 |  0.049    |  0.097 | beats Base17 @ 1/4 bytes |
| Zipper-Full  | 64    |  0.129 |  0.107    |  0.203 | top-5 recall 0.6         |

Still behind I8-Hadamard leader (ICC ~0.9 at 9 B), but FIRST
fractal-family codec with positive ICC. Anti-moiré φ-stride +
explicit sign preservation is the working recipe.

Next probes: wider phase stream, φ-permute morph, different bases,
blend weight tuning.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Three improvements to the zipper codec:

1. ZipperI8Descriptor: i8 samples (sign+magnitude) instead of sign-only
   bits, so each sample carries 8 bits rather than 1. At the same 8 B
   budget as Zipper-Phase it stores 8 i8 samples in place of 64 sign bits.

2. Quintenzirkel stride — log₂(3/2) ≈ 0.585 irrational rotation. Circle
   of Fifths ordering: adjacent samples are harmonically related (consonant).
   Natural truncation: 7 samples = diatonic, 12 = chromatic, 24+ = overtone.
   Tests harmonic-proximity ordering vs φ's maximal-irrationality.

3. μ-law companding (MU_LAW=255) — gamma-corrected i8 quantization.
   sign(x) * log(1 + μ|x|) / log(1 + μ) → concentrates precision near
   zero where argmax decisions happen, coarsens at extremes.
   Inverse: mu_law_decode for reconstruction.
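
The companding formula in item 3 can be sketched as follows. `mu_law_decode` is named in this commit; `mu_law_encode`, `quantize_i8` and `dequantize_i8` are hypothetical names, and the sketch assumes inputs are already normalized to [-1, 1].

```rust
pub const MU_LAW: f32 = 255.0;

/// Compress x in [-1, 1] to [-1, 1]:
/// sign(x) * ln(1 + mu * |x|) / ln(1 + mu).
pub fn mu_law_encode(x: f32) -> f32 {
    let x = x.clamp(-1.0, 1.0);
    x.signum() * (MU_LAW * x.abs()).ln_1p() / MU_LAW.ln_1p()
}

/// Inverse of `mu_law_encode`:
/// sign(y) * ((1 + mu)^|y| - 1) / mu.
pub fn mu_law_decode(y: f32) -> f32 {
    let y = y.clamp(-1.0, 1.0);
    y.signum() * (y.abs() * MU_LAW.ln_1p()).exp_m1() / MU_LAW
}

/// Quantize a normalized sample to i8 after companding.
pub fn quantize_i8(x: f32) -> i8 {
    (mu_law_encode(x) * 127.0).round() as i8
}

/// Reconstruct an approximate sample from its i8 code.
pub fn dequantize_i8(q: i8) -> f32 {
    mu_law_decode(q as f32 / 127.0)
}
```

With μ = 255 an input of 0.01 already maps to code 29, whereas linear i8 quantization would give code 1. That is the "precision near zero" effect described above.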

Constants made explicit:
  QUINT_STRIDE = 0.584962500721156  (log₂(3/2))
  MU_LAW       = 255.0              (telephony-standard)
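
A small sketch of what an irrational rotation with `QUINT_STRIDE` could look like. The sampling rule (k-th sample at frac(k · stride) · N) and the name `rotation_positions` are assumptions; the commit only fixes the constant.

```rust
/// log2(3/2), as listed in the constants above.
pub const QUINT_STRIDE: f64 = 0.584962500721156;

/// Hypothetical sampling rule: the k-th sample sits at
/// floor(frac(k * stride) * n), an irrational rotation on the unit circle.
pub fn rotation_positions(n: usize, stride: f64, count: usize) -> Vec<usize> {
    (0..count)
        .map(|k| {
            let turn = (k as f64 * stride).fract(); // position in [0, 1)
            ((turn * n as f64) as usize).min(n.saturating_sub(1))
        })
        .collect()
}
```

Twelve steps of this rotation land within about 0.02 of a full turn from the starting point (12 · log₂(3/2) ≈ 7.0196), which is one way to read the "12 = chromatic" truncation point.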

Four new lab-gated CodecCandidates in codec_rnd_bench.rs:
  Zipper-I8-φ(8B)   — 8 i8, φ-stride, μ-law
  Zipper-I8-Q5(8B)  — 8 i8, Quintenzirkel-stride, μ-law
  Zipper-I8-φ(64B)  — 64 i8, φ-stride, μ-law
  Zipper-I8-Q5(64B) — 64 i8, Quintenzirkel-stride, μ-law

All behind --features lab. Main builds untouched.

17D qualia parallel: the 17D qualia vector (CMYK/RGB transform in
lance-graph-cognitive::grammar::qualia) already encodes cognitive
features in a harmonic-frequency domain. Quintenzirkel stride in the
codec mirrors this — harmonic structure as the natural ordering for
both perceptual (qualia) and computational (codec) representations.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Per user: negative-canceling bipolar with 5^5 (3125 states) and 7^7
(823,543 states) structure. Key fix from prior negative result:
GLOBAL (population-wide) scale instead of per-row max-abs.

Zipper5LevelDescriptor:
  - Values ∈ {-2, -1, 0, +1, +2}
  - 5 samples = 5^5 states, packs to ~2 B
  - 25 samples = 5 × 5^5, packs to ~10 B
  - bundle() saturates at ±2; negative values cancel (VSA semantics)
  - compute_global_scale() returns median |coef| across population

Zipper7LevelDescriptor:
  - Values ∈ {-3, -2, -1, 0, +1, +2, +3}
  - 7 samples = 7^7 states, packs to ~3 B
  - 49 samples = 7 × 7^7, packs to ~18 B
  - bundle() saturates at ±3
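
The byte counts follow from the state counts: 5^5 = 3125 fits in a u16 (2 B) and 7^7 = 823,543 fits in 24 bits (3 B). A minimal base-5 packing sketch for one group of 5 samples is shown below; the function names and digit order are assumptions, not the descriptor's actual layout.

```rust
/// Pack five levels in {-2, ..., +2} into one base-5 code (0..3125).
/// The first sample is the most significant digit.
pub fn pack_base5(levels: &[i8; 5]) -> u16 {
    levels.iter().fold(0u16, |acc, &l| {
        assert!((-2..=2).contains(&l), "level out of range");
        acc * 5 + (l + 2) as u16
    })
}

/// Inverse of `pack_base5`.
pub fn unpack_base5(mut code: u16) -> [i8; 5] {
    let mut out = [0i8; 5];
    // Peel off the least significant digit first, filling from the back.
    for slot in out.iter_mut().rev() {
        *slot = (code % 5) as i8 - 2;
        code /= 5;
    }
    out
}
```

The same scheme with base 7 and an offset of 3 would cover one 7-sample group.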

Thresholds at half-integer multiples of global_scale:
  5-level: {-1.5, -0.5, +0.5, +1.5} × scale
  7-level: {-2.5, -1.5, -0.5, +0.5, +1.5, +2.5} × scale
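
Thresholds at half-integer multiples of the scale are equivalent to rounding `x / scale` to the nearest integer and clamping. A minimal sketch, with `quantize_level` as a hypothetical name and `scale` standing in for the global scale:

```rust
/// Quantize `x` to an integer level in [-max_level, +max_level].
/// Decision thresholds sit at (j + 0.5) * scale for integer j.
/// Use max_level = 2 for the 5-level variant and 3 for the 7-level variant.
pub fn quantize_level(x: f32, scale: f32, max_level: i8) -> i8 {
    assert!(scale > 0.0 && max_level > 0);
    let level = (x / scale).round();
    level.clamp(-(max_level as f32), max_level as f32) as i8
}
```

Because `scale` is shared across the whole population, two rows with different overall magnitudes map to different levels, which per-row max-abs normalization would erase.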

This unifies the Structured5x5 ethos from PR #209 with the φ-stride
zipper sampling. Negative cancellation on bundling means noise cancels,
signal accumulates — useful for VSA query superposition (not directly
measured by the pair-cosine bench, but a property the descriptor holds).
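
A sketch of bundling with negative cancellation, assuming descriptors are plain level vectors of equal length. The free function below is illustrative and is not the `bundle()` method on the descriptor types.

```rust
/// Element-wise sum of level vectors, saturated to [-max_level, +max_level].
/// Opposite-signed entries cancel; same-signed entries reinforce.
pub fn bundle(descriptors: &[Vec<i8>], max_level: i8) -> Vec<i8> {
    let len = descriptors.first().map_or(0, |d| d.len());
    assert!(
        descriptors.iter().all(|d| d.len() == len),
        "all descriptors must have the same length"
    );
    (0..len)
        .map(|i| {
            let sum: i32 = descriptors.iter().map(|d| d[i] as i32).sum();
            sum.clamp(-(max_level as i32), max_level as i32) as i8
        })
        .collect()
}
```

Bundling `[2, -1, 1, 0, -2]` with `[-2, -1, 2, 0, -1]` at ±2 saturation gives `[0, -2, 2, 0, -2]`: the first position cancels, the others accumulate and saturate.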

4 new lab-gated CodecCandidates:
  Zipper-5^5(2B)   — 5 samples,  5-level
  Zipper-5^5×5(10B) — 25 samples, 5-level
  Zipper-7^7(3B)   — 7 samples,  7-level
  Zipper-7^7×7(18B) — 49 samples, 7-level

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…sweep

Three populations, 12 codecs, 1400s wall.

Best zipper: 7^7×7 at 18 B → ICC 0.144 on q_proj (Pareto point between
Base17 0.024 and Zipper-Full 0.20).

Existing sweep has Had-Q5×D-R at ICC 0.989 / 0-B-per-row (shared
codebook, TurboQuant-class). This is the argmax leader; nothing in
zipper family competes on pure ICC.

Quintenzirkel empirically loses to φ across all size tiers.

Per-row μ-law normalization destroys inter-row magnitude info.
Global-scale 5^5/7^7 recovers some (7^7×7 at 18 B > I8 μ-law at 64 B).

Pragmatic: use Had-Q5×D-R for production, zipper only when bundling/
progressive-decode/anti-moiré properties matter.

Unmeasured: MRI differential phase, Fibonacci bundling, audiophile
multi-band precision.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Comprehensive findings from the fractal → zipper research arc
(2026-04-19/20). Captures measured ICC, decision tree, invariants,
and dead ends so future sessions don't re-derive them.

Does the zipper fix the argmax blind spot? NO. Already fixed by
Had-Q5×D-R (ICC 0.989) and I8-Hadamard (ICC ~0.9). Zipper hits 0.20
and fixes DIFFERENT blind spots: no-codebook calibration, progressive
decode, bundling with negative cancellation, Fibonacci-weighted
256-signal capacity.

5 invariants established by measurement:
  I1: Sign-flip invariance kills argmax ICC
  I2: Per-row normalization destroys inter-row magnitude info
  I3: Maximally-irrational strides beat harmonic for argmax (φ > Q5)
  I4: Aperiodic φ-stride beats linear dyadic on butterfly signals
  I5: Sign bits carry less info but avoid i8 pitfalls

Decision tree + measured hierarchy + 5 dead ends + 5 unmeasured probes
+ recipe for adding new CodecCandidates.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
@AdaWorldAPI AdaWorldAPI merged commit 357b0d2 into main Apr 20, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c4c0e7f7f

Comment on lines +198 to +200
fn bytes_per_row(&self) -> usize { 5 }
fn pairwise_scores(&self, rows: &[Vec<f32>]) -> Vec<f64> {
use bgz_tensor::fractal_descriptor::PhaseDescriptor;
P1: Quantize phase features before reporting 5-byte size

Fractal-Phase(5B) reports a 5-byte payload, but its scoring path uses PhaseDescriptor values as raw f32 features (5 floats) with no packing/quantization step. This means the measured quality for this candidate (and Phase+Base17(39B), which reuses the same phase descriptor) is computed with substantially more precision than the advertised byte budget, skewing the benchmark’s quality-vs-size comparison. Either quantize to an actual 5-byte representation before scoring or report the true payload size.
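
One way to make the reported size honest is to score from an actual 5-byte payload. The sketch below assumes each flip density lies in [0, 1] and spends one byte per scale; the function names are hypothetical and this is not a change made in the PR.

```rust
/// Quantize five densities in [0, 1] to one byte each (5 B total).
pub fn quantize_phase(density: &[f32; 5]) -> [u8; 5] {
    let mut out = [0u8; 5];
    for (o, &d) in out.iter_mut().zip(density.iter()) {
        *o = (d.clamp(0.0, 1.0) * 255.0).round() as u8;
    }
    out
}

/// Reconstruct approximate densities from the 5-byte payload.
pub fn dequantize_phase(bytes: &[u8; 5]) -> [f32; 5] {
    let mut out = [0.0f32; 5];
    for (o, &b) in out.iter_mut().zip(bytes.iter()) {
        *o = b as f32 / 255.0;
    }
    out
}
```

Scoring would then run on `dequantize_phase(&quantize_phase(&sig))` rather than on the raw f32 signature, so the measured quality matches the advertised byte budget.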

/// Compute population-global scale: median of per-row max-abs,
/// scaled so ~70% of coefficients land in the middle 3 levels.
pub fn compute_global_scale(rows: &[Vec<f32>]) -> f32 {
let mut all_abs: Vec<f32> = Vec::with_capacity(rows.len() * rows[0].len());
P2: Guard global-scale computation against empty input

compute_global_scale indexes rows[0] unconditionally to size all_abs, so calling it with an empty slice panics immediately. Since this is a public helper, it should fail explicitly (e.g., assert with a clear message or return Option/Result) instead of triggering an index-out-of-bounds panic on empty populations.
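
A minimal sketch of the `Option`-returning variant suggested here. It assumes the scale is the median of all absolute coefficients, as the commit message describes; the name `compute_global_scale_checked` and the upper-median choice are hypothetical.

```rust
/// Median of |coef| over the whole population, or None when there are
/// no finite coefficients (empty slice, or only empty rows).
pub fn compute_global_scale_checked(rows: &[Vec<f32>]) -> Option<f32> {
    let mut all_abs: Vec<f32> = rows
        .iter()
        .flatten()
        .map(|x| x.abs())
        .filter(|a| a.is_finite())
        .collect();
    if all_abs.is_empty() {
        return None;
    }
    // total_cmp gives a total order on f32, so sorting cannot panic.
    all_abs.sort_by(|a, b| a.total_cmp(b));
    Some(all_abs[all_abs.len() / 2])
}
```

Collecting through `flatten()` also removes the need to pre-size the buffer from `rows[0]`, which is the indexing that panics on empty input.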

AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
…sult

D2 — cam_pq_calibrate binary: reads safetensors, classifies tensors via
route_tensor (D1), trains a CamCodebook per argmax-regime tensor, encodes
all rows to 6-byte fingerprints, measures ICC_3_1 and relative L2 error,
writes codebooks + fingerprints + manifest.json.

D5 — full-size validation on Qwen3-TTS-0.6B: FAILS. 234 argmax-regime
tensors measured. Mean ICC = 0.195, zero tensors meet the ≥0.99 gate.
Relative L2 error 0.70–0.90.

Root cause: PR #218 bench measured ICC 0.9998 on 128 rows trained and
measured on those same 128 rows — a trivially-correct fit (128 ≤ 256
centroids → every row gets its own centroid). At production tensor
sizes (1024–3072 rows), the 6×256 codebook is centroid-starved.
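
The trivially-correct fit can be reproduced in a 1-D toy case: when there are at least as many centroids as training rows, each row can be its own centroid, so training error is zero while held-out error is not. The sketch below is illustrative only and is unrelated to the `CamCodebook` implementation.

```rust
/// Value of the centroid closest to `x` (1-D toy codebook).
pub fn nearest_centroid(centroids: &[f32], x: f32) -> f32 {
    centroids
        .iter()
        .copied()
        .min_by(|a, b| (a - x).abs().total_cmp(&(b - x).abs()))
        .expect("codebook must not be empty")
}

/// Mean absolute quantization error of `data` under the codebook.
pub fn mean_abs_error(centroids: &[f32], data: &[f32]) -> f32 {
    if data.is_empty() {
        return 0.0;
    }
    let total: f32 = data
        .iter()
        .map(|&x| (x - nearest_centroid(centroids, x)).abs())
        .sum();
    total / data.len() as f32
}
```

Using the training points themselves as the codebook gives zero error on those points, which is why a score measured on the training rows says little about rows the codebook has not seen.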

cam_pq_row_count_probe.rs demonstrates the collapse:
  n=128 → icc_train=1.000, icc_all=-0.304
  n=3072 → icc_train=-0.079

Also broadens route_tensor embedding match to catch codec_embedding,
adding 2 new test cases (10 total, 133/133 contract tests pass).

Infrastructure (CLI, serialization, measurement) is sound. The
negative result is in the codec's capacity vs tensor row counts, not
the tooling. Plan needs revision before D6/D7 effort.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh