feat(lab): zipper codec family + fractal-leaf research arc + findings doc #218
…scales
Per EPIPHANIES 2026-04-19 CORRECTION: magnitude-only fractal leaf
measured the envelope (D, w, σ, H) which is near-constant across rows.
The per-row variation lives in the SIGN PATTERN of Hadamard-rotated
coefficients — that is the phase.
New primitive in bgz_tensor::fractal_descriptor (lab-gated):
  PhaseDescriptor {
    flip_density: [f32; 5] // scales s ∈ {4, 8, 16, 32, 64}
  }
  PhaseDescriptor::from_row(row) -> PhaseDescriptor
    1. wht_f32(row): orthogonal projection
    2. sign sequence s_i = sign(c_i)
    3. count sign flips per window at 5 scales, normalize → density
    4. 5-D signature per row
  PhaseDescriptor::cosine(other) -> f32
    normalized dot product between two 5-D phase signatures
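The four from_row steps and the cosine can be sketched as below. This is a minimal illustration, not the code in bgz_tensor: the type name PhaseSketch, the non-overlapping windows, and the normalization by the number of adjacent pairs are assumptions, since the commit only says flips are counted per window at five scales and normalized.

```rust
/// In-place fast Walsh-Hadamard transform with orthonormal scaling.
/// The length must be a power of two.
fn wht_f32(data: &mut [f32]) {
    let n = data.len();
    assert!(n.is_power_of_two(), "WHT needs a power-of-two length");
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (data[i], data[i + h]);
                data[i] = a + b;
                data[i + h] = a - b;
            }
        }
        h *= 2;
    }
    let norm = (n as f32).sqrt();
    for v in data.iter_mut() {
        *v /= norm;
    }
}

/// Window lengths from the commit message.
const SCALES: [usize; 5] = [4, 8, 16, 32, 64];

/// Hypothetical stand-in for PhaseDescriptor.
#[derive(Debug, Clone, PartialEq)]
struct PhaseSketch {
    flip_density: [f32; 5],
}

impl PhaseSketch {
    fn from_row(row: &[f32]) -> Self {
        // Step 1: rotate the row with the WHT.
        let mut coefs = row.to_vec();
        wht_f32(&mut coefs);
        // Step 2: keep only the sign of each coefficient.
        let negative: Vec<bool> = coefs.iter().map(|c| *c < 0.0).collect();
        // Steps 3 and 4: flips per window, normalized, one value per scale.
        let mut flip_density = [0.0f32; 5];
        for (k, &s) in SCALES.iter().enumerate() {
            let mut flips = 0usize;
            let mut slots = 0usize;
            // Assumption: non-overlapping windows of length s.
            for window in negative.chunks_exact(s) {
                flips += window.windows(2).filter(|p| p[0] != p[1]).count();
                slots += s - 1;
            }
            flip_density[k] = if slots == 0 { 0.0 } else { flips as f32 / slots as f32 };
        }
        Self { flip_density }
    }

    /// Normalized dot product of two 5-D signatures (0.0 if either is all zero).
    fn cosine(&self, other: &Self) -> f32 {
        let dot: f32 = self
            .flip_density
            .iter()
            .zip(&other.flip_density)
            .map(|(a, b)| a * b)
            .sum();
        let na = self.flip_density.iter().map(|a| a * a).sum::<f32>().sqrt();
        let nb = other.flip_density.iter().map(|b| b * b).sum::<f32>().sqrt();
        if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
    }
}
```

For a unit impulse at index 1 the Hadamard coefficients alternate in sign, so every flip density is 1.0. Negating the row yields the same descriptor, which is the sign-flip invariance this PR later identifies as the cause of the negative ICC.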
Two new CodecCandidates in codec_rnd_bench.rs (lab-gated):
  FractalPhaseOnly        5 B   fractal phase signature alone
  FractalPhasePlusBase17  39 B  0.75*Base17 + 0.25*phase blend
Re-runs through the same endpoint psychometric suite
(bgz_tensor::quality::icc_3_1 + cronbach_alpha + spearman + 7 others).
Direct comparison to the magnitude-only variant that measured ICC_3_1
= -0.9955 on Qwen3-8B q_proj.
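For reference, ICC(3,1) with two raters (ground-truth cosine and codec score for each row pair) is, in the standard Shrout-Fleiss consistency form, (MS_rows - MS_err) / (MS_rows + (k - 1) * MS_err) with k = 2. The function below is a sketch of that textbook formula under the assumption of exactly two score vectors; the name icc_3_1_sketch is hypothetical and this is not the signature or implementation of bgz_tensor::quality::icc_3_1.

```rust
/// ICC(3,1), two-way mixed, consistency, single measure, for k = 2 raters.
/// Returns None if the inputs are unusable (length mismatch, fewer than two
/// subjects, or zero total variance).
fn icc_3_1_sketch(truth: &[f64], codec: &[f64]) -> Option<f64> {
    let n = truth.len();
    if n < 2 || codec.len() != n {
        return None;
    }
    let k: f64 = 2.0;
    let nf = n as f64;
    let sum_t: f64 = truth.iter().sum();
    let sum_c: f64 = codec.iter().sum();
    let grand: f64 = (sum_t + sum_c) / (k * nf);
    let col_t: f64 = sum_t / nf;
    let col_c: f64 = sum_c / nf;

    let mut ss_rows: f64 = 0.0; // between-subject sum of squares
    let mut ss_total: f64 = 0.0;
    for (t, c) in truth.iter().zip(codec) {
        let row_mean: f64 = (*t + *c) / k;
        ss_rows += k * (row_mean - grand).powi(2);
        ss_total += (*t - grand).powi(2) + (*c - grand).powi(2);
    }
    // Between-rater sum of squares (removed in the consistency form).
    let ss_cols: f64 = nf * ((col_t - grand).powi(2) + (col_c - grand).powi(2));
    let ss_err: f64 = ss_total - ss_rows - ss_cols;

    let ms_rows = ss_rows / (nf - 1.0);
    let ms_err = ss_err / ((nf - 1.0) * (k - 1.0));
    let denom = ms_rows + (k - 1.0) * ms_err;
    if denom == 0.0 {
        None
    } else {
        Some((ms_rows - ms_err) / denom)
    }
}
```

Under this formula a codec whose scores are the exact negation of the ground truth gives ICC = -1, so values near -1 indicate inverted rankings rather than missing signal.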
Gates unchanged: all behind --features lab. Main builds untouched.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
… dead
Ran codec_rnd_bench.rs with both fractal variants. Qwen3-8B q_proj L0,
N=128 rows, pairwise cosine ground truth.
| Fractal-Desc (magnitude, 7 B) | ICC_3_1 = -0.9955 |
| Fractal-Phase (phase, 5 B) | ICC_3_1 = -0.9972 |
| Fractal + Base17 | ICC_3_1 = -0.4879 |
| Phase + Base17 | ICC_3_1 = -0.4982 |
BOTH orthogonal axes of row-level fractal statistics are flat across rows
after Hadamard. Per I2 (near-orthogonality), any row-level summary
statistic looks identical once rows are Gaussian-ish post-rotation.
Discrimination requires full sign/magnitude coordinate pattern (~512 B/row).
Fractal-leaf line of research closed for row-level compression. Three
probes completed, all negative. Only still-open variant:
fractal-interpolation-between-Base17-anchors for round-trip codec
(unmeasured, unbuilt). I8-Hadamard ~9 B remains the argmax-regime leader.
Don't pursue row-level-statistic fractal compression further.
Wall time: 22.5 min, 4 new candidates on 60-codec sweep, 128 rows.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
… bgz17 container
Per user + existing phi-spiral-reconstruction.md "family zipper"
concept: bgz17 halo isn't waste, it's magnitude storage at a
different φ-stride.
Supersedes the triple-channel matryoshka proposal (3 separate
containers) with a single-container zipper:
phase stride = round(N / φ) → ~48-64 bits (existing bgz17)
mag stride = round(N / φ²) → ~48-64 bits (halo positions)
halo-rem → ~16,200 bits (ECC / future)
Both strides maximally-irrational → both anti-moiré ("X-Trans") →
coincidences at φ-ratios → hidden moiré preserved for both streams
in the same container.
Zeckendorf property: unique non-adjacent Fibonacci decomposition →
non-colliding strides are mathematical, not hand-tuned.
Matryoshka truncation preserved: read phase alone = coarse, read
phase + mag = fine, read halo ECC = corrected. Single stride-aware
reader, not 3 parallel ones.
Halo utilization: 0.3% → 0.6% signal density. Advantage over triple-
channel: 1 container vs 3, matches existing bgz17 design intent.
Next: implement bgz17::zipper_{encode,decode}, add ZipperCodec as
lab-gated candidate in codec_rnd_bench.rs, measure ICC_3_1.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…6-64 active
Implements the φ-zipper architecture from IDEAS.md 2026-04-19. Single
container carries two φ-stride-multiplexed streams:
PHASE_ACTIVE_BITS = 64 (explicit constant)
MAG_ACTIVE_SAMPLES = 56 (explicit constant)
ZIPPER_BYTES = 64 (8 B phase + 56 B i8 magnitude)
Both streams share one row, at different φ-strides:
phase stride = round(N / φ) — Base17-style aperiodic sampling
mag stride = round(N / φ²) — Zeckendorf-non-adjacent stride
Zeckendorf property: non-adjacent Fibonacci indices → strides
mathematically non-colliding. No hand-tuning.
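The two stride formulas can be illustrated with a small sketch. How sample positions are derived from a stride is not stated in the commit; stepping k * stride modulo N from position 0 is an assumption here, and the function names are hypothetical.

```rust
/// Golden ratio.
const PHI: f64 = 1.618_033_988_749_895;

/// Strides from the commit message: round(N / φ) for phase, round(N / φ²) for magnitude.
fn phi_strides(n: usize) -> (usize, usize) {
    let phase = (n as f64 / PHI).round() as usize;
    let mag = (n as f64 / (PHI * PHI)).round() as usize;
    (phase, mag)
}

/// Assumption: the k-th sample sits at (k * stride) mod n, starting at 0.
fn stride_positions(n: usize, stride: usize, count: usize) -> Vec<usize> {
    assert!(n > 0, "row length must be non-zero");
    (0..count).map(|k| (k * stride) % n).collect()
}
```

For N = 1024 this gives strides 633 and 391. Because 1/φ + 1/φ² = 1, the two strides sum to N in this case, so under the modular-stepping assumption the magnitude stream walks the phase lattice in the opposite direction; with 64 and 56 samples the only shared position is the starting index 0.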
Both streams maximally-irrational vs the Hadamard butterfly → both
anti-moiré ("X-Trans sensor" principle). Coincidences at φ-ratios =
"hidden moiré" — dispersed below visibility.
Matryoshka truncation via single descriptor:
cosine_phase_only (8 B) coarse decode
cosine_magnitude_only (56 B) magnitude alone (diagnostic)
cosine_zipper_full (64 B) full decode — 0.5 phase + 0.5 mag
6/6 unit tests pass:
constants_are_explicit (locks 64 / 56 / 64)
encode_pack_roundtrip
self_similarity_unity (cos(d, d) = 1.0)
different_rows_lower (random rows don't falsely agree)
sign_flip_inverts_both (Hadamard linearity: -row → -cos)
positive_scaling_preserves (k·row → cos = 1 for k > 0)
Wired as lab-gated candidates in codec_rnd_bench.rs:
ZipperPhaseOnly (8 B)
ZipperFull (64 B)
Next: run bench → measure ICC_3_1 vs Base17 (0.024) and fractal
candidates (-0.9955 / -0.9972). Hypothesis: zipper beats both because
magnitude-stream carries independent signal not captured by row-level
fractal statistics.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…ctal bug
Three-population ICC measurement confirms both the diagnosis and fix:
Sign-flip invariance of fractal descriptors (MFDFA variance + flip
density both unchanged under WHT linearity of negation) → codec sees
cos(x, -x) = 1.0 while ground truth sees -1.0 → perfect ranking
inversion → ICC = -0.999. Not "no signal", but "collapsed opposites".
Zipper fix: sample sign BITS at positions, not derived statistics.
Invariance broken, anti-correlation vanishes, POSITIVE ICC restored.
| Codec | Bytes | k_proj | gate_proj | q_proj | Notes |
| Base17 | 34 | 0.007 | 0.012 | 0.024 | |
| Fractal-X | | -0.999 | -0.999 | -0.996 | |
| Zipper-Phase | 8 | 0.050 | 0.049 | 0.097 | beats Base17 @ 1/4 bytes |
| Zipper-Full | 64 | 0.129 | 0.107 | 0.203 | top-5 recall 0.6 |
Still behind I8-Hadamard leader (ICC ~0.9 at 9 B), but FIRST
fractal-family codec with positive ICC. Anti-moiré φ-stride + explicit
sign preservation is the working recipe.
Next probes: wider phase stream, φ-permute morph, different bases, blend
weight tuning.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
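The diagnosis can be reproduced on a toy coefficient vector. The sketch below contrasts a derived statistic (total sign-flip count) with sign bits sampled at fixed positions; the helper names and the sample positions are hypothetical and are not taken from bgz_tensor::zipper.

```rust
/// Derived statistic: number of adjacent sign changes.
/// Negating every coefficient leaves this count unchanged.
fn flip_count(coefs: &[f32]) -> usize {
    coefs
        .windows(2)
        .filter(|p| (p[0] < 0.0) != (p[1] < 0.0))
        .count()
}

/// Sampled sign bits as ±1 values at the given positions.
/// Negating every (non-zero) coefficient flips every bit.
fn sign_bits(coefs: &[f32], positions: &[usize]) -> Vec<i8> {
    positions
        .iter()
        .map(|&p| if coefs[p] < 0.0 { -1 } else { 1 })
        .collect()
}

/// Cosine between two ±1 vectors of equal length: dot product over length.
fn sign_cosine(a: &[i8], b: &[i8]) -> f32 {
    assert_eq!(a.len(), b.len(), "sign vectors must have equal length");
    assert!(!a.is_empty(), "sign vectors must be non-empty");
    let dot: i32 = a.iter().zip(b).map(|(x, y)| (*x as i32) * (*y as i32)).sum();
    dot as f32 / a.len() as f32
}
```

With coefficients c and their negation -c, flip_count returns the same value for both, so any similarity built on it scores the pair as identical. sign_cosine over the sampled bits returns -1.0 for the same pair, matching the ground-truth cosine.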
Three improvements to the zipper codec:
1. ZipperI8Descriptor: i8 samples (sign+magnitude) instead of sign-only
   bits. 8× info density per byte. Same budget as Zipper-Phase, vastly
   denser signal.
2. Quintenzirkel stride: log₂(3/2) ≈ 0.585 irrational rotation. Circle of
   Fifths ordering: adjacent samples are harmonically related (consonant).
   Natural truncation: 7 samples = diatonic, 12 = chromatic, 24+ = overtone.
   Tests harmonic-proximity ordering vs φ's maximal-irrationality.
3. μ-law companding (MU_LAW=255): gamma-corrected i8 quantization.
   sign(x) * log(1 + μ|x|) / log(1 + μ) → concentrates precision near zero
   where argmax decisions happen, coarsens at extremes. Inverse:
   mu_law_decode for reconstruction.
Constants made explicit:
  QUINT_STRIDE = 0.584962500721156 (log₂(3/2))
  MU_LAW = 255.0 (telephony-standard)
Four new lab-gated CodecCandidates in codec_rnd_bench.rs:
  Zipper-I8-φ(8B): 8 i8, φ-stride, μ-law
  Zipper-I8-Q5(8B): 8 i8, Quintenzirkel-stride, μ-law
  Zipper-I8-φ(64B): 64 i8, φ-stride, μ-law
  Zipper-I8-Q5(64B): 64 i8, Quintenzirkel-stride, μ-law
All behind --features lab. Main builds untouched.
17D qualia parallel: the 17D qualia vector (CMYK/RGB transform in
lance-graph-cognitive::grammar::qualia) already encodes cognitive features
in a harmonic-frequency domain. Quintenzirkel stride in the codec mirrors
this: harmonic structure as the natural ordering for both perceptual
(qualia) and computational (codec) representations.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
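The companding formula above, together with its inverse, can be sketched as follows. The input is assumed to be already normalized to [-1, 1], and the mapping of the companded value onto the i8 range ±127 is an assumption; only MU_LAW = 255 and the name mu_law_decode come from the commit message.

```rust
/// μ from the commit message (telephony standard).
const MU_LAW: f32 = 255.0;

/// Compand x in [-1, 1] with sign(x) * ln(1 + μ|x|) / ln(1 + μ),
/// then quantize to i8 in [-127, 127] (assumed range).
fn mu_law_encode(x: f32) -> i8 {
    let x = x.clamp(-1.0, 1.0);
    let y = x.signum() * (MU_LAW * x.abs()).ln_1p() / MU_LAW.ln_1p();
    (y * 127.0).round() as i8
}

/// Inverse companding: sign(y) * ((1 + μ)^|y| - 1) / μ.
fn mu_law_decode(q: i8) -> f32 {
    let y = q as f32 / 127.0;
    y.signum() * ((1.0 + MU_LAW).powf(y.abs()) - 1.0) / MU_LAW
}
```

As an example of the extra precision near zero, x = 0.01 maps to code 29 under this sketch, whereas a linear i8 quantizer with the same range would map it to code 1.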
Per user: negative-canceling bipolar with 5^5 (3125 states) and 7^7
(823,543 states) structure. Key fix from prior negative result:
GLOBAL (population-wide) scale instead of per-row max-abs.
Zipper5LevelDescriptor:
- Values ∈ {-2, -1, 0, +1, +2}
- 5 samples = 5^5 states, packs to ~2 B
- 25 samples = 5 × 5^5, packs to ~10 B
- bundle() saturates at ±2; negative values cancel (VSA semantics)
- compute_global_scale() returns median |coef| across population
Zipper7LevelDescriptor:
- Values ∈ {-3, -2, -1, 0, +1, +2, +3}
- 7 samples = 7^7 states, packs to ~3 B
- 49 samples = 7 × 7^7, packs to ~18 B
- bundle() saturates at ±3
Thresholds at half-integer multiples of global_scale:
5-level: {-1.5, -0.5, +0.5, +1.5} × scale
7-level: {-2.5, -1.5, -0.5, +0.5, +1.5, +2.5} × scale
This unifies the Structured5x5 ethos from PR #209 with the φ-stride
zipper sampling. Negative cancellation on bundling means noise cancels,
signal accumulates — useful for VSA query superposition (not directly
measured by the pair-cosine bench, but a property the descriptor holds).
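The 5-level thresholds, the saturating bundle, and the "5 samples pack to ~2 B" claim can be sketched as below. The function names, the tie-breaking at an exact threshold, and the base-5 packing order are assumptions for illustration; they are not taken from Zipper5LevelDescriptor.

```rust
/// Quantize to {-2, -1, 0, +1, +2} with thresholds at ±0.5 and ±1.5 times
/// the global scale. Rounding x / scale and clamping gives those cut points
/// (ties at an exact threshold round away from zero here).
fn quantize_5(x: f32, scale: f32) -> i8 {
    assert!(scale > 0.0, "global scale must be positive");
    ((x / scale).round() as i32).clamp(-2, 2) as i8
}

/// Bundle two descriptors: element-wise sum, saturating at ±2.
/// Opposite-signed entries cancel toward zero.
fn bundle_5(a: &[i8], b: &[i8]) -> Vec<i8> {
    a.iter()
        .zip(b)
        .map(|(x, y)| (*x as i16 + *y as i16).clamp(-2, 2) as i8)
        .collect()
}

/// Pack five levels as one base-5 number: 5^5 = 3125 states fit in a u16.
fn pack_5x5(levels: &[i8; 5]) -> u16 {
    levels
        .iter()
        .fold(0u16, |acc, &v| acc * 5 + (v.clamp(-2, 2) + 2) as u16)
}

/// Inverse of pack_5x5.
fn unpack_5x5(mut code: u16) -> [i8; 5] {
    let mut out = [0i8; 5];
    for slot in out.iter_mut().rev() {
        *slot = (code % 5) as i8 - 2;
        code /= 5;
    }
    out
}
```

Bundling [2, 1, -1, 0, -2] with [2, -1, -1, 0, 2] gives [2, 0, -2, 0, 0]: agreeing entries saturate, opposing entries cancel.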
4 new lab-gated CodecCandidates:
Zipper-5^5(2B) — 5 samples, 5-level
Zipper-5^5×5(10B) — 25 samples, 5-level
Zipper-7^7(3B) — 7 samples, 7-level
Zipper-7^7×7(18B) — 49 samples, 7-level
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
…sweep
Three populations, 12 codecs, 1400s wall.
Best zipper: 7^7×7 at 18 B → ICC 0.144 on q_proj (Pareto point between
Base17 0.024 and Zipper-Full 0.20).
Existing sweep has Had-Q5×D-R at ICC 0.989 / 0-B-per-row (shared codebook,
TurboQuant-class). This is the argmax leader; nothing in zipper family
competes on pure ICC.
Quintenzirkel empirically loses to φ across all size tiers.
Per-row μ-law normalization destroys inter-row magnitude info.
Global-scale 5^5/7^7 recovers some (7^7×7 at 18 B > I8 μ-law at 64 B).
Pragmatic: use Had-Q5×D-R for production, zipper only when
bundling/progressive-decode/anti-moiré properties matter.
Unmeasured: MRI differential phase, Fibonacci bundling, audiophile
multi-band precision.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Comprehensive findings from the fractal → zipper research arc
(2026-04-19/20). Captures measured ICC, decision tree, invariants, and dead
ends so future sessions don't re-derive them.
Does the zipper fix the argmax blind spot? NO. Already fixed by Had-Q5×D-R
(ICC 0.989) and I8-Hadamard (ICC ~0.9). Zipper hits 0.20 and fixes
DIFFERENT blind spots: no-codebook calibration, progressive decode,
bundling with negative cancellation, Fibonacci-weighted 256-signal capacity.
5 invariants established by measurement:
  I1: Sign-flip invariance kills argmax ICC
  I2: Per-row normalization destroys inter-row magnitude info
  I3: Maximally-irrational strides beat harmonic for argmax (φ > Q5)
  I4: Aperiodic φ-stride beats linear dyadic on butterfly signals
  I5: Sign bits carry less info but avoid i8 pitfalls
Decision tree + measured hierarchy + 5 dead ends + 5 unmeasured probes +
recipe for adding new CodecCandidates.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c4c0e7f7f
    fn bytes_per_row(&self) -> usize { 5 }
    fn pairwise_scores(&self, rows: &[Vec<f32>]) -> Vec<f64> {
        use bgz_tensor::fractal_descriptor::PhaseDescriptor;
Quantize phase features before reporting 5-byte size
Fractal-Phase(5B) reports a 5-byte payload, but its scoring path uses PhaseDescriptor values as raw f32 features (5 floats) with no packing/quantization step. This means the measured quality for this candidate (and Phase+Base17(39B), which reuses the same phase descriptor) is computed with substantially more precision than the advertised byte budget, skewing the benchmark’s quality-vs-size comparison. Either quantize to an actual 5-byte representation before scoring or report the true payload size.
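One way to make the advertised 5-byte budget real is to quantize each flip density to a single byte before scoring. The sketch below assumes the densities lie in [0, 1] (they are normalized flip counts) and uses uniform 8-bit quantization; the function names are hypothetical and this is only one possible reading of the suggestion.

```rust
/// Quantize five densities in [0, 1] to one byte each (5 bytes total).
fn pack_phase(density: &[f32; 5]) -> [u8; 5] {
    let mut out = [0u8; 5];
    for (o, v) in out.iter_mut().zip(density.iter()) {
        *o = (v.clamp(0.0, 1.0) * 255.0).round() as u8;
    }
    out
}

/// Reconstruct approximate densities from the 5-byte payload.
fn unpack_phase(bytes: &[u8; 5]) -> [f32; 5] {
    let mut out = [0.0f32; 5];
    for (o, b) in out.iter_mut().zip(bytes.iter()) {
        *o = *b as f32 / 255.0;
    }
    out
}
```

Scoring would then run on unpack_phase(&pack_phase(..)) rather than on the raw f32 features, so the measured quality reflects the reported byte count.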
    /// Compute population-global scale: median of per-row max-abs,
    /// scaled so ~70% of coefficients land in the middle 3 levels.
    pub fn compute_global_scale(rows: &[Vec<f32>]) -> f32 {
        let mut all_abs: Vec<f32> = Vec::with_capacity(rows.len() * rows[0].len());
Guard global-scale computation against empty input
compute_global_scale indexes rows[0] unconditionally to size all_abs, so calling it with an empty slice panics immediately. Since this is a public helper, it should fail explicitly (e.g., assert with a clear message or return Option/Result) instead of triggering an index-out-of-bounds panic on empty populations.
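A guarded variant could return an Option instead of indexing rows[0]. The sketch below is an assumption about the intended statistic: it takes the median of |coef| over the whole population, as the commit message describes, and omits the additional scaling mentioned in the doc comment. The name global_scale_checked is hypothetical.

```rust
/// Median of |coef| across all rows, or None for an empty population.
/// Non-finite values are skipped so a stray NaN cannot poison the sort.
pub fn global_scale_checked(rows: &[Vec<f32>]) -> Option<f32> {
    let mut all_abs: Vec<f32> = rows
        .iter()
        .flatten()
        .map(|v| v.abs())
        .filter(|v| v.is_finite())
        .collect();
    if all_abs.is_empty() {
        return None;
    }
    all_abs.sort_by(|a, b| a.total_cmp(b));
    // Upper median for even-length input.
    Some(all_abs[all_abs.len() / 2])
}
```

Callers then decide explicitly what an empty population means, for example by skipping the candidate or failing with a clear message.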
…sult
D2: cam_pq_calibrate binary: reads safetensors, classifies tensors via
route_tensor (D1), trains a CamCodebook per argmax-regime tensor, encodes
all rows to 6-byte fingerprints, measures ICC_3_1 and relative L2 error,
writes codebooks + fingerprints + manifest.json.
D5: full-size validation on Qwen3-TTS-0.6B: FAILS. 234 argmax-regime
tensors measured. Mean ICC = 0.195, zero tensors meet the ≥0.99 gate.
Relative L2 error 0.70–0.90.
Root cause: PR #218 bench measured ICC 0.9998 on 128 rows trained and
measured on those same 128 rows, a trivially-correct fit (128 ≤ 256
centroids → every row gets its own centroid). At production tensor sizes
(1024–3072 rows), the 6×256 codebook is centroid-starved.
cam_pq_row_count_probe.rs demonstrates the collapse:
  n=128 → icc_train=1.000, icc_all=-0.304
  n=3072 → icc_train=-0.079
Also broadens route_tensor embedding match to catch codec_embedding,
adding 2 new test cases (10 total, 133/133 contract tests pass).
Infrastructure (CLI, serialization, measurement) is sound. The negative
result is in the codec's capacity vs tensor row counts, not the tooling.
Plan needs revision before D6/D7 effort.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
Summary
Full research arc from fractal-leaf proposal to measured zipper codec family, with empirical answers to "does the zipper fix the argmax blind spot?" (no) and "what does it fix?" (no-codebook / progressive-decode / bundling). All lab-gated behind --features lab; main builds unaffected.
What's shipped
Lab-gated code (cargo --features lab):
- bgz-tensor::fractal_descriptor: MFDFA magnitude-only (empirically dead)
- bgz-tensor::zipper: sign-bit zipper (ZipperDescriptor), I8-μ-law zipper (ZipperI8Descriptor), 5-level signed bipolar (Zipper5LevelDescriptor), 7-level signed bipolar (Zipper7LevelDescriptor), plus bundle() for VSA negative-cancellation superposition
- bgz-tensor/examples/fractal_probe.rs: HF-streaming CoV(w) probe
- thinking-engine/examples/codec_rnd_bench.rs: 10 new lab-gated codec candidates registered alongside the 67 production candidates
Comprehensive findings documentation:
- .claude/knowledge/codec-findings-2026-04-20.md: 242-line reference. Measured ICC hierarchy, 5 invariants, decision tree, dead ends to avoid, 5 unmeasured probe queue items, recipe for adding new candidates. Deliberately prevents future sessions from re-deriving these measurements.
- EPIPHANIES.md: 5 dated findings covering the full arc
- IDEAS.md: zipper architecture, fractal round-trip, round-trip codec
- TECH_DEBT.md: existing entries preserved
Measured results (q_proj L0, Qwen3-8B, N=128 rows, ICC_3_1)
Had-Q5×D-R (already shipping)
Key diagnoses
… Had-Q5×D-R and I8-Hadamard in the existing sweep. The zipper family does NOT close that gap and is not intended to: it fills different constraint cells (no-codebook, progressive, bundling).
Test plan
- cargo check --features lab: lab gate works both ways (on/off)
- … --features lab)
Unmeasured follow-ups (filed in findings doc)
Each ~50-100 LOC + one bench run.
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh