201 changes: 201 additions & 0 deletions crates/lance-graph-codec-research/KNOWLEDGE.md
@@ -0,0 +1,201 @@
# KNOWLEDGE.md — ZeckBF17 Research Agent Reference

## What This Crate Is

A standalone research crate testing whether 16,384-dimensional i8 accumulator
planes can be compressed to 48 bytes (341:1) by treating the dimensions as
17 base classes × 964 octaves (⌈16384/17⌉ = 964; the last octave holds only
13 dimensions), where only ~14 octaves carry independent information.

**The one number that matters:** Spearman rank correlation ρ between pairwise
distances computed on full i8[16384] planes vs i16[17] base patterns.
If ρ > 0.937, ZeckBF17 eliminates the dead zone in the Pareto frontier.
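
A minimal sketch of how ρ can be computed, assuming both distance lists cover the same pairs in the same order. `ranks` and `spearman` are illustrative names, not functions from this crate; ties get average ranks, and NaN inputs are not handled.

```rust
/// Average ranks (1-based); tied values share the mean of their rank positions.
fn ranks(values: &[f64]) -> Vec<f64> {
    let mut order: Vec<usize> = (0..values.len()).collect();
    // Panics on NaN; acceptable for a sketch operating on distances.
    order.sort_by(|&a, &b| values[a].partial_cmp(&values[b]).unwrap());
    let mut out = vec![0.0; values.len()];
    let mut i = 0;
    while i < order.len() {
        let mut j = i;
        while j + 1 < order.len() && values[order[j + 1]] == values[order[i]] {
            j += 1;
        }
        // Sorted positions i..=j are tied: average of 1-based ranks (i+1)..=(j+1).
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for k in i..=j {
            out[order[k]] = avg;
        }
        i = j + 1;
    }
    out
}

/// Spearman rho = Pearson correlation of the two rank vectors.
fn spearman(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    let (rx, ry) = (ranks(x), ranks(y));
    let n = rx.len() as f64;
    let mx = rx.iter().sum::<f64>() / n;
    let my = ry.iter().sum::<f64>() / n;
    let (mut cov, mut vx, mut vy) = (0.0, 0.0, 0.0);
    for (a, b) in rx.iter().zip(ry.iter()) {
        cov += (a - mx) * (b - my);
        vx += (a - mx) * (a - mx);
        vy += (b - my) * (b - my);
    }
    cov / (vx.sqrt() * vy.sqrt())
}
```

Here `x` would hold the exact pairwise distances and `y` the distances computed on the base patterns; only the ordering of the values matters, not their scale.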

## Architecture (the production system this feeds into)

### Three-Layer Search Cascade

```
L1  neighborhood/zeckf64.rs   ZeckF64 u64 per edge       ρ=0.937 (scent byte alone; 0.982 with quantiles)   ~1ms
    - Byte 0: scent (7 Boolean masks: S close? P? O? SP? SO? PO? SPO?)
    - Bytes 1-7: distance quantiles per mask (0=identical, 255=max different)
    - 19 of 128 patterns are legal (Boolean lattice constraint)
    - Distance: L1 on bytes (Manhattan). NOT Hamming.

L2  blasgraph/types.rs        BitVec 16Kbit integrated   ρ=0.834
    - ONE vector: majority-vote bundle of S ⊕ P ⊕ O
    - Distance: Hamming on the bundled vector
    - LOSES plane separation → scores LOWER than 1-byte scent
    - This is NOT a good baseline for ZeckBF17

L3  spo/                      exact S+P+O planes         ρ=1.000
    - THREE separate 16384-bit planes
    - Distance: ds + dp + do (sum of per-plane Hamming)
```
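
The L3 reference distance (`ds + dp + do`) can be sketched as below. The storage layout (256 × `u64` words per 16,384-bit plane) and the names `Plane`, `SpoPlanes`, `hamming`, and `exact_distance` are assumptions for illustration, not the types in `spo/`.

```rust
/// One 16,384-bit plane as 256 x u64 words (assumed layout).
type Plane = [u64; 256];

/// Hamming distance between two planes: popcount of the XOR, word by word.
fn hamming(a: &Plane, b: &Plane) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Hypothetical container for the three exact planes of one edge.
struct SpoPlanes {
    s: Plane,
    p: Plane,
    o: Plane,
}

/// L3 distance: the three planes are compared separately, then summed.
fn exact_distance(a: &SpoPlanes, b: &SpoPlanes) -> u32 {
    hamming(&a.s, &b.s) + hamming(&a.p, &b.p) + hamming(&a.o, &b.o)
}
```

Because each plane contributes its own term, the per-plane distances stay available before the final sum, which is the information the bundled L2 vector cannot recover.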

### CRITICAL INSIGHT: Why L2 < L1

The integrated BitVec (ρ=0.834) scores LOWER than the scent byte (ρ=0.937)
because bundling S+P+O into one vector DESTROYS the plane-separation
information that the 7 scent bits preserve. The scent encodes WHICH planes
are close. The integrated BitVec encodes whether the COMBINED signal is
close but can't say which plane caused the difference.

ZeckBF17 stores THREE separate base patterns (i16[17] each), preserving
plane separation like the scent. It should compare against ρ=0.937 (scent)
and ρ=0.982 (full ZeckF64), NOT against ρ=0.834 (integrated BitVec).

### Correct Pareto Frontier

```
Encoding            Bytes   Preserves S/P/O?        ρ vs exact S+P+O
─────────────────────────────────────────────────────────────────────
Scent byte              1   YES (7 masks)           0.937
ZeckBF17 bases        102   YES (3 × i16[17])       ?      ← MEASURE
ZeckBF17 edge         116   YES (+ envelope)        ?      ← MEASURE
Full ZeckF64            8   YES (7 + quantiles)     0.982
Integrated BitVec    2048   NO (bundled S⊕P⊕O)      0.834  ← WRONG CURVE
Exact S+P+O          6144   YES (3 planes)          1.000

The integrated BitVec is on a DIFFERENT Pareto curve (bundled metric).
ZeckBF17 and the scent byte are on the SAME curve (per-plane metric).
```

## ZeckBF17 Format

### Why 17

17 is prime. Golden-ratio step = round(17/φ) = 11. gcd(11,17) = 1 →
visits all 17 residues. This is the discrete golden-angle / X-Trans pattern.

**WARNING:** An earlier version claimed Fibonacci mod 17 visits all 17.
WRONG. Fibonacci mod 17 visits only 13 (missing {6,7,10,11}).
Fibonacci mod p has full coverage only for p ∈ {2, 3, 5, 7}.
The golden-ratio STEP (not Fibonacci SEQUENCE) is the correct traversal.
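
The coverage claims above can be checked mechanically. The sketch below uses hypothetical helper names (`golden_step`, `step_residues`, `fibonacci_residues`); it is a verification aid, not code from the crate.

```rust
use std::collections::BTreeSet;

/// round(n / phi): the discrete golden-ratio step for modulus n.
fn golden_step(n: usize) -> usize {
    const PHI: f64 = 1.618_033_988_749_895;
    (n as f64 / PHI).round() as usize
}

/// Residues visited by i * step mod n for i in 0..n.
fn step_residues(step: usize, n: usize) -> BTreeSet<usize> {
    (0..n).map(|i| (i * step) % n).collect()
}

/// Residues visited by the Fibonacci sequence mod n.
fn fibonacci_residues(n: usize) -> BTreeSet<usize> {
    let mut seen = BTreeSet::new();
    let (mut a, mut b) = (0usize, 1usize % n);
    // The pair (a, b) determines the rest of the sequence and there are at
    // most n * n distinct pairs, so n * n steps cover at least one full period.
    for _ in 0..n * n {
        seen.insert(a);
        let next = (a + b) % n;
        a = b;
        b = next;
    }
    seen
}
```

Calling `step_residues(golden_step(17), 17)` yields all 17 residues, while `fibonacci_residues(17)` yields 13 and omits 6, 7, 10, and 11.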

### Encoding

```rust
struct ZeckBF17Plane {       // 48 bytes
    base: [i16; 17],         // 34 bytes: mean per base dim, fixed-point ×256
    envelope: [u8; 14],      // 14 bytes: amplitude scale per independent octave
}

struct ZeckBF17Edge {        // 116 bytes
    subject: [i16; 17],      // 34 bytes
    predicate: [i16; 17],    // 34 bytes
    object: [i16; 17],       // 34 bytes
    envelope: [u8; 14],      // 14 bytes (shared)
}
```
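
As a sanity check on the byte budget, the sizes can be asserted with `std::mem::size_of`. This sketch adds `#[repr(C)]` to pin the field order, which is an assumption; the crate's own definitions may use different attributes.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)] // assumption: fixed field order so the layout is predictable
struct ZeckBF17Plane {
    base: [i16; 17],    // 34 bytes
    envelope: [u8; 14], // 14 bytes
}

#[allow(dead_code)]
#[repr(C)]
struct ZeckBF17Edge {
    subject: [i16; 17],   // 34 bytes
    predicate: [i16; 17], // 34 bytes
    object: [i16; 17],    // 34 bytes
    envelope: [u8; 14],   // 14 bytes, shared by the three planes
}

/// Raw plane size in bytes divided by the encoded plane size (integer part).
fn plane_compression_ratio() -> usize {
    16_384 / size_of::<ZeckBF17Plane>()
}
```

Both structs have 2-byte alignment and sizes that are already multiples of 2, so no padding is added: 48 and 116 bytes, giving the 341:1 ratio per plane.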

### Why i16 (not BF16)

BF16: 1 sign + 8 exponent + 7 mantissa. Wastes 8 exponent bits on dynamic
range never used (source is i8). Mean of 0.2 → stores 0.0 → LOSES SIGN.

i16 fixed-point (×256): mean 0.2 → stores 51. Mean -0.003 → stores -1.
256× finer quantization. Native SIMD. Same 34 bytes.
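
A minimal sketch of the ×256 fixed-point conversion. `mean_to_fixed` and `fixed_to_mean` are hypothetical names, and round-to-nearest is an assumption chosen because it reproduces the two examples above (truncation toward zero would store 0 for -0.003).

```rust
/// Scale a per-dimension mean to i16 fixed-point with 8 fractional bits (x256).
fn mean_to_fixed(mean: f64) -> i16 {
    let scaled = (mean * 256.0).round();
    // An i8 source keeps |mean| <= 128, so this clamp only guards bad input.
    scaled.clamp(i16::MIN as f64, i16::MAX as f64) as i16
}

/// Inverse mapping back to a real-valued mean.
fn fixed_to_mean(fixed: i16) -> f64 {
    fixed as f64 / 256.0
}
```

The full i8 range fits: -128 maps to -32768 and 127 maps to 32512, so no i8 mean can overflow the i16.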

### Distance: L1 on i16 (not Hamming)

```rust
fn base_l1(a: &BasePattern, b: &BasePattern) -> u32 {
    a.dims.iter().zip(b.dims.iter())
        .map(|(&x, &y)| (x as i32 - y as i32).unsigned_abs())
        .sum()
}
```

Matches production `zeckf64_distance()` which is L1 on quantile bytes.
`zeckf64_from_base()` produces a full u64 with the same byte layout
as `neighborhood/zeckf64.rs::zeckf64()`.
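
To illustrate why L1 on quantile bytes differs from Hamming, here is a sketch of an L1 distance over a packed u64. It is not the production `zeckf64_distance()`: the name `quantile_l1` is hypothetical, and it assumes byte 0 (the scent) is the least-significant byte and is excluded from the sum.

```rust
/// L1 (Manhattan) distance over the seven quantile bytes of two packed values.
/// Assumption: byte 0 is the least-significant byte and holds the scent.
fn quantile_l1(a: u64, b: u64) -> u32 {
    let (a, b) = (a.to_le_bytes(), b.to_le_bytes());
    a[1..]
        .iter()
        .zip(b[1..].iter())
        .map(|(&x, &y)| (x as i32 - y as i32).unsigned_abs())
        .sum()
}

/// Bitwise Hamming distance on the same values, for contrast.
fn hamming_u64(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}
```

Quantiles 127 and 128 are adjacent, so their L1 distance is 1, yet their bit patterns (0x7F and 0x80) differ in all 8 bits. Hamming would rank that pair as far apart, which is why the quantile bytes need L1.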

### Golden-Step Traversal

```
GOLDEN_POS_17 = [0, 11, 5, 16, 10, 4, 15, 9, 3, 14, 8, 2, 13, 7, 1, 12, 6]

Encode: for each octave (0..964), for each base_idx (0..17):
    dim = octave * 17 + GOLDEN_POS_17[base_idx]
    if dim < 16384: base[base_idx] += accumulator[dim]
    (the last octave is partial: 964 × 17 = 16388 > 16384)
    (then average and scale to i16)

Decode: reverse — distribute base pattern across octaves, scaled by envelope.
```
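
The encode loop can be sketched in Rust as follows. This is an illustration, not the code in `src/zeckbf17.rs`: `encode_base` is a hypothetical name, the envelope is omitted, and both the skipping of out-of-range dims in the final octave and the round-to-nearest scaling are assumptions.

```rust
const DIMS: usize = 16_384;
const BASE: usize = 17;
const GOLDEN_POS_17: [usize; BASE] =
    [0, 11, 5, 16, 10, 4, 15, 9, 3, 14, 8, 2, 13, 7, 1, 12, 6];

/// Fold an i8[16384] accumulator plane into 17 per-class means (x256 fixed-point).
fn encode_base(acc: &[i8; DIMS]) -> [i16; BASE] {
    let octaves = (DIMS + BASE - 1) / BASE; // 964; the last one holds 13 dims
    let mut sum = [0i32; BASE];
    let mut count = [0i32; BASE];
    for octave in 0..octaves {
        for base_idx in 0..BASE {
            let dim = octave * BASE + GOLDEN_POS_17[base_idx];
            if dim >= DIMS {
                continue; // out of range in the partial final octave
            }
            sum[base_idx] += acc[dim] as i32;
            count[base_idx] += 1;
        }
    }
    let mut base = [0i16; BASE];
    for i in 0..BASE {
        // Mean of i8 values lies in [-128, 127], so mean * 256 fits in i16.
        let mean_x256 = (sum[i] as f64 * 256.0 / count[i] as f64).round();
        base[i] = mean_x256 as i16;
    }
    base
}
```

Each base slot `base_idx` collects every dimension congruent to `GOLDEN_POS_17[base_idx]` mod 17, so a signal confined to residue 11 shows up only in slot 1.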

## Codec Session Epiphanies (for context)

```
[3]  Alpha(|value|) = GAIN, sign bits = SHAPE. The Pareto frontier's 3 points
     are gain-only (8 bits), gain+coarse shape (57 bits), gain+exact shape (49K bits).
     L1/L2/L3 are ONE codec at three bitrates.

[6]  Semantic folding: crystallized planes predict uncrystallized ones.
     ZeckBF17 eliminates the dead zone by CHANGING THE BASIS.

[9]  σ-3 crystallization threshold IS a Lagrange multiplier for R-D optimization.
     Making σ adaptive per-scope: σ = c·2^((density-12)/3).

[12] Octave 0/1/2 hierarchy IS residual vector quantization.
     Combined with 3-layer cascade: 3×3 = 9-level refinement grid.
```

## Known Bugs (as of PR #21)

### FIXED in this branch:
- ✓ Fibonacci → golden step (all 17 residues)
- ✓ BF16 → i16 fixed-point
- ✓ BF16 Hamming → L1 on i16
- ✓ Unused deps removed (hound, half)
- ✓ Accumulator cyclic shift removed (shift mismatch with crystallize/unbind)

### Remaining:
- iMDCT reconstruction incomplete in transform.rs
- FftPlanner::new() per-call in transform.rs (perf)
- Duplicate pearson() in accumulator.rs and universal_perception.rs
- metrics.rs white noise RNG produces correlated data
- decode_hybrid_scent_only unimplemented in hybrid.rs
- Crate not in workspace Cargo.toml (by design — standalone)

## File Map

```
src/zeckbf17.rs              643 lines   THE codec. Run fidelity_experiment().
src/accumulator.rs           370 lines   Streaming bundle. Shift bug FIXED.
src/diamond.rs               289 lines   Diamond Markov extraction.
src/universal_perception.rs  561 lines   Noise floor vs masking hypothesis.
src/transform.rs             267 lines   MDCT, Bark bands, psychoacoustic mask.
src/bands.rs                 161 lines   BF16 pack/unpack, weighted Hamming.
src/perframe.rs              147 lines   Strategy A: per-frame MDCT.
src/hybrid.rs                235 lines   Strategy C: combined.
src/metrics.rs               359 lines   4-strategy comparison.
src/lib.rs                   114 lines   Types, module declarations.
```

## How To Run

```bash
cd crates/lance-graph-codec-research
cargo test -- --nocapture

# THE critical test:
cargo test test_fidelity_vs_encounters -- --nocapture

# Page curve (after fidelity validates):
cargo test test_diamond -- --nocapture
cargo test test_accumulation_convergence -- --nocapture
cargo test test_universal_perception -- --nocapture
```

## What Success Looks Like

```
encounters | fidelity | ρ(rank) | scent%
5 | > 0.6 | > 0.5 | > 50%
10 | > 0.7 | > 0.7 | > 60%
20 | > 0.8 | > 0.85 | > 70%
50 | > 0.85 | > 0.93 | > 80% ← THIS is the target
100 | > 0.9 | > 0.95 | > 85%

If ρ at 50 encounters > 0.937: ZeckBF17 beats scent-only.
The dead zone disappears. Proceed to Page curve measurement.
```
24 changes: 7 additions & 17 deletions crates/lance-graph-codec-research/src/accumulator.rs
@@ -14,12 +14,6 @@ use crate::{AudioFrame, AudioQualia, CrystallizedComponent, SpectralAccumulator,
 use crate::transform::{mdct, coeffs_to_band_energies, psychoacoustic_mask, sine_window};
 use crate::bands::{pack_bands, f32_to_bf16, bf16_to_f32};
 
-use std::f64::consts::PI;
-
-// Golden angle for cyclic shift (same as fibonacci-vsa GOLDEN_ANGLE).
-const PHI: f64 = 1.618_033_988_749_895;
-const GOLDEN_ANGLE: f64 = 2.0 * PI / (PHI * PHI);
-
 /// Accumulator cell count: 24 bands × 16 bits per BF16 = 384.
 const CELL_COUNT: usize = BARK_BANDS * 16;
 
@@ -35,29 +29,25 @@ impl SpectralAccumulator {
     /// Accumulate one frame into the accumulator.
     ///
     /// The frame's BF16 bands are unpacked to 384 individual bits.
-    /// Each bit is cyclically shifted by frame_count × golden_angle positions
-    /// (mod CELL_COUNT), then added to the accumulator via saturating i16 add.
+    /// Each bit is added to the accumulator via saturating i16 add.
     ///
-    /// The cyclic shift ensures that consecutive frames don't align,
-    /// so only REPEATED patterns accumulate above the noise floor.
+    /// NOTE: Earlier versions used a cyclic shift per frame for decorrelation.
+    /// This caused a mismatch: crystallize() and unbind() read un-shifted cells.
+    /// The shift is removed — decorrelation is handled at the encoding level
+    /// by ZeckBF17's golden-step traversal, not at the accumulation level.
     pub fn accumulate_frame(&mut self, bands: &[u16; BARK_BANDS]) {
-        // Compute cyclic shift for this frame
-        let shift = ((self.frame_count as f64 * GOLDEN_ANGLE * CELL_COUNT as f64 / (2.0 * PI))
-            as usize) % CELL_COUNT;
-
         // Unpack 24 BF16 values into 384 bits
         for band in 0..BARK_BANDS {
             for bit in 0..16 {
                 let cell_idx = band * 16 + bit;
-                let shifted_idx = (cell_idx + shift) % CELL_COUNT;
 
                 // Extract bit from BF16 value
                 let bit_val = ((bands[band] >> bit) & 1) as i16;
                 // Map 0→-1, 1→+1 for bipolar accumulation
                 let bipolar = bit_val * 2 - 1;
 
-                // Saturating add
-                self.cells[shifted_idx] = self.cells[shifted_idx].saturating_add(bipolar);
+                // Saturating add — NO shift, cells stay at natural position
+                self.cells[cell_idx] = self.cells[cell_idx].saturating_add(bipolar);
             }
         }
 
13 changes: 2 additions & 11 deletions crates/lance-graph-codec-research/src/universal_perception.rs
@@ -16,15 +16,10 @@
 //! If any correlation is r < 0.3, the mapping is substrate-specific
 //! and doesn't transfer. Still useful, but not universal.
 
-use std::f64::consts::PI;
-
 // ═══════════════════════════════════════════════════════════════════════
 // SHARED: The Universal Accumulator
 // ═══════════════════════════════════════════════════════════════════════
 
-const PHI: f64 = 1.618_033_988_749_895;
-const GOLDEN_ANGLE: f64 = 2.0 * PI / (PHI * PHI);
-
 /// A modality-agnostic accumulator.
 /// Works on any signal decomposed into N bands of B bits each.
 /// The SAME accumulator structure for audio, text, and video.
@@ -50,20 +45,16 @@ impl UniversalAccumulator {
     }
 
     /// Accumulate one frame: N_bands values, each B bits.
-    /// Cyclic shift by golden angle per frame (same for ALL modalities).
+    /// Direct accumulation — no cyclic shift (decorrelation at encoding level).
     pub fn accumulate(&mut self, band_values: &[u16]) {
         assert_eq!(band_values.len(), self.n_bands);
-        let total_cells = self.cells.len();
-        let shift = ((self.frame_count as f64 * GOLDEN_ANGLE * total_cells as f64
-            / (2.0 * PI)) as usize) % total_cells;
 
         for band in 0..self.n_bands {
             for bit in 0..self.bits_per_band {
                 let cell_idx = band * self.bits_per_band + bit;
-                let shifted = (cell_idx + shift) % total_cells;
                 let bit_val = ((band_values[band] >> bit) & 1) as i16;
                 let bipolar = bit_val * 2 - 1;
-                self.cells[shifted] = self.cells[shifted].saturating_add(bipolar);
+                self.cells[cell_idx] = self.cells[cell_idx].saturating_add(bipolar);
             }
         }
         self.frame_count += 1;