249 changes: 249 additions & 0 deletions .claude/knowledge/phi-spiral-reconstruction.md
@@ -0,0 +1,249 @@
# KNOWLEDGE UPDATE: φ-Spiral Reconstruction — The Core Superpower

## READ BY: family-codec-smith, palette-engineer, savant-research,
## truth-architect, cascade-architect, integration-lead

---

## 1. INTERPOLATION = SPIRAL EVALUATION, NOT AVERAGING

```
WRONG (halftone, deleted):
bin[3] missing → bin[3] = (bin[2] + bin[4]) / 2
Linear average. Destroys spiral structure.
Like replacing a curve with a straight line on a map.

RIGHT (φ-spiral reconstruction):
bin[3] missing → position 3 on the φ-spiral is KNOWN
θ = 3 × golden_angle = 3 × 2π/φ² ≈ 3 × 137.507°
r = f(3) from the spiral equation r(θ) = a × e^(bθ)
bin[3] = spiral(θ, r) → EXACT reconstruction
Because the spiral IS the constraint. Not a guess.
```
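
The angle arithmetic above can be checked with a short sketch. The spiral parameters `a` and `b` are hypothetical placeholders (the source does not state them), and `spiral_point` is an illustrative helper, not a function from bgz-tensor.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
GOLDEN_ANGLE = 2 * math.pi / PHI**2  # radians, roughly 137.5 degrees


def spiral_point(n, a=1.0, b=0.1):
    """Return (theta, r) for position n on r(theta) = a * exp(b * theta).

    a and b are assumed values for illustration only.
    """
    theta = n * GOLDEN_ANGLE
    return theta, a * math.exp(b * theta)


theta3, r3 = spiral_point(3)
print(f"golden angle = {math.degrees(GOLDEN_ANGLE):.3f} deg")
print(f"position 3: theta = {math.degrees(theta3):.3f} deg, r = {r3:.4f}")
```

With real data, `a` and `b` would come from a fit to the sampled bins rather than from defaults.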

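The golden ratio is the HARDEST irrational to approximate by rationals
(the extremal case in Hurwitz's theorem). This means:
- Successive φ-spiral angles never repeat, so positions resist aliasing into the same quantization bucket
- The spiral fills space close to MAXIMALLY uniformly (Weyl equidistribution holds for any irrational rotation; φ gives the most even gaps)
- A smooth spiral is pinned down by few points (r(θ) = a × e^(bθ) has only two parameters, a and b)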

## 2. STRIDE AND OFFSET SELECTION

### Octave stride

```
BF16 weight vector: 5120 dimensions
Base17: 17 bins
Octaves: ceil(5120 / 17) = 302 octaves

Stride=1: sample ALL 302 octaves → 302 × 17 = 5134 BF16→f64 conversions
Stride=4: every 4th octave → 76 × 17 = 1292 conversions (4× faster)
Stride=16: every 16th octave → 19 × 17 = 323 conversions (16× faster)
Stride=20: every 20th octave → 16 × 17 = 272 conversions (19× faster)

The stride selects WHICH octaves to sample.
NOT which bins. ALL 17 bins are always sampled.
```
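
The conversion counts in the table follow from two ceiling divisions. A minimal sketch, assuming the last, partly filled octave is counted as a full 17-bin octave (5120 = 301 × 17 + 3, which is why stride=1 gives 5134 rather than 5120); `conversions` is an illustrative name.

```python
import math


def conversions(n_dims=5120, n_bins=17, stride=1):
    """Return (n_octaves, sampled_octaves, bf16_to_f64_conversions)."""
    n_octaves = math.ceil(n_dims / n_bins)
    sampled = math.ceil(n_octaves / stride)
    # Every sampled octave contributes all n_bins bins.
    return n_octaves, sampled, sampled * n_bins


for stride in (1, 4, 16, 20):
    n_octaves, sampled, conv = conversions(stride=stride)
    print(f"stride={stride:2d}: {sampled:3d} of {n_octaves} octaves, {conv} conversions")
```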

### Why stride works (the spiral argument)

```
Each octave is one 17-position spiral turn.
Consecutive octaves are CORRELATED (smooth weight variations).
Stride=16 means: sample every 16th turn of the spiral.

IF the spiral is smooth (weights don't jump wildly between octaves):
stride=16 captures the spiral shape with 19 points
the 283 skipped octaves lie ON the same spiral
reconstruction from 19 points = evaluate spiral at 302 positions

IF the spiral has high-frequency variation:
stride=16 misses sharp features (aliasing)
stride=1 needed for these tensors
→ detect via: compare stride=1 vs stride=16 Pearson ρ
→ if ρ > 0.99: stride=16 is safe (spiral is smooth)
→ if ρ < 0.95: use stride=1 (high-frequency content)
```
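
One way to run the stride=1 vs stride=16 comparison is sketched below. The source does not say which quantities are correlated; this sketch assumes the per-bin means over the sampled octaves, and `bin_means` and `stride_verdict` are hypothetical helpers.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def bin_means(octaves, stride=1, offset=0):
    """Mean of each bin over every stride-th octave (assumed comparison target)."""
    picked = octaves[offset::stride]
    n_bins = len(octaves[0])
    return [sum(o[b] for o in picked) / len(picked) for b in range(n_bins)]


def stride_verdict(octaves, stride=16):
    rho = pearson(bin_means(octaves, 1), bin_means(octaves, stride))
    if rho > 0.99:
        return rho, "stride is safe"
    if rho < 0.95:
        return rho, "use stride=1"
    return rho, "borderline"  # the source gives no rule between 0.95 and 0.99
```

The "borderline" branch is an addition of this sketch: the thresholds above leave the 0.95 to 0.99 band undecided.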

### Offset selection

```
Default: offset=0 (start at first octave)
Better: offset = golden ratio fractional part

offset = floor(n_octaves × (φ - 1)) = floor(302 × 0.618...) = 186

Starting at octave 186 instead of 0:
stride=16 samples octaves: 186, 202, 218, 234, 250, 266, 282, 298, 12, 28, ...
(wrapping around at 302)

This MAXIMIZES the coverage because φ-offset + stride
guarantees the sample positions don't cluster.

vs offset=0: samples 0, 16, 32, ... (regular, but misses middle if stride too big)
vs offset=186: φ-scattered across the full range

The offset IS the golden-angle sampling that highheelbgz already does.
It's the same φ-distribution applied to octave selection.
```
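
The offset rule and the wrap-around can be written out directly. A minimal sketch, assuming the number of samples is ceil(n_octaves / stride) as in the stride table; `sampled_octaves` is an illustrative name.

```python
import math

PHI = (1 + math.sqrt(5)) / 2


def sampled_octaves(n_octaves=302, stride=16):
    """Return (offset, sampled octave indices), wrapping modulo n_octaves."""
    offset = math.floor(n_octaves * (PHI - 1))
    count = math.ceil(n_octaves / stride)
    return offset, [(offset + i * stride) % n_octaves for i in range(count)]


offset, octaves = sampled_octaves()
print(offset, octaves)
```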

## 3. CORRECTION: BENDING vs COMPRESSION

Two sources of error, two different corrections:

### Bending (γ correction) — distribution SHAPE is wrong

```
Problem: raw cosine distribution is NOT uniform.
Gate weights: 68.9% near zero, thin tails.
Attention: broad, nearly symmetric.
Down: narrow, one-sided.

Uniform quantization wastes bits on empty regions.
Gate values near zero get the SAME resolution as gate values at 0.3
But the SiLU decision boundary IS at zero → needs MORE resolution there.

Fix: gamma_phi_encode(value, role_gamma, phi_scale)
Stage 1: γ-normalize by role (compress highlights, expand shadows)
gamma_encode(v, γ) = sign(v) × ln(1 + |v|/γ) × γ
Gate γ=1.50 → MOST expansion near zero
Q γ=0.37 → less expansion (already broad)

Stage 2: φ-distribute (golden ratio spacing)
phi_encode(v, φ_scale) = sign(v) × log_φ(1 + |v|/φ_scale) × φ_scale
Ensures quantization boundaries sit at irrational positions
No BF16 bucket aliasing

Stored as metadata: GammaProfile { role_gamma: [f32; 6], phi_scale: f32 }
28 bytes per model. Exact decode: gamma_decode(phi_decode(stored, φ_scale), γ)
```

Review comment (P2): Apply γ+φ decode in inverse transform order

gamma_phi_encode applies gamma first and phi second, so rehydration must decode phi first and then gamma. Documenting phi_decode(gamma_decode(...)) would produce incorrect round-trips for anyone implementing from this guide, and it conflicts with the actual inverse implementation in crates/bgz-tensor/src/gamma_phi.rs::gamma_phi_decode.
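
The two encode formulas and their inverses can be sketched in a few lines to confirm the round trip. This is derived only from the formulas in this section, not from the code in crates/bgz-tensor; the inverse functions are worked out algebraically here and are an assumption about how decode behaves.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
LN_PHI = math.log(PHI)


def gamma_encode(v, gamma):
    # sign(v) * ln(1 + |v|/gamma) * gamma
    return math.copysign(math.log1p(abs(v) / gamma) * gamma, v)


def gamma_decode(y, gamma):
    # Algebraic inverse: sign(y) * gamma * (exp(|y|/gamma) - 1)
    return math.copysign(math.expm1(abs(y) / gamma) * gamma, y)


def phi_encode(v, phi_scale):
    # sign(v) * log_phi(1 + |v|/phi_scale) * phi_scale
    return math.copysign(math.log1p(abs(v) / phi_scale) / LN_PHI * phi_scale, v)


def phi_decode(y, phi_scale):
    # Algebraic inverse: sign(y) * phi_scale * (PHI ** (|y|/phi_scale) - 1)
    return math.copysign(math.expm1(abs(y) / phi_scale * LN_PHI) * phi_scale, y)


def gamma_phi_encode(v, role_gamma, phi_scale):
    return phi_encode(gamma_encode(v, role_gamma), phi_scale)


def gamma_phi_decode(stored, role_gamma, phi_scale):
    # Undo the stages in reverse order: phi first, then gamma.
    return gamma_decode(phi_decode(stored, phi_scale), role_gamma)


v = 0.3
stored = gamma_phi_encode(v, role_gamma=1.5, phi_scale=0.08)
print(stored, gamma_phi_decode(stored, role_gamma=1.5, phi_scale=0.08))
```

Applying the decodes in the other order does not return the input, which is why the order has to be recorded with the reconstruction path.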

### Compression (stride/offset) — samples are SPARSE

```
Problem: stride=16 samples 19 of 302 octaves.
The 283 skipped octaves contribute to the true centroid average
but are not measured.

Fix: spiral-aware interpolation.
The 19 sampled points define a φ-spiral in 17D.
The 283 missing points lie ON this spiral (smooth assumption).
Reconstruction: evaluate spiral at missing positions.

NOT: linear interpolation (wrong, ignores curvature)
NOT: halftone dropping (wrong, misses entire dimensions)
IS: φ-spiral fit from sampled points → evaluate at all positions

Implementation:
For each of the 17 bins:
sampled_values[19]: the values we measured at stride=16
sampled_positions[19]: which octaves we sampled (0, 16, 32, ...)

Fit: r(θ) = a × e^(b×θ) through the 19 points
Evaluate: r(θ) at all 302 positions
Average: mean of all 302 reconstructed values = the true bin value

This is MORE accurate than averaging only the 19 sampled values
because the spiral fit exploits the smoothness constraint.
```
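
The fit, evaluate, average steps for a single bin might look like the sketch below. It assumes θ is simply the octave index and that the bin values are strictly positive, so that r(θ) = a × e^(b×θ) can be fitted by least squares on ln(r); signed weights would need a different model. Both function names are hypothetical.

```python
import math


def fit_log_spiral(thetas, values):
    """Least-squares fit of ln(r) = ln(a) + b * theta. Requires values > 0."""
    n = len(thetas)
    logs = [math.log(v) for v in values]
    mt, ml = sum(thetas) / n, sum(logs) / n
    sxy = sum((t - mt) * (l - ml) for t, l in zip(thetas, logs))
    sxx = sum((t - mt) ** 2 for t in thetas)
    b = sxy / sxx
    a = math.exp(ml - b * mt)
    return a, b


def reconstruct_bin_mean(sampled_positions, sampled_values, n_octaves=302):
    """Fit the spiral to the samples, evaluate at every octave, then average."""
    a, b = fit_log_spiral(sampled_positions, sampled_values)
    full = [a * math.exp(b * k) for k in range(n_octaves)]
    return sum(full) / n_octaves


# Synthetic check: a bin that really does follow an exponential across octaves.
truth = [0.5 * math.exp(0.002 * k) for k in range(302)]
positions = list(range(0, 302, 16))          # 19 sampled octaves
values = [truth[k] for k in positions]
print(sum(truth) / 302, reconstruct_bin_mean(positions, values), sum(values) / 19)
```

On this synthetic bin the reconstructed mean tracks the true mean more closely than the plain mean of the 19 samples; that only holds to the extent real weights satisfy the smoothness assumption.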

### When to use which correction

```
Per centroid pair in the distance table:

1. ALWAYS: γ correction (role-specific distribution shaping)
→ stored as GammaProfile metadata (28 bytes)
→ applied during encoding AND decoding

2. IF stride > 1: spiral reconstruction
→ stored as stride + offset metadata (2 bytes)
→ applied during centroid averaging (StackedN build)
→ NOT applied to the distance table values (those come from cosine)

3. IF Spearman ρ < 0.998 after γ: ICC profile correction
→ stored as transfer curve (per pair or per region)
→ applied during distance table lookup
→ absorbs whatever γ + spiral didn't fix

4. IF ICC insufficient: CoSENT candle training
→ directly optimizes rank order
→ last resort before LoRA (the nuclear option)
```
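
The escalation order above can be expressed as a small selector. A sketch only: the step labels and the `icc_sufficient` flag are illustrative names, and how "ICC insufficient" is measured is not specified in the source.

```python
def select_corrections(stride, spearman_rho_after_gamma, icc_sufficient=True):
    """Return the correction steps to apply, in the order listed above."""
    steps = ["gamma"]  # always applied
    if stride > 1:
        steps.append("spiral_reconstruction")
    if spearman_rho_after_gamma < 0.998:
        steps.append("icc_profile")
        if not icc_sufficient:
            steps.append("cosent_training")  # last resort before LoRA
    return steps


print(select_corrections(stride=16, spearman_rho_after_gamma=0.9991))
print(select_corrections(stride=1, spearman_rho_after_gamma=0.99, icc_sufficient=False))
```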

## 4. THE METADATA CHAIN

Every baked table carries:

```json
{
"source_gguf": "jinaai/jina-reranker-v3-GGUF",
"source_dtype": "BF16",
"n_centroids": 256,
"centroid_spd": 32,
"octave_stride": 16,
"octave_offset": 186,
"role": "ffn_up",
"gate_modulated": true,
"gamma_profile": {
"role_gamma": 0.12,
"phi_scale": 0.08,
"n_calibration": 5120
},
"encoding": "BF16",
"cosine_range": [-0.886, 0.826],
"sign_distribution": { "positive": 32512, "negative": 32768, "zero": 256 },
"spearman_rho_vs_f32": null,
"icc_profile": null,
"variance_agreement_score": null,
"cronbach_alpha_context": null
}
```

Each field enables exact reconstruction:
- stride + offset → which octaves were sampled
- gamma_profile → undo γ+φ for exact decode
- cosine_range → per-role scale factor
- icc_profile → correction curve (when available)
- variance_agreement → quorum confidence per pair
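
A baked table's metadata can be sanity-checked before use. The sketch below is hypothetical (`check_metadata` is not part of the codebase) and rests on one labeled assumption: the sign counts cover the full n_centroids × n_centroids table, which holds for the example above (32512 + 32768 + 256 = 65536 = 256²).

```python
import json

SAMPLE = """{
  "n_centroids": 256,
  "octave_stride": 16,
  "octave_offset": 186,
  "cosine_range": [-0.886, 0.826],
  "sign_distribution": {"positive": 32512, "negative": 32768, "zero": 256}
}"""


def check_metadata(raw):
    """Return a list of problems found in a metadata JSON string."""
    meta = json.loads(raw)
    problems = []
    if meta["octave_stride"] < 1:
        problems.append("octave_stride must be >= 1")
    lo, hi = meta["cosine_range"]
    if not -1.0 <= lo <= hi <= 1.0:
        problems.append("cosine_range must lie within [-1, 1]")
    # Assumption: one sign count per centroid pair in the distance table.
    total = sum(meta["sign_distribution"].values())
    if total != meta["n_centroids"] ** 2:
        problems.append("sign_distribution does not cover n_centroids^2 pairs")
    return problems


print(check_metadata(SAMPLE))
```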

## 5. QUALITY CHECKS (updated)

Before encoding any distance table:

```
[ ] Uses StackedN/ClamCodebook? (not raw f32 bypass)
[ ] BF16 precision? (not 8-bit bottleneck)
[ ] All 17 bins sampled? (not halftone 9/17)
[ ] Stride documented? (octave_stride in metadata)
[ ] Offset is φ-fractional? (not 0, not arbitrary)
[ ] γ correction applied? (per-role from GammaProfile)
[ ] φ distribution applied? (irrational bucket boundaries)
[ ] Gate modulation on Up only? (silu(gate)×up before cosine)
[ ] Metadata JSON saved alongside table?
[ ] Reconstruction path documented? (decode = γ_decode(φ_decode(stored)))
```

## 6. THE ZECKENDORF CONNECTION

```
Zeckendorf's theorem: every positive integer has a unique
representation as sum of non-consecutive Fibonacci numbers.

ZeckF64 in bgz-tensor: encodes positions as Fibonacci sums.
Position 42 = F(9) + F(6) = 34 + 8 = 42
(indexing F(1) = F(2) = 1; 34 and 8 are non-consecutive Fibonacci numbers)

Fibonacci numbers ARE the φ-spiral positions:
F(n) / F(n-1) → φ as n → ∞
Each Fibonacci number is the next position on the spiral

Zeckendorf decomposition = expressing a point as
a sum of spiral positions = its coordinates ON the spiral.

Spiral reconstruction from Zeckendorf:
Given: bin values at Zeckendorf positions
The Fibonacci structure tells you WHERE on the spiral each value sits
Reconstruction = evaluate the spiral BETWEEN Fibonacci positions
This is optimal because Fibonacci spacing = φ-optimal coverage
```
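
Zeckendorf's theorem comes with a simple greedy construction: repeatedly subtract the largest Fibonacci number that still fits. The sketch below shows that standard algorithm; it is not the ZeckF64 encoder from bgz-tensor, whose bit layout is not described here.

```python
def zeckendorf(n):
    """Return the Zeckendorf representation of n as a descending list."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    for f in reversed(fibs):
        if f <= n:
            parts.append(f)
            n -= f
    return parts


for n in (1, 42, 100):
    print(n, zeckendorf(n))
```

The greedy choice never picks two consecutive Fibonacci numbers, because the remainder after subtracting F(k) is always smaller than F(k-1).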