forked from lance-format/lance-graph
feat: Phase 3 BF16 wiring + φ-spiral reconstruction theory #128
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,249 @@ | ||
| # KNOWLEDGE UPDATE: φ-Spiral Reconstruction — The Core Superpower | ||
|
|
||
| ## READ BY: family-codec-smith, palette-engineer, savant-research, | ||
| ## truth-architect, cascade-architect, integration-lead | ||
|
|
||
| --- | ||
|
|
||
| ## 1. INTERPOLATION = SPIRAL EVALUATION, NOT AVERAGING | ||
|
|
||
| ``` | ||
| WRONG (halftone, deleted): | ||
| bin[3] missing → bin[3] = (bin[2] + bin[4]) / 2 | ||
| Linear average. Destroys spiral structure. | ||
| Like replacing a curve with a straight line on a map. | ||
|
|
||
| RIGHT (φ-spiral reconstruction): | ||
| bin[3] missing → position 3 on the φ-spiral is KNOWN | ||
| θ = 3 × golden_angle = 3 × 2π/φ² ≈ 3 × 137.507° | ||
| r = f(3) from the spiral equation r(θ) = a × e^(bθ) | ||
| bin[3] = spiral(θ, r) → EXACT reconstruction | ||
| Because the spiral IS the constraint. Not a guess. | ||
| ``` | ||
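The angle and radius formulas in the block above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and `a` and `b` are free parameters of the logarithmic spiral that the document does not specify.

```rust
use std::f64::consts::PI;

/// Golden ratio φ = (1 + √5) / 2.
pub fn phi() -> f64 {
    (1.0 + 5.0_f64.sqrt()) / 2.0
}

/// Golden angle in radians: 2π / φ² (about 137.507°).
pub fn golden_angle() -> f64 {
    2.0 * PI / (phi() * phi())
}

/// Angle of bin position `k` on the spiral: θ = k × golden_angle.
pub fn spiral_theta(k: usize) -> f64 {
    k as f64 * golden_angle()
}

/// Logarithmic spiral radius r(θ) = a × e^(bθ).
/// `a` and `b` are hypothetical parameters; the source gives no values for them.
pub fn spiral_radius(theta: f64, a: f64, b: f64) -> f64 {
    a * (b * theta).exp()
}
```

For the missing `bin[3]`, `spiral_theta(3)` gives the angle and `spiral_radius` evaluates the curve there once `a` and `b` are known.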
|
|
||
| The golden ratio is the WORST case for rational approximation | ||
| (Hurwitz's theorem). This means: | ||
| - No two φ-spiral positions alias to the same quantization bucket | ||
| - The angles k × golden_angle (mod 2π) are equidistributed (Weyl); φ's continued fraction [1; 1, 1, ...] keeps the gaps as even as possible | ||
| - Fewer points suffice to identify the spiral than any other curve | ||
|
|
||
| ## 2. STRIDE AND OFFSET SELECTION | ||
|
|
||
| ### Octave stride | ||
|
|
||
| ``` | ||
| BF16 weight vector: 5120 dimensions | ||
| Base17: 17 bins | ||
| Octaves: ceil(5120 / 17) = 302 octaves | ||
|
|
||
| Stride=1: sample ALL 302 octaves → 302 × 17 = 5134 BF16→f64 conversions | ||
| Stride=4: every 4th octave → 76 × 17 = 1292 conversions (4× faster) | ||
| Stride=16: every 16th octave → 19 × 17 = 323 conversions (16× faster) | ||
| Stride=20: every 20th octave → 16 × 17 = 272 conversions (19× faster) | ||
|
|
||
| The stride selects WHICH octaves to sample. | ||
| NOT which bins. ALL 17 bins are always sampled. | ||
| ``` | ||
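The conversion counts in the table above follow from two ceiling divisions. A minimal sketch, with hypothetical helper names, that reproduces the arithmetic:

```rust
/// Number of octaves needed to cover `dims` dimensions with `bins` bins each
/// (ceiling division).
pub fn n_octaves(dims: usize, bins: usize) -> usize {
    (dims + bins - 1) / bins
}

/// Number of octaves actually visited when taking every `stride`-th octave
/// (ceiling division).
pub fn sampled_octaves(n_octaves: usize, stride: usize) -> usize {
    (n_octaves + stride - 1) / stride
}

/// BF16→f64 conversions: every sampled octave reads all `bins` bins.
pub fn conversions(dims: usize, bins: usize, stride: usize) -> usize {
    sampled_octaves(n_octaves(dims, bins), stride) * bins
}
```

The stride only changes the first factor (octaves visited); the second factor stays at 17 because all bins are always sampled.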
|
|
||
| ### Why stride works (the spiral argument) | ||
|
|
||
| ``` | ||
| Each octave is one 17-position spiral turn. | ||
| Consecutive octaves are CORRELATED (smooth weight variations). | ||
| Stride=16 means: sample every 16th turn of the spiral. | ||
|
|
||
| IF the spiral is smooth (weights don't jump wildly between octaves): | ||
| stride=16 captures the spiral shape with 19 points | ||
| the 283 skipped octaves lie ON the same spiral | ||
| reconstruction from 19 points = evaluate spiral at 302 positions | ||
|
|
||
| IF the spiral has high-frequency variation: | ||
| stride=16 misses sharp features (aliasing) | ||
| stride=1 needed for these tensors | ||
| → detect via: compare stride=1 vs stride=16 Pearson ρ | ||
| → if ρ > 0.99: stride=16 is safe (spiral is smooth) | ||
| → if ρ < 0.95: use stride=1 (high-frequency content) | ||
| ``` | ||
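The stride-safety check above could look roughly like this. The sketch assumes the two inputs are the same quantity (for example the 17 bin values, or a slice of the distance table) computed once at stride=1 and once at stride=16. The `Undecided` case for 0.95 ≤ ρ ≤ 0.99 is an assumption, since the document does not say what to do in that band.

```rust
/// Pearson correlation coefficient between two equal-length slices.
/// Returns None if the lengths differ, there are fewer than 2 points,
/// or either input has zero variance.
pub fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (a, b) in x.iter().zip(y) {
        let (dx, dy) = (a - mx, b - my);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 || syy == 0.0 {
        return None;
    }
    Some(sxy / (sxx * syy).sqrt())
}

#[derive(Debug, PartialEq)]
pub enum StrideChoice {
    Stride16,  // ρ > 0.99: spiral is smooth
    Stride1,   // ρ < 0.95: high-frequency content
    Undecided, // in between: not specified by the document (assumption)
}

/// Apply the thresholds from the text to a measured ρ.
pub fn choose_stride(rho: f64) -> StrideChoice {
    if rho > 0.99 {
        StrideChoice::Stride16
    } else if rho < 0.95 {
        StrideChoice::Stride1
    } else {
        StrideChoice::Undecided
    }
}
```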
|
|
||
| ### Offset selection | ||
|
|
||
| ``` | ||
| Default: offset=0 (start at first octave) | ||
| Better: offset = golden ratio fractional part | ||
|
|
||
| offset = floor(n_octaves × (φ - 1)) = floor(302 × 0.618...) = 186 | ||
|
|
||
| Starting at octave 186 instead of 0: | ||
| stride=16 samples octaves: 186, 202, 218, 234, 250, 266, 282, 298, 12, 28, ... | ||
| (wrapping around at 302) | ||
|
|
||
| This MAXIMIZES the coverage because φ-offset + stride | ||
| guarantees the sample positions don't cluster. | ||
|
|
||
| vs offset=0: samples 0, 16, 32, ... (regular, but misses middle if stride too big) | ||
| vs offset=186: φ-scattered across the full range | ||
|
|
||
| The offset IS the golden-angle sampling that highheelbgz already does. | ||
| It's the same φ-distribution applied to octave selection. | ||
| ``` | ||
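The offset formula and the wrapped sample sequence can be checked with a short sketch. Function names are hypothetical; the wrap is a plain modulo over the octave count, as the "(wrapping around at 302)" note suggests.

```rust
/// φ-fractional offset: floor(n_octaves × (φ − 1)).
pub fn phi_offset(n_octaves: usize) -> usize {
    let phi = (1.0 + 5.0_f64.sqrt()) / 2.0;
    (n_octaves as f64 * (phi - 1.0)).floor() as usize
}

/// First `count` sampled octaves, stepping by `stride` from `offset`
/// and wrapping around at `n_octaves`.
pub fn sample_positions(
    n_octaves: usize,
    stride: usize,
    offset: usize,
    count: usize,
) -> Vec<usize> {
    (0..count)
        .map(|i| (offset + i * stride) % n_octaves)
        .collect()
}
```

Calling `sample_positions(302, 16, phi_offset(302), 19)` lists the 19 octaves a stride=16 pass would visit.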
|
|
||
| ## 3. CORRECTION: BENDING vs COMPRESSION | ||
|
|
||
| Two sources of error, two different corrections: | ||
|
|
||
| ### Bending (γ correction) — distribution SHAPE is wrong | ||
|
|
||
| ``` | ||
| Problem: raw cosine distribution is NOT uniform. | ||
| Gate weights: 68.9% near zero, thin tails. | ||
| Attention: broad, nearly symmetric. | ||
| Down: narrow, one-sided. | ||
|
|
||
| Uniform quantization wastes bits on empty regions. | ||
| Gate values near zero get the SAME resolution as gate values at 0.3 | ||
| But the SiLU decision boundary IS at zero → needs MORE resolution there. | ||
|
|
||
| Fix: gamma_phi_encode(value, role_gamma, phi_scale) | ||
| Stage 1: γ-normalize by role (compress highlights, expand shadows) | ||
| gamma_encode(v, γ) = sign(v) × ln(1 + |v|/γ) × γ | ||
| Gate γ=1.50 → MOST expansion near zero | ||
| Q γ=0.37 → less expansion (already broad) | ||
|
|
||
| Stage 2: φ-distribute (golden ratio spacing) | ||
| phi_encode(v, φ_scale) = sign(v) × log_φ(1 + |v|/φ_scale) × φ_scale | ||
| Ensures quantization boundaries sit at irrational positions | ||
| No BF16 bucket aliasing | ||
|
|
||
| Stored as metadata: GammaProfile { role_gamma: [f32; 6], phi_scale: f32 } | ||
| 28 bytes per model. Exact decode: phi_decode(gamma_decode(stored, γ), φ_scale) | ||
| ``` | ||
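The review comment on this PR points out that, because encoding applies γ first and φ second, decoding has to undo φ first and γ second. A minimal sketch of that round-trip follows. It assumes `f64` scalars and closed-form inverses derived from the two formulas above; the signatures are illustrative and this is not the code in `crates/bgz-tensor/src/gamma_phi.rs`.

```rust
const PHI: f64 = 1.618_033_988_749_895;

/// Stage 1: sign(v) × ln(1 + |v|/γ) × γ
pub fn gamma_encode(v: f64, gamma: f64) -> f64 {
    v.signum() * (v.abs() / gamma).ln_1p() * gamma
}

/// Inverse of stage 1: sign(e) × γ × (e^(|e|/γ) − 1)
pub fn gamma_decode(e: f64, gamma: f64) -> f64 {
    e.signum() * gamma * (e.abs() / gamma).exp_m1()
}

/// Stage 2: sign(v) × log_φ(1 + |v|/s) × s
pub fn phi_encode(v: f64, phi_scale: f64) -> f64 {
    v.signum() * ((v.abs() / phi_scale).ln_1p() / PHI.ln()) * phi_scale
}

/// Inverse of stage 2: sign(e) × s × (φ^(|e|/s) − 1)
pub fn phi_decode(e: f64, phi_scale: f64) -> f64 {
    e.signum() * phi_scale * (e.abs() / phi_scale * PHI.ln()).exp_m1()
}

/// Encode: γ first, then φ.
pub fn gamma_phi_encode(v: f64, gamma: f64, phi_scale: f64) -> f64 {
    phi_encode(gamma_encode(v, gamma), phi_scale)
}

/// Decode: undo φ first, then γ (the reverse of the encode order).
pub fn gamma_phi_decode(stored: f64, gamma: f64, phi_scale: f64) -> f64 {
    gamma_decode(phi_decode(stored, phi_scale), gamma)
}
```

Swapping the two decode stages does not return the original value, which is why the order matters for anyone implementing from this guide.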
|
|
||
| ### Compression (stride/offset) — samples are SPARSE | ||
|
|
||
| ``` | ||
| Problem: stride=16 samples 19 of 302 octaves. | ||
| The 283 skipped octaves contribute to the true centroid average | ||
| but are not measured. | ||
|
|
||
| Fix: spiral-aware interpolation. | ||
| The 19 sampled points define a φ-spiral in 17D. | ||
| The 283 missing points lie ON this spiral (smooth assumption). | ||
| Reconstruction: evaluate spiral at missing positions. | ||
|
|
||
| NOT: linear interpolation (wrong, ignores curvature) | ||
| NOT: halftone dropping (wrong, misses entire dimensions) | ||
| IS: φ-spiral fit from sampled points → evaluate at all positions | ||
|
|
||
| Implementation: | ||
| For each of the 17 bins: | ||
| sampled_values[19]: the values we measured at stride=16 | ||
| sampled_positions[19]: which octaves we sampled (0, 16, 32, ... for offset=0) | ||
|
|
||
| Fit: r(θ) = a × e^(b×θ) through the 19 points | ||
| Evaluate: r(θ) at all 302 positions | ||
| Average: mean of all 302 reconstructed values = the true bin value | ||
|
|
||
| This is MORE accurate than averaging only the 19 sampled values | ||
| because the spiral fit exploits the smoothness constraint. | ||
| ``` | ||
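The document does not say how the fit is computed. One simple option, shown here purely as an assumption, is ordinary least squares on ln r = ln a + b·t, with the octave index used as the position t. This only works when every sampled value is strictly positive; signed bin values would need a different model.

```rust
/// Fit r(t) = a·e^(b·t) by least squares on ln r = ln a + b·t.
/// Returns None unless there are at least 2 points, all values are > 0,
/// and the positions are not all identical.
pub fn fit_exp(positions: &[f64], values: &[f64]) -> Option<(f64, f64)> {
    if positions.len() != values.len() || positions.len() < 2 {
        return None;
    }
    if values.iter().any(|&v| v <= 0.0) {
        return None;
    }
    let n = positions.len() as f64;
    let mt = positions.iter().sum::<f64>() / n;
    let ml = values.iter().map(|v| v.ln()).sum::<f64>() / n;
    let mut stt = 0.0_f64;
    let mut stl = 0.0_f64;
    for (t, v) in positions.iter().zip(values) {
        stt += (t - mt) * (t - mt);
        stl += (t - mt) * (v.ln() - ml);
    }
    if stt == 0.0 {
        return None;
    }
    let b = stl / stt;
    let a = (ml - b * mt).exp();
    Some((a, b))
}

/// Mean of the fitted curve evaluated at octaves 0..n_octaves.
pub fn reconstructed_mean(a: f64, b: f64, n_octaves: usize) -> f64 {
    (0..n_octaves)
        .map(|k| a * (b * k as f64).exp())
        .sum::<f64>()
        / n_octaves as f64
}
```

Per bin, `fit_exp` would take the 19 sampled positions and values, and `reconstructed_mean(a, b, 302)` would give the estimate of the full-octave average.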
|
|
||
| ### When to use which correction | ||
|
|
||
| ``` | ||
| Per centroid pair in the distance table: | ||
|
|
||
| 1. ALWAYS: γ correction (role-specific distribution shaping) | ||
| → stored as GammaProfile metadata (28 bytes) | ||
| → applied during encoding AND decoding | ||
|
|
||
| 2. IF stride > 1: spiral reconstruction | ||
| → stored as stride + offset metadata (2 bytes) | ||
| → applied during centroid averaging (StackedN build) | ||
| → NOT applied to the distance table values (those come from cosine) | ||
|
|
||
| 3. IF Spearman ρ < 0.998 after γ: ICC profile correction | ||
| → stored as transfer curve (per pair or per region) | ||
| → applied during distance table lookup | ||
| → absorbs whatever γ + spiral didn't fix | ||
|
|
||
| 4. IF ICC insufficient: CoSENT candle training | ||
| → directly optimizes rank order | ||
| → last resort before LoRA (the nuclear option) | ||
| ``` | ||
|
|
||
| ## 4. THE METADATA CHAIN | ||
|
|
||
| Every baked table carries: | ||
|
|
||
| ```json | ||
| { | ||
| "source_gguf": "jinaai/jina-reranker-v3-GGUF", | ||
| "source_dtype": "BF16", | ||
| "n_centroids": 256, | ||
| "centroid_spd": 32, | ||
| "octave_stride": 16, | ||
| "octave_offset": 186, | ||
| "role": "ffn_up", | ||
| "gate_modulated": true, | ||
| "gamma_profile": { | ||
| "role_gamma": 0.12, | ||
| "phi_scale": 0.08, | ||
| "n_calibration": 5120 | ||
| }, | ||
| "encoding": "BF16", | ||
| "cosine_range": [-0.886, 0.826], | ||
| "sign_distribution": { "positive": 32512, "negative": 32768, "zero": 256 }, | ||
| "spearman_rho_vs_f32": null, | ||
| "icc_profile": null, | ||
| "variance_agreement_score": null, | ||
| "cronbach_alpha_context": null | ||
| } | ||
| ``` | ||
|
|
||
| Each field enables exact reconstruction: | ||
| - stride + offset → which octaves were sampled | ||
| - gamma_profile → undo γ+φ for exact decode | ||
| - cosine_range → per-role scale factor | ||
| - icc_profile → correction curve (when available) | ||
| - variance_agreement → quorum confidence per pair | ||
|
|
||
| ## 5. QUALITY CHECKS (updated) | ||
|
|
||
| Before encoding any distance table: | ||
|
|
||
| ``` | ||
| [ ] Uses StackedN/ClamCodebook? (not raw f32 bypass) | ||
| [ ] BF16 precision? (not 8-bit bottleneck) | ||
| [ ] All 17 bins sampled? (not halftone 9/17) | ||
| [ ] Stride documented? (octave_stride in metadata) | ||
| [ ] Offset is φ-fractional? (not 0, not arbitrary) | ||
| [ ] γ correction applied? (per-role from GammaProfile) | ||
| [ ] φ distribution applied? (irrational bucket boundaries) | ||
| [ ] Gate modulation on Up only? (silu(gate)×up before cosine) | ||
| [ ] Metadata JSON saved alongside table? | ||
| [ ] Reconstruction path documented? (decode = φ_decode(γ_decode(stored))) | ||
| ``` | ||
|
|
||
| ## 6. THE ZECKENDORF CONNECTION | ||
|
|
||
| ``` | ||
| Zeckendorf's theorem: every positive integer has a unique | ||
| representation as sum of non-consecutive Fibonacci numbers. | ||
|
|
||
| ZeckF64 in bgz-tensor: encodes positions as Fibonacci sums. | ||
| Position 42 = F(9) + F(6) = 34 + 8 | ||
| (F(9) and F(6) are non-consecutive, so this is the Zeckendorf form) | ||
|
|
||
| Fibonacci numbers ARE the φ-spiral positions: | ||
| F(n) / F(n-1) → φ as n → ∞ | ||
| Each Fibonacci number is the next position on the spiral | ||
|
|
||
| Zeckendorf decomposition = expressing a point as | ||
| a sum of spiral positions = its coordinates ON the spiral. | ||
|
|
||
| Spiral reconstruction from Zeckendorf: | ||
| Given: bin values at Zeckendorf positions | ||
| The Fibonacci structure tells you WHERE on the spiral each value sits | ||
| Reconstruction = evaluate the spiral BETWEEN Fibonacci positions | ||
| This is optimal because Fibonacci spacing = φ-optimal coverage | ||
| ``` | ||
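For reference, the Zeckendorf decomposition itself can be computed greedily: repeatedly subtract the largest Fibonacci number that still fits. The sketch below is a generic implementation of the theorem, not the ZeckF64 encoding in bgz-tensor, whose layout is not described here.

```rust
/// Greedy Zeckendorf decomposition of `n` into non-consecutive Fibonacci
/// numbers, returned in descending order (empty for n = 0).
pub fn zeckendorf(mut n: u64) -> Vec<u64> {
    // Distinct Fibonacci numbers 1, 2, 3, 5, 8, ... not exceeding n
    // (the seed values 1 and 2 are always present).
    let mut fibs = vec![1u64, 2];
    while let Some(next) = fibs[fibs.len() - 1].checked_add(fibs[fibs.len() - 2]) {
        if next > n {
            break;
        }
        fibs.push(next);
    }
    // Taking the largest term that fits never selects two neighbours,
    // because the remainder is always smaller than the previous Fibonacci number.
    let mut terms = Vec::new();
    for &f in fibs.iter().rev() {
        if f <= n {
            terms.push(f);
            n -= f;
        }
    }
    terms
}
```

`zeckendorf(42)` yields the two terms 34 and 8, while 34 + 8 + 2 is the decomposition of 44.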
This decode formula is reversed: `gamma_phi_encode` applies gamma first and phi second, so rehydration must decode phi first and then gamma. Documenting `phi_decode(gamma_decode(...))` will produce incorrect round-trips if someone implements from this guide, and it conflicts with the actual inverse implementation in `crates/bgz-tensor/src/gamma_phi.rs::gamma_phi_decode`.