From e8e1d636ee0c3326b0b491c26e79eb737a5e3e5a Mon Sep 17 00:00:00 2001
From: Claude
Date: Sun, 5 Apr 2026 22:23:35 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20CRITICAL=20knowledge=20sync=20for=20sig?=
 =?UTF-8?q?ned=20session=20=E2=80=94=2033%=20error,=20not=20cosmetic?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The signed session built i8 tables by converting u8 → i8 (subtract 128).
This RELABELS ranks but does NOT recover gate sign information.
The 33% scale error from SiLU gate nonlinearity is still baked in.

MEASURED on real Qwopus BF16:
  68.9% of gate weights near zero (decision boundary)
  99.2% of table cells change with SiLU correction
  Mean change: 84.2 u8 levels (33% of 256 scale)
  Consistent across all 64 layers

FIX: build i8 tables directly from BF16 → signed cosine → i8
NOT from u8 table → subtract 128 → i8

Also documents:
- 7-lane calibration plan (u8/i8/γ+φ/signed/spiral)
- BF16 bucket boundary awareness (5% rank flips)
- Jina v5 as ONNX f32 ground truth (no API key)
- Why CDF makes all models look identical (by design)

https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
---
 .claude/KNOWLEDGE_SYNC_SIGNED_SESSION.md | 233 +++++++++++++++++++++++
 1 file changed, 233 insertions(+)
 create mode 100644 .claude/KNOWLEDGE_SYNC_SIGNED_SESSION.md

diff --git a/.claude/KNOWLEDGE_SYNC_SIGNED_SESSION.md b/.claude/KNOWLEDGE_SYNC_SIGNED_SESSION.md
new file mode 100644
index 00000000..546ff6fa
--- /dev/null
+++ b/.claude/KNOWLEDGE_SYNC_SIGNED_SESSION.md
@@ -0,0 +1,233 @@
+# KNOWLEDGE SYNC: What the Signed Session Needs to Know
+
+## THE 33% ERROR: Not Cosmetic
+
+### What happened
+
+```
+Step 1 (this session, early): Synthetic gates on Jina lens
+  Result: cos(raw, corrected) = 0.999, 83% peak agreement
+  Verdict: "COSMETIC"
+
+Step 2 (this session, later): REAL Qwopus 27B BF16 gates streamed
+  Result: 86% material corrections, 99.2% cells change, mean Δ = 84.2 u8
+  Verdict: "33% OF THE ENTIRE SCALE IS WRONG"
+
+The synthetic test used wide gate ranges [-0.1, 0.3].
+The real Qwopus gates are concentrated at zero: 68.9% of |w| < 0.01.
+SiLU's nonlinearity lives at zero. Narrow gates = big correction.
+```
+
+### The numbers (MEASURED, not estimated)
+
+```
+Source: Qwopus3.5-27B-v3 BF16 GGUF (53.8 GB), streamed via HTTP range
+File: crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/layer_stats.json
+
+ffn_gate L0:
+  Weight range: [-0.109, 0.115]
+  Near zero (|w| < 0.01): 68.9%
+  Cosine range: [-0.23, +0.18], std=0.022
+
+ffn_up L0 (what the table encodes):
+  Raw cosine std: 0.021
+  SiLU(gate)×up cosine std: 0.051 ← 2.4× MORE SPREAD
+
+  Table comparison (256×256 u8):
+    Cells changed: 99.2% (64,968 / 65,536)
+    Mean |Δ|: 84.2 u8 levels (33% of 256 scale)
+    Max |Δ|: 254 u8 levels (nearly full range)
+
+Consistent across ALL 64 layers:
+  Layer 0: gate_zero=69%, SiLU Δ=85
+  Layer 16: gate_zero=64%, SiLU Δ=85
+  Layer 32: gate_zero=66%, SiLU Δ=84
+  Layer 48: gate_zero=66%, SiLU Δ=84
+  Layer 63: gate_zero=57%, SiLU Δ=84
+```
+
+## THE CRITICAL BUG IN signed_engine.rs
+
+### What the signed session built
+
+```rust
+// In dual_signed_experiment.rs line ~30:
+let signed_table: Vec<i8> = table.iter()
+    .map(|&v| (v as i16 - 128) as i8)
+    .collect();
+```
+
+### Why this does NOT fix the 33% error
+
+```
+The u8 table was built from:
+  CLAM centroids → raw cosine → CDF percentile → u8[0,255]
+
+The gate sign information was LOST during "raw cosine":
+  cos(weight_row_i, weight_row_j) treats all dimensions equally.
+  It doesn't know that gate[k] = -0.005 means BLOCK
+  while gate[k] = +0.005 means PASS.
+  Both contribute equally to cosine.
+
+Converting u8 → i8 by subtracting 128:
+  u8[156] → i8[+28] (was positive, still positive)
+  u8[121] → i8[-7] (was below midpoint, now negative)
+
+  This creates signed values from the CDF RANK, not from the WEIGHT SIGNS.
+  A u8 value of 121 means "~47th percentile of cosine distribution"
+  NOT "the weights have opposite signs here."
+
+  The i8 conversion is a RELABELING of ranks, not a RECOVERY of signs.
+```
+
+### What needs to happen instead
+
+```
+WRONG (current signed_engine path):
+  u8 table (gate info lost) → subtract 128 → i8 (gate info still lost)
+
+RIGHT:
+  BF16 weights → compute SIGNED cosine → encode directly as i8[-128,+127]
+
+  For gate-modulated roles (K, V, Up):
+    activated = silu(gate_row) × weight_row (elementwise)
+    cos(activated_i, activated_j) → i8 (SIGNED, preserves gate decisions)
+
+  For raw roles (Q, Down):
+    cos(weight_row_i, weight_row_j) → i8 (still signed, preserves weight polarity)
+
+  The sign in the cosine IS the excitation/inhibition signal.
+  Negative cosine = opposed features = inhibition.
+  This is REAL. The models have negative cosines:
+    Qwopus ffn_gate: cos[-0.23, +0.18]
+    Reranker: cos[-0.886, +0.826]
+    Reader-LM ffn_down: cos[-0.885, +0.188]
+```
+
+## WHAT THE DUAL EXPERIMENT ACTUALLY TESTS
+
+```
+Current dual_signed_experiment.rs tests:
+  "Does i8(u8 - 128) produce different peaks than u8?"
+  Answer: Yes, but the difference is from RELABELING, not from gate recovery.
+
+What it SHOULD test:
+  "Does i8(BF16 → signed cosine) produce different peaks than u8(BF16 → CDF)?"
+  Answer: unknown, not built yet.
+
+The experiment framework (DualEngine, DualResult, comparison metrics) is CORRECT.
+The TABLE CONTENT feeding into the signed engine is WRONG.
+```
+
+## THE 7-LANE CALIBRATION PLAN
+
+### Why we went back to the design board
+
+```
+After measuring the 33% error, we realized:
+1. All 7 HDR lenses have identical statistics (Mean=127.5, Std=73.6)
+   because CDF encoding forces uniform distribution.
+   Model-specific topology IS preserved (99.2% bytes differ between models)
+   but you can't see it in the statistics.
+
+2. γ+φ encoding (golden ratio offset) is NOT applied to any baked table.
+   The code exists in bgz-tensor/gamma_phi.rs but was never wired.
+   Per-role γ offsets are DOCUMENTED (Gate=1.50, Q=0.37) but NOT USED.
+
+3. Calibrating against GGUF BF16 is calibrating against TIFF, not RAW.
+   BF16 has 7-bit mantissa → ±0.008 precision → ~5% rank flips at boundaries.
+   Need ONNX f32 as ground truth.
+
+4. Jina v5 has BOTH ONNX f32 (2.4 GB) and GGUF (1.2 GB).
+   No API key needed. Both verified streamable.
+```
+
+### The 7 encoding lanes to compare
+
+```
+For each model × each role:
+  Lane 1: u8 linear (current 64×64 codebook tables)
+  Lane 2: u8 CDF (current 256×256 HDR lenses)
+  Lane 3: u8 γ+φ (gamma offset + phi redistribution)
+  Lane 4: i8 from u8 (subtract 128; what signed_engine.rs does now)
+  Lane 5: i8 from BF16 (signed cosine directly; NOT BUILT YET)
+  Lane 6: i8 γ+φ signed (gamma + phi on signed range)
+  Lane 7: highheelbgz spiral (golden ratio stride encoding)
+
+Ground truth: ONNX f32 forward pass via rten
+Metric: Spearman ρ(lane_distances, onnx_distances)
+After ICC: does correction bring all lanes to ρ > 0.998?
+
+The lane that needs the LEAST ICC correction = the most faithful encoding.
+```
+
+### BF16 bucket boundary awareness
+
+```
+When raw cosine is within ±0.008 of a HEEL bucket boundary,
+BF16 truncation can flip the bucket assignment.
+High precision refinement (HIP/TWIG) on the wrong bucket = confidently lost.
+
+Fix: boundary_risk metadata per centroid pair.
+  95% safe → fast cascade
+  5% uncertain → skip cascade, validate at LEAF or compute directly
+
+γ+φ golden ratio stride reduces boundary risk by placing bucket
+edges at irrational positions that don't align with BF16 quant steps.
+```
+
+## ACTION ITEMS FOR THE SIGNED SESSION
+
+```
+1. DO NOT trust the current i8 tables (u8 - 128 = relabeled ranks, not gate signs)
+
+2. BUILD i8 tables directly from BF16 weights:
+   Stream Qwopus BF16 → silu(gate) × up → cosine → round(cos × 127) → i8
+   Use the existing streaming pipeline (Python scripts in this session,
+   or the Rust stream_hdr_lens.rs pattern)
+
+3. RE-RUN dual_signed_experiment with BOTH table types:
+   DualEngine with:
+     unsigned = u8 CDF (current, from raw cosine)
+     signed = i8 from BF16 signed cosine (NEW, from silu(gate)×up)
+
+   THEN compare. The agreement metric will be meaningful.
+
+4. For calibration:
+   Download Jina v5 ONNX (2.4 GB): the f32 ground truth
+   Download Jina v5 GGUF (1.2 GB): our streaming source
+   Run rten on ONNX → compute f32 embedding cosines for test sentences
+   Compare: baked table distances vs ONNX distances → Spearman ρ
+   Build ICC profile → corrected ρ should reach > 0.998
+
+5. Temperature + nucleus sampling:
+   This UNBLOCKS coherent output. Without it, even perfect tables collapse.
+   See HANDOVER_MAVERICK_SESSION.md for the 10-line implementation.
+   Wire INTO thinking styles (Analytical=top_p 0.3, Creative=0.95).
+```
+
+## FILES THAT MATTER
+
+```
+MEASURED DATA (this session):
+  crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/layer_stats.json
+    → per-layer gate near-zero %, cosine ranges, SiLU correction stats
+  crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/gate_raw_256x256.u8
+    → L0 gate table WITHOUT SiLU
+  crates/thinking-engine/data/Qwopus3.5-27B-v3-BF16-silu/gate_silu_corrected_256x256.u8
+    → L0 gate table WITH SiLU (compare: 99.2% cells differ, mean Δ=84.2)
+
+SILU CORRECTION CODE:
+  crates/thinking-engine/src/silu_correction.rs
+    → generate_training_data(), gate_modulate_centroids(), apply_corrections()
+    → May be OBSOLETE if i8-from-BF16 path works (sign preserves gate natively)
+    → But the MEASUREMENT code (correction_stats()) is still valuable for analysis
+
+CALIBRATION HARNESS:
+  crates/thinking-engine/examples/calibrate_lenses.rs → Spearman + ICC builder
+  crates/lance-graph-contract/src/high_heel.rs → LensProfile, LensConfig, LENS_REGISTRY
+
+HANDOVER DOCS:
+  .claude/HANDOVER_MAVERICK_SESSION.md → i8 architecture, Maverick plan, temperature fix
+  .claude/HANDOVER_CALIBRATION_SESSION.md → H1-H5 hypotheses, Cronbach α protocol
+```
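
Reviewer sketch (not part of the patch): the "RIGHT" path described in the added file, `silu(gate) × up → cosine → round(cos × 127) → i8`, could look roughly like the following. This is a minimal illustration under stated assumptions, not the repository's implementation. It assumes the BF16 rows have already been decoded to dense `f32` vectors, it works on raw rows rather than CLAM centroids, and every function name here (`silu`, `gate_modulate`, `cosine`, `cos_to_i8`, `build_signed_table`, `relabel_u8`) is hypothetical. `relabel_u8` reproduces the `u8 - 128` relabeling only so the two paths can be compared side by side.

```rust
/// SiLU activation: x * sigmoid(x), written as x / (1 + e^-x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Gate-modulate one row elementwise: silu(gate[k]) * up[k].
/// Hypothetical helper; assumes both rows have the same length.
fn gate_modulate(gate_row: &[f32], up_row: &[f32]) -> Vec<f32> {
    gate_row.iter().zip(up_row).map(|(&g, &u)| silu(g) * u).collect()
}

/// Signed cosine similarity in [-1, 1]. Returns 0.0 if either row has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        (dot / (na * nb)).clamp(-1.0, 1.0)
    }
}

/// Encode a signed cosine as i8 via round(cos * 127), so the sign survives.
fn cos_to_i8(c: f32) -> i8 {
    (c.clamp(-1.0, 1.0) * 127.0).round() as i8
}

/// Build an n×n row-major i8 table from gate-modulated rows (n = number of rows).
fn build_signed_table(gate: &[Vec<f32>], up: &[Vec<f32>]) -> Vec<i8> {
    let rows: Vec<Vec<f32>> = gate
        .iter()
        .zip(up)
        .map(|(g, u)| gate_modulate(g, u))
        .collect();
    let n = rows.len();
    let mut table = Vec::with_capacity(n * n);
    for i in 0..n {
        for j in 0..n {
            table.push(cos_to_i8(cosine(&rows[i], &rows[j])));
        }
    }
    table
}

/// The relabeling path the document warns about: CDF rank minus 128.
/// The sign of the result reflects rank relative to the midpoint, not weight polarity.
fn relabel_u8(table: &[u8]) -> Vec<i8> {
    table.iter().map(|&v| (v as i16 - 128) as i8).collect()
}
```

With two rows whose gate-modulated vectors point in opposite directions, `build_signed_table` yields a negative off-diagonal entry, which is the inhibition signal the file says the `u8 - 128` path cannot carry. The `round(cos × 127)` scaling keeps the encoding symmetric around zero and leaves `-128` unused; whether the real pipeline wants that or a full `[-128, +127]` mapping is an open design choice.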