jc: Σ-Codebook Viability Probe — empirically rules out CausalEdge64 8→16 byte expansion #288
Empirical measurement (NOT a theorem proof). Decides between Σ-edge encoding
strategies BEFORE any production CausalEdgeTensor / propagate() code is
written.
# Result
✓ CODEBOOK VIABLE R² = 0.994900 threshold ≥ 0.99 (278 ms)
Edges: 10,000 Codebook: k=256 Used clusters: 256/256 k-Means iters: 40/100
R² is the log-Euclidean coefficient of determination — 99.49% of the variance
in log-Σ space is captured by the 256-entry codebook. Just over the 0.99
threshold; passes but tightly. For the 240-edge HighHeelBGZ container case
this is sufficient.
# Decision unlocked
Result rules out the v3 "CausalEdgeTensor 8→16 Byte" expansion (which would
have halved the 240-edge-per-container limit). Recommended path:
- Option A: Σ-Codebook (3.5 KB workspace) + 1-byte sidecar per edge.
  CausalEdge64 unchanged. 240 edges per container preserved.
- Option C: SchemaSidecar Block 14/15 carries Σ-indices for 240 edges.
  No new sidecar pipe; uses already-reserved 16k-fingerprint blocks.
# What was measured
Synthesize 10,000 plausible CausalEdges with realistic field distributions:
- frequency: u² (Beta-shaped, biased low) — most edges weak
- confidence: 1−(1−u)² (Beta-shaped, biased high) — when present, decisive
- direction: uniform discrete 0..8 — 8 discrete bearings
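The three field distributions above can be sketched as follows. This is a minimal illustration, not the probe's code: `SynthEdge`, `SplitMix64` and `synthesize_edges` are hypothetical names, and the SplitMix64 generator merely stands in for whatever RNG the probe actually uses.

```rust
/// Hypothetical stand-in for the synthesized fields; NOT the real CausalEdge64 layout.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SynthEdge {
    pub frequency: f64,  // u^2, biased low
    pub confidence: f64, // 1 - (1 - u)^2, biased high
    pub direction: u8,   // 0..8, i.e. one of 8 discrete bearings
}

/// SplitMix64, a tiny deterministic generator (assumption: the probe's RNG is not shown).
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1), built from the top 53 bits.
    pub fn next_unit(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Draw `n` edges; the same seed always yields the same edges.
pub fn synthesize_edges(n: usize, seed: u64) -> Vec<SynthEdge> {
    let mut rng = SplitMix64(seed);
    (0..n)
        .map(|_| {
            let u = rng.next_unit();
            let v = rng.next_unit();
            SynthEdge {
                frequency: u * u,
                confidence: 1.0 - (1.0 - v) * (1.0 - v),
                direction: (rng.next_u64() % 8) as u8,
            }
        })
        .collect()
}
```

Calling `synthesize_edges(10_000, seed)` twice with the same seed returns identical vectors, which is the property a determinism test would rely on.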
Map each edge to a 2×2 SPD Σ via reasoned semantic mapping:
- strength = freq × conf (overall scale)
- anisotropy = conf (high conf → narrow major axis)
- rotation = direction · π/8
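One way the mapping above could look in code is sketched below. The commit message gives only the three semantic quantities, so the concrete eigenvalue formulas, the constants `FLOOR` and `MAX_SQUEEZE`, and the function names are all assumptions made for illustration.

```rust
use std::f64::consts::PI;

/// Symmetric 2x2 matrix [[a, b], [b, c]] stored as [a, b, c].
pub type Sym2 = [f64; 3];

/// Assumption: a small floor keeps Σ strictly positive definite when strength is 0.
const FLOOR: f64 = 1e-3;
/// Assumption: at confidence 1 the minor axis shrinks to 10% of the major axis.
const MAX_SQUEEZE: f64 = 0.9;

/// Build Σ = R(θ) · diag(major, minor) · R(θ)ᵀ from the edge fields.
pub fn edge_to_sigma(frequency: f64, confidence: f64, direction: u8) -> Sym2 {
    let strength = frequency * confidence;                 // overall scale
    let major = FLOOR + strength;                          // larger eigenvalue
    let minor = major * (1.0 - MAX_SQUEEZE * confidence);  // high conf -> narrow ellipse
    let theta = f64::from(direction) * PI / 8.0;           // rotation = direction · π/8
    let (s, c) = theta.sin_cos();
    [
        major * c * c + minor * s * s,
        (major - minor) * s * c,
        major * s * s + minor * c * c,
    ]
}

/// A symmetric 2x2 matrix is SPD iff its top-left entry and determinant are positive.
pub fn is_spd(m: &Sym2) -> bool {
    m[0] > 0.0 && m[0] * m[2] - m[1] * m[1] > 0.0
}
```

Under these assumptions the determinant equals `major * minor`, which is positive for every input in the unit square, so `is_spd` holds for all synthesized edges.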
Run Lloyd's k-Means in 3D log-Euclidean space (the standard linearization of
the affine-invariant Riemannian metric on the SPD cone). Compute:
R² = 1 − Σ d²(Σ_i, codebook[assignment_i]) / Σ d²(Σ_i, Σ_global_mean)
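The two ingredients of this formula, the closed-form matrix logarithm of a 2×2 SPD matrix and the R² ratio itself, can be sketched as below. The function names are illustrative, and the squared distance is passed in as a parameter because the choice of metric on the log coordinates is a separate decision; none of this is taken from sigma_codebook_probe.rs.

```rust
/// Symmetric 2x2 matrix [[a, b], [b, c]] stored as [a, b, c].
pub type Sym2 = [f64; 3];

/// Matrix logarithm of a 2x2 SPD matrix, in closed form via its eigenvalues
/// mean ± d. Writes log(M) as alpha·I + beta·(M − mean·I).
pub fn sym2_log(m: &Sym2) -> Sym2 {
    let [a, b, c] = *m;
    let mean = 0.5 * (a + c);
    let d = (0.25 * (a - c) * (a - c) + b * b).sqrt();
    let (l1, l2) = ((mean + d).ln(), (mean - d).ln());
    let alpha = 0.5 * (l1 + l2);
    // As d -> 0 the slope tends to the derivative of ln at `mean`, i.e. 1/mean.
    let beta = if d > 1e-12 * mean { (l1 - l2) / (2.0 * d) } else { 1.0 / mean };
    [alpha + beta * (a - mean), beta * b, alpha + beta * (c - mean)]
}

/// R² = 1 − SSE/SST over points that are already in log coordinates.
/// `sq_dist` is the squared distance used for both sums.
pub fn r_squared<D>(points: &[Sym2], codebook: &[Sym2], assignment: &[usize], sq_dist: D) -> f64
where
    D: Fn(&Sym2, &Sym2) -> f64,
{
    let n = points.len() as f64;
    // Global mean in log space (the log-Euclidean mean of the original matrices).
    let mut mean = [0.0; 3];
    for p in points {
        for k in 0..3 {
            mean[k] += p[k] / n;
        }
    }
    let sse: f64 = points
        .iter()
        .zip(assignment)
        .map(|(p, &i)| sq_dist(p, &codebook[i]))
        .sum();
    let sst: f64 = points.iter().map(|p| sq_dist(p, &mean)).sum();
    1.0 - sse / sst
}
```

A codebook that reproduces every point exactly gives R² = 1, and a single centroid placed at the global mean gives R² = 0, which brackets the reported 0.9949.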
# Architecturally sound because
1. CausalEdge64 stays bit-exact unchanged — 7 consumer crates unaffected
2. HighHeelBGZ 240-edges/2KB hard limit preserved
3. Codebook overhead negligible (3.5 KB workspace-wide vs 1.92 KB per container)
4. Bgz17-Palette pattern (also k=256) is the natural sibling — same
architectural logic, applied to white-matter Σ instead of S/P/O
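The byte arithmetic behind points 2 and 3 can be checked with a few constants. This sketch assumes that the 1.92 KB figure is simply 240 edges × 8 bytes and that a Σ-index is one `u8`; the constant names are illustrative only.

```rust
/// CausalEdge64 is 8 bytes; a HighHeelBGZ container holds 240 of them.
pub const EDGE_BYTES: usize = 8;
pub const EDGES_PER_CONTAINER: usize = 240;

/// Per-container edge budget: 240 × 8 = 1920 bytes (1.92 KB).
pub const EDGE_BUDGET_BYTES: usize = EDGE_BYTES * EDGES_PER_CONTAINER;

/// One codebook index per edge; k = 256 entries fit exactly in a u8.
pub const INDEX_BYTES: usize = 1;
pub const CODEBOOK_ENTRIES: usize = u8::MAX as usize + 1;

/// Sidecar cost per container under Option A: 240 × 1 = 240 bytes.
pub const SIDECAR_BYTES: usize = EDGES_PER_CONTAINER * INDEX_BYTES;

/// How many edges of a given size fit in the same edge budget.
pub const fn edges_that_fit(edge_bytes: usize) -> usize {
    EDGE_BUDGET_BYTES / edge_bytes
}
```

With 16-byte edges, `edges_that_fit(16)` is 120, which is the halving of the 240-edge limit that the codebook path avoids.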
# Honest limitations
- 2×2 SPD chosen for closed-form math; full 3×3 anisotropic Σ would be
harder to cluster — but the 2×2 case demonstrates the principle
- Synthesized distribution is plausible, not measured from production —
re-run with real-stream data to confirm
- R² = 0.9949 is just over threshold; for queries spanning more than 5
hops the cumulative error may matter — caller should evaluate based on use case
- Log-Euclidean ≠ affine-invariant; agree to first order, diverge in tails
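The last limitation can be made concrete for the 2×2 case, where both distances have closed forms. The sketch below is illustrative and uses hypothetical function names: the log-Euclidean distance is the Frobenius norm of log(A) − log(B), and the affine-invariant distance is computed from the eigenvalues of A⁻¹B.

```rust
/// Symmetric 2x2 matrix [[a, b], [b, c]] stored as [a, b, c].
pub type Sym2 = [f64; 3];

/// Closed-form matrix logarithm of a 2x2 SPD matrix (eigenvalues mean ± d).
fn sym2_log(m: &Sym2) -> Sym2 {
    let [a, b, c] = *m;
    let mean = 0.5 * (a + c);
    let d = (0.25 * (a - c) * (a - c) + b * b).sqrt();
    let (l1, l2) = ((mean + d).ln(), (mean - d).ln());
    let alpha = 0.5 * (l1 + l2);
    let beta = if d > 1e-12 * mean { (l1 - l2) / (2.0 * d) } else { 1.0 / mean };
    [alpha + beta * (a - mean), beta * b, alpha + beta * (c - mean)]
}

/// Log-Euclidean distance: Frobenius norm of log(A) − log(B).
pub fn dist_log_euclidean(a: &Sym2, b: &Sym2) -> f64 {
    let (la, lb) = (sym2_log(a), sym2_log(b));
    let (dx, dy, dz) = (la[0] - lb[0], la[1] - lb[1], la[2] - lb[2]);
    // The off-diagonal entry occurs twice in the matrix, hence the factor 2.
    (dx * dx + 2.0 * dy * dy + dz * dz).sqrt()
}

/// Affine-invariant distance: sqrt(ln²λ₁ + ln²λ₂), λ the eigenvalues of A⁻¹B.
pub fn dist_affine_invariant(a: &Sym2, b: &Sym2) -> f64 {
    let det_a = a[0] * a[2] - a[1] * a[1];
    let det_b = b[0] * b[2] - b[1] * b[1];
    // trace(A⁻¹B) and det(A⁻¹B) determine both eigenvalues.
    let tr = (a[2] * b[0] - 2.0 * a[1] * b[1] + a[0] * b[2]) / det_a;
    let det = det_b / det_a;
    let half = 0.5 * tr;
    let l1 = half + (half * half - det).max(0.0).sqrt();
    let l2 = det / l1; // avoids cancellation in half − sqrt(...)
    (l1.ln().powi(2) + l2.ln().powi(2)).sqrt()
}
```

For a commuting pair such as diag(4, 1) and diag(1, 9) the two functions return the same value; once one matrix is rotated relative to the other they separate, which is the divergence the limitation refers to.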
# Files
- crates/jc/src/sigma_codebook_probe.rs (new, ~370 lines incl. 6 unit tests)
- crates/jc/src/lib.rs (module declaration with explicit "NOT a pillar" note)
- crates/jc/examples/sigma_probe.rs (new standalone runner with decision output)
- crates/jc/Cargo.toml (new [[example]] entry)
NOT added to run_all_pillars — probes are diagnostic, pillars are theorem
proofs. Different category.
# Run
cargo test --manifest-path crates/jc/Cargo.toml --release sigma_codebook
cargo run --manifest-path crates/jc/Cargo.toml --release --example sigma_probe
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 43e72f1e19
    fn sq_dist3(a: &[f64; 3], b: &[f64; 3]) -> f64 {
        let dx = a[0] - b[0];
        let dy = a[1] - b[1];
        let dz = a[2] - b[2];
        dx * dx + dy * dy + dz * dz
    }
Use Frobenius-weighted distance for log-Σ clustering
The probe states that k-means and R² are computed in log-Euclidean/Frobenius space, but sq_dist3 treats [a,b,c] as plain Euclidean coordinates and uses dx²+dy²+dz². For symmetric 2×2 matrices, Frobenius distance is Δa² + 2·Δb² + Δc², so the off-diagonal term is currently underweighted by 2x. This changes centroid assignments and SSE/SST, which can shift r_squared around the 0.99 decision threshold and lead to a wrong architecture recommendation.
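The reviewer's point can be sketched in two equivalent ways: weight the off-diagonal term directly, or rescale the coordinates once so the existing Euclidean code becomes Frobenius-correct. The names `sq_dist_frobenius` and `to_frobenius_coords` are hypothetical and do not come from the PR.

```rust
use std::f64::consts::SQRT_2;

/// Plain Euclidean squared distance on [a, b, c], as in the snippet under review.
pub fn sq_dist3(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    let dx = a[0] - b[0];
    let dy = a[1] - b[1];
    let dz = a[2] - b[2];
    dx * dx + dy * dy + dz * dz
}

/// Squared Frobenius distance between symmetric 2x2 matrices stored as [a, b, c].
/// The off-diagonal entry appears twice in the full matrix, so it is weighted by 2.
pub fn sq_dist_frobenius(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    let dx = a[0] - b[0];
    let dy = a[1] - b[1];
    let dz = a[2] - b[2];
    dx * dx + 2.0 * dy * dy + dz * dz
}

/// Alternative: store [a, √2·b, c] once; plain `sq_dist3` on these coordinates
/// then equals the Frobenius distance, so k-means code can stay unchanged.
pub fn to_frobenius_coords(m: &[f64; 3]) -> [f64; 3] {
    [m[0], SQRT_2 * m[1], m[2]]
}
```

Because centroids are arithmetic means in either coordinate system, the rescaling variant changes only assignments and the SSE/SST sums, which is exactly where the reviewer expects R² to move.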
…s Pillar 6 (Σ push-forward), not a 3D renderer
User pointed to the actual source:
- crates/jc/src/ewa_sandwich.rs
- crates/lance-graph-contract/src/sigma_propagation.rs
- .claude/plans/jc-pillars-runtime-wiring-v1{,-ERRATUM}.md
- crates/jc/examples/osint_edge_traversal.rs
- splat_perturbationslernen.rs
Real architecture: the EWA-Sandwich is the Σ push-forward
Σ_n = M_n·...·M_1·Σ_0·M_1ᵀ·...·M_nᵀ along multi-hop edge paths.
Pillar 6 certifies PSD-preservation (10000/10000 hops in probe) and
log-norm concentration at the Köstenberger-Stark rate (CV tightness
1.467×). Combined with Pillar 5 (Jirak scalar), 5+ (KS Σ-tensor),
5++ (DZ Hilbert), and 7 (α-saturation), the full aggregation
substrate sits on certified ground.
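For the 2×2 case, the sandwich product can be sketched as below. This is an illustration of the formula only, not the code in crates/jc/src/ewa_sandwich.rs; the types `Sym2` and `Mat2` and the function names are assumptions.

```rust
/// Symmetric 2x2 matrix [[a, b], [b, c]] stored as [a, b, c].
pub type Sym2 = [f64; 3];
/// General 2x2 matrix in row-major order.
pub type Mat2 = [[f64; 2]; 2];

/// One hop: Σ' = M · Σ · Mᵀ. The result is symmetric by construction.
pub fn sandwich(m: &Mat2, s: &Sym2) -> Sym2 {
    let [a, b, c] = *s;
    // Rows of M·Σ.
    let (r00, r01) = (m[0][0] * a + m[0][1] * b, m[0][0] * b + m[0][1] * c);
    let (r10, r11) = (m[1][0] * a + m[1][1] * b, m[1][0] * b + m[1][1] * c);
    // (M·Σ)·Mᵀ, keeping only the upper triangle.
    [
        r00 * m[0][0] + r01 * m[0][1],
        r00 * m[1][0] + r01 * m[1][1],
        r10 * m[1][0] + r11 * m[1][1],
    ]
}

/// Multi-hop path: hops[0] is M_1 and is applied first, so the result is
/// M_n·...·M_1·Σ_0·M_1ᵀ·...·M_nᵀ.
pub fn push_forward(sigma0: Sym2, hops: &[Mat2]) -> Sym2 {
    hops.iter().fold(sigma0, |sigma, m| sandwich(m, &sigma))
}

/// PSD check for a symmetric 2x2 matrix, with a small relative tolerance
/// on the determinant to absorb rounding.
pub fn is_psd(s: &Sym2) -> bool {
    let det = s[0] * s[2] - s[1] * s[1];
    s[0] >= 0.0 && s[2] >= 0.0 && det >= -1e-12 * (s[0] * s[2]).abs()
}
```

Since xᵀ(MΣMᵀ)x = (Mᵀx)ᵀΣ(Mᵀx) ≥ 0 whenever Σ is PSD, every hop preserves positive semidefiniteness in exact arithmetic; the probe's 10000/10000 count is the numerical confirmation of that property.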
Plus PR #288 (Σ-codebook viability, R² = 0.9949) rules out
CausalEdge64 8→16 byte expansion — 256-entry codebook with 1-byte
sidecar is sufficient.
EPIPHANIES: prepend CORRECTION-OF entry with the real math kernel +
pillar-stack composition; keep the original splat-conjecture entry
intact (append-only).
IDEAS: split into two distinct rows — (1) CORRECTION acknowledging
EWA-Sandwich is an existing certified pillar not a new idea;
(2) separate-and-orthogonal 3DGS render-buffer idea kept for
sprint-5+ pickup (different crate home, different math role).
sprint-4-execution-plan.md (W1): patched W1's wrong acronym
expansion ("Efficient Weighted Adjacency" → "Elliptical Weighted
Average"; Heckbert origin, Pillar 6 in the JC framework, real code
locations cited).
What
Empirical measurement, NOT a theorem proof. Decides between Σ-edge encoding strategies before any production CausalEdgeTensor / propagate() code is written.
Result
R² is the log-Euclidean coefficient of determination: 99.49% of the variance in log-Σ space is captured by the 256-entry codebook. Just over the 0.99 threshold; it passes, but tightly. Sufficient for the 240-edge HighHeelBGZ container case.
Architecture decision unlocked
The result rules out the v3 "CausalEdgeTensor 8→16 Byte" expansion path (it would have halved the 240-edges-per-container limit). Recommended path: either a 1-byte Σ-codebook index per edge stored as a sidecar, or the same indices carried in SchemaSidecar Block 14/15.
In both cases: CausalEdge64 unchanged, the HighHeelBGZ container's 240-edge hard limit preserved, 7 consumer crates untouched.
What was measured
Synthesis of 10,000 plausible CausalEdges with realistic field distributions:
- frequency: u² (Beta-shaped, biased low); most edges are weak
- confidence: 1−(1−u)² (Beta-shaped, biased high); when present, decisive
- direction: uniform discrete 0..8; 8 discrete bearings
Each edge is mapped to a 2×2 SPD Σ via a semantically motivated derivation:
- strength = freq × conf (overall scale)
- anisotropy = conf (high confidence → narrow major axis)
- rotation = direction · π/8
Lloyd's k-Means is then run in 3D log-Euclidean space (the standard linearization of the affine-invariant Riemannian metric on the SPD manifold). Computed:
R² = 1 − Σ d²(Σ_i, codebook[assignment_i]) / Σ d²(Σ_i, Σ_global_mean)
Architecturally sound because
Honest limitations
Files
- crates/jc/src/sigma_codebook_probe.rs: new, ~370 lines incl. 6 unit tests
- crates/jc/src/lib.rs: module declaration with explicit "NOT a pillar" note
- crates/jc/examples/sigma_probe.rs: new standalone runner with decision output
- crates/jc/Cargo.toml: new [[example]] entry
Not added to the run_all_pillars list: probes are diagnostic, pillars are theorem proofs. Different category.
Tests (6/6 green)
- sym2_log_exp_round_trip (matrix log/exp round trip)
- identity_log_is_zero (sanity: log(I) = 0)
- edge_synthesizer_is_deterministic (guards RNG state)
- edge_to_sigma_produces_spd (mapping always yields an SPD matrix)
- kmeans_converges_on_separable_data (algorithm sanity)
- probe_runs_and_reports_meaningful_result (end-to-end)
Verification
Unblocks
With this probe result, a CausalEdgeTensor variant can now be designed as a 9-byte sidecar (CausalEdge64 + 1 byte Σ-codebook index), OR equivalently via SchemaSidecar Block 14/15. Caller's choice; both are architecturally sound.