jc: Σ-Codebook Viability Probe — empirically rules out CausalEdge64 8→16 byte expansion by AdaWorldAPI · Pull Request #288 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-29T10:21:27Z

What

Empirical measurement, NOT a theorem proof. Decides between Σ-edge encoding strategies before any production CausalEdgeTensor / propagate() code is written.

Result

✓ CODEBOOK VIABLE  R² = 0.994900  threshold ≥ 0.99  (278 ms)
Edges: 10,000  Codebook: k=256  Used clusters: 256/256  k-Means iters: 40/100
Σ eigenvalue-ratio max: 4.00× (anisotropy spread indicator)

R² is the log-Euclidean coefficient of determination: 99.49% of the variance in log-Σ space is captured by the 256-entry codebook. That is just over the 0.99 threshold, so it passes, but tightly. It is sufficient for the 240-edge HighHeelBGZ container case.

Architecture decision unlocked

The result rules out the v3 "CausalEdgeTensor 8→16 byte" expansion path, which would have halved the 240-edges-per-container limit. Recommended path:

Option	Strategy	Size
A	Σ-Codebook (3.5 KB workspace-wide) + 1-byte sidecar per edge	240 bytes growth per container
C	SchemaSidecar Block 14/15 carries Σ indices for 240 edges	0 bytes growth (uses reserved space)

In both cases: CausalEdge64 is unchanged, the HighHeelBGZ container's 240-edge hard limit is preserved, and the 7 consumer crates are untouched.
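
The size argument can be checked with a few lines of arithmetic. The following is only a sketch: the constant and function names are hypothetical, and it assumes the 1.92 KB per-container figure quoted in this PR is exactly 240 edges × 8 bytes.

```rust
// Hypothetical names; the figures themselves come from the PR text.
pub const EDGE_BYTES: usize = 8; // CausalEdge64
pub const EDGES_PER_CONTAINER: usize = 240; // HighHeelBGZ hard limit
pub const EDGE_PAYLOAD_BYTES: usize = EDGE_BYTES * EDGES_PER_CONTAINER; // 1920

/// Number of edges that fit in the same payload if each edge took `edge_bytes`.
pub const fn edges_that_fit(edge_bytes: usize) -> usize {
    EDGE_PAYLOAD_BYTES / edge_bytes
}

/// Extra bytes per container for a sidecar of `sidecar_bytes` per edge.
pub const fn sidecar_growth(sidecar_bytes: usize) -> usize {
    EDGES_PER_CONTAINER * sidecar_bytes
}
```

Under these assumptions, `edges_that_fit(16)` gives 120 (the halved limit of the 8→16 byte path) and `sidecar_growth(1)` gives the 240 bytes listed for Option A.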

What was measured

Synthesis of 10,000 plausible CausalEdges with a realistic field distribution:

frequency: u² (Beta-shaped, biased low); most edges are weakly supported
confidence: 1−(1−u)² (Beta-shaped, biased high); when present, decisive
direction: uniform discrete 0..8; 8 discrete directions
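
A minimal sketch of such a synthesizer, not the code in sigma_codebook_probe.rs. It assumes that frequency and confidence use independent uniform draws (the PR does not say), reads `0..8` as the half-open range 0 to 7, and swaps in a SplitMix64 generator so the example needs no external crate. `SynthEdge` and `synthesize_edges` are hypothetical names.

```rust
/// SplitMix64: a small deterministic PRNG (assumption: the probe's RNG is not specified).
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1) built from the top 53 bits.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical stand-in for the three fields the probe synthesizes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SynthEdge {
    pub frequency: f64,  // u², biased low
    pub confidence: f64, // 1 − (1 − v)², biased high
    pub direction: u8,   // 0..8, i.e. 0 through 7
}

pub fn synthesize_edges(n: usize, seed: u64) -> Vec<SynthEdge> {
    let mut rng = SplitMix64(seed);
    (0..n)
        .map(|_| {
            let u = rng.next_f64();
            let v = rng.next_f64(); // independent draw (assumption)
            SynthEdge {
                frequency: u * u,
                confidence: 1.0 - (1.0 - v) * (1.0 - v),
                direction: (rng.next_u64() % 8) as u8,
            }
        })
        .collect()
}
```

Seeding the generator explicitly is what makes a determinism check like `edge_synthesizer_is_deterministic` possible: the same seed must reproduce the same edge list.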

Each edge is mapped to a 2×2 SPD Σ via a semantically motivated derivation:

strength = freq × conf (overall scale)
anisotropy = conf (high confidence → narrow major axis)
rotation = direction · π/8
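
One way to turn these three quantities into an SPD matrix is Σ = R(θ)·diag(λ₁, λ₂)·R(θ)ᵀ. The sketch below is an illustration under stated assumptions, not the probe's actual mapping: the eigenvalue floor `FLOOR` and the ratio `1 + 3·conf` are hypothetical choices (the latter picked only so that the largest ratio is 4, in line with the reported 4.00× spread).

```rust
/// Symmetric 2×2 matrix [[a, b], [b, d]] stored as [a, b, d].
pub type Sym2 = [f64; 3];

/// Hypothetical edge → Σ mapping: rotate diag(major, minor) by direction·π/8.
pub fn edge_to_sigma(freq: f64, conf: f64, direction: u8) -> Sym2 {
    const FLOOR: f64 = 1e-3; // assumption: keeps Σ strictly positive definite
    let strength = freq * conf; // overall scale
    let major = FLOOR + strength;
    let minor = major / (1.0 + 3.0 * conf); // assumption: eigenvalue ratio in [1, 4]
    let theta = direction as f64 * std::f64::consts::PI / 8.0;
    let (s, c) = theta.sin_cos();
    [
        major * c * c + minor * s * s,
        (major - minor) * c * s,
        major * s * s + minor * c * c,
    ]
}

/// Sylvester's criterion for a symmetric 2×2 matrix.
pub fn is_spd(m: &Sym2) -> bool {
    m[0] > 0.0 && m[0] * m[2] - m[1] * m[1] > 0.0
}
```

Because both eigenvalues are strictly positive by construction, every output passes `is_spd`, which is the property the `edge_to_sigma_produces_spd` test checks for the real mapping.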

Then Lloyd's k-means is run in 3D log-Euclidean space (the standard linearization of the affine-invariant Riemannian metric on the SPD manifold). Computation:
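
For 2×2 symmetric matrices the matrix logarithm and exponential have a closed form, which is what makes the 3D log-Euclidean embedding cheap. The following is a sketch of one such closed form, not necessarily the one used in the probe; `Sym2`, `sym2_apply`, `sym2_log` and `sym2_exp` are hypothetical names.

```rust
/// Symmetric 2×2 matrix [[a, b], [b, d]] stored as [a, b, d].
pub type Sym2 = [f64; 3];

/// Apply a scalar function to the eigenvalues of a symmetric 2×2 matrix.
/// With mean = (a + d)/2 and eigenvalues mean ± r, the result is
/// alpha·I + beta·(M − mean·I).
fn sym2_apply(m: &Sym2, f: impl Fn(f64) -> f64) -> Sym2 {
    let mean = 0.5 * (m[0] + m[2]);
    let half_diff = 0.5 * (m[0] - m[2]);
    let r = (half_diff * half_diff + m[1] * m[1]).sqrt();
    let (f_hi, f_lo) = (f(mean + r), f(mean - r));
    let alpha = 0.5 * (f_hi + f_lo);
    // Isotropic case (r == 0): the matrix is mean·I, so only alpha matters.
    let beta = if r > 0.0 { (f_hi - f_lo) / (2.0 * r) } else { 0.0 };
    [alpha + beta * half_diff, beta * m[1], alpha - beta * half_diff]
}

/// Matrix logarithm; the input must be SPD so both eigenvalues are positive.
pub fn sym2_log(m: &Sym2) -> Sym2 {
    sym2_apply(m, f64::ln)
}

/// Matrix exponential; maps any symmetric matrix back to an SPD matrix.
pub fn sym2_exp(m: &Sym2) -> Sym2 {
    sym2_apply(m, f64::exp)
}
```

The three numbers returned by `sym2_log` are the coordinates in which k-means runs; `sym2_exp` maps a centroid back to an SPD codebook entry.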

R² = 1 − Σ d²(Σ_i, codebook[assignment_i]) / Σ d²(Σ_i, Σ_global_mean)
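
Read as code, the formula is one minus the ratio of within-cluster squared error (SSE) to total squared error around the global mean (SST). A minimal sketch, assuming the points are already log-Σ coordinates and that the global mean is the arithmetic mean in those coordinates; the function names are hypothetical, and the distance is plain Euclidean here.

```rust
fn sq_dist(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    (0..3).map(|k| (a[k] - b[k]) * (a[k] - b[k])).sum()
}

/// R² = 1 − SSE / SST for a codebook and a per-point cluster assignment.
pub fn r_squared(points: &[[f64; 3]], codebook: &[[f64; 3]], assignment: &[usize]) -> f64 {
    let n = points.len() as f64;
    // Global mean in log coordinates (the log-Euclidean mean).
    let mut mean = [0.0; 3];
    for p in points {
        for k in 0..3 {
            mean[k] += p[k] / n;
        }
    }
    // Residual error: each point against its assigned codebook entry.
    let sse: f64 = points
        .iter()
        .zip(assignment)
        .map(|(p, &a)| sq_dist(p, &codebook[a]))
        .sum();
    // Total error: each point against the global mean.
    let sst: f64 = points.iter().map(|p| sq_dist(p, &mean)).sum();
    1.0 - sse / sst
}
```

A codebook that reproduces every point exactly gives R² = 1, and a single-entry codebook sitting at the global mean gives R² = 0.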

Architecturally clean because

CausalEdge64 stays bit-exact unchanged; 7 consumer crates are untouched
HighHeelBGZ 240-edges/2KB hard limit is preserved
Codebook overhead is negligible (3.5 KB workspace-wide vs. 1.92 KB per container)
The bgz17 palette pattern (also k=256) is the natural sibling: same architectural logic, applied to white-matter Σ instead of S/P/O

Honest Limitations

2×2 SPD was chosen for closed-form math; a full 3×3 anisotropic Σ would be harder to cluster, but the 2×2 case demonstrates the principle
The synthesized distribution is plausible, not measured from production; re-run with a real stream to confirm
R² = 0.9949 is just over the threshold; for multi-hop queries with more than 5 hops the cumulative error may become relevant, so the caller should evaluate based on the use case
Log-Euclidean ≠ affine-invariant; the two agree to first order and diverge in the tails

Files

crates/jc/src/sigma_codebook_probe.rs: new, ~370 lines incl. 6 unit tests
crates/jc/src/lib.rs: module declaration with explicit "NOT a pillar" note
crates/jc/examples/sigma_probe.rs: new standalone runner with decision output
crates/jc/Cargo.toml: new [[example]] entry

Not added to the run_all_pillars list: probes are diagnostic, pillars are theorem proofs. Different category.

Tests (6/6 green)

sym2_log_exp_round_trip (matrix log/exp Round-trip)
identity_log_is_zero (sanity: log(I) = 0)
edge_synthesizer_is_deterministic (guards RNG state)
edge_to_sigma_produces_spd (mapping always yields SPD)
kmeans_converges_on_separable_data (algorithm sanity)
probe_runs_and_reports_meaningful_result (End-to-end)
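
For orientation, the kind of Lloyd iteration that a test like `kmeans_converges_on_separable_data` exercises could look as follows. This is a generic sketch, not the probe's implementation: the strided initialization and the stop-when-assignments-are-stable rule are assumptions, and `kmeans` is a hypothetical name.

```rust
fn sq_dist(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    (0..3).map(|k| (a[k] - b[k]) * (a[k] - b[k])).sum()
}

/// Lloyd's k-means on 3D points. Returns (centroids, assignment, iterations used).
pub fn kmeans(points: &[[f64; 3]], k: usize, max_iters: usize) -> (Vec<[f64; 3]>, Vec<usize>, usize) {
    assert!(k > 0 && points.len() >= k);
    // Deterministic init (assumption): every (n / k)-th point becomes a centroid.
    let stride = points.len() / k;
    let mut centroids: Vec<[f64; 3]> = (0..k).map(|j| points[j * stride]).collect();
    let mut assignment = vec![0usize; points.len()];

    for iter in 1..=max_iters {
        // Assignment step: nearest centroid for every point.
        let mut changed = false;
        for (i, p) in points.iter().enumerate() {
            let mut best = 0;
            let mut best_d = f64::INFINITY;
            for (j, c) in centroids.iter().enumerate() {
                let d = sq_dist(p, c);
                if d < best_d {
                    best_d = d;
                    best = j;
                }
            }
            if assignment[i] != best {
                assignment[i] = best;
                changed = true;
            }
        }
        // Update step: each centroid moves to the mean of its members.
        let mut sums = vec![[0.0; 3]; k];
        let mut counts = vec![0usize; k];
        for (p, &a) in points.iter().zip(&assignment) {
            for d in 0..3 {
                sums[a][d] += p[d];
            }
            counts[a] += 1;
        }
        for j in 0..k {
            if counts[j] > 0 {
                for d in 0..3 {
                    centroids[j][d] = sums[j][d] / counts[j] as f64;
                }
            }
        }
        if !changed {
            return (centroids, assignment, iter);
        }
    }
    (centroids, assignment, max_iters)
}
```

The third return value corresponds to the "k-Means iters: 40/100" style of report: iterations used against the iteration cap.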

Verification

cargo test --manifest-path crates/jc/Cargo.toml --release sigma_codebook
# 6 passed; 0 failed

cargo run --manifest-path crates/jc/Cargo.toml --release --example sigma_probe
# ✓ CODEBOOK VIABLE  R²=0.994900  threshold≥0.99

Unblocks

With this probe result, a CausalEdgeTensor variant can now be designed as a 9-byte sidecar (CausalEdge64 + 1-byte Σ-codebook index), OR equivalently via SchemaSidecar Block 14/15. The choice is the caller's; both are architecturally sound.
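
A rough sketch of what the 1-byte-index variant could look like as a structure-of-arrays layout. Everything here is hypothetical: the type and field names, the use of a raw `u64` for CausalEdge64, and the `[f32; 3]` codebook entry (the PR does not state the entry layout, and this one does not reproduce the 3.5 KB figure).

```rust
pub const EDGES_PER_CONTAINER: usize = 240;
pub const CODEBOOK_SIZE: usize = 256; // k = 256, so a u8 index covers every entry

/// Hypothetical codebook entry: three log-Σ coordinates.
pub type SigmaEntry = [f32; 3];

/// Hypothetical container: unchanged 8-byte edges plus one index byte per edge.
#[repr(C)]
pub struct ContainerWithSidecar {
    pub edges: [u64; EDGES_PER_CONTAINER],     // stand-in for CausalEdge64, untouched
    pub sigma_idx: [u8; EDGES_PER_CONTAINER],  // Σ-codebook index sidecar
}

impl ContainerWithSidecar {
    /// Look up the Σ entry for one edge in the workspace-wide codebook.
    pub fn sigma_of<'a>(
        &self,
        edge: usize,
        codebook: &'a [SigmaEntry; CODEBOOK_SIZE],
    ) -> &'a SigmaEntry {
        &codebook[self.sigma_idx[edge] as usize]
    }
}
```

Since the index is a `u8` and the codebook has exactly 256 entries, the lookup cannot go out of range, and the edge words themselves are never touched.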

Empirical measurement (NOT a theorem proof). Decides between Σ-edge encoding strategies BEFORE any production CausalEdgeTensor / propagate() code is written.

# Result

✓ CODEBOOK VIABLE  R² = 0.994900  threshold ≥ 0.99  (278 ms)
Edges: 10,000  Codebook: k=256  Used clusters: 256/256  k-Means iters: 40/100

R² is the log-Euclidean coefficient of determination: 99.49% of the variance in log-Σ space is captured by the 256-entry codebook. Just over the 0.99 threshold; passes but tightly. For the 240-edge HighHeelBGZ container case this is sufficient.

# Decision unlocked

Result rules out the v3 "CausalEdgeTensor 8→16 Byte" expansion (which would have halved the 240-edge-per-container limit). Recommended path:

Option A: Σ-Codebook (3.5 KB workspace) + 1-byte sidecar per edge. CausalEdge64 unchanged. 240 edges per container preserved.
Option C: SchemaSidecar Block 14/15 carries Σ-indices for 240 edges. No new sidecar pipe; uses already-reserved 16k-fingerprint blocks.

# What was measured

Synthesize 10,000 plausible CausalEdges with realistic field distributions:
- frequency: u² (Beta-shaped, biased low); most edges weak
- confidence: 1−(1−u)² (Beta-shaped, biased high); when present, decisive
- direction: uniform discrete 0..8; 8 discrete bearings

Map each edge to a 2×2 SPD Σ via reasoned semantic mapping:
- strength = freq × conf (overall scale)
- anisotropy = conf (high conf → narrow major axis)
- rotation = direction · π/8

Run Lloyd's k-Means in 3D log-Euclidean space (the standard linearization of the affine-invariant Riemannian metric on the SPD cone). Compute:

R² = 1 − Σ d²(Σ_i, codebook[assignment_i]) / Σ d²(Σ_i, Σ_global_mean)

# Architecturally sound because

1. CausalEdge64 stays bit-exact unchanged; 7 consumer crates unaffected
2. HighHeelBGZ 240-edges/2KB hard limit preserved
3. Codebook overhead negligible (3.5 KB workspace-wide vs 1.92 KB per container)
4. bgz17-Palette pattern (also k=256) is the natural sibling: same architectural logic, applied to white-matter Σ instead of S/P/O

# Honest limitations

- 2×2 SPD chosen for closed-form math; full 3×3 anisotropic Σ would be harder to cluster, but the 2×2 case demonstrates the principle
- Synthesized distribution is plausible, not measured from production; re-run with real-stream data to confirm
- R² = 0.9949 is just over threshold; for >5-hop multi-hop queries the cumulative error may matter, so the caller should evaluate based on use case
- Log-Euclidean ≠ affine-invariant; agree to first order, diverge in tails

# Files

- crates/jc/src/sigma_codebook_probe.rs (new, ~370 lines incl. 6 unit tests)
- crates/jc/src/lib.rs (module declaration with explicit "NOT a pillar" note)
- crates/jc/examples/sigma_probe.rs (new standalone runner with decision output)
- crates/jc/Cargo.toml (new [[example]] entry)

NOT added to run_all_pillars: probes are diagnostic, pillars are theorem proofs. Different category.

# Run

cargo test --manifest-path crates/jc/Cargo.toml --release sigma_codebook
cargo run --manifest-path crates/jc/Cargo.toml --release --example sigma_probe

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 43e72f1e19


chatgpt-codex-connector · 2026-04-29T10:23:14Z

+fn sq_dist3(a: &[f64; 3], b: &[f64; 3]) -> f64 {
+    let dx = a[0] - b[0];
+    let dy = a[1] - b[1];
+    let dz = a[2] - b[2];
+    dx * dx + dy * dy + dz * dz


Use Frobenius-weighted distance for log-Σ clustering

The probe states that k-means and R² are computed in log-Euclidean/Frobenius space, but sq_dist3 treats [a,b,c] as plain Euclidean coordinates and uses dx²+dy²+dz². For symmetric 2×2 matrices, Frobenius distance is Δa² + 2·Δb² + Δc², so the off-diagonal term is currently underweighted by 2x. This changes centroid assignments and SSE/SST, which can shift r_squared around the 0.99 decision threshold and lead to a wrong architecture recommendation.
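
A sketch of the weighting described in this comment, assuming the coordinates are ordered as [Σ₁₁, Σ₁₂, Σ₂₂] in log space. Both function names are hypothetical; the second shows an equivalent alternative, scaling the off-diagonal coordinate by √2 once so that an unweighted distance such as `sq_dist3` already equals the Frobenius distance.

```rust
/// Squared Frobenius distance between symmetric 2×2 matrices stored as [a, b, c]:
/// the off-diagonal entry appears twice in the matrix, hence the factor 2.
pub fn sq_dist3_frobenius(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    let dx = a[0] - b[0];
    let dy = a[1] - b[1];
    let dz = a[2] - b[2];
    dx * dx + 2.0 * dy * dy + dz * dz
}

/// Alternative: embed once with the off-diagonal scaled by √2, then use
/// plain Euclidean distance on the embedded coordinates.
pub fn embed_frobenius(m: &[f64; 3]) -> [f64; 3] {
    [m[0], std::f64::consts::SQRT_2 * m[1], m[2]]
}
```

With either variant the k-means centroid update stays the plain arithmetic mean, since the weighting is per coordinate; only assignments, SSE and SST change.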


…s Pillar 6 (Σ push-forward), not a 3D renderer

User pointed to actual source: crates/jc/src/ewa_sandwich.rs + crates/lance-graph-contract/src/sigma_propagation.rs + .claude/plans/jc-pillars-runtime-wiring-v1{,-ERRATUM}.md + crates/jc/examples/osint_edge_traversal.rs + splat_perturbationslernen.rs.

Real architecture: EWA-Sandwich = Σ_n = M_n·...·M_1·Σ_0·M_1ᵀ·...·M_nᵀ along multi-hop edge paths. Pillar 6 certifies PSD-preservation (10000/10000 hops in probe) + log-norm concentration at Köstenberger-Stark rate (CV tightness 1.467×). Combined with Pillar 5 (Jirak scalar), 5+ (KS Σ-tensor), 5++ (DZ Hilbert), 7 (α-saturation), the full aggregation substrate sits on certified ground. Plus PR #288 (Σ-codebook viability, R² = 0.9949) rules out CausalEdge64 8→16 byte expansion; a 256-entry codebook with 1-byte sidecar is sufficient.

EPIPHANIES: prepend CORRECTION-OF entry with the real math kernel + pillar-stack composition; keep the original splat-conjecture entry intact (append-only).

IDEAS: split into two distinct rows: (1) CORRECTION acknowledging EWA-Sandwich is an existing certified pillar, not a new idea; (2) separate-and-orthogonal 3DGS render-buffer idea kept for sprint-5+ pickup (different crate home, different math role).

sprint-4-execution-plan.md (W1): patched W1's wrong acronym expansion ("Efficient Weighted Adjacency" → Elliptical Weighted Average, Heckbert origin, Pillar 6 in JC framework, real code locations cited).
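
The sandwich product Σ_n = M_n·...·M_1·Σ_0·M_1ᵀ·...·M_nᵀ named in this commit message can be illustrated for the 2×2 case. This is a generic sketch, not the code in ewa_sandwich.rs; the names `sandwich`, `propagate` and `is_psd` and the tolerance are hypothetical.

```rust
/// Symmetric 2×2 matrix [[a, b], [b, d]] stored as [a, b, d].
pub type Sym2 = [f64; 3];
/// General 2×2 matrix in row-major order.
pub type Mat2 = [[f64; 2]; 2];

/// One hop: Σ' = M·Σ·Mᵀ.
pub fn sandwich(m: &Mat2, s: &Sym2) -> Sym2 {
    // P = M·Σ
    let p00 = m[0][0] * s[0] + m[0][1] * s[1];
    let p01 = m[0][0] * s[1] + m[0][1] * s[2];
    let p10 = m[1][0] * s[0] + m[1][1] * s[1];
    let p11 = m[1][0] * s[1] + m[1][1] * s[2];
    // Σ' = P·Mᵀ; only the upper triangle is needed because Σ' is symmetric.
    [
        p00 * m[0][0] + p01 * m[0][1],
        p00 * m[1][0] + p01 * m[1][1],
        p10 * m[1][0] + p11 * m[1][1],
    ]
}

/// Multi-hop propagation: hops[0] is M_1 and is applied first.
pub fn propagate(sigma0: &Sym2, hops: &[Mat2]) -> Sym2 {
    hops.iter().fold(*sigma0, |s, m| sandwich(m, &s))
}

/// PSD check with a small tolerance on the determinant (assumption).
pub fn is_psd(s: &Sym2) -> bool {
    s[0] >= 0.0 && s[2] >= 0.0 && s[0] * s[2] - s[1] * s[1] >= -1e-12
}
```

Since xᵀ(MΣMᵀ)x = (Mᵀx)ᵀΣ(Mᵀx) ≥ 0 whenever Σ is PSD, each hop preserves positive semi-definiteness in exact arithmetic; a numerical check like `is_psd` only guards against rounding.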

chatgpt-codex-connector Bot reviewed Apr 29, 2026


AdaWorldAPI merged commit ebba4f2 into main Apr 29, 2026
2 of 6 checks passed

AdaWorldAPI merged 1 commit into main from claude/jc-sigma-codebook-probe

2 participants
