jc: Σ-Codebook Viability Probe — empirically rules out CausalEdge64 8→16 byte expansion #288

Merged: AdaWorldAPI merged 1 commit into main from claude/jc-sigma-codebook-probe on Apr 29, 2026
@AdaWorldAPI (Owner)

What

Empirical measurement, NOT a theorem proof. Decides between Σ-edge encoding strategies before any production CausalEdgeTensor / propagate() code is written.

Result

✓ CODEBOOK VIABLE  R² = 0.994900  threshold ≥ 0.99  (278 ms)
Edges: 10,000  Codebook: k=256  Used clusters: 256/256  k-Means iters: 40/100
Σ eigenvalue-ratio max: 4.00× (anisotropy spread indicator)

R² is the log-Euclidean coefficient of determination: the 256-entry codebook captures 99.49% of the variance in log-Σ space. That is just over the 0.99 threshold, so it passes, but tightly. It is sufficient for the 240-edge HighHeelBGZ container case.

Architecture decision unlocked

The result rules out the v3 "CausalEdgeTensor 8→16 byte" expansion path, which would have halved the 240-edges-per-container limit. Recommended path:

| Option | Strategy | Size |
|--------|----------|------|
| A | Σ-codebook (3.5 KB workspace-wide) + 1-byte sidecar per edge | 240 bytes of growth per container |
| C | SchemaSidecar Block 14/15 carries Σ-indices for 240 edges | 0 bytes of growth (uses reserved space) |

In both cases CausalEdge64 is unchanged, the HighHeelBGZ container's 240-edge hard limit is preserved, and the 7 consumer crates are untouched.
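
The size figures above follow from plain byte arithmetic. A minimal sketch, assuming the edge payload region of a container stays fixed at 240 × 8 bytes; the constant and function names are illustrative and not taken from the crate:

```rust
/// Size of one CausalEdge64 in bytes (64 bits).
pub const EDGE_BYTES: usize = 8;
/// Hard limit on edges per HighHeelBGZ container, as stated in this PR.
pub const EDGES_PER_CONTAINER: usize = 240;

/// Bytes occupied by the edge payload of a full container.
pub const fn edge_payload_bytes() -> usize {
    EDGE_BYTES * EDGES_PER_CONTAINER
}

/// How many edges of `edge_bytes` each fit into the same payload region.
/// ASSUMPTION: the payload region is not allowed to grow.
pub const fn edges_that_fit(edge_bytes: usize) -> usize {
    edge_payload_bytes() / edge_bytes
}

/// Extra bytes per container when every edge gets a separate index
/// of `index_bytes` (Option A uses a 1-byte codebook index).
pub const fn sidecar_growth_bytes(index_bytes: usize) -> usize {
    EDGES_PER_CONTAINER * index_bytes
}
```

Under that assumption, `edges_that_fit(16)` shows the halving that the 8→16 byte expansion would cause, and `sidecar_growth_bytes(1)` reproduces the growth quoted for Option A.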

What was measured

Synthesis of 10,000 plausible CausalEdges with realistic field distributions:

  • frequency: u² (Beta-shaped, biased low): most edges are weakly populated
  • confidence: 1−(1−u)² (Beta-shaped, biased high): when present, decisive
  • direction: uniform over the half-open range 0..8, i.e. 8 discrete directions
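
The three field distributions can be sketched as follows. This is an illustration under stated assumptions, not the probe's code: the `SynthEdge` struct, the SplitMix64 generator, and the use of independent uniform draws for frequency and confidence are all choices made here for the example.

```rust
/// Hypothetical stand-in for the synthesized edge fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SynthEdge {
    pub frequency: f64,  // in [0, 1), biased low
    pub confidence: f64, // in [0, 1), biased high
    pub direction: u8,   // 0 through 7
}

/// Small deterministic generator (SplitMix64) so that runs are reproducible
/// without external crates. ASSUMPTION: the real probe may use another RNG.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0, 1) built from the top 53 bits.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Draw `n` edges; the same `seed` always yields the same edges.
pub fn synth_edges(n: usize, seed: u64) -> Vec<SynthEdge> {
    let mut rng = SplitMix64(seed);
    (0..n)
        .map(|_| {
            let u = rng.next_f64();
            let v = rng.next_f64();
            SynthEdge {
                frequency: u * u,                        // u²: mass near 0
                confidence: 1.0 - (1.0 - v) * (1.0 - v), // 1−(1−v)²: mass near 1
                direction: (rng.next_u64() % 8) as u8,   // uniform over 0..8
            }
        })
        .collect()
}
```

Seeding the generator explicitly is what makes a determinism check such as `edge_synthesizer_is_deterministic` possible: two calls with the same seed must return identical vectors.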

Each edge is mapped to a 2×2 SPD Σ via a semantically motivated derivation:

  • strength = freq × conf (overall scale)
  • anisotropy = conf (high confidence → narrow major axis)
  • rotation = direction · π/8
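
One way such a mapping could be realized is to build Σ = R(θ) · diag(major, minor) · R(θ)ᵀ. The sketch below is hypothetical: the PR does not state how strength and anisotropy become eigenvalues, so the floor of 0.05 and the linear axis ratio from 1 to 4 are assumptions (the factor 4 is chosen only to stay within the reported maximum eigenvalue ratio).

```rust
use std::f64::consts::PI;

/// Symmetric 2×2 matrix [[a, b], [b, c]] stored as [a, b, c].
pub type Sym2 = [f64; 3];

/// Map edge fields to a 2×2 SPD matrix (illustrative formula only).
pub fn edge_to_sigma(frequency: f64, confidence: f64, direction: u8) -> Sym2 {
    let strength = frequency * confidence; // overall scale
    let anisotropy = confidence;           // drives the axis ratio
    let theta = f64::from(direction) * PI / 8.0;

    // ASSUMPTION: a small floor keeps Σ strictly positive definite
    // even when strength is exactly 0.
    let major = 0.05 + strength;
    // ASSUMPTION: the axis ratio grows linearly from 1 to 4 with anisotropy.
    let minor = major / (1.0 + 3.0 * anisotropy);

    // Σ = R(θ) · diag(major, minor) · R(θ)ᵀ, written out entry by entry.
    let (s, c) = theta.sin_cos();
    [
        major * c * c + minor * s * s,
        (major - minor) * s * c,
        major * s * s + minor * c * c,
    ]
}

/// Sylvester's criterion for a symmetric 2×2 matrix:
/// positive leading entry and positive determinant.
pub fn is_spd(m: &Sym2) -> bool {
    m[0] > 0.0 && m[0] * m[2] - m[1] * m[1] > 0.0
}
```

Because both eigenvalues are strictly positive by construction, every output passes `is_spd`, which is the property a test like `edge_to_sigma_produces_spd` would check.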

Then Lloyd's k-means is run in 3D log-Euclidean space (the standard linearization of the affine-invariant Riemannian metric on the SPD manifold). The computed quantity:

R² = 1 − Σ d²(Σ_i, codebook[assignment_i]) / Σ d²(Σ_i, Σ_global_mean)
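
The R² formula can be sketched directly on log-Σ coordinates. This is a minimal illustration, assuming the matrices have already been mapped to 3-vectors [a, b, c] and that the codebook is non-empty; the function names are hypothetical, and the squared distance is passed in as a parameter so that the choice of metric stays explicit.

```rust
/// Log-Σ coordinates [a, b, c] of a symmetric 2×2 matrix [[a, b], [b, c]].
pub type LogSym2 = [f64; 3];
/// Squared-distance function on log-Σ coordinates.
pub type Dist2 = fn(&LogSym2, &LogSym2) -> f64;

/// Plain squared Euclidean distance on the three coordinates.
pub fn sq_euclid(a: &LogSym2, b: &LogSym2) -> f64 {
    (0..3).map(|k| (a[k] - b[k]) * (a[k] - b[k])).sum()
}

/// Index of the codebook entry closest to `p`. Requires a non-empty codebook.
pub fn nearest(p: &LogSym2, codebook: &[LogSym2], d2: Dist2) -> usize {
    let mut best = 0;
    let mut best_d = f64::INFINITY;
    for (i, c) in codebook.iter().enumerate() {
        let d = d2(p, c);
        if d < best_d {
            best = i;
            best_d = d;
        }
    }
    best
}

/// R² = 1 − SSE / SST, where SSE sums squared distances to the assigned
/// codebook entry and SST sums squared distances to the global mean.
/// Returns NaN for empty input or when all points coincide (SST = 0).
pub fn r_squared(points: &[LogSym2], codebook: &[LogSym2], d2: Dist2) -> f64 {
    let n = points.len() as f64;
    // The arithmetic mean in log coordinates is the log-Euclidean mean.
    let mut mean = [0.0; 3];
    for p in points {
        for k in 0..3 {
            mean[k] += p[k] / n;
        }
    }
    let sse: f64 = points
        .iter()
        .map(|p| d2(p, &codebook[nearest(p, codebook, d2)]))
        .sum();
    let sst: f64 = points.iter().map(|p| d2(p, &mean)).sum();
    1.0 - sse / sst
}
```

A codebook that contains only the global mean gives R² = 0, and a codebook that reproduces every point gives R² = 1, so the value reads as the fraction of log-space variance the codebook explains.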

Architecturally sound because

  1. CausalEdge64 stays bit-exact unchanged, so the 7 consumer crates are untouched
  2. The HighHeelBGZ 240-edges/2KB hard limit is preserved
  3. Codebook overhead is negligible (3.5 KB workspace-wide vs. 1.92 KB per container)
  4. The bgz17-Palette pattern (also k=256) is the natural sibling: same architectural logic, applied to white-matter Σ instead of S/P/O

Honest Limitations

  • 2×2 SPD was chosen for closed-form math; a full 3×3 anisotropic Σ would be harder to cluster, but the 2×2 case demonstrates the principle
  • The synthesized distribution is plausible, not measured from production; re-run with a real stream to confirm
  • R² = 0.9949 is just over the threshold; for multi-hop queries of more than 5 hops the cumulative error may become relevant, so callers should evaluate based on their use case
  • Log-Euclidean ≠ affine-invariant; the two agree to first order and diverge in the tails

Files

  • crates/jc/src/sigma_codebook_probe.rs: new, ~370 lines incl. 6 unit tests
  • crates/jc/src/lib.rs: module declaration with an explicit "NOT a pillar" note
  • crates/jc/examples/sigma_probe.rs: new standalone runner with decision output
  • crates/jc/Cargo.toml: new [[example]] entry

Not added to the run_all_pillars list: probes are diagnostic, pillars are theorem proofs. Different category.

Tests (6/6 green)

  • sym2_log_exp_round_trip (matrix log/exp round trip)
  • identity_log_is_zero (sanity: log(I) = 0)
  • edge_synthesizer_is_deterministic (RNG state guard)
  • edge_to_sigma_produces_spd (mapping always yields SPD)
  • kmeans_converges_on_separable_data (algorithm sanity)
  • probe_runs_and_reports_meaningful_result (end-to-end)
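
The first two tests exercise a closed-form matrix logarithm and exponential for symmetric 2×2 matrices. A sketch of how that closed form can look is given below; the names `sym2_log` and `sym2_exp` are guessed from the test names, and the signatures and the `apply_spectral` helper are assumptions, not the crate's API.

```rust
/// Symmetric 2×2 matrix [[a, b], [b, c]] stored as [a, b, c].
pub type Sym2 = [f64; 3];

/// Apply a scalar function to the eigenvalues of a symmetric 2×2 matrix.
/// With mean = (a + c) / 2 and r = sqrt(((a − c) / 2)² + b²), the
/// eigenvalues are mean ± r and
/// f(M) = avg · I + slope · (M − mean · I).
fn apply_spectral(m: &Sym2, f: impl Fn(f64) -> f64) -> Sym2 {
    let mean = 0.5 * (m[0] + m[2]);
    let half_diff = 0.5 * (m[0] - m[2]);
    let r = (half_diff * half_diff + m[1] * m[1]).sqrt();
    if r < 1e-15 {
        // M is (numerically) a multiple of the identity.
        let v = f(mean);
        return [v, 0.0, v];
    }
    let (f_hi, f_lo) = (f(mean + r), f(mean - r));
    let avg = 0.5 * (f_hi + f_lo);
    let slope = (f_hi - f_lo) / (2.0 * r);
    [avg + slope * half_diff, slope * m[1], avg - slope * half_diff]
}

/// Matrix logarithm; the input must be SPD so both eigenvalues are positive.
pub fn sym2_log(m: &Sym2) -> Sym2 {
    apply_spectral(m, f64::ln)
}

/// Matrix exponential; maps any symmetric matrix back to an SPD matrix.
pub fn sym2_exp(m: &Sym2) -> Sym2 {
    apply_spectral(m, f64::exp)
}
```

With these two functions, `sym2_exp(&sym2_log(&m))` should reproduce `m` up to floating-point error for any SPD `m`, and the logarithm of the identity is the zero matrix.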

Verification

cargo test --manifest-path crates/jc/Cargo.toml --release sigma_codebook
# 6 passed; 0 failed

cargo run --manifest-path crates/jc/Cargo.toml --release --example sigma_probe
# ✓ CODEBOOK VIABLE  R²=0.994900  threshold≥0.99

Unblocks

With this probe result, a CausalEdgeTensor variant can now be designed as a 9-byte sidecar (CausalEdge64 + 1-byte Σ-codebook index), OR equivalently via SchemaSidecar Block 14/15. The choice is the caller's; both are architecturally viable.

Empirical measurement (NOT a theorem proof). Decides between Σ-edge encoding
strategies BEFORE any production CausalEdgeTensor / propagate() code is
written.

# Result

  ✓ CODEBOOK VIABLE  R² = 0.994900  threshold ≥ 0.99  (278 ms)
  Edges: 10,000  Codebook: k=256  Used clusters: 256/256  k-Means iters: 40/100

R² is the log-Euclidean coefficient of determination — 99.49% of the variance
in log-Σ space is captured by the 256-entry codebook. Just over the 0.99
threshold; passes but tightly. For the 240-edge HighHeelBGZ container case
this is sufficient.

# Decision unlocked

Result rules out the v3 "CausalEdgeTensor 8→16 Byte" expansion (which would
have halved the 240-edge-per-container limit). Recommended path:

  Option A: Σ-Codebook (3.5 KB workspace) + 1-byte sidecar per edge
            CausalEdge64 unchanged. 240 edges per container preserved.

  Option C: SchemaSidecar Block 14/15 carries Σ-indices for 240 edges.
            No new sidecar pipe; uses already-reserved 16k-fingerprint blocks.

# What was measured

Synthesize 10,000 plausible CausalEdges with realistic field distributions:
  - frequency:  u² (Beta-shaped, biased low)         — most edges weak
  - confidence: 1−(1−u)² (Beta-shaped, biased high)  — when present, decisive
  - direction:  uniform discrete 0..8                — 8 discrete bearings

Map each edge to a 2×2 SPD Σ via reasoned semantic mapping:
  - strength    = freq × conf  (overall scale)
  - anisotropy  = conf         (high conf → narrow major axis)
  - rotation    = direction · π/8

Run Lloyd's k-Means in 3D log-Euclidean space (the standard linearization of
the affine-invariant Riemannian metric on the SPD cone). Compute:
  R² = 1 − Σ d²(Σ_i, codebook[assignment_i]) / Σ d²(Σ_i, Σ_global_mean)

# Architecturally sound because

1. CausalEdge64 stays bit-exact unchanged — 7 consumer crates unaffected
2. HighHeelBGZ 240-edges/2KB hard limit preserved
3. Codebook overhead negligible (3.5 KB workspace-wide vs 1.92 KB per container)
4. Bgz17-Palette pattern (also k=256) is the natural sibling — same
   architectural logic, applied to white-matter Σ instead of S/P/O

# Honest limitations

- 2×2 SPD chosen for closed-form math; full 3×3 anisotropic Σ would be
  harder to cluster — but the 2×2 case demonstrates the principle
- Synthesized distribution is plausible, not measured from production —
  re-run with real-stream data to confirm
- R² = 0.9949 is just over threshold; for >5-hop multi-hop queries the
  cumulative error may matter — caller should evaluate based on use case
- Log-Euclidean ≠ affine-invariant; agree to first order, diverge in tails

# Files

- crates/jc/src/sigma_codebook_probe.rs (new, ~370 lines incl. 6 unit tests)
- crates/jc/src/lib.rs (module declaration with explicit "NOT a pillar" note)
- crates/jc/examples/sigma_probe.rs (new standalone runner with decision output)
- crates/jc/Cargo.toml (new [[example]] entry)

NOT added to run_all_pillars — probes are diagnostic, pillars are theorem
proofs. Different category.

# Run

  cargo test --manifest-path crates/jc/Cargo.toml --release sigma_codebook
  cargo run --manifest-path crates/jc/Cargo.toml --release --example sigma_probe

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 43e72f1e19


Comment on lines +193 to +197
fn sq_dist3(a: &[f64; 3], b: &[f64; 3]) -> f64 {
    let dx = a[0] - b[0];
    let dy = a[1] - b[1];
    let dz = a[2] - b[2];
    dx * dx + dy * dy + dz * dz

P1: Use Frobenius-weighted distance for log-Σ clustering

The probe states that k-means and R² are computed in log-Euclidean/Frobenius space, but sq_dist3 treats [a,b,c] as plain Euclidean coordinates and uses dx²+dy²+dz². For symmetric 2×2 matrices, Frobenius distance is Δa² + 2·Δb² + Δc², so the off-diagonal term is currently underweighted by 2x. This changes centroid assignments and SSE/SST, which can shift r_squared around the 0.99 decision threshold and lead to a wrong architecture recommendation.
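
The weighting described in this comment can be sketched in two equivalent ways: weight the off-diagonal term by 2 in the distance itself, or rescale the off-diagonal coordinate by √2 once so that plain Euclidean distance equals Frobenius distance. The function names below are illustrative; only the formula Δa² + 2·Δb² + Δc² comes from the review.

```rust
use std::f64::consts::SQRT_2;

/// Symmetric 2×2 matrix [[a, b], [b, c]] stored as [a, b, c].
pub type Sym2 = [f64; 3];

/// Plain Euclidean squared distance on [a, b, c], matching the shape of
/// the reviewed code (complete function, for comparison).
pub fn sq_dist3(a: &Sym2, b: &Sym2) -> f64 {
    let dx = a[0] - b[0];
    let dy = a[1] - b[1];
    let dz = a[2] - b[2];
    dx * dx + dy * dy + dz * dz
}

/// Squared Frobenius distance between the two symmetric matrices.
/// The off-diagonal entry appears twice in the full matrix, hence the 2.
pub fn sq_dist_frobenius(a: &Sym2, b: &Sym2) -> f64 {
    let dx = a[0] - b[0];
    let dy = a[1] - b[1];
    let dz = a[2] - b[2];
    dx * dx + 2.0 * dy * dy + dz * dz
}

/// Alternative: rescale the off-diagonal coordinate once. Euclidean
/// distance between rescaled vectors then equals Frobenius distance.
pub fn to_frobenius_coords(m: &Sym2) -> Sym2 {
    [m[0], SQRT_2 * m[1], m[2]]
}
```

The rescaling variant leaves the rest of a k-means loop untouched, since centroids remain arithmetic means; coordinates are converted once on input and divided by √2 again when a codebook entry is read back as a matrix.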


@AdaWorldAPI AdaWorldAPI merged commit ebba4f2 into main Apr 29, 2026
2 of 6 checks passed
AdaWorldAPI pushed a commit that referenced this pull request May 13, 2026
…s Pillar 6 (Σ push-forward), not a 3D renderer

User pointed to actual source: crates/jc/src/ewa_sandwich.rs +
crates/lance-graph-contract/src/sigma_propagation.rs + .claude/plans/
jc-pillars-runtime-wiring-v1{,-ERRATUM}.md + crates/jc/examples/
osint_edge_traversal.rs + splat_perturbationslernen.rs.

Real architecture: EWA-Sandwich = Σ_n = M_n·...·M_1·Σ_0·M_1ᵀ·...·M_nᵀ
along multi-hop edge paths. Pillar 6 certifies PSD-preservation
(10000/10000 hops in probe) + log-norm concentration at
Köstenberger-Stark rate (CV tightness 1.467×). Combined with Pillar
5 (Jirak scalar), 5+ (KS Σ-tensor), 5++ (DZ Hilbert), 7
(α-saturation), the full aggregation substrate sits on certified
ground.

Plus PR #288 (Σ-codebook viability, R² = 0.9949) rules out
CausalEdge64 8→16 byte expansion — 256-entry codebook with 1-byte
sidecar is sufficient.

EPIPHANIES: prepend CORRECTION-OF entry with the real math kernel +
pillar-stack composition; keep the original splat-conjecture entry
intact (append-only).

IDEAS: split into two distinct rows — (1) CORRECTION acknowledging
EWA-Sandwich is an existing certified pillar not a new idea;
(2) separate-and-orthogonal 3DGS render-buffer idea kept for
sprint-5+ pickup (different crate home, different math role).

sprint-4-execution-plan.md (W1): patched W1's wrong acronym
expansion ("Efficient Weighted Adjacency" → Elliptical Weighted
Average, Heckbert origin, Pillar 6 in JC framework, real code
locations cited).