docs(knowledge+board): SIMD alien-magic capture — E-SIMD-SWEEP-1 + 13 TD entries + wave plan #400
Architectural capture commit per autoattended-multiagent-pattern §3
("P0 fixes land BEFORE the next sprint PR opens"). The simd-savant
PRE-MERGE audit of origin/main surfaced 158 raw-intrinsic violations
across 5 consumer crates (mul.rs, blasgraph/types + bridge, holograph
hamming, bgz17 simd + prefetch, thinking-engine VNNI dispatch) plus
3 missing primitives in ndarray::simd that block clean remediation.
Before any worker spawns to fix PR #398's codex P1/P2 findings, the
architectural shape needs writing so every wave-1 worker briefs
against the same canonical reference.
Files (4):
1. .claude/knowledge/ndarray-vertical-simd-alien-magic.md (NEW, 144L)
The canonical reference. "The Click" statement (ndarray ships
struct methods on typed wrappers + closure-parameterized batch
primitives; consumers compose with domain enums). Per-workload
surface table covering palette L1-L4, spatial splat, blasgraph-
over-palette, i4 packed qualia, hamming over u64, signature
kernels. W1a (5 ndarray PRs) + W1b (5 consumer migrations) +
W1.5 (3 sigker primitives, gated on jc Pillar 11) wave plan.
Cross-links to simd-savant card, autoattended-pattern §14,
sigker lib.rs, Jirak iron rule. Litmus tests for surface
proposals.
2. .claude/board/EPIPHANIES.md (PREPEND E-SIMD-SWEEP-1)
Captures the 158-violation finding: PR #398 was the 5th
violation, not the first. Doctrinal claim: the SIMD source-of-
truth invariant is retroactive, not just forward. Full AP-SIMD-N
breakdown (117 / 8 / 13 / 7 / 19 / 0 / 2 / 13 = 158). Strategic
angle on sigker as Index-regime third lane that bypasses
I-NOISE-FLOOR-JIRAK. Doctrinal counterpart to E-META-10 /
I-LEGACY-API-FEATURE-GATED (same retroactive-sweep shape).
3. .claude/board/TECH_DEBT.md (PREPEND 13 entries)
- W1a (5 entries): TD-NDARRAY-SIMD-UNPACK-I4-16D,
-SATURATING-ABS-I8, -GATHER, -PREFETCH, -POPCOUNT-U64.
Each with severity, required API surface (Required ndarray
PR), and cross-refs.
- W1.5 (3 entries, DEFERRED): TD-NDARRAY-SIMD-SIGNATURE-PDE-SWEEP,
-RANDOMIZED-PROJECTION, -LYNDON-PACK. Gated on sigker
benchmarking + jc Pillar 11 certification.
- W1b consumer migrations (5 entries): TD-SIMD-SWEEP-W1
(holograph), W2 (blasgraph), W3 (bgz17), W4 (mul.rs follow-
up, P0), W5 (thinking-engine VNNI).
4. CLAUDE.md § Knowledge Base (+1 row)
Inventory entry for the new knowledge doc per Mandatory
Board-Hygiene Rule.
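Of the W1a entries listed above, POPCOUNT-U64 (the primitive behind "hamming over u64") has the simplest scalar reference semantics. The sketch below shows the behaviour any vectorised path would presumably need to reproduce. The function name `hamming_u64` is hypothetical and is not taken from ndarray or holograph.

```rust
/// Scalar reference for Hamming distance over u64 words.
/// Hypothetical helper: a baseline to compare a SIMD popcount path against.
pub fn hamming_u64(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    a.iter()
        .zip(b)
        // XOR leaves a 1 in every bit position where the words differ;
        // count_ones is the scalar popcount.
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}
```

A SIMD implementation can be checked lane by lane against this function on random inputs, including lengths that are not a multiple of the vector width.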
Notable architectural decisions captured (so workers don't re-derive):
- The "alien magic" shape is struct methods + closure-batch primitives,
NOT free functions or consumer-side traits. The polyfill is the
single channel; consumers compose via closures.
- Direction B for codex P2 i8::MIN: scalar is buggy
(`unsigned_abs() as i8` wraps i8::MIN → -128), AVX-512 is correct
(`_mm512_abs_epi8` saturates to 127 by ISA). Per spec line 233 of
pr-sprint-13-simd-i4.md: |signed_mantissa| ≤ 1 → ValleyOfDespair
means "weak rule signal", not "sign-extreme". Verdict from
PP-16 preflight-drift-auditor 2026-05-16.
- Narrow scope for mul.rs follow-up + 4 separate sweep PRs, NOT one
mega-sweep. Each violator has a distinct missing-primitive
blocker; bundling would conflate 4 unrelated correctness reviews.
Per PP-14 convergence-architect SYNERGY 3 verdict.
- sigker positioning: Index-regime third codec lane (alongside
bgz17 palette-distance and deepnsm NSM tiling). Bypasses Jirak
noise floor via Hambly-Lyons 2010 uniqueness. Activates as a
first-class W1.5 consumer when jc Pillar 11 trips. Zero raw
intrinsics today — cleanest exemplar of "domain crate composes
via closures" pattern.
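The "struct methods + closure-batch primitives" shape from the first decision can be sketched in scalar Rust. Everything below is an illustration under stated assumptions, not the ndarray API. The names `I8x16`, `from_i4_packed_u64`, `saturating_abs` and `batch_packed_i4_16<E, F>` come from this capture, but their signatures, the nibble order (lane i in bits 4i..4i+4), the `to_array` accessor, the `Signal::Strong` variant and the per-vector classification rule are all hypothetical.

```rust
/// Hypothetical scalar stand-in for a typed SIMD wrapper with 16 i8 lanes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I8x16([i8; 16]);

impl I8x16 {
    /// Unpack 16 signed 4-bit values from one u64.
    /// Assumption: lane i occupies bits 4i..4i+4 (low nibble first).
    pub fn from_i4_packed_u64(packed: u64) -> Self {
        let mut lanes = [0i8; 16];
        for (i, lane) in lanes.iter_mut().enumerate() {
            let nibble = ((packed >> (4 * i)) & 0xF) as u8;
            // Move the nibble into the high half, then arithmetic-shift
            // back down so the sign bit is extended.
            *lane = ((nibble << 4) as i8) >> 4;
        }
        I8x16(lanes)
    }

    /// Lane-wise saturating abs (i8::MIN maps to i8::MAX).
    pub fn saturating_abs(self) -> Self {
        I8x16(self.0.map(i8::saturating_abs))
    }

    /// Hypothetical accessor used only by this sketch.
    pub fn to_array(self) -> [i8; 16] {
        self.0
    }
}

/// Closure-parameterized batch entry: the library side owns the loop and the
/// unpack; the consumer supplies only the per-vector mapping.
pub fn batch_packed_i4_16<E, F>(packed: &[u64], mut f: F) -> Vec<E>
where
    F: FnMut(I8x16) -> E,
{
    packed
        .iter()
        .map(|&word| f(I8x16::from_i4_packed_u64(word)))
        .collect()
}

/// Consumer-side domain enum. `Strong` is a made-up variant for the sketch.
#[derive(Debug, PartialEq)]
pub enum Signal {
    ValleyOfDespair,
    Strong,
}

/// Consumer code: no intrinsics, only wrapper methods inside a closure.
/// The "peak magnitude <= 1" rule is an illustrative assumption.
pub fn classify(packed: &[u64]) -> Vec<Signal> {
    batch_packed_i4_16(packed, |v| {
        let peak = v.saturating_abs().to_array().iter().copied().max().unwrap_or(0);
        if peak <= 1 { Signal::ValleyOfDespair } else { Signal::Strong }
    })
}
```

The point of the shape is that `classify` never touches an intrinsic: swapping the scalar body of `I8x16` for an AVX-512 or NEON one would leave the consumer unchanged.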
Pre-spawn verdict from simd-savant: the original "narrow scope"
plan was insufficient given the audit; the ndarray-first wave is
now mandatory (not just preferred). Workers W1a-#1 through W1a-#5
can spawn in parallel against adaworldapi/ndarray master once this
capture commit lands.
https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1141770710
Five small primitives, each on its own branch, each auditable by `simd-savant` before merge:

1. **`TD-NDARRAY-SIMD-UNPACK-I4-16D`**: `I8x16::from_i4_packed_u64` + `I8x16::lane_i8::<N>` + `batch_packed_i4_16<E, F>` closure-batch entry. AVX-512 path via `_mm512_cvtepi8_epi16` + nibble shuffle; NEON via `vshl_n_s8` / `vqshl_n_s8`; scalar via fused-loop fallback. Bounds-aware tail.
2. **`TD-NDARRAY-SIMD-SATURATING-ABS-I8`**: `I8x16::saturating_abs(self) -> Self`. AVX-512 `_mm512_abs_epi8` (saturates `i8::MIN → 127` by ISA); NEON `vqabsq_s8`; scalar `i8::saturating_abs`. Closes codex P2 i8::MIN divergence on PR #398 by giving consumers a single source-of-truth for "the abs that matches hardware semantics."
Correct the AVX-512 abs semantics
When a W1a worker follows this canonical plan to implement `I8x16::saturating_abs`, using `_mm512_abs_epi8` does not saturate `i8::MIN` to 127: `VPABSB` leaves the `0x80` lane value as `0x80`, which is still -128 when represented as `i8`. That means the documented fix for the PR #398 `i8::MIN` divergence would preserve the bad AVX-512 behavior instead of matching `i8::saturating_abs`/NEON `vqabsq_s8`; the primitive needs an extra clamp/remap of `0x80` to `0x7f` or the docs should not call this instruction saturating.
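The distinction drawn in this comment can be checked on the scalar side with Rust's standard integer methods. The sketch below is a minimal model, not ndarray code: `unsigned_abs_cast` and `abs_wrapping_then_clamp` are hypothetical helper names, and `wrapping_abs` is used only as a scalar stand-in for a non-saturating lane-wise abs.

```rust
/// The scalar expression quoted in the PR. For i8::MIN, unsigned_abs() yields
/// 128u8, and casting that back to i8 wraps to -128.
pub fn unsigned_abs_cast(x: i8) -> i8 {
    x.unsigned_abs() as i8
}

/// Scalar model of a non-saturating abs followed by the remap the review
/// asks for: the one value that cannot be negated (0x80) is clamped to 0x7f.
pub fn abs_wrapping_then_clamp(x: i8) -> i8 {
    let a = x.wrapping_abs(); // i8::MIN stays i8::MIN (bit pattern 0x80)
    if a == i8::MIN { i8::MAX } else { a }
}

/// Exhaustive check over all 256 inputs that the remapped result agrees
/// with the standard library's saturating abs.
pub fn remap_matches_saturating_abs() -> bool {
    (i8::MIN..=i8::MAX).all(|x| abs_wrapping_then_clamp(x) == x.saturating_abs())
}
```

Because the input domain has only 256 values, an exhaustive comparison like `remap_matches_saturating_abs` is cheap enough to run as a unit test against every backend of the primitive.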
Architectural capture commit per autoattended-multiagent-pattern §3 ("P0 fixes land BEFORE the next sprint PR opens"). The simd-savant PRE-MERGE audit of `origin/main` surfaced 158 raw-intrinsic violations across 5 consumer crates plus 3 missing primitives in `ndarray::simd` that block clean remediation. Before any worker spawns to fix PR #398's codex P1/P2 findings, the architectural shape needs writing so every wave-1 worker briefs against the same canonical reference.

What lands here (no code, only docs + board)

1. `.claude/knowledge/ndarray-vertical-simd-alien-magic.md` (NEW, 144 LOC)

The canonical reference. The Click: ndarray ships struct methods on typed wrappers (`I8x16`, `U8x32`, `F32x16`, `U64x8`, …) plus closure-parameterized batch primitives; consumers compose with domain enums via closures. Per-workload surface table covers palette L1-L4, spatial splat, blasgraph-over-palette, i4 packed qualia, hamming, signature kernels.

Why this shape: struct methods over free fns (surface fragments) and over consumer-side traits (single-channel property lost). The "alien magic" is that ndarray is designed AS-IF it had clairvoyant knowledge of our exact workloads: the polyfill is the cellular chemistry, consumer crates assemble organs from struct methods.

Wave plan:

- `I8x16::from_i4_packed_u64` + `batch_packed_i4_16<E,F>`; `I8x16::saturating_abs` (codex P2 Direction B); `U16x8::gather_u16`; cross-arch `prefetch_read_t0`; `U64x8::popcnt`
- … `jc Pillar 11` activation): signature-PDE-sweep, randomized-projection, lyndon-pack, for `crates/sigker`

2. `.claude/board/EPIPHANIES.md`: PREPEND `E-SIMD-SWEEP-1`

Full AP-SIMD-N breakdown (117 + 8 + 13 + 7 + 19 + 0 + 2 + 13 = 158). Doctrinal counterpart to `E-META-10` / `I-LEGACY-API-FEATURE-GATED` (same retroactive-sweep shape). Strategic angle: sigker as Index-regime third lane bypassing `I-NOISE-FLOOR-JIRAK`.

3. `.claude/board/TECH_DEBT.md`: PREPEND 13 entries

- `TD-NDARRAY-SIMD-{UNPACK-I4-16D, SATURATING-ABS-I8, GATHER, PREFETCH, POPCOUNT-U64}`, each with required API surface for the corresponding ndarray PR
- `TD-NDARRAY-SIMD-{SIGNATURE-PDE-SWEEP, RANDOMIZED-PROJECTION, LYNDON-PACK}`
- `TD-SIMD-SWEEP-{W1 holograph, W2 blasgraph, W3 bgz17, W4 mul.rs P0, W5 thinking-engine}`

4. `CLAUDE.md` § Knowledge Base (+1 row)

Inventory entry for the new knowledge doc per Mandatory Board-Hygiene Rule.

Architectural decisions captured (so workers don't re-derive)

- … (`unsigned_abs() as i8` wraps `i8::MIN → -128`), AVX-512 is correct (`_mm512_abs_epi8` saturates to 127 by ISA). Per spec line 233 of `pr-sprint-13-simd-i4.md`. Verdict from PP-16 preflight-drift-auditor.
- … `jc Pillar 11` trips.

Pre-spawn verdict

simd-savant says the original "narrow scope" plan was insufficient given the 158-violation audit; the ndarray-first wave is now mandatory, not preferred. Workers W1a-#1 through W1a-#5 can spawn in parallel against `adaworldapi/ndarray master` once this capture commit lands.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
Generated by Claude Code