
docs(knowledge+board): SIMD alien-magic capture — E-SIMD-SWEEP-1 + 13 TD entries + wave plan #400

Merged: AdaWorldAPI merged 1 commit into `main` from `claude/simd-alien-magic-capture` on May 16, 2026.


@AdaWorldAPI (Owner)

Architectural capture commit per autoattended-multiagent-pattern §3 ("P0 fixes land BEFORE the next sprint PR opens"). The simd-savant PRE-MERGE audit of origin/main surfaced 158 raw-intrinsic violations across 5 consumer crates, plus 3 missing primitives in ndarray::simd that block clean remediation. Before any worker spawns to fix PR #398's codex P1/P2 findings, the architectural shape has to be written down so that every wave-1 worker briefs against the same canonical reference.

What lands here (no code, only docs + board)

1. .claude/knowledge/ndarray-vertical-simd-alien-magic.md (NEW, 144 LOC)

The canonical reference. The Click: ndarray ships struct methods on typed wrappers (I8x16, U8x32, F32x16, U64x8, …) plus closure-parameterized batch primitives; consumers compose with domain enums via closures. Per-workload surface table covers palette L1-L4, spatial splat, blasgraph-over-palette, i4 packed qualia, hamming, signature kernels.

Why this shape: struct methods beat both free functions (which fragment the surface) and consumer-side traits (which lose the single-channel property). The "alien magic" is that ndarray is designed as if it had clairvoyant knowledge of our exact workloads: the polyfill is the cellular chemistry, and consumer crates assemble organs from struct methods.
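A minimal sketch of that shape, assuming a hypothetical `I8x16` backed by `[i8; 16]` with scalar bodies only (the real ndarray wrappers and their SIMD dispatch are not shown): ndarray owns the struct method and the batch loop, while the consumer brings its domain enum in through a closure. `batch_i8x16`, `Signal` and `classify` are illustrative names, not existing APIs.

```rust
/// Hypothetical 16-lane i8 wrapper (illustrative stand-in, not the ndarray type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct I8x16(pub [i8; 16]);

impl I8x16 {
    /// Struct method: the single canonical entry point for this operation.
    /// Scalar reference only; a real polyfill would dispatch to AVX-512/NEON here.
    pub fn saturating_abs(self) -> Self {
        I8x16(self.0.map(i8::saturating_abs))
    }
}

/// Closure-parameterized batch primitive (hypothetical): the library owns the
/// loop, the consumer supplies the per-block domain logic through `f`.
pub fn batch_i8x16<E, F>(blocks: &[I8x16], mut f: F) -> Vec<E>
where
    F: FnMut(I8x16) -> E,
{
    blocks.iter().map(|&b| f(b)).collect()
}

/// Consumer-side domain enum (hypothetical); it lives in the consumer crate.
#[derive(Debug, PartialEq, Eq)]
pub enum Signal {
    Weak,
    Strong,
}

/// Consumer code: composes struct methods via a closure, no intrinsics in sight.
pub fn classify(blocks: &[I8x16]) -> Vec<Signal> {
    batch_i8x16(blocks, |b| {
        let peak = b.saturating_abs().0.iter().copied().max().unwrap_or(0);
        if peak <= 1 {
            Signal::Weak
        } else {
            Signal::Strong
        }
    })
}
```

The design point is that `classify` never names an intrinsic: swapping the scalar bodies for AVX-512 or NEON paths would leave every consumer untouched.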

Wave plan:

  • W1a (5 parallel ndarray PRs): I8x16::from_i4_packed_u64 + batch_packed_i4_16<E,F>; I8x16::saturating_abs (codex P2 Direction B); U16x8::gather_u16; cross-arch prefetch_read_t0; U64x8::popcnt
  • W1b (5 consumer migrations, gated on W1a merge): mul.rs follow-up (P0), bgz17 simd + prefetch, blasgraph types+bridge, holograph hamming, thinking-engine VNNI
  • W1.5 (DEFERRED, gated on jc Pillar 11 activation): signature-PDE-sweep, randomized-projection, lyndon-pack — for crates/sigker
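A scalar reference for the W1a-#1 unpack primitive might look like the following. It assumes a layout the PR does not spell out (16 two's-complement nibbles per `u64`, lane i in bits 4i..4i+4, low nibble first), and the `batch_packed_i4_16` signature here is a guess at the closure-batch shape, not the required API from TD-NDARRAY-SIMD-UNPACK-I4-16D.

```rust
/// Hypothetical 16-lane i8 wrapper (illustrative stand-in).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct I8x16(pub [i8; 16]);

impl I8x16 {
    /// Scalar reference: unpack 16 signed 4-bit values from one u64.
    /// Assumed layout: lane i occupies bits [4*i, 4*i + 4), low nibble first.
    pub fn from_i4_packed_u64(packed: u64) -> Self {
        let mut lanes = [0i8; 16];
        for (i, lane) in lanes.iter_mut().enumerate() {
            let nibble = ((packed >> (4 * i)) & 0xF) as u8;
            // Move the nibble into the high half of the byte, then shift it back
            // arithmetically so bit 3 is sign-extended: result is in [-8, 7].
            *lane = ((nibble << 4) as i8) >> 4;
        }
        I8x16(lanes)
    }
}

/// Hypothetical closure-batch entry: unpack each packed word, hand it to `f`.
pub fn batch_packed_i4_16<E, F>(packed: &[u64], mut f: F) -> Vec<E>
where
    F: FnMut(I8x16) -> E,
{
    packed
        .iter()
        .map(|&p| f(I8x16::from_i4_packed_u64(p)))
        .collect()
}
```

A SIMD path (for example the nibble-shuffle approach named in the TD entry) would have to reproduce exactly this lane order and sign extension, which is why a scalar reference is worth pinning down first.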

2. .claude/board/EPIPHANIES.md — PREPEND E-SIMD-SWEEP-1

PR #398 was the 5th violation, not the first; the SIMD source-of-truth invariant is retroactive.

Full AP-SIMD-N breakdown: 117 + 8 + 13 + 7 + 19 + 0 + 2 + 13 (these per-pattern counts sum to 179, which does not match the 158-violation headline). Doctrinal counterpart to E-META-10 / I-LEGACY-API-FEATURE-GATED (same retroactive-sweep shape). Strategic angle: sigker as Index-regime third lane bypassing I-NOISE-FLOOR-JIRAK.

3. .claude/board/TECH_DEBT.md — PREPEND 13 entries

  • W1a (5): TD-NDARRAY-SIMD-{UNPACK-I4-16D, SATURATING-ABS-I8, GATHER, PREFETCH, POPCOUNT-U64} — each with required API surface for the corresponding ndarray PR
  • W1.5 (3, DEFERRED): TD-NDARRAY-SIMD-{SIGNATURE-PDE-SWEEP, RANDOMIZED-PROJECTION, LYNDON-PACK}
  • W1b (5): TD-SIMD-SWEEP-{W1 holograph, W2 blasgraph, W3 bgz17, W4 mul.rs P0, W5 thinking-engine}
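To illustrate how TD-NDARRAY-SIMD-POPCOUNT-U64 lets holograph-style Hamming code compose from struct methods instead of raw intrinsics, here is a scalar-only sketch. `U64x8` is a stand-in, and `xor`, `reduce_add` and `hamming_512` are hypothetical helpers; on AVX-512 hardware with the VPOPCNTDQ extension, the per-lane count maps naturally onto `_mm512_popcnt_epi64`.

```rust
/// Hypothetical 8-lane u64 wrapper (512 bits), scalar reference only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U64x8(pub [u64; 8]);

impl U64x8 {
    /// Per-lane population count.
    pub fn popcnt(self) -> Self {
        U64x8(self.0.map(|x| u64::from(x.count_ones())))
    }

    /// Lane-wise XOR (hypothetical helper).
    pub fn xor(self, other: Self) -> Self {
        U64x8(std::array::from_fn(|i| self.0[i] ^ other.0[i]))
    }

    /// Horizontal sum of all lanes (hypothetical helper).
    pub fn reduce_add(self) -> u64 {
        self.0.iter().sum()
    }
}

/// Hamming distance between two 512-bit vectors, built only from struct methods.
pub fn hamming_512(a: U64x8, b: U64x8) -> u64 {
    a.xor(b).popcnt().reduce_add()
}
```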

4. CLAUDE.md § Knowledge Base (+1 row)

Inventory entry for the new knowledge doc per Mandatory Board-Hygiene Rule.

Architectural decisions captured (so workers don't re-derive)

  • Surface shape: struct methods + closure-batch primitives. Not free functions, not consumer-side traits.
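  • Direction B for codex P2 i8::MIN (verdict from PP-16 preflight-drift-auditor, per spec line 233 of pr-sprint-13-simd-i4.md): the target semantics is saturating abs, i8::MIN → 127, as in `i8::saturating_abs` and NEON `vqabsq_s8`. The scalar path is buggy (`unsigned_abs() as i8` wraps i8::MIN → -128). `_mm512_abs_epi8` does not saturate by ISA either: VPABSB writes 0x80 (unsigned 128, i.e. -128 read as i8) for an i8::MIN lane, so the AVX-512 path needs an explicit clamp to 0x7F.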
  • Narrow scope per PR, NOT one mega-sweep. Each violator has a distinct missing-primitive blocker; bundling conflates correctness reviews. Per PP-14 convergence-architect §SYNERGY 3.
  • sigker positioning: Index-regime third codec lane (alongside bgz17 + deepnsm). Bypasses Jirak via Hambly-Lyons 2010. Zero raw intrinsics today; activates as W1.5 consumer when jc Pillar 11 trips.
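The scalar half of the i8::MIN divergence can be reproduced directly in Rust. The helper names below are hypothetical; the check shows that the cast-based pattern and `i8::saturating_abs` disagree on exactly one input.

```rust
/// The buggy scalar pattern: |i8::MIN| = 128 does not fit in i8,
/// so the `as i8` cast wraps it back to -128.
pub fn abs_via_unsigned_cast(x: i8) -> i8 {
    x.unsigned_abs() as i8
}

/// Direction B semantics: clamp |i8::MIN| to i8::MAX (127).
pub fn abs_direction_b(x: i8) -> i8 {
    x.saturating_abs()
}

/// Every i8 input on which the two variants disagree.
pub fn divergent_inputs() -> Vec<i8> {
    (i8::MIN..=i8::MAX)
        .filter(|&x| abs_via_unsigned_cast(x) != abs_direction_b(x))
        .collect()
}
```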

Pre-spawn verdict

simd-savant says the original "narrow scope" plan was insufficient given the 158-violation audit; the ndarray-first wave is now mandatory, not preferred. Workers W1a-#1 through W1a-#5 can spawn in parallel against adaworldapi/ndarray master once this capture commit lands.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS


Generated by Claude Code

…D entries, wave plan

Architectural capture commit per autoattended-multiagent-pattern §3
("P0 fixes land BEFORE the next sprint PR opens"). The simd-savant
PRE-MERGE audit of origin/main surfaced 158 raw-intrinsic violations
across 5 consumer crates (mul.rs, blasgraph/types + bridge, holograph
hamming, bgz17 simd + prefetch, thinking-engine VNNI dispatch) plus
3 missing primitives in ndarray::simd that block clean remediation.
Before any worker spawns to fix PR #398's codex P1/P2 findings, the
architectural shape has to be written down so that every wave-1
worker briefs against the same canonical reference.

Files (4):

1. .claude/knowledge/ndarray-vertical-simd-alien-magic.md (NEW, 144L)
   The canonical reference. "The Click" statement (ndarray ships
   struct methods on typed wrappers + closure-parameterized batch
   primitives; consumers compose with domain enums). Per-workload
   surface table covering palette L1-L4, spatial splat, blasgraph-
   over-palette, i4 packed qualia, hamming over u64, signature
   kernels. W1a (5 ndarray PRs) + W1b (5 consumer migrations) +
   W1.5 (3 sigker primitives, gated on jc Pillar 11) wave plan.
   Cross-links to simd-savant card, autoattended-pattern §14,
   sigker lib.rs, Jirak iron rule. Litmus tests for surface
   proposals.

2. .claude/board/EPIPHANIES.md (PREPEND E-SIMD-SWEEP-1)
   Captures the 158-violation finding: PR #398 was the 5th
   violation, not the first. Doctrinal claim: the SIMD
   source-of-truth invariant is retroactive, not just forward. Full
   AP-SIMD-N breakdown: 117 / 8 / 13 / 7 / 19 / 0 / 2 / 13 (these
   per-pattern counts sum to 179, not the 158 headline). Strategic
   angle on sigker as Index-regime third lane that bypasses
   I-NOISE-FLOOR-JIRAK. Doctrinal counterpart to E-META-10 /
   I-LEGACY-API-FEATURE-GATED (same retroactive-sweep shape).

3. .claude/board/TECH_DEBT.md (PREPEND 13 entries)
   - W1a (5 entries): TD-NDARRAY-SIMD-UNPACK-I4-16D,
     -SATURATING-ABS-I8, -GATHER, -PREFETCH, -POPCOUNT-U64.
     Each with severity, required API surface (Required ndarray
     PR), and cross-refs.
   - W1.5 (3 entries, DEFERRED): TD-NDARRAY-SIMD-SIGNATURE-PDE-SWEEP,
     -RANDOMIZED-PROJECTION, -LYNDON-PACK. Gated on sigker
     benchmarking + jc Pillar 11 certification.
   - W1b consumer migrations (5 entries): TD-SIMD-SWEEP-W1
     (holograph), W2 (blasgraph), W3 (bgz17), W4 (mul.rs follow-
     up, P0), W5 (thinking-engine VNNI).

4. CLAUDE.md § Knowledge Base (+1 row)
   Inventory entry for the new knowledge doc per Mandatory
   Board-Hygiene Rule.
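For TD-NDARRAY-SIMD-GATHER, a scalar reference pins down the semantics that any SIMD path must match. The signature below (a `&[u16]` table, eight `u32` indices, `None` on any out-of-bounds index) is an assumption for illustration, not the required API surface from the TD entry. x86 gather instructions operate on 32- and 64-bit elements, so a vector path for 16-bit lanes would likely gather wider values and narrow them.

```rust
/// Hypothetical 8-lane u16 wrapper, scalar reference only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U16x8(pub [u16; 8]);

impl U16x8 {
    /// Lane i of the result is `table[idx[i]]`.
    /// Returns None if any index is out of bounds, so the bounds policy is
    /// decided once here instead of at every consumer call site.
    pub fn gather_u16(table: &[u16], idx: [u32; 8]) -> Option<Self> {
        let mut out = [0u16; 8];
        for (o, &i) in out.iter_mut().zip(idx.iter()) {
            *o = *table.get(i as usize)?;
        }
        Some(U16x8(out))
    }
}
```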

Notable architectural decisions captured (so workers don't re-derive):

- The "alien magic" shape is struct methods + closure-batch primitives,
  NOT free functions or consumer-side traits. The polyfill is the
  single channel; consumers compose via closures.
- Direction B for codex P2 i8::MIN: the target is saturating abs
  (i8::MIN → 127). Scalar is buggy (`unsigned_abs() as i8` wraps
  i8::MIN → -128), and `_mm512_abs_epi8` does not saturate either:
  VPABSB writes 0x80 (unsigned 128) for an i8::MIN lane, so the
  AVX-512 path needs an explicit clamp to 0x7F. Per spec line 233 of
  pr-sprint-13-simd-i4.md: |signed_mantissa| ≤ 1 → ValleyOfDespair
  means "weak rule signal", not "sign-extreme". Verdict from
  PP-16 preflight-drift-auditor 2026-05-16.
- Narrow scope for mul.rs follow-up + 4 separate sweep PRs, NOT one
  mega-sweep. Each violator has a distinct missing-primitive
  blocker; bundling would conflate 4 unrelated correctness reviews.
  Per PP-14 convergence-architect SYNERGY 3 verdict.
- sigker positioning: Index-regime third codec lane (alongside
  bgz17 palette-distance and deepnsm NSM tiling). Bypasses Jirak
  noise floor via Hambly-Lyons 2010 uniqueness. Activates as a
  first-class W1.5 consumer when jc Pillar 11 trips. Zero raw
  intrinsics today — cleanest exemplar of "domain crate composes
  via closures" pattern.

Pre-spawn verdict from simd-savant: the original "narrow scope"
plan was insufficient given the audit; the ndarray-first wave is
now mandatory (not just preferred). Workers W1a-#1 through W1a-#5
can spawn in parallel against adaworldapi/ndarray master once this
capture commit lands.


@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1141770710


Five small primitives, each on its own branch, each auditable by `simd-savant` before merge:

1. **`TD-NDARRAY-SIMD-UNPACK-I4-16D`** — `I8x16::from_i4_packed_u64` + `I8x16::lane_i8::<N>` + `batch_packed_i4_16<E, F>` closure-batch entry. AVX-512 path via `_mm512_cvtepi8_epi16` + nibble shuffle; NEON via `vshl_n_s8` / `vqshl_n_s8`; scalar via fused-loop fallback. Bounds-aware tail.
2. **`TD-NDARRAY-SIMD-SATURATING-ABS-I8`** — `I8x16::saturating_abs(self) -> Self`. AVX-512 `_mm512_abs_epi8` (saturates `i8::MIN → 127` by ISA); NEON `vqabsq_s8`; scalar `i8::saturating_abs`. Closes codex P2 i8::MIN divergence on PR #398 by giving consumers a single source-of-truth for "the abs that matches hardware semantics."

P1: Correct the AVX-512 abs semantics

When a W1a worker follows this canonical plan to implement I8x16::saturating_abs, using _mm512_abs_epi8 does not saturate i8::MIN to 127: VPABSB leaves the 0x80 lane value as 0x80, which is still -128 when represented as i8. That means the documented fix for the PR #398 i8::MIN divergence would preserve the bad AVX-512 behavior instead of matching i8::saturating_abs/NEON vqabsq_s8; the primitive needs an extra clamp/remap of 0x80 to 0x7f or the docs should not call this instruction saturating.
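A per-lane scalar model of the issue raised here, and of one possible fix. It is a sketch, not an intrinsic implementation: `vpabsb_lane` mimics what VPABSB writes for one byte (an unsigned absolute value), and `saturating_abs_lane` models following it with an unsigned minimum against 0x7F, which is what `_mm512_min_epu8(abs, _mm512_set1_epi8(0x7F))` would do lane-wise. All function names are hypothetical.

```rust
/// Model of one VPABSB (`_mm512_abs_epi8`) lane: the result is the absolute
/// value as an unsigned byte, so an i8::MIN (0x80) input yields 0x80 (128).
pub fn vpabsb_lane(x: i8) -> u8 {
    x.unsigned_abs()
}

/// Model of the fix: unsigned min against 0x7F remaps 0x80 to 0x7F and
/// leaves every other lane unchanged, matching `i8::saturating_abs`.
pub fn saturating_abs_lane(x: i8) -> i8 {
    vpabsb_lane(x).min(0x7F) as i8
}

/// Apply the modeled fix across a fixed-size block of lanes.
pub fn saturating_abs_lanes<const N: usize>(lanes: [i8; N]) -> [i8; N] {
    lanes.map(saturating_abs_lane)
}
```

Read as i8, the unclamped VPABSB result for i8::MIN is -128, which is exactly the scalar `unsigned_abs() as i8` behavior; only the extra clamp brings the AVX-512 path in line with NEON `vqabsq_s8`.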
