Skip to content

fix(codex-p1): VPABSB does NOT saturate i8::MIN — correct AVX-512 saturating_abs spec#401

Merged
AdaWorldAPI merged 1 commit into
mainfrom
claude/codex-p1-vpabsb-correction
May 16, 2026
Merged

fix(codex-p1): VPABSB does NOT saturate i8::MIN — correct AVX-512 saturating_abs spec#401
AdaWorldAPI merged 1 commit into
mainfrom
claude/codex-p1-vpabsb-correction

Conversation

@AdaWorldAPI
Copy link
Copy Markdown
Owner

Codex caught a P1 on PR #400: my canonical reference doc claimed _mm512_abs_epi8 (VPABSB) saturates i8::MIN → 127 by ISA. This is wrong — VPABSB returns the same bit pattern for 0x80, so abs(i8::MIN) = i8::MIN. A W1a worker implementing I8x16::saturating_abs using only VPABSB would have shipped exactly the i8::MIN divergence the spec was supposed to close.

The PR #398 AVX-512 path got the right answer through a DIFFERENT mechanism (widen i8 → i64, negate-blend, which produces +128 in the wider register where it fits). My doc conflated the two and proposed the wrong primitive.

Correct AVX-512 saturating_abs

let raw_abs = _mm512_abs_epi8(self.0);            // 0x80 → 0x80 (unchanged)
let clamped = _mm512_min_epu8(raw_abs,
                              _mm512_set1_epi8(0x7f));  // VPMINUB: 0x80 → 0x7f
I8x16(clamped)

VPMINUB (unsigned-byte min) reads 0x80 as 128 and clamps to 127. All other lanes have abs(x) < 0x80, unaffected.

NEON vqabsq_s8 is already hardware-saturating. Scalar i8::saturating_abs is correct.

Files (3, all inline corrections — no new files)

  1. .claude/knowledge/ndarray-vertical-simd-alien-magic.md §W1a Module 6: #[track_caller] error macros for zero-cost location capture #2 — correct AVX-512 impl spelled out, NEON suffix-q explanation, scalar fallback.
  2. .claude/board/EPIPHANIES.md E-SIMD-SWEEP-1 — inline VPABSB-doesn't-saturate note + corrected primitive entry.
  3. .claude/board/TECH_DEBT.md TD-NDARRAY-SIMD-SATURATING-ABS-I8 — rewritten Description (explains why PR impl(sprint-13/W-I1): D-CSV-13b — i4 batch SIMD dispatch + tests #398's AVX-512 got the right answer despite VPABSB not saturating) + rewritten Required API surface + new Mandatory test (assert I8x16::saturating_abs(splat(i8::MIN)) == splat(i8::MAX) across all backends).

Direction B verdict (PP-16 preflight) is unchanged

Scalar is buggy (unsigned_abs() as i8 wraps i8::MIN → -128), AVX-512 outcome is correct, i8::MIN should classify as Slope/Plateau (not ValleyOfDespair). The fix is to the IMPLEMENTATION STRATEGY for the new ndarray primitive, not to the architectural decision.

Companion PR

AdaWorldAPI/ndarray #149 captures the same correction on the ndarray side, with full per-arch implementation hints + mandatory parity test, so W1a workers spawning against ndarray master have the correct spec.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS


Generated by Claude Code

…abs spec

Codex P1 on PR #400 caught that the canonical reference doc
(ndarray-vertical-simd-alien-magic.md §W1a #2) claimed
`_mm512_abs_epi8` saturates `i8::MIN → 127` by ISA. This is wrong:
VPABSB returns the same bit pattern for `0x80` (i.e., abs(i8::MIN)
= i8::MIN, since +128 doesn't fit in i8). A W1a worker implementing
the documented primitive would have shipped the same i8::MIN
divergence the spec was supposed to close.

Three files updated with the correct semantics:

1. `.claude/knowledge/ndarray-vertical-simd-alien-magic.md` §W1a #2
   Correct AVX-512 impl: `_mm512_min_epu8(_mm512_abs_epi8(x),
   _mm512_set1_epi8(0x7f))`. VPABSB gives the absolute-value bit
   pattern; VPMINUB (unsigned min) then clamps the single
   problematic byte 0x80 (=128 unsigned > 127) down to 0x7f
   (=127). All other lanes are unchanged since `abs(x) < 0x80`
   for `x ≠ i8::MIN`. NEON `vqabsq_s8` is already saturating
   (the `q` suffix); scalar `i8::saturating_abs` is correct.

2. `.claude/board/EPIPHANIES.md` E-SIMD-SWEEP-1
   Inline correction: `TD-NDARRAY-SIMD-SATURATING-ABS-I8` entry now
   names the VPABSB+VPMINUB pair and explicitly notes that VPABSB
   alone does NOT saturate i8::MIN.

3. `.claude/board/TECH_DEBT.md` TD-NDARRAY-SIMD-SATURATING-ABS-I8
   Description rewrite: clarifies that PR #398's AVX-512 path got
   the right answer not because of VPABSB but because it widens
   i8 → i64 first and negate-blends (a different mechanism). The
   new ndarray primitive must produce truly-saturating semantics
   in the same byte-wide register without widening. Added a
   mandatory test: `I8x16::saturating_abs(splat(i8::MIN))` must
   return `splat(i8::MAX)` on all three backends.

Direction B verdict (scalar is buggy, AVX-512 outcome is correct)
is unchanged. The fix is to the IMPLEMENTATION STRATEGY for the
new ndarray primitive, not to the architectural decision.

Cross-ref: PR #400 codex P1 review; PR #398 codex P2 (the
i8::MIN divergence that motivated W1a-#2 in the first place);
Intel Intrinsics Guide for `_mm512_abs_epi8` (VPABSB);
ARM Architecture Reference for VQABS (`vqabsq_s8`).

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
@AdaWorldAPI AdaWorldAPI merged commit 74046c8 into main May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants