fix(codex-p1): VPABSB does NOT saturate i8::MIN — correct AVX-512 saturating_abs spec by AdaWorldAPI · Pull Request #401 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-05-16T20:26:42Z

Codex caught a P1 on PR #400: my canonical reference doc claimed _mm512_abs_epi8 (VPABSB) saturates i8::MIN → 127 by ISA. This is wrong — VPABSB returns the same bit pattern for 0x80, so abs(i8::MIN) = i8::MIN. A W1a worker implementing I8x16::saturating_abs using only VPABSB would have shipped exactly the i8::MIN divergence the spec was supposed to close.

The PR #398 AVX-512 path got the right answer through a DIFFERENT mechanism (widen i8 → i64, negate-blend, which produces +128 in the wider register where it fits). My doc conflated the two and proposed the wrong primitive.

Correct AVX-512 saturating_abs

let raw_abs = _mm512_abs_epi8(self.0);            // 0x80 → 0x80 (unchanged)
let clamped = _mm512_min_epu8(raw_abs,
                              _mm512_set1_epi8(0x7f));  // VPMINUB: 0x80 → 0x7f
I8x16(clamped)

VPMINUB (unsigned-byte min) reads 0x80 as 128 and clamps to 127. All other lanes have abs(x) < 0x80, unaffected.

NEON vqabsq_s8 is already hardware-saturating. Scalar i8::saturating_abs is correct.

Files (3, all inline corrections — no new files)

.claude/knowledge/ndarray-vertical-simd-alien-magic.md §W1a Module 6: #[track_caller] error macros for zero-cost location capture #2 — correct AVX-512 impl spelled out, NEON suffix-q explanation, scalar fallback.
.claude/board/EPIPHANIES.md E-SIMD-SWEEP-1 — inline VPABSB-doesn't-saturate note + corrected primitive entry.
.claude/board/TECH_DEBT.md TD-NDARRAY-SIMD-SATURATING-ABS-I8 — rewritten Description (explains why PR impl(sprint-13/W-I1): D-CSV-13b — i4 batch SIMD dispatch + tests #398's AVX-512 got the right answer despite VPABSB not saturating) + rewritten Required API surface + new Mandatory test (assert I8x16::saturating_abs(splat(i8::MIN)) == splat(i8::MAX) across all backends).

Direction B verdict (PP-16 preflight) is unchanged

Scalar is buggy (unsigned_abs() as i8 wraps i8::MIN → -128), AVX-512 outcome is correct, i8::MIN should classify as Slope/Plateau (not ValleyOfDespair). The fix is to the IMPLEMENTATION STRATEGY for the new ndarray primitive, not to the architectural decision.

Companion PR

AdaWorldAPI/ndarray #149 captures the same correction on the ndarray side, with full per-arch implementation hints + mandatory parity test, so W1a workers spawning against ndarray master have the correct spec.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

Generated by Claude Code

…abs spec Codex P1 on PR #400 caught that the canonical reference doc (ndarray-vertical-simd-alien-magic.md §W1a #2) claimed `_mm512_abs_epi8` saturates `i8::MIN → 127` by ISA. This is wrong: VPABSB returns the same bit pattern for `0x80` (i.e., abs(i8::MIN) = i8::MIN, since +128 doesn't fit in i8). A W1a worker implementing the documented primitive would have shipped the same i8::MIN divergence the spec was supposed to close. Three files updated with the correct semantics: 1. `.claude/knowledge/ndarray-vertical-simd-alien-magic.md` §W1a #2 Correct AVX-512 impl: `_mm512_min_epu8(_mm512_abs_epi8(x), _mm512_set1_epi8(0x7f))`. VPABSB gives the absolute-value bit pattern; VPMINUB (unsigned min) then clamps the single problematic byte 0x80 (=128 unsigned > 127) down to 0x7f (=127). All other lanes are unchanged since `abs(x) < 0x80` for `x ≠ i8::MIN`. NEON `vqabsq_s8` is already saturating (the `q` suffix); scalar `i8::saturating_abs` is correct. 2. `.claude/board/EPIPHANIES.md` E-SIMD-SWEEP-1 Inline correction: `TD-NDARRAY-SIMD-SATURATING-ABS-I8` entry now names the VPABSB+VPMINUB pair and explicitly notes that VPABSB alone does NOT saturate i8::MIN. 3. `.claude/board/TECH_DEBT.md` TD-NDARRAY-SIMD-SATURATING-ABS-I8 Description rewrite: clarifies that PR #398's AVX-512 path got the right answer not because of VPABSB but because it widens i8 → i64 first and negate-blends (a different mechanism). The new ndarray primitive must produce truly-saturating semantics in the same byte-wide register without widening. Added a mandatory test: `I8x16::saturating_abs(splat(i8::MIN))` must return `splat(i8::MAX)` on all three backends. Direction B verdict (scalar is buggy, AVX-512 outcome is correct) is unchanged. The fix is to the IMPLEMENTATION STRATEGY for the new ndarray primitive, not to the architectural decision. Cross-ref: PR #400 codex P1 review; PR #398 codex P2 (the i8::MIN divergence that motivated W1a-#2 in the first place); Intel Intrinsics Guide for `_mm512_abs_epi8` (VPABSB); ARM Architecture Reference for VQABS (`vqabsq_s8`). https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS

AdaWorldAPI merged commit 74046c8 into main May 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(codex-p1): VPABSB does NOT saturate i8::MIN — correct AVX-512 saturating_abs spec#401

fix(codex-p1): VPABSB does NOT saturate i8::MIN — correct AVX-512 saturating_abs spec#401
AdaWorldAPI merged 1 commit into
mainfrom
claude/codex-p1-vpabsb-correction

AdaWorldAPI commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AdaWorldAPI commented May 16, 2026

Correct AVX-512 saturating_abs

Files (3, all inline corrections — no new files)

Direction B verdict (PP-16 preflight) is unchanged

Companion PR

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants