fix(codex-p1): VPABSB does NOT saturate i8::MIN; correct AVX-512 saturating_abs spec #401
Merged
Codex P1 on PR #400 caught that the canonical reference doc (ndarray-vertical-simd-alien-magic.md §W1a #2) claimed `_mm512_abs_epi8` saturates `i8::MIN → 127` by ISA. This is wrong: VPABSB returns the same bit pattern for `0x80` (i.e., abs(i8::MIN) = i8::MIN, since +128 doesn't fit in i8). A W1a worker implementing the documented primitive would have shipped the same i8::MIN divergence the spec was supposed to close.

Three files updated with the correct semantics:

1. `.claude/knowledge/ndarray-vertical-simd-alien-magic.md` §W1a #2
   Correct AVX-512 impl: `_mm512_min_epu8(_mm512_abs_epi8(x), _mm512_set1_epi8(0x7f))`. VPABSB gives the absolute-value bit pattern; VPMINUB (unsigned min) then clamps the single problematic byte 0x80 (= 128 unsigned > 127) down to 0x7f (= 127). All other lanes are unchanged since `abs(x) < 0x80` for `x ≠ i8::MIN`. NEON `vqabsq_s8` is already saturating (the `q` suffix); scalar `i8::saturating_abs` is correct.

2. `.claude/board/EPIPHANIES.md` E-SIMD-SWEEP-1
   Inline correction: the `TD-NDARRAY-SIMD-SATURATING-ABS-I8` entry now names the VPABSB+VPMINUB pair and explicitly notes that VPABSB alone does NOT saturate i8::MIN.

3. `.claude/board/TECH_DEBT.md` TD-NDARRAY-SIMD-SATURATING-ABS-I8
   Description rewrite: clarifies that PR #398's AVX-512 path got the right answer not because of VPABSB but because it widens i8 → i64 first and negate-blends (a different mechanism). The new ndarray primitive must produce truly saturating semantics in the same byte-wide register without widening. Added a mandatory test: `I8x16::saturating_abs(splat(i8::MIN))` must return `splat(i8::MAX)` on all three backends.

Direction B verdict (scalar is buggy, AVX-512 outcome is correct) is unchanged. The fix is to the IMPLEMENTATION STRATEGY for the new ndarray primitive, not to the architectural decision.
Cross-ref: PR #400 codex P1 review; PR #398 codex P2 (the i8::MIN divergence that motivated W1a-#2 in the first place); Intel Intrinsics Guide for `_mm512_abs_epi8` (VPABSB); ARM Architecture Reference for VQABS (`vqabsq_s8`). https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
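The three-backend shape described above can be sketched in Rust. This is a minimal sketch, not the ndarray implementation: the function names (`saturating_abs_x16`, `saturating_abs_ssse3`, `saturating_abs_neon`, `saturating_abs_scalar`) are hypothetical, and the x86_64 path deliberately swaps in the 128-bit forms `_mm_abs_epi8` (PABSB, SSSE3) and `_mm_min_epu8` (PMINUB, SSE2) for `_mm512_abs_epi8`/`_mm512_min_epu8`. They apply the same abs-then-unsigned-min identity to an `I8x16`-sized register without needing AVX-512 hardware to try out.

```rust
// Hypothetical 16-lane i8 saturating abs, one path per backend.
// x86_64 uses the 128-bit PABSB/PMINUB pair as a stand-in for VPABSB/VPMINUB.

/// Scalar reference: `i8::saturating_abs` maps i8::MIN to i8::MAX.
#[allow(dead_code)] // unused on aarch64, where the NEON path always applies
fn saturating_abs_scalar(v: [i8; 16]) -> [i8; 16] {
    v.map(i8::saturating_abs)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn saturating_abs_ssse3(v: [i8; 16]) -> [i8; 16] {
    use std::arch::x86_64::{
        __m128i, _mm_abs_epi8, _mm_loadu_si128, _mm_min_epu8, _mm_set1_epi8, _mm_storeu_si128,
    };
    let mut out = [0i8; 16];
    unsafe {
        let x = _mm_loadu_si128(v.as_ptr() as *const __m128i);
        // PABSB alone: lanes holding 0x80 (i8::MIN) come back as 0x80.
        let a = _mm_abs_epi8(x);
        // PMINUB is an unsigned min, so 0x80 reads as 128 and clamps to 0x7f;
        // every other lane is already <= 0x7f and passes through unchanged.
        let r = _mm_min_epu8(a, _mm_set1_epi8(0x7f));
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
    }
    out
}

#[cfg(target_arch = "aarch64")]
unsafe fn saturating_abs_neon(v: [i8; 16]) -> [i8; 16] {
    use std::arch::aarch64::{vld1q_s8, vqabsq_s8, vst1q_s8};
    let mut out = [0i8; 16];
    unsafe {
        // SQABS saturates on its own: i8::MIN becomes i8::MAX, no clamp needed.
        let r = vqabsq_s8(vld1q_s8(v.as_ptr()));
        vst1q_s8(out.as_mut_ptr(), r);
    }
    out
}

#[cfg(target_arch = "x86_64")]
pub fn saturating_abs_x16(v: [i8; 16]) -> [i8; 16] {
    if is_x86_feature_detected!("ssse3") {
        // SAFETY: SSSE3 support was confirmed at runtime just above.
        unsafe { saturating_abs_ssse3(v) }
    } else {
        saturating_abs_scalar(v)
    }
}

#[cfg(target_arch = "aarch64")]
pub fn saturating_abs_x16(v: [i8; 16]) -> [i8; 16] {
    // SAFETY: NEON is part of the aarch64 baseline.
    unsafe { saturating_abs_neon(v) }
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn saturating_abs_x16(v: [i8; 16]) -> [i8; 16] {
    saturating_abs_scalar(v)
}
```

On x86_64 the runtime check falls back to the scalar reference when SSSE3 is missing. The NEON path needs no clamp because SQABS already saturates, which is the point of the `q` suffix.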
Codex caught a P1 on PR #400: my canonical reference doc claimed `_mm512_abs_epi8` (VPABSB) saturates `i8::MIN → 127` by ISA. This is wrong: VPABSB returns the same bit pattern for `0x80`, so `abs(i8::MIN) = i8::MIN`. A W1a worker implementing `I8x16::saturating_abs` using only VPABSB would have shipped exactly the i8::MIN divergence the spec was supposed to close.

The PR #398 AVX-512 path got the right answer through a DIFFERENT mechanism (widen i8 → i64, then negate-blend, which produces +128 in the wider register where it fits). My doc conflated the two and proposed the wrong primitive.
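The difference between the wrapping, widened, and saturating behaviours can be checked on a single byte. A minimal scalar sketch: `buggy_scalar_abs` mirrors the `unsigned_abs() as i8` pattern, `widened_abs` is a hypothetical one-lane model of the widen-then-negate-blend route described for PR #398 (not its actual code), and `byte_wide_saturating_abs` wraps `i8::saturating_abs`, the target semantics.

```rust
/// Mirrors the `unsigned_abs() as i8` pattern: the u8 magnitude 128 is
/// correct, but the cast back to i8 wraps it to -128.
fn buggy_scalar_abs(x: i8) -> i8 {
    x.unsigned_abs() as i8
}

/// Hypothetical one-lane model of "widen i8 -> i64, then negate-blend":
/// in the wider type +128 fits, so i8::MIN does not wrap.
fn widened_abs(x: i8) -> i64 {
    let wide = x as i64;
    let negated = -wide;
    // Blend: take the negated value for negative inputs, the original otherwise.
    if wide < 0 { negated } else { wide }
}

/// Target semantics for the byte-wide primitive: clamp to i8::MAX.
fn byte_wide_saturating_abs(x: i8) -> i8 {
    x.saturating_abs()
}
```

Only `i8::MIN` separates the three: +128 exists in i64 but not in i8, so a byte-wide primitive has to clamp to 127 instead of relying on widening.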
Correct AVX-512 saturating_abs
`_mm512_min_epu8(_mm512_abs_epi8(x), _mm512_set1_epi8(0x7f))`
VPMINUB (unsigned-byte min) reads `0x80` as `128` and clamps it to `127`. All other lanes have `abs(x) < 0x80` and are unaffected.

NEON `vqabsq_s8` is already hardware-saturating. Scalar `i8::saturating_abs` is correct.

Files (3, all inline corrections, no new files)
1. `.claude/knowledge/ndarray-vertical-simd-alien-magic.md` §W1a #2: correct AVX-512 impl spelled out, NEON suffix-`q` explanation, scalar fallback.
2. `.claude/board/EPIPHANIES.md` `E-SIMD-SWEEP-1`: inline VPABSB-doesn't-saturate note + corrected primitive entry.
3. `.claude/board/TECH_DEBT.md` `TD-NDARRAY-SIMD-SATURATING-ABS-I8`: rewritten Description (explains why PR #398's AVX-512 got the right answer despite VPABSB not saturating) + rewritten Required API surface + new Mandatory test (assert `I8x16::saturating_abs(splat(i8::MIN)) == splat(i8::MAX)` across all backends).

Direction B verdict (PP-16 preflight) is unchanged
Scalar is buggy (`unsigned_abs() as i8` wraps `i8::MIN → -128`); the AVX-512 outcome is correct; `i8::MIN` should classify as `Slope`/`Plateau` (not `ValleyOfDespair`). The fix is to the IMPLEMENTATION STRATEGY for the new ndarray primitive, not to the architectural decision.

Companion PR
AdaWorldAPI/ndarray#149 captures the same correction on the ndarray side, with full per-arch implementation hints + mandatory parity test, so W1a workers spawning against ndarray master have the correct spec.

https://claude.ai/code/session_01UwJuKqP828qyX1VkLgGJFS
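A mandatory parity test of this kind can be run against each backend with a small harness. This is a sketch under assumptions: `I8x16` below is a hypothetical stand-in (a newtype over `[i8; 16]` with a scalar `saturating_abs`), not ndarray's real type, and `check_saturating_abs_parity` is a hypothetical helper. Each backend is passed in as a plain function pointer, so an AVX-512, NEON, or scalar implementation can be checked against the same `splat(i8::MIN)` case and a full 256-value sweep.

```rust
/// Hypothetical stand-in for ndarray's `I8x16`; the real type's layout and API may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct I8x16(pub [i8; 16]);

impl I8x16 {
    pub fn splat(x: i8) -> Self {
        I8x16([x; 16])
    }

    /// Scalar reference backend.
    pub fn saturating_abs(self) -> Self {
        I8x16(self.0.map(i8::saturating_abs))
    }
}

/// Hypothetical harness: checks one backend against the mandatory i8::MIN
/// case first, then against the scalar reference for all 256 byte values.
pub fn check_saturating_abs_parity(backend: fn(I8x16) -> I8x16) -> Result<(), String> {
    // Mandatory case: saturating_abs(splat(i8::MIN)) == splat(i8::MAX).
    let got = backend(I8x16::splat(i8::MIN));
    if got != I8x16::splat(i8::MAX) {
        return Err(format!("splat(i8::MIN): expected splat(i8::MAX), got {:?}", got));
    }
    // Exhaustive sweep: 256 byte values, 16 lanes at a time.
    for base in (0usize..256).step_by(16) {
        let mut lanes = [0i8; 16];
        for (j, lane) in lanes.iter_mut().enumerate() {
            *lane = (base + j) as u8 as i8;
        }
        let want = I8x16(lanes.map(i8::saturating_abs));
        let got = backend(I8x16(lanes));
        if got != want {
            return Err(format!("lanes {:?}: expected {:?}, got {:?}", lanes, want, got));
        }
    }
    Ok(())
}
```

The scalar reference passes. A VPABSB-only backend, modeled here with `i8::wrapping_abs`, fails on the first check, which is exactly the divergence the test is meant to catch.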
Generated by Claude Code