silk: Add Arm NEON silk_VAD_GetSA_Q8 #483
Open
czoli1976 wants to merge 1 commit into
silk_VAD_GetSA_Q8 had an x86 SSE4.1 implementation but no Arm one, and it runs on every SILK/hybrid frame in the default (float) build.

Add a NEON version mirroring the SSE4.1 one: it vectorises the per-subframe energy sum-of-squares ((X[i] >> 3)^2 accumulated in int32), 8 samples per iteration via vshrq_n_s16 + paired vmlal_s16, with a scalar tail. Bit-exact with the C reference (exact integer sum, no overflow), validated by the existing OPUS_CHECK_ASM full-state memcmp.

As on x86, silk_VAD_GetNoiseLevels is exported (rather than static inline in VAD.c) when NEON is enabled so the kernel can call it.

Dispatched via the existing OVERRIDE_silk_VAD_GetSA_Q8 hook (PRESUME + an RTCD table in arm_silk_map.c); the source goes in the common SILK_SOURCES_ARM_NEON_INTR group, already wired in autotools/CMake/Meson.

Microbench on Apple M4 (the real subband lengths, 10-80): ~1.1-1.7x over scalar; E2E within run-to-run noise (VAD is a small per-frame cost). Full meson test suite passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
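The kernel described in the commit message can be sketched roughly as below. This is a minimal illustration, not the code from this PR: the names energy_scalar, energy_neon, vad_energy and HAVE_NEON_SKETCH are hypothetical, the NEON path is only compiled on AArch64 (vaddvq_s32 is an AArch64 intrinsic), and the debug assert only imitates the idea of comparing against the C reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON_SKETCH 1
#else
#define HAVE_NEON_SKETCH 0
#endif

/* Scalar reference: sum of (x[i] >> 3)^2 accumulated in int32.
 * Assumes >> on a negative value is an arithmetic shift (true on the
 * usual compilers, but implementation-defined in ISO C).
 * Since |x >> 3| <= 4096, each square is <= 2^24, so with at most 80
 * terms (the largest length quoted above) the sum stays below 2^31. */
static int32_t energy_scalar(const int16_t *x, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t t = x[i] >> 3;
        sum += t * t;
    }
    return sum;
}

#if HAVE_NEON_SKETCH
/* NEON variant: 8 samples per iteration, then a scalar tail. */
static int32_t energy_neon(const int16_t *x, size_t n)
{
    int32x4_t acc = vdupq_n_s32(0);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        int16x8_t v = vshrq_n_s16(vld1q_s16(x + i), 3); /* x >> 3 */
        int16x4_t lo = vget_low_s16(v);
        int16x4_t hi = vget_high_s16(v);
        acc = vmlal_s16(acc, lo, lo); /* acc += lo * lo, widened to int32 */
        acc = vmlal_s16(acc, hi, hi); /* acc += hi * hi, widened to int32 */
    }
    int32_t sum = vaddvq_s32(acc); /* horizontal add of the 4 lanes */
    for (; i < n; i++) {           /* scalar tail for n % 8 samples */
        int32_t t = x[i] >> 3;
        sum += t * t;
    }
    return sum;
}
#endif

/* Picks the NEON path when available; in debug builds it also checks
 * the result against the scalar reference. */
static int32_t vad_energy(const int16_t *x, size_t n)
{
#if HAVE_NEON_SKETCH
    int32_t sum = energy_neon(x, n);
    assert(sum == energy_scalar(x, n));
    return sum;
#else
    return energy_scalar(x, n);
#endif
}
```

Because integer addition is associative and no partial sum overflows under the stated length assumption, splitting the accumulation across four lanes gives the same result as the sequential scalar loop.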
Summary
silk_VAD_GetSA_Q8 had an x86 SSE4.1 implementation but no Arm one, even though it runs on every SILK/hybrid frame in the default (float) build. This adds a NEON version, mirroring the SSE4.1 one.

What's vectorised
The per-subframe energy sum-of-squares, (X[i] >> 3)^2 accumulated in int32, is computed 8 samples per iteration via vshrq_n_s16 + paired vmlal_s16 (low/high), with a scalar tail and a horizontal vaddvq_s32. Everything else (analysis filterbank, noise estimation, SNR/tilt) is kept identical to the C reference, as in the SSE4.1 version.

Bit-exact with silk_VAD_GetSA_Q8_c (exact integer sum of squares, no overflow), validated by the existing OPUS_CHECK_ASM full-encoder-state memcmp.

silk_VAD_GetNoiseLevels becomes exported (instead of static inline in VAD.c) when NEON is enabled, so the kernel can call it.

Dispatch / wiring
Uses the existing OVERRIDE_silk_VAD_GetSA_Q8 hook: a new silk/arm/VAD_arm.h provides the PRESUME (direct call) and RTCD (SILK_VAD_GETSA_Q8_IMPL table in arm_silk_map.c) dispatch, mirroring silk/x86/main_sse.h. The source is added to the common SILK_SOURCES_ARM_NEON_INTR group, which is already wired in autotools / CMake / Meson, so no build-system changes are needed.

Numbers (Apple M4)
This is the last of the silk-side x86-has-it/ARM-doesn't parity gaps that runs in the default build.
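For readers unfamiliar with the two dispatch modes named under Dispatch / wiring, the following is a rough, self-contained sketch of the general pattern: a direct call when NEON is presumed at build time, otherwise a function-pointer table indexed by the detected CPU level. All identifiers here (ARCH_C, ARCH_NEON, ARCH_MASK, ENERGY_IMPL, ENERGY, SKETCH_PRESUME_NEON, energy_neon_standin) are made up for illustration and are not the names or signatures used in the Opus tree; the "NEON" function is a stand-in that calls the C code and counts its invocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU levels; the real Arm arch enumeration differs. */
enum { ARCH_C = 0, ARCH_NEON = 1, ARCH_COUNT = 2 };
#define ARCH_MASK 1

typedef int32_t (*energy_fn)(const int16_t *x, size_t n);

static int neon_calls = 0; /* lets the sketch show which path ran */

/* Plain C implementation: sum of (x[i] >> 3)^2. */
static int32_t energy_c(const int16_t *x, size_t n)
{
    assert(x != NULL || n == 0);
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t t = x[i] >> 3;
        sum += t * t;
    }
    return sum;
}

/* Stand-in for an optimised kernel: same result, different entry point. */
static int32_t energy_neon_standin(const int16_t *x, size_t n)
{
    neon_calls++;
    return energy_c(x, n);
}

#if defined(SKETCH_PRESUME_NEON)
/* "Presume" mode: the build guarantees NEON, so call it directly. */
#define ENERGY(x, n, arch) ((void)(arch), energy_neon_standin((x), (n)))
#else
/* Run-time CPU detection mode: one table entry per CPU level. */
static const energy_fn ENERGY_IMPL[ARCH_COUNT] = {
    energy_c,            /* ARCH_C    */
    energy_neon_standin, /* ARCH_NEON */
};
#define ENERGY(x, n, arch) ((*ENERGY_IMPL[(arch) & ARCH_MASK])((x), (n)))
#endif
```

Callers use the same ENERGY(x, n, arch) expression in either mode; only the macro expansion changes, which is why adding a new kernel to an existing hook of this kind typically needs no changes at the call sites.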