feat: add wasm SIMD support by indutny · Pull Request #73 · nodejs/llparse

indutny · 2025-04-17T18:49:52Z

This makes wasm version of llhttp around 5-6% faster on real-world payloads. The produced code looks roughly like this:

        /* Load input */
        input = wasm_v128_load(p);
        /* Find first character that does not match `ranges` */
        single = wasm_i8x16_eq(input, wasm_u8x16_const_splat(0x9));
        mask = single;
        single = wasm_v128_and(
          wasm_i8x16_ge(input, wasm_u8x16_const_splat(' ')),
          wasm_i8x16_le(input, wasm_u8x16_const_splat('~'))
        );
        mask = wasm_v128_or(mask, single);
        single = wasm_v128_or(
          wasm_i8x16_ge(input, wasm_u8x16_const_splat(0x80)),
          wasm_i8x16_le(input, wasm_u8x16_const_splat(0xff))
        );
        mask = wasm_v128_or(mask, single);
        match_len = __builtin_ctz(
          ~wasm_i8x16_bitmask(mask)
        );

It is conceptually similar to SSE vectorization that we already support except that we can't multiple comparisons at once and have to check ranges individually.

See previous attempt #72
Benchmark: https://gist.github.com/indutny/d18d56fe3c7254d888a79eca98f39145

cc @nodejs/llhttp @mcollina

indutny · 2025-04-17T18:50:46Z

src/implementation/c/node/table-lookup.ts

    out.push(`if (${ctx.endPosArg()} - ${ctx.posArg()} >= 16) {`);
    out.push('  __m128i ranges;');
    out.push('  __m128i input;');
-    out.push('  int avail;');


indutny · 2025-04-17T18:51:03Z

package.json

    "mocha": "^9.2.2",
    "ts-node": "^9.0.0",
-    "typescript": "^4.0.3"
+    "typescript": "^5.0.3"


The package wasn't compiling without typescript update.

indutny · 2025-04-17T18:51:15Z

src/implementation/c/compilation.ts

        const hex: string[] = [];
        for (let j = i; j < limit; j++) {
          const value = buffer[j];
+          assert(value !== undefined);


With typescript version change - new errors popped up.

ShogunPanda

LGTM!
That's just amazing!

mcollina

lgtm, this is amazing!!!!

mcollina · 2025-04-18T07:20:24Z

@indutny go ahead and release! :D

indutny · 2025-04-18T14:55:07Z

Let me take a deeper look at assembly before merging, but if I don't come back let's merge in two days!

indutny · 2025-04-18T15:48:32Z

For posterity SIMD assembly on ARM64:

cmp w10, #0x10 ; (16)
b.ge #+0x20 ; (addr 0xa8cde2a2f8c)
ldr q4, [x4, x6]
add v5.16b, v4.16b, v0.16b
cmhi v5.16b, v1.16b, v5.16b
cmeq v6.16b, v4.16b, v2.16b
mvn v6.16b, v6.16b
and v5.16b, v6.16b, v5.16b
cmgt v4.16b, v4.16b, v3.16b
and v4.16b, v4.16b, v5.16b
movz x16, #0x4080
movk x16, #0x1020, lsl #16
movk x16, #0x408, lsl #32
movk x16, #0x102, lsl #48
dup v31.2d, x16
ushr v30.16b, v4.16b, #7
pmull2 v4.1q, v31.2d, v30.2d
pmull v30.1q, v31.1d, v30.1d
trn2 v30.8b, v30.8b, v4.8b
umov w6, v30.h[3]
orr w6, w6, #0x10000
rbit w6, w6
clz w6, w6
add w8, w6, w8
cmp w6, #0x10 ; (16)
b.ne #+0xa0 ; (addr 0xa8cde2a3088)

llvm-mca -iterations=1

Iterations:        1
Instructions:      26
Total Cycles:      35
Total uOps:        29

Dispatch Width:    6
uOps Per Cycle:    0.83
IPC:               0.74
Block RThroughput: 4.8

No SIMD

ldrb w6, [x4, x8]
add w6, w6, w5
ldrb w6, [x4, x6]
cmp w6, #0x1 ; (1)
b.ne #+0x7c ; (addr 0x31dfce0273a8)
add w8, w8, #0x1 ; (1)
b #+0x68 ; (addr 0x31dfce02739c)

llvm-mca -iterations=16

Iterations:        16
Instructions:      112
Total Cycles:      48
Total uOps:        144

Dispatch Width:    6
uOps Per Cycle:    3.00
IPC:               2.33
Block RThroughput: 2.0

This makes wasm version of llhttp around 5-6% faster on real-world payloads. The produced code looks roughly like this: /* Load input */ input = wasm_v128_load(p); /* Find first character that does not match `ranges` */ single = wasm_i8x16_eq(input, wasm_u8x16_const_splat(0x9)); mask = single; single = wasm_v128_and( wasm_i8x16_ge(input, wasm_u8x16_const_splat(' ')), wasm_i8x16_le(input, wasm_u8x16_const_splat('~')) ); mask = wasm_v128_or(mask, single); single = wasm_v128_or( wasm_i8x16_ge(input, wasm_u8x16_const_splat(0x80)), wasm_i8x16_le(input, wasm_u8x16_const_splat(0xff)) ); mask = wasm_v128_or(mask, single); match_len = __builtin_ctz( ~wasm_i8x16_bitmask(mask) ); It is conceptually similar to SSE vectorization that we already support except that we can't multiple comparisons at once and have to check ranges individually.

indutny · 2025-04-18T19:11:42Z

Found slightly more optimal code path.

indutny · 2025-04-18T19:46:28Z

I'm suddenly feeling a bit apprehensive about merging this since I currently have no way of running llhttp test suite on wasm file. Might need to look into it first.

indutny · 2025-04-19T17:04:59Z

Okay, with nodejs/llparse-test-fixture#22 I managed to test this PR and confirm that it passes all existing llhttp tests. Merging.

indutny · 2025-04-19T17:08:10Z

@mcollina llparse@7.2.0 was released. I believe llhttp has ^7.1.1 dependency spec so it should pick it up without changes. Please expect no performance difference on Mac CPUs (for reasons yet unknown), there should be a perf boost on x86_64, though.

indutny force-pushed the feature/wasm-simd-2 branch from 2cf2e2a to 7cc6104 Compare April 17, 2025 18:50

indutny commented Apr 17, 2025

View reviewed changes

indutny force-pushed the feature/wasm-simd-2 branch 2 times, most recently from 190bc50 to 9e9e699 Compare April 17, 2025 21:43

ShogunPanda approved these changes Apr 18, 2025

View reviewed changes

mcollina approved these changes Apr 18, 2025

View reviewed changes

indutny force-pushed the feature/wasm-simd-2 branch from 9e9e699 to 8f30ff9 Compare April 18, 2025 19:11

indutny merged commit 4d89e77 into nodejs:main Apr 19, 2025

indutny deleted the feature/wasm-simd-2 branch April 19, 2025 17:07

mcollina mentioned this pull request Sep 8, 2025

deps: update undici to 7.15.0 nodejs/node#59605

Closed

richardlau mentioned this pull request Oct 9, 2025

deps: V8: cherry-pick 2abc61361dd4 nodejs/node#60177

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add wasm SIMD support#73

feat: add wasm SIMD support#73
indutny merged 1 commit intonodejs:mainfrom
indutny:feature/wasm-simd-2

indutny commented Apr 17, 2025 •

edited

Loading

Uh oh!

indutny Apr 17, 2025

Uh oh!

indutny Apr 17, 2025

Uh oh!

indutny Apr 17, 2025

Uh oh!

ShogunPanda left a comment

Uh oh!

mcollina left a comment

Uh oh!

mcollina commented Apr 18, 2025

Uh oh!

indutny commented Apr 18, 2025

Uh oh!

indutny commented Apr 18, 2025

Uh oh!

indutny commented Apr 18, 2025

Uh oh!

indutny commented Apr 18, 2025

Uh oh!

indutny commented Apr 19, 2025

Uh oh!

indutny commented Apr 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

indutny commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

indutny Apr 17, 2025

Choose a reason for hiding this comment

Uh oh!

indutny Apr 17, 2025

Choose a reason for hiding this comment

Uh oh!

indutny Apr 17, 2025

Choose a reason for hiding this comment

Uh oh!

ShogunPanda left a comment

Choose a reason for hiding this comment

Uh oh!

mcollina left a comment

Choose a reason for hiding this comment

Uh oh!

mcollina commented Apr 18, 2025

Uh oh!

indutny commented Apr 18, 2025

Uh oh!

indutny commented Apr 18, 2025

Uh oh!

indutny commented Apr 18, 2025

Uh oh!

indutny commented Apr 18, 2025

Uh oh!

indutny commented Apr 19, 2025

Uh oh!

indutny commented Apr 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

indutny commented Apr 17, 2025 •

edited

Loading