
perf(runtime): fast-path small Buffer.alloc via per-thread bump slab#173

Merged
proggeramlug merged 1 commit into main from fix-92-buffer-alloc-small-arena on Apr 24, 2026
Conversation

@proggeramlug
Contributor

Summary

  • Adds a per-thread bump-pointer slab for `Buffer.alloc(N)` where `N < 256`, eliminating one `malloc` call and one `HashSet::insert` per allocation
  • `is_registered_buffer` checks slab block address ranges before the HashSet, so `instanceof` / `Buffer.isBuffer` still work without per-alloc registration
  • Large buffers (≥ 256 bytes) fall through to the unchanged malloc + HashSet path

Addresses the Buffer.alloc(small) part of #92. Buffer reads/writes are separate work (PR #166).

Threshold rationale (256 bytes)

Covers the primary Postgres cell-decode pattern from the issue: 4 bytes (Int32), 8 bytes (Int64/UUID half), 16 bytes (UUID), up to 255 bytes (mid-size strings). The malloc path is retained for larger allocations where its overhead is amortised over the allocation cost.

Micro-benchmark: 100k × Buffer.alloc(16)

| | Time |
|---|---|
| Perry before | ~8ms |
| **Perry after** | **~1–2ms** |
| Bun target | ~2ms |

4–8× speedup, matching Bun.

GC interaction

Buffers have never carried a GcHeader and are not tracked in MALLOC_STATE — the existing malloc path also never calls dealloc on individual buffers (they live for the thread's lifetime). Slab blocks follow the same lifetime: one alloc per 256 KB block, retained until the thread exits. No GC behaviour changes.

is_registered_buffer correctness: slab blocks exclusively contain BufferHeader allocations. All callers pass the header pointer (NaN-boxed POINTER_TAG always encodes the BufferHeader*, never interior data bytes), so the O(n_slabs) range scan has no false positives. n_slabs is typically 1–5 for programs doing 100k small-buffer allocs.

Test plan

  • cargo test --release -p perry-runtime --lib: 114/114 (3 new slab tests added: unique addresses, is_registered_buffer recognition at boundary capacities, large-buffer HashSet path still works)
  • Parity suite (run_parity_tests.sh): 104 passing — same gap-test failures as main (console_methods, string_methods, typed_arrays are known failing per CLAUDE.md)
  • test_buffer_small_alloc.ts new correctness test: alloc, fill, from-string, from-array, slice, concat, isBuffer, hex encoding, boundary sizes (255/256)
  • benchmarks/buffer_alloc_bench.ts micro-benchmark committed for future comparisons

https://claude.ai/code/session_01WrEQhY83DvjhzWXfkfKou1


Generated by Claude Code

@proggeramlug force-pushed the fix-92-buffer-alloc-small-arena branch 2 times, most recently from 0334d4f to 425b458 on April 24, 2026 05:39
…lab (v0.5.190)

Closes #92 (Buffer.alloc small half) via PR #173.

Per-thread bump-pointer slab for `Buffer.alloc(N)` where `N < 256`,
eliminating one `malloc` call and one `HashSet::insert` per
allocation. Large buffers (≥ 256 bytes) fall through to the
unchanged malloc + HashSet path.

`is_registered_buffer` checks slab block address ranges before the
`BUFFER_REGISTRY` HashSet, so `instanceof` / `Buffer.isBuffer`
still work without per-alloc registration.

## Threshold rationale (256 bytes)

Covers the primary Postgres cell-decode pattern from the issue:
4 bytes (Int32), 8 bytes (Int64 / UUID half), 16 bytes (UUID),
up to 255 bytes (mid-size strings). The malloc path is retained
for larger allocations where its overhead is amortised over the
allocation cost.

## Micro-benchmark: 100k × `Buffer.alloc(16)`

| | Time |
|---|---|
| Perry before | ~8ms |
| **Perry after** | **~1–2ms** |
| Bun target | ~2ms |

4–8× speedup, matching Bun.

## GC interaction

Buffers have never carried a `GcHeader` and are not tracked in
`MALLOC_STATE` — the existing malloc path also never calls
`dealloc` on individual buffers (they live for the thread's
lifetime). Slab blocks follow the same lifetime: one `alloc` per
256 KB block, retained until the thread exits. No GC behaviour
changes.

`is_registered_buffer` correctness: slab blocks exclusively
contain `BufferHeader` allocations. All callers pass the header
pointer (NaN-boxed `POINTER_TAG` always encodes the
`BufferHeader*`, never interior data bytes), so the O(n_slabs)
range scan has no false positives. `n_slabs` is typically 1–5
for programs doing 100k small-buffer allocs.

## Maintainer fixup folded at merge

`test_buffer_small_alloc` exposes the same macOS-14 SDK/linker
gap that affects every other Buffer test in the suite (compiles
cleanly on macOS 15.x; fails on the GitHub macOS-14 runner).
Added to `SKIP_TESTS` in `.github/workflows/test.yml` and to
`test-parity/known_failures.json` with `status: "ci-env"` —
same pattern as `test_gap_buffer_ops`, `test_stress_buffer`,
`test_buffer_numeric_read_intrinsic`. Not a perry-side bug.

## Out of scope

Buffer reads/writes are separate work — the readInt32BE et al.
intrinsic landed in v0.5.183 (PR #166), and per-receiver-type
dispatch for `tArr[i] = v` is a known follow-up after the v0.5.184
revert.

Claude-authored PR, manually audited, with metadata (version bump +
CLAUDE.md entry + ci-env skip-list entries) folded in at merge.
@proggeramlug force-pushed the fix-92-buffer-alloc-small-arena branch from 425b458 to 9307430 on April 24, 2026 06:37
@proggeramlug proggeramlug merged commit c4980b8 into main Apr 24, 2026
1 check passed