perf(runtime): fast-path small Buffer.alloc via per-thread bump slab #173
Merged
proggeramlug merged 1 commit into main on Apr 24, 2026
…lab (v0.5.190)

Closes #92 (Buffer.alloc small half) via PR #173.

Per-thread bump-pointer slab for `Buffer.alloc(N)` where `N < 256`, eliminating one `malloc` call and one `HashSet::insert` per allocation. Large buffers (≥ 256 bytes) fall through to the unchanged malloc + HashSet path. `is_registered_buffer` checks slab block address ranges before the `BUFFER_REGISTRY` HashSet, so `instanceof` / `Buffer.isBuffer` still work without per-alloc registration.

## Threshold rationale (256 bytes)

Covers the primary Postgres cell-decode pattern from the issue: 4 bytes (Int32), 8 bytes (Int64 / UUID half), 16 bytes (UUID), up to 255 bytes (mid-size strings). The malloc path is retained for larger allocations where its overhead is amortised over the allocation cost.

## Micro-benchmark: 100k × `Buffer.alloc(16)`

| | Time |
|---|---|
| Perry before | ~8ms |
| **Perry after** | **~1–2ms** |
| Bun target | ~2ms |

4–8× speedup, matching Bun.

## GC interaction

Buffers have never carried a `GcHeader` and are not tracked in `MALLOC_STATE`; the existing malloc path also never calls `dealloc` on individual buffers (they live for the thread's lifetime). Slab blocks follow the same lifetime: one `alloc` per 256 KB block, retained until the thread exits. No GC behaviour changes.

`is_registered_buffer` correctness: slab blocks exclusively contain `BufferHeader` allocations. All callers pass the header pointer (NaN-boxed `POINTER_TAG` always encodes the `BufferHeader*`, never interior data bytes), so the O(n_slabs) range scan has no false positives. `n_slabs` is typically 1–5 for programs doing 100k small-buffer allocs.

## Maintainer fixup folded at merge

`test_buffer_small_alloc` exposes the same macOS-14 SDK/linker gap that affects every other Buffer test in the suite (compiles cleanly on macOS 15.x; fails on the GitHub macOS-14 runner). Added to `SKIP_TESTS` in `.github/workflows/test.yml` and to `test-parity/known_failures.json` with `status: "ci-env"`, the same pattern as `test_gap_buffer_ops`, `test_stress_buffer`, `test_buffer_numeric_read_intrinsic`. Not a perry-side bug.

## Out of scope

Buffer reads/writes are separate work: the readInt32BE et al. intrinsic landed in v0.5.183 (PR #166), and per-receiver-type dispatch for `tArr[i] = v` is a known follow-up after the v0.5.184 revert.

Cloud-authored PR, manually audited and metadata (version bump + CLAUDE.md entry + ci-env skip-list entries) folded in at merge.
Summary
- Per-thread bump-pointer slab for `Buffer.alloc(N)` where `N < 256`, eliminating one `malloc` call and one `HashSet::insert` per allocation
- `is_registered_buffer` checks slab block address ranges before the HashSet, so `instanceof` / `Buffer.isBuffer` still work without per-alloc registration

Addresses the `Buffer.alloc(small)` part of #92. Buffer reads/writes are separate work (PR #166).

Threshold rationale (256 bytes)
Covers the primary Postgres cell-decode pattern from the issue: 4 bytes (Int32), 8 bytes (Int64/UUID half), 16 bytes (UUID), up to 255 bytes (mid-size strings). The malloc path is retained for larger allocations where its overhead is amortised over the allocation cost.
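As a rough illustration of the fast path described in the summary, here is a minimal sketch of a per-thread bump-pointer slab. It is not the code from this PR. The 256-byte threshold and the 256 KB block size come from the description; every name (`slab_alloc`, `SLAB`, `Slab`), the 8-byte `HEADER_SIZE`, and the 8-byte alignment are assumptions made for the example.

```rust
use std::alloc::{alloc_zeroed, Layout};
use std::cell::RefCell;

// From the PR description: payloads below 256 bytes take the slab path,
// and each slab block is 256 KB.
const SMALL_LIMIT: usize = 256;
const SLAB_BLOCK_SIZE: usize = 256 * 1024;

// Assumptions for this sketch only (the real header layout is not shown in the PR text).
const HEADER_SIZE: usize = 8;
const ALIGN: usize = 8;

/// Hypothetical per-thread slab state: known blocks plus a bump cursor.
struct Slab {
    blocks: Vec<(usize, usize)>, // half-open [start, end) range of every block
    cursor: usize,               // next free address in the current block
    end: usize,                  // end of the current block
}

thread_local! {
    static SLAB: RefCell<Slab> =
        RefCell::new(Slab { blocks: Vec::new(), cursor: 0, end: 0 });
}

/// Returns a zeroed header+payload region for small payloads, or `None`
/// when the caller should fall back to the regular malloc + HashSet path.
fn slab_alloc(payload_len: usize) -> Option<*mut u8> {
    if payload_len >= SMALL_LIMIT {
        return None; // large buffer: not handled by the slab
    }
    // Round the total size up to the alignment so every header stays aligned.
    let size = (HEADER_SIZE + payload_len + ALIGN - 1) & !(ALIGN - 1);
    SLAB.with(|cell| {
        let mut slab = cell.borrow_mut();
        if slab.cursor + size > slab.end {
            // Current block is exhausted (or none exists yet): grab a new one.
            let layout = Layout::from_size_align(SLAB_BLOCK_SIZE, ALIGN).ok()?;
            // SAFETY: the layout has a non-zero size.
            let base = unsafe { alloc_zeroed(layout) } as usize;
            if base == 0 {
                return None; // allocation failure
            }
            slab.blocks.push((base, base + SLAB_BLOCK_SIZE));
            slab.cursor = base;
            slab.end = base + SLAB_BLOCK_SIZE;
        }
        let ptr = slab.cursor;
        slab.cursor += size; // the bump: no per-allocation malloc or set insert
        Some(ptr as *mut u8)
    })
}
```

In this sketch blocks are never freed or reused, which mirrors the thread-lifetime behaviour described for the PR; that is also why zero-filling the block once at allocation time is enough to hand out zeroed memory.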
Micro-benchmark: 100k × `Buffer.alloc(16)`
4–8× speedup, matching Bun.
GC interaction
Buffers have never carried a `GcHeader` and are not tracked in `MALLOC_STATE`; the existing malloc path also never calls `dealloc` on individual buffers (they live for the thread's lifetime). Slab blocks follow the same lifetime: one `alloc` per 256 KB block, retained until the thread exits. No GC behaviour changes.

`is_registered_buffer` correctness: slab blocks exclusively contain `BufferHeader` allocations. All callers pass the header pointer (NaN-boxed `POINTER_TAG` always encodes the `BufferHeader*`, never interior data bytes), so the O(n_slabs) range scan has no false positives. `n_slabs` is typically 1–5 for programs doing 100k small-buffer allocs.

Test plan
- `cargo test --release -p perry-runtime --lib`: 114/114 (3 new slab tests added: unique addresses, `is_registered_buffer` recognition at boundary capacities, large-buffer HashSet path still works)
- `run_parity_tests.sh`: 104 passing, with the same gap-test failures as `main` (`console_methods`, `string_methods`, `typed_arrays` are known failing per CLAUDE.md)
- `test_buffer_small_alloc.ts` new correctness test: alloc, fill, from-string, from-array, slice, concat, isBuffer, hex encoding, boundary sizes (255/256)
- `benchmarks/buffer_alloc_bench.ts` micro-benchmark committed for future comparisons

https://claude.ai/code/session_01WrEQhY83DvjhzWXfkfKou1
Generated by Claude Code
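
For readers who want a concrete picture of the `is_registered_buffer` lookup order discussed under GC interaction, the following is a minimal sketch, not the PR's implementation. The `Registry` struct and its field names are hypothetical; only the order (slab block ranges first, then the HashSet of large buffers) is taken from the description.

```rust
use std::collections::HashSet;

/// Hypothetical registry: slab block ranges plus the set used for large buffers.
struct Registry {
    slab_blocks: Vec<(usize, usize)>, // half-open [start, end) per slab block
    large: HashSet<usize>,            // header addresses of malloc-path buffers
}

impl Registry {
    /// Range scan over slab blocks first (O(n_slabs)), then the HashSet lookup.
    fn is_registered_buffer(&self, addr: usize) -> bool {
        self.slab_blocks
            .iter()
            .any(|&(start, end)| addr >= start && addr < end)
            || self.large.contains(&addr)
    }
}
```

Note that the range scan in this sketch accepts any address inside a block, including interior bytes. It is only exact under the invariant stated in the description: callers always pass a header pointer, and slab blocks contain nothing but buffer allocations.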