fix: Node 26 CI failures (wall profiler cleanup-hook UAF + OpenSSL LSan leak) #360
Merged
WallProfiler registers an environment cleanup hook (CleanupHook) via node::AddEnvironmentCleanupHook so it can stop profiling when a worker isolate is terminated without a beforeExit notification. On termination, CleanupHook -> Cleanup -> Dispose called node::RemoveEnvironmentCleanupHook for that same hook.

Node's cleanup-hook trampoline (CleanupHookThunkRun) invokes the hook and then dereferences its internal hook record again to remove the hook itself. Removing the hook from within the hook frees that record early, so when our callback returns Node reads freed memory: a use-after-free that segfaults on worker termination (observed reliably on Node 26, exercised by the "should not crash when worker is terminated" test).

Add a removeCleanupHook flag to Dispose (default true). Cleanup, which only runs from inside CleanupHook, passes false and lets Node remove the hook itself. The still-running call sites (StopCore, StartImpl failure) keep removing it, since there the hook is live and must not fire later against a destroyed profiler.

Reproduced and verified fixed in an Alpine/musl Node 26 container under gdb: crashed within 1-2 runs before, 30/30 clean after.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
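The ownership rule behind the removeCleanupHook flag can be sketched with a self-contained mock. Everything here is hypothetical and for illustration only: MockEnv and MockProfiler are stand-ins, not Node's cleanup-hook machinery or the real WallProfiler, and the mock copies each hook record before calling it, so it stays memory-safe where the real trampoline did not.

```cpp
#include <cassert>
#include <vector>

using HookFn = void (*)(void*);

// Hypothetical stand-in for an environment's cleanup-hook list. It only models
// the rule that the runner removes a hook's record itself after invoking it.
class MockEnv {
 public:
  void AddHook(HookFn fn, void* arg) { hooks_.push_back({fn, arg}); }

  void RemoveHook(HookFn fn, void* arg) {
    for (auto it = hooks_.begin(); it != hooks_.end(); ++it) {
      if (it->fn == fn && it->arg == arg) {
        hooks_.erase(it);
        return;
      }
    }
  }

  // Termination path: invoke each hook, then drop its record.
  void RunCleanupHooks() {
    while (!hooks_.empty()) {
      Hook h = hooks_.back();  // copied so the mock cannot read a freed record
      h.fn(h.arg);
      RemoveHook(h.fn, h.arg);  // the runner, not the hook, removes the record
    }
  }

  std::size_t HookCount() const { return hooks_.size(); }

 private:
  struct Hook {
    HookFn fn;
    void* arg;
  };
  std::vector<Hook> hooks_;
};

// Hypothetical profiler mirroring the Dispose(removeCleanupHook) pattern.
class MockProfiler {
 public:
  explicit MockProfiler(MockEnv* env) : env_(env) {
    assert(env_ != nullptr);
    env_->AddHook(&MockProfiler::CleanupHook, this);
  }

  // Still-running call site: the hook is live, so it must be removed here.
  void Stop() { Dispose(); }

  int selfRemovals() const { return self_removals_; }
  bool disposed() const { return disposed_; }

 private:
  static void CleanupHook(void* data) {
    static_cast<MockProfiler*>(data)->Cleanup();
  }

  // Only reached from inside CleanupHook, so removal is left to the runner.
  void Cleanup() { Dispose(/*removeCleanupHook=*/false); }

  void Dispose(bool removeCleanupHook = true) {
    if (disposed_) return;
    disposed_ = true;
    if (removeCleanupHook) {
      env_->RemoveHook(&MockProfiler::CleanupHook, this);
      ++self_removals_;
    }
  }

  MockEnv* env_;
  bool disposed_ = false;
  int self_removals_ = 0;
};

// Termination: the hook runs, and the profiler never removes its own hook.
bool TerminationLeavesRemovalToEnv() {
  MockEnv env;
  MockProfiler profiler(&env);
  env.RunCleanupHooks();
  return profiler.disposed() && profiler.selfRemovals() == 0 &&
         env.HookCount() == 0;
}

// Normal stop: the profiler removes the live hook so it cannot fire later.
bool StopRemovesLiveHook() {
  MockEnv env;
  MockProfiler profiler(&env);
  profiler.Stop();
  bool removed = env.HookCount() == 0;
  env.RunCleanupHooks();  // nothing is left to run against the stopped profiler
  return removed && profiler.selfRemovals() == 1;
}
```

Both helper functions are expected to return true when called from a test or from main. The default argument keeps the existing call sites unchanged, so only the path that runs inside the hook has to opt out of removal.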
Overall package size

Self size: 2.19 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| pprof-format | 2.2.2 | 500.53 kB | 500.53 kB |
| source-map | 0.7.6 | 185.63 kB | 185.63 kB |
| node-gyp-build | 4.8.4 | 13.86 kB | 13.86 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe
Node 26's OpenSSL populates a global builtin-compressions list during OPENSSL_init_crypto (ossl_load_builtin_compressions) that is never freed at process exit, causing the ASAN job to fail with a 24-byte leak report. This is the same class of benign one-time process-init leak we already suppress via CRYPTO_zalloc, but this path allocates via CRYPTO_malloc and so slips through.

Suppress the specific ossl_load_builtin_compressions frame rather than broadening to CRYPTO_malloc, which is OpenSSL's general allocator and could mask genuine leaks in crypto code paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
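As a rough illustration, the relevant part of the LeakSanitizer suppressions file might look like the fragment below. The two patterns are the ones named in this commit message; the file name and comments are assumptions, and the real file may contain other entries.

```text
# lsan-suppressions.txt (hypothetical file name)
# Existing entry: one-time process-init allocations made via CRYPTO_zalloc.
leak:CRYPTO_zalloc
# New entry: builtin-compressions list populated during OPENSSL_init_crypto.
leak:ossl_load_builtin_compressions
```

LeakSanitizer reads such a file when it is passed through LSAN_OPTIONS=suppressions=<path>. A leak pattern matches a function name, source file name, or module name in the leak's stack, which is why naming the single ossl_load_builtin_compressions frame is narrower than matching CRYPTO_malloc.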
IlyasShabi approved these changes on Jun 26, 2026
IlyasShabi pushed a commit that referenced this pull request on Jun 26, 2026
IlyasShabi pushed a commit that referenced this pull request on Jun 26, 2026
szegedi added a commit that referenced this pull request on Jun 26, 2026
Fixes the two Node.js 26 CI failures. They're bundled into one PR because each failure blocks the other's CI from going green, so neither can be merged alone.

1. build / alpine-test-26: SIGSEGV (use-after-free)

Symptom

alpine-test-26 crashes with SIGSEGV (exit 139) during the Worker Threads › should not crash when worker is terminated test, which spawns and terminates many workers that each run the wall profiler. Faulting thread:

Root cause

WallProfiler registers an environment cleanup hook (CleanupHook) via node::AddEnvironmentCleanupHook so it can stop profiling when a worker isolate is terminated without a beforeExit notification. On termination, CleanupHook → Cleanup → Dispose called node::RemoveEnvironmentCleanupHook for that same hook.

Node's cleanup-hook trampoline runs the hook and then dereferences its internal hook record again to remove the hook itself (disassembly: the fault is at ldr x0, [x19] right after the blr that calls our callback, and the fault address equals x19, the record pointer):

Because our callback removed (and freed) the record mid-execution, Node read freed memory when it returned.

Fix

Add a removeCleanupHook flag to Dispose (default true). Cleanup, which only ever runs from inside CleanupHook, passes false and lets Node remove the hook itself. The still-running call sites (StopCore, StartImpl failure path) keep removing it, since there the hook is live and must not fire later against a destroyed profiler.

Verification

Reproduced and verified in an Alpine/musl Node 26 container under gdb: the worker-termination test crashed within 1–2 runs before the change, and ran 30/30 clean after it. Covered by the existing should not crash when worker is terminated test.

2. asan (26): LeakSanitizer failure (OpenSSL)

Symptom

Root cause & fix

Node 26's OpenSSL populates a global builtin-compressions list during OPENSSL_init_crypto that is never freed at process exit. This is the same class of benign one-time process-init leak already suppressed via leak:CRYPTO_zalloc, but this path allocates via CRYPTO_malloc and so isn't matched. Suppress the specific ossl_load_builtin_compressions frame (rather than broadening to CRYPTO_malloc, OpenSSL's general allocator, which could mask real leaks).

Supersedes #359 (LSan suppression), which is folded in here as a separate commit.

🤖 Generated with Claude Code