
fix(revert): avm sim uses event loop again (#21138) #21630

Merged
ludamad merged 1 commit into backport-to-v4-staging from revert/threaded-async-op-v4-clean on Mar 16, 2026

Conversation


@ludamad ludamad commented Mar 16, 2026

Collaborator

Reverts #21138 on v4. ThreadedAsyncOperation has a use-after-free that causes SIGBUS on macOS and silent memory corruption on Linux. Restoring AsyncOperation (libuv pool) with the original deadlock-prevention semaphore (UV_THREADPOOL_SIZE / 2) until a proper fix lands on next (#21625).
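
As a rough illustration of the deadlock-prevention semaphore mentioned above, here is a minimal sketch of capping concurrent simulations at half the pool size. It does not reproduce the actual `AsyncOperation` code: `SketchSemaphore`, `max_concurrent_simulations`, and `peak_concurrency` are hypothetical names, plain `std::thread` workers stand in for the libuv pool, and the clamp to at least one permit is an assumption. The presumed rationale (leaving pool threads free for work the simulations themselves wait on) is likewise an assumption, not something this PR states.

```cpp
#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore built on a mutex and condition variable.
class SketchSemaphore {
public:
    explicit SketchSemaphore(std::size_t permits) : permits_(permits) {}

    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return permits_ > 0; });
        --permits_;
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++permits_;
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t permits_;
};

// Hypothetical helper: half the pool size, clamped to at least 1 (assumption).
std::size_t max_concurrent_simulations(std::size_t pool_size) {
    return std::max<std::size_t>(1, pool_size / 2);
}

// Runs `tasks` placeholder simulations on `pool_size` threads and returns the
// highest number that were inside the guarded section at the same time.
std::size_t peak_concurrency(std::size_t pool_size, std::size_t tasks) {
    SketchSemaphore gate(max_concurrent_simulations(pool_size));
    std::atomic<std::size_t> next{0};
    std::atomic<std::size_t> running{0};
    std::atomic<std::size_t> peak{0};
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < pool_size; ++t) {
        pool.emplace_back([&] {
            while (next.fetch_add(1) < tasks) {
                gate.acquire();
                std::size_t now = running.fetch_add(1) + 1;
                std::size_t seen = peak.load();
                // Record a new maximum if this thread observed one.
                while (now > seen && !peak.compare_exchange_weak(seen, now)) {
                }
                std::this_thread::yield();  // stand-in for simulation work
                running.fetch_sub(1);
                gate.release();
            }
        });
    }
    for (auto& worker : pool) {
        worker.join();
    }
    return peak.load();
}
```

With libuv's default pool size of 4, `max_concurrent_simulations(4)` yields 2 permits, so at most two placeholder simulations hold the gate at once while the other two threads wait in `acquire()`.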

Post mortem

…v pool (#21138)

ThreadedAsyncOperation has a use-after-free that causes SIGBUS on macOS
and silent corruption on Linux. Reverting to AsyncOperation (libuv pool)
with the original UV_THREADPOOL_SIZE/2 deadlock-prevention semaphore
until a proper fix lands on next.
@ludamad ludamad changed the title from "revert: run AVM NAPI simulations on dedicated threads instead of libuv pool (#21138)" to "fix(revert): avm sim uses event loop again (#21138)" Mar 16, 2026
@ludamad ludamad enabled auto-merge (squash) March 16, 2026 19:17
@AztecBot

Collaborator

Flakey Tests

🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/2c7b3bc940656e86): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_mbps.parallel.test.ts "builds multiple blocks per slot with transactions anchored to checkpointed block" (177s) (code: 0) group:e2e-p2p-epoch-flakes

@ludamad ludamad merged commit 619b541 into backport-to-v4-staging Mar 16, 2026
12 of 13 checks passed
@ludamad ludamad deleted the revert/threaded-async-op-v4-clean branch March 16, 2026 21:20
alexghr added a commit that referenced this pull request Mar 17, 2026
BEGIN_COMMIT_OVERRIDE
fix(aztec-nr): return Option from decode functions and fix event
commitment capacity (backport #21264) (#21360)
fix: backport #21271 — handle bad note lengths on
compute_note_hash_and_nullifier (#21364)
fix: not reusing tags of partially reverted txs (#20817)
chore: revert accidental backport of #20817 (#21583)
feat: Implement commit all and revert all for world state checkpoints
(#21532)
cherry-pick: fix: dependabot alerts (#21531)
fix: dependabot alerts (backport #21531 to v4) (#21592)
fix: backport #21443 — Don't update state if we failed to execute
sufficient transactions (v4) (#21610)
chore: Fix msgpack serialisation (#21612)
fix(p2p): fall back to maxTxsPerCheckpoint for per-block tx validation
(#21605)
chore: merge v4 into backport-to-v4-staging (#21618)
fix(revert): avm sim uses event loop again (#21138) (#21630)
fix(e2e): remove historic/finalized block checks from epochs_multiple
test (#21642)
fix: clamp finalized block to oldest available in world-state (#21643)
fix: skip handleChainFinalized when block is behind oldest available
(#21656)
chore: demote finalized block skip log to trace (#21661)
fix: off-by-1 in getBlockHashMembershipWitness archive snapshot
(backport #21648) (#21663)
fix: capture txs not available error reason in proposal handler (#21670)
chore: add L1 inclusion time to stg public (#21665)
END_COMMIT_OVERRIDE

---------

Co-authored-by: Jan Beneš <janbenes1234@gmail.com>
Co-authored-by: PhilWindle <60546371+PhilWindle@users.noreply.github.com>
Co-authored-by: Phil Windle <philip.windle@gmail.com>
Co-authored-by: Santiago Palladino <santiago@aztecprotocol.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: ludamad <adam.domurad@gmail.com>
Co-authored-by: Alex Gherghisan <alexghr@users.noreply.github.com>
AztecBot pushed a commit that referenced this pull request May 19, 2026
Fixes use-after-free in `ThreadedAsyncOperation` (#21138) that causes SIGBUS on macOS and silent memory corruption on Linux. v4 is handled by reverting: #21630.

**Root cause**: TSFN `BlockingCall` (`napi_tsfn_blocking`) only blocks on *queue insertion*, NOT on callback completion. The callback runs asynchronously on the JS main thread, so `delete this` on the worker thread raced with the callback reading member fields. macOS's magazine malloc aggressively unmaps freed pages, turning this into a consistent SIGBUS. Linux glibc keeps pages mapped, so the race is silent.

**Fix**: manage `ThreadedAsyncOperation` via `shared_ptr` (`enable_shared_from_this`). Both the worker thread lambda and the TSFN callback capture a `shared_ptr`, so the object lives until both are done. Verified clean under ASAN with 1000+ concurrent operations (heap-use-after-free confirmed on buggy code, clean on fix).
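
The lifetime rule described in the fix can be sketched without N-API. This is a minimal illustration, not the actual `ThreadedAsyncOperation` code: `SketchOperation`, `SketchResult`, and `run_sketch` are hypothetical names, and the `post` function is an assumed stand-in for a TSFN `BlockingCall` that only enqueues the callback and returns before it runs.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <thread>
#include <utility>

// Hypothetical operation: works on a thread, then reports via a queued callback.
class SketchOperation : public std::enable_shared_from_this<SketchOperation> {
public:
    SketchOperation(std::string payload, std::atomic<int>& destroyed)
        : payload_(std::move(payload)), destroyed_(destroyed) {}
    ~SketchOperation() { destroyed_.fetch_add(1); }

    // `post` models queue insertion only: it stores the callback and returns.
    std::thread start(std::function<void(std::function<void()>)> post,
                      std::promise<std::string>& result) {
        return std::thread(
            [self = shared_from_this(), post = std::move(post), &result]() mutable {
                // The worker's reference; released when this body returns.
                auto keep_alive = std::move(self);
                // The callback captures its own reference, so the object
                // outlives the worker until the callback itself is destroyed.
                post([keep_alive, &result] {
                    result.set_value(keep_alive->payload_);
                });
            });
    }

private:
    std::string payload_;
    std::atomic<int>& destroyed_;
};

struct SketchResult {
    std::string seen_by_callback;
    int destroyed_before_callback;
    int destroyed_after_callback;
};

SketchResult run_sketch() {
    std::atomic<int> destroyed{0};
    std::function<void()> queued;  // single-slot stand-in for the callback queue
    std::promise<std::string> result;

    auto op = std::make_shared<SketchOperation>("simulation result", destroyed);
    std::thread worker = op->start(
        [&queued](std::function<void()> cb) { queued = std::move(cb); }, result);
    op.reset();     // the caller drops its reference
    worker.join();  // the worker is done and its reference is gone

    int before = destroyed.load();  // the queued callback still owns the object
    assert(queued && "worker must have queued the callback");
    queued();          // runs later, on a different thread than the worker
    queued = nullptr;  // destroying the callback releases the last reference
    int after = destroyed.load();
    return {result.get_future().get(), before, after};
}
```

In this sketch the destructor count stays at 0 after the worker exits and only reaches 1 once the queued callback has run and been destroyed. With a raw `delete this` at the end of the worker body instead, the callback would read `payload_` from freed memory.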

[Full post mortem](https://gist.github.com/ludamad/443afe321853389a08693c4ff73676f7)
github-merge-queue Bot pushed a commit that referenced this pull request May 19, 2026
…cOS (#21625)


3 participants