feat: AVM cutover — delete NAPI AVM, wire IPC simulator pool + CDB IPC server [PR 3b]#23196

Merged
charlielye merged 0 commits into cl/ipc-4-avm-binary from cl/ipc-5-avm-cutover
May 29, 2026


@charlielye

Contributor

Summary

Cuts the in-process NAPI AVM over to the standalone `aztec-avm` binary (PR 3a, #23084) + a TS-hosted `CdbIpcServer` for contract data callbacks. NAPI AVM and its TsCallback-based contracts DB are deleted.

Third (and final) stage of the WSDB/AVM IPC architecture migration:

NAPI side (delete)

  • Delete `barretenberg/cpp/src/barretenberg/nodejs_module/avm_simulate/` (NAPI AVM module + TS-callback contracts DB + utils).
  • Strip the AVM exports from `init_module.cpp`; `nodejs_module.node` now exposes only `LMDBStore`, `MsgpackClient`, `MsgpackClientAsync`.
  • Drop the `NativeAvm`/`NativeAvmCancellationToken` exports from `@aztec/native`.

TS side (rewire)

  • `public_tx_simulator/cpp_public_tx_simulator.ts` drives an `AvmIpcBackend` (pool from PR 3a), returns `SimulationHandle { result, cancel }`.
  • `public_tx_simulator/factories.ts` returns the IPC simulator for production block building and proving.
  • `public_processor` + `guarded_merkle_tree` updated for the new `SimulationHandle` and `AvmIpcBackend` types.
  • `public_tx_simulator/ipc_vs_ts_public_tx_simulator.ts` added as the new dual-run test helper (replaces `cpp_vs_ts_public_tx_simulator.ts`).
  • `public_tx_simulator/dumping_cpp_public_tx_simulator.ts` updated to dump IPC msgpack inputs.

Integration

  • `aztec-node/src/aztec-node/server.ts` spawns `AvmBackend` + `CdbIpcServer` alongside `WsdbBackend`.
  • `validator-client/checkpoint_builder` and `prover-node` wired through.
  • `txe` test infra uses the same backends.

Removed test

  • `simulator/src/public/public_processor/apps_tests/timeout_race.test.ts` — the bug it guarded against (in-process C++ AVM racing TS checkpoint reverts on the same WorldState handle) is structurally impossible once the AVM is a separate process. The cancellation path remains exercised through `SimulationHandle.cancel()`.

Known issue (bug-debug zone)

CI surfaces the partial-note `_finalize_mint_to_private` gas-exhaustion bug inherited from `cl/wsdb_cdb`: the IPC AVM consumes all gas and reverts with empty `revertData`, which is the fingerprint of C++ `handle_exceptional_halt`. Because the diff in this PR is bounded, the bug can be fixed in isolation.

Test plan

  • yarn-project build green
  • simulator unit tests pass
  • e2e_block_building / e2e_avm_simulator pass
  • e2e_fees/account_init passes — bug-fix zone for this PR
  • Local sandbox boots and accepts a transaction

@AztecBot

Collaborator

This issue was automatically closed because it was referenced in PR #23469 which has been merged to the default branch.


@AztecBot AztecBot closed this May 22, 2026
@charlielye charlielye reopened this May 29, 2026
@AztecBot

Collaborator

Flakey Tests

🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/23a45829fd6bc027): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_p2p/add_rollup.test.ts (606s) (code: 0) group:e2e-p2p-epoch-flakes

@charlielye charlielye force-pushed the cl/ipc-4-avm-binary branch from 479d6a6 to 6867e96 Compare May 29, 2026 13:55
@charlielye charlielye merged commit 6867e96 into cl/ipc-4-avm-binary May 29, 2026
12 of 20 checks passed
@charlielye charlielye deleted the cl/ipc-5-avm-cutover branch May 29, 2026 13:58
@charlielye charlielye force-pushed the cl/ipc-5-avm-cutover branch from 1c8d4f9 to 6867e96 Compare May 29, 2026 13:58
@charlielye charlielye restored the cl/ipc-5-avm-cutover branch May 29, 2026 13:59
danielntmd pushed a commit to danielntmd/aztec-packages that referenced this pull request Jun 4, 2026
…AztecProtocol#23469)

## Summary

`aztec start --local-network` reliably SIGBUSes a few blocks into a run
on macOS arm64 (since `v5.0.0-nightly.20260520`, i.e. after AztecProtocol#21625
shipped the `shared_ptr` use-after-free fix). This is a **different**
fault from the one AztecProtocol#21625 fixed: a stack-guard violation (stack
overflow) on a `nodejs_module.node` worker thread running AVM-simulation
code, not a use-after-free.

This pins an explicit, generous stack size on the
`ThreadedAsyncOperation` worker thread.

## Root cause

`ThreadedAsyncOperation::Queue()` (introduced in AztecProtocol#21138) runs the AVM
simulation (`_fn`) directly on a bare `std::thread(...).detach()`. A
`std::thread` uses the OS default stack for non-main threads, which is
**512 KB on macOS** versus **8 MB on Linux**. The AVM-simulation call
chain is deep enough to overflow 512 KB, so on macOS arm64 the worker
writes into its stack-guard page and the process aborts with:

```
EXC_BAD_ACCESS / SIGBUS, KERN_PROTECTION_FAILURE
"Could not determine thread index for stack guard region"
  #0 _platform_memmove
  #1.. nodejs_module.node  bb::nodejs (AVM simulation path)
```

Linux is unaffected because its 8 MB default is comfortably large. The
previous `AsyncOperation` path never hit this either: it ran on the
libuv threadpool, whose threads are sized from `RLIMIT_STACK` (8 MB soft
on macOS), not the 512 KB raw-thread default.

## Fix

`std::thread` can't set a stack size, so launch the worker via
`pthreads` with `pthread_attr_setstacksize` pinned to a generous
`WORKER_STACK_SIZE` (32 MB — 4× the 8 MB that the libuv path proved
sufficient, with headroom for deeper future call chains). Falls back to
a default-stack `std::thread` only if pthreads is unavailable (`_WIN32`)
or `pthread_create` fails.

The shared_ptr lifetime model from AztecProtocol#21625 is preserved exactly — both
the worker lambda and the `BlockingCall` completion callback still
capture `self`, so this does not reintroduce the use-after-free. Only
the thread-launch mechanism changed.
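
The launch mechanism described in this section can be sketched roughly as follows. This is a minimal illustration, not the code in `ThreadedAsyncOperation`: `launch_detached_worker`, `worker_trampoline`, `current_thread_stack_size`, and `measure_worker_stack_size` are hypothetical names, and only the 32 MB `WORKER_STACK_SIZE` value and the pthread-then-`std::thread` fallback order come from the text above. The `shared_ptr` lifetime handling from AztecProtocol#21625 is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>
#ifndef _WIN32
#include <pthread.h>
#endif

// Assumed constant, matching the 32 MB figure quoted above.
constexpr std::size_t WORKER_STACK_SIZE = 32UL * 1024 * 1024;

#ifndef _WIN32
// pthreads needs a plain function entry point, so the std::function travels
// through the void* argument and is freed by the worker once it has run.
static void* worker_trampoline(void* arg)
{
    std::unique_ptr<std::function<void()>> fn(static_cast<std::function<void()>*>(arg));
    (*fn)();
    return nullptr;
}
#endif

// Launches `work` on a detached thread. Returns true if the thread was created
// via pthreads with WORKER_STACK_SIZE, false if it fell back to std::thread.
bool launch_detached_worker(std::function<void()> work)
{
    assert(work && "work must be callable");
#ifndef _WIN32
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) == 0) {
        auto heap_fn = std::make_unique<std::function<void()>>(std::move(work));
        pthread_t tid;
        bool created = pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE) == 0 &&
                       pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) == 0 &&
                       pthread_create(&tid, &attr, worker_trampoline, heap_fn.get()) == 0;
        pthread_attr_destroy(&attr);
        if (created) {
            heap_fn.release(); // the worker thread owns the callable now
            return true;
        }
        work = std::move(*heap_fn); // creation failed: take the callable back
    }
#endif
    std::thread(std::move(work)).detach(); // default-stack fallback
    return false;
}

// Stack size of the calling thread in bytes, or 0 where this sketch cannot query it.
std::size_t current_thread_stack_size()
{
#if defined(__APPLE__)
    return pthread_get_stacksize_np(pthread_self());
#elif defined(__linux__)
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) {
        return 0;
    }
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) {
        size = 0;
    }
    pthread_attr_destroy(&attr);
    return size;
#else
    return 0;
#endif
}

// Runs a probe on the worker and returns the stack size it observed.
std::size_t measure_worker_stack_size()
{
    auto result = std::make_shared<std::promise<std::size_t>>();
    std::future<std::size_t> observed = result->get_future();
    launch_detached_worker([result] { result->set_value(current_thread_stack_size()); });
    return observed.get();
}
```

Calling `measure_worker_stack_size()` is one way to repeat the standalone check mentioned under Testing: on a platform where the pthread path is taken, the probe should report a stack well above the 8 MB default rather than the raw-thread default.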

## Testing

- The full bb build is too heavy to run in this session, so this is
**not yet a local end-to-end repro/fix verification** — it relies on CI
for compilation and on a macOS arm64 `aztec start --local-network` run
to confirm the crash is gone.
- The pthread/`std::function` trampoline was compiled and run standalone
under `-std=c++20 -Wall -Wextra -Werror`: the worker thread receives a
32 MB stack (`pthread_get_stacksize_np` reports `33554432`), and the
work runs and completes.
- **Requested:** verify against tonight's nightly on macOS arm64 (M3) —
the reporter's exact repro.

## Notes for reviewers

- Targets `next` (not `merge-train/barretenberg`) to match AztecProtocol#21625's base
and to make the nightly, since this is an urgent release-affecting
crash. Happy to retarget if you'd prefer it go through the merge train.
- 32 MB is a deliberate over-provision; if you'd rather mirror the libuv
path precisely we could instead size from `getrlimit(RLIMIT_STACK)`. The
fixed constant is simpler and the virtual reservation only commits pages
as touched.
- The longer-term fix is the NAPI→IPC migration (AztecProtocol#21331 / AztecProtocol#23196 /
AztecProtocol#23238), which removes this in-process worker entirely. This is a
targeted stop-gap for the shipping NAPI path.
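
For comparison, the `getrlimit(RLIMIT_STACK)` alternative mentioned in the second note could look something like the sketch below. The function name and both bounds are assumptions made for illustration and do not appear in the PR; the clamping is an extra safeguard of this sketch, not something the libuv path is claimed to do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sys/resource.h>

// Assumed bounds for this sketch; neither value comes from the PR.
constexpr std::size_t MIN_WORKER_STACK_SIZE = 8UL * 1024 * 1024;
constexpr std::size_t MAX_WORKER_STACK_SIZE = 64UL * 1024 * 1024;

// Derives a worker stack size from the soft RLIMIT_STACK limit, clamped to
// [MIN_WORKER_STACK_SIZE, MAX_WORKER_STACK_SIZE].
std::size_t worker_stack_size_from_rlimit()
{
    std::size_t size = MIN_WORKER_STACK_SIZE; // used if getrlimit fails
    struct rlimit limit {};
    if (getrlimit(RLIMIT_STACK, &limit) == 0) {
        if (limit.rlim_cur == RLIM_INFINITY || limit.rlim_cur > MAX_WORKER_STACK_SIZE) {
            size = MAX_WORKER_STACK_SIZE; // unlimited or very large soft limit
        } else {
            size = static_cast<std::size_t>(limit.rlim_cur);
        }
    }
    size = std::max(size, MIN_WORKER_STACK_SIZE);
    assert(size >= MIN_WORKER_STACK_SIZE && size <= MAX_WORKER_STACK_SIZE);
    return size;
}
```

The result would be passed to `pthread_attr_setstacksize` in place of the fixed constant. The trade-off is the one the note already names: the fixed constant is simpler and behaves the same regardless of the user's shell limits.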

Related: AztecProtocol#21138 (introduced the threaded model), AztecProtocol#21625 (use-after-free
fix), AztecProtocol#21629 (open alternative).

---
*Created by
[claudebox](https://claudebox.work/v2/sessions/4bd36dc505c20254) ·
group: `slackbot`*