chore: low-hanging chonk prover fixes from profiling by suyash67 · Pull Request #22855 · AztecProtocol/aztec-packages

suyash67 · 2026-04-29T16:33:06Z

A handful of profiling-driven trims for Chonk client-IVC. Each commit targets one zone, no behaviour change.

Per-commit gains

Each row is the commit applied alone on top of merge-train/barretenberg, measured on ecdsar1+transfer_1_recursions+sponsored_fpc, 16-thread remote bench, single sample per build.

| Commit | Target zone | Native baseline | Native Δ | WASM baseline | WASM Δ |
|---|---|---|---|---|---|
| W2: reuse precomputed VK in `Chonk::accumulate_hiding_kernel` | hiding kernel | 121 ms | −104 ms | 343 ms | −301 ms |
| W3: parallelise `construct_trace_data` over trace blocks¹ | construct_trace_data | 135 ms | −34 ms | 373 ms | −116 ms |
| W6: parallelise `compute_permutation_mapping` cycle loop | compute_permutation_mapping | 36 ms | −11 ms | 61 ms | −23 ms |

¹ W3 splits the work into two parallel phases: Phase 1 fans out per-block (wires + copy-cycle node emission), Phase 2 fans out over a flattened (block, selector) task list so the threadpool can load-balance selector filling across blocks regardless of per-block size skew. The single-pass-per-block structure that the original W3 used was WASM-only; the flattened selector phase is what unlocks the native gain.

² End-to-end numbers below were measured with the original W4 commit included; subtract W4's per-commit Δ (−17 ms native, −71 ms WASM) for the W4-less stack, or roll #22893 in for the up-to-date total.
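To illustrate the Phase 2 idea from footnote ¹, here is a minimal sketch of fanning out over a flattened (block, selector) task list. Everything in it is an assumption made for illustration: `SelectorBlock`, `fill_selectors_flattened`, the `int` cells and the atomic-counter work distribution are hypothetical stand-ins, not barretenberg's types or its threadpool.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a trace block: its first row in the full trace and
// one column of values per selector.
struct SelectorBlock {
    std::size_t trace_offset;
    std::vector<std::vector<int>> selectors; // selectors[s][row within block]
};

// Fill full-trace selector columns by fanning out over (block, selector) pairs
// rather than over blocks, so one large block no longer pins a single thread.
std::vector<std::vector<int>> fill_selectors_flattened(const std::vector<SelectorBlock>& blocks,
                                                       std::size_t num_selectors,
                                                       std::size_t trace_size,
                                                       std::size_t num_threads)
{
    struct Task {
        std::size_t block;
        std::size_t selector;
    };
    std::vector<Task> tasks;
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        assert(blocks[b].selectors.size() == num_selectors);
        for (std::size_t s = 0; s < num_selectors; ++s) {
            assert(blocks[b].trace_offset + blocks[b].selectors[s].size() <= trace_size);
            tasks.push_back(Task{ b, s });
        }
    }

    std::vector<std::vector<int>> out(num_selectors, std::vector<int>(trace_size, 0));
    std::atomic<std::size_t> next{ 0 };

    // Each worker repeatedly claims the next unclaimed task. Two tasks never
    // write the same cell: they differ in selector (different column) or in
    // block (different row range within the column).
    auto worker = [&] {
        for (;;) {
            const std::size_t t = next.fetch_add(1);
            if (t >= tasks.size()) {
                return;
            }
            const SelectorBlock& blk = blocks[tasks[t].block];
            const std::vector<int>& src = blk.selectors[tasks[t].selector];
            std::copy(src.begin(),
                      src.end(),
                      out[tasks[t].selector].begin() + static_cast<std::ptrdiff_t>(blk.trace_offset));
        }
    };

    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_threads; ++i) {
        workers.emplace_back(worker);
    }
    for (auto& w : workers) {
        w.join();
    }
    return out;
}
```

The atomic counter is just one simple way to get dynamic load balancing in a standalone example; the real code relies on the existing threadpool for that.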

End-to-end (full stack on top of baseline)

| Zone | Native baseline | Native Δ | WASM baseline | WASM Δ |
|---|---|---|---|---|
| `Chonk::accumulate` (×11 per proof) | 3292 ms | −103 ms (3.1%)² | 9172 ms | −532 ms (5.8%)² |
| `ChonkAPI::prove` (full E2E) | 6236 ms | −121 ms (2%)² | 16728 ms | −532 ms (3.2%)² |

A fifth change (fold ECCVM masking poly into wire batch) was prototyped but didn't show measurable impact under the new "masking at top of trace" model, so it was dropped.

The original W4 (fuse N `add_scaled` in `HypernovaFoldingProver::batch_polynomials`) has been split out into #22893, where it's extracted into a shared `Polynomial::add_scaled_batch` helper and applied to the PCS poly batcher and AVM prover.
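For readers who have not opened #22893, a rough sketch of the fused `add_scaled` idea: one parallel region for all sources, chunked over the destination index range so that writes stay disjoint across threads even when sources cover different windows. The types here are assumptions for illustration only: `SparsePoly`, `ScaledSource` and this free-function `add_scaled_batch` are hypothetical, and wrapping `uint64_t` arithmetic stands in for field arithmetic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical polynomial that only stores coefficients for the window
// [start_index, end_index()).
struct SparsePoly {
    std::size_t start_index;
    std::vector<std::uint64_t> coeffs;
    std::size_t end_index() const { return start_index + coeffs.size(); }
};

struct ScaledSource {
    const SparsePoly* poly;
    std::uint64_t scalar;
};

// dst[i] += sum over k of scalar_k * src_k[i], using a single fan-out instead
// of one parallel region per source.
void add_scaled_batch(std::vector<std::uint64_t>& dst,
                      const std::vector<ScaledSource>& sources,
                      std::size_t num_threads)
{
    assert(num_threads > 0);
    for (const auto& s : sources) {
        assert(s.poly->end_index() <= dst.size());
    }

    const std::size_t chunk = (dst.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t lo = std::min(dst.size(), t * chunk);
        const std::size_t hi = std::min(dst.size(), lo + chunk);
        if (lo == hi) {
            continue;
        }
        // This thread owns dst[lo, hi). It clips every source window to that
        // range, so no other thread touches the same destination coefficient.
        workers.emplace_back([&dst, &sources, lo, hi] {
            for (const auto& s : sources) {
                const std::size_t from = std::max(lo, s.poly->start_index);
                const std::size_t to = std::min(hi, s.poly->end_index());
                for (std::size_t i = from; i < to; ++i) {
                    dst[i] += s.scalar * s.poly->coeffs[i - s.poly->start_index];
                }
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

Chunking per source instead would need either one parallel region per source or some way to keep overlapping windows from racing on `dst`; chunking over the destination avoids both.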

Test plan

chonk_tests (33/33), eccvm_tests (44/44), ultra_honk_tests (283/283), hypernova_tests (9/9) green locally
Profiled native + WASM with /profile-chonk on remote bench machine

iakovenkos · 2026-04-29T18:27:10Z

+    // parallel_for startup overhead into 1×. Chunking over the destination range (not per
+    // source) keeps writes disjoint across threads even when sources have different
+    // start_index/end_index.
+    auto fused_add_scaled = [&](Polynomial<FF>& dst) {


Can you please enable it in the PCS, where the poly batcher calls add_scaled? And in the AVM prover, as @federicobarbacovi pointed out.

iakovenkos

lgtm! left a couple of minor comments/suggestions

Skip the 31 sequential polynomial commits in MegaZKVerificationKey(precomputed) ctor by reusing the caller-supplied precomputed_vk directly as hiding_vk. MegaZKFlavor inherits VerificationKey from MegaFlavor unchanged, so the two types are identical (static_assert enforces this). Falls back to reconstruction when precomputed_vk is null for dev/test paths.

Zone wall drops:
- WASM: 342/300/333 ms -> 46/46/46 ms (-85% to -87%)
- Native: 137/125/137 ms -> 18/17/18 ms (-86% to -87%)

VK pinning short hash d519f639 unchanged; chonk tests pass.
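The shape of this change, as the commit message describes it, can be sketched as follows. This is a simplified illustration under stated assumptions: `BaseFlavor`, `ZKFlavor`, `BaseVerificationKey`, `select_hiding_vk` and `commit_placeholder` are hypothetical stand-ins, not the actual barretenberg flavors or constructors.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the two flavors; the derived one inherits the
// verification key type without changing it.
struct BaseVerificationKey {
    std::vector<int> commitments;
};
struct BaseFlavor {
    using VerificationKey = BaseVerificationKey;
};
struct ZKFlavor : BaseFlavor {};

// Reusing the precomputed key is only sound if both flavors name the same type.
static_assert(std::is_same<BaseFlavor::VerificationKey, ZKFlavor::VerificationKey>::value,
              "flavors must share one VerificationKey type to reuse the precomputed VK");

// Counts calls to the placeholder "commit", so the saving is observable.
static int commit_calls = 0;

int commit_placeholder(std::size_t poly_idx)
{
    ++commit_calls;
    return static_cast<int>(poly_idx);
}

// Return the caller-supplied key when present; otherwise rebuild it with one
// sequential commit per polynomial (the dev/test fallback path).
std::shared_ptr<ZKFlavor::VerificationKey> select_hiding_vk(
    const std::shared_ptr<BaseFlavor::VerificationKey>& precomputed_vk, std::size_t num_polys)
{
    if (precomputed_vk) {
        return precomputed_vk;
    }
    auto vk = std::make_shared<ZKFlavor::VerificationKey>();
    vk->commitments.reserve(num_polys);
    for (std::size_t i = 0; i < num_polys; ++i) {
        vk->commitments.push_back(commit_placeholder(i));
    }
    assert(vk->commitments.size() == num_polys);
    return vk;
}
```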

Cycles are disjoint by construction of the generalized permutation argument (every (gate_idx, wire_idx) position belongs to exactly one variable, hence to exactly one cycle), so per-(col, row) writes never alias across cycles. Drop the serial outer loop for parallel_for_heuristic over cycle_idx without any thread-local staging or merge step.

Zone wall drops:
- WASM: compute_permutation_mapping 186/63/129 ms -> 112/40/80 ms (-36% to -40%)
- Native: compute_permutation_mapping 110/33/71 ms -> 73/20/44 ms (-34% to -40%)

VK pinning short hash d519f639 unchanged; chonk + ultra_honk tests pass.
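A small sketch of why the cycle loop needs no staging: each (wire, row) position appears in at most one cycle, so threads that own different cycles write different cells. The names and layout are assumptions for illustration; `CycleNode`, `Sigma`, `compute_mapping_parallel` and the "point each node at the next node in its cycle" convention are hypothetical, and plain `std::thread` chunking stands in for `parallel_for_heuristic`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct CycleNode {
    std::uint32_t wire_idx;
    std::uint32_t gate_idx;
};
using Cycle = std::vector<CycleNode>;
using Sigma = std::vector<std::vector<CycleNode>>; // sigma[wire][row]

// Precondition (assumed, not checked): every (wire, row) position occurs in at
// most one cycle. That is what makes the unsynchronised writes below safe.
Sigma compute_mapping_parallel(const std::vector<Cycle>& cycles,
                               std::size_t num_wires,
                               std::size_t num_rows,
                               std::size_t num_threads)
{
    assert(num_threads > 0);

    // Start from the identity mapping: every position points at itself.
    Sigma sigma(num_wires, std::vector<CycleNode>(num_rows));
    for (std::size_t w = 0; w < num_wires; ++w) {
        for (std::size_t r = 0; r < num_rows; ++r) {
            sigma[w][r] = { static_cast<std::uint32_t>(w), static_cast<std::uint32_t>(r) };
        }
    }

    const std::size_t chunk = (cycles.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t lo = std::min(cycles.size(), t * chunk);
        const std::size_t hi = std::min(cycles.size(), lo + chunk);
        if (lo == hi) {
            continue;
        }
        // Each thread handles a contiguous range of cycle indices and writes
        // straight into sigma: no thread-local buffers, no merge step.
        workers.emplace_back([&cycles, &sigma, lo, hi] {
            for (std::size_t c = lo; c < hi; ++c) {
                const Cycle& cycle = cycles[c];
                for (std::size_t k = 0; k < cycle.size(); ++k) {
                    const CycleNode& cur = cycle[k];
                    const CycleNode& next = cycle[(k + 1) % cycle.size()];
                    sigma[cur.wire_idx][cur.gate_idx] = next;
                }
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return sigma;
}
```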

Replace the serial block loop in populate_wires_and_selectors_and_compute_copy_cycles with parallel_for over blocks.get(). Each worker writes wires and selectors directly (disjoint row ranges per block by construction of trace_offset) and accumulates copy-cycle emissions into a thread-local flat list of (real_var_idx, cycle_node) pairs. A serial concat pass preserves block order so compute_permutation_mapping -> VK bytes stay deterministic.

Zone wall drops:
- WASM: construct_trace_data 1155/405/788 ms -> 886/311/585 ms (-23% to -26%)
- Native: ~flat (dispatch + alloc overhead roughly cancels the parallel win on already-fast memcpy)

Below-prediction outcome (WASM 0.5-1.1% vs predicted 2.0-2.4%, native near-zero vs 2.6%): ceiling capped by Amdahl on unequal block sizes.

VK pinning d519f639 unchanged; chonk + ultra_honk tests pass.
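The determinism argument in this commit message can be illustrated with a reduced sketch: workers write their block's rows directly and stage copy-cycle emissions locally, then a serial pass concatenates the staged lists in block order. All names are assumptions for illustration (`TraceBlock`, `TraceData`, `construct_trace_data_sketch`); the sketch uses a single wire column, one thread per block, and one staging list per block rather than per thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical single-wire block: its first row in the trace and the variable
// index placed on each of its rows.
struct TraceBlock {
    std::size_t trace_offset;
    std::vector<std::uint32_t> var_indices;
};

// (real_var_idx, trace row): "this row holds this variable".
using CycleEmission = std::pair<std::uint32_t, std::size_t>;

struct TraceData {
    std::vector<std::uint32_t> wire;
    std::vector<CycleEmission> emissions;
};

TraceData construct_trace_data_sketch(const std::vector<TraceBlock>& blocks, std::size_t trace_size)
{
    TraceData out;
    out.wire.assign(trace_size, 0);

    // One staging list per block, so workers never share an output list.
    std::vector<std::vector<CycleEmission>> staged(blocks.size());

    std::vector<std::thread> workers;
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        assert(blocks[b].trace_offset + blocks[b].var_indices.size() <= trace_size);
        workers.emplace_back([&blocks, &out, &staged, b] {
            const TraceBlock& blk = blocks[b];
            for (std::size_t i = 0; i < blk.var_indices.size(); ++i) {
                const std::size_t row = blk.trace_offset + i;
                out.wire[row] = blk.var_indices[i]; // rows are disjoint across blocks
                staged[b].push_back(CycleEmission(blk.var_indices[i], row));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    // Serial concat in block order: the result does not depend on which worker
    // finished first, so anything derived from it stays deterministic.
    for (const auto& list : staged) {
        out.emissions.insert(out.emissions.end(), list.begin(), list.end());
    }
    return out;
}
```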

BEGIN_COMMIT_OVERRIDE
fix(ci): default S3_BUILD_CACHE_AWS_PARAMS in cache_s3_transfer{,_to} (AztecProtocol#22898)
chore: low-hanging chonk prover fixes from profiling (AztecProtocol#22855)
chore: fuse N `add_scaled` into one `parallel_for` (AztecProtocol#22893)
feat: Delayed merge implementation (AztecProtocol#22775)
chore: numeric audit response (AztecProtocol#22856)
fix: harden BN254 G2 SRS ingress (AztecProtocol#22858)
fix: remove unused hash_challenge variable in batch_merge.test.cpp (AztecProtocol#22906)
fix(bbup): remove jq dependency (AztecProtocol#22912)
chore: fix g2 test failing on merge-train (AztecProtocol#22920)
fix(ci): error on disabled-cache in CI hash calculation (AztecProtocol#22904)
END_COMMIT_OVERRIDE

suyash67 changed the title ~~perf(barretenberg): low-hanging Chonk prover fixes from profiling~~ chore: low-hanging chonk prover fixes from profiling Apr 29, 2026

suyash67 requested a review from iakovenkos April 29, 2026 16:37

iakovenkos reviewed Apr 29, 2026

Comment thread barretenberg/cpp/src/barretenberg/chonk/chonk.cpp

iakovenkos reviewed Apr 29, 2026

Comment thread barretenberg/cpp/src/barretenberg/trace_to_polynomials/trace_to_polynomials.cpp Outdated

iakovenkos reviewed Apr 29, 2026

Comment thread barretenberg/cpp/src/barretenberg/trace_to_polynomials/trace_to_polynomials.cpp Outdated

iakovenkos approved these changes Apr 29, 2026

suyash67 requested review from IlyasRidhuan, MirandaWood and jeanmon as code owners May 1, 2026 07:57

suyash67 added 2 commits May 1, 2026 08:17

suyash67 force-pushed the sb/profile-fixes branch from 5bf32c6 to 96d552f Compare May 1, 2026 08:21

suyash67 removed request for IlyasRidhuan, MirandaWood and jeanmon May 1, 2026 08:46

suyash67 force-pushed the sb/profile-fixes branch from 96d552f to a825cc9 Compare May 1, 2026 13:28

suyash67 merged commit 6921f35 into merge-train/barretenberg May 1, 2026
12 checks passed

suyash67 deleted the sb/profile-fixes branch May 1, 2026 17:41

AztecBot mentioned this pull request May 1, 2026

feat: merge-train/barretenberg #22901

Merged

suyash67 merged 3 commits into merge-train/barretenberg from sb/profile-fixes
