feat: merge-train/barretenberg by AztecBot · Pull Request #22901 · AztecProtocol/aztec-packages

AztecBot · 2026-05-01T17:15:52Z

BEGIN_COMMIT_OVERRIDE
fix(ci): default S3_BUILD_CACHE_AWS_PARAMS in cache_s3_transfer{,_to} (#22898)
chore: low-hanging chonk prover fixes from profiling (#22855)
chore: fuse N add_scaled into one parallel_for (#22893)
feat: Delayed merge implementation (#22775)
chore: numeric audit response (#22856)
fix: harden BN254 G2 SRS ingress (#22858)
fix: remove unused hash_challenge variable in batch_merge.test.cpp (#22906)
fix(bbup): remove jq dependency (#22912)
chore: fix g2 test failing on merge-train (#22920)
fix(ci): error on disabled-cache in CI hash calculation (#22904)
END_COMMIT_OVERRIDE

…#22898) `cache_s3_transfer` and `cache_s3_transfer_to` in `ci3/source_cache` reference `${S3_BUILD_CACHE_AWS_PARAMS}` without a default. Every script that sources `ci3/source` runs under `set -euo pipefail` (via `ci3/source_options`), and the variable is unset in production CI (it is only set in `ci3/source_test` for local fixtures), so the functions abort with: ``` ci3/source_cache: line 30: S3_BUILD_CACHE_AWS_PARAMS: unbound variable ``` This caused benchmark breakdown S3 uploads from `ci_benchmark_ivc_flows.sh` / `ci_benchmark_ultrahonk_circuits.sh` to fail silently, surfacing as the bench job appearing skipped on the breakdown dashboard. Match the existing convention used by `cache_upload`, `cache_delete`, `cache_download`, `cache_exists`, and `cache_ls` by providing an empty default (`:-`). Regression from dfa23c2 ("feat: migrate CI from SSH to SSM with SSH fallback") which added these two functions. Details: https://gist.github.com/AztecBot/517dc093e9149bc56adbacc8f721c8b4 ClaudeBox log: https://claudebox.work/s/33613d341cf3129d?run=1

A handful of profiling-driven trims for Chonk client-IVC. Each commit targets one zone, no behaviour change. ## Per-commit gains Each row is the commit applied **alone** on top of `merge-train/barretenberg`, measured on `ecdsar1+transfer_1_recursions+sponsored_fpc`, 16-thread remote bench, single sample per build. | Commit | Target zone | Native baseline | Native Δ | WASM baseline | WASM Δ | |---|---|---:|---:|---:|---:| | W2: reuse precomputed VK in `Chonk::accumulate_hiding_kernel` | hiding kernel | 121 ms | **−104 ms** | 343 ms | **−301 ms** | | W3: parallelise `construct_trace_data` over trace blocks¹ | construct_trace_data | 135 ms | **−34 ms** | 373 ms | **−116 ms** | | W6: parallelise `compute_permutation_mapping` cycle loop | compute_permutation_mapping | 36 ms | **−11 ms** | 61 ms | **−23 ms** | ¹ W3 splits the work into two parallel phases: Phase 1 fans out per-block (wires + copy-cycle node emission), Phase 2 fans out over a flattened `(block, selector)` task list so the threadpool can load-balance selector filling across blocks regardless of per-block size skew. The single-pass-per-block structure that the original W3 used was WASM-only; the flattened selector phase is what unlocks the native gain. ² End-to-end numbers below were measured with the original W4 commit included; subtract W4's per-commit Δ (−17 ms native, −71 ms WASM) for the W4-less stack, or roll #22893 in for the up-to-date total. ## End-to-end (full stack on top of baseline) | Zone | Native baseline | Native Δ | WASM baseline | WASM Δ | |---|---:|---:|---:|---:| | `Chonk::accumulate` (×11 per proof) | 3292 ms | **−103 ms** (3.1%)² | 9172 ms | **−532 ms** (5.8%)² | | `ChonkAPI::prove` (full E2E) | 6236 ms | **−121 ms** (2%)² | 16728 ms | **−532 ms** (3.2%)² | A fifth change (fold ECCVM masking poly into wire batch) was prototyped but didn't show measurable impact under the new "masking at top of trace" model, so it was dropped. The original W4 (fuse N `add_scaled` in `HypernovaFoldingProver::batch_polynomials`) has been split out into #22893, where it's extracted into a shared `Polynomial::add_scaled_batch` helper and applied to the PCS poly batcher and AVM prover. ## Test plan - [x] `chonk_tests` (33/33), `eccvm_tests` (44/44), `ultra_honk_tests` (283/283), `hypernova_tests` (9/9) green locally - [x] Profiled native + WASM with `/profile-chonk` on remote bench machine

Replaces the N sequential `add_scaled` calls in three batch-polynomial accumulation hot spots with a single fused `parallel_for` over the destination range, amortising N× `parallel_for` startup overhead into 1×. Extracted as a generic `Polynomial<Fr>::add_scaled_batch(dst, sources, scalars)` helper in [`polynomials/polynomial.{hpp,cpp}`](barretenberg/cpp/src/barretenberg/polynomials/polynomial.hpp), explicitly instantiated for `bb::fr` and `grumpkin::fr`. Equivalent to a loop of `dst.add_scaled(sources[i], scalars[i])` calls — same numerical result, single dispatch. ## Where it's applied | Site | N (fused sources) | Notes | |---|---|---| | `HypernovaFoldingProver::batch_polynomials` | 51 unshifted, 4 shifted (Mega) | First commit on this branch | | `PolynomialBatcher::compute_batched` (`gemini.hpp`) | 52 / 5 (Mega), 36 / 5 (Ultra), 66 / 19 (Chonk batched translator+MegaZK) | Used by shplemini | | `AvmProver::execute_pcs_rounds` | **2715 unshifted, 363 shifted** | Largest N by far — savings amplified | ## Measured impact (remote EC2, 16 threads, 3 runs each) **Chonk transfer flow** (`ecdsar1+transfer_1_recursions+sponsored_fpc`, full proof): | | Before | After | Δ | |---|---|---|---| | Native runtime | 6.11 s | 6.07 s | -0.7% | | WASM runtime | 17.27 s | 17.25 s | -0.1% | | WASM peak RSS | 323,604 KB | 323,568 KB | ~0 | Marginal on Chonk because the affected callsites are not Chonk's hot path. **AVM proof** (`AvmVerifierTests.GoodPublicInputs`): | Function | Before | After | Δ | |---|---|---|---| | `AvmProver::execute_pcs_rounds` | **392.5 ms** | **245.4 ms** | **−37.5%** | | Total proof time | 935 ms | 927 ms | -0.9% | The AVM `execute_pcs_rounds` cuts in half because N≈3000 fused `parallel_for` startups are eliminated per call. ## Memory cost Each call adds two transient `std::vector`s of size N (one of `PolynomialSpan<const Fr>` = 24 B, one of `Fr` = 32 B), freed when the function returns. Largest case is AVM unshifted (~170 KB transient) — no source polynomial is copied, only views are stored. Confirmed via `/usr/bin/time -v` on WASM Chonk: peak RSS unchanged (within 36 KB of 316 MB). ## Test plan - [x] `chonk_tests` (33/33), `hypernova_tests` (9/9), `commitment_schemes_tests` (88/88) green - [x] `vm2_tests` `*Verifier*:*Prover*` (5/5) green; full `vm2_tests` (1764/1764) - [x] Native + WASM Chonk benchmarks on remote EC2 confirm no regression - [x] AVM `execute_pcs_rounds` benchmark on remote confirms −37.5% on the targeted function

Implementation of the delayed merge: we replace the single merges in the kernels with a final batch merge in the hiding kernel. Description of the batch merge protocol is the relevant md file --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: ledwards2225 <l.edwards.d@gmail.com>

Addresses external audit [findings](https://cantina.xyz/code/53bf9e53-db08-46a1-88d7-63dc995c5102/findings) for numeric module (all low or informational).

## Summary Tightens the BN254 G2 SRS load path with four layered defenses, addressing an audit finding around G2 input validation. - **`add subgroup check on g2 points`** — implements `affine_element::is_in_prime_subgroup()` (textbook `[r]·P == O` check) and applies it at every G2 entry point. Catches G2 inputs that lie in a cofactor subgroup of E'(Fq2) but happen to be on-curve. - **`pin SHA-256 of g2 SRS bytes and verify on disk load`** — bakes the canonical Aztec `[x]_2` bytes (`BN254_G2_ELEMENT_BYTES`) and their SHA-256 (`BN254_G2_ELEMENT_SHA256`) into the binary as constants, mirroring the existing G1 chunk-hash mechanism. `get_bn254_g2_data` now hash-compares the on-disk payload before deserialization, so a tampered `bn254_g2.dat` is rejected before its bytes can reach any verifier. Also implements the previously declaration-only `get_bn254_g2_data`. - **`reject g2 = point at infinity`** — explicit gate against `[x]_2 = O`. Infinity is in every subgroup (so the subgroup check passes) and `e(−W, O) = 1` collapses the KZG pairing equation. Strictly redundant with the hash pin, but kept as defense-in-depth so any future loosening of the hash gate (e.g. a "custom SRS" flag) does not silently reopen the issue. - **`reject off-curve points in is_in_prime_subgroup`** — guards the primitive itself: the Weierstrass group law is unsound for off-curve coordinates, which means the `[r]·P` trick can return a false positive on a point satisfying `y² = x³ + b'` for some `b' ≠ b` with a prime-r factor in its order. One squaring up front closes that gap and protects every existing and future caller. ## Threat model The hash pin promotes the on-disk SRS bytes to the same trust level as the binary itself. Concretely, the attack vectors closed: - MITM on the plain-HTTP CRS download from `crs.aztec-cdn.foundation` - Compromise of the CDN bucket (R2/S3 key leak, BGP hijack) - Local cache tampering by a same-user-but-not-root process (Docker volume mounts, sibling processes, backup tooling) None of these require the attacker to patch the verifier binary itself. ## Test plan - [x] `srs_tests` — all `CrsFactory.*` pass, including new `Bn254HardcodedG2IsInPrimeSubgroup`, `Bn254G2HashMatchesPinnedBytes`, `Bn254G2DataLoadsAndVerifies`, `Bn254G2CorruptionDetected`, `Bn254G2InfinityRejected` - [x] `ecc_tests` — full suite (843 tests) passes, including new `g2.IsInPrimeSubgroupAcceptsSubgroupPoints`, `g2.IsInPrimeSubgroupRejectsCofactorPoint`, `g2.IsInPrimeSubgroupRejectsOffCurvePoint` (regression test for the off-curve guard) and the typed `IsInPrimeSubgroupAcceptsSubgroupPoints` - [x] Compiles clean with `ninja srs_tests ecc_tests`

…22906) Fixes the merge-train/barretenberg merge-queue failure: http://ci.aztec-labs.com/1617122ce095114b ## Summary `compute_running_hash` in `batch_merge.test.cpp` (added by #22775) declared `bb::fr hash_challenge(0)` but never used it. Under `-Werror -Wall`, the ARM build fails with `[-Wunused-variable]`. The fix removes the dead declaration. ## Why ci-full on #22775 didn't catch it Clang's `-Wunused-variable` is suppressed when a variable's constructor has visible side effects. On x86 (`DISABLE_ASM=OFF`), `field<T>` constructors call `self_to_montgomery_form()` → `operator*=` → `asm_self_mul_with_coarse_reduction`, which contains an `__asm__()` block with a `"memory"` clobber. Clang treats inline asm as a side effect and silences the warning. On ARM, `CMakeLists.txt` auto-sets `DISABLE_ASM=ON` (line 50–57), so `bb::fr(0)` resolves to a pure `constexpr` arithmetic path with no observable side effect — clang then issues the warning. `ci-full` (label) runs only on amd64 (see `ci.sh` `full` target — single `bootstrap_ec2 ci-full` on x86), so the asm side-effect masked the warning. The merge queue (`ci.sh merge-queue`) is what adds the arm64 leg (`a1-fast arm64 ci-fast`), which is why the failure surfaced only at merge time. ClaudeBox log: https://claudebox.work/s/e0102d5ad296fd89?run=1

Closes #22533. `bbup` failed on systems without `jq` installed: ``` /root/.bb/bbup: line 54: jq: command not found ✗ No version specified and couldn't determine version from noir ``` `bb-versions.json` is a flat `"<noir-version>": "<bb-version>"` map with one entry per line, so the lookup can be done with a `grep` + `cut` pipeline that mirrors the style already used elsewhere in this script for parsing GitHub release JSON (see lines 45 / 47 / 207 / 209 — all use `cut -d'"' -f4`). As a small side benefit, missing keys now resolve to an empty string (caught by the existing `[ -z "$version" ]` check), instead of the literal string `"null"` that `jq -r` produces. Verified locally that the new pipeline produces identical output to `jq -r --arg version "$v" '.[$version]'` for every key in `bb-versions.json` and returns empty for nonexistent keys. Closes #22533 ClaudeBox log: https://claudebox.work/s/ee6e4282b260d539?run=2

2.IsInPrimeSubgroupRejectsCofactorPoint used a hardcoded BN254 G2 cofactor point encoded as raw native Montgomery limbs. That works in the native build, but WASM uses a different internal Montgomery representation, so the same raw limbs decoded to a different point. In WASM, the fixture was not even on-curve, so the test failed at: ASSERT_TRUE(off_subgroup.on_curve()) The fix is to express the point using canonical field coordinates instead of backend-specific Montgomery limbs: fq2{ fq(2), fq(1) } fq2{ fq("0x101f...01ce"), fq("0x2b76...5fde") }

rather than silently bypass `disabled-cache` error loudly when CI calculates a hash that would be `disabled-cache`. Instead of`WARNING: Noticed changes to rebuild patterns` in `cache_content_hash` we promote to a hard error --------- Co-authored-by: ludamad <adam.domurad@gmail.com>

AztecBot added the ci-no-squash label May 1, 2026

AztecBot requested a review from charlielye as a code owner May 1, 2026 17:15

AztecBot and others added 3 commits May 1, 2026 17:17

Merge branch 'next' into merge-train/barretenberg

3bfdb79

suyash67 requested review from IlyasRidhuan, MirandaWood and jeanmon as code owners May 1, 2026 17:49

federicobarbacovi approved these changes May 1, 2026

View reviewed changes

federicobarbacovi added this pull request to the merge queue May 1, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 1, 2026

ledwards2225 and others added 3 commits May 1, 2026 13:50

chore: numeric audit response (#22856)

47bf473

Addresses external audit [findings](https://cantina.xyz/code/53bf9e53-db08-46a1-88d7-63dc995c5102/findings) for numeric module (all low or informational).

federicobarbacovi enabled auto-merge May 1, 2026 21:14

federicobarbacovi added this pull request to the merge queue May 1, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 1, 2026

AztecBot and others added 5 commits May 1, 2026 23:50

Merge branch 'next' into merge-train/barretenberg

5d41972

Merge branch 'next' into merge-train/barretenberg

1363ab3

Merge branch 'next' into merge-train/barretenberg

dc62ea5

iakovenkos added the ci-full Run all master checks. label May 4, 2026

iakovenkos enabled auto-merge May 4, 2026 11:47

iakovenkos added this pull request to the merge queue May 4, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 4, 2026

AztecBot and others added 2 commits May 4, 2026 10:52

Merge branch 'next' into merge-train/barretenberg

f2fd2bf

AztecBot added 2 commits May 4, 2026 18:12

Merge branch 'next' into merge-train/barretenberg

94d4451

Merge branch 'next' into merge-train/barretenberg

5bf15ad

iakovenkos added this pull request to the merge queue May 5, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 5, 2026

iakovenkos removed ci-full Run all master checks. labels May 5, 2026

Merge branch 'next' into merge-train/barretenberg

f19315a

federicobarbacovi enabled auto-merge May 5, 2026 11:38

federicobarbacovi added this pull request to the merge queue May 5, 2026

Merged via the queue into next with commit 4520681 May 5, 2026
24 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: merge-train/barretenberg#22901

feat: merge-train/barretenberg#22901
federicobarbacovi merged 18 commits into
nextfrom
merge-train/barretenberg

AztecBot commented May 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

AztecBot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

AztecBot commented May 1, 2026 •

edited

Loading