feat: merge-train/barretenberg#22901
Merged
Merged
Conversation
…#22898) `cache_s3_transfer` and `cache_s3_transfer_to` in `ci3/source_cache` reference `${S3_BUILD_CACHE_AWS_PARAMS}` without a default. Every script that sources `ci3/source` runs under `set -euo pipefail` (via `ci3/source_options`), and the variable is unset in production CI (it is only set in `ci3/source_test` for local fixtures), so the functions abort with: ``` ci3/source_cache: line 30: S3_BUILD_CACHE_AWS_PARAMS: unbound variable ``` This caused benchmark breakdown S3 uploads from `ci_benchmark_ivc_flows.sh` / `ci_benchmark_ultrahonk_circuits.sh` to fail silently, surfacing as the bench job appearing skipped on the breakdown dashboard. Match the existing convention used by `cache_upload`, `cache_delete`, `cache_download`, `cache_exists`, and `cache_ls` by providing an empty default (`:-`). Regression from dfa23c2 ("feat: migrate CI from SSH to SSM with SSH fallback") which added these two functions. Details: https://gist.github.com/AztecBot/517dc093e9149bc56adbacc8f721c8b4 ClaudeBox log: https://claudebox.work/s/33613d341cf3129d?run=1
A handful of profiling-driven trims for Chonk client-IVC. Each commit targets one zone, no behaviour change. ## Per-commit gains Each row is the commit applied **alone** on top of `merge-train/barretenberg`, measured on `ecdsar1+transfer_1_recursions+sponsored_fpc`, 16-thread remote bench, single sample per build. | Commit | Target zone | Native baseline | Native Δ | WASM baseline | WASM Δ | |---|---|---:|---:|---:|---:| | W2: reuse precomputed VK in `Chonk::accumulate_hiding_kernel` | hiding kernel | 121 ms | **−104 ms** | 343 ms | **−301 ms** | | W3: parallelise `construct_trace_data` over trace blocks¹ | construct_trace_data | 135 ms | **−34 ms** | 373 ms | **−116 ms** | | W6: parallelise `compute_permutation_mapping` cycle loop | compute_permutation_mapping | 36 ms | **−11 ms** | 61 ms | **−23 ms** | ¹ W3 splits the work into two parallel phases: Phase 1 fans out per-block (wires + copy-cycle node emission), Phase 2 fans out over a flattened `(block, selector)` task list so the threadpool can load-balance selector filling across blocks regardless of per-block size skew. The single-pass-per-block structure that the original W3 used was WASM-only; the flattened selector phase is what unlocks the native gain. ² End-to-end numbers below were measured with the original W4 commit included; subtract W4's per-commit Δ (−17 ms native, −71 ms WASM) for the W4-less stack, or roll #22893 in for the up-to-date total. ## End-to-end (full stack on top of baseline) | Zone | Native baseline | Native Δ | WASM baseline | WASM Δ | |---|---:|---:|---:|---:| | `Chonk::accumulate` (×11 per proof) | 3292 ms | **−103 ms** (3.1%)² | 9172 ms | **−532 ms** (5.8%)² | | `ChonkAPI::prove` (full E2E) | 6236 ms | **−121 ms** (2%)² | 16728 ms | **−532 ms** (3.2%)² | A fifth change (fold ECCVM masking poly into wire batch) was prototyped but didn't show measurable impact under the new "masking at top of trace" model, so it was dropped. The original W4 (fuse N `add_scaled` in `HypernovaFoldingProver::batch_polynomials`) has been split out into #22893, where it's extracted into a shared `Polynomial::add_scaled_batch` helper and applied to the PCS poly batcher and AVM prover. ## Test plan - [x] `chonk_tests` (33/33), `eccvm_tests` (44/44), `ultra_honk_tests` (283/283), `hypernova_tests` (9/9) green locally - [x] Profiled native + WASM with `/profile-chonk` on remote bench machine
Replaces the N sequential `add_scaled` calls in three batch-polynomial
accumulation hot spots with a single fused `parallel_for` over the
destination range, amortising N× `parallel_for` startup overhead into
1×.
Extracted as a generic `Polynomial<Fr>::add_scaled_batch(dst, sources,
scalars)` helper in
[`polynomials/polynomial.{hpp,cpp}`](barretenberg/cpp/src/barretenberg/polynomials/polynomial.hpp),
explicitly instantiated for `bb::fr` and `grumpkin::fr`. Equivalent to a
loop of `dst.add_scaled(sources[i], scalars[i])` calls — same numerical
result, single dispatch.
## Where it's applied
| Site | N (fused sources) | Notes |
|---|---|---|
| `HypernovaFoldingProver::batch_polynomials` | 51 unshifted, 4 shifted
(Mega) | First commit on this branch |
| `PolynomialBatcher::compute_batched` (`gemini.hpp`) | 52 / 5 (Mega),
36 / 5 (Ultra), 66 / 19 (Chonk batched translator+MegaZK) | Used by
shplemini |
| `AvmProver::execute_pcs_rounds` | **2715 unshifted, 363 shifted** |
Largest N by far — savings amplified |
## Measured impact (remote EC2, 16 threads, 3 runs each)
**Chonk transfer flow** (`ecdsar1+transfer_1_recursions+sponsored_fpc`,
full proof):
| | Before | After | Δ |
|---|---|---|---|
| Native runtime | 6.11 s | 6.07 s | -0.7% |
| WASM runtime | 17.27 s | 17.25 s | -0.1% |
| WASM peak RSS | 323,604 KB | 323,568 KB | ~0 |
Marginal on Chonk because the affected callsites are not Chonk's hot
path.
**AVM proof** (`AvmVerifierTests.GoodPublicInputs`):
| Function | Before | After | Δ |
|---|---|---|---|
| `AvmProver::execute_pcs_rounds` | **392.5 ms** | **245.4 ms** |
**−37.5%** |
| Total proof time | 935 ms | 927 ms | -0.9% |
The AVM `execute_pcs_rounds` cuts in half because N≈3000 fused
`parallel_for` startups are eliminated per call.
## Memory cost
Each call adds two transient `std::vector`s of size N (one of
`PolynomialSpan<const Fr>` = 24 B, one of `Fr` = 32 B), freed when the
function returns. Largest case is AVM unshifted (~170 KB transient) — no
source polynomial is copied, only views are stored. Confirmed via
`/usr/bin/time -v` on WASM Chonk: peak RSS unchanged (within 36 KB of
316 MB).
## Test plan
- [x] `chonk_tests` (33/33), `hypernova_tests` (9/9),
`commitment_schemes_tests` (88/88) green
- [x] `vm2_tests` `*Verifier*:*Prover*` (5/5) green; full `vm2_tests`
(1764/1764)
- [x] Native + WASM Chonk benchmarks on remote EC2 confirm no regression
- [x] AVM `execute_pcs_rounds` benchmark on remote confirms −37.5% on
the targeted function
Implementation of the delayed merge: we replace the single merges in the kernels with a final batch merge in the hiding kernel. Description of the batch merge protocol is the relevant md file --------- Co-authored-by: Copilot <copilot@github.com> Co-authored-by: ledwards2225 <l.edwards.d@gmail.com>
federicobarbacovi
approved these changes
May 1, 2026
Addresses external audit [findings](https://cantina.xyz/code/53bf9e53-db08-46a1-88d7-63dc995c5102/findings) for numeric module (all low or informational).
## Summary Tightens the BN254 G2 SRS load path with four layered defenses, addressing an audit finding around G2 input validation. - **`add subgroup check on g2 points`** — implements `affine_element::is_in_prime_subgroup()` (textbook `[r]·P == O` check) and applies it at every G2 entry point. Catches G2 inputs that lie in a cofactor subgroup of E'(Fq2) but happen to be on-curve. - **`pin SHA-256 of g2 SRS bytes and verify on disk load`** — bakes the canonical Aztec `[x]_2` bytes (`BN254_G2_ELEMENT_BYTES`) and their SHA-256 (`BN254_G2_ELEMENT_SHA256`) into the binary as constants, mirroring the existing G1 chunk-hash mechanism. `get_bn254_g2_data` now hash-compares the on-disk payload before deserialization, so a tampered `bn254_g2.dat` is rejected before its bytes can reach any verifier. Also implements the previously declaration-only `get_bn254_g2_data`. - **`reject g2 = point at infinity`** — explicit gate against `[x]_2 = O`. Infinity is in every subgroup (so the subgroup check passes) and `e(−W, O) = 1` collapses the KZG pairing equation. Strictly redundant with the hash pin, but kept as defense-in-depth so any future loosening of the hash gate (e.g. a "custom SRS" flag) does not silently reopen the issue. - **`reject off-curve points in is_in_prime_subgroup`** — guards the primitive itself: the Weierstrass group law is unsound for off-curve coordinates, which means the `[r]·P` trick can return a false positive on a point satisfying `y² = x³ + b'` for some `b' ≠ b` with a prime-r factor in its order. One squaring up front closes that gap and protects every existing and future caller. ## Threat model The hash pin promotes the on-disk SRS bytes to the same trust level as the binary itself. Concretely, the attack vectors closed: - MITM on the plain-HTTP CRS download from `crs.aztec-cdn.foundation` - Compromise of the CDN bucket (R2/S3 key leak, BGP hijack) - Local cache tampering by a same-user-but-not-root process (Docker volume mounts, sibling processes, backup tooling) None of these require the attacker to patch the verifier binary itself. ## Test plan - [x] `srs_tests` — all `CrsFactory.*` pass, including new `Bn254HardcodedG2IsInPrimeSubgroup`, `Bn254G2HashMatchesPinnedBytes`, `Bn254G2DataLoadsAndVerifies`, `Bn254G2CorruptionDetected`, `Bn254G2InfinityRejected` - [x] `ecc_tests` — full suite (843 tests) passes, including new `g2.IsInPrimeSubgroupAcceptsSubgroupPoints`, `g2.IsInPrimeSubgroupRejectsCofactorPoint`, `g2.IsInPrimeSubgroupRejectsOffCurvePoint` (regression test for the off-curve guard) and the typed `IsInPrimeSubgroupAcceptsSubgroupPoints` - [x] Compiles clean with `ninja srs_tests ecc_tests`
…22906) Fixes the merge-train/barretenberg merge-queue failure: http://ci.aztec-labs.com/1617122ce095114b ## Summary `compute_running_hash` in `batch_merge.test.cpp` (added by #22775) declared `bb::fr hash_challenge(0)` but never used it. Under `-Werror -Wall`, the ARM build fails with `[-Wunused-variable]`. The fix removes the dead declaration. ## Why ci-full on #22775 didn't catch it Clang's `-Wunused-variable` is suppressed when a variable's constructor has visible side effects. On x86 (`DISABLE_ASM=OFF`), `field<T>` constructors call `self_to_montgomery_form()` → `operator*=` → `asm_self_mul_with_coarse_reduction`, which contains an `__asm__()` block with a `"memory"` clobber. Clang treats inline asm as a side effect and silences the warning. On ARM, `CMakeLists.txt` auto-sets `DISABLE_ASM=ON` (line 50–57), so `bb::fr(0)` resolves to a pure `constexpr` arithmetic path with no observable side effect — clang then issues the warning. `ci-full` (label) runs only on amd64 (see `ci.sh` `full` target — single `bootstrap_ec2 ci-full` on x86), so the asm side-effect masked the warning. The merge queue (`ci.sh merge-queue`) is what adds the arm64 leg (`a1-fast arm64 ci-fast`), which is why the failure surfaced only at merge time. ClaudeBox log: https://claudebox.work/s/e0102d5ad296fd89?run=1
Closes #22533. `bbup` failed on systems without `jq` installed: ``` /root/.bb/bbup: line 54: jq: command not found ✗ No version specified and couldn't determine version from noir ``` `bb-versions.json` is a flat `"<noir-version>": "<bb-version>"` map with one entry per line, so the lookup can be done with a `grep` + `cut` pipeline that mirrors the style already used elsewhere in this script for parsing GitHub release JSON (see lines 45 / 47 / 207 / 209 — all use `cut -d'"' -f4`). As a small side benefit, missing keys now resolve to an empty string (caught by the existing `[ -z "$version" ]` check), instead of the literal string `"null"` that `jq -r` produces. Verified locally that the new pipeline produces identical output to `jq -r --arg version "$v" '.[$version]'` for every key in `bb-versions.json` and returns empty for nonexistent keys. Closes #22533 ClaudeBox log: https://claudebox.work/s/ee6e4282b260d539?run=2
2.IsInPrimeSubgroupRejectsCofactorPoint used a hardcoded BN254 G2
cofactor point encoded as raw native Montgomery limbs. That works in the
native build, but WASM uses a different internal Montgomery
representation, so the same raw limbs decoded to a different point. In
WASM, the fixture was not even on-curve, so the test failed at:
ASSERT_TRUE(off_subgroup.on_curve())
The fix is to express the point using canonical field coordinates
instead of backend-specific Montgomery limbs:
fq2{ fq(2), fq(1) }
fq2{ fq("0x101f...01ce"), fq("0x2b76...5fde") }
rather than silently bypass `disabled-cache` error loudly when CI calculates a hash that would be `disabled-cache`. Instead of`WARNING: Noticed changes to rebuild patterns` in `cache_content_hash` we promote to a hard error --------- Co-authored-by: ludamad <adam.domurad@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
BEGIN_COMMIT_OVERRIDE
fix(ci): default S3_BUILD_CACHE_AWS_PARAMS in cache_s3_transfer{,_to} (#22898)
chore: low-hanging chonk prover fixes from profiling (#22855)
chore: fuse N
add_scaledinto oneparallel_for(#22893)feat: Delayed merge implementation (#22775)
chore: numeric audit response (#22856)
fix: harden BN254 G2 SRS ingress (#22858)
fix: remove unused hash_challenge variable in batch_merge.test.cpp (#22906)
fix(bbup): remove jq dependency (#22912)
chore: fix g2 test failing on merge-train (#22920)
fix(ci): error on disabled-cache in CI hash calculation (#22904)
END_COMMIT_OVERRIDE