Skip to content

feat: merge-train/barretenberg#22901

Merged
federicobarbacovi merged 18 commits into
nextfrom
merge-train/barretenberg
May 5, 2026
Merged

feat: merge-train/barretenberg#22901
federicobarbacovi merged 18 commits into
nextfrom
merge-train/barretenberg

Conversation

@AztecBot

@AztecBot AztecBot commented May 1, 2026

Copy link
Copy Markdown
Collaborator

BEGIN_COMMIT_OVERRIDE
fix(ci): default S3_BUILD_CACHE_AWS_PARAMS in cache_s3_transfer{,_to} (#22898)
chore: low-hanging chonk prover fixes from profiling (#22855)
chore: fuse N add_scaled into one parallel_for (#22893)
feat: Delayed merge implementation (#22775)
chore: numeric audit response (#22856)
fix: harden BN254 G2 SRS ingress (#22858)
fix: remove unused hash_challenge variable in batch_merge.test.cpp (#22906)
fix(bbup): remove jq dependency (#22912)
chore: fix g2 test failing on merge-train (#22920)
fix(ci): error on disabled-cache in CI hash calculation (#22904)
END_COMMIT_OVERRIDE

…#22898)

`cache_s3_transfer` and `cache_s3_transfer_to` in `ci3/source_cache`
reference `${S3_BUILD_CACHE_AWS_PARAMS}` without a default. Every script
that sources `ci3/source` runs under `set -euo pipefail` (via
`ci3/source_options`), and the variable is unset in production CI (it is
only set in `ci3/source_test` for local fixtures), so the functions
abort with:

```
ci3/source_cache: line 30: S3_BUILD_CACHE_AWS_PARAMS: unbound variable
```

This caused benchmark breakdown S3 uploads from
`ci_benchmark_ivc_flows.sh` / `ci_benchmark_ultrahonk_circuits.sh` to
fail silently, surfacing as the bench job appearing skipped on the
breakdown dashboard.

Match the existing convention used by `cache_upload`, `cache_delete`,
`cache_download`, `cache_exists`, and `cache_ls` by providing an empty
default (`:-`).

Regression from dfa23c2 ("feat: migrate CI from SSH to SSM with SSH
fallback") which added these two functions.

Details:
https://gist.github.com/AztecBot/517dc093e9149bc56adbacc8f721c8b4


ClaudeBox log: https://claudebox.work/s/33613d341cf3129d?run=1
AztecBot and others added 3 commits May 1, 2026 17:17
A handful of profiling-driven trims for Chonk client-IVC. Each commit
targets one zone, no behaviour change.

## Per-commit gains

Each row is the commit applied **alone** on top of
`merge-train/barretenberg`, measured on
`ecdsar1+transfer_1_recursions+sponsored_fpc`, 16-thread remote bench,
single sample per build.

| Commit | Target zone | Native baseline | Native Δ | WASM baseline |
WASM Δ |
|---|---|---:|---:|---:|---:|
| W2: reuse precomputed VK in `Chonk::accumulate_hiding_kernel` | hiding
kernel | 121 ms | **−104 ms** | 343 ms | **−301 ms** |
| W3: parallelise `construct_trace_data` over trace blocks¹ |
construct_trace_data | 135 ms | **−34 ms** | 373 ms | **−116 ms** |
| W6: parallelise `compute_permutation_mapping` cycle loop |
compute_permutation_mapping | 36 ms | **−11 ms** | 61 ms | **−23 ms** |

¹ W3 splits the work into two parallel phases: Phase 1 fans out
per-block (wires + copy-cycle node emission), Phase 2 fans out over a
flattened `(block, selector)` task list so the threadpool can
load-balance selector filling across blocks regardless of per-block size
skew. The single-pass-per-block structure that the original W3 used was
WASM-only; the flattened selector phase is what unlocks the native gain.

² End-to-end numbers below were measured with the original W4 commit
included; subtract W4's per-commit Δ (−17 ms native, −71 ms WASM) for
the W4-less stack, or roll #22893 in for the up-to-date total.

## End-to-end (full stack on top of baseline)

| Zone | Native baseline | Native Δ | WASM baseline | WASM Δ |
|---|---:|---:|---:|---:|
| `Chonk::accumulate` (×11 per proof) | 3292 ms | **−103 ms** (3.1%)² |
9172 ms | **−532 ms** (5.8%)² |
| `ChonkAPI::prove` (full E2E) | 6236 ms | **−121 ms** (2%)² | 16728 ms
| **−532 ms** (3.2%)² |

A fifth change (fold ECCVM masking poly into wire batch) was prototyped
but didn't show measurable impact under the new "masking at top of
trace" model, so it was dropped.

The original W4 (fuse N `add_scaled` in
`HypernovaFoldingProver::batch_polynomials`) has been split out into
#22893, where it's extracted into a shared
`Polynomial::add_scaled_batch` helper and applied to the PCS poly
batcher and AVM prover.

## Test plan
- [x] `chonk_tests` (33/33), `eccvm_tests` (44/44), `ultra_honk_tests`
(283/283), `hypernova_tests` (9/9) green locally
- [x] Profiled native + WASM with `/profile-chonk` on remote bench
machine
Replaces the N sequential `add_scaled` calls in three batch-polynomial
accumulation hot spots with a single fused `parallel_for` over the
destination range, amortising N× `parallel_for` startup overhead into
1×.

Extracted as a generic `Polynomial<Fr>::add_scaled_batch(dst, sources,
scalars)` helper in
[`polynomials/polynomial.{hpp,cpp}`](barretenberg/cpp/src/barretenberg/polynomials/polynomial.hpp),
explicitly instantiated for `bb::fr` and `grumpkin::fr`. Equivalent to a
loop of `dst.add_scaled(sources[i], scalars[i])` calls — same numerical
result, single dispatch.

## Where it's applied

| Site | N (fused sources) | Notes |
|---|---|---|
| `HypernovaFoldingProver::batch_polynomials` | 51 unshifted, 4 shifted
(Mega) | First commit on this branch |
| `PolynomialBatcher::compute_batched` (`gemini.hpp`) | 52 / 5 (Mega),
36 / 5 (Ultra), 66 / 19 (Chonk batched translator+MegaZK) | Used by
shplemini |
| `AvmProver::execute_pcs_rounds` | **2715 unshifted, 363 shifted** |
Largest N by far — savings amplified |

## Measured impact (remote EC2, 16 threads, 3 runs each)

**Chonk transfer flow** (`ecdsar1+transfer_1_recursions+sponsored_fpc`,
full proof):

| | Before | After | Δ |
|---|---|---|---|
| Native runtime | 6.11 s | 6.07 s | -0.7% |
| WASM runtime | 17.27 s | 17.25 s | -0.1% |
| WASM peak RSS | 323,604 KB | 323,568 KB | ~0 |

Marginal on Chonk because the affected callsites are not Chonk's hot
path.

**AVM proof** (`AvmVerifierTests.GoodPublicInputs`):

| Function | Before | After | Δ |
|---|---|---|---|
| `AvmProver::execute_pcs_rounds` | **392.5 ms** | **245.4 ms** |
**−37.5%** |
| Total proof time | 935 ms | 927 ms | -0.9% |

The AVM `execute_pcs_rounds` cuts in half because N≈3000 fused
`parallel_for` startups are eliminated per call.

## Memory cost

Each call adds two transient `std::vector`s of size N (one of
`PolynomialSpan<const Fr>` = 24 B, one of `Fr` = 32 B), freed when the
function returns. Largest case is AVM unshifted (~170 KB transient) — no
source polynomial is copied, only views are stored. Confirmed via
`/usr/bin/time -v` on WASM Chonk: peak RSS unchanged (within 36 KB of
316 MB).

## Test plan
- [x] `chonk_tests` (33/33), `hypernova_tests` (9/9),
`commitment_schemes_tests` (88/88) green
- [x] `vm2_tests` `*Verifier*:*Prover*` (5/5) green; full `vm2_tests`
(1764/1764)
- [x] Native + WASM Chonk benchmarks on remote EC2 confirm no regression
- [x] AVM `execute_pcs_rounds` benchmark on remote confirms −37.5% on
the targeted function
Implementation of the delayed merge: we replace the single merges in the
kernels with a final batch merge in the hiding kernel. Description of
the batch merge protocol is the relevant md file

---------

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: ledwards2225 <l.edwards.d@gmail.com>
@federicobarbacovi federicobarbacovi added this pull request to the merge queue May 1, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 1, 2026
ledwards2225 and others added 3 commits May 1, 2026 13:50
Addresses external audit
[findings](https://cantina.xyz/code/53bf9e53-db08-46a1-88d7-63dc995c5102/findings)
for numeric module (all low or informational).
## Summary

Tightens the BN254 G2 SRS load path with four layered defenses,
addressing an audit finding around G2 input validation.

- **`add subgroup check on g2 points`** — implements
`affine_element::is_in_prime_subgroup()` (textbook `[r]·P == O` check)
and applies it at every G2 entry point. Catches G2 inputs that lie in a
cofactor subgroup of E'(Fq2) but happen to be on-curve.
- **`pin SHA-256 of g2 SRS bytes and verify on disk load`** — bakes the
canonical Aztec `[x]_2` bytes (`BN254_G2_ELEMENT_BYTES`) and their
SHA-256 (`BN254_G2_ELEMENT_SHA256`) into the binary as constants,
mirroring the existing G1 chunk-hash mechanism. `get_bn254_g2_data` now
hash-compares the on-disk payload before deserialization, so a tampered
`bn254_g2.dat` is rejected before its bytes can reach any verifier. Also
implements the previously declaration-only `get_bn254_g2_data`.
- **`reject g2 = point at infinity`** — explicit gate against `[x]_2 =
O`. Infinity is in every subgroup (so the subgroup check passes) and
`e(−W, O) = 1` collapses the KZG pairing equation. Strictly redundant
with the hash pin, but kept as defense-in-depth so any future loosening
of the hash gate (e.g. a "custom SRS" flag) does not silently reopen the
issue.
- **`reject off-curve points in is_in_prime_subgroup`** — guards the
primitive itself: the Weierstrass group law is unsound for off-curve
coordinates, which means the `[r]·P` trick can return a false positive
on a point satisfying `y² = x³ + b'` for some `b' ≠ b` with a prime-r
factor in its order. One squaring up front closes that gap and protects
every existing and future caller.

## Threat model

The hash pin promotes the on-disk SRS bytes to the same trust level as
the binary itself. Concretely, the attack vectors closed:

- MITM on the plain-HTTP CRS download from `crs.aztec-cdn.foundation`
- Compromise of the CDN bucket (R2/S3 key leak, BGP hijack)
- Local cache tampering by a same-user-but-not-root process (Docker
volume mounts, sibling processes, backup tooling)

None of these require the attacker to patch the verifier binary itself.

## Test plan

- [x] `srs_tests` — all `CrsFactory.*` pass, including new
`Bn254HardcodedG2IsInPrimeSubgroup`, `Bn254G2HashMatchesPinnedBytes`,
`Bn254G2DataLoadsAndVerifies`, `Bn254G2CorruptionDetected`,
`Bn254G2InfinityRejected`
- [x] `ecc_tests` — full suite (843 tests) passes, including new
`g2.IsInPrimeSubgroupAcceptsSubgroupPoints`,
`g2.IsInPrimeSubgroupRejectsCofactorPoint`,
`g2.IsInPrimeSubgroupRejectsOffCurvePoint` (regression test for the
off-curve guard) and the typed `IsInPrimeSubgroupAcceptsSubgroupPoints`
- [x] Compiles clean with `ninja srs_tests ecc_tests`
…22906)

Fixes the merge-train/barretenberg merge-queue failure:
http://ci.aztec-labs.com/1617122ce095114b

## Summary

`compute_running_hash` in `batch_merge.test.cpp` (added by #22775)
declared `bb::fr hash_challenge(0)` but never used it. Under `-Werror
-Wall`, the ARM build fails with `[-Wunused-variable]`. The fix removes
the dead declaration.

## Why ci-full on #22775 didn't catch it

Clang's `-Wunused-variable` is suppressed when a variable's constructor
has visible side effects. On x86 (`DISABLE_ASM=OFF`), `field<T>`
constructors call `self_to_montgomery_form()` → `operator*=` →
`asm_self_mul_with_coarse_reduction`, which contains an `__asm__()`
block with a `"memory"` clobber. Clang treats inline asm as a side
effect and silences the warning.

On ARM, `CMakeLists.txt` auto-sets `DISABLE_ASM=ON` (line 50–57), so
`bb::fr(0)` resolves to a pure `constexpr` arithmetic path with no
observable side effect — clang then issues the warning.

`ci-full` (label) runs only on amd64 (see `ci.sh` `full` target — single
`bootstrap_ec2 ci-full` on x86), so the asm side-effect masked the
warning. The merge queue (`ci.sh merge-queue`) is what adds the arm64
leg (`a1-fast arm64 ci-fast`), which is why the failure surfaced only at
merge time.

ClaudeBox log: https://claudebox.work/s/e0102d5ad296fd89?run=1
@federicobarbacovi federicobarbacovi added this pull request to the merge queue May 1, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 1, 2026
AztecBot and others added 5 commits May 1, 2026 23:50
Closes #22533.

`bbup` failed on systems without `jq` installed:

```
/root/.bb/bbup: line 54: jq: command not found
✗ No version specified and couldn't determine version from noir
```

`bb-versions.json` is a flat `"<noir-version>": "<bb-version>"` map with
one entry per line, so the lookup can be done with a `grep` + `cut`
pipeline that mirrors the style already used elsewhere in this script
for parsing GitHub release JSON (see lines 45 / 47 / 207 / 209 — all use
`cut -d'"' -f4`).

As a small side benefit, missing keys now resolve to an empty string
(caught by the existing `[ -z "$version" ]` check), instead of the
literal string `"null"` that `jq -r` produces.

Verified locally that the new pipeline produces identical output to `jq
-r --arg version "$v" '.[$version]'` for every key in `bb-versions.json`
and returns empty for nonexistent keys.

Closes #22533

ClaudeBox log: https://claudebox.work/s/ee6e4282b260d539?run=2
2.IsInPrimeSubgroupRejectsCofactorPoint used a hardcoded BN254 G2
cofactor point encoded as raw native Montgomery limbs. That works in the
native build, but WASM uses a different internal Montgomery
representation, so the same raw limbs decoded to a different point. In
WASM, the fixture was not even on-curve, so the test failed at:

  ASSERT_TRUE(off_subgroup.on_curve())

The fix is to express the point using canonical field coordinates
instead of backend-specific Montgomery limbs:

  fq2{ fq(2), fq(1) }
  fq2{ fq("0x101f...01ce"), fq("0x2b76...5fde") }
@iakovenkos iakovenkos added the ci-full Run all master checks. label May 4, 2026
@iakovenkos iakovenkos enabled auto-merge May 4, 2026 11:47
@iakovenkos iakovenkos added this pull request to the merge queue May 4, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 4, 2026
AztecBot and others added 2 commits May 4, 2026 10:52
rather than silently bypass `disabled-cache` error loudly when CI
calculates a hash that would be `disabled-cache`. Instead of`WARNING:
Noticed changes to rebuild patterns` in `cache_content_hash` we promote
to a hard error

---------

Co-authored-by: ludamad <adam.domurad@gmail.com>
@iakovenkos iakovenkos added this pull request to the merge queue May 5, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 5, 2026
@iakovenkos iakovenkos removed ci-full Run all master checks. labels May 5, 2026
@federicobarbacovi federicobarbacovi added this pull request to the merge queue May 5, 2026
Merged via the queue into next with commit 4520681 May 5, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants