Skip to content

feat: merge-train/barretenberg#22147

Merged
iakovenkos merged 10 commits into
nextfrom
merge-train/barretenberg
Mar 31, 2026
Merged

feat: merge-train/barretenberg#22147
iakovenkos merged 10 commits into
nextfrom
merge-train/barretenberg

Conversation

@AztecBot

@AztecBot AztecBot commented Mar 30, 2026

Copy link
Copy Markdown
Collaborator

BEGIN_COMMIT_OVERRIDE
chore: bench phase breakdown + thread sweep for MSM reduction (#21885)
chore: minor fixes pt. 2 (#22138)
chore: minor fixes pt. 3 (#22181)
fix: satisfy Chonk num_circuits >= 4 assertion in mock IVC creation (#22188)
END_COMMIT_OVERRIDE

## Context

This PR addresses `AztecProtocol/barretenberg#1656` by making the
`MSM::batch_multi_scalar_mul(...)` phase breakdown measurable and by
adding a benchmark case that targets the exact regime called out in the
issue (`2^16` points with `256` threads).

The goal is to answer: is the single-threaded final reduction
(`accumulate_results`) actually a bottleneck relative to the full MSM?

## What Changed

### 1) Phase Breakdown (`BB_BENCH`) inside `batch_multi_scalar_mul`

Added `BB_BENCH` scopes for:
- `MSM::batch_multi_scalar_mul/evaluate_work_units`
- `MSM::batch_multi_scalar_mul/accumulate_results`
- `MSM::batch_multi_scalar_mul/batch_normalize`
- `MSM::batch_multi_scalar_mul/scalars_to_montgomery`

These are surfaced in google-benchmark output via
`GOOGLE_BB_BENCH_REPORTER(state)`.

### 2) New Benchmark Case: `BatchMSM_1656`

Added a dedicated benchmark:
- `PippengerBench/BatchMSM_1656/256/{msm_size}`
- Single MSM (`num_polys = 1`)
- Sizes:
  - `msm_size ∈ {2^16, 2^20}`

This uses `bb::set_parallel_for_concurrency(256)` to force the intended
partitioning and restores the original value after the benchmark
finishes.

## Results (Local)

Machine: 4 vCPU laptop (so `256 threads` is intentionally
oversubscribed; the key point is the *absolute* reduction overhead).

Key takeaways (from `BB_BENCH` counters; time counters are nanoseconds):
- `2^16, 256 threads`:
  - `accumulate_results ≈ 139k ns` (~0.139 ms)
  - total `≈ 521 ms`
  - reduction fraction `~0.027%`
- `2^20, 256 threads`:
  - `accumulate_results ≈ 141k ns` (~0.141 ms)
  - total `≈ 3457 ms`
  - reduction fraction `~0.004%`

So the final reduction appears negligible in these regimes on my setup,
and the benchmark now makes it easy to validate on a 64+ core machine
where the original concern is most relevant.

## Notes / Background

- `MSM` here is a Pippenger-style MSM. For background reading on
multiexponentiation / multi-product methods:
- Nicholas Pippenger, *On the evaluation of powers and related problems*
(1976).
- Jurjen Bos and Matthijs Coster, *Addition Chain Heuristics* (CRYPTO
1989).
- Ryan Henry, *Pippenger's Multiproduct and Multiexponentiation
Algorithms* (2010).

## How To Run

```
cmake --build <build-dir> --target pippenger_bench
CRS_PATH=<path-to-bn254_g1.dat> <build-dir>/bin/pippenger_bench \
  --benchmark_filter='BatchMSM_1656' \
  --benchmark_min_time=0.1s \
  --benchmark_counters_tabular=true
```

Fixes AztecProtocol/barretenberg#1656

---------

Co-authored-by: Peter <peter@users.noreply.github.com>
Co-authored-by: Jonathan Hao <jonathan@aztec-labs.com>

@ludamad ludamad left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Auto-approved

@AztecBot

Copy link
Copy Markdown
Collaborator Author

🤖 Auto-merge enabled after 4 hours of inactivity. This PR will be merged automatically once all checks pass.

@AztecBot AztecBot added this pull request to the merge queue Mar 30, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 30, 2026
@iakovenkos iakovenkos enabled auto-merge March 31, 2026 14:52
…22188)

## Summary
PR #22181 tightened the `Chonk` constructor to require `num_circuits >=
4`, but `create_mock_chonk_from_constraints()` was passing
`constraints.size()` (1-2 for most kernels), causing assertion failures
during VK generation for `mock-protocol-circuits` and
`noir-protocol-circuits`.

## Fix
Pass `std::max(constraints.size(), 4)` since the mock doesn't use
`num_circuits` for anything — it directly manipulates the verification
queue.

## Verification
- All DSL tests pass (Hypernova, MockVerifier, ChonkRecursion)
- `mock-protocol-circuits` bootstrap: 11/11 circuits pass
- `noir-protocol-circuits` bootstrap: 120/120 circuits pass

ClaudeBox log: https://claudebox.work/s/71d20ef48c81e6e3?run=1
@iakovenkos iakovenkos added this pull request to the merge queue Mar 31, 2026
Merged via the queue into next with commit 7bfae55 Mar 31, 2026
16 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants