feat: add --memory_profile_out flag for Chonk memory profiling #22145
Merged
Add tooling to continually assess memory consumption in Chonk proving. The new --memory_profile_out flag on `bb prove` outputs a JSON report with per-circuit polynomial memory breakdown by category (wires, sigmas, selectors, etc.), CRS size, and RSS checkpoints at key proving stages. A Python extraction script converts the JSON into benchmark dashboard entries (stacked charts for polynomial categories, line charts for totals and peak RSS). Integrated into ci_benchmark_ivc_flows.sh. Refs: AztecProtocol/barretenberg#1641
Switch from stacked: to stacked-area: prefix in extract_memory_benchmarks.py so the dashboard renders proper stacked area charts instead of overlaid lines. This requires the corresponding stacked-area chart support in benchmark-page-data.
- Add circuit_name to RSS checkpoints (set from ChonkAccumulate)
- Extract RSS checkpoints as per-commit dashboard entries with labels like "06_EcdsaRAccount:entrypoint/after_accumulate"
- Remove CRS instrumentation from commitment_key.hpp (constant, not useful to track)
- Remove crs_MB, total_polynomial_MB, peak_rss_MB from dashboard metrics (redundant with existing memusage and stacked area chart)
…rics The per-circuit RSS timeline chart is sufficient for tracking memory. The stacked area chart added noise without adding diagnostic value that the RSS timeline and the raw JSON don't already provide.
Split the single after_poly_allocation checkpoint into:
- after_alloc: right after polynomial backing memory is allocated
- after_trace: after trace data is populated into polynomials
- after_oink: after OinkProver (CRS load + commitments)
- after_sumcheck: after sumcheck rounds complete
- after_accumulate: after full fold/accumulate (existing)

This reveals where memory jumps happen within each circuit. For example, the trace population step adds 9-14 MiB for large circuits.
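To illustrate how the finer-grained checkpoints expose per-stage jumps, here is a minimal sketch that computes the RSS delta between consecutive checkpoints of one circuit. It assumes the flat checkpoint array shape shown in the PR description's example output, and reuses that example's first five entries; `stage_deltas` is a hypothetical helper, not part of the PR.

```python
from itertools import groupby

# First circuit's checkpoints, copied from the example JSON in the PR description.
checkpoints = [
    {"stage": "after_alloc", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 120},
    {"stage": "after_trace", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 129},
    {"stage": "after_oink", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 135},
    {"stage": "after_sumcheck", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 137},
    {"stage": "after_accumulate", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 137},
]


def stage_deltas(checkpoints):
    """Hypothetical helper: map circuit_index -> [(stage, rss growth in MB since previous stage)].

    Assumes checkpoints are listed in execution order, so entries for the
    same circuit are adjacent.
    """
    deltas = {}
    for index, group in groupby(checkpoints, key=lambda c: c["circuit_index"]):
        rows = list(group)
        deltas[index] = [
            (cur["stage"], cur["rss_mb"] - prev["rss_mb"])
            for prev, cur in zip(rows, rows[1:])
        ]
    return deltas


result = stage_deltas(checkpoints)
print(result)
# {0: [('after_trace', 9), ('after_oink', 6), ('after_sumcheck', 2), ('after_accumulate', 0)]}
```

For this circuit, trace population accounts for the largest single step (9 MB), which is in line with the range quoted above.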
- Prefix stage names with sequence numbers (0_alloc, 1_trace, etc.) so alphabetical sort matches execution order
- Rename chart group to rss_over_stages, placed directly under the flow path alongside other metrics
- Only pass --memory_profile_out for native builds (getrusage unavailable in wasm)
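A quick sketch of why the sequence-number prefix matters: sorting the bare stage names alphabetically scrambles the execution order, while the prefixed names sort correctly. Only `0_alloc` and `1_trace` are named in the commit; the remaining prefixed names are an assumption that follows the same pattern.

```python
# Stage names in execution order, as listed in the PR description.
stages = ["alloc", "trace", "oink", "sumcheck", "accumulate"]

# Alphabetical sort of the bare names does not match execution order.
print(sorted(stages))
# ['accumulate', 'alloc', 'oink', 'sumcheck', 'trace']

# Assumed naming pattern: "<sequence number>_<stage>".
prefixed = [f"{i}_{name}" for i, name in enumerate(stages)]
print(prefixed)
# ['0_alloc', '1_trace', '2_oink', '3_sumcheck', '4_accumulate']

# With the prefix, alphabetical order and execution order coincide.
print(sorted(prefixed) == prefixed)
# True
```

A single-digit prefix keeps this property only while there are at most ten stages; beyond that, zero-padding would be needed.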
- Remove CategoryStats, CircuitMemoryStats, classify_polynomial, analyze_prover_polynomials_categorized (unused after removing stacked area chart)
- Remove CRS instrumentation from commitment_key.hpp
- Use msgpack serialization for JSON output instead of manual string building
- Reuse peak_rss_bytes() from logstr.cpp instead of duplicating getrusage logic
- Simplify MemoryProfile API: add_rss_checkpoint(stage) with internal circuit index tracking via next_circuit()
- JSON output is now a flat array of checkpoints
Avoids breaking the dashboard's label extraction which uses the last path segment (split by /). Labels are now like "06_EcdsaRAccount:entrypoint_0_alloc" instead of "06_EcdsaRAccount:entrypoint/0_alloc".
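The effect of the separator change can be sketched as follows. `last_segment` mimics the dashboard behaviour described above (take the last `/`-separated segment), and `make_label` is a hypothetical helper; the two-digit zero padding of the circuit index is an assumption inferred from the "06_" prefix in the example labels.

```python
def last_segment(label: str) -> str:
    """Mimic the dashboard's label extraction: keep the last '/'-separated segment."""
    return label.split("/")[-1]


def make_label(circuit_index: int, circuit_name: str, stage: str, sep: str = "_") -> str:
    """Hypothetical label builder: '<index>_<circuit name><sep><stage>'."""
    return f"{circuit_index:02d}_{circuit_name}{sep}{stage}"


old_label = make_label(6, "EcdsaRAccount:entrypoint", "0_alloc", sep="/")
new_label = make_label(6, "EcdsaRAccount:entrypoint", "0_alloc")

print(last_segment(old_label))  # 0_alloc  (circuit name is lost)
print(last_segment(new_label))  # 06_EcdsaRAccount:entrypoint_0_alloc
```

With the `/` separator, every circuit's stage collapses to the same extracted label, which is why the underscore form is used instead.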
ludamad approved these changes Apr 1, 2026
This was referenced Apr 1, 2026
johnathan79717 pushed a commit that referenced this pull request Apr 1, 2026
## Summary
Fixes WASM build failure introduced by #22145 (--memory_profile_out flag for Chonk memory profiling). Two issues in `memory_profile.cpp`:
1. **Narrowing conversion**: `peak_rss_bytes() / (1024ULL * 1024ULL)` produces `unsigned long long` which can't narrow to `size_t` (32-bit `unsigned long` on WASM) in an initializer list. Fixed with `static_cast<size_t>(...)`.
2. **Undefined symbol**: `peak_rss_bytes()` lives in the `env` module which is intentionally excluded from the WASM link target. Guarded the call with `#ifndef __wasm__`, returning 0 for RSS on WASM (memory profiling is only meaningful on native builds).

## Test plan
- [x] `./bootstrap.sh ci`: all 1761 tests pass, both WASM builds succeed

ClaudeBox log: https://claudebox.work/s/b5d31ce0b9e0fa7f?run=1
github-merge-queue Bot pushed a commit that referenced this pull request Apr 2, 2026
BEGIN_COMMIT_OVERRIDE
fix: verify accumulated pairing points in native ChonkVerifier (#22224)
chore: enable _GLIBCXX_DEBUG in debug build presets (#22218)
feat: add --memory_profile_out flag for Chonk memory profiling (#22145)
fix: disable max capacity test in debug + tiny gate separator improvements (#22215)
fix: WASM build for memory_profile.cpp (#22231)
fix: translator audit fixes (#22242)
fix: remove constexpr from functions using std::vector for _GLIBCXX_DEBUG compat (#22239)
fix: pippenger edge case (#22256)
fix: avoid dereferencing past-the-end vector iterators in serialize.hpp (#22261)
chore: crypto primitives external audit response 0 (#22263)
feat: switch memory profiling from peak RSS to live heap usage (#22266)
fix: replace UB end-iterator dereference in serialize.hpp (#22262)
fix: catch exceptions in ChonkBatchVerifier::batch_check (#22270)
END_COMMIT_OVERRIDE
Summary
Adds a `--memory_profile_out <file>` flag to `bb prove` that outputs a JSON array of RSS checkpoints at key proving stages for each circuit. Each checkpoint records the stage name (alloc, trace, oink, sumcheck, accumulate), circuit index, circuit name, and peak RSS in MB.
In CI, `extract_memory_benchmarks.py` converts these into dashboard entries (one overlaid line per circuit stage, tracked across commits).
Reuses `peak_rss_bytes()` from logstr.cpp (no duplicate getrusage).
Dashboard visualization: AztecProtocol/benchmark-page-data#1
Refs: AztecProtocol/barretenberg#1641
Example JSON output (62 checkpoints for 13 circuits)
[
  {"stage": "after_alloc", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 120},
  {"stage": "after_trace", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 129},
  {"stage": "after_oink", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 135},
  {"stage": "after_sumcheck", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 137},
  {"stage": "after_accumulate", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 137},
  {"stage": "after_alloc", "circuit_index": 1, "circuit_name": "private_kernel_init", "rss_mb": 147},
  ...
  {"stage": "after_trace", "circuit_index": 6, "circuit_name": "EcdsaRAccount:entrypoint", "rss_mb": 227},
  ...
  {"stage": "after_trace", "circuit_index": 12, "circuit_name": "hiding_kernel", "rss_mb": 263}
]
Test plan