feat: add --memory_profile_out flag for Chonk memory profiling #22145

Merged: johnathan79717 merged 10 commits into merge-train/barretenberg from jh/memory-profile-out on Apr 1, 2026

@johnathan79717 johnathan79717 commented Mar 30, 2026

Summary

Adds --memory_profile_out <file> flag to bb prove that outputs a JSON array of RSS checkpoints at key proving stages for each circuit. Each checkpoint records the stage name (alloc, trace, oink, sumcheck, accumulate), circuit index, circuit name, and peak RSS in MB.

In CI, extract_memory_benchmarks.py converts these into dashboard entries (one overlaid line per circuit stage, tracked across commits).

  • RSS checkpoints at 5 stages per circuit: after_alloc, after_trace, after_oink, after_sumcheck, after_accumulate
  • Circuit names threaded from ChonkAccumulate
  • Reuses peak_rss_bytes() from logstr.cpp (no duplicate getrusage)
  • JSON via msgpack serialization
  • Native only (getrusage unavailable in wasm)
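
For readers unfamiliar with `getrusage`, the following is a minimal Python sketch of reading the process's peak RSS. It is only an analogue of what `peak_rss_bytes()` is described as doing; the helper name `peak_rss_mb` is hypothetical and not part of barretenberg. Python's `resource` module is Unix-only, which mirrors the native-only restriction above.

```python
import resource
import sys


def peak_rss_mb() -> int:
    """Peak resident set size of the current process, in whole MiB."""
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    peak_bytes = max_rss if sys.platform == "darwin" else max_rss * 1024
    return peak_bytes // (1024 * 1024)


before = peak_rss_mb()
buffer = bytearray(64 * 1024 * 1024)  # allocate and zero-fill 64 MiB
after = peak_rss_mb()
# A peak is a high-water mark: it can stay flat or rise, never fall.
print(f"peak RSS before: {before} MiB, after: {after} MiB")
```

Because the value is a high-water mark, freeing memory between checkpoints does not lower later readings.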

Dashboard visualization: AztecProtocol/benchmark-page-data#1

Refs: AztecProtocol/barretenberg#1641

Example JSON output (62 checkpoints for 13 circuits)

[
  {"stage": "after_alloc", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 120},
  {"stage": "after_trace", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 129},
  {"stage": "after_oink", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 135},
  {"stage": "after_sumcheck", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 137},
  {"stage": "after_accumulate", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 137},
  {"stage": "after_alloc", "circuit_index": 1, "circuit_name": "private_kernel_init", "rss_mb": 147},
  ...
  {"stage": "after_trace", "circuit_index": 6, "circuit_name": "EcdsaRAccount:entrypoint", "rss_mb": 227},
  ...
  {"stage": "after_trace", "circuit_index": 12, "circuit_name": "hiding_kernel", "rss_mb": 263}
]
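
As a rough illustration of how this output might be consumed, the sketch below parses the first six rows of the example and computes the RSS change between consecutive checkpoints. The function name `rss_deltas` is hypothetical; it is not taken from extract_memory_benchmarks.py.

```python
import json

# First six rows of the example output above.
SAMPLE = """[
  {"stage": "after_alloc", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 120},
  {"stage": "after_trace", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 129},
  {"stage": "after_oink", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 135},
  {"stage": "after_sumcheck", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 137},
  {"stage": "after_accumulate", "circuit_index": 0, "circuit_name": "MultiCallEntrypoint:entrypoint", "rss_mb": 137},
  {"stage": "after_alloc", "circuit_index": 1, "circuit_name": "private_kernel_init", "rss_mb": 147}
]"""


def rss_deltas(checkpoints):
    """Return (circuit_index, stage, delta_mb) for each checkpoint, in order.

    The delta is measured against the previous checkpoint in the array;
    the first checkpoint has no predecessor and gets a delta of 0.
    """
    deltas = []
    previous = None
    for cp in checkpoints:
        delta = 0 if previous is None else cp["rss_mb"] - previous
        deltas.append((cp["circuit_index"], cp["stage"], delta))
        previous = cp["rss_mb"]
    return deltas


deltas = rss_deltas(json.loads(SAMPLE))
largest_jump = max(deltas, key=lambda entry: entry[2])
for circuit_index, stage, delta in deltas:
    print(f"circuit {circuit_index} {stage}: +{delta} MB")
```

On these six rows the largest single jump is the 10 MB between the end of circuit 0 and `after_alloc` of circuit 1.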

Test plan

  • Builds cleanly
  • Tested with deploy_ecdsar1+sponsored_fpc flow: 62 checkpoints, correct circuit names
  • Extraction script produces correct dashboard entries
  • CI build passes

Add tooling to continually assess memory consumption in Chonk proving.
The new --memory_profile_out flag on `bb prove` outputs a JSON report
with per-circuit polynomial memory breakdown by category (wires, sigmas,
selectors, etc.), CRS size, and RSS checkpoints at key proving stages.

A Python extraction script converts the JSON into benchmark dashboard
entries (stacked charts for polynomial categories, line charts for
totals and peak RSS). Integrated into ci_benchmark_ivc_flows.sh.

Refs: AztecProtocol/barretenberg#1641
@johnathan79717 johnathan79717 added the ci-barretenberg Run all barretenberg/cpp checks. label Mar 30, 2026
Switch from stacked: to stacked-area: prefix in extract_memory_benchmarks.py
so the dashboard renders proper stacked area charts instead of overlaid lines.
This requires the corresponding stacked-area chart support in benchmark-page-data.
- Add circuit_name to RSS checkpoints (set from ChonkAccumulate)
- Extract RSS checkpoints as per-commit dashboard entries with labels
  like "06_EcdsaRAccount:entrypoint/after_accumulate"
- Remove CRS instrumentation from commitment_key.hpp (constant, not
  useful to track)
- Remove crs_MB, total_polynomial_MB, peak_rss_MB from dashboard
  metrics (redundant with existing memusage and stacked area chart)
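
The label format above could be produced along the following lines. This is a sketch only: `make_label` and `last_segment` are hypothetical helpers, and the zero-padded two-digit index is inferred from the "06_" prefix in the example label. The second helper mimics a consumer that keeps only the last `/`-separated segment, which a later commit in this PR says the dashboard does.

```python
def make_label(circuit_index: int, circuit_name: str, stage: str, sep: str) -> str:
    """Join a zero-padded circuit index, circuit name, and stage into one label."""
    return f"{circuit_index:02d}_{circuit_name}{sep}{stage}"


def last_segment(label: str) -> str:
    """Keep only the text after the final '/', as a path-based consumer would."""
    return label.split("/")[-1]


slash_label = make_label(6, "EcdsaRAccount:entrypoint", "after_accumulate", "/")
underscore_label = make_label(6, "EcdsaRAccount:entrypoint", "after_accumulate", "_")

print(slash_label)                     # 06_EcdsaRAccount:entrypoint/after_accumulate
print(last_segment(slash_label))       # after_accumulate
print(last_segment(underscore_label))  # 06_EcdsaRAccount:entrypoint_after_accumulate
```

With a slash separator the circuit name is lost after segment extraction, so every circuit's checkpoint for a given stage would share one label; an underscore keeps the label whole.
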
…rics

The per-circuit RSS timeline chart is sufficient for tracking memory.
The stacked area chart added noise without adding diagnostic value
that the RSS timeline and the raw JSON don't already provide.
Split the single after_poly_allocation checkpoint into:
- after_alloc: right after polynomial backing memory is allocated
- after_trace: after trace data is populated into polynomials
- after_oink: after OinkProver (CRS load + commitments)
- after_sumcheck: after sumcheck rounds complete
- after_accumulate: after full fold/accumulate (existing)

This reveals where memory jumps happen within each circuit. For example,
the trace population step adds 9-14 MiB for large circuits.
- Prefix stage names with sequence numbers (0_alloc, 1_trace, etc.)
  so alphabetical sort matches execution order
- Rename chart group to rss_over_stages, placed directly under the
  flow path alongside other metrics
- Only pass --memory_profile_out for native builds (getrusage
  unavailable in wasm)
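
The reason for the sequence-number prefixes can be seen in a few lines of Python. The stage names come from this PR; the variable names are illustrative only.

```python
# Stages in the order they run during proving.
EXECUTION_ORDER = ["alloc", "trace", "oink", "sumcheck", "accumulate"]

# Plain alphabetical sort scrambles the execution order.
unprefixed_sorted = sorted(EXECUTION_ORDER)
print(unprefixed_sorted)  # ['accumulate', 'alloc', 'oink', 'sumcheck', 'trace']

# With a numeric prefix, alphabetical order and execution order coincide.
prefixed = [f"{i}_{stage}" for i, stage in enumerate(EXECUTION_ORDER)]
prefixed_sorted = sorted(prefixed)
print(prefixed_sorted)  # ['0_alloc', '1_trace', '2_oink', '3_sumcheck', '4_accumulate']
```

Single-digit prefixes only sort correctly for up to ten stages; beyond that they would need zero padding.
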
- Remove CategoryStats, CircuitMemoryStats, classify_polynomial,
  analyze_prover_polynomials_categorized (unused after removing stacked
  area chart)
- Remove CRS instrumentation from commitment_key.hpp
- Use msgpack serialization for JSON output instead of manual string
  building
- Reuse peak_rss_bytes() from logstr.cpp instead of duplicating
  getrusage logic
- Simplify MemoryProfile API: add_rss_checkpoint(stage) with internal
  circuit index tracking via next_circuit()
- JSON output is now a flat array of checkpoints
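
The simplified API shape described above might look roughly like this Python analogue. The real MemoryProfile is C++ and serializes through msgpack; everything here beyond the names `add_rss_checkpoint` and `next_circuit` is an assumption, including `next_circuit` taking the circuit name and the injected RSS reader.

```python
import json
from typing import Callable, Dict, List, Union


class MemoryProfile:
    """Collects a flat list of RSS checkpoints, tracking the circuit index internally."""

    def __init__(self, read_rss_mb: Callable[[], int]) -> None:
        self._read_rss_mb = read_rss_mb  # assumption: injectable for testing
        self._circuit_index = -1         # no circuit started yet
        self._circuit_name = ""
        self.checkpoints: List[Dict[str, Union[str, int]]] = []

    def next_circuit(self, circuit_name: str) -> None:
        """Advance to the next circuit; later checkpoints are attributed to it."""
        self._circuit_index += 1
        self._circuit_name = circuit_name

    def add_rss_checkpoint(self, stage: str) -> None:
        """Record the current RSS reading under the given stage name."""
        if self._circuit_index < 0:
            raise RuntimeError("call next_circuit() before recording a checkpoint")
        self.checkpoints.append({
            "stage": stage,
            "circuit_index": self._circuit_index,
            "circuit_name": self._circuit_name,
            "rss_mb": self._read_rss_mb(),
        })

    def to_json(self) -> str:
        """Serialize the checkpoints as a flat JSON array."""
        return json.dumps(self.checkpoints, indent=2)


# Usage with canned readings instead of a real RSS probe.
fake_readings = iter([120, 129, 147])
profile = MemoryProfile(lambda: next(fake_readings))
profile.next_circuit("MultiCallEntrypoint:entrypoint")
profile.add_rss_checkpoint("after_alloc")
profile.add_rss_checkpoint("after_trace")
profile.next_circuit("private_kernel_init")
profile.add_rss_checkpoint("after_alloc")
print(profile.to_json())
```

Keeping the circuit index inside the profile means call sites only pass a stage name, which is the simplification the commit describes.
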
Joining the circuit name and stage with an underscore instead of a slash
avoids breaking the dashboard's label extraction, which uses the last
path segment (split by /). Labels are now like
"06_EcdsaRAccount:entrypoint_0_alloc" instead of
"06_EcdsaRAccount:entrypoint/0_alloc".
@johnathan79717 johnathan79717 requested a review from ludamad April 1, 2026 14:40
@johnathan79717 johnathan79717 merged commit ad82386 into merge-train/barretenberg Apr 1, 2026
12 checks passed
@johnathan79717 johnathan79717 deleted the jh/memory-profile-out branch April 1, 2026 15:01
johnathan79717 pushed a commit that referenced this pull request Apr 1, 2026
## Summary
Fixes WASM build failure introduced by #22145 (--memory_profile_out flag
for Chonk memory profiling).

Two issues in `memory_profile.cpp`:
1. **Narrowing conversion**: `peak_rss_bytes() / (1024ULL * 1024ULL)`
produces `unsigned long long` which can't narrow to `size_t` (32-bit
`unsigned long` on WASM) in an initializer list. Fixed with
`static_cast<size_t>(...)`.
2. **Undefined symbol**: `peak_rss_bytes()` lives in the `env` module
which is intentionally excluded from the WASM link target. Guarded the
call with `#ifndef __wasm__`, returning 0 for RSS on WASM (memory
profiling is only meaningful on native builds).

## Test plan
- [x] `./bootstrap.sh ci` — all 1761 tests pass, both WASM builds
succeed

ClaudeBox log: https://claudebox.work/s/b5d31ce0b9e0fa7f?run=1
github-merge-queue Bot pushed a commit that referenced this pull request Apr 2, 2026
BEGIN_COMMIT_OVERRIDE
fix: verify accumulated pairing points in native ChonkVerifier (#22224)
chore: enable _GLIBCXX_DEBUG in debug build presets (#22218)
feat: add --memory_profile_out flag for Chonk memory profiling (#22145)
fix: disable max capacity test in debug + tiny gate separator
improvements (#22215)
fix: WASM build for memory_profile.cpp (#22231)
fix: translator audit fixes (#22242)
fix: remove constexpr from functions using std::vector for
_GLIBCXX_DEBUG compat (#22239)
fix: pippenger edge case (#22256)
fix: avoid dereferencing past-the-end vector iterators in serialize.hpp
(#22261)
chore: crypto primitives external audit response 0 (#22263)
feat: switch memory profiling from peak RSS to live heap usage (#22266)
fix: replace UB end-iterator dereference in serialize.hpp (#22262)
fix: catch exceptions in ChonkBatchVerifier::batch_check (#22270)
END_COMMIT_OVERRIDE