feat(bb): Introduce chunks for univariate computation for the AVM by jeanmon · Pull Request #12707 · AztecProtocol/aztec-packages

jeanmon · 2025-03-13T09:27:04Z

This PR distributes the processing of trace chunks more evenly along the vertical axis among threads in the univariate computation that is part of sumcheck. This leads to a substantial speedup of sumcheck.

Let t be the number of threads. The processing of the rows of a circuit used to be

thread_1 (round_size/t rows)
...
thread_t (round_size/t rows) [end of circuit]

whereas this PR makes it possible to interleave the processing, which balances the load more uniformly across threads:

thread_1 (chunk_thread_portion_size rows)
...
thread_t (chunk_thread_portion_size rows)
thread_1 (chunk_thread_portion_size rows)
...
thread_t (chunk_thread_portion_size rows)
...
[end of circuit]

This PR improves the #12703 measurement from 8.5 seconds to 2.4 seconds.
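The interleaved layout described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper `interleaved_ranges` and the struct `RowRange` are hypothetical names, and the sketch assumes that `round_size` is an exact multiple of `num_threads * chunk_thread_portion_size`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type for illustration: a half-open row range [start, end).
struct RowRange {
    std::size_t start;
    std::size_t end;
};

// Hypothetical helper (not taken from the PR): lists the row ranges that
// thread `thread_idx` processes under the interleaved layout.
std::vector<RowRange> interleaved_ranges(std::size_t round_size,
                                         std::size_t num_threads,
                                         std::size_t chunk_thread_portion_size,
                                         std::size_t thread_idx)
{
    // One chunk holds exactly one portion per thread.
    const std::size_t chunk_size = num_threads * chunk_thread_portion_size;
    // Assumption of this sketch: the round divides evenly into chunks.
    assert(chunk_size != 0 && round_size % chunk_size == 0);
    assert(thread_idx < num_threads);
    const std::size_t num_chunks = round_size / chunk_size;

    std::vector<RowRange> ranges;
    ranges.reserve(num_chunks);
    for (std::size_t chunk_idx = 0; chunk_idx < num_chunks; ++chunk_idx) {
        const std::size_t start = chunk_idx * chunk_size + thread_idx * chunk_thread_portion_size;
        ranges.push_back({ start, start + chunk_thread_portion_size });
    }
    return ranges;
}
```

With a single chunk, i.e. `chunk_thread_portion_size == round_size / num_threads`, the sketch reduces to the previous contiguous layout in which each thread owns one block of `round_size/t` rows.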

fcarreiro · 2025-03-13T09:35:09Z

The gains look amazing! but... I think we should wait until we have the capacity to do a full trace in VM2 for this. Also I think:

1. the current approach couples the sumcheck_round with the concept of avm flavor, and requires ifdefs, which kind of pollutes the code a lot. I think we can do better.
2. we should have a more detailed PR explanation with tracy graphs (before/after) on the threads, explaining why this approach works for both short and fuller traces

jeanmon · 2025-03-13T10:18:04Z

@fcarreiro I addressed 1) as it was indeed ugly.

fcarreiro · 2025-03-13T11:15:04Z

> The gains look amazing! but... I think we should wait until we have the capacity to do a full trace in VM2 for this. Also I think:
>
> 1. the current approach couples the sumcheck_round with the concept of avm flavor, and requires ifdefs, which kind of pollutes the code a lot. I think we can do better.
> 2. we should have a more detailed PR explanation with tracy graphs (before/after) on the threads, explaining why this approach works for both short and fuller traces

Pushed some changes to fix 1

fcarreiro · 2025-03-13T13:10:57Z

Having fixed (1) I'm not against merging this as long as crypto is ok with it. We should however revisit it later.

fcarreiro · 2025-03-13T17:06:52Z

Actually now it looks good to merge!

lucasxia01

Looks fine, and the results are great! I'm not sure how well tested this code is - some of the logic is a little tricky so I would hope for some better testing. Also feel like it needs to be more readable.

lucasxia01 · 2025-03-14T04:43:26Z

+
+            // When the trace is shrunk to a point where the chunk portion size per thread is lower than 2,
+            // we fall back to a single chunk, i.e., we keep the "non-AVM" values.
+            if (thread_portion_size_candidate >= 2) {


why is there an if here? Seems unnecessary?

jeanmon · 2025-03-14

I added more explanations and was actually able to simplify the logic a bit. See the new version.

lucasxia01 · 2025-03-14T04:43:42Z

+            static_assert(Flavor::MAX_CHUNK_THREAD_PORTION_SIZE >= 2);
+            static_assert((Flavor::MAX_CHUNK_THREAD_PORTION_SIZE & (Flavor::MAX_CHUNK_THREAD_PORTION_SIZE - 1)) == 0);
+
+            const auto thread_portion_size_candidate =


naming is not great... can't think of proper naming at the moment, but it at least requires a comment explaining what this is

jeanmon · 2025-03-14

yeah, not so easy to have great terminology. I did not change it but added plenty of explanations.
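For readers unfamiliar with the bit trick in the second `static_assert` quoted above, the following sketch shows why `(x & (x - 1)) == 0` tests for a power of two. The helper name `is_power_of_two` is hypothetical and does not appear in the PR. The requirement itself presumably exists so that the portion size divides the round size exactly, given that the thread count comes from `calculate_num_threads_pow2`; that reading is an assumption, not something stated in the thread.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the bit trick in the static_assert above.
constexpr bool is_power_of_two(std::size_t x)
{
    // A power of two has exactly one set bit. Subtracting 1 clears that bit
    // and sets every lower bit, so the bitwise AND of x and x - 1 is zero.
    // The x != 0 guard is needed because 0 & (0 - 1) is also zero.
    return x != 0 && (x & (x - 1)) == 0;
}

static_assert(is_power_of_two(2), "2 is a power of two");
static_assert(is_power_of_two(32), "32 is a power of two");
static_assert(!is_power_of_two(24), "24 has two set bits");
static_assert(!is_power_of_two(0), "0 is excluded by the guard");
```

In the quoted diff, the zero case is excluded by the separate `>= 2` assertion rather than by a guard inside the expression.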

lucasxia01 · 2025-03-14T04:44:25Z

        size_t num_threads = bb::calculate_num_threads_pow2(round_size, min_iterations_per_thread);
-        size_t iterations_per_thread = round_size / num_threads; // actual iterations per thread

+        // In the AVM, the trace is more dense at the top and therefore it is worth to split the work over the threads


this section just has a lot of logic that's hard to follow. All of the divisions and unclear names make it hard to parse.

lucasxia01 · 2025-03-14T04:45:31Z

-        size_t iterations_per_thread = round_size / num_threads; // actual iterations per thread

+        // In the AVM, the trace is more dense at the top and therefore it is worth to split the work over the threads
+        // a bit more evenly on the vertical axis. To achieve this, we split the trace into chunks and each thread


this comment could have more detail about what you mean by "more evenly on the vertical axis"

jeanmon · 2025-03-14T09:16:30Z

@lucasxia01 I added a significant number of explanations, described the properties that must be satisfied, and added a short proof of why the code satisfies them. I hope this gives enough confidence that the code is correct. It is a bit hard to unit test this. In any case, I think there are enough safeguards to ensure that the new code does not affect non-AVM parts in any way.

ledwards2225 · 2025-03-14T21:34:42Z

@jeanmon This is more or less exactly what I had in mind with this issue. @lucasxia01 do you see any reason why this same mechanism isn't applicable for us in the PG context?

fcarreiro · 2025-03-14T22:28:15Z

Personally I think this approach is noticeably better if you don't have uniform density (as in our case, where we get a many-fold improvement), and even if you do, it should work at least as well (assuming poor cache locality, which in any case you could probably improve with a bigger thread chunk; but that needs hard data).
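The effect of non-uniform density can be illustrated with a toy model. The sketch below is not from the PR: the function `max_thread_load` is hypothetical, and the cost model (the first `dense_rows` rows cost one unit each, all other rows are free) is a deliberately crude assumption standing in for a trace that is denser at the top.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy cost model (an assumption of this sketch): the first `dense_rows` rows
// cost one unit each and the remaining rows are free. Returns the load of the
// busiest thread, which bounds the wall-clock time of the round.
// Requires num_threads > 0 and portion_size > 0.
std::size_t max_thread_load(std::size_t round_size,
                            std::size_t num_threads,
                            std::size_t portion_size,
                            std::size_t dense_rows)
{
    std::vector<std::size_t> load(num_threads, 0);
    for (std::size_t row = 0; row < round_size; ++row) {
        // Consecutive portions are handed to threads in round-robin order.
        const std::size_t thread_idx = (row / portion_size) % num_threads;
        if (row < dense_rows) {
            load[thread_idx] += 1;
        }
    }
    return *std::max_element(load.begin(), load.end());
}
```

For example, with 1024 rows, 4 threads and 256 dense rows at the top, `max_thread_load(1024, 4, 256, 256)` (one contiguous block per thread) puts all 256 units on the first thread, whereas `max_thread_load(1024, 4, 32, 256)` spreads them so that the busiest thread carries 64. When the density is uniform, both layouts give the same maximum in this model, which matches the intuition that interleaving should not hurt apart from cache effects.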

lucasxia01

Thanks for the fantastic comments!

lucasxia01 · 2025-03-16T16:06:12Z

@ledwards2225 It's not clear exactly what we do in PG, but I could see it applying similarly. It might not be as effective or easy to implement because of how we structure the trace so the nonzero blocks are all over the place.

jeanmon marked this pull request as ready for review March 13, 2025 09:27

jeanmon requested review from IlyasRidhuan, Maddiaa0 and fcarreiro as code owners March 13, 2025 09:27

jeanmon requested review from ledwards2225 and lucasxia01 and removed request for IlyasRidhuan and Maddiaa0 March 13, 2025 09:27

jeanmon force-pushed the jm/sumcheck-chunks branch from 871a40e to f9f88dd Compare March 13, 2025 10:15

jeanmon marked this pull request as draft March 13, 2025 10:44

fcarreiro reviewed Mar 13, 2025

Comment thread barretenberg/cpp/src/barretenberg/sumcheck/sumcheck_round.hpp

jeanmon force-pushed the jm/sumcheck-chunks branch from 8aba26d to f006b50 Compare March 13, 2025 16:38

fcarreiro reviewed Mar 13, 2025

Comment thread barretenberg/cpp/src/barretenberg/vm2/generated/flavor.hpp Outdated

Comment thread barretenberg/cpp/src/barretenberg/vm2/generated/flavor.hpp Outdated

jeanmon marked this pull request as ready for review March 13, 2025 17:15

fcarreiro changed the title ~~chore: Introduce chunks for univariate computation for the AVM~~ feat(bb): Introduce chunks for univariate computation for the AVM Mar 13, 2025

lucasxia01 reviewed Mar 14, 2025

jeanmon and others added 8 commits March 14, 2025 07:30

Introduce chunks for univariate computation for the AVM

bde40e1

Removing dependency of sumcheck to avm

13cd912

rollback last change

febe4bc

use concepts

f46b226

fix concept

eea7f1c

Addressing review feedback

b3f12ac

Further adressing of review feedback

c8073a2

Addressing review feedback

56df29c

jeanmon force-pushed the jm/sumcheck-chunks branch from 4dc4d59 to 56df29c Compare March 14, 2025 09:11

jeanmon requested a review from lucasxia01 March 14, 2025 09:16

lucasxia01 approved these changes Mar 16, 2025

jeanmon merged commit c912bd6 into master Mar 16, 2025

jeanmon deleted the jm/sumcheck-chunks branch March 16, 2025 16:43

AztecBot mentioned this pull request Mar 16, 2025

chore(master): release 0.82.0 #12770

Merged
