feat: Multithreaded Sumcheck#556
Conversation
f3fbed9 to
5b76ea4
Compare
| # Checkout baseline branch, run benchmarks, save results in json format | ||
| echo -e "\nConfiguring and building $BENCH_TARGET in $BASELINE_BRANCH branch..\n" | ||
| git checkout master > /dev/null | ||
| cd $BASE_DIR |
There was a problem hiding this comment.
Unrelated to this PR, just a bug in the script
| const FF alpha) | ||
| { | ||
| size_t round_size = 1; | ||
| size_t round_size = 2; |
There was a problem hiding this comment.
This was always a bug but it didnt matter because it was being used in loop like this for (size_t edge_idx = 0; edge_idx < round_size; edge_idx += 2) in compute_univariate Round size is at minimum 2, never 1 in practice.
There was a problem hiding this comment.
This may have been left over from when we thought of round size as the number of edges. Its not that anymmore. Its the number of rows ( = number of edges * 2)
codygunton
left a comment
There was a problem hiding this comment.
Lovely, compact PR here. We need to tune the parameters and try to understand where we stop getting linear scaling, but this is definitely a nice and significant first step.
| size_t iterations_per_thread = round_size / num_threads; // actual iterations per thread | ||
|
|
||
| // Constuct univariate accumulator containers; one per thread | ||
| std::vector<RelationUnivariates> thread_univariate_accumulators; |
There was a problem hiding this comment.
You can just initialize with the size rather than to resize. Maybe doesn't matter.
| * @param tuple_2 Second summand | ||
| */ | ||
| template <typename... T> | ||
| static constexpr void add_tuples(std::tuple<T...>& tuple_1, const std::tuple<T...>& tuple_2) |
There was a problem hiding this comment.
FYI I think there's built-in folding over the plus operator using something like +...
There was a problem hiding this comment.
There's likely a lot of room for improvement in a lot of the tuple based methods I've written. Could be a nice puzzle for someone (maybe even me) at some point.
Description
Introduces multithreading to the primary loop in Sumcheck. Essentially, each thread is responsible for accumulating the univariate contribution of each sub-relation for a subset of edges per round. After all threads have completed their work, the accumulators from each thread are combined to obtain the contribution over the complete hypercube for the round.
The number of cores is variable across rounds and is determined by
min_iterations_per_thread(currently 2^6). Early rounds (for sufficiently large circuits) will use all available cores. As the round size is reduced, we use the minimum number of threads such that each thread is responsible for at leastmin_iterations_per_thread. This leads to the use of successively fewer threads, down to a single thread in the final rounds.Note: A few functions in sumcheck round now take as input things that used to be accessed as class members (e.g.
univariate_accumulatorsandextended_edges. This is because multiple threads cannot access the same location in memory. The solution was to simply make a corresponding container for each thread. (This is not expensive becauseunivariate_accumulatorsandextended_edgesare very small objects).These results show an average 77% CPU time reduction across the Ultra benchmarks as compared to

master:Checklist:
@briefdescribing the intended functionality.includedirectives have been added.