ggml webgpu: minor set rows optimization by reeselevine · Pull Request #16810 · ggml-org/llama.cpp

reeselevine · 2025-10-27T23:11:13Z

Better parallelization of SET_ROWS by having multiple threads work on each row, as well as vectorization
Adds more useful labels to buffers for debugging
Adds Dawn-specific toggles which disable some safety protections when running natively, for better performance
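To make the first point concrete, here is a minimal CPU-side sketch of how a vectorized SET_ROWS dispatch could split each row across several threads. It is an emulation written for illustration, not the WGSL shader from this PR: the function name `set_rows_vec4`, the chunk width `VEC`, and the flat row-major layout are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative chunk width: each emulated thread handles VEC consecutive elements,
// standing in for a single vec4 load/store on the GPU.
constexpr std::size_t VEC = 4;

// Emulates dst[row_ids[r]] = src[r] for every source row r.
// src is (row_ids.size() x n_cols), dst is (n_dst_rows x n_cols), both row-major.
// Assumes n_cols is a multiple of VEC and every id is a valid dst row.
void set_rows_vec4(const std::vector<float>& src,
                   const std::vector<std::int64_t>& row_ids,
                   std::vector<float>& dst,
                   std::size_t n_cols) {
    assert(n_cols % VEC == 0);
    const std::size_t n_rows         = row_ids.size();
    const std::size_t chunks_per_row = n_cols / VEC;
    const std::size_t n_threads      = n_rows * chunks_per_row;

    // Each loop iteration plays the role of one GPU thread.
    for (std::size_t tid = 0; tid < n_threads; ++tid) {
        const std::size_t row     = tid / chunks_per_row;          // which source row
        const std::size_t col     = (tid % chunks_per_row) * VEC;  // first element of this chunk
        const std::size_t dst_row = static_cast<std::size_t>(row_ids[row]);
        for (std::size_t k = 0; k < VEC; ++k) {
            dst[dst_row * n_cols + col + k] = src[row * n_cols + col + k];
        }
    }
}
```

With this mapping a row of `n_cols` elements is covered by `n_cols / VEC` threads instead of one. When the row length is not a multiple of the vector width, a fallback that assigns one thread per element (as the commit notes describe for the non-vectorized version) is the natural alternative.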

Better matrix multiplication coming soon!


slaren · 2025-11-01T21:12:09Z

        std::vector<ggml_tensor *> expert_views(n_expert_used);
        for (int64_t i = 0; i < n_expert_used; ++i) {
-            expert_views[i] = ggml_view_2d(ctx, weighted, n_embd, n_tokens, weighted->nb[2], i * weighted->nb[1]);
+            expert_views[i] = ggml_view_2d(ctx, weighted, n_embd, n_tokens, weighted->nb[1], i * weighted->nb[1]);


I don't think this change is correct.

This update was testing some changes to the addition kernels in response to the discussion in #16857. But it looks like the CUDA CI is failing with this change too, so if it's confirmed that nb[2] is correct here, I'll need to do a little more debugging to understand why the WebGPU add op is failing as currently written. I'll mark this PR as a draft for now to avoid it accidentally being merged.

Quick update: I realized this is due to the non-contiguity of the view tensors, which isn't supported yet in the kernels. I disabled support for non-contiguous tensors here and added a note so it can be added in the future.
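The non-contiguity can be seen from the strides alone. The sketch below is a hypothetical stand-in, not ggml code: it assumes `weighted` is a float tensor of shape [n_embd, n_expert_used, n_tokens] with densely packed strides, and the `View2D` struct and both helper functions are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the shape/stride part of a 2D tensor view (not ggml's struct).
struct View2D {
    std::int64_t ne0, ne1;  // elements per row, number of rows
    std::size_t  nb0, nb1;  // byte stride between elements / between rows
    std::size_t  offset;    // byte offset into the parent buffer
};

// A 2D view is contiguous when its rows are packed back to back.
bool is_contiguous(const View2D& v, std::size_t elem_size) {
    return v.nb0 == elem_size &&
           v.nb1 == static_cast<std::size_t>(v.ne0) * elem_size;
}

// View of expert i inside a parent of assumed shape [n_embd, n_expert_used, n_tokens].
// Mirrors the arguments of the original call: row stride = parent nb[2], offset = i * parent nb[1].
View2D expert_view(std::int64_t n_embd, std::int64_t n_expert_used,
                   std::int64_t n_tokens, std::int64_t i) {
    const std::size_t nb0 = sizeof(float);
    const std::size_t nb1 = nb0 * static_cast<std::size_t>(n_embd);
    const std::size_t nb2 = nb1 * static_cast<std::size_t>(n_expert_used);
    return View2D{n_embd, n_tokens, nb0, nb2, static_cast<std::size_t>(i) * nb1};
}
```

Under these assumptions, each view steps over all experts of a token to reach the next token's row, so its row stride exceeds `n_embd * sizeof(float)` whenever `n_expert_used > 1`. A kernel that assumes packed rows would read the wrong data for such a view.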


reeselevine · 2025-11-04T23:27:17Z

Just a quick ping here @slaren and/or @CISC: hopefully getting this merged will fix the WebGPU CI errors for now and allow some more PRs to be opened.


* Add buffer label and enable dawn-specific toggles to turn off some checks
* Minor set_rows optimization (ggml-org#4)
  * updated optimization, fixed errors
  * non vectorized version now dispatches one thread per element
  * Simplify
  * Change logic for set_rows pipelines
  ---------
  Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
  Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
  Co-authored-by: Reese Levine <reeselevine1@gmail.com>
* Comment on dawn toggles
* Remove some comments
* Implement overlap binary operators
* Revert "Implement overlap binary operators"
  This reverts commit ed710b3.
* Disable support for non-contiguous binary_op tensors and leave note for future support
---------
Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>
Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>


reeselevine and others added 5 commits October 15, 2025 19:04

Add buffer label and enable dawn-specific toggles to turn off some checks

b566811

Merge remote-tracking branch 'upstream/master'

2aa05c6

Comment on dawn toggles

51aae63

Remove some comments

f0cfae4

github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) Oct 28, 2025

Merge remote-tracking branch 'upstream/master'

da5296e

reeselevine mentioned this pull request Oct 31, 2025

CUDA: add expert reduce kernel #16857

Merged

Implement overlap binary operators

ed710b3

reeselevine requested a review from slaren as a code owner November 1, 2025 20:55

github-actions bot added the "testing" label (Everything test related) Nov 1, 2025

slaren reviewed Nov 1, 2025


reeselevine marked this pull request as draft November 2, 2025 04:24

reeselevine added 2 commits November 1, 2025 21:33

Revert "Implement overlap binary operators"

b319672

This reverts commit ed710b3.

Disable support for non-contiguous binary_op tensors and leave note for future support

9a029e4

reeselevine marked this pull request as ready for review November 2, 2025 04:40

CISC approved these changes Nov 5, 2025


CISC merged commit 03ea041 into ggml-org:master Nov 5, 2025
71 of 72 checks passed

CISC merged 9 commits into ggml-org:master from reeselevine:master


4 participants
