ExecutionSpace: flattened `OpState` by maartenarnst · Pull Request #304 · uliegecsm/graph-dispatching

maartenarnst · 2026-01-20T17:04:09Z

Food for thought :) An alternative design with an opstate.

The main change in this design is that work is launched from the start function on the host. This avoids launching it from the set_value function, which the standard seems to want to run on the execution agent, and that agent may be the device.

If we make the op state queryable for the exec, we can decide in the start function whether or not to fence.

romintomasetti · 2026-01-20T17:31:47Z

+    inner_opstate.start();
+
+    try {
+        Kokkos::parallel_for(
+            std::format("{}: then", Kokkos::Impl::TypeInfo<typename Schd::execution_space>::name()),
+            Kokkos::RangePolicy(this->schd.state->exec, 0, 1),
+            wrapper);
+    } catch (...) {
+        stdexec::set_error(std::move(this->rcvr), std::current_exception());
+    }


This has no chance to work.

Say we have:

auto chain = stdexec::schedule(esc.get_scheduler()) | stdexec::then(..A..) | stdexec::then(..B..);
stdexec::sync_wait(std::move(chain));

What you'll get is:

The operation state of sync_wait starts the operation state of B.

The operation state of B starts the operation state of A.

The operation state of A starts the operation state of the scheduler, which calls set_value on its receiver,

thus calling set_value on A's receiver,

calling set_value on B's receiver,

calling set_value on sync_wait's receiver,

triggering the sync_wait fence.

Only after that does control return to the start() function of A's operation state, which then launches kernel A.

Then control returns to the start() function of B's operation state, which launches kernel B.

So you got everything in the wrong order, well done 👍
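The ordering described above can be reproduced with a toy model that does not use stdexec or Kokkos at all. This is only a sketch under stated assumptions: `Log`, `SyncWaitReceiver`, `ForwardingReceiver`, `ScheduleOpState`, `NaiveThenOpState` and `run_naive_chain` are hypothetical names invented for illustration, and a "launch" or "fence" is just a string appended to a log.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log standing in for the device queue.
using Log = std::vector<std::string>;

// Plays the role of sync_wait's receiver: completing it triggers the fence.
struct SyncWaitReceiver {
    Log* log;
    void set_value() { log->push_back("fence"); }
};

// Receiver of a "then" stage that only forwards the completion signal.
template <class Next>
struct ForwardingReceiver {
    Next next;
    void set_value() { next.set_value(); }
};

// Op state of the schedule sender: completes inline from start().
template <class Rcvr>
struct ScheduleOpState {
    Rcvr rcvr;
    void start() { rcvr.set_value(); }
};

// "then" op state in the style of the diff: start the inner op state first,
// launch this stage's kernel afterwards.
template <class Inner>
struct NaiveThenOpState {
    Inner inner;
    Log* log;
    std::string name;
    void start() {
        inner.start();
        log->push_back("launch " + name);
    }
};

// Models schedule | then(A) | then(B) under sync_wait.
Log run_naive_chain() {
    Log log;
    using RcvrB = ForwardingReceiver<SyncWaitReceiver>;
    using RcvrA = ForwardingReceiver<RcvrB>;
    using OpSched = ScheduleOpState<RcvrA>;
    using OpA = NaiveThenOpState<OpSched>;
    using OpB = NaiveThenOpState<OpA>;

    OpB op{OpA{OpSched{RcvrA{RcvrB{SyncWaitReceiver{&log}}}}, &log, "A"}, &log, "B"};
    op.start();
    assert(log.size() == 3);  // one fence and two launches were recorded
    return log;
}
```

Calling `run_naive_chain()` records the fence before either launch, which is the inversion the walkthrough points out.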

I agree we can have operation states. Yet it's the call to the ThenReceiver's set_value that tells us "now is the time to launch your async operation".

And so we're back to the same situation that #262 solves.

If you feel strongly about this, you can follow up on #262 and move all the propagate_completion_signal stuff into some operation state, much like in nvexec.
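For contrast, a receiver-driven launch can be sketched with the same kind of toy model. It is not the code from #262 or from nvexec; `Log`, `ThenReceiver` (as written here), `ScheduleOpState`, `SyncWaitReceiver` and `run_receiver_driven_chain` are hypothetical stand-ins, and launching or fencing is again only a log entry.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log standing in for the device queue.
using Log = std::vector<std::string>;

// Plays the role of sync_wait's receiver: completing it triggers the fence.
struct SyncWaitReceiver {
    Log* log;
    void set_value() { log->push_back("fence"); }
};

// Receiver of a "then" stage: being completed is the signal to launch
// this stage's work, after which the signal is passed downstream.
template <class Next>
struct ThenReceiver {
    Next next;
    Log* log;
    std::string name;
    void set_value() {
        log->push_back("launch " + name);
        next.set_value();
    }
};

// Op state of the schedule sender: completes inline from start().
template <class Rcvr>
struct ScheduleOpState {
    Rcvr rcvr;
    void start() { rcvr.set_value(); }
};

// Models schedule | then(A) | then(B) under sync_wait.
Log run_receiver_driven_chain() {
    Log log;
    using RcvrB = ThenReceiver<SyncWaitReceiver>;
    using RcvrA = ThenReceiver<RcvrB>;

    ScheduleOpState<RcvrA> op{RcvrA{RcvrB{SyncWaitReceiver{&log}, &log, "B"}, &log, "A"}};
    op.start();
    assert(log.size() == 3);  // two launches and one fence were recorded
    return log;
}
```

In this model the completion signal travels from the scheduler outwards, so each stage launches before its successor and the fence comes last.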

maartenarnst · 2026-01-20

I agree it can't work on its own. We'd have to modify the schedule sender and sync wait too. Essentially, the start function of the op state of the schedule sender would do nothing, and that of the sync wait would fence.

The issue this tries to solve is that we currently launch work and fence in our set_value functions. If we read the standard literally, it seems to say that set_value must run on an execution agent, i.e. on the GPU. This is a problem because we can't launch and fence from the device. So even if we don't retain the opstate from this PR, there's the food for thought 😄.
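The modification described in this reply (a schedule op state whose start does nothing, and a sync wait op state that fences) can be modeled in the same toy style. This is a sketch of the idea only, assuming hypothetical types `ScheduleOpState`, `ThenOpState`, `SyncWaitOpState` and a hypothetical `run_start_driven_chain`; it leaves out receivers and completion signals entirely and does not model when_all.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log standing in for the device queue.
using Log = std::vector<std::string>;

// Op state of the schedule sender: start() deliberately does nothing.
struct ScheduleOpState {
    void start() {}
};

// "then" op state: start the inner op state, then launch this stage on the host.
template <class Inner>
struct ThenOpState {
    Inner inner;
    Log* log;
    std::string name;
    void start() {
        inner.start();
        log->push_back("launch " + name);
    }
};

// Op state of sync wait: start the whole chain, then fence.
template <class Inner>
struct SyncWaitOpState {
    Inner inner;
    Log* log;
    void start() {
        inner.start();
        log->push_back("fence");
    }
};

// Models schedule | then(A) | then(B) under sync_wait.
Log run_start_driven_chain() {
    Log log;
    using OpA = ThenOpState<ScheduleOpState>;
    using OpB = ThenOpState<OpA>;

    SyncWaitOpState<OpB> op{OpB{OpA{ScheduleOpState{}, &log, "A"}, &log, "B"}, &log};
    op.start();
    assert(log.size() == 3);  // two launches and one fence were recorded
    return log;
}
```

With these two changes the launches happen in chain order and the fence comes last, all from start functions that run on the host.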

romintomasetti · 2026-01-20

> We'd have to modify the schedule sender and sync wait too.

It will not help you with the when_all case at all.

maartenarnst · 2026-01-27T18:52:36Z

Hey @romintomasetti. This is more concretely what I am thinking of. There are two commits:

the first step introduces a ThenOpstate
the second step moves the launch from the set_value function to the start function of this op state (there is still work to do here; the ordering is currently not right)

The main reason I am considering the second step is paragraph 10 of https://eel.is/c++draft/exec#async.ops-10:

[A scheduler] is a factory for senders whose asynchronous operations execute value completion operations on an execution agent belonging to the scheduler's associated execution resource.

I interpret this as saying that set_value will in principle be called on the device, which is why I feel it may not be the best place to launch work.

There could also be a third step that moves the launch into a "child opstate". There could be some nice links with graph nodes in that case.

romintomasetti

Let's also update the PR description.

romintomasetti · 2026-02-19T13:32:46Z

+        std::is_nothrow_move_constructible_v<typename std::remove_cvref_t<Data>::functor_t>
+        && stdexec::__nothrow_decay_copyable<Sndr&&>) {


I think this is not right.

You'd better have a helper that gives you the ParallelForSender type given the Data and Sndr.

Let's not write it at all: it does not serve any purpose (error channel or overload resolution) and just makes it harder to read.

romintomasetti · 2026-02-19T13:56:43Z

+requires std::same_as<
+             stdexec::__completion_domain_of_t<stdexec::set_value_t, Sndr, stdexec::env_of_t<Rcvr>>,
+             Kokkos::Experimental::details::execution_space::Domain
+         >


Seems useless?

Signed-off-by: Maarten Arnst <maarten.arnst@uliege.be>

maartenarnst self-assigned this Jan 20, 2026

romintomasetti reviewed Jan 20, 2026

maartenarnst force-pushed the opstate2 branch from dfb7d79 to d97f2c2 Compare January 27, 2026 18:42

maartenarnst force-pushed the opstate2 branch 2 times, most recently from 625a39d to 7cfb5db Compare January 27, 2026 20:43

maartenarnst mentioned this pull request Feb 2, 2026

ExecutionSpace: OpState and ParallelFor #347

Merged

maartenarnst force-pushed the opstate2 branch from 77a3c1a to dfdc450 Compare February 6, 2026 21:17

maartenarnst mentioned this pull request Feb 7, 2026

ExecutionSpace: move definition of Domain to separate file #360

Merged

maartenarnst changed the title ~~Food for thought: alternative design with opstate~~ ExecutionSpace: fused opstate Feb 7, 2026

maartenarnst changed the title ~~ExecutionSpace: fused opstate~~ ExecutionSpace: fused OpState Feb 7, 2026

maartenarnst force-pushed the opstate2 branch 13 times, most recently from 901eba4 to 4ba0a8d Compare February 10, 2026 07:57

maartenarnst force-pushed the opstate2 branch 4 times, most recently from 11bafad to 173ed6a Compare February 18, 2026 11:20

maartenarnst commented Feb 18, 2026

Comment thread kokkos_ext/impl/execution_space/operation_state.hpp Outdated

maartenarnst commented Feb 18, 2026

Comment thread kokkos_ext/impl/Concepts.hpp Outdated

romintomasetti reviewed Feb 18, 2026

Comment thread kokkos_ext/impl/execution_space/operation_state.hpp Outdated

romintomasetti reviewed Feb 18, 2026

Comment thread tests/kokkos_ext/execution_space/test_operation_state.cpp Outdated

maartenarnst force-pushed the opstate2 branch 8 times, most recently from 4293b8d to c163709 Compare February 19, 2026 08:51

romintomasetti requested changes Feb 19, 2026

romintomasetti reviewed Feb 19, 2026

Comment thread kokkos_ext/impl/execution_space/operation_state.hpp Outdated

romintomasetti reviewed Feb 19, 2026

maartenarnst commented Feb 19, 2026

Comment thread kokkos_ext/impl/execution_space/operation_state.hpp

romintomasetti reviewed Feb 19, 2026

Comment thread kokkos_ext/impl/execution_space/operation_state.hpp Outdated

maartenarnst commented Feb 19, 2026

Comment thread kokkos_ext/impl/execution_space/operation_state.hpp

romintomasetti reviewed Feb 19, 2026

Comment thread tests/kokkos_ext/execution_space/test_parallel_for.cpp Outdated

maartenarnst commented Feb 19, 2026

Comment thread tests/kokkos_ext/execution_space/test_parallel_for.cpp Outdated

maartenarnst mentioned this pull request Feb 19, 2026

details: noexcept specification for transform sender #396

Open

maartenarnst force-pushed the opstate2 branch 3 times, most recently from 3a2c3b0 to 8d07d67 Compare February 20, 2026 10:58

maartenarnst changed the title ~~ExecutionSpace: fused OpState~~ ExecutionSpace: folded OpState Feb 20, 2026

maartenarnst force-pushed the opstate2 branch 3 times, most recently from d079a48 to 31915fe Compare February 23, 2026 18:01

Folded opstate

7208251

Signed-off-by: Maarten Arnst <maarten.arnst@uliege.be>

maartenarnst force-pushed the opstate2 branch from 31915fe to 7208251 Compare February 23, 2026 18:08

maartenarnst changed the title ~~ExecutionSpace: folded OpState~~ ExecutionSpace: flattened OpState Mar 2, 2026

romintomasetti mentioned this pull request Mar 19, 2026

ExecutionSpace: flattened operation state uliegecsm/kokkos-execution#62

Merged
