ExecutionSpace: flattened OpState #304

Open
maartenarnst wants to merge 1 commit into main from opstate2

maartenarnst commented Jan 20, 2026

Collaborator

Food for thought :) An alternative design with an opstate.

The main change is that in this design, work is launched from the start function on host. This way, we avoid launching it from the set_value function that the standard seems to want to run on the execution agent, which may be the device.

If we make the op state queryable for the exec, we can decide in the start function whether or not to fence.

@maartenarnst maartenarnst self-assigned this Jan 20, 2026
Comment on lines +67 to +76
inner_opstate.start();

try {
    Kokkos::parallel_for(
        std::format("{}: then", Kokkos::Impl::TypeInfo<typename Schd::execution_space>::name()),
        Kokkos::RangePolicy(this->schd.state->exec, 0, 1),
        wrapper);
} catch (...) {
    stdexec::set_error(std::move(this->rcvr), std::current_exception());
}

@romintomasetti romintomasetti Jan 20, 2026

Collaborator

This has no chance to work.

Say we have:

auto chain = stdexec::schedule(esc.get_scheduler()) | stdexec::then(..A..) | stdexec::then(..B..);
stdexec::sync_wait(std::move(chain));

What you'll get is:

  1. The operation state of sync_wait starts the operation state of B.
  2. The operation state of B starts the operation state of A.
  3. The operation state of A starts the operation state of the scheduler, which calls set_value on its receiver,
  4. thus calling set_value on A's receiver,
  5. which calls set_value on B's receiver,
  6. which calls set_value on the sync_wait receiver,
  7. triggering the sync_wait fence.
  8. Only after that does control return to the start() function of A's operation state, which then launches kernel A.
  9. Then control returns to the start() function of B's operation state, which launches kernel B.

So you got everything in the wrong order, well done 👍
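
The ordering described above can be reproduced with a small mock that uses neither stdexec nor Kokkos. This is only a sketch under assumptions: `Log`, `SyncWaitReceiver`, `ThenReceiver`, `ScheduleOpState`, `ThenOpState` and `run_launch_in_start` are hypothetical stand-ins, the "kernel launch" and the "fence" are just log entries, and the schedule operation is assumed to complete inline from `start()`.

```cpp
#include <string>
#include <vector>

// Event log standing in for kernel launches and fences (hypothetical mock).
using Log = std::vector<std::string>;

// Plays the role of the sync_wait receiver: its set_value fences.
struct SyncWaitReceiver {
    Log* log;
    void set_value() { log->push_back("fence"); }
};

// Mock of a `then` receiver that only forwards the completion downstream.
template <class Next>
struct ThenReceiver {
    Next next;
    void set_value() { next.set_value(); }
};

// Mock schedule operation state: completes inline from start().
template <class Rcvr>
struct ScheduleOpState {
    Rcvr rcvr;
    void start() { rcvr.set_value(); }
};

// Mock `then` operation state mirroring the reviewed diff:
// start the inner operation first, then launch the kernel.
template <class Inner>
struct ThenOpState {
    Inner inner;
    std::string name;
    Log* log;
    void start() {
        inner.start();
        log->push_back("launch " + name);
    }
};

// Models schedule | then(A) | then(B) connected to the sync_wait receiver.
Log run_launch_in_start() {
    Log log;
    SyncWaitReceiver sync_rcvr{&log};
    ThenReceiver<SyncWaitReceiver> rcvr_b{sync_rcvr};
    ThenReceiver<ThenReceiver<SyncWaitReceiver>> rcvr_a{rcvr_b};
    ScheduleOpState<decltype(rcvr_a)> schedule_op{rcvr_a};
    ThenOpState<decltype(schedule_op)> op_a{schedule_op, "A", &log};
    ThenOpState<decltype(op_a)> op_b{op_a, "B", &log};
    op_b.start();
    return log;
}
```

In this mock, `run_launch_in_start()` records the fence before either launch, which is the inversion the list walks through.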

I agree we can have operation states. Yet it's the call to the ThenReceiver set_value that tells us "now is the time to launch your async operation".

And so we're back to the same situation that #262 solves.

If you feel strongly about this, you can follow up on #262 and move all the propagate_completion_signal stuff into some operation state, much like in nvexec.

Collaborator Author

I agree it can't work on its own. We'd have to modify the schedule sender and sync_wait too. Essentially, the start function of the op state of the schedule sender would do nothing, and the start function of the sync_wait op state would fence.
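
A rough mock of that variant, again without stdexec or Kokkos, could look as follows. Everything here is an assumption made for illustration: the type names and `run_launch_then_fence` are hypothetical, completion signals are left out entirely, and the launches are assumed to be enqueued in order on a single execution space instance.

```cpp
#include <string>
#include <vector>

// Event log standing in for kernel launches and fences (hypothetical mock).
using Log = std::vector<std::string>;

// Mock schedule operation state: start() intentionally does nothing.
struct ScheduleOpState {
    void start() {}
};

// Mock `then` operation state: start the inner operation, then launch from host.
template <class Inner>
struct ThenOpState {
    Inner inner;
    std::string name;
    Log* log;
    void start() {
        inner.start();
        log->push_back("launch " + name);
    }
};

// Mock sync_wait operation state: start the whole chain, then fence.
template <class Inner>
struct SyncWaitOpState {
    Inner inner;
    Log* log;
    void start() {
        inner.start();
        log->push_back("fence");
    }
};

// Models sync_wait(schedule | then(A) | then(B)) in the modified design.
Log run_launch_then_fence() {
    Log log;
    ScheduleOpState schedule_op{};
    ThenOpState<ScheduleOpState> op_a{schedule_op, "A", &log};
    ThenOpState<decltype(op_a)> op_b{op_a, "B", &log};
    SyncWaitOpState<decltype(op_b)> sync_op{op_b, &log};
    sync_op.start();
    return log;
}
```

Here the log reads launch A, launch B, fence. The mock says nothing about how values or errors would reach the final receiver, nor about senders with several children.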

The issue that this tries to solve is that we currently launch work and fence in our set_value functions. And if we read the standard literally, it seems to say that set_value must run on an execution agent, i.e. the GPU. This is a problem because we can't launch and fence from the device. So even if we don't retain the opstate from this PR, there's the food for thought 😄.

Collaborator

> We'd have to modify the schedule sender and sync wait too.

It will not help you with the when_all case at all.

maartenarnst commented Jan 27, 2026

Collaborator Author

Hey @romintomasetti. This is more concretely what I am thinking of. There are two commits:

  • the first step introduces a ThenOpstate
  • the second step moves the launch from the set_value function to the start function of this op state (there is still work here, the ordering is currently not right)

The main reason I think of the second step is point 10 from https://eel.is/c++draft/exec#async.ops-10:

> [A scheduler] is a factory for senders whose asynchronous operations execute value completion operations on an execution agent belonging to the scheduler's associated execution resource.

I interpret this as saying that the set_value will in principle be called on device. And so that's why I feel it may not be the best place to launch work.

There could also be a third step and move the launch into a "child opstate". There could be some nice links with graph nodes in that case.

@maartenarnst maartenarnst force-pushed the opstate2 branch 2 times, most recently from 625a39d to 7cfb5db Compare January 27, 2026 20:43
@maartenarnst maartenarnst changed the title Food for thought: alternative design with opstate ExecutionSpace: fused opstate Feb 7, 2026
@maartenarnst maartenarnst changed the title ExecutionSpace: fused opstate ExecutionSpace: fused OpState Feb 7, 2026
@maartenarnst maartenarnst force-pushed the opstate2 branch 13 times, most recently from 901eba4 to 4ba0a8d Compare February 10, 2026 07:57
@maartenarnst maartenarnst force-pushed the opstate2 branch 4 times, most recently from 11bafad to 173ed6a Compare February 18, 2026 11:20
Comment thread kokkos_ext/impl/execution_space/operation_state.hpp Outdated
Comment thread kokkos_ext/impl/Concepts.hpp Outdated
Comment thread kokkos_ext/impl/execution_space/operation_state.hpp Outdated
Comment thread tests/kokkos_ext/execution_space/test_operation_state.cpp Outdated
@maartenarnst maartenarnst force-pushed the opstate2 branch 8 times, most recently from 4293b8d to c163709 Compare February 19, 2026 08:51

@romintomasetti romintomasetti left a comment

Collaborator

Let's also update the PR description.

Comment thread kokkos_ext/impl/execution_space/domain.hpp Outdated
Comment thread kokkos_ext/impl/Concepts.hpp Outdated
Comment on lines +60 to +61
std::is_nothrow_move_constructible_v<typename std::remove_cvref_t<Data>::functor_t>
&& stdexec::__nothrow_decay_copyable<Sndr&&>) {

Collaborator

I think this is not right.

You'd better have a helper that gives you the ParallelForSender type given the Data and Sndr.

Collaborator

Let's not write it at all: it does not serve any purpose (error channel or overload resolution) and just makes it harder to read.

Comment thread kokkos_ext/impl/execution_space/operation_state.hpp Outdated
requires std::same_as<
    stdexec::__completion_domain_of_t<stdexec::set_value_t, Sndr, stdexec::env_of_t<Rcvr>>,
    Kokkos::Experimental::details::execution_space::Domain
>

Collaborator

Seems useless?

Comment thread kokkos_ext/impl/execution_space/operation_state.hpp
Comment thread kokkos_ext/impl/execution_space/operation_state.hpp Outdated
Comment thread kokkos_ext/impl/execution_space/operation_state.hpp
Comment thread tests/kokkos_ext/execution_space/test_parallel_for.cpp Outdated
Comment thread tests/kokkos_ext/execution_space/test_parallel_for.cpp Outdated
@maartenarnst maartenarnst force-pushed the opstate2 branch 3 times, most recently from 3a2c3b0 to 8d07d67 Compare February 20, 2026 10:58
@maartenarnst maartenarnst changed the title ExecutionSpace: fused OpState ExecutionSpace: folded OpState Feb 20, 2026
@maartenarnst maartenarnst force-pushed the opstate2 branch 3 times, most recently from d079a48 to 31915fe Compare February 23, 2026 18:01
Signed-off-by: Maarten Arnst <maarten.arnst@uliege.be>
@maartenarnst maartenarnst changed the title ExecutionSpace: folded OpState ExecutionSpace: flattened OpState Mar 2, 2026
3 participants