Expose cancellation signal on request #650
mcuelenaere wants to merge 2 commits into restatedev:main
…tion
Add a `cancellationSignal: AbortSignal` property to the `Request` interface that aborts as soon as the Restate runtime sends a cancellation, rather than waiting for the next `ctx.run()` call. This allows users to pass the signal to `fetch()`, database clients, or other async operations for proactive cancellation handling.
Implementation uses a new `CancellationWatcherPromise` that hooks into the existing `PromisesExecutor` loop to monitor the VM's `cancel_handle()` without modifying `InputPump` or `doProgressInner`.
Co-authored-by: Cursor <cursoragent@cursor.com>
Stop the watcher's `doProgressInner` loop when the invocation ends by registering `cancelWatcher.stop()` on `invocationEndPromise`. Without this, the watcher keeps calling `do_progress()` on the closed VM after `sys_end()`, causing cascading "(598) State machine was closed" errors and DANGER warnings about operations running after invocation close.
Co-authored-by: Cursor <cursoragent@cursor.com>
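The watcher idea from the two commit messages above can be sketched roughly as follows. This is not the PR's code: `FakeVm`, `CancellationWatcher`, `isCancelCompleted()`, and `poll()` are hypothetical names for illustration, and the sketch assumes that something like the executor loop calls `poll()` once per iteration.

```typescript
// Hypothetical stand-in for the SDK's VM; it models only what this sketch needs.
class FakeVm {
  closed = false;
  cancelled = false;

  // Models "is the cancel handle completed?". Throws once the VM is closed,
  // similar in spirit to the "State machine was closed" errors in the commit message.
  isCancelCompleted(): boolean {
    if (this.closed) {
      throw new Error("State machine was closed");
    }
    return this.cancelled;
  }
}

// Hypothetical watcher: aborts an AbortController once cancellation is seen.
class CancellationWatcher {
  private readonly controller = new AbortController();
  private readonly vm: FakeVm;
  private stopped = false;

  constructor(vm: FakeVm) {
    this.vm = vm;
  }

  // The signal that user code would receive.
  get signal(): AbortSignal {
    return this.controller.signal;
  }

  // Assumed to be called once per executor loop iteration.
  poll(): void {
    if (this.stopped || this.controller.signal.aborted) {
      return;
    }
    if (this.vm.isCancelCompleted()) {
      // The PR uses a CancelledError as the reason; a plain Error stands in here.
      this.controller.abort(new Error("Cancelled"));
    }
  }

  // Registered on invocation end so the closed VM is never touched again.
  stop(): void {
    this.stopped = true;
  }
}
```

The `stop()` guard is the point of the second commit: without it, a `poll()` after the VM has closed would throw instead of returning quietly.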
Thanks a lot for your contribution @mcuelenaere. I think this is a really nice improvement. Curious to hear what @nikrooz thinks about this idea. The one thing that one needs to be aware of is that with this change, the code snippet that reacts to the
Hi @tillrohrmann, I didn't think of any possible side effects that could occur from this. To give some context, our use case is a Restate service that does LLM streaming (via an out-of-band pub-sub mechanism), and we'd like the user to be able to abort it. My thinking was to leverage the built-in Restate cancellation mechanism, but we do need to do some simple cleanup/bookkeeping (setting some state in the virtual object) after cancellation is requested. If that breaks certain Restate assumptions, then I could look into pivoting the cancellation to an out-of-band mechanism as well (and drop this PR). WDYT?
I don't think that this is a problem @mcuelenaere but I wanted to double check with @slinkydeveloper who knows the SDK implementation a lot better than me. I like your approach and the capability to abort an ongoing
Thanks @mcuelenaere for doing this PR. As Till mentioned, this is something for Francesco to look at. CC @igalshilman
For reference, I didn't have the time to look into this yet, but this PR might be causing journal mismatch errors: https://restatecommunity.slack.com/archives/C0821C5RBH9/p1772110842398699
I skimmed through this, and I think the current design won't work correctly: we won't be able to replay this deterministically, because it creates a situation with two competing asynchronous tasks.
If the abort handler runs some asynchronous work and uses the context (most likely), then the abort handler will compete with the Restate handler. I think for this to work well, we would have to change how we expose cancellation in the first place, and/or be able to deterministically run the abort handler and later resume the Restate handler code... Will have to think about this...
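The race described in this comment can be illustrated with a toy simulation. Everything in it is hypothetical: `simulate` and the string journal entries are not SDK concepts. The sketch only models the assumption that an abort handler using the context adds a journal entry at a position that depends on when the cancellation happens to arrive.

```typescript
// Toy model: the journal is the ordered list of entries recorded via the context.
type Journal = string[];

// handlerSteps: context operations the handler performs, in order.
// cancelArrivesBeforeStep: index of the handler step just before which the
// cancellation fires. In a real system this is timing-dependent.
function simulate(handlerSteps: string[], cancelArrivesBeforeStep: number): Journal {
  const journal: Journal = [];
  const controller = new AbortController();

  // An abort handler that also uses the context, as a cleanup step would.
  controller.signal.addEventListener("abort", () => {
    journal.push("abort-handler: set state");
  });

  handlerSteps.forEach((step, i) => {
    if (i === cancelArrivesBeforeStep) {
      controller.abort(); // abort listeners run synchronously here
    }
    journal.push(`handler: ${step}`);
  });
  return journal;
}

const steps = ["run A", "run B", "run C"];
// Same handler code, but the cancellation arrives at a different moment.
const firstAttempt = simulate(steps, 1);
const replay = simulate(steps, 2);
const sameJournal = JSON.stringify(firstAttempt) === JSON.stringify(replay);
console.log(firstAttempt, replay, sameJournal);
```

In this toy model the two runs record the abort handler's entry at different positions, which is the kind of divergence a deterministic replay cannot tolerate.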
Just FYI: internally, we are no longer using this PR. We switched to an out-of-band signalling system (basically, using Redis), which fulfills the requirements just as well. I can close this PR or keep it open, up to you. I will most likely not be maintaining this change.

Summary
This PR adds a `cancellationSignal: AbortSignal` property to the `Request` interface that aborts as soon as the Restate runtime sends a cancellation, rather than waiting for the next `ctx.run()` call.
This allows passing the signal to `fetch()`, database clients, or any API that accepts an `AbortSignal` for proactive cancellation of in-flight operations. The signal's reason is a `CancelledError`.
Motivation
When an invocation is cancelled, the SDK currently only detects this at the next Restate operation. For handlers with long-running user code between operations (e.g., AI/LLM streaming), cancellation detection can be significantly delayed.
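As an illustration of the intended usage, the sketch below passes an `AbortSignal` to long-running work that sits between Restate operations. It does not import the SDK: `RequestLike` is a hypothetical stand-in for the proposed `cancellationSignal` property on `Request`, and `abortableDelay` and `streamTokens` are made-up helpers that simulate a slow token stream.

```typescript
// Hypothetical stand-in for the part of the request this PR proposes;
// the real SDK `Request` type has more fields.
interface RequestLike {
  cancellationSignal: AbortSignal;
}

// Wraps a timer so that it rejects as soon as the signal aborts.
function abortableDelay(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(signal.reason);
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(signal.reason);
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Simulates long-running user code between Restate operations,
// for example consuming an LLM stream token by token.
async function streamTokens(request: RequestLike, tokens: string[]): Promise<string[]> {
  const seen: string[] = [];
  for (const token of tokens) {
    // Rejects with the abort reason if cancellation arrives mid-stream.
    await abortableDelay(10, request.cancellationSignal);
    seen.push(token);
  }
  return seen;
}
```

With a real HTTP call, the same signal could be passed as the `signal` option of `fetch()` instead of the simulated delay.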
Implementation
A new `CancellationWatcherPromise` hooks into the existing `PromisesExecutor` loop to monitor `is_completed(cancel_handle())` and abort an `AbortController` when cancellation is detected. The watcher is stopped via `invocationEndPromise` when the invocation ends, preventing interaction with a closed VM.
Design document
Notes
I've verified that this change works correctly in my local setup. I haven't run the added E2E tests yet; I was hoping the CI could help with that.