fix(core): ensure async deserialization resolves promises in event log order#1246
Conversation
…g order Add a sequential deserializationChain to WorkflowOrchestratorContext that ensures step/hook hydration resolves in event log order even when individual deserialization takes variable time (e.g. due to encryption/decryption). Without this, concurrent async hydrations could resolve out of order, breaking workflow replay determinism.
🦋 Changeset detectedLatest commit: 7def3db The changes in this PR will be included in the next version bump. This PR includes changesets to release 14 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests🌍 Community Worlds (48 failed)turso (48 failed):
Details by Category✅ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
|
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro workflow with 1 step💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Next.js (Turbopack) | Express workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Nitro | Express | Next.js (Turbopack) Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Nitro | Next.js (Turbopack) Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) | Nitro SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
|
There was a problem hiding this comment.
Pull request overview
Fixes a determinism hazard in the core workflow replay engine by enforcing sequential async deserialization so step/hook promises resolve in the same order as the event log, even when hydration time varies (e.g., future decryption).
Changes:
- Introduces
deserializationChain: Promise<void>onWorkflowOrchestratorContextand initializes it for workflow runs and test contexts. - Chains step (
step_completed) and hook (hook_received) hydration + promise resolution throughdeserializationChainto preserve event-log ordering. - Adds a new test file intended to validate ordering under simulated variable async deserialization delays.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/private.ts | Adds deserializationChain to the orchestrator context type (with doc comment). |
| packages/core/src/workflow.ts | Initializes deserializationChain for real workflow runs. |
| packages/core/src/step.ts | Routes step completion hydration/resolve through deserializationChain. |
| packages/core/src/workflow/hook.ts | Routes hook payload hydration/resolve through deserializationChain in both immediate and queued cases. |
| packages/core/src/async-deserialization-ordering.test.ts | New tests meant to simulate variable async hydration delays and assert deterministic resolution ordering. |
| packages/core/src/workflow/sleep.test.ts | Updates test context helper to include deserializationChain. |
| packages/core/src/workflow/hook.test.ts | Updates test context helper to include deserializationChain. |
| packages/core/src/step.test.ts | Updates test context helper to include deserializationChain. |
| .changeset/fix-deserialization-ordering.md | Publishes the fix as a patch changeset for @workflow/core. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address review feedback: - Add hook-specific deserialization ordering test - Add afterEach(vi.restoreAllMocks) to prevent spy leaks - Remove manual mockRestore calls
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
- The hasConflict path in createHookPromise() was rejecting immediately, bypassing the promiseQueue and potentially settling out of event-log order relative to prior queued resolutions. - Add missing runId and encryptionKey to sleep.test.ts context stub.
VaguelySerious
left a comment
There was a problem hiding this comment.
Great solution to the problem
Add failing tests that reproduce a regression from #1246 where the sleep's WorkflowSuspension fires before all hook payloads are delivered when a hook and sleep run concurrently. Root cause: when the null event fires, the sleep queues a suspension through promiseQueue. After the first hook payload resolves, subsequent hook payload resolutions are queued AFTER the sleep suspension, causing the workflow to terminate prematurely via Promise.race. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…romiseQueue The promiseQueue refactor (#1246) moved ALL promise resolutions through the queue, including WorkflowSuspension calls. This caused premature termination when hooks had buffered payloads and a concurrent entity (sleep or incomplete step) was pending: 1. Null event fires → all subscribers run 2. Sleep/step queues WorkflowSuspension via promiseQueue.then() 3. Hook's next payload resolution is also queued via promiseQueue.then() 4. Suspension fires first → Promise.race terminates workflow 5. Hook payload never delivered → infinite retry loop Fix: move WorkflowSuspension calls back to setTimeout(0) (macrotask). Suspensions fire AFTER all microtask-based deliveries (promiseQueue resolve/reject for step results and hook payloads) have completed. Non-suspension resolve/reject calls remain on promiseQueue for deterministic ordering of data delivery. Affected paths: - step.ts: null event handler - sleep.ts: null event handler - hook.ts: null event handler, eventLogEmpty suspension, dispose suspension
…h concurrent pending entities (#1294) * test: reproduce hook+sleep promiseQueue regression Add failing tests that reproduce a regression from #1246 where the sleep's WorkflowSuspension fires before all hook payloads are delivered when a hook and sleep run concurrently. Root cause: when the null event fires, the sleep queues a suspension through promiseQueue. After the first hook payload resolves, subsequent hook payload resolutions are queued AFTER the sleep suspension, causing the workflow to terminate prematurely via Promise.race. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: expand tests to isolate bug to hook + pending entity pattern Add control tests proving sequential steps are NOT affected by the promiseQueue regression, isolating the bug to hooks specifically: Failing (hook-based): - hook + sleep: all 3 payloads → step invocation - hook + sleep: 2 payloads → return - hook + incomplete step: 2 payloads → return Passing (step-based controls): - sleep + sequential steps: both step events exist - sleep + sequential steps: only 1st step completed - incomplete step + sequential steps: all step events exist - hook only (no concurrent entity): payloads + step The bug is: any entity that queues a suspension through promiseQueue at null-event time preempts hook payload delivery, because hooks buffer payloads in payloadsQueue and only resolve them one-at-a-time as the workflow code iterates. Steps are unaffected because each step has its own events consumed before null fires. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(core): move WorkflowSuspension calls back to setTimeout(0) from promiseQueue The promiseQueue refactor (#1246) moved ALL promise resolutions through the queue, including WorkflowSuspension calls. This caused premature termination when hooks had buffered payloads and a concurrent entity (sleep or incomplete step) was pending: 1. Null event fires → all subscribers run 2. Sleep/step queues WorkflowSuspension via promiseQueue.then() 3. Hook's next payload resolution is also queued via promiseQueue.then() 4. Suspension fires first → Promise.race terminates workflow 5. Hook payload never delivered → infinite retry loop Fix: move WorkflowSuspension calls back to setTimeout(0) (macrotask). Suspensions fire AFTER all microtask-based deliveries (promiseQueue resolve/reject for step results and hook payloads) have completed. Non-suspension resolve/reject calls remain on promiseQueue for deterministic ordering of data delivery. Affected paths: - step.ts: null event handler - sleep.ts: null event handler - hook.ts: null event handler, eventLogEmpty suspension, dispose suspension * fix(core): use pendingDeliveries counter + scheduleWhenIdle for suspensions Replace all nested setTimeout/promiseQueue suspension patterns with a clean idle-polling mechanism: 1. pendingDeliveries counter: incremented before async hydration (step results, hook payloads), decremented in finally block after delivery 2. scheduleWhenIdle(ctx, fn): polls via setTimeout(0) → check counter → if > 0, wait for promiseQueue.then() → repeat. Only fires fn when pendingDeliveries reaches 0. This correctly handles: - Sync deserialization (no encryption): counter is 0, fires immediately after first setTimeout(0) - Async deserialization (with encryption): waits for decryption to complete before firing - Multi-round hook payload delivery: each createHookPromise() call increments the counter, preventing premature suspension between delivery rounds Also adds payloadsQueue.length check to hook null handler — don't trigger suspension if buffered payloads can satisfy pending awaits. Tests now run in both sync and async modes (14 total = 7 scenarios x 2). All 164 tests pass. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Nathan Rajlich <n@n8.io>
Summary
promiseQueueonWorkflowOrchestratorContextthrough which all event-driven promise resolutions flow — steps, hooks, sleeps, failures, suspensions, and errorssetTimeout(0)calls fromstep.ts,hook.ts, andsleep.tsin favor of the unified queueProblem
The
hydrateStepReturnValuefunction was madeasyncin preparation for encryption/decryption support. Currently no realawaithappens so things work, but once actual async work begins (e.g. decrypting a larger payload takes longer), multiple in-flight hydrations would race and resolve out of order:Changes
packages/core/src/private.tspromiseQueue: Promise<void>to context typepackages/core/src/workflow.tsPromise.resolve()packages/core/src/step.tspromiseQueue, remove allsetTimeout(0)packages/core/src/workflow/hook.tspromiseQueue, remove allsetTimeout(0)packages/core/src/workflow/sleep.tspromiseQueue, remove allsetTimeout(0)packages/core/src/async-deserialization-ordering.test.tspromiseQueueto context instep.test.ts,hook.test.ts,sleep.test.tsTest plan
All 145 tests pass (7 new + 138 existing):
New tests (
async-deserialization-ordering.test.ts):Existing tests:
workflow.test.ts— 69 integration testsstep.test.ts— 15 testshook.test.ts— 23 testssleep.test.ts— 9 testsevents-consumer.test.ts— 22 tests