fix(core): detect and fatal error on orphaned/invalid events #1055
…sConsumer
When an event log has duplicate or invalid events (e.g., 2 wait_completed for a single wait_created), the EventsConsumer gets stuck: the orphaned event has no callback to consume it, so eventIndex never advances, blocking all subsequent events and hanging the workflow forever. This adds deferred orphaned event detection that raises a WorkflowRuntimeError instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🦋 Changeset detected
Latest commit: 9bfe214
The changes in this PR will be included in the next version bump. This PR includes changesets to release 18 packages.
🧪 E2E Test Results
❌ Some tests failed
Failed tests: 🌍 Community Worlds (43 failed), all in turso (43 failed).
Details by category:
- ✅ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ✅ 🐘 Local Postgres
- ✅ 🪟 Windows
- ❌ 🌍 Community Worlds
- ✅ 📋 Other
📊 Benchmark Results
[Result tables not preserved. Benchmarks listed: workflow with no steps, with 1 step, and with 10, 25, and 50 sequential steps; Promise.all and Promise.race with 10, 25, and 50 concurrent steps; stream benchmarks (workflow with stream, includes TTFB metrics). Worlds listed: 💻 Local Development and ▲ Production (Vercel). Frameworks listed: Nitro, Express, and Next.js (Turbopack). The summary named the fastest framework by world and the fastest world by framework, each determined by most benchmark wins.]
Pull request overview
This PR adds detection and error handling for orphaned/invalid events in the workflow event log that would otherwise cause workflows to hang silently. The implementation uses a deferred detection mechanism with setTimeout(0) to allow legitimate callbacks registered via process.nextTick to complete before flagging an event as orphaned.
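The deferral relies on Node.js scheduling order: callbacks queued with process.nextTick run before any timer callback, so a check deferred with setTimeout(0) only fires after pending subscribes have had their turn. A minimal sketch of that ordering follows; `observeSchedulingOrder` is a hypothetical helper written for illustration and is not part of this PR.

```typescript
// Hypothetical helper (not from the PR): records the order in which
// differently scheduled callbacks run in Node.js.
function observeSchedulingOrder(): Promise<string[]> {
  const order: string[] = [];
  return new Promise<string[]>((resolve) => {
    // Timer callback (macrotask): runs only after the nextTick queue has drained.
    setTimeout(() => {
      order.push("setTimeout(0)");
      resolve(order);
    }, 0);
    // Queued after the timer, yet runs before it.
    process.nextTick(() => {
      order.push("process.nextTick");
    });
    // Synchronous code always runs first.
    order.push("sync");
  });
}
```

The resolved array lists the synchronous entry first, then the nextTick entry, then the timer entry.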
Changes:
- Added `onUnconsumedEvent` callback parameter to `EventsConsumer` that triggers when events cannot be consumed
- Wired orphaned event detection in `runWorkflow` to reject the workflow with `WorkflowRuntimeError`
- Added comprehensive unit and integration tests for duplicate and orphaned events across wait and step operations
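To make the first change concrete, here is a simplified sketch of a consumer that defers the orphan check and cancels it when a new subscriber arrives. Everything in it is an assumption for illustration: `SketchEventsConsumer`, the `LogEvent` shape, and the boolean-returning subscriber signature are hypothetical and do not mirror the real `EventsConsumer` API.

```typescript
// Hypothetical, simplified event shape; the real event types are richer.
type LogEvent = { eventType: string; correlationId: string };
// Assumption: a subscriber returns true when it consumed the offered event.
type Subscriber = (event: LogEvent) => boolean;

class SketchEventsConsumer {
  private readonly events: LogEvent[];
  private readonly onUnconsumedEvent: ((event: LogEvent) => void) | undefined;
  private subscribers: Subscriber[] = [];
  private eventIndex = 0;
  private pendingCheck: ReturnType<typeof setTimeout> | undefined;

  constructor(events: LogEvent[], onUnconsumedEvent?: (event: LogEvent) => void) {
    this.events = events;
    this.onUnconsumedEvent = onUnconsumedEvent;
  }

  subscribe(subscriber: Subscriber): void {
    this.subscribers.push(subscriber);
    // The new subscriber might consume the current event, so drop any pending verdict.
    this.cancelPendingCheck();
    process.nextTick(() => this.consume());
  }

  private cancelPendingCheck(): void {
    if (this.pendingCheck !== undefined) {
      clearTimeout(this.pendingCheck);
      this.pendingCheck = undefined;
    }
  }

  private consume(): void {
    const event = this.events[this.eventIndex];
    if (event === undefined) return; // End of log: nothing to consume, never an orphan.
    if (this.subscribers.some((s) => s(event))) {
      this.eventIndex++;
      this.cancelPendingCheck();
      process.nextTick(() => this.consume());
      return;
    }
    const report = this.onUnconsumedEvent;
    if (report === undefined) return; // No callback supplied: keep the old behavior.
    // Defer to a timer so subscribers registering via process.nextTick get a chance first.
    this.cancelPendingCheck();
    this.pendingCheck = setTimeout(() => report(event), 0);
  }
}
```

With a log holding one `wait_created` followed by two `wait_completed` events for the same correlation ID, a subscriber that accepts only one of each leaves the second `wait_completed` unconsumed, and the callback receives it on the next timer turn.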
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/core/src/events-consumer.ts | Implements deferred orphaned event detection using setTimeout(0) with cancellation on new subscribes |
| packages/core/src/events-consumer.test.ts | Adds unit tests for onUnconsumedEvent callback covering null events, orphaned events, and subscribe cancellation |
| packages/core/src/workflow.ts | Wires EventsConsumer with onUnconsumedEvent callback that rejects workflow via workflowDiscontinuation |
| packages/core/src/workflow.test.ts | Adds integration tests for duplicate wait_completed, duplicate step_completed, and orphaned events blocking workflow execution |
| packages/core/src/workflow/sleep.test.ts | Updates setupWorkflowContext helper with onUnconsumedEvent handler and adds test for duplicate wait_completed events |
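The wiring described for workflow.ts can be pictured as a run promise that the unconsumed-event callback is able to reject. The sketch below is an assumption-laden illustration, not the PR's code: `SketchWorkflowRuntimeError`, `runWithOrphanGuard`, the `OrphanReporter` type, and the literal slug string are all hypothetical stand-ins for `WorkflowRuntimeError`, the `workflowDiscontinuation` mechanism, and the `CORRUPTED_EVENT_LOG` slug.

```typescript
// Hypothetical stand-in for WorkflowRuntimeError; the real class may differ.
class SketchWorkflowRuntimeError extends Error {
  readonly slug: string;
  constructor(message: string, slug: string) {
    super(message);
    this.name = "SketchWorkflowRuntimeError";
    this.slug = slug;
  }
}

// Hypothetical callback shape passed to the consumer as onUnconsumedEvent.
type OrphanReporter = (event: { eventType: string; correlationId: string }) => void;

// Runs a workflow body and rejects the run as soon as an orphaned event is
// reported, instead of waiting forever on a body that can no longer progress.
function runWithOrphanGuard<T>(body: (report: OrphanReporter) => Promise<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const report: OrphanReporter = (event) => {
      reject(
        new SketchWorkflowRuntimeError(
          `Unconsumed event ${event.eventType} (correlationId ${event.correlationId})`,
          "CORRUPTED_EVENT_LOG", // illustrative slug value, assumed
        ),
      );
    };
    body(report).then(resolve, reject);
  });
}
```

A body that never settles still produces a rejection once `report` is called, which is the behavior change this PR is after: a fast failure in place of a hang.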
Use correct imports (getRun from workflow/api) and declare variables to satisfy the docs-typecheck CI job.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
VaguelySerious left a comment
I think it'd be great to have an e2e test that runs Promise.all over ten sleep('1s') calls (or similar) and asserts that some status is returned within ~2s, whether the run succeeds or fails.
Adds parallelSleepWorkflow that does Promise.all with 10 concurrent
sleep('1s') calls, and an e2e test asserting it completes or fails
within a reasonable time (not hanging).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added in 678901d.
run.status returns immediately, while the workflow is still 'running'. Use run.returnValue, which polls until completion.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test verifies 10 concurrent sleep('1s') calls complete in parallel
(~1s) rather than serially (10s) or hanging indefinitely. 30s timeout
is sufficient since the workflow should finish in ~1-2s.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
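The timing argument in that commit message (parallel sleeps finish in about one sleep duration, serial ones in ten) can be illustrated outside the workflow runtime. This is a sketch under stated assumptions: `sleepMs`, `timeParallelSleeps`, and `withDeadline` are hypothetical helpers built on plain timers, not the workflow `sleep` or the e2e test added here.

```typescript
// Hypothetical stand-in for the workflow `sleep`; it only waits on a timer.
function sleepMs(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Starts `count` sleeps concurrently and returns the elapsed wall-clock time in ms.
async function timeParallelSleeps(count: number, ms: number): Promise<number> {
  const start = Date.now();
  await Promise.all(Array.from({ length: count }, () => sleepMs(ms)));
  return Date.now() - start;
}

// Rejects if `work` has not settled within `limitMs`, turning a hang into a failure.
function withDeadline<T>(work: Promise<T>, limitMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${limitMs}ms`)),
      limitMs,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}
```

Calling `withDeadline(timeParallelSleeps(10, 1000), 30_000)` mirrors the shape of the test: the elapsed time should sit near one second, and the deadline only matters if something hangs.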
Code Review
The orphaned event detection is a solid improvement: turning a silent hang into a fast, diagnosable failure is the right call.
Summary
- When an event log has duplicate or invalid events (e.g., 2 `wait_completed` for a single `wait_created`), the `EventsConsumer` previously got stuck: the orphaned event had no callback to consume it, so `eventIndex` never advanced, blocking all subsequent events and hanging the workflow forever
- Adds orphaned event detection to `EventsConsumer`: when a non-null event cannot be consumed by any registered callback, a `WorkflowRuntimeError` is raised instead of silently hanging
- Uses `setTimeout(0)` (macrotask) deferral with cancellation on new subscribes, so legitimate callbacks that register via `process.nextTick` aren't falsely flagged
- Adds `CORRUPTED_EVENT_LOG` error slug to `@workflow/errors` with a docs page at `/docs/errors/corrupted-event-log` explaining causes and remediation

Test plan
- … `WorkflowRuntimeError` is raised via `onWorkflowError`
- `EventsConsumer` unit tests: orphaned non-null event calls `onUnconsumedEvent`, null event does not, backward compat without callback, new subscribe cancels pending check
- … `WorkflowRuntimeError` rejection
- … `step_completed`, orphaned `step_completed` (unknown correlationId), orphaned `wait_completed`, all blocking subsequent events

🤖 Generated with Claude Code