Materialize waits as entities to prevent duplicate wait_completed events#1057

Merged
pranaygp merged 10 commits into main from pgp/wait-complete-guard
Feb 17, 2026

Conversation

@pranaygp pranaygp commented Feb 14, 2026

Summary

Adds wait entity materialization to the local and postgres world implementations, matching the server-side DynamoDB behavior in workflow-server PR #265. This prevents duplicate wait_completed events by creating wait entities with status tracking and conditional writes.

Changes by package

@workflow/world

  • New waits.ts with Wait type, WaitSchema, and WaitStatusSchema
  • Added optional wait field to EventResult interface
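
To make the shape of a materialized wait concrete, here is a minimal sketch in plain TypeScript. Only the `Wait` name and the `'waiting'` / `'completed'` statuses come from this PR; the field names (`waitId`, `runId`, `createdAt`, `completedAt`) and the `parseWait` helper are assumptions for illustration, and the real `WaitSchema` may be defined quite differently.

```typescript
// Hypothetical shape of a materialized wait; field names are assumptions.
const WAIT_STATUSES = ["waiting", "completed"] as const;
type WaitStatus = (typeof WAIT_STATUSES)[number];

interface Wait {
  waitId: string;
  runId: string;
  status: WaitStatus;
  createdAt: string; // ISO timestamp
  completedAt?: string; // present only once the wait has completed
}

function isWaitStatus(value: unknown): value is WaitStatus {
  return WAIT_STATUSES.some((s) => s === value);
}

// Hand-rolled runtime check standing in for a schema library.
function parseWait(input: unknown): Wait {
  if (typeof input !== "object" || input === null) {
    throw new Error("wait must be an object");
  }
  const { waitId, runId, status, createdAt, completedAt } = input as Record<string, unknown>;
  if (typeof waitId !== "string" || typeof runId !== "string") {
    throw new Error("waitId and runId must be strings");
  }
  if (!isWaitStatus(status)) {
    throw new Error(`invalid wait status: ${String(status)}`);
  }
  if (typeof createdAt !== "string") {
    throw new Error("createdAt must be a string");
  }
  const wait: Wait = { waitId, runId, status, createdAt };
  if (completedAt !== undefined) {
    if (typeof completedAt !== "string") {
      throw new Error("completedAt must be a string when present");
    }
    wait.completedAt = completedAt;
  }
  return wait;
}
```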

@workflow/world-local

  • wait_created creates a wait entity file with status: 'waiting'; throws 409 if already exists
  • wait_completed transitions to status: 'completed'; throws 404 if not found, 409 if already completed
  • Waits are cleaned up on terminal run states (run_completed, run_failed, run_cancelled) via deleteAllWaitsForRun
  • wait_created is rejected on terminal runs (new entities can't be created once a run has ended)
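
The file-backed behavior described above could be sketched roughly as follows. This is not the world-local implementation: the directory layout, the `WorldError` class, and the function signatures are assumptions, and only the status codes and the `deleteAllWaitsForRun` name come from this PR. The sketch also assumes IDs are safe to use as file names.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical error type carrying an HTTP-style status code.
class WorldError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Assumed layout: <dir>/<runId>/<waitId>.json
const waitFile = (dir: string, runId: string, waitId: string) =>
  join(dir, runId, `${waitId}.json`);

const errorCode = (err: unknown) => (err as { code?: string } | null)?.code;

function createWait(dir: string, runId: string, waitId: string): void {
  mkdirSync(join(dir, runId), { recursive: true });
  try {
    // The 'wx' flag makes the write fail with EEXIST if the file already exists.
    writeFileSync(
      waitFile(dir, runId, waitId),
      JSON.stringify({ waitId, runId, status: "waiting" }),
      { flag: "wx" },
    );
  } catch (err) {
    if (errorCode(err) === "EEXIST") throw new WorldError(409, `wait ${waitId} already exists`);
    throw err;
  }
}

function completeWait(dir: string, runId: string, waitId: string): void {
  const file = waitFile(dir, runId, waitId);
  let raw: string;
  try {
    raw = readFileSync(file, "utf8");
  } catch (err) {
    if (errorCode(err) === "ENOENT") throw new WorldError(404, `wait ${waitId} not found`);
    throw err;
  }
  const wait = JSON.parse(raw) as { status: string };
  if (wait.status === "completed") throw new WorldError(409, `wait ${waitId} already completed`);
  // Read-then-write is not atomic across processes; this sketch ignores that.
  writeFileSync(file, JSON.stringify({ ...wait, status: "completed" }));
}

// Called when a run reaches a terminal state.
function deleteAllWaitsForRun(dir: string, runId: string): void {
  rmSync(join(dir, runId), { recursive: true, force: true });
}

// Usage: create and complete one wait in a scratch directory.
const dir = mkdtempSync(join(tmpdir(), "waits-"));
createWait(dir, "run_1", "wait_1");
completeWait(dir, "run_1", "wait_1");
```

Repeating either call for the same wait would throw a `WorldError` with status 409, which is the signal callers use to detect a duplicate.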

@workflow/world-postgres

  • New workflow_waits table with wait_status enum (migration 0007_add_waits_table)
  • wait_created uses INSERT with onConflictDoNothing, plus a check that throws 409 when the wait already exists
  • wait_completed uses conditional UPDATE (WHERE status = 'waiting') with fallback SELECT to distinguish 404 vs 409
  • Waits are cleaned up on terminal run states and in the legacy handler
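
The conditional-write pattern above can be sketched with raw SQL against a minimal client interface. This is a sketch under assumptions, not the PR's code: the PR uses a query builder (`onConflictDoNothing`), whereas this example swaps in plain `ON CONFLICT DO NOTHING`; the `SqlClient` interface, the `WorldError` class, and the column names `wait_id` and `run_id` are hypothetical. Only the `workflow_waits` table name and the status values come from the description.

```typescript
// Minimal client shape assumed for this sketch; not the PR's actual data layer.
interface SqlClient {
  query(
    text: string,
    params: unknown[],
  ): Promise<{ rowCount: number; rows: Array<Record<string, unknown>> }>;
}

// Hypothetical error type carrying an HTTP-style status code.
class WorldError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

async function createWait(db: SqlClient, waitId: string, runId: string): Promise<void> {
  const inserted = await db.query(
    "INSERT INTO workflow_waits (wait_id, run_id, status) VALUES ($1, $2, 'waiting') ON CONFLICT DO NOTHING",
    [waitId, runId],
  );
  // Zero affected rows means the conflict clause swallowed the insert.
  if (inserted.rowCount === 0) throw new WorldError(409, `wait ${waitId} already exists`);
}

async function completeWait(db: SqlClient, waitId: string): Promise<void> {
  // Only a row that is still 'waiting' matches, so at most one caller succeeds.
  const updated = await db.query(
    "UPDATE workflow_waits SET status = 'completed' WHERE wait_id = $1 AND status = 'waiting'",
    [waitId],
  );
  if (updated.rowCount === 1) return;

  // Nothing was updated: a fallback SELECT tells a missing wait from a completed one.
  const existing = await db.query("SELECT status FROM workflow_waits WHERE wait_id = $1", [waitId]);
  if (existing.rows.length === 0) throw new WorldError(404, `wait ${waitId} not found`);
  throw new WorldError(409, `wait ${waitId} already completed`);
}
```

Putting the status check in the `WHERE` clause lets the database arbitrate between concurrent completers, so no separate lock or read-before-write is needed on the success path.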

@workflow/core

  • Handles 409 conflict gracefully in wait_completed event creation (concurrent VQS invocations)
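
A minimal sketch of what handling the conflict gracefully might look like on the caller side. The `CreateEvent` signature, the `completeWaitOnce` name, and the assumption that errors carry a numeric `status` are all hypothetical; the actual code in runtime.ts may be structured differently.

```typescript
// Assumption: the world client rejects with an object carrying an HTTP-style `status`.
function isConflict(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 409;
}

// Hypothetical event-creation function injected by the caller.
type CreateEvent = (
  runId: string,
  event: { eventType: "wait_completed"; correlationId: string },
) => Promise<void>;

// Resolves to true if this invocation wrote the event, false if another one already had.
async function completeWaitOnce(
  createEvent: CreateEvent,
  runId: string,
  correlationId: string,
): Promise<boolean> {
  try {
    await createEvent(runId, { eventType: "wait_completed", correlationId });
    return true;
  } catch (err) {
    if (isConflict(err)) {
      // A concurrent invocation won the race; log and skip instead of crashing.
      console.warn(`wait ${correlationId} was already completed; skipping`);
      return false;
    }
    throw err; // 404s, 5xx responses, and network errors are still real failures
  }
}
```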

Server-side counterpart: https://github.com/vercel/workflow-server/pull/265

Test plan

  • Local world unit tests pass (pnpm test --filter world-local)
  • Postgres world unit tests pass (pnpm test --filter world-postgres)
  • TypeScript compiles cleanly
  • E2E Vercel Prod tests pass against server preview branch
  • Clear WORKFLOW_SERVER_URL_OVERRIDE before merging

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings February 14, 2026 01:25

vercel Bot commented Feb 14, 2026

changeset-bot Bot commented Feb 14, 2026

🦋 Changeset detected

Latest commit: 04f58a3

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 18 packages
Name Type
@workflow/core Patch
@workflow/world Patch
@workflow/world-local Patch
@workflow/world-postgres Patch
@workflow/builders Patch
@workflow/cli Patch
@workflow/next Patch
@workflow/nitro Patch
@workflow/web-shared Patch
workflow Patch
@workflow/world-testing Patch
@workflow/world-vercel Patch
@workflow/astro Patch
@workflow/nest Patch
@workflow/rollup Patch
@workflow/sveltekit Patch
@workflow/vite Patch
@workflow/nuxt Patch

github-actions Bot commented Feb 14, 2026

🧪 E2E Test Results

Some tests failed

Summary

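| Category | Passed | Failed | Skipped | Total |
| --- | --- | --- | --- | --- |
| ✅ ▲ Vercel Production | 512 | 0 | 38 | 550 |
| ✅ 💻 Local Development | 532 | 0 | 68 | 600 |
| ✅ 📦 Local Production | 532 | 0 | 68 | 600 |
| ✅ 🐘 Local Postgres | 532 | 0 | 68 | 600 |
| ✅ 🪟 Windows | 47 | 0 | 3 | 50 |
| ❌ 🌍 Community Worlds | 106 | 44 | 9 | 159 |
| ✅ 📋 Other | 129 | 0 | 21 | 150 |
| Total | 2390 | 44 | 275 | 2709 |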

❌ Failed Tests

🌍 Community Worlds (44 failed)

mongodb (1 failed):

  • webhookWorkflow

turso (43 failed):

  • addTenWorkflow
  • addTenWorkflow
  • should work with react rendering in step
  • promiseAllWorkflow
  • promiseRaceWorkflow
  • promiseAnyWorkflow
  • hookWorkflow
  • webhookWorkflow
  • sleepingWorkflow
  • parallelSleepWorkflow
  • nullByteWorkflow
  • workflowAndStepMetadataWorkflow
  • fetchWorkflow
  • promiseRaceStressTestWorkflow
  • error handling error propagation workflow errors nested function calls preserve message and stack trace
  • error handling error propagation workflow errors cross-file imports preserve message and stack trace
  • error handling error propagation step errors basic step error preserves message and stack trace
  • error handling error propagation step errors cross-file step error preserves message and function names in stack
  • error handling retry behavior regular Error retries until success
  • error handling retry behavior FatalError fails immediately without retries
  • error handling retry behavior RetryableError respects custom retryAfter delay
  • error handling retry behavior maxRetries=0 disables retries
  • error handling retry behavior workflow completes despite transient 5xx on step_completed
  • error handling catchability FatalError can be caught and detected with FatalError.is()
  • hookCleanupTestWorkflow - hook token reuse after workflow completion
  • concurrent hook token conflict - two workflows cannot use the same hook token simultaneously
  • stepFunctionPassingWorkflow - step function references can be passed as arguments (without closure vars)
  • stepFunctionWithClosureWorkflow - step function with closure variables passed as argument
  • closureVariableWorkflow - nested step functions with closure variables
  • spawnWorkflowFromStepWorkflow - spawning a child workflow using start() inside a step
  • health check (queue-based) - workflow and step endpoints respond to health check messages
  • pathsAliasWorkflow - TypeScript path aliases resolve correctly
  • Calculator.calculate - static workflow method using static step methods from another class
  • AllInOneService.processNumber - static workflow method using sibling static step methods
  • ChainableService.processWithThis - static step methods using this to reference the class
  • thisSerializationWorkflow - step function invoked with .call() and .apply()
  • customSerializationWorkflow - custom class serialization with WORKFLOW_SERIALIZE/WORKFLOW_DESERIALIZE
  • instanceMethodStepWorkflow - instance methods with "use step" directive
  • crossContextSerdeWorkflow - classes defined in step code are deserializable in workflow context
  • stepFunctionAsStartArgWorkflow - step function reference passed as start() argument
  • pages router addTenWorkflow via pages router
  • pages router promiseAllWorkflow via pages router
  • pages router sleepingWorkflow via pages router

Details by Category

✅ ▲ Vercel Production
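| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro | 46 | 0 | 4 |
| ✅ example | 46 | 0 | 4 |
| ✅ express | 46 | 0 | 4 |
| ✅ fastify | 46 | 0 | 4 |
| ✅ hono | 46 | 0 | 4 |
| ✅ nextjs-turbopack | 49 | 0 | 1 |
| ✅ nextjs-webpack | 49 | 0 | 1 |
| ✅ nitro | 46 | 0 | 4 |
| ✅ nuxt | 46 | 0 | 4 |
| ✅ sveltekit | 46 | 0 | 4 |
| ✅ vite | 46 | 0 | 4 |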
✅ 💻 Local Development
| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro-stable | 43 | 0 | 7 |
| ✅ express-stable | 43 | 0 | 7 |
| ✅ fastify-stable | 43 | 0 | 7 |
| ✅ hono-stable | 43 | 0 | 7 |
| ✅ nextjs-turbopack-canary | 47 | 0 | 3 |
| ✅ nextjs-turbopack-stable | 47 | 0 | 3 |
| ✅ nextjs-webpack-canary | 47 | 0 | 3 |
| ✅ nextjs-webpack-stable | 47 | 0 | 3 |
| ✅ nitro-stable | 43 | 0 | 7 |
| ✅ nuxt-stable | 43 | 0 | 7 |
| ✅ sveltekit-stable | 43 | 0 | 7 |
| ✅ vite-stable | 43 | 0 | 7 |
✅ 📦 Local Production
| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro-stable | 43 | 0 | 7 |
| ✅ express-stable | 43 | 0 | 7 |
| ✅ fastify-stable | 43 | 0 | 7 |
| ✅ hono-stable | 43 | 0 | 7 |
| ✅ nextjs-turbopack-canary | 47 | 0 | 3 |
| ✅ nextjs-turbopack-stable | 47 | 0 | 3 |
| ✅ nextjs-webpack-canary | 47 | 0 | 3 |
| ✅ nextjs-webpack-stable | 47 | 0 | 3 |
| ✅ nitro-stable | 43 | 0 | 7 |
| ✅ nuxt-stable | 43 | 0 | 7 |
| ✅ sveltekit-stable | 43 | 0 | 7 |
| ✅ vite-stable | 43 | 0 | 7 |
✅ 🐘 Local Postgres
| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ astro-stable | 43 | 0 | 7 |
| ✅ express-stable | 43 | 0 | 7 |
| ✅ fastify-stable | 43 | 0 | 7 |
| ✅ hono-stable | 43 | 0 | 7 |
| ✅ nextjs-turbopack-canary | 47 | 0 | 3 |
| ✅ nextjs-turbopack-stable | 47 | 0 | 3 |
| ✅ nextjs-webpack-canary | 47 | 0 | 3 |
| ✅ nextjs-webpack-stable | 47 | 0 | 3 |
| ✅ nitro-stable | 43 | 0 | 7 |
| ✅ nuxt-stable | 43 | 0 | 7 |
| ✅ sveltekit-stable | 43 | 0 | 7 |
| ✅ vite-stable | 43 | 0 | 7 |
✅ 🪟 Windows
| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ nextjs-turbopack | 47 | 0 | 3 |
❌ 🌍 Community Worlds
| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ mongodb-dev | 3 | 0 | 0 |
| ❌ mongodb | 46 | 1 | 3 |
| ✅ redis-dev | 3 | 0 | 0 |
| ✅ redis | 47 | 0 | 3 |
| ✅ turso-dev | 3 | 0 | 0 |
| ❌ turso | 4 | 43 | 3 |
✅ 📋 Other
| App | Passed | Failed | Skipped |
| --- | --- | --- | --- |
| ✅ e2e-local-dev-nest-stable | 43 | 0 | 7 |
| ✅ e2e-local-postgres-nest-stable | 43 | 0 | 7 |
| ✅ e2e-local-prod-nest-stable | 43 | 0 | 7 |

Copilot AI left a comment

Pull request overview

This PR adds graceful handling for 409 Conflict responses when creating wait_completed events, addressing race conditions when multiple concurrent workflow invocations attempt to complete the same wait. The changes build on PR #1055 which added orphaned event detection, and work in conjunction with server-side changes that materialize waits as entities to prevent duplicate completions.

Changes:

  • Wraps wait_completed event creation in try/catch blocks to handle 409 conflicts gracefully
  • Applies 409 handling in both automatic wait completion (runtime.ts) and manual wake-up (runs.ts)
  • Updates event sourcing documentation to reflect that waits are now materialized entities in storage

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| packages/core/src/runtime/runs.ts | Adds 409 conflict handling in wakeUpRun() to count already-completed waits as successful |
| packages/core/src/runtime/runs.test.ts | New test file with unit tests for 409 handling and error cases in wakeUpRun() |
| packages/core/src/runtime.ts | Wraps automatic wait_completed event creation in try/catch to log and skip on 409 conflicts |
| docs/content/docs/how-it-works/event-sourcing.mdx | Updates documentation to clarify waits are materialized entities with atomic completion guarantees |
| .changeset/wait-complete-guard.md | Adds changeset for patch release describing the fix |
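
For the manual wake-up path, counting an already-completed wait as successful could look something like the sketch below. The `wakeUpWaits` name, its parameters, and the returned shape are assumptions for illustration and are not the real `wakeUpRun()` signature.

```typescript
// Hypothetical function that writes a wait_completed event for one wait.
type CompleteWait = (correlationId: string) => Promise<void>;

async function wakeUpWaits(
  pendingWaitIds: string[],
  completeWait: CompleteWait,
): Promise<{ completed: number }> {
  let completed = 0;
  for (const id of pendingWaitIds) {
    try {
      await completeWait(id);
      completed++;
    } catch (err) {
      // Assumption: errors carry an HTTP-style numeric `status`.
      if ((err as { status?: number } | null)?.status === 409) {
        // Someone else completed this wait first; the desired end state still holds.
        completed++;
        continue;
      }
      throw err;
    }
  }
  return { completed };
}
```

Treating 409 as success keeps the operation idempotent from the caller's point of view: waking a run twice reports the same count rather than an error.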

pranaygp and others added 4 commits February 15, 2026 17:19
When multiple concurrent workflow invocations race to complete the same
wait, the server returns 409 (conflict) for duplicates. This change
handles the 409 gracefully in both runtime.ts (sleep elapsed check) and
runs.ts (wakeUpRun), preventing crashes and treating already-completed
waits as successful.

Also updates event-sourcing docs to reflect that waits are now
materialized as entities in storage with atomic completion guarantees.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ranch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… duplicate wait_completed events

Adds Wait type/schema to the shared world package and implements wait entity
materialization in both local (filesystem) and postgres world implementations,
matching the DynamoDB behavior. wait_created creates a wait entity with status
'waiting', and wait_completed transitions it to 'completed' with guards that
reject duplicates (409).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pranaygp and others added 4 commits February 15, 2026 17:54
…ation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* origin/main:
  add closure comment in sleep test helper (#1071)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rride

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@VaguelySerious VaguelySerious left a comment

LGTM, but can we add (or modify) one e2e test to include Promise.all(sleep(1s) x 10) or similar? That'd probably ensure general race-condition safety for waits into the future

pranaygp commented Feb 16, 2026

> LGTM, but can we add (or modify) one e2e test to include Promise.all(sleep(1s) x 10) or similar? That'd probably ensure general race-condition safety for waits into the future

I already did that in the previous PR and it's been merged.

```typescript
export async function parallelSleepWorkflow() {
  'use workflow';
  const startTime = Date.now();
  await Promise.all(Array.from({ length: 10 }, () => sleep('1s')));
  const endTime = Date.now();
  return { startTime, endTime };
}
```

@pranaygp

Replying to @TooTallNate's code review:

Backward Compatibility

Good call — both local and postgres would throw 404 for waits created before this code was deployed. For Vercel production this is handled server-side (the backend backfills the wait entity in completed state on wait_completed if it doesn't exist — see vercel/workflow-server#265). For self-hosted postgres and local dev, I agree we should add backfill logic. I'll add that as a follow-up.
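
As a rough illustration of the backfill idea, the sketch below completes a wait against an in-memory map and creates the entity directly in the completed state when it is missing. This follow-up is not part of this PR; the function name, the `Outcome` values, and the use of a `Map` in place of real storage are all assumptions.

```typescript
type WaitStatus = "waiting" | "completed";
type Outcome = "completed" | "backfilled" | "conflict";

// `waits` stands in for whatever storage the world uses (files, a table, ...).
function completeWaitWithBackfill(waits: Map<string, WaitStatus>, waitId: string): Outcome {
  const status = waits.get(waitId);
  if (status === undefined) {
    // The wait predates materialization: create it as 'completed' instead of failing with 404.
    waits.set(waitId, "completed");
    return "backfilled";
  }
  if (status === "completed") {
    return "conflict"; // would surface to the caller as a 409
  }
  waits.set(waitId, "completed");
  return "completed";
}
```

In this sketch an unknown waitId can no longer be told apart from a legacy one, since both take the backfill branch.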

waitId Inconsistency

Fixed in a5e093c — both backends now use the raw correlationId as waitId, consistent with how steps and hooks use their correlation IDs as PKs.

Runtime 409 Handling

Agreed it causes one wasted invocation. The self-healing behavior is correct though — the next invocation picks up the persisted event and proceeds. I'll note this as an optimization for a follow-up (re-fetch the event on 409 and push it to the array to avoid the extra round-trip).
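
The optimization mentioned here might look roughly like the following. It is only a sketch of a possible follow-up: the `EventStore` interface, its method names, and the `appendWaitCompleted` helper are hypothetical, and errors are assumed to carry a numeric `status`.

```typescript
interface WorkflowEvent {
  eventType: string;
  correlationId: string;
}

// Hypothetical storage interface; method names are assumptions.
interface EventStore {
  create(runId: string, event: WorkflowEvent): Promise<WorkflowEvent>;
  findByCorrelationId(
    runId: string,
    correlationId: string,
    eventType: string,
  ): Promise<WorkflowEvent | undefined>;
}

// Appends the wait_completed event to the in-memory list, whoever wrote it.
async function appendWaitCompleted(
  store: EventStore,
  runId: string,
  correlationId: string,
  events: WorkflowEvent[],
): Promise<void> {
  try {
    events.push(await store.create(runId, { eventType: "wait_completed", correlationId }));
  } catch (err) {
    if ((err as { status?: number } | null)?.status !== 409) throw err;
    // Another invocation persisted the event first: fetch that copy and use it,
    // so replay can continue now instead of waiting for the next invocation.
    const persisted = await store.findByCorrelationId(runId, correlationId, "wait_completed");
    if (persisted) events.push(persisted);
  }
}
```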

wakeUpRun 409 Handling

👍


Re: "world-postgres doesn't seem too happy" — this was fixed, the migration journal entry was missing. Added in ce2eb72.

pranaygp and others added 2 commits February 16, 2026 15:53
The server-side changes (vercel/workflow-server#265) have been merged to main
and deployed to production, so we no longer need to point at the preview URL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* origin/main:
  Remove "workflow/internal/serialization" export (#1082)