implement event-sourced architecture #621
🦋 Changeset detected
Latest commit: b7a352a
The changes in this PR will be included in the next version bump. This PR includes changesets to release 18 packages.
🧪 E2E Test Results
❌ Some tests failed
❌ Failed Tests
🌍 Community Worlds (161 failed)
- mongodb (40 failed)
- redis (40 failed)
- starter (41 failed)
- turso (40 failed)
Details by Category
- ✅ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ✅ 🐘 Local Postgres
- ✅ 🪟 Windows
- ❌ 🌍 Community Worlds
6ebd4c5 to 2e46b8a
eece359 to 290e879
Pull request overview
This PR introduces a performance optimization for event creation by adding a createBatch() method to the World interface. The implementation enables atomic batch creation of multiple events, significantly improving the wait completion logic in the runtime from O(n²) to O(n) complexity.
Key Changes
- Added events.createBatch() method to the World interface for creating multiple events in a single operation
- Implemented batch creation across three storage backends (world-vercel, world-postgres, world-local) with backend-specific optimizations
- Optimized runtime wait completion logic using Set-based correlation ID lookup and batch event creation
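The Set-based lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from packages/core/src/runtime.ts: the `WaitEvent` shape and the helper names `pendingWaitIds` and `toWaitCompletedBatch` are assumptions made for the example.

```typescript
// Hypothetical, simplified event shape; the real schema lives in @workflow/world.
interface WaitEvent {
  eventType: "wait_created" | "wait_completed";
  correlationId: string;
}

// O(n) approach: collect completed correlation IDs into a Set in one pass,
// then check each wait_created event with a constant-time lookup.
// (A nested scan of the event list per wait would be O(n²).)
function pendingWaitIds(events: WaitEvent[]): string[] {
  const completed = new Set<string>();
  for (const e of events) {
    if (e.eventType === "wait_completed") completed.add(e.correlationId);
  }
  const pending: string[] = [];
  for (const e of events) {
    if (e.eventType === "wait_created" && !completed.has(e.correlationId)) {
      pending.push(e.correlationId);
    }
  }
  return pending;
}

// Build all wait_completed payloads up front so they can be handed to a
// single batch call instead of one create call per wait.
function toWaitCompletedBatch(ids: string[]): WaitEvent[] {
  return ids.map((correlationId): WaitEvent => ({
    eventType: "wait_completed",
    correlationId,
  }));
}
```

The resulting array is what a batch method such as events.createBatch() would presumably receive in one call.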
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/world/src/interfaces.ts | Added createBatch() method signature with JSDoc documentation to the Storage events interface |
| packages/world-vercel/src/storage.ts | Integrated batch event creation into the storage adapter |
| packages/world-vercel/src/events.ts | Implemented createWorkflowRunEventBatch() using parallel API calls via Promise.all |
| packages/world-postgres/src/storage.ts | Implemented batch creation using a single INSERT query with multiple values for optimal database performance |
| packages/world-local/src/storage.ts | Implemented sequential batch creation to maintain monotonic ULID ordering for filesystem storage |
| packages/core/src/runtime.ts | Refactored wait completion to use Set-based lookup and batch event creation, improving from O(n²) to O(n) complexity |
| .changeset/brave-dots-bake.md | Added changeset documenting the performance improvement across all affected packages |
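To make the difference between the parallel (world-vercel) and sequential (world-local) strategies in the table concrete, here is a small sketch. The generic `createOne` callback and both function names are hypothetical stand-ins, not the actual storage code.

```typescript
// Hypothetical single-item creator; stands in for an API call or a file write.
type CreateOne<T, R> = (input: T) => Promise<R>;

// Parallel: all calls are started at once. Promise.all resolves to results
// in input order, but the order in which the calls finish is not guaranteed.
async function createBatchParallel<T, R>(
  inputs: T[],
  createOne: CreateOne<T, R>
): Promise<R[]> {
  return Promise.all(inputs.map((input) => createOne(input)));
}

// Sequential: each call completes before the next one starts, so any
// time-ordered ID generated inside createOne is assigned in input order.
async function createBatchSequential<T, R>(
  inputs: T[],
  createOne: CreateOne<T, R>
): Promise<R[]> {
  const results: R[] = [];
  for (const input of inputs) {
    results.push(await createOne(input));
  }
  return results;
}
```

The sequential variant trades latency for a strict creation order, which is the property the world-local row cites for keeping ULIDs monotonic.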
- Fix broken link in hook-conflict.mdx (/docs/foundations/webhooks -> /docs/api-reference/workflow/create-webhook)
- Add hook-conflict to errors index page so it's discoverable by the docs link validator
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update tests to expect hook_conflict events instead of thrown errors when duplicate hook tokens are used. This aligns with the new event-sourced approach where conflicts are recorded as events rather than thrown. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
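As a rough illustration of "conflicts are recorded as events rather than thrown", consider the sketch below. The `InMemoryHooks` class and the `HookEvent` shape are hypothetical; only the event names `hook_created` and `hook_conflict` come from this PR.

```typescript
// Hypothetical event shapes for illustration only.
type HookEvent =
  | { eventType: "hook_created"; token: string }
  | { eventType: "hook_conflict"; token: string };

// Toy in-memory store: a duplicate token yields a hook_conflict event
// instead of throwing, so the conflict becomes part of the event log.
class InMemoryHooks {
  private tokens = new Set<string>();

  create(token: string): HookEvent {
    if (this.tokens.has(token)) {
      return { eventType: "hook_conflict", token };
    }
    this.tokens.add(token);
    return { eventType: "hook_created", token };
  }
}
```

A test written against this style asserts on the returned event type rather than wrapping the call in an expected-throw check.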
- Add specVersion property to World interface to track world package version
- Add specVersion to WorkflowRun schema and run_created event data
- World implementations (vercel, local, postgres) set specVersion from npm version
- Server can use specVersion to route operations based on world version
- Add specVersion display to observability UI attribute panel
- Add spec_version column to postgres runs schema
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Properly generates migration with drizzle-kit CLI
- Removes deprecated 'paused' status from enum
- Adds spec_version column
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add RunNotSupportedError for runs requiring newer world versions
- Add semver-based version utilities (isLegacyVersion) to @workflow/world
- World implementations check specVersion and route to legacy handlers
- Legacy runs (< 4.1.0): run_cancelled skips event storage, wait_completed stores event only
- New runs always get current world version (4.1.0-beta.0)
- Make EventResult.event optional for legacy compatibility
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace semver-based version compatibility with explicit integer spec versions:
- SPEC_VERSION_LEGACY (1): pre-event-sourcing runs
- SPEC_VERSION_CURRENT (2): event-sourced architecture
Use branded SpecVersion type to enforce importing from @workflow/world. Remove semver dependency from world, world-local, and world-postgres.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
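For readers unfamiliar with branded types, a minimal sketch of how a branded SpecVersion might look is shown here. The brand symbol and the `isLegacySpecVersion` helper are assumptions for illustration; the actual definitions in @workflow/world may differ.

```typescript
// Branding: intersecting `number` with a phantom property means a plain
// number literal is not assignable to SpecVersion without an explicit cast,
// which nudges callers toward the exported constants.
declare const specVersionBrand: unique symbol;
type SpecVersion = number & { readonly [specVersionBrand]: true };

const SPEC_VERSION_LEGACY = 1 as SpecVersion; // pre-event-sourcing runs
const SPEC_VERSION_CURRENT = 2 as SpecVersion; // event-sourced architecture

// Hypothetical helper: missing versions are treated as legacy data.
function isLegacySpecVersion(v: SpecVersion | null | undefined): boolean {
  return v == null || v < SPEC_VERSION_CURRENT;
}
```

With this shape, `const v: SpecVersion = 2` fails to type-check while `SPEC_VERSION_CURRENT` is accepted, and comparisons stay simple integer comparisons instead of semver parsing.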
- Add error_cbor bytea columns to workflow_runs and workflow_steps tables
- Deprecate text error column, rename to errorJson with fallback parsing
- Remove JSON.stringify from error writes (run_failed, step_failed, step_retrying)
- Add parseErrorJson helper for backwards compatibility with legacy data
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
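The commit above does not show what the fallback parsing looks like. One plausible shape, assuming legacy rows hold either a JSON-serialized error object or a bare message string, is sketched below. The `StructuredError` type and the exact fallback rules are assumptions, and this is not the helper from world-postgres.

```typescript
// Hypothetical structured error shape.
interface StructuredError {
  message: string;
  stack?: string;
  code?: string;
}

// Try to read the legacy text column as JSON; if that fails, or the parsed
// value is not an object with a string message, fall back to treating the
// text itself as the error message.
function parseErrorJson(raw: string | null | undefined): StructuredError | undefined {
  if (raw == null) return undefined;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { message?: unknown }).message === "string"
    ) {
      return parsed as StructuredError;
    }
    if (typeof parsed === "string") return { message: parsed };
  } catch {
    // Not valid JSON: handled by the fallback below.
  }
  return { message: raw };
}
```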
- Replace generic Error with WorkflowRuntimeError for runtime assertions
- Add explicit check for run entity in run_created response
- Use run.runId instead of event.runId for consistency
- Use actual run status instead of hardcoded 'pending' in attributes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add specVersion field to Step, Hook, and Event interfaces in @workflow/world
- Add spec_version column to steps, hooks, events tables in postgres schema
- Set specVersion to SPEC_VERSION_CURRENT when creating entities in all worlds
- Update migration to include spec_version columns for all entity tables
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Split storage.ts (1041 lines) into smaller, focused modules:
- storage/filters.ts: Data filtering helpers
- storage/helpers.ts: ULID and date utilities
- storage/hooks-storage.ts: Hook CRUD operations
- storage/legacy.ts: Legacy event handling
- storage/runs-storage.ts: Run get/list operations
- storage/steps-storage.ts: Step get/list operations
- storage/events-storage.ts: Event create/list operations
- storage/index.ts: Main composition
Also extracted test helpers to test-helpers.ts for reusability.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The World.specVersion string property was never actually read - only the numeric SPEC_VERSION_CURRENT is used for backwards compatibility.
- Remove genversion dependency and generated version.ts from @workflow/world
- Remove specVersion property from World interface and all implementations
- Minor fix: correct error message to reference 'workflow' package
- Minor fix: correct error source priority in world-vercel events.ts
- Minor fix: update comment in runtime.ts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
These packages no longer need genversion since we removed the World.specVersion property in the previous commit. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
No longer needed after removing genversion. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests for world-local and world-postgres covering:
- Legacy runs (specVersion < 2 or null/undefined)
  - run_cancelled handling (updates run, no event stored)
  - wait_completed handling (stores event only)
  - Rejection of unsupported events
  - Hook cleanup on cancellation
- Future runs (specVersion > current)
  - Rejection with RunNotSupportedError
- Current version runs (normal processing)
- Legacy error parsing (errorJson field parsing)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When resumeHook() is called on a legacy run (specVersion < 2), the hook_received event was previously rejected. This adds support for storing hook_received events on legacy runs without entity mutation, matching the behavior of wait_completed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
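The version routing described in the commits above (legacy, current, future) can be summarized with a small sketch. The `routeEvent` function and the `Route` labels are hypothetical, and `RunNotSupportedError` is re-declared locally so the example stands alone; the real handlers in world-local and world-postgres do more than return a label.

```typescript
const SPEC_VERSION_CURRENT = 2;

// Local stand-in for the error class named in this PR.
class RunNotSupportedError extends Error {
  constructor(specVersion: number) {
    super(`Run has spec version ${specVersion}; this world supports up to ${SPEC_VERSION_CURRENT}`);
    this.name = "RunNotSupportedError";
  }
}

type Route = "legacy:update-run-only" | "legacy:store-event-only" | "current";

// Decide how an incoming event should be handled based on the run's version.
function routeEvent(specVersion: number | null | undefined, eventType: string): Route {
  // Future runs are rejected outright.
  if (specVersion != null && specVersion > SPEC_VERSION_CURRENT) {
    throw new RunNotSupportedError(specVersion);
  }
  // Missing or lower versions are treated as legacy.
  const legacy = specVersion == null || specVersion < SPEC_VERSION_CURRENT;
  if (!legacy) return "current";
  switch (eventType) {
    case "run_cancelled":
      return "legacy:update-run-only"; // run is updated, no event stored
    case "wait_completed":
    case "hook_received":
      return "legacy:store-event-only"; // event stored, no entity mutation
    default:
      throw new Error(`Event ${eventType} is not supported on legacy runs`);
  }
}
```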
Signed-off-by: Peter Wielander <mittgfu@gmail.com>
…tarted events Replace with run_completed, run_failed, and run_started equivalents. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,12 @@
---
"@workflow/world": minor
note: this should cause a transitive minor bump in everything that depends on the world package
The manually-created EventWithRefsSchema was missing the specVersion field, which caused specVersion to be stripped when using lazy (refs) mode for events. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Core runtime now sends specVersion in run_created eventData
- world-local accepts specVersion from eventData (defaults to current)
- world-postgres accepts specVersion from eventData (defaults to current)
This matches workflow-server behavior where v2 endpoints accept specVersion from the client, while v1 endpoints default to legacy.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add specVersion to BaseEventSchema (event level, not eventData)
- Remove specVersion from RunCreatedEventSchema.eventData
- Core runtime sends specVersion on event object
- world-local reads specVersion from event, propagates to run/step/hook entities
- world-postgres reads specVersion from event, propagates to run/step/hook entities
This ensures specVersion flows from client through event to all created entities.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- specVersion is optional in all entity schemas (runs, steps, hooks, events) for backwards compatibility with legacy data in storage
- Runtime always sends specVersion on event requests
- world-local and world-postgres provide fallback to SPEC_VERSION_CURRENT
- Test helpers include specVersion in all event creation calls
- EventWithRefsSchema in world-vercel defaults specVersion to 1 for legacy
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two tests were calling queue.queue() without setting up VERCEL_DEPLOYMENT_ID, causing them to fail with "No deploymentId provided" error before reaching the code they were testing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The server's CreateEventSchemaV2 requires specVersion on all events, but only run_created was sending it. This caused 400 Bad Request errors for all other event types (run_started, run_completed, run_failed, run_cancelled, step_created, step_started, step_completed, step_failed, step_retrying, hook_created, hook_received, wait_created, wait_completed). Now all event creation calls include specVersion: SPEC_VERSION_CURRENT. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Pranay:
corresponding workflow-server PR: https://github.com/vercel/workflow-server/pull/154
important: This is a big change to the way workflows work, since everything is now event sourced. I introduced new event types and changed the shape of the step object (lastKnownError -> error and startedAt -> firstStartedAt). New event logs that use this published version of `workflow` will be incompatible with event logs from previous workflow versions. This doesn't affect the runtime of workflows, since those are pegged to a deployment, but it does affect observability, since the event shape looks different and the world spec has changed. The web-shared package just needs to be compatible with viewing workflow runs of the old schema for this to work correctly (which I believe it does, but please double check @VaguelySerious if I missed anything).
The currently failing e2e tests on vercel world are related to the CLI, I believe (slack x-ref). However, once we merge the workflow-server PR, we can drop the env var changes on the vercel deployments for this PR so that it points to the main prod deployment again, and then I'll re-run e2e tests to make sure they work :)
I also added a new docs page with diagrams to explain the event sourcing and state machine lifecycles (preview link).
small: I also removed the unused run paused/resumed stuff, which we've never used, to simplify things.
Summary
Implement event-sourced architecture for runs, steps, and hooks:
- … (`run_created`, `run_started`, `run_completed`, `run_failed`, `run_cancelled`)
- Add `step_retrying` event for non-fatal step failures that will be retried
- Remove `fatal` field from `step_failed` event (`step_failed` now implies terminal failure)
- Rename `lastKnownError` to `error` for consistency with server
- … `events.create()`
- … `step_created` event for earlier detection
- Remove `run_paused`/`run_resumed` events and `paused` status

This makes the system faster, easier to reason about, and resilient to data inconsistencies.
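To illustrate the core idea of event sourcing for runs, here is a minimal sketch that derives a run's status purely by replaying its events. The status names other than `pending` and the `deriveRunStatus` helper are assumptions for the example; the real state machine is documented in the new docs page and validates more than this does.

```typescript
type RunEventType =
  | "run_created"
  | "run_started"
  | "run_completed"
  | "run_failed"
  | "run_cancelled";

// Assumed status names; only 'pending' is mentioned explicitly in this PR.
type RunStatus = "pending" | "running" | "completed" | "failed" | "cancelled";

// Each run event implies the status the run has after the event is applied.
const STATUS_AFTER: Record<RunEventType, RunStatus> = {
  run_created: "pending",
  run_started: "running",
  run_completed: "completed",
  run_failed: "failed",
  run_cancelled: "cancelled",
};

// Fold the event log into the current status. The entity state is a pure
// function of the events, so it can always be rebuilt from the log.
// (Transition validation is deliberately omitted in this sketch.)
function deriveRunStatus(events: RunEventType[]): RunStatus | undefined {
  let status: RunStatus | undefined;
  for (const eventType of events) {
    status = STATUS_AFTER[eventType];
  }
  return status;
}
```

Because the status is recomputed from the log rather than stored independently, a run entity that drifts out of sync can in principle be repaired by replaying its events.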
Test plan
🤖 Generated with Claude Code