[world-vercel] Move event ref resolution from server-side to client-side#1189
Conversation
When fetching events with resolveData='all', world-vercel now always sends remoteRefBehavior=lazy to workflow-server and hydrates the returned ref descriptors client-side. This moves memory pressure from the server (which was OOMing when resolving all refs for large event lists) to the client. For inline refs (dbrf:), data is decoded locally from the descriptor's embedded payload with zero network overhead. For S3/Redis refs, parallel requests are made to the new GET /v2/refs endpoint with bounded concurrency (max 10 concurrent requests).
🦋 Changeset detectedLatest commit: 28d1696 The changes in this PR will be included in the next version bump. This PR includes changesets to release 15 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
workflow with 1 step💻 Local Development
▲ Production (Vercel)
workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
❌ Some benchmark jobs failed:
Check the workflow run for details. |
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests▲ Vercel Production (1 failed)astro (1 failed):
🌍 Community Worlds (45 failed)turso (45 failed):
Details by Category❌ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
❌ Some E2E test jobs failed:
Check the workflow run for details. |
There was a problem hiding this comment.
Pull request overview
Moves event ref resolution for world-vercel from workflow-server to the @workflow/world-vercel client by always requesting remoteRefBehavior=lazy for event listing, then hydrating ref descriptors locally (inline decode) or via GET /v2/refs (bounded concurrency) to reduce server memory pressure and prevent OOMs.
Changes:
- Added a new
refs.tsmodule to model/identifyRefDescriptors and resolve them locally (inline) or remotely (/v2/refs) with bounded concurrency. - Updated
getWorkflowRunEvents()to always request lazy refs and (whenresolveData='all') hydrate returned events client-side. - Extended event list parsing to accept both legacy
eventDataRefand v2eventData(which may contain nested ref descriptors).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
packages/world-vercel/src/refs.ts |
Adds ref-descriptor types/guards and single/batch resolution logic (inline decode + server fetch). |
packages/world-vercel/src/events.ts |
Forces lazy ref behavior for event listing and hydrates refs client-side for resolveData='all'. |
Comments suppressed due to low confidence (1)
packages/world-vercel/src/events.ts:229
- In the
resolveData === 'none'path, the response items can still includeeventDataRefbecausefilterEventData()only stripseventData. SinceEventdoes not includeeventDataRef, this leaks internal ref descriptors to callers and breaks the stated intent (“strip eventData and eventDataRef”). Update the filtering to omiteventDataRefas well (or strip it here before returning).
// resolveData === 'none': strip eventData and eventDataRef
return {
...response,
data: response.data.map((event: any) =>
filterEventData(event, resolveData)
),
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pranaygp
left a comment
There was a problem hiding this comment.
Good approach — moving ref resolution client-side to avoid server OOMs makes sense. The code is well-structured with clean separation between ref resolution and event hydration. A few issues to address, one bug and some hardening suggestions.
pranaygp
left a comment
There was a problem hiding this comment.
Deep-dive review covering performance, error handling, race conditions, and security. See inline comments for details. The 3 HIGH items are all in the error handling category — the N+1 request pattern introduces new failure modes (cascading failures, no retry for 429/5XX, request storms against downed endpoints) that didn't exist with the old single-request approach.
pranaygp
left a comment
There was a problem hiding this comment.
OTEL/tracing review — the new multi-request fan-out is invisible in traces. The highest priority fix is adding a parent span (world.refs.hydrate) around hydrateEventRefs to group the /v2/refs HTTP calls and distinguish them from the primary list request.
Update resolveRefDescriptor to include the runId in the endpoint path, matching the server-side rename from GET /v2/refs to GET /v2/runs/:runId/refs. Thread runId from events through the dedup map and into resolveRefDescriptors via a new RefWithRunId interface that pairs each descriptor with its owning event's runId.
pranaygp
left a comment
There was a problem hiding this comment.
Follow-up review on the latest 4 commits. 11 original issues resolved, 5 deferred to follow-ups. 4 new observations, all MEDIUM or LOW — the biggest one is that the switch from makeRequest to direct fetch for remote refs loses OTEL HTTP spans and structured error context. Overall looking good.
- Wrap direct fetch in resolveRefDescriptor with trace('http GET')
span matching makeRequest conventions (http.request.method, url.full,
http.response.status_code, peer.service, error recording)
- Set Accept and X-Request-Time headers that makeRequest normally adds,
preventing RSC request memoization and enabling content negotiation
- Use WorkflowAPIError instead of plain Error for ref fetch failures,
preserving structured context (url, status) for future retry logic
- Log warning when EventSchema.safeParse fails during post-hydration
coercion, making schema mismatches visible in production
Summary
resolveData='all',world-vercelnow always sendsremoteRefBehavior=lazyto workflow-server and hydrates the returned ref descriptors client-sidedbrf:), data is decoded locally from the descriptor's embedded payload — zero network overheads3rf:/kvrf:), parallel requests are made to the newGET /v2/refsendpoint with bounded concurrency (max 10 concurrent)Files Changed
packages/world-vercel/src/refs.tsRefDescriptortype,isRefDescriptor()guard,resolveRefDescriptor()(local decode for inline refs, HTTP for remote),resolveRefDescriptors()(bounded-concurrency batch resolver)packages/world-vercel/src/events.tsgetWorkflowRunEvents()to always sendlazy, then hydrate refs client-side whenresolveData='all'; addedEventWithRefsSchemaeventDatafield,eventDataRefFieldMap,collectPendingRefs(),hydrateEventRefs()Companion PR
The server-side endpoint (
GET /api/v2/refs) that this PR depends on is in the companion PR onvercel/workflow-server.Testing