[web] Probe deployment specVersion before replaying run#1782
[web] Probe deployment specVersion before replaying run#1782VaguelySerious wants to merge 1 commit into
Conversation
🦋 Changeset detectedLatest commit: d568c54 The changes in this PR will be included in the next version bump. This PR includes changesets to release 17 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
🧪 E2E Test Results❌ Some tests failed Summary
❌ Failed Tests▲ Vercel Production (1 failed)express (1 failed):
🌍 Community Worlds (98 failed)mongodb (15 failed):
redis (15 failed):
turso (68 failed):
Details by Category❌ ▲ Vercel Production
✅ 💻 Local Development
✅ 📦 Local Production
✅ 🐘 Local Postgres
✅ 🪟 Windows
❌ 🌍 Community Worlds
✅ 📋 Other
❌ Some E2E test jobs failed:
Check the workflow run for details. |
📊 Benchmark Results
workflow with no steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express workflow with 1 step💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 10 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 25 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express workflow with 50 sequential steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) Promise.all with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) Promise.all with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express Promise.all with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express Promise.race with 10 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) Promise.race with 25 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express Promise.race with 50 concurrent steps💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 10 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 25 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express workflow with 50 sequential data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express workflow with 10 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) workflow with 25 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express workflow with 50 concurrent data payload steps (10KB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) Stream Benchmarks (includes TTFB metrics)workflow with stream💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express stream pipeline with 5 transform steps (1MB)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express 10 parallel streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Next.js (Turbopack) | Express fan-out fan-in 10 streams (1MB each)💻 Local Development
▲ Production (Vercel)
🔍 Observability: Express | Next.js (Turbopack) SummaryFastest Framework by WorldWinner determined by most benchmark wins
Fastest World by FrameworkWinner determined by most benchmark wins
Column Definitions
Worlds:
|
Replay/Re-run now probes the target deployment's specVersion via health check before calling recreateRunFromExisting. Without this, the queue transport is chosen based on the original run's specVersion, which mismatches the deployment when the deployment has been upgraded past the run's specVersion. Falls back to the run's specVersion if the health check fails (e.g. against old deployments without health check support). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Peter Wielander <mittgfu@gmail.com>
c076664 to
d568c54
Compare
TooTallNate
left a comment
There was a problem hiding this comment.
Review
Small, well-targeted fix. The logic is correct and the fallback preserves existing behavior when the probe fails, so this is strictly safer than the pre-PR behavior.
What looks good
- Correctness:
recreateRunFromExistingalready supports thespecVersionoption (runs.ts:64-65), andhealthCheckalready returnsspecVersionon theHealthCheckResult. The PR just wires them together. - Deployment ID resolution matches the fallback in
recreateRunFromExisting— uses the override if provided, else the run's currentdeploymentId. Both callers agree on the target. - Fallback on probe failure leaves
specVersionasundefined, which falls back to the existingrun.specVersion ?? SPEC_VERSION_LEGACYlogic insiderecreateRunFromExisting. No regression. - Changeset is scoped to
@workflow/webonly, since the only behavior change is in the web dashboard's replay action.
Non-blocking concerns
See inline comments. The main one worth discussing is the UX cost of the 10s probe timeout for replays against old deployments that don't support health check — that's arguably the most common case this PR is trying to help, and each one now pays up to 10s of wait before falling back.
Test plan
The PR description has an unchecked test plan. Ideally these e2e tests would be added before merge. At minimum the third case (probe timeout) is worth exercising since it's the new latency floor for the old-deployment case.
| if (hc.healthy && hc.specVersion != null) { | ||
| specVersion = hc.specVersion; | ||
| } | ||
| } catch { |
There was a problem hiding this comment.
Latency concern: this adds up to 10s of UI wait for the exact case the PR is trying to help.
The healthCheck() implementation polls for a response until timeout (helpers.ts:254-294). An old deployment that doesn't recognize the __wkf_workflow_health_check queue topic will silently drop the message — there's no fast-fail signal, so the probe will wait the full 10s before returning healthy: false.
This means: a user clicking "Replay" on a run from a pre-health-check deployment pays 10 seconds of UI latency before the replay even starts. That's the exact scenario the PR needs to handle gracefully (old deployment \u2192 JSON transport), but it's now the slowest case.
Options to consider:
-
Shorter timeout (e.g. 3\u20134s) \u2014 if the deployment is alive and supports health check, it typically responds in < 500ms, so 10s is overkill. 3s should be safe.
-
Version-gate the probe \u2014 only probe if
run.specVersionsuggests the deployment might have been upgraded. Ifrun.specVersion >= SPEC_VERSION_SUPPORTS_HEALTH_CHECK, probe; otherwise skip. (Would need to add that constant.) -
Non-blocking probe with a short budget \u2014 race the probe against a short timeout (say 2s); if it doesn't resolve in time, fall back. The successful path stays fast, the failure path isn't punished.
Option 1 is the simplest and probably sufficient.
| // if the probe fails (e.g. old deployment without health check support). | ||
| let specVersion: number | undefined; | ||
| try { | ||
| let targetDeploymentId = deploymentId; |
There was a problem hiding this comment.
Nit: when deploymentId is not provided, this calls world.runs.get(runId) to get the run's deploymentId, and then recreateRunFromExisting internally calls world.runs.get(runId, { resolveData: 'all' }) again a moment later. Not a correctness issue, just a duplicated round-trip. Could be avoided by fetching the run once here and passing both run and the resolved ID through, but that would require a larger signature change on recreateRunFromExisting. Probably not worth it for a non-hot path.
Summary
Follows #1629 and #1627. The web dashboard's Replay / Re-run flow now probes the target deployment's specVersion via health check before calling
recreateRunFromExisting, so the correct queue transport (JSON for old deployments, CBOR for new) is used.Without this, the transport was chosen based on the original run's specVersion, which mismatches the target deployment when the deployment has been upgraded past that spec.
healthCheck(world, 'workflow', { deploymentId, timeout: 10_000 })specVersion, passes it torecreateRunFromExistingspecVersioninsiderecreateRunFromExistingTest plan
🤖 Generated with Claude Code