fix: return 200 on Stripe webhook processing errors#3773
fix: return 200 on Stripe webhook processing errors#3773avasis-ai wants to merge 1 commit intodubinc:mainfrom
Conversation
|
@abhayjnayakk is attempting to deploy a commit to the Dub Team on Vercel. A member of the Team first needs to authorize it. |
|
|
📝 WalkthroughWalkthroughThe Stripe webhook handler was updated to return HTTP 200 status instead of HTTP 400 when downstream business logic errors occur during event processing, while preserving error logging. This prevents Stripe from treating these failures as retriable errors. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/web/app/(ee)/api/stripe/webhook/route.ts (1)
81-89:⚠️ Potential issue | 🟠 MajorReturning 200 without retry/DLQ or idempotency turns transient failures into permanently lost events.
The status-code change itself is correct for stopping Stripe retry-driven duplicates, but the linked issue explicitly scopes three pieces together: return 200, handle failures internally via retry/DLQ, and deduplicate by event ID. Only the first is implemented here. Concretely, with downstream handlers like
checkoutSessionCompleted(plan/limit update →sendBatchEmail→tokenCache.expireMany) andchargeSucceeded(invoice update →processPayoutInvoicepublishing Qstash jobs), a throw mid-flight now ends the pipeline silently — the workspace may be half-upgraded, emails never sent, or payout jobs never enqueued, with no retry path. Previously the 400 at least forced a retry; now alog()entry is the only recovery signal.Before this lands, please add at minimum:
- Idempotency on
event.id(persist with NX + TTL, short‑circuit at the top of the handler) so a retry (Stripe's or your own) is safe.- Retry/DLQ for the caught error — e.g., enqueue the raw event to QStash with a retry policy, or write it to a
stripe_webhook_failurestable that a cron drains. Without this, a 200 on failure is strictly worse than the old 400 for at-least-once delivery.Also consider including
event.idin the error log to make correlating failures with the replay/DLQ record practical:🛠️ Suggested tweak to the log line (minimal)
} catch (error) { await log({ - message: `Stripe webhook failed (${event.type}). Error: ${error.message}`, + message: `Stripe webhook failed (${event.type}, id: ${event.id}). Error: ${(error as Error).message}`, type: "errors", }); return new Response("Webhook received (processing failed internally)", { status: 200, }); }Related prior pattern in this repo: the stablecoin payout handler deliberately distinguishes permanent vs. retriable Stripe failures by
returnvs.throw, specifically to preserve a retry path for transient issues — the same distinction is missing here once 400 is removed. Based on learnings from PR 3449 (apps/web/lib/partners/create-stablecoin-payout.ts), retriable failures must keep a retry affordance; swallowing them with 200 removes it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/app/`(ee)/api/stripe/webhook/route.ts around lines 81 - 89, The handler currently catches all errors and returns 200, which loses transient failures; add idempotency and a retry/DLQ path: at the top of the webhook handler persist event.id with an NX + TTL (short-circuit if already present) to deduplicate incoming Stripe events, and in the catch block enqueue the raw event payload to QStash (or insert into a stripe_webhook_failures table) with a retry policy so downstream handlers like checkoutSessionCompleted, chargeSucceeded, tokenCache.expireMany and processPayoutInvoice can be retried; also include event.id in the processLogger.log/error call so you can correlate logs with DLQ entries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@apps/web/app/`(ee)/api/stripe/webhook/route.ts:
- Around line 81-89: The handler currently catches all errors and returns 200,
which loses transient failures; add idempotency and a retry/DLQ path: at the top
of the webhook handler persist event.id with an NX + TTL (short-circuit if
already present) to deduplicate incoming Stripe events, and in the catch block
enqueue the raw event payload to QStash (or insert into a
stripe_webhook_failures table) with a retry policy so downstream handlers like
checkoutSessionCompleted, chargeSucceeded, tokenCache.expireMany and
processPayoutInvoice can be retried; also include event.id in the
processLogger.log/error call so you can correlate logs with DLQ entries.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 961102d5-48e0-4fd4-ae18-6acc97dd6bb0
📒 Files selected for processing (1)
apps/web/app/(ee)/api/stripe/webhook/route.ts
Summary
Fixes #3752
The Stripe webhook handler at
apps/web/app/(ee)/api/stripe/webhook/route.tswas returning HTTP 400 when downstream business logic (e.g., email send failures, transient DB issues) threw an error. Stripe interprets 4xx responses as a signal to retry the webhook delivery, which can cause unguarded retries of events likecheckout.session.completed— potentially leading to double plan upgrades or duplicate processing.Changes
400to200in the catch block so Stripe does not retry on internal processing failureslog()call already handles error tracking, so no logging changes were neededTest Plan
log()remains unchanged — failures are still tracked internallySummary by CodeRabbit