
fix: return 200 on Stripe webhook processing errors #3773

Open
avasis-ai wants to merge 1 commit into dubinc:main from avasis-ai:fix/stripe-webhook-return-200-on-business-logic-errors

Conversation

avasis-ai commented Apr 18, 2026

Summary

Fixes #3752

The Stripe webhook handler at apps/web/app/(ee)/api/stripe/webhook/route.ts returned HTTP 400 when downstream business logic (e.g., email send failures, transient DB issues) threw an error. Stripe treats any non-2xx response, including a 4xx, as a failed delivery and retries it. Events like checkout.session.completed could therefore be reprocessed with no deduplication guard, potentially causing double plan upgrades or other duplicate side effects.

Changes

  • Changed the error response status from 400 to 200 in the catch block so Stripe does not retry on internal processing failures
  • Updated the response message to indicate the error was received but processing failed internally
  • The existing log() call already handles error tracking, so no logging changes were needed
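The catch-block change can be sketched as a framework-free handler. This is a minimal illustration, not the route's actual code: `StripeEvent`, `WebhookResult`, `handleStripeWebhook` and the injected `verify` / `handleEvent` / `log` dependencies are hypothetical stand-ins for Stripe's signature check, the per-event business logic and the repo's `log()` helper. It shows the split the PR relies on: a bad signature still gets 400, while a processing failure is logged and acknowledged with 200.

```typescript
// Hypothetical shapes; the real route uses the Stripe SDK and the repo's log() helper.
type StripeEvent = { id: string; type: string };
type WebhookResult = { status: number; body: string };

type WebhookDeps = {
  // Stand-in for Stripe signature verification: throws on a bad signature.
  verify: (rawBody: string, signature: string) => StripeEvent;
  // Stand-in for the per-event-type business logic (may throw).
  handleEvent: (event: StripeEvent) => Promise<void>;
  // Stand-in for the repo's log() helper.
  log: (message: string) => Promise<void>;
};

async function handleStripeWebhook(
  rawBody: string,
  signature: string,
  deps: WebhookDeps,
): Promise<WebhookResult> {
  let event: StripeEvent;
  try {
    event = deps.verify(rawBody, signature);
  } catch (err) {
    // A bad signature is a genuine bad request, so it keeps the 400.
    return { status: 400, body: `Webhook signature error: ${(err as Error).message}` };
  }

  try {
    await deps.handleEvent(event);
  } catch (err) {
    // Internal failure: record it (with event.id for correlation), then
    // acknowledge with 200 so Stripe treats the delivery as successful.
    await deps.log(
      `Stripe webhook failed (${event.type}, id: ${event.id}). Error: ${(err as Error).message}`,
    );
    return { status: 200, body: "Webhook received (processing failed internally)" };
  }

  return { status: 200, body: "OK" };
}
```

Note that the 200 only stops Stripe's redelivery; on its own it gives the failed event no other retry path.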

Test Plan

  • Verified that the only change is in the catch block response (2 lines)
  • Error logging via log() remains unchanged — failures are still tracked internally
  • Webhook signature verification errors (lines 38-40) still correctly return 400, since those are genuine bad requests

Summary by CodeRabbit

  • Bug Fixes
    • Stripe webhook now responds with HTTP 200 when event processing fails internally (the error is still logged), so Stripe no longer retries those deliveries.

vercel Bot commented Apr 18, 2026

@abhayjnayakk is attempting to deploy a commit to the Dub Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

coderabbitai Bot commented Apr 18, 2026

📝 Walkthrough

The Stripe webhook handler was updated to return HTTP 200 status instead of HTTP 400 when downstream business logic errors occur during event processing, while preserving error logging. This prevents Stripe from treating these failures as retriable errors.

Changes

  • Stripe webhook error handling (apps/web/app/(ee)/api/stripe/webhook/route.ts): Modified the error response in event-type processing to return HTTP 200 with a generic message instead of HTTP 400 with specific error details, preventing Stripe's retry machinery from being triggered on downstream failures.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

🐰 A webhook that cries "error!" with code four-zero-zero,
Makes Stripe think "retry me!"—oh what a zero.
But now we respond with a calm "200,"
Business logic fails quietly, Stripe won't be due.
No double-charges, no duplicate plays—
Just graceful recovery in webhook-y ways! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Linked Issues check (⚠️ Warning): The PR partially addresses issue #3752 by returning 200 on business-logic errors but does not implement event-ID deduplication to prevent double-processing on legitimate Stripe retries. Resolution: Implement event-ID deduplication logic (via Redis with NX and TTL semantics) to complete the fix and prevent double-processing when Stripe legitimately retries.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title 'fix: return 200 on Stripe webhook processing errors' accurately and specifically summarizes the main change: returning HTTP 200 instead of 400 on webhook processing failures.
  • Out of Scope Changes check (✅ Passed): All changes are scoped to the Stripe webhook error-handling behavior as specified in issue #3752; no out-of-scope modifications were introduced.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/web/app/(ee)/api/stripe/webhook/route.ts (1)

81-89: ⚠️ Potential issue | 🟠 Major

Returning 200 without retry/DLQ or idempotency turns transient failures into permanently lost events.

The status-code change itself is correct for stopping Stripe retry-driven duplicates, but the linked issue explicitly scopes three pieces together: return 200, handle failures internally via retry/DLQ, and deduplicate by event ID. Only the first is implemented here. Concretely, with downstream handlers like checkoutSessionCompleted (plan/limit update → sendBatchEmail → tokenCache.expireMany) and chargeSucceeded (invoice update → processPayoutInvoice publishing QStash jobs), a throw mid-flight now ends the pipeline silently: the workspace may be half-upgraded, emails never sent, or payout jobs never enqueued, with no retry path. Previously the 400 at least forced a retry; now a log() entry is the only recovery signal.

Before this lands, please add at minimum:

  1. Idempotency on event.id (persist with NX + TTL, short‑circuit at the top of the handler) so a retry (Stripe's or your own) is safe.
  2. Retry/DLQ for the caught error — e.g., enqueue the raw event to QStash with a retry policy, or write it to a stripe_webhook_failures table that a cron drains. Without this, a 200 on failure is strictly worse than the old 400 for at-least-once delivery.
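The two items above can be sketched together. This is a hedged, self-contained illustration: `KeyStore` is an in-memory stand-in for Redis `SET key value NX PX ttl` (a real deployment needs a store shared by every instance), and `handleOnce`, `processEvent`, `deadLetter` and the `stripe:webhook:` key prefix are hypothetical names, not code from this repo. The claim on `event.id` is released when processing fails, so a replay from the DLQ is not mistaken for a duplicate.

```typescript
type StripeEvent = { id: string; type: string; payload: unknown };

// In-memory stand-in for Redis `SET key value NX PX ttl` (hypothetical).
class KeyStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  // Sets the key only if it is absent or expired; returns true when it was set.
  setNX(key: string, value: string, ttlMs: number, now: number = Date.now()): boolean {
    const current = this.entries.get(key);
    if (current !== undefined && current.expiresAt > now) return false;
    this.entries.set(key, { value, expiresAt: now + ttlMs });
    return true;
  }

  // Unconditional set, like a plain `SET key value PX ttl`.
  set(key: string, value: string, ttlMs: number, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }

  delete(key: string): void {
    this.entries.delete(key);
  }
}

// Assumption: roughly matches Stripe's up-to-3-day live-mode retry window.
const DEDUPE_TTL_MS = 3 * 24 * 60 * 60 * 1000;

type Result = "processed" | "duplicate" | "dead-lettered";

async function handleOnce(
  event: StripeEvent,
  store: KeyStore,
  processEvent: (event: StripeEvent) => Promise<void>,
  deadLetter: (event: StripeEvent, error: Error) => Promise<void>,
): Promise<Result> {
  const key = `stripe:webhook:${event.id}`;
  // 1. Idempotency: claim event.id; any later delivery of the same id stops here.
  if (!store.setNX(key, "processing", DEDUPE_TTL_MS)) return "duplicate";

  try {
    await processEvent(event);
    store.set(key, "done", DEDUPE_TTL_MS);
    return "processed";
  } catch (err) {
    // 2. Retry/DLQ: release the claim so a replay is not treated as a duplicate,
    // then hand the raw event to a retry path instead of dropping it.
    store.delete(key);
    await deadLetter(event, err instanceof Error ? err : new Error(String(err)));
    return "dead-lettered";
  }
}
```

One design caveat: if the function dies mid-processing (e.g. a timeout), the "processing" claim only clears when its TTL expires, so a shorter TTL for the in-flight state may be worth considering.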

Also consider including event.id in the error log to make correlating failures with the replay/DLQ record practical:

🛠️ Suggested tweak to the log line (minimal)
   } catch (error) {
     await log({
-      message: `Stripe webhook failed (${event.type}). Error: ${error.message}`,
+      message: `Stripe webhook failed (${event.type}, id: ${event.id}). Error: ${(error as Error).message}`,
       type: "errors",
     });
     return new Response("Webhook received (processing failed internally)", {
       status: 200,
     });
   }

Related prior pattern in this repo: the stablecoin payout handler deliberately distinguishes permanent vs. retriable Stripe failures by return vs. throw, specifically to preserve a retry path for transient issues — the same distinction is missing here once 400 is removed. Based on learnings from PR 3449 (apps/web/lib/partners/create-stablecoin-payout.ts), retriable failures must keep a retry affordance; swallowing them with 200 removes it.
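The permanent-vs-retriable distinction can be made explicit in the webhook path as well. A minimal sketch, assuming a hypothetical `PermanentError` that business logic throws for failures a retry cannot fix; everything else is treated as retriable. `runWithClassification`, `isPermanent` and `statusFor` are illustrative names. Returning 500 for retriable failures (so Stripe redelivers) is only one option and is safe only once event-ID idempotency is in place; the alternative is to return 200 and enqueue the event to a DLQ.

```typescript
// Hypothetical marker for failures that a retry cannot fix (e.g. a deleted customer).
class PermanentError extends Error {
  readonly permanent = true;
}

// Checks the marker property rather than instanceof, so it also works across
// bundles or compile targets where Error subclassing is unreliable.
function isPermanent(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { permanent?: unknown }).permanent === true;
}

type Outcome =
  | { kind: "ok" }
  | { kind: "dropped"; reason: string } // permanent: log and acknowledge
  | { kind: "retry"; reason: string }; // transient: keep a retry path

async function runWithClassification(work: () => Promise<void>): Promise<Outcome> {
  try {
    await work();
    return { kind: "ok" };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return isPermanent(err) ? { kind: "dropped", reason } : { kind: "retry", reason };
  }
}

// One option: a non-2xx for retriable failures lets Stripe redeliver.
// Only safe when duplicate deliveries are deduplicated by event.id.
function statusFor(outcome: Outcome): number {
  return outcome.kind === "retry" ? 500 : 200;
}
```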

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/app/(ee)/api/stripe/webhook/route.ts` around lines 81-89: the handler currently catches all errors and returns 200, which loses transient failures. Add idempotency and a retry/DLQ path. At the top of the webhook handler, persist event.id with NX + TTL (short-circuit if it is already present) to deduplicate incoming Stripe events. In the catch block, enqueue the raw event payload to QStash (or insert it into a stripe_webhook_failures table) with a retry policy so that downstream handlers like checkoutSessionCompleted, chargeSucceeded, tokenCache.expireMany and processPayoutInvoice can be retried. Also include event.id in the log() call so logs can be correlated with DLQ entries.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 961102d5-48e0-4fd4-ae18-6acc97dd6bb0

📥 Commits

Reviewing files that changed from the base of the PR and between 65e78b3 and a3d7319.

📒 Files selected for processing (1)
  • apps/web/app/(ee)/api/stripe/webhook/route.ts


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stripe webhook handler returns 4xx on business logic errors, causing unguarded retries

3 participants