Improve backend resilience for file and trigger workflows by riderx · Pull Request #1564 · Cap-go/capgo

riderx · 2026-02-03T06:12:05Z

Summary (AI generated)

Harden R2 file reads to reduce transient failures
Add retries and safer error handling for channel/manifest/cron triggers

Test plan (AI generated)

bun lint:backend
bun test:backend

Screenshots (AI generated)

N/A

Checklist (AI generated)

My code follows the code style of this project and passes .
My change requires a change to the documentation.
I have updated the documentation
accordingly.
My change has adequate E2E test coverage.
I have tested my code manually, and I have provided steps how to reproduce
my tests

Generated with AI

Summary by CodeRabbit

Bug Fixes
- Enhanced error handling and recovery mechanisms for file operations, channel updates, and manifest processing.
- Added automatic retry logic for channel and manifest updates to improve reliability.
- Improved error logging and handling of edge cases with missing or invalid data.

coderabbitai · 2026-02-03T06:12:13Z

📝 Walkthrough

Walkthrough

These changes enhance error handling and resilience across Supabase backend functions by introducing retry mechanisms with exponential backoff for cloud operations, wrapping critical calls in try/catch blocks with improved logging via cloudlogErr, and converting some error scenarios to graceful responses instead of exceptions.

Changes

Cohort / File(s)	Summary
Retry Mechanisms for Cloud Operations `supabase/functions/_backend/triggers/on_channel_update.ts`, `supabase/functions/_backend/triggers/on_manifest_create.ts`	Added retry logic with configurable attempts and progressive delays for channel updates (iOS, Android, public) and S3 manifest size retrieval. Implements exponential backoff and logs final failures via cloudlogErr.
Enhanced Error Handling & Logging `supabase/functions/_backend/files/files.ts`, `supabase/functions/_backend/triggers/cron_stat_org.ts`, `supabase/functions/_backend/utils/plans.ts`	Wrapped critical operations (file head/get, plan status checks, stripe updates) in try/catch blocks. Replaces silent failures or exceptions with conditional logging via cloudlogErr and graceful error responses (503 for file operations).
HTTP Request Processing `supabase/functions/_backend/utils/on_error.ts`	Added conditional request body reading based on HTTP method; skips body capture for GET/HEAD/OPTIONS requests to reduce unnecessary processing.

Possibly related PRs

fix: ensure manifest file_size is updated #1466: Modifies the same manifest file_size handling in on_manifest_create.ts, addressing retry behavior for S3 size=0 scenarios.
fix: cache range header in /files get && make sure that a range equal… #1175: Related to file download logic validation using HTTP Range requests and object metadata (head) verification.
feat: rename cron jobs for plan and stats #1221: Modifies cron_stat_org.ts error handling and routing patterns in the same trigger function.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through error logs with glee,
Retry, retry, until success we see!
With exponential delays and catches so neat,
Resilience blooms where failures once beat. 🌿✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The description covers the main changes (retries, error handling) and test plan (linters), but the checklist items are incomplete: documentation updates are unchecked and no manual test reproduction steps are provided despite the checklist stating they should be.	Provide manual test reproduction steps or clarify why they are not applicable; confirm whether documentation updates to Cap-go/website are needed and update accordingly.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main focus of the changeset: improving resilience across multiple backend workflows (file reads, channel updates, cron triggers) through retry mechanisms and enhanced error handling.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch riderx/log-fix-plan

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18b78ef969

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-02-03T06:14:08Z

+        await supabase
+          .from('stripe_info')
+          .update({ plan_calculated_at: new Date().toISOString() })
+          .eq('customer_id', body.customerId)


Gate plan_calculated_at update on successful plan calc

The new inner try/catch swallows failures from checkPlanStatusOnly, but the timestamp update still runs unconditionally. Because queue_cron_plan_for_org checks plan_calculated_at to skip re-queueing for an hour (see supabase/migrations/20251007134349_cron_plan_from_stats_backend.sql), a transient failure now marks the org as “calculated” and suppresses retries, leaving plan status/usage stale until the next window. Consider updating plan_calculated_at only when the plan calculation actually succeeds.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

This PR strengthens backend resilience around plan checks, file downloads, and trigger workflows, with more robust error handling and retry logic.

Changes:

Wrap plan status calculation and cron-trigger logic with targeted try/catch blocks and structured logging to avoid aborting flows on transient failures.
Add retry/backoff helpers for manifest size retrieval and channel updates, and make some trigger error paths return success rather than failing the trigger.
Adjust the global onError handler to skip reading the body for safe HTTP methods and improve file download error handling by mapping R2/storage failures to a structured 503 response.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`supabase/functions/_backend/utils/plans.ts`	Wraps `calculatePlanStatus` in a try/catch within `checkPlanStatusOnly`, logging failures and returning early to avoid failing the entire cron when plan calculation errors occur.
`supabase/functions/_backend/utils/on_error.ts`	Changes the error handler to only attempt JSON body parsing for non-GET/HEAD/OPTIONS methods, reducing spurious “failToReadBody” errors and preserving existing structured logging and alerting.
`supabase/functions/_backend/triggers/on_manifest_create.ts`	Introduces `getManifestSizeWithRetry` for repeated `s3.getSize` attempts and alters error handling so some invalid/zero-size cases now log and return `BRES` instead of throwing, changing the observable error contract.
`supabase/functions/_backend/triggers/on_channel_update.ts`	Adds `updateChannelsWithRetry` with retry/backoff for Supabase updates and changes missing `id`/`app_id` cases from throwing `simpleError` to returning `BRES`, altering previously tested error responses.
`supabase/functions/_backend/triggers/cron_stat_org.ts`	Wraps `checkPlanStatusOnly` and the `plan_calculated_at` update in inner try/catch blocks to log and continue on failure while still ensuring the DB client is closed in the `finally` block.
`supabase/functions/_backend/files/files.ts`	Hardens the R2-backed GET handler by wrapping `head`/`get` calls in try/catch, logging failures, and mapping `get` exceptions to a 503 `upstream_unavailable` error while preserving 404 behavior for missing objects and existing caching logic.

Copilot · 2026-02-03T06:17:15Z

  if (!record.id) {
    cloudlog({ requestId: c.get('requestId'), message: 'No id' })
-    throw simpleError('no_id', 'No id', { record })
+    return c.json(BRES)
  }
  if (!record.app_id) {
-    throw simpleError('no_app_id', 'No app id included the request', { record })
+    cloudlog({ requestId: c.get('requestId'), message: 'No app id included the request' })
+    return c.json(BRES)


This change turns invalid payload cases (missing id or app_id) from an error (simpleError('no_id'/'no_app_id')) into a silent success response (BRES), which breaks the existing contract and the tests in tests/trigger-error-cases.test.ts that assert a 400 response with error: 'no_app_id' for this scenario. If the goal is to make the trigger more resilient, consider keeping a 4xx error for structurally invalid records (or updating the tests and any callers explicitly), rather than returning a generic success that hides configuration issues and makes debugging harder.

Copilot · 2026-02-03T06:17:15Z

  if (!record.app_version_id || !record.s3_path) {
    cloudlog({ requestId: c.get('requestId'), message: 'no app_version_id or s3_path' })
-    throw simpleError('no_app_version_id_or_s3_path', 'No app_version_id or s3_path', { record })
+    return c.json(BRES)


Previously, when app_version_id or s3_path was missing, this trigger threw simpleError('no_app_version_id_or_s3_path', ...), and the error-case test in tests/trigger-error-cases.test.ts explicitly asserts a 400 response with error: 'no_app_version_id_or_s3_path'. Changing this to return c.json(BRES) silently treats an invalid manifest row as success, which both breaks those tests and makes it harder to detect schema/configuration issues; consider preserving the explicit error (or updating tests and any consumers if this contract change is intentional).

Suggested change

return c.json(BRES)

throw simpleError('no_app_version_id_or_s3_path', 'no app_version_id or s3_path', { record })

Copilot · 2026-02-03T06:17:16Z

+  let object: R2ObjectBody | null = null
+  try {
+    object = await new RetryBucket(bucket, DEFAULT_RETRY_PARAMS).get(fileId, {
+      range: c.req.raw.headers,
+    })
+  }
+  catch (error) {
+    cloudlogErr({ requestId: c.get('requestId'), message: 'getHandler files get failed', fileId, error })
+    throw quickError(503, 'upstream_unavailable', 'File storage temporarily unavailable', { fileId })


The new catch block around RetryBucket(bucket, DEFAULT_RETRY_PARAMS).get(...) introduces a specific 503 upstream_unavailable error path for R2/storage failures, but there is no test exercising this behavior even though we already have /files error-case tests (see tests/error-cases.test.ts). To prevent regressions and to lock in the intended contract for transient storage outages (status code and error payload), consider adding a test that forces R2Bucket.get to throw and asserts the 503 response with the expected error code.

sonarqubecloud · 2026-02-03T06:34:15Z

Quality Gate failed

Failed conditions
21.4% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@supabase/functions/_backend/triggers/cron_stat_org.ts`:
- Around line 34-46: The stripe_info plan_calculated_at update currently runs
unconditionally when body.customerId exists even if checkPlanStatusOnly()
failed; add a boolean flag (e.g., planStatusSucceeded) that is set true only
when checkPlanStatusOnly() completes without throwing and use that flag to gate
the update so it only runs on success; also replace the Supabase call at the
update block (currently using supabaseAdmin(c).from('stripe_info').update(...))
with the existing drizzleClient (initialized earlier) and perform the equivalent
Drizzle/PG update against the stripe_info table to set plan_calculated_at,
preserving the same logging (cloudlog) on success and error.

In `@supabase/functions/_backend/triggers/on_channel_update.ts`:
- Around line 1-29: The retry loop in updateChannelsWithRetry currently only
handles resolved error results from operation() and will stop retrying if
operation() throws; modify updateChannelsWithRetry so the call to operation() is
wrapped in try/catch: invoke operation() inside the loop, catch any thrown error
and treat it like a failure (assign to lastError), continue to the next retry
after the existing delay logic, and only return early when operation() resolves
with no error; keep the existing cloudlogErr call (including c.get('requestId')
and context) to log lastError when all attempts fail.

coderabbitai · 2026-02-03T06:38:47Z

+      try {
+        const supabase = supabaseAdmin(c)
+        await supabase
+          .from('stripe_info')
+          .update({ plan_calculated_at: new Date().toISOString() })
+          .eq('customer_id', body.customerId)
+          .throwOnError()

-      cloudlog({ requestId: c.get('requestId'), message: 'plan calculated timestamp updated', customerId: body.customerId })
+        cloudlog({ requestId: c.get('requestId'), message: 'plan calculated timestamp updated', customerId: body.customerId })
+      }
+      catch (error) {
+        cloudlog({ requestId: c.get('requestId'), message: 'plan calculated timestamp update failed', customerId: body.customerId, error })
+      }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n supabase/functions/_backend/triggers/cron_stat_org.ts | head -60

Repository: Cap-go/capgo

Length of output: 2271

Gate the timestamp update on plan check success and use Drizzle/PG client for database operations.

The timestamp update runs regardless of whether checkPlanStatusOnly() succeeds or fails—the error is caught at line 28–30 but no flag is set, so the stripe_info update at line 34–46 will execute when body.customerId is present. Add a planStatusSucceeded flag to gate the timestamp update.

Additionally, use drizzleClient (already initialized at line 22–23) instead of supabaseAdmin() at line 35 to align with the migration to Drizzle/PG clients for all database operations in backend code.

🤖 Prompt for AI Agents

In `@supabase/functions/_backend/triggers/cron_stat_org.ts` around lines 34 - 46, The stripe_info plan_calculated_at update currently runs unconditionally when body.customerId exists even if checkPlanStatusOnly() failed; add a boolean flag (e.g., planStatusSucceeded) that is set true only when checkPlanStatusOnly() completes without throwing and use that flag to gate the update so it only runs on success; also replace the Supabase call at the update block (currently using supabaseAdmin(c).from('stripe_info').update(...)) with the existing drizzleClient (initialized earlier) and perform the equivalent Drizzle/PG update against the stripe_info table to set plan_calculated_at, preserving the same logging (cloudlog) on success and error.

coderabbitai · 2026-02-03T06:38:47Z

+import type { Context } from 'hono'
 import type { MiddlewareKeyVariables } from '../utils/hono.ts'
 import type { Database } from '../utils/supabase.types.ts'
 import { Hono } from 'hono/tiny'
 import { BRES, middlewareAPISecret, simpleError, triggerValidator } from '../utils/hono.ts'
-import { cloudlog } from '../utils/logging.ts'
+import { cloudlog, cloudlogErr } from '../utils/logging.ts'
 import { supabaseAdmin } from '../utils/supabase.ts'

 export const app = new Hono<MiddlewareKeyVariables>()

+const UPDATE_RETRY_ATTEMPTS = 3
+const UPDATE_RETRY_DELAY_MS = 300
+
+async function updateChannelsWithRetry(
+  c: Context<MiddlewareKeyVariables>,
+  operation: () => Promise<{ error: any }>,
+  context: Record<string, unknown>,
+) {
+  let lastError: any
+  for (let attempt = 0; attempt < UPDATE_RETRY_ATTEMPTS; attempt++) {
+    const { error } = await operation()
+    if (!error)
+      return
+    lastError = error
+    if (attempt < UPDATE_RETRY_ATTEMPTS - 1)
+      await new Promise(resolve => setTimeout(resolve, UPDATE_RETRY_DELAY_MS * (attempt + 1)))
+  }
+  cloudlogErr({ requestId: c.get('requestId'), message: 'on_channel_update failed after retries', error: lastError, ...context })
+}


⚠️ Potential issue | 🟠 Major

Ensure retries cover thrown errors too.

If operation() rejects/throws, the loop exits and retries are skipped. Wrap it so failures still retry.

♻️ Suggested fix

async function updateChannelsWithRetry( c: Context<MiddlewareKeyVariables>, operation: () => Promise<{ error: any }>, context: Record<string, unknown>, ) { let lastError: any for (let attempt = 0; attempt < UPDATE_RETRY_ATTEMPTS; attempt++) { - const { error } = await operation() - if (!error) - return - lastError = error + try { + const { error } = await operation() + if (!error) + return + lastError = error + } + catch (error) { + lastError = error + } if (attempt < UPDATE_RETRY_ATTEMPTS - 1) await new Promise(resolve => setTimeout(resolve, UPDATE_RETRY_DELAY_MS * (attempt + 1))) } cloudlogErr({ requestId: c.get('requestId'), message: 'on_channel_update failed after retries', error: lastError, ...context }) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

import type { Context } from 'hono'

import type { MiddlewareKeyVariables } from '../utils/hono.ts'

import type { Database } from '../utils/supabase.types.ts'

import { Hono } from 'hono/tiny'

import { BRES, middlewareAPISecret, simpleError, triggerValidator } from '../utils/hono.ts'

import { cloudlog } from '../utils/logging.ts'

import { cloudlog, cloudlogErr } from '../utils/logging.ts'

import { supabaseAdmin } from '../utils/supabase.ts'

export const app = new Hono<MiddlewareKeyVariables>()

const UPDATE_RETRY_ATTEMPTS = 3

const UPDATE_RETRY_DELAY_MS = 300

async function updateChannelsWithRetry(

c: Context<MiddlewareKeyVariables>,

operation: () => Promise<{ error: any }>,

context: Record<string, unknown>,

) {

let lastError: any

for (let attempt = 0; attempt < UPDATE_RETRY_ATTEMPTS; attempt++) {

const { error } = await operation()

if (!error)

return

lastError = error

if (attempt < UPDATE_RETRY_ATTEMPTS - 1)

await new Promise(resolve => setTimeout(resolve, UPDATE_RETRY_DELAY_MS * (attempt + 1)))

}

cloudlogErr({ requestId: c.get('requestId'), message: 'on_channel_update failed after retries', error: lastError, ...context })

}

import type { Context } from 'hono'

import type { MiddlewareKeyVariables } from '../utils/hono.ts'

import type { Database } from '../utils/supabase.types.ts'

import { Hono } from 'hono/tiny'

import { BRES, middlewareAPISecret, simpleError, triggerValidator } from '../utils/hono.ts'

import { cloudlog, cloudlogErr } from '../utils/logging.ts'

import { supabaseAdmin } from '../utils/supabase.ts'

export const app = new Hono<MiddlewareKeyVariables>()

const UPDATE_RETRY_ATTEMPTS = 3

const UPDATE_RETRY_DELAY_MS = 300

async function updateChannelsWithRetry(

c: Context<MiddlewareKeyVariables>,

operation: () => Promise<{ error: any }>,

context: Record<string, unknown>,

) {

let lastError: any

for (let attempt = 0; attempt < UPDATE_RETRY_ATTEMPTS; attempt++) {

try {

const { error } = await operation()

if (!error)

return

lastError = error

}

catch (error) {

lastError = error

}

if (attempt < UPDATE_RETRY_ATTEMPTS - 1)

await new Promise(resolve => setTimeout(resolve, UPDATE_RETRY_DELAY_MS * (attempt + 1)))

}

cloudlogErr({ requestId: c.get('requestId'), message: 'on_channel_update failed after retries', error: lastError, ...context })

}

🤖 Prompt for AI Agents

In `@supabase/functions/_backend/triggers/on_channel_update.ts` around lines 1 - 29, The retry loop in updateChannelsWithRetry currently only handles resolved error results from operation() and will stop retrying if operation() throws; modify updateChannelsWithRetry so the call to operation() is wrapped in try/catch: invoke operation() inside the loop, catch any thrown error and treat it like a failure (assign to lastError), continue to the next retry after the existing delay logic, and only return early when operation() resolves with no error; keep the existing cloudlogErr call (including c.get('requestId') and context) to log lastError when all attempts fail.

fix(backend): harden file and queue handlers

18b78ef

Copilot AI review requested due to automatic review settings February 3, 2026 06:12

Copilot started reviewing on behalf of riderx February 3, 2026 06:12 View session

chatgpt-codex-connector Bot reviewed Feb 3, 2026

View reviewed changes

Copilot AI reviewed Feb 3, 2026

View reviewed changes

riderx added 2 commits February 3, 2026 06:22

fix(backend): await channel updates in retries

62997ed

fix(backend): restore channel update errors

9f5b569

coderabbitai Bot reviewed Feb 3, 2026

View reviewed changes

riderx closed this Feb 6, 2026

riderx deleted the riderx/log-fix-plan branch March 17, 2026 14:52

	return c.json(BRES)
	throw simpleError('no_app_version_id_or_s3_path', 'no app_version_id or s3_path', { record })

Uh oh!

Conversation

riderx commented Feb 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary (AI generated)

Test plan (AI generated)

Screenshots (AI generated)

Checklist (AI generated)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related PRs

Estimated code review effort

Poem

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Feb 3, 2026

Quality Gate failed

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

riderx commented Feb 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Feb 3, 2026 •

edited

Loading