[codex] harden files upload retries and alert resilience#1934
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughDurable-object upload forwarding/retry logic rewritten (replay heuristics, header-based retry signaling, HEAD-based offset recovery), PostgREST retry predicates centralized and used across triggers/statistics/manifest update, env access normalized, and new unit tests added for upload retry behaviors and manifest retry. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant UploadHandler as Upload Handler
participant DurableObject as Durable Object
participant RetryBackoff as Backoff/Retry
Client->>UploadHandler: POST/PATCH upload request
UploadHandler->>UploadHandler: evaluate requestHasNonEmptyUploadBody
alt Body replayable
UploadHandler->>DurableObject: fetch (attach body, duplex: 'half', extended timeout)
else No body / non-replayable
UploadHandler->>DurableObject: fetch (no body)
end
DurableObject-->>UploadHandler: response (e.g. 503 + X-Capgo-DO-Retryable:1)
UploadHandler->>UploadHandler: classify via header / isRetryableDurableObjectResetError
alt Can replay
UploadHandler->>RetryBackoff: schedule retry with backoff (re-send request)
RetryBackoff-->>UploadHandler: retry outcome
UploadHandler->>Client: forward final response
else Cannot replay
UploadHandler->>DurableObject: HEAD (remove content-length)
DurableObject-->>UploadHandler: HEAD response (Upload-Offset)
UploadHandler->>Client: respond 409 with recovered offset headers
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/backend-alert-resilience.unit.test.ts (1)
76-76: Consider usingit.concurrent()for the new tests.Per coding guidelines, tests should use
it.concurrent()when possible to maximize parallelism. The new tests at lines 76, 110, 244, and 273 use isolated state and mocks, so they can safely run concurrently.♻️ Suggested change for new tests
- it('retries retryable durable object responses for empty-body upload creation requests', async () => { + it.concurrent('retries retryable durable object responses for empty-body upload creation requests', async () => {Apply similar changes to the other new tests at lines 110, 244, and 273.
As per coding guidelines:
tests/**/*.{ts,js}: Useit.concurrent()instead ofit()when possible to run tests in parallel within the same file.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/backend-alert-resilience.unit.test.ts` at line 76, Replace the synchronous test declarations with concurrent ones: change the it(...) calls for the tests whose titles include "retries retryable durable object responses for empty-body upload creation requests" and the other new tests referenced at lines 110, 244, and 273 to use it.concurrent(...) instead of it(...), ensuring the same test callback and mocks/state remain unchanged so they run safely in parallel.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@supabase/functions/_backend/utils/utils.ts`:
- Around line 174-181: existInEnv currently uses the "in" operator which treats
keys with undefined values as present; change it to check the actual value so
undefined bindings are treated as absent (e.g., use getContextEnv(c)[key] !==
undefined or Object.hasOwn(contextEnv, key) && contextEnv[key] !== undefined).
Also update getEnv to consistently treat undefined the same way (read
contextEnv[key], return it if !== undefined, otherwise fall through to fallback)
so callers like the MAIN_SUPABASE_DB_URL branch don't see a
configured-but-undefined value and skip fallbacks; reference existInEnv, getEnv
and getContextEnv to locate the code to change.
---
Nitpick comments:
In `@tests/backend-alert-resilience.unit.test.ts`:
- Line 76: Replace the synchronous test declarations with concurrent ones:
change the it(...) calls for the tests whose titles include "retries retryable
durable object responses for empty-body upload creation requests" and the other
new tests referenced at lines 110, 244, and 273 to use it.concurrent(...)
instead of it(...), ensuring the same test callback and mocks/state remain
unchanged so they run safely in parallel.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6d57291c-c195-489d-bfb0-c13205abcb86
📒 Files selected for processing (10)
supabase/functions/_backend/files/files.tssupabase/functions/_backend/files/uploadHandler.tssupabase/functions/_backend/files/util.tssupabase/functions/_backend/public/statistics/index.tssupabase/functions/_backend/triggers/cron_stat_app.tssupabase/functions/_backend/triggers/cron_sync_sub.tssupabase/functions/_backend/triggers/on_manifest_create.tssupabase/functions/_backend/utils/retry.tssupabase/functions/_backend/utils/utils.tstests/backend-alert-resilience.unit.test.ts
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f19bb7d368
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76fc55e3d8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/backend-alert-resilience.unit.test.ts (1)
204-243: Consider asserting retry recovery uses HEAD on the second fetch.This test validates the final 409 +
Upload-Offset, which is great. Adding call-shape assertions would make the regression stricter and protect the PATCH→HEAD recovery contract.Suggested test hardening
const response = await filesTestUtils.fetchUploadHandlerWithRetry( @@ expect(response.status).toBe(409) expect(response.headers.get('Upload-Offset')).toBe('5242880') expect(handler.fetch).toHaveBeenCalledTimes(2) + const firstCallRequest = handler.fetch.mock.calls[0]?.[0] as Request + const secondCallRequest = handler.fetch.mock.calls[1]?.[0] as Request + expect(firstCallRequest.method).toBe('PATCH') + expect(secondCallRequest.method).toBe('HEAD')🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/backend-alert-resilience.unit.test.ts` around lines 204 - 243, The test should assert the retry recovery uses a HEAD request on the second fetch to enforce the PATCH→HEAD recovery contract; update the test for 'recovers upload offset after a retryable durable object patch response' to add a call-shape assertion against handler.fetch (e.g., using toHaveBeenNthCalledWith or similar) to verify the second call was made with method 'HEAD' (and include the expected URL/Request shape via expect.objectContaining or equivalent), while keeping the existing assertions on response.status, Upload-Offset and call count.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/backend-alert-resilience.unit.test.ts`:
- Around line 204-243: The test should assert the retry recovery uses a HEAD
request on the second fetch to enforce the PATCH→HEAD recovery contract; update
the test for 'recovers upload offset after a retryable durable object patch
response' to add a call-shape assertion against handler.fetch (e.g., using
toHaveBeenNthCalledWith or similar) to verify the second call was made with
method 'HEAD' (and include the expected URL/Request shape via
expect.objectContaining or equivalent), while keeping the existing assertions on
response.status, Upload-Offset and call count.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bfd1403a-3d36-4d59-83ba-d3f4f8c4ddf0
📒 Files selected for processing (2)
supabase/functions/_backend/files/files.tstests/backend-alert-resilience.unit.test.ts
✅ Files skipped from review due to trivial changes (1)
- supabase/functions/_backend/files/files.ts
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 62eb0f5821
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|



Summary (AI generated)
on_manifest_createmanifestfile_sizeupdates, with regression coverage for all three failure pathsMotivation (AI generated)
Cloudflare Durable Object storage resets were being treated as terminal upload failures even when the request was safe to replay, and the error-reporting path could throw its own
ENVIRONMENTlookup error when bindings were unavailable on the active context. Separately,on_manifest_createstill treated transient PostgREST 5xx responses as permanent failures, which matched the queue alerts.Business Impact (AI generated)
This reduces failed file uploads for bundle and attachment flows, cuts noisy Discord alerts that masked the real upload issue, and makes manifest follow-up jobs more resilient to transient infrastructure hiccups. That lowers support load and improves release reliability for customers shipping updates.
Test Plan (AI generated)
bunx vitest run tests/backend-alert-resilience.unit.test.tsbun run lint:backendbun run supabase:with-env -- bunx vitest run tests/trigger-error-cases.test.tsbun run supabase:with-env -- bunx vitest run tests/tus-upload.test.tsGenerated with AI
Summary by CodeRabbit
Bug Fixes
Refactor
Tests