Skip to content

feat(cloud): rate limiting + Sentry wiring + Stripe webhook hardening#33

Merged
Gradata merged 11 commits into
mainfrom
feat/api-security-hardening
Apr 15, 2026
Merged

feat(cloud): rate limiting + Sentry wiring + Stripe webhook hardening#33
Gradata merged 11 commits into
mainfrom
feat/api-security-hardening

Conversation

@Gradata

@Gradata Gradata commented Apr 13, 2026

Copy link
Copy Markdown
Owner

Summary

Closes three production gaps on the Gradata Cloud FastAPI backend. Three atomic commits, each independently revertible.

  • Rate limiting via slowapi — three buckets (public / authenticated / sensitive) covering /health, /ready, /billing/*, /billing/webhook. Limits are env-configurable; defaults: 60/min public, 300/min per user, 10/min sensitive.
  • Sentry wiring — adds profiles_sample_rate=0.1, tags events with user.id + workspace_id via tag_user() called from every auth dependency, and scrubs email / email_address / user_email from event extras.
  • Stripe webhook hardening — replay protection (reject events with created older than 5 minutes), idempotency (skip already-processed event.ids via new processed_webhooks table + migration 005). Signature verification was already active.

Test plan

  • 115 pre-existing cloud tests still pass
  • 13 new tests: 5 rate-limit (boundary, tight cap, auth keying, test-env bypass), 4 Sentry (profiles arg, user/workspace tagging, email scrub, no-op when disabled), 4 webhook (stale reject, missing-created reject, duplicate event.id returns 200, new id inserted)
  • cd cloud && pytest -> 128 passed
  • Apply migrations/005_processed_webhooks.sql in Supabase before enabling webhook hardening in prod (idempotency degrades gracefully without it — logged warning, no 5xx).

Env vars added (all optional, all in RAILWAY-ENV.md)

  • GRADATA_RATE_LIMIT_ENABLED (default: on in prod, off in tests)
  • GRADATA_RATE_LIMIT_PUBLIC / _AUTHENTICATED / _SENSITIVE
  • GRADATA_SENTRY_PROFILES_SAMPLE_RATE (default 0.1)

Backwards-compat

  • All features gracefully disable when env vars are unset.
  • Rate limits off by default in tests — no existing test touched.
  • Webhook hardening tolerates missing processed_webhooks table during rollout (logs warning, processes event).

Generated with Gradata

Oliver Le and others added 3 commits April 13, 2026 13:35
Three buckets, each configurable via env:
- public (60/min/IP) on /health and /ready
- authenticated (300/min/user) on /billing/{checkout,portal,subscription}
- sensitive (10/min/IP) on /billing/webhook

The authenticated limiter keys by hashed bearer-token prefix so each
user gets their own budget; it falls back to IP keying when no auth
header is present. 429 responses carry a uniform JSON shape plus
Retry-After when slowapi surfaces it.

Test env defaults to disabled so the existing 115 tests are unaffected;
set GRADATA_RATE_LIMIT_ENABLED=true to exercise the caps (see
tests/test_rate_limit.py for boundary + token-keying coverage).

Co-Authored-By: Gradata <noreply@gradata.ai>
- Pass profiles_sample_rate=0.1 into sentry_sdk.init (config value was
  defined but never forwarded).
- Add tag_user(user_id, workspace_id) helper that writes to the current
  isolation scope so any later exception carries the owning user's UUID
  and workspace tag. Wired into get_current_user_id, get_current_brain,
  and require_operator.
- Expand _SENSITIVE_KEYS to scrub email / email_address / user_email in
  Sentry event extras and contexts. Opaque user.id is fine under
  send_default_pii=False; the raw email never is.
- No-op when DSN is unset (production behaviour unchanged).

Co-Authored-By: Gradata <noreply@gradata.ai>
- New migration 005_processed_webhooks.sql: one row per successfully
  handled event.id, PK conflict gives us idempotency for free.
- Reject events with `created` older than 5 minutes (replay window is
  shorter than Stripe's 3-day retry schedule, so legitimate deliveries
  always fall inside it).
- Reject events missing `created` entirely (can't date-check them, treat
  as malformed — signature-verified live events always include it).
- Look up event.id in processed_webhooks before dispatching; return 200
  with {"duplicate": true} when we've seen it before so the handler
  doesn't double-apply state changes (subscription plan flips, etc).
- Insert event.id into processed_webhooks after successful handling.
  PK conflicts from parallel deliveries are swallowed defensively.
- Signature verification (stripe.Webhook.construct_event) unchanged —
  that was already correct.

Migration note added to RAILWAY-ENV.md. When the table doesn't exist
yet, idempotency degrades to always-process (logged warning), so the
handler keeps working during rollout.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai

coderabbitai Bot commented Apr 13, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@Gradata has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 24 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 4 minutes and 24 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e168a72a-f92e-4cd4-87a7-aca88108e49d

📥 Commits

Reviewing files that changed from the base of the PR and between 8bfc271 and 616cc99.

📒 Files selected for processing (3)
  • cloud/app/auth.py
  • cloud/app/routes/billing.py
  • cloud/tests/test_billing.py

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Implements Stripe webhook hardening (signature verification, age check, atomic claim-first idempotency), adds DB migration for processed webhooks, introduces app-wide rate limiting with three buckets and middleware integration, expands Sentry sampling/tagging/scrubbing, and adds tests and env docs.

Changes

Cohort / File(s) Summary
Env & Docs
cloud/RAILWAY-ENV.md
Documented webhook hardening, required processed_webhooks migration, GRADATA_SENTRY_PROFILES_SAMPLE_RATE, and new rate-limit env vars (GRADATA_RATE_LIMIT_ENABLED, per-bucket caps).
Config
cloud/app/config.py
Added sentry_profiles_sample_rate: float and rate-limiting settings: rate_limit_enabled, rate_limit_public, rate_limit_authenticated, rate_limit_sensitive.
Rate limiting core
cloud/app/rate_limit.py
New module: slowapi-based limiters, auth-or-IP keying, enablement resolution, limits_enabled(), refresh_enabled_flag(), and decorator factories: public_limit, auth_limit, sensitive_limit.
Middleware integration
cloud/app/middleware.py
Replaced local limiter construction with imported limiters; calls refresh_enabled_flag() at setup; attaches limiters to app.state; registers custom RateLimitExceeded -> 429 JSON handler with optional Retry-After.
Routes — Billing & Webhooks
cloud/app/routes/billing.py
Added @auth_limit to checkout/portal/subscription routes; @sensitive_limit on /billing/webhook; webhook verifies Stripe signature, rejects missing/stale created, and implements atomic claim-first idempotency via processed_webhooks (processing → processed). create_checkout parameter order changed.
Routes — Health
cloud/app/routes/health.py
/health and /ready handlers now accept Request and are decorated with @public_limit.
Sentry init & tagging
cloud/app/sentry_init.py, cloud/app/auth.py
Expanded PII scrubbing to recurse lists/tuples and include email keys, added tag_user(user_id, workspace_id?), forwarded profiles_sample_rate to sentry_sdk.init, and added _tag_brain/calls to tag user/workspace in auth flows.
DB Migration
cloud/migrations/005_processed_webhooks.sql
New migration adds processed_webhooks table (PK event_id, event_type, status ('processing'
Tests
cloud/tests/test_billing.py, cloud/tests/test_rate_limit.py, cloud/tests/test_sentry.py
Updated billing tests to ensure id/created, added webhook-hardening tests (age rejection, idempotency, claim race, reclaim stale claims, store-unavailable), new rate-limit tests (public/auth/sensitive buckets, auth/IP keying), and expanded Sentry scrub/profile/tag tests.

Sequence Diagram(s)

sequenceDiagram
  participant Stripe as Stripe
  participant Client as HTTP Client
  participant App as Gradata App
  participant DB as Postgres (processed_webhooks)
  participant Handler as Internal Handler

  Client->>App: POST /api/v1/billing/webhook (signature header)
  App->>App: stripe.Webhook.construct_event (verify signature)
  alt signature invalid
    App-->>Client: 400 Bad Request
  else signature valid
    App->>App: extract event.id, event.created
    alt missing created OR created too old
      App-->>Client: 400 Bad Request
    else within age
      App->>DB: INSERT event_id ... status='processing' (claim)
      alt insert conflict OR existing status='processed'
        App-->>Client: 200 {"duplicate": true}
      else claim acquired
        App->>Handler: process event (business logic)
        Handler-->>App: processing complete
        App->>DB: UPDATE processed_webhooks SET status='processed', processed_at=now() WHERE event_id=...
        App-->>Client: 200 {"status":"ok","duplicate":false}
      end
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.83% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title concisely summarizes the three main features added: rate limiting, Sentry wiring, and Stripe webhook hardening, matching the core changes across the changeset.
Description check ✅ Passed The description clearly explains the three production gaps being closed, provides test coverage details, documents new environment variables, and addresses backwards-compatibility concerns directly related to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/api-security-hardening

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/app/auth.py`:
- Around line 112-123: Centralize Sentry tagging by extracting the duplicated
tag_user logic into a single helper (e.g., tag_user_for_brain) and call it from
this function instead of inline calls, or alternatively add tagging inside the
common auth primitives (verify_api_key and verify_jwt) so every auth path
(including get_brain_for_request) applies the same Sentry context; update
get_brain_for_request to invoke the new helper or rely on the updated verify_*
functions, and ensure the helper accepts the brain/user_id and workspace_id
resolution logic used here (brain.get("user_id") or brain.get("id"),
workspace_id from brain or rows[0]) so tagging is consistent across codepaths.

In `@cloud/app/middleware.py`:
- Around line 22-37: The Retry-After header in _rate_limit_handler is being
truncated via int(retry_after) which floors values (0.8 -> 0) causing immediate
client retries; update the logic to compute the ceiling of retry_after and clamp
it to at least 1 when retry_after is present (e.g., use math.ceil(retry_after)
and max(1, ...)) and then set headers["Retry-After"] to the stringified integer
so the response always tells clients to wait at least one second.

In `@cloud/app/rate_limit.py`:
- Around line 54-66: Both limiter and auth_limiter are constructed with
headers_enabled=False so clients don't receive X-RateLimit-* headers; change
their constructors to use headers_enabled=True (i.e., update the Limiter(...)
calls that create limiter and auth_limiter) so the middleware will emit
X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers; keep
the existing enabled handling via refresh_enabled_flag() as-is since enabled is
mutable post-construction.

In `@cloud/app/routes/billing.py`:
- Around line 209-223: Replay-protection is too strict for potential clock skew:
adjust the age check in the handler that calls _event_created(event) to allow a
small grace window by comparing age against WEBHOOK_MAX_AGE_SECONDS plus a new
constant (e.g., WEBHOOK_MAX_AGE_GRACE_SECONDS or GRACE_SECONDS), update the
rejection log to include the effective threshold and use the same combined
threshold when raising the HTTPException; ensure the new constant is defined
near other webhook config so it's discoverable and documented.

In `@cloud/app/sentry_init.py`:
- Around line 37-42: The current scrubber only walks dicts and misses list/tuple
containers; update the scrub routine (the _scrub_dict/_scrub_value function) to
recursively traverse list and tuple values as well as dicts, replacing any dict
entries whose key lowercased matches entries in _SENSITIVE_KEYS with
"[Filtered]" and returning a new tuple for tuple inputs; ensure you handle
nested lists/tuples of dicts so keys like "email", "email_address", and
"user_email" are scrubbed even when embedded in lists.

In `@cloud/migrations/005_processed_webhooks.sql`:
- Around line 10-17: The schema currently records events only after successful
handling which allows a race where parallel deliveries both pass the lookup and
double-apply effects; change the processed_webhooks design to an atomic
claim-first flow: add columns like status (ENUM or TEXT), claimed_at TIMESTAMPTZ
NULL, processed_at TIMESTAMPTZ NULL (and optionally processor_id) to the
processed_webhooks table, then in cloud/app/routes/billing.py perform an atomic
INSERT ... ON CONFLICT DO NOTHING to "claim" the event_id before calling
_handle_event(event) and check the insert result (if 0 rows affected, skip
processing), and after successful handling UPDATE the row to status='processed'
and set processed_at (or wrap claim+handle+update in a transaction if desired)
so no two workers can both process the same event.

In `@cloud/tests/test_rate_limit.py`:
- Around line 78-96: The test test_rate_limit_disabled_in_test_env_by_default
resets rate_limit.limiter but not rate_limit.auth_limiter, which can leave auth
limiter state leaking between tests; after calling rate_limit.limiter.reset()
(and/or after rate_limit.refresh_enabled_flag()), also call
rate_limit.auth_limiter.reset() so both limiters are cleared before creating the
TestClient, ensuring consistent isolation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: baab2914-0d91-4156-b4d3-178fc99f309c

📥 Commits

Reviewing files that changed from the base of the PR and between ce1ad3c and 4039ff0.

📒 Files selected for processing (12)
  • cloud/RAILWAY-ENV.md
  • cloud/app/auth.py
  • cloud/app/config.py
  • cloud/app/middleware.py
  • cloud/app/rate_limit.py
  • cloud/app/routes/billing.py
  • cloud/app/routes/health.py
  • cloud/app/sentry_init.py
  • cloud/migrations/005_processed_webhooks.sql
  • cloud/tests/test_billing.py
  • cloud/tests/test_rate_limit.py
  • cloud/tests/test_sentry.py
📜 Review details
🔇 Additional comments (20)
cloud/tests/test_billing.py (5)

228-238: LGTM — Helper updated to support webhook hardening tests.

The setdefault approach cleanly injects required id and created fields while allowing individual tests to override them. Using id(event) for unique IDs is pragmatic for test isolation.


334-341: Intentional design: _post_with omits defaults for precise hardening tests.

Unlike TestWebhook._post_event, this helper correctly avoids injecting defaults so tests can exercise edge cases like missing created or specific id values.


343-356: LGTM — Stale event rejection test.

The 6-minute offset correctly exceeds the 5-minute WEBHOOK_MAX_AGE_SECONDS threshold, validating replay protection.


370-402: LGTM — Idempotency test validates duplicate detection.

Good approach: seeding processed_webhooks via mock_supabase.add_response, then asserting no update calls occur and response includes "duplicate": true.


404-436: LGTM — New event recording test.

Properly verifies that successful handling triggers an insert to processed_webhooks with correct event_id and event_type.

cloud/tests/test_rate_limit.py (4)

10-38: LGTM — Well-structured fixture with proper state isolation.

The fixture correctly:

  • Clears settings cache before setting env vars
  • Calls refresh_enabled_flag() to sync limiter state
  • Resets both limiters before and after test execution

This prevents cross-test state bleed as noted in the relevant code snippet about shared MemoryStorage.


40-59: LGTM — Public bucket boundary tests.

First test validates under-limit success; second validates 429 on overflow with proper detail message and optional Retry-After header check.


62-75: LGTM — Sensitive bucket test approach is sound.

Intentionally triggering signature/config failures (400/503) to prove the limiter runs first is a valid test strategy. The third request hitting 429 confirms the sensitive bucket's 2/min cap.


98-116: LGTM — Auth keying unit test.

Good coverage of the _auth_or_ip_key function:

  • Different tokens → different keys
  • Same token → deterministic (same key)
  • Missing auth → IP fallback with ip: prefix
  • Present auth → u: prefix
cloud/app/rate_limit.py (4)

30-39: LGTM — Auth key derivation is well-designed.

The blake2b hash with 6-byte digest (48 bits / 12 hex chars) provides sufficient collision resistance for rate-limit keying while keeping keys compact. The u: and ip: prefixes ensure namespace separation.


42-51: LGTM — Environment-aware toggle logic.

Explicit env var takes precedence; otherwise defaults to enabled (non-test) or disabled (test). This design prevents accidental rate-limit failures in the existing test suite.


69-78: LGTM — refresh_enabled_flag addresses import-time evaluation.

This correctly allows test fixtures to override env vars after import, then sync both limiters to the new state.


96-101: LGTM — Decorator factories are clean.

Using callables (_public, _authenticated, _sensitive) for lazy limit resolution is the correct slowapi pattern.

cloud/app/routes/billing.py (7)

56-61: LGTM — Rate limit decorator applied to checkout.

The @auth_limit decorator correctly uses the authenticated bucket (300/min per user).


101-104: LGTM — Rate limit decorator applied to portal.

Consistent use of @auth_limit for authenticated billing endpoints.


129-156: LGTM — Event accessor helpers are robust.

The polymorphic accessors (_event_id, _event_type, _event_created) correctly handle both dict events and Stripe Event objects, with appropriate null/type checks.


159-169: Defensive fallback on lookup failure is appropriate.

Returning False (not processed) when the table doesn't exist allows webhook handling to continue during migration rollout. The warning log provides visibility.


172-180: Insert exception handling is correct for PK race.

Two concurrent deliveries of the same event could race to insert; the second will fail with a PK conflict. Logging at info level and continuing is the right behavior since the event was already processed.


225-234: LGTM — Idempotency check and marker insertion are correctly sequenced.

The flow is correct:

  1. Check if already processed → return early with duplicate: True
  2. Handle the event
  3. Mark as processed

This ensures events aren't marked until successfully handled.


322-326: LGTM — Rate limit decorator applied to subscription endpoint.

Consistent use of @auth_limit for the subscription status endpoint.

Comment thread cloud/app/auth.py
Comment thread cloud/app/middleware.py
Comment thread cloud/app/rate_limit.py Outdated
Comment on lines +54 to +66
# IP-keyed limiter (public + sensitive buckets)
limiter = Limiter(
key_func=get_remote_address,
enabled=limits_enabled(),
headers_enabled=False,
)

# Bearer-token-keyed limiter (authenticated API bucket)
auth_limiter = Limiter(
key_func=_auth_or_ip_key,
enabled=limits_enabled(),
headers_enabled=False,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🌐 Web query:

slowapi Limiter constructor parameters headers_enabled

💡 Result:

The slowapi Limiter constructor includes a parameter headers_enabled of type bool with default value False. It controls whether to inject X-RateLimit-* response headers (such as X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) into responses when rate limits are applied. Full list of key constructor parameters from official API documentation and source analysis: - key_func: Callable[..., str] (required) - Function to extract client identifier (e.g., IP address). - default_limits: List[StrOrCallableStr] = [] - Global fallback limits for routes without specific limits. - application_limits: List[StrOrCallableStr] = [] - Limits applied to the entire application. - headers_enabled: bool = False - Enable rate limit response headers. - strategy: Optional[str] = None - Rate limiting strategy (e.g., "fixed-window"). - storage_uri: Optional[str] = None - Storage backend URI (e.g., "redis://localhost:6379"). - storage_options: Dict[str, str] = {} - Additional storage options. - auto_check: bool = True - Automatically check limits in decorator chain. - swallow_errors: bool = False - Suppress exceptions on storage failures. - in_memory_fallback: List[StrOrCallableStr] = [] - Fallback limits when storage is down. - Additional parameters: enabled (bool=True), retry_after, key_prefix, key_style, in_memory_fallback_enabled, config_filename. Usage example: from slowapi import Limiter from slowapi.util import get_remote_address limiter = Limiter( key_func=get_remote_address, headers_enabled=True, # Enables X-RateLimit-* headers default_limits=["100/hour"] )

Citations:


🏁 Script executed:

rg -n "refresh_enabled_flag" cloud/app/

Repository: Gradata/gradata

Length of output: 399


🏁 Script executed:

rg -n "enabled" cloud/app/rate_limit.py -A 2 -B 2

Repository: Gradata/gradata

Length of output: 1062


🏁 Script executed:

cat -n cloud/app/rate_limit.py

Repository: Gradata/gradata

Length of output: 4067


Consider enabling headers_enabled=True to expose rate-limit headers to clients.

With headers_enabled=False, clients cannot see their remaining request budget. Enabling X-RateLimit-* headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) improves API usability for integrators. The enabled property is already mutable post-construction and correctly updated by refresh_enabled_flag() at lines 77–78.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/rate_limit.py` around lines 54 - 66, Both limiter and auth_limiter
are constructed with headers_enabled=False so clients don't receive
X-RateLimit-* headers; change their constructors to use headers_enabled=True
(i.e., update the Limiter(...) calls that create limiter and auth_limiter) so
the middleware will emit X-RateLimit-Limit, X-RateLimit-Remaining, and
X-RateLimit-Reset headers; keep the existing enabled handling via
refresh_enabled_flag() as-is since enabled is mutable post-construction.

Comment thread cloud/app/routes/billing.py
Comment thread cloud/app/sentry_init.py
Comment thread cloud/migrations/005_processed_webhooks.sql
Comment thread cloud/tests/test_rate_limit.py
Gradata pushed a commit that referenced this pull request Apr 13, 2026
PR #33 (API security hardening) introduced 005_processed_webhooks.sql.
PR #33 has been merged and 005_processed_webhooks already applied to prod Supabase.
Renaming this migration to 006 to resolve the collision.

Co-Authored-By: Gradata <noreply@gradata.ai>
Security/correctness fixes from CodeRabbit CHANGES_REQUESTED review:

- billing: atomic claim-first webhook idempotency. Insert row with
  status='processing' BEFORE handler runs; PK conflict = race loss,
  skip handler. Eliminates double-apply under concurrent delivery.
- migrations/005: add status + claimed_at + processed_at columns to
  support claim-first design. Keep old created_at index for rollout compat.
- middleware: ceil+clamp Retry-After so 0.8s doesn't floor to 0s and
  cause immediate client retries.
- sentry_init: scrubber now recurses through list/tuple containers
  (prev only walked dicts — emails in nested arrays slipped through).
- auth: centralise Sentry user tagging in _tag_brain() so every auth
  path (get_current_brain + get_brain_for_request) gets the same
  Sentry context. Previously get_brain_for_request was untagged.
- billing: add WEBHOOK_MAX_AGE_GRACE_SECONDS=30 for NTP/edge clock skew.
- tests: reset auth_limiter alongside limiter in disabled-env test.
- tests: add claim-race-loss test asserting handler skips on PK conflict.
- tests: add list/tuple scrubbing test for Sentry PII protection.

Declined: headers_enabled=True on slowapi Limiter. slowapi's
X-RateLimit-* injection requires route handlers to take a
`response: Response` param and is incompatible with Pydantic/JSONResponse
returns. Tracked for a follow-up route migration. Retry-After on 429
still emitted from the exception handler.

All 130 cloud tests + 2070 SDK tests pass. No new ruff/pyright errors.

Co-Authored-By: Gradata <noreply@gradata.ai>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 13, 2026

Copy link
Copy Markdown

Deploying gradata-dashboard with  Cloudflare Pages  Cloudflare Pages

Latest commit: 616cc99
Status: ✅  Deploy successful!
Preview URL: https://496a4a70.gradata-dashboard.pages.dev
Branch Preview URL: https://feat-api-security-hardening.gradata-dashboard.pages.dev

View logs

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 14, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Gradata

Gradata commented Apr 14, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Copy link
Copy Markdown
✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/app/auth.py`:
- Around line 128-134: After verify_jwt(cred) (and similarly immediately after
verify_api_key(...) on the API-key branch) call tag_user(user_id) to set the
Sentry user context before any authorization checks that can fail; then after
loading the brain record via db.select("brains", filters={"user_id": user_id})
and before raising or returning, call tag_workspace/workspace tagging (or reuse
_tag_brain(rows[0], user_id=user_id) to enrich Sentry with workspace_id) so the
403/404/operator-denied paths are tagged; apply the same change to the other
similar blocks referenced (around the verify_jwt/verify_api_key flows at the
other ranges) so tagging happens immediately after identity verification and is
then enriched after the brain DB read.

In `@cloud/app/config.py`:
- Around line 29-34: The Settings.rate_limit_enabled field is unused because
limits_enabled() in cloud/app/rate_limit.py reads os.environ directly; update
limits_enabled() to obtain the flag from get_settings().rate_limit_enabled
(import get_settings from cloud.app.config if needed) and use that boolean to
enable/disable the limiter instead of reading GRADATA_RATE_LIMIT_ENABLED from
os.environ (remove or ignore the current os.environ fallback so the Settings
value controls the kill-switch).

In `@cloud/migrations/005_processed_webhooks.sql`:
- Around line 21-29: The current webhook handler treats any existing row as a
duplicate, turning a live 'processing' claim into a permanent poison pill;
update the billing route logic in cloud/app/routes/billing.py (the handler
around the call to _handle_event()) to only short‑circuit when status =
'processed' (not merely any row), and/or implement reclaim logic that, when an
existing row has status = 'processing' and claimed_at is older than your retry
window (e.g., >10 minutes), either DELETE or reset that row so the handler can
re-claim and rerun _handle_event(); ensure you update checks and any
upsert/insert paths to set claimed_at, status, and processed_at consistently to
avoid stuck rows.

In `@cloud/tests/test_billing.py`:
- Around line 338-345: TestWebhookHardening._post_with currently mirrors
TestWebhook._post_event but intentionally omits the setdefault calls for
id/created to allow tests to control those fields; add a short inline comment
above the _post_with method explaining that omission (reference
TestWebhook._post_event and the setdefault behavior) so future readers
understand it's deliberate and not a bug.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c2ecd596-0821-4ba1-a2ca-81069441c60d

📥 Commits

Reviewing files that changed from the base of the PR and between ce1ad3c and fe0ed7c.

📒 Files selected for processing (12)
  • cloud/RAILWAY-ENV.md
  • cloud/app/auth.py
  • cloud/app/config.py
  • cloud/app/middleware.py
  • cloud/app/rate_limit.py
  • cloud/app/routes/billing.py
  • cloud/app/routes/health.py
  • cloud/app/sentry_init.py
  • cloud/migrations/005_processed_webhooks.sql
  • cloud/tests/test_billing.py
  • cloud/tests/test_rate_limit.py
  • cloud/tests/test_sentry.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (24)
cloud/app/rate_limit.py (3)

30-39: LGTM — Token-keyed rate limiting is well implemented.

The _auth_or_ip_key function correctly:

  • Parses the Authorization header case-insensitively
  • Hashes the bearer token using blake2b (48 bits is sufficient for rate-limit disambiguation)
  • Falls back to IP for anonymous requests with clear prefixes (u: vs ip:)

42-74: LGTM — Environment-aware rate limiting toggle is correctly implemented.

The design appropriately:

  • Defaults to enabled in production, disabled in tests (prevents flaky test failures)
  • Allows explicit override via GRADATA_RATE_LIMIT_ENABLED
  • Documents the headers_enabled=False decision with clear rationale

77-109: LGTM — Refresh mechanism and decorator factories are well designed.

The lazy callable approach for limit strings allows configuration changes without restarting, and refresh_enabled_flag() correctly synchronizes both limiter instances after settings are cleared.

cloud/tests/test_rate_limit.py (4)

10-37: LGTM — Test fixture provides proper isolation.

The limited_client fixture correctly:

  • Enables rate limiting explicitly via env var
  • Clears settings cache before and after
  • Resets both limiter and auth_limiter to prevent state leakage between tests

40-76: LGTM — Rate limit boundary tests are comprehensive.

Tests correctly verify:

  • Requests under limit succeed (lines 40-44)
  • Requests over limit return 429 with proper JSON detail (lines 47-59)
  • Sensitive bucket enforces tighter caps independently (lines 62-75)

78-100: LGTM — Disabled-by-default test validates test environment behavior.

The test properly verifies that rate limiting is a no-op in test environments unless explicitly enabled, preventing flaky test failures in the existing test suite.


102-120: LGTM — Auth keying unit tests are thorough.

The mock _Req class correctly simulates FastAPI's Request interface for testing _auth_or_ip_key, verifying:

  • Different tokens produce different keys
  • Same token produces deterministic keys
  • Missing auth falls back to IP-prefixed keys
cloud/tests/test_billing.py (6)

227-238: LGTM — Helper updated to match hardening requirements.

The _post_event helper now injects default id and created fields via setdefault, ensuring existing tests continue to work with the new replay protection while allowing individual tests to override these values.


265-271: LGTM — Assertions correctly filter for workspace-specific writes.

The filtering for table == "workspaces" accounts for the new processed_webhooks updates, keeping test assertions focused on the business logic being verified.


347-373: LGTM — Replay protection tests validate boundary conditions.

Tests correctly verify:

  • Events older than 5 minutes are rejected with 400 (line 359)
  • Events missing created are rejected with 400 (line 372)

374-406: LGTM — Duplicate detection test properly validates idempotency.

The test seeds processed_webhooks with an existing event_id and verifies:

  • Response is 200 with duplicate: True
  • Handler does not run (no captured updates)

408-457: LGTM — Claim-first workflow test validates the two-phase insert/update pattern.

The test correctly asserts:

  • A claim insert occurs with status='processing' before handling
  • A completion update flips status to 'processed' after success

459-499: LGTM — Race condition test validates claim-loss behavior.

The test simulates a PK conflict on insert and correctly verifies:

  • Response is 200 with duplicate: True
  • Handler does NOT touch workspaces (preventing double-apply)

This is the core correctness guarantee of the claim-first design.

cloud/app/routes/billing.py (11)

56-62: LGTM — Rate limiting correctly applied to checkout endpoint.

The @auth_limit decorator is properly placed, and the parameter reordering (request: Request first) ensures slowapi can extract the request for key derivation.


101-106: LGTM — Portal endpoint correctly rate-limited.


129-137: LGTM — Replay protection constants include clock skew grace.

The 30-second grace period addresses the previous review feedback about tight timing with potential clock drift between Stripe's edge and your servers.


140-161: LGTM — Event accessor helpers handle both dict and object forms.

This defensive approach ensures compatibility across Stripe SDK versions where construct_event may return different types.


164-183: LGTM — _already_processed provides cheap duplicate short-circuit with graceful degradation.

The try/except block correctly handles the migration rollout scenario where processed_webhooks table may not exist yet, falling back to "not processed" rather than failing all webhooks.


222-239: LGTM — _mark_processed is correctly best-effort.

Since the claim row already exists with status='processing', a failed update still guarantees idempotency. Logging the failure is appropriate for observability.


242-252: LGTM — Webhook handler docstring accurately describes hardening measures.


268-283: LGTM — Replay protection correctly rejects stale events.

The age check uses the combined threshold (WEBHOOK_MAX_AGE_SECONDS + WEBHOOK_MAX_AGE_GRACE_SECONDS) and logs sufficient detail for debugging rejected events.


285-310: LGTM — Idempotency implementation follows the claim-first pattern correctly.

The flow is:

  1. Cheap short-circuit via _already_processed (avoids INSERT for known duplicates)
  2. Atomic claim via _claim_event (PK conflict = lose race = skip)
  3. Handle event only if claim won
  4. Mark processed (best-effort, non-fatal)

Returning HTTP 200 for duplicates is correct webhook behavior — Stripe will not retry on 200.


396-401: LGTM — Subscription endpoint correctly rate-limited.


186-219: The broad exception catch is intentional and documented in the function's docstring.

The code explicitly states that "conflict, network, table missing" are treated identically as "did not claim" because it's "safer to drop a duplicate than to double-apply a subscription update." This is a deliberate safety-first trade-off: any failure (PK conflict, network timeout, missing table) results in skipping processing to avoid double-applying webhook updates.

If distinguishing network/timeout errors for retry logic is desired, that would require reconsidering the error handling strategy throughout this webhook processing flow, not just this function.

Comment thread cloud/app/auth.py
Comment thread cloud/app/config.py
Comment thread cloud/migrations/005_processed_webhooks.sql
Comment thread cloud/tests/test_billing.py
Remove unused `_scrub_and_enrich` in sentry_init.py — it was never
wired into `before_send` (which uses `_scrub_event` directly) and has
no callers in cloud/ or tests/. Pure dead-code removal, no behaviour
change.

Declined (conservatism on security-sensitive code):
- Webhook event extractors (_event_id/_type/_created) kept as separate
  named helpers — explicit is better than clever on the audit path.
- `_mark_processed`'s unused event_type param kept for symmetry with
  `_claim_event` — trivial cost, cleaner signature.
- `_scrub_dict` compat shim kept as documented one-liner.
- Atomic claim-first logic and _scrub_value recursion untouched.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 14, 2026

Copy link
Copy Markdown
Owner Author

Polite re-ping: CR review is still showing CHANGES_REQUESTED but all line-level actionables from the last review were addressed in the most recent push. Could you re-review when convenient? @coderabbitai full review

@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Copy link
Copy Markdown

@Gradata: Thanks for the re-ping! Triggering a full review now to pick up all the changes from your latest push. I'll go through the entire diff fresh.

✅ Actions performed

Full review triggered.

- rate_limit.py: wire Settings.rate_limit_enabled through limits_enabled().
  Env var override (GRADATA_RATE_LIMIT_ENABLED) still takes precedence for
  tests/ops, but .env and Settings() values are now honoured instead of
  ignored. CR flagged the config field as dead wiring.
- billing.py: fix poison-pill in webhook idempotency. _already_processed
  now short-circuits only on status='processed' (was: any row). _claim_event
  reclaims 'processing' rows older than WEBHOOK_CLAIM_STALE_SECONDS (10min)
  using a claimed_at-scoped UPDATE so racing reclaims can't both win.
  Previously a worker crash between claim and completion would permanently
  block every Stripe retry for that event_id.
- test_billing.py: seed duplicate-test with status='processed' to match new
  contract; add _post_with docstring explaining the intentional divergence
  from _post_event; add two regression tests (stale reclaim runs handler,
  active claim does not).

Security constraints preserved:
- Stripe signature verification unchanged (still via construct_event).
- Rate limit enforcement unchanged (still on by default in non-test env).
- Sentry PII scrubbing unchanged (emails/tokens still filtered).

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 14, 2026

Copy link
Copy Markdown
Owner Author

Addressed CodeRabbit round-3 feedback in 8bfc271:

  • config.py / rate_limit.py (Major): Settings.rate_limit_enabled is now actually wired through limits_enabled(). The GRADATA_RATE_LIMIT_ENABLED env-var override is kept (tests and ops kill-switch) but .env values and Settings(rate_limit_enabled=False) overrides now take effect instead of being silently ignored.
  • billing.py webhook poison-pill (Major): _already_processed() now only short-circuits on status='processed'. _claim_event() reclaims stale processing rows older than 10 minutes using a claimed_at-scoped atomic UPDATE (racing reclaims can't both win — the filter on the prior timestamp makes it a CAS). A crashed worker no longer permanently blocks Stripe retries for that event_id.
  • test_billing.py (Trivial): Added _post_with docstring explaining the intentional omission of setdefault for id/created (tests that need to control those fields precisely). Updated duplicate-delivery test to seed status='processed' to match the tightened contract. Added two regression tests: stale claim triggers reclaim + handler run, active claim does not.

Security invariants preserved:

  • Stripe construct_event signature verification unchanged.
  • Rate limiting still enforced by default outside test env.
  • Sentry PII scrubbing (emails/tokens) unchanged.

Tests: 132 passed (130 existing + 2 new regression tests). Push SHA: 8bfc271.

@Gradata

Gradata commented Apr 14, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review — fixes pushed, please re-verify.

@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Copy link
Copy Markdown

@Gradata: Thanks for the detailed summary of changes in 8bfc271! The fixes look well-reasoned — particularly the CAS-style stale processing reclaim in _claim_event() and the proper wiring of Settings.rate_limit_enabled through limits_enabled(). Triggering a full re-review now to verify everything end-to-end.

✅ Actions performed

Full review triggered.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review — fixes pushed since last review, please re-verify

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown

@Gradata: Thanks for the re-ping! Triggering a full review now to pick up all the latest fixes. I'll go through the entire diff fresh and re-verify everything end-to-end.

✅ Actions performed

Full review triggered.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review — CR Pro active, please re-verify.

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown

@Gradata: Thanks for the ping and for confirming CR Pro is active! Triggering a full re-review now to verify all the latest changes end-to-end, including the fixes from 8bfc271 (rate_limit_enabled wiring, CAS-style stale claim reclaim, and the regression tests).

✅ Actions performed

Full review triggered.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review — re-verify latest pushes

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown

@Gradata: Thanks for the ping! Triggering a full review now to re-verify all the latest pushes end-to-end, including the rate-limiting wiring, CAS-style stale claim reclaim in _claim_event(), the tightened _already_processed() contract, and the new regression tests.

✅ Actions performed

Full review triggered.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
cloud/app/auth.py (1)

128-133: ⚠️ Potential issue | 🟡 Minor

Tag identity earlier on JWT/operator paths to cover denied/not-found branches.

In these flows, tagging happens after checks that can raise 403/404, so those failures can miss user.id context. Tag immediately after identity verification (verify_jwt(...) / claims["sub"]), then enrich later with workspace when available.

Suggested patch
 async def get_current_brain(
@@
-    user_id = await verify_jwt(cred)
+    user_id = await verify_jwt(cred)
+    tag_user(user_id)
     db = get_db()
@@
 async def get_brain_for_request(
@@
-    user_id = await verify_jwt(cred)
+    user_id = await verify_jwt(cred)
+    tag_user(user_id)
     brain = await verify_brain_ownership(brain_id, user_id)
     _tag_brain(brain, user_id=user_id)
@@
 async def require_operator(
@@
     claims = await verify_jwt_claims(credentials.credentials)
     user_id = claims["sub"]
+    tag_user(user_id)
@@
-    tag_user(user_id)
     return user_id

Also applies to: 171-174, 205-217

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/auth.py` around lines 128 - 133, After verify_jwt(...) returns the
user id, call the tagging helper immediately to record the identity context
(e.g., invoke _tag_brain or the common tagging function with user_id) before
performing DB lookups or raising HTTPException; then later, after
get_db()/rows/workspace resolution, call the same tagging function again to
enrich the tag with workspace/brain info. Apply this pattern around the
verify_jwt -> db.select flow (current block using verify_jwt, get_db, rows, and
_tag_brain) and similarly at the other mentioned sites (the flows around lines
~171-174 and ~205-217) so 403/404 branches include user.id context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/app/routes/billing.py`:
- Around line 197-245: _claim_event currently collapses "already processed",
"another worker holds claim", and "DB/table unavailable" into a single False
result which leads stripe_webhook to ack retries incorrectly; change
_claim_event (and the stripe_webhook call sites that use it) to return distinct
outcomes (e.g., an enum or tuple) indicating: PROCESSED (row exists with
status='processed'), LOCKED (status='processing' and claimed_at is fresh),
RECLAIMED/CLAIMED (we won the claim), and ERROR/UNAVAILABLE (DB errors or
missing table). Update stripe_webhook handling so only PROCESSED results return
200 {"duplicate": true}; LOCKED and ERROR/UNAVAILABLE must return a non-2xx (or
explicitly opt to bypass idempotency only when ERROR/UNAVAILABLE is
intentionally chosen), and CLAIMED proceeds to call _handle_event; reference
function names _claim_event, stripe_webhook, and _handle_event when making these
control-flow changes.
- Around line 257-266: The reclaim UPDATE currently only filters on event_id and
claimed_at_raw and can overwrite a row that another worker already set to
processed; add a condition that the existing row's status is "processing" to the
CAS filter so the db.update on the "processed_webhooks" table only succeeds if
the row is still processing. Modify the filters passed to the db.update call
(the one updating "processed_webhooks" with data {"status": "processing",
"claimed_at": "now()"}) to include status="processing" (reference the db.update
call and the _mark_processed path).

---

Duplicate comments:
In `@cloud/app/auth.py`:
- Around line 128-133: After verify_jwt(...) returns the user id, call the
tagging helper immediately to record the identity context (e.g., invoke
_tag_brain or the common tagging function with user_id) before performing DB
lookups or raising HTTPException; then later, after get_db()/rows/workspace
resolution, call the same tagging function again to enrich the tag with
workspace/brain info. Apply this pattern around the verify_jwt -> db.select flow
(current block using verify_jwt, get_db, rows, and _tag_brain) and similarly at
the other mentioned sites (the flows around lines ~171-174 and ~205-217) so
403/404 branches include user.id context.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0f5bbbbc-bea6-4e7b-9c45-04a120e22515

📥 Commits

Reviewing files that changed from the base of the PR and between ce1ad3c and 8bfc271.

📒 Files selected for processing (12)
  • cloud/RAILWAY-ENV.md
  • cloud/app/auth.py
  • cloud/app/config.py
  • cloud/app/middleware.py
  • cloud/app/rate_limit.py
  • cloud/app/routes/billing.py
  • cloud/app/routes/health.py
  • cloud/app/sentry_init.py
  • cloud/migrations/005_processed_webhooks.sql
  • cloud/tests/test_billing.py
  • cloud/tests/test_rate_limit.py
  • cloud/tests/test_sentry.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (10)
cloud/app/auth.py (1)

101-110: Good centralization of auth Sentry context.

_tag_brain(...) is a clean consolidation point and keeps user/workspace tagging consistent across auth paths.

cloud/app/middleware.py (2)

23-43: 429 response shaping and Retry-After handling look correct.

The ceil+floor-to-1 behavior on Retry-After and the consistent JSON body are solid.


48-55: Limiter wiring through app.state is correctly integrated.

Refreshing the enabled flag before registration and exposing both limiters on app.state is a clean setup for decorated routes.

cloud/app/config.py (1)

26-34: Config additions are clear and scoped well.

The new Sentry profiling and rate-limit bucket settings are straightforward and match the intended runtime controls.

cloud/tests/test_sentry.py (1)

116-201: Strong regression coverage added for the Sentry hardening paths.

These tests close the key gaps (nested container scrubbing, profile sampling wiring, and tag_user behavior with/without active Sentry).

cloud/app/routes/health.py (1)

43-52: Rate-limit integration on health endpoints looks correct.

The decorator + request signature update is applied consistently to both probes.

cloud/RAILWAY-ENV.md (1)

35-68: Docs update is practical and rollout-friendly.

The added migration callout and env var sections make the deployment expectations explicit.

cloud/migrations/005_processed_webhooks.sql (1)

16-43: Migration shape matches the hardened webhook claim model.

The status + timestamp fields and claim-time index support the handler’s idempotency/reclaim flow well.

cloud/app/sentry_init.py (1)

69-102: Sentry hardening changes are well implemented.

Recursive scrubbing, profiles sampling forwarding, and isolation-scope tagging are all aligned with the stated goals.

Also applies to: 134-153, 156-174

cloud/tests/test_billing.py (1)

510-612: Good regression coverage for stale vs. active claims.

These two tests pin the intended reclaim boundary well and make the claim lifecycle much harder to regress accidentally.

Comment thread cloud/app/routes/billing.py Outdated
Comment thread cloud/app/routes/billing.py
- Tag Sentry user context immediately after JWT/API-key verification so
  403 "Not your brain", 404 "No brain found", and operator-denied rejections
  carry identity in Sentry (CR minor on auth.py).
- Remove redundant processed_webhooks INSERT from _handle_event. The outer
  _claim_event/_mark_processed wrappers own idempotency; the inner INSERT
  was a stale path from the pre-claim-flow code kept accidentally by the
  main-branch merge, causing duplicate-row test failures.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

Rebased on main (now up-to-date)

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review — rebased on latest main, all prior fixes applied

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown

@Gradata: Thanks for the heads-up on the rebase! Triggering a full review now to re-verify all the latest changes end-to-end on the rebased branch — including the rate-limiting wiring (Settings.rate_limit_enabled through limits_enabled()), the CAS-style stale claim reclaim in _claim_event(), the tightened _already_processed() contract, Sentry PII scrubbing, and the regression tests.

✅ Actions performed

Full review triggered.

Gradata and others added 2 commits April 14, 2026 23:17
CRITICAL: _claim_event() collapsed all failure modes (missing migration,
DB down, PK conflict) into "return False" which the webhook route acked
as 200 duplicate. A crash in the claim store silently dropped Stripe
events. Split into distinct outcomes via a new ClaimStoreUnavailable
exception — the route now returns 503 on store failures so Stripe
retries, and only returns 200 duplicate on a legitimate race loss.

MAJOR: the stale-claim reclaim UPDATE filtered only on event_id +
claimed_at, so if a worker flipped the row to status='processed'
between our SELECT and UPDATE, we'd reset it back to 'processing'
and re-run the handler. Added status='processing' to the CAS filter.

Tests: added test_claim_store_unavailable_returns_503_not_duplicate_ack
covering the CRITICAL regression. Updated the race-loss test to seed
the winner's active claim row (since _already_processed lookup failure
no longer silently returns False — the follow-up claim path handles it).

All 143 cloud tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

Addressed the latest CR blockers in 616cc99.

CRITICAL (billing.py:245) — fixed. _claim_event() no longer conflates "store unavailable" with "race loss". A new ClaimStoreUnavailable exception is raised when the INSERT + follow-up SELECT both fail (missing migration, DB outage, vanished row). The webhook route converts that into a 503 so Stripe retries rather than an ambiguous 200 {"duplicate": true}. Only a legitimate concurrent claim returns the duplicate ack now.

MAJOR (billing.py:266) — fixed. The stale-claim reclaim UPDATE now filters on status='processing' in addition to event_id + claimed_at. Previously, if a worker flipped the row to processed between our SELECT and UPDATE, the reclaim could match and reset it back to processing, double-running the handler.

Minor (auth.py:139) — already addressed in 988d947 via the centralised _tag_brain() helper + tag_user() calls placed immediately after JWT / API-key verification in get_current_brain, get_brain_for_request, get_current_user_id, and require_operator. All 403/404 auth paths now carry Sentry identity context.

Tests: added test_claim_store_unavailable_returns_503_not_duplicate_ack guarding the CRITICAL regression. Updated the race-loss test to seed the winner's active claim row. 143/143 cloud tests pass.

Preserved: Stripe signature verification, rate limit defaults, Sentry PII scrubbing — no behavioural changes outside the webhook claim state machine.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cloud/app/auth.py (1)

151-164: ⚠️ Potential issue | 🟡 Minor

Missing Sentry tagging in get_current_user_id_flexible.

This auth dependency authenticates users but doesn't call tag_user, unlike get_current_user_id (line 147) and get_current_brain. If an endpoint using this dependency raises an exception, the Sentry event won't have user context.

🛡️ Proposed fix to add Sentry tagging
 async def get_current_user_id_flexible(
     credentials: HTTPAuthorizationCredentials = Security(_bearer),
 ) -> str:
     """Extract user_id from either a SDK API key or a Supabase JWT.

     Unlike :func:`get_current_brain`, this does NOT require that a brain
     already exists — it only identifies the caller. Use for endpoints that
     should work for brand-new users who haven't created a brain yet.
     """
     cred = credentials.credentials
     if cred.startswith("gd_"):
         brain = await verify_api_key(cred)
-        return brain["user_id"]
-    return await verify_jwt(cred)
+        user_id = brain["user_id"]
+        tag_user(user_id)
+        return user_id
+    user_id = await verify_jwt(cred)
+    tag_user(user_id)
+    return user_id
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/auth.py` around lines 151 - 164, get_current_user_id_flexible is
missing a call to tag_user so Sentry won't have user context for errors; update
the function to call tag_user with the resolved user_id before returning in both
branches (after verify_api_key(cred) and after verify_jwt(cred)) — e.g., when
cred startswith "gd_" extract user_id from brain = await verify_api_key(cred),
call tag_user(brain["user_id"]) then return brain["user_id"], and in the JWT
path do id = await verify_jwt(cred); tag_user(id); return id. Ensure you import
or reference the existing tag_user function used by
get_current_user_id/get_current_brain.
cloud/tests/test_billing.py (1)

250-250: 🧹 Nitpick | 🔵 Trivial

Remove unnecessary if True: statement.

This appears to be a leftover artifact. The if True: block doesn't affect execution but adds visual noise.

🧹 Proposed cleanup
         mock_supabase.update = _capture_update  # type: ignore[method-assign]
-        if True:
-            event = {
-                "type": "customer.subscription.updated",
+        event = {
+            "type": "customer.subscription.updated",

Also applies to lines 283 and 307.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/tests/test_billing.py` at line 250, Remove the redundant "if True:"
blocks that were left as artifacts (they appear around the test assertions in
the billing test file) — locate each standalone "if True:" and its indented
block and simply unindent its body into the surrounding test scope (delete the
"if True:" line and adjust indentation), ensuring test logic and fixtures remain
identical; repeat this cleanup for the other occurrences of the same pattern so
no "if True:" statements remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/app/routes/billing.py`:
- Around line 303-312: The updates to the processed_webhooks row currently pass
the literal string "now()" (in the db.update calls that set claimed_at and
processed_at), which results in the string 'now()' being stored; change these to
real ISO timestamps by importing datetime and timezone and setting claimed_at
and processed_at to datetime.now(timezone.utc).isoformat() when building the
data dicts for the db.update calls (refer to the db.update calls that set
data={"status":"processing","claimed_at":...} and the later update that sets
processed_at); ensure the datetime import is added at the top of the module.

---

Outside diff comments:
In `@cloud/app/auth.py`:
- Around line 151-164: get_current_user_id_flexible is missing a call to
tag_user so Sentry won't have user context for errors; update the function to
call tag_user with the resolved user_id before returning in both branches (after
verify_api_key(cred) and after verify_jwt(cred)) — e.g., when cred startswith
"gd_" extract user_id from brain = await verify_api_key(cred), call
tag_user(brain["user_id"]) then return brain["user_id"], and in the JWT path do
id = await verify_jwt(cred); tag_user(id); return id. Ensure you import or
reference the existing tag_user function used by
get_current_user_id/get_current_brain.

In `@cloud/tests/test_billing.py`:
- Line 250: Remove the redundant "if True:" blocks that were left as artifacts
(they appear around the test assertions in the billing test file) — locate each
standalone "if True:" and its indented block and simply unindent its body into
the surrounding test scope (delete the "if True:" line and adjust indentation),
ensuring test logic and fixtures remain identical; repeat this cleanup for the
other occurrences of the same pattern so no "if True:" statements remain.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 154d9612-8f2d-4d7b-a3ce-1237fbf829ba

📥 Commits

Reviewing files that changed from the base of the PR and between 8bfc271 and 616cc99.

📒 Files selected for processing (3)
  • cloud/app/auth.py
  • cloud/app/routes/billing.py
  • cloud/tests/test_billing.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (22)
cloud/app/auth.py (3)

101-111: LGTM! Well-designed centralized Sentry tagging helper.

The _tag_brain function correctly centralizes the tagging logic with appropriate fallbacks: explicit user_id (JWT path) takes precedence, then brain.user_id, then brain.id. The workspace_id resolution handles falsy values correctly by converting empty strings to None.


124-139: LGTM! Early tagging ensures Sentry context is set before potential failures.

Both authentication paths now tag immediately after verification succeeds:

  • API-key path: tags via _tag_brain at line 127
  • JWT path: tags user at line 133 before brain lookup, then enriches with workspace at line 138

This ensures 403/404 errors are properly attributed in Sentry.


231-232: LGTM! Operator path now tags immediately after JWT verification.

Tagging happens before email resolution and domain checks, so operator-denied 403s will carry identity context in Sentry.

cloud/app/routes/billing.py (9)

56-61: LGTM! Rate limiting correctly applied to checkout endpoint.

The @auth_limit decorator is properly placed below the route decorator, and the parameter reorder (request, body, ...) is required for slowapi to access the request object.


129-137: LGTM! Clock skew grace period added for replay protection.

The 30-second grace period on top of the 5-minute max age addresses potential clock drift between Stripe's edge servers and your infrastructure.


140-161: LGTM! Event helpers handle both dict and Stripe.Event objects.

Clean defensive implementation that works regardless of how the event is represented.


170-203: LGTM! _already_processed correctly short-circuits only on completed events.

The function only returns True for status='processed' rows, allowing stale processing claims to be reclaimed by _claim_event. Lookup failures are treated as "not processed" so the real guard in _claim_event can surface ClaimStoreUnavailable if needed.


206-213: LGTM! ClaimStoreUnavailable exception enables proper failure mode handling.

This distinction between "store unavailable" and "legitimate duplicate" allows the webhook handler to return 503 for the former (triggering Stripe retries) rather than incorrectly acknowledging an unprocessed event.


327-348: LGTM! Defensive timestamp parsing with safe default.

The _claim_is_stale function handles various timestamp formats (ISO8601 string, datetime object) and correctly treats unparseable values as non-stale to avoid accidental double-processing.


371-412: LGTM! Webhook replay protection is well-implemented.

The handler correctly:

  1. Rejects events missing created (line 399-403)
  2. Rejects events older than max_age + grace (line 407-412)
  3. Applies the @sensitive_limit rate limiter

414-451: LGTM! Claim-first idempotency with proper failure mode handling.

The flow correctly:

  1. Short-circuits on already-processed events (line 419-421)
  2. Returns 503 on ClaimStoreUnavailable so Stripe retries (line 436-439)
  3. Returns 200 duplicate on claim race loss (line 440-442)
  4. Runs handler only when claim is won
  5. Marks processed after successful handler (line 446-449)

454-462: LGTM! Clear documentation of idempotency responsibility.

The docstring correctly documents that _handle_event trusts its caller to have won the atomic claim and must NOT insert into processed_webhooks.

cloud/tests/test_billing.py (10)

227-238: LGTM! Test helper updated to inject required fields for hardening checks.

The setdefault calls ensure every test event has id and created fields, matching real Stripe event structure while allowing individual tests to override.


265-271: LGTM! Test correctly filters to workspace writes only.

The filter [c for c in captured if c["table"] == "workspaces"] correctly excludes the processed_webhooks status update that happens after the handler, focusing assertions on the actual business logic.


338-352: LGTM! Clear documentation of _post_with vs _post_event distinction.

The comment explains why this helper intentionally omits the setdefault calls, allowing hardening tests precise control over id and created fields.


354-367: LGTM! Replay protection test validates stale event rejection.

The test correctly creates an event 6 minutes old (exceeding the 5min + 30s threshold) and verifies a 400 response with "too old" message.


381-415: LGTM! Duplicate detection test validates short-circuit behavior.

The test seeds a status='processed' row and verifies the handler doesn't run for duplicate events.


417-466: LGTM! Claim-first flow test validates INSERT then UPDATE lifecycle.

The test verifies:

  1. Exactly one claim INSERT with status='processing'
  2. One completion UPDATE flipping to status='processed'

468-517: LGTM! Claim race loss test validates double-apply prevention.

The test simulates losing the PK conflict race to another worker with an active (recent) claim and verifies the handler doesn't run.


519-573: LGTM! Stale claim reclaim test validates poison-pill recovery.

The test creates a 20-minute-old processing row (exceeding the 10-minute stale window) and verifies:

  1. A reclaim UPDATE occurs with the prior claimed_at in the filter
  2. The handler runs after successful reclaim

575-621: LGTM! Active claim test validates no premature reclaim.

The test creates a 30-second-old processing row (within the 10-minute window) and verifies neither reclaim nor handler execution occurs.


623-671: LGTM! Store unavailable test validates 503 failure mode.

Critical regression test ensuring that when the processed_webhooks store is unreachable, the handler returns 503 (triggering Stripe retries) rather than incorrectly acknowledging the event.

Comment on lines +303 to +312
try:
updated = await db.update(
"processed_webhooks",
data={"status": "processing", "claimed_at": "now()"},
filters={
"event_id": event_id,
"claimed_at": claimed_at_raw,
"status": "processing",
},
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for how the db wrapper handles update data values
# to verify if "now()" is treated as a SQL expression

# Find the db.py file and check update method implementation
fd -t f 'db.py' --exec cat {}

# Also check if there are any tests that verify "now()" handling
rg -n "now\(\)" --type py -C 3

Repository: Gradata/gradata

Length of output: 22208


🏁 Script executed:

#!/bin/bash
# Search for processed_webhooks table definition/schema
rg -n "processed_webhooks" --type py -A 5 -B 5

# Search for table creation SQL or migrations
fd -t f -e sql -e py | xargs grep -l "processed_webhooks" 2>/dev/null || echo "No SQL files found"

# Look for tests related to webhook processing
fd -t f 'test*.py' | xargs grep -l "processed_webhooks" 2>/dev/null || echo "No tests found"

Repository: Gradata/gradata

Length of output: 22184


🏁 Script executed:

#!/bin/bash
# Read the migration file for processed_webhooks table
cat cloud/migrations/005_processed_webhooks.sql

Repository: Gradata/gradata

Length of output: 2276


🏁 Script executed:

#!/bin/bash
# Search for other timestamp updates in the codebase
rg -n "processed_at|claimed_at|updated_at|created_at" cloud/app/routes/billing.py -A 2 -B 2

# Look for any examples of timestamp handling in updates
rg -n "DEFAULT now()" cloud/migrations/ -A 3 -B 3

# Search for how timestamps are normally passed in data updates
rg -n "data=\{" cloud/app/routes/ -A 5 | head -60

Repository: Gradata/gradata

Length of output: 12820


Replace "now()" string literals with properly formatted timestamps.

The reclaim UPDATE at line 306 and completion UPDATE at line 362 pass "now()" as a string literal to db.update(). Since SupabaseClient.update() sends JSON directly to PostgREST without special function call handling, PostgreSQL will receive the literal string 'now()' instead of the current timestamp, breaking the staleness window logic.

Use the pattern established elsewhere in the codebase (e.g., cloud/app/routes/sync.py:205):

from datetime import datetime, timezone
data={"status": "processing", "claimed_at": datetime.now(timezone.utc).isoformat()}

Apply the same fix to line 362's processed_at update.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/billing.py` around lines 303 - 312, The updates to the
processed_webhooks row currently pass the literal string "now()" (in the
db.update calls that set claimed_at and processed_at), which results in the
string 'now()' being stored; change these to real ISO timestamps by importing
datetime and timezone and setting claimed_at and processed_at to
datetime.now(timezone.utc).isoformat() when building the data dicts for the
db.update calls (refer to the db.update calls that set
data={"status":"processing","claimed_at":...} and the later update that sets
processed_at); ensure the datetime import is added at the top of the module.

@Gradata Gradata merged commit 9075364 into main Apr 15, 2026
10 of 11 checks passed
Gradata added a commit that referenced this pull request Apr 15, 2026
)

* feat(cloud): GDPR endpoints (/me/export, /me/delete, /me/data-summary)

Adds GDPR Article 15 (access) and Article 17 (erasure) endpoints plus a
data summary endpoint for account-settings UIs. Soft-delete cascades via
migration 005 — actual row purge will run as a nightly Railway cron
after the 30-day grace period (TODO tracked in the SQL).

- GET  /me/export: returns inline JSON payload (<10MB), rate-limited to
  1 call/24h via gdpr_export_requests ledger table
- POST /me/delete: 202 Accepted, tombstones user + owned workspaces +
  owned brains, logs a confirmation email (ESP wire-up pending)
- GET  /me/data-summary: counts + oldest/newest timestamps
- auth.py + users.py now filter soft-deleted rows so tombstoned users
  cannot authenticate and tombstoned workspaces disappear from profile
- 13 new tests (UUID fixtures, no PII), 128 total cloud tests passing

* feat(marketing): DPA, SLA, and subprocessors pages (draft)

Adds three enterprise/GDPR compliance pages requested on Day 1 by
enterprise procurement teams. All three are marked DRAFT — pending legal
review at the top of the page so no one mistakes them for
counsel-reviewed contracts.

- /legal/dpa: GDPR Article 28 clauses (definitions, scope, security,
  sub-processors, data-subject rights, breach notification, SCCs)
- /legal/sla: 99.9% uptime target, service-credit table, support
  response matrix by severity, incident communication policy
- /legal/subprocessors: Railway, Supabase, Stripe, Cloudflare, Sentry
  with purpose, data categories, and location per subprocessor
- Footer now links DPA/SLA/Subprocessors alongside Privacy + Terms

* feat: security.txt (RFC 9116) + incident response runbook

- marketing/public/.well-known/security.txt: contact security@gradata.ai,
  90-day coordinated-disclosure window, scoped to gradata.ai apex + api +
  app + SDK repo, expires 2027-04-13 (12 months out).
- cloud/RUNBOOK-INCIDENT.md: step-by-step for the top 5 incidents we
  expect — api.gradata.ai 500s, Railway outage, Supabase pool exhaustion,
  Stripe webhook backlog, GDPR endpoint misbehavior. Includes severity
  table, communication templates, and escalation contacts.

* chore(cloud): rename migration 005_soft_delete -> 006_soft_delete

PR #33 (API security hardening) introduced 005_processed_webhooks.sql.
PR #33 has been merged and 005_processed_webhooks already applied to prod Supabase.
Renaming this migration to 006 to resolve the collision.

Co-Authored-By: Gradata <noreply@gradata.ai>

* chore: update docstring reference to 006_soft_delete.sql

Co-Authored-By: Gradata <noreply@gradata.ai>

* chore: address CodeRabbit feedback on PR #36

- Draft DPA + subprocessors pages: noindex until legal sign-off (Major)
- DPA: soften deletion language to match implementation — nightly
  purge cron is still out of scope, so we now describe the 30-day
  window as a target via the scheduled purge workflow (Major)
- Runbook: remove destructive `DELETE FROM gdpr_export_requests`
  instruction — the DSR ledger is an audit trail and must stay
  immutable. Replaced with an operator-approved, time-boxed override
  procedure requiring approver + reason + follow-up ticket (Major)
- GDPR export: shrink TOCTOU window by inserting the rate-limit
  ledger row BEFORE generating the payload, not after (Minor)
- security.txt: "open source" -> "open-source" compound adjective (Minor)

Declined (with reason):
- N+1 queries in data-summary / export / delete: the current PostgREST
  `db.select` wrapper only supports `eq` filters, no `IN` and no
  transactions. Batching needs a db-layer change tracked separately.
- Test fixture defensive defaults: MockSupabaseClient already uses
  defaultdict, so unregistered tables return [] — no fix needed.
- Supabase admin email fallback in delete_me: no admin client wired
  yet; current defensive None-handling is sufficient and the
  confirmation email itself is still a stub.

Co-Authored-By: Gradata <noreply@gradata.ai>

* chore: address CodeRabbit round-2 feedback on PR #36

- gdpr.py: idempotency guard on /me/delete — repeat calls return existing
  ledger state instead of resetting the 30-day purge window or re-cascading.
  Reuses the pre-cascade users fetch for the email lookup (one fewer round-trip).
- 006_soft_delete.sql: wrap policy creation in DROP POLICY IF EXISTS so the
  migration is re-runnable (Postgres has no CREATE POLICY IF NOT EXISTS).
- subprocessors/page.tsx: hyphenate "open-source" as a compound adjective.
- Add inline notes flagging the tracked follow-ups for the previously
  declined items (cascade N+1 needs db.select IN-support; admin email needs
  ESP wired before _send_deletion_confirmation can dispatch).
- New test: test_delete_is_idempotent_when_already_soft_deleted.

Co-Authored-By: Gradata <noreply@gradata.ai>

* chore: address CodeRabbit round-3 feedback on PR #36 (rate-limit query)

Flag the _enforce_export_rate_limit full-scan as a db-layer follow-up.
SupabaseClient.select currently only accepts eq filters — no order_by /
limit support — so narrowing the query requires extending the wrapper.
Leaving a TODO to track rather than scoping the db-layer change into
this PR.

* chore: address CodeRabbit round-4 on PR #36; reorder cascade + best-effort rollback

- Guard /me/data-summary and /me/export against soft-deleted accounts
  (JWT stays valid until TTL; add users.deleted_at IS NULL check, 404 on deleted).
- Don't burn the 24h export quota on failed exports: move the
  gdpr_export_requests ledger insert to AFTER size check passes so 507
  oversized-payload rejections don't block the user from retrying.
- Strip raw user_id and email from GDPR info logs (log counts/has_email only)
  since these logs are long-lived and compliance-sensitive.
- Reorder soft-delete cascade to tombstone subordinate resources
  (workspaces, brains) BEFORE the users row, so mid-cascade failures leave
  the account recoverable rather than orphaned-but-tombstoned. Wrap in
  try/except with best-effort rollback of partial subordinate updates and
  raise HTTP 500 so the client retries. Proper transactional support is
  tracked as a follow-up (requires db-wrapper changes).
- Add tests for: soft-deleted-user read block, ledger-row-only-on-success,
  cascade subordinate-first ordering, mid-cascade rollback.

Co-Authored-By: Gradata <noreply@gradata.ai>

* refactor(cloud): simplify pass on gdpr-compliance branch

- cloud/app/routes/gdpr.py: extract _collect_brain_children() to dedupe
  the four-table per-brain fetch (used by both /me/data-summary and
  /me/export), and parallelize each table x brain select via
  asyncio.gather. Merge _min_created_at / _max_created_at into a single
  _created_at_range() helper. Compliance-critical preserved untouched:
  _require_active_user 404 guard, ledger-insert-after-size-check,
  PII-scrubbed logs, cascade reorder + best-effort rollback.
- marketing/src/components/LegalPageHeader.tsx (new): shared header
  + LegalDraftBanner for the three new legal pages (DPA, SLA,
  subprocessors) — drops ~60 lines of duplicated markup.
- All 135 cloud tests pass. ruff/pyright clean vs pre-existing baseline.
- Marketing `next build` green; all five /legal/* routes prerender.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(review): address CodeRabbit feedback round 5 on GDPR

Collapse GDPR soft-delete cascade from N+1 per-row PATCHes to 3 bulk
PATCHes using PostgREST id=in.(...). This addresses the recurring
CodeRabbit comment about the cascade being non-atomic: we still can't
use a true transaction (SupabaseClient has no transactional API), but
we shrink the mid-cascade failure window to three single round-trips
and preserve the subordinates-first ordering + best-effort rollback.

- SupabaseClient.update now accepts list/tuple/set filter values and
  emits PostgREST in.(...) syntax. Empty list filters are a no-op
  (return []) so we never fire an unfiltered PATCH that would rewrite
  every row in the table.
- gdpr.delete_me cascades via 3 writes (workspaces, brains, users)
  instead of N+1 and rolls back with at most 2 bulk PATCHes on failure.
- MockSupabaseClient.update mirrors the empty-list-filter no-op so
  tests reflect prod semantics.
- test_delete_rolls_back_partial_cascade_on_mid_flight_failure updated
  to assert bulk-filter rollback shape.

135/135 cloud tests pass.

* fix(legal): address round-6 CodeRabbit review

- SLA page: noindex/nofollow while draft (robots metadata)
- SLA tables: add <caption> + scope="col" for a11y
- DPA: use full /api/v1/me/export and /api/v1/me/delete paths

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(db): refuse unfiltered update/delete (CR round-7)

CodeRabbit flagged that db.update() could issue a full-table PATCH when
filters=None or {}. Now raises HTTPException 500 with a scrubbed message
before touching PostgREST. Applied the same guard to db.delete() for
defense-in-depth. Mock client mirrors the new contract so tests exercise
the guard path.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(cloud/db): defense-in-depth guard + tests for unfiltered PATCH/DELETE

Round-9 response to CodeRabbit critical finding on PR #36. The primary
guard (`if not filters: raise`) was already present from earlier rounds;
this commit adds:

- Post-transform empty-params guard in delete() mirroring update()
- Dedicated test_db_guards.py with 5 tests proving update() and delete()
  never fire an HTTP call when filters is None or {}, using a fake http
  client that asserts on any call

---------

Co-authored-by: Oliver Le <oliver@gradata.com>
Co-authored-by: Gradata <noreply@gradata.ai>
@Gradata Gradata deleted the feat/api-security-hardening branch April 15, 2026 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant