feat: GDPR data endpoints + DPA/SLA/security.txt + incident runbook#36
Conversation
Adds GDPR Article 15 (access) and Article 17 (erasure) endpoints plus a data summary endpoint for account-settings UIs. Soft-delete cascades via migration 005 — actual row purge will run as a nightly Railway cron after the 30-day grace period (TODO tracked in the SQL). - GET /me/export: returns inline JSON payload (<10MB), rate-limited to 1 call/24h via gdpr_export_requests ledger table - POST /me/delete: 202 Accepted, tombstones user + owned workspaces + owned brains, logs a confirmation email (ESP wire-up pending) - GET /me/data-summary: counts + oldest/newest timestamps - auth.py + users.py now filter soft-deleted rows so tombstoned users cannot authenticate and tombstoned workspaces disappear from profile - 13 new tests (UUID fixtures, no PII), 128 total cloud tests passing
Adds three enterprise/GDPR compliance pages requested on Day 1 by enterprise procurement teams. All three are marked DRAFT — pending legal review at the top of the page so no one mistakes them for counsel-reviewed contracts. - /legal/dpa: GDPR Article 28 clauses (definitions, scope, security, sub-processors, data-subject rights, breach notification, SCCs) - /legal/sla: 99.9% uptime target, service-credit table, support response matrix by severity, incident communication policy - /legal/subprocessors: Railway, Supabase, Stripe, Cloudflare, Sentry with purpose, data categories, and location per subprocessor - Footer now links DPA/SLA/Subprocessors alongside Privacy + Terms
- marketing/public/.well-known/security.txt: contact security@gradata.ai, 90-day coordinated-disclosure window, scoped to gradata.ai apex + api + app + SDK repo, expires 2027-04-13 (12 months out). - cloud/RUNBOOK-INCIDENT.md: step-by-step for the top 5 incidents we expect — api.gradata.ai 500s, Railway outage, Supabase pool exhaustion, Stripe webhook backlog, GDPR endpoint misbehavior. Includes severity table, communication templates, and escalation contacts.
There was a problem hiding this comment.
Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough
WalkthroughAdds GDPR endpoints (data-summary, export with 24h rate-limit and 10MB inline cap, delete scheduling with 30-day purge), Pydantic response models, DB soft-delete and export ledger schema with RLS, soft-delete-aware auth/profile filtering, Supabase write-guards and test mock alignment, tests, incident runbook, legal marketing pages, footer links, and security.txt. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API as GDPR API
participant DB
participant Supabase
rect rgba(200,200,255,0.5)
Client->>API: GET /api/v1/me/export (auth)
API->>DB: query gdpr_export_requests for recent entries
alt recent request within 24h
DB-->>API: recent row
API-->>Client: 429 Too Many Requests
else allowed
API->>Supabase: fetch workspaces, brains, corrections, lessons, events, meta_rules (include soft-deleted)
Supabase-->>API: aggregated rows
API->>API: serialize JSON, compute size
alt size ≤ 10MB
API->>DB: insert gdpr_export_requests ledger row
DB-->>API: ack
API-->>Client: 200 JSON export (inline)
else size > 10MB
API-->>Client: 507 Payload Too Large
end
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Deploying gradata-dashboard with
|
| Latest commit: |
d9f0d83
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://3574eadf.gradata-dashboard.pages.dev |
| Branch Preview URL: | https://feat-gdpr-compliance.gradata-dashboard.pages.dev |
There was a problem hiding this comment.
Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
Co-Authored-By: Gradata <noreply@gradata.ai>
There was a problem hiding this comment.
Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 173-209: The TOCTOU issue is that
_enforce_export_rate_limit(user_id) and the subsequent db.insert into
"gdpr_export_requests" are not atomic, allowing concurrent requests to bypass
the limit; fix by making the check-and-record atomic: either perform an
insert-first/upsert with a uniqueness constraint (e.g., on (user_id,
date_bucket)) and rely on ON CONFLICT/ignore to detect prior exports, or wrap
the logic in a DB transaction that locks the user's export ledger row before
checking and inserting; modify the code paths around _enforce_export_rate_limit
and the db.insert("gdpr_export_requests", ...) call so the DB enforces
uniqueness/transactional semantics rather than trusting separate read-then-write
steps.
- Around line 273-289: The cascade currently issues N individual db.update calls
for each workspace/brain via _collect_user_workspaces and _collect_user_brains
causing N+1 writes and potential partial soft-delete; instead collect the IDs
from owned_workspaces/owned_brains and perform a single bulk update (e.g.,
update WHERE id IN [...], setting deleted_at to now_iso) for the "workspaces"
and "brains" tables, and wrap both bulk updates and the parent user soft-delete
in a single transaction using your db.transaction/db.with_transaction API (or
fallback to a single raw SQL update if bulk update helper is unavailable) to
ensure atomicity and avoid partial failures.
- Around line 104-108: The loop over brain_ids issues an N+1 pattern by calling
db.select per brain for corrections, lessons, events, and meta_rules; replace
this with four batched queries that fetch all rows for those tables in one call
each using an IN-style filter on brain_id (e.g., use db.select("corrections",
filters={"brain_id": <IN brain_ids>}) etc.), then extend all_corrections,
all_lessons, all_events, and all_meta_rules with the results; update code that
references brain_ids/ db.select to use the batched results and remove the
per-bid selects.
- Around line 291-296: The shadow users upsert doesn't populate email, so
db.select("users", ...) can return None and _send_deletion_confirmation(user_id,
email) may be called without an address; modify the flow to fall back to the
Supabase admin auth API when email is None: after the db.select call, if email
is still None call the Supabase admin user lookup (e.g., use the project admin
client to fetch auth.users by user_id or call the admin auth API/get_user_by_id)
to retrieve the user's email and assign it before calling
_send_deletion_confirmation; keep the existing db.select fallback but ensure the
Supabase admin lookup runs only when email is missing.
In `@cloud/RUNBOOK-INCIDENT.md`:
- Around line 158-160: Replace the instruction to delete rows from the
gdpr_export_requests ledger with a non-destructive operator-override process:
remove the DELETE guidance and instead document an explicit override workflow
(who can approve, required approver field, reason field, and where to record
approval) and an immutable marker/update action (e.g., an update flag or
separate audit table) that preserves the original ledger row; reference the
gdpr_export_requests ledger name and ensure the runbook describes how to record
the approver, timestamp, and justification for any override so auditability and
compliance are maintained.
In `@cloud/tests/test_gdpr.py`:
- Around line 36-70: The gdpr_client fixture's inner function _make should
register safe default empty responses on mock_supabase for tables that some
tests omit (at minimum "corrections", "lessons", "events", "meta_rules") so
unregistered-table access won't break tests; update _make to call
mock_supabase.add_response("corrections","select", []) etc. when the
corresponding parameter is None (keep the existing conditional pattern used for
workspaces/brains/users) so tests that don't pass those lists still receive an
empty result.
In `@marketing/app/legal/dpa/page.tsx`:
- Line 9: The robots meta currently enables indexing via the object robots: {
index: true, follow: true }; change it to prevent indexing for the draft DPA by
setting index to false (e.g., robots: { index: false, follow: true } or
equivalent) so the draft page is noindexed until legal approval; update the
robots object where it is defined in this page (the robots export/object in
marketing/app/legal/dpa/page.tsx) to reflect the noindex state.
- Around line 126-127: Update the DPA language to avoid asserting automatic
purging within 30 days: replace the phrases "is then purged" and any statement
like "will delete … within 30 days" with wording that reflects current
implementation (e.g., data enters a 30-day soft-delete retention period and is
eligible for purge but nightly purge cron is out of scope/subject to operational
processes), and apply the same change to the similar occurrences around lines
referenced (the other instance at 168-169) so the text no longer guarantees an
automated purge/deletion timeline.
In `@marketing/app/legal/subprocessors/page.tsx`:
- Line 9: The page currently sets robots: { index: true, follow: true } but the
page content contains the “DRAFT — pending legal review.” marker; change the
robots metadata to prevent indexing until sign-off by setting index to false
(e.g., robots: { index: false, follow: true }) or implement a conditional around
the robots object that flips index to false when the draft marker is present;
update the robots/index value in page.tsx (the robots object) so drafts are
noindexed.
In `@marketing/public/.well-known/security.txt`:
- Line 14: Change the phrase "- The Gradata open source SDK
(github.com/Gradata/gradata)" to use the compound adjective "open-source" (i.e.,
"open-source SDK") so the line reads "- The Gradata open-source SDK
(github.com/Gradata/gradata)" to match standard usage.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fb67bf1e-e7ee-4acd-9939-91bc8a62e8cd
📒 Files selected for processing (13)
cloud/RUNBOOK-INCIDENT.mdcloud/app/auth.pycloud/app/models.pycloud/app/routes/__init__.pycloud/app/routes/gdpr.pycloud/app/routes/users.pycloud/migrations/005_soft_delete.sqlcloud/tests/test_gdpr.pymarketing/app/legal/dpa/page.tsxmarketing/app/legal/sla/page.tsxmarketing/app/legal/subprocessors/page.tsxmarketing/public/.well-known/security.txtmarketing/src/components/Footer.tsx
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cloudflare Pages
🧰 Additional context used
🪛 LanguageTool
marketing/public/.well-known/security.txt
[grammar] ~14-~14: Use a hyphen to join words.
Context: ...i, api.gradata.ai # - The Gradata open source SDK (github.com/Gradata/gradata) ...
(QB_NEW_EN_HYPHEN)
🔇 Additional comments (9)
cloud/app/routes/__init__.py (1)
13-13: GDPR router integration looks correct.Import and registration are consistent with the existing router aggregation pattern.
Also applies to: 36-36
marketing/src/components/Footer.tsx (1)
35-37: Legal footer link additions are clean and consistent.The new DPA/SLA/Subprocessors entries follow the same pattern as existing legal links.
cloud/app/routes/users.py (1)
37-38: Soft-delete filtering is correctly applied in both profile paths.Good consistency between
get_profileandupdate_profilewhen excluding tombstoned workspaces from response payloads.Also applies to: 84-85
cloud/app/auth.py (1)
48-50: Auth-path soft-delete enforcement is a strong correctness/security improvement.The new filters consistently prevent tombstoned brains from authenticating or passing ownership checks.
Also applies to: 118-120, 135-137
marketing/app/legal/sla/page.tsx (1)
1-122: LGTM!The SLA page is well-structured with proper semantic HTML (
<article>,<header>,<section>), accessible table markup, and appropriate Next.js metadata configuration. The draft banner is clearly visible, and the Tailwind v4 CSS variable syntax is correctly applied.cloud/app/models.py (1)
326-376: LGTM!The GDPR models are well-designed with clear docstrings, appropriate type hints, and sensible defaults. The
DeleteAccountResponsemessage provides clear user guidance including the contact email for cancellation.cloud/migrations/005_soft_delete.sql (2)
41-48: LGTM!The partial indexes are well-designed—indexing only rows where
deleted_at IS NOT NULLkeeps the indexes small and focused on the tombstoned records that the purge cron will query.
29-35: File reference error and misunderstanding of authentication model: RLS is not applicable here.The file referenced is
006_soft_delete.sql(not 005). More importantly, theusersshadow table does not need RLS because the app uses Supabase's service-role authentication (viasettings.supabase_service_keyindb.py). Service-role keys bypass RLS entirely in Supabase.The migration file explicitly documents this design choice (lines 8-10):
"Existing RLS policies SHOULD filter deleted_at IS NULL; this migration adds a convenience view but does NOT rewrite policies. App-side queries must filter on their own (enforced in app/db + route handlers)."
The intended protection is app-side filtering, not RLS. This is evident in the codebase:
gdpr.py(lines 265, 293) andauth.py(line 171) handle tombstone filtering at the application level, which is the correct approach for service-role authenticated clients. Thegdpr_export_requeststable has RLS because it requires direct user isolation (the policy usesauth.uid()), but theuserstable is already protected by app-side logic sufficient for service-role access.> Likely an incorrect or invalid review comment.cloud/tests/test_gdpr.py (1)
1-336: LGTM!Comprehensive test coverage for all three GDPR endpoints including:
- Authentication enforcement
- Correct counting and date-range calculation for data-summary
- Export payload structure and rate-limiting behavior
- Soft-delete inclusion in exports / exclusion in summaries
- Delete cascade and idempotency
- Backward compatibility checks
The UUID-based fixtures appropriately avoid PII in test output.
| for bid in brain_ids: | ||
| all_corrections.extend(await db.select("corrections", filters={"brain_id": bid})) | ||
| all_lessons.extend(await db.select("lessons", filters={"brain_id": bid})) | ||
| all_events.extend(await db.select("events", filters={"brain_id": bid})) | ||
| all_meta_rules.extend(await db.select("meta_rules", filters={"brain_id": bid})) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
N+1 query pattern: consider batching per-brain queries.
Each brain triggers 4 separate database queries (corrections, lessons, events, meta_rules). For users with many brains, this could degrade performance. Consider batching with an IN clause on brain_id.
♻️ Suggested approach
- for bid in brain_ids:
- all_corrections.extend(await db.select("corrections", filters={"brain_id": bid}))
- all_lessons.extend(await db.select("lessons", filters={"brain_id": bid}))
- all_events.extend(await db.select("events", filters={"brain_id": bid}))
- all_meta_rules.extend(await db.select("meta_rules", filters={"brain_id": bid}))
+ # Batch query all records for user's brains in single calls
+ if brain_ids:
+ all_corrections = await db.select("corrections", filters={"brain_id": ("in", brain_ids)})
+ all_lessons = await db.select("lessons", filters={"brain_id": ("in", brain_ids)})
+ all_events = await db.select("events", filters={"brain_id": ("in", brain_ids)})
+ all_meta_rules = await db.select("meta_rules", filters={"brain_id": ("in", brain_ids)})Note: Exact syntax depends on your db.select API's support for IN clauses.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/app/routes/gdpr.py` around lines 104 - 108, The loop over brain_ids
issues an N+1 pattern by calling db.select per brain for corrections, lessons,
events, and meta_rules; replace this with four batched queries that fetch all
rows for those tables in one call each using an IN-style filter on brain_id
(e.g., use db.select("corrections", filters={"brain_id": <IN brain_ids>}) etc.),
then extend all_corrections, all_lessons, all_events, and all_meta_rules with
the results; update code that references brain_ids/ db.select to use the batched
results and remove the per-bid selects.
| await _enforce_export_rate_limit(user_id) | ||
| db = get_db() | ||
|
|
||
| workspaces = await _collect_user_workspaces(user_id, include_deleted=True) | ||
| brains = await _collect_user_brains(user_id, include_deleted=True) | ||
| brain_ids = [b["id"] for b in brains] | ||
|
|
||
| corrections: list[dict] = [] | ||
| lessons: list[dict] = [] | ||
| events: list[dict] = [] | ||
| meta_rules: list[dict] = [] | ||
| for bid in brain_ids: | ||
| corrections.extend(await db.select("corrections", filters={"brain_id": bid})) | ||
| lessons.extend(await db.select("lessons", filters={"brain_id": bid})) | ||
| events.extend(await db.select("events", filters={"brain_id": bid})) | ||
| meta_rules.extend(await db.select("meta_rules", filters={"brain_id": bid})) | ||
|
|
||
| payload: dict[str, Any] = { | ||
| "schema_version": 1, | ||
| "user_id": user_id, | ||
| "generated_at": _iso(_utcnow()), | ||
| "workspaces": workspaces, | ||
| "brains": brains, | ||
| "corrections": corrections, | ||
| "lessons": lessons, | ||
| "meta_rules": meta_rules, | ||
| "events": events, | ||
| } | ||
|
|
||
| serialized = json.dumps(payload, default=str) | ||
| size_bytes = len(serialized.encode("utf-8")) | ||
|
|
||
| # Log the export request for rate-limit enforcement on the next call. | ||
| await db.insert( | ||
| "gdpr_export_requests", | ||
| {"user_id": user_id, "created_at": _iso(_utcnow())}, | ||
| ) |
There was a problem hiding this comment.
TOCTOU race: concurrent requests can bypass rate limit.
The rate limit check (Line 173) and the ledger insert (Lines 206-209) are not atomic. Two concurrent requests could both pass the check before either inserts a record.
This is low-severity since the 24h window limits abuse, but consider using an INSERT with conflict handling or a transaction to make the rate limit atomic.
🛡️ Suggested fix using INSERT-first approach
async def export_me(
user_id: str = Depends(get_current_user_id),
) -> DataExportResponse:
- await _enforce_export_rate_limit(user_id)
db = get_db()
+
+ # Atomically check-and-insert: if recent record exists, fail; otherwise insert
+ now = _utcnow()
+ cutoff = now - EXPORT_RATE_WINDOW
+ # Check for recent exports first
+ rows = await db.select(
+ "gdpr_export_requests",
+ columns="id,created_at",
+ filters={"user_id": user_id},
+ )
+ for r in rows:
+ created = r.get("created_at")
+ # ... existing parsing logic ...
+ if created_dt >= cutoff:
+ raise HTTPException(status_code=429, detail="...")
+
+ # Insert rate limit record BEFORE generating payload
+ await db.insert(
+ "gdpr_export_requests",
+ {"user_id": user_id, "created_at": _iso(now)},
+ )
workspaces = await _collect_user_workspaces(user_id, include_deleted=True)
# ... rest of export logic ...
-
- # Log the export request for rate-limit enforcement on the next call.
- await db.insert(
- "gdpr_export_requests",
- {"user_id": user_id, "created_at": _iso(_utcnow())},
- )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/app/routes/gdpr.py` around lines 173 - 209, The TOCTOU issue is that
_enforce_export_rate_limit(user_id) and the subsequent db.insert into
"gdpr_export_requests" are not atomic, allowing concurrent requests to bypass
the limit; fix by making the check-and-record atomic: either perform an
insert-first/upsert with a uniqueness constraint (e.g., on (user_id,
date_bucket)) and rely on ON CONFLICT/ignore to detect prior exports, or wrap
the logic in a DB transaction that locks the user's export ledger row before
checking and inserting; modify the code paths around _enforce_export_rate_limit
and the db.insert("gdpr_export_requests", ...) call so the DB enforces
uniqueness/transactional semantics rather than trusting separate read-then-write
steps.
| # Resolve email from the shadow users table if present (best-effort). | ||
| email: str | None = None | ||
| user_rows = await db.select("users", columns="email", filters={"id": user_id}) | ||
| if user_rows: | ||
| email = user_rows[0].get("email") | ||
| await _send_deletion_confirmation(user_id, email) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how users.email gets populated elsewhere in the codebase
rg -n "users.*email" --type py cloud/app/Repository: Gradata/gradata
Length of output: 400
🏁 Script executed:
#!/bin/bash
# Look at the upsert operation in context (lines 264-271 and surrounding code)
sed -n '260,300p' cloud/app/routes/gdpr.py | cat -nRepository: Gradata/gradata
Length of output: 1627
🏁 Script executed:
#!/bin/bash
# Search for all places where users table is updated with email
rg -n "db\.(insert|update|upsert)" cloud/app/routes/gdpr.py -A 5 | head -60Repository: Gradata/gradata
Length of output: 791
🏁 Script executed:
#!/bin/bash
# Check how users.email gets written in the entire codebase
rg -n "users.*email|email.*users" cloud/app/ --type py -B 2 -A 2Repository: Gradata/gradata
Length of output: 1249
🏁 Script executed:
#!/bin/bash
# Find all places where email is written to the users table
rg -n "db\.(insert|update|upsert)" cloud/app/ --type py -B 2 -A 10 | rg -B 5 -A 5 "users" | head -100Repository: Gradata/gradata
Length of output: 2949
🏁 Script executed:
#!/bin/bash
# Look for email population in auth.py and other auth-related flows
cat -n cloud/app/auth.py | head -250Repository: Gradata/gradata
Length of output: 8434
Email field will be None for newly-created shadow rows.
The upsert (lines 264–271) only sets id, deleted_at, and purge_after—not email. If the users row doesn't already exist, the subsequent lookup will return None. While the code defensively handles this, the confirmation email may not be sent. Either ensure the shadow users row is pre-populated with email before this endpoint runs, or remove the email dependency by fetching it from Supabase's auth.users table via the admin API.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/app/routes/gdpr.py` around lines 291 - 296, The shadow users upsert
doesn't populate email, so db.select("users", ...) can return None and
_send_deletion_confirmation(user_id, email) may be called without an address;
modify the flow to fall back to the Supabase admin auth API when email is None:
after the db.select call, if email is still None call the Supabase admin user
lookup (e.g., use the project admin client to fetch auth.users by user_id or
call the admin auth API/get_user_by_id) to retrieve the user's email and assign
it before calling _send_deletion_confirmation; keep the existing db.select
fallback but ensure the Supabase admin lookup runs only when email is missing.
PR #36 (GDPR) uses migration 006_soft_delete.sql. Renaming telemetry migration to 007 to resolve collision. Co-Authored-By: Gradata <noreply@gradata.ai>
- Draft DPA + subprocessors pages: noindex until legal sign-off (Major) - DPA: soften deletion language to match implementation — nightly purge cron is still out of scope, so we now describe the 30-day window as a target via the scheduled purge workflow (Major) - Runbook: remove destructive `DELETE FROM gdpr_export_requests` instruction — the DSR ledger is an audit trail and must stay immutable. Replaced with an operator-approved, time-boxed override procedure requiring approver + reason + follow-up ticket (Major) - GDPR export: shrink TOCTOU window by inserting the rate-limit ledger row BEFORE generating the payload, not after (Minor) - security.txt: "open source" -> "open-source" compound adjective (Minor) Declined (with reason): - N+1 queries in data-summary / export / delete: the current PostgREST `db.select` wrapper only supports `eq` filters, no `IN` and no transactions. Batching needs a db-layer change tracked separately. - Test fixture defensive defaults: MockSupabaseClient already uses defaultdict, so unregistered tables return [] — no fix needed. - Supabase admin email fallback in delete_me: no admin client wired yet; current defensive None-handling is sufficient and the confirmation email itself is still a stub. Co-Authored-By: Gradata <noreply@gradata.ai>
There was a problem hiding this comment.
Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (4)
cloud/app/routes/gdpr.py (4)
104-108: 🧹 Nitpick | 🔵 TrivialN+1 query pattern: consider batching per-brain queries.
This iterates 4 queries per brain. For users with many brains, this degrades performance. If your
db.selectAPI supportsINfilters, batch these into 4 total queries.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/app/routes/gdpr.py` around lines 104 - 108, The loop issues N+1 queries by calling db.select for each brain_id; instead batch by querying each table once with an IN filter using the full brain_ids list (replace the per-id calls that populate all_corrections, all_lessons, all_events, all_meta_rules). Modify the code around brain_ids and the db.select calls so you run four selects (one per table) with an IN-style filter (e.g., filters that express brain_id IN brain_ids according to your db.select API) and extend the corresponding lists with the combined results.
279-295: 🧹 Nitpick | 🔵 TrivialConsider batching cascade updates and wrapping in a transaction.
Individual updates per workspace/brain create N+1 writes. If one update fails mid-cascade, the result is partial soft-deletion. Consider batching with
INfilters if supported, and wrapping in a transaction for atomicity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/app/routes/gdpr.py` around lines 279 - 295, The current cascade loops call db.update per record (in _collect_user_workspaces and _collect_user_brains) causing N+1 writes and risking partial soft-deletes; change the logic to collect IDs from owned_workspaces and owned_brains and perform a single batched update using an IN-style filter (e.g., filters={"id__in": ids} or db-specific equivalent) to update the "workspaces" and "brains" tables with data={"deleted_at": now_iso}; additionally wrap both batched updates in a database transaction so both cascade updates succeed or fail atomically, keeping references to the existing functions/variables (_collect_user_workspaces, _collect_user_brains, db.update, workspaces, brains, now_iso).
297-302:⚠️ Potential issue | 🟡 MinorEmail will be
Nonefor newly-created shadow rows.The upsert at lines 270-277 only sets
id,deleted_at, andpurge_after— notusersrow didn't exist before, the select here returnsNonefor email, and the deletion confirmation won't have an address.Consider either:
- Pre-populating
users.emailduring signup/auth flows, or- Fetching email from Supabase's
auth.usersvia admin API as a fallback
196-200: 🧹 Nitpick | 🔵 TrivialSame N+1 pattern in export endpoint.
Same issue as
/me/data-summary— 4 queries per brain. Consider batching.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/app/routes/gdpr.py` around lines 196 - 200, The loop over brain_ids causes an N+1 query pattern by calling db.select four times per brain; replace the per-brain selects with a single batched select per table using the full brain_ids list (e.g., use a filter that queries brain_id IN brain_ids) to populate corrections, lessons, events and meta_rules in one query each and eliminate the for bid in brain_ids loop and its db.select calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 253-262: The delete_me endpoint should be made idempotent: before
updating the user's deleted_at and purge_after timestamps, fetch the current
user record (in the delete_me handler) and if deleted_at is already set, return
a DeleteAccountResponse built from the existing deleted_at/purge_after (and skip
cascading updates to workspaces/brains); otherwise proceed to set deleted_at and
purge_after and cascade as currently implemented. Locate the delete_me function
and add the pre-check for user.deleted_at, returning the existing values to
avoid resetting the 30-day purge window on repeated calls.
In `@cloud/migrations/006_soft_delete.sql`:
- Around line 65-66: The CREATE POLICY gdpr_export_self on gdpr_export_requests
is not idempotent and will error if reapplied; modify the migration to first
drop the policy if it exists and then recreate it (use DROP POLICY IF EXISTS
gdpr_export_self; followed by the existing CREATE POLICY gdpr_export_self ON
gdpr_export_requests FOR ALL USING (user_id = auth.uid());) so re-running the
migration is safe—ensure both statements run in the same migration
transaction/schema context as the original CREATE POLICY.
In `@marketing/app/legal/subprocessors/page.tsx`:
- Line 75: Replace the phrase "open source SDK" with the compound adjective
"open-source SDK" in the JSX text that currently reads "hosted Gradata Cloud
service. The open source SDK running locally" in page.tsx; update the string so
the sentence becomes "... hosted Gradata Cloud service. The open-source SDK
running locally" to match security.txt formatting.
---
Duplicate comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 104-108: The loop issues N+1 queries by calling db.select for each
brain_id; instead batch by querying each table once with an IN filter using the
full brain_ids list (replace the per-id calls that populate all_corrections,
all_lessons, all_events, all_meta_rules). Modify the code around brain_ids and
the db.select calls so you run four selects (one per table) with an IN-style
filter (e.g., filters that express brain_id IN brain_ids according to your
db.select API) and extend the corresponding lists with the combined results.
- Around line 279-295: The current cascade loops call db.update per record (in
_collect_user_workspaces and _collect_user_brains) causing N+1 writes and
risking partial soft-deletes; change the logic to collect IDs from
owned_workspaces and owned_brains and perform a single batched update using an
IN-style filter (e.g., filters={"id__in": ids} or db-specific equivalent) to
update the "workspaces" and "brains" tables with data={"deleted_at": now_iso};
additionally wrap both batched updates in a database transaction so both cascade
updates succeed or fail atomically, keeping references to the existing
functions/variables (_collect_user_workspaces, _collect_user_brains, db.update,
workspaces, brains, now_iso).
- Around line 196-200: The loop over brain_ids causes an N+1 query pattern by
calling db.select four times per brain; replace the per-brain selects with a
single batched select per table using the full brain_ids list (e.g., use a
filter that queries brain_id IN brain_ids) to populate corrections, lessons,
events and meta_rules in one query each and eliminate the for bid in brain_ids
loop and its db.select calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a0718f08-5da6-45be-8e96-6d84871b6587
📒 Files selected for processing (6)
cloud/RUNBOOK-INCIDENT.mdcloud/app/routes/gdpr.pycloud/migrations/006_soft_delete.sqlmarketing/app/legal/dpa/page.tsxmarketing/app/legal/subprocessors/page.tsxmarketing/public/.well-known/security.txt
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (12)
cloud/RUNBOOK-INCIDENT.md (2)
158-167: LGTM — ledger immutability guidance correctly updated.The revised guidance preserves the
gdpr_export_requestsaudit trail by requiring documented operator approval instead of direct row deletion. The four-point documentation requirement (approver, reason, window, follow-up ticket) maintains compliance accountability.
1-209: LGTM — comprehensive incident runbook.The runbook provides actionable playbooks for the five most likely incident types, with appropriate severity definitions, communication templates, and escalation paths. The SQL examples for GDPR troubleshooting are well-targeted.
marketing/public/.well-known/security.txt (1)
1-22: LGTM!The security.txt file follows RFC 9116 correctly with all required fields (Contact, Expires, Canonical) and recommended fields (Policy, Acknowledgments, Preferred-Languages). The scope definition is clear, the 90-day disclosure window is reasonable, and the expiry date (~1 year out) is appropriate.
marketing/app/legal/subprocessors/page.tsx (2)
4-11: LGTM — noindex correctly applied for draft page.The robots directive properly prevents indexing while the page is in draft status, with a clear comment explaining the intent. This addresses the compliance concern about indexing unpublished legal content.
21-57: LGTM — well-structured subprocessor data.The typed
Subprocessorinterface and array provide clear, maintainable documentation of third-party vendors. External links correctly userel="noreferrer"for security.marketing/app/legal/dpa/page.tsx (3)
4-11: LGTM — noindex correctly applied for draft DPA.The robots directive properly prevents search engine indexing while the page awaits legal review.
121-133: LGTM — deletion language appropriately hedged.The phrasing "soft-delete window with a target purge horizon of 30 days, executed via a scheduled purge workflow" accurately reflects the implementation status (nightly cron is out of scope) without making binding commitments that aren't yet enforced.
165-173: LGTM — termination deletion language consistent.The "targets deletion" phrasing aligns with Section 9 and avoids overpromising given the pending purge cron implementation.
cloud/migrations/006_soft_delete.sql (2)
20-48: LGTM — schema additions are well-structured.The
IF NOT EXISTSguards make the migration idempotent for table/column/index creation. Partial indexes ondeleted_at IS NOT NULLefficiently support purge queries without bloating the index with active rows.
29-35: RLS is intentional by design — no action needed.The
public.userstable intentionally lacks RLS because all access is backend-only via service role (which bypasses RLS anyway), and app-side filtering is enforced in the db layer and route handlers. This design is explicitly documented in the migration file (lines 8-10): "App-side queries must filter on their own (enforced in app/db + route handlers)." Thegdpr_export_requeststable uses RLS for defensive purposes, but theusersshadow table doesn't require it given the access pattern.cloud/app/routes/gdpr.py (2)
176-186: LGTM — TOCTOU window reduced by moving insert before payload generation.The ledger insert now happens before the expensive data collection, reducing the race window. The comment at lines 176-182 transparently acknowledges the remaining limitation without PostgREST transaction support.
1-52: LGTM — well-documented module with clear contracts.The module docstring clearly explains the three endpoints, rate limiting strategy, inline threshold, and TODOs. Helper functions are appropriately scoped and documented.
- gdpr.py: idempotency guard on /me/delete — repeat calls return existing ledger state instead of resetting the 30-day purge window or re-cascading. Reuses the pre-cascade users fetch for the email lookup (one fewer round-trip). - 006_soft_delete.sql: wrap policy creation in DROP POLICY IF EXISTS so the migration is re-runnable (Postgres has no CREATE POLICY IF NOT EXISTS). - subprocessors/page.tsx: hyphenate "open-source" as a compound adjective. - Add inline notes flagging the tracked follow-ups for the previously declined items (cascade N+1 needs db.select IN-support; admin email needs ESP wired before _send_deletion_confirmation can dispatch). - New test: test_delete_is_idempotent_when_already_soft_deleted. Co-Authored-By: Gradata <noreply@gradata.ai>
There was a problem hiding this comment.
Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 129-160: In _enforce_export_rate_limit, avoid selecting all rows
by changing the db.select call for "gdpr_export_requests" to fetch only the most
recent record (use order by created_at DESC and limit 1, or the equivalent
supported by your db.select API) and then run the same datetime parsing and
cutoff comparison against that single row; if db.select doesn't support
order/limit, replace it with a query method that does (or request the
max(created_at) for the user) so only the latest export timestamp is retrieved
and checked.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3b17cfe8-3216-4b83-ade2-2208aad2df8e
📒 Files selected for processing (4)
cloud/app/routes/gdpr.pycloud/migrations/006_soft_delete.sqlcloud/tests/test_gdpr.pymarketing/app/legal/subprocessors/page.tsx
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test (3.12)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (19)
marketing/app/legal/subprocessors/page.tsx (3)
4-11: Draft metadata is correctly configured for non-indexing.Line 10 correctly keeps this draft page out of search indexing, and the canonical URL is consistent with the legal route structure.
73-79: Draft notice copy and legal cross-link look good.The banner language is clear, and the
/legal/dpa/link is consistent with the related legal page path.
84-113: Subprocessor list rendering is clean and maintainable.Typed data + mapped rendering with semantic
dl/dt/ddoutput is solid and easy to extend.cloud/app/routes/gdpr.py (7)
104-108: N+1 query pattern for per-brain data.Each brain triggers 4 separate queries. This was flagged in a previous review.
196-200: Same N+1 pattern as/me/data-summary.This duplicates the per-brain loop pattern flagged in the previous review for data summary.
297-316: N+1 cascade updates acknowledged with follow-up tracking.The inline NOTE documents the per-row UPDATE pattern as a known limitation. This was flagged in a previous review.
318-321: Email will beNonefor users without pre-existing shadow row.The email lookup relies on
existing_rowswhich is empty for first-time deletions. This was flagged in a previous review.
176-186: Acknowledged TOCTOU mitigation — good defensive comment.Moving the ledger insert before payload generation shrinks the race window. The inline comment clearly documents the trade-off and why true atomicity isn't achievable with the current DB wrapper.
270-285: LGTM — idempotency guard prevents purge window reset.Repeat calls now return the existing
deleted_at/purge_afterwithout re-cascading, addressing the previous review feedback.
226-235: LGTM — clear error handling for oversized exports.The 507 response with actionable contact information is appropriate until signed-URL delivery is implemented.
cloud/tests/test_gdpr.py (6)
35-70: Missing default empty responses for some tables may cause fragile tests.The
gdpr_clientfixture only registers mock responses for explicitly provided parameters. Tests liketest_delete_returns_202_and_schedules_purgedon't providecorrections,lessons,events, ormeta_rules. Ifmock_supabasedoesn't gracefully handle unregistered table access, these tests could break unexpectedly.This was noted in a previous review.
78-97: LGTM — auth enforcement tests cover all three endpoints.Testing for both 401 and 403 is appropriate since FastAPI's
HTTPBearerbehavior can vary.
105-169: LGTM — comprehensive coverage for data summary endpoint.Tests verify counts, timestamp calculations, and soft-delete exclusion.
177-265: LGTM — thorough export endpoint test coverage.Tests cover inline payload delivery, rate limiting (both directions), ledger insertion, and soft-deleted row inclusion for GDPR transparency.
321-350: LGTM — idempotency test correctly verifies no re-cascade.The test confirms that repeat
/me/deletecalls return existing tombstone values and don't insert a new record, protecting the 30-day purge window.
359-368: LGTM — sanity check for backward compatibility.This confirms that the
/workspaces/{id}/membersendpoint remains functional after the soft-delete filtering changes were introduced elsewhere.cloud/migrations/006_soft_delete.sql (3)
65-69: LGTM — idempotency guard added for policy.The
DROP POLICY IF EXISTSbeforeCREATE POLICYaddresses the previous review feedback and makes the migration re-runnable.
29-35: Shadowuserstable is only populated on-demand — confirm this is intentional.The
public.userstable is created here but no trigger populates it on user signup (the existinghandle_new_user()in001_initial_schema.sqlonly createsworkspacesandworkspace_members). This means for most users, the row won't exist until they call/me/delete.This lazy approach is valid, but note that the
NULLfor newly-created tombstone rows since the upsert ingdpr.pydoesn't populate it. The confirmation email stub in the route will receiveNone.
41-48: LGTM — partial indexes are efficient for soft-delete queries.The
WHERE deleted_at IS NOT NULLpartial indexes will only index tombstoned rows, keeping the indexes small while enabling fast purge-candidate lookups.
| async def _enforce_export_rate_limit(user_id: str) -> None: | ||
| """Raise 429 if the user already exported within EXPORT_RATE_WINDOW.""" | ||
| db = get_db() | ||
| rows = await db.select( | ||
| "gdpr_export_requests", | ||
| columns="id,created_at", | ||
| filters={"user_id": user_id}, | ||
| ) | ||
| if not rows: | ||
| return | ||
| cutoff = _utcnow() - EXPORT_RATE_WINDOW | ||
| for r in rows: | ||
| created = r.get("created_at") | ||
| if not created: | ||
| continue | ||
| # Accept both datetime and ISO-string forms. | ||
| if isinstance(created, str): | ||
| try: | ||
| created_dt = datetime.fromisoformat(created.replace("Z", "+00:00")) | ||
| except ValueError: | ||
| continue | ||
| elif isinstance(created, datetime): | ||
| created_dt = created | ||
| else: | ||
| continue | ||
| if created_dt.tzinfo is None: | ||
| created_dt = created_dt.replace(tzinfo=timezone.utc) | ||
| if created_dt >= cutoff: | ||
| raise HTTPException( | ||
| status_code=429, | ||
| detail="Data export is rate-limited to 1 request per 24 hours.", | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider fetching only the most recent export request.
The rate limit check fetches all historical export requests for the user. While the 24h rate limit bounds growth to ~365 rows/year, fetching only the most recent record would be more efficient.
♻️ Suggested optimization
async def _enforce_export_rate_limit(user_id: str) -> None:
"""Raise 429 if the user already exported within EXPORT_RATE_WINDOW."""
db = get_db()
- rows = await db.select(
- "gdpr_export_requests",
- columns="id,created_at",
- filters={"user_id": user_id},
- )
- if not rows:
+ # Only need to check the most recent request
+ rows = await db.select(
+ "gdpr_export_requests",
+ columns="id,created_at",
+ filters={"user_id": user_id},
+ order="created_at.desc",
+ limit=1,
+ )
+ if not rows:
return
cutoff = _utcnow() - EXPORT_RATE_WINDOW
- for r in rows:
- created = r.get("created_at")
+ created = rows[0].get("created_at")
+ if not created:
+ return
...Note: Exact syntax depends on your db.select API's support for order and limit parameters.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/app/routes/gdpr.py` around lines 129 - 160, In
_enforce_export_rate_limit, avoid selecting all rows by changing the db.select
call for "gdpr_export_requests" to fetch only the most recent record (use order
by created_at DESC and limit 1, or the equivalent supported by your db.select
API) and then run the same datetime parsing and cutoff comparison against that
single row; if db.select doesn't support order/limit, replace it with a query
method that does (or request the max(created_at) for the user) so only the
latest export timestamp is retrieved and checked.
Flag the _enforce_export_rate_limit full-scan as a db-layer follow-up. SupabaseClient.select currently only accepts eq filters — no order_by / limit support — so narrowing the query requires extending the wrapper. Leaving a TODO to track rather than scoping the db-layer change into this PR.
There was a problem hiding this comment.
Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
cloud/app/routes/gdpr.py (1)
291-319:⚠️ Potential issue | 🟠 MajorThe soft-delete cascade is still non-atomic.
usersis tombstoned before the per-rowworkspaces/brainsupdates run, so any laterupdate()failure can leave the account marked deleted while some owned resources remain active. This still needs a transaction or DB-side bulk update once the wrapper supports it.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/app/routes/gdpr.py` around lines 291 - 319, The current soft-delete flow performs db.upsert("users", ...) before per-row db.update calls for workspaces and brains (via _collect_user_workspaces and _collect_user_brains), leaving the operation non-atomic; wrap the user tombstone plus all workspace/brain updates in a single database transaction (or use a DB-side bulk update once supported) so either all rows (users, workspaces, brains) are updated or none are; specifically, perform the db.upsert("users", ...) and the subsequent db.update("workspaces", ...) and db.update("brains", ...) calls inside the same transactional context (or replace the per-row loops with a single bulk UPDATE using a filter like id=in.(...)) to ensure atomicity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 88-91: The GET handlers for user data (e.g., data_summary)
currently accept user_id from get_current_user_id which only validates the JWT;
update these handlers to verify the account is active (not soft-deleted) before
returning data. Specifically, add an active-user check after obtaining user_id
(or replace get_current_user_id with a new auth helper) that queries the users
table for deleted_at IS NULL (or uses the existing delete_me check logic) and
raise an appropriate HTTP 403/404 if the user is soft-deleted; apply the same
change to the /me/export handler referenced around lines 167-169 so both
data_summary and export endpoints block access for soft-deleted accounts.
- Around line 179-189: The current logic inserts into gdpr_export_requests (via
the db.insert call) before the export payload is generated, which consumes the
24-hour quota even on failures; change the flow so that either (a) you only
insert the successful export row into gdpr_export_requests after the payload is
generated/delivered, or (b) add a status field (e.g.,
status="success"|"failure") and write a failure record separately, and update
_enforce_export_rate_limit to count only status="success" rows; apply the same
change to the analogous block around lines 227-238 so failed/oversized exports
do not decrement the user's retry quota.
- Line 219: The log call currently emits raw PII (e.g., the _log.info(...) that
logs user_id and size_bytes and other GDPR-related logs that include email in
the deletion path); replace raw identifiers with a non-reversible or redacted
representation before logging (for example compute a deterministic hash of
user_id/email or redact them to a short masked string) and continue to log
counts/status like size_bytes; update every GDPR-related logger usage (the
_log.info call that prints "GDPR export generated for user=%s size=%d" and the
deletion-path logs that include email) to use the hashed/redacted identifier
instead of the raw value.
---
Duplicate comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 291-319: The current soft-delete flow performs db.upsert("users",
...) before per-row db.update calls for workspaces and brains (via
_collect_user_workspaces and _collect_user_brains), leaving the operation
non-atomic; wrap the user tombstone plus all workspace/brain updates in a single
database transaction (or use a DB-side bulk update once supported) so either all
rows (users, workspaces, brains) are updated or none are; specifically, perform
the db.upsert("users", ...) and the subsequent db.update("workspaces", ...) and
db.update("brains", ...) calls inside the same transactional context (or replace
the per-row loops with a single bulk UPDATE using a filter like id=in.(...)) to
ensure atomicity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a1031240-f307-4380-a5ff-5f80b4410c42
📒 Files selected for processing (1)
cloud/app/routes/gdpr.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (1)
cloud/app/routes/gdpr.py (1)
273-288: Nice idempotency guard.Returning the existing
deleted_at/purge_afteravoids silently extending the purge window on repeated delete calls.
…ffort rollback - Guard /me/data-summary and /me/export against soft-deleted accounts (JWT stays valid until TTL; add users.deleted_at IS NULL check, 404 on deleted). - Don't burn the 24h export quota on failed exports: move the gdpr_export_requests ledger insert to AFTER size check passes so 507 oversized-payload rejections don't block the user from retrying. - Strip raw user_id and email from GDPR info logs (log counts/has_email only) since these logs are long-lived and compliance-sensitive. - Reorder soft-delete cascade to tombstone subordinate resources (workspaces, brains) BEFORE the users row, so mid-cascade failures leave the account recoverable rather than orphaned-but-tombstoned. Wrap in try/except with best-effort rollback of partial subordinate updates and raise HTTP 500 so the client retries. Proper transactional support is tracked as a follow-up (requires db-wrapper changes). - Add tests for: soft-deleted-user read block, ledger-row-only-on-success, cascade subordinate-first ordering, mid-cascade rollback. Co-Authored-By: Gradata <noreply@gradata.ai>
There was a problem hiding this comment.
Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
@coderabbitai addressed round-4 in 0f4eea8: Three actionable items — fixed as suggested:
Duplicate non-atomic cascade — declining the transactional fix for the third time, but made it resilient:
New tests cover: soft-deleted-user read blocks, ledger-only-on-success (happy + 507 paths), subordinate-first ordering, and mid-cascade rollback on simulated brain-update failure. 135/135 cloud tests pass. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chainLet me verify the implementation in the updated code before responding. 🏁 Script executed: Length of output: 36181
✅ ✅ Ledger insert after size check ✅ PII stripped from logs ✅ Resilient cascade (ordering + rollback) New tests — One minor note for the backlog: Everything else looks solid — happy to approve this. [approve] |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
Addressed round-7 CR:
140/141 cloud tests pass; the 1 failure is pre-existing ( |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review — fixes pushed since last review, please re-verify |
🧩 Analysis chainLet me verify the latest state of the round-7 fixes before triggering the review. 🏁 Script executed: Length of output: 13710
✅ ✅ ✅ ✅ All round 4–6 items still holding
Everything looks solid. Triggering the review now. ✅ Actions performedReview triggered.
|
|
@coderabbitai review — CR Pro active, please re-verify. |
🧩 Analysis chain🏁 Script executed: Length of output: 4991
✅ Round-7 (db guards)
✅ Round-6 (marketing)
✅ Rounds 4–5 (GDPR endpoints)
Everything looks solid. Triggering the review now. ✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
Rebased on main (merge 24ce2b0) + round-8 CR review: All previously-flagged items were already addressed in round-7 (def0fb1 hard-guards unfiltered PATCH/DELETE in cloud/app/db.py). CodeRabbit's re-flag of update() is a stale read — current code raises HTTPException 500 when filters is None/empty, and also refuses when all filter values evaluate empty. PII logs already redacted (size_bytes only for export, has_email bool for deletion). SLA/DPA/Subprocessors draft pages already have robots noindex. Cloud tests: 135 passed (incl. 20 GDPR); unrelated test_proof.py failure is due to missing cloud/scripts/export_ab_proof.py on main — tracked in separate branch fix/missing-export-ab-proof-script. |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
@coderabbitai review — rebased on latest main, all prior fixes applied |
🧩 Analysis chain🏁 Script executed: Length of output: 6047
Nothing was lost in the merge onto main. Triggering the review now. ✅ Actions performedReview triggered.
|
…LETE Round-9 response to CodeRabbit critical finding on PR #36. The primary guard (`if not filters: raise`) was already present from earlier rounds; this commit adds: - Post-transform empty-params guard in delete() mirroring update() - Dedicated test_db_guards.py with 5 tests proving update() and delete() never fire an HTTP call when filters is None or {}, using a fake http client that asserts on any call
|
Round-9: reinforced unfiltered-PATCH guard in db.update() + test |
There was a problem hiding this comment.
Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cloud/app/db.py (1)
132-157:⚠️ Potential issue | 🟠 Major
delete()lacks list/tuple filter handling, asymmetric withupdate().The
update()method correctly handles list/tuple/set filter values (converting toin.(...)syntax and treating empty lists as a no-op), butdelete()only useseq.{val}for all values. This creates inconsistent behavior:
Call update()behaviordelete()behaviorfilters={"id": [1,2,3]}id=in.(1,2,3)✅id=eq.[1, 2, 3]❌ invalidfilters={"id": []}Returns [], no HTTP ✅id=eq.[], fires DELETE ❌The
update()method is already being called with list filters in production (e.g.,gdpr.pylines 354–364 passworkspace_idsandbrain_idslists). Thedelete()method currently receives only scalars, but should be aligned to prevent unsafe behavior if batch deletions are added in future.Proposed fix
async def delete( self, table: str, filters: dict[str, Any] | None = None, ) -> list[dict]: """DELETE rows matching eq filters. Returns deleted rows when PostgREST sends them back. Refuses to issue any DELETE without at least one filter constraint to prevent accidental full-table deletion. """ if not filters: _log.error("Refusing unfiltered delete on %s", table) raise HTTPException(status_code=500, detail="Database error") params: dict[str, str] = {} for key, val in filters.items(): - params[key] = f"eq.{val}" + if isinstance(val, (list, tuple, set)): + if not val: + # No matches possible; refuse to DELETE without a filter. + return [] + joined = ",".join(str(v) for v in val) + params[key] = f"in.({joined})" + else: + params[key] = f"eq.{val}" # Final safety net: if every filter value was stripped away somehow, # refuse rather than issue a full-table DELETE. if not params:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cloud/app/db.py` around lines 132 - 157, The delete() method currently treats every filter value as eq.{val} and must be updated to mirror update() semantics: for each filters.items() if val is a list/tuple/set then if empty return [] (no HTTP call) else convert to "in.(v1,v2,...)"; if val is not an iterable or is a string use "eq.{val}"; build params accordingly (same symbol: delete(), _http.delete, _raise_db_error, resp.json()) and keep the existing safety checks and error handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cloud/tests/test_db_guards.py`:
- Around line 74-91: Add a test that covers the empty-list filter case for
delete: create an async test (e.g.
test_delete_returns_empty_for_empty_list_filter) that calls
client.delete("users", filters={"id": []}) using _make_client(), asserts the
result is [] and asserts fake.calls == [] to ensure no DELETE is fired; this
documents and will catch regressions in the client.delete behavior referenced in
db.py.
---
Outside diff comments:
In `@cloud/app/db.py`:
- Around line 132-157: The delete() method currently treats every filter value
as eq.{val} and must be updated to mirror update() semantics: for each
filters.items() if val is a list/tuple/set then if empty return [] (no HTTP
call) else convert to "in.(v1,v2,...)"; if val is not an iterable or is a string
use "eq.{val}"; build params accordingly (same symbol: delete(), _http.delete,
_raise_db_error, resp.json()) and keep the existing safety checks and error
handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 141929a8-9039-483f-a300-956ebcb775fd
📒 Files selected for processing (2)
cloud/app/db.pycloud/tests/test_db_guards.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test (3.11)
- GitHub Check: Test (Python 3.12)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (3)
cloud/app/db.py (1)
93-125: LGTM on the update() guards.The defense-in-depth approach is solid:
- Early guard rejects
None/emptyfilters- Empty list filter values return
[]without issuing a PATCH- List/tuple/set values correctly convert to PostgREST
in.(...)syntax- Final safety net catches edge cases where all filters were stripped
cloud/tests/test_db_guards.py (2)
12-42: Well-designed test infrastructure.The
_FakeHTTPstub and_make_client()helper cleanly isolate the guard logic from real HTTP calls. The approach of appending tocallsbefore raising ensures assertions remain verifiable even if guards fail.
45-71: Good coverage forupdate()guard scenarios.The three test cases comprehensively cover:
filters=None→ raises 500filters={}→ raises 500filters={"id": []}→ returns[](no-op)
| @pytest.mark.asyncio | ||
| async def test_delete_refuses_when_filters_is_none(): | ||
| """delete(filters=None) must raise and NOT fire a DELETE.""" | ||
| client, fake = _make_client() | ||
| with pytest.raises(HTTPException) as exc: | ||
| await client.delete("users", filters=None) | ||
| assert exc.value.status_code == 500 | ||
| assert fake.calls == [] | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_delete_refuses_when_filters_is_empty_dict(): | ||
| """delete(filters={}) must raise and NOT fire a DELETE.""" | ||
| client, fake = _make_client() | ||
| with pytest.raises(HTTPException) as exc: | ||
| await client.delete("users", filters={}) | ||
| assert exc.value.status_code == 500 | ||
| assert fake.calls == [] |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider adding test for delete(filters={"id": []}).
The delete() tests cover None and {} filters but not empty-list filter values. Given the asymmetry flagged in db.py, adding this test would:
- Document the current (potentially problematic) behavior
- Catch regressions if
delete()is updated to matchupdate()
📝 Suggested additional test
`@pytest.mark.asyncio`
async def test_delete_returns_empty_for_empty_list_filter():
"""delete(filters={id: []}) should be a no-op returning [] without firing DELETE."""
client, fake = _make_client()
result = await client.delete("users", filters={"id": []})
assert result == []
assert fake.calls == []Note: This test will fail with the current delete() implementation, confirming the issue flagged in db.py.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cloud/tests/test_db_guards.py` around lines 74 - 91, Add a test that covers
the empty-list filter case for delete: create an async test (e.g.
test_delete_returns_empty_for_empty_list_filter) that calls
client.delete("users", filters={"id": []}) using _make_client(), asserts the
result is [] and asserts fake.calls == [] to ensure no DELETE is fired; this
documents and will catch regressions in the client.delete behavior referenced in
db.py.
* feat(marketing): Plausible analytics with env-gated event tracking
Adds privacy-first, cookie-free analytics (Plausible) gated by
NEXT_PUBLIC_ENABLE_ANALYTICS so local dev and preview deploys stay clean.
Respects DNT client-side.
Tracked events (all client-side, no PII):
- signup_click (header, hero, home footer CTA)
- signup_complete (fires once on /?signup=success redirect from Supabase)
- docs_click (header nav)
- install_copy (hero install command copy button)
Added vitest with 10 passing tests for the analytics helper (opt-in gate,
DNT, domain default, exception safety, prop forwarding). Documented env
vars in marketing/.env.example and cloud/RAILWAY-ENV.md.
* feat(sdk,cloud): opt-in anonymous activation telemetry
Gives us a way to measure time-to-first-value without collecting any
PII, code, or lesson content. Strictly opt-in, off by default.
SDK (src/gradata/_telemetry.py):
- Prompts once on `gradata init` with explicit "[y/N]" default-off copy
- Persists choice in ~/.gradata/config.toml under [telemetry] enabled
- GRADATA_TELEMETRY=0 env var always wins (kill switch for opted-in users)
- Fires four activation events, each at most once per machine:
brain_initialized (gradata init)
first_correction_captured (first brain.correct())
first_graduation (first correction that graduates a rule)
first_hook_installed (first gradata hooks install)
- Wire payload is exactly {event, user_id=sha256(machine_id), ts, sdk_version}
- Network calls run on a daemon thread; never block the user; never raise
Cloud (cloud/app/routes/telemetry.py + migration 006):
- POST /telemetry/event — public, unauthenticated, always 202
- In-memory sliding-window rate limit (100/min/IP)
- Replay protection: drops events >1h old or >5min future
- Pydantic schema enforces field shape; extra fields silently dropped
- DB errors swallowed (never leak state to the public surface)
- New telemetry_events table with RLS enabled (service role only)
Tests: 27 SDK tests + 19 cloud tests, all green. Full SDK suite 2097 pass,
cloud suite 134 pass.
* chore(cloud): rename migration 006_telemetry_events -> 007
PR #36 (GDPR) uses migration 006_soft_delete.sql. Renaming telemetry
migration to 007 to resolve collision.
Co-Authored-By: Gradata <noreply@gradata.ai>
* style: ruff --fix for PR #37
* chore: address CodeRabbit feedback on PR #37
MAJOR (cli.py)
- Replace hardcoded ~/.gradata/config.toml string with _telemetry.config_path()
so the user sees the real path on Windows/macOS/Linux.
- Replace bare exception swallow with a logger.debug call.
- Guard send_once("brain_initialized") with is_enabled() + try/except so a
telemetry bug or DNS hiccup can never break ``gradata init``.
MINOR (_telemetry.py)
- Fix prefix-match bug in _write_config_key: drop startswith() pre-filter and
rely on exact lhs == bare check after splitting on '='.
- Wrap send_once read-modify-write in a cross-process advisory file lock
(msvcrt on Windows, fcntl elsewhere). Degrades silently if locking is
unavailable.
- Expose config_path() helper for callers.
MINOR (cloud/app/routes/telemetry.py)
- Tighten _is_hex64: explicitly require lowercase to match docstring and keep
storage canonical. Also migrate datetime.now(timezone.utc) -> datetime.now(UTC).
MINOR (cloud/migrations/007_telemetry_events.sql)
- Add explicit deny-all RLS policy (telemetry_events_deny_all) wrapped in
DROP POLICY IF EXISTS for re-runnability.
NIT (marketing)
- Remove redundant <span aria-hidden>{site.name}</span> from page.tsx —
``site`` is already used in the metadata export.
- Make CodeBlock aria-label dynamic via copyAriaLabel prop (default
"Copy code"). Hero passes "Copy install command" to preserve current copy.
NIT (tests/test_telemetry.py)
- Parametrize kill-switch tests across all recognized off- and truthy-values.
- Tighten MAC-leak assertion: assert real MAC is >=12 chars then check it is
not a substring of the digest, plus a deterministic synthetic-seed test
that compares the digest to the known sha256.
Co-Authored-By: Gradata <noreply@gradata.ai>
* fix(test): loosen MAC-leak assertion to check non-inclusion, not length
The prior tightening to `len(mac_hex) >= 12` broke on GitHub-hosted
CI runners where `uuid.getnode()` can yield a shorter hex string.
The security-relevant invariant is that the raw MAC hex must not
appear in the hashed user_id — not that the MAC is a particular
length. The deterministic sha256 check in
test_does_not_leak_synthetic_short_mac remains as the stronger
guarantee.
Co-Authored-By: Gradata <noreply@gradata.ai>
* chore: address CodeRabbit round-2 feedback on PR #37 (print->log, PII ip, proxy-aware limit, unmount cleanup)
* chore: address CodeRabbit round-3 on PR #37 (reject naive ts + other fixes)
- cloud/telemetry.py: reject naive ts at API boundary (silent 202, matches
always-202 contract) so malicious clients can't extend replay window
via local-tz interpretation. Adds test_drops_naive_timestamps.
- marketing/CodeBlock.tsx: fix copyValue docstring — it replaces the whole
copied text, not just the first line.
- tests/test_telemetry.py (SDK): drop redundant test_no_extra_fields
(test_exact_keys already asserts the same invariant), kept comment
explaining the security rationale.
* refactor: simplify pass on analytics-and-activation branch
Remove redundant is_enabled() check in cmd_init — send_once() already
gates on is_enabled() internally, so wrapping it in an outer check is
duplicate work that obscures the actual invariant.
No behavior change. Privacy-sensitive code preserved verbatim:
- _config_lock (msvcrt/fcntl) cross-process lock
- naive-ts silent-drop at API boundary
- source_ip excluded from DB (PII)
- get_remote_address X-Forwarded-For support
- CodeBlock unmount cleanup + copyAriaLabel prop
Co-Authored-By: Gradata <noreply@gradata.ai>
* fix(review): address CodeRabbit feedback round 4 on analytics
Honor the timeout parameter in _config_lock on POSIX by using
non-blocking fcntl.flock with a retry loop, matching the Windows
branch. Previously, blocking flock(LOCK_EX) could hang indefinitely
if another process held the lock, violating the documented
best-effort contract.
Co-Authored-By: Gradata <noreply@gradata.ai>
* fix(telemetry): route config path through shared resolver (CR round-5)
Addresses CodeRabbit Major finding on PR #37: _telemetry.py was
hardcoding Path.home() / ".gradata", violating the src/gradata/**/*.py
coding guideline ("no hardcoded paths"). Introduces a small
_config_paths module with get_config_dir() that supports
GRADATA_CONFIG_DIR and XDG_CONFIG_HOME overrides, and reworks
_telemetry.py to resolve CONFIG_DIR / CONFIG_PATH through it on every
call. Telemetry remains opt-in (default off) and behaviour is
unchanged on the default path.
Also addresses the Minor test fix in cloud/tests/test_telemetry.py:
renames the misleading test_drops_exactly_one_hour_old (which actually
used hours=1, seconds=5) and parametrizes the boundary into two
clearly-named cases, plus adds a positive test asserting an
exactly-1h-old event is still accepted (since the route uses strict
"now - ts > 1h").
tests/test_telemetry.py switches from monkeypatching CONFIG_DIR /
CONFIG_PATH to setting GRADATA_CONFIG_DIR, aligning with the new
single source of truth.
---------
Co-authored-by: Oliver Le <oliver@gradata.com>
Co-authored-by: Gradata <noreply@gradata.ai>
Summary
Three workstreams closing enterprise / GDPR compliance gaps that procurement asks about on Day 1.
1. GDPR data endpoints (cloud backend)
GET /api/v1/me/export— full JSON dump of workspaces, brains, corrections, lessons, meta_rules, events. Inline payload up to 10MB (larger returns 507 with a TODO for signed-URL delivery). Rate-limited to 1 call per user per rolling 24h via the newgdpr_export_requestsledger table.POST /api/v1/me/delete— returns 202 Accepted, tombstones the user row plus owned workspaces + brains withdeleted_at/purge_aftertimestamps. Actual hard-delete is deferred to a nightly Railway cron (TODO in the SQL, out of scope for this PR).GET /api/v1/me/data-summary— counts + oldest/newest record timestamps for the account-settings "what do you have on me" modal.005_soft_delete.sqladdsdeleted_at TIMESTAMPTZtoworkspaces,brains, plus a new shadowpublic.userstable withpurge_after, and thegdpr_export_requestsledger.auth.pyandroutes/users.pynow filter soft-deleted rows so tombstoned users cannot authenticate and tombstoned workspaces disappear from the profile response.cloud/tests/test_gdpr.py(UUID fixtures, no PII). All 128 cloud tests pass.2. Public compliance docs (marketing site)
/legal/dpa— Data Processing Agreement with standard GDPR Article 28 clauses (definitions, scope, security measures, sub-processor authorization, Articles 15/17 assistance, breach notification, SCCs, audit, return-or-delete)./legal/sla— 99.9% monthly uptime target with service-credit table, P1/P2/P3 support response matrix, incident communication policy./legal/subprocessors— Railway, Supabase, Stripe, Cloudflare, Sentry, each with purpose / data categories / location.3. Security.txt + runbook
marketing/public/.well-known/security.txtper RFC 9116:security@gradata.aicontact, 12-month expiry, 90-day coordinated disclosure, scoped to apex + api + app + SDK repo with explicit out-of-scope items.cloud/RUNBOOK-INCIDENT.md: step-by-step runbooks for the five incidents we expect — api.gradata.ai 500s, Railway outage, Supabase pool exhaustion, Stripe webhook backlog, GDPR endpoint misbehavior. Includes severity table, comms templates, and escalation contacts.Constraints honored
test_team.pystill passes afterauth.pysoft-delete filter was added; explicit back-compat test intest_gdpr.py.Verification
cd cloud && pytest -x— 128 passedcd marketing && pnpm run build— 13 static routes generated including the 3 new legal pagesNew public URLs
GET https://api.gradata.ai/api/v1/me/exportGET https://api.gradata.ai/api/v1/me/data-summaryPOST https://api.gradata.ai/api/v1/me/deleteFollow-ups (explicitly out of scope)
deleted_at < now() - 30 daysrowsTest plan
cloud(pytest 128 tests)marketing(next build)/me/data-summaryon a real account before shipping migrationSELECT deleted_at FROM brains LIMIT 1;returns a columnGenerated with Gradata