Skip to content

feat: GDPR data endpoints + DPA/SLA/security.txt + incident runbook#36

Merged
Gradata merged 18 commits into
mainfrom
feat/gdpr-compliance
Apr 15, 2026
Merged

feat: GDPR data endpoints + DPA/SLA/security.txt + incident runbook#36
Gradata merged 18 commits into
mainfrom
feat/gdpr-compliance

Conversation

@Gradata

@Gradata Gradata commented Apr 13, 2026

Copy link
Copy Markdown
Owner

Summary

Three workstreams closing enterprise / GDPR compliance gaps that procurement asks about on Day 1.

1. GDPR data endpoints (cloud backend)

  • GET /api/v1/me/export — full JSON dump of workspaces, brains, corrections, lessons, meta_rules, events. Inline payload up to 10MB (larger returns 507 with a TODO for signed-URL delivery). Rate-limited to 1 call per user per rolling 24h via the new gdpr_export_requests ledger table.
  • POST /api/v1/me/delete — returns 202 Accepted, tombstones the user row plus owned workspaces + brains with deleted_at / purge_after timestamps. Actual hard-delete is deferred to a nightly Railway cron (TODO in the SQL, out of scope for this PR).
  • GET /api/v1/me/data-summary — counts + oldest/newest record timestamps for the account-settings "what do you have on me" modal.
  • Migration 005_soft_delete.sql adds deleted_at TIMESTAMPTZ to workspaces, brains, plus a new shadow public.users table with purge_after, and the gdpr_export_requests ledger.
  • auth.py and routes/users.py now filter soft-deleted rows so tombstoned users cannot authenticate and tombstoned workspaces disappear from the profile response.
  • 13 new tests in cloud/tests/test_gdpr.py (UUID fixtures, no PII). All 128 cloud tests pass.

2. Public compliance docs (marketing site)

  • /legal/dpa — Data Processing Agreement with standard GDPR Article 28 clauses (definitions, scope, security measures, sub-processor authorization, Articles 15/17 assistance, breach notification, SCCs, audit, return-or-delete).
  • /legal/sla — 99.9% monthly uptime target with service-credit table, P1/P2/P3 support response matrix, incident communication policy.
  • /legal/subprocessors — Railway, Supabase, Stripe, Cloudflare, Sentry, each with purpose / data categories / location.
  • Footer now links DPA / SLA / Subprocessors alongside existing Privacy + Terms.
  • All three pages carry a visible DRAFT — pending legal review banner so no one mistakes them for counsel-reviewed contracts.

3. Security.txt + runbook

  • marketing/public/.well-known/security.txt per RFC 9116: security@gradata.ai contact, 12-month expiry, 90-day coordinated disclosure, scoped to apex + api + app + SDK repo with explicit out-of-scope items.
  • cloud/RUNBOOK-INCIDENT.md: step-by-step runbooks for the five incidents we expect — api.gradata.ai 500s, Railway outage, Supabase pool exhaustion, Stripe webhook backlog, GDPR endpoint misbehavior. Includes severity table, comms templates, and escalation contacts.

Constraints honored

  • Draft-quality disclaimer on all legal pages.
  • No PII in tests — all UUID fixtures.
  • Backwards compat verified: test_team.py still passes after auth.py soft-delete filter was added; explicit back-compat test in test_gdpr.py.

Verification

  • cd cloud && pytest -x — 128 passed
  • cd marketing && pnpm run build — 13 static routes generated including the 3 new legal pages

New public URLs

Follow-ups (explicitly out of scope)

  • Railway nightly cron to purge deleted_at < now() - 30 days rows
  • ESP wire-up for deletion confirmation email (currently logged only)
  • Signed-URL delivery for >10MB exports (currently returns 507)

Test plan

  • CI green on cloud (pytest 128 tests)
  • CI green on marketing (next build)
  • Manual: hit /me/data-summary on a real account before shipping migration
  • After migration deploy: spot-check SELECT deleted_at FROM brains LIMIT 1; returns a column
  • Legal: circulate DPA draft to outside counsel before quoting clause numbers in a sales cycle

Generated with Gradata

Oliver Le added 3 commits April 13, 2026 13:56
Adds GDPR Article 15 (access) and Article 17 (erasure) endpoints plus a
data summary endpoint for account-settings UIs. Soft-delete cascades via
migration 005 — actual row purge will run as a nightly Railway cron
after the 30-day grace period (TODO tracked in the SQL).

- GET  /me/export: returns inline JSON payload (<10MB), rate-limited to
  1 call/24h via gdpr_export_requests ledger table
- POST /me/delete: 202 Accepted, tombstones user + owned workspaces +
  owned brains, logs a confirmation email (ESP wire-up pending)
- GET  /me/data-summary: counts + oldest/newest timestamps
- auth.py + users.py now filter soft-deleted rows so tombstoned users
  cannot authenticate and tombstoned workspaces disappear from profile
- 13 new tests (UUID fixtures, no PII), 128 total cloud tests passing
Adds three enterprise/GDPR compliance pages requested on Day 1 by
enterprise procurement teams. All three are marked DRAFT — pending legal
review at the top of the page so no one mistakes them for
counsel-reviewed contracts.

- /legal/dpa: GDPR Article 28 clauses (definitions, scope, security,
  sub-processors, data-subject rights, breach notification, SCCs)
- /legal/sla: 99.9% uptime target, service-credit table, support
  response matrix by severity, incident communication policy
- /legal/subprocessors: Railway, Supabase, Stripe, Cloudflare, Sentry
  with purpose, data categories, and location per subprocessor
- Footer now links DPA/SLA/Subprocessors alongside Privacy + Terms
- marketing/public/.well-known/security.txt: contact security@gradata.ai,
  90-day coordinated-disclosure window, scoped to gradata.ai apex + api +
  app + SDK repo, expires 2027-04-13 (12 months out).
- cloud/RUNBOOK-INCIDENT.md: step-by-step for the top 5 incidents we
  expect — api.gradata.ai 500s, Railway outage, Supabase pool exhaustion,
  Stripe webhook backlog, GDPR endpoint misbehavior. Includes severity
  table, communication templates, and escalation contacts.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai

coderabbitai Bot commented Apr 13, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough
  • Three new GDPR API endpoints: GET /me/data-summary (user data inventory), GET /me/export (full JSON export with 10MB inline limit, rate-limited to 1 call/user/24h), POST /me/delete (202 Accepted; soft-delete with 30-day purge window)
  • Soft-delete infrastructure: Migration adds deleted_at columns to workspaces/brains, new public.users shadow table with purge_after, and gdpr_export_requests ledger; soft-deleted records filtered from auth/workspace lookups
  • Defense-in-depth write guards: db.py refactored to refuse unfiltered PATCH/DELETE operations (raises HTTP 500); empty filter collections return no-op; guards enforced in mocks and tested
  • Compliance documentation: Marketing pages added for DPA (referencing API paths), SLA (response times, service credits), and Subprocessors (30-day change notification); all draft pages marked with banner and set to noindex
  • Security: RFC 9116 security.txt: Added at /.well-known/security.txt with 90-day disclosure window, scoped hosts/repos, contact security@gradata.ai
  • Incident runbook: cloud/RUNBOOK-INCIDENT.md standardizes response for 5 common incidents (API 500s, Railway outages, DB pool exhaustion, Stripe backlog, GDPR misbehavior) with severity levels, troubleshooting steps, comms templates, escalation matrix
  • Export accounting: Ledger insertion deferred to post-validation (oversized 507 responses do not consume quota); soft-deleted rows included in export for transparency
  • Cascade resilience: Deletion of user cascades bulk-updates to workspaces/brains first; on failure, best-effort rollback reverts subordinate tombstones and returns 500 to prompt retry
  • Comprehensive test coverage: 21 new GDPR tests validate authentication, rate limiting, cascades, rollback, idempotency, and ledger accounting; guard tests prove unfiltered writes never issue HTTP calls

Walkthrough

Adds GDPR endpoints (data-summary, export with 24h rate-limit and 10MB inline cap, delete scheduling with 30-day purge), Pydantic response models, DB soft-delete and export ledger schema with RLS, soft-delete-aware auth/profile filtering, Supabase write-guards and test mock alignment, tests, incident runbook, legal marketing pages, footer links, and security.txt.

Changes

Cohort / File(s) Summary
Database & Schema
cloud/migrations/006_soft_delete.sql
Adds deleted_at to workspaces/brains, new public.users shadow table (deleted_at,purge_after), gdpr_export_requests ledger, RLS policy, and partial indexes for soft-delete and export queries.
GDPR API & Models
cloud/app/routes/gdpr.py, cloud/app/models.py, cloud/app/routes/__init__.py
New gdpr router with GET /me/data-summary, GET /me/export (10MB inline cap, 24h rolling rate-limit via ledger), and POST /me/delete (idempotent tombstone cascade + 30-day purge). Adds DataSummaryResponse, DataExportResponse, and DeleteAccountResponse; router registered under gdpr.
Auth & Profile filtering
cloud/app/auth.py, cloud/app/routes/users.py
Auth lookups and user profile endpoints now exclude soft-deleted brains/workspaces by filtering deleted_at.
Supabase client & test mocks
cloud/app/db.py, cloud/tests/conftest.py
SupabaseClient.update/delete now refuse unfiltered writes (raise 500), encode scalar vs iterable filters (eq. vs in.(...)), treat empty-collection filter values as no-op returning []; mocks updated to mirror prod semantics.
Tests
cloud/tests/test_gdpr.py, cloud/tests/test_db_guards.py, cloud/tests/conftest.py
Adds GDPR endpoint test suite covering auth, counts, timestamps, export payload/size/ledger/rate-limit, delete cascade/idempotency/rollback; adds DB-guard regression tests and aligns mock behavior with new guards.
Runbook
cloud/RUNBOOK-INCIDENT.md
New incident response runbook: immediate protocol, severity levels, troubleshooting playbooks (API, Railway, Supabase, Stripe, GDPR ops), communication templates, and escalation matrix.
Marketing / Legal pages & components
marketing/app/legal/dpa/page.tsx, marketing/app/legal/sla/page.tsx, marketing/app/legal/subprocessors/page.tsx, marketing/src/components/LegalPageHeader.tsx, marketing/src/components/Footer.tsx, marketing/public/.well-known/security.txt
Adds DPA, SLA, and Subprocessors pages (draft metadata), LegalPageHeader/LegalDraftBanner components, footer links for DPA/SLA/Subprocessors, and security.txt.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as GDPR API
    participant DB
    participant Supabase
    rect rgba(200,200,255,0.5)
    Client->>API: GET /api/v1/me/export (auth)
    API->>DB: query gdpr_export_requests for recent entries
    alt recent request within 24h
        DB-->>API: recent row
        API-->>Client: 429 Too Many Requests
    else allowed
        API->>Supabase: fetch workspaces, brains, corrections, lessons, events, meta_rules (include soft-deleted)
        Supabase-->>API: aggregated rows
        API->>API: serialize JSON, compute size
        alt size ≤ 10MB
            API->>DB: insert gdpr_export_requests ledger row
            DB-->>API: ack
            API-->>Client: 200 JSON export (inline)
        else size > 10MB
            API-->>Client: 507 Payload Too Large
        end
    end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PR #28: Modifies Supabase client update/delete behaviors and mock semantics — closely related to cloud/app/db.py and test updates.
  • PR #22: Prior backend work that overlaps on GDPR/soft-delete and routing integration; likely a foundational predecessor.
  • PR #27: Related marketing/legal UI/footer changes that touch similar legal pages and footer link additions.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.88% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the three main workstreams: GDPR data endpoints, legal/compliance documents (DPA/SLA), and security infrastructure (security.txt and incident runbook).
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing all three workstreams, constraints honored, verification steps, and follow-ups.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/gdpr-compliance

Comment @coderabbitai help to get the list of available commands and usage tips.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 13, 2026

Copy link
Copy Markdown

Deploying gradata-dashboard with  Cloudflare Pages  Cloudflare Pages

Latest commit: d9f0d83
Status: ✅  Deploy successful!
Preview URL: https://3574eadf.gradata-dashboard.pages.dev
Branch Preview URL: https://feat-gdpr-compliance.gradata-dashboard.pages.dev

View logs

PR #33 (API security hardening) introduced 005_processed_webhooks.sql.
PR #33 has been merged and 005_processed_webhooks already applied to prod Supabase.
Renaming this migration to 006 to resolve the collision.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 173-209: The TOCTOU issue is that
_enforce_export_rate_limit(user_id) and the subsequent db.insert into
"gdpr_export_requests" are not atomic, allowing concurrent requests to bypass
the limit; fix by making the check-and-record atomic: either perform an
insert-first/upsert with a uniqueness constraint (e.g., on (user_id,
date_bucket)) and rely on ON CONFLICT/ignore to detect prior exports, or wrap
the logic in a DB transaction that locks the user's export ledger row before
checking and inserting; modify the code paths around _enforce_export_rate_limit
and the db.insert("gdpr_export_requests", ...) call so the DB enforces
uniqueness/transactional semantics rather than trusting separate read-then-write
steps.
- Around line 273-289: The cascade currently issues N individual db.update calls
for each workspace/brain via _collect_user_workspaces and _collect_user_brains
causing N+1 writes and potential partial soft-delete; instead collect the IDs
from owned_workspaces/owned_brains and perform a single bulk update (e.g.,
update WHERE id IN [...], setting deleted_at to now_iso) for the "workspaces"
and "brains" tables, and wrap both bulk updates and the parent user soft-delete
in a single transaction using your db.transaction/db.with_transaction API (or
fallback to a single raw SQL update if bulk update helper is unavailable) to
ensure atomicity and avoid partial failures.
- Around line 104-108: The loop over brain_ids issues an N+1 pattern by calling
db.select per brain for corrections, lessons, events, and meta_rules; replace
this with four batched queries that fetch all rows for those tables in one call
each using an IN-style filter on brain_id (e.g., use db.select("corrections",
filters={"brain_id": <IN brain_ids>}) etc.), then extend all_corrections,
all_lessons, all_events, and all_meta_rules with the results; update code that
references brain_ids/ db.select to use the batched results and remove the
per-bid selects.
- Around line 291-296: The shadow users upsert doesn't populate email, so
db.select("users", ...) can return None and _send_deletion_confirmation(user_id,
email) may be called without an address; modify the flow to fall back to the
Supabase admin auth API when email is None: after the db.select call, if email
is still None call the Supabase admin user lookup (e.g., use the project admin
client to fetch auth.users by user_id or call the admin auth API/get_user_by_id)
to retrieve the user's email and assign it before calling
_send_deletion_confirmation; keep the existing db.select fallback but ensure the
Supabase admin lookup runs only when email is missing.

In `@cloud/RUNBOOK-INCIDENT.md`:
- Around line 158-160: Replace the instruction to delete rows from the
gdpr_export_requests ledger with a non-destructive operator-override process:
remove the DELETE guidance and instead document an explicit override workflow
(who can approve, required approver field, reason field, and where to record
approval) and an immutable marker/update action (e.g., an update flag or
separate audit table) that preserves the original ledger row; reference the
gdpr_export_requests ledger name and ensure the runbook describes how to record
the approver, timestamp, and justification for any override so auditability and
compliance are maintained.

In `@cloud/tests/test_gdpr.py`:
- Around line 36-70: The gdpr_client fixture's inner function _make should
register safe default empty responses on mock_supabase for tables that some
tests omit (at minimum "corrections", "lessons", "events", "meta_rules") so
unregistered-table access won't break tests; update _make to call
mock_supabase.add_response("corrections","select", []) etc. when the
corresponding parameter is None (keep the existing conditional pattern used for
workspaces/brains/users) so tests that don't pass those lists still receive an
empty result.

In `@marketing/app/legal/dpa/page.tsx`:
- Line 9: The robots meta currently enables indexing via the object robots: {
index: true, follow: true }; change it to prevent indexing for the draft DPA by
setting index to false (e.g., robots: { index: false, follow: true } or
equivalent) so the draft page is noindexed until legal approval; update the
robots object where it is defined in this page (the robots export/object in
marketing/app/legal/dpa/page.tsx) to reflect the noindex state.
- Around line 126-127: Update the DPA language to avoid asserting automatic
purging within 30 days: replace the phrases "is then purged" and any statement
like "will delete … within 30 days" with wording that reflects current
implementation (e.g., data enters a 30-day soft-delete retention period and is
eligible for purge but nightly purge cron is out of scope/subject to operational
processes), and apply the same change to the similar occurrences around lines
referenced (the other instance at 168-169) so the text no longer guarantees an
automated purge/deletion timeline.

In `@marketing/app/legal/subprocessors/page.tsx`:
- Line 9: The page currently sets robots: { index: true, follow: true } but the
page content contains the “DRAFT — pending legal review.” marker; change the
robots metadata to prevent indexing until sign-off by setting index to false
(e.g., robots: { index: false, follow: true }) or implement a conditional around
the robots object that flips index to false when the draft marker is present;
update the robots/index value in page.tsx (the robots object) so drafts are
noindexed.

In `@marketing/public/.well-known/security.txt`:
- Line 14: Change the phrase "- The Gradata open source SDK
(github.com/Gradata/gradata)" to use the compound adjective "open-source" (i.e.,
"open-source SDK") so the line reads "- The Gradata open-source SDK
(github.com/Gradata/gradata)" to match standard usage.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fb67bf1e-e7ee-4acd-9939-91bc8a62e8cd

📥 Commits

Reviewing files that changed from the base of the PR and between ce1ad3c and e602d68.

📒 Files selected for processing (13)
  • cloud/RUNBOOK-INCIDENT.md
  • cloud/app/auth.py
  • cloud/app/models.py
  • cloud/app/routes/__init__.py
  • cloud/app/routes/gdpr.py
  • cloud/app/routes/users.py
  • cloud/migrations/005_soft_delete.sql
  • cloud/tests/test_gdpr.py
  • marketing/app/legal/dpa/page.tsx
  • marketing/app/legal/sla/page.tsx
  • marketing/app/legal/subprocessors/page.tsx
  • marketing/public/.well-known/security.txt
  • marketing/src/components/Footer.tsx
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
🪛 LanguageTool
marketing/public/.well-known/security.txt

[grammar] ~14-~14: Use a hyphen to join words.
Context: ...i, api.gradata.ai # - The Gradata open source SDK (github.com/Gradata/gradata) ...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (9)
cloud/app/routes/__init__.py (1)

13-13: GDPR router integration looks correct.

Import and registration are consistent with the existing router aggregation pattern.

Also applies to: 36-36

marketing/src/components/Footer.tsx (1)

35-37: Legal footer link additions are clean and consistent.

The new DPA/SLA/Subprocessors entries follow the same pattern as existing legal links.

cloud/app/routes/users.py (1)

37-38: Soft-delete filtering is correctly applied in both profile paths.

Good consistency between get_profile and update_profile when excluding tombstoned workspaces from response payloads.

Also applies to: 84-85

cloud/app/auth.py (1)

48-50: Auth-path soft-delete enforcement is a strong correctness/security improvement.

The new filters consistently prevent tombstoned brains from authenticating or passing ownership checks.

Also applies to: 118-120, 135-137

marketing/app/legal/sla/page.tsx (1)

1-122: LGTM!

The SLA page is well-structured with proper semantic HTML (<article>, <header>, <section>), accessible table markup, and appropriate Next.js metadata configuration. The draft banner is clearly visible, and the Tailwind v4 CSS variable syntax is correctly applied.

cloud/app/models.py (1)

326-376: LGTM!

The GDPR models are well-designed with clear docstrings, appropriate type hints, and sensible defaults. The DeleteAccountResponse message provides clear user guidance including the contact email for cancellation.

cloud/migrations/005_soft_delete.sql (2)

41-48: LGTM!

The partial indexes are well-designed—indexing only rows where deleted_at IS NOT NULL keeps the indexes small and focused on the tombstoned records that the purge cron will query.


29-35: File reference error and misunderstanding of authentication model: RLS is not applicable here.

The file referenced is 006_soft_delete.sql (not 005). More importantly, the users shadow table does not need RLS because the app uses Supabase's service-role authentication (via settings.supabase_service_key in db.py). Service-role keys bypass RLS entirely in Supabase.

The migration file explicitly documents this design choice (lines 8-10):

"Existing RLS policies SHOULD filter deleted_at IS NULL; this migration adds a convenience view but does NOT rewrite policies. App-side queries must filter on their own (enforced in app/db + route handlers)."

The intended protection is app-side filtering, not RLS. This is evident in the codebase: gdpr.py (lines 265, 293) and auth.py (line 171) handle tombstone filtering at the application level, which is the correct approach for service-role authenticated clients. The gdpr_export_requests table has RLS because it requires direct user isolation (the policy uses auth.uid()), but the users table is already protected by app-side logic sufficient for service-role access.

			> Likely an incorrect or invalid review comment.
cloud/tests/test_gdpr.py (1)

1-336: LGTM!

Comprehensive test coverage for all three GDPR endpoints including:

  • Authentication enforcement
  • Correct counting and date-range calculation for data-summary
  • Export payload structure and rate-limiting behavior
  • Soft-delete inclusion in exports / exclusion in summaries
  • Delete cascade and idempotency
  • Backward compatibility checks

The UUID-based fixtures appropriately avoid PII in test output.

Comment thread cloud/app/routes/gdpr.py Outdated
Comment on lines +104 to +108
for bid in brain_ids:
all_corrections.extend(await db.select("corrections", filters={"brain_id": bid}))
all_lessons.extend(await db.select("lessons", filters={"brain_id": bid}))
all_events.extend(await db.select("events", filters={"brain_id": bid}))
all_meta_rules.extend(await db.select("meta_rules", filters={"brain_id": bid}))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

N+1 query pattern: consider batching per-brain queries.

Each brain triggers 4 separate database queries (corrections, lessons, events, meta_rules). For users with many brains, this could degrade performance. Consider batching with an IN clause on brain_id.

♻️ Suggested approach
-    for bid in brain_ids:
-        all_corrections.extend(await db.select("corrections", filters={"brain_id": bid}))
-        all_lessons.extend(await db.select("lessons", filters={"brain_id": bid}))
-        all_events.extend(await db.select("events", filters={"brain_id": bid}))
-        all_meta_rules.extend(await db.select("meta_rules", filters={"brain_id": bid}))
+    # Batch query all records for user's brains in single calls
+    if brain_ids:
+        all_corrections = await db.select("corrections", filters={"brain_id": ("in", brain_ids)})
+        all_lessons = await db.select("lessons", filters={"brain_id": ("in", brain_ids)})
+        all_events = await db.select("events", filters={"brain_id": ("in", brain_ids)})
+        all_meta_rules = await db.select("meta_rules", filters={"brain_id": ("in", brain_ids)})

Note: Exact syntax depends on your db.select API's support for IN clauses.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/gdpr.py` around lines 104 - 108, The loop over brain_ids
issues an N+1 pattern by calling db.select per brain for corrections, lessons,
events, and meta_rules; replace this with four batched queries that fetch all
rows for those tables in one call each using an IN-style filter on brain_id
(e.g., use db.select("corrections", filters={"brain_id": <IN brain_ids>}) etc.),
then extend all_corrections, all_lessons, all_events, and all_meta_rules with
the results; update code that references brain_ids/ db.select to use the batched
results and remove the per-bid selects.

Comment thread cloud/app/routes/gdpr.py Outdated
Comment on lines +173 to +209
await _enforce_export_rate_limit(user_id)
db = get_db()

workspaces = await _collect_user_workspaces(user_id, include_deleted=True)
brains = await _collect_user_brains(user_id, include_deleted=True)
brain_ids = [b["id"] for b in brains]

corrections: list[dict] = []
lessons: list[dict] = []
events: list[dict] = []
meta_rules: list[dict] = []
for bid in brain_ids:
corrections.extend(await db.select("corrections", filters={"brain_id": bid}))
lessons.extend(await db.select("lessons", filters={"brain_id": bid}))
events.extend(await db.select("events", filters={"brain_id": bid}))
meta_rules.extend(await db.select("meta_rules", filters={"brain_id": bid}))

payload: dict[str, Any] = {
"schema_version": 1,
"user_id": user_id,
"generated_at": _iso(_utcnow()),
"workspaces": workspaces,
"brains": brains,
"corrections": corrections,
"lessons": lessons,
"meta_rules": meta_rules,
"events": events,
}

serialized = json.dumps(payload, default=str)
size_bytes = len(serialized.encode("utf-8"))

# Log the export request for rate-limit enforcement on the next call.
await db.insert(
"gdpr_export_requests",
{"user_id": user_id, "created_at": _iso(_utcnow())},
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

TOCTOU race: concurrent requests can bypass rate limit.

The rate limit check (Line 173) and the ledger insert (Lines 206-209) are not atomic. Two concurrent requests could both pass the check before either inserts a record.

This is low-severity since the 24h window limits abuse, but consider using an INSERT with conflict handling or a transaction to make the rate limit atomic.

🛡️ Suggested fix using INSERT-first approach
 async def export_me(
     user_id: str = Depends(get_current_user_id),
 ) -> DataExportResponse:
-    await _enforce_export_rate_limit(user_id)
     db = get_db()
+    
+    # Atomically check-and-insert: if recent record exists, fail; otherwise insert
+    now = _utcnow()
+    cutoff = now - EXPORT_RATE_WINDOW
+    # Check for recent exports first
+    rows = await db.select(
+        "gdpr_export_requests",
+        columns="id,created_at",
+        filters={"user_id": user_id},
+    )
+    for r in rows:
+        created = r.get("created_at")
+        # ... existing parsing logic ...
+        if created_dt >= cutoff:
+            raise HTTPException(status_code=429, detail="...")
+    
+    # Insert rate limit record BEFORE generating payload
+    await db.insert(
+        "gdpr_export_requests",
+        {"user_id": user_id, "created_at": _iso(now)},
+    )
 
     workspaces = await _collect_user_workspaces(user_id, include_deleted=True)
     # ... rest of export logic ...
-
-    # Log the export request for rate-limit enforcement on the next call.
-    await db.insert(
-        "gdpr_export_requests",
-        {"user_id": user_id, "created_at": _iso(_utcnow())},
-    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/gdpr.py` around lines 173 - 209, The TOCTOU issue is that
_enforce_export_rate_limit(user_id) and the subsequent db.insert into
"gdpr_export_requests" are not atomic, allowing concurrent requests to bypass
the limit; fix by making the check-and-record atomic: either perform an
insert-first/upsert with a uniqueness constraint (e.g., on (user_id,
date_bucket)) and rely on ON CONFLICT/ignore to detect prior exports, or wrap
the logic in a DB transaction that locks the user's export ledger row before
checking and inserting; modify the code paths around _enforce_export_rate_limit
and the db.insert("gdpr_export_requests", ...) call so the DB enforces
uniqueness/transactional semantics rather than trusting separate read-then-write
steps.

Comment thread cloud/app/routes/gdpr.py Outdated
Comment thread cloud/app/routes/gdpr.py
Comment on lines +291 to +296
# Resolve email from the shadow users table if present (best-effort).
email: str | None = None
user_rows = await db.select("users", columns="email", filters={"id": user_id})
if user_rows:
email = user_rows[0].get("email")
await _send_deletion_confirmation(user_id, email)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how users.email gets populated elsewhere in the codebase
rg -n "users.*email" --type py cloud/app/

Repository: Gradata/gradata

Length of output: 400


🏁 Script executed:

#!/bin/bash
# Look at the upsert operation in context (lines 264-271 and surrounding code)
sed -n '260,300p' cloud/app/routes/gdpr.py | cat -n

Repository: Gradata/gradata

Length of output: 1627


🏁 Script executed:

#!/bin/bash
# Search for all places where users table is updated with email
rg -n "db\.(insert|update|upsert)" cloud/app/routes/gdpr.py -A 5 | head -60

Repository: Gradata/gradata

Length of output: 791


🏁 Script executed:

#!/bin/bash
# Check how users.email gets written in the entire codebase
rg -n "users.*email|email.*users" cloud/app/ --type py -B 2 -A 2

Repository: Gradata/gradata

Length of output: 1249


🏁 Script executed:

#!/bin/bash
# Find all places where email is written to the users table
rg -n "db\.(insert|update|upsert)" cloud/app/ --type py -B 2 -A 10 | rg -B 5 -A 5 "users" | head -100

Repository: Gradata/gradata

Length of output: 2949


🏁 Script executed:

#!/bin/bash
# Look for email population in auth.py and other auth-related flows
cat -n cloud/app/auth.py | head -250

Repository: Gradata/gradata

Length of output: 8434


Email field will be None for newly-created shadow rows.

The upsert (lines 264–271) only sets id, deleted_at, and purge_after—not email. If the users row doesn't already exist, the subsequent lookup will return None. While the code defensively handles this, the confirmation email may not be sent. Either ensure the shadow users row is pre-populated with email before this endpoint runs, or remove the email dependency by fetching it from Supabase's auth.users table via the admin API.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/gdpr.py` around lines 291 - 296, The shadow users upsert
doesn't populate email, so db.select("users", ...) can return None and
_send_deletion_confirmation(user_id, email) may be called without an address;
modify the flow to fall back to the Supabase admin auth API when email is None:
after the db.select call, if email is still None call the Supabase admin user
lookup (e.g., use the project admin client to fetch auth.users by user_id or
call the admin auth API/get_user_by_id) to retrieve the user's email and assign
it before calling _send_deletion_confirmation; keep the existing db.select
fallback but ensure the Supabase admin lookup runs only when email is missing.

Comment thread cloud/RUNBOOK-INCIDENT.md Outdated
Comment thread cloud/tests/test_gdpr.py
Comment thread marketing/app/legal/dpa/page.tsx Outdated
Comment thread marketing/app/legal/dpa/page.tsx Outdated
Comment thread marketing/app/legal/subprocessors/page.tsx Outdated
Comment thread marketing/public/.well-known/security.txt Outdated
Gradata pushed a commit that referenced this pull request Apr 13, 2026
PR #36 (GDPR) uses migration 006_soft_delete.sql. Renaming telemetry
migration to 007 to resolve collision.

Co-Authored-By: Gradata <noreply@gradata.ai>
- Draft DPA + subprocessors pages: noindex until legal sign-off (Major)
- DPA: soften deletion language to match implementation — nightly
  purge cron is still out of scope, so we now describe the 30-day
  window as a target via the scheduled purge workflow (Major)
- Runbook: remove destructive `DELETE FROM gdpr_export_requests`
  instruction — the DSR ledger is an audit trail and must stay
  immutable. Replaced with an operator-approved, time-boxed override
  procedure requiring approver + reason + follow-up ticket (Major)
- GDPR export: shrink TOCTOU window by inserting the rate-limit
  ledger row BEFORE generating the payload, not after (Minor)
- security.txt: "open source" -> "open-source" compound adjective (Minor)

Declined (with reason):
- N+1 queries in data-summary / export / delete: the current PostgREST
  `db.select` wrapper only supports `eq` filters, no `IN` and no
  transactions. Batching needs a db-layer change tracked separately.
- Test fixture defensive defaults: MockSupabaseClient already uses
  defaultdict, so unregistered tables return [] — no fix needed.
- Supabase admin email fallback in delete_me: no admin client wired
  yet; current defensive None-handling is sufficient and the
  confirmation email itself is still a stub.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (4)
cloud/app/routes/gdpr.py (4)

104-108: 🧹 Nitpick | 🔵 Trivial

N+1 query pattern: consider batching per-brain queries.

This iterates 4 queries per brain. For users with many brains, this degrades performance. If your db.select API supports IN filters, batch these into 4 total queries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/gdpr.py` around lines 104 - 108, The loop issues N+1 queries
by calling db.select for each brain_id; instead batch by querying each table
once with an IN filter using the full brain_ids list (replace the per-id calls
that populate all_corrections, all_lessons, all_events, all_meta_rules). Modify
the code around brain_ids and the db.select calls so you run four selects (one
per table) with an IN-style filter (e.g., filters that express brain_id IN
brain_ids according to your db.select API) and extend the corresponding lists
with the combined results.

279-295: 🧹 Nitpick | 🔵 Trivial

Consider batching cascade updates and wrapping in a transaction.

Individual updates per workspace/brain create N+1 writes. If one update fails mid-cascade, the result is partial soft-deletion. Consider batching with IN filters if supported, and wrapping in a transaction for atomicity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/gdpr.py` around lines 279 - 295, The current cascade loops
call db.update per record (in _collect_user_workspaces and _collect_user_brains)
causing N+1 writes and risking partial soft-deletes; change the logic to collect
IDs from owned_workspaces and owned_brains and perform a single batched update
using an IN-style filter (e.g., filters={"id__in": ids} or db-specific
equivalent) to update the "workspaces" and "brains" tables with
data={"deleted_at": now_iso}; additionally wrap both batched updates in a
database transaction so both cascade updates succeed or fail atomically, keeping
references to the existing functions/variables (_collect_user_workspaces,
_collect_user_brains, db.update, workspaces, brains, now_iso).

297-302: ⚠️ Potential issue | 🟡 Minor

Email will be None for newly-created shadow rows.

The upsert at lines 270-277 only sets id, deleted_at, and purge_after — not email. If the users row didn't exist before, the select here returns None for email, and the deletion confirmation won't have an address.

Consider either:

  1. Pre-populating users.email during signup/auth flows, or
  2. Fetching email from Supabase's auth.users via admin API as a fallback

196-200: 🧹 Nitpick | 🔵 Trivial

Same N+1 pattern in export endpoint.

Same issue as /me/data-summary — 4 queries per brain. Consider batching.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/gdpr.py` around lines 196 - 200, The loop over brain_ids
causes an N+1 query pattern by calling db.select four times per brain; replace
the per-brain selects with a single batched select per table using the full
brain_ids list (e.g., use a filter that queries brain_id IN brain_ids) to
populate corrections, lessons, events and meta_rules in one query each and
eliminate the for bid in brain_ids loop and its db.select calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 253-262: The delete_me endpoint should be made idempotent: before
updating the user's deleted_at and purge_after timestamps, fetch the current
user record (in the delete_me handler) and if deleted_at is already set, return
a DeleteAccountResponse built from the existing deleted_at/purge_after (and skip
cascading updates to workspaces/brains); otherwise proceed to set deleted_at and
purge_after and cascade as currently implemented. Locate the delete_me function
and add the pre-check for user.deleted_at, returning the existing values to
avoid resetting the 30-day purge window on repeated calls.

In `@cloud/migrations/006_soft_delete.sql`:
- Around line 65-66: The CREATE POLICY gdpr_export_self on gdpr_export_requests
is not idempotent and will error if reapplied; modify the migration to first
drop the policy if it exists and then recreate it (use DROP POLICY IF EXISTS
gdpr_export_self; followed by the existing CREATE POLICY gdpr_export_self ON
gdpr_export_requests FOR ALL USING (user_id = auth.uid());) so re-running the
migration is safe—ensure both statements run in the same migration
transaction/schema context as the original CREATE POLICY.

In `@marketing/app/legal/subprocessors/page.tsx`:
- Line 75: Replace the phrase "open source SDK" with the compound adjective
"open-source SDK" in the JSX text that currently reads "hosted Gradata Cloud
service. The open source SDK running locally" in page.tsx; update the string so
the sentence becomes "... hosted Gradata Cloud service. The open-source SDK
running locally" to match security.txt formatting.

---

Duplicate comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 104-108: The loop issues N+1 queries by calling db.select for each
brain_id; instead batch by querying each table once with an IN filter using the
full brain_ids list (replace the per-id calls that populate all_corrections,
all_lessons, all_events, all_meta_rules). Modify the code around brain_ids and
the db.select calls so you run four selects (one per table) with an IN-style
filter (e.g., filters that express brain_id IN brain_ids according to your
db.select API) and extend the corresponding lists with the combined results.
- Around line 279-295: The current cascade loops call db.update per record (in
_collect_user_workspaces and _collect_user_brains) causing N+1 writes and
risking partial soft-deletes; change the logic to collect IDs from
owned_workspaces and owned_brains and perform a single batched update using an
IN-style filter (e.g., filters={"id__in": ids} or db-specific equivalent) to
update the "workspaces" and "brains" tables with data={"deleted_at": now_iso};
additionally wrap both batched updates in a database transaction so both cascade
updates succeed or fail atomically, keeping references to the existing
functions/variables (_collect_user_workspaces, _collect_user_brains, db.update,
workspaces, brains, now_iso).
- Around line 196-200: The loop over brain_ids causes an N+1 query pattern by
calling db.select four times per brain; replace the per-brain selects with a
single batched select per table using the full brain_ids list (e.g., use a
filter that queries brain_id IN brain_ids) to populate corrections, lessons,
events and meta_rules in one query each and eliminate the for bid in brain_ids
loop and its db.select calls.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a0718f08-5da6-45be-8e96-6d84871b6587

📥 Commits

Reviewing files that changed from the base of the PR and between e602d68 and f6f7e79.

📒 Files selected for processing (6)
  • cloud/RUNBOOK-INCIDENT.md
  • cloud/app/routes/gdpr.py
  • cloud/migrations/006_soft_delete.sql
  • marketing/app/legal/dpa/page.tsx
  • marketing/app/legal/subprocessors/page.tsx
  • marketing/public/.well-known/security.txt
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (12)
cloud/RUNBOOK-INCIDENT.md (2)

158-167: LGTM — ledger immutability guidance correctly updated.

The revised guidance preserves the gdpr_export_requests audit trail by requiring documented operator approval instead of direct row deletion. The four-point documentation requirement (approver, reason, window, follow-up ticket) maintains compliance accountability.


1-209: LGTM — comprehensive incident runbook.

The runbook provides actionable playbooks for the five most likely incident types, with appropriate severity definitions, communication templates, and escalation paths. The SQL examples for GDPR troubleshooting are well-targeted.

marketing/public/.well-known/security.txt (1)

1-22: LGTM!

The security.txt file follows RFC 9116 correctly with all required fields (Contact, Expires, Canonical) and recommended fields (Policy, Acknowledgments, Preferred-Languages). The scope definition is clear, the 90-day disclosure window is reasonable, and the expiry date (~1 year out) is appropriate.

marketing/app/legal/subprocessors/page.tsx (2)

4-11: LGTM — noindex correctly applied for draft page.

The robots directive properly prevents indexing while the page is in draft status, with a clear comment explaining the intent. This addresses the compliance concern about indexing unpublished legal content.


21-57: LGTM — well-structured subprocessor data.

The typed Subprocessor interface and array provide clear, maintainable documentation of third-party vendors. External links correctly use rel="noreferrer" for security.

marketing/app/legal/dpa/page.tsx (3)

4-11: LGTM — noindex correctly applied for draft DPA.

The robots directive properly prevents search engine indexing while the page awaits legal review.


121-133: LGTM — deletion language appropriately hedged.

The phrasing "soft-delete window with a target purge horizon of 30 days, executed via a scheduled purge workflow" accurately reflects the implementation status (nightly cron is out of scope) without making binding commitments that aren't yet enforced.


165-173: LGTM — termination deletion language consistent.

The "targets deletion" phrasing aligns with Section 9 and avoids overpromising given the pending purge cron implementation.

cloud/migrations/006_soft_delete.sql (2)

20-48: LGTM — schema additions are well-structured.

The IF NOT EXISTS guards make the migration idempotent for table/column/index creation. Partial indexes on deleted_at IS NOT NULL efficiently support purge queries without bloating the index with active rows.


29-35: RLS is intentional by design — no action needed.

The public.users table intentionally lacks RLS because all access is backend-only via service role (which bypasses RLS anyway), and app-side filtering is enforced in the db layer and route handlers. This design is explicitly documented in the migration file (lines 8-10): "App-side queries must filter on their own (enforced in app/db + route handlers)." The gdpr_export_requests table uses RLS for defensive purposes, but the users shadow table doesn't require it given the access pattern.

cloud/app/routes/gdpr.py (2)

176-186: LGTM — TOCTOU window reduced by moving insert before payload generation.

The ledger insert now happens before the expensive data collection, reducing the race window. The comment at lines 176-182 transparently acknowledges the remaining limitation without PostgREST transaction support.


1-52: LGTM — well-documented module with clear contracts.

The module docstring clearly explains the three endpoints, rate limiting strategy, inline threshold, and TODOs. Helper functions are appropriately scoped and documented.

Comment thread cloud/app/routes/gdpr.py
Comment thread cloud/migrations/006_soft_delete.sql
Comment thread marketing/app/legal/subprocessors/page.tsx Outdated
- gdpr.py: idempotency guard on /me/delete — repeat calls return existing
  ledger state instead of resetting the 30-day purge window or re-cascading.
  Reuses the pre-cascade users fetch for the email lookup (one fewer round-trip).
- 006_soft_delete.sql: wrap policy creation in DROP POLICY IF EXISTS so the
  migration is re-runnable (Postgres has no CREATE POLICY IF NOT EXISTS).
- subprocessors/page.tsx: hyphenate "open-source" as a compound adjective.
- Add inline notes flagging the tracked follow-ups for the previously
  declined items (cascade N+1 needs db.select IN-support; admin email needs
  ESP wired before _send_deletion_confirmation can dispatch).
- New test: test_delete_is_idempotent_when_already_soft_deleted.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 129-160: In _enforce_export_rate_limit, avoid selecting all rows
by changing the db.select call for "gdpr_export_requests" to fetch only the most
recent record (use order by created_at DESC and limit 1, or the equivalent
supported by your db.select API) and then run the same datetime parsing and
cutoff comparison against that single row; if db.select doesn't support
order/limit, replace it with a query method that does (or request the
max(created_at) for the user) so only the latest export timestamp is retrieved
and checked.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3b17cfe8-3216-4b83-ade2-2208aad2df8e

📥 Commits

Reviewing files that changed from the base of the PR and between f6f7e79 and 31ace9c.

📒 Files selected for processing (4)
  • cloud/app/routes/gdpr.py
  • cloud/migrations/006_soft_delete.sql
  • cloud/tests/test_gdpr.py
  • marketing/app/legal/subprocessors/page.tsx
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (3.12)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (19)
marketing/app/legal/subprocessors/page.tsx (3)

4-11: Draft metadata is correctly configured for non-indexing.

Line 10 correctly keeps this draft page out of search indexing, and the canonical URL is consistent with the legal route structure.


73-79: Draft notice copy and legal cross-link look good.

The banner language is clear, and the /legal/dpa/ link is consistent with the related legal page path.


84-113: Subprocessor list rendering is clean and maintainable.

Typed data + mapped rendering with semantic dl/dt/dd output is solid and easy to extend.

cloud/app/routes/gdpr.py (7)

104-108: N+1 query pattern for per-brain data.

Each brain triggers 4 separate queries. This was flagged in a previous review.


196-200: Same N+1 pattern as /me/data-summary.

This duplicates the per-brain loop pattern flagged in the previous review for data summary.


297-316: N+1 cascade updates acknowledged with follow-up tracking.

The inline NOTE documents the per-row UPDATE pattern as a known limitation. This was flagged in a previous review.


318-321: Email will be None for users without pre-existing shadow row.

The email lookup relies on existing_rows which is empty for first-time deletions. This was flagged in a previous review.


176-186: Acknowledged TOCTOU mitigation — good defensive comment.

Moving the ledger insert before payload generation shrinks the race window. The inline comment clearly documents the trade-off and why true atomicity isn't achievable with the current DB wrapper.


270-285: LGTM — idempotency guard prevents purge window reset.

Repeat calls now return the existing deleted_at/purge_after without re-cascading, addressing the previous review feedback.


226-235: LGTM — clear error handling for oversized exports.

The 507 response with actionable contact information is appropriate until signed-URL delivery is implemented.

cloud/tests/test_gdpr.py (6)

35-70: Missing default empty responses for some tables may cause fragile tests.

The gdpr_client fixture only registers mock responses for explicitly provided parameters. Tests like test_delete_returns_202_and_schedules_purge don't provide corrections, lessons, events, or meta_rules. If mock_supabase doesn't gracefully handle unregistered table access, these tests could break unexpectedly.

This was noted in a previous review.


78-97: LGTM — auth enforcement tests cover all three endpoints.

Testing for both 401 and 403 is appropriate since FastAPI's HTTPBearer behavior can vary.


105-169: LGTM — comprehensive coverage for data summary endpoint.

Tests verify counts, timestamp calculations, and soft-delete exclusion.


177-265: LGTM — thorough export endpoint test coverage.

Tests cover inline payload delivery, rate limiting (both directions), ledger insertion, and soft-deleted row inclusion for GDPR transparency.


321-350: LGTM — idempotency test correctly verifies no re-cascade.

The test confirms that repeat /me/delete calls return existing tombstone values and don't insert a new record, protecting the 30-day purge window.


359-368: LGTM — sanity check for backward compatibility.

This confirms that the /workspaces/{id}/members endpoint remains functional after the soft-delete filtering changes were introduced elsewhere.

cloud/migrations/006_soft_delete.sql (3)

65-69: LGTM — idempotency guard added for policy.

The DROP POLICY IF EXISTS before CREATE POLICY addresses the previous review feedback and makes the migration re-runnable.


29-35: Shadow users table is only populated on-demand — confirm this is intentional.

The public.users table is created here but no trigger populates it on user signup (the existing handle_new_user() in 001_initial_schema.sql only creates workspaces and workspace_members). This means for most users, the row won't exist until they call /me/delete.

This lazy approach is valid, but note that the email column will always be NULL for newly-created tombstone rows since the upsert in gdpr.py doesn't populate it. The confirmation email stub in the route will receive None.


41-48: LGTM — partial indexes are efficient for soft-delete queries.

The WHERE deleted_at IS NOT NULL partial indexes will only index tombstoned rows, keeping the indexes small while enabling fast purge-candidate lookups.

Comment thread cloud/app/routes/gdpr.py
Comment on lines +129 to +160
async def _enforce_export_rate_limit(user_id: str) -> None:
"""Raise 429 if the user already exported within EXPORT_RATE_WINDOW."""
db = get_db()
rows = await db.select(
"gdpr_export_requests",
columns="id,created_at",
filters={"user_id": user_id},
)
if not rows:
return
cutoff = _utcnow() - EXPORT_RATE_WINDOW
for r in rows:
created = r.get("created_at")
if not created:
continue
# Accept both datetime and ISO-string forms.
if isinstance(created, str):
try:
created_dt = datetime.fromisoformat(created.replace("Z", "+00:00"))
except ValueError:
continue
elif isinstance(created, datetime):
created_dt = created
else:
continue
if created_dt.tzinfo is None:
created_dt = created_dt.replace(tzinfo=timezone.utc)
if created_dt >= cutoff:
raise HTTPException(
status_code=429,
detail="Data export is rate-limited to 1 request per 24 hours.",
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider fetching only the most recent export request.

The rate limit check fetches all historical export requests for the user. While the 24h rate limit bounds growth to ~365 rows/year, fetching only the most recent record would be more efficient.

♻️ Suggested optimization
 async def _enforce_export_rate_limit(user_id: str) -> None:
     """Raise 429 if the user already exported within EXPORT_RATE_WINDOW."""
     db = get_db()
-    rows = await db.select(
-        "gdpr_export_requests",
-        columns="id,created_at",
-        filters={"user_id": user_id},
-    )
-    if not rows:
+    # Only need to check the most recent request
+    rows = await db.select(
+        "gdpr_export_requests",
+        columns="id,created_at",
+        filters={"user_id": user_id},
+        order="created_at.desc",
+        limit=1,
+    )
+    if not rows:
         return
     cutoff = _utcnow() - EXPORT_RATE_WINDOW
-    for r in rows:
-        created = r.get("created_at")
+    created = rows[0].get("created_at")
+    if not created:
+        return
     ...

Note: Exact syntax depends on your db.select API's support for order and limit parameters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/gdpr.py` around lines 129 - 160, In
_enforce_export_rate_limit, avoid selecting all rows by changing the db.select
call for "gdpr_export_requests" to fetch only the most recent record (use order
by created_at DESC and limit 1, or the equivalent supported by your db.select
API) and then run the same datetime parsing and cutoff comparison against that
single row; if db.select doesn't support order/limit, replace it with a query
method that does (or request the max(created_at) for the user) so only the
latest export timestamp is retrieved and checked.

Flag the _enforce_export_rate_limit full-scan as a db-layer follow-up.
SupabaseClient.select currently only accepts eq filters — no order_by /
limit support — so narrowing the query requires extending the wrapper.
Leaving a TODO to track rather than scoping the db-layer change into
this PR.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
cloud/app/routes/gdpr.py (1)

291-319: ⚠️ Potential issue | 🟠 Major

The soft-delete cascade is still non-atomic.

users is tombstoned before the per-row workspaces/brains updates run, so any later update() failure can leave the account marked deleted while some owned resources remain active. This still needs a transaction or DB-side bulk update once the wrapper supports it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/routes/gdpr.py` around lines 291 - 319, The current soft-delete
flow performs db.upsert("users", ...) before per-row db.update calls for
workspaces and brains (via _collect_user_workspaces and _collect_user_brains),
leaving the operation non-atomic; wrap the user tombstone plus all
workspace/brain updates in a single database transaction (or use a DB-side bulk
update once supported) so either all rows (users, workspaces, brains) are
updated or none are; specifically, perform the db.upsert("users", ...) and the
subsequent db.update("workspaces", ...) and db.update("brains", ...) calls
inside the same transactional context (or replace the per-row loops with a
single bulk UPDATE using a filter like id=in.(...)) to ensure atomicity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 88-91: The GET handlers for user data (e.g., data_summary)
currently accept user_id from get_current_user_id which only validates the JWT;
update these handlers to verify the account is active (not soft-deleted) before
returning data. Specifically, add an active-user check after obtaining user_id
(or replace get_current_user_id with a new auth helper) that queries the users
table for deleted_at IS NULL (or uses the existing delete_me check logic) and
raise an appropriate HTTP 403/404 if the user is soft-deleted; apply the same
change to the /me/export handler referenced around lines 167-169 so both
data_summary and export endpoints block access for soft-deleted accounts.
- Around line 179-189: The current logic inserts into gdpr_export_requests (via
the db.insert call) before the export payload is generated, which consumes the
24-hour quota even on failures; change the flow so that either (a) you only
insert the successful export row into gdpr_export_requests after the payload is
generated/delivered, or (b) add a status field (e.g.,
status="success"|"failure") and write a failure record separately, and update
_enforce_export_rate_limit to count only status="success" rows; apply the same
change to the analogous block around lines 227-238 so failed/oversized exports
do not decrement the user's retry quota.
- Line 219: The log call currently emits raw PII (e.g., the _log.info(...) that
logs user_id and size_bytes and other GDPR-related logs that include email in
the deletion path); replace raw identifiers with a non-reversible or redacted
representation before logging (for example compute a deterministic hash of
user_id/email or redact them to a short masked string) and continue to log
counts/status like size_bytes; update every GDPR-related logger usage (the
_log.info call that prints "GDPR export generated for user=%s size=%d" and the
deletion-path logs that include email) to use the hashed/redacted identifier
instead of the raw value.

---

Duplicate comments:
In `@cloud/app/routes/gdpr.py`:
- Around line 291-319: The current soft-delete flow performs db.upsert("users",
...) before per-row db.update calls for workspaces and brains (via
_collect_user_workspaces and _collect_user_brains), leaving the operation
non-atomic; wrap the user tombstone plus all workspace/brain updates in a single
database transaction (or use a DB-side bulk update once supported) so either all
rows (users, workspaces, brains) are updated or none are; specifically, perform
the db.upsert("users", ...) and the subsequent db.update("workspaces", ...) and
db.update("brains", ...) calls inside the same transactional context (or replace
the per-row loops with a single bulk UPDATE using a filter like id=in.(...)) to
ensure atomicity.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a1031240-f307-4380-a5ff-5f80b4410c42

📥 Commits

Reviewing files that changed from the base of the PR and between 31ace9c and aff5e4a.

📒 Files selected for processing (1)
  • cloud/app/routes/gdpr.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (1)
cloud/app/routes/gdpr.py (1)

273-288: Nice idempotency guard.

Returning the existing deleted_at/purge_after avoids silently extending the purge window on repeated delete calls.

Comment thread cloud/app/routes/gdpr.py
Comment thread cloud/app/routes/gdpr.py Outdated
Comment thread cloud/app/routes/gdpr.py Outdated
…ffort rollback

- Guard /me/data-summary and /me/export against soft-deleted accounts
  (JWT stays valid until TTL; add users.deleted_at IS NULL check, 404 on deleted).
- Don't burn the 24h export quota on failed exports: move the
  gdpr_export_requests ledger insert to AFTER size check passes so 507
  oversized-payload rejections don't block the user from retrying.
- Strip raw user_id and email from GDPR info logs (log counts/has_email only)
  since these logs are long-lived and compliance-sensitive.
- Reorder soft-delete cascade to tombstone subordinate resources
  (workspaces, brains) BEFORE the users row, so mid-cascade failures leave
  the account recoverable rather than orphaned-but-tombstoned. Wrap in
  try/except with best-effort rollback of partial subordinate updates and
  raise HTTP 500 so the client retries. Proper transactional support is
  tracked as a follow-up (requires db-wrapper changes).
- Add tests for: soft-deleted-user read block, ledger-row-only-on-success,
  cascade subordinate-first ordering, mid-cascade rollback.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 14, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai addressed round-4 in 0f4eea8:

Three actionable items — fixed as suggested:

  1. Added _require_active_user() guard called by data_summary and export_me — soft-deleted accounts now get 404 on reads even if their JWT is still within TTL.
  2. Moved the gdpr_export_requests ledger insert to AFTER the size check passes, so a 507 oversized-payload rejection no longer burns the 24h quota.
  3. Stripped user_id and email from the three GDPR info logs — we now log counts / has_email=bool only.

Duplicate non-atomic cascade — declining the transactional fix for the third time, but made it resilient:
The db wrapper (SupabaseClient) still has no transaction support and no id=in.(...) filter — a proper atomic cascade is blocked on a separate follow-up that grows either. Rather than punt entirely, this round:

  • Reorders the cascade: workspaces + brains tombstone FIRST, users row LAST. A mid-flight failure now leaves the account still active (recoverable on retry) instead of tombstoned-with-live-resources (orphaned and invisible to the user).
  • Wraps in try/except with best-effort rollback: on any subordinate update failure, we revert the already-tombstoned workspace/brain rows and raise HTTP 500 so the user retries the whole cascade.
  • Comment left in-code linking to the follow-up for proper transactional support.

New tests cover: soft-deleted-user read blocks, ledger-only-on-success (happy + 507 paths), subordinate-first ordering, and mid-cascade rollback on simulated brain-update failure. 135/135 cloud tests pass.

@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me verify the implementation in the updated code before responding.


🏁 Script executed:

#!/bin/bash
# Check gdpr.py for the new guard, ledger insert position, log changes, and cascade ordering
fd gdpr.py --type f | xargs cat

Length of output: 36181


@Gradata — verified all four items against 0f4eea8. Everything is exactly as described:

_require_active_user() guard
Present in both data_summary and export_me. The shadow-table check handles the no-row case correctly (if rows and rows[0].get("deleted_at")) — users who were never soft-deleted have no entry in the users table, so the guard passes without error. delete_me intentionally skips this guard and handles idempotency inline, which is the right call.

✅ Ledger insert after size check
The await db.insert("gdpr_export_requests", ...) sits below the if size_bytes > INLINE_EXPORT_MAX_BYTES raise, and the TOCTOU note in the comment is clear and appropriately scoped.

✅ PII stripped from logs
All three log sites emit only size=, has_email=bool(email), and aggregate counts — no user_id, no email string anywhere in the log calls.

✅ Resilient cascade (ordering + rollback)
workspacesbrainsusers (upsert) order confirmed. The try/except collects updated_workspaces / updated_brains and reverts them on failure before raising HTTP 500. The comment linking to the follow-up for a proper transaction or id=in.(...) bulk PATCH is in place and appropriate.

New teststest_delete_tombstones_subordinates_before_user, test_delete_rolls_back_partial_cascade_on_mid_flight_failure, the two soft-deleted-user read-block tests, and both ledger-quota tests are all present and correctly structured.

One minor note for the backlog: _require_active_user adds an extra DB round-trip on every GDPR read. It's fine for now given the low call frequency on these endpoints, but worth memoizing within the request lifetime (e.g., via a @lru_cache scoped to the dependency chain) once the user-table lookup gets busier.

Everything else looks solid — happy to approve this.

[approve]

@Gradata

Gradata commented Apr 14, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Copy link
Copy Markdown
✅ Actions performed

Full review triggered.

@Gradata

Gradata commented Apr 14, 2026

Copy link
Copy Markdown
Owner Author

Addressed round-7 CR:

  • cloud/app/db.py: update() now raises HTTPException 500 with a scrubbed message when filters is None/empty, closing the full-table PATCH hole CR flagged. Added a final if not params safety net in case every filter value was stripped to an empty collection.
  • cloud/app/db.py: Applied the same guard to delete() for defense-in-depth on the GDPR cascade paths.
  • cloud/tests/conftest.py: MockSupabaseClient.update/delete mirror the new contract so tests exercise the guard path.

140/141 cloud tests pass; the 1 failure is pre-existing (test_export_script_handles_empty_run_dir missing scripts/export_ab_proof.py in this worktree, unrelated to this PR).

@Gradata

Gradata commented Apr 14, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review — fixes pushed since last review, please re-verify

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown
🧩 Analysis chain

Let me verify the latest state of the round-7 fixes before triggering the review.


🏁 Script executed:

#!/bin/bash
echo "=== cloud/app/db.py (update + delete guards) ==="
fd db.py cloud/app --type f | xargs cat

echo ""
echo "=== cloud/tests/conftest.py (MockSupabaseClient update/delete) ==="
fd conftest.py cloud/tests --type f | xargs cat

echo ""
echo "=== cloud/app/routes/gdpr.py (key sections) ==="
fd gdpr.py cloud/app --type f | xargs grep -n "require_active_user\|gdpr_export_requests\|has_email\|user_id\|deleted_at\|rollback\|bulk" | head -80

Length of output: 13710


@Gradata — verified all round-7 fixes against the latest branch state. Everything is correctly implemented:

db.update() unfiltered-PATCH guard
Three-layer defense confirmed: (1) if not filters guard at entry raises HTTP 500, (2) empty-list filter value short-circuits to return [] as a no-op, (3) final if not params safety net catches any edge case where all filter values were stripped away.

db.delete() unfiltered-DELETE guard
if not filters guard raises HTTP 500 with a scrubbed message before any params are built — mirrors the update guard pattern.

MockSupabaseClient parity in conftest.py
Both update and delete now match production semantics exactly: if not filtersHTTPException(500), and update additionally returns [] on an empty list/tuple/set filter value.

✅ All round 4–6 items still holding

  • _require_active_user() called in both data_summary and export_me (soft-deleted JWT blocked)
  • Ledger insert (gdpr_export_requests) sits after the 10MB size check (507 doesn't burn quota)
  • Logs emit has_email=bool(email) only — no raw user_id/email anywhere on the GDPR paths
  • Cascade order: workspacesbrainsusers upsert; best-effort rollback via bulk deleted_at=None patch on failure

Everything looks solid. Triggering the review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review — CR Pro active, please re-verify.

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown
🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== cloud/app/db.py: update + delete guards ==="
fd db.py --type f | grep -v __pycache__ | xargs grep -n "not filters\|not params\|empty.*collection\|unfiltered\|in\.\(" | head -40

echo ""
echo "=== cloud/app/routes/gdpr.py: key guards and ordering ==="
fd gdpr.py --type f | grep routes | xargs grep -n "_require_active_user\|gdpr_export_requests\|has_email\|size_bytes > INLINE\|deleted_at\|rollback\|workspaces.*brains\|brains.*users" | head -60

echo ""
echo "=== cloud/tests/conftest.py: mock update/delete guards ==="
fd conftest.py --type f | xargs grep -n "not filters\|HTTPException\|empty\|list\|tuple\|set" | head -30

echo ""
echo "=== marketing SLA noindex ==="
fd "page.tsx" --type f | xargs grep -l "sla" | xargs grep -n "index.*false\|noindex\|robots" | head -10

echo ""
echo "=== marketing DPA API paths ==="
fd "page.tsx" --type f | xargs grep -l "dpa" | xargs grep -n "/api/v1/me" | head -10

echo ""
echo "=== security.txt present ==="
fd "security.txt" --type f

echo ""
echo "=== migration 006 soft delete ==="
fd "006_soft_delete.sql" --type f | xargs grep -c "."

Length of output: 4991


@Gradata — re-verified against the latest branch state. All fixes across rounds 4–7 are confirmed intact:

✅ Round-7 (db guards)

  • db.update(): three-layer guard — not filters → HTTP 500; empty list/tuple/set filter value → [] no-op; final not params safety net (mirrored exactly in MockSupabaseClient.update)
  • db.delete(): not filters → HTTP 500 (mirrored in MockSupabaseClient.delete)

✅ Round-6 (marketing)

  • SLA robots: { index: false, follow: false } confirmed at line 10
  • DPA data-subject rights section references full API paths /api/v1/me/export and /api/v1/me/delete at line 122

✅ Rounds 4–5 (GDPR endpoints)

  • _require_active_user() called in both data_summary (line 143) and export_me (line 224)
  • gdpr_export_requests ledger insert at line 277, strictly after the 10MB size check at line 255 — 507 responses never burn quota
  • Logs emit has_email=bool(email) only — no raw PII
  • Cascade order: workspaces → brains → users upsert; best-effort bulk rollback (deleted_at=None) on mid-flight failure
  • Migration 006 (006_soft_delete.sql) present at 61 lines

Everything looks solid. Triggering the review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

Rebased on main (merge 24ce2b0) + round-8 CR review: All previously-flagged items were already addressed in round-7 (def0fb1 hard-guards unfiltered PATCH/DELETE in cloud/app/db.py). CodeRabbit's re-flag of update() is a stale read — current code raises HTTPException 500 when filters is None/empty, and also refuses when all filter values evaluate empty. PII logs already redacted (size_bytes only for export, has_email bool for deletion). SLA/DPA/Subprocessors draft pages already have robots noindex. Cloud tests: 135 passed (incl. 20 GDPR); unrelated test_proof.py failure is due to missing cloud/scripts/export_ab_proof.py on main — tracked in separate branch fix/missing-export-ab-proof-script.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review — rebased on latest main, all prior fixes applied

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown
🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== db.py guards (update + delete) ==="
fd db.py --type f | grep -v __pycache__ | xargs grep -n "not filters\|not params\|HTTPException\|unfiltered\|in\.\(" | head -40

echo ""
echo "=== gdpr.py: _require_active_user, ledger ordering, PII logs, cascade ==="
fd gdpr.py --type f | grep routes | xargs grep -n "_require_active_user\|gdpr_export_requests\|has_email\|size_bytes > INLINE\|deleted_at\|rollback\|workspaces\|brains\|users" | head -80

echo ""
echo "=== conftest.py mock guards ==="
fd conftest.py --type f | xargs grep -n "not filters\|HTTPException\|empty\|list.*tuple\|tuple.*set" | head -20

echo ""
echo "=== SLA noindex ==="
fd "page.tsx" --type f | xargs grep -l "sla" 2>/dev/null | xargs grep -n "index.*false\|robots" | head -10

echo ""
echo "=== DPA API paths ==="
fd "page.tsx" --type f | xargs grep -l "dpa" 2>/dev/null | xargs grep -n "/api/v1/me" | head -10

echo ""
echo "=== migration 006 present ==="
fd "006_soft_delete.sql" --type f

echo ""
echo "=== security.txt ==="
fd "security.txt" --type f

echo ""
echo "=== routes/__init__.py includes gdpr router ==="
fd "__init__.py" --type f | grep routes | xargs grep -n "gdpr" | head -10

echo ""
echo "=== git log --oneline -5 (recent commits) ==="
git log --oneline -5

Length of output: 6047


@Gradata — verified the post-rebase state against the latest commit (0a51347). All round 4–8 fixes are intact:

Item Status Location
_require_active_user() guard (data_summary + export_me) gdpr.py lines 143, 224
Ledger insert after 10MB size check (507 ≠ quota burn) gdpr.py lines 255 → 277
PII-scrubbed logs (has_email=bool only) gdpr.py line 299
Cascade order: workspaces → brains → users gdpr.py lines 355, 362, 371
Best-effort bulk rollback on mid-cascade failure gdpr.py lines 389–402
db.update() / db.delete() unfiltered-operation guards conftest.py lines 59–61, 72–74
Mock parity (empty list → no-op; no filters → HTTP 500) conftest.py line 63
SLA robots: { index: false, follow: false } sla/page.tsx line 10
DPA full API paths /api/v1/me/export, /api/v1/me/delete dpa/page.tsx line 122
Migration 006_soft_delete.sql Present
GDPR router registered routes/__init__.py lines 13, 38

Nothing was lost in the merge onto main. Triggering the review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Gradata added 2 commits April 14, 2026 23:17
…LETE

Round-9 response to CodeRabbit critical finding on PR #36. The primary
guard (`if not filters: raise`) was already present from earlier rounds;
this commit adds:

- Post-transform empty-params guard in delete() mirroring update()
- Dedicated test_db_guards.py with 5 tests proving update() and delete()
  never fire an HTTP call when filters is None or {}, using a fake http
  client that asserts on any call
@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

Round-9: reinforced unfiltered-PATCH guard in db.update() + test

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cloud/app/db.py (1)

132-157: ⚠️ Potential issue | 🟠 Major

delete() lacks list/tuple filter handling, asymmetric with update().

The update() method correctly handles list/tuple/set filter values (converting to in.(...) syntax and treating empty lists as a no-op), but delete() only uses eq.{val} for all values. This creates inconsistent behavior:

Call update() behavior delete() behavior
filters={"id": [1,2,3]} id=in.(1,2,3) id=eq.[1, 2, 3] ❌ invalid
filters={"id": []} Returns [], no HTTP ✅ id=eq.[], fires DELETE ❌

The update() method is already being called with list filters in production (e.g., gdpr.py lines 354–364 pass workspace_ids and brain_ids lists). The delete() method currently receives only scalars, but should be aligned to prevent unsafe behavior if batch deletions are added in future.

Proposed fix
    async def delete(
        self, table: str, filters: dict[str, Any] | None = None,
    ) -> list[dict]:
        """DELETE rows matching eq filters. Returns deleted rows when PostgREST sends them back.

        Refuses to issue any DELETE without at least one filter constraint to
        prevent accidental full-table deletion.
        """
        if not filters:
            _log.error("Refusing unfiltered delete on %s", table)
            raise HTTPException(status_code=500, detail="Database error")
        params: dict[str, str] = {}
        for key, val in filters.items():
-            params[key] = f"eq.{val}"
+            if isinstance(val, (list, tuple, set)):
+                if not val:
+                    # No matches possible; refuse to DELETE without a filter.
+                    return []
+                joined = ",".join(str(v) for v in val)
+                params[key] = f"in.({joined})"
+            else:
+                params[key] = f"eq.{val}"
         # Final safety net: if every filter value was stripped away somehow,
         # refuse rather than issue a full-table DELETE.
         if not params:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/app/db.py` around lines 132 - 157, The delete() method currently treats
every filter value as eq.{val} and must be updated to mirror update() semantics:
for each filters.items() if val is a list/tuple/set then if empty return [] (no
HTTP call) else convert to "in.(v1,v2,...)"; if val is not an iterable or is a
string use "eq.{val}"; build params accordingly (same symbol: delete(),
_http.delete, _raise_db_error, resp.json()) and keep the existing safety checks
and error handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cloud/tests/test_db_guards.py`:
- Around line 74-91: Add a test that covers the empty-list filter case for
delete: create an async test (e.g.
test_delete_returns_empty_for_empty_list_filter) that calls
client.delete("users", filters={"id": []}) using _make_client(), asserts the
result is [] and asserts fake.calls == [] to ensure no DELETE is fired; this
documents and will catch regressions in the client.delete behavior referenced in
db.py.

---

Outside diff comments:
In `@cloud/app/db.py`:
- Around line 132-157: The delete() method currently treats every filter value
as eq.{val} and must be updated to mirror update() semantics: for each
filters.items() if val is a list/tuple/set then if empty return [] (no HTTP
call) else convert to "in.(v1,v2,...)"; if val is not an iterable or is a string
use "eq.{val}"; build params accordingly (same symbol: delete(), _http.delete,
_raise_db_error, resp.json()) and keep the existing safety checks and error
handling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 141929a8-9039-483f-a300-956ebcb775fd

📥 Commits

Reviewing files that changed from the base of the PR and between def0fb1 and d9f0d83.

📒 Files selected for processing (2)
  • cloud/app/db.py
  • cloud/tests/test_db_guards.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.11)
  • GitHub Check: Test (Python 3.12)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (3)
cloud/app/db.py (1)

93-125: LGTM on the update() guards.

The defense-in-depth approach is solid:

  1. Early guard rejects None/empty filters
  2. Empty list filter values return [] without issuing a PATCH
  3. List/tuple/set values correctly convert to PostgREST in.(...) syntax
  4. Final safety net catches edge cases where all filters were stripped
cloud/tests/test_db_guards.py (2)

12-42: Well-designed test infrastructure.

The _FakeHTTP stub and _make_client() helper cleanly isolate the guard logic from real HTTP calls. The approach of appending to calls before raising ensures assertions remain verifiable even if guards fail.


45-71: Good coverage for update() guard scenarios.

The three test cases comprehensively cover:

  • filters=None → raises 500
  • filters={} → raises 500
  • filters={"id": []} → returns [] (no-op)

Comment on lines +74 to +91
@pytest.mark.asyncio
async def test_delete_refuses_when_filters_is_none():
"""delete(filters=None) must raise and NOT fire a DELETE."""
client, fake = _make_client()
with pytest.raises(HTTPException) as exc:
await client.delete("users", filters=None)
assert exc.value.status_code == 500
assert fake.calls == []


@pytest.mark.asyncio
async def test_delete_refuses_when_filters_is_empty_dict():
"""delete(filters={}) must raise and NOT fire a DELETE."""
client, fake = _make_client()
with pytest.raises(HTTPException) as exc:
await client.delete("users", filters={})
assert exc.value.status_code == 500
assert fake.calls == []

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider adding test for delete(filters={"id": []}).

The delete() tests cover None and {} filters but not empty-list filter values. Given the asymmetry flagged in db.py, adding this test would:

  1. Document the current (potentially problematic) behavior
  2. Catch regressions if delete() is updated to match update()
📝 Suggested additional test
`@pytest.mark.asyncio`
async def test_delete_returns_empty_for_empty_list_filter():
    """delete(filters={id: []}) should be a no-op returning [] without firing DELETE."""
    client, fake = _make_client()
    result = await client.delete("users", filters={"id": []})
    assert result == []
    assert fake.calls == []

Note: This test will fail with the current delete() implementation, confirming the issue flagged in db.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cloud/tests/test_db_guards.py` around lines 74 - 91, Add a test that covers
the empty-list filter case for delete: create an async test (e.g.
test_delete_returns_empty_for_empty_list_filter) that calls
client.delete("users", filters={"id": []}) using _make_client(), asserts the
result is [] and asserts fake.calls == [] to ensure no DELETE is fired; this
documents and will catch regressions in the client.delete behavior referenced in
db.py.

@Gradata Gradata merged commit 9a538d0 into main Apr 15, 2026
11 checks passed
Gradata added a commit that referenced this pull request Apr 15, 2026
* feat(marketing): Plausible analytics with env-gated event tracking

Adds privacy-first, cookie-free analytics (Plausible) gated by
NEXT_PUBLIC_ENABLE_ANALYTICS so local dev and preview deploys stay clean.
Respects DNT client-side.

Tracked events (all client-side, no PII):
- signup_click   (header, hero, home footer CTA)
- signup_complete (fires once on /?signup=success redirect from Supabase)
- docs_click     (header nav)
- install_copy   (hero install command copy button)

Added vitest with 10 passing tests for the analytics helper (opt-in gate,
DNT, domain default, exception safety, prop forwarding). Documented env
vars in marketing/.env.example and cloud/RAILWAY-ENV.md.

* feat(sdk,cloud): opt-in anonymous activation telemetry

Gives us a way to measure time-to-first-value without collecting any
PII, code, or lesson content. Strictly opt-in, off by default.

SDK (src/gradata/_telemetry.py):
- Prompts once on `gradata init` with explicit "[y/N]" default-off copy
- Persists choice in ~/.gradata/config.toml under [telemetry] enabled
- GRADATA_TELEMETRY=0 env var always wins (kill switch for opted-in users)
- Fires four activation events, each at most once per machine:
    brain_initialized          (gradata init)
    first_correction_captured  (first brain.correct())
    first_graduation           (first correction that graduates a rule)
    first_hook_installed       (first gradata hooks install)
- Wire payload is exactly {event, user_id=sha256(machine_id), ts, sdk_version}
- Network calls run on a daemon thread; never block the user; never raise

Cloud (cloud/app/routes/telemetry.py + migration 006):
- POST /telemetry/event — public, unauthenticated, always 202
- In-memory sliding-window rate limit (100/min/IP)
- Replay protection: drops events >1h old or >5min future
- Pydantic schema enforces field shape; extra fields silently dropped
- DB errors swallowed (never leak state to the public surface)
- New telemetry_events table with RLS enabled (service role only)

Tests: 27 SDK tests + 19 cloud tests, all green. Full SDK suite 2097 pass,
cloud suite 134 pass.

* chore(cloud): rename migration 006_telemetry_events -> 007

PR #36 (GDPR) uses migration 006_soft_delete.sql. Renaming telemetry
migration to 007 to resolve collision.

Co-Authored-By: Gradata <noreply@gradata.ai>

* style: ruff --fix for PR #37

* chore: address CodeRabbit feedback on PR #37

MAJOR (cli.py)
- Replace hardcoded ~/.gradata/config.toml string with _telemetry.config_path()
  so the user sees the real path on Windows/macOS/Linux.
- Replace bare exception swallow with a logger.debug call.
- Guard send_once("brain_initialized") with is_enabled() + try/except so a
  telemetry bug or DNS hiccup can never break ``gradata init``.

MINOR (_telemetry.py)
- Fix prefix-match bug in _write_config_key: drop startswith() pre-filter and
  rely on exact lhs == bare check after splitting on '='.
- Wrap send_once read-modify-write in a cross-process advisory file lock
  (msvcrt on Windows, fcntl elsewhere). Degrades silently if locking is
  unavailable.
- Expose config_path() helper for callers.

MINOR (cloud/app/routes/telemetry.py)
- Tighten _is_hex64: explicitly require lowercase to match docstring and keep
  storage canonical. Also migrate datetime.now(timezone.utc) -> datetime.now(UTC).

MINOR (cloud/migrations/007_telemetry_events.sql)
- Add explicit deny-all RLS policy (telemetry_events_deny_all) wrapped in
  DROP POLICY IF EXISTS for re-runnability.

NIT (marketing)
- Remove redundant <span aria-hidden>{site.name}</span> from page.tsx —
  ``site`` is already used in the metadata export.
- Make CodeBlock aria-label dynamic via copyAriaLabel prop (default
  "Copy code"). Hero passes "Copy install command" to preserve current copy.

NIT (tests/test_telemetry.py)
- Parametrize kill-switch tests across all recognized off- and truthy-values.
- Tighten MAC-leak assertion: assert real MAC is >=12 chars then check it is
  not a substring of the digest, plus a deterministic synthetic-seed test
  that compares the digest to the known sha256.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(test): loosen MAC-leak assertion to check non-inclusion, not length

The prior tightening to `len(mac_hex) >= 12` broke on GitHub-hosted
CI runners where `uuid.getnode()` can yield a shorter hex string.
The security-relevant invariant is that the raw MAC hex must not
appear in the hashed user_id — not that the MAC is a particular
length. The deterministic sha256 check in
test_does_not_leak_synthetic_short_mac remains as the stronger
guarantee.

Co-Authored-By: Gradata <noreply@gradata.ai>

* chore: address CodeRabbit round-2 feedback on PR #37 (print->log, PII ip, proxy-aware limit, unmount cleanup)

* chore: address CodeRabbit round-3 on PR #37 (reject naive ts + other fixes)

- cloud/telemetry.py: reject naive ts at API boundary (silent 202, matches
  always-202 contract) so malicious clients can't extend replay window
  via local-tz interpretation. Adds test_drops_naive_timestamps.
- marketing/CodeBlock.tsx: fix copyValue docstring — it replaces the whole
  copied text, not just the first line.
- tests/test_telemetry.py (SDK): drop redundant test_no_extra_fields
  (test_exact_keys already asserts the same invariant), kept comment
  explaining the security rationale.

* refactor: simplify pass on analytics-and-activation branch

Remove redundant is_enabled() check in cmd_init — send_once() already
gates on is_enabled() internally, so wrapping it in an outer check is
duplicate work that obscures the actual invariant.

No behavior change. Privacy-sensitive code preserved verbatim:
  - _config_lock (msvcrt/fcntl) cross-process lock
  - naive-ts silent-drop at API boundary
  - source_ip excluded from DB (PII)
  - get_remote_address X-Forwarded-For support
  - CodeBlock unmount cleanup + copyAriaLabel prop

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(review): address CodeRabbit feedback round 4 on analytics

Honor the timeout parameter in _config_lock on POSIX by using
non-blocking fcntl.flock with a retry loop, matching the Windows
branch. Previously, blocking flock(LOCK_EX) could hang indefinitely
if another process held the lock, violating the documented
best-effort contract.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(telemetry): route config path through shared resolver (CR round-5)

Addresses CodeRabbit Major finding on PR #37: _telemetry.py was
hardcoding Path.home() / ".gradata", violating the src/gradata/**/*.py
coding guideline ("no hardcoded paths"). Introduces a small
_config_paths module with get_config_dir() that supports
GRADATA_CONFIG_DIR and XDG_CONFIG_HOME overrides, and reworks
_telemetry.py to resolve CONFIG_DIR / CONFIG_PATH through it on every
call. Telemetry remains opt-in (default off) and behaviour is
unchanged on the default path.

Also addresses the Minor test fix in cloud/tests/test_telemetry.py:
renames the misleading test_drops_exactly_one_hour_old (which actually
used hours=1, seconds=5) and parametrizes the boundary into two
clearly-named cases, plus adds a positive test asserting an
exactly-1h-old event is still accepted (since the route uses strict
"now - ts > 1h").

tests/test_telemetry.py switches from monkeypatching CONFIG_DIR /
CONFIG_PATH to setting GRADATA_CONFIG_DIR, aligning with the new
single source of truth.

---------

Co-authored-by: Oliver Le <oliver@gradata.com>
Co-authored-by: Gradata <noreply@gradata.ai>
@Gradata Gradata deleted the feat/gdpr-compliance branch April 15, 2026 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant