Skip to content

feat: multi-tenant SDK + cloud alignment (tenant_id, visibility, clusters, sync_state)#102

Merged
Gradata merged 23 commits into
mainfrom
worktree-multi-tenant-cloud-align
Apr 17, 2026
Merged

feat: multi-tenant SDK + cloud alignment (tenant_id, visibility, clusters, sync_state)#102
Gradata merged 23 commits into
mainfrom
worktree-multi-tenant-cloud-align

Conversation

@Gradata

@Gradata Gradata commented Apr 17, 2026

Copy link
Copy Markdown
Owner

Summary

Future-proofs the Gradata SDK for a second user. Local brain and cloud schema now share identity (tenant_id == brain_id), every per-tenant INSERT carries the tenant UUID, and a push-only MVP streams rows to Supabase behind an env flag.

Key changes:

  • Migration infra (src/gradata/_migrations/): unified package with numbered-migration dispatch + idempotency tracking (migrations table). Migration 001_add_tenant_id adds tenant_id to 35 tables, backfills all rows, adds visibility to meta_rules/frameworks/rule_relationships, and creates per-tenant indexes.
  • Tenant helpers (src/gradata/_tenant.py): tenant_for(brain_dir) lru-cached lookup, file-backed at <brain>/.tenant_id, env override via GRADATA_TENANT_ID.
  • INSERT wiring (7 commits): every per-tenant write path now emits tenant_id — events, rule_provenance, meta_rules, correction_patterns, activity_log, prep_outcomes, rule_relationships, pending_approvals, lesson_transitions, brain_fts_content. Each site has a defensive ALTER TABLE ADD COLUMN tenant_id wrapped in suppress(OperationalError) for pre-migration brains.
  • Cloud schema (cloud/scripts/): adds visibility (private/shared/global) + source_cluster_id to meta_rules, drops NOT NULL on meta_rules.brain_id for global rules, creates sync_state and clusters tables with RLS policies. Clusters restore the richer grouping concept (member_lesson_ids, centroid_summary, contradiction_score, cohesion_score) that was lost when we switched to flat meta_rules.
  • Cloud sync MVP (src/gradata/_cloud_sync.py): ~170 LOC push-only client. Stdlib-only (urllib), no threads. Reads tenant-scoped rows from events/lessons/meta_rules/clusters/correction_patterns/rule_provenance since sync_state.last_push_at and POSTs to Supabase PostgREST with upsert semantics. Disabled by default (GRADATA_CLOUD_SYNC=1 to enable).
  • Local re-point: production brain's .tenant_id was aligned to the existing cloud brain_id (3b49e9c6-2cf1-4e6f-b9bf-51a618fe830a) before applying migration 001, so the 2954 backfilled rows land under the correct tenant.

Test results: 3684 passed, 21 skipped, 2 xfailed. The 2 failures on test_cli_export_writes_cursorrules are the pre-existing cp1252 encoding flake — unrelated to this PR (see commit log before fad6cdc).

Test plan

  • Migration 001 dry-run + apply against Oliver's production brain (2062 events) — all rows backfilled, local tenant_id matches cloud brain_id
  • 10 new INSERT-site assertion tests in tests/test_tenant_id_inserts.py (every wired site asserts tenant_id IS NOT NULL)
  • 6 new push-sync tests in tests/test_cloud_row_push.py (disabled-by-default, tenant filter, sync_state update, missing-db no-op, unmigrated-table skip)
  • Full pytest green (3684 pass)
  • Verify RLS isolation with second tenant — deferred to follow-up (requires second Supabase user)
  • Pull / delete / conflict resolution — deferred to follow-up (MVP is push-only)

Generated with Gradata

Gradata and others added 14 commits April 16, 2026 23:57
…ompts

Two measured waste eliminations:

1. inject_brain_rules: mandatory_reminder block restated every mandatory
   rule verbatim in a recency tag. mandatory_block already places them in
   primacy position — the reminder was defensive redundancy costing
   ~50-200 tok/session per mandatory rule.

2. context_inject: MIN_MESSAGE_LEN raised 10 → 60 to skip brain FTS
   search on trivial follow-ups ("ok", "yes", "continue"). Previous
   threshold only filtered 1-9 char messages, causing ~400 tok of
   wasted context per ack-style reply.

Council audit (adversarial, synthesized from Skeptic + Architect +
Pragmatist) found my earlier measurement methodology used synthetic
inputs that masked real per-operation costs. Real heavy session
(10 prompts, 30 edits, 5 agents) accumulates ~9.6K gradata tokens;
context_inject alone can contribute 3.5K when prompts are substantive.

Tests: 2884 passed, 1 preexisting cp1252 Windows unicode failure
in test_rule_to_hook (unrelated to injection).

Co-Authored-By: Gradata <noreply@gradata.ai>
…imits

Rule_enforcement was re-injecting SessionStart rules on every Write/Edit,
costing ~165 tok/edit for zero added primacy value. Default it off behind
GRADATA_RULE_ENFORCEMENT=1 opt-in for ablation.

Also add env knobs for tuning injection volume without redeploying:
- context_inject: GRADATA_CONTEXT_INJECT kill switch + MIN_MESSAGE_LEN
  raised 60 -> 100 so ack-style replies skip FTS cost
- inject_brain_rules: GRADATA_MAX_RULES, GRADATA_MIN_CONFIDENCE,
  GRADATA_MAX_META_RULES for tuning SessionStart payload

Tests updated to opt into rule_enforcement where they exercise the hook
and to use a >=100 char message for context_inject.

Co-Authored-By: Gradata <noreply@gradata.ai>
Telemetry showed 10 SessionStart events in a long session re-injected
1.9KB each (~3.7k tokens total) — duplicative because the compact summary
already carries rules from the prior session, and the new session's
primacy slot is consumed by that summary.

Skip inject_brain_rules when source is 'compact' or 'resume'. Opt back
in with GRADATA_INJECT_ON_COMPACT=1 for ablation.

Co-Authored-By: Gradata <noreply@gradata.ai>
…d metas

When a meta-rule is injected, its source_lesson_ids are added to a mutex
set; individual RULE/PATTERN leaves whose _lesson_id() is in that set are
skipped so the principle isn't double-injected as both the meta and its
concrete leaves.

Freed slots get refilled: rank_rules is now called with a 3x overshoot
so the render loop has candidates beyond the MAX_RULES cap; the cap is
enforced at render time (cluster_lines + individual_lines <= MAX_RULES)
rather than at rank time.

On current brain: 1822B -> 1319B per SessionStart (27% smaller), and
leaf categories diversify from {LEADS, DEMO_PREP, DRAFTING} to
{DRAFTING, TONE, DATA_INTEGRITY, ARCHITECTURE, CODE} while the LEADS +
DEMO_PREP principles still land via metas.

Opt-out: GRADATA_META_RULE_MUTEX=0 for ablation. Mirrors the existing
cluster-vs-meta mutex pattern (meta_covered_categories).

Co-Authored-By: Gradata <noreply@gradata.ai>
…ffold

Adds the substrate for multi-tenant SDK + cloud:

- src/gradata/_migrations/ — idempotent migration runner, tenant UUID
  helpers, and 001_add_tenant_id which adds tenant_id + visibility +
  schema_version columns across 35 per-tenant and mixed-visibility tables
- src/gradata/_migrations/fill_null_tenant.py — safety-net backfill for
  INSERT sites not yet wired to tenant_for()
- src/gradata/_tenant.py — cached tenant_for(brain_dir) + peek/read
  helpers with GRADATA_TENANT_ID env override
- src/gradata/_events.py — both events INSERT sites now write tenant_id
  and schema_version=1; defensive ALTER + index for older brains
- cloud/scripts/cloud_schema.sql + cloud_rls_test.sql — Supabase
  schema scaffold (superseded by in-place evolution, kept for reference)
- docs/architecture/multi-tenant-future-proofing.md and
  cloud-apply-instructions.md — strategy + apply guide

Co-Authored-By: Gradata <noreply@gradata.ai>
The previous commit added src/gradata/_migrations/ as a package but the
sibling src/gradata/_migrations.py still shadowed it, so
Brain.__init__ crashed with "cannot import name 'run_migrations'".

This commit merges both:

- _migrations/__init__.py exposes run_migrations(db_path) with the
  signature Brain.__init__ expects.
- Keeps the inline _BASE_TABLES + _MIGRATIONS idempotent ALTER list for
  the fast path on existing brains.
- Adds tracked numbered-migration dispatch: scans NNN_*.py in the
  package, skips any already in the migrations table, and invokes
  module.up(conn, tenant_id). Tenant UUID resolved via
  get_or_create_tenant_id on the brain dir.
- Deletes the obsolete _migrations.py.

Brain.init now runs both layers on every open, so upgrades are
transparent.

Co-Authored-By: Gradata <noreply@gradata.ai>
Minimal Karpathy-style module: ~170 LOC, stdlib-only (urllib), no threads.
Reads tenant_id-scoped rows from events/lessons/meta_rules/clusters/
correction_patterns/rule_provenance since sync_state.last_push_at and
POSTs to Supabase PostgREST with Prefer: resolution=merge-duplicates.

- Disabled by default; requires GRADATA_CLOUD_SYNC=1 + URL + KEY.
- Gracefully skips tables lacking tenant_id column (pre-migration brains).
- Visibility stays on the row — RLS decides who else can read it.
- 6 unit tests covering disabled path, tenant filtering, sync_state
  update, missing-db no-op, and unmigrated-table skip.

Co-Authored-By: Gradata <noreply@gradata.ai>
- add tenant_for import to audit.py
- write_provenance now appends tenant_id to the rule_provenance INSERT
- add defensive ALTER TABLE guard so pre-migration brains get the column
  on first write (same pattern as _events.py _ensure_table)

Co-Authored-By: Gradata <noreply@gradata.ai>
… INSERT

- add tenant_for import; Path import made unconditional (was TYPE_CHECKING)
- _CREATE_TABLE_SQL: add tenant_id TEXT, visibility TEXT DEFAULT 'private' columns
- ensure_table: add defensive ALTER TABLE for tenant_id + visibility on older brains
- save_meta_rules: resolve _tid once, pass as tenant_id in INSERT
- ensure_pattern_table: add tenant_id column to DDL + defensive ALTER TABLE guard
- upsert_correction_pattern: pass _tid in INSERT
- upsert_correction_patterns_batch: pass _tid per row in executemany

Co-Authored-By: Gradata <noreply@gradata.ai>
…mes INSERT

- add tenant_for import
- _init_tables: add tenant_id TEXT column to both table DDLs and defensive ALTER guard
- log_activity: resolve _tid from db_path.parent, pass in INSERT
- log_prep: resolve _tid, pass in INSERT
- log_outcome fallback path: reuse _tid (resolved once at function top), pass in INSERT

Co-Authored-By: Gradata <noreply@gradata.ai>
- add tenant_for import
- store_relationship: resolve _tid from db_path.parent
- add defensive ALTER TABLE guards for tenant_id + visibility before INSERT
  (handles brains created by older code that predates migration 001)
- pass _tid in the INSERT column list

Co-Authored-By: Gradata <noreply@gradata.ai>
… INSERT

Both sites are inside brain_end_session / brain_correct where brain.dir is
available, so tenant_for(brain.dir) is the natural source.

- pending_approvals INSERT: import tenant_for locally, resolve _tid,
  add defensive ALTER TABLE guard, pass _tid in column list
- lesson_transitions INSERT: same pattern; ALTER guard runs once before
  the per-transition loop

Co-Authored-By: Gradata <noreply@gradata.ai>
Three write paths: fts_index, fts_index_batch, fts_rebuild.

- add tenant_for import
- _ensure_fts_table: add tenant_id TEXT to brain_fts_content DDL +
  defensive ALTER TABLE guard for pre-migration brains
- fts_index: derive brain_dir from ctx.brain_dir or Path(db).parent,
  resolve _tid, pass in INSERT
- fts_index_batch: same brain_dir derivation, resolve _tid once, pass
  in every INSERT
- fts_rebuild: brain_path already computed; resolve _tid once before the
  doc loop, pass in every INSERT

Co-Authored-By: Gradata <noreply@gradata.ai>
10 tests covering: rule_provenance, meta_rules, correction_patterns
(single + batch), activity_log, prep_outcomes (log_prep + log_outcome
fallback), rule_relationships, brain_fts_content (fts_index + fts_index_batch).

Each test plants a deterministic tenant UUID in brain_dir/.tenant_id and
verifies the retrieved row carries that exact value.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown

Caution

Review failed

Pull request was closed or merged during review

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough
  • Multi-tenant schema: Added tenant_id UUID to 35+ tables via Migration 001_add_tenant_id with idempotent SQLite migration infrastructure (src/gradata/_migrations/)
  • New tenant APIs: get_tenant_id(), peek_tenant_id(), tenant_for() in src/gradata/_tenant.py with LRU caching; supports GRADATA_TENANT_ID env override and file-backed .tenant_id with atomic writes
  • Cloud sync MVP: push(brain_dir) in src/gradata/_cloud_sync.py for push-only Supabase sync with tenant-scoped incremental uploads; disabled by default (requires GRADATA_CLOUD_SYNC=1)
  • Per-tenant INSERT wiring: Updated 10+ write paths to include tenant_id (audit, meta_rules, correction_patterns, activity_log, prep_outcomes, rule_relationships, pending_approvals, lesson_transitions, fts_content, events) with defensive schema migrations
  • Cloud schema & visibility: Added visibility (private/shared/global) and source_cluster_id fields; created sync_state and clusters tables with RLS policies
  • CLI precedence fix: GRADATA_BRAIN env var now takes priority over --brain-dir argument
  • Safe env parsing: New _env_int() helper with minimum clamping; MAX_REMINDERS=0 disables reminders
  • Migration tracking: New migrations table for idempotent numbered migrations; has_applied(), mark_applied() helpers
  • Test coverage: Added test_tenant_id_inserts.py (14 tests for tenant isolation across write paths) and test_cloud_row_push.py (cloud sync MVP validation)
  • Documentation: multi-tenant-future-proofing.md and cloud-apply-instructions.md with end-to-end provisioning and RLS verification runbooks

Walkthrough

Adds multi-tenant support and a Supabase push-only cloud sync MVP; introduces tenant UUID management, numbered SQLite migrations to add tenant_id/visibility/schema_version, propagates tenant scoping across many write paths, and adds runbooks and tests for provisioning, migration, and RLS verification.

Changes

Cohort / File(s) Summary
Documentation
docs/architecture/cloud-apply-instructions.md, docs/architecture/multi-tenant-future-proofing.md
New Supabase provisioning/migration/RLS runbook and multi-tenant architecture/runbook describing tenant model, phased rollout, onboarding, and verification steps.
Cloud Sync MVP
src/gradata/_cloud_sync.py, tests/test_cloud_row_push.py
New push-only Supabase/PostgREST sync: env-gated, per-brain tenant filtering, per-table incremental selection using sync_state.last_push_at, POST upserts with Prefer header, durable watermark update semantics; tests for enablement, tenant-scoped uploads, watermark updates, missing-table handling.
Tenant ID & Migration Helpers
src/gradata/_tenant.py, src/gradata/_migrations/_runner.py, src/gradata/_migrations/tenant_uuid.py
New tenant resolution API with env override and cached lookup; atomic .tenant_id file create/read helpers; migration-runner utilities for migrations table, idempotent add-column/index, and DB path resolution.
Numbered Migrations
src/gradata/_migrations/__init__.py, src/gradata/_migrations/001_add_tenant_id.py, src/gradata/_migrations/fill_null_tenant.py
Two-phase migration orchestration (inline + numbered NNN_*.py); 001_add_tenant_id.py adds tenant_registry, tenant_id/visibility/schema_version columns with backfill and indexes; fill_null_tenant.py backfills NULL tenant_id rows; migrations are tracked in migrations table.
Core, Events & FTS
src/gradata/_core.py, src/gradata/_events.py, src/gradata/_query.py
Writes now include tenant_id for pending approvals, lesson transitions, events, and FTS content; events schema extended with tenant_id and schema_version, dedup uniqueness made tenant-scoped; cross-platform fsync attempt added; defensive ALTERs for older schemas.
Rules, Audit & Enhancements
src/gradata/audit.py, src/gradata/rules/rule_graph.py, src/gradata/enhancements/meta_rules_storage.py, src/gradata/enhancements/scoring/loop_intelligence.py
Persist tenant_id (and visibility where applicable) for rule provenance, relationships, meta-rules, correction patterns, activity/prep/outcome logging; defensive ALTER TABLE migrations on write; meta_rules inserts set visibility='private'.
CLI & Hooks
src/gradata/cli.py, src/gradata/hooks/rule_enforcement.py
Brain directory resolution now prefers GRADATA_BRAIN env, then --brain-dir, then CWD; rule_enforcement gains robust _env_int parsing for MAX_REMINDERS with early exit when ≤0 and stricter domain-conflict filtering.
Minor & Hygiene
src/gradata/enhancements/self_improvement/__init__.py, src/gradata/rules/rule_engine/__init__.py
Added module-level comment and from __future__ import annotations in one module.
Tests: Tenant Isolation
tests/test_tenant_id_inserts.py
New comprehensive tests verifying inserted rows across many paths populate tenant_id from the per-brain .tenant_id; covers single and batch upserts and fallback-insert behavior.

Sequence Diagram(s)

sequenceDiagram
    participant App as Gradata CLI/Sync
    participant Brain as SQLite Brain DB
    participant SyncState as sync_state Table
    participant PostgREST as Supabase PostgREST

    App->>Brain: resolve_brain_db(brain_dir)
    App->>Brain: tenant_for(brain_dir)
    App->>SyncState: SELECT last_push_at WHERE tenant_id=?

    loop for each table in PUSH_TABLES
        App->>Brain: check table columns (tenant_id, timestamps)
        App->>Brain: SELECT rows WHERE tenant_id=? AND ts > last_push_at
        App->>App: build JSON payload
        App->>PostgREST: POST /rest/v1/{table} (apikey + Prefer: resolution=merge-duplicates)
        PostgREST-->>App: 2xx or error response
        App->>App: record accepted row count
    end

    alt all pending tables fully accepted
        App->>SyncState: UPDATE last_push_at = now() WHERE tenant_id=?
        App->>Brain: COMMIT
    else partial failure or error
        App->>Brain: do not advance watermark
        App->>Brain: COMMIT
    end

    App-->>App: return map(table -> accepted_count)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 46.39% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main changes: multi-tenant SDK support with cloud alignment, tenant_id, visibility, and sync infrastructure.
Description check ✅ Passed The description is detailed and directly related to the changeset, covering migration infrastructure, tenant helpers, INSERT wiring, cloud schema changes, and cloud sync MVP with concrete test results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-multi-tenant-cloud-align

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]
coderabbitai Bot previously requested changes Apr 17, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 22

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/gradata/enhancements/scoring/loop_intelligence.py (1)

238-261: ⚠️ Potential issue | 🟠 Major

UPDATE path doesn't filter by tenant_id — potential cross-tenant update.

The log_outcome function computes _tid but the UPDATE statement (lines 258-261) doesn't filter by tenant_id. If two tenants have prep_outcomes with the same prospect and prep_type, this could update the wrong tenant's row.

🐛 Proposed fix
     row = conn.execute(
         """SELECT id, date FROM prep_outcomes
-           WHERE prospect = ? AND prep_type = ? AND outcome IS NULL
+           WHERE prospect = ? AND prep_type = ? AND outcome IS NULL AND tenant_id = ?
            ORDER BY date DESC, id DESC LIMIT 1""",
-        (prospect, prep_type),
+        (prospect, prep_type, _tid),
     ).fetchone()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/enhancements/scoring/loop_intelligence.py` around lines 238 -
261, The UPDATE for prep_outcomes can affect other tenants because it doesn't
include tenant_id; use the computed _tid to scope DB operations: add "tenant_id
= ?" to the WHERE clause(s) used in conn.execute (both the SELECT that finds the
row and the UPDATE that sets outcome/days_to_outcome) and pass _tid as an
additional parameter to those queries so the row selection and update are
restricted to the current tenant in log_outcome (use variables _tid and
conn.execute to locate and modify the queries).
src/gradata/_query.py (1)

1-15: ⚠️ Potential issue | 🟡 Minor

Missing from __future__ import annotations import.

The coding guidelines require from __future__ import annotations for all files in src/gradata/**/*.py. This file starts with import contextlib without the required future import. This also contributes to the Ruff I001 error about unsorted imports.

🛠️ Proposed fix
+from __future__ import annotations
+
 import contextlib
 import sqlite3
 from datetime import datetime
 from pathlib import Path
 from typing import TYPE_CHECKING

 import gradata._paths as _p
 from gradata._tenant import tenant_for
 from gradata._config import (

As per coding guidelines: "type safety (from future import annotations required)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_query.py` around lines 1 - 15, This module is missing the
required future import: add the line "from __future__ import annotations" at the
very top of the gradata._query module before any other imports, then reorder
imports (contextlib, sqlite3, datetime, pathlib, typing, and local imports like
gradata._paths and tenant_for) so they are alphabetically/sorted per Ruff to
eliminate the I001 unsorted imports error; ensure the future import remains the
first statement in the file and that no other code or docstring precedes it.
src/gradata/rules/rule_graph.py (1)

244-264: 🛠️ Refactor suggestion | 🟠 Major

Use context manager for SQLite connection to prevent resource leaks.

If an exception occurs between sqlite3.connect() and conn.close(), the connection leaks. The defensive ALTER TABLE statements could raise exceptions not covered by contextlib.suppress.

♻️ Proposed fix
-    conn = sqlite3.connect(str(db_path))
-    # Defensive migration: brains created before migration 001 lack tenant_id.
-    with contextlib.suppress(sqlite3.OperationalError):
-        conn.execute("ALTER TABLE rule_relationships ADD COLUMN tenant_id TEXT")
-    with contextlib.suppress(sqlite3.OperationalError):
-        conn.execute("ALTER TABLE rule_relationships ADD COLUMN visibility TEXT DEFAULT 'private'")
-    conn.execute(
-        "INSERT INTO rule_relationships "
-        "(rule_a_id, rule_b_id, relationship, confidence, detected_at, tenant_id) "
-        "VALUES (?, ?, ?, ?, ?, ?)",
-        (
-            rule_a_id,
-            rule_b_id,
-            rel_type.value,
-            clamped,
-            datetime.now(UTC).isoformat(),
-            _tid,
-        ),
-    )
-    conn.commit()
-    conn.close()
+    with contextlib.closing(sqlite3.connect(str(db_path))) as conn:
+        # Defensive migration: brains created before migration 001 lack tenant_id.
+        with contextlib.suppress(sqlite3.OperationalError):
+            conn.execute("ALTER TABLE rule_relationships ADD COLUMN tenant_id TEXT")
+        with contextlib.suppress(sqlite3.OperationalError):
+            conn.execute("ALTER TABLE rule_relationships ADD COLUMN visibility TEXT DEFAULT 'private'")
+        conn.execute(
+            "INSERT INTO rule_relationships "
+            "(rule_a_id, rule_b_id, relationship, confidence, detected_at, tenant_id) "
+            "VALUES (?, ?, ?, ?, ?, ?)",
+            (
+                rule_a_id,
+                rule_b_id,
+                rel_type.value,
+                clamped,
+                datetime.now(UTC).isoformat(),
+                _tid,
+            ),
+        )
+        conn.commit()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/rules/rule_graph.py` around lines 244 - 264, The code opens
sqlite3.connect(str(db_path)) into conn and manually calls conn.commit() and
conn.close(), which can leak the connection if an exception occurs; change this
to use a context manager: wrap the DB work in "with
sqlite3.connect(str(db_path)) as conn:" so all executes (including the defensive
ALTERs using contextlib.suppress and the INSERT that uses rule_a_id, rule_b_id,
rel_type.value, clamped, datetime.now(UTC).isoformat(), _tid) run inside the
with-block and you can remove the explicit conn.commit() and conn.close(); keep
the contextlib.suppress(...) calls around the ALTER TABLE lines but ensure they
remain inside the with-block so the connection is always closed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/architecture/cloud-apply-instructions.md`:
- Around line 40-49: Replace the hardcoded personal email in the SQL example
INSERT INTO tenant_registry VALUES tuple with a non-PII placeholder (e.g.,
"<your-email>" or "name@example.com") so the example uses a template value;
update the VALUES row in the INSERT INTO tenant_registry example (the tuple
containing 'oliver@hausgem.com') to the chosen placeholder and ensure any
surrounding explanatory text reflects the templated email.

In `@docs/architecture/multi-tenant-future-proofing.md`:
- Around line 48-57: The doc/migration mismatch: update either the docs
(docs/architecture/multi-tenant-future-proofing.md) or the migration
(src/gradata/_migrations/001_add_tenant_id.py) so both treat lesson_transitions,
ablation_log, entities, and rule_canary the same way; if you change the
migration, modify 001_add_tenant_id.py to make those four tables "mixed" (add
tenant_id as NULLABLE and add/handle a visibility column) instead of treating
them as per-tenant (NOT NULL + backfill), and ensure any backfill/constraint
logic and tests in the migration reflect nullable tenant_id and visibility
handling.

In `@src/gradata/_cloud_sync.py`:
- Line 117: Remove the unused "# noqa: S608" suppressions from the SQL string
constructions where the variable sql is set (the f-string building "SELECT *
FROM {table} WHERE {' AND '.join(where)}") and the similar occurrence later
around line 142; simply delete those noqa directives so the linter no longer
reports RUF100 and CI will pass, leaving the SQL construction logic unchanged.
- Around line 155-166: The push function currently opens the DB by resolving a
raw brain_dir and hardcodes "system.db"; change push to accept a BrainContext
(e.g., def push(brain: BrainContext) -> dict[str,int]) and remove any hardcoded
"system.db" use and manual Path resolution in push; instead use the
BrainContext's API/attributes (e.g., brain.path / brain.db_path property or
brain.get_system_db()) to access the DB, update any callsites to pass a
BrainContext, and ensure enabled() still respects the BrainContext or is called
before DB access.
- Line 187: The exported names in the module-level __all__ are unsorted and
trigger Ruff RUF022; update the __all__ declaration so the list is
alphabetically sorted (e.g., order the symbols "PUSH_TABLES", "enabled", "push"
or whichever alphabetical ordering matches the linter) by editing the __all__ =
["enabled", "push", "PUSH_TABLES"] line to contain the same symbols sorted
lexicographically; ensure you only change the order of the entries and preserve
the exact symbol names used by the module (refer to __all__, enabled, push, and
PUSH_TABLES).
- Around line 70-77: In _last_push_at, narrow the sqlite3.OperationalError
handling: wrap the conn.execute(... "SELECT last_push_at FROM sync_state WHERE
brain_id = ?", (tenant_id,)).fetchone() call in try/except, catch
OperationalError as e and inspect the error message; if it indicates the missing
table (e.g. contains "no such table" and "sync_state") return None, otherwise
re-raise the exception so real DB errors surface; this keeps the special-case
for the missing sync_state table while allowing other errors from conn.execute
to propagate.
- Around line 80-92: The _mark_push function can crash on pre-migration DBs
because the sync_state table may be missing; wrap the INSERT...ON CONFLICT block
in a try/except for sqlite3.OperationalError, and if the error indicates "no
such table: sync_state" create the sync_state table (define brain_id as PRIMARY
KEY and columns last_push_at and updated_at) using conn.execute, then re-run the
INSERT and conn.commit(); mirror the tolerant behavior used in _last_push_at so
unmigrated brains don't abort the sync.
- Line 65: The lint error UP017 is caused by using timezone.utc; update the call
that returns the timestamp so it uses the datetime class UTC constant instead
(replace datetime.now(timezone.utc).isoformat() with
datetime.now(datetime.UTC).isoformat()), and adjust imports accordingly (remove
or stop using timezone if it was imported and ensure datetime refers to the
class that exposes UTC).
- Around line 175-182: The code currently advances the watermark if any table
key exists in pushed even when a table's POST returned 0 (failure); update the
loop around PUSH_TABLES/_rows_since/_post so that you either (a) only add table
entries to pushed when _post(table, rows) returns a successful truthy/positive
result, or (b) after populating pushed, check that every pushed[table] is a
successful result (e.g., >0) before calling _mark_push(conn, tenant_id,
started); use the existing symbols PUSH_TABLES, _rows_since, _post and
_mark_push to implement this conditional so sync_state.last_push_at is only
advanced when all attempted POSTs succeeded.

In `@src/gradata/_migrations/_runner.py`:
- Around line 25-50: The helpers currently perform commits which creates unsafe
transaction boundaries; remove all internal commits and make them side-effect
free: in ensure_migrations_table(conn) drop conn.commit() so it only executes
MIGRATIONS_TABLE_SQL, and in mark_applied(...) remove its conn.commit() so it
only executes the INSERT/REPLACE statement; keep or allow calls to
ensure_migrations_table from has_applied/mark_applied but ensure those helpers
do not commit — the caller should open/commit the transaction that creates the
table and inserts the migration row.

In `@src/gradata/_migrations/001_add_tenant_id.py`:
- Around line 160-241: The up() function performs a DB commit at the end which
makes schema/data changes durable before the migration is marked applied; remove
the explicit conn.commit() call from up() so the function is transaction-free
and let the caller mark_applied() and commit once after recording the migration;
locate the conn.commit() in up() and delete it (do not add any BEGIN/COMMIT or
error-handling transactions inside up()), leaving the rest of the logic
(tenant_registry insert, per-tenant and mixed table updates, index creation, and
events.schema_version updates) unchanged.
- Around line 231-238: The migration currently only runs the backfill UPDATE
when add_column_if_missing added events.schema_version, leaving existing NULLs
if the column already existed; change the logic in the events block so that if
table_exists(conn, "events") you always execute conn.execute("UPDATE events SET
schema_version = 1 WHERE schema_version IS NULL") to backfill NULLs, but only
append "events.schema_version" to summary["columns_added"] when
add_column_if_missing actually returns True; keep using the same
functions/variables (table_exists, add_column_if_missing, conn.execute, summary)
so the update is idempotent and runs regardless of whether the column was added
this run.

In `@src/gradata/_migrations/fill_null_tenant.py`:
- Around line 63-65: The migration is currently including mixed-visibility
tables ("meta_rules", "frameworks", "rule_relationships") in CANDIDATE_TABLES
causing global (tenant_id IS NULL) rows to be assigned to a primary tenant;
update the migration in src/gradata/_migrations/fill_null_tenant.py to exclude
those mixed tables from CANDIDATE_TABLES (or alternatively change the backfill
logic in the function that iterates CANDIDATE_TABLES to only update rows whose
visibility is explicitly private), referencing the CANDIDATE_TABLES variable and
the backfill loop/logic around lines 100-117 so only strictly per-tenant tables
(or explicit-private rows) are modified.

In `@src/gradata/_migrations/tenant_uuid.py`:
- Around line 25-35: The tenant-id creation in get_or_create_tenant_id is racy
(exists() then write_text()); replace the non-atomic path with an atomic-create
pattern: after creating the brain dir, read the file if present and valid,
otherwise generate a UUID and attempt to create the file atomically (e.g., open
with O_CREAT|O_EXCL or write to a temp file and os.replace/rename); if the
create fails with EEXIST, re-read and validate the file and return that value,
retrying as needed. Apply the same atomic-create/fallback re-read logic to the
mirrored implementation in the other tenant code (the corresponding function in
_tenant.py) so concurrent processes cannot write different tenant IDs.

In `@src/gradata/_query.py`:
- Around line 47-49: The migration currently suppresses all exceptions using
contextlib.suppress(Exception) around conn.execute("ALTER TABLE
brain_fts_content ADD COLUMN tenant_id TEXT"); change the suppression to only
catch sqlite3.OperationalError (i.e., use
contextlib.suppress(sqlite3.OperationalError)) so you don't hide unrelated
errors, and ensure sqlite3 is imported at the top of the module if not already
present; keep the conn.execute call and SQL unchanged.

In `@src/gradata/_tenant.py`:
- Around line 73-80: tenant_for() currently uses _cached(brain_key) but
get_tenant_id() respects the GRADATA_TENANT_ID override, so once a value is
cached changing the env var has no effect; fix by either skipping the cache when
the env override is present or by including the effective override in the cache
key. Concretely, update _cached to accept a second parameter for the current env
override (e.g. env_tenant = os.getenv("GRADATA_TENANT_ID") or ENV_TENANT_ID
constant) and have tenant_for call _cached(str(Path(...).resolve()), env_tenant)
so the cache key includes the override; alternatively, detect if
GRADATA_TENANT_ID is set inside tenant_for and call get_tenant_id() directly
(bypassing _cached) when it is present.

In `@src/gradata/enhancements/meta_rules_storage.py`:
- Around line 35-36: Remove the empty TYPE_CHECKING block and the now-unused
TYPE_CHECKING import: delete the "if TYPE_CHECKING: pass" block and remove
TYPE_CHECKING from the import statement so no empty type-checking block remains
(ensure no other references to TYPE_CHECKING in the file before removing).

In `@src/gradata/hooks/context_inject.py`:
- Around line 18-19: The import-time int(os.environ.get(...)) calls for
MIN_MESSAGE_LEN and MAX_CONTEXT_LEN (and the similar expression in
rule_enforcement.py) can raise ValueError and crash imports; replace them with a
small safe parser used at module import: implement a helper like
safe_parse_int_env(key, default) that reads os.environ.get(key), tries to int()
it inside try/except ValueError (and optionally logs a debug/warning), and
returns the default on failure or missing value, then use safe_parse_int_env for
MIN_MESSAGE_LEN and MAX_CONTEXT_LEN and the analogous env parse in
rule_enforcement.py so malformed env values no longer raise at import time.

In `@src/gradata/hooks/inject_brain_rules.py`:
- Around line 311-314: Remove the redundant `lesson_id_fn is not None` check
inside the conditional that increments `suppressed_by_meta`; since
`meta_mutex_enabled` already guarantees `lesson_id_fn` is set (checked earlier),
simplify the if to `if meta_mutex_enabled and lesson_id_fn(r) in
meta_covered_lesson_ids:` so the block using `lesson_id_fn`,
`meta_covered_lesson_ids`, and `suppressed_by_meta` stays the same but without
the needless None check.

In `@src/gradata/hooks/rule_enforcement.py`:
- Line 39: The module-level int conversion for MAX_REMINDERS can raise
ValueError at import time; update the MAX_REMINDERS initialization in
rule_enforcement.py to parse the GRADATA_MAX_REMINDERS env var defensively: read
the raw string, try to convert to int inside a try/except (falling back to the
default 5 on failure or non-numeric input), then clamp the resulting value to a
non-negative integer (e.g., max(0, parsed_value)); ensure the symbol
MAX_REMINDERS is assigned only after this safe parse so import-time exceptions
are avoided.

In `@src/gradata/rules/rule_graph.py`:
- Line 241: The local import of contextlib inside the function
store_relationship should be moved to the module import section at the top of
the file with the other imports and the in-function import removed; update the
module-level imports to include contextlib and delete the now-redundant import
statement inside store_relationship so the function uses the module-level
contextlib instead.

In `@tests/test_hooks_intelligence.py`:
- Around line 182-184: The test is flaky when ambient GRADATA_CONTEXT_INJECT is
set; update the patched environment in the test to explicitly set
"GRADATA_CONTEXT_INJECT" (e.g., "1") alongside "GRADATA_BRAIN_DIR" so the
context injection behavior is deterministic. Modify the patch.dict call that
currently sets {"GRADATA_BRAIN_DIR": str(tmp_path)} to include the explicit
"GRADATA_CONTEXT_INJECT" value so the invocation of context_main(...) with
mock_brain is independent of external environment.

---

Outside diff comments:
In `@src/gradata/_query.py`:
- Around line 1-15: This module is missing the required future import: add the
line "from __future__ import annotations" at the very top of the gradata._query
module before any other imports, then reorder imports (contextlib, sqlite3,
datetime, pathlib, typing, and local imports like gradata._paths and tenant_for)
so they are alphabetically/sorted per Ruff to eliminate the I001 unsorted
imports error; ensure the future import remains the first statement in the file
and that no other code or docstring precedes it.

In `@src/gradata/enhancements/scoring/loop_intelligence.py`:
- Around line 238-261: The UPDATE for prep_outcomes can affect other tenants
because it doesn't include tenant_id; use the computed _tid to scope DB
operations: add "tenant_id = ?" to the WHERE clause(s) used in conn.execute
(both the SELECT that finds the row and the UPDATE that sets
outcome/days_to_outcome) and pass _tid as an additional parameter to those
queries so the row selection and update are restricted to the current tenant in
log_outcome (use variables _tid and conn.execute to locate and modify the
queries).

In `@src/gradata/rules/rule_graph.py`:
- Around line 244-264: The code opens sqlite3.connect(str(db_path)) into conn
and manually calls conn.commit() and conn.close(), which can leak the connection
if an exception occurs; change this to use a context manager: wrap the DB work
in "with sqlite3.connect(str(db_path)) as conn:" so all executes (including the
defensive ALTERs using contextlib.suppress and the INSERT that uses rule_a_id,
rule_b_id, rel_type.value, clamped, datetime.now(UTC).isoformat(), _tid) run
inside the with-block and you can remove the explicit conn.commit() and
conn.close(); keep the contextlib.suppress(...) calls around the ALTER TABLE
lines but ensure they remain inside the with-block so the connection is always
closed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 08d012e9-7c42-499a-a5c0-0d7cb01e400b

📥 Commits

Reviewing files that changed from the base of the PR and between 357fbdd and adb930e.

📒 Files selected for processing (25)
  • docs/architecture/cloud-apply-instructions.md
  • docs/architecture/multi-tenant-future-proofing.md
  • src/gradata/_cloud_sync.py
  • src/gradata/_core.py
  • src/gradata/_events.py
  • src/gradata/_migrations/001_add_tenant_id.py
  • src/gradata/_migrations/__init__.py
  • src/gradata/_migrations/_runner.py
  • src/gradata/_migrations/fill_null_tenant.py
  • src/gradata/_migrations/tenant_uuid.py
  • src/gradata/_query.py
  • src/gradata/_tenant.py
  • src/gradata/audit.py
  • src/gradata/enhancements/meta_rules_storage.py
  • src/gradata/enhancements/scoring/loop_intelligence.py
  • src/gradata/hooks/context_inject.py
  • src/gradata/hooks/inject_brain_rules.py
  • src/gradata/hooks/rule_enforcement.py
  • src/gradata/rules/rule_graph.py
  • tests/test_cloud_row_push.py
  • tests/test_hooks_intelligence.py
  • tests/test_hooks_safety.py
  • tests/test_rule_enforcement_scope.py
  • tests/test_self_healing_fix.py
  • tests/test_tenant_id_inserts.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
tests/**

⚙️ CodeRabbit configuration file

tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.

Files:

  • tests/test_hooks_intelligence.py
  • tests/test_hooks_safety.py
  • tests/test_rule_enforcement_scope.py
  • tests/test_self_healing_fix.py
  • tests/test_tenant_id_inserts.py
  • tests/test_cloud_row_push.py
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/rules/rule_graph.py
  • src/gradata/hooks/rule_enforcement.py
  • src/gradata/hooks/context_inject.py
  • src/gradata/_events.py
  • src/gradata/_query.py
  • src/gradata/audit.py
  • src/gradata/_core.py
  • src/gradata/enhancements/scoring/loop_intelligence.py
  • src/gradata/hooks/inject_brain_rules.py
  • src/gradata/_migrations/fill_null_tenant.py
  • src/gradata/enhancements/meta_rules_storage.py
  • src/gradata/_tenant.py
  • src/gradata/_migrations/__init__.py
  • src/gradata/_migrations/tenant_uuid.py
  • src/gradata/_migrations/001_add_tenant_id.py
  • src/gradata/_cloud_sync.py
  • src/gradata/_migrations/_runner.py
src/gradata/hooks/**

⚙️ CodeRabbit configuration file

src/gradata/hooks/**: JavaScript hooks for Claude Code integration. Check for: no shell injection (no execSync with user
input), temp files must use per-user subdirectory, HTTP calls must have timeouts, errors must be silent (never block
the tool chain).

Files:

  • src/gradata/hooks/rule_enforcement.py
  • src/gradata/hooks/context_inject.py
  • src/gradata/hooks/inject_brain_rules.py
🪛 GitHub Actions: SDK CI
src/gradata/rules/rule_graph.py

[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/hooks/rule_enforcement.py

[error] 98-101: Ruff SIM103: Return the negated condition directly (inline the negated condition).


[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/hooks/context_inject.py

[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/_events.py

[error] 49-52: Ruff SIM117: Use a single with statement with multiple contexts instead of nested with statements.


[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/_query.py

[error] 8-32: Ruff I001: Import block is un-sorted or un-formatted. Organize imports.


[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/audit.py

[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/_core.py

[error] 465-470: Ruff I001: Import block is un-sorted or un-formatted. Organize imports.


[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/enhancements/scoring/loop_intelligence.py

[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/hooks/inject_brain_rules.py

[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/_migrations/fill_null_tenant.py

[error] 24-25: Ruff I001: Import block is un-sorted or un-formatted. Organize imports.


[error] 24-24: Ruff RUF100: Unused noqa directive (non-enabled: E402). Remove the # noqa: E402 comment.


[error] 25-25: Ruff RUF100: Unused noqa directive (non-enabled: E402). Remove the # noqa: E402 comment.


[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/enhancements/meta_rules_storage.py

[error] 36-36: Ruff TC005: Found empty type-checking block (if TYPE_CHECKING: pass). Remove the empty block.


[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/_tenant.py

[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/_migrations/__init__.py

[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/_migrations/tenant_uuid.py

[error] 15-19: Ruff I001: Import block is un-sorted or un-formatted. Organize imports.


[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/_migrations/001_add_tenant_id.py

[error] 1-1: Ruff N999: Invalid module name: '001_add_tenant_id'.


[error] 35-44: Ruff I001: Import block is un-sorted or un-formatted. Organize imports.


[error] 35-35: Ruff RUF100: Unused noqa directive (non-enabled: E402). Remove the # noqa: E402 comment on the import.


[error] 44-44: Ruff RUF100: Unused noqa directive (non-enabled: E402). Remove the # noqa: E402 comment.


[warning] 176-176: Ruff UP017: Use datetime.UTC alias. Convert datetime.now(timezone.utc).isoformat().


[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/_cloud_sync.py

[warning] 65-65: Ruff UP017: Use datetime.UTC alias. Convert datetime.now(timezone.utc).isoformat() to use datetime.UTC.


[error] 117-117: Ruff RUF100: Unused noqa directive (non-enabled: S608). Remove the # noqa: S608 -- ... comment.


[error] 142-142: Ruff RUF100: Unused noqa directive (non-enabled: S310). Remove the # noqa: S310 -- ... comment.


[error] 187-187: Ruff RUF022: __all__ is not sorted. Apply isort-style sorting to __all__.


[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

src/gradata/_migrations/_runner.py

[error] 8-12: Ruff I001: Import block is un-sorted or un-formatted. Organize imports.


[warning] 48-48: Ruff UP017: Use datetime.UTC alias. Convert datetime.now(timezone.utc).isoformat().


[error] 1-1: Ruff check failed: Found 83 errors. Process completed with exit code 1.

🪛 LanguageTool
docs/architecture/multi-tenant-future-proofing.md

[uncategorized] ~138-~138: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...yle flow. Defer until it matters. 4. Open source tension. The SDK is Apache-2.0 but `s...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~140-~140: Use a hyphen to join words.
Context: ...Apache license clean-room. 5. "Future proof now" is a trap. Most future-proo...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.22.0)
docs/architecture/multi-tenant-future-proofing.md

[warning] 21-21: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 24-24: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 27-27: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 32-32: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 38-38: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 43-43: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 86-86: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/2/3

(MD029, ol-prefix)


[warning] 87-87: Ordered list item prefix
Expected: 2; Actual: 5; Style: 1/2/3

(MD029, ol-prefix)


[warning] 91-91: Ordered list item prefix
Expected: 3; Actual: 6; Style: 1/2/3

(MD029, ol-prefix)


[warning] 100-100: Ordered list item prefix
Expected: 1; Actual: 7; Style: 1/2/3

(MD029, ol-prefix)


[warning] 106-106: Ordered list item prefix
Expected: 2; Actual: 8; Style: 1/2/3

(MD029, ol-prefix)


[warning] 110-110: Ordered list item prefix
Expected: 3; Actual: 9; Style: 1/2/3

(MD029, ol-prefix)


[warning] 114-114: Ordered list item prefix
Expected: 1; Actual: 10; Style: 1/1/1

(MD029, ol-prefix)

🔇 Additional comments (19)
src/gradata/hooks/rule_enforcement.py (1)

105-108: Kill-switch gate is clean and low-risk.

Default-off behavior with explicit opt-in is implemented correctly here.

tests/test_hooks_safety.py (1)

107-129: Good update: rule-enforcement tests now explicitly enable the hook.

This keeps the tests aligned with the new default-off runtime behavior.

tests/test_self_healing_fix.py (1)

318-318: Helper output now correctly matches mandatory-directives-only behavior.

tests/test_rule_enforcement_scope.py (1)

13-17: Autouse env setup is the right fix for default-off enforcement.

src/gradata/audit.py (1)

49-62: Tenant-aware provenance insert is correctly wired.

Including tenant_id in rule_provenance and guarding pre-migration brains is a solid compatibility change.

src/gradata/hooks/context_inject.py (1)

23-26: Kill-switch behavior is clear and correctly placed at function entry.

src/gradata/_events.py (2)

77-80: Events schema evolution is consistent and migration-safe.

Adding tenant_id + schema_version with guarded ALTERs and tenant index creation is a solid compatibility pattern.

Also applies to: 101-106


174-180: Good consistency: single-write and batch-write paths both persist tenant metadata.

Also applies to: 599-616

src/gradata/enhancements/scoring/loop_intelligence.py (1)

107-122: LGTM — tenant_id columns added consistently to both tables.

The schema creation and defensive migration for activity_log and prep_outcomes tables correctly include tenant_id TEXT columns and handle pre-existing schemas gracefully.

src/gradata/_core.py (2)

465-480: LGTM — tenant-scoped pending_approvals insert.

The defensive schema migration and tenant_id inclusion follow the established PR pattern. The imports inside the try block are acceptable for this optional functionality path.


768-785: LGTM — tenant-scoped lesson_transitions lineage.

Consistent application of the tenant_id pattern for graduation lineage persistence.

src/gradata/hooks/inject_brain_rules.py (2)

47-50: LGTM — environment-configurable injection limits.

Making MAX_RULES, MIN_CONFIDENCE, and MAX_META_RULES configurable via environment variables improves flexibility without code changes.


133-141: LGTM — skip re-injection on compact/resume.

Good optimization to avoid duplicating ~1.9KB of rules when the compacted summary already carries them. The opt-in override via GRADATA_INJECT_ON_COMPACT enables ablation testing.

src/gradata/enhancements/meta_rules_storage.py (2)

124-152: LGTM — tenant-scoped meta_rules persistence.

The save_meta_rules function correctly derives tenant_id from the brain directory and includes it in the INSERT with explicit visibility='private'.


388-414: LGTM — defensive tenant_id migration for correction_patterns.

The schema creation and defensive ALTER TABLE migration follow the PR's established pattern.

tests/test_cloud_row_push.py (1)

1-112: LGTM — comprehensive test coverage for cloud sync push.

The test suite effectively covers:

  • Environment gating (disabled by default, disabled without credentials)
  • Tenant filtering (only matching tenant_id rows are pushed)
  • State tracking (sync_state updated after push)
  • Edge cases (no DB, tables without tenant_id column)

The fixture setup correctly creates a realistic test environment with two tenants' data.

docs/architecture/cloud-apply-instructions.md (1)

1-106: Well-structured cloud provisioning runbook.

The documentation provides clear step-by-step instructions with appropriate security considerations (1Password references, service_role key handling, RLS verification). The pre-flight checklist ensures proper setup before enabling sync.

tests/test_tenant_id_inserts.py (1)

77-314: Good regression coverage for tenant-scoped inserts.

These tests pin the exact UUID written at each insert site instead of only checking for non-nullness, which is the right failure mode for this change.

As per coding guidelines, tests/**: assertions check specific values not just truthiness.

src/gradata/_cloud_sync.py (1)

119-119: No issue. The project requires Python 3.11+ (requires-python = ">=3.11" in pyproject.toml), which fully supports zip(..., strict=False). This pattern is used consistently across the codebase and is appropriate for the baseline.

			> Likely an incorrect or invalid review comment.

Comment thread docs/architecture/cloud-apply-instructions.md
Comment thread docs/architecture/multi-tenant-future-proofing.md
Comment thread src/gradata/_cloud_sync.py Outdated
Comment thread src/gradata/_cloud_sync.py Outdated
Comment thread src/gradata/_cloud_sync.py
Comment thread src/gradata/hooks/context_inject.py
Comment on lines +311 to +314
if meta_mutex_enabled and lesson_id_fn is not None \
and lesson_id_fn(r) in meta_covered_lesson_ids:
suppressed_by_meta += 1
continue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Redundant None check for lesson_id_fn.

lesson_id_fn is not None is already verified in the meta_mutex_enabled condition on line 295-296. The second check on line 311 is redundant.

♻️ Minor cleanup
-        if meta_mutex_enabled and lesson_id_fn is not None \
-                and lesson_id_fn(r) in meta_covered_lesson_ids:
+        if meta_mutex_enabled and lesson_id_fn(r) in meta_covered_lesson_ids:
             suppressed_by_meta += 1
             continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/hooks/inject_brain_rules.py` around lines 311 - 314, Remove the
redundant `lesson_id_fn is not None` check inside the conditional that
increments `suppressed_by_meta`; since `meta_mutex_enabled` already guarantees
`lesson_id_fn` is set (checked earlier), simplify the if to `if
meta_mutex_enabled and lesson_id_fn(r) in meta_covered_lesson_ids:` so the block
using `lesson_id_fn`, `meta_covered_lesson_ids`, and `suppressed_by_meta` stays
the same but without the needless None check.

Comment thread src/gradata/hooks/rule_enforcement.py Outdated
Comment thread src/gradata/rules/rule_graph.py Outdated
Comment on lines 182 to +184
with patch.dict(os.environ, {"GRADATA_BRAIN_DIR": str(tmp_path)}), \
patch("gradata.brain.Brain", return_value=mock_brain):
result = context_main({"message": "How do I set up the pipeline for new prospects?"})
result = context_main({"message": "How do I set up the pipeline for new prospects in the onboarding workflow? I'd like to understand the full process from lead capture through to qualification."})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Make this test independent of ambient GRADATA_CONTEXT_INJECT.

This test can false-fail if that env var is set to "0" by surrounding test environment. Set it explicitly in this patch context.

Proposed fix
-    with patch.dict(os.environ, {"GRADATA_BRAIN_DIR": str(tmp_path)}), \
+    with patch.dict(os.environ, {"GRADATA_BRAIN_DIR": str(tmp_path), "GRADATA_CONTEXT_INJECT": "1"}), \
          patch("gradata.brain.Brain", return_value=mock_brain):
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
with patch.dict(os.environ, {"GRADATA_BRAIN_DIR": str(tmp_path)}), \
patch("gradata.brain.Brain", return_value=mock_brain):
result = context_main({"message": "How do I set up the pipeline for new prospects?"})
result = context_main({"message": "How do I set up the pipeline for new prospects in the onboarding workflow? I'd like to understand the full process from lead capture through to qualification."})
with patch.dict(os.environ, {"GRADATA_BRAIN_DIR": str(tmp_path), "GRADATA_CONTEXT_INJECT": "1"}), \
patch("gradata.brain.Brain", return_value=mock_brain):
result = context_main({"message": "How do I set up the pipeline for new prospects in the onboarding workflow? I'd like to understand the full process from lead capture through to qualification."})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_hooks_intelligence.py` around lines 182 - 184, The test is flaky
when ambient GRADATA_CONTEXT_INJECT is set; update the patched environment in
the test to explicitly set "GRADATA_CONTEXT_INJECT" (e.g., "1") alongside
"GRADATA_BRAIN_DIR" so the context injection behavior is deterministic. Modify
the patch.dict call that currently sets {"GRADATA_BRAIN_DIR": str(tmp_path)} to
include the explicit "GRADATA_CONTEXT_INJECT" value so the invocation of
context_main(...) with mock_brain is independent of external environment.

Critical:
- _cloud_sync.push: only advance sync_state.last_push_at when every
  table with pending rows pushed successfully (no more skipped retries)
- tenant_uuid.get_or_create_tenant_id: atomic O_CREAT|O_EXCL + os.replace
  so concurrent startups can't mint different UUIDs for the same brain

Major:
- _cloud_sync: datetime.UTC (UP017), sort __all__ (RUF022), drop unused
  noqa (RUF100), narrow OperationalError scope, tolerate missing sync_state
  in _mark_push, accept db file or brain dir (no hardcoded 'system.db')
- _migrations/_runner: mark_applied/ensure_migrations_table no longer
  commit internally — caller (_apply_numbered) owns the transaction so
  schema changes and the tracking row land atomically
- 001_add_tenant_id.up: remove trailing commit, backfill events.schema_version
  unconditionally so retries clean up partial NULLs
- fill_null_tenant: drop meta_rules/frameworks/rule_relationships from
  CANDIDATE_TABLES — NULL tenant_id is the global sentinel there
- _query: suppress sqlite3.OperationalError (not bare Exception)
- _tenant.tenant_for: bypass lru_cache when GRADATA_TENANT_ID is set so
  test/CI overrides take effect immediately
- loop_intelligence.log_outcome: UPDATE prep_outcomes now filters by
  tenant_id (allow NULL legacy rows) to prevent cross-tenant writes
- meta_rules_storage: remove empty TYPE_CHECKING block (TC005)
- rule_graph: hoist contextlib import to module scope

Docs:
- cloud-apply-instructions: template the tenant/email placeholders
- multi-tenant-future-proofing: table classification now matches
  PER_TENANT_TABLES/MIXED_VISIBILITY_TABLES in migration 001

Tests: 3614 passed, 21 skipped, 2 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/gradata/_query.py (3)

105-108: 🧹 Nitpick | 🔵 Trivial

Overly broad exception suppression in fts_rebuild.

Using contextlib.suppress(Exception) hides all errors including programming mistakes. Other defensive migrations in this PR use sqlite3.OperationalError. Consider narrowing for consistency.

♻️ Proposed fix
-    with contextlib.suppress(Exception):
+    with contextlib.suppress(sqlite3.OperationalError):
         conn.execute("DROP TABLE IF EXISTS brain_fts")
-    with contextlib.suppress(Exception):
+    with contextlib.suppress(sqlite3.OperationalError):
         conn.execute("DROP TABLE IF EXISTS brain_fts_content")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_query.py` around lines 105 - 108, The two overly-broad
suppressions in fts_rebuild that use contextlib.suppress(Exception) should be
narrowed to only suppress sqlite3.OperationalError (or the same sqlite error
class used elsewhere) so real programming errors aren't silenced; update the two
calls that wrap conn.execute("DROP TABLE IF EXISTS brain_fts") and
conn.execute("DROP TABLE IF EXISTS brain_fts_content") to use
contextlib.suppress(sqlite3.OperationalError) and add an import for sqlite3 if
it's not already present, matching the project’s other defensive migration
patterns.

1-12: ⚠️ Potential issue | 🟡 Minor

Missing from __future__ import annotations import.

The coding guidelines require from __future__ import annotations for all files in src/gradata/**/*.py. This file is missing the import.

🛠️ Proposed fix
 """
 Brain Query Script — FTS5 Search.
 ====================================
 Primary search: SQLite FTS5 (keyword matching).
 sqlite-vec planned for vector similarity.
 """
+from __future__ import annotations

 import contextlib

As per coding guidelines, "src/gradata/**/*.py: ... type safety (from future import annotations required)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_query.py` around lines 1 - 12, This module is missing the
required future annotations import; add the line "from __future__ import
annotations" immediately after the module docstring (before other imports such
as contextlib, sqlite3, datetime, Path, typing) in src/gradata/_query.py so all
type hints in this file use postponed evaluation as required by the project's
coding guidelines.

8-32: ⚠️ Potential issue | 🟡 Minor

Organize imports to fix ruff I001 error.

The import block is unsorted or unformatted. Ruff reports an I001 error at lines 8–32. Organize imports to resolve the linting failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_query.py` around lines 8 - 32, The import block in
src/gradata/_query.py is unsorted and triggers ruff I001; reorder imports into
standard groups (stdlib, third-party, local) and alphabetize within groups,
ensure typing imports (TYPE_CHECKING) stay with stdlib, keep from-imports
together (e.g., from datetime import datetime, from pathlib import Path), and
collapse the large gradata._config multi-line import into a properly sorted,
grouped from-import list; remove any truly unused imports if present and run
isort/ruff autofix to validate (look for symbols like contextlib, sqlite3,
datetime, Path, TYPE_CHECKING, gradata._paths as _p, tenant_for, and the
CONFIDENCE_*/DEFAULT_TOP_K/FILE_TYPE_MAP/... constants to preserve).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/architecture/cloud-apply-instructions.md`:
- Around line 28-49: The SQL example uses a generic `<your-tenant-uuid>` which
conflicts with the earlier requirement that `auth.users.id` must equal the
pinned tenant UUID `402bc79c-1ef3-42e4-a410-52b909babfc6`; update the `INSERT
INTO tenant_registry` example so the `tenant_id` value is the exact UUID
`402bc79c-1ef3-42e4-a410-52b909babfc6` (keep the other placeholders like
`display_name` and `email` as-is) to ensure the `tenant_registry.tenant_id`
matches the auth user ID and avoids breaking RLS mapping.

In `@docs/architecture/multi-tenant-future-proofing.md`:
- Around line 83-100: Update the Phase 1 plan to reflect the actual APIs and
scope introduced in this PR: replace references to the non-existent
get_tenant_id() and src/gradata/_migrations/cloud_sync.py push+pull behavior
with the implemented tenant_for(...) API in src/gradata/_tenant.py and the
current push-only cloud sync implementation (module _cloud_sync.py); explicitly
state that sync is push-only, that tenant resolution uses tenant_for(...), and
remove or mark as deferred any push+pull, pull-only, or offline-queueing
responsibilities that the original plan described but the code does not provide.

In `@src/gradata/_cloud_sync.py`:
- Around line 167-180: _resolve_db currently hardcodes "system.db" and
"brain.db", which violates the guideline to use BrainContext for DB access;
update the callers (e.g., the push() flow) to accept a BrainContext and use
ctx.db_path (or change _resolve_db to accept a BrainContext and return
ctx.db_path) instead of probing for hardcoded filenames, remove the
("system.db","brain.db") literals from _resolve_db, and if you intentionally
keep an adapter, add a clear docstring stating it's an MVP adapter and why it
falls back to filename probing.

In `@src/gradata/_migrations/_runner.py`:
- Around line 30-35: has_applied() currently calls
ensure_migrations_table(conn), which can create the migrations table and thus
mutates the DB during status checks; change has_applied to avoid calling
ensure_migrations_table — instead first check for the existence of the
"migrations" table via a read-only query against sqlite_master (or PRAGMA) and
only run the SELECT from migrations if that check shows the table exists; update
the function referencing has_applied, ensure_migrations_table, and the
"migrations" table so dry-run/status checks do not create schema objects.

In `@src/gradata/_migrations/001_add_tenant_id.py`:
- Around line 264-314: Replace all direct print() calls in the migration
entrypoint with the project logger: import logging and get a module logger
(e.g., logger = logging.getLogger(__name__)), then convert stdout prints to
logger.info(...) and the stderr error (the "ERROR: brain DB not found..."
message) to logger.error(...) or logger.exception(...) if an exception is
available; also convert dry-run and progress messages (plan listing, sample
table rows, summary lines) to logger.info and use logger.debug for verbose
internal details if desired. Update the code paths around
get_or_create_tenant_id, has_applied, plan, up, and mark_applied to use logger
instead of print, preserving the same message text and formatting (including
f-strings) and ensure no print() calls remain in this module.
- Around line 300-315: The migration caller never commits the DB transaction:
after calling up(conn, tenant_id=tid) and mark_applied(conn, NAME, ...), add
conn.commit() before printing success/return to ensure changes (backfilled rows
and the migrations row) are persisted; also replace all print(...) calls in this
block with logging.info(...) (and add import logging at the top) so the module
follows the src/gradata logging guideline and the success messages are emitted
via the logger.

In `@src/gradata/_migrations/fill_null_tenant.py`:
- Around line 23-25: The unused "# noqa: E402" directives on the imports are
causing CI failures; edit the import lines that reference _runner
(column_exists, resolve_brain_db, table_exists) and tenant_uuid (read_tenant_id)
to remove only the trailing "noqa: E402" comments while keeping the existing
"type: ignore[import-not-found]" and formatting intact so the imports read
normally without the unused noqa directive.

In `@src/gradata/_migrations/tenant_uuid.py`:
- Around line 15-20: Reorder and group the imports in tenant_uuid.py to satisfy
ruff: keep the __future__ import first (from __future__ import annotations), add
a blank line, then place all stdlib imports sorted alphabetically (argparse, os,
uuid, and from pathlib import Path) so the import block reads: __future__
import, blank line, then argparse, os, uuid, and Path in alphabetical order.
- Around line 53-57: Remove the TOCTOU pattern by deleting the if not
fpath.exists() check and instead unconditionally attempt os.replace(tmp, fpath)
in a try/except: call os.replace(tmp, fpath) directly, and on any OSError
(another process raced and replaced the file) ensure tmp is cleaned up
(os.unlink(tmp) if it still exists), then proceed to the existing read logic
that reads fpath (the lines that parse the winner's UUID). Keep using fpath and
tmp variables and preserve the final read at lines ~62-64 so the winner's value
is returned.

In `@src/gradata/_tenant.py`:
- Around line 55-58: get_tenant_id currently writes the new UUID with
fpath.write_text which is non-atomic and can race; change get_tenant_id to use
the atomic creation pattern used in src/gradata/_migrations/tenant_uuid.py by
either calling tenant_uuid.get_or_create_tenant_id() from get_tenant_id or
re-implementing the O_CREAT|O_EXCL + os.replace pattern (create a temp file with
the new UUID, use os.open with O_CREAT|O_EXCL or tempfile to ensure exclusive
create, then os.replace into fpath) and keep brain.mkdir(parents=True,
exist_ok=True) before the atomic create so two concurrent processes cannot
produce different tenant_ids.

---

Outside diff comments:
In `@src/gradata/_query.py`:
- Around line 105-108: The two overly-broad suppressions in fts_rebuild that use
contextlib.suppress(Exception) should be narrowed to only suppress
sqlite3.OperationalError (or the same sqlite error class used elsewhere) so real
programming errors aren't silenced; update the two calls that wrap
conn.execute("DROP TABLE IF EXISTS brain_fts") and conn.execute("DROP TABLE IF
EXISTS brain_fts_content") to use contextlib.suppress(sqlite3.OperationalError)
and add an import for sqlite3 if it's not already present, matching the
project’s other defensive migration patterns.
- Around line 1-12: This module is missing the required future annotations
import; add the line "from __future__ import annotations" immediately after the
module docstring (before other imports such as contextlib, sqlite3, datetime,
Path, typing) in src/gradata/_query.py so all type hints in this file use
postponed evaluation as required by the project's coding guidelines.
- Around line 8-32: The import block in src/gradata/_query.py is unsorted and
triggers ruff I001; reorder imports into standard groups (stdlib, third-party,
local) and alphabetize within groups, ensure typing imports (TYPE_CHECKING) stay
with stdlib, keep from-imports together (e.g., from datetime import datetime,
from pathlib import Path), and collapse the large gradata._config multi-line
import into a properly sorted, grouped from-import list; remove any truly unused
imports if present and run isort/ruff autofix to validate (look for symbols like
contextlib, sqlite3, datetime, Path, TYPE_CHECKING, gradata._paths as _p,
tenant_for, and the CONFIDENCE_*/DEFAULT_TOP_K/FILE_TYPE_MAP/... constants to
preserve).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c0f0e5b8-90b3-4607-b61d-68a409ab0edd

📥 Commits

Reviewing files that changed from the base of the PR and between adb930e and 25ae13e.

📒 Files selected for processing (12)
  • docs/architecture/cloud-apply-instructions.md
  • docs/architecture/multi-tenant-future-proofing.md
  • src/gradata/_cloud_sync.py
  • src/gradata/_migrations/001_add_tenant_id.py
  • src/gradata/_migrations/_runner.py
  • src/gradata/_migrations/fill_null_tenant.py
  • src/gradata/_migrations/tenant_uuid.py
  • src/gradata/_query.py
  • src/gradata/_tenant.py
  • src/gradata/enhancements/meta_rules_storage.py
  • src/gradata/enhancements/scoring/loop_intelligence.py
  • src/gradata/rules/rule_graph.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/rules/rule_graph.py
  • src/gradata/enhancements/scoring/loop_intelligence.py
  • src/gradata/_tenant.py
  • src/gradata/_migrations/tenant_uuid.py
  • src/gradata/_migrations/fill_null_tenant.py
  • src/gradata/_query.py
  • src/gradata/_migrations/_runner.py
  • src/gradata/_cloud_sync.py
  • src/gradata/_migrations/001_add_tenant_id.py
  • src/gradata/enhancements/meta_rules_storage.py
🪛 GitHub Actions: SDK CI
src/gradata/rules/rule_graph.py

[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

src/gradata/enhancements/scoring/loop_intelligence.py

[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

src/gradata/_tenant.py

[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

src/gradata/_migrations/tenant_uuid.py

[error] 15-20: ruff: Import block is un-sorted or un-formatted. (I001) Help: Organize imports


[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

src/gradata/_migrations/fill_null_tenant.py

[error] 24-25: ruff: Import block is un-sorted or un-formatted. (I001) Help: Organize imports


[error] 24-24: ruff: Unused noqa directive (non-enabled: E402). (RUF100) Help: Remove unused noqa directive


[error] 25-25: ruff: Unused noqa directive (non-enabled: E402). (RUF100) Help: Remove unused noqa directive


[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

src/gradata/_query.py

[error] 8-32: ruff: Import block is un-sorted or un-formatted. (I001) Help: Organize imports


[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

src/gradata/_migrations/_runner.py

[error] 8-12: ruff: Import block is un-sorted or un-formatted. (I001) Help: Organize imports


[error] 50-50: ruff: Use datetime.UTC alias. (UP017) Help: Convert to datetime.UTC alias


[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

src/gradata/_cloud_sync.py

[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

src/gradata/_migrations/001_add_tenant_id.py

[error] 1-1: ruff: Invalid module name: '001_add_tenant_id'. (N999)


[error] 35-44: ruff: Import block is un-sorted or un-formatted. (I001) Help: Organize imports


[error] 35-35: ruff: Unused noqa directive (non-enabled: E402). (RUF100) Help: Remove unused noqa directive


[error] 44-44: ruff: Unused noqa directive (non-enabled: E402). (RUF100) Help: Remove unused noqa directive


[error] 176-176: ruff: Use datetime.UTC alias. (UP017) Help: Convert to datetime.UTC alias


[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

src/gradata/enhancements/meta_rules_storage.py

[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

🪛 LanguageTool
docs/architecture/multi-tenant-future-proofing.md

[uncategorized] ~142-~142: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...yle flow. Defer until it matters. 4. Open source tension. The SDK is Apache-2.0 but `s...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~144-~144: Use a hyphen to join words.
Context: ...Apache license clean-room. 5. "Future proof now" is a trap. Most future-proo...

(QB_NEW_EN_HYPHEN)

🔇 Additional comments (14)
src/gradata/_tenant.py (1)

78-88: LGTM: Cache bypass for env override is correctly implemented.

The tenant_for function now correctly bypasses the LRU cache when GRADATA_TENANT_ID is set, addressing the earlier concern about test/CI overrides not taking effect.

src/gradata/rules/rule_graph.py (1)

243-264: LGTM: Tenant-scoped persistence with defensive migration.

The store_relationship function correctly:

  • Derives tenant_id from the database path
  • Performs defensive schema migration for pre-tenant databases
  • Includes tenant_id in the INSERT statement
  • Clamps confidence to [0.0, 1.0] per SDK guidelines

The pattern is consistent with meta_rules_storage.py and other modules in this PR.

src/gradata/_query.py (1)

47-51: LGTM: Defensive migration correctly narrowed to OperationalError.

The exception handling now correctly suppresses only sqlite3.OperationalError for the duplicate column case, allowing other errors to surface. This addresses the earlier review feedback.

src/gradata/enhancements/scoring/loop_intelligence.py (2)

134-136: LGTM: Defensive migration for tenant_id column.

The migration correctly adds tenant_id TEXT to both activity_log and prep_outcomes tables using the established contextlib.suppress(sqlite3.OperationalError) pattern.


242-263: LGTM: Backward-compatible tenant scoping in log_outcome.

The (tenant_id = ? OR tenant_id IS NULL) clause correctly handles:

  • New rows with explicit tenant_id
  • Pre-migration rows where tenant_id is NULL

This ensures existing unresolved preps can still be linked to outcomes during the migration period.

src/gradata/enhancements/meta_rules_storage.py (3)

100-104: LGTM: Defensive migration for tenant_id and visibility columns.

The ensure_table function correctly migrates existing schemas by adding both tenant_id TEXT and visibility TEXT DEFAULT 'private' columns using the established suppression pattern.


121-150: LGTM: Tenant-scoped meta-rule persistence.

The save_meta_rules function correctly:

  • Computes tenant_id from the database path
  • Includes tenant_id in the INSERT statement
  • Hard-codes visibility to 'private' for locally-created rules

406-408: LGTM: Defensive migration for correction_patterns.

Consistent with the established pattern for adding tenant_id to existing tables.

src/gradata/_migrations/tenant_uuid.py (1)

85-100: Print statements in CLI entrypoint are acceptable.

The print() calls are in the _main() CLI function, which is appropriate for command-line output. This doesn't violate the "no print()" guideline which targets library code, not CLI scripts.

src/gradata/_cloud_sync.py (3)

69-84: LGTM: Properly handles missing sync_state table.

The _last_push_at function now checks _sync_state_exists() before querying, avoiding OperationalError on pre-migration brains. This addresses the earlier review feedback.


87-101: LGTM: _mark_push gracefully handles missing table.

The function now returns early when sync_state doesn't exist, preventing errors on unmigrated brains.


206-219: LGTM: Watermark only advances on full success.

The all_ok flag correctly ensures _mark_push is only called when every table's rows were fully accepted. Partial failures leave the watermark unchanged for retry.

src/gradata/_migrations/fill_null_tenant.py (2)

36-69: LGTM: Mixed-visibility tables correctly excluded.

The CANDIDATE_TABLES list deliberately excludes meta_rules, frameworks, and rule_relationships as documented in the comment (lines 31-35). This prevents accidentally converting global (tenant_id IS NULL) rows into tenant-specific rows.


108-119: SQL injection risk mitigated by allowlist.

The f-string SQL at lines 109 and 118 uses table names from the hardcoded CANDIDATE_TABLES allowlist, not user input. This is an acceptable pattern for migration scripts where the table list is controlled.

Comment on lines +28 to +49
Critical: Oliver's `auth.users.id` MUST equal his local tenant UUID
`402bc79c-1ef3-42e4-a410-52b909babfc6`. Otherwise RLS blocks him from his
own data.

Do this via the Studio -> Authentication -> Users -> "Add user" flow only
if the Studio UI exposes a user-id override. If it does NOT, use the admin
REST endpoint `/auth/v1/admin/users` with the service role token from
1Password, setting the body field `id` to the UUID above and `email_confirm`
to true. Keep the token in an env var, never paste it in a file.

After the auth user exists, in SQL Editor run:

```sql
INSERT INTO tenant_registry (tenant_id, display_name, email, is_primary, tier)
VALUES (
'<your-tenant-uuid>',
'<your-display-name>',
'<your-email>',
TRUE,
'founder'
) ON CONFLICT (tenant_id) DO NOTHING;
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Reuse the exact auth UUID in the tenant_registry example.

This section says the auth.users.id must be the pinned tenant UUID, but the SQL snippet then falls back to a generic <your-tenant-uuid> placeholder. That makes it easy to seed tenant_registry with a different UUID and immediately break the RLS mapping this step is warning about.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/architecture/cloud-apply-instructions.md` around lines 28 - 49, The SQL
example uses a generic `<your-tenant-uuid>` which conflicts with the earlier
requirement that `auth.users.id` must equal the pinned tenant UUID
`402bc79c-1ef3-42e4-a410-52b909babfc6`; update the `INSERT INTO tenant_registry`
example so the `tenant_id` value is the exact UUID
`402bc79c-1ef3-42e4-a410-52b909babfc6` (keep the other placeholders like
`display_name` and `email` as-is) to ensure the `tenant_registry.tenant_id`
matches the auth user ID and avoids breaking RLS mapping.

Comment thread docs/architecture/multi-tenant-future-proofing.md
Comment on lines +167 to +180
def _resolve_db(brain_dir: str | Path) -> Path | None:
"""Return the SQLite path inside ``brain_dir`` if present, else None.

Accepts either the brain directory OR the .db file directly so the caller
can pass a BrainContext.db_path without a wrapping if-guard.
"""
p = Path(brain_dir).expanduser().resolve()
if p.is_file():
return p if p.exists() else None
for name in ("system.db", "brain.db"):
candidate = p / name
if candidate.exists():
return candidate
return None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Hardcoded database filenames in _resolve_db.

The function hardcodes "system.db" and "brain.db" at lines 176-177. While the function provides flexibility by accepting either a directory or file path, this pattern diverges from the SDK guideline preferring BrainContext for DB access.

Consider accepting BrainContext in push() and using ctx.db_path directly, or document this as an intentional adapter pattern for the cloud sync MVP.

As per coding guidelines, "src/gradata/**/*.py: ... all functions accepting BrainContext where DB access occurs, no hardcoded paths."

🧰 Tools
🪛 GitHub Actions: SDK CI

[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_cloud_sync.py` around lines 167 - 180, _resolve_db currently
hardcodes "system.db" and "brain.db", which violates the guideline to use
BrainContext for DB access; update the callers (e.g., the push() flow) to accept
a BrainContext and use ctx.db_path (or change _resolve_db to accept a
BrainContext and return ctx.db_path) instead of probing for hardcoded filenames,
remove the ("system.db","brain.db") literals from _resolve_db, and if you
intentionally keep an adapter, add a clear docstring stating it's an MVP adapter
and why it falls back to filename probing.

Comment thread src/gradata/_migrations/_runner.py
Comment on lines +264 to +314
print(f"ERROR: brain DB not found at {db_path}", file=sys.stderr)
return 2

brain_dir = db_path.parent
tid = get_or_create_tenant_id(brain_dir)
print(f"Brain : {db_path}")
print(f"Tenant : {tid}")

conn = sqlite3.connect(str(db_path))
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA busy_timeout=5000")

try:
if has_applied(conn, NAME) and not args.dry_run:
print(f"Already applied: {NAME} (no-op)")
return 0

p = plan(conn)
print("\n--- plan ---")
for a in p["actions"]:
print(f" {a}")
print(
f" backfill {p['total_rows_to_backfill']} rows across "
f"{len(p['row_backfills'])} tables"
)
if p["row_backfills"]:
sample = p["row_backfills"][:10]
for t, c in sample:
print(f" {t:30s} {c:>8} rows")
if len(p["row_backfills"]) > 10:
print(f" ... and {len(p['row_backfills']) - 10} more")

if args.dry_run:
print("\n(dry-run) no changes made")
return 0

print("\n--- applying ---")
summary = up(conn, tenant_id=tid)
mark_applied(
conn,
NAME,
rows_affected=summary["rows_backfilled"],
notes=f"primary tenant {tid}",
)
print(f"columns_added : {len(summary['columns_added'])}")
for c in summary["columns_added"]:
print(f" + {c}")
print(f"indexes_created : {len(summary['indexes_created'])}")
print(f"rows_backfilled : {summary['rows_backfilled']}")
print(f"visibility_backfilled : {summary['visibility_backfilled']}")
print("\nOK")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Replace print()-based reporting with logging.

This new migration entrypoint is under src/gradata/**/*.py, but the CLI path still writes directly to stdout/stderr throughout Lines 264-314. Please switch this reporting to the project logger instead of print().

As per coding guidelines, src/gradata/**/*.py: "no print() statements (use logging)".

🧰 Tools
🪛 GitHub Actions: SDK CI

[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/001_add_tenant_id.py` around lines 264 - 314, Replace
all direct print() calls in the migration entrypoint with the project logger:
import logging and get a module logger (e.g., logger =
logging.getLogger(__name__)), then convert stdout prints to logger.info(...) and
the stderr error (the "ERROR: brain DB not found..." message) to
logger.error(...) or logger.exception(...) if an exception is available; also
convert dry-run and progress messages (plan listing, sample table rows, summary
lines) to logger.info and use logger.debug for verbose internal details if
desired. Update the code paths around get_or_create_tenant_id, has_applied,
plan, up, and mark_applied to use logger instead of print, preserving the same
message text and formatting (including f-strings) and ensure no print() calls
remain in this module.

Comment on lines +300 to +315
print("\n--- applying ---")
summary = up(conn, tenant_id=tid)
mark_applied(
conn,
NAME,
rows_affected=summary["rows_backfilled"],
notes=f"primary tenant {tid}",
)
print(f"columns_added : {len(summary['columns_added'])}")
for c in summary["columns_added"]:
print(f" + {c}")
print(f"indexes_created : {len(summary['indexes_created'])}")
print(f"rows_backfilled : {summary['rows_backfilled']}")
print(f"visibility_backfilled : {summary['visibility_backfilled']}")
print("\nOK")
return 0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, locate and verify the file exists
find . -name "001_add_tenant_id.py" -type f

Repository: Gradata/gradata

Length of output: 105


🏁 Script executed:

# Check the file size to determine how much we can safely read
wc -l src/gradata/_migrations/001_add_tenant_id.py

Repository: Gradata/gradata

Length of output: 107


🏁 Script executed:

# Read lines 1-30 to verify __future__ import and initial context
head -30 src/gradata/_migrations/001_add_tenant_id.py

Repository: Gradata/gradata

Length of output: 1115


🏁 Script executed:

# Read lines 240-260 to understand the up() function and transaction ownership
sed -n '240,260p' src/gradata/_migrations/001_add_tenant_id.py

Repository: Gradata/gradata

Length of output: 717


🏁 Script executed:

# Read lines 290-320 to verify the exact content at lines 300-315
sed -n '290,320p' src/gradata/_migrations/001_add_tenant_id.py

Repository: Gradata/gradata

Length of output: 1121


🏁 Script executed:

# Check if logging is imported anywhere in the file
grep -n "import logging\|from logging" src/gradata/_migrations/001_add_tenant_id.py

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

# Search for all conn.commit() calls in the file to understand transaction management
grep -n "conn\.commit()" src/gradata/_migrations/001_add_tenant_id.py

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

# Verify if this migration file is meant to be run as a standalone script
# Check the main guard and execution model
sed -n '315,321p' src/gradata/_migrations/001_add_tenant_id.py

Repository: Gradata/gradata

Length of output: 168


🏁 Script executed:

# Check if there are other migration files to understand the pattern
ls -la src/gradata/_migrations/

Repository: Gradata/gradata

Length of output: 508


Add conn.commit() before closing; transaction is left open and changes are discarded.

The up() function correctly delegates commit responsibility to the caller, but _main() never executes conn.commit() after calling mark_applied(). This means the migration can print OK and exit while the sqlite connection closes with all changes (backfilled data and the migrations tracking row) still uncommitted.

Additionally, lines 300–315 use print() statements which violate the coding guideline for src/gradata/**/*.py (use logging instead).

Suggested fix
         summary = up(conn, tenant_id=tid)
         mark_applied(
             conn,
             NAME,
             rows_affected=summary["rows_backfilled"],
             notes=f"primary tenant {tid}",
         )
+        conn.commit()
         print(f"columns_added          : {len(summary['columns_added'])}")

Also replace print() calls with logging.info() or similar and add import logging at the top.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print("\n--- applying ---")
summary = up(conn, tenant_id=tid)
mark_applied(
conn,
NAME,
rows_affected=summary["rows_backfilled"],
notes=f"primary tenant {tid}",
)
print(f"columns_added : {len(summary['columns_added'])}")
for c in summary["columns_added"]:
print(f" + {c}")
print(f"indexes_created : {len(summary['indexes_created'])}")
print(f"rows_backfilled : {summary['rows_backfilled']}")
print(f"visibility_backfilled : {summary['visibility_backfilled']}")
print("\nOK")
return 0
print("\n--- applying ---")
summary = up(conn, tenant_id=tid)
mark_applied(
conn,
NAME,
rows_affected=summary["rows_backfilled"],
notes=f"primary tenant {tid}",
)
conn.commit()
print(f"columns_added : {len(summary['columns_added'])}")
for c in summary["columns_added"]:
print(f" + {c}")
print(f"indexes_created : {len(summary['indexes_created'])}")
print(f"rows_backfilled : {summary['rows_backfilled']}")
print(f"visibility_backfilled : {summary['visibility_backfilled']}")
print("\nOK")
return 0
🧰 Tools
🪛 GitHub Actions: SDK CI

[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/001_add_tenant_id.py` around lines 300 - 315, The
migration caller never commits the DB transaction: after calling up(conn,
tenant_id=tid) and mark_applied(conn, NAME, ...), add conn.commit() before
printing success/return to ensure changes (backfilled rows and the migrations
row) are persisted; also replace all print(...) calls in this block with
logging.info(...) (and add import logging at the top) so the module follows the
src/gradata logging guideline and the success messages are emitted via the
logger.

Comment thread src/gradata/_migrations/fill_null_tenant.py Outdated
Comment on lines +15 to +20
from __future__ import annotations

import argparse
import os
import uuid
from pathlib import Path

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix import sorting to resolve CI failure.

The pipeline reports ruff: Import block is un-sorted or un-formatted. (I001) for this file.

🧰 Tools
🪛 GitHub Actions: SDK CI

[error] 15-20: ruff: Import block is un-sorted or un-formatted. (I001) Help: Organize imports


[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/tenant_uuid.py` around lines 15 - 20, Reorder and
group the imports in tenant_uuid.py to satisfy ruff: keep the __future__ import
first (from __future__ import annotations), add a blank line, then place all
stdlib imports sorted alphabetically (argparse, os, uuid, and from pathlib
import Path) so the import block reads: __future__ import, blank line, then
argparse, os, uuid, and Path in alphabetical order.

Comment on lines +53 to +57
if not fpath.exists():
os.replace(tmp, fpath)
else:
# Lost the race: drop our temp, read what the winner wrote.
os.unlink(tmp)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Subtle TOCTOU race in atomic tenant creation.

The if not fpath.exists() check at line 53 introduces a race: another process could win between the check and os.replace(). The safer pattern is to unconditionally attempt os.replace() and handle any resulting state.

However, since os.replace() is atomic and the final read at lines 62-64 always returns the winner's value, the practical impact is low (worst case: one process overwrites the other's identical-format UUID file, but both get a valid UUID). The current code is acceptable but could be simplified.

💡 Simplified atomic pattern
-        if not fpath.exists():
-            os.replace(tmp, fpath)
-        else:
-            # Lost the race: drop our temp, read what the winner wrote.
-            os.unlink(tmp)
+        # Unconditionally replace - os.replace is atomic.
+        # If another process already wrote, we'll read their value below.
+        os.replace(tmp, fpath)
🧰 Tools
🪛 GitHub Actions: SDK CI

[error] ruff check reported 13 errors and exited with code 1 (Process completed with exit code 1).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/tenant_uuid.py` around lines 53 - 57, Remove the
TOCTOU pattern by deleting the if not fpath.exists() check and instead
unconditionally attempt os.replace(tmp, fpath) in a try/except: call
os.replace(tmp, fpath) directly, and on any OSError (another process raced and
replaced the file) ensure tmp is cleaned up (os.unlink(tmp) if it still exists),
then proceed to the existing read logic that reads fpath (the lines that parse
the winner's UUID). Keep using fpath and tmp variables and preserve the final
read at lines ~62-64 so the winner's value is returned.

Comment thread src/gradata/_tenant.py Outdated
- Apply ruff --fix imports/sort across migrations + _query
- Add file-level noqa for N999 on 001_add_tenant_id.py
  (digit prefix is intentional — runs as script, not import)

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/gradata/_query.py (2)

1-14: ⚠️ Potential issue | 🟡 Minor

Add missing from __future__ import annotations import.

The file is missing the required future annotations import for type safety.

🛠️ Proposed fix
 """
 Brain Query Script — FTS5 Search.
 ====================================
 Primary search: SQLite FTS5 (keyword matching).
 sqlite-vec planned for vector similarity.
 """
+from __future__ import annotations

 import contextlib
 import sqlite3

As per coding guidelines, src/gradata/**/*.py: "type safety (from future import annotations required)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_query.py` around lines 1 - 14, Add the missing future
annotations import by inserting "from __future__ import annotations" as the very
first statement in the module to enable postponed evaluation of annotations;
update the top of src/gradata/_query.py (before any other imports such as
contextlib, sqlite3, datetime, pathlib, typing and the import of gradata._paths)
so functions/classes in this module use forward-referenced type annotations
safely.

102-110: ⚠️ Potential issue | 🟡 Minor

Overly broad exception suppression in fts_rebuild.

Lines 105-108 use contextlib.suppress(Exception) which hides all errors including programming errors and I/O failures. The defensive migration at lines 47-51 correctly uses sqlite3.OperationalError — this should follow the same pattern.

🛠️ Proposed fix
 def fts_rebuild(ctx: "BrainContext | None" = None):
     db = ctx.db_path if ctx else _p.DB_PATH
     conn = sqlite3.connect(str(db))
-    with contextlib.suppress(Exception):
+    with contextlib.suppress(sqlite3.OperationalError):
         conn.execute("DROP TABLE IF EXISTS brain_fts")
-    with contextlib.suppress(Exception):
+    with contextlib.suppress(sqlite3.OperationalError):
         conn.execute("DROP TABLE IF EXISTS brain_fts_content")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_query.py` around lines 102 - 110, In fts_rebuild, replace the
overly broad contextlib.suppress(Exception) wrappers around the DROP TABLE
statements with contextlib.suppress(sqlite3.OperationalError) so only SQLite
operational errors are swallowed; locate the two conn.execute("DROP TABLE IF
EXISTS brain_fts") and conn.execute("DROP TABLE IF EXISTS brain_fts_content")
calls inside fts_rebuild and change their suppress(...) arguments accordingly,
leaving the rest of the function (conn.commit() and the subsequent
_ensure_fts_table call) unchanged.
♻️ Duplicate comments (3)
src/gradata/_migrations/001_add_tenant_id.py (2)

264-298: 🧹 Nitpick | 🔵 Trivial

Replace print() with logging in CLI entrypoint.

The _main() function uses print() statements throughout, which violates the coding guidelines for src/gradata/**/*.py.

As per coding guidelines, src/gradata/**/*.py: "no print() statements (use logging)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/001_add_tenant_id.py` around lines 264 - 298, The CLI
entrypoint uses print() calls (e.g., in _main() around
get_or_create_tenant_id(brain_dir), printing Brain/Tenant, plan output, errors
and dry-run notices) which should be replaced with logging: import logging and
get a module logger (logger = logging.getLogger(__name__)), replace print(...,
file=sys.stderr) with logger.error(...), informational messages like "Brain",
"Tenant", plan headers and row summaries with logger.info(...) (or debug for
very verbose output), and keep f-strings intact; ensure messages use logger
methods consistently and respect args.dry_run/has_applied(NAME) logic so
behavior is unchanged.

300-317: ⚠️ Potential issue | 🔴 Critical

Critical: Missing conn.commit() after mark_applied().

The _main() function calls up() and mark_applied() but never commits the transaction. When conn.close() is called in the finally block, all changes (backfilled data and migration tracking row) are discarded. The migration will print "OK" but no changes will persist.

🐛 Proposed fix
         summary = up(conn, tenant_id=tid)
         mark_applied(
             conn,
             NAME,
             rows_affected=summary["rows_backfilled"],
             notes=f"primary tenant {tid}",
         )
+        conn.commit()
         print(f"columns_added          : {len(summary['columns_added'])}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/001_add_tenant_id.py` around lines 300 - 317, The
migration currently calls up(conn, tenant_id=tid) and mark_applied(...) but
never commits the transaction, so all changes are lost when conn.close() runs;
after mark_applied(...) (and before printing "OK" / returning) call
conn.commit() to persist the backfilled rows and the migration tracking row
(i.e., add a conn.commit() immediately after mark_applied(...) in the _main()
flow that uses up(...) and mark_applied(...), ensuring commit happens before
conn.close()).
src/gradata/_migrations/_runner.py (1)

29-34: ⚠️ Potential issue | 🟠 Major

has_applied() can create tables during dry-run.

The function calls ensure_migrations_table() which executes DDL. When 001_add_tenant_id.py invokes has_applied() before the --dry-run branch (line 277), a dry run against a fresh brain still creates the migrations table, breaking the "no changes made" contract.

🐛 Proposed fix
 def has_applied(conn: sqlite3.Connection, name: str) -> bool:
-    ensure_migrations_table(conn)
+    if not table_exists(conn, "migrations"):
+        return False
     row = conn.execute(
         "SELECT 1 FROM migrations WHERE name = ?", (name,)
     ).fetchone()
     return row is not None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/_runner.py` around lines 29 - 34, has_applied()
currently calls ensure_migrations_table() which performs DDL and thus creates
the migrations table during a dry-run; change has_applied(conn, name) to accept
a flag like create_if_missing: bool (default True) or dry_run: bool and, when
false, avoid calling ensure_migrations_table() and instead safely query
sqlite_master (e.g. SELECT 1 FROM migrations WHERE name = ? only if table
exists) or return False if the table is absent; update callers that perform
dry-run (e.g., the migration runner / 001_add_tenant_id.py invocation path) to
pass create_if_missing=False (or dry_run=True) so a dry-run never issues DDL.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/_core.py`:
- Around line 769-786: The code duplicates imports for contextlib and sqlite3 as
_ctx_mod2/_sqlite3_mod2 in the block that uses get_connection and tenant_for;
remove the redundant second imports and reuse the existing aliases (e.g.,
_ctx_mod and _sqlite3_mod) or import once at the function scope, then update the
with-suppress and exception reference inside the block that calls conn.execute
and INSERT into lesson_transitions (the lines using _ctx_mod2.suppress and
_sqlite3_mod2.OperationalError) to use the consolidated aliases so there are no
duplicate import aliases.

In `@src/gradata/_migrations/_runner.py`:
- Around line 53-98: The helper functions interpolate identifiers into SQL and
risk injection; add a small validator (e.g. validate_identifier(name): allow
only ASCII letters, digits and underscores and must not be empty) and call it
before any f-string SQL that embeds a table/column/index (at least in
column_exists, add_column_if_missing, create_index_if_missing and any place
building "columns" lists); for the "columns" parameter validate it's a
comma-separated list of valid identifiers (split and validate each) and raise
ValueError if validation fails so only trusted identifier characters are
permitted before using f-strings.

In `@src/gradata/_migrations/fill_null_tenant.py`:
- Around line 75-140: The main() function uses print() for all messages; replace
prints with the module logger so it follows the project's "no print()"
guideline: import logging at top, create logger = logging.getLogger(__name__),
and change error prints (e.g. the "ERROR: brain DB not found..." and "ERROR: no
tenant UUID..." messages when checking db_path and tid) to logger.error(...),
and change informational prints (Brain, Tenant, Mode, the per-table lines, the
"no NULL tenant_id rows found" message, and the final summary line) to
logger.info(...). Keep message text and formatting (f-strings) unchanged, and do
not call logging.basicConfig here; leave configuration to the application.
Target replacements inside main() and around variables/db operations (db_path,
tid, touched, CANDIDATE_TABLES, conn, cur).

In `@src/gradata/_migrations/tenant_uuid.py`:
- Around line 61-65: The function that reads the tenant id (the code that uses
fpath.read_text(), validates with _is_valid_uuid(tid) and falls back to new_tid)
should log a warning before returning the generated new_tid so
corrupted/unwritten tenant files are visible; update the block that currently
returns new_tid to call the module logger (or processLogger) to warn that the
tenant file at fpath contained invalid data and that a new UUID (new_tid) is
being used, and ideally include the invalid tid value and fpath in the message
to aid debugging while leaving the rest of the fallback behavior unchanged.
- Around line 84-99: The CLI entrypoint _main() uses print() calls; replace them
with the module logger (create logger = logging.getLogger(__name__) if missing)
and use logger.info() for normal outputs (the tenant id and
get_or_create_tenant_id result) and logger.warning() or logger.info() for the
"(no tenant id)" message before returning 1; keep return codes the same and do
not call print() anywhere in _main(); reference read_tenant_id and
get_or_create_tenant_id when replacing their printed outputs.

---

Outside diff comments:
In `@src/gradata/_query.py`:
- Around line 1-14: Add the missing future annotations import by inserting "from
__future__ import annotations" as the very first statement in the module to
enable postponed evaluation of annotations; update the top of
src/gradata/_query.py (before any other imports such as contextlib, sqlite3,
datetime, pathlib, typing and the import of gradata._paths) so functions/classes
in this module use forward-referenced type annotations safely.
- Around line 102-110: In fts_rebuild, replace the overly broad
contextlib.suppress(Exception) wrappers around the DROP TABLE statements with
contextlib.suppress(sqlite3.OperationalError) so only SQLite operational errors
are swallowed; locate the two conn.execute("DROP TABLE IF EXISTS brain_fts") and
conn.execute("DROP TABLE IF EXISTS brain_fts_content") calls inside fts_rebuild
and change their suppress(...) arguments accordingly, leaving the rest of the
function (conn.commit() and the subsequent _ensure_fts_table call) unchanged.

---

Duplicate comments:
In `@src/gradata/_migrations/_runner.py`:
- Around line 29-34: has_applied() currently calls ensure_migrations_table()
which performs DDL and thus creates the migrations table during a dry-run;
change has_applied(conn, name) to accept a flag like create_if_missing: bool
(default True) or dry_run: bool and, when false, avoid calling
ensure_migrations_table() and instead safely query sqlite_master (e.g. SELECT 1
FROM migrations WHERE name = ? only if table exists) or return False if the
table is absent; update callers that perform dry-run (e.g., the migration runner
/ 001_add_tenant_id.py invocation path) to pass create_if_missing=False (or
dry_run=True) so a dry-run never issues DDL.

In `@src/gradata/_migrations/001_add_tenant_id.py`:
- Around line 264-298: The CLI entrypoint uses print() calls (e.g., in _main()
around get_or_create_tenant_id(brain_dir), printing Brain/Tenant, plan output,
errors and dry-run notices) which should be replaced with logging: import
logging and get a module logger (logger = logging.getLogger(__name__)), replace
print(..., file=sys.stderr) with logger.error(...), informational messages like
"Brain", "Tenant", plan headers and row summaries with logger.info(...) (or
debug for very verbose output), and keep f-strings intact; ensure messages use
logger methods consistently and respect args.dry_run/has_applied(NAME) logic so
behavior is unchanged.
- Around line 300-317: The migration currently calls up(conn, tenant_id=tid) and
mark_applied(...) but never commits the transaction, so all changes are lost
when conn.close() runs; after mark_applied(...) (and before printing "OK" /
returning) call conn.commit() to persist the backfilled rows and the migration
tracking row (i.e., add a conn.commit() immediately after mark_applied(...) in
the _main() flow that uses up(...) and mark_applied(...), ensuring commit
happens before conn.close()).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 22fe80f1-d305-45a2-89ae-003ca6684f5c

📥 Commits

Reviewing files that changed from the base of the PR and between 25ae13e and e5e4eca.

📒 Files selected for processing (6)
  • src/gradata/_core.py
  • src/gradata/_migrations/001_add_tenant_id.py
  • src/gradata/_migrations/_runner.py
  • src/gradata/_migrations/fill_null_tenant.py
  • src/gradata/_migrations/tenant_uuid.py
  • src/gradata/_query.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test (Python 3.12)
🧰 Additional context used
📓 Path-based instructions (1)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/_core.py
  • src/gradata/_query.py
  • src/gradata/_migrations/001_add_tenant_id.py
  • src/gradata/_migrations/tenant_uuid.py
  • src/gradata/_migrations/fill_null_tenant.py
  • src/gradata/_migrations/_runner.py
🔇 Additional comments (5)
src/gradata/_core.py (1)

465-481: LGTM! Tenant-scoped pending_approvals INSERT.

The tenant_id integration follows the established defensive pattern with ALTER TABLE ADD COLUMN guarded by contextlib.suppress(sqlite3.OperationalError). The tenant_for() function guarantees a valid UUID.

src/gradata/_query.py (1)

39-59: LGTM! Defensive tenant_id migration and schema update.

The defensive ALTER TABLE ADD COLUMN pattern correctly suppresses only sqlite3.OperationalError and includes clear documentation about why this is necessary.

src/gradata/_migrations/fill_null_tenant.py (1)

39-72: LGTM! Correctly excludes mixed-visibility tables.

The CANDIDATE_TABLES list appropriately excludes meta_rules, frameworks, and rule_relationships where tenant_id IS NULL is a valid sentinel for global/shareable rows. The documentation at lines 34-38 clearly explains the rationale.

src/gradata/_migrations/001_add_tenant_id.py (1)

160-246: LGTM! Migration up() function is transaction-free.

The up() function correctly delegates commit responsibility to the caller, ensuring schema changes and the migration tracking row land atomically. The schema_version backfill is now idempotent.

src/gradata/_migrations/_runner.py (1)

24-26: LGTM! Migration helpers are now side-effect free.

Both ensure_migrations_table() and mark_applied() correctly delegate commit responsibility to the caller, allowing the schema changes and tracking row to land atomically.

Also applies to: 37-50

Comment thread src/gradata/_core.py
Comment on lines +769 to +786
import contextlib as _ctx_mod2
import sqlite3 as _sqlite3_mod2
from datetime import UTC, datetime

from gradata._db import get_connection
from gradata._tenant import tenant_for as _tenant_for
now = datetime.now(UTC).isoformat()
_tid = _tenant_for(brain.dir)
with get_connection(brain.db_path) as conn:
with _ctx_mod2.suppress(_sqlite3_mod2.OperationalError):
conn.execute("ALTER TABLE lesson_transitions ADD COLUMN tenant_id TEXT")
for lesson, old_state, new_state in transitions:
conn.execute(
"INSERT INTO lesson_transitions "
"(lesson_desc, category, old_state, new_state, confidence, "
"fire_count, session, transitioned_at) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
"fire_count, session, transitioned_at, tenant_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
(lesson.description[:100], lesson.category, old_state, new_state,
lesson.confidence, lesson.fire_count, None, now))
lesson.confidence, lesson.fire_count, None, now, _tid))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider consolidating duplicate import aliases.

The code imports contextlib and sqlite3 twice with different aliases (_ctx_mod/_ctx_mod2, _sqlite3_mod/_sqlite3_mod2). Since both blocks are in the same function scope, you could reuse the same aliases or import them once at the function level.

♻️ Simplified imports
 def brain_end_session(
     brain: Brain, *, session_corrections: list[dict] | None = None,
     ...
 ) -> dict:
+    import contextlib as _ctx_mod
+    import sqlite3 as _sqlite3_mod
     ...
         # Persist lineage (table created by _migrations.py)
         if transitions and brain.db_path.is_file():
             try:
-                import contextlib as _ctx_mod2
-                import sqlite3 as _sqlite3_mod2
                 from datetime import UTC, datetime

                 from gradata._db import get_connection
                 from gradata._tenant import tenant_for as _tenant_for
                 now = datetime.now(UTC).isoformat()
                 _tid = _tenant_for(brain.dir)
                 with get_connection(brain.db_path) as conn:
-                    with _ctx_mod2.suppress(_sqlite3_mod2.OperationalError):
+                    with _ctx_mod.suppress(_sqlite3_mod.OperationalError):
                         conn.execute("ALTER TABLE lesson_transitions ADD COLUMN tenant_id TEXT")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_core.py` around lines 769 - 786, The code duplicates imports for
contextlib and sqlite3 as _ctx_mod2/_sqlite3_mod2 in the block that uses
get_connection and tenant_for; remove the redundant second imports and reuse the
existing aliases (e.g., _ctx_mod and _sqlite3_mod) or import once at the
function scope, then update the with-suppress and exception reference inside the
block that calls conn.execute and INSERT into lesson_transitions (the lines
using _ctx_mod2.suppress and _sqlite3_mod2.OperationalError) to use the
consolidated aliases so there are no duplicate import aliases.

Comment on lines +53 to +98
def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
return any(r[1] == column for r in rows)


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
row = conn.execute(
"SELECT 1 FROM sqlite_master WHERE type='table' AND name = ?", (table,)
).fetchone()
return row is not None


def index_exists(conn: sqlite3.Connection, index: str) -> bool:
row = conn.execute(
"SELECT 1 FROM sqlite_master WHERE type='index' AND name = ?", (index,)
).fetchone()
return row is not None


def add_column_if_missing(
conn: sqlite3.Connection,
table: str,
column: str,
decl: str,
) -> bool:
"""Returns True if column was added, False if already present."""
if not table_exists(conn, table):
return False
if column_exists(conn, table, column):
return False
conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
return True


def create_index_if_missing(
conn: sqlite3.Connection,
index: str,
table: str,
columns: str,
) -> bool:
if not table_exists(conn, table):
return False
if index_exists(conn, index):
return False
conn.execute(f"CREATE INDEX {index} ON {table} ({columns})")
return True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

SQL injection risk in schema introspection utilities.

The column_exists, add_column_if_missing, and create_index_if_missing functions interpolate table, column, and index names directly into SQL strings via f-strings. While these are currently called with hardcoded values from migration scripts, the functions are exported and could be misused.

Consider adding identifier validation or documenting that these must only be called with trusted values.

🛡️ Proposed defensive validation
+import re
+
+_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
+
+def _validate_ident(name: str, kind: str) -> None:
+    if not _IDENT_RE.match(name):
+        raise ValueError(f"Invalid {kind} identifier: {name!r}")
+
 def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
+    _validate_ident(table, "table")
     rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
     return any(r[1] == column for r in rows)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/_runner.py` around lines 53 - 98, The helper
functions interpolate identifiers into SQL and risk injection; add a small
validator (e.g. validate_identifier(name): allow only ASCII letters, digits and
underscores and must not be empty) and call it before any f-string SQL that
embeds a table/column/index (at least in column_exists, add_column_if_missing,
create_index_if_missing and any place building "columns" lists); for the
"columns" parameter validate it's a comma-separated list of valid identifiers
(split and validate each) and raise ValueError if validation fails so only
trusted identifier characters are permitted before using f-strings.

Comment on lines +75 to +140
def main() -> int:
ap = argparse.ArgumentParser()
ap.add_argument("--brain", help="Brain directory or system.db path")
ap.add_argument("--dry-run", action="store_true")
args = ap.parse_args()

db_path = resolve_brain_db(args.brain)
if not db_path.exists():
print(f"ERROR: brain DB not found at {db_path}", file=sys.stderr)
return 2

brain_dir = db_path.parent
tid = read_tenant_id(brain_dir)
if not tid:
print(
f"ERROR: no tenant UUID in {brain_dir}/.tenant_id - run migration 001 first",
file=sys.stderr,
)
return 2

print(f"Brain : {db_path}")
print(f"Tenant : {tid}")
print(f"Mode : {'dry-run' if args.dry_run else 'apply'}")
print()

conn = sqlite3.connect(str(db_path))
conn.execute("PRAGMA busy_timeout=5000")

total = 0
touched: list[tuple[str, int]] = []
try:
for t in CANDIDATE_TABLES:
if not table_exists(conn, t):
continue
if not column_exists(conn, t, "tenant_id"):
continue
cnt = conn.execute(
f"SELECT COUNT(*) FROM {t} WHERE tenant_id IS NULL"
).fetchone()[0]
if not cnt:
continue
if args.dry_run:
touched.append((t, cnt))
total += cnt
continue
cur = conn.execute(
f"UPDATE {t} SET tenant_id = ? WHERE tenant_id IS NULL",
(tid,),
)
if cur.rowcount:
touched.append((t, cur.rowcount))
total += cur.rowcount
if not args.dry_run:
conn.commit()
finally:
conn.close()

if not touched:
print("no NULL tenant_id rows found")
return 0

for t, n in touched:
print(f" {t:32s} {n:>8} rows")
verb = "would backfill" if args.dry_run else "backfilled"
print(f"\n{verb} {total} rows across {len(touched)} tables")
return 0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Replace print() with logging.

The main() function uses print() statements for all output, which violates the coding guidelines for src/gradata/**/*.py.

♻️ Proposed fix (partial)
+import logging
+
+_log = logging.getLogger(__name__)
+
 def main() -> int:
     ...
     if not db_path.exists():
-        print(f"ERROR: brain DB not found at {db_path}", file=sys.stderr)
+        _log.error("brain DB not found at %s", db_path)
         return 2

     brain_dir = db_path.parent
     tid = read_tenant_id(brain_dir)
     if not tid:
-        print(
-            f"ERROR: no tenant UUID in {brain_dir}/.tenant_id - run migration 001 first",
-            file=sys.stderr,
-        )
+        _log.error("no tenant UUID in %s/.tenant_id - run migration 001 first", brain_dir)
         return 2

-    print(f"Brain  : {db_path}")
-    print(f"Tenant : {tid}")
-    print(f"Mode   : {'dry-run' if args.dry_run else 'apply'}")
+    _log.info("Brain  : %s", db_path)
+    _log.info("Tenant : %s", tid)
+    _log.info("Mode   : %s", 'dry-run' if args.dry_run else 'apply')
     ...

As per coding guidelines, src/gradata/**/*.py: "no print() statements (use logging)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/fill_null_tenant.py` around lines 75 - 140, The
main() function uses print() for all messages; replace prints with the module
logger so it follows the project's "no print()" guideline: import logging at
top, create logger = logging.getLogger(__name__), and change error prints (e.g.
the "ERROR: brain DB not found..." and "ERROR: no tenant UUID..." messages when
checking db_path and tid) to logger.error(...), and change informational prints
(Brain, Tenant, Mode, the per-table lines, the "no NULL tenant_id rows found"
message, and the final summary line) to logger.info(...). Keep message text and
formatting (f-strings) unchanged, and do not call logging.basicConfig here;
leave configuration to the application. Target replacements inside main() and
around variables/db operations (db_path, tid, touched, CANDIDATE_TABLES, conn,
cur).

Comment on lines +61 to +65
tid = fpath.read_text(encoding="utf-8").strip()
if _is_valid_uuid(tid):
return tid
# Shouldn't happen; return our generated UUID as last resort.
return new_tid

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Defensive fallback returns unwritten UUID if file read fails.

If fpath.read_text() at line 61 succeeds but _is_valid_uuid() fails (corrupted file), the function returns new_tid which was never persisted. This could cause different processes to use different tenant IDs for the same brain.

Consider logging a warning when falling back to new_tid so corruption is visible.

🛡️ Proposed fix to add warning
+import logging
+_log = logging.getLogger(__name__)
+
     tid = fpath.read_text(encoding="utf-8").strip()
     if _is_valid_uuid(tid):
         return tid
     # Shouldn't happen; return our generated UUID as last resort.
+    _log.warning("Corrupted .tenant_id at %s, using generated UUID", fpath)
     return new_tid
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/tenant_uuid.py` around lines 61 - 65, The function
that reads the tenant id (the code that uses fpath.read_text(), validates with
_is_valid_uuid(tid) and falls back to new_tid) should log a warning before
returning the generated new_tid so corrupted/unwritten tenant files are visible;
update the block that currently returns new_tid to call the module logger (or
processLogger) to warn that the tenant file at fpath contained invalid data and
that a new UUID (new_tid) is being used, and ideally include the invalid tid
value and fpath in the message to aid debugging while leaving the rest of the
fallback behavior unchanged.

Comment on lines +84 to +99
def _main() -> int:
ap = argparse.ArgumentParser(description="Read or create brain tenant UUID")
ap.add_argument("--brain", required=True, help="Path to brain directory")
ap.add_argument("--peek", action="store_true", help="Read only; never create")
args = ap.parse_args()

if args.peek:
tid = read_tenant_id(args.brain)
if tid is None:
print("(no tenant id)")
return 1
print(tid)
return 0

print(get_or_create_tenant_id(args.brain))
return 0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Replace print() with logging in CLI entrypoint.

The _main() function uses print() statements for output, which violates the coding guidelines for src/gradata/**/*.py.

♻️ Proposed fix
+import logging
+
+_log = logging.getLogger(__name__)
+
 def _main() -> int:
     ap = argparse.ArgumentParser(description="Read or create brain tenant UUID")
     ap.add_argument("--brain", required=True, help="Path to brain directory")
     ap.add_argument("--peek", action="store_true", help="Read only; never create")
     args = ap.parse_args()

     if args.peek:
         tid = read_tenant_id(args.brain)
         if tid is None:
-            print("(no tenant id)")
+            _log.info("(no tenant id)")
             return 1
-        print(tid)
+        _log.info(tid)
         return 0

-    print(get_or_create_tenant_id(args.brain))
+    _log.info(get_or_create_tenant_id(args.brain))
     return 0

As per coding guidelines, src/gradata/**/*.py: "no print() statements (use logging)".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _main() -> int:
ap = argparse.ArgumentParser(description="Read or create brain tenant UUID")
ap.add_argument("--brain", required=True, help="Path to brain directory")
ap.add_argument("--peek", action="store_true", help="Read only; never create")
args = ap.parse_args()
if args.peek:
tid = read_tenant_id(args.brain)
if tid is None:
print("(no tenant id)")
return 1
print(tid)
return 0
print(get_or_create_tenant_id(args.brain))
return 0
import logging
_log = logging.getLogger(__name__)
def _main() -> int:
ap = argparse.ArgumentParser(description="Read or create brain tenant UUID")
ap.add_argument("--brain", required=True, help="Path to brain directory")
ap.add_argument("--peek", action="store_true", help="Read only; never create")
args = ap.parse_args()
if args.peek:
tid = read_tenant_id(args.brain)
if tid is None:
_log.info("(no tenant id)")
return 1
_log.info(tid)
return 0
_log.info(get_or_create_tenant_id(args.brain))
return 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_migrations/tenant_uuid.py` around lines 84 - 99, The CLI
entrypoint _main() uses print() calls; replace them with the module logger
(create logger = logging.getLogger(__name__) if missing) and use logger.info()
for normal outputs (the tenant id and get_or_create_tenant_id result) and
logger.warning() or logger.info() for the "(no tenant id)" message before
returning 1; keep return codes the same and do not call print() anywhere in
_main(); reference read_tenant_id and get_or_create_tenant_id when replacing
their printed outputs.

Pre-existing violations were blocking CI. Fixes:
- F401 barrel re-exports: ruff noqa header on self_improvement __init__
- SIM105: try/except/pass -> contextlib.suppress in _file_lock
- SIM103: negated-return inline in rule_enforcement
- RUF022: isort __all__ in rule_engine __init__
- TC001: move gradata._types.Lesson to TYPE_CHECKING block in _models
- UP035/I001/F401 auto-fixes across cli, hooks, enhancements

Runtime imports verified intact via smoke tests.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

- _tenant.py: atomic .tenant_id creation via open(..., 'x') O_EXCL,
  fall back to reading existing UUID if another process wrote first
- _events.py: widen dedup unique index to (tenant_id, ts, type, source)
  so two tenants can't collide on identical event signatures
- rule_enforcement.py: treat GRADATA_MAX_REMINDERS=0 as "disabled" rather
  than emitting an empty rules header
- self_improvement/__init__.py: drop stale ruff noqa now that __all__
  carries the re-exports

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 17, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai please review latest commits. Addressed: atomic tenant file, tenant-scoped event dedup, MAX_REMINDERS=0 disable semantics, unused noqa.

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown

@Gradata Thanks for the heads-up on the addressed items! I'll kick off a fresh review of the latest commits now to verify all four fixes (atomic tenant file, tenant-scoped event dedup, MAX_REMINDERS=0 disable semantics, and the unused noqa) and catch anything new.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown

@Gradata Thanks for the summary of addressed items! Kicking off a fresh review now to verify the four fixes:

  1. Atomic tenant file – race-free .tenant_id write path
  2. Tenant-scoped event dedup – deduplication keyed per-tenant
  3. MAX_REMINDERS=0 disable semantics – zero value fully disables reminders
  4. Unused noqa – lint suppression cleanup

I'll flag anything new or outstanding in the review.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown

@Gradata Thanks for the update — I'll kick off a fresh review of the latest commits now to verify all four addressed items:

  1. Atomic tenant file – race-free .tenant_id write path
  2. Tenant-scoped event dedup – deduplication keyed per-tenant
  3. MAX_REMINDERS=0 disable semantics – zero value fully disables reminders
  4. Unused noqa – lint suppression cleanup

I'll flag anything new or outstanding in the review.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai[bot]
coderabbitai Bot previously requested changes Apr 17, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
src/gradata/_events.py (1)

172-178: ⚠️ Potential issue | 🟠 Major

Scope the fallback event-ID lookup by tenant_id.

The insert path is now tenant-scoped, but the SELECT id fallback is still tenant-agnostic. If two tenants emit the same (ts, type, source), this can resolve the wrong row ID after an INSERT OR IGNORE.

Suggested fix
             else:
                 existing = conn.execute(
-                    "SELECT id FROM events WHERE ts=? AND type=? AND source=?",
-                    (ts, event_type, source),
+                    "SELECT id FROM events WHERE tenant_id=? AND ts=? AND type=? AND source=?",
+                    (_tid, ts, event_type, source),
                 ).fetchone()
                 event["id"] = existing[0] if existing else None

Also applies to: 183-186

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_events.py` around lines 172 - 178, The fallback SELECT that
fetches the event id after the INSERT OR IGNORE is tenant-agnostic; modify the
SELECT used after the INSERT OR IGNORE (the conn.execute(...) that queries
"SELECT id FROM events WHERE ts=? AND type=? AND source=?") to include tenant_id
in its WHERE clause and pass _tid as an additional parameter so the lookup is
scoped to the same tenant; apply the same tenant-scoping change to the second
fallback SELECT occurrence around the later block (the other conn.execute(...)
that selects id by ts/type/source).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/_tenant.py`:
- Around line 50-63: The current logic returns a newly generated tid when a
pre-existing .tenant_id file contains invalid data (caught in the
FileExistsError block) but never persists it, causing inconsistent tenant IDs;
modify the FileExistsError handler in the block that creates/reads fpath so that
when existing = fpath.read_text(...) is invalid (i.e. _is_valid_uuid(existing)
is False) you overwrite the file with the new tid (persist it atomically via
fpath.write_text or by opening for write) and then return tid; if the existing
value is valid, return it as before. Ensure you reference fpath, tid, existing,
and _is_valid_uuid in your change.

In `@src/gradata/cli.py`:
- Around line 35-44: The CLI helper _get_brain currently prefers the --brain-dir
arg over the GRADATA_BRAIN env var, which conflicts with _resolve_brain_root;
update _get_brain so its precedence matches _resolve_brain_root by reading
env_str("GRADATA_BRAIN") first, then falling back to getattr(args, "brain_dir",
None), then Path.cwd(); adjust the logic around the brain_dir variable in
_get_brain (and keep references to Brain, env_str and args.brain_dir intact) so
both helpers target the same brain.

In `@src/gradata/hooks/rule_enforcement.py`:
- Around line 39-48: The helper _env_int currently returns the raw default
directly on both the None and exception paths, which can violate the "clamp to
>= minimum" contract; update both early returns to return max(minimum, default)
(i.e., replace `return default` with `return max(minimum, default)`) so the
default is clamped, keeping the rest of the function and the final max(minimum,
value) check unchanged.

---

Duplicate comments:
In `@src/gradata/_events.py`:
- Around line 172-178: The fallback SELECT that fetches the event id after the
INSERT OR IGNORE is tenant-agnostic; modify the SELECT used after the INSERT OR
IGNORE (the conn.execute(...) that queries "SELECT id FROM events WHERE ts=? AND
type=? AND source=?") to include tenant_id in its WHERE clause and pass _tid as
an additional parameter so the lookup is scoped to the same tenant; apply the
same tenant-scoping change to the second fallback SELECT occurrence around the
later block (the other conn.execute(...) that selects id by ts/type/source).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 40cfd6c9-3cb1-44ca-b84c-c5b62aaa6360

📥 Commits

Reviewing files that changed from the base of the PR and between 716073d and 1b9d5ab.

📒 Files selected for processing (6)
  • src/gradata/_events.py
  • src/gradata/_tenant.py
  • src/gradata/cli.py
  • src/gradata/enhancements/self_improvement/__init__.py
  • src/gradata/hooks/rule_enforcement.py
  • src/gradata/rules/rule_engine/__init__.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/enhancements/self_improvement/__init__.py
  • src/gradata/rules/rule_engine/__init__.py
  • src/gradata/cli.py
  • src/gradata/_events.py
  • src/gradata/_tenant.py
  • src/gradata/hooks/rule_enforcement.py
src/gradata/hooks/**

⚙️ CodeRabbit configuration file

src/gradata/hooks/**: JavaScript hooks for Claude Code integration. Check for: no shell injection (no execSync with user
input), temp files must use per-user subdirectory, HTTP calls must have timeouts, errors must be silent (never block
the tool chain).

Files:

  • src/gradata/hooks/rule_enforcement.py
🪛 Ruff (0.15.10)
src/gradata/_tenant.py

[warning] 30-30: Consider moving this statement to an else block

(TRY300)


[warning] 60-60: Consider moving this statement to an else block

(TRY300)

🔇 Additional comments (4)
src/gradata/rules/rule_engine/__init__.py (1)

15-15: Good compliance update: future annotations import added.

This matches the required type-safety rule for core SDK Python files.

As per coding guidelines, "type safety (from future import annotations required)".

src/gradata/enhancements/self_improvement/__init__.py (1)

1-1: LGTM — documentation clarity improvement.

The module-level comment explicitly clarifies that this is a barrel re-export module, which helps developers understand the file's purpose at a glance.

src/gradata/hooks/rule_enforcement.py (2)

108-118: Domain scope filtering behavior is now correctly conservative.

This change correctly avoids filtering when domain is malformed/non-string or when the file domain is unknown, and only filters on explicit non-empty string mismatch.


127-131: MAX_REMINDERS <= 0 short-circuit is correct and prevents empty injections.

Good fix: this respects operator disable intent and avoids emitting a header-only block.

Comment thread src/gradata/_tenant.py Outdated
Comment thread src/gradata/cli.py Outdated
Comment thread src/gradata/hooks/rule_enforcement.py
…d-only

_cloud_sync.push now lets mark_push errors surface — _mark_push already
no-ops on missing sync_state, so the blanket suppress was hiding real bugs.

_migrations._runner.has_applied no longer creates the migrations table as
a side effect of being called; tooling can probe migration state safely.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

- cli.py: _get_brain precedence now matches _resolve_brain_root exactly
  (GRADATA_BRAIN env > --brain-dir arg > cwd). Both helpers always target
  the same brain.
- _tenant.py: FileExistsError fallback with invalid UUID now overwrites
  the corrupt file with the fresh UUID so subsequent reads stay consistent
  instead of diverging from on-disk state.
- rule_enforcement.py: _env_int clamps the default to minimum on every
  return path (None / parse failure / parsed value).
- _events.py: tenant-scope the fallback SELECT used after INSERT OR IGNORE
  so two tenants sharing (ts, type, source) can't resolve to the wrong id.

Co-Authored-By: Gradata <noreply@gradata.ai>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@Gradata

Gradata commented Apr 17, 2026

Copy link
Copy Markdown
Owner Author

@coderabbitai full review please. Round 2 addresses: cli.py env-first precedence, _tenant.py corrupt-file overwrite, _env_int default clamping, _events.py tenant-scoped fallback SELECT. All ruff + 99 tests green locally.

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown

@Gradata Thanks for the detailed round 2 summary! Kicking off a full re-review now to verify all four addressed items:

  1. cli.py env-first precedenceGRADATA_BRAIN env var takes priority over cwd()
  2. _tenant.py corrupt-file overwrite – stale/invalid .tenant_id is safely replaced
  3. _env_int default clamping – default value is clamped to minimum before use
  4. _events.py tenant-scoped fallback SELECT – dedup/fallback lookup scoped by tenant_id

I'll cover the full diff and flag anything outstanding.


✏️ Learnings added
Learnt from: Gradata
URL: https://github.com/Gradata/gradata/pull/102

Timestamp: 2026-04-17T17:18:07.417Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

✅ Actions performed

Full review triggered.

@Gradata Gradata dismissed stale reviews from coderabbitai[bot] and coderabbitai[bot] April 17, 2026 17:23

All 3 actionable items from this review addressed in f2a224a: cli.py env-first precedence, _tenant.py corrupt-file overwrite, _env_int default clamping, _events.py tenant-scoped fallback SELECT (both occurrences). Dismissing stale review; CR re-review pending.

@Gradata Gradata merged commit 88e9969 into main Apr 17, 2026
1 check passed
@Gradata Gradata deleted the worktree-multi-tenant-cloud-align branch April 17, 2026 17:47
Gradata added a commit that referenced this pull request Apr 19, 2026
Covers: gradata.patterns deprecation (remove in v0.8.0), Alert dedup (#109),
Meta-Harness A-D, multi-tenant SDK (#102), BM25 JIT ranking (#101),
gradata seed/mine CLI, rule_verifier wiring, security hardening, and 67+66
ruff-violation fixes (#100, #103).

Co-Authored-By: Gradata <noreply@gradata.ai>
Gradata added a commit that referenced this pull request Apr 19, 2026
Covers: gradata.patterns deprecation (remove in v0.8.0), Alert dedup (#109),
Meta-Harness A-D, multi-tenant SDK (#102), BM25 JIT ranking (#101),
gradata seed/mine CLI, rule_verifier wiring, security hardening, and 67+66
ruff-violation fixes (#100, #103).

Co-Authored-By: Gradata <noreply@gradata.ai>
Gradata added a commit that referenced this pull request Apr 20, 2026
* docs: v0.6.1 changelog entry covering post-v0.6.0 work through PR #114

Covers: gradata.patterns deprecation (remove in v0.8.0), Alert dedup (#109),
Meta-Harness A-D, multi-tenant SDK (#102), BM25 JIT ranking (#101),
gradata seed/mine CLI, rule_verifier wiring, security hardening, and 67+66
ruff-violation fixes (#100, #103).

Co-Authored-By: Gradata <noreply@gradata.ai>

* docs(changelog): qualify env-centralization claim

CodeRabbit flagged that GRADATA_RULE_VERIFIER is still read directly in
rule_pipeline.py. Weaken the claim rather than block on the migration.

Co-Authored-By: Gradata <noreply@gradata.ai>

---------

Co-authored-by: Gradata <noreply@gradata.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant