fix(cloud/client): push events with watermark + idempotency (Bug 2 SDK side) by Gradata · Pull Request #161 · Gradata/gradata

Gradata · 2026-05-02T09:16:05Z

Pairs with gradata-cloud PR #12 (https://github.com/Gradata/gradata-cloud/pull/12).

Bug

client.sync() POSTed only {brain_id, manifest} — no events, no cursor. Server 200ed but ingested nothing. Cloud Supabase has 1376 events; live brain has 7172 (5.2x gap).

Fix

Read events.jsonl, filter by last_sync_at watermark, batch 500/req
Advance cursor on 200, retry with smaller batch on 413
Sync state at <BRAIN_DIR>/.gradata-sync-state.json (events.jsonl untouched)

Tests

9/9 new tests pass in tests/test_cloud_client_sync.py.

Audit trail

Diagnosed in /tmp/audit-bug2-watermark.md. Council voted D-via-telemetry but audit-gate blocked it — 3 cloud routes read raw events directly. Pivoted to council P3 (Skeptic) plan.

Local SQLite and cloud Supabase schemas diverged (wide `tenant_id` + `data_json` vs narrow `brain_id` + `data` jsonb, plus table rename `correction_patterns` -> `corrections`). Added `_transform_row` per-table mapper with deterministic uuid5 ids so repeat pushes upsert cleanly. `_scrub` strips NUL bytes and lone UTF-16 surrogates that Postgres JSONB rejects. `_post` dedupes within each batch, honors `_TABLE_REMAP`, and chunks large pushes to avoid PostgREST's opaque "Empty or invalid json" body-limit errors. `GRADATA_SUPABASE_URL` / `GRADATA_SUPABASE_SERVICE_KEY` now work as aliases so one .env serves both backend and SDK. Co-Authored-By: Gradata <noreply@gradata.ai>

…provider synth Phase 1 of the learning-pipeline revamp. Rule graduation now flows through the canonical _graduation.graduate() path (strict > for INSTINCT->PATTERN, >= for PATTERN->RULE) instead of the inline duplicate in rule_pipeline. Injection hook reads a persistent brain_prompt.md gated by an AUTO-GENERATED header, regenerated only at session_close after the pipeline fires. LLM synthesis gets a two-provider path: anthropic SDK (ANTHROPIC_API_KEY) with claude CLI fallback (Max-plan OAuth) so users without an exportable key still get synthesis. Meta-rule deterministic fallback now warns loudly instead of silently discarding. Drops five env-flag gates in favour of file-based signals. Co-Authored-By: Gradata <noreply@gradata.ai>

Adds --cloud / --no-cloud flags to the doctor CLI command and the underlying diagnose() function. Flips the default cloud endpoint to api.gradata.ai/api/v1. Covers new behaviour with test_doctor_cloud.py (all passing). Co-Authored-By: Gradata <noreply@gradata.ai>

Regex coverage was brittle to shorthand: real corrections like "Why r you not asking" and "Why flag.. we dont skip" slipped the \bwhy (did|would|are) you\b pattern and never became IMPLICIT_FEEDBACK events. That silently breaks Gradata's core promise ("learn from any correction"). Adds: - negation: dont/cant/shouldnt (no-apostrophe variants), never - reminder: "again" marker, "dont forget" - challenge: "why r u", "why not/r/are/is/does", "why word..", "how come", "you missed/forgot/failed/didnt" All 8 target phrases now detect. 25 existing implicit-feedback tests remain green. Co-Authored-By: Gradata <noreply@gradata.ai>

14 new tests pinning the regex expansion from 5a6da45. Covers real corrections observed this session ("Why r you not asking council", "Why flag.. we don't skip we do work") plus shorthand cases (dont / cant / again / you missed / how come). Dual-signal cases assert both types detect. Full suite: 37 passed, 1 pre-existing skip. Co-Authored-By: Gradata <noreply@gradata.ai>

Five post-launch metrics with precise definitions (activation, D7 retention, time-to-first-graduation, free->Pro conversion, correction-rate decay). Numeric triggers: pivot <20% activation + flat decay at D30; kill <100 installs at D60; scale >1K installs + >=5% conversion at D90. Monday 30-min retro agenda. Source: Card 8 of the pre-launch gap analysis. Co-Authored-By: Gradata <noreply@gradata.ai>

The source-provenance docstring referenced "cloud-side LLM synthesis" which is stale since the graduation-cloud-gate was removed. Synthesis runs on the user's machine via rule_synthesizer.py's two-provider path (Anthropic SDK with user's key, or Claude Code Max CLI OAuth). Co-Authored-By: Gradata <noreply@gradata.ai>

Graduation and meta-rule LLM synthesis run entirely locally as of a few sessions ago (rule_synthesizer.py uses user's own Anthropic key or Claude Code Max CLI OAuth). The Pro-tier inclusion list incorrectly still claimed "cloud runs better graduation engine" and implied a cloud-enhanced sqlite-vec path. Rewrite the inclusion list + philosophy paragraph to match reality: free is functionally complete; Pro is visualization, history, export, and the future community corpus. NOTE: this file is listed in .gitignore per the earlier "untrack private files" cleanup. Force-added at request. Co-Authored-By: Gradata <noreply@gradata.ai>

Test was checking the pre-transform local key name. _cloud_sync._transform_row correctly emits brain_id (cloud schema) from tenant_id (local schema); the assertion was stale. Co-Authored-By: Gradata <noreply@gradata.ai>

Previously nothing wrote to lesson_applications — the table existed (onboard.py), was size-checked (_validator.py), and synced to cloud (_cloud_sync.py), but no code ever inserted a row. The compound-quality story had no evidence: rules claimed to fire with no receipt. Now: - inject_brain_rules writes one PENDING row per injected rule (cluster members included), storing {category, description, task} in context so session_close can attribute outcomes back to specific rules. - session_close resolves PENDING rows at end-of-waterfall: REJECTED if any CORRECTION/IMPLICIT_FEEDBACK/RULE_FAILURE in the session shares the lesson's category (or description substring). CONFIRMED otherwise (rule survived the session). Both paths are best-effort — DB missing, schema drift, or IO errors degrade silently rather than blocking injection or session close. Unblocks the Card 6 MVP day-14 metric: "did a graduated rule actually fire and survive?" — the answer now has a row-level audit trail. Co-Authored-By: Gradata <noreply@gradata.ai>

Sweeps the remaining docs that still claimed cloud gated any part of the learning loop. Actual architecture (as of the graduation-local pivot): Local SDK owns: correction capture, graduation, meta-rule clustering AND LLM-synthesis (via user's Anthropic key or Claude Code Max OAuth), rule-to-hook promotion, manifest computation. Cloud owns: dashboard/visualization, cross-device sync, team brains, managed backups, future opt-in corpus donation. Files touched: - docs/cloud/overview.md — capability matrix, architecture diagram, use-when guidance. - docs/architecture/cloud-monolith-v2.md — cloud-side workload framing. - docs/architecture/multi-tenant-future-proofing.md — proprietary boundary, verification flow. - docs/concepts/meta-rules.md — synthesis is local, not cloud-gated. - docs/cloud/dashboard.md — dashboard visualizes local output, does not re-synthesize. README.md was already accurate; no changes there. Co-Authored-By: Gradata <noreply@gradata.ai>

Silent-failure-hunter CRITICAL-1: - inject_brain_rules: wrap lesson_applications connection in try/finally and escalate OperationalError to warning (missing-table surfaces). Silent-failure-hunter CRITICAL-2: - _cloud_sync.push: per-row try/except on _transform_row so one bad row no longer propagates and kills the whole push batch. Leak scan blockers: - Delete docs/pre-launch-plan.md and docs/gradata-marketing-strategy.md from the public repo; add both to .gitignore. These contain kill triggers, pricing, and PII that belong in the private brain vault only. Code-reviewer BLOCKER-3: - _doctor._check_vector_store returns status="ok" with FTS5 detail in the detail field, restoring the documented status vocabulary ({ok, warn, fail, skip, missing, error}). Test-coverage gaps: - Add tests/test_rule_synthesizer.py — both providers absent, empty input, cache hit, CLI fallback on SDK raise, malformed output. - Add IMPLICIT_FEEDBACK → REJECTED integration test to test_lesson_applications.py. Verification: full suite 3802 pass, 22 skip, 2 xfailed.

Gradata is fully local-first now. Cloud-gate stubs and "requires cloud" skip markers were legacy artifacts from an earlier architecture where discovery/synthesis lived server-side. This commit finishes the port: - meta_rules.discover_meta_rules + merge_into_meta run locally: category grouping + greedy semantic-similarity clustering, zombie filter on RULE-state lessons below 0.90, decay after 20 sessions, count/(count+3) confidence smoothing. - Drop @_requires_cloud markers from test_bug_fixes, test_llm_synthesizer, test_meta_rule_generalization, test_multi_brain_simulation, test_pipeline_e2e. These tests now exercise the local impl directly. - Retire the api_key-kwarg-on-merge_into_meta path (session-close rule_synthesizer drives LLM distillation now). - Update fixtures to realistic prose so they survive the noise filter that rejects "cut:/added:" edit-distance summaries. - Bump test_meta_rules confidence assertion to the smoothed formula. - Add docs/LEGACY_CLEANUP.md tracking the remaining cloud-gate vestiges (deprecated adapter shims, cloud docs, stale module docstrings). Suite: 3809 passed, 14 skipped, 2 xfailed. Co-Authored-By: Gradata <noreply@gradata.ai>

…xtures discover_meta_rules is implemented now (local-first). The if not metas: pytest.skip('discover_meta_rules not yet implemented') guards were vestiges from the cloud-only era — convert to real asserts. Also bump 0.88-confidence RULE-state fixtures to 0.90 so they survive the zombie filter (RULE at <0.90 is treated as a decayed rule). Suite: 3813 passed, 10 skipped, 2 xfailed. Remaining skips are all legit: - test_file_lock.py (2): Windows vs POSIX platform gates - test_integration_workflow.py (5): require ANTHROPIC/OPENAI keys, cost money - test_mem0_adapter.py::test_real_mem0_roundtrip: requires MEM0_API_KEY - test_meta_rules.py::test_with_real_data: requires GRADATA_LESSONS_PATH env xfails (2) are tracked for v0.7 reconciliation in test docstring. Co-Authored-By: Gradata <noreply@gradata.ai>

Found while clearing remaining skipped/xfailed tests: Bug: agent_graduation._update_lesson_confidence had confidence = max(0.0, confidence - MISFIRE_PENALTY) but MISFIRE_PENALTY = -0.15 (negative). Subtracting a negative added confidence on rejection. Test test_rejection_decreases_confidence was xfail'd with 'API drift, reconcile in v0.7' — it was a real bug. Fix: align with canonical _confidence.py usage (confidence + MISFIRE_PENALTY). Other cleanups in the same pass: - test_agent_graduation: drop both xfail markers. test_lesson_graduates_to_pattern was also wrong on its own terms — with ACCEPTANCE_BONUS=0.20 the lesson graduates straight to RULE (stronger than PATTERN). Accept either state. - test_integration_workflow: delete stale module-level skipif guarding 5 tests behind ANTHROPIC/OPENAI keys they never actually use. They only exercise local brain.correct/convergence/efficiency — no network. - test_mem0_adapter: delete test_real_mem0_roundtrip (live-API smoke test already covered by the 20+ fake-client tests in the same file). - test_meta_rules: delete test_with_real_data — dev-time exploration script with zero asserts, requiring GRADATA_LESSONS_PATH env var. Suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed. Remaining 3 skips are test_file_lock.py POSIX paths that require fcntl, which does not exist on Windows. Complementary Windows paths skip on Linux — running on each platform covers all 4. Cannot be eliminated. From 22 skipped + 2 xfailed to 3 skipped + 0 xfailed. Co-Authored-By: Gradata <noreply@gradata.ai>

…ten stale notes Co-Authored-By: Gradata <noreply@gradata.ai>

…ate refresh - agent_graduation: add _extract_output() to handle all Claude Code PostToolUse payload key variants (tool_response/tool_output/tool_result/output/response) so plan-mode agents no longer silently drop output - session_close: add _load_soul_mandatories() (VOICE rules from soul.md injected into brain_prompt.md) and _refresh_loop_state() (regenerates loop-state.md on session close with live DB + lesson counts); raise Stop hook timeout to 90 s - _events: add _redact_payload() (recursive email PII redaction) wired into emit() before any write; raw side-log to events.raw.jsonl (best-effort); redactor failure aborts write (fail closed) Co-Authored-By: Gradata <noreply@gradata.ai>

…e watermarks - _ulid.py: minimal stdlib ULID generator (no external dep); ulid_from_iso() preserves timestamp sort order during historical backfill - device_uuid.py: atomic read-or-create of per-brain dev_<hex> device id; race-safe via O_EXCL temp file + os.replace - 002_add_event_identity: adds event_id/device_id/content_hash/correction_chain_id/ origin_agent columns + indexes to events table; chunked 10k-row backfill that is idempotent and resumes on restart - 003_add_sync_state: creates sync_state table if missing and adds device_id/ last_push_event_id/last_pull_cursor/tenant_id watermark columns + composite indexes - tests: 44 tests covering all migration paths, chunked backfill, idempotency, PII redaction (email), loop-state generation, and session_close functions Co-Authored-By: Gradata <noreply@gradata.ai>

…ts DB Reads ~/.claude/projects/<project-hash>/*.jsonl count as the session number — the actual Anthropic session log — rather than MAX(session) from the Gradata events table. The two diverged (314 vs 367). Falls back to the events DB if the project dir can't be located. Co-Authored-By: Gradata <noreply@gradata.ai>

Previous fix only counted the active project dir (314). Global sum across all project dirs gives 659, matching the actual Anthropic session log total. Falls back to events DB if projects dir missing. Co-Authored-By: Gradata <noreply@gradata.ai>

…oop-state.md (367) Session number was read from loop-state.md (Gradata events DB counter). Now counts .jsonl files across all ~/.claude/projects/ dirs — the real Claude Code session total, same logic as status_line.py. Co-Authored-By: Gradata <noreply@gradata.ai>

Every silent except Exception: pass in the core library layers now emits a _log.debug() so failures surface under GRADATA_LOG=debug without breaking the best-effort semantics. Files touched: brain.py (telemetry guard), context_wrapper.py (apply_brain_rules / context_for fallbacks), _brain_manifest.py + _context_compile.py (added module loggers), _context_packet.py (12 data-loading guards), _manifest_metrics.py (7 DB query guards), _doctor.py (HTTP body read guard + contextlib import), _mine_transcripts.py (SIM108 ternary), hooks/session_close.py (4 x SIM105 OSError guards converted to contextlib.suppress). Co-Authored-By: Gradata <noreply@gradata.ai>

ruff check src/ --fix resolved 8 auto-fixable violations (E, F, I rules). ruff format src/ reformatted 163 files to enforce consistent style. Zero errors remain; 13 pre-existing warnings (optional cloud/framework imports, lazy __all__ patterns) are unchanged. Co-Authored-By: Gradata <noreply@gradata.ai>

Two tests expected s0/s42 but got s659 because _claude_session_count() was walking the real ~/.claude/projects/. Add fake_home fixture so the function returns None and falls back to the events DB as intended. Co-Authored-By: Gradata <noreply@gradata.ai>

…eshold New Stop hook writes a structured handoff to brain/sessions/handoff-{ts}.md when context usage exceeds GRADATA_CTX_THRESHOLD (default 65%). inject_brain_rules surfaces a <watchdog-alert> block at next session start so the LLM knows to review the handoff and run /compact or /clear. Also: bracket_confidence() in session_close for cache-key stability; remove MAX_RULES render cap from inject_brain_rules (overshoot logic was masking gaps); 13 new tests in test_ctx_watchdog, tests in test_rule_synthesizer updated. Co-Authored-By: Gradata <noreply@gradata.ai>

…ript store + retroactive sweep P1: call_provider() dispatch in rule_synthesizer.py routes by model prefix (claude-* → Anthropic, gpt-*/o1/o3 → OpenAI, gemini-* → Google, http → generic). session_close._refresh_brain_prompt now uses call_provider instead of inline SDK. P2: _bracket_confidence() buckets FSRS floats into 3 stable bands (low/mid/high) so per-tick confidence changes no longer bust the synthesis cache. P3: New _transcript.py (log_turn, load_turns, cleanup_ttl) and _transcript_providers.py (ProviderTranscriptSource + GradataTranscriptSource) form the transcript store layer. _retroactive_sweep() in the waterfall runs implicit_feedback patterns across all session turns (gated on GRADATA_TRANSCRIPT=1). OpenAI, LangChain, CrewAI middleware adapters gain session_id + log_turn() calls. 21 new tests in test_transcript.py. Co-Authored-By: Gradata <noreply@gradata.ai>

…only The global Path.is_file patch in _run_main() caused inject_brain_rules to also read a fake pending_handoff.txt and append a <watchdog-alert> block. Test now extracts content between <brain-rules>...</brain-rules> before counting lines, making it immune to any outer blocks appended to the result. Co-Authored-By: Gradata <noreply@gradata.ai>

- pre_compact.py rewritten: when auto-compact fires with a pending handoff, replaces the compact summary verbatim with handoff content so no lossy LLM summarization occurs. Manual compact falls back to snapshot. Corrects field name from "type" → "trigger" (keeps legacy fallback). - inject_brain_rules._build_watchdog_block() extracted from inline main(): Phase 1 (pre-/clear): consumes pending_handoff.txt, stages content to post_clear_handoff.txt, injects <watchdog-alert> with run-/clear prompt. Phase 2 (post-/clear): consumes post_clear_handoff.txt, injects <session-handoff> into fresh session. Phase 2 takes priority if both exist. - implicit_feedback: return None instead of signal name string to reduce UserPromptSubmit noise. - tests/test_pre_compact.py: 9 tests covering both trigger paths. - tests/test_inject_watchdog_phases.py: 8 tests covering both phases. Co-Authored-By: Gradata <noreply@gradata.ai>

graph_first_check.py (PreToolUse, Glob|Grep): blocks exploratory code searches until the session flag is set. Returns a block decision with the exact ToolSearch call needed to unblock. graph_session_track.py (PostToolUse, ToolSearch): writes a per-session flag file when a ToolSearch query contains "code-review-graph", clearing the block for the rest of the session. inject_brain_rules.py: appends <code-graph-tools> directive to every SessionStart injection, with the mandatory ToolSearch query string. Both hooks registered in ~/.claude/settings.json. Bypass via GRADATA_GRAPH_CHECK=0. 18 tests, smoke-tested end-to-end. Co-Authored-By: Gradata <noreply@gradata.ai>

…tignore cleanup - test_hooks_intelligence.py: implicit_feedback tests now assert result is None and verify IMPLICIT_FEEDBACK event via mock_emit (hook emits, doesn't return) - session_close.py: reorder imports alphabetically (isort) - .gitignore: add graphify temp files, run.log patterns, and /.archive/ personal Claude Code config backups so they never accidentally land in commits Co-Authored-By: Gradata <noreply@gradata.ai>

… migration reference - Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py: move legacy Streamlit dashboard per Phase 4 deprecation plan (gradata.ai web dashboard now covers all panels — /rules, /corrections, /self-healing, /observability) - Gradata/migrations/supabase/: reference copies of cloud migrations 014-016 applied to prod 2026-04-24 (corrections unique, events unique, brains.last_used_at) - Gradata/docs/specs/cloud-sync-and-pricing.md: DRAFT v1 sync architecture + pricing tier spec Co-Authored-By: Gradata <noreply@gradata.ai>

Stale file created by a subagent Bash redirect. Grouped with the existing Windows cmd.exe stdout misparse artifact entries. Co-Authored-By: Gradata <noreply@gradata.ai>

Co-Authored-By: Gradata <noreply@gradata.ai>

- CHANGELOG.md: add [Unreleased] section covering 18 commits since 2026-04-23 (cloud sync, hooks hardening, Supabase migrations, Streamlit archival, statusline session-count source, implicit_feedback emit-only contract) - migrations/supabase/014,015: wrap constraint adds in DO blocks that check pg_constraint first, making re-runs safe on any DB (prod already had inline UNIQUE _key variants from CREATE TABLE; these migrations added redundant _unique variants, now documented as no-op on existing systems) - migrations/supabase/README.md: document prod constraint state (both _key and _unique present on corrections + events) and drift-cleanup deferred Co-Authored-By: Gradata <noreply@gradata.ai>

Critic audit flagged a silent-drop path: when resolve_brain_dir() returns None (fresh install, CI env, unconfigured brain) the hook detected signals but skipped emit() with no log — every correction became invisible. - hooks/implicit_feedback.py: add debug log in the else branch recording how many signals were detected and of which types, so operators running `GRADATA_LOG_LEVEL=DEBUG` see the breadcrumb. - tests/test_implicit_feedback.py: add TestMainNoBrainDir covering the main() path (previously only _detect_signals was tested) — verifies the debug log fires on detected signals, stays quiet on no-signal input, and short messages don't crash. Co-Authored-By: Gradata <noreply@gradata.ai>

Watermark stalls from 23505 unique-violations were invisible unless a caller grepped logs: _post() logged everything at WARNING. Now HTTP 409 and any "23505" body are logged at ERROR with a body snippet, and the last error is persisted to brain_dir/cloud_push_error.json so 'gradata doctor' can surface it ('fail' for constraint violations, 'warn' for other non-2xx). Successful pushes clear the file. _post() signature is now (accepted, error_info|None); call sites and the three existing tests patching _post are updated. A _coerce_post_result shim tolerates legacy int returns from any external patches. Closes T17 from the overnight backlog (critic finding cycle-2 #1).

Addresses three cycle-3 council findings on commit 492c3dd: 1. Non-atomic write (critic #1, high-severity race). `_record_push_error` now writes to `<name>.tmp` then `os.replace`s into the target. Concurrent readers (doctor + daemon + MCP server) can no longer observe a truncated file that would mask a constraint violation as "error file unreadable". 2. PII leak in persisted error (critic #2). PostgREST 23505 bodies echo conflicting row values in `details`/`hint` fields, and `gradata doctor` prints the file verbatim. New `_scrub_error_body` parses the body as JSON and keeps only `code` + the first 120 chars of `message` (enough for the constraint name). Non-JSON bodies reduce to a length marker. Log messages use the scrubbed form too. 3. Removed the `_coerce_post_result` shim (verifier + critic). Zero tests exercised the bare-int branch it guarded; callers now destructure `_post` returns directly. Tests: +2 (`test_post_error_body_scrubs_row_values`, `test_scrub_error_body_handles_non_json`), 28/28 in the cloud test files pass, 3944 passed / 3 skipped full suite. Ruff + pyright clean. Co-Authored-By: Gradata <noreply@gradata.ai>

When doctor reports on cloud_push_error.json, the detail string now names the brain directory it checked. In multi-brain deployments, push() and doctor() can resolve different brain_dirs silently — surfacing the path lets users spot the divergence instead of chasing phantom "ok" reports. Cycle-3 critic finding #3. Co-Authored-By: Gradata <noreply@gradata.ai>

Co-Authored-By: Gradata <noreply@gradata.ai>

…metry Three bugs kept last_sync_at frozen: - cloud/client.py POSTed /brains/sync (path doesn't exist) -> /sync - cloud/sync.py POSTed /v1/telemetry/metrics -> /api/v1/telemetry/metrics - Stop hook never fired cloud sync because Claude Code doesn't call brain.end_session(). Added cloud_sync_tick() helper in _core.py and new _run_cloud_sync step in session_close.py waterfall. Also elevated silent DEBUG failures to WARNING with HTTP status + exc_info so the next failure mode surfaces in run.log. 3945 tests pass. Co-Authored-By: Gradata <noreply@gradata.ai>

New CLI: gradata skill export <name> [--output-dir DIR] [--description STR] [--category CAT] [--no-meta] The bet: Claude Skills' "gotchas" section is exactly what graduated RULE-tier lessons are -- but generated from real corrections instead of hand-written. This turns a brain into a portable, shippable Skill folder with valid YAML frontmatter, category-grouped gotchas, and (when available) injectable meta-principles. - new module enhancements/skill_export.py reuses _parse_rules from rule_export so the RULE-only filter and [hooked] marker stripping stay consistent across exporters - auto-generated frontmatter description lists rule categories with defensive 900-char clip (Anthropic 1024 ceiling) - name slugified for safe folder name + frontmatter alignment - description quote-escapes preserve YAML validity - meta-rule loader degrades gracefully on missing system.db / table 24 new tests; full suite 3969 pass (+24, 0 regressions). Unblocks M4 items 7 and 9 (self-dev Skill, composition Skill) per plans/swift-toasting-origami.md. Co-Authored-By: Gradata <noreply@gradata.ai>

…ug 2) Pairs with gradata-cloud PR #12. Was Bug 2 from /tmp/audit-bug2-watermark.md. - client.sync() now reads events.jsonl, filters by last_sync_at watermark, batches 500 at a time, advances cursor on 200, retries with smaller batch on 413. - Sync state at <BRAIN_DIR>/.gradata-sync-state.json (separate from events.jsonl which stays append-only and untouched). - 9/9 new tests pass in tests/test_cloud_client_sync.py. Council perspective P3 (Skeptic) had this take after audit-gate blocked the aggregate-only path — 3 cloud routes (analytics.py, activity.py, corrections.py) read raw events directly, so telemetry-only would have flatlined them.

greptile-apps · 2026-05-02T09:16:10Z

Too many files changed for review. (243 files found, 100 file limit)

coderabbitai · 2026-05-02T09:16:22Z

📝 Walkthrough

Summary

Cloud Sync & Event Handling (Core Feature)

Fixed critical bug: client.sync() now properly reads events.jsonl, filters by watermark (last_sync_at), batches events (500 per request), and advances cursor on successful POST
Robust batching: Handles HTTP 413 errors by halving batch size and retrying; persists sync state to .gradata-sync-state.json with watermark tracking
Event schema: Added deterministic event formatting with SHA-256-derived IDs and in-batch deduplication
Error handling: Constraint violations and network failures logged and persisted to cloud_push_error.json; non-fatal failures don't block retries

Database & Persistence

Event identity migration: Added event_id, device_id, content_hash columns with deterministic ULID generation from timestamps
Sync state table: New sync_state table with per-device watermarks (last_push_event_id, last_pull_cursor)
Schema constraints: Unique constraints on corrections and events tables (idempotent migrations)

Breaking Changes

CloudClient.sync(): Return type changed from dict → int (ingested event count); now requires explicit batch_size parameter (default 500)
brain_end_session(): Added skip_meta_rules: bool = False parameter

New Public APIs

Config helpers: get_config_dir(), get_config_file(name: str)
Event emissions: emit_gate_result(...), emit_gate_override(...)
Cloud telemetry: cloud_sync_tick(brain_dir, session_number)
Skill export: export_skill(...), write_skill(...) for exporting graduated rules as Claude Skills
Transcript logging: log_turn(...), load_turns(...), cleanup_ttl(...) for session conversation logging
Device/tenant helpers: get_or_create_device_id(), ULID generation (new_ulid(), ulid_from_iso())
Diagnostics: diagnose() extended with include_cloud, cloud_only flags for cloud-specific probes
Rule synthesis: New synthesize_rules_block() module for LLM-powered rule consolidation with caching

Security & Data Quality

PII redaction: Email addresses redacted from event payloads before persistence (dual-write: canonical redacted + best-effort raw backup)
HTTPS enforcement: GenericHTTPProvider now guards configured base URL with HTTPS validation
Deprecations: Streamlit dashboard deprecated (archived); local meta-rule synthesis now deterministic (no cloud dependency)

Testing & Tools

9 new tests: test_cloud_client_sync.py covering watermark filtering, batching, retry logic, and error scenarios
Backfill script: scripts/backfill_to_cloud.py for one-shot replay of historical events with idempotency

Documentation

Updated architecture docs clarifying local-first sourcing, cloud as mirror/sync target only
Meta-rules, graduation, and synthesis now fully local; cloud holds visualization & optional telemetry only

Walkthrough

This PR implements a comprehensive shift toward a "local-first" architecture: Gradata becomes functionally complete without cloud services, meta-rule synthesis runs locally using the user's LLM provider, cloud syncs graduated rules and events for visualization/backup/cross-device use only, and new migrations/client code support resumable multi-device sync. Additionally, graduated lessons export as Anthropic Claude Skills via a new CLI command, and extensive code refactoring standardizes formatting and improves observability.

Changes

Local-First Architecture Shift

Layer / File(s)	Summary
Documentation Updates `Gradata/docs/architecture/`, `Gradata/docs/cloud/`, `Gradata/docs/concepts/meta-rules.md`	Repositioned cloud as a sync target and visualization layer, not a learning gate. Removed claims that cloud synthesis or shared meta-rules are required. Clarified that local SDK performs graduation, diffing, rule injection, and meta-rule synthesis; cloud mirrors results.
Meta-Rule Localization `Gradata/src/gradata/enhancements/meta_rules.py`	Replaced cloud placeholder discovery with deterministic local clustering. Lessons are grouped by semantic similarity within categories, principles synthesized from highest-confidence examples, and confidence decayed by non-reinforcement. Falls back to deterministic meta-rules when LLM unavailable (with warning).
Meta-Rule Storage Schema `Gradata/src/gradata/enhancements/meta_rules_storage.py`, `Gradata/migrations/supabase/014_corrections_unique.sql`, `015_events_unique.sql`, `016_brains_last_used_at.sql`	Added `applies_when`, `never_when`, `transfer_scope`, `source` columns to local/cloud meta-rule tables. Supabase migrations ensure deduplication and track sync identity.
Local Graduation Integration `Gradata/src/gradata/enhancements/self_improvement/_graduation.py`	New centralized graduation module implementing Beta lower-bound gates, per-lesson state transitions (INSTINCT ↔ PATTERN, PATTERN → RULE), multi-stage adversarial gates, and optional rule-wording refinement via tree-of-thoughts.
Core Wiring Updates `Gradata/src/gradata/_core.py`	`brain_end_session` now skips meta-rule discovery when `skip_meta_rules=True`. Lesson/rule state transitions routed through unified `graduate(...)`. New `cloud_sync_tick()` exports lessons and corrections for a session.

Cloud Sync & Multi-Device Support

Layer / File(s)	Summary
Sync State & Migrations `Gradata/src/gradata/_migrations/002_add_event_identity.py`, `003_add_sync_state.py`, `device_uuid.py`, `_ulid.py`	New migrations add `event_id` (ULID), `device_id`, `content_hash` to events table for deterministic replay. `sync_state` table tracks per-device watermarks (`last_push_event_id`, `last_pull_cursor`). Device IDs generated/persisted atomically in `.device_id`. ULID generator for stable timestamps.
Cloud Sync Client Upgrade `Gradata/src/gradata/cloud/client.py`	Replaced POST `/brains/sync` with `/sync` endpoint. `sync(batch_size)` now reads events.jsonl, batches filtered events, handles HTTP 413 via adaptive batch downsizing, and returns ingested count. Sync state persisted to disk for resumability.
Cloud Payload Transformation `Gradata/src/gradata/_cloud_sync.py`	Added row transformation helpers: deterministic UUID generation, table remapping (`events` → `events`, `meta_rules` → `meta_rules`, etc.), JSON scrubbing (NUL/surrogate removal), session coercion. Post failures now persist to `cloud_push_error.json` for inspection.
Error Persistence & Logging `Gradata/src/gradata/_cloud_sync.py`, `_doctor.py`	`_cloud_sync.py` logs constraint violations and network errors separately; failures recorded in `cloud_push_error.json` and cleared on success. `diagnose()` gains `include_cloud`/`cloud_only` flags to probe cloud connectivity, auth, and push-error state.

Skill Export & CLI Enhancement

Layer / File(s)	Summary
Skill Export Module `Gradata/src/gradata/enhancements/skill_export.py`	New module exports graduated RULE lessons as Anthropic Claude Skill `SKILL.md` with YAML frontmatter (name, description) and "Gotchas" sections grouped by rule category. Optionally includes injectable meta-principles from `system.db`. Caches full export.
CLI Skill Command `Gradata/src/gradata/cli.py`	Added top-level `skill export` subcommand accepting name, output-dir, optional description/category, and meta-principles toggle. Wires into `export_skill()` / `write_skill()` for folder output or stdout.
Doctor Cloud Diagnostics `Gradata/src/gradata/_doctor.py`	`diagnose()` now accepts `include_cloud` and `cloud_only` flags. New cloud probes check credentials, connectivity, auth token validity, dashboard visibility, and persisted sync errors.

Observability & Event Handling Enhancements

Layer / File(s)	Summary
PII Redaction `Gradata/src/gradata/_events.py`	Event payloads now redact email addresses before persisting. Canonical event uses `redacted_data`; raw unredacted event side-logged to `events.raw.jsonl`. New `emit_gate_result()` and `emit_gate_override()` helpers standardize gate-event emission.
Debug Logging `Gradata/src/gradata/_context_packet.py`, `_context_compile.py`, `_brain_manifest.py`, `_manifest_metrics.py`, `context_wrapper.py`	Replaced silent `pass` exception handlers with debug logging via `_log.debug(...)`. Non-fatal failures (context queries, rule retrieval, config parsing) now surface for troubleshooting.
Transcript Logging `Gradata/src/gradata/_transcript.py`, `_transcript_providers.py`	New opt-in transcript logger: `log_turn()` appends JSON-line turns to `sessions/{session_id}/transcript.jsonl` (gated by `GRADATA_TRANSCRIPT=1`). Supports Claude Code native JSONL and Gradata middleware transcripts via source-agnostic interface.

Config & Path Resolution

Layer / File(s)	Summary
Centralized Config Paths `Gradata/src/gradata/_config_paths.py`	New `get_config_dir()` resolves config via `GRADATA_CONFIG_DIR` → `XDG_CONFIG_HOME/gradata` → `~/.gradata`. `get_config_file(name)` convenience wrapper. Platform-aware precedence.
Brain Context DI `Gradata/src/gradata/_paths.py`	Introduced `BrainContext` frozen dataclass to hold resolved per-brain paths. `BrainContext.from_brain_dir()` derives all context paths from a brain directory, supporting dependency injection in DI-aware modules. Kept module-level defaults for backward compatibility.

Extensive Refactoring & Formatting

Layer / File(s)	Summary
Code Formatting ~80+ files across `contrib/patterns/`, `enhancements/`, `src/gradata/*`	Multi-line parameter lists, dictionary/list literals, and function signatures reformatted for readability. No logic changes; error messages, thresholds, and control flow remain identical. Consolidates single-line expressions where verbose multi-line was redundant.
.gitignore Updates `.gitignore`	Added `.graphify_` patterns (artifacts/scripts), `Gradata/run.log`, `*/run.log`, and `Gradata/docs/pre-launch-plan.md` to ignored SDK-internal drafts. Added standalone tokens near Windows cmd section.
Hook Configuration `Gradata/hooks/hooks.json`	Added context-window watchdog Stop hook (`ctx_watchdog`, 10000ms). Updated existing session-close hook description and extended timeout (15000→90000ms) to reflect gated, concurrency-locked, SDK-only throttled graduation sweep.
CHANGELOG & Cleanup Docs `Gradata/CHANGELOG.md`, `Gradata/docs/LEGACY_CLEANUP.md`	Added [Unreleased] section documenting cloud sync, Supabase migrations, dashboard deprecation, implicit feedback contract, and test/governance fixes. Legacy cleanup checklist for retiring cloud-gate concepts and adapter shims.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Brain as Brain (Local)
    participant Migrate as Migrations
    participant CloudSync as CloudClient
    participant Cloud as Cloud (Supabase)
    participant Dashboard as Dashboard

    Note over User,Dashboard: Initial Setup: Multi-Device Sync Initialization
    User->>Brain: Call set_brain_dir (device A, first time)
    Brain->>Migrate: 001_add_tenant_id: backfill tenant_id
    Brain->>Migrate: 002_add_event_identity: generate device_id, event_id, content_hash
    Migrate->>Brain: Store .device_id locally
    Migrate->>Brain: Create sync_state table with watermarks
    Brain->>Brain: Initialize .gradata-sync-state.json

    Note over User,Dashboard: Session: Meta-Rule Synthesis (Local)
    User->>Brain: brain.correct(...)
    Brain->>Brain: _attribute_domain_fires(), build lessons
    Brain->>Brain: brain_end_session(...)
    Brain->>Brain: discover_meta_rules() → cluster by similarity (local)
    Brain->>Brain: merge_into_meta() → deterministic synthesis
    Brain->>Brain: emit LESSON_CHANGE, RULE_CREATED events
    Brain->>Brain: Persist to events.jsonl, system.db

    Note over User,Dashboard: Sync: Push Events + Graduated Rules to Cloud
    User->>CloudSync: Call client.sync()
    CloudSync->>Brain: Read events.jsonl, load last_push_event_id from sync_state
    CloudSync->>CloudSync: Filter pending events, batch (batch_size=500)
    CloudSync->>CloudSync: Transform rows: deterministic UUIDs, table remap, JSON scrub
    CloudSync->>Cloud: POST /sync (batched events)
    Cloud->>Cloud: Upsert events, meta_rules (conflict-free append-only)
    CloudSync->>Brain: Write new watermark to sync_state.json
    CloudSync->>Brain: Return ingested_count

    Note over User,Dashboard: Dashboard View (Async, Read-Only)
    Dashboard->>Cloud: Query events, meta_rules, graduated lessons
    Dashboard->>User: Render charts, learning funnel, meta-rule corpus
    Note over Dashboard: Cloud never re-runs graduation or modifies local state

    Note over User,Dashboard: Second Device: Resume Learning
    User->>Brain: Set brain_dir (device B, existing brain)
    Brain->>Migrate: device_uuid.get_or_create_device_id() → new device_id
    Brain->>CloudSync: Call client.sync()
    CloudSync->>Cloud: Pull new events/rules since last cursor
    Cloud->>CloudSync: Return events for this device
    CloudSync->>Brain: Ingest events, update local system.db
    CloudSync->>Brain: Advance last_pull_cursor
    Brain->>Brain: Continue learning loop (graduation, synthesis local)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

PR #144: Introduces the skill export module and CLI commands, plus cloud_sync_tick() in _core.py, with overlapping cloud sync client and endpoint path changes.
PR #102: Adds multi-tenant tenant_id handling and multi-device sync migrations with overlapping changes to _cloud_sync.py and tenant helper wiring.
PR #133: Touches overlapping files (.gitignore, docs, hooks, _cloud_sync.py, meta_rules.py, cli.py) implementing related hook and local-first wiring.

Suggested labels

feature, refactor

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/sync-push-events

⚔️ Resolve merge conflicts

Resolve merge conflict in branch fix/sync-push-events

One-shot: counts events.jsonl, resets local sync state, calls client.sync() in a loop until cursor catches up. Idempotent — server upserts on (brain_id, event_id). Run after PRs #11/#12/#161 merge to backfill the ~5800 historical events the broken sync silently dropped.

coderabbitai

Actionable comments posted: 55

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6681236a-68ce-4e4c-b663-7010e17c61fe

📥 Commits

Reviewing files that changed from the base of the PR and between 951791e and caa503f.

⛔ Files ignored due to path filters (1)

.claude/hooks/statusline/sprites-statusline.js is excluded by !.claude/**

📒 Files selected for processing (242)

.gitignore
Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py
Gradata/CHANGELOG.md
Gradata/docs/LEGACY_CLEANUP.md
Gradata/docs/architecture/cloud-monolith-v2.md
Gradata/docs/architecture/multi-tenant-future-proofing.md
Gradata/docs/cloud/dashboard.md
Gradata/docs/cloud/overview.md
Gradata/docs/concepts/meta-rules.md
Gradata/docs/specs/cloud-sync-and-pricing.md
Gradata/hooks/hooks.json
Gradata/migrations/supabase/014_corrections_unique.sql
Gradata/migrations/supabase/015_events_unique.sql
Gradata/migrations/supabase/016_brains_last_used_at.sql
Gradata/migrations/supabase/README.md
Gradata/scripts/backfill_to_cloud.py
Gradata/skills/core/session-start/SKILL.md
Gradata/src/gradata/__init__.py
Gradata/src/gradata/_brain_manifest.py
Gradata/src/gradata/_cloud_sync.py
Gradata/src/gradata/_config.py
Gradata/src/gradata/_config_paths.py
Gradata/src/gradata/_context_compile.py
Gradata/src/gradata/_context_packet.py
Gradata/src/gradata/_core.py
Gradata/src/gradata/_data_flow_audit.py
Gradata/src/gradata/_db.py
Gradata/src/gradata/_doctor.py
Gradata/src/gradata/_events.py
Gradata/src/gradata/_export_brain.py
Gradata/src/gradata/_fact_extractor.py
Gradata/src/gradata/_file_lock.py
Gradata/src/gradata/_http.py
Gradata/src/gradata/_installer.py
Gradata/src/gradata/_manifest_helpers.py
Gradata/src/gradata/_manifest_metrics.py
Gradata/src/gradata/_migrations/001_add_tenant_id.py
Gradata/src/gradata/_migrations/002_add_event_identity.py
Gradata/src/gradata/_migrations/003_add_sync_state.py
Gradata/src/gradata/_migrations/_runner.py
Gradata/src/gradata/_migrations/_ulid.py
Gradata/src/gradata/_migrations/device_uuid.py
Gradata/src/gradata/_migrations/fill_null_tenant.py
Gradata/src/gradata/_migrations/tenant_uuid.py
Gradata/src/gradata/_mine_transcripts.py
Gradata/src/gradata/_paths.py
Gradata/src/gradata/_query.py
Gradata/src/gradata/_stats.py
Gradata/src/gradata/_telemetry.py
Gradata/src/gradata/_tenant.py
Gradata/src/gradata/_text_utils.py
Gradata/src/gradata/_transcript.py
Gradata/src/gradata/_transcript_providers.py
Gradata/src/gradata/_types.py
Gradata/src/gradata/_validator.py
Gradata/src/gradata/_workers.py
Gradata/src/gradata/adapters/mem0.py
Gradata/src/gradata/audit.py
Gradata/src/gradata/brain.py
Gradata/src/gradata/brain_inspection.py
Gradata/src/gradata/cli.py
Gradata/src/gradata/cloud/client.py
Gradata/src/gradata/cloud/sync.py
Gradata/src/gradata/context_wrapper.py
Gradata/src/gradata/contrib/enhancements/eval_benchmark.py
Gradata/src/gradata/contrib/enhancements/install_manifest.py
Gradata/src/gradata/contrib/enhancements/quality_gates.py
Gradata/src/gradata/contrib/enhancements/truth_protocol.py
Gradata/src/gradata/contrib/patterns/__init__.py
Gradata/src/gradata/contrib/patterns/agent_modes.py
Gradata/src/gradata/contrib/patterns/context_brackets.py
Gradata/src/gradata/contrib/patterns/evaluator.py
Gradata/src/gradata/contrib/patterns/execute_qualify.py
Gradata/src/gradata/contrib/patterns/guardrails.py
Gradata/src/gradata/contrib/patterns/human_loop.py
Gradata/src/gradata/contrib/patterns/loop_detection.py
Gradata/src/gradata/contrib/patterns/mcp.py
Gradata/src/gradata/contrib/patterns/memory.py
Gradata/src/gradata/contrib/patterns/middleware.py
Gradata/src/gradata/contrib/patterns/orchestrator.py
Gradata/src/gradata/contrib/patterns/parallel.py
Gradata/src/gradata/contrib/patterns/pipeline.py
Gradata/src/gradata/contrib/patterns/q_learning_router.py
Gradata/src/gradata/contrib/patterns/rag.py
Gradata/src/gradata/contrib/patterns/reconciliation.py
Gradata/src/gradata/contrib/patterns/reflection.py
Gradata/src/gradata/contrib/patterns/sub_agents.py
Gradata/src/gradata/contrib/patterns/task_escalation.py
Gradata/src/gradata/contrib/patterns/tools.py
Gradata/src/gradata/contrib/patterns/tree_of_thoughts.py
Gradata/src/gradata/correction_detector.py
Gradata/src/gradata/daemon.py
Gradata/src/gradata/detection/addition_pattern.py
Gradata/src/gradata/enhancements/_sanitize.py
Gradata/src/gradata/enhancements/bandits/collaborative_filter.py
Gradata/src/gradata/enhancements/bandits/contextual_bandit.py
Gradata/src/gradata/enhancements/behavioral_engine.py
Gradata/src/gradata/enhancements/causal_chains.py
Gradata/src/gradata/enhancements/cluster_manager.py
Gradata/src/gradata/enhancements/clustering.py
Gradata/src/gradata/enhancements/contradiction_detector.py
Gradata/src/gradata/enhancements/dedup.py
Gradata/src/gradata/enhancements/diff_engine.py
Gradata/src/gradata/enhancements/edit_classifier.py
Gradata/src/gradata/enhancements/freshness.py
Gradata/src/gradata/enhancements/git_backfill.py
Gradata/src/gradata/enhancements/graduation/agent_graduation.py
Gradata/src/gradata/enhancements/graduation/judgment_decay.py
Gradata/src/gradata/enhancements/graduation/rules_distillation.py
Gradata/src/gradata/enhancements/graduation/scoring.py
Gradata/src/gradata/enhancements/instruction_cache.py
Gradata/src/gradata/enhancements/learning_pipeline.py
Gradata/src/gradata/enhancements/lesson_discriminator.py
Gradata/src/gradata/enhancements/llm_provider.py
Gradata/src/gradata/enhancements/llm_synthesizer.py
Gradata/src/gradata/enhancements/memory_taxonomy.py
Gradata/src/gradata/enhancements/meta_rules.py
Gradata/src/gradata/enhancements/meta_rules_storage.py
Gradata/src/gradata/enhancements/metrics.py
Gradata/src/gradata/enhancements/observation_hooks.py
Gradata/src/gradata/enhancements/pattern_extractor.py
Gradata/src/gradata/enhancements/pattern_integration.py
Gradata/src/gradata/enhancements/pipeline_rewriter.py
Gradata/src/gradata/enhancements/profiling/tone_profile.py
Gradata/src/gradata/enhancements/prompt_synthesizer.py
Gradata/src/gradata/enhancements/reporting.py
Gradata/src/gradata/enhancements/retrieval_fusion.py
Gradata/src/gradata/enhancements/router_warmstart.py
Gradata/src/gradata/enhancements/rule_canary.py
Gradata/src/gradata/enhancements/rule_context_bridge.py
Gradata/src/gradata/enhancements/rule_export.py
Gradata/src/gradata/enhancements/rule_integrity.py
Gradata/src/gradata/enhancements/rule_pipeline.py
Gradata/src/gradata/enhancements/rule_synthesizer.py
Gradata/src/gradata/enhancements/rule_to_hook.py
Gradata/src/gradata/enhancements/rule_verifier.py
Gradata/src/gradata/enhancements/scoring/brain_scores.py
Gradata/src/gradata/enhancements/scoring/calibration.py
Gradata/src/gradata/enhancements/scoring/correction_tracking.py
Gradata/src/gradata/enhancements/scoring/failure_detectors.py
Gradata/src/gradata/enhancements/scoring/gate_calibration.py
Gradata/src/gradata/enhancements/scoring/loop_intelligence.py
Gradata/src/gradata/enhancements/scoring/memory_extraction.py
Gradata/src/gradata/enhancements/scoring/reports.py
Gradata/src/gradata/enhancements/scoring/success_conditions.py
Gradata/src/gradata/enhancements/self_improvement/__init__.py
Gradata/src/gradata/enhancements/self_improvement/_confidence.py
Gradata/src/gradata/enhancements/self_improvement/_graduation.py
Gradata/src/gradata/enhancements/similarity.py
Gradata/src/gradata/enhancements/skill_export.py
Gradata/src/gradata/events_bus.py
Gradata/src/gradata/graph.py
Gradata/src/gradata/hooks/_base.py
Gradata/src/gradata/hooks/_generated_runner_core.py
Gradata/src/gradata/hooks/_installer.py
Gradata/src/gradata/hooks/_profiles.py
Gradata/src/gradata/hooks/agent_graduation.py
Gradata/src/gradata/hooks/agent_precontext.py
Gradata/src/gradata/hooks/auto_correct.py
Gradata/src/gradata/hooks/brain_maintain.py
Gradata/src/gradata/hooks/claude_code.py
Gradata/src/gradata/hooks/client.py
Gradata/src/gradata/hooks/config_protection.py
Gradata/src/gradata/hooks/config_validate.py
Gradata/src/gradata/hooks/context_inject.py
Gradata/src/gradata/hooks/ctx_watchdog.py
Gradata/src/gradata/hooks/daemon.py
Gradata/src/gradata/hooks/dispatch_post.py
Gradata/src/gradata/hooks/duplicate_guard.py
Gradata/src/gradata/hooks/generated_runner.py
Gradata/src/gradata/hooks/generated_runner_post.py
Gradata/src/gradata/hooks/graph_first_check.py
Gradata/src/gradata/hooks/graph_session_track.py
Gradata/src/gradata/hooks/implicit_feedback.py
Gradata/src/gradata/hooks/inject_brain_rules.py
Gradata/src/gradata/hooks/jit_inject.py
Gradata/src/gradata/hooks/pre_compact.py
Gradata/src/gradata/hooks/rule_enforcement.py
Gradata/src/gradata/hooks/secret_scan.py
Gradata/src/gradata/hooks/self_review.py
Gradata/src/gradata/hooks/session_boot.py
Gradata/src/gradata/hooks/session_close.py
Gradata/src/gradata/hooks/session_persist.py
Gradata/src/gradata/hooks/stale_hook_check.py
Gradata/src/gradata/hooks/status_line.py
Gradata/src/gradata/hooks/telemetry_summary.py
Gradata/src/gradata/hooks/tool_failure_emit.py
Gradata/src/gradata/hooks/tool_finding_capture.py
Gradata/src/gradata/inspection.py
Gradata/src/gradata/integrations/anthropic_adapter.py
Gradata/src/gradata/integrations/openai_adapter.py
Gradata/src/gradata/mcp_server.py
Gradata/src/gradata/mcp_tools.py
Gradata/src/gradata/middleware/__init__.py
Gradata/src/gradata/middleware/_core.py
Gradata/src/gradata/middleware/anthropic_adapter.py
Gradata/src/gradata/middleware/crewai_adapter.py
Gradata/src/gradata/middleware/langchain_adapter.py
Gradata/src/gradata/middleware/openai_adapter.py
Gradata/src/gradata/notifications.py
Gradata/src/gradata/onboard.py
Gradata/src/gradata/rules/rule_context.py
Gradata/src/gradata/rules/rule_engine/__init__.py
Gradata/src/gradata/rules/rule_engine/_formatting.py
Gradata/src/gradata/rules/rule_ranker.py
Gradata/src/gradata/rules/scope.py
Gradata/src/gradata/safety.py
Gradata/src/gradata/security/correction_hash.py
Gradata/src/gradata/security/correction_provenance.py
Gradata/src/gradata/security/manifest_signing.py
Gradata/src/gradata/sidecar/watcher.py
Gradata/tests/conftest.py
Gradata/tests/test_agent_graduation.py
Gradata/tests/test_bug_fixes.py
Gradata/tests/test_cloud_client_sync.py
Gradata/tests/test_cloud_row_push.py
Gradata/tests/test_cloud_sync.py
Gradata/tests/test_cluster_injection.py
Gradata/tests/test_ctx_watchdog.py
Gradata/tests/test_doctor_cloud.py
Gradata/tests/test_emit_pii_redaction.py
Gradata/tests/test_graph_enforcement.py
Gradata/tests/test_hooks_intelligence.py
Gradata/tests/test_hooks_learning.py
Gradata/tests/test_implicit_feedback.py
Gradata/tests/test_inject_watchdog_phases.py
Gradata/tests/test_integration_workflow.py
Gradata/tests/test_lesson_applications.py
Gradata/tests/test_llm_synthesizer.py
Gradata/tests/test_mem0_adapter.py
Gradata/tests/test_meta_rule_generalization.py
Gradata/tests/test_meta_rules.py
Gradata/tests/test_migration_002_event_identity.py
Gradata/tests/test_migration_003_sync_state.py
Gradata/tests/test_multi_brain_simulation.py
Gradata/tests/test_pipeline_e2e.py
Gradata/tests/test_pre_compact.py
Gradata/tests/test_rule_pipeline.py
Gradata/tests/test_rule_synthesizer.py
Gradata/tests/test_session_close_loop_state.py
Gradata/tests/test_skill_export.py
Gradata/tests/test_transcript.py

💤 Files with no reviewable changes (1)

Gradata/src/gradata/enhancements/self_improvement/_graduation.py

📜 Review details

🧰 Additional context used

📓 Path-based instructions (1)

Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

Gradata/src/gradata/enhancements/instruction_cache.py
Gradata/src/gradata/_file_lock.py
Gradata/src/gradata/contrib/patterns/memory.py
Gradata/src/gradata/enhancements/clustering.py
Gradata/src/gradata/enhancements/router_warmstart.py
Gradata/src/gradata/_migrations/_runner.py
Gradata/src/gradata/_http.py
Gradata/src/gradata/_migrations/_ulid.py
Gradata/src/gradata/__init__.py
Gradata/src/gradata/_config.py
Gradata/src/gradata/enhancements/rule_export.py
Gradata/src/gradata/_types.py
Gradata/src/gradata/enhancements/pattern_extractor.py
Gradata/src/gradata/_tenant.py
Gradata/src/gradata/contrib/patterns/tools.py
Gradata/src/gradata/enhancements/bandits/contextual_bandit.py
Gradata/src/gradata/events_bus.py
Gradata/src/gradata/_context_compile.py
Gradata/src/gradata/contrib/patterns/evaluator.py
Gradata/src/gradata/contrib/patterns/__init__.py
Gradata/src/gradata/enhancements/scoring/brain_scores.py
Gradata/src/gradata/enhancements/rule_verifier.py
Gradata/src/gradata/enhancements/diff_engine.py
Gradata/src/gradata/_migrations/device_uuid.py
Gradata/src/gradata/_migrations/002_add_event_identity.py
Gradata/src/gradata/_migrations/tenant_uuid.py
Gradata/src/gradata/_db.py
Gradata/src/gradata/_mine_transcripts.py
Gradata/src/gradata/contrib/patterns/sub_agents.py
Gradata/src/gradata/_data_flow_audit.py
Gradata/src/gradata/cloud/sync.py
Gradata/src/gradata/_text_utils.py
Gradata/src/gradata/contrib/patterns/middleware.py
Gradata/src/gradata/contrib/patterns/pipeline.py
Gradata/src/gradata/enhancements/dedup.py
Gradata/src/gradata/enhancements/rule_context_bridge.py
Gradata/src/gradata/contrib/enhancements/truth_protocol.py
Gradata/src/gradata/contrib/patterns/parallel.py
Gradata/src/gradata/enhancements/lesson_discriminator.py
Gradata/src/gradata/enhancements/freshness.py
Gradata/src/gradata/audit.py
Gradata/src/gradata/enhancements/scoring/loop_intelligence.py
Gradata/src/gradata/contrib/patterns/q_learning_router.py
Gradata/src/gradata/enhancements/edit_classifier.py
Gradata/src/gradata/enhancements/pattern_integration.py
Gradata/src/gradata/context_wrapper.py
Gradata/src/gradata/enhancements/contradiction_detector.py
Gradata/src/gradata/_migrations/fill_null_tenant.py
Gradata/src/gradata/enhancements/pipeline_rewriter.py
Gradata/src/gradata/_workers.py
Gradata/src/gradata/enhancements/profiling/tone_profile.py
Gradata/src/gradata/enhancements/rule_canary.py
Gradata/src/gradata/contrib/patterns/tree_of_thoughts.py
Gradata/src/gradata/contrib/patterns/task_escalation.py
Gradata/src/gradata/enhancements/skill_export.py
Gradata/src/gradata/_transcript_providers.py
Gradata/src/gradata/_transcript.py
Gradata/src/gradata/_fact_extractor.py
Gradata/src/gradata/enhancements/scoring/reports.py
Gradata/src/gradata/contrib/patterns/execute_qualify.py
Gradata/src/gradata/_migrations/003_add_sync_state.py
Gradata/src/gradata/contrib/patterns/loop_detection.py
Gradata/src/gradata/brain_inspection.py
Gradata/src/gradata/contrib/patterns/reconciliation.py
Gradata/src/gradata/contrib/enhancements/quality_gates.py
Gradata/src/gradata/adapters/mem0.py
Gradata/src/gradata/enhancements/similarity.py
Gradata/src/gradata/enhancements/graduation/judgment_decay.py
Gradata/src/gradata/enhancements/metrics.py
Gradata/src/gradata/enhancements/graduation/agent_graduation.py
Gradata/src/gradata/enhancements/memory_taxonomy.py
Gradata/src/gradata/enhancements/graduation/rules_distillation.py
Gradata/src/gradata/cloud/client.py
Gradata/src/gradata/contrib/patterns/orchestrator.py
Gradata/src/gradata/enhancements/reporting.py
Gradata/src/gradata/_validator.py
Gradata/src/gradata/enhancements/git_backfill.py
Gradata/src/gradata/_brain_manifest.py
Gradata/src/gradata/enhancements/scoring/gate_calibration.py
Gradata/src/gradata/_migrations/001_add_tenant_id.py
Gradata/src/gradata/enhancements/retrieval_fusion.py
Gradata/src/gradata/enhancements/llm_provider.py
Gradata/src/gradata/_manifest_helpers.py
Gradata/src/gradata/contrib/patterns/context_brackets.py
Gradata/src/gradata/_installer.py
Gradata/src/gradata/enhancements/learning_pipeline.py
Gradata/src/gradata/enhancements/behavioral_engine.py
Gradata/src/gradata/enhancements/scoring/memory_extraction.py
Gradata/src/gradata/enhancements/prompt_synthesizer.py
Gradata/src/gradata/enhancements/scoring/correction_tracking.py
Gradata/src/gradata/_doctor.py
Gradata/src/gradata/contrib/patterns/mcp.py
Gradata/src/gradata/enhancements/self_improvement/_confidence.py
Gradata/src/gradata/contrib/patterns/agent_modes.py
Gradata/src/gradata/_cloud_sync.py
Gradata/src/gradata/enhancements/scoring/failure_detectors.py
Gradata/src/gradata/enhancements/llm_synthesizer.py
Gradata/src/gradata/enhancements/bandits/collaborative_filter.py
Gradata/src/gradata/contrib/patterns/rag.py
Gradata/src/gradata/enhancements/rule_to_hook.py
Gradata/src/gradata/enhancements/scoring/calibration.py
Gradata/src/gradata/_export_brain.py
Gradata/src/gradata/contrib/patterns/human_loop.py
Gradata/src/gradata/enhancements/scoring/success_conditions.py
Gradata/src/gradata/_stats.py
Gradata/src/gradata/detection/addition_pattern.py
Gradata/src/gradata/enhancements/causal_chains.py
Gradata/src/gradata/enhancements/_sanitize.py
Gradata/src/gradata/enhancements/rule_integrity.py
Gradata/src/gradata/correction_detector.py
Gradata/src/gradata/contrib/patterns/reflection.py
Gradata/src/gradata/_telemetry.py
Gradata/src/gradata/contrib/enhancements/install_manifest.py
Gradata/src/gradata/_manifest_metrics.py
Gradata/src/gradata/enhancements/rule_synthesizer.py
Gradata/src/gradata/enhancements/meta_rules.py
Gradata/src/gradata/daemon.py
Gradata/src/gradata/_events.py
Gradata/src/gradata/cli.py
Gradata/src/gradata/contrib/enhancements/eval_benchmark.py
Gradata/src/gradata/graph.py
Gradata/src/gradata/brain.py
Gradata/src/gradata/_query.py
Gradata/src/gradata/_context_packet.py
Gradata/src/gradata/_config_paths.py
Gradata/src/gradata/_paths.py
Gradata/src/gradata/enhancements/observation_hooks.py
Gradata/src/gradata/enhancements/rule_pipeline.py
Gradata/src/gradata/contrib/patterns/guardrails.py
Gradata/src/gradata/enhancements/meta_rules_storage.py
Gradata/src/gradata/enhancements/cluster_manager.py
Gradata/src/gradata/enhancements/self_improvement/__init__.py
Gradata/src/gradata/enhancements/graduation/scoring.py
Gradata/src/gradata/_core.py

🪛 LanguageTool

Gradata/docs/specs/cloud-sync-and-pricing.md

[grammar] ~102-~102: Ensure spelling is correct
Context: ...vior: - Triggered on Stop hook or every 5min when events accumulated. - Pushes since...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~269-~269: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...vent push logged with content_hash. - Every ACL change emits an acl_changed event...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[grammar] ~290-~290: Ensure spelling is correct
Context: ... cadence:** hourly for Personal+, every 15min for Teams+, continuous WAL for Enterpri...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

Gradata/docs/cloud/overview.md

[style] ~3-~3: ‘on top of that’ might be wordy. Consider a shorter alternative.
Context: ...uity, team sharing, and managed backups on top of that local loop. ## What's in the SDK vs th...

(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)

🪛 markdownlint-cli2 (0.22.1)

Gradata/skills/core/session-start/SKILL.md

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Gradata/docs/LEGACY_CLEANUP.md

[warning] 16-16: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 27-27: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 32-32: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 37-37: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 44-44: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

Gradata/docs/concepts/meta-rules.md

[warning] 50-50: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

Gradata/migrations/supabase/README.md

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Gradata/docs/specs/cloud-sync-and-pricing.md

[warning] 35-35: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

[warning] 60-60: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

[warning] 81-81: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 109-109: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Gradata/docs/architecture/multi-tenant-future-proofing.md

[warning] 21-21: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

coderabbitai · 2026-05-02T09:33:08Z

+Run:  streamlit run C:/Users/olive/SpritesWork/brain/scripts/dashboard.py
+"""
+
+import json
+import re
+import sqlite3
+from datetime import datetime
+from pathlib import Path
+
+import pandas as pd
+import plotly.graph_objects as go
+import streamlit as st
+
+# ---------------------------------------------------------------------------
+# Config
+# ---------------------------------------------------------------------------
+BRAIN_DIR = Path("C:/Users/olive/SpritesWork/brain")
+DB_PATH = BRAIN_DIR / "system.db"
+EVENTS_PATH = BRAIN_DIR / "events.jsonl"
+LESSONS_PATH = BRAIN_DIR / "lessons.md"
+PROSPECTS_DIR = BRAIN_DIR / "prospects"
+BRIEF_PATH = BRAIN_DIR / "morning-brief.md"
+TASKS_DIR = Path("C:/Users/olive/.claude/scheduled-tasks")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the user-specific absolute paths from the archived script.

This hardcodes C:/Users/olive/... in both the docstring and runtime config, which leaks a private workstation path into the repo and makes the archive non-portable everywhere else. Parameterize these via env/CLI args or derive them relative to the script.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py` around lines 4 - 26, The file hardcodes user-specific absolute paths (notably the docstring run path and constants BRAIN_DIR, DB_PATH, EVENTS_PATH, LESSONS_PATH, PROSPECTS_DIR, BRIEF_PATH, TASKS_DIR); change these to be derived from environment/CLI inputs or relative locations: replace the literal "C:/Users/olive/..." usage by reading a base path from an environment variable (e.g., BRAIN_DIR_ENV) or a CLI arg (or default to Path.home() / "SpritesWork/brain"), then compute DB_PATH, EVENTS_PATH, LESSONS_PATH, PROSPECTS_DIR, BRIEF_PATH, and TASKS_DIR from that base; also update the docstring run example to show a generic placeholder (e.g., streamlit run path/to/dashboard.py) rather than the absolute user path.

coderabbitai · 2026-05-02T09:33:08Z

+DELETE FROM events a
+USING events b
+WHERE a.brain_id = b.brain_id
+  AND a.type = b.type
+  AND a.created_at = b.created_at
+  AND a.ctid > b.ctid;


⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

This dedupe key is too coarse for raw events.

Using only (brain_id, type, created_at) can collapse legitimate same-type events that happen at the same timestamp, and the DELETE makes that loss irreversible. Given this PR’s move toward explicit event identity/idempotency, the uniqueness boundary should be a real event identifier, not type + timestamp.

Also applies to: 23-37

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/migrations/supabase/015_events_unique.sql` around lines 14 - 19, The DELETE in the migration is using a too-coarse dedupe key (a.brain_id, a.type, a.created_at) which can remove legitimate simultaneous events; update the deduplication to use a true event identifier (for example an event_id or idempotency_key column) instead of type+timestamp—modify the DELETE ... USING query to compare a.event_id = b.event_id (or the appropriate unique identifier column) and only delete duplicates based on that stable identifier, and if such a column does not exist add a non-null unique event identifier to the events table first and rework the dedupe logic; apply the same fix to the analogous block referenced in lines 23-37.

coderabbitai · 2026-05-02T09:33:08Z

+    with open(events_jsonl) as f:
+        for line in f:
+            line = line.strip()
+            if not line:
+                continue
+            try:
+                ev = json.loads(line)
+            except json.JSONDecodeError:
+                continue


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Read events.jsonl as UTF-8 explicitly.

Line 57 uses the platform default encoding. On Windows or other non-UTF-8 locales, one non-ASCII event is enough to crash the backfill before any sync happens.

Suggested fix

- with open(events_jsonl) as f: + with open(events_jsonl, encoding="utf-8", errors="replace") as f:

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

with open(events_jsonl) as f:

for line in f:

line = line.strip()

if not line:

continue

try:

ev = json.loads(line)

except json.JSONDecodeError:

continue

with open(events_jsonl, encoding="utf-8", errors="replace") as f:

for line in f:

line = line.strip()

if not line:

continue

try:

ev = json.loads(line)

except json.JSONDecodeError:

continue

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/scripts/backfill_to_cloud.py` around lines 57 - 65, The file reading loop currently opens events_jsonl with the platform default encoding which can fail on non-UTF-8 systems; update the open call that reads events_jsonl (the "with open(events_jsonl) as f:" line) to explicitly specify encoding='utf-8' (optionally add errors='replace' or 'ignore' if you prefer resilient parsing) so json.loads receives proper UTF-8 text and non-ASCII events do not crash the backfill.

coderabbitai · 2026-05-02T09:33:08Z

+Read `C:/Users/olive/SpritesWork/brain/continuation.md`. If exists, follow its Resume Point, then archive: `python C:/Users/olive/SpritesWork/brain/scripts/continuation.py archive`. If missing, continue.
+
+## Step 2: Load Context (parallel batch)
+
+Fire all at once — no dependencies:
+1. Read `domain/pipeline/startup-brief.md` (pipeline snapshot, handoff section) *(verify path — may be stale)*
+2. Read `C:/Users/olive/SpritesWork/brain/lessons.md` (scan for mistakes to avoid)
+3. Check Google Calendar today + 30 days (demos, calls, meetings)
+4. Read `C:/Users/olive/SpritesWork/brain/loop-state.md` (session number, open items) *(auto-regenerated by session_close hook — always fresh)*
+5. Read `C:/Users/olive/SpritesWork/brain/brain_prompt.md` (soul.md VOICE mandatories + graduated RULE-level lessons)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove machine-specific paths and private/internal references from this skill.

Hardcoding C:/Users/olive/..., SpritesWork, Oliver, and sprites_context.md makes the skill non-portable and leaks private repo/user details into a shipped artifact. Please switch these to runtime placeholders or repo-relative/public paths.

As per coding guidelines, "Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*".

Also applies to: 38-52

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/skills/core/session-start/SKILL.md` around lines 12 - 21, The SKILL.md contains hardcoded, machine-specific paths and private names (e.g., "C:/Users/olive/SpritesWork/...", "SpritesWork", "Oliver", and filenames like continuation.md, lessons.md, loop-state.md, brain_prompt.md and domain/pipeline/startup-brief.md) which must be replaced with portable placeholders or repo-relative references; update the Step 1/Step 2 file references in this document (and the similar occurrences around lines 38–52) to use runtime variables or repo-relative paths (e.g., {{WORKSPACE}}/brain/continuation.md or ./brain/continuation.md) and remove any personal identifiers, ensuring each bullet clearly indicates a configurable placeholder (or public path) and add a short note that these files are expected to be present at runtime rather than hardcoded to a local user folder.

coderabbitai · 2026-05-02T09:33:08Z

+```
+[check] S[N] loaded | [today's calendar or "clear"]
+[tasks] Top 2-3 from loop-state open items
+[alert] Only if something is broken/overdue — otherwise omit
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language to the fenced code block.

This currently trips markdownlint MD040. Use something like ```text here.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/skills/core/session-start/SKILL.md` around lines 32 - 36, The fenced code block in SKILL.md (the block showing "[check] S[N] loaded ..." etc.) lacks a language tag and triggers markdownlint MD040; update that fenced block to start with a language label (e.g., use "```text" instead of "```") so the block is explicitly marked as plain text and the linter error is resolved.

coderabbitai · 2026-05-02T09:33:10Z

            try:
                import json as _json
+
                disp_path.write_text(
-                    _json.dumps(tracker.to_dict(), indent=2), encoding="utf-8",
+                    _json.dumps(tracker.to_dict(), indent=2),
+                    encoding="utf-8",
                )


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Write disposition.json through the atomic JSON helper.

This overwrites the file in place during a best-effort phase. If the process dies after truncation, the next run loses the entire disposition state. Route this through the repo's atomic JSON writer instead.

As per coding guidelines, "Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/rule_pipeline.py` around lines 529 - 535, The code currently calls disp_path.write_text(_json.dumps(tracker.to_dict(), indent=2), ...) which can truncate disposition.json mid-write; change this to use the repository's atomic JSON writer (the atomic JSON helper) to write tracker.to_dict() to disp_path atomically instead of using disp_path.write_text; remove the direct json dump and call the atomic helper (passing the dict and desired indent/encoding) so disposition.json is written via the repo's atomic-write utility.

coderabbitai · 2026-05-02T09:33:10Z

+                    brain.emit(
+                        HOOK_DEMOTED,
+                        source,
+                        {
+                            "slug": slug,
+                            "hook_path": str(target),
+                        },
+                    )


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Include rule_id in HOOK_DEMOTED events.

count_human_reversals() only counts reversal events whose payload contains a matching rule_id, but this emit only writes slug/hook_path. That means a manual demotion never feeds back into the empirical gate, so the same rule can be auto-promoted again immediately after being removed.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/rule_to_hook.py` around lines 837 - 844, The HOOK_DEMOTED emit is missing the rule identifier required by count_human_reversals(), so include the rule's id in the emitted payload (e.g., add "rule_id": rule.id or "rule_id": rule_id depending on the local symbol available) when calling brain.emit(HOOK_DEMOTED, source, {...}); ensure you reference the actual Rule object or local rule_id variable used in this module so manual demotions are counted by count_human_reversals().

coderabbitai · 2026-05-02T09:33:10Z

+    re.compile(
+        r"(?:by|before|on|until)\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday|\d{1,2}[/-]\d{1,2})",
+        re.I,
+    ),


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't treat bare date fragments as action items.

This regex now matches standalone phrases like on Monday, and extract() persists that fragment via match.group(0). A user saying the meeting is on Monday will now create an action_item even though no action was requested, which pollutes prospective memory and can duplicate the temporal fact from the same sentence.

Proposed fix

_ACTION_PATTERNS = [ re.compile(r"(?:need to|should|will|going to|have to|must)\s+(.+?)(?:\.|$)", re.I), re.compile(r"(?:follow up|schedule|send|check|review|prepare|draft)\s+(.+?)(?:\.|$)", re.I), - re.compile( - r"(?:by|before|on|until)\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday|\d{1,2}[/-]\d{1,2})", - re.I, - ), ]

If you still want deadline-aware action items, fold the deadline into the verb-based patterns so the stored fact is the full action, e.g. send the report by Monday, not just by Monday.

Also applies to: 152-165

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/scoring/memory_extraction.py` around lines 87 - 90, The regex that matches "(?:by|before|on|until)\s+(...)" in memory_extraction.py is capturing standalone date fragments (e.g., "on Monday") and extract() stores match.group(0) as an action_item; change the patterns so date/deadline fragments are only captured when attached to a verb/action phrase (e.g., require a verb or imperative before the deadline or fold the deadline into existing verb-based patterns like the verb-driven pattern list used by extract()); specifically, update the loose date-only pattern(s) at the places referenced (the re.compile call and the similar block at lines ~152-165) to either (a) remove the standalone "(?:by|before|on|until) ..." pattern, or (b) require a preceding verb token or phrase (e.g., using a positive lookbehind or adding "\b(?:send|submit|remind|schedule|prepare|complete|...)\b.*?(?:by|before|on|until)\s+..." or merge the deadline part into the verb-based regexes), and ensure extract() continues to use the full matched action phrase rather than a bare date fragment.

coderabbitai · 2026-05-02T09:33:10Z

        # 6. Output not becoming bland (from metrics module)
        try:
            from gradata.enhancements.metrics import compute_metrics
+
            m = compute_metrics(db_path, window)
-            blandness = m.get("blandness_score", 0.0) if isinstance(m, dict) else getattr(m, "blandness_score", 0.0)
+            blandness = (
+                m.get("blandness_score", 0.0)
+                if isinstance(m, dict)
+                else getattr(m, "blandness_score", 0.0)
+            )
            bland_ok = blandness < 0.70
-            conditions.append(ConditionResult(
-                name="output_not_bland",
-                met=bland_ok,
-                current_value=round(blandness, 4),
-                baseline_value=0.70,
-                trend="varied" if bland_ok else "generic",
-                detail=f"Blandness: {blandness:.2f} (threshold: 0.70)",
-            ))
+            conditions.append(
+                ConditionResult(
+                    name="output_not_bland",
+                    met=bland_ok,
+                    current_value=round(blandness, 4),
+                    baseline_value=0.70,
+                    trend="varied" if bland_ok else "generic",
+                    detail=f"Blandness: {blandness:.2f} (threshold: 0.70)",
+                )
+            )
        except Exception:
            pass



⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t silently swallow errors in the blandness (compute_metrics) success-condition path.

The blandness evaluation uses except Exception: pass, which will hide import errors or unexpected metric-shape issues, resulting in “mysteriously” missing/incorrect output_not_bland condition state.

Suggested fix

+import logging +logger = logging.getLogger(__name__) @@ try: from gradata.enhancements.metrics import compute_metrics @@ conditions.append( ConditionResult( name="output_not_bland", met=bland_ok, current_value=round(blandness, 4), baseline_value=0.70, trend="varied" if bland_ok else "generic", detail=f"Blandness: {blandness:.2f} (threshold: 0.70)", ) ) - except Exception: - pass + except Exception: + logger.warning("Failed to compute blandness success condition", exc_info=True)

As per coding guidelines, “Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product”.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/scoring/success_conditions.py` around lines 256 - 279, The try/except around the blandness check silently swallows errors; update the block where you call gradata.enhancements.metrics.compute_metrics and build the ConditionResult (the code that computes blandness, bland_ok, and appends ConditionResult("output_not_bland")) to catch specific errors (e.g., ImportError, Exception as e) rather than a bare except, log a warning including exc_info=True via the module logger or processLogger, and ensure you still append a reasonable ConditionResult when metrics cannot be computed (e.g., met=False or met=None with detail describing the exception) so the pipeline surfaces the failure instead of disappearing the condition.

coderabbitai · 2026-05-02T09:33:10Z

+    try:
+        metas = load_meta_rules(db_path)
+    except Exception:
+        return []


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Log meta-rule load failures instead of silently dropping them.

With include_meta=True, any DB/schema error in load_meta_rules() currently degrades the export to "no meta-principles" with no signal. Please at least log the exception before returning [], so partial exports are diagnosable.

As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/skill_export.py` around lines 168 - 171, The code swallows errors from load_meta_rules() when include_meta=True; change the bare except to catch Exception as e and log the failure before returning an empty list so failures are visible: replace the current except block with something like "except Exception as e: logger.warning('Failed to load meta rules for include_meta export', exc_info=True)" (ensure you use the module logger or import one) and then return [] — reference load_meta_rules, include_meta and the metas assignment to locate the change.

coderabbitai

Review continued from previous batch...

coderabbitai · 2026-05-02T09:34:21Z

 ### 1. Local-first stays the source of truth
-SDK writes to local SQLite + jsonl. Cloud is a **sync target + shared meta-rule source + proprietary scoring service**. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.
+SDK writes to local SQLite + jsonl and runs the full learning loop (graduation, synthesis, rule-to-hook promotion) locally. Cloud is a **sync target + dashboard + future team + future shared-corpus surface** — not a gate on the local loop. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix markdownlint MD022: add blank line after heading

### 1. Local-first stays the source of truth is not followed by a blank line before the bullet list, which triggers MD022. citestatic_analysis_hints

✅ Proposed change

### 1. Local-first stays the source of truth + SDK writes to local SQLite + jsonl and runs the full learning loop (graduation, synthesis, rule-to-hook promotion) locally. Cloud is a **sync target + dashboard + future team + future shared-corpus surface** — not a gate on the local loop. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

### 1. Local-first stays the source of truth

SDK writes to local SQLite + jsonl. Cloud is a **sync target + shared meta-rule source + proprietary scoring service**. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.

SDK writes to local SQLite + jsonl and runs the full learning loop (graduation, synthesis, rule-to-hook promotion) locally. Cloud is a **sync target + dashboard + future team + future shared-corpus surface** — not a gate on the local loop. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.

### 1. Local-first stays the source of truth

SDK writes to local SQLite + jsonl and runs the full learning loop (graduation, synthesis, rule-to-hook promotion) locally. Cloud is a **sync target + dashboard + future team + future shared-corpus surface** — not a gate on the local loop. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 21-21: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/docs/architecture/multi-tenant-future-proofing.md` around lines 21 - 22, The heading "### 1. Local-first stays the source of truth" violates markdownlint MD022 because it is not followed by a blank line; fix this by inserting a single blank line immediately after that heading (i.e., add an empty line between the heading and the following paragraph/bullet list) so the document conforms to MD022 while keeping the existing heading text and subsequent content unchanged.

coderabbitai · 2026-05-02T09:34:21Z

 # Dashboard

-The Gradata Cloud dashboard is a Next.js app at [app.gradata.ai](https://app.gradata.ai). It wraps the same data the local `brain.manifest.json` exposes, plus Cloud-only views for meta-rule synthesis, team management, and the operator console.
+The Gradata Cloud dashboard is a Next.js app at [app.gradata.ai](https://app.gradata.ai). It visualizes the same data the local `brain.manifest.json` exposes, plus Cloud-only views for team management and the operator console. Meta-rule synthesis runs locally in the SDK — the dashboard renders the results, it does not re-run them.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This page still contradicts itself about where meta-rules are synthesized.

Line 3 now says synthesis runs locally, but the Brain detail bullets later still describe meta-rules as “cloud-synthesized.” Please update that downstream copy in the same pass.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/docs/cloud/dashboard.md` at line 3, Update the contradictory wording in the Gradata Cloud dashboard docs: change the downstream "Brain detail" bullets that currently call meta-rules “cloud-synthesized” to match the earlier statement that meta-rule synthesis runs locally in the SDK (e.g., refer to "meta-rules", "brain.manifest.json", and the "Brain detail" bullets in Gradata/docs/cloud/dashboard.md) so all references consistently state that synthesis is performed locally and the dashboard only renders the results.

coderabbitai · 2026-05-02T09:34:21Z

 # Gradata Cloud

-Gradata Cloud is the hosted dashboard and back-end that complements the open-source SDK. The SDK keeps running locally; Cloud adds synchronization, cross-device continuity, team sharing, meta-rule synthesis, and an operator view for engineering teams.
+Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on top of that local loop.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reduce wordiness flagged by LanguageTool

Replace “on top of that local loop” with a shorter phrase (e.g., “on the local loop”) to address the wordiness lint. citestatic_analysis_hints

✅ Proposed change

-Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on top of that local loop. +Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on the local loop.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on top of that local loop.

Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on the local loop.

🧰 Tools

🪛 LanguageTool

[style] ~3-~3: ‘on top of that’ might be wordy. Consider a shorter alternative.
Context: ...uity, team sharing, and managed backups on top of that local loop. ## What's in the SDK vs th...

(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/docs/cloud/overview.md` at line 3, The sentence containing "on top of that local loop" in the Gradata Cloud overview should be shortened for clarity; replace that phrase with "on the local loop" so the sentence reads "...Cloud adds visualization, cross-device continuity, team sharing, and managed backups on the local loop." Locate the paragraph that begins "Gradata Cloud is the hosted dashboard..." and update the exact string accordingly.

coderabbitai · 2026-05-02T09:34:21Z

+!!! info "Local by default"
+    Meta-rule clustering **and** principle synthesis both run locally. Synthesis uses whichever LLM path you've configured: your own Anthropic API key (set `ANTHROPIC_API_KEY`) or the Claude Code Max OAuth path via `claude -p`. Cloud is not required for any of it — the full `[rule, rule, rule] → "Verify before acting"` pipeline runs in the OSS SDK.

-    The math, the events, and the storage are all open. Only the LLM-driven synthesis that turns `[rule, rule, rule] → "Verify before acting"` is cloud-gated.
+    Cloud becomes relevant when you want a hosted dashboard, cross-device sync, team brains, or (future) opt-in corpus donation. It does not re-synthesize or override what graduated locally.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the footer cross-reference to match the new local-first explanation.

This section says cloud does not synthesize meta-rules, but the “Next” link at the bottom still sends readers to Cloud Overview “for meta-rule synthesis.” That pointer is now misleading.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 50-50: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/docs/concepts/meta-rules.md` around lines 47 - 50, The footer "Next" link that currently points readers to the Cloud Overview "for meta-rule synthesis" is now misleading given the "Local by default" text; update the bottom cross-reference to either remove the claim about synthesis or retarget the link to the Cloud Overview section that discusses hosted dashboard/team sync/team brains (or a more appropriate cloud-topic page), and ensure the link text reflects that cloud is relevant for dashboard/sync/team features rather than meta-rule synthesis; locate the "Local by default" block and the subsequent "Next" link text in the same markdown and adjust the link target and label accordingly.

coderabbitai · 2026-05-02T09:34:21Z

+### 1. Deprecated adapter shims (scheduled v0.8.0)
+- `src/gradata/integrations/anthropic_adapter.py` → `middleware.wrap_anthropic`
+- `src/gradata/integrations/langchain_adapter.py` → `middleware.LangChainCallback`
+- `src/gradata/integrations/crewai_adapter.py` → `middleware.CrewAIGuard`
+Warnings are in place; remove the modules and their tests at v0.8.0.
+
+### 2. `_cloud_sync.py` terminology
+File posts to an optional external dashboard — fine to keep, but the
+module docstring should make clear it is optional telemetry, not a
+mandatory cloud dependency. Callers already tolerate absence.
+
+### 3. Docstring drift in `meta_rules.py`
+Module header still says "require Gradata Cloud" and "no-ops in the
+open-source build". That is no longer true as of the local-first port —
+rewrite the header to describe the local clustering algorithm.
+
+### 4. Test-level cloud gating
+Former `@_requires_cloud` / `skipif` markers were deleted in this cycle.
+If any new test reintroduces a cloud gate, delete the gate instead — the
+feature should either be local-first or not ship.
+
+### 5. `api_key` kwarg on `merge_into_meta`
+The old `merge_into_meta(..., api_key=...)` path routed into
+`synthesise_principle_llm` directly. Current architecture drives LLM
+distillation from `rule_synthesizer` at session close instead. The kwarg
+is still accepted via `**kwargs` for forward compatibility but performs
+no work — remove after one release.
+
+### 6. Doc sweep
+`docs/cloud/` should be audited for pages that imply cloud is required.
+Rewrite as "optional managed hosting" or delete.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add blank lines after each subsection heading.

markdownlint-cli2 is flagging every ### block here with MD022. A blank line after each heading will clear the linter without changing content.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 16-16: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 27-27: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 32-32: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 37-37: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

[warning] 44-44: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/docs/LEGACY_CLEANUP.md` around lines 16 - 46, Add a single blank line after each subsection heading in LEGACY_CLEANUP.md (e.g., after "### 1. Deprecated adapter shims (scheduled v0.8.0)", "### 2. `_cloud_sync.py` terminology", "### 3. Docstring drift in `meta_rules.py`", "### 4. Test-level cloud gating", "### 5. `api_key` kwarg on `merge_into_meta`", and "### 6. Doc sweep") so every '###' header is followed by an empty line to satisfy markdownlint-md022.

coderabbitai · 2026-05-02T09:34:22Z

+  IF NOT EXISTS (
+    SELECT 1
+    FROM pg_constraint c
+    JOIN pg_class t ON t.oid = c.conrelid
+    WHERE t.relname = 'corrections'
+      AND c.contype = 'u'
+      AND c.conkey @> ARRAY[
+        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'brain_id'),
+        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'session'),
+        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'description')
+      ]::smallint[]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check for an exact unique key match here, not just a superset.

c.conkey @> ARRAY[...] also matches a wider constraint like (brain_id, session, description, created_at). In that case this migration would skip adding the intended 3-column uniqueness and still allow duplicate descriptions per session.

Suggested fix

IF NOT EXISTS ( SELECT 1 FROM pg_constraint c JOIN pg_class t ON t.oid = c.conrelid + CROSS JOIN LATERAL ( + SELECT ARRAY[ + (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'brain_id'), + (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'session'), + (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'description') + ]::smallint[] AS target_cols + ) cols WHERE t.relname = 'corrections' AND c.contype = 'u' - AND c.conkey @> ARRAY[ - (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'brain_id'), - (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'session'), - (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'description') - ]::smallint[] + AND c.conkey @> cols.target_cols + AND c.conkey <@ cols.target_cols ) THEN

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/migrations/supabase/014_corrections_unique.sql` around lines 23 - 33, The uniqueness check currently uses "c.conkey @> ARRAY[...]" which matches superset constraints; change it to test for an exact match by comparing arrays exactly (e.g., use "c.conkey = ARRAY[ ... ]::smallint[]" or use both "@>" and "<@" to ensure equality) for the constraint on columns brain_id, session, description so the migration only skips when a true 3-column unique constraint already exists for corrections.

coderabbitai · 2026-05-02T09:34:22Z

+```
+corrections_brain_session_desc_key              UNIQUE (brain_id, session, description)  -- pre-existing
+corrections_brain_session_description_unique    UNIQUE (brain_id, session, description)  -- from 014
+events_brain_type_created_at_key                UNIQUE (brain_id, type, created_at)      -- pre-existing
+events_brain_type_created_at_unique             UNIQUE (brain_id, type, created_at)      -- from 015
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language to this fenced block.

markdownlint-cli2 is already flagging this snippet with MD040. Label it text or sql so the new README stays lint-clean.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/migrations/supabase/README.md` around lines 32 - 37, The fenced code block containing the UNIQUE constraint lines (e.g., corrections_brain_session_desc_key, corrections_brain_session_description_unique, events_brain_type_created_at_key, events_brain_type_created_at_unique) needs a language tag to satisfy markdownlint MD040; edit the block start from ``` to ```sql (or ```text) so the snippet is labeled (e.g., change ``` to ```sql) and the README will lint cleanly.

coderabbitai · 2026-05-02T09:34:22Z

+                for (raw,) in rows:
+                    try:
+                        parsed = _json.loads(raw) if isinstance(raw, str) else raw
+                        if isinstance(parsed, dict):
+                            session_corrections.append(parsed)
+                    except (TypeError, _json.JSONDecodeError):


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize DB correction payloads before forwarding them to _cloud_sync_session().

brain_correct() stores draft_text / final_text, but _cloud_sync_session() reads draft / final. Appending the raw event payload here makes hook-driven sessions report zero blandness and any future final-based metrics will be wrong.

Suggested fix

for (raw,) in rows: try: parsed = _json.loads(raw) if isinstance(raw, str) else raw if isinstance(parsed, dict): - session_corrections.append(parsed) + normalized = dict(parsed) + if "draft" not in normalized and "draft_text" in normalized: + normalized["draft"] = normalized["draft_text"] + if "final" not in normalized and "final_text" in normalized: + normalized["final"] = normalized["final_text"] + session_corrections.append(normalized) except (TypeError, _json.JSONDecodeError): continue

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_core.py` around lines 1408 - 1413, The loop that builds session_corrections currently appends raw event payloads (parsed) which contain draft_text/final_text, but _cloud_sync_session() expects draft/final; update the normalization inside the for-loop (where parsed is created) to map parsed.get("draft_text") -> parsed["draft"] and parsed.get("final_text") -> parsed["final"] (preserving existing draft/final if present) before appending to session_corrections so brain_correct() payloads align with _cloud_sync_session() expectations.

coderabbitai · 2026-05-02T09:34:22Z

+    provenance = {
+        "source": "gradata",
+        "skill_id": skill_id,
+        "brain_name": brain.dir.name,
+        "exported_at": datetime.now(UTC).isoformat(),
+        "min_state": min_state,
+    }


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Write provenance.json atomically.

This payload is persisted just below via Path.write_text(). A mid-write crash can leave a truncated JSON file inside an otherwise-created skill directory; please route it through the repo's atomic JSON write helper.

As per coding guidelines, Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_core.py` around lines 1771 - 1777, The provenance dict (provenance with keys source, skill_id, brain_name, exported_at, min_state) is being persisted with a plain Path.write_text which can leave a truncated provenance.json on crash; replace that direct write with the repository's atomic JSON write helper (use the project helper such as write_json_atomic/atomic_write_json — locate the helper in the repo) and call it to serialize the provenance dict to provenance.json atomically instead of using Path.write_text; ensure the same provenance object and target filename are used so behavior is unchanged except for atomicity.

coderabbitai · 2026-05-02T09:34:22Z

+def _call_gemini(
+    model: str, system: str, user_prompt: str, max_tokens: int, timeout: float
+) -> str | None:
+    key = os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY")
+    if not key:
+        _log.debug("synth: GOOGLE_API_KEY / GEMINI_API_KEY not set")
+        return None
+    try:
+        from google import genai
+        from google.genai import types as genai_types
+
+        client = genai.Client(api_key=key)
+        config = genai_types.GenerateContentConfig(
+            system_instruction=system,
+            max_output_tokens=max_tokens,
+        )
+        resp = client.models.generate_content(
+            model=model,
+            contents=user_prompt,
+            config=config,
+        )
+        text = resp.text
+        return text.strip() if text else None
+    except Exception as exc:


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

How do you configure client-level or per-request timeouts in the google-genai Python SDK?

💡 Result:

In the google-genai Python SDK (also referred to as Google Gen AI SDK), timeouts are configured using the HttpOptions class from google.genai.types. Timeouts are specified in milliseconds. Client-level timeouts: Pass http_options to the genai.Client constructor. Example: from google import genai from google.genai import types client = genai.Client( api_key='YOUR_API_KEY', http_options=types.HttpOptions(timeout=30000) # 30 seconds ) Per-request timeouts: Override via the config parameter in methods like generate_content, using GenerateContentConfig with http_options. Example: response = client.models.generate_content( model='gemini-2.0-flash-exp', contents='Tell me a story', config=types.GenerateContentConfig( http_options=types.HttpOptions(timeout=120000) # 2 minutes ) ) Note: Some users have reported issues where client-level timeouts may not always take effect as expected (e.g., GitHub issue #911), in which case per-request configuration or retries via HttpRetryOptions can be used. The SDK converts the millisecond timeout to seconds internally for the underlying httpx client.

Citations:

1: https://googleapis-python-genai-70.mintlify.app/guides/http-options

2: https://github.com/googleapis/python-genai/blob/main/google/genai/_api_client.py

3: https://stackoverflow.com/questions/78142050/how-to-set-a-timeout-on-google-gemini-generate-content-request-with-the-vertex-a

4: https://mintlify.com/googleapis/python-genai/guides/http-options

🏁 Script executed:

cat -n Gradata/src/gradata/enhancements/rule_synthesizer.py | head -50

Repository: Gradata/gradata

Length of output: 2888

🏁 Script executed:

cat -n Gradata/src/gradata/enhancements/rule_synthesizer.py | sed -n '170,250p'

Repository: Gradata/gradata

Length of output: 3292

🏁 Script executed:

# Check if there are other timeout-handling provider functions rg -A 15 "def _call_anthropic|def _call_openai" Gradata/src/gradata/enhancements/rule_synthesizer.py

Repository: Gradata/gradata

Length of output: 1131

Add timeout support to the Gemini provider to honor the fail-safe contract.

All provider paths accept a timeout parameter, but _call_gemini() (line 211) ignores it. The module's docstring explicitly promises fail-safe behavior including "model timeout" handling (line 9), yet the Gemini client is created without timeout configuration. In contrast, _call_anthropic(), _call_openai(), and _call_http() all pass timeout to their respective clients.

Per the google-genai SDK, timeouts can be set client-level via http_options=types.HttpOptions(timeout=timeout*1000) in the genai.Client() call, or per-request in the GenerateContentConfig. Without this, a slow Gemini response can block longer than SYNTH_TIMEOUT, breaking the fail-safe guarantee.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/rule_synthesizer.py` around lines 200 - 223, _call_gemini currently ignores the passed timeout which breaks the module's fail-safe behavior; update the function so the google-genai client is created with timeout (e.g., pass http_options=genai_types.HttpOptions(timeout=int(timeout * 1000)) to genai.Client) or set the timeout on the request/config (e.g., in GenerateContentConfig), ensuring the timeout value is converted to milliseconds per the SDK and used when instantiating genai.Client and/or in GenerateContentConfig to enforce the model timeout.

Gradata · 2026-05-02T19:23:00Z

Replaced by clean rebase — #161 branch had 43 unrelated commits drifted from main. See new PR.

…2) (#162) * fix(cloud/client): push events with watermark cursor + idempotency (Bug 2) Pairs with gradata-cloud PR #12. Was Bug 2 from /tmp/audit-bug2-watermark.md. - client.sync() now reads events.jsonl, filters by last_sync_at watermark, batches 500 at a time, advances cursor on 200, retries with smaller batch on 413. - Sync state at <BRAIN_DIR>/.gradata-sync-state.json (separate from events.jsonl which stays append-only and untouched). - 9/9 new tests pass in tests/test_cloud_client_sync.py. Council perspective P3 (Skeptic) had this take after audit-gate blocked the aggregate-only path — 3 cloud routes (analytics.py, activity.py, corrections.py) read raw events directly, so telemetry-only would have flatlined them. * feat(scripts): add backfill_to_cloud.py for Bug 2 history rescue One-shot: counts events.jsonl, resets local sync state, calls client.sync() in a loop until cursor catches up. Idempotent — server upserts on (brain_id, event_id). Run after PRs #11/#12/#161 merge to backfill the ~5800 historical events the broken sync silently dropped.

Gradata and others added 30 commits April 20, 2026 15:16

fix(tests): assert brain_id not tenant_id in cloud push test

f141efd

Test was checking the pre-transform local key name. _cloud_sync._transform_row correctly emits brain_id (cloud schema) from tenant_id (local schema); the assertion was stale. Co-Authored-By: Gradata <noreply@gradata.ai>

fix(session-start): correct lessons path, add brain_prompt load, tigh…

c2cc47b

…ten stale notes Co-Authored-By: Gradata <noreply@gradata.ai>

Gradata and others added 12 commits April 24, 2026 03:29

chore(gitignore): add BrainDetail to bash redirect artifact list

848c8b9

Stale file created by a subagent Bash redirect. Grouped with the existing Windows cmd.exe stdout misparse artifact entries. Co-Authored-By: Gradata <noreply@gradata.ai>

chore(gitignore): ignore bash redirect artifact file named '0'

ac8afde

Co-Authored-By: Gradata <noreply@gradata.ai>

chore(gitignore): add backtick to Windows bash artifact exclusions

470da2f

Co-Authored-By: Gradata <noreply@gradata.ai>

coderabbitai Bot added bug Something isn't working feature labels May 2, 2026

coderabbitai Bot requested changes May 2, 2026

View reviewed changes

Gradata closed this May 2, 2026

Gradata mentioned this pull request May 2, 2026

fix(cloud/client): push events with watermark + backfill script (Bug 2) #162

Merged

	Gradata Cloud is the hosted dashboard that complements the open-source SDK. The SDK is functionally complete on its own — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on top of that local loop.
	Gradata Cloud is the hosted dashboard that complements the open-source SDK. The SDK is functionally complete on its own — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on the local loop.

Conversation

Gradata commented May 2, 2026

Bug

Fix

Tests

Audit trail

Uh oh!

greptile-apps Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Cloud Sync & Event Handling (Core Feature)

Database & Persistence

Breaking Changes

New Public APIs

Security & Data Quality

Testing & Tools

Documentation

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 2, 2026

Choose a reason for hiding this comment

greptile-apps Bot commented May 2, 2026 •

edited

Loading

coderabbitai Bot commented May 2, 2026 •

edited

Loading