fix(cloud/client): push events with watermark + idempotency (Bug 2 SDK side) #161
Gradata wants to merge 43 commits into
Local SQLite and cloud Supabase schemas diverged (wide `tenant_id` + `data_json` vs narrow `brain_id` + `data` jsonb, plus table rename `correction_patterns` -> `corrections`). Added `_transform_row` per-table mapper with deterministic uuid5 ids so repeat pushes upsert cleanly. `_scrub` strips NUL bytes and lone UTF-16 surrogates that Postgres JSONB rejects. `_post` dedupes within each batch, honors `_TABLE_REMAP`, and chunks large pushes to avoid PostgREST's opaque "Empty or invalid json" body-limit errors. `GRADATA_SUPABASE_URL` / `GRADATA_SUPABASE_SERVICE_KEY` now work as aliases so one .env serves both backend and SDK. Co-Authored-By: Gradata <noreply@gradata.ai>
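The upsert-friendly ids, the scrubbing step, and the chunking described above can be sketched roughly as follows. This is a minimal illustration, not the code in `_cloud_sync.py`: the namespace, the key layout, and the helper names `deterministic_id`, `scrub`, and `chunked` are assumptions made for the example.

```python
import re
import uuid

# Hypothetical namespace; the one actually used by _transform_row is not shown in this PR.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")

# Any code point in U+D800..U+DFFF inside a Python str is a lone surrogate.
_SURROGATES = re.compile("[\ud800-\udfff]")


def deterministic_id(table: str, brain_id: str, local_key: str) -> str:
    """Same inputs always yield the same UUID, so a repeat push upserts."""
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, f"{table}:{brain_id}:{local_key}"))


def scrub(value):
    """Recursively drop NUL bytes and lone surrogates from every string."""
    if isinstance(value, str):
        return _SURROGATES.sub("", value.replace("\x00", ""))
    if isinstance(value, dict):
        return {scrub(k): scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value


def chunked(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

Because `uuid5` is a pure function of its inputs, pushing the same local row twice produces the same primary key, which is what lets the server side treat the second push as an update rather than a conflict.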
…provider synth Phase 1 of the learning-pipeline revamp. Rule graduation now flows through the canonical _graduation.graduate() path (strict > for INSTINCT->PATTERN, >= for PATTERN->RULE) instead of the inline duplicate in rule_pipeline. Injection hook reads a persistent brain_prompt.md gated by an AUTO-GENERATED header, regenerated only at session_close after the pipeline fires. LLM synthesis gets a two-provider path: anthropic SDK (ANTHROPIC_API_KEY) with claude CLI fallback (Max-plan OAuth) so users without an exportable key still get synthesis. Meta-rule deterministic fallback now warns loudly instead of silently discarding. Drops five env-flag gates in favour of file-based signals. Co-Authored-By: Gradata <noreply@gradata.ai>
Adds --cloud / --no-cloud flags to the doctor CLI command and the underlying diagnose() function. Flips the default cloud endpoint to api.gradata.ai/api/v1. Covers new behaviour with test_doctor_cloud.py (all passing). Co-Authored-By: Gradata <noreply@gradata.ai>
Regex coverage was brittle to shorthand: real corrections like
"Why r you not asking" and "Why flag.. we dont skip" slipped the
\bwhy (did|would|are) you\b pattern and never became IMPLICIT_FEEDBACK
events. That silently breaks Gradata's core promise ("learn from any
correction").
Adds:
- negation: dont/cant/shouldnt (no-apostrophe variants), never
- reminder: "again" marker, "dont forget"
- challenge: "why r u", "why not/r/are/is/does", "why word..",
"how come", "you missed/forgot/failed/didnt"
All 8 target phrases now detect. 25 existing implicit-feedback tests
remain green.
Co-Authored-By: Gradata <noreply@gradata.ai>
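A rough sketch of what shorthand-tolerant patterns for the three signal types could look like. The exact expressions in the hook are not shown in this commit, so the regexes and the `detect_signals` helper below are illustrative assumptions only.

```python
import re

# Hypothetical patterns; the hook's real expressions may differ.
PATTERNS = {
    "negation": re.compile(r"\b(?:don'?t|can'?t|shouldn'?t|never)\b", re.IGNORECASE),
    "reminder": re.compile(r"\b(?:again|don'?t forget)\b", re.IGNORECASE),
    "challenge": re.compile(
        r"\bwhy (?:r|are|did|would) (?:u|you)\b"   # "why r you", "why are u"
        r"|\bwhy \w+\.\."                          # "why flag.."
        r"|\bhow come\b"
        r"|\byou (?:missed|forgot|failed|didn'?t)\b",
        re.IGNORECASE,
    ),
}


def detect_signals(text: str) -> list:
    """Return the sorted names of every signal type that matches."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))
```

Making the apostrophe optional (`don'?t`) is what lets a single pattern cover both the standard and the shorthand spelling.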
14 new tests pinning the regex expansion from 5a6da45. Covers real corrections observed this session ("Why r you not asking council", "Why flag.. we don't skip we do work") plus shorthand cases (dont / cant / again / you missed / how come). Dual-signal cases assert both types detect. Full suite: 37 passed, 1 pre-existing skip. Co-Authored-By: Gradata <noreply@gradata.ai>
Five post-launch metrics with precise definitions (activation, D7 retention, time-to-first-graduation, free->Pro conversion, correction-rate decay). Numeric triggers: pivot <20% activation + flat decay at D30; kill <100 installs at D60; scale >1K installs + >=5% conversion at D90. Monday 30-min retro agenda. Source: Card 8 of the pre-launch gap analysis. Co-Authored-By: Gradata <noreply@gradata.ai>
The source-provenance docstring referenced "cloud-side LLM synthesis" which is stale since the graduation-cloud-gate was removed. Synthesis runs on the user's machine via rule_synthesizer.py's two-provider path (Anthropic SDK with user's key, or Claude Code Max CLI OAuth). Co-Authored-By: Gradata <noreply@gradata.ai>
Graduation and meta-rule LLM synthesis run entirely locally as of a few sessions ago (rule_synthesizer.py uses user's own Anthropic key or Claude Code Max CLI OAuth). The Pro-tier inclusion list incorrectly still claimed "cloud runs better graduation engine" and implied a cloud-enhanced sqlite-vec path. Rewrite the inclusion list + philosophy paragraph to match reality: free is functionally complete; Pro is visualization, history, export, and the future community corpus. NOTE: this file is listed in .gitignore per the earlier "untrack private files" cleanup. Force-added at request. Co-Authored-By: Gradata <noreply@gradata.ai>
Test was checking the pre-transform local key name. _cloud_sync._transform_row correctly emits brain_id (cloud schema) from tenant_id (local schema); the assertion was stale. Co-Authored-By: Gradata <noreply@gradata.ai>
Previously nothing wrote to lesson_applications — the table existed
(onboard.py), was size-checked (_validator.py), and synced to cloud
(_cloud_sync.py), but no code ever inserted a row. The compound-quality
story had no evidence: rules claimed to fire with no receipt.
Now:
- inject_brain_rules writes one PENDING row per injected rule (cluster
members included), storing {category, description, task} in context so
session_close can attribute outcomes back to specific rules.
- session_close resolves PENDING rows at end-of-waterfall:
REJECTED if any CORRECTION/IMPLICIT_FEEDBACK/RULE_FAILURE in the
session shares the lesson's category (or description substring).
CONFIRMED otherwise (rule survived the session).
Both paths are best-effort — DB missing, schema drift, or IO errors
degrade silently rather than blocking injection or session close.
Unblocks the Card 6 MVP day-14 metric: "did a graduated rule actually
fire and survive?" — the answer now has a row-level audit trail.
Co-Authored-By: Gradata <noreply@gradata.ai>
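The resolution rule can be sketched as a pure function. The event shape (`type`, `category`, `text` keys) and the direction of the substring check are assumptions for illustration; the commit only states the rule, not the data layout.

```python
FAILURE_EVENTS = {"CORRECTION", "IMPLICIT_FEEDBACK", "RULE_FAILURE"}


def resolve_application(context: dict, session_events: list) -> str:
    """Resolve one PENDING row to REJECTED or CONFIRMED.

    `context` is the {category, description, task} dict stored at injection
    time. Event dicts are assumed to carry "type", "category" and "text".
    """
    category = context.get("category", "")
    description = context.get("description", "").lower()
    for event in session_events:
        if event.get("type") not in FAILURE_EVENTS:
            continue
        if category and event.get("category") == category:
            return "REJECTED"
        if description and description in event.get("text", "").lower():
            return "REJECTED"
    # No failure signal touched this rule: it survived the session.
    return "CONFIRMED"
```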
Sweeps the remaining docs that still claimed cloud gated any part of the learning loop.
Actual architecture (as of the graduation-local pivot):
- Local SDK owns: correction capture, graduation, meta-rule clustering AND LLM-synthesis (via user's Anthropic key or Claude Code Max OAuth), rule-to-hook promotion, manifest computation.
- Cloud owns: dashboard/visualization, cross-device sync, team brains, managed backups, future opt-in corpus donation.
Files touched:
- docs/cloud/overview.md: capability matrix, architecture diagram, use-when guidance.
- docs/architecture/cloud-monolith-v2.md: cloud-side workload framing.
- docs/architecture/multi-tenant-future-proofing.md: proprietary boundary, verification flow.
- docs/concepts/meta-rules.md: synthesis is local, not cloud-gated.
- docs/cloud/dashboard.md: dashboard visualizes local output, does not re-synthesize.
README.md was already accurate; no changes there.
Co-Authored-By: Gradata <noreply@gradata.ai>
Silent-failure-hunter CRITICAL-1:
- inject_brain_rules: wrap lesson_applications connection in try/finally
and escalate OperationalError to warning (missing-table surfaces).
Silent-failure-hunter CRITICAL-2:
- _cloud_sync.push: per-row try/except on _transform_row so one bad row
no longer propagates and kills the whole push batch.
Leak scan blockers:
- Delete docs/pre-launch-plan.md and docs/gradata-marketing-strategy.md
from the public repo; add both to .gitignore. These contain kill
triggers, pricing, and PII that belong in the private brain vault only.
Code-reviewer BLOCKER-3:
- _doctor._check_vector_store returns status="ok" with FTS5 detail in
the detail field, restoring the documented status vocabulary
({ok, warn, fail, skip, missing, error}).
Test-coverage gaps:
- Add tests/test_rule_synthesizer.py — both providers absent, empty
input, cache hit, CLI fallback on SDK raise, malformed output.
- Add IMPLICIT_FEEDBACK → REJECTED integration test to
test_lesson_applications.py.
Verification: full suite 3802 pass, 22 skip, 2 xfailed.
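The CRITICAL-2 fix amounts to isolating failures per row. A minimal sketch, with `transform_rows` as a hypothetical helper name and the failure record format chosen only for this example:

```python
def transform_rows(rows, transform):
    """Apply `transform` to each row, collecting failures instead of raising."""
    good, failed = [], []
    for index, row in enumerate(rows):
        try:
            good.append(transform(row))
        except Exception as exc:  # one bad row must not abort the batch
            failed.append((index, type(exc).__name__))
    return good, failed
```

The caller can then push `good` and log `failed`, so a single malformed row costs one row rather than the whole batch.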
Gradata is fully local-first now. Cloud-gate stubs and "requires cloud" skip markers were legacy artifacts from an earlier architecture where discovery/synthesis lived server-side. This commit finishes the port:
- meta_rules.discover_meta_rules + merge_into_meta run locally: category grouping + greedy semantic-similarity clustering, zombie filter on RULE-state lessons below 0.90, decay after 20 sessions, count/(count+3) confidence smoothing.
- Drop @_requires_cloud markers from test_bug_fixes, test_llm_synthesizer, test_meta_rule_generalization, test_multi_brain_simulation, test_pipeline_e2e. These tests now exercise the local impl directly.
- Retire the api_key-kwarg-on-merge_into_meta path (session-close rule_synthesizer drives LLM distillation now).
- Update fixtures to realistic prose so they survive the noise filter that rejects "cut:/added:" edit-distance summaries.
- Bump test_meta_rules confidence assertion to the smoothed formula.
- Add docs/LEGACY_CLEANUP.md tracking the remaining cloud-gate vestiges (deprecated adapter shims, cloud docs, stale module docstrings).
Suite: 3809 passed, 14 skipped, 2 xfailed.
Co-Authored-By: Gradata <noreply@gradata.ai>
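For reference, the count/(count+3) smoothing mentioned above behaves like this. What `count` measures in `meta_rules` is not stated in the commit, so treat the parameter meaning here as an assumption.

```python
def smoothed_confidence(count: int, prior: int = 3) -> float:
    """count / (count + prior): small samples are pulled toward zero."""
    return count / (count + prior)
```

With the default prior of 3, a count of 3 gives 0.5, and the value only reaches the 0.90 zombie-filter threshold at a count of 27.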
…xtures
discover_meta_rules is implemented now (local-first). The
if not metas: pytest.skip('discover_meta_rules not yet implemented')
guards were vestiges from the cloud-only era — convert to real asserts.
Also bump 0.88-confidence RULE-state fixtures to 0.90 so they survive
the zombie filter (RULE at <0.90 is treated as a decayed rule).
Suite: 3813 passed, 10 skipped, 2 xfailed.
Remaining skips are all legit:
- test_file_lock.py (2): Windows vs POSIX platform gates
- test_integration_workflow.py (5): require ANTHROPIC/OPENAI keys, cost money
- test_mem0_adapter.py::test_real_mem0_roundtrip: requires MEM0_API_KEY
- test_meta_rules.py::test_with_real_data: requires GRADATA_LESSONS_PATH env
xfails (2) are tracked for v0.7 reconciliation in test docstring.
Co-Authored-By: Gradata <noreply@gradata.ai>
Found while clearing remaining skipped/xfailed tests:
Bug: agent_graduation._update_lesson_confidence had
confidence = max(0.0, confidence - MISFIRE_PENALTY)
but MISFIRE_PENALTY = -0.15 (negative). Subtracting a negative added confidence on rejection. Test test_rejection_decreases_confidence was xfail'd with 'API drift, reconcile in v0.7', but it was a real bug.
Fix: align with canonical _confidence.py usage (confidence + MISFIRE_PENALTY).
Other cleanups in the same pass:
- test_agent_graduation: drop both xfail markers. test_lesson_graduates_to_pattern was also wrong on its own terms: with ACCEPTANCE_BONUS=0.20 the lesson graduates straight to RULE (stronger than PATTERN). Accept either state.
- test_integration_workflow: delete stale module-level skipif guarding 5 tests behind ANTHROPIC/OPENAI keys they never actually use. They only exercise local brain.correct/convergence/efficiency, with no network.
- test_mem0_adapter: delete test_real_mem0_roundtrip (live-API smoke test already covered by the 20+ fake-client tests in the same file).
- test_meta_rules: delete test_with_real_data, a dev-time exploration script with zero asserts, requiring GRADATA_LESSONS_PATH env var.
Suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed.
Remaining 3 skips are test_file_lock.py POSIX paths that require fcntl, which does not exist on Windows. Complementary Windows paths skip on Linux; running on each platform covers all 4. Cannot be eliminated.
From 22 skipped + 2 xfailed to 3 skipped + 0 xfailed.
Co-Authored-By: Gradata <noreply@gradata.ai>
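The sign error is easy to reproduce in isolation. The two function names below are hypothetical; only the constant and the two expressions come from the commit message.

```python
MISFIRE_PENALTY = -0.15  # negative by convention, as stated in the commit


def update_buggy(confidence: float) -> float:
    # Subtracting a negative constant raises confidence on rejection.
    return max(0.0, confidence - MISFIRE_PENALTY)


def update_fixed(confidence: float) -> float:
    # Adding the negative constant lowers confidence, floored at 0.0.
    return max(0.0, confidence + MISFIRE_PENALTY)
```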
…ten stale notes Co-Authored-By: Gradata <noreply@gradata.ai>
…ate refresh
- agent_graduation: add _extract_output() to handle all Claude Code PostToolUse payload key variants (tool_response/tool_output/tool_result/output/response) so plan-mode agents no longer silently drop output
- session_close: add _load_soul_mandatories() (VOICE rules from soul.md injected into brain_prompt.md) and _refresh_loop_state() (regenerates loop-state.md on session close with live DB + lesson counts); raise Stop hook timeout to 90 s
- _events: add _redact_payload() (recursive email PII redaction) wired into emit() before any write; raw side-log to events.raw.jsonl (best-effort); redactor failure aborts write (fail closed)
Co-Authored-By: Gradata <noreply@gradata.ai>
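A sketch of recursive redaction with fail-closed behaviour, under stated assumptions: the email regex, the placeholder token, and the list-based `sink` are illustrative, and `emit` here is a simplified stand-in rather than the signature in `_events`.

```python
import re

# A deliberately simple email pattern for illustration.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_payload(value):
    """Walk dicts and lists, replacing email addresses inside strings."""
    if isinstance(value, str):
        return _EMAIL.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {key: redact_payload(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact_payload(item) for item in value]
    return value


def emit(event: dict, sink: list, redactor=redact_payload) -> bool:
    """Append the redacted event to `sink`; write nothing if redaction fails."""
    try:
        safe = redactor(event)
    except Exception:
        return False  # fail closed: an unredacted event is never written
    sink.append(safe)
    return True
```

Failing closed trades availability for safety: a redactor bug loses the event, but it cannot leak the raw payload into the main log.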
…e watermarks
- _ulid.py: minimal stdlib ULID generator (no external dep); ulid_from_iso() preserves timestamp sort order during historical backfill
- device_uuid.py: atomic read-or-create of per-brain dev_<hex> device id; race-safe via O_EXCL temp file + os.replace
- 002_add_event_identity: adds event_id/device_id/content_hash/correction_chain_id/origin_agent columns + indexes to events table; chunked 10k-row backfill that is idempotent and resumes on restart
- 003_add_sync_state: creates sync_state table if missing and adds device_id/last_push_event_id/last_pull_cursor/tenant_id watermark columns + composite indexes
- tests: 44 tests covering all migration paths, chunked backfill, idempotency, PII redaction (email), loop-state generation, and session_close functions
Co-Authored-By: Gradata <noreply@gradata.ai>
…ts DB Reads ~/.claude/projects/<project-hash>/*.jsonl count as the session number — the actual Anthropic session log — rather than MAX(session) from the Gradata events table. The two diverged (314 vs 367). Falls back to the events DB if the project dir can't be located. Co-Authored-By: Gradata <noreply@gradata.ai>
Previous fix only counted the active project dir (314). Global sum across all project dirs gives 659, matching the actual Anthropic session log total. Falls back to events DB if projects dir missing. Co-Authored-By: Gradata <noreply@gradata.ai>
…oop-state.md (367) Session number was read from loop-state.md (Gradata events DB counter). Now counts .jsonl files across all ~/.claude/projects/ dirs — the real Claude Code session total, same logic as status_line.py. Co-Authored-By: Gradata <noreply@gradata.ai>
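The counting rule from these three commits (sum of `.jsonl` files across every project directory, with the events DB as fallback) can be sketched as below. `session_number` and its `db_max_session` parameter are hypothetical names for this example; the `home` parameter is injected so tests do not touch the real home directory.

```python
from pathlib import Path
from typing import Optional


def claude_session_count(home: Path) -> Optional[int]:
    """Count *.jsonl files one level below <home>/.claude/projects."""
    projects = home / ".claude" / "projects"
    if not projects.is_dir():
        return None  # caller falls back to the events DB
    return sum(1 for _ in projects.glob("*/*.jsonl"))


def session_number(home: Path, db_max_session: int) -> int:
    """Prefer the transcript count; fall back to the events DB counter."""
    count = claude_session_count(home)
    return db_max_session if count is None else count
```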
Every silent except Exception: pass in the core library layers now emits a _log.debug() so failures surface under GRADATA_LOG=debug without breaking the best-effort semantics. Files touched: brain.py (telemetry guard), context_wrapper.py (apply_brain_rules / context_for fallbacks), _brain_manifest.py + _context_compile.py (added module loggers), _context_packet.py (12 data-loading guards), _manifest_metrics.py (7 DB query guards), _doctor.py (HTTP body read guard + contextlib import), _mine_transcripts.py (SIM108 ternary), hooks/session_close.py (4 x SIM105 OSError guards converted to contextlib.suppress). Co-Authored-By: Gradata <noreply@gradata.ai>
ruff check src/ --fix resolved 8 auto-fixable violations (E, F, I rules). ruff format src/ reformatted 163 files to enforce consistent style. Zero errors remain; 13 pre-existing warnings (optional cloud/framework imports, lazy __all__ patterns) are unchanged. Co-Authored-By: Gradata <noreply@gradata.ai>
Two tests expected s0/s42 but got s659 because _claude_session_count() was walking the real ~/.claude/projects/. Add fake_home fixture so the function returns None and falls back to the events DB as intended. Co-Authored-By: Gradata <noreply@gradata.ai>
…eshold
New Stop hook writes a structured handoff to brain/sessions/handoff-{ts}.md
when context usage exceeds GRADATA_CTX_THRESHOLD (default 65%). inject_brain_rules
surfaces a <watchdog-alert> block at next session start so the LLM knows to
review the handoff and run /compact or /clear.
Also: bracket_confidence() in session_close for cache-key stability; remove
MAX_RULES render cap from inject_brain_rules (overshoot logic was masking gaps);
13 new tests in test_ctx_watchdog, tests in test_rule_synthesizer updated.
Co-Authored-By: Gradata <noreply@gradata.ai>
…ript store + retroactive sweep
P1: call_provider() dispatch in rule_synthesizer.py routes by model prefix (claude-* → Anthropic, gpt-*/o1/o3 → OpenAI, gemini-* → Google, http → generic). session_close._refresh_brain_prompt now uses call_provider instead of inline SDK.
P2: _bracket_confidence() buckets FSRS floats into 3 stable bands (low/mid/high) so per-tick confidence changes no longer bust the synthesis cache.
P3: New _transcript.py (log_turn, load_turns, cleanup_ttl) and _transcript_providers.py (ProviderTranscriptSource + GradataTranscriptSource) form the transcript store layer. _retroactive_sweep() in the waterfall runs implicit_feedback patterns across all session turns (gated on GRADATA_TRANSCRIPT=1). OpenAI, LangChain, CrewAI middleware adapters gain session_id + log_turn() calls.
21 new tests in test_transcript.py.
Co-Authored-By: Gradata <noreply@gradata.ai>
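P1 and P2 can be illustrated together. The routing table follows the prefixes listed in the commit; the band cut-offs (0.4 and 0.8) and both function names are assumptions for this sketch, since the commit does not give the actual thresholds.

```python
def route_provider(model: str) -> str:
    """Pick a provider family from the model string's prefix."""
    name = model.lower()
    if name.startswith("claude-"):
        return "anthropic"
    if name.startswith(("gpt-", "o1", "o3")):
        return "openai"
    if name.startswith("gemini-"):
        return "google"
    if name.startswith("http"):
        return "generic"
    raise ValueError(f"no provider route for model {model!r}")


def bracket_confidence(value: float, low_cut: float = 0.4, high_cut: float = 0.8) -> str:
    """Collapse a float into one of three bands (cut-offs are hypothetical)."""
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "mid"
    return "high"
```

Using the band instead of the raw float in a cache key means a confidence drift from 0.61 to 0.63 maps to the same key, so the cached synthesis is reused.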
…only The global Path.is_file patch in _run_main() caused inject_brain_rules to also read a fake pending_handoff.txt and append a <watchdog-alert> block. Test now extracts content between <brain-rules>...</brain-rules> before counting lines, making it immune to any outer blocks appended to the result. Co-Authored-By: Gradata <noreply@gradata.ai>
- pre_compact.py rewritten: when auto-compact fires with a pending handoff, replaces the compact summary verbatim with handoff content so no lossy LLM summarization occurs. Manual compact falls back to snapshot. Corrects field name from "type" → "trigger" (keeps legacy fallback).
- inject_brain_rules._build_watchdog_block() extracted from inline main():
  Phase 1 (pre-/clear): consumes pending_handoff.txt, stages content to post_clear_handoff.txt, injects <watchdog-alert> with run-/clear prompt.
  Phase 2 (post-/clear): consumes post_clear_handoff.txt, injects <session-handoff> into fresh session.
  Phase 2 takes priority if both exist.
- implicit_feedback: return None instead of signal name string to reduce UserPromptSubmit noise.
- tests/test_pre_compact.py: 9 tests covering both trigger paths.
- tests/test_inject_watchdog_phases.py: 8 tests covering both phases.
Co-Authored-By: Gradata <noreply@gradata.ai>
graph_first_check.py (PreToolUse, Glob|Grep): blocks exploratory code searches until the session flag is set. Returns a block decision with the exact ToolSearch call needed to unblock. graph_session_track.py (PostToolUse, ToolSearch): writes a per-session flag file when a ToolSearch query contains "code-review-graph", clearing the block for the rest of the session. inject_brain_rules.py: appends <code-graph-tools> directive to every SessionStart injection, with the mandatory ToolSearch query string. Both hooks registered in ~/.claude/settings.json. Bypass via GRADATA_GRAPH_CHECK=0. 18 tests, smoke-tested end-to-end. Co-Authored-By: Gradata <noreply@gradata.ai>
…tignore cleanup - test_hooks_intelligence.py: implicit_feedback tests now assert result is None and verify IMPLICIT_FEEDBACK event via mock_emit (hook emits, doesn't return) - session_close.py: reorder imports alphabetically (isort) - .gitignore: add graphify temp files, run.log patterns, and /.archive/ personal Claude Code config backups so they never accidentally land in commits Co-Authored-By: Gradata <noreply@gradata.ai>
… migration reference - Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py: move legacy Streamlit dashboard per Phase 4 deprecation plan (gradata.ai web dashboard now covers all panels — /rules, /corrections, /self-healing, /observability) - Gradata/migrations/supabase/: reference copies of cloud migrations 014-016 applied to prod 2026-04-24 (corrections unique, events unique, brains.last_used_at) - Gradata/docs/specs/cloud-sync-and-pricing.md: DRAFT v1 sync architecture + pricing tier spec Co-Authored-By: Gradata <noreply@gradata.ai>
Stale file created by a subagent Bash redirect. Grouped with the existing Windows cmd.exe stdout misparse artifact entries. Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
- CHANGELOG.md: add [Unreleased] section covering 18 commits since 2026-04-23 (cloud sync, hooks hardening, Supabase migrations, Streamlit archival, statusline session-count source, implicit_feedback emit-only contract)
- migrations/supabase/014,015: wrap constraint adds in DO blocks that check pg_constraint first, making re-runs safe on any DB (prod already had inline UNIQUE _key variants from CREATE TABLE; these migrations added redundant _unique variants, now documented as no-op on existing systems)
- migrations/supabase/README.md: document prod constraint state (both _key and _unique present on corrections + events) and drift-cleanup deferred
Co-Authored-By: Gradata <noreply@gradata.ai>
Critic audit flagged a silent-drop path: when resolve_brain_dir() returns None (fresh install, CI env, unconfigured brain) the hook detected signals but skipped emit() with no log — every correction became invisible. - hooks/implicit_feedback.py: add debug log in the else branch recording how many signals were detected and of which types, so operators running `GRADATA_LOG_LEVEL=DEBUG` see the breadcrumb. - tests/test_implicit_feedback.py: add TestMainNoBrainDir covering the main() path (previously only _detect_signals was tested) — verifies the debug log fires on detected signals, stays quiet on no-signal input, and short messages don't crash. Co-Authored-By: Gradata <noreply@gradata.ai>
Watermark stalls from 23505 unique-violations were invisible unless a
caller grepped logs: _post() logged everything at WARNING. Now HTTP 409
and any "23505" body are logged at ERROR with a body snippet, and the
last error is persisted to brain_dir/cloud_push_error.json so
'gradata doctor' can surface it ('fail' for constraint violations,
'warn' for other non-2xx). Successful pushes clear the file.
_post() signature is now (accepted, error_info|None); call sites and
the three existing tests patching _post are updated. A _coerce_post_result
shim tolerates legacy int returns from any external patches.
Closes T17 from the overnight backlog (critic finding cycle-2 #1).
Addresses three cycle-3 council findings on commit 492c3dd:
1. Non-atomic write (critic #1, high-severity race). `_record_push_error` now writes to `<name>.tmp` then `os.replace`s into the target. Concurrent readers (doctor + daemon + MCP server) can no longer observe a truncated file that would mask a constraint violation as "error file unreadable".
2. PII leak in persisted error (critic #2). PostgREST 23505 bodies echo conflicting row values in `details`/`hint` fields, and `gradata doctor` prints the file verbatim. New `_scrub_error_body` parses the body as JSON and keeps only `code` + the first 120 chars of `message` (enough for the constraint name). Non-JSON bodies reduce to a length marker. Log messages use the scrubbed form too.
3. Removed the `_coerce_post_result` shim (verifier + critic). Zero tests exercised the bare-int branch it guarded; callers now destructure `_post` returns directly.
Tests: +2 (`test_post_error_body_scrubs_row_values`, `test_scrub_error_body_handles_non_json`), 28/28 in the cloud test files pass, 3944 passed / 3 skipped full suite. Ruff + pyright clean.
Co-Authored-By: Gradata <noreply@gradata.ai>
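Findings 1 and 2 combine into a small pattern: scrub first, then write atomically. The sketch below uses un-prefixed, hypothetical names and an assumed format for the non-JSON length marker; it is an illustration of the approach described above, not the module's code.

```python
import json
import os
from pathlib import Path


def scrub_error_body(body: str) -> dict:
    """Keep only `code` and the first 120 chars of `message`."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    if not isinstance(parsed, dict):
        # Non-JSON bodies reduce to a length marker (format is an assumption).
        return {"code": None, "message": f"<non-JSON body, {len(body)} chars>"}
    return {"code": parsed.get("code"), "message": str(parsed.get("message", ""))[:120]}


def record_push_error(path: Path, info: dict) -> None:
    """Write to <name>.tmp, then atomically swap it into place."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(info), encoding="utf-8")
    os.replace(tmp, path)  # readers see the old file or the new one, never a partial
```

`os.replace` is atomic when source and target are on the same filesystem, which holds here because the temp file sits next to the target.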
When doctor reports on cloud_push_error.json, the detail string now names the brain directory it checked. In multi-brain deployments, push() and doctor() can resolve different brain_dirs silently — surfacing the path lets users spot the divergence instead of chasing phantom "ok" reports. Cycle-3 critic finding #3. Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…metry
Three bugs kept last_sync_at frozen:
- cloud/client.py POSTed /brains/sync (path doesn't exist) -> /sync
- cloud/sync.py POSTed /v1/telemetry/metrics -> /api/v1/telemetry/metrics
- Stop hook never fired cloud sync because Claude Code doesn't call brain.end_session(). Added cloud_sync_tick() helper in _core.py and new _run_cloud_sync step in session_close.py waterfall.
Also elevated silent DEBUG failures to WARNING with HTTP status + exc_info so the next failure mode surfaces in run.log.
3945 tests pass.
Co-Authored-By: Gradata <noreply@gradata.ai>
New CLI: gradata skill export <name> [--output-dir DIR] [--description STR]
[--category CAT] [--no-meta]
The bet: Claude Skills' "gotchas" section is exactly what graduated
RULE-tier lessons are -- but generated from real corrections instead of
hand-written. This turns a brain into a portable, shippable Skill folder
with valid YAML frontmatter, category-grouped gotchas, and (when
available) injectable meta-principles.
- new module enhancements/skill_export.py reuses _parse_rules from
rule_export so the RULE-only filter and [hooked] marker stripping
stay consistent across exporters
- auto-generated frontmatter description lists rule categories with
defensive 900-char clip (Anthropic 1024 ceiling)
- name slugified for safe folder name + frontmatter alignment
- description quote-escapes preserve YAML validity
- meta-rule loader degrades gracefully on missing system.db / table
24 new tests; full suite 3969 pass (+24, 0 regressions).
Unblocks M4 items 7 and 9 (self-dev Skill, composition Skill) per
plans/swift-toasting-origami.md.
Co-Authored-By: Gradata <noreply@gradata.ai>
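The slug, clip, and quote-escape steps listed above might look like this in isolation. All three helper names, the fallback slug, and the description wording are assumptions for this sketch; only the 900-character clip and the frontmatter fields come from the commit message.

```python
import re


def slugify(name: str) -> str:
    """Lowercase, collapse non-alphanumerics to '-', trim the ends."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "skill"  # hypothetical fallback for empty results


def build_description(categories, limit: int = 900) -> str:
    """List the rule categories, clipped defensively below the 1024 ceiling."""
    text = "Gotchas learned from real corrections in: " + ", ".join(sorted(set(categories)))
    return text[:limit]


def render_frontmatter(name: str, description: str) -> str:
    """Emit YAML frontmatter with a double-quoted, escaped description."""
    escaped = description.replace("\\", "\\\\").replace('"', '\\"')
    return f'---\nname: {slugify(name)}\ndescription: "{escaped}"\n---\n'
```

Escaping backslashes before quotes matters: doing it the other way round would double the backslashes that the quote escape just introduced.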
…ug 2)
Pairs with gradata-cloud PR #12. Was Bug 2 from /tmp/audit-bug2-watermark.md.
- client.sync() now reads events.jsonl, filters by last_sync_at watermark, batches 500 at a time, advances cursor on 200, retries with smaller batch on 413.
- Sync state at <BRAIN_DIR>/.gradata-sync-state.json (separate from events.jsonl which stays append-only and untouched).
- 9/9 new tests pass in tests/test_cloud_client_sync.py.
Council perspective P3 (Skeptic) had this take after audit-gate blocked the aggregate-only path: 3 cloud routes (analytics.py, activity.py, corrections.py) read raw events directly, so telemetry-only would have flatlined them.
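The watermark loop described in this commit can be sketched without any network code by injecting the POST as a callable. The function signature, the `ts` field name, the halving strategy on 413, and the in-memory `state` dict are assumptions for illustration; the real `client.sync()` reads events.jsonl and persists state to the JSON file named above.

```python
def sync(events, state, post, batch_size=500):
    """Push events newer than the watermark; return how many were accepted.

    `events` are dicts with an ISO-8601 "ts" string in one consistent format,
    `state` holds "last_sync_at", and `post(batch)` returns an HTTP status.
    """
    watermark = state.get("last_sync_at", "")
    pending = sorted(
        (event for event in events if event["ts"] > watermark),
        key=lambda event: event["ts"],
    )
    size, sent, index = batch_size, 0, 0
    while index < len(pending):
        batch = pending[index:index + size]
        status = post(batch)
        if status == 200:
            state["last_sync_at"] = batch[-1]["ts"]  # advance cursor only on success
            sent += len(batch)
            index += len(batch)
        elif status == 413 and size > 1:
            size = max(1, size // 2)  # payload too large: retry with a smaller batch
        else:
            break  # leave the watermark where it is; the next run resumes here
    return sent
```

Advancing the cursor only after a 200 means a crash or failure mid-run re-sends at most one batch, which is why server-side idempotency pairs naturally with the watermark. One limitation of this simplified version: events sharing the exact timestamp of the watermark are skipped by the strict comparison, so a per-event id cursor would be more robust.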
Too many files changed for review.
📝 Walkthrough
Summary
Cloud Sync & Event Handling (Core Feature)
Database & Persistence
Breaking Changes
New Public APIs
Security & Data Quality
Testing & Tools
Documentation
Walkthrough
This PR implements a comprehensive shift toward a "local-first" architecture: Gradata becomes functionally complete without cloud services, meta-rule synthesis runs locally using the user's LLM provider, cloud syncs graduated rules and events for visualization/backup/cross-device use only, and new migrations/client code support resumable multi-device sync. Additionally, graduated lessons export as Anthropic Claude Skills via a new CLI command, and extensive code refactoring standardizes formatting and improves observability.
Changes
Local-First Architecture Shift
Cloud Sync & Multi-Device Support
Skill Export & CLI Enhancement
Observability & Event Handling Enhancements
Config & Path Resolution
Extensive Refactoring & Formatting
Sequence Diagram(s)
sequenceDiagram
participant User
participant Brain as Brain (Local)
participant Migrate as Migrations
participant CloudSync as CloudClient
participant Cloud as Cloud (Supabase)
participant Dashboard as Dashboard
Note over User,Dashboard: Initial Setup: Multi-Device Sync Initialization
User->>Brain: Call set_brain_dir (device A, first time)
Brain->>Migrate: 001_add_tenant_id: backfill tenant_id
Brain->>Migrate: 002_add_event_identity: generate device_id, event_id, content_hash
Migrate->>Brain: Store .device_id locally
Migrate->>Brain: Create sync_state table with watermarks
Brain->>Brain: Initialize .gradata-sync-state.json
Note over User,Dashboard: Session: Meta-Rule Synthesis (Local)
User->>Brain: brain.correct(...)
Brain->>Brain: _attribute_domain_fires(), build lessons
Brain->>Brain: brain_end_session(...)
Brain->>Brain: discover_meta_rules() → cluster by similarity (local)
Brain->>Brain: merge_into_meta() → deterministic synthesis
Brain->>Brain: emit LESSON_CHANGE, RULE_CREATED events
Brain->>Brain: Persist to events.jsonl, system.db
Note over User,Dashboard: Sync: Push Events + Graduated Rules to Cloud
User->>CloudSync: Call client.sync()
CloudSync->>Brain: Read events.jsonl, load last_push_event_id from sync_state
CloudSync->>CloudSync: Filter pending events, batch (batch_size=500)
CloudSync->>CloudSync: Transform rows: deterministic UUIDs, table remap, JSON scrub
CloudSync->>Cloud: POST /sync (batched events)
Cloud->>Cloud: Upsert events, meta_rules (conflict-free append-only)
CloudSync->>Brain: Write new watermark to sync_state.json
CloudSync->>Brain: Return ingested_count
Note over User,Dashboard: Dashboard View (Async, Read-Only)
Dashboard->>Cloud: Query events, meta_rules, graduated lessons
Dashboard->>User: Render charts, learning funnel, meta-rule corpus
Note over Dashboard: Cloud never re-runs graduation or modifies local state
Note over User,Dashboard: Second Device: Resume Learning
User->>Brain: Set brain_dir (device B, existing brain)
Brain->>Migrate: device_uuid.get_or_create_device_id() → new device_id
Brain->>CloudSync: Call client.sync()
CloudSync->>Cloud: Pull new events/rules since last cursor
Cloud->>CloudSync: Return events for this device
CloudSync->>Brain: Ingest events, update local system.db
CloudSync->>Brain: Advance last_pull_cursor
Brain->>Brain: Continue learning loop (graduation, synthesis local)
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~75 minutes
Actionable comments posted: 55
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6681236a-68ce-4e4c-b663-7010e17c61fe
⛔ Files ignored due to path filters (1)
.claude/hooks/statusline/sprites-statusline.js is excluded by !.claude/**
📒 Files selected for processing (242)
.gitignore
Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py
Gradata/CHANGELOG.md
Gradata/docs/LEGACY_CLEANUP.md
Gradata/docs/architecture/cloud-monolith-v2.md
Gradata/docs/architecture/multi-tenant-future-proofing.md
Gradata/docs/cloud/dashboard.md
Gradata/docs/cloud/overview.md
Gradata/docs/concepts/meta-rules.md
Gradata/docs/specs/cloud-sync-and-pricing.md
Gradata/hooks/hooks.json
Gradata/migrations/supabase/014_corrections_unique.sql
Gradata/migrations/supabase/015_events_unique.sql
Gradata/migrations/supabase/016_brains_last_used_at.sql
Gradata/migrations/supabase/README.md
Gradata/scripts/backfill_to_cloud.py
Gradata/skills/core/session-start/SKILL.md
Gradata/src/gradata/__init__.py
Gradata/src/gradata/_brain_manifest.py
Gradata/src/gradata/_cloud_sync.py
Gradata/src/gradata/_config.py
Gradata/src/gradata/_config_paths.py
Gradata/src/gradata/_context_compile.py
Gradata/src/gradata/_context_packet.py
Gradata/src/gradata/_core.py
Gradata/src/gradata/_data_flow_audit.py
Gradata/src/gradata/_db.py
Gradata/src/gradata/_doctor.py
Gradata/src/gradata/_events.py
Gradata/src/gradata/_export_brain.py
Gradata/src/gradata/_fact_extractor.py
Gradata/src/gradata/_file_lock.py
Gradata/src/gradata/_http.py
Gradata/src/gradata/_installer.py
Gradata/src/gradata/_manifest_helpers.py
Gradata/src/gradata/_manifest_metrics.py
Gradata/src/gradata/_migrations/001_add_tenant_id.py
Gradata/src/gradata/_migrations/002_add_event_identity.py
Gradata/src/gradata/_migrations/003_add_sync_state.py
Gradata/src/gradata/_migrations/_runner.py
Gradata/src/gradata/_migrations/_ulid.py
Gradata/src/gradata/_migrations/device_uuid.py
Gradata/src/gradata/_migrations/fill_null_tenant.py
Gradata/src/gradata/_migrations/tenant_uuid.py
Gradata/src/gradata/_mine_transcripts.py
Gradata/src/gradata/_paths.py
Gradata/src/gradata/_query.py
Gradata/src/gradata/_stats.py
Gradata/src/gradata/_telemetry.py
Gradata/src/gradata/_tenant.py
Gradata/src/gradata/_text_utils.py
Gradata/src/gradata/_transcript.py
Gradata/src/gradata/_transcript_providers.py
Gradata/src/gradata/_types.py
Gradata/src/gradata/_validator.py
Gradata/src/gradata/_workers.py
Gradata/src/gradata/adapters/mem0.py
Gradata/src/gradata/audit.py
Gradata/src/gradata/brain.py
Gradata/src/gradata/brain_inspection.py
Gradata/src/gradata/cli.py
Gradata/src/gradata/cloud/client.py
Gradata/src/gradata/cloud/sync.py
Gradata/src/gradata/context_wrapper.py
Gradata/src/gradata/contrib/enhancements/eval_benchmark.py
Gradata/src/gradata/contrib/enhancements/install_manifest.py
Gradata/src/gradata/contrib/enhancements/quality_gates.py
Gradata/src/gradata/contrib/enhancements/truth_protocol.py
Gradata/src/gradata/contrib/patterns/__init__.py
Gradata/src/gradata/contrib/patterns/agent_modes.py
Gradata/src/gradata/contrib/patterns/context_brackets.py
Gradata/src/gradata/contrib/patterns/evaluator.py
Gradata/src/gradata/contrib/patterns/execute_qualify.py
Gradata/src/gradata/contrib/patterns/guardrails.py
Gradata/src/gradata/contrib/patterns/human_loop.py
Gradata/src/gradata/contrib/patterns/loop_detection.py
Gradata/src/gradata/contrib/patterns/mcp.py
Gradata/src/gradata/contrib/patterns/memory.py
Gradata/src/gradata/contrib/patterns/middleware.py
Gradata/src/gradata/contrib/patterns/orchestrator.py
Gradata/src/gradata/contrib/patterns/parallel.py
Gradata/src/gradata/contrib/patterns/pipeline.py
Gradata/src/gradata/contrib/patterns/q_learning_router.py
Gradata/src/gradata/contrib/patterns/rag.py
Gradata/src/gradata/contrib/patterns/reconciliation.py
Gradata/src/gradata/contrib/patterns/reflection.py
Gradata/src/gradata/contrib/patterns/sub_agents.py
Gradata/src/gradata/contrib/patterns/task_escalation.py
Gradata/src/gradata/contrib/patterns/tools.py
Gradata/src/gradata/contrib/patterns/tree_of_thoughts.py
Gradata/src/gradata/correction_detector.py
Gradata/src/gradata/daemon.py
Gradata/src/gradata/detection/addition_pattern.py
Gradata/src/gradata/enhancements/_sanitize.py
Gradata/src/gradata/enhancements/bandits/collaborative_filter.py
Gradata/src/gradata/enhancements/bandits/contextual_bandit.py
Gradata/src/gradata/enhancements/behavioral_engine.py
Gradata/src/gradata/enhancements/causal_chains.py
Gradata/src/gradata/enhancements/cluster_manager.py
Gradata/src/gradata/enhancements/clustering.py
Gradata/src/gradata/enhancements/contradiction_detector.py
Gradata/src/gradata/enhancements/dedup.py
Gradata/src/gradata/enhancements/diff_engine.py
Gradata/src/gradata/enhancements/edit_classifier.py
Gradata/src/gradata/enhancements/freshness.py
Gradata/src/gradata/enhancements/git_backfill.py
Gradata/src/gradata/enhancements/graduation/agent_graduation.py
Gradata/src/gradata/enhancements/graduation/judgment_decay.py
Gradata/src/gradata/enhancements/graduation/rules_distillation.py
Gradata/src/gradata/enhancements/graduation/scoring.py
Gradata/src/gradata/enhancements/instruction_cache.py
Gradata/src/gradata/enhancements/learning_pipeline.py
Gradata/src/gradata/enhancements/lesson_discriminator.py
Gradata/src/gradata/enhancements/llm_provider.py
Gradata/src/gradata/enhancements/llm_synthesizer.py
Gradata/src/gradata/enhancements/memory_taxonomy.py
Gradata/src/gradata/enhancements/meta_rules.py
Gradata/src/gradata/enhancements/meta_rules_storage.py
Gradata/src/gradata/enhancements/metrics.py
Gradata/src/gradata/enhancements/observation_hooks.py
Gradata/src/gradata/enhancements/pattern_extractor.py
Gradata/src/gradata/enhancements/pattern_integration.py
Gradata/src/gradata/enhancements/pipeline_rewriter.py
Gradata/src/gradata/enhancements/profiling/tone_profile.py
Gradata/src/gradata/enhancements/prompt_synthesizer.py
Gradata/src/gradata/enhancements/reporting.py
Gradata/src/gradata/enhancements/retrieval_fusion.py
Gradata/src/gradata/enhancements/router_warmstart.py
Gradata/src/gradata/enhancements/rule_canary.py
Gradata/src/gradata/enhancements/rule_context_bridge.py
Gradata/src/gradata/enhancements/rule_export.py
Gradata/src/gradata/enhancements/rule_integrity.py
Gradata/src/gradata/enhancements/rule_pipeline.py
Gradata/src/gradata/enhancements/rule_synthesizer.py
Gradata/src/gradata/enhancements/rule_to_hook.py
Grad
ata/src/gradata/enhancements/rule_verifier.pyGradata/src/gradata/enhancements/scoring/brain_scores.pyGradata/src/gradata/enhancements/scoring/calibration.pyGradata/src/gradata/enhancements/scoring/correction_tracking.pyGradata/src/gradata/enhancements/scoring/failure_detectors.pyGradata/src/gradata/enhancements/scoring/gate_calibration.pyGradata/src/gradata/enhancements/scoring/loop_intelligence.pyGradata/src/gradata/enhancements/scoring/memory_extraction.pyGradata/src/gradata/enhancements/scoring/reports.pyGradata/src/gradata/enhancements/scoring/success_conditions.pyGradata/src/gradata/enhancements/self_improvement/__init__.pyGradata/src/gradata/enhancements/self_improvement/_confidence.pyGradata/src/gradata/enhancements/self_improvement/_graduation.pyGradata/src/gradata/enhancements/similarity.pyGradata/src/gradata/enhancements/skill_export.pyGradata/src/gradata/events_bus.pyGradata/src/gradata/graph.pyGradata/src/gradata/hooks/_base.pyGradata/src/gradata/hooks/_generated_runner_core.pyGradata/src/gradata/hooks/_installer.pyGradata/src/gradata/hooks/_profiles.pyGradata/src/gradata/hooks/agent_graduation.pyGradata/src/gradata/hooks/agent_precontext.pyGradata/src/gradata/hooks/auto_correct.pyGradata/src/gradata/hooks/brain_maintain.pyGradata/src/gradata/hooks/claude_code.pyGradata/src/gradata/hooks/client.pyGradata/src/gradata/hooks/config_protection.pyGradata/src/gradata/hooks/config_validate.pyGradata/src/gradata/hooks/context_inject.pyGradata/src/gradata/hooks/ctx_watchdog.pyGradata/src/gradata/hooks/daemon.pyGradata/src/gradata/hooks/dispatch_post.pyGradata/src/gradata/hooks/duplicate_guard.pyGradata/src/gradata/hooks/generated_runner.pyGradata/src/gradata/hooks/generated_runner_post.pyGradata/src/gradata/hooks/graph_first_check.pyGradata/src/gradata/hooks/graph_session_track.pyGradata/src/gradata/hooks/implicit_feedback.pyGradata/src/gradata/hooks/inject_brain_rules.pyGradata/src/gradata/hooks/jit_inject.pyGradata/src/gradata/hooks/pre_compact.pyGradata/src/gr
adata/hooks/rule_enforcement.pyGradata/src/gradata/hooks/secret_scan.pyGradata/src/gradata/hooks/self_review.pyGradata/src/gradata/hooks/session_boot.pyGradata/src/gradata/hooks/session_close.pyGradata/src/gradata/hooks/session_persist.pyGradata/src/gradata/hooks/stale_hook_check.pyGradata/src/gradata/hooks/status_line.pyGradata/src/gradata/hooks/telemetry_summary.pyGradata/src/gradata/hooks/tool_failure_emit.pyGradata/src/gradata/hooks/tool_finding_capture.pyGradata/src/gradata/inspection.pyGradata/src/gradata/integrations/anthropic_adapter.pyGradata/src/gradata/integrations/openai_adapter.pyGradata/src/gradata/mcp_server.pyGradata/src/gradata/mcp_tools.pyGradata/src/gradata/middleware/__init__.pyGradata/src/gradata/middleware/_core.pyGradata/src/gradata/middleware/anthropic_adapter.pyGradata/src/gradata/middleware/crewai_adapter.pyGradata/src/gradata/middleware/langchain_adapter.pyGradata/src/gradata/middleware/openai_adapter.pyGradata/src/gradata/notifications.pyGradata/src/gradata/onboard.pyGradata/src/gradata/rules/rule_context.pyGradata/src/gradata/rules/rule_engine/__init__.pyGradata/src/gradata/rules/rule_engine/_formatting.pyGradata/src/gradata/rules/rule_ranker.pyGradata/src/gradata/rules/scope.pyGradata/src/gradata/safety.pyGradata/src/gradata/security/correction_hash.pyGradata/src/gradata/security/correction_provenance.pyGradata/src/gradata/security/manifest_signing.pyGradata/src/gradata/sidecar/watcher.pyGradata/tests/conftest.pyGradata/tests/test_agent_graduation.pyGradata/tests/test_bug_fixes.pyGradata/tests/test_cloud_client_sync.pyGradata/tests/test_cloud_row_push.pyGradata/tests/test_cloud_sync.pyGradata/tests/test_cluster_injection.pyGradata/tests/test_ctx_watchdog.pyGradata/tests/test_doctor_cloud.pyGradata/tests/test_emit_pii_redaction.pyGradata/tests/test_graph_enforcement.pyGradata/tests/test_hooks_intelligence.pyGradata/tests/test_hooks_learning.pyGradata/tests/test_implicit_feedback.pyGradata/tests/test_inject_watchdog_phases.pyGradata/tests
/test_integration_workflow.pyGradata/tests/test_lesson_applications.pyGradata/tests/test_llm_synthesizer.pyGradata/tests/test_mem0_adapter.pyGradata/tests/test_meta_rule_generalization.pyGradata/tests/test_meta_rules.pyGradata/tests/test_migration_002_event_identity.pyGradata/tests/test_migration_003_sync_state.pyGradata/tests/test_multi_brain_simulation.pyGradata/tests/test_pipeline_e2e.pyGradata/tests/test_pre_compact.pyGradata/tests/test_rule_pipeline.pyGradata/tests/test_rule_synthesizer.pyGradata/tests/test_session_close_loop_state.pyGradata/tests/test_skill_export.pyGradata/tests/test_transcript.py
💤 Files with no reviewable changes (1)
- Gradata/src/gradata/enhancements/self_improvement/_graduation.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
Gradata/src/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/src/**/*.py: Prefer `sentence-transformers` for local embeddings, `google-genai` for Gemini embeddings, `cryptography` for AES-GCM encrypted system.db, `bm25s` for BM25 rule ranking, and `mem0ai` for external memory adapters — guard all optional dependency imports with `try / except ImportError` at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare `except: pass` — use typed exceptions or at minimum `logger.warning(...)` with `exc_info=True` to avoid silent failure in a memory product
Never import from out-of-scope sibling directories `../Sprites/` or `../Hausgem/` within `gradata/*` code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to `../Sprites/`, `../Hausgem/`, email addresses, OneDrive paths, or Sprites-specific examples from inside `gradata/*`
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
Files:
Gradata/src/gradata/enhancements/instruction_cache.pyGradata/src/gradata/_file_lock.pyGradata/src/gradata/contrib/patterns/memory.pyGradata/src/gradata/enhancements/clustering.pyGradata/src/gradata/enhancements/router_warmstart.pyGradata/src/gradata/_migrations/_runner.pyGradata/src/gradata/_http.pyGradata/src/gradata/_migrations/_ulid.pyGradata/src/gradata/__init__.pyGradata/src/gradata/_config.pyGradata/src/gradata/enhancements/rule_export.pyGradata/src/gradata/_types.pyGradata/src/gradata/enhancements/pattern_extractor.pyGradata/src/gradata/_tenant.pyGradata/src/gradata/contrib/patterns/tools.pyGradata/src/gradata/enhancements/bandits/contextual_bandit.pyGradata/src/gradata/events_bus.pyGradata/src/gradata/_context_compile.pyGradata/src/gradata/contrib/patterns/evaluator.pyGradata/src/gradata/contrib/patterns/__init__.pyGradata/src/gradata/enhancements/scoring/brain_scores.pyGradata/src/gradata/enhancements/rule_verifier.pyGradata/src/gradata/enhancements/diff_engine.pyGradata/src/gradata/_migrations/device_uuid.pyGradata/src/gradata/_migrations/002_add_event_identity.pyGradata/src/gradata/_migrations/tenant_uuid.pyGradata/src/gradata/_db.pyGradata/src/gradata/_mine_transcripts.pyGradata/src/gradata/contrib/patterns/sub_agents.pyGradata/src/gradata/_data_flow_audit.pyGradata/src/gradata/cloud/sync.pyGradata/src/gradata/_text_utils.pyGradata/src/gradata/contrib/patterns/middleware.pyGradata/src/gradata/contrib/patterns/pipeline.pyGradata/src/gradata/enhancements/dedup.pyGradata/src/gradata/enhancements/rule_context_bridge.pyGradata/src/gradata/contrib/enhancements/truth_protocol.pyGradata/src/gradata/contrib/patterns/parallel.pyGradata/src/gradata/enhancements/lesson_discriminator.pyGradata/src/gradata/enhancements/freshness.pyGradata/src/gradata/audit.pyGradata/src/gradata/enhancements/scoring/loop_intelligence.pyGradata/src/gradata/contrib/patterns/q_learning_router.pyGradata/src/gradata/enhancements/edit_classifier.pyGradata/src/gradata/enhancements/pattern_int
egration.pyGradata/src/gradata/context_wrapper.pyGradata/src/gradata/enhancements/contradiction_detector.pyGradata/src/gradata/_migrations/fill_null_tenant.pyGradata/src/gradata/enhancements/pipeline_rewriter.pyGradata/src/gradata/_workers.pyGradata/src/gradata/enhancements/profiling/tone_profile.pyGradata/src/gradata/enhancements/rule_canary.pyGradata/src/gradata/contrib/patterns/tree_of_thoughts.pyGradata/src/gradata/contrib/patterns/task_escalation.pyGradata/src/gradata/enhancements/skill_export.pyGradata/src/gradata/_transcript_providers.pyGradata/src/gradata/_transcript.pyGradata/src/gradata/_fact_extractor.pyGradata/src/gradata/enhancements/scoring/reports.pyGradata/src/gradata/contrib/patterns/execute_qualify.pyGradata/src/gradata/_migrations/003_add_sync_state.pyGradata/src/gradata/contrib/patterns/loop_detection.pyGradata/src/gradata/brain_inspection.pyGradata/src/gradata/contrib/patterns/reconciliation.pyGradata/src/gradata/contrib/enhancements/quality_gates.pyGradata/src/gradata/adapters/mem0.pyGradata/src/gradata/enhancements/similarity.pyGradata/src/gradata/enhancements/graduation/judgment_decay.pyGradata/src/gradata/enhancements/metrics.pyGradata/src/gradata/enhancements/graduation/agent_graduation.pyGradata/src/gradata/enhancements/memory_taxonomy.pyGradata/src/gradata/enhancements/graduation/rules_distillation.pyGradata/src/gradata/cloud/client.pyGradata/src/gradata/contrib/patterns/orchestrator.pyGradata/src/gradata/enhancements/reporting.pyGradata/src/gradata/_validator.pyGradata/src/gradata/enhancements/git_backfill.pyGradata/src/gradata/_brain_manifest.pyGradata/src/gradata/enhancements/scoring/gate_calibration.pyGradata/src/gradata/_migrations/001_add_tenant_id.pyGradata/src/gradata/enhancements/retrieval_fusion.pyGradata/src/gradata/enhancements/llm_provider.pyGradata/src/gradata/_manifest_helpers.pyGradata/src/gradata/contrib/patterns/context_brackets.pyGradata/src/gradata/_installer.pyGradata/src/gradata/enhancements/learning_pipeline.pyGrada
ta/src/gradata/enhancements/behavioral_engine.pyGradata/src/gradata/enhancements/scoring/memory_extraction.pyGradata/src/gradata/enhancements/prompt_synthesizer.pyGradata/src/gradata/enhancements/scoring/correction_tracking.pyGradata/src/gradata/_doctor.pyGradata/src/gradata/contrib/patterns/mcp.pyGradata/src/gradata/enhancements/self_improvement/_confidence.pyGradata/src/gradata/contrib/patterns/agent_modes.pyGradata/src/gradata/_cloud_sync.pyGradata/src/gradata/enhancements/scoring/failure_detectors.pyGradata/src/gradata/enhancements/llm_synthesizer.pyGradata/src/gradata/enhancements/bandits/collaborative_filter.pyGradata/src/gradata/contrib/patterns/rag.pyGradata/src/gradata/enhancements/rule_to_hook.pyGradata/src/gradata/enhancements/scoring/calibration.pyGradata/src/gradata/_export_brain.pyGradata/src/gradata/contrib/patterns/human_loop.pyGradata/src/gradata/enhancements/scoring/success_conditions.pyGradata/src/gradata/_stats.pyGradata/src/gradata/detection/addition_pattern.pyGradata/src/gradata/enhancements/causal_chains.pyGradata/src/gradata/enhancements/_sanitize.pyGradata/src/gradata/enhancements/rule_integrity.pyGradata/src/gradata/correction_detector.pyGradata/src/gradata/contrib/patterns/reflection.pyGradata/src/gradata/_telemetry.pyGradata/src/gradata/contrib/enhancements/install_manifest.pyGradata/src/gradata/_manifest_metrics.pyGradata/src/gradata/enhancements/rule_synthesizer.pyGradata/src/gradata/enhancements/meta_rules.pyGradata/src/gradata/daemon.pyGradata/src/gradata/_events.pyGradata/src/gradata/cli.pyGradata/src/gradata/contrib/enhancements/eval_benchmark.pyGradata/src/gradata/graph.pyGradata/src/gradata/brain.pyGradata/src/gradata/_query.pyGradata/src/gradata/_context_packet.pyGradata/src/gradata/_config_paths.pyGradata/src/gradata/_paths.pyGradata/src/gradata/enhancements/observation_hooks.pyGradata/src/gradata/enhancements/rule_pipeline.pyGradata/src/gradata/contrib/patterns/guardrails.pyGradata/src/gradata/enhancements/meta_rules_storage.py
Gradata/src/gradata/enhancements/cluster_manager.pyGradata/src/gradata/enhancements/self_improvement/__init__.pyGradata/src/gradata/enhancements/graduation/scoring.pyGradata/src/gradata/_core.py
🪛 LanguageTool
Gradata/docs/specs/cloud-sync-and-pricing.md
[grammar] ~102-~102: Ensure spelling is correct
Context: ...vior: - Triggered on Stop hook or every 5min when events accumulated. - Pushes since...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~269-~269: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...vent push logged with content_hash. - Every ACL change emits an acl_changed event...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[grammar] ~290-~290: Ensure spelling is correct
Context: ... cadence:** hourly for Personal+, every 15min for Teams+, continuous WAL for Enterpri...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
Gradata/docs/cloud/overview.md
[style] ~3-~3: ‘on top of that’ might be wordy. Consider a shorter alternative.
Context: ...uity, team sharing, and managed backups on top of that local loop. ## What's in the SDK vs th...
(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)
🪛 markdownlint-cli2 (0.22.1)
Gradata/skills/core/session-start/SKILL.md
[warning] 32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Gradata/docs/LEGACY_CLEANUP.md
[warning] 16-16: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 27-27: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 32-32: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 37-37: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 44-44: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
Gradata/docs/concepts/meta-rules.md
[warning] 50-50: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
Gradata/migrations/supabase/README.md
[warning] 32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Gradata/docs/specs/cloud-sync-and-pricing.md
[warning] 35-35: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
[warning] 60-60: Fenced code blocks should be surrounded by blank lines
(MD031, blanks-around-fences)
[warning] 81-81: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 95-95: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 109-109: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Gradata/docs/architecture/multi-tenant-future-proofing.md
[warning] 21-21: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
| Run: streamlit run C:/Users/olive/SpritesWork/brain/scripts/dashboard.py | ||
| """ | ||
|
|
||
| import json | ||
| import re | ||
| import sqlite3 | ||
| from datetime import datetime | ||
| from pathlib import Path | ||
|
|
||
| import pandas as pd | ||
| import plotly.graph_objects as go | ||
| import streamlit as st | ||
|
|
||
| # --------------------------------------------------------------------------- | ||
| # Config | ||
| # --------------------------------------------------------------------------- | ||
| BRAIN_DIR = Path("C:/Users/olive/SpritesWork/brain") | ||
| DB_PATH = BRAIN_DIR / "system.db" | ||
| EVENTS_PATH = BRAIN_DIR / "events.jsonl" | ||
| LESSONS_PATH = BRAIN_DIR / "lessons.md" | ||
| PROSPECTS_DIR = BRAIN_DIR / "prospects" | ||
| BRIEF_PATH = BRAIN_DIR / "morning-brief.md" | ||
| TASKS_DIR = Path("C:/Users/olive/.claude/scheduled-tasks") |
Remove the user-specific absolute paths from the archived script.
This hardcodes C:/Users/olive/... in both the docstring and runtime config, which leaks a private workstation path into the repo and makes the archive non-portable everywhere else. Parameterize these via env/CLI args or derive them relative to the script.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py` around lines 4
- 26, The file hardcodes user-specific absolute paths (notably the docstring run
path and constants BRAIN_DIR, DB_PATH, EVENTS_PATH, LESSONS_PATH, PROSPECTS_DIR,
BRIEF_PATH, TASKS_DIR); change these to be derived from environment/CLI inputs
or relative locations: replace the literal "C:/Users/olive/..." usage by reading
a base path from an environment variable (e.g., BRAIN_DIR_ENV) or a CLI arg (or
default to Path.home() / "SpritesWork/brain"), then compute DB_PATH,
EVENTS_PATH, LESSONS_PATH, PROSPECTS_DIR, BRIEF_PATH, and TASKS_DIR from that
base; also update the docstring run example to show a generic placeholder (e.g.,
streamlit run path/to/dashboard.py) rather than the absolute user path.
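A minimal sketch of the parameterization described above. The environment variable names `GRADATA_BRAIN_DIR` and `GRADATA_TASKS_DIR`, the helper name `resolve_brain_paths`, and the home-relative defaults are all assumptions made for illustration; none of them appear in the PR.

```python
import os
from pathlib import Path


def resolve_brain_paths(env=None):
    """Derive the dashboard paths from the environment instead of a fixed user folder.

    GRADATA_BRAIN_DIR and GRADATA_TASKS_DIR are hypothetical variable names.
    """
    env = os.environ if env is None else env
    # Fall back to home-relative defaults so no workstation path is baked in.
    brain_dir = Path(env.get("GRADATA_BRAIN_DIR", Path.home() / "brain"))
    tasks_dir = Path(
        env.get("GRADATA_TASKS_DIR", Path.home() / ".claude" / "scheduled-tasks")
    )
    return {
        "BRAIN_DIR": brain_dir,
        "DB_PATH": brain_dir / "system.db",
        "EVENTS_PATH": brain_dir / "events.jsonl",
        "LESSONS_PATH": brain_dir / "lessons.md",
        "PROSPECTS_DIR": brain_dir / "prospects",
        "BRIEF_PATH": brain_dir / "morning-brief.md",
        "TASKS_DIR": tasks_dir,
    }


paths = resolve_brain_paths({"GRADATA_BRAIN_DIR": "/tmp/brain"})
print(paths["DB_PATH"])
```

Passing the environment as an argument keeps the function easy to exercise without mutating `os.environ`.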
| DELETE FROM events a | ||
| USING events b | ||
| WHERE a.brain_id = b.brain_id | ||
| AND a.type = b.type | ||
| AND a.created_at = b.created_at | ||
| AND a.ctid > b.ctid; |
This dedupe key is too coarse for raw events.
Using only (brain_id, type, created_at) can collapse legitimate same-type events that happen at the same timestamp, and the DELETE makes that loss irreversible. Given this PR’s move toward explicit event identity/idempotency, the uniqueness boundary should be a real event identifier, not type + timestamp.
Also applies to: 23-37
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/migrations/supabase/015_events_unique.sql` around lines 14 - 19, The
DELETE in the migration is using a too-coarse dedupe key (a.brain_id, a.type,
a.created_at) which can remove legitimate simultaneous events; update the
deduplication to use a true event identifier (for example an event_id or
idempotency_key column) instead of type+timestamp—modify the DELETE ... USING
query to compare a.event_id = b.event_id (or the appropriate unique identifier
column) and only delete duplicates based on that stable identifier, and if such
a column does not exist add a non-null unique event identifier to the events
table first and rework the dedupe logic; apply the same fix to the analogous
block referenced in lines 23-37.
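To make the difference between the two dedupe boundaries concrete, here is a small in-memory sketch. It is not the migration itself: the `event_id` field is a hypothetical stand-in for whatever stable identifier the events table ends up carrying, and the sample rows are invented.

```python
def dedupe(events, key):
    """Keep the first event seen for each key, preserving input order."""
    seen = set()
    kept = []
    for ev in events:
        k = key(ev)
        if k not in seen:
            seen.add(k)
            kept.append(ev)
    return kept


# Two distinct events share a timestamp; the third row is a true re-push of e1.
# `event_id` is a hypothetical identifier column, not taken from the PR.
events = [
    {"event_id": "e1", "brain_id": "b1", "type": "correction", "created_at": "2026-04-23T10:00:00Z"},
    {"event_id": "e2", "brain_id": "b1", "type": "correction", "created_at": "2026-04-23T10:00:00Z"},
    {"event_id": "e1", "brain_id": "b1", "type": "correction", "created_at": "2026-04-23T10:00:00Z"},
]

coarse = dedupe(events, lambda e: (e["brain_id"], e["type"], e["created_at"]))
by_id = dedupe(events, lambda e: e["event_id"])

print(len(coarse), len(by_id))
```

The coarse key drops the legitimate second event along with the real duplicate, while the identifier-based key removes only the re-push.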
| with open(events_jsonl) as f: | ||
| for line in f: | ||
| line = line.strip() | ||
| if not line: | ||
| continue | ||
| try: | ||
| ev = json.loads(line) | ||
| except json.JSONDecodeError: | ||
| continue |
Read events.jsonl as UTF-8 explicitly.
Line 57 uses the platform default encoding. On Windows or other non-UTF-8 locales, one non-ASCII event is enough to crash the backfill before any sync happens.
Suggested fix
- with open(events_jsonl) as f:
+ with open(events_jsonl, encoding="utf-8", errors="replace") as f:
📝 Committable suggestion
| with open(events_jsonl) as f: | |
| for line in f: | |
| line = line.strip() | |
| if not line: | |
| continue | |
| try: | |
| ev = json.loads(line) | |
| except json.JSONDecodeError: | |
| continue | |
| with open(events_jsonl, encoding="utf-8", errors="replace") as f: | |
| for line in f: | |
| line = line.strip() | |
| if not line: | |
| continue | |
| try: | |
| ev = json.loads(line) | |
| except json.JSONDecodeError: | |
| continue |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/scripts/backfill_to_cloud.py` around lines 57 - 65, The file reading
loop currently opens events_jsonl with the platform default encoding which can
fail on non-UTF-8 systems; update the open call that reads events_jsonl (the
"with open(events_jsonl) as f:" line) to explicitly specify encoding='utf-8'
(optionally add errors='replace' or 'ignore' if you prefer resilient parsing) so
json.loads receives proper UTF-8 text and non-ASCII events do not crash the
backfill.
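The behaviour of the explicit-encoding read can be checked with a throwaway file. The sketch below assumes a hypothetical `load_events` helper (the real backfill script's function names are not shown here) and adds a skipped-line counter that is not part of the suggested fix.

```python
import json
import tempfile
from pathlib import Path


def load_events(path):
    """Return (events, skipped) from a JSONL file decoded explicitly as UTF-8."""
    events, skipped = [], 0
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                skipped += 1
    return events, skipped


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "events.jsonl"
    sample.write_bytes(
        '{"type": "correction", "text": "café"}\n'.encode("utf-8")
        + b'{"type": "note", "text": "bad \xff byte"}\n'  # invalid UTF-8 byte
        + b"not json\n"
    )
    events, skipped = load_events(sample)

print(len(events), skipped)
```

Counting skipped lines rather than discarding them silently makes it easier to notice when a backfill is losing data.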
| Read `C:/Users/olive/SpritesWork/brain/continuation.md`. If exists, follow its Resume Point, then archive: `python C:/Users/olive/SpritesWork/brain/scripts/continuation.py archive`. If missing, continue. | ||
|
|
||
| ## Step 2: Load Context (parallel batch) | ||
|
|
||
| Fire all at once — no dependencies: | ||
| 1. Read `domain/pipeline/startup-brief.md` (pipeline snapshot, handoff section) *(verify path — may be stale)* | ||
| 2. Read `C:/Users/olive/SpritesWork/brain/lessons.md` (scan for mistakes to avoid) | ||
| 3. Check Google Calendar today + 30 days (demos, calls, meetings) | ||
| 4. Read `C:/Users/olive/SpritesWork/brain/loop-state.md` (session number, open items) *(auto-regenerated by session_close hook — always fresh)* | ||
| 5. Read `C:/Users/olive/SpritesWork/brain/brain_prompt.md` (soul.md VOICE mandatories + graduated RULE-level lessons) |
Remove machine-specific paths and private/internal references from this skill.
Hardcoding C:/Users/olive/..., SpritesWork, Oliver, and sprites_context.md makes the skill non-portable and leaks private repo/user details into a shipped artifact. Please switch these to runtime placeholders or repo-relative/public paths.
As per coding guidelines, "Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*".
Also applies to: 38-52
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/skills/core/session-start/SKILL.md` around lines 12 - 21, The
SKILL.md contains hardcoded, machine-specific paths and private names (e.g.,
"C:/Users/olive/SpritesWork/...", "SpritesWork", "Oliver", and filenames like
continuation.md, lessons.md, loop-state.md, brain_prompt.md and
domain/pipeline/startup-brief.md) which must be replaced with portable
placeholders or repo-relative references; update the Step 1/Step 2 file
references in this document (and the similar occurrences around lines 38–52) to
use runtime variables or repo-relative paths (e.g.,
{{WORKSPACE}}/brain/continuation.md or ./brain/continuation.md) and remove any
personal identifiers, ensuring each bullet clearly indicates a configurable
placeholder (or public path) and add a short note that these files are expected
to be present at runtime rather than hardcoded to a local user folder.
| ``` | ||
| [check] S[N] loaded | [today's calendar or "clear"] | ||
| [tasks] Top 2-3 from loop-state open items | ||
| [alert] Only if something is broken/overdue — otherwise omit | ||
| ``` |
Add a language to the fenced code block.
This currently trips markdownlint MD040. Use something like ```text here.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/skills/core/session-start/SKILL.md` around lines 32 - 36, The fenced
code block in SKILL.md (the block showing "[check] S[N] loaded ..." etc.) lacks
a language tag and triggers markdownlint MD040; update that fenced block to
start with a language label (e.g., use "```text" instead of "```") so the block
is explicitly marked as plain text and the linter error is resolved.
| try: | ||
| import json as _json | ||
|
|
||
| disp_path.write_text( | ||
| _json.dumps(tracker.to_dict(), indent=2), encoding="utf-8", | ||
| _json.dumps(tracker.to_dict(), indent=2), | ||
| encoding="utf-8", | ||
| ) |
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Write disposition.json through the atomic JSON helper.
This overwrites the file in place during a best-effort phase. If the process dies after truncation, the next run loses the entire disposition state. Route this through the repo's atomic JSON writer instead.
As per coding guidelines, "Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/rule_pipeline.py` around lines 529 - 535,
The code currently calls disp_path.write_text(_json.dumps(tracker.to_dict(),
indent=2), ...) which can truncate disposition.json mid-write; change this to
use the repository's atomic JSON writer (the atomic JSON helper) to write
tracker.to_dict() to disp_path atomically instead of using disp_path.write_text;
remove the direct json dump and call the atomic helper (passing the dict and
desired indent/encoding) so disposition.json is written via the repo's
atomic-write utility.
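The repository's own atomic JSON helper is not shown in this review, so the following is only a generic sketch of the write-to-temp-then-rename technique such a helper typically uses. The function name `atomic_write_json` is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path, payload, indent=2):
    """Write JSON to a temp file beside the target, then rename it into place.

    Hypothetical helper; the repo's real atomic writer may differ.
    """
    path = Path(path)
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=indent)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        # Leave the existing target untouched and remove the partial temp file.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "disposition.json"
    atomic_write_json(target, {"rules": 3})
    written = json.loads(target.read_text(encoding="utf-8"))
    leftovers = [p.name for p in Path(tmp).iterdir()]

print(written, leftovers)
```

If the process dies before `os.replace`, the previous `disposition.json` is still intact, which is the property the in-place `write_text` call lacks.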
| brain.emit( | ||
| HOOK_DEMOTED, | ||
| source, | ||
| { | ||
| "slug": slug, | ||
| "hook_path": str(target), | ||
| }, | ||
| ) |
Include rule_id in HOOK_DEMOTED events.
count_human_reversals() only counts reversal events whose payload contains a matching rule_id, but this emit only writes slug/hook_path. That means a manual demotion never feeds back into the empirical gate, so the same rule can be auto-promoted again immediately after being removed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/rule_to_hook.py` around lines 837 - 844, The
HOOK_DEMOTED emit is missing the rule identifier required by
count_human_reversals(), so include the rule's id in the emitted payload (e.g.,
add "rule_id": rule.id or "rule_id": rule_id depending on the local symbol
available) when calling brain.emit(HOOK_DEMOTED, source, {...}); ensure you
reference the actual Rule object or local rule_id variable used in this module
so manual demotions are counted by count_human_reversals().
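A toy model of the mismatch may help when verifying this finding. The event shape and the `count_reversals` function below are simplified stand-ins written for illustration; the real `count_human_reversals()` is not reproduced here and may filter differently.

```python
HOOK_DEMOTED = "HOOK_DEMOTED"


def count_reversals(events, rule_id):
    """Stand-in counter: counts demotions whose payload names the given rule_id."""
    return sum(
        1
        for ev in events
        if ev["type"] == HOOK_DEMOTED and ev["data"].get("rule_id") == rule_id
    )


# Payload as currently emitted: slug and hook_path only (values are invented).
without_id = [
    {"type": HOOK_DEMOTED, "data": {"slug": "no-emoji", "hook_path": "hooks/no-emoji.py"}}
]
# Payload with the rule identifier included.
with_id = [
    {
        "type": HOOK_DEMOTED,
        "data": {"rule_id": "r-42", "slug": "no-emoji", "hook_path": "hooks/no-emoji.py"},
    }
]

print(count_reversals(without_id, "r-42"), count_reversals(with_id, "r-42"))
```

Under these assumptions the demotion without `rule_id` is invisible to the counter, which matches the re-promotion risk described above.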
| re.compile( | ||
| r"(?:by|before|on|until)\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday|\d{1,2}[/-]\d{1,2})", | ||
| re.I, | ||
| ), |
Don't treat bare date fragments as action items.
This regex now matches standalone phrases like on Monday, and extract() persists that fragment via match.group(0). A user saying the meeting is on Monday will now create an action_item even though no action was requested, which pollutes prospective memory and can duplicate the temporal fact from the same sentence.
Proposed fix
_ACTION_PATTERNS = [
re.compile(r"(?:need to|should|will|going to|have to|must)\s+(.+?)(?:\.|$)", re.I),
re.compile(r"(?:follow up|schedule|send|check|review|prepare|draft)\s+(.+?)(?:\.|$)", re.I),
- re.compile(
- r"(?:by|before|on|until)\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday|\d{1,2}[/-]\d{1,2})",
- re.I,
- ),
]
If you still want deadline-aware action items, fold the deadline into the verb-based patterns so the stored fact is the full action, e.g. send the report by Monday, not just by Monday.
Also applies to: 152-165
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/scoring/memory_extraction.py` around lines
87 - 90, The regex that matches "(?:by|before|on|until)\s+(...)" in
memory_extraction.py is capturing standalone date fragments (e.g., "on Monday")
and extract() stores match.group(0) as an action_item; change the patterns so
date/deadline fragments are only captured when attached to a verb/action phrase
(e.g., require a verb or imperative before the deadline or fold the deadline
into existing verb-based patterns like the verb-driven pattern list used by
extract()); specifically, update the loose date-only pattern(s) at the places
referenced (the re.compile call and the similar block at lines ~152-165) to
either (a) remove the standalone "(?:by|before|on|until) ..." pattern, or (b)
require a preceding verb token or phrase (e.g., using a positive lookbehind or
adding
"\b(?:send|submit|remind|schedule|prepare|complete|...)\b.*?(?:by|before|on|until)\s+..."
or merge the deadline part into the verb-based regexes), and ensure extract()
continues to use the full matched action phrase rather than a bare date
fragment.
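One way to fold the deadline into a verb-based pattern is sketched below. The pattern name `_ACTION_WITH_DEADLINE`, the helper `extract_deadline_actions`, and the exact verb list are illustrative assumptions, not the module's actual definitions.

```python
import re

# A deadline only counts when it follows one of the action verbs in the same sentence.
_ACTION_WITH_DEADLINE = re.compile(
    r"\b(?:follow up|schedule|send|check|review|prepare|draft)\b"
    r"[^.]*?"  # the object of the action, without crossing a sentence boundary
    r"\b(?:by|before|on|until)\s+"
    r"(?:monday|tuesday|wednesday|thursday|friday|saturday|sunday|\d{1,2}[/-]\d{1,2})\b",
    re.I,
)


def extract_deadline_actions(text):
    """Return full action phrases that carry a deadline, never bare date fragments."""
    return [m.group(0) for m in _ACTION_WITH_DEADLINE.finditer(text)]


print(extract_deadline_actions("Please send the report by Monday."))
print(extract_deadline_actions("The meeting is on Monday."))
```

Because the match starts at the verb, `match.group(0)` is the whole action phrase, so a plain statement about when a meeting happens produces no action item.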
| # 6. Output not becoming bland (from metrics module) | ||
| try: | ||
| from gradata.enhancements.metrics import compute_metrics | ||
|
|
||
| m = compute_metrics(db_path, window) | ||
| blandness = m.get("blandness_score", 0.0) if isinstance(m, dict) else getattr(m, "blandness_score", 0.0) | ||
| blandness = ( | ||
| m.get("blandness_score", 0.0) | ||
| if isinstance(m, dict) | ||
| else getattr(m, "blandness_score", 0.0) | ||
| ) | ||
| bland_ok = blandness < 0.70 | ||
| conditions.append(ConditionResult( | ||
| name="output_not_bland", | ||
| met=bland_ok, | ||
| current_value=round(blandness, 4), | ||
| baseline_value=0.70, | ||
| trend="varied" if bland_ok else "generic", | ||
| detail=f"Blandness: {blandness:.2f} (threshold: 0.70)", | ||
| )) | ||
| conditions.append( | ||
| ConditionResult( | ||
| name="output_not_bland", | ||
| met=bland_ok, | ||
| current_value=round(blandness, 4), | ||
| baseline_value=0.70, | ||
| trend="varied" if bland_ok else "generic", | ||
| detail=f"Blandness: {blandness:.2f} (threshold: 0.70)", | ||
| ) | ||
| ) | ||
| except Exception: | ||
| pass | ||
|
|
Don’t silently swallow errors in the blandness (compute_metrics) success-condition path.
The blandness evaluation uses except Exception: pass, which will hide import errors or unexpected metric-shape issues, resulting in “mysteriously” missing/incorrect output_not_bland condition state.
Suggested fix
+import logging
+logger = logging.getLogger(__name__)
@@
 try:
     from gradata.enhancements.metrics import compute_metrics
@@
     conditions.append(
         ConditionResult(
             name="output_not_bland",
             met=bland_ok,
             current_value=round(blandness, 4),
             baseline_value=0.70,
             trend="varied" if bland_ok else "generic",
             detail=f"Blandness: {blandness:.2f} (threshold: 0.70)",
         )
     )
-except Exception:
-    pass
+except Exception:
+    logger.warning("Failed to compute blandness success condition", exc_info=True)
As per coding guidelines, “Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product”.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/scoring/success_conditions.py` around lines
256 - 279, The try/except around the blandness check silently swallows errors;
update the block where you call gradata.enhancements.metrics.compute_metrics and
build the ConditionResult (the code that computes blandness, bland_ok, and
appends ConditionResult("output_not_bland")) to catch specific errors (e.g.,
ImportError, Exception as e) rather than a bare except, log a warning including
exc_info=True via the module logger or processLogger, and ensure you still
append a reasonable ConditionResult when metrics cannot be computed (e.g.,
met=False or met=None with detail describing the exception) so the pipeline
surfaces the failure instead of disappearing the condition.
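One way to read the request above is sketched below: log the failure and still emit a condition. The `ConditionResult` dataclass is a stand-in built from the field names visible in the snippet, `blandness_condition` is a hypothetical helper, and the `trend="unknown"` value for the failure case is an assumption.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class ConditionResult:
    """Stand-in for the real class; field names copied from the snippet under review."""
    name: str
    met: bool
    current_value: float
    baseline_value: float
    trend: str
    detail: str


def blandness_condition(compute_metrics, db_path, window, threshold=0.70):
    """Always return a ConditionResult, even when metrics cannot be computed."""
    try:
        m = compute_metrics(db_path, window)
        blandness = (
            m.get("blandness_score", 0.0)
            if isinstance(m, dict)
            else getattr(m, "blandness_score", 0.0)
        )
    except Exception as exc:
        # Log with the traceback, then surface the failure as an unmet condition.
        logger.warning("Failed to compute blandness success condition", exc_info=True)
        return ConditionResult(
            name="output_not_bland",
            met=False,
            current_value=0.0,
            baseline_value=threshold,
            trend="unknown",  # assumed label for the failure case
            detail=f"Blandness unavailable: {exc!r}",
        )
    bland_ok = blandness < threshold
    return ConditionResult(
        name="output_not_bland",
        met=bland_ok,
        current_value=round(blandness, 4),
        baseline_value=threshold,
        trend="varied" if bland_ok else "generic",
        detail=f"Blandness: {blandness:.2f} (threshold: {threshold:.2f})",
    )
```

Passing `compute_metrics` in as an argument keeps the sketch self-contained; the real code imports it inside the `try` block, so an import failure would need to be caught the same way.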
try:
    metas = load_meta_rules(db_path)
except Exception:
    return []
Log meta-rule load failures instead of silently dropping them.
With include_meta=True, any DB/schema error in load_meta_rules() currently degrades the export to "no meta-principles" with no signal. Please at least log the exception before returning [], so partial exports are diagnosable.
As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/skill_export.py` around lines 168 - 171, The
code swallows errors from load_meta_rules() when include_meta=True; change the
bare except to catch Exception as e and log the failure before returning an
empty list so failures are visible: replace the current except block with
something like "except Exception as e: logger.warning('Failed to load meta rules
for include_meta export', exc_info=True)" (ensure you use the module logger or
import one) and then return [] — reference load_meta_rules, include_meta and the
metas assignment to locate the change.
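The guideline quoted above also allows typed exceptions. A minimal sketch of that variant follows; `load_metas_or_empty` is a hypothetical wrapper, and catching `sqlite3.Error` is an assumption about what `load_meta_rules` raises on DB or schema problems.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def load_metas_or_empty(load_meta_rules, db_path):
    """Return meta-rules, or [] with a logged warning if the load fails.

    `load_meta_rules` is passed in so the sketch stays self-contained.
    """
    try:
        return list(load_meta_rules(db_path))
    except sqlite3.Error:
        # Keep the export going, but leave a traceback behind for diagnosis.
        logger.warning(
            "Failed to load meta rules for include_meta export (db=%s)",
            db_path,
            exc_info=True,
        )
        return []
```

Narrowing the exception type means unrelated bugs (for example a `TypeError` in the loader) still propagate instead of being reported as "no meta-principles".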
 ### 1. Local-first stays the source of truth
-SDK writes to local SQLite + jsonl. Cloud is a **sync target + shared meta-rule source + proprietary scoring service**. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.
+SDK writes to local SQLite + jsonl and runs the full learning loop (graduation, synthesis, rule-to-hook promotion) locally. Cloud is a **sync target + dashboard + future team + future shared-corpus surface** — not a gate on the local loop. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.
Fix markdownlint MD022: add blank line after heading
### 1. Local-first stays the source of truth is not followed by a blank line before the bullet list, which triggers MD022.
✅ Proposed change
### 1. Local-first stays the source of truth
+
SDK writes to local SQLite + jsonl and runs the full learning loop (graduation, synthesis, rule-to-hook promotion) locally. Cloud is a **sync target + dashboard + future team + future shared-corpus surface** — not a gate on the local loop. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 21-21: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/docs/architecture/multi-tenant-future-proofing.md` around lines 21 -
22, The heading "### 1. Local-first stays the source of truth" violates
markdownlint MD022 because it is not followed by a blank line; fix this by
inserting a single blank line immediately after that heading (i.e., add an empty
line between the heading and the following paragraph/bullet list) so the
document conforms to MD022 while keeping the existing heading text and
subsequent content unchanged.
 # Dashboard

-The Gradata Cloud dashboard is a Next.js app at [app.gradata.ai](https://app.gradata.ai). It wraps the same data the local `brain.manifest.json` exposes, plus Cloud-only views for meta-rule synthesis, team management, and the operator console.
+The Gradata Cloud dashboard is a Next.js app at [app.gradata.ai](https://app.gradata.ai). It visualizes the same data the local `brain.manifest.json` exposes, plus Cloud-only views for team management and the operator console. Meta-rule synthesis runs locally in the SDK — the dashboard renders the results, it does not re-run them.
This page still contradicts itself about where meta-rules are synthesized.
Line 3 now says synthesis runs locally, but the Brain detail bullets later still describe meta-rules as “cloud-synthesized.” Please update that downstream copy in the same pass.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/docs/cloud/dashboard.md` at line 3, Update the contradictory wording
in the Gradata Cloud dashboard docs: change the downstream "Brain detail"
bullets that currently call meta-rules “cloud-synthesized” to match the earlier
statement that meta-rule synthesis runs locally in the SDK (e.g., refer to
"meta-rules", "brain.manifest.json", and the "Brain detail" bullets in
Gradata/docs/cloud/dashboard.md) so all references consistently state that
synthesis is performed locally and the dashboard only renders the results.
 # Gradata Cloud

-Gradata Cloud is the hosted dashboard and back-end that complements the open-source SDK. The SDK keeps running locally; Cloud adds synchronization, cross-device continuity, team sharing, meta-rule synthesis, and an operator view for engineering teams.
+Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on top of that local loop.
Reduce wordiness flagged by LanguageTool
Replace “on top of that local loop” with a shorter phrase (e.g., “on the local loop”) to address the wordiness lint.
✅ Proposed change
-Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on top of that local loop.
+Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on the local loop.
🧰 Tools
🪛 LanguageTool
[style] ~3-~3: ‘on top of that’ might be wordy. Consider a shorter alternative.
Context: ...uity, team sharing, and managed backups on top of that local loop. ## What's in the SDK vs th...
(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/docs/cloud/overview.md` at line 3, The sentence containing "on top of
that local loop" in the Gradata Cloud overview should be shortened for clarity;
replace that phrase with "on the local loop" so the sentence reads "...Cloud
adds visualization, cross-device continuity, team sharing, and managed backups
on the local loop." Locate the paragraph that begins "Gradata Cloud is the
hosted dashboard..." and update the exact string accordingly.
+!!! info "Local by default"
+    Meta-rule clustering **and** principle synthesis both run locally. Synthesis uses whichever LLM path you've configured: your own Anthropic API key (set `ANTHROPIC_API_KEY`) or the Claude Code Max OAuth path via `claude -p`. Cloud is not required for any of it — the full `[rule, rule, rule] → "Verify before acting"` pipeline runs in the OSS SDK.

-The math, the events, and the storage are all open. Only the LLM-driven synthesis that turns `[rule, rule, rule] → "Verify before acting"` is cloud-gated.
+Cloud becomes relevant when you want a hosted dashboard, cross-device sync, team brains, or (future) opt-in corpus donation. It does not re-synthesize or override what graduated locally.
Update the footer cross-reference to match the new local-first explanation.
This section says cloud does not synthesize meta-rules, but the “Next” link at the bottom still sends readers to Cloud Overview “for meta-rule synthesis.” That pointer is now misleading.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 50-50: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/docs/concepts/meta-rules.md` around lines 47 - 50, The footer "Next"
link that currently points readers to the Cloud Overview "for meta-rule
synthesis" is now misleading given the "Local by default" text; update the
bottom cross-reference to either remove the claim about synthesis or retarget
the link to the Cloud Overview section that discusses hosted dashboard/team
sync/team brains (or a more appropriate cloud-topic page), and ensure the link
text reflects that cloud is relevant for dashboard/sync/team features rather
than meta-rule synthesis; locate the "Local by default" block and the subsequent
"Next" link text in the same markdown and adjust the link target and label
accordingly.
### 1. Deprecated adapter shims (scheduled v0.8.0)
- `src/gradata/integrations/anthropic_adapter.py` → `middleware.wrap_anthropic`
- `src/gradata/integrations/langchain_adapter.py` → `middleware.LangChainCallback`
- `src/gradata/integrations/crewai_adapter.py` → `middleware.CrewAIGuard`
Warnings are in place; remove the modules and their tests at v0.8.0.

### 2. `_cloud_sync.py` terminology
File posts to an optional external dashboard — fine to keep, but the
module docstring should make clear it is optional telemetry, not a
mandatory cloud dependency. Callers already tolerate absence.

### 3. Docstring drift in `meta_rules.py`
Module header still says "require Gradata Cloud" and "no-ops in the
open-source build". That is no longer true as of the local-first port —
rewrite the header to describe the local clustering algorithm.

### 4. Test-level cloud gating
Former `@_requires_cloud` / `skipif` markers were deleted in this cycle.
If any new test reintroduces a cloud gate, delete the gate instead — the
feature should either be local-first or not ship.

### 5. `api_key` kwarg on `merge_into_meta`
The old `merge_into_meta(..., api_key=...)` path routed into
`synthesise_principle_llm` directly. Current architecture drives LLM
distillation from `rule_synthesizer` at session close instead. The kwarg
is still accepted via `**kwargs` for forward compatibility but performs
no work — remove after one release.

### 6. Doc sweep
`docs/cloud/` should be audited for pages that imply cloud is required.
Rewrite as "optional managed hosting" or delete.
Add blank lines after each subsection heading.
markdownlint-cli2 is flagging every ### block here with MD022. A blank line after each heading will clear the linter without changing content.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 16-16: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 27-27: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 32-32: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 37-37: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 44-44: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/docs/LEGACY_CLEANUP.md` around lines 16 - 46, Add a single blank line
after each subsection heading in LEGACY_CLEANUP.md (e.g., after "### 1.
Deprecated adapter shims (scheduled v0.8.0)", "### 2. `_cloud_sync.py`
terminology", "### 3. Docstring drift in `meta_rules.py`", "### 4. Test-level
cloud gating", "### 5. `api_key` kwarg on `merge_into_meta`", and "### 6. Doc
sweep") so every '###' header is followed by an empty line to satisfy
markdownlint-md022.
IF NOT EXISTS (
    SELECT 1
    FROM pg_constraint c
    JOIN pg_class t ON t.oid = c.conrelid
    WHERE t.relname = 'corrections'
      AND c.contype = 'u'
      AND c.conkey @> ARRAY[
          (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'brain_id'),
          (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'session'),
          (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'description')
      ]::smallint[]
Check for an exact unique key match here, not just a superset.
c.conkey @> ARRAY[...] also matches a wider constraint like (brain_id, session, description, created_at). In that case this migration would skip adding the intended 3-column uniqueness and still allow duplicate descriptions per session.
Suggested fix
 IF NOT EXISTS (
     SELECT 1
     FROM pg_constraint c
     JOIN pg_class t ON t.oid = c.conrelid
+    CROSS JOIN LATERAL (
+        SELECT ARRAY[
+            (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'brain_id'),
+            (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'session'),
+            (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'description')
+        ]::smallint[] AS target_cols
+    ) cols
     WHERE t.relname = 'corrections'
       AND c.contype = 'u'
-      AND c.conkey @> ARRAY[
-          (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'brain_id'),
-          (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'session'),
-          (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'description')
-      ]::smallint[]
+      AND c.conkey @> cols.target_cols
+      AND c.conkey <@ cols.target_cols
 ) THEN
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/migrations/supabase/014_corrections_unique.sql` around lines 23 - 33,
The uniqueness check currently uses "c.conkey @> ARRAY[...]" which matches
superset constraints; change it to test for an exact match by comparing arrays
exactly (e.g., use "c.conkey = ARRAY[ ... ]::smallint[]" or use both "@>" and
"<@" to ensure equality) for the constraint on columns brain_id, session,
description so the migration only skips when a true 3-column unique constraint
already exists for corrections.
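The difference between a superset match and an exact match can be modelled in a few lines of Python. This is only an illustration of the containment logic, not code from the migration; `contains` and `is_exact_unique_key` are hypothetical names, and the column numbers are made up.

```python
def contains(left, right):
    """Model Postgres `left @> right`: every element of right appears in left."""
    return set(right) <= set(left)


def is_exact_unique_key(conkey, target_cols):
    """Model `conkey @> target AND conkey <@ target`: same columns, any order."""
    return contains(conkey, target_cols) and contains(target_cols, conkey)


# Illustrative attribute numbers for (brain_id, session, description).
TARGET = [2, 3, 5]
WIDER = [2, 3, 5, 7]  # e.g. a constraint that also includes created_at
```

Here `contains(WIDER, TARGET)` is true, which is how the original check would wrongly skip the migration, while `is_exact_unique_key(WIDER, TARGET)` is false. Using containment in both directions also ignores column order, which a plain array equality test would not.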
```
corrections_brain_session_desc_key UNIQUE (brain_id, session, description) -- pre-existing
corrections_brain_session_description_unique UNIQUE (brain_id, session, description) -- from 014
events_brain_type_created_at_key UNIQUE (brain_id, type, created_at) -- pre-existing
events_brain_type_created_at_unique UNIQUE (brain_id, type, created_at) -- from 015
```
Add a language to this fenced block.
markdownlint-cli2 is already flagging this snippet with MD040. Label it text or sql so the new README stays lint-clean.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/migrations/supabase/README.md` around lines 32 - 37, The fenced code
block containing the UNIQUE constraint lines (e.g.,
corrections_brain_session_desc_key,
corrections_brain_session_description_unique, events_brain_type_created_at_key,
events_brain_type_created_at_unique) needs a language tag to satisfy
markdownlint MD040; edit the block start from ``` to ```sql (or ```text) so the
snippet is labeled (e.g., change ``` to ```sql) and the README will lint
cleanly.
for (raw,) in rows:
    try:
        parsed = _json.loads(raw) if isinstance(raw, str) else raw
        if isinstance(parsed, dict):
            session_corrections.append(parsed)
    except (TypeError, _json.JSONDecodeError):
Normalize DB correction payloads before forwarding them to _cloud_sync_session().
brain_correct() stores draft_text / final_text, but _cloud_sync_session() reads draft / final. Appending the raw event payload here makes hook-driven sessions report zero blandness and any future final-based metrics will be wrong.
Suggested fix
 for (raw,) in rows:
     try:
         parsed = _json.loads(raw) if isinstance(raw, str) else raw
         if isinstance(parsed, dict):
-            session_corrections.append(parsed)
+            normalized = dict(parsed)
+            if "draft" not in normalized and "draft_text" in normalized:
+                normalized["draft"] = normalized["draft_text"]
+            if "final" not in normalized and "final_text" in normalized:
+                normalized["final"] = normalized["final_text"]
+            session_corrections.append(normalized)
     except (TypeError, _json.JSONDecodeError):
         continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/_core.py` around lines 1408 - 1413, The loop that builds
session_corrections currently appends raw event payloads (parsed) which contain
draft_text/final_text, but _cloud_sync_session() expects draft/final; update the
normalization inside the for-loop (where parsed is created) to map
parsed.get("draft_text") -> parsed["draft"] and parsed.get("final_text") ->
parsed["final"] (preserving existing draft/final if present) before appending to
session_corrections so brain_correct() payloads align with _cloud_sync_session()
expectations.
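The key mapping described above can be isolated into a small function and checked on its own. `normalize_correction` is a hypothetical helper name; the key names come from the review comment.

```python
def normalize_correction(payload: dict) -> dict:
    """Copy draft_text/final_text to draft/final without overwriting existing keys."""
    normalized = dict(payload)  # shallow copy so the stored event is not mutated
    for short, long in (("draft", "draft_text"), ("final", "final_text")):
        if short not in normalized and long in normalized:
            normalized[short] = normalized[long]
    return normalized
```

Keeping the original `draft_text` / `final_text` keys alongside the aliases means any reader that already relies on the longer names continues to work.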
provenance = {
    "source": "gradata",
    "skill_id": skill_id,
    "brain_name": brain.dir.name,
    "exported_at": datetime.now(UTC).isoformat(),
    "min_state": min_state,
}
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Write provenance.json atomically.
This payload is persisted just below via Path.write_text(). A mid-write crash can leave a truncated JSON file inside an otherwise-created skill directory; please route it through the repo's atomic JSON write helper.
As per coding guidelines, Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/_core.py` around lines 1771 - 1777, The provenance dict
(provenance with keys source, skill_id, brain_name, exported_at, min_state) is
being persisted with a plain Path.write_text which can leave a truncated
provenance.json on crash; replace that direct write with the repository's atomic
JSON write helper (use the project helper such as
write_json_atomic/atomic_write_json — locate the helper in the repo) and call it
to serialize the provenance dict to provenance.json atomically instead of using
Path.write_text; ensure the same provenance object and target filename are used
so behavior is unchanged except for atomicity.
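The repository's own atomic-write helper is not shown in this thread, so the following is only a sketch of the general technique (write to a temporary file in the same directory, then rename over the target). The name `write_json_atomic` and its signature are assumptions.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path, payload: dict) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    path = Path(path)
    # The temp file must live in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic rename over the target
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

A crash before `os.replace` leaves at most a stray `.tmp` file; `provenance.json` itself is never observed half-written.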
def _call_gemini(
    model: str, system: str, user_prompt: str, max_tokens: int, timeout: float
) -> str | None:
    key = os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY")
    if not key:
        _log.debug("synth: GOOGLE_API_KEY / GEMINI_API_KEY not set")
        return None
    try:
        from google import genai
        from google.genai import types as genai_types

        client = genai.Client(api_key=key)
        config = genai_types.GenerateContentConfig(
            system_instruction=system,
            max_output_tokens=max_tokens,
        )
        resp = client.models.generate_content(
            model=model,
            contents=user_prompt,
            config=config,
        )
        text = resp.text
        return text.strip() if text else None
    except Exception as exc:
🧩 Analysis chain
🌐 Web query:
How do you configure client-level or per-request timeouts in the google-genai Python SDK?
💡 Result:
In the google-genai Python SDK (also referred to as Google Gen AI SDK), timeouts are configured using the HttpOptions class from google.genai.types. Timeouts are specified in milliseconds.
Client-level timeouts: Pass http_options to the genai.Client constructor. Example:
from google import genai
from google.genai import types
client = genai.Client(
    api_key='YOUR_API_KEY',
    http_options=types.HttpOptions(timeout=30000)  # 30 seconds
)
Per-request timeouts: Override via the config parameter in methods like generate_content, using GenerateContentConfig with http_options. Example:
response = client.models.generate_content(
    model='gemini-2.0-flash-exp',
    contents='Tell me a story',
    config=types.GenerateContentConfig(
        http_options=types.HttpOptions(timeout=120000)  # 2 minutes
    )
)
Note: Some users have reported issues where client-level timeouts may not always take effect as expected (e.g., GitHub issue #911), in which case per-request configuration or retries via HttpRetryOptions can be used. The SDK converts the millisecond timeout to seconds internally for the underlying httpx client.
Citations:
- 1: https://googleapis-python-genai-70.mintlify.app/guides/http-options
- 2: https://github.com/googleapis/python-genai/blob/main/google/genai/_api_client.py
- 3: https://stackoverflow.com/questions/78142050/how-to-set-a-timeout-on-google-gemini-generate-content-request-with-the-vertex-a
- 4: https://mintlify.com/googleapis/python-genai/guides/http-options
🏁 Script executed:
cat -n Gradata/src/gradata/enhancements/rule_synthesizer.py | head -50
Repository: Gradata/gradata
Length of output: 2888
🏁 Script executed:
cat -n Gradata/src/gradata/enhancements/rule_synthesizer.py | sed -n '170,250p'
Repository: Gradata/gradata
Length of output: 3292
🏁 Script executed:
# Check if there are other timeout-handling provider functions
rg -A 15 "def _call_anthropic|def _call_openai" Gradata/src/gradata/enhancements/rule_synthesizer.py
Repository: Gradata/gradata
Length of output: 1131
Add timeout support to the Gemini provider to honor the fail-safe contract.
All provider paths accept a timeout parameter, but _call_gemini() (line 211) ignores it. The module's docstring explicitly promises fail-safe behavior including "model timeout" handling (line 9), yet the Gemini client is created without timeout configuration. In contrast, _call_anthropic(), _call_openai(), and _call_http() all pass timeout to their respective clients.
Per the google-genai SDK, timeouts can be set client-level via http_options=types.HttpOptions(timeout=timeout*1000) in the genai.Client() call, or per-request in the GenerateContentConfig. Without this, a slow Gemini response can block longer than SYNTH_TIMEOUT, breaking the fail-safe guarantee.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/rule_synthesizer.py` around lines 200 - 223,
_call_gemini currently ignores the passed timeout which breaks the module's
fail-safe behavior; update the function so the google-genai client is created
with timeout (e.g., pass
http_options=genai_types.HttpOptions(timeout=int(timeout * 1000)) to
genai.Client) or set the timeout on the request/config (e.g., in
GenerateContentConfig), ensuring the timeout value is converted to milliseconds
per the SDK and used when instantiating genai.Client and/or in
GenerateContentConfig to enforce the model timeout.
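A minimal sketch of the client-level variant follows, based on the `HttpOptions(timeout=...)` usage quoted in the web-query result above. The helper names `gemini_timeout_ms` and `call_gemini_with_timeout` are hypothetical, and passing the API key as an argument is a simplification of the original function, which reads it from the environment.

```python
def gemini_timeout_ms(timeout_s: float) -> int:
    """Convert a timeout in seconds to the whole milliseconds HttpOptions expects."""
    return max(1, int(timeout_s * 1000))


def call_gemini_with_timeout(model, system, user_prompt, max_tokens, timeout, api_key):
    """Same request shape as _call_gemini, with the timeout applied at client level."""
    # Imported lazily, as in the original, so the module loads without google-genai.
    from google import genai
    from google.genai import types as genai_types

    client = genai.Client(
        api_key=api_key,
        http_options=genai_types.HttpOptions(timeout=gemini_timeout_ms(timeout)),
    )
    config = genai_types.GenerateContentConfig(
        system_instruction=system,
        max_output_tokens=max_tokens,
    )
    resp = client.models.generate_content(
        model=model,
        contents=user_prompt,
        config=config,
    )
    text = resp.text
    return text.strip() if text else None
```

Given the note in the query result that client-level timeouts have reportedly not always taken effect, setting `http_options` on `GenerateContentConfig` as well may be worth testing against the installed SDK version.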
Replaced by clean rebase — #161 branch had 43 unrelated commits drifted from main. See new PR.
…2) (#162)
* fix(cloud/client): push events with watermark cursor + idempotency (Bug 2)
Pairs with gradata-cloud PR #12. Was Bug 2 from /tmp/audit-bug2-watermark.md.
- client.sync() now reads events.jsonl, filters by last_sync_at watermark, batches 500 at a time, advances cursor on 200, retries with smaller batch on 413.
- Sync state at <BRAIN_DIR>/.gradata-sync-state.json (separate from events.jsonl which stays append-only and untouched).
- 9/9 new tests pass in tests/test_cloud_client_sync.py.
Council perspective P3 (Skeptic) had this take after audit-gate blocked the aggregate-only path — 3 cloud routes (analytics.py, activity.py, corrections.py) read raw events directly, so telemetry-only would have flatlined them.
* feat(scripts): add backfill_to_cloud.py for Bug 2 history rescue
One-shot: counts events.jsonl, resets local sync state, calls client.sync() in a loop until cursor catches up. Idempotent — server upserts on (brain_id, event_id). Run after PRs #11/#12/#161 merge to backfill the ~5800 historical events the broken sync silently dropped.
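The sync behaviour described in the first commit above (watermark filter, batches of 500, cursor advance on 200, smaller batch on 413) might look roughly like this. It is a sketch under stated assumptions, not the SDK's `client.sync()`: the event timestamp field `ts`, the halving strategy on 413, and the injected `post` callable are all illustrative; only the `last_sync_at` key and the state-file idea come from the commit message.

```python
import json
from pathlib import Path

BATCH_SIZE = 500  # batch size named in the commit message


def load_cursor(state_path: Path) -> str:
    """Read the last_sync_at watermark, or "" when no sync has happened yet."""
    if state_path.exists():
        return json.loads(state_path.read_text(encoding="utf-8")).get("last_sync_at", "")
    return ""


def sync(events, state_path: Path, post, batch_size: int = BATCH_SIZE) -> int:
    """Push events newer than the watermark; return how many were accepted.

    `post(batch)` is assumed to return an HTTP status code. Timestamps are
    assumed to be ISO-8601 strings in one format, so string order is time order.
    """
    cursor = load_cursor(state_path)
    pending = sorted((e for e in events if e["ts"] > cursor), key=lambda e: e["ts"])
    sent, i, size = 0, 0, batch_size
    while i < len(pending):
        batch = pending[i : i + size]
        status = post(batch)
        if status == 413 and size > 1:
            size = max(1, size // 2)  # payload too large: retry with a smaller batch
            continue
        if status != 200:
            break  # leave the cursor where it is; the next sync resumes here
        cursor = batch[-1]["ts"]
        state_path.write_text(json.dumps({"last_sync_at": cursor}), encoding="utf-8")
        i += len(batch)
        sent += len(batch)
    return sent
```

Because the cursor only moves after a 200, a failed batch is re-sent on the next run, which is safe when the server upserts on `(brain_id, event_id)` as the second commit states. A strict `>` comparison can skip events that share the cursor's exact timestamp; with idempotent upserts, `>=` would be a safe alternative.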
Pairs with gradata-cloud PR #12 (https://github.com/Gradata/gradata-cloud/pull/12).
Bug
client.sync() POSTed only {brain_id, manifest} — no events, no cursor. Server 200ed but ingested nothing. Cloud Supabase has 1376 events; live brain has 7172 (5.2x gap).
Fix
<BRAIN_DIR>/.gradata-sync-state.json (events.jsonl untouched)
Tests
9/9 new tests pass in tests/test_cloud_client_sync.py.
Audit trail
Diagnosed in /tmp/audit-bug2-watermark.md. Council voted D-via-telemetry but audit-gate blocked it — 3 cloud routes read raw events directly. Pivoted to council P3 (Skeptic) plan.