fix(cloud/client): push events with watermark + idempotency (Bug 2 SDK side)#161

Closed
Gradata wants to merge 43 commits into main from fix/sync-push-events

@Gradata Gradata commented May 2, 2026

Owner

Pairs with gradata-cloud PR #12 (https://github.com/Gradata/gradata-cloud/pull/12).

Bug

client.sync() POSTed only {brain_id, manifest} — no events, no cursor. Server 200ed but ingested nothing. Cloud Supabase has 1376 events; live brain has 7172 (5.2x gap).

Fix

  • Read events.jsonl, filter by last_sync_at watermark, batch 500/req
  • Advance cursor on 200, retry with smaller batch on 413
  • Sync state at <BRAIN_DIR>/.gradata-sync-state.json (events.jsonl untouched)
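
The three steps above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: the `post` callable, the `ts` field name, and the assumption that timestamps are ISO-8601 UTC strings (so they sort lexicographically) are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def sync_events(
    brain_dir: Path,
    post: Callable[[list], int],
    batch_size: int = 500,
) -> int:
    """Push events newer than the watermark; return how many were accepted.

    `post` is a hypothetical transport: it sends one batch and returns the
    HTTP status code.
    """
    state_path = brain_dir / ".gradata-sync-state.json"
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    watermark = state.get("last_sync_at", "")

    # events.jsonl is only read here, never rewritten.
    events = []
    events_path = brain_dir / "events.jsonl"
    if events_path.exists():
        for line in events_path.read_text().splitlines():
            if line.strip():
                event = json.loads(line)
                # Assumed field name "ts"; keep only events after the watermark.
                if event.get("ts", "") > watermark:
                    events.append(event)
    events.sort(key=lambda e: e["ts"])

    pushed, index, size = 0, 0, batch_size
    while index < len(events):
        batch = events[index : index + size]
        status = post(batch)
        if status == 200:
            pushed += len(batch)
            index += len(batch)
            # Advance the cursor only after the server confirms the batch.
            state["last_sync_at"] = batch[-1]["ts"]
            state_path.write_text(json.dumps(state))
        elif status == 413 and size > 1:
            size = max(1, size // 2)  # payload too large: retry with a smaller batch
        else:
            break  # keep the cursor where it is so the next run resumes here
    return pushed
```

One caveat of a timestamp-only watermark in this sketch: if a run stops between two events that share the same timestamp, the later one is filtered out on resume, which is one reason per-event ids are useful alongside the cursor.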

Tests

9/9 new tests pass in tests/test_cloud_client_sync.py.

Audit trail

Diagnosed in /tmp/audit-bug2-watermark.md. Council voted D-via-telemetry but audit-gate blocked it — 3 cloud routes read raw events directly. Pivoted to council P3 (Skeptic) plan.

Gradata and others added 30 commits April 20, 2026 15:16
Local SQLite and cloud Supabase schemas diverged (wide `tenant_id` + `data_json`
vs narrow `brain_id` + `data` jsonb, plus table rename `correction_patterns`
-> `corrections`). Added `_transform_row` per-table mapper with deterministic
uuid5 ids so repeat pushes upsert cleanly. `_scrub` strips NUL bytes and lone
UTF-16 surrogates that Postgres JSONB rejects. `_post` dedupes within each
batch, honors `_TABLE_REMAP`, and chunks large pushes to avoid PostgREST's
opaque "Empty or invalid json" body-limit errors. `GRADATA_SUPABASE_URL` /
`GRADATA_SUPABASE_SERVICE_KEY` now work as aliases so one .env serves both
backend and SDK.

Co-Authored-By: Gradata <noreply@gradata.ai>
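
For illustration, deterministic uuid5 ids and the NUL/surrogate scrub described in the commit above might look like the sketch below. The namespace value, the key layout, and the function names are assumptions, not the contents of `_cloud_sync.py`.

```python
import re
import uuid

# Hypothetical namespace; any fixed UUID yields stable ids across pushes.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/gradata-sketch")

# NUL plus the UTF-16 surrogate range. Surrogate code points cannot be encoded
# as valid UTF-8, so any that survive in a str are dropped.
_BAD_CHARS = re.compile(r"[\x00\ud800-\udfff]")


def deterministic_id(table: str, brain_id: str, natural_key: str) -> str:
    """Same inputs always give the same UUID, so a repeated push can upsert."""
    return str(uuid.uuid5(_NAMESPACE, f"{table}:{brain_id}:{natural_key}"))


def scrub(value):
    """Recursively strip characters that a JSONB column would reject."""
    if isinstance(value, str):
        return _BAD_CHARS.sub("", value)
    if isinstance(value, dict):
        return {scrub(k): scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    return value
```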
…provider synth

Phase 1 of the learning-pipeline revamp. Rule graduation now flows through
the canonical _graduation.graduate() path (strict > for INSTINCT->PATTERN,
>= for PATTERN->RULE) instead of the inline duplicate in rule_pipeline.
Injection hook reads a persistent brain_prompt.md gated by an AUTO-GENERATED
header, regenerated only at session_close after the pipeline fires. LLM
synthesis gets a two-provider path: anthropic SDK (ANTHROPIC_API_KEY) with
claude CLI fallback (Max-plan OAuth) so users without an exportable key
still get synthesis. Meta-rule deterministic fallback now warns loudly
instead of silently discarding. Drops five env-flag gates in favour of
file-based signals.

Co-Authored-By: Gradata <noreply@gradata.ai>
Adds --cloud / --no-cloud flags to the doctor CLI command and the
underlying diagnose() function. Flips the default cloud endpoint to
api.gradata.ai/api/v1. Covers new behaviour with test_doctor_cloud.py
(all passing).

Co-Authored-By: Gradata <noreply@gradata.ai>
Regex coverage was brittle to shorthand: real corrections like
"Why r you not asking" and "Why flag.. we dont skip" slipped the
\bwhy (did|would|are) you\b pattern and never became IMPLICIT_FEEDBACK
events. That silently breaks Gradata's core promise ("learn from any
correction").

Adds:
- negation: dont/cant/shouldnt (no-apostrophe variants), never
- reminder: "again" marker, "dont forget"
- challenge: "why r u", "why not/r/are/is/does", "why word..",
  "how come", "you missed/forgot/failed/didnt"

All 8 target phrases now detect. 25 existing implicit-feedback tests
remain green.

Co-Authored-By: Gradata <noreply@gradata.ai>
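
A rough sketch of the kind of widening described in the commit above. `NARROW` is the pattern quoted in the message; the `SIGNALS` patterns are hypothetical reconstructions from the bullet list, not the hook's actual regexes.

```python
import re

# The narrow pattern quoted in the commit message.
NARROW = re.compile(r"\bwhy (did|would|are) you\b", re.IGNORECASE)

# Hypothetical widened patterns, one per signal type.
SIGNALS = {
    "negation": re.compile(r"\b(don'?t|can'?t|shouldn'?t|never)\b", re.IGNORECASE),
    "reminder": re.compile(r"\b(again|don'?t forget)\b", re.IGNORECASE),
    "challenge": re.compile(
        r"\bwhy (r u|not|r|are|is|does)\b|\bwhy \w+\.\.|\bhow come\b"
        r"|\byou (missed|forgot|failed|didn'?t)\b",
        re.IGNORECASE,
    ),
}


def detect_signals(text: str) -> list:
    """Return the names of every signal type whose pattern matches."""
    return [name for name, pattern in SIGNALS.items() if pattern.search(text)]
```

With these patterns, "Why r you not asking" is missed by `NARROW` but reported as a challenge, and "Why flag.. we dont skip" reports both a negation and a challenge.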
14 new tests pinning the regex expansion from 5a6da45. Covers real
corrections observed this session ("Why r you not asking council",
"Why flag.. we don't skip we do work") plus shorthand cases
(dont / cant / again / you missed / how come). Dual-signal cases
assert both types detect. Full suite: 37 passed, 1 pre-existing skip.

Co-Authored-By: Gradata <noreply@gradata.ai>
Five post-launch metrics with precise definitions (activation, D7
retention, time-to-first-graduation, free->Pro conversion,
correction-rate decay). Numeric triggers: pivot <20% activation +
flat decay at D30; kill <100 installs at D60; scale >1K installs +
>=5% conversion at D90. Monday 30-min retro agenda. Source: Card 8
of the pre-launch gap analysis.

Co-Authored-By: Gradata <noreply@gradata.ai>
The source-provenance docstring referenced "cloud-side LLM synthesis"
which is stale since the graduation-cloud-gate was removed. Synthesis
runs on the user's machine via rule_synthesizer.py's two-provider path
(Anthropic SDK with user's key, or Claude Code Max CLI OAuth).

Co-Authored-By: Gradata <noreply@gradata.ai>
Graduation and meta-rule LLM synthesis run entirely locally as of a
few sessions ago (rule_synthesizer.py uses user's own Anthropic key or
Claude Code Max CLI OAuth). The Pro-tier inclusion list incorrectly
still claimed "cloud runs better graduation engine" and implied a
cloud-enhanced sqlite-vec path. Rewrite the inclusion list + philosophy
paragraph to match reality: free is functionally complete; Pro is
visualization, history, export, and the future community corpus.

NOTE: this file is listed in .gitignore per the earlier
"untrack private files" cleanup. Force-added at request.

Co-Authored-By: Gradata <noreply@gradata.ai>
Test was checking the pre-transform local key name. _cloud_sync._transform_row
correctly emits brain_id (cloud schema) from tenant_id (local schema); the
assertion was stale.

Co-Authored-By: Gradata <noreply@gradata.ai>
Previously nothing wrote to lesson_applications — the table existed
(onboard.py), was size-checked (_validator.py), and synced to cloud
(_cloud_sync.py), but no code ever inserted a row. The compound-quality
story had no evidence: rules claimed to fire with no receipt.

Now:
- inject_brain_rules writes one PENDING row per injected rule (cluster
  members included), storing {category, description, task} in context so
  session_close can attribute outcomes back to specific rules.
- session_close resolves PENDING rows at end-of-waterfall:
    REJECTED if any CORRECTION/IMPLICIT_FEEDBACK/RULE_FAILURE in the
    session shares the lesson's category (or description substring).
    CONFIRMED otherwise (rule survived the session).

Both paths are best-effort — DB missing, schema drift, or IO errors
degrade silently rather than blocking injection or session close.

Unblocks the Card 6 MVP day-14 metric: "did a graduated rule actually
fire and survive?" — the answer now has a row-level audit trail.

Co-Authored-By: Gradata <noreply@gradata.ai>
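
The PENDING resolution rule in the commit above can be sketched as a pure function. The dict field names (`type`, `category`, `text`) and the direction of the substring check are assumptions made for illustration.

```python
NEGATIVE_EVENTS = {"CORRECTION", "IMPLICIT_FEEDBACK", "RULE_FAILURE"}


def resolve_application(lesson: dict, session_events: list) -> str:
    """Return "REJECTED" or "CONFIRMED" for one PENDING lesson application.

    Assumes each event is a dict with "type", "category", and "text" keys.
    """
    category = lesson.get("category", "").lower()
    description = lesson.get("description", "").lower()
    for event in session_events:
        if event.get("type") not in NEGATIVE_EVENTS:
            continue
        same_category = bool(category) and event.get("category", "").lower() == category
        # Assumed direction: the lesson description appears inside the event text.
        mentions_rule = bool(description) and description in event.get("text", "").lower()
        if same_category or mentions_rule:
            return "REJECTED"
    # No negative signal touched this rule, so it survived the session.
    return "CONFIRMED"
```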
Sweeps the remaining docs that still claimed cloud gated any part of
the learning loop. Actual architecture (as of the graduation-local
pivot):

  Local SDK owns: correction capture, graduation, meta-rule clustering
  AND LLM-synthesis (via user's Anthropic key or Claude Code Max OAuth),
  rule-to-hook promotion, manifest computation.

  Cloud owns: dashboard/visualization, cross-device sync, team brains,
  managed backups, future opt-in corpus donation.

Files touched:
- docs/cloud/overview.md — capability matrix, architecture diagram, use-when guidance.
- docs/architecture/cloud-monolith-v2.md — cloud-side workload framing.
- docs/architecture/multi-tenant-future-proofing.md — proprietary boundary, verification flow.
- docs/concepts/meta-rules.md — synthesis is local, not cloud-gated.
- docs/cloud/dashboard.md — dashboard visualizes local output, does not re-synthesize.

README.md was already accurate; no changes there.

Co-Authored-By: Gradata <noreply@gradata.ai>
Silent-failure-hunter CRITICAL-1:
- inject_brain_rules: wrap lesson_applications connection in try/finally
  and escalate OperationalError to warning (missing-table surfaces).

Silent-failure-hunter CRITICAL-2:
- _cloud_sync.push: per-row try/except on _transform_row so one bad row
  no longer propagates and kills the whole push batch.

Leak scan blockers:
- Delete docs/pre-launch-plan.md and docs/gradata-marketing-strategy.md
  from the public repo; add both to .gitignore. These contain kill
  triggers, pricing, and PII that belong in the private brain vault only.

Code-reviewer BLOCKER-3:
- _doctor._check_vector_store returns status="ok" with FTS5 detail in
  the detail field, restoring the documented status vocabulary
  ({ok, warn, fail, skip, missing, error}).

Test-coverage gaps:
- Add tests/test_rule_synthesizer.py — both providers absent, empty
  input, cache hit, CLI fallback on SDK raise, malformed output.
- Add IMPLICIT_FEEDBACK → REJECTED integration test to
  test_lesson_applications.py.

Verification: full suite 3802 pass, 22 skip, 2 xfailed.
Gradata is fully local-first now. Cloud-gate stubs and "requires cloud"
skip markers were legacy artifacts from an earlier architecture where
discovery/synthesis lived server-side. This commit finishes the port:

- meta_rules.discover_meta_rules + merge_into_meta run locally:
  category grouping + greedy semantic-similarity clustering, zombie
  filter on RULE-state lessons below 0.90, decay after 20 sessions,
  count/(count+3) confidence smoothing.
- Drop @_requires_cloud markers from test_bug_fixes, test_llm_synthesizer,
  test_meta_rule_generalization, test_multi_brain_simulation,
  test_pipeline_e2e. These tests now exercise the local impl directly.
- Retire the api_key-kwarg-on-merge_into_meta path (session-close
  rule_synthesizer drives LLM distillation now).
- Update fixtures to realistic prose so they survive the noise filter
  that rejects "cut:/added:" edit-distance summaries.
- Bump test_meta_rules confidence assertion to the smoothed formula.
- Add docs/LEGACY_CLEANUP.md tracking the remaining cloud-gate vestiges
  (deprecated adapter shims, cloud docs, stale module docstrings).

Suite: 3809 passed, 14 skipped, 2 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>
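
As a small worked example of the count/(count+3) smoothing mentioned in the commit above: the function name is hypothetical, and reading `count` as the number of supporting lessons is an assumption.

```python
def smoothed_confidence(count: int, prior: int = 3) -> float:
    """Smoothed confidence: count / (count + prior).

    Small clusters are pulled toward 0; the value approaches 1 only as
    the count grows well past the prior.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return count / (count + prior)
```

Under this formula a cluster of 3 scores 0.5, and it takes 27 members to reach the 0.90 level that the zombie filter uses for RULE-state lessons.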
…xtures

discover_meta_rules is implemented now (local-first). The
  if not metas: pytest.skip('discover_meta_rules not yet implemented')
guards were vestiges from the cloud-only era — convert to real asserts.

Also bump 0.88-confidence RULE-state fixtures to 0.90 so they survive
the zombie filter (RULE at <0.90 is treated as a decayed rule).

Suite: 3813 passed, 10 skipped, 2 xfailed.

Remaining skips are all legit:
- test_file_lock.py (2): Windows vs POSIX platform gates
- test_integration_workflow.py (5): require ANTHROPIC/OPENAI keys, cost money
- test_mem0_adapter.py::test_real_mem0_roundtrip: requires MEM0_API_KEY
- test_meta_rules.py::test_with_real_data: requires GRADATA_LESSONS_PATH env

xfails (2) are tracked for v0.7 reconciliation in test docstring.

Co-Authored-By: Gradata <noreply@gradata.ai>
Found while clearing remaining skipped/xfailed tests:

Bug: agent_graduation._update_lesson_confidence had
  confidence = max(0.0, confidence - MISFIRE_PENALTY)
but MISFIRE_PENALTY = -0.15 (negative). Subtracting a negative added
confidence on rejection. Test test_rejection_decreases_confidence was
xfail'd with 'API drift, reconcile in v0.7' — it was a real bug.

Fix: align with canonical _confidence.py usage (confidence + MISFIRE_PENALTY).

Other cleanups in the same pass:

- test_agent_graduation: drop both xfail markers. test_lesson_graduates_to_pattern
  was also wrong on its own terms — with ACCEPTANCE_BONUS=0.20 the lesson
  graduates straight to RULE (stronger than PATTERN). Accept either state.
- test_integration_workflow: delete stale module-level skipif guarding 5
  tests behind ANTHROPIC/OPENAI keys they never actually use. They only
  exercise local brain.correct/convergence/efficiency — no network.
- test_mem0_adapter: delete test_real_mem0_roundtrip (live-API smoke test
  already covered by the 20+ fake-client tests in the same file).
- test_meta_rules: delete test_with_real_data — dev-time exploration
  script with zero asserts, requiring GRADATA_LESSONS_PATH env var.

Suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed.

Remaining 3 skips are test_file_lock.py POSIX paths that require fcntl,
which does not exist on Windows. Complementary Windows paths skip on
Linux — running on each platform covers all 4. Cannot be eliminated.

From 22 skipped + 2 xfailed to 3 skipped + 0 xfailed.

Co-Authored-By: Gradata <noreply@gradata.ai>
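
The sign bug described in the commit above is easy to reproduce in isolation. The two function names below are illustrative; only `MISFIRE_PENALTY = -0.15` and the two expressions come from the commit message.

```python
MISFIRE_PENALTY = -0.15  # stored as a negative delta, per the commit message


def reject_buggy(confidence: float) -> float:
    # Subtracting a negative penalty raises confidence on rejection.
    return max(0.0, confidence - MISFIRE_PENALTY)


def reject_fixed(confidence: float) -> float:
    # Adding the negative delta lowers confidence, floored at 0.0.
    return max(0.0, confidence + MISFIRE_PENALTY)
```

Starting from 0.5, the buggy form moves confidence up to about 0.65 while the fixed form moves it down to about 0.35.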
…ten stale notes

Co-Authored-By: Gradata <noreply@gradata.ai>
…ate refresh

- agent_graduation: add _extract_output() to handle all Claude Code PostToolUse
  payload key variants (tool_response/tool_output/tool_result/output/response)
  so plan-mode agents no longer silently drop output
- session_close: add _load_soul_mandatories() (VOICE rules from soul.md injected
  into brain_prompt.md) and _refresh_loop_state() (regenerates loop-state.md on
  session close with live DB + lesson counts); raise Stop hook timeout to 90 s
- _events: add _redact_payload() (recursive email PII redaction) wired into
  emit() before any write; raw side-log to events.raw.jsonl (best-effort);
  redactor failure aborts write (fail closed)

Co-Authored-By: Gradata <noreply@gradata.ai>
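
A minimal sketch of recursive email redaction in the spirit of `_redact_payload()` from the commit above. The regex and the replacement token are assumptions; the real redactor may differ.

```python
import re

# Deliberately simple email pattern; an assumption, not the hook's actual regex.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_payload(value):
    """Return a copy of `value` with email addresses replaced in every string."""
    if isinstance(value, str):
        return EMAIL.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {key: redact_payload(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [redact_payload(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

To fail closed as the commit describes, a caller would let any exception from the redactor propagate and skip the write instead of falling back to the unredacted payload.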
…e watermarks

- _ulid.py: minimal stdlib ULID generator (no external dep); ulid_from_iso()
  preserves timestamp sort order during historical backfill
- device_uuid.py: atomic read-or-create of per-brain dev_<hex> device id;
  race-safe via O_EXCL temp file + os.replace
- 002_add_event_identity: adds event_id/device_id/content_hash/correction_chain_id/
  origin_agent columns + indexes to events table; chunked 10k-row backfill that
  is idempotent and resumes on restart
- 003_add_sync_state: creates sync_state table if missing and adds device_id/
  last_push_event_id/last_pull_cursor/tenant_id watermark columns + composite indexes
- tests: 44 tests covering all migration paths, chunked backfill, idempotency,
  PII redaction (email), loop-state generation, and session_close functions

Co-Authored-By: Gradata <noreply@gradata.ai>
…ts DB

Reads ~/.claude/projects/<project-hash>/*.jsonl count as the session
number — the actual Anthropic session log — rather than MAX(session)
from the Gradata events table. The two diverged (314 vs 367). Falls
back to the events DB if the project dir can't be located.

Co-Authored-By: Gradata <noreply@gradata.ai>
Previous fix only counted the active project dir (314). Global sum
across all project dirs gives 659, matching the actual Anthropic
session log total. Falls back to events DB if projects dir missing.

Co-Authored-By: Gradata <noreply@gradata.ai>
…oop-state.md (367)

Session number was read from loop-state.md (Gradata events DB counter).
Now counts .jsonl files across all ~/.claude/projects/ dirs — the real
Claude Code session total, same logic as status_line.py.

Co-Authored-By: Gradata <noreply@gradata.ai>
Every silent except Exception: pass in the core library layers now emits
a _log.debug() so failures surface under GRADATA_LOG=debug without
breaking the best-effort semantics. Files touched: brain.py (telemetry
guard), context_wrapper.py (apply_brain_rules / context_for fallbacks),
_brain_manifest.py + _context_compile.py (added module loggers),
_context_packet.py (12 data-loading guards), _manifest_metrics.py
(7 DB query guards), _doctor.py (HTTP body read guard + contextlib
import), _mine_transcripts.py (SIM108 ternary), hooks/session_close.py
(4 x SIM105 OSError guards converted to contextlib.suppress).

Co-Authored-By: Gradata <noreply@gradata.ai>
ruff check src/ --fix resolved 8 auto-fixable violations (E, F, I rules).
ruff format src/ reformatted 163 files to enforce consistent style.
Zero errors remain; 13 pre-existing warnings (optional cloud/framework
imports, lazy __all__ patterns) are unchanged.

Co-Authored-By: Gradata <noreply@gradata.ai>
Two tests expected s0/s42 but got s659 because _claude_session_count()
was walking the real ~/.claude/projects/. Add fake_home fixture so the
function returns None and falls back to the events DB as intended.

Co-Authored-By: Gradata <noreply@gradata.ai>
…eshold

New Stop hook writes a structured handoff to brain/sessions/handoff-{ts}.md
when context usage exceeds GRADATA_CTX_THRESHOLD (default 65%). inject_brain_rules
surfaces a <watchdog-alert> block at next session start so the LLM knows to
review the handoff and run /compact or /clear.

Also: bracket_confidence() in session_close for cache-key stability; remove
MAX_RULES render cap from inject_brain_rules (overshoot logic was masking gaps);
13 new tests in test_ctx_watchdog, tests in test_rule_synthesizer updated.

Co-Authored-By: Gradata <noreply@gradata.ai>
…ript store + retroactive sweep

P1: call_provider() dispatch in rule_synthesizer.py routes by model prefix
(claude-* → Anthropic, gpt-*/o1/o3 → OpenAI, gemini-* → Google, http → generic).
session_close._refresh_brain_prompt now uses call_provider instead of inline SDK.

P2: _bracket_confidence() buckets FSRS floats into 3 stable bands (low/mid/high)
so per-tick confidence changes no longer bust the synthesis cache.

P3: New _transcript.py (log_turn, load_turns, cleanup_ttl) and
_transcript_providers.py (ProviderTranscriptSource + GradataTranscriptSource)
form the transcript store layer. _retroactive_sweep() in the waterfall runs
implicit_feedback patterns across all session turns (gated on GRADATA_TRANSCRIPT=1).
OpenAI, LangChain, CrewAI middleware adapters gain session_id + log_turn() calls.
21 new tests in test_transcript.py.

Co-Authored-By: Gradata <noreply@gradata.ai>
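
The prefix routing described under P1 can be sketched as a small lookup. The function name and the returned provider labels are hypothetical; only the prefix-to-provider mapping comes from the commit message.

```python
def provider_for(model: str) -> str:
    """Pick a provider label from the model string's prefix."""
    name = model.lower()
    if name.startswith("claude-"):
        return "anthropic"
    if name.startswith(("gpt-", "o1", "o3")):
        return "openai"
    if name.startswith("gemini-"):
        return "google"
    if name.startswith("http"):
        return "generic"  # treated as a base URL for a generic HTTP provider
    raise ValueError(f"no provider registered for model {model!r}")
```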
…only

The global Path.is_file patch in _run_main() caused inject_brain_rules to
also read a fake pending_handoff.txt and append a <watchdog-alert> block.
Test now extracts content between <brain-rules>...</brain-rules> before
counting lines, making it immune to any outer blocks appended to the result.

Co-Authored-By: Gradata <noreply@gradata.ai>
- pre_compact.py rewritten: when auto-compact fires with a pending handoff,
  replaces the compact summary verbatim with handoff content so no lossy
  LLM summarization occurs. Manual compact falls back to snapshot. Corrects
  field name from "type" → "trigger" (keeps legacy fallback).

- inject_brain_rules._build_watchdog_block() extracted from inline main():
  Phase 1 (pre-/clear): consumes pending_handoff.txt, stages content to
  post_clear_handoff.txt, injects <watchdog-alert> with run-/clear prompt.
  Phase 2 (post-/clear): consumes post_clear_handoff.txt, injects
  <session-handoff> into fresh session. Phase 2 takes priority if both exist.

- implicit_feedback: return None instead of signal name string to reduce
  UserPromptSubmit noise.

- tests/test_pre_compact.py: 9 tests covering both trigger paths.
- tests/test_inject_watchdog_phases.py: 8 tests covering both phases.

Co-Authored-By: Gradata <noreply@gradata.ai>
graph_first_check.py (PreToolUse, Glob|Grep): blocks exploratory code
searches until the session flag is set. Returns a block decision with
the exact ToolSearch call needed to unblock.

graph_session_track.py (PostToolUse, ToolSearch): writes a per-session
flag file when a ToolSearch query contains "code-review-graph", clearing
the block for the rest of the session.

inject_brain_rules.py: appends <code-graph-tools> directive to every
SessionStart injection, with the mandatory ToolSearch query string.

Both hooks registered in ~/.claude/settings.json. Bypass via
GRADATA_GRAPH_CHECK=0. 18 tests, smoke-tested end-to-end.

Co-Authored-By: Gradata <noreply@gradata.ai>
…tignore cleanup

- test_hooks_intelligence.py: implicit_feedback tests now assert result is None
  and verify IMPLICIT_FEEDBACK event via mock_emit (hook emits, doesn't return)
- session_close.py: reorder imports alphabetically (isort)
- .gitignore: add graphify temp files, run.log patterns, and /.archive/ personal
  Claude Code config backups so they never accidentally land in commits

Co-Authored-By: Gradata <noreply@gradata.ai>
Gradata and others added 12 commits April 24, 2026 03:29
… migration reference

- Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py: move legacy
  Streamlit dashboard per Phase 4 deprecation plan (gradata.ai web dashboard
  now covers all panels — /rules, /corrections, /self-healing, /observability)
- Gradata/migrations/supabase/: reference copies of cloud migrations 014-016
  applied to prod 2026-04-24 (corrections unique, events unique, brains.last_used_at)
- Gradata/docs/specs/cloud-sync-and-pricing.md: DRAFT v1 sync architecture +
  pricing tier spec

Co-Authored-By: Gradata <noreply@gradata.ai>
Stale file created by a subagent Bash redirect. Grouped with the existing
Windows cmd.exe stdout misparse artifact entries.

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
- CHANGELOG.md: add [Unreleased] section covering 18 commits since 2026-04-23
  (cloud sync, hooks hardening, Supabase migrations, Streamlit archival,
  statusline session-count source, implicit_feedback emit-only contract)
- migrations/supabase/014,015: wrap constraint adds in DO blocks that check
  pg_constraint first, making re-runs safe on any DB (prod already had inline
  UNIQUE _key variants from CREATE TABLE; these migrations added redundant
  _unique variants, now documented as no-op on existing systems)
- migrations/supabase/README.md: document prod constraint state (both _key
  and _unique present on corrections + events) and drift-cleanup deferred

Co-Authored-By: Gradata <noreply@gradata.ai>
Critic audit flagged a silent-drop path: when resolve_brain_dir() returns
None (fresh install, CI env, unconfigured brain) the hook detected signals
but skipped emit() with no log — every correction became invisible.

- hooks/implicit_feedback.py: add debug log in the else branch recording
  how many signals were detected and of which types, so operators running
  `GRADATA_LOG_LEVEL=DEBUG` see the breadcrumb.
- tests/test_implicit_feedback.py: add TestMainNoBrainDir covering the
  main() path (previously only _detect_signals was tested) — verifies the
  debug log fires on detected signals, stays quiet on no-signal input, and
  short messages don't crash.

Co-Authored-By: Gradata <noreply@gradata.ai>
Watermark stalls from 23505 unique-violations were invisible unless a
caller grepped logs: _post() logged everything at WARNING. Now HTTP 409
and any "23505" body are logged at ERROR with a body snippet, and the
last error is persisted to brain_dir/cloud_push_error.json so
'gradata doctor' can surface it ('fail' for constraint violations,
'warn' for other non-2xx). Successful pushes clear the file.

_post() signature is now (accepted, error_info|None); call sites and
the three existing tests patching _post are updated. A _coerce_post_result
shim tolerates legacy int returns from any external patches.

Closes T17 from the overnight backlog (critic finding cycle-2 #1).
Addresses three cycle-3 council findings on commit 492c3dd:

1. Non-atomic write (critic #1, high-severity race). `_record_push_error`
   now writes to `<name>.tmp` then `os.replace`s into the target. Concurrent
   readers (doctor + daemon + MCP server) can no longer observe a truncated
   file that would mask a constraint violation as "error file unreadable".

2. PII leak in persisted error (critic #2). PostgREST 23505 bodies echo
   conflicting row values in `details`/`hint` fields, and `gradata doctor`
   prints the file verbatim. New `_scrub_error_body` parses the body as
   JSON and keeps only `code` + the first 120 chars of `message`
   (enough for the constraint name). Non-JSON bodies reduce to a length
   marker. Log messages use the scrubbed form too.

3. Removed the `_coerce_post_result` shim (verifier + critic). Zero tests
   exercised the bare-int branch it guarded; callers now destructure
   `_post` returns directly.

Tests: +2 (`test_post_error_body_scrubs_row_values`,
`test_scrub_error_body_handles_non_json`), 28/28 in the cloud test files
pass, 3944 passed / 3 skipped full suite. Ruff + pyright clean.

Co-Authored-By: Gradata <noreply@gradata.ai>
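
Findings 1 and 2 above combine into a short pattern: scrub the body first, then write through a temporary file and `os.replace`. The sketch below uses hypothetical function names and a hypothetical length-marker key; the `code` plus 120-character `message` rule and the `<name>.tmp` convention come from the commit message.

```python
import json
import os
from pathlib import Path


def scrub_error_body(body: str) -> dict:
    """Keep only the error code and a clipped message; drop details and hint."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    if not isinstance(parsed, dict):
        # Non-JSON bodies reduce to a length marker (key name is an assumption).
        return {"non_json_length": len(body)}
    return {
        "code": parsed.get("code"),
        "message": str(parsed.get("message", ""))[:120],
    }


def record_push_error(brain_dir: Path, status: int, body: str) -> Path:
    """Persist the scrubbed error so readers never see a half-written file."""
    target = brain_dir / "cloud_push_error.json"
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps({"status": status, "error": scrub_error_body(body)}))
    os.replace(tmp, target)  # atomic rename on the same filesystem
    return target
```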
When doctor reports on cloud_push_error.json, the detail string now names
the brain directory it checked. In multi-brain deployments, push() and
doctor() can resolve different brain_dirs silently — surfacing the path
lets users spot the divergence instead of chasing phantom "ok" reports.

Cycle-3 critic finding #3.

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…metry

Three bugs kept last_sync_at frozen:
- cloud/client.py POSTed /brains/sync (path doesn't exist) -> /sync
- cloud/sync.py POSTed /v1/telemetry/metrics -> /api/v1/telemetry/metrics
- Stop hook never fired cloud sync because Claude Code doesn't call
  brain.end_session(). Added cloud_sync_tick() helper in _core.py and
  new _run_cloud_sync step in session_close.py waterfall.

Also elevated silent DEBUG failures to WARNING with HTTP status +
exc_info so the next failure mode surfaces in run.log.

3945 tests pass.

Co-Authored-By: Gradata <noreply@gradata.ai>
New CLI: gradata skill export <name> [--output-dir DIR] [--description STR]
                                      [--category CAT] [--no-meta]

The bet: Claude Skills' "gotchas" section is exactly what graduated
RULE-tier lessons are -- but generated from real corrections instead of
hand-written. This turns a brain into a portable, shippable Skill folder
with valid YAML frontmatter, category-grouped gotchas, and (when
available) injectable meta-principles.

- new module enhancements/skill_export.py reuses _parse_rules from
  rule_export so the RULE-only filter and [hooked] marker stripping
  stay consistent across exporters
- auto-generated frontmatter description lists rule categories with
  defensive 900-char clip (Anthropic 1024 ceiling)
- name slugified for safe folder name + frontmatter alignment
- description quote-escapes preserve YAML validity
- meta-rule loader degrades gracefully on missing system.db / table

24 new tests; full suite 3969 pass (+24, 0 regressions).

Unblocks M4 items 7 and 9 (self-dev Skill, composition Skill) per
plans/swift-toasting-origami.md.

Co-Authored-By: Gradata <noreply@gradata.ai>
…ug 2)

Pairs with gradata-cloud PR #12. Was Bug 2 from /tmp/audit-bug2-watermark.md.

- client.sync() now reads events.jsonl, filters by last_sync_at watermark,
  batches 500 at a time, advances cursor on 200, retries with smaller batch on 413.
- Sync state at <BRAIN_DIR>/.gradata-sync-state.json (separate from events.jsonl
  which stays append-only and untouched).
- 9/9 new tests pass in tests/test_cloud_client_sync.py.

Council perspective P3 (Skeptic) had this take after audit-gate blocked the
aggregate-only path — 3 cloud routes (analytics.py, activity.py, corrections.py)
read raw events directly, so telemetry-only would have flatlined them.

greptile-apps Bot commented May 2, 2026

Too many files changed for review. (243 files found, 100 file limit)

coderabbitai Bot commented May 2, 2026

📝 Walkthrough

Summary

Cloud Sync & Event Handling (Core Feature)

  • Fixed critical bug: client.sync() now properly reads events.jsonl, filters by watermark (last_sync_at), batches events (500 per request), and advances cursor on successful POST
  • Robust batching: Handles HTTP 413 errors by halving batch size and retrying; persists sync state to .gradata-sync-state.json with watermark tracking
  • Event schema: Added deterministic event formatting with SHA-256-derived IDs and in-batch deduplication
  • Error handling: Constraint violations and network failures logged and persisted to cloud_push_error.json; non-fatal failures don't block retries
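
As an illustration of SHA-256-derived ids with in-batch deduplication, one possible shape is sketched below. The canonicalization, the 32-character truncation, and the `event_id` key are assumptions; the summary does not say how the real client derives its ids.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Hash a canonical JSON form so identical events always get the same id."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]


def dedupe_batch(events: list) -> list:
    """Drop repeats within one batch and attach the derived id to each event."""
    seen = set()
    unique = []
    for event in events:
        eid = event_id(event)
        if eid not in seen:
            seen.add(eid)
            unique.append({**event, "event_id": eid})
    return unique
```

Because the id depends only on content, replaying the same batch produces the same ids, which lets a server-side unique constraint absorb retries.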

Database & Persistence

  • Event identity migration: Added event_id, device_id, content_hash columns with deterministic ULID generation from timestamps
  • Sync state table: New sync_state table with per-device watermarks (last_push_event_id, last_pull_cursor)
  • Schema constraints: Unique constraints on corrections and events tables (idempotent migrations)

Breaking Changes

  • CloudClient.sync(): Return type changed from dict → int (ingested event count); now takes a batch_size parameter (default 500)
  • brain_end_session(): Added skip_meta_rules: bool = False parameter

New Public APIs

  • Config helpers: get_config_dir(), get_config_file(name: str)
  • Event emissions: emit_gate_result(...), emit_gate_override(...)
  • Cloud telemetry: cloud_sync_tick(brain_dir, session_number)
  • Skill export: export_skill(...), write_skill(...) for exporting graduated rules as Claude Skills
  • Transcript logging: log_turn(...), load_turns(...), cleanup_ttl(...) for session conversation logging
  • Device/tenant helpers: get_or_create_device_id(), ULID generation (new_ulid(), ulid_from_iso())
  • Diagnostics: diagnose() extended with include_cloud, cloud_only flags for cloud-specific probes
  • Rule synthesis: New synthesize_rules_block() module for LLM-powered rule consolidation with caching

Security & Data Quality

  • PII redaction: Email addresses redacted from event payloads before persistence (dual-write: canonical redacted + best-effort raw backup)
  • HTTPS enforcement: GenericHTTPProvider now guards configured base URL with HTTPS validation
  • Deprecations: Streamlit dashboard deprecated (archived); local meta-rule synthesis now deterministic (no cloud dependency)

Testing & Tools

  • 9 new tests: test_cloud_client_sync.py covering watermark filtering, batching, retry logic, and error scenarios
  • Backfill script: scripts/backfill_to_cloud.py for one-shot replay of historical events with idempotency

Documentation

  • Updated architecture docs clarifying local-first sourcing, cloud as mirror/sync target only
  • Meta-rules, graduation, and synthesis now fully local; cloud holds visualization & optional telemetry only

Walkthrough

This PR implements a comprehensive shift toward a "local-first" architecture: Gradata becomes functionally complete without cloud services, meta-rule synthesis runs locally using the user's LLM provider, cloud syncs graduated rules and events for visualization/backup/cross-device use only, and new migrations/client code support resumable multi-device sync. Additionally, graduated lessons export as Anthropic Claude Skills via a new CLI command, and extensive code refactoring standardizes formatting and improves observability.

Changes

Local-First Architecture Shift

| Layer | File(s) | Summary |
|---|---|---|
| Documentation Updates | Gradata/docs/architecture/*, Gradata/docs/cloud/*, Gradata/docs/concepts/meta-rules.md | Repositioned cloud as a sync target and visualization layer, not a learning gate. Removed claims that cloud synthesis or shared meta-rules are required. Clarified that local SDK performs graduation, diffing, rule injection, and meta-rule synthesis; cloud mirrors results. |
| Meta-Rule Localization | Gradata/src/gradata/enhancements/meta_rules.py | Replaced cloud placeholder discovery with deterministic local clustering. Lessons are grouped by semantic similarity within categories, principles synthesized from highest-confidence examples, and confidence decayed by non-reinforcement. Falls back to deterministic meta-rules when LLM unavailable (with warning). |
| Meta-Rule Storage Schema | Gradata/src/gradata/enhancements/meta_rules_storage.py, Gradata/migrations/supabase/014_corrections_unique.sql, 015_events_unique.sql, 016_brains_last_used_at.sql | Added applies_when, never_when, transfer_scope, source columns to local/cloud meta-rule tables. Supabase migrations ensure deduplication and track sync identity. |
| Local Graduation Integration | Gradata/src/gradata/enhancements/self_improvement/_graduation.py | New centralized graduation module implementing Beta lower-bound gates, per-lesson state transitions (INSTINCT ↔ PATTERN, PATTERN → RULE), multi-stage adversarial gates, and optional rule-wording refinement via tree-of-thoughts. |
| Core Wiring Updates | Gradata/src/gradata/_core.py | brain_end_session now skips meta-rule discovery when skip_meta_rules=True. Lesson/rule state transitions routed through unified graduate(...). New cloud_sync_tick() exports lessons and corrections for a session. |

Cloud Sync & Multi-Device Support

Layer / File(s) Summary
Sync State & Migrations
Gradata/src/gradata/_migrations/002_add_event_identity.py, 003_add_sync_state.py, device_uuid.py, _ulid.py
New migrations add event_id (ULID), device_id, content_hash to events table for deterministic replay. sync_state table tracks per-device watermarks (last_push_event_id, last_pull_cursor). Device IDs generated/persisted atomically in .device_id. ULID generator for stable timestamps.
Cloud Sync Client Upgrade
Gradata/src/gradata/cloud/client.py
Replaced POST /brains/sync with /sync endpoint. sync(batch_size) now reads events.jsonl, batches filtered events, handles HTTP 413 via adaptive batch downsizing, and returns ingested count. Sync state persisted to disk for resumability.
Cloud Payload Transformation
Gradata/src/gradata/_cloud_sync.py
Added row transformation helpers: deterministic UUID generation, table remapping (events → events, meta_rules → meta_rules, etc.), JSON scrubbing (NUL/surrogate removal), session coercion. Post failures now persist to cloud_push_error.json for inspection.
Error Persistence & Logging
Gradata/src/gradata/_cloud_sync.py, _doctor.py
_cloud_sync.py logs constraint violations and network errors separately; failures recorded in cloud_push_error.json and cleared on success. diagnose() gains include_cloud/cloud_only flags to probe cloud connectivity, auth, and push-error state.
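
The batching behavior described for `sync(batch_size)` can be sketched roughly as follows. This is a minimal illustration, not the code in `cloud/client.py`: the `post` callable stands in for the HTTP transport, the `ts` field name is an assumption, and events are assumed to appear in `events.jsonl` in chronological order.

```python
import json
from typing import Callable, Iterable, Optional


def pending_events(lines: Iterable[str], watermark: str) -> list[dict]:
    """Parse JSONL lines and keep events strictly newer than the watermark."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a torn or partial line rather than abort the sync
        # Assumption: every event carries an ISO-8601 UTC "ts" string, so
        # lexicographic comparison matches chronological order.
        if event.get("ts", "") > watermark:
            events.append(event)
    return events


def push_in_batches(
    events: list[dict],
    post: Callable[[list[dict]], int],  # hypothetical transport: returns HTTP status
    batch_size: int = 500,
) -> tuple[int, Optional[str]]:
    """Send events in order; return (ingested_count, new_watermark)."""
    ingested = 0
    watermark = None
    i = 0
    while i < len(events):
        batch = events[i:i + batch_size]
        status = post(batch)
        if status == 200:
            # Advance the cursor only after the server accepted the batch.
            ingested += len(batch)
            watermark = batch[-1]["ts"]
            i += len(batch)
        elif status == 413 and batch_size > 1:
            # Payload too large: halve the batch and retry the same offset.
            batch_size = max(1, batch_size // 2)
        else:
            break  # leave the cursor in place so the next run resumes here
    return ingested, watermark
```

Because the cursor only moves on a 200, an interrupted run resends at most one batch, which is harmless if the server upserts on the event identity.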

Skill Export & CLI Enhancement

Layer / File(s) Summary
Skill Export Module
Gradata/src/gradata/enhancements/skill_export.py
New module exports graduated RULE lessons as Anthropic Claude Skill SKILL.md with YAML frontmatter (name, description) and "Gotchas" sections grouped by rule category. Optionally includes injectable meta-principles from system.db. Caches full export.
CLI Skill Command
Gradata/src/gradata/cli.py
Added top-level skill export subcommand accepting name, output-dir, optional description/category, and meta-principles toggle. Wires into export_skill() / write_skill() for folder output or stdout.
Doctor Cloud Diagnostics
Gradata/src/gradata/_doctor.py
diagnose() now accepts include_cloud and cloud_only flags. New cloud probes check credentials, connectivity, auth token validity, dashboard visibility, and persisted sync errors.
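
For orientation, a SKILL.md export of the kind described above might be rendered along these lines. This is a hedged sketch: `render_skill`, the `(category, text)` rule shape, and the exact heading wording are assumptions for illustration, not the signatures in `skill_export.py`.

```python
import json
from collections import defaultdict


def render_skill(name: str, description: str, rules: list[tuple[str, str]]) -> str:
    """Render rules as a SKILL.md string with YAML frontmatter.

    `rules` is a list of (category, rule_text) pairs (hypothetical shape).
    """
    by_category: dict[str, list[str]] = defaultdict(list)
    for category, text in rules:
        by_category[category].append(text)

    # json.dumps yields a double-quoted scalar, which is also valid YAML,
    # so colons or quotes in the values cannot break the frontmatter.
    lines = [
        "---",
        f"name: {json.dumps(name)}",
        f"description: {json.dumps(description)}",
        "---",
        "",
    ]
    # One "Gotchas" section per rule category, in a stable order.
    for category in sorted(by_category):
        lines.append(f"## Gotchas: {category}")
        lines.extend(f"- {text}" for text in by_category[category])
        lines.append("")
    return "\n".join(lines)
```

Sorting the categories keeps repeated exports byte-identical for the same input, which makes a cached export easy to compare.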

Observability & Event Handling Enhancements

Layer / File(s) Summary
PII Redaction
Gradata/src/gradata/_events.py
Event payloads now redact email addresses before persisting. Canonical event uses redacted_data; raw unredacted event side-logged to events.raw.jsonl. New emit_gate_result() and emit_gate_override() helpers standardize gate-event emission.
Debug Logging
Gradata/src/gradata/_context_packet.py, _context_compile.py, _brain_manifest.py, _manifest_metrics.py, context_wrapper.py
Replaced silent pass exception handlers with debug logging via _log.debug(...). Non-fatal failures (context queries, rule retrieval, config parsing) now surface for troubleshooting.
Transcript Logging
Gradata/src/gradata/_transcript.py, _transcript_providers.py
New opt-in transcript logger: log_turn() appends JSON-line turns to sessions/{session_id}/transcript.jsonl (gated by GRADATA_TRANSCRIPT=1). Supports Claude Code native JSONL and Gradata middleware transcripts via source-agnostic interface.
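
The email redaction step could look something like the sketch below. The regular expression, the `[REDACTED_EMAIL]` placeholder, and the `redact` helper name are illustrative assumptions; the pattern in `_events.py` may differ.

```python
import re

# Deliberately simple pattern: it favors catching common addresses over
# full RFC 5322 coverage.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value):
    """Return a copy of `value` with email addresses masked in every string."""
    if isinstance(value, str):
        return _EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

Returning a new structure instead of mutating in place leaves the raw payload available for a separate side log.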

Config & Path Resolution

Layer / File(s) Summary
Centralized Config Paths
Gradata/src/gradata/_config_paths.py
New get_config_dir() resolves config via GRADATA_CONFIG_DIR → XDG_CONFIG_HOME/gradata → ~/.gradata. get_config_file(name) is a convenience wrapper. Platform-aware precedence.
Brain Context DI
Gradata/src/gradata/_paths.py
Introduced BrainContext frozen dataclass to hold resolved per-brain paths. BrainContext.from_brain_dir() derives all context paths from a brain directory, supporting dependency injection in DI-aware modules. Kept module-level defaults for backward compatibility.
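
The precedence described for `get_config_dir()` can be sketched as below. The `env` parameter is an assumption added here so the lookup can be exercised without touching the real environment, and the sketch ignores any platform-specific branches the real module may have.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def get_config_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the config directory: explicit override, then XDG, then home."""
    env = os.environ if env is None else env

    explicit = env.get("GRADATA_CONFIG_DIR")
    if explicit:
        return Path(explicit).expanduser()

    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        return Path(xdg).expanduser() / "gradata"

    return Path.home() / ".gradata"


def get_config_file(name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Convenience wrapper: path of a named file inside the config directory."""
    return get_config_dir(env) / name
```

Empty-string variables are treated as unset, so an exported but blank `GRADATA_CONFIG_DIR` falls through to the next option.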

Extensive Refactoring & Formatting

Layer / File(s) Summary
Code Formatting
~80+ files across contrib/patterns/*, enhancements/*, src/gradata/*
Multi-line parameter lists, dictionary/list literals, and function signatures reformatted for readability. No logic changes; error messages, thresholds, and control flow remain identical. Consolidates single-line expressions where verbose multi-line was redundant.
.gitignore Updates
.gitignore
Added .graphify_* patterns (artifacts/scripts), Gradata/run.log, **/run.log, and Gradata/docs/pre-launch-plan.md to ignored SDK-internal drafts. Added standalone tokens near Windows cmd section.
Hook Configuration
Gradata/hooks/hooks.json
Added context-window watchdog Stop hook (ctx_watchdog, 10000ms). Updated existing session-close hook description and extended timeout (15000→90000ms) to reflect gated, concurrency-locked, SDK-only throttled graduation sweep.
CHANGELOG & Cleanup Docs
Gradata/CHANGELOG.md, Gradata/docs/LEGACY_CLEANUP.md
Added [Unreleased] section documenting cloud sync, Supabase migrations, dashboard deprecation, implicit feedback contract, and test/governance fixes. Legacy cleanup checklist for retiring cloud-gate concepts and adapter shims.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Brain as Brain (Local)
    participant Migrate as Migrations
    participant CloudSync as CloudClient
    participant Cloud as Cloud (Supabase)
    participant Dashboard as Dashboard

    Note over User,Dashboard: Initial Setup: Multi-Device Sync Initialization
    User->>Brain: Call set_brain_dir (device A, first time)
    Brain->>Migrate: 001_add_tenant_id: backfill tenant_id
    Brain->>Migrate: 002_add_event_identity: generate device_id, event_id, content_hash
    Migrate->>Brain: Store .device_id locally
    Migrate->>Brain: Create sync_state table with watermarks
    Brain->>Brain: Initialize .gradata-sync-state.json

    Note over User,Dashboard: Session: Meta-Rule Synthesis (Local)
    User->>Brain: brain.correct(...)
    Brain->>Brain: _attribute_domain_fires(), build lessons
    Brain->>Brain: brain_end_session(...)
    Brain->>Brain: discover_meta_rules() → cluster by similarity (local)
    Brain->>Brain: merge_into_meta() → deterministic synthesis
    Brain->>Brain: emit LESSON_CHANGE, RULE_CREATED events
    Brain->>Brain: Persist to events.jsonl, system.db

    Note over User,Dashboard: Sync: Push Events + Graduated Rules to Cloud
    User->>CloudSync: Call client.sync()
    CloudSync->>Brain: Read events.jsonl, load last_push_event_id from sync_state
    CloudSync->>CloudSync: Filter pending events, batch (batch_size=500)
    CloudSync->>CloudSync: Transform rows: deterministic UUIDs, table remap, JSON scrub
    CloudSync->>Cloud: POST /sync (batched events)
    Cloud->>Cloud: Upsert events, meta_rules (conflict-free append-only)
    CloudSync->>Brain: Write new watermark to sync_state.json
    CloudSync->>Brain: Return ingested_count

    Note over User,Dashboard: Dashboard View (Async, Read-Only)
    Dashboard->>Cloud: Query events, meta_rules, graduated lessons
    Dashboard->>User: Render charts, learning funnel, meta-rule corpus
    Note over Dashboard: Cloud never re-runs graduation or modifies local state

    Note over User,Dashboard: Second Device: Resume Learning
    User->>Brain: Set brain_dir (device B, existing brain)
    Brain->>Migrate: device_uuid.get_or_create_device_id() → new device_id
    Brain->>CloudSync: Call client.sync()
    CloudSync->>Cloud: Pull new events/rules since last cursor
    Cloud->>CloudSync: Return events for this device
    CloudSync->>Brain: Ingest events, update local system.db
    CloudSync->>Brain: Advance last_pull_cursor
    Brain->>Brain: Continue learning loop (graduation, synthesis local)
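
The "Transform rows" step in the diagram (deterministic UUIDs and JSON scrub) can be illustrated with a short sketch. The namespace value, the key layout, and the helper names are assumptions made for this example; only the use of uuid5 and the removal of NUL bytes and lone surrogates come from the PR description.

```python
import re
import uuid

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/gradata-sketch")

# NUL bytes and lone UTF-16 surrogates, which Postgres JSONB rejects.
_BAD_CHARS = re.compile(r"[\x00\ud800-\udfff]")


def deterministic_id(brain_id: str, table: str, natural_key: str) -> str:
    """Same inputs always give the same UUID, so a repeated push upserts."""
    return str(uuid.uuid5(_NAMESPACE, f"{brain_id}:{table}:{natural_key}"))


def scrub(value):
    """Recursively strip characters that the JSONB column cannot store."""
    if isinstance(value, str):
        return _BAD_CHARS.sub("", value)
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value
```

Deriving the id from stable inputs rather than `uuid4()` is what makes a retry after a partial failure safe: the second attempt targets the same rows.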

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

  • PR #144: Introduces the skill export module and CLI commands, plus cloud_sync_tick() in _core.py, with overlapping cloud sync client and endpoint path changes.
  • PR #102: Adds multi-tenant tenant_id handling and multi-device sync migrations with overlapping changes to _cloud_sync.py and tenant helper wiring.
  • PR #133: Touches overlapping files (.gitignore, docs, hooks, _cloud_sync.py, meta_rules.py, cli.py) implementing related hook and local-first wiring.

Suggested labels

feature, refactor

One-shot: counts events.jsonl, resets local sync state, calls client.sync()
in a loop until cursor catches up. Idempotent — server upserts on
(brain_id, event_id). Run after PRs #11/#12/#161 merge to backfill the
~5800 historical events the broken sync silently dropped.
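
In outline, the one-shot backfill described in this commit message might look like the following. It is a sketch under assumptions: `sync` is a stand-in callable for `client.sync()` that returns the number of events ingested per call, and `max_rounds` is a hypothetical safety limit; the real `scripts/backfill_to_cloud.py` may be structured differently.

```python
from pathlib import Path
from typing import Callable


def count_events(events_path: Path) -> int:
    """Count non-empty lines in events.jsonl."""
    with events_path.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


def backfill(brain_dir: Path, sync: Callable[[], int], max_rounds: int = 1000) -> int:
    """Reset local sync state, then sync repeatedly until nothing is left."""
    # Dropping the state file forces the next sync to start from the beginning.
    # This is safe only because the server upserts on (brain_id, event_id).
    (brain_dir / ".gradata-sync-state.json").unlink(missing_ok=True)

    total = count_events(brain_dir / "events.jsonl")
    pushed = 0
    for _ in range(max_rounds):  # bounded so a stuck cursor cannot loop forever
        ingested = sync()
        if ingested <= 0:
            break
        pushed += ingested
        if pushed >= total:
            break
    return pushed
```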
@coderabbitai coderabbitai Bot added bug Something isn't working feature labels May 2, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 55


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6681236a-68ce-4e4c-b663-7010e17c61fe

📥 Commits

Reviewing files that changed from the base of the PR and between 951791e and caa503f.

⛔ Files ignored due to path filters (1)
  • .claude/hooks/statusline/sprites-statusline.js is excluded by !.claude/**
📒 Files selected for processing (242)
  • .gitignore
  • Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py
  • Gradata/CHANGELOG.md
  • Gradata/docs/LEGACY_CLEANUP.md
  • Gradata/docs/architecture/cloud-monolith-v2.md
  • Gradata/docs/architecture/multi-tenant-future-proofing.md
  • Gradata/docs/cloud/dashboard.md
  • Gradata/docs/cloud/overview.md
  • Gradata/docs/concepts/meta-rules.md
  • Gradata/docs/specs/cloud-sync-and-pricing.md
  • Gradata/hooks/hooks.json
  • Gradata/migrations/supabase/014_corrections_unique.sql
  • Gradata/migrations/supabase/015_events_unique.sql
  • Gradata/migrations/supabase/016_brains_last_used_at.sql
  • Gradata/migrations/supabase/README.md
  • Gradata/scripts/backfill_to_cloud.py
  • Gradata/skills/core/session-start/SKILL.md
  • Gradata/src/gradata/__init__.py
  • Gradata/src/gradata/_brain_manifest.py
  • Gradata/src/gradata/_cloud_sync.py
  • Gradata/src/gradata/_config.py
  • Gradata/src/gradata/_config_paths.py
  • Gradata/src/gradata/_context_compile.py
  • Gradata/src/gradata/_context_packet.py
  • Gradata/src/gradata/_core.py
  • Gradata/src/gradata/_data_flow_audit.py
  • Gradata/src/gradata/_db.py
  • Gradata/src/gradata/_doctor.py
  • Gradata/src/gradata/_events.py
  • Gradata/src/gradata/_export_brain.py
  • Gradata/src/gradata/_fact_extractor.py
  • Gradata/src/gradata/_file_lock.py
  • Gradata/src/gradata/_http.py
  • Gradata/src/gradata/_installer.py
  • Gradata/src/gradata/_manifest_helpers.py
  • Gradata/src/gradata/_manifest_metrics.py
  • Gradata/src/gradata/_migrations/001_add_tenant_id.py
  • Gradata/src/gradata/_migrations/002_add_event_identity.py
  • Gradata/src/gradata/_migrations/003_add_sync_state.py
  • Gradata/src/gradata/_migrations/_runner.py
  • Gradata/src/gradata/_migrations/_ulid.py
  • Gradata/src/gradata/_migrations/device_uuid.py
  • Gradata/src/gradata/_migrations/fill_null_tenant.py
  • Gradata/src/gradata/_migrations/tenant_uuid.py
  • Gradata/src/gradata/_mine_transcripts.py
  • Gradata/src/gradata/_paths.py
  • Gradata/src/gradata/_query.py
  • Gradata/src/gradata/_stats.py
  • Gradata/src/gradata/_telemetry.py
  • Gradata/src/gradata/_tenant.py
  • Gradata/src/gradata/_text_utils.py
  • Gradata/src/gradata/_transcript.py
  • Gradata/src/gradata/_transcript_providers.py
  • Gradata/src/gradata/_types.py
  • Gradata/src/gradata/_validator.py
  • Gradata/src/gradata/_workers.py
  • Gradata/src/gradata/adapters/mem0.py
  • Gradata/src/gradata/audit.py
  • Gradata/src/gradata/brain.py
  • Gradata/src/gradata/brain_inspection.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/cloud/client.py
  • Gradata/src/gradata/cloud/sync.py
  • Gradata/src/gradata/context_wrapper.py
  • Gradata/src/gradata/contrib/enhancements/eval_benchmark.py
  • Gradata/src/gradata/contrib/enhancements/install_manifest.py
  • Gradata/src/gradata/contrib/enhancements/quality_gates.py
  • Gradata/src/gradata/contrib/enhancements/truth_protocol.py
  • Gradata/src/gradata/contrib/patterns/__init__.py
  • Gradata/src/gradata/contrib/patterns/agent_modes.py
  • Gradata/src/gradata/contrib/patterns/context_brackets.py
  • Gradata/src/gradata/contrib/patterns/evaluator.py
  • Gradata/src/gradata/contrib/patterns/execute_qualify.py
  • Gradata/src/gradata/contrib/patterns/guardrails.py
  • Gradata/src/gradata/contrib/patterns/human_loop.py
  • Gradata/src/gradata/contrib/patterns/loop_detection.py
  • Gradata/src/gradata/contrib/patterns/mcp.py
  • Gradata/src/gradata/contrib/patterns/memory.py
  • Gradata/src/gradata/contrib/patterns/middleware.py
  • Gradata/src/gradata/contrib/patterns/orchestrator.py
  • Gradata/src/gradata/contrib/patterns/parallel.py
  • Gradata/src/gradata/contrib/patterns/pipeline.py
  • Gradata/src/gradata/contrib/patterns/q_learning_router.py
  • Gradata/src/gradata/contrib/patterns/rag.py
  • Gradata/src/gradata/contrib/patterns/reconciliation.py
  • Gradata/src/gradata/contrib/patterns/reflection.py
  • Gradata/src/gradata/contrib/patterns/sub_agents.py
  • Gradata/src/gradata/contrib/patterns/task_escalation.py
  • Gradata/src/gradata/contrib/patterns/tools.py
  • Gradata/src/gradata/contrib/patterns/tree_of_thoughts.py
  • Gradata/src/gradata/correction_detector.py
  • Gradata/src/gradata/daemon.py
  • Gradata/src/gradata/detection/addition_pattern.py
  • Gradata/src/gradata/enhancements/_sanitize.py
  • Gradata/src/gradata/enhancements/bandits/collaborative_filter.py
  • Gradata/src/gradata/enhancements/bandits/contextual_bandit.py
  • Gradata/src/gradata/enhancements/behavioral_engine.py
  • Gradata/src/gradata/enhancements/causal_chains.py
  • Gradata/src/gradata/enhancements/cluster_manager.py
  • Gradata/src/gradata/enhancements/clustering.py
  • Gradata/src/gradata/enhancements/contradiction_detector.py
  • Gradata/src/gradata/enhancements/dedup.py
  • Gradata/src/gradata/enhancements/diff_engine.py
  • Gradata/src/gradata/enhancements/edit_classifier.py
  • Gradata/src/gradata/enhancements/freshness.py
  • Gradata/src/gradata/enhancements/git_backfill.py
  • Gradata/src/gradata/enhancements/graduation/agent_graduation.py
  • Gradata/src/gradata/enhancements/graduation/judgment_decay.py
  • Gradata/src/gradata/enhancements/graduation/rules_distillation.py
  • Gradata/src/gradata/enhancements/graduation/scoring.py
  • Gradata/src/gradata/enhancements/instruction_cache.py
  • Gradata/src/gradata/enhancements/learning_pipeline.py
  • Gradata/src/gradata/enhancements/lesson_discriminator.py
  • Gradata/src/gradata/enhancements/llm_provider.py
  • Gradata/src/gradata/enhancements/llm_synthesizer.py
  • Gradata/src/gradata/enhancements/memory_taxonomy.py
  • Gradata/src/gradata/enhancements/meta_rules.py
  • Gradata/src/gradata/enhancements/meta_rules_storage.py
  • Gradata/src/gradata/enhancements/metrics.py
  • Gradata/src/gradata/enhancements/observation_hooks.py
  • Gradata/src/gradata/enhancements/pattern_extractor.py
  • Gradata/src/gradata/enhancements/pattern_integration.py
  • Gradata/src/gradata/enhancements/pipeline_rewriter.py
  • Gradata/src/gradata/enhancements/profiling/tone_profile.py
  • Gradata/src/gradata/enhancements/prompt_synthesizer.py
  • Gradata/src/gradata/enhancements/reporting.py
  • Gradata/src/gradata/enhancements/retrieval_fusion.py
  • Gradata/src/gradata/enhancements/router_warmstart.py
  • Gradata/src/gradata/enhancements/rule_canary.py
  • Gradata/src/gradata/enhancements/rule_context_bridge.py
  • Gradata/src/gradata/enhancements/rule_export.py
  • Gradata/src/gradata/enhancements/rule_integrity.py
  • Gradata/src/gradata/enhancements/rule_pipeline.py
  • Gradata/src/gradata/enhancements/rule_synthesizer.py
  • Gradata/src/gradata/enhancements/rule_to_hook.py
  • Gradata/src/gradata/enhancements/rule_verifier.py
  • Gradata/src/gradata/enhancements/scoring/brain_scores.py
  • Gradata/src/gradata/enhancements/scoring/calibration.py
  • Gradata/src/gradata/enhancements/scoring/correction_tracking.py
  • Gradata/src/gradata/enhancements/scoring/failure_detectors.py
  • Gradata/src/gradata/enhancements/scoring/gate_calibration.py
  • Gradata/src/gradata/enhancements/scoring/loop_intelligence.py
  • Gradata/src/gradata/enhancements/scoring/memory_extraction.py
  • Gradata/src/gradata/enhancements/scoring/reports.py
  • Gradata/src/gradata/enhancements/scoring/success_conditions.py
  • Gradata/src/gradata/enhancements/self_improvement/__init__.py
  • Gradata/src/gradata/enhancements/self_improvement/_confidence.py
  • Gradata/src/gradata/enhancements/self_improvement/_graduation.py
  • Gradata/src/gradata/enhancements/similarity.py
  • Gradata/src/gradata/enhancements/skill_export.py
  • Gradata/src/gradata/events_bus.py
  • Gradata/src/gradata/graph.py
  • Gradata/src/gradata/hooks/_base.py
  • Gradata/src/gradata/hooks/_generated_runner_core.py
  • Gradata/src/gradata/hooks/_installer.py
  • Gradata/src/gradata/hooks/_profiles.py
  • Gradata/src/gradata/hooks/agent_graduation.py
  • Gradata/src/gradata/hooks/agent_precontext.py
  • Gradata/src/gradata/hooks/auto_correct.py
  • Gradata/src/gradata/hooks/brain_maintain.py
  • Gradata/src/gradata/hooks/claude_code.py
  • Gradata/src/gradata/hooks/client.py
  • Gradata/src/gradata/hooks/config_protection.py
  • Gradata/src/gradata/hooks/config_validate.py
  • Gradata/src/gradata/hooks/context_inject.py
  • Gradata/src/gradata/hooks/ctx_watchdog.py
  • Gradata/src/gradata/hooks/daemon.py
  • Gradata/src/gradata/hooks/dispatch_post.py
  • Gradata/src/gradata/hooks/duplicate_guard.py
  • Gradata/src/gradata/hooks/generated_runner.py
  • Gradata/src/gradata/hooks/generated_runner_post.py
  • Gradata/src/gradata/hooks/graph_first_check.py
  • Gradata/src/gradata/hooks/graph_session_track.py
  • Gradata/src/gradata/hooks/implicit_feedback.py
  • Gradata/src/gradata/hooks/inject_brain_rules.py
  • Gradata/src/gradata/hooks/jit_inject.py
  • Gradata/src/gradata/hooks/pre_compact.py
  • Gradata/src/gradata/hooks/rule_enforcement.py
  • Gradata/src/gradata/hooks/secret_scan.py
  • Gradata/src/gradata/hooks/self_review.py
  • Gradata/src/gradata/hooks/session_boot.py
  • Gradata/src/gradata/hooks/session_close.py
  • Gradata/src/gradata/hooks/session_persist.py
  • Gradata/src/gradata/hooks/stale_hook_check.py
  • Gradata/src/gradata/hooks/status_line.py
  • Gradata/src/gradata/hooks/telemetry_summary.py
  • Gradata/src/gradata/hooks/tool_failure_emit.py
  • Gradata/src/gradata/hooks/tool_finding_capture.py
  • Gradata/src/gradata/inspection.py
  • Gradata/src/gradata/integrations/anthropic_adapter.py
  • Gradata/src/gradata/integrations/openai_adapter.py
  • Gradata/src/gradata/mcp_server.py
  • Gradata/src/gradata/mcp_tools.py
  • Gradata/src/gradata/middleware/__init__.py
  • Gradata/src/gradata/middleware/_core.py
  • Gradata/src/gradata/middleware/anthropic_adapter.py
  • Gradata/src/gradata/middleware/crewai_adapter.py
  • Gradata/src/gradata/middleware/langchain_adapter.py
  • Gradata/src/gradata/middleware/openai_adapter.py
  • Gradata/src/gradata/notifications.py
  • Gradata/src/gradata/onboard.py
  • Gradata/src/gradata/rules/rule_context.py
  • Gradata/src/gradata/rules/rule_engine/__init__.py
  • Gradata/src/gradata/rules/rule_engine/_formatting.py
  • Gradata/src/gradata/rules/rule_ranker.py
  • Gradata/src/gradata/rules/scope.py
  • Gradata/src/gradata/safety.py
  • Gradata/src/gradata/security/correction_hash.py
  • Gradata/src/gradata/security/correction_provenance.py
  • Gradata/src/gradata/security/manifest_signing.py
  • Gradata/src/gradata/sidecar/watcher.py
  • Gradata/tests/conftest.py
  • Gradata/tests/test_agent_graduation.py
  • Gradata/tests/test_bug_fixes.py
  • Gradata/tests/test_cloud_client_sync.py
  • Gradata/tests/test_cloud_row_push.py
  • Gradata/tests/test_cloud_sync.py
  • Gradata/tests/test_cluster_injection.py
  • Gradata/tests/test_ctx_watchdog.py
  • Gradata/tests/test_doctor_cloud.py
  • Gradata/tests/test_emit_pii_redaction.py
  • Gradata/tests/test_graph_enforcement.py
  • Gradata/tests/test_hooks_intelligence.py
  • Gradata/tests/test_hooks_learning.py
  • Gradata/tests/test_implicit_feedback.py
  • Gradata/tests/test_inject_watchdog_phases.py
  • Gradata/tests/test_integration_workflow.py
  • Gradata/tests/test_lesson_applications.py
  • Gradata/tests/test_llm_synthesizer.py
  • Gradata/tests/test_mem0_adapter.py
  • Gradata/tests/test_meta_rule_generalization.py
  • Gradata/tests/test_meta_rules.py
  • Gradata/tests/test_migration_002_event_identity.py
  • Gradata/tests/test_migration_003_sync_state.py
  • Gradata/tests/test_multi_brain_simulation.py
  • Gradata/tests/test_pipeline_e2e.py
  • Gradata/tests/test_pre_compact.py
  • Gradata/tests/test_rule_pipeline.py
  • Gradata/tests/test_rule_synthesizer.py
  • Gradata/tests/test_session_close_loop_state.py
  • Gradata/tests/test_skill_export.py
  • Gradata/tests/test_transcript.py
💤 Files with no reviewable changes (1)
  • Gradata/src/gradata/enhancements/self_improvement/_graduation.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
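
As an illustration of the atomic-write rule above, a helper of this kind is commonly built on a temporary file plus `os.replace`. The name `atomic_write_json` is hypothetical; the repository's own helper may have a different name and signature.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload) -> None:
    """Write JSON so readers see either the old file or the new one, never half."""
    path = Path(path)
    # The temp file must live in the same directory: os.replace is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file, then let the original error propagate.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

A crash before `os.replace` leaves the previous file untouched, which is the property a sync-state or error file needs.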

Files:

  • Gradata/src/gradata/enhancements/instruction_cache.py
  • Gradata/src/gradata/_file_lock.py
  • Gradata/src/gradata/contrib/patterns/memory.py
  • Gradata/src/gradata/enhancements/clustering.py
  • Gradata/src/gradata/enhancements/router_warmstart.py
  • Gradata/src/gradata/_migrations/_runner.py
  • Gradata/src/gradata/_http.py
  • Gradata/src/gradata/_migrations/_ulid.py
  • Gradata/src/gradata/__init__.py
  • Gradata/src/gradata/_config.py
  • Gradata/src/gradata/enhancements/rule_export.py
  • Gradata/src/gradata/_types.py
  • Gradata/src/gradata/enhancements/pattern_extractor.py
  • Gradata/src/gradata/_tenant.py
  • Gradata/src/gradata/contrib/patterns/tools.py
  • Gradata/src/gradata/enhancements/bandits/contextual_bandit.py
  • Gradata/src/gradata/events_bus.py
  • Gradata/src/gradata/_context_compile.py
  • Gradata/src/gradata/contrib/patterns/evaluator.py
  • Gradata/src/gradata/contrib/patterns/__init__.py
  • Gradata/src/gradata/enhancements/scoring/brain_scores.py
  • Gradata/src/gradata/enhancements/rule_verifier.py
  • Gradata/src/gradata/enhancements/diff_engine.py
  • Gradata/src/gradata/_migrations/device_uuid.py
  • Gradata/src/gradata/_migrations/002_add_event_identity.py
  • Gradata/src/gradata/_migrations/tenant_uuid.py
  • Gradata/src/gradata/_db.py
  • Gradata/src/gradata/_mine_transcripts.py
  • Gradata/src/gradata/contrib/patterns/sub_agents.py
  • Gradata/src/gradata/_data_flow_audit.py
  • Gradata/src/gradata/cloud/sync.py
  • Gradata/src/gradata/_text_utils.py
  • Gradata/src/gradata/contrib/patterns/middleware.py
  • Gradata/src/gradata/contrib/patterns/pipeline.py
  • Gradata/src/gradata/enhancements/dedup.py
  • Gradata/src/gradata/enhancements/rule_context_bridge.py
  • Gradata/src/gradata/contrib/enhancements/truth_protocol.py
  • Gradata/src/gradata/contrib/patterns/parallel.py
  • Gradata/src/gradata/enhancements/lesson_discriminator.py
  • Gradata/src/gradata/enhancements/freshness.py
  • Gradata/src/gradata/audit.py
  • Gradata/src/gradata/enhancements/scoring/loop_intelligence.py
  • Gradata/src/gradata/contrib/patterns/q_learning_router.py
  • Gradata/src/gradata/enhancements/edit_classifier.py
  • Gradata/src/gradata/enhancements/pattern_integration.py
  • Gradata/src/gradata/context_wrapper.py
  • Gradata/src/gradata/enhancements/contradiction_detector.py
  • Gradata/src/gradata/_migrations/fill_null_tenant.py
  • Gradata/src/gradata/enhancements/pipeline_rewriter.py
  • Gradata/src/gradata/_workers.py
  • Gradata/src/gradata/enhancements/profiling/tone_profile.py
  • Gradata/src/gradata/enhancements/rule_canary.py
  • Gradata/src/gradata/contrib/patterns/tree_of_thoughts.py
  • Gradata/src/gradata/contrib/patterns/task_escalation.py
  • Gradata/src/gradata/enhancements/skill_export.py
  • Gradata/src/gradata/_transcript_providers.py
  • Gradata/src/gradata/_transcript.py
  • Gradata/src/gradata/_fact_extractor.py
  • Gradata/src/gradata/enhancements/scoring/reports.py
  • Gradata/src/gradata/contrib/patterns/execute_qualify.py
  • Gradata/src/gradata/_migrations/003_add_sync_state.py
  • Gradata/src/gradata/contrib/patterns/loop_detection.py
  • Gradata/src/gradata/brain_inspection.py
  • Gradata/src/gradata/contrib/patterns/reconciliation.py
  • Gradata/src/gradata/contrib/enhancements/quality_gates.py
  • Gradata/src/gradata/adapters/mem0.py
  • Gradata/src/gradata/enhancements/similarity.py
  • Gradata/src/gradata/enhancements/graduation/judgment_decay.py
  • Gradata/src/gradata/enhancements/metrics.py
  • Gradata/src/gradata/enhancements/graduation/agent_graduation.py
  • Gradata/src/gradata/enhancements/memory_taxonomy.py
  • Gradata/src/gradata/enhancements/graduation/rules_distillation.py
  • Gradata/src/gradata/cloud/client.py
  • Gradata/src/gradata/contrib/patterns/orchestrator.py
  • Gradata/src/gradata/enhancements/reporting.py
  • Gradata/src/gradata/_validator.py
  • Gradata/src/gradata/enhancements/git_backfill.py
  • Gradata/src/gradata/_brain_manifest.py
  • Gradata/src/gradata/enhancements/scoring/gate_calibration.py
  • Gradata/src/gradata/_migrations/001_add_tenant_id.py
  • Gradata/src/gradata/enhancements/retrieval_fusion.py
  • Gradata/src/gradata/enhancements/llm_provider.py
  • Gradata/src/gradata/_manifest_helpers.py
  • Gradata/src/gradata/contrib/patterns/context_brackets.py
  • Gradata/src/gradata/_installer.py
  • Gradata/src/gradata/enhancements/learning_pipeline.py
  • Gradata/src/gradata/enhancements/behavioral_engine.py
  • Gradata/src/gradata/enhancements/scoring/memory_extraction.py
  • Gradata/src/gradata/enhancements/prompt_synthesizer.py
  • Gradata/src/gradata/enhancements/scoring/correction_tracking.py
  • Gradata/src/gradata/_doctor.py
  • Gradata/src/gradata/contrib/patterns/mcp.py
  • Gradata/src/gradata/enhancements/self_improvement/_confidence.py
  • Gradata/src/gradata/contrib/patterns/agent_modes.py
  • Gradata/src/gradata/_cloud_sync.py
  • Gradata/src/gradata/enhancements/scoring/failure_detectors.py
  • Gradata/src/gradata/enhancements/llm_synthesizer.py
  • Gradata/src/gradata/enhancements/bandits/collaborative_filter.py
  • Gradata/src/gradata/contrib/patterns/rag.py
  • Gradata/src/gradata/enhancements/rule_to_hook.py
  • Gradata/src/gradata/enhancements/scoring/calibration.py
  • Gradata/src/gradata/_export_brain.py
  • Gradata/src/gradata/contrib/patterns/human_loop.py
  • Gradata/src/gradata/enhancements/scoring/success_conditions.py
  • Gradata/src/gradata/_stats.py
  • Gradata/src/gradata/detection/addition_pattern.py
  • Gradata/src/gradata/enhancements/causal_chains.py
  • Gradata/src/gradata/enhancements/_sanitize.py
  • Gradata/src/gradata/enhancements/rule_integrity.py
  • Gradata/src/gradata/correction_detector.py
  • Gradata/src/gradata/contrib/patterns/reflection.py
  • Gradata/src/gradata/_telemetry.py
  • Gradata/src/gradata/contrib/enhancements/install_manifest.py
  • Gradata/src/gradata/_manifest_metrics.py
  • Gradata/src/gradata/enhancements/rule_synthesizer.py
  • Gradata/src/gradata/enhancements/meta_rules.py
  • Gradata/src/gradata/daemon.py
  • Gradata/src/gradata/_events.py
  • Gradata/src/gradata/cli.py
  • Gradata/src/gradata/contrib/enhancements/eval_benchmark.py
  • Gradata/src/gradata/graph.py
  • Gradata/src/gradata/brain.py
  • Gradata/src/gradata/_query.py
  • Gradata/src/gradata/_context_packet.py
  • Gradata/src/gradata/_config_paths.py
  • Gradata/src/gradata/_paths.py
  • Gradata/src/gradata/enhancements/observation_hooks.py
  • Gradata/src/gradata/enhancements/rule_pipeline.py
  • Gradata/src/gradata/contrib/patterns/guardrails.py
  • Gradata/src/gradata/enhancements/meta_rules_storage.py
  • Gradata/src/gradata/enhancements/cluster_manager.py
  • Gradata/src/gradata/enhancements/self_improvement/__init__.py
  • Gradata/src/gradata/enhancements/graduation/scoring.py
  • Gradata/src/gradata/_core.py
🪛 LanguageTool
Gradata/docs/specs/cloud-sync-and-pricing.md

[grammar] ~102-~102: Ensure spelling is correct
Context: ...vior: - Triggered on Stop hook or every 5min when events accumulated. - Pushes since...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~269-~269: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...vent push logged with content_hash. - Every ACL change emits an acl_changed event...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[grammar] ~290-~290: Ensure spelling is correct
Context: ... cadence:** hourly for Personal+, every 15min for Teams+, continuous WAL for Enterpri...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

Gradata/docs/cloud/overview.md

[style] ~3-~3: ‘on top of that’ might be wordy. Consider a shorter alternative.
Context: ...uity, team sharing, and managed backups on top of that local loop. ## What's in the SDK vs th...

(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)

🪛 markdownlint-cli2 (0.22.1)
Gradata/skills/core/session-start/SKILL.md

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Gradata/docs/LEGACY_CLEANUP.md

[warning] 16-16: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 27-27: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 32-32: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 37-37: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 44-44: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

Gradata/docs/concepts/meta-rules.md

[warning] 50-50: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

Gradata/migrations/supabase/README.md

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Gradata/docs/specs/cloud-sync-and-pricing.md

[warning] 35-35: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


[warning] 60-60: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


[warning] 81-81: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 109-109: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Gradata/docs/architecture/multi-tenant-future-proofing.md

[warning] 21-21: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

Comment on lines +4 to +26
Run: streamlit run C:/Users/olive/SpritesWork/brain/scripts/dashboard.py
"""

import json
import re
import sqlite3
from datetime import datetime
from pathlib import Path

import pandas as pd
import plotly.graph_objects as go
import streamlit as st

# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
BRAIN_DIR = Path("C:/Users/olive/SpritesWork/brain")
DB_PATH = BRAIN_DIR / "system.db"
EVENTS_PATH = BRAIN_DIR / "events.jsonl"
LESSONS_PATH = BRAIN_DIR / "lessons.md"
PROSPECTS_DIR = BRAIN_DIR / "prospects"
BRIEF_PATH = BRAIN_DIR / "morning-brief.md"
TASKS_DIR = Path("C:/Users/olive/.claude/scheduled-tasks")

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove the user-specific absolute paths from the archived script.

This hardcodes C:/Users/olive/... in both the docstring and runtime config, which leaks a private workstation path into the repo and makes the archive non-portable everywhere else. Parameterize these via env/CLI args or derive them relative to the script.
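One way to parameterize the paths is sketched below. It is a minimal illustration, not the repository's code: the `BRAIN_DIR` environment variable name and both helper functions are assumptions, and the fallback to a `brain` folder under the current working directory is only a placeholder default.

```python
import os
from pathlib import Path


def resolve_brain_dir(env=None):
    """Return the brain directory from the environment, or a relative default."""
    env = os.environ if env is None else env
    raw = env.get("BRAIN_DIR")  # assumed variable name, not confirmed by the repo
    # Fall back to a "brain" folder inside the current working directory.
    return Path(raw).expanduser() if raw else Path.cwd() / "brain"


def brain_paths(brain_dir):
    """Derive the per-file paths that the archived script currently hardcodes."""
    brain_dir = Path(brain_dir)
    return {
        "db": brain_dir / "system.db",
        "events": brain_dir / "events.jsonl",
        "lessons": brain_dir / "lessons.md",
        "prospects": brain_dir / "prospects",
        "brief": brain_dir / "morning-brief.md",
    }
```

With this shape, no workstation path appears in the source, and a user can point the script at any brain by exporting one variable.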

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py` around lines 4
- 26, The file hardcodes user-specific absolute paths (notably the docstring run
path and constants BRAIN_DIR, DB_PATH, EVENTS_PATH, LESSONS_PATH, PROSPECTS_DIR,
BRIEF_PATH, TASKS_DIR); change these to be derived from environment/CLI inputs
or relative locations: replace the literal "C:/Users/olive/..." usage by reading
a base path from an environment variable (e.g., BRAIN_DIR_ENV) or a CLI arg (or
default to Path.home() / "SpritesWork/brain"), then compute DB_PATH,
EVENTS_PATH, LESSONS_PATH, PROSPECTS_DIR, BRIEF_PATH, and TASKS_DIR from that
base; also update the docstring run example to show a generic placeholder (e.g.,
streamlit run path/to/dashboard.py) rather than the absolute user path.

Comment on lines +14 to +19
DELETE FROM events a
USING events b
WHERE a.brain_id = b.brain_id
  AND a.type = b.type
  AND a.created_at = b.created_at
  AND a.ctid > b.ctid;

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

This dedupe key is too coarse for raw events.

Using only (brain_id, type, created_at) can collapse legitimate same-type events that happen at the same timestamp, and the DELETE makes that loss irreversible. Given this PR’s move toward explicit event identity/idempotency, the uniqueness boundary should be a real event identifier, not type + timestamp.

Also applies to: 23-37
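The risk can be reproduced in a few lines. The sketch below uses an in-memory SQLite table as a stand-in for the Postgres `events` table, and the `event_id` column is hypothetical (the migration may use a different identifier). It counts how many rows survive deduplication under each key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id TEXT, brain_id TEXT, type TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("e1", "b1", "CORRECTION", "2026-05-02T10:00:00Z"),
        ("e1", "b1", "CORRECTION", "2026-05-02T10:00:00Z"),  # retry of e1: a true duplicate
        ("e2", "b1", "CORRECTION", "2026-05-02T10:00:00Z"),  # distinct event, same timestamp
    ],
)


def survivors(key_columns):
    """Count the rows left if duplicates are collapsed on the given key."""
    cols = ", ".join(key_columns)
    query = f"SELECT COUNT(*) FROM (SELECT 1 FROM events GROUP BY {cols})"
    return conn.execute(query).fetchone()[0]


coarse = survivors(["brain_id", "type", "created_at"])  # collapses e1 and e2 together
by_id = survivors(["event_id"])  # removes only the retried copy of e1
```

Under the coarse key one row survives and `e2` is lost; keyed on the identifier, both real events survive.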

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/migrations/supabase/015_events_unique.sql` around lines 14 - 19, The
DELETE in the migration is using a too-coarse dedupe key (a.brain_id, a.type,
a.created_at) which can remove legitimate simultaneous events; update the
deduplication to use a true event identifier (for example an event_id or
idempotency_key column) instead of type+timestamp—modify the DELETE ... USING
query to compare a.event_id = b.event_id (or the appropriate unique identifier
column) and only delete duplicates based on that stable identifier, and if such
a column does not exist add a non-null unique event identifier to the events
table first and rework the dedupe logic; apply the same fix to the analogous
block referenced in lines 23-37.

Comment on lines +57 to +65
    with open(events_jsonl) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                ev = json.loads(line)
            except json.JSONDecodeError:
                continue

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Read events.jsonl as UTF-8 explicitly.

Line 57 uses the platform default encoding. On Windows or other non-UTF-8 locales, one non-ASCII event is enough to crash the backfill before any sync happens.

Suggested fix
-    with open(events_jsonl) as f:
+    with open(events_jsonl, encoding="utf-8", errors="replace") as f:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/scripts/backfill_to_cloud.py` around lines 57 - 65, The file reading
loop currently opens events_jsonl with the platform default encoding which can
fail on non-UTF-8 systems; update the open call that reads events_jsonl (the
"with open(events_jsonl) as f:" line) to explicitly specify encoding='utf-8'
(optionally add errors='replace' or 'ignore' if you prefer resilient parsing) so
json.loads receives proper UTF-8 text and non-ASCII events do not crash the
backfill.
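As a self-contained illustration of the suggested change, the sketch below wraps the loop in a hypothetical `read_events` helper (that name is not from the PR). It opens the file as UTF-8, replaces undecodable bytes instead of raising, and skips blank or malformed lines as the original loop does.

```python
import json


def read_events(path):
    """Parse a JSONL event log, tolerating blank lines and malformed records."""
    events = []
    # An explicit encoding keeps behavior identical across platforms;
    # errors="replace" substitutes U+FFFD for bytes that are not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                # Malformed record: skip it, matching the original loop.
                continue
    return events
```

Whether `errors="replace"` or a strict failure is preferable depends on how much silent repair the backfill should tolerate; the reviewer's prompt leaves that choice open.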

Comment on lines +12 to +21
Read `C:/Users/olive/SpritesWork/brain/continuation.md`. If exists, follow its Resume Point, then archive: `python C:/Users/olive/SpritesWork/brain/scripts/continuation.py archive`. If missing, continue.

## Step 2: Load Context (parallel batch)

Fire all at once — no dependencies:
1. Read `domain/pipeline/startup-brief.md` (pipeline snapshot, handoff section) *(verify path — may be stale)*
2. Read `C:/Users/olive/SpritesWork/brain/lessons.md` (scan for mistakes to avoid)
3. Check Google Calendar today + 30 days (demos, calls, meetings)
4. Read `C:/Users/olive/SpritesWork/brain/loop-state.md` (session number, open items) *(auto-regenerated by session_close hook — always fresh)*
5. Read `C:/Users/olive/SpritesWork/brain/brain_prompt.md` (soul.md VOICE mandatories + graduated RULE-level lessons)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove machine-specific paths and private/internal references from this skill.

Hardcoding C:/Users/olive/..., SpritesWork, Oliver, and sprites_context.md makes the skill non-portable and leaks private repo/user details into a shipped artifact. Please switch these to runtime placeholders or repo-relative/public paths.

As per coding guidelines, "Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*".

Also applies to: 38-52

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/skills/core/session-start/SKILL.md` around lines 12 - 21, The
SKILL.md contains hardcoded, machine-specific paths and private names (e.g.,
"C:/Users/olive/SpritesWork/...", "SpritesWork", "Oliver", and filenames like
continuation.md, lessons.md, loop-state.md, brain_prompt.md and
domain/pipeline/startup-brief.md) which must be replaced with portable
placeholders or repo-relative references; update the Step 1/Step 2 file
references in this document (and the similar occurrences around lines 38–52) to
use runtime variables or repo-relative paths (e.g.,
{{WORKSPACE}}/brain/continuation.md or ./brain/continuation.md) and remove any
personal identifiers, ensuring each bullet clearly indicates a configurable
placeholder (or public path) and add a short note that these files are expected
to be present at runtime rather than hardcoded to a local user folder.

Comment on lines +32 to +36
```
[check] S[N] loaded | [today's calendar or "clear"]
[tasks] Top 2-3 from loop-state open items
[alert] Only if something is broken/overdue — otherwise omit
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language to the fenced code block.

This currently trips markdownlint MD040. Use something like ```text here.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/skills/core/session-start/SKILL.md` around lines 32 - 36, The fenced
code block in SKILL.md (the block showing "[check] S[N] loaded ..." etc.) lacks
a language tag and triggers markdownlint MD040; update that fenced block to
start with a language label (e.g., use "```text" instead of "```") so the block
is explicitly marked as plain text and the linter error is resolved.

Comment on lines 529 to 535
 try:
     import json as _json

     disp_path.write_text(
-        _json.dumps(tracker.to_dict(), indent=2), encoding="utf-8",
+        _json.dumps(tracker.to_dict(), indent=2),
+        encoding="utf-8",
     )

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Write disposition.json through the atomic JSON helper.

This overwrites the file in place during a best-effort phase. If the process dies after truncation, the next run loses the entire disposition state. Route this through the repo's atomic JSON writer instead.

As per coding guidelines, "Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes".
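The repository's own atomic helper is not shown in this review, so the following is only a sketch of the general technique under that caveat. The hypothetical `atomic_write_json` writes to a temporary file in the same directory and then swaps it into place with `os.replace`, so a crash mid-write leaves the previous file intact.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path, payload, indent=2):
    """Write JSON so readers see either the old file or the new one, never a partial."""
    path = Path(path)
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(payload, tmp, indent=indent)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A call such as `atomic_write_json(disp_path, tracker.to_dict())` would then stand in for the direct `write_text`, assuming the repo helper has a comparable signature.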

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/rule_pipeline.py` around lines 529 - 535,
The code currently calls disp_path.write_text(_json.dumps(tracker.to_dict(),
indent=2), ...) which can truncate disposition.json mid-write; change this to
use the repository's atomic JSON writer (the atomic JSON helper) to write
tracker.to_dict() to disp_path atomically instead of using disp_path.write_text;
remove the direct json dump and call the atomic helper (passing the dict and
desired indent/encoding) so disposition.json is written via the repo's
atomic-write utility.

Comment on lines +837 to +844
brain.emit(
    HOOK_DEMOTED,
    source,
    {
        "slug": slug,
        "hook_path": str(target),
    },
)

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Include rule_id in HOOK_DEMOTED events.

count_human_reversals() only counts reversal events whose payload contains a matching rule_id, but this emit only writes slug/hook_path. That means a manual demotion never feeds back into the empirical gate, so the same rule can be auto-promoted again immediately after being removed.
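The effect can be shown with a simplified stand-in for the counter. The real `count_human_reversals()` is not quoted in this review, so the function body, the event shape, and the sample values below are assumptions made only to illustrate the mismatch.

```python
HOOK_DEMOTED = "HOOK_DEMOTED"


def count_human_reversals(events, rule_id):
    """Simplified stand-in: count demotions whose payload names this rule."""
    return sum(
        1
        for ev in events
        if ev.get("type") == HOOK_DEMOTED
        and ev.get("data", {}).get("rule_id") == rule_id
    )


# Payload as emitted today: slug and hook_path only (sample values are invented).
without_id = [
    {"type": HOOK_DEMOTED, "data": {"slug": "verify-first", "hook_path": "hooks/verify-first.py"}}
]
# Payload with the rule identifier added.
with_id = [
    {
        "type": HOOK_DEMOTED,
        "data": {"rule_id": "rule-42", "slug": "verify-first", "hook_path": "hooks/verify-first.py"},
    }
]

missed = count_human_reversals(without_id, "rule-42")
counted = count_human_reversals(with_id, "rule-42")
```

Only the second payload registers as a reversal, which is why the demotion otherwise never reaches the gate.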

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/rule_to_hook.py` around lines 837 - 844, The
HOOK_DEMOTED emit is missing the rule identifier required by
count_human_reversals(), so include the rule's id in the emitted payload (e.g.,
add "rule_id": rule.id or "rule_id": rule_id depending on the local symbol
available) when calling brain.emit(HOOK_DEMOTED, source, {...}); ensure you
reference the actual Rule object or local rule_id variable used in this module
so manual demotions are counted by count_human_reversals().

Comment on lines +87 to +90
    re.compile(
        r"(?:by|before|on|until)\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday|\d{1,2}[/-]\d{1,2})",
        re.I,
    ),

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't treat bare date fragments as action items.

This regex now matches standalone phrases like on Monday, and extract() persists that fragment via match.group(0). A user saying the meeting is on Monday will now create an action_item even though no action was requested, which pollutes prospective memory and can duplicate the temporal fact from the same sentence.

Proposed fix
 _ACTION_PATTERNS = [
     re.compile(r"(?:need to|should|will|going to|have to|must)\s+(.+?)(?:\.|$)", re.I),
     re.compile(r"(?:follow up|schedule|send|check|review|prepare|draft)\s+(.+?)(?:\.|$)", re.I),
-    re.compile(
-        r"(?:by|before|on|until)\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday|\d{1,2}[/-]\d{1,2})",
-        re.I,
-    ),
 ]

If you still want deadline-aware action items, fold the deadline into the verb-based patterns so the stored fact is the full action, e.g. send the report by Monday, not just by Monday.

Also applies to: 152-165
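The difference between the two approaches can be checked directly. In the sketch below, `BARE_DEADLINE` reproduces the pattern under review, while `VERB_WITH_DEADLINE` is a hypothetical alternative (not code from the PR) that only accepts a deadline when it trails one of the existing action verbs.

```python
import re

DATE = r"(?:monday|tuesday|wednesday|thursday|friday|saturday|sunday|\d{1,2}[/-]\d{1,2})"

# The pattern under review: fires on any preposition followed by a date.
BARE_DEADLINE = re.compile(rf"(?:by|before|on|until)\s+({DATE})", re.I)

# Hypothetical replacement: the deadline must follow an action verb phrase.
VERB_WITH_DEADLINE = re.compile(
    rf"\b(?:follow up|schedule|send|check|review|prepare|draft)\b"
    rf"\s+.+?\s+(?:by|before|on|until)\s+{DATE}\b",
    re.I,
)

statement = "The meeting is on Monday"
request = "Please send the report by Monday."

false_positive = BARE_DEADLINE.search(statement)   # matches "on Monday"
no_match = VERB_WITH_DEADLINE.search(statement)    # no action verb, so None
full_action = VERB_WITH_DEADLINE.search(request)   # matches the whole action phrase
```

Storing `full_action.group(0)` keeps the complete action, `send the report by Monday`, rather than a bare date fragment.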

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/scoring/memory_extraction.py` around lines
87 - 90, The regex that matches "(?:by|before|on|until)\s+(...)" in
memory_extraction.py is capturing standalone date fragments (e.g., "on Monday")
and extract() stores match.group(0) as an action_item; change the patterns so
date/deadline fragments are only captured when attached to a verb/action phrase
(e.g., require a verb or imperative before the deadline or fold the deadline
into existing verb-based patterns like the verb-driven pattern list used by
extract()); specifically, update the loose date-only pattern(s) at the places
referenced (the re.compile call and the similar block at lines ~152-165) to
either (a) remove the standalone "(?:by|before|on|until) ..." pattern, or (b)
require a preceding verb token or phrase (e.g., using a positive lookbehind or
adding
"\b(?:send|submit|remind|schedule|prepare|complete|...)\b.*?(?:by|before|on|until)\s+..."
or merge the deadline part into the verb-based regexes), and ensure extract()
continues to use the full matched action phrase rather than a bare date
fragment.

Comment on lines 256 to 279
         # 6. Output not becoming bland (from metrics module)
         try:
             from gradata.enhancements.metrics import compute_metrics

             m = compute_metrics(db_path, window)
-            blandness = m.get("blandness_score", 0.0) if isinstance(m, dict) else getattr(m, "blandness_score", 0.0)
+            blandness = (
+                m.get("blandness_score", 0.0)
+                if isinstance(m, dict)
+                else getattr(m, "blandness_score", 0.0)
+            )
             bland_ok = blandness < 0.70
-            conditions.append(ConditionResult(
-                name="output_not_bland",
-                met=bland_ok,
-                current_value=round(blandness, 4),
-                baseline_value=0.70,
-                trend="varied" if bland_ok else "generic",
-                detail=f"Blandness: {blandness:.2f} (threshold: 0.70)",
-            ))
+            conditions.append(
+                ConditionResult(
+                    name="output_not_bland",
+                    met=bland_ok,
+                    current_value=round(blandness, 4),
+                    baseline_value=0.70,
+                    trend="varied" if bland_ok else "generic",
+                    detail=f"Blandness: {blandness:.2f} (threshold: 0.70)",
+                )
+            )
         except Exception:
             pass

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t silently swallow errors in the blandness (compute_metrics) success-condition path.

The blandness evaluation uses except Exception: pass, which will hide import errors or unexpected metric-shape issues, resulting in “mysteriously” missing/incorrect output_not_bland condition state.

Suggested fix
+import logging
+logger = logging.getLogger(__name__)
@@
         try:
             from gradata.enhancements.metrics import compute_metrics
@@
             conditions.append(
                 ConditionResult(
                     name="output_not_bland",
                     met=bland_ok,
                     current_value=round(blandness, 4),
                     baseline_value=0.70,
                     trend="varied" if bland_ok else "generic",
                     detail=f"Blandness: {blandness:.2f} (threshold: 0.70)",
                 )
             )
-        except Exception:
-            pass
+        except Exception:
+            logger.warning("Failed to compute blandness success condition", exc_info=True)

As per coding guidelines, “Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product”.
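A variant that also surfaces the failure as a condition, rather than dropping it, might look like the sketch below. The trimmed `ConditionResult` dataclass and the `blandness_condition` wrapper are illustrative assumptions; the real class carries more fields, and the real code calls `compute_metrics(db_path, window)` inline.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class ConditionResult:
    """Trimmed stand-in for the real ConditionResult (fewer fields)."""
    name: str
    met: bool
    detail: str


def blandness_condition(compute_metrics, threshold=0.70):
    """Evaluate the blandness gate, logging and reporting metric failures."""
    try:
        m = compute_metrics()
        blandness = (
            m.get("blandness_score", 0.0)
            if isinstance(m, dict)
            else getattr(m, "blandness_score", 0.0)
        )
    except Exception as exc:
        # Log with the traceback and still emit a condition so the gap is visible.
        logger.warning("Failed to compute blandness success condition", exc_info=True)
        return ConditionResult("output_not_bland", False, f"metrics unavailable: {exc}")
    return ConditionResult(
        "output_not_bland",
        blandness < threshold,
        f"Blandness: {blandness:.2f} (threshold: {threshold:.2f})",
    )
```

Reporting `met=False` on failure is a conservative choice; whether an unknown state should block or pass is a policy decision for the maintainers.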

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/scoring/success_conditions.py` around lines
256 - 279, The try/except around the blandness check silently swallows errors;
update the block where you call gradata.enhancements.metrics.compute_metrics and
build the ConditionResult (the code that computes blandness, bland_ok, and
appends ConditionResult("output_not_bland")) to catch specific errors (e.g.,
ImportError, Exception as e) rather than a bare except, log a warning including
exc_info=True via the module logger or processLogger, and ensure you still
append a reasonable ConditionResult when metrics cannot be computed (e.g.,
met=False or met=None with detail describing the exception) so the pipeline
surfaces the failure instead of disappearing the condition.

Comment on lines +168 to +171
    try:
        metas = load_meta_rules(db_path)
    except Exception:
        return []

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Log meta-rule load failures instead of silently dropping them.

With include_meta=True, any DB/schema error in load_meta_rules() currently degrades the export to "no meta-principles" with no signal. Please at least log the exception before returning [], so partial exports are diagnosable.

As per coding guidelines, "Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/skill_export.py` around lines 168 - 171, The
code swallows errors from load_meta_rules() when include_meta=True; change the
bare except to catch Exception as e and log the failure before returning an
empty list so failures are visible: replace the current except block with
something like "except Exception as e: logger.warning('Failed to load meta rules
for include_meta export', exc_info=True)" (ensure you use the module logger or
import one) and then return [] — reference load_meta_rules, include_meta and the
metas assignment to locate the change.

@coderabbitai coderabbitai Bot left a comment

Review continued from previous batch...

Comment on lines 21 to +22
 ### 1. Local-first stays the source of truth
-SDK writes to local SQLite + jsonl. Cloud is a **sync target + shared meta-rule source + proprietary scoring service**. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.
+SDK writes to local SQLite + jsonl and runs the full learning loop (graduation, synthesis, rule-to-hook promotion) locally. Cloud is a **sync target + dashboard + future team + future shared-corpus surface** — not a gate on the local loop. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix markdownlint MD022: add blank line after heading

### 1. Local-first stays the source of truth is not followed by a blank line before the paragraph below it, which triggers MD022.

✅ Proposed change
 ### 1. Local-first stays the source of truth
+
 SDK writes to local SQLite + jsonl and runs the full learning loop (graduation, synthesis, rule-to-hook promotion) locally. Cloud is a **sync target + dashboard + future team + future shared-corpus surface** — not a gate on the local loop. Do NOT migrate SDK storage to Postgres. Reasons: privacy, offline, open source, speed.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 21-21: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/architecture/multi-tenant-future-proofing.md` around lines 21 -
22, The heading "### 1. Local-first stays the source of truth" violates
markdownlint MD022 because it is not followed by a blank line; fix this by
inserting a single blank line immediately after that heading (i.e., add an empty
line between the heading and the following paragraph/bullet list) so the
document conforms to MD022 while keeping the existing heading text and
subsequent content unchanged.

 # Dashboard

-The Gradata Cloud dashboard is a Next.js app at [app.gradata.ai](https://app.gradata.ai). It wraps the same data the local `brain.manifest.json` exposes, plus Cloud-only views for meta-rule synthesis, team management, and the operator console.
+The Gradata Cloud dashboard is a Next.js app at [app.gradata.ai](https://app.gradata.ai). It visualizes the same data the local `brain.manifest.json` exposes, plus Cloud-only views for team management and the operator console. Meta-rule synthesis runs locally in the SDK — the dashboard renders the results, it does not re-run them.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

This page still contradicts itself about where meta-rules are synthesized.

Line 3 now says synthesis runs locally, but the Brain detail bullets later still describe meta-rules as “cloud-synthesized.” Please update that downstream copy in the same pass.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/cloud/dashboard.md` at line 3, Update the contradictory wording
in the Gradata Cloud dashboard docs: change the downstream "Brain detail"
bullets that currently call meta-rules “cloud-synthesized” to match the earlier
statement that meta-rule synthesis runs locally in the SDK (e.g., refer to
"meta-rules", "brain.manifest.json", and the "Brain detail" bullets in
Gradata/docs/cloud/dashboard.md) so all references consistently state that
synthesis is performed locally and the dashboard only renders the results.

 # Gradata Cloud

-Gradata Cloud is the hosted dashboard and back-end that complements the open-source SDK. The SDK keeps running locally; Cloud adds synchronization, cross-device continuity, team sharing, meta-rule synthesis, and an operator view for engineering teams.
+Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on top of that local loop.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reduce wordiness flagged by LanguageTool

Replace “on top of that local loop” with a shorter phrase (e.g., “on the local loop”) to address the wordiness lint.

✅ Proposed change
-Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on top of that local loop.
+Gradata Cloud is the hosted dashboard that complements the open-source SDK. **The SDK is functionally complete on its own** — graduation, meta-rule synthesis, rule-to-hook promotion, and every piece of the learning loop run locally. Cloud adds visualization, cross-device continuity, team sharing, and managed backups on the local loop.
🧰 Tools
🪛 LanguageTool

[style] ~3-~3: ‘on top of that’ might be wordy. Consider a shorter alternative.
Context: ...uity, team sharing, and managed backups on top of that local loop. ## What's in the SDK vs th...

(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/cloud/overview.md` at line 3, The sentence containing "on top of
that local loop" in the Gradata Cloud overview should be shortened for clarity;
replace that phrase with "on the local loop" so the sentence reads "...Cloud
adds visualization, cross-device continuity, team sharing, and managed backups
on the local loop." Locate the paragraph that begins "Gradata Cloud is the
hosted dashboard..." and update the exact string accordingly.

Comment on lines +47 to +50
 !!! info "Local by default"
     Meta-rule clustering **and** principle synthesis both run locally. Synthesis uses whichever LLM path you've configured: your own Anthropic API key (set `ANTHROPIC_API_KEY`) or the Claude Code Max OAuth path via `claude -p`. Cloud is not required for any of it — the full `[rule, rule, rule] → "Verify before acting"` pipeline runs in the OSS SDK.

-The math, the events, and the storage are all open. Only the LLM-driven synthesis that turns `[rule, rule, rule] → "Verify before acting"` is cloud-gated.
+Cloud becomes relevant when you want a hosted dashboard, cross-device sync, team brains, or (future) opt-in corpus donation. It does not re-synthesize or override what graduated locally.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the footer cross-reference to match the new local-first explanation.

This section says cloud does not synthesize meta-rules, but the “Next” link at the bottom still sends readers to Cloud Overview “for meta-rule synthesis.” That pointer is now misleading.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 50-50: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/concepts/meta-rules.md` around lines 47 - 50, The footer "Next"
link that currently points readers to the Cloud Overview "for meta-rule
synthesis" is now misleading given the "Local by default" text; update the
bottom cross-reference to either remove the claim about synthesis or retarget
the link to the Cloud Overview section that discusses hosted dashboard/team
sync/team brains (or a more appropriate cloud-topic page), and ensure the link
text reflects that cloud is relevant for dashboard/sync/team features rather
than meta-rule synthesis; locate the "Local by default" block and the subsequent
"Next" link text in the same markdown and adjust the link target and label
accordingly.

Comment on lines +16 to +46
### 1. Deprecated adapter shims (scheduled v0.8.0)
- `src/gradata/integrations/anthropic_adapter.py` → `middleware.wrap_anthropic`
- `src/gradata/integrations/langchain_adapter.py` → `middleware.LangChainCallback`
- `src/gradata/integrations/crewai_adapter.py` → `middleware.CrewAIGuard`
Warnings are in place; remove the modules and their tests at v0.8.0.

### 2. `_cloud_sync.py` terminology
File posts to an optional external dashboard — fine to keep, but the
module docstring should make clear it is optional telemetry, not a
mandatory cloud dependency. Callers already tolerate absence.

### 3. Docstring drift in `meta_rules.py`
Module header still says "require Gradata Cloud" and "no-ops in the
open-source build". That is no longer true as of the local-first port —
rewrite the header to describe the local clustering algorithm.

### 4. Test-level cloud gating
Former `@_requires_cloud` / `skipif` markers were deleted in this cycle.
If any new test reintroduces a cloud gate, delete the gate instead — the
feature should either be local-first or not ship.

### 5. `api_key` kwarg on `merge_into_meta`
The old `merge_into_meta(..., api_key=...)` path routed into
`synthesise_principle_llm` directly. Current architecture drives LLM
distillation from `rule_synthesizer` at session close instead. The kwarg
is still accepted via `**kwargs` for forward compatibility but performs
no work — remove after one release.

### 6. Doc sweep
`docs/cloud/` should be audited for pages that imply cloud is required.
Rewrite as "optional managed hosting" or delete.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add blank lines after each subsection heading.

markdownlint-cli2 is flagging every ### block here with MD022. A blank line after each heading will clear the linter without changing content.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 16-16: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 22-22: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 27-27: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 32-32: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 37-37: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


[warning] 44-44: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/docs/LEGACY_CLEANUP.md` around lines 16 - 46, Add a single blank line
after each subsection heading in LEGACY_CLEANUP.md (e.g., after "### 1.
Deprecated adapter shims (scheduled v0.8.0)", "### 2. `_cloud_sync.py`
terminology", "### 3. Docstring drift in `meta_rules.py`", "### 4. Test-level
cloud gating", "### 5. `api_key` kwarg on `merge_into_meta`", and "### 6. Doc
sweep") so every '###' header is followed by an empty line to satisfy
markdownlint-md022.
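The blank-line fix above could also be applied mechanically. The following is a minimal sketch, not the project's tooling: `blank_after_headings` is a hypothetical helper, it only handles ATX (`#`) headings, and it only adds the blank line below a heading (MD022 also checks the line above).

```python
import re

# ATX heading: one to six '#' characters followed by a space.
HEADING = re.compile(r"^#{1,6} ")
FENCE = "`" * 3  # fenced code block delimiter


def blank_after_headings(text: str) -> str:
    """Insert one blank line after every heading that lacks one."""
    lines = text.split("\n")
    out = []
    in_fence = False
    for i, line in enumerate(lines):
        out.append(line)
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # skip '#' comments inside code fences
            continue
        nxt = lines[i + 1] if i + 1 < len(lines) else None
        needs_gap = nxt is not None and nxt.strip() != ""
        if not in_fence and HEADING.match(line) and needs_gap:
            out.append("")
    return "\n".join(out)
```

Running the helper twice gives the same result as running it once, so it is safe to apply to a file that is already partly compliant.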

Comment on lines +23 to +33
  IF NOT EXISTS (
    SELECT 1
    FROM pg_constraint c
    JOIN pg_class t ON t.oid = c.conrelid
    WHERE t.relname = 'corrections'
      AND c.contype = 'u'
      AND c.conkey @> ARRAY[
        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'brain_id'),
        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'session'),
        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'description')
      ]::smallint[]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check for an exact unique key match here, not just a superset.

c.conkey @> ARRAY[...] also matches a wider constraint like (brain_id, session, description, created_at). In that case this migration would skip adding the intended 3-column uniqueness and still allow duplicate descriptions per session.

Suggested fix
   IF NOT EXISTS (
     SELECT 1
     FROM pg_constraint c
     JOIN pg_class t ON t.oid = c.conrelid
+    CROSS JOIN LATERAL (
+      SELECT ARRAY[
+        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'brain_id'),
+        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'session'),
+        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'description')
+      ]::smallint[] AS target_cols
+    ) cols
     WHERE t.relname = 'corrections'
       AND c.contype = 'u'
-      AND c.conkey @> ARRAY[
-        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'brain_id'),
-        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'session'),
-        (SELECT attnum FROM pg_attribute WHERE attrelid = t.oid AND attname = 'description')
-      ]::smallint[]
+      AND c.conkey @> cols.target_cols
+      AND c.conkey <@ cols.target_cols
   ) THEN
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/migrations/supabase/014_corrections_unique.sql` around lines 23 - 33,
The uniqueness check currently uses "c.conkey @> ARRAY[...]" which matches
superset constraints; change it to test for an exact match by comparing arrays
exactly (e.g., use "c.conkey = ARRAY[ ... ]::smallint[]" or use both "@>" and
"<@" to ensure equality) for the constraint on columns brain_id, session,
description so the migration only skips when a true 3-column unique constraint
already exists for corrections.
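The difference between the superset check and an exact match can be modelled with plain sets. This is an illustrative sketch only: the attribute numbers are made up, and `contains` merely imitates the set-like semantics of the Postgres `@>` operator on arrays.

```python
def contains(a: list[int], b: list[int]) -> bool:
    """Model of Postgres `a @> b`: every element of b appears in a."""
    return set(b) <= set(a)


def is_exact_key(conkey: list[int], target: list[int]) -> bool:
    """Model of `conkey @> target AND conkey <@ target` (same column set)."""
    return contains(conkey, target) and contains(target, conkey)


# Hypothetical attnum values for (brain_id, session, description).
target = [2, 5, 7]
wider = [2, 5, 7, 9]      # e.g. a key that also includes created_at
reordered = [7, 2, 5]     # same columns, declared in another order

print(contains(wider, target))         # superset check passes for the wider key
print(is_exact_key(wider, target))     # exact check rejects it
print(is_exact_key(reordered, target)) # exact check ignores column order
print(reordered == target)             # plain equality is order-sensitive
```

One design note: comparing with `=` also rejects a constraint that lists the same three columns in a different order, whereas the `@>` plus `<@` pair accepts it. Which behaviour is wanted depends on how the migration treats column order.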

Comment on lines +32 to +37
```
corrections_brain_session_desc_key UNIQUE (brain_id, session, description) -- pre-existing
corrections_brain_session_description_unique UNIQUE (brain_id, session, description) -- from 014
events_brain_type_created_at_key UNIQUE (brain_id, type, created_at) -- pre-existing
events_brain_type_created_at_unique UNIQUE (brain_id, type, created_at) -- from 015
```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language to this fenced block.

markdownlint-cli2 is already flagging this snippet with MD040. Label it text or sql so the new README stays lint-clean.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 32-32: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/migrations/supabase/README.md` around lines 32 - 37, The fenced code
block containing the UNIQUE constraint lines (e.g.,
corrections_brain_session_desc_key,
corrections_brain_session_description_unique, events_brain_type_created_at_key,
events_brain_type_created_at_unique) needs a language tag to satisfy
markdownlint MD040; edit the block start from ``` to ```sql (or ```text) so the
snippet is labeled (e.g., change ``` to ```sql) and the README will lint
cleanly.

Comment on lines +1408 to +1413
for (raw,) in rows:
    try:
        parsed = _json.loads(raw) if isinstance(raw, str) else raw
        if isinstance(parsed, dict):
            session_corrections.append(parsed)
    except (TypeError, _json.JSONDecodeError):


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize DB correction payloads before forwarding them to _cloud_sync_session().

brain_correct() stores draft_text / final_text, but _cloud_sync_session() reads draft / final. Appending the raw event payload here makes hook-driven sessions report zero blandness and any future final-based metrics will be wrong.

Suggested fix
                 for (raw,) in rows:
                     try:
                         parsed = _json.loads(raw) if isinstance(raw, str) else raw
                         if isinstance(parsed, dict):
-                            session_corrections.append(parsed)
+                            normalized = dict(parsed)
+                            if "draft" not in normalized and "draft_text" in normalized:
+                                normalized["draft"] = normalized["draft_text"]
+                            if "final" not in normalized and "final_text" in normalized:
+                                normalized["final"] = normalized["final_text"]
+                            session_corrections.append(normalized)
                     except (TypeError, _json.JSONDecodeError):
                         continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_core.py` around lines 1408 - 1413, The loop that builds
session_corrections currently appends raw event payloads (parsed) which contain
draft_text/final_text, but _cloud_sync_session() expects draft/final; update the
normalization inside the for-loop (where parsed is created) to map
parsed.get("draft_text") -> parsed["draft"] and parsed.get("final_text") ->
parsed["final"] (preserving existing draft/final if present) before appending to
session_corrections so brain_correct() payloads align with _cloud_sync_session()
expectations.
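The key mapping described above can be isolated in a small function. This is a sketch under the assumption that payloads are plain dicts; `normalize_correction` is a hypothetical name, not a function from the repository.

```python
def normalize_correction(payload: dict) -> dict:
    """Copy draft_text/final_text to draft/final without overwriting existing keys."""
    normalized = dict(payload)  # shallow copy so the stored event is not mutated
    for short, stored in (("draft", "draft_text"), ("final", "final_text")):
        if short not in normalized and stored in normalized:
            normalized[short] = normalized[stored]
    return normalized


event = {"draft_text": "Hi there", "final_text": "Hello"}
print(normalize_correction(event))
```

Keeping the original `draft_text` / `final_text` keys in place means any reader of the old names continues to work.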

Comment on lines +1771 to +1777
provenance = {
    "source": "gradata",
    "skill_id": skill_id,
    "brain_name": brain.dir.name,
    "exported_at": datetime.now(UTC).isoformat(),
    "min_state": min_state,
}


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Write provenance.json atomically.

This payload is persisted just below via Path.write_text(). A mid-write crash can leave a truncated JSON file inside an otherwise-created skill directory; please route it through the repo's atomic JSON write helper.

As per coding guidelines, Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/_core.py` around lines 1771 - 1777, The provenance dict
(provenance with keys source, skill_id, brain_name, exported_at, min_state) is
being persisted with a plain Path.write_text which can leave a truncated
provenance.json on crash; replace that direct write with the repository's atomic
JSON write helper (use the project helper such as
write_json_atomic/atomic_write_json — locate the helper in the repo) and call it
to serialize the provenance dict to provenance.json atomically instead of using
Path.write_text; ensure the same provenance object and target filename are used
so behavior is unchanged except for atomicity.
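For readers unfamiliar with the pattern, an atomic JSON write usually means writing to a temporary file in the same directory and then renaming it over the target. The sketch below shows that general technique with the standard library; `write_json_atomic` here is a stand-in written for illustration and is not the repository's own helper, whose name and signature are not shown in this thread.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    path = Path(path)
    # The temp file must live in the target directory so the rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic swap on POSIX and Windows
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A crash before `os.replace` leaves at most a stray `.tmp` file; the target `provenance.json` is never observed half-written.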

Comment on lines +200 to +223
def _call_gemini(
    model: str, system: str, user_prompt: str, max_tokens: int, timeout: float
) -> str | None:
    key = os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY")
    if not key:
        _log.debug("synth: GOOGLE_API_KEY / GEMINI_API_KEY not set")
        return None
    try:
        from google import genai
        from google.genai import types as genai_types

        client = genai.Client(api_key=key)
        config = genai_types.GenerateContentConfig(
            system_instruction=system,
            max_output_tokens=max_tokens,
        )
        resp = client.models.generate_content(
            model=model,
            contents=user_prompt,
            config=config,
        )
        text = resp.text
        return text.strip() if text else None
    except Exception as exc:


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

How do you configure client-level or per-request timeouts in the google-genai Python SDK?

💡 Result:

In the google-genai Python SDK (also referred to as the Google Gen AI SDK), timeouts are configured using the HttpOptions class from google.genai.types. Timeouts are specified in milliseconds.

Client-level timeouts: pass http_options to the genai.Client constructor. Example:

    from google import genai
    from google.genai import types

    client = genai.Client(
        api_key='YOUR_API_KEY',
        http_options=types.HttpOptions(timeout=30000)  # 30 seconds
    )

Per-request timeouts: override via the config parameter in methods like generate_content, using GenerateContentConfig with http_options. Example:

    response = client.models.generate_content(
        model='gemini-2.0-flash-exp',
        contents='Tell me a story',
        config=types.GenerateContentConfig(
            http_options=types.HttpOptions(timeout=120000)  # 2 minutes
        )
    )

Note: some users have reported issues where client-level timeouts may not always take effect as expected (e.g., GitHub issue #911), in which case per-request configuration or retries via HttpRetryOptions can be used. The SDK converts the millisecond timeout to seconds internally for the underlying httpx client.


🏁 Script executed:

cat -n Gradata/src/gradata/enhancements/rule_synthesizer.py | head -50

Repository: Gradata/gradata

Length of output: 2888


🏁 Script executed:

cat -n Gradata/src/gradata/enhancements/rule_synthesizer.py | sed -n '170,250p'

Repository: Gradata/gradata

Length of output: 3292


🏁 Script executed:

# Check if there are other timeout-handling provider functions
rg -A 15 "def _call_anthropic|def _call_openai" Gradata/src/gradata/enhancements/rule_synthesizer.py

Repository: Gradata/gradata

Length of output: 1131


Add timeout support to the Gemini provider to honor the fail-safe contract.

All provider paths accept a timeout parameter, but _call_gemini() (line 211) ignores it. The module's docstring explicitly promises fail-safe behavior including "model timeout" handling (line 9), yet the Gemini client is created without timeout configuration. In contrast, _call_anthropic(), _call_openai(), and _call_http() all pass timeout to their respective clients.

Per the google-genai SDK, timeouts are given in milliseconds and can be set client-level via http_options=types.HttpOptions(timeout=int(timeout * 1000)) in the genai.Client() call, or per-request in the GenerateContentConfig. Without this, a slow Gemini response can block longer than SYNTH_TIMEOUT, breaking the fail-safe guarantee.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Gradata/src/gradata/enhancements/rule_synthesizer.py` around lines 200 - 223,
_call_gemini currently ignores the passed timeout which breaks the module's
fail-safe behavior; update the function so the google-genai client is created
with timeout (e.g., pass
http_options=genai_types.HttpOptions(timeout=int(timeout * 1000)) to
genai.Client) or set the timeout on the request/config (e.g., in
GenerateContentConfig), ensuring the timeout value is converted to milliseconds
per the SDK and used when instantiating genai.Client and/or in
GenerateContentConfig to enforce the model timeout.
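A possible shape for the client-level variant is sketched below. It relies on the `genai.Client(http_options=...)` and `types.HttpOptions(timeout=...)` calls quoted in the web-query result above; `seconds_to_ms` and `make_gemini_client` are hypothetical helper names, and the SDK import is deferred so the conversion can be exercised without google-genai installed.

```python
def seconds_to_ms(timeout_s: float) -> int:
    """Convert a timeout in seconds to whole milliseconds (at least 1 ms)."""
    return max(1, round(timeout_s * 1000))


def make_gemini_client(api_key: str, timeout_s: float):
    """Build a google-genai client that applies the timeout to every request."""
    # Imported lazily, mirroring _call_gemini, so a missing SDK only fails here.
    from google import genai
    from google.genai import types as genai_types

    return genai.Client(
        api_key=api_key,
        http_options=genai_types.HttpOptions(timeout=seconds_to_ms(timeout_s)),
    )


print(seconds_to_ms(30.0))
```

Given the report in the query result that client-level timeouts do not always take effect, it may be worth also setting `http_options` on the per-request `GenerateContentConfig` and verifying the behaviour against the installed SDK version.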

@Gradata

Gradata commented May 2, 2026

Owner Author

Replaced by a clean rebase: the #161 branch had drifted from main and carried 43 unrelated commits. See the new PR.

@Gradata Gradata closed this May 2, 2026
Gradata added a commit that referenced this pull request May 2, 2026
…2) (#162)

* fix(cloud/client): push events with watermark cursor + idempotency (Bug 2)

Pairs with gradata-cloud PR #12. Was Bug 2 from /tmp/audit-bug2-watermark.md.

- client.sync() now reads events.jsonl, filters by last_sync_at watermark,
  batches 500 at a time, advances cursor on 200, retries with smaller batch on 413.
- Sync state at <BRAIN_DIR>/.gradata-sync-state.json (separate from events.jsonl
  which stays append-only and untouched).
- 9/9 new tests pass in tests/test_cloud_client_sync.py.

Council perspective P3 (Skeptic) had this take after audit-gate blocked the
aggregate-only path — 3 cloud routes (analytics.py, activity.py, corrections.py)
read raw events directly, so telemetry-only would have flatlined them.
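The batching behaviour in this commit message (filter by watermark, send up to 500 events, advance the cursor on 200, shrink the batch on 413) can be sketched as follows. This is not the SDK's code: `pending_events`, `push_in_batches`, the `post` callable and the `"ts"` field are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def pending_events(events: Sequence[dict], last_sync_at: str) -> list[dict]:
    """Keep events newer than the watermark.

    Assumes each event has an ISO-8601 "ts" string in one consistent format,
    so plain string comparison orders them correctly.
    """
    return [e for e in events if e["ts"] > last_sync_at]


def push_in_batches(
    events: Sequence[dict],
    post: Callable[[list[dict]], int],
    batch_size: int = 500,
) -> int:
    """Send events in order; return how many the server confirmed."""
    cursor = 0
    size = batch_size
    while cursor < len(events):
        batch = list(events[cursor : cursor + size])
        status = post(batch)  # hypothetical transport returning an HTTP status
        if status == 200:
            cursor += len(batch)  # advance only after the server confirms
        elif status == 413 and size > 1:
            size = max(1, size // 2)  # payload too large: retry a smaller batch
        else:
            break  # other failures: stop, leaving the cursor for the next run
    return cursor
```

Because the cursor only moves after a 200, a failed run can be repeated safely as long as the server upserts on a stable event key, as the backfill note below describes.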

* feat(scripts): add backfill_to_cloud.py for Bug 2 history rescue

One-shot: counts events.jsonl, resets local sync state, calls client.sync()
in a loop until cursor catches up. Idempotent — server upserts on
(brain_id, event_id). Run after PRs #11/#12/#161 merge to backfill the
~5800 historical events the broken sync silently dropped.

Labels

bug Something isn't working feature
