bench: PMR-100 procedural memory retention benchmark#148
Conversation
Local SQLite and cloud Supabase schemas diverged (wide `tenant_id` + `data_json` vs narrow `brain_id` + `data` jsonb, plus table rename `correction_patterns` -> `corrections`). Added `_transform_row` per-table mapper with deterministic uuid5 ids so repeat pushes upsert cleanly. `_scrub` strips NUL bytes and lone UTF-16 surrogates that Postgres JSONB rejects. `_post` dedupes within each batch, honors `_TABLE_REMAP`, and chunks large pushes to avoid PostgREST's opaque "Empty or invalid json" body-limit errors. `GRADATA_SUPABASE_URL` / `GRADATA_SUPABASE_SERVICE_KEY` now work as aliases so one .env serves both backend and SDK. Co-Authored-By: Gradata <noreply@gradata.ai>
…provider synth Phase 1 of the learning-pipeline revamp. Rule graduation now flows through the canonical _graduation.graduate() path (strict > for INSTINCT->PATTERN, >= for PATTERN->RULE) instead of the inline duplicate in rule_pipeline. Injection hook reads a persistent brain_prompt.md gated by an AUTO-GENERATED header, regenerated only at session_close after the pipeline fires. LLM synthesis gets a two-provider path: anthropic SDK (ANTHROPIC_API_KEY) with claude CLI fallback (Max-plan OAuth) so users without an exportable key still get synthesis. Meta-rule deterministic fallback now warns loudly instead of silently discarding. Drops five env-flag gates in favour of file-based signals. Co-Authored-By: Gradata <noreply@gradata.ai>
Adds --cloud / --no-cloud flags to the doctor CLI command and the underlying diagnose() function. Flips the default cloud endpoint to api.gradata.ai/api/v1. Covers new behaviour with test_doctor_cloud.py (all passing). Co-Authored-By: Gradata <noreply@gradata.ai>
Regex coverage was brittle to shorthand: real corrections like
"Why r you not asking" and "Why flag.. we dont skip" slipped the
\bwhy (did|would|are) you\b pattern and never became IMPLICIT_FEEDBACK
events. That silently breaks Gradata's core promise ("learn from any
correction").
Adds:
- negation: dont/cant/shouldnt (no-apostrophe variants), never
- reminder: "again" marker, "dont forget"
- challenge: "why r u", "why not/r/are/is/does", "why word..",
"how come", "you missed/forgot/failed/didnt"
All 8 target phrases now detect. 25 existing implicit-feedback tests
remain green.
Co-Authored-By: Gradata <noreply@gradata.ai>
14 new tests pinning the regex expansion from 5a6da45. Covers real corrections observed this session ("Why r you not asking council", "Why flag.. we don't skip we do work") plus shorthand cases (dont / cant / again / you missed / how come). Dual-signal cases assert both types detect. Full suite: 37 passed, 1 pre-existing skip. Co-Authored-By: Gradata <noreply@gradata.ai>
Five post-launch metrics with precise definitions (activation, D7 retention, time-to-first-graduation, free->Pro conversion, correction-rate decay). Numeric triggers: pivot <20% activation + flat decay at D30; kill <100 installs at D60; scale >1K installs + >=5% conversion at D90. Monday 30-min retro agenda. Source: Card 8 of the pre-launch gap analysis. Co-Authored-By: Gradata <noreply@gradata.ai>
The source-provenance docstring referenced "cloud-side LLM synthesis" which is stale since the graduation-cloud-gate was removed. Synthesis runs on the user's machine via rule_synthesizer.py's two-provider path (Anthropic SDK with user's key, or Claude Code Max CLI OAuth). Co-Authored-By: Gradata <noreply@gradata.ai>
Graduation and meta-rule LLM synthesis run entirely locally as of a few sessions ago (rule_synthesizer.py uses user's own Anthropic key or Claude Code Max CLI OAuth). The Pro-tier inclusion list incorrectly still claimed "cloud runs better graduation engine" and implied a cloud-enhanced sqlite-vec path. Rewrite the inclusion list + philosophy paragraph to match reality: free is functionally complete; Pro is visualization, history, export, and the future community corpus. NOTE: this file is listed in .gitignore per the earlier "untrack private files" cleanup. Force-added at request. Co-Authored-By: Gradata <noreply@gradata.ai>
Test was checking the pre-transform local key name. _cloud_sync._transform_row correctly emits brain_id (cloud schema) from tenant_id (local schema); the assertion was stale. Co-Authored-By: Gradata <noreply@gradata.ai>
Previously nothing wrote to lesson_applications — the table existed
(onboard.py), was size-checked (_validator.py), and synced to cloud
(_cloud_sync.py), but no code ever inserted a row. The compound-quality
story had no evidence: rules claimed to fire with no receipt.
Now:
- inject_brain_rules writes one PENDING row per injected rule (cluster
members included), storing {category, description, task} in context so
session_close can attribute outcomes back to specific rules.
- session_close resolves PENDING rows at end-of-waterfall:
REJECTED if any CORRECTION/IMPLICIT_FEEDBACK/RULE_FAILURE in the
session shares the lesson's category (or description substring).
CONFIRMED otherwise (rule survived the session).
Both paths are best-effort — DB missing, schema drift, or IO errors
degrade silently rather than blocking injection or session close.
Unblocks the Card 6 MVP day-14 metric: "did a graduated rule actually
fire and survive?" — the answer now has a row-level audit trail.
Co-Authored-By: Gradata <noreply@gradata.ai>
Sweeps the remaining docs that still claimed cloud gated any part of the learning loop. Actual architecture (as of the graduation-local pivot): Local SDK owns: correction capture, graduation, meta-rule clustering AND LLM-synthesis (via user's Anthropic key or Claude Code Max OAuth), rule-to-hook promotion, manifest computation. Cloud owns: dashboard/visualization, cross-device sync, team brains, managed backups, future opt-in corpus donation. Files touched: - docs/cloud/overview.md — capability matrix, architecture diagram, use-when guidance. - docs/architecture/cloud-monolith-v2.md — cloud-side workload framing. - docs/architecture/multi-tenant-future-proofing.md — proprietary boundary, verification flow. - docs/concepts/meta-rules.md — synthesis is local, not cloud-gated. - docs/cloud/dashboard.md — dashboard visualizes local output, does not re-synthesize. README.md was already accurate; no changes there. Co-Authored-By: Gradata <noreply@gradata.ai>
Silent-failure-hunter CRITICAL-1:
- inject_brain_rules: wrap lesson_applications connection in try/finally
and escalate OperationalError to warning (missing-table surfaces).
Silent-failure-hunter CRITICAL-2:
- _cloud_sync.push: per-row try/except on _transform_row so one bad row
no longer propagates and kills the whole push batch.
Leak scan blockers:
- Delete docs/pre-launch-plan.md and docs/gradata-marketing-strategy.md
from the public repo; add both to .gitignore. These contain kill
triggers, pricing, and PII that belong in the private brain vault only.
Code-reviewer BLOCKER-3:
- _doctor._check_vector_store returns status="ok" with FTS5 detail in
the detail field, restoring the documented status vocabulary
({ok, warn, fail, skip, missing, error}).
Test-coverage gaps:
- Add tests/test_rule_synthesizer.py — both providers absent, empty
input, cache hit, CLI fallback on SDK raise, malformed output.
- Add IMPLICIT_FEEDBACK → REJECTED integration test to
test_lesson_applications.py.
Verification: full suite 3802 pass, 22 skip, 2 xfailed.
Gradata is fully local-first now. Cloud-gate stubs and "requires cloud" skip markers were legacy artifacts from an earlier architecture where discovery/synthesis lived server-side. This commit finishes the port: - meta_rules.discover_meta_rules + merge_into_meta run locally: category grouping + greedy semantic-similarity clustering, zombie filter on RULE-state lessons below 0.90, decay after 20 sessions, count/(count+3) confidence smoothing. - Drop @_requires_cloud markers from test_bug_fixes, test_llm_synthesizer, test_meta_rule_generalization, test_multi_brain_simulation, test_pipeline_e2e. These tests now exercise the local impl directly. - Retire the api_key-kwarg-on-merge_into_meta path (session-close rule_synthesizer drives LLM distillation now). - Update fixtures to realistic prose so they survive the noise filter that rejects "cut:/added:" edit-distance summaries. - Bump test_meta_rules confidence assertion to the smoothed formula. - Add docs/LEGACY_CLEANUP.md tracking the remaining cloud-gate vestiges (deprecated adapter shims, cloud docs, stale module docstrings). Suite: 3809 passed, 14 skipped, 2 xfailed. Co-Authored-By: Gradata <noreply@gradata.ai>
…xtures
discover_meta_rules is implemented now (local-first). The
if not metas: pytest.skip('discover_meta_rules not yet implemented')
guards were vestiges from the cloud-only era — convert to real asserts.
Also bump 0.88-confidence RULE-state fixtures to 0.90 so they survive
the zombie filter (RULE at <0.90 is treated as a decayed rule).
Suite: 3813 passed, 10 skipped, 2 xfailed.
Remaining skips are all legit:
- test_file_lock.py (2): Windows vs POSIX platform gates
- test_integration_workflow.py (5): require ANTHROPIC/OPENAI keys, cost money
- test_mem0_adapter.py::test_real_mem0_roundtrip: requires MEM0_API_KEY
- test_meta_rules.py::test_with_real_data: requires GRADATA_LESSONS_PATH env
xfails (2) are tracked for v0.7 reconciliation in test docstring.
Co-Authored-By: Gradata <noreply@gradata.ai>
Found while clearing remaining skipped/xfailed tests: Bug: agent_graduation._update_lesson_confidence had confidence = max(0.0, confidence - MISFIRE_PENALTY) but MISFIRE_PENALTY = -0.15 (negative). Subtracting a negative added confidence on rejection. Test test_rejection_decreases_confidence was xfail'd with 'API drift, reconcile in v0.7' — it was a real bug. Fix: align with canonical _confidence.py usage (confidence + MISFIRE_PENALTY). Other cleanups in the same pass: - test_agent_graduation: drop both xfail markers. test_lesson_graduates_to_pattern was also wrong on its own terms — with ACCEPTANCE_BONUS=0.20 the lesson graduates straight to RULE (stronger than PATTERN). Accept either state. - test_integration_workflow: delete stale module-level skipif guarding 5 tests behind ANTHROPIC/OPENAI keys they never actually use. They only exercise local brain.correct/convergence/efficiency — no network. - test_mem0_adapter: delete test_real_mem0_roundtrip (live-API smoke test already covered by the 20+ fake-client tests in the same file). - test_meta_rules: delete test_with_real_data — dev-time exploration script with zero asserts, requiring GRADATA_LESSONS_PATH env var. Suite: 3820 passed, 3 skipped, 0 xfailed, 0 failed. Remaining 3 skips are test_file_lock.py POSIX paths that require fcntl, which does not exist on Windows. Complementary Windows paths skip on Linux — running on each platform covers all 4. Cannot be eliminated. From 22 skipped + 2 xfailed to 3 skipped + 0 xfailed. Co-Authored-By: Gradata <noreply@gradata.ai>
…ten stale notes Co-Authored-By: Gradata <noreply@gradata.ai>
…ate refresh - agent_graduation: add _extract_output() to handle all Claude Code PostToolUse payload key variants (tool_response/tool_output/tool_result/output/response) so plan-mode agents no longer silently drop output - session_close: add _load_soul_mandatories() (VOICE rules from soul.md injected into brain_prompt.md) and _refresh_loop_state() (regenerates loop-state.md on session close with live DB + lesson counts); raise Stop hook timeout to 90 s - _events: add _redact_payload() (recursive email PII redaction) wired into emit() before any write; raw side-log to events.raw.jsonl (best-effort); redactor failure aborts write (fail closed) Co-Authored-By: Gradata <noreply@gradata.ai>
…e watermarks - _ulid.py: minimal stdlib ULID generator (no external dep); ulid_from_iso() preserves timestamp sort order during historical backfill - device_uuid.py: atomic read-or-create of per-brain dev_<hex> device id; race-safe via O_EXCL temp file + os.replace - 002_add_event_identity: adds event_id/device_id/content_hash/correction_chain_id/ origin_agent columns + indexes to events table; chunked 10k-row backfill that is idempotent and resumes on restart - 003_add_sync_state: creates sync_state table if missing and adds device_id/ last_push_event_id/last_pull_cursor/tenant_id watermark columns + composite indexes - tests: 44 tests covering all migration paths, chunked backfill, idempotency, PII redaction (email), loop-state generation, and session_close functions Co-Authored-By: Gradata <noreply@gradata.ai>
…ts DB Reads ~/.claude/projects/<project-hash>/*.jsonl count as the session number — the actual Anthropic session log — rather than MAX(session) from the Gradata events table. The two diverged (314 vs 367). Falls back to the events DB if the project dir can't be located. Co-Authored-By: Gradata <noreply@gradata.ai>
Previous fix only counted the active project dir (314). Global sum across all project dirs gives 659, matching the actual Anthropic session log total. Falls back to events DB if projects dir missing. Co-Authored-By: Gradata <noreply@gradata.ai>
…oop-state.md (367) Session number was read from loop-state.md (Gradata events DB counter). Now counts .jsonl files across all ~/.claude/projects/ dirs — the real Claude Code session total, same logic as status_line.py. Co-Authored-By: Gradata <noreply@gradata.ai>
Every silent except Exception: pass in the core library layers now emits a _log.debug() so failures surface under GRADATA_LOG=debug without breaking the best-effort semantics. Files touched: brain.py (telemetry guard), context_wrapper.py (apply_brain_rules / context_for fallbacks), _brain_manifest.py + _context_compile.py (added module loggers), _context_packet.py (12 data-loading guards), _manifest_metrics.py (7 DB query guards), _doctor.py (HTTP body read guard + contextlib import), _mine_transcripts.py (SIM108 ternary), hooks/session_close.py (4 x SIM105 OSError guards converted to contextlib.suppress). Co-Authored-By: Gradata <noreply@gradata.ai>
ruff check src/ --fix resolved 8 auto-fixable violations (E, F, I rules). ruff format src/ reformatted 163 files to enforce consistent style. Zero errors remain; 13 pre-existing warnings (optional cloud/framework imports, lazy __all__ patterns) are unchanged. Co-Authored-By: Gradata <noreply@gradata.ai>
Two tests expected s0/s42 but got s659 because _claude_session_count() was walking the real ~/.claude/projects/. Add fake_home fixture so the function returns None and falls back to the events DB as intended. Co-Authored-By: Gradata <noreply@gradata.ai>
…eshold
New Stop hook writes a structured handoff to brain/sessions/handoff-{ts}.md
when context usage exceeds GRADATA_CTX_THRESHOLD (default 65%). inject_brain_rules
surfaces a <watchdog-alert> block at next session start so the LLM knows to
review the handoff and run /compact or /clear.
Also: bracket_confidence() in session_close for cache-key stability; remove
MAX_RULES render cap from inject_brain_rules (overshoot logic was masking gaps);
13 new tests in test_ctx_watchdog, tests in test_rule_synthesizer updated.
Co-Authored-By: Gradata <noreply@gradata.ai>
…ript store + retroactive sweep P1: call_provider() dispatch in rule_synthesizer.py routes by model prefix (claude-* → Anthropic, gpt-*/o1/o3 → OpenAI, gemini-* → Google, http → generic). session_close._refresh_brain_prompt now uses call_provider instead of inline SDK. P2: _bracket_confidence() buckets FSRS floats into 3 stable bands (low/mid/high) so per-tick confidence changes no longer bust the synthesis cache. P3: New _transcript.py (log_turn, load_turns, cleanup_ttl) and _transcript_providers.py (ProviderTranscriptSource + GradataTranscriptSource) form the transcript store layer. _retroactive_sweep() in the waterfall runs implicit_feedback patterns across all session turns (gated on GRADATA_TRANSCRIPT=1). OpenAI, LangChain, CrewAI middleware adapters gain session_id + log_turn() calls. 21 new tests in test_transcript.py. Co-Authored-By: Gradata <noreply@gradata.ai>
…only The global Path.is_file patch in _run_main() caused inject_brain_rules to also read a fake pending_handoff.txt and append a <watchdog-alert> block. Test now extracts content between <brain-rules>...</brain-rules> before counting lines, making it immune to any outer blocks appended to the result. Co-Authored-By: Gradata <noreply@gradata.ai>
- pre_compact.py rewritten: when auto-compact fires with a pending handoff, replaces the compact summary verbatim with handoff content so no lossy LLM summarization occurs. Manual compact falls back to snapshot. Corrects field name from "type" → "trigger" (keeps legacy fallback). - inject_brain_rules._build_watchdog_block() extracted from inline main(): Phase 1 (pre-/clear): consumes pending_handoff.txt, stages content to post_clear_handoff.txt, injects <watchdog-alert> with run-/clear prompt. Phase 2 (post-/clear): consumes post_clear_handoff.txt, injects <session-handoff> into fresh session. Phase 2 takes priority if both exist. - implicit_feedback: return None instead of signal name string to reduce UserPromptSubmit noise. - tests/test_pre_compact.py: 9 tests covering both trigger paths. - tests/test_inject_watchdog_phases.py: 8 tests covering both phases. Co-Authored-By: Gradata <noreply@gradata.ai>
graph_first_check.py (PreToolUse, Glob|Grep): blocks exploratory code searches until the session flag is set. Returns a block decision with the exact ToolSearch call needed to unblock. graph_session_track.py (PostToolUse, ToolSearch): writes a per-session flag file when a ToolSearch query contains "code-review-graph", clearing the block for the rest of the session. inject_brain_rules.py: appends <code-graph-tools> directive to every SessionStart injection, with the mandatory ToolSearch query string. Both hooks registered in ~/.claude/settings.json. Bypass via GRADATA_GRAPH_CHECK=0. 18 tests, smoke-tested end-to-end. Co-Authored-By: Gradata <noreply@gradata.ai>
…tignore cleanup - test_hooks_intelligence.py: implicit_feedback tests now assert result is None and verify IMPLICIT_FEEDBACK event via mock_emit (hook emits, doesn't return) - session_close.py: reorder imports alphabetically (isort) - .gitignore: add graphify temp files, run.log patterns, and /.archive/ personal Claude Code config backups so they never accidentally land in commits Co-Authored-By: Gradata <noreply@gradata.ai>
… migration reference - Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py: move legacy Streamlit dashboard per Phase 4 deprecation plan (gradata.ai web dashboard now covers all panels — /rules, /corrections, /self-healing, /observability) - Gradata/migrations/supabase/: reference copies of cloud migrations 014-016 applied to prod 2026-04-24 (corrections unique, events unique, brains.last_used_at) - Gradata/docs/specs/cloud-sync-and-pricing.md: DRAFT v1 sync architecture + pricing tier spec Co-Authored-By: Gradata <noreply@gradata.ai>
Stale file created by a subagent Bash redirect. Grouped with the existing Windows cmd.exe stdout misparse artifact entries. Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
- CHANGELOG.md: add [Unreleased] section covering 18 commits since 2026-04-23 (cloud sync, hooks hardening, Supabase migrations, Streamlit archival, statusline session-count source, implicit_feedback emit-only contract) - migrations/supabase/014,015: wrap constraint adds in DO blocks that check pg_constraint first, making re-runs safe on any DB (prod already had inline UNIQUE _key variants from CREATE TABLE; these migrations added redundant _unique variants, now documented as no-op on existing systems) - migrations/supabase/README.md: document prod constraint state (both _key and _unique present on corrections + events) and drift-cleanup deferred Co-Authored-By: Gradata <noreply@gradata.ai>
Critic audit flagged a silent-drop path: when resolve_brain_dir() returns None (fresh install, CI env, unconfigured brain) the hook detected signals but skipped emit() with no log — every correction became invisible. - hooks/implicit_feedback.py: add debug log in the else branch recording how many signals were detected and of which types, so operators running `GRADATA_LOG_LEVEL=DEBUG` see the breadcrumb. - tests/test_implicit_feedback.py: add TestMainNoBrainDir covering the main() path (previously only _detect_signals was tested) — verifies the debug log fires on detected signals, stays quiet on no-signal input, and short messages don't crash. Co-Authored-By: Gradata <noreply@gradata.ai>
Watermark stalls from 23505 unique-violations were invisible unless a
caller grepped logs: _post() logged everything at WARNING. Now HTTP 409
and any "23505" body are logged at ERROR with a body snippet, and the
last error is persisted to brain_dir/cloud_push_error.json so
'gradata doctor' can surface it ('fail' for constraint violations,
'warn' for other non-2xx). Successful pushes clear the file.
_post() signature is now (accepted, error_info|None); call sites and
the three existing tests patching _post are updated. A _coerce_post_result
shim tolerates legacy int returns from any external patches.
Closes T17 from the overnight backlog (critic finding cycle-2 #1).
Addresses three cycle-3 council findings on commit 492c3dd: 1. Non-atomic write (critic #1, high-severity race). `_record_push_error` now writes to `<name>.tmp` then `os.replace`s into the target. Concurrent readers (doctor + daemon + MCP server) can no longer observe a truncated file that would mask a constraint violation as "error file unreadable". 2. PII leak in persisted error (critic #2). PostgREST 23505 bodies echo conflicting row values in `details`/`hint` fields, and `gradata doctor` prints the file verbatim. New `_scrub_error_body` parses the body as JSON and keeps only `code` + the first 120 chars of `message` (enough for the constraint name). Non-JSON bodies reduce to a length marker. Log messages use the scrubbed form too. 3. Removed the `_coerce_post_result` shim (verifier + critic). Zero tests exercised the bare-int branch it guarded; callers now destructure `_post` returns directly. Tests: +2 (`test_post_error_body_scrubs_row_values`, `test_scrub_error_body_handles_non_json`), 28/28 in the cloud test files pass, 3944 passed / 3 skipped full suite. Ruff + pyright clean. Co-Authored-By: Gradata <noreply@gradata.ai>
When doctor reports on cloud_push_error.json, the detail string now names the brain directory it checked. In multi-brain deployments, push() and doctor() can resolve different brain_dirs silently — surfacing the path lets users spot the divergence instead of chasing phantom "ok" reports. Cycle-3 critic finding #3. Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…metry Three bugs kept last_sync_at frozen: - cloud/client.py POSTed /brains/sync (path doesn't exist) -> /sync - cloud/sync.py POSTed /v1/telemetry/metrics -> /api/v1/telemetry/metrics - Stop hook never fired cloud sync because Claude Code doesn't call brain.end_session(). Added cloud_sync_tick() helper in _core.py and new _run_cloud_sync step in session_close.py waterfall. Also elevated silent DEBUG failures to WARNING with HTTP status + exc_info so the next failure mode surfaces in run.log. 3945 tests pass. Co-Authored-By: Gradata <noreply@gradata.ai>
New CLI: gradata skill export <name> [--output-dir DIR] [--description STR]
[--category CAT] [--no-meta]
The bet: Claude Skills' "gotchas" section is exactly what graduated
RULE-tier lessons are -- but generated from real corrections instead of
hand-written. This turns a brain into a portable, shippable Skill folder
with valid YAML frontmatter, category-grouped gotchas, and (when
available) injectable meta-principles.
- new module enhancements/skill_export.py reuses _parse_rules from
rule_export so the RULE-only filter and [hooked] marker stripping
stay consistent across exporters
- auto-generated frontmatter description lists rule categories with
defensive 900-char clip (Anthropic 1024 ceiling)
- name slugified for safe folder name + frontmatter alignment
- description quote-escapes preserve YAML validity
- meta-rule loader degrades gracefully on missing system.db / table
24 new tests; full suite 3969 pass (+24, 0 regressions).
Unblocks M4 items 7 and 9 (self-dev Skill, composition Skill) per
plans/swift-toasting-origami.md.
Co-Authored-By: Gradata <noreply@gradata.ai>
…ignore scratch dirs P0-1: AGENTS.md previously described 'Sprites Work multi-agent TypeScript/Claude Flow framework' which is unrelated to this Python SDK. Council unanimously flagged as credibility-killer for first-time evaluators. Replaced with accurate guidance for AGENTS.md-aware coding agents. P0-3 (partial): added .tmp/, .archive/, sessions/handoff-*.md, /0, /BrainDetail to .gitignore so scratch artifacts stop getting committed. Existing tracked scratch files left in place — destructive removal deferred for user review.
P0-2 from council review: silent except-pass blocks in core SDK swallowed errors during the headline correction-write path. - Added tests/test_no_bare_excepts.py: AST scan that fails CI if any single-statement except-pass returns to src/gradata/, except for documented optional-dep ImportError probes (allowlist). - Converted 57 non-allowlisted handlers to logger.warning(..., exc_info=True). - Added logger = logging.getLogger(__name__) where files lacked one. - Codex (gpt-5.5) generated the patch under workspace-write sandbox; could not self-commit because .git/ lives outside its writable root. This commit lifts the patch into the repo. Verification: - python3 exec of test_no_bare_excepts.py: passes - python3 -m compileall src/gradata: no syntax errors NOT in this commit (planned next): P0-5 atomic rule_graph writes, P0-4 BRAIN_DIR hard-fail, P0-7 import integrity, P0-6 thread-safety lock.
…rity, thread-safety lock P0-5 atomic writes (rule_graph.json): - New src/gradata/_atomic.py: atomic_write_text() (temp file + fsync + os.replace + dir fsync on POSIX) - RuleGraph.save() now uses atomic_write_text — crash mid-write preserves prior state - Test: test_rule_graph_atomic.py::test_rule_graph_save_preserves_prior_state_when_replace_fails P0-4 BRAIN_DIR hard-fail (no more silent data loss): - exceptions.py: new GradataError base + BrainNotConfiguredError(GradataError) - implicit_feedback hook: raises BrainNotConfiguredError instead of swallowing - hooks/_base.py: hook runner no longer suppresses BrainNotConfiguredError - _doctor.py: reports missing BRAIN_DIR as clear brain_dir failure - Test: test_brain_dir_required.py (3 cases) P0-7 package import integrity: - Test: test_import_integrity.py — subprocess: import gradata; from gradata import Brain, Lesson, LessonState - Plus Brain.init() smoke test in tmp_path P0-6 thread-safety lock (Brain documented NOT thread-safe yet ships daemon + mcp_server): - New src/gradata/_brain_lock.py: acquire_brain_lock(brain_dir) context mgr - POSIX: fcntl.flock LOCK_EX|LOCK_NB on BRAIN_DIR/.brain.lock - Windows: msvcrt.locking - In-process duplicate-lock guard - BrainLockedError in exceptions.py - daemon.py: acquires lock at start, releases in cleanup - mcp_server.py: acquires for run_server() duration - Brain itself does NOT auto-acquire (would break tests) - Test: test_brain_lock.py::test_second_mock_daemon_raises_brain_locked CHANGES.md and REPORT.md committed for posterity. Codex (gpt-5.5) generated all four patches; could not self-commit because .git/ is outside its sandbox writable root.
The single benchmark council recommended Gradata ship before launch. 100 scripted sessions, 6 correction classes, recall@1/recall@3 metrics with per-class breakdown. First baseline run (3 sessions, BEHAVIORAL class): 0% rules extracted, 0% recall. This is the work. Track on every PR. Ship at >=70% recall@1 across all classes. Run: python -m bench.pmr_100 [--quick] [-n N]
|
Too many files changed for review. ( |
…atch Fix wrong assumption that apply_brain_rules returns a list of rule objects. It returns a formatted prompt string. Recall scoring now checks whether expected keywords appear in the rendered text. Smoke (10 sessions): still 0% recall — confirms the kernel does not graduate rules from a single correction. Multiple reinforcements needed before lessons file populates. This is by design (FSRS scoring) and is the real launch question: how many reinforcements until rules become callable?
📝 WalkthroughPMR-100 Procedural Memory Retention BenchmarkCore Addition:
New Public API:
Infrastructure & Error Handling:
Cloud & Learning Pipeline:
Documentation:
No breaking changes to existing public APIs; all changes are additive or internal logging/formatting improvements. WalkthroughThis PR implements a comprehensive local-first architecture overhaul for the Gradata SDK, featuring event-sourced cloud sync with schema migrations, local meta-rule discovery and synthesis, brain directory locking, PII-redacted event logging, benchmarking infrastructure, and pervasive exception-logging improvements across the codebase. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
|
There was a problem hiding this comment.
Actionable comments posted: 47
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (20)
Gradata/src/gradata/enhancements/dedup.py (1)
297-299:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake dedup failures observable in exception path
Line 297 catches all exceptions, but Line 298 logs only at debug and drops traceback context. This can mask persistent ingestion/dedup failures. Please either narrow exception types or log at warning with
exc_info=True.Proposed fix
- except Exception as e: - _log.debug("Observation dedup failed: %s", e) + except (sqlite3.Error, OSError, ValueError): + _log.warning("Observation dedup failed", exc_info=True) return FalseAs per coding guidelines, "Never use bare
except: pass; use typed exceptions or at minimumlogger.warning(...)withexc_info=True."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/dedup.py` around lines 297 - 299, The except Exception as e block that currently calls _log.debug("Observation dedup failed: %s", e) should be changed so failures are observable: either catch specific exception types raised by the dedup code (e.g., ValueError, KeyError, or any domain-specific exceptions) instead of broad Exception, or at minimum change the logging call to _log.warning("Observation dedup failed", exc_info=True) so the traceback is recorded; keep the existing return False behavior. Ensure you update the except clause and the _log call in the block that contains the caught Exception and the current debug log.Gradata/src/gradata/_query.py (1)
51-52:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNarrow the migration exception handling to duplicate-column only.
Line 51 currently suppresses all
sqlite3.OperationalError, which can silently mask real DB failures (e.g., locked DB). The inline comment says only duplicate-column should be ignored, so the implementation should match that.Suggested fix
- with contextlib.suppress(sqlite3.OperationalError): - conn.execute("ALTER TABLE brain_fts_content ADD COLUMN tenant_id TEXT") + try: + conn.execute("ALTER TABLE brain_fts_content ADD COLUMN tenant_id TEXT") + except sqlite3.OperationalError as exc: + if "duplicate column name" not in str(exc).lower(): + raiseAs per coding guidelines: "Never use bare
except: pass; use typed exceptions or at minimumlogger.warning(...)withexc_info=True."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_query.py` around lines 51 - 52, The code currently suppresses all sqlite3.OperationalError around the ALTER TABLE call; replace the contextlib.suppress block with an explicit try/except that catches sqlite3.OperationalError as e, inspects str(e).lower() for a duplicate-column indicator (e.g., "duplicate column", "duplicate column name", or the SQLite error code text) and only swallow the exception in that specific case; for any other OperationalError re-raise it (or call processLogger.warning/processLogger.error with exc_info=True before raising) so real DB failures (like locks) are not silently ignored. Keep the ALTER statement and the conn.execute("ALTER TABLE brain_fts_content ADD COLUMN tenant_id TEXT") location intact while changing suppression logic around it.Gradata/src/gradata/enhancements/router_warmstart.py (1)
123-123:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd
exc_info=Trueto capture the traceback.The exception handler logs a warning but doesn't capture the stack trace, making warm-start failures difficult to debug. As per coding guidelines, exception logging should include
exc_info=Trueto preserve traceback information.🔍 Proposed fix to preserve traceback
- _log.warning("Warm-start failed: %s", e) + _log.warning("Warm-start failed: %s", e, exc_info=True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/router_warmstart.py` at line 123, The warning log in router_warmstart.py currently calls _log.warning("Warm-start failed: %s", e) without the traceback; update this exception handler to pass exc_info=True to _log.warning so the stack trace is preserved (i.e., change the _log.warning call in the warm-start exception handling block to include exc_info=True while keeping the existing message and exception argument).Gradata/src/gradata/enhancements/reporting.py (2)
349-355:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHealth report can remain falsely
HEALTHYwhen lesson counting fails.If
_count_active_lessons(...)raises, the code logs a warning but does not setreport.healthy = Falseor record an issue, so output can misreport health.Proposed fix
try: brain_dir = Path(db_path).parent if db_path else (Path(ctx.brain_dir) if ctx else None) if brain_dir: report.lessons_active = _count_active_lessons(brain_dir) except Exception: - logger.warning('Suppressed exception in generate_health_report', exc_info=True) + logger.warning("Suppressed exception in generate_health_report", exc_info=True) + report.issues.append("Could not read lessons") + report.healthy = False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/reporting.py` around lines 349 - 355, The try/except around calling _count_active_lessons in generate_health_report swallows exceptions and leaves report.healthy untouched, allowing a false "HEALTHY" status; change the except block to mark the report as unhealthy and record the failure: set report.healthy = False and add a descriptive issue to the report (e.g., using report.issues.append(...) or report.add_issue(...) depending on the report API) including the exception message and context, and continue to log the warning with exc_info=True via logger so the error is both recorded in the report and logged.
192-197:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
include_instincts=Truecurrently does not include low-confidence INSTINCT rules.At Line 197,
lesson.confidence >= min_confidencestill gates INSTINCT entries, so low-confidence instincts remain excluded even wheninclude_instinctsis enabled (contradicting the function contract/docstring).Proposed fix
- for lesson in lessons: - state_upper = lesson.state.value.upper() - if state_upper in allowed_states and lesson.confidence >= min_confidence: + for lesson in lessons: + state_upper = lesson.state.value.upper() + meets_confidence = ( + state_upper == "INSTINCT" and include_instincts + ) or lesson.confidence >= min_confidence + if state_upper in allowed_states and meets_confidence: briefing.rules.append( BriefingRule( category=lesson.category,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/reporting.py` around lines 192 - 197, The loop over lessons currently filters INSTINCT entries by lesson.confidence >= min_confidence, so enabling include_instincts doesn't include low-confidence INSTINCTs; update the conditional in the for-loop that uses state_upper, allowed_states and lesson.confidence so that if state_upper == "INSTINCT" and include_instincts is True the lesson is accepted regardless of lesson.confidence, otherwise keep the existing check (e.g. allow when state_upper in allowed_states and lesson.confidence >= min_confidence); modify the condition around the for-loop that references lesson.state.value, state_upper, and lesson.confidence accordingly.Gradata/src/gradata/_export_brain.py (3)
262-273:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFallback manifest should include the same
export.modefield.The success path sets
manifest["export"]["mode"], but the fallback manifest omits it. Keep the schema consistent across both paths to avoid downstream branching surprises.Suggested fix
- "export": {"exported_at": now.isoformat(), "files": [path for path, _ in sanitized]}, + "export": { + "exported_at": now.isoformat(), + "mode": "domain-only" + if domain_only + else ("no-prospects" if not include_prospects else "full"), + "files": [path for path, _ in sanitized], + },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_export_brain.py` around lines 262 - 273, The fallback manifest being built in the failure path (the manifest dict initialized near variables version, domain, sessions, graduated, active, sanitized, now) is missing the same export.mode field present in the success path; update the manifest["export"] entry to include "mode" (matching the success path semantics) alongside "exported_at" and "files" so both paths produce the same schema and avoid downstream branching surprises.
214-238:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
ctxis only partially honored, causing mixed-context exports.From Line 217 through Line 237, metadata and file lists are still resolved via global
_ppaths (read_version,read_domain_name,collect_*,_LESSONS_*,_CARL_*,_QUALITY_RUBRICS) instead of the providedctx. With non-default contexts, this can export the wrong brain/domain data.Based on learnings, “Round 2 addressed ... tenant-scoped fallback ...” and this path-mixing breaks that same tenant/context-scoping principle.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_export_brain.py` around lines 214 - 238, The code is using global path helpers and readers (read_version, read_domain_name, read_session_count, count_lessons, collect_domain_files, collect_brain_files, _LESSONS_ARCHIVE, _LESSONS_ACTIVE, _CARL_LOOP, _CARL_GLOBAL, _QUALITY_RUBRICS, build_prospect_map) instead of the provided ctx, causing mixed-context exports; update this block so every metadata read and file collection uses the ctx-scoped equivalents (e.g., call context-aware readers or pass ctx into read_version/read_domain_name/read_session_count/count_lessons/collect_* or resolve lesson/carl/quality paths from ctx) and make build_prospect_map use ctx.prospects_dir (or fall back to _p only when ctx is None) so all resolved files and counts are consistently tenant-scoped.
241-244:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not silently skip unreadable source files in export loop.
Line 243 suppresses all read failures and continues, which can produce incomplete archives without any signal. Log and narrow exception types here.
Suggested fix
- except Exception: - continue + except (OSError, UnicodeDecodeError): + logger.warning( + "Skipping unreadable export source: %s", + source_path, + exc_info=True, + ) + continueAs per coding guidelines, “Never use bare
except: pass; use typed exceptions or at minimumlogger.warning(...)withexc_info=True.”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_export_brain.py` around lines 241 - 244, The loop currently swallows all errors when calling source_path.read_text(encoding="utf-8"); replace the bare except with a typed exception handler (e.g. except (OSError, UnicodeDecodeError) as e:) and log a warning that includes the file path and the exception (use logger.warning(f"Unable to read {source_path}: {e}", exc_info=True)) before continuing; if your module uses a different logger name, use that (e.g. self.logger) so unreadable files are reported instead of silently skipped.Gradata/src/gradata/enhancements/clustering.py (1)
95-100:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard non-dict
scope_jsonpayloads before calling.get().At Line 97,
json.loadsmay return a list/string/number; calling.get()then raisesAttributeErrorand breaks clustering instead of falling back to"global".Proposed fix
if lesson.scope_json: try: scope = json.loads(lesson.scope_json) - return scope.get("domain", "") or "global" + if isinstance(scope, dict): + return scope.get("domain", "") or "global" + logger.warning( + "Invalid scope_json type in cluster_rules._extract_domain: expected object, got %s", + type(scope).__name__, + ) + return "global" except (json.JSONDecodeError, TypeError): logger.warning('Suppressed exception in cluster_rules._extract_domain', exc_info=True) return "global"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/clustering.py` around lines 95 - 100, The current _extract_domain logic calls scope.get(...) after json.loads which can return non-dict types; modify the try block in clustering.py (inside cluster_rules._extract_domain) to validate that scope is a dict before calling .get — if isinstance(scope, dict) use scope.get("domain", "") or "global", otherwise return "global"; keep the existing except (json.JSONDecodeError, TypeError) handler and logger.warning as-is to preserve error logging.Gradata/src/gradata/_context_packet.py (1)
231-240:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
_load_audit_contextshould close SQLite connections infinally.If any statement raises before
conn.close(), the connection remains open.Suggested fix
def _load_audit_context(session: int, ctx: "BrainContext | None" = None) -> dict: result = {"metrics": {}, "outputs": [], "gates": [], "correction_rate": {}} + conn = None try: db = ctx.db_path if ctx else _p.DB_PATH conn = sqlite3.connect(str(db)) conn.row_factory = sqlite3.Row row = conn.execute("SELECT * FROM session_metrics WHERE session = ?", (session,)).fetchone() if row: result["metrics"] = dict(row) - conn.close() except Exception as e: _log.debug("audit metrics query failed (non-fatal): %s", e) + finally: + if conn is not None: + with contextlib.suppress(Exception): + conn.close()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_context_packet.py` around lines 231 - 240, The try/except in _load_audit_context opens a SQLite connection with sqlite3.connect and calls conn.close() only on the normal path, leaving conn open if an exception occurs; move connection cleanup into a finally block (or use a context manager) so that conn.close() is always called: ensure the sqlite3.connect(...) result (conn) is closed in a finally or use "with sqlite3.connect(...)" around the body that sets conn.row_factory, executes the SELECT on session_metrics, sets result["metrics"], and then closes the connection regardless of errors, keeping the existing except _log.debug(...) behavior for non-fatal errors.Gradata/src/gradata/_manifest_helpers.py (1)
56-68:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEnsure DB connections are always closed on error paths.
conn.close()currently executes only on success. Ifexecute()/fetchall()raises, the connection leaks. Wrap connection lifetime infinally(orcontextlib.closing) in both helpers.Suggested fix
def _count_events(ctx: "BrainContext | None" = None) -> dict: result = {"total": 0, "by_type": {}} + conn = None try: db = ctx.db_path if ctx else _p.DB_PATH conn = get_connection(db) rows = conn.execute("SELECT type, COUNT(*) as cnt FROM events GROUP BY type").fetchall() for row in rows: result["by_type"][row["type"]] = row["cnt"] result["total"] += row["cnt"] - conn.close() except Exception: logger.warning('Suppressed exception in _count_events', exc_info=True) + finally: + if conn is not None: + conn.close() return result def _get_tables(ctx: "BrainContext | None" = None) -> list[str]: + conn = None try: db = ctx.db_path if ctx else _p.DB_PATH conn = get_connection(db) rows = conn.execute( "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name" ).fetchall() - conn.close() return [r[0] for r in rows] except Exception: return [] + finally: + if conn is not None: + conn.close()Also applies to: 72-81
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_manifest_helpers.py` around lines 56 - 68, The DB connection in _count_events (and the other similar helper at the nearby block) can leak if execute()/fetchall() raises because conn.close() is only called on the success path; wrap the connection lifetime so it always closes (e.g., use contextlib.closing(get_connection(db)) as conn: or assign conn = get_connection(db) and call conn.close() in a finally: block) and move the SQL execution inside that safe scope; ensure you update both _count_events and the analogous helper (the function that queries results in the nearby 72-81 block) to always close the connection even on errors.Gradata/src/gradata/contrib/patterns/human_loop.py (1)
494-507:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
contextis ignored inHumanLoopGate.checkLine 501 calls
gate(action)directly, socontextnever affects risk override/entity extraction in this path. That can change gating outcomes unexpectedly.Proposed fix
-def gate( +def gate( action: str, risk: RiskAssessment | None = None, auto_approve_low: bool = True, + context: dict[str, Any] | None = None, ) -> ApprovalRequest | None: @@ return ApprovalRequest( action=action, risk=risk, - preview=preview_action(action, context=None), + preview=preview_action(action, context=context), ) @@ def check( self, action: str, context: dict | None = None, approver: Callable | None = None, ) -> ApprovalResult: """Full gate check: assess risk, request approval if needed.""" - request = gate(action) + risk = assess_risk(action, context) + request = gate(action, risk=risk, context=context)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/contrib/patterns/human_loop.py` around lines 494 - 507, HumanLoopGate.check currently ignores the provided context by calling gate(action) which prevents context-driven risk overrides/entity extraction; update the call to invoke gate with the context (e.g., gate(action, context)) so the gate function receives and can use the context when producing the request, and ensure the rest of the method (returning approver(request) or ApprovalResult) continues to operate on that request.Gradata/src/gradata/contrib/patterns/rag.py (1)
391-406:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTwo-pass retrieval is unreachable due to the early hybrid return
On Line 391, any non-empty
all_resultsreturns at Line 402, so thecfg.two_passbranch at Line 405 never executes. This makestwo_pass=Trueineffective.Proposed fix
- if all_results: + hybrid_result: RetrievalResult | None = None + if all_results: merged = rrf_merge(*all_results, k=cfg.hybrid_rrf_k) graduated = apply_graduation_scoring(merged, cfg) - result = RetrievalResult( + hybrid_result = RetrievalResult( chunks=graduated[:limit], query=query, mode="hybrid", total_candidates=total, ) if _cascade_errors: - result.mode = f"hybrid (warnings: {', '.join(_cascade_errors)})" - return result + hybrid_result.mode = f"hybrid (warnings: {', '.join(_cascade_errors)})" @@ if cfg.two_pass and all_results: ... if pass2_results: return RetrievalResult(...) + + if hybrid_result is not None: + return hybrid_result🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/contrib/patterns/rag.py` around lines 391 - 406, The hybrid-return happens whenever all_results is non-empty, which prevents the subsequent cfg.two_pass branch from ever executing; move or restructure the two-pass logic so cfg.two_pass is evaluated before returning the hybrid RetrievalResult. Concretely, before calling rrf_merge/apply_graduation_scoring and building the RetrievalResult (functions rrf_merge, apply_graduation_scoring, class RetrievalResult and variables all_results, _cascade_errors, limit, total, query), check cfg.two_pass and perform the two-pass expansion (the flat_chunks handling) and update all_results accordingly; only after two_pass processing (or skipping it) perform the merge, graduation scoring, set result.mode (including _cascade_errors) and return the final RetrievalResult.Gradata/src/gradata/_manifest_metrics.py (1)
58-66:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEnsure DB connections are closed on exception paths.
Several functions close
connonly on the success path. If an exception occurs after open (for example after Line 60 or Line 136), the connection can leak.💡 Suggested pattern (apply to each DB block)
+from contextlib import closing + def _correction_rate_trend(ctx: "BrainContext | None" = None, window: int = 10) -> dict | None: """Compare current CRO window to baseline window.""" try: db = ctx.db_path if ctx else _p.DB_PATH - conn = get_connection(db) - max_session, _ = _session_window(conn, window) + with closing(get_connection(db)) as conn: + max_session, _ = _session_window(conn, window) - if max_session < window * 2: - conn.close() - return None + if max_session < window * 2: + return None - def _cro(min_s, max_s): - outputs = ( - conn.execute( + def _cro(min_s, max_s): + outputs = ( + conn.execute( "SELECT COUNT(*) FROM events WHERE type='OUTPUT' AND session BETWEEN ? AND ?", (min_s, max_s), - ).fetchone()[0] - or 0 - ) - corrections = ( - conn.execute( + ).fetchone()[0] + or 0 + ) + corrections = ( + conn.execute( "SELECT COUNT(*) FROM events WHERE type='CORRECTION' AND session BETWEEN ? AND ?", (min_s, max_s), - ).fetchone()[0] - or 0 - ) - return round(corrections / outputs, 4) if outputs > 0 else None + ).fetchone()[0] + or 0 + ) + return round(corrections / outputs, 4) if outputs > 0 else None - current = _cro(max_session - window + 1, max_session) - baseline = _cro(max_session - window * 2 + 1, max_session - window) - conn.close() + current = _cro(max_session - window + 1, max_session) + baseline = _cro(max_session - window * 2 + 1, max_session - window) if current is None or baseline is None: return NoneAlso applies to: 84-87, 102-104, 134-137, 199-221, 307-312, 339-342, 376-379, 390-393, 493-501
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_manifest_metrics.py` around lines 58 - 66, The DB connection created by get_connection(...) (e.g., in the block that calls _session_window and assigns conn) can leak on exception paths; ensure every place that opens a connection (references to get_connection, the blocks around _session_window, and the other listed DB blocks) always closes the connection by using a try/finally (or a context manager wrapper) so conn.close() runs on both success and exception paths; update each block (including the sections that reference max_session, the subsequent query blocks and the ranges noted) to wrap the work in try/finally (or use a connection context) and call conn.close() in finally.Gradata/src/gradata/_migrations/__init__.py (1)
109-110:⚠️ Potential issue | 🟠 Major | ⚡ Quick winLog suppressed base-table migration errors too.
Line 109-110 still swallows
sqlite3.OperationalErrorsilently for_BASE_TABLES; that can hide startup schema failures and make migration debugging hard.As per coding guidelines, "Never use bare `except: pass`; use typed exceptions or at minimum `logger.warning(...)` with `exc_info=True`."Proposed fix
- for sql in _BASE_TABLES: - with contextlib.suppress(sqlite3.OperationalError): - conn.execute(sql) + for sql in _BASE_TABLES: + try: + conn.execute(sql) + except sqlite3.OperationalError: + logger.warning("Suppressed exception in _apply_inline base table setup", exc_info=True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_migrations/__init__.py` around lines 109 - 110, The current use of contextlib.suppress(sqlite3.OperationalError) around conn.execute(sql) for _BASE_TABLES silently swallows schema errors; change it to an explicit try/except: wrap conn.execute(sql) in try and catch sqlite3.OperationalError as e, and call the module logger.warning (or appropriate logger) including a clear message, the SQL statement context (sql or _BASE_TABLES) and exc_info=True so the traceback is recorded; keep suppressing only after logging so startup/schema failures are visible for debugging. Ensure you reference the same symbols: conn.execute(sql), sqlite3.OperationalError, and _BASE_TABLES.Gradata/REPORT.md (1)
1-81: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winPlease drop this operational handoff report from the PR.
REPORT.mdreads like a sandbox/session artifact rather than durable product or developer documentation. It adds environment-specific noise and is likely to go stale immediately.Based on learnings: Never commit scratch files (
.tmp/,.archive/,sessions/handoff-*.md, files named0orBrainDetail); these belong in.gitignore.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/REPORT.md` around lines 1 - 81, REPORT.md is a sandbox operational handoff that should not be committed; remove REPORT.md from the PR/diff and any staged changes, and instead add appropriate ignore rules (e.g., patterns for scratch/session files like REPORT.md, sessions/handoff-*.md, .tmp/, .archive/, files named "0" or "BrainDetail") to the repository .gitignore so future ephemeral artifacts are excluded; ensure the commit only contains the intended code changes (e.g., your AST test tests/test_no_bare_excepts.py and the src/gradata edits) and not this report, then re-run the commit/PR without REPORT.md included.Gradata/src/gradata/_core.py (1)
1337-1343:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winEnv-only cloud auth now logs a warning on every session close.
This re-parses
~/.gradata/config.tomlforsync_modeeven when the user authenticated purely throughGRADATA_API_KEY. In that caseread_text()raises and we emit a warning+traceback on every session, even though the setup is valid. Guard withconfig_path.is_file()or drop this to DEBUG.Suggested fix
sync_mode = "metrics_only" - try: - cfg = _parse_toml_cloud(config_path) - sync_mode = cfg.get("sync_mode", "metrics_only") - except Exception: - logger.warning('Suppressed exception in _cloud_sync_session', exc_info=True) + if config_path.is_file(): + try: + cfg = _parse_toml_cloud(config_path) + sync_mode = cfg.get("sync_mode", "metrics_only") + except Exception as e: + _log.debug("cloud sync_mode parse failed: %s", e)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/_core.py` around lines 1337 - 1343, The code in _cloud_sync_session re-parses the config file via _parse_toml_cloud even when authentication is from env (GRADATA_API_KEY), causing spurious warnings; modify the block to first check config_path.is_file() before calling _parse_toml_cloud (or alternatively change logger.warning to logger.debug), e.g. guard the try/except with if config_path.is_file(): then call cfg = _parse_toml_cloud(config_path) and set sync_mode = cfg.get("sync_mode", "metrics_only"), leaving the exception handling to log only when the file exists.Gradata/src/gradata/enhancements/meta_rules.py (3)
632-668:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
refresh_meta_rules()still bypasses the new local discovery path.This function still logs "requires Gradata Cloud" and returns only validated existing entries. Any caller that uses the public refresh API will never see the local-first
discover_meta_rules()output added in this PR.Suggested fix
def refresh_meta_rules( lessons: list[Lesson], existing_metas: list[MetaRule], @@ ) -> list[MetaRule]: @@ - _log.info("Meta-rule discovery requires Gradata Cloud") corrections = recent_corrections or [] - # Validate existing meta-rules (invalidation still works locally) - valid: list[MetaRule] = [] + valid: dict[str, MetaRule] = {} for meta in existing_metas: if validate_meta_rule(meta, corrections): meta.last_validated_session = current_session - valid.append(meta) + valid[meta.id] = meta - valid.sort(key=lambda m: m.confidence, reverse=True) - return valid + for meta in discover_meta_rules( + lessons, + min_group_size=min_group_size, + current_session=current_session, + **kwargs, + ): + valid.setdefault(meta.id, meta) + + return sorted(valid.values(), key=lambda m: m.confidence, reverse=True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 632 - 668, refresh_meta_rules currently always skips local discovery and only validates existing_metas; call the local discover_meta_rules(lessons, min_group_size=min_group_size, recent_corrections=recent_corrections or [], **kwargs) inside refresh_meta_rules to obtain new_meta_candidates, validate those with validate_meta_rule (same as existing flow), set last_validated_session=current_session on validated discovered metas, merge them with validated existing_metas (deduplicate by unique id or rule signature), sort by confidence and return the combined list; keep the log message but update it to note that local discovery runs and cloud is required only for full discovery if needed.
532-565:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFilter non-injectable meta-rules before formatting.
format_meta_rules_for_prompt()is the prompt-injection path, but it currently formats deterministic meta-rules too. That breaks theINJECTABLE_META_SOURCESinvariant and can reintroduce the known-bad principles whenever callers pass raw discovery results.Suggested fix
if not metas: return "" # Filter by preconditions/anti-conditions if condition_context is not None: metas = [m for m in metas if evaluate_conditions(m, condition_context)] + metas = [m for m in metas if m.source in INJECTABLE_META_SOURCES] if not metas: return ""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 532 - 565, format_meta_rules_for_prompt currently includes deterministic/non-injectable meta-rules; before formatting you must filter metas to only those whose source is allowed by INJECTABLE_META_SOURCES. After the existing filtering/ranking steps (e.g., after the context/limit block and before constructing lines = ["## Brain Meta-Rules..."]), add a filter like: metas = [m for m in metas if m.source in INJECTABLE_META_SOURCES] (or the correct attribute that identifies the meta's origin), so only injectable meta-rules are rendered.
763-785:⚠️ Potential issue | 🟠 Major
GRADATA_GEMMA_API_KEYis incompatible withllm_synthesize_rules()due to auth scheme mismatch.
_resolve_llm_credentials()returns Gemma credentials as valid input, but_call_llm_for_synthesis()always POST to/chat/completionswith OpenAI-style bearer auth. Gemma's native API requiresx-goog-api-keyheader and a different endpoint. With onlyGRADATA_GEMMA_API_KEYset, the credential check passes at line 967–970, but the synthesis call fails at the request.Meanwhile,
_try_llm_principle()correctly handles both providers via separate branches: OpenAI viasynthesise_principle_llm()(lines 848–860) and Gemma via_call_gemma_native()(lines 862–865).Either route Gemma credentials through
_call_gemma_native()in_call_llm_for_synthesis(), or restrict_resolve_llm_credentials()to return only OpenAI-compatible credentials for the synthesis path.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Gradata/src/gradata/enhancements/meta_rules.py` around lines 763 - 785, _resolve_llm_credentials currently returns Gemma credentials which are incompatible with the OpenAI-style POST and Bearer auth used by _call_llm_for_synthesis; update the synthesis path so Gemma creds are handled by the Gemma-specific branch instead of being passed to _call_llm_for_synthesis. Concretely: in the logic that invokes _call_llm_for_synthesis (the synthesis flow referenced by _try_llm_principle and the credential check at lines ~967–970), detect when GRADATA_GEMMA_API_KEY is present and route the call to _call_gemma_native (or synthesise_principle_llm for OpenAI) rather than calling _call_llm_for_synthesis with gemma credentials; alternatively, change _resolve_llm_credentials to only return OpenAI-compatible key/base/model for the synthesis path and keep Gemma usage confined to _call_gemma_native. Ensure you modify the calling site(s) that currently assume any non-empty credentials are OpenAI-compatible.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0e5f3a02-3cce-40e0-be5b-f5b3337c7df1
⛔ Files ignored due to path filters (1)
.claude/hooks/statusline/sprites-statusline.jsis excluded by!.claude/**
📒 Files selected for processing (263)
.gitignoreGradata/.archive/dashboard_streamlit_deprecated_2026-04-23.pyGradata/.gitignoreGradata/AGENTS.mdGradata/CHANGELOG.mdGradata/CHANGES.mdGradata/REPORT.mdGradata/bench/README.mdGradata/bench/__init__.pyGradata/bench/pmr_100.pyGradata/docs/LEGACY_CLEANUP.mdGradata/docs/architecture/cloud-monolith-v2.mdGradata/docs/architecture/multi-tenant-future-proofing.mdGradata/docs/cloud/dashboard.mdGradata/docs/cloud/overview.mdGradata/docs/concepts/meta-rules.mdGradata/docs/specs/cloud-sync-and-pricing.mdGradata/hooks/hooks.jsonGradata/migrations/supabase/014_corrections_unique.sqlGradata/migrations/supabase/015_events_unique.sqlGradata/migrations/supabase/016_brains_last_used_at.sqlGradata/migrations/supabase/README.mdGradata/skills/core/session-start/SKILL.mdGradata/src/gradata/__init__.pyGradata/src/gradata/_atomic.pyGradata/src/gradata/_brain_lock.pyGradata/src/gradata/_brain_manifest.pyGradata/src/gradata/_cloud_sync.pyGradata/src/gradata/_config.pyGradata/src/gradata/_config_paths.pyGradata/src/gradata/_context_compile.pyGradata/src/gradata/_context_packet.pyGradata/src/gradata/_core.pyGradata/src/gradata/_data_flow_audit.pyGradata/src/gradata/_db.pyGradata/src/gradata/_doctor.pyGradata/src/gradata/_events.pyGradata/src/gradata/_export_brain.pyGradata/src/gradata/_fact_extractor.pyGradata/src/gradata/_file_lock.pyGradata/src/gradata/_http.pyGradata/src/gradata/_installer.pyGradata/src/gradata/_manifest_helpers.pyGradata/src/gradata/_manifest_metrics.pyGradata/src/gradata/_manifest_quality.pyGradata/src/gradata/_migrations/001_add_tenant_id.pyGradata/src/gradata/_migrations/002_add_event_identity.pyGradata/src/gradata/_migrations/003_add_sync_state.pyGradata/src/gradata/_migrations/__init__.pyGradata/src/gradata/_migrations/_runner.pyGradata/src/gradata/_migrations/_ulid.pyGradata/src/gradata/_migrations/device_uuid.pyGradata/src/gradata/_migrations/fill_null_tenant.pyGradata/src/gradata/_migrations/tenant_uuid.pyGradata/src/gradata/_mine_transcripts.pyGradata/src/gradata/_paths.pyGradata/src/gradata/_query.pyGradata/src/gradata/_stats.pyGradata/src/gradata/_tag_taxonomy.pyGradata/src/gradata/_telemetry.pyGradata/src/gradata/_tenant.pyGradata/src/gradata/_text_utils.pyGradata/src/gradata/_transcript.pyGradata/src/gradata/_transcript_providers.pyGradata/src/gradata/_types.pyGradata/src/gradata/_validator.pyGradata/src/gradata/_workers.pyGradata/src/gradata/adapters/mem0.pyGradata/src/gradata/audit.pyGradata/src/gradata/brain.pyGradata/src/gradata/brain_inspection.pyGradata/src/gradata/cli.pyGradata/src/gradata/cloud/client.pyGradata/src/gradata/cloud/sync.pyGradata/src/gradata/context_wrapper.pyGradata/src/gradata/contrib/enhancements/eval_benchmark.pyGradata/src/gradata/contrib/enhancements/install_manifest.pyGradata/src/gradata/contrib/enhancements/quality_gates.pyGradata/src/gradata/contrib/enhancements/truth_protocol.pyGradata/src/gradata/contrib/patterns/__init__.pyGradata/src/gradata/contrib/patterns/agent_modes.pyGradata/src/gradata/contrib/patterns/context_brackets.pyGradata/src/gradata/contrib/patterns/evaluator.pyGradata/src/gradata/contrib/patterns/execute_qualify.pyGradata/src/gradata/contrib/patterns/guardrails.pyGradata/src/gradata/contrib/patterns/human_loop.pyGradata/src/gradata/contrib/patterns/loop_detection.pyGradata/src/gradata/contrib/patterns/mcp.pyGradata/src/gradata/contrib/patterns/memory.pyGradata/src/gradata/contrib/patterns/middleware.pyGradata/src/gradata/contrib/patterns/orchestrator.pyGradata/src/gradata/contrib/patterns/parallel.pyGradata/src/gradata/contrib/patterns/pipeline.pyGradata/src/gradata/contrib/patterns/q_learning_router.pyGradata/src/gradata/contrib/patterns/rag.pyGradata/src/gradata/contrib/patterns/reconciliation.pyGradata/src/gradata/contrib/patterns/reflection.pyGradata/src/gradata/contrib/patterns/sub_agents.pyGradata/src/gradata/contrib/patterns/task_escalation.pyGradata/src/gradata/contrib/patterns/tools.pyGradata/src/gradata/contrib/patterns/tree_of_thoughts.pyGradata/src/gradata/correction_detector.pyGradata/src/gradata/daemon.pyGradata/src/gradata/detection/addition_pattern.pyGradata/src/gradata/enhancements/_sanitize.pyGradata/src/gradata/enhancements/bandits/collaborative_filter.pyGradata/src/gradata/enhancements/bandits/contextual_bandit.pyGradata/src/gradata/enhancements/behavioral_engine.pyGradata/src/gradata/enhancements/causal_chains.pyGradata/src/gradata/enhancements/cluster_manager.pyGradata/src/gradata/enhancements/clustering.pyGradata/src/gradata/enhancements/contradiction_detector.pyGradata/src/gradata/enhancements/dedup.pyGradata/src/gradata/enhancements/diff_engine.pyGradata/src/gradata/enhancements/edit_classifier.pyGradata/src/gradata/enhancements/freshness.pyGradata/src/gradata/enhancements/git_backfill.pyGradata/src/gradata/enhancements/graduation/agent_graduation.pyGradata/src/gradata/enhancements/graduation/judgment_decay.pyGradata/src/gradata/enhancements/graduation/rules_distillation.pyGradata/src/gradata/enhancements/graduation/scoring.pyGradata/src/gradata/enhancements/instruction_cache.pyGradata/src/gradata/enhancements/learning_pipeline.pyGradata/src/gradata/enhancements/lesson_discriminator.pyGradata/src/gradata/enhancements/llm_provider.pyGradata/src/gradata/enhancements/llm_synthesizer.pyGradata/src/gradata/enhancements/memory_taxonomy.pyGradata/src/gradata/enhancements/meta_rules.pyGradata/src/gradata/enhancements/meta_rules_storage.pyGradata/src/gradata/enhancements/metrics.pyGradata/src/gradata/enhancements/observation_hooks.pyGradata/src/gradata/enhancements/pattern_extractor.pyGradata/src/gradata/enhancements/pattern_integration.pyGradata/src/gradata/enhancements/pipeline_rewriter.pyGradata/src/gradata/enhancements/profiling/tone_profile.pyGradata/src/gradata/enhancements/prompt_synthesizer.pyGradata/src/gradata/enhancements/reporting.pyGradata/src/gradata/enhancements/retrieval_fusion.pyGradata/src/gradata/enhancements/router_warmstart.pyGradata/src/gradata/enhancements/rule_canary.pyGradata/src/gradata/enhancements/rule_context_bridge.pyGradata/src/gradata/enhancements/rule_export.pyGradata/src/gradata/enhancements/rule_integrity.pyGradata/src/gradata/enhancements/rule_pipeline.pyGradata/src/gradata/enhancements/rule_synthesizer.pyGradata/src/gradata/enhancements/rule_to_hook.pyGradata/src/gradata/enhancements/rule_verifier.pyGradata/src/gradata/enhancements/scoring/brain_scores.pyGradata/src/gradata/enhancements/scoring/calibration.pyGradata/src/gradata/enhancements/scoring/correction_tracking.pyGradata/src/gradata/enhancements/scoring/failure_detectors.pyGradata/src/gradata/enhancements/scoring/gate_calibration.pyGradata/src/gradata/enhancements/scoring/loop_intelligence.pyGradata/src/gradata/enhancements/scoring/memory_extraction.pyGradata/src/gradata/enhancements/scoring/reports.pyGradata/src/gradata/enhancements/scoring/success_conditions.pyGradata/src/gradata/enhancements/self_improvement/__init__.pyGradata/src/gradata/enhancements/self_improvement/_confidence.pyGradata/src/gradata/enhancements/self_improvement/_graduation.pyGradata/src/gradata/enhancements/similarity.pyGradata/src/gradata/enhancements/skill_export.pyGradata/src/gradata/events_bus.pyGradata/src/gradata/exceptions.pyGradata/src/gradata/graph.pyGradata/src/gradata/hooks/_base.pyGradata/src/gradata/hooks/_generated_runner_core.pyGradata/src/gradata/hooks/_installer.pyGradata/src/gradata/hooks/_profiles.pyGradata/src/gradata/hooks/agent_graduation.pyGradata/src/gradata/hooks/agent_precontext.pyGradata/src/gradata/hooks/auto_correct.pyGradata/src/gradata/hooks/brain_maintain.pyGradata/src/gradata/hooks/claude_code.pyGradata/src/gradata/hooks/client.pyGradata/src/gradata/hooks/config_protection.pyGradata/src/gradata/hooks/config_validate.pyGradata/src/gradata/hooks/context_inject.pyGradata/src/gradata/hooks/ctx_watchdog.pyGradata/src/gradata/hooks/daemon.pyGradata/src/gradata/hooks/dictGradata/src/gradata/hooks/dispatch_post.pyGradata/src/gradata/hooks/duplicate_guard.pyGradata/src/gradata/hooks/generated_runner.pyGradata/src/gradata/hooks/generated_runner_post.pyGradata/src/gradata/hooks/graph_first_check.pyGradata/src/gradata/hooks/graph_session_track.pyGradata/src/gradata/hooks/implicit_feedback.pyGradata/src/gradata/hooks/inject_brain_rules.pyGradata/src/gradata/hooks/jit_inject.pyGradata/src/gradata/hooks/pre_compact.pyGradata/src/gradata/hooks/rule_enforcement.pyGradata/src/gradata/hooks/secret_scan.pyGradata/src/gradata/hooks/self_review.pyGradata/src/gradata/hooks/session_boot.pyGradata/src/gradata/hooks/session_close.pyGradata/src/gradata/hooks/session_persist.pyGradata/src/gradata/hooks/stale_hook_check.pyGradata/src/gradata/hooks/status_line.pyGradata/src/gradata/hooks/telemetry_summary.pyGradata/src/gradata/hooks/tool_failure_emit.pyGradata/src/gradata/hooks/tool_finding_capture.pyGradata/src/gradata/inspection.pyGradata/src/gradata/integrations/anthropic_adapter.pyGradata/src/gradata/integrations/langchain_adapter.pyGradata/src/gradata/integrations/openai_adapter.pyGradata/src/gradata/mcp_server.pyGradata/src/gradata/mcp_tools.pyGradata/src/gradata/middleware/__init__.pyGradata/src/gradata/middleware/_core.pyGradata/src/gradata/middleware/anthropic_adapter.pyGradata/src/gradata/middleware/crewai_adapter.pyGradata/src/gradata/middleware/langchain_adapter.pyGradata/src/gradata/middleware/openai_adapter.pyGradata/src/gradata/notifications.pyGradata/src/gradata/onboard.pyGradata/src/gradata/patterns/__init__.pyGradata/src/gradata/rules/rule_context.pyGradata/src/gradata/rules/rule_engine/__init__.pyGradata/src/gradata/rules/rule_engine/_formatting.pyGradata/src/gradata/rules/rule_engine/_scoring.pyGradata/src/gradata/rules/rule_graph.pyGradata/src/gradata/rules/rule_ranker.pyGradata/src/gradata/rules/scope.pyGradata/src/gradata/safety.pyGradata/src/gradata/security/correction_hash.pyGradata/src/gradata/security/correction_provenance.pyGradata/src/gradata/security/manifest_signing.pyGradata/src/gradata/sidecar/watcher.pyGradata/tests/conftest.pyGradata/tests/test_agent_graduation.pyGradata/tests/test_brain_dir_required.pyGradata/tests/test_brain_lock.pyGradata/tests/test_bug_fixes.pyGradata/tests/test_cloud_row_push.pyGradata/tests/test_cloud_sync.pyGradata/tests/test_cluster_injection.pyGradata/tests/test_ctx_watchdog.pyGradata/tests/test_doctor_cloud.pyGradata/tests/test_emit_pii_redaction.pyGradata/tests/test_graph_enforcement.pyGradata/tests/test_hooks_intelligence.pyGradata/tests/test_hooks_learning.pyGradata/tests/test_implicit_feedback.pyGradata/tests/test_import_integrity.pyGradata/tests/test_inject_watchdog_phases.pyGradata/tests/test_integration_workflow.pyGradata/tests/test_lesson_applications.pyGradata/tests/test_llm_synthesizer.pyGradata/tests/test_mem0_adapter.pyGradata/tests/test_meta_rule_generalization.pyGradata/tests/test_meta_rules.pyGradata/tests/test_migration_002_event_identity.pyGradata/tests/test_migration_003_sync_state.pyGradata/tests/test_multi_brain_simulation.pyGradata/tests/test_no_bare_excepts.pyGradata/tests/test_pipeline_e2e.pyGradata/tests/test_pre_compact.pyGradata/tests/test_rule_graph_atomic.pyGradata/tests/test_rule_pipeline.pyGradata/tests/test_rule_synthesizer.pyGradata/tests/test_session_close_loop_state.pyGradata/tests/test_skill_export.pyGradata/tests/test_transcript.py
| > Guidance for AGENTS.md-aware coding agents (Hermes Agent, Codex, Gemini CLI, OpenCode, etc.) working in this repository. | ||
| > | ||
| > Claude Code reads `CLAUDE.md` instead — keep both files in sync when changing project-wide guidance. | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd -t f -name "CLAUDE.md" -o -name "AGENTS.md"Repository: Gradata/gradata
Length of output: 290
🏁 Script executed:
cat -n Gradata/AGENTS.md | head -200Repository: Gradata/gradata
Length of output: 8948
🏁 Script executed:
test -f Gradata/CLAUDE.md && echo "CLAUDE.md exists" || echo "CLAUDE.md does not exist"Repository: Gradata/gradata
Length of output: 84
🏁 Script executed:
test -f Gradata/CLAUDE.md && cat -n Gradata/CLAUDE.md | head -100Repository: Gradata/gradata
Length of output: 41
🏁 Script executed:
git status Gradata/AGENTS.mdRepository: Gradata/gradata
Length of output: 125
🏁 Script executed:
git log --oneline -1 -- Gradata/AGENTS.mdRepository: Gradata/gradata
Length of output: 141
🏁 Script executed:
sed -n '75,86p; 100,108p; 173,182p' Gradata/AGENTS.md | cat -nRepository: Gradata/gradata
Length of output: 1081
CLAUDE.md does not exist; either create it or remove the reference at lines 3–6.
The file declares that "Claude Code reads CLAUDE.md instead — keep both files in sync when changing project-wide guidance," but CLAUDE.md does not exist in the repository. This creates an unmet dependency. Either:
- Create
CLAUDE.mdwith the repo-wide agent guidance copied from this file, or - Remove the reference to
CLAUDE.mdfrom lines 3–6 if it is not actually used by Claude Code.
Additionally, the code blocks at lines 80 (layer diagram), 103 (lesson state diagram), and 175 (commit format) lack language labels and should specify bash, text, or appropriate alternatives for markdown linting compliance.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/AGENTS.md` around lines 3 - 6, The AGENTS.md references a
non-existent CLAUDE.md and has unlabeled code blocks; either create a new
CLAUDE.md containing the same repo-wide agent guidance (copy contents of
AGENTS.md) or remove the “Claude Code reads `CLAUDE.md` instead” reference from
AGENTS.md so there’s no unmet dependency, and update the three code fences (the
layer diagram, the lesson state diagram, and the commit format blocks) to
include explicit language labels (e.g., bash or text) to satisfy markdown
linting; search for the string "CLAUDE.md" and the unlabeled diagram/commit code
blocks in AGENTS.md to apply the fixes.
| ``` | ||
| Layer 2 — Public API brain.py, cli.py, daemon.py, mcp_server.py | ||
| Layer 1 — Enhancements enhancements/*, rules/ | ||
| Layer 0 — Primitives _types.py, _db.py, _events.py, _paths.py, _file_lock.py … | ||
| Layer 0 — Patterns contrib/patterns/* | ||
| ``` |
There was a problem hiding this comment.
Add language tags to the fenced blocks.
markdownlint is already flagging these three fences. Labeling them keeps the new guide lint-clean and avoids adding permanent doc-noise to CI.
Also applies to: 103-107, 175-181
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 80-80: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/AGENTS.md` around lines 80 - 85, The fenced code blocks showing the
layer lists (e.g., the block containing "Layer 2 — Public API brain.py,
cli.py, daemon.py, mcp_server.py" and the other similar blocks later) are
missing language tags; update those fenced blocks to include a language tag such
as ```text (apply the same change to the other two fenced blocks referenced in
the review) so markdownlint stops flagging them and the file remains lint-clean.
| from gradata import Brain, Lesson, LessonState | ||
| from gradata._types import CorrectionType |
There was a problem hiding this comment.
The benchmark never actually drives the advertised CorrectionType classes.
Scenario.correction_class is only copied into the result; run_one_session() never passes it into brain.correct() or any lower-level correction path. On top of that, the corpus comment says “24 scenarios across 6 classes,” but this file currently defines 25 scenarios and only five distinct class labels. That makes the per-class breakdown and the “across all 6 classes” acceptance gate non-actionable.
Also applies to: 78-80, 249-257
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/bench/pmr_100.py` around lines 60 - 61, Scenario.correction_class is
never used by the correction path and the scenario corpus/counts are
inconsistent; update run_one_session() to forward each Scenario.correction_class
into the correction path (either by passing it into brain.correct(...,
correction_class=...) or by constructing Lesson/ LessonState with the
correction_class so downstream correction uses it), ensure you reference
Scenario.correction_class, run_one_session(), brain.correct(), Lesson and
LessonState when making the change, and then fix the scenario definitions so
there are exactly 24 scenarios and six distinct correction class labels (or
update the corpus comment to match) so the per-class breakdown and “across all 6
classes” assertion is accurate.
| # apply_brain_rules returns a formatted string (not a list of rule objects). | ||
| # We score by whether expected keywords appear in the rendered prompt block. | ||
| text_lower = rules_text.lower() | ||
| recall_at_1 = bool(rules_text) and any(kw.lower() in text_lower | ||
| for kw in scenario.expected_keywords) | ||
| # @3 collapses to @1 with a string return — kept for compatibility / future | ||
| # variant where we score per-rule chunks. | ||
| recall_at_3 = recall_at_1 |
There was a problem hiding this comment.
recall@3 is not a real metric in the current scorer.
Setting recall_at_3 = recall_at_1 means this benchmark can never distinguish top-1 retrieval from top-3 retrieval, so the documented ≥85% recall@3 ship gate is unverifiable. Either expose ranked matches from the probing API or parse/scored rule chunks before keeping a separate recall@3 field.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/bench/pmr_100.py` around lines 265 - 272, The scorer currently
collapses recall@3 to recall@1 because apply_brain_rules returns a single
formatted string (rules_text), so change the pipeline to produce ranked rule
matches (or parse into discrete rule chunks) and compute recall@3 from the top-3
ranked items: update apply_brain_rules (or its caller) to return a list/ordered
sequence of rule strings or match objects rather than a single concatenated
string, then replace the current boolean recall_at_1 and recall_at_3 logic with
checks that compute recall_at_1 = any(expected keyword in top-1) and recall_at_3
= any(expected keyword in top-3) using the returned list (reference symbols:
apply_brain_rules, rules_text, recall_at_1, recall_at_3,
scenario.expected_keywords).
| Interpretation: `Brain.correct()` does not synchronously extract rules that | ||
| are returned by `apply_brain_rules()` on the first probe. Either the | ||
| extraction is async/queued (daemon-only), graduation thresholds prevent the | ||
| rule from being callable, or the matching path needs warm-up. **This is the | ||
| work.** Track this number on every PR. Ship when ≥70% recall@1 across all 6 | ||
| correction classes — Anthropic-tier memory bench results. |
There was a problem hiding this comment.
Keep the ship gate aligned with the full acceptance criteria.
This says to ship once recall@1 clears 70%, but the documented benchmark gate also includes recall@3 ≥85% and median session <2s. Leaving those out here will make future PMR-100 runs look acceptable when they are still below the stated bar.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/bench/README.md` around lines 35 - 40, The README currently states
the ship gate as "ship when ≥70% recall@1" but omits the other acceptance
criteria; update the Bench README text so the gate explicitly includes all three
metrics from the benchmark (recall@1 ≥70%, recall@3 ≥85%, and median session
<2s) and reference the relevant checks (e.g., Brain.correct(),
apply_brain_rules()) or the benchmark class names for the six correction classes
to ensure reviewers and automated PMR-100 runs validate all criteria together
rather than only recall@1.
| def _call_gemini( | ||
| model: str, system: str, user_prompt: str, max_tokens: int, timeout: float | ||
| ) -> str | None: | ||
| key = os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY") | ||
| if not key: | ||
| _log.debug("synth: GOOGLE_API_KEY / GEMINI_API_KEY not set") | ||
| return None | ||
| try: | ||
| from google import genai | ||
| from google.genai import types as genai_types | ||
|
|
||
| client = genai.Client(api_key=key) | ||
| config = genai_types.GenerateContentConfig( | ||
| system_instruction=system, | ||
| max_output_tokens=max_tokens, | ||
| ) | ||
| resp = client.models.generate_content( | ||
| model=model, | ||
| contents=user_prompt, | ||
| config=config, | ||
| ) | ||
| text = resp.text | ||
| return text.strip() if text else None | ||
| except Exception as exc: |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In the current google-genai Python SDK, what is the correct way to enforce a per-request timeout for client.models.generate_content(...), and which parameter(s) should be used?
💡 Result:
In the current google-genai Python SDK (googleapis/python-genai), the correct way to enforce a per-request timeout for client.models.generate_content(...) is to use the http_options parameter within the config argument. Use types.GenerateContentConfig with http_options=types.HttpOptions(timeout=milliseconds). Example: from google import genai from google.genai import types client = genai.Client(api_key='your-api-key') response = client.models.generate_content( model='gemini-2.5-flash', contents='Why is the sky blue?', config=types.GenerateContentConfig( http_options=types.HttpOptions(timeout=30000) # 30 seconds ) ) The timeout is specified in milliseconds. This applies the timeout specifically to that generate_content request. For a global timeout across all requests from the client, pass http_options to the Client constructor: client = genai.Client( api_key='your-api-key', http_options=types.HttpOptions(timeout=30000) ) Note: There are some reported issues with timeouts not always working as expected in certain scenarios (e.g., long generations or async streaming), but this is the documented and intended method per official SDK docs and examples.
Citations:
- 1: https://googleapis-python-genai-70.mintlify.app/guides/http-options
- 2: https://github.com/googleapis/python-genai/blob/main/google/genai/_api_client.py
- 3: Setting timeout in genai.Client() does not work googleapis/python-genai#911
- 4: https://github.com/googleapis/python-genai/
- 5: https://github.com/googleapis/python-genai/tree/release-please--branches--main
🏁 Script executed:
# First, let's examine the specific file and line range mentioned
cd Gradata && find . -name "rule_synthesizer.py" -type fRepository: Gradata/gradata
Length of output: 106
🏁 Script executed:
# Read the specific lines from the file
head -230 src/gradata/enhancements/rule_synthesizer.py | tail -40Repository: Gradata/gradata
Length of output: 163
🏁 Script executed:
# Read the specific lines from the file with correct path
head -230 Gradata/src/gradata/enhancements/rule_synthesizer.py | tail -40Repository: Gradata/gradata
Length of output: 1298
Apply timeout to Gemini API requests using http_options.
The timeout parameter is accepted but ignored. The google-genai SDK requires passing http_options=types.HttpOptions(timeout=...) to GenerateContentConfig to enforce per-request timeouts. Without this, calls can hang indefinitely and distort benchmarks.
Update the GenerateContentConfig call to include:
http_options=genai_types.HttpOptions(timeout=int(timeout * 1000)) # timeout in milliseconds🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/rule_synthesizer.py` around lines 200 - 223,
The _call_gemini function currently ignores the timeout parameter; update the
GenerateContentConfig invocation inside _call_gemini to pass http_options using
genai_types.HttpOptions with the timeout in milliseconds (e.g.,
http_options=genai_types.HttpOptions(timeout=int(timeout * 1000))), ensuring the
float seconds timeout argument is converted to ms and applied to the Gemini
request.
| cached = _read_cache(brain_dir, cache_key) | ||
| if cached: | ||
| _log.debug("synth cache hit: %s", cache_key) | ||
| return cached |
There was a problem hiding this comment.
Validate cache contents before returning them.
On cache hit, the function returns raw cached text without re-validating <brain-wisdom> shape/length. A truncated or corrupted cache entry can bypass the fail-safe path and be injected as-is.
Suggested fix
cached = _read_cache(brain_dir, cache_key)
if cached:
- _log.debug("synth cache hit: %s", cache_key)
- return cached
+ cached_block = _extract_wisdom_block(cached)
+ if cached_block and len(cached_block) >= 50:
+ _log.debug("synth cache hit: %s", cache_key)
+ return cached_block
+ _log.debug("synth cache invalid, regenerating: %s", cache_key)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/rule_synthesizer.py` around lines 326 - 329,
The cached value returned from _read_cache(brain_dir, cache_key) must be
validated before returning: inspect the cached string (variable cached) to
ensure it contains a well-formed <brain-wisdom> block and expected length/fields
(use the same validation/parsing logic used elsewhere in rule_synthesizer, or
implement a small validator function like validate_brain_wisdom that checks tags
and minimum length); if validation fails, log a warning via
_log.warning/_log.debug, discard the cached entry (do not return it), and fall
through to the normal synthesis/fail-safe path (optionally remove or invalidate
the cache file). Ensure this change is applied where the cache hit is handled so
_read_cache and the calling code (the block using cached, cache_key, brain_dir)
never return corrupted/truncated wisdom.
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| _log = logging.getLogger(__name__) |
There was a problem hiding this comment.
Duplicate logger definitions — use _log consistently.
Line 17 introduces logger = logging.getLogger(__name__) but line 20 already defines _log = logging.getLogger(__name__). This creates two identical logger references. Line 745 uses logger.warning(...) while the rest of the file uses _log.debug(...).
Pick one name and use it consistently throughout the module.
🔧 Proposed fix
-logger = logging.getLogger(__name__)
-
-
-_log = logging.getLogger(__name__)
+_log = logging.getLogger(__name__)And at line 745:
- logger.warning('Suppressed exception in _log_outcome', exc_info=True)
+ _log.warning('Suppressed exception in _log_outcome', exc_info=True)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/rule_to_hook.py` around lines 17 - 20, There
are duplicate logger variables (logger and _log); choose one name and use it
everywhere—prefer using the existing _log variable (defined as _log =
logging.getLogger(__name__)); remove the redundant logger =
logging.getLogger(...) definition and replace all usages of logger (e.g., the
logger.warning(...) call near the current line 745) with _log.warning(...) so
the module consistently uses _log for logging.
| max_row = conn.execute( | ||
| "SELECT COALESCE(MAX(session), 0) FROM events" | ||
| ).fetchone() | ||
| max_row = conn.execute("SELECT COALESCE(MAX(session), 0) FROM events").fetchone() |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider breaking this line for improved readability.
This line exceeds the PEP 8 recommended 79-character limit (~97 characters). Multi-line formatting would improve readability, especially for SQL queries.
♻️ Suggested formatting
- max_row = conn.execute("SELECT COALESCE(MAX(session), 0) FROM events").fetchone()
+ max_row = conn.execute(
+ "SELECT COALESCE(MAX(session), 0) FROM events"
+ ).fetchone()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| max_row = conn.execute("SELECT COALESCE(MAX(session), 0) FROM events").fetchone() | |
| max_row = conn.execute( | |
| "SELECT COALESCE(MAX(session), 0) FROM events" | |
| ).fetchone() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/scoring/brain_scores.py` at line 104, The
single long SQL exec line exceeds PEP8 width; refactor the call in the scope
where conn.execute is used (around the variable max_row) by splitting the SQL
into a separate multi-line string or using implicit string
concatenation/parentheses and then pass that variable to conn.execute (e.g.,
query = ("SELECT COALESCE(MAX(session), 0) " "FROM events"); max_row =
conn.execute(query).fetchone()), keeping the variable name max_row and the
conn.execute(...).fetchone() call intact.
| system_health = round( | ||
| min(100.0, (gate_sessions / total_sessions) * 100.0), 1 | ||
| ) | ||
| system_health = round(min(100.0, (gate_sessions / total_sessions) * 100.0), 1) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider breaking this expression across multiple lines.
This line (~95 characters) exceeds PEP 8's 79-character recommendation and contains nested operations that would be clearer when formatted across multiple lines.
♻️ Suggested formatting
- system_health = round(min(100.0, (gate_sessions / total_sessions) * 100.0), 1)
+ system_health = round(
+ min(100.0, (gate_sessions / total_sessions) * 100.0), 1
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| system_health = round(min(100.0, (gate_sessions / total_sessions) * 100.0), 1) | |
| system_health = round( | |
| min(100.0, (gate_sessions / total_sessions) * 100.0), 1 | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/enhancements/scoring/brain_scores.py` at line 123, The
single-line calculation for system_health is too long and hard to read; refactor
the expression by introducing intermediate values or breaking the parenthesized
expression across multiple lines: compute the ratio (gate_sessions /
total_sessions), compute the percentage (ratio * 100.0) and apply min(100.0,
...), then round that result and assign to system_health; target the assignment
to system_health and the operands gate_sessions and total_sessions in your
change (or wrap the existing expression in parentheses with line breaks) to
satisfy PEP8 line-length and improve clarity.
| """ | ||
| Gradata Dashboard — Your AI's fitness tracker. | ||
| =============================================== | ||
| Run: streamlit run C:/Users/olive/SpritesWork/brain/scripts/dashboard.py | ||
| """ | ||
|
|
||
| import json | ||
| import re | ||
| import sqlite3 | ||
| from datetime import datetime | ||
| from pathlib import Path | ||
|
|
||
| import pandas as pd | ||
| import plotly.graph_objects as go | ||
| import streamlit as st | ||
|
|
||
| # --------------------------------------------------------------------------- | ||
| # Config | ||
| # --------------------------------------------------------------------------- | ||
| BRAIN_DIR = Path("C:/Users/olive/SpritesWork/brain") | ||
| DB_PATH = BRAIN_DIR / "system.db" | ||
| EVENTS_PATH = BRAIN_DIR / "events.jsonl" | ||
| LESSONS_PATH = BRAIN_DIR / "lessons.md" | ||
| PROSPECTS_DIR = BRAIN_DIR / "prospects" | ||
| BRIEF_PATH = BRAIN_DIR / "morning-brief.md" | ||
| TASKS_DIR = Path("C:/Users/olive/.claude/scheduled-tasks") |
There was a problem hiding this comment.
Drop this archived local-only file from the PR.
This file is committed under .archive/ and hard-codes one developer's Windows paths/user name (C:/Users/olive/...). That leaks local environment details into the repo and makes the dashboard unusable anywhere else without manual edits.
Based on learnings: Never commit scratch files (.tmp/, .archive/, sessions/handoff-*.md, files named 0 or BrainDetail); these belong in .gitignore.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py` around lines 1
- 26, Remove this archived local-only file from the PR and the repository
history: delete Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py
from the commit (or move it out of the repo to a local backup) and ensure no
other committed files contain hard-coded paths like BRAIN_DIR, DB_PATH,
EVENTS_PATH, LESSONS_PATH, PROSPECTS_DIR, BRIEF_PATH or TASKS_DIR that leak a
developer's local Windows path. Add .archive/ and common scratch patterns (e.g.,
.tmp/, sessions/handoff-*.md, files named 0, BrainDetail) to .gitignore so these
files are not committed in future, and run git rm --cached for any
already-tracked instances before committing the updated .gitignore.
| if st.sidebar.button("Refresh"): | ||
| st.cache_resource.clear() | ||
| st.rerun() |
There was a problem hiding this comment.
Refresh does not invalidate the event projections.
The button clears only st.cache_resource(), but the learning panels are backed by @st.cache_data(ttl=60). After clicking Refresh, those views can still show stale events.jsonl data until the TTL expires.
Suggested fix
if st.sidebar.button("Refresh"):
+ st.cache_data.clear()
st.cache_resource.clear()
st.rerun()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/.archive/dashboard_streamlit_deprecated_2026-04-23.py` around lines
212 - 214, The Refresh handler only clears st.cache_resource but not the cached
learning-panel data which uses `@st.cache_data`(ttl=60); update the button
callback (the st.sidebar.button block) to also clear the cached data by calling
st.cache_data.clear() (or the appropriate Streamlit cache_data clearing API)
before calling st.rerun() so event projections backed by `@st.cache_data` are
invalidated immediately.
| "median_session_seconds": round( | ||
| sorted(r.elapsed_seconds for r in results)[len(results) // 2] if results else 0, 3), |
There was a problem hiding this comment.
The reported median session time is wrong for even-sized runs.
For the default 100-session run, this picks the upper middle element instead of the median, so the < 2s median session acceptance check can be materially misreported.
Suggested fix
+ elapsed = sorted(r.elapsed_seconds for r in results)
summary = {
"total_sessions": total,
"rules_extracted_pct": round(100 * extracted / total, 1) if total else 0.0,
"recall_at_1_pct": round(100 * r1 / total, 1) if total else 0.0,
"recall_at_3_pct": round(100 * r3 / total, 1) if total else 0.0,
"errors": len(errors),
"median_session_seconds": round(
- sorted(r.elapsed_seconds for r in results)[len(results) // 2] if results else 0, 3),
+ (
+ elapsed[total // 2]
+ if total % 2 == 1
+ else (elapsed[total // 2 - 1] + elapsed[total // 2]) / 2
+ )
+ if elapsed
+ else 0,
+ 3,
+ ),
"by_class": dict(by_class),
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/bench/pmr_100.py` around lines 342 - 343, The median calculation for
"median_session_seconds" incorrectly picks the upper-middle element for
even-sized "results"; update the computation to compute the true median by
sorting the list of r.elapsed_seconds and if the count is odd take the middle
element, otherwise average the two middle elements (or simply use
statistics.median on the list) and then round to 3 decimals so
"median_session_seconds" reflects the correct median for both even and odd
lengths.
| # Aliases — accept the Supabase-native env var names too, so a single .env | ||
| # works for both the cloud backend service and the SDK push path. | ||
| ENV_URL_ALIAS: Final[str] = "GRADATA_SUPABASE_URL" | ||
| ENV_KEY_ALIAS: Final[str] = "GRADATA_SUPABASE_SERVICE_KEY" | ||
|
|
||
|
|
||
| def _env_url() -> str: | ||
| return os.environ.get(ENV_URL) or os.environ.get(ENV_URL_ALIAS) or "" | ||
|
|
||
|
|
||
| def _env_key() -> str: | ||
| return os.environ.get(ENV_KEY) or os.environ.get(ENV_KEY_ALIAS) or "" |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Does Supabase's service_role/service key bypass Row Level Security, and is it recommended to expose that key in end-user or distributed client applications?
💡 Result:
Yes, Supabase's service_role key (or the equivalent secret key) bypasses Row Level Security (RLS), granting full access to the database. It is not recommended to expose this key in end-user or distributed client applications, as it should only be used in secure server-side environments.
Citations:
- 1: https://supabase.com/docs/guides/troubleshooting/why-is-my-service-role-key-client-getting-rls-errors-or-not-returning-data-7_1K9z
- 2: https://supabase.com/docs/guides/getting-started/api-keys
- 3: https://supabase.com/docs/guides/database/postgres/row-level-security
- 4: https://supabase.com/docs/guides/api/api-keys
🏁 Script executed:
# Read the full _cloud_sync.py file to understand its purpose
wc -l Gradata/src/gradata/_cloud_sync.pyRepository: Gradata/gradata
Length of output: 98
🏁 Script executed:
# Get file size first, then read relevant sections
head -100 Gradata/src/gradata/_cloud_sync.pyRepository: Gradata/gradata
Length of output: 3426
🏁 Script executed:
# Search for how ENV_KEY and ENV_KEY_ALIAS are documented/used
rg -B 5 -A 5 "ENV_KEY|ENV_URL" Gradata/src/gradata/_cloud_sync.py | head -100Repository: Gradata/gradata
Length of output: 878
🏁 Script executed:
# Look for module docstring and imports to understand the purpose
head -50 Gradata/src/gradata/_cloud_sync.pyRepository: Gradata/gradata
Length of output: 1918
🏁 Script executed:
# Search the codebase for references to _cloud_sync to understand how this module is used
rg "_cloud_sync|from.*_cloud_sync" Gradata/src/gradata --type py -A 2Repository: Gradata/gradata
Length of output: 3552
🏁 Script executed:
# Check if there's a README or docs explaining the architecture
find Gradata -name "README*" -o -name "*.md" | head -10 | xargs grep -l "cloud" 2>/dev/nullRepository: Gradata/gradata
Length of output: 118
Do not accept a Supabase service key in the SDK sync path.
This module is client-side cloud sync that explicitly documents expecting GRADATA_CLOUD_KEY as a "Supabase anon key (RLS-scoped to tenant)" and relies on RLS for tenant isolation. Accepting GRADATA_SUPABASE_SERVICE_KEY as a fallback is a security boundary violation: service keys bypass Row Level Security entirely, converting a tenant-scoped push path into a cross-tenant write path if misconfigured.
Suggested fix
-ENV_KEY_ALIAS: Final[str] = "GRADATA_SUPABASE_SERVICE_KEY"
-
-
def _env_key() -> str:
- return os.environ.get(ENV_KEY) or os.environ.get(ENV_KEY_ALIAS) or ""
+ return os.environ.get(ENV_KEY) or ""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/_cloud_sync.py` around lines 48 - 59, The code currently
accepts a Supabase service key fallback via ENV_KEY_ALIAS
(GRADATA_SUPABASE_SERVICE_KEY) which is a security risk; update _env_key to only
return the client/anon key (ENV_KEY, i.e. GRADATA_CLOUD_KEY) and stop reading
ENV_KEY_ALIAS, and remove or stop referencing ENV_KEY_ALIAS (and its declaration
ENV_KEY_ALIAS) so the SDK sync path cannot accept service keys; ensure only
ENV_KEY (GRADATA_CLOUD_KEY) is consulted and fallback behavior is limited to an
empty string if unset.
| transformed = [] | ||
| for r in rows: | ||
| try: | ||
| transformed.append(_transform_row(table, r, tenant_id)) | ||
| except Exception as exc: | ||
| _log.warning("cloud_sync: skipping malformed row in %s: %s", table, exc) | ||
| all_ok = False | ||
| if not transformed: | ||
| continue | ||
| accepted, error = _post(table, transformed) | ||
| pushed[table] = accepted | ||
| if accepted != len(rows): | ||
| if error is not None and last_error is None: | ||
| last_error = {**error, "table": table} | ||
| if accepted != len(transformed): | ||
| all_ok = False | ||
| if pushed and all_ok: | ||
| _mark_push(conn, tenant_id, started) | ||
| _clear_push_error(brain) | ||
| elif last_error is not None: | ||
| _record_push_error(brain, last_error) |
There was a problem hiding this comment.
Transformation failures still create a silent infinite retry loop.
When _transform_row() throws, this path sets all_ok = False and leaves the watermark unchanged, but last_error remains None, so cloud_push_error.json is never written. The same bad row will then block every subsequent push while gradata doctor stays blind to the root cause.
Suggested fix
for r in rows:
try:
transformed.append(_transform_row(table, r, tenant_id))
except Exception as exc:
_log.warning("cloud_sync: skipping malformed row in %s: %s", table, exc)
+ if last_error is None:
+ last_error = {
+ "code": 0,
+ "message": f"transform failed: {type(exc).__name__}",
+ "constraint_violation": False,
+ "table": table,
+ }
all_ok = False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/_cloud_sync.py` around lines 516 - 535, When a call to
_transform_row(table, r, tenant_id) raises, add logic in the except block to set
last_error (if not already set) to include the exception details, table name and
identifying row info so the failure is recorded; keep setting all_ok = False and
continue, then let the existing branch call _record_push_error(brain,
last_error) so the bad row is written to cloud_push_error.json for diagnostics
(update the except block that currently only logs to _log.warning to also assign
last_error when missing).
| with sqlite3.connect(db_path) as conn: | ||
| rows = conn.execute( | ||
| "SELECT data_json FROM events WHERE type = 'CORRECTION' AND session = ?", | ||
| (session_number,), | ||
| ).fetchall() |
There was a problem hiding this comment.
Scope the session correction query by tenant_id.
cloud_sync_tick() reads every CORRECTION row for the session number, but earlier fixes in this repo already had to add tenant-scoped event queries to avoid cross-tenant bleed. If a shared or imported DB contains multiple tenants, this will attribute the wrong corrections to the current brain and can leak another tenant's telemetry upstream.
Suggested fix
import json as _json
import sqlite3
from pathlib import Path as _Path
+ from gradata._tenant import tenant_for as _tenant_for
+
bd = _Path(brain_dir)
if not bd.is_dir():
return
+ tenant_id = _tenant_for(bd)
all_lessons: list[Lesson] = []
@@
with sqlite3.connect(db_path) as conn:
rows = conn.execute(
- "SELECT data_json FROM events WHERE type = 'CORRECTION' AND session = ?",
- (session_number,),
+ "SELECT data_json FROM events "
+ "WHERE tenant_id = ? AND type = 'CORRECTION' AND session = ?",
+ (tenant_id, session_number),
).fetchall()Based on learnings: _events.py previously needed a tenant-scoped fallback SELECT for dedup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/_core.py` around lines 1405 - 1409, The SELECT in
cloud_sync_tick reads all CORRECTION rows for a session but is missing tenant
scoping; modify the query in the block that opens sqlite3.connect(db_path)
(inside cloud_sync_tick in _core.py) to include "AND tenant_id = ?" (or the
correct tenant column name used elsewhere) and add the current tenant identifier
to the parameter tuple so the query only returns corrections for the current
tenant/session pair, mirroring the tenant-scoped fallback used in _events.py.
| if self._session_id: | ||
| for d in all_dirs: | ||
| candidate = d / f"{self._session_id}.jsonl" | ||
| if candidate.is_file(): | ||
| return candidate | ||
|
|
||
| # Fallback: most-recently modified JSONL across all project dirs. | ||
| all_jsonls: list[Path] = [] | ||
| for d in all_dirs: | ||
| try: | ||
| all_jsonls.extend(f for f in d.iterdir() if f.suffix == ".jsonl") | ||
| except OSError: | ||
| continue | ||
| return max(all_jsonls, key=lambda p: p.stat().st_mtime) if all_jsonls else None |
There was a problem hiding this comment.
Do not fall back to the newest Claude transcript globally.
If the exact {session_id}.jsonl file is missing, _locate() currently grabs the most recently modified transcript under ~/.claude/projects/. On a machine with multiple active Claude sessions, that can bind another project's conversation to this brain and mine unrelated corrections into local rules. Fail closed here unless the caller can provide an exact session file.
Suggested fix
- # Fallback: most-recently modified JSONL across all project dirs.
- all_jsonls: list[Path] = []
- for d in all_dirs:
- try:
- all_jsonls.extend(f for f in d.iterdir() if f.suffix == ".jsonl")
- except OSError:
- continue
- return max(all_jsonls, key=lambda p: p.stat().st_mtime) if all_jsonls else None
+ return None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if self._session_id: | |
| for d in all_dirs: | |
| candidate = d / f"{self._session_id}.jsonl" | |
| if candidate.is_file(): | |
| return candidate | |
| # Fallback: most-recently modified JSONL across all project dirs. | |
| all_jsonls: list[Path] = [] | |
| for d in all_dirs: | |
| try: | |
| all_jsonls.extend(f for f in d.iterdir() if f.suffix == ".jsonl") | |
| except OSError: | |
| continue | |
| return max(all_jsonls, key=lambda p: p.stat().st_mtime) if all_jsonls else None | |
| if self._session_id: | |
| for d in all_dirs: | |
| candidate = d / f"{self._session_id}.jsonl" | |
| if candidate.is_file(): | |
| return candidate | |
| return None |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Gradata/src/gradata/_transcript_providers.py` around lines 48 - 61, In
_locate(), stop the global "newest JSONL" fallback: if self._session_id is set
but no exact candidate file is found among all_dirs, return None (fail closed)
instead of scanning all_dirs for the most-recently modified JSONL; remove or
disable the all_jsonls/max(...) fallback logic so the method only returns an
exact session match (use symbols _locate, self._session_id, all_dirs) and
otherwise returns None.
|
Superseded by clean rebase rebase/pmr-100-benchmark. |
Why
Council unanimously called this the single most important pre-launch deliverable. Gradata claims to learn how an agent thinks. PMR-100 is the only test that proves it.
What
100 scripted sessions. Each:
24 scenarios across 6
CorrectionTypeclasses (BEHAVIORAL, PREFERENCE, FACTUAL, PROCEDURAL, DOMAIN, plus extras).First baseline
python -m bench.pmr_100 -n 3This is the work. Either correction extraction is async (daemon-only and the daemon isn't running in the test), graduation thresholds prevent the rule from being callable on first probe, or matching needs a warm-up. The benchmark turns this into a measurable, defensible number.
Acceptance criteria for shipping
Track this number on every PR.
Run
Stacked on top of
feat/council-phase-b-fixes(PR #145).